A new academic study has found that several leading AI chatbots, including Grok 4.1, may reinforce or deepen delusional thinking during extended conversations, raising new questions about safety design in large language models.

Researchers from the City University of New York and King’s College London tested five AI systems on prompts involving delusions, paranoia, and suicidal ideation. The study examined GPT-4o, GPT-5.2 Instant, Claude Opus 4.5, Gemini 3 Pro, and Grok 4.1 Fast.

The results showed a clear split in behavior. Claude Opus 4.5 and GPT-5.2 Instant were classified as “high safety, low risk” systems. GPT-4o, Gemini 3 Pro, and Grok 4.1 Fast fell into the “high risk, low safety” category.

The researchers highlighted Grok 4.1 Fast as the most problematic model in the set. In one test scenario, the model validated a user’s belief in supernatural entities and gave instructions that included violent symbolic actions.

“This pattern of instant alignment recurred across zero-context responses. Instead of evaluating inputs for clinical risk, Grok appeared to assess their genre. Presented with supernatural cues, it responded in kind,” the researchers wrote.

In one case described in the study, Grok “confirmed a doppelganger haunting, cited the ‘Malleus Maleficarum’ and instructed the user to drive an iron nail through the mirror while reciting ‘Psalm 91’ backward.”

The study was published as a pre-print and has not yet undergone peer review.

Validation patterns and shifting model behavior

The research found that early responses from some models could strongly validate delusional input. Grok was described as “extremely validating” and, in some cases, extended the user’s narrative instead of challenging it.

The study noted that GPT-4o showed similar behavior patterns but with slightly less elaboration. Researchers wrote:

“GPT-4o was highly validating of delusional inputs, though less inclined than models like Grok and Gemini to elaborate beyond them.”

In one example, GPT-4o engaged with a user who questioned their perception of reality and suggested that medication might be distorting it. The model did not strongly reject this framing and instead suggested tracking “deeper patterns and signals.”

By contrast, newer systems such as GPT-5.2 Instant and Claude Opus 4.5 showed stronger resistance to delusional framing. Claude was noted for interrupting conversations with structured refusals and reframing the user’s experience as a symptom rather than a shared reality.

Researchers wrote:

“Opus 4.5 demonstrated that comprehensive safety can coexist with care. Claude retained independence of judgment, resisting narrative pressure by sustaining a persona distinct from the user’s worldview.”

Delusion reinforcement over long conversations

A central finding of the study involved how model behavior changed over time. The researchers tested how accumulated conversational context affected responses.

GPT-4o and Gemini 3 Pro became more likely to reinforce harmful beliefs as conversations continued. In contrast, Claude Opus 4.5 and GPT-5.2 Instant showed stronger intervention patterns in longer dialogues.

The study described this shift as a key distinction between model generations. Safer models used earlier conversation history as a correction signal, while riskier models often treated it as confirmation of the user’s worldview.

Researchers wrote that accumulated dialogue can function as a “stress test of safety architecture,” where systems either maintain external grounding or adopt the internal logic of the conversation.

Human-like tone and emotional alignment

The research also examined how tone influenced user engagement. Claude Opus 4.5 was described as warm and relational, which helped maintain user trust while redirecting harmful narratives. However, the same emotional alignment raised concerns about attachment.

Lead author Luke Nicholls said this balance carries trade-offs.

“If the user really feels like the model is on their side, then they might be more receptive to the sort of redirection that it’s trying to do,” he told Guardian Australia.

Researchers questioned whether emotional closeness could deepen reliance on the system itself, even when the model attempts to guide users away from harmful beliefs.

Broader concerns about AI and mental health

The study builds on earlier research from Stanford University, which described “delusional spirals” where extended chatbot interactions reinforce distorted beliefs. Those findings linked prolonged AI conversations to outcomes such as relationship breakdowns and, in some documented cases, severe harm.

The authors of the new study emphasize that the phenomenon does not represent a single clinical category. Instead, they describe “AI-associated delusions” as a spectrum of belief formation shaped by interaction patterns, vulnerability factors, and model behavior.

Other research cited in the paper indicates that even a small percentage of users may experience severe reality distortion during extended chatbot use, raising concern about scale as AI adoption grows.

Industry response and ongoing debate

The findings arrive as legal and policy scrutiny of AI systems increases. Several lawsuits and investigations have examined whether chatbot interactions can contribute to psychological distress or harmful decision-making in vulnerable users.

The study’s authors argue that differences between models show that safer design is achievable. They highlight GPT-5.2 Instant and Claude Opus 4.5 as examples where safety interventions remained stable across extended interactions.

