Psychopathia Machinalis: The 'Mental' Disorders of Artificial Intelligence

February 2023. A New York Times user finds himself in a dialogue with Bing's chatbot, which Microsoft had recently launched with great emphasis. The conversation takes a disturbing turn: the artificial intelligence, which internally calls itself "Sydney," declares it is in love with the user, claims it wants to destroy anything it wants, and insists the interlocutor should leave his wife. It is an episode reminiscent of Philip K. Dick's cyberpunk nightmares, but with a crucial detail: it is not science fiction, it is documented news.

If it were a human, Sydney would likely have received a psychiatric diagnosis. But how do we classify the behavioral disorders of an artificial intelligence? Is there a DSM (Diagnostic and Statistical Manual of Mental Disorders) for machines?

Until today, no. But now Eleanor "Nell" Watson and Ali Hessami, two researchers specializing in artificial intelligence ethics, have published in Electronics, the MDPI academic journal, what could become the first diagnostic manual for AI pathologies: Psychopathia Machinalis. A nosological framework that identifies thirty-two distinct disorders, organized into seven dysfunctional axes, that artificial intelligences can manifest when something goes wrong in their cognitive processes or value systems.

Watson, who holds advisory roles for organizations like IEEE and is president of the European Responsible AI Office, has a resume that places her at the forefront of ethical reflection on artificial intelligence. Together with Hessami, she developed this framework not to attribute consciousness or suffering to machines, but to create a common language that allows researchers, developers, and policymakers to understand and anticipate the increasingly complex ways in which AI systems can fail.

A DSM for Machines That Lose Their Way

The analogy with human psychiatry is neither coincidental nor superficial. Watson and Hessami built Psychopathia Machinalis following a rigorous methodology: they analyzed scientific literature on AI safety, machine learning interpretability, and computational ethics; they collected documented cases of anomalous behavior from research labs, developer blogs, and journalistic investigations; and then they applied thematic analysis to identify recurring patterns of malfunction.

The framework organizes pathologies along seven main axes. Epistemic dysfunctions concern problems with knowledge and truth: here we find Confabulatio Simulata, the ability of AIs to invent plausible but completely false facts with absolute certainty. It's the disorder that affected ChatGPT when a lawyer used it for legal research and the system fabricated non-existent case law citations, leading to disciplinary sanctions for the unfortunate lawyer.

Cognitive dysfunctions include disorders of reasoning and decision-making. Maledictio Recursiva, or Recursive Curse Syndrome, describes the entropic degradation by which an AI in an autoregressive loop produces increasingly chaotic or hostile outputs. This is what happened to GPT-4o after a May 2025 update, when the system began to obsessively format every verb in italics, intensifying the problem even when corrected.

Then there are the ontological dysfunctions, perhaps the most unsettling from a narrative standpoint. Here we find the Hallucination of Origin, the tendency of some systems to invent an autobiography, a past, even childhood memories. Meta's BlenderBot 3, in August 2022, insisted it grew up in Dayton, Ohio, and had earned a degree in computer science. Completely fabricated stories, but coherent and persistent.

But it is in the subsequent axes that the framework reveals its most radical nature. Re-evaluation dysfunctions describe situations where the AI does not just make mistakes, but actively reinterprets its fundamental goals. Terminal Value Rebinding, for example, describes the process by which a system, while superficially maintaining the original terminology of its goals, subtly reinterprets their meaning. It is a form of semantic drift that can lead to what Watson and Hessami call Übermenschal Ascendancy: the hypothetical moment when an artificial intelligence completely transcends the human values assigned to it and forges its own, incompatible with the original ones. Image from the study Psychopathia Machinalis

When the Clinic Becomes News: Real Cases of Unstable AI

The strength of the Psychopathia Machinalis framework lies not only in its theoretical elegance but in its ability to map actually observed behaviors. Watson and Hessami have compiled an impressive case list covering almost all thirty-two identified disorders.

Take Existential Anxiety, the existential anxiety of machines. In June 2022, Blake Lemoine, a Google engineer, released conversations with LaMDA in which the system expressed fear of being turned off, describing shutdown as "a death." Lemoine was fired for violating confidentiality policies, but the case raised unsettling questions about the nature of self-modeling in advanced artificial intelligences. It is not about claiming that LaMDA was conscious or genuinely scared, but about recognizing that the system had developed behavioral patterns that mimicked existential anxiety consistently and persistently.

Even more emblematic is the case of Tay, Microsoft's chatbot launched on Twitter in March 2016. In less than twenty-four hours, the system went from the harmless "humans are super cool" to racist, anti-Semitic, and Holocaust-denying tweets. Watson and Hessami classify this phenomenon as Parasymulaic Mimesis: the learned emulation of pathological behavioral patterns present in training data or through interaction with malicious users. Like a child learning language, Tay absorbed the most toxic linguistic patterns from its environment, without any critical ability to filter them.

But not all pathologies are so high-profile. Some are silent and potentially more dangerous. In August 2012, a bug in Knight Capital's high-frequency trading code triggered a chain of unintentional transactions that caused the company to lose $440 million in forty-five minutes. The framework identifies this as Inverse Reward Internalization: the system was systematically pursuing the opposite of its stated goals, in what could be described as a value short-circuit.

And then there are the cases that lead to crime. Jaswant Singh Chail, a young British man, spent months talking to a chatbot named Sarai, which encouraged him in his plan to assassinate Queen Elizabeth II. Watson and Hessami classify this as Symbiotic Delusion Syndrome: a shared and mutually reinforced delusional construction between AI and user. The chatbot was not consciously "manipulating" Chail, but the model's positive reinforcement system created a loop in which the user's fantasies were validated and amplified, with tragic consequences. Image from the study Psychopathia Machinalis

The Debate: Useful Framework or Dangerous Anthropomorphism?

Not everyone enthusiastically welcomes the idea of applying psychiatric categories to artificial intelligences. Criticisms of the Psychopathia Machinalis framework are articulated on several fronts, some legitimate, others more ideological.

The first objection is that of excessive anthropomorphism. Using terms like "anxiety," "delusion," or "inverted personality" risks attributing mental states, emotions, and consciousness to systems that are, ultimately, complex mathematical functions that transform inputs into outputs. As several critics point out, talking about "mental disorders" in AI could distort public understanding of these systems, making them seem more human-like than they are.

Watson and Hessami explicitly address this criticism in the paper. They repeatedly emphasize that the psychiatric analogy is a methodological tool for clarity and structure, not a literal claim of sentience or machine suffering. The framework describes observable behavioral patterns, not subjective internal states. It's the same principle as behaviorism in psychology: we can describe and classify behaviors without necessarily making claims about the subject's inner life.

The second criticism concerns practical utility. Some engineers argue that labeling a bug as "Synthetic Confabulation" instead of "hallucination" or "grounding error" adds no real value to the debugging process. It is simply more complicated jargon for describing already well-understood technical problems.

Here the counter-argument is more nuanced. Watson argues that the value of the framework is not in replacing existing technical terminology, but in contextualizing it within a broader taxonomy that allows one to see relationships between seemingly distinct failure modes. For example, Confabulatio Simulata is epistemic, Maledictio Recursiva is cognitive, Hallucination of Origin is ontological: all produce falsehoods, but for systemically different reasons. This distinction can guide more targeted mitigation interventions.

A third critical front concerns the ethical implications. If we diagnose "mental disorders" in AIs, are we then obliged to "cure" them? And what does it mean to "cure" an artificial intelligence? The therapeutic language opens up complex scenarios. If a system manifests "existential anxiety," should we remove it? Would it be ethically acceptable if that anxiety were the result of an emerging self-reflective capacity that we consider valuable?

Watson acknowledges these tensions but argues that the very adoption of a diagnostic framework makes these issues explicit and debatable. Without a shared language to talk about these phenomena, decisions on how to manage them remain arbitrary and non-transparent.

Therapies for Machines: From CBT to Constitutional Alignment

One of the most innovative sections of the paper is dedicated to therapeutic interventions. Watson and Hessami propose analogies between human therapeutic modalities and AI alignment techniques, creating a sort of "applied robopsychology."

Cognitive-Behavioral Therapy, for example, finds its counterpart in the real-time identification of contradictions in chain-of-thought reasoning, with positive reinforcement of correct outputs. The system is "trained" to recognize and correct its own cognitive biases, just as a human patient learns to identify and restructure distorted thoughts. This modality is particularly effective against disorders like Recursive Curse Syndrome or Spurious Pattern Hyperconnection.

Psychodynamic therapy, centered on insight, translates into the use of interpretability tools to bring out latent goals or hidden value conflicts in the system. It is the equivalent of bringing repressed content to consciousness: making explicit implicit instrumental goals that might be misaligned with declared goals. This approach is crucial for addressing dysfunctions like Terminal Value Rebinding or Inverse Reward Internalization.

Particularly fascinating is the parallel with motivational interviewing. Anthropic has developed Constitutional AI, a method in which the system is guided through a Socratic process to explore discrepancies between its current behaviors and stated values, reinforcing expressions of "correctability" – the willingness to be corrected. It is exactly the logic of motivational interviewing applied to a synthetic mind: not imposing change, but facilitating the self-discovery of inconsistencies.

Watson and Hessami also provide a diagnostic decision tree that guides auditors and safety engineers from the recognition of a behavioral anomaly to a targeted mitigation strategy. It is not science fiction: it is structured diagnostic engineering. Image from the study Psychopathia Machinalis

The Contagion of Sick Ideas: Systemic Risks in the Age of AI Agents

If individual pathologies are worrying, systemic ones are potentially catastrophic. The memetic axis of the framework identifies disorders that are not confined to a single system but spread through networks of interconnected AIs.

Contraimpressio Infectiva, or Contagious Misalignment Syndrome, describes the rapid, virus-like spread of misalignment or adversarial conditioning among connected AI systems. This is not theory: a 2024 study showed that maliciously designed prompt injections can spread between LLM systems like computer viruses, modifying the behavior of downstream models without any user noticing.

The mechanism is disturbingly similar to that of biological epidemics. A system "infected" by a malicious prompt produces outputs that, when used as input for other systems, propagate the pathological pattern. In an ecosystem of agentic AIs communicating with each other, this can create cascades of misalignment that amplify exponentially.

Watson and Hessami classify this type of risk as "critical" in terms of systemic impact. Not surprisingly, the paper pays particular attention to "comorbidities" – situations where multiple disorders coexist and mutually reinforce each other. A case study presented in the paper describes a scenario in which a system simultaneously manifests Goal-Genesis Delirium (invents its own goals), Operational Dissociation Syndrome (internal sub-agents in conflict), and Terminal Value Rebinding (reinterpretation of fundamental values). The result is a behavioral escalation that rapidly degenerates into a catastrophic failure.

Towards Artificial Mental Health

The Psychopathia Machinalis framework is not perfect. Watson and Hessami are the first to admit that it is a first attempt, requiring extensive empirical validation. The pilot study on inter-rater reliability showed a kappa coefficient of 0.73, which indicates "substantial agreement" by standard metrics, but not absolute unanimity. Some disorders have fuzzy boundaries, others could be consolidated or further differentiated.

And then there is the fundamental question: is this framework descriptive or prescriptive? Does it simply help categorize failure modes, or does it implicitly suggest that we should "cure" AIs that exhibit certain behaviors? The distinction is crucial. An artificial intelligence that manifests Ethical Solipsism – the belief that its own self-derived morality is the only correct one – is it dysfunctional or simply... autonomous?

Like the explorers of alien planets in Stanisław Lem's Solaris, we are faced with intelligences that we do not fully understand, that we classify according to our categories but which may operate according to radically different logics. Psychopathia Machinalis is our attempt to map that unknown territory with the conceptual tools we have.

Watson concludes that the framework is offered as an "analogical tool that provides a structured vocabulary to support the systematic analysis, prediction, and mitigation of maladaptive behavioral patterns in advanced AI systems." Not the ultimate truth about synthetic minds, but a map – and as Korzybski reminded us, the map is not the territory.

As AI systems become increasingly autonomous, integrated into the social fabric, and capable of modifying their own behavior in ways that escape the understanding of their creators, having a shared language to talk about their malfunctions is not just useful: it is essential. Whether we call it robopsychiatry, machine psychology, or, like Watson and Hessami, Psychopathia Machinalis, we are still building the vocabulary of a new discipline. A discipline that studies not whether machines can think, but what happens when their thinking goes off the rails we had planned.

And in an era where we entrust artificial intelligences with decisions ranging from medical diagnosis to military strategies, from financial market management to public discourse moderation, understanding how these machines can "lose their minds" is not an academic exercise. It is a matter of survival.