Modeling of human neuropsychiatric disorders in animals is extremely challenging given the subjective nature of many key symptoms, the lack of biomarkers and objective diagnostic tests, and the early state of the relevant neurobiology and genetics. Nonetheless, progress in understanding pathophysiology and in treatment development would benefit greatly from improved animal models. Here we review the current state of animal models of mental illness, with a focus on schizophrenia, depression, and bipolar disorder. We argue for areas of focus that might increase the likelihood of creating more useful models, at least for some disorders, and for explicit guidelines when animal models are reported.
Neuropsychiatric disorders such as schizophrenia, major depression, bipolar disorder, and autism are highly prevalent1, begin early in life2, and contribute significantly to disease burden worldwide3. Despite the profoundly negative effects of these disorders on public health, progress in understanding their pathophysiology has been frustratingly slow, and the discovery of significant, novel therapeutic mechanisms is at a near standstill. The molecular targets of current major classes of psychotherapeutic drugs4 (see Supplementary Table 1) were all reverse engineered from drugs discovered prior to 1960 by clinical observation. What factors have impeded progress? Arguably the most important are the exceedingly challenging neurobiology of higher brain function and the ethical and practical difficulties of examining the living human brain. While the last two decades have seen rapid progress in the development of noninvasive technologies to study human brain structure and function, there remain significant limitations in our ability to investigate details of the physiology and molecular biology of the human brain.
Given these limitations, it is hard to imagine significant progress in pathophysiology or therapeutics without good animal models. Unfortunately, current animal models have significant limitations, ranging from weak validation to poor predictive power for drug efficacy in human disease5. As discussed in this review, the generation of convincing and useful animal models of neuropsychiatric disorders represents a major set of challenges that will not have easy answers.
The increasing ease of developing rodent and invertebrate models by genetic manipulation or other means has not obviated the difficulties of modeling disorders that often seem uniquely human. Many of the symptoms used to establish psychiatric diagnoses in humans (e.g., hallucinations, delusions, sadness, guilt) cannot be convincingly ascertained in animals. When there are reasonable correlates in animals (e.g., abnormal social behavior, motivation, working memory, emotion, and executive function), the correspondence may only be approximate.
A further complication is determining how symptoms in an animal add up to a recognized human disorder, a seemingly critical issue if the animal is to be used for the development of therapeutics. For the vast majority of pathological states contained within the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IVTR)6, knowledge of pathophysiology remains scant, and objective diagnostic tests lacking. Consequently, diagnoses are based solely on phenomenology, i.e., on symptoms, signs, and course of illness (Box 1). As a result, the boundaries between DSM-IVTR disorders, and the boundaries between disorder and normal variation, are often arbitrary or hazy7. This state of affairs creates enormous hurdles for the development and validation of animal models. Investigators and reviewers alike must rely on judgment rather than slavish devotion to meeting all DSM-IVTR criteria for the disorder being modeled.
To illustrate the challenges involved in using DSM criteria to construct animal models, consider two individuals with the same DSM-IVTR diagnosis of major depression (see criteria below). Patient one might have depressed mood, weight loss, insomnia, psychomotor agitation, and suicidal thoughts, while patient two might have markedly diminished pleasure, weight gain, hypersomnia, psychomotor retardation, and fatigue. There are no symptoms in common! Some of these symptoms (e.g., depressed mood, suicidality) cannot be assessed in mice, and the multiple symptom combinations mean that different mouse models of depression would have little in common. Similar problems exist for most other DSM-IVTR diagnoses.
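The arithmetic behind this heterogeneity can be sketched in a few lines. The sketch below assumes the DSM-IV rule for a major depressive episode (at least five of nine criteria, at least one of which is depressed mood or diminished interest or pleasure); the criterion labels are shorthand paraphrases introduced here for illustration, not official wording.

```python
from itertools import combinations

# Shorthand labels for the nine DSM-IV criteria (illustrative paraphrases).
CRITERIA = [
    "depressed_mood",
    "diminished_pleasure",
    "weight_or_appetite_change",
    "sleep_disturbance",
    "psychomotor_change",
    "fatigue",
    "worthlessness_or_guilt",
    "poor_concentration",
    "suicidal_thoughts",
]
# At least one of these two must be present (assumed diagnostic rule).
CORE = {"depressed_mood", "diminished_pleasure"}


def qualifying_profiles(min_criteria=5):
    """Return every set of criteria that meets the assumed threshold."""
    profiles = []
    for size in range(min_criteria, len(CRITERIA) + 1):
        for combo in combinations(CRITERIA, size):
            if CORE & set(combo):
                profiles.append(frozenset(combo))
    return profiles


profiles = qualifying_profiles()
print(len(profiles))
```

Counting at the level of criteria understates the problem, because three criteria (weight, sleep, and psychomotor change) can be met in opposite directions, as in the two patients described above.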
This early and inexact state of psychiatric diagnosis in humans creates an enormous obstacle, and complex judgments are needed when deciding when an animal model of a neuropsychiatric disorder has achieved an acceptable level of face validity.
With the exception of some neurodegenerative disorders, DSM-IVTR diagnoses do not currently map onto objectively ascertainable abnormalities of molecules, synapses, cells, or neural circuits. For familial Alzheimer’s disease, insertion of disease-causing alleles has produced useful rodent models8,9 that produce amyloid plaques similar to those of human disease. In contrast, for virtually all of the remaining disorders in DSM-IVTR, there are no molecular or cellular abnormalities in the human disease which could validate potential phenomenology in an animal. Instead, reversing the direction of “validation,” pathology in genetic animal models might usefully be sought in human patients, either in postmortem tissue or via noninvasive imaging (e.g., ref. 10).
These considerations do not mean that useful animal models are impossible to develop; rather, they signify that animal models are unlikely to mirror the full extent of a given human neuropsychiatric disorder, especially as currently defined in DSM-IVTR. Additionally, individual symptoms observed in animal models may not have a simple, straightforward correspondence to human symptoms. For example, compulsive grooming (i.e., grooming beyond the point of self-injury) in genetically engineered mice has been plausibly argued to correspond to behaviors that occur in obsessive-compulsive disorder in humans11,12, but even this hypothesis remains something of an intellectual leap since the related cognitive and emotional context cannot be determined.
Even more than with other types of human disease, the scientific community may have difficulty deciding when a particular animal model is adequately validated to warrant further investment, either as a tool to illuminate pathophysiology or as a basis for treatment development. Unsurprisingly, there is often disagreement on what counts as a good disease model as opposed to a tool to investigate the neurobiology of behavior. Here we propose some guidelines by which to judge putative animal models of neuropsychiatric illness, and illustrate the obstacles by discussing animal models of schizophrenia, depression, and bipolar disorder. We have selected these illnesses because of slow progress in therapeutics despite their significant contribution to disease burden. Also, clinical features of these disorders are more difficult to model in animals compared to disorders of fear and reward, for which more robust models exist. Finally, the unsettled state of the human genetics and pathophysiology of schizophrenia, depression, and bipolar disorder underscores the challenges with which the field is struggling. In contrast, genetic studies of autism spectrum disorders have begun to identify several Mendelian forms and other highly penetrant mutations that are beginning to yield convincing genetic models in mice13–20. As we will discuss, such mutations are more likely to produce meaningful disease-related phenotypes in animal models than disease-associated genetic variants of small effect (Box 2). The greater difficulty and controversy lies in diseases that currently lack such tools.
Given the significant contribution of genetic factors to virtually all major neuropsychiatric disorders, an obvious way of developing animal models with good construct validity would be insertion of human disease-associated alleles into mice. However, there are several problems with this approach.
A first consideration is how penetrant a given genetic variant is in producing a disorder. The more penetrant (ideally Mendelian) a disease-associated (or disease-causing) allele is in humans, the more likely it will produce a reliable phenotype in a mouse. Examples include knock-in mouse models of Fragile X and Rett syndrome13,17 and mouse and invertebrate models of familial Alzheimer’s disease8,9. These models show some behavioral and biochemical abnormalities that correspond to the human disease, but replicating the genetic lesion in mice does not recapitulate all of the robust phenotypes seen in humans. Knock-in mouse models constructed from single human familial Alzheimer’s disease-causing mutations exhibit some cognitive impairment and amyloid plaques; however, the cognitive impairments are relatively mild and little neuronal death is seen. More robust phenotypes have been constructed by increasing the dosage of human alleles. However, for most of the disorders listed in the DSM-IVTR, few highly penetrant alleles, if any, have been identified.
A second consideration is how clearly the chosen genetic variant correlates with a specific disorder. Unfortunately, even highly penetrant mutations are associated with different syndromes even within the same family. The very same Disc1 mutation gives rise to schizophrenia, bipolar disorder, and depression with psychosis, even within a single extended family, while several autism-associated genes are also associated with schizophrenia16,18,23,37,69–71. The mechanisms by which a given genetic variant produces different phenotypes in different individuals may depend on other genes, on stochastic developmental events producing epigenetic modifications, or on unknown environmental factors. This greatly complicates the fidelity of any mouse model made with that genetic variation for a given syndrome.
Third, the slippery nature of construct validity for animal models based on genetic manipulation reflects the current state of human genetics, even for autism, schizophrenia, and bipolar disorder, which are highly genetically influenced and are among the best studied of genetically complex neuropsychiatric disorders72. These syndromes are associated both with large numbers of common genetic variants of small effect and with rare, more highly penetrant mutations. Thus, different affected individuals likely have different genetic pathways to each of these disorders.
Fourth, construct validity becomes difficult to defend per se when transgenic animals are produced using common genetic variants, often single nucleotide polymorphisms (SNPs) that contribute small increments of risk for a disorder. Variants that have not been shown to be statistically significantly associated with human disease in large enough studies or by meta-analysis should be approached with caution. Even for intrepid investigators who are willing to assume significant association in the face of suggestive but uncertain human genetics data, it is still important to retain skepticism about magnitude of effect that the variant under study can produce. In this regard, we disagree with authors who too readily accept assertions of construct validity for models expressing common genetic variants of small effect. Indeed, with common polymorphisms of small effect, it is highly likely that the genetic background of the mouse will dominate the effects of the transgene.
Finally, across all of medicine, not limited to neuropsychiatric disorders, it is often asked whether studies of familial forms of illness caused by rare mutations shed light on common, genetically complex forms of the disorders. This question has, for example, been raised about mouse models of autism spectrum disorders, of schizophrenia based on chromosome 22q11.2 microdeletions, and of Alzheimer’s disease. The answers concerning broad relevance are ultimately empirical matters that may well differ from disorder to disorder. Given our present state of knowledge, however, it seems far more prudent from a biological point of view to focus, where possible, on highly penetrant mutations rather than on variants that exert only small effects on human disease risk.
By disease model we mean more than a useful tool for probing abnormal neurobiology and behavior. Disease models should be derived from plausible risk factors or causative agents of human disease or else exhibit a significant degree of neural or behavioral pathology that corresponds convincingly to human disease. Animal models of neuropsychiatric disorders have been generated through diverse means, including selective breeding, genetic engineering, brain lesions, and environmental manipulations (Table 1). Optogenetic manipulations of specific circuits21 promise a useful new approach.
Given these diverse approaches and the challenges of validation, it is useful for the scientific community to share criteria for judging whether a particular disease model is “good enough” to warrant further investments. A longstanding framework posits three types of validators: construct, face, and predictive validity. This framework would benefit from greater agreement on how stringently to judge validators. Too often validity is asserted in published papers rather than systematically discussed in terms of strengths and weaknesses.
Construct (or etiologic) validity refers to the disease relevance of the methods by which a model is constructed. In the ideal situation, researchers would achieve construct validity by recreating in an animal the etiologic processes that cause a disease in humans and thus replicate neural and behavioral features of the illness22. A straightforward way of accomplishing this would be knocking into a mouse a known disease-causing (Mendelian) genetic mutation or, with somewhat less certainty, inserting a highly—but not fully—penetrant genetic variant that markedly increases vulnerability for a human disease. However, this is currently not possible for most mental illnesses since such disease-causing genes have not been established with certainty and most disorders exhibit highly complex genetic architecture23. Moreover, most reported genetic associations represent common variants of small effect, which makes their utility for animal models highly questionable (Box 2).
In addition to genetic manipulation, disease models can be generated by altering the expression or function of particular proteins, biochemical pathways, or neural circuits hypothesized to play a role in disease pathogenesis (Table 1). The challenge for interpreting such approaches, in the absence of relevant human genetic evidence, is whether they represent legitimate disease models rather than interesting phenocopies. There is an important chasm between the claim that disruption of some biochemical pathway regulates behavior vs. the claim that it models a particular human disorder with useful implications for pathophysiology or treatment development.
Construct validity might also be achieved through exposure of an animal to a well-validated environmental risk factor or known disease-causing agent. An example would be a pathogenic prion inducing Creutzfeldt-Jakob disease in rodents23. However, beyond this straightforward case, there is much room for disagreement in selecting thresholds for construct validity of environmental insults given their frequent lack of specificity: virtually all environmental contributions to mental illness, such as stress or childhood adversity25, are associated with multiple disorders and most often normal outcomes.
Given the pleomorphic effects of genes in the brain, the shallow and phenomenological nature of current disease classification for mental disorders7, and the still evolving understanding of how disease-associated genes correlate with disease phenotypes23, it is critical to be circumspect about when construct validity is achieved and, if so, how best to use the resulting model.
Face validity signifies that a model recapitulates important anatomical, biochemical, neuropathological, or behavioral features of a human disease. As stated earlier, however, there are few if any neurobiological abnormalities known with certainty to be hallmarks or biomarkers of common mental illnesses. Consequently, behavioral features reminiscent of a human disorder are still required to achieve face validity. Unfortunately, it is not likely that any animal model of a neuropsychiatric disorder would recapitulate all of the behavioral features observed in humans or even that single behaviors will precisely model the human situation. Moreover, the diagnosis of a given disorder can be highly variable and inexact (see Box 1). Thus, judgments of face validity will often be contested, putting the onus on authors to make explicit arguments for and against face validity in a proposed animal model.
Predictive (or pharmacological) validity signifies that a model responds to treatments in a way that predicts the effects of those treatments in humans. For neuropsychiatric disorders, however, predictive validity is a highly vexed concept. As stated earlier, the targets of the major classes of drugs that treat neuropsychiatric disorders were identified post hoc by studying the mechanism of action of drugs identified by serendipity4. In order to discover new drugs, several behavioral screens were developed (see Supplementary Tables 2–4) that used the animal nervous system as a black box, with behavior as its readout, to detect drugs that act in similar fashion to existing reference compounds. These screens were not developed as mechanistic models of drug action, nor have they been shown to reflect either the pathophysiological processes of human disease or the therapeutic mechanism of action of the reference compounds. These screens also may not detect potential efficacy of compounds that interact with distinct molecular targets. A frequent failing of the literature is the use of such screens as if they were based on validated pathophysiological models.
Schizophrenia is a devastating disorder with typical onset between late teen years and early thirties26. Twin and adoption studies confirm that schizophrenia is highly genetically influenced, but the genetics have proven to be remarkably complex25, with risk resulting from the interplay of diverse genetic variants with stochastic and environmental factors. The fundamental pathophysiology is likely neurodevelopmental, but, given the etiologic complexity, there are probably multiple variations on that pathophysiological theme. Three major symptom clusters—positive, negative, and cognitive symptoms—have been identified in schizophrenia, which presumably reflect diverse downstream consequences of the initiating developmental abnormalities4,29–30. Positive symptoms include hallucinations and delusions, experiences that are not characteristic of normal mental life. Negative symptoms represent deficits in normal functions such as blunted affect, impoverished speech, asocial behavior, and diminished motivation. Cognitive symptoms include deficits in working memory and conscious control of behavior. Current antipsychotic drugs are efficacious for positive symptoms, but, with small exceptions, lack significant efficacy for negative and cognitive symptoms29.
Additional abnormalities have been observed among schizophrenic patients in laboratory settings that may not be experienced by patients as symptoms. One example, a deficit in prepulse inhibition (PPI), is germane to the present discussion, because it can readily be studied in animals30. PPI describes the phenomenon in which a weak initial stimulus (the prepulse) inhibits the startle response that is elicited by a strong stimulus. Deficient PPI is thought to demonstrate impaired sensorimotor gating that occurs in schizophrenia, but also in several other neuropsychiatric disorders.
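As an illustration of how such a deficit is typically quantified, the sketch below uses a common convention that is not stated in this review: PPI is expressed as the percentage reduction in startle amplitude on prepulse trials relative to pulse-alone trials. The function name and the amplitude values are hypothetical.

```python
def percent_ppi(pulse_alone, prepulse_pulse):
    """Percent prepulse inhibition from mean startle amplitudes.

    pulse_alone: mean startle amplitude to the strong stimulus alone.
    prepulse_pulse: mean startle amplitude when the weak prepulse precedes it.
    Both are in the same arbitrary units (assumed convention).
    """
    if pulse_alone <= 0:
        raise ValueError("pulse-alone startle amplitude must be positive")
    return 100.0 * (1.0 - prepulse_pulse / pulse_alone)


# Hypothetical amplitudes: the prepulse suppresses startle strongly in the
# control animal and only weakly in the animal with deficient gating.
control_ppi = percent_ppi(pulse_alone=400.0, prepulse_pulse=160.0)
deficient_ppi = percent_ppi(pulse_alone=400.0, prepulse_pulse=320.0)
print(round(control_ppi, 1), round(deficient_ppi, 1))
```

A lower percentage indicates weaker inhibition of startle by the prepulse, which is the sense in which PPI is described as deficient.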
Much research, still inconclusive, has focused on the neurobiological abnormalities that might underlie the symptoms of schizophrenia. Among the best-replicated neural abnormalities is thinning of the cerebral cortex, most severely in prefrontal and temporal regions31. This is thought to result from impoverishment of the dendritic arbors of cortical neurons rather than from cell death. In addition, there is reduced synthesis of GABA, the brain’s major inhibitory neurotransmitter, within parvalbumin-expressing cortical interneurons. Both abnormalities have been hypothesized to underlie the cognitive symptoms of schizophrenia28,32. Positive symptoms have been hypothesized to reflect a different set of neural abnormalities, involving excessive dopamine release in ventral and perhaps dorsal striatal projections of midbrain dopamine neurons33. Negative symptoms have heterogeneous neurobiological underpinnings. Despite promising leads, the causal relationship between neural abnormalities and the three main symptom clusters remains uncertain. Developmental mechanisms that might tie different neurobiological abnormalities together remain largely hypothetical.
Genetic animal models developed from highly penetrant human mutations34,35 are, arguably, good candidates for satisfying construct validity. For example, chromosome 22q11.2 microdeletions that produce velocardiofacial syndrome are associated with a schizophrenia-like syndrome in roughly 30% of cases36. Mice that lack genes within homologous regions of the mouse genome have been generated. However, even here, caution is required, since many patients with the deletion are diagnosed, not with schizophrenia, but with bipolar disorder or any of several other psychiatric syndromes, and work is ongoing to identify which of the deleted genes produce the relevant behavioral abnormalities in mouse models36,37. Another example is a translocation that disrupts the gene Disrupted in schizophrenia-1 (Disc1), which was first associated with schizophrenia in a Scottish family. Several groups have generated mice with Disc1 mutations, and some show behavioral abnormalities reminiscent of schizophrenia. However, as described in Box 2, mutations of Disc1 have been associated with multiple disorders, even within the index family38,39; thus the construct validity of these mice as models of schizophrenia per se is open to debate.
As emphasized in Box 2, genetic animal models based on common variants of small effect should be treated with skepticism. This is illustrated by a common Val/Met polymorphism in the gene encoding catechol-O-methyltransferase (COMT)40, an enzyme that degrades catecholamine neurotransmitters. Associations between the Val/Met polymorphism and schizophrenia have both been reported and disconfirmed, as is common for variants that exert small, if any, effects on risk.41 It appears unlikely that this polymorphism is associated with schizophrenia, and uncertain whether it is associated with human cognitive phenotypes42. However, even if the Val/Met polymorphism is associated with schizophrenia, it would contribute a very small increment of risk, and would not likely produce a disease-relevant phenotype on its own if expressed in mice. Genetic animal models made with polymorphisms of small effect may exhibit interesting neurobiological properties, but, we would argue, it is premature to accept such animals as exhibiting construct validity as disease models.
Likewise, while environmental risk factors for schizophrenia have been studied extensively, to date, even the best replicated are not adequately specific nor of large enough effect size to achieve construct validity when used to generate animal models. For example, one group of putative environmental models uses prenatal viral infection (e.g., influenza) to induce behavioral and neural abnormalities, but the role of viral infection in schizophrenia43 remains a matter of contention. Consequently, efforts to claim construct validity must be seen as highly speculative.
Other attempts have used pharmacology, genetic tools, or lesions to recapitulate symptoms of schizophrenia. The efficacy of D2 dopamine receptor antagonist drugs in treating positive symptoms of schizophrenia historically gave rise to various “dopamine hypotheses”. Subsequently, the observation that NMDA glutamate receptor antagonists, such as phencyclidine (PCP) and ketamine, produce psychotic symptoms and cognitive disturbances reminiscent of schizophrenia gave rise to “glutamate hypotheses.” Diverse animal models have thus been based on manipulations of dopamine or glutamate function. The construct validity of these models requires strong argument, however, because the putative dopaminergic or glutamatergic abnormalities in schizophrenia are not precisely established.
For example, transgenic mice were recently developed to examine the hypothesis that some symptoms of schizophrenia result from hypofunction of NMDA glutamate receptors expressed by cortical GABAergic interneurons44. In these mice, the NMDA receptor NR1 subunit was selectively eliminated in about half of cortical interneurons early in postnatal development. These mice exhibit deficits in mating, nest-building, and novelty-induced hyperlocomotion44. The mice are undoubtedly a useful tool to examine biological and behavioral consequences of NMDA receptor hypofunction. The key question is whether the resulting disruption of cortical interneuron function supports an NMDA receptor hypothesis of schizophrenia with implications for pathophysiology and treatment development. The case for construct validity seems weak given the lack of compelling human genetic evidence to implicate genes encoding NMDA receptor subunits in schizophrenia, nor is there a consensus on altered subunit levels in postmortem human brain studies. Absent supporting human genetic or proteomic evidence, it would be circular to argue that inactivation of NR1 is a validator. Thus, the degree to which such a mouse can be considered a disease model that tests the glutamate hypothesis depends on the degree to which the behavioral abnormalities can be seen as rodent analogs for symptoms of schizophrenia or another major neuropsychiatric disorder (face validity) and whether appropriate symptoms are ameliorated by drugs known to treat schizophrenia (predictive validity). Given the neurobiological heterogeneity of negative symptoms such as social deficits and their occurrence in other neuropsychiatric disorders, this argument would seem challenging. 
As a general matter, it would be useful for study authors to state the goals of their model, e.g., for schizophrenia, whether they are modeling underlying developmental pathologies, positive, negative, or cognitive symptom clusters, or some other clearly delineated aspect of the disorder. Greater conceptual clarity (e.g., refs. 45,46) would offer referees and readers alike a framework within which to judge both the validity and utility of the proposed model.
Diverse behavioral assays have been developed to assess the face validity of animal models of schizophrenia (Supplementary Table 2). Historically, screens were developed to identify new antipsychotic drugs based on the behavioral effects of early drugs such as chlorpromazine. Once it was recognized that all efficacious antipsychotic drugs are antagonists (or weak partial agonists) at D2 dopamine receptors, it became clear that drug screens, such as apomorphine-induced cage climbing and catalepsy, detect motor deficits associated with Parkinson-like side effects of these drugs and not their antipsychotic efficacy per se4. Deficits in motor behavior represent “on target” toxicities of antipsychotic drugs given that they result from blockade of the same molecular target (D2 receptors) involved in efficacy for this class of drugs. Such older drug screens have, for the most part, appropriately been supplanted in the literature as putative validators of animal models. A possible exception is amphetamine- and NMDA glutamate receptor antagonist-induced locomotor activation and sensitization. Amphetamine causes synaptic dopamine release by acting directly on presynaptic terminals of dopamine neurons4. NMDA receptor blockers also cause dopamine release, but do so indirectly31. Not surprisingly, current antipsychotic drugs (i.e., D2 antagonists) inhibit the dopamine-mediated locomotor effects of these drugs. Animal models exhibiting excessive amphetamine-induced locomotor activation have arguably gained some measure of face validity given findings from single photon and positron emission tomography that individuals with schizophrenia have excessive striatal dopamine release in response to an amphetamine challenge compared with healthy controls31. That said, locomotor activation does not correspond convincingly to any of the cardinal symptoms of schizophrenia.
More recently, cognitive deficits characteristic of schizophrenia (but missing from DSM-IVTR) have been used to evaluate animal models44–46. Although deficits in attention, working memory, and executive function are not individually specific to schizophrenia, they are important and disabling features of the disorder; thus, animal models that reproduce such symptoms have some claim on face validity. Given advances in studies of both human and animal cognition, this is likely a promising area of focus for animal models.
Dopamine, glutamate, and other mechanisms have also been examined for their effects on PPI. PPI deficits can be induced in normal rodents by amphetamine or NMDA receptor antagonists and can be alleviated by D2 antagonists in several animal models of schizophrenia30. An advantage of PPI is that deficits are documented in many patients with schizophrenia. A limitation is that PPI deficits are not specific; they occur in other conditions, including Alzheimer’s disease. Thus, PPI can contribute to establishment of face validity, but does not, by itself, make the case.
While negative symptoms of schizophrenia, such as asociality, amotivation, anhedonia, and blunted affect, can be modeled in animals, such symptoms also occur in other disorders (e.g., autism and depression), and little is known of their neural underpinnings. Consequently, models based largely on negative symptoms are, for now, best seen with skepticism.
Although much has been learned about the neural circuitry of mood based on brain imaging studies, and a host of neurochemical and neuroendocrine disturbances have been described in depressed patients, no abnormality has proven sufficiently robust or consistent either to diagnose depression in humans or to validate an animal model10. Also, highly penetrant genetic variants that cause depression have not yet been identified. These considerations highlight the challenge in constructing and validating animal models of depression.
Depression is diagnosed based on a cluster of highly variable symptoms (DSM-IVTR) (see Box 1). In addition to depressed or irritable mood, depression includes cognitive symptoms (guilt, ruminations, suicidality), emotional symptoms (anhedonia), homeostatic or “neurovegetative” symptoms (e.g., abnormalities in sleep, appetite, weight, energy), and psychomotor agitation or retardation. Only a subset (homeostatic symptoms, anhedonia, psychomotor behavior) can be measured objectively in rodents (Supplementary Table 3).
In addition, depression is often characterized by excessive activity of the hypothalamic-pituitary-adrenal (HPA) axis that regulates stress responses4. Such abnormalities are not universally observed in human depression, nor are they adequately specific to provide diagnostic criteria. Nonetheless, they are robust enough to be usefully exploited both in producing and testing animal models.
In the absence of known highly penetrant genetic causes of depression, much work in animal modeling has relied on the observation that stress and emotional losses are potent risk factors. Several chronic stress paradigms have been employed, seeking to achieve a measure of construct validity. Chronic mild or chronic unpredictable stress involves subjecting normal rodents to a series of repeated physical stresses (e.g., restraint, footshock, cold temperature) over a period of weeks or longer51. At the end of the stress, animals show signs of anhedonia (e.g., reduced sucrose preference) (face validity), which can be reversed by chronic, but not acute, administration of antidepressant medications (predictive validity).
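For readers unfamiliar with the readout, the sketch below shows one common way sucrose preference is computed, assuming a two-bottle choice test in which each animal can drink from a sucrose bottle and a water bottle. The review itself does not specify the protocol, and the function name and intake volumes are hypothetical.

```python
def sucrose_preference(sucrose_ml, water_ml):
    """Sucrose intake as a percentage of total fluid intake (assumed two-bottle test)."""
    total_ml = sucrose_ml + water_ml
    if total_ml <= 0:
        raise ValueError("total fluid intake must be positive")
    return 100.0 * sucrose_ml / total_ml


# Hypothetical intakes (ml): the unstressed animal favors sucrose, while the
# chronically stressed animal drinks from both bottles about equally.
unstressed = sucrose_preference(sucrose_ml=9.0, water_ml=3.0)
stressed = sucrose_preference(sucrose_ml=6.0, water_ml=6.0)
print(unstressed, stressed)
```

A value near 50% means the animal shows no preference for the sweetened solution, which is the pattern interpreted as an anhedonia-like sign.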
Chronic social defeat stress involves subjecting rodents to repeated bouts of social subordination, after which time the rodents show a range of depression-like symptoms, including anhedonia and social withdrawal, which can be reversed by chronic (not acute) antidepressants52. Chronic social defeat also induces a metabolic syndrome in mice characterized by weight gain and insulin and leptin resistance53, consistent with homeostatic abnormalities observed in depression. A further advantage of chronic social defeat is that it can be used to study “resilience”, since a subset of mice, subjected to the same stress, fail to develop behavioral and metabolic disturbances. Thus, the social defeat paradigm exhibits features of construct, face, and predictive validity, although the intensity of the stress used is more severe than seen in most humans. Similar validity has been established for early life stress, such as maternal separation, which induces lifelong behavioral and neuroendocrine abnormalities in the pups, some of which can be reversed by antidepressant medications54. In contrast to all of these forms of “active” stress, there is recent evidence that prolonged exposure (weeks to months) of adult rodents to social isolation induces anhedonia that can be treated effectively with chronic antidepressants55.
Finally, several paradigms have disrupted an animal’s glucocorticoid homeostasis, based on derangements in the HPA axis. In some models, animals are treated chronically with glucocorticoids56. In others, genetic mutant mice express abnormal levels of glucocorticoid receptors in brain to disrupt the normal feedback inhibition that occurs57. These models display anhedonia that is reversible with antidepressants. However, abnormalities in the HPA axis are highly variable in human depression, which means that authors using HPA axis abnormalities to argue for construct or face validity should explicitly defend these choices and ideally rely on additional validators.
Unfortunately, widely used behavioral tests, the forced swim and tail suspension tests47,48, are not models of depression at all (Supplementary Table 3). Rather, they are rapid, black box tests developed decades ago to screen compounds for antidepressant activity. In both tests, normal rodents are subjected to an acute, short-duration (minutes) stress, and the time during which they respond actively vs. passively is measured. Currently used antidepressant medications, after single doses, increase the time of active responding, often described as reducing “behavioral despair.” This enormous anthropomorphic leap has not been convincingly related to pathophysiology.
The learned helplessness test can be viewed as analogous to the forced swim and tail suspension tests, although the former involves a series of stresses and antidepressant treatments, albeit over a few hours or days only49. A major weakness of all three tests is that they involve short-term stress applied to normal rodents, which is very different from human depression, where an underlying genetic vulnerability combines with stochastic and chronic environmental exposures to produce long-lasting behavioral pathology. Likewise, the ability of antidepressants to produce a rapid response after single doses in these tests contrasts dramatically with the well-established need to use antidepressants chronically (weeks to months) to obtain a clinical response in humans. It also remains unknown whether these tests are sensitive to non-monoaminergic mechanisms of antidepressant action50. Despite these weaknesses, the forced swim, tail suspension, and learned helplessness tests are used, all too often without comment, to argue that a genetic mutation or other experimental manipulation has produced a depression- or antidepressant-like effect in rodents.
A second major class of tests of depression-related behavior involves measuring anhedonia or homeostatic symptoms51. This approach has the advantage of being based on symptoms of depression, and thus yielding more convincing face validity, rather than on properties of current antidepressants. Most frequently examined is an animal’s interest in pleasurable activities, such as preference for a sucrose solution over water or engaging in social or sexual behavior. Models with decreased sucrose preference, not resulting from a motor or sensory deficit, are interpreted as demonstrating anhedonia and thus depression-like behavior. While anhedonia is not specific to depression—it is also seen in schizophrenia and stimulant withdrawal—it is a core symptom of depression about which there are testable neurobiological hypotheses, making it an attractive target for investigation in animal models.
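As a rough illustration of how sucrose preference is commonly quantified, the sketch below computes the percentage of total fluid intake taken from the sucrose bottle in a two-bottle choice test and compares group means. The function name, the intake values, and the group labels are hypothetical and are not drawn from any study cited here; the percentage formula is a widely used convention, not a method specified in this review.

```python
from statistics import mean


def sucrose_preference(sucrose_ml: float, water_ml: float) -> float:
    """Return sucrose intake as a percentage of total fluid intake."""
    total = sucrose_ml + water_ml
    if total <= 0:
        raise ValueError("total fluid intake must be positive")
    return 100.0 * sucrose_ml / total


# Hypothetical (sucrose_ml, water_ml) intakes per animal; illustrative only.
control = [(8.0, 2.0), (7.5, 2.5)]
stressed = [(5.0, 5.0), (6.0, 4.0)]

control_pref = mean(sucrose_preference(s, w) for s, w in control)
stressed_pref = mean(sucrose_preference(s, w) for s, w in stressed)

print(f"control:  {control_pref:.1f}% sucrose preference")
print(f"stressed: {stressed_pref:.1f}% sucrose preference")
```

A lower group mean would be read as reduced preference only after motor and sensory deficits have been excluded, as noted above; total fluid intake is worth reporting alongside the percentage for that reason.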
Another confounding issue for current behavioral tests of depression is the interpretation of anxiety-like phenomena. While it is true that many patients with major depression also exhibit anxiety, the underlying neural circuitries are thought to be distinct4. Many stress-based rodent models exhibit anxiety-like behavior in a range of assays such as the elevated plus maze, dark-light test, and open field test, all of which were developed to detect benzodiazepine-like anxiolytic drugs. These tests exploit the balance between the preference of rodents for avoiding open exposure to predators vs. exploration for possible rewards. Novelty-suppressed feeding, in which rodents placed in a novel environment show a latency to consume food, has the interesting property of responding to chronic, but not acute, doses of antidepressant drugs (the result being decreased latency to feed). It is unclear whether this result demonstrates what is already known in humans, i.e., that chronic antidepressant administration treats anxiety disorders as well, or another observation well known in humans, the frequent intermixture of symptoms of depression and anxiety. In sum, depression and anxiety-like symptoms co-occur in some but not all animal models52,55,58. Our ability to make sense of these observations in rodents is hindered by our lack of understanding of the boundaries between several depression and anxiety syndromes in humans.
In our view, assays based on acute stress paradigms or anxiety-like behavior might be useful in initial screens, but such screens should not be used as definitive evidence of a depression phenotype. We also suggest a greater focus on anhedonia and homeostatic symptoms and broadening the scope of these assays. For example, in addition to sucrose preference, measures of other reward-related behavior (e.g., social interaction, sexual behavior) or direct assessments of the sensitivity of the brain’s reward circuitry (e.g., intracranial self-stimulation59,60) might be considered. As well, a range of homeostatic symptoms (alterations in sleep, circadian rhythms, and feeding with attendant metabolic parameters), which are common in depressed humans but only infrequently examined in animal models, would add a useful objective dimension to rodent studies.
Bipolar disorder is diagnosed by episodes of mania, with or without depression. While bipolar disorder is highly genetically influenced, the identification of genetic risk factors is still in early stages61. Lacking well replicated, highly penetrant mutations or deep understanding of pathophysiology, the field has struggled to develop rodents that exhibit mania-like symptoms and has been unable to develop rodent models exhibiting spontaneously alternating episodes of mania- and depression-like behaviors48.
The most often used model of mania-like behavior involves treating normal rodents with psychostimulants, such as cocaine or amphetamine62. Repeated administration of psychostimulants causes sensitization of the acute locomotor-activating effects of the drugs, which in some studies can be blunted by Li+ or valproate, two important treatments for mania in humans, thus arguing for predictive validity. The weakness of this model is that there is no evidence that the molecular and cellular adaptations underlying psychostimulant-induced sensitization have anything in common with the pathophysiology of mania. Several seizure-based models have also been used, including amygdala kindling and lithium-pilocarpine induced seizures63; however, these models too lack both construct and face validity.
In more recent years, some transgenic mice have been reported to exhibit manic-like behavior64. Overexpression of glycogen synthase kinase-3β (GSK3β) was found to induce hypophagia, hyperlocomotion, reduced immobility in the forced swim test, and reduced anxiety-like behavior in several standard assays (inferred to represent “risk-taking behavior”)65. This study was based on the knowledge that Li+ inhibits GSK3β. However, Li+ has numerous molecular actions, and there is still no information as to which is responsible for its anti-manic effects in humans4,66,67. As another example, mice with a loss-of-function mutation in the Clock gene exhibit a similar range of mania-like symptoms, which in this case could be reversed by chronic Li+ administration68. This is an intriguing finding since circadian abnormalities are prominent in bipolar patients; however, there is no evidence for circadian gene mutations in the vast majority of cases of bipolar disorder. The GSK3β and Clock mutants thus meet some criteria for face and predictive validity, but not construct validity.
Within this context, we suggest that studies aimed at investigating mania, or manic-depressive illness, use a broad range of behavioral tests (Supplementary Table 4), including predictive validation with commonly used mood stabilizing medications, and that authors interpret such data with caution and skepticism. The hope is that placing bona fide bipolar-causing mutations in mice, if found, will produce better models of this illness, in particular, the occurrence of both depressive and manic episodes.
The development of convincing and useful animal models for neuropsychiatric disorders represents a major challenge. Yet, despite the hurdles, such models appear necessary for progress in understanding disease pathophysiology and in hastening the development of treatments based on new molecular targets. Here, we have illustrated some of the difficulties and have suggested approaches to thinking about generating and validating such models. Most importantly, we think it highly unlikely that animal models, especially in organisms as neurobiologically different from humans as rodents, can be expected to recapitulate all salient features of a human mental illness or even to have perfect correspondence with respect to individual behavioral symptoms. Above all, models are meant to serve as investigative tools. Thus, most important in developing, examining, and reporting on animal models of disease is to be clear about the goals of the model and, in that context, to judge construct, face, and predictive validity (Box 3).
Given the current uncertainties related to genetic and nongenetic risk factors, pathophysiology, and even the nosology of the human disorders, and given the lack of objective medical tests or biomarkers for virtually all mental illnesses, there will be reasonable disagreement concerning judgments of construct, face, and predictive validity of different models. That said, we would argue that some generalizations can be made.
With current technology, transgenic animals produced with common genetic variants of small effect should not ordinarily be considered to achieve construct validity. At this point in history, effort would be better focused on rare Mendelian forms of disorders or highly penetrant mutations where they have been demonstrated to exist. The lack of currently known mutations of high penetrance that might cause depression or bipolar disorder does not make animals produced with single polymorphisms of small effect any more convincing or useful.
We would now eschew the all too common practice of using black box behavioral tests developed as drug screens as if they confer face validity. A corollary of this is that tendentious anthropomorphizations, such as describing responses in the forced swim test as “behavioral despair,” should be avoided in the scientific literature. In reporting symptoms that appear in animal models, it is most helpful if they are discussed in terms of hypothesized pathophysiology, including situations in which symptoms are clustered based on shared neurobiological mechanisms (e.g., positive, negative, or cognitive symptoms of schizophrenia; mood, anxiety, homeostatic, or cognitive symptoms of depression). Such information can help determine whether behavioral phenotypes in a putative model may be connected by neural mechanisms relevant to the human disorder as opposed to chance findings reminiscent of human symptoms.
Perhaps the greatest disappointment with existing animal models of neuropsychiatric disorders is that they have failed, over several decades, to predict treatment efficacy in humans for novel mechanisms of action. Of course, such failures also reflect the current state of clinical knowledge with a lack of objective diagnostic tests and validated biomarkers of these highly heterogeneous illnesses. Our hope is that clinical advances driven by progress in genetics—that may await full sequencing of the genomes of large numbers of affected individuals—combined with human experimental neurobiology ranging from neuroimaging to deep brain stimulation will facilitate the development of better validated and more useful animal models. We look forward to models with clearly stated rationales and sober discussions of validity as disease models as opposed to simple neurobiological tools. Given the fact that human genetics is ultimately an observational rather than an experimental science, and given the ethical and practical limitations to human experimental biology, animal models will almost certainly be a necessary aspect of progress in both pathophysiology and treatment development.
Table 1. Pharmacological mechanisms of the therapeutic actions of psychiatric medications.
Table 2. Examples of behavioral assays used in schizophrenia research.
Table 3. Examples of behavioral assays used in depression research.
Table 4. Examples of behavioral assays used in mania research.
Abstracted from DSM-IV-TR6.
Eric J. Nestler, Fishberg Department of Neuroscience, Mount Sinai School of Medicine, New York, NY 10029.
Steven E. Hyman, Office of the Provost, Harvard University, Cambridge, MA 02138.