Animal models of human diseases are in widespread use for biomedical research. Mouse models with a mutation in a single gene or multiple genes are excellent research tools for understanding the role of a specific gene in the etiology of a human genetic disease. Ideally, the mouse phenotypes will recapitulate the human phenotypes exactly. However, exact matches are rare, particularly in mouse models of neuropsychiatric disorders. This article summarizes the current strategies for optimizing the validity of a mouse model of a human brain dysfunction. We address the common question raised by molecular geneticists and clinical researchers in psychiatry: what is a “good enough” mouse model?
As molecular geneticists generate mutant models of human genetic diseases, a host of methodological questions arise. What are the criteria necessary to define the model organism? Which assays are most appropriate for phenotyping the disease model? How many tests are necessary, how many replications must be conducted, and which controls are essential? In the case of neuropsychiatric disorders, which behavioral assays are sufficiently analogous to the behavioral symptoms of the human syndrome? This overview discusses the basic concepts inherent in phenotyping animal models of human neuropsychiatric disorders.
Three criteria are commonly used to validate an animal model. (1) Construct validity incorporates a conceptual analogy to the cause of the human disease. Mutant mice with a targeted mutation in a gene implicated in a neuropsychiatric disorder have reasonable construct validity for that inactivation or polymorphism of the human gene. Neuroanatomical lesions, prenatal drug exposures, and environmental toxins offer other examples of putative causes of human diseases that can be replicated in animal models. For example, a mouse model of schizophrenia could test the hypothesis that the gene COMT confers susceptibility to schizophrenia by knocking out the COMT gene in the mouse genome [Babovic et al., 2007; O’Tuathaigh et al., 2007], or could evaluate a knockin of the humanized DISC1 polymorphism found in some schizophrenic patients [Pletnikov et al., 2008]. (2) Face validity incorporates a conceptual analogy to the symptoms of the human disease. Behavioral symptoms, neuroanatomical pathology, neurophysiological responses, and neurochemical abnormalities are examples of disease components or endophenotypes that can be modeled in animals. Endophenotypes are single behavioral, anatomical, biochemical, and neurophysiological markers for a given disease. The temporal progression of a neurodevelopmental or neurodegenerative disease is approximated in the animal model by repeating assays to generate a longitudinal profile at appropriate ages. For example, autism is diagnosed by three behavioral criteria, in which aberrant reciprocal social interaction is the primary diagnostic symptom. Our automated three chambered social approach task assays aspects of sociability in mice that are most relevant to the first diagnostic symptom of autism, and can be used repeatedly in the same animals for longitudinal analyses of neurodevelopmental models [Moy et al., 2004; Nadler et al., 2004; Crawley, 2007a; Moy et al., 2007; Yang et al., 2007; McFarlane et al., 2008]. 
(3) Predictive validity incorporates specificity of responses to treatments that are effective in the human disease. A specific class of drugs that ameliorates the human symptoms should reverse the traits in the animal model. Classes of drugs that are ineffective in the human syndrome must similarly be ineffective in the animal model. For example, rodent models of depression rely on antidepressant drug reversal of immobility in the tail suspension and Porsolt forced swim tasks, which involve inescapable stressors [Porsolt et al., 1977, 1978a, 1978b; Steru et al., 1985; Detke et al., 1995; Cryan and Mombereau, 2004; Crowley et al., 2005].
Two major goals of animal models are (1) testing hypotheses about the mechanisms underlying the disease, and (2) translational evaluation of pharmacological, behavioral, and other treatments for the disease. The more similarities in construct, face, and predictive validity between the animal model and the human disease, the stronger the model, and the more useful it will be for meeting these two goals. Further criteria include quantitative measures that are amenable to standard statistical analyses, methodologies that can be readily applied by many laboratories, and robust traits that are easily detectable above background variability. More importantly, results will have to be reproducible in replications across cohorts of animals in the same laboratory, and in different laboratories across geographic locations. A highly valid behavioral phenotype of a targeted gene mutation must replicate in three independent cohorts of mice from several generations of the mutant mouse line, and in the same line tested in other laboratories.
Targeted gene mutation technology has provided an enormous contribution to understanding the role of genes in behavior. Transgenic mice, which may have a new gene added or an existing gene overexpressed, and knockout mice, in which there is a loss of function of a gene through deletion or mutation such that the protein is not correctly synthesized, have been developed for many neurotransmitters, receptors, second messengers, transporters, and transcription factors. Conditional and inducible promoters, knock-ins of humanized gene polymorphisms, and microinjections of viral vectors containing genes and RNA interference sequences into neuroanatomical locations provide further elegant research tools. Results from these various categories of mutant mouse models are leading to a better understanding of the neurological underpinnings of behavior, and the proximal causes of human genetic disorders.
Behavioral, electrophysiological, neuroanatomical, and pathological phenotyping assays, conducted in a rigorous and comprehensive manner, are central to determining the functional outcomes of genetic manipulations in the nervous system. While the present discussion focuses on behavioral phenotyping, convergence of findings from multiple disciplines will strengthen the interpretation of analogies to the human disease. Other issues are also important to consider regarding the utility and limitations of mouse models of human genetic disorders. For example, the actions of one gene may be modified by one or several other genes (epistasis) and by interactions between genes and environment [Rutter et al., 2006].
Evaluation of a new transgenic or knockout mouse starts with simple measures of general health, to rule out any gross abnormalities that might interfere with further behavioral testing [Crawley and Paylor, 1997; Bailey et al., 2006; Crawley et al., 2007]. Poor health is evidenced by labored breathing, blood crusted around the nose, very low body weight, abnormal rectal temperature, hypo-activity in a novel environment, hypersensitivity to handling, low activity in the home cage, absence of nest-building, poor coat appearance such as bald patches or sores, tremors, seizures, circling and/or other easily observed morphological abnormalities. Gross neurological functions are scored in an empty cage environment, including behaviors such as wild-running (general hyperactivity), excessive grooming, excessive freezing, and hunching while walking. Simple tests of neurological reflexes include eye blink, ear twitch, whisker twitch, and the righting reflex. This yes-or-no battery of quick tests can be conducted sequentially in the same mice. Usually the entire set of observational measures can be obtained from a set of 60 or 90 mice in 1 day.
Early detection of a general health issue will allow the investigator to then choose appropriate tasks within the behavioral domain of interest, to avoid confounds created by the physical problem. For example, if the mutant mice show impaired hearing, then choosing a cognitive task such as fear conditioning that contains a tone cue will not be useful. Instead, learning tasks that do not require intact hearing such as the Morris water maze, T-maze, or object recognition will be more appropriate. Rapid observational tests are available to examine each of the sensory modalities of a mutant mouse. Some afford measures of acuity, but most offer only present-or-absent criteria. Vision is assessed with an approaching object, such as a cotton swab, to determine whether the mouse blinks, and whether the mouse investigates or ignores the approaching object. A mouse with normal vision will usually approach the object. Alternatively, movement of the mouse from a brightly lit to a dark area of a cage assesses ability to see levels of illumination. Hearing is assessed simply with the Preyer acoustic startle, the reflexive flinch and eyeblink response to a sudden loud noise such as a hand clap near the ears [Henry and Willott, 1972; Huang et al., 1995]. Alternatively, automated acoustic startle equipment that delivers tones of varying decibel levels is used to score amplitude of whole body flinch and detect threshold levels of hearing [Logue et al., 1997; Paylor and Crawley, 1997; McCaughran et al., 1999; Willott et al., 2003]. Sensitivity to touch is measured by a flinch response to a toe pinch. Pain sensitivity is evaluated using standardized hot plate and tail flick equipment [D’Amour and Smith, 1941; O’Callaghan and Holtzman, 1985; Hole and Tjolsen, 1993; Bannon et al., 1995; King et al., 1997; Malmberg and Bannon, 1999]. 
Olfaction is measured by latency for the mouse to retrieve food buried 1 cm from the surface of the litter [Nelson et al., 1995; Takeda et al., 2001; Bakker et al., 2002; Luo et al., 2002; Wersinger et al., 2002], or to sniff a novel odor presented in a neutral environment. Alternatively, the olfactory habituation/dishabituation task (Fig. 1) provides a more sensitive measure of detection of same and different odors, including social odors [Luo et al., 2002; Wrenn et al., 2003]. Highly sensitive analyses of sensory abilities require neurophysiological recording from the sensory nerve or sensory cortex during presentation of the relevant sensory cues [Erway et al., 1996; Steele and Morris, 1999; Pinto and Enroth-Cugell, 2000; Peachey and Ball, 2003]. Operant chamber tasks in which the trained mouse makes a nose poke response to a specific sensory cue, to obtain a food reinforcer, offer similarly sensitive assays of sensory abilities [Staubli et al., 1985; Eichenbaum et al., 1988; Zhang et al., 1998; Doty et al., 1999].
Automated and observer-scored tests are available to quickly evaluate motor functions of the mutant mice. A 5-min open field test allows a measure of general exploratory locomotion in a novel environment [Schmidt et al., 1982; Van Daal et al., 1987; Hess et al., 1992]. Total distance and horizontal activity capture major motor deficits. Automated software includes a tentative measure of anxiety-like behavior, namely the amount of time spent in the corners and near the walls versus time spent venturing into the center of the open field (Fig. 2). Motor coordination and balance are evaluated by the latency to fall from an accelerating rotorod [Jones and Roberts, 1968; Sango et al., 1995; Sango et al., 1996; Chapillon et al., 1998; Carter et al., 1999; Rustay et al., 2003] (Fig. 3). The hindpaw footprint test detects ataxias, from measures of the stride length and variability [Barlow et al., 1996; Crawley and Paylor, 1997; Carter et al., 1999]. Muscle strength is evaluated using a hanging wire test [Sango et al., 1996].
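The open field measures described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name, the square 40 cm arena, the definition of the center zone as a central square covering half the arena width, and the example coordinates are all hypothetical and are not taken from any tracking software or study cited here.

```python
import math


def open_field_summary(track, arena_cm=40.0, center_fraction=0.5):
    """Summarize an open field track given as (x, y) positions in cm.

    Assumes a square arena with its origin at one corner and positions
    sampled at a fixed interval; both are illustrative assumptions.
    Returns (total distance in cm, fraction of samples in the center zone).
    """
    # Width of the peripheral band on each side of the central square.
    margin = arena_cm * (1.0 - center_fraction) / 2.0
    low, high = margin, arena_cm - margin

    # Total distance: sum of straight-line steps between successive samples.
    distance = sum(math.dist(a, b) for a, b in zip(track, track[1:]))

    # Share of samples that fall inside the central square.
    in_center = sum(1 for x, y in track if low <= x <= high and low <= y <= high)
    center_share = in_center / len(track) if track else 0.0
    return distance, center_share


# Hypothetical four-sample track for illustration only.
distance, center_share = open_field_summary([(5, 5), (5, 15), (20, 15), (20, 20)])
```

With equally spaced samples, the center share approximates the proportion of session time spent away from the walls, which is the tentative anxiety-related measure mentioned above; total distance serves as the general locomotion measure.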
Assuming that general health, neurological reflexes, sensory abilities, and motor functions are sufficiently normal to avoid confounds, the mutant mice now proceed on to testing for complex behaviors relevant to the human behavioral syndrome. Many behavioral tests are available within each behavioral domain, as described in the extensive behavioral neuroscience literature. Choosing multiple behavioral tests that have different sensory and motor requirements, mediated by different brain regions, may increase the generalization of the results. In addition, choices can be made that avoid sensory or motor abnormalities. For example, in the cognitive domain, some tasks may require a motor ability (e.g. swimming in the Morris water maze) or sensory ability (pain perception in fear conditioning) that is not specific to the domain (learning and memory) targeted by the test. Alternative tests such as T-maze, novel object recognition, and operant chamber tasks will reduce the likelihood of underinterpreting the learning abilities of a mutant strain. Multiple tests for each domain of complex behaviors are illustrated in Table I.
How do we model human emotional disorders in mice? On a practical level, it is impossible for researchers to know the true emotional state of a mouse. It is similarly impossible to relate that state directly to the human experience. Aberrant behaviors symptomatic of human mental illnesses may be uniquely human, particularly those that are mediated by brain pathways without homology in rodents, e.g. the expanded prefrontal cortex of the human brain. However, many similarities between human and mouse neuroanatomy, physiology and neurochemistry allow comparisons of some of the behavioral and physiological responses to specific stimuli and events between the two species. If we break down a disease into individual components of the symptoms, causes, and treatment responses, then it may be possible to model components of the human disease in mice, without undue anthropomorphism.
Assays for anxiety-like behaviors in mice are mainly approach–avoidance conflict tests. Mice generally display high levels of exploration of a novel environment, but avoid brightly lit, open spaces. The elevated plus-maze (Fig. 4) and elevated zero maze present the subject mouse with the choice of spending time exploring the open areas of a plus-shaped or circular runway, elevated approximately 1 m from the floor, versus spending time exploring the enclosed arms and arcs of the elevated plus or circle [Handley and Mithani, 1984; Pellow et al., 1985; Lister, 1987; Shepherd et al., 1994; File, 1997; Heisler et al., 1998; Cook et al., 2001; Zorner et al., 2003; Mombereau et al., 2004]. Our light ↔ dark transitions test presents the subject mouse with the choice of exploring both a brightly lit open area and a dark enclosed area of a two-chambered cage [Crawley and Goodwin, 1980; Bailey et al., 2007]. Other anxiety-related tests include marble burying [Broekkamp et al., 1986; Deacon, 2006; Jacobson et al., 2007; Rorick-Kehn et al., 2007; Uday et al., 2007] and shock-probe burying [Sluyter et al., 1996; Sikiric et al., 2001; Degroot and Treit, 2002; Degroot and Nomikos, 2006; Gasparotto et al., 2007], and the Vogel thirsty lick conflict test [Vogel et al., 1971; Johnston and File, 1991]. All display predictive validity, as anxiolytic benzodiazepines shift the conflict towards more exploration of the aversive regions. Drugs working through specific subunit compositions of the GABAA receptor produce specific anxiolytic effects on these tasks. Sedation appears to be mediated by neurons expressing GABAA receptors containing the α1 subunit, whereas anxiolysis is mediated by receptors containing α2 and/or α3 [Morris et al., 2006]. 
New drugs with selective efficacy for receptors containing α2/α3 subunits have been developed and shown to produce anxiolytic effects in the elevated plus maze, fear-potentiated startle tests, and punished responding in rats and primates [McKernan et al., 2000; Chilman-Blair et al., 2003; Rowlett et al., 2005]. Mouse models with mutations in various GABAA subunits have been useful in screening for anxio-selective drugs with minimal sedative properties [Rudolph et al., 1999; Low et al., 2000; Crestani et al., 2001; Morris et al., 2006]. Usually two or three anxiety-related tests are conducted to validate the robustness of the drug response.
Two assays commonly used to evaluate mouse models of depression are the tail suspension test and the forced swim test [Porsolt et al., 1978a; Steru et al., 1985; Crowley et al., 2005; Cryan et al., 2005; Petit-Demouliere et al., 2005]. Both tests measure the response to an inescapable stressor. For the first few minutes of swimming in a deep cylinder of water, or dangling from a bar to which the tail has been taped, mice will generally struggle to find an escape route. Subsequently, the mouse will stop struggling and float in the water or hang immobile. Time spent immobile is decreased by treatment with an antidepressant drug. These two tests focus on predictive validity only. Attempts to model the prominent anhedonia symptom of depression have employed a sucrose preference test [Cryan and Mombereau, 2004] that incorporates some face validity. Approaches to more comprehensive modeling of chronic social stressors relevant to the causes of depression include the Visible Burrow System [Blanchard et al., 1995]. This labor-intensive and time-consuming model is based on the natural tendency of mature male rodents to establish social hierarchies in the context of resource competition. Four males living in the large complex visible burrow environment will quickly establish a dominance hierarchy, wherein one becomes dominant and initiates frequent attacks towards the three subordinates. Subordinate rats and mice display myriad physiological and behavioral responses which are remarkably similar to stress-related symptoms in humans, such as avoidance, reduced activity, severe weight loss, and increases in voluntary ethanol consumption [Blanchard et al., 1995].
This model has been fairly fruitful in advancing current understanding of a wide range of stress-related processes, including alterations in the vasopressin and corticotrophin releasing factor (CRF) system, the serotonin and dopamine systems [Blanchard et al., 1991; Lucas et al., 2004], the galanin system [Holmes et al., 2003], hippocampal dendritic arborization [McKittrick et al., 2000], reproductive functions [Monder et al., 1994a, 1994b; Hardy et al., 2002], and appetitive behaviors and alcohol consumption [Tamashiro et al., 2004; Choi et al., 2006; Tamashiro et al., 2006]. While the visible burrow system (VBS) model has superb face validity and construct validity as a model for stress-induced depression, its predictive validity remains to be determined. Other models for evaluating depressive-like effects in mutant mice include olfactory bulbectomy, learned helplessness, chronic mild stress, and drug-withdrawal-induced anhedonia, as reviewed by Cryan and Mombereau [2004].
Some of the symptoms of schizophrenia, such as auditory hallucinations and delusions, have not yet been modeled due to the difficulty of finding a correlate in animals. Deficits in sensory processing have proven more amenable to modeling in mice, including sensorimotor gating, working memory and social recognition. Sensorimotor gating is tested using prepulse inhibition of the startle response. A weak stimulus inhibits the subsequent response to a strong stimulus, if it is presented within 100 msec [Braff and Geyer, 1990; Geyer et al., 1990; Swerdlow et al., 1994; Geyer and Ellenbroek, 2003]. Prepulse inhibition is performed with a set of prepulse tones of increasing decibels preceding a loud acoustic stimulus, or preceding a tactile air puff directed at the eye. One major advantage of prepulse inhibition is that it can be run in various species including mouse, rat, and human with almost identical methods. Social cognition is tested in mice with assays of social interaction that are analogous to human measures of social interaction. A standard approach is to score interactions between a subject mouse and an unfamiliar stranger mouse of the same or different sex and strain, in an open field arena. The stranger mouse can be freely moving [Miyakawa et al., 2003], or contained in an enclosure that allows sniffing but not aggressive behaviors [Shi et al., 2003; Spencer et al., 2005; Sankoorikal et al., 2006]. This can also be done in an apparatus with multiple chambers (Fig. 5), to examine the mouse’s preference for a chamber with a novel social partner versus a novel object [Nadler et al., 2004; Crawley et al., 2007; Yang et al., 2007; McFarlane et al., 2008]. 
Working memory deficits in schizophrenia are modeled with mouse working memory tasks such as the eight arm radial maze [Olton and Papas, 1979; Braida et al., 2004; Horwood et al., 2004], delayed or spontaneous alternation in the T-maze or Y-maze, and delayed matching to place in the Morris water maze [Steele and Morris, 1999; Fernandes et al., 2006; Duffy et al., 2008]. Schizophrenia is a complex disorder with a heterogeneous group of symptoms that present variably across patients. Validity of a mouse model for schizophrenia is greatest when phenotypes relevant to two or more of the symptoms appear.
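Prepulse inhibition, described above, is commonly expressed as the percent reduction in startle amplitude on prepulse trials relative to startle-alone trials. The Python sketch below illustrates that calculation under stated assumptions: the function names, the input layout (mean startle amplitudes keyed by prepulse intensity in decibels), and the example amplitudes are hypothetical and do not come from any study cited here.

```python
def percent_ppi(startle_alone, prepulse_startle):
    """Percent prepulse inhibition for one prepulse intensity.

    Both arguments are mean startle amplitudes in the same arbitrary units.
    """
    if startle_alone <= 0:
        raise ValueError("startle-alone amplitude must be positive")
    return 100.0 * (startle_alone - prepulse_startle) / startle_alone


def ppi_profile(startle_alone, prepulse_means):
    """Map each prepulse intensity (dB) to its percent PPI."""
    return {db: percent_ppi(startle_alone, amplitude)
            for db, amplitude in sorted(prepulse_means.items())}


# Hypothetical mean amplitudes for illustration only.
example = ppi_profile(200.0, {74: 160.0, 78: 120.0, 82: 80.0})
```

Because the measure is normalized to each animal's own startle-alone response, it can be compared across genotypes that differ somewhat in baseline startle amplitude, although large baseline differences would still need to be examined as a potential confound.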
High-throughput screening is needed at early stages of preclinical drug development, forward genetics mutagenesis approaches, and analyses of large numbers of targeted gene mutation lines in core facilities. If only a small number of rapid behavioral tests can be conducted, which are the optimal choices? We suggest the quick measures of general health and neurological reflexes, to detect gross abnormalities, followed by one or two tests in the behavioral domain of interest. In the anxiety-related domain, the elevated plus maze is a good choice, as it includes within-task controls such as total arm entries to detect potential confounds of sedation and hyperactivity. In the depression-related domain, the tail suspension test works well in mice and is sensitive to chronic treatment with standard antidepressant drugs. In the schizophrenia-related domain, prepulse inhibition is most analogous to a discrete symptom commonly seen in the human disease. Recommendations for specific measures to include in rapid test batteries are available from several expert laboratories [Nolan et al., 2000; McIlwain et al., 2001; Rogers et al., 2001; Voikar et al., 2004; Godinho and Nolan, 2006; Paylor et al., 2006]. All of these tests can usually be conducted in the same mice. A cohort of mutant and control littermates is tested in a sequence that begins with the least stressful quick observational tests, followed by the more stressful complex tasks, e.g. elevated plus maze, prepulse inhibition, tail suspension, fear conditioning, and Morris water maze. Where positive findings are obtained, more in-depth follow-up behavioral tasks can then be pursued.
In some cases, one well-validated behavioral task provides the critical assay to address the investigator’s hypothesis. Discovery of circadian rhythm genes illustrates this point. In the early 1990s, Takahashi and coworkers at Northwestern University initiated a chemical mutagenesis project to discover genes that affect the circadian clock. Circadian wheel-running activity was employed as a single, well-validated, automated assay to screen about 300 mutagen-treated mice. The early detection of one mouse that exhibited a circadian period that was more than an hour longer than normal led to the discovery of the Clock gene [Vitaterna et al., 1994]. Follow-up investigations that similarly used the single circadian wheel-running assay subsequently discovered mPer1 [Sun et al., 1997], mPer2 [Albrecht et al., 1997], mPer3 [Takumi et al., 1998], and BMAL1 [Gekakis et al., 1998].
This brief overview of mouse behavioral phenotyping has alluded to the importance of control experiments for physical and procedural abilities to rule out artifactual explanations of deficits on complex behavioral tasks. Several other methodological issues are essential to consider. Numbers of mice are usually higher for behavioral assays than for electrophysiological and neuroanatomical assays, because environmental factors in the home cage, such as dominance hierarchy status and maternal care, will influence behavior differentially across individuals within a treatment group [van Praag et al., 2000; Palanza et al., 2001; Benaroya-Milshtein et al., 2004; Wolfer et al., 2004; Lambert et al., 2005; Lazarov et al., 2005; Tucci et al., 2006; Champagne and Meaney, 2007; D’Andrea et al., 2007]. For most behavioral experiments, Ns of 10–20 per genotype and per sex are commonly used, for example, N = 20 +/+ male, N = 20 +/− male, N = 20 −/− male, N = 20 +/+ female, N = 20 +/− female, and N = 20 −/− female. Genotypes need to be represented within each experimental test day, including +/+ and −/− littermates, to ensure that environmental variables in the home cage and during the experiment have equal effects across genotype groups. Background genes inherent in the inbred strain(s) used for the embryonic stem cells, blastulas, and breeding may interact directly or indirectly with the targeted gene of interest. Compendiums of behavioral traits for various inbred strains of mice are available [Lyon et al., 1996; Wehner and Silva, 1996; Banbury-Conference, 1997; Crawley et al., 1997; Jones and Mormede, 1999; Bolivar et al., 2000; Jackson and Abbott, 2000; Joyner, 2000; Cook et al., 2002; Holmes and Hariri, 2003; Bogue and Grubb, 2004], from which to choose the optimal breeding strain. Mixed genetic backgrounds often contribute extra variability to behavioral results.
Backcrossing for 10 generations into a pure genetic background will lower the variability and increase the likelihood of detecting a subtle behavioral phenotype. These and other methodological issues are extensively discussed in the mouse behavioral neuroscience literature [Joyner, 2000; Nagy et al., 2002]. Development of collaborations with behavioral neuroscience laboratories may be a useful approach for molecular genetics laboratories to pursue behavioral phenotyping of mouse models of psychiatric disorders.
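The requirement that every genotype be represented on each experimental test day can be met with a simple round-robin allocation. The Python sketch below shows one hypothetical way to do this; the function name, the tuple layout of the input, and the mouse identifiers are assumptions made for illustration, and the sketch does not address balancing by cage, litter, or time of day.

```python
from collections import defaultdict


def assign_test_days(mice, n_days):
    """Spread each genotype-by-sex group evenly across test days.

    mice: list of (mouse_id, genotype, sex) tuples (hypothetical layout).
    Returns a dict mapping day number (1..n_days) to a list of mouse ids.
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")

    # Group animals by genotype and sex.
    groups = defaultdict(list)
    for mouse_id, genotype, sex in mice:
        groups[(genotype, sex)].append(mouse_id)

    # Deal each group out across the days in turn, so that every day
    # receives animals from every group whenever group size allows.
    schedule = {day: [] for day in range(1, n_days + 1)}
    for key in sorted(groups):
        for index, mouse_id in enumerate(groups[key]):
            schedule[1 + index % n_days].append(mouse_id)
    return schedule


# Hypothetical cohort: four +/+ and four -/- males tested over two days.
cohort = [(f"wt{i}", "+/+", "M") for i in range(4)]
cohort += [(f"ko{i}", "-/-", "M") for i in range(4)]
plan = assign_test_days(cohort, 2)
```

In this example each day receives two +/+ and two −/− mice, so day-to-day variation in the testing environment is shared equally across genotypes.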
While it is premature to recommend a fixed set of “gold standard” behavioral tasks for mouse behavioral phenotyping, recommendations offered in this overview and in its references will serve to start the novice investigator on the right path. A “good enough” mouse model produces corroborative results in at least two tests within the behavioral domain (see Table I), without confounding artifacts as measured in relevant control tasks. Replication of findings in a second cohort of mice, using appropriate statistical analyses, will support the robustness of the mouse model to test hypotheses and develop treatments. A more comprehensive review of behavioral assays and how to apply them to mutant mice may be found in source books including “What’s Wrong With My Mouse? Behavioral Phenotyping of Transgenic and Knockout Mice” [Crawley, 2007b], Current Protocols in Neuroscience, and in the many excellent review articles cited above.
Supported by the National Institute of Mental Health Intramural Research Program.