According to several authors, an animal model of a psychiatric condition should fulfill a multidimensional set of validity criteria to be considered relevant to human pathology. Many authors have proposed lists of such criteria, focusing on models of depression and models of anxiety (see Table ). Interestingly, since the 1960s, authors have become increasingly concerned with criteria of external validity and less with criteria of internal validity (with exceptions such as [9]). Internal validity addresses the consistency of the experimental design: reproducibility, inter-observer reliability, randomization, multicentric design, controlled design (test versus control), blind experimentation, and so on. These questions are not specific to animal studies, but are shared across all fields of experimental science [11]. External validity, on the other hand, concerns the general question of whether the results of a study on a sample apply to the target population; it raises additional concerns in the case of animal models because of the need to resort to analogical arguments. It is these concerns that led to the need for specific criteria for ensuring the external validity of animal studies. To our knowledge, the first attempt to define such criteria of validity for animal models was made in 1964 by Janssen [12]. This author proposed eight criteria to decide whether a procedure was relevant or not: efficiency, speed, simplicity, reproducibility, specificity, adequate design, data processing and correlation with other tests. These criteria applied mainly to screening tests and were rather pragmatic, as researchers were mainly interested in finding a device and/or protocol enabling them to test new compounds rapidly. It should also be noted that these criteria are not really relevant to translational research, as they did not refer to the clinical condition: the idea was not to model a disorder, but to find a reproducible, reliable and rapid method to test compounds. Interestingly, this list mostly focused on criteria of internal validity. The first paper that explicitly proposed criteria for 'animal models', focusing on external validity, was published 5 years later by McKinney and Bunney [13] and focused on depression. The literature on animal models of affective disorders frequently cites this paper, claiming that McKinney and Bunney proposed four validity criteria (same etiology, same symptoms, same response to treatments and same biochemistry). In fact, this article presents the available methods to induce depressive-like symptoms and then proposes five requirements for an animal model: analogy of symptoms, existence of observable and measurable behavioral changes, interobserver agreement, same response to treatments and reproducibility of the system. However, these criteria were not well defined at that time, as the description of each was limited to one sentence in the original paper. Interestingly, these authors proposed the criteria of similarity in symptoms and in response to treatments, which correspond to two of the four criteria usually attributed to them. The two remaining criteria of the list of four (same etiology and same biochemistry) cannot be subsumed under the three remaining requirements they proposed. For example, similarity in etiology is not explicitly mentioned in that list, even if in the paper the authors describe social loss as one of the factors that can be used to elicit depressive-like symptoms. In 1977, Abramson and Seligman [14] added further criteria; they mentioned the similarity of etiology, but also an interesting criterion that was unfortunately abandoned: the precision of the sub-nosographic entity ('Does the laboratory model describe (...) a naturally occurring psychopathology or only a subgroup?'). However, most researchers working in the field of animal models of depression rely on the three criteria of validity proposed by Willner in 1984: face validity, predictive validity and construct validity [15]. For the last of these criteria, Willner (personal communication) was inspired by the concept proposed 30 years earlier by Cronbach and Meehl [16] in the field of psychology. Note that these criteria are still used by the European Federation of Psychologists' Associations, albeit under different terminology. Willner's article can be considered seminal in the field of animal models of psychiatric disorders (it had been cited 547 times as of March 2011), and most authors now refer to it, either by changing some of the criteria of that list or by adding a hierarchy between these criteria. Soubrié and Simon [17], for example, use the French terms for 'homology', 'isomorphism' and 'predictability', while Koob et al. do not include predictive validity but add etiological validity and convergent validity. Geyer and Markou [9] include etiological validity, convergent and discriminant validity, and claim that predictive validity is the crucial aspect. Koob et al. consider reliability and predictive validity to be essential criteria, while face, convergent, etiological and construct validity are more secondary. For Sarter and Bruno [19], on the other hand, construct validity is much more important than face and predictive validity. For Robbins [20], homology is central to construct validity. However, it is possible that these diverging points of view also stem from different definitions of the various criteria. We will thus first carefully examine the definitions of the various criteria, focusing on the three criteria proposed by Willner [15] or their equivalents.
According to Willner [15], predictive validity relies on five sub-criteria: 'whether a model correctly identifies (1) antidepressant treatments of pharmacologically diverse types (2), without making errors of omission (3) or commission (4), and whether potency in the model correlates with clinical potency (5).' According to this definition, this criterion relies on a pharmacological correlation (non-pharmacological treatments are not mentioned). It is clear from these sub-criteria that this criterion is not at all intended to translate aspects of human pathology in animals, as it is only concerned with pharmacological effects. In another paper by the same author [21], the criterion was extended to include response to all available treatments (for example, in the case of depression, not only pharmacological antidepressants but also electroconvulsive therapy), so that it can be taken to correspond to a human-animal correlation of therapeutic outcomes. This concept is similar to one of the criteria proposed by McKinney and Bunney [13], as the description given by these authors ('The treatment modalities effective in reversing depression in humans should reverse the changes seen in animals') more or less recapitulates Willner's sub-criteria 1, 2 and 3. It is, however, not convergent with the 'specificity' criterion of Janssen [12], who wrote: 'Specificity, a given drug effects being characteristic for a well-defined class of chemicals and indicative of a specific mode of action.' There is no reference here to a psychiatric disorder, that is, to the idea that the treatment should reverse disease-related symptoms. The definition employed by Koob et al. is, however, quite different: in their paper focusing on anxiety, predictive validity is defined as 'the ability to make consistent predictions about anxiety based on an animal's performance in the model.' Convergent definitions can also be found in Geyer and Markou's paper [9], as these authors extend this criterion to what 'allows one to make predictions about the human phenomenon based on the performance of the model.' It is clear that their use of the term 'prediction' is not limited to the ability to predict the efficacy of treatments. Thus, the criterion of predictive validity is, in most cases, limited to the ability of the model to respond accurately to the treatments that are employed, but some authors also use it in a broader sense, including the model's ability to predict some specific markers of the disease.
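Willner's five sub-criteria lend themselves to a rough quantitative reading. The following Python sketch is only an illustration under stated assumptions: the compound names, class labels, activity flags and potency values are entirely hypothetical placeholders and not data from any study, and the use of Spearman's rank correlation for sub-criterion 5 is an assumption of this sketch (Willner's definition does not prescribe a statistic).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Compound:
    """One tested treatment. All values used below are hypothetical."""
    name: str
    pharmacological_class: str
    clinically_effective: bool
    active_in_model: bool
    clinical_potency: Optional[float] = None  # arbitrary units
    model_potency: Optional[float] = None     # arbitrary units


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined when one variable is constant
    return cov / (vx * vy) ** 0.5


def predictive_validity_summary(compounds):
    """Summarize sub-criteria 1-5 for a list of tested compounds."""
    hits = [c for c in compounds if c.clinically_effective and c.active_in_model]
    paired = [(c.model_potency, c.clinical_potency) for c in hits
              if c.model_potency is not None and c.clinical_potency is not None]
    rho = spearman(*zip(*paired)) if len(paired) >= 3 else None
    return {
        # (1)-(2): effective treatments detected, and how diverse they are
        "classes_detected": len({c.pharmacological_class for c in hits}),
        # (3): clinically effective but missed by the model
        "errors_of_omission": [c.name for c in compounds
                               if c.clinically_effective and not c.active_in_model],
        # (4): clinically ineffective but flagged by the model
        "errors_of_commission": [c.name for c in compounds
                                 if not c.clinically_effective and c.active_in_model],
        # (5): rank correlation between model potency and clinical potency
        "potency_correlation": rho,
    }


# Hypothetical toy data, for illustration only.
toy_compounds = [
    Compound("A", "class_1", True, True, clinical_potency=1.0, model_potency=2.0),
    Compound("B", "class_2", True, True, clinical_potency=2.0, model_potency=3.0),
    Compound("C", "class_3", True, True, clinical_potency=3.0, model_potency=5.0),
    Compound("D", "class_1", True, False),
    Compound("E", "class_4", False, True),
]
summary = predictive_validity_summary(toy_compounds)
```

In this reading, a model with high predictive validity in Willner's sense would detect several pharmacological classes, produce empty lists of omissions and commissions, and show a strong positive potency correlation. The broader definitions of Koob et al. and of Geyer and Markou are not captured by such a summary.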
For Willner [15], 'Face validity is assessed by whether antidepressant effects are only present on, or are potentiated by, chronic administration (1), and whether the model resembles depression in a number of respects (2), which are specific to depression (3), and do actually coexist in a specific sub-group of depressions (4); also, the model should not show features which are not seen clinically (5).' By this definition, face validity encompasses both treatment features and symptomatic aspects. Examples that Willner uses to illustrate this criterion include reserpine reversal, amphetamine potentiation, 5-hydroxytryptophan-induced depression, bulbectomy, isolation-induced hyperactivity, exhaustion stress and disturbance of circadian rhythms. His discussion of how face validity applies to these models makes it clear that, according to this author, face validity includes both pharmacological similarity and phenomenological identity. For example, he mentions that in the unpredictable chronic mild stress (UCMS) model, antidepressants are effective after chronic, but not acute, treatment. He also notes that reserpine induces similar behavioral effects in animals and in humans, that hyperactivity and heightened glucocorticoid levels are observed both in depressed people and in rodents subjected to bulbectomy or to unpredictable chronic stress, and that elevation of the threshold for intracranial self-stimulation resembles the anhedonia displayed by depressed people. Later on, the same author states that, for face validity, 'the extent of similarity between the model and the disorder is examined, on as wide as possible a range of symptoms and signs' [21]. Here, therapeutic outcomes are no longer explicitly mentioned and the definition shifts toward requiring the identity of symptoms. This is reminiscent of McKinney and Bunney's proposal that 'the symptoms of the depression so induced should be reasonably analogous to those seen in human depression' [13]. Geyer and Markou [9], as well as Sarter and Bruno [19], define face validity as 'the degree of phenomenological similarity between the model and the disorder to be modeled.' It should be noted that this phenomenological identity, as formulated here, encompasses the behavioral and/or cognitive aspects only, not their physiological and/or neural bases. This suggests that, in fact, face validity corresponds to an attempt to mimic the diagnostic criteria of psychiatric conditions, such as those listed in the tenth revision of the World Health Organization's International Statistical Classification of Diseases and Related Health Problems (ICD-10) or the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders (DSM-IV); indeed, these criteria are generally behavioral and/or cognitive only, without reference to any etiology or biological basis. Another aspect should be considered here. In recent years, a debate has emerged (see, for example, [2]) over whether a relevant model should apply to the disorder as a whole (depression, for example) or rather to dimensions, symptoms and/or endophenotypes (a model of anhedonia, for example). In the first case, the phenomenon to be mimicked corresponds to a set of probably interdependent variables, and the changes observed in the animal should include several dimensions: a model of depression, for example, should include anhedonia, but also changes in mood, in appetite, in sleep, and so on. In the second case, there is no attempt to model a disorder, but rather to model one particular dimension of a disorder, which is possible if the various symptoms of a given pathology are independent of one another.
Concerning construct validity, the picture is rather complex and the views defended by various authors are summarized in Table . In his seminal paper on animal models of depression, Willner [15] proposed that construct validity depends on whether 'both the behavior in the model (1) and the features of depression being modeled (2) can be unambiguously interpreted, and are homologous (3), and whether the feature being modeled stands in an established empirical (4) and theoretical (5) relationship to depression.' The paper then describes several animal models of depression and discusses whether they fulfill the construct validity requirement. Six methods are examined in this respect: learned helplessness, behavioral despair, UCMS, maternal separation, incentive disengagement and intracranial self-stimulation (an updated list for anxiety models can be found in [25]). This discussion indicates that sub-criterion 5, theoretical relationship to depression, is understood in a very broad and polysemic sense. It includes theories about the nature of the depressive state, the crucial impact of some dysfunctional processes (for example, that helplessness or anhedonia are central symptoms in depression), the dynamics of the disorder (for example, its biphasic course) and its etiology. The etiology, in turn, includes theories about the part that some external events play in triggering a depressive-like state (stress or separation may cause depression in humans and depressive-like symptoms in non-human mammals), the central importance of some specific characteristics of these events (uncontrollability or unpredictability of the stressors as central mechanisms) and the involvement of underlying biological processes (for example, a dysfunction of the brain reward system).
Definitions for "construct validity"
These aspects could be considered different sub-dimensions of this criterion. The same concept, in which construct validity is seen as an attempt to establish a theoretical rationale for animal models both at the level of a similarity of the behavioral and/or cognitive dysfunctional processes and at the level of a similarity of the etiology, was developed in later papers by Willner [21]. In a book chapter on animal models of depression [26], the same author makes two additional points explicit: first, that similarity between the biological dysfunctions in the clinical population and in the animal model is an essential aspect of this criterion; second, that it is not enough to require homology between the modeled processes in addition to a similarity in the etiology and causes of the abnormalities seen, since the link between these two levels should be translated as well: 'a theoretical account of the disordered behavior in the model, a theoretical account of the disorder itself, and a means to bring the two theories into alignment.'
In other words, if one considers that anhedonia, for example, is a crucial feature of depression (the first requirement above) and should be present in the animal model, and that anhedonia is caused by a dysfunction of the brain reward system including the nucleus accumbens (the second requirement), then the relationship between anhedonia and the function of the nucleus accumbens should be the same in animals and humans, and its dysfunction should be similar in depressed subjects and in animals subjected to the model. A similar assumption is made by Sarter and Bruno [19]. In the paper by Geyer and Markou [9], construct validity is also defined in relation to theoretical constructs, but it is clearly separated from etiological validity. Taking the example of the UCMS model, they claim that this protocol draws on theories about the link between 'stress and consummatory behavior', and assumes both that stress plays a role in depression and that anhedonia is a core symptom of depression. However, when discussing this criterion, many authors ignore the first aspect (the similarity of the theoretical construct about the dysfunctional cognitive, behavioral and/or psychological processes) and thus mention only the second aspect, that is, the similarity of the etiology, either when theorizing about the external events causing the depressive state or when focusing on the underlying biological basis (see [25] for an exception). For example, concerning external events, UCMS translates the diathesis theory of depression, as stress in vulnerable rodents may induce depressive-like behaviors. The diathesis theory claims that depression relates to a predisposition acquired during the developmental period, resulting from both genetic and environmental factors and rendering the subject more vulnerable to triggering factors such as stress. The biological basis can be illustrated with the example of the model consisting of corticosterone administration in mice [27], which relies on the theory that depression is related to a dysfunction of the hypothalamic-pituitary-adrenal axis. Interestingly, when discussing a given animal model of affective disorder with regard to this criterion of construct validity, most authors focus on only one of these aspects, insisting only on theories about the dysfunctional process (for example, focusing on helplessness for the learned helplessness model), on the biological etiology (a defect in glucocorticoid release regulation in the corticosterone administration model) or on the early environmental etiology (maternal separation). In some cases, such as the UCMS model, the construct validity criterion can be discussed according to several of these sub-dimensions, including the importance of stress in triggering the depressive episode, the crucial nature of the unpredictability of these stressors in the etiology of the disorder and the centrality of anhedonia. However, the crucial importance of this construct validity criterion is not emphasized by all authors. For example, according to Weiss and Kilts, 'although theoretically based models are likely to provide interesting and valuable information about the relation of certain behaviors to physiological changes, they face no fewer fundamental problems in establishing their validity as models of diagnostic categories than did the psychodynamic formulation they have replaced' [28]. Tables and recap the results of this review.