An inherent appeal of the EP concept is that a ‘good’ EP should be closer to the ‘level of gene action’ than the relevant PD. If so, we should observe empirically that genetic effects are stronger on the EP than on the PD. In their recent review, Flint and Munafo [11] state this explicitly:
Much effort has been devoted to finding such endophenotypes, partly because it is believed that the genetic basis of endophenotypes will be easier to analyze than that of psychiatric disease. This belief depends in part on the assumption that the effect sizes of genetic loci contributing to endophenotypes are larger than those contributing to disease susceptibility, hence increasing the chance that genetic linkage and association tests will detect them.
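We can best illustrate this point using path models. Assume, for example, the mediational model for EP depicted in the accompanying path diagram, in which genes act on the EP and the EP in turn acts on PD. As pictured, as long as EP is not perfectly correlated with PD (that is, the standardized path from EP to PD is less than 1.00 in absolute value), it is logically necessary that the genetic effect will be stronger on EP than on PD, because the genetic effect on PD is the genetic effect on EP multiplied by that path. However, this model ignores measurement error.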
A second path diagram represents a more realistic model incorporating errors of measurement. This figure depicts a mediational model for EP that includes a path from the latent EP to the latent PD (β) and paths from these two latent constructs to the observed measures, which reflect the accuracy with which they are assessed (λEP and λPD, respectively). (Latent here refers to an unmeasured ‘true’ construct and is traditionally depicted in path diagrams by circles or ovals, whereas measured variables are depicted in squares or rectangles.) The genetic effect on the observed EP passes only through λEP, whereas the genetic effect on the observed PD passes through both β and λPD, so simple algebra allows us to conclude that the genetic effect will be stronger for EP than PD whenever λEP > βλPD. A third path diagram depicts the risk-indicator model for EP, which has a direct path from genes to the latent EP (aEP) and from genes to PD (aPD). For this model, the genetic effects on EP will exceed those on PD whenever aEPλEP > aPDλPD.
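Both inequalities follow from multiplying the standardized paths along each route from genes to an observed measure. The following Python sketch makes that path tracing explicit. The function names and all numerical values are hypothetical, and the symbol a for the path from genes to the latent EP in the mediational model is our assumption for illustration.

```python
def mediational_effects(a, beta, lam_ep, lam_pd):
    """Implied genetic effects on observed EP and PD, mediational model.

    a      : standardized path from genes to latent EP (assumed symbol)
    beta   : standardized path from latent EP to latent PD
    lam_ep : path from latent EP to observed EP
    lam_pd : path from latent PD to observed PD
    """
    effect_ep = a * lam_ep          # genes -> latent EP -> observed EP
    effect_pd = a * beta * lam_pd   # genes -> latent EP -> latent PD -> observed PD
    return effect_ep, effect_pd


def risk_indicator_effects(a_ep, a_pd, lam_ep, lam_pd):
    """Implied genetic effects on observed EP and PD, risk-indicator model."""
    effect_ep = a_ep * lam_ep       # genes -> latent EP -> observed EP
    effect_pd = a_pd * lam_pd       # genes -> latent PD -> observed PD
    return effect_ep, effect_pd


# Hypothetical values: same a and beta, but EP measured well versus poorly.
well_measured = mediational_effects(a=0.3, beta=0.6, lam_ep=0.9, lam_pd=0.9)
poorly_measured = mediational_effects(a=0.3, beta=0.6, lam_ep=0.5, lam_pd=0.9)

for label, (ep, pd) in [("well measured EP", well_measured),
                        ("poorly measured EP", poorly_measured)]:
    print(f"{label}: EP effect = {ep:.3f}, PD effect = {pd:.3f}")
```

With these illustrative values, the EP shows the stronger genetic effect only in the first case, where λEP (0.9) exceeds βλPD (0.54); in the second case λEP (0.5) falls below that product and the ordering reverses.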
As EPs are often measured using sophisticated imaging, neurophysiological or neuropsychological measures, there is a tendency to assume that such ‘harder’ measures are, of necessity, more reliable than the ‘softer’ psychiatric diagnoses. That is, we commonly assume that λEP exceeds λPD. However, this assumption may be incorrect. Many putative EPs are measured over short time intervals and can be influenced by transient state effects such as ambient noise, temperature, time of day or variations in machine functioning, as well as by temporary changes in the mental state of the participant due to stressors or to consumption of, or withdrawal from, nicotine, caffeine or alcohol. By contrast, some PDs are assessed using years of medical records and the recording of symptoms occurring over similar periods, or with carefully constructed psychological instruments. For example, Gur et al. [12] report that the 1-year stability for a commonly used measure of the Continuous Performance Test was ‘found to be 0.65 for schizophrenia patients and 0.72 for healthy subjects,’ whereas the stability of another such measure over 2 years ranged from 0.56 to 0.73. The reliability of brain functional magnetic resonance imaging (fMRI) is quite variable and for some paradigms is under +0.30 [13–15]. However, other EPs, such as structural MRI, might be highly reliable, as indicated by a recent report showing test–retest correlations > +0.95 for measures of cortical thickness [16]. By contrast, in the Irish Study of High-Density Schizophrenia Families, which used both in-depth personal interviews and extensive reviews of hospital records, the diagnostic reliability, assessed using a weighted κ, was +0.94 [17]. Conversely, in the Virginia Adult Twin Study of Psychiatric and Substance Use Disorders, the long-term stability of an interview-based assessment of lifetime major depression (MD) was κ = +0.43 [18], considerably lower than the reliability of the short form of Eysenck’s neuroticism scale (r = +0.69), measured over a comparable time period; this scale has been proposed as an EP for MD.
The major point here is that the relative ‘performance’ of EPs versus PD in assessing a ‘genetic signal’ cannot be divorced from the problems of measurement. It is perfectly possible for us to study an EP that is ‘truly’ closer to gene action than our PD. But if our measures of EP are less reliable than those of our PD and we do not account for this unreliability in our models, we would get the wrong answer: that EP cannot be sitting in the causal path to our PD.
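This reversal can be checked with a small simulation. The sketch below generates data under a mediational model in which the EP truly sits between genes and PD, then measures the EP unreliably and the PD reliably. Everything in it is a hypothetical illustration: the parameter values, the function names, and the simplifying assumption that genes, EP and PD are all continuous standardized variables (for PD, an underlying liability).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def residual(path):
    """Residual path that keeps a standardized variable at unit variance."""
    return math.sqrt(1.0 - path * path)


def simulate(a, beta, lam_ep, lam_pd, n=50000, seed=1):
    """Correlation of genes with observed EP and observed PD (mediational model)."""
    rng = random.Random(seed)
    genes, ep_obs, pd_obs = [], [], []
    for _ in range(n):
        g = rng.gauss(0, 1)
        ep_true = a * g + residual(a) * rng.gauss(0, 1)
        pd_true = beta * ep_true + residual(beta) * rng.gauss(0, 1)
        genes.append(g)
        # Observed scores: latent score plus independent measurement error.
        ep_obs.append(lam_ep * ep_true + residual(lam_ep) * rng.gauss(0, 1))
        pd_obs.append(lam_pd * pd_true + residual(lam_pd) * rng.gauss(0, 1))
    return pearson(genes, ep_obs), pearson(genes, pd_obs)


# Hypothetical values: EP is causally closer to genes but measured less reliably.
r_ep, r_pd = simulate(a=0.3, beta=0.8, lam_ep=0.55, lam_pd=0.97)
print(f"genes with observed EP: {r_ep:.3f}")
print(f"genes with observed PD: {r_pd:.3f}")
```

Under these assumed values the expected correlations are a·λEP = 0.165 for the EP and a·β·λPD ≈ 0.233 for the PD, so the observed genetic signal is weaker for the EP even though the EP is, by construction, on the causal path to the PD.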
There are several ways to approach this problem. One is to obtain good estimates of λEP and λPD that could be incorporated into the analytic models. This might be carried out by assessing test–retest reliability in a subset of the studied sample. An even more powerful approach would be to obtain longitudinal measures of EP and PD in the entire sample. Then, making the reasonable assumption that the genetic risks for them are temporally stable, the model depicted in the corresponding figure could be applied. In this model, the estimates of β are no longer confounded with those of λEP and λPD. Such data would also be much more likely to discriminate the risk-indicator model from the mediational model for EP. A third possibility is to use data collected from pairs of relatives, which can also potentially discriminate risk-indicator from mediational models.
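As a rough illustration of the first approach, the sketch below corrects observed genetic effects for unreliability using test–retest correlations. It rests on a classical test theory assumption that is ours rather than part of the models above: if the latent score is stable between occasions and the measurement errors are uncorrelated, the test–retest correlation equals λ squared, so λ can be estimated as its square root. The function names and all numbers are hypothetical.

```python
import math


def loading_from_retest(r_retest):
    """Estimate lambda as sqrt(test-retest correlation).

    Assumes a stable latent score and uncorrelated errors across occasions.
    """
    if not 0.0 < r_retest <= 1.0:
        raise ValueError("test-retest correlation must lie in (0, 1]")
    return math.sqrt(r_retest)


def disattenuate(observed_effect, r_retest):
    """Genetic effect on the latent construct implied by an observed effect."""
    return observed_effect / loading_from_retest(r_retest)


# Hypothetical observed genetic effects and test-retest correlations.
obs_ep, obs_pd = 0.15, 0.18
retest_ep, retest_pd = 0.30, 0.90

latent_ep = disattenuate(obs_ep, retest_ep)
latent_pd = disattenuate(obs_pd, retest_pd)

print(f"observed: EP = {obs_ep:.3f}, PD = {obs_pd:.3f}")
print(f"latent:   EP = {latent_ep:.3f}, PD = {latent_pd:.3f}")
```

In this made-up example the observed effect is larger for PD, but after correction the effect on the latent EP exceeds that on the latent PD. A full analysis would estimate the loadings within the path model itself rather than correcting effects one at a time.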