For most of us the foundations of our understanding of genetics were laid by considering Mendelian diseases, in which familial recurrence risks are high and mutant alleles are both necessary and sufficient. One consequence of this deterministic teaching is that our conceptualisation of genetics tends to be dominated by the notion that the genetic aspects of disease are caused by rare alleles exerting large effects. Unfortunately, the preconceptions that flow from this training are frequently erroneous and misleading in the context of common traits. In these traits familial recurrence risks are modest and, for the most part, the relevant alleles are neither rare, nor necessary, nor sufficient. For these common traits the genetic architecture is far more “complex”: susceptibility rather than causality results from the combined effects of many alleles, each exerting only a modest effect on risk. None of these alleles is sufficient to cause disease on its own, and none is essential for the development of disease. Furthermore, most are carried by large sections of the population, the vast majority of whom don’t develop the disease. Our innate belief in the Mendelian paradigm leaves us with an inherent expectation that knowledge about the genetic basis for a disease should allow genetic testing and thereby accurate risk prediction. It is tempting to assume that the same should be true in complex disease, but is it?
The human population is enormous and the genome sequence is extremely long. As a result, even though any two individuals typically differ at only 0.1% of the genome, there are still billions of variants present in the population as a whole.1 International efforts to identify and catalogue human genetic variation, such as HapMap (http://www.hapmap.org) and the 1000 Genomes Project (http://www.1000genomes.org), have provided empirical support for an expected inverse relationship: the rarer a variant allele, the more variant alleles there are at that frequency. Common variants, in which both alleles have a frequency greater than 1%, are therefore far less numerous than rare variants. On the other hand, common variants account for most (90%) of the difference between any two individuals.1 With approximately 10–15 million common variants and billions of rare variants in the human population, identifying which are relevant in any given disease has proven to be extremely challenging.2
In principle, each and every genetic variant is likely to have some effect upon function and thereby upon the risk of disease; under this ultimate polygenic/biometric model3 all variants are expected to exert some effect on risk. However, the effects attributable to individual variants are likely to differ greatly, with some exerting much larger effects than others and most exerting little or no meaningful effect. Under this model we expect both rare and common variants to influence the risk of a disease, with their relative contributions varying between diseases.4 At a population level, the prevalence and familial recurrence risks of a disease reflect the combined effects of the prevailing architecture of risk alleles (see BOX 1).5, 6 In this context Mendelian disease can be seen as an unusual extreme in which a few rare variants exert profound effects and familial recurrence risks are maximal.
Consider three populations that differ only in the frequency of a single risk allele and are equivalent in all other respects (see Fig B1a). In accordance with the number of individuals carrying the risk allele, the prevalence of disease will be highest in population C and lowest in population A. On the other hand, for reasons which are perhaps less intuitively obvious, familial recurrence risk will be greatest in population B and uninfluenced by this particular risk allele in the other two populations. In population A no-one carries the risk allele, while in population C everyone is homozygous for it. In these two populations, carriage of the risk allele is therefore unrelated to disease status. The allele is no more frequent in the relatives of affected individuals than in the relatives of unaffected individuals. In population B, on the other hand, affected individuals are more likely to carry the risk allele than unaffected individuals. Their relatives will therefore also be more likely to carry it, and so will have an increased recurrence risk. In short, prevalence reflects the combined burden of risk alleles in the population as a whole, whereas familial recurrence risk reflects the variation in that burden between individuals. The more individuals vary in their genetically determined risk, the greater the extent of familial clustering. In a Mendelian dominant trait, for example, risk varies enormously between individuals: it is effectively zero in those who don’t carry the risk allele and complete in those who do. In this situation disease is effectively only seen in the relatives of affected individuals.
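The contrast between populations A, B and C can be checked with a small exact calculation. The sketch below is a hypothetical illustration, not the authors’ method. It assumes a single biallelic locus in Hardy–Weinberg equilibrium and random mating. It also uses made-up multiplicative penetrances: lifetime risks of 0.001, 0.002 and 0.004 for 0, 1 and 2 copies of the risk allele. Once the parents’ genotypes are fixed, siblings are independent. The probability that two sibs are both affected is therefore the average, over parental matings, of the squared expected risk of a child, and λs is that probability divided by K².

```python
from itertools import product

# Hypothetical penetrances (lifetime risk) for 0, 1 or 2 copies of the risk allele.
PENETRANCE = {0: 0.001, 1: 0.002, 2: 0.004}


def hwe(p):
    """Hardy-Weinberg genotype frequencies given risk-allele frequency p."""
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def prevalence_and_lambda_s(p, penetrance=PENETRANCE):
    """Exact prevalence K and sibling recurrence ratio lambda_s for one locus."""
    freqs = hwe(p)
    K = sum(freqs[g] * penetrance[g] for g in freqs)
    both_affected = 0.0
    for gm, gf in product(freqs, repeat=2):
        weight = freqs[gm] * freqs[gf]          # probability of this mating
        if weight == 0.0:
            continue
        # Probability that each parent transmits the risk allele.
        tm, tf = gm / 2.0, gf / 2.0
        child = {0: (1 - tm) * (1 - tf),
                 1: tm * (1 - tf) + (1 - tm) * tf,
                 2: tm * tf}
        # Expected risk of a child of this mating; sibs are independent given parents.
        mean_risk = sum(child[g] * penetrance[g] for g in child)
        both_affected += weight * mean_risk ** 2
    return K, both_affected / K ** 2


for name, p in (("A", 0.0), ("B", 0.5), ("C", 1.0)):
    K, lam = prevalence_and_lambda_s(p)
    print(f"population {name}: K = {K:.5f}, lambda_s = {lam:.3f}")
```

Only population B, where carriers and non-carriers coexist, gives λs above 1 (about 1.11), although population C has the highest prevalence.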
Clayton22 and Pharoah et al.35 have shown that under a biometric model log(risk) in the population will be approximately normally distributed, with a mean (μ) and a variance (σ²) determined by the population prevalence (K) and the sibling recurrence risk (λs) according to the formulae:

$$\sigma^2 = 2\log_e(\lambda_s), \qquad \mu = \log_e(K) - \frac{\sigma^2}{2}$$
The figures in this paper are plotted using these approximations to estimate the distribution of risk in the population. It is worth noting that the distribution of log(risk) in cases has the same variance but a mean of logₑ(K) + σ²/2. The risk profiles of the cases and controls thus overlap to an extent that depends upon λs. Suppose λs for multiple sclerosis were as high as 40. A substantial proportion of cases (14%) would still have levels of risk below the 95th percentile of risk in the general population (see Fig B1b). The percentage of such lower-risk cases would only fall below 10% for diseases where λs was > 72. At a λs of 10, almost a third of cases have a level of risk below the 95th percentile of risk seen in the general population.
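The percentages above can be reproduced in a few lines. This sketch assumes that log(risk) in the population is normal with σ² = 2·logₑ(λs) and μ = logₑ(K) − σ²/2. That parameterisation is our reading of the Clayton/Pharoah approximation, chosen because it reproduces the quoted figures. In cases, log(risk) is assumed to have the same variance with the mean raised by σ². Both the shift and the spread depend only on λs, so the result does not depend on K.

```python
from math import log, sqrt
from statistics import NormalDist


def cases_below_population_percentile(lambda_s, percentile=0.95):
    """Fraction of cases whose risk lies below a given percentile of population risk.

    Assumes log(risk) ~ Normal(mu, sigma^2) in the population with
    sigma^2 = 2 * ln(lambda_s), and Normal(mu + sigma^2, sigma^2) in cases.
    mu cancels out of the comparison, so it is set to 0 here.
    """
    sigma = sqrt(2.0 * log(lambda_s))
    population = NormalDist(0.0, sigma)
    cases = NormalDist(sigma ** 2, sigma)
    return cases.cdf(population.inv_cdf(percentile))


for lam in (10, 40, 72, 73):
    print(lam, round(cases_below_population_percentile(lam), 3))
```

This gives roughly 0.31 for λs = 10 and 0.14 for λs = 40. The fraction crosses 10% between λs = 72 and 73.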
Risch suggested that λs, the relative recurrence risk in the siblings of an affected individual, was a useful way to summarise the amount of familial clustering in a disease.5 He showed that this value could easily be partitioned between relevant loci5 and was predictive of the power to identify linkage.7 By definition, λs is the ratio between the lifetime risk of the disease in the siblings of an affected individual and the lifetime risk of the disease in the general population. Both of these risks are difficult to measure reliably. Guo has pointed out that in general the denominator will be underestimated while the numerator will be overestimated.8 As a result, estimates of λs are almost always positively biased. Review articles frequently quote λs but rarely provide much guidance to the data behind the quoted values. These data are often remarkably difficult to track down. They are invariably associated with wide confidence intervals, which are rarely, if ever, acknowledged in reviews. As epidemiological studies have become larger and more discriminating, the estimated value of λs has fallen in almost all complex traits, including multiple sclerosis (see Fig 1).9–15 Butterworth recently attempted to integrate the available epidemiological evidence relating to multiple sclerosis. He found that its lifetime incidence is likely to be higher than previously estimated, which would further reduce λs.16 The real value of λs seems likely to be very much less than 10, if not less than 5.
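In symbols, write K_S for the lifetime risk in siblings of an affected individual and K for the lifetime risk in the general population, with hats denoting estimates. The block below gives the definition and the direction of the bias Guo describes. It also gives, for a multiplicative model, the partition of λs across loci i:

$$\lambda_s = \frac{K_S}{K}, \qquad \hat{K}_S > K_S \ \text{and}\ \hat{K} < K \;\Longrightarrow\; \hat{\lambda}_s = \frac{\hat{K}_S}{\hat{K}} > \lambda_s, \qquad \lambda_s = \prod_i \lambda_{s,i}\ \ \text{(multiplicative model)}$$

The same ratio shows why a higher estimate of lifetime incidence (a larger K for a given K_S) lowers λs.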
The relationship between recurrence risk and the degree of relatedness can provide a useful guide to the mathematical model which most closely reflects the underlying genetic risk architecture.5 In multiple sclerosis such segregation analysis suggests that a multiplicative model with one major risk allele and many minor alleles provides the best fit.5, 17 This result is unsurprising given the biometric notion that susceptibility is likely to be determined by multiple variants, if not to some extent by all variants. In this situation we would expect log(risk) to be normally distributed. Under a multiplicative model, log(risk) is the sum of the contributions of many loci, and the sum of a large number of independent random variables tends towards a normal distribution (see BOX 1). In multiple sclerosis the relationship between relative recurrence risk and relatedness is decidedly non-linear (see Fig 2).10, 12, 13, 18 These data are consistent with a multiplicative model and imply that significant heterogeneity is unlikely.4 The linkage data in multiple sclerosis concur with these predictions, confirming that there is just one major risk allele in the disease, DRB1*1501.19 Based on the linkage data, the locus-specific λs for the MHC region as a whole is 1.5, while all other loci of relevance in multiple sclerosis have a λs of 1.2.19
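The non-linearity expected under a multiplicative model can be illustrated with a short sketch. It assumes, hypothetically, λs = 10 and a normally distributed log(risk) that is purely additive on the log scale, with no dominance. Relatives with kinship coefficient φ then share a covariance of 2φσ² in log(risk), where σ² = 2·logₑ(λs), which gives λR = λs^(4φ). For comparison, the usual single-locus/additive expectation is that λR − 1 halves with each degree of relationship.

```python
def lambda_multiplicative(lambda_s, kinship):
    """Relative recurrence risk under a log-normal, additive-on-log-scale model.

    Covariance of log(risk) between relatives = 2 * kinship * sigma^2 and
    sigma^2 = 2 * ln(lambda_s), so lambda_R = lambda_s ** (4 * kinship).
    """
    return lambda_s ** (4.0 * kinship)


def lambda_additive(lambda_s, degree):
    """Single-locus/additive expectation: lambda_R - 1 halves per degree of relationship."""
    return 1.0 + (lambda_s - 1.0) / 2.0 ** (degree - 1)


LAMBDA_S = 10.0  # hypothetical value, for illustration only
# Kinship coefficients: first-degree 1/4, second-degree 1/8, third-degree 1/16.
for degree, kinship in ((1, 0.25), (2, 0.125), (3, 0.0625)):
    print(degree,
          round(lambda_multiplicative(LAMBDA_S, kinship), 2),
          round(lambda_additive(LAMBDA_S, degree), 2))
```

In this sketch λR falls from 10 to about 3.2 and then 1.8 under the multiplicative model, against 5.5 and 3.25 under the additive expectation. That steeper fall is the non-linear pattern described above.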
Prior to any form of assessment, all individuals in a population have the same risk of disease (the population prevalence). In multiple sclerosis this prior risk is low (0.001).20 Susceptibility loci individually have only a modest effect on this prior probability. Even so, the ability to discriminate those who will, from those who will not, develop the disease inevitably increases with each additional relevant locus considered.21 It turns out, however, that even if all relevant loci were known and tested, disease could only be reliably predicted in relatively few individuals unless λs were very large.22 For multiple sclerosis λs is at best 10, indicating that very few individuals (<0.1%) would have a risk greater than 10% (see Fig 3). The distribution of risk shown in the figure reflects the combined effects of all risk alleles, known and as yet unknown. It thus represents the maximum level of information that could possibly be defined genetically. The vast majority of the population have a very similar level of risk. Indeed, on average the relative risk of the disease between any two individuals is just 11.3, a rather limited value in the context of a disease with a prevalence of 0.001. In other words, most of the population carry risk alleles, but only very few individuals carry a substantially larger than average number of them. In principle an individual could be homozygous for all known risk alleles and thereby have a very high risk of disease; however, such individuals are extremely uncommon. Most individuals carry similar levels of genetically determined risk, and relatively few can have their ultimate disease status accurately predicted from genetic testing (see BOX 2).23
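The two headline numbers here can be recovered from the log-normal approximation. The sketch assumes K = 0.001, λs = 10, σ² = 2·logₑ(λs) and μ = logₑ(K) − σ²/2; that parameterisation is our reading of the model, not a quoted formula. It also reads “on average the relative risk between any two individuals” as the exponential of the mean absolute difference in log(risk) between two randomly chosen people. That interpretation is an assumption, though it happens to reproduce 11.3.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

# Assumed inputs and parameterisation (our reading of the model).
K, LAMBDA_S = 0.001, 10.0
sigma = sqrt(2.0 * log(LAMBDA_S))        # SD of log(risk)
mu = log(K) - sigma ** 2 / 2.0           # mean of log(risk) in the population
population = NormalDist(mu, sigma)

# Fraction of the population whose lifetime risk exceeds 10%.
frac_above_10pct = 1.0 - population.cdf(log(0.10))

# log(risk_1 / risk_2) for two random people is Normal(0, 2 * sigma^2); its mean
# absolute value is sqrt(2) * sigma * sqrt(2 / pi). Exponentiating gives a
# geometric-mean fold difference in risk (assumed meaning of "on average").
mean_abs_log_ratio = sqrt(2.0) * sigma * sqrt(2.0 / pi)
typical_fold_difference = exp(mean_abs_log_ratio)

print(f"share above 10% risk: {frac_above_10pct:.4%}")
print(f"typical fold difference between two people: {typical_fold_difference:.1f}")
```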
It seems reasonable to expect that the ability to predict who will develop multiple sclerosis would have meaningful clinical benefits. It could, for example, allow expensive, invasive or potentially dangerous preventative strategies to be reserved for those at greatest risk. At first sight it also seems possible, if not probable, that genetic testing might enable such prediction. If all variants influencing susceptibility to multiple sclerosis had been defined, then in principle a “diagnostic chip” could be created. Such a chip would accurately genotype all these variants and determine an individual’s genetic risk (genetic profile39). It would thereby discriminate between those who will and those who will not develop the disease. Unfortunately, although this logic is seductive, in practice this approach would be unlikely to be useful in multiple sclerosis (see Fig 3).
For example, if we used this chip to screen a population of 100,000 newborns, then on average we would identify just 64 individuals with a risk of ≥ 10%. Ultimately only 14 of these would actually develop the disease. Since 100 of the screened individuals would ultimately be expected to develop multiple sclerosis, this genetic screening effort would also have missed most of the eventual cases (86/100). Including gender in our assessment adds very little. In a population of 100,000 (50,000 males and 50,000 females) we would expect to identify 61 females and 10 males with a risk of ≥ 10%. This total of 71 at-risk individuals is only slightly greater than the 64 identified on the basis of genetics alone, reflecting the modest extra information gained by including gender in the assessment.
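The screening arithmetic can be sketched as follows. The sketch assumes a normal log(risk) with σ² = 2·logₑ(λs) and μ = logₑ(K) − σ²/2 in the population, and the same σ with the mean raised by σ² among future cases. It uses K = 0.001 and λs = 10; the parameterisation is our assumption. Each expected count is the relevant group size times a normal tail probability.

```python
from math import log, sqrt
from statistics import NormalDist

# Assumed model parameters (log-normal approximation; parameterisation is our reading).
K, LAMBDA_S, N = 0.001, 10.0, 100_000
sigma = sqrt(2.0 * log(LAMBDA_S))
mu = log(K) - sigma ** 2 / 2.0
population = NormalDist(mu, sigma)           # log(risk) in everyone screened
cases = NormalDist(mu + sigma ** 2, sigma)   # log(risk) in those who develop MS

threshold = log(0.10)                        # flag anyone with risk >= 10%
flagged = N * (1.0 - population.cdf(threshold))
future_cases = N * K
cases_flagged = future_cases * (1.0 - cases.cdf(threshold))
cases_missed = future_cases - cases_flagged

print(round(flagged), round(cases_flagged), round(cases_missed))
```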
The relative proportion of false positives and false negatives clearly depends upon the threshold we choose to define people as being “at risk”. The Receiver Operating Characteristic (ROC) curve provides a useful way to summarise such data40 (see Fig B2). The ROC for the hypothetical chip described above shows that 50% of the cases occur amongst the 1.6% of the population at greatest risk. At first sight these figures seem appealing and suggest that genetic profiling might provide a useful way to identify a significant proportion of those at risk. However, the low absolute risk of multiple sclerosis (the prevalence, 0.001) implies a low positive predictive value. Even within this high-risk group, those who will ultimately develop the disease constitute only 3% of the total. If a preventative strategy were applied in this setting, the majority of those treated (97%) would be exposed unnecessarily, and the cost per case prevented would be >30 times the unit cost of the intervention. This level of risk (3%) is approximately the same as the familial recurrence risk in close relatives of affected individuals. A program in which preventative treatments were simply given to those with a family history of the disease might therefore be as effective, and would of course completely avoid the need for genotyping. In other words, in multiple sclerosis, as in other traits with strong familial clustering, genetic profiling would add very little beyond what could already be deduced from family history.24
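Here is a sketch of the corresponding ROC arithmetic. It uses the same assumed log-normal model: K = 0.001, λs = 10, σ² = 2·logₑ(λs) and μ = logₑ(K) − σ²/2, with log(risk) in cases shifted up by σ². Flagging everyone above the median log(risk) of cases captures half the cases by construction, and the positive predictive value follows from Bayes’ rule. The last figure assumes a perfectly effective intervention, for illustration. Under that assumption the number treated per future case equals the cost multiplier per case prevented.

```python
from math import log, sqrt
from statistics import NormalDist

# Assumed model parameters (log-normal approximation; parameterisation is our reading).
K, LAMBDA_S = 0.001, 10.0
sigma = sqrt(2.0 * log(LAMBDA_S))
mu = log(K) - sigma ** 2 / 2.0
population = NormalDist(mu, sigma)
cases = NormalDist(mu + sigma ** 2, sigma)


def screen_at(threshold):
    """Share flagged, sensitivity and PPV when flagging log(risk) > threshold."""
    flagged = 1.0 - population.cdf(threshold)
    sensitivity = 1.0 - cases.cdf(threshold)
    ppv = K * sensitivity / flagged   # P(case | flagged) by Bayes' rule
    return flagged, sensitivity, ppv


# Threshold capturing half of all future cases: the median log(risk) among cases.
flagged, sensitivity, ppv = screen_at(cases.median)
print(f"flagged {flagged:.2%}, sensitivity {sensitivity:.0%}, PPV {ppv:.1%}, "
      f"treated per future case {1 / ppv:.0f}")
```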
For interventions that are safe, non-invasive and cheap (i.e. that cost less per person than genotyping), screening would be pointless. It would be far more cost-effective simply to apply such interventions to the whole population in an unselected fashion. If the cost of an intervention were high, the absolute cost of a preventative programme would be prohibitive even if screening by genotyping were free. Clearly there is a middle ground where a program might be affordable, particularly when weighed against the full cost of the disease prevented. In this situation screening might provide a means to maximise the benefit from any investment by identifying those at greater risk. However, the health and financial costs to the large numbers of false-positive (treated) and false-negative (untreated) individuals would have to be very low for this to be a useful approach.
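The trade-off can be made concrete with a deliberately simplified cost sketch. Everything in it is hypothetical: unit costs are in arbitrary currency and the intervention is assumed to be perfectly effective. The screening rule treats the high-risk group holding half of all future cases, under the same assumed log-normal model (K = 0.001, λs = 10). The sketch compares two costs per future case reached. The first strategy treats everyone; the second genotypes everyone and treats only the high-risk group.

```python
from math import log, sqrt
from statistics import NormalDist

# Assumed model parameters (log-normal approximation; parameterisation is our reading).
K, LAMBDA_S = 0.001, 10.0
sigma = sqrt(2.0 * log(LAMBDA_S))
mu = log(K) - sigma ** 2 / 2.0
population = NormalDist(mu, sigma)
cases = NormalDist(mu + sigma ** 2, sigma)
# Hypothetical rule: treat the top-risk group that contains half of future cases.
flagged_share = 1.0 - population.cdf(cases.median)
cases_reached_share = 0.5


def cost_per_case_reached(genotype_cost, treatment_cost):
    """Per-person costs -> (treat everyone, genotype everyone then treat the flagged).

    Both results are costs per future case reached, assuming perfect efficacy.
    """
    treat_all = treatment_cost / K
    screen = (genotype_cost + flagged_share * treatment_cost) / (K * cases_reached_share)
    return treat_all, screen


for genotype_cost, treatment_cost in ((100, 50), (100, 10_000)):
    treat_all, screen = cost_per_case_reached(genotype_cost, treatment_cost)
    print(genotype_cost, treatment_cost, round(treat_all), round(screen))
```

With a cheap intervention, treating everyone is cheaper per case. With an expensive one, screening is cheaper per case reached, but it still leaves half the cases untreated and a large absolute bill.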
In considering the issue of prediction, it is also worth remembering that most of the individuals at very high risk will have a family history of the disease, even if they don’t eventually develop multiple sclerosis. To some extent, then, this genetic analysis adds relatively little information that cannot already be inferred from family history.24 In some sense the logic has come full circle: those individuals with the highest genetic risk will largely declare themselves ahead of typing by virtue of having affected relatives.
To date nine non-MHC susceptibility alleles have been established in multiple sclerosis (see Table),25–28 with many more expected to follow in the next few years. Together with the risk attributable to the MHC, all known loci account for a λs of approximately 1.6. The distribution of risk attributable to the currently known susceptibility alleles (MHC and non-MHC) is considerably narrower than that due to all loci (see Fig 4). It is clear from this figure that, based on current knowledge, genetic screening would only be able to identify very few individuals whose risk reached even a modest 1%.
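The narrowing of the distribution can be sketched by comparing λs = 1.6 (the known loci) with λs = 10 (all loci, at best). The sketch uses the same log-normal approximation with σ² = 2·logₑ(λs), μ = logₑ(K) − σ²/2 and K = 0.001, a parameterisation we assume. Treating the known loci as log-normal is itself an approximation, since they are a handful of discrete SNPs.

```python
from math import log, sqrt
from statistics import NormalDist


def share_above(risk, K=0.001, lambda_s=10.0):
    """Fraction of the population whose lifetime risk exceeds `risk`.

    Assumes log(risk) ~ Normal(mu, sigma^2) with sigma^2 = 2 * ln(lambda_s)
    and mu = ln(K) - sigma^2 / 2.
    """
    sigma = sqrt(2.0 * log(lambda_s))
    mu = log(K) - sigma ** 2 / 2.0
    return 1.0 - NormalDist(mu, sigma).cdf(log(risk))


for lam in (1.6, 10.0):
    print(lam, f"{share_above(0.01, lambda_s=lam):.4f}", f"{share_above(0.10, lambda_s=lam):.2e}")
```

Under these assumptions about 0.2% of people exceed a 1% lifetime risk when λs = 1.6, and essentially nobody exceeds 10%.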
Once an individual develops symptoms consistent with multiple sclerosis, the prior probability of the disease goes up significantly. We could therefore imagine that genetic testing might be more useful in refining diagnosis than in predicting disease. However, in this setting the utility of testing depends on typing SNPs which differentiate multiple sclerosis from the alternative diagnoses rather than from the general background population. It is not clear that susceptibility SNPs will achieve this unless the pathogenesis of the alternative diagnoses is clearly distinct (has a different underlying genetic architecture). In the case of clinically isolated syndromes (CIS), for example, it seems likely that those cases which do not progress to multiple sclerosis are simply milder versions of the same disease process. In this setting it is unlikely that the genetic architecture underlying cases that do not relapse will be significantly different from that underlying multiple sclerosis itself. Thus, although the prior probability of multiple sclerosis must be higher in neurology outpatient clinics, the utility of testing susceptibility SNPs is likely to be reduced. The more distinct an alternative diagnosis, the easier it will be to differentiate it from multiple sclerosis on clinical grounds. In other words, in those settings where genetic testing might help with differential diagnosis, the diagnosis is unlikely to be clinically challenging: whenever genetic testing might help, it seems likely that it won’t be needed.
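The point about the relevant comparison can be illustrated with Bayes’ rule on the odds scale. The prior and likelihood ratios below are invented for illustration. The genetic profile is assumed to be three times more likely in multiple sclerosis than in the general population. It is assumed equally likely in multiple sclerosis and in a hypothetical alternative diagnosis that shares the same genetic architecture (likelihood ratio 1). Using the population likelihood ratio in the clinic would wrongly suggest a large gain, whereas the ratio that matters leaves the probability unchanged.

```python
def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule on the odds scale: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical clinic scenario: half of the patients seen turn out to have MS.
clinic_prior = 0.5
# Misleading: likelihood ratio versus the general population (hypothetical value 3).
print(round(posterior_probability(clinic_prior, 3.0), 2))
# Relevant: versus an alternative diagnosis with the same genetic architecture (ratio 1).
print(round(posterior_probability(clinic_prior, 1.0), 2))
```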
Once the diagnosis of multiple sclerosis is established, we might ask whether genetic testing could help in predicting disease features such as course or severity. Unfortunately, available evidence suggests that the genetic influences on clinical features are significantly less marked than those influencing susceptibility.29 It is thus unsurprising that there has been little if any progress in identifying genetic variants that influence the course or the severity of the disease. It remains possible that such variants could be identified. Unless they were unexpectedly more influential than the effects determining susceptibility, however, testing for them seems unlikely to be any more productive than testing for susceptibility.
One consequence of the biometric model is that affected individuals are inevitably highly heterogeneous in terms of the particular set of susceptibility alleles they carry.23 In this setting, high levels of clinical heterogeneity might simply reflect the underlying heterogeneity in the distribution of risk alleles amongst cases. For example, severity might simply correlate with the absolute level of genetic risk. Once sufficient risk alleles are identified it should be possible to test this theory. If it were confirmed, genetic testing might contribute some information distinguishing CIS from multiple sclerosis.
Even without genotyping, we know of a number of factors which influence the risk of developing multiple sclerosis. Gender is the most obvious example. Compared with the population as a whole (see Fig 3), the risk distribution for females is shifted to the right while that for males is shifted to the left (see Fig 5). These shifts are modest and have little effect on the number of individuals at the extreme of risk. In other words, adding non-genetic risk factors such as gender to the risk assessment does little to improve the inferences that can be made about absolute risk. Combining extra information from demographic and perhaps ultimately environmental risk factors will certainly improve risk prediction. Examples include a past history of infectious mononucleosis, or smoking. It seems unlikely, however, that this will compensate for the low prior probability of developing multiple sclerosis. That would require either accounting for a considerable proportion of the risk or some form of strong interaction between genetic and environmental risk factors.
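The effect of gender can be sketched by giving each sex its own prevalence while keeping λs = 10. The sketch uses the same assumed log-normal approximation (σ² = 2·logₑ(λs), μ = logₑ(K) − σ²/2). The sex-specific prevalences below are an assumption: a 3:1 female-to-male ratio averaging 0.001. They were chosen because they give counts close to the 61 females and 10 males per 100,000 quoted earlier. Changing K only moves μ, which corresponds to the left or right shift described above.

```python
from math import log, sqrt
from statistics import NormalDist

LAMBDA_S = 10.0
sigma = sqrt(2.0 * log(LAMBDA_S))   # same spread for both sexes (assumption)


def expected_above(risk, K, n):
    """Expected number out of n people with lifetime risk above `risk` (log-normal model)."""
    mu = log(K) - sigma ** 2 / 2.0
    return n * (1.0 - NormalDist(mu, sigma).cdf(log(risk)))


# Assumed sex-specific prevalences: a 3:1 female:male ratio averaging 0.001.
females = expected_above(0.10, K=0.0015, n=50_000)
males = expected_above(0.10, K=0.0005, n=50_000)
both = expected_above(0.10, K=0.001, n=100_000)
print(round(females), round(males), round(both))
```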
The logic and conclusions outlined above are probably applicable to most complex traits. For most of them, λs has almost certainly been overestimated in the past and is in reality likely to be less than 10. In this setting the multiplicative biometric model indicates that very few individuals will carry a level of genetically determined risk that would allow confident prediction. This situation is common in medicine. For many conditions the majority of cases arise in the very large number of people at modestly increased risk rather than in the few people at very high risk (cf. blood pressure in stroke or coronary heart disease).30 Of course, genetic testing could be very much more useful if susceptibility to multiple sclerosis were in fact determined by a multitude of very rare alleles, each exerting a very large effect. However, the available data make this extremely unlikely. Segregation analysis argues against significant heterogeneity,5, 17 large extended families are practically unheard of,31 and there is no significant evidence for linkage outside the MHC.19 Indirect evidence from genome-wide association studies suggests that the polygenic/biometric model is likely the most relevant.32 The genetic analysis of complex disease has made phenomenal progress over the last few years.33 It was therefore inevitable, and appropriate, that researchers should consider what role this new knowledge might play in matters such as disease prediction. It was equally inevitable that some would anticipate great benefits34–36 and others recommend caution.22, 23, 37 In multiple sclerosis, our analysis suggests that the relatively low prevalence and modest familial clustering of the disease make genetic profiling unlikely to be of clinical benefit except in unusual circumstances.
It seems to us that we should not be distressed by the fact that, no matter how completely we understand the genetic basis for susceptibility to multiple sclerosis, we will rarely be able to predict who will develop disease: this was never the primary goal of these endeavours. Their influence on an individual’s risk of developing disease is only one dimension in which the relevance of these discoveries might be measured, and as it turns out a rather unimportant one. The Population Attributable Fraction (PAF) is the proportion of cases which would disappear if a risk factor were removed from a population.38 In terms of the PAF, these loci can be seen to represent enormous effects (see Table). We should also remember that virtually all of the variants identified to date are anonymous markers associated with disease, and it will take considerable further work to understand these associations. Fine mapping to establish the causal variants, and functional studies to understand how these variants are involved in pathogenesis, are only just beginning. Ultimately it is these aspects that are likely to be the most rewarding and enlightening. It is too soon to judge what value these discoveries will ultimately yield, but these benefits seem likely to be profound.
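Levin’s formula, PAF = p(RR − 1) / [1 + p(RR − 1)], shows how small per-allele effects can still give large population attributable fractions. Here p is the proportion of the population carrying the risk factor and RR its relative risk. The carrier frequencies and relative risks below are hypothetical, not values from the Table.

```python
def population_attributable_fraction(carrier_freq, relative_risk):
    """Levin's formula, treating carriage of the risk allele as the exposure."""
    excess = carrier_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical examples: a common allele of modest effect vs a rare allele of larger effect.
common = population_attributable_fraction(0.70, 1.2)
rare = population_attributable_fraction(0.01, 3.0)
print(round(common, 3), round(rare, 3))
```

In this illustration the common, weak allele accounts for roughly six times as large a share of cases as the rare, stronger one.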
Our discourse is not intended to undermine the entire notion of genetic profiling, only to put this issue into a more pragmatic and realistic context. For a disease like multiple sclerosis, where prevalence and λs are modest, it seems unlikely that risk profiling will find any meaningful role in clinical practice. On the other hand, such profiling could prove to be of much greater value in a research setting. The power of functional studies could be enhanced by concentrating on controls with lower levels of risk and cases with higher levels of risk. Similarly, unaffected individuals with high risk and affected individuals with low risk could be especially informative when trying to understand the role of the environment. As genetic factors influencing natural history and response to treatment emerge, prognostic and pharmacogenomic profiling might have far more clinical utility. For other diseases with much higher prevalence or considerably greater λs, risk profiling might have clinical utility. This is especially so if prediction could be focused on higher-risk subgroups defined by non-genetic testing or demographic features.35
This work was supported by the Wellcome Trust (084702/Z/08/Z), the Medical Research Council (G0700061 and U.1052.00.012.00001.01), the National Institutes of Health (R01 NS049477) and the Cambridge NIHR Biomedical Research Centre. We would like to thank all our colleagues in the International Multiple Sclerosis Genetics Consortium (IMSGC) and the Wellcome Trust Case Control Consortium (WTCCC) for their support and tireless efforts to move the genetics of multiple sclerosis forward.