Family studies do not have a significant role in the discovery or analysis of either common or rare disease-associated variants, both of which have relatively low penetrances at the individual level (Box 1 and the table below). This is the basis for the need for quite different strategies for the discovery of each type of variant. The discovery of common variants depends on large-scale genotyping of large numbers of cases and controls to be sure of the statistical significance of a suspected SNP association. The discovery of rare variants depends on extensive resequencing of carefully selected candidate genes in relatively large numbers of carefully chosen cases, together with a thorough analysis of the functional effects of any suspected variants. Both types of study assume that background genetic and environmental effects are averaged out, so that, in experimental design terminology, it is the ‘marginal’ effect of a variant that is being assessed.
Characteristics of common and rare disease variants compared
There is no doubt that WGAS have uncovered, and will continue to uncover, interesting and previously unknown polymorphic variants with measurable significant effects on a variety of common chronic diseases. Our analysis shows, however, that as the odds ratios for common variants will mostly be small, the penetrance of these variants will be very small, even though the contribution of an individual variant to the overall inherited susceptibility of a disease, as measured by the PAR, may be relatively large (Box 1). It is the penetrance, however, that determines the possibility of applying potential preventative approaches on the basis of whether an individual is a carrier of a variant. Small ORs make it very difficult to establish the functional basis for any particular association, and so to make a convincing contribution to understanding the etiology of the disease. Thus, whereas WGAS may make a major contribution to understanding the population genetic architecture of a disease, their practical applications in terms of understanding the etiology of a disease and in targeted prevention are likely to be very limited.
It seems likely that, considering the scale of studies so far carried out and the wide range of SNPs used, most of the associations with ORs around 1.2 or greater for the diseases so far studied may already have been found, at least in populations of European origin. There is always the possibility that positive interactions between two or more common variants may give rise to a much increased OR. This is, however, very difficult to test for, unless the marginal effects of the variants being tested for their interactions are themselves significant. Even then, the number of pairwise combinations to be assessed is likely to be prohibitive. Furthermore, it seems a priori unlikely that variants with small primary effects would give rise to significant interactions.
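To give a rough sense of the scale of the pairwise search described above, the following sketch counts the SNP pairs in a hypothetical panel and the per-test threshold that a Bonferroni correction would then require. The panel size of 500,000 SNPs, the 0.05 family-wise error rate and the choice of Bonferroni correction are illustrative assumptions, not values taken from the studies discussed.

```python
from math import comb


def pairwise_tests(n_variants: int) -> int:
    """Number of distinct unordered pairs among n_variants markers."""
    return comb(n_variants, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level keeping the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 500_000  # ASSUMPTION: illustrative panel size, not from the text
    n_pairs = pairwise_tests(n_snps)
    print(f"{n_pairs:,} pairwise tests")
    print(f"per-test threshold: {bonferroni_threshold(n_pairs):.1e}")
```

Restricting the search to variants whose marginal effects are already significant shrinks the number of pairs dramatically, which is why the marginal effects matter so much in practice.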
There remain two key questions. First, is there a long tail of low OR associations still to be found? Second, are there, as might be expected, different associations in non-European populations? The lower the OR, the larger the study needed to achieve statistical significance and the harder it will be to find an association against a background of inevitably increased environmental, and possibly ethnic, heterogeneity. There is a sort of uncertainty principle here, as variant effects merge into the effects of a variable background environment. Given the difficulty of applying even those results associated with larger ORs, it is a serious question whether it is cost-effective to do larger and larger studies simply to try to find out in more detail the population-specific genetic architecture of a disease. Genotype-by-environment effects will only be found by very large WGAS in different well-controlled environments that are not confounded by ethnic differences. It may well be questioned whether such studies are, in general, even possible, let alone worthwhile. It must be expected that the smaller the OR, the more likely it will be that environmental factors predominate.
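The dependence of study size on the OR can be illustrated with a standard textbook approximation, based on the large-sample variance of the log odds ratio in an unmatched case-control design with equal numbers of cases and controls. This is a minimal sketch and not the calculation used in any of the studies discussed; the function name, the carrier frequency of 0.3, the significance level of 5 × 10⁻⁸ and the power of 0.8 are all assumptions chosen for illustration.

```python
from math import ceil, log
from statistics import NormalDist


def cases_needed(odds_ratio: float, control_freq: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate number of cases (with an equal number of controls)
    needed to detect a given odds ratio with a two-sided test.

    control_freq is the carrier frequency among controls (ASSUMPTION:
    carriers vs non-carriers, individuals as the sampling unit).
    """
    p0 = control_freq
    # Carrier frequency among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    normal = NormalDist()
    z = normal.inv_cdf(1 - alpha / 2) + normal.inv_cdf(power)
    # n * Var(ln OR) for n cases and n controls (Woolf-type variance).
    scaled_variance = 1 / (p0 * (1 - p0)) + 1 / (p1 * (1 - p1))
    return ceil(z ** 2 * scaled_variance / log(odds_ratio) ** 2)


if __name__ == "__main__":
    for odds_ratio in (2.0, 1.5, 1.2, 1.1, 1.05):
        print(odds_ratio, cases_needed(odds_ratio, control_freq=0.3))
```

Because the required size scales roughly with 1/(ln OR)², each step down in OR demands a disproportionately larger study, before any allowance is made for the added heterogeneity that larger studies bring.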
Our analysis suggests that rare variants may make a substantial contribution to the multifactorial inheritance of common chronic diseases and may often have penetrances large enough to justify preventative screening strategies (Box 1). Thus, even though individual rare variants may not contribute much to the overall inherited tendency of a disease, their discovery is likely to be much more rewarding than that of common variants in terms of practical applications, including understanding disease etiology.
In order to meet the challenge of finding rare variants, it is critical that the resources of the newer DNA sequencing technologies are made available for rare variant searches to at least the same extent as SNP typing resources have been made available for WGAS.
There are two important ways in which studies of rare and common variants might intersect. The first is the possibility that common variants may act as significant modifiers of the effects of rare variants (see ref. 31 for an example). This could be investigated, for example, by looking at the effects of established common variants influencing breast cancer susceptibility on the ORs for putative rare variants at the BRCA1 and BRCA2 loci (Box 2). The second point of interaction is that the genes for which common variants are found, or genes nearby that may contain the functionally relevant variant, could be considered candidates for the search for rare variants. They may also then help identify the functional variant associated with a common disease variant.
BOX 2 Rare variants in BRCA1 and BRCA2
BRCA1 mutations as listed in the Breast Cancer Information Core database are considered clinically significant if they are associated with a clear-cut familial pattern of disease incidence. These are predominantly frameshift or nonsense mutations with obviously disruptive effects on gene function, with just a small proportion of missense changes. Variants classified as of ‘unknown significance’ (VUS) or as ‘not clinically significant’, mainly because they do not show familial aggregation, have a notably different distribution of changes. These BRCA1 variants are often hardly, if at all, referred to in reviews of breast cancer susceptibility (see ref. 32 for an example). A high proportion of the VUS are missense changes, with a very small proportion of frameshift or nonsense changes (Supplementary Table 1 online). The functional consequences of these missense changes can be assessed in the usual way, according to the probable severity of the effect of the amino acid change on the function of the gene product (see ref. 33 for an example). Intervening sequence changes are found relatively often in all three categories of BRCA1 variants. The noteworthy feature of these data on types of mutations is the similarity of the distributions for VUS and ‘not clinically significant’ variants, if we ignore the synonymous changes, to the distribution expected for rare variants. This suggests that most, at least of the missense changes, in the VUS and ‘not clinically significant’ categories may actually be rare variants that do have some clinical significance. Assessment of function on the basis of familial aggregation will completely miss the potential pathological significance of these categories of BRCA1 variants, because of their relatively low penetrances.
For the severe mutations, assuming a mutation rate per base pair of about 5 × 10⁻⁸ and, conservatively, a selective disadvantage of about 0.1, a penetrance of 1, and 1,000 mutations, their total contribution to the PAR per locus is about 2 × (5 × 10⁻⁸ / 0.1) × 1 × 1,000 = 0.001. For the missense mutations as rare variants, we can reasonably assume an OR of 2, an average frequency of 0.002, a population incidence for breast cancer of 0.1 and also 1,000 variants, giving a contribution to the PAR of 0.4. Thus, on the basis of these fairly conservative assumptions, the contribution of the VUS and ‘not clinically significant’ missense variants to the overall inherited risk of breast cancer would be 400 times that of the usual familial mutations. Given that the increased breast cancer risk to variant carriers could be between 10% and 20%, there is a strong case for considering some sort of genetic screening program for these variants, coupled with a more intensive breast cancer screening protocol for the carriers, once identified.
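The arithmetic above can be checked with a short sketch. The expression for the severe mutations is the one given in the text. The expression for the missense variants is an assumption: the text states only the inputs and the result (0.4), and the form 2 × frequency × (OR − 1) × incidence × number of variants is one reading that reproduces that figure; the definition in Box 1 may differ. The carrier risk is obtained by the usual conversion of an odds ratio to a risk, approximating the non-carrier risk by the population incidence.

```python
def odds_ratio_to_risk(odds_ratio: float, baseline_risk: float) -> float:
    """Risk in carriers implied by an odds ratio and a baseline risk."""
    carrier_odds = odds_ratio * baseline_risk / (1 - baseline_risk)
    return carrier_odds / (1 + carrier_odds)


# Severe (frameshift/nonsense) mutations: values and formula from the text.
mutation_rate = 5e-8          # per base pair
selective_disadvantage = 0.1
penetrance = 1.0
n_severe = 1_000
severe_par = 2 * (mutation_rate / selective_disadvantage) * penetrance * n_severe

# Missense rare variants: inputs from the text.
odds_ratio = 2.0
variant_freq = 0.002
incidence = 0.1
n_missense = 1_000
# ASSUMPTION: this formula is inferred so as to reproduce the stated 0.4;
# it is not quoted from the source.
missense_par = 2 * variant_freq * (odds_ratio - 1) * incidence * n_missense

ratio = missense_par / severe_par
carrier_risk = odds_ratio_to_risk(odds_ratio, incidence)

if __name__ == "__main__":
    print(f"severe PAR contribution:   {severe_par:.3f}")
    print(f"missense PAR contribution: {missense_par:.1f}")
    print(f"ratio:                     {ratio:.0f}")
    print(f"risk in carriers:          {carrier_risk:.2f}")
```

Under these inputs the carrier risk comes out between 10% and 20%, consistent with the range quoted above.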
How many rare variants does each of us carry? This is analogous to the classic question of genetic load and the average number of recessive lethals per individual. Given the likely average frequency of rare variants (though the frequency distribution is probably very skewed), and the many thousands of genes in which such variants could occur, it seems possible that the average number of rare variants per person could easily be ten or more. As it is almost exclusively the rare variants that are associated with high enough penetrances to influence individual prophylactic decisions, it is this type of low frequency variation, rather than the common polymorphic variation usually discussed, that may be much more likely to become the basis for some sort of personalized medicine.