Risk-based sampling can greatly improve the efficiency of cohort studies for assessing the contribution of uncommon genetic and environmental factors to risk of diseases that are not extremely rare and that have an etiology that is in part genetic. When appropriately targeted, such strategies will likely also improve compliance and data quality. Our power calculations document that the benefit of risk-based sampling is attributable to increasing the prevalence of uncommon risk alleles as well as to the expected accrual of additional cases, the latter being the more important contributor. Nonetheless, the increased accrual is likely the result of the combined impact of increases in the prevalence of many rare risk alleles and exposures.
Although we have documented substantial power gains, these gains are most modest for causative genetic variants that are very common (refer to the figures) or, equivalently, for protective genetic variants that are uncommon. However, causative variants may often have a prevalence in the range of 10 percent or less, where achieving adequate power will be the investigator's greatest challenge, especially for detection of gene-by-environment interactions.
Closely related designs have been used to study the risk of events associated with an elevated recurrence risk, such as pregnancy complications. For conditions expressed early in life, maternally mediated genetic effects may be important (12) because the mother's genotype can act on her phenotype during pregnancy and thereby influence her phenotype and that of her offspring. Such maternal influences may even be important for the development of later diseases, such as schizophrenia or breast cancer (13). In the presence of such a mechanism, use of these designs will again enrich the parental genotype distribution for causative genes compared with what would be seen with random sampling, potentially reflecting both maternal and offspring genotype effects.
To the extent that maternally mediated effects are biologically plausible, if the mothers of cohort (or case/control) participants are not themselves genotyped, then any estimated effect of the offspring genotype must be interpreted with caution. This is classic confounding: the maternal genotype can act as a cause of both the offspring genotype and the condition. So, particularly when studying a disease with onset early in life, one should consider genotyping both the offspring and the mothers (14). The implications of risk-based sampling for such a study are the subject of ongoing work.
One concern sometimes raised in the context of risk-based sampling, although with little empirical support, involves the idea that “bludgeon” genes may obstruct assessment of environmental effects. Consider the example of BRCA1 and BRCA2 genes in the sisters of women with breast cancer. It is estimated that about 0.2 percent of the population and 2 percent of breast cancer cases carry one or more risk alleles at these genes (www.cancer.gov/cancertopics/pdq/genetics/breast-and-ovarian/HealthProfessional/). If so, about 1 percent of sisters of cases are carriers. We estimate that in the Sister Study cohort, approximately 1,500 new cases will be diagnosed in 5 years of follow-up. On the basis of an odds ratio of 10 for mutation carriers, approximately 112 of these new cases will carry a deleterious BRCA1 or BRCA2 mutation. Thus, while the increase in prevalence for rare variants is substantial, the majority of cases of disease will not be attributable to this potent cause. Environmental factors should remain detectable. In particular, the study may also be able to identify environmental cofactors for the known breast cancer genes. The same reasoning would apply to any disease where the relative risk associated with having a first-degree relative who is affected is modest.
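The arithmetic behind this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation used for the figures above. The function names are hypothetical, and the sharing model (a carrier case is heterozygous, a full sister inherits the same allele with probability 1/2, and a sister who did not inherit it carries at the population rate) is a simplification introduced here. The sketch does not try to reproduce the estimate of 112 carrier cases, which depends on model details not given in the text; it only takes that number as an input.

```python
# Illustrative arithmetic only; inputs are the round figures quoted in the text.

def sister_carrier_prevalence(case_carrier_frac, population_carrier_frac,
                              share_prob=0.5):
    """Rough carrier prevalence among sisters of cases.

    Assumption (not from the source): a carrier case is heterozygous and a
    full sister inherits the same allele with probability share_prob; any
    sister who did not inherit it this way carries at the population rate.
    """
    inherited = case_carrier_frac * share_prob
    return inherited + (1.0 - inherited) * population_carrier_frac


def carrier_share_of_cases(carrier_cases, total_cases):
    """Fraction of incident cases expected to carry a deleterious mutation."""
    return carrier_cases / total_cases


# 2 percent of cases and 0.2 percent of the population are carriers (from the text).
prevalence_in_sisters = sister_carrier_prevalence(0.02, 0.002)

# About 112 carriers among about 1,500 new cases (from the text).
share = carrier_share_of_cases(112, 1500)

print(f"Carrier prevalence among sisters: {prevalence_in_sisters:.1%}")
print(f"Share of new cases that are carriers: {share:.1%}")
print(f"Share of new cases that are non-carriers: {1 - share:.1%}")
```

Under these assumptions the carrier prevalence among sisters comes out near 1 percent, consistent with the text, and carriers make up well under one tenth of the expected new cases, which is why most cases remain available for studying environmental factors.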
Two additional concerns are sometimes raised: does risk-based sampling limit generalizability of the findings, and does it limit the investigator's ability to study outcomes other than those selected for? Thus, for example, can findings for risk factors be taken to hold also for individuals who do not have a sibling with the condition under study? In addition, can other conditions be validly studied, for example, osteoporosis in the Sister Study cohort?
Even when risk has a familial pattern, however, most diseases are not strongly genetic. There is typically no reason to believe that the siblings of cases are fundamentally biologically different from individuals without an affected sibling. The concordance rate for breast cancer in identical twins is modest (15), suggesting that lifestyle and environmental factors play an important role. Moreover, studies of migrant populations reveal that for many complex diseases such as breast cancer, immigrants adopt the risk of their adopted country within a few generations (16), again suggesting that environmental and behavioral factors play a major role in determining susceptibility. While exposure relative risks that apply to first-degree relatives of cases may be slightly different from those in the general population, marked differences would not be expected. Neither the risk factors themselves nor the directions of association for those factors should be different for siblings of affected individuals.
A risk-based cohort is made up of volunteers, as is true of most cohort studies, and if one does not begin with a known sampling frame and achieve a high participation rate, then the “worried well” may be overrepresented. Although volunteer-based cohort studies enjoy internal validity, the measurable relative risks might not coincide precisely with those for the population at large. Nonetheless, risk-based sampling can greatly enhance our ability to design efficient prospective studies of complex conditions to identify both genetic and environmental contributors to risk.