Our work shows that information routinely collected about family history in GWA studies can provide new and independent evidence of associations between genetic variants and disease at low marginal cost. The most exciting application of this family-history-based method is the feasibility of a GWA study of a rare disease using genotypes from persons previously studied in case-control or cohort studies for other purposes, when history of the rare disease in family members is available. The method can also add to the information available from cohort members or cases and controls; the method allows one to learn about genetic associations with a sex-specific disease, such as ovarian cancer, using male as well as female study participants.
Like the kin-cohort method,19,20 the family-history-based method indirectly assesses genetic susceptibility using error-in-variable methods facilitated by knowledge of the error mechanism from Mendelian principles.21 The kin-cohort method estimates penetrance, or risk of disease, in those with a high-risk genotype, whereas the proposed family-history-based approach tests for genetic association. Although the odds ratio estimate from the family-history-based approach is a biased estimate of the effect of the SNP on disease risk, the method provides an unbiased test for association.
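The attenuation of the odds ratio, and the validity of the test despite it, can be sketched with a short derivation. The notation here is ours, introduced only for illustration: G_P and G_R are the minor-allele counts of the participant and of a first-degree relative, D_R is the relative's disease status, q is the minor allele frequency, and β is the per-allele log odds ratio. The sketch assumes random mating, Hardy-Weinberg equilibrium, an additive logistic model, and a small effect size.

```latex
\begin{align*}
\mathrm{E}[G_R \mid G_P] &= q + \tfrac{1}{2}\, G_P, \\
\operatorname{logit} \Pr(D_R = 1 \mid G_R) &= \alpha + \beta\, G_R, \\
\operatorname{logit} \Pr(D_R = 1 \mid G_P) &\approx (\alpha + \beta q) + \tfrac{\beta}{2}\, G_P .
\end{align*}
```

To first order in β, the slope on the participant's genotype is about half the true per-allele effect, so the estimated odds ratio is biased toward the null. When β = 0 the slope is exactly zero, which is consistent with the test of association remaining valid.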
Evidence from the family-history-based method may be useful either for discovery of associations or as supplementary information when prior data already exist. The information provided by the family-history-based approach can add power to conventional studies or to pooled analyses. When the method is used by itself, the usual cautions for GWA studies apply.
Thornton et al.1 used family history information, together with the disease status of the study participants, to gain power in a standard analysis of case-control data. These authors exploited the information on disease status in relatives to construct optimal weights for study participants. Our family-history-based approach uses the disease history of relatives, notably older ones (for adult cancer), to study association. This approach does not use disease status of the participants, which may be unavailable at the start of a prospective cohort study. We emphasize our ability to study diseases that are not the focus of the main study; a case-control study of one disease can contribute information about risks of many others, and a cohort study that enrolls middle-aged subjects can be useful for investigating associations with a late-onset disease such as prostate cancer or Alzheimer disease, without having to wait for cases to accrue in the cohort.
When comparing the p-values from the family-history-based and disease-based analyses, we should note that the prostate-cancer cases were part of the discovery set for the prostate cancer GWA study. This naturally biases the disease-based analysis favorably for the SNPs that were discovered by that study and gives a spurious power advantage over the family-history-based analysis. However, there is real attenuation in power for the family-history-based method due to measurement error from using participants’ genotypes in studies of relatives’ disease, and this attenuation is far more important. For example, one might genotype men and collect information on their fathers’ disease status. The weak correlation between their genotypes contributes to the loss in power. Other sources of power loss are inaccurate reports of disease status of family members, leading to phenotype misclassification, and the smaller number of men with affected fathers compared with the number of cases in case-control studies, where cases are oversampled. This attenuation in power may be partially countered by larger sample sizes or by collecting information from several family members.
We used simulations to compare the performances of the family-history-based and disease-based methods. For odds ratios ranging from 1.2 to 1.8, we studied relative efficiencies (defined as the inverse ratio of sample sizes needed for 70% power at the 0.05 level of significance) of family-history-based analyses, as we increased the number of family members, compared with the standard case-control analysis. We fixed disease prevalence at 8% and considered a SNP with 30% minor allele frequency for our simulations. The family-history-based test with a single first-degree relative requires approximately 4 times the sample size of a standard case-control test for association; as we gather information on more relatives, this ratio decreases (eFigure 2). To check the sensitivity of our observations across various parameters, we performed these simulations for different values of disease prevalence and minor allele frequency (eTable 1). We found that disease prevalence is the primary determinant of the comparative performance of the family-history-based and case-control analyses.
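One heuristic for the sample-size ratio of roughly 4 with a single first-degree relative is the genotype correlation between relatives. The following minimal sketch is not the simulation described above: it models no disease outcome, prevalence, or case oversampling, and only checks the father-son genotype correlation under assumed random mating and Hardy-Weinberg equilibrium. All function names are hypothetical, and the 30% minor allele frequency is borrowed from the text.

```python
import random
import statistics


def draw_genotype(maf, rng):
    """Minor-allele count (0, 1, or 2), assuming Hardy-Weinberg equilibrium."""
    return int(rng.random() < maf) + int(rng.random() < maf)


def transmit(genotype, rng):
    """Allele (0 or 1) passed on by a parent with the given minor-allele count."""
    if genotype == 1:
        return int(rng.random() < 0.5)  # heterozygote: either allele, equal odds
    return genotype // 2                # homozygote: 0 -> 0, 2 -> 1


def simulate_pairs(n_families, maf, seed=0):
    """Simulate father and son genotypes under Mendelian transmission."""
    rng = random.Random(seed)
    fathers, sons = [], []
    for _ in range(n_families):
        father = draw_genotype(maf, rng)
        mother = draw_genotype(maf, rng)
        fathers.append(father)
        sons.append(transmit(father, rng) + transmit(mother, rng))
    return fathers, sons


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


fathers, sons = simulate_pairs(n_families=50_000, maf=0.30)
r = pearson(fathers, sons)
print(f"father-son genotype correlation: {r:.3f}")
print(f"heuristic sample-size inflation 1/r^2: {1 / r**2:.2f}")
```

The theoretical correlation between first-degree relatives' genotypes is 0.5 under these assumptions, so 1/r² is close to 4. This is only a rough guide: as noted above, disease prevalence and the oversampling of cases in case-control designs also drive the comparison.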
The quality of the reports on relatives’ disease history is the central concern in using reported family history. In the absence of electronically linkable medical records or on-line family history tools, accuracy of study participants’ reports of relatives’ disease history will vary by characteristics of the participants, the relatives and the disease. Mai et al.22 found that reports on breast, prostate, colorectal and lung cancer in family members of participants in the population-based 2001 Connecticut Family Health Study had low-to-moderate sensitivity and positive predictive value, but high specificity and negative predictive value. Participants’ knowledge of disease in family members varied with the disease (reported history of breast cancer in family members had the highest sensitivity, while colorectal cancer had the lowest) and with the degree of relatedness to the participant (reports on first-degree relatives were more accurate than reports on second-degree relatives). Accuracy of reports also depends on other factors: disease status, age, ethnicity, sex and family size of the participant, as well as age and sex of the family members. Family history was not directly verified in the Cancer Screening Trial, but Pinsky et al.16 indirectly assessed validity of reported family history by comparing reported rates of various cancers in family members with expected rates derived from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database. Overall, the authors observed a ratio of reported rates to expected rates of approximately 0.7; for most cancers, this ratio was in the range of 0.6–1.0 for women and 0.3–0.8 for men. Incomplete or inaccurate family history information collected from participants can reduce power and, when reporting accuracy is differential between study participants with and without family history (as is most likely the case), can induce false-positive reports.
Different family-history variables might be appropriate for different settings. We show that the family-history-based method works with even the simplest yes/no family history variable, which might be the most common in existing studies and is the cheapest to obtain for future questionnaires. We observe that participants reporting family history of cancer had more first-degree relatives on average than participants with no family history; this suggests that formulations of family history taking into account family size (such as proportion of affected family members) may perform better. With additional details, we can construct a family history score23 based on family structure and risk covariates for family members. As a first step, we fitted a polytomous logistic regression (eTable 2, column 3 [http://links.lww.com]) treating the response variable (family history of prostate cancer) as a nominal variable (none, one, and multiple family members with prostate cancer); we saw no substantial improvement.
We assume an additive mode of inheritance in our analyses. To check sensitivity of our results to the modeling assumptions, we also fit the general two-degrees-of-freedom model (eTable 2, columns 4 and 5 [http://links.lww.com]). The observed pattern holds even under the general model. The two-degrees-of-freedom model changes the p-values only marginally in an analysis that does not adjust for population substructure (eTable 2, columns 6 and 7 [http://links.lww.com]).
Information on family history of diseases can add value to association studies without additional genotyping. Family history information can supplement disease-based associations, and can be particularly useful when deciding on a set of variants for follow-up in further studies. It may be possible to increase the power of conventional disease-based association studies by combining information on disease status for participants and their relatives, accounting for correlation. In this context, the quasi-likelihood score test proposed by Thornton et al.1 can be used to construct pedigree-based weights and thus improve power. The power gain from combining family history with disease status may be particularly useful in assessing SNPs with borderline significance based on standard analysis of the study population.