We have presented a brief description of the methods and a few selected results derived from analyses of the 100K Affymetrix GeneChip with a large number of FHS traits, ranging from CVD events and subclinical measures to traditional cardiovascular risk factors (diabetes, lipid levels and blood pressure), and including more novel biomarker measures that reflect modern hypotheses, such as the role of inflammatory pathways in the development of CVD. We have also reported on a number of neurological, renal, cancer and aging traits, including longevity (age at death) and bone mass and structure. None of these manuscripts provides a comprehensive report. Rather, the purpose of this set of manuscripts is to provide a brief summary of the results and to introduce readers to the data posted on the dbGaP website (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007). We note that the genotypes in this sample have also been evaluated by Drs. Michael Christman and Alan Herbert. Some of their results are reported online, as described by Herbert et al.
Several aspects of our investigation merit comment. First, the present investigation represents a comprehensive GWAS analysis of numerous phenotypes in a large community-based cohort. To our knowledge, it is the largest GWAS performed in an observational cohort in terms of the number of phenotypes analyzed and web posted. Second, we exploited the phenotypic diversity and richness of the Framingham Offspring Study database to analyze a set of phenotypes that were for the most part collected by detailed, direct measurements of study participants. Further, many of the phenotypes are quantitative traits. Phenotypes have been broadly categorized into seventeen different domains for manuscripts in this supplement. It is noteworthy that key risk factor phenotypes, such as blood pressure and lipid levels, were collected at multiple examinations, and thus we were able to conduct analyses using time-averaged traits, maximizing the scientific yield from the longitudinal prospective design of our cohort study. Further, several recently collected phenotypes, in particular biomarkers and imaging measures, were collected using highly reproducible, state-of-the-art modalities. Correlated phenotypes facilitated the assessment of pleiotropy by seeking associations of SNPs with such phenotypes. These investigations occurred primarily among the variables in each individual manuscript. Finally, for most phenotypes, there was evidence for a significant heritable component from FHS or other studies. We acknowledge that some phenotypic domains may represent analytical constructs, rather than truly distinct groups from a biological standpoint.
Third, we have web-posted the results of all analyses on autosomes on dbGaP, including results without statistical evidence of association, so that investigators world-wide can access the data freely and mine them in silico for hypothesis generation, inclusion in meta-analysis, and direct comparisons with their own results. In addition to the freely posted aggregate results, participant-specific genotypic and phenotypic data are available to approved scientific investigators world-wide for further analyses via the NCBI/NHLBI, consistent with Framingham Study data distribution policies (see http://www.nhlbi.nih.gov/about/framingham/policies/index.htm). For the purpose of publication, reference to these analyses may be made by citing either the appropriate manuscript or the specific URL for the web-posted data.
Fourth, the simultaneous, full-disclosure release on the web of all association and linkage results for phenotypes encompassing at least 17 different domains, in a cohesive and comprehensive manner, reflects the tremendous teamwork of numerous FHS investigators, statisticians, programmers, and others. Most importantly, this effort would not have been possible without the full cooperation and commitment of the FHS participants, who continue to attend Study examinations in an effort to further the scientific knowledge of factors that lead to heart disease and other traits.
Fifth, as in any genome-wide association study with a large number of SNPs, most results that are considered statistically significant at a conventional p < 0.05 may be false positives, so it is difficult to decide which results are important. Not only do we have a large number of statistical tests for each phenotype, but we also have numerous phenotypes. Thus, considering multiple testing in the interpretation of results is of paramount importance. There are several approaches to address the issue of multiple testing, such as Bonferroni correction, permutation testing and false discovery rates. Conducting permutation testing for all of the traits that we considered would be prohibitively time-consuming, particularly if the heritability of the traits is to be preserved with family data. Further, with correlated traits it is difficult to decide which traits should be included in a permutation testing strategy. One approach to controlling the false-positive rate in genome-wide association studies is to set a stringent threshold for declaring statistical significance. According to the report of the International HapMap Consortium, complete testing of common variants (MAF > 0.05) in each 500 kb region is equivalent to performing 150 independent tests in white populations of European descent [88]. Using this guide and given that there are about 3000 Mb in the human genome, we would estimate that there are approximately 900,000–1,000,000 independent tests if testing all common variants in the genome. A conservative Bonferroni correction using this number of tests (0.05/1,000,000) yields an approximate genome-wide significance threshold of 5 × 10⁻⁸. Thus, for a single trait, one could use this threshold. Several results do fall below this threshold (Table of the Overview). In considering these results, we note that our sample size of 1345 biologically related subjects is relatively small for detecting genetic variants of modest effect. We also note that we have a large number of correlated traits, including the same traits with different covariate adjustments. Further, we have already observed that our GEE results have an excess number of small p-values. Thus, we are hesitant to regard any result reported in our manuscripts as significant at a genome-wide level. We believe these findings are best regarded as hypothesis-generating. The determination of what constitutes genome-wide significance is challenged by both theoretical and practical considerations. Without pursuing more computationally intensive analyses, it is thus difficult to provide specific advice regarding which SNPs are most important. It may be safer to assume that most of the small p-values are likely to be false positives and that replication of our results in other independent samples is of critical importance. We proceed with presentation of full-disclosure results to encourage readers to pursue such studies.
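The arithmetic behind the genome-wide threshold can be retraced in a few lines. The following is a minimal sketch that uses only the figures quoted above (a genome of about 3000 Mb, 150 independent tests per 500 kb, and a nominal alpha of 0.05); the function and constant names are ours and purely illustrative.

```python
# Back-of-the-envelope genome-wide significance threshold.
# Every constant below is a figure quoted in the text; the names are illustrative.

GENOME_SIZE_KB = 3_000 * 1_000   # about 3000 Mb, expressed in kb
WINDOW_KB = 500                  # window size used in the HapMap estimate
TESTS_PER_WINDOW = 150           # independent tests per 500 kb window
ALPHA = 0.05                     # conventional nominal significance level


def independent_tests(genome_kb, window_kb, tests_per_window):
    """Number of windows in the genome times independent tests per window."""
    return (genome_kb // window_kb) * tests_per_window


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests


n_tests = independent_tests(GENOME_SIZE_KB, WINDOW_KB, TESTS_PER_WINDOW)
threshold_exact = bonferroni_threshold(ALPHA, n_tests)        # uses 900,000 tests
threshold_rounded = bonferroni_threshold(ALPHA, 1_000_000)    # rounded up, as in the text

print(f"independent tests: {n_tests:,}")
print(f"threshold with {n_tests:,} tests: {threshold_exact:.2e}")
print(f"threshold with 1,000,000 tests: {threshold_rounded:.2e}")
```

Rounding the number of tests up to 1,000,000 gives the slightly more conservative of the two thresholds. Neither figure accounts for testing many correlated phenotypes, which is one reason we hesitate to declare genome-wide significance.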
Associations achieving nominal genome-wide significance (p < 5 × 10⁻⁸) across the 17 phenotype working groups
Sixth, we note that use of the 80% genotyping call rate is unusually liberal by today's standards in GWAS. We used this threshold in these manuscripts to be inclusive, rather than exclusive, in a first look such as this. We recognize that this threshold may permit consideration of some results that could be spurious due to problems with genotyping. A further limitation of our genotypes is that the genotype calls were made with the DM algorithm, which is less precise than algorithms that have recently been introduced. At this time, we are unable to apply more accurate, reliable genotype-calling algorithms [89], as we do not have access to the source data. Further, we found that the choice of the 80% threshold versus a more conservative one had little effect upon p-value distributions. Finally, all results, regardless of genotyping call rate, are posted on the dbGaP website; thus, investigators can evaluate for themselves which results from this study they believe to be more valid.
Seventh, in our analyses we found that the GEE results appear to have an excess of significant results. We suspect that one reason is low MAF. Given the small sample of at most 1345 subjects, even at a MAF of 10% we would expect only 13–14 individuals to be homozygous for the minor allele. Thus, we limited the results that we present in the manuscripts to those SNPs with MAF ≥ 10%. Further analyses have indicated that a linear mixed effects model, which incorporates the SNP as a covariate in a regression model whose error terms have a correlation structure that fully represents the familial correlations, remedies this problem and has a valid type I error rate in simulated data.
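The expected number of minor-allele homozygotes can be illustrated with a short calculation. This is a rough sketch that assumes Hardy-Weinberg proportions and treats the 1345 subjects as independent draws, which is a simplification for a family-based sample; the function name is ours.

```python
def expected_minor_homozygotes(n_subjects, maf):
    """Expected count of minor-allele homozygotes under Hardy-Weinberg
    equilibrium: n * q^2, where q is the minor allele frequency."""
    return n_subjects * maf ** 2


N_SUBJECTS = 1345  # maximum sample size quoted in the text

# Lower MAF values are included only to show how quickly the count shrinks.
for maf in (0.01, 0.05, 0.10, 0.20):
    count = expected_minor_homozygotes(N_SUBJECTS, maf)
    print(f"MAF = {maf:.2f}: about {count:.1f} expected minor homozygotes")
```

Under these assumptions, a SNP with a MAF of 10% yields between 13 and 14 expected minor homozygotes, and the count falls off quadratically below that frequency.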
Eighth, coverage of LD is incomplete with the 100K scan. Nicolae et al. report that the Affymetrix 100K GeneChip includes fewer SNPs in coding regions and more SNPs in intergenic regions than are represented in the HapMap [90]. Further, our sample size is modest. These two facts combined likely limit the power for detection of associations with several traits in these data. For instance, while we noted modest to high heritability of numerous phenotypes, underscoring the contribution of additive genetic effects to interindividual variation in these traits, we did not find significantly low p-values for several heritable traits in relation to the SNPs evaluated. Factors contributing to this observation include both the limited coverage of the Affymetrix 100K GeneChip and the possibility that some of the less significant p-values (for example, between 0.05 and 10⁻⁵) may represent true positive findings. The limited power to detect SNPs of small effect size in our relatively modest sample of ~1300 participants contributes to this phenomenon as well: we have high power only to detect a SNP explaining 4% or more of the phenotypic variance in the population-based GEE association test, and the power of FBAT and variance component linkage analysis is even lower.
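To give a sense of scale for the power statement, the sketch below approximates the power of a single-SNP, 1-degree-of-freedom association test for a quantitative trait. It is not the power calculation used for these manuscripts. It assumes unrelated subjects, a non-centrality parameter of n·q/(1 − q) for a SNP explaining a proportion q of the phenotypic variance, and a significance level of 5 × 10⁻⁸ chosen only for illustration. Relatedness among participants would lower the effective sample size and therefore the power.

```python
from statistics import NormalDist


def power_single_snp(n, var_explained, alpha):
    """Approximate power of a two-sided 1-df association test.

    Assumes n unrelated subjects and a SNP explaining `var_explained` of the
    trait variance. The 1-df test statistic is treated as (Z + sqrt(ncp))^2
    with Z standard normal, which is the non-central chi-square with 1 df.
    """
    std_normal = NormalDist()
    ncp_root = (n * var_explained / (1.0 - var_explained)) ** 0.5
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    upper_tail = 1.0 - std_normal.cdf(z_crit - ncp_root)
    lower_tail = std_normal.cdf(-z_crit - ncp_root)
    return upper_tail + lower_tail


N_SUBJECTS = 1345   # maximum sample size quoted in the text
ALPHA = 5e-8        # illustrative assumption, not stated for the power claim

for q in (0.01, 0.02, 0.04):
    print(f"variance explained = {q:.0%}: power = {power_single_snp(N_SUBJECTS, q, ALPHA):.3f}")
```

Under these simplifying assumptions, power is high for a SNP explaining 4% of the variance and falls off steeply for smaller effects, consistent with the qualitative point made above.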
Additionally, for several of the analyzed phenotypes we did not observe any overlap between the top SNP-phenotype associations noted in GEE and FBAT analyses. The inherent differences between the two analytical methods, especially in the context of the modest sample sizes (particularly for FBAT, with its small numbers of informative trios), may contribute to this phenomenon. FBAT is limited by the number of informative transmissions, whereas GEE is limited by potential bias due to stratification, although we suspect that there is little population stratification in our sample [76]. Furthermore, for several phenotypes the SNPs associated with the top LOD scores in linkage analyses were not among the top 50 SNPs in association analyses (GEE or FBAT).
Ninth, we were limited in our ability to replicate genetic variants previously reported to be associated with phenotypes in our database because specific coverage of such genetic variation in these candidates was limited in the Affymetrix 100K GeneChip. We view such analyses as more illustrative of the potential utility of our GWAS, rather than as definitive evidence for or against an association described with a putative candidate gene in the published literature.
Our data do suggest several interesting biological candidates among the SNPs most strongly associated with different traits in the various analytical approaches. The strongest and clearest associations were for those phenotypes that represent the direct protein product of a gene. Examples include the association of CRP concentrations with SNPs in the CRP gene (Benjamin et al. in this series [33]) and of factor VII levels with SNP rs561241 on chromosome 13 (Yang et al. in this series [34]). Thus, while it is difficult to point to any result as definitive, those results for which we find some evidence of replication of associations found in the literature are regarded as worthy of further research.
Finally, the Framingham Study participants were white individuals of European descent and predominantly middle-aged to elderly. Hence, the genetic associations may not be generalizable to other ethnicities/races or to younger individuals.