Much of the current work investigating the relationship between genetic variation and disease relies on genome-wide association studies (GWAS). This approach has been an important workhorse in genetic epidemiology for the past 5 years, as hundreds of SNPs have been associated with the risk of complex diseases, such as type 2 diabetes and osteoporosis, as well as with health-related intermediate traits such as elevated circulating low density lipoprotein (LDL) cholesterol levels [Hindorff et al., 2009]. While these associations have explained only a small proportion of the heritability of these traits, they have provided new leads toward a better understanding of the etiology and underlying biology of disease [Lou et al., 2009; Moffatt et al., 2007; Musunuru et al., 2010].
One could argue that the current paradigm of GWAS is limited and suffers from important shortcomings that inhibit the quest to further the understanding of the genetic contribution to disease and traits. Despite the decrease in cost over time, GWAS are generally limited in scope due to the cost of genotyping hundreds of thousands to millions of SNPs for each individual within a study. GWAS usually focus on a specific phenotypic outcome or series of measurements. The focus on a limited phenotypic domain, such as the presence or absence of a single disease, neglects the potential power gained through the use of intermediate phenotypes, sub-phenotypes, biomarkers, and endophenotypes that may more closely reflect a gene's mechanism. It also neglects the relationship between genetic variation and multiple diseases and phenotypes (pleiotropy). Finally, most published GWAS have been performed in populations of European descent, and there is still little characterization of the relationship between risk variants found in GWAS and disease and/or phenotypes in other racial/ethnic groups.
A complementary approach to GWAS is the “phenome-wide association study” (PheWAS). In PheWAS, the associations between a number of common genetic variants and a large and varied set of phenotypes are systematically characterized. Phenotypic/genotypic resources amenable to the PheWAS approach are large-scale epidemiologic studies with comprehensive collections of well-characterized phenotypic measurements and environmental exposures recorded prospectively or retrospectively for thousands of participants, such as the studies of the Population Architecture using Genomics and Epidemiology (PAGE) network. Electronic medical record (EMR) resources coupled to genotypic data can also be used for PheWAS, as they likewise contain rich phenotypic and genotypic information.
The EMR PheWAS approach was recently employed successfully by Denny et al. in a proof-of-concept investigation using BioVU [Ritchie et al., 2010; Roden et al., 2008], Vanderbilt University's biobank. In this PheWAS, five SNPs selected from the candidate gene and GWAS literature were tested for associations with ICD-9 codes in several thousand patients. The investigators were able to detect both the original associations (for atrial fibrillation, Crohn's disease, carotid artery stenosis, coronary artery disease, multiple sclerosis, systemic lupus erythematosus, and rheumatoid arthritis), as well as potentially novel associations with other clinical outcomes/conditions in the EMR [Denny et al., 2010].
While ground-breaking, the sole PheWAS published in the literature has limitations that can be addressed through the use of data from the studies of the PAGE network. Like the initial GWAS in the literature, the initial PheWAS focused primarily on binary disease traits in European Americans from a clinic setting. The studies of PAGE include quantitative measurements, detailed disease status (incident or prevalent depending on the study), and in most cases longitudinal follow-up information. They therefore have the potential for increased power to detect genotypic/phenotypic relationships, as well as to characterize those relationships across race/ethnicity in a population-based manner. In addition, the phenotypic measurements/outcomes of PAGE are collected using standardized protocols, thus reducing measurement error and facilitating phenotype harmonization between studies.
As stated above, the PheWAS analysis represents tests of association between a large number of SNPs and the phenotypes and traits available in PAGE, and is meant to be high-throughput. As such, results of these first-pass analyses are considered hypothesis-generating and require additional scrutiny before the findings are considered for follow-up. This hypothesis-generating exercise is unlike the directed, a priori hypothesis-testing within PAGE, whereby specific SNPs hypothesized to be associated with specific phenotypes are tested only for those phenotypes. These a priori analyses include careful phenotype harmonization for traits and outcomes that overlap across two or more PAGE studies, as well as considerable investigation of the possible effect of covariates such as age, gender, and environmental exposure(s) on the association between genetic variation and phenotypic outcome. The advantage of these more carefully directed analyses over PheWAS is the potential for identification and characterization of genetic modifiers and for accurate effect estimates. Also, because much effort has been expended to harmonize phenotypes across studies at the initiation of the study, less effort is required to scrutinize the results from these analyses compared with PheWAS. Despite the advantages of the a priori driven approach, the hypothesis-generating PheWAS offers the opportunity to identify unsuspected genotypic and phenotypic relationships for further investigation, which can then include these forms of more thorough model characterization.
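To make the idea of a high-throughput, first-pass scan concrete, the following is a minimal sketch in Python. It is not the PAGE analysis pipeline. It assumes quantitative phenotypes, an additive genotype coding (0/1/2 copies of the coded allele), an unadjusted simple linear regression for every SNP-phenotype pair, a large-sample normal approximation for the p-value, and a Bonferroni threshold for flagging results. The function names (`association_test`, `phewas_scan`), the SNP and phenotype labels, and the toy data are all hypothetical and serve only as illustration.

```python
import math
from itertools import product


def association_test(genotypes, phenotype):
    """Regress a quantitative phenotype on additive genotype dosage (0/1/2).

    Returns (beta, z, p), or None if the test cannot be run.
    Assumption: the p-value uses a normal approximation to the test
    statistic, which is only reasonable for large samples.
    """
    # Keep only individuals with both genotype and phenotype observed.
    pairs = [(g, y) for g, y in zip(genotypes, phenotype)
             if g is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return None
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    if sxx == 0:  # monomorphic SNP in this sample: nothing to test
        return None
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in pairs)
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:  # perfect fit: test statistic undefined
        return None
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, z, p


def phewas_scan(snps, phenotypes):
    """Test every SNP against every phenotype; return results sorted by p."""
    results = []
    for (snp, geno), (trait, values) in product(snps.items(), phenotypes.items()):
        fit = association_test(geno, values)
        if fit is not None:
            beta, z, p = fit
            results.append({"snp": snp, "phenotype": trait,
                            "beta": beta, "z": z, "p": p})
    results.sort(key=lambda r: r["p"])
    return results


# Hypothetical toy data: 2 SNPs and 2 phenotypes measured in 10 individuals.
snps = {
    "rs_demo_1": [0, 0, 1, 1, 2, 2, 0, 1, 2, 1],
    "rs_demo_2": [0, 1, 0, 2, 1, 0, 2, 1, 0, 1],
}
phenotypes = {
    "ldl": [100, 104, 119, 121, 142, 138, 101, 120, 141, 118],
    "height": [170, 165, 172, 168, 171, 166, 169, 173, 167, 170],
}

results = phewas_scan(snps, phenotypes)
threshold = 0.05 / len(results)  # Bonferroni correction over all tests run
for r in results:
    flag = "follow up" if r["p"] < threshold else ""
    print(f'{r["snp"]:10s} {r["phenotype"]:7s} '
          f'beta={r["beta"]:8.3f} p={r["p"]:.3g} {flag}')
```

In this sketch, any pair passing the threshold is only a candidate for the additional scrutiny described above. A real analysis would need covariate adjustment, appropriate models for binary and time-to-event outcomes, stratification by race/ethnicity, and a multiple-testing strategy that accounts for correlation among phenotypes.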
We describe herein the conceptual framework and design of the high-throughput, first-pass analysis of the relationship between all risk variants genotyped thus far within PAGE and the comprehensive phenotypic resources of the PAGE network, using this PheWAS approach. The results of this scan across SNPs and phenotypes can be used to discover novel relationships between SNPs, phenotypes, and networks of phenotypes to foster hypothesis generation. This manuscript presents the infrastructure and methodology, as well as insights gained in this PAGE-directed project, to benefit the larger scientific community.