1.  WikiGWA: an open platform for collecting and using genome-wide association results 
The number of discovered genetic variants from genome-wide association (GWA) studies (GWAS) has been growing rapidly. Centralized efforts such as the National Human Genome Research Institute's GWAS catalog provide regular updates and a convenient interface for quick lookup. However, the catalog entries are manually curated and rely on data from published articles. Other tools such as SNPedia collect published results regarding functional consequences of genetic variations. Here, we propose an approach that allows individual investigators to share their GWA results through an open platform. Unlike the GWAS catalog or SNPedia, wikiGWA collects first-hand GWAS results, and on a much larger scale. Investigators are not only able to post a much larger amount of results, but can also post results from unpublished studies, which could alleviate publication bias and facilitate identification of weak signals. Our interface allows for flexible and fast queries, and the query results are formatted to work seamlessly with the LocusZoom program for visualization and annotation. Here we describe wikiGWA, made publicly available at
PMCID: PMC3598322  PMID: 22929026
genome-wide association; open platform; bioinformatics
2.  Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data 
Nature biotechnology  2013;31(12):1102-1110.
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
PMCID: PMC3969265  PMID: 24270849
3.  Pleiotropic Associations of Risk Variants Identified for Other Cancers With Lung Cancer Risk: The PAGE and TRICL Consortia 
Genome-wide association studies have identified hundreds of genetic variants associated with specific cancers. A few of these risk regions have been associated with more than one cancer site; however, a systematic evaluation of the associations between risk variants for other cancers and lung cancer risk has yet to be performed.
We included 18023 patients with lung cancer and 60543 control subjects from two consortia, Population Architecture using Genomics and Epidemiology (PAGE) and Transdisciplinary Research in Cancer of the Lung (TRICL). We examined 165 single-nucleotide polymorphisms (SNPs) that were previously associated with at least one of 16 non–lung cancer sites. Study-specific logistic regression results underwent meta-analysis, and associations were also examined by race/ethnicity, histological cell type, sex, and smoking status. A Bonferroni-corrected P value of 2.5×10–5 was used to assign statistical significance.
The breast cancer SNP LSP1 rs3817198 was associated with an increased risk of lung cancer (odds ratio [OR] = 1.10; 95% confidence interval [CI] = 1.05 to 1.14; P = 2.8×10–6). This association was strongest for women with adenocarcinoma (P = 1.2×10–4) and not statistically significant in men (P = .14) with this cell type (P het by sex = .10). Two glioma risk variants, TERT rs2853676 and CDKN2BAS1 rs4977756, which are located in regions previously associated with lung cancer, were associated with increased risk of adenocarcinoma (OR = 1.16; 95% CI = 1.10 to 1.22; P = 1.1×10–8) and squamous cell carcinoma (OR = 1.13; 95% CI = 1.07 to 1.19; P = 2.5×10–5), respectively.
Our findings demonstrate a novel pleiotropic association between the breast cancer LSP1 risk region marked by variant rs3817198 and lung cancer risk.
PMCID: PMC3982896  PMID: 24681604
4.  A survey of informatics approaches to whole-exome and whole-genome clinical reporting in the electronic health record 
Genome-scale clinical sequencing is being adopted more broadly in medical practice. The National Institutes of Health developed the Clinical Sequencing Exploratory Research (CSER) program to guide implementation and dissemination of best practices for the integration of sequencing into clinical care. This study describes and compares the state of the art of incorporating whole-exome and whole-genome sequencing results into the electronic health record, including approaches to decision support across the six current CSER sites.
The CSER Medical Record Working Group collaboratively developed and completed an in-depth survey to assess the communication of genome-scale data into the electronic health record. We summarized commonalities and divergent approaches.
Despite common sequencing platform (Illumina) adoptions, there is a great diversity of approaches to annotation tools and workflow, as well as to report generation. At all sites, reports are human-readable structured documents available as passive decision support in the electronic health record. Active decision support is in early implementation at two sites.
The parallel efforts across CSER sites in the creation of systems for report generation and integration of reports into the electronic health record, as well as the lack of standardized approaches to interfacing with variant databases to create active clinical decision support, create opportunities for cross-site and vendor collaborations.
PMCID: PMC3951437  PMID: 24071794
clinical decision support; clinical sequencing; decision support rules; electronic health record; electronic medical record; next-generation sequencing
5.  Pleiotropy of Cancer Susceptibility Variants on the Risk of Non-Hodgkin Lymphoma: The PAGE Consortium 
PLoS ONE  2014;9(3):e89791.
Risk of non-Hodgkin lymphoma (NHL) is higher among individuals with a family history or a prior diagnosis of other cancers. Genome-wide association studies (GWAS) have suggested that some genetic susceptibility variants are associated with multiple complex traits (pleiotropy).
We investigated whether common risk variants identified in cancer GWAS may also increase the risk of developing NHL as the first primary cancer.
As part of the Population Architecture using Genomics and Epidemiology (PAGE) consortium, 113 cancer risk variants were analyzed in 1,441 NHL cases and 24,183 controls from three studies (BioVU, Multiethnic Cohort Study, Women's Health Initiative) for their association with the risk of overall NHL and common subtypes [diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), chronic lymphocytic leukemia or small lymphocytic lymphoma (CLL/SLL)] using an additive genetic model adjusted for age, sex and ethnicity. Study-specific results for each variant were meta-analyzed across studies.
The analysis of NHL subtype-specific GWAS SNPs and overall NHL suggested a shared genetic susceptibility between FL and DLBCL, particularly involving variants in the major histocompatibility complex region (rs6457327 in 6p21.33: FL OR = 1.29, p = 0.013; DLBCL OR = 1.23, p = 0.013; NHL OR = 1.22, p = 5.9×E-05). In the pleiotropy analysis, six risk variants for other cancers were associated with NHL risk, including variants for lung (rs401681 in TERT: OR per C allele = 0.89, p = 3.7×E-03; rs4975616 in TERT: OR per A allele = 0.90, p = 0.01; rs3131379 in MSH5: OR per T allele = 1.16, p = 0.03), prostate (rs7679673 in TET2: OR per C allele = 0.89, p = 5.7×E-03; rs10993994 in MSMB: OR per T allele = 1.09, p = 0.04), and breast (rs3817198 in LSP1: OR per C allele = 1.12, p = 0.01) cancers, but none of these associations remained significant after multiple test correction.
This study does not support strong pleiotropic effects of non-NHL cancer risk variants in NHL etiology; however, larger studies are warranted.
PMCID: PMC3943855  PMID: 24598796
6.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations 
Nucleic Acids Research  2013;42(D1):D1001-D1006.
The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog) provides a publicly available manually curated collection of published GWAS assaying at least 100 000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P < 1 × 10−5. The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs’ chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.
PMCID: PMC3965119  PMID: 24316577
7.  Genetic variants associated with fasting glucose and insulin concentrations in an ethnically diverse population: results from the Population Architecture using Genomics and Epidemiology (PAGE) study 
BMC Medical Genetics  2013;14:98.
Multiple genome-wide association studies (GWAS) within European populations have implicated common genetic variants associated with insulin and glucose concentrations. In contrast, few studies have been conducted within minority groups, which carry the highest burden of impaired glucose homeostasis and type 2 diabetes in the U.S.
As part of the Population Architecture using Genomics and Epidemiology (PAGE) Consortium, we investigated the association of up to 10 GWAS-identified single nucleotide polymorphisms (SNPs) in 8 genetic regions with glucose or insulin concentrations in up to 36,579 non-diabetic subjects including 23,323 European Americans (EA), 7,526 African Americans (AA), 3,140 Hispanics, 1,779 American Indians (AI), and 811 Asians. We estimated the association between each SNP and fasting glucose or log-transformed fasting insulin, followed by meta-analysis to combine results across PAGE sites.
Overall, our results show that 9/9 GWAS SNPs are associated with glucose in EA (p = 0.04 to 9 × 10−15), versus 3/9 in AA (p = 0.03 to 6 × 10−5), 3/4 SNPs in Hispanics, 2/4 SNPs in AI, and 1/2 SNPs in Asians. For insulin we observed a significant association with rs780094/GCKR in EA, Hispanics and AI only.
Generalization of results across multiple racial/ethnic groups helps confirm the relevance of some of these loci for glucose and insulin metabolism. Lack of association in non-EA groups may be due to insufficient power, or to unique patterns of linkage disequilibrium.
PMCID: PMC3849560  PMID: 24063630
8.  Generalization and Dilution of Association Results from European GWAS in Populations of Non-European Ancestry: The PAGE Study 
PLoS Biology  2013;11(9):e1001661.
A multi-ethnic study demonstrates that the extrapolation of genetic disease risk models from European populations to other ethnicities is compromised more strongly by genetic structure than by environmental or global genetic background in differential genetic risk associations across ethnicities.
The vast majority of genome-wide association study (GWAS) findings reported to date are from populations with European Ancestry (EA), and it is not yet clear how broadly the genetic associations described will generalize to populations of diverse ancestry. The Population Architecture Using Genomics and Epidemiology (PAGE) study is a consortium of multi-ancestry, population-based studies formed with the objective of refining our understanding of the genetic architecture of common traits emerging from GWAS. In the present analysis of five common diseases and traits, including body mass index, type 2 diabetes, and lipid levels, we compare direction and magnitude of effects for GWAS-identified variants in multiple non-EA populations against EA findings. We demonstrate that, in all populations analyzed, a significant majority of GWAS-identified variants have allelic associations in the same direction as in EA, with none showing a statistically significant effect in the opposite direction, after adjustment for multiple testing. However, 25% of tagSNPs identified in EA GWAS have significantly different effect sizes in at least one non-EA population, and these differential effects were most frequent in African Americans where all differential effects were diluted toward the null. We demonstrate that differential LD between tagSNPs and functional variants within populations contributes significantly to dilute effect sizes in this population. Although most variants identified from GWAS in EA populations generalize to all non-EA populations assessed, genetic models derived from GWAS findings in EA may generate spurious results in non-EA populations due to differential effect sizes. Regardless of the origin of the differential effects, caution should be exercised in applying any genetic risk prediction model based on tagSNPs outside of the ancestry group in which it was derived. 
Models based directly on functional variation may generalize more robustly, but the identification of functional variants remains challenging.
Author Summary
The number of known associations between human diseases and common genetic variants has grown dramatically in the past decade, most being identified in large-scale genetic studies of people of Western European origin. But because the frequencies of genetic variants can differ substantially between continental populations, it's important to assess how well these associations can be extended to populations with different continental ancestry. Are the correlations between genetic variants, disease endpoints, and risk factors consistent enough for genetic risk models to be reliably applied across different ancestries? Here we describe a systematic analysis of disease outcome and risk-factor–associated variants (tagSNPs) identified in European populations, in which we test whether the effect size of a tagSNP is consistent across six populations with significant non-European ancestry. We demonstrate that although nearly all such tagSNPs have effects in the same direction across all ancestries (i.e., variants associated with higher risk in Europeans will also be associated with higher risk in other populations), roughly a quarter of the variants tested have significantly different magnitude of effect (usually lower) in at least one non-European population. We therefore advise caution in the use of tagSNP-based genetic disease risk models in populations that have a different genetic ancestry from the population in which original associations were first made. We then show that this differential strength of association can be attributed to population-dependent variations in the correlation between tagSNPs and the variant that actually determines risk—the so-called functional variant. Risk models based on functional variants are therefore likely to be more robust than tagSNP-based models.
PMCID: PMC3775722  PMID: 24068893
9.  Consistent Directions of Effect for Established Type 2 Diabetes Risk Variants Across Populations 
Diabetes  2012;61(6):1642-1647.
Common genetic risk variants for type 2 diabetes (T2D) have primarily been identified in populations of European and Asian ancestry. We tested whether the direction of association with 20 T2D risk variants generalizes across six major racial/ethnic groups in the U.S. as part of the Population Architecture using Genomics and Epidemiology Consortium (16,235 diabetes case and 46,122 control subjects of European American, African American, Hispanic, East Asian, American Indian, and Native Hawaiian ancestry). The percentage of positive (odds ratio [OR] >1 for putative risk allele) associations ranged from 69% in American Indians to 100% in European Americans. Of the nine variants where we observed significant heterogeneity of effect by racial/ethnic group (Pheterogeneity < 0.05), eight were positively associated with risk (OR >1) in at least five groups. The marked directional consistency of association observed for most genetic variants across populations implies a shared functional common variant in each region. Fine-mapping of all loci will be required to reveal markers of risk that are important within and across populations.
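The directional-consistency summary above (the share of variants with OR > 1 for the putative risk allele) can be sketched as follows. This is a minimal illustration, not the authors' code: the odds ratios are hypothetical, and the one-sided sign test against a 50% null is an added assumption that the abstract does not mention.

```python
from math import comb

def directional_consistency(odds_ratios):
    """Return (fraction of ORs > 1, one-sided sign-test p-value).

    The p-value is P(X >= k) for X ~ Binomial(n, 0.5), i.e. the chance of
    seeing at least k positive directions if direction were a coin flip.
    The sign test is an illustrative assumption, not taken from the abstract.
    """
    n = len(odds_ratios)
    k = sum(1 for o in odds_ratios if o > 1.0)
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k / n, p_value

# Hypothetical ORs for 20 variants in one group: 16 positive, 4 negative
example_ors = [1.1] * 16 + [0.95] * 4
fraction, p_value = directional_consistency(example_ors)
print(f"{fraction:.0%} positive, sign-test p = {p_value:.4f}")
```

A per-group loop over such lists would give the kind of range of percentages reported above.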
PMCID: PMC3357304  PMID: 22474029
10.  Investigation of gene-by-sex interactions for lipid traits in diverse populations from the population architecture using genomics and epidemiology study 
BMC Genetics  2013;14:33.
High-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and triglyceride (TG) levels are influenced by both genes and the environment. Genome-wide association studies (GWAS) have identified ~100 common genetic variants associated with HDL-C, LDL-C, and/or TG levels, mostly in populations of European descent, but little is known about the modifiers of these associations. Here, we investigated whether GWAS-identified SNPs for lipid traits exhibited heterogeneity by sex in the Population Architecture using Genomics and Epidemiology (PAGE) study.
A sex-stratified meta-analysis was performed for 49 GWAS-identified SNPs for fasting HDL-C, LDL-C, and ln(TG) levels among adults self-identified as European American (25,013). Heterogeneity by sex was established when phet < 0.001. There was evidence for heterogeneity by sex for two SNPs for ln(TG) in the APOA1/C3/A4/A5/BUD13 gene cluster: rs28927680 (phet = 7.4×10−7) and rs3135506 (phet = 4.3×10−4), one SNP in PLTP for HDL levels (rs7679; phet = 9.9×10−4), and one in HMGCR for LDL levels (rs12654264; phet = 3.1×10−5). We replicated heterogeneity by sex in five of seventeen loci previously reported by genome-wide studies (binomial p = 0.0009). We also present results for other racial/ethnic groups in the supplementary materials, to provide a resource for future meta-analyses.
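One common way to test heterogeneity by sex from sex-stratified estimates is a z-test on the difference between the male and female effect sizes. The abstract does not say which heterogeneity test was used, so the sketch below is an assumption; the effect sizes and standard errors are hypothetical, and only the phet < 0.001 threshold comes from the text.

```python
import math

def sex_heterogeneity_p(beta_m, se_m, beta_f, se_f):
    """Two-sided p-value for beta_m != beta_f, assuming independent strata.

    Illustrative assumption: a z-test on the difference of stratum-specific
    estimates; the study's actual heterogeneity test is not stated.
    """
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    return math.erfc(abs(z) / math.sqrt(2.0))

P_HET_THRESHOLD = 0.001  # threshold stated in the abstract

# Hypothetical sex-specific effects on ln(TG) for two SNPs
p_weak = sex_heterogeneity_p(0.10, 0.02, 0.02, 0.02)     # modest difference
p_strong = sex_heterogeneity_p(0.10, 0.02, -0.02, 0.02)  # larger difference
for label, p in (("weak", p_weak), ("strong", p_strong)):
    print(label, f"phet = {p:.2g}", "heterogeneous" if p < P_HET_THRESHOLD else "not flagged")
```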
We provide further evidence for sex-specific effects of SNPs in the APOA1/C3/A4/A5/BUD13 gene cluster, PLTP, and HMGCR on fasting triglyceride levels in European Americans from the PAGE study. Our findings emphasize the need for considering context-specific effects when interpreting genetic associations emerging from GWAS, and also highlight the difficulties in replicating interaction effects across studies and across racial/ethnic groups.
PMCID: PMC3669109  PMID: 23634756
Lipids; Genetics; Cardiovascular disease; Heterogeneity; Sex-specific effect; Association study
11.  Associations Between Incident Ischemic Stroke Events and Stroke and Cardiovascular Disease-Related GWAS SNPs in the Population Architecture Using Genomics and Epidemiology (PAGE) Study 
Genome-wide association studies (GWAS) have identified loci associated with ischemic stroke (IS) and cardiovascular disease (CVD) in European-descent individuals, but their replication in different populations has been largely unexplored.
Methods and Results
Nine single-nucleotide polymorphisms (SNPs) selected from GWAS and meta-analyses of stroke and 86 SNPs previously associated with myocardial infarction and CVD risk factors including blood lipids (HDL, LDL, triglycerides), type 2 diabetes and body mass index were investigated for associations with incident IS in European Americans (EA) N=26,276; African Americans (AA) N=8970; and American Indians (AI) N=3570 from the Population Architecture using Genomics and Epidemiology Study. Ancestry-specific fixed effects meta-analysis with inverse variance weighting was used to combine study-specific log hazard ratios from Cox proportional hazards models. Two of 9 stroke SNPs (rs783396 and rs1804689) were associated with increased IS hazard in AA; none were significant in this large EA cohort. Of 73 CVD risk factor SNPs tested in EA, two (HDL and triglycerides SNPs) were associated with IS. In AA, SNPs associated with LDL, HDL and BMI were significantly associated with IS (3 of 86 SNPs tested). Out of 58 SNPs tested in AI, one LDL SNP was significantly associated with IS.
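The fixed-effects, inverse-variance-weighted combination of study-specific log hazard ratios described above can be sketched as follows. This is a generic textbook version, not the authors' pipeline; the input values are hypothetical and the function name is chosen for illustration.

```python
import math

def fixed_effect_meta(log_hrs, std_errors):
    """Pool study-specific log hazard ratios with inverse-variance weights.

    Returns (pooled log HR, its standard error, z statistic, two-sided p).
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # weight = 1 / variance
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, pooled_se, z, p

# Hypothetical log hazard ratios and standard errors from three studies
study_log_hrs = [0.10, 0.20, 0.15]
study_ses = [0.05, 0.10, 0.05]
pooled, pooled_se, z, p = fixed_effect_meta(study_log_hrs, study_ses)
print(f"pooled HR = {math.exp(pooled):.3f}, SE(log HR) = {pooled_se:.4f}, p = {p:.2g}")
```

Studies with smaller standard errors dominate the pooled estimate, which is why the two more precise hypothetical studies pull the result toward their values.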
Our analyses showing lack of replication in spite of reasonable power for many stroke SNPs and differing results by ancestry highlight the need to follow up on GWAS findings and conduct genetic association studies in diverse populations. We found modest IS associations with BMI and lipid SNPs, though these findings require confirmation.
PMCID: PMC3402178  PMID: 22403240
genetics of stroke; risk factors for stroke; genetics of cardiovascular disease; epidemiology
12.  Trans-Ethnic Fine-Mapping of Lipid Loci Identifies Population-Specific Signals and Allelic Heterogeneity That Increases the Trait Variance Explained 
Wu, Ying | Waite, Lindsay L. | Jackson, Anne U. | Sheu, Wayne H-H. | Buyske, Steven | Absher, Devin | Arnett, Donna K. | Boerwinkle, Eric | Bonnycastle, Lori L. | Carty, Cara L. | Cheng, Iona | Cochran, Barbara | Croteau-Chonka, Damien C. | Dumitrescu, Logan | Eaton, Charles B. | Franceschini, Nora | Guo, Xiuqing | Henderson, Brian E. | Hindorff, Lucia A. | Kim, Eric | Kinnunen, Leena | Komulainen, Pirjo | Lee, Wen-Jane | Le Marchand, Loic | Lin, Yi | Lindström, Jaana | Lingaas-Holmen, Oddgeir | Mitchell, Sabrina L. | Narisu, Narisu | Robinson, Jennifer G. | Schumacher, Fred | Stančáková, Alena | Sundvall, Jouko | Sung, Yun-Ju | Swift, Amy J. | Wang, Wen-Chang | Wilkens, Lynne | Wilsgaard, Tom | Young, Alicia M. | Adair, Linda S. | Ballantyne, Christie M. | Bůžková, Petra | Chakravarti, Aravinda | Collins, Francis S. | Duggan, David | Feranil, Alan B. | Ho, Low-Tone | Hung, Yi-Jen | Hunt, Steven C. | Hveem, Kristian | Juang, Jyh-Ming J. | Kesäniemi, Antero Y. | Kuusisto, Johanna | Laakso, Markku | Lakka, Timo A. | Lee, I-Te | Leppert, Mark F. | Matise, Tara C. | Moilanen, Leena | Njølstad, Inger | Peters, Ulrike | Quertermous, Thomas | Rauramaa, Rainer | Rotter, Jerome I. | Saramies, Jouko | Tuomilehto, Jaakko | Uusitupa, Matti | Wang, Tzung-Dau | Boehnke, Michael | Haiman, Christopher A. | Chen, Yii-Der I. | Kooperberg, Charles | Assimes, Themistocles L. | Crawford, Dana C. | Hsiung, Chao A. | North, Kari E. | Mohlke, Karen L. | Gibson, Greg
PLoS Genetics  2013;9(3):e1003379.
Genome-wide association studies (GWAS) have identified ∼100 loci associated with blood lipid levels, but much of the trait heritability remains unexplained, and at most loci the identities of the trait-influencing variants remain unknown. We conducted a trans-ethnic fine-mapping study at 18, 22, and 18 GWAS loci on the Metabochip for their association with triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C), respectively, in individuals of African American (n = 6,832), East Asian (n = 9,449), and European (n = 10,829) ancestry. We aimed to identify the variants with strongest association at each locus, identify additional and population-specific signals, refine association signals, and assess the relative significance of previously described functional variants. Among the 58 loci, 33 exhibited evidence of association at P<1×10−4 in at least one ancestry group. Sequential conditional analyses revealed that ten, nine, and four loci in African Americans, Europeans, and East Asians, respectively, exhibited two or more signals. At these loci, accounting for all signals led to a 1.3- to 1.8-fold increase in the explained phenotypic variance compared to the strongest signals. Distinct signals across ancestry groups were identified at PCSK9 and APOA5. Trans-ethnic analyses narrowed the signals to smaller sets of variants at GCKR, PPP1R3B, ABO, LCAT, and ABCA1. Of 27 variants reported previously to have functional effects, 74% exhibited the strongest association at the respective signal. In conclusion, trans-ethnic high-density genotyping and analysis confirm the presence of allelic heterogeneity, allow the identification of population-specific variants, and limit the number of candidate SNPs for functional studies.
Author Summary
Lipid traits are heritable, but many of the DNA variants that influence lipid levels remain unknown. In a genomic region, more than one variant may affect gene expression or function, and the frequencies of these variants can differ across populations. Genotyping densely spaced variants in individuals with different ancestries may increase the chance of identifying variants that affect gene expression or function. We analyzed high-density genotyped variants for association with TG, HDL-C, and LDL-C in African Americans, East Asians, and Europeans. At several genomic regions, we provide evidence that two or more variants can influence lipid traits; across loci, these additional signals increase the proportion of trait variation that can be explained by genes. At some association signals shared across populations, combining data from individuals of different ancestries narrowed the set of likely functional variants. At PCSK9 and APOA5, the data suggest that different variants influence trait levels in different populations. Variants previously reported to alter gene expression or function frequently exhibited the strongest association at those signals. The multiple signals and population-specific characteristics of the loci described here may be shared by genetic loci for other complex traits.
PMCID: PMC3605054  PMID: 23555291
13.  Genetic Variation and Reproductive Timing: African American Women from the Population Architecture Using Genomics and Epidemiology (PAGE) Study 
PLoS ONE  2013;8(2):e55258.
Age at menarche (AM) and age at natural menopause (ANM) define the boundaries of the reproductive lifespan in women. Their timing is associated with various diseases, including cancer and cardiovascular disease. Genome-wide association studies have identified several genetic variants associated with either AM or ANM in populations of largely European or Asian descent women. The extent to which these associations generalize to diverse populations remains unknown. Therefore, we sought to replicate previously reported AM and ANM findings and to identify novel AM and ANM variants using the Metabochip (n = 161,098 SNPs) in 4,159 and 1,860 African American women, respectively, in the Women’s Health Initiative (WHI) and Atherosclerosis Risk in Communities (ARIC) studies, as part of the Population Architecture using Genomics and Epidemiology (PAGE) Study. We replicated or generalized one previously identified variant for AM, rs1361108/CENPW, and two variants for ANM, rs897798/BRSK1 and rs769450/APOE, to our African American cohort. Overall, generalization of the majority of previously-identified variants for AM and ANM, including LIN28B and MCM8, was not observed in this African American sample. We identified three novel loci associated with ANM that reached significance after multiple testing correction (LDLR rs189596789, p = 5×10−08; KCNQ1 rs79972789, p = 1.9×10−07; COL4A3BP rs181686584, p = 2.9×10−07). Our most significant AM association was upstream of RSF1, a gene implicated in ovarian and breast cancers (rs11604207, p = 1.6×10−06). While most associations were identified in either AM or ANM, we did identify genes suggestively associated with both: PHACTR1 and ARHGAP42. The lack of generalization coupled with the potentially novel associations identified here emphasize the need for additional genetic discovery efforts for AM and ANM in diverse populations.
PMCID: PMC3570525  PMID: 23424626
14.  Genotype Imputation of Metabochip SNPs Using a Study-Specific Reference Panel of ~4,000 Haplotypes in African Americans From the Women’s Health Initiative 
Genetic epidemiology  2012;36(2):107-117.
Genetic imputation has become standard practice in modern genetic studies. However, several important issues have not been adequately addressed including the utility of study-specific reference, performance in admixed populations, and quality for less common (minor allele frequency [MAF] 0.005–0.05) and rare (MAF < 0.005) variants. These issues only recently became addressable with genome-wide association studies (GWAS) follow-up studies using dense genotyping or sequencing in large samples of non-European individuals. In this work, we constructed a study-specific reference panel of 3,924 haplotypes using African Americans in the Women’s Health Initiative (WHI) genotyped on both the Metabochip and the Affymetrix 6.0 GWAS platform. We used this reference panel to impute into 6,459 WHI SNP Health Association Resource (SHARe) study subjects with only GWAS genotypes. Our analysis confirmed the imputation quality metric Rsq (estimated r2, specific to each SNP) as an effective post-imputation filter. We recommend different Rsq thresholds for different MAF categories such that the average (across SNPs) Rsq is above the desired dosage r2 (squared Pearson correlation between imputed and experimental genotypes). With a desired dosage r2 of 80%, 99.9% (97.5%, 83.6%, 52.0%, 20.5%) of SNPs with MAF > 0.05 (0.03–0.05, 0.01–0.03, 0.005–0.01, and 0.001–0.005) passed the post-imputation filter. The average dosage r2 for these SNPs is 94.7%, 92.1%, 89.0%, 83.1%, and 79.7%, respectively. These results suggest that for African Americans imputation of Metabochip SNPs from GWAS data, including low frequency SNPs with MAF 0.005–0.05, is feasible and worthwhile for power increase in downstream association analysis provided a sizable reference panel is available.
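The recommendation to pick a separate Rsq threshold per MAF category, so that the average Rsq of the passing SNPs meets the desired dosage r2, can be illustrated with a short sketch. The search procedure, function name, and Rsq values below are assumptions made for illustration; the abstract states the goal but not the algorithm used.

```python
def rsq_threshold(rsq_values, target=0.8):
    """Return (threshold, n_passing) for one MAF category.

    Finds the loosest cutoff t such that SNPs with Rsq >= t have a mean Rsq
    of at least `target`. Returns (None, 0) if no cutoff achieves that.
    Raising t drops the lowest values first, so the mean of the passing
    set only increases; the first cutoff that works is the loosest one.
    """
    for t in sorted(set(rsq_values)):
        passing = [r for r in rsq_values if r >= t]
        if sum(passing) / len(passing) >= target:
            return t, len(passing)
    return None, 0

# Hypothetical per-SNP Rsq values for two MAF categories
rsq_by_maf = {
    "MAF > 0.05": [0.99, 0.95, 0.90, 0.70],
    "MAF 0.001-0.005": [0.9, 0.8, 0.5, 0.3, 0.2],
}
for category, values in rsq_by_maf.items():
    threshold, n_pass = rsq_threshold(values, target=0.8)
    print(category, "threshold:", threshold, "passing:", n_pass, "of", len(values))
```

In this toy input the common-variant category keeps every SNP, while the rare category needs a stricter cutoff and keeps fewer SNPs, mirroring the pattern of pass rates reported above.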
PMCID: PMC3410659  PMID: 22851474
genotype imputation; Metabochip; internal reference; African Americans; rare variants
15.  Phenome-Wide Association Study (PheWAS) for Detection of Pleiotropy within the Population Architecture using Genomics and Epidemiology (PAGE) Network 
PLoS Genetics  2013;9(1):e1003087.
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p<0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype–phenotype associations, 26 represented phenotypes closely related to previously known genotype–phenotype associations, and 33 represented potentially novel genotype–phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA. 
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
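The cross-site consistency criterion described in this abstract (significant at p<0.01 in two or more study sites, with the same direction of effect, for the same racial/ethnic group, SNP, and phenotype-class) can be illustrated with a minimal sketch. The record layout, the function name `consistent_hits`, and the toy values in `TOY_RESULTS` are hypothetical and are not taken from the PAGE analysis.

```python
from collections import defaultdict

def consistent_hits(results, alpha=0.01, min_sites=2):
    """Return (group, snp, phenotype_class) keys that are significant at
    `min_sites` or more sites with the same direction of effect."""
    # key -> effect sign -> set of sites significant with that sign
    sites_by_key = defaultdict(lambda: defaultdict(set))
    for r in results:
        if r["p"] < alpha:
            key = (r["group"], r["snp"], r["phenotype_class"])
            sign = 1 if r["beta"] > 0 else -1
            sites_by_key[key][sign].add(r["site"])
    hits = [
        key
        for key, by_sign in sites_by_key.items()
        if any(len(sites) >= min_sites for sites in by_sign.values())
    ]
    return sorted(hits)

# Hypothetical toy records (site, group, SNP and phenotype names are placeholders).
TOY_RESULTS = [
    {"site": "A", "group": "EA", "snp": "snp_1", "phenotype_class": "HDL", "beta": 0.2, "p": 0.001},
    {"site": "B", "group": "EA", "snp": "snp_1", "phenotype_class": "HDL", "beta": 0.1, "p": 0.005},
    # Significant at two sites but with opposite directions: not a hit.
    {"site": "A", "group": "AA", "snp": "snp_1", "phenotype_class": "HDL", "beta": 0.2, "p": 0.001},
    {"site": "B", "group": "AA", "snp": "snp_1", "phenotype_class": "HDL", "beta": -0.1, "p": 0.002},
    # Significant at one site only: not a hit.
    {"site": "A", "group": "EA", "snp": "snp_2", "phenotype_class": "BMI", "beta": 0.3, "p": 0.0001},
    {"site": "B", "group": "EA", "snp": "snp_2", "phenotype_class": "BMI", "beta": 0.3, "p": 0.2},
]

print(consistent_hits(TOY_RESULTS))
```

Grouping by the full (group, SNP, phenotype-class) key before counting sites keeps results from different racial/ethnic groups from being pooled, which matches the stratified design the abstract describes.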
Author Summary
In phenome-wide association studies (PheWAS) all potential genetic variants in a dataset are systematically tested for association with all available phenotypes and traits that have been measured in study participants. By investigating the relationship between genetic variation and a diversity of phenotypes, there is the potential for uncovering novel relationships between single nucleotide polymorphisms (SNPs), phenotypes, and networks of interrelated phenotypes. PheWAS also can expose pleiotropy, provide novel mechanistic insights, and foster hypothesis generation. This approach is complementary to genome-wide association studies (GWAS) that test the association between hundreds of thousands, to over a million, single nucleotide polymorphisms and a single phenotype or limited phenotypic domain. The Population Architecture using Genomics and Epidemiology (PAGE) network has measures for a wide array of phenotypes and traits, including prevalent and incident status for clinical conditions and risk factors, as well as clinical parameters and intermediate biomarkers. We performed tests of association between a series of genome-wide association study (GWAS)–identified SNPs and a comprehensive range of phenotypes from the PAGE network in a high-throughput manner. We replicated a number of previously reported associations, validating the PheWAS approach. We also identified novel genotype–phenotype associations possibly representing pleiotropic effects.
PMCID: PMC3561060  PMID: 23382687
16.  A Systematic Mapping Approach of 16q12.2/FTO and BMI in More Than 20,000 African Americans Narrows in on the Underlying Functional Variation: Results from the Population Architecture using Genomics and Epidemiology (PAGE) Study 
PLoS Genetics  2013;9(1):e1003171.
Genetic variants in intron 1 of the fat mass– and obesity-associated (FTO) gene have been consistently associated with body mass index (BMI) in Europeans. However, follow-up studies in African Americans (AA) have shown no support for some of the most consistently BMI–associated FTO index single nucleotide polymorphisms (SNPs). This is most likely explained by different race-specific linkage disequilibrium (LD) patterns and lower correlation overall in AA, which provides the opportunity to fine-map this region and narrow in on the functional variant. To comprehensively explore the 16q12.2/FTO locus and to search for second independent signals in the broader region, we fine-mapped a 646–kb region, encompassing the large FTO gene and the flanking gene RPGRIP1L by investigating a total of 3,756 variants (1,529 genotyped and 2,227 imputed variants) in 20,488 AAs across five studies. We observed associations between BMI and variants in the known FTO intron 1 locus: the SNP with the most significant p-value, rs56137030 (8.3×10−6) had not been highlighted in previous studies. While rs56137030 was correlated at r2>0.5 with 103 SNPs in Europeans (including the GWAS index SNPs), this number was reduced to 28 SNPs in AA. Among rs56137030 and the 28 correlated SNPs, six were located within candidate intronic regulatory elements, including rs1421085, for which we predicted allele-specific binding affinity for the transcription factor CUX1, which has recently been implicated in the regulation of FTO. We did not find strong evidence for a second independent signal in the broader region. In summary, this large fine-mapping study in AA has substantially reduced the number of common alleles that are likely to be functional candidates of the known FTO locus. Importantly, our study demonstrated that comprehensive fine-mapping in AA provides a powerful approach to narrow in on the functional candidate(s) underlying the initial GWAS findings in European populations.
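Counting the variants correlated with an index SNP at r2>0.5, as in the comparison of 103 SNPs in Europeans versus 28 in AA, can be sketched as follows. This is an illustration only: it uses the squared Pearson correlation of genotype dosages as a stand-in for LD r2 (the study's own LD estimates may have been computed differently), and the variant names and dosage vectors are hypothetical toy data.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: correlation undefined, treat as 0
    return sxy * sxy / (sxx * syy)

def correlated_variants(index_id, dosages, threshold=0.5):
    """Names of variants whose r2 with the index variant exceeds the threshold."""
    index = dosages[index_id]
    return sorted(
        name
        for name, d in dosages.items()
        if name != index_id and pearson_r2(index, d) > threshold
    )

# Hypothetical dosages (0, 1 or 2 copies of an allele) for eight individuals.
TOY_DOSAGES = {
    "index": [0, 1, 2, 0, 1, 2, 0, 1],
    "tight": [0, 1, 2, 0, 1, 2, 0, 2],   # nearly identical to the index variant
    "loose": [2, 0, 1, 1, 0, 2, 1, 0],   # essentially uncorrelated
    "mono":  [0, 0, 0, 0, 0, 0, 0, 0],   # no variation
}

print(correlated_variants("index", TOY_DOSAGES))
```

Running the same count on dosage panels from two populations gives the kind of comparison reported in the abstract: the shorter the list in one population, the fewer candidates remain indistinguishable from the index signal.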
Author Summary
Genetic variants within the fat mass– and obesity-associated (FTO) gene are associated with increased risk of obesity. To better understand which specific genetic variant(s) in this genetic region is associated with obesity risk, we attempt to genotype or impute all known genetic variants in the region and test for association with body mass index as a measurement of obesity in over 20,000 African Americans. We identified 29 potential candidate variants, of which one variant (rs1421085) is a particularly interesting candidate for future functional follow-up studies. Our example shows the powerful approach of studying a large African American population, substantially reducing the number of possible functional variants compared with European descent populations.
PMCID: PMC3547789  PMID: 23341774
17.  Effects of smoking on the genetic risk of obesity: the population architecture using genomics and epidemiology study 
BMC Medical Genetics  2013;14:6.
Although smoking behavior is known to affect body mass index (BMI), the potential for smoking to influence genetic associations with BMI is largely unexplored.
As part of the ‘Population Architecture using Genomics and Epidemiology (PAGE)’ Consortium, we investigated interaction between genetic risk factors associated with BMI and smoking for 10 single nucleotide polymorphisms (SNPs) previously identified in genome-wide association studies. We included 6 studies with a total of 56,466 subjects (16,750 African Americans (AA) and 39,716 European Americans (EA)). We assessed effect modification by testing an interaction term for each SNP and smoking (current vs. former/never) in the linear regression and by stratified analyses.
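The two analyses named above, an interaction term in a linear regression and stratified analyses, are closely linked: in a model with an intercept, SNP dosage, smoking status, and their product, the interaction coefficient equals the difference between the SNP slopes fitted separately in smokers and non-smokers. The sketch below illustrates this with simulated data. It is a simplified assumption-laden example: it includes no covariates, uses a z-test on the slope difference rather than the t-test from a joint model, and all names and simulated effect sizes are hypothetical.

```python
import math
import random

def slope_and_se(x, y):
    """Simple linear regression of y on x: slope and its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

def interaction_from_strata(dosage, smoker, bmi):
    """Stratum-specific SNP effects and their difference (the interaction)."""
    strata = {0: ([], []), 1: ([], [])}
    for g, s, y in zip(dosage, smoker, bmi):
        strata[s][0].append(g)
        strata[s][1].append(y)
    b0, se0 = slope_and_se(*strata[0])  # former/never smokers
    b1, se1 = slope_and_se(*strata[1])  # current smokers
    z = (b1 - b0) / math.sqrt(se0 ** 2 + se1 ** 2)
    return {
        "beta_nonsmoker": b0,
        "beta_smoker": b1,
        "beta_interaction": b1 - b0,
        "p_interaction": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal p
    }

# Simulated data with a true interaction of 0.5 (hypothetical values).
rng = random.Random(0)
dosage, smoker, bmi = [], [], []
for _ in range(4000):
    g = (rng.random() < 0.3) + (rng.random() < 0.3)  # 0, 1 or 2 risk alleles
    s = 1 if rng.random() < 0.25 else 0
    dosage.append(g)
    smoker.append(s)
    bmi.append(27.0 + 0.3 * g - 1.0 * s + 0.5 * g * s + rng.gauss(0.0, 1.0))

result = interaction_from_strata(dosage, smoker, bmi)
print(result)
```

Reporting the stratum-specific slopes alongside the interaction p-value mirrors the way such results are usually presented: one effect estimate per smoking stratum plus a single test of whether they differ.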
We did not observe strong evidence for interactions and only observed two interactions with p-values <0.1: for rs6548238/TMEM18, the risk allele (C) was associated with BMI only among AA females who were former/never smokers (β = 0.018, p = 0.002), vs. current smokers (β = 0.001, p = 0.95, p for interaction = 0.10). For rs9939609/FTO, the A allele was more strongly associated with BMI among current smoker EA females (β = 0.017, p = 3.5×10−5), vs. former/never smokers (β = 0.006, p = 0.05, p for interaction = 0.08).
These analyses provide limited evidence that smoking status may modify genetic effects of previously identified genetic risk factors for BMI. Larger studies are needed to follow up our results.
PMCID: PMC3564691  PMID: 23311614
Obesity; Body mass index; Genome-wide association study; Genetic risk factor; Smoking interactions; Genetic epidemiology
18.  Association of Genetic Variants and Incident Coronary Heart Disease in Multi-Ethnic Cohorts. The PAGE Study 
Genome-wide association studies have identified several single nucleotide polymorphisms (SNPs) associated with prevalent coronary heart disease (CHD), but less is known about associations with incident CHD. The association of thirteen published CHD SNPs was examined in five ancestry groups of four large US prospective cohorts.
Methods and Results
The analyses included incident coronary events over 9.1 to 15.7 average follow-up times in up to 26,617 white individuals (6,626 events), 8,018 African Americans (914 events), 1,903 Hispanics (113 events), 3,669 American Indians (595 events) and 885 Asian/Pacific Islanders (66 events). We used Cox proportional hazards models (with additive mode of inheritance) adjusted for age, sex and ancestry (as needed). Nine loci were statistically associated with incident CHD events in whites: 9p21 (rs10757278, p=4.7 × 10−41), 16q23.1 (rs2549513, p=0.0004), 6p24.1 (rs499818, p=0.0002), 2q36.3 (rs2943634, p=6.7 × 10−6), MTHFD1L (rs6922269, p=5.1 × 10−10), APOE (rs429358, p=2.7 × 10−18), ZNF627 (rs4804611, p=5.0 × 10−8), CXCL12 (rs501120, p=1.4 × 10−6) and LPL (rs268, p=2.7 × 10−17). The 9p21 region showed significant between-study heterogeneity, with larger effects in individuals aged 55 years or younger and in women. Inclusion of coronary revascularization procedures among the incident CHD events introduced heterogeneity. The SNPs were not associated with CHD in African Americans and associations varied in other US minorities.
Prospective analyses of white individuals replicated several reported cross-sectional CHD-SNP associations.
PMCID: PMC3293207  PMID: 22042884
9p21 locus; incident coronary heart disease; genetic polymorphisms
19.  GWAS Integrator: a bioinformatics tool to explore human genetic associations reported in published genome-wide association studies 
European Journal of Human Genetics  2011;19(10):1095-1099.
Genome-wide association studies (GWAS) have successfully identified numerous genetic loci that are associated with phenotypic traits and diseases. GWAS Integrator is a bioinformatics tool that integrates information on these associations from the National Human Genome Research Institute (NHGRI) Catalog, SNAP (SNP Annotation and Proxy Search), and the Human Genome Epidemiology (HuGE) Navigator literature database. This tool includes robust search and data mining functionalities that can be used to quickly identify relevant associations from GWAS, as well as proxy single-nucleotide polymorphisms (SNPs) and potential candidate genes. Query-based University of California Santa Cruz (UCSC) Genome Browser custom tracks are generated dynamically on the basis of users' selected GWAS hits or candidate genes from the HuGE Navigator literature database. The GWAS Integrator may help enhance inference on potential genetic associations identified from GWAS.
PMCID: PMC3190251  PMID: 21610748
genome-wide association studies; database; bioinformatics
20.  Fine-Mapping and Initial Characterization of QT Interval Loci in African Americans 
PLoS Genetics  2012;8(8):e1002870.
The QT interval (QT) is heritable and its prolongation is a risk factor for ventricular tachyarrhythmias and sudden death. Most genetic studies of QT have examined European ancestral populations; however, the increased genetic diversity in African Americans provides opportunities to narrow association signals and identify population-specific variants. We therefore evaluated 6,670 SNPs spanning eleven previously identified QT loci in 8,644 African American participants from two Population Architecture using Genomics and Epidemiology (PAGE) studies: the Atherosclerosis Risk in Communities study and Women's Health Initiative Clinical Trial. Of the fifteen known independent QT variants at the eleven previously identified loci, six were significantly associated with QT in African American populations (P≤1.20×10−4): ATP1B1, PLN1, KCNQ1, NDRG4, and two NOS1AP independent signals. We also identified three population-specific signals significantly associated with QT in African Americans (P≤1.37×10−5): one at NOS1AP and two at ATP1B1. Linkage disequilibrium (LD) patterns in African Americans assisted in narrowing the region likely to contain the functional variants for several loci. For example, African American LD patterns showed that 0 SNPs were in LD with NOS1AP signal rs12143842, compared with European LD patterns that indicated 87 SNPs, which spanned 114.2 Kb, were in LD with rs12143842. Finally, bioinformatic-based characterization of the nine African American signals pointed to functional candidates located exclusively within non-coding regions, including predicted binding sites for transcription factors such as TBX5, which has been implicated in cardiac structure and conductance. In this detailed evaluation of QT loci, we identified several African American SNPs that better define the association with QT and successfully narrowed intervals surrounding established loci.
These results demonstrate that the same loci influence variation in QT across multiple populations, that novel signals exist in African Americans, and that the SNPs identified as strong candidates for functional evaluation implicate gene regulatory dysfunction in QT prolongation.
Author Summary
The QT interval (QT) provides a measure of a ventricular action potential, and its prolongation is associated with sudden death and ventricular arrhythmias. Genome-wide association studies performed in European populations have identified common genetic variants that influence QT. However, it is unclear whether these variants are relevant in other populations, including African Americans. The increased genetic diversity in African Americans also provides opportunities to narrow association signals and identify candidates for functional evaluation. We therefore used data from 8,644 African Americans to further characterize previously identified QT loci. Of the fifteen known independent QT variants at the eleven previously identified QT loci, six were associated with QT in African Americans. We also identified three variants that were independent from previously reported signals and narrowed intervals flanking association signals using patterns of linkage disequilibrium. Finally, bioinformatic-based characterization pointed to candidates located outside protein coding regions. Our results underscore the utility of genetic studies in African ancestral populations to identify novel variants and narrow intervals surrounding established loci. These results suggest that known QT loci are important in African Americans and that further characterization of these loci in other populations may provide additional insights into the genetic and molecular mechanisms underlying QT.
PMCID: PMC3415454  PMID: 22912591
21.  Genetic architecture of cancer and other complex diseases: lessons learned and future directions 
Carcinogenesis  2011;32(7):945-954.
Genome-wide association studies have broadened our understanding of the genetic architecture of cancer to include common variants, in addition to the rare variants previously identified by linkage analysis. We review current knowledge on the genetic architecture of four cancers—breast, lung, prostate and colorectal—for which the balance of common and rare alleles identified ranges from fewer common alleles (lung cancer) to more common alleles (prostate cancer). Although most variants are cancer specific, pleiotropy has been observed for several variants, for example, variants at the 8q24 locus and breast, ovarian and prostate cancers or variants in KITLG in relation to hair color and testicular cancer. Although few studies have been adequately powered to investigate heterogeneity among ancestry groups, effect sizes associated with common variants have been reported to be fairly homogenous among ethnic groups. Some associations appear to be ancestry specific, such as HNF1B, which is associated with prostate cancer in European Americans and Latinos but not in African-Americans. Studies of cancer and other complex diseases suggest that a simple dichotomy between rare and common allelic architectures may be too simplistic and that future research is needed to characterize a fuller spectrum of allele frequency (common (>5%), uncommon (1–5%) and rare (<<1%) alleles) and effect size. In addition, a broadening of the concept of genetic architecture to encompass both population architecture, which reflects differences in exposures, genetic factors and population level risk among diverse groups of people, and genomic architecture, which includes structural, epigenomic and somatic variation, is envisioned.
PMCID: PMC3140138  PMID: 21459759
22.  Evaluation of the Metabochip Genotyping Array in African Americans and Implications for Fine Mapping of GWAS-Identified Loci: The PAGE Study 
PLoS ONE  2012;7(4):e35651.
The Metabochip is a custom genotyping array designed for replication and fine mapping of metabolic, cardiovascular, and anthropometric trait loci and includes low frequency variation content identified from the 1000 Genomes Project. It has 196,725 SNPs concentrated in 257 genomic regions. We evaluated the Metabochip in 5,863 African Americans; 89% of all SNPs passed rigorous quality control with a call rate of 99.9%. Two examples illustrate the value of fine mapping with the Metabochip in African-ancestry populations. At CELSR2/PSRC1/SORT1, we found the strongest associated SNP for LDL-C to be rs12740374 (p = 3.5×10−11), a SNP indistinguishable from multiple SNPs in European ancestry samples due to high correlation. Its distinct signal supports functional studies elsewhere suggesting a causal role in LDL-C. At CETP we found rs17231520, with risk allele frequency 0.07 in African Americans, to be associated with HDL-C (p = 7.2×10−36). This variant is very rare in Europeans and not tagged in common GWAS arrays, but was identified as associated with HDL-C in African Americans in a single-gene study. Our results, one narrowing the risk interval and the other revealing an associated variant not found in Europeans, demonstrate the advantages of high-density genotyping of common and rare variation for fine mapping of trait loci in African American samples.
PMCID: PMC3335090  PMID: 22539988
23.  A Phenomics-Based Strategy Identifies Loci on APOC1, BRAP, and PLCG1 Associated with Metabolic Syndrome Phenotype Domains 
PLoS Genetics  2011;7(10):e1002322.
Despite evidence of the clustering of metabolic syndrome components, current approaches for identifying unifying genetic mechanisms typically evaluate clinical categories that do not provide adequate etiological information. Here, we used data from 19,486 European American and 6,287 African American Candidate Gene Association Resource Consortium participants to identify loci associated with the clustering of metabolic phenotypes. Six phenotype domains (atherogenic dyslipidemia, vascular dysfunction, vascular inflammation, pro-thrombotic state, central obesity, and elevated plasma glucose) encompassing 19 quantitative traits were examined. Principal components analysis was used to reduce the dimension of each domain such that >55% of the trait variance was represented within each domain. We then applied a statistically efficient and computationally feasible multivariate approach that related eight principal components from the six domains to 250,000 imputed SNPs using an additive genetic model and including demographic covariates. In European Americans, we identified 606 genome-wide significant SNPs representing 19 loci. Many of these loci were associated with only one trait domain, were consistent with results in African Americans, and overlapped with published findings, for instance central obesity and FTO. However, our approach, which is applicable to any set of interval scale traits that is heritable and exhibits evidence of phenotypic clustering, identified three new loci in or near APOC1, BRAP, and PLCG1, which were associated with multiple phenotype domains. These pleiotropic loci may help characterize metabolic dysregulation and identify targets for intervention.
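The per-domain dimension reduction step, keeping enough principal components to represent more than 55% of the trait variance, can be sketched as below. This is a minimal illustration using NumPy on simulated traits; the function name, the standardization choice, and the toy data are assumptions, and the abstract does not specify how the study itself scaled traits or selected components.

```python
import numpy as np

def components_for_variance(traits, threshold=0.55):
    """PCA on standardized traits (rows = participants, columns = traits).

    Returns the smallest number of components k whose cumulative share of
    variance is strictly greater than `threshold`, the participant scores on
    those k components, and the per-component variance shares.
    """
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(explained)
    # First index whose cumulative share exceeds the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    scores = z @ vt[:k].T
    return k, scores, explained

# Hypothetical domain: three traits driven by one shared latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
toy_traits = latent @ np.ones((1, 3)) + 0.5 * rng.normal(size=(500, 3))

k, scores, explained = components_for_variance(toy_traits)
print(k, scores.shape, np.round(explained, 3))
```

The resulting scores, one column per retained component, are what would then be related to each SNP; a domain with strongly clustered traits, like the toy example, needs only one component, while a looser domain needs more.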
Author Summary
The metabolic syndrome represents a clustering of metabolic phenotypes (e.g. elevated blood pressure, cholesterol levels, and plasma glucose, as well as abdominal obesity) and is associated with an increased risk of atherosclerosis and type 2 diabetes. Although multiple genes influencing the specific metabolic syndrome components have been reported, few studies have evaluated the genetic underpinnings of the syndrome as a whole. Here, we describe an approach to evaluate multiple clustered traits, which allows us to test whether common genetic variants influence the co-occurrence of one or more metabolic phenotypes. By examining approximately 20,000 European American and 6,200 African American participants from five studies, we show that three regions on chromosomes 12, 19, and 20 are associated with multiple metabolic phenotypes. These genetic variants are highly intriguing candidates that may increase our understanding of the biologic basis of the clustering of metabolic phenotypes and help identify targets for early intervention.
PMCID: PMC3192835  PMID: 22022282
24.  The Next PAGE in Understanding Complex Traits: Design for the Analysis of Population Architecture Using Genetics and Epidemiology (PAGE) Study 
American Journal of Epidemiology  2011;174(7):849-859.
Genetic studies have identified thousands of variants associated with complex traits. However, most association studies are limited to populations of European descent and a single phenotype. The Population Architecture using Genomics and Epidemiology (PAGE) Study was initiated in 2008 by the National Human Genome Research Institute to investigate the epidemiologic architecture of well-replicated genetic variants associated with complex diseases in several large, ethnically diverse population-based studies. Combining DNA samples and hundreds of phenotypes from multiple cohorts, PAGE is well-suited to address generalization of associations and variability of effects in diverse populations; identify genetic and environmental modifiers; evaluate disease subtypes, intermediate phenotypes, and biomarkers; and investigate associations with novel phenotypes. PAGE investigators harmonize phenotypes across studies where possible and perform coordinated cohort-specific analyses and meta-analyses. PAGE researchers are genotyping thousands of genetic variants in up to 121,000 DNA samples from African-American, white, Hispanic/Latino, Asian/Pacific Islander, and American Indian participants. Initial analyses will focus on single nucleotide polymorphisms (SNPs) associated with obesity, lipids, cardiovascular disease, type 2 diabetes, inflammation, various cancers, and related biomarkers. PAGE SNPs are also assessed for pleiotropy using the “phenome-wide association study” approach, testing each SNP for associations with hundreds of phenotypes. PAGE data will be deposited into the National Center for Biotechnology Information's Database of Genotypes and Phenotypes and made available via a custom browser.
PMCID: PMC3176830  PMID: 21836165
cardiovascular diseases; cohort studies; genome-wide association study; multifactorial inheritance; neoplasms; obesity; population characteristics; reproducibility of results
25.  Use of Factor V Leiden genetic testing in practice and impact on management 
To assess the use of the genetic test for Factor V Leiden in clinical practice, physician adherence to national and local guidelines, and impacts of test results on patient management.
Chart review of all patients tested for Factor V Leiden during a 1-year period (2003) in a large nonprofit health care system (Group Health) (n = 272).
The test for Factor V Leiden was most often used in nonacute outpatient settings by primary care practitioners, in combination with other tests for procoagulant disorders. Testing was performed more broadly than recommended: 61% of tests met American College of Medical Genetics guidelines, 46% of tests met CAP guidelines, and 37% of tests met Group Health internal guidelines. The most common rationale for testing was to explain a clinical event (58%). Patient management was modified more often in heterozygotes (54%) than in those with normal results (13%) (P < 0.0001).
The uptake of the test for Factor V Leiden has not followed existing recommendations. Genetic risk information was used to influence patient management in the absence of supporting evidence related to health outcomes. These results underscore the importance of further research concerning effective prevention and treatment strategies for patients with genetic risk to help translate genetic risk information into improved health outcomes.
PMCID: PMC3132195  PMID: 19668081
Factor V Leiden; genetic test; patient management; clinical practice guidelines; clinical utility

