1.  Variants Affecting Exon Skipping Contribute to Complex Traits 
PLoS Genetics  2012;8(10):e1002998.
DNA variants that affect alternative splicing and the relative quantities of different gene transcripts have been shown to be risk alleles for some Mendelian diseases. However, for complex traits characterized by a low odds ratio for any single contributing variant, very few studies have investigated the contribution of splicing variants. The overarching goal of this study is to discover and characterize the role that variants affecting alternative splicing may play in the genetic etiology of complex traits, which include a significant number of the common human diseases. Specifically, we hypothesize that single nucleotide polymorphisms (SNPs) in splicing regulatory elements can be characterized in silico to identify variants affecting splicing, and that these variants may contribute to the etiology of complex diseases as well as the inter-individual variability in the ratios of alternative transcripts. We leverage high-throughput expression profiling to 1) experimentally validate our in silico predictions of skipped exons and 2) characterize the molecular role of intronic genetic variations in alternative splicing events in the context of complex human traits and diseases. We propose that intronic SNPs play a role as genetic regulators within splicing regulatory elements and show that their associated exon skipping events can affect protein domains and structure. We find that SNPs we would predict to affect exon skipping are enriched among the set of SNPs reported to be associated with complex human traits.
Author Summary
Alternative splicing is a common eukaryotic cellular mechanism that allows for the production of multiple proteins from one gene and occurs in 40%–90% of all human genes. Alternative splicing has been shown to be important for many critical biological processes, including development, evolution, and even psychological behavior. Additionally, alternative splicing has been associated with 15%–50% of human genetic diseases, including breast cancer; however, the precise mechanism by which genetic variations regulate this process remains to be fully elucidated. In this study, we develop an integrative approach that utilizes sequence-based analysis and genome-wide expression profiling to identify genetic variations that may affect alternative splicing. We also evaluate their enrichment among established disease-associated variations. Our study provides insights into the functionality of these variations and emphasizes their importance for complex human traits and diseases.
PMCID: PMC3486879  PMID: 23133393
2.  SCAN database: facilitating integrative analyses of cytosine modification and expression QTL 
Functional annotation of genetic variants including single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) promises to greatly improve our understanding of human complex traits. Previous transcriptomic studies involving individuals from different global populations have investigated the genetic architecture of gene expression variation by mapping expression quantitative trait loci (eQTL). Functional interpretation of genome-wide association studies (GWAS) has identified enrichment of eQTL in top signals from GWAS of human complex traits. The SCAN (SNP and CNV Annotation) database was developed as a web-based resource of genetical genomic studies including eQTL detected in the HapMap lymphoblastoid cell line (LCL) samples derived from apparently healthy individuals of European and African ancestry. Considering the critical roles of epigenetic gene regulation, cytosine modification quantitative trait loci (mQTL) are expected to add a crucial layer of annotation to existing functional genomic information. Here, we describe the new features of the SCAN database that integrate comprehensive mQTL mapping results generated in the HapMap CEU (Caucasian residents from Utah, USA) and YRI (Yoruba people from Ibadan, Nigeria) LCL samples and demonstrate the utility of the enhanced functional annotation system.
PMCID: PMC4375357  PMID: 25818895
4.  Trans-population Analysis of Genetic Mechanisms of Ethnic Disparities in Neuroblastoma Survival 
Black patients with neuroblastoma have a higher prevalence of high-risk disease and worse outcome than white patients. We sought to investigate the relationship between genetic variation and the disparities in survival observed in neuroblastoma.
The analytic cohort was composed of 2709 patients. Principal components were used to assign patients to genomic ethnic clusters for survival analyses. Locus-specific ancestry was calculated for use in association analysis. The shorter spans of linkage disequilibrium in African populations may facilitate the fine mapping of causal variants in regions previously implicated by genome-wide association studies conducted primarily in patients of European descent. Thus, we evaluated 13 single nucleotide polymorphisms known to be associated with susceptibility to high-risk neuroblastoma from genome-wide association studies and all variants with highly divergent allele frequencies in reference African and European populations near the known susceptibility loci. All statistical tests were two-sided.
African genomic ancestry was associated with high-risk neuroblastoma (P = .007) and lower event-free survival (P = .04, hazard ratio = 1.4, 95% confidence interval = 1.05 to 1.80). rs1033069 within SPAG16 (sperm associated antigen 16) was determined to have higher risk allele frequency in the African reference population and statistically significant association with high-risk disease in patients of European and African ancestry (P = 6.42×10⁻⁵, false discovery rate < 0.0015) in the overall cohort. Multivariable analysis using an additive model demonstrated that the SPAG16 single nucleotide polymorphism contributes to the observed ethnic disparities in high-risk disease and survival.
Our study demonstrates that common genetic variation influences neuroblastoma phenotype and contributes to the ethnic disparities in survival observed and illustrates the value of trans-population mapping.
PMCID: PMC3691940  PMID: 23243203
5.  Genetic association signal near NTN4 in Tourette Syndrome 
Annals of neurology  2014;76(2):310-315.
Tourette Syndrome (TS) is a neurodevelopmental disorder with a complex genetic etiology. Through an international collaboration, we genotyped 42 single nucleotide polymorphisms (SNPs) (p<10⁻³) from the recent TS genome-wide association study (GWAS) in 609 independent cases and 610 ancestry-matched controls. Only rs2060546 on chromosome 12q22 (p=3.3×10⁻⁴) remained significant after Bonferroni correction. Meta-analysis with the original GWAS yielded the strongest association to date (p=5.8×10⁻⁷). Although its functional significance is unclear, rs2060546 lies closest to NTN4, an axon guidance molecule expressed in developing striatum. Risk score analysis significantly predicted case/control status (p=0.042), suggesting that many of these variants are true TS risk alleles.
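The Bonferroni step in this abstract can be reproduced with simple arithmetic. The sketch below assumes a family-wise error rate of 0.05, which the abstract does not state; the number of SNPs and the p-value for rs2060546 are taken from the text, and the variable names are illustrative only.

```python
# Bonferroni check for the replication stage described in the abstract.
N_SNPS = 42            # SNPs genotyped in the replication sample (from the abstract)
ALPHA = 0.05           # ASSUMPTION: family-wise error rate, not stated in the abstract
P_RS2060546 = 3.3e-4   # replication p-value reported for rs2060546

# Per-test threshold: divide the family-wise error rate by the number of tests.
threshold = ALPHA / N_SNPS

# Equivalent view: multiply the raw p-value by the number of tests (capped at 1).
adjusted_p = min(1.0, P_RS2060546 * N_SNPS)

passes = P_RS2060546 < threshold
print(f"per-test threshold = {threshold:.2e}")
print(f"Bonferroni-adjusted p = {adjusted_p:.4f}")
print(f"rs2060546 survives correction: {passes}")
```

Under that assumed error rate the per-test threshold is about 1.2×10⁻³, so the reported p-value clears it, in line with the abstract's statement.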
PMCID: PMC4140987  PMID: 25042818
6.  Development of Framework for Assessing Influenza Virus Pandemic Risk 
Emerging Infectious Diseases  2015;21(8):1372-1378.
This simple, additive, multiattribute assessment tool can evaluate the risk posed by novel influenza A viruses.
Although predicting which influenza virus subtype will cause the next pandemic is not yet possible, public health authorities must continually assess the pandemic risk associated with animal influenza viruses, particularly those that have caused infections in humans, and determine what resources should be dedicated to mitigating that risk. To accomplish this goal, a risk assessment framework was created in collaboration with an international group of influenza experts. Compared with the previously used approach, this framework, named the Influenza Risk Assessment Tool, provides a systematic and transparent approach for assessing and comparing threats posed primarily by avian and swine influenza viruses. This tool will be useful to the international influenza community and will remain flexible and responsive to changing information.
PMCID: PMC4517742  PMID: 26196098
influenza; influenza virus; viruses; risk assessment; prepandemic preparation; H7N9; H3N2v; zoonoses
7.  Copy Number Variation in Obsessive-Compulsive Disorder and Tourette Syndrome: A Cross-Disorder Study 
McGrath, Lauren M. | Yu, Dongmei | Marshall, Christian | Davis, Lea K. | Thiruvahindrapuram, Bhooma | Li, Bingbin | Cappi, Carolina | Gerber, Gloria | Wolf, Aaron | Schroeder, Frederick A. | Osiecki, Lisa | O’Dushlaine, Colm | Kirby, Andrew | Illmann, Cornelia | Haddad, Stephen | Gallagher, Patience | Fagerness, Jesen A. | Barr, Cathy L. | Bellodi, Laura | Benarroch, Fortu | Bienvenu, O. Joseph | Black, Donald W. | Bloch, Michael H. | Bruun, Ruth D. | Budman, Cathy L. | Camarena, Beatriz | Cath, Danielle C. | Cavallini, Maria C. | Chouinard, Sylvain | Coric, Vladimir | Cullen, Bernadette | Delorme, Richard | Denys, Damiaan | Derks, Eske M. | Dion, Yves | Rosário, Maria C. | Eapen, Valsama | Evans, Patrick | Falkai, Peter | Fernandez, Thomas | Garrido, Helena | Geller, Daniel | Grabe, Hans J. | Grados, Marco A. | Greenberg, Benjamin D. | Gross-Tsur, Varda | Grünblatt, Edna | Heiman, Gary A. | Hemmings, Sian M.J. | Herrera, Luis D. | Hounie, Ana G. | Jankovic, Joseph | Kennedy, James L | King, Robert A. | Kurlan, Roger | Lanzagorta, Nuria | Leboyer, Marion | Leckman, James F. | Lennertz, Leonhard | Lochner, Christine | Lowe, Thomas L. | Lyon, Gholson J. | Macciardi, Fabio | Maier, Wolfgang | McCracken, James T. | McMahon, William | Murphy, Dennis L. | Naarden, Allan L | Neale, Benjamin M | Nurmi, Erika | Pakstis, Andrew J. | Pato, Michele T. | Pato, Carlos N. | Piacentini, John | Pittenger, Christopher | Pollak, Yehuda | Reus, Victor I. | Richter, Margaret A. | Riddle, Mark | Robertson, Mary M. | Rosenberg, David | Rouleau, Guy A. | Ruhrmann, Stephan | Sampaio, Aline S. | Samuels, Jack | Sandor, Paul | Sheppard, Brooke | Singer, Harvey S. | Smit, Jan H. | Stein, Dan J. | Tischfield, Jay A. | Vallada, Homero | Veenstra-VanderWeele, Jeremy | Walitza, Susanne | Wang, Ying | Wendland, Jens R. | Shugart, Yin Yao | Miguel, Euripedes C. | Nicolini, Humberto | Oostra, Ben A. | Moessner, Rainald | Wagner, Michael | Ruiz-Linares, Andres | Heutink, Peter | Nestadt, Gerald | Freimer, Nelson | Petryshen, Tracey | Posthuma, Danielle | Jenike, Michael A. | Cox, Nancy J. | Hanna, Gregory L. | Brentani, Helena | Scherer, Stephen W. | Arnold, Paul D. | Stewart, S. Evelyn | Mathews, Carol A. | Knowles, James A. | Cook, Edwin H. | Pauls, David L. | Wang, Kai | Scharf, Jeremiah M.
Obsessive-compulsive disorder (OCD) and Tourette syndrome (TS) are heritable, neurodevelopmental disorders with a partially shared genetic etiology. This study represents the first genome-wide investigation of large (>500kb), rare (<1%) copy number variants (CNVs) in OCD and the largest genome-wide CNV analysis in TS to date.
The primary analyses utilized a cross-disorder design for 2,699 patients (1,613 ascertained for OCD, 1,086 ascertained for TS) and 1,789 controls. Parental data facilitated a de novo analysis in 348 OCD trios.
Although no global CNV burden was detected in the cross-disorder analysis or in secondary, disease-specific analyses, there was a 3.3-fold increased burden of large deletions previously associated with other neurodevelopmental disorders (p=.09). Half of these neurodevelopmental deletions were located in a single locus, 16p13.11 (5 patient deletions: 0 control deletions, p=0.08 in current study, p=0.025 compared to published controls). Three 16p13.11 deletions were confirmed de novo, providing further support to the etiological significance of this region. The overall OCD de novo rate was 1.4%, which is intermediate between published rates in controls (0.7%) and in autism or schizophrenia (2–4%).
Several converging lines of evidence implicate 16p13.11 deletions in OCD, with weaker evidence for a role in TS. The trend toward increased overall neurodevelopmental CNV burden in TS and OCD suggests that deletions previously associated with other neurodevelopmental disorders may also contribute to these phenotypes.
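The 16p13.11 counts quoted above (5 deletions in patients, 0 in controls) lend themselves to a short hypergeometric check. The sketch below assumes the full cross-disorder sample sizes given in the abstract (2,699 patients, 1,789 controls) and asks a one-sided question: if five deletion carriers were scattered at random across all samples, how often would all five be patients? The abstract does not name the test used, and post-QC sample sizes may differ, so this is only a plausibility check.

```python
from math import comb

def prob_all_carriers_are_cases(n_cases: int, n_controls: int, n_carriers: int) -> float:
    """Hypergeometric probability that every carrier falls among the cases
    when carriers are allocated at random across cases and controls."""
    total = n_cases + n_controls
    return comb(n_cases, n_carriers) / comb(total, n_carriers)

# Sample sizes and deletion counts taken from the abstract; using them
# unchanged for this locus is an ASSUMPTION.
p_one_sided = prob_all_carriers_are_cases(n_cases=2699, n_controls=1789, n_carriers=5)
print(f"one-sided p = {p_one_sided:.3f}")
```

The result lands close to 0.08, which is in the neighborhood of the value reported for the current study. It also illustrates why five carriers against zero is not significant on its own: with cases making up about 60% of the sample, five out of five is not a rare outcome.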
PMCID: PMC4218748  PMID: 25062598
Tourette syndrome; obsessive-compulsive disorder; copy number variation; genetics; 16p13.11
8.  Poly-Omic Prediction of Complex Traits: OmicKriging 
Genetic epidemiology  2014;38(5):402-415.
High-confidence prediction of complex traits such as disease risk or drug response is an ultimate goal of personalized medicine. Although genome-wide association studies have discovered thousands of well-replicated polymorphisms associated with a broad spectrum of complex traits, the combined predictive power of these associations for any given trait is generally too low to be of clinical relevance. We propose a novel systems approach to complex trait prediction, which leverages and integrates similarity in genetic, transcriptomic, or other omics-level data. We translate the omic similarity into phenotypic similarity using a method called Kriging, commonly used in geostatistics and machine learning. Our method called OmicKriging emphasizes the use of a wide variety of systems-level data, such as those increasingly made available by comprehensive surveys of the genome, transcriptome, and epigenome, for complex trait prediction. Furthermore, our OmicKriging framework allows easy integration of prior information on the function of subsets of omics-level data from heterogeneous sources without the sometimes heavy computational burden of Bayesian approaches. Using seven disease datasets from the Wellcome Trust Case Control Consortium (WTCCC), we show that OmicKriging allows simple integration of sparse and highly polygenic components yielding comparable performance at a fraction of the computing time of a recently published Bayesian sparse linear mixed model method. Using a cellular growth phenotype, we show that integrating mRNA and microRNA expression data substantially increases performance over either dataset alone. Using clinical statin response, we show improved prediction over existing methods.
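The idea of translating omic similarity into phenotypic similarity can be illustrated with a small Kriging predictor. This is a minimal sketch and not the OmicKriging package itself: it assumes the composite similarity is a weighted sum of per-omic similarity matrices plus an identity term carrying the leftover weight, and that predictions are the training mean plus a similarity-weighted correction. The function names, weights, and toy matrices are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def krige(similarities, weights, y_train, train, test):
    """Predict phenotypes for `test` samples from `train` samples.

    similarities: list of n x n similarity matrices, one per omic layer
    weights:      one weight per layer; the remainder (1 - sum) goes to identity
    """
    n = len(similarities[0])
    noise = 1.0 - sum(weights)
    sigma = [[sum(w * s[i][j] for w, s in zip(weights, similarities))
              + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    mu = sum(y_train) / len(y_train)
    s11 = [[sigma[i][j] for j in train] for i in train]   # train-by-train block
    alpha = solve(s11, [y - mu for y in y_train])
    # Each prediction is the mean plus similarity-weighted training residuals.
    return [mu + sum(sigma[t][i] * a for i, a in zip(train, alpha)) for t in test]

# Toy data (hypothetical): sample 2 resembles sample 0 in both omic layers.
genetic = [[1.0, 0.0, 0.8], [0.0, 1.0, 0.0], [0.8, 0.0, 1.0]]
expression = [[1.0, 0.0, 0.4], [0.0, 1.0, 0.0], [0.4, 0.0, 1.0]]
pred = krige([genetic, expression], [0.5, 0.3], y_train=[2.0, 0.0], train=[0, 1], test=[2])
print(pred)
```

In the toy example the held-out sample is pulled toward the phenotype of the training sample it resembles, and the pull grows with the weight given to each similarity layer. Adding another omic layer only means adding another matrix and weight, which is the kind of simple integration the abstract describes.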
PMCID: PMC4072756  PMID: 24799323
complex trait prediction; polygenic modeling; systems biology; polygenic prediction; Kriging
10.  Identification of Influenza A/PR/8/34 Donor Viruses Imparting High Hemagglutinin Yields to Candidate Vaccine Viruses in Eggs 
PLoS ONE  2015;10(6):e0128982.
One of the important lessons learned from the 2009 H1N1 pandemic is that a high-yield influenza vaccine virus is essential for efficient and timely production of pandemic vaccines in eggs. The current seasonal and pre-pandemic vaccine viruses are generated either by classical reassortment or reverse genetics. Both approaches utilize a high-growth virus, generally A/Puerto Rico/8/1934 (PR8), as the donor of all or most of the internal genes, and the wild-type virus recommended for inclusion in the vaccine to contribute the hemagglutinin (HA) and neuraminidase (NA) genes encoding the surface glycoproteins. As a result of extensive adaptation through sequential egg passaging, PR8 viruses with different gene sequences and high growth properties have been selected at different laboratories in past decades. The effect of these related but distinct internal PR8 genes on the growth of vaccine viruses in eggs has not been examined previously. Here, we use reverse genetics to analyze systematically the growth and HA antigen yield of reassortant viruses with 3 different PR8 backbones. A panel of 9 different HA/NA gene pairs in combination with each of the 3 different lineages of PR8 internal genes (27 reassortant viruses) was generated to evaluate their performance. Virus and HA yield assays showed that the PR8 internal genes influence HA yields in most subtypes. Although no single PR8 internal gene set outperformed the others in all candidate vaccine viruses, combining a specific PR8 backbone with individual HA/NA pairs improved HA yield and, consequently, the speed of vaccine production. These findings may be important both for production of seasonal vaccines and for a rapid global vaccine response during a pandemic.
PMCID: PMC4465931  PMID: 26068666
11.  Integrating Cell-Based and Clinical Genome-Wide Studies to Identify Genetic Variants Contributing to Treatment Failure in Neuroblastoma Patients 
High-risk neuroblastoma is an aggressive malignancy with high rates of treatment failure. We evaluated genetic variants associated with in vitro sensitivity to two derivatives of cyclophosphamide for association with clinical response in a separate replication cohort of neuroblastoma patients (n=2,709). Lymphoblastoid cell lines (LCLs) were exposed to increasing concentrations of 4-hydroperoxycyclophosphamide (4HC; n=422) and phosphoramide mustard (PM; n=428) to determine sensitivity. Genome-wide association studies (GWAS) were performed to identify single nucleotide polymorphisms (SNPs) associated with 4HC and PM sensitivity. SNPs consistently associated with LCL sensitivity were analyzed for associations with event-free survival in patients. Two linked SNPs, rs9908694 and rs1453560, were found to be associated with PM sensitivity in LCLs across populations and were associated with event-free survival in all patients (P=0.01) and within the high-risk subset (P=0.05). Our study highlights the value of cell-based models to identify candidate variants that may predict response to treatment in patients with cancer.
PMCID: PMC4029857  PMID: 24549002
neuroblastoma; pharmacogenomics; cell-based models; IKZF3; ZPBP2; expression quantitative trait loci
12.  Establishment of CYP2D6 Reference Samples by Multiple Validated Genotyping Platforms 
The pharmacogenomics journal  2014;14(6):564-572.
Cytochrome P450 2D6 (cytochrome P450, family 2, subfamily D, polypeptide 6, or CYP2D6), a highly polymorphic drug metabolizing enzyme, is involved in the metabolism of one quarter of the most commonly prescribed medications. Here, we have applied multiple genotyping methods and Sanger sequencing to assign precise and reproducible CYP2D6 genotypes, including copy numbers, for 48 HapMap samples. Furthermore, by analyzing a set of 50 human liver microsomes using endoxifen formation from N-desmethyl-tamoxifen as the phenotype of interest, we observed a significant positive correlation between CYP2D6 genotype-assigned activity score and endoxifen formation rate (r_s = 0.68 by rank correlation test, P = 5.3×10⁻⁸), which corroborated the genotype-phenotype prediction derived from our genotyping methodologies. In the future, these 48 publicly available HapMap samples characterized by multiple substantiated CYP2D6 genotyping platforms could serve as a reference resource for assay development, validation, quality control, and proficiency testing for other CYP2D6 genotyping projects, and for programs pursuing clinical pharmacogenomic testing implementation.
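The genotype-phenotype check in this abstract rests on a rank correlation between activity score and endoxifen formation rate. The sketch below shows how such a coefficient can be computed, assuming the statistic is Spearman's (the abstract says only "rank correlation test"). The activity scores and formation rates are invented for illustration and are not the study's data.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# HYPOTHETICAL data: activity scores are discrete, so ties are expected.
activity_score = [0.0, 0.5, 1.0, 1.0, 2.0]
formation_rate = [1.0, 2.0, 4.0, 3.0, 6.0]
rho = spearman(activity_score, formation_rate)
print(f"rank correlation = {rho:.3f}")
```

Average ranks matter here because activity scores take only a handful of values, so many samples share the same score; a formula that assumes no ties would be inappropriate for this kind of data.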
PMCID: PMC4237721  PMID: 24980783
CYP2D6; genotyping; pharmacogenomics clinical implementation; sequencing
13.  A Genome-wide Association Study of Early-onset Breast Cancer Identifies PFKM as a Novel Breast Cancer Gene and Supports a Common Genetic Spectrum for Breast Cancer at Any Age 
Ahsan, Habibul | Halpern, Jerry | Kibriya, Muhammad G | Pierce, Brandon L | Tong, Lin | Gamazon, Eric | McGuire, Valerie | Felberg, Anna | Shi, Jianxin | Jasmine, Farzana | Roy, Shantanu | Brutus, Rachelle | Argos, Maria | Melkonian, Stephanie | Chang-Claude, Jenny | Andrulis, Irene | Hopper, John L | John, Esther M. | Malone, Kathi | Ursin, Giske | Gammon, Marilie D | Thomas, Duncan C | Seminara, Daniela | Casey, Graham | Knight, Julia A | Southey, Melissa C | Giles, Graham G | Santella, Regina M | Lee, Eunjung | Conti, David | Duggan, David | Gallinger, Steve | Haile, Robert | Jenkins, Mark | Lindor, Noralane M | Newcomb, Polly | Michailidou, Kyriaki | Apicella, Carmel | Park, Daniel J | Peto, Julian | Fletcher, Olivia | Silva, Isabel dos Santos | Lathrop, Mark | Hunter, David J | Chanock, Stephen J | Meindl, Alfons | Schmutzler, Rita K | Müller-Myhsok, Bertram | Lochmann, Magdalena | Beckmann, Lars | Hein, Rebecca | Makalic, Enes | Schmidt, Daniel F | Bui, Quang Minh | Stone, Jennifer | Flesch-Janys, Dieter | Dahmen, Norbert | Nevanlinna, Heli | Aittomäki, Kristiina | Blomqvist, Carl | Hall, Per | Czene, Kamila | Irwanto, Astrid | Liu, Jianjun | Rahman, Nazneen | Turnbull, Clare | Dunning, Alison M. | Pharoah, Paul | Waisfisz, Quinten | Meijers-Heijboer, Hanne | Uitterlinden, Andre G. | Rivadeneira, Fernando | Nicolae, Dan | Easton, Douglas F | Cox, Nancy J | Whittemore, Alice S
Early-onset breast cancer (EOBC) causes substantial loss of life and productivity, creating a major burden among women worldwide. We analyzed 1,265,548 HapMap3 SNPs among a discovery set of 3,523 EOBC incident case and 2,702 population control women aged ≤51 years. The SNPs with smallest P-values were examined in a replication set of 3,470 EOBC case and 5,475 control women. We also tested EOBC association with 19,684 genes by annotating each gene with putative functional SNPs, and then combining their P-values to obtain a gene-based P-value. We examined the gene with smallest P-value for replication in 1,145 breast cancer case and 1,142 control women. The combined discovery and replication sets identified 72 new SNPs associated with EOBC (P<4×10⁻⁸) located in six genomic regions previously reported to contain SNPs associated largely with later-onset breast cancer (LOBC). SNP rs2229882 and 10 other SNPs on chromosome 5q11.2 remained associated (P<6×10⁻⁴) after adjustment for the strongest published SNPs in the region. Thirty-two of the 82 currently known LOBC SNPs were associated with EOBC (P<0.05). Low power is likely responsible for the remaining 50 unassociated known LOBC SNPs. The gene-based analysis identified an association between breast cancer and the phosphofructokinase-muscle (PFKM) gene on chromosome 12q13.11 that met the genome-wide gene-based threshold of 2.5×10⁻⁶. In conclusion, EOBC and LOBC appear to have similar genetic etiologies; the 5q11.2 region may contain multiple distinct breast cancer loci; and the PFKM gene region is worthy of further investigation. These findings should enhance our understanding of the etiology of breast cancer.
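Both significance thresholds quoted in this abstract are consistent with a Bonferroni correction over the number of tests at each level. The sketch below assumes a family-wise error rate of 0.05, which the abstract does not state; the test counts are the SNP and gene totals given in the text.

```python
ALPHA = 0.05  # ASSUMPTION: family-wise error rate, not stated in the abstract

def bonferroni(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

snp_threshold = bonferroni(ALPHA, 1_265_548)   # SNPs analyzed in the discovery set
gene_threshold = bonferroni(ALPHA, 19_684)     # genes in the gene-based analysis

print(f"SNP-level threshold:  {snp_threshold:.2e}")
print(f"gene-level threshold: {gene_threshold:.2e}")
```

The two values come out near 4×10⁻⁸ and 2.5×10⁻⁶, matching the thresholds in the abstract. The gene-level threshold is roughly 64 times more lenient simply because far fewer tests are performed, which is one motivation for aggregating SNPs into genes.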
PMCID: PMC3990360  PMID: 24493630
14.  Identification and Functional Characterization of G6PC2 Coding Variants Influencing Glycemic Traits Define an Effector Transcript at the G6PC2-ABCB11 Locus 
Mahajan, Anubha | Sim, Xueling | Ng, Hui Jin | Manning, Alisa | Rivas, Manuel A. | Highland, Heather M. | Locke, Adam E. | Grarup, Niels | Im, Hae Kyung | Cingolani, Pablo | Flannick, Jason | Fontanillas, Pierre | Fuchsberger, Christian | Gaulton, Kyle J. | Teslovich, Tanya M. | Rayner, N. William | Robertson, Neil R. | Beer, Nicola L. | Rundle, Jana K. | Bork-Jensen, Jette | Ladenvall, Claes | Blancher, Christine | Buck, David | Buck, Gemma | Burtt, Noël P. | Gabriel, Stacey | Gjesing, Anette P. | Groves, Christopher J. | Hollensted, Mette | Huyghe, Jeroen R. | Jackson, Anne U. | Jun, Goo | Justesen, Johanne Marie | Mangino, Massimo | Murphy, Jacquelyn | Neville, Matt | Onofrio, Robert | Small, Kerrin S. | Stringham, Heather M. | Syvänen, Ann-Christine | Trakalo, Joseph | Abecasis, Goncalo | Bell, Graeme I. | Blangero, John | Cox, Nancy J. | Duggirala, Ravindranath | Hanis, Craig L. | Seielstad, Mark | Wilson, James G. | Christensen, Cramer | Brandslund, Ivan | Rauramaa, Rainer | Surdulescu, Gabriela L. | Doney, Alex S. F. | Lannfelt, Lars | Linneberg, Allan | Isomaa, Bo | Tuomi, Tiinamaija | Jørgensen, Marit E. | Jørgensen, Torben | Kuusisto, Johanna | Uusitupa, Matti | Salomaa, Veikko | Spector, Timothy D. | Morris, Andrew D. | Palmer, Colin N. A. | Collins, Francis S. | Mohlke, Karen L. | Bergman, Richard N. | Ingelsson, Erik | Lind, Lars | Tuomilehto, Jaakko | Hansen, Torben | Watanabe, Richard M. | Prokopenko, Inga | Dupuis, Josee | Karpe, Fredrik | Groop, Leif | Laakso, Markku | Pedersen, Oluf | Florez, Jose C. | Morris, Andrew P. | Altshuler, David | Meigs, James B. | Boehnke, Michael | McCarthy, Mark I. | Lindgren, Cecilia M. | Gloyn, Anna L.
PLoS Genetics  2015;11(1):e1004876.
Genome-wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P<5×10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF) = 1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights.
Author Summary
Understanding how FI and FG levels are regulated is important because their derangement is a feature of T2D. Despite recent success from GWAS in identifying regions of the genome influencing glycemic traits, collectively these loci explain only a small proportion of trait variance. Unlocking the biological mechanisms driving these associations has been challenging because the vast majority of variants map to non-coding sequence, and the genes through which they exert their impact are largely unknown. In the current study, we sought to increase our understanding of the physiological pathways influencing both traits using exome-array genotyping in up to 33,231 non-diabetic individuals to identify coding variants and consequently genes associated with either FG or FI levels. We identified novel association signals for both traits including the receptor for GLP-1 agonists which are a widely used therapy for T2D. Furthermore, we identified coding variants at several GWAS loci which point to the genes underlying these association signals. Importantly, we found that multiple coding variants in G6PC2 result in a loss of protein function and lower fasting glucose levels.
PMCID: PMC4307976  PMID: 25625282
15.  Quantitative Allelic Test - a Fast Test for Very Large Association Studies 
Genetic epidemiology  2013;37(8):831-839.
Advances in high-throughput technology have enabled the generation of unprecedented amounts of genomic data (e.g., next generation sequence data, transcriptomics, metabolomics, and proteomics), which promise to unravel the genetic architecture of complex traits. These discoveries may lead to novel therapeutic targets, guide disease prevention, and enable personalized medicine. However, the pace of data generation surpasses the ability to process and analyze the vast amounts of data. For example, in a typical study of transcription regulation, the relationships between more than 1 million genetic variants and ten thousand transcript levels are explored, requiring tens of billions of tests. In order to address this problem, we propose a fast, accurate, and robust method that can assess the significance of associations between quantitative phenotypes and genotypes. The method is an extension of the allelic test commonly used in case-control studies for the analysis of quantitative traits. We show the asymptotic equivalence of the proposed test to linear regression results. We also reduce a generalized linear regression problem to the comparison of two groups, which can handle non-normal and survival time phenotypes.
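The phrase "comparison of two groups" suggests one way an allelic test might carry over to a quantitative trait: count each chromosome as an observation, sort phenotype values by the allele carried, and compare the two resulting groups. The sketch below is one naive reading of that idea and should not be taken as the published method. The grouping scheme, the Welch-style statistic, and the toy data are all assumptions, and the statistic ignores the dependence between the two alleles of the same individual.

```python
from math import sqrt
from statistics import mean, variance

def allelic_groups(genotypes, phenotypes):
    """Split phenotype values by allele; each individual contributes two alleles.

    genotypes: number of alternate alleles per individual (0, 1, or 2)
    """
    ref, alt = [], []
    for g, y in zip(genotypes, phenotypes):
        alt.extend([y] * g)          # one copy of y per alternate allele
        ref.extend([y] * (2 - g))    # one copy of y per reference allele
    return ref, alt

def two_group_statistic(ref, alt):
    """Difference in group means scaled by its standard error (Welch-style)."""
    se = sqrt(variance(ref) / len(ref) + variance(alt) / len(alt))
    return (mean(alt) - mean(ref)) / se

# HYPOTHETICAL data: the phenotype rises with the number of alternate alleles.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 2.0, 2.1, 3.0, 3.3]
ref, alt = allelic_groups(genotypes, phenotypes)
z = two_group_statistic(ref, alt)
print(f"ref n={len(ref)}, alt n={len(alt)}, statistic={z:.2f}")
```

A comparison like this needs only sums and sums of squares per allele group, so it can be computed in a single pass over the data. That is the sort of saving that matters when a study involves billions of variant-by-transcript pairs.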
PMCID: PMC4054703  PMID: 24185610
GWAS; quantitative traits; allelic methods
16.  Characterization of Reverse Genetics-Derived Cold-Adapted Master Donor Virus A/Leningrad/134/17/57 (H2N2) and Reassortants with H5N1 Surface Genes in a Mouse Model 
Live attenuated influenza vaccines (LAIV) offer significant advantages over subunit or split inactivated vaccines to mitigate an eventual influenza pandemic, including simpler manufacturing processes and more cross-protective immune responses. Using an established reverse genetics (rg) system for wild-type (wt) A/Leningrad/134/1957 and cold-adapted (ca) A/Leningrad/134/17/1957 (Len17) master donor virus (MDV), we produced and characterized three rg H5N1 reassortant viruses carrying modified HA and intact NA genes from either A/Vietnam/1203/2004 (H5N1, VN1203, clade 1) or A/Egypt/321/2007 (H5N1, EG321, clade 2) virus. A mouse model of infection was used to determine the infectivity and tissue tropism of the parental wt viruses compared to the ca master donor viruses as well as the H5N1 reassortants. All ca viruses showed reduced replication in lungs and enhanced replication in nasal epithelium. In addition, the H5N1 HA and NA enhanced replication in lungs unless it was restricted by the internal genes of the ca MDV. Mice inoculated twice 4 weeks apart with the H5N1 reassortant LAIV candidate viruses developed serum hemagglutination inhibition (HI) and IgA antibody titers to the homologous and heterologous viruses consistent with protective immunity. These animals remained healthy after challenge inoculation with a lethal dose of homologous or heterologous wt H5N1 highly pathogenic avian influenza (HPAI) viruses. The profiles of viral replication in respiratory tissues and the immunogenicity and protective efficacy characteristics of the two ca H5N1 candidate LAIV viruses warrant further development into a vaccine for human use.
PMCID: PMC4018889  PMID: 24648485
17.  Structural Stability of Influenza A(H1N1)pdm09 Virus Hemagglutinins 
Journal of Virology  2014;88(9):4828-4838.
The noncovalent interactions that mediate trimerization of the influenza hemagglutinin (HA) are important determinants of its biological activities. Recent studies have demonstrated that mutations in the HA trimer interface affect the thermal and pH sensitivities of HA, suggesting a possible impact on vaccine stability. We used size exclusion chromatography analysis of recombinant HA ectodomain to compare recombinant HA proteins from early 2009 pandemic H1N1 viruses, which dissociate to monomers, with those of more recent viruses, which can be expressed as trimers. We analyzed differences among the HA sequences and identified intermolecular interactions mediated by the residue at position 374 (HA0 numbering) of the HA2 subdomain as critical for HA trimer stability. Crystallographic analyses of HA from the recent H1N1 virus A/Washington/5/2011 highlight the structural basis for this observed phenotype. It remains to be seen whether more recent viruses with this mutation will yield more stable vaccines in the future.
IMPORTANCE Hemagglutinins from the early 2009 H1N1 pandemic viruses are unable to maintain a trimeric complex when expressed in a recombinant system. However, HAs from 2010 and 2011 strains are more stable, and our work highlights that the improvement in stability can be attributed to an E374K substitution in the HA2 subunit of the stalk that emerged naturally in the circulating viruses.
PMCID: PMC3993803  PMID: 24522930
18.  A Non-Degenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk 
Cell  2013;155(1):10.1016/j.cell.2013.08.030.
Whereas countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. Here, we examine the extent to which Mendelian variation contributes to complex disease risk by mining the medical records of over 110 million patients. We detect thousands of associations between Mendelian and complex diseases, revealing a non-degenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this “Mendelian code.” Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute non-additively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.
PMCID: PMC3844554  PMID: 24074861
19.  Obesity-associated variants within FTO form long-range functional connections with IRX3 
Nature  2014;507(7492):371-375.
Genome-wide association studies (GWAS) have reproducibly associated variants within introns of FTO with increased risk for obesity and type-2 diabetes (T2D) 1–3. While the molecular mechanisms linking these noncoding variants with obesity are not immediately obvious, subsequent studies in mice demonstrated that FTO expression levels influence body mass and composition phenotypes 4–6. Yet, no direct connection between the obesity-associated variants and FTO expression or function has been made 7–9. Here, we show that the obesity-associated noncoding sequences within FTO are functionally connected, at megabase distances, with the homeobox gene IRX3. The obesity-associated FTO region directly interacts with the promoters of IRX3 as well as FTO in the human, mouse, and zebrafish genomes. Furthermore, long-range enhancers within this region recapitulate aspects of IRX3 expression, suggesting that the obesity-associated interval belongs to the regulatory landscape of IRX3. Supporting this, obesity-associated SNPs are associated with expression of IRX3, but not FTO, in human brains. Directly linking IRX3 expression with regulation of body mass and composition, Irx3-deficient mice exhibit a 25–30% reduction in body weight, primarily through the loss of fat mass and increase in basal metabolic rate with browning of white adipose tissue. Furthermore, hypothalamic expression of a dominant negative form of Irx3 reproduces the metabolic phenotypes of Irx3-deficient mice. Our data posit that IRX3 is a functional long-range target of obesity-associated variants within FTO, and represents a novel determinant of body mass and composition.
PMCID: PMC4113484  PMID: 24646999
20.  The chromosome 3q25 genomic region is associated with measures of adiposity in newborns in a multi-ethnic genome-wide association study 
Human Molecular Genetics  2013;22(17):3583-3596.
Newborns characterized as large and small for gestational age are at risk for increased mortality and morbidity during the first year of life as well as for obesity and dysglycemia as children and adults. The intrauterine environment and fetal genes contribute to the fetal size at birth. To define the genetic architecture underlying the newborn size, we performed a genome-wide association study (GWAS) in 4281 newborns in four ethnic groups from the Hyperglycemia and Adverse Pregnancy Outcome Study. We tested for association with newborn anthropometric traits (birth length, head circumference, birth weight, percent fat mass and sum of skinfolds) and newborn metabolic traits (cord glucose and C-peptide) under three models. Model 1 adjusted for field center, ancestry, neonatal gender, gestational age at delivery, parity, maternal age at oral glucose tolerance test (OGTT); Model 2 adjusted for Model 1 covariates, maternal body mass index (BMI) at OGTT, maternal height at OGTT, maternal mean arterial pressure at OGTT, maternal smoking and drinking; Model 3 adjusted for Model 2 covariates, maternal glucose and C-peptide at OGTT. Strong evidence for association was observed with measures of newborn adiposity (sum of skinfolds model 3 Z-score 7.356, P = 1.90×10⁻¹³, and to a lesser degree fat mass and birth weight) and a region on Chr3q25.31 mapping between CCNL and LEKR1. These findings were replicated in an independent cohort of 2296 newborns. This region has previously been shown to be associated with birth weight in Europeans. The current study suggests that association of this locus with birth weight is secondary to an effect on fat as opposed to lean body mass.
PMCID: PMC3736865  PMID: 23575227
21.  Identification of HKDC1 and BACE2 as Genes Influencing Glycemic Traits During Pregnancy Through Genome-Wide Association Studies 
Diabetes  2013;62(9):3282-3291.
Maternal metabolism during pregnancy impacts the developing fetus, affecting offspring birth weight and adiposity. This has important implications for metabolic health later in life (e.g., offspring of mothers with pre-existing or gestational diabetes mellitus have an increased risk of metabolic disorders in childhood). To identify genetic loci associated with measures of maternal metabolism obtained during an oral glucose tolerance test at ∼28 weeks’ gestation, we performed a genome-wide association study of 4,437 pregnant mothers of European (n = 1,367), Thai (n = 1,178), Afro-Caribbean (n = 1,075), and Hispanic (n = 817) ancestry, along with replication of top signals in three additional European ancestry cohorts. In addition to identifying associations with genes previously implicated with measures of glucose metabolism in nonpregnant populations, we identified two novel genome-wide significant associations: 2-h plasma glucose and HKDC1, and fasting C-peptide and BACE2. These results suggest that the genetic architecture underlying glucose metabolism may differ, in part, in pregnancy.
PMCID: PMC3749326  PMID: 23903356
22.  University of Chicago Center for Personalized Therapeutics: research, education and implementation science 
Pharmacogenomics  2013;14(12):1383-1387.
Pharmacogenomics is aimed at advancing our knowledge of the genetic basis of variable drug response. The Center for Personalized Therapeutics within the University of Chicago comprises basic, translational and clinical research as well as education including undergraduate, graduate, medical students, clinical/postdoctoral fellows and faculty. The Committee on Clinical Pharmacology and Pharmacogenomics is the educational arm of the Center aimed at training clinical and postdoctoral fellows in translational pharmacology and pharmacogenomics. Research runs the gamut from basic discovery and functional studies to pharmacogenomic implementation studies to evaluate physician adoption of genetic medicine. The mission of the Center is to facilitate research, education and implementation of pharmacogenomics to realize the true potential of personalized medicine and improve the lives of patients.
PMCID: PMC4022693  PMID: 24024891
23.  Genetic variants associated with warfarin dose in African-American individuals: a genome-wide association study 
Lancet  2013;382(9894):790-796.
VKORC1 and CYP2C9 are important contributors to warfarin dose variability, but explain less variability for individuals of African descent than for those of European or Asian descent. We aimed to identify additional variants contributing to warfarin dose requirements in African Americans.
We did a genome-wide association study of discovery and replication cohorts. Samples from African-American adults (aged ≥18 years) who were taking a stable maintenance dose of warfarin were obtained at International Warfarin Pharmacogenetics Consortium (IWPC) sites and the University of Alabama at Birmingham (Birmingham, AL, USA). Patients enrolled at IWPC sites but who were not used for discovery made up the independent replication cohort. All participants were genotyped. We did a stepwise conditional analysis, conditioning first for VKORC1 −1639G→A, followed by the composite genotype of CYP2C9*2 and CYP2C9*3. We prespecified a genome-wide significance threshold of p<5×10⁻⁸ in the discovery cohort and p<0·0038 in the replication cohort.
The discovery cohort contained 533 participants and the replication cohort 432 participants. After the prespecified conditioning in the discovery cohort, we identified an association between a novel single nucleotide polymorphism in the CYP2C cluster on chromosome 10 (rs12777823) and warfarin dose requirement that reached genome-wide significance (p=1·51×10⁻⁸). This association was confirmed in the replication cohort (p=5·04×10⁻⁵); analysis of the two cohorts together produced a p value of 4·5×10⁻¹². Individuals heterozygous for the rs12777823 A allele need a dose reduction of 6·92 mg/week and those homozygous 9·34 mg/week. Regression analysis showed that the inclusion of rs12777823 significantly improves warfarin dose variability explained by the IWPC dosing algorithm (21% relative improvement).
A novel CYP2C single nucleotide polymorphism exerts a clinically relevant effect on warfarin dose in African Americans, independent of CYP2C9*2 and CYP2C9*3. Incorporation of this variant into pharmacogenetic dosing algorithms could improve warfarin dose prediction in this population.
Funding: National Institutes of Health, American Heart Association, Howard Hughes Medical Institute, Wisconsin Network for Health Research, and the Wellcome Trust.
PMCID: PMC3759580  PMID: 23755828
24.  Genome-Wide Association Studies in Pharmacogenomics: Successes and Lessons 
Pharmacogenetics and genomics  2013;23(8):383-394.
As genotyping technology has progressed, genome-wide association studies (GWAS) have matured into efficient and effective tools for mapping genes underlying human phenotypes. Recent studies have demonstrated the utility of the GWAS approach for examining pharmacogenomic traits, including drug metabolism, efficacy, and toxicity. Application of GWAS to pharmacogenomic outcomes presents unique challenges and opportunities. In the current review, we discuss the potential promises and potential caveats of this approach specifically as it relates to pharmacogenomic studies. Concerns with study design, power and sample size, and analysis are reviewed. We further examine the features of successful pharmacogenomic GWAS, and describe consortia efforts that are likely to expand the reach of pharmacogenomic GWAS in the future.
PMCID: PMC3003940  PMID: 20639796
Genome-wide association; GWAS; pharmacogenetic; pharmacogenomic; drug response; drug metabolism; toxicity

