1.  Coronary Heart Disease-Associated Variation in TCF21 Disrupts a miR-224 Binding Site and miRNA-Mediated Regulation 
PLoS Genetics  2014;10(3):e1004263.
Genome-wide association studies (GWAS) have identified chromosomal loci that affect risk of coronary heart disease (CHD) independent of classical risk factors. One such association signal has been identified at 6q23.2 in both Caucasians and East Asians. The lead CHD-associated polymorphism in this region, rs12190287, resides in the 3′ untranslated region (3′-UTR) of TCF21, a basic-helix-loop-helix transcription factor, and is predicted to alter the seed binding sequence for miR-224. Allelic imbalance studies in circulating leukocytes and human coronary artery smooth muscle cells (HCASMC) showed significant imbalance of the TCF21 transcript that correlated with genotype at rs12190287, consistent with this variant contributing to allele-specific expression differences. 3′ UTR reporter gene transfection studies in HCASMC showed that the disease-associated C allele has reduced expression compared to the protective G allele. Kinetic analyses in vitro revealed faster RNA-RNA complex formation and greater binding of miR-224 with the TCF21 C allelic transcript. In addition, in vitro probing with Pb2+ and RNase T1 revealed structural differences between the TCF21 variants in proximity of the rs12190287 variant, which are predicted to provide greater access to the C allele for miR-224 binding. miR-224 and TCF21 expression levels were anti-correlated in HCASMC, and miR-224 modulates the transcriptional response of TCF21 to transforming growth factor-β (TGF-β) and platelet derived growth factor (PDGF) signaling in an allele-specific manner. Lastly, miR-224 and TCF21 were localized in human coronary artery lesions and anti-correlated during atherosclerosis. Together, these data suggest that miR-224 interaction with the TCF21 transcript contributes to allelic imbalance of this gene, thus partly explaining the genetic risk for coronary heart disease associated at 6q23.2. 
These studies implicating rs12190287 in the miRNA-dependent regulation of TCF21, in conjunction with previous studies showing that this variant modulates transcriptional regulation through activator protein 1 (AP-1), suggest a unique bimodal level of complexity previously unreported for disease-associated variants.
Author Summary
Both genetic and environmental factors cumulatively contribute to coronary heart disease risk in human populations. Large-scale meta-analyses of genome-wide association studies have now leveraged common genetic variation to identify multiple sites of disease susceptibility; however, the causal mechanisms for these associations largely remain elusive. One of these disease-associated variants, rs12190287, resides in the 3′ untranslated region of the vascular developmental transcription factor, TCF21. Intriguingly, this variant is shown to disrupt the seed binding sequence for microRNA-224, and through altered RNA secondary structure and binding kinetics, leads to dysregulated TCF21 gene expression in response to disease-relevant stimuli. Importantly, TCF21 and miR-224 expression levels were perturbed in human atherosclerotic lesions. Along with our previous reports on the transcriptional regulatory mechanisms altered by this variant, these studies shed new light on the complex heritable mechanisms of coronary heart disease risk that are amenable to therapeutic intervention.
PMCID: PMC3967965  PMID: 24676100
2.  Re-sequencing Expands Our Understanding of the Phenotypic Impact of Variants at GWAS Loci 
PLoS Genetics  2014;10(1):e1004147.
Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20–30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5′ and 3′ untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
Author Summary
Abnormal serum levels of various metabolites, including measures relevant to cholesterol, other fats, and sugars, are known to be risk factors for cardiovascular disease and type 2 diabetes. Identification of the genes that play a role in generating such abnormalities could advance the development of new treatment and prevention strategies for these disorders. Investigations of common genetic variants carried out in large sets of research subjects have successfully pinpointed such genes within many regions of the human genome. However, these studies often have not led to the identification of the specific genetic variations affecting metabolic traits. To attempt to detect such causal variations, we sequenced genes in 17 genomic regions implicated in metabolic traits in >6,000 people from Finland. By conducting statistical analyses relating specific variations (individually and grouped by gene) to the measures for these metabolic traits observed in the study subjects, we added to our understanding of how genotypes affect these traits. Our findings support a long-held hypothesis that the unique history of the Finnish population provides important advantages for analyzing the relationship between genetic variations and biomedically important traits.
PMCID: PMC3907339  PMID: 24497850
3.  Prediction and Interaction in Complex Disease Genetics: Experience in Type 1 Diabetes 
PLoS Genetics  2009;5(7):e1000540.
PMCID: PMC2703795  PMID: 19584936
4.  Genome Wide Analysis of Narcolepsy in China Implicates Novel Immune Loci and Reveals Changes in Association Prior to Versus After the 2009 H1N1 Influenza Pandemic 
PLoS Genetics  2013;9(10):e1003880.
Previous studies in narcolepsy, an autoimmune disorder affecting hypocretin (orexin) neurons and recently associated with H1N1 influenza, have demonstrated significant associations with five loci. Using a well-characterized Chinese cohort, we refined known associations in TRA@ and P2RY11-DNMT1 and identified new associations in the TCR beta (TRB@; rs9648789 max P = 3.7×10⁻⁹ OR 0.77), ZNF365 (rs10995245 max P = 1.2×10⁻¹¹ OR 1.23), and IL10RB-IFNAR1 loci (rs2252931 max P = 2.2×10⁻⁹ OR 0.75). Variants in the Human Leukocyte Antigen (HLA)-DQ region were associated with age of onset (rs7744020 P = 7.9×10⁻⁹ beta −1.9 years) and varied significantly among cases with onset after the 2009 H1N1 influenza pandemic compared to previous years (rs9271117 P = 7.8×10⁻¹⁰ OR 0.57). These reflected an association of DQB1*03:01 with earlier onset and decreased DQB1*06:02 homozygosity following 2009. Our results illustrate how genetic association can change in the presence of new environmental challenges and suggest that the monitoring of genetic architecture over time may help reveal the appearance of novel triggers for autoimmune diseases.
Author Summary
Narcolepsy-hypocretin deficiency results from a highly specific autoimmune attack on hypocretin cells. Recent studies have established antigen presentation by specific class II proteins (encoded by HLA DQB1*06:02 and DQA1*01:02) to the cognate T cell receptor as the main disease pathway, with a role for H1N1 influenza in the triggering process. Here, we have used a large and well-characterized cohort of Chinese narcolepsy cases to examine genetic architecture not observed in European samples. We confirmed previously implicated susceptibility genes (T cell receptor alpha, P2RY11) and identified new loci (ZNF365, IL10RB-IFNAR1) and, most notably, variants at the beta chain of the T cell receptor. We found that one HLA variant, DQB1*03:01, is associated with dramatically earlier disease onset (nearly 2 years). We also identified differences in HLA haplotype frequencies among cases with onset following the 2009 H1N1 influenza pandemic as compared to before the outbreak, with fewer HLA DQB1*06:02 homozygotes. This may be the first demonstration of such an effect, and suggests that the study of changes in GWAS signals over time could help identify environmental factors in other autoimmune diseases.
PMCID: PMC3814311  PMID: 24204295
5.  Alu Elements in ANRIL Non-Coding RNA at Chromosome 9p21 Modulate Atherogenic Cell Functions through Trans-Regulation of Gene Networks 
PLoS Genetics  2013;9(7):e1003588.
The chromosome 9p21 (Chr9p21) locus of coronary artery disease has been identified in the first surge of genome-wide association and is the strongest genetic factor of atherosclerosis known today. Chr9p21 encodes the long non-coding RNA (ncRNA) antisense non-coding RNA in the INK4 locus (ANRIL). ANRIL expression is associated with the Chr9p21 genotype and correlated with atherosclerosis severity. Here, we report on the molecular mechanisms through which ANRIL regulates target-genes in trans, leading to increased cell proliferation, increased cell adhesion and decreased apoptosis, which are all essential mechanisms of atherogenesis. Importantly, trans-regulation was dependent on Alu motifs, which marked the promoters of ANRIL target genes and were mirrored in ANRIL RNA transcripts. ANRIL bound Polycomb group proteins that were highly enriched in the proximity of Alu motifs across the genome and were recruited to promoters of target genes upon ANRIL over-expression. The functional relevance of Alu motifs in ANRIL was confirmed by deletion and mutagenesis, reversing trans-regulation and atherogenic cell functions. ANRIL-regulated networks were confirmed in 2280 individuals with and without coronary artery disease and functionally validated in primary cells from patients carrying the Chr9p21 risk allele. Our study provides a molecular mechanism for pro-atherogenic effects of ANRIL at Chr9p21 and suggests a novel role for Alu elements in epigenetic gene regulation by long ncRNAs.
Author Summary
Chromosome 9p21 is the strongest genetic factor for coronary artery disease and encodes the long non-coding RNA (ncRNA) ANRIL. Here, we show that increased ANRIL expression mediates atherosclerosis risk through trans-regulation of gene networks leading to pro-atherogenic cellular properties, such as increased proliferation and adhesion. ANRIL may act as a scaffold, guiding effector-proteins to chromatin. These functions depend on an Alu motif present in ANRIL RNA and mirrored several thousand-fold in the genome. Alu elements are a family of primate-specific short interspersed repeat elements (SINEs) and have been linked with genetic disease. Current models propose that either exonisation of Alu elements or changes of cis-regulation of adjacent genes are the underlying disease mechanisms. Our work extends the function of Alu transposons to regulatory components of long ncRNAs with a central role in epigenetic trans-regulation. Furthermore, it implies a pivotal role for Alu elements in genetically determined vascular disease and describes a plausible molecular mechanism for a pro-atherogenic function of ANRIL at chromosome 9p21.
PMCID: PMC3701717  PMID: 23861667
6.  GLIS3, a Susceptibility Gene for Type 1 and Type 2 Diabetes, Modulates Pancreatic Beta Cell Apoptosis via Regulation of a Splice Variant of the BH3-Only Protein Bim 
PLoS Genetics  2013;9(5):e1003532.
Mutations in human Gli-similar (GLIS) 3 protein cause neonatal diabetes. The GLIS3 gene region has also been identified as a susceptibility risk locus for both type 1 and type 2 diabetes. GLIS3 plays a role in the generation of pancreatic beta cells and in insulin gene expression, but there is no information on the role of this gene on beta cell viability and/or susceptibility to immune- and metabolic-induced stress. GLIS3 knockdown (KD) in INS-1E cells, primary FACS-purified rat beta cells, and human islet cells decreased expression of MafA, Ins2, and Glut2 and inhibited glucose oxidation and insulin secretion, confirming the role of this transcription factor for the beta cell differentiated phenotype. GLIS3 KD increased beta cell apoptosis basally and sensitized the cells to death induced by pro-inflammatory cytokines (interleukin 1β + interferon-γ) or palmitate, agents that may contribute to beta cell loss in type 1 and type 2 diabetes, respectively. The increased cell death was due to activation of the intrinsic (mitochondrial) pathway of apoptosis, as indicated by cytochrome c release to the cytosol, Bax translocation to the mitochondria and activation of caspases 9 and 3. Analysis of the pathways implicated in beta cell apoptosis following GLIS3 KD indicated modulation of alternative splicing of the pro-apoptotic BH3-only protein Bim, favouring expression of the pro-death variant BimS via inhibition of the splicing factor SRp55. KD of Bim abrogated the pro-apoptotic effect of GLIS3 loss of function alone or in combination with cytokines or palmitate. The present data suggest that altered expression of the candidate gene GLIS3 may contribute to both type 1 and type 2 diabetes by favouring beta cell apoptosis. This is mediated by alternative splicing of the pro-apoptotic protein Bim and exacerbated formation of the most pro-apoptotic variant BimS.
Author Summary
Pancreatic beta cell dysfunction and death is a central event in the pathogenesis of diabetes. Genome-wide association studies have identified a large number of associations between specific loci and the two main forms of diabetes, namely type 1 and type 2 diabetes, but the mechanisms by which these candidate genes predispose to diabetes remain to be clarified. The GLIS3 gene region has been identified as a susceptibility risk locus for both type 1 and type 2 diabetes; it is actually the only locus showing association with both forms of diabetes and the regulation of blood glucose. We show that decreased expression of GLIS3 may contribute to diabetes by favouring beta cell apoptosis. This is mediated by the mitochondrial pathway of apoptosis, activated via alternative splicing (a process by which exons are joined in multiple ways, leading to the generation of several proteins by a single gene) of the pro-apoptotic protein Bim, which favours formation of the most pro-apoptotic variant. The present data provide the first evidence that a susceptibility gene for diabetes may contribute to disease via regulation of alternative splicing of a pro-apoptotic gene in pancreatic beta cells.
PMCID: PMC3667755  PMID: 23737756
7.  Human Genetics in Rheumatoid Arthritis Guides a High-Throughput Drug Screen of the CD40 Signaling Pathway 
PLoS Genetics  2013;9(5):e1003487.
Although genetic and non-genetic studies in mouse and human implicate the CD40 pathway in rheumatoid arthritis (RA), there are no approved drugs that inhibit CD40 signaling for clinical care in RA or any other disease. Here, we sought to understand the biological consequences of a CD40 risk variant in RA discovered by a previous genome-wide association study (GWAS) and to perform a high-throughput drug screen for modulators of CD40 signaling based on human genetic findings. First, we fine-map the CD40 risk locus in 7,222 seropositive RA patients and 15,870 controls, together with deep sequencing of CD40 coding exons in 500 RA cases and 650 controls, to identify a single SNP that explains the entire signal of association (rs4810485, P = 1.4×10⁻⁹). Second, we demonstrate that subjects homozygous for the RA risk allele have ∼33% more CD40 on the surface of primary human CD19+ B lymphocytes than subjects homozygous for the non-risk allele (P = 10⁻⁹), a finding corroborated by expression quantitative trait loci (eQTL) analysis in peripheral blood mononuclear cells from 1,469 healthy control individuals. Third, we use retroviral shRNA infection to perturb the amount of CD40 on the surface of a human B lymphocyte cell line (BL2) and observe a direct correlation between amount of CD40 protein and phosphorylation of RelA (p65), a subunit of the NF-κB transcription factor. Finally, we develop a high-throughput NF-κB luciferase reporter assay in BL2 cells activated with trimerized CD40 ligand (tCD40L) and conduct an HTS of 1,982 chemical compounds and FDA–approved drugs. After a series of counter-screens and testing in primary human CD19+ B cells, we identify 2 novel chemical inhibitors not previously implicated in inflammation or CD40-mediated NF-κB signaling.
Our study demonstrates proof-of-concept that human genetics can be used to guide the development of phenotype-based, high-throughput small-molecule screens to identify potential novel therapies in complex traits such as RA.
Author Summary
A current challenge in human genetics is to follow up “hits” from genome-wide association studies (GWAS) to guide drug discovery for complex traits. Previously, we identified a common variant in the CD40 locus as associated with risk of rheumatoid arthritis (RA). Here, we fine-map the CD40 signal of association through a combination of dense genotyping and exonic sequencing in large patient collections. Further, we demonstrate that the RA risk allele is a gain-of-function allele that increases the amount of CD40 on the surface of primary human B lymphocyte cells from healthy control individuals. Based on these observations, we develop a high-throughput assay to recapitulate the biology of the RA risk allele in a system suitable for a small molecule drug screen. After a series of primary screens and counter screens, we identify small molecules that inhibit CD40-mediated NF-κB signaling in human B cells. While this is only the first step towards a more comprehensive effort to identify CD40-specific inhibitors that may be used to treat RA, our study demonstrates a successful strategy to progress from a GWAS to a drug screen for complex traits such as RA.
PMCID: PMC3656093  PMID: 23696745
8.  RNAi–Based Functional Profiling of Loci from Blood Lipid Genome-Wide Association Studies Identifies Genes with Cholesterol-Regulatory Function 
PLoS Genetics  2013;9(2):e1003338.
Genome-wide association studies (GWAS) are powerful tools to unravel genomic loci associated with common traits and complex human disease. However, GWAS only rarely reveal information on the exact genetic elements and pathogenic events underlying an association. In order to extract functional information from genomic data, strategies for systematic follow-up studies on a phenotypic level are required. Here we address these limitations by applying RNA interference (RNAi) to analyze 133 candidate genes within 56 loci identified by GWAS as associated with blood lipid levels, coronary artery disease, and/or myocardial infarction for a function in regulating cholesterol levels in cells. Knockdown of a surprisingly high number (41%) of trait-associated genes affected low-density lipoprotein (LDL) internalization and/or cellular levels of free cholesterol. Our data further show that individual GWAS loci may contain more than one gene with cholesterol-regulatory functions. Using a set of secondary assays we demonstrate for a number of genes without previously known lipid-regulatory roles (e.g. CXCL12, FAM174A, PAFAH1B1, SEZ6L, TBL2, WDR12) that knockdown correlates with altered LDL–receptor levels and/or that overexpression as GFP–tagged fusion proteins inversely modifies cellular cholesterol levels. By providing strong evidence for disease-relevant functions of lipid trait-associated genes, our study demonstrates that quantitative, cell-based RNAi is a scalable strategy for a systematic, unbiased detection of functional effectors within GWAS loci.
Author Summary
Complex traits and diseases are assumed to result from interactions between multiple genes in relevant biological processes. Recent genome-wide association studies have uncovered many novel genomic loci where genes with functional significance are expected. However, functional validation of such genes has thus far remained confined to single gene approaches. Here, we use RNA interference and high-content screening microscopy to profile 133 genes at 56 loci associated with blood lipid traits, cardiovascular disease, and/or myocardial infarction for a function in regulating cellular free cholesterol levels and the efficiency of low-density lipoprotein uptake. Our results suggest that a high number of trait-associated genes have conserved cholesterol-regulatory functions in cells, with several GWAS loci harboring more than one gene of likely functional significance. For a number of genes without previously known lipid-regulatory functions, consequences upon siRNA knockdown positively correlated with cellular levels of LDL receptor, a major determinant of blood LDL levels. Moreover, GFP–tagged fusion proteins of several candidates shifted cellular cholesterol levels in the direction opposite to that seen with knockdown, and subcellular localization of some candidates was sterol-dependent. Our study generates a valuable resource for prioritization of lipid-trait/CAD/MI-associated genes for future in-depth mechanistic analyses and introduces cell-based RNAi as a scalable and unbiased tool for functional follow-up of GWAS loci.
PMCID: PMC3585126  PMID: 23468663
9.  A Systematic Mapping Approach of 16q12.2/FTO and BMI in More Than 20,000 African Americans Narrows in on the Underlying Functional Variation: Results from the Population Architecture using Genomics and Epidemiology (PAGE) Study 
PLoS Genetics  2013;9(1):e1003171.
Genetic variants in intron 1 of the fat mass– and obesity-associated (FTO) gene have been consistently associated with body mass index (BMI) in Europeans. However, follow-up studies in African Americans (AA) have shown no support for some of the most consistently BMI–associated FTO index single nucleotide polymorphisms (SNPs). This is most likely explained by different race-specific linkage disequilibrium (LD) patterns and lower correlation overall in AA, which provides the opportunity to fine-map this region and narrow in on the functional variant. To comprehensively explore the 16q12.2/FTO locus and to search for second independent signals in the broader region, we fine-mapped a 646–kb region, encompassing the large FTO gene and the flanking gene RPGRIP1L by investigating a total of 3,756 variants (1,529 genotyped and 2,227 imputed variants) in 20,488 AAs across five studies. We observed associations between BMI and variants in the known FTO intron 1 locus: the SNP with the most significant p-value, rs56137030 (8.3×10⁻⁶) had not been highlighted in previous studies. While rs56137030 was correlated at r²>0.5 with 103 SNPs in Europeans (including the GWAS index SNPs), this number was reduced to 28 SNPs in AA. Among rs56137030 and the 28 correlated SNPs, six were located within candidate intronic regulatory elements, including rs1421085, for which we predicted allele-specific binding affinity for the transcription factor CUX1, which has recently been implicated in the regulation of FTO. We did not find strong evidence for a second independent signal in the broader region. In summary, this large fine-mapping study in AA has substantially reduced the number of common alleles that are likely to be functional candidates of the known FTO locus. Importantly, our study demonstrated that comprehensive fine-mapping in AA provides a powerful approach to narrow in on the functional candidate(s) underlying the initial GWAS findings in European populations.
Author Summary
Genetic variants within the fat mass– and obesity-associated (FTO) gene are associated with increased risk of obesity. To better understand which specific genetic variant(s) in this genetic region is associated with obesity risk, we attempt to genotype or impute all known genetic variants in the region and test for association with body mass index as a measurement of obesity in over 20,000 African Americans. We identified 29 potential candidate variants, of which one variant (rs1421085) is a particularly interesting candidate for future functional follow-up studies. Our example shows the powerful approach of studying a large African American population, substantially reducing the number of possible functional variants compared with European descent populations.
PMCID: PMC3547789  PMID: 23341774
10.  Integrative Analysis of a Cross-Loci Regulation Network Identifies App as a Gene Regulating Insulin Secretion from Pancreatic Islets 
PLoS Genetics  2012;8(12):e1003107.
Complex diseases result from molecular changes induced by multiple genetic factors and the environment. To derive a systems view of how genetic loci interact in the context of tissue-specific molecular networks, we constructed an F2 intercross comprised of >500 mice from diabetes-resistant (B6) and diabetes-susceptible (BTBR) mouse strains made genetically obese by the Leptin^ob/ob mutation (Lep^ob). High-density genotypes, diabetes-related clinical traits, and whole-transcriptome expression profiling in five tissues (white adipose, liver, pancreatic islets, hypothalamus, and gastrocnemius muscle) were determined for all mice. We performed an integrative analysis to investigate the inter-relationship among genetic factors, expression traits, and plasma insulin, a hallmark diabetes trait. Among five tissues under study, there are extensive protein–protein interactions between genes responding to different loci in adipose and pancreatic islets that potentially jointly participated in the regulation of plasma insulin. We developed a novel ranking scheme based on cross-loci protein-protein network topology and gene expression to assess each gene's potential to regulate plasma insulin. Unique candidate genes were identified in adipose tissue and islets. In islets, the Alzheimer's gene App was identified as a top candidate regulator. Islets from 17-week-old, but not 10-week-old, App knockout mice showed increased insulin secretion in response to glucose or a membrane-permeant cAMP analog, in agreement with the predictions of the network model. Our result provides a novel hypothesis on the mechanism for the connection between two aging-related diseases: Alzheimer's disease and type 2 diabetes.
Author Summary
Alzheimer's disease and type 2 diabetes are two common aging-related diseases. Numerous studies have shown that the two diseases are associated. However, the mechanisms of such connection are not clear. Both diseases are complex diseases that are induced by multiple genetic factors and the environment. To understand the molecular network regulated by complex genetic factors causing type 2 diabetes, we constructed an F2 intercross comprised of >500 mice from diabetes-resistant and diabetic mouse strains. We measured genotypes, clinical traits, and expression profiling in five tissues for each mouse. We then performed an integrative analysis to investigate the inter-relationship among genetic factors, expression traits, and plasma insulin, a hallmark diabetes trait, and developed a novel method for inferring key regulators for regulating plasma insulin. In islets, the Alzheimer's gene App was identified as a top candidate regulator. Islets from 17-week-old, but not 10-week-old, App knockout mice showed increased insulin secretion in response to glucose, in agreement with the predictions of the network model. Our result provides a novel hypothesis on the mechanism for the connection between two aging-related diseases: Alzheimer's disease and type 2 diabetes.
PMCID: PMC3516550  PMID: 23236292
11.  Population-Based Resequencing of APOA1 in 10,330 Individuals: Spectrum of Genetic Variation, Phenotype, and Comparison with Extreme Phenotype Approach 
PLoS Genetics  2012;8(11):e1003063.
Rare genetic variants, identified by in-detail resequencing of loci, may contribute to complex traits. We used the apolipoprotein A-I gene (APOA1), a major high-density lipoprotein (HDL) gene, and population-based resequencing to determine the spectrum of genetic variants, the phenotypic characteristics of these variants, and how these results compared with results based on resequencing only the extremes of the apolipoprotein A-I (apoA-I) distribution. First, we resequenced APOA1 in 10,330 population-based participants in the Copenhagen City Heart Study. The spectrum and distribution of genetic variants was determined as a function of the number of individuals resequenced. Second, apoA-I and HDL cholesterol phenotypes were determined for nonsynonymous (NS) and synonymous (S) variants and were validated in the Copenhagen General Population Study (n = 45,239). Third, observed phenotypes were compared with those predicted using an extreme phenotype approach based on the apoA-I distribution. Our results are as follows: First, population-based resequencing of APOA1 identified 40 variants of which only 7 (18%) had minor allele frequencies >1%, and most were exceedingly rare. Second, 0.27% of individuals in the general population were heterozygous for NS variants which were associated with substantial reductions in apoA-I (up to 39 mg/dL) and/or HDL cholesterol (up to 0.9 mmol/L) and, surprisingly, 0.41% were heterozygous for variants predisposing to amyloidosis. NS variants associated with a hazard ratio of 1.72 (1.09–2.70) for myocardial infarction (MI), largely driven by A164S, a variant not associated with apoA-I or HDL cholesterol levels. Third, using the extreme apoA-I phenotype approach, NS variants correctly predicted the apoA-I phenotype observed in the population-based resequencing. 
However, using the extreme approach, between 79% (screening 0–1st percentile) and 21% (screening 0–20th percentile) of all variants were not identified; among these were variants previously associated with amyloidosis. Population-based resequencing of APOA1 identified a majority of rare NS variants associated with reduced apoA-I and HDL cholesterol levels and/or predisposing to amyloidosis. In addition, NS variants associated with increased risk of MI.
Author Summary
Rare genetic variants, identified by in-detail resequencing of loci, may contribute to complex traits. We used the apolipoprotein A-I gene (APOA1), a major high-density lipoprotein (HDL) gene, and population-based resequencing to determine the spectrum of genetic variants, the phenotypic characteristics of these variants, and how these results compared with results based on resequencing only the extremes of the apolipoprotein A-I (apoA-I) distribution. By resequencing APOA1 in >10,000 Danes and genotyping an additional >45,000, we show that population-based resequencing of APOA1 identifies a majority of rare genetic variants that together are relatively frequent: 0.27% of the population are heterozygous for nonsynonymous (NS) variants in APOA1 that associate with substantial reductions in apoA-I and HDL cholesterol, and 0.41% are heterozygous for variants predisposing to amyloidosis. NS variants associated with a hazard ratio of 1.72 (1.09–2.70) for myocardial infarction (MI), largely driven by A164S, a variant not associated with apoA-I or HDL cholesterol levels. Resequencing only the extremes of the apoA-I distribution, between 79% and 21% of all variants are not identified; among these are variants previously associated with amyloidosis. These results provide direct evidence that rare NS variants in APOA1 contribute to low apoA-I and HDL cholesterol levels, to susceptibility to amyloidosis, and to risk of MI in the general population.
PMCID: PMC3510059  PMID: 23209431
12.  Mining the Unknown: A Systems Approach to Metabolite Identification Combining Genetic and Metabolic Information 
PLoS Genetics  2012;8(10):e1003005.
Recent genome-wide association studies (GWAS) with metabolomics data linked genetic variation in the human genome to differences in individual metabolite levels. A strong relevance of this metabolic individuality for biomedical and pharmaceutical research has been reported. However, a considerable amount of the molecules currently quantified by modern metabolomics techniques are chemically unidentified. The identification of these “unknown metabolites” is still a demanding and intricate task, limiting their usability as functional markers of metabolic processes. As a consequence, previous GWAS largely ignored unknown metabolites as metabolic traits for the analysis. Here we present a systems-level approach that combines genome-wide association analysis and Gaussian graphical modeling with metabolomics to predict the identity of the unknown metabolites. We apply our method to original data of 517 metabolic traits, of which 225 are unknowns, and genotyping information on 655,658 genetic variants, measured in 1,768 human blood samples. We report previously undescribed genotype–metabotype associations for six distinct gene loci (SLC22A2, COMT, CYP3A5, CYP2C18, GBA3, UGT3A1) and one locus not related to any known gene (rs12413935). Overlaying the inferred genetic associations, metabolic networks, and knowledge-based pathway information, we derive testable hypotheses on the biochemical identities of 106 unknown metabolites. As a proof of principle, we experimentally confirm nine concrete predictions. We demonstrate the benefit of our method for the functional interpretation of previous metabolomics biomarker studies on liver detoxification, hypertension, and insulin resistance. Our approach is generic in nature and can be directly transferred to metabolomics data from different experimental platforms.
Author Summary
Genome-wide association studies on metabolomics data have demonstrated that genetic variation in metabolic enzymes and transporters leads to concentration changes in the respective metabolite levels. The conventional goal of these studies is the detection of novel interactions between the genome and the metabolic system, providing valuable insights for both basic research and clinical applications. In this study, we borrow the metabolomics GWAS concept for a novel, entirely different purpose. Metabolite measurements frequently produce signals where a certain substance can be reliably detected in the sample, but it has not yet been elucidated which specific metabolite this signal actually represents. The concept is comparable to a fingerprint: each one is uniquely identifiable, but as long as it is not registered in a database one cannot tell to whom this fingerprint belongs. Obviously, this issue tremendously reduces the usability of metabolomics analyses. The genetic associations of such an “unknown,” however, give us concrete evidence of the metabolic pathway this substance is most probably involved in. Moreover, we complement the approach with a specific measure of correlation between metabolites, providing further evidence of the metabolic processes of the unknown. For a number of cases, this even allows for a concrete identity prediction, which we then experimentally validate in the lab.
PMCID: PMC3475673  PMID: 23093944
13.  Extent, Causes, and Consequences of Small RNA Expression Variation in Human Adipose Tissue 
PLoS Genetics  2012;8(5):e1002704.
Small RNAs are functional molecules that modulate mRNA transcripts and have been implicated in the aetiology of several common diseases. However, little is known about the extent of their variability within the human population. Here, we characterise the extent, causes, and effects of naturally occurring variation in expression and sequence of small RNAs from adipose tissue in relation to genotype, gene expression, and metabolic traits in the MuTHER reference cohort. We profiled the expression of 15 to 30 base pair RNA molecules in subcutaneous adipose tissue from 131 individuals using high-throughput sequencing, and quantified levels of 591 microRNAs and small nucleolar RNAs. We identified three genetic variants and three RNA editing events. Highly expressed small RNAs are more conserved within mammals than average, as are those with highly variable expression. We identified 14 genetic loci significantly associated with nearby small RNA expression levels, seven of which also regulate an mRNA transcript level in the same region. In addition, these loci are enriched for variants significant in genome-wide association studies for body mass index. Contrary to expectation, we found no evidence for negative correlation between expression level of a microRNA and its target mRNAs. Trunk fat mass, body mass index, and fasting insulin were associated with more than twenty small RNA expression levels each, while fasting glucose had no significant associations. This study highlights the similar genetic complexity and shared genetic control of small RNA and mRNA transcripts, and gives a quantitative picture of small RNA expression variation in the human population.
Author Summary
Genetic information is transmitted to the cell only through RNA molecules. A special class of RNAs is comprised of the small (up to 30 nucleotide) ones, known to be potent regulators of various cellular processes. At the same time, they have not been as widely studied as messenger RNAs—we do not know how much variation in their sequence and expression level occurs naturally in human populations or how this variability influences other traits. We measured small RNA levels and genetic variability in fat tissue from 131 individuals by high-throughput sequencing. We could associate the expression levels with genetic background of the individuals, as well as changes in metabolic traits. Surprisingly, we found no large scale influence of small RNA variation on mRNA levels, their main regulatory target. Overall, our study is the first to give a quantitative picture of the naturally occurring variation in these important regulatory molecules in human fat tissue.
PMCID: PMC3349731  PMID: 22589741
14.  Epigenome-Wide Scans Identify Differentially Methylated Regions for Age and Age-Related Phenotypes in a Healthy Ageing Population 
PLoS Genetics  2012;8(4):e1002629.
Age-related changes in DNA methylation have been implicated in cellular senescence and longevity, yet the causes and functional consequences of these variants remain unclear. To elucidate the role of age-related epigenetic changes in healthy ageing and potential longevity, we tested for association between whole-blood DNA methylation patterns in 172 female twins aged 32 to 80 with age and age-related phenotypes. Twin-based DNA methylation levels at 26,690 CpG-sites showed evidence for mean genome-wide heritability of 18%, which was supported by the identification of 1,537 CpG-sites with methylation QTLs in cis at FDR 5%. We performed genome-wide analyses to discover differentially methylated regions (DMRs) for sixteen age-related phenotypes (ap-DMRs) and chronological age (a-DMRs). Epigenome-wide association scans (EWAS) identified age-related phenotype DMRs (ap-DMRs) associated with LDL (STAT5A), lung function (WT1), and maternal longevity (ARL4A, TBX20). In contrast, EWAS for chronological age identified hundreds of predominantly hyper-methylated age DMRs (490 a-DMRs at FDR 5%), of which only one (TBX20) was also associated with an age-related phenotype. Therefore, the majority of age-related changes in DNA methylation are not associated with phenotypic measures of healthy ageing in later life. We replicated a large proportion of a-DMRs in a sample of 44 younger adult MZ twins aged 20 to 61, suggesting that a-DMRs may initiate at an earlier age. We next explored potential genetic and environmental mechanisms underlying a-DMRs and ap-DMRs. Genome-wide overlap across cis-meQTLs, genotype-phenotype associations, and EWAS ap-DMRs identified CpG-sites that had cis-meQTLs with evidence for genotype–phenotype association, where the CpG-site was also an ap-DMR for the same phenotype. Monozygotic twin methylation difference analyses identified one potential environmentally-mediated ap-DMR associated with total cholesterol and LDL (CSMD1). 
Our results suggest that in a small set of genes DNA methylation may be a candidate mechanism of mediating not only environmental, but also genetic effects on age-related phenotypes.
Author Summary
Epigenetic patterns vary during healthy ageing and development. Age-related DNA methylation changes have been implicated in cellular senescence and longevity, yet the causes and functional consequences of these variants remain unclear. To understand the biological mechanisms involved in potential longevity and rate of healthy ageing, we performed genome-wide association of epigenetic and genetic variation with both chronological age and age-related phenotypes. We identified hundreds of DNA methylation variants significantly associated with age and replicated these in an independent sample of young adult twins. Only a small proportion of these variants were also associated with age-related phenotypes. Therefore, the majority of age-related epigenetic changes do not contribute to rate of healthy ageing at later stages in life. Our results suggest that age-related changes in methylation occur throughout an individual's lifespan and that a proportion of these may be initiated from an early age. Intriguingly, a fraction of the age differentially methylated regions also associated with genetic variants in our sample, suggesting that DNA methylation may be a candidate mechanism of mediating not only environmental but also genetic effects on age-related phenotypes.
PMCID: PMC3330116  PMID: 22532803
15.  The Human Pancreatic Islet Transcriptome: Expression of Candidate Genes for Type 1 Diabetes and the Impact of Pro-Inflammatory Cytokines 
PLoS Genetics  2012;8(3):e1002552.
Type 1 diabetes (T1D) is an autoimmune disease in which pancreatic beta cells are killed by infiltrating immune cells and by cytokines released by these cells. Signaling events occurring in the pancreatic beta cells are decisive for their survival or death in diabetes. We have used RNA sequencing (RNA–seq) to identify transcripts, including splice variants, expressed in human islets of Langerhans under control conditions or following exposure to the pro-inflammatory cytokines interleukin-1β (IL-1β) and interferon-γ (IFN-γ). Based on this unique dataset, we examined whether putative candidate genes for T1D, previously identified by GWAS, are expressed in human islets. A total of 29,776 transcripts were identified as expressed in human islets. Expression of around 20% of these transcripts was modified by pro-inflammatory cytokines, including apoptosis- and inflammation-related genes. Chemokines were among the transcripts most modified by cytokines, a finding confirmed at the protein level by ELISA. Interestingly, 35% of the genes expressed in human islets undergo alternative splicing as annotated in RefSeq, and cytokines caused substantial changes in spliced transcripts. Nova1, previously considered a brain-specific regulator of mRNA splicing, is expressed in islets and its knockdown modified splicing. 25/41 of the candidate genes for T1D are expressed in islets, and cytokines modified expression of several of these transcripts. The present study doubles the number of known genes expressed in human islets and shows that cytokines modify alternative splicing in human islet cells. Importantly, it indicates that more than half of the known T1D candidate genes are expressed in human islets. This, and the production of a large number of chemokines and cytokines by cytokine-exposed islets, reinforces the concept of a dialog between pancreatic islets and the immune system in T1D. 
This dialog is modulated by candidate genes for the disease at both the immune system and beta cell level.
Author Summary
Pancreatic beta cells are destroyed by the immune system in type 1 diabetes mellitus, causing insulin dependence for life. Candidate genes for diabetes contribute to this process by acting both at the immune system and, as we suggest here, at the pancreatic beta cell level. We have utilized a novel technology, RNA sequencing, to define all transcripts expressed in human pancreatic islets under basal conditions and following exposure to cytokines, pro-inflammatory mediators that contribute to trigger diabetes. Our observations double the number of known genes present in human islets and indicate that >60% of the candidate genes for type 1 diabetes are expressed in beta cells. The data also show that pro-inflammatory cytokines modify alternative splicing in human islets, a process that may generate novel RNAs and proteins recognizable by the immune system. This, taken together with the findings that pancreatic beta cells themselves express and release many cytokines and chemokines (proteins that attract immune cells), indicates that early type 1 diabetes is characterized by a dialog between beta cells and the immune system. We suggest that candidate genes for diabetes function at least in part as “writers” for the beta cell words in this dialog.
PMCID: PMC3297576  PMID: 22412385
16.  Coexpression Network Analysis in Abdominal and Gluteal Adipose Tissue Reveals Regulatory Genetic Loci for Metabolic Syndrome and Related Phenotypes 
PLoS Genetics  2012;8(2):e1002505.
Metabolic Syndrome (MetS) is highly prevalent and has considerable public health impact, but its underlying genetic factors remain elusive. To identify gene networks involved in MetS, we conducted whole-genome expression and genotype profiling on abdominal (ABD) and gluteal (GLU) adipose tissue, and whole blood (WB), from 29 MetS cases and 44 controls. Co-expression network analysis for each tissue independently identified nine, six, and zero MetS–associated modules of coexpressed genes in ABD, GLU, and WB, respectively. Of 8,992 probesets expressed in ABD or GLU, 685 (7.6%) were expressed in ABD and 51 (0.6%) in GLU only. Differential eigengene network analysis of 8,256 shared probesets detected 22 shared modules with high preservation across adipose depots (DABD-GLU = 0.89), seven of which were associated with MetS (FDR P<0.01). The strongest associated module, significantly enriched for immune response–related processes, contained 94/620 (15%) genes with inter-depot differences. In an independent cohort of 145/141 twins with ABD and WB longitudinal expression data, median variability in ABD due to familiality was greater for MetS–associated versus un-associated modules (ABD: 0.48 versus 0.18, P = 0.08; GLU: 0.54 versus 0.20, P = 7.8×10−4). Cis-eQTL analysis of probesets associated with MetS (FDR P<0.01) and/or inter-depot differences (FDR P<0.01) provided evidence for 32 eQTLs. Corresponding eSNPs were tested for association with MetS–related phenotypes in two GWAS of >100,000 individuals; rs10282458, affecting expression of RARRES2 (encoding chemerin), was associated with body mass index (BMI) (P = 6.0×10−4); and rs2395185, affecting inter-depot differences of HLA-DRB1 expression, was associated with high-density lipoprotein (P = 8.7×10−4) and BMI–adjusted waist-to-hip ratio (P = 2.4×10−4). 
Since many genes and their interactions influence complex traits such as MetS, integrated analysis of genotypes and coexpression networks across multiple tissues relevant to clinical traits is an efficient strategy to identify novel associations.
Author Summary
Metabolic Syndrome (MetS) is a highly prevalent disorder with considerable public health concern, but its underlying genetic factors remain elusive. Given that most cellular components exert their functions through interactions with other cellular components, even the largest of genome-wide association (GWA) studies may often not detect their effects, nor necessarily provide insight into the complex molecular mechanisms of the disease. Rather than focusing on individual genes, the analysis of coexpression networks can be used for finding clusters (modules) of correlated expression levels across samples. In this study, we used a gene network–based approach for integrating clinical MetS, genotypic, and gene expression data from abdominal and gluteal adipose tissue and whole blood. We identified modules of genes related to MetS significantly enriched for immune response and oxidative phosphorylation pathways. We tested SNPs for association with MetS–associated expression (eSNPs), and tested prioritised eSNPs for association with MetS–related phenotypes in two large-scale GWA datasets. We identified two loci, neither of which had reached genome-wide significance levels in GWAs, associated with expression levels of RARRES2 and HLA-DRB1 and with MetS–related phenotypes, demonstrating that the integrated analysis of genotype and expression data from relevant multiple tissues can identify novel associations with complex traits such as MetS.
PMCID: PMC3285582  PMID: 22383892
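The probeset counts quoted in the abstract of entry 16 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 685 probesets "expressed in ABD" are those expressed in ABD only, which is the reading under which the 8,256 shared probesets follow from the totals; the variable names are illustrative and not taken from the paper.

```python
# Counts as quoted in the abstract (entry 16).
total_probesets = 8992   # expressed in ABD or GLU
abd_only = 685           # assumption: expressed in ABD only
glu_only = 51            # expressed in GLU only

# Probesets expressed in both depots are what remains.
shared = total_probesets - abd_only - glu_only

# Depot-specific fractions as percentages, rounded to one decimal.
abd_only_pct = round(100 * abd_only / total_probesets, 1)
glu_only_pct = round(100 * glu_only / total_probesets, 1)

print(shared)        # 8256
print(abd_only_pct)  # 7.6
print(glu_only_pct)  # 0.6
```

Under that assumption the three figures in the abstract (7.6%, 0.6%, and 8,256 shared probesets) are mutually consistent.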
17.  Genome-Wide Association Study Identifies Chromosome 10q24.32 Variants Associated with Arsenic Metabolism and Toxicity Phenotypes in Bangladesh 
PLoS Genetics  2012;8(2):e1002522.
Arsenic contamination of drinking water is a major public health issue in many countries, increasing risk for a wide array of diseases, including cancer. There is inter-individual variation in arsenic metabolism efficiency and susceptibility to arsenic toxicity; however, the basis of this variation is not well understood. Here, we have performed the first genome-wide association study (GWAS) of arsenic-related metabolism and toxicity phenotypes to improve our understanding of the mechanisms by which arsenic affects health. Using data on urinary arsenic metabolite concentrations and approximately 300,000 genome-wide single nucleotide polymorphisms (SNPs) for 1,313 arsenic-exposed Bangladeshi individuals, we identified genome-wide significant association signals (P<5×10−8) for percentages of both monomethylarsonic acid (MMA) and dimethylarsinic acid (DMA) near the AS3MT gene (arsenite methyltransferase; 10q24.32), with five genetic variants showing independent associations. In a follow-up analysis of 1,085 individuals with arsenic-induced premalignant skin lesions (the classical sign of arsenic toxicity) and 1,794 controls, we show that one of these five variants (rs9527) is also associated with skin lesion risk (P = 0.0005). Using a subset of individuals with prospectively measured arsenic (n = 769), we show that rs9527 interacts with arsenic to influence incident skin lesion risk (P = 0.01). Expression quantitative trait locus (eQTL) analyses of genome-wide expression data from 950 individuals' lymphocyte RNA suggest that several of our lead SNPs represent cis-eQTLs for AS3MT (P = 10−12) and neighboring gene C10orf32 (P = 10−44), which are involved in C10orf32-AS3MT read-through transcription. This is the largest and most comprehensive genomic investigation of arsenic metabolism and toxicity to date, the only GWAS of any arsenic-related trait, and the first study to implicate 10q24.32 variants in both arsenic metabolism and arsenical skin lesion risk. 
The observed patterns of associations suggest that MMA% and DMA% have distinct genetic determinants and support the hypothesis that DMA is the less toxic of these two methylated arsenic species. These results have potential translational implications for the prevention and treatment of arsenic-associated toxicities worldwide.
Author Summary
Exposure to arsenic through drinking water is a serious public health issue in many countries, including Bangladesh and the United States. Although there is substantial inter-individual variation in arsenic metabolism and toxicity, the biological basis of this variation is not well understood. Here, we have conducted the first genome-wide association study of arsenic-related traits within a unique population cohort of arsenic-exposed Bangladeshi individuals. Using data on 1,313 well-characterized individuals, we identify multiple association signals for urinary arsenic metabolite concentrations in the 10q24.32 region, near the AS3MT (arsenite methyltransferase) gene. In a subsequent analysis of >2,000 individuals, we show for the first time that variants that influence arsenic metabolism can also influence risk for arsenical skin lesions (the classical sign of arsenic toxicity) through interaction with arsenic exposure. Using array-based genome-wide gene expression data, we show that several of our lead genetic variants are associated with expression of AS3MT and neighboring gene C10orf32, providing a potential mechanism by which 10q24.32 variants influence arsenic metabolism and toxicity. Knowledge of variation in this region and associated biological processes could be used to develop intervention and pharmacological strategies aimed at preventing large numbers of arsenic-related deaths in arsenic-exposed populations.
PMCID: PMC3285587  PMID: 22383894
18.  A Genome-Wide Association Study Identified AFF1 as a Susceptibility Locus for Systemic Lupus Erythematosus in Japanese 
PLoS Genetics  2012;8(1):e1002455.
Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Although recent genome-wide association studies (GWAS) have contributed to discovery of SLE susceptibility genes, few studies have been performed in Asian populations. Here, we report a GWAS for SLE examining 891 SLE cases and 3,384 controls and multi-stage replication studies examining 1,387 SLE cases and 28,564 controls in Japanese subjects. Considering that expression quantitative trait loci (eQTLs) have been implicated in genetic risks for autoimmune diseases, we integrated an eQTL study into the results of the GWAS. We observed enrichments of cis-eQTL positive loci among the known SLE susceptibility loci (30.8%) compared to the genome-wide SNPs (6.9%). In addition, we identified a novel association of a variant in the AF4/FMR2 family, member 1 (AFF1) gene at 4q21 with SLE susceptibility (rs340630; P = 8.3×10−9, odds ratio = 1.21). The risk A allele of rs340630 demonstrated a cis-eQTL effect on the AFF1 transcript with enhanced expression levels (P<0.05). As AFF1 transcripts were prominently expressed in CD4+ and CD19+ peripheral blood lymphocytes, up-regulation of AFF1 may cause the abnormality in these lymphocytes, leading to disease onset.
Author Summary
Although recent genome-wide association study (GWAS) approaches have successfully contributed to disease gene discovery, many susceptibility loci are known to be still uncaptured due to the strict significance threshold for multiple hypothesis testing. Therefore, prioritization of GWAS results by incorporating additional information is recommended. Systemic lupus erythematosus (SLE) is an autoimmune disease that causes multiple organ damage. Considering that abnormalities in B cell activity play essential roles in SLE, prioritization based on an expression quantitative trait loci (eQTLs) study for B cells would be a promising approach. In this study, we report a GWAS and multi-stage replication studies for SLE examining 2,278 SLE cases and 31,948 controls in Japanese subjects. We integrated an eQTL study into the results of the GWAS and identified AFF1 as a novel SLE susceptibility locus. We also confirmed a cis-regulatory effect of the locus on the AFF1 transcript. Our study would be one of the initial successes in detecting a novel genetic locus using an eQTL study, and it should contribute to our understanding of the genetic loci left uncaptured by standard GWAS approaches.
PMCID: PMC3266877  PMID: 22291604
19.  Association of NCF2, IKZF1, IRF8, IFIH1, and TYK2 with Systemic Lupus Erythematosus 
PLoS Genetics  2011;7(10):e1002341.
Systemic lupus erythematosus (SLE) is a complex trait characterised by the production of a range of auto-antibodies and a diverse set of clinical phenotypes. Currently, ∼8% of the genetic contribution to SLE in Europeans is known, following publication of several moderate-sized genome-wide (GW) association studies, which identified loci with a strong effect (OR>1.3). In order to identify additional genes contributing to SLE susceptibility, we conducted a replication study in a UK dataset (870 cases, 5,551 controls) of 23 variants that showed moderate-risk for lupus in previous studies. Association analysis in the UK dataset and subsequent meta-analysis with the published data identified five SLE susceptibility genes reaching genome-wide levels of significance (Pcomb<5×10−8): NCF2 (Pcomb = 2.87×10−11), IKZF1 (Pcomb = 2.33×10−9), IRF8 (Pcomb = 1.24×10−8), IFIH1 (Pcomb = 1.63×10−8), and TYK2 (Pcomb = 3.88×10−8). Each of the five new loci identified here can be mapped into interferon signalling pathways, which are known to play a key role in the pathogenesis of SLE. These results increase the number of established susceptibility genes for lupus to ∼30 and validate the importance of using large datasets to confirm associations of loci which moderately increase the risk for disease.
Author Summary
Genome-wide association studies have revolutionised our ability to identify common susceptibility alleles for systemic lupus erythematosus (SLE). In complex diseases such as SLE, where many different genes make a modest contribution to disease susceptibility, it is necessary to perform large-scale association studies to combine results from several datasets, to have sufficient power to identify highly significant novel loci (P<5×10−8). Using a large SLE collection of 870 UK SLE cases and 5,551 UK unaffected individuals, we firstly replicated ten moderate-risk alleles (P<0.05) from a US–Swedish study of 3,273 SLE cases and 12,188 healthy controls. Combining our results with the US-Swedish data identified five new loci, which crossed the level for genome-wide significance: NCF2 (neutrophil cytosolic factor 2), IKZF1 (Ikaros family zinc-finger 1), IRF8 (interferon regulatory factor 8), IFIH1 (interferon-induced helicase C domain-containing protein 1), and TYK2 (tyrosine kinase 2). Each of these five genes regulates a different aspect of the immune response and contributes to the production of type-I and type-II interferons. Although further studies will be required to identify the causal alleles within these loci, the confirmation of five new susceptibility genes for lupus makes a significant step forward in our understanding of the genetic contribution to SLE.
PMCID: PMC3203198  PMID: 22046141
20.  A Genome-Wide Meta-Analysis of Six Type 1 Diabetes Cohorts Identifies Multiple Associated Loci 
PLoS Genetics  2011;7(9):e1002293.
Diabetes impacts approximately 200 million people worldwide, of whom approximately 10% are affected by type 1 diabetes (T1D). The application of genome-wide association studies (GWAS) has robustly revealed dozens of genetic contributors to the pathogenesis of T1D, with the most recent meta-analysis identifying in excess of 40 loci. To identify additional genetic loci for T1D susceptibility, we examined associations in the largest meta-analysis to date between the disease and ∼2.54 million SNPs in a combined cohort of 9,934 cases and 16,956 controls. Targeted follow-up of 53 SNPs in 1,120 affected trios uncovered three new loci associated with T1D that reached genome-wide significance. The most significantly associated SNP (rs539514, P = 5.66×10−11) resides in an intronic region of the LMO7 (LIM domain only 7) gene on 13q22. The second most significantly associated SNP (rs478222, P = 3.50×10−9) resides in an intronic region of the EFR3B (protein EFR3 homolog B) gene on 2p23; however, the region of linkage disequilibrium is approximately 800 kb and harbors additional multiple genes, including NCOA1, C2orf79, CENPO, ADCY3, DNAJC27, POMC, and DNMT3A. The third most significantly associated SNP (rs924043, P = 8.06×10−9) lies in an intergenic region on 6q27, where the region of association is approximately 900 kb and harbors multiple genes including WDR27, C6orf120, PHF10, TCTE3, C6orf208, LOC154449, DLL1, FAM120B, PSMB1, TBP, and PCD2. These latest associated regions add to the growing repertoire of gene networks predisposing to T1D.
Author Summary
Despite the fact that there is clearly a large genetic component to type 1 diabetes (T1D), uncovering the genes contributing to this disease has proven challenging. However, in the past three years there has been relatively major progress in this regard, with advances in genetic screening technologies allowing investigators to scan the genome for variants conferring risk for disease without prior hypotheses. Such genome-wide association studies have revealed multiple regions of the genome to be robustly and consistently associated with T1D. More recent findings have been a consequence of combining multiple datasets from independent investigators in meta-analyses, which have more power to pick up additional variants contributing to the trait. In the current study, we describe the largest meta-analysis of T1D genome-wide genotyped datasets to date, which combines six large studies. As a consequence, we have uncovered three new signals residing at the chromosomal locations 13q22, 2p23, and 6q27, which went on to be replicated in independent sample sets. These latest associated regions add to the growing repertoire of gene networks predisposing to T1D.
PMCID: PMC3183083  PMID: 21980299
21.  A Genome-Wide Metabolic QTL Analysis in Europeans Implicates Two Loci Shaped by Recent Positive Selection 
PLoS Genetics  2011;7(9):e1002270.
We have performed a metabolite quantitative trait locus (mQTL) study of the 1H nuclear magnetic resonance spectroscopy (1H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by 1H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10−11
Author Summary
Physiological concentrations of metabolites—small molecules involved in biochemical processes in living systems—can be measured and used to diagnose and predict disease states. A common goal is to detect and clinically exploit statistical differences in metabolite concentrations between diseased and healthy individuals. As a basis for the design and interpretation of case-control studies, it is useful to have a characterization of metabolic diversity amongst healthy individuals, some of which stems from inter-individual genetic variation. When a single genetic locus has a sufficiently strong effect on metabolism, its genomic position can be determined by collecting metabolite concentration data and genome-wide genotype data on a set of individuals and searching for associations between the two data sets—a so-called metabolite quantitative trait locus (mQTL) study. By so tracing mQTLs, we can identify the genetic drivers of metabolism, characterize how the nature or quantity of the corresponding expressed protein(s) feeds forward to influence metabolite levels, and specify disease-predictive models that incorporate mutual dependence amongst genetics, environment, and metabolism.
PMCID: PMC3169529  PMID: 21931564
PLoS Genetics  2011;7(8):e1002215.
Metabolomic profiling and the integration of whole-genome genetic association data have proven to be a powerful tool to comprehensively explore gene regulatory networks and to investigate the effects of genetic variation at the molecular level. Serum metabolite concentrations allow a direct readout of biological processes, and association of specific metabolomic signatures with complex diseases such as Alzheimer's disease and cardiovascular and metabolic disorders has been shown. There are well-known correlations between sex and the incidence, prevalence, age of onset, symptoms, and severity of a disease, as well as the reaction to drugs. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by gender. This study investigated sex-specific differences of serum metabolite concentrations and their underlying genetic determination. For discovery and replication we used more than 3,300 independent individuals from KORA F3 and F4 with metabolite measurements of 131 metabolites, including amino acids, phosphatidylcholines, sphingomyelins, acylcarnitines, and C6-sugars. A linear regression approach revealed significant concentration differences between males and females for 102 out of 131 metabolites (p-values<3.8×10−4; Bonferroni-corrected threshold). Sex-specific genome-wide association studies (GWAS) showed genome-wide significant differences in beta-estimates for SNPs in the CPS1 locus (carbamoyl-phosphate synthase 1, significance level: p<3.8×10−10; Bonferroni-corrected threshold) for glycine. We showed that the metabolite profiles of males and females are significantly different and, furthermore, that specific genetic variants in metabolism-related genes depict sexual dimorphism. Our study provides new important insights into sex-specific differences of cell regulatory processes and underscores that studies should consider sex-specific effects in design and interpretation.
Author Summary
The combination of genomic and metabolic studies in recent years has yielded remarkable results. However, most of the studies published so far did not consider the role of sexual dimorphism and did not analyse their data stratified by sex. The investigation of 131 serum metabolite concentrations of >3,300 population-based samples (KORA F3/F4) revealed significant differences in the metabolite profile of males and females. Furthermore, a genome-wide picture of sex-specific genetic variations in human metabolism (>2,000 subjects from KORA F3/F4 cohorts) was investigated. Sex-specific genome-wide association studies (GWAS) showed differences in the effect of genetic variations on metabolites in men and women. SNPs in the CPS1 (carbamoyl-phosphate synthase 1) locus showed genome-wide significant differences in beta-estimates of sex-specific association analysis (significance level: 3.8×10^−10) for glycine. As global metabolomic techniques are increasingly refined to identify more compounds in single biological samples, the predictive power of this new technology will greatly increase. This suggests that metabolites, which may be used as predictive biomarkers to indicate the presence or severity of a disease, have to be used selectively depending on sex.
PMCID: PMC3154959  PMID: 21852955
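The two Bonferroni-corrected thresholds quoted in the entry above can be reproduced with simple arithmetic. The sketch below assumes a family-wise level of 0.05 for the metabolite comparison and the conventional genome-wide level of 5×10^−8 for the GWAS, each divided by the 131 metabolites tested; the abstract does not state these starting levels, so they are assumptions that happen to match the reported values. The function name is illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

N_METABOLITES = 131  # number of metabolites reported in the abstract

# Assumed starting levels (not stated in the abstract):
metabolite_threshold = bonferroni_threshold(0.05, N_METABOLITES)
gwas_threshold = bonferroni_threshold(5e-8, N_METABOLITES)

print(f"{metabolite_threshold:.1e}")  # 3.8e-04
print(f"{gwas_threshold:.1e}")        # 3.8e-10
```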
PLoS Genetics  2011;7(7):e1002177.
Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%–50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible with pathway-based meta-analysis.
Author Summary
Genome-wide association studies (GWAS) have successfully identified genetic variants associated with complex human phenotypes. Despite a proliferation of analysis methods, most studies rely on simple, robust SNP–by–SNP univariate tests with ever-larger population sizes. Here we introduce a new test motivated by the biological hypothesis that a single gene may contain multiple variants that contribute independently to a trait. Applied to simulated phenotypes with real genotypes, our new method, Gene-Wide Significance (GWiS), has better power to identify true associations than traditional univariate methods, previous Bayesian methods, popular L1 regularized (LASSO) multivariate regression, and other approaches. GWiS retains power for low-frequency alleles that are increasingly important for personal genetics, and it is the only method tested that accurately estimates the number of independent effects within a gene. When applied to human data for multiple ECG traits, GWiS identifies more genome-wide significant loci (verified by meta-analyses of much larger populations) than any other method. We estimate that 35%–50% of ECG trait loci are likely to have multiple independent effects, suggesting that our method will reveal previously unidentified associations when applied to existing data and will improve power for future association studies.
PMCID: PMC3145613  PMID: 21829371
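The idea of greedily selecting independent effects within a gene, described in the entry above, can be illustrated with a toy forward-selection routine. This is not the GWiS algorithm: the Bayesian information criterion (BIC) of an ordinary least-squares fit stands in for the paper's Bayesian model score, the permutation step is omitted, and the genotypes and phenotype are simulated. All names (`greedy_select`, `snps`, `selected`) are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def greedy_select(y, snps):
    """Forward-select SNP columns for as long as the BIC keeps decreasing.

    y: phenotype values; snps: list of genotype columns (one list per SNP).
    Returns the indices of the selected columns in the order they were added.
    """
    n = len(y)
    resid = center(y)  # residual after fitting the intercept only
    cols = {j: center(c) for j, c in enumerate(snps)}
    rss = dot(resid, resid)
    n_params = 1
    best_bic = n * math.log(rss / n) + n_params * math.log(n)
    chosen = []
    while cols:
        # Drop in residual sum of squares from adding each column; valid
        # because columns are kept orthogonal to the already chosen ones.
        gains = {j: dot(c, resid) ** 2 / dot(c, c)
                 for j, c in cols.items() if dot(c, c) > 1e-12}
        if not gains:
            break
        j = max(gains, key=gains.get)
        new_bic = n * math.log((rss - gains[j]) / n) + (n_params + 1) * math.log(n)
        if new_bic >= best_bic:
            break  # no remaining SNP improves the score
        x = cols.pop(j)
        xx = dot(x, x)
        b = dot(x, resid) / xx
        resid = [r - b * xi for r, xi in zip(resid, x)]
        # Orthogonalise the remaining columns against the one just added.
        cols = {k: [ci - (dot(c, x) / xx) * xi for ci, xi in zip(c, x)]
                for k, c in cols.items()}
        rss -= gains[j]
        n_params += 1
        best_bic = new_bic
        chosen.append(j)
    return chosen

# Simulated example: 8 SNPs, of which SNP 0 and SNP 3 affect the trait.
rng = random.Random(0)
n, m = 400, 8
snps = [[(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
        for _ in range(m)]
y = [snps[0][i] + snps[3][i] + rng.gauss(0, 1) for i in range(n)]
selected = greedy_select(y, snps)
print(selected)
```

With effects this strong, both simulated causal SNPs should be picked up; the length of `selected` then serves as a rough estimate of the number of independent effects in the simulated gene.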
PLoS Genetics  2011;7(7):e1002193.
Long-chain n-3 polyunsaturated fatty acids (PUFAs) can derive from diet or from α-linolenic acid (ALA) by elongation and desaturation. We investigated the association of common genetic variation with plasma phospholipid levels of the four major n-3 PUFAs by performing genome-wide association studies in five population-based cohorts comprising 8,866 subjects of European ancestry. Minor alleles of SNPs in FADS1 and FADS2 (desaturases) were associated with higher levels of ALA (p = 3×10^−64) and lower levels of eicosapentaenoic acid (EPA, p = 5×10^−58) and docosapentaenoic acid (DPA, p = 4×10^−154). Minor alleles of SNPs in ELOVL2 (elongase) were associated with higher EPA (p = 2×10^−12) and DPA (p = 1×10^−43) and lower docosahexaenoic acid (DHA, p = 1×10^−15). In addition to genes in the n-3 pathway, we identified a novel association of DPA with several SNPs in GCKR (glucokinase regulator, p = 1×10^−8). We observed a weaker association between ALA and EPA among carriers of the minor allele of a representative SNP in FADS2 (rs1535), suggesting a lower rate of ALA-to-EPA conversion in these subjects. In samples of African, Chinese, and Hispanic ancestry, associations of n-3 PUFAs were similar with a representative SNP in FADS1 but less consistent with a representative SNP in ELOVL2. Our findings show that common variation in n-3 metabolic pathway genes and in GCKR influences plasma phospholipid levels of n-3 PUFAs in populations of European ancestry and, for FADS1, in other ancestries.
Author Summary
Circulating long-chain n-3 polyunsaturated fatty acids (PUFAs) derive from fatty fish or from the conversion of the plant n-3 PUFA by elongation and desaturation. We looked for common genetic markers throughout the genome that might influence plasma phospholipid levels of the four major n-3 PUFAs in five large studies and pooled the results. We found that levels of all four n-3 PUFAs were associated with genetic markers in known desaturation and elongation genes. We also found evidence that conversion of the plant n-3 PUFA to longer chain n-3 PUFAs is less effective in people with certain desaturation-gene markers, which could be important for people who do not eat fish. We also found a marker in a gene involved in glucose metabolism, called the glucokinase regulator, to be associated with one intermediate n-3 PUFA. Some of these findings were seen across multiple race/ethnicities. Overall, these results have implications for how genes and the environment interact to influence circulating levels of fatty acids.
PMCID: PMC3145614  PMID: 21829377
PLoS Genetics  2011;7(7):e1002170.
Asthma is a complex phenotype influenced by genetic and environmental factors. We conducted a genome-wide association study (GWAS) with 938 Japanese pediatric asthma patients and 2,376 controls. Single-nucleotide polymorphisms (SNPs) showing strong associations (P < 1×10^−8) in the GWAS were further genotyped in independent Japanese samples (818 cases and 1,032 controls) and in Korean samples (835 cases and 421 controls). SNP rs987870, located between HLA-DPA1 and HLA-DPB1, was consistently associated with pediatric asthma in 3 independent populations (Pcombined = 2.3×10^−10, odds ratio [OR] = 1.40). HLA-DP allele analysis showed that DPA1*0201 and DPB1*0901, which were in strong linkage disequilibrium, were strongly associated with pediatric asthma (DPA1*0201: P = 5.5×10^−10, OR = 1.52, and DPB1*0901: P = 2.0×10^−7, OR = 1.49). Our findings show that genetic variants in the HLA-DP locus are associated with the risk of pediatric asthma in Asian populations.
Author Summary
Asthma is the most common chronic disorder in children, and asthma exacerbation is an important cause of childhood morbidity and hospitalization. Here, taking advantage of recent technological advances in human genetics, we performed a genome-wide association study and follow-up validation studies to identify genetic variants for asthma. By examining 6,428 Asians, we found that rs987870 and HLA-DPA1*0201/DPB1*0901 were associated with pediatric asthma. The association signal extended across the region containing HLA-DPB2, collagen, type XI, alpha 2 (COL11A2), and retinoid X receptor beta (RXRB), but strong linkage disequilibrium in this region made it difficult to specifically identify causative variants. Interestingly, the SNP (or the HLA-DP allele) associated with pediatric asthma (a Th-2 type immune disease) in the present study confers protection against Th-1 type immune diseases, such as type 1 diabetes and rheumatoid arthritis. Therefore, the association results obtained in the present study could partially explain the inverse relationship between asthma and Th-1 type immune diseases and may lead to a better understanding of Th-1/Th-2 immune diseases.
PMCID: PMC3140987  PMID: 21814517
