Insulin secretion plays a critical role in glucose homeostasis, and failure to secrete sufficient insulin is a hallmark of type 2 diabetes. Genome-wide association studies (GWAS) have identified loci contributing to insulin processing and secretion1,2; however, a substantial fraction of the genetic contribution remains undefined. To examine low-frequency (minor allele frequency (MAF) 0.5% to 5%) and rare (MAF<0.5%) nonsynonymous variants, we analyzed exome array data in 8,229 non-diabetic Finnish males. We identified low-frequency coding variants associated with fasting proinsulin levels at the SGSM2 and MADD GWAS loci and three novel genes with low-frequency variants associated with fasting proinsulin or insulinogenic index: TBC1D30, KANK1, and PAM. We also demonstrate that the interpretation of single-variant and gene-based tests needs to consider the effects of noncoding SNPs nearby and megabases (Mb) away. This study demonstrates that exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits.
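The frequency bins used in this abstract can be written as a small helper. This is a minimal sketch: the function name, the "common" label for variants above 5%, and the handling of the exact 5% boundary are assumptions for illustration, not taken from the study.

```python
def classify_maf(maf: float) -> str:
    """Bin a variant by minor allele frequency, given as a fraction (0.01 = 1%)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    if maf < 0.005:
        # MAF < 0.5%: "rare" in the abstract's terminology
        return "rare"
    if maf <= 0.05:
        # MAF 0.5% to 5%: "low-frequency"; including exactly 5% is an assumption
        return "low-frequency"
    # Label for everything above 5% is an assumption; the abstract does not name it
    return "common"


for maf in (0.001, 0.02, 0.2):
    print(maf, classify_maf(maf))
```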
We investigated the association of glycemia and 43 genetic risk variants for hyperglycemia/type 2 diabetes with amino acid levels in the population-based Metabolic Syndrome in Men (METSIM) Study, including 9,369 nondiabetic or newly diagnosed type 2 diabetic Finnish men. Plasma levels of eight amino acids were measured with proton nuclear magnetic resonance spectroscopy. Increasing fasting and 2-h plasma glucose levels were associated with increasing levels of several amino acids and decreasing levels of histidine and glutamine. Alanine, leucine, isoleucine, tyrosine, and glutamine predicted incident type 2 diabetes in a 4.7-year follow-up of the METSIM Study, and their effects were largely mediated by insulin resistance (except for glutamine). We also found significant correlations between insulin sensitivity (Matsuda insulin sensitivity index) and mRNA expression of genes regulating amino acid degradation in 200 subcutaneous adipose tissue samples. Only 1 of 43 risk single nucleotide polymorphisms for type 2 diabetes or hyperglycemia, the glucose-increasing major C allele of rs780094 of GCKR, was significantly associated with decreased levels of alanine and isoleucine and elevated levels of glutamine. In conclusion, the levels of branched-chain, aromatic amino acids and alanine increased and the levels of glutamine and histidine decreased with increasing glycemia, reflecting, at least in part, insulin resistance. Only one single nucleotide polymorphism regulating hyperglycemia was significantly associated with amino acid levels.
William Martin and colleagues report on their stakeholder meetings that reviewed the health risks of household air pollution and cookstoves, and identified research priorities in seven key areas.
A mutation in the LMNA gene is responsible for the most dramatic form of premature aging, Hutchinson-Gilford progeria syndrome (HGPS). Several recent studies have suggested that protein products of this gene might have a role in normal physiological cellular senescence. To explore further LMNA's possible role in normal aging, we genotyped 16 SNPs over a span of 75.4 kb of the LMNA gene on a sample of long-lived individuals (US Caucasians with age ≥95 years, N=873) and genetically matched younger controls (N=443). We tested all common non-redundant haplotypes (frequency ≥ 0.05) based on subgroups of these 16 SNPs for association with longevity. The most significant haplotype, based on 4 SNPs, remained significant after adjustment for multiple testing (OR = 1.56, P=2.5×10−5, multiple-testing-adjusted P=0.0045). To attempt to replicate these results, we genotyped 3448 subjects from four independent samples of long-lived individuals and control subjects from 1) the New England Centenarian Study (NECS) (N=738), 2) the Southern Italian Centenarian Study (SICS) (N=905), 3) France (N=1103), and 4) the Einstein Ashkenazi Longevity Study (N=702). We replicated the association with the most significant haplotype from our initial analysis in the NECS sample (OR = 1.60, P=0.0023), but not in the other three samples (P>.15). In a meta-analysis combining all five samples, the best haplotype remained significantly associated with longevity after adjustment for multiple testing in the initial and follow-up samples (OR = 1.18, P=7.5×10−4, multiple-testing-adjusted P=0.037). These results suggest that LMNA variants may play a role in human lifespan.
longevity gene; human; longevity; genetics
Glucokinase (GCK) acts as a component of the “glucose sensor” in pancreatic β-cells and possibly in other tissues, including the brain. However, >99% of GCK in the body is located in the liver, where it serves as a “gatekeeper”, determining the rate of hepatic glucose phosphorylation. Mutations in GCK are a cause of maturity-onset diabetes of the young (MODY), and GCKR, the regulator of GCK in the liver, is a diabetes susceptibility locus. In addition, several GCK activators are being studied as potential regulators of blood glucose. The ability to estimate liver GCK activity in vivo for genetic and pharmacologic studies may provide important physiologic insights into the regulation of hepatic glucose metabolism.
RESEARCH DESIGN AND METHODS
Here we introduce a simple, linear, two-compartment kinetic model that exploits lactate and glucose kinetics observed during the frequently sampled intravenous glucose tolerance test (FSIGT) to estimate liver GCK activity (KGK), glycolysis (K12), and whole body fractional lactate clearance (K01).
To test our working model of lactate, we used cross-sectional FSIGT data on 142 nondiabetic individuals chosen at random from the Finland–United States Investigation of NIDDM Genetics study cohort. Parameters KGK, K12, and K01 were precisely estimated. Median model parameter estimates were consistent with previously published values.
This novel model of lactate kinetics extends the utility of the FSIGT protocol beyond whole-body glucose homeostasis by providing estimates for indices pertaining to hepatic glucose metabolism, including hepatic GCK activity and glycolysis rate.
Gene–lifestyle interactions have been suggested to contribute to the development of type 2 diabetes. Glucose levels 2 h after a standard 75-g glucose challenge are used to diagnose diabetes and are associated with both genetic and lifestyle factors. However, whether these factors interact to determine 2-h glucose levels is unknown. We meta-analyzed single nucleotide polymorphism (SNP) × BMI and SNP × physical activity (PA) interaction regression models for five SNPs previously associated with 2-h glucose levels from up to 22 studies comprising 54,884 individuals without diabetes. PA levels were dichotomized, with individuals below the first quintile classified as inactive (20%) and the remainder as active (80%). BMI was considered a continuous trait. Inactive individuals had higher 2-h glucose levels than active individuals (β = 0.22 mmol/L [95% CI 0.13–0.31], P = 1.63 × 10−6). All SNPs were associated with 2-h glucose (β = 0.06–0.12 mmol/allele, P ≤ 1.53 × 10−7), but no significant interactions were found with PA (P > 0.18) or BMI (P ≥ 0.04). In this large study of gene–lifestyle interaction, we observed no interactions between genetic and lifestyle factors, both of which were associated with 2-h glucose. It is perhaps unlikely that top loci from genome-wide association studies will exhibit strong subgroup-specific effects, and may not, therefore, make the best candidates for the study of interactions.
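The dichotomization of physical activity at the first quintile, and the product term that an SNP × PA interaction model tests, might look like the following. This is a sketch under stated assumptions: the function names, the activity scores, and the bare four-column design row (no covariates) are hypothetical, and the quantile interpolation rule is Python's default rather than anything specified in the abstract.

```python
import statistics


def dichotomize_activity(pa_scores):
    """Flag individuals below the first quintile of activity as inactive (1), else 0."""
    # statistics.quantiles with n=5 returns four cut points; the first is the 20% cut
    first_quintile = statistics.quantiles(pa_scores, n=5)[0]
    return [1 if score < first_quintile else 0 for score in pa_scores]


def interaction_row(snp_dosage, inactive):
    """Design-matrix row: intercept, SNP main effect, PA main effect, SNP x PA product."""
    return [1.0, float(snp_dosage), float(inactive), float(snp_dosage * inactive)]


# Hypothetical activity scores for 100 people
scores = list(range(1, 101))
inactive_flags = dichotomize_activity(scores)
print(sum(inactive_flags), "of", len(scores), "classified as inactive")
print(interaction_row(snp_dosage=2, inactive=inactive_flags[0]))
```

The coefficient on the last column is the interaction effect; a value near zero corresponds to the "no significant interactions" finding reported above.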
Cell biologists love to think outside the box, pursuing many surprising twists and unexpected turns in their quest to unravel the mysteries of how cells work. But can cell biologists think outside the bench? We are certain that they can, and clearly some already do. To encourage more cell biologists to venture into the realm of translational research on a regular basis, we would like to share a handful of the many lessons that we have learned in our effort to develop experimental treatments for Hutchinson-Gilford progeria syndrome (HGPS), an endeavor that many view as a “poster child” for how basic cell biology can be translated to the clinic.
Genome-wide association studies (GWAS) identify regions of the genome that are associated with particular traits, but do not typically identify specific causative genetic elements. For example, while a large number of single nucleotide polymorphisms associated with type 2 diabetes (T2D) and related traits have been identified by human GWAS, only a few genes have functional evidence to support or to rule out a role in cellular metabolism or dietary interactions. Here, we use a recently developed Drosophila model in which high-sucrose feeding induces phenotypes similar to T2D to assess orthologs of human GWAS-identified candidate genes for risk of T2D and related traits.
Disrupting orthologs of certain T2D candidate genes (HHEX, THADA, PPARG, KCNJ11) led to sucrose-dependent toxicity. Tissue-specific knockdown of the HHEX ortholog dHHEX (CG7056) directed metabolic defects and enhanced lethality; for example, fat-body-specific loss of dHHEX led to increased hemolymph glucose and reduced insulin sensitivity.
Candidate genes identified in human genetic studies of metabolic traits can be prioritized and functionally characterized using a simple Drosophila approach. To our knowledge, this is the first large-scale effort to study the functional interaction between GWAS-identified candidate genes and an environmental risk factor such as diet in a model organism system.
Genome-wide association study; Drosophila melanogaster; Diabetes mellitus, type 2; Hyperglycemia; Dyslipidemias; Phylogeny; Reverse genetics; High-throughput screening assays; HHEX protein, Human
Transgenic animals are extensively used to model human disease. Typically, the transgene copy number is estimated, but the exact integration site and configuration of the foreign DNA remains uncharacterized. When transgenes have been closely examined, some unexpected configurations have been found. Here, we describe a method to recover transgene insertion sites and assess structural rearrangements of host and transgene DNA using microarray hybridization and targeted sequence capture. We used information about the transgene insertion site to develop a polymerase chain reaction genotyping assay to distinguish heterozygous from homozygous transgenic animals. Although we worked with a bacterial artificial chromosome transgenic mouse line, this method can be used to analyse the integration site and configuration of any foreign DNA in a sequenced genome.
While rapamycin has been in use for years in transplant patients as an antirejection drug, more recently it has shown promise in treating diseases of aging, such as neurodegenerative disorders and atherosclerosis. We recently reported that rapamycin reverses the cellular phenotype of fibroblasts from children with the premature aging disease Hutchinson-Gilford progeria syndrome (HGPS). We found that the causative aberrant protein, progerin, was cleared through autophagic mechanisms when the cells were treated with rapamycin, suggesting a new potential treatment for HGPS. Recent evidence shows that progerin is also present in aged tissues of healthy individuals, suggesting that progerin may contribute to physiological aging. While it is intriguing to speculate that rapamycin may affect normal aging in humans, as it does in lower organisms, it will be important to identify safer analogs of rapamycin for chronic treatments in humans in order to minimize toxicity. In addition to its role in HGPS and normal aging, we discuss the potential of rapamycin for the treatment of age-dependent neurodegenerative diseases.
progerin; rapamycin; autophagy; aging; neurodegeneration; progeria
The phenomenal progress made in stem cell biology in the past few years has infused the field of regenerative medicine with a great deal of scientific enthusiasm. However, along with the excitement of discovery comes a new sense of translational urgency. The prospect of using embryonic and induced pluripotent stem cell tools and technologies to produce cell-based therapies and other treatments is no longer a distant dream; it is a very real opportunity that demands our attention today. As with most new fields, regenerative medicine has experienced some significant growing pains, and we have identified a number of key obstacles to progress. Given our role as the lead U.S. biomedical research agency and the world's largest supporter of medical research, the National Institutes of Health (NIH) has a responsibility to find ways to reduce or remove many of these obstacles and, consequently, has responded, and continues to respond, to these challenges in a variety of ways. In this brief essay, we will review our progress and highlight a new development: the founding of a Center for Regenerative Medicine on the NIH campus.
To identify previously unknown genetic loci associated with fasting glucose concentrations, we examined the leading association signals in ten genome-wide association scans involving a total of 36,610 individuals of European descent. Variants in the gene encoding melatonin receptor 1B (MTNR1B) were consistently associated with fasting glucose across all ten studies. The strongest signal was observed at rs10830963, where each G allele (frequency 0.30 in HapMap CEU) was associated with an increase of 0.07 (95% CI = 0.06-0.08) mmol/l in fasting glucose levels (P = 3.2 × 10−50) and reduced beta-cell function as measured by homeostasis model assessment (HOMA-B, P = 1.1 × 10−15). The same allele was associated with an increased risk of type 2 diabetes (odds ratio = 1.09 (1.05-1.12), per G allele P = 3.3 × 10−7) in a meta-analysis of 13 case-control studies totaling 18,236 cases and 64,453 controls. Our analyses also confirm previous associations of fasting glucose with variants at the G6PC2 (rs560887, P = 1.1 × 10−57) and GCK (rs4607517, P = 1.0 × 10−25) loci.
Asthma is etiologically and clinically heterogeneous, making the genomic basis of asthma difficult to identify. We exploited the strain-dependence of a murine model of allergic airway disease to identify different genomic responses in the lung. BALB/cJ and C57BL/6J mice were sensitized with the immunodominant allergen from the Dermatophagoides pteronyssinus species of house dust mite (Der p 1), without exogenous adjuvant, and the mice then underwent a single challenge with Der p 1. Allergic inflammation, serum antibody titers, mucous metaplasia, and airway hyperresponsiveness were evaluated 72 hours after airway challenge. Whole-lung gene expression analyses were conducted to identify genomic responses to allergen challenge. Der p 1–challenged BALB/cJ mice produced all the key features of allergic airway disease. In comparison, C57BL/6J mice produced exaggerated Th2-biased responses and inflammation, but exhibited an unexpected decrease in airway hyperresponsiveness compared with control mice. Lung gene expression analysis revealed genes that were shared by both strains and a set of down-regulated genes unique to C57BL/6J mice, including several G-protein–coupled receptors involved in airway smooth muscle contraction, most notably the M2 muscarinic receptor, which we show is expressed in airway smooth muscle and was decreased at the protein level after challenge with Der p 1. Murine strain–dependent genomic responses in the lung offer insights into the different biological pathways that develop after allergen challenge. This study of two different murine strains demonstrates that inflammation and airway hyperresponsiveness can be decoupled, and suggests that the down-modulation of expression of G-protein–coupled receptors involved in regulating airway smooth muscle contraction may contribute to this dissociation.
asthma; airway hyperresponsiveness; inflammation; house dust mite; Der p 1
Genome-wide association (GWA) studies have identified several susceptibility loci for metabolic syndrome (MetS) component traits, but have had variable success in identifying susceptibility loci to the syndrome as an entity. We conducted a GWA study on MetS and its component traits in four Finnish cohorts consisting of 2637 MetS cases and 7927 controls, both free of diabetes, and followed the top loci in an independent sample with transcriptome and NMR-based metabonomics data. Furthermore, we tested for loci associated with multiple MetS component traits using factor analysis and built a genetic risk score for MetS.
Methods and Results
A previously known lipid locus, APOA1/C3/A4/A5 gene cluster region (SNP rs964184), was associated with MetS in all four study samples (P=7.23×10−9 in meta-analysis). The association was further supported by serum metabolite analysis, where rs964184 associated with various VLDL, TG, and HDL metabolites (P=0.024-1.88×10−5). Twenty-two previously identified susceptibility loci for individual MetS component traits were replicated in our GWA and factor analysis. Most of these associated with lipid phenotypes and none with two or more uncorrelated MetS components. A genetic risk score, calculated as the number of alleles in loci associated with individual MetS traits, was strongly associated with MetS status.
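The unweighted genetic risk score described above, a count of risk alleles across loci, can be sketched as follows. The function name and the second SNP identifier are hypothetical placeholders, and the genotype values are illustrative; only rs964184 appears in the abstract.

```python
def genetic_risk_score(risk_allele_counts):
    """Sum risk-allele counts (0, 1, or 2 per SNP) into an unweighted score."""
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1, or 2")
    return sum(risk_allele_counts.values())


# Illustrative genotypes for one person; "snp_placeholder_2" is not a real identifier
person = {"rs964184": 1, "snp_placeholder_2": 2}
print(genetic_risk_score(person))
```

An unweighted count treats every locus as contributing equally; a weighted score would multiply each count by a per-locus effect size, which the abstract does not describe for this analysis.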
Our findings suggest that genes from lipid metabolism pathways have the key role in the genetic background of MetS. We found little evidence for pleiotropy linking dyslipidemia and obesity to the other MetS component traits such as hypertension and glucose intolerance.
metabolic syndrome; risk factors; genome-wide association study; meta-analysis; lipids
In 2007, the International Knockout Mouse Consortium (IKMC) made the ambitious promise to generate mutations in virtually every protein-coding gene of the mouse genome in a concerted worldwide action. Now, 5 years later, the IKMC members have developed high-throughput gene trapping and, in particular, gene-targeting pipelines and generated more than 17,400 mutant murine embryonic stem (ES) cell clones and more than 1,700 mutant mouse strains, most of them conditional. A common IKMC web portal (www.knockoutmouse.org) has been established, allowing easy access to this unparalleled biological resource. The IKMC materials considerably enhance functional gene annotation of the mammalian genome and will have a major impact on future biomedical research.
Missing phenotype data can be a major hurdle to mapping quantitative trait loci (QTL). Though in many cases experiments may be designed to minimize the occurrence of missing data, it is often unavoidable in practice; thus, statistical methods to account for missing data are needed. In this paper we describe an approach for conjoining multiple imputation and QTL mapping. Methods are applied to map genes associated with increased breathing effort in mice after lung inflammation due to allergen challenge in developing lines of the Collaborative Cross, a new mouse genetics resource. Missing data poses a particular challenge in this study because the desired phenotype summary to be mapped is a function of incompletely observed dose-response curves. Comparison of the multiple imputation approach to two naive approaches for handling missing data suggests that these simpler methods may yield poor results: ignoring missing data through a complete case analysis may lead to incorrect conclusions, while using a last observation carried forward procedure, which does not account for uncertainty in the imputed values, may lead to anti-conservative inference. The proposed approach is widely applicable to other studies with missing phenotype data.
multiple imputation; missing data; quantitative trait loci
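To illustrate why carrying imputation uncertainty forward matters, the sketch below pools an effect estimate across imputed data sets using Rubin's rules, a standard pooling method for multiple imputation. The abstract does not state which pooling rule the authors used for QTL mapping, so this is an assumed stand-in, and the numbers are made up.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled_estimate = mean(estimates)
    within = mean(variances)        # average sampling variance within imputations
    between = variance(estimates)   # sample variance of estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, total


# Made-up effect estimates and variances from three imputed data sets
estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_variance)
```

A single-imputation procedure such as last observation carried forward effectively drops the between-imputation term, which is one way to see why it can give anti-conservative inference.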
Common diseases such as type 2 diabetes are phenotypically heterogeneous. Obesity is a major risk factor for type 2 diabetes, but patients vary appreciably in body mass index. We hypothesized that the genetic predisposition to the disease may be different in lean (BMI<25 kg/m2) compared to obese cases (BMI≥30 kg/m2). We performed two case-control genome-wide studies using two accepted cut-offs for defining individuals as overweight or obese. We used 2,112 lean type 2 diabetes cases (BMI<25 kg/m2) or 4,123 obese cases (BMI≥30 kg/m2), and 54,412 un-stratified controls. Replication was performed in 2,881 lean cases or 8,702 obese cases, and 18,957 un-stratified controls. To assess the effects of known signals, we tested the individual and combined effects of SNPs representing 36 type 2 diabetes loci. After combining data from discovery and replication datasets, we identified two signals not previously reported in Europeans. A variant (rs8090011) in the LAMA1 gene was associated with type 2 diabetes in lean cases (P = 8.4×10−9, OR = 1.13 [95% CI 1.09–1.18]), and this association was stronger than that in obese cases (P = 0.04, OR = 1.03 [95% CI 1.00–1.06]). A variant in HMG20A, previously identified in South Asians but not Europeans, was associated with type 2 diabetes in obese cases (P = 1.3×10−8, OR = 1.11 [95% CI 1.07–1.15]), although this association was not significantly stronger than that in lean cases (P = 0.02, OR = 1.09 [95% CI 1.02–1.17]). For 36 known type 2 diabetes loci, 29 had a larger odds ratio in lean than in obese cases (binomial P = 0.0002). In the lean analysis, we observed a weighted per-risk allele OR = 1.13 [95% CI 1.10–1.17], P = 3.2×10−14. This was larger than the same model fitted in the obese analysis where the OR = 1.06 [95% CI 1.05–1.08], P = 2.2×10−16.
This study provides evidence that stratification of type 2 diabetes cases by BMI may help identify additional risk variants and that lean cases may have a stronger genetic predisposition to type 2 diabetes.
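The binomial comparison reported above (29 of 36 known loci with a larger odds ratio in lean cases) can be checked with an exact sign test. The sketch assumes a null probability of 0.5 per locus and a one-sided upper tail; the abstract does not say whether the reported P value is one- or two-sided, and the function name is illustrative.

```python
from math import comb


def sign_test_upper_tail(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    tail_count = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail_count / 2 ** trials


p_value = sign_test_upper_tail(29, 36)
print(f"{p_value:.2e}")  # about 1.6e-04
```

Under these assumptions the tail probability is about 1.6 × 10^-4, which rounds to the reported 0.0002; a two-sided version would be twice that.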
Individuals with type 2 diabetes (T2D) can present with variable clinical characteristics. It is well known that obesity is a major risk factor for type 2 diabetes, yet patients can vary considerably: there are many lean diabetes patients and many overweight people without diabetes. We hypothesized that the genetic predisposition to the disease may be different in lean (BMI<25 kg/m2) compared to obese cases (BMI≥30 kg/m2). Specifically, as lean T2D patients had lower risk than obese patients, they must have been more genetically susceptible. Using genetic data from multiple genome-wide association studies, we tested genetic markers across the genome in 2,112 lean type 2 diabetes cases (BMI<25 kg/m2), 4,123 obese cases (BMI≥30 kg/m2), and 54,412 healthy controls. We confirmed our results in an additional 2,881 lean cases, 8,702 obese cases, and 18,957 healthy controls. Using these data we found differences in genetic enrichment between lean and obese cases, supporting our original hypothesis. We also searched for genetic variants that may be risk factors only in lean or obese patients and found two novel gene regions not previously reported in European individuals. These findings may influence future study design for type 2 diabetes and provide further insight into the biology of the disease.
We investigated the effects of 34 genetic risk variants for hyperglycemia/type 2 diabetes on lipoprotein subclasses and particle composition in a large population-based cohort.
RESEARCH DESIGN AND METHODS
The study included 6,580 nondiabetic Finnish men from the population-based Metabolic Syndrome in Men (METSIM) study (aged 57 ± 7 years; BMI 26.8 ± 3.7 kg/m2). Genotyping of 34 single nucleotide polymorphisms (SNPs) for hyperglycemia/type 2 diabetes was performed. Proton nuclear magnetic resonance spectroscopy was used to measure particle concentrations of 14 lipoprotein subclasses and their composition in native serum samples.
The glucose-increasing allele of rs780094 in GCKR was significantly associated with low concentrations of VLDL particles (independently of their size) and small LDL and was nominally associated with low concentrations of intermediate-density lipoprotein, all LDL subclasses, and high concentrations of very large and large HDL particles. The glucose-increasing allele of rs174550 in FADS1 was significantly associated with high concentrations of very large and large HDL particles and nominally associated with low concentrations of all VLDL particles. SNPs rs10923931 in NOTCH2 and rs757210 in HNF1B genes showed nominal or significant associations with several lipoprotein traits. The genetic risk score of 34 SNPs was not associated with any of the lipoprotein subclasses.
Four of the 34 risk loci for type 2 diabetes or hyperglycemia (GCKR, FADS1, NOTCH2, and HNF1B) were significantly associated with lipoprotein traits. A GCKR variant predominantly affected the concentration of VLDL, and the FADS1 variant affected very large and large HDL particles. Only a limited number of risk loci for hyperglycemia/type 2 diabetes significantly affect lipoprotein metabolism.
Large prospective cohort studies are critical for identifying etiologic factors for disease, but they require substantial long-term research investment. Such studies can be conducted as multisite consortia of academic medical centers, combinations of smaller ongoing studies, or a single large site such as a dominant regional health-care provider. Still another strategy relies upon centralized conduct of most or all aspects, recruiting through multiple temporary assessment centers. This is the approach used by a large-scale national resource in the United Kingdom known as the “UK Biobank,” which completed recruitment/examination of 503,000 participants between 2007 and 2010 within budget and ahead of schedule. A key lesson from UK Biobank and similar studies is that large studies are not simply small studies made large but, rather, require fundamentally different approaches in which “process” expertise is as important as scientific rigor. Embedding recruitment in a structure that facilitates outcome determination, utilizing comprehensive and flexible information technology, automating biospecimen processing, ensuring broad consent, and establishing essentially autonomous leadership with appropriate oversight are all critical to success. Whether and how these approaches may be transportable to the United States remain to be explored, but their success in studies such as UK Biobank makes a compelling case for such explorations to begin.
cohort studies; epidemiology; prospective studies
Hematological parameters, including red and white blood cell counts and hemoglobin concentration, are widely used clinical indicators of health and disease. These traits are tightly regulated in healthy individuals and are under genetic control. Mutations in key genes that affect hematological parameters have important phenotypic consequences, including multiple variants that affect susceptibility to malarial disease. However, most variation in hematological traits is continuous and is presumably influenced by multiple loci and variants with small phenotypic effects. We used a newly developed mouse resource population, the Collaborative Cross (CC), to identify genetic determinants of hematological parameters. We surveyed the eight founder strains of the CC and performed a mapping study using 131 incipient lines of the CC. Genome scans identified quantitative trait loci for several hematological parameters, including mean red cell volume (Chr 7 and Chr 14), white blood cell count (Chr 18), percent neutrophils/lymphocytes (Chr 11), and monocyte number (Chr 1). We used evolutionary principles and unique bioinformatics resources to reduce the size of candidate intervals and to view functional variation in the context of phylogeny. Many quantitative trait loci regions could be narrowed sufficiently to identify a small number of promising candidate genes. This approach not only expands our knowledge about hematological traits but also demonstrates the unique ability of the CC to elucidate the genetic architecture of complex traits.
Mouse Genetic Resource; Mouse Collaborative Cross; hematology; hemoglobin β; mean red cell volume; QTL; mouse genetics; complex traits; shared ancestry
Defining the genetic contribution of rare variants to common diseases is a major basic and clinical science challenge that could offer new insights into disease etiology and provide potential for directed gene- and pathway-based prevention and treatment. Common and rare nonsynonymous variants in the GCKR gene are associated with alterations in metabolic traits, most notably serum triglyceride levels. GCKR encodes glucokinase regulatory protein (GKRP), a predominantly nuclear protein that inhibits hepatic glucokinase (GCK) and plays a critical role in glucose homeostasis. The mode of action of rare GCKR variants remains unexplored. We identified 19 nonsynonymous GCKR variants among 800 individuals from the ClinSeq medical sequencing project. Excluding the previously described common missense variant p.Pro446Leu, all variants were rare in the cohort. Accordingly, we functionally characterized all variants to evaluate their potential phenotypic effects. Defects were observed for the majority of the rare variants after assessment of cellular localization, ability to interact with GCK, and kinetic activity of the encoded proteins. Comparing the individuals with functional rare variants to those without such variants showed associations with lipid phenotypes. Our findings suggest that, while nonsynonymous GCKR variants, excluding p.Pro446Leu, are rare in individuals of mixed European descent, the majority do affect protein function. In sum, this study utilizes computational, cell biological, and biochemical methods to present a model for interpreting the clinical significance of rare genetic variants in common disease.
Long-range regulatory elements, such as enhancers, exert substantial control over tissue-specific gene expression patterns. Genome-wide discovery of functional enhancers in different cell types is important for our understanding of genome function as well as human disease etiology.
In this study, we developed an in silico approach to model the previously reported phenomenon of transcriptional pausing, accompanied by divergent transcription, at active promoters. We then used this model for large-scale prediction of non-promoter-associated bidirectional expression of short transcripts. Our predictions were significantly enriched for DNase hypersensitive sites, histone H3 lysine 27 acetylation (H3K27ac), and other chromatin marks associated with active rather than poised or repressed enhancers. We also detected modest bidirectional expression at binding sites of the CCCTC-binding factor (CTCF) genome-wide, particularly those that overlap H3K27ac.
Our findings indicate that the signature of bidirectional expression of short transcripts, learned from promoter-proximal transcriptional pausing, can be used to predict active long-range regulatory elements genome-wide, likely due in part to specific association of RNA polymerase with enhancer regions.
Identifying cis-regulatory elements is important to understand how human pancreatic islets modulate gene expression in physiologic or pathophysiologic (e.g., diabetic) conditions. We conducted genome-wide analysis of DNase I hypersensitive sites, histone H3 lysine methylation modifications (K4me1, K4me3, K79me2), and CCCTC-binding factor (CTCF) binding in human islets. This identified ~18,000 putative promoters (several hundred unannotated and islet-active). Surprisingly, active promoter modifications were absent at genes encoding islet-specific hormones, suggesting a distinct regulatory mechanism. Of 34,039 distal (non-promoter) regulatory elements, 47% are islet-unique and 22% are CTCF-bound. In the 18 type 2 diabetes (T2D)-associated loci, we identified 118 putative regulatory elements and confirmed enhancer activity for 12/33 tested. Among 6 regulatory elements harboring T2D-associated variants, 2 exhibit significant allele-specific differences in activity. These findings present a global snapshot of the human islet epigenome and should provide functional context for non-coding variants emerging from genetic studies of T2D and other islet disorders.