Next-generation deep sequencing of small RNAs has unveiled the complexity of the microRNA (miRNA) transcriptome, which is in large part due to the diversity of miRNA sequence variants (“isomiRs”). Changes to a miRNA’s seed sequence (nucleotides 2–8), including shifted start positions, can redirect targeting to a dramatically different set of RNAs and alter biological function. We performed deep sequencing of small RNA from mouse insulinoma (MIN6) cells (widely used as a surrogate for the study of pancreatic beta cells) and developed a bioinformatic analysis pipeline to profile isomiR diversity. Additionally, we applied the pipeline to recently published small RNA-seq data from primary human beta cells and whole islets and compared the miRNA profiles with that of MIN6. We found that: (1) the miRNA expression profile in MIN6 cells is highly correlated with those of primary human beta cells and whole islets; (2) miRNA loci can generate multiple highly expressed isomiRs with different 5′-start positions (5′-isomiRs); (3) isomiRs with shifted start positions (5′-shifted isomiRs) are highly expressed, and can be as abundant as their unshifted counterparts (5′-reference miRNAs). Finally, we identified 10 beta cell miRNA families as candidate regulatory hubs in a type 2 diabetes (T2D) gene network. The most significant candidate hub was miR-29, which we demonstrated regulates the mRNA levels of several genes critical to beta cell function and implicated in T2D. Three of the candidate miRNA hubs were novel 5′-shifted isomiRs: miR-375+1, miR-375-1 and miR-183-5p+1. We showed by in silico target prediction and in vitro transfection studies that both miR-375+1 and miR-375-1 are likely to target an overlapping, but distinct suite of beta cell genes compared to canonical miR-375. In summary, this study characterizes the isomiR profile in beta cells for the first time, and also highlights the potential functional relevance of 5′-shifted isomiRs to T2D.
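To illustrate why a shifted 5′ start matters, here is a minimal sketch of how the seed (nucleotides 2–8) changes when the start position moves by one nucleotide. The precursor fragment is invented for illustration, and the sign convention (+1 means the isomiR starts one nucleotide downstream of the reference start) is an assumption, not something taken from the study's pipeline.

```python
def seed(mature: str) -> str:
    """Return the seed: nucleotides 2-8 (1-based) of a mature miRNA sequence."""
    if len(mature) < 8:
        raise ValueError("sequence too short to contain a seed")
    return mature[1:8]


def isomir_seed(precursor: str, ref_start: int, shift: int) -> str:
    """Seed of the isomiR whose 5' end is offset by `shift` from the reference.

    `ref_start` is the 0-based index of the reference 5' end in `precursor`.
    Assumed convention: shift=+1 starts one nucleotide downstream (towards
    the 3' end), shift=-1 starts one nucleotide upstream.
    """
    start = ref_start + shift
    if start < 0:
        raise ValueError("shift moves the start before the precursor fragment")
    return seed(precursor[start:])


# Hypothetical precursor fragment (not a real miRNA sequence).
fragment = "GACUUGAUCGGCUAGCAUCGUA"
for shift in (-1, 0, 1):
    print(shift, isomir_seed(fragment, ref_start=2, shift=shift))
```

Each one-nucleotide shift yields a different heptamer, which is why 5′-shifted isomiRs are expected to have target sets that overlap with, but differ from, those of the reference miRNA.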
Cellular gene expression is governed by a complex, multi-faceted network of regulatory interactions. In the last decade, microRNAs (miRNAs) have emerged as critical components of this network. miRNAs are small, non-coding RNA molecules that serve as post-transcriptional regulators of gene expression. Although there has been substantive progress in our understanding of miRNA-mediated gene regulation, the mechanisms that control the expression of the miRNAs themselves are less well understood. Identifying the factors that control miRNA expression will be critical for further characterizing miRNA function in normal physiology and pathobiology. We describe recent progress in the efforts to map genomic regions that control miRNA transcription (such as promoters). In particular, we highlight the utility of large-scale “-omic” data, such as those made available by the ENCODE and the NIH Roadmap Epigenomics consortiums, for the discovery of transcriptional control elements that govern miRNA expression. Finally, we discuss how integrative analysis of complementary genetic datasets, such as the NHGRI Genome Wide Association Studies Catalog, can predict novel roles for transcriptional mis-regulation of miRNAs in complex disease etiology.
Chromatin; complex disease; epigenome; genomics; microRNA; nascent RNA; promoter; transcription.
A diverse suite of effector immune responses provides protection against various pathogens. However, the array of effector responses must be immunologically regulated to limit pathogen- and immune-associated damage. CD4+Foxp3+ regulatory T cells (Treg) calibrate immune responses; however, how Treg cells adapt to control different effector responses is unclear. To investigate the molecular mechanism of Treg diversity, we used whole genome expression profiling and next generation small RNA sequencing of Treg cells isolated from type-1 or type-2 inflamed tissue following Leishmania major or Schistosoma mansoni infection, respectively. In silico analyses identified two miRNA “regulatory hubs”, miR-10a and miR-182, as critical miRNAs in Th1- or Th2-associated Treg cells, respectively. Functionally and mechanistically, in vitro and in vivo systems identified that an IL-12/IFNγ axis regulated miR-10a and its putative transcription factor, Creb. Importantly, reduced miR-10a in Th1-associated Treg cells was critical for Treg function and controlled a suite of genes preventing IFNγ production. In contrast, IL-4 regulated miR-182 and cMaf in Th2-associated Treg cells, which mitigated IL-2 secretion, in part through repression of IL2-promoting genes. Together, this study indicates that CD4+Foxp3+ cells can be shaped by local environmental factors, which orchestrate distinct miRNA pathways preserving Treg stability and suppressor function.
The diversity of pathogens that the immune system encounters is controlled by a diverse suite of immunological effector responses. Preserving a well-controlled protective immune response is essential. Too vigorous an effector response can be as damaging as too little. Regulatory T cells (Treg) calibrate immune responses; however, how Treg cells adapt to control the diverse suite of effector responses is unclear. In this study we investigated the molecular identity of regulatory T cells that control distinct effector immune responses against two discrete pathogens, an intracellular parasitic protozoan, Leishmania major, and an extracellular helminth parasite, Schistosoma mansoni. The two Treg populations studied were phenotypically and functionally different. We identified molecular pathways that influence this diversity and, more specifically, we identified that two miRNAs (miR-182 and miR-10a) act as “regulatory hubs” critically controlling distinct properties within each Treg population. This is the first study identifying the upstream molecular pathways controlling Treg cell specialization and provides a new platform of Treg cell manipulation to fine-tune their function.
Genetic variants in intron 1 of the fat mass– and obesity-associated (FTO) gene have been consistently associated with body mass index (BMI) in Europeans. However, follow-up studies in African Americans (AA) have shown no support for some of the most consistently BMI–associated FTO index single nucleotide polymorphisms (SNPs). This is most likely explained by different race-specific linkage disequilibrium (LD) patterns and lower correlation overall in AA, which provides the opportunity to fine-map this region and narrow in on the functional variant. To comprehensively explore the 16q12.2/FTO locus and to search for second independent signals in the broader region, we fine-mapped a 646-kb region, encompassing the large FTO gene and the flanking gene RPGRIP1L, by investigating a total of 3,756 variants (1,529 genotyped and 2,227 imputed variants) in 20,488 AAs across five studies. We observed associations between BMI and variants in the known FTO intron 1 locus: the SNP with the most significant p-value, rs56137030 (p = 8.3×10⁻⁶), had not been highlighted in previous studies. While rs56137030 was correlated at r² > 0.5 with 103 SNPs in Europeans (including the GWAS index SNPs), this number was reduced to 28 SNPs in AA. Among rs56137030 and the 28 correlated SNPs, six were located within candidate intronic regulatory elements, including rs1421085, for which we predicted allele-specific binding affinity for the transcription factor CUX1, which has recently been implicated in the regulation of FTO. We did not find strong evidence for a second independent signal in the broader region. In summary, this large fine-mapping study in AA has substantially reduced the number of common alleles that are likely to be functional candidates of the known FTO locus. Importantly, our study demonstrated that comprehensive fine-mapping in AA provides a powerful approach to narrow in on the functional candidate(s) underlying the initial GWAS findings in European populations.
Genetic variants within the fat mass– and obesity-associated (FTO) gene are associated with increased risk of obesity. To better understand which specific genetic variant(s) in this genetic region is associated with obesity risk, we attempt to genotype or impute all known genetic variants in the region and test for association with body mass index as a measurement of obesity in over 20,000 African Americans. We identified 29 potential candidate variants, of which one variant (rs1421085) is a particularly interesting candidate for future functional follow-up studies. Our example shows the powerful approach of studying a large African American population, substantially reducing the number of possible functional variants compared with European descent populations.
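The fine-mapping argument above depends on how many SNPs are correlated with a lead SNP at a given r² threshold in each population. The following is a minimal sketch of that calculation, assuming phased haplotypes coded as 0/1 alleles; the function names and the toy haplotypes are illustrative and are not taken from the study.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    `hap_a` and `hap_b` are equal-length sequences of 0/1 alleles, one entry
    per phased haplotype (an assumption; unphased genotypes would need
    phasing or a genotype-based estimator).
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty allele lists of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom


def count_correlated(lead, others, threshold=0.5):
    """Number of SNPs in `others` with r^2 to `lead` above `threshold`."""
    return sum(1 for hap in others if ld_r2(lead, hap) > threshold)


# Toy haplotypes (hypothetical): one lead SNP and two neighbouring SNPs.
lead = [1, 1, 1, 1, 0, 0, 0, 0]
tight = [1, 1, 1, 1, 0, 0, 0, 0]  # perfectly correlated with the lead SNP
loose = [1, 1, 1, 0, 1, 0, 0, 0]  # weakly correlated with the lead SNP
print(count_correlated(lead, [tight, loose]))
```

Running the same count on haplotypes from two populations gives the kind of comparison reported above, where fewer SNPs exceed the threshold in the population with weaker LD.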
Background. To characterize the genetic basis of phenotypic evolution, numerous studies have identified individual genes that have likely evolved under natural selection. However, phenotypic changes may represent the cumulative effect of similar evolutionary forces acting on functionally related groups of genes. Phylogenetic analyses of divergent yeast species have identified functional groups of genes that have evolved at significantly different rates, suggestive of differential selection on the functional properties. However, due to environmental heterogeneity over long evolutionary timescales, selection operating within a single lineage may be dramatically different, and it is not detectable via interspecific comparisons alone. Moreover, interspecific studies typically quantify selection on protein-coding regions using the Dn/Ds ratio, which cannot be extended easily to study selection on noncoding regions or synonymous sites. The population genetic-based analysis of selection operating within a single lineage ameliorates these limitations. Findings. We investigated selection on several properties associated with genes, promoters, or polymorphic sites, by analyzing the derived allele frequency spectrum of single nucleotide polymorphisms (SNPs) in 28 strains of Saccharomyces paradoxus. We found evidence for significant differential selection between many functionally relevant categories of SNPs, underscoring the utility of function-centric approaches for discovering signatures of natural selection. When comparable, our findings are largely consistent with previous studies based on interspecific comparisons, with one notable exception: our study finds that mutations from an ancient amino acid to a relatively new amino acid are selectively disfavored, whereas interspecific comparisons have found selection against ancient amino acids. 
Several of our findings have not been addressed through prior interspecific studies: we find that synonymous mutations from preferred to unpreferred codons are selected against and that synonymous SNPs in the linker regions of proteins are relatively less constrained than those within protein domains. Conclusions. We present the first global survey of selection acting on various functional properties in S. paradoxus. We found that selection pressures previously detected over long evolutionary timescales have also shaped the evolution of S. paradoxus. Importantly, we also make novel discoveries untenable via conventional interspecific analyses.
evolution; natural selection; yeast; derived allele frequency
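As a rough sketch of the quantity analyzed above, the derived allele frequency spectrum can be tabulated from per-strain alleles once an ancestral allele has been assigned to each SNP. How the ancestral state is inferred (for example, from an outgroup species) is assumed to be handled upstream, and the toy sites below are invented; the study itself used 28 strains.

```python
def derived_allele_spectrum(sites, n_strains):
    """Count SNPs at each derived-allele count 1..n_strains-1.

    Each site is (ancestral_allele, alleles), where `alleles` is a string
    with one base per strain. Sites that have missing data, are not
    biallelic, or lack the ancestral allele are skipped.
    """
    spectrum = [0] * (n_strains - 1)
    for ancestral, alleles in sites:
        if len(alleles) != n_strains:
            continue
        if any(base not in "ACGT" for base in alleles):
            continue  # missing or ambiguous call
        if len(set(alleles)) != 2 or ancestral not in alleles:
            continue  # monomorphic, multi-allelic, or ancestral allele absent
        derived = sum(1 for base in alleles if base != ancestral)
        spectrum[derived - 1] += 1
    return spectrum


# Toy data for four strains (hypothetical).
toy_sites = [
    ("A", "AAAG"),  # derived count 1
    ("C", "CTTC"),  # derived count 2
    ("G", "GGGG"),  # monomorphic, skipped
    ("T", "TTTC"),  # derived count 1
    ("A", "GGGA"),  # derived count 3
    ("A", "ANAG"),  # missing call, skipped
]
print(derived_allele_spectrum(toy_sites, n_strains=4))
```

Comparing such spectra between functional categories of SNPs (for example, preferred-to-unpreferred versus unpreferred-to-preferred codon changes) is one way to look for the differential selection described above.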
The QT interval (QT) is heritable and its prolongation is a risk factor for ventricular tachyarrhythmias and sudden death. Most genetic studies of QT have examined European ancestral populations; however, the increased genetic diversity in African Americans provides opportunities to narrow association signals and identify population-specific variants. We therefore evaluated 6,670 SNPs spanning eleven previously identified QT loci in 8,644 African American participants from two Population Architecture using Genomics and Epidemiology (PAGE) studies: the Atherosclerosis Risk in Communities study and Women's Health Initiative Clinical Trial. Of the fifteen known independent QT variants at the eleven previously identified loci, six were significantly associated with QT in African American populations (P ≤ 1.20×10⁻⁴): ATP1B1, PLN1, KCNQ1, NDRG4, and two NOS1AP independent signals. We also identified three population-specific signals significantly associated with QT in African Americans (P ≤ 1.37×10⁻⁵): one at NOS1AP and two at ATP1B1. Linkage disequilibrium (LD) patterns in African Americans assisted in narrowing the region likely to contain the functional variants for several loci. For example, African American LD patterns showed that no SNPs were in LD with NOS1AP signal rs12143842, compared with European LD patterns that indicated 87 SNPs, which spanned 114.2 kb, were in LD with rs12143842. Finally, bioinformatic-based characterization of the nine African American signals pointed to functional candidates located exclusively within non-coding regions, including predicted binding sites for transcription factors such as TBX5, which has been implicated in cardiac structure and conductance. In this detailed evaluation of QT loci, we identified several SNPs in African Americans that better define the association with QT and successfully narrowed intervals surrounding established loci.
These results demonstrate that the same loci influence variation in QT across multiple populations, that novel signals exist in African Americans, and that the SNPs identified as strong candidates for functional evaluation implicate gene regulatory dysfunction in QT prolongation.
The QT interval (QT) provides a measure of a ventricular action potential, and its prolongation is associated with sudden death and ventricular arrhythmias. Genome-wide association studies performed in European populations have identified common genetic variants that influence QT. However, it is unclear whether these variants are relevant in other populations, including African Americans. The increased genetic diversity in African Americans also provides opportunities to narrow association signals and identify candidates for functional evaluation. We therefore used data from 8,644 African Americans to further characterize previously identified QT loci. Of the fifteen known independent QT variants at the eleven previously identified QT loci, six were associated with QT in African Americans. We also identified three variants that were independent from previously reported signals and narrowed intervals flanking association signals using patterns of linkage disequilibrium. Finally, bioinformatic-based characterization pointed to candidates located outside protein coding regions. Our results underscore the utility of genetic studies in African ancestral populations to identify novel variants and narrow intervals surrounding established loci. These results suggest that known QT loci are important in African Americans and that further characterization of these loci in other populations may provide additional insights into the genetic and molecular mechanisms underlying QT.
RNA interference occurs by two main processes: mRNA site-specific cleavage and non-cleavage-based mRNA degradation or translational repression. Site-specific cleavage is carried out by argonaute-2 (Ago2), while all four mammalian argonaute proteins (Ago1–Ago4) can carry out non-cleavage-mediated inhibition, suggesting that Ago1, Ago3 and Ago4 may have similar and potentially redundant functions. It has been observed that in mammalian tissues, expression of Ago3 and Ago4 is dramatically lower than that of Ago1; however, an optimization of the Ago3 and Ago4 coding sequences to include only the most common codon at each amino acid position was able to augment the expression of Ago3 and Ago4 to levels comparable to those of Ago1 and Ago2. Thus, we examined whether particular sequence features exist in the coding region of Ago3 and Ago4 that may prevent a high level of expression. Swapping specific sub-regions of wild-type and optimized Ago sequence identified the portion of the coding region (nucleotides 1–1163 for Ago3 and 1–1494 for Ago4) that is most influential for expression. This finding has implications for the evolutionary conservation of Ago proteins in the mammalian lineage and the biological role that potentially redundant Ago proteins may have.
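The codon optimization described above (using only the most common codon for each amino acid) can be sketched as follows. This is an illustration under stated assumptions: codon preference is estimated here from a user-supplied set of reference coding sequences, the standard genetic code is used, and the toy sequences are invented. It is not the procedure or the usage table used in the study.

```python
from collections import Counter

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TO_AA = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def codon_counts(reference_cds):
    """Tally in-frame codons across a list of coding sequences (DNA, uppercase)."""
    counts = Counter()
    for cds in reference_cds:
        for i in range(0, len(cds) - 2, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TO_AA:
                counts[codon] += 1
    return counts


def preferred_codons(counts):
    """Most frequent codon per amino acid; ties go to the alphabetically first."""
    best = {}
    for codon in sorted(counts):
        aa = CODON_TO_AA[codon]
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def optimize(cds, preferred):
    """Replace each codon with the preferred synonymous codon, if one is known."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(preferred.get(CODON_TO_AA[c], c) for c in codons)


def translate(cds):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Hypothetical reference set and query sequence.
preferred = preferred_codons(codon_counts(["ATGCTGCTGCTAAAGAAGAAATAA"]))
optimized = optimize("ATGTTAAAATGA", preferred)
print(optimized, translate(optimized))
```

The encoded protein is unchanged by construction, so any difference in expression between wild-type and optimized sequences must come from the nucleotide sequence itself, which is the premise of the sub-region swapping experiment.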
Long-range regulatory elements, such as enhancers, exert substantial control over tissue-specific gene expression patterns. Genome-wide discovery of functional enhancers in different cell types is important for our understanding of genome function as well as human disease etiology.
In this study, we developed an in silico approach to model the previously reported phenomenon of transcriptional pausing, accompanied by divergent transcription, at active promoters. We then used this model for large-scale prediction of non-promoter-associated bidirectional expression of short transcripts. Our predictions were significantly enriched for DNase hypersensitive sites, histone H3 lysine 27 acetylation (H3K27ac), and other chromatin marks associated with active rather than poised or repressed enhancers. We also detected modest bidirectional expression at binding sites of the CCCTC-factor (CTCF) genome-wide, particularly those that overlap H3K27ac.
Our findings indicate that the signature of bidirectional expression of short transcripts, learned from promoter-proximal transcriptional pausing, can be used to predict active long-range regulatory elements genome-wide, likely due in part to specific association of RNA polymerase with enhancer regions.
Identifying cis-regulatory elements is important to understand how human pancreatic islets modulate gene expression in physiologic or pathophysiologic (e.g., diabetic) conditions. We conducted genome-wide analysis of DNase I hypersensitive sites, histone H3 lysine methylation modifications (K4me1, K4me3, K79me2), and CCCTC factor (CTCF) binding in human islets. This identified ~18,000 putative promoters (several hundred unannotated and islet-active). Surprisingly, active promoter modifications were absent at genes encoding islet-specific hormones, suggesting a distinct regulatory mechanism. Of 34,039 distal (non-promoter) regulatory elements, 47% are islet-unique and 22% are CTCF-bound. In the 18 type 2 diabetes (T2D)-associated loci, we identified 118 putative regulatory elements and confirmed enhancer activity for 12/33 tested. Among 6 regulatory elements harboring T2D-associated variants, 2 exhibit significant allele-specific differences in activity. These findings present a global snapshot of the human islet epigenome and should provide functional context for non-coding variants emerging from genetic studies of T2D and other islet disorders.
One-carbon metabolism (OCM) is linked to DNA synthesis and methylation, amino acid metabolism and cell proliferation. OCM dysfunction has been associated with increased risk for various diseases, including cancer and neural tube defects. MicroRNAs (miRNAs) are ∼22 nt RNA regulators that have been implicated in a wide array of basic cellular processes, such as differentiation and metabolism. Accordingly, mis-regulation of miRNA expression and/or activity can underlie complex disease etiology. We examined the possibility of OCM regulation by miRNAs. Using computational miRNA target prediction methods and Monte-Carlo based statistical analyses, we identified two candidate miRNA “master regulators” (miR-22 and miR-125) and one candidate pair of “master co-regulators” (miR-344-5p/484 and miR-488) that may influence the expression of a significant number of genes involved in OCM. Interestingly, miR-22 and miR-125 are significantly up-regulated in cells grown under low-folate conditions. In a complementary analysis, we identified 15 single nucleotide polymorphisms (SNPs) that are located within predicted miRNA target sites in OCM genes. We genotyped these 15 SNPs in a population of healthy individuals (age 18–28, n = 2,506) that was previously phenotyped for various serum metabolites related to OCM. Prior to correction for multiple testing, we detected significant associations between TCblR rs9426 and methylmalonic acid (p = 0.045), total homocysteine levels (tHcy) (p = 0.033), serum B12 (p < 0.0001), holo transcobalamin (p < 0.0001) and total transcobalamin (p < 0.0001); and between MTHFR rs1537514 and red blood cell folate (p < 0.0001). However, upon further genetic analysis, we determined that in each case, a linked missense SNP is the more likely causative variant. Nonetheless, our Monte-Carlo based in silico simulations suggest that miRNAs could play an important role in the regulation of OCM.
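A Monte-Carlo enrichment test of the kind mentioned above can be sketched generically: count how many pathway genes are predicted targets of a miRNA, then compare that count with counts obtained from random gene sets of the same size. The null model below (uniform sampling from a background gene list) is an assumption, since the abstract does not specify the one used in the study, and all gene identifiers are placeholders.

```python
import random


def monte_carlo_enrichment(targets, pathway, background, n_iter=2000, seed=0):
    """Empirical p-value for over-representation of miRNA targets in a pathway.

    Random gene sets of the same size as `pathway` are drawn from
    `background`; the p-value is the fraction of draws containing at least
    as many predicted targets as the real pathway (with a +1 correction so
    the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    targets = set(targets)
    pathway = set(pathway)
    background = sorted(set(background))
    if len(pathway) > len(background):
        raise ValueError("pathway cannot be larger than the background")
    observed = len(targets & pathway)
    hits = 0
    for _ in range(n_iter):
        sample = rng.sample(background, len(pathway))
        if sum(1 for gene in sample if gene in targets) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)


# Placeholder gene identifiers, for illustration only.
background = [f"g{i}" for i in range(200)]
predicted_targets = background[:20]
pathway_genes = background[:10]  # every pathway gene is a predicted target
print(monte_carlo_enrichment(predicted_targets, pathway_genes, background))
```

In practice the background and the sampling scheme matter a great deal (for example, matching on 3′ UTR length), so the uniform draw here should be read as the simplest possible baseline.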
Primary transcripts of certain microRNA (miRNA) genes are subject to RNA editing that converts adenosine to inosine. However, the importance of miRNA editing remains largely undetermined. Here we report that tissue-specific adenosine-to-inosine editing of miR-376 cluster transcripts leads to predominant expression of edited miR-376 isoform RNAs. One highly edited site is positioned in the middle of the 5′-proximal half “seed” region critical for the hybridization of miRNAs to targets. We provide evidence that the edited miR-376 RNA specifically silences a different set of genes. Repression of phosphoribosyl pyrophosphate synthetase 1, a target of the edited miR-376 RNA and an enzyme involved in the uric-acid synthesis pathway, contributes to tight and tissue-specific regulation of uric-acid levels, revealing a previously unknown role for RNA editing in miRNA-mediated gene silencing.
MicroRNAs are small, endogenously expressed non-coding RNA molecules that regulate target gene expression through translational repression or messenger RNA degradation. MicroRNA regulation is performed through pairing of the microRNA to sites in the messenger RNA of protein-coding genes. Since experimental identification of miRNA target genes poses difficulties, computational microRNA target prediction is a key means of deciphering the role of microRNAs in development and disease.
DIANA-microT 3.0 is an algorithm for microRNA target prediction that is based on several parameters calculated individually for each microRNA. It combines conserved and non-conserved microRNA recognition elements into a final prediction score, which correlates with protein production fold change. Specifically, for each predicted interaction the program reports a signal-to-noise ratio and a precision score, which can be used as an indication of the false positive rate of the prediction.
Recently, several computational target prediction programs were benchmarked based on a set of microRNA target genes identified by the pSILAC method. In this assessment DIANA-microT 3.0 was found to achieve the highest precision among the most widely used microRNA target prediction programs reaching approximately 66%. The DIANA-microT 3.0 prediction results are available online in a user friendly web server at
TarBase5.0 is a database which houses a manually curated collection of experimentally supported microRNA (miRNA) targets in several animal species of central scientific interest, plants and viruses. MiRNAs are small non-coding RNA molecules that exhibit an inhibitory effect on gene expression, interfering with the stability and translational efficiency of the targeted mature messenger RNAs. Even though several computational programs exist to predict miRNA targets, there is a need for a comprehensive collection and description of miRNA targets with experimental support. Here we introduce a substantially extended version of this resource. The current version includes more than 1300 experimentally supported targets. Each target site is described by the miRNA that binds it, the gene in which it occurs, the nature of the experiments that were conducted to test it, the sufficiency of the site to induce translational repression and/or cleavage, and the paper from which all these data were extracted. Additionally, the database is functionally linked to several other relevant and useful databases such as Ensembl, Hugo, UCSC and SwissProt. The TarBase5.0 database can be queried or downloaded from http://microrna.gr/tarbase.
It has been speculated that the polymorphisms in the non-coding portion of the human genome underlie much of the phenotypic variability among humans and between humans and other primates. If so, these genomic regions may be undergoing rapid evolutionary change, due in part to natural selection. However, the non-coding region is a heterogeneous mix of functional and non-functional regions. Furthermore, the functional regions comprise a variety of different types of elements, each under potentially different selection regimes.
Findings and Conclusions
Using the HapMap and Perlegen polymorphism data that map to a stringent set of putative binding sites in human proximal promoters, we apply the Derived Allele Frequency distribution test of neutrality to provide evidence that many human-specific and primate-specific binding sites are likely evolving under positive selection. We also discuss inherent limitations of publicly available human SNP datasets that complicate the inference of selection pressures. Finally, we show that the genes whose proximal binding sites contain high frequency derived alleles are enriched for positive regulation of protein metabolism and developmental processes. Thus our genome-scale investigation provides evidence for positive selection on putative transcription factor binding sites in human proximal promoters.
Population genetics is the study of allele frequency changes driven by various evolutionary forces such as mutation, natural selection, and random genetic drift. Although natural selection is widely recognized as a bona-fide phenomenon, the extent to which it drives evolution continues to remain unclear and controversial. Various qualitative techniques, or so-called “tests of neutrality”, have been introduced to detect signatures of natural selection. A decade and a half ago, Stanley Sawyer and Daniel Hartl provided a mathematical framework, referred to as the Poisson random field (PRF), with which to determine quantitatively the intensity of selection on a particular gene or genomic region. The recent availability of large-scale genetic polymorphism data has sparked widespread interest in genome-wide investigations of natural selection. To that end, the original PRF model is of particular interest for geneticists and evolutionary genomicists. In this article, we will provide a tutorial of the mathematical derivation of the original Sawyer and Hartl PRF model.
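For orientation, a commonly quoted result of the PRF framework is the expected density of sites at derived allele frequency x, f(x) = θ (1 − e^(−2γ(1−x))) / ((1 − e^(−2γ)) x (1−x)), where θ is a scaled mutation rate and γ a scaled selection coefficient. Scaling conventions for θ and γ differ between presentations, so the parameterization here is an assumption. The sketch below integrates this density against binomial sampling to get the expected number of sites with i derived alleles in a sample of n, using a plain midpoint rule rather than any particular library.

```python
import math


def prf_expected_sfs(n, theta, gamma, steps=2000):
    """Expected number of sites with i = 1..n-1 derived alleles in a sample of n.

    Numerically integrates C(n, i) x^i (1-x)^(n-i) f(x) over (0, 1), where
    f(x) = theta * w(x) / (x (1 - x)) and
    w(x) = (1 - exp(-2*gamma*(1-x))) / (1 - exp(-2*gamma)).
    """
    def w(x):
        if gamma == 0:
            return 1.0 - x  # neutral limit of the selection term
        return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)

    h = 1.0 / steps
    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h
            # x^i (1-x)^(n-i) / (x (1-x)) simplifies to x^(i-1) (1-x)^(n-i-1)
            total += w(x) * x ** (i - 1) * (1.0 - x) ** (n - i - 1)
        sfs.append(theta * math.comb(n, i) * total * h)
    return sfs


neutral = prf_expected_sfs(n=5, theta=1.0, gamma=0.0)
positive = prf_expected_sfs(n=5, theta=1.0, gamma=5.0)
print([round(v, 4) for v in neutral])
print([round(v, 4) for v in positive])
```

With γ = 0 the result reduces to the familiar neutral expectation θ/i, and a positive γ shifts the spectrum towards high-frequency derived alleles, which is the signal that frequency-spectrum-based tests look for.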
miRGen is an integrated database of (i) positional relationships between animal miRNAs and genomic annotation sets and (ii) animal miRNA targets according to combinations of widely used target prediction programs. A major goal of the database is the study of the relationship between miRNA genomic organization and miRNA function. This is made possible by three integrated and user friendly interfaces. The Genomics interface allows the user to explore where whole-genome collections of miRNAs are located with respect to UCSC genome browser annotation sets such as Known Genes, Refseq Genes, Genscan predicted genes, CpG islands and pseudogenes. These miRNAs are connected through the Targets interface to their experimentally supported target genes from TarBase, as well as computationally predicted target genes from optimized intersections and unions of several widely used mammalian target prediction programs. Finally, the Clusters interface provides predicted miRNA clusters at any given inter-miRNA distance and provides specific functional information on the targets of miRNAs within each cluster. All of these unique features of miRGen are designed to facilitate investigations into miRNA genomic organization, co-transcription and targeting. miRGen can be freely accessed at .
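The idea of predicting miRNA clusters at a chosen inter-miRNA distance can be sketched as a single pass over sorted genomic coordinates. This is a minimal illustration, not miRGen's implementation: grouping by strand, measuring the gap from the end of one miRNA to the start of the next, and the record layout are all assumptions, and the miRNA names and coordinates are made up.

```python
def cluster_mirnas(mirnas, max_gap):
    """Group miRNAs whose neighbours lie within `max_gap` bp of each other.

    `mirnas` is a list of (name, chrom, strand, start, end) tuples. miRNAs
    are clustered only with others on the same chromosome and strand (an
    assumption). Returns a list of clusters, each a list of names.
    """
    clusters = []
    prev_key, prev_end = None, None
    for name, chrom, strand, start, end in sorted(
            mirnas, key=lambda m: (m[1], m[2], m[3])):
        key = (chrom, strand)
        if key == prev_key and start - prev_end <= max_gap:
            clusters[-1].append(name)
            prev_end = max(prev_end, end)
        else:
            clusters.append([name])
            prev_key, prev_end = key, end
    return clusters


# Hypothetical miRNA coordinates.
toy = [
    ("mir-a", "chr1", "+", 1000, 1080),
    ("mir-b", "chr1", "+", 1500, 1580),
    ("mir-c", "chr1", "+", 9000, 9080),
    ("mir-d", "chr2", "+", 1200, 1280),
    ("mir-e", "chr1", "-", 1600, 1680),
]
print(cluster_mirnas(toy, max_gap=1000))
print(cluster_mirnas(toy, max_gap=10000))
```

Varying `max_gap` changes which miRNAs are grouped together, which mirrors the option of exploring clusters at any given inter-miRNA distance.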
Contrast enhancement within primary stimulus representations is a common feature of sensory systems that regulates the discrimination of similar stimuli. Whereas most sensory stimulus features can be mapped onto one or two dimensions of quality or location (e.g., frequency or retinotopy), the analogous similarities among odor stimuli are distributed high-dimensionally, necessarily yielding a chemotopically fragmented map upon the surface of the olfactory bulb. While olfactory contrast enhancement has been attributed to decremental lateral inhibitory processes among olfactory bulb projection neurons modeled after those in the retina, the two-dimensional topology of this mechanism is intrinsically incapable of mediating effective contrast enhancement on such fragmented maps. Consequently, current theories are unable to explain the existence of olfactory contrast enhancement.
We describe a novel neural circuit mechanism, non-topographical contrast enhancement (NTCE), which enables contrast enhancement among high-dimensional odor representations exhibiting unpredictable patterns of similarity. The NTCE algorithm relies solely on local intraglomerular computations and broad feedback inhibition, and is consistent with known properties of the olfactory bulb input layer. Unlike mechanisms based upon lateral projections, NTCE does not require a built-in foreknowledge of the similarities in molecular receptive ranges expressed by different olfactory bulb glomeruli, and is independent of the physical location of glomeruli within the olfactory bulb.
Non-topographical contrast enhancement demonstrates how intrinsically high-dimensional sensory data can be represented and processed within a physically two-dimensional neural cortex while retaining the capacity to represent stimulus similarity. In a biophysically constrained computational model of the olfactory bulb, NTCE successfully mediates contrast enhancement among odorant representations in the natural, high-dimensional similarity space defined by the olfactory receptor complement and underlies the concentration-independence of odor quality representations.
Comprehensive analyses of results from genome-wide association studies (GWAS) have demonstrated that complex disease/trait-associated loci are enriched in gene regulatory regions of the genome. The search for causal regulatory variation has focused primarily on transcriptional elements, such as promoters and enhancers. microRNAs (miRNAs) are now widely appreciated as critical posttranscriptional regulators of gene expression and are thought to impart stability to biological systems. Naturally occurring genetic variation in the miRNA regulome is likely an important contributor to phenotypic variation in the human population. However, the extent to which polymorphic miRNA-mediated gene regulation underlies GWAS signals remains unclear. In this study, we have developed the most comprehensive bioinformatic analysis pipeline to date for cataloging and prioritizing variants in the miRNA regulome as functional candidates in GWAS. We highlight specific findings, including a variant in the promoter of the miRNA let-7 that may contribute to human height variation. We also provide a discussion of how our approach can be expanded in the future. Overall, we believe that the results of this study will be valuable for researchers interested in determining whether GWAS signals implicate the miRNA regulome in their disease/trait of interest.
microRNA; GWAS; gene regulation; polymorphism; complex disease