MicroRNAs (miRNAs) are ~23-nt endogenous RNAs that pair to messages of protein-coding genes to direct the posttranscriptional repression of these mRNAs3
. To explore miRNA regulatory diversity within a single species, we considered miRNA complementary sites that are created or disrupted by single-nucleotide polymorphisms (SNPs) in mice. Our study centered on three types of complementary sites that previous computational and experimental results indicate can mediate miRNA recognition4
. Each of these three sites includes perfect Watson-Crick pairing to the miRNA seed (miRNA positions 2-7); one is a 7-nt site, referred to here as the “7mer-m8 site,” for which seed pairing is supplemented by a Watson-Crick match to miRNA nucleotide 8 (refs. 5-7
); another is the “7mer-A1 site,” for which seed pairing is supplemented by an A across from miRNA nucleotide 1 (ref. 5
), and the third is the “8mer site,” which has both the m8 match and the A across from position 1 (ref. 5
; ). We focused on sites to three miRNAs, miR-1 (for our purposes synonymous with its paralog, miR-206), miR-133, and miR-122, because of the strong, tissue-specific expression of these miRNAs in relatively homogenous and accessible tissues, muscle (miR-1 and miR-133) or liver (miR-122)8,9
. In agreement with previous reports1,10-12
, a search of the SNP databases13,14
, focusing on 3' untranslated regions (UTRs) because these regions of the messages are most hospitable to miRNA targeting15
, revealed many polymorphisms that when compared to the reference sequence create or disrupt sites for one of the three miRNAs. Because miRNAs often destabilize their target mRNAs16
, we reasoned that if these sites were functional in the tissue expressing the cognate miRNA, then less RNA might accumulate from the allele with the site. Moreover, in mice heterozygous for the SNPs, destabilization of mRNA from the target allele, but not that from the non-target allele, would contribute to allelic imbalance in mRNA steady-state level. Hence, we developed a method called allelic-imbalance sequencing (AI-Seq) to measure such imbalances, reasoning that any imbalances would identify and quantify miRNA regulatory diversity within a species, and provide a unique opportunity to examine molecular consequences of miRNA-mediated repression in vivo
without perturbing either the miRNA or its targets.
Figure 1 Measurement of mRNA allelic imbalances associated with heterozygous miRNA target sites. (a) Canonical 7-8-nt miRNA target sites. (b) AI-Seq, illustrated for the SNP rs13460433, which generates a heterozygous miR-122 target site in the Laptm4b gene of (more ...)
Because lab strains lack the heterozygosity found in natural populations, we performed five inter-strain crosses to generate mice heterozygous for the parental alleles. Approximately 300 annotated SNPs that create/disrupt target sites for one of the three miRNAs were heterozygous in at least one of the five F1 hybrids. We chose a subset of these, preferring those that create/disrupt 8mers, those in messages with evidence of expression in the tissues expressing the miRNAs, and those that were not linked to many nearby polymorphisms. Allelic imbalance was measured for 67 target sites (28 for miR-122, 28 for miR-1, and 11 for miR-133) in the tissue expressing the cognate miRNA.
For AI-Seq, mRNA fragments containing the SNPs were first reverse transcribed (RT) and amplified (PCR), and then the amplicon was subjected to high-throughput pyrosequencing17
(, Supplementary Discussion and Supplementary Fig. 1
online). To economize on sequencing, amplicons derived from different primers were pooled. Because the primers flanking the SNPs used for RT-PCR were gene-specific but not allelespecific, both alleles of the same gene were amplified by the same reaction, and their relative abundance could be inferred from the number of sequencing reads representing each allele.
If none of the intact miRNA sites imparted repression, the distribution of log ratios would center on zero, with individual measurements deviating from zero because of experimental noise and allelic imbalance independent of repression from the heterozygous site. However, the distribution of the log ratios for the 28 miR-122 sites measured using liver, centered below zero, consistent with the hypothesis that mRNA from some of the alleles with target sites was destabilized (). If miR-122 caused this destabilization, then the shift from zero would depend on the presence of miR-122. To test this dependency, we measured ratios for 22 of the 28 miR-122 sites in muscle, which does not express miR-122. The remaining six sites were missing from the muscle-measured ratios because four of the six were in messages not expressed in muscle, and the other two were unusual in that the SNP disrupting the miR-122 site (CACTCCA and ACACTCC, SNP underlined) simultaneously created a site for miR-1 (CATTCCA and ACATTCC), a miRNA expressed in muscle, thereby precluding the muscle-measured ratios as negative controls. As expected for a miRNA-mediated effect, the shift from zero disappeared in muscle (). When analyzing the ratios for the 39 sites for miR-1 or miR-133, which are expressed in the muscle but not in the liver, the reciprocal pattern was observed, in which the distribution of the liver-measured ratios was centered on zero and that of the muscle-measured ratios was skewed to the left ().
Figure 2 Impact of heterozygous target sites on mRNA allelic imbalance. (a) Distribution of allelic ratios (target:non-target, log2) measured for miR-122 heterozygous sites using mRNA from either liver (orange, n = 28) or muscle (green, n = 22), plotted as a histogram (more ...)
To increase sample size and thereby achieve statistical significance, datasets were combined such that the ratios measured in the presence of the cognate miRNA were analyzed together and compared to those measured in the absence of the miRNA (). A significantly large fraction of the log ratios were smaller than zero in the presence of the miRNA (p < 0.01, one-sided exact binomial test), but not in the absence of the miRNA (p = 0.6), and the difference between the two distributions also was significant (p = 0.02, one-sided Kolmogorov-Smirnov [KS] test). Thus, at least a subset of the interrogated target sites mediated repression.
On average, the polymorphic sites were associated with mRNA downregulation of 12% (; 95% confidence interval of 5-18%, bootstrapping). Actual downregulation was likely greater because the signal could have been diluted by both nuclear mRNA and mRNA from cells that do not express the cognate miRNA, such as those from blood or vasculature. Effects of functional sites also might have been diluted by inclusion of nonfunctional sites. Nonfunctional sites presumably were enriched among the set of sites interrogated in this study because natural selection selects against polymorphisms that either disrupt beneficial functional sites or generate functional sites in messages that should not be repressed10,18,19
A lower bound for the fraction of functional sites was estimated by analyzing the maximal vertical displacement of the cumulative distribution curves (correcting for the bumpiness of the distributions15
), and was found to be 19%. This estimate was a conservative lower bound for the true fraction; simulations incorporating the variability observed from the tissues lacking the miRNA showed that if 100% of sites mediated target repression by 20%, only 32% would be detected functional by our analyses. Therefore, at least 19% and perhaps over half the examined polymorphic sites mediated repression.
The experimental variability was composed of multiple components. One was stochastic sampling error inherent to counting sequencing reads, which was modeled by the binomial distribution (). A second component was PCR variability, which was estimated as the excess variability in the allelic ratios measured using gDNA, compared to the stochastic variability (; difference between gDNA and binomial distributions). Another component was the biological noise, which could stem from a variety of sources, including differential epigenetic states of the two alleles or allelic differences in linked cisregulatory elements, either of which might differentially regulate the two alleles. To begin to estimate the biological noise, we examined the distribution of the allelic ratios measured using both the mRNAs from tissues lacking the cognate miRNA and other mRNAs without predicted potential for allelic-specific repression mediated by the three miRNAs. Allelic ratios from these control mRNAs were substantially more variable than those from gDNA, suggesting frequent allelic imbalance not attributable to the sites under investigation (). However, we were unable to quantify the frequency or magnitude of this potentially widespread allelic imbalance because of the possibility that RT variability also contributed to the added variability of the mRNA controls compared to that of the gDNA controls.
Our quantitative readout of site efficacy in vivo
, without perturbation of either the miRNAs or their targets, provided a fresh opportunity to examine the influence of site type and site context. The 8mer sites performed significantly better than did 7mer-m8 or 7mer-A1 sites (; p
= 0.005 and p
= 0.001 respectively, one-sided KS test), and 7mer-m8 sites tended to perform slightly better than 7mer-A1 sites, although this difference was not statistically significant (p
= 0.1, one-sided KS test). Overall efficacy rank order of the three types was consistent with previous observations from experiments that ectopically expressed or deleted miRNAs15,20-22
Figure 3 Dependence of target site efficacy on site type and context. (a) Efficacy of target sites of different types (8mer, n = 5; 7mer-m8, n = 26; 7mer-A1, n = 36), plotted as in . (b) Relationship between allelic ratio and context score in the tissue (more ...)
To consider the influence of site context, the “context score” was calculated for each polymorphic site. Context scores quantitatively evaluate site type and three features of site context (surrounding AU content, position within the 3'UTR, and pairing to 3'-region of miRNA) to predict site efficacy15
. Context scores significantly correlated with target downregulation in the presence of the cognate miRNA, but not in the absence of the miRNA (). Significant correlation was retained in the presence of the miRNA even after the contribution of site type had been factored out, thereby indicating that site context, as scored by this model, influences the efficacy of polymorphic sites ().
Our experiments focused on three of the 87 miRNA families conserved to chicken or beyond23
. Expanding our SNP database search to the other 84 broadly conserved miRNA families and the ~8 million SNPs annotated in 15 mouse strains13
showed that two strains have an average of 2,430 distinct polymorphic sites (bottom and top 2.5 percentile, 810-4,600; median, 1,470) and 1,510 genes with at least one polymorphic miRNA site (bottom and top 2.5 percentile, 520-2,790; median, 950). These numbers would increase with consideration of sites to the hundreds of additional annotated miRNAs. However, because species-specific miRNAs and those conserved only within mammals tend to be expressed at lower levels, their 7-8mer sites are thought to be less frequently sufficient for mediating repression4
. Therefore, to guard against overstating the impact of polymorphic sites, we did not consider these additional miRNAs.
Our results on the performance of polymorphic sites matching three miRNAs, when extrapolated to the sites of all broadly conserved miRNAs, provided the first glimpse of the direct impact of such sites on gene-expression variation within a species. For the 67 sites examined, we observed average downregulation of 12%, with at least 19% of the sites responsible for the observed downregulation. Correcting for our preference in choosing 8mers for analysis slightly lowered these percentages to 10% and 18%, as our best estimates, respectively, for all 7-8mer polymorphic sites in mRNAs expressed in cognate tissues. If 50% of the genes with the sites are expressed in any cell type of the body that express the cognate miRNA and if the actual fraction of functional sites matches our observed lower limit of 18%, then at least 9% (= 0.5 × 0.18) of the genes with polymorphic sites will be differentially regulated between two strains. Thus, between most mouse strains, over a hundred messages are likely to be differentially regulated through polymorphic sites, with average mRNA downregulation for these messages exceeding 55% (average downregulation of all sites of 10%, divided by 0.18). In a more likely scenario in which half the sites in cognate tissues are functional, proportionally more messages would be downregulated, with average downregulation of functional sites still approximating 20%. Because miRNAs also influence translation, effects on the proteome are presumed even greater. Overall, it is hard to escape the conclusion that polymorphic miRNA regulatory sites have a substantial impact on gene-expression variation within a species.
Our results in mice, considered together with SNP frequencies in humans, indicate that any two unrelated humans are likely to have more than a hundred genes differentially regulated due to polymorphic miRNA targeting. Assuming that some of these could explain differences in disease risk among individuals, our results suggest that, as more genome-wide association studies are conducted with improved coverage in 3'UTRs, more miRNA target site polymorphisms will be associated with clinical conditions and individual traits2
Despite the success in mammalian tissues, miRNA effects were not detected in a HapMap24
panel of lymphoblastoid cell lines when we used AI-Seq to measure the allelic imbalance of 56 heterozygous target sites for nine miRNA families most highly expressed in these cell lines (data not shown). Cell lines exhibit variable degrees of clonality with random monoallelic expression in each clone25
, and the resulting random allelic imbalance might have overwhelmed any imbalance resulting from polymorphic regulatory elements of interest. Moreover, the process of establishing lymphoblastoid cell lines, which involved Epstein-Barr-virus infection and subsequent transformation of B-cells, might have downregulated miRNA expression26
Experiments examining the influence of miRNA knockouts on the transcriptome and proteome have been very informative for inferring effects of both conserved and nonconserved sites that are not polymorphic21,27,28
. Our results complement those of these previous studies, revealing the frequent influence of polymorphic sites, with the added advantage of detecting miRNA regulatory impact in vivo
without perturbing either the miRNAs or their targets. Upregulation of a target following miRNA knockout can trigger offsetting feedback regulation that reduces the observed effect of the miRNA. Our AI-Seq results are less likely to be confounded by such a response, because the feedback regulation is usually not allele-specific and therefore is unlikely to change the relative expression of the target compared to the non-target allele. Our approach can be extended to characterization of other cis-regulatory elements that might influence mRNA level. As the capacity of high-throughput sequencing increases, we anticipate that RNA-Seq coverage will expand to the point that directed amplification of specific loci will no longer be required to accurately detect many allelic imbalances and that our approach of correlating imbalances with predicted regulatory sites can be applied transcriptome-wide to reveal many of the polymorphic regulatory sites contributing to these imbalances.