MicroRNAs (miRNAs) are post-transcriptional regulatory molecules recently discovered in animals and plants (reviewed in Bartel, 2004). They have been shown to regulate diverse biological processes ranging from embryonic development to synaptic plasticity (Carthew, 2006; Kloosterman and Plasterk, 2006). Primary miRNA transcripts are predominantly transcribed by RNA polymerase II. After multiple processing steps, the mature miRNA (~22 nt) is incorporated into the RNA-induced silencing complex (RISC) in the cytoplasm. Mature miRNAs suppress gene expression via imperfect base pairing to the 3′ untranslated region (3′UTR) of target mRNAs, leading to repression of protein production and, in some cases, mRNA degradation (Bartel, 2004; Carthew, 2006; Valencia-Sanchez et al., 2006). Hundreds of miRNA genes have been identified in mammalian genomes (Griffiths-Jones et al., 2006), and computational predictions indicate that thousands of genes could be targeted by miRNAs in mammals (John et al., 2004; Krek et al., 2005; Lewis et al., 2005; Rajewsky, 2006). These findings suggest that miRNAs play an integral role in genome-wide regulation of gene expression.
Like electronic circuits, gene regulatory networks (GRNs) are made up of basic subcircuits, such as feedback and feedforward loops. Pioneering work in E. coli has shown that certain subcircuits are favored by evolution and hence are significantly more abundant than others (Shen-Orr et al., 2002). The identification of these recurring subcircuits, called network motifs (Milo et al., 2002), has offered key insights into gene regulation. For instance, ~35% of E. coli transcription factors repress their own transcription, and such negative auto-regulatory circuits can significantly accelerate transcriptional response time (Rosenfeld et al., 2002) and dampen fluctuations in protein expression (Becskei and Serrano, 2000). Like transcriptional repressors, miRNAs are likely embedded in a large number of GRNs, in which certain miRNA-containing circuits may be recurrent. While all miRNAs operate through a repressive mechanism, their functions in networks need not be simply repressive; they could have diverse functions depending on the unique GRN context of individual miRNA-target interactions. Hence, the identification of recurring miRNA-containing motifs in GRNs would greatly increase our understanding of the functional roles of miRNAs in gene regulation.
Only a few studies have experimentally explored miRNA function in the context of a GRN. They suggest that a key recurring function of miRNAs in networks is to reinforce the gene expression program of differentiated cellular states. For instance, the secondary vulva cell fate in C. elegans is promoted by Notch signaling, which also activates miR-61; miR-61 in turn post-transcriptionally represses an inhibitory factor of Notch signaling, thereby stabilizing the secondary vulva fate (Yoo and Greenwald, 2005). Networks of similar architecture can also be found in the asymmetric differentiation of left-right neurons in C. elegans (Johnston et al., 2005), eye and sensory organ precursor development in Drosophila (Li and Carthew, 2005; Li et al., 2006), and granulocytic differentiation in humans (Fazi et al., 2005).
The repressive effect of miRNAs on target expression is modest and is often limited to the level of translation, with little effect on transcript abundance (Bartel, 2004). Thus, an important question is whether miRNAs act in concert with other regulatory processes, such as transcriptional control, to regulate target gene expression at multiple levels and with greater strength. One possibility is that the transcription of the miRNAs and their targets is oppositely regulated by common upstream factor(s) (Type II circuits). For instance, an upstream factor could repress the transcription of a target gene and simultaneously activate the transcription of a miRNA that inhibits target-gene translation. Type II circuits may be prevalent, as genome-scale studies have shown that predicted target transcripts of several tissue-specific miRNAs tend to be expressed at a lower level in tissues where the miRNAs are expressed (Farh et al., 2005; Sood et al., 2006; Stark et al., 2005). In contrast, there is little evidence for circuits in which the transcription of the miRNAs and their targets is positively co-regulated (Type I circuits); only one such example has been confirmed experimentally, where miR-17-5p represses E2F1, and both are transcriptionally activated by c-Myc in human cells (O'Donnell et al., 2005). While Type I circuits may seem counterintuitive and their functional significance has not been fully characterized, they have the potential to provide a host of regulatory and signal-processing functions (Hornstein and Shomron, 2006), such as the fine-tuning and maintenance of protein steady states (see Discussion).
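To make the two circuit definitions concrete, the following minimal Python sketch classifies a three-node circuit (upstream factor, miRNA, target) by the signs of the upstream factor's transcriptional effects. The function name `classify_circuit` and the +1/-1 encoding are illustrative assumptions and not part of the original analysis; treating a factor that represses both the miRNA and the target as Type I is one reading of "positively co-regulated".

```python
def classify_circuit(effect_on_mirna: int, effect_on_target: int) -> str:
    """Classify a circuit by how a common upstream factor regulates transcription.

    Each effect is +1 (activation) or -1 (repression). This encoding is an
    assumption made for illustration. The miRNA's repression of its target is
    common to both circuit types and is therefore not an input.
    """
    if effect_on_mirna not in (1, -1) or effect_on_target not in (1, -1):
        raise ValueError("effects must be +1 (activation) or -1 (repression)")
    # Same sign: miRNA and target are co-regulated (Type I).
    # Opposite sign: miRNA and target are oppositely regulated (Type II).
    return "Type I" if effect_on_mirna == effect_on_target else "Type II"


# c-Myc activates both miR-17-5p and E2F1 (O'Donnell et al., 2005).
print(classify_circuit(+1, +1))  # Type I
# An upstream factor activates the miRNA and represses the target.
print(classify_circuit(+1, -1))  # Type II
```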
While individual examples of Type I and II circuits exist in mammalian GRNs, our goal is to determine whether these circuits are recurrent (i.e., more prevalent than would be expected by chance). Although existing experimental data suggest that Type II circuits are prevalent and Type I circuits are not, the number of examples is far too small to be conclusive. It is possible that the apparent lack of evidence for the prevalence of Type I circuits is due to bias in the choice of experimental systems: most existing studies used cellular differentiation systems, where Type II circuits function to reinforce differentiation decisions.
Given the dearth of known miRNA-containing networks, it is infeasible to directly determine whether Type I/II circuits are recurrent. However, if a miRNA is involved in a larger number of Type I (Type II) circuits than expected by chance, one would expect the transcription of the miRNA and a significant number of its targets to be positively (negatively) correlated across diverse conditions. Three challenges complicate the identification of such correlation signatures. The first challenge is that large-scale expression data sets containing both miRNAs and protein-coding genes are lacking. We address this challenge by taking advantage of the large number of miRNAs that are embedded in the introns of protein-coding genes in human and mouse (Rodriguez et al., 2004). With few exceptions (e.g., miR-7 during Drosophila embryogenesis (Aboobaker et al., 2005)), the expression profiles of embedded miRNAs examined thus far are highly correlated with those of their host genes at both the tissue and individual cell levels (Aboobaker et al., 2005; Baskerville and Bartel, 2005; Li and Carthew, 2005), suggesting that they tend to be co-transcribed at identical rates from the same promoter(s) (Kim and Kim, 2007). Hence, the relative level of host-gene transcription across conditions can accurately serve as a proxy for that of the embedded miRNA(s), even though the steady-state levels of host-gene mRNA and of the embedded miRNA(s) may differ.
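The host-gene proxy can be illustrated with a minimal sketch, assuming expression profiles are available as arrays measured over the same set of conditions. The function name `host_proxy_correlations` and the toy values are hypothetical, and Pearson correlation is used here only for illustration; it is not necessarily the statistic used in the analysis.

```python
import numpy as np


def host_proxy_correlations(host_profile, gene_matrix):
    """Pearson correlation of a host-gene profile with each gene profile.

    host_profile: expression of the miRNA's host gene across conditions,
        used as a stand-in for the embedded miRNA (shape: n_conditions).
    gene_matrix: expression of candidate genes, one row per gene
        (shape: n_genes x n_conditions).
    Profiles with zero variance would give a division by zero and should be
    filtered out beforehand.
    """
    host = np.asarray(host_profile, dtype=float)
    genes = np.asarray(gene_matrix, dtype=float)
    host_c = host - host.mean()
    genes_c = genes - genes.mean(axis=1, keepdims=True)
    denom = np.sqrt((host_c ** 2).sum()) * np.sqrt((genes_c ** 2).sum(axis=1))
    return genes_c @ host_c / denom


# Toy data (hypothetical): one host gene and three genes over four conditions.
host = [1.0, 2.0, 3.0, 4.0]
genes = [
    [2.0, 4.0, 6.0, 8.0],  # rises with the host gene (Type I-like signature)
    [8.0, 6.0, 4.0, 2.0],  # falls as the host gene rises (Type II-like signature)
    [1.0, 3.0, 3.0, 1.0],  # no linear relationship with the host gene
]
r = host_proxy_correlations(host, genes)
print(np.round(r, 2))
```

Because correlation is invariant to scaling of either profile, only the relative level of the host gene across conditions matters in this sketch, not its absolute steady-state abundance.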
The second challenge is that only a few miRNA targets have been verified in vivo, and computational target predictions can be noisy (Rajewsky, 2006). We address this challenge by developing a robust method for detecting significant over-abundance of Type I and/or II circuits that does not rely on target prediction.
The third challenge is that most existing mammalian expression data sets profile tissues, not individual cell types. While expression correlation across tissues is likely due to transcriptional co-regulation by common upstream factors, cell-type heterogeneity within tissues can complicate the analysis. For example, some miRNAs and their targets could be expressed in distinct cell types within a tissue even though their averaged expression at the tissue level may suggest that their expression is correlated. To address this challenge, we analyze expression data from homogeneous neuronal cell populations (Arlotta et al., 2005; Sugino et al., 2006).
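The confounding effect of cell-type heterogeneity can be seen in a small numerical sketch. All cell types, tissue compositions, and expression values below are hypothetical and chosen only to illustrate the argument: a miRNA and its target are expressed in different cell types, yet their tissue-averaged levels are positively correlated because the two cell types tend to occur together.

```python
import numpy as np

# Rows: hypothetical cell types (neuron, glia, other).
# Columns: expression of a miRNA and of one of its targets (arbitrary units).
cell_expression = np.array([
    [10.0, 0.0],   # neuron: miRNA on, target off
    [0.0, 10.0],   # glia: miRNA off, target on
    [0.0, 0.0],    # other: both off
])

# Rows: hypothetical tissues; columns: fraction of each cell type (rows sum to 1).
tissue_composition = np.array([
    [0.50, 0.40, 0.10],
    [0.40, 0.30, 0.30],
    [0.30, 0.35, 0.35],
    [0.05, 0.05, 0.90],
    [0.00, 0.00, 1.00],
])

# Tissue-level expression is the composition-weighted average over cell types.
tissue_expression = tissue_composition @ cell_expression

# Correlation between miRNA and target at the tissue and cell-type levels.
tissue_r = np.corrcoef(tissue_expression[:, 0], tissue_expression[:, 1])[0, 1]
cell_r = np.corrcoef(cell_expression[:, 0], cell_expression[:, 1])[0, 1]
print(f"tissue-level r = {tissue_r:.2f}, cell-type-level r = {cell_r:.2f}")
```

In this toy setting the tissue-level correlation is strongly positive while the cell-type-level correlation is negative, which is why profiles from homogeneous cell populations are the safer basis for inferring co-regulation.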
We consistently observe that Type I and/or II circuits are prevalent for a significant fraction of the embedded miRNAs we analyzed, regardless of the gene expression data set used in the analysis, suggesting that these two circuit types are recurrent motifs in mammalian genomes. Strikingly, brain-enriched miRNAs tend to target brain-enriched genes, and Type I circuits are especially prevalent in mature neurons. Our findings not only confirm that Type II circuits are abundant but also reveal the surprising genome-wide prevalence of Type I circuits, suggesting that miRNAs are employed in recurrent gene regulatory circuits to perform important biological functions in mammals.