By disrupting microRNA (miRNA) biogenesis, we previously showed that this pathway is critical for the differentiation and function of T cells. While various cloning studies have shown that many miRNAs are expressed during T cell development, and in a dynamic manner, it was unclear how comprehensive these earlier analyses were. We therefore decided to profile miRNA expression by means of Next Generation Sequencing. Furthermore, we profiled miRNA expression starting from the hematopoietic stem cell. This analysis revealed that miRNA expression during T cell development is extremely dynamic, with 645 miRNAs sequenced, and the expression of some varying by as much as 3 orders of magnitude. Furthermore, changes in precursor processing led to altered mature miRNA sequences. We also analyzed the structures of the primary miRNA transcripts expressed in T cells, and found that many were extremely long. The longest was pri-mir-29b-1/29a at ~168kb. All the long pri-miRNAs also displayed extensive splicing. Our findings indicate that miRNA expression during T cell development is both a highly dynamic and a highly regulated process.
Early hematopoiesis occurs in the bone marrow (BM), giving rise to progenitors of all leukocyte lineages. Thymocyte progenitors then leave the BM to seed the thymus where definitive T cell development occurs. Early thymocytes first progress through four stages termed double negative (DN) 1 to 4, because they lack expression of the CD4 and CD8 coreceptors. DN3 is a key stage. T cell receptor (TCR) β rearrangement occurs here, and productive rearrangement is required for progression to the DN4 stage (1). DN4 thymocytes then rapidly proliferate and upregulate both coreceptors as they progress to the CD4+CD8+ double positive (DP) stage. Appropriate selection then leads to differentiation of mature CD8+ cytotoxic T cells or CD4+ helper T cells (2).
Much of what is understood about T cell development is centered on proteins, such as those involved in transcription and signal transduction. However, there is an increasing appreciation for the role of non-protein-coding RNAs (ncRNAs). Both long ncRNAs (lncRNAs) (3) and small ncRNAs (4-6) are expressed in T cells, but the function of most ncRNA classes remains unknown. The best understood class is the microRNAs (miRNAs), which are ~22nt small RNAs that inhibit the translation of protein-coding messenger RNAs (mRNAs). Thousands of miRNAs have so far been identified in plants and animals, including some 1,424 and 720 in humans and mice, respectively (miRBase) (7). MiRNAs target protein-coding mRNAs via incomplete base pairings (8). Because only partial complementarity is required, numerous mRNAs can be the target of each miRNA.
MiRNAs originate from long primary transcripts (known as “pri-miRNAs”) containing one or more secondary stem-loop structures. It is from this stem-loop that the mature miRNA is eventually derived. In the canonical biogenesis pathway, the intermediate “pre-miRNA” stem-loop is released from the pri-miRNA in the nucleus by the microprocessor complex. At the core of this complex is the RNase III enzyme Drosha (9). The excised pre-miRNA is then exported to the cytoplasm where it is further processed by another RNase III enzyme complex containing Dicer, which clips off the loop structure (10). This is followed by loading of the miRNA-5p:miRNA-3p duplex (i.e. the two arms of the stem-loop) into the RNA-induced silencing complex (RISC) (11), at which point one strand is degraded.
We and others have shown that Drosha and Dicer, and therefore miRNAs, are important throughout the T cell compartment. MiRNAs are required for early thymocytes to progress through the DN stages (12). MiRNAs are also necessary for T cell function. In particular, they maintain the suppressor program of FoxP3+ regulatory T cells. Mice with FoxP3+ cell-specific Drosha or Dicer deficiency die from a lymphoproliferative multi-organ inflammatory disease due to loss of suppressor function (13-15).
Compared to the post-transcriptional processing, less is known about the transcription of miRNA genes. This has been due, in large part, to a lack of information about the full-length primary transcript of most miRNAs, and thus genomic databases have not been able to annotate these genes. It is estimated that 30-40% of mammalian miRNAs are derived from the introns of protein-coding genes, and thus are transcribed together with the host genes (16). However, the majority of miRNAs are derived from independent transcriptional units (ITUs). The transcription of most miRNA genes depends on RNA polymerase II (PolII) (17-19). Furthermore, PolII binding sites located near pre-miRNA genomic locations (potentially correlating with promoters) are marked by histone 3 trimethyl-lysine 4 (H3K4me3) (20). Because of rapid processing by the microprocessor complex, pri-miRNAs derived from these ITUs are difficult to detect in cells. Full-length primary transcripts have been characterized for only a handful of miRNAs (17, 21, 22). Thus, although it has been possible to determine from where in the genome the pre-miRNA stem-loop intermediate is derived, the structure of many miRNA genes remains undefined.
MiRNAs are clearly important for the development and function of T cells. However, which specific miRNAs are important at which stage(s) is still unclear. Here, we report the use of Next Generation Sequencing (NGS) to construct a comprehensive miRNA atlas of T cell development. We reveal the dynamic nature of miRNA gene transcription and processing throughout this developmental pathway. We also use NGS coupled to chromatin immunoprecipitation (ChIP-seq) and polyA RNA enrichment (RNA-seq) to map the structures of miRNA genes expressed in T cells.
DroshaF/F and DicerF/F CD4-cre mice have been previously described (13). C57BL/6 mice were purchased from Taconic Farms or produced at the Walter and Eliza Hall Institute. Mice aged 6 to 8 weeks were used for all analyses. Single-cell suspensions were prepared from BM by flushing with a syringe, and thymocyte suspensions were prepared by mincing thymi through a 100μm mesh. All antibodies were purchased from eBioscience and cell sorting was performed on a FACS-Aria (Becton Dickinson).
Total RNA was prepared from 0.5-5×10⁵ cells (depending on population) using Trizol (Invitrogen). The 19-24nt small RNAs were fractionated, and libraries were constructed for sequencing on the Illumina GAII platform as previously described (12, 23). Sequence reads were mapped to known mature and pre-miRNAs deposited in miRBase (7). Non-mapping reads were then analyzed for potential novel miRNA species. In brief, this involved aligning the remaining reads to the Mus musculus genome (mm9 assembly, NCBI Build 37) using Novoalign (NovoCraft V2.05.04). The genomic intervals in which novel reads clustered were then extracted and the corresponding RNA was analyzed for putative secondary structure using UNAFold (24). Sequences with predicted stem-loop structure were considered potential miRNA precursors. However, only those structures where the small RNAs actually mapped to the double-stranded stems were considered novel miRNAs. Scripts for these pipelines can be provided upon request.
RNA-seq was performed as previously described (25, 26). In brief, polyA RNAs were selected using oligo-dT magnetic beads. The RNA was fragmented and reverse transcribed to produce cDNA, then dsDNA. Illumina-compatible adaptors were ligated to the dsDNA after end-repair and dA addition. The resulting library was size selected at ~225bp, amplified and purified before being sequenced. The reads were mapped to the mouse genome (mm9 assembly) using the Illumina ELAND software. The data were then visualized using the Integrative Genomics Viewer software (27) or the UCSC Genome Browser (28). Mammalian sequence conservation data were also obtained from the UCSC Genome Browser.
ChIPs were performed on native chromatin essentially as described (29), with some modifications. In brief, 1.1×10⁷ cells were lysed in digestion buffer (50mM Tris-HCl pH 7.5, 250mM sucrose, 3mM MgCl2, 1mM CaCl2, and 0.2% Triton-X100, supplemented with fresh 5mM sodium butyrate, 0.5mM PMSF and EDTA-free protease inhibitors) at 4°C for 10min. The nuclei were collected by centrifugation and resuspended in fresh digestion buffer. The nuclei were warmed to 37°C, then micrococcal nuclease was added to a final concentration of 0.09U/mL. After 5min, the reaction was stopped by the addition of EDTA/EGTA. The nuclei were collected, then lysed by 2 rounds of sonication (Diagenode Bioruptor 300) and incubation in lysis buffer (10mM Tris-HCl pH 7.5, 0.25mM EDTA, supplemented with fresh 5mM sodium butyrate, 0.5mM PMSF and EDTA-free protease inhibitors) for 1h. Release of >90% mononucleosomes was confirmed. 25μg chromatin (~5×10⁶ cell equivalents) was subjected to ChIP using 15μL Protein A Dynabeads (Invitrogen) pre-coated with anti-H3K4me3 (Upstate/Millipore, 05-745) or anti-H3K36me3 (Abcam, Ab9050) antibodies. Immunoprecipitated mononucleosomes were washed twice in RIPA buffer, twice in RIPA buffer plus 0.3M NaCl, twice in LiCl buffer (0.25M LiCl, 1% NP40, 0.5% sodium deoxycholate), and finally twice with TE buffer. DNA was eluted by incubation with 3% SDS and 50μg/mL proteinase K in TE at 65°C for 1h. After phenol/chloroform extraction, libraries were prepared from 10ng DNA using the ChIP-Seq Preparation Kit (Illumina) according to the manufacturer’s protocol, and sequenced on the Illumina GAII platform. The reads were mapped to the mouse genome (mm9 assembly) using the Illumina ELAND software.
cDNA was obtained by reverse transcription of total RNA using Superscript III (Invitrogen) and random hexamers. The following primers were employed to amplify pri-miRNA sequences: 5′-GATACCAAAATGTCAGACAGCC and 5′-AGTCTCAGCTGCACTAGC were used to amplify pri-mir-21; 5′-GACTACGCTCTGCCCTTCTG and 5′-TTCTGCTTCCTGAGTGCTGA were used to amplify pri-mir-29b-1/29a; 5′-AGGTTCATACCTGTTGGTGGA and 5′-CATTACGGCAATGACCAAGG were used to amplify pri-let-7c-2/let-7b.
All miRNA-seq, RNA-seq and ChIP-seq datasets have been deposited with the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) under accession #GSE30584.
Early studies characterizing the expression of miRNAs in T cells used cloning or hybridization microarrays on whole lymphoid organs or partially fractionated T cell populations (5, 30). More recently, Neilson and co-workers extended this analysis by focusing specifically on T cell development (6). They analyzed libraries of short RNA clones derived from FACS-sorted thymocyte subsets. However, given that fewer than 1000 clones were obtained for some populations, it was unclear how comprehensive this analysis was. We therefore decided to construct a more comprehensive miRNA profile of T cell development using NGS. Furthermore, we decided to start from the hematopoietic stem cell (HSC)-enriched Lin−Sca1+kit+ (LSK) population in the BM. We sorted the following populations corresponding to known steps in T cell development from the BM and thymus of C57BL/6 mice: LSK, multipotent progenitor (MPP, Lin−Sca1+kit+CD135+CD106+), lymphoid-primed multipotent progenitor (LMPP, Lin−Sca1+kit+CD135+CD106−), DN1 (CD90+CD4−CD8−CD3−CD44+CD25−), DN2 (CD90+CD4−CD8−CD3−CD44+CD25+), DN3 (CD90+CD4−CD8−CD3−/loCD44−CD25+), DN4 (CD90+CD4−CD8−CD3loCD44−CD25−), DP (CD90+CD4+CD8+CD3lo), CD4SP (CD90+CD4+CD8−CD24loCD3hi) and CD8SP (CD90+CD4−CD8+CD24loCD3hi). We also included pro-B cells (B220+IgM−CD43hi) as a related population, and murine embryonic fibroblasts (MEFs) as an unrelated population. A total of 3.17×10⁷ mappable reads was obtained, at an average of 2.64×10⁶ reads per library (Table 1). The vast majority (90.0-99.1%) mapped to previously annotated miRNA loci (Table 1). An average of 2.1% of reads were non-unique mappers (i.e. they mapped to multiple locations within the genome, such as repetitive sequences), and were thus excluded from further analyses. A further 1.9% of reads were found to be unique mappers, equally distributed between genic and intergenic regions.
To identify novel miRNAs, the remaining 1.9% of unique reads were pooled and analyzed for groups of reads clustering within 50-200bp genomic intervals. The RNA sequences of these intervals were then analyzed for potential secondary structure in UNAFold (24). As pre-miRNAs are expected to fold into stable secondary stem-loop structures, we analyzed further only those sequences with predicted free energies (ΔG) of ≤ −30kcal/mol. This yielded 310 sequences with potentially stable secondary structure, of which 66 displayed clear stem-loop structure and mammalian sequence conservation (not shown). Although many sequences were predicted to fold into pre-miRNA-like stem-loops, the reads mapped clearly to the stems for only 20 of them (Figure S1). Three of these (#77, #1123 and #1413) have since been annotated as miR-1843b, miR-3068 and miR-1306b by miRBase (7). Two (#28 and #665) mapped to predicted snoRNAs (31) and 5 (#199, #624, #647, #671 and #823) mapped to predicted tRNAs (32). However, these previous annotations were based on genomic sequence rather than the RNA. Analysis of the actual RNAs suggests that all fold into stem-loop structures typical of pre-miRNAs rather than snoRNAs or tRNAs (Figure S1).
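The clustering and free-energy filtering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study (those scripts are available on request): reads are represented as hypothetical `(chromosome, start, end)` tuples with half-open coordinates, the function names and the `max_gap` merge distance are invented for illustration, and ΔG values are assumed to be supplied by an external folding tool such as UNAFold rather than computed here. Only the 50-200bp interval size and the −30kcal/mol threshold come from the text.

```python
from typing import List, Tuple

Read = Tuple[str, int, int]           # (chromosome, start, end); hypothetical layout
Cluster = Tuple[str, int, int, int]   # (chromosome, start, end, number of reads)


def cluster_reads(reads: List[Read], max_gap: int = 60) -> List[Cluster]:
    """Merge reads on the same chromosome that overlap or lie within
    max_gap bases of the current cluster (max_gap is an assumed parameter,
    chosen so that reads from both arms of one stem-loop can be grouped)."""
    clusters: List[Cluster] = []
    for chrom, start, end in sorted(reads):
        if clusters and clusters[-1][0] == chrom and start <= clusters[-1][2] + max_gap:
            c_chrom, c_start, c_end, c_n = clusters[-1]
            clusters[-1] = (c_chrom, c_start, max(c_end, end), c_n + 1)
        else:
            clusters.append((chrom, start, end, 1))
    return clusters


def candidate_intervals(clusters: List[Cluster],
                        min_len: int = 50, max_len: int = 200) -> List[Cluster]:
    """Keep clusters spanning 50-200bp, the interval size given in the text."""
    return [c for c in clusters if min_len <= c[2] - c[1] <= max_len]


def passes_energy_filter(delta_g: float, threshold: float = -30.0) -> bool:
    """True if the predicted folding free energy (kcal/mol) is at or below
    the threshold; delta_g must come from an external folding tool."""
    return delta_g <= threshold


# Invented example reads, for illustration only.
example_reads = [
    ("chr1", 100, 122), ("chr1", 105, 126), ("chr1", 150, 172),
    ("chr1", 5000, 5021), ("chr2", 100, 121),
]
example_candidates = candidate_intervals(cluster_reads(example_reads))
```

In this invented example only the first cluster (three reads spanning 72bp on chr1) survives the size filter; a real analysis would additionally require that the reads map to the predicted stems, as described above.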
MiRNA-seq generated a comprehensive and quantitative atlas of miRNA expression during T cell development (Figure 1A and Table S1). A total of 645 different mature miRNAs (including miRNA* species) were sequenced at least once in one of the populations within the T cell developmental pathway (Table S1). Previous studies suggested that while most miRNAs are widely expressed (4), different sets of miRNAs are expressed by different cell types (33). These studies also implied that similar miRNA profiles are shared by related cell types. From our miRNA-seq analysis, developmental relationships were clearly evident. The highly related populations, CD4SP and CD8SP thymocytes, displayed highly similar miRNA profiles, with a Pearson’s correlation of 0.954 (Figure 1B). The closely related populations, DN1 thymocytes and pro-B cells, both being early steps in their respective developmental pathways, displayed somewhat similar miRNA profiles, with a Pearson’s correlation of 0.75 (Figure 1C). In contrast, the completely unrelated populations, LSK and MEFs, displayed completely different miRNA profiles, with a Pearson’s correlation of 0.04 (Figure 1D).
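For readers who wish to reproduce this kind of comparison from the deposited data, the following Python sketch shows one way to compute a Pearson's correlation between two miRNA profiles. It is illustrative only: the text does not state whether correlations were calculated on raw or transformed read counts, so the log10 transformation with a pseudocount used here is an assumption, and the function names and example counts are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def log_profile(counts: Dict[str, float], mirnas: List[str],
                pseudocount: float = 1.0) -> List[float]:
    """Order counts by a shared miRNA list and log10-transform them.
    The transformation and pseudocount are assumptions, not the paper's method."""
    return [math.log10(counts.get(m, 0.0) + pseudocount) for m in mirnas]


# Invented counts for two hypothetical libraries, for illustration only.
library_a = {"miR-A": 9000, "miR-B": 120, "miR-C": 15, "miR-D": 0}
library_b = {"miR-A": 7500, "miR-B": 200, "miR-C": 10, "miR-D": 3}
shared = sorted(set(library_a) | set(library_b))
r = pearson(log_profile(library_a, shared), log_profile(library_b, shared))
```

Taking the union of miRNA names before building the vectors ensures that a miRNA absent from one library contributes a zero count rather than being silently dropped.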
While many miRNAs were expressed at a constant level throughout (Figure 1E), including miR-181d and many Let-7 members, others displayed highly dynamic patterns. Many of the miRNAs that were highly expressed in the LSK and MPP stages, including miR-10a and miR-126-3p, were downregulated by as much as 3 orders of magnitude by the CD4SP and CD8SP stages (Figure 1F). Others were transiently upregulated at specific stages. In particular, there was a noticeable spike in miR-181a, miR-181b, miR-181c and miR-142-3p at the late DN to DP stages (Figure 1G). It has previously been observed that miR-181 family members are highly enriched in DP thymocytes (6). Thus, the spike in miR-181a/b/c expression was not a surprise. According to miRBase (7), miR-181c clusters with miR-181d and thus co-ordinate expression might have been expected. However, since miR-181d displayed a constant expression pattern (Figure 1E), there appeared to be a discrepancy between the two linked miRNAs. Finally, numerous miRNAs displayed biphasic expression patterns, including miR-142-5p (Figure 1H). Again, one would have expected miR-142-5p and miR-142-3p to be expressed with similar patterns as both are derived from the same pre-miRNA. These miR-142-5p/3p and miR-181c/d discordances suggest that miRNA expression might not simply be determined by transcription of the gene, but also by precursor processing that appears to vary during T cell development.
To further explore the possibility of regulated miRNA precursor processing, we examined the expression of miR-106b, miR-93 and miR-25, which are all derived from intron 13 of the Mcm7 host transcript. Since all three are derived from the same precursor, one might expect similar expression patterns if all are processed with similar efficiencies. MiR-106b was poorly expressed at all stages (Figure 2A). MiR-93 and miR-25 were expressed at similar levels at most stages (Figure 2A). However, miR-25 was downregulated at the DN3 to DP stages, suggesting that there was a loss in the ability to process the pre-mir-25 intermediate specifically at these stages.
Fine-mapping of miRNA subspecies also suggested altered precursor processing of some miRNAs at the DN3 to DP stages. The 19-22-mer AAGUGCUUACAGUGCAGGU(AGU) subspecies were the dominant miR-17-5p molecules expressed in LSK to DN2, CD4SP and CD8SP populations (Figure 2B). However, in DN3 thymocytes there was a shift of 2 bases 5′ to the 21-24-mer CAAAGUGCUUACAGUGCAGGU(AGU) miR-17-5p subspecies, and the appearance of miR-17-3p subspecies. This shifted by 1 base again 3′ in DN4 and DP thymocytes to 20-22-mer AAAGUGCUUACAGUGCAGGU(AGU) miR-17-5p subspecies.
Pre-miR-21 also displayed altered processing at the DN3 to DP stages (Figure 2C). While most stages expressed 19-21-mer GCUUAUCAGACUGAUGUUG(AC) miR-21-5p subspecies, this was shifted by 2 base positions 5′ to 21-23-mer UAGCUUAUCAGACUGAUGUUG(AC) in DN3 thymocytes, and by 1 base in DN4 and DP thymocytes to 18-22-mer AGCUUAUCAGACUGAUGU(UGAC). Subspecies derived from the 3p arm again appeared in DN4 and DP thymocytes.
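The 5′ shifts described above can be checked directly from the sequences quoted in the text. The short Python sketch below is an illustration rather than the analysis actually performed: it locates a subspecies relative to a reference form by matching a 12-nucleotide anchor, where the function name and the anchor length `k` are hypothetical choices. The DN3 form is used as the reference, and the optional 3′ nucleotides shown in parentheses in the text are omitted.

```python
def five_prime_offset(reference: str, variant: str, k: int = 12) -> int:
    """Position of the variant's 5' end relative to the reference's 5' end.
    Positive values mean the variant starts further 3'; negative values mean
    it extends further 5'. k is an assumed anchor length."""
    upstream = variant.find(reference[:k])
    if upstream >= 0:
        return -upstream          # variant carries extra 5' bases
    downstream = reference.find(variant[:k])
    if downstream >= 0:
        return downstream         # variant starts inside the reference
    raise ValueError("no shared anchor found between the two sequences")


# miR-21-5p subspecies as quoted in the text (minimal forms).
MIR21_DN3 = "UAGCUUAUCAGACUGAUGUUG"
MIR21_DN4_DP = "AGCUUAUCAGACUGAUGU"
MIR21_OTHER_STAGES = "GCUUAUCAGACUGAUGUUG"

offsets = {
    "DN3": five_prime_offset(MIR21_DN3, MIR21_DN3),                    # 0
    "DN4/DP": five_prime_offset(MIR21_DN3, MIR21_DN4_DP),              # 1
    "other stages": five_prime_offset(MIR21_DN3, MIR21_OTHER_STAGES),  # 2
}
```

Applied to the miR-17-5p sequences given earlier, the same function returns offsets of 1 and 2 for the DN4/DP and LSK-type forms relative to the DN3 form.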
Thus, not only is miRNA maturation regulated during T cell development, but also the precise cut sites. Interestingly, for both maturation of the polycistron mir-106b~25 and variable cleavage sites, the shifts occurred specifically at the DN3 to DP stages, suggesting that there may be specific changes to the miRNA biogenesis machinery at this point in the T cell development.
Although miRNAs pair imperfectly with target mRNAs, pairing between the seed sequence (nucleotide positions 2 to 8) of the miRNA and target is thought to be a critical determinant (8). Altered precursor processing that results in shifts at the 5′ seed sequence may therefore affect target recognition.
Upon more careful examination of the miR-17 subspecies expressed during T cell development, we found that those expressed only in DN3 thymocytes corresponded to the annotated mature miR-17 in miRBase (7). The miR-17 subspecies expressed in DN4 and DP thymocytes were shifted 3′ by 1 base, while all other stages expressed miR-17 subspecies that were shifted 3′ by 2 bases compared to the annotated miR-17 (Figure 3A). A +1 shift results in a seed sequence identical to that of miR-302a or miR-106a, while a shift of +2 results in a seed sequence identical to that of miR-302c (Figure 3B). TargetScan (8) predicts 827 targets for miR-17, while 498 targets are predicted for a +1 shift (Figure 3C and Table S2). Of the targets predicted for miR-17+1, only 286 are also predicted targets of the annotated miR-17. In other words, a single nucleotide change in the seed sequence of miR-17 changed two-thirds of the predicted targets. With a +2 shift, only 132 of the same targets were predicted.
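To make the effect of these shifts concrete, the following Python sketch extracts the seed (positions 2 to 8) from the three miR-17-5p forms quoted in the text and recomputes the target overlap from the TargetScan counts reported above. It is a worked illustration, not a target prediction: the function names are hypothetical, and the optional 3′ nucleotides shown in parentheses in the text are omitted because they do not affect the seed.

```python
def seed(mirna: str, first: int = 2, last: int = 8) -> str:
    """Return the seed, i.e. 1-based nucleotide positions first..last inclusive."""
    return mirna[first - 1:last]


# miR-17-5p subspecies as quoted in the text (minimal forms).
MIR17_SUBSPECIES = {
    "annotated (DN3)": "CAAAGUGCUUACAGUGCAGGU",
    "+1 (DN4 and DP)": "AAAGUGCUUACAGUGCAGGU",
    "+2 (other stages)": "AAGUGCUUACAGUGCAGGU",
}

seeds = {label: seed(seq) for label, seq in MIR17_SUBSPECIES.items()}
# annotated -> AAAGUGC, +1 -> AAGUGCU, +2 -> AGUGCUU


def shared_fraction(n_shared: int, n_reference: int) -> float:
    """Fraction of the reference target set that is retained."""
    return n_shared / n_reference


# Counts reported in the text: 827 targets for annotated miR-17,
# of which 286 (+1 shift) or 132 (+2 shift) are still predicted.
retained_plus1 = shared_fraction(286, 827)   # about 0.35
retained_plus2 = shared_fraction(132, 827)   # about 0.16
lost_plus1 = 1 - retained_plus1              # about 0.65, i.e. roughly two-thirds
```

Each one-base shift slides the 7-nucleotide seed window by one position, so the three forms share six of seven seed nucleotides in sequence but none of them in register, which is why the predicted target sets diverge so strongly.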
For miR-21, the impact of shifted seeds on target prediction was even more dramatic (Figure 3D and Table S2). As for miR-17, the miR-21 subspecies expressed in DN3 thymocytes corresponded to the mature miR-21 in miRBase (7). Only 7 of the 94 predicted targets of the +1 shifted miR-21 were the same as for the annotated miR-21. Similarly, only 7 of the 63 predicted targets of the +2 shifted miR-21 were the same. Although these are only computational predictions, the data suggest that altered precursor processing could potentially have an enormous impact on the targets that are regulated by any single miRNA. Further genetic studies will be required to determine if altered mature miRNA sequence does indeed affect target recognition this dramatically.
The dynamic miRNA expression patterns observed during T cell development also suggested regulation at the level of miRNA gene transcription. In contrast to post-transcriptional processing, less is known about the transcription of miRNAs. Although up to 40% of miRNAs may be derived from the introns of protein-coding genes (16), and thus dependent on the transcription of the host gene, most are derived from ITUs. From the few pri-miRNA transcripts cloned previously from ITUs, we know that transcription is dependent on PolII and that they contain 7-methyl guanosine caps and polyA tails (17, 21). However, for most ITU-derived miRNAs, the pri-miRNA transcripts have yet to be cloned. This is thought to be due to rapid processing of these transcripts by the miRNA biogenesis machinery (17).
To map the structures of pri-miRNAs and miRNA genes in T cells, we first needed to block miRNA processing and allow transcript accumulation. We and others have previously shown that knocking out/down components of the microprocessor complex results in the accumulation of pri-miRNA transcripts that can be detected, at least, by Northern blotting (13, 17, 34). Deletion of Drosha in T cells results in the accumulation of pri-miRNA transcripts derived from ITUs that can also be detected by RNA-seq (Figures 4A and B) or by quantitative RT-PCR (data not shown). Drosha deficiency did not appear to affect the expression of mRNAs with intron-embedded miRNAs (data not shown). RNA-seq analysis of polyA RNAs in Drosha deficient cells thus allowed us to visualize the pri-miRNA transcripts and the structure of actively transcribed miRNA genes. We analyzed libraries from both CD4+ and CD8+ T cells, and ~2×107 reads were obtained for each library.
To clearly delineate the miRNA genes, we also analyzed specific histone modifications. Barski and co-workers have previously shown that miRNA genes carry chromatin modifications similar to those of protein-coding genes (35). Mapping of H3K4me3 peaks can therefore be employed to identify the promoters of actively transcribed miRNA genes, while broadly distributed H3K36me3 should mark the gene body (Figures 4C to F). From the miRNA-seq profile of CD4SP and CD8SP cells, we identified some 70 miRNA genomic loci (Table S3). Of these, 35 were clearly located within introns of protein-coding mRNAs, 24 loci were clearly expressed as ITUs, and a further 9 loci appeared to map within or close to exons of protein-coding genes.
Several of the 24 miRNA ITUs corresponded to previously cloned pri-miRNAs (22, 36), including mir-17~92a (Figure 4B). However, for the majority of miRNAs, this was the first demonstration of the primary transcript. While several miRNA genes were only a few kilobases in length, such as Mir-142 (Figure 4C) and Mir-146a (Table S3), most were substantially larger (Figure 4D and E, and Table S3). The largest included Mir-29b-1/29a, which spanned 167.8kb, and Mir-181a-2/181b-2, which spanned 111.4kb (Table S3). Furthermore, several miRNA genes displayed more than one H3K4me3 peak (Figure 4E), suggesting that more than one promoter may be employed by these genes.
Splicing of pri-miRNA transcripts has previously been reported (21). We found that while short pri-miRNA transcripts, such as mir-142, were not spliced, long pri-miRNAs appeared to be extensively spliced (Figure 4D/E). RT-PCR across putative introns confirmed splicing throughout the flanking ssRNA strands (Figure 5A and B).
It was curious to observe so much splicing. Extensive splicing is reminiscent of the maturation of long ncRNAs (37) as well as protein-coding mRNAs. Thus, we wondered whether these long ssRNAs that flank pre-miRNAs might remain in cells as mature lncRNAs. Our RNA-seq analysis had been performed on polyA-enriched RNA. In wild-type cells, processing would cleave off these flanking ssRNA strands, which would therefore not have been picked up by the oligo-dT pulldown and subsequent RNA-seq. We therefore measured expression by quantitative RT-PCR. However, we found that these flanking ssRNAs did not accumulate in either wild-type or Dicer deficient cells, suggesting that they are normally degraded following Drosha-dependent cleavage (Figure 5C).
Nine loci appeared to map within or close to exons of other genes (Table S3). MiR-21 is annotated within the 3′UTR (exon 12) of Vmp1 in mice (Figure 6A). However, the presence of a H3K4me3 peak within intron 10 and RNA transcription through introns 10 and 11 of the Vmp1 gene suggested that miR-21 is actually derived from a short ITU that overlaps with the 3′ end of Vmp1. Indeed, the same overlapping structure was reported for MIR21 and VMP1 in humans (21). Such an overlap between independently transcribed miRNA and protein-coding genes was not unique. We found 8 other examples (Table S3), including Mir-700, Mir-15a/16-1 and Mir-1839 (Figure 6B to D).
By using primers located within the pri-mir-21 sequence but not exons of the overlapping Vmp1 gene (Figure 6E), we could detect the accumulation of the transcript in Drosha deficient T cells (Figure 6F), thus confirming that it is indeed a bona fide pri-miRNA transcript. Furthermore, the accumulation of RNA from intron 10 of Vmp1 specifically within Drosha deficient cells allowed us to confirm that the H3K4me3 peak within intron 10 is most likely the Mir-21 promoter.
We next analyzed the sequence conservation of the miRNA genes expressed in T cells to better understand the structures of these genes. In contrast to protein-coding genes, miRNA genes were found to display very poor evolutionary conservation (Figure 7A and B). Within the transcript itself, only the pre-miRNA portions displayed a high degree of conservation among mammals, while the remainder of the transcript displayed little or no conservation. This indicates that only the stem-loop is under evolutionary pressure to maintain its sequence. This is likely due to the requirement for base pairing between the 5p and 3p arms to form the stem-loop structure and be processed by Drosha and Dicer, and to pair with target protein-coding mRNAs.
In contrast to the transcripts themselves, the regions upstream of miRNA genes displayed substantial sequence conservation. Regions immediately 5′ of transcriptional start sites, corresponding to the promoters of the genes, were invariably conserved (Figure 7A and B). We also observed numerous additional upstream peaks of conservation that are likely to correspond to cis regulatory elements. Thus, only the pre-miRNA stem-loop and cis regulatory elements of miRNA genes are evolutionarily conserved and the majority of the RNA itself permits a large degree of sequence flexibility.
MiRNA-seq revealed substantially more dynamic miRNA expression during T cell development than previously found. Several striking patterns were observed. Firstly, many of the miRNAs that were highly expressed in bone marrow progenitor populations, such as miR-10a and miR-99b, were downregulated by 2 to 3 orders of magnitude in thymocyte populations. A survey of miRNA-seq data of B cell development and various other mature hematopoietic lineages (38) indicates that these miRNAs are also downregulated in other differentiation pathways. These miRNAs may therefore play important functions in maintaining the multipotential state of hematopoietic progenitors. MiR-125a has previously been shown to amplify the size and activity of HSCs in mice, in part, by inhibiting the expression of the pro-apoptotic protein Bak (39). We found that all miR-125 family members, miR-125a, miR-125b and miR-351, were highly expressed in hematopoietic progenitors but then downregulated. Because all three have the same seed sequence, it is likely that they have similar spectra of targets. Coordinate expression in progenitors followed by downregulation was also observed for miR-221 and miR-222, which share the same seed sequence. MiR-99a and miR-99b also displayed this same expression pattern. Simultaneous expression of miRNAs with shared seed sequence may function as a mechanism of redundancy in hematopoietic progenitors.
Changes in miRNA expression profiles specifically at late DN and DP thymocyte stages were also prominent. For example, miR-181a, miR-181b and miR-181c were all transiently upregulated. High levels of miR-181a have previously been reported for the thymus. Li and co-workers reported expression at early DN stages and not in DP thymocytes (40). In contrast, Neilson and co-workers found highest expression in DN4 thymocytes, but overall greatest enrichment in DP thymocytes (6). The reason for this discrepancy is unclear, but it may be due to the method of detection. Li and co-workers analyzed miR-181a expression by PCR, whereas we and the Neilson study employed cloning. Regardless, miR-181a is thought to regulate the expression of numerous proteins important for TCR signaling and thymocyte selection, including phosphatases (40) and CD69 (6). It has also been shown that miR-181a is necessary for the negative selection of autoreactive TCRs (41), confirming its expression and function in DP thymocytes.
While most of the changes in miRNA profiles can be attributed to likely changes in pri-miRNA transcription, discrepancies in the expression of mature miRNAs derived from the same polycistron suggested altered precursor processing. This appeared to change specifically at late DN and DP thymocyte stages. Likewise, precise endonucleolytic cleavage sites were also altered. It has previously been observed that miRNAs are usually expressed with fixed 5′ ends, but more flexible 3′ ends (42, 43). The interpretation was that a flexible 5′ end would alter seed sequence and therefore target genes. We found that a single homogenous population of cells at the same stage in T cell development invariably expressed miRNAs with identical 5′ ends. However, the precise ends could change in DN3, DN4 and/or DP thymocytes. We found that even a single nucleotide shift in the seed sequence dramatically affected the predicted targets (at least those predicted by TargetScan). Future genetic studies will be required to determine the impact of such altered precursor processing on mature miRNA targets. Regardless, our data imply that simply determining whether a miRNA is expressed or not is insufficient for investigating miRNA-dependent gene regulation. It is also essential that the precise sequence(s) of that miRNA are known.
That both polycistron processing and 5′ ends were altered at the same stages in T cell development suggests that the miRNA biogenesis machinery may be regulated differently here. Interestingly, these changes appear to correlate with the phenotypes observed when Drosha or Dicer was deleted specifically within the T cell developmental pathway. Lack of either enzyme, and therefore miRNAs, resulted in a developmental block at the DN3 stage (12). In the case of miR-17 and miR-21, their 5′ ends are generated by Drosha-mediated cleavage. Similarly, the release of different pre-miRNA intermediates from a common polycistron is dependent on Drosha. The Drosha complex may differ within thymocytes at different stages of development. Cell-specific co-factors could be recruited. Indeed, p68 and p72 DEAD-box RNA helicases have been shown to bind to Drosha and modulate the processing of specific miRNAs (44). Thus, miRNA expression is a complex regulated process and not simply dependent on transcription and inexorable processing by Drosha and Dicer.
Previous studies have attempted to map pri-miRNA transcripts by assembling ESTs (45) or transcriptional features surrounding miRNA genomic locations, in order to predict the putative boundaries (46). However, because most primary transcripts are efficiently processed by the microprocessor complex, mapping unobservable transcripts was obviously difficult. Although Saini and co-workers predicted that some miRNA genes would indeed be very large (46), we had expected most pri-miRNAs to be only several hundred base pairs to a few kilobases in length, as the few characterized pri-miRNAs were within this range (17, 21). Pri-miRNA prediction based on EST assembly also suggested this (45). This was already surprisingly long, given that pre-miRNA intermediates are 50-70nt and only 10nt of ssRNA immediately 5′ and 3′ of the stem-loop are necessary for processing, at least in vitro (47). To our surprise, we found that most of the miRNA genes expressed within T cells were tens of kilobases long, and up to 167kb in length. Given that, on average, each miRNA gene encodes two pre-miRNA stem-loops, only ~88bp (5p + 3p of each stem-loop) of each transcript is potentially functional. This means that up to 99.95% of a pri-miRNA is transcribed, processed and then discarded.
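The arithmetic behind this estimate can be reproduced with a few lines of Python. The sketch below assumes, as in the text, two stem-loops per gene with a 5p and a 3p arm of ~22nt each, and uses the 167.8kb span reported for Mir-29b-1/29a; the function name is hypothetical.

```python
def functional_fraction(transcript_bp: int, n_stem_loops: int = 2,
                        mature_len_nt: int = 22) -> float:
    """Fraction of a pri-miRNA occupied by mature 5p and 3p arm sequence."""
    functional_bp = n_stem_loops * 2 * mature_len_nt  # 2 arms per stem-loop
    return functional_bp / transcript_bp


# Mir-29b-1/29a, the longest miRNA gene reported here (167.8kb).
fraction = functional_fraction(167_800)
discarded_percent = (1 - fraction) * 100   # about 99.95
```

For a short gene of a few kilobases the discarded proportion is lower, which is why the text gives 99.95% as an upper bound.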
It is unclear why pri-miRNA transcripts are so long. One possibility is that the remaining long ssRNAs function as lncRNAs. However, the poor sequence conservation of these transcripts and their degradation in Drosha-sufficient cells suggest that this is unlikely. Another possibility is that flanking RNA strands are required for regulated pri-miRNA processing. Cis elements might function from a distance to regulate the recognition and processing of the stem-loop in an analogous manner to distal transcription factor binding sites that regulate gene transcription. At least in vitro, stem-loop containing RNAs are not processed by Drosha at equivalent efficiencies (12, 47). Although the majority of these flanking ssRNA strands have poor sequence conservation, peaks of conservation were present throughout. These peaks may correspond to RNA cis elements critical for regulated processing. These elements might function in a cell-specific manner and could help explain the variations in miRNA processing that occur during T cell development.
Our study has shown that miRNA expression during T cell development is highly dynamic. This dynamic nature is likely to be fundamental for controlling the transcriptional and proteomic landscape during T cell development. It has long been recognized that transcription factors can both activate and repress gene expression. However, with the discovery of regulatory ncRNAs, such as miRNAs, it is clear that these transcriptional networks are more complicated than previously appreciated. Transcription factors that activate the transcription of protein-coding genes may, in some instances, suppress targets indirectly by activating the transcription of regulatory miRNAs. Likewise, repressive transcription factors may upregulate protein expression by repressing miRNAs. Thus, a precise understanding both of protein-coding gene regulation and of miRNA (and other ncRNA) gene regulation will be fundamental to elucidating the molecular mechanisms that control the development and function of T cells.