|Home | About | Journals | Submit | Contact Us | Français|
Methylation of the N6 position of adenosine (m6A) is a post-transcriptional modification of RNA whose prevalence and physiological relevance is poorly understood. The recent discovery that FTO, an obesity risk gene, encodes an m6A demethylase implicates m6A as an important regulator of physiological processes. Here we present a method for transcriptome-wide m6A localization, which combines m6A-specific methylated RNA immunoprecipitation with next-generation sequencing (MeRIP-Seq). We use this method to identify mRNAs of 7,676 mammalian genes that contain m6A, indicating that m6A is a common base modification of mRNA. The m6A modification exhibits tissue-specific regulation and is markedly increased throughout brain development. We find that m6A sites are enriched near stop codons and in 3' UTRs, and we uncover an association between m6A residues and microRNA binding sites within 3' UTRs. These findings provide a resource for identifying transcripts that are substrates for adenosine methylation and reveal insights into the epigenetic regulation of the mammalian transcriptome.
The fat mass and obesity-associated (FTO) gene is a major regulator of metabolism and energy utilization (Church et al., 2009; Church et al., 2010; Fischer et al., 2009). In humans, FTO polymorphisms that increase FTO expression are associated with elevated body mass index and increased risk for obesity (reviewed in Fawcett and Barroso, 2010). FTO is a member of the Fe(II)- and oxoglutarate-dependent AlkB oxygenase family, and was originally shown to catalyze the oxidative demethylation of methylated thymidine and uracil (Gerken et al., 2007; Jia et al., 2008). However, FTO exhibits low activity toward these base modifications, and they are relatively infrequent with unclear physiological relevance (Klagsbrun, 1973). Thus, the physiologically relevant targets of FTO was unclear until recent studies that showed that FTO can demethylate N6-methyladenosine (m6A), a naturally occurring adenosine modification (Jia et al., 2011). These studies link adenosine methylation to physiological roles in human biological processes.
The distribution of m6A in RNA is poorly understood. Previous studies have found that m6A exists in RNA from a variety of unique organisms, including viruses, yeast, and mammals (Beemon and Keith, 1977; Bodi et al., 2010). m6A is found in tRNA (Saneyoshi et al., 1969), rRNA (Iwanami and Brown, 1968), and viral RNA (Beemon and Keith, 1977; Dimock and Stoltzfus, 1977). Although m6A is detectable in mRNA-enriched RNA fractions (Desrosiers et al., 1974), it has been confirmed in vivo in only one mammalian mRNA (Horowitz et al., 1984). The abundance of m6A has been shown to be 0.1–0.4% of total adenosine residues in cellular RNA (Dubin and Taylor, 1975; Perry et al., 1975; Wei et al., 1975), suggesting that this modification may be widespread throughout the transcriptome. Although the existence of m6A has been known for many years, progress in establishing the prevalence of m6A in mRNA has lagged behind that of other modified bases. This is due in large part to the lack of available methods for detecting the presence of m6A. Because methylation of adenosine does not alter its ability to base pair with thymidine or uracil, m6A is not amenable to detection with standard hybridization or sequencing-based methods.
Here, we examine the prevalence, regulation, and functional roles of m6A in the transcriptome. We show that m6A is a developmentally regulated RNA modification which is dynamically modified. Using antibodies that recognize m6A, we have developed an affinity enrichment strategy which, when coupled with next-generation sequencing, allows for the high-throughput identification of m6A sites. Using this approach, we present the first transcriptome-wide profile of m6A localization in RNA. We find that m6A is a widespread modification that is present in the mRNAs of over 7,600 genes and in over 300 noncoding RNAs. Additionally, m6A is highly enriched near the stop codon and in the 3’ UTR. Furthermore, bioinformatic analysis of m6A localization reveals consensus sites for m6A and identifies a potential interaction between m6A and microRNA pathways.
Because m6A exhibits the same base pairing as unmodified adenosine, it is not readily detectable by standard sequencing or hybridization-based approaches. Additionally, m6A is not susceptible to chemical modifications which might otherwise facilitate its detection, such as bisulfite treatment which is used to detect 5mC in DNA. The methods used thus far to detect m6A have involved treating cells with radiolabeled methionine, the precursor of the endogenous methylating agent S-adenosylmethionine, to impart radiolabeled methyl groups to adenosine (Csepany et al., 1990; Dubin and Taylor, 1975; Narayan and Rottman, 1988). Radiolabeled m6A residues are subsequently mapped with thin-layer chromatography or HPLC.
In order to simplify detection of m6A, we sought to develop an immunoblotting strategy. For these experiments, we used a previously described anti-m6A antibody (Bringmann and Luhrmann, 1987; Jia et al., 2011; Munns et al., 1977). To ensure the specificity of this antibody for m6A, we performed dot blots using modified oligonucleotides immobilized to a membrane. The m6A antibody selectively bound to oligonucleotides containing a single m6A residue, and exhibited negligible binding to oligonucleotides containing unmodified adenosine (Figure 1A). The binding was competed by incubating the antibody with increasing concentrations of an m6A-rich competitor RNA (Figure 1B). However, RNA containing unmodified adenosine did not compete for binding. Furthermore, binding was competed by N6-methyladenosine triphosphate, but not by ATP or other modified adenosine triphosphates including N1-methyladenosine and 2’-O-methyladenosine (Figure 1C). Finally, to examine the specificity of the antibody in the context of other nucleotide sequences, we took advantage of the fact that the enzyme encoded by the DNA adenine methylase (dam) gene in E. coli methylates the N6 position of adenosine in DNA. Upon subjecting digested DNA isolated from dam+ and dam− E. coli to immunoblotting using the m6A antibody, we found robust signals only in the DNA samples from the dam+ strain (Figure 1D). Collectively, these data demonstrate the high sensitivity and selectivity of this antibody for m6A, as well as its ability to detect m6A within cellular nucleotide pools.
To explore the abundance of m6A within various RNA populations, we isolated RNA from several mouse tissues and subjected it to immunoblot analysis using the m6A antibody (Figure 2A). We found that m6A was present in all RNA samples tested, indicating that this modified nucleotide is widely distributed in many tissues, with particularly high enrichment in liver, kidney, and brain (Figure 2B). In addition, we observed large differences in the m6A content of various immortalized cell lines, including several cancer cell lines, which further indicates that large differences in m6A levels exist in different cell populations (Figure S1A).
m6A immunoreactivity was detected in bands throughout the molecular weight range of the blot (~0.2 kb to ~ 10 kb), consistent with incorporation of m6A in mRNA. Indeed, fractionation of whole cellular RNA into polyadenylated and nonpolyadenylated RNAs indicates that m6A immunoreactivity is enriched in the polyadenylated RNA pool, which further suggests that m6A in cellular RNA is localized to mature mRNA (Figure 2C). To determine whether m6A is present in poly(A) tails, we selectively removed the poly(A) tail from cellular mRNA using oligo(dT) hybridization and RNase H treatment.
Transcripts depleted of the poly(A) tail did not exhibit an appreciable reduction in m6A levels (Figure 2D). In addition, immunoblotting poly(A) tails alone showed minimal m6A immunoreactivity (Figure S2C-E). Together, these data demonstrate that m6A is primarily an internal modification which is largely absent from the poly(A) tail.
Our observation that m6A is highly enriched within the brain prompted us to investigate the temporal dynamics of m6A levels during different stages of neural development. Immunoblotting RNA samples with the m6A antibody indicates that m6A is present in mRNA at low levels throughout embryogenesis but increases dramatically by adulthood (Figure 3A). A similar increase in m6A levels is also observed in RNA isolated from embryonic and postnatal rat brain cultured neurons (Figure S1B), which suggests that upregulation of m6A levels accompanies neuronal maturation.
We next asked if adenosine methylation is a dynamically regulated post-transcriptional modification and if its levels can be regulated by specific demethylating enzymes. In our search for potential demethylating enzymes that act to remove the methyl group from m6A, we focused on members of the family of Fe(II)- and 2-oxoglutarate-dependent oxygenases, several of which have previously been shown to demethylate both DNA and RNA (Falnes et al., 2007; Gerken et al., 2007). Consistent with the findings of Jia et al (2011), we observed that FTO decreased m6A levels when overexpressed in mammalian cells (Figure 3B). Furthermore, we find that overexpression of FTO resulted in a broad size range of RNAs that exhibit reduced m6A immunoreactivity (Figure 3B).
In order to obtain insight into potential roles for m6A, we sought to characterize its distribution throughout the transcriptome. To do this, we first determined whether the m6A antibody could be used to enrich m6A-containing RNAs. In vitro immunoprecipitation experiments showed that a single round of MeRIP produces ~70-fold enrichment, and two rounds produce over 130-fold enrichment for m6A-containing targets (Figure S3). To identify m6A sites throughout the transcriptome, we developed a method that combines m6A-specific methylated RNA immunoprecipitation (MeRIP) with next-generation sequencing (RNA-Seq). The procedure for MeRIP-Seq (outlined in Figure 4A) involves randomly fragmenting the RNA to approximately 100 nt-sized fragments prior to immunoprecipitation. Because an m6A site could lie anywhere along the length of a given immunoprecipitated 100 nt fragment, sequencing reads are expected to map to a region which contains the m6A site near its center. At its extremes, this region would be predicted to be roughly 200 nt wide (100 nt up- and downstream from the m6A site) (Figure 4B,C).
We next utilized MeRIP-Seq to identify m6A sites in total mouse brain RNA. Reads from the MeRIP sample frequently mapped to mRNAs and clustered as distinct peaks. As predicted, these peaks frequently converged to approximately 100 nt-wide regions near their midpoint (Figure 4C). Furthermore, enrichment of reads in these regions was not observed in the non-IP control sample, which was composed of the input RNA prior to m6A immunoprecipitation, demonstrating the specificity of these peaks (Figure S3).
To determine the location of these peaks throughout the transcriptome, and thus characterize the regions of m6A localization, we developed an algorithm for identifying m6A peaks (see Extended Experimental Procedures). Additionally, we performed replicate MeRIP-Seq experiments in which we utilized (1) a different sequencing platform (Illumina’s GAIIx vs HiSeq2000), (2) independently prepared RNA samples from different animals, and (3) an unrelated m6A antibody (Kong et al., 2000), which exhibited similarly high specificity for m6A (Figure S4). We employed our algorithm to identify m6A peaks that met a minimum p-value (p ≤ 0.05, Benjamini and Hochberg corrected) within each individual sample. From the three samples, we identified a total of 41,072 distinct peaks in the RNAs of 8,843 genes, which we call our “filtered” set of m6A peaks (Table S1). Of these peaks, 80% were detected in at least two different replicates. The high concordance between these samples indicates that MeRIP-Seq is highly reproducible across different sequencing platforms and using different m6A antibodies. For subsequent bioinformatic analyses, we used the list of 13,471 m6A peaks in RNAs from 4,654 genes which were detected in all three replicates (our “high-confidence” list; Table S2A). This list demonstrates the presence of m6A in a substantial fraction of the transcriptome and indicates that m6A is a common feature of mammalian mRNAs.
The majority of our high-confidence m6A peaks (94.5%) are found within mRNAs. However, we also observed that 236 (1.1%) of our peaks mapped to non-coding RNAs (ncRNAs) that were annotated in the RefSeq database (Table S2A). In addition, 588 m6A peaks did not map to a known RefSeq mRNA or ncRNA. To determine whether these unannotated peaks localize to ncRNAs predicted in other databases, we aligned them to genomic regions of a set of 32,211 ncRNAs from the RIKEN functional annotation of mouse (FANTOM3) dataset that we obtained from the mammalian noncoding RNA database (RNAdb; Pang et al., 2005). We found that 216 of these peaks mapped to a FANTOM3 ncRNA (Table S2B). All of these ncRNAs were greater than 200 nt in length, indicating that long ncRNAs are substrates for adenosine methylation. Additionally, when we interrogated a set of conserved human lincRNAs (Cabili et al., 2011) for overlaps with m6A peaks, we found nine additional peaks that overlapped with these lincRNAs (Table S2C). Collectively, these data identify several classes of ncRNAs as targets of adenosine methylation.
We next sought to validate the presence of m6A in mRNAs identified with MeRIP-Seq. To do this, we used RNA pull-down assays to isolate individual mRNAs from total mouse brain RNA by hybridization to target-specific probes. Isolated mRNAs were then subjected to immunoblot analysis using the m6A antibody to detect the presence of m6A. Using this method, we validated the presence of m6A within low density lipoprotein receptor (Ldlr) (Figure 5A,B), metabotropic glutamate receptor 1 (Grm1), and dopamine receptor D1A (Drd1a) (Figure S5A-D). These mRNAs were chosen to demonstrate our ability to validate m6A presence in transcripts with multiple methylation sites (Grm1 and Drd1a) as well as those with single m6A peaks (Ldlr). To further demonstrate that MeRIP-Seq selectively enriches for these endogenous methylated targets, we performed qRT-PCR on the unbound fractions after RNA precipitation with the m6A antibody. As expected, we observed substantial immunodepletion of Grm1, Drd1a, and other methylated targets in the unbound fraction. In contrast, transcripts which lack m6A peaks, such as Rps21 and Ndel1, were detectable at high levels in the unbound fraction (Figure S5E).
To predict potential signaling pathways and cellular processes that involve m6A, we used the DAVID bioinformatics database to identify the gene ontology (GO) terms that are enriched for m6A-containing transcripts. We found that genes encoding m6A-containing RNAs are involved in a variety of cellular functions, including transcriptional regulation, RNA metabolism, and intracellular signaling cascades (Table S5). In addition, we observed that m6A peaks mapped to many genes linked to neurodevelopmental and neurological disorders, such as Bdnf, Dscam, Lis1, and Ube3a, as well as the neurexins and several neuroligins. Collectively, these data demonstrate that m6A-containing RNAs are involved in a variety of biological pathways relevant to cellular signaling and disease.
Since m6A is a physiological target of FTO, we sought to determine whether mRNAs whose levels have previously been shown to be influenced by FTO activity contain m6A. We examined a list of 77 mRNAs whose levels are either up- or downregulated in the liver, skeletal muscle, or white adipose tissue of mice homozygous for a nonsynonymous FTO point mutation (Church et al., 2009). mRNAs from seven genes which were significantly upregulated in FTO mutants (Acaca, Atf6, Bip, Gcdh, Irs1, Perk, and Xbp1) also contain m6A peaks. Intriguingly, some of these genes are involved in important metabolic pathways, raising the possibility that demethylation of the transcripts of these genes may contribute to the mechanism by which FTO regulates metabolism and energy homeostasis.
We next characterized the pattern of adenosine methylation in mRNAs. mRNAs from many genes (46.0%) exhibit a single m6A peak, consistent with a single m6A site or a cluster of adjacent m6A residues. However, 37.3% contain two m6A peaks, 11.2% contain three peaks, and 5.5% contain four or more peaks. Several genes contain ten or more m6A peaks, suggesting the existence of multiple m6A residues along their length. Indeed, of the twenty genes that exhibited the largest number of m6A peaks, all had 15 or more m6A peaks along their length (Table S3). Additionally, of the genes that contain more than one m6A peak, 90.1% contain two or more contiguous m6A peaks, suggesting that m6A sites are frequently clustered in adjacent regions along the transcript. Indeed, 32.8% of m6A peaks are part of clusters that contain three or more adjacent m6A peaks, suggesting that m6A clustering is a common feature in methylated transcripts (Figure S6A). We also identified 68 genes that have long (≥ 1 kb) stretches of contiguous m6A peaks (Table S4), which likely indicates the presence of multiple m6A residues throughout these regions.
We next determined which mRNAs contain m6A sites with the highest degree of methylation. To do this, we developed a method of calculating the level of m6A enrichment at individual m6A peaks which normalized the number of MeRIP sample reads within each peak to the abundance of the individual transcript in which the peak resides (see Extended Experimental Procedures). The genes which contain the most enriched m6A peaks are listed in Table 1. Importantly, because MeRIP-Seq identifies m6A sites at a resolution of 200 nt, there could be multiple individual m6A residues within the area covered by each peak. Therefore, the peaks with the highest levels of local m6A enrichment may represent a single adenosine residue which exhibits a high degree of methylation, or multiple adjacent m6A residues with a lower stoichiometry of methylation. In either case, however, the high levels of methylation observed at these sites likely indicate transcripts that are most influenced by m6A-dependent regulatory processes.
We next examined the distribution of m6A peaks within regions of the transcriptome in our high confidence set. The majority (94.8%) of m6A peaks occur within intragenic regions (Figure 5C). These m6A peaks are abundant in coding sequences (CDS; 50.9%), and untranslated regions (UTRs; 41.9%), with relatively few in intronic regions (2.0%) (Figure 5C). Additionally, m6A peaks are less abundant in the 5’ UTR (7.0% of UTR peaks) than in the 3’ UTR (93.0% of UTR peaks) (Figure 5C). This distribution deviates substantially from the distribution of reads in the non-IP sample, indicating the high degree of enrichment of m6A peaks in the CDS and UTRs (Figure 5C). Although a low percentage of m6A peaks was observed in intronic regions, because our samples were not enriched for unspliced pre-mRNAs, it is possible that additional methylated intronic sequences exist.
We next sought to determine if m6A peaks are preferentially found in certain portions of transcripts. To do this, we assigned each m6A peak to either a 5’ UTR, CDS, or 3’ UTR category, and assigned it to one of 100 bins a based on its location along the 5’ UTR, CDS, or 3’ UTR. These data show that m6A occurs at low levels in the 5’ UTR and the 5’ end of the CDS. In the CDS, the percentage of m6A peaks increases steadily along transcript length and is on average 5–6 fold higher at the end of the CDS than at the beginning (Figure 5D). In the 3’ UTR, the peaks are enriched near the stop codon and decrease in abundance along the length of the 3’ UTR. Indeed, 61% of m6A peaks are in the first quarter of the 3’ UTR and a quarter of all m6A peaks across the entire transcriptome are found within the first 26% of the 3’ UTR (Figure 5D). Mapping the number of m6A peaks 1 kb up and downstream of CDS end sites further demonstrated the high levels of methylation in the vicinity of the stop codon (Figure S7A,B). Collectively, these data indicate that m6A peaks are highly clustered in the vicinity of the stop codon in mRNAs.
We next asked if m6A sites are conserved across species. We compared PhyloP scores across 30 vertebrates (Pollard et al., 2010) of m6A peak regions to those of random regions of the same size in gene exons. We found that the distribution of conservation scores of the m6A peaks was significantly different from that of the random regions (p ≤ 2.2×10−16, Kolmogorov-Smirnov (K-S) test, Figure 6A) and that m6A peaks’ median conservation score (0.578) was much higher than that of the random regions (0.023). The fact that m6A frequently occurs in evolutionarily conserved sequences suggests that m6A-containing regions are maintained through selection pressure.
Because the tools for transcriptome-wide localization of m6A sites have until now been unavailable, only a few studies to date have examined the sequence contexts of m6A formation (Canaani et al., 1979; Dimock and Stoltzfus, 1977; Wei et al., 1976). Using methods such as RNase T1 fingerprinting of radiolabeled RNA followed by separation by thin-layer chromatography, these studies reported that m6A exists within two unique sequence contexts: GAC and AAC (underlined adenosines indicate m6A). Subsequently, an extended m6A consensus sequence was identified: PuPuACX (Pu= purine; X= A, C, or U). However, since the methods used in these studies are not practical for use in a high-throughput manner, it is unclear whether these motifs are relevant to the transcriptome-wide m6A sites identified by MeRIP-Seq.
We therefore sought to identify sequence motifs that are enriched within m6A peaks. To do this, we used FIRE, a sensitive and unbiased tool for discovering RNA regulatory elements (Elemento et al., 2007). Remarkably, FIRE independently identified the GAC and AAC motif, G[AG]ACU, and related variants ([AC]GAC[GU], GGAC, [AU][CG]G[AG]AC, and UGAC) as being highly enriched in m6A peaks (Figure 6B). For example, the G[AG]ACU motif occurs in 42% of all m6A mRNA peaks and in a much lower fraction (21%) of non-m6A control peaks from the same mRNAs (p<1.0 × 10−124, chi-square test). Altogether, we found that >90% of all m6A peaks contain at least one of the motifs identified by FIRE.
We next examined the position of the motifs within m6A peaks. Nearly 30% of m6A peaks have only one motif (Figure S6B), indicating that these peaks are likely to contain only a single methylated residue. Motifs are also preferentially found in the center of m6A peaks (Figure 6C, D), suggesting that these peaks derive from a centrally located methylated adenosine residue. Notably, other RNA regulatory elements, such as AU-rich elements, poly(A) signals, or binding sites for known RNA-binding proteins, were not identified by FIRE, suggesting that m6A is unlikely to primarily function by modifying these known regulatory elements.
FIRE did not identify an enrichment of poly(A) signals (PASs), which are involved in 3’ UTR end processing, in m6A peaks. However, PASs exhibit considerable sequence heterogeneity beyond the canonical AAUAAA consensus (Tian et al., 2005). This sequence heterogeneity might allow these PASs to evade detection by FIRE, despite being enriched in m6A peaks. Therefore, we sought to further investigate whether m6A peaks within 3’ UTRs are enriched at PASs. We obtained a high-confidence list (Brockman et al., 2005) of poly(A) cleavage sites (the site downstream of a PAS where the mRNA is actually cleaved and polyadenylated) for the mRNAs that contain m6A peaks within their 3’ UTRs. We then examined whether m6A peaks were enriched near these sites by determining the number of 3’ UTR m6A peaks that fell within 50 nt upstream of each cleavage site. Since a PAS is located approximately 10–30 nt upstream of an actual mRNA cleavage site (reviewed in Proudfoot, 1991), these 50 nt-long regions are expected to contain the PAS. Of the 6,288 m6A peaks found within 3’ UTRs, 1,042 (16.6%) overlapped with the 50 nt-long regions upstream of poly(A) cleavage sites, compared to 1,070 (17.0%) control peaks, which were generated from random nonoverlapping regions of the same 3’ UTRs. Thus, these data suggest that m6A do not have a significant association with known PASs (p = 0.39; chi-square test).
Prior studies that used nonspecific methylation inhibitors to explore possible functions for m6A revealed impaired splicing in a small number of RNAs (Carroll et al., 1990; Stoltzfus and Dane, 1982). We therefore asked whether the localization of m6A peaks is compatible with a role for influencing the binding of splicing factors. However, only 80 splice junctions were found in regions contiguous with m6A peaks, significantly less than the overlap seen with a set of randomly-generated peaks (9,531; p=0.0; chi-square test). Thus, unlike CLIP-Seq tag clusters from RNA-binding proteins that influence splicing (Licatalosi et al., 2008), m6A peaks did not significantly coincide with exon-exon junctions, suggesting that m6A is unlikely to primarily function to directly influence the binding of splicing factors.
The strong enrichment of m6A peaks in 3’ UTRs prompted us to investigate whether m6A peaks are found near microRNA (miRNA) binding sites, which are also frequently observed within 3’ UTRs. We found that 67% of 3’ UTRs that contain m6A peaks also contain at least one TargetScan-predicted miRNA binding site. Since ~30% of genes have miRNA binding sites in their 3’ UTRs (Lewis et al., 2005), this is a significantly greater association than what would be expected by chance alone. Intriguingly, we also found that in 3’ UTRs with both m6A peaks and miRNA binding sites, the m6A peaks precede miRNA binding sites 62% of the time. Moreover, we found that the overall distribution of m6A peaks and miRNA binding sites within 3’ UTRs are anti-correlated; while m6A peaks are most abundant near the stop codon and generally decrease in frequency along 3’ UTR length, miRNA target sites are more enriched near the 3’ end of 3’ UTRs (Figure 6E). The reason for this inverse localization pattern is unknown, although it could indicate that a certain spatial separation is necessary for m6A to influence the function of a downstream bound miRNA or vice versa.
We next sought to determine whether miRNA-targeted transcripts in the brain are more likely to contain m6A. To test this, we used TargetScan to identify the target transcripts of the 25 most highly expressed and 25 least highly expressed miRNAs within the brain. Intriguingly, we observed that the most highly expressed miRNAs have a significantly greater percentage of target transcripts that contain m6A (p<0.05, Wilcoxon test; Figure 6F). These data suggest that miRNA levels may control methylation of their target transcripts.
We next asked whether the enrichment of m6A in the 3’ UTR is also observed in other species. We therefore profiled m6A in HEK293T cells, a human cell line with high levels of adenosine methylation (Figure 3B). We generated a high-confidence list of m6A peaks using three MeRIP-Seq biological replicates and confirmed by both m6A antibodies. We found that the distribution of m6A peaks in HEK293T cells closely mirrored the distribution in mouse brain, with 31% and 53% of m6A peaks falling within the 3’ UTR and the CDS, respectively (Figure 5D, Figure S7D). As with the pattern of m6A distribution in the mouse brain transcriptome, HEK293T m6A peaks were predominantly localized near stop codons (Figure 5D and Figure S7C).
In total, we identified 18,756 peaks in RNAs encoded by 5,768 genes in HEK293T cells (Table S6). Additionally, we found that transcripts from 2,145 and 3,259 genes were methylated only in the mouse brain and HEK293T datasets, respectively, and that transcripts from 2,509 genes were methylated in both datasets (Table S6B). Interestingly, among the transcripts methylated in both tissues, m6A peaks were often localized to the same distinct regions of both orthologs (Figure 5E). Collectively, these data indicate that m6A peaks are enriched near the stop codon in human transcripts and that many sites of methylation are conserved in mouse and human transcripts.
Unlike DNA, which undergoes cytosine methylation and hydroxymethylation, dynamic internal modifications of mRNA other than RNA editing have not been established. Recent evidence that the obesity risk gene, FTO, is a physiologic m6A demethylase suggests that m6A has central roles in cellular function. Here we use MeRIP-Seq to provide the first transcriptome-wide characterization of m6A. We show that m6A is a reversible and widespread modification which is primarily located in evolutionarily conserved regions and is particularly enriched near the stop codon. We also find that many features of m6A localization are conserved between the human and mouse transcriptomes, and we uncover a previously unidentified link between m6A and miRNA signaling. Collectively, these studies reveal that m6A is a widespread and dynamically regulated base modification in mRNA, and they identify mRNAs which are most likely to be influenced by signaling pathways that influence m6A levels.
One of the most striking features of m6A localization is its prevalence within 3’ UTRs. The 3’ UTR is an important region for RNA regulation, as it can influence RNA stability, subcellular localization, and translation regulation. Several of these events are regulated by RNA binding proteins (RBPs) that bind to cis-acting structural motifs or consensus sequences within the 3’ UTR and act to coordinate RNA processing. Conceivably, m6A may influence the affinity of specific RBPs for their target mRNAs, analogous to the recruitment of methyl-CpG binding protein 2 (MeCP2) to methylated cytosine residues in DNA (Lewis et al., 1992). Given the abundance of m6A throughout the transcriptome and its widespread localization, such a role for m6A would be likely to have important consequences for the regulation of numerous mRNAs.
Our profiling of m6A in HEK293T cells revealed thousands of transcripts that are also methylated in the mouse brain. In many cases, the patterns of m6A localization within these transcripts are nearly identical, suggesting that some RNAs possess highly conserved methylation profiles. However, we also uncovered many transcripts that exhibit distinct cell type-specific methylation patterns, demonstrating that m6A is also capable of being differentially regulated within unique cellular environments.
Our finding that a large proportion of 3’ UTRs that contain m6A peaks also contain miRNA binding sites is highly suggestive of an association between m6A and miRNA function. Additionally, our analysis also indicated an inverse localization of m6A peaks and miRNA binding sites within 3’ UTRs, with m6A sites typically preceding, but not overlapping, the miRNA sites in the 3’ UTRs. Although miRNAs can inhibit their target mRNAs by promoting either transcript degradation or translational repression (Guo et al., 2010; Hendrickson et al., 2009), the factors that determine which fate predominates are not well understood. Conceivably, the proximity of m6A to a miRNA binding site could influence the mechanism of miRNA-mediated transcript inhibition. Additionally, it is possible that miRNA binding influences m6A levels within 3’ UTRs. Indeed, our finding that abundant miRNAs are more significantly enriched in m6A peaks than weakly expressed miRNAs raises the possibility that miRNAs regulate methylation status.
A surprising result of these studies is the finding that m6A is highly enriched near stop codons. This recurrent localization within transcripts suggests that adenosine methylation in the vicinity of the stop codon may be of functional importance. Interestingly, the consensus for adenosine methylation is relatively short, and sequences that match the consensus are found throughout the transcriptome. However, despite the frequency of m6A consensus sites, methylation occurs primarily near stop codons. It will be important to determine how this specificity is achieved and whether cellular mechanisms that involve recognition of the stop codon or the beginning of the 3’ UTR are involved in providing specificity to adenosine methylation.
The finding that FTO demethylates m6A suggests that misregulation of pathways controlled by adenosine methylation ultimately affect physiologic processes in humans. Although m6A is found in many classes of RNA, it is intriguing to speculate that FTO mutations mediate their effects by affected m6A in mRNA. Indeed, our finding that FTO can demethylate diverse mRNAs is consistent with this model. Direct characterization of m6A profiles in patients with FTO mutations will be useful to establish the mechanisms by which this mutation leads to disease.
In summary, our study demonstrates that m6A is a widespread modification found in a large fraction of cellular mRNA. The pervasive nature of this modification suggests that adenosine methylation has important roles in RNA biology. Much how cytosine methylation and hydroxymethylation in DNA are important epigenetic regulators of the genome, our data demonstrate that adenosine methylation in RNA is a reversible modification that is likely to influence a wide variety of biological pathways and physiological processes.
Total mouse brain RNA was subjected to RiboMinus treatment to reduce rRNA content as per the manufacturer’s instructions (Invitrogen). RNA was then fragmented to 100 nt-sized fragments using Illumina Fragmentation Buffer according to the manufacturer’s instructions, and subjected to two rounds of m6A immunoprecipitation. Sequencing libraries were prepared using the Illumina protocol for mRNA samples, and sequencing was performed on an Illumina GAIIx or an Illumina HiSeq2000 as indicated. Genomic alignment (mm9 from UCSC genome browser) was done using the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2010) at default settings. We analyzed only those reads which (1) uniquely mapped to the genome and (2) had a Phred quality score greater or equal to 20.
Additional methods are detailed in Extended Experimental Procedures.
m6A is a widespread RNA modification in many tissues with high levels in the brain
MeRIP-Seq identifies m6A in 7,913 genes encoding both coding and noncoding RNAs
m6A is enriched near stop codons and within 3' UTRs in both mouse and human mRNAs
The transcriptome-wide landscape of m6A provides important insights into m6A function
We thank Dr. Richard Roberts and New England Biolabs for generously providing the m6A (NEB) antibody. We also thank members of the Jaffrey and Mason laboratories for helpful comments and suggestions. This work was supported by NIH grants T32CA062948 (K.D.M.), 1F32MH095353-01 (K.D.M.), 1R01NS076465-01 (C.E.M.) and MH080420 (S.R.J.), NSF CAREER grant 1054964 (O.E.), and the Starr Cancer Consortium (I4-A411; C.E.M.).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
MeRIP-Seq data have been deposited in the GEO database under accession number GSE29714.