Nucleic Acids Res. 2011 January; 39(Database issue): D66–D69.
Published online 2010 November 11. doi:  10.1093/nar/gkq990
PMCID: PMC3013810

AREsite: a database for the comprehensive investigation of AU-rich elements

Abstract

AREsite is an online resource for the detailed investigation of AU-rich elements (AREs) in vertebrate mRNA 3′-untranslated regions (UTRs). AREs are among the most prominent cis-acting regulatory elements found in the 3′-UTRs of mRNAs. Various ARE-binding proteins that possess RNA stabilizing or destabilizing functions are recruited by sequence-specific motifs. Recent findings suggest an essential role for the structural mRNA context in which these sequence motifs are embedded. AREsite is the first database that allows users to quantify the structuredness of ARE motif sites in terms of opening energies and accessibility probabilities. Moreover, we provide a detailed phylogenetic analysis of ARE motifs and incorporate information about experimentally validated targets of the ARE-binding proteins TTP, HuR and Auf1. The database is publicly available at http://rna.tbi.univie.ac.at/AREsite.

INTRODUCTION

AU-rich elements (AREs) are distinct sequence elements in the 3′-untranslated region (UTR) of mRNAs, often consisting of one or several AUUUA pentamers located in an adenosine- and uridine-rich region (1). Numerous proteins interact directly with AREs, thereby modulating mRNA stability or translational efficiency. The importance of these sequence motifs has been highlighted recently by a multitude of studies showing that the loss of ARE-mediated mRNA control leads to severe pathologies, as AREs affect gene expression on a global scale (2–7).

AREs were studied bioinformatically early on (8), and the current estimate is that ~7% of human protein-coding genes contain AREs (9). However, the presence of an ARE consensus motif alone is not enough to qualify a gene as a true in vivo target of ARE-binding proteins. Recent computational and experimental evidence (10–13), together with the fact that ARE-targeting proteins bind to RNA in single-stranded conformation (14), emphasizes the need to analyze the structural context in which these motifs are embedded. Furthermore, the growing body of comparative genomics data can be harnessed to identify evolutionarily conserved motif sites. AREsite is the first database that combines sequence annotation of AREs with predictions of the accessibility and evolutionary conservation of the motif site. In addition to these features, we incorporated information from an extensive expert literature search and list experimentally validated targets of the ARE-binding proteins TTP, HuR and Auf1.

DATABASE GENERATION AND CONTENT

In its current version, AREsite uses Ensembl release 56 as its data source. For human and mouse, every protein-coding gene that has at least one transcript with a 3′-UTR sequence has been added to the collection. To account for the various definitions of AREs found in the literature, we decided not to restrict the database to a single motif but to offer the user the possibility to screen for a total of eight different consensus motifs, ranging from the plain AUUUA pentamer to the WWWWAUUUAWWWW 13-mer (W = A or U), which represents the core motif embedded in a stretch of A/U residues. By default, only the representative transcript of the selected gene, which we define as the transcript with the most AUUUA occurrences in its 3′-UTR sequence, is analyzed in detail. For each transcript we list sequence statistics and calculate the fold enrichment of each motif based on an order-0 and an order-1 Markov model. Besides plain sequence annotation of ARE motifs in transcripts, AREsite also allows the researcher to study sequence conservation of motifs at both the transcript and the genomic level. For each motif site we provide annotated alignments with highlighted conserved motifs and sequence logos (15). Finally, an overview figure in the form of a phylogenetic tree depicts the conservation pattern of all detected motif sites. Motif site accessibility, in terms of opening energies and probabilities of being unpaired, is calculated using RNAplfold (16,17). For each motif we present accessibility values for the core AUUUA pentamer. Furthermore, results are visualized in an interactive SVG plot that allows the user to explore different parameter settings (Figure 1).
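The motif screening and the order-0 fold enrichment described above can be illustrated with a minimal Python sketch. It is not the AREsite implementation: the function names are hypothetical, base frequencies are assumed to be estimated from the 3′-UTR itself, overlapping sites are assumed to be counted, and the order-1 model is omitted for brevity.

```python
import re
from collections import Counter

# IUPAC code W ("weak") stands for A or U; the other letters match themselves.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "W": "[AU]"}


def to_rna(seq):
    """Upper-case the sequence and convert DNA notation (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def motif_to_regex(motif):
    """Translate an IUPAC motif such as WWWWAUUUAWWWW into a regex.

    A lookahead is used so that overlapping sites are all reported.
    """
    body = "".join(IUPAC[ch] for ch in motif)
    return re.compile("(?=(" + body + "))")


def find_sites(utr, motif):
    """Return 0-based start positions of all (overlapping) motif sites."""
    return [m.start() for m in motif_to_regex(motif).finditer(to_rna(utr))]


def order0_expected(utr, motif):
    """Expected motif count under an order-0 model (independent bases).

    Assumption: base frequencies are estimated from the sequence itself.
    """
    rna = to_rna(utr)
    n = len(rna)
    freq = {base: count / n for base, count in Counter(rna).items()}
    p = 1.0
    for ch in motif:
        allowed = "AU" if ch == "W" else ch
        p *= sum(freq.get(base, 0.0) for base in allowed)
    return p * max(n - len(motif) + 1, 0)


def fold_enrichment(utr, motif):
    """Observed motif count divided by the order-0 expectation."""
    expected = order0_expected(utr, motif)
    observed = len(find_sites(utr, motif))
    return observed / expected if expected > 0 else float("nan")
```

For a toy sequence such as `GCATTTATTTATTTAGC`, `find_sites(seq, "AUUUA")` reports three overlapping pentamers, whereas the stricter `WUAUUUAUW` nonamer matches only once.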

Figure 1.
Screenshot of the interactive SVG plot showing an ARE motif site of the human TNF-alpha gene. TNF-alpha is one of the best characterized ARE-containing genes. Its ARE target site consists of several consecutive ATTTA (AUUUA) motifs which favors the site’s ...

For the three best-studied ARE-binding proteins, TTP, HuR and Auf1, the literature was screened for putative or confirmed mRNA targets. We classified the type of evidence for an mRNA being targeted by one of the three proteins according to five criteria: (i) direct binding of the protein to the mRNA or its 3′-UTR (e.g. shown by RNA immunoprecipitation or electrophoretic mobility shift assays); (ii) an independent reporter assay confirming the functionality of the putative binding site; (iii) the loss or overexpression of the ARE-binding protein affects the level of the target mRNA and/or (iv) the level of the protein it encodes; (v) the stability of the target mRNA is affected by the lack or excess of the ARE-binding protein, as shown by actinomycin D chase experiments or cell-free decay assays. New references will be added on a regular basis.
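One possible way to represent the five evidence criteria in code is as a set of flags attached to each protein-target pair. The sketch below is purely illustrative: the class names, field names and the example gene label are hypothetical and do not reflect the AREsite database schema.

```python
from dataclasses import dataclass
from enum import Flag


class Evidence(Flag):
    """The five evidence criteria (i)-(v) as combinable flags."""
    NONE = 0
    DIRECT_BINDING = 1    # (i) e.g. RNA immunoprecipitation, EMSA
    REPORTER_ASSAY = 2    # (ii) independent reporter assay
    MRNA_LEVEL = 4        # (iii) mRNA level changes on loss/overexpression
    PROTEIN_LEVEL = 8     # (iv) protein level changes on loss/overexpression
    MRNA_STABILITY = 16   # (v) actinomycin D chase or cell-free decay assay


@dataclass
class TargetRecord:
    """Evidence collected for one mRNA target of one ARE-binding protein."""
    gene: str      # hypothetical gene label
    protein: str   # "TTP", "HuR" or "Auf1"
    evidence: Evidence = Evidence.NONE

    def add(self, kind):
        """Record an additional type of evidence."""
        self.evidence |= kind

    def count(self):
        """Number of distinct evidence types recorded so far."""
        return bin(self.evidence.value).count("1")
```

A record for a made-up gene could then be built with `rec = TargetRecord("GENE_X", "HuR")` followed by `rec.add(Evidence.DIRECT_BINDING)`, and queried with `Evidence.DIRECT_BINDING in rec.evidence`.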

Figure 2 shows a typical output of an AREsite query. To store search results permanently, the user can download annotated GenBank files for each analyzed transcript.

Figure 2.
Snapshot of a typical AREsite results page (gene: human IL6). (A) Basic statistics about the selected gene. (B) Experimental evidence collected for this gene. For each of the ARE-binding proteins TTP, HuR and Auf1 we list the type of evidence. The user ...

Generation of alignments from transcripts

Alignments of orthologous transcripts were generated using data from the Ensembl gene orthology pipeline. For each gene in the database we first collected all orthologous genes from other species that have a strict one-to-one relation. Next, we screened for transcripts that have an annotated 3′-UTR, and among those we selected the one that showed the best coverage (at least 75%) of the reference species' 3′-UTR. Multiple-species whole-transcript alignments were then generated with CLUSTAL W. To investigate the sequence conservation of a motif site, we extracted the region containing the motif site plus five flanking nucleotides on each side from the alignments. Each sequence in the alignment was then searched with the corresponding consensus ARE motif. Finally, detected motifs were used as sequence anchors and the sequences were realigned using DIALIGN (18). The same procedure was also applied to the processed and filtered genomic alignments.
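The extraction of a motif site plus five flanking nucleotides from a gapped alignment could look roughly like the following Python sketch. It assumes the alignment is held as a dictionary of equal-length gapped strings and that the motif position is given in ungapped reference coordinates; the function names are hypothetical, and the subsequent DIALIGN realignment is not shown.

```python
def column_map(aligned_ref):
    """Map ungapped reference positions to alignment column indices."""
    return [col for col, ch in enumerate(aligned_ref) if ch != "-"]


def extract_motif_region(alignment, ref_id, site_start, motif_len, flank=5):
    """Slice the alignment columns spanning a motif site plus its flanks.

    alignment  : dict mapping sequence id -> gapped sequence (equal lengths)
    ref_id     : id of the reference species in the alignment
    site_start : 0-based motif start in the ungapped reference sequence
    motif_len  : length of the motif in nucleotides
    flank      : number of reference nucleotides kept on each side
    """
    cols = column_map(alignment[ref_id])
    # Clip the flanks at the ends of the reference sequence.
    first = max(site_start - flank, 0)
    last = min(site_start + motif_len + flank, len(cols)) - 1
    lo, hi = cols[first], cols[last] + 1
    return {name: seq[lo:hi] for name, seq in alignment.items()}


def has_motif(aligned_seq, motif="AUUUA"):
    """Check for a plain motif after removing alignment gaps."""
    return motif in aligned_seq.replace("-", "")
```

Working in reference coordinates and converting to columns only at the end keeps the flank size meaningful even when the reference row contains gaps inside the region.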

Generation of genomic alignments

Since comparative data at the level of transcripts are still limited, we decided to also incorporate data from genome-wide alignments to obtain a more refined picture of the conservation pattern of motifs. These data have to be interpreted with caution, however, since there is no guarantee that the aligned sequences from other species really belong to the gene of interest. We therefore apply filtering strategies that ensure that aligned sequences are homologous over a longer stretch of nucleotides than the motif site alone.

Genomic alignments in MAF format were obtained for each UTR sequence from the multiz-generated alignments available at the UCSC Genome Browser (19). For human, the corresponding alignments were extracted from the 46-species multiple alignments based on the human genome assembly hg19, and for mouse from the 30-species multiple alignments based on the assembly mm9. The obtained alignment blocks were often too short for any practical use, so we developed a MAF processing and filtering pipeline that first merges adjacent MAF blocks into longer ones and then returns alignment windows of 120 nt with a step size of 30 nt. Finally, these windowed alignments were realigned with CLUSTAL W and filtered to contain only sequences whose length is at least 50% of the length of the reference species' sequence.
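The windowing and length-filtering steps can be sketched in Python as follows. This is an illustration under several assumptions rather than the actual pipeline: merged blocks are assumed to be dictionaries of equal-length gapped strings, sequence length is measured without gaps, a trailing partial window is simply dropped, blocks shorter than one window are returned whole, and the CLUSTAL W realignment between the two steps is omitted. All function names are hypothetical.

```python
def sliding_windows(length, window=120, step=30):
    """Yield (start, end) column ranges of fixed size over an alignment.

    Assumption: a block shorter than one window is returned whole, and a
    trailing partial window is dropped.
    """
    if length <= window:
        yield (0, length)
        return
    for start in range(0, length - window + 1, step):
        yield (start, start + window)


def ungapped_len(seq):
    """Sequence length without alignment gap characters."""
    return len(seq) - seq.count("-")


def filter_by_length(block, ref_id, min_fraction=0.5):
    """Keep sequences at least min_fraction as long as the reference."""
    ref_len = ungapped_len(block[ref_id])
    return {
        name: seq
        for name, seq in block.items()
        if ungapped_len(seq) >= min_fraction * ref_len
    }


def windowed_blocks(alignment, ref_id, window=120, step=30):
    """Cut a merged alignment into windows and filter each window."""
    n_cols = len(alignment[ref_id])
    for start, end in sliding_windows(n_cols, window, step):
        block = {name: seq[start:end] for name, seq in alignment.items()}
        yield start, filter_by_length(block, ref_id)
```

With the default settings, a merged block of 200 columns yields windows starting at columns 0, 30 and 60.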

Quantifying motif site accessibility

To calculate motif site accessibility in terms of opening energies and probabilities of being unpaired, we used RNAplfold (16) with different parameter settings. RNAplfold is a thermodynamic RNA folding program that calculates local base-pairing probabilities as well as the probability that a stretch of u consecutive nucleotides is unpaired (17). These probabilities are directly related to the energy needed to open all secondary structures in the respective stretch of nucleotides. The parameter set W = 80, L = 40 (window size W, maximum base pair span L) models the effects of cotranscriptional folding and has previously been used to predict siRNA binding (20). AREsite also features a second parameter setting (W = 240, L = 120), which considers longer base pair spans and shows improved results on siRNA binding as well as on RNA–RNA interaction (H. Tafer, personal communication). For each detected motif site we list the accessibility values (u = 5) for the core AUUUA pentamer for both parameter settings (short range, mid range).
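The relation between the probability of being unpaired and the opening energy is the standard Boltzmann relation ΔG_open = −RT ln(P_unpaired). The Python sketch below converts between the two quantities; the temperature of 37 °C and the value of the gas constant are assumptions that are not stated in the text, and the function names are hypothetical. The probabilities themselves would come from a folding program, for instance an RNAplfold run with options along the lines of `-W 80 -L 40 -u 5`.

```python
import math

# Gas constant in kcal/(mol*K); assumed value, not taken from the paper.
GAS_CONSTANT = 1.98717e-3


def thermal_energy(temperature_c=37.0):
    """Return RT in kcal/mol for a temperature given in degrees Celsius."""
    return GAS_CONSTANT * (temperature_c + 273.15)


def opening_energy(p_unpaired, temperature_c=37.0):
    """Energy (kcal/mol) needed to make a stretch fully unpaired.

    Uses dG_open = -RT * ln(P_unpaired): the less likely a stretch is to
    be unpaired, the more energy is needed to open it.
    """
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -thermal_energy(temperature_c) * math.log(p_unpaired)


def unpaired_probability(energy, temperature_c=37.0):
    """Inverse relation: P_unpaired = exp(-dG_open / RT)."""
    return math.exp(-energy / thermal_energy(temperature_c))
```

A site that is always unpaired (P = 1) has an opening energy of zero, and every tenfold drop in probability adds about 1.4 kcal/mol at 37 °C.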

DISCUSSION

In this contribution we have introduced AREsite, a database for the detailed investigation of ARE motifs in terms of motif site accessibility and evolutionary conservation. In its current state AREsite reports 3275 human protein-coding genes that have at least one occurrence of the consensus motif WUAUUUAUW in their 3′-UTR sequences. This corresponds to ~16% of the human protein-coding genes. For 711 of those genes AREsite lists experimental evidence that they are targets of ARE-binding proteins. The requirements that a gene must meet to qualify as an in vivo target of ARE-binding proteins are still poorly understood. With its conservation pattern analysis and accessibility prediction, AREsite can help researchers to unravel the underlying mechanism. Recent studies (11,13) demonstrate the great value of combining computational accessibility prediction with wet-lab data. When interpreting accessibility predictions one has to keep in mind, however, that low accessibility does not necessarily exclude a gene from being an in vivo target. mRNA regulation is a complex system, and the binding of one factor might lead to structural rearrangements that make a formerly cryptic site accessible or vice versa (21). In the context of AREs, this concept has been demonstrated by using artificially designed mRNA openers and closers to control mRNA stability (22). The accurate modeling of these combinatorial effects will be among the most challenging issues for future work.

FUNDING

University of Vienna “Research platform: Structural and Functional Analysis of mRNA Molecules Targeted by the RNA-binding Protein Tristetraprolin” (to P.K. and I.L.H.). Funding for open access charge: University of Vienna.

Conflict of interest statement. None declared.

REFERENCES

1. Barreau C, Paillard L, Osborne HB. AU-rich elements and associated factors: are there unifying principles? Nucleic Acids Res. 2005;33:7138–7150.
2. Hao S, Baltimore D. The stability of mRNA influences the temporal order of the induction of genes encoding inflammatory molecules. Nat. Immunol. 2009;10:281–288.
3. Lu JY, Sadri N, Schneider RJ. Endotoxic shock in AUF1 knockout mice mediated by failure to degrade proinflammatory cytokine mRNAs. Genes Dev. 2006;20:3174–3184.
4. Ghosh M, Aguila HL, Michaud J, Ai Y, Wu MT, Hemmes A, Ristimaki A, Guo C, Furneaux H, Hla T. Essential role of the RNA-binding protein HuR in progenitor cell survival in mice. J. Clin. Invest. 2009;119:3530–3543.
5. Katsanou V, Milatos S, Yiakouvaki A, Sgantzis N, Kotsoni A, Alexiou M, Harokopos V, Aidinis V, Hemberger M, Kontoyiannis DL. The RNA-binding protein Elavl1/HuR is essential for placental branching morphogenesis and embryonic development. Mol. Cell. Biol. 2009;29:2762–2776.
6. Taylor GA, Carballo E, Lee DM, Lai WS, Thompson MJ, Patel DD, Schenkman DI, Gilkeson GS, Broxmeyer HE, Haynes BF, et al. A pathogenetic role for TNF alpha in the syndrome of cachexia, arthritis and autoimmunity resulting from tristetraprolin (TTP) deficiency. Immunity. 1996;4:445–454.
7. Hodson DJ, Janas ML, Galloway A, Bell SE, Andrews S, Li CM, Pannell R, Siebel CW, MacDonald HR, De Keersmaecker K, et al. Deletion of the RNA-binding proteins ZFP36L1 and ZFP36L2 leads to perturbed thymic development and T lymphoblastic leukemia. Nat. Immunol. 2010;11:717–724.
8. Bakheet T, Frevel M, Williams BR, Greer W, Khabar KS. ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res. 2001;29:246–254.
9. Halees AS, El-Badrawi R, Khabar KS. ARED organism: expansion of ARED reveals AU-rich element cluster variations between human and mouse. Nucleic Acids Res. 2008;36(Database issue):137–140.
10. Hackermüller J, Meisner NC, Auer M, Jaritz M, Stadler PF. The effect of RNA secondary structures on RNA-ligand binding and the modifier RNA mechanism: a quantitative model. Gene. 2005;345:3–12.
11. Li X, Quon G, Lipshitz HD, Morris Q. Predicting in vivo binding sites of RNA-binding proteins using mRNA secondary structure. RNA. 2010;16:1096–1107.
12. Rabani M, Kertesz M, Segal E. Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes. Proc. Natl Acad. Sci. USA. 2008;105:14885–14890.
13. Kazan H, Ray D, Chan ET, Hughes TR, Morris Q. RNAcontext: a new method for learning the sequence and structure binding preferences of RNA-binding proteins. PLoS Comput. Biol. 2010;6:e1000832.
14. Hudson BP, Martinez-Yamout MA, Dyson HJ, Wright PE. Recognition of the mRNA AU-rich element by the zinc finger domain of TIS11d. Nat. Struct. Mol. Biol. 2004;11:257–264.
15. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190.
16. Bernhart SH, Hofacker IL, Stadler PF. Local RNA base pairing probabilities in large sequences. Bioinformatics. 2006;22:614–615.
17. Bompfünewerer AF, Backofen R, Bernhart SH, Hertel J, Hofacker IL, Stadler PF, Will S. Variations on RNA folding and alignment: lessons from Benasque. J. Math. Biol. 2008;56:129–144.
18. Morgenstern B. DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics. 1999;15:211–218.
19. Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ, et al. The UCSC Genome Browser database: update 2010. Nucleic Acids Res. 2010;38(Database issue):613–619.
20. Tafer H, Ameres SL, Obernosterer G, Gebeshuber CA, Schroeder R, Martinez J, Hofacker IL. The impact of target site accessibility on the design of effective siRNAs. Nat. Biotechnol. 2008;26:578–583.
21. Kedde M, van Kouwenhove M, Zwart W, Oude Vrielink JA, Elkon R, Agami R. A Pumilio-induced RNA structure switch in p27-3′ UTR controls miR-221 and miR-222 accessibility. Nat. Cell Biol. 2010;12:1014–1020.
22. Meisner NC, Hackermüller J, Uhl V, Aszódi A, Jaritz M, Auer M. mRNA openers and closers: modulating AU-rich element-controlled mRNA stability by a molecular switch in mRNA secondary structure. Chembiochem. 2004;5:1432–1447.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press