PLoS One. 2010; 5(4): e10316.
Published online 2010 April 23. doi: 10.1371/journal.pone.0010316
PMCID: PMC2859052

Mining Mammalian Transcript Data for Functional Long Non-Coding RNAs

Cathal Seoighe, Editor

Abstract

Background

The role of long non-coding RNAs (lncRNAs) in controlling gene expression has attracted increasing interest in recent years. Sequencing projects such as Fantom3 for mouse and H-InvDB for human have generated abundant data on the transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of ‘transcriptional noise’. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high-throughput manner.

Principal Findings

We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse or originate from the same (orthologous) protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequence in distant mammals such as dog and cow; (iii) significant sequence conservation relative to their flanking regions (in 50% of cases the flanking regions have no detectable homology at all, and in the remainder they are significantly less conserved); (iv) existence mostly as single-exon forms (8/11); and (v) conserved and stable secondary structure motifs. We further identified orthologous protein-coding genes that contribute to the pool of lncRNAs, among which genes implicated in carcinogenesis are significantly over-represented.

Conclusion

Our comparative mammalian genomics approach coupled with evolutionary analysis identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that amongst the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms.

Introduction

With the rapid development of high-throughput sequencing methods, the mammalian transcriptome can now be described in great detail [1], [2], [3]. The mammalian transcriptome is not only vast (comprising millions of RNA transcripts) [1] but also unexpectedly diverse. For example, transcript lengths range from 18 nucleotides (small interfering RNAs) to more than 15,000 nucleotides (in the case of macroRNAs or long non-protein-coding RNAs). Some protein-coding genes not only encode proteins but also contribute to the non-protein-coding RNA pool [4]. Note, however, that a significant proportion of the mammalian transcriptome could simply be ‘transcriptional noise’ [5], [6], [7], [8]. A wealth of data is now available for the two most studied mammalian genomes (human and mouse), and the chief challenge is to mine these data effectively for functionally relevant sequences. In this study, we mined the full-length mammalian transcript (cDNA) data sets from the H-Invitational [3] and Fantom3 [2] projects to identify potentially functional long non-protein-coding RNAs (lncRNAs). Our rationale was that lncRNAs (≥200 nucleotides) that are expressed in human and mouse, are preserved in distant relatives, and show primary sequence and secondary structure conservation are likely to be functional. We also asked whether lncRNAs are transcribed from orthologous protein-coding genes, and if so, from which ones. A positive finding would indicate a conserved role for such protein-coding genes in producing non-coding RNAs, and would also suggest probable functional categories for the lncRNAs.

Previously, we developed a computational pipeline to annotate ‘transcribed pseudogenes’ (tψg), a class of long non-protein-coding RNAs that are homologous to protein-coding gene transcripts but harbor features indicating a lack of protein-coding ability [9]. We discovered thousands of transcribed pseudogene annotations in the human genome and filtered the list to identify potentially functional cases. In this paper, in a complementary analysis, we identify conserved non-tψg members of the long non-protein-coding RNA category.

Long non-protein-coding RNAs (also termed ‘messenger-like’ or ‘messenger-RNA-like’ non-coding RNAs) usually bear features of mRNAs, viz., 5′ capping, splicing and polyadenylation, yet they do not code for any protein. Although some well-characterized cases lack sequence conservation, which may indicate lineage-specific adaptive evolution [5], [8], recent experimental work using chromatin immunoprecipitation and massively parallel sequencing (ChIP-Seq) identified more than 1500 ‘large intervening ncRNAs’ that carry signatures of evolutionary conservation [10], thus challenging the prevailing notion that lncRNAs are generally not evolutionarily conserved.

Well-known examples of functional long non-protein-coding RNAs include Xist and H19. Xist mediates X chromosome silencing as part of X-chromosome dosage compensation during development [11], [12]. H19 regulates the expression of its neighboring gene Igf2 during embryogenesis and may act as a tumour suppressor [13], [14], [15]. Recently, conserved long non-protein-coding RNAs have been identified by comparative genomics [16], but the authors either ignored regions that overlap protein-coding genes or, in the absence of full-length non-protein-coding transcripts, used shorter human transcripts (EST sequences) as a proxy for transcription. Non-protein-coding sequences may arise in part from protein-coding genes, for example as non-protein-coding transcripts that comprise only UTR regions or that include retained introns. We propose that such cases should be included in the category of long non-protein-coding RNAs, and note that some cases cannot be clearly classified as either alternative splicing or partially overlapping lncRNAs. Another parameter we considered essential was the length of potential lncRNA transcripts. In the present analysis, we used a lower bound of 200 nucleotides for the operational definition of lncRNAs, as in earlier work [8], [17], [18]. This cut-off was chosen because it is a practical threshold used in RNA purification steps to exclude small RNAs.

Results and Discussion

Identification of conserved and expressed lncRNAs

The H-Inv and Fantom3 projects, covering the human and mouse genomes respectively, have generated thousands of sequence reads constituting the expressed complements of these genomes [1], [2], [3]. Expression alone, however, does not necessarily indicate functionality; many of these transcripts may simply be ‘transcriptional noise’ [5], [6]. Expressed elements that are syntenically conserved in phylogenetically divergent mammals are more likely to be functional across Mammalia. Although many transcripts could be degradation products of UTRs or incompletely processed hnRNA fragments [7], natural selection would be expected to preserve biologically relevant genomic elements over millions of years of evolution. We therefore developed a pipeline to identify potentially functional lncRNA candidates (fig. 1). We defined putative lncRNAs as full-length transcripts of ≥200 nucleotides that neither (i) consist exclusively of known protein-coding exons nor (ii) contain UTR plus protein-coding exons. We examined these transcripts for syntenic conservation between the human and mouse genomes (see Methods for details). We were also interested in identifying lncRNAs that originate from orthologous genes, since such genes may give hints to lncRNA function. We found 78 lncRNAs that are syntenically conserved or originate from orthologous genes (Table 1), some of which have detectable sequence similarity (Table 2). As a sanity check, the list should recover previously characterized functional lncRNAs, and indeed it contains two well-documented examples, H19 [13], [14] and Xist [11], [12]. We also looked for lncRNA candidates that could have arisen from internal priming, as described by Nordstrom et al. [7] and Nam et al. [19]. For this, the 50 bp genomic region downstream of each putative human lncRNA was examined for the presence of a poly(A)-rich region.
We found that only 3 out of the 78 putative lncRNAs may have arisen due to internal priming, thus indicating that the majority of the identified lncRNAs in this study are likely to be genuine candidates.
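The internal-priming screen can be sketched as follows. The text specifies only a 50 bp downstream window and a ‘poly(A)-rich region’; the thresholds used here (a run of at least 6 A's, or at least 7 A's in any 10-nt window) and the function names are hypothetical choices for illustration, not the authors' criteria.

```python
# Hypothetical sketch of an internal-priming screen; thresholds are assumptions.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def downstream_window(genome: str, start: int, end: int, strand: str,
                      length: int = 50) -> str:
    """Return the `length` nt immediately downstream of a transcript, read in
    the transcript's orientation. Coordinates are 0-based, half-open."""
    if strand == "+":
        return genome[end:end + length].upper()
    # On the minus strand, "downstream" lies at lower genomic coordinates.
    return reverse_complement(genome[max(0, start - length):start]).upper()


def is_poly_a_rich(window: str, min_run: int = 6, win: int = 10,
                   min_a: int = 7) -> bool:
    """Flag a window containing an A-run of length `min_run`, or any
    `win`-nt sub-window with at least `min_a` adenines (assumed thresholds)."""
    if "A" * min_run in window:
        return True
    return any(window[i:i + win].count("A") >= min_a
               for i in range(len(window) - win + 1))
```

A transcript would be flagged as a possible internal-priming artefact when `is_poly_a_rich(downstream_window(...))` returns true; handling the minus strand explicitly matters because a genomic poly(T) stretch upstream of a minus-strand transcript is what an oligo(dT) primer would see.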

Figure 1
A schematic representation of the discovery pipeline for conserved expressed long non-protein-coding RNAs (lncRNAs).
Table 1
General statistics for the 78 conserved lncRNAs.
Table 2
A summary of the analysis results for preservation, sequence conservation and occurrence of secondary structure motifs in mouse lncRNAs that have orthologous human counterparts (with BLAST homology).

Origin of lncRNAs from various genomic positions

Next, we analyzed which genomic segments give rise to these lncRNAs. Most of the shortlisted lncRNAs (70 of 78) are derived from protein-coding genes (including their introns) or lie directly beside them (<1000 nt away). This suggests that these lncRNAs may rely on the same promoter regions as the nearby protein-coding genes for their transcription. Interestingly, 18 lncRNAs are expressed exclusively from UTRs and 19 originate from introns, while the others arise from a combination of different categories of genomic DNA, as illustrated in fig. 2. The UTR-derived lncRNAs might have a regulatory role akin to that of specific UTRs acting as riboswitches [20].
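One way to assign lncRNAs to the origin categories of fig. 2 is interval overlap against a gene model. The sketch below assumes a simplified, hypothetical gene model (sorted exons plus a single CDS span) and uses the 1000-nt distance cut-off quoted above; `GeneModel` and `classify_origin` are illustrative names, not the authors' pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical minimal gene model: sorted exons and one CDS span,
    all 0-based half-open genomic intervals on the same chromosome."""
    exons: list[tuple[int, int]]
    cds: tuple[int, int]


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify_origin(tx: tuple[int, int], gene: GeneModel,
                    flank: int = 1000) -> set[str]:
    """Label the gene regions a transcript overlaps: 'UTR', 'coding_exon',
    'intron'; otherwise 'flanking' (< flank nt away) or 'intergenic'."""
    labels: set[str] = set()
    for ex in gene.exons:
        ex_ov = overlap(tx, ex)
        if ex_ov == 0:
            continue
        coding = (max(ex[0], gene.cds[0]), min(ex[1], gene.cds[1]))
        cod_ov = overlap(tx, coding) if coding[0] < coding[1] else 0
        if cod_ov:
            labels.add("coding_exon")
        if ex_ov > cod_ov:  # part of the exonic overlap lies outside the CDS
            labels.add("UTR")
    introns = [(a[1], b[0]) for a, b in zip(gene.exons, gene.exons[1:])]
    if any(overlap(tx, intron) for intron in introns):
        labels.add("intron")
    if not labels:
        gene_start, gene_end = gene.exons[0][0], gene.exons[-1][1]
        distance = max(gene_start - tx[1], tx[0] - gene_end, 0)
        labels.add("flanking" if distance < flank else "intergenic")
    return labels
```

A transcript labelled only `{"UTR"}` would fall in the UTR-exclusive group, `{"intron"}` in the intronic group, and any multi-label result in the mixed categories.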

Figure 2
A schematic representation of the different genomic regions from which lncRNAs originate, relative to the structure of a protein-coding gene.

A significant proportion of putative functional lncRNAs originate from cancer-related genes

We found that ~35% (20/57) of the protein-coding genes that overlap the identified lncRNAs are implicated in disease, particularly cancer (Table 3). To test whether such genes are enriched, we counted how many lncRNA-producing genes in our list appear in the ‘CGMIM’ database [21], which lists all OMIM gene entries that refer to some type of cancer. About 18% (10/57) of the protein-coding genes that produce lncRNAs have a reference to cancer (see Table 3), compared with only 9% (2147/23621) of all human protein-coding genes. This difference is statistically significant (chi-square test, P = 0.047; hypergeometric probability, P = 0.018), suggesting that genes implicated in cancer have a higher tendency to produce lncRNAs. Altered expression and splicing of ncRNAs have previously been reported in cancer cells [22], [23]. We therefore suggest that the identified lncRNAs could have roles in oncogenesis, although we cannot establish here whether there is a ‘cause-and-effect’ relationship.
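The enrichment test can be outlined from the counts given above (10 of 57 lncRNA-producing genes versus 2147 of 23621 genes genome-wide). The text does not state which chi-square variant was used or whether the hypergeometric figure is a point probability or an upper tail, so the sketch below computes a Yates-corrected goodness-of-fit chi-square together with both P(X = 10) and P(X ≥ 10); these choices are assumptions.

```python
import math


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when drawing n genes from N, of which K are cancer-related."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """One-sided enrichment probability, P(X >= k)."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))


def chi2_gof_yates(observed: int, n: int, p: float) -> tuple[float, float]:
    """1-df goodness-of-fit chi-square with Yates' continuity correction,
    comparing `observed` hits in n draws with a background proportion p."""
    expected_hit = n * p
    expected_miss = n - expected_hit
    d = max(0.0, abs(observed - expected_hit) - 0.5)
    stat = d * d / expected_hit + d * d / expected_miss
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


N, K, n, k = 23621, 2147, 57, 10
point = hypergeom_pmf(k, N, K, n)
tail = hypergeom_upper_tail(k, N, K, n)
stat, p_chi = chi2_gof_yates(k, n, K / N)
print(f"P(X = {k}) = {point:.4f}; P(X >= {k}) = {tail:.4f}; "
      f"Yates chi-square = {stat:.2f}, P = {p_chi:.4f}")
```

With these counts the Yates-corrected P lands close to the reported 0.047, and the point probability P(X = 10) lands close to the reported 0.018; the one-sided tail P(X ≥ 10), which is what an enrichment test conventionally reports, comes out somewhat larger but still below 0.05.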

Table 3
List of lncRNAs associated with known genes implicated in cancer pathogenesis.

Putative functional lncRNAs typically bear a single non-coding exon

We performed an intron/exon analysis on the identified set of putative functional lncRNAs to study the contribution of splicing to their generation, and thereby to assess a possible relationship between lncRNA splicing and function. The vast majority (~83%, 65 of 78) of these lncRNAs contain just a single exon. This suggests that functional lncRNAs tend to have a single exon, which may (speculatively) reflect avoidance of unnecessarily complex involvement of the splicing machinery in lncRNA generation.
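Single- versus multi-exon status can be read off the spliced genome alignments. The sketch below assumes that the aligned genomic blocks of each transcript are already available (for example, parsed from GMAP output; the parsing is not shown) and uses a hypothetical 20-nt minimum intron length to distinguish introns from small alignment gaps.

```python
def count_exons(blocks: list[tuple[int, int]], min_intron: int = 20) -> int:
    """Count exons from the sorted genomic blocks of a spliced cDNA alignment.
    Gaps shorter than `min_intron` (an assumed cut-off) are treated as
    alignment indels rather than introns."""
    if not blocks:
        return 0
    exons = 1
    for (_, prev_end), (start, _) in zip(blocks, blocks[1:]):
        if start - prev_end >= min_intron:
            exons += 1
    return exons


def single_exon_fraction(alignments: list[list[tuple[int, int]]]) -> float:
    """Fraction of transcripts whose alignment resolves to exactly one exon."""
    counts = [count_exons(blocks) for blocks in alignments]
    return sum(c == 1 for c in counts) / len(counts) if counts else 0.0
```

Applied to all 78 candidates, `single_exon_fraction` would give the proportion quoted above; the choice of `min_intron` only matters for alignments with short internal gaps.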

Examples of potentially functional conserved lncRNAs include those that overlap Dicer and U2AF2. Dicer is an endoribonuclease that cleaves double-stranded RNAs into shorter double-stranded segments called small interfering RNAs (siRNAs) [24], [25], [26]. The U2AF2 gene encodes the U2 snRNP auxiliary factor, which participates in spliceosome assembly by binding to polypyrimidine tracts [27].

Role of some lncRNAs in post-transcriptional regulation

Long non-protein-coding RNAs are known to participate in the post-transcriptional regulation of target genes [8]. We found two lncRNAs (HIT000079026.8 and HIT000091723.8) that are transcribed antisense to the UTR of a protein-coding gene (in both cases, no other protein-coding exons overlap on the opposite strand in these genomic regions). These lncRNAs could therefore act as negative regulators of gene expression by binding complementarily to the UTRs of the target mRNAs (fig. 3). A notable example is an lncRNA associated with the ST7 gene. Functional analyses have shown that ST7, a tumor-suppressor gene, plays a role in the development of certain cancer types [28]; it is therefore possible that this lncRNA is also involved in carcinogenesis. Based on these findings, we propose a general model of negative-feedback post-transcriptional regulation (fig. 3) in which UTR-derived antisense lncRNAs hybridize to their parent mRNAs. This model, however, requires experimental validation.
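The proposed mechanism depends on base pairing between an antisense lncRNA and the UTR of its parent mRNA. A minimal way to gauge the potential for such pairing is to find the longest perfectly complementary stretch between the two sequences; the sketch below does this with a longest-common-substring search against the lncRNA's reverse complement. Function names are hypothetical, and G·U wobble pairs and bulges, which real RNA duplexes tolerate, are deliberately ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA (cDNA) sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming,
    O(len(a) * len(b)) time, O(len(b)) memory)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_antisense_duplex(lncrna: str, utr: str) -> int:
    """Longest stretch of perfect Watson-Crick pairing between an antisense
    lncRNA and a sense UTR, both given 5'->3' as DNA (cDNA) sequences."""
    return longest_common_substring(reverse_complement(lncrna), utr.upper())
```

A long perfectly paired stretch is consistent with, but does not demonstrate, the hybridization model in fig. 3; a thermodynamic duplex predictor would be the natural next step.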

Figure 3
A model for antisense regulation of target mRNA transcripts by lncRNAs.

Evidence for selection on the identified putative functional lncRNAs

We looked for signatures of selection in orthologous lncRNAs that have detectable (significant) similarity to each other. Because mouse and human lncRNAs do not completely overlap even when they share significant homology, we used the mouse lncRNAs as reference sequences and deduced the orthologous human counterparts by BLASTing [29] the mouse lncRNAs against the human genome. We then compared the sequence identity of these orthologous lncRNAs with that of their flanking regions. Intergenic buffer regions flanking each mouse lncRNA, of the same length as the lncRNA, were selected and searched for similar counterparts at nearby syntenic locations in other mammals. These were then aligned using a global alignment algorithm [30]. The results (Table 2) show that, for many lncRNAs, the flanking regions are either not conserved at all or are significantly less conserved than the lncRNAs themselves. This indicates that the identified lncRNAs are under selection, further supporting their potential functionality.
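The flanking-region comparison rests on global alignment [30]. Below is a compact Needleman–Wunsch sketch with percent identity; the scoring scheme (match +1, mismatch −1, linear gap −2) and the identity definition (identical columns over alignment length) are assumptions, since the text does not give the parameters used.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global alignment with linear gap penalties (scores are assumptions).
    Returns the two aligned strings and the optimal score."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, ungapped columns as a percentage of alignment length."""
    if not aln_a:
        return 0.0
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)
```

Computing `percent_identity` for an lncRNA ortholog pair and for its equal-length flanking buffer regions gives the kind of side-by-side comparison summarized in Table 2.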

Secondary structure analysis

We then investigated whether any of the long non-coding RNAs (≥200 nt) harbor thermodynamically stable and conserved secondary-structure motifs, a finding that could lend support to a functional role. For this, we used the program RNAz [31] to examine orthologous sequences for conserved, stable secondary structure motifs. RNAz calculates an ‘RNA class probability’ (P) from a structure conservation index and a thermodynamic stability score; alignments with P > 0.5 are classified as functional RNA. About 45% of the lncRNA orthologs with detectable homology (5 of 11) have conserved and stable secondary structure motifs (P > 0.5). This further strengthens the case that these lncRNAs could represent biologically relevant sequences.
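The two features that RNAz's classifier combines have simple definitions. The sketch below computes them from minimum free energies (MFEs, in kcal/mol) that would in practice come from RNAfold/RNAalifold; any input numbers are placeholders. It does not reproduce RNAz's support-vector-machine step that turns these features into the RNA class probability P, and RNAz itself estimates the shuffled-sequence mean and standard deviation by regression rather than by explicit shuffling, so this only illustrates the inputs.

```python
from statistics import mean, stdev


def structure_conservation_index(consensus_mfe: float,
                                 single_mfes: list[float]) -> float:
    """SCI = consensus (alignment) MFE / mean MFE of the individual sequences.
    Values near 1 indicate a well-conserved structure, near 0 none."""
    avg = mean(single_mfes)
    return consensus_mfe / avg if avg != 0 else 0.0


def mfe_z_score(native_mfe: float, shuffled_mfes: list[float]) -> float:
    """Stability z-score of a sequence's MFE relative to the MFEs of shuffled
    sequences; negative values mean more stable than expected by chance."""
    return (native_mfe - mean(shuffled_mfes)) / stdev(shuffled_mfes)
```

A high SCI combined with a strongly negative mean z-score is the signature that pushes an alignment towards P > 0.5.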

Genomic conservation in other mammals

Expression per se does not indicate functionality. The presence of lncRNA sequences in distantly related mammals (non-coding RNA orthologs) indicates evolutionary pressure for their preservation, and hence possible functionality. Of the 11 lncRNAs with detectable homology, 9 are conserved in human, mouse, dog and cow, one is preserved in human, mouse and dog, and the remaining one only in human and mouse. This indicates that the identified lncRNAs have been conserved across mammalian speciation.

Evolutionary analysis of codon substitution rates

A measure of selection pressure on the protein-coding ability of genes is the ratio of non-synonymous to synonymous substitution rates (Ka/Ks). Values well below 1.0 (Ka/Ks ≪ 1) indicate purifying selection, whereas neutral evolution theoretically yields a value of ~1.0. We compared the Ka/Ks values for the above 11 lncRNA ortholog pairs (Ka/Ks(lncRNA-ortho)) with the corresponding Ka/Ks values for their parent or nearby genes (Ka/Ks(parent-ortho)) (fig. 4). The lncRNA Ka/Ks values were calculated for the longest ORFs from each lncRNA. Only 19 of the 66 longest ORFs obtained by six-frame conceptual translation had significant similarity to their human counterparts. Although we considered the best-case similarity between any two conceptually translated long open reading frames (see Materials and Methods), the codon substitution patterns do not support the hypothesis of protein-coding ability: the Ka/Ks ratios for these alignments mostly lie in the range 0.5–1.5.
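To make the statistic concrete, the sketch below computes a simplified Nei–Gojobori-style Ka/Ks with Jukes–Cantor correction for two aligned, in-frame coding sequences. It is not what PAL2NAL/PAML compute (the values reported here come from PAML, whose estimation method differs), and it simplifies further by skipping codons that differ at more than one position instead of averaging over mutational pathways.

```python
import math
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Each of the 9 single-base changes that is synonymous contributes 1/3;
    changes to a stop codon count as non-synonymous."""
    sites = 0.0
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; saturates (inf) at p >= 0.75."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> float:
    """Simplified Ka/Ks for two aligned, in-frame coding sequences.
    Returns nan when undefined and inf when Ks is zero but Ka is not."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        # Skip gapped/ambiguous codons (absent from CODE) and stop codons.
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        n_diff = sum(x != y for x, y in zip(c1, c2))
        if n_diff > 1:
            continue  # the full method averages over mutational pathways
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    if syn_sites == 0 or nonsyn_sites == 0:
        return math.nan
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if ks == 0:
        return math.nan if ka == 0 else math.inf
    return ka / ks
```

For instance, `ka_ks("CTGAAAGGG", "CTAAAAGGG")` differs only by a synonymous Leu codon change and therefore returns 0.0, the purifying-selection extreme; ratios near 1 are what the lncRNA ORF alignments above mostly show.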

Figure 4
Assessment of protein-coding ability.

Conclusion

In this comparative study, we mined publicly available (experimental) data sets of mammalian full-length cDNAs for evolutionarily conserved lncRNAs. These represent novel genomic elements of likely functional relevance. Of course, it cannot be ruled out that some of these apparent lncRNAs are conserved because they produce functional short peptides, as was recently described for two mRNAs in Drosophila [32]. Because a substantial number of lncRNAs arise from protein-coding regions, it is conceivable that they play functional roles complementary to those of the parent protein-coding genes. In this vein, we found that cancer-related genes are over-represented among the protein-coding genes that contribute to the pool of lncRNAs, which suggests that lncRNAs may play an important role in cancer pathomechanisms.

Materials and Methods

Collection of data

Full-length cDNA datasets for human and mouse were obtained from the H-InvDB (www.h-invitational.jp/) and Fantom3 (http://fantom3.gsc.riken.jp/) databases, respectively. Complete mammalian genome sequences were obtained from http://www.ensembl.org (Ensembl release 47 for the human genome; Ensembl release 48 for the other mammals, namely rhesus monkey, mouse, rat, cow and dog). Only full-length cDNAs of ≥200 nucleotides were considered for further analysis, as small RNAs were not the focus of this study. To identify the genomic locations of transcripts, cDNAs were mapped onto the respective genome using GMAP [33], requiring ≥99% sequence identity and ≥99% sequence coverage.
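The inclusion criteria in this paragraph reduce to a simple filter. In the sketch, `MappedCdna` and `keep_for_analysis` are hypothetical names, and the record fields are assumed to have been extracted from the aligner's output beforehand (not shown); only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class MappedCdna:
    """Hypothetical summary of one cDNA-to-genome alignment."""
    cdna_id: str
    length: int       # transcript length in nucleotides
    identity: float   # percent identity of the alignment
    coverage: float   # percent of the cDNA covered by the alignment


def keep_for_analysis(rec: MappedCdna, min_length: int = 200,
                      min_identity: float = 99.0,
                      min_coverage: float = 99.0) -> bool:
    """Apply the length (>= 200 nt) and mapping (>= 99% identity and
    >= 99% coverage) thresholds stated in the text."""
    return (rec.length >= min_length
            and rec.identity >= min_identity
            and rec.coverage >= min_coverage)
```

Usage would be a list comprehension such as `[r for r in records if keep_for_analysis(r)]` over all mapped cDNAs of one species.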

Identification of orthologous lncRNAs in various sequenced mammalian genomes

Orthologous counterparts of mouse lncRNAs were detected by the presence of a similar sequence at the syntenic position in the other mammalian genome. On this criterion, the target genome was searched at the positions indicated by the synteny maps to locate orthologous lncRNAs. The following mammals were included in the analysis: human, monkey, mouse, rat, cow and dog. Pairwise synteny map data for these mammals were obtained from http://genome.ucsc.edu/. For a schematic representation of the discovery pipeline for putative functional ncRNAs, see fig. 1.
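Locating the syntenic position can be sketched as coordinate projection through pairwise synteny blocks. The `SyntenyBlock` record and `project_to_target` function are hypothetical, and linear interpolation within a block is a deliberate simplification (real synteny and chain data contain gaps that a proper liftover would respect). The projected window would then be searched for sequence similarity to the mouse lncRNA.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SyntenyBlock:
    """Hypothetical pairwise synteny block (0-based, half-open intervals)."""
    query_chrom: str
    query_start: int
    query_end: int
    target_chrom: str
    target_start: int
    target_end: int
    strand: str  # "+" if collinear in the target genome, "-" if inverted


def project_to_target(chrom: str, start: int, end: int,
                      blocks: list[SyntenyBlock]) -> tuple[str, int, int] | None:
    """Map a query interval into the target genome by linear interpolation
    within the synteny block that fully contains it; None if no block does."""
    for b in blocks:
        if b.query_chrom != chrom or not (b.query_start <= start and end <= b.query_end):
            continue
        scale = (b.target_end - b.target_start) / (b.query_end - b.query_start)
        offset_start = (start - b.query_start) * scale
        offset_end = (end - b.query_start) * scale
        if b.strand == "+":
            lo, hi = b.target_start + offset_start, b.target_start + offset_end
        else:  # inverted block: coordinates run backwards in the target
            lo, hi = b.target_end - offset_end, b.target_end - offset_start
        return b.target_chrom, round(lo), round(hi)
    return None
```

Returning `None` for intervals that straddle a block boundary keeps the sketch conservative; a real pipeline would probably pad the projected window before searching it.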

Ka/Ks calculation

Although orthologous lncRNAs from mouse and human show significant similarity, they do not completely overlap. Hence, we deduced the orthologous human lncRNA counterparts by BLASTing [29] mouse lncRNAs against the human genome. Next, putative lncRNA sequences were conceptually translated in all six frames, and the longest ORF in each frame was identified. These long ORFs were then aligned pairwise with the BlastP program of the BLAST package [29] to assess possible homology at the protein sequence level. Those showing significant pairwise BlastP homology were short-listed and used to calculate Ka/Ks values with the PAL2NAL web server (www.bork.embl.de/pal2nal/), which integrates the PAL2NAL tool [34] and the PAML 4 software package [35].
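The six-frame step can be sketched as below. Here an ‘ORF’ is taken to be the longest stretch between stop codons in each frame, which is one common convention; the text does not say whether ORFs were required to start with ATG, so that choice, like the function names, is an assumption. The resulting peptides would then be compared pairwise with BlastP.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orfs_six_frames(seq: str) -> list[str]:
    """Longest stop-free peptide in each of the six reading frames
    (frames +1, +2, +3, then -1, -2, -3)."""
    seq = seq.upper()
    peptides = []
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            peptides.append(max(protein.split("*"), key=len))
    return peptides
```

Six peptides per lncRNA, across the 11 homologous pairs, matches the 66 candidate ORFs mentioned in the Results.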

Secondary structure prediction

RNAz predicts structurally conserved and thermodynamically stable secondary structures (http://rna.tbi.univie.ac.at/cgi-bin/RNAz.cgi). We used the RNAz program with default parameters to check for conserved secondary structure motifs in the set of human-mouse lncRNA orthologs.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: A.N.K. and P.M.H. acknowledge funding support from the Natural Sciences and Engineering Research Council of Canada (NSERC) and from Les Fonds Québécois de la Recherche sur la Nature et les Technologies (FQRNT). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002;420:563–573.
2. Maeda N, Kasukawa T, Oyama R, Gough J, Frith M, et al. Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. PLoS Genet. 2006;2:e62.
3. Yamasaki C, Murakami K, Fujii Y, Sato Y, Harada E, et al. The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts. Nucleic Acids Res. 2008;36:D793–799.
4. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature. 2009;457:1028–1032.
5. Brosius J. Waste not, want not–transcript excess in multicellular eukaryotes. Trends Genet. 2005;21:287–288.
6. Ponjavic J, Ponting CP, Lunter G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 2007;17:556–565.
7. Nordstrom KJ, Mirza MA, Almen MS, Gloriam DE, Fredriksson R, et al. Critical evaluation of the FANTOM3 non-coding RNA transcripts. Genomics. 2009;94:169–176.
8. Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136:629–641.
9. Khachane AN, Harrison PM. Assessing the genomic evidence for conserved transcribed pseudogenes under selection. BMC Genomics. 2009;10:435.
10. Guttman M, Amit I, Garber M, French C, Lin MF, et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458:223–227.
11. Kay GF, Barton SC, Surani MA, Rastan S. Imprinting and X chromosome counting mechanisms determine Xist expression in early mouse development. Cell. 1994;77:639–650.
12. Duret L, Chureau C, Samain S, Weissenbach J, Avner P. The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science. 2006;312:1653–1655.
13. Li YM, Franklin G, Cui HM, Svensson K, He XB, et al. The H19 transcript is associated with polysomes and may regulate IGF2 expression in trans. J Biol Chem. 1998;273:28247–28252.
14. Gabory A, Ripoche MA, Yoshimizu T, Dandolo L. The H19 gene: regulation and function of a non-coding RNA. Cytogenet Genome Res. 2006;113:188–193.
15. Yoshimizu T, Miroglio A, Ripoche MA, Gabory A, Vernucci M, et al. The H19 locus acts in vivo as a tumor suppressor. Proc Natl Acad Sci U S A. 2008;105:12417–12422.
16. Church DM, Goodstadt L, Hillier LW, Zody MC, Goldstein S, et al. Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009;7:e1000112.
17. Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488.
18. Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nat Rev Genet. 2009;10:155–159.
19. Nam DK, Lee S, Zhou G, Cao X, Wang C, et al. Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription. Proc Natl Acad Sci U S A. 2002;99:6152–6156.
20. Batey RT. Structures of regulatory elements in mRNAs. Curr Opin Struct Biol. 2006;16:299–306.
21. Bajdik CD, Kuo B, Rusaw S, Jones S, Brooks-Wilson A. CGMIM: automated text-mining of Online Mendelian Inheritance in Man (OMIM) to identify genetically-associated cancers and candidate genes. BMC Bioinformatics. 2005;6:78.
22. Guffanti A, Iacono M, Pelucchi P, Kim N, Solda G, et al. A transcriptional sketch of a primary human breast cancer by 454 deep sequencing. BMC Genomics. 2009;10:163.
23. Mattick JS. The genetic signatures of noncoding RNAs. PLoS Genet. 2009;5:e1000459.
24. Bernstein E, Caudy AA, Hammond SM, Hannon GJ. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature. 2001;409:363–366.
25. Chiosea S, Jelezcova E, Chandran U, Acquafondata M, McHale T, et al. Up-regulation of dicer, a component of the MicroRNA machinery, in prostate adenocarcinoma. Am J Pathol. 2006;169:1812–1820.
26. Macrae IJ, Zhou K, Li F, Repic A, Brooks AN, et al. Structural basis for double-stranded RNA processing by Dicer. Science. 2006;311:195–198.
27. Black DL. Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003;72:291–336.
28. Zenklusen JC, Conti CJ, Green ED. Mutational and functional analyses reveal that ST7 is a highly conserved tumor-suppressor gene on human chromosome 7q31. Nat Genet. 2001;27:392–398.
29. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
30. Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970;48:443–453.
31. Gruber AR, Neubock R, Hofacker IL, Washietl S. The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures. Nucleic Acids Res. 2007;35:W335–338.
32. Kondo T, Hashimoto Y, Kato K, Inagaki S, Hayashi S, et al. Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA. Nat Cell Biol. 2007;9:660–665.
33. Wu TD, Watanabe CK. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics. 2005;21:1859–1875.
34. Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34:W609–612.
35. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997;13:555–556.
36. Buim ME, Soares FA, Sarkis AS, Nagai MA. The transcripts of SFRP1,CEP63 and EIF4G2 genes are frequently downregulated in transitional cell carcinomas of the bladder. Oncology. 2005;69:445–454.
37. Xu L, Han C, Lim K, Wu T. Cross-talk between peroxisome proliferator-activated receptor delta and cytosolic phospholipase A(2)alpha/cyclooxygenase-2/prostaglandin E(2) signaling pathways in human hepatocellular carcinoma cells. Cancer Res. 2006;66:11859–11868.
38. Xu L, Han C, Wu T. A novel positive feedback loop between peroxisome proliferator-activated receptor-delta and prostaglandin E2 signaling pathways for human cholangiocarcinoma cell growth. J Biol Chem. 2006;281:33982–33996.
39. Hooi CF, Blancher C, Qiu W, Revet IM, Williams LH, et al. ST7-mediated suppression of tumorigenicity of prostate cancer cells is characterized by remodeling of the extracellular matrix. Oncogene. 2006;25:3924–3933.
40. Faridi J, Wang L, Endemann G, Roth RA. Expression of constitutively active Akt-3 in MCF-7 breast cancer cells reverses the estrogen and tamoxifen responsivity of these cells in vivo. Clin Cancer Res. 2003;9:2933–2939.
41. Davies MA, Stemke-Hale K, Tellez C, Calderone TL, Deng W, et al. A novel AKT3 mutation in melanoma tumours and cell lines. Br J Cancer. 2008;99:1265–1268.
42. Tran MA, Gowda R, Sharma A, Park EJ, Adair J, et al. Targeting V600EB-Raf and Akt3 using nanoliposomal-small interfering RNA inhibits cutaneous melanocytic lesion development. Cancer Res. 2008;68:7638–7649.
43. Sharma A, Sharma AK, Madhunapantula SV, Desai D, Huh SJ, et al. Targeting Akt3 signaling in malignant melanoma using isoselenocyanates. Clin Cancer Res. 2009;15:1674–1685.
44. Vazquez-Martinez R, Martinez-Fuentes AJ, Pulido MR, Jimenez-Reina L, Quintero A, et al. Rab18 is reduced in pituitary tumors causing acromegaly and its overexpression reverts growth hormone hypersecretion. J Clin Endocrinol Metab. 2008;93:2269–2276.
45. Ishiguro H, Shimokawa T, Tsunoda T, Tanaka T, Fujii Y, et al. Isolation of HELAD1, a novel human helicase gene up-regulated in colorectal carcinomas. Oncogene. 2002;21:6387–6394.
46. Finlin BS, Gau CL, Murphy GA, Shao H, Kimel T, et al. RERG is a novel ras-related, estrogen-regulated and growth-inhibitory gene in breast cancer. J Biol Chem. 2001;276:42259–42267.
47. Wu K, Katiyar S, Witkiewicz A, Li A, McCue P, et al. The cell fate determination factor dachshund inhibits androgen receptor signaling and prostate cancer cellular growth. Cancer Res. 2009;69:3347–3355.
48. Wu K, Li A, Rao M, Liu M, Dailey V, et al. DACH1 is a cell fate determination factor that inhibits cyclin D1 and breast tumor growth. Mol Cell Biol. 2006;26:7116–7129.
49. Schoenmakers EF, Huysmans C, Van de Ven WJ. Allelic knockout of novel splice variants of human recombination repair gene RAD51B in t(12;14) uterine leiomyomas. Cancer Res. 1999;59:19–23.
50. Blank C, Schoenmakers EF, Rogalla P, Huys EH, van Rijk AA, et al. Intragenic breakpoint within RAD51L1 in a t(6;14)(p21.3;q24) of a pulmonary chondroid hamartoma. Cytogenet Cell Genet. 2001;95:17–19.
51. Rusiniak ME, Yu M, Ross DT, Tolhurst EC, Slack JL. Identification of B94 (TNFAIP2) as a potential retinoic acid target gene in acute promyelocytic leukemia. Cancer Res. 2000;60:1824–1829.
52. Jaskula-Sztul R, Sokolowski W, Gajecka M, Szyfter K. Association of arylamine N-acetyltransferase (NAT1 and NAT2) genotypes with urinary bladder cancer risk. J Appl Genet. 2001;42:223–231.
53. Morton LM, Schenk M, Hein DW, Davis S, Zahm SH, et al. Genetic variation in N-acetyltransferase 1 (NAT1) and 2 (NAT2) and risk of non-Hodgkin lymphoma. Pharmacogenet Genomics. 2006;16:537–545.
54. Moslehi R, Chatterjee N, Church TR, Chen J, Yeager M, et al. Cigarette smoking, N-acetyltransferase genes and the risk of advanced colorectal adenoma. Pharmacogenomics. 2006;7:819–829.
