In the past few years, several families of regulatory RNA molecules have been shown to be widely expressed in eukaryotes [1]. Natural antisense transcripts (NATs) belong to one such family. NATs are endogenous RNA molecules whose sequences are partly or entirely complementary to other transcripts. There are two types of NATs. Cis-NATs are transcribed from the same genomic loci as their sense transcripts but on the opposite DNA strand. By contrast, trans-NATs are expressed from genomic regions distinct from those encoding their sense transcripts [3]. Cis-NATs and their sense RNAs are usually related in a one-to-one fashion, whereas a single trans-NAT may target several sense transcripts; for example, a single microRNA (miRNA) can regulate the expression of several distinct target mRNAs [6].
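The cis-NAT definition above can be made concrete as a simple test on genomic coordinates. The sketch below is a hypothetical illustration, not the procedure used in any of the cited studies: it assumes that "same locus, opposite strand" means that two transcripts on the same chromosome occupy overlapping coordinate ranges on different strands, and it treats each transcript as a single span (1-based, inclusive, as in GFF3), ignoring exon-intron structure. The `Transcript` record and the function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal record for a transcript placed on the genome."""
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlap_length(a: Transcript, b: Transcript) -> int:
    """Number of genomic bases shared by a and b (0 if none or on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def is_cis_nat_pair(a: Transcript, b: Transcript) -> bool:
    """Assumed cis-NAT criterion: opposite strands and overlapping coordinates."""
    return a.strand != b.strand and overlap_length(a, b) > 0


if __name__ == "__main__":
    sense = Transcript("geneA", "Chr1", 1000, 2500, "+")
    antisense = Transcript("geneB", "Chr1", 2200, 3100, "-")
    print(is_cis_nat_pair(sense, antisense), overlap_length(sense, antisense))
```

For the two example transcripts this prints `True 301`. A trans-NAT, by contrast, cannot be found this way, because its complementarity to the sense transcript does not follow from shared genomic coordinates.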
Studies performed in various organisms suggest that NATs participate in a broad range of regulatory events, such as transcriptional occlusion, which results in the reciprocal expression of sense-antisense RNAs [7], and RNA interference (RNAi), which leads to the degradation of double-stranded sense-antisense transcript pairs [9]. There is evidence for the involvement of NATs in alternative splicing [10], RNA editing [12], DNA methylation [14], genomic imprinting [16] and X-chromosome inactivation [21]. NATs are also known to regulate the expression of some circadian clock genes [22]. However, because each of these regulatory modes has been observed in only a few cases, the general biological functions and regulatory mechanisms of NATs remain unclear.
Recent large-scale NAT identifications in several model organisms have revealed the widespread existence of cis-NATs in eukaryotes. Lehner et al. first reported 372 NATs in humans by searching public databases for overlapping mRNA sequences [23]. Using a public expressed sequence tag (EST) database, Shendure and Church found 144 human NATs and 73 mouse NATs [24]. In a later work, Yelin et al. predicted 2,667 NATs in humans and, after experimental validation, concluded that around 1,600 NAT pairs were transcribed from both strands [25]. The RIKEN group identified 2,481 NAT pairs and 899 non-antisense bidirectional transcript units from 60,770 mouse full-length cDNAs [26]. A similar analysis by the same group uncovered 687 bidirectional transcript pairs from 32,127 rice (Oryza sativa) full-length cDNAs [27]. A recent study that used whole-genome arrays to analyze the transcriptional activity of the A. thaliana genome observed antisense expression for about 7,600 annotated genes. However, a detailed list of these Arabidopsis antisense RNAs and a complete analysis of them are not yet available [28]. We note that NAT prediction in all previous investigations focused on cis-NATs.
Here, we present the results of a genome-wide computational search to predict and identify cis-NATs in Arabidopsis. By combining sequence information from Arabidopsis full-length cDNAs in public databases with the annotated genes of the Arabidopsis genome release, we identified 1,340 potential cis-NAT pairs. For 957 of these pairs, expression evidence for transcripts derived from both strands was obtained from the Arabidopsis full-length cDNA data and the public Arabidopsis massively parallel signature sequencing (MPSS) database.
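To illustrate the general shape of such a genome-wide search, the sketch below scans transcript coordinates chromosome by chromosome with a sorted sweep and reports every opposite-strand overlapping pair, then keeps only pairs in which both members have expression support. This is a hypothetical illustration under stated assumptions, not the pipeline used in this study: each transcript is treated as a single 1-based, inclusive span (exon-intron structure is ignored), and expression evidence (for example, full-length cDNA or MPSS support) is represented simply as a set of transcript names. The names `find_cis_nat_pairs` and `with_expression_on_both_strands` and the placeholder IDs are illustrative only.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Transcript(NamedTuple):
    """Hypothetical minimal record for an annotated gene or mapped cDNA."""
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def find_cis_nat_pairs(transcripts: Iterable[Transcript]) -> list[tuple[str, str]]:
    """Return (plus-strand name, minus-strand name) for every pair of transcripts
    that overlap on opposite strands of the same chromosome (assumed criterion)."""
    by_chrom: dict[str, list[Transcript]] = defaultdict(list)
    for t in transcripts:
        by_chrom[t.chrom].append(t)

    pairs: list[tuple[str, str]] = []
    for chrom in sorted(by_chrom):
        active: list[Transcript] = []  # earlier transcripts that may still overlap
        for t in sorted(by_chrom[chrom], key=lambda x: x.start):
            # Earlier transcripts start at or before t, so they overlap t
            # exactly when they end at or after t.start.
            active = [a for a in active if a.end >= t.start]
            for a in active:
                if a.strand != t.strand:
                    plus, minus = (a, t) if a.strand == "+" else (t, a)
                    pairs.append((plus.name, minus.name))
            active.append(t)
    return pairs


def with_expression_on_both_strands(
    pairs: list[tuple[str, str]], expressed: set[str]
) -> list[tuple[str, str]]:
    """Keep only pairs whose sense and antisense members both have expression evidence."""
    return [(p, m) for p, m in pairs if p in expressed and m in expressed]


if __name__ == "__main__":
    ts = [
        Transcript("gene_A", "Chr1", 100, 900, "+"),
        Transcript("gene_B", "Chr1", 800, 1500, "-"),
        Transcript("gene_C", "Chr1", 2000, 2600, "+"),
        Transcript("gene_D", "Chr2", 50, 400, "-"),
    ]
    pairs = find_cis_nat_pairs(ts)
    print(pairs)
    print(with_expression_on_both_strands(pairs, {"gene_A", "gene_B"}))
```

Run as a script, both lines print `[('gene_A', 'gene_B')]`. Sorting by start position means each transcript is compared only with the earlier transcripts that still extend past its start, rather than with every other transcript on the chromosome; the evidence filter is kept separate so that predicted pairs and experimentally supported pairs can be counted independently.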