We report the first genome-wide, deep-sequencing study in which intergenic noncoding expression has been followed throughout an animal’s life cycle. We report a set of 1,119 candidate lincRNA loci, of which at most 17% are predicted to represent previously undiscovered protein-coding genes. Our results show that lincRNA loci are commonplace and should now prompt experimental investigations into whether they represent an important component of the functionality of the Drosophila genome. Using qRT–PCR, we validated expression from 87% of the lincRNA loci that we assayed, even for loci with maximum FPKM values as low as 0.23. The number of annotated D. melanogaster loci in FlyBase now increases by 7.5% (from 14,833 to 15,952), with many of these novel loci, as expected, being expressed at low levels and in restricted numbers of tissues and developmental stages. As lincRNAs are generally shorter than protein-coding transcripts, this increase in the number of loci is not matched by a corresponding increase in the number of bases covered by these annotations (2% increase, from 91 to 93 Mb). Despite the greater range of developmental time points used for these RNA-seq data, the number of D. melanogaster lincRNAs is already exceeded by known mouse lincRNA loci (Carninci et al. 2005; Guttman et al. 2009; Guttman et al. 2010; Cabili et al. 2011), a set that will clearly increase upon further RNA-seq interrogation of the mouse transcriptome (Marques and Ponting 2009). The increased complexity of the mouse over the fruit fly therefore appears to be matched by increases in the number of lincRNA loci, as well as protein-coding genes.
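The annotation figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the helper name `percent_increase` is ours, and the inputs are the locus counts and base coverage stated in the text.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Locus counts and base coverage (Mb) as quoted in the text.
loci_before, loci_after = 14_833, 15_952
new_loci = loci_after - loci_before                       # the candidate lincRNA loci
locus_gain = percent_increase(loci_before, loci_after)    # gain in annotated loci
base_gain = percent_increase(91, 93)                      # gain in annotated bases

print(f"{new_loci} new loci: +{locus_gain:.1f}% loci, +{base_gain:.1f}% bases")
```

The much smaller gain in bases than in loci reflects the point made above, that lincRNA transcripts are generally shorter than protein-coding transcripts.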
If lincRNA loci in Drosophila were not to impart function, then their sequence evolution would not be expected to differ from untranscribed intergenic sequence, their transcript levels would not vary over developmental stages, and their genomic positions would occur randomly with respect to chromatin domains and neighboring protein-coding gene classes. Instead, we have shown that these lincRNA loci are almost as intolerant of substitution mutations as gene models and are considerably less tolerant than intergenic sequence for which we have no evidence of transcription. Ninety-five percent of lincRNA loci contain an MCS, arguing for their long-lasting functionality across the entire Drosophila phylogeny.
Our data suggest a major biological role for lincRNAs in transcription regulation during development. This is implied by their more prominent expression at the earlier embryonic and larval stages and their loci being enriched in Polycomb protein–associated domains which are known to harbor developmentally relevant genes. LincRNAs with the highest sequence constraint, which might be expected to convey the most fundamental roles, are expressed preferentially during single developmental stages, rather than over multiple stages, and represent the best candidates for further experimental scrutiny into their contributions to developmental processes.
Like other molecule types, lincRNAs are expected to possess many diverse molecular roles. Nevertheless, a substantial minority of lincRNAs (155 of 1,119, 14%) are transcribed in the vicinity of protein-coding genes from particular functional classes, which is approximately 2-fold more than expected by chance. Expression of genes from these classes is also significantly more likely to be positively correlated with transcription from genomically adjacent lincRNA loci. These biases suggest that this fraction of lincRNAs acts, first, as eRNAs that actively promote transcription of genomically adjacent protein-coding genes and, second, as RNAs with roles in development. Specifically, the role of this fraction of lincRNAs may be in the development of the nervous system. Similar findings were reported previously for mouse lincRNA loci (Ponjavic et al. 2009). LincRNAs have previously been shown to be important in the mammalian nervous system (Mercer et al. 2008), and their brain expression patterns can be conserved between diverse vertebrates (Chodroff et al. 2010). Our findings in D. melanogaster, an invertebrate, suggest a role for lincRNAs in regulating developmental processes and in the development of the nervous system more generally across the animal kingdom. The 255 pairs of D. melanogaster lincRNA and protein-coding loci that contribute to these enrichments represent a rich resource for future investigations of the molecular mechanisms of transcriptional regulation during development.
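To make the "2-fold more than expected by chance" statement concrete, the sketch below computes a fold enrichment and a one-sided binomial tail probability. It is not the test used in this study. The chance expectation `expected_fraction = 0.07` is a hypothetical value chosen only to be consistent with an approximately 2-fold enrichment, and the function name `binomial_upper_tail` is ours.

```python
import math

def binomial_upper_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are evaluated in log space so that large binomial
    coefficients do not overflow a float.
    """
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)

observed, total_loci = 155, 1119   # counts quoted in the text
expected_fraction = 0.07           # HYPOTHETICAL chance expectation (assumption)

fold_enrichment = (observed / total_loci) / expected_fraction
p_value = binomial_upper_tail(total_loci, expected_fraction, observed)
print(f"fold enrichment: {fold_enrichment:.2f}, one-sided P: {p_value:.2e}")
```

A binomial model treats loci as independent draws. A permutation of lincRNA positions that preserves local gene density would be a more conservative alternative if neighboring loci are not independent.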
The availability of lincRNA loci from both D. melanogaster and mouse allowed us to identify lincRNAs in each species that lie in the genomic vicinity of orthologous protein-coding genes. Such lincRNAs, through the potential cis-regulation of orthologous genes, may possess analogous, or even homologous, functional roles, which our results suggest would most likely be in developmental processes. We observed an increased frequency of D. melanogaster lincRNAs in the genomic vicinities of genes whose mouse orthologues also neighbored a lincRNA locus. As discussed above, mouse lincRNA catalogues remain incomplete, and so the true enrichment may be higher than reported here. Similar positionally equivalent lincRNA loci were previously identified between human and mouse (Engström et al. 2006). To our knowledge, there have been only two previous reports of analogous lincRNA action between species as distantly related as mammals and Drosophila (Deng and Meller 2006; Jolly and Lakhotia 2006). In both instances, lincRNAs from both species participate in chromatin remodeling, through dosage compensation or the heat shock response, but otherwise exhibit little in common. Because these species are so highly diverged, sequence similarity between paired lincRNAs cannot be discerned, and analogy therefore cannot be distinguished from homology; resolving this issue will require future experiments.
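One way to ask whether lincRNA neighbors co-occur at orthologous genes more often than chance predicts is a one-sided hypergeometric (Fisher-type) test. The sketch below is an assumption about how such a comparison could be set up, not the procedure used in this study. The function name `hypergeom_upper_tail` is ours, and the counts are toy values, not data from the paper.

```python
from math import comb

def hypergeom_upper_tail(population: int, successes: int,
                         draws: int, observed: int) -> float:
    """P(X >= observed) when `draws` items are sampled without replacement
    from `population` items, `successes` of which carry the property."""
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, min(successes, draws) + 1)
    )
    return numerator / comb(population, draws)

# TOY counts for illustration only (not from the study):
orthologue_pairs = 20    # fly-mouse orthologous gene pairs considered
mouse_neighbors = 8      # pairs whose mouse gene neighbors a lincRNA
fly_neighbors = 6        # pairs whose fly gene neighbors a lincRNA
both = 5                 # pairs with a lincRNA neighbor in both species

p_value = hypergeom_upper_tail(orthologue_pairs, mouse_neighbors,
                               fly_neighbors, both)
print(f"one-sided P for co-occurrence: {p_value:.4f}")
```

Because mouse lincRNA catalogues remain incomplete, some pairs counted as lacking a mouse neighbor would be misclassified, which would tend to understate any true co-occurrence in a test of this kind.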
Whether these lincRNAs regulate these protein-coding genes through a purely cis-acting mechanism could be tested by introducing genetic lesions, such as a premature transcriptional termination signal, into these sequences. By contrast, transfecting short hairpin RNA (shRNA) constructs (Guttman et al. 2011), which target only the mature lincRNA molecule, would preferentially reveal trans-acting functions of the lincRNAs.
The data presented here for D. melanogaster and elsewhere for mouse and other species (Yazgan and Krebs 2007) suggest that the genomes of diverse animals contain large numbers of lincRNA loci that can confer biological function. The 1,119 D. melanogaster lincRNA loci provide excellent experimental candidates for testing the functional hypotheses advanced by this study, such as sex-specific regulation, regulation by chromatin states, the analogous activity of lincRNAs between D. melanogaster and mouse, and the cis-regulation of neighboring protein-coding genes. In all, 632 (56.5%) of our lincRNAs can be tested for at least one of these four functions. Genetic transformation techniques are available for D. melanogaster, which allow these hypotheses to be addressed. For example, 117 of our lincRNA loci contain a P-element for which it is already possible to obtain a mutant stock. Preliminary results (data not shown) reveal that several such P-element insertion lines exhibit a lethality phenotype, and these will be reported elsewhere. Clearly, the powerful genetic toolkit of D. melanogaster can now be applied to determine the molecular deficits that underlie such phenotypic changes for these, and many other, lincRNA loci.