We have found evidence for transcription of 692 distinct L1 element sites in human lymphoblastoid cells. Of these, 410 sites correspond to full-length elements (Table ), including 52 of the 304 human-specific subfamily (Tables and ). Of the sixteen full-length human-specific elements that we identified carrying intact copies of both ORF1 and ORF2, only five were represented by five or more expression tags (Tables and ). Therefore, while many L1 elements are transcribed in somatic cells, paradoxically few are likely to be active.
The proportion of full-length elements that we identified containing intact ORFs was not significantly different from their occurrence in the genome, suggesting that transcription of putatively functional elements is tolerated in somatic cells. Only six of these intact L1s corresponded to those that are known to be highly active in a cell culture assay (Tables and ) [19], suggesting that the most retrotranspositionally competent elements may be suppressed in somatic cells. Alternatively, the individuals that we assayed might carry less active alleles of these previously examined elements (see [56] for evidence of known allelic variation in L1s). Six expressed full-length elements encode intact ORF2 in the absence of ORF1 (Tables and ). These elements would not be able to mobilize themselves in cis, but could possibly retain the ability to mobilize non-autonomous elements such as Alus in trans.
A previous study examining transcription from the native promoter of new L1 insertions in cell culture found that 5' transcript ends typically mapped within a few base pairs of the 5' end of the element [12]. Our data suggest that transcription of endogenous, evolutionarily older elements may also begin further from the element. Half of the sites identified by 5' RACE mapped within 50 nucleotides of the start of the L1 element as indicated by sequence homology. We also found over 100 L1 transcription start sites situated more than 100 nucleotides upstream of the element. These start sites might not result from action of the L1 promoter at all, but might instead use another promoter located coincidentally in the genomic sequence flanking the element. This hijacking by the L1 of an upstream promoter might be advantageous for an element whose native promoter has either degenerated through mutation or been subjected to epigenetic silencing.
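The distance categories used above can be illustrated with a minimal sketch. It assumes each start site is summarized as a signed offset in nucleotides relative to the annotated 5' end of the L1 (positive values upstream of the element, negative values inside it). The function name `classify_tss`, the category labels, and all example offsets other than the 457-nucleotide site at 13q14.2 are hypothetical and are not taken from our data.

```python
from collections import Counter


def classify_tss(offset_nt: int) -> str:
    """Bin a transcription start site by its offset from the L1 5' end.

    offset_nt > 0: start site lies upstream of the element.
    offset_nt < 0: start site lies inside the element.
    The 50 nt and 100 nt thresholds follow the text; the labels are illustrative.
    """
    if abs(offset_nt) <= 50:
        return "near_start"      # within 50 nt of the element start
    if offset_nt > 100:
        return "far_upstream"    # more than 100 nt upstream
    return "other"               # anything between or further inside


# Hypothetical offsets; 457 corresponds to the 13q14.2 start site in the text.
example_offsets = [3, -12, 457, 75, 150, -200]
counts = Counter(classify_tss(o) for o in example_offsets)
print(counts)
```

Under this scheme a start site 457 nucleotides upstream falls into the far-upstream class, which is the class for which use of a flanking promoter is most plausible.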
Examination of the expression of four full-length human-specific elements in members of a CEPH Utah family indicated a high degree of variation among different individuals. This is consistent with previous studies showing inter-individual and inter-allelic variation in retrotransposition of different highly active L1 elements [56]. A larger population study is required to determine how widespread transcript-level variation is among different individuals and whether this variation is genetically tractable. Some individuals, such as GM10861, showed increased expression, while other individuals, such as GM11992, showed little to no expression at all four loci. These differences were not noted at the HPRT or 18S loci (data not shown; see Materials and methods), which were evenly expressed in all individuals. We hypothesize that there might be a genome-wide level of regulation leading to individuals with higher or lower numbers of expressed L1 elements. We expect that further, expanded studies in different human populations will reveal a great deal of natural variation in the number and location of L1s contributing to the human transcriptome.
Transcription of L1s within genes might be expected to interfere with transcription of the gene, so that genes containing highly expressed retroelements in their introns might be relatively suppressed. By contrast, we found that expression of the L1 elements at 4p15.32, 13q14.2, and 6p22.2 closely mirrored the expression of surrounding spliced exons from protein-coding genes. In the case of 13q14.2, a transcription start site was identified 457 nucleotides upstream of the L1, tens of kilobases downstream of the start of the RB1 gene. Therefore, it is unlikely that action of the RB1 promoter itself contributes directly to regulation of the L1. Instead, we hypothesize that transcription of RB1 results in the creation of a region of open chromatin that facilitates the activity of other promoters located within that region. Alternatively, both loci might be located in a larger, transcriptionally permissive epigenetic domain. We note that Faulkner et al. [ ] also found a positive correlation in expression between transcripts originating in retroelements and surrounding genes.
Regardless of whether the L1 transcripts are produced through the action of their native promoters or an upstream promoter, the transcripts identified through 3' end tagging terminate near the end of the L1. The short transductions that we identified most likely use the L1-encoded polyadenylation signal. In a few cases, longer transductions of more than 50 nucleotides were seen; this is consistent with previous studies describing L1 elements carrying transductions of their progenitor locus [44]. Where elements are located within genes, such as the elements at 4p15.32 and 6p22.2, a transcript originating from the gene may terminate prematurely by polyadenylation at the end of the L1. In this way, the intronic L1 transcript might 'break' the expression of downstream exons [23]. However, as the L1-incorporating transcripts described in this study are expressed at much lower steady-state levels than the surrounding genes, it is unclear to what extent their termination influences the expression or function of those genes.
Cytosine methylation is known to suppress the activity of endogenous L1 promoters [27]; however, our examination of the regions surrounding the start of transcription of two elements (the L1s at 1p22.3 and 13q14.2) did not find strong evidence for this modification modulating individual-specific expression levels. Other factors, such as histone modifications, nucleotide polymorphisms, and trans-acting transcriptional regulators, may function at these loci as rheostats to specify the exact levels of RNA production in different cells, tissues, and individuals. Moreover, differences in post-transcriptional regulation, for instance, through sequestration into subcellular compartments [58], may further determine which full-length, putatively functional L1s are able to actively retrotranspose.
Our study is limited in that we are only able to detect transcripts that carry unique non-L1 sequence at either the 3' or 5' end. In addition, we have only sampled a single individual at the 5' end and a single individual in depth at the 3' end, in a single tissue type: transformed lymphoblastoid cell lines. Because of these caveats, we expect that we have found only a subset of the element sites transcribed in different human tissues and populations. Indeed, we note that only 42 of the 606 L1 sites that we identified were also identified by an independent high-throughput study looking for transcription start sites mapping within transposable elements [41] (Additional data files 1 and 4), a study very different from ours in terms of methodology, individuals and cell types.
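To make the scale of this overlap concrete, the following is a minimal sketch of how shared sites between two studies could be counted, together with the fraction implied by the figures quoted above. It assumes sites are represented as half-open genomic intervals `(chromosome, start, end)`; the helper names `overlaps` and `count_shared` and the example coordinates are hypothetical and do not reproduce the comparison method used for the additional data files.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_shared(sites_a, sites_b):
    """Count sites in sites_a that overlap at least one site in sites_b."""
    return sum(any(overlaps(a, b) for b in sites_b) for a in sites_a)


# Hypothetical coordinates, for illustration only.
ours = [("chr13", 1000, 7000), ("chr4", 500, 6500)]
other = [("chr13", 6900, 7100), ("chr6", 500, 6500)]
shared = count_shared(ours, other)

# Fraction implied by the counts quoted in the text: 42 of 606 sites.
percent_shared = round(100 * 42 / 606, 1)
print(shared, percent_shared)
```

With the counts quoted in the text, roughly 7% of the sites were recovered by both studies, which underlines how incompletely either approach samples the set of transcribed L1s.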
In addition to the major L1 sense promoter, L1s also contain an antisense promoter in the 5' UTR that produces transcripts of the upstream flanking region [60]. A recent study has also found evidence for an additional outward-facing sense promoter in the 3' UTR [41]. As such, explorations of sense transcription through the L1 may reveal only the tip of the iceberg of the genomic transcripts incorporating and influenced by the presence of these retroelements.