Every human tissue harbors HERV transcripts. While overall transcriptional activities of various HERV families have been studied in some detail, little is known about transcriptional activities of specific HERV loci in benign and malignant tissues. For instance, are only a few or a greater number of loci of a given HERV family transcriptionally active in a given tissue? Is the expression of specific HERV loci up- or downregulated depending on the cellular condition? Given the significant amount of HERV sequences in the human genome, a better understanding of the regulation of HERV transcription appears crucial for understanding transcriptional regulation in the human genome as a whole. Moreover, alterations in transcriptional activities of HERV loci may provide important insight into epigenetic alterations in the corresponding genomic regions, potentially affecting critical cellular genes located in those regions. In the following, we also emphasize the need for larger-scale, specialized studies to adequately address these issues.
Identification of transcriptionally active HERVs is a prerequisite for better understanding regulation of HERV transcription. In the present study, we therefore aimed to identify transcriptionally active proviruses of the HERV-K(HML-2) family, which is of importance from both an evolutionary and a clinical perspective. The strategy for identifying transcriptionally active HML-2 sequences is relatively straightforward. Because the various HML-2 loci in the human genome accumulated random mutations over time, each HML-2 locus harbors private nucleotide differences (Fig. ). Thus, cDNAs generated from RNA transcripts of a particular HML-2 provirus should be identical in sequence to the original provirus. In practice, this may not hold true because of RT-PCR errors. Also, SNPs within HML-2 sequences may introduce differences when compared to the human reference sequence. Previous work already indicated SNPs within HML-2 coding regions [28]. However, the best cDNA-provirus sequence match is still expected to be the correct one if matches to other proviruses display clearly more differences. Assignment of HML-2 cDNAs is further complicated by ex vivo recombinations between transcripts from different proviruses that inevitably occur during cDNA generation [53]. In the present study, we chose to exclude cDNA sequences that displayed 18 or more nt differences to the best match. While the great majority of cDNAs displayed between zero and 5 nt differences [53], it is likely that there was a limited number of recombinant cDNAs among the cDNAs with fewer than 18 nt differences. In principle, a SAGE (serial analysis of gene expression)-like strategy could be applied to such recombinant cDNAs if it were possible to unambiguously assign each recombined portion to a particular provirus. Since HML-2 sequences are relatively similar in sequence, proper assignment seems difficult or impossible, but a SAGE-like strategy may be feasible for other, evolutionarily older and thus more diverged HERV families.
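The assignment rule described above can be illustrated with a short sketch. It is not the procedure used in the study: it assumes that each cDNA has already been aligned to every candidate provirus over the same region without gaps, and the function names, the toy sequences and the handling of ties (treated as unassignable) are assumptions made for illustration only. The cutoff of 18 nt differences is the one stated in the text.

```python
from typing import Dict, Optional, Tuple

# cDNAs with 18 or more nt differences to the best match are excluded (from the text).
MAX_DIFFERENCES = 18


def count_differences(cdna: str, provirus: str) -> int:
    """Count mismatched positions between two pre-aligned, equal-length sequences."""
    if len(cdna) != len(provirus):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(cdna.upper(), provirus.upper()) if a != b)


def assign_cdna(cdna: str, proviruses: Dict[str, str]) -> Optional[Tuple[str, int]]:
    """Return (locus, differences) for the best match, or None if no assignment is made."""
    if not proviruses:
        raise ValueError("at least one candidate provirus is required")
    scores = sorted(
        (count_differences(cdna, seq), locus) for locus, seq in proviruses.items()
    )
    best_diff, best_locus = scores[0]
    if best_diff >= MAX_DIFFERENCES:
        return None  # too divergent, possibly a recombinant cDNA
    if len(scores) > 1 and scores[1][0] == best_diff:
        return None  # assumption of this sketch: ties are left unassigned
    return best_locus, best_diff


# Toy example with hypothetical locus names and sequences.
toy_proviruses = {"locus_A": "ACGTACGTAC", "locus_B": "ACGTTCGTAA"}
toy_assignment = assign_cdna("ACGTACGTAC", toy_proviruses)
```

In the toy example the cDNA is identical to `locus_A` and differs from `locus_B` at two positions, so it is assigned to `locus_A` with zero differences.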
Since little is known about transcription of individual HERV sequences in various human tissues, a greater number of tissues would be of interest for initial studies. In the present study, we investigated in more detail normal and pathologic germ cell tissue, because of the association of HML-2 with GCT [1], and brain tissues, because of previous results indicating transcriptional differences between tissue samples from patients with neuropsychiatric disorders [56
Our study identified, in total, 23 different HML-2 proviruses as transcriptionally active in the studied tissues. Several HML-2 proviruses appeared to be transcribed in every investigated tissue. We stress in this context that our method is not intended to provide quantitative information on overall transcriptional levels of HML-2 sequences in different tissues, but to identify those loci that are transcriptionally active at all. In any case, the transcriptionally active proviruses included structurally intact, full-length proviruses as well as proviruses lacking internal or lateral proviral regions. Both type 1 and type 2 proviruses, differing by a 292 bp sequence within the pol boundary, were found to be transcribed. Notably, a few loci (c7_C, c10_B, c19_A) lack the 5'LTR, the classical proviral promoter. Hitherto unknown flanking promoters presumably drive transcription of those HML-2 sequences. Given results from the ENCODE project, specifically the density of initiators of transcription in the human genome [51], it seems plausible that there are promoters located nearby. Alternatively, as locus c10_B is located within a gene intron, rare splicing events might have produced c10_B-harboring transcripts. For locus c7_C, located closely downstream from the SSBP1 gene, read-through events might have produced c7_C-harboring transcripts.
It is likely that there are more than 23 transcriptionally active HML-2 proviruses. Oligonucleotides employed for PCR following the RT step display varying numbers of mismatches to HML-2 loci in the human genome. Transcriptionally active HML-2 loci tended to have fewer mismatches; however, several seemingly inactive loci displayed as few mismatches as active loci. One or both primer regions are missing in some of the HML-2 loci [see Additional file 3]. Transcripts from some proviruses were indeed detected only as gag- or env-derived cDNAs. While transcripts from several HML-2 loci with one or two mismatches to the primers were eventually cloned as cDNA, it is difficult to decide for other loci whether they were not expressed or could not be efficiently cloned as cDNA for technical reasons. In fact, a type 1 provirus on human chromosome 3, previously termed HERV-K(II), was not identified in our analysis, despite the fact that it is known to produce np9 mRNA [[45], Kehr et al. unpublished]. Transcripts from that provirus were probably not detected because of 25% and 20% sequence differences to gag, respectively, reverse primers. In the course of larger-scale studies, degenerate oligonucleotides or sets of oligonucleotides representing proviral sequence variants should be employed.
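The effect of primer mismatches, and of using degenerate oligonucleotides, can be sketched as follows. This is a minimal illustration under stated assumptions: primer and target site are given in the same orientation and at the same length (a reverse primer would first have to be reverse-complemented), the target contains only A, C, G and T, and the sequences shown are invented. The ambiguity table follows the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer: str, site: str) -> int:
    """Count positions where the target base is not covered by the primer symbol."""
    if len(primer) != len(site):
        raise ValueError("primer and target site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p]
    )


def percent_difference(primer: str, site: str) -> float:
    """Express the mismatch count as a percentage of the primer length."""
    return 100.0 * primer_mismatches(primer, site) / len(primer)


# Invented 20-nt primer and a target site differing at 5 positions.
toy_primer = "ACGTACGTACGTACGTACGT"
toy_site = "TCGTTCGTTCGTTCGTTCGT"
# Degenerate variant: W accepts A or T at the variable positions.
toy_degenerate = "WCGTWCGTWCGTWCGTWCGT"
```

Here the plain primer shows 5 mismatches in 20 positions (25%) against the variant site, whereas the degenerate primer matches both the original and the variant site without mismatches.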
Another point concerns relative cloning frequencies. While HML-2 sequences expressed at higher levels are more likely to be cloned as cDNA, many other HML-2 loci may be transcribed at much lower levels and thus their cDNA would only appear when many more cDNA sequences per tissue sample were analyzed. We observed such differences between tissues for several HML-2 loci. Considering current sequencing technologies, it should be quite feasible to generate significantly more cDNA sequences per tissue sample in the course of a larger study.
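The dependence of detection on sequencing depth can be made explicit with a simple sampling model. This is an idealized sketch, not an analysis performed in the study: it assumes that each cloned cDNA is drawn independently and that a locus is picked with a probability equal to its fraction of all HML-2 transcripts, ignoring primer and cloning biases. The fraction used in the example is hypothetical.

```python
import math


def detection_probability(fraction: float, n_clones: int) -> float:
    """Probability that a locus contributing `fraction` of transcripts appears
    at least once among `n_clones` independently sampled cDNAs."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(fraction: float, target_probability: float = 0.95) -> int:
    """Smallest number of clones giving at least `target_probability` of detection."""
    if not 0.0 < fraction < 1.0 or not 0.0 < target_probability < 1.0:
        raise ValueError("fraction and target_probability must be between 0 and 1")
    return math.ceil(math.log(1.0 - target_probability) / math.log(1.0 - fraction))


# Hypothetical locus contributing 1% of all HML-2 transcripts in a sample.
clones_for_rare_locus = clones_needed(0.01, 0.95)
```

Under this model, a locus contributing 1% of transcripts requires 299 sequenced clones for a 95% chance of being seen at least once, which illustrates why weakly transcribed loci are easily missed at low sampling depth.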
Different splicing efficiencies for HML-2 transcripts may also affect cloning likelihoods of the corresponding proviral transcripts. For instance, if full-length transcripts are efficiently spliced down to env mRNA, gag portions are less likely to be captured as cDNA than for splicing-deficient transcripts. However, apart from the observation that type 1 proviruses lack splice signals present in type 2 proviruses, yet type 1 transcripts are still spliced [61], there is currently no information about splicing efficiencies of transcripts from different HML-2 loci.
Furthermore, the polymorphic nature of several potentially transcriptionally active HML-2 proviruses will result in absent or potentially reduced amounts of the corresponding transcripts in tissue samples from some human donors. In fact, six of the proviruses identified in our study (c1_A, c6_A, c7_A, c8_A, c11_A, c12_A) have been described as polymorphic in the human population. Two of those loci (c1_A and c8_A) represent presence/absence alleles; the other alleles are full-length (either tandem or single) proviruses, solitary LTRs and/or (empty) preintegration sites [26
Based on cloning frequencies, a limited number of HML-2 proviruses appear transcribed at higher levels in the investigated brain and testicular tissues than other proviruses. It is currently not clear whether the former proviruses are also transcribed at higher levels in other human tissues, potentially even showing ubiquitous expression. Investigation of other human tissues in the course of a larger project should clarify that point.
GCTs, especially seminomas, display significantly upregulated HML-2 expression at both the RNA and protein level. Variable HML-2 expression was previously also observed in different human brain samples [56]. On the assumption that relevant proviruses were not missed in our analysis, our results indicate that the observed differences in expression levels were not primarily due to drastic upregulation of single HML-2 proviruses but rather due to global upregulation of HML-2 transcription. This could be explained by differential expression of factors involved in the regulation of HML-2 sequences, such as transcription factors. Deregulated levels of such factors in GCT cells could result in the activation of HML-2 sequences. As for previously observed interindividual differences in HML-2 expression levels [56], such differences have also been observed in the human population for cellular genes, and sets of genes appear to be regulated by hitherto unidentified elements (gene products) in defined genome regions [63]. If regulators of HML-2 transcription were differentially expressed in humans, expression levels of those regulators could indirectly result in differential HML-2 expression levels.
The Gag-encoding provirus c22_A appears strongly upregulated in pathologic versus normal testicular tissue, and may thus contribute to high Gag protein levels in GCT. However, the results from our study also indicate that the presence of HML-2 Gag and Env antibodies in GCT patients is not correlated with expression of specific HML-2 proviruses, as several Gag- and Env-encoding HML-2 proviruses, among them c22_A, are transcriptionally active in both antibody-positive and antibody-negative GCT patients. Generation of autoantibodies is currently poorly understood. It has been associated with increased expression of antigens. Immunogenic antigens, i.e. antigens that cause a humoral immune response, can also stem from genes altered by specific mutations, including chromosome alterations and epigenetic DNA modifications. Likewise, spliced gene products can give rise to an immune response. Posttranslational modifications, such as altered protein folding and processing, may also result in immunogenic antigens. Finally, normal non-altered antigens can elicit an autoantibody response when expressed in specific tissues [for a review, see [67
Two research groups [23] recently described engineered HML-2 proviral sequences that are replication-competent and infectious. An additional engineered proviral sequence, consisting of portions of the so-called HERV-K109, HERV-K108 and HERV-K115 proviruses, was reported to generate an infectious retrovirus as well [24]. The authors argued that recombination between those proviruses in vivo could generate a functional HERV-K(HML-2) variant. We note that those three proviruses, named c6_A, c7_A and c8_A in our study, have been identified by us as transcriptionally active in germ cell tissue. Because all three proviruses have been described as polymorphic in the human population, however, only some human individuals will harbor all three. In any case, for individuals harboring all three, transcripts from those proviruses would be present and could serve as substrates for recombination events when transcripts were reverse transcribed.
Large-scale cDNA sequencing projects have so far generated about 8.1 million human expressed sequence tag (EST) entries (http://www.ncbi.nlm.nih.gov/dbEST). HML-2 provirus-derived ESTs are also found in dbEST. However, our initial analysis of HML-2-specific ESTs clearly shows that there is insufficient information in dbEST to comprehensively describe HML-2 proviral expression patterns. Our initial probing of dbEST with a representative HML-2 gag sequence yielded only 109 significant hits. Breaking down those hits by tissue resulted in a maximum of 27 ESTs derived from stem cells; only 23 ESTs were derived from germ cells. It is likely that several of those 23 ESTs stem from the same HML-2 proviruses. Considering the usually poor sequence quality of ESTs, proviral assignment of ESTs may often be ambiguous. Moreover, HML-2 ESTs stem from very few source tissues. Indeed, few HML-2 ESTs in dbEST stem from tissues in which HML-2 is clinically relevant, i.e. germ cell tumors. Such a lack of HML-2 ESTs for other tissues erroneously suggests that HML-2 is not transcribed in those tissues. A more detailed analysis of EST data, however, does not provide significantly more or better information. Stauffer et al. [48] recently reported such a more detailed analysis of HERV-derived ESTs, among them HML-2-specific ESTs. They identified 143 non-normalized, non-subtracted ESTs matching the HML-2 family. About a third of the ESTs could not be assigned unambiguously to specific HML-2 proviruses. In total, 8 HML-2 proviruses (and 3 ambiguous ones) were identified as transcriptionally active. Of those, 2 (or 4 when including the ambiguous ones) were also identified in our study (Table ). In other words, our study detected at least 19 transcriptionally active HML-2 proviruses that were not identified in the detailed EST study by Stauffer et al. A more recent analysis by Oja et al. [49] reported about 300 HML-2-specific ESTs. However, it seems clear that 300 HML-2 ESTs are still insufficient to describe HML-2 expression patterns in, for instance, human benign and malignant tissues, or as a function of cellular conditions and stimuli. It is quite conceivable that the conclusions for HML-2-specific ESTs hold for ESTs from other HERV families as well. One can also predict that the (overall much smaller) non-human EST datasets are likewise insufficient for studying transcriptional activity of ERVs in other species. Thus, EST data are insufficient to describe (H)ERV transcription.
Buzdin et al. [43] recently reported results from an experimental strategy, named GREM (Genome Repeat Expression Monitor), that is able to specifically amplify HERV transcripts. A total of 22 HML-2 proviruses were reported to be transcribed in parenchyma and/or seminoma. Of those proviruses, 16 were also identified in our study (Table ); 7 and 6 proviruses were found exclusively in our study and in the study of Buzdin et al., respectively. Thus, the specialized generation and analysis of HERV-specific transcripts in the study of Buzdin et al. likewise identified many more transcriptionally active HML-2 proviruses than the very limited number of ESTs in dbEST.
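The overlap between the two studies can be checked with a small set comparison. The sketch below uses placeholder identifiers rather than the actual locus names, which are listed in the table referred to above; only the counts (23 proviruses in the present study, 22 in the GREM study, 16 shared) are taken from the text, and the function name is chosen for illustration.

```python
from typing import Dict, Set


def compare_locus_sets(study_a: Set[str], study_b: Set[str]) -> Dict[str, int]:
    """Summarize how two sets of transcriptionally active loci overlap."""
    return {
        "shared": len(study_a & study_b),
        "only_a": len(study_a - study_b),
        "only_b": len(study_b - study_a),
        "union": len(study_a | study_b),
    }


# Placeholder identifiers (hypothetical); only the set sizes come from the text.
shared_loci = {f"shared_{i}" for i in range(16)}
present_study = shared_loci | {f"present_only_{i}" for i in range(7)}
grem_study = shared_loci | {f"grem_only_{i}" for i in range(6)}

overlap_summary = compare_locus_sets(present_study, grem_study)
```

The counts are internally consistent: 16 shared plus 7 exclusive proviruses give the 23 of the present study, 16 plus 6 give the 22 of the GREM study, and the two studies together cover 29 distinct proviruses.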
We conclude that a specialized HERV transcript sequencing project, a HERV Transcriptome Project, will be required to comprehend transcriptional activity of HERVs. Bearing in mind the proportion of HERV sequences, and recent findings regarding the extent of transcription in the human genome, a HERV Transcriptome Project could provide crucial information for better understanding transcriptional regulation not only of HERVs but also of the human genome in general. For instance, many of the HERV sequences could function as transcription initiators and could thus contribute to the pervasive transcription of the human genome [51]. Considering previously reported differences in HERV expression patterns between human tissues, transcriptional activities of individual HERV proviruses are very likely significantly different between tissues as well. Moreover, a HERV Transcriptome Project could significantly contribute to a better understanding of proposed roles of HERV sequences in various human diseases if relevant source tissues were analyzed in greater detail. From that point of view, it will be important to better understand cellular conditions under which HERV sequences are activated or repressed [for instance, see ref. [68
The current study provides an initial insight into the transcriptional activity of a clinically relevant HERV family in selected human tissues. We believe that strategies to specifically generate significantly greater numbers of HERV-specific cDNAs from human tissues and selected cellular conditions can be established in a straightforward fashion. Current sequencing technologies can produce significant amounts of HERV cDNA sequences in a short time that can then be used to identify transcriptionally active HERV loci. We furthermore believe that a HERV Transcriptome Project would best be pursued in a collaborative fashion, that is, the project should be open to researchers proposing analysis of specific HERV families for tissues or cellular conditions of specific interest.