Repetitive elements make up a large portion of most eukaryotic genomes, and large-scale studies of their transcriptional activity are attracting increasing interest. Many genomic repeats originated from insertions of transposable elements. Retroelements (REs), which proliferate via RNA intermediates, are the only group of transposable elements known to be transpositionally active in mammals. In vertebrates, REs occupy up to 30–40% of the genome (1). As mobile carriers of transcriptional regulatory modules, REs can affect the regulation of host genes, in particular those involved in embryonic development, and are therefore probable candidates for a role in speciation (5).
It was recently demonstrated that REs can drive the transcription of unique, non-repetitive host sequences (6). Many kinds of genomic repeats are known to be transcribed in vivo. However, a significant portion of such expressed repeats was found within larger transcripts driven from upstream genomic promoters. Conventional methods for transcriptome analysis, such as RT–PCR, differential display (10), subtractive hybridization (12), serial analysis of gene expression (15) and microarray hybridization, cannot distinguish read-through transcripts from transcripts that arise from the intrinsic promoter activity of genomic repeats. Various modifications of the 5′ rapid amplification of cDNA ends (RACE) technique can precisely locate transcription start sites (16), but they are not suited to quantitative, large-scale transcriptome screening. We aimed to develop a transcriptome-wide strategy for detecting the intrinsic promoter activity of repetitive elements. To this end, we sought to combine the advantages of 5′-RACE and nucleic acid hybridization techniques.
Here, we describe an approach termed GREM (Genomic Repeat Expression Monitor), which is based on hybridization of total pools of cDNA 5′-terminal parts to genome-wide pools of DNA flanking repetitive elements, followed by selective PCR amplification of the resulting hybrid cDNA–genome duplexes. The resulting library of cDNA/genomic DNA hybrid molecules can be used as a set of tags for individual transcriptionally active repetitive elements. The method is both qualitative and quantitative: the tags identify which elements are promoter-active, and the number of tags for an element is proportional to the amount of mRNA driven from that element's promoter.
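The quantitative reading of the tag library can be illustrated with a minimal sketch. It assumes that each sequenced tag has already been assigned to a single repeat locus; the locus names and the function `relative_promoter_activity` are hypothetical and are not part of the GREM protocol itself.

```python
from collections import Counter


def relative_promoter_activity(tag_loci):
    """Return each locus's share of all tags in the library.

    tag_loci: iterable of locus identifiers, one per sequenced tag.
    Under the proportionality stated above, a locus's share of tags
    estimates its share of repeat-driven mRNA in the sample.
    """
    counts = Counter(tag_loci)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {locus: n / total for locus, n in counts.items()}


# Hypothetical tag assignments (illustrative only, not data from the study).
example_tags = ["LTR_A", "LTR_B", "LTR_A", "LTR_C", "LTR_A", "LTR_B"]
activity = relative_promoter_activity(example_tags)

# Print loci from most to least represented.
for locus, share in sorted(activity.items(), key=lambda kv: -kv[1]):
    print(f"{locus}: {share:.2f}")
```

Shares rather than raw counts are reported here so that libraries of different sequencing depth can be compared; this is a design choice of the sketch, not a requirement of the method.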
We applied GREM to the genome-wide recovery of promoter-active human-specific endogenous retroviruses. HERV-K (HML-2) is the only family of endogenous retroviruses known to contain human-specific members (17). This group, whose members have not only retained their transcriptional activity (19) but probably still possess some infectious potential (20), is thought to be among the most biologically active retroviral families of the human genome (22). A large proportion of endogenous retroviruses have undergone homologous recombination between their long terminal repeat (LTR) sequences, and this family is now represented mostly by solitary LTRs (25). Human-specific HERV-K (HML-2) LTRs share significant sequence identity and form a well-defined cluster (named the HS family) on a phylogenetic tree (17). The HS family is characterized by diagnostic nucleotide substitutions within the consensus sequence of HS LTRs (17). It contains 156 LTR sequences, most of which (~86%) are human-specific. HS family members occur as parts of full-sized HERV-K (HML-2) proviruses (11.5% of individual HS representatives), as truncated proviruses (5.2%) or as solitary LTRs (83.3%). Here we describe the results of the first genome-wide identification of LTRs that serve as human-specific promoters in vivo in germ-line tissue, and we report the first comprehensive genomic map of transcriptionally active HS LTRs.
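As a rough check on the HS family composition quoted above, the percentages can be converted into approximate element counts. The sketch below assumes that all three percentages refer to the full set of 156 HS sequences; the counts are rounded estimates derived from the stated figures, not values reported in the text.

```python
HS_FAMILY_SIZE = 156

# Fractions of HS family members by structural class, as quoted above.
FRACTIONS = {
    "full-sized provirus": 0.115,
    "truncated provirus": 0.052,
    "solitary LTR": 0.833,
}


def approximate_counts(total, fractions):
    """Convert class fractions into rounded element counts."""
    return {name: round(total * f) for name, f in fractions.items()}


counts = approximate_counts(HS_FAMILY_SIZE, FRACTIONS)
for name, n in counts.items():
    print(f"{name}: ~{n}")
```

The three fractions sum to 100%, and the rounded counts sum back to 156, so the quoted composition is internally consistent.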