Sequences of retroviral origin occupy approximately 8% of the human genome. Most of these “retroviral” genes have lost their coding capacities since their entry into our ancestral genome millions of years ago, but some reading frames have remained open, suggesting positive selection. The complete sequencing of the human genome allowed a systematic search for retroviral envelope genes containing an open reading frame and resulted in the identification of 16 genes that we have characterized. We further showed, by quantitative reverse transcriptase PCR using specifically devised primers which discriminate between coding and noncoding elements, that all 16 genes are expressed in at least some healthy human tissues, albeit at highly different levels. All envelope genes disclose significant expression in the testis, three of them have a very high level of expression in the placenta, and a fourth is expressed in the thyroid. Besides their primary role as key molecules for viral entry, the envelope genes of retroviruses can induce cell-cell fusion, elicit immunosuppressive effects, and even protect against infection, and as such, endogenous retroviral envelope proteins have been tentatively identified in several reports as being involved in both normal and pathological processes. The present study provides a comprehensive survey of candidate genes and tools for a precise evaluation of their involvement in these processes.
Completion of the sequencing of the human genome has led to the conclusion that a significant fraction (approximately 8%) of our genome is of retroviral origin, with thousands of proviral sequences disclosing similarities with the integrated form of infectious retroviruses (23). These elements, also called human endogenous retroviruses (HERVs), are most probably the traces of “ancient” infections of the germ line by active retroviruses, which have thereafter been transmitted in a Mendelian manner. According to sequence homologies, these elements can be grouped into distinct families, with copy numbers ranging from a few to several hundred per haploid genome. Families have been tentatively named according to the tRNA normally used to prime reverse transcription in a retroviral replicative cycle (reviewed in references 25 and 43). In agreement with the proposed evolutionary scheme for the presence of these proviral elements, strong similarities between HERVs and the present-day infectious retroviruses can be observed at the sequence level and in several instances at the functional level. Indeed, phylogenetic analyses based on either the highly conserved reverse transcriptase (RT) domain of the pol gene or the transmembrane (TM) moiety of the envelope gene reveal interspersion of HERVs and infectious elements, suggesting a common history and shared ancestors (4, 40). HERV-encoded retrovirus-like particles have been detected in some tissues by electron microscopy, revealing structural similarities to exogenous retroviruses (10). It has also recently been demonstrated that the HERV-K(HML-2) family can encode a regulatory protein (called Rec or cORF) which is functionally homologous to the Rev protein encoded by the human immunodeficiency virus (26, 44) and that the coding envelope protein of the HERV-W family can still interact with the cellular receptor of the present-day D-type retroviruses (8).
Finally, it has been shown for endogenous retroviruses of other species that their replicative cycle is closely related to that of exogenous retroviruses, with evidence for retrovirus-like recombination in the course of the reverse transcription step (20). As a consequence of the close relationship between HERVs and exogenous infectious retroviruses, it can be proposed that HERVs may still possess some of the functions of infectious retroviruses and as such have pathogenic effects, provided that they are transcriptionally active. Conversely, it is plausible that HERV proteins may have been subverted by the host for its benefit. Along this line, it has been proposed that the HERV envelope proteins could play a role in several processes, including (i) protection against infection by closely related exogenous retroviruses via receptor interference (5, 9, 37), (ii) protection of the fetus from the mother's immune system via a domain (the immunosuppressive domain) located in the TM subunit of most retroviral envelope proteins and known to inhibit immune effector functions (12, 28), and (iii) placenta formation via envelope protein-mediated fusogenic effects that could be involved in the generation of the syncytiotrophoblast (8, 31). An assessment of these putative roles of HERV is rendered extremely difficult as a result of the multicopy nature of these elements, which precludes classical genetic approaches. However, it has to be taken into account that a large fraction of endogenous retroviral genes are no longer coding genes, due to the accumulation of mutations, frameshifts, and deletions. Accordingly, we have made an exhaustive survey of the human genome for complete proviral elements, and specifically those containing a complete and coding env gene. Interestingly, this survey has led to the identification of a limited number of genes that we have characterized. 
To get insight into their potential functions, we devised a quantitative RT-PCR assay using primers allowing amplification of the coding copies specifically and provide the transcriptome of these human genes of retroviral origin in a large panel of healthy tissues. This analysis results in the unraveling of a series of new transcriptionally active human genes whose functions can now be appraised by classical molecular genetic approaches.
Homology searches were performed using the BLASTN program at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST), screening the finished (nr [nonredundant]) and unfinished (htgs [high-throughput genomic sequence]) databases with consensus envelope genes for each family. Hydropathy of the envelope proteins was calculated by the Kyte-Doolittle method implemented by the DNA Strider program (29). Alignments were performed with the CLUSTALW multialignment program at Infobiogen (http://www.infobiogen.fr). Chromosomal localizations of the coding env genes were performed at the Ensembl web site (http://www.ensembl.org/Homo_sapiens).
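As an illustration of the hydropathy calculation mentioned above, the following minimal sketch computes a sliding-window Kyte-Doolittle profile. The scale values are the published Kyte-Doolittle indices; the function name `hydropathy_profile` and the default window of 19 residues are assumptions made for this example and are not taken from the DNA Strider settings used in the study.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the protein.

    The window size is a hypothetical default chosen for illustration.
    Non-standard residue codes raise KeyError.
    """
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KD[aa] for aa in protein]
    total = sum(scores[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile
```

Stretches where the profile stays strongly positive over roughly 20 residues are the usual candidates for membrane-spanning segments, which is how a transmembrane domain or a fusion peptide would show up in such a plot.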
The coding env genes for in vitro transcription-translation assays were amplified with the Expand long-template PCR system (Roche, Indianapolis, Ind.). PCRs were carried out for 35 cycles (1 min at 94°C, 30 s at 58°C, 2 min at 68°C) using 1 ng of bacterial artificial chromosome (BAC) DNA. Primers were as follows: envHT7.3 (GCTAATACGACTCACTATAGGAACAGACCACCATGCACCACAGTATCAACCTTAC)and flR2B1 (TTCTGTTTCAGCTACAACTCTGT) for envH1; envHT7.3 and flRB2 (AGCAATAGTTTGTTAAATTC) for envH2; envHT7.2 (GCTAATACGACTCACTATAGGAACAGACCACCATGAGGGCACCCTCCAATACTTC) and flRB3 (ACCCCATGTTCTAGTCTTCC) for envH3; envKT7 (GCTAATACGACTCACTATAGGAACAGACCACCATGGAGATGCAAAGAAAAGCA) and LTRenvsK (GTGAACAAAGGTCTTTGCATCATAG) for envK1, envK3, envK5, and envK6; envKT7 and 3′fl51C12 (GAATTAGGCTTTCGGGACTTGAA) for envK2; envKT7 and KLTR3′ (C/TTTAAC/G/AG/AAGCATGCTGC/AC) for envK4; envTT7.2 (GCTAATACGACTCACTATAGGAACAGACCACCATGTTGGATTCATCACTCCCA) and envTflanq2 (CTGAAGGGAGTTCCTCCTAGG) for envT; envWT7 (GCTAATACGACTCACTATAGGAACAGACCACCATGGCCCTCCCTTATCAT) and envW64flR2 (ACAGCCAAGCAGGTACAG) for envW; envFRDT7.2 (GCTAATACGACTCACTATAGGAACAGACCACCATGCTCCTGCTGGTTCTCATTC) and envFRDfl3′ (CTGCAGCAGACTCCATCCTTG) for envFRD; enverv3T7 (GCTAATACGACTCACTATAGGAACAGACCACCATGACTAAAACCCTGTTGTATCA) and enverv3AS (GTTAATACTTAGTTAGGGCC) for envR; HS89F2T7 (GCTAATACGACTCACTATAGGAACAGACCACCATGGATCCACTACACACGATTGA) and HS89R3fl (TGTTTTGGGACACCACGAAT) for envR(b); envF(c)2T7 (GCTAATACGACTCACTATAGGAACAGACCACCATGAATTCTCCATGTGAC) and envF(c)2flR3 (GACACTTAATAGTTGCGACA) for envF(c)2; and envF(c)1T7 (GCTAATACGACTCACTATAGGAACAGACCACCATGGCCAGACCTTCCCCACTATGC) and envF(c)1fl3′ (GCCTTGGCAACTAAACCATTC) for envF(c)1. T7 promoter-containing PCR products were ethanol precipitated, and 200 ng of the amplification products was used in the TNT coupled reticulocyte lysate system (Promega Corp., Madison, Wis.) according to the manufacturer's instructions, with [3H]methionine (ICN Biomedicals Inc., Irvine, Calif.) for protein labeling. 
After electrophoresis of the translation products, sodium dodecyl sulfate-polyacrylamide gels were impregnated with Amplify (Amersham Biosciences, Piscataway, N.J.), rinsed with water, dried, and autoradiographed.
Human BAC clones containing the identified coding envelope genes were purchased from BACPAC Resources (Oakland, Calif.). RNAs from various human tissues were purchased from Clontech (Palo Alto, Calif.), Stratagene (La Jolla, Calif.), and Ambion (Austin, Tex.). Quality of the RNA was assessed on an RNA LabChip (Agilent 2100 Bioanalyzer), and RNA concentration was quantified spectrophotometrically. Five micrograms of each RNA sample was subjected to DNase treatment (DNA-free; Ambion) to eliminate DNA contaminants. One microgram of RNA from each sample was reverse transcribed in a 20-μl reaction using 50 U of Moloney murine leukemia virus RT and 20 U of RNase inhibitor (Applied Biosystems, Foster City, Calif.) per reaction, 1 mmol of dA/T/G/C (Amersham-Pharmacia Biotech, Uppsala, Sweden) per liter, 5 mmol of MgCl2 per liter, 10 mmol of Tris-HCl (pH 8.3) per liter, 10 mmol of KCl per liter, and 2.5 μmol of random hexamers (Applied Biosystems) per liter. The cDNAs were then diluted 1/25 in nuclease-free H2O (Promega Corp.).
Oligonucleotides were designed with the computer program Oligo (MedProbe, Oslo, Norway). The special requirements were a melting temperature of 60°C and an amplicon length between 80 and 350 bp. Oligonucleotides were purchased from MWG (Ebersberg, Germany).
Real-time quantitative PCR was performed using a cDNA equivalent of 20 ng of total RNA. The reaction was carried out in 25 μl using SYBR green PCR core reagents (Applied Biosystems) according to the manufacturer's instructions. PCR was run on the ABI PRISM 7000 sequence detection system (Applied Biosystems). Amplification was performed using a 2-min step at 50°C and then a 10-min denaturation step at 95°C, followed by 40 cycles of 15 s of denaturation at 95°C and 1 min of primer annealing and polymerization at 60°C. To normalize for differences in the amount of total RNA added to the reaction, amplification of 18S rRNA was performed as an endogenous control (variation was less than a factor of 3.5). The primers and probe for 18S rRNA were purchased from Applied Biosystems. The relative expression in each sample was calculated with respect to a standard calibration curve (a dilution series of genomic DNA). Each sample was analyzed at least twice.
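The standard-curve quantification described above can be sketched as follows. This is a minimal illustration, not the instrument software's procedure: threshold cycle (Ct) values from a dilution series are fitted to a straight line against log10 of the input quantity, and unknown samples are read off that line. The function names are hypothetical, and dividing the target quantity by the 18S quantity is an assumption about how the normalization might be applied; the text does not specify the exact arithmetic.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the fitted line to estimate the starting quantity for a Ct."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_ct, target_curve, ref_ct, ref_curve):
    """Target quantity divided by reference (e.g. 18S rRNA) quantity.

    Assumption: normalization by simple ratio; each curve is a
    (slope, intercept) pair from fit_standard_curve.
    """
    target_qty = quantity_from_ct(target_ct, *target_curve)
    ref_qty = quantity_from_ct(ref_ct, *ref_curve)
    return target_qty / ref_qty
```

With a perfectly efficient reaction, each 10-fold dilution shifts the Ct by about 3.32 cycles, so a fitted slope close to -3.32 is the usual sanity check on a dilution series.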
The control plasmid (p11env) containing the 11 envelope amplicons obtained with the primers listed in Table 1 was constructed first by cloning each amplicon into the pGEMT vector and then by successive three-fragment ligations using fragments from these 11 plasmids and from construct intermediates. A control series of p11env plasmid dilutions ranging from 1 to 0.008 ng was amplified with each primer set to measure relative yield (variation of less than a factor of 2). A control tube containing 1 ng of the p11env plasmid was included in each real-time PCR assay as a reference.
The rationale for the screening procedure to identify the coding envelope genes in the human genome is illustrated in Fig. 1. Basically, we used the Repbase database (final control search with update 8.2.1, Oct. 2002) (22), in which each family is represented by a “consensus” element, built from an alignment of all the proviruses belonging to the same family (i.e., copies which cluster together in phylogenetic trees). We supplemented it with RT- and TM-based searches as described in references 4 and 40, which revealed two additional families, the F(c)1 and F(c)2 families. Overall, approximately 100 HERV families were identified. For each family, the envelope gene was tentatively delineated, with the pol gene (which was easily positioned due to its high degree of conservation among retroviral elements) at its 5′ end and with the retroviral long terminal repeat (LTR) at its 3′ end. When the distance between the pol gene and LTR was greater than 1.4 kb (the average length of an envelope gene being 1.9 kb), the envelope gene was considered potentially complete and the HERV family was selected. The resulting list of the 34 corresponding HERV families is given in Table 2. For each family, we then performed a BLAST query on human genome databases using the envelope gene as a probe. Overall, it yielded 476 potentially complete envelope genes (see Table 2 for their distribution among the HERV families). The coding status of each identified env gene was finally assessed, and only those with an open reading frame (ORF) beginning at the first Met codon of the consensus envelope gene and uninterrupted over >90% of this gene were retained. Among the 476 envelope genes that were individually analyzed, only 16 genes were found to potentially encode envelope proteins. These 16 genes are listed and described in Table 3 and Fig. 2.
Ten of these genes had previously been identified (2, 7, 13, 14, 24, 30, 38, 41), and six new genes emerged from this screen, including genes from the FRD, T, R(b), F(c)1, and F(c)2 families and one supplementary gene from the HERV-K(HML-2) family. The chromosomal localization was determined or confirmed by using the Ensembl web site and the corresponding accession numbers. It is noteworthy that one of these genes, envK2, can be found in duplicate in some individuals, since the corresponding provirus is organized as a tandem repeat (35). The two env sequences are 100% identical at the nucleotide level and are both referred to here as the envK2 gene. As can be observed in Fig. 2, which describes the 16 putative envelope proteins in comparison with the Moloney leukemia virus envelope protein, the overall length of the proteins is variable, with a minimum of 514 amino acids for EnvR(b) and maximum of 699 amino acids for EnvK. This size variability is also found among exogenous retroviral envelope proteins since, for example, human T-cell leukemia virus type 1 and human immunodeficiency virus type 1 envelope proteins are 488 and 861 amino acids long, respectively.
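The coding-status criterion used in the screen (an ORF beginning at the first Met codon of the consensus envelope gene and uninterrupted over >90% of that gene) can be illustrated with a minimal sketch. It assumes that each candidate sequence has already been trimmed so that `start` points at the position matching the consensus start codon, and that the consensus length is supplied in nucleotides; the function names and these simplifications are hypothetical and do not reproduce the alignment-based assessment actually performed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_fraction(env_dna, consensus_length_nt, start=0):
    """Fraction of the consensus env length read before the first in-frame stop.

    Returns 0.0 when the codon at `start` is not ATG.
    """
    seq = env_dna.upper()
    if seq[start:start + 3] != "ATG":
        return 0.0
    pos = start
    # Walk codon by codon until a stop codon or the end of the sequence.
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            break
        pos += 3
    open_nt = pos - start
    return min(open_nt / consensus_length_nt, 1.0)


def is_potentially_coding(env_dna, consensus_length_nt, start=0, threshold=0.90):
    """Apply the >90% criterion (threshold taken from the text)."""
    return open_fraction(env_dna, consensus_length_nt, start) > threshold
```

A frameshifting insertion or deletion typically exposes an out-of-frame stop codon shortly downstream, so this simple read-through test also rejects most frameshifted copies even though it does not model indels explicitly.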
To ascertain the existence of the ORFs inferred from the nucleotide sequences, an in vitro transcription-translation assay was performed with human BAC clones containing the identified coding envelope genes. The 16 BACs were obtained from BACPAC Resources, except for the H1-, H2-, and H3-containing BACs that had been cloned previously (14). The in vitro transcription-translation assay was performed directly on the env PCR products (generated by using a forward T7-containing primer at the env 5′ end and a reverse primer downstream of the stop codon). In all cases, translation products of the size expected from the sequences in the database were obtained (Fig. 3). For some envelope proteins, additional bands of lower molecular weight were observed, compatible with initiation at internal sites, and interestingly, in one case [F(c)2], additional bands at higher molecular weights were detected, most probably associated with the presence of a frameshift signal at the gene's 3′ end (3).
The predicted hydrophobic profiles of the 16 proteins represented in Fig. 2 allow the identification of characteristic domains of these envelope proteins, namely, the fusion peptide, located just downstream of the proteolytic cleavage site between the surface (SU) and TM moieties of the envelope proteins, and the transmembrane domain of the TM subunit, which permits the anchorage of the envelope protein in the membrane. Notably, this analysis revealed that one envelope protein (EnvR) seems to be devoid of a fusion peptide and that two envelope proteins [EnvR and EnvF(c)2] disclose a premature stop codon just upstream of their transmembrane hydrophobic domain. Two other domains are delineated in Fig. 2, which are characteristic features of the envelope proteins belonging to the C-type and D-type retroviruses: the CWLC domain, involved in the interaction between the SU and TM moieties (34), and the CKS17-like immunosuppressive domain (12). It is noteworthy that, like B-type retrovirus, lentivirus, and spumavirus envelope proteins, the six EnvK proteins lack these two domains.
To get insight into the expression profile of these genes in a quantitative manner, we devised a real-time RT-PCR strategy which uses specific primers designed in such a way that only envelope genes with an ORF should be amplified among all the envelope genes of a given family. To do so, for each HERV family containing a member with a coding element, env nucleotide sequences were aligned with the CLUSTALW program and primers were designed within domains of maximal divergence between the coding copy and the other copies. Primers for SYBR green amplification were devised with their 3′ ends forced at nucleotide positions with, again, maximum divergence between the coding sequence and the others. For the HERV-K(HML-2) family, which contains six coding env genes, this strategy could not be applied because sequence conservation between copies is too high. In this case, specific primers were devised that matched all six coding genes, tentatively excluding most other HERV-K(HML-2) envelope genes. The complete list of devised primers is given in Table 1.
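The idea of anchoring primer 3′ ends at positions of maximal divergence can be sketched as a scan over a multiple alignment. The following minimal example, which assumes pre-aligned sequences of equal length with `-` as the gap character, lists the alignment columns where the coding copy differs from every other copy of the family. The function name is hypothetical, and the actual primer selection also involved melting temperature and amplicon length constraints that are not modeled here.

```python
def discriminating_columns(coding, others):
    """Return alignment columns where the coding copy differs from all others.

    `coding` and every entry of `others` must be aligned strings of equal
    length. Columns where the coding copy has a gap are skipped, since a
    primer 3' end cannot sit on a missing base.
    """
    if any(len(seq) != len(coding) for seq in others):
        raise ValueError("all aligned sequences must have the same length")
    columns = []
    for i, base in enumerate(coding):
        if base == "-":
            continue
        if all(other[i] != base for other in others):
            columns.append(i)
    return columns
```

Candidate primers would then be chosen so that their 3′-terminal base falls on one of the returned columns, since a mismatch at the 3′ end is the one most likely to prevent extension on the noncoding copies.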
To determine whether the resulting primers fulfilled the requirements for both efficiency and specificity, a first series of assays was performed by PCR amplification of human genomic DNA. As expected, in all cases a single band was observed, thus excluding nonspecific amplifications or amplifications of elements of unusual size (data not shown). Then, PCR products for each couple of primers were cloned into a pGEM-T vector, and at least six clones per amplicon were sequenced for each envelope gene [26 clones for the HERV-K(HML-2) family; see below]. In all cases, the six clones for a given envelope gene were identical and unambiguously corresponded to the coding sequence, being different from all the other aligned sequences within each HERV family. For the HERV-K(HML-2) family, 26 clones were sequenced, of which 21 had an identical sequence corresponding to the sequence of the six coding envelope genes, and 5 corresponded to two other HERV-K(HML-2) envelope genes. Despite this parasitic amplification of noncoding envelope genes, the HERV-K(HML-2) coding envelope primer set was retained since it allowed the amplification of all the coding envelope genes.
A second series of assays was then performed to determine the yield of each pair of primers in order to normalize env expression levels. To do so, we constructed a single plasmid (see Materials and Methods) which contained the complete set of 11 amplicons [the six envK(HML-2) genes being amplified by the same primer set]. This plasmid, used as a “control” template for each pair of primers, allowed a refined normalization for possible differences in the yield of the various primers and was used in each real-time amplification below.
A systematic screening of the expression level of the 16 coding env genes present in the human genome was achieved with the primers listed above, on a series of 19 healthy human tissues (Fig. 4). RNAs were subjected to DNase treatment and reverse transcribed using murine leukemia virus RT, and a first real-time quantitative PCR was performed using primers for 18S rRNA to normalize for differences in the amount of total RNA added to the reaction mixture (less than a 3.5-fold variation among samples). Real-time PCR was then performed for each pair of specific primers. Control PCRs performed on RNA without the RT step never resulted in any amplification, as expected. The expression levels represented in Fig. 4 using a logarithmic gray scale point to several interesting features. Firstly, there is an enormous variation in the level of expression (up to 4 log) among the different tissues for a given gene as well as among the different env genes. Secondly, and at a more refined level, it appears that all genes are transcribed at a significant level in the testis, still with variations (over 100-fold) depending on the gene tested. Thirdly, the placenta is the organ where maximal expression can be observed for envR, envW, and envFRD, but the other env genes are expressed very poorly or not at all in this organ. Fourthly, thyroid exhibits a specific and high-level expression of the envT gene not observed for the other env genes in this organ. Fifthly, there is no expression of coding env genes in heart and liver, except for envR (the lowest level of expression for this gene). Finally, two groups of coding envelope genes can be inferred from these transcriptional data: those which display severe tissue specificity together with an overall low expression level, namely, the three envH genes, envR(b), envF(c)1, and envF(c)2, and those which are transcribed in the majority of the tissues, namely, envK, envT, envW, envFRD, and envR.
The latter gene is singular, as its level of expression is extremely high in all tissues tested, with the highest value among all coding env genes (with the exception of the thyroid for envT).
The extensive survey of the human genome performed here reveals that among the >10,000 retroviral elements clustered into approximately 100 families, only 16 possess a coding retroviral envelope gene (Table 2 and Fig. 2). The most important contributor to the coding envelope genes is the HERV-K(HML-2) family, since it comprises six coding envelope genes out of 35 full-length envelope gene copies. The status of the HERV-K(HML-2) family is unique among the coding env-containing families in several respects. Although elements of this family first entered the primate genome more than 30 million years ago (36), new proviral copies have been generated in the recent past. Actually, the date of entry into the human genome of four of the six HERV-K(HML-2) proviruses with a coding env gene (K1 [data not shown] and K2, K3, and K4) has been estimated to be less than 5 million years ago. In addition, the proviruses K5 and K6 are polymorphic in humans with allele frequencies of 0.04 and 0.19 (41). Furthermore, the six HERV-K(HML-2) proviruses with coding env have ORFs in other genes [all six have ORFs in their gag genes, four have ORFs in their pro genes, and four have ORFs in their pol genes, resulting in three completely coding HERV-K(HML-2) proviruses], and among the 29 other env-containing proviruses of this family, 10 additional retroviral coding genes can be found.
In comparison with the HERV-K(HML-2) family, the other env-containing families can be considered “old families,” with no human-specific integrations. The date of entry into the primate genome of the coding env gene-containing proviruses varies from 10 million years (for the HERV-Hp62 provirus) to more than 45 million years [for the coding env-containing HERV-R(b) provirus (data not shown)]. Furthermore, no entirely coding gene besides the env genes has been found so far among those families.
We have devised an efficient method to detect the expression of specific genes belonging to large multigenic families with high homology between their members. It allowed us to quantitatively and specifically monitor the expression level of these endogenous retroviral genes, a task which would be impossible using the Northern blot or classical RT-PCR method, which detect the overall expression of the complete set of elements among each family. The first important outcome in the observed transcriptional pattern is that all genes are transcribed, at least in the testis. Despite the low (although unambiguous) transcriptional level of some of the env genes in this organ [e.g., envR(b), envF(c)2, and the three envH genes], this indicates that all promoters are active. The fact that testis is an organ in which all coding envelope genes are transcribed is not a totally unexpected result, since germ line expression is a common feature of transposable elements in other species, including mice (17, 39) and Drosophila (reviewed in references 11 and 18). In these species, expression in the germ line is associated with a high transpositional activity which may result in stably inherited mutations and most probably plays a role in the generation of genetic diversity in the course of evolution. For the placenta, expression of the coding env genes is much more heterogeneous than in the testis, with an extremely high expression level detected in this organ for three envelope genes, i.e., for the newly identified envFRD gene and for envR and envW (for the latter two, high-level expression had previously been observed at the protein level by using antibodies [8, 42]). Conversely, for five coding envelope genes [namely, the envH genes, envF(c)2, and envF(c)1], expression is undetectable. 
Several studies on HERV transcription have been performed by Northern blot analyses (reviewed in reference 25), which revealed a preferential expression of all HERV families in the placenta (except for the HERV-K family). It was suggested that a release of retroelements expression might take place in this organ without deleterious consequences for the individual due to the “provisional” status of this organ. Our results suggest that such a “generalized” expression is likely not to stem from all HERV members among each family but rather results from the activation of a limited number of proviral copies (possibly noncoding), for instance via position effects (for an example, see reference 17). Finally, the high-level expression of the newly identified envT gene detected in the thyroid is intriguing because it is one of the rare coding env genes highly expressed in a “permanent” healthy tissue and it is highly expressed only in this organ. This expression might be relevant to the hormone-producing status of the thyroid gland (and this interpretation might similarly hold for envR in the adrenal gland) and to specific sequences in the HERV LTR or in the vicinity of the proviral insertion site.
Detection of an active transcription of the coding envelope genes in healthy tissues together with the probable positive selective pressure responsible for the conservation of ORFs throughout millions of years raises the question of a putative role of these genes in normal cell physiology. The envelope proteins of exogenous counterparts of HERVs possess several functions besides their primary role as key molecules for viral entry: most of them elicit immunosuppressive effects, and some of them have the capacity to create cell-cell fusion. Consequently, the high-level expression of three endogenous coding env genes observed in the placenta (namely, envR, envW, and the newly identified envFRD) could be implicated in two major physiological processes of this organ: fusion of the cytotrophoblast cells to form the syncytiotrophoblast layer and local immunosuppression at the materno-fetal barrier. Along the first line, it has been shown that the envW gene product is a highly fusogenic protein (8). Furthermore, inhibition of envW expression in primary cultures of human villous cytotrophoblasts leads to a decrease in trophoblast fusion and differentiation, suggesting a role in syncytiotrophoblast layer formation (19). Interestingly, the protein encoded by envFRD (also highly expressed in the placenta) might have a similar role, as it can also generate cell-cell fusion (S. Blaise, personal communication). In the case of envR, in contrast, we had previously ruled out its possible implication in fundamental processes of placenta formation by the discovery of a premature stop mutation present in 16% of Caucasians in the heterozygous state and in 1% in the homozygous state (16). Another property of retroviral envelope proteins is to inhibit infection by retroviruses sharing the same receptor—i.e., receptor interference (reviewed in references 5 and 37). 
It is therefore plausible, though again difficult to demonstrate, that some of the identified endogenous envelope proteins, as demonstrated in the mouse for the Fv4 locus (21), protect cells against infections by exogenous retroviruses. Interestingly, it has been demonstrated that EnvW interacts with the type D mammalian retrovirus receptor (8), and this envelope protein could therefore play a protective role at the placental barrier.
HERV-encoded proteins have also been tentatively involved in several human pathologies, including cancer, immune disorders, and neurological diseases (reviewed in references 25 and 32). By analogy with animal models, it is particularly tempting to implicate endogenous retroviral envelope proteins in tumorigenesis as a result of a transforming (1, 33) or immunosuppressive (6, 27, 28) effect. Numerous studies report on expressed endogenous retroviral sequences in pathological tissues, but in most cases the techniques used (Northern blot or RT-PCR using degenerate primers) did not allow the specific detection of coding sequences. Accordingly, the significance of these expressions remains elusive, and an association with diseases is only speculative. With the help of the primers devised in this study, a systematic and quantitative evaluation of the transcription levels of coding envelope genes in pathological versus healthy tissues should now be possible.
In conclusion, the present survey of coding env genes of the human genome provides a comprehensive list of candidate genes for the possible involvement of HERVs in human physiology and physiopathology. Furthermore, we show that all of the identified genes can be expressed at least in some tissues, and we provide tools to specifically evaluate their transcriptional status in physiological and pathological conditions. The identified genes should allow further studies of their function via classical genetic approaches (identification of susceptibility loci and searches for polymorphisms among the human population as previously performed for the HERV-R locus), while the limited number of unraveled coding sequences reduces a hitherto insoluble multigenic analysis to a simpler one.
This work was supported by the CNRS and by grants from the Ligue Nationale contre le Cancer (Equipe Labellisée).