This study describes an attempt to systematically identify and characterize endogenous retroviruses in the cow genome. Although we used only genomic sequence located on chromosomes, leaving contigs untested, we identified nearly 10,000 putative BoERVs that were distributed in a nonhomogeneous way across chromosomes. By comparing three different methods for ERV detection, we found that each method yields different and, in some cases, discordant information.
The BLAST-based search detected the fewest elements (928 elements), most of which were also detected by Retrotector. As the criteria used in the BLAST-based search were quite strict, the elements that it detected could be considered to be highly conserved ERVs.
LTR_STRUC detected 4,487 elements. It identified more elements without the RT region than did BLAST and Retrotector. It also detected many elements that were not identified by BLAST and Retrotector. Because LTR_STRUC is designed to find elements flanked by LTRs, it may be able to detect elements with a noncanonical structure (22).
Retrotector detected the most putative BoERVs (9,698) and had the most overlapping detections. In most of the elements detected by Retrotector, all three main genes were identified. Retrotector thus appears to be more efficient than BLAST-based detection and able to detect elements that are not as highly conserved (38).
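The notion of "overlapping detections" between methods can be illustrated with a short sketch. The criterion actually used to decide that two programs found the same element is not restated in this section, so the function below assumes a simple rule (same chromosome and at least `min_overlap` shared bases); the function name, the coordinate convention, and all coordinates are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def count_overlapping(query, reference, min_overlap=1):
    """Count query elements that overlap at least one reference element.

    Elements are (chrom, start, end) tuples with 0-based, half-open
    coordinates. ASSUMPTION: two detections count as the same element
    when they share at least `min_overlap` bases on the same chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, start, end in query:
        for r_start, r_end in by_chrom.get(chrom, []):
            shared = min(end, r_end) - max(start, r_start)
            if shared >= min_overlap:
                hits += 1
                break  # count each query element once
    return hits


# Hypothetical coordinates, not taken from the study.
blast_hits = [("chr1", 1000, 8000), ("chr1", 50000, 56000), ("chrX", 2000, 9000)]
retrotector_hits = [("chr1", 500, 8500), ("chr2", 50000, 57000), ("chrX", 8900, 15000)]

shared_any = count_overlapping(blast_hits, retrotector_hits)
shared_strict = count_overlapping(blast_hits, retrotector_hits, min_overlap=500)
print(shared_any, shared_strict)
```

A stricter `min_overlap` lowers the number of shared detections, which is one reason why overlap figures depend on the criterion chosen.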
Comparison of different genomes is problematic because various methods have been used to detect ERVs. In previous studies of human (20), mouse (26), rat (7), dog (21), cat (30), and cow (5), RepeatMasker and Repbase were used to detect repetitive elements. However, as stated previously by Sperber et al. (38), results from RepeatMasker and Retrotector cannot be directly compared because the RepeatMasker output is difficult to organize into proviruses. In addition, Retrotector rarely detects elements less than 1,000 bp long, whereas RepeatMasker can detect much shorter repeats and single LTRs. Moreover, the secondary integration of proviruses into each other, a feature of old elements, can also be a problem (38).
In a previous study of the cow genome, 142,096 ERVs were detected with PALS/PILER (5), while we identified 928 with BLAST, 4,487 with LTR_STRUC, and 9,698 with Retrotector. The genome coverage of the elements detected by the different programs was also discordant: 1.75% of the genome by PALS/PILER, 0.36% by BLAST, 1.77% by LTR_STRUC, and 4.29% by Retrotector. These data show that LTR_STRUC and Retrotector reach similar or greater coverage with far fewer elements. Thus, the abundance of short elements detected by methods such as RepeatMasker and PALS/PILER makes cross-species comparisons difficult. In addition, the classification of the elements detected by the different programs adds complexity to the comparison: RepeatMasker uses the Repbase annotation (13; Smit et al., personal communication), and Retrotector uses its own motif database (38). Thus, we found that RepeatMasker and Retrotector did not routinely sort the same element into the same class. For example, among ERVs classified as class I by the Retrotector method, 64.72% were classified as ERV1 and 35.28% were classified as ERVL by RepeatMasker.
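The effect of short elements on these figures can be made explicit with a small calculation. Mean element length is proportional to coverage divided by element count, so the ratio between two methods does not depend on the genome size, provided both coverage values refer to the same genome length (an assumption here, since the PALS/PILER figures come from a separate study). The counts and percentages are those quoted above; the variable and function names are illustrative.

```python
# Element counts and genome coverage (%) as quoted in the text.
reported = {
    "PALS/PILER": (142_096, 1.75),
    "BLAST": (928, 0.36),
    "LTR_STRUC": (4_487, 1.77),
    "Retrotector": (9_698, 4.29),
}


def relative_mean_length(method, baseline="PALS/PILER"):
    """Ratio of mean element length for `method` versus `baseline`.

    ASSUMPTION: both coverage percentages refer to the same genome
    length, so the genome size cancels out of the ratio.
    """
    count, coverage = reported[method]
    base_count, base_coverage = reported[baseline]
    return (coverage / count) / (base_coverage / base_count)


ratios = {m: relative_mean_length(m) for m in reported if m != "PALS/PILER"}
for method, ratio in ratios.items():
    print(f"{method}: mean element about {ratio:.1f}x longer than PALS/PILER")
```

Under that assumption, the elements reported by each of the three methods used here are on average roughly thirty times longer or more than those reported by PALS/PILER, which is consistent with the latter counting many short fragments and single LTRs.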
One explanation for the different distributions of ERVs across bovine chromosomes could lie in the target elements employed by the methods used to identify ERVs. In the analysis of the chromosomal distribution of the elements detected, each method singled out a different set of chromosomes as departing from a homogeneous distribution. The nature of the elements detected by each method is a plausible reason for this discrepancy.
Across chromosomes, the BLAST-based and Retrotector methods identified significantly more ERVs in the X chromosome than would be expected from a homogeneous distribution. A similar excess of ERVs has been observed for the human X chromosome (41).
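A comparison against a homogeneous distribution can be sketched as a chi-square goodness-of-fit calculation. The statistical test used is not named in this section, so the sketch below assumes that the expected count for each chromosome is proportional to its length; the function name and all counts and lengths are hypothetical and are not taken from the study.

```python
def chi_square_homogeneous(counts, lengths):
    """Chi-square statistic and per-chromosome standardized residuals.

    ASSUMPTION: under a homogeneous distribution, the expected number
    of elements on a chromosome is proportional to its length.
    """
    total_count = sum(counts.values())
    total_length = sum(lengths[chrom] for chrom in counts)

    statistic = 0.0
    residuals = {}
    for chrom, observed in counts.items():
        expected = total_count * lengths[chrom] / total_length
        statistic += (observed - expected) ** 2 / expected
        # Positive residual: more elements than expected.
        residuals[chrom] = (observed - expected) / expected ** 0.5
    return statistic, residuals


# Hypothetical toy data (element counts and chromosome lengths in Mb).
toy_counts = {"chr1": 100, "chr2": 90, "chrX": 110}
toy_lengths = {"chr1": 150, "chr2": 130, "chrX": 120}

statistic, residuals = chi_square_homogeneous(toy_counts, toy_lengths)
enriched = max(residuals, key=residuals.get)
print(f"chi-square = {statistic:.2f}, largest excess on {enriched}")
```

The statistic would be compared with a chi-square distribution with (number of chromosomes minus 1) degrees of freedom, while the residuals indicate which chromosomes drive the departure from homogeneity.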
The number of ERVs detected was positively correlated with chromosome length (P < 0.001 for all three methods) and negatively correlated with the GC content of the chromosome (P < 0.001 for all three methods). No correlation was observed between the number of ERVs detected and gene or pseudogene density. In humans, the number of class I and class III ERVs, but not the number of class II ERVs, has been negatively correlated with GC content (24). The insertion preferences of bovine ERVs should be analyzed in greater detail.
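The correlations reported above can be illustrated with a minimal per-chromosome calculation. The correlation coefficient used is not specified in this section; the sketch below assumes Pearson's r, and all values are hypothetical toy data chosen only to show the direction of the two relationships.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-chromosome values, not taken from the study.
chrom_length_mb = [160, 140, 120, 90, 60]
gc_percent = [40.5, 41.0, 41.8, 43.0, 45.2]
erv_counts = [520, 430, 410, 300, 180]

r_length = pearson_r(chrom_length_mb, erv_counts)
r_gc = pearson_r(gc_percent, erv_counts)
print(f"r(length, ERVs) = {r_length:.2f}, r(GC, ERVs) = {r_gc:.2f}")
```

Because chromosome length and GC content may themselves be correlated, the two coefficients are not independent lines of evidence; a partial correlation or regression would be needed to separate their effects.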
Phylogenetic analysis based on the RT region of a number of selected elements was used to cluster these elements into 24 putative families, which we called BoERV families. Previously, 4 retroviral families were detected (43), which are included in the 24 families that we identified. Although it was previously suggested that the BERV-γ4 family, referred to here as BoERV3, was the most abundant (43), we found that BoERV1 was actually the most abundant. This family had not previously been identified in any mammal. One possible explanation for this is that the members of this family have some nucleotide differences in the region where hybridization took place with the primers used for pig, sheep, and cow (43). We used PCR to amplify a 150-base sequence in sheep, so it is possible that BoERV1 could be a ruminant-specific ERV family.
The comparison of ERV family numbers was limited to four species with defined families (human, chimpanzee, mouse, and rat). In cows, the number of families (24 putative families) was higher than that for mouse (20 families) (23) and lower than those for chimpanzee (42 families) (29) and human (31 families) (15). In the case of rodents, where information is available only for class II elements in two species, the number of families in cow (six families) was similar to those in rat and mouse (seven families) (2). To the best of our knowledge, no information is currently available on dog and cat ERV families.
We did not detect any class III-related ERVs. Although this could be an artifact due to the distance from the reference sequences used for the BLAST-based search and the limits of class III element detection by Retrotector (38), it is more likely that class III ERVs are scarce in the cow genome. In fact, although a number of sequences related to class III were amplified previously by Bénit et al. (3), the amplification signal was weak, and these sequences were quite short.
The relationship between representatives of the ERV families from different species is interesting. In general, the lineages of the different ERV groups are divided following the species phylogeny, with humans and chimpanzees on one side and cows, sheep, and pigs on the other side. Representative elements of the scarce murine class I families were included in our analysis, but their relationship with representative elements of the ERV families of other species remains obscure. Even so, representative elements of the human/chimpanzee groups and, to a lesser extent, mouse/rat and pig/sheep groups tend to follow the pattern of previously reported comparisons of each pair (2). Following this pattern, the representative bovine elements cluster with the representative sheep elements, as obtained by experiments (17) with most of the lineages. In some cow breeds, ovine enJSRV-related env, orf-x, and LTR sequences have been detected (25). However, bovine ERVs closely related to enJSRVs were not detected in the version of the genome used in our study. This genome sequence belongs to a Hereford animal, while Morozov et al. analyzed animals from Simmental and Limousine breeds. For humans, it was suggested that a combination of genetic and environmental factors could contribute to determining the prevalence of enJSRV-related sequences in different populations (34). Thus, it is possible that different breeds of cow could also have different prevalences of enJSRV-related sequences.
Regarding the relationships among ERV families of different species, one lineage contained representatives of human, chimpanzee, pig, and sheep groups but no cattle elements. To account for this absence, we estimated the insertion time of the elements in this lineage. As there is no genomic information available for pigs and sheep, estimates were available only for human (19.5 to 8.9 MYA) and chimpanzee (33 to 15.8 MYA) elements. These insertion times were later than the divergence of ruminants and primates. Based on the weak support of the tree topology, a single infection is unlikely. In this lineage, two independent infections by a similar virus could have been detected, and in the case of ruminants, it is possible that cows lost this element at some point.
The absence of some ERV families in cows, compared with sheep and pigs, has prompted some authors to suggest that cows have a limited number of ERV families (43). Taking into account that the numbers of ERV families described were 31 for humans (15), 42 for chimpanzees (29), 20 for mice (23), and 24 putative families for cows (this study), BoERVs may not be as scarce as previously stated. Moreover, we detected one family, BoERV1, that had not been detected previously but that appears to be present at least in ruminants.
As described above, we did not detect any class III elements. It was suggested previously that in primates and mice (18), ERVs related to this class have been subjected to one or two bursts of copy number. If so, it is possible that the difference in the number of ERV families between cows and primates or mice could be based on this burst of class III-related ERVs. Finally, the whole picture could also be confounded by the intense selective breeding processes that have accompanied the domestication of cows (4).
In conclusion, we identified several thousand ERVs in the genome of Bos taurus by three different methods. The number detected depended on the technique used, ranging from a low of 928 using a BLAST-based method to 9,698 using Retrotector. When attempting to detect new ERVs, the use of different methods is advisable. ERVs did not appear to be randomly scattered across chromosomes but were more abundant on some, especially the X chromosome, than on others. Among the 24 detected families, 20 were newly described ERV families. The most abundant family, BoERV1, is described here for the first time. Finally, representatives of ERV families from rodents, primates, and ruminants showed a phylogenetic relationship following their hosts' relationships.
This is indeed the first genome-wide approach for the detection and characterization of bovine endogenous retroviruses. Further in-depth analyses are thus needed to uncover the whole picture of these genomic elements in cattle.