To further our understanding of ERV integration complexity and effects on host genome function, we have studied the high-quality dog (Canis familiaris) genome sequence of a female boxer (canFam2). We screened for CfERVs using the platform-independent Java program RetroTector©, which is designed to identify proviral retrovirus sequences in eukaryotic genomes, was efficient in detecting ERVs in human and chimpanzee genomes, and automates filtering and categorization of proviral chains using a series of criteria and likelihood tests.
We detected 407 CfERVs, corresponding to 0.15% of the entire dog genome. This proportion is substantially lower than in human and mouse, which have 0.8% and 2% ERVs, respectively, using the same selection criteria. The CfERV content of the dog is about as low as that of the Red Junglefowl (0.2%) and suggests either retrovirus restriction or purging of ERVs by as yet unknown mechanisms. Canids may also have had fewer retroviral infections than primates and rodents. However, the paucity of known extant retroviruses in dogs compared to other mammals, as well as the current status of the dog assembly and the limited number of carnivore species sequenced to date, preclude firm conclusions regarding the mechanisms and processes leading to the low CfERV content observed in dog.
Gamma-like and Beta-like CfERVs were most common in the dog (Table S1). Of the total 407 CfERVs, 44 loci show ORFs with varying conservation. Of these loci, 36 CfERVs had both flanking LTRs, suggesting that around ten percent of the CfERVs are relatively recent integrations. Further, retained functionality of these CfERVs cannot be excluded, since complete Gamma-like CfERVs seem to have retained both LTRs to a greater extent, whereas related proviruses with only detectable Gag-Pro-Pol have lost their LTRs.
In addition, the Spuma-like and Gypsy-like CfERVs are both represented at low frequencies. However, the former is likely underestimated by RetroTector© due to limited Spuma-related virus references, whereas Gypsy-like chains are rare in eutherian genomes.
The integration landscape in the dog genome revealed CfERVs mostly in intergenic regions on chromosome X and on all autosomes. The strongest correlation obtained was between the number of CfERV integrations and chromosome length (Fig. S2A). Chromosome 1 is in fact richer in gaps that are likely to be ERV containing compared with the remaining autosomes, suggesting that it may in fact harbor more rather than fewer unassembled ERVs than the rest of the genome (Table S2). The X chromosome is a special case because in the heterogametic sex, meiotic crossing over is constrained entirely to the pseudoautosomal regions (PARs), with the remainder of the chromosome being non-recombining, meaning that the non-PAR X has about half the recombination rate one would expect from obligate crossing over. If the efficiency of removal of mobile element insertions increases in regions of higher recombination, it would explain the difference in the number of annotations in the sex chromosome and the integration desert in the X_PAR region in dog (chrX:1–6.53 Mb). Furthermore, the analyzed individual is female, but the CfERVs integrated in chromosome X would reflect millions of years of canine evolution.
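The relationship between integration counts and chromosome length discussed above can be illustrated with a plain Pearson correlation. The following is only a minimal sketch: the function name `pearson_r` and all input values are hypothetical placeholders and are not taken from Fig. S2A or from the study's data, and the statistic actually used in the analysis may differ.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT measurements from the dog genome:
chrom_length_mb = [120.0, 85.0, 60.0, 40.0]  # assumed chromosome lengths (Mb)
erv_counts = [24, 18, 11, 9]                 # assumed ERV counts per chromosome

r = pearson_r(chrom_length_mb, erv_counts)
print(f"r = {r:.2f}")
```

With real data, one value per chromosome would be supplied for each list; a coefficient close to 1 would correspond to the length dependence described in the text.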
Interestingly, CfERV integration patterns correlated better with ncRNAs than with protein coding genes (Fig. S2E–F), suggesting selection against integrations in chromosomal transcription units. This was also supported by our neighborhood analysis of CfERVs with complete LTRs, where the integration landscape may have retained those a priori selectively neutral integrations. In the neighborhood plots extended with all CfERVs against protein coding genes from different species, antisense integrations with respect to gene orientation were clearly favored. Surprisingly, when we plotted only the ncRNA genes annotated in the dog genome by the UCSC pipelines, the orientation preference changed. This could either imply unknown interactions between ERVs and ncRNAs or, more likely, reflect under-annotation of ncRNA genes in the dog assembly.
CfERV integrations in the close vicinity of genes comprised one integration estimated at about 6 mya, three integrations at about 107 mya, and one integration that could not be age estimated. As expected, CfERVs within genes are predominantly in antisense orientation relative to the transcriptional direction of the host genes, presumably because such integrations are less likely to interfere with chromosomal transcription and splicing. These results are in agreement with the integration patterns in both human and mouse. However, conclusions about CfERV interference with gene function, as previously hypothesized, remain to be drawn.
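The sense/antisense comparison above amounts to comparing the strand of each provirus with the strand of its host gene. As a minimal sketch, assuming strands are encoded as "+" and "-" (the function names and the example pairs are hypothetical and not taken from the study), the tally could look as follows:

```python
from collections import Counter

VALID_STRANDS = {"+", "-"}


def relative_orientation(provirus_strand, gene_strand):
    """Return 'sense' if provirus and gene share a strand, else 'antisense'."""
    if provirus_strand not in VALID_STRANDS or gene_strand not in VALID_STRANDS:
        raise ValueError("strand must be '+' or '-'")
    return "sense" if provirus_strand == gene_strand else "antisense"


def tally_orientations(pairs):
    """Count sense and antisense integrations for (provirus, gene) strand pairs."""
    return Counter(relative_orientation(p, g) for p, g in pairs)


# Hypothetical strand pairs for three intragenic proviruses (illustration only)
example_pairs = [("+", "-"), ("-", "-"), ("-", "+")]
counts = tally_orientations(example_pairs)
print(counts["antisense"], counts["sense"])
```

An excess of antisense over sense counts in real annotations would correspond to the orientation bias described in the text.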
Among the most interesting CfERVs identified in this study, we found a group of 33 HERV-Fc-like proviruses, some recently integrated, that were divided into CfERV-Fc subgroups. Until now, these elements have exclusively been described as a small group intermediate to ERV-F/H in the primate lineage. Nearly intact HERV-Fc-like proviruses, also previously referred to as possible "midwife elements", are hypothesized to contribute proteins in trans to mobilize similar but less complete ERVs. The CfERV-Fc in dog were surprisingly numerous compared to the low copy number HERV-Fc characterized in primates. Interestingly, 10 recently integrated CfERVs had all puteins, albeit some with mutations, as well as both flanking LTRs, making them candidates for spread by complementation in trans.
Although the candidate CfERV-Fc chains were highly diverse, comprising young (12 mya or less), middle (12 to 25 mya) and old (more than 25 mya) chains as dated by LTR divergence, the majority of them (n = 28) could be used in Pol phylogenies. Thus, based on LTR divergence and the most conserved Pol, we have constructed a hypothetical scenario for the presence of CfERV-Fc in canids. Our results are consistent with four recent amplification bursts in the canFam2 dog genome. The four identified groups (i.e., CfERV-Fc1 to -Fc4) presented strong local identities within cluster nodes and 50–60% toward the external branches, which suggested a specific ancestry for each group. Four of the six unclustered CfERVs were devoid of RT motifs, complicating their classification. Moreover, local similarities to HERV-Fc templates, even after considering different mutation rates along the genome, indicated that the CfERV expansion is unlikely to originate from HERV-Fc1.
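Dating by LTR divergence rests on the fact that the two LTRs of a provirus are identical at the time of integration and thereafter accumulate substitutions independently, so that age is approximately T = d / (2r), where d is the divergence between the 5' and 3' LTRs and r is the substitution rate per site per year. The following is a minimal sketch under stated assumptions: the function names and the example rate are hypothetical (no rate is given in this section), any distance correction applied in the actual analysis is not reproduced, and only the age class boundaries (12 and 25 mya) come from the text.

```python
def ltr_age_mya(divergence, rate_per_site_per_year):
    """Estimate provirus age in million years from 5'/3' LTR divergence.

    Both LTRs diverge independently from the same starting sequence,
    hence the factor 2 in T = d / (2 * r).
    """
    if divergence < 0:
        raise ValueError("divergence must be non-negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    years = divergence / (2.0 * rate_per_site_per_year)
    return years / 1e6


def age_class(age_mya):
    """Classify an age using the young/middle/old boundaries given in the text."""
    if age_mya <= 12:
        return "young"
    if age_mya <= 25:
        return "middle"
    return "old"


# Hypothetical example: 4% LTR divergence and an ASSUMED rate of 2e-9
# substitutions per site per year (placeholder, not taken from the study).
age = ltr_age_mya(0.04, 2e-9)
print(f"{age:.1f} mya, class: {age_class(age)}")
```

Because the estimate scales inversely with the assumed rate, the rate should be passed explicitly rather than hard-coded, and the resulting ages are best read as approximate.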
The phylogenetic analyses suggest that the CfERV-Fc have evolved as exogenous retroviruses that successfully infected the ancestral canid population in bursts, possibly followed by mobilization of endogenous retroviruses. The observed CfERV-Fc sequence differences are consistent with mutations from extracellular replication and infection by virus strains at different evolutionary stages. It appears that several template sequences have evolved to form the different clusters, since copy numbers have increased in bursts from extracellular replication rather than retrotransposition. Thus, our proposed CfERV amplification scenario favors a random template master gene model over a strict master gene model.
The origin of these proviruses cannot be strictly deduced from current information. They may represent new strains of retroviral integrations or, alternatively, they may be derived from a founding lateral transfer from other CfERVs or from master ERV-Fc-like sequences. The HERV-Fc2 master may have co-evolved in parallel to the Canidae CfERV-Fc and infected primates some 20–32 mya. Thereafter, HERV-Fc1 evolved and infected ancestors of pongids and hominids. Judging from the low number of infections, ERV-Fc seems to have been unsuccessful in primates but rather more successful in canids, and possibly also in other carnivores. The relatively new HERV-Fc acquisition in baboon is more closely related to HERV-Fc than to our CfERV-Fc elements and may have infected only cercopithecoids.
In conclusion, the ancestors of Canidae have been successful in protecting and/or purging their genomes from retroviral integrations but also appear to have been susceptible to certain retrovirus infections. We observed that over time, retrovirus integrations have been selected towards neutral sites. The relative contributions of permissive selection of these proviruses and of domestication of a phenotypically diverse species such as the dog remain unclear. However, our findings support the possibility that some of these proviruses may serve as templates for recombination and that observed proviral ORFs could provide proteins in trans to mobilize similar but defective ERVs. The gammaretrovirus-like CfERV-Fc described here, present in relatively high copy numbers and with a long estimated range of integration times, provide useful insights into a HERV-Fc-like group that is larger and older than previously considered. Further studies to elucidate functionality and ERV integration polymorphism in multiple dog breeds may define the acquisition pattern of the proviruses and their complex evolutionary relationship with the host in finer detail.