Retroviruses are the only group of viruses known to have left a fossil record, in the form of endogenous proviruses, and some 8% of the human genome is made up of these elements1,2. Although many other viruses, including non-retroviral RNA viruses, are known to generate DNA forms of their own genomes during replication3,4, none has been found as DNA in the germline of animals. Bornaviruses, which are nonsegmented, negative-sense RNA viruses, are unique among RNA viruses in that they establish persistent infection in the cell nucleus5–7. Here we show that elements homologous to the nucleoprotein (N) gene of Bornaviruses exist in the genomes of several mammalian species, including humans, non-human primates, rodents and elephants. We have designated these sequences endogenous Borna-like N (EBLN) elements. Some of the primate EBLNs contain an intact open reading frame (ORF) and are expressed as mRNA. Phylogenetic analyses suggest that EBLNs were generated by different insertional events in each animal family. Furthermore, the EBLN of a ground squirrel was formed by a recent integration event, whereas those in primates must have been formed more than 40 million years ago. We also show that the N mRNA of a current mammalian Bornavirus, Borna disease virus (BDV), can form EBLN-like elements in the genomes of persistently infected cultured cells. Our results provide the first evidence for endogenization of non-retroviral virus-derived elements in mammalian genomes and give novel insights not only into the generation of endogenous elements but also into the role of Bornaviruses as a source of genetic novelty in their hosts.
Bornaviruses are the only animal RNA viruses that have a highly cell-associated life cycle within the nucleus5–8, and they can therefore provide not only novel paradigms of RNA virus replication but also insight into the dynamics of RNA molecules in eukaryotic cells. To understand whether Bornaviruses mimic host factors to maintain persistent infection in the nucleus, we searched human protein databases for sequences similar to BDV proteins. This search identified two hypothetical human proteins (GeneID LOC340900 and LOC55096), each of which has significant sequence similarity to BDV N (Fig. 1a and Supplementary Table 1). BDV N is a major structural protein that tightly encapsidates the viral RNA to form the nucleocapsid. The LOC340900 sequence encodes a protein of comparable length (366 residues) to BDV N (370 residues), whereas LOC55096 appears to contain multiple frameshift mutations relative to BDV N, resulting in a shorter ORF (Fig. 1a). Both LOC340900 and LOC55096 showed an overall 41% sequence identity and 58% similarity to BDV N, and 72% identity to each other. The close relationship between BDV N and these human genes was further supported by alignment of the regulatory sequences flanking BDV N (Fig. 1b): the S and T motifs in the flanking sequences of both putative human genes were well conserved relative to those of BDV (Fig. 1b). In addition, a poly-A sequence follows the T1-like motif in the 3’ flanking region of LOC55096 (Fig. 1b). The homology of the human genes to BDV N was also confirmed by a permutation test (Supplementary Fig. 1). These findings suggested that both human genes may be endogenous elements related to the BDV N gene, and we therefore designated them EBLNs (EBLN-1 for LOC340900 and EBLN-2 for LOC55096).
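The permutation test mentioned above asks whether an alignment score is higher than expected for unrelated sequences of the same composition. As a sketch of the general idea only (not the authors' protocol, which is reported in Supplementary Fig. 1), the code below scores a local alignment, then shuffles the query repeatedly to build a null distribution that preserves amino acid composition but destroys residue order. Assumptions: Smith-Waterman with simple identity scoring (+2 match, -1 mismatch, -2 per gap position) instead of a substitution matrix such as BLOSUM62, and toy sequences; `sw_score` and `permutation_test` are hypothetical names.

```python
import random
import statistics


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with identity scoring and linear gaps."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment: scores never drop below zero
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def permutation_test(query: str, subject: str, n_perm: int = 200, seed: int = 0):
    """Compare the observed score with scores of residue-shuffled queries."""
    rng = random.Random(seed)
    observed = sw_score(query, subject)
    residues = list(query)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(residues)  # keeps composition, destroys order
        null_scores.append(sw_score("".join(residues), subject))
    mean = statistics.mean(null_scores)
    sd = statistics.pstdev(null_scores)
    z = (observed - mean) / sd if sd > 0 else float("inf")
    # add-one correction so the empirical p-value is never exactly zero
    p = (sum(s >= observed for s in null_scores) + 1) / (n_perm + 1)
    return observed, z, p


obs, z, p = permutation_test("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWY")
```

With 200 shuffles the smallest attainable p-value is 1/201, so a real analysis would use more permutations or rely on the z-score.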
To investigate the presence of EBLN sequences in other animal species, we conducted tblastn searches using BDV N as a query against the eukaryote and whole-genome shotgun databases at NCBI. Sequences with BLAST E-values of 10^−10 or lower were designated EBLNs. We found two additional human elements (EBLN-3 and -4) as well as a number of related sequences in various mammalian species, including marsupials (Supplementary Table 2). Orthologues of the human EBLNs were identified in the genomes of non-human anthropoid primates, including chimpanzee, gorilla, orangutan, and macaque (Supplementary Table 2). We also detected primate EBLNs in the genomes of the suborder Strepsirrhini, including the mouse lemur and Garnett's galago. Furthermore, two species of the Afrotheria, the African elephant and the cape hyrax, and four rodents were found to have EBLNs with E-values of less than 10^−20 (Supplementary Table 2). An EBLN locus with a high level of similarity to BDV N was also identified in the thirteen-lined ground squirrel (TLS) genome (Supplementary Fig. 2a). Like the human EBLNs, the TLS EBLN contained a 3’ poly-A sequence, as well as BDV S and T signal motifs, in its 3’ flanking region (Supplementary Fig. 2b). Almost all EBLN fragments, except EBLN-1 and the TLS gene, contained multiple stop codons in their predicted coding sequences or lacked identifiable flanking sequences. In addition, all anthropoid EBLNs except EBLN-4 are expressed as mRNAs in some human and monkey-derived cell lines (Supplementary Fig. 3). A previous study reported interactions of human EBLN-2 with cellular proteins such as AP1S1, TUSC2/FUS1, and FANCC (Supplementary Table 1) (ref. 9), suggesting that anthropoid EBLNs may encode functional proteins.
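The screening described above amounts to an E-value cutoff followed by a check for disrupted reading frames. A minimal sketch, assuming tblastn hits have already been parsed into records with the hit region extracted in the query's reading frame; the `Hit` type, its fields and the example loci are hypothetical. Frameshifts are caught here only indirectly, through the premature stop codons they usually create.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
EVALUE_CUTOFF = 1e-10  # the E <= 10^-10 criterion described in the text


@dataclass
class Hit:
    """A parsed tblastn hit; `cds` is the hit region in the query's reading frame."""
    name: str
    evalue: float
    cds: str


def internal_stops(cds: str) -> int:
    """Count in-frame stop codons, ignoring a single terminal stop."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # an intact ORF is expected to end in a stop
    return sum(c in STOP_CODONS for c in codons)


def classify(hits: list[Hit], cutoff: float = EVALUE_CUTOFF) -> dict[str, str]:
    """Keep hits passing the cutoff and label each as intact or disrupted."""
    calls = {}
    for hit in hits:
        if hit.evalue > cutoff:
            continue  # too weak to be called an EBLN under this criterion
        calls[hit.name] = "intact ORF" if internal_stops(hit.cds) == 0 else "disrupted"
    return calls


hits = [
    Hit("locus_a", 1e-50, "ATGAAACCCTAA"),  # hypothetical intact ORF
    Hit("locus_b", 1e-15, "ATGTAACCCGGG"),  # hypothetical premature TAA
    Hit("locus_c", 1e-5, "ATGAAATAG"),      # fails the cutoff
]
calls = classify(hits)
```

Here `calls` maps `locus_a` to an intact ORF and `locus_b` to a disrupted one, while `locus_c` is dropped by the cutoff.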
To further investigate whether other mammalian species contain EBLN-related sequences in their genomes, we performed Southern blot hybridization under low-stringency conditions using human, murine and TLS EBLN and BDV N sequences as probes (Fig. 1c and d, Supplementary Fig. 4). Along with clear signals in primate genomes, we detected reproducible faint bands in murine and shrew genomes with a human EBLN probe (Fig. 1c, dots). These bands were also detected with a mouse EBLN probe (Fig. 1c, arrowheads), indicating that they most likely represent EBLN-related sequences. Indeed, EBLN-like sequences, albeit with E-values greater than 10^−10, were found in the Eurasian shrew genome in our tblastn searches. In contrast, apart from the TLS itself, the TLS probe detected no band in the genomes of several squirrel species, including the woodchuck (Marmota spp.), the closest relative of the TLS (Spermophilus spp.) (Fig. 1d) (ref. 10), suggesting that ground squirrels are likely to be the only hosts of this EBLN within the squirrel family. The BDV N probe detected many faint bands and smears, including the signals detected by the EBLN-specific probes, in both the selected mammalian species and the squirrel family (Supplementary Fig. 4), suggesting that EBLN-related fragments are more widely distributed in mammalian genomes.
We next performed a comprehensive phylogenetic analysis using the nucleotide sequences of all EBLNs with E-values less than 10^−20 (Fig. 2 and Supplementary Fig. 5). In addition to the EBLNs, we included avian bornaviruses (ABVs)11 and an exogenous reptile bornavirus (RBV) sequence, which was detected in a cDNA library from a Bitis gabonica (Gaboon viper) venom gland12 (Supplementary Fig. 6). As shown in Fig. 2, the anthropoid and murine EBLNs cluster within their respective host orders. By contrast, EBLNs from other species, including African elephant, rock hyrax, and guinea pig, form branches independent of the evolutionary lineages of their hosts, indicating that these EBLNs most likely entered each species via independent integration events. Interestingly, the TLS EBLNs form a tight cluster that is more closely related to modern exogenous Bornaviruses than to the EBLNs of other animals. Because closely related squirrel species do not contain this EBLN, the integration of the squirrel EBLN may have been a very recent event. A phylogenetic analysis using all primate EBLNs, including that of the marmoset (Supplementary Fig. 7), revealed that the integration events leading to the primate EBLNs occurred in the Haplorrhini no later than the split between the rhesus macaque and marmoset lineages.
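The reasoning above (EBLNs that cluster by host order versus EBLNs that branch independently of their hosts) can be illustrated with a distance-based tree. The sketch below computes uncorrected p-distances and builds a topology with UPGMA, chosen only because it is short; UPGMA assumes a constant rate along all lineages and is not necessarily the method behind Fig. 2. Sequences must already be aligned and gap-free, and the labels and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ (no multiple-hit correction)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]):
    """Return the UPGMA topology as nested tuples (branch lengths not tracked)."""
    active = list(seqs)
    size = {name: 1 for name in active}
    dist = {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(active, 2)}

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    while len(active) > 1:
        # closest pair of current clusters; ties go to the first pair found
        x, y = min(combinations(active, 2), key=lambda pair: d(*pair))
        merged = (x, y)
        for k in active:
            if k not in (x, y):
                # size-weighted average distance to the new cluster
                dist[(merged, k)] = (size[x] * d(x, k) + size[y] * d(y, k)) / (size[x] + size[y])
        size[merged] = size[x] + size[y]
        active = [k for k in active if k not in (x, y)] + [merged]
    return active[0]


tree = upgma({
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "TTTTAAAAAA",
    "D": "TTTTAAAAAT",
})
```

For these toy sequences the result is `(("A", "B"), ("C", "D"))`: each pair differs at one site and the pairs differ from each other at four or five sites.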
To investigate whether current Bornaviruses can be reverse-transcribed into DNA to produce EBLN-like elements, we first performed PCR analyses on DNA from persistently BDV-infected cells. As shown in Fig. 3a and Supplementary Table 3, BDV DNA was clearly detected in some cell lines with a primer set targeting the BDV N region. To determine which viral RNA species serve as the template for the DNA form of BDV, we used several primer sets across the BDV genome. Primer sets straddling the boundaries of BDV transcription units could not amplify BDV-specific DNA (Fig. 3b and c), indicating that the DNA is reverse-transcribed from BDV mRNAs rather than from the full-length viral genome. We also detected BDV-specific DNA in the brains of persistently BDV-infected mice (Supplementary Fig. 8), suggesting that BDV can produce DNA forms both in vitro and in vivo. We next performed Alu-PCR to investigate whether the BDV DNA detected in infected cells exists as integrated or extrachromosomal DNA. As shown in Fig. 3d and Supplementary Fig. 9, at about 30 days post-infection an Alu-specific PCR product was detected in BDV-infected cells only when an N-specific forward primer was used. This observation indicates that although BDV DNA in infected cells may be mainly extrachromosomal, the N gene becomes integrated into the host genome during persistent infection.
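The inference drawn from the straddling primer sets can be made explicit: a primer pair yields a product only if both primer sites lie on a single template molecule, so genome-length templates and mRNA-derived templates predict different amplification patterns. In the sketch below all coordinates and primer positions are hypothetical and do not correspond to the real BDV map, and the model ignores the readthrough and spliced transcripts that a real Bornavirus produces.

```python
# Hypothetical coordinates (nt) on a simplified genome map; not real BDV positions.
GENOME = (1, 8900)
MRNAS = {"N": (50, 1200), "X/P": (1150, 1900), "M": (1850, 2700)}

# Hypothetical primer pairs as (forward site, reverse site) on the same map.
PRIMER_PAIRS = {
    "N internal": (200, 900),
    "N-X/P boundary": (1000, 1400),
    "X/P internal": (1300, 1700),
}


def amplifiable(pair: tuple[int, int], template: tuple[int, int]) -> bool:
    """True if both primer sites fall inside one template molecule."""
    fwd, rev = pair
    start, end = template
    return start <= fwd < rev <= end


def predict(templates) -> dict[str, bool]:
    """Predicted PCR outcome for each primer pair, given the candidate templates."""
    templates = list(templates)
    return {name: any(amplifiable(pair, t) for t in templates)
            for name, pair in PRIMER_PAIRS.items()}


genome_hypothesis = predict([GENOME])
mrna_hypothesis = predict(MRNAS.values())
```

Under the genome-template hypothesis every pair amplifies; under the mRNA hypothesis the boundary-spanning pair fails, which is the pattern reported for Fig. 3b and c.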
We further characterized the BDV DNA insertions and their flanking cellular sequences using Alu-PCR and inverse PCR (Supplementary Fig. 10a and b) (ref. 13). Integration sites were distributed across various chromosomes (Fig. 4). As in some mammalian EBLNs, many BDV DNA insertions contained a 3’ poly-A sequence (Fig. 4b and c). Integrations of truncated BDV N DNA were also found in some clones. No apparent consensus sequence was found at the integration sites, although target site duplications (TSDs) were detected in some clones obtained by inverse PCR (Fig. 4c). We also found deletions and rearrangements of the host genomic sequence adjacent to BDV DNA insertions (Fig. 4c). These results indicate that modern BDV is able to produce DNA forms leading to insertion of EBLN-like elements into its host’s genome.
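TSDs and 3’ poly-A tails are identified by inspecting the junctions between an insertion and the host sequence. A minimal sketch, assuming the insertion boundaries are already known and the flanks are given 5’ to 3’ on the same strand; `find_tsd` and `poly_a_length` are hypothetical helper names, and the length limits are arbitrary choices rather than parameters from this study. In real data a TSD rich in A residues can be hard to separate from the poly-A tail, so junction calls usually need manual review.

```python
def find_tsd(left_flank: str, right_flank: str, min_len: int = 5, max_len: int = 25):
    """Longest sequence that both ends the left flank and starts the right flank."""
    left, right = left_flank.upper(), right_flank.upper()
    best = None
    for k in range(min_len, min(max_len, len(left), len(right)) + 1):
        if left[-k:] == right[:k]:
            best = left[-k:]  # keep the longest exact duplication found
    return best


def poly_a_length(insert: str) -> int:
    """Length of the run of A residues at the 3' end of the inserted sequence."""
    return len(insert) - len(insert.upper().rstrip("A"))


# Hypothetical junction: host ...GGCCTT[AAGCAGTAC] insert [AAGCAGTAC]CTTGG...
left = "GGCCTTAAGCAGTAC"
insert = "ATGCCCAAAAAAA"
right = "AAGCAGTACCTTGG"
tsd = find_tsd(left, right)
tail = poly_a_length(insert)
```

For this hypothetical junction, `tsd` is the 9-bp duplication `AAGCAGTAC` and `tail` is 7.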
This report is the first to provide evidence of endogenous sequences derived from a non-retroviral RNA virus in mammalian species. Phylogenetic analyses indicate that the oldest primate EBLN observed must have appeared in an ancestor of primates after the separation of Strepsirrhini and Haplorrhini, implying that Bornaviruses have coexisted with primates for at least 40 million years. Bornaviruses are therefore the first non-retroviral RNA viruses whose existence in prehistoric times has been confirmed. The origin and evolution of RNA viruses remain a major puzzle in the relationship between viruses and mammalian hosts, because simple molecular clock calculations using an average rate of nucleotide substitution place the origin of RNA viruses very recently14–16. Despite tens of millions of years of replication as exogenous viruses, the amino acid sequence of current BDV N appears surprisingly conserved relative to the EBLNs. This conservation demonstrates the inapplicability of simple molecular clocks to RNA virus evolution. The discovery of EBLNs in several mammalian species will help shed light on the evolutionary history of RNA viruses and their hosts.
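The molecular clock argument rests on simple arithmetic. Under a strict clock, the time since two lineages split is t = d / (2r), where d is the number of substitutions per site (here Jukes-Cantor corrected) and r is the substitution rate per site per year. The 30% divergence and the rate below are illustrative placeholders, not values from this study; a rate near 10^−3 substitutions per site per year is a commonly quoted order of magnitude for RNA viruses.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if p >= 0.75:
        return math.inf  # saturated: the distance cannot be estimated
    return -0.75 * math.log1p(-4 * p / 3)


def divergence_time(p: float, rate: float) -> float:
    """Years since two lineages split under a strict clock: t = d / (2 * rate)."""
    return jukes_cantor(p) / (2 * rate)


RATE = 1e-3  # illustrative substitutions/site/year (assumption)

t = divergence_time(0.30, RATE)  # about 190 years for 30% observed divergence
expected_d = 2 * RATE * 40e6     # about 80,000 substitutions/site over 40 million years
```

The same rate that dates a moderately divergent pair to a few centuries predicts complete saturation over 40 million years, which is why conserved EBLN and BDV N sequences are hard to reconcile with a simple clock.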
The sequence characteristics of both EBLNs and BDV DNA insertions in host genomes suggest that the reverse transcriptase activity encoded by retrotransposons, such as long interspersed nuclear elements (LINEs), is involved in the reverse transcription and integration of Bornavirus mRNAs, although some clones showed no apparent TSDs17. LINE-1s (L1) are abundant retrotransposons whose enzymes occasionally act on cellular mRNAs to produce processed pseudogenes in mammalian genomes18–20. The organization of the sequences flanking EBLN-2 is consistent with the action of L1. An AluSx element lies immediately downstream of the 3’ poly-A tail of EBLN-2 (Supplementary Fig. 11). The key observation is that the EBLN-2/AluSx element is flanked by a perfect 9-bp TSD. Because the AluSx itself is not flanked by TSDs, and the 3’ end of Alu is known to be recognized by L1 during target-primed reverse transcription, the presumed EBLN-2/AluSx chimeric element was most likely created and integrated by the L1 machinery. EBLNs are therefore likely to be processed pseudogenes derived from ancient Bornavirus infections. At present, it is not clear why Bornaviruses, but not other non-retroviral RNA viruses, have been preserved in mammalian genomes as endogenous elements, or why only N, and not other viral genes, has been retained. There are several possibilities. First, Bornaviruses may have greater access to the germline. Second, the BDV N mRNA, like some cellular RNAs, may by chance have features that make it a favorable template for L1-mediated reverse transcription21,22. Third, the predominant transcription of BDV N mRNA in infected cells may favor its association with the L1 replication machinery. The selectivity for BDV N mRNA implies a role for specific structural features, perhaps in conjunction with one or more of the other possibilities.
Our data also raise the possibility that, like some endogenous retroviruses, EBLNs may have functions in their host species. An analysis of the non-synonymous to synonymous substitution ratios (dN/dS) among anthropoid EBLNs suggests functional, albeit weak, evolutionary conservation. This finding implicates Bornaviruses as a novel source of genetic innovation in their hosts. Further studies will be needed to explore this possibility.
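The text does not name the estimator used for the non-synonymous to synonymous ratio. As an illustration only, the sketch below implements the Nei-Gojobori counting method with a Jukes-Cantor correction, an assumption on our part; published analyses often use maximum-likelihood codon models instead. It assumes gap-free, in-frame alignments without stop codons and at least one synonymous site. A ratio below 1 is conventionally read as evidence of purifying selection.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; "*" marks stop codons.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon: str) -> float:
    """Synonymous sites of one codon; changes to a stop count as nonsynonymous."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def codon_diff(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and nonsynonymous differences averaged over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    total_s = total_n = 0.0
    n_paths = 0
    for order in permutations(positions):
        cur, s, n = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                break  # pathways passing through a stop codon are discarded
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        else:  # runs only if the pathway was not discarded
            total_s, total_n, n_paths = total_s + s, total_n + n, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no valid pathway between {c1} and {c2}")
    return total_s / n_paths, total_n / n_paths


def jukes_cantor(p: float) -> float:
    return math.inf if p >= 0.75 else -0.75 * math.log1p(-4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> tuple[float, float, float]:
    """Return (dN, dS, dN/dS) for two aligned coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be equal-length, in-frame codon alignments")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        mean_s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += mean_s
        N += 3 - mean_s
        sd, nd = codon_diff(c1, c2)
        Sd += sd
        Nd += nd
    dS, dN = jukes_cantor(Sd / S), jukes_cantor(Nd / N)
    return dN, dS, (dN / dS if dS > 0 else math.inf)


dN, dS, omega = dn_ds("CTGGGGAAACCC", "CTAGGGAAACCC")  # one synonymous change
```

For the hypothetical pair above the only difference is synonymous, so dN and the ratio are zero while dS is positive.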