This analysis indicates that mA3 has been involved in genetic conflicts throughout Mus evolution. The gene shows strong positive selection, marked by an excess of replacement over synonymous substitutions. Six of the 10 codons that evolved under the strongest positive selection lie in two clusters in the N-terminal, catalytically active CDA. Five of these 6 codons specify different amino acids in MLV- and MMTV-restrictive versus nonrestrictive mouse strains, and mutational analysis suggests that these residues contribute to antiviral activity. We also demonstrate that the antiviral allelic variant has acquired a retroviral LTR insertion whose presence is associated with elevated mA3 expression in the spleens of inbred and wild-derived mice.
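The selection signal described above is conventionally summarized as ω = dN/dS, the number of replacement substitutions per replacement site divided by the number of synonymous substitutions per synonymous site, with ω > 1 taken as evidence of positive selection. The Python sketch below illustrates only the first step of that reasoning: classifying codon differences between two aligned, in-frame coding sequences as replacement or synonymous under the standard genetic code. It is a simplified illustration, not the method used in this study. It does not normalize by site counts or fit a codon-substitution model, both of which per-site selection analyses require. The function name and example sequences are hypothetical and are not mA3 data.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# first base varying slowest, so the 64-letter string lines up with product(...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_differences(seq_a: str, seq_b: str) -> dict:
    """Count differing codons between two aligned coding sequences, split into
    replacement (amino acid changes) and synonymous (amino acid unchanged).

    Hypothetical helper: raw codon counts only, with no per-site normalization.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned, in frame, and a multiple of 3 long")
    counts = {"replacement": 0, "synonymous": 0}
    for i in range(0, len(seq_a), 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        # Skip identical codons and any codon containing gaps or ambiguous bases.
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        if CODON_TABLE[ca] != CODON_TABLE[cb]:
            counts["replacement"] += 1
        else:
            counts["synonymous"] += 1
    return counts


# Hypothetical example: GGT->GGC keeps Gly (synonymous); AAA->AGA changes Lys to Arg.
print(classify_codon_differences("GGTAAAGTT", "GGCAGAGTT"))
```

Counting at the codon level keeps the sketch short. A real analysis would also weigh the multiple mutational paths between codons that differ at more than one position.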
Retroviral insertions can be important functional components of the host genome and can clearly affect host gene expression. Examination of spontaneous mutations in the mouse suggested that 10–12% of all mutations are due to ERV insertions. Like the mA3 LTR, most of these mutation-associated ERVs lie in reverse orientation within introns, and the responsible mutational mechanisms include two of relevance here: aberrant splicing and enhanced transcription driven by the ERV LTR. Although the mA3 LTR is inserted at a splice donor site, it does not alter splicing of the associated intron. Moreover, although all mice carrying this LTR produce the same Δexon5 mA3 isoform, the LTR is absent from at least one mouse species that produces that isoform (M. m. molossinus), which suggests that the LTR was acquired by mice already preferentially producing this splice variant. As for LTR-driven changes in expression, two of three previous studies that compared mA3 RNA levels in virus-resistant and susceptible strains reported significantly higher mA3 expression in mice carrying the LTR+ C57BL allele than in LTR− BALB/c mice. Our analysis of mA3 expression shows a correlation between the presence of the LTR and elevated expression in a variety of inbred strains and mouse species. Because viral LTRs can enhance expression of cellular genes when inserted in either orientation and at considerable distance from the cellular promoter, it is possible that the enhancer of this inserted LTR drives the elevated expression observed in LTR+ mice. Elevated expression, together with altered splicing, may have contributed to the evolution of the antiviral C57BL mA3. It has been suggested that the Δexon5 isoform has enhanced antiviral activity because it resists the viral protease; elevated expression of this variant following the subsequent LTR insertion would further boost the survival value of this factor.
It is particularly intriguing that this X-MLV LTR sequence is found in NZB and CZECH mice and in one breeding line of M. m. castaneus. These mice are unusual among laboratory strains and wild mice in that they harbor highly active X-MLV ERVs that produce infectious virus, and such active ERV expression increases the likelihood of insertional mutagenesis. NZB mice are characterized by lifelong viremia with X-MLVs. M. m. castaneus and CZECH mice are among the Eurasian wild mouse populations with the highest copy numbers of X-MLV ERVs, and we have isolated infectious X-MLV-related virus from both of these wild mice. If the inserted MLV LTR does cause elevated mA3 expression, it would provide another instance of an ERV sequence co-opted by the virus-infected host for an antiviral function; other examples in the mouse are Fv1 and Rmcf.
In addition to differences in splicing and expression levels, the mA3 genes of virus-resistant and virus-sensitive mice differ in protein sequence. Our phylogenetic analysis showed that most of these polymorphic sites are under strong positive selection. The alignment of these sites with functionally important residues in the C-terminal active CDA of hA3G suggests that they serve similar roles in the mouse and, therefore, that this function has been important during Mus evolution. Two observations support the idea that this evolutionarily important function is related to mA3 deaminase activity: the great majority of the selected residues lie in the N-terminal half of mA3, which encodes the active Z2 CDA, and antiviral activity resides in the first 194 amino acids (exons 1–4). In the predicted mA3 structure, these positively selected residues fall in two regions: AC loop 1, one of two loops assigned functional importance in hA3G, and a cluster of residues facing AC loop 1 across the putative substrate groove. The charged and hydrophobic residues in these regions are positioned to maintain the structural integrity of the groove and to interact with one another and with the nucleic acid substrate in ways that could contribute to substrate specificity.
Three positively selected residues, G34, K37 and G38, in the mA3 AC loop 1 sequence KNLG
RKD are most likely responsible for providing conformational freedom (G34 and G38) and for interacting favorably with the phosphate backbone (K37). The electrostatic contributions of K37, along with K40 and D41, probably play an important role in determining substrate affinity and specificity, while Y35 is positioned to stack with a nucleotide base. The analogous sequence in hA3G is NNEPWVRGRHE (207–217), with R213, H216 and E217 positioned to interact electrostatically with a phosphate backbone and W211 able to stack with a nucleotide base. R39 (mA3) and R215 (hA3G) occupy similar positions, each providing an elaborate H-bonding network that defines the shape of AC loop 1.
Five positively selected residues (V134, Q135, D136, E138 and T139) lie in a region comprising the end of helix 4 and an adjacent loop, which together define the side of the substrate-binding groove opposite AC loop 1 (mA3 sequence YNVQD
). Close inspection of this region in the mouse model reveals that the side chain of D136 is positioned to H-bond with T139, maintaining the helical nature of helix 4 despite the presence of P137. This allows Q135 to form the top of the groove and V134, N133 and Y132 to form its side, with Y132 positioned to stack with a nucleotide base. Y132 is invariant in our mouse sequences, as is nearby W102, which defines the floor of the groove. The homologous segment of human APOBECs has now been implicated in the distinct substrate preferences of AID/APOBEC family members, which target cytosines within different sequence motifs. A recognition loop responsible for these preferences (hA3G sequence IYDDQGRCQ, residues 314–322) lies between the β4 strand and the α4 helix. Mutational analysis also supports the conclusion that this highly variable region controls substrate preference. Alignment of the active CDAs of hA3G and mA3 indicates that this loop overlaps the 134–139 cluster of positively selected residues in mA3. This suggests that, in this case, genetic conflict between host and pathogen produced positive selection that may have been driven not by protein-protein interactions but by interactions between mA3 and varying ssDNA substrates. This suggestion is also consistent with the finding that the efficiency of substrate deamination is sensitive to ssDNA secondary structure.
Mutational analysis of 6 codons in the two clusters under positive selection showed that introducing BALB/c residues, particularly in the 134–139 cluster, reduced antiviral activity against Friend MLV (FrMLV). Further studies may determine whether the differences observed with overexpressed mA3 in transiently transfected cells are physiologically relevant, and whether substitutions at these sites similarly affect restriction of other retroviruses. It has been reported that mA3 shows stronger antiviral activity against HIV-1 than against MLV, suggesting that the genetic conflicts responsible for positive selection during Mus evolution may have resulted from interactions with pathogens unrelated to the FrMLV used here.
Previous phylogenetic analysis of hA3G identified 21 sites under very strong positive selection, 9 of which are in the active CDA. One of these sites, R213, aligns with one of the clusters under strong selection in mA3 (positions 34–38). However, the hA3G analysis did not identify selection in the region aligning with the second mA3 cluster (positions 134–139), even though this segment is a substrate recognition loop that is highly variable among members of the AID/APOBEC family. The remaining positively selected sites in the hA3G active CDA have no positively selected counterparts in mA3. Two of these, H248 and K249, lie in AC loop 3. Mutagenesis and structural analysis of hA3G have implicated this loop in antiviral deamination, but much of AC loop 3 is deleted in the mouse, leaving only the key residues at the base of the loop that align with the critical hA3G residues N244 and R256. These residues are invariant in our mA3 sequences, suggesting that their evolution is under purifying selection. The differences in AC loop 3 between hA3G and mA3, and the fact that different residues are under selection in the two proteins, suggest that they may differ functionally.
Our analysis of the full-length mA3 sequences also identified four sites under positive selection in the C-terminal half of the protein (S1), which carries the Z3 CDA that has been determined to be inactive. The role of these residues is unclear. An antiviral role for the C-terminal half of mA3 is suggested by the observation that the conserved glutamates in both the N-terminal Z2 domain and the C-terminal Z3 domain of mA3 are required for antiviral activity against HIV-1. Other evidence suggests that the inactive CDA is involved in virus encapsidation. We note that alignment of the mouse Z2 and Z3 CDA regions shows that one of the two selected Z3 codons, P316, aligns with the selected 134–139 cluster in Z2 (VQDPET). Another selected codon in the Z3 CDA, T273, aligns with an hA3G segment containing two codons under selection in primates. These alignments raise the possibility that the Z3 CDA had deaminase activity in some Mus lineages.
Further analysis of the C57BL and BALB/c mA3 genes should shed light on the functional roles of the polymorphic residues in the two groove-associated clusters. Additional phylogenetic, structural, and functional comparisons will help define the range of antiviral activity and the evolutionary history of this gene. We are currently analyzing additional mA3 mutants for antiviral activity and using molecular dynamics simulations to describe the structural implications of specific substitutions.