3.1. Residues 198-384 of APOBEC3G are sufficient for DNA deamination
Chimeric APOBEC3 proteins and site-directed mutants have been used to demonstrate that the intrinsic DNA cytosine deaminase activity of human APOBEC3G (A3G) resides in the conserved C-terminal zinc-coordinating domain [9]. To further delineate the minimal domain required for catalysis, nine A3G deletion constructs were tested for mutator activity in the E. coli mutation assay. Bacteria expressing GST showed a median of 2.5 rifampicin-resistant (RifR) mutants per 10^7 viable cells. Expression of full-length A3G fused to GST caused a 4.4-fold increase in mutation frequency. Apart from two notable exceptions, all of the A3G deletion constructs were inactive. The inactivity of constructs lacking 22 or 40 C-terminal residues was consistent with prior work showing that A3G lacking 38 or 89 C-terminal amino acids was unable to inhibit HIV vif [17]. A more interesting result emerged from analyses of the N-terminal deletion set. A3G variants encoding residues 175-384 or 198-384, but not 215-384, were considerably more mutagenic than full-length A3G. These data demonstrated that the entire N-terminal zinc-binding domain is dispensable for activity and that A3G residues 198-384 are sufficient.
Fig. 1 APOBEC3G deletion mutants delineate a minimal active domain. (A) An illustration showing the amino acid boundaries used for deletion constructs. The HXE-X23–28-CX2–4C motifs are depicted by open boxes, and the asterisk designates the catalytic ...
3.2. Size exclusion experiments indicate that A3G198-384 is monomeric
Previous studies indicated that an A3G C97A mutant was incapable of co-immunoprecipitating wildtype A3G, but was still capable of DNA deamination [12]. Consistent with these studies, A3G198-384 profiled as a 22 kDa monomer in size exclusion experiments, migrating clearly between the elution positions of lysozyme (14 kDa) and GST (25 kDa). Stable oligomeric forms of A3G198-384 are unlikely to exist, because fractions 1–27 did not contain protein peaks.
3.3. Alanine mutations define essential and non-essential residues of A3G198-384
To more precisely delineate the residues and domains required for DNA deamination, the A3G198-384 variant was used to construct a series of 69 alanine mutants. Mutagenesis was concentrated on hydrophobic residues and cysteines. This strategy was motivated partly by the likelihood that some of the hydrophobic amino acids would be important structurally (perhaps forming part of the enzyme core) whereas, more intriguingly, others would be positioned on the surface of the protein (perhaps defining interaction sites). We also envisaged that mutating select hydrophobic residues might help overcome the solubility issues of A3G and other family members [8]. All 69 mutants were analyzed using the E. coli mutation assay, because in vitro experiments were hindered by the fact that A3G198-384 frequently precipitated during biochemical purification and invariably during long-term storage.
Fig. 2 Mutator phenotype of 69 APOBEC3G alanine substitution mutants. Histograms showing the relative RifR mutation frequencies of cells expressing the vector control (−), A3G198-384 (+) or derivatives with alanine substitutions at the underlined amino ...
Twelve independent RifR experiments, each with at least 10 constructs (and 8–12 independent cultures per mutant), were required to analyze the 69 alanine mutant derivatives of A3G198-384. Because it was not feasible to examine all 69 mutants simultaneously, minimizing inter-experiment variability was important. This was done by normalizing each experiment to the median RifR mutation frequency of cells expressing the control vector (i.e., the spontaneous mutation frequency was set to 1 and used as a baseline). The actual vector control values ranged from 0.8 to 3.9 RifR mutants per 10^7 viable cells (n=12 experiments). This approach enabled the mutagenic effects of A3G198-384 and its derivatives to be compared across experiments. For instance, the first two columns of each row of histograms report the relative mutation frequencies of the vector control and A3G198-384, which increased the RifR mutation frequency 14.9-fold (SEM: 1.5-fold; n=12 experiments; the actual values ranged from 15.1 to 54.5 RifR mutants per 10^7 viable cells). Although the raw experimental values fluctuated modestly between experiments (attributable to factors such as incubation temperature, freshness of the rifampicin-containing plates, colony size upon counting and time in stationary phase), the small SEMs indicated that the relative values were consistent between experiments and therefore readily comparable.
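The normalization described above can be illustrated with a minimal sketch. It assumes that each experiment yields a list of RifR mutant counts per 10^7 viable cells for every construct, and that the SEM is taken over the per-experiment fold values. The function names and all numbers below are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
from math import sqrt
from statistics import mean, median, stdev


def relative_frequency(construct_counts, vector_counts):
    """Median mutation frequency of a construct divided by the median
    of the vector control measured in the same experiment."""
    return median(construct_counts) / median(vector_counts)


def mean_and_sem(values):
    """Mean and standard error of the mean across experiments."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical RifR mutants per 10^7 viable cells (illustrative only).
experiments = [
    {"vector": [1.0, 2.0, 3.0], "a3g": [20.0, 30.0, 40.0]},
    {"vector": [2.0, 4.0, 6.0], "a3g": [40.0, 52.0, 70.0]},
]

# One fold value per experiment; the vector control is 1 by construction.
fold = [relative_frequency(e["a3g"], e["vector"]) for e in experiments]
average, sem = mean_and_sem(fold)
print(fold, average, sem)
```

Dividing by the vector-control median of the same experiment removes day-to-day shifts in the baseline, so only the relative effect of each construct is carried into the comparison between experiments.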
We were first struck by the surprising number of alanine mutants that elicited levels of DNA deaminase activity that were at least 3-fold greater than those of the vector control. In total, 31/69 mutants met this threshold. Several of the other mutants also triggered mutation frequencies significantly above those of the vector control cells and above those of cells expressing a catalytically dead A3G variant, E259A [11].
The second notable feature of this dataset was an obvious clustering of non-essential and essential residues, defined by alanine substitution mutants that retained or lost significant activity, respectively. Approximately two-thirds of the non-essential amino acids were found in the A3G region spanning 198-275, with 224-253 particularly striking. Conversely, the majority of the essential residues were found in the C-terminal interval, 276-384, suggesting that the C-terminal end of A3G is required for the integrity of the enzyme.
Third, the alanine mutant data agreed with those from the deletion studies. Toward the N-terminal end, F202, F206 and W211 were required for RifR mutagenesis, explaining why all of the A3G variants starting at position 215 were inactive. Similarly, L364, L375, I378 and L379 were clearly required, offering a reasonable explanation for why all three C-terminal deletion constructs were inactive.
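The agreement between the deletion and alanine data can be checked with a short sketch that lists which of the essential terminal residues named above fall outside a given construct. The residue positions are taken from the text. The boundaries of the two C-terminal truncations (1-362 and 1-344) are an assumption: they presume that these constructs begin at residue 1 of the 384-residue protein. The helper name is hypothetical.

```python
# Essential terminal residues named in the text (label -> position).
ESSENTIAL_TERMINAL = {
    "F202": 202, "F206": 206, "W211": 211,
    "L364": 364, "L375": 375, "I378": 378, "L379": 379,
}


def missing_essential(start, end):
    """Essential terminal residues lying outside start..end (inclusive)."""
    return [name for name, pos in ESSENTIAL_TERMINAL.items()
            if not (start <= pos <= end)]


# Construct boundaries; the two truncations are assumed to start at residue 1.
constructs = {
    "198-384": (198, 384),
    "215-384": (215, 384),
    "1-362 (lacks 22 C-terminal residues)": (1, 362),
    "1-344 (lacks 40 C-terminal residues)": (1, 344),
}

for label, (start, end) in constructs.items():
    print(label, missing_essential(start, end))
```

A construct that loses any of these residues is expected to be inactive. The converse does not follow, since retaining all of them is necessary for activity but not a guarantee of it.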
3.4. A3G198-384 and derivative mutator activities are not simply explained by expression levels
To help exclude the possibility that a lack of deaminase activity might simply be due to reduced solubility or expression, the relative abundance of each protein was analyzed by SDS-PAGE. Rather than examining whole cell extracts, the soluble (supernatant) and insoluble (pellet) fractions were considered separately (their sum reflecting whole cell levels). A representative Coomassie-stained gel is shown for the first five GST-A3G198-384 derivatives, F202A, F204A, F206A, W211A and V212A. The supernatant/pellet ratio of all five mutants was similar to that of the parent construct, indicating that the low activity levels for F202A, F206A and V212A were not simply due to solubility or expression deficiencies. Anti-GST and anti-A3G immunoblots confirmed the identity of the Coomassie-stained bands. The overwhelming majority of the remaining mutants were as soluble as GST-A3G198-384, and some were even more soluble. However, six mutants were less soluble. Four of these, L260A, C261A, C281A and C308A, caused significant mutation frequency increases, indicating that these mutants are catalytically active and that the corresponding mutation frequencies may be underestimates. Correction factors were not calculated because these mutants did not affect the major conclusions. The other two less soluble mutants, W269A and C288A (one of the conserved zinc-coordinating positions), showed no activity. We were therefore unable to determine whether this was due to gross insolubility, to a loss of enzyme activity or to both. Nevertheless, the expression data indicated that the vast majority of mutants are well expressed and modestly soluble and, therefore, that the corresponding E. coli-based activity data are informative.
Fig. 3 GST-A3G198-384 expression data. (A) A representative gel showing the soluble (supernatant) and insoluble (pellet) amounts of GST, GST-A3G198-384 (WT) and 5 mutant derivatives. The S/P ratio of the boxed bands is shown below each lane. The E. coli protein(s) ...
3.5. The DNA deaminase results corroborate a predicted 3-D structure for A3G198-384
To begin to address whether there was a correlation between activity level and structural position, we used the APOBEC2 crystal structure to model A3G198-384. Side chains of all of the mutated residues were added to the model and colored green or red, representing non-essential or essential residues, respectively. This scheme revealed two striking correlations. First, the side chains of most of the essential residues were positioned toward the core of the protein, facing inward and away from solvent-accessible areas. Second, most of the side chains of the residues that were not required for DNA deamination appeared in external, solvent-accessible spaces. A particularly interesting (and apparently dispensable) cluster was located within the predicted β1-loop-β2-loop region (M227, W232, V233, L234, L235, F241, L242, C243, F252 and L253). It is tempting to speculate that this region constitutes a protein interaction surface, possibly involved in an association with the N-terminal half of A3G. Such a possibility is supported by the APOBEC2 crystal structure, which shows that the analogous β2 strand forms extensive anti-parallel contacts with the β2 strand of another APOBEC2 molecule, resulting in a dimer. Analogous contacts may zip together the N- and C-terminal halves of A3G.
Fig. 4 Four views of a model A3G198-384 structure based on human APOBEC2. The predicted α helices and β sheets correspond with those shown in . Relevant amino acid side chains are depicted in green or red to reflect the activity or ...
The structural model also afforded reasonable explanations for the essential nature of the N- and C-terminal ends of A3G198-384. N-terminal residues preceding the β1 strand may help stabilize the β-sheet core of A3G198-384 such that removing (or mutating) these N-terminal residues would likely cause α5 to dissociate from the core and thereby render the resulting protein nonfunctional. Similarly, the C-terminal domain α5 helix appears positioned to help stabilize the zinc-coordinating active site, which consists of β1, β3, β4, α1 and α2.
The structural model for A3G198-384 is likely to be reasonably accurate because 31/35 of the hydrophobic residues required for DNA deaminase activity are similar (11/31) or identical (20/31) to homologous amino acids in APOBEC2. Many of the 31 residues are located within or near predicted secondary structural elements, where they are probably required for hydrophobic interactions that maintain the overall structural integrity of the enzyme. These residues include L220 and Y222 in β1, F262, I266 and W269 in α1, Y277, V279 and F282 in β3, M295 in α2, V305, L307, I309, I314 and Y315 in β4, L325 and L328 in α3, I335 and M338 in β5, F343, W347, F350 and V351 in α4 and L364, L375, I378 and L379 in α5. The strong correlation between conservation and activity is further bolstered by its periodicity, which is apparent at every other residue in β-strands and every 3 or 4 residues in α-helices. The side chains of these important residues face toward the inside of the protein structure. Moreover, these correlations are even more striking when one considers the fact that these two proteins have less than 40% identity overall. Therefore, the E. coli-based mutation data strongly indicate that both the secondary structural elements and the overall three-dimensional folds of A3G198-384 are similar to those of APOBEC2.
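The periodicity noted above can be inspected directly from the residue positions listed in this paragraph. The following minimal sketch computes the spacing between consecutive essential residues within each predicted secondary structural element. The positions and element assignments are copied from the text; the helper name is hypothetical, and only residues named in the paragraph are included, so larger gaps may simply reflect positions that were not listed.

```python
# Essential residue positions per predicted element, as listed in the text.
ESSENTIAL_BY_ELEMENT = {
    "β1": [220, 222],
    "α1": [262, 266, 269],
    "β3": [277, 279, 282],
    "α2": [295],
    "β4": [305, 307, 309, 314, 315],
    "α3": [325, 328],
    "β5": [335, 338],
    "α4": [343, 347, 350, 351],
    "α5": [364, 375, 378, 379],
}


def spacings(positions):
    """Differences between consecutive residue positions."""
    return [b - a for a, b in zip(positions, positions[1:])]


gaps = {name: spacings(pos) for name, pos in ESSENTIAL_BY_ELEMENT.items()}

for name, values in gaps.items():
    print(name, values)
```

Spacings of 2 within strands (for example V305, L307 and I309 in β4) and of 3 or 4 within helices (for example F262, I266 and W269 in α1) are the pattern expected when side chains on one face of the element point toward the protein core.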
A major unanswered question is how A3G binds DNA. One clue may be provided by the location of the conserved HXE-X23–28-CX2–4C active site, which appeared uniquely positioned on the outside of the predicted structure, with the zinc-coordinating histidine and the two cysteines appearing toward the ends of the α1 and α2 helices, respectively. This positioning, together with the monomeric nature of A3G198-384, suggested that single-strand DNA may be contacted by these two helices. The structural model and functional data presented here, together with a full-length A3G model published recently [21], will provide a solid foundation for future structure-function studies of A3G and other APOBEC family members.