APOBEC3G is a single-strand DNA cytosine deaminase capable of blocking retrovirus and retrotransposon replication. APOBEC3G has two conserved zinc-coordinating motifs but only one is required for catalysis. Here, deletion analyses revealed that the minimal catalytic domain consists of residues 198-384. Size exclusion assays indicated that this protein is monomeric. Many (31/69) alanine substitution derivatives of APOBEC3G198-384 retained significant to full levels of activity. These data corroborated an APOBEC2-based structural model for the catalytic domain of APOBEC3G indicating that most non-essential residues are solvent accessible and most essential residues cluster within the protein core.
Human APOBEC family members are important mediators of adaptive and innate immune responses (reviewed by [1,2]). These proteins are defined by a highly conserved zinc-coordinating motif, HXE-X23–28-CX2–4C, in which the histidine and the two cysteines position zinc and the glutamate positions water to promote the nucleophilic deamination of cytosines within single-stranded polynucleotide substrates (usually DNA). One family member, APOBEC3G (A3G), was identified as a cellular protein capable of blocking the replication of virion infectivity factor (Vif)-defective HIV-1. Independent work showed that A3G was capable of DNA cytosine deamination. Subsequent studies linked these two activities by demonstrating that A3G could inhibit the replication of HIV-1 and other retroviruses by deaminating viral cDNA cytosines to uracils during reverse transcription [5–7]. Uracils template the incorporation of adenines during the synthesis of the complementary viral DNA strand, and subsequent replication (or DNA repair) ultimately produces strand-specific C/G to T/A transition mutations (hypermutations).
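The motif notation above can be read as a simple pattern: H, any residue, E, a spacer of 23 to 28 residues, C, 2 to 4 residues, and C. The following minimal Python sketch, which is an illustration and not part of the original study, scans a protein sequence for that pattern. The function name and the toy sequence are hypothetical; the toy sequence is synthetic and is not a real APOBEC sequence.

```python
import re

# HXE-X(23-28)-CX(2-4)C written as a regular expression:
# H, any residue, E, 23-28 residues, C, 2-4 residues, C.
ZINC_MOTIF = re.compile(r"H.E.{23,28}C.{2,4}C")


def find_zinc_motifs(protein_seq):
    """Return (start, end, match) tuples for non-overlapping motif hits.

    Positions are 1-based and inclusive, as is conventional for residues.
    """
    seq = protein_seq.upper()
    return [(m.start() + 1, m.end(), m.group()) for m in ZINC_MOTIF.finditer(seq)]


# Synthetic toy sequence (hypothetical, NOT a real APOBEC sequence):
# a 25-residue spacer between "HAE" and "CPPC".
toy_sequence = "MGG" + "HAE" + "A" * 25 + "CPPC" + "GGK"
hits = find_zinc_motifs(toy_sequence)
print(hits)
```

A pattern match of this kind only flags candidate motifs; it says nothing about whether a given domain is catalytically active, which is the question the experiments below address.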
The deaminase activity of A3G has been studied genetically and biochemically [5,8–10]. It has been mapped to the conserved C-terminal zinc-coordinating domain [9–12]. The local dinucleotide deamination preference of A3G (5′-CC) is readily observed in AIDS patient-derived HIV-1 DNA sequences as viral genomic strand 5′-GG to 5′-AG hypermutations (cDNA strand 5′-CC to 5′-CT transition mutations). Here, an extensive mutational analysis of A3G was performed to delineate the minimal region required for catalysis, to define nonessential (and essential) amino acids and to corroborate structure-based predictions.
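The relationship between cDNA-strand 5′-CC to 5′-CT changes and genomic-strand 5′-GG to 5′-AG changes follows from base pairing, and it can be illustrated with a short Python sketch. This is a deliberately simplified illustration, not a model of the enzyme: it assumes that the 3′ cytosine of every 5′-CC on the cDNA strand is deaminated (100% efficiency) and that each resulting uracil is read as thymine. The sequence and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def deaminate_cc(cdna_strand):
    """Convert the 3' C of every 5'-CC dinucleotide to T (C -> U, read as T).

    Simplifying assumption: every eligible cytosine is deaminated.
    The dinucleotide context is evaluated on the unmodified input strand.
    """
    bases = list(cdna_strand)
    for i in range(1, len(cdna_strand)):
        if cdna_strand[i - 1] == "C" and cdna_strand[i] == "C":
            bases[i] = "T"
    return "".join(bases)


# Hypothetical genomic (plus) strand fragment containing two 5'-GG sites.
genomic = "ATGGACGGT"
cdna = reverse_complement(genomic)          # minus-strand cDNA, 5' to 3'
mutated_cdna = deaminate_cc(cdna)           # 5'-CC -> 5'-CT on the cDNA
mutated_genomic = reverse_complement(mutated_cdna)

print(genomic, "->", mutated_genomic)
```

In this toy example each 5′-GG on the genomic strand becomes 5′-AG, which is the signature described in the text.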
The A3G cDNA used here matches NM_021822. A3G and mutant derivatives were expressed as GST fusion proteins using pGEX6P1 or pGEX6P2 (GE Healthcare). An EcoRI-SalI DNA fragment from pTrc99A-A3G encoding full-length A3G was sub-cloned directly into pGEX6P1. A3G deletion mutants were constructed by amplifying the relevant A3G coding regions, digesting the resulting PCR products with SmaI and SalI and ligating them into the SmaI and XhoI sites of pGEX6P2. Alanine mutants were constructed using the QuikChange protocol (Stratagene). All constructs were verified by DNA sequencing. An oligonucleotide table is available online.
The E. coli-based rifampicin-resistance (RifR) mutation assay has been used extensively to monitor the intrinsic DNA cytosine deaminase activity of several A3 proteins including A3G (e.g., [4,9]). Aliquots of saturated overnight cultures (LB plus 200 μg/mL ampicillin) were spread onto LB plates containing 100 μg/mL rifampicin to select for RifR mutants, and diluted aliquots were spread onto LB plates containing 200 μg/mL ampicillin to determine the number of viable cells. Mutation frequencies were calculated by dividing the number of RifR mutants by the number of viable cells in each culture. For the deletion experiments, the mutation frequencies of 8 individual cultures were plotted and the median indicated. For the alanine mutant experiments, the RifR mutation frequency for each construct was determined by assaying the median RifR mutation frequency for 8–12 independent cultures, calculating the fold-difference relative to the median value of the vector control cultures and averaging at least two (and up to 12) independent experiments.
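The calculation described above (per-culture frequency, median across cultures, fold-difference relative to the vector control) can be sketched in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical and the colony counts are invented for the example, not taken from the experiments.

```python
from statistics import median


def mutation_frequency(rifr_mutants, viable_cells):
    """RifR mutants divided by viable cells for a single culture."""
    return rifr_mutants / viable_cells


def median_frequency(cultures):
    """Median mutation frequency over (rifr_mutants, viable_cells) pairs."""
    return median(mutation_frequency(r, v) for r, v in cultures)


def fold_over_vector(construct_cultures, vector_cultures):
    """Fold-difference of a construct relative to the vector control median."""
    return median_frequency(construct_cultures) / median_frequency(vector_cultures)


# Hypothetical counts for 8 cultures each: (RifR colonies, viable cells plated).
vector = [(2, 1e7), (3, 1e7), (1, 1e7), (4, 1e7),
          (2, 1e7), (3, 1e7), (2, 1e7), (5, 1e7)]
construct = [(10, 1e7), (12, 1e7), (8, 1e7), (11, 1e7),
             (15, 1e7), (9, 1e7), (10, 1e7), (13, 1e7)]

vector_median = median_frequency(vector)
fold = fold_over_vector(construct, vector)
print(f"vector median: {vector_median:.2e}; fold over vector: {fold:.2f}")
```

Using the median rather than the mean limits the influence of occasional "jackpot" cultures in which a mutation arose early during growth.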
GST-A3G protein levels were monitored by expressing the constructs in E. coli BL21 DE3 RIL (Stratagene), sonicating the cells in lysis buffer (100 mM NaCl, 50 mM Na2HPO4/NaH2PO4 [pH 7.0], protease inhibitor [Roche]), separating the soluble (supernatant) and insoluble (pellet) fractions by centrifugation (12,110 × g, 20 min, 4 °C) and fractionating aliquots of the resulting proteins by SDS-PAGE. Proteins were detected by Coomassie blue staining and quantified with ImageJ software (http://rsb.info.nih.gov/ij/). Immunoblots were done with antibodies from GE Healthcare (anti-GST) and from J. Lingappa (anti-A3G). For size exclusion experiments, GST-A3G198-384 was bound to glutathione sepharose, washed several times with lysis buffer, eluted with PreScission protease (GE Healthcare) in 1 mM DTT, 50 mM Na2HPO4/NaH2PO4 [pH 7.0] buffer and quantified, and 1 mg (approx. 1 mL) was immediately passed through a Superdex 75 size exclusion column (GE Healthcare) and detected in fractions by UV (absorbance at 280 nm). GST and lysozyme were purchased from Sigma.
A3G198-384 and APOBEC2 primary amino acid sequences were aligned with the homology modeling module of the InsightII program (Accelrys) (Online Fig. S1). Secondary structural elements of A3G198-384 were predicted using the HNN program [14,15]. A model was generated by fitting these elements (allowing for gaps) to those in the crystal structure of APOBEC2 (PDB 2NYT). No gross differences in secondary structure were observed. The same InsightII module was used to construct and refine the model by minimizing energetically unfavorable amino acid side chain interactions.
Chimeric APOBEC3 proteins and site-directed mutants have been used to demonstrate that the intrinsic DNA cytosine deaminase activity of human APOBEC3G resides in the conserved C-terminal zinc-coordinating domain [9–12]. To further delineate the minimal domain required for catalysis, nine A3G deletion constructs were tested for mutagenic activity in the E. coli-based RifR mutation assay (Fig. 1). Bacteria expressing GST showed a median of 2.5 RifR mutants per 10^7 viable cells. Expression of full-length A3G fused to GST caused a 4.4-fold increase in mutation frequency. Apart from two notable exceptions, all of the A3G deletion constructs were inactive. The inactivity of constructs lacking 22 or 40 C-terminal residues was consistent with prior work showing that A3G lacking 38 or 89 C-terminal amino acids was unable to inhibit Vif-defective HIV [17,18]. A more interesting result emerged from analyses of the N-terminal deletion set. A3G variants encoding residues 175-384 or 198-384, but not 215-384, were considerably more mutagenic than full-length A3G. These data demonstrated that the entire N-terminal zinc-binding domain is dispensable for activity, and only A3G residues 198-384 are required.
Previous studies indicated that an A3G C97A mutant was incapable of co-immunoprecipitating wildtype A3G, but was still capable of DNA deamination [12,19]. Consistent with these studies, A3G198-384 profiled as a 22 kDa monomer in size exclusion experiments, migrating clearly between the elution positions of lysozyme (14 kDa) and GST (25 kDa) (Fig. 1C). It is not likely that stable oligomeric forms of A3G198-384 exist, because fractions 1–27 did not contain protein peaks.
To more precisely delineate the residues and domains required for DNA deamination, the A3G198-384 variant was used to construct a series of 69 alanine mutants (Fig. 2). Mutagenesis was concentrated on hydrophobic residues and cysteines. This strategy was motivated partly by the likelihood that some of the hydrophobic amino acids would be important structurally (perhaps forming part of the enzyme core) whereas, more intriguingly, others would be positioned on the surface of the protein (perhaps defining interaction sites). We also envisaged that mutating select hydrophobic residues might help overcome the solubility issues of A3G and other family members [8,10,19]. All 69 mutants were analyzed using the E. coli-based RifR mutation assay, because in vitro experiments were hindered by the fact that A3G198-384 frequently precipitated during biochemical purification and invariably during long-term storage.
Twelve independent RifR experiments, each with at least 10 constructs (and 8–12 independent cultures per mutant), were required to analyze 69 alanine mutant derivatives of A3G198-384 (Fig. 2). It was not feasible to simultaneously examine all 69 mutants, and minimizing inter-experiment variability was therefore important. This was done by normalizing each experiment to the median RifR mutation frequency of cells expressing the control vector (i.e., the spontaneous mutation frequency was set to 1 and used as a baseline). The actual vector control values ranged from 0.8 to 3.9 RifR mutants per 10^7 viable cells (n=12 experiments). This approach readily enabled the mutagenic effects of A3G198-384 and derivatives to be compared between multiple experiments. For instance, the first two columns of each row of Fig. 2 report the relative mutation frequency of the vector control and A3G198-384, which increased the RifR mutation frequency 14.9-fold (SEM: 1.5-fold; n=12 experiments; the actual values ranged from 15.1 to 54.5 RifR mutants per 10^7 viable cells). Although the raw experimental values fluctuated modestly between experiments (attributable to factors such as incubation temperatures, freshness of the rifampicin-containing plates, colony sizes upon counting, time in stationary phase, etc.), the small SEMs indicated that the relative inter-experiment values were remarkably constant and therefore readily comparable.
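The normalization and summary statistics described above can be illustrated with a brief Python sketch. It assumes that the SEM is the sample standard deviation of the per-experiment fold values divided by the square root of the number of experiments, which is the usual definition but is not stated explicitly here. The function names are hypothetical and the per-experiment medians are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_vector(construct_median, vector_median):
    """Express a construct's median frequency relative to the vector (set to 1)."""
    return construct_median / vector_median


def mean_and_sem(fold_values):
    """Mean and standard error of the mean across independent experiments."""
    n = len(fold_values)
    return mean(fold_values), stdev(fold_values) / sqrt(n)


# Hypothetical per-experiment medians (RifR mutants per 10^7 viable cells):
# (vector control, construct). Raw values differ between experiments,
# but the normalized fold values are similar.
experiments = [(1.0, 15.0), (2.0, 30.0), (4.0, 56.0)]

folds = [normalize_to_vector(construct, vector) for vector, construct in experiments]
fold_mean, fold_sem = mean_and_sem(folds)
print(f"fold values: {folds}; mean: {fold_mean:.2f}; SEM: {fold_sem:.2f}")
```

The toy numbers show the point made in the text: raw frequencies can vary several-fold between experiments while the vector-normalized values remain tightly grouped.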
We were first struck by the surprising number of alanine mutants that elicited levels of DNA deaminase activity that were at least 3-fold greater than those of the vector control (Fig. 2). In total, 31/69 mutants met this threshold. Several of the other mutants also triggered mutation frequencies significantly above those of the vector control cells and above those of cells expressing a catalytically dead A3G variant, E259A [11,20].
The second notable feature of this dataset was an obvious clustering of non-essential and essential residues, defined by alanine substitution mutants that retained or lost significant activity, respectively. Approximately two-thirds of the non-essential amino acids were found in the A3G region spanning 198-275, with 224-253 particularly striking. Conversely, the majority of the essential residues were found in the C-terminal interval, 276-384, suggesting that the C-terminal end of A3G is required for the integrity of the enzyme.
Third, the alanine mutant data agreed with those from the deletion studies. Toward the N-terminal end, F202, F206 and W211 were required for RifR mutagenesis, explaining why all of the A3G variants starting at position 215 were inactive. Similarly, L364, L375, I378 and L379 were clearly required, offering a reasonable explanation for why all three C-terminal deletion constructs were inactive.
To help exclude the possibility that a lack of deaminase activity might simply be due to reduced solubility or expression, the relative abundance of each protein was analyzed by SDS-PAGE. Rather than examine whole cell extracts, the soluble (supernatant) and insoluble (pellet) fractions were considered separately (the sum of course reflecting whole cell levels). A representative Coomassie-stained gel is shown for the first five GST-A3G198-384 derivatives, F202A, F204A, F206A, W211A and V212A (Fig. 3A). The supernatant/pellet ratio of all five mutants was similar to that of the parent construct, indicating that the low activity levels for F202A, F206A and V212A were not simply due to solubility or expression deficiencies. Anti-GST and anti-A3G immunoblots confirmed the identity of the Coomassie-stained bands (Fig. 3B). The overwhelming majority of the remaining mutants were as soluble as GST-A3G198-384, and some were even more soluble (Fig. 3C). However, six mutants were less soluble. Four of these mutants, L260A, C261A, C281A and C308A, caused significant mutation frequency increases, indicating that these mutants are catalytically active and that the corresponding mutation frequencies may be underestimates. Correction factors were not calculated because these mutants did not impact major conclusions. However, two of the six less soluble mutants, W269A and C288A (one of the conserved zinc-coordinating positions), showed no activity. We were therefore unable to determine whether this was due to gross insolubility, to a loss of enzyme activity or to both of these reasons. Nevertheless, the expression data indicated that the vast majority of mutants are well expressed and modestly soluble and, therefore, that the corresponding E. coli-based activity data are informative.
To begin to address whether there was a correlation between activity level and structural position, we used the APOBEC2 crystal structure to model A3G198-384 (Fig. 4). Side chains of all of the mutated residues were added to the model and colored green or red, representing non-essential or essential residues, respectively. This scheme revealed two striking correlations. First, the amino acid side chains of most of the essential residues were positioned toward the core of the protein, facing inward and away from solvent-accessible areas. Second, most of the amino acid side chains of the residues that were not required for DNA deamination appeared in external, solvent-accessible spaces. A particularly interesting (and apparently dispensable) cluster was located within the predicted β1-loop-β2-loop region (M227, W232, V233, L234, L235, F241, L242, C243, F252 and L253). It is tempting to speculate that this region constitutes a protein interaction surface, possibly involved in an association with the N-terminal half of A3G. Such a possibility is supported by the APOBEC2 crystal structure, which shows that the analogous β2 strand forms extensive anti-parallel contacts with the β2 strand of another APOBEC2 molecule, resulting in a dimer. Analogous contacts may zip together the N- and the C-terminal halves of A3G.
The structural model also afforded reasonable explanations for the essential nature of the N- and C-terminal ends of A3G198-384 (Fig. 4). N-terminal residues preceding the β1 strand may help stabilize the β-sheet core of A3G198-384 such that removing (or mutating) these N-terminal residues would likely cause α5 to dissociate from the core and thereby render the resulting protein nonfunctional. Similarly, the C-terminal domain α5 helix appears positioned to help stabilize the zinc-coordinating active site, which consists of β1, β3, β4, α1 and α2.
The structural model for A3G198-384 is likely to be reasonably accurate because 31/35 of the hydrophobic residues required for DNA deaminase activity are similar (11/31) or identical (20/31) to homologous amino acids in APOBEC2. Many of the 31 residues are located within or near predicted secondary structural elements, which are probably required for hydrophobic interactions that maintain the overall structural integrity of the enzyme. These residues include L220 and Y222 in β1, F262, I266 and W269 in α1, Y277, V279 and F282 in β3, M295 in α2, V305, L307, I309, I314 and Y315 in β4, L325 and L328 in α3, I335 and M338 in β5, F343, W347, F350 and V351 in α4 and L364, L375, I378 and L379 in α5. The strong correlation between conservation and activity is further bolstered by the periodicity of the correlation, which is apparent at every other residue in β-strands and every 3 or 4 residues in α-helices. The amino acid side chains of these important residues face toward the inside of the protein structure (Fig. 4). Moreover, these correlations are even more striking when one considers the fact that these two proteins have less than 40% identity overall. Therefore, the E. coli-based mutation data strongly indicate that both the secondary structural elements and the overall three-dimensional folds of A3G198-384 are similar to those of APOBEC2.
A major unanswered question is how A3G binds DNA. One clue may be provided by the location of the conserved HXE-X23–28-CX2–4C active site, which appeared uniquely positioned on the outside of the predicted structure, with the zinc-coordinating histidine and the two cysteines appearing toward the ends of the α1 and α2 helices, respectively (Fig. 4). This positioning, together with the monomeric nature of A3G198-384, suggested that single-strand DNA may be contacted by these two helices. The structural model and functional data presented here, together with a full-length A3G model published recently, will provide a solid foundation for future structure-function studies of A3G and other APOBEC family members.
This work was supported by NIH grants AI064046 (RSH) and AI073167 (HM) and by the University of Minnesota’s Leukemia Research Fund (RSH) and Grant-in-Aid Program (HM). RSH is a Searle Scholar and a University of Minnesota McKnight Land Grant Assistant Professor. We thank A. Briggs, P. Gross and Y. Sham for technical assistance, Harris and Matsuo laboratory members for helpful discussions and the University of Minnesota AGAC and SCI for DNA sequencing and computational resources, respectively.