The AID/APOBEC family enzymes convert cytosines in single-stranded DNA to uracil causing base substitutions and strand breaks. They are induced by cytokines produced during the body’s inflammatory response to infections, and help combat infections through diverse mechanisms. AID is essential for the maturation of antibodies and causes mutations and deletions in antibody genes through somatic hypermutation (SHM) and class-switch recombination (CSR) processes. One member of the APOBEC family, APOBEC1, edits mRNA for a protein involved in lipid transport. Members of the APOBEC3 subfamily in humans (APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G and APOBEC3H) inhibit infections of viruses such as HIV, HBV and HCV, and retrotransposition of endogenous retroelements through mutagenic and non-mutagenic mechanisms. There is emerging consensus that these enzymes can cause mutations in the cellular genome at replication forks or within transcription bubbles depending on the physiological state of the cell and the phase of the cell cycle during which they are expressed. We describe here the state of knowledge about the structures of these enzymes, regulation of their expression, and both the advantageous and deleterious consequences of this expression including carcinogenesis. We highlight similarities among them and present a holistic view of their regulation and function.
Activation-induced deaminase (AID) and apolipoprotein B mRNA-editing catalytic polypeptide-like (APOBEC) proteins are found in all tetrapods including the primates and in bony fish including the lampreys. They deaminate cytosine to uracil in single-stranded DNA (ssDNA)1–6 or in both ssDNA and RNA.5,7–9 Primates appear to have the largest number of proteins in this family10 and in humans they include AID, APOBEC1, APOBEC2, seven APOBEC3 subfamily members (APOBEC3A, APOBEC3B, APOBEC3C, APOBEC3D/E, APOBEC3F, APOBEC3G and APOBEC3H) and APOBEC4.11 In this review we will principally discuss the biochemical properties and biological functions of the mammalian AID/APOBEC family proteins, with the exception of APOBEC2 and APOBEC4. The latter two proteins appear not to be catalytically active and will not be discussed here.
These enzymes are part of the cellular innate and adaptive immune response that protects the host organism against infection. Although the biochemical properties of these enzymes will be described below, the principal focus of this review is to summarize what is known about their biological functions. The important immunological functions of these enzymes come with the potential risk of causing considerable damage to the host genome and we will review what is known about the harmful effects of these enzymes in mammalian cells and in humans. A major goal of this review is to identify the gaps in our understanding of these enzymes. Consequently, the review will highlight the limitations of the available data and the inadequacies of the tools of study or biological models.
Most recent reviews have treated AID, APOBEC1 and APOBEC3 proteins as if they were unrelated, and have not emphasized the functional overlaps between them. Here, we will identify similarities between them and try to integrate what is known about these enzymes to create a coherent narrative. In particular, we will outline how the regulation of AID overlaps with that of the APOBEC3 enzymes during the inflammatory response to an infection, and suggest a model of how the biological functions of these enzymes go hand in hand with their ability to cause cellular malfunction.
AID/APOBEC proteins have a characteristic zinc-coordination motif (H-X-E-X(23-28)-P-C-X-C) within the active site, where a water molecule binds Zn2+ and the metal ion is coordinated by one histidine and two cysteines.12 While the genes for AID, APOBEC1 (A1), APOBEC3A (A3A), APOBEC3C (A3C) and APOBEC3H (A3H) encode a single Zn2+-binding domain, the genes for APOBEC3B (A3B), APOBEC3D/E (A3D/E), APOBEC3F (A3F) and APOBEC3G (A3G) have resulted from duplications of the primordial gene10,13 and encode two putative zinc-binding motifs. In all cases where there are two Zn2+-binding domains, only the carboxy-terminal domain is catalytically active. Based on prior work with bacterial and yeast cytidine deaminases, it has been suggested that a conserved glutamate plays a central role in catalysis by shuttling a proton between the bound water molecule and N3 of cytosine, and between the resulting −OH and the exocyclic amino group of cytosine.12 The AID/APOBEC enzymes show little activity towards the free cytosine base, its nucleosides or mononucleotides.1,4
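The zinc-coordination motif described above can be written as a regular expression. The following is a minimal sketch, assuming that each X stands for any single residue and that the spacer is 23 to 28 residues long; the function name and the test peptide are hypothetical and do not correspond to any real protein sequence.

```python
import re

# H-X-E, a spacer of 23 to 28 residues, then P-C-X-C, as given in the text.
# Each X is assumed to be any single amino acid.
ZINC_MOTIF = re.compile(r"H.E.{23,28}PC.C")

def find_zinc_motif(protein_seq):
    """Return the 0-based start of the first motif match, or None if absent."""
    match = ZINC_MOTIF.search(protein_seq.upper())
    return match.start() if match else None

# Synthetic peptide for illustration only (not a real AID/APOBEC sequence).
toy_peptide = "MK" + "HAE" + "G" * 25 + "PCAC" + "LL"
```

A search of this kind only flags candidate deaminase domains; whether a matched domain is catalytically active cannot be inferred from the motif alone, as the two-domain APOBEC3 proteins show.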
Different AID/APOBEC proteins deaminate cytosines in different preferred sequence contexts. They have a stronger preference for specific bases on the 5′ side of the target cytosine than on its 3′ side. While AID prefers WRC14 (W is A or T, R is purine, target cytosine is underlined) sequence, APOBEC3G prefers CCC, and the other family members target YC sequences (Y is pyrimidine) with a preference for T as the pyrimidine.1,15
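The sequence preferences above can be illustrated by scanning a DNA strand for each motif. This is a minimal sketch, assuming that the target cytosine is the 3′-most base of each motif (consistent with the stated preference for bases on the 5′ side); the function names and the example sequence are hypothetical.

```python
import re

# IUPAC degenerate codes used in the text: W = A/T, R = purine, Y = pyrimidine.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG", "Y": "CT"}

def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'WRC' into a lookahead regex,
    so that overlapping sites are all reported."""
    body = "".join(f"[{IUPAC[base]}]" for base in motif)
    return re.compile(f"(?=({body}))")

def target_cytosines(seq, motif):
    """Return 0-based positions of the last base of every motif match,
    which is assumed here to be the target cytosine."""
    pattern = motif_to_regex(motif)
    return [m.start() + len(motif) - 1 for m in pattern.finditer(seq.upper())]

example = "AGCTCCCATC"  # arbitrary sequence for illustration
aid_sites = target_cytosines(example, "WRC")   # AID preference
a3g_sites = target_cytosines(example, "CCC")   # APOBEC3G preference
yc_sites = target_cytosines(example, "YC")     # other family members
```

The scan covers one strand only; sites on the complementary strand would be found by applying the same function to the reverse complement.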
APOBEC2 was the first member of the family for which the crystal structure became available16 and subsequently the structures of several APOBEC3 subfamily proteins have been determined. The structures have been determined for A3A17,18, A3C19 and the C-terminal catalytic domains of A3B and A3F.20–22 Additionally, several structures of the C-terminal domain of APOBEC3G (A3G-CTD) have been reported based on X-ray crystallographic and NMR studies.23–25 As yet, the complete structure of a two Zn2+-domain member of the APOBEC3 subfamily has not been reported.
Using ENDscript 2 software26 we compared representative APOBEC3 subfamily structures and identified their common structural elements (Fig. 1A). The proteins share extensive secondary structural similarity reflecting their sequence similarities and their tertiary structure has a central β sheet surrounded by three helices on each side. The principal difference between the different structures is that while A3F, A3C and A3A have a continuous β2 strand, this strand is interrupted by an α helix or a β turn in A3B and A3G structures (Fig. 1A). The biochemical consequences of this difference are unclear. The active site Zn2+ is coordinated by one His and two Cys residues and these residues are at the edges of two helices that lie close to the surface of the protein (Figs. 1A and 1B). A Coulombic surface potential map of A3F-CTD27 shows that the catalytic center is near a cavity with negative potential and this is likely to be the pocket in which the target cytosine is inserted. There is no groove with a strong positive potential on the protein surface near the catalytic center for nucleic acid binding. Instead, there is a surface patch of neutral/positive potential that extends from the catalytic residues towards loop 7 (Fig. 1B). This may interact with the substrate DNA and two potential paths for the DNA are shown in Figure 1B.
A number of independent lines of evidence show that the principal determinant of DNA sequence specificity within the AID/APOBEC proteins is the loop 7 (Fig. 1B). This conclusion is supported by experiments in which loop 7 was exchanged between different members of the family resulting in the swapping of sequence specificities.28–30 Additionally, replacement of loop 7 in AID with the corresponding sequence in A3A resulted in increased preference for methylated cytosines for deamination that is characteristic of A3A.8 Molecular docking of a single-stranded DNA template with A3A also suggested interaction between several residues in loop 7 with DNA with additional interactions with loops 1, 3 and 5.31 Alanine scanning mutagenesis of loop 7 in AID revealed several residues essential for deamination activity.32 Saturation mutagenesis of loop 7 residues in AID followed by multiple rounds of genetic selection for cytosine deamination confirmed this result.32 In other experiments, D317 in this loop of A3G was substituted with tyrosine resulting in the changing of the sequence-specificity of the enzyme from 5′-CC to 5′-YC.33 Mutational studies of A3F also showed that a replacement of W310 in this loop with alanine resulted in decreased binding to DNA and reduced deaminase activity.34 NMR studies of interactions between deoxynucleotides or ssDNA with A3A showed that residues in the loop 7 interact with nucleotides on either side of the target cytosine.17 Together these studies show that while loop 7 is the principal determinant of sequence specificity in AID/APOBEC enzymes, loops 1, 3 and 5 also contribute to DNA binding.28,35,36 A more complete picture of the mechanism of DNA sequence recognition will emerge when structures of enzyme-DNA co-crystals become available.
Despite extensive studies, the subunit composition of these enzymes is poorly understood and remains controversial. The multimerization of these proteins in vitro has been studied using a large number of biochemical and biophysical techniques including co-immunoprecipitation, yeast two-hybrid analysis, bimolecular fluorescence complementation, size exclusion chromatography, matrix-assisted laser desorption ionization time-of-flight spectrometry, small angle X-ray scattering, X-ray crystallography, nuclear magnetic resonance, density gradient separation of cytoplasmic components, atomic force microscopy as well as through studies of live and fixed cells.17,20,37–47 Based on such studies a variety of different subunit compositions have been reported for most of the proteins. A3G has been studied most extensively in this regard and the data collectively suggest that A3G can be present as a monomer, dimer, tetramer or an oligomer, and the process of oligomerization may depend on protein modifications, intracellular protein concentration, salt conditions, presence of RNA, presence of HIV-1 infectivity factor, Vif (see below), subcellular localization and the cell type in which it is expressed. This makes it difficult to correlate multimerization of this and other proteins in the family with their biological function.
APOBEC1 (A1) is known to cause C to U conversion in mRNA7, but it can also act on cytosines in single-stranded DNA.48 The RNA editing by A1 in the mammalian apolipoprotein B gene creates a termination codon that shortens the resulting protein with altered function.49 The function of its DNA editing ability is not known. AID is required for the creation of high affinity antibodies against antigens. It causes mutations in the immunoglobulin genes and facilitates “maturation” of antibodies following an infection.50 The biological function of APOBEC3 subfamily proteins is to introduce mutations in the genome of viruses infecting cells or within endogenous retroelements during retrotransposition.51,52 APOBEC2 and APOBEC4 will not be discussed further because they do not show any deaminase activity and their functions remain poorly defined.
The biological function of AID in adaptive immunity is well understood, but the biochemical pathways it participates in remain poorly understood at the molecular level. In particular, the steps downstream of AID action on DNA do not fit our understanding of DNA repair pathways and it appears that AID colludes with DNA repair pathways to rearrange the genomes of B lymphocytes and alter their genetic information. It is a major challenge to understand how the enzymes in base-excision repair (BER) and mismatch repair (MMR) cause high levels of mutations and strand breaks with the help of AID, when several decades of studies have shown that these pathways have evolved to protect the genome. Finally, it would be naïve to think that the damage done by AID to the genome can be restricted to the immunoglobulin loci. Increasingly, it is becoming clear that cytosines outside the immunoglobulin genes are targets of AID and that this is a source of mutations and genome instability in B lymphocytes and other cells.
The maturation of antibodies occurs in B cells when they enter transient structures called germinal centers that form within peripheral lymphoid organs. AID causes hypermutations in the “variable region” of antibody genes, resulting in an increase in the affinity of antibodies for antigens. This process is called somatic hypermutation (SHM). During their maturation process, the antibodies are transported to the surface of B lymphocytes and act as receptors, and the ability of these B cell receptors to bind an antigen and interact with follicular helper T cells assures their survival and proliferation. This is called clonal selection53–55 and repeated cycling of B cells through a process of hypermutation and selection is responsible for a time-dependent increase in the affinity of antibodies for antigens derived from an infectious agent.56–58
Prior to an infection, combinatorial reassortment of variable (V), diversity (D) and junction (J) segments generates millions of unique combinations in different cells and creates the V(D)J exon in the immunoglobulin (Ig) gene. The vertebrate B lymphocytes acquire hypermutations in a region that includes the V(D)J exon either through a process that involves damage to DNA followed by error-prone repair (SHM; Fig. 2) or gene conversion (GC) between V(D)J and upstream pseudo-V segments.59 AID is required for both these processes.50,60,61 GC of Ig genes is not observed in humans and will not be discussed here.
The variable region of the Ig gene is approximately 1,500 base pairs (bp) long and extends from about 150 bp downstream of the Ig promoters to the beginning of the intron separating V(D)J from the constant domain exons (Fig. 2). The mutations caused by AID include all transitions and transversions and generally occur at a rate of about 10⁻³ per bp per generation. About 30% of the mutations are found in the two hotspots 5′-RGYW/5′-WRCY (W is A or T, R is purine and Y is pyrimidine) and TTA/TAA62 and show a roughly bell-shaped distribution over the region. There is no detectable strand bias in the mutations, suggesting that the mutational process acts on both the template and the coding strand of the gene.
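Two points in this paragraph lend themselves to a short calculation: each hotspot pair is one motif read from the two strands, and the quoted rate and region length imply an expected mutation load per generation. The sketch below assumes the round figures quoted in the text (1,500 bp and 10⁻³ per bp per generation); the variable and function names are hypothetical.

```python
# Complements for the IUPAC codes used in the text
# (W = A/T is self-complementary; purine R pairs with pyrimidine Y).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "W": "W", "R": "Y", "Y": "R"}

def reverse_complement(motif):
    """Return the motif as read 5' to 3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(motif))

# Figures taken from the text; both are approximate.
VARIABLE_REGION_BP = 1500
MUTATION_RATE = 1e-3  # per bp per generation

# Expected number of mutations in the variable region per cell generation.
expected_per_generation = VARIABLE_REGION_BP * MUTATION_RATE
```

Under these assumptions, WRCY and RGYW are reverse complements, as are TTA and TAA, and the variable region would acquire on the order of one to two mutations per generation.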
The sequence selectivity of purified AID is WRC14, and hence the principal hotspot for SHM, RGYW/WRCY63, may be solely determined by this selectivity. However, copying of the uracils generated by AID results in only C:G to T:A transitions, and the other types of base substitutions found in SHM are caused by translesion synthesis (TLS) polymerases. In this model64, error-prone repair of uracils results in the creation of any of the three possible base substitutions (Fig. 3A). UNG2 excises the uracil creating an abasic (AP) site and TLS polymerases insert any of the four bases across the AP site. TLS polymerase η would predominantly insert an adenine across the AP site causing C:G to T:A mutation65, but other TLS polymerases may insert other bases across the AP site expanding the spectrum of mutations. For example, REV1 inserts C across an AP site66,67 and would create C:G to G:C transversions. This model further suggests that the actions of MMR and UNG2 help extend the mutations to the base pairs flanking the target cytosine64,68,69, but the mechanistic details of this process are unclear. Together, these processes spread the mutations outside the WRC sites that are the targets of AID for deamination and allow the creation of all six base substitution mutations. The roles of TLS polymerases Polη and REV1 in shaping the mutation spectrum in SHM have been reviewed in greater detail elsewhere.70–72
SHM requires transcription of the target sequences73,74 and this topic was reviewed recently.75 There are good biochemical reasons why AID is likely to act at transcription pause or arrest sites. Transcription causes separation of DNA strands potentially exposing them to AID. However, AID is a slow enzyme (rate of deamination of 0.03 sec−1)76–78, and hence cannot capture cytosines in a transcription bubble that normally stays open for about 0.1 sec during normal elongation.79 Thus, AID is much more likely to find cytosines in ssDNA at a pause or arrest site than within a transcript elongation complex.80
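The kinetic argument above can be made concrete with a short calculation. This is a minimal sketch, assuming simple first-order kinetics with a constant rate, which is an illustrative simplification and not a model taken from the cited studies; the 10-second pause duration is a hypothetical value chosen only for comparison.

```python
import math

def prob_deamination(rate_per_s, window_s):
    """Probability of at least one deamination within the window,
    assuming first-order kinetics with a constant rate (illustrative only)."""
    return 1.0 - math.exp(-rate_per_s * window_s)

K_AID = 0.03      # s^-1, deamination rate quoted in the text
T_BUBBLE = 0.1    # s, approximate open time of an elongating bubble (from the text)
T_PAUSE = 10.0    # s, hypothetical pause duration, not from the text

p_elongating = prob_deamination(K_AID, T_BUBBLE)  # about 0.003
p_paused = prob_deamination(K_AID, T_PAUSE)       # about 0.26
mean_wait_s = 1.0 / K_AID                          # about 33 s per event
```

Under these assumptions, a bubble that stays open for 0.1 s is deaminated in roughly 3 of every 1,000 encounters, whereas a pause lasting seconds raises that probability by about two orders of magnitude.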
Many structural and mechanistic aspects of the interaction between AID and the transcription bubble are poorly understood. For example, the non-template strand within the transcription bubble is more accessible than the template strand81 (Fig. 3A), but there is no deamination bias in favor of the non-template strand.68,69 An attractive model that overcomes this problem proposes that the RNA in the transcription bubble is removed by a ribonuclease exposing the template strand.82–84 However, it is unclear as to how the ribonuclease accesses 3′ end of the nascent pre-mRNA for degradation. The RNA is paired with the template strand and lies deep within the RNA polymerase. Also, the forces that keep the two DNA strands apart after the degradation of RNA and removal of the polymerase from the DNA have not been described. Furthermore, if AID acts at transcription pause sites, it is not clear whether the pauses occur randomly along the DNA or are caused by specific events such as formation of secondary structure85,86 or the creation of DNA supercoiling domains.87 Another possibility is that two convergent elongating polymerases could collide within the V(D)J segment causing transcription arrest and creating a region of single-strandedness (Fig. 2).88 Interestingly, a single-molecule study showed that AID can cause T7 RNA polymerase elongation complexes to stall89 eliminating the need for protein or structural factors to promote transcriptional pausing or stalling. This is similar to the original model for how AID may work73,74 and underlines the need for an in vitro coupled transcription/deamination experimental model for answering many of the mechanistic questions about AID biochemistry.
Early work on AID did not provide direct experimental evidence that AID causes cytosine deaminations in the B cell genome. The role of AID in SHM was judged solely through genetics (loss of SHM in AID−/− mice, ref.50) or by determining mutation spectra in mice deficient in the repair of uracils (UNG−/−, ref.90) and/or base-base mismatches (MSH2−/− or MSH6−/−, references68,69,91). More recent studies provide more direct evidence for the creation of uracils by AID.92–94
When DNA from splenocytes of immunized UNG−/− mice was sequentially treated with Escherichia coli Ung and an AP endonuclease to generate strand breaks and the breaks were quantified, the breaks (and hence the uracils) were found to occur in both the V(D)J and the switch regions.92 This study showed that uracils were present in both the DNA strands of the switch region and estimated that there are about 0.8 uracils per 10³ bp in the Sμ region of activated B cells.92 This number is several orders of magnitude higher than the number of uracils found in monocytes from un-immunized mice95, but may not be sufficient to explain the double-strand breaks (DSBs) promoted by AID (see below). In a different study, uracils in DNA were replaced with biotin-containing tags, and total genomic uracils were quantified following ex vivo stimulation of WT and UNG−/− splenocytes.94
In UNG−/− splenocytes, the genomic uracil levels increased about ten-fold in the first three days following stimulation, and this paralleled the increase in AID gene expression and nuclear DNA-cytosine deamination activity. The study estimated that the total number of uracils in the genome of UNG−/− B cells was at least a few thousand94, which is larger than what can be accounted for by the excess uracils in the Ig genes detected by the strand cleavage assay described above.92 This suggests that during the activation of B cells a large number of uracils are introduced in the genome, and most of the uracils lie outside the Ig genes.
When WT splenocytes were stimulated, UNG2 (the nuclear form of UNG) expression and nuclear uracil excision activity increased commensurate with the increase in AID expression and, in contrast to UNG−/− splenocytes, no net increase in genomic uracils was detected. The lack of increase in uracil content was also seen in genomes of stimulated human tonsillar cells.94 Hence, within the limits of detection of this assay, generation of genomic uracils and their removal are balanced during normal B cell maturation.
Although this study found that most uracils created by AID must lie outside Ig genes, it is unclear whether these uracils were in U:A or U•G pairs. The study used E. coli Ung to excise the uracils94 and this enzyme excises uracils from both U•G and U:A pairs. Hence it is possible that some of the detected uracils resulted from the insertion of dU by DNA polymerases across from dA. Such incorporations of uracils in DNA have no mutagenic consequences. Mutations caused by AID in non-Ig genes are discussed below in section 3.4.
Following an infection, mammalian B cells undergo an additional genetic alteration, class-switch recombination (CSR, refs.96,97). It is a region-specific recombination process that replaces constant heavy chain of immunoglobulins, μ, with one of the other chains (Fig. 2). This creates antibodies of different isotypes that interact with different cellular receptors to perform distinct immune functions. AID is essential for CSR.50
AID promotes the formation of DSBs in the two switch regions98 that participate in recombination that causes the isotype switch. The non-homologous end-joining or alternate end-joining processes connect the two broken ends replacing the μ constant segment with one of the downstream constant segments (Fig. 2, ref.97). The critical unanswered question is how AID promotes the formation of DSBs. CSR is almost completely eliminated in UNG−/− mice90 and inactivation of MMR proteins MSH2 or MSH6 also reduces efficiency of CSR99,100 leading to the proposal that BER and MMR cooperate to generate the DSBs needed for CSR.96,97
One way in which AID may promote the formation of DSBs is if the repair of U•G mispairs stops after hydrolysis of the AP sites by the AP endonuclease APE1. If this incomplete BER reaction pathway processed two closely spaced U•G mispairs with the uracils in opposite strands during a short time interval, a DSB would be created. Although APE1 does appear to be required for CSR101, it is unclear how the subsequent steps in canonical BER are prevented from taking place. It is also unclear why this incomplete BER does not occur over the whole B cell genome. If incomplete BER were to happen on a genome-wide scale, potentially lethal strand breaks would be created during the repair of the approximately 15,000 AP sites that occur in every cell per generation due to depurination.102 An additional problem with this pathway for generating DSBs is that the number of uracils in the S regions detected by Maul et al.92 is not large enough to create closely spaced U•G mispairs. Therefore, either the number of uracils generated by AID in the S regions is underestimated, or in the fraction of the cells undergoing CSR (which is generally <<50%) AID creates closely spaced uracils on opposite strands in a concerted fashion. Clearly, a great deal of additional work is needed to validate this pathway.
The dependence of efficient CSR on MMR may be explained by the binding of the MutSα (MSH2•MSH6) complex to other U•G mispairs in the S regions. One problem with this hypothesis is that the MMR is normally coordinated with DNA replication in the S phase to correct replication errors and AID is expressed mainly in the G1 phase. A possible answer to this problem may lie in a recently described non-canonical replication-independent mismatch repair (ncMMR) pathway in extracts of B cell tumor lines.103,104 In this process, binding of MutSα to U•G is followed by random nicking of DNA by the MutLα complex. Exonuclease 1 resects the DNA from these nicks and creates DSBs.104 Regardless of the molecular details of how DSBs are created, it is unclear at this time whether U•G mispairs created by AID outside the Ig genes are also subject to ncMMR creating DSBs. Strand breaks outside the Ig genes have been detected during B cell stimulation105 and it is possible that they are the result of incomplete BER or ncMMR of uracils created by AID outside Ig genes.
During CSR, mutations are also introduced in the two switch (S) regions in which DSBs are created (Fig. 2). The switch regions contain pentameric repeats with sequences such as GGGGT and GAGCT, and a majority of switch junctions lie at these repeats.106 Deletions and base substitutions are frequently found near such junctions. The spectrum and distribution of substitution mutations in the switch regions are not as well-documented as V(D)J mutations, but they also have a preference for RGYW/WRCY targets107, extend over several thousand base pairs and do not show a strand preference.108 However, no biological function has been attributed to these mutations and they appear to be the result of a sloppy process of generating strand breaks. Consistent with this hypothesis, analysis of sequences of Sμ switch regions has shown that AID creates mutations in this region even without isotype switching.107
The original suggestion that AID may play a role in DNA demethylation was based on the observation that 5-methylcytosine (5mC) in DNA is a substrate for AID. It was postulated that the T•G mispairs resulting from the deamination of 5mC could be repaired through the base excision repair pathway, restoring a C:G pair.109,110 Furthermore, detection of AID gene expression in oocytes and embryonic stem cells111 was considered suggestive of AID’s role in the erasure of DNA methylation during early stages of mammalian embryogenesis and stem cell development. However, AID was not found in testis111,112, and genetic and biochemical studies of purified AID showed that 5mC is a poor substrate for AID.2,8,113–115 The subsequent discovery of the Tet enzymes, which convert 5mC in DNA to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC)116,117, and the detection of these modified bases in the genomes derived from both the parents within early embryos (reviewed in ref.118) make it unlikely that AID plays a major role in the genome-wide DNA demethylation that occurs during embryogenesis. Furthermore, the finding that the reactivity of AID towards C5-substituted cytosine decreases with increasing size of the substituent makes it doubtful that AID plays a role in deaminating 5hmC, 5fC or 5caC in DNA.114,115,119 An analysis of the methylome in AID−/− mice also did not support its role in DNA demethylation.120 Despite these negative results, persistent reports suggest that AID may play a role in DNA methylation changes at a limited number of loci during cellular differentiation and establishment of pluripotency.121–125 Whether this role is enzymatic or structural remains unresolved at this time and availability of a mouse with catalytically inactive AID may help answer this question.
A number of studies have shown that AID targets cytosines outside the Ig genes causing mutations and strand breaks. Immunoprecipitation experiments showed that AID binds RNAP II126 and ChIP analysis found that AID associates with nearly 6,000 genes in stimulated murine B cells.127 The genes that recruited AID had a corresponding mRNA abundance that is 40 times greater than that of genes that do not recruit AID. In elongating genes, AID and RNAP II peak at the transcription start sites (TSS), and AID occupancy mirrors RNAP II density along the genes. A limited number of genes that recruit AID were sequenced and were found not to be hypermutated in stimulated wild-type B cells. However, in the UNG−/− background some non-Ig genes accumulated mutations at frequencies only ~10-fold lower than in the Sμ region.127
In another study, when 1 kbp segments near 5′ ends of over 80 genes expressed in germinal centers were sequenced from B cells undergoing maturation, 19% of the non-Ig genes were significantly mutated. The genes BCL6, CD83 and PIM1 had the highest level of AID-dependent mutations, while other genes including H2AFX had background mutation frequency.128 The MYC gene was modestly, but significantly, mutated.128 However, the frequency of mutations in the Jh4 region in the VDJ segment was more than 40 times the frequency in any other gene, demonstrating that the VDJ segment is by far the best target for hypermutations caused by AID in activated B cells. When the same genes were sequenced from cells of UNG−/− MSH2−/− mice, the percentage of genes with significantly higher mutations than background increased to 43%. However, in this background, BCL6, CD83 and PIM1 acquired mutations at only 1.6-fold the frequency found in WT cells. In contrast, mutation frequencies of several other genes were dramatically higher in the UNG−/− MSH2−/− cells. For example, the frequency of mutations in the H2AFX and MYC genes increased 18- and 14-fold, respectively.128 Together these results show that AID deaminates cytosines in many non-Ig genes expressed in germinal center B cells, but high fidelity repair of the resulting uracils prevents mutations in most genes. However, uracils in several other genes are processed in the same error-prone manner as those in the V(D)J segment resulting in mutations (Fig. 2). The factors that determine whether or not a U•G pair in a genomic region will be repaired through high fidelity repair need to be identified. It will also be useful to have results from a whole genome sequencing study of this kind to determine the genic and non-genic targets of AID and to delineate the features that make genomic regions susceptible to error-prone or high-fidelity repair.
As mentioned earlier, B cell stimulation results in the creation of strand breaks outside the Ig genes which are repaired through homologous recombination prior to DNA replication105, but the location and numbers of these breaks have not been determined. In other studies, the locations of double-strand breaks generated by AID in the murine genome were determined by capturing them as translocations to breaks generated by an engineered SceI endonuclease.129,130 Although a large fraction of AID-generated breaks were within the immunoglobulin loci, the remaining breaks were distributed throughout the murine chromosomes and correlated with actively transcribed genes. The frequency of AID-generated breaks peaked a few hundred bp downstream of TSS and correlated with hypermutated genes.129,130 These and other studies show that AID can target non-Ig genes and cause strand breaks.84,88
While normal circulating B lymphocytes have undetectable levels of AID expression, most B cell non-Hodgkin’s lymphomas (B-NHLs) express AID at high levels.131–135 These cells also show evidence of germinal center development such as SHM, CSR, or both. Additionally, their genomes contain translocations and non-Ig hypermutations. Typically, the translocations involve recombination between an Ig gene and a protooncogene such as MYC, BCL2 or BCL6.136,137 The biochemical steps leading from AID-promoted DSBs to chromosome translocations have been reviewed extensively138–140 and will not be discussed here.
In an IL6tg murine B cell plasmacytoma model, a critical step in cellular transformation was dependent on AID. When mice are treated with pristane, a chemical that causes chronic inflammation, or interleukin-6 (IL6), the animals acquire plasmacytomas and the tumors contain IgH-MYC translocations.141,142 However, these translocations were not found in IL6tg AID−/− mice and the formation of lymphatic hyperplasia was delayed in these mice.143 AID promotes the formation of these translocations by creating DSBs at both Myc and IgH loci.144 AID was also required for germinal center-derived lymphomagenesis in a different lymphoma-prone mouse model.145 These mice expressed BCL6, which is a master regulator of germinal center development and inhibits apoptosis. These and other results show that the principal role of AID in causing B cell non-Hodgkin’s lymphomas is the ability of AID to generate DSBs near Myc, BCL6 and other oncogenes causing translocations that result in their dysregulation. In contrast, absence of AID had no impact on Myc-driven, pre-GC lymphomas145, suggesting that only GC-derived B cell lymphomas depend on the expression of AID.
AID may play other roles in carcinogenesis in addition to promoting translocations. When bone marrow cells transduced with AID were transplanted into immune cell-depleted mice, the mice developed both B and T cell lymphomas.146 The B cell lymphomas contained base substitution and addition/deletion mutations in genes EBF1 and PAX5 that are normally expressed in B cells, but did not contain Myc-IgH translocations. This suggests that the ability of AID to cause mutations may also play a role in tumor promotion.146
Somewhat surprisingly, constitutive expression of AID in transgenic mice causes T cell lymphomas, but not B cell lymphomas.147 These mice also develop lung microadenomas and adenocarcinomas and, less frequently, develop other types of tumors such as hepatocellular carcinomas, melanomas, and sarcomas.148 In these mice, genes for T-cell receptor (TCR), MYC, PIM1, CD4, and CD5 were extensively mutated, but no large-scale clonal chromosome rearrangements such as IgH-c-myc translocations were found.147,149 These studies show that ectopic expression of AID can contribute to carcinogenesis, but is not sufficient for B cell cancers. As described above, a coordinated expression of other proteins such as BCL6 may be necessary to drive B cells to malignancy.
Cytokine-mediated inflammatory responses are the first line of defense against viral infections. Examples of such cytokines include interferons α, β and γ (IFN-α, -β and -γ), interleukins and tumor necrosis factor α (TNF-α). These cytokines in turn activate several transcription factors such as NF-κB and STAT150 resulting in the expression of specific host proteins and activation of host defense mechanisms to clear the viral infection.151 These cytokines cause a wide range of changes in cellular function and body physiology that are collectively called inflammation.
AID expression is regulated by a number of proinflammatory cytokines. TGF-β, TNF-α and IL-1β can stimulate AID expression via NF-κB signaling in primary human hepatocytes and reduce HBV infectivity in host cells.152–154 Moreover, AID expression is also enhanced by IL-4 and IL-13 in a STAT6 dependent manner in B cells and in human colonic epithelial cells.155,156 Thus AID expression is part of a broader inflammatory response following infection.
Chronic inflammation can be triggered by an autoimmune reaction and AID plays diverse and conflicting roles in regulating autoimmunity. Patients with AID deficiency fail to produce class-switched and affinity matured antibody isotypes and suffer from bacterial infection.157 The role played by AID in preventing autoimmunity is illustrated by the observation that about 20–30% of these patients develop autoimmune diseases involving the production of non-hypermutated autoreactive unswitched (IgM) antibodies against self-tissues.158 Examples of autoreactive antibodies in AID-deficient patients include an abnormal Ig repertoire encoding cold agglutinin antibodies that recognize N-acetyllactosamine structures on red blood cells159 and an enrichment of clones with a long IgH CDR3159, which favors self-reactivity.160 In B cells that leave the bone marrow for further maturation, AID expression is necessary for the removal of autoreactive clones, perhaps by exerting genotoxic stress in conjunction with RAG2 and promoting apoptosis.161,162 Therefore, AID deficiency causes ineffective deletion of autoreactive B cells emerging from the bone marrow.
It is also possible that bacterial infection, a direct consequence of defective SHM and CSR in AID deficiency, leads to amplification of autoreactive B cells in the periphery.163 Additionally, a reduction in peripheral blood regulatory T cells and an increase in circulating B cell-activating factor of the tumor necrosis factor family (BAFF)159 may also aggravate B cell autoimmunity. Consistent with a role of AID in limiting autoimmunity in humans, AID−/− mice have more severe autoimmune manifestations than their WT counterparts in certain specific genetic backgrounds.164,165
Interestingly, while the deficiency of AID can precipitate B cell autoimmunity, uncontrolled AID expression can also promote B cell autoimmunity. In fact, AID-mediated germinal center reaction serves as an important source of autoantibodies.166 Correlative studies found that B cells in the blood and ectopic synovial lymphoid follicles of rheumatoid arthritis patients have higher AID expression than those in osteoarthritis patients, and AID expression strongly correlates with serum rheumatoid factors.167 The autoimmune-prone BXD2 mice harbor increased AID expression in splenic B cells as compared to B6 mice, with concomitant spontaneous formation of germinal centers and production of hypermutated autoreactive IgG.168 Furthermore, AID-mediated SHM plays a critical role in the generation of high-avidity antinuclear antibodies in a mouse model of systemic lupus erythematosus (SLE).169 AID−/− and AID+/− MRL/lpr mice show reduced or delayed lupus nephritis as compared to WT MRL/lpr mice.152,170
These seemingly contradictory roles of AID in B cell autoimmunity may result from distinct stage-specific functions of AID during B cell differentiation. In the bone marrow, AID expression is required to purge autoreactive immature B cell clones by imposing genotoxic stress and promoting apoptosis. In contrast, during the subsequent development of B cells in germinal centers and extrafollicular areas of secondary lymphoid organs, excessive AID-mediated SHM can engender B cell autoreactivity. B cells also interact with other immune cells, such as T cells, and this interaction is also influenced by AID171,172, further complicating the role played by AID in autoimmunity. These intricate immune regulatory circuits involving AID exemplify an emerging link between immunodeficiency and autoimmunity in many diseases.173
APOBEC1 can deaminate cytosines in both ssDNA and RNA. In humans APOBEC1 is expressed only in small intestine, and in rodents it is expressed in multiple tissues including the liver.174
The major physiological function of APOBEC1 is editing apolipoprotein B (apoB) mRNA and this affects lipid metabolism and transport. ApoB protein is found in two forms, ApoB-100 and ApoB-48. The larger ApoB-100 is synthesized in the liver and is the major protein component of LDL. The truncated form of ApoB-100, ApoB-48, is synthesized in the small intestine and is essential for chylomicron assembly and secretion. Unlike ApoB-100, ApoB-48 lacks the LDL receptor binding domain and as a result, ApoB-48 containing lipid vesicles are rapidly cleared from circulation.175
A1 is the catalytic component of a complex that also contains auxiliary proteins including ACF (APOBEC1 complementation factor) and ASP (APOBEC1 stimulating protein).176,177 A1 deaminates the cytosine at nucleotide 6666, the first base of codon 2153 (CAA), in ApoB mRNA to uracil, creating an in-frame stop codon (UAA).7,178 This causes premature translation termination and creation of the shorter ApoB-48 protein.
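The effect of this single C to U change can be illustrated with a minimal sketch. The function names below (`deaminate_c_to_u`, `is_stop_codon`) are hypothetical helpers introduced only for illustration; the sketch assumes the standard genetic code and treats the codon in isolation, ignoring the auxiliary proteins and sequence context that the editing complex requires.

```python
# Illustrative sketch only: models the C-to-U edit at the first base of a codon.
# Helper names are hypothetical; the standard genetic code is assumed.

STOP_CODONS = frozenset({"UAA", "UAG", "UGA"})


def deaminate_c_to_u(codon: str, index: int) -> str:
    """Return the codon with the cytosine at 0-based `index` converted to uracil."""
    codon = codon.upper()
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three bases")
    if codon[index] != "C":
        raise ValueError("the target base is not a cytosine")
    return codon[:index] + "U" + codon[index + 1:]


def is_stop_codon(codon: str) -> bool:
    """True if the RNA codon is one of the three standard stop codons."""
    return codon.upper() in STOP_CODONS


unedited = "CAA"                         # glutamine codon in unedited apoB mRNA
edited = deaminate_c_to_u(unedited, 0)   # deamination of the first base
print(unedited, "is stop:", is_stop_codon(unedited))
print(edited, "is stop:", is_stop_codon(edited))
```

The same helper applied to a codon whose first base is not a cytosine raises an error, which mirrors the fact that deamination acts only on cytosines.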
Apart from editing cytosines in RNA, A1 also has the ability to deaminate cytosines in DNA.48 The target sequence context is different for RNA and DNA. Whereas the target cytosine in ApoB mRNA is flanked by an adenine on the 5′ side, E. coli genetic assays and biochemical assays with purified protein suggest that in DNA the preferred 5′ nucleotide is a pyrimidine.5,48 The physiological function of DNA editing by A1 has not been established.
An early investigation of transgenic mice and rabbits expressing A1 found that the animals showed liver abnormalities and developed hepatocellular carcinoma.181 Mice expressing a truncated form of A1 did not show these abnormalities. In addition to the editing of apoB mRNA, other mRNAs in the liver were found to be edited in transgenic mice.181 At the time of these studies APOBEC1 was known to act only on RNA and hence the malignant transformation was attributed to aberrant editing of mRNA for a protein with homology with a translation factor.179,181 There is a need to revisit this issue by determining genomic uracils, mutations and genomic rearrangements in hepatic carcinoma created by the dysregulation of A1.
There is currently no authentic animal model for the studies of individual APOBEC3 genes and proteins because rodents have a single APOBEC3 gene, while humans have seven APOBEC3 genes.182 Consequently, most experiments with human APOBEC3s are done using primary or immortalized cells of human or non-human origin, and the results are often dependent on the specific cell line being used. This has created significant confusion regarding the expression of some of the APOBEC3 proteins and their biological function.
Several APOBEC3 genes are up-regulated in response to cytokine treatment of cells. While some APOBEC3 genes are expressed constitutively in some tissues, expression of other APOBEC3s is cytokine responsive.183,184
The genes for A3A, A3F and A3G, but not A3B or A3C, are upregulated by IFN-α in several hematopoietic cells including dendritic cells, macrophages and naïve CD4+ T cells.184,185 IFN-α enhances A3G expression in cells that are targets of HIV-1 infection, resting CD4+ T cells186, human peripheral plasmacytoid dendritic cells187 and macrophages.185,188 Moreover, IFN-α and IFN-γ enhance the expression of A3F and A3G in primary human brain microvascular endothelial cells, the major component of the blood-brain barrier, presumably to restrict HIV-1 entry into the central nervous system.189 A3G is also induced by several other factors including IL-2, IL-15, and to a lesser extent IL-7, in peripheral blood lymphocytes and by TNF-α during the maturation of dendritic cells which are different cellular targets of HIV.188
Hepatitis B virus (HBV) infection of hepatocytes results in expression of cytokines and various APOBEC3 proteins. A3G gene has an IFN response element and its expression is upregulated upon stimulation of cells with IFN-α.190 In primary human hepatocytes or HBV infected liver cells A3A, A3B, A3F and A3G expression is induced by IFN-α.191,192 Additionally, treatment of a cervical keratinocyte cell line that contains episomal HPV16 genomes with IFN-β results in increased expression of A3A, A3F and A3G genes.193 The studies described above collectively show that expression of APOBEC3 genes is an important part of cytokine-mediated inflammatory response.
There are several mechanisms in mammals to detect and eliminate foreign DNA circulating in bodily fluids. Dendritic cells and macrophages detect foreign DNA unmethylated at CpG sequences through toll-like receptor 9 in endosomal compartments.194 Additional DNA sensors include the complex “stimulator of interferon genes” (STING), “DNA-dependent activator of IFN regulatory factor” (DAI), “absent in melanoma-2” (AIM2) and RNA polymerase III.195–197 Together they participate in signaling pathways that produce pro-inflammatory cytokines and chemokines such as TNFα and type I interferons resulting in the transcription of effector genes to elicit a broad innate immune response.196,198–200 The subsequent steps in the process of clearance of foreign DNA are less characterized, but APOBEC3 proteins may play a role in it.
Expression of A3A increases more than 100-fold in response to interferon treatment of cells such as monocytes and macrophages, which ingest foreign objects like bacteria and viruses.183,201 A3A expression affects the integrity and stability of foreign DNA reducing gene transfer efficiency, inhibiting transient gene expression and destabilizing foreign plasmid DNA. Differential DNA denaturation PCR (3DPCR) and an UNG inhibitor were used to demonstrate that the foreign DNA underwent deamination of cytosines to uracils.201 A subsequent study showed that A3A expressed in monocytes was predominantly cytoplasmic and its optimal activity occurred in acidic pH range found in endosomes.202,203 Methylation of CpGs suppresses cellular recognition of foreign DNA204 and among the AID/APOBEC family members, A3A is most efficient at deaminating 5mC.8,205,206 This raises the interesting possibility that the ability of A3A to deaminate 5mCs in DNA is related to clearance of viruses that carry CpG methylation.207 Family members such as A3G would be ineffective at this task because of their inability to deaminate 5mC.8
APOBEC3 subfamily proteins contribute to innate immunity by restricting viral infection and propagation.51,208 The first APOBEC3 protein shown to have activity that restricts HIV infection was A3G.209 Restriction of HIV by APOBEC3s has been reviewed extensively210–212 and will not be described here in detail. Briefly, early work showed that A3G protein is incorporated into HIV-1 particles and during reverse transcription of the viral RNA, A3G deaminates cytosines in minus-strand DNA to cause G to A mutations213,214 and creates non-infectious virions. Additionally, it has been suggested that excision of uracils created by APOBECs by the cellular uracil-DNA glycosylase may result in the degradation of viral DNA.215 However, the reverse transcription of viral RNA takes place in the cytoplasm and only mitochondrial (UNG1) and nuclear (UNG2) forms of UNG have been described previously.216 Thus the interaction between the UNG protein and the DNA copy of the viral genome needs to be studied in greater detail.
HIV encodes the protein virion-infectivity factor (Vif) that abrogates restriction of HIV by A3G and other APOBEC3 family members215 and thus the above-mentioned experiments were done using Vif− strains of HIV. Vif prevents A3G incorporation into the progeny virus and directs its degradation by a proteasome-dependent pathway.217–219 The interactions of Vif with CUL5-RBX2, ELOB-ELOC and CBFβ to form ubiquitin ligase E3 and its interactions with APOBEC3 family proteins have been described in detail210,220–223 and will not be described further.
A3D/E, A3F and A3H haplotypes II, V and VII, may also provide protection against Vif-deficient HIV-1 in tissue culture models.224 Using humanized mouse models it was also shown that several APOBEC3 enzymes (A3G, A3D, A3F) can restrict HIV-1 in vivo.225 Some studies also report that A3A, A3B and A3C are capable of inhibiting HIV infection210,226,227, but their significance is controversial.228,229 This is partly because some members, including A3B, are able to inhibit wild type Vif-proficient HIV, but are not normally expressed in T cells that are the primary targets of HIV infection.227
Subsequently, HIV-1 restriction was also observed with catalytically defective variants of A3G and A3F, hence a deaminase independent mechanism may also inhibit HIV growth. The prevailing model to explain this phenomenon invokes binding of APOBEC protein to viral RNA and blocking the reverse transcription of viral genome.211 In summary, in CD4+ T cells both editing and non-editing mechanisms mostly by A3G, and to lesser extent by A3F and A3D/E, contribute to the restriction of Vif-defective HIV-1.230
In addition to HIV, APOBEC proteins have been reported to restrict or mutate a broad range of RNA viruses including other retroviruses such as human T-cell leukemia virus type-1 (HTLV-1)231–233 and human foamy virus.234 A3G appears to play a major role in restricting both HTLV-1232 and foamy viruses.234 Also, other reports suggest that, like HIV, these viruses may also express proteins that counteract the APOBEC3G protein235–238 and hence this issue needs further examination.
DNA viruses including adeno-associated virus (AAV), hepatitis B virus (HBV), human papillomavirus (HPV) and herpes viruses such as herpes simplex-1 (HSV-1) and Epstein-Barr Virus (EBV) have been reported to be restricted by APOBEC3s.51,208,239
AAV is a nuclear replicating parvovirus that is restricted by A3A, but not A3G.240,241 3DPCR was used to show that the genome of another nuclear replicating virus, HBV, is mutated at different levels by a number of APOBECs including A3A, A3B, A3C, A3F and A3G.242,243 Cytokine-mediated upregulation of A3A and A3B results in the degradation of HBV covalently closed circular nuclear DNA without apparent damage to the host genomic DNA.192
HPV is a double stranded DNA virus that infects skin cells and infection of a keratinocyte cell line with HPV E6 results in expression of A3B.244 Again, 3DPCR was used to show that HPV was hyperedited when A3A, A3B and A3H enzymes were transiently overexpressed.245 In another study hypermutation of HPV16 in cervical keratinocytes was seen when A3A, A3F and A3G were upregulated following IFN-β treatment.193 Both HSV-1 and EBV genomes may be susceptible to editing by A3C expressed in HeLa cells and overexpression of A3C through transfection results in the reduction of both viral titers and infectivity by HSV-1.246 Edited EBV DNA was also found in infected peripheral blood mononuclear cell lines in association with high levels of A3C expression.246
Although these studies collectively suggest that APOBEC3 proteins mutate many human viruses, there are several concerns about these data. It is unclear how A3G, which is found almost exclusively in the cytoplasm, can edit nuclear replicating EBV and HSV-1 viruses.242,246 There are also hints that, like HIV, other human viruses may employ protective measures against APOBEC3 proteins and there is a need to investigate them further. For example, most herpes viruses code their own UNG (vUNG) proteins that are expressed at the onset of DNA replication. Furthermore, it has been shown that inactivation of vUNG in several herpes viruses reduces viral replication.247–249 It would be interesting to investigate whether vUNG proteins provide protection against the action of AID/APOBEC family enzymes on herpes viral genomes.
Another significant concern is that a good deal of the work regarding antiviral effects of APOBEC3s is based on analysis of mutations in viral genomes using the technique of 3DPCR.250 It uses lower than normal temperatures during the denaturation step in PCR and thus selectively amplifies a fraction of molecules within the products that contain multiple mutations. This has the potential of exaggerating the mutational effects of APOBEC3s. It has also been shown that this procedure introduces mutations in PCR products at a frequency of 1 in 500 bp251, which is at least an order of magnitude higher than normal PCR. It would be useful to use high fidelity PCR and deep sequencing technologies to reexamine mutational effects of APOBEC3s in human viruses.
There are two major classes of retroelements in mammalian cells. One class includes elements with long terminal repeats (LTRs) such as endogenous retroviruses and the second class consists of non-LTR retroelements such as the long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). APOBEC proteins inhibit retrotransposition of both kinds of retroelements.
A3A, A3B, A3C and A3F have a strong inhibitory effect on non-LTR retroelements, specifically LINE-1 (L1) and Alu elements.252–254 A3B and A3F are thought to inhibit L1 transposition through a deamination-dependent mechanism.255 Although the involvement of A3G against non-LTR retroelements is controversial256,257, A3G can suppress the retrotransposition of mouse LTR retrotransposons MusD and IAP as well as yeast Ty1 retrotransposon.258,259 Similarly, A3A, A3B, A3C and A3F have been shown to inhibit retrotransposition of these LTR elements.256
Almost half the human genome consists of DNA transposons and retroelements260 and mutational footprints of APOBEC3s can be found in many such sequences.261 There is considerable evidence that the spread of these elements has occurred sporadically in the evolution of eukaryotes including primates.262–264 It is possible that the expansion of APOBEC3s from one or two genes in the common ancestor of mammals to seven genes in primates10 may be explained by bursts of retrotransposition activity over 100 million years of mammalian evolution.
The potential of AID for causing mutations in the cellular chromosomes was recognized even before the enzyme was discovered265,266, but a similar hazard was not associated with the other enzymes in the family for over a decade following their discovery. This slothful scientific slumber was rudely disturbed by the publication by Nik-Zainal et al describing several breast cancer tumors with APOBEC3 mutational signatures.267,268 These whole genome sequencing (WGS) studies identified an APOBEC3-specific mutational signature, C to T or C to G mutations in TCW sequence context, in the tumor genomes and found that the mutations are often clustered.268 Additionally, the TCW sequences in these mutational clusters tend to have the target cytosines in the same DNA strand, a phenomenon referred to as strand-coordinated mutations.268–270 Subsequently, this signature was also found in more than a dozen different types of cancers including those of head-and-neck and cervix.267,269,271,272 Additionally, expression of A3B was correlated with occurrence of breast273 and other cancers274,275, and other studies have implicated both A3A and A3B in cancer genome mutations.276 It should be noted that all the APOBEC3s with the exception of A3G have essentially the same mutational signature, and hence these mutations are collectively referred to as APOBEC signature mutations. It is now clear that some members of the APOBEC3 family play a significant role in causing mutations in cancer genomes277, but whether they play a role in driving carcinogenesis and tumor progression remains unclear.
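The signature described above lends itself to a simple check. The following is a minimal sketch, not the method used in the cited studies: it assumes each substitution is reported with its reference trinucleotide context (5′ to 3′, mutated base in the middle) and takes W to mean A or T. The function name `is_apobec_signature` is hypothetical. Because a deaminated cytosine on the opposite strand appears as a G to A or G to C change, contexts with a central G are reverse-complemented before testing.

```python
# Illustrative sketch: test whether a single-base substitution matches the
# TCW (W = A or T) C-to-T / C-to-G pattern. Names are hypothetical.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_apobec_signature(context: str, alt: str) -> bool:
    """`context` is the reference trinucleotide; `alt` is the mutant base."""
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context) - set("ACGT"):
        raise ValueError("context must be three bases from A, C, G, T")
    if alt not in {"A", "C", "G", "T"}:
        raise ValueError("alt must be one of A, C, G, T")
    if context[1] == "G":
        # The cytosine lies on the other strand; view the change from there.
        context, alt = reverse_complement(context), reverse_complement(alt)
    if context[1] != "C":
        return False
    return context[0] == "T" and context[2] in "AT" and alt in ("T", "G")


for ctx, alt in [("TCA", "T"), ("TCT", "G"), ("TGA", "A"), ("TCG", "T"), ("ACA", "T")]:
    print(ctx, "->", alt, is_apobec_signature(ctx, alt))
```

A real analysis would also have to compare the observed count of such changes with the number expected from the abundance of TCW sites in the genome; that enrichment step is not attempted here.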
One analysis of both breast and lung tumor sequences found that mutations with APOBEC signature were more prevalent in early replicating regions and that this was opposite of the distribution of most other somatic mutations in these cancers which were found in late replicating regions.278 Furthermore, these mutations were not correlated with transcription suggesting that they were unlikely to have the same underlying mechanism as AID-generated genomic mutations. In a different study, A3A was expressed from a regulated Tet promoter and expressed either in cells blocked in the G1 phase or released from G1 and allowed to enter the S phase. The latter cells acquired more genomic uracils and strand breaks than the cells in G1.279 Together, these studies implicate replication forks as targets for APOBEC3s.280
This correlation between replication and APOBEC-generated cytosine deaminations was confirmed and extended through further analysis of strand-coordinated mutations in cancer genomes with APOBEC signature.281,282 In both studies, the origins of replication (ORI) were assumed to lie at the intersection of replication timing versus distance plots for the left- and right-replicating forks. Based on this identification of ORI, APOBEC signature mutations were assigned to the two replication strands, the lagging strand template (LGST) and the leading strand template (LDST). This analysis found that 50% to 100% more mutations were in the LGST compared to the LDST.281,282 This preferential targeting of LGST by APOBEC3s may be explained by the fact that the LGST spends more time in a single-stranded state than the LDST.283
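The strand assignment can be sketched as follows, under simplifying assumptions that are not taken from the cited studies: a single ORI at a known coordinate, two forks moving outward from it, and mutations reported on the reference (plus) strand with coordinates increasing 5′ to 3′. Under these assumptions the plus strand is the lagging strand template to the right of the ORI and the leading strand template to its left. The function names are hypothetical.

```python
# Illustrative sketch: assign a deaminated cytosine to the lagging (LGST) or
# leading (LDST) strand template relative to one assumed origin of replication.
from collections import Counter


def cytosine_template(position: int, ori: int, ref: str) -> str:
    """`ref` is the plus-strand reference base at the mutated site: 'C' or 'G'."""
    ref = ref.upper()
    if ref not in ("C", "G"):
        raise ValueError("ref must be C or G for a cytosine deamination event")
    if position == ori:
        raise ValueError("position coincides with the assumed origin")
    cytosine_on_plus = ref == "C"      # a reference G means the C is on the minus strand
    right_of_ori = position > ori
    # Right of the ORI the plus strand templates the lagging strand;
    # left of the ORI the minus strand does.
    return "LGST" if cytosine_on_plus == right_of_ori else "LDST"


def lgst_to_ldst_ratio(mutations, ori: int) -> float:
    """Ratio of LGST to LDST counts for (position, ref) pairs."""
    counts = Counter(cytosine_template(pos, ori, ref) for pos, ref in mutations)
    if counts["LDST"] == 0:
        raise ValueError("no LDST mutations; ratio is undefined")
    return counts["LGST"] / counts["LDST"]


# Toy data, invented for illustration only.
toy = [(1200, "C"), (1500, "C"), (1800, "G"), (400, "G"), (700, "G"), (900, "C")]
print(lgst_to_ldst_ratio(toy, ori=1000))
```

On this scale, the 50% to 100% excess reported in the text corresponds to a ratio between 1.5 and 2.0. Real genomes contain many origins with variable firing, so an actual analysis would assign strands from replication timing profiles rather than from a single fixed coordinate.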
Studies using yeast284 and E. coli285 genetic systems have provided strong experimental evidence in support of this hypothesis. When A3G-CTD was expressed in an ung− mutant E. coli and the resulting mutations were determined by WGS in 50 independent cell lines grown for over 1,000 generations, the resulting C:G to T:A mutations had a strong strand bias. There were 3- to 4-times as many transition mutations with the cytosine in LGST than in LDST.285 The bulk of the C:G to T:A mutations were in runs of C’s indicating that they were likely to have been caused by A3G-CTD and the strand bias was greatly diminished when a catalytically defective mutant was expressed in the cells.285
When the catalytic domain of A3G (A3G-CTD) was expressed in an engineered ung1Δ yeast strain, the mutation frequency in a reporter gene in the ssDNA region generated through aberrant resection of telomeric ends increased substantially.286 This showed that A3G can target ssDNA in the yeast genome. In a different study, expression of A3A or A3B in an ung1-defective, but otherwise normal, yeast strain produced strand-coordinated mutations linked to replication.284 The mutations were predominantly G:C to A:T when the reporter gene was on the 5′ side of ORI, but C:G to T:A mutations dominated when the same reporter was on the 3′ side of the ORI. Furthermore, destabilization of replication through the use of yeast mutants defective in RPA, the single-strand DNA-binding protein, or Tof1, a protein that couples replicative polymerases with the helicase that opens the replication fork, increased the frequency of mutations caused by A3A and A3B.284 A large increase in mutation frequency was also seen when the cells were treated with hydroxyurea, a chemical known to cause replication stress. Unexpectedly, the strand bias in mutations disappeared on the 5′ side, but not the 3′ side, of the ORI in both the Tof1 mutant strain and in cells treated with hydroxyurea.284 Finally, the strand-coordinated mutations caused by A3A and A3B were more often clustered when the replication fork was disturbed compared to untreated wild-type cells and are reminiscent of the clustered mutations seen in cancer genomes.268
The following model (Fig. 3B) can explain mutations and strand breaks observed in dividing cells expressing APOBECs. Most cytosines in the LDST will be inaccessible to APOBECs because they will be in double-stranded DNA. Cytosines in the single-stranded gaps between Okazaki fragments will be deaminated by the APOBECs and most of these uracils will be copied immediately by the replicative DNA polymerase δ creating C:G to T:A mutations. Occasionally, UNG2 may excise the uracil before a polymerase has a chance to copy it, and the resulting AP site may be copied by TLS polymerase η again creating the same mutation in most cases.65 Alternately, the abasic site may be copied by REV1 creating C:G to G:C transversions.66,67 Finally, if AP endonuclease APE1 acts on the AP site before TLS bypass of the AP site, a double strand break will occur (Fig. 3B). These may be the strand breaks observed by Green et al279 when they expressed A3A in the S phase. Thus expression of APOBECs during replication should cause genome instability in addition to generating mutations.
In contrast to these results in dividing cells, Lada et al found that genome-wide mutations caused by the lamprey DNA-cytosine deaminase, PmCDA1, in non-dividing yeast were often clustered and the occurrence of clusters was correlated with known transcription levels of the target genes.287,288 Furthermore, more mutations were in the 5′-UTRs than in the bodies of the genes, and there were substantially more C to T mutations in the non-transcribed DNA strand than in the transcribed strand in all the parts of the genes.287 This result is well-explained by the observation that the non-transcribed strand is more susceptible to damage than the transcribed strand75,289 (Fig. 3A). Together, these observations suggest that in dividing cells, the APOBEC enzymes preferentially target cytosines in the LGST, but in non-dividing cells the preference may shift to cytosines in the non-transcribed strand of actively transcribed genes.
As mentioned above, there is currently no animal model for studying the role of APOBEC3 proteins in carcinogenesis. However, a number of lines of evidence suggest that these proteins play an important role in promoting malignant transformation.
About 15–20% of human cancers are attributed to viruses290 and these tumor viruses are distributed throughout DNA and RNA virus families.291 They include DNA viruses such as HPV, EBV and human herpes virus 8/ Kaposi’s sarcoma-associated herpes virus, and RNA viruses including human T-lymphotrophic virus-1 and HCV.292 However, virus infected cells launch an innate immune response, in part, by releasing cytokines such as interferons and interleukins.293 In general, the release of these cytokines has been linked to cancer through the generation of oxidative stress, stimulation of pro-growth signal transduction pathways, and synthesis of chemokines and growth factors.293,294 While it is possible that these molecules and pathways play major roles in causing DNA damage and promoting cancer, AID/APOBEC proteins may also play a role in this transformation.
As discussed in section 5.1, cytokines produced during viral infections trigger the expression of AID/APOBEC proteins and as discussed in sections 3.4 and 5.6 these enzymes can cause genome-wide mutations. Therefore, AID/APOBEC expression could be a key link between viral infection and malignant transformation (Fig. 4). What remains unclear at this point is whether or not APOBEC enzymes cause a significant fraction of the driver mutations or genome rearrangements seen in cancers. Consequently, there is an urgent need for the development of new animal models to study the link between APOBEC enzymes and cancer.
The AID/APOBEC field was trifurcated at its birth. RNA and lipid biochemists who study RNA editing by APOBEC1 and its role in lipid metabolism reside on the first leaf of this three-leaf clover. One report showed 20 years ago that A1 may cause liver cancer, but there have been no further reports confirming or refuting this observation. There have also been few studies of the effects of A1 expression on cellular genome mutations. Virologists have resided on the second leaf of this shamrock studying the role of the APOBEC3 subfamily in innate immunity and inhibition of virus growth. Although APOBEC3 family members were shown to bind RNA and this binding has a role in the antiviral function of the proteins, no evidence was presented that they may act as RNA-cytosine deaminases. The possible role of APOBEC3 enzymes in causing genomic mutations and promoting cancer was also rarely examined. On the last leaf of this shamrock lie immunologists and tumor biologists who are focused on the role of AID in antibody maturation and B cell cancers. The studies of the role of AID in carcinogenesis have focused on DSBs and translocations with only a few studies exploring the non-Ig gene mutations. Despite an early observation that AID binds RNA2, the functional significance of this observation was not explained.
Luckily, a number of recent studies suggest that this separate progress in the three separate fields is about to converge, especially when it comes to AID and APOBEC3 enzymes. Recently, A3A was implicated in C to U editing in RNA9 and, more surprisingly, in G to A editing in RNA.295 Additionally, AID was shown to bind intronic RNA from the switch regions and the evidence suggests that this bound RNA guides the enzyme to the switch regions and is essential for CSR.296 It is likely that all members of the AID/APOBEC family bind RNA and this binding is biologically relevant. The implicit assumption that APOBEC3 enzymes act only on viral or retrotransposition intermediates must be set aside in view of the work of Nik-Zainal et al268 and others. Interestingly, the APOBECs were shown to act at replication forks281,282,284,285 creating a contrast with AID which was known to act only on actively transcribed genes. However, Le and Maizels297 recently showed that this dichotomy is not dictated by the biochemistry of AID, but is probably the result of a lack of expression of AID in the S phase. If AID is expressed in the S phase it causes strand breaks in the same way as A3A.279,297 Lastly the link between the expression of AID and APOBEC3s with inflammation suggests that they may share more unexplored biology than previously suspected.
The AID/APOBEC family is unique among all known enzymes in that most members of this family cause damage to a DNA base and the biological purpose of this action is to cause base substitution mutations and strand breaks. Vertebrates express these enzymes to gain immunity against infections and Homo sapiens appear to have the highest number of genes for these enzymes. However, this protective strategy requires that these enzymes perform a high-wire act in which they mutate the foreign genomes (or the host immunoglobulin genes) without damaging much of their own genome. This is the genetic equivalent of playing with fire and it is increasingly becoming clear that it fails in some cells and get badly burnt. In particular, many members of the AID/APOBEC family are implicated as the source of mutations found in many cancers and some of these mutations may drive carcinogenesis. We need to learn a great deal more about the regulation of these enzymes in response to infections, the role of different family members in restricting different human viruses under biologically relevant conditions, the kind and amount of damage they cause to the host genome, the cellular mutation avoidance mechanisms that limit such damage, and the contributions of the DNA damage that escapes repair to cellular dysfunction including their malignant transformation.
Molecular graphics were generated using the UCSF Chimera package. We would like to thank Ms. Shanqiao Wei and Jessica Stewart (Wayne State University) for help with a figure and Dr. Thomas Holland (Wayne State University) for comments on the manuscript. The research in the Bhagwat laboratory was supported by National Institutes of Health grant GM 57200 and funds from the Wayne State University Office of the Vice-President for Research. Research in the Chen laboratory was supported by grants from the National Institutes of Health (U01AI95776 Young Investigator Award, R21AI122256 and Grant P30CA22453), the Burroughs Wellcome Fund, the American Congress of Obstetricians and Gynecologists and the Wayne State University Maternal, Perinatal and Child Health Initiative.
Sachini U. Siriwardena was born in Colombo, Sri Lanka in 1987. She earned a B.Sc. in Molecular Biology and Biochemistry with first class honors in 2011 from the University of Colombo. Subsequently, she worked as a temporary Assistant Lecturer at the same institution for one year. Currently she is pursuing a Ph.D. degree in Biochemistry at Wayne State University under the guidance of Ashok Bhagwat.
Kang Chen has been an Assistant Professor at Wayne State University since 2012. He studied at the National University of Singapore on a Singapore government scholarship and graduated with First-Class Honors in Biochemistry. He received a Ph.D. in Immunology and Microbial Pathogenesis from Weill Cornell Medical College and Memorial Sloan-Kettering Cancer Center and did postdoctoral research at Icahn School of Medicine at Mount Sinai. He also worked with the United Nations Human Settlements Programme (UN-HABITAT) on initiatives of sustainable urbanization and public health. He has published articles in Science, Nature Immunology, Immunity, Annual Review of Immunology and Journal of Clinical Investigation, and has received many prestigious awards, such as the Singapore Lijen Industrial Development Medal, the Charles Janeway Memorial Award and the Vincent du Vigneaud Award of Excellence. He is a Burroughs Wellcome Fund investigator in preterm birth. Research in his lab is focused on the regulation of antibody diversification, mucosal immunology and immune pathogenesis of reproductive disorders.
Ashok S. Bhagwat is a Professor of Chemistry, and Immunology and Microbiology at Wayne State University. He obtained a B.Sc. and an M.Sc. in Physics from the University of Bombay and the Indian Institute of Technology, Bombay, respectively. His interest in biological research was stimulated at the Indian Institute of Science, Bangalore and he eventually obtained a Ph.D. in Biophysics from Pennsylvania State University. Following post-doctoral work at Cold Spring Harbor Laboratory, he joined Wayne in 1988. His research interests include the biochemistry of DNA-acting enzymes, and he is a hopeless romantic about the importance of basic research in science.