Toxoplasma gondii is an obligate intracellular parasite of the phylum Apicomplexa, which includes a number of species of medical and veterinary importance. Inhibitors of lysine deacetylases (KDACs) exhibit potent antiparasitic activity, suggesting that interference with lysine acetylation pathways holds promise for future drug targeting. Using high-resolution LC-MS/MS to identify parasite peptides enriched by immunopurification with acetyl-lysine antibody, we recently produced an acetylome of the proliferative intracellular stage of Toxoplasma. In this study, we used similar approaches to greatly expand the Toxoplasma acetylome by identifying acetylated proteins in non-replicating extracellular tachyzoites. The functional breakdown of acetylated proteins in extracellular parasites is similar to intracellular parasites, with an enrichment of proteins involved in metabolism, translation, and chromatin biology. Altogether, we have now detected over 700 acetylation sites on a wide variety of parasite proteins of diverse function in multiple subcellular compartments. We found 96 proteins uniquely acetylated in intracellular parasites, 216 uniquely acetylated in extracellular parasites, and 177 proteins acetylated in both states. Our findings suggest that dramatic changes occur at the proteomic level as tachyzoites transition from the intracellular to extracellular environment, similar to reports documenting significant changes in gene expression during this transition. The expanded dataset also allowed a thorough analysis of the degree of protein intrinsic disorder surrounding lysine residues targeted for this post-translational modification. These analyses indicate that acetylated lysines in proteins from extracellular and intracellular tachyzoites are largely located within similar local environments, and that lysine acetylation preferentially occurs in intrinsically disordered or flexible regions.
Toxoplasma gondii is an obligate, intracellular parasite of the phylum Apicomplexa, of which a number of medically important human pathogens are members. This protozoan parasite displays remarkable distribution throughout the world with estimates that it has chronically infected up to 30% of the human population.1 Toxoplasma has attained this cosmopolitan distribution largely due to its ability to convert into a latent tissue cyst known as the bradyzoite. Chronic infection with latent bradyzoite cysts is asymptomatic in immunocompetent individuals; however, upon host immunosuppression (due to chemotherapy, organ transplantation, or HIV infection) the parasite reconverts into its proliferative tachyzoite form, which causes severe tissue damage that can result in organ failure and death.2 While an important human pathogen in its own right, its ease of in vitro culture and genetic tractability also make Toxoplasma an attractive model for other apicomplexan parasites such as Plasmodium.
Traditionally studied in the context of histone modification and epigenetic gene regulation, lysine acetylation has recently emerged as a widespread post-translational modification (PTM) found on proteins involved in almost every cellular function.3 Recent studies have clearly illustrated that lysine acetylation is detected on many non-histone substrates, including other nuclear proteins as well as proteins located in the cytoplasm and mitochondria.4, 5 Acetylated proteins have functions in metabolism, mRNA translation, protein folding, DNA packaging, and the cytoskeletal system. The development of specific acetyl-lysine antibodies has allowed analyses of acetylation at the whole-proteome level, and the resulting “acetylomes” have been described for prokaryotes,6–8 plants,9, 10 Drosophila melanogaster,11 human cells,3, 12, 13 and most recently for the intracellular Toxoplasma tachyzoite.14
Acetylation is a reversible PTM that may affect protein stability, localization, activity, or protein-protein interactions. How the modification affects an individual protein is dependent on the function of the protein, as well as the context of the modified lysine with respect to other PTMs.15 The abundance of reversible Nε-acetylation on very different proteins underscores the regulatory potential of this modification, which has led to the idea that acetylation may rival phosphorylation as a signaling modality.4, 16 In this context, it is not surprising that dysregulation of lysine acetylation has been linked to cancer, aging, and neurological disease.17–19
PTMs extend the range of amino acid structures and properties, thereby diversifying protein structure and function.20 DNA encodes 20 primary amino acids, yet proteins contain more than 140 different residues due to various PTMs. All amino acid side chains are known to undergo chemical diversification due to various PTMs, and altogether there are more than 300 different PTM types described in the literature.21 PTMs are generally catalyzed by specialized enzymes that recognize specific target sequences within a protein. Some PTMs (e.g., phosphorylation and acetylation) are reversible through the action of specific deconjugating enzymes. The interplay between modifying and demodifying enzymes allows for rapid and economical control of protein function. In higher eukaryotes, as much as 5% of the genome is expected to encode enzymes related to the post-translational modification of the proteome.20
It has been proposed that PTMs can be classified by taking into account the conformational state of the site where the modification takes place.22, 23 Bioinformatics analysis has revealed that many PTMs are located within intrinsically disordered regions. Among this group of PTMs are phosphorylation, acetylation, acylation, adenylylation, ADP ribosylation, amidation, carboxylation, formylation, glycosylation, methylation, sulfation, prenylation, ubiquitination, and Ubl-conjugation (i.e., covalent attachment of ubiquitin-like proteins, including SUMO, ISG15, Nedd8, Atg8, etc.).22, 23 This observation is in line with the notion that intrinsic disorder (i.e., lack of ordered structure in a functional protein or a protein region) is crucial for biological functions. In fact, it is clear now that intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are abundantly involved in numerous biological processes,22, 24–67 where they are found to play different roles in regulation of the function of their binding partners and in promotion of the assembly of supra-molecular complexes. 
The conformational plasticity associated with intrinsic disorder provides IDPs/IDPRs with a wide spectrum of exceptional functional advantages over the functional modes of ordered proteins and domains.24, 26, 28–30, 37–39, 45–47, 49, 53 For example, the high accessibility of sites within the disordered proteins simplifies their post-translational modifications, such as phosphorylation, acetylation, lipidation, ubiquitination, sumoylation, etc., allowing for a simple means of modulating their biological functions.53 Many IDPRs contain specific identification regions via which they are involved in various regulation, recognition, signaling, and control pathways.37, 39 Conformational plasticity confers numerous advantages to the intrinsic disorder-based protein interactions.24, 26, 45, 68–71 Some of these advantages are: (i) Decoupled specificity and strength of binding resulting in high-specificity-low-affinity interactions; (ii) Increased speed of interaction due to greater capture radius and the ability to spatially search through interaction space; (iii) The ability for fast formation of encounter complexes that relaxes spatial orientation requirements; (iv) Increased interaction surfaces; (v) The ability to fold to different conformations according to the templates provided by binding partners; (vi) The existence of multiple linear binding epitopes; (vii) The existence of overlapping binding sites due to extended linear conformation; (viii) The ability of a single disordered region to bind to several structurally diverse partners; (ix) The ability of many different proteins to bind to a single partner. The combination of high specificity with low affinity defines the broad utilization of intrinsic disorder in regulatory interactions where turning a signal off is as important as turning it on.26
Furthermore, many IDPs/IDPRs possess complex “anatomy” (they contain multiple, relatively short functional elements), which contributes to their unique “physiology” (an ability to be involved in interaction with, regulation of, and control by multiple structurally unrelated partners). Given the existence of multiple functions in a single disordered protein, and given that each functional element is typically relatively short, alternative splicing could readily generate a set of protein isoforms with a highly diverse set of regulatory elements.72 The complexity of the disorder-based interactomes is further increased due to the ability of a single IDPR to bind to multiple partners gaining very different structures in the bound state.49
IDPs can form highly stable complexes, or be involved in signaling interactions where they undergo constant “bound-unbound” transitions, thus acting as dynamic and sensitive “on-off” switches. The ability of these proteins to return to the highly flexible conformations after the completion of a particular function, and their predisposition to gain different conformations depending on the environmental peculiarities, are unique physiological properties of IDPs which allow them to exert different functions in different cellular contexts according to a specific conformational state.53
The ability to be modulated by various PTMs is a key functional advantage of IDPs/IDPRs. The importance of intrinsic disorder as it pertains to the catalysis of different PTMs is well illustrated by kinases. It is estimated that the function of one-third of eukaryotic proteins is controlled via phosphorylation/dephosphorylation cycles that originate from carefully regulated protein kinase and phosphatase activities.73 Eukaryotic protein kinases constitute one of the largest gene families, e.g., yeast kinome includes 119 kinases, Arabidopsis thaliana contains 1019 kinase- and 300 phosphatase-coding genes, the mouse kinome includes 540 kinases, and the human genome contains ~520 genes encoding kinases and more than 150 genes encoding phosphatases.23 However, in any given proteome, the number of kinases and phosphatases is noticeably smaller than the number of potential substrates. On average, each eukaryotic protein kinase serves ~20 substrates, whereas each human phosphatase is expected to dephosphorylate ~65 clients. Although phosphorylation by each kinase is a highly specific process, kinase substrates typically bind to the enzyme with weak affinity. A combination of high specificity and low affinity is characteristic of intrinsic disorder-based signaling interactions.74–79
As with many reversible PTMs catalyzed by specific enzymes, acetylation is expected to predominantly occur in intrinsically disordered regions. We addressed this hypothesis by performing a comprehensive bioinformatics analysis of the Toxoplasma acetylome. We have previously reported the acetylome of the actively replicating, intracellular Toxoplasma tachyzoite.14 To further elucidate functional roles that may exist for protein acetylation in Toxoplasma, and to investigate whether the protein acetylation landscape changes as the tachyzoites transition from one environment to another, we carried out an acetylome analysis on recently egressed tachyzoites. Using extracellular parasites, we detected novel protein lysine acetylation sites across an additional 267 proteins. The expanded dataset including all acetylated lysines detected in Toxoplasma to date was used to analyze the characteristics of the amino acid sequences flanking the targeted lysine residue. We want to emphasize here that the major focus of our study is on the disorder status of acetylated and non-acetylated lysines in Toxoplasma, and that this work should not be understood as a quantitative analysis of differences between intra- and extracellular parasites.
Immortalized human foreskin fibroblasts (hTERT) host cells were grown to confluency in DMEM supplemented with 10% heat-inactivated fetal bovine serum, maintained at 37°C in 5% CO2 in a humidified incubator. Toxoplasma RH strain was allowed to infect hTERT monolayers in DMEM supplemented with 1% fetal bovine serum (Invitrogen). Extracellular tachyzoites were harvested 1–2 hours post-egress, when approximately 95% of the host cell monolayer had been lysed. The preparation of the parasite lysate, acetyl-peptide immunopurification, and analysis by LC-MS/MS was performed as described previously.14 The mzXML, Dtas and Out files associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: “7a4RlqaXAxoPSLW/iJ0fxYON0FPw5X3a93rpHqUco2fTovcvKtfj6G6kB0PGs6tMUj6YEJW+NPCWkk0DxLbb39vi2uQAAAAAAAAJIw==”.
Acetylated proteins were classified according to Gene Ontology (GO) annotations as listed on the Toxoplasma sequence database (ToxoDB 7.0, www.toxodb.org) and Uniprot (http://www.ebi.ac.uk/uniprot/). When a single peptide could be matched to two or more different proteins, manual inspection of gene expression and proteomics data from ToxoDB 7.0 usually allowed determination of the likely protein from which the peptide was derived. If such an inference was not possible, the peptide was omitted from the rest of the analysis.
Predictions of intrinsic disorder in proteins were performed using the PONDR® VLXT predictor, access to which was provided by Molecular Kinetics, Inc. (http://www.pondr.com). PONDR® (Predictor Of Natural Disordered Regions) is a set of neural network predictors of disordered regions on the basis of local amino acid composition, flexibility, hydropathy and other factors. These predictors classify each residue within a sequence as either ordered or disordered. PONDR® VLXT integrates three feed-forward neural networks: the Variously characterized Long, version 1 (VL1) predictor, which predicts non-terminal residues,70 and the X-ray characterized N- and C-terminal predictors (XT), which predict terminal residues.80 Output for the VL1 predictor starts and ends 11 amino acids from the termini. The XT predictors' output provides predictions up to 14 amino acids from their respective ends. A simple average is taken for the overlapping predictions, and a sliding window of 9 amino acids is used to smooth the prediction values along the length of the sequence. Unsmoothed prediction values from the XT predictors are used for the first and last 4 sequence positions.
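The post-processing steps described above (averaging where predictors overlap, then smoothing with a 9-residue window while leaving the first and last 4 positions unsmoothed) can be illustrated with a minimal Python sketch. This is not the PONDR® implementation, which is not reproduced here; the function names `merge_overlap` and `smooth` are hypothetical, the neural networks themselves are omitted, and the use of a centered window is an assumption.

```python
def merge_overlap(a, b):
    """Combine two per-residue score lists of equal length.

    Positions a predictor does not cover are given as None. Where both
    predictors give a value, a simple average is taken; otherwise the
    single available value is used.
    """
    merged = []
    for x, y in zip(a, b):
        if x is not None and y is not None:
            merged.append((x + y) / 2)
        else:
            merged.append(x if x is not None else y)
    return merged


def smooth(scores, window=9, raw_ends=4):
    """Centered moving average over `window` residues (assumed centered).

    The first and last `raw_ends` positions keep their unsmoothed values.
    With window=9 and raw_ends=4, every smoothed position has a full window.
    """
    half = window // 2
    n = len(scores)
    out = list(scores)
    for i in range(raw_ends, n - raw_ends):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out[i] = sum(scores[lo:hi]) / (hi - lo)
    return out


# Toy example: a single high-scoring residue in an otherwise low-scoring stretch.
raw = [0.0] * 4 + [0.9] + [0.0] * 4
smoothed = smooth(raw)
```

In this toy case only the central position is smoothed, because the sequence is exactly one window long; the four positions at each end are passed through unchanged.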
In addition to PONDR® VLXT, intrinsic disorder in intracellular and extracellular Toxoplasma tachyzoites was evaluated by IUPred,81 which predicts intrinsic disorder in protein regions from their amino acid sequences by estimating the total pair-wise inter-residue interaction energy in a protein sequence. This tool is based on the assumption that IDP/IDPR sequences do not fold due to the lack of a sufficient number of stabilizing inter-residue interactions.81
Compositional profiling is a computational tool for the automated detection of enrichment or depletion patterns of individual amino acids or groups of amino acids classified by several physico-chemical and structural properties in a set of query proteins.82 The calculations were performed using a normalization procedure elaborated for analysis of intrinsically disordered proteins.82 Using this method, compositional profiling is based on the evaluation of the (Cs1 − Cs2)/Cs2 values, where Cs1 is the content of a given residue in a query set of proteins of interest and Cs2 is the corresponding value for the sample set of proteins (e.g., a set of ordered proteins from PDB). In this presentation, negative values correspond to residues that are depleted in a given dataset in comparison with a set of ordered proteins, whereas positive values correspond to residues that are overrepresented in the set.
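The (Cs1 − Cs2)/Cs2 ratio can be sketched in a few lines of Python. This is a minimal illustration of the formula only, not the published tool; the function names and the toy sequences are hypothetical, and any statistical testing of the differences is omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Fractional content of each standard amino acid across a set of sequences."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_profile(query, reference):
    """(Cs1 - Cs2) / Cs2 per residue; Cs1 = query set, Cs2 = reference set.

    Residues absent from the reference set are skipped to avoid division by zero.
    """
    cs1, cs2 = composition(query), composition(reference)
    return {aa: (cs1[aa] - cs2[aa]) / cs2[aa] for aa in AMINO_ACIDS if cs2[aa] > 0}


# Toy example: lysine is twice as frequent in the query as in the reference.
profile = relative_profile(["KKAA"], ["KAAA"])
```

Here `profile["K"]` is positive (lysine is enriched in the query) and `profile["A"]` is negative (alanine is depleted), matching the sign convention described above.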
In a previous study, we mapped the acetylome of actively proliferating intracellular tachyzoites, identifying 411 lysine acetylation sites across 274 proteins.14 Using the same techniques, we have now determined an acetylome for non-replicating tachyzoites that have freshly egressed from their human host cells. Proteomic analysis of these extracellular tachyzoites identified 571 acetylated lysines across 386 proteins (Supplemental Table 1). 177 of these acetylation marks on 119 proteins were also detected in intracellular tachyzoites14 (Supplemental Table 1). The functional breakdown of acetylated proteins in extracellular parasites is similar to intracellular parasites, with an enrichment of proteins involved in metabolism, translation, and chromatin biology (Figure 1). However, a number of categories are populated with different proteins between the two states, suggesting a change in acetylation status as the parasite switches from its intracellular to extracellular environment. This is not surprising since tachyzoites are actively replicating only when inside a nutrient-rich host cell; indeed, significant changes in mRNA expression have been documented for parasites transitioning between extracellular and intracellular environments.83 Using metabolic proteins as an example, lactate dehydrogenase (TGME49_032350) was not found to be acetylated in intracellular parasites, but was acetylated on four residues in extracellular parasites. Phosphoglycerate kinase (TGME49_118230) was acetylated on five residues in the intracellular state, although only one of these marks was observed in extracellular parasites. An additional acetylated lysine was observed on a phosphoglycerate kinase paralogue (TGME49_022020) that was not analogous to any of the acetylated lysines found on TGME49_118230.
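As a quick consistency check, the site and protein counts reported in this paragraph can be combined arithmetically. The sketch below assumes that the 177 marks on 119 proteins represent the complete overlap between the two datasets; the variable names are ours, and the derived totals are simple arithmetic rather than values reported by the study, except where noted in the comments.

```python
# Counts as reported in the text.
intra_sites, intra_proteins = 411, 274      # intracellular acetylome
extra_sites, extra_proteins = 571, 386      # extracellular acetylome
shared_sites, shared_proteins = 177, 119    # detected in both states

# Proteins seen only in extracellular parasites; this equals the
# "additional 267 proteins" mentioned in the Introduction.
extra_only_proteins = extra_proteins - shared_proteins

# Derived (not reported): proteins seen only in intracellular parasites.
intra_only_proteins = intra_proteins - shared_proteins

# Derived (not reported): non-redundant acetylation sites across both states,
# consistent with the statement that over 700 sites have been detected.
total_sites = intra_sites + extra_sites - shared_sites
```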
The most extensively acetylated protein is once again the porin, voltage-dependent anion channel 1 (VDAC1, TGME49_063300). In addition to the 7 sites previously shown to be acetylated in intracellular parasites,14 we detected two additional lysines on porin that are acetylated in extracellular parasites (K55 and K242).
A total of 51 ribosomal acetyl marks were identified in both intra- and extracellular tachyzoites; however, only twelve of these marks were conserved in both states (Supplemental Table 1). The differences in ribosomal protein acetylation suggest that there may be distinct ribosome acetylation profiles associated with active and inactive ribosomes, which could be relevant to the observed global down-regulation of translation in egressed parasites.84 Further experimentation is required to test whether ribosomal protein acetylation serves as another level of translational control.
As seen in the intracellular tachyzoite acetylome, there is extensive acetylation of chromatin and chromatin modification machinery in extracellular parasites (Supplemental Table 1). ADA2-A is a transcriptional activator that recruits the lysine acetyltransferase TgGCN5-B to nucleosomes for modulation of gene expression through acetylation of its histone substrates.85 Acetylation of ADA2-A was detected in extracellular parasites on three lysines, two of which (K245 and K250) lie within the TgGCN5-B interaction domain. It is possible that the change in acetylation status of these residues governs ADA2-A and TgGCN5-B association and that TgGCN5-B itself may regulate this interaction by acetylating ADA2-A. Interesting differences in acetylation were also found among the unique lineage of putative transcription factors harboring a plant-like AP2 DNA-binding domain. We found 8 acetylated lysines on 5 AP2 proteins in intracellular tachyzoites, 16 acetylated lysines on 8 AP2 proteins in extracellular parasites, and 4 acetylated lysines on 3 AP2 proteins in both states. For the first time, we also detected an acetylated lysine within a predicted AP2 domain (K570 on TGME49_014840). While it remains to be experimentally tested, acetylation of a residue within the AP2 domain may alter DNA-binding properties.
It is now recognized that intrinsic disorder is a highly heterogeneous phenomenon that can manifest itself differently at different levels of protein structural organization.39, 53 Therefore, the combined analysis of the intrinsic disorder propensity by several computational tools (especially by tools that utilize different attributes) provides additional advantages,86–88 allowing better visualization of the differences between the various protein groups.59, 89 Based on these premises, the prevalence of intrinsic disorder in regions containing acetylated and non-acetylated lysines in acetylated proteins from intracellular and extracellular Toxoplasma tachyzoites was evaluated by two orthogonal computational tools, PONDR® VLXT and IUPred.
As mentioned above, PONDR® VLXT applies various compositional probabilities and hydrophobic measures of amino acids as the input features of artificial neural networks for the prediction.70 This predictor uses three different neural networks, one for each terminal region and one for the internal region of the sequence. Each neural network is trained on a specific dataset containing only the amino acid residues of that specific region. The final prediction result uses the individual predictors in their respective regions. The transition from one predictor to another is accomplished by computing the average scores of the two predictors for a short region of overlap at the boundary between the two regions. The input features of the neural networks include selected compositions and profiles from the primary sequences. PONDR® VLXT may underestimate the occurrence of long disordered regions in proteins. Although it is no longer the most accurate predictor, it is very sensitive to local compositional biases and therefore has significant advantages in finding potential binding sites within IDPs/IDPRs.90, 91
IUPred is based on the hypothesis that globular proteins have larger numbers of effective inter-residue interactions (negative free energy) than disordered proteins due to the different types of amino acids involved in possible residue contacts. Based on this idea, a composition-based pair-wise interaction matrix was shown to give values similar to those obtained from a structure-based interaction matrix. Structured and disordered proteins were compared by this approach, with the structured proteins found to have a significantly lower free energy estimate, thus giving a means to predict whether a protein is structured or disordered using amino acid sequence as input.81
Figure 2 represents the results of the analysis of the prevalence of intrinsic disorder in regions containing acetylated and non-acetylated lysines in acetylated proteins from the extracellular and intracellular tachyzoites. Here, the predisposition for intrinsic disorder in the different datasets is shown as fractions of lysines with corresponding intrinsic disorder scores. Figure 2 shows that the disorder statuses of acetylated and non-acetylated lysines are rather different for both extracellular and intracellular tachyzoites, whereas there is little pairwise difference between the plots corresponding to the proteins derived from the extracellular and intracellular tachyzoites (i.e., acetylated (non-acetylated) lysines in the extracellular tachyzoite are similar to acetylated (non-acetylated) lysines in the intracellular tachyzoite). Figure S1 provides a more detailed analysis of the results of disorder prediction by these two tools and shows that the correlation coefficients between PONDR® VLXT and IUPred predictions for intracellular and extracellular acetylated lysines are 0.70 and 0.72, respectively. Despite the differences in the outputs generated by PONDR® VLXT and IUPred, both predictors generally agreed on the existence of differences between the datasets and showed that non-acetylated lysines were typically characterized by slightly higher disorder scores than acetylated lysines. For example, in the extracellular tachyzoite, the mean disorder scores of acetylated lysines were 0.511±0.301 (PONDR® VLXT) and 0.475±0.217 (IUPred), whereas non-acetylated lysines were characterized by the disorder scores of 0.535±0.321 (PONDR® VLXT) and 0.482±0.237 (IUPred). Similarly for the intracellular tachyzoite, the mean disorder scores of acetylated lysines were 0.499±0.310 (PONDR® VLXT) and 0.471±0.213 (IUPred), whereas non-acetylated lysines were characterized by the disorder scores of 0.541±0.320 (PONDR® VLXT) and 0.488±0.236 (IUPred).
Although these differences were small, they were reproducible across methods, since both predictors showed somewhat higher disorder scores for the non-acetylated lysines.
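Summary statistics of this kind can be computed from per-lysine scores as in the following Python sketch. It assumes that the ± values are standard deviations and that the reported correlation is a Pearson coefficient, neither of which is stated explicitly; the function names and the toy score lists are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(scores):
    """Mean and sample standard deviation of a list of disorder scores."""
    return mean(scores), stdev(scores)


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy scores for the same four lysines from two hypothetical predictors.
predictor_a = [0.10, 0.40, 0.60, 0.90]
predictor_b = [0.20, 0.35, 0.70, 0.80]
r = pearson(predictor_a, predictor_b)
avg, sd = summarize(predictor_a)
```

Comparing two predictors on the same residues in this way shows whether they rank sites similarly even when their absolute scores differ.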
Our analysis revealed that in intracellular Toxoplasma tachyzoites, lysine acetylation preferentially occurs in intrinsically disordered or flexible regions. This conclusion follows from several observations. The application of the PONDR® VLXT predictor to the 274 acetylated proteins from intracellular Toxoplasma tachyzoites14 revealed that there is a strong correlation between acetylation and protein intrinsic disorder, and many of the 411 acetylated lysines were predicted as intrinsically disordered residues (i.e., residues with a disorder score >0.5). To systematically investigate the local environment of the acetylated lysines within the intracellular tachyzoite proteins (i.e., to see whether lysines were located inside disordered or ordered regions), a recently developed computational approach was used.92 There are four potential scenarios for the acetylated lysines and their local environment that were grouped into the following classes (Figure 3): Class I, a disordered acetylated lysine is located inside a disordered region; Class II, an ordered acetylated lysine is located inside a disordered region; Class III, a disordered acetylated lysine is located inside an ordered region; Class IV, an ordered acetylated lysine is located inside an ordered region. These four scenarios can be clearly distinguished in PONDR® VLXT plots (see Figure 3), with peptides from Classes II and III appearing as characteristic “dips” or “spikes”, respectively, in the corresponding disorder curves.92 Therefore, differences across local sequence environments of acetylated lysines can be evaluated computationally by analyzing the disorder scores computed for the target lysine within fragments of increasing length.
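The four-class scheme can be expressed as a small decision rule. The Python sketch below is an illustration rather than the published method;92 it assumes that a region is summarized by a single averaged disorder score, that 0.5 is the threshold for both the residue and the region, and that a score of exactly 0.5 counts as ordered. The function name `classify_site` is hypothetical.

```python
def classify_site(residue_score, region_score, threshold=0.5):
    """Assign an acetylated lysine to Class I-IV.

    residue_score: disorder score of the lysine itself.
    region_score:  averaged disorder score of the surrounding region (assumed).
    Scores above `threshold` are treated as disordered.
    """
    residue_disordered = residue_score > threshold
    region_disordered = region_score > threshold
    if region_disordered:
        # Class I: disordered lysine in a disordered region.
        # Class II: ordered lysine in a disordered region (a "dip").
        return "I" if residue_disordered else "II"
    # Class III: disordered lysine in an ordered region (a "spike").
    # Class IV: ordered lysine in an ordered region.
    return "III" if residue_disordered else "IV"


examples = {
    (0.8, 0.7): classify_site(0.8, 0.7),
    (0.3, 0.7): classify_site(0.3, 0.7),
    (0.8, 0.2): classify_site(0.8, 0.2),
    (0.3, 0.2): classify_site(0.3, 0.2),
}
```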
Figure 4 represents the results of this analysis, where the original averaged disorder scores of acetylated lysines in the intracellular tachyzoite were set on the x-axis and the averaged disorder scores for various extensions of regions containing acetylated lysines were projected on the y-axis. Ascending scores on the y-axis indicate a bias for the acetylated lysine to be within long disordered regions, and vice versa. For Classes I and IV, such extensions are not supposed to reveal noticeable changes in the averaged disorder scores. Conversely, Class II acetylated lysines would clearly show decreased disorder scores by such an operation. An increase in averaged disorder score would indicate acetylated lysines assigned to Class III. Averaged disorder scores were computed for sequences comprising 5 or 30 amino acids flanking the lysine (Figures 4A and 4B, respectively). Disorder scores using amino acid extensions of varying lengths between 5 and 30 residues are shown in Supplementary Figure S2. Figure 4 shows that acetylated lysines with disorder scores <0.5 (i.e., predicted to be ordered) were commonly found within disordered regions (symbols in the lower-left quadrant above the diagonal in Figure 4) or were shown to have much higher flexibility than neighboring amino acids (symbols in the lower-left quadrant below the diagonal in Figure 4).
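The extension analysis amounts to averaging per-residue disorder scores over progressively wider windows around the target lysine. A minimal Python sketch follows; it assumes that the window extends the same number of residues on each side, includes the lysine itself, and is truncated at the sequence termini, none of which is specified in the text. The function names and the toy score profile are hypothetical.

```python
def flank_mean(scores, index, flank):
    """Mean disorder score over `flank` residues on each side of position `index`.

    The target residue is included, and the window is truncated at the termini.
    """
    lo = max(0, index - flank)
    hi = min(len(scores), index + flank + 1)
    return sum(scores[lo:hi]) / (hi - lo)


def extension_trace(scores, index, flanks=(5, 10, 15, 20, 25, 30)):
    """Averaged score for each extension length, as (flank, mean) pairs."""
    return [(f, flank_mean(scores, index, f)) for f in flanks]


# Toy profile: an ordered residue (0.1) sitting in a high-scoring stretch.
profile = [0.9] * 3 + [0.1] + [0.9] * 3
trace = extension_trace(profile, index=3, flanks=(1, 3))
```

Plotting the residue's own score against one of these averaged values gives one point per lysine; a point above the diagonal means that the surrounding region scores higher than the lysine itself.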
In addition to the 411 acetylated lysines, we also analyzed the 12,883 non-acetylated lysines in the 274 acetylated proteins from the intracellular tachyzoites. Figure 5A represents the distribution of acetylated and non-acetylated lysines with varying ranges of disorder score, showing that acetylated and non-acetylated lysines possess a different predisposition for disorder. The results of the analysis of the local environment of the non-acetylated lysines are shown in Figure 5B, in the form of distributions of non-acetylated lysines in various ranges of disorder score calculated for the extended regions of varying length from the target lysine. Similar to what we observed for the acetylated lysines, this analysis reveals that the non-acetylated lysines are also frequently located within disordered regions.
Based on the results of these analyses one might conclude that both types of lysine-containing fragments (acetylated and non-acetylated) are expected to be disordered. However, there is some difference between these two classes of fragments, with acetylated peptides expected to be slightly less prone to disorder. This observation is supported by Figure 6, which represents the logo profiles of acetylated and non-acetylated peptides (15-mers) in which lysine is located at the middle position. Neighboring residues are colored according to their disorder propensities, with disorder-promoting in red, order-promoting in blue, and neutral in green. Comparison of these plots shows that acetylated peptides have a noticeable bias in the residues surrounding the target lysine, whereas residues in the non-acetylated peptides are distributed more evenly. Furthermore, Figure 6A shows that the acetylated peptides typically contain hydrophobic/aromatic residues in close proximity to the acetylated lysine, whereas the environment of non-acetylated lysines is typically much less biased (Figure 6B).
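Lysine-centered 15-mers and the per-position residue counts that underlie a sequence logo can be extracted as in the Python sketch below. This is an illustration under stated assumptions: lysines closer than 7 residues to either terminus are skipped here, which may differ from how the study handled them, and the function names and toy sequence are hypothetical.

```python
from collections import Counter


def centered_kmers(sequence, k=15, residue="K"):
    """Return (position, k-mer) pairs with `residue` at the middle position.

    Positions are 0-based. Sites without a full flank on both sides are skipped.
    """
    half = k // 2
    return [
        (i, sequence[i - half:i + half + 1])
        for i, aa in enumerate(sequence)
        if aa == residue and i - half >= 0 and i + half < len(sequence)
    ]


def column_counts(kmers):
    """Residue counts at each position across equally long peptides."""
    return [Counter(column) for column in zip(*kmers)]


# Toy sequence with a single lysine that has seven residues on each side.
toy = "GASFYVL" + "K" + "EDRPQST"
sites = centered_kmers(toy)
counts = column_counts([pep for _, pep in sites])
```

Given a set of acetylated positions, the same windows can be split into acetylated and non-acetylated groups before counting, which yields the two profiles compared in a logo plot.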
The conclusion on the more disordered nature of the non-acetylated peptides is further supported by the results of compositional profiling of 15-mer peptides containing acetylated lysines (Figure 7A) or non-acetylated lysines (Figure 7B) in comparison with the ordered proteins. Analysis of these data shows that both types of peptides are clearly disordered but their relative amino acid compositions are rather different. Compositional profiling of peptides containing non-acetylated lysines relative to peptides containing acetylated lysines (Figure 7C) reveals that the acetylated peptides clearly had more C, F, I, Y, and G (statistically significant enrichment for C, F, Y, and G) and fewer W, M, R, P, and E (statistically significant depletion for R, P, and E).
Finally, visual inspection of plots shown in Figure 2 suggests that there is a bimodal distribution within the disorder score range 0.05–0.95 for all the datasets. Distributions for the acetylated lysines in the extracellular and especially intracellular tachyzoites (Figure 2A and 2C) are characterized by maxima in the vicinity of disorder scores of 0.2 and 0.8, whereas the IUPred-based distributions for the non-acetylated lysines were shifted toward higher disorder scores and had maxima in the vicinity of 0.3 and 0.9. Furthermore, in the extracellular tachyzoites, 49.8±0.7% (PONDR® VLXT) and 44.2±0.6% (IUPred) acetylated lysines had a disorder score above the 0.5 threshold, whereas in the set of non-acetylated lysines, these numbers were 54.8±0.1% (PONDR® VLXT) and 45.6±0.1% (IUPred). Similarly, in the intracellular tachyzoites, 46.4±1.1% (PONDR® VLXT) and 40.5±0.8% (IUPred) acetylated lysines had a disorder score above the 0.5 threshold, whereas in the set of non-acetylated lysines, these numbers were 55.6±0.2% (PONDR® VLXT) and 46.3±0.2% (IUPred).
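The fraction of lysines above the 0.5 threshold and the binned score distribution can be obtained as in the following Python sketch. The function names and toy scores are hypothetical, scores are assumed to lie in the range 0 to 1, and the sketch does not attempt to reproduce the ± values quoted above, whose derivation is not described in the text.

```python
def fraction_disordered(scores, threshold=0.5):
    """Fraction of residues whose disorder score exceeds `threshold`."""
    return sum(s > threshold for s in scores) / len(scores)


def score_histogram(scores, bins=10):
    """Fraction of residues in each of `bins` equal-width bins over [0, 1].

    A score of exactly 1.0 is placed in the last bin.
    """
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return [c / len(scores) for c in counts]


# Toy scores with two clusters, one low and one high.
toy_scores = [0.15, 0.22, 0.25, 0.78, 0.81, 0.85]
frac = fraction_disordered(toy_scores)
hist = score_histogram(toy_scores)
```

A bimodal distribution appears in such a histogram as two separated groups of populated bins, as in the toy example, where the bins starting at 0.1 and 0.2 and those starting at 0.7 and 0.8 are occupied.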
Disorder analysis of 571 acetylated lysines in 386 proteins from freshly egressed extracellular Toxoplasma tachyzoites supported the major conclusions made for the acetylome of the intracellular tachyzoites. Similar to the acetylome of the intracellular tachyzoites, more than 50% of non-acetylated lysines are predicted to have a disorder score above 0.5; i.e., they are predicted to be intrinsically disordered residues. Furthermore, acetylated lysines in proteins from the extracellular and intracellular tachyzoites are largely located within similar local environments. Figure 8 displays the results of the analysis of the local environment of acetylated lysines within the amino acid sequences of proteins from extracellular and intracellular tachyzoites. Here, the original averaged disorder scores of acetylated lysines in extracellular (Figure 8A) and intracellular tachyzoites (Figure 8B) were plotted on the x-axis and the averaged disorder scores for sequences comprising 15 amino acids flanking the lysine on both sides were plotted on the y-axis. The two plots are strikingly similar, indicating that there is no significant difference between the local environments of acetylated lysines in extracellular and intracellular tachyzoites. Figure 8 shows that for both acetylomes, the acetylated lysines predicted to be ordered (i.e., with a disorder score of <0.5) were more commonly found within disordered regions than within ordered regions. In other words, ordered acetylated lysines preferentially belonged to Class II rather than to Class IV, as evidenced by high populations of symbols in the lower-left quadrants above the diagonal in Figure 8. Furthermore, many of these ordered acetylated lysines are expected to possess much higher flexibility than their neighboring amino acids (symbols in the lower-left quadrant below the diagonal in Figure 8).
Comparison of the local environment of the acetylated and non-acetylated lysines in the Toxoplasma acetylome revealed that both lysine types are predominantly located in intrinsically disordered regions. Although no structural information is currently available for target peptides bound to the Toxoplasma acetyltransferase, analysis of the structures deposited in the PDB revealed several complexes between acetyltransferases from different sources and their acetylatable targets (mostly fragments of different histones). Figure 9 illustrates some of the peculiarities of signaling interactions between acetyltransferases and their targets by presenting several structures of such complexes. Figure 9A shows a crystal structure of the complex between a 15-amino acid peptide derived from the human histone H4 and the catalytic subunit of the human histone acetyltransferase 1 (HAT1, PDB ID: 2P0W). Figure 9B represents a crystal structure of the complex between a 19-amino acid peptide derived from histone H3 and histone acetyltransferase GCN5 from Tetrahymena thermophila (PDB ID: 1PU9). Figure 9C shows a crystal structure of a complex between the Saccharomyces cerevisiae histone acetyltransferase RTT109, a histone chaperone (vacuolar protein sorting-associated protein 75, Vps75), and a 14-amino acid peptide from histone H3 (PDB ID: 3Q33). Finally, Figure 9D represents an NMR solution structure of a complex between a bromodomain of the human acetyltransferase PCAF and a 14-amino acid peptide from histone H3 (PDB ID: 2RNW). Analysis of the structures shown in this figure clearly demonstrates that the acetyltransferase targets are in highly extended conformations that do not have any intramolecular hydrogen bonds.
Furthermore, of the 14 residues of the histone H3 peptide co-crystallized with RTT109 and Vps75, only 4 were resolved (Figure 9C), whereas 10 were missing from the electron density map, clearly indicating their highly dynamic nature. Finally, the highly flexible nature of the acetylatable target is illustrated by Figure 9D, which represents an NMR solution structure of an acetyltransferase-target complex and clearly shows that the bound 14-amino acid peptide from histone H3 exists as a highly dynamic structural ensemble, with the structure of the entire complex resembling a “can of worms”.
Overall, Figure 9 illustrates that, irrespective of their origin, the acetylatable regions recognized by the various acetyltransferases are predominantly disordered. This observation provides strong support for the idea that intrinsic disorder is a general feature of the protein regions subjected to acetylation, and is not limited to the 411 lysine acetylation sites across 274 proteins in the intracellular tachyzoites or the 571 acetylated lysines across 386 proteins in the extracellular tachyzoites described in our study.
On the other hand, data shown in Figures 5–7 suggest that the overall local environment of the acetylated lysines is slightly less disordered than that of the non-acetylated lysines and is characterized by noticeable compositional biases, with hydrophobic and aromatic residues frequently seen in close proximity to acetylated lysines (Figure 6). These observations suggest that, in addition to the overall flexibility of the regions surrounding the acetylated lysines, acetylation sites might carry specific signals that facilitate recognition by lysine acetyltransferases (KATs). Typically, such recognition sites would be expected to contain some hydrophobic/aromatic residues, which could explain why acetylated peptides are less disordered.
Support for this research was provided by grants from the National Institutes of Health (AI077502 to WJS) and from the Program of the Russian Academy of Sciences for “Molecular and cellular biology” (VNU). The authors thank Jeffery Silva (Cell Signaling Technology) for assistance with the mass spectrometry analysis.