σ factors are transcriptional regulatory proteins that bind to the RNA polymerase and dictate gene expression. The extracytoplasmic function (ECF) σ factors govern the environment-dependent regulation of transcription. ECF σ factors have two domains, σ2 and σ4, that recognize the −10 and −35 promoter elements, respectively. However, unlike the primary σ factor σA, the ECF σ factors lack σ3, a region that helps in the recognition of the extended −10 element, and σ1.1, a domain involved in the auto-inhibition of σA in the absence of core RNA polymerase. Mycobacterium tuberculosis σC is an ECF σ factor that is essential for the pathogenesis and virulence of M. tuberculosis in the mouse and guinea pig models of infection. However, unlike other ECF σ factors, σC does not appear to have a regulatory anti-σ factor located in the same operon. We also note that M. tuberculosis σC differs from the canonical ECF σ factors in that it has an N-terminal domain of 126 amino acids that precedes the σC2 and σC4 domains. In an effort to understand the regulatory mechanism of this protein, the crystal structures of the σC2 and σC4 domains of σC were determined. These promoter recognition domains are structurally similar to the corresponding domains of σA despite the low sequence similarity. Fluorescence experiments using the intrinsic tryptophan residues of σC2, as well as surface plasmon resonance measurements, reveal that the σC2 and σC4 domains interact with each other. Mutational analysis suggests that the Pribnow box binding region of σC2 is involved in this inter-domain interaction. Interaction between the promoter recognition domains in M. tuberculosis σC is thus likely to regulate the activity of this protein even in the absence of an anti-σ factor.
The slow-growing pathogen Mycobacterium tuberculosis has only one ribosomal RNA operon. An efficient transcription mechanism is thus essential for the survival and pathogenicity of this bacillus. Although a third of the human population carries this pathogen, only a few carriers develop the disease. The bacillus can lie latent in the host for prolonged periods of time until an active mycobacterial infection is triggered, most often in the setting of decreased host immunity. Studies that further our understanding of the latent phase of this bacillus are thus vital to understanding the mechanisms of pathogenesis and virulence in this organism. The ability of M. tuberculosis to survive in the hostile environment of the carrier depends on its adaptability to rapidly changing conditions. The survival mechanism is thus crucially dependent on efficient communication between the mechanism that senses the environment and the molecules that regulate transcription.
The DNA-dependent RNA polymerase (RNAP) that controls gene expression consists of five subunits: two α subunits, one ω subunit and the β and β′ subunits. The σ factor reversibly associates with the RNAP and provides the enzyme with the ability to recognize promoter regions on the DNA template. M. tuberculosis has a total of 13 σ factors: 3 primary σ factors and 10 extracytoplasmic function (ECF) σ factors. Differences in the promoter specificity of these σ factors lead to differential gene expression (1, 2). The task of environment-dependent transcription modulation is performed by the ECF σ factors. Regulatory mechanisms, in turn, dictate which σ factors are activated to bind to the apo-RNAP and initiate transcription and which are not. This regulatory mechanism is coupled to a signal transduction system that senses the environment. The interplay between signal transduction and the transcriptional regulatory mechanisms allows the bacillus to respond to changes in the environment by synthesizing new proteins or down-regulating others.
Studies on the transcription mechanism in E. coli have shown that a σ factor directs promoter-specific transcription initiation through specific interactions with two hexamers of a consensus DNA sequence: the Pribnow box (−10 element) and the −35 element. The crystal structure and limited proteolysis experiments on the principal σ factor σA revealed a structural arrangement comprising a series of domains connected by flexible linkers. The utility of this structural arrangement is borne out by the observation that substantial structural rearrangements take place in the σ factor during transcription initiation. Exposed surfaces of each of these domains were found to be important for RNAP binding (3). Interaction with the core RNAP has been demonstrated to structurally rearrange the σ factor into an active conformation in which the DNA binding regions of the σ2 and σ4 domains are exposed and appropriately positioned to recognize the −10 and −35 elements (3,4,5,6).
Two distinct types of mechanisms have been proposed to describe the regulation of the activity of the σ factors that belong to the σ70 family. While auto-inhibition by the N-terminal domain regulates the activity of the principal σ factor, σA, protein-protein interactions mediated by the anti- and anti-anti-σ factors control σ factor concentrations in the case of the ECF σ factors. Other regulatory mechanisms have been proposed wherein σ factor activity is controlled by stimulatory signals (such as the phosphorylation and dephosphorylation processes controlled by kinases and phosphatases) or by mechanisms that control the intracellular σ factor concentration by localizing it on the inner surface of the cell membrane. In the case of the E. coli σ factor σE (7), for example, many of the key regulators of σE activity (products of the rse genes) are co-transcribed with rpoE in a single operon: rpoE, rseA, rseB and rseC. The protein RseA is an anti-σ factor for σE whereas the periplasmic RseB protein enhances its activity. The activity of σE can also be regulated by cytoplasmic factors like guanosine 3′,5′-bispyrophosphate (ppGpp). The signaling pathway employed in this process is distinct from the pathway that is activated upon extracytoplasmic stress (8). Very little is known of these processes in mycobacteria, although recent reports indicate that work to delineate the regulatory mechanism of σE and another σE-like factor, σH, is presently in progress (9,10,11). The anti-σ factor, as seen in the cases of SpoIIAB (12) and the σE-RseA complex (4), appears to bind the free σ factor, thereby preventing its interaction with the core RNAP. The σ28/FlgM complex (6), on the other hand, differs from the others in that FlgM can also form a ternary complex with the σ28 holoenzyme, thereby destabilizing the σ28/RNAP interaction. The modes of σ factor/anti-σ factor interaction also differ between the σ70 and σ28 families. The anti-σ factor RseA is sandwiched between the σE2 and σE4 domains of σE, whereas the FlgM protein wraps around the outside of σ28 and occludes the core RNAP binding determinants on the σ2 and σ4 domains of σ28.
M. tuberculosis has about 190 transcription regulators (13), which include 13 σ factors, 11 two-component systems, 5 unpaired response regulators, 11 protein kinases and 140 other proteins. The interactions amongst these proteins suggest a complex system of overlapping functions and redundancies (9). Four of the 13 σ factors in M. tuberculosis, σA, σB, σC and σE, are conserved in all the pathogenic mycobacteria. Only four σ factors in M. tuberculosis, σA, σC, σD and σL, have been examined in detail for their roles in virulence and pathogenicity. The ECF σ factor σC is essential for lethality in mice but is not required for bacterial survival in this species (14). Based on genomic microarray data, σC was demonstrated to modulate the expression of several key virulence-associated genes including hspX, senX3 (a two-component sensor kinase) and mtrA (a two-component response regulator). The ΔσC mutant of M. tuberculosis can thus persist in tissues but is attenuated in its ability to elicit lethal immunopathology (14). A recent report also suggests that σC is a key regulator of pathogenesis and adaptive survival in the lung and spleen of guinea pigs (15). These in vivo studies in two different animal models thus demonstrate the role of σC in the pathogenesis and virulence of M. tuberculosis. Although σC does not appear to have an anti-σ factor located in the same operon, in silico analysis to identify potential interacting partners for σC suggested two proteins: SirR and the unannotated protein Rv0093c. However, the interaction between these proteins and σC could not be substantiated by experimental evidence.
In this manuscript, we report the biochemical and structural analysis of the domain organization of σC. Despite the low sequence similarity between the promoter recognition domains of σC and those of the primary σ factor σA, we observe that the crystal structures of the two promoter recognition domains are conserved. Spectroscopic studies using the intrinsic tryptophan fluorescence as well as surface plasmon resonance experiments suggest that the σC2 and σC4 domains interact in vitro. The interaction between the two promoter recognition domains of σC, which involves the occlusion of the Pribnow box recognition region of σC2, suggests that substantial inter-domain rearrangements would be needed to activate σC even in the absence of an anti-σ factor.
The details of the expression constructs used in this study are compiled in Table 1. After transforming the plasmid into BL21 (DE3) or Rosetta origami cells (Novagen, Inc.), the cells were grown in Luria broth with antibiotic selection (ampicillin 100 μg/ml and chloramphenicol 30 μg/ml) to an OD600 of 0.5-0.6. The cells were induced with 0.2 mM IPTG (final concentration). Subsequently, the growth temperature was lowered to 290 K and the cells were grown for a further 12-18 h before they were spun down and stored at 193 K. Full-length σC (311 amino acids) was purified under denaturing conditions using 8 M urea. Refolding of the denatured protein was carried out in a step-wise fashion with five buffer changes (each lasting about 3-4 hours) with decreasing concentrations of urea. The recombinant proteins from the other three constructs were purified under native conditions as described earlier (16). After the affinity chromatography step, the proteins were further purified by size exclusion chromatography using a Superdex S-200 column (Amersham-Pharmacia, Inc.). The proteins were concentrated using membrane-based centrifugal ultrafiltration (Amicon). L-Arginine and L-glutamic acid were added to a final concentration of 50 mM each during the concentration step (17). The purity of the samples was analyzed using SDS-PAGE followed by Coomassie blue staining. Both full-length σC and the σC4 domain have poor solubility and are very unstable. These proteins were freshly purified prior to each biochemical experiment. The molecular weights of the recombinant proteins were also verified by mass spectrometry on a MALDI-TOF (Bruker Daltonics, Inc.) mass spectrometer.
The western blot analysis to determine the molecular weight of σC in situ was performed according to the standard procedure outlined in Molecular Cloning (18). Cell-free lysate of H37Rv was obtained from the TB vaccine testing and research material contract at Colorado State University. The cell lysate and purified protein samples were resolved by 10% SDS-PAGE and transferred to a nitrocellulose membrane. A polyclonal antibody raised in rabbit against σC127-311 (a smaller construct containing residues 127-311 based on the annotated sequence of H37Rv) was used as the primary antibody for σC detection. Western blots were developed using the AEC (Sigma-Aldrich, Inc.) substrate with HRP-labeled sheep anti-rabbit IgG (Bangalore Genei Co.).
Site directed mutagenesis experiments were performed using a PCR based method (19). The primers GATAGGCGACGAACCGCGCCACGTCTTGCTGGG (for W164A) and CGATGGCCAGCAACGCAGTTCGGGCGCTGG (for the W203A point variant) were used. These mutations were confirmed by DNA sequencing.
All fluorescence studies were performed on a Jobin Yvon FluoroMax-3 fluorimeter at room temperature. The fluorescence excitation was set at 280 nm and emission spectra were recorded from 300 to 400 nm at a bandwidth of 1 nm. The excitation and emission slit widths were set to 3 and 5 nm, respectively. Each spectrum is an average of five scans. The spectra were obtained at a protein concentration of ~3 μM in 10 mM Tris-HCl, 50 mM NaCl, pH 7.5. Interactions between the σC2 and σC4 domains were monitored after the two protein samples were mixed and incubated at room temperature for 2 minutes prior to data acquisition.
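The scan averaging and the emission maximum used throughout the fluorescence analysis can be illustrated with a minimal Python sketch. The data below are synthetic and the function names (`average_scans`, `emission_maximum`) are hypothetical; this is not the instrument software's procedure, only one plausible way to carry out the same two steps on exported spectra.

```python
def average_scans(scans):
    """Point-by-point mean of several emission scans of equal length."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


def emission_maximum(wavelengths, intensities):
    """Wavelength (nm) at which the intensity is highest."""
    peak_index = max(range(len(intensities)), key=intensities.__getitem__)
    return wavelengths[peak_index]


# Synthetic example (assumed data): 300-400 nm in 1 nm steps, five scans
# that share a peak at 349 nm but differ by a small baseline offset.
wavelengths = list(range(300, 401))


def synthetic_scan(offset):
    return [1000.0 - (w - 349) ** 2 + offset for w in wavelengths]


scans = [synthetic_scan(o) for o in (-2.0, -1.0, 0.0, 1.0, 2.0)]
mean_spectrum = average_scans(scans)
peak_nm = emission_maximum(wavelengths, mean_spectrum)
print(peak_nm)
```

A shift in `peak_nm` between two samples (for example, a domain alone versus a mixture of domains) is the quantity reported as a blue or red shift.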
Surface plasmon resonance (SPR) assays were performed on a BIAcore 2000 instrument (BIAcore AB). All experiments were conducted at 25°C. SPR experiments were performed with either the σC2 or the σC4 domain immobilized onto carboxylated dextran chips (sensor chip CM5 from BIAcore AB) using the standard amine coupling procedure as recommended by the manufacturer. Immobilization resulted in a change in the refractive index corresponding to 700 and 640 resonance units (RU) for σC2 and σC4, respectively. Binding and kinetic assays were performed in 50 mM sodium phosphate (pH 7.4), 100 mM NaCl at a flow rate of 20 μL/min. The proteins were diluted in this buffer and experiments were carried out at concentrations ranging from 0.8 to 200 μM. Freshly purified protein samples were used for these binding assays. Dissociation was initiated by replacing the analyte with buffer. The association and dissociation curves were monitored for 120 s. Sensorgrams were analyzed with BIAevaluation software version 2 (BIAcore AB). A 1:1 Langmuir binding model was used to fit the curves.
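For readers unfamiliar with the 1:1 Langmuir model, the following Python sketch evaluates its textbook closed-form association and dissociation phases. It is not the BIAevaluation fitting routine. The rate constants and maximal response are hypothetical values, chosen only so that their ratio gives an equilibrium dissociation constant near the ~1.8 μM reported for this interaction.

```python
import math


def equilibrium_response(conc, r_max, k_a, k_d):
    """Steady-state response (RU) for a 1:1 interaction; K_D = k_d / k_a."""
    kd_eq = k_d / k_a
    return r_max * conc / (conc + kd_eq)


def association(t, conc, r_max, k_a, k_d):
    """Response at time t (s) after analyte injection begins."""
    k_obs = k_a * conc + k_d  # observed pseudo-first-order rate (1/s)
    r_eq = equilibrium_response(conc, r_max, k_a, k_d)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation(t, r_start, k_d):
    """Response at time t (s) after analyte is replaced by buffer."""
    return r_start * math.exp(-k_d * t)


# Hypothetical parameters (assumptions, not fitted values from the study).
K_A = 1.0e4    # association rate constant, 1/(M s)
K_D = 0.018    # dissociation rate constant, 1/s -> K_D = 1.8e-6 M
R_MAX = 100.0  # maximal response, RU
CONC = 1.8e-6  # analyte concentration, M

r_eq = equilibrium_response(CONC, R_MAX, K_A, K_D)
r_120 = association(120.0, CONC, R_MAX, K_A, K_D)
r_after = dissociation(120.0, r_120, K_D)
print(r_eq, r_120, r_after)
```

Fitting software adjusts `k_a`, `k_d` and `r_max` so that these curves match the measured sensorgrams across all analyte concentrations simultaneously.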
Crystallization conditions and data collection strategies have been reported previously (16). The data collection and refinement statistics for both domains are reported in Table 2. The N- and C-terminal domains of E. coli σE (PDB code 1OR7) were used as starting models for Molecular Replacement (MR). The sequence identity between σC2 and the N-terminal domain of σE is 23% whereas that between the C-terminal domain of σE and the σC4 domain is 45%. MR calculations were performed using the program PHASER (20). The solutions for σC2 and σC4 had log likelihood gains of 71.52 and 489.97 with corresponding Z-scores of 5.21 and 15.53, respectively. There were two molecules in the asymmetric unit in the case of σC2 and three in the asymmetric unit for the σC4 domain. Both structures were refined using tight non-crystallographic symmetry (NCS) restraints. Iterated cycles of model building using Coot (21) and refinement using Refmac (22) led to the final model for σC2 with an Rcryst of 18.9% and Rfree of 24.3%. The final model of σC4 had an Rcryst of 21.0% and Rfree of 26.7%. The final model of σC2 consists of 88 residues, 4 sulfate ions and 20 water molecules whereas that of the σC4 domain has 60 residues, 8 sulfate ions and 17 water molecules.
In M. tuberculosis the following σ/anti-σ pairs have been reported: σD-Rv3413c, σE-Rv1222, σF-Rv3287c, σH-Rv3222c and σL-Rv0736 (23,24,25,26,27). σD, σF, σH and their respective anti-σ factors are canonical in the sense that these pairs occur as part of one operon. σC is the only member of its operon in both the M. tuberculosis strains H37Rv and CDC1551. Assuming that interacting proteins would show correlated expression levels under certain conditions, the correlation coefficients and t values (Student's t test) between a given σ factor and the corresponding anti-σ factor were calculated under different conditions: 7H9 medium (N=4), Balb mice (N=4), Scid mice (N=4), stationary state (N=6) and low oxygen dormancy (N=9).
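The correlation analysis described above is conventionally carried out with Pearson's correlation coefficient r and the statistic t = r·√((N−2)/(1−r²)) with N−2 degrees of freedom. Whether exactly this form was used in the study is an assumption. The Python sketch below implements it on made-up expression values; the function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length, N >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r != 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:  # perfect correlation: t diverges
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up expression levels for a sigma factor and a candidate partner
# across N = 4 arrays (illustrative only, not data from the study).
sigma_expr = [1.0, 2.0, 3.0, 4.0]
partner_expr = [1.2, 1.9, 3.2, 3.9]

r = pearson_r(sigma_expr, partner_expr)
t = t_statistic(r, len(sigma_expr))
# For N = 4 (2 degrees of freedom) the two-tailed critical value at
# alpha = 0.05 is about 4.303.
print(r, t, abs(t) > 4.303)
```

With sample sizes as small as N=4, only very strong correlations pass the significance threshold, which is worth bearing in mind when interpreting the microarray-based comparisons.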
To predict all the interacting proteins/anti-σ factor(s) for σC, we examined all the transcription regulatory genes that are present in M. leprae (M. leprae has only 4 σ factors: σA, σB, σC and σE). An amino acid sequence based search was also performed to locate putative anti-σ factors on the basis of the sequence signature HXXXCXXC (27). Transmembrane protein prediction was done using the program Conpred (28). These results are compiled in Supplementary Table 1.
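A signature search of this kind can be sketched with a regular expression in Python. The sketch treats X as any single residue, which is the usual reading of the HXXXCXXC notation. The test sequence is invented for illustration, and the function name `find_signature` is hypothetical; the tool actually used in the study is not specified.

```python
import re

# H, any three residues, C, any two residues, C. The lookahead lets
# overlapping occurrences be reported as well.
SIGNATURE = re.compile(r"(?=(H[A-Z]{3}C[A-Z]{2}C))")


def find_signature(sequence):
    """Return (1-based start position, matched segment) for every hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SIGNATURE.finditer(seq)]


# Invented peptide containing one copy of the signature.
hits = find_signature("MAHAAACGGCKL")
print(hits)
```

Running the function over every translated open reading frame of a genome yields the list of candidate proteins, which then needs filtering by other evidence, since so short a motif will also occur by chance.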
Differences in the annotation of σC in the two M. tuberculosis strains H37Rv (13) and CDC1551 (29) led to the identification of an N-terminal 126-residue polypeptide preceding the structured domain σC2. The σC2 and σC4 domains are connected by a flexible linker of ~25 amino acids. Sequence based prediction of low complexity regions (30) in this protein (Figure 1a) suggests that the N-terminal domain preceding σC2 is largely unfolded. M. tuberculosis extracytoplasmic σC thus appears to be more like a primary σ factor in terms of its domain organization, except that it lacks region 3.2, which lies between the σ2 and σ4 domains and is involved in extended −10 promoter recognition. The sequence based structure alignment of M. tuberculosis σC is shown in Figure 2 (adapted from reference 2). Interestingly, Mycobacterium leprae σC does not have this N-terminal region. Based on the sequence alignment and patterns of protease sensitivity, four expression constructs of σC were examined: the full-length σC, the smaller construct containing only the σC2 and σC4 domains, and the two independent domains σC2 and σC4 (Table 1 and Figure 3). In a western blot experiment using cell-free lysate of M. tuberculosis H37Rv (obtained from the TB research center, Colorado State University), the polyclonal antibody raised against the shorter construct (lacking the 126 amino acid polypeptide at the N-terminus) recognized the full-length σC. σC is thus a 311 amino acid protein in both the laboratory (H37Rv) and the clinical (CDC1551) strains of M. tuberculosis (Figure 3).
The σE2 and σE4 domains of E. coli σE were used as search models to solve the structures of the M. tuberculosis σC2 and σC4 domains by Molecular Replacement. Overall, these two proteins have 32 % identity in the amino acid sequence, although the C-terminal region is more conserved than the N-terminal polypeptide. Thus while M. tuberculosis σC2 and E. coli σE2 have 23 % identity, M. tuberculosis σC4 and E. coli σE4 are more similar with ~ 44 % sequence identity. The crystallization of the σC2 and the σC4 domains has been reported earlier (16). The data collection and refinement statistics are compiled in Table 2 (more details in Supplementary Figure 1). Despite the low sequence similarity, the overall topology of the σC2 and the σC4 domains is similar to that of the σ factor structures reported to date (3,4,5,6). The backbone Cα superposition of σC2 on E. coli σA2 shows a root-mean-squared deviation (rmsd) of 2.97 Å (~8 % identity over 77 amino acid residues) whereas σC2 superposes better on E. coli σE2 with an rmsd of 1.4 Å (24 % identity over 88 residues). The σC4 domain, which interacts with the −35 promoter element, is structurally more conserved, with a backbone rmsd of 1.6 Å with the corresponding domain of the primary σ factor, σA4. Molecular modeling and docking based on the structure of the Thermus aquaticus RNAP suggest that the σC2 and σC4 domains would be spatially separated by a distance of ~50 Å (data not shown) when bound to the RNAP. This distance is compatible with the ca. 25 residue linker joining the σC2 and σC4 domains. Studies on the primary σ factor σA have shown that aromatic and charged residues in region 2 are involved in recognizing the −10 element and subsequent strand separation (31). On the basis of multiple sequence alignment, W203 was found to be universally conserved and is likely to be involved in interactions with the Pribnow box element. Based on a model of the E. coli σE-holoenzyme and the σE-RseA structure (4), Campbell et al. proposed that the E. coli anti-σ factor RseA functions by sterically occluding the two primary RNAP-binding regions on σE. Activation of the σE regulon occurs when σE is released from RseA upon degradation by the Hho (DegS) protease. In the σE-RseA structure, the anti-σ factor was found to be sandwiched between the σE2 and the σE4 domains. Several substitutions in E. coli σE4 (R178G, I181A and V185A) were identified as leading to defective RseA binding (4). In the case of M. tuberculosis, the equivalent Arg residue is conserved whereas the other two positions (E. coli/M. tub) are replaced by (Ile/Leu) and (Val/Ala).
M. tuberculosis σC is different from other ECF σ factors as there is no anti-σ factor located downstream of σC in the same operon. To understand the mechanism of regulation of transcription by σC, attempts were made to identify the interacting partners that could activate or inactivate this protein. Co-immunoprecipitation with the purified σC antibody from the cell-free lysate (obtained from the TB research center, Colorado State University) was used to isolate proteins that interact with σC. These proteins were identified by in situ digestion using trypsin followed by peptide mass-fingerprinting using a MALDI-TOF mass spectrometer and analyzed using MASCOT (Bruker Daltonics, Inc.). A few components of RNAP could be detected at high confidence levels (data not shown). However, no potential anti-σ factor could be identified in this experiment. Several computational tools were also used to identify proteins that could potentially interact with σC. These results, including relevant entries from the database STRING (32), are compiled in Table 3. The application of this tool for the prediction of protein–protein interactions suffers from the problem that functional interaction does not necessarily imply direct physical interaction. Based on the co-occurrence of genes in related species, proteins exhibiting similar phylogenetic profiles can be predicted to be functionally linked, i.e. they occur as a structural complex or are involved in a common metabolic pathway. Although 29 potential interacting proteins could be identified using this approach, none showed a significant correlation in their mRNA expression levels with that of σC. The Rosetta tool for gene fusion analysis (33) led to the identification of the gene Rv0093c. In this method, the existence of a fusion protein in one genome allows the prediction of an interaction between the corresponding single-domain proteins in other genomes. Rv0093c also has an amino acid sequence signature proposed for anti-σ factors that regulate oxidative stress (27), and a conditional correlation was observed between the expression level of this protein and that of σC in one DNA microarray data set. A pertinent observation in this regard is that, notwithstanding the large errors involved in the gene expression levels obtained using DNA microarray technology, well characterized σ/anti-σ pairs exhibit significant correlation (α=0.05) under specific environmental or growth conditions (Table 3b).
A sequence based search for RseA-like proteins in the M. tuberculosis genome led to the identification of the protein SirR (32 % identity between the E. coli RseA sequence that interacts with σE and M. tuberculosis SirR). The annotation of SirR in the M. tuberculosis genome describes it as an iron-dependent transcription repressor. Two strategies, one involving co-expression using the pET-Duet1 vector and the other involving in vitro studies using purified recombinant proteins, were adopted to examine interactions between σC and the predicted anti-σ factors SirR and Rv0093c. In the co-expression approach, σC was cloned with a poly-histidine tag at the N-terminus to enable the complex to be purified by affinity chromatography. No interaction could be detected between either of these proteins and σC in vitro (data not shown). This finding could probably be rationalized on the basis of the variations seen in the residues that are involved in RseA recognition.
Fluorescence measurements using the intrinsic tryptophan residues in σC2 were used to monitor the interactions between the two promoter recognition domains of σC. The fluorescence spectra are shown in Figure 4. There are two tryptophans in σC2 (region 2) and none in σC4 (region 4) (Figure 4b). This fortuitous distribution of tryptophan residues helped us examine the interactions between the σC2 and σC4 domains. The fluorescence spectrum of native σC (residues 127-311) showed an emission maximum at 342 nm while σC2 showed an emission maximum at 349 nm. Upon the addition of σC4 in a stoichiometric ratio, a blue shift (~7 nm) in the fluorescence spectrum was observed, together with a reduction in the emission intensity. A representative binding reaction wherein the σC4 domain is titrated into the σC2 domain is shown in Figure 4a. This fluorescence titration suggests a weak interaction between the σC4 and σC2 domains (Kd ~ 2 μM).
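Because the protein concentration in these experiments (~3 μM) is comparable to the estimated Kd (~2 μM), the free and total concentrations of the titrant differ appreciably, and a titration of this kind is usually analyzed with the quadratic (ligand depletion) form of the 1:1 binding equation. The Python sketch below evaluates that form. It assumes a simple 1:1 complex and is not a reconstruction of the fit actually performed; the function names are hypothetical.

```python
import math


def complex_concentration(p_total, l_total, kd):
    """Concentration of the 1:1 complex from total concentrations.

    Solves (P - x)(L - x) = Kd * x for x, taking the physical root.
    All quantities must share the same concentration unit.
    """
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0


def fraction_bound(p_total, l_total, kd):
    """Fraction of the fixed component (p_total) present in the complex."""
    return complex_concentration(p_total, l_total, kd) / p_total


# Concentrations in micromolar: 3 uM sigmaC2, stoichiometric sigmaC4,
# Kd of 2 uM (values taken from the text; the 1:1 model is an assumption).
frac = fraction_bound(3.0, 3.0, 2.0)
print(round(frac, 3))
```

Under these conditions a little under half of σC2 would be in complex at a 1:1 ratio, which is consistent with describing the interaction as weak. If the observed fluorescence change is assumed to be proportional to `fraction_bound`, fitting it against the titrant concentration yields Kd.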
Of the two tryptophans in σC2, Trp-203 is involved in DNA melting (Pribnow box element). Based on molecular modeling analysis (data not shown), Trp-164 is likely to interact with the RNAP. These observations led us to investigate the specificity of the interaction (given that the two domains are connected by a flexible linker of ca. 25 amino acids) and to identify the region where σC4 binds σC2. To address these questions, two point variants of σC, W164A and W203A, were constructed in σC127-311 (the medium-length σC that lacks the positively charged N-terminal domain, which contains three tryptophan residues). The W164A mutant protein had a fluorescence emission maximum at 335 nm whereas the W203A variant had its emission maximum at 353 nm (Figure 4b). The blue shift seen in the W164A variant suggests that σC4 interacts with σC2 in a region that leads to the partial burial of W203 (Figure 4c). Interactions between the σC2 and σC4 domains thus result in the occlusion of the Pribnow box recognition element.
The interaction between the σC2 and σC4 domains was also monitored by surface plasmon resonance experiments (Figure 4d). SPR studies of the interaction between the σC2 and σC4 domains suggest low affinity binding between these two domains (Kd = 1.81 ± 0.03 μM). The binding affinity and kinetic parameters were comparable when either the σC2 or the σC4 domain was immobilized. The highly polar nature of this inter-domain interaction was apparent from a size exclusion chromatography experiment performed at a medium salt concentration (250 mM NaCl), where the two domains migrate separately. This was also supported by fluorescence quenching experiments where increasing salt concentrations not only reduced the fluorescence quenching but also resulted in a red shift in the emission maximum (Supplementary Figure 3).
In summary, the structural and biophysical studies on the two promoter recognition domains of σC show an interaction between the σC2 and σC4 domains that is mostly governed by polar residues. This interaction involves the occlusion of the region of σC that participates in the recognition of the −10 promoter element. The binding of M. tuberculosis σC to the core RNAP would thus involve substantial structural rearrangement to release σC from its auto-inhibited state. These observations thus provide an alternative mechanism for the regulation of the ECF σ factor σC.