Special AT-rich sequence-binding protein 1 (SATB1) is a global chromatin organizer and gene expression regulator essential for T-cell development and breast cancer tumor growth and metastasis. The oligomerization of the N-terminal domain of SATB1 is critical for its biological function. We determined the crystal structure of the N-terminal domain of SATB1. Surprisingly, this domain resembles a ubiquitin domain instead of the previously proposed PDZ domain. Our results also reveal that SATB1 can form a tetramer through its N-terminal domain. The tetramerization of SATB1 plays an essential role in its binding to highly specialized DNA sequences. Furthermore, isothermal titration calorimetry results indicate that the SATB1 tetramer can bind simultaneously to two DNA targets. Based on these results, we propose a molecular model whereby SATB1 regulates the expression of multiple genes both locally and at a distance.
Global gene regulation is essential for cell differentiation and maturation. Local and long-range coordinated gene regulation provides flexibility for the coregulation and modulation of related genes to meet various biological requirements during development or tumor growth and metastasis (1). The chromatin architecture in the nucleus plays an important role in regulating gene expression (2). The sequences of the nuclear matrix attachment regions (MARs), also referred to as base-unpairing regions (BURs), are generally 70% AT rich (3). The unwinding property of BURs has been shown to be essential for binding to the nuclear matrix and enhancing promoter activity (4). Special AT-rich sequence-binding protein 1 (SATB1) binds to the core unwinding elements of BURs and recruits various chromatin remodeling/modifying enzymes to regulate gene expression by directly influencing various gene promoter activities (5–8). Strikingly, SATB1 is also able to regulate gene expression at long distances, up to several hundred kb (9–11). A recent report showed that SATB1 is aberrantly expressed in human metastatic breast cancer and coordinately regulates the expression of sets of genes that promote breast cancer tumor growth and metastasis (12).
SATB1 was initially identified as a cell type-specific MAR DNA-binding protein, predominantly expressed in the thymus (3). SATB1 consists of an N-terminal domain, a C-terminal homeodomain (HD) and tandem CUT domains in the center (Figure 1A). The sequence-specific binding of SATB1 to its DNA targets is mediated by the HD and CUT tandem domains (13,14) and requires oligomerization of the N-terminal domain (15,16). In addition, chemical interference assays suggested a rapid association and dissociation kinetics of DNA binding by SATB1, and the dissociation rate (Koff) for the multiple AT-rich-containing DNA fragments (such as seven repeats) is slower than the less AT-rich-containing DNA fragments (such as two repeats) (3). In addition to its DNA-binding ability, SATB1 also acts as a ‘docking site’ for various chromatin remodeling/modifying enzymes and transcription factors (5,7,9,11,17). The transcriptional activity of SATB1 is regulated by several post-translational modifications such as phosphorylation (18), acetylation (7,18) and sumoylation (19). Taken together, these studies demonstrate that SATB1 acts as a linker between DNA loop organization, chromatin modification/remodeling and the association of transcription factors with MARs, and thus it functions as a ‘genome organizer’ that is essential for T-cell development (17,20).
A structural study of the SATB1 CUT1 domain bound to DNA showed that the CUT1 domain binds to the major groove of DNA (21). However, the molecular mechanism by which SATB1 oligomerization regulates DNA binding has remained elusive. In this study, we present the crystal structure of the N-terminus of SATB1. Surprisingly, it resembles a ubiquitin domain instead of the previously identified PDZ domain. We also found that SATB1 can assemble into a tetramer, which provides insight into the molecular basis of its ability to regulate gene expression at long distances.
Various fragments (Supplementary Data and Supplementary Figure S1) of the mouse SATB1 gene were PCR amplified from a mouse thymus cDNA library, cloned into an in-house modified version of the pET32a (Novagen) vector and confirmed by DNA sequencing. The resulting protein contained a His6-tag at its N-terminus. All point mutations of SATB1 described here were created using the standard PCR-based mutagenesis method and confirmed by DNA sequencing. The recombinant protein was expressed in BL21 (DE3) Escherichia coli cells at 16°C for 16–18h. The His6-tagged protein was purified by Ni-NTA (QIAGEN) affinity chromatography followed by size exclusion chromatography on a HiLoad 26/60 Superdex 200 (GE Healthcare). After digestion with PreScission Protease to cleave the N-terminal His6-tag, the target protein was purified on a Mono Q 10/100 GL (GE Healthcare) anion-exchange column. The final purification step was size exclusion chromatography on a HiLoad 26/60 Superdex 200 column in 50mM Tris pH 8.0, 50mM NaCl, 1mM EDTA and 1mM DTT.
Se–Met-substituted recombinant protein was expressed in B834 (DE3) E. coli cells at 20°C for 20h. The B834 (DE3) cells were cultured in LeMaster medium in which methionine was replaced by selenomethionine. The Se–Met-substituted protein was purified as described above for the wild-type protein.
The outline for the expression and purification of full-length protein is summarized in Supplementary Figure S7. Briefly, SATB1 and its mutants were expressed in E. coli as a fusion protein with an N-terminal Trx-His6-tag and C-terminal His6-tag. The Trx-His6-tagged protein was purified by Ni-NTA affinity chromatography followed by size exclusion chromatography on a HiLoad 26/60 Superdex 200 column. After digestion with PreScission Protease to cleave the N-terminal Trx-His6-tag, the target protein was purified on a Mono Q 10/100 GL anion-exchange column. The protein was then denatured with 6 M guanidine hydrochloride and purified by Ni-NTA affinity chromatography under denaturing conditions by its C-terminal His6-tag. The eluted protein was refolded by extensively dialyzing out the denaturant in 50mM Tris pH 8.0, 100mM NaCl, 1mM EDTA and 1mM DTT at 4°C for 12h at least three times. The refolded protein was further purified on a Mono Q 10/100 GL anion-exchange column followed by size exclusion chromatography on a HiLoad 26/60 Superdex 200 column in 50mM Tris pH 8.0, 100mM NaCl, 1mM EDTA and 1mM DTT.
The wild-type protein was crystallized using the sitting drop vapor diffusion method equilibrated against a reservoir solution of 15% polyethylene glycol 3350, 0.15M ammonium citrate dibasic and 4% polypropylene glycol P400. Selenium-substituted protein was crystallized under similar conditions to the native protein, except that 4% polypropylene glycol P400 was replaced by 3% dextran sulfate. Crystals grew at 20°C and were frozen in a cryoprotectant solution consisting of the reservoir solution supplemented with 15% glycerol. All crystals belonged to the space group of P212121. The wild-type crystals diffracted to 1.7Å with unit cell dimensions of a=35.90Å, b=71.03Å and c=153.76Å and Se–Met substituted crystals diffracted to 2.1Å with unit cell dimensions of a=36.14Å, b=71.23Å and c=154.23Å. A native data set and a single anomalous dispersion (SAD) data set were collected at the peak wavelength for Se on station BL17U1 of the Shanghai Synchrotron Radiation Facility (SSRF). Both data sets were processed using HKL2000 software (22).
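As a rough plausibility check on the crystal packing, the unit cell dimensions above can be turned into a Matthews coefficient. The following is a minimal sketch, not part of the original analysis: it assumes one ULD tetramer per asymmetric unit with a mass of about 43.5 kDa (the value later estimated by analytical ultracentrifugation), uses the four symmetry operators of P212121, and applies the conventional approximation that the solvent fraction is 1 − 1.23/Vm. The function names are illustrative only.

```python
# Rough Matthews-coefficient check for the wild-type crystal.
# Cell edges are from the text; the mass per asymmetric unit is an assumption.

def orthorhombic_cell_volume(a, b, c):
    """Volume (cubic angstroms) of a cell whose angles are all 90 degrees."""
    return a * b * c

def matthews_coefficient(cell_volume, n_sym_ops, mass_per_asu_da):
    """Crystal volume per dalton of protein (cubic angstroms per Da)."""
    return cell_volume / (n_sym_ops * mass_per_asu_da)

def solvent_fraction(vm):
    """Conventional approximation: 1 - 1.23 / Vm."""
    return 1.0 - 1.23 / vm

A, B, C = 35.90, 71.03, 153.76   # angstroms, wild-type crystal
N_SYM_OPS = 4                    # P212121 has four symmetry operators
TETRAMER_MASS_DA = 43_500        # assumed: ~43.5 kDa ULD tetramer per asymmetric unit

volume = orthorhombic_cell_volume(A, B, C)
vm = matthews_coefficient(volume, N_SYM_OPS, TETRAMER_MASS_DA)
print(f"cell volume = {volume:.0f} A^3")
print(f"Vm = {vm:.2f} A^3/Da, solvent fraction ~ {solvent_fraction(vm):.2f}")
```

Under these assumptions the coefficient falls in the range commonly seen for protein crystals, which is compatible with four molecules in the asymmetric unit.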
The program HKL2MAP (23) was used to search for 16 Se sites and the initial SAD phases were then calculated using PHENIX program (24). A model covering 60% of the protein molecule was automatically built into the SAD map using the PHENIX program (24), and additional residues were manually built into the electron density with the Coot program (25). Then, molecular replacement by the Phaser program (26) was performed to locate the exact position of four protein molecules in one asymmetric unit of the wild-type data. The final tetramer model was refined iteratively by the CNS (27) and the Coot programs (25). The orientations of the amino acid side chains and bound water molecules were modeled based on sigmaA weighted 2Fobs−Fcalc and Fobs−Fcalc Fourier electron density maps. The final structure had an Rcrystal value of 15.8% and an Rfree value of 21.8%. The statistics on the structure refinement are summarized in Table 1.
Sedimentation velocity (SV) and sedimentation equilibrium (SE) experiments were performed in a Beckman/Coulter XL-I analytical ultracentrifuge using double-sector or six-channel centerpieces and sapphire windows. An additional protein purification step on a HiLoad 26/60 Superdex 200 size exclusion column in 50mM Tris pH 8.0, 100mM NaCl, 1mM EDTA and 1mM TCEP was performed before the experiments. SV experiments were conducted at 50000rpm and 20°C using absorbance detection and double-sector cells loaded with ~80μM ubiquitin-like domain (ULD). For the full-length protein, SV experiments were conducted at 40000rpm and 4°C with 7.5 and 15μM protein. For the SE experiments, data were collected at 20°C and 18000rpm with ~16, 24 and 40μM ULD, at 4°C and 4300, 6400 and 8000rpm with 2.9, 4.4 and 7.3μM wild-type full-length SATB1 and at 4°C and 6000, 8000, 10000 and 12000rpm with 3.3, 4.9 and 8.2μM KWN–AAA mutant, respectively. The buffer composition (density and viscosity) and protein partial specific volume (V-bar) were obtained using the program SEDNTERP (http://www.rasmb.bbri.org/). The SV and SE data were analyzed using the programs SEDFIT and SEDPHAT (28,29).
The forward and reverse oligonucleotides for a particular set were mixed together. To anneal the labeled oligonucleotides, the mixtures were heated to 95°C for 10min and allowed to cool slowly at room temperature. Annealed oligonucleotides were end labeled with γ-32P ATP using T4 polynucleotide kinase (New England Biolabs). Binding reactions were performed in a 10μl total volume containing 10mM Tris pH 7.5, 1mM DTT, 50mM KCl, 5mM MgCl2, 2.5% glycerol, 0.05% NP-40 and the appropriate amount of annealed oligonucleotides and recombinant proteins. Samples were incubated on ice for 1h and separated by electrophoresis on a 5% native polyacrylamide gel (PAGE). The gels were dried under vacuum and exposed to X-ray film. The radioactive intensities of protein-bound DNA bands and free DNA probe reduction due to protein binding were calculated with Adobe Photoshop CS4 and normalized to that of the free DNA probe.
Size exclusion chromatography was performed on an AKTA FPLC system using a Superose 12 10/300 column (GE Healthcare) for ULD or Superdex 200 10/300 column (GE Healthcare) for full-length SATB1. Protein samples were dissolved in buffer containing 50mM Tris pH 8.0, 100mM NaCl, 1mM EDTA and 1mM DTT. The column was calibrated with a gel filtration standard from Bio-Rad.
Circular dichroism (CD) spectra of various proteins were collected on a MOS450 spectropolarimeter (BioLogic) at room temperature. The protein samples (~6μM) were dissolved in 50mM Tris pH 8.0, 100mM NaCl, 1mM EDTA and 1mM DTT.
Isothermal titration calorimetry (ITC) measurements were performed on a MicroCal™ Isothermal Titration Calorimeter iTC200 (GE Healthcare) in 20mM PBS pH 7.2 and 50mM NaCl. For wild-type SATB1, ~232μM of DNA probe was titrated into 11.7μM of protein. For the KWN-AAA mutant, 239μM of DNA probe was titrated into 16μM of protein. The titration consisted of an initial injection of 0.4μl followed by 26 injections of 1.5μl every 120s at 15°C. To determine the baseline, the DNA probe was titrated into the same buffer without protein under the same conditions. The titration data and binding plot after baseline subtraction were analyzed with the MicroCal Origin software with the two-site and one-site-binding models for wild-type SATB1 and the KWN–AAA mutant, respectively.
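The injection schedule above fixes how far each titration runs past equimolar DNA. The sketch below works through that arithmetic using the concentrations and volumes stated in the text; the sample cell volume of 200 μl is an assumption (it is not given here), and dilution and displacement of the cell contents during the titration are ignored, so the ratios are approximate.

```python
# Approximate DNA:protein molar ratio reached at the end of each ITC run.
# CELL_VOLUME_UL is an assumed nominal value, not taken from the text.

def total_injected_volume_ul(first_ul, n_following, following_ul):
    """Total titrant volume delivered over the whole schedule."""
    return first_ul + n_following * following_ul

def final_molar_ratio(syringe_um, injected_ul, cell_um, cell_volume_ul):
    """Moles of titrant added divided by moles of protein in the cell."""
    return (syringe_um * injected_ul) / (cell_um * cell_volume_ul)

CELL_VOLUME_UL = 200.0  # assumption

injected = total_injected_volume_ul(0.4, 26, 1.5)
ratio_wt = final_molar_ratio(232.0, injected, 11.7, CELL_VOLUME_UL)
ratio_kwn = final_molar_ratio(239.0, injected, 16.0, CELL_VOLUME_UL)

print(f"total injected volume = {injected:.1f} ul")
print(f"wild type: final DNA:protein ratio ~ {ratio_wt:.1f}")
print(f"KWN-AAA:   final DNA:protein ratio ~ {ratio_kwn:.1f}")
```

With the assumed cell volume, both titrations end with DNA in several-fold molar excess over protein monomer, which is what a binding isotherm needs to reach saturation.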
A chemical cross-linking assay was carried out by incubating (1–263)AA or SATB1 proteins with the Lys-specific cross-linker disuccinimidyl glutarate (DSG, Thermo Pierce). The reaction was carried out in 20mM Hepes buffer (pH 7.5) containing various concentrations of NaCl and glycerol with a protein concentration of 0.5mg/ml for 15min on ice. The concentration of DSG was adjusted to be 5M equivalent of the total Lys concentration of each protein. The cross-linking reactions were quenched by addition of 50mM Tris to the reaction solution. GST and MBP served as positive and negative controls, respectively.
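The cross-linker dose depends on the lysine content of each protein, so the calculation can be sketched as follows. This is an illustration only: it reads the stated DSG dose as a 5-fold molar excess over total lysine (an assumption about the wording), and the protein mass and lysine count used here are hypothetical placeholders, not values for any construct in this study.

```python
# Sketch: DSG concentration for a 5-fold molar excess over total lysine.
# PROTEIN_MASS_DA and N_LYSINES are hypothetical placeholders.

def protein_molar_conc(mg_per_ml, mass_da):
    """mg/ml equals g/L, so dividing by g/mol gives mol/L."""
    return mg_per_ml / mass_da

def crosslinker_conc(protein_molar, n_lysines, molar_excess):
    """Cross-linker concentration (mol/L) for a given excess over lysine."""
    return protein_molar * n_lysines * molar_excess

PROTEIN_MG_PER_ML = 0.5    # from the text
MOLAR_EXCESS = 5           # assumed reading of the stated DSG dose
PROTEIN_MASS_DA = 30_000   # hypothetical
N_LYSINES = 20             # hypothetical

protein_m = protein_molar_conc(PROTEIN_MG_PER_ML, PROTEIN_MASS_DA)
dsg_m = crosslinker_conc(protein_m, N_LYSINES, MOLAR_EXCESS)
print(f"protein = {protein_m * 1e6:.1f} uM, DSG = {dsg_m * 1e3:.2f} mM")
```

Substituting the real mass and lysine count of each construct gives the corresponding DSG concentration.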
To identify a stable region of the N-terminus of SATB1 for crystallization, several constructs were designed and tested for protein expression and purification. Among them, seven purified proteins formed tetramers, as determined by analytical ultracentrifugation SV experiments (Supplementary Figure S1). After extensive crystal screening, high-quality crystals were obtained only with the construct containing residues 71–172. The crystal structure of residues 71–172 of SATB1 was solved by SAD at a resolution of 1.8Å (Table 1). Strikingly, the overall structure of this region contains four antiparallel β-sheets (β1–β4) flanked by four α-helices (α1–α4) (Figure 1B), and does not show any similarity to the previously defined PDZ domain (15). Instead, it resembles a typical ubiquitin domain. The superposition of the N-terminal domain of SATB1 onto the classic ubiquitin domain results in a root-mean-square deviation of 2.7Å for the 67 Cα-atoms (Figure 1C), even though the two proteins share no sequence homology. Therefore, we renamed this domain a ULD (Figure 1A).
Prior studies have proposed that residues 90–204 of SATB1 form a PDZ-like domain based on primary sequence alignment (15). So far, two structures of residues 179–250 of SATB1 have been deposited in the Protein Data Bank with accession codes 3NZL and 2L1P. Interestingly, the structure of residues 179–250 resembles the CUT1 domain of SATB1, and this region was designated a CUT1-like (CUTL) domain (Supplementary Figure S2 and Figure 1A). Therefore, according to current structural studies, it appears that the N-terminus of SATB1 does not contain a PDZ domain. A sequence alignment of SATB family proteins (SATB1 and SATB2) across species shows that the residues of the ULD domain are all highly conserved (Supplementary Figure S3), indicating that the ULD domain we identified in this study may exist in all SATB family members and have similar biological functions.
Our crystal structure contains four ULD molecules, which seem to form a tightly packed tetramer, in one asymmetric unit (Figure 2A). Consistent with this observation, ULD was eluted as a single peak from a size exclusion column with a molecular mass corresponding to a tetramer (Supplementary Figure S4A). Analytical ultracentrifugation further confirmed that ULD assembles into a tetramer with a molecular mass of ~43.5kDa (Supplementary Figure S4B). These results demonstrate that ULD forms a homotetramer in solution.
In the structure of the ULD tetramer, two dimers (cyan–yellow and green–yellow) are formed with multiple hydrogen bonds and hydrophobic interactions within their interfaces (Figure 2B and C, Supplementary Figure S5). High concentrations of NaCl and glycerol disrupt neither the ULD tetramer nor the ULD-mediated full-length SATB1 tetramer (Supplementary Figure S6). The buried surface area calculated by the program AREAIMOL (30) is 1343.6 and 1647.5 Å² within the interface of the cyan–yellow and green–yellow dimers, respectively, indicating that both dimers could play a role in the oligomerization of full-length SATB1. The interactions within the interface of both dimers were analyzed in detail. These interactions display non-crystallographic 2-fold symmetry. Within the interface of the cyan–yellow dimer (Figure 2B and Supplementary Figure S5A), Glu97 from the cyan monomer makes several hydrogen bonds with Thr72 from the yellow monomer. Additionally, Phe98 and His162 from the cyan monomer form stacking interactions with Pro75 and His162 from the yellow monomer, respectively. Within the green–yellow interface (Figure 2C and Supplementary Figure S5B), three residues (Ile132, Val134 and Val145) from the green monomer form a hydrophobic pocket to accommodate the bulky side chain of Trp137 from the yellow monomer. Atom NE1 of the side chain of Trp137 from the green monomer forms a hydrogen bond with the main chain oxygen atom of Met156 from the yellow monomer. Asn138 and Asp159 from the green monomer make hydrogen bonds with Asp147 and Lys136 from the yellow monomer, respectively. All of the above interactions are also reciprocal. Therefore, the ‘97E98F162H’ motif (residues Glu97, Phe98 and His162) is presumably important for the formation of the cyan–yellow ULD dimer, and the ‘136K137W138N’ motif (residues 136–138) is important for the formation of the green–yellow ULD dimer.
For the convenience of further discussion, the cyan–yellow ULD dimer interface (Figure 2B) and the green–yellow ULD dimer interface (Figure 2C) are referred to as the EFH and KWN interfaces, respectively.
To investigate the oligomerization of SATB1 in solution, we mutated residues within the two interfaces to test the ULD domain-mediated oligomerization state of full-length SATB1. The first mutant was designed to disrupt the EFH interface-mediated ULD dimer by replacing the 97E98F162H motif with an ‘AAA’ cassette, and the second mutant was designed to disrupt the KWN interface-mediated ULD dimer by substituting the 136K137W138N motif with an ‘AAA’ cassette (referred to as EFH–AAA and KWN–AAA, respectively) (Figure 1A). Wild-type SATB1 and the two mutants were purified to homogeneity by the protocol summarized in Supplementary Figure S7 (Figure 3A, inset) and analyzed by SV. Wild-type SATB1 exhibited a narrow sedimentation coefficient distribution in continuous size distribution analysis (black lines in Figure 3A) (28), suggesting that it is mono-disperse, and it is estimated to be a stable tetramer. SE analysis further confirmed that wild-type full-length SATB1 assembled into a tetramer with a molecular mass of ~344.4kDa (Figure 3B and Supplementary Figures S8A–S8C). The KWN–AAA mutant also exhibited a mono-disperse sedimentation coefficient distribution (red lines in Figure 3A), suggesting that it is a dimer. The dimeric state of the KWN–AAA mutant in solution was confirmed by SE analysis, yielding a calculated molecular weight of ~174.5kDa (Figure 3B and Supplementary Figures S8D–S8F). In sharp contrast, the EFH–AAA mutant exhibited a wide range of sedimentation coefficients in the c(S) distribution (cyan lines in Figure 3A), indicating that it exists as multiple oligomeric forms including dimers, tetramers and even higher-order forms. These data suggest that this mutation drastically interferes with the oligomerization of SATB1 and that the EFH–AAA mutant does not adopt a unique oligomeric form, which is consistent with its elution profile by size exclusion chromatography (Supplementary Figure S9).
Taken together, we conclude that both dimers in the ULD tetramer contribute to the oligomerization of full-length SATB1.
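The oligomeric assignments can be cross-checked using only the SE masses reported in the text. The sketch below divides each measured mass by the proposed subunit count and compares the implied monomer masses; agreement between them is what the tetramer and dimer assignments predict. Function and variable names are illustrative.

```python
# Consistency check on the oligomeric states using the SE masses from the text.

def implied_monomer_mass_kda(complex_mass_kda, n_subunits):
    """Monomer mass implied by a measured mass and a proposed stoichiometry."""
    return complex_mass_kda / n_subunits

WT_MASS_KDA = 344.4    # wild-type full-length SATB1 (SE)
KWN_MASS_KDA = 174.5   # KWN-AAA mutant (SE)
ULD_MASS_KDA = 43.5    # isolated ULD (analytical ultracentrifugation)

wt_monomer = implied_monomer_mass_kda(WT_MASS_KDA, 4)
kwn_monomer = implied_monomer_mass_kda(KWN_MASS_KDA, 2)
uld_monomer = implied_monomer_mass_kda(ULD_MASS_KDA, 4)
wt_to_kwn = WT_MASS_KDA / KWN_MASS_KDA

print(f"monomer from wild-type tetramer: {wt_monomer:.1f} kDa")
print(f"monomer from KWN-AAA dimer:      {kwn_monomer:.1f} kDa")
print(f"ULD monomer from tetramer:       {uld_monomer:.1f} kDa")
print(f"wild type / KWN-AAA mass ratio:  {wt_to_kwn:.2f}")
```

The two full-length estimates agree to within about 1.5%, and the mass ratio is close to 2, as expected if the KWN–AAA substitution splits the tetramer into dimers.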
SATB1 has been shown to bind to AT-rich DNA of BURs through its C-terminal tandem CUT domains and HD (13,14) and to lose its DNA-binding activity upon deletion of its N-terminal 248 amino acids (15). We therefore investigated the role of ULD domain-mediated oligomerization in SATB1's DNA-binding activity. To accomplish this, we assessed the ability of wild-type SATB1 and the KWN–AAA and EFH–AAA mutants to bind a 37-bp DNA fragment known to bind wild-type SATB1 (Figure 4A) (6) via an electrophoretic mobility shift assay (EMSA). The DNA-binding affinity of wild-type SATB1 for the 37-bp DNA was stronger than that of the KWN–AAA and EFH–AAA mutants in dose-dependent EMSAs varying either the protein or the DNA probe concentration (Figure 4B and C). Thus, our data indicate that ULD-mediated oligomerization of SATB1 is required for binding to target DNA.
To quantitatively study the binding of wild-type and mutant SATB1 to target DNA, we used ITC. The binding of wild-type SATB1 to DNA fit well to a two-site binding model with calculated Kd values of ~0.36 and ~1.99μM, indicating that the SATB1 tetramer is capable of binding to two DNA fragments simultaneously (Figure 4D). In contrast, the binding of the KWN–AAA mutant to target DNA fit well to a one-site binding model with a Kd of ~1.95μM (Figure 4E). The EFH–AAA mutant was not amenable to analysis by ITC because it exists as multiple oligomeric forms (Figure 3A and Supplementary Figure S9). These ITC results indicate that wild-type SATB1 can bind two DNA segments, possibly through an allosteric interaction: one site has an affinity comparable to that of the KWN–AAA mutant, and the other binds about 5-fold more tightly. This is consistent with the EMSA observation that wild-type SATB1 binds the target DNA more strongly than the KWN–AAA mutant does.
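The Kd values can be put on an energy scale with the standard relation ΔG° = RT ln(Kd). The sketch below is an illustration rather than part of the reported analysis: it assumes a 1 M standard state and uses the ITC temperature of 15°C given in the methods. It also reports the fold difference between the two wild-type sites.

```python
import math

# Convert the reported Kd values into standard binding free energies.
# Assumes a 1 M standard state and the ITC temperature of 15 degrees C.

R_J_PER_MOL_K = 8.314462618
TEMP_K = 273.15 + 15.0

def binding_free_energy_kj(kd_molar, temp_k=TEMP_K):
    """Standard free energy of binding in kJ/mol: RT * ln(Kd)."""
    return R_J_PER_MOL_K * temp_k * math.log(kd_molar) / 1000.0

KD_WT_TIGHT = 0.36e-6   # wild type, higher-affinity site (M)
KD_WT_WEAK = 1.99e-6    # wild type, lower-affinity site (M)
KD_KWN = 1.95e-6        # KWN-AAA mutant, single site (M)

dg_tight = binding_free_energy_kj(KD_WT_TIGHT)
dg_weak = binding_free_energy_kj(KD_WT_WEAK)
dg_kwn = binding_free_energy_kj(KD_KWN)
fold = KD_WT_WEAK / KD_WT_TIGHT

print(f"wild type, tight site: {dg_tight:.1f} kJ/mol")
print(f"wild type, weak site:  {dg_weak:.1f} kJ/mol")
print(f"KWN-AAA:               {dg_kwn:.1f} kJ/mol")
print(f"tight site is {fold:.1f}-fold stronger, a gain of {dg_weak - dg_tight:.1f} kJ/mol")
```

On this scale the weaker wild-type site and the KWN–AAA site are essentially indistinguishable, while the tighter wild-type site gains roughly 4 kJ/mol.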
Oligomerization of SATB1 plays a very important role in DNA binding and has been implicated in gene regulation by SATB1. The structure of the SATB1 ULD domain reported here provides the molecular basis for how the ULD domain mediates the oligomerization state of SATB1. SATB1 assembles into a tetramer in vitro (Figure 5A), and the tetramerization of SATB1 is essential for recognizing specific DNA sequences (such as multiple AT-rich DNA fragments). Thus, SATB1 may regulate gene expression directly by binding to various promoters and upstream regions and thereby influencing promoter activity (Figure 5B). This local gene regulation model is consistent with experimental observations that SATB1 directly regulates the expression of a number of genes, including globin, interleukin-2, interleukin-2 receptor α and interleukin-5, by recruiting either coactivators or corepressors (5–8). Furthermore, we showed that the SATB1 tetramer can simultaneously bind to two DNA segments, and thus the tetramerization of SATB1 may organize high-order chromatin architecture by anchoring specialized DNA sequences in close proximity and recruiting various chromatin remodeling factors to coordinately regulate gene expression over long distances (Figure 5C). This long-range gene regulation model is also consistent with the observations that SATB1 regulates the coordinated expression of genes located both at the 200-kb T-helper 2 cytokine locus (10) and at the 300-kb major histocompatibility Class I locus (11), and that it reprograms chromatin organization and the transcriptional profiles of breast tumors to promote growth and metastasis (12,31,32).
Long-range gene regulation plays a role in many biological activities such as regulation of the β-globin locus (33), cytokine gene cluster (34), estrogen-induced gene expression (35) and mating-type switching in yeast (36). Moreover, it has been found that the CI protein of bacteriophage λ regulates gene expression over a long distance via cooperative binding of its oligomers to specific target DNA (37). Our proposed model for SATB1 oligomerization-mediated long-range gene regulation is consistent with this finding and likely represents a general mechanism for spatiotemporal and quantitative regulation of gene expression.
Any assembly of a functional protein complex in a living cell must be dynamically regulated, and the oligomeric assembly of SATB1 is no exception. The mechanism that regulates the dynamic assembly of SATB1 tetramers remains unclear. Understanding the dynamic regulatory assembly mechanism of SATB1 (or SATB2) is an important area of future research. SATB1 is known to be acetylated by P300/CBP-associated factor at residue Lys136 to mediate gene regulation in coordination with C-terminal-binding protein 1 during Wnt signaling in T cells (7); this residue lies within the 136K137W138N motif, which is important for SATB1 tetramerization. The identification of key residues (i.e. the 97E98F162H and 136K137W138N motifs) in the assembly of SATB1 oligomers (and likely SATB2) should be helpful in designing mutants of SATB family proteins to evaluate their functional roles in living cells and/or animal models.
The results of EMSA and ITC indicate that ULD-mediated SATB1 oligomerization can affect the DNA-binding affinity and stoichiometry of SATB1 (Figure 4). It is possible that, similar to DNA binding, ULD-mediated SATB1 oligomerization may also affect the binding affinity and stoichiometry for its various protein-binding partners. Furthermore, the ITC data showed that SATB1 may allosterically bind to two DNA fragments, indicating that tetramerization allows SATB1 to bind multiple AT-rich-containing DNA fragments with higher affinity. In addition, a previous study suggested rapid association and dissociation kinetics of DNA binding by SATB1 (3). Whether the oligomerization of SATB1 influences the kinetic constants of DNA binding needs to be investigated in future studies using other techniques such as surface plasmon resonance.
Although the N-terminus of SATB1 (residues 90–204) was previously identified as a PDZ domain (15), structural studies have shown that this region is made up of ULD and CUTL domains. Detailed sequence alignments show that CUTL has the evolutionarily conserved amino acids involved in CUT1 DNA binding (Supplementary Figures S2B and S3). It would be interesting to investigate the functional role of the CUTL domain of SATB1 in future studies.
The atomic coordinates and structure factors for the structure of SATB1 ULD have been deposited in the Protein Data Bank with accession code 3TUO.
Supplementary Data are available at NAR Online: Supplementary Figures 1–9.
973 Program (grant 2009CB825504); National Natural Science Foundation of China (grants 31100527, 31140029 and 31170684); Tianjin Basic Research program (grants 08QTPTJC28200 and 08SYSYTC00200); Fundamental Research Funds for the Central Universities (grants 65011621 and 65020241). Funding for open access charge: grant 2009CB825504.
Conflict of interest statement. None declared.
We are grateful to Dr Lingyi Chen and Miss Lipin Ma for help with the EMSA experiments and Mr Wentao Diao for his technical help in CD experiments. We are also grateful to the staff at the beamline BL17U1 of the SSRF and at the beamline 3W1A of the Beijing Synchrotron Radiation Facility for excellent technical assistance during data collection.