DNA-binding proteins such as transcription factors use DNA-binding domains (DBDs) to bind to specific sequences in the genome to initiate many important biological functions. Accurate prediction of such target sequences, often represented by position weight matrices (PWMs), is an important step to understand many biological processes. Recent studies have shown that knowledge-based potential functions can be applied on protein-DNA co-crystallized structures to generate PWMs that are considerably consistent with experimental data. However, this success has not been extended to DNA-binding proteins lacking co-crystallized structures. This study aims at investigating the possibility of predicting the DNA sequences bound by DNA-binding proteins from the proteins' unbound structures (structures of the unbound state). Given an unbound query protein and a template complex, the proposed method first employs structure alignment to generate synthetic protein-DNA complexes for the query protein. Once a complex is available, an atomic-level knowledge-based potential function is employed to predict PWMs characterizing the sequences to which the query protein can bind. The evaluation of the proposed method is based on seven DNA-binding proteins, which have structures of both DNA-bound and unbound forms for prediction as well as annotated PWMs for validation. Since this work is the first attempt to predict target sequences of DNA-binding proteins from their unbound structures, three types of structural variations that presumably influence the prediction accuracy were examined and discussed. Based on the analyses conducted in this study, the conformational change of proteins upon binding DNA was shown to be the key factor. 
This study sheds light on the challenge of predicting the target DNA sequences of a protein lacking co-crystallized structures, which encourages more efforts on the structure alignment-based approaches in addition to docking- and homology modeling-based approaches for generating synthetic complexes.
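The abstract does not state how potential-function scores are turned into a PWM. A minimal sketch of one common way to do this is given below, assuming a Boltzmann-like weighting of per-position base energies; the function name `energies_to_pwm`, the `temperature` parameter, and the energy values are hypothetical and are not taken from the study.

```python
import math

BASES = "ACGT"

def energies_to_pwm(energies, temperature=1.0):
    """Convert per-position base energies into a probability matrix.

    energies: list of dicts, one per DNA position, mapping each base to an
    energy score (lower means more favourable). The exp(-E / temperature)
    weighting is an assumption of this sketch, not the published method.
    """
    pwm = []
    for column in energies:
        weights = {b: math.exp(-column[b] / temperature) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm

# Hypothetical energies for a 3-bp site (arbitrary units).
example_energies = [
    {"A": -2.0, "C": 0.0, "G": 0.0, "T": 0.0},   # strong preference for A
    {"A": 0.0, "C": 0.0, "G": -1.0, "T": 0.0},   # weaker preference for G
    {"A": 0.5, "C": -1.0, "G": 0.5, "T": -1.0},  # C and T equally favoured
]
example_pwm = energies_to_pwm(example_energies)

for position, column in enumerate(example_pwm, start=1):
    print(position, {b: round(p, 3) for b, p in column.items()})
```

Each column sums to one, so the matrix can be compared directly with an annotated frequency matrix; the choice of `temperature` controls how sharply energy differences translate into base preferences.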
Nuclear receptors (NRs) are important transcriptional modulators in metazoans. They regulate transcription by binding to the promoter region of their target genes through the DNA binding domain (DBD) and by activating or repressing mRNA synthesis through co-regulators bound to the ligand binding domain (LBD). NRs typically have a single DBD and an LBD.
Three nuclear receptors, named 2DBD-NRs, were identified from the flatworm Schistosoma mansoni that each possess a novel set of two DBDs in tandem with an LBD. They represent a novel NR modular structure: A/B-DBD-DBD-hinge-LBD. The 2DBD-NRs form a new subfamily of NRs, VII. By database mining, 2DBD-NR genes from other flatworm species (Schmidtea mediterranea and Dugesia japonica), from Mollusks (Lottia gigantea) and from arthropods (Daphnia pulex) were also identified. All 2DBD-NRs possess a P-box sequence of CEACKK in the first DBD, which is unique to 2DBD-NRs, and a P-box sequence of CEGCKG in the second DBD. Phylogenetic analyses of both DBD and ligand binding domain sequences showed that 2DBD-NR genes originate from a common two DBD-containing ancestor gene. A single 2DBD-NR orthologue was found in Arthropoda, Platyhelminths and Mollusca. Subsequent 2DBD-NR gene evolution in Mollusks and Platyhelminths involved gene duplication. Chromosome localization of S. mansoni 2DBD-NR genes by fluorescence in situ hybridization (FISH) suggests that 2DBD-NR genes duplicated on different chromosomes in the Platyhelminths. Dimerization of Sm2DBDα indicates that 2DBD-NRs may act as homodimers, suggesting either that two repeats of a half-site are necessary for each DBD of 2DBD-NRs to bind to its target gene, or that each 2DBD-NR can recognize multiple sites.
2DBD-NRs share a common ancestor gene which possessed an extra DBD that likely resulted from a recombination event. After the split of the Arthropods, Mollusks and Platyhelminths, 2DBD-NR underwent a recent duplication in a common ancestor of Mollusks, while two rounds of duplication occurred in a common ancestor of the Platyhelminths. This demonstrates that certain NR genes underwent recent duplication in Protostome lineages after the split of the Protostomia and Deuterostomia.
Predicting binding sites of a transcription factor in the genome is an important, but challenging, issue in studying gene regulation. In the past decade, a large number of protein–DNA co-crystallized structures available in the Protein Data Bank have facilitated the understanding of interacting mechanisms between transcription factors and their binding sites. Recent studies have shown that both physics-based and knowledge-based potential functions can be applied to protein–DNA complex structures to deliver position weight matrices (PWMs) that are consistent with the experimental data. To further use the available structural models, the proposed Web server, PiDNA, aims at first constructing reliable PWMs by applying an atomic-level knowledge-based scoring function on numerous in silico mutated complex structures, and then using the PWM constructed by the structure models with small energy changes to predict the interaction between proteins and DNA sequences. With PiDNA, the users can easily predict the relative preference of all the DNA sequences with limited mutations from the native sequence co-crystallized in the model in a single run. More predictions on sequences with unlimited mutations can be realized by additional requests or file uploading. Three types of information can be downloaded after prediction: (i) the ranked list of mutated sequences, (ii) the PWM constructed by the favourable mutated structures, and (iii) any mutated protein–DNA complex structure models specified by the user. This study first shows that the constructed PWMs are similar to the annotated PWMs collected from databases or literature. Second, the prediction accuracy of PiDNA in detecting relatively high-specificity sites is evaluated by comparing the ranked lists against in vitro experiments from protein-binding microarrays. Finally, PiDNA is shown to be able to select the experimentally validated binding sites from 10 000 random sites with high accuracy. 
With PiDNA, users can design biological experiments based on the predicted sequence specificity and/or request mutated structure models for further protein design. It is also expected that PiDNA can be incorporated with chromatin immunoprecipitation data to refine large-scale inference of in vivo protein–DNA interactions. PiDNA is available at: http://dna.bime.ntu.edu.tw/pidna.
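The workflow described for PiDNA (enumerate sequences with limited mutations from the native sequence, rank them by score, and build a PWM from the favourable ones) can be sketched as follows. This is only an illustration under stated assumptions: `toy_energy` is a hypothetical stand-in for the atomic-level knowledge-based scoring function, and the native sequence, per-position penalties, and cutoff are invented for the example.

```python
from itertools import combinations, product

BASES = "ACGT"

def mutants_within(native, max_mutations):
    """Yield every sequence differing from `native` at up to max_mutations positions."""
    yield native
    for k in range(1, max_mutations + 1):
        for positions in combinations(range(len(native)), k):
            alternatives = [[b for b in BASES if b != native[p]] for p in positions]
            for replacement in product(*alternatives):
                seq = list(native)
                for p, b in zip(positions, replacement):
                    seq[p] = b
                yield "".join(seq)

def toy_energy(seq, native, penalties):
    """Hypothetical energy change: a fixed penalty for each mismatched position."""
    return sum(w for s, n, w in zip(seq, native, penalties) if s != n)

def build_pwm(sequences):
    """Base frequencies per position over a list of equal-length sequences."""
    pwm = []
    for i in range(len(sequences[0])):
        counts = {b: 0 for b in BASES}
        for s in sequences:
            counts[s[i]] += 1
        pwm.append({b: counts[b] / len(sequences) for b in BASES})
    return pwm

native = "GTAA"                   # hypothetical native site
penalties = [2.0, 0.2, 2.0, 0.2]  # hypothetical per-position mismatch costs
candidates = list(mutants_within(native, 2))
ranked = sorted(candidates, key=lambda s: toy_energy(s, native, penalties))
favourable = [s for s in ranked if toy_energy(s, native, penalties) <= 0.5]
pwm = build_pwm(favourable)
```

In this toy setting, positions with a high mismatch penalty stay fully conserved in the resulting PWM, while positions with a low penalty become degenerate, which is the qualitative behaviour one would expect from a PWM built from low-energy-change models.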
The structures of DNA–protein complexes have illuminated the diversity of DNA–protein binding mechanisms shown by different protein families. This lack of generality could pose a great challenge for predicting DNA–protein interactions. To address this issue, we have developed a knowledge-based method, DNA-binding Domain Hunter (DBD-Hunter), for identifying DNA-binding proteins and associated binding sites. The method combines structural comparison and the evaluation of a statistical potential, which we derive to describe interactions between DNA base pairs and protein residues. We demonstrate that DBD-Hunter is an accurate method for predicting DNA-binding function of proteins, and that DNA-binding protein residues can be reliably inferred from the corresponding templates if identified. In benchmark tests on ∼4000 proteins, our method achieved an accuracy of 98% and a precision of 84%, which significantly outperforms three previous methods. We further validate the method on DNA-binding protein structures determined in DNA-free (apo) state. We show that the accuracy of our method is only slightly affected on apo-structures compared to the performance on holo-structures cocrystallized with DNA. Finally, we apply the method to ∼1700 structural genomics targets and predict that 37 targets with previously unknown function are likely to be DNA-binding proteins. DBD-Hunter is freely available at http://cssb.biology.gatech.edu/skolnick/webservice/DBD-Hunter/.
Diverse mechanisms for DNA-protein recognition have been elucidated in numerous atomic complex structures from various protein families. These structural data provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing specialized methods that predict the DNA-binding function from protein structure. While such methods are useful, a major limitation is that they require an experimental structure of the target as input. To overcome this obstacle, we develop a threading-based method, DNA-Binding-Domain-Threader (DBD-Threader), for the prediction of DNA-binding domains and associated DNA-binding protein residues. Our method, which uses a template library composed of DNA-protein complex structures, requires only the target protein's sequence. In our approach, fold similarity and DNA-binding propensity are employed as two functional discriminating properties. In benchmark tests on 179 DNA-binding and 3,797 non-DNA-binding proteins, using templates whose sequence identity is less than 30% to the target, DBD-Threader achieves a sensitivity/precision of 56%/86%. This performance is considerably better than the standard sequence comparison method PSI-BLAST and is comparable to DBD-Hunter, which requires an experimental structure as input. Moreover, for over 70% of predicted DNA-binding domains, the backbone Root Mean Square Deviations (RMSDs) of the top-ranked structural models are within 6.5 Å of their experimental structures, with their associated DNA-binding sites identified at satisfactory accuracy. Additionally, DBD-Threader correctly assigned the SCOP superfamily for most predicted domains. To demonstrate that DBD-Threader is useful for automatic function annotation on a large-scale, DBD-Threader was applied to 18,631 protein sequences from the human genome; 1,654 proteins are predicted to have DNA-binding function. 
Comparison with existing Gene Ontology (GO) annotations suggests that ∼30% of our predictions are new. Finally, we present some interesting predictions in detail. In particular, it is estimated that ∼20% of classic zinc finger domains play a functional role not related to direct DNA-binding.
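The sensitivity and precision figures quoted for the benchmark follow from standard confusion-matrix definitions. The sketch below shows the arithmetic; the individual counts are illustrative values back-calculated to roughly match the reported 56%/86% on 179 DNA-binding and 3,797 non-DNA-binding proteins, and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (sensitivity, precision, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true binders that are found
    precision = tp / (tp + fp)     # fraction of predicted binders that are real
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, precision, accuracy

# Illustrative counts only (assumed, not reported):
# 179 positives = tp + fn, 3,797 negatives = fp + tn.
tp, fn = 100, 79
fp, tn = 16, 3781
sens, prec, acc = classification_metrics(tp, fp, fn, tn)
print(f"sensitivity={sens:.2f} precision={prec:.2f} accuracy={acc:.3f}")
```

Because non-DNA-binding proteins greatly outnumber DNA-binding ones, accuracy stays high even when many true binders are missed, which is why sensitivity and precision are the more informative numbers for this kind of benchmark. The accuracy computed here is a consequence of the illustrative counts, not a reported result.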
DNA-binding proteins represent only a small fraction of proteins encoded in genomes, yet they play a critical role in a variety of biological activities. Identifying these proteins and understanding how they function are important issues. The structures of solved DNA protein complexes of different protein families provide an invaluable knowledge base not only for understanding DNA-protein interactions, but also for developing methods that predict whether or not a protein binds DNA. While such methods are useful, they require an experimental structure as input. To overcome this obstacle, we have developed a threading-based method for the prediction of DNA-binding domains and associated DNA-binding protein residues from protein sequence. The method has higher accuracy in large scale benchmarking than methods based on sequence similarity alone. Application to the human proteome identified potential targets of not only previously unknown DNA-binding proteins, but also of biologically interesting ones that are related to, yet evolved from, DNA-binding proteins.
FOXO3a is a transcription factor of the FOXO family. The FOXO proteins participate in multiple signaling pathways, and their transcriptional activity is regulated by several post-translational mechanisms, including phosphorylation, acetylation and ubiquitination. Because these post-translational modification sites are located within the C-terminal basic region of the FOXO DNA-binding domain (FOXO-DBD), it is possible that these post-translational modifications could alter the DNA-binding characteristics. To understand how FOXO proteins mediate transcriptional activity, we report here the 2.7 Å crystal structure of the DNA-binding domain of FOXO3a (FOXO3a-DBD) bound to a 13-bp DNA duplex containing a FOXO consensus binding sequence (GTAAACA). Based on a unique structural feature in the C-terminal region and results from biochemical and mutational studies, our studies may explain how FOXO-DBD C-terminal phosphorylation by protein kinase B (PKB) or acetylation by cAMP-response element binding protein (CBP) can attenuate the DNA-binding activity and thereby reduce transcriptional activity of FOXO proteins. In addition, we demonstrate that the methyl groups of specific thymine bases within the consensus sequence are important for FOXO3a-DBD recognition of the consensus binding site.
Human replication protein A (RPA), a heterotrimer composed of RPA70, RPA32, and RPA14 subunits, contains four single-stranded DNA (ssDNA) binding domains (DBD): DBD-A, DBD-B and DBD-C in RPA70 and DBD-D in RPA32. While crystallographic or NMR structures of these DBDs and a trimerization core have been determined, the structure of full-length RPA or of the RPA-ssDNA complex remains unknown. In this report, we have examined the structural features of RPA interaction with ssDNA by fluorescence spectroscopy. Using a set of oligonucleotides (dT) with varying lengths as a molecular ruler and also as the substrates, we have determined at single nucleotide resolution the relative positions of the ssDNA-interacting intrinsic tryptophans of RPA. Our results revealed that Trp528 in DBD-C and Trp107 in DBD-D contact ssDNA at the 16th and 24th nucleotides (nt) from the 5′-end of the substrate, respectively. Evaluation of the relative spatial arrangement of RPA domains in the RPA-ssDNA complex suggested that DBD-B and DBD-C are spaced about 4 nt (~19 Å) apart, while DBD-C and DBD-D are about 7 nt (~34 Å) apart. Based on these geometric constraints, a global structure model for the binding of the major RPA DBDs to ssDNA was proposed.
Replication protein A; single-stranded DNA; RPA-ssDNA binding; structural characterization; and fluorescence spectroscopy
Binding of many eukaryotic transcription regulatory proteins to their DNA recognition sequences results in conformational changes in DNA. To test the effect of altering DNA topology by prebending a transcription factor binding site, we examined the interaction of the estrogen receptor (ER) DNA binding domain (DBD) with prebent estrogen response elements (EREs). When the ERE in minicircle DNA was prebent toward the major groove, which is in the same direction as the ER-induced DNA bend, there was no significant effect on ER DBD binding relative to the linear counterparts. However, when the ERE was bent toward the minor groove, in a direction that opposes the ER-induced DNA bend, there was a four- to eightfold reduction in ER DBD binding. Since reduced binding was also observed with the ERE in nicked circles, the reduction in binding was not due to torsional force induced by binding of ER DBD to the prebent ERE in covalently closed minicircles. To determine the mechanism responsible for reduced binding to the prebent ERE, we examined the effect of prebending the ERE on the association and dissociation of the ER DBD. Binding of the ER DBD to ERE-containing minicircles was rapid when the EREs were prebent toward either the major or minor groove of the DNA (k_on of 9.9 × 10^6 to 1.7 × 10^7 M^-1 s^-1). Prebending the ERE toward the minor groove resulted in an increase in k_off of four- to fivefold. Increased dissociation of the ER DBD from the ERE is, therefore, the major factor responsible for reduced binding of the ER DBD to an ERE prebent toward the minor groove. These data provide the first direct demonstration that the interaction of a eukaryotic transcription factor with its recognition sequence can be strongly influenced by altering DNA topology through prebending the DNA.
The bovine papillomavirus replication initiator protein E1 is an origin of replication (ori)-binding protein absolutely required for viral DNA replication. In the presence of the viral transcription factor E2, E1 binds to the ori and initiates DNA replication. To understand how the E1 initiator recognizes the ori and how E2 assists in this process, we have expressed and purified a 166-amino-acid fragment which corresponds to the minimal E1 DNA-binding domain (DBD). DNA binding studies using this protein demonstrate that the E1 DBD can bind to the palindromic E1 binding site in several forms but that binding of two monomers, each recognizing one half-site of the E1 palindrome, is the predominant form. This is reminiscent of the binding of the T-antigen DBD to the SV40 ori, and interestingly, the arrangement of E1 binding sites shows striking similarities to the arrangement of T-antigen binding sites in the SV40 ori even though the recognition sequences are unrelated. The E1 DBD is capable of interacting cooperatively with E2; however, the E2 DBD and not the E2 activation domain mediates this interaction. Furthermore, the E2 DBD stimulates binding of two monomers of the E1 DBD to the ori by binding cooperatively with one E1 monomer. Finally, we show that our results concerning the DNA-binding properties of the E1 DBD can be extended to full-length E1.
PAX5 encodes a master regulator of B-cell development. It fuses to other genes associated with acute lymphoblastic leukemia (ALL). These fusion products are potent dominant-negative (DN) inhibitors of wild-type PAX5 resulting in a blockade of B-cell differentiation. Here, we show that multimerization of PAX5 DNA-binding domain (DBD) is necessary and sufficient to cause extremely stable chromatin binding and DN-activity. ALL-associated PAX5-C20S results from fusion of the N-terminal region of PAX5, including its paired DBD, to the C-terminus of C20orf112, a protein of unknown function. We report that PAX5-C20S is a tetramer which interacts extraordinarily stably with chromatin as determined by fluorescence recovery after photobleaching (FRAP) in living cells. Tetramerization, stable chromatin-binding and DN-activity all require a putative five-turn amphipathic α-helix at the C-terminus of C20orf112, and do not require potential co-repressor binding peptides elsewhere in the sequence. In vitro, the monomeric PAX5 DBD and PAX5-C20S bind a PAX5-binding site with equal affinity when it is at the center of an oligonucleotide too short to bind to more than one PAX5 DBD. But PAX5-C20S binds the same sequence with tenfold higher affinity than the monomeric PAX5 DBD when it is in a long DNA molecule. We suggest that the increased affinity results from interactions of one or more of the additional DBDs with neighboring non-specific sites in a long DNA molecule, and that this can account for the increased stability of PAX5-C20S chromatin binding compared to wt PAX5, resulting in DN-activity by competition for binding to PAX5-target sites. Consistent with this model, the ALL-associated PAX5 fused to ETV6 or the multimerization domain of ETV6 SAM results in stable chromatin binding and DN-activity.
In addition, PAX5 DBD fused to artificial dimerization, trimerization, and tetramerization domains results in parallel increases in the stability of chromatin binding and DN-activity. Our studies suggest that oncogenic fusion proteins that retain the DBD of the transcription factor and the multimerization sequence of the partner protein can act in a DN fashion by multimerizing and binding avidly to gene targets, preventing the normal transcription factor from binding and inducing expression of its target genes. Inhibition of this multimerization may provide a novel therapeutic approach for cancers with this or similar fusion proteins.
PAX5 fusions; DNA binding; multimerization stable
The positional weight matrix (PWM) remains the most popular model for quantifying transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model.
TF binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. The popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of highly scoring TFBS and thus may be different without affecting the sets of the best recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds.
We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, like PWMs with raw positional counts and log-odds. We present the efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation).
MACRO-APE can be effectively used to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query.
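The idea of comparing two TFBS models through the sets of words they recognize can be illustrated with a brute-force sketch. This is not the MACRO-APE algorithm, which relies on approximate P-value estimation and handles models of different lengths; the version below assumes two short PWMs of equal length in a fixed alignment and simply enumerates all words. The matrices and thresholds are hypothetical.

```python
from itertools import product

BASES = "ACGT"

def tfbs_set(pwm, threshold):
    """All words of the PWM's length whose summed score reaches the threshold."""
    hits = set()
    for word in product(BASES, repeat=len(pwm)):
        score = sum(column[base] for column, base in zip(pwm, word))
        if score >= threshold:
            hits.add("".join(word))
    return hits

def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Two hypothetical 3-bp models with integer scores.
pwm_1 = [
    {"A": 1, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 1, "T": 1},  # accepts G or T at the last position
]
pwm_2 = [
    {"A": 1, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 1, "T": 0},  # accepts only G at the last position
]
sites_1 = tfbs_set(pwm_1, threshold=3)
sites_2 = tfbs_set(pwm_2, threshold=3)
similarity = jaccard(sites_1, sites_2)
distance = 1.0 - similarity
```

Here the first model recognizes ACG and ACT and the second only ACG, so the similarity depends on both the matrices and the thresholds. Exhaustive enumeration grows as 4^L with motif length L, which is why a practical tool needs an approximation rather than this direct approach.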
Availability and implementation
MACRO-APE is implemented in Ruby 1.9; software including source code and a manual is freely available at http://autosome.ru/macroape/ and in supplementary materials.
Transcription factor binding site; TFBS; Transcription factor binding site model; Binding motif; Jaccard similarity; Position weight matrix; PWM; P-value; Position specific frequency matrix; PSFM; Macroape
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences.
By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of represented species with PWMs over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences.
The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties. The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at http://dodoma.systemsbiology.net.
The homeobox gene (HOXA13) codes for a transcription factor protein that binds to AT-rich DNA sequences and controls expression of genes during embryonic morphogenesis. Here we present the NMR structure of HOXA13 homeodomain (A13DBD) bound to an 11-mer DNA duplex. A13DBD forms a dimer that binds to DNA with a dissociation constant of 7.5 nM. The A13DBD/DNA complex has a molar mass of 35 kDa consistent with two molecules of DNA bound at both ends of the A13DBD dimer. A13DBD contains an N-terminal arm (residues 324 – 329) that binds in the DNA minor groove, and a C-terminal helix (residues 362 – 382) that contacts the ATAA nucleotide sequence in the major groove. The N370 side-chain forms hydrogen bonds with the purine base of A5* (base paired with T5). Side-chain methyl groups of V373 form hydrophobic contacts with the pyrimidine methyl groups of T5, T6* and T7*, responsible for recognition of TAA in the DNA core. I366 makes similar methyl contacts with T3* and T4*. Mutants (I366A, N370A and V373G) all have decreased DNA binding and transcriptional activity. Exposed protein residues (R337, K343, and F344) make intermolecular contacts at the protein dimer interface. The mutation F344A weakens protein dimerization and lowers transcriptional activity by 76%. We conclude that the non-conserved residue, V373 is critical for structurally recognizing TAA in the major groove, and that HOXA13 dimerization is required to activate transcription of target genes.
The LysR-type transcriptional regulators (LTTRs) comprise the largest family of prokaryotic transcription factors. These proteins are composed of an N-terminal DNA binding domain (DBD) and a C-terminal cofactor binding domain. To date, no structure of the DBD has been solved. According to the SUPERFAMILY and MODBASE databases, a reliable homology model of LTTR DBDs may be built using the structure of the Escherichia coli ModE transcription factor, containing a winged helix–turn–helix (HTH) motif, as a template. The remote, but statistically significant, sequence similarity between ModE and LTTR DBDs and an alignment generated using SUPERFAMILY and MODBASE methods was independently confirmed by alignment of sequence profiles representing ModE and LTTR family DBDs. Using the crystal structure of the E. coli OxyR C-terminal domain and the DBD alignments we constructed a structural model of the full-length dimer of this LTTR family member and used it to investigate the mode of protein–DNA interaction. We also applied the model to interpret, in a structural context, the results of numerous biochemical studies of mutated LTTRs. A comparison of the LTTR DBD model with the structures of other HTH proteins also provides insights into the interaction of LTTRs with the C-terminal domain of the RNA polymerase α subunit.
Summary: The transcriptional activator AREA is a member of the GATA family of transcription factors and mediates nitrogen metabolite repression in the fungus Aspergillus nidulans. The nutritional versatility of A. nidulans and its amenability to classical and reverse genetic manipulations make the AREA DNA binding domain (DBD) a useful model for analyzing GATA family DBDs, particularly as structures of two AREA-DNA complexes have been determined. The 109 extant mutant forms of the AREA DBD surveyed here constitute one of the highest totals of eukaryotic transcription factor DBD mutants, are discussed in light of the roles of individual residues, and are compared to corresponding mutant sequence changes in other fungal GATA factor DBDs. Other topics include delineation of the DBD using both homology and mutational truncation, use of frameshift reversion to detect regions of tolerance to mutational change, the finding that duplication of the DBD can apparently enhance AREA function, and use of the AREA system to analyze a vertebrate GATA factor DBD. Some major points to emerge from work on the AREA DBD are (i) tolerance to sequence change (with retention of function) is surprisingly great, (ii) mutational changes in a transcription factor can have widely differing, even opposing, effects on expression of different structural genes so that monitoring expression of one or even several structural genes can be insufficient and possibly misleading, and (iii) a mutational change altering local hydrophobic packing and DNA binding target specificity can markedly influence the behavior of mutational changes elsewhere in the DBD.
Disulfide engineering is an important biotechnological tool that has advanced a wide range of research. The introduction of novel disulfide bonds into proteins has been used extensively to improve protein stability, modify functional characteristics, and to assist in the study of protein dynamics. Successful use of this technology is greatly enhanced by software that can predict pairs of residues that will likely form a disulfide bond if mutated to cysteines.
We had previously developed and distributed software for this purpose: Disulfide by Design (DbD). The original DbD program has been widely used; however, it has a number of limitations including a Windows platform dependency. Here, we introduce Disulfide by Design 2.0 (DbD2), a web-based, platform-independent application that significantly extends functionality, visualization, and analysis capabilities beyond the original program. Among the enhancements to the software is the ability to analyze the B-factor of protein regions involved in predicted disulfide bonds. Importantly, this feature facilitates the identification of potential disulfides that are not only likely to form but are also expected to provide improved thermal stability to the protein.
DbD2 provides platform-independent access and significantly extends the original functionality of DbD. A web server hosting DbD2 is provided at http://cptweb.cpt.wayne.edu/DbD2/.
Disulfide bond; Protein design; Protein engineering; Bioinformatics
Gene expression is regulated mainly by transcription factors (TFs) that interact with regulatory cis-elements on DNA sequences. To identify functional regulatory elements, computer searching can predict TF binding sites (TFBS) using position weight matrices (PWMs) that represent positional base frequencies of collected experimentally determined TFBS. A disadvantage of this approach is the large output of results for genomic DNA. One strategy to identify genuine TFBS is to utilize local concentrations of predicted TFBS. It is unclear whether there is a general tendency for TFBS to cluster at promoter regions, although this is the case for certain TFBS. Also unclear is the identification of TFs that have TFBS concentrated in promoters and to what level this occurs. This study hopes to answer some of these questions.
We developed the cluster score measure to evaluate the correlation between predicted TFBS clusters and promoter sequences for each PWM. Non-promoter sequences were used as a control. Using the cluster score, we identified a PWM group called PWM-PCP, in which TFBS clusters positively correlate with promoters, and another PWM group called PWM-NCP, in which TFBS clusters negatively correlate with promoters. The PWM-PCP group comprises 47% of the 199 vertebrate PWMs, while the PWM-NCP group comprises 11%. After reducing the effect of CpG islands (CGI) on the clusters using partial correlation coefficients among three properties (promoter, CGI and predicted TFBS cluster), we identified two PWM groups: those strongly correlated with CGI and those not correlated with CGI.
Not all PWMs predict TFBS correlated with human promoter sequences. Two main PWM groups were identified: (1) those that show TFBS clustered in promoters associated with CGI, and (2) those that show TFBS clustered in promoters independent of CGI. Assessment of PWM matches will allow more positive interpretation of TFBS in regulatory regions.
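The use of partial correlation to separate the CGI effect from the promoter effect can be made concrete with the standard first-order formula, r_xy.z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The sketch below applies it to invented toy data; the variable names and values are hypothetical, and the exact procedure used in the study may differ.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data for eight sequences (all values invented for illustration).
is_promoter = [1, 1, 1, 1, 0, 0, 0, 0]
has_cgi = [1, 1, 1, 0, 1, 0, 0, 0]
cluster_size = [3, 4, 3, 1, 3, 1, 2, 1]  # predicted TFBS cluster size

raw = pearson(is_promoter, cluster_size)
controlled = partial_corr(is_promoter, cluster_size, has_cgi)
print(f"raw r = {raw:.3f}, partial r (CGI removed) = {controlled:.3f}")
```

In this toy case the raw promoter-cluster correlation is clearly positive, yet it vanishes once CGI is controlled for, which corresponds to a PWM whose clusters track CpG islands rather than promoters as such.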
promoter; tissue-specific gene expression; position weight matrix; regulatory motif
The RFX DNA binding domain (DBD) is a novel highly conserved motif belonging to a large number of dimeric DNA binding proteins which have diverse regulatory functions in eukaryotic organisms, ranging from yeasts to human. To characterize this novel motif, solid phase synthesis of a 76mer polypeptide corresponding to the DBD of human hRFX1 (hRFX1/DBD), a prototypical member of the RFX family, has been optimized to yield large quantities (approximately 90 mg) of pure compound. Preliminary two-dimensional 1H NMR experiments suggested the presence of helical regions in this sequence, in agreement with previously reported secondary structure predictions. In gel mobility shift assays, this synthetic peptide was shown to bind cooperatively to the 23mer duplex oligodeoxynucleotide corresponding to the binding site of hRFX1, with a 2:1 stoichiometry due to an inverse repeat present in the 23mer. The stoichiometry of this complex was reduced to 1:1 by decreasing the length of the DNA sequence to a 13mer oligonucleotide containing a single half-site. Surface plasmon resonance measurements were performed using this 5'-biotinylated 13mer oligonucleotide immobilized on an avidin-coated sensor chip. Using this method, an association rate constant (k_a = 4 × 10^5 M^-1 s^-1), a dissociation rate constant (k_d = 6 × 10^-2 s^-1) and an equilibrium dissociation constant (K_D = 153 nM) were determined for binding of hRFX1/DBD to the double-stranded 13mer oligonucleotide. In the presence of hRFX1/DBD the melting temperature of the 13mer DNA was increased by 16 °C, illustrating stabilization of the double-stranded conformation induced by the peptide.
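The relationship between the kinetic and equilibrium constants reported above can be checked with a short calculation, assuming simple 1:1 binding so that the equilibrium dissociation constant is the ratio of the dissociation rate to the association rate. The function and variable names are illustrative.

```python
import math

def equilibrium_dissociation_constant(k_on, k_off):
    """K_D in M, from k_on (M^-1 s^-1) and k_off (s^-1), assuming 1:1 binding."""
    return k_off / k_on

def complex_half_life(k_off):
    """Half-life in seconds of a complex decaying with first-order rate k_off."""
    return math.log(2) / k_off

k_a = 4e5    # association rate constant from the text, M^-1 s^-1
k_d = 6e-2   # dissociation rate constant from the text, s^-1

K_D_nM = equilibrium_dissociation_constant(k_a, k_d) * 1e9
half_life_s = complex_half_life(k_d)
print(f"K_D = {K_D_nM:.0f} nM, half-life = {half_life_s:.1f} s")
```

The rounded rate constants give a K_D of about 150 nM, close to the reported 153 nM; the small difference presumably reflects rounding of the published rate constants. The same dissociation rate implies a complex half-life on the order of ten seconds.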
The E1 helicase of papillomaviruses is required for replication of the viral double-stranded DNA genome, in conjunction with cellular factors. DNA replication is initiated at the viral origin by the assembly of E1 monomers into oligomeric complexes that have unwinding activity. In vivo, this process is catalyzed by the viral E2 protein, which recruits E1 specifically at the origin. For bovine papillomavirus (BPV) E1 a minimal DNA-binding domain (DBD) has been identified N-terminal to the enzymatic domain. In this study, we characterized the DBD of human papillomavirus 11 (HPV11), HPV18, and BPV E1 using a quantitative DNA binding assay based on fluorescence anisotropy. We found that the HPV11 DBD binds DNA with an affinity and sequence requirement comparable to those of the analogous domain of BPV but that the HPV18 DBD has a higher affinity for nonspecific DNA. By comparing the DNA-binding properties of a dimerization-defective protein to those of the wild type, we provide evidence that dimerization of the HPV11 DBD occurs only on two appropriately positioned E1 binding-sites and contributes approximately a 10-fold increase in binding affinity. In contrast, the HPV11 E1 helicase purified as preformed hexamers binds DNA with little sequence specificity, similarly to a dimerization-defective DBD. Finally, we show that the amino acid substitution that prevents dimerization reduces the ability of a longer E1 protein to bind to the origin in vitro and to support transient HPV DNA replication in vivo, but has little effect on its ATPase activity or ability to oligomerize into hexamers. These results are discussed in light of a model of the assembly of replication-competent double hexameric E1 complexes at the origin.
The tumor suppressor p53 plays a crucial role in cell cycle checkpoints, DNA repair, and apoptosis. p53 consists of a natively unfolded N-terminal region (NTR), a central DNA binding domain (DBD), a C-terminal tetramerization domain, and a regulatory region. In this paper, the interactions between the DBD and the NTR, and between the DBD and DNA, were investigated by measuring changes in the mechanical unfolding trajectory of the DBD using atomic force microscopy (AFM)-based single molecule force spectroscopy. In the absence of DNA, the DBD (residues 94–293, 200 amino acids (AA)) showed two different mechanical unfolding patterns. One indicated the existence of an unfolding intermediate consisting of approximately 60 AA, and the other showed a 100 AA intermediate. The DBD with the NTR did not show such unfolding patterns; instead, heterogeneous unfolding force peaks were observed. Among the heterogeneous patterns, we observed a high frequency of force peaks indicating the unfolding of a domain consisting of 220 AA, which is larger than the DBD alone. This observation implies that part of the NTR binds to the DBD, so that the mechanical unfolding involves not only the DBD but also the bound part of the NTR. When DNA was bound, the mechanical unfolding trajectory of p53NTR+DBD showed a different pattern from that without DNA. The pattern was similar to that of the DBD alone, but two consecutive unfolding force peaks corresponding to 60 and 100 AA sub-domains were observed. These results indicate that interactions with the NTR or DNA alter the mechanical stability of the DBD and result in drastic changes in its mechanical unfolding trajectory.
The latency-associated nuclear antigen (LANA) of Kaposi's sarcoma-associated herpesvirus functions as an origin-binding protein (OBP) and transcriptional regulator. LANA binds the terminal repeats via the C-terminal DNA-binding domain (DBD) to support latent DNA replication. To date, the structure of LANA has not been solved. Sequence alignments among OBPs of gammaherpesviruses have revealed that the C terminus of LANA is structurally related to EBNA1, the OBP of Epstein–Barr virus. Based on secondary structure predictions for LANADBD and published structures of EBNA1DBD, this study used bioinformatics tools to model a putative structure for LANADBD bound to DNA. To validate the predicted model, 38 mutants targeting the most conserved motifs, namely three α-helices and a conserved proline loop, were constructed and functionally tested. In agreement with data for EBNA1, residues in helices 1 and 2 mainly contributed to sequence-specific DNA binding and replication activity, whilst mutations in helix 3 affected replication activity and multimer formation. Additionally, several mutants were isolated with discordant phenotypes, which may aid further studies into LANA function. In summary, these data suggest that the secondary and tertiary structures of LANA and EBNA1 DBDs are conserved and are critical for (i) sequence-specific DNA binding, (ii) multimer formation, (iii) LANA-dependent transcriptional repression, and (iv) DNA replication.
The B3 DNA-binding domains (DBDs) of plant transcription factors (TFs) and the DBDs of the EcoRII and BfiI restriction endonucleases (EcoRII-N and BfiI-C) share a common structural fold, classified as the DNA-binding pseudobarrel. The B3 DBDs in the plant TFs recognize a diverse set of target sequences. The only available co-crystal structure of a B3-like DBD is that of EcoRII-N (recognition sequence 5′-CCTGG-3′). In order to understand the structural and molecular mechanisms of specificity of B3 DBDs, we have solved the crystal structure of BfiI-C (recognition sequence 5′-ACTGGG-3′) complexed with a 12-bp cognate oligoduplex. Structural comparison of the BfiI-C–DNA and EcoRII-N–DNA complexes reveals a conserved DNA-binding mode and a conserved pattern of interactions with the phosphodiester backbone. The determinants of target specificity are located in the loops that emanate from the conserved structural core. The BfiI-C–DNA structure presented here expands the range of templates for modeling the DNA-bound complexes of the B3 family of plant TFs.
Members of the nuclear receptor superfamily differ in their specificity for DNA recognition and binding, oligomeric state, and ligand binding. The wide range of specificities is impressive given the high degree of sequence conservation in the DNA binding domain (DBD) and the moderate sequence conservation with high structural similarity within the ligand binding domains (LBDs). Determining sequence positions that are conserved within nuclear receptor subfamilies can provide important indicators of the structural dynamics that translate to the oligomeric state of the active receptor, DNA binding specificity, and ligand affinity and selectivity. Here we present a method to analyze sequence data from all nuclear receptors that facilitates detection of co-evolving pairs using Mutual Information (MI). Using this method we demonstrate that MI can reveal functionally important sequence positions within the superfamily. The approach identified three sequence positions that have conserved sequence patterns across all nuclear receptors and subfamilies. Interestingly, two of the sequence positions identified are located within the DBD CII and the third is within Helix c of the DBD. These sequences are located within the heterodimer interface of PPARγ (CII) and RXRα (Helix c) based on PDB:3DZU. Helix c of PPARγ, which is not involved in the DBD dimer interface, binds the minor groove in the 5' flanking region in a consensus PPARγ response element (PPRE), and the corresponding RXRα (CII) is found in the 3' flanking region of RXRE (3DZU). As these three sequence positions represent unique identifiers for all nuclear receptors and they are located within the dimer interface of the PPARγ-RXRα DBD (3DZU) interfacing with the flanking regions of the NRRE, we conclude that they are critical sequence positions, perhaps dictating nuclear receptor (NR) DNA binding specificity.
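To make the MI measure concrete, the following is a minimal sketch of mutual information between two columns of a multiple sequence alignment. It uses the plain plug-in estimator on a hypothetical toy alignment; the abstract does not state which estimator, gap handling, or background corrections the authors used, so none of those choices should be read as their method.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Plug-in estimate of mutual information (in bits) between two
    alignment columns, each given as an equal-length sequence of residues."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical toy alignment of four sequences, three columns each.
alignment = ["ACC", "ACT", "GTC", "GTT"]
columns = list(zip(*alignment))  # one tuple of residues per column

print(mutual_information(columns[0], columns[1]))  # columns vary together
print(mutual_information(columns[0], columns[2]))  # columns vary independently
```

In this toy case the first two columns co-vary perfectly and yield 1 bit, while the first and third are independent and yield 0 bits. Real alignments are far noisier, and small sample sizes and shared phylogeny are known to inflate raw MI.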
DNA-binding domain (DBD) is a database of predicted sequence-specific DNA-binding transcription factors (TFs) for all publicly available proteomes. The proteomes have increased from 150 in the initial version of DBD to over 700 in the current version. All predicted TFs must contain a significant match to a hidden Markov model representing a sequence-specific DNA-binding domain family. Access to TF predictions is provided through http://transcriptionfactor.org, where new search options are now provided such as searching by gene names in model organisms, searching for all proteins in a particular DBD family and specific organism. We illustrate the application of this type of search facility by contrasting trends of DBD family occurrence throughout the tree of life, highlighting the clear partition between eukaryotic and prokaryotic DBD expansions. The website content has been expanded to include dedicated pages for each TF containing domain assignment details, gene names, links to external databases and links to TFs with similar domain arrangements. We compare the increase in number of predicted TFs with proteome size in eukaryotes and prokaryotes. Eukaryotes follow a slower rate of increase in TFs than prokaryotes, which could be due to the presence of splice variants or an increase in combinatorial control.
The S. pombe protection of telomeres 1 (SpPot1) protein recognizes the 3′ single-stranded ends of telomeres and provides essential protective and regulatory functions. The ssDNA-binding activity of SpPot1 is conferred by its ssDNA-binding domain, Pot1-DBD (residues 1–389), which can be further separated into two distinct domains, Pot1pN (residues 1–187) and Pot1pC (residues 188–389). Here we show that Pot1pC, like Pot1pN, can function independently of Pot1-DBD and binds specifically to a minimal nonameric oligonucleotide, d(GGTTACGGT), with a K_D of 400 ± 70 nM. NMR chemical shift perturbation analysis indicates that the overall structures of the isolated Pot1pN and Pot1pC domains remain intact in Pot1-DBD. Furthermore, alanine scanning reveals modest differences in the ssDNA-binding contacts provided by isolated Pot1pN and by Pot1pN within Pot1-DBD. Although the global character of both Pot1pN and Pot1pC is maintained in Pot1-DBD, chemical shift perturbation analysis highlights localized structural differences within the G1/G2 and T3/T4 binding pockets of Pot1pN in Pot1-DBD, which correlate with its distinct ssDNA-binding activity. Furthermore, we find evidence for a putative interdomain interface on Pot1pN that mediates interactions with Pot1pC that ultimately result in the altered ssDNA-binding activity of Pot1-DBD. Together, these data provide insight into the mechanisms underlying the activity and regulation of SpPot1 at the telomere.
telomeres; ssDNA-binding domain; end-protection; OB fold; Pot1