Transcriptional control requires the spatially and temporally coordinated action of many macromolecular complexes. Chromosomal proteins, transcription factors, co-activators and components of the general transcription machinery, including RNA polymerases, often use structurally or stoichiometrically ill-defined regions for interactions that convey regulatory information in processes ranging from chromatin remodeling to mRNA processing. Determining the functional significance of intrinsically disordered protein regions and developing conceptual models of their action will help to illuminate their key role in transcription regulation. Complexes comprising disordered regions often display short recognition elements embedded in flexible and sequentially variable environments that can lead to structural and functional malleability. This provides versatility to recognize multiple targets having different structures, facilitate conformational rearrangements and physically communicate with many partners in response to environmental changes. All these features expand the capacities of ordered complexes and give rise to efficient regulatory mechanisms.
At the critical intersection between signaling pathways and the process of gene expression, transcription regulation controls and influences many cellular and physiological processes, from cell differentiation and development to metabolic responses to environmental stimuli. Transcription regulation depends on communication and interaction between many large multiprotein complexes, which results in transmission of regulatory information to the RNA polymerases that carry out the synthesis of mRNA from a chromosomal DNA template. In eukaryotic organisms, diverse arrays of proteins are involved in every aspect of transcription regulation, from chromatin remodeling to mRNA processing. These include (i) the histones, chromatin and macromolecular complexes that modify chromatin structure and provide access to the DNA, (ii) transcription factors that bind to upstream regulatory sequences, (iii) co-activators that communicate signals to the core transcription machinery and (iv) general transcription factors that facilitate formation of the pre-initiation complex (Box 1 and Fig. 1).
Transcription of protein-encoding genes is one of the first steps in deciphering the genetic material. Fundamentally, it is the synthesis of mRNA from a DNA template, carried out in eukaryotes by the enzyme RNA polymerase II. Transcription is a hierarchical process involving many different macromolecular assemblies regulated by numerous protein-protein and protein-DNA interactions (Fig. 1).
Chromatin and chromatin-modifying enzymes function to regulate accessibility of the DNA at the global (that is, multiple-kilobase-pair) level (Fig. 1a). The subunit of chromatin is the nucleosome, which is a complex of 146 base pairs of chromosomal DNA with an octamer of core histones. Arrays of nucleosomes spaced at roughly 200-base-pair intervals make up chromatin fibers, which are structurally dynamic and can condense locally and globally into chromosomal domains. There are two major classes of chromatin-modifying enzymes: those that add or remove specific post-translational modifications (for example, the Gcn5 acetyltransferase) and those that use the energy of ATP hydrolysis to alter the structure of nucleosomes and specific chromatin regions (for example, SWI/SNF). Chromatin-modifying enzymes are large multicomponent assemblies that have elongated, flexible shapes (Fig. 1a). Collectively, chromatin and chromatin-modifying enzymes operate at the epigenetic level and coordinate global accessibility of promoter DNA.
Once the chromatin environment becomes accessible, regulation of the promoter involves the action of regulatory transcription factors, co-activators/co-repressors and the basal transcription machinery. Upstream DNA sequences are targeted by specific DNA binding proteins, called transcription factors (Fig. 1b), which are responsible for controlling (activation/inhibition) specific gene expression. Transcription factors have a modular architecture with a DNA binding domain and a transactivator domain (TAD). TADs communicate with other regulatory transcriptional proteins and have an important role in orchestrating the transcriptional assemblies. Co-activators provide a link between chromatin, transcription factors and the basal transcription machinery (Fig. 1c). Co-activators generally have multiple transcription factor binding sites and are thus able to process multiple transcriptional regulatory inputs. The co-activator p300 is known to interact with over 50 proteins and has histone acetyltransferase activity. Co-activators can also adopt a variety of structural forms needed for different stages of transcription. For example, the Mediator co-activator complex undergoes a large conformational transition from a closed form to an open form to be able to accommodate RNAP II. The actual recruitment of RNAP II to promoter DNA is accomplished by the general transcription factors (GTFs): TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH (Fig. 1d). The TATA DNA element at the promoter is recognized by the TATA box binding protein (TBP), which significantly bends DNA. This TBP-DNA association provides a platform for the assembly of associated factors. The assembly of RNAP II, GTFs and co-activators is called the pre-initiation complex (PIC). 
The C-terminal domain (CTD) of RNAP II is heavily phosphorylated during transcription (for example, by TFIIH or by the kinase subunit of the Mediator), and its phosphorylation state has a critical role in pre-mRNA processing and termination of transcription.
Early observations about two decades ago indicated that various components of the transcription machinery, such as the transactivator domains (TADs) of the GCN4 and GAL4 transcription factors and the C-terminal domain (CTD) of RNA polymerase II (RNAP II), cannot be characterized by a well-defined three-dimensional structure, and that their functions may even be independent of their actual sequences1. These domains were envisaged to act as charged ‘blobs’ or ‘noodles’ that would allow recognition of interacting partners of variable architectures and form nontraditional assemblages. Although this scenario has been refined subsequently2, it has been recognized only recently that structurally ill-defined proteins and protein segments, termed intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs), are abundant in eukaryotic proteomes. For example, >50% of proteins are predicted to contain at least one IDR that is 30 amino acids in length3. The fraction of proteins with long IDRs increases with the complexity of the organism, which suggests an evolutionary benefit related to protein disorder.
In the past decade, rapidly accumulating experimental evidence has pointed to the frequent occurrence of disordered or unstructured regions in proteins involved in every aspect of transcription, thus providing flexible pieces of the transcriptional puzzle4. This review discusses recent advances in the IDP field related to components of the transcription machinery and demonstrates how structural properties and recognition mechanisms of IDPs are ideally suited to implement complex communication pathways among multiprotein regulatory assemblies. A wide range of examples that highlight how IDRs can uniquely enable a combination of specific yet versatile molecular interactions are presented.
A wealth of experimental and theoretical data demonstrate that many proteins or protein segments do not adopt a unique equilibrium structure under native conditions but exist in a rapidly fluctuating ensemble of conformations5 that may persist even in the bound form6. These results conflict with the generally accepted idea that a well-defined structure is a prerequisite for protein function and have led to a reassessment of the structure-function paradigm7. Currently, there are over 500 IDP examples assembled in the DisProt database8 whose disordered state is experimentally supported by biophysical data. The relevance of structural disorder in vivo has been corroborated experimentally—for example, by in-cell NMR measurements, which indicate the persistence of a disordered state in crowding conditions9,10.
IDPs comprise a variety of broad structural categories11. These range from completely unstructured proteins (“random coils”) that resemble the denatured states of globular proteins (for example, the p160 steroid receptor coactivator ACTR; ref. 12) to partially structured (“pre-molten globules”) or more compact ensembles (“molten globules”)13 that may have some secondary structure (for example, p27Kip1 (ref. 14) or MeCP2 (ref. 15)). Disorder can also be present in locally disordered N- or C-terminal tails or internal linkers11.
The disordered state of IDPs is intimately linked to their unusual amino acid composition: they are enriched in polar and charged residues (lysine, arginine, glutamate, glutamine and serine) and depleted in hydrophobic residues (tryptophan, phenylalanine, tyrosine, leucine, valine, isoleucine and methionine) that form the hydrophobic core in a conventional folded protein16. IDPs are often associated with low-complexity regions and include sequence repeats17. IDRs often have distinctive, specific amino acid compositions for given functions18.
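The compositional bias described above can be illustrated with a short calculation. The following is a minimal sketch, assuming that the two residue lists quoted in the paragraph are used directly as "enriched" and "depleted" sets; the function name `composition_bias` and the simple difference score are hypothetical choices made for illustration and do not constitute a validated disorder predictor.

```python
# Residue sets copied from the lists in the text (one-letter amino acid codes).
# Assumption: treating them as two fixed sets is an illustrative simplification.
ENRICHED = set("KREQS")    # lysine, arginine, glutamate, glutamine, serine
DEPLETED = set("WFYLVIM")  # tryptophan, phenylalanine, tyrosine, leucine,
                           # valine, isoleucine, methionine


def composition_bias(sequence: str) -> dict:
    """Return the fraction of enriched and depleted residues and their difference."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    enriched = sum(1 for aa in seq if aa in ENRICHED) / n
    depleted = sum(1 for aa in seq if aa in DEPLETED) / n
    # A positive bias means the segment is richer in the polar/charged set
    # than in the hydrophobic set.
    return {"enriched": enriched, "depleted": depleted, "bias": enriched - depleted}


if __name__ == "__main__":
    # Hypothetical toy segments, not real protein sequences.
    print(composition_bias("KKEESSQQRR"))
    print(composition_bias("LLVVIIFFWW"))
```

A score of this kind only summarizes composition; it ignores sequence order, which the later sections show can matter for specific functions.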
The conformational malleability of IDRs allows a diversity of binding modes to other proteins, RNA, DNA or small ligands, resulting in combinatorial molecular recognition and functional versatility19. IDR-mediated interactions include simple association, chemical modifications, interference with enzymatic activities, structural rearrangements, assistance of folding or even transitions to amyloid forms20. IDPs can be classified into seven general functional categories in terms of their mechanism of action21. Within these categories, function stems directly from disorder in only one (entropic chain; for example, nucleoporin FG repeat22), whereas function is realized via binding to partner molecules in the other six (display sites23, scavengers21, effectors14, assemblers24, chaperones25 and prions26). Structural disorder is prevalent in proteins involved in signal transduction and regulation of gene expression27, which likely reflects the significance of IDRs in mediating interactions within and between complex molecular assemblies.
Binding of IDPs in general is characterized by high specificity for given (even multiple) partners as compared to nonspecific targets, and by high association and dissociation rates (k_on and k_off) that enable rapid association with the partner without excessive binding strength. The moderate binding strength results from the large entropic penalty accompanying binding, due to the partial or full folding of the IDR (ref. 28). Currently, key molecular aspects of IDP recognition are captured by four related interaction models, as follows.
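The link between fast kinetics, moderate affinity and the entropic cost of folding can be made explicit with standard thermodynamic relations. These are textbook expressions rather than results taken from the cited work; the standard-state concentration c° (conventionally 1 M) is introduced here only to keep the logarithm dimensionless.

```latex
K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}},
\qquad
\Delta G^{\circ}_{\mathrm{bind}}
  = RT \ln\!\left(\frac{K_d}{c^{\circ}}\right)
  = \Delta H^{\circ} - T\,\Delta S^{\circ}
```

When binding is coupled to folding of the IDR, the conformational entropy change is unfavorable (ΔS° more negative), so the term −TΔS° is positive and partly offsets a favorable ΔH°. The resulting K_d is larger (weaker binding) than the enthalpy alone would suggest, which is compatible with both k_on and k_off being high.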
Induced folding of IDPs can be facilitated by stabilizing transient structural elements (residual structures), toward which the conformational equilibrium is shifted upon binding29. Experimental data in more than a dozen cases (for example, p27Kip1 (ref. 14) and p53 (ref. 30)) support a model in which such transiently populated preformed elements (i) of IDPs are critical for recognition. A similar principle underlies the concept of molecular recognition features (ii)31, where protein segments biased for a given conformation are distinguished in recognition and undergo an induced folding process. Even in the absence of transient secondary structures, the accessibility of different segments of the IDP chain can vary along its sequence32. Transient exposure of specific sites can facilitate recognition by serving as primary contact sites (iii) for a partner. As IDPs are predominantly constituted by polar and charged residues, hydrophobic amino acids can be used for this purpose. Short linear motifs (iv) distinguished in specific protein-protein interactions33 were found to be preferentially embedded in disordered environments34. Linear motifs are frequently associated with signaling processes, as favored target sequences of SH3 domains, 14-3-3 proteins and calmodulin, as well as sites for post-translational modification, such as phosphorylation35, acetylation or ubiquitination36, are found on these motifs.
All four concepts emphasize the segmental nature of IDPs, which has three direct consequences for recognition. First, it increases the association rates by anchoring only a small, exposed binding region, with further contacts facilitated by flexible linkers via a “fly-casting mechanism”37. Second, as recognition is primarily determined by short segments, part or even the majority of the protein may remain disordered in the bound form, resulting in structurally ambiguous, ‘fuzzy’ complexes6. Third, recognition by short motifs can be achieved by very few specificity-determining residues, which leaves the rest of the motif, and the entire flanking region, rather free to change in sequence34.
These features of IDPs and IDRs allow them to perform unorthodox functions that require extreme adaptability (Fig. 2). For example, they can accommodate more than one partner (promiscuity) (Fig. 2a), like the C-terminal region of p53, which adopts different conformations upon interacting with four different partners (S100ββ, sirtuin, CBP and cyclin A; Fig. 2b)38. When bound in different conformations, IDPs can serve more than one (sometimes opposing) function (moonlighting)39. IDPs can also be involved in multiple activities even at the same time (one-to-many signaling), as exemplified by the high mobility group (HMGA) proteins that serve as hubs of nuclear function and sensors of a wide variety of signaling pathways40. Action of IDPs can also be fine-tuned by gradual modifications (ultra-sensitivity), for example, phosphorylation at multiple sites that individually are suboptimal for the partner41.
Bioinformatics analysis predicts that most proteins involved in transcription regulation contain at least one long (>30 amino acids) intrinsically disordered segment (with many predicted to be far more disordered), and biophysical evidence to substantiate a number of specific instances is available. A summary of well-characterized examples of IDPs involved in transcriptional regulation is presented in Table 1.
In general, IDRs and IDPs can contribute to the functioning of the transcription machinery (Fig. 1) in three ways. First, IDRs can serve as linkers connecting structured globular domains and thus extend modularity of function. Fluctuation of IDRs between widely differing conformational states imparts structural variability on the connected recognition elements or folded segments, thereby enabling recognition of partners with variable structure, such as recognition of promoters with variable organization by transcription factors42. Second, molecular associations involving IDRs result in pliable complexes that can undergo conformational changes in response to regulatory signals. For example, the Mediator co-activator complex transforms from a closed to an open form to accommodate RNAP II (ref. 43). Third, IDRs can also facilitate assembly or disassembly of large complexes coupled to functional transitions—for example, interaction of the RNAP II CTD with different partners during a transcription cycle44. In the following sections, we will describe how these principles are exploited in the action of four major components of the transcription machinery (Box 1): proteins in chromatin remodeling, transcription factors, co-activators and the basal machinery (Fig. 1).
Genomic DNA in eukaryotes is complexed with core and linker histones to form nucleosomes and chromatin fibers45, with most genes also assembled into higher order chromatin structures (Fig. 1a). The higher order condensation and conformational dynamics of chromatin fibers are mediated by the disordered core histone N-terminal ‘tails’. These IDRs are not visible in the crystal structure of the nucleosome46 and have also been shown to be disordered in solution47. In addition to chromatin condensation, a primary role of the disordered N-terminal tails is to specifically interact with nonhistone proteins and multiprotein complexes involved in gene activation or repression48. For example, the N-terminal domain (NTD) of the core histone H4 functions as an assembly platform for different chromatin remodeling complexes, such as ISWI (ref. 49) and NURF (ref. 50). The H4 NTD also helps target ISWI to specific loci51. The many interactions involving the N-terminal tails are likely of transient nature and require rapid association and dissociation with their partners. Although both core and linker histone tails are predominantly (~40%) composed of positively charged residues, their functions cannot be recapitulated by simply neutralizing DNA charges at high ionic strength45. These observations might imply that local organization of IDRs can be a functional determinant for histone proteins that can be perturbed by post-translational modifications (for example, methylation and acetylation)52. These covalent modifications have been proposed to function as a “histone code” that influences and regulates specific macromolecular interactions and cell functions48,53. Although in most cases the structural consequences of these covalent modifications have not been elucidated, they might alter secondary structure preferences of IDRs, which could abolish nucleosome-nucleosome interactions or modulate interactions with chromatin remodeling complexes49 and other chromatin proteins.
The intrinsically disordered C-terminal tail of linker histones undergoes a disorder-to-helix transition upon binding to DNA (ref. 54) and contributes to stabilization of condensed chromatin structures. The linker histone H1° CTD also binds to and activates the DFF40 (also known as CAD) apoptotic nuclease. Interestingly, different CTD regions are used for chromatin condensation and nuclease binding. These H1° CTD regions have distinct, highly conserved amino acid compositions18. Accordingly, scrambling the sequence of these segments does not alter CTD-dependent stabilization of higher order chromatin structures (X. Lu and J.C. Hansen, Colorado State University, personal communication). Although possible changes in the interaction and activation mechanism have not been elucidated, the IDR had to be sufficiently long (at least 47 residues). These results emphasize that there is no specific sequence requirement for the IDRs; only some spatial and electrostatic conditions have to be fulfilled.
IDPs also seem to have important roles in the function of ATP-dependent chromatin remodeling complexes, such as ISWI, CHRAC, NURF, RSC and SWI/SNF, which regulate accessibility of the genomic DNA (Fig. 1a). All these complexes facilitate sliding and transfer of histone octamers along the DNA and expose targeted DNA segments to nucleases and other probes. Gel mobility analysis suggests that most of these remodeling complexes exhibit significant deviations from globularity49,55–58, as they migrate at higher molecular weight than that expected from the mass of the complex. IDRs in chromatin remodeling complexes likely mediate low-affinity interactions with DNA and establish variable contact patterns required for sliding. Thus disordered regions enhance mobility along DNA (ref. 55), and their removal significantly impairs the sliding process50,55,56. Structural characterization by electron microscopy of the yeast SWI/SNF remodeler revealed a structure composed of eight subunits assembled into a modular and highly irregular structure57 (Fig. 1). Gel mobility analysis and disorder predictions also suggest that the Snf5 and Swi3 subunits of the SWI/SNF remodeling complex are rich in IDRs. Removal of the N-terminal domain of Swi3 (predicted to be poorly ordered) results in the remainder of the protein migrating at its actual molecular weight (C. Peterson, University of Massachusetts Medical School, personal communication). Swi3 serves as an assembly scaffold and is involved in histone binding using IDRs to interact with multiple partners. Snf5 has a role in recruitment of SWI/SNF to specific genomic regions—another process with coordinated changes in macromolecular interactions. We hypothesize that IDPs allow malleability in the structure of chromatin remodeling and modifying complexes, which facilitates their interaction with the equally structurally malleable chromatin fiber.
Modification of chromatin structure may also be facilitated by increased DNA distortion induced by “architectural transcription factors”40. For example, the disordered high-mobility group (HMG) proteins involved in chromatin structure regulation contain multiple copies of short motifs (called AT hooks) that tend to bind to the minor groove and significantly bend the DNA (ref. 59). The AT hooks are connected by flexible linkers that result in diverse modular binding patterns that neutralize charges and induce specific conformational changes of DNA. Besides DNA, HMG proteins can also interact with nucleosome particles and a large number of transcription factors, inducing formation of assemblies (enhanceosomes) at promoter or enhancer regions of inducible genes40. These interactions also facilitate contacts among the regulatory proteins and cross-associations with DNA that may result in complex regulatory pathways. As observed for the histone tails, HMG function can be regulated by modulating binding affinity to both DNA and nucleosomes via post-translational modifications such as phosphorylation and methylation, which are facilitated by structural disorder35.
A range of recent observations underscores that a variety of largely disordered proteins are involved in regulation of higher level chromatin structural organization (Table 1), the multifunctionality of which is enabled by structural disorder.
Transcription factors serve as sensors of specific DNA sequences that code for activation or repression of transcription. This is facilitated by a modular architecture in which DNA binding domains (DBDs) contact enhancer or suppressor sequences and transactivator domains (TADs) interact with other proteins that influence recruitment and assembly of the transcription machinery (Fig. 1b). Peculiar structural properties of TADs and their resistance to crystallization attempts have long been noted. Bioinformatics analysis reveals that ~90% of transcription factors contain extended IDRs60, out of which a few specific examples are listed in Table 1. Perhaps as a result of disobeying strict geometrical complementarity, TADs might also exhibit weak sequence specificity. The classical example is GCN4, in which the acidic TAD can be replaced with short random acidic segments without significant loss of activity61. It is not known, however, whether these scrambled sequences act by the same mechanism and whether their interactions with other co-activators remain unperturbed. For example, the disordered C-terminal region of the yeast activator Gal4 has dual functions: it elicits gene activation and is also capable of binding to Gal80. Mutations of cysteine or proline residues abolish interactions with Gal80 without affecting transcriptional activity, which suggests that different functions have different sequence and structural requirements2. It has also been shown that short hydrophobic motifs that might provide an underlying “organization” appear to be critical for TAD activity62,63. For example, the repeat units of the transactivator domain of the oncogenic EFP proteins were found to function even when randomized, reversed or interchanged, but the presence of a given arrangement of tyrosine residues64 was found to be an essential requirement. 
These hydrophobic residues may serve as primary contact sites, for example, as seen in the folding-coupled binding of the KID domain of cAMP response element–binding (CREB) protein to the KIX domain of CREB-binding protein (CBP) induced by phosphorylation of KID (ref. 65). Hydrophobic patches can also act like molecular recognition features that trigger the conversion from the initial, low-affinity complex to the high-affinity complex66. Preformed elements of TADs can also facilitate their interactions, as in the case of the α-helical element in binding of p53 to Mdm2 (ref. 30).
Post-translational modification sites are frequently located in disordered regions, perhaps enabling them to function as molecular switches within TADs. A prime example is ubiquitination, with its far-reaching functional consequences. Ubiquitination of a TAD not only induces degradation of the transcription factor but also signals for activation (a process called “licensing”)23. Thus, disorder of TADs may be involved in a very specific regulatory feature in which ubiquitination induces activation through destruction of the targeted protein.
In contrast with TADs, DBDs are mostly structured. However, certain transcription factors, such as GCN4 or GAGA, undergo an induced folding process upon binding to DNA, with recognition facilitated by preformed secondary structure elements67. Such disorder-to-order transitions have been shown to enhance interaction specificity68.
Highly complex, modular proteins and multiprotein complexes are responsible for communicating signals from enhancer- and repressor-bound factors to the core transcription machinery (Fig. 1c). In eukaryotic cells, co-activators are involved in multiple functions, ranging from modifying chromatin structure to interacting with a variety of regulatory proteins, general transcription factors and RNAP II. IDPs appear to perform many critical functions that enable co-activators to transduce regulatory information.
The co-activators CBP/p300 affect chromatin structure through their intrinsic histone acetyltransferase activity, and can also serve as a scaffold for the assembly of much of the transcription machinery24. Approximately half of the 2,442 residues of CBP are found in disordered regions, including the NCBD domain and linkers between six folded domains11. The six globular domains serve as templates for induced folding of many disordered transcription factors—for example, TAZ1 for HIF1-α (ref. 69) and CITED2 (ref. 70), Bromo domain for p53 CTD (ref. 71), TAZ2 for E1A (ref. 11) and KIX for the pKID domain of CREB (ref. 72) (Fig. 3). The disordered NCBD domain undergoes a synergistic folding upon binding to the ACTR domain of the p160 co-activator that is facilitated by the presence of transient helical structures in NCBD (ref. 12). These interactions contribute to the recruitment of RNAP II and the basal transcription machinery. In addition to enabling conformational flexibility, disordered linker regions harbor various linear motifs that serve as potential binding sites for regulatory proteins (for example, KHKXLXXLL for nuclear receptors73) or that serve as post-translational modification sites (for example, SUMOylation sites74). IDRs have a critical role in determining the properties of CBP and p300, which have been described as a “molecular interpreter” of different combinations of transcriptional signals owing to their highly adaptable architectures.
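Linear motifs such as the KHKXLXXLL pattern mentioned above are short enough to be located by simple pattern matching. The following is a minimal sketch, assuming that X denotes any residue and that every other letter must match exactly; the helper names `motif_to_regex` and `find_motif` and the example sequences are hypothetical and serve only as illustration.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Translate a motif such as 'LXXLL' into a compiled regular expression.

    Assumption: 'X' is a wildcard for any residue; all other letters are literal.
    """
    body = "".join("." if ch == "X" else re.escape(ch) for ch in motif.upper())
    # The lookahead lets overlapping occurrences be reported separately.
    return re.compile(f"(?=({body}))")


def find_motif(sequence: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each occurrence."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from CBP or p300.
    print(find_motif("AAKHKILHRLLGG", "KHKXLXXLL"))  # [(3, 'KHKILHRLL')]
    print(find_motif("LAALLALL", "LXXLL"))           # [(1, 'LAALL'), (4, 'LLALL')]
```

Because only a few positions are fixed, matches of this kind occur frequently by chance; a hit in a sequence is a candidate site, not evidence of a functional interaction.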
Coordination between chromatin remodeling and the next step in the gene expression process, assembly of the pre-initiation complex (PIC), might occur through the interplay between CBP/p300 and Mediator75, a multisubunit assembly comprising about 25 different proteins that plays an essential role in regulation from yeast to humans76. The two co-activators act synergistically, but p300 competes with the general transcription factor TFIID for binding to Mediator at the promoter. Autoacetylation of p300 in a disordered loop region provides a catalytic switch77 by initiating a conformational change that results in dissociation of p300 from the promoter. This process allows the association of TFIID with Mediator, triggering assembly of the PIC. Biochemical78 and structural79 analyses have revealed that Mediator has a modular architecture, with three structural domains that in broad terms appear to have specific functional roles. Significant conformational changes are prompted by interaction of the human Mediator complex with activators and the RNAP II CTD (refs. 80,81). In yeast, interaction of Mediator with RNAP II to form a regulation-responsive holoenzyme requires a substantial structural rearrangement of Mediator that exposes a cryptic RNAP II binding site43.
It has been suggested that the rearrangement of Mediator required for holoenzyme formation might constitute a general and essential aspect of the regulatory mechanism82, and may also be required in other organisms81,83. It seems likely that the complex conformational behavior exhibited by the Mediator complex might be facilitated by disordered regions acting as “molecular hinges” in response to various regulatory signals. Several Mediator subunits were predicted to contain a significant level of disorder, and the identified IDRs were found to be conserved, which supports this molecular hinge hypothesis84. Although structural data of the whole Mediator is limited to low-resolution reconstructed electron microscopy images, the presence of disordered regions was experimentally corroborated in the case of the Med8-Med18-Med20 submodule, which contains multiple binding sites for the TATA box binding protein (TBP) complex85. In the crystal structure, only a short α-helical region of Med8 has been observed86, whereas the linker between the C- and N-terminal regions of Med8 exhibits enhanced sensitivity to proteolytic digestion in the free protein. Thus it appears that the short structured segment of Med8 serves as a molecular recognition feature to promote recognition of a larger disordered region. The activities of different Mediator subunits are also substantially influenced by post-translational modifications87 facilitated by intrinsic disorder.
In eukaryotes, the series of molecular events that leads to gene expression culminates in initiation of transcription by RNA polymerase and the general transcription factors (GTFs) (Fig. 1d). RNA polymerases are for the most part well-structured enzymes, although flexibility of specific domains is of functional significance. In the eukaryotic RNAP II, the CTD of the largest subunit is highly disordered and contains 25–52 tandem repeats of the sequence YSPTSPS (ref. 88). The CTD serves as a scaffold for the highly orchestrated assembly of a range of complexes involved in the initiation, elongation and termination of transcription, thereby linking these steps directly to mRNA maturation89. Owing to its malleability, the CTD is capable of interacting with multiple partners and adopting different conformations upon binding90. Its function is tightly regulated by phosphorylation, and the concerted action of a variety of CTD kinases and phosphatases signals its transitions from states competent for PIC assembly to initiation, elongation and then termination of transcription89. Phosphorylation of the CTD results in significant changes in its structure and charge pattern, but CTD activity depends only weakly on its actual amino acid sequence.
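The repetitive organization of the CTD lends itself to a simple count of consecutive heptads. The following is a minimal sketch, assuming exact matches to the YSPTSPS consensus quoted above; the function name `longest_tandem_run` and the example sequence are hypothetical. Because this sketch counts only exact copies, any repeat that deviates from the consensus would interrupt a run and lower the count.

```python
import re

HEPTAD = "YSPTSPS"  # consensus repeat quoted in the text


def longest_tandem_run(sequence: str, unit: str = HEPTAD) -> int:
    """Return the largest number of back-to-back exact copies of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Each match is an uninterrupted run; its length divided by the unit
    # length gives the number of copies in that run.
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


if __name__ == "__main__":
    # Hypothetical toy sequence: three consecutive heptads, a spacer, one heptad.
    toy = "AAA" + HEPTAD * 3 + "GG" + HEPTAD
    print(longest_tandem_run(toy))  # 3
```

Applied to a real CTD sequence, a count of this kind could be compared with the 25–52 repeat range given in the text, bearing in mind the exact-match limitation noted above.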
Disordered regions also influence the functions of the GTFs in the assembly of the pre-initiation complex and in interactions with promoters and co-activators. In TFIIB (Fig. 2), a globular CTD that contacts the TBP at the TATA box is connected to the N-terminal RNAP II–interacting region by a linker region that is disordered in solution. Part of this linker folds into a “B finger” upon interacting with RNAP II, reaching into the active site of the enzyme and playing a crucial role in determining the transcription start site91. The conformational malleability of the B finger segment is likely required to facilitate entrance of the domain into the active site cleft of polymerase. The rest of the linker remains disordered even in the presence of RNAP II and might allow for different modes of interaction between the C-terminal portion of TFIIB and polymerase92. Cryo-electron microscopy analysis of a yeast RNAP II–TFIIF complex has revealed that TFIIF exhibits scattered density around the RNAP II active site cleft, with a considerable fraction of the factor appearing disordered93. This suggests a structural organization of TFIIF that, in agreement with predictions based on sequence analysis, consists of globular domains connected by disordered segments. The central region of Tfg1, the largest yeast TFIIF subunit, is highly charged and hypersensitive to proteolysis, which also suggests a poorly ordered structure94. Interestingly, this part of TFIIF is expected to be largely exposed in the RNAP II–TFIIF complex, and TFIIF has been shown to play a critical role in promoting association of Mediator with polymerase and the PIC (ref. 85). Although structural characterization of the basal transcription machinery is still at a very early stage, the information available so far suggests that flexibility and disorder are prevalent and likely play a crucial role in enabling RNA polymerase to interact with promoters that display a wide range of sequence and structural variability. 
It is tempting to suggest that solutions that have evolved to address promoter heterogeneity are likely to also be involved in regulation.
The early idea that transcription regulation requires malleable architectures rather than rigid assemblages is bolstered by a growing body of experimental evidence. Assembly of the transcription machinery involves a complex set of highly specific, transient interactions with multiple partners that often have variable architecture. In principle, the seemingly contradictory requirements for specificity and structural adaptability in these interactions can only be satisfied if short recognition sites are embedded in a malleable environment capable of accommodating different structures. IDRs that function through short recognition elements or act as linkers connecting globular domains fit naturally in this context. As shown for co-activators, IDR linkers can provide enormous structural flexibility and modularity to the complex. As IDRs can interact with many partners simultaneously, they can help in the assembly and disassembly of large complexes in response to the needs of the cell (for example, Mediator). Within this framework, a network of highly specific yet transient contacts between disordered regions permits large-scale conformational rearrangements of the transcription machinery, and thereby fast information flow and rapid responses to diverse regulatory signals. All these events take place against a background of chromatin structural transitions that are mediated by the intrinsically disordered histone tail domains and regulated by structurally complex chromatin-modifying enzymes. At least one likely reason for the abundance of intrinsic disorder in transcriptional proteins is the structural and functional flexibility needed to precisely coordinate so many transient, high-specificity interactions.
With regard to possible evolutionary consequences, this model can reconcile the opposing demands of conserving regulatory mechanisms and adapting easily to environmental changes. Large disordered regions that host short functional sites are more tolerant to mutations than globular proteins, thereby exhibiting ‘robustness’ in both structural and functional senses (refs. 18,61,64). Indeed, the rapid evolution of IDPs has been demonstrated in a number of cases (refs. 17,95). On the other hand, short motifs embedded in disordered regions can provide specific interaction sites with other proteins or serve as post-translational modification sites, yet they are easily turned on and off by evolution (ref. 33), which lends the capacity to rapidly create or abolish specific functionality. In summary, the benefits of the structural flexibility, adaptability and evolvability provided by protein disorder are apparent at all levels of gene transcription regulation.
In this review we have shown that IDPs and IDRs are involved in all stages of the transcription process and appear to be critically important in regulatory functions. We are just beginning, however, to understand the molecular mechanisms that underlie the many functions of IDPs and their contribution to transcription regulation. The current limitations are twofold. First, high-resolution structural imaging of large, flexible and transient assemblies is a major bottleneck in understanding the transcriptional machinery. Second, the paucity of structural data on IDPs and IDRs in vivo curbs extrapolations from in vitro observations to conditions within the cell. Although these limitations restrain our understanding of the role of disorder in transcription regulation, recognizing the importance of disorder suggests new approaches to studying transcription regulation that include the detailed structural and functional description of the IDRs.
Currently there are two major areas where characterization of the structural ensemble of IDRs and IDPs could significantly improve our understanding of transcription. One is the mechanism of action of large assemblages, such as chromatin remodeling machines or co-activator complexes (for example, Mediator). Anomalous gel electrophoretic mobility or cryo-electron microscopy could be used to detect the presence of disordered regions that can trigger conformational changes. Biochemical experiments, such as limited proteolysis at very low protease concentrations, can identify transiently exposed segments of the IDRs. These regions usually serve as interaction sites that contribute to self-organization of the complexes or mediate interactions with other partners. In vivo or in vitro functional assays can validate the functional roles of these regions. Bioinformatics methods are extremely useful as a complement to experimental data, both to pinpoint IDRs and locate possible functional sites (for example, for post-translational modifications) and to initiate targeted studies. Further development of these tools may allow estimation of the degree of folding that is induced upon binding to a partner, or of a possible architecture of a large assembly.
The other area is the mechanisms of nonconventional interactions, such as those that exhibit weak dependence on the actual sequence of one of the participating macromolecules or regions. A pertinent example of such a binding event is that of histone tails to DNA or regulatory proteins. Analyzing changes in disorder upon sequence modifications by NMR or CD (and also by bioinformatics methods) may help to identify those properties that are required for the interactions. Furthermore, these methods can be useful to understand the effect of epigenetic factors, such as post-translational modifications of histone tails. Recognition of disorder (and low-complexity regions) may also provide possible connections between elements of the transcription machineries of different organisms, which could in turn help to illuminate their function.
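As an illustration of the kind of sequence-based screening referred to above, the following is a minimal sketch of a charge-hydropathy scan, one of the simplest ways to flag candidate IDRs from sequence alone. It is not the method used in the studies cited here. The function names, the window and step sizes and the toy sequence are hypothetical, and the boundary line ⟨R⟩ = 2.785⟨H⟩ − 1.151 is the commonly quoted charge-hydropathy boundary, which should be checked against the original literature before any real use.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed charge-hydropathy boundary: <R> = 2.785 * <H> - 1.151.
BOUNDARY_SLOPE = 2.785
BOUNDARY_INTERCEPT = -1.151


def mean_hydropathy(segment):
    """Mean Kyte-Doolittle hydropathy, rescaled to the 0..1 range."""
    # Non-standard residue codes raise KeyError; filter them beforehand.
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in segment) / len(segment)


def mean_net_charge(segment):
    """Absolute mean net charge, counting K/R as +1 and D/E as -1."""
    positive = sum(segment.count(aa) for aa in "KR")
    negative = sum(segment.count(aa) for aa in "DE")
    return abs(positive - negative) / len(segment)


def looks_disordered(segment):
    """True if the segment lies on the high-charge, low-hydropathy side."""
    boundary = BOUNDARY_SLOPE * mean_hydropathy(segment) + BOUNDARY_INTERCEPT
    return mean_net_charge(segment) > boundary


def scan_windows(sequence, window=30, step=10):
    """Return (start, end, flag) for each window along the sequence."""
    calls = []
    for start in range(0, len(sequence) - window + 1, step):
        segment = sequence[start:start + window]
        calls.append((start, start + window, looks_disordered(segment)))
    return calls


if __name__ == "__main__":
    # Toy sequence (hypothetical): a hydrophobic stretch followed by a
    # highly charged, low-hydropathy stretch.
    toy = "LIVLAVFLIVALLVIAFLVLIAVLLIVAFL" + "KKSEKRKEGSPKKRSDEKKPSGKRKSEEKK"
    for start, end, flag in scan_windows(toy, window=30, step=30):
        print(start, end, "candidate IDR" if flag else "likely ordered")
```

A scan of this kind only nominates regions for follow-up. Highly charged, protease-sensitive segments such as the central region of Tfg1 are the sort of stretch it would be expected to flag, but any such call would still need the biochemical and structural validation described above.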
Although much remains to be learned about the function and behavior of IDPs in transcription regulation, recognition of their prevalence and importance in controlling gene expression offers a perspective that opens up new avenues of research in this field.
M.F. is supported by MRTN-CT-2005-019566 of the European FP6 and by the Bolyai János fellowship; P.T. is supported by Országos Tudományos Kutatási Alapprogramok NK 71582, ETT 245/2006; and I.S. is supported by Országos Tudományos Kutatási Alapprogramok K72569. V.N.U. is supported by the grants R01 LM007688-01A1 (to A.K. Dunker and V.N.U.) and GM071714-01A2 (to A.K. Dunker and V.N.U.) from the US National Institutes of Health. We gratefully acknowledge the support of the Indiana University, Purdue University at Indianapolis Signature Centers Initiative. J.C.H. is supported by US National Institutes of Health grants GM45916 and GM66834. F.J.A. is supported by funds from the US National Institute of General Medical Sciences and the Leukemia and Lymphoma Society of America. The authors are grateful to C. Oldfield for his contribution to Figure 2.