In developing bilaterians, the Hox transcription factor family regulates batteries of downstream genes to diversify serially repeated units. Given that Hox homeodomains bind a wider array of DNA binding sites in vitro than are regulated by the full-length protein in vivo, regions outside the homeodomain must aid DNA site selection. Indeed, we find affinity for disparate DNA sequences varies less than 3-fold for the homeodomain isolated from the Drosophila Hox protein Ultrabithorax Ia (UbxHD), whereas for the full-length protein (UbxIa) affinity differs by more than 10-fold. The rank order of preferred DNA sequences also differs, further demonstrating distinct DNA binding preferences. The increased specificity of UbxIa can be partially attributed to the I1 region, which lies adjacent to the homeodomain and directly impacts binding energetics. Each of three segments within I1 – the Extradenticle-binding YPWM motif, the 6 amino acids immediately N-terminal to this motif, and the 8 amino acids abutting the YPWM C-terminus – uniquely contributes to DNA specificity. Combination of these regions synergistically modifies DNA binding to further enhance specificity. Intriguingly, the presence of the YPWM motif in UbxIa inhibits DNA binding only to Ubx•Extradenticle heterodimer binding sites, potentially functioning in vivo to prevent Ubx monomers from binding and misregulating heterodimer target genes. However, removal of the surrounding region allows the YPWM motif to also inhibit binding to Hox-only recognition sequences. Despite a modular domain design for Hox proteins, these results suggest that multiple Hox protein regions form a network of regulatory interactions that coordinate context- and gene-specific responses. Since most non-homeodomain regions are not conserved between Hox family members, these regulatory interactions have the potential to diversify binding by the highly homologous Hox homeodomains.
A central challenge in developmental biology is to link events in patterning and morphogenesis to the gene regulatory hierarchy that drives these processes. Because only a few proteins regulate tissue-specific gene transcription relative to the number of unique tissues that must be generated, each transcription factor active in development must uniquely function in multiple cellular contexts. Thus, this challenge is doubly complex, requiring identification of crucial genes and cis-acting regulatory sequences as well as elucidation of mechanisms that govern context-specific transcription factor function.
The Hox transcription factor family acts near the top of this regulatory hierarchy in all bilaterally symmetric animals. Each Hox protein specifies multiple structures – directing appendage and organ development as well as determining local attributes of tissues in all three germ layers.1–3 As transcription factors, the crucial contributions of Hox proteins are based on their ability to target different promoters in a tissue-specific manner.4–6 However, the DNA-binding specificity of Hox-DNA interactions is anticipated to be insufficient to distinguish cognate and non-cognate targets in even a single tissue.7–9 Consequently, we and others have postulated that regions of Hox proteins outside the homeodomain must influence DNA binding to generate the necessary level of sequence selection.7–11 Furthermore, modulation of non-homeodomain regulatory regions through post-translational modifications,12 protein interactions,2, 13–15 or alternative splicing16 would provide an opportunity for tissue-specific regulation of target gene selection.
Unfortunately, identification of non-homeodomain regions capable of modifying binding is a long-standing challenge, hampered both by the predominance of intrinsic disorder outside the homeodomain10 and the tendency of full-length Hox proteins to aggregate.17 As a result, DNA binding studies have frequently been limited to the use of deletion mutants or unpurified proteins produced in vitro, preventing quantitative analysis of function in the native protein.11, 18 We have recently devised protocols to overcome these barriers for the full-length Drosophila melanogaster Hox protein Ultrabithorax Ia (UbxIa),10, 17 the dominant alternative splicing isoform expressed in the Drosophila embryo.16 By measuring equilibrium DNA binding by purified UbxIa of high activity under conditions where the DNA concentration is more than an order of magnitude less than the dissociation constant19 and excess non-specific DNA sequences are absent, we can accurately assess the DNA binding affinity of wild-type and mutant UbxIa, independent of the influence of other factors or competition with non-specific DNA interactions. Using this approach, we have discovered that most of the UbxIa protein sequence can influence the affinity of the homeodomain for the optimal Ubx DNA binding site.10 Both prediction algorithms and native state proteolysis experiments demonstrated that most of the sequences that modulate DNA-binding affinity in UbxIa are intrinsically disordered.10
Herein, we demonstrate directly for the first time that regions outside the homeodomain of a Hox protein also impact DNA binding specificity – altering affinity in a DNA sequence-dependent manner. DNA binding by full-length UbxIa and the Ubx homeodomain (UbxHD) was monitored for a variety of DNA sequences. Sites examined included the UbxHD optimal binding site20 and representative Ubx genomic binding targets selected to vary a wide range of factors: binding site sequence, number and density, the presence of binding sites for the Hox-interacting protein Extradenticle (Exd), the transcriptional outcome of binding (activation versus repression), and the developmental stage and tissue in which the enhancer is active.18, 20–24 Although UbxHD binds tightly to all examined sequences, the affinity of full-length UbxIa varies by more than an order of magnitude with DNA sequence, indicating regions outside the homeodomain do improve specificity. We next sought to identify specificity determinants – regions that either impact binding only to a subset of DNA sequences or have opposite effects on binding different DNA sequences. We focused on the I1 affinity-modulating region of Ubx,10 which contains an 18 amino acid sequence required for repression of distalless by Ubx in vivo.25 We find removal of this sequence alters binding affinity in a DNA-sequence specific manner, demonstrating that the 18 amino acid region acts as a specificity determinant. We next divided this region into three sections: the N-terminal 6 amino acids, the central 4 amino acid Exd interaction motif (YPWM), and the intrinsically disordered C-terminal 8 amino acids, which contain a portion of the mI alternatively spliced microexon.10,14–16, 26–29 Mutation or deletion of each of these three sections has a differing impact on DNA specificity. Furthermore, combination of these regions produces new DNA binding specificity patterns. 
We propose the I1 region may act as a cellular “antenna”, integrating tissue-specific information provided by alternative splicing and protein interactions to direct UbxIa to the appropriate DNA binding sites.
Previously, we demonstrated that the full-length UbxIa splicing isoform (Fig. 1a) and its homeodomain have significantly different affinities and responses to pH when binding an optimal DNA sequence.10 Strikingly, most non-homeodomain regions of Ubx can modulate Ubx•DNA binding affinity (Fig. 1e), utilizing a variety of mechanisms.10 To determine whether non-homeodomain sequences also enhance Hox•DNA binding specificity, we compared the range of affinities observed for UbxIa and UbxHD binding to a variety of DNA sequences, including the UbxHD optimal binding site, termed 40AB,20 and five representative Ubx genomic binding targets: Dll, Dpp, UA, A1, and Sal (Table 1, Fig. 2).18, 20–24 DNA binding sequences were selected to maximize the variety of (i) base sequences adjacent to the 5´-TAAT-3´ consensus site or sites, (ii) binding site densities of the promoters regulated by Ubx in vivo, (iii) requirements for the Hox interacting protein Exd, (iv) transcriptional outcomes (activation versus repression), and (v) the cellular contexts in which Ubx binds the natural sequences, including the developmental stage, germ layer, and tissue. To facilitate analysis, small (< 90 bp) sections of these complex cis-acting regulatory regions were used for DNA binding experiments.
Equilibrium mobility shift analysis experiments (gel shifts) were used to assess DNA binding at protein concentrations near the dissociation constant, where binding is most sensitive to DNA sequence, and at DNA concentrations well below this range. Gel shifts allow separation and observation of complexes with different numbers of bound proteins. Equilibrium dissociation constants are reported as Kd for single DNA sites and as the apparent equilibrium dissociation constant (Kapp) for DNA sequences with multiple binding sites. Kapp reflects not only DNA affinity but also the availability of multiple binding sites and the additional energy provided by protein•protein interactions upon the creation of higher order complexes. However, binding by a single protein dominates at the point in the binding curve where the concentration of UbxIa is equal to Kapp, although at higher concentrations multiple binding events are observed (Fig. 3a, b and Supplementary Materials Fig. 1).
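The reason for keeping the DNA concentration well below the dissociation constant can be illustrated with a short calculation. The sketch below is not the analysis procedure used in this study; it assumes simple 1:1 binding to a single site, and the function names and DNA concentrations are hypothetical choices made only for illustration. The 72 pM value is the UbxHD•Dll dissociation constant quoted later in the text, used here purely as an example input.

```python
import math

def fraction_bound_simple(protein_total, kd):
    """Fraction of DNA bound, assuming free protein ~ total protein."""
    return protein_total / (protein_total + kd)

def fraction_bound_exact(protein_total, dna_total, kd):
    """Fraction of DNA bound from the exact 1:1 mass-action (quadratic) solution."""
    s = protein_total + dna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total

kd = 72.0  # pM; example input only (UbxHD binding Dll, as quoted in the text)
for dna_total in (kd / 100.0, kd):  # hypothetical DNA concentrations
    simple = fraction_bound_simple(kd, kd)            # protein set equal to Kd
    exact = fraction_bound_exact(kd, dna_total, kd)
    print(f"[DNA] = {dna_total:6.2f} pM: simple = {simple:.3f}, exact = {exact:.3f}")
```

With the protein concentration set equal to Kd, the simple expression gives half-saturation. Under these assumptions the exact solution agrees closely when the DNA is 100-fold below Kd, but falls noticeably short when the DNA concentration equals Kd, which is why the midpoint of a binding curve can be read as Kd only in the low-DNA regime.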
Because binding experiments utilizing protein in crude cell extracts are not quantitative, and even very short truncations can dramatically impact the DNA binding properties of Ubx,10 our experiments were based on purified, full-length proteins. For each protein preparation, the purity, solubility, and activity were determined to permit confident assessment of small changes in DNA binding affinity. Wild-type and variant Ubx proteins are all untagged to preclude the possibility of a tag altering the protein’s structure or function. The UbxIa splicing isoform was selected for these experiments because it is the most abundant isoform expressed during fly development.16, 30 UbxIa contains all possible amino acid sequences except the 9 amino acid “b element” microexon, located between the YPWM motif and the mI microexon (Fig. 1).28
We first compared the affinity of full-length UbxIa and its isolated homeodomain for the selected DNA binding targets (Fig. 3d–j, Table 2). UbxHD binds all target sequences with high affinity, varying only ~ 3-fold. Although this low level of sequence discrimination by UbxHD may contribute to target gene selection, it is generally believed insufficient to drive the level of reliable binding site discrimination required for in vivo function.7–9 Indeed, UbxHD binds equally well to its optimal site, 40AB, and the Dll DNA, even though Dll binding by Ubx in vivo requires interaction with the Exd protein.4,24,26,29 Full-length UbxIa affinity was lower than that of UbxHD for all DNAs investigated (Fig. 3d–j, Table 2). This lower affinity (higher Kd or Kapp) is not due to an inability of multiple full-length UbxIa proteins to bind multi-site DNA sequences, since higher order complexes were also observed for the full-length protein at higher concentrations (Fig. 3c). Therefore, a net inhibition of binding by non-homeodomain regions, previously observed for the UbxHD optimal DNA binding sequence,10 may be a general feature of Ubx•DNA interactions.
In contrast to UbxHD, UbxIa binds these DNAs with affinities spanning more than an order of magnitude, demonstrating regions outside the homeodomain alter affinity in a manner dependent on DNA sequence. In fact, the ratio of UbxIa affinity to UbxHD affinity (Protein sensitivityFL-HD, Table 2) varies ~10-fold with DNA sequence. Thus, non-homeodomain regions of UbxIa impact binding to some DNA sequences more than others. Significantly, the order of DNA sequences, when ranked by affinity, is also different for UbxHD and UbxIa. Therefore, the role of non-homeodomain sequences is more complex than to simply amplify pre-existing UbxHD sequence preferences.
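As a reading aid, the two comparisons in this paragraph can be sketched as a per-DNA ratio and a rank ordering. The code assumes that the protein sensitivity is the full-length dissociation constant divided by the homeodomain dissociation constant for the same DNA; that definition, the function names, the site labels, and every number below are placeholders for illustration and are not values from Table 2.

```python
def protein_sensitivity(kd_full_length, kd_homeodomain):
    """Per-DNA ratio Kd(full length) / Kd(homeodomain); >1 means net inhibition."""
    return {dna: kd_full_length[dna] / kd_homeodomain[dna] for dna in kd_full_length}

def rank_by_affinity(kd_by_dna):
    """DNA names ordered from tightest (lowest Kd) to weakest binding."""
    return sorted(kd_by_dna, key=kd_by_dna.get)

# Hypothetical dissociation constants in pM (NOT measured data).
kd_hd = {"siteA": 50.0, "siteB": 100.0, "siteC": 150.0}
kd_fl = {"siteA": 900.0, "siteB": 200.0, "siteC": 2400.0}

ratios = protein_sensitivity(kd_fl, kd_hd)
print(ratios)                       # ratio differs from site to site
print(rank_by_affinity(kd_hd))      # homeodomain preference order
print(rank_by_affinity(kd_fl))      # full-length preference order
```

In this made-up data set the ratio varies between sites and the two rank orders differ, which is the pattern the paragraph describes: non-homeodomain regions do more than scale the homeodomain's existing preferences.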
Our next step was to explore the non-homeodomain regions that modulate DNA binding specificity observed for full-length UbxIa compared to UbxHD. We have previously shown that the I1 region (Fig. 1e) reduces HD•DNA affinity ~2-fold.10 At the N-terminus of the I1 region is the YPWM motif, which forms a portion of the Exd interaction interface (Fig. 1b–e).15, 26–29 Much of the middle of the I1 region – corresponding to the alternatively spliced microexons16 – is predicted to be intrinsically disordered, consistent with its being hyper-susceptible to native state proteolysis.10 The C-terminus of the I1 region is also located adjacent to the N-terminal arm of UbxHD, which forms base-specific contacts with the minor groove.27,31–33 This location may allow the I1 region to impact DNA specificity, potentially in response to Exd interaction or alternative splicing. Indeed, deletion of 18 amino acids within I1, creating Ubx-18 (Fig. 1e), partially relieves the DNA binding inhibition caused by I1.10 Ubx-18 is also less able to regulate distalless transcription in vivo.25 For these reasons, our search for determinants of DNA binding specificity initially utilized the Ubx-18 mutant. Although no effect of the mutation was observed for binding 40AB, Dpp, or UA, Ubx-18 binds roughly 2- to 3-fold better to Dll, A1, and Sal DNA (Table 2, Fig 4a, UbxIa versus Ubx-18). Therefore, this 18 amino acid region impacts binding in a DNA sequence-specific manner. Indeed, the contribution of this region to specificity in the full-length protein is comparable to the level of sequence selection inherent to the homeodomain (~ 2.8 fold).
Internal deletions have the potential to induce structural rearrangements, which may, in turn, cause other protein regions to behave in a non-native manner. This possibility was particularly a concern for the Ubx-18 mutant, in which the type 1 reverse turn formed by the YPWM motif is removed.27 However, this region is bounded by an intrinsically disordered region on the C-terminal side.10 A second large disordered region, just beyond the predicted activation domain α-helix,34 is located N-terminal to the 18 amino acid deleted region.10 These disordered regions are likely to provide sufficient flexibility to mitigate any structural rearrangements due to mutation. Indeed, Ubx-18 is both soluble and active in vitro10 and in vivo.25 The circular dichroism spectra of UbxIa and Ubx-18 overlay (Fig. 4c), indicating that the 18 amino acid deletion has not caused any global structural rearrangements. Furthermore, affinity and activity are only marginally affected when binding the optimal DNA sequence, 40AB. Finally, both UbxIa and Ubx-18 cooperatively bind Dll DNA in conjunction with full-length Exd and Homothorax (Hth) (Fig. 5). Cooperative binding to this site is mediated by an alternate Exd interaction motif C-terminal to the homeodomain.29 Therefore, the C-terminus of UbxIa is also unaffected by the 18 amino acid deletion (Fig. 5). Together, these data demonstrate UbxIa structure is not greatly altered due to internal deletion in this region. However, local structural perturbations beyond the detection limits of these assays could still be caused by the deletion.
The mechanism by which the 18 amino acid region impacts DNA specificity may involve intramolecular protein interactions (e.g., with the homeodomain) as well as intermolecular Ubx•Ubx interactions on multi-site DNA sequences. To determine whether intermolecular interactions contribute to changes in DNA binding, we generated a new DNA oligonucleotide, Sal6, which contains only the middle of the 3 binding sites on the Sal DNA (Fig. 2, Table 1). UbxIa binds Sal and Sal6 with similar affinity, suggesting that single site binding dominates the UbxIa interaction with Sal (Table 3). However, Ubx-18 binds Sal6 with lower affinity than Sal. Consequently, Ubx-18 may prefer to interact with a different site on the Sal DNA, or Ubx•Ubx interactions may have a larger impact on Ubx-18 binding to Sal. In support of this second hypothesis, Ubx-18 binds Sal with higher affinity than UbxIa does.
The amino acids deleted in Ubx-18 include a partially conserved region at the N-terminus of the deletion, the highly conserved YPWM Exd-interaction motif, and the less conserved 8 amino acids C-terminal to the YPWM motif, which include a portion of the mI microexon (Fig. 1d). To parse the effects of these three regions, each was abrogated in turn by mutation. The role of the hexapeptide motif was initially assessed by creating point mutants within the YPWM sequence (Fig. 1e). In the first mutant, the YPWM sequence was changed to YAAA, a version whose effects in vivo have previously been assessed.18 Unfortunately, YAAA aggregates when overexpressed in E. coli and could not be purified. These mutations are predicted to replace a reverse turn with an α-helix, thus significantly impacting global protein structure. In a second variant, the YPWM sequence was mutated to GPGG, eliminating intra-protein contacts made by this motif while simultaneously avoiding potential structural rearrangements. Indeed, the solubility and activity of GPGG are similar to those of wild-type UbxIa. Furthermore, the circular dichroism spectra of UbxIa and GPGG are similar, although the ellipticity of GPGG is less negative below 205 nm, indicating a somewhat lower content of intrinsic disorder (Supplementary Materials Fig. 2).35 Within error limits, GPGG binds with an affinity comparable to that of wild-type UbxIa for all tested DNA sequences except Dll, which is bound with ~3-fold higher affinity than by UbxIa (Table 2, Fig. 4a). Therefore, the YPWM motif only inhibits binding by UbxIa to this composite DNA site.
To determine whether the YPWM motif is capable of inhibiting binding to other composite DNA sites, we measured binding by UbxIa, Ubx-18, and GPGG to two additional composite DNA sequences. The Dppe4 sequence contains the e4 Hox•Exd binding site from the 812 bp enhancer used by Ubx to activate dpp expression in parasegment 7 of the embryonic mesoderm.36 In Dppe4, the Exd binding site is farther away from the TAAT Ubx-binding site than in the Dll sequence (Table 1). The second DNA site tested is the murine HoxB1 enhancer “repeat 3” sequence that can be used by the Labial•Exd complex to repress reporter gene expression.26 This sequence is similar to the consensus Hox•Exd site in which the Hox and Exd sites overlap, but the TAAT sequence is absent.6 For both DNA sequences, UbxIa bound with low affinity, similar to that observed for Dll, and removal of the YPWM sequence either by deletion or by point mutations improved DNA binding affinity (Fig. 4b, Table 3). Therefore, the YPWM motif inhibits binding to all composite Hox•Exd DNA binding sites tested, despite variations in binding site sequence and spacing.
Whereas the Ubx-18 and GPGG mutations similarly impact binding to composite sites, only Ubx-18 alters binding to the A1 and Sal Ubx multimer binding sites (Fig. 4a). Consequently, the amino acids flanking the YPWM motif must also contribute to specificity. To test this hypothesis, the N-terminal and C-terminal flanking regions of the YPWM motif were individually removed to form the Ubx-6 and Ubx-8 mutants, respectively (Fig. 1e). The circular dichroism spectra, solubility, and activity of these mutants confirm that the overall structure of the protein was not significantly altered by these deletions (Supplementary Materials Fig. 3). However, small differences in the CD spectra below 210 nm may reflect changes in the disorder content. Both mutants impact DNA binding specificity, although they have different effects. Relative to wild-type UbxIa, Ubx-6 binds with somewhat higher affinity only to Dll (Table 2, Fig. 6a), a pattern that is similar to (but significantly weaker than) that observed for the GPGG mutant. Therefore, the role of these 6 amino acids may be to enhance the regulatory role of the YPWM motif in preventing monomer association with Ubx•Exd composite binding sites. In contrast, Ubx-8 binds less well to 40AB and Dpp, yet better to Dll (Table 2, Fig. 6a, UbxIa versus Ubx-8). The ability of the C-terminal 8 amino acids to drive both positive and negative effects upon binding different DNA sequences suggests this region can impact DNA interactions via multiple mechanisms.
Given that the YPWM motif and the 6 amino acids N-terminal to this motif both alter DNA binding specificity, removing the 8 amino acids C-terminal to the YPWM motif could change specificity because the deletion shortens the distance between the homeodomain and the YPWM motif, because these amino acids independently act as a specificity determinant, or both. To distinguish these mechanisms, we generated a new mutant, termed Ubx8scramble, which rearranges the 8 amino acids C-terminal to the YPWM motif in such a way as to maintain a similar disorder and hydrophobicity profile (Fig. 1e, Fig. 6c,d).37,38 Binding of Ubx8scramble was measured for all DNA sequences for which Ubx-8 binding differed from that of UbxIa. For each of these sequences, binding of Ubx8scramble closely matched that of Ubx-8 (Fig. 6b, Table 3). We conclude that the impact of the 8 amino acid region includes sequence-specific effects. Furthermore, the flexibility of this region appears to prevent major local disruptions from internal deletions.
Surprisingly, neither mutation of the YPWM motif (Fig. 4a, UbxIa versus GPGG), nor removal of either flanking region affects binding to A1 and Sal (Fig. 6a, UbxIa versus Ubx-6 and Ubx-8), even though simultaneous removal of all three regions does alter affinity for these DNA sequences (Fig. 4a, UbxIa versus Ubx-18). These results suggest these three segments also impact each other in response to the DNA sequence to create non-additive effects. To test this hypothesis, a third internal deletion mutant, Ubx-14, was constructed which removes both the N-terminal 6 amino acids and C-terminal 8 amino acids (Fig. 1e). Similar to the other mutants, simultaneous deletion of both regions did not impact the solubility or activity of the proteins. Circular dichroism experiments revealed only some loss of intrinsic disorder, similar to that observed for the GPGG mutant (Supplementary Materials Fig. 4). Both Ubx-8 and Ubx-14 bind more weakly than UbxIa to 40AB and Dpp but more tightly to Dll. However, these effects are more pronounced in the Ubx-14 mutant (Fig. 6a,e).
The DNA binding specificity of a given UbxIa variant was estimated by the variation in binding affinity for our six sample DNA sequences (“Fold variation”, Table 2). UbxHD, Ubx-8, and Ubx-14 are much less sensitive to changes in the DNA sequence (affinities for each vary only ~ 2-fold) relative to Ubx-6 (~8-fold variation), or the GPGG mutant (~6-fold variation). The wild-type UbxIa protein was best able to discriminate between the tested DNA sequences (~12-fold variation).
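The specificity estimate described here can be sketched as a ratio of the weakest to the tightest dissociation constant across the sampled DNA sequences. This is a minimal illustration that assumes "fold variation" means max Kd divided by min Kd; the function name, variant labels, and all numbers are hypothetical and are not taken from Table 2.

```python
def fold_variation(kd_by_dna):
    """Ratio of weakest to tightest binding (max Kd / min Kd) across DNA sites."""
    values = list(kd_by_dna.values())
    if not values or min(values) <= 0:
        raise ValueError("need at least one positive dissociation constant")
    return max(values) / min(values)

# Hypothetical dissociation constants in pM for two made-up protein variants.
variants = {
    "variant_low_specificity": {"site1": 60.0, "site2": 90.0, "site3": 120.0},
    "variant_high_specificity": {"site1": 200.0, "site2": 800.0, "site3": 2400.0},
}

for name, kds in variants.items():
    print(f"{name}: {fold_variation(kds):.1f}-fold variation")
```

A larger ratio means the protein discriminates more strongly between the tested sequences. Note that the metric depends on which DNA sequences are sampled, so it compares variants measured on the same panel rather than giving an absolute specificity.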
The YPWM motif and its surrounding regions combine to increase DNA binding specificity. The improvement in Dll binding caused by the 14 amino acid deletion (Fig. 6e, UbxIa versus Ubx-14) was similar in magnitude to that caused by the GPGG point mutations (Fig. 4a, UbxIa versus GPGG), but weaker than that observed upon removal of the entire region (Fig. 6e, UbxIa versus Ubx-18). Therefore, the YPWM motif and its flanking sequences likely work together to inhibit monomer binding to Hox•Exd composite sites. A more pronounced effect was observed for binding to A1 and Sal DNAs. The GPGG mutation and the 6, 8, and 14 amino acid deletions either inhibit or do not alter binding to Sal and A1 (Fig. 4a, Fig. 6a). However, binding is significantly enhanced when all three regions are removed in the Ubx-18 mutant (Fig. 6e).
The underlying cause can be elucidated by comparing the impact of the YPWM motif in different Ubx sequence backgrounds (Fig. 4a, Fig. 6e). In the context of the full-length protein, mutation of YPWM to GPGG enhances DNA binding only to Dll DNA (Fig. 4a), indicating that the intact motif inhibits binding only to this sequence among the six sample DNAs. However, comparison of the Ubx-14 and Ubx-18 mutants reveals that the YPWM motif, present only in Ubx-14, inhibits binding to every DNA sequence tested – even the optimal Ubx binding site – when the surrounding 14 amino acids are removed (Fig. 6e). Therefore, the region surrounding the YPWM motif controls the magnitude and sequence dependence of DNA inhibition by this motif. The inhibition of DNA binding by the YPWM motif in the Ubx-14 mutant appears to decrease variation in DNA binding affinity, and thus decrease DNA binding specificity, to levels even lower than observed for UbxHD (Table 2, compare Fold variation for Ubx-14 and UbxHD).
Together, we find that each mutation or deletion in the 18 amino acid region impacts Ubx•DNA binding in a unique, sequence-dependent manner. The impact of this 18 amino acid region varies substantially with DNA sequence (e.g., Protein sensitivityFL-18, Table 2). The strongest effect was observed for binding Dpp (7-fold), whereas the UA and A1 DNA sequences are much less sensitive to changes in this region. We conclude multiple subregions of the I1 inhibiting segment form a network of regulatory interactions with each other and the DNA sequence to direct different aspects of specific binding.
The crucial role of Hox proteins in animal development, wound healing, stem cell differentiation, and carcinogenesis39–40 necessitates an understanding of Hox function, and in particular Hox•DNA binding, at the molecular level. Our results highlight two problems commonly encountered in assessing Hox•DNA interactions. First, Hox truncation mutants, in which the predominately disordered sequences are removed, are commonly used to avert aggregation.11, 15 However, we demonstrate that sequences outside the homeodomain can have dramatic effects on Hox•DNA binding affinity10 and specificity, requiring cautious interpretation of studies relying solely on truncation mutants. In particular, removal of only amino acids 2–19 from the N-terminus of UbxIa is sufficient to impact binding.10 Since the N-terminus is moderately conserved in both Ubx orthologues and Hox paralogues,25,41,42 even minor N-terminal truncations may alter binding in any Hox protein.
The second problem involves using an optimal DNA binding site to characterize transcription factor binding. Such sites are extremely useful for identifying potential genomic binding sites and elucidating differences between related transcription factors.20,43,44 However, the optimal DNA binding sequence masks the impact of non-homeodomain regions that alter binding to natural DNA targets (Protein sensitivityFL-HD, Table 2). In vivo, these regions may regulate binding through cooperative binding or cofactor interactions.21,31 Therefore, regulatory mechanisms potentially utilized in vivo may be overlooked when binding studies exclusively utilize the optimal DNA sequence. We have therefore examined an array of Ubx target sequences in analyzing the impact of the YPWM region.
Our search for specificity determinants outside the UbxIa homeodomain focused on an 18 amino acid sequence within the I1 region, which inhibits UbxIa•DNA binding.10 We find all mutations involving this region reduced the DNA binding specificity of UbxIa (Table 2, Fold variation). In vivo, Ubx-mediated repression of the dll and antp promoters is impaired by deletion of this region.25 The increase in binding by Ubx-18 to the Dll Hox•Exd composite DNA may permit binding and misregulation of dll by Ubx monomers. Since this deletion does not appear to alter DNA binding to the A1 portion of the large antp promoter,25 changes in Ubx-18 interactions with other Hox binding site clusters may alter antp expression (Fig. 2). Alternately, mutations in this region may also impact other aspects of UbxIa function such as protein interactions or transcription repression.13–15, 25, 26 Since this region participates in many molecular functions, it is not possible to unambiguously ascribe any single molecular event as the direct cause of a phenotype in vivo. However, in such cases in vitro studies can create a list of candidate functions that may, alone or in combination, contribute to a particular phenotype. Intriguingly, participation of a single protein region in multiple molecular events creates the potential for one interaction to regulate other molecular functions. Indeed, this scenario has already been suggested for the analogous region of a truncated Labial mutant: inhibition of DNA-binding by the Exd-interaction motif is relieved upon binding Exd.26 While we cannot measure an accurate dissociation constant at the protein concentrations required to observe a Ubx•Exd•Hth supershift of Dll DNA, our approximation of an upper bound for the true Kd is comparable to the Kd for Ubx monomer binding Dll DNA. Consequently, for Ubx we cannot conclusively determine whether interaction with Exd is able to reduce the DNA binding inhibition imposed by the YPWM motif.
We have identified three smaller regions within this 18 amino acid sequence, each with separable effects, which combine in a complex manner to generate larger variations in DNA binding affinity. We demonstrate for the first time that DNA binding inhibition by the YPWM motif in UbxIa is limited to Hox•Exd composite DNA binding sites. Thus, one role of the YPWM motif may be to specifically prevent Hox monomers from binding composite DNA sites and thereby misregulating the associated genes.45,46 Since an alternate Exd interaction motif is present in Ubx,29 the YPWM motif may, therefore, be free to also regulate DNA binding specificity in a subset of Ubx•Exd heterodimers.
The region surrounding the YPWM motif modulates both the strength and DNA sequence-dependence of YPWM-mediated binding inhibition. In addition, the amino acids N- and C-terminal to the YPWM motif independently impact DNA site selection, an effect that is dependent on the amino acid sequence for the C-terminal sub-region. Divergence – to different degrees – of these regions in Ubx orthologues suggests DNA site selection may have evolved and thus contributed to the emergence of distinct body morphologies (Fig. 1d).10
A key question is whether the 18 amino acid region is the only specificity determinant in UbxIa. Since the impact of this region is DNA sequence-dependent, the sensitivity of each DNA sequence to non-homeodomain regions must be separately addressed. In general, the effects of the 18 amino acid region do not match the net impact of all non-homeodomain sequences (Protein sensitivityFL-18 versus Protein sensitivityFL-HD, Table 2). For instance, of the DNAs examined, the Dpp sequence was the most sensitive to changes in the 18 amino acid region, yet is the least sensitive to the net impact of all non-homeodomain sequences. Conversely, binding to the UA and A1 DNAs is relatively insensitive to the 18 amino acid region, but extremely sensitive to all non-homeodomain sequences. Therefore, although in vitro and in vivo data25 suggest the importance of the 18 amino acid region in DNA site selection by full-length UbxIa, other specificity determinants must also be present.
Hox proteins must specifically regulate their target genes, despite the wide array of binding sites that meet the low sequence standards of their highly homologous homeodomains.7–9, 20,47,48 In contrast to other homeodomain families, most Hox homeodomains contain a “molecular code” that favors the 5´-TAAT-3´ sequence.32, 43, 44 We and others find that the Ubx homeodomain binds with less than 3-fold variation in DNA binding affinity to DNA sequences containing this core motif.20 Although this variation may contribute to sequence selection in vivo, it is likely to be insufficient to reliably distinguish Hox and non-Hox targets. For example, under conditions where an obligatory gene target is 100% occupied by a Hox protein, sites with unfavorable flanking sequences would still be ~40% occupied and therefore subject to mis-regulation. Indeed, we observe high affinity (72 ± 3 pM) binding of the UbxHD monomer to the Dll DNA sequence, which should only be bound by Hox•Exd heterodimers in vivo.
Individual Hox proteins direct the development of multiple tissues. Consequently, Hox proteins must also be able to recognize the cellular context and respond by binding only the target genes appropriate for that tissue.5, 6 The 18 amino acid region in Ubx could provide several mechanisms to modulate DNA binding in response to contextual cues. First, the structure or energetics of this region could be altered by protein interactions.13, 14, 26 Second, during Drosophila development, five different Ubx isoforms are produced in a spatiotemporal-dependent manner by varying the microexon content at this location.16 The C-terminal portion of the 18 amino acid region both modulates Exd interaction and includes the first portion of the mI microexon.16, 28 Therefore, mRNA splicing may alter both DNA binding specificity and the balance between binding by Ubx monomers and Ubx•Exd heterodimers to direct functions specific to a particular tissue or developmental stage. Indeed, five of the eight Drosophila Hox proteins are alternatively spliced between the YPWM motif and the homeodomain, encompassing a portion of the 18 amino acid region, suggesting that this region may act as an “antenna” which senses and responds to cell context in many Hox proteins.
Indeed, context-specific alternative splicing of intrinsically disordered regions has been proposed to be a means to diversify protein function by including or excluding functional domains (e.g., protein or nucleic acid binding domains) or regulatory regions in a tissue-dependent manner.49 Here we have shown that the intrinsically disordered 8 amino acids C-terminal to the YPWM motif, which include a portion of the mI microexon, act as a specificity determinant. This region also acts as a regulator of another regulatory region, modulating inhibition of binding by the YPWM motif. We hypothesize that alternative splicing mediates this modulation by altering the distance between the YPWM motif and the homeodomain, a novel mechanism for regulation by an intrinsically disordered region. If true, then the other microexons may have a similar role, a hypothesis currently under investigation in our laboratory.
For many proteins, the length of a flexible linker has previously been shown to impact the activity of its terminal functional domains. Reducing linker length by mutation to within 5–10 amino acids can alter both the activity and stability of the N- and C-terminal functional domains.50,51 Conversely, increasing linker length to 60–70 amino acids may render the functional domains insensitive to one another.52,53 In our case, the N-terminal domain is the short YPWM motif, which regulates the activity of the C-terminal DNA binding homeodomain. During development, alternative mRNA splicing varies the distance between the YPWM motif and the homeodomain from 8 to 51 amino acids,49 roughly spanning the range in which functional domains appear to be sensitive to linker length. While this range is likely to be dependent on the size, function, and type of regulation used for each protein, the linker lengths separating other regulatory and functional modules also fall within this range.54,55 As for Ubx, the ability of these distant motifs to regulate function also depends on linker length and amino acid sequence.54,55
The YPWM Exd-interaction motif forms a type I reverse turn and is located between two intrinsically disordered regions10 and thus has the characteristics of a linear motif, in which a short segment that mediates protein interactions is embedded in a larger disordered segment.56 The less stringent sequence requirements for linear motifs relative to protein interaction surfaces in globular proteins enable more rapid evolution of protein interaction networks. In the case of Ubx, the YPWM linear motif is a conserved feature found in most Hox proteins in all bilaterally symmetric animals. However, the region surrounding this motif has evolved to not only be flexible, but also to regulate the function of both the YPWM motif and the homeodomain. Hence, linear motifs may also act to create functional diversity not by evolving the protein interaction motif but by using the disordered regions to modulate adjacent functional domains.
Variations in these DNA binding regulatory mechanisms may also generate functional diversity among Hox homologues. Members of the Hox protein family have nearly identical homeodomain sequences, and thus very similar DNA binding site preferences.7–9, 20, 43, 44 Despite this similarity, each Hox protein within an organism must regulate different genes in vivo to confer unique identities to otherwise similar regions. For instance, AbdA, but not Ubx, activates expression of lh, tina-1, wg, and ndae1 in the Drosophila embryonic cardiac system.57–59 Since the sequence surrounding the YPWM motif is very different in AbdA and Ubx,60 this region may diversify DNA site selection by their nearly identical homeodomains. Indeed, the region N-terminal to the homeodomain, including the 18 amino acid specificity-conferring sequence, is sufficient to dictate AbdA versus Ubx function in the visceral mesoderm.60 The inability of this region to distinguish AbdA and Ubx functions in the epidermis further demonstrates that the role of this region is context-dependent.
In summary, previous in vivo developmental biology studies established that the 18 amino acid region alters Hox activity in a tissue-specific manner.25, 60 Furthermore, this region also distinguishes the roles of individual Hox proteins in vivo.60 Our quantitative DNA binding data not only support these prior observations, but also provide a molecular mechanism by which the region surrounding the YPWM motif may diversify Hox function in vivo as well as respond to the cellular context by altering DNA site selection.
UbxIa and the Ubx-18 mutant genes, the gifts of Dr. Ella Tour and Dr. Bill McGinnis (UCSD), were sub-cloned into the pET3c expression vector. All other internal deletions and point mutations were constructed using the QuikChange site-directed mutagenesis kit (Stratagene).
Wild-type UbxIa and UbxHD were expressed and purified as described previously.10 All point mutations and small internal deletions were examined in full-length, untagged UbxIa. The expression and purification procedures for all variants were the same as for wild-type UbxIa.10
Full-length Exd, cloned into pET-9a, and histidine-tagged, full-length Homothorax (Hth), in pET-14b, were co-expressed in E. coli using the same procedures as for Ubx.10 The cell lysate was sonicated with twelve pulses at 15 s intervals using a VWR Bronson Scientific sonicator followed by centrifugation at 22,000 × g for 20 min. Polyethyleneimine (50% w/v, 200 µl) was added to the supernatant prior to centrifugation for 15 min. Protein samples were mixed with 4 mL nickel-nitrilotriacetic acid (Ni-NTA) agarose resin (slurry), which was pre-equilibrated with wash buffer containing 50 mM NaH2PO4 (pH 8.0), 300 mM NaCl, 10 mM imidazole, and 5% glucose. The protein mixture was loaded onto the column and washed with 50 mL wash buffer containing 20 mM imidazole, followed by 30 mL wash buffer containing 40 mM imidazole and 20 mL wash buffer containing 80 mM imidazole. Protein was eluted with wash buffer containing 200 mM imidazole and analyzed by SDS-PAGE and western blots.
The filter aggregation assay17 was used to examine solubility of purified wild-type and mutant UbxIa. The YAAA mutant, which could not be purified, formed insoluble inclusion bodies. UbxIa and all other UbxIa derivatives were soluble.
Synthesized DNA binding oligonucleotides were annealed and labeled as previously described.10 Oligonucleotide sequences are listed in Table 1. The Dpp DNA binding sequence is located on chromosome 2L, at bases 2452271–2452228; the UA sequence is on 3R, at bases 12560064–12560101; the Sal sequence is on 2R, at bases 11454558–11454617; the A1 sequence is on 3R, at bases 2830985–2831025; and the Dll sequence is on 2R, at bases 20691027–20690984. Both protein activity and affinity experiments were performed by gel retardation assay, using the same binding buffer and gel composition reported previously.10 Activity assays measured the fraction of active protein under stoichiometric conditions, where [DNA] » Kd. All purified proteins were greater than 80% active. The affinity gel retardation assays were performed under equilibrium conditions, where [DNA] « Kd, generally 10⁻¹³ to 10⁻¹⁰ M depending on the protein and DNA sequence used.10 Since some DNA sequences used have multiple binding sites, the disappearance of free DNA was analyzed for consistency. Therefore, the apparent dissociation constant, Kapp, rather than Kd, is reported for the multi-site target DNAs. Each measurement is corrected for protein activity and is the average of at least nine gel shift experiments, as reported previously.10
For gel super-shift experiments, purified UbxIa or Ubx-18 (concentrations ranging from 1 × 10⁻¹¹ M to 1 × 10⁻⁹ M) with 5 × 10⁻⁸ M purified Exd•Hth and 10⁻¹² M Dll oligonucleotide in 20 mM Tris-HCl (pH 7.5), 100 mM KCl, 100 µg/mL BSA, 5 mM DTT, and 10% glycerol were incubated for 30 min. at room temperature. Retardation gels contained 0.5% agarose and 3% polyacrylamide (37.5:1 acrylamide:bisacrylamide), 0.5X TBE (0.023 M Tris-borate, 0.5 mM EDTA, pH 8.0), and 3% glycerol. The gels were pre-run at constant current (150 V) while circulating 0.5X TBE buffer for at least 30 min. Samples were loaded at 300 V, and after 5–8 min. the voltage was reduced to 150 V for approximately three hours. Gels were blotted onto filter paper, dried on a vacuum dryer, and exposed to a Fuji phosphorimaging plate overnight. Images of protein•DNA complexes were scanned with a Fuji Imager and quantified using the MacBAS1000 2.0 program.
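The relationship between free-DNA disappearance, protein activity, and Kapp can be sketched as follows. This is a minimal illustration, not the fitting procedure used in this study, which is described in the earlier report cited above. It assumes that when [DNA] « Kd the fraction of free DNA follows a single hyperbola, Kapp/(Kapp + [active protein]), and that active protein equals total protein multiplied by the measured fractional activity. The function names, the grid search, and the numbers in the usage example are all hypothetical.

```python
def fraction_free(active_protein, kapp):
    """Fraction of DNA left unbound, assuming [DNA] << Kapp and one hyperbola."""
    return kapp / (kapp + active_protein)


def estimate_kapp(total_protein, free_dna_fraction, activity,
                  lo=1e-14, hi=1e-8, steps=6000):
    """Least-squares estimate of Kapp (M) by log-spaced grid search.

    total_protein     : total protein concentrations (M), one per lane
    free_dna_fraction : measured fraction of free DNA in each lane
    activity          : fraction of protein that is active (0 < activity <= 1)
    """
    if len(total_protein) != len(free_dna_fraction):
        raise ValueError("one free-DNA fraction is required per concentration")
    if not 0 < activity <= 1:
        raise ValueError("activity must be in the interval (0, 1]")
    # Correct each total concentration for the fraction of active protein.
    active = [activity * p for p in total_protein]
    best_k, best_sse = None, float("inf")
    for i in range(steps + 1):
        k = lo * (hi / lo) ** (i / steps)  # log-spaced candidate Kapp
        sse = sum((fraction_free(a, k) - f) ** 2
                  for a, f in zip(active, free_dna_fraction))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k


if __name__ == "__main__":
    # Synthetic, noise-free lanes generated from a made-up Kapp of 70 pM
    # and a made-up activity of 0.8; not data from this study.
    totals = [1e-12, 1e-11, 1e-10, 1e-9]
    observed = [fraction_free(0.8 * p, 7e-11) for p in totals]
    print(estimate_kapp(totals, observed, activity=0.8))
```

A grid search is used here only to keep the sketch free of external dependencies. A nonlinear least-squares routine would normally replace it, and real data would be averaged over replicate gels.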
Circular dichroism was used to assess structural changes due to mutation, whereas magnetic circular dichroism was used to determine protein concentrations by measuring tryptophan concentration in an environmentally independent manner.61 Buffer conditions and experimental parameters were as previously described.10
Fig. 1. Representative gel shifts for UbxHD and UbxIa binding to 40AB, Dll, Dpp, A1, and Sal DNAs. Gel shifts for the UA DNA are reported in Fig. 3. Arrows indicate the lane at which the protein concentration is closest to the dissociation constant.
Fig. 2. Circular dichroism spectra of UbxIa (open circles) and GPGG (gray circles).
Fig. 3. a. Circular dichroism spectra of UbxIa (open circles) and Ubx-6 (gray circles). b. Circular dichroism spectra of UbxIa (open circles) and Ubx-8 (gray circles).
Fig. 4. Circular dichroism spectra of UbxIa (open circles) and Ubx-14 (gray circles).
*This work was supported by grants from the Robert A. Welch Foundation (C-576) and the National Institutes of Health (GM22441) to K.S.M. We thank the members of the Matthews and Silberg laboratories at Rice University for helpful comments and suggestions. We also thank Joseph Mire (Texas A&M University) for assistance modeling the disordered YPWM-HD linker region. The UbxIa gene and the Exd and Hth constructs were the gifts of Dr. Ella Tour and Dr. Bill McGinnis, both from UCSD. The UbxHD expression vector was the gift of Dr. Philip Beachy from Johns Hopkins University.