The helix-loop-helix (HLH) family of transcriptional regulatory proteins are key players in a wide array of developmental processes. Over 240 HLH proteins have been identified to date in organisms ranging from the yeast Saccharomyces cerevisiae to humans (6). Studies in Xenopus laevis, Drosophila melanogaster, and mice have convincingly demonstrated that HLH proteins are intimately involved in developmental events such as cellular differentiation, lineage commitment, and sex determination. In yeast, HLH proteins regulate several important metabolic pathways, including phosphate uptake and phospholipid biosynthesis (19, 67, 112). In multicellular organisms, HLH factors are required for a multitude of important developmental processes, including neurogenesis, myogenesis, hematopoiesis, and pancreatic development (12, 86, 127, 179). The purpose of this review is to examine the structure and functional properties of HLH proteins.
Transcription of the immunoglobulin heavy-chain (IgH) gene has long been known to be regulated, in part, by a cis-acting DNA element known as the IgH intronic enhancer (109, 156). By in vivo methylation protection assays, a number of sites were identified in both the IgH and the kappa light-chain gene enhancers which were specifically protected in B cells but not in nonlymphoid cells (41). These elements shared a signature motif which consisted of the core hexanucleotide sequence, CANNTG, and were subsequently dubbed E boxes (41). A total of five E-box elements are present in the IgH gene enhancer: μE1, μE2, μE3, μE4, and μE5. The Ig kappa enhancer also contains three canonical E boxes, designated κE1, κE2, and κE3. E-box sites have been subsequently found in B-cell-specific promoter and enhancer elements, including a subset of Ig light-chain gene promoters, the IgH and Ig light-chain 3′ enhancers, and, more recently, the λ5 promoter (110, 118, 156).
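As an illustration of the CANNTG core motif, the following minimal sketch scans a DNA string for candidate E boxes. The function name and the example sequence are hypothetical and are not taken from any enhancer discussed here. A sequence match of this kind only nominates candidate sites; it says nothing about whether a site is bound or functional in vivo.

```python
import re

# CANNTG: C, A, any two bases, T, G. The pattern is its own reverse
# complement, so scanning one strand finds sites on both strands.
# A lookahead is used so that overlapping matches are not skipped.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(sequence):
    """Return (0-based start, hexamer) for every CANNTG match in sequence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical test sequence containing two candidate E boxes.
    for start, site in find_e_boxes("ttCACCTGaaCAGCTGcc"):
        print(start, site)
```

Because the two central positions are unconstrained, the motif is expected to occur frequently by chance in any long sequence, which is one reason functional assays such as the methylation protection experiments described above are needed to identify relevant sites.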
E-box elements have also been identified in promoter and enhancer elements that regulate muscle-, neuron-, and pancreas-specific gene expression. For example, in muscle, the muscle creatine kinase gene, acetylcholine receptor genes α and δ, and the myosin light-chain gene all require E-box elements for full activity (27, 51, 85). A number of genes whose expression is limited to the pancreas also require E-box sites for proper expression. The insulin and somatostatin genes, for example, contain E-box sites that, when multimerized, are sufficient to regulate pancreatic β-cell-specific gene expression (168). More recently, E-box regulatory sites have been identified in a number of neuron-specific genes, including the opsin, hippocalcin, beta 2 subunit of the neuronal nicotinic acetylcholine receptor, and muscarinic acetylcholine receptor genes (1, 21, 52, 125).
Two proteins, termed E12 and E47, were originally identified as binding to the κE2/μE5 site (65, 102). They have a region of homology with the Drosophila Daughterless protein, the myogenic differentiation factor MyoD, members of the achaete-scute gene complex, and the Myc family of transcription factors (102). This stretch of conserved residues, known as the Myc homology region, appeared to be critical for the DNA binding properties of E12 and E47 (102). The E12 and E47 proteins, which differ only within this Myc homology region, arise by alternative splicing of the E2A gene (157). This conserved sequence, which was modeled as two amphipathic alpha helices separated by a flexible loop structure, was named the HLH motif and shown to function as a dimerization domain.
The solution structure of the basic HLH (bHLH)-leucine zipper (LZ) factor Max first confirmed the existence of the HLH motif (44). Subsequently, the three-dimensional structure of the E47 bHLH polypeptide bound to its E-box recognition site, CACCTG, has been solved at 2.8-Å resolution (38). A number of interesting features were revealed from analysis of the E47 crystal structure. The E47 dimer forms a parallel, four-helix bundle which allows the basic region to contact the major groove (38). In addition to the basic region, residues in the loop and helix 2 also make contact with DNA (38). Stable interaction of the HLH domain is favored by van der Waals interactions between conserved hydrophobic residues (38). The E47 dimer is centered over the E box, with each monomer interacting with either a CAC or CAG half-site. A glutamate present in the basic region of each subunit makes contact with the cytosine and adenine bases in the E-box half-site. An adjacent arginine residue stabilizes the position of the glutamate by direct interaction with these nucleotides and additionally the phosphodiester backbone. Both the glutamate and the arginine residues are conserved in most bHLH proteins, consistent with a role in specific DNA binding (6, 38, 102).
Owing to the large number of HLH proteins that have been described, a classification scheme that was based upon tissue distribution, dimerization capabilities, and DNA-binding specificities was devised (Fig. 1) (101). Class I HLH proteins, also known as the E proteins, include E12, E47, HEB, E2-2, and Daughterless. These proteins are expressed in many tissues and capable of forming either homo- or heterodimers (103). The DNA-binding specificity of class I proteins is limited to the E-box site. Class II HLH proteins, which include members such as MyoD, myogenin, Atonal, NeuroD/BETA2, and the achaete-scute complex, show a tissue-restricted pattern of expression. With few exceptions, they are incapable of forming homodimers and preferentially heterodimerize with the E proteins. Class I-class II heterodimers can bind both canonical and noncanonical E-box sites (103). Class III HLH proteins include the Myc family of transcription factors, TFE3, SREBP-1, and the microphthalmia-associated transcription factor, Mi. Proteins of this class contain an LZ adjacent to the HLH motif (66, 177). Class IV HLH proteins define a family of molecules, including Mad, Max, and Mxi, that are capable of dimerizing with the Myc proteins or with one another (7, 22, 174). A group of HLH proteins that lack a basic region, including Id and emc, define the class V HLH proteins (18, 39, 47). Class V members are negative regulators of class I and class II HLH proteins (18, 39, 47). Class VI HLH proteins have as their defining feature a proline in their basic region. This group includes the Drosophila proteins Hairy and Enhancer of split (76, 141). Finally, the class VII HLH proteins are categorized by the presence of the bHLH-PAS domain and include members such as the aromatic hydrocarbon receptor (AHR), the AHR nuclear-translocator (Arnt), hypoxia-inducible factor 1α, and the Drosophila Single-minded and Period proteins (34).
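The seven classes can be summarized as a simple lookup table. The sketch below records only the defining features and example members named in the text; the dictionary layout and the helper function are illustrative assumptions, and the member lists are not exhaustive.

```python
# Classes I-VII as described in the text. Member lists contain only the
# examples named there and are not a complete catalogue.
HLH_CLASSES = {
    "I": {"feature": "widely expressed E proteins; form homo- or heterodimers",
          "members": ["E12", "E47", "HEB", "E2-2", "Daughterless"]},
    "II": {"feature": "tissue restricted; preferentially heterodimerize with E proteins",
           "members": ["MyoD", "myogenin", "Atonal", "NeuroD/BETA2",
                       "achaete-scute complex"]},
    "III": {"feature": "leucine zipper adjacent to the HLH motif",
            "members": ["Myc", "TFE3", "SREBP-1", "Mi"]},
    "IV": {"feature": "dimerize with Myc proteins or with one another",
           "members": ["Mad", "Max", "Mxi"]},
    "V": {"feature": "lack a basic region; negative regulators of classes I and II",
          "members": ["Id", "emc"]},
    "VI": {"feature": "proline in the basic region",
           "members": ["Hairy", "Enhancer of split"]},
    "VII": {"feature": "bHLH-PAS domain",
            "members": ["AHR", "Arnt", "hypoxia-inducible factor 1α",
                        "Single-minded", "Period"]},
}


def hlh_class(protein):
    """Return the class label for a protein listed above, or None if absent."""
    wanted = protein.lower()
    for label, entry in HLH_CLASSES.items():
        if any(wanted == member.lower() for member in entry["members"]):
            return label
    return None
```

For example, `hlh_class("MyoD")` returns `"II"`, whereas a protein not listed in the scheme above, such as the yeast factor Pho4, returns `None`.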
Recently, another classification method of HLH proteins has been described (6). Based on the amino acid sequences of 242 HLH proteins, a phylogenetic tree was created to group family members according to evolutionary relationships (6). Four major groups, A through D, which comprise more than 24 protein families were identified (6). The groupings were based upon DNA-binding specificity as well as conservation of amino acids at certain positions (6). As the number of HLH proteins continues to grow, this evolutionary or “natural” classification may provide a more accurate and convenient means of categorization.
Genetic studies in S. cerevisiae have identified a number of HLH factors that are important for regulating the transcription of genes involved in different metabolic pathways. The INO2 and INO4 genes encode HLH proteins that function together as a heterodimer to activate transcription of the fatty acid synthase genes required for the biosynthesis of phospholipids (67, 112). The HLH protein Pho4 plays an essential role in the regulation of phosphate uptake in yeast (see below). The yeast HLH protein CBF1 has a dual role in regulating methionine biosynthesis and controlling chromosomal integrity (28). Under conditions of mitochondrial stress, an isoform of citrate synthase encoded by the nuclear gene CIT2 is transcriptionally induced (90). Two HLH proteins, Rtg1p and Rtg3p, have been shown to be required for both the basal and induced expression of CIT2 (89, 137). Rtg1p and Rtg3p form a heterodimeric complex that binds to two sequence elements, each containing an E-box half-site present in the promoter of the CIT2 gene (73).
Recent analysis of open reading frames present in the C. elegans genome has revealed 24 putative HLH proteins (142). A number of these genes have been identified previously in genetic screens. For example, the bHLH protein lin-32 is required for proper formation of the male tail rays, which are peripheral sense organs important for mating (176). A class VI HLH protein, lin-22, is important for proper nervous system patterning (169). A Daughterless ortholog has also been cloned, demonstrating that both the ubiquitous and cell-type-specific HLH proteins are present in nematodes (81). Surprisingly, however, like S. cerevisiae, the C. elegans genome lacks HLH proteins of the Id class (142). This suggests that inhibition of HLH activity through the formation of non-DNA-binding complexes was a strategy that arose later in evolution.
The decision of what sex to become is regulated in Drosophila primarily by the ratio of X chromosomes to sets of autosomes (122). A number of genes, including daughterless (da), sisterless-a (sis-a), and sisterless-b (sis-b), that are involved in sex determination have been identified (122). The da gene product, a bHLH protein, is required maternally and is present in equal amounts in both male and female embryos (35, 122). Mothers of the da genotype fail to produce female, but not male, offspring (35). sis-b, which also encodes a bHLH protein, is located on the X chromosome and is required zygotically for proper sex determination. Both genes have been shown to be required for activation of the Sxl gene promoter (23, 75). Since females are 2X, they make twice as much sis-b as males, and thus only females make sufficient concentrations of da–sis-b heterodimers to activate Sxl expression (32). In support of this model, da–sis-b heterodimers have been shown to bind E-box sites in vitro and can activate transcription in S. cerevisiae (4, 161). The choice between male and female in flies is thus dependent upon the concentration of active bHLH heterodimers.
Perhaps the best understood of the bHLH proteins are the myogenic regulatory factors (MRFs). Four genes whose protein products regulate both establishment and differentiation of the myogenic lineage have been identified. These proteins, named MyoD, MRF-4, Myf-5, and myogenin, are approximately 80% similar in their bHLH regions (115, 167). In addition, the MRFs have homology outside the bHLH domain, including a cysteine-histidine-rich stretch adjacent to the basic region and a serine-threonine-rich region at the carboxyl terminus (115). All MRFs are capable of converting the mesodermal cell line C3H10T1/2 into muscle progenitor cells known as myoblasts (115). This provided support for the existence of a muscle-specific master regulatory gene(s) (167). All MRFs are expressed solely in skeletal muscle, as determined by Northern blot analysis (164). In situ hybridization studies of the developing mouse embryo have shown that the MRFs are expressed in the somites and limb buds, consistent with a role in skeletal muscle development (164).
Gene targeting experiments in mice have demonstrated that the MRFs are important for different stages of muscle development. For example, both MyoD and Myf-5 null mutant mice show normal skeletal muscle development, suggesting a redundancy in MRF function (25, 138). Mice containing a homozygous deletion of both MyoD and Myf-5, however, die at birth and lack skeletal muscle (139). The myogenin gene product is required at a later stage of muscle development, specifically during the terminal differentiation of myoblasts to myotubes (61, 107). Three groups have independently generated mice homozygously null for MRF-4 (24, 124, 175). Although all used a similar gene targeting strategy, the severity of the phenotype was dependent upon the knockout allele used (116). Interestingly, the phenotypic variation in MRF-4−/− mice was due to an unanticipated inhibitory effect on the expression of the neighboring Myf-5 allele caused by disruption of the MRF-4 locus (116). These studies ultimately concluded, however, that MRF-4 is not required for the formation of skeletal muscle.
Two class II HLH proteins have been shown to be essential for proper heart morphogenesis. In situ hybridization studies in the mouse and chick have shown that two closely related HLH genes, dHAND and eHAND, are expressed in the developing heart (154). Inhibition of both dHAND and eHAND expression with antisense oligonucleotides resulted in arrested development at the cardiac looping stage (154). Consistent with these findings, both dHAND and eHAND null mutant mice exhibit pronounced defects in heart formation and die as embryos (45, 135, 155).
As discussed above, a number of pancreas-specific genes contain E-box elements that regulate their tissue-restricted expression. α and β cells express both HEB and E2A and heterodimerize with a class II HLH protein, BETA2 (49, 111, 119, 150). Surprisingly, BETA2 turned out to be identical to NeuroD, a factor involved in neurogenic differentiation (86). Recently, mice that carry a targeted deletion of the BETA2/NeuroD gene have been generated (108). BETA2/NeuroD knockout mice exhibit no obvious neuronal phenotype but show marked pancreatic defects (108). These animals are severely diabetic (BETA2−/− mice have blood glucose levels fourfold higher than those in wild-type littermate controls) and die several days after birth (108). Normal morphogenesis of the pancreas is disrupted in BETA2 null mutant mice, and there is a dramatic reduction in insulin-producing β cells (108).
A number of HLH proteins have been implicated in the control of vertebrate neurogenesis. Many of these factors were identified by a degenerate PCR strategy based on the sequences of Drosophila genes which regulate neurogenesis. For example, the mammalian achaete-scute homolog, MASH-1, plays an essential role in the generation of both autonomic and olfactory neurons (55). Recent gene targeting experiments in mice have demonstrated that Math1, the murine homolog of Atonal, is required for the development of granule neurons and the external germinal layer of the cerebellum (17). Interestingly, Math1 is also required for the formation of hair cells in the inner ear (20). Injection of NeuroD-related bHLH factor neurogenin 1 (ngn1) mRNA into Xenopus embryos induces ectopic neurogenesis and the expression of NeuroD (94). Although expression of NeuroD also promotes the generation of ectopic neurons in Xenopus, it does not activate ngn1 expression (86, 94). These data suggest that NeuroD is a downstream target of ngn1 (94). Consistent with these findings, ngn1 null mutant mice fail to activate NeuroD expression and show defects in cranial sensory neuron development (93).
The HLH proteins described above act as transcriptional activators. However, known HLH repressors also function in neurogenesis. Mice lacking the HLH repressor hairy and Enhancer of split homolog, HES-1, display a marked neural tube defect that may be the result of premature neurogenesis (70). The Id family of inhibitory HLH factors also plays key roles in neurogenesis. All four mammalian Id genes are expressed in the developing brain and spinal cord (72). The unique expression patterns of the Id genes suggest that each is performing a specialized or nonredundant function (72). Retrovirus-mediated overexpression studies in chick embryos have demonstrated that Id2 can promote premature neurogenesis (95). Additionally, ectopic Id2 expression was able to convert ectoderm into neural crest cells, suggesting that the relative level of active bHLH proteins in these tissues helps determine whether cells will adopt a neural or epidermal fate (95).
HLH factors play significant roles in hematopoiesis. The bHLH protein SCL/Tal1, originally discovered by virtue of its involvement in a T-cell acute lymphoblastic leukemia (T-ALL)-specific translocation, has been shown to be essential for the development of all hematopoietic cell lineages (127). Disruption of the SCL/tal1 gene in mice leads to early embryonic lethality and a complete absence of blood (151). Specifically, immature erythrocytes are lacking in the embryo, placenta, or yolk sac of tal1-deficient mice, indicating a key role for Tal1 in erythropoiesis (151). Further studies indicated that the Tal1 protein is also required for proper B- and T-lineage development (127).
Class II tissue-specific bHLH proteins are required for a number of developmental processes in organisms as diverse as flies and mammals. This, however, does not seem to be the case for the regulation of B-cell development, as homodimers of E47, a widely expressed E protein, appear to be the relevant DNA-binding species (11, 104, 149). These results put to rest, temporarily, the notion that B-cell-specific gene expression could be regulated by a heterodimer composed of E47 and a B-cell-restricted HLH protein (149). Recently, however, a B-cell-restricted bHLH protein, ABF-1, which is capable of binding DNA as a heterodimer with E2A has been identified (98). We note that ABF-1 expression has been detected only in human activated B cells and its function in B lineage development remains to be clarified.
Ectopic expression of E12 and E47 in various cell types is sufficient to activate B-lineage-specific gene expression. Overexpression of E47 in a pre-T-cell line leads to Ig germ line transcription and promotes IgH DJ gene rearrangements (146). Ectopic expression of E12 in a macrophage cell line leads to activation of λ5, RAG-1, early B-cell factor (EBF), and Pax-5 transcription (74, 152). In summary, these data support a model whereby E12 and E47 homodimers activate B-lineage-specific gene expression (101, 104).
Targeted deletion studies in mice have since demonstrated that the E2A gene is absolutely required for B-cell development (12, 179). E2A null mutant mice lack pre-B and mature B lymphocytes and contain significantly reduced numbers of B220+ CD43+ B-cell progenitors (12, 179). Additionally, thymocyte development is also severely perturbed. Both E2A- and HEB-deficient mice show abnormalities during the early stages of thymocyte differentiation (10, 178). Furthermore, ectopic expression of Id3 in human fetal thymic organ cultures showed a complete block in thymocyte development at a stage similar to that seen in E2A-deficient bone marrow cells (62). The E2A proteins are also required for proper γδ T-lineage development (13). Specifically, in the absence of E2A, γδ T-cell receptor (TCR) V(D)J recombination is severely perturbed (13). Interestingly, the dosage of E2A proteins was shown to regulate site-specific recombination of these loci. These data suggest that the concentration of E2A proteins in developing thymocytes may be rate limiting for the recombination reaction, preventing the rearrangement of two alleles at a given time.
E2A-deficient mice revealed novel aspects for E2A function in addition to the role of E2A proteins in lymphocyte development. Specifically, E2A-deficient mice rapidly develop T-cell lymphomas (10). The tumors are monoclonal in origin, are of an immature T-cell phenotype, and will form tumors when injected into nude mice (10). Ectopic expression of Id2 in developing thymocytes also leads to the rapid development of lymphomas, consistent with these observations (100). However, it is remarkable that in these mice polyclonal tumors develop rather than the monoclonal populations observed in E2A-deficient thymocytes. These seemingly contradictory results may be explained by assuming that ectopic expression of Id2 blocks both E2A and HEB DNA binding activity, thus leading to a more severe phenotype compared to the E2A deficiency alone.
It is conceivable that loss of E2A homodimers, through targeted gene disruption or through ectopic expression of other bHLH proteins such as Lyl1 or Tal1, is an important event driving the development of T-cell leukemias (10). Thus, a function for E2A as a tumor suppressor in human malignancies is a reasonable possibility. Ectopic expression of E2A in E2A-deficient lymphomas rapidly leads to the induction of apoptosis, suggesting that indeed E2A is acting as a tumor suppressor (40). Similarly, overexpression of E2A in human T-ALL-derived cells results in cell death, providing further evidence for a key role of these proteins in lymphomagenesis (120).
E2A proteins also contribute to the regulation of cell cycle progression. Specifically, ectopic expression of E2A in fibroblasts blocks cells in the G1 phase (126). Consistent with a role for the E proteins in growth control, it was recently shown that Id2 is overexpressed in human pancreatic cancer (77). It was demonstrated previously that Id2 can reverse retinoblastoma protein (pRb)-mediated growth arrest in Saos-2 cells (69). Remarkably, ectopic expression of Id1 in primary keratinocytes leads to immortalization (2). Furthermore, deregulated expression of Id1 alone activates telomerase activity and promotes pRb phosphorylation in keratinocytes (2). How pRb phosphorylation is regulated by Id1 remains to be determined. Interestingly, both MyoD and E2A have the ability to regulate cyclin-dependent kinase inhibitor p21 expression (56, 128). It is conceivable, therefore, that Id1 controls pRb phosphorylation by indirectly controlling the expression of cyclin inhibitors. We propose that since the E proteins are targets of Id family members in these cells, they may contribute, directly or indirectly, to the regulation of telomerase activity. It will be important to dissect the role of E proteins in cell cycle progression, cell survival, and immortalization. Taken together, these observations suggest that E proteins regulate a wide variety of genes that are involved in many aspects of cellular differentiation and homeostasis.
Consistent with their role in activating the expression of B-lineage-specific genes, the E proteins, which include E2A (E47 and E12), E2-2, and HEB, have been shown to function as transcriptional activators (65). Recent studies have demonstrated that the activation domains of E2A are required for proper regulation of target genes in vivo. Removal of the N-terminal transactivation domains renders E12 incapable of inducing B-lineage-specific gene expression in a macrophage cell line (74). The transcriptional activity mapped to two domains, AD1 and AD2, located in the N-terminal half of the E2A protein (5, 65, 97, 132).
Interestingly, mutations that affected the transcriptional activity of the AD1 and AD2 domains in mammalian cells showed the same effects in the yeast S. cerevisiae, suggesting that the targets of these domains are highly conserved (97, 132). Two recent reports have shed some light on the transactivation properties of the E proteins. The coactivator p300, originally identified by virtue of its ability to interact with the E1A oncoprotein, has been shown to interact with E12 when bound to DNA (37). p300 has been shown to contain histone acetyltransferase (HAT) activity, thus raising the possibility that the E proteins recruit enzymes involved in chromatin modification (14, 37, 114). The interaction between p300 and E12 was mapped to the bHLH region of E12 and did not require the presence of the transactivation domains (37). These data are somewhat surprising since the bHLH domain does not contain intrinsic transactivation potential. p300 also has the ability to potentiate both AD1- and AD2-mediated transcription (130). More recently, studies have indicated that a conserved motif, LDFS, present in the AD1 domain directly interacts with a distinct nuclear HAT complex, termed SAGA (Fig. 2) (96). Many of the subunits that make up the SAGA complex are conserved from yeast to mammals. This includes a subset of the TATA binding protein-associated factors (TAFs), the HAT Gcn5, the Ada and Spt proteins, and an ATM family member called Tra1 (53, 54, 113). Specifically, amino acid substitutions within a conserved region of AD1 that disrupt SAGA binding in vitro abrogate transcriptional activation in vivo (96). Furthermore, AD1 cannot activate transcription in yeast strains lacking functional SAGA complexes (96). Taken together, these findings suggest that the E proteins may stimulate target gene transcription through interaction of their conserved activation domains with protein complexes that exhibit HAT activity.
We note that it remains to be determined whether solely the SAGA complex or p300 or both are essential to mediate E-protein-mediated transcription.
In addition to controlling Ig gene transcription, the Ig enhancers function to promote site-specific recombination. Since E2A binding sites are required for Ig enhancer activity, the potential role for these proteins in V(D)J recombination has been extensively studied. When E47 is expressed in a pre-T-cell line, it is capable of activating germ line transcription from the IgH enhancer and enhancing IgH DJ gene rearrangements (146). Overexpression of E47 activates IgH germ line transcription (31). More recently, E2A has been shown to be required for proper V(D)J recombination of certain TCR γ and δ loci (13). Remarkably, ectopic expression of E2A in a nonlymphoid line activates rearrangement of the Ig kappa light-chain gene (B. Romanow et al., submitted for publication). More specifically, overexpression of E12/E47 and RAG1 and RAG2 in embryonic kidney cells promotes endogenous Ig kappa VJ recombination utilizing variable regions that are interspersed over large segments of DNA (Romanow et al., submitted).
E2A proteins are also involved in DNA rearrangement at a later stage of B-cell development, class switch recombination (CSR). Immunohistochemical staining of human lymph nodes, spleens, and appendixes with an anti-E2A antibody showed that E2A protein was present in the dark zone of the germinal centers (50). This region is known to contain activated B cells that are actively undergoing CSR and somatic mutation (16). This is consistent with earlier reports demonstrating that high levels of E2A transcripts are present in splenic germinal centers (136). Similarly, in vitro evidence indicates that E2A protein levels increase significantly upon activation with various mitogens (131). A role for E2A-like proteins in isotype switch recombination was demonstrated both in cell lines and in primary B-lineage cells (50). Interestingly, E2A-deficient B-lineage cells were not able to promote class switch recombination as detected by genomic rearrangements (131). Thus, E2A proteins are required for proper Ig isotype switching. These data open the intriguing possibility that E2A is directly involved in regulating various aspects of DNA rearrangement of both TCR and Ig loci.
How might E2A control DNA recombination in lymphocytes? It is conceivable that E2A directly recruits a recombination factor(s) to specific sites within the target gene. In this model, the ability of E2A to regulate recombination may be distinct from its transcriptional activation capability. While such a scenario cannot be excluded, we favor a mechanism in which E2A proteins promote accessibility to target gene loci. E2A binding sites are present in, for example, both the promoter and enhancer elements of the Ig kappa locus. E2A could recruit a nuclear HAT activity such as p300/CREB binding protein (CBP) or a mammalian SAGA complex (PCAF or human GCN5 [hGCN5] complexes) to the target gene, which could alter the acetylation state of nucleosomal histones (113). Covalent modification could destabilize the target nucleosomes, perhaps by weakening contact between adjacent nucleosomes (91). This, in turn, could lead to a localized opening of chromatin and allow access by the recombination machinery.
MyoD and BETA2 also have the ability to bind p300/CBP (37, 105, 130). Interestingly, the p300/CBP-associated factor PCAF also directly interacts with MyoD in vitro and has the ability to enhance MyoD-directed transcription (129). The HAT activity of PCAF but not that of p300 was required for coactivation of p21 (129). Furthermore, when myotube cell extracts were affinity purified with an E-box oligonucleotide, MyoD, p300, and PCAF were specifically retained (129). What are the functional consequences of the MyoD-p300 and MyoD-PCAF interactions on muscle differentiation? Microinjection studies demonstrated that myogenic differentiation was inhibited by the addition of either anti-PCAF or anti-p300 antibodies (129). Conversely, overexpression of PCAF enhanced the differentiation of myoblasts to myotubes and cooperated with MyoD to promote increased conversion of fibroblasts into muscle cells (129). Hence, the physical association of MyoD with nuclear HATs is essential for its ability to direct skeletal muscle differentiation. What are the effects on chromatin structure at HLH target genes in vivo? A recent study has provided insight into how MyoD activates transcription by altering chromatin structure (48). Expression of a hormone-inducible form of MyoD activates transcription of myogenin, and MyoD was shown to remodel chromatin structure at several muscle-lineage-specific genes, including MCK, myoD, and myogenin (48). These findings suggest that a MyoD-HAT complex may be recruited to promote accessibility to the transcription machinery, similar to that described above for the E2A proteins (Fig. 2).
Sterol-responsive element-binding proteins 1 and 2 (SREBP-1 and -2) are bHLH-LZ transcription factors that are important for cholesterol-mediated induction of the low-density lipoprotein receptor (26, 143). These proteins, which are normally anchored in the endoplasmic reticulum membrane, are released by a cholesterol-induced protease and translocate to the nucleus, where they activate, in concert with Sp1, low-density lipoprotein receptor transcription (26, 143). The transcriptional synergy between SREBP-1a and Sp1 was not observed on naked DNA but required templates to be packaged into chromatin (106). More detailed biochemical analysis revealed that, in addition to the TAFs, a novel CBP-containing protein complex was required for transcriptional synergy in vitro (106). This complex contains at least 10 polypeptides in addition to CBP and, not surprisingly, exhibits HAT activity.
The Myc family of proteins has been shown to regulate a wide range of processes, including oncogenic transformation, apoptosis, and cellular differentiation (reviewed in references 42 and 64). Significant progress into the mechanism of transcriptional regulation by c-Myc has been achieved. A member of the SWI-SNF chromatin remodeling complex, hSNF5 (Ini1), interacts with the bHLH-LZ domain of c-Myc (30). Expression of a dominant-negative form of BRG1, another member of the SWI-SNF complex, blocks c-Myc transcriptional activation (30). Collectively, these data suggest that direct recruitment of SWI-SNF complexes by c-Myc is required for its transactivation function (Fig. 2).
The N terminus of c-Myc is required for transactivation and is essential for mediating its oncogenic effects (33). A novel ATM-related factor, TRRAP, interacts with the N terminus of c-Myc (99). TRRAP is closely related to Tra1, a yeast homolog that is present in several biochemically distinct HAT complexes, including SAGA (54). Furthermore, TRRAP has been shown to be a component of two mammalian SAGA-related activities: the PCAF and hGCN5 complexes (162). Interestingly, overexpression of dominant-negative forms of TRRAP blocked the ability of c-Myc to transform, in conjunction with activated ras, rat embryo fibroblasts (99). Taken together, these data indicate that c-Myc may be regulating gene expression, at least in part, by recruiting distinct HAT complexes through the association with its cofactor, TRRAP (Fig. 2).
AHR and Arnt are ligand-inducible mammalian HLH transcription factors that function together to activate gene transcription in response to the environmental toxin dioxin (reviewed in reference 59). AHR normally resides in the cytoplasm but translocates to the nucleus when liganded and forms a dimeric complex with Arnt. AHR-Arnt heterodimers recognize sequence elements known as dioxin response elements, which are present in a number of genes that regulate hydrocarbon detoxification, and activate their transcription. Both AHR and Arnt contain glutamine-rich transactivation domains in their C termini (71). Using a DNase I digestion–ligation-mediated PCR assay, Ko et al. have assessed effects on the chromatin structure of the promoter and enhancer regions of the CYP1A1 gene, a direct target of the AHR-Arnt heterodimer (78, 79). Upon agonist treatment, cells expressing wild-type AHR, but not a carboxyl-terminal deletion mutant, showed increased DNase I hypersensitivity at the promoter element (78). Interestingly, the CYP1A1 promoter does not contain any known AHR-Arnt binding sites, so the effect is likely mediated by AHR-Arnt bound at the upstream enhancer. The Q-rich activation domain present in AHR is thus required for altering chromatin structure 1 kb away (78). Interestingly, other types of transactivation domains can functionally substitute for the Q-rich activator of AHR. Chimeric AHR proteins containing acidic activation domains derived from VP16 or Arnt were capable of activating CYP1A1 transcription in a dioxin-inducible manner (79). Furthermore, both AHR activator chimeras were shown to facilitate nucleosome disruption and allow occupancy of the TATA box and NF1 site at the CYP1A1 promoter (79).
Elegant studies focused on the control of phosphate uptake in the yeast S. cerevisiae also have provided key insights into how HLH proteins regulate gene expression. When exposed to limiting phosphate conditions, transcriptional activation of the PHO5 gene, which encodes a secreted acid phosphatase, is induced (88). Two upstream activating sequences (UAS) called UASp1 and UASp2 are present within PHO5 and are required for its transcriptional induction (140). The HLH protein Pho4, which is absolutely essential for PHO5 activation, binds both UASp1 and UASp2 (15). Under high-phosphate conditions, when PHO5 transcription is repressed, an ordered series of four nucleosomes that occludes the UASp2 site is present in the PHO5 promoter (163). The UASp1 site, however, lies in between nucleosomes in a DNase I hypersensitive region. When phosphate becomes limiting, disruption of the four nucleosomes occurs and Pho4 binds to both UASp1 and UASp2 (158). Transcriptional induction of PHO5 requires the integrity of the Pho4 transcriptional activation domain (159). Interestingly, when the PHO5 TATA box is deleted, Pho4-dependent chromatin remodeling can still occur (43, 158). Thus, PHO5 transcription per se is not required for chromatin remodeling to occur.
Recently, Komeili and O'Shea have provided further insight into how Pho4 is regulated by differential modification (80). First, when the Pho80/Pho85 cyclin-dependent kinase becomes activated under conditions of adequate phosphate, five sites within Pho4 become phosphorylated, inactivating its ability to induce transcription of phosphate-responsive genes (80). Second, phosphorylation of residues present in the N-terminal portion of Pho4 promotes nuclear export by facilitating an interaction with the nuclear export factor Msn5 (80). Third, in vitro binding studies demonstrated that phosphorylation of a residue present in the nuclear localization signal of Pho4 disrupts its ability to bind the nuclear import protein Pse1 (80). Interestingly, Pho4 mutants that were constitutively nuclear were still capable of inducing acid phosphatase when grown under limiting phosphate conditions, suggesting yet another mechanism for Pho4 regulation. Indeed, phosphorylation of yet another residue was shown to disrupt binding to Pho2, another transcription factor required for expression of PHO5 (80). Thus, posttranslational modification of Pho4 regulates both its intracellular localization and its ability to cooperate with other transcriptional regulators.
Much progress has been made toward understanding how the class VI HLH proteins, including Enhancer of split and Hairy, function as repressors, and this work has been the subject of several recent reviews (46, 123). Once bound to DNA, Hairy recruits a corepressor known as Groucho (Fig. 2). This interaction is mediated through a conserved tetrapeptide sequence, WRPW/Y, present at the extreme C terminus of all HES family members. A recent study has shown that Groucho can functionally interact with the histone deacetylase (HDAC) Rpd3 (29). In vitro binding assays demonstrated that the interaction with Rpd3 is direct and is mediated by the glycine-proline-rich domain of Groucho (29). These data indicate that at least one mechanism by which Groucho can repress transcription is through the recruitment of a chromatin-modifying activity to target genes (Fig. 2).
The class IV proteins Mad and Mxi1 function as repressors when bound to E-box elements as a heterodimer with Max (148). Over the last several years, significant progress has been made toward the understanding of how Mad proteins repress transcription. A breakthrough came with the identification of mSin3A and mSin3B, mammalian homologs of the yeast repressor Sin3, as factors that could directly interact with the Mad repression domain (SID) (8, 147). Subsequently, a number of groups identified a protein complex containing Sin3, the nuclear receptor corepressor N-CoR, and HDAC (3, 63, 82). Thus, Mad regulates repression through a SID-dependent association with a complex containing HDAC activity (Fig. 2).
Class II HLH proteins such as Tal1, ABF-1, and Mist1 have been shown to function as transcriptional repressors as well (Fig. 2) (87, 98, 121). Tal1 and ABF-1 inhibit E47-dependent activation presumably through the formation of heterodimers which are incapable of activating transcription through E-box elements (98, 121). Both ABF-1 and Mist1 can repress transcription in mammalian cells when expressed as GAL4 DNA-binding domain fusions (87, 98). These data suggest that ABF-1–E2A and Mist1–E2A heterodimers can function to actively repress E-box-containing target genes. Another model that has been proposed to explain how Mist1 inhibits transactivation by MyoD suggests that Mist1 forms transcriptionally inactive heterodimers with MyoD (87). How ABF-1 and Mist1 repress transcription remains to be determined.
The class V HLH proteins, which include emc and Id, inhibit HLH activators through passive repression. These proteins contain a highly conserved HLH motif but lack an adjacent basic region necessary for DNA binding. Heterodimerization with these dominant-negative factors results in a form that is incapable of binding DNA (Fig. 2). Four known vertebrate Id gene products have been identified (reviewed in reference 93). The Id1 to Id4 gene products are closely related in their HLH regions and show similar affinities for the various E proteins (83). However, they differ in their expression patterns (72, 133, 134). The function of Id proteins has recently been addressed by using a gene targeting approach in mice. Whereas Id1 null mutant mice do not exhibit abnormalities, Id2−/− mice lack lymph nodes and the development of the natural killer (NK) cell lineage is severely perturbed (171, 173). On the other hand, we note that in the absence of E2A activity, NK cell development is significantly enhanced (62). Taken together, the data suggest that during early thymocyte differentiation the relative dose of E proteins and Id2 levels may ultimately determine cell fate, NK versus T-lineage cells. Intriguing data have recently revealed a key role for Id1 and Id3 in angiogenesis. Specifically, distinct tumors failed to grow and metastasize in Id1+/− Id3−/− null mutant mice, suggesting that these Id proteins are required for the invasiveness of vasculature structures (92).
How Id proteins are regulated is largely unknown. Recent studies have indicated that cell cycle-regulated phosphorylation of Id2 and Id3 by Cdk2 inhibits their ability to dimerize with the E proteins (36, 60). Ultimately it will be important to determine precisely how Id and E-protein levels are regulated by signals emanating from the cell surface and how their relative levels contribute to determine cell fate.
The class II HLH protein Twist was originally identified as a factor required for proper gastrulation and mesoderm formation in Drosophila. A number of reports have demonstrated that Twist has the ability to inhibit the differentiation of multiple cell types, including muscle and neurons. Twist appears to be able to inhibit myogenic differentiation by interfering with the activity of the myogenic transcription factors MyoD and MEF2. Several independent mechanisms have been proposed to explain the inhibitory effects of Twist on the muscle-specific transcriptional program. Twist, like Id, has the ability to repress MyoD function by titrating away E proteins, thus preventing the formation of functional MyoD–E-protein heterodimers (153). These authors further demonstrated that Twist inhibits muscle-specific gene activation by interfering with MEF2-mediated transactivation (153). This dually inhibitory mechanism of Twist may interfere with synergistic transactivation of target genes by the myogenic HLH proteins and MEF2 (153). Based on the observation that overexpression of E proteins and MEF2 does not relieve the inhibitory effect of Twist on MyoD activity, another inhibitory mechanism has been proposed (58). Twist was shown to directly interact with MyoD, MRF4, Myf-5, and myogenin. Surprisingly, MyoD and Twist associated via their basic regions (58). This unusual dimerization with Twist thus sequesters the myogenic HLH proteins in a transcriptionally inactive form.
Twist may also play a more general role in negative regulation of transcription. The amino terminus of Twist has recently been shown to mediate a direct interaction with the HAT domains of two different coactivators which are known to associate in vivo: p300 and PCAF (57, 172). In an in vitro HAT assay to monitor acetylation of purified histones, HAT activity of both p300 and PCAF was inhibited upon interaction with Twist (57). Overexpression of Twist in cell lines blocked PCAF-enhanced p300 autoacetylation, suggesting that, in vivo, HAT activity is impaired (57). Transient-transfection studies showed that the activity of several p300-dependent acidic activators could be dramatically reduced when Twist was coexpressed (57). These data demonstrate that p300- and PCAF-dependent gene expression may be downregulated in cells that express Twist, by directly inhibiting their acetyltransferase activity and possibly by preventing the interaction between PCAF and p300 (57). This novel mechanism of Twist-mediated transcriptional inhibition may help to explain its ability to interfere with the differentiation of diverse cell types.
Tal1 is a class II HLH protein that was originally identified as a result of a chromosomal translocation found in a subset of T-ALLs (9). Lmo2 is a member of the LIM-only family of transcription factors that is inappropriately expressed in T-ALL due to a chromosomal translocation (144). Targeted deletion of these genes in mice has demonstrated that both Tal1 and Lmo2 are required for erythroid development (127, 151, 166, 170). Studies in transgenic mice have shown that Tal1 and Lmo2 can collaborate to induce T-cell leukemia (84). By a binding-site selection strategy, a novel DNA-binding protein complex that contains the HLH proteins E47, Tal-1, zinc finger transcription factor GATA-1, and Lmo2 and Ldb1, two LIM domain-containing proteins, has recently been identified in erythroid cells (165). These five proteins bind to a bipartite sequence element composed of an E box and a GATA site. Complex binding required the integrity of both binding sites and was sensitive to the spacing between them. Reconstitution of this oligomeric complex by cotransfection experiments in COS cells demonstrated that it was capable of activating transcription of a reporter gene containing two tandem copies of the E box-GATA element. A protein complex present in a T-ALL line which is composed of Tal-1, Lmo, and GATA-3 was recently characterized (117). This complex has the ability to synergistically activate the RALDH2 gene in an E-box-independent fashion (117). Thus, HLH proteins may be associated with other transcription factors to regulate cell-type-specific gene transcription and tumorigenesis in a manner that may not require direct binding to E boxes.
HLH proteins are also known to be important for inducible expression of genes downstream of signal transduction pathways. Recently, a novel cDNA expression cloning technique based on retroviral gene transfer has been used to identify proteins involved in transforming growth factor β (TGF-β)-inducible expression of plasminogen activator inhibitor 1 (PAI-1), an extracellular matrix protein involved in cell adhesion and blood coagulation (68). A cDNA encoding the HLH protein TFE3 was isolated and found to be important for inducible expression of the PAI-1 promoter by TGF-β (68). An E-box element that binds TFE3 in vitro was identified in the PAI-1 promoter and was shown to be required for TGF-β-induced transcription in vivo (68). Transient-transfection analysis in a hepatocellular carcinoma cell line demonstrated that two transcription factors in the TGF-β signalling pathway, Smad3 and Smad4, synergized with TFE3 to activate PAI-1 in a TGF-β-dependent manner (68). The functional collaboration between TFE3 and the Smads does not appear to involve a direct protein-protein interaction, but gel shift analysis has shown that they can bind to adjacent sites on the PAI-1 promoter (68).
Studies of developmental pathways, cell growth, and cell death have in part become studies of transcriptional regulators. Efforts to identify the nature of these regulators have revealed families of proteins that are characterized by highly conserved DNA binding, dimerization, and transactivation and repression motifs. One of these is the HLH family. Despite the advances in understanding HLH structure and function, how these proteins regulate gene expression and recombination is largely unknown. What is the mechanism by which this particular class of regulators influences RNA polymerase or recombinase function? Also, much remains to be learned about how these proteins act in synergy with other families of transcriptional regulators to control gene expression and how they modulate chromatin structure. Finally, we will need to learn more about how the synthesis of these proteins is regulated, at both the transcriptional and posttranscriptional levels, by signals emanating from cell surface receptors.
We thank the reviewers for their helpful comments and suggestions which improved the quality of the manuscript.
M.E.M. is supported by a postdoctoral Cancer Biology training grant from the National Institutes of Health. C.M. is supported by grants from the National Institutes of Health.