Members of the helix-loop-helix (HLH) family of transcriptional regulatory proteins are key players in a wide array of developmental processes. Over 240 HLH proteins have been identified to date in organisms ranging from the yeast Saccharomyces cerevisiae to humans (6). Studies in Xenopus laevis, Drosophila melanogaster, and mice have convincingly demonstrated that HLH proteins are intimately involved in developmental events such as cellular differentiation, lineage commitment, and sex determination. In yeast, HLH proteins regulate several important metabolic pathways, including phosphate uptake and phospholipid biosynthesis (19, 67, 112). In multicellular organisms, HLH factors are required for a multitude of developmental processes, including neurogenesis, myogenesis, hematopoiesis, and pancreatic development (12, 86, 127, 179). The purpose of this review is to examine the structural and functional properties of HLH proteins.
E-box sites: elements mediating cell-type-specific gene transcription.
Transcription of the immunoglobulin heavy-chain (IgH) gene has long been known to be regulated, in part, by a cis-acting DNA element known as the IgH intronic enhancer (109, 156). Using in vivo methylation protection assays, investigators identified a number of sites in both the IgH and the kappa light-chain gene enhancers that were specifically protected in B cells but not in nonlymphoid cells (41). These elements shared a signature motif consisting of the core hexanucleotide sequence CANNTG and were subsequently dubbed E boxes (41). A total of five E-box elements are present in the IgH gene enhancer: μE1, μE2, μE3, μE4, and μE5. The Ig kappa enhancer contains three canonical E boxes, designated κE1, κE2, and κE3. E-box sites have subsequently been found in other B-cell-specific promoter and enhancer elements, including a subset of Ig light-chain gene promoters, the IgH and Ig light-chain 3′ enhancers, and, more recently, the λ5 promoter (110, 118, 156).
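Because the E box is defined only by the degenerate core CANNTG, locating candidate sites in a sequence is a simple pattern search. The sketch below is illustrative only: the function name find_e_boxes and the demo sequence are hypothetical (the demo is not a real enhancer), and a bare CANNTG match says nothing about whether a site is protected or functional in vivo.

```python
import re

# E-box consensus core CANNTG, where N is any of A, C, G, T.
# The lookahead lets finditer report overlapping matches.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, hexamer) for each CANNTG match in seq.

    The reverse complement of CANNTG is again CANNTG, so scanning one
    strand also finds every site on the opposite strand.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence for illustration; lowercase marks flanking bases.
    demo = "ttCAGGTGaaCACCTGggCATATGcc"
    for pos, site in find_e_boxes(demo):
        print(pos, site)
```

The single-strand scan is a deliberate simplification that works only because the CANNTG pattern is its own reverse complement.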
E-box elements have also been identified in promoter and enhancer elements that regulate muscle-, neuron-, and pancreas-specific gene expression. In muscle, for example, the muscle creatine kinase gene, the acetylcholine receptor α- and δ-subunit genes, and the myosin light-chain gene all require E-box elements for full activity (27, 51, 85). A number of genes whose expression is limited to the pancreas also require E-box sites for proper expression. The insulin and somatostatin genes, for example, contain E-box sites that, when multimerized, are sufficient to direct pancreatic β-cell-specific gene expression (168). More recently, E-box regulatory sites have been identified in a number of neuron-specific genes, including the opsin, hippocalcin, neuronal nicotinic acetylcholine receptor β2 subunit, and muscarinic acetylcholine receptor genes (1, 21, 52, 125).
E-box sites: cognate recognition sequence for HLH proteins.
Two proteins, termed E12 and E47, were originally identified as binding to the κE2/μE5 site (65, 102). They share a region of homology with the Drosophila Daughterless protein, the myogenic differentiation factor MyoD, members of the achaete-scute gene complex, and the Myc family of transcription factors (102). This stretch of conserved residues, known as the Myc homology region, appeared to be critical for the DNA-binding properties of E12 and E47 (102). The E12 and E47 proteins, which differ only within this Myc homology region, arise by alternative splicing of the E2A gene (157). This conserved sequence, modeled as two amphipathic α-helices separated by a flexible loop, was named the HLH motif and shown to function as a dimerization domain.
The HLH structure.
The solution structure of the basic HLH (bHLH)-leucine zipper (LZ) factor Max first confirmed the existence of the HLH motif (44). Subsequently, the three-dimensional structure of the E47 bHLH polypeptide bound to its E-box recognition site, CACCTG, was solved at 2.8-Å resolution (38). Analysis of the E47 crystal structure revealed a number of interesting features. The E47 dimer forms a parallel four-helix bundle that allows the basic region to contact the major groove (38). In addition to the basic region, residues in the loop and helix 2 also contact the DNA (38). The HLH dimer interface is stabilized by van der Waals interactions between conserved hydrophobic residues (38). The E47 dimer is centered over the E box, with each monomer interacting with either a CAC or a CAG half-site. A glutamate in the basic region of each subunit contacts the cytosine and adenine bases of the E-box half-site. An adjacent arginine stabilizes the position of the glutamate by contacting these nucleotides as well as the phosphodiester backbone. Both the glutamate and the arginine are conserved in most bHLH proteins, consistent with a role in specific DNA binding (6, 38, 102).
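Why a single E box presents a CAC half-site to one E47 monomer and a CAG half-site to the other follows from reading each half of the hexamer on the strand where it begins with CA. The sketch below models this with plain strings; the function names reverse_complement and half_sites are hypothetical, and the model ignores flanking bases and all protein-DNA contacts.

```python
# Base-pairing table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def half_sites(e_box: str) -> tuple[str, str]:
    """Split a 6-bp E box into its two half-sites, each read 5'->3'.

    The left half-site is taken directly from the top strand; the right
    half-site lies on the bottom strand, so it is the reverse complement
    of the last three top-strand bases.
    """
    e_box = e_box.upper()
    if len(e_box) != 6:
        raise ValueError("expected a 6-bp E box")
    return e_box[:3], reverse_complement(e_box[3:])


if __name__ == "__main__":
    # CACCTG is the E47 recognition site described in the text.
    print(half_sites("CACCTG"))  # ('CAC', 'CAG')
```

For a palindromic E box such as CACGTG, the same function returns two identical CAC half-sites, whereas the E47 site CACCTG yields the two different half-sites, CAC and CAG, noted above.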
Classification of the HLH proteins.
Owing to the large number of HLH proteins that have been described, a classification scheme based upon tissue distribution, dimerization capabilities, and DNA-binding specificities was devised (Fig. 1) (101). Class I HLH proteins, also known as the E proteins, include E12, E47, HEB, E2-2, and Daughterless. These proteins are expressed in many tissues and are capable of forming either homo- or heterodimers (103). The DNA-binding specificity of class I proteins is limited to the E-box site. Class II HLH proteins, which include members such as MyoD, myogenin, Atonal, NeuroD/BETA2, and the achaete-scute complex, show a tissue-restricted pattern of expression. With few exceptions, they are incapable of forming homodimers and preferentially heterodimerize with the E proteins. Class I-class II heterodimers can bind both canonical and noncanonical E-box sites (103). Class III HLH proteins include the Myc family of transcription factors, TFE3, SREBP-1, and the microphthalmia-associated transcription factor, Mi. Proteins of this class contain an LZ adjacent to the HLH motif (66, 177). Class IV HLH proteins define a family of molecules, including Mad, Max, and Mxi, that are capable of dimerizing with the Myc proteins or with one another (7, 22, 174). A group of HLH proteins that lack a basic region, including Id and emc, defines the class V HLH proteins (18, 39, 47). Class V members are negative regulators of class I and class II HLH proteins (18, 39, 47). The defining feature of class VI HLH proteins is a proline in their basic region; this group includes the Drosophila proteins Hairy and Enhancer of split (76, 141). Finally, class VII HLH proteins are defined by the presence of the bHLH-PAS domain and include members such as the aromatic hydrocarbon receptor (AHR), the AHR nuclear translocator (Arnt), hypoxia-inducible factor 1α, and the Drosophila Single-minded and Period proteins (34).
Recently, another method of classifying HLH proteins has been described (6). Based on the amino acid sequences of 242 HLH proteins, a phylogenetic tree was constructed to group family members according to their evolutionary relationships (6). Four major groups, A through D, comprising more than 24 protein families, were identified (6). The groupings were based upon DNA-binding specificity as well as conservation of amino acids at certain positions (6). As the number of HLH proteins continues to grow, this evolutionary or "natural" classification may provide a more accurate and convenient means of categorization.