Coordinated regulation of gene expression is a fundamental process that depends on the binding of transcription factors to a gene's cis-regulatory sequences. Absolutely required for transcription initiation of metazoan protein-coding genes is the core promoter, the region of DNA 35–40 bp upstream and downstream of the transcription start site (TSS) [1]. The core promoter contains sequence elements, referred to as "core promoter motifs," which interact with the basal transcription machinery, including RNA polymerase II and the TFIID complex [2]. In recent years, it has become clear that the core promoter, rather than playing a passive role in the spatial and temporal regulation of gene expression, is an important active partner in these events [3]. For instance, different promoter sequences are found preferentially associated with certain functional classes of genes, with genes expressed at particular developmental stages, and with genes expressed in the germ line versus the soma [5]. Various tissue-specific members of the TATA box-binding protein (TBP) family, such as the TBP-related factors (TRFs), bind preferentially to certain core promoters [4]. There is also substantial evidence for preferred or specific promoter-enhancer interactions, whereby a distal cis-regulatory module (CRM, or "enhancer") can stimulate activity from one promoter, but not another [9].
A number of mechanisms have been demonstrated to restrict the activity of a CRM to a particular promoter, including insulator elements [11], insulator-bypass or promoter targeting elements [12], short-range repression [14], chromatin-mediated silencing [11], and preferential interaction with promoters containing certain core promoter motifs [10]. The relative prevalence of each of these mechanisms is unknown, as in most cases is a detailed understanding of how they function. In particular, the molecular basis underlying core promoter preference has not been clearly defined.
The existence of CRM-promoter specificity is all the more remarkable given that it is maintained even though other promoters are sometimes closer to, or even interposed between, a CRM and its target. In fact, the latter may be a much more common scenario than typically credited, as it can occur not only with respect to the regulation of different genes, but also with respect to alternative promoters of the same gene. In humans, it is estimated that upwards of 50% of all genes have at least one alternative promoter [7], and there is growing evidence that alternative promoter usage plays important roles in both development and disease [21]. It is unknown how frequently such alternative promoters are regulated by distinct CRMs, but the number could be large; Kimura et al. suggest that over 1800 sets of alternative promoters are regulated in a tissue-specific fashion.
Except for the case of bidirectional promoters (those that regulate divergently transcribed genes; [22]), few studies have focused specifically on promoters of neighboring genes or on alternative promoters, and little is known about the mechanisms that direct promoter usage choice. Baek et al. recently analyzed a subset of human promoters by dividing them into four categories (CpG-island-containing and non-containing, single and alternative promoters), and observed differences in sequence properties, evolutionary conservation, biological roles, and degree of usage. Their data suggest that there may be differences among promoters depending on their relative position in the gene, with more upstream promoters being more highly expressed and more CpG-rich than the more downstream promoters. Interestingly, they found that the TATA box and DPE core promoter motifs were more common in single than in alternative promoters. However, a similar study by Kimura et al. found little difference in the frequency of the TATA box between the two groups, although they observed a large difference in the prevalence of CpG islands. Differences in the full set of promoters used and in how the promoters were grouped (the latter study did not look separately at the CpG-containing and non-containing promoters) may account for the discrepancies in the reported results. A number of other sequence motifs, of unknown functional significance, were also seen to be differentially represented among the promoter classes [24]. These studies suggest that there might be fundamental differences in the structure and function of single versus alternative promoters that could have broad implications for understanding how transcription is coordinated within the genome.
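To make the grouping used in such comparisons concrete, the following minimal Python sketch sorts promoters into the four categories (single or alternative, CpG-rich or not) and reports the fraction in each group that carries a TATA-box-like sequence. It is an illustration only and does not reproduce the method of either cited study. The function names, the observed/expected CpG threshold of 0.6, and the simplified `TATA[AT]A[AT]` pattern are all assumptions introduced here; published analyses typically use position weight matrices, positional constraints relative to the TSS, and full CpG-island criteria.

```python
import re
from collections import defaultdict

# Simplified TATA-box-like consensus (an assumption for illustration; real
# analyses usually score position weight matrices at a fixed TSS offset).
TATA_RE = re.compile(r"TATA[AT]A[AT]")


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def classify_promoters(promoters, cpg_threshold=0.6):
    """Group promoter sequences by (usage, CpG class).

    promoters: iterable of (gene_id, promoter_sequence) pairs, one pair per
    distinct promoter. A gene with one promoter is 'single'; a gene with
    several is 'alternative'. The threshold is a hypothetical default.
    """
    per_gene = defaultdict(list)
    for gene_id, seq in promoters:
        per_gene[gene_id].append(seq)

    groups = defaultdict(list)
    for seqs in per_gene.values():
        usage = "single" if len(seqs) == 1 else "alternative"
        for seq in seqs:
            cpg = "CpG" if cpg_obs_exp(seq) >= cpg_threshold else "nonCpG"
            groups[(usage, cpg)].append(seq)
    return dict(groups)


def tata_fraction(seqs):
    """Fraction of sequences containing the simplified TATA pattern."""
    if not seqs:
        return 0.0
    hits = sum(bool(TATA_RE.search(s.upper())) for s in seqs)
    return hits / len(seqs)


if __name__ == "__main__":
    # Invented toy sequences, far shorter than real promoter windows.
    toy = [
        ("g1", "GGCGCGCGTATAAAAGCG"),
        ("g2", "ACGCGCGCGC"),
        ("g2", "TTTATAAATTTTTT"),
        ("g3", "AAAATTTTAAAA"),
    ]
    for category, seqs in sorted(classify_promoters(toy).items()):
        print(category, len(seqs), tata_fraction(seqs))
```

Keeping the CpG split separate from the single/alternative split matters for the comparison above: pooling CpG-containing and non-containing promoters, as one of the two studies did, can mask or create apparent differences in motif frequency.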
To estimate how important promoter sequence might be in dictating promoter usage choice and in mediating CRM-promoter specificity, and as a prelude to experimental studies of the mechanisms of CRM-promoter interactions, we undertook a global bioinformatics analysis of Drosophila melanogaster promoters. We found that there are marked differences in nucleotide composition and motif prevalence between single promoters and alternative promoters, and between the most 5' alternative promoters and more downstream alternative promoters. We also observed that adjacent genes on the chromosome are more likely than expected to have promoters with a similar motif profile, and that this similarity in promoter configuration correlates with co-regulated gene expression. Our results suggest that promoter composition may play a larger-than-appreciated role in coordinating gene expression both between nearby genes and between multiple transcripts of the same gene.
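The distinction between the most 5' alternative promoter and more downstream ones depends on strand, as does the orientation of the core promoter window itself. The sketch below shows one way these two bookkeeping steps could be handled; it is not taken from the analysis described here. The function names, the 0-based coordinate convention, and the default flank of 40 bp (chosen to match the 35–40 bp definition of the core promoter given above) are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def core_promoter(chrom_seq, tss, strand, flank=40):
    """Return the window of `flank` bp on each side of a 0-based TSS.

    The window is clipped at the sequence ends and returned 5'->3' with
    respect to the transcript, so minus-strand promoters are
    reverse-complemented.
    """
    start = max(0, tss - flank)
    end = min(len(chrom_seq), tss + flank + 1)
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


def label_promoters(tss_positions, strand):
    """Label each distinct TSS of one gene.

    Returns a dict mapping TSS position to 'single', 'most_5prime', or
    'downstream'. On the plus strand the most 5' TSS has the smallest
    coordinate; on the minus strand it has the largest.
    """
    ordered = sorted(set(tss_positions), reverse=(strand == "-"))
    if len(ordered) == 1:
        return {ordered[0]: "single"}
    labels = {pos: "downstream" for pos in ordered[1:]}
    labels[ordered[0]] = "most_5prime"
    return labels
```

For example, `label_promoters([100, 500, 300], "-")` marks position 500 as the most 5' promoter, whereas the same coordinates on the plus strand mark position 100.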