|Home | About | Journals | Submit | Contact Us | Français|
Although it is well established that cis-acting regulatory variation contributes to morphological evolution between species, few concrete examples of polymorphism affecting developmental patterning within species have been demonstrated. Early embryogenesis in Drosophila is initiated by a gradient of Bicoid morphogen activity that results in differential expression of multiple target genes. In a screen for genetic variation affecting this process, we surveyed 96 wild-type lines of Drosophila melanogaster for polymorphisms in binding sites within 16 Bicoid cis-regulatory response elements. One common polymorphism in the orthodenticle (otd) early head enhancer is associated with a complex series of indels/substitutions that define two distinct haplotypes. The middle region of this enhancer exhibits an unusual pattern of nucleotide diversity that does not easily fit into standard models of selection and demography. Population Gene Expression Maps, generated by extracting binary expression profiles from normalized embryo images, revealed a ventral reduction of otd transcript abundance in one of the haplotypes that was recapitulated in expression of transgenic constructs containing the two alleles. We thus demonstrate that even a process as robust as early developmental patterning is affected by standing genetic variation, intriguingly involving otd, whose morphogenetic function bicoid is thought to have displaced during dipteran evolution.
The basic patterning processes of early embryogenesis have been conserved for tens of millions of years in the genus Drosophila. There is nevertheless a constant tension between robustness and flexibility during ontogeny, and this gives rise to a surprising evolutionary lability of detailed features of the patterning process that is observed between genera and species (Peel et al., 2005; Damen, 2007). The fundamental morphogen Bicoid (Bcd), that sets up the gradient of anterior-posterior (A-P) positional information (Ephrussi and St Johnston, 2004) is actually an invention of higher diptera that seems to have taken over a role largely performed by another transcription factor, Orthodenticle (Otd), in more primitive insects (Schroder, 2003; Lynch et al., 2006). One of the target genes of Bicoid, the pair-rule gene even-skipped (eve), initiates segmentation, and expression in the second stripe of eve is controlled by an enhancer that has evolved extensive substitutions despite retaining its overall function between divergent Drosophila species (Ludwig et al., 2005). Even closely related species of Drosophila show alterations in the timing of early pattern formation (Kim et al., 2000). These observations suggest that there is also likely to be functional polymorphism for early embryonic patterning within a species, but the extent and nature of such variation has not been documented.
Bicoid protein is distributed in a gradient, and activates transcription of target genes at different positions along the A-P axis (Driever and Nüsslein-Volhard, '88a,b; Driever et al., '89; Ephrussi and St Johnston, 2004; Spirov et al., 2009). Any differences in the amount of bcd mRNA deposited at the anterior pole of the embryo as a result of maternal genotype or nutrition, or any physiological variation in the diffusion of Bcd protein and mRNA after fertilization, will alter the gradient and hence the position and timing of transcriptional responses. Yet early embryonic patterning mediated by Bcd takes place robustly despite variation for embryonic size and shape, and across a range of temperatures, and segmentation defects are presumably as rare in nature as in the laboratory (Houchmandzadeh et al., 2002). Manu et al. (2009) demonstrate that cross-regulation among the gap genes is essential for robustness during development, but it remains to be established whether this homeostatic mechanism actually permits cryptic variation to be maintained in populations, or whether evolutionary canalization simply reflects the purging of any variation that would tend to disrupt embryogenesis.
In order to identify potential sources of genetic variation in the response to Bcd, we surveyed 16 well-characterized short cis-regulatory regions (Ochoa-Espinoza et al., 2005) associated with a dozen genes, in a total of 96 highly inbred lines from two populations of Drosophila melanogaster. Among several polymorphisms identified, one in the promoter of the aforementioned orthodenticle gene stands out as unusual in structure and diversity, and we subsequently show that it is part of a functionally distinct common haplotype dimorphism. The results build a bridge between studies of evolution and development between species, and quantitative genetic variation within (Stern, 2000), and demonstrate that even the earliest and most highly conserved developmental patterning processes are genetically variable.
Individual, female D. melanogaster were collected in the summer of 2004 and established as isofemale lines. The North Carolina population (NC2) was caught by us in a peach orchard near West End, North Carolina; the Maine (ME) population was collected by Martin Kreitman in a blueberry field close to Cherryfield, Maine. Lines were bred to isogenicity through 15–20 generations of full sib-mating. Five African D. melanogaster lines (MW6, MW12, MW29, MW46 and MW51), and one D. sechellia line (Robertson 3C) were provided by Corbin Jones (UNC). The D. simulans line NC112T was provided by Richard Lyman and Trudy Mackay (NCSU). All lines were kept on 12hr light–dark cycles in vials with 10mL standard cornmeal medium supplemented with yeast.
All genotype data for 48 NC2 and 48 ME lines was obtained by direct sequencing of PCR products amplified from genomic DNA extracted from a single female fly. PCR amplification and sequencing was performed by Genaissance Pharmaceuticals (now Cogenics Division of Clinical Data, New Haven, CT); primer sequences will be provided upon request. All fragments were unidirectionally sequenced using the forward PCR primer except for Dichaete that was bidirectionally sequenced. Base calling and initial contig assembly was performed with Phred and Phrap (Codon-Code, Deadham, MA). Alignment of multiple contigs, manual sequence editing, and single nucleotide polymorphism (SNP) calling was performed with BioLign software (Tom Hall and Ed Buckler; http://www.mbio.ncsu.edu/BioEdit/bioedit.html). D. simulans (Release 1.0, April 2005) and D. yakuba (Release 2.0, November 2005) outgroup sequences were produced by the Washington University School of Medicine Genome Sequencing Center, St. Louis, MO.
Additional sequencing in the otd Bicoid cis-regulatory response element (BCRE) region was done in six African D. melanogaster lines, one D. simulans line, and one D. sechellia line. PCR primers (otdSSM F: 5′-gggaggctaggtaggcgtag-3′; otdSSM R: 5′-ggacaacaatgaaggcgatt-3′) were designed that would amplify in all three species; amplification was performed with Taq polymerase (Promega, Madison, WI) at an annealing temperature of 54°C for 35 cycles. For the D. simulans and D. sechellia lines, PCR products generated from genomic DNA were directly sequenced with the forward and reverse PCR primers, using BigDye Terminator v. 3.1 (Applied Biosystems, Foster City, CA) on ABI 3700 sequencers at the NCSU Genome Research Laboratory. Because the African lines are not isogenic, PCR products were first cloned using the Promega pGEM-T Easy TA cloning kit, and four clones per line were sequenced using plasmid DNA as the template.
The sequence was annotated for Bcd binding sites in two ways. Approximately one-third of the BCREs had been previously characterized, with binding sites identified and functionally verified in some way (see Gao and Finkelstein, '98; Driever et al., '89; Howard and Struhl, '90; Small et al., '92; Rivera-Pomar et al., '95; Kuhnlein et al., '97; La Rosee et al., '97; Fujioka et al., '99). The other BCREs had been identified as potential Bcd targets by computational means (see Ochoa-Espinoza et al., 2005); these BCREs were searched for potential Bcd binding sites using TFSEARCH (Yutaka Akiyama: http://www.rwcp.or.jp/papia) using a threshold score of 80. TFSEARCH makes use of the TRANSFAC transcription factor database (Heinemeyer et al., '98).
All sequences have been deposited in the NCBI PopSet database as alignments and can be accessed using the following accession numbers: CG9571 (FJ713898); Dichaete (FJ713992); eve1 (FJ714081); eve2 (FJ714176); gt1 (FJ14270) gt23 (FJ14362); hairy0 (FJ714453); hairy2 (FJ714543); hairy7 (FJ714632); hbP2 (FJ71425); knirps (FJ14810); otd early (FJ714902); salmBE (FJ715017); slpA (FJ715200); slpB (FJ715200); tll (FJ715292); otd early MW lines (FJ714921); otd early—D. simulans and D. sechellia (FJ714923).
DNA polymorphism data was analyzed in DnaSP version 4.0 (Rozas et al., 2003). Nucleotide diversity was estimated as the average number of pairwise differences per site (π) and as Watterson's estimate of Theta (θW) based on the number of segregating sites. Tests of neutrality (Tajima's D, Fu and Li's D) were also performed in DnaSP; D. simulans sequence was used as the outgroup for Fu and Li's D test. Divergence between D. melanogaster and D. simulans was calculated in DnaSP using the Jukes and Cantor method. Linkage disequilibrium (LD) was assessed in Tassel version 2.0.1 (Ed Buckler, http://www.maizegenetics.net) using Fisher's exact test to assess the significance of the squared correlation of the allele frequencies (r2). Only sites with a minor allele frequency of 0.05, and that were present in at least 30 alleles were used in the analysis. Haplotype networks were constructed using TCS version 1.21 (Clement et al. 2000), which estimates phylogenetic networks using the statistical parsimony algorithm (Templeton et al., '92). Recombination rates were estimated using the Drosophila Recombination Rate Calculator (http://petrov.stanford.edu/cgi-bin/recombination-rates_updateR5.pl: Singh et al., 2005). A summary of haplotype sequences is shown in Table 1.
The 925 bp otd early head enhancer was PCR amplified from D and I haplotype lines with the primers F: 5′-AACCGAATTCCCTAGGCCCGA-GATTAAGTACCG-3′ and R: 5′-AACCGGCG-CGCCGGATACAGATCTCGTGGATTGC-3′. The resulting PCR fragments contained an EcoRI and AscI site on the 5′ and 3′ ends respectively. Fragments were cloned into pAOE1 (a gift of A. Ochoa-Espinosa), which contains the eve basal promoter fused to the lacZ gene and the α-tubulin 3′ UTR (Ochoa-Espinoza et al., 2005). Transgenic lines were generated using standard injection techniques by the Duke University Model System Genomics facility (Durham, NC). Five and six independent lines were obtained for the D and I haplotype reporter constructs, respectively.
Expression of endogenous otd mRNA was analyzed by in situ hybridization using an antisense otd probe generated from DNA clone RE10280 (Gen-Bank ID: BI169182) obtained from the Berkley Drosophila Genome Project (BDGP) EST project (www.fruitfly.org/EST). Reporter gene expression was analyzed using an antisense lacZ probe. Females were allowed to lay eggs on apple juice–agar plates for 3hr, after which embryos were dechorionated and fixed. Fixation and hybridization was performed as described by Tautz and Pfeifle ('89).
After in situ hybridization, embryos were mounted on microscope slides in 70% glycerol/1× PBS. Lateral images of blastoderm-stage embryos were captured at 20× magnification, using a Nikon Eclipse E800 microscope with camera attachment. Because we were interested in differences between haplotypes, and not within lines, single embryos from 99 wild type (WT) lines (38 Hap D, 61 Hap I) and 11 transgenic lines (6 otdED, 5 otdEI) were analyzed. To analyze differences in the ventral domain of otd expression, a line was drawn from dorsal to ventral near the anterior border of the expression domain. Pixel intensity along the line was calculated using NIH ImageJ software (http://rsb.info.nih.gov/ij/) and plotted against the percent egg height (dorsal margin = 0%, ventral margin = 100%). Note that dark colors have a lower pixel intensity than light colors; therefore, as staining decreases the relative pixel intensity increases (black would be 0 pixel intensity, a complete absence of color). The inflection point was determined as the percent egg height at which at least four points in a row increased in pixel intensity above background levels for that image. The difference in distribution of inflection points among D and I haplotypes within wild type or transgenic lines was assessed using a nonparametric Mann–Whitney/Wilcoxon sum-rank test implemented in JMP Genomics (SAS Institute, Cary, NC). A second replicate of single images from all lines was performed, with similar results (data not shown). To analyze differences in the anterior boundary of the otd expression domain, a line was drawn from the anterior tip to posterior (center of the pole cells). Pixel intensity along the line was calculated using NIH ImageJ software (http://rsb.info.nih.gov/ij/) and plotted against the percent egg length (posterior = 0%, anterior = 100%). The border of the expression domain was determined as the percent egg length at which at least four points in a row increased in pixel intensity at the expression boundary. The difference in the position of anterior expression boundaries among D and I haplotypes within wild type or transgenic lines was also assessed using a Mann–Whitney/Wilcoxon test.
Population Gene Expression Maps (popGEMs) are based on carefully standardized and aligned images (Kumar et al., 2002; see also www.flyexpress.net). In this standardization process, the embryo is first enclosed in a rectangle with sides tangent to the perimeter of the embryo. This is followed by rectangle rotation to ensure that anterior of the embryo is on the left and that the A-P axis is parallel to the horizontal axis. After cropping the image, all embryos are size-standardized to a 320 × 128 pixel dimension, such that each row of pixels in the image as well as each image ends on a byte, word and long word boundaries, which obviates the need for pixel padding. The aspect ratio chosen is based on the average aspect ratio found in the collection of original in situ images in FlyExpress (www.flyexpress.net). The size-standardized images are then subjected to expression pattern extraction from the foreground, which eliminates the background (including noise; see methods described in Kumar et al. (2002) and Gurunathan et al. (2004). This process produces a binary expression extraction (BEE) for each in situ image. All binary patterns from each class of image (Hap D, Hap I, otdED, and otdEI) are summarized in a two-dimensional grid, with each spatial coordinate containing the number of images (from different individuals) exhibiting a binary expression. In a popGEM, the color depth reflects the relative fraction of individuals that show expression at the given location. popGEMs were generated from the same images as above, for a total of 99 WT images (38 Hap D, 61 Hap I) and 11 transgenic images (6 otdED, 5 otdEI).
The sequenced BCREs averaged 726 bp in length and have been shown to drive reporter gene expression in transgenic embryos with boundaries of expression at specific positions along the embryo (Hoch et al., '90; Howard and Struhl, '90; Small et al., '92; Rivera-Pomar et al., '95; Kuhnlein et al., '97; La Rosee et al., '97; Gao and Finkelstein, '98; Fujioka et al., '99; Ochoa-Espinoza et al., 2005). Previous studies of these BCREs suggest that gene expression along the A-P axis is owing to combinatorial inputs of Bcd and other transcription factors in the regulatory network (Ochoa-Espinoza et al., 2005). Levels of sequence diversity were estimated on the basis of average nucleotide heterozygosity (π) and Watterson's estimate of theta (θW) (Moriyama and Powell, '96; Hutter et al., 2007). Overall levels of polymorphism in the BCREs are low (Table 2; avg π=0.0047, avg θW = 0.0046), but within the range of other noncoding regions in D. melanogaster (Andolfatto, 2005), and the expected correlation between diversity and recombination rate first described by Begun and Aquadro ('92) was observed (not shown). We had postulated that there may be a relationship between sequence diversity and level of expression along the A-P axis, but sequence diversity was too low to detect such a trend.
Our data set contained 119 potential Bcd binding sites that cover approximately 10% of the sequence. 34 of these sites had been functionally characterized previously, and 85 sites were predicted based on sequence similarity to known Bcd binding sites (see Materials and Methods). Thirteen sites, three of which have been functionally characterized, were polymorphic in our sample (Table 3). Only one of the SNPs, the G/C polymorphism in the eve stripe 2 CRE, has been previously reported by Ludwig and Kreitman ('95). In general, polymorphisms tend to be situated outside the core 5′-TAATCC-3′ domain where most deviations from the consensus sequence have been found to reside in naturally occurring binding sites (Lifanov et al., 2003). However, SNPs do occur in the core region: the site in the hairy stripe 7 enhancer is a good example of this. Whether these polymorphisms affect Bcd binding or target gene expression awaits further study.
Theory predicts that the action of natural selection should leave a molecular signature in the region around selected loci that is different from the pattern of molecular variation seen under neutral conditions. To examine whether any of the BCREs have been subject to past natural selection, we used Tajima's D and Fu and Li's D statistics to test for a departure from neutrality (Haddrill et al., 2005; Begun et al., 2007). These tests indicate that the null hypothesis of neutrality cannot be rejected for most BCREs, with the exception of the orthodenticle (otd; also known as ocelliless, oc) early head enhancer. The otd BCRE shows a positive and significant Tajima's D statistic, and is a clear outlier among the rest of the BCRE's (Table 2). Andolfatto (2005) demonstrated that Tajima's D values in D. melanogaster tend to be negatively skewed, making the positive value for the otd BCRE all the more interesting. In this case, a positive Tajima's D is the result of an excess of polymorphic alleles that segregate at intermediate frequency, as seen in a plot of the allele frequency spectrum (Fig. 1A). Furthermore, sliding window plots of π and Tajima's D reveal a high level of diversity within a 260 bp block in the center of the otd enhancer that corresponds to the region of elevated values of Tajima's D (Fig. 1B).
Unusually for D. melanogaster, this otd enhancer region also shows a large block of LD with nearly perfect association among alleles over the highly polymorphic 260 bp region (Fig. 1C). LD across the 16 other BCREs revealed no additional examples of such extensive haplotype block structure, including fragments with similar levels of polymorphism to the otd BCRE (data not shown), and this level of divergence exceeds that seen in our earlier demonstration of haplotype dimorphism in random genomic fragments of D. melanogaster (Teeter et al., 2000). The haplotype network diagram (Fig. 2A) shows two clear haplotype groups that are separated by a deep branch of 12 mutational steps. The haplotypes are further distinguished by several indel polymorphisms, and we use the presence of a major indel (16 bp) to define the main groups, denoted as D and I (see also Table 1). The polymorphisms that cause the strongly positive Tajima's D value, plus four indels, all appear in nearly perfect LD within the central stretch of 260 bp (Fig. 1C). There are additional mutational events distinguishing alleles within each haplotype group but these occur outside the 260 bp stretch. The more common haplotype (I) is present in approximately 60% of the samples, and allele frequencies do not differ substantially between two populations (Maine: 64% I, North Carolina: 54% I). We also sequenced a small sample of five African alleles, and observed both haplotypes (data not shown), implying that the dimorphism predates the bottleneck associated with the recent global spread of the species.
The same region where we find the strong haplotype structure appears to be part of a separate, complex insertion–deletion event leading to high divergence between D. melanogaster and its sibling species D. simulans and D. sechellia. The 163 base pair (bp) region between sites 363 and 516 in D. melanogaster is only 126 bp long in D. simulans and 119 bp in D. sechellia, of which, only short stretches can be aligned with the D. melanogaster sequence (Fig. 2B). Even without counting the indels, divergence between D. melanogaster, and the homologous region in D. simulans, D. sechellia, and D. yakuba, is increased (Fig. 2, and data not shown). In the 260 bp region, divergence between D. melanogaster and D. simulans is about 14%, almost three times higher than the average for non coding regions reported by Begun et al. (2007).
The polymorphism and divergence pattern at the otd BCRE is thus characterized by three findings. First, there are two clear haplotypes with high divergence between them. Second, there is no polymorphism in the central 260 bp region within haplotype groups, even though both are at an intermediate frequency and there is a large sample size. And third, there is a high level of divergence relative to the sibling species D. simulans and D. sechellia in the same region. Each of these findings can result either from neutral processes such as enhanced mutation or a bottleneck, or from selective events. Haplotype structure with high divergence between the haplotypes could be caused by a population bottleneck, admixture, balancing selection or a soft selective sweep. Lack of polymorphism could be owing to a recent bottleneck or positive or negative selection. High divergence to the outgroup species could be caused by an enhanced mutation rate or recurrent positive selection. Although the data do not allow a definitive conclusion as to which of these scenarios is most likely, they are suggestive of a complex history of both selective and demographic influences.
The otd/oc gene has pleiotropic roles in embryonic head, central nervous system, and adult eye development (Finkelstein et al., '90), and selection could affect any or all of these processes. To see whether it could feasibly affect embryonic patterning, we examined the expression of otd at the cellular blastoderm stage of 99 wild-type inbred lines by in situ hybridization (Gao and Finkelstein, '98), 38 with the D haplotype and 61 with the I haplotype. Visual examination of staining in embryos suggested a tendency toward differential intensity of staining in the ventral portion of the expression domain as epitomized by the two embryos in Figure 3A. Specifically, embryos with the D haplotype appeared to have reduced staining in the ventral region, whereas I haplotype embryos showed a strong band of expression all the way to the ventral margin.
To quantify this difference, embryos were stained and photographed, and NIH ImageJ software was used to measure pixel intensity along a line drawn from the dorsal to the ventral edge of each embryo, near the anterior border of the expression domain (Fig. 3A). Plots of pixel intensity vs. percent egg height (dorsal = 0%, ventral = 100%) were generated and the inflection point was determined as the percent egg height at which at least four points in a row increased in pixel intensity above background levels (Fig. 3B). The distribution of inflection points for wild type animals is shown above the x-axis in Figure 3C.
A Mann-Whitney U test indicates that the distribution of inflection point location along the dorsal-ventral axis was significantly different among embryos with different otd haplotypes (two-tailed t-test, P = 0.0007). Note that this procedure is independent of differences in staining intensity among embryos and that biases introduced by imperfect orientation of the embryos on the microscope slide would be randomized over the two classes and tend to reduce the true signal of differential ventral expression.
To further explore the expression differences between D and I haplotypes we used a novel mode of digital analysis, popGEMs. Each individual embryo image was standardized by forcing the embryo into a 320 × 128 pixel rectangle, as this aspect ratio corresponds to the average seen for thousands of embryos in the FlyExpress database (Gurunathan et al., 2004). A single best expression signature converts each image into a black/white BEE in which each pixel indicates either presence or absence of transcript spatial expression. Stacking of images from embryos with expression of the same gene from different lines results in popGEMs, wherein each pixel is now associated with a probability that the gene is expressed in that portion of the embryo in the sample. The otd popGEMs for the D and I haplotypes are shown in Figures 4A and 4B, and subtraction of D from I results in the difference plot in Figure 4C. The shades of red (negative values) indicate locations where otd is more commonly expressed in D than I lines, whereas green shades (positive values) indicate where expression is more common in I lines. If D and I are identical then this ΔpopGEM should show a random scattering of red and green colors against a white background.
This is not the case, as the two haplotypes show substantial differential expression of two types. The extensive green band at the anterior tip of the embryo in Figure 4C indicates that I haplotypes tend to have expanded anterior expression, and this can be seen clearly in the representative embryos in Figure 3 as well as contrasting Figures 4A and 4B. We also generated a histogram showing the area, in pixels, of expression in each embryo of the two otd classes, and this too indicates a significant anterior expansion of the area of otd expression in I haplotype embryos (data not shown). We confirmed this difference using NIH ImageJ software to determine the position of the anterior expression boundary (see Materials and Methods). The distribution of anterior expression boundary position for wild type animals is shown above the x-axis in Figure 3D. A Mann–Whitney U test indicates that the distribution of the position of the anterior boundary along the A-P axis was significantly different among embryos with different otd haplotypes (two-tailed t-test, p = 0.04); specifically, the boundary shows an anterior shift in I haplotype embryos.
By contrast, the red bands in Figure 4C are more difficult to explain as they suggest extensive dorsal and ventral expression in the D lines, which is not obvious from the popGEM images and is not what we would predict based on the results in Figure 3C. popGEMs give a visual representation of the probability that any embryo within the sample will exhibit expression at a particular location. Thus, the expression of otd at the ventral margin in the D haplotype popGEM in Figure 4A indicates that some embryos in the sample do have staining in this region; however, this spatial analysis method is not capturing quantitative differences in expression. The red bands in Figure 4C are more likely explained by an overall elevation of expression in D lines throughout the domain of expression common to both the D and I haplotypes. A substantial portion of the ventral domain is green, indicating more frequent ventral expression in I haplotype lines, consistent with the results in Figure 3C.
To confirm that polymorphism in the BCRE is responsible for the differential expression of otd, transgenic constructs where either otd enhancer haplotype drives expression of the lacZ reporter gene were generated by P-element mediated transformation. Six independent insertions of the deletion construct, and five of the insertion construct were examined. The ventral expansion of expression by the I haplotype was confirmed, showing a considerably enhanced differentiation compared with the wild type alleles. The bars below the x-axis in Figure 3C indicate that the inflection point for ventral expression is between 78 and 92% egg height for the I lines, and between 58 and 74% egg height for 5 of the 6 D lines (two-sided Mann–Whitney U test, P = 0.01). Although there appears to be a slight anterior shift of the anterior expression boundary in transgenic I haplotype lines (Fig. 3D), the trend was not significant (two-sided Mann–Whitney U test, P = 0.5) suggesting either that the position of the anterior boundary of otd expression is influenced by more than just this enhancer element or that this effect is subject to chromosomal position effects at the transgene integration site. We also suspect that variation in egg shape is not fully accounted for by the image transformation, and this may reduce the effectiveness of the popGEM method, which is most useful as a screening tool. The popGEMs and difference plot in Figures 4D–F nevertheless confirm the ventral expansion, despite the much reduced sample size relative to the wild-type comparisons.
One or more of the dozen sequence differences between the two haplotypes may be responsible for the expression differences. The Bcd binding site polymorphism (Table 3) is a clear candidate for mediating the anterior expansion in the I lines, but there are no obvious polymorphic binding sites for D-V transcription factors such as Snail that might explain the loss of ventral expression in the D lines. Attempts to demonstrate differential binding of Bcd protein to the two sequences were not successful, likely because Bcd is expected to bind the putative site with low affinity. More comprehensive dissection of the element is needed to establish the molecular basis of the various aspects of differential expression.
An outstanding question in the field of evolutionary developmental biology regards the contribution of cis-regulatory mutations to phenotypic evolution (Hoekstra and Coyne, 2007; Wray, 2007). It has been proposed that certain types of phenotypic changes, notably those involving morphology, may be more likely to occur owing to adaptive mutation in cis-regulatory regions, rather than in the protein-coding regions of genes (Jacob and Monod, '61; Carroll, 2000; Stern, 2000; Gompel et al., 2005). This argument largely rests on the hypothesis that the modular nature of cis-regulatory regions reduces the likelihood of pleiotropic mutations by limiting the effect of a mutation to a small part of the overall transcriptional profile. In contrast, mutations in protein coding regions are expected to affect the function of the gene product in all expression domains, increasing the likelihood of a deleterious effect.
Until recently, a detailed analysis of the contribution of cis-regulatory mutations to phenotypic change has been difficult to come by. Unlike protein-coding regions of genes, regulatory sequences are notoriously difficult to identify, and predictions regarding the effects of mutations are hard to make. However, over the last several years the large volume of genomic data available to researchers, combined with improvements in technology, has made identification and analysis of cis-regulatory regions much more tractable. Several instances of polymorphisms in cis-regulatory elements have been shown to drive differential expression, and to be associated with complex disease susceptibility, and population genetic analyses have shown how a combination of balancing and diversifying selection as well as mutation pressures influence the distribution of allele frequencies in different human populations (Hahn et al., 2004; Rockman et al., 2003, 2004, 2005). At least three clear instances of multiple mutations affecting cis-regulatory modules responsible for divergence in expression between Drosophila species have been described (Ludwig and Kreitman, '95; Sucena et al., 2003; Jeong et al., 2008). To our knowledge, this study provides the first demonstration of intraspecific cis-regulatory polymorphisms affecting the expression of a gene that is crucial for early embryonic patterning.
It is not clear what population genetic processes might explain the unusual nature of the sequence diversity in the otd enhancer. To explain all three observations (two divergent haplotypes, no polymorphism in the central 260 bp region of each haplotype, high divergence relative to the sibling species), at least two forces must be invoked. A purely neutral scenario would be a locally enhanced mutation rate combined with a strong recent bottleneck. The complicated history of North American D. melanogaster (which has certainly experienced bottlenecks and possibly admixture: Haddrill et al., 2005), accommodates a wide range of neutral processes. The difficulty here is that we need to assume that two entirely independent extreme processes have affected the same short sequence region, namely a large local mutation rate and an extreme shape of the local genealogy owing to a bottleneck.
Balancing selection maintains multiple alleles at intermediate frequency within a population by heterozygote advantage, frequency dependent selection, or selection on alternate alleles under different environmental conditions (environmental heterogeneity). Loci under old balancing selection will maintain common polymorphism in the region surrounding the selected polymorphism, and hence show an increased level of diversity that is organized in haplotype groups. However, in this case, we would expect neutral polymorphism to segregate within each of these haplotype groups. Surprisingly, the highly polymorphic 260 bp block shows no variation within either of the haplotype groups. To make the old balancing selection scenario compatible with the low polymorphism within the haplotypes, one needs to postulate additional recent positive selection events within both haplotype groups. Also, balancing selection does not explain the high divergence with D. simulans, for which we would still need to invoke an enhanced mutation rate or recurrent positive selection in the D. melanogaster or D. simulans lineage.
Another possibility is that a particular class of selective sweep, namely a soft selective sweep from recurrent beneficial mutation, has allowed two or more beneficial alleles at the same locus to quickly increase in frequency in the population as recently suggested at the tan locus (Jeong et al., 2008). This would be expected to create a strong haplotype structure, with very low polymorphism within the two haplotypes, and it is likely to result in a high Tajima's D (Pennings and Hermisson, 2006a). However, unlike balancing selection, a soft sweep is not expected to lead to high divergence between the two haplotypes. This aspect of the data, as well as the high divergence to the outgroups, suggests a locally increased mutation rate, but as with the neutral model, two unusually strong phenomena must be invoked. Interestingly, a high mutation rate in itself increases the probability of soft sweeps (Pennings and Hermisson, 2006b), but a further conundrum is that the two alleles would be expected to have nearly identical fitness effects. This is not obviously reconciled with the clear functional difference between the haplotypes at the expression level, though it is possible that selection acts on a pleiotropic aspect of the enhancer function and that the embryonic expression phenotype is effectively neutral.
It is however, difficult to imagine that the expression differences described here would be without subtle consequences. Fowlkes et al. (2008) recently showed that precise standardization of embryonic shapes and staging relative to a control gene can be used to decipher spatial interactions that are important for the network of gene function in embryogenesis. Our results suggest that similar approaches applied to a diversity of wild type lines may provide insight into the variability of embryonic patterning even within a species. A fascinating outcome of such studies will be better understanding of the buffering mechanisms that ensure that development is robust, and of the canalization mechanisms that result in species-specific patterns of gene expression (Lott et al., 2007).
The authors thank Erin Kennerly for help collecting flies; Martin Kreitman, Corbin Jones and Richard Lyman for generously providing fly lines; and Ian Dworkin for help with DNA collection and sequencing. We thank Ian Dworkin and Stefan Laurent for helpful discussions regarding the data. J. H. is supported by the Vienna Science and Technology Fund (WWTF), S. K. by the NHGRI, and G. G. by the US NIH (2R01-GM61600) and the Australian Research Council (ARC-DP0880204).
Grant sponsor: US National Institutes of Health; Grant number: 2R01GM61600; Grant sponsor: Australian Research Council; Grant number: DP0880204.