|Home | About | Journals | Submit | Contact Us | Français|
Protein-coding change is one possible genetic mechanism underlying the evolution of adaptive wing colour pattern variation in Heliconius butterflies. Here we determine whether 38 putative genes within two major Heliconius patterning loci, HmYb and HmB, show evidence of positive selection. Ratios of nonsynonymous to synonymous nucleotide changes (ω) were used to test for selection, as a means of identifying candidate genes within each locus that control wing pattern.
Preliminary analyses using 454 transcriptome and Bacterial Artificial Chromosome (BAC) sequences from three Heliconius species highlighted a cluster of genes within each region showing relatively higher rates of sequence evolution. Other genes within the region appear to be highly constrained, and no ω estimates exceeded one. Three genes from each locus with the highest average pairwise ω values were amplified from additional Heliconius species and races. Two selected genes, fizzy-like (HmYb) and DALR (HmB), were too divergent for amplification across species and were excluded from further analysis. Amongst the remaining genes, HM00021 and Kinesin possessed the highest background ω values within the HmYb and HmB loci, respectively. After accounting for recombination, these two genes both showed evidence of having codons with a signature of selection, although statistical support for this signal was not strong in any case.
Tests of selection reveal a cluster of candidate genes in each locus, suggesting that weak directional selection may be occurring within a small region of each locus, but coding changes alone are unlikely to explain the full range of wing pattern diversity. These analyses pinpoint many of the same genes believed to be involved in the control of colour patterning in Heliconius that have been identified through other studies implementing different research methods.
Mechanisms by which biological information is encoded in living systems are incredibly diverse, and the resulting biological diversity is no less astounding. The relative contribution of different mechanisms to adaptive phenotypic variation has been the focus of many recent evolutionary and developmental studies and the subject of much debate [1-4]. One outstanding question is the relative preponderance of coding versus regulatory mutations in generating phenotypic change. Numerous studies have shown rapid evolution of protein coding regions of genes, for example in hominid lineages . For other vertebrates, frequently cited examples of morphological variation controlled by amino acid changes include repeated involvement of Melanocortin-1 receptor (Mc1r) mutations in mice, birds, and reptiles in adaptive melanic pigmentation [6-8].
Increasingly, the importance of cis-regulatory elements (CREs) as the "creative" force in evolution has assumed center stage [9,3]. The theoretical basis for this hypothesis is that every gene possesses a modular arrangement of 5' CREs that control temporal and spatial patterns of expression, and that these regions are less subject to the negative pleiotropic effects constraining coding changes, allowing adaptive evolution to proceed at a higher rate . Oft-cited examples of cis-regulatory change as the molecular basis for phenotypic variation come from Drosophila, notably studies of the genetic control of yellow pigmentation patterning in adult wings of Drosophila biarmipes . Other well-examined morphological changes include the pelvic spine and armoured plate in sticklebacks and bristles in Drosophila larvae [12,13].
The Heliconius radiation presents an excellent opportunity to test these competing hypotheses in evolutionary genetics. These mimetic, aposematic butterflies are highly phenotypically varied and complex, yet patterns are under simple genetic control. Patterning variation exists on multiple levels, from inter-specific, through geographic variation within species to intra-population polymorphism, and can be attributed to strong selection for mimicry that also drives adaptive radiation [14,15]. Most importantly, the variation observed represents several cases of adaptive divergence and convergence, providing multiple sets of comparisons to test whether evolution recruits the same genetic pathways to produce similar phenotypic traits.
Joron et al. identified, through linkage mapping, a conserved locus responsible for control of wing patterning in three Heliconius species, suggesting that a conserved region controls pattern evolution across the genus . In Heliconius melpomene, HmYb and HmB loci are two major regions known to control the hindwing bar element and the red forewing band, respectively. The corresponding loci in Heliconius erato, the co-mimic of H. melpomene, have been mapped to the same genomic regions as HmYb and HmB, suggesting the genetic mechanisms giving rise to shared phenotypes across co-mimicking races may be similar. Furthermore, these loci are characterised by multiple alleles responsible for many different phenotypes, such that we might expect the genes responsible to have undergone multiple rounds of strong positive selection. HmYb and HmB have recently been annotated, and between 20 and 24 genes have been predicted in each locus [17,18].
It is currently unknown whether coding or regulatory regions within these loci are primarily responsible for wing pattern diversity. In parallel studies we have been examining gene expression differences between races of Heliconius melpomene. To complement this approach, we here examine signatures of evolution in coding regions by estimating the ratio (ω) of nonsynonymous (dN) to synonymous changes (dS) in 20 and 18 putative patterning genes within the HmYb and HmB loci, respectively . We first analysed BAC sequences, Expressed Sequence Tag (EST) transcripts, and 454 transcriptome sequences, most of which are available for three focal species (Heliconius melpomene, Heliconius numata, and Heliconius erato) and two races of H. melpomene (H. m. cythera and H. m. malleti). Three selected genes from each locus were then sequenced for the entire coding region in several lineages of Heliconius and a closely related genus when possible. The results identify clusters of three or four genes with elevated rates of evolution, but fail to provide strong statistical evidence for diversifying selection. Nonetheless, genes highlighted here as having high rates of protein coding evolution are good candidates for further investigation into their role in wing patterning, as has been recently found for the UV opsins and their role in Heliconius vision .
Using available sequences of three Heliconius species and two races, we estimated the substitution rate for genes within the two patterning locus regions through 1) pairwise analyses with the YN00 program and 2) one-ratio branch model with CODEML in the PAML suite . Since the only intra-specific comparison in these analyses is between the trans-Andean races, H. m. malleti and H. m. cyrbia, we consider recent recombination within sequences to be unlikely. Hence, we have explicitly accounted for recombination only in subsequent analyses with larger numbers of intra-specific comparisons. Generally low pairwise dN and dS values suggest that the majority of genes in the HmYb and HmB loci are slow evolving and highly constrained (Figure (Figure1).1). Furthermore, no pairwise or background (one-ratio model) ω values of any gene in either locus exceeded one, which is the hallmark signature of positive selection. Conventionally, neutrality is inferred when ω = 1, purifying or negative selection when ω < 1, and positive or directional selection when ω > 1. However, the average ω value across sites in a gene is unlikely to be above 1 unless a large proportion of sites in the gene are under diversifying selection .
Analysis of the 20 genes annotated within the HmYb region highlights three genes on the reverse strand and in close proximity to one another that show the highest average pairwise and background ω values: HM00021, LRR (HM00024), and fizzy-like (fzl; HM00025) (Table (Table1)1) . Analysis of background ω values calculated across a phylogeny of available BAC sequences using the one-ratio model also pinpointed these same three genes, as well as HM00016, as having elevated values (Table (Table1;1; Figure Figure2).2). Of these genes, fzl yielded the highest lineage-wide ratio (ω = 0.456), which is also reflected in its highest average pairwise value (ω = 0.480; Table Table1).1). All pairwise alignments of both HM00021 and fzl reveal more nonsynonymous changes than synonymous changes, while the number of nonsynonymous changes never exceeded synonymous changes for analyses of HM00016. Previous studies suggest ω > 0.5 to be the threshold at which adaptive evolution can be reasonably suspected . Therefore, based on their ω estimates, genes HM00021, LRR and fzl were selected for further analyses using expanded datasets to explore possible adaptive evolution within the Heliconius radiation. These genes are clustered together along the HmYb BAC walk on the reverse strand (Figure (Figure22).
In the HmB locus no pairwise, lineage-wide, or specific branch model tests estimated ω values that exceeded one, though DALR (HM01004), kinesin (HM01018), HM01022, and epoxide hydrolase (HM01014), exhibited pairwise ω values approaching 0.5 and high background ω values (Table (Table2;2; Figure Figure3).3). However, epoxide hydrolase (background ω = 0.219), has been slow evolving within H. melpomene, with no nonsynonymous changes and only three synonymous changes between H. m. malleti and H. m. cythera, which is insufficient to accurately estimate ω (Table (Table2).2). HM01022 and DALR show stronger signatures of selection in more recently diverged taxa, with the number of nonsynonymous changes exceeding synonymous changes between the two H. melpomene races (Figure (Figure3ATable3ATable Table2).2). Similar to the trend observed in HmYb, genes possessing high ω values in HmB are found on the same strand and are located in close proximity to each other along the locus, albeit interspersed with genes showing lower ω values (Figure (Figure33).
The kinesin gene, in addition to yielding the highest background ω value (ω = 0.393; Table Table2;2; Figure Figure3B)3B) and second highest average pairwise ω value (ω = 0.401; Table Table2),2), is the only gene to possess more nonsynonymous changes than synonymous changes across all pairwise analyses (Table (Table2).2). It is important to note that kinesin is an unusually large gene, and most pairwise analyses only encompassed 2 kilobases at the 3' end, since 454 contigs of H. m. malleti, H. m. cythera, and H. numata only span this region (Table (Table2).2). SMART (Simple Modular Architecture Research Tool) functional domain searches located a conserved kinesin motor domain at the 5' end, with the 3' end likely conferring protein binding specificity (for discussion of kinesin gene structure, see ) . Thus, our analyses do at least include the region in which functionally significant changes are most likely to occur. Due to an abundance of insertions and deletions in exon 13 between H. melpomene and H. erato, the amino acid sequences were aligned with ClustalW and only nucleotides of aligning amino acids were used in tests of selection (Additional file 1).
Genes singled out from preliminary analyses were further examined by amplifying sequences across other Heliconiini lineages. In addition to HM00021 and LRR, trehalase 1A (HM00006) was included in further analyses since its gene expression was found to be associated with wing patterning (R. Reed, University of California, Irvine, personal correspondence; N.J. Nadeau et al., unpublished data). Unfortunately, we were unable to amplify fzl sequences from other races and species, precluding further investigation. For the HmB locus, Mad (HM01000) was included in these expanded analyses due to its potentially important role as a mitotic checkpoint protein, suggesting potential interactions with fzl in the HmYb locus . Mad had been excluded from preliminary analyses due to a lack of 454 raw reads for the gene, and we were unable to amplify DALR from other races and species.
Since genealogical topologies may differ from published Heliconius phylogenies, trees for each of the candidate genes were reconstructed by maximum likelihood using the dnadist program in Phylip 3.69. Results show that all six candidate genes produce an overall topology similar to a recent Heliconius phylogeny . The only discrepancy was placement of H. charithonia, a monomorphic species, within H. erato for Mad. However, because the bootstrap value for the branch grouping H. charithonia with H. erato cyrbia was low (34%), the published phylogeny was used for analysis.
The estimated background ω value for HM00021 across 11 Heliconius taxa (ω = 0.43; Figure Figure4B)4B) was comparable to the preliminary estimate that included only the three focal Heliconius taxa (ω = 0.39; Table Table1).1). Nucleotide sequence alignments of exon 5 revealed several indels, some across many lineages, others specific between two species (Additional file 2A, B). However, indels were removed within reading frame from sequences used in all preliminary and expanded PAML analyses. Most notably, there is a 12 nucleotide insertion in H. charithonia, H. doris, H. numata, and H. m. malleti relative to the H. melpomene and H. erato BAC sequences (Additional file 2A). The amino acid alignments of exon 5 suggest that H. doris is an outgroup to the main Heliconius clades at this gene (percent similarity between H. melpomene and H. doris, 70%; and between H. melpomene and H. erato, 81%; Additional file 2B).
Since phylogeny-based methods of detecting positive selection do not take into consideration the effects of recombination on sequence evolution, gene alignments were analyzed using GARD (Genetic Algorithms for Recombination Detection) to estimate recombining breakage points [27,28]. Trees generated from the six recombinant fragments for HM00021 were then used in FEL (Fixed Effects Likelihood) tests for positive selection (Table (Table3).3). Following previous analyses, we take p < 0.1 as indicative of positive selection . FEL analyses of GARD trees and the one neighbour joining tree found one codon with a signature of positive selection at 0.1 > p > 0.01 (Table (Table3).3). The inclusion or exclusion of race sequences did not affect detection of sites under directional selection (Table (Table33).
Due to the difficulty of amplifying the entire LRR gene (HM00024) in H. erato and other species in the pupal mating clade, analyses included only H. charithonia (a monomorphic species closely related to H. erato), three races of H. melpomene (malleti, cythera, plesseni), and BAC sequences of H. erato and H. numata. The background ω (0.23; Figure Figure4)4) across this topology is similar to that of the preliminary background analysis (ω = 0.20; Table Table2).2). Three breakage points were detected by GARD analyses, and FEL tests of selection using these recombinant partitions found no sites undergoing positive selection (Table (Table3).3). Phylogeny-wide analyses reveal a low background ω value for trehalase 1A (ω = 0.074), and FEL tests of selection provided no evidence for selection.
Due to difficulty in amplifying the entire length of kinesin from many species, analysis of kinesin was limited to exon 13 (1748 bps), the largest exon and also the region that includes the predicted "binding domain" of the protein. Because exon 13 contained an abundance of non-frame-shifting indels, amino acid sequences for all races were first aligned with ClustalW and nucleotide sequences of only overlapping amino acid positions were included in the PAML, GARD, and FEL analyses. The H. doris kinesin sequence is again highly divergent from the other Heliconius species sampled (Additional file 1).
Kinesin ω values are the highest estimated across all genes (Figure (Figure5).5). Background, H. melpomene branch (malleti, cythera, H. melpomene BAC), and H. erato branch (lativitta, cyrbia, and H. erato BAC) ω values are similar: 0.39, 0.35, and 0.40, respectively (Figure (Figure5B).5B). GARD analyses generated eight predicted breakage points, two of which showed statistically significant topological incongruence at p = 0.05 (Table (Table3).3). One codon was found to show evidence for positive selection when accounting for recombination, while two codons were identified when a single NJ tree was used for analysis (Table (Table33).
HM01022 showed no evidence for recombination (Table (Table3),3), but also did not show any support for positive selection. Estimated dN/dS ratios for Mad are low, suggesting that the gene is also highly constrained (Figure (Figure5A).5A). Background ω was 0.034 and branch ω values for H. melpomene (cythera and plesseni races only) and H. erato were 0.071 and 0.0001, respectively (Figure (Figure5).5). Accounting for recombination did not alter findings that there is no evidence for positive selection in HM01022 or Mad (Table (Table33).
The genetic basis of adaptation in natural populations is an area of considerable current interest in evolutionary biology. One major area of debate is the relative importance of protein-coding versus regulatory change in morphological adaptation. The Heliconius butterflies show both divergence and convergence in wing pattern, making them an interesting system to test these opposing hypotheses. Here, we explore protein-coding change across two genomic regions controlling wing pattern variation across multiple species of Heliconius. The role of these regions in wing pattern evolution across multiple species of Heliconius makes this a promising system to search for signatures of rapid evolution. However, our results provide little convincing evidence for accelerated protein-coding evolution by conventional interpretation. Although statistical support is mixed and ω values are never significantly greater than 1, a handful of genes do show evidence for adaptive evolution based on conservative estimates of ω ≈ 0.5. Nonetheless, both genomic regions studied provide some evidence for a clustering of genes with elevated ω, suggesting the influence of selection on the region. Furthermore, there is intriguing concordance between the genes highlighted here as having signatures of rapid evolution, and those already identified by gene expression and population genetic surveys.
Of the 20 genes analyzed for signatures of positive selection in the HmYb locus, only three genes, HM00021, LRR, and fizzy-like, consistently showed accelerated rates of evolution. And of the 18 examined genes in the HmB locus, DALR, HM01022, and kinesin yielded the highest ω values from several pairwise and lineage-wide tests of selection. It is striking that these candidate genes group together along the BAC walk: at the 5' end of the HmB locus and at the 3' of the HmYb locus, suggesting that in both cases a specific region in each locus may have undergone selection. While HM00022 is also found proximally to these genes along the same strand in HmYb, its ω values are relatively much lower. Since HM00022 codes for a protein that serves a known important function (Enoyl-CoA Hydratase), it may have experienced greater selective pressure to be functionally constrained compared to other genes discussed here . These high ω values may indicate selection on the protein sequence of several functionally important genes, but perhaps more likely also indicate fixation of "nearly neutral mutations" rather than positive selection . Genetic hitch-hiking due to repeated selective sweeps, perhaps on neighbouring cis-regulatory regions or untranslated regions (UTRs), could have locally reduced the effective population size, leading to fixation of nearly neutral coding changes . Thus, repeated selective sweeps in close proximity to these genes would be expected to give a pattern of elevated ω values as observed. Regions of low recombination also experience a higher rate of nonsynonymous polymorphism and greater rate of accumulation of nonsynonymous substitutions over time that better reflect effective population size effects rather than positive selection [30-33]. Higher ω values could also therefore result from lower rates of recombination in these gene regions.
The general assumption of the analyses performed in this present study is that differences between sequences are fixed differences between the populations studied. Of course, this will not be true in many cases, and in particular, closely related races will share polymorphisms. Thus, what tests of positive selection assume to be substitutions reflecting fixed changes may actually be "pre-fixations" . Taking this into consideration, Kryazhimskiy and Plotkin suggest that ω values below one may in fact still be a signature of strong or weak positive selection, and this is supported by studies finding statistical evidence for adaptive evolution when ω > 0.5 with limited sequence comparisons [23,34]. In some cases single nucleotide polymorphisms were identified in the sequences between H. m. cythera and H. m. malleti, in which case the shared base was chosen for sequence analyses (see Methods section). This may lead to either artificially high or low ω estimates depending on whether polymorphic sites account for silent or non-silent variation. HM00021 (HmYb) was the only gene for which high polymorphism observed in 454 sequence reads was of concern. The potential consequences of "majority-rule" or "minority-rule" selection of polymorphic sites was addressed by running analyses on many possible combinations of polymorphic sites, with no strong effect on estimated ω values.
The ranges of many Heliconius species are broken up into divergent wing pattern races separated by narrow hybrid zones. Analysis of population differentiation for the HmYb locus between such wing pattern races showed a striking peak of population differentiation around the LRR gene in several pairwise comparisons . Here, we have also highlighted this region, with HM00021, LRR (HM00024), and fizzy-like (HM00025) being relatively fast evolving.
In the HmYb locus, the only gene with tentative evidence for positive selection was HM00021. Its analyzed data set is more robust as it includes broad representation of closely related species outside of the H. melpomene and H. erato branches. Yet, estimated rates of synonymous and nonsynonymous substitutions are likely a conservative estimate of the true degree of sequence divergence, since alignments revealed several indels in the fifth and last exon of HM00021 (Additional file 2). While the majority of indels exist between species, a conspicuous 12 base-pair insertion in 454 contigs of H. m. malleti was found in exon 5 relative to the H. melpomene and H. erato BAC sequences (Additional file 2A). The method by which ω is estimated fails to capture and quantify the effects of insertions and deletions as part and parcel of "protein-coding" mutations, as they are theoretically equally capable as point mutations in affecting phenotype. While detailed examination of the distribution of synonymous and nonsynonmyous substitution sites in HM00021 revealed no discernable pattern, indels are notably only found in exon 5. However, little more can be inferred about HM00021's candidacy as a hindwing pattern switch gene since it shows no homology to any known gene and does not contain recognisable protein motifs. Amplified sequences that contain indels maintain an open reading frame until the predicted 3' stop codon, suggesting that HM00021 is functionally active.
Though we were unable to successfully amplify sequences of fizzy-like from related species and races, initial ω estimates for fzl are amongst the highest, identifying it as a promising candidate gene. Furthermore, the gene shows highly complex patterns of alternative splicing between races of H. melpomene that correlate with hindwing phenotype (N.J. Nadeau et al., unpublished data). Strong 5' UTR divergence in fzl has been observed between races of H. melpomene, perhaps hinting at 5' UTR involvement in regulation of alternative splicing (N.J. Nadeau et al., unpublished data). While a great diversity of regulatory mechanisms for alternative splicing exist, including usage of regulatory modules that have been found in exons and introns, recent studies point to alternative 5' UTRs as important mechanisms in differential, complex, and highly targeted gene regulation [35,36]. Such findings, in conjunction with results in this present study, suggest that this gene may be a target for selection, either within the coding region, or on neighbouring non-coding regions that regulate splicing in fzl.
In population genetic surveys between two Peruvian races of H. melpomene, a region of the HmB locus surrounding the kinesin gene was shown to be highly differentiated . Even more compelling, gene expression studies showed parallel expression changes between races of H. melpomene and H. erato in exon 13 of the kinesin gene and in-situ hybridization experiments suggest consistent spatial and temporal patterns of gene expression on the forewing . At kinesin, rates of evolution are relatively high along both the H. erato and H. melpomene branches, though ω values do not significantly differ from each other or from the background ratio (Figure (Figure5B).5B). Alignments of kinesin sequences amplified from many species and races revealed not only an abundance of SNPs, but also a profusion of indels that maintained open reading frames, suggesting that rapid sequence evolution had not simply resulted in non-functional proteins. Between the BAC sequences of H. erato and H. melpomene across exon 13, there are at least nine indels ranging in size from one to eight amino acids (Additional file 1A). Surprisingly, indel patterns do not agree with overall sequence divergence patterns in that H. melpomene and H. charithonia, species belonging to each of the two main clades, appear to share many of the amino acid indels (Additional file 1A). Together, the evidence supports kinesin's strong candidacy as a potential HmB switch gene.
Not surprisingly, these very genes of interest identified in preliminary BAC analyses as the most "fast evolving," such as fzl, DALR, and kinesin, were difficult to amplify across species for further phylogeny-based analyses. Thus, in some ways our methods are biased against the most interesting genes. Fortunately however, analyses using BAC sequences available for three Heliconius species include all genes in the region, so are not biased in this way.
The present study did not find strong support for protein-coding changes, at least those detectable through codon-based analyses, to be responsible for the control of major wing pattern elements in Heliconius. However, these analyses have been valuable in highlighting regions within each locus that show some signature of selection. Such patterns hint at the possibility of selection primarily acting on non-coding or regulatory regions found between or proximal to those genes, which have reduced effective population sizes in surrounding genomic regions. Interestingly, the identification of these gene regions corroborates the findings of related research efforts.
Over recent years, a paradigm has emerged for the origin of novel morphological adaptations that involves co-option of genes with conserved functions through novel regulatory interactions. However, this paradigm has arisen from studies using methods that are inherently biased towards discovering roles for conserved genes, such as the use of cross-staining antibodies. Here, we have found higher rates of sequence evolution in genes showing little or no homology to known proteins. The strong candidate gene for the yellow band (HmYb), fizzy-like (HM00025) shares at most only 26% amino-acid identity with other members of the fizzy-family. Findings from this present study raise the suspicion that some genes involved in wing colour patterning of Heliconius may be evolving so rapidly that homology cannot be easily established.
Protein-coding sequences of all predicted genes within the HmYb and HmB loci were used to search the NCBI Lepidoptera nucleotide and EST databases using tBLASTn to acquire Bicyclus, Papilio, Bombyx, Spodoptera and any related Lepidoptera nucleotide sequences. We searched consensus EST contigs available on Butterflybase http://butterflybase.ice.mpg.de/ and 454 transcriptome sequences using local BLAST, which were available for the following species/races: Heliconius melpomene cythera (short read archive accession number SRX005618), Heliconius melpomene malleti (SRX005617), Heliconius numata population 1 (Tarapoto, Peru, Huallaga biogeographic region; SRX014029), Heliconius numata population 2 (Yurimaguas, Peru, Ucayali biogeographic region; SRX014035), and Heliconius erato .
Using CodonCode alignment software, 454 consensus sequences and raw reads of each species or race for each gene were aligned against the BAC sequence to ensure that the transcripts were from the genes of interest. After eliminating misaligning transcripts, sequences were assembled into one contig per taxon per gene.
Contigs were truncated to give a core region of maximum overlap between all species and compiled into input files formatted for analysis using PAML (Phylogenetic Analysis by Maximum Likelihood; Ziheng Yang, Department of Biology, University College London). Input files were analyzed with the YN00 program using the method of Yang and Nielsen  to estimate pairwise rates of non-synonymous (dN) and synonymous substitutions (dS) between taxa (run mode = -2). If BAC sequences were available for all three Heliconius species of interest (H. melpomene, H. numata, and H. erato), the one-ratio model was applied using the CODEML program in PAML, with fixed rates (model = 0, NSsites = 0), to calculate a single background dN/dS (ω) for each tree per gene. Occasionally, complete consensus sequences constructed from 454 contigs were used to substitute missing BAC sequences. Analyses were not run on genes with sparse or no coverage from 454 sequences.
When polymorphisms were observed between contigs or raw reads within a race or species, several analyses were run on possible variants or haplotypes of the sequence. The preferred method was majority rule, in which the most commonly observed base was selected, but alternative base selections were also tested and the most conservative estimate, or the lowest ω value, was reported. Pairwise and background estimates of ω were plotted across the HmYb and HmB loci (Figure (Figure2,2, ,3).3). Gene annotation images of the HmYb and HmB BAC walks are previously published [17,18].
Pupae of H. m. cythera, H. m. plesseni, H. numata, Heliconius erato lativitta, Heliconius erato cyrbia, Heliconius himera, Heliconius doris, Heliconius charitonia, Heliconius sara, Heliconius hecale, and Dione juno were purchased from Stratford Butterfly Farm, UK. Forewings, hindwings, and thorax tissue were dissected at the early pupal (EP) and early melanin (EM) stages to be stored in RNAlater at -80 C (for staging see ).
Total RNA was extracted from whole forewing and hindwing tissue separately using TRIzol reagent (Invitrogen) and an RNeasy kit (Qiagen), and treated with DNAse using a TURBO DNA-free kit (Ambion). RNA samples (2 ug) were reversed transcribed into cDNA using Bioscript (Bioline) and random hexamer primers. Genomic DNA (gDNA) was extracted from <25 mg of thoracic tissue for each species/race using a DNeasy Kit (Qiagen). Final DNA concentration was adjusted to 25 ng/ul for all subsequent RT-PCR reactions. The RPS13 housekeeping gene was amplified from all cDNA and gDNA samples as a positive control.
For each of the eight candidate genes that were selected based on preliminary tests of selection or previous population genetics analyses, multiple primers were designed using the BAC sequences to amplify these genes from extrated cDNA and gDNA (Additional file 3). Primers were initially tested with EP and EM cDNA. Polymerase Chain Reactions (PCR) contained 20-50 ng of cDNA, 1x reaction buffer, 2.0 mM MgCl2, 0.2 mM dNTPs, 50 pmol of each primer, 0.1 ul of Taq polymerase (Bio-Line) and were run in a thermal cycler with the following conditions: 2 min at 94°C, 40 cycles of 20 s at 94°C, 30 s at 55°C, 2 min at 72°C, and 72°C final extension. PCR products were visualized on a 1.5% agarose gel. When no PCR product bands were observed, PCR experiments were repeated at lower annealing temperatures (51 - 49°C). If amplification continued to prove unsuccessful, likely due to developmental stage specific gene expression, PCR experiments were repeated with gDNA and gDNA-specific primers.
When multiple bands were observed due to non-specific amplification and/or amplification of two alleles with different intronic insertions or deletions, bands were cut from the gel and PCR products extracted using NucleoSpin Extract II Kit (Macherey-Nagel). If multiple PCR products were too similar in size to cleanly separate by cutting, PCR products were cloned using either a TOPO TA cloning kit with one-shot chemically competent cells (Invitrogen) or a pGEM-T easy cloning kit with JM-109 chemically competent cells (Promega). Depending on the number of different PCR products observed in each reaction, 5 to 20 colonies were chosen per transformation, and cells were collected, suspended in 50 ul of sterile water, lysed for 5 minutes at 95°C, and used directly as template in subsequent PCR reactions with either M13 or T7/SP6 primers. PCR products were prepared using a standard big dye protocol before sequencing using an ABI13730 capillary sequencer.
All amplified sequences not previously available were deposited in Genbank. Accession numbers are as follows: HM00001 (trehalase 1a) [Heliconius melpomene plesseni (HQ328814), Heliconius doris, (HQ328815), Heliconius erato lativitta (HQ328816), Heliconius sara (HQ328817), Heliconius charithonia (HQ328818)]; HM00021 (hypothetical protein) [H. doris (HQ328805), H. m. plessseni (HQ328806), Heliconius erato cyrbia (HQ328807), H. e. lativitta (HQ328808), Heliconius himera (HQ328809), H. charithonia (HQ328810), H. sara (HQ328811)]; HM00024 (LRR) [H. m. plesseni (HQ328812), H. charithonia (HQ328813)]; HM01000 (MAX-dimerization) [H. himera (HQ328819), H. e. cyrbia (HQ328820), H. charithonia (HQ328821), H. e. plesseni (HQ328822), Dione juno (HQ328823)]; HM01018 (rab-kinesin) [H. doris (HQ328824), H. charithonia (HQ328825), H. e. cyrbia (HQ328826), H. e. lativitta (HQ328827)]; HM01022 (ashwin) [H. e. plesseni (HQ328828), H. doris (HQ328829), H. himera (HQ328830), H. charithonia (HQ328831), H. e. lativitta (HQ328832), H. e. cyrbia (HQ328833)].
Sequence traces were trimmed and contigs of each species were assembled for each gene using CodonCode Aligner software. Phylogenetic trees were newly constructed for each gene using Maximum Likelihood (ML) in Phylip 3.69 (Joseph Felsenstein, University of Washington). The dnaml program with the JTT (Jones Taylor Thornton) model category was used to calculate distances. Bootstrap re-sampling was done with 100 replicates using the Seqboot program. Matrices were then transformed into trees using the Fitch-Margoliash tree drawing method without molecular clock, and a consensus tree was produced using the Consense program.
The topologies of these genealogies were compared to the most recent published studies of Heliconius phylogenetics . When the two topologies differed, the Phylip generated candidate gene tree was used as the tree file for PAML analyses. However, nodes with bootstrap values below 50 were resolved using known phylogenetic relationships between species .
When nucleotide sequences contained many indels (for kinesin and HM00021), protein translations were aligned using ClustalW and non-overlapping regions excluded from analyses (see Additional files 1, 2 for alignments). All available contigs for each gene were formatted into single infiles compatible with PAML.
Tests of selection were run using the CODEML program in PAML, using the one-ratio model (NSsites = 0; model = 0) to produce a single dN/dS (ω) for each tree, and the free-ratios model (NSsites = 0; model = 1), which calculates different ω values for each branch. Analyses were again rerun with model = 2, or the multiple-ratios model, producing different ω for groups labelled in the tree file. To compare rates of evolution in the 'diverse lineages', H. melpomene, H. numata, and H. erato, branches in these clades were assigned separate labels and ω values estimated separately.
The effect of recombination events on detecting positively selected sites for the six selected candidate genes was investigated using GARD (Genetic Algorithms for Recombination Detection) available on the http://datamonkey.org HyPhy webserver . AICC scores rank goodness of fit of a model, with lower scores being better. ΔAICC is a measure of fit of a model accounting for recombination breakpoints, with the greater the difference between the AICC scores of the null and recombinant models, the better the fit. KH tests determine statistical significance of incongruence between tree topologies generated using recombinant fragments and are reported in addition to ΔAICC scores for model validation. GARD topologies generated using recombinant fragments were then used for FEL (Fixed Effects Likelihood) tests of selection .
GCW carried out molecular and genetic experiments, participated in the design and coordination of the study, analyzed the data, and drafted the manuscript. CDJ conceived of the study, participated in its design, and helped to draft the manuscript. MJ participated in the design of the study and helped to draft the manuscript. All authors read and approved the final manuscript.
ClustalW amino acid alignment of exon 13 of kinesin (HmB). The alignment reveals at least 12 discrete regions of indels (A). All nucleotide sequences, despite the abundance of indels, maintain open reading frames until the end of the exon. In comparison to the remainder of this large gene, the vast majority of indels are found within exon 13, which is the penultimate and largest exon. Though overall sequence alignment predicts greater similarity between H. erato and H. charithonia (B), an examination of the pattern of indels suggests that H. melpomene and H. charithonia share a similar pattern. Blue boxes indicate shared sequence deletion between H. melpomene and H. charithonia, and orange boxes indicate shared insertion of amino acids (A). An unrooted phylogram of exon 13 constructed using the dnaml program in Phylip reveals larger H. doris sequence divergence (C).
ClustalW amino acid alignment of exon 5 of HM00021 (HmYb). Alignment reveals four regions of insertions and deletions (A). All nucleotide sequences, despite the indels, maintain open reading frames until the end of the exon. All indels are found within exon 5, which is the last and largest exon, and the region with the most 454 contig support. A comparison of alignment scores indicates that H. erato and H. melpomene share highest sequence similarity and that H. doris has the most divergent sequence (B).
Primers used for amplifying candidate genes from related Heliconiini.
We would like to thank Laura Ferguson, Robert Jones, Simon Baxer, Stephen Montgomery, and Ronald Lee for assistance with methodology and data analyses. We also thank Bill Amos for comments on early versions of this manuscript. This study was supported in part by the Leverhulme Trust, the Downing Scholarship foundation at Pomona College, and the Sherwood Foundation.