|Home | About | Journals | Submit | Contact Us | Français|
Comparisons of genomic sequence between divergent species can provide insight into the action of natural selection across many distinct classes of proteins. Here, we examine the extent of positive selection as a function of tissue-specific and stage-specific gene expression in two closely-related sea urchins, the shallow-water Strongylocentrotus purpuratus and the deep-sea Allocentrotus fragilis, which have diverged greatly in their adult but not larval habitats. Genes that are expressed specifically in adult somatic tissue have significantly higher dN/dS ratios than the genome-wide average, whereas those in larvae are indistinguishable from the genome-wide average. Testis-specific genes have the highest dN/dS values, whereas ovary-specific have the lowest. Branch-site models involving the outgroup S. franciscanus indicate greater selection (ωFG) along the A. fragilis branch than along the S. purpuratus branch. The A. fragilis branch also shows a higher proportion of genes under positive selection, including those involved in skeletal development, endocytosis, and sulfur metabolism. Both lineages are approximately equal in enrichment for positive selection of genes involved in immunity, development, and cell–cell communication. The branch-site models further suggest that adult-specific genes have experienced greater positive selection than those expressed in larvae and that ovary-specific genes are more conserved (i.e., experienced greater negative selection) than those expressed specifically in adult somatic tissues and testis. Our results chart the patterns of protein change that have occurred after habitat divergence in these two species and show that the developmental or functional context in which a gene acts can play an important role in how divergent species adapt to new environments.
Evolutionary divergence following the split of a species into populations inhabiting different habitats has long been thought to be one of the major drivers of allopatric speciation (Mayr and Diamond 2002). But although changes in isolated genes underlying differences between ecologically divergent sister species have been linked to changes in the environment (Cobbett and Goldsbrough 2002), evolutionary pressures acting on the genome as a whole and the specific role of environmental pressures on divergence have been challenging to study. Are ecological differences between closely related species the result of few changes of large effect or do they result from more diffuse changes throughout the genome? Are all classes of genes equally visible to the actions of positive (or negative) selection? Are genes expressed at different stages of life history or in different tissues equally visible to the actions of natural selection?
Whole-genome scans for selection offer one approach to addressing these questions by providing a genome-scale survey of the signatures of selection both on individual genes (Sutter et al. 2007) as well as on gene categories (Ellegren 2008). Although earlier, single-gene studies provided some insight into recurrent patterns of selection between species (Allison 1954; Hughes and Nei 1988; Fitch et al. 1991; Palumbi and Metz 1991; Swanson and Vacquier 2002), recent genomic work has shown that categories of genes such as those related to immunity, reproduction, and sensory perception are repeatedly under selection in multiple lineages (Clark et al. 2003; Bustamante et al. 2005; Chimpanzee Sequencing and Analysis Consortium 2005; Nielsen et al. 2005; Arbiza et al. 2006; Bakewell et al. 2007; Begun et al. 2007; Gibbs et al. 2007; Sabeti et al. 2007; Kosiol et al. 2008).
Genome-wide comparisons have also been employed to evaluate traditional hypotheses about the ways in which developmental timing can constrain or facilitate the actions of natural selection (Schank and Wimsatt 1986; Raff and Slack 1996; Davidson and Erwin 2006), with mixed results (Garfield and Wray 2009). Although two studies in Drosophila found evidence of constraint in larval development (Davis et al. 2005; Artieri et al. 2009), studies in vertebrates (Roux and Robinson-Rechavi 2008), and in nematodes (Castillo-Davis and Hartl 2002), have failed to find a clear pattern of DNA sequence conservation as a function of developmental timing. These variable patterns suggest that developmental constraint may not be common or may not be visible in the comparisons done to date. However, it may also be the case that the very different modes of development employed by Drosophila, nematodes, and vertebrates introduce different constraints on the actions of selection. Genome-wide comparisons offer an opportunity of distinguishing between these hypotheses.
The above studies demonstrate the utility of whole-genome scans for selection. However, most scans for selection suffer from two limitations. First, they largely seek to explain evolutionary patterns across gene classes or across developmental stages without the context of the organisms’ environment. Comparative work suggests that external environment plays a major role in determining evolutionary rates (Hoffmann and Parsons 1997) and can play an important role in driving speciation in a variety of taxa (Coyne and Orr 2004). It has also been shown that while divergence in patterns of embryogenesis between related taxa often appears to be constrained during “phylotypic” stages early in development, there are a host of exceptions often related to changes in the external environments experienced during development (Wray 2000; Garfield and Wray 2009). In seeking to understand genome evolution between species, it is therefore reasonable to ask how genomes evolve in the context of documented ecological shifts between recently diverged species.
A second limitation is more technical. Because many scans for selection involve comparisons of divergence between “pairs” of species (usually in the form of the ratio of nonsynonymous to synonymous substitutions), they cannot polarize evolutionary change along individual lineages nor can they distinguish between positive selection and a relaxation of constraint for any but the most strongly selected genes. By using sequence from more than two related species and by making comparisons between evolutionary models within a likelihood framework, it is possible to identify both lineage-specific trends and to quantify the relative strengths of positive selection, negative selection, and neutral evolution (Yang 2007).
The strongylocentrotid sea urchins present an excellent genomic system for investigating the role of environmental adaptation in evolutionary divergence. Eight of the ten extant species are found in thermally variable intertidal and shallow subtidal waters, including both Strongylocentrotus purpuratus and the distantly related S. franciscanus (Biermann et al. 2003), whereas two sister species, including the pink urchin Allocentrotus fragilis, inhabit the cold, dark bathyl zone and can be found at depths exceeding 1,000 m (Moore 1959; Sumich and McCauley 1973; Emlet 1995). Although the nomenclature has yet to be revised, A. fragilis lies phylogenetically within the existing genus Strongylocentrotus and shows a divergence time from S. purpuratus of around 5–7 Ma, compared with a split with S. franciscanus at about 20–25 Ma (Biermann et al. 2003; Lee 2003). These phylogenetic investigations support the contention that the shallow-water affinity seen in 8 of the 10 species, including S. purpuratus and the more distantly related S. franciscanus, is ancestral in this lineage (Biermann et al. 2003). In this paper, we focus on the evolution of protein-coding genes in two closely related species, S. purpuratus and A. fragilis, using the more distantly related S. franciscanus as an outgroup.
Although S. purpuratus adults inhabit shallow coastal waters and A. fragilis adults live in deep habitats, both species possess a free-swimming larval stage that feeds in shallow-water pelagic environments (Moore 1959). The environmental distinctions in temperature, light, and pH between the distinct depth zones of their adult habitats may have affected the evolution of these two species. However, the similarities in larval habitats suggest that evolutionary pressures exerted by these urchins’ respective environments may dramatically differ between larval and adult stages.
We used the sequenced genome of S. purpuratus (Sea Urchin Genome Consortium 2006) along with recently available genome-wide sequences from A. fragilis and the outgroup S. franciscanus (Baylor Sequencing Center, Sea Urchin Genome Consortium unpublished; see Materials and Methods) to test hypotheses about the relationship between genomic evolutionary rates, environmental adaptation, and developmental constraints. We compare rates of protein evolution and signals of both positive and negative selection across the coding genomes of S. purpuratus and A. fragilis as polarized by the outgroup S. franciscanus. In addition, we provide a map of the expression patterns of these genes by using microarrays to quantify gene expression at different life-history stages and in different tissues in S. purpuratus. These data together allow us to map patterns of positive and negative selection as a function of life history, tissue specificity, and habitat use.
Using both pairwise dN/dS values and the results of a branch-site model, we find evidence for elevated rates of amino acid evolution and increased positive selection in genes expressed exclusively in adult somatic tissues as compared with genes expressed in larvae or ovaries. Not surprisingly, we also find genes expressed exclusively in testis and those involved in immune responses and sensory perception to be evolving rapidly, independent of habitat in all species. In addition, we find stronger signals of positive selection in the lineage leading to A. fragilis both across the whole genome and in genes expressed only in adult somatic tissues. These patterns, along with signals from a categorical enrichment of branch-specific estimates of selection, support the hypothesis that changes in the adult habitat of A. fragilis have induced rapid protein evolution. However, we also find evidence of a role for developmental/tissue-specific constraints in determining evolutionary rates in the form of increasing levels of negative selection from ovary to larvae to adult tissues in all three species.
Annotations of the S. purpuratus genome were carried out with a computational algorithm to produce what are known as “GLEAN” models or GLEANs (Davidson and Erwin 2006; Elsik et al. 2007). These annotations were then edited by comparing annotated sequence features with a whole-genome tiling array (Samanta et al. 2006). In addition, approximately 9,000 of these GLEAN models have been examined and verified by members of the sea urchin research community, and additional gene annotations were added. The sequences of all computationally and manually annotated genes were downloaded and mapped to an updated genome assembly (version 2.1) using BLAT (Kent 2002) to remove extraneous annotations resulting from earlier assembly errors. To remove highly similar sequences that might complicate genomic alignments, any annotation with >90% sequence similarity to another annotation was removed from further consideration.
Four hundred fifty-four-based whole-genome shotgun sequence (~2.1× coverage) for A. fragilis and S. franciscanus were downloaded from the Baylor Sequencing Center and screened for read quality as follows: Individual 454 reads were removed if the average quality score was <25 over the length of the read or if the length of the read fell within the top or bottom 0.5% of all read lengths, as these features were shown to be strongly associated with read errors in a previous study (Huse et al. 2007). After these filters, any sequence containing an ambiguous base call (“N”) was removed from the data set, and any bases with individual quality scores <20 were then set to N for subsequent alignments. Reads orthologous to regions of S. purpuratus GLEAN models were identified and assembled as described below.
Branch-site models can be enormously powerful in detecting selection along individual lineages (Zhang et al. 2005). However, to be effective, these methods require substantial sequence overlap from all species under consideration, including outgroup taxa. With 2.1× coverage from two species, we could not construct branch-site models for all S. purpuratus annotations. We therefore took a more comprehensive approach, looking for concordance between branch-site models and more traditional comparisons of pairwise dN/dS values as described below.
Pairwise and three-way whole-gene alignments were created by scaffolding the 454 reads from A. fragilis and S. franciscanus against the S. purpuratus gene annotations. First, a set of potentially matching 454 reads for each annotation were identified using Blast (Altschul et al. 1990) by querying the 454 libraries with each quality-filtered S. purpuratus annotation and taking all matches with e-scores < 0.001. Each candidate 454 read was then “Blasted” back to the S. purpuratus genome. Only those reads whose best match was to the reference annotation were used for subsequent alignment steps. A S. purpuratus annotation and its corresponding pools of candidate 454 matches were then aligned using the TBA pipeline (Blanchette et al. 2004) using the BlastZ settings T = 1, E = 100, Y = 2,000, K = 2,300, L = 3,000 for the S. purpuratus–A. fragilis pairwise alignments and the settings T = 1, L = 3,000, O = 400, E = 30 for pairwise alignments between S. purpuratus and S. franciscanus. During this alignment process, custom scripts written in Python were used to remove individual reads that introduced a stop codon or had >40% divergence. Three-way alignments were generated for genes with successful alignments to both A. fragilis and S. franciscanus using the program “multiz” (Blanchette et al. 2004) and analyzed using branch-site models (see below).
In addition to branch-site models, estimates of dN, dS, and dN/dS for pairwise alignments between S. franciscanus–S. purpuratus and A. fragilis–S. purpuratus were carried out using PAML (Yang 1997). Expected codon frequencies were estimated using the frequencies of nucleotides at first, second, and third codon positions along the gene as was done for the branch-site specific model (see below). As a conservative measure, any gene region with zero silent-site divergence (dS = 0), an ungapped alignment length <100 bp, or with dN/dS values in the top 0.5% of all pairwise dN/dS values were removed from all subsequent analyses.
Because evolutionary rates can vary across a protein, long alignments can underestimate locally elevated dN/dS. To ensure that this effect did not bias our estimates, we calculated dN/dS values along individual reads from the A. fragilis trace library. From the quality-filtered A. fragilis reads, we performed a nucleotide Blast of each A. fragilis read to the S. purpuratus GLEAN3 database, rejecting any alignments that had a best alignment score less than 100, had a second-best alignment score greater than 70% of the top hit (to reduce issues of paralogy), any frameshift gaps, or that were less than 99 bps in length. Once this first set of alignments was produced, we also purged any overlapping reads with more than 50% overlap of the shorter read from the data set, retaining the longest read for any overlapping region. For each alignment in this set, we calculated dN/dS using CODEML from the PAML package (Yang 2007).
To estimate the strength of negative and positive selection acting on genes along particular lineages, we implemented the “modified branch-site model A” of Zhang et al. (2005) using the program HyPhy (Pond et al. 2004). In this model, one lineage is designated as the “foreground” lineage (along which we estimated positive selection), whereas the other lineages are designated as “background.” Briefly, maximum likelihood methods (Yang 2002) are then used to identity the relative proportion of sites evolving under; neutrality, positive selection, or negative selection; the strength of selection (ω, or the instantaneous rate of accumulation of nonsynonymous changes to synonymous changes) for sites evolving under negative or positive selection; the lengths of each branch; and the ratio of transversion rates to transition rates, kappa. Because many of our alignments are relatively short, we could not confidently estimate all the parameters of Zhang et al.’s (2005) model along all lineages simultaneously. We therefore adopted an alternative strategy, fitting the model to each alignment letting each of S. purpuratus, A. fragilis, and S. franciscanus, in turn, serve as the foreground lineage.
From the model outputs, we defined selection scores representing the strength and frequency of positive, neutral, and negative selection across all sites in a gene. The positive selection score is calculated as the total fraction of sites with ω > 1 multiplied by log(ω_2), where ω_2 is the estimate of ω for these sites. The negative selection score defined as the fraction of sites with ω < 1 multiplied by –log(ω_0), if ω_0 > 0 and by L otherwise, with ω_0 defined as the estimate of ω for sites under negative selection and L defined as the negative logarithm of the smallest nonzero ω_0 in the data set. The neutrality score is defined simply as the fraction of sites evolving neutrally. Finally, we define an overall ω for a gene, ωFG, as ωFG = f0 × ω0 + f1 + (f3) × ω_2 where f0, f1, and f3 are defined for a lineage as the fraction of sites under negative selection, positive selection, or neutrality along that lineage. The value ωFG is functionally equivalent to the pairwise dN/dS ratios described above but incorporates information about how changes have been distributed along particular lineages. We note that because neutrally evolving sites are explicitly considered in the branch-site model (Pond et al. 2004; Zhang et al. 2005), ωFG is less sensitive to lineage-specific relaxations of negative constraint than other methods of calculating dN/dS (Nei and Gojobori 1986).
In addition to estimating the strength of selection along a given lineage, it is possible to ask, in a statistical sense, the degree to which a model that includes positive selection along a foreground lineage is a better fit to the data than a model in which all sites are evolving neutrally or under negative selection along all lineages. To evaluate this contrast, we implemented a likelihood ratio test comparing the log-likelihood of best-fit model above with the best log-likelihood of a null model, in which ω_2 is fixed to 1, using a χ2 distribution with one degree of freedom (Zhang et al. 2005).
To gain insight into temporal and spatial patterns of gene expression, we used custom microarrays to examine gene expression profiles in S. purpuratus during development and in several adult tissues. In total, 244,000 probes for an Agilent (www.agilent.com) microarray were custom designed by the Systemix Institute based on the sequences of 28,036 predicted genes from the published genome of S. purpuratus, in addition to 641 novel genes predicted from whole-genome tiling arrays measuring the larval transcriptome (Samanta et al. 2006). Probes were 50 nucleotides long and tiled across the 3′-most exon of each gene. Most genes (85%) were represented by nine probes. The probe set also contained 1,236 random sequence probes with no known matches to S. purpuratus as negative controls.
Adult S. purpuratus were collected from Soberanes, CA, held in common garden at Hopkins Marine Station (Pacific Grove, CA) and fed giant kelp (Macrocystis pyrifera) ad libitum. Larvae were reared on Dunaliella tertiolecta and Isochrysis sp. in artificial seawater at 15 °C for 2 weeks after fertilization, following standard protocols (Strathmann 1987) and stored in RLT buffer (Qiagen) at −80 °C until RNA extraction. We chose to rear larvae to a 2-week pluteus stage because both S. purpuratus and A. fragilis larvae are planktonic in similar habitats at this age (Sumich and McCauley 1973; Strathmann 1987). The following tissues were extracted from eight reproductively active individuals (four females, four males): tube feet, coelomocytes, lantern muscle, gut tissue, spine-base muscle, and ovary or testis. Coelomic fluid was drawn through the peristomal membrane and centrifuged at 4 °C for 10 min to pellet coelomocytes before freezing. All other tissues were immediately flash frozen in liquid nitrogen and stored at −80 °C.
All tissues were homogenized in lysis buffer using Qiagen Tissuelyzer, and total RNA was extracted using Qiagen RNeasy kit according to standard protocol. Total RNA was treated with DNase to eliminate genomic DNA contamination using appropriate components of the Macherey-Nagel Total RNA Isolation kit according to manufacturer’s protocols. RNA was assessed for quality and quantity on a Nanodrop spectrophotometer and a 1.5% nondenaturing agarose gel. For each tissue, equal amounts of total RNA was pooled from four individuals. A common reference was created by pooling equal amounts of total RNA from all tissues. Total RNA was reverse transcribed to cDNA, transcribed to amplified RNA (aRNA) and labeled with Cy5 (red) or Cy3 (green) fluorescent dye (Amersham) using Ambion Aminoallyl Amplification kit (catalog number 1753). Tissues were labeled with Cy5 and reference with Cy3, with the exception of testis and gut, for which dyes were swapped. Nanodrop was used to quantify unlabeled and labeled aRNA and to measure fluorescence of labeled aRNA before hybridization. Microarrays were competitively hybridized according to Agilent specifications for two-color gene expression at 65 °C for 17 h, each tissue hybridized against the common reference sample for a total of eight microarrays. All arrays were scanned using an Axon GenePix 4000B scanner and data extracted with Agilent Feature Extraction software. Local background subtraction, global loess normalization between channels, and between-array quantile normalization of the resulting data were carried out using the R package “limma” (Smyth 2004).
To account for nonspecific hybridization among marginally related sequences, we subtracted 3× the median value of the 1,236 negative controls from each probe on that array, setting any resulting negative values to zero. The value of 3× was chosen both for consistency with a previous microarray study in sea urchins (Wei et al. 2006) and because this subtraction resulted in a log-normal distribution of expression values, as would be expected on the basis of previous expression studies (Galau et al. 1974; Hoyle et al. 2002). The median value of all probes for a gene was then used to classify a gene as being expressed (1; at least one probe for a gene >0) or not (0; all probes for a gene ≤0 after negative control subtraction) in each tissue. These binarized values, used in all subsequent analyses, offer two significant advantages over quantitative data: 1) they allow characterization of the site and timing of gene expression in a categorical way that favors straightforward statistical analysis and 2) they remove biases associated with the fact that many of the genes expressed at (relatively) high levels in early developmental stages are housekeeping genes that are expressed at all life stages, though at lower levels relative to the pool of transcripts expressed at later stages. Binarized data also allows some leeway in using expression measurements in one species to infer patterns of gene expression in closely related species; whereas gene expression levels often change quantitatively between closely related species, the overall presence/absence of a transcript in a given tissue often does not (Blekhman et al. 2008; Babbitt et al. 2010).
For gene ontology classifications, we compared each identified coding region from the S. purpuratus GLEAN3 data set to the ENSEMBL human protein database at National Center for Biotechnology Information using BlastX. We then matched each protein to its best Blast hit, extracted Uniprot ID’s for the human protein, and downloaded the corresponding biological process, molecular function and pathway annotations from the PANTHER gene ontology database (Thomas et al. 2003).
Tests for significant categorical enrichment were performed using the rank biserial correlation on biological processes and molecular functions with at least ten genes and pathway components with at least five genes measured in our study. All categorical enrichment tests compared the distribution of positive selection scores within a category to that of scores outside of that category, following the equation:
where M1 is the mean rank of all the genes measured in our study in the category, M0 is the mean rank of all the genes measured in our study not in the category and the sizes of the groups are n1 and n2. Statistical significance was assessed via 1,000 bootstrap replicates.
We performed two different pairwise alignments between sequences of A. fragilis and S. purpuratus. First, we aligned individual, nonoverlapping reads from A. fragilis to the exon models for S. purpuratus, producing multiple short alignments per gene. We also assembled whole-gene alignments for a subset of the genes shared between A. fragilis and S. purpuratus.
Of the 6.5 million reads obtained from the 454 sequencing of A. fragilis, we found 75,694 nonoverlapping reads at least 99-bp long that showed high-quality alignments to a S. purpuratus GLEAN3-coding region without frameshift gaps. Of these single-read alignments, 5,210 alignments showed fewer than three substitutions and were not considered for further analyses. We also purged a further 6,834 alignments that showed nonsynonymous changes (dN > 0) but very low levels of synonymous changes (dS < 0.01), thus removing very large dN/dS values that might bias results toward the discovery of spurious positive selection. The remaining 63,650 reads aligned to 19,573 genes of the 28,944 identified in the GLEAN3 gene set, with a mean of 3.25 reads in each covered gene. The mean alignment length was 176 bp with an average of 573 bp of coverage per gene. After these quality filters, the coding alignments showed a mean nonsynonymous divergence of 2.29% (±0.02% 95 confidence interval [CI]) and synonymous site divergence of 10.44% (+0.05/−0.06% 95 CI). The data set contains 20,309 reads in 10,052 genes that show no nonsynonymous substitutions (dN = 0).
Alignments of the 454 sequencing reads of A. fragilis against the GLEAN3 protein-coding regions of S. purpuratus produced whole-gene alignments for 14,543 genes after quality filtering. For these 14,543 whole-gene alignments, the mean alignment length was 1,541 bp. These coding alignments showed a mean nonsynonymous divergence of 2.75% (±0.04% 95 CI) and synonymous site divergence of 11.1% (±0.13% 95 CI).
In both whole-gene and per read comparisons, the overall distribution of dN/dS values is log normal and shows a clear skew toward purifying selection (values near zero) across the genome (fig. 1). The per-read analysis, with its shorter alignments, shows a more extreme distribution, with more alignments showing both high and low dN/dS values. The whole gene median dN/dS value is 0.190, which is greater than the median in the more variable per-read data set, which is 0.147.
To compare patterns of selection among genes expressed at distinct life stages, we ran microarray expression assays on seven different life-stages/tissues of S. purpuratus: ovary, testes, 2-week-old pluteus larvae, and 4 adult somatic tissues (coelemocytes, spine base, tube foot, and gut). A dendrogram of expression profiles shows that gut appears to be an extreme outlier to all other tissues and that the three other adult somatic tissues show similar expression profiles (fig. 2). We therefore excluded gut data and combined the other tissues as a single group, which we will refer to as “adult somatic.”
To quantify tissue specificity, we applied a binary score to expression in each tissue without reference to level of expression. Therefore, the expression of each gene can be considered to belong to 1 of 16 categories, representing all possible combinations of each of the four tissue groups under consideration: ovary, 6-arm pluteus larvae, adult somatic (see above), and testis.
Results comparing A. fragilis and S. purpuratus per read suggest that genes expressed in ovary tend to be highly conserved (median dN/dS = 0.085), with significantly lower dN/dS ratios than the whole-genome distribution (Mann–Whitney P < 0.003; fig. 3). In contrast, the median values of dN/dS for adult-specific and testis-specific genes lie significantly above the genome-wide values (median dN/dS = 0.183 and 0.256, respectively; Mann–Whitney P < 0.003), suggesting that either positive selection or relaxation of selective constraint has driven more rapid functional evolution in these genes (fig. 3). The median values of dN/dS for larva-specific genes lie intermediate to those of ovary and testis and are indistinguishable from the genome-wide average (median = 0.162; not significant [ns]). Whole-gene alignments confirm these results, with the same groups showing significant divergences from whole-genome distributions.
We also examined the proportion of genes that fall into each of these expression categories across 1) all alignments and 2) the most rapidly evolving alignments (i.e., dN/dS = 2.0 for the per-read analysis or dN/dS = 1.0 for the whole gene analysis). Because the per-read and whole-gene alignments give similar results (fig. 4), only the per-read analyses are presented in the text. Again, rapidly evolving alignments are much more likely to be adult- or testis-specific (30% and 11% in rapidly evolving vs. 21% and 6% overall, respectively; χ2 P < 0.01; fig. 4) and much less likely to be ovary-specific (9% in rapidly evolving, 16% background; χ2 P < 0.01; fig. 4). Larva-specific genes were not significantly different among groups (3% in rapidly evolving, 4% background, ns; fig. 4).
The above analyses suggest more rapid evolution in adult somatic genes than larval genes between S. purpuratus and A. fragilis, but this result could be due to 1) lower constraints on evolution in adult genes or 2) rapid evolution along the A. fragilis lineage. To test these two hypotheses, we rooted gene sequence comparisons using the outgroup S. franciscanus (Biermann et al. 2003). Among the genes compared between species above, we were able to root the comparisons by mapping S. franciscanus sequences to 9,258 whole-gene alignments spanning the three species. These alignments showed a median gapped length of 1,194 base pairs (+18/−9 bp 95 CI) and median ungapped length of 533 base pairs (±9 bp 95 CI).
Genes evolving along the A. fragilis branch show a mean ωFG of 0.474, significantly higher than the 0.393 seen along the S. purpuratus branch (9,536 and 9,476 genes measured respectively postquality filtering; Mann–Whitney P = 0.004; fig. 5A). We also see that 2.8% of the 9,258 genes with alignments between both A. fragilis and S. purpuratus and between S. franciscanus and S. purpuratus show evidence of positive selection along the A. fragilis branch. By contrast, only 1.5% show such evidence in the S. purpuratus lineage (χ2 P < 0.0001; fig. 5B).
One concern with using less-than-complete genomic coverage in a scan for selection is that sequencing errors might bias the results, leading us to conclude that the less heavily sequenced species has greater levels of positive selection. We see no evidence that this is the case here. Using the outgroup S. franciscanus to assign differences between A. fragilis and S. purpuratus to one of the two lineages, we estimate that there are a similar number of changes unique to each lineage (mean of 11.4 changes per gene in S. purpuratus vs. 11.9 changes per gene in A. fragilis). Furthermore, our branch-site model estimates of dS are approximately equal on both lineages, with a slight elevation in the more extensively sequenced S. purpuratus (0.171 in S. purpuratus vs. 0.162 in A. fragilis).
Along both A. fragilis and S. purpuratus lineages, ovary-expressed genes had a consistently low ωFG (mean = 0.313 and 0.267, respectively), relative to whole-genome values for their species (mean = 0.471 and 0.394, respectively; fig. 6A). In contrast, ωFG in adult- and testes-expressed genes show consistently higher values along both branches (adult mean = 0.490 and 0.452, testes mean = 0.689 and 0.526, respectively; fig. 6A). Each of these comparisons is significantly distinct from whole-genome distributions (all Mann–Whitney P < 0.00625). Larva-specific genes show ωFG values intermediate to those from genes expressed only in ovary or adult and show no significant distinction from whole-genome distributions (larva mean = 0.416 and 0.355, Mann–Whitney P > 0.05).
This pattern is also largely repeated in the proportions of genes showing statistically significant evidence of positive selection by tissue type (fig. 6B). In both A. fragilis and S. purpuratus, ovary-specific genes show the lowest proportions of significant genes (1.4% and 0.4%; χ2 P < 0.05) with larval-specific genes showing only slightly more (1.7% and 0.8%). Adult-specific genes show higher proportions still (3.1% and 2.1% in adults vs. 2.8% and 1.5% whole-genome). Testes-specific genes, in contrast, show a pattern different from that seen in the ωFG values— the proportion of genes showing evidence of positive selection is approximately comparable with those for ovary (2.3% and 0.6%; fig. 6B), suggesting that the overall increase in positive selection in testis-specific genes can be explained by relatively few rapidly evolving genes. The same pattern of relative ωFG values and proportions of genes under selection is also observed along the lineage leading to the outgroup S. franciscanus (supplementary fig. S1, Supplementary Material online).
The branch-specific likelihood model also allows us to examine the relative strength of positive selection, purifying (negative) selection, and neutrality using scores for each type of selection that incorporate both the frequency and the strength (ω) of each of these processes (see Materials and Methods). Comparing across branches, the A. fragilis branch shows more evidence of positive selection (mean positive selection score = 0.0435) than S. purpuratus (0.0323; Mann–Whitney P = 1.566 × 10−10; fig. 7A), with the strongest tissue-specific signal appearing in adult-expressed genes (A. fragilis mean = 0.0453, S. purpuratus mean = 0.0394; Mann–Whitney P = 9.009 × 10−3; fig. 7A). In comparing tissue types to whole-genome distributions along a branch, ovary-specific genes in both branches had positive selection scores below whole-genome values (0.0238 for A. fragilis and 0.0140 for S. purpuratus; Mann–Whitney P < 1 × 10−7). Both adult only and testes tissue shows significant divergence from the whole-genome averages along the A. fragilis branch (Mann–Whitey P < 0.004), and the overall trends are consistent with the tissue-specific estimates of ω (figs. 4A and 7A).
We found no significant differences in the overall neutral scores comparing across branches. However, individual tissues showed proportions of neutral sites both higher and lower than their species’ genome-wide distribution. Ovary-specific genes had a lower proportion of neutral sites than whole-genome values along both branches (Mann–Whitney P < 1.0 × 10−4; fig. 7B). Adult-expressed genes, on the other hand, showed a higher proportion of sites evolving neutrally (Mann–Whitney P < 1.0 × 10−5).
Negative selection scores, showed both branch and tissue-specific effects, with the S. purpuratus branch (Mann–Whitney P < 1.0 × 10−10) showing greater evidence of purifying selection. Across both branches, purifying selection was strongest in the ovary (Mann–Whitney P < 0.004) and weakest in the testes (Mann–Whitney P < 0.001). Larva-specific genes had values of neutrality and negative selection intermediate to those of ovary and adult/testes and show no significant divergence from the whole-genome distributions.
The extent to which a particular gene has experienced positive selection is largely independent along the two lineages (fig. 8A), and the branch-specific positive selection scores for a given gene are poorly correlated across branches (Pearson’s r = 0.08, on 9,258 genes). In contrast, the strengths of negative selection and neutral evolution acting on a given gene are well correlated across branches (fig. 8B–C; r = 0.81, 0.76, respectively).
Because many genes in the sea urchin genome remain unannotated, we assigned each sea urchin annotation to a human gene via protein sequence similarity using BlastP and then classified these predicted annotations into functional categories on the basis of published human ontologies (see Materials and Methods). We then looked for enrichments of strong positive selection scores among genes in specific PANTHER Ontological Category Sets (Thomas et al. 2003) (Biological Process, Molecular Function, and Pathway; supplementary table S1A–C, Supplementary Material online).
Categories of genes involved in immunity, developmental processes, and cell–cell communication show strong evidence of selection along both lineages in all three PANTHER categories. Despite few individual genes showing significant positive selection in both lineages simultaneously (fig. 8), we find the functional categories under selection to be similar along both lineages (supplementary table S1, Supplementary Material online). For example, of the 194 biological process categories in which at least 10 genes occur, 37 categories show selection along one or both branches, 20 of which are under selection in both lineages (P < 0.05). Of the remaining 17, 10 are specific to A. fragilis including several categories related to metabolism, sensory perception, and skeletal development. An additional eight categories are specific to S. purpuratus, though all of these categories, with the possible exception of “gametogenesis,” are similar in nature to categories enriched for positive selection scores in A. fragilis (e.g., the S. purpuratus specific category “oncogenesis” is enriched for genes also classified under the shared category “immunity and defense”).
In addition to functional gene categories that show evidence of selection, those genes that lack a functional annotation because of their poor match to human orthologs show elevated dN/dS values and are themselves enriched for evidence of positive selection. Those genes that successfully map to a human ortholog show a mean dN/dS between S. purpuratus and A. fragilis of 0.21, whereas those genes that have no high-quality human ortholog show a significantly higher mean of 0.51 (Mann–Whitney P < 0.001). The Blast e-value of each gene and its dN/dS between S. purpuratus and A. fragilis are significantly positively correlated (Spearman’s ρ = 0.37, P < 2.2 × 10−16), suggesting that a poor match (high e-value) is likely to have a high dN/dS.
A possible concern with categorical enrichments is that different categories of genes may have differed with respect to information content and mutational target size if there is systematic variation in length. We find that there is only a modest relationship between the mean alignment length of a category and evidence (P-value) of positive selection. Furthermore, of the 78 PANTHER categories that show evidence of enrichment for positive selection in one or both the S. purpuratus and A. fragilis lineage, only four also show enrichment for alignment length, suggesting that alignment length contributes only modestly if at all to our overall picture of categorical enrichment.
Several lines of evidence suggest an important role for environmental adaptation in driving divergence between the sister species A. fragilis and S. purpuratus (figs. 5, 6, and 7). We find that compared with S. purpuratus, the ecologically divergent, but phylogenetically similar, A. fragilis shows significantly higher ωFG values (figs. 5B and 6B), higher positive selection scores (fig. 7A), and a significantly higher proportion of genes showing evidence of positive selection (figs. 5B and 6B). We observe remarkable specificity in the actions of natural selection. Although S. purpuratus and A. fragilis adults inhabit disparate environments, their larvae are ecologically similar (Moore 1959; Sumich and McCauley 1973; Emlet 1995). Reflecting this difference, we find that the above signals of positive selection are stronger for genes acting in adults (50–90% higher in the case of the mean positive selection scores; fig. 7A) than they are for genes expressed exclusively in larvae or ovaries.
Consistent with adaptations to different environments, we observe that there is little correlation in the scores for positive selection for individual genes in the two lineages (Pearson’s r = 0.08), whereas scores for negative selection and neutral evolution are highly correlated (r = 0.89, 0.81; fig. 8). This pattern suggests that while selective constraint (or lack thereof) may be acting similarly across species on a similar set of genes, regardless of the particular lineage or adult environment, the actions of positive selection are acting in a lineage and/or environment sensitive manner. Among the genes that show strong signatures of selection along one, but not both, lineages, several stand out as interesting candidates for genes contributing to the differences underlying these two species (supplementary table S2, Supplementary Material online). Within-species sexual conflict is an important driver of the evolution of gamete recognition proteins between Strongylocentrotid urchins (Palumbi and Metz 1994). In S. purpuratus, sexual conflict is thought to be particularly intense, leading to eggs that are largely incompatible with even very high concentrations of heterospecific sperm (Palumbi and Metz 1991). It is interesting, therefore, to note, that the receptor for egg jelly (GLEAN3_23767) has among the highest ωFG estimates of any analyzed annotation (1.12 over 2,642 aligned bases) and that the biological process category gametogenesis is uniquely enriched along the S. purpuratus lineage. It is worth noting that several genes known to be involved in sperm function, such as Bindin, do not match any human gene and were thus not considered in the categorical enrichment.
Many candidate genes enriched specifically along the A. fragilis lineage are especially notable in light of their potential adaptive roles; multiple carbohydrate sulfotransferases show elevated ωFG scores specifically in A. fragilis, including CHST1 (GLEAN3_03467) and CHST9 (GLEAN3_13284) as does SM30A (GLEAN3_00825), an important component of the larval and adult skeletal matrix (Li and Tedder 1999; Davidson and Erwin 2006; supplementary table S2, Supplementary Material online).
The candidate genes highlighted above show particularly strong signatures of amino acid evolution and radical amino acid changes along individual lineages. Although they are notable for that reason, they are not representative of overall divergence between these species. Even within the more divergent A. fragilis lineage, evolution might be better characterized as a slow shift in many proteins rather than a rapid adaptive revolution. Genes with strong signatures of selection in A. fragilis are scattered across functional classes in the genome and though there are more that are expressed in adult somatic tissue than in larvae (fig. 4), virtually all classes of genes show higher evolutionary rates in A. fragilis (fig. 6). Pervasive shifts across the genome suggest that large habitat shifts cannot always be made just by evolution of a few genes, as has been suggested for invasion of heavy metal soils by land plants (Cobbett and Goldsbrough 2002).
We observe several biological process categories to be under selection in A. fragilis that are fundamentally different from those under selection in S. purpuratus: “sulfur metabolism”, “skeletal development,” and several categories of genes related to endo- and exocytosis consistent with the observation of stronger selection along the more ecologically divergent lineage (supplementary table S1A, Supplementary Material online). That genes involved in metabolism are under selection in A. fragilis resonates with several recent studies identifying metabolic genes as being among the most differentially expressed genes in comparisons between humans and chimpanzees (Uddin et al. 2004; Blekhman et al. 2008; Babbitt et al. 2010) and suggests that changes in diet can lead to profound shifts in genetic architecture between species. The categories associated with endo- and exocytosis may indicate a metabolic change associated with a deep-sea life style or that membranes and membrane-related function have undergone selection to the stable, cold temperatures, and high-pressures in the deep sea (Hochachka and Somero 2002). That skeletal development is enriched for selection may reflect adaptation to the more acidic environments of the deep sea and/or the notable divergence of these species in skeletal morphology (Moore 1959).
We also find several categories of genes under selection in both lineages, especially for immunity, developmental processes, and cell–cell communication. Many of these categories have also been observed to be under selection in primates (Chimpanzee Sequencing and Analysis Consortium 2005; Nielsen et al. 2005; Kosiol et al. 2008) and in fruit flies (Begun et al. 2007). Thus, these genes may be in a constant process of evolutionary updating between species. One possibility is that these genes are involved in coevolutionary races, such as the races between hosts and pathogens or races that accompany coevolution of gamete recognition genes in males and females. Another intriguing possibility is that selection may adapt organisms to different environments using a common set of biological pathways.
Evolutionary developmental biologists since Von Baer (1828) have hypothesized that the process of development imposes constraints on the actions of natural selection, resulting in conserved or phylotypic stages in embryonic development (see Raff 1996). Previous studies comparing ratios of dN/dS between pairs of species for genes active at different stages of development have given mixed support for theories of developmental constraint. Because organisms develop differently, we should not be surprised that developmental constraints, if they exist at all, would leave different signatures in different taxa. However, previous analyses based on pairwise comparisons are unable to distinguish between positive selection, negative selection, and relaxations of constraint in giving rise to differing levels of dN/dS among genes expressed at different stages of development. Our results suggest that developmental and tissue-specific constraint play important roles in shaping genome evolution. Scores for neutrality and negative selection show congruent patterns among distinct tissues along both these lineages as well as the outgroup S. fransicanus, which shares both its larval and adult habitat with S. purpuratus (fig. 7A–C; supplementary fig. S1, Supplementary Material online). Across all three lineages, ovary-specific, and, to a lesser extent, larva-specific, genes not only had lower signals of positive selection and neutrality but also higher levels of negative selection than the whole-genome average. On the other hand, adult and testis-specific genes had the highest levels of positive selection and neutrality as well as the lowest signatures of negative selection in both lineages (fig. 7A–C).
These results suggest some inherent constraints in when and how selection can operate to drive interspecific divergence. Although sexual selection or environmental adaptation can drive, and in this case has driven, divergence in genes expressed exclusively in testis or adult somatic tissues, genes expressed in larva or ovary are apparently more constrained, as evidenced by strong signatures of negative selection and a relative absence of positive selection. Because this pattern is evident not only along the ecologically divergent lineage but also along the two lineages that share the same ancestral larval and adult habitats, we believe that our results constitute evidence in favor of an important role for developmental constraints in shaping between species divergence.
In our comparison of the intertidal S. purpuratus and the deep-sea A. fragilis, we have shown that although much of the genome experiences purifying selection, a small percentage (~1–3%) of the urchin genome is being driven by positive selection. The A. fragilis branch shows greater rates of positive selection than the S. purpuratus branch, especially in ecologically divergent adult tissues, suggesting the action of environmental adaptation. In addition, trends across tissues types appear to be largely explained by a reduction of selective constraint through development. Genes involved in immunity, reproduction and development, signaling cascades, transcriptional regulation, the extracellular matrix, and the cytoskeleton all showed significant enrichment for rapidly evolving genes.
The authors would like to gratefully acknowledge the assistance of the Human Genome Sequencing Center at Baylor University and Andrew Cameron for their cooperation and assistance. Jason Ladner, Melissa Pespeni, Malin Pinsky, and Courtney Babbitt provided comments on an earlier manuscript. This work was supported by the National Science Foundation (0614509 to G.A.W. and 0714997 to S.R.P.), the National Institutes of Health (Systems Biology Center grant NIH 5P50-GM-081883, PI Philip Benfey; 5P50-GM-081883 to G.A.W.), the David and Lucille Packard Foundation, and the Gordon and Betty Moore Foundation.