|Home | About | Journals | Submit | Contact Us | Français|
Both genome content and deployment contribute to phenotypic differences between species1-5. Sex is the most important difference between individuals in a species and has long been posited to be rapidly evolving. Indeed, in the Drosophila genus, traits such as sperm length, genitalia, and gonad size are the most obvious differences between species6. Comparative analysis of sex-biased expression should deepen our understanding of the relationship between genome content and deployment during evolution. Using existing7,8 and newly assembled genomes9, we designed species-specific microarrays to examine sex-biased expression of orthologues and species-restricted genes in D. melanogaster, D. simulans, D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. We show that averaged sex-biased expression changes accumulate monotonically over time within the genus. However, different genes contribute to expression variance within species groups compared to between groups. We observed greater turnover of species-restricted genes with male-biased expression, indicating that gene formation and extinction may play a significant part in species differences. Genes with male-biased expression also show the greatest expression and DNA sequence divergence. This higher divergence and turnover of genes with male-biased expression may be due to high transcription rates in the male germline, greater functional pleiotropy of genes expressed in females, and/or sexual competition.
There are numerous case studies demonstrating that orthologues with sex-biased function diverge more rapidly than genes with non-biased function10. To determine systematically the relative contributions of gene content and expression divergence to sexual differences, we sampled sex-biased expression within the Drosophila genus using species-specific microarrays designed for the closely related D. melanogaster, D. simulans and D. yakuba group (common ancestor, 10–13 million years ago), and for the more distantly related D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis (common ancestor, 40–65 million years ago) (Supplementary Table 1). The species-specific platform eliminated confounding effects of sequence divergence on hybridization and allowed us to assay the expression of lineage-restricted genes.
Previous work has demonstrated that sex-biased expression in D. melanogaster adults is substantial, primarily owing to gameto-genesis10. This seems to be characteristic for the genus (Fig. 1, and Supplementary Fig. 1). Generally, we observed greater male-biased expression (~7–14% of the transcriptome) relative to female-biased expression (~3–9% of the transcriptome), at a significance value of P≤0.01 (Mann–Whitney, false-discovery-rate-corrected). The exceptions were D. pseudoobscura (~16% female- and male-biased expression) and D. mojavensis (~12% female- and male-biased expression). Additionally, the magnitude of male-biased expression was generally greater than the female-biased expression—the average log2 female:male expression ratio was −1.2 for genes with male-biased expression and 0.8 for genes with female-biased expression. This indicates that there were more genes approaching male-specific expression than female-specific expression. The genes that showed sex-biased expression in each species are listed in Supplementary Information (Supplementary Tables 3–16).
To examine expression divergence over time, we parsed the genes with orthologues in every species and constructed a pairwise matrix of log2 female:male expression ratios. We compared expression within species (two strains of D. simulans), between species within the closely related melanogaster subgroup, and between all seven species (Fig. 2a–c). Similar pairwise matrices for quadruplicate replicateswithin each species were also plotted as a baseline measurement of technical noise and biological variability (Supplementary Fig. 2). All expression ratio plots were linear and showed increasing expression divergence with inferred genetic distance.
There was an especially clear relationship between sequence and expression divergence. Neighbour-joining trees of expression divergence (from the pairwise expression ratios between each species; 1−Pearson's r; Supplementary Fig. 3), or by sequence divergence9,11 have the same topology (Fig. 2d). Expression divergence tightly correlated with time (Fig. 2e, r2 = 0.96), which may provide a useful tool in molecular phylogenetics.
Although the whole-genome trends in expression divergence were both obvious and clear, at the gene level, the magnitude of expression divergence was modest. Only 384 orthologue pairs (0.3%) showed significant female-biased expression in one species and significant male-biased expression in another. Switches between highly female-biased expression and highly male-biased expression were never observed (Fig. 2c). Extensive (20%) categorical changes in sex-bias class, especially for genes with male-biased expression, were previously reported between D. melanogaster and D. simulans12,13. We observed a categorical change in sex-biased expression in 12% of the orthologues between these two species, but the changes were dominated by low magnitude changes between modest sex-biased expression and non-sex-biased categories. These values are highly sensitive to arbitrary significance-level cut-offs; however, it was clear in exploratory plots of expression ratios that genes with male-biased expression showed greater expression divergence (Fig. 2b, c). Plots of expression ratio standard deviations against average expression ratio (Fig. 3a) also showed a clear excess of variable expression among orthologues with male-biased expression (P<10−8, chi-squared test). Thus, male-biased expression contributes heavily to overall expression divergence.
To determine if particular types of genes show greater or lesser expression divergence we analysed Gene Ontology14 (GO) terms. Unsurprisingly, genes annotated as ‘unknown function’ are significantly over-represented (P<10−8, Fisher's exact test) among genes with variable expression. Genes with ‘transcriptional regulation’ annotations were under-represented in the same gene set (P<10−4, Fisher's exact test), suggesting that genes involved in transcription regulation are under constraint. Similar constrained expression of transcriptional regulators was observed in a study of metamorphosis in the melanogaster subgroup5.
Just as changes in DNA sequence can have consequences ranging from deleterious to neutral to advantageous15, changes in gene expression should have variable effects, owing to underlying mutations in transcription factors, cis-regulatory sites and post-transcriptional regulators, and the resulting variance will be subject to drift and selection2,3,5,13,16-18. We were able to distinguish expression differences between species well enough to show a linear relationship with time at the full-transcriptome level, but does this apply to individual genes?
To determine if there is a common set of orthologues that can tolerate variable expression (that can be thought of as the thematic equivalent of a synonomous codon substitution), we asked if expression divergence between orthologues within the melanogaster subgroup correlates with the expression divergence between more distantly related species. We found no significant correlation between orthologue expression divergence between groups of species (r2 = 0.08, Fig. 3b). Genes with greater expression divergence in the melanogaster subgroup and the remaining species are different. Thus, although overall expression divergence shows a clock-like behaviour (reflecting mutation accumulation in a neutral model, or an adaptive speed limit in a selection model), different individual genes contribute to this global expression divergence in different amounts. This suggests that there is not a common set of genes that tolerate large drifts in sex-biased expression ratios.
To analyse further the orthologues with the most divergent expression, we selected orthologues with the greatest expression divergence (s.d.>0.5) and subjected them to cluster analysis with species-order fixed (Fig. 3c). Strikingly, even those genes with the most variable expression were organized into well-defined clusters. Each of the clusters was subsequently analysed to look for patterns of change. We observed three distinct cluster types revealing expression divergence between lineages, aberrant expression in a single species, and unpatterned variability (Fig. 3c, d). For example, cluster ‘A’ shows higher male-biased expression in just the melanogaster subgroup (D. melanogaster, D. simulans and D. yakuba); cluster ‘B’ shows increased male-biased expression in D. pseudoobscura only; and cluster ‘C’ shows no evidence for a phylogenetic trend. Briefly, among the 5% of common orthologues with the most variable expression, 52% exhibited lineage-specific, 22% species-specific and only 25% unpatterned expression variability.
Having only a few sequenced genomes seriously hinders the study of genes that are species- or lineage-specific (species-restricted). We took advantage of the species-specific array design to determine the contribution of common orthologues and species-restricted genes to overall sex-biased expression patterns (Fig. 4a, b). Female-biased expression was over-represented (P<10−2, chi-squared test) among common orthologues in four of the seven species, whereas male-biased expression was always under-represented. The pattern was reversed among the species-restricted genes. Female-biased expression of species-restricted genes was less prevalent in all species except D. virilis, and male-biased expression was more prevalent in each of the species examined. Female-biased expression was also under-represented among paralogues (Supplementary Fig. 4). Similar results were obtained using TBLASTN methods to detect genes that had diverged to obscure orthology (Supplementary Fig. 5). These suggest that genes with male-biased expression have higher effective birth and extinction rates.
We also asked if sex-bias and expression divergence correlate with sequence divergence among orthologues. If similar selective pressure acts on both protein-coding capacity and expression at a given locus, then they should correlate. However, protein-coding capacity and expression divergence need not be tightly coupled. For example, high expression divergence can result from changes in upstream transcription factors or the cis-regulatory sites that they bind19.
Synonymous (KS) and non-synonymous substitution rates (KA)in protein-coding genes were used to examine sequence divergence20. Multiple substitutions occur at a given site between distantly related species (for example, D. melanogaster and D. mojavensis) making KA/KS ratios much less reliable, and therefore KA/KS ratios were used only within the melanogaster subgroup (Fig. 4c, d). Genes with male-biased expression were expected to show higher KA/KS ratios10. Indeed, common orthologues with male-biased expression had KA/KS values within the melanogaster subgroup (0.129), more than two times those of common orthologues with female-biased expression (0.061). Interestingly, common orthologues with non-biased expression showed intermediate KA/KS values. We observed a strong correlation between expression and sequence divergence among the genes showing the greatest expression divergence (Fig. 4c), as has also been seen in mammals21. Additionally, species-restricted genes had higher sequence-divergence than common orthologues for all expression categories (Fig. 4d), as has been seen in vertebrates22. Perhaps expression divergence, gene turn-over, sex-bias and sequence divergence of individual genes are often coupled to the same selective forces.
The contrasting divergence and turnover patterns of genes with male-biased expression relative to those with female-biased expression is somewhat surprising. Reproduction is the function of a couple, not an individual; therefore co-evolution of reproductive traits is expected to occur. For example, selection for sperm tail length in Drosophila males is coupled to selection for length of the seminal receptacle in females23. There are a number of possible explanations. There may be greater de novo generation of genes with male-biased expression as a result of simple sequence requirements for core promoter generation24 and extremely high levels of RNA polymerase in spermatocytes25. This combination might result in excessive transcription of intragenic regions26. A few of these new genes with male-biased expression might be functional, but most of these ‘de novo’ genes would be expected to rapidly degenerate. Alternatively, genes required for oogenesis may be more constrained because of pleiotropy or the under-representation of paralogues with partially overlapping functions. Many D. melanogaster genes required for female fertility are also required for organismal viability27, and genes with clear multiple functions, such as those encoding ribosomal proteins, are overexpressed in ovaries relative to testes28. Finally, male–male competition might be particularly strong29. The addition of more sequenced genomes will provide ample opportunities to explore these questions further.
Species were grown on standard media (Tucson Drosophila Stock Center). We isolated messenger RNA from adult females and males grown at 22 °C (5–7 days post eclosion), and labelled and hybridized using standard methods.
Oligonucleotide arrays of 50-mers (NimbleGen Systems) were designed against draft assemblies and ab initio annotations we contributed to the gene model reconciliation9 (Supplementary Table 2). D. melanogaster 60-mer expression array (NimbleGen Design ID 2005-10-17_Dmel4_60mer_exp) designed on the basis of Flybase annotation V4.2 was used for D. melanogaster hybridizations. For this report, we remapped all array elements to current consensus gene models9. Our expression results and conclusions were similar using the original models. Because low-magnitude expression divergence is difficult to distinguish from noise, we performed at least quadruplicate replicates for each species and only channels passing a stringent quality control regimen were used in the final analysis (72 channels total). Full platform descriptions and data are available at the GEO under accession GSE6640.
For each gene, log2 intensities for female and male expression were compared by non-parametric two-sample Mann–Whitney tests to generate the significance of sex-biased expression (P ≤ 0.01) and ratios of each gene were calculated as the average probeset intensity in female channels divided by the intensity in male channels. Common orthologues are present in all 7 species and species-restricted genes are present in at least one species, but absent in ≥1 species.
Stocks (Tucson Drosophila Stock Center) of D. simulans (14021-0251.198 and 14021-0251.011), D. yakuba (14021-0261.01), D. ananassae (14024-0371.13), D. pseudoobscura (14011-0121.94), D. mojavensis (15081-1352.22) and D. virilis (15010-1051.87) were grown on standard cornmeal media (Tucson Drosophila Stock Center), with the exception of D. pseudoobscura, D. mojavensis and D. virilis, which were grown on Banana-Opuntia media (Tucson Drosophila Stock Center).
Oligonucleotide 50-mer arrays (NimbleGen Systems) were designed on the basis of draft or versioned genomic assemblies of six Drosophila species (D. simulans assembly: PCAP assembly for white501, GSC, Wash U, 01 December 04; D. yakuba assembly: GSC (WashU), 07 April 2004. D. ananassae assembly: Arachne Assembler, Agencourt, 06 December 2004; D. pseudoobscura assembly: v. 1.03 from FlyBase, December 2004; D. mojavensis assembly: Arachne Assembler, Agencourt, 06 December 2004; D. virilis assembly: Arachne Assembler, Agencourt, 29 October 2004). An average of 10 array probes were selected without bias with respect to position within each of our OLIV gene models. The OLIV set includes both high- and low-confidence models, which included non-overlapping draft EIS gene models based on D.melanogaster orthology (M. Eisen laboratory, v1.0, Feb 2005), ab initio GeneID30 predictions using the D. melanogaster training set, FlyBase31 genes and expressed sequence tag sequence from GenBank32. Array probes were remapped to the final genome assemblies and gene predictions GLEANR9 by BLAT V25x1 (ref. 33). Only probes uniquely and perfectly matched to both annotation and assembly were used for final analysis.
Hybridization was according to the manufacturer's instructions (NimbleGen Systems), except that hybridization was done in custom-made chambers. Arrays were scanned on an Axon GenePix 4000B (Molecular Devices Corporation) and data were captured using NimbleScan 2.1 (NimbleGen Systems). For each species, at least four hybridizations, including technical (dye-flipped) replicates for each of two discrete samples (biological replicates) were performed. Extra hybridizations were performed for a different D. simulans stain (14021-0251.011).
We used a multi-step quality control pipeline. Hybridization channels were retained when experimental intensities were>1 s.d. above mean on-spot background (from non-Drosophila control elements) and the inter-quartile ranges of log2 intensities were>1. Passed channels were normalized using variance stabilization normalization34. Signal variability between replicate channels was then tested by calculating the inter-quartile range of the relative log expression values for each channel against a virtual reference (the median value in all replicate channels for each array element). The channels with inter-quartile ranges of the relative log expression values greater than one were rejected. Passed channels were re-normalized by variance stabilization normalization from the raw data. This approach does not over-normalize the data while assuring that hybridization intensity is consistent between replicate channels.
For each gene, log2 intensities from female-sample single channels and male-sample single channels were compared by non-parametric two-sample Mann–Whitney tests to generate a significance measure. Ratios were then calculated as the average probeset intensity in female channels divided by the average probeset intensity in male channels for each gene. P values were false-discovery-rate-corrected35. The cut-off for sex-biased expression we used was false-discovery-rate-adjusted P≤0.01. Expression was called ‘below background’ when the average probeset intensity was less than the average intensity of negative controls (probes targeting four Arabidopsis genes and two yeast genes) in both sexes.
Orthology calls are as described9. Common orthologues represent orthologues present in all 7 species and species-restricted genes are present in one species, but absent in≥1 species. Paralogues are excluded from most of the data analysis. Multiple sequence alignments of orthologues were imported using the seqinR package36. KA/KS estimates adjusted for differences in transition and transversion rates were calculated from these alignments20. Average KA/KS of common orthologues were calculated from all possible KA/KS values between the melanogaster subgroup, then median KA/KS values were calculated for each category of genes with different sex-biased expression and genes with different expression divergence.
DNA sequence divergence and expression divergence (1–Pearson's r between the sex-biased expression ratio of two species) between each species pair was calculated. DNA sequence divergence was presented using genomic mutation distance by the method described previously11. Neighbour-joining trees were then inferred using DNA sequence divergence and expression distance from six species separately in MEGA4 (ref. 37). The common orthologues with most variable expression among species (s.d.>0.5) were K-means clustered with 10 nodes using the euclidean similarity metric in Cluster 3.0/Tree-View38.
Unless otherwise noted all data handling was performed in BioConductor40.
We thank the Drosophila Comparative Genome Sequencing and Analysis Consortium for access to the assembly, alignment and annotation of the 12 sequenced Drosophila genomes, S. Davis for valuable technical advice, and K. P. White, C. Vinson, A. Clark, M. Lynch and members of the B. Oliver laboratory for helpful discussion and comments on the manuscript. We are supported by the Intramural Research Program of the NIH, NIDDK, except S.K. who is supported by the NIH Extramural Program.
Author Information Reprints and permissions information is available at www.nature.com/reprints.
Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.