|Home | About | Journals | Submit | Contact Us | Français|
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Y chromosomes are derived from ordinary autosomes and degenerate because of a lack of recombination. Well-studied Y chromosomes only have few of their original genes left and contain little information about their evolutionary origin. Here, we take advantage of the recently formed neo-Y chromosome of Drosophila miranda to study the processes involved in Y degeneration on a genomic scale.
We obtained sequence information from 14 homologous bacterial artificial chromosome (BAC) clones from the neo-X and neo-Y chromosome of D. miranda, encompassing over 2.5 Mb of neo-sex-linked DNA. A large fraction of neo-Y DNA is composed of repetitive and transposable-element-derived DNA (20% of total DNA) relative to their homologous neo-X linked regions (1%). The overlapping regions of the neo-sex linked BAC clones contain 118 gene pairs, half of which are pseudogenized on the neo-Y. Pseudogenes evolve significantly faster on the neo-Y than functional genes, and both functional and non-functional genes show higher rates of protein evolution on the neo-Y relative to their neo-X homologs. No heterogeneity in levels of degeneration was detected among the regions investigated. Functional genes on the neo-Y are under stronger evolutionary constraint on the neo-X, but genes were found to degenerate randomly on the neo-Y with regards to their function or sex-biased expression patterns.
Patterns of genome evolution in D. miranda demonstrate that degeneration of a recently formed Y chromosome can proceed very rapidly, by both an accumulation of repetitive DNA and degeneration of protein-coding genes. Our data support a random model of Y inactivation, with little heterogeneity in degeneration among genomic regions, or between functional classes of genes or genes with sex-biased expression patterns.
In many animal and plant species, sex is determined by a pair of morphologically and genetically distinct sex chromosomes (X and Y chromosomes in species with male heterogamety [1-3]). A striking common feature of many taxa is the almost complete erosion of genes from the Y chromosome (or W chromosome, with female heterogamety). The Y chromosome also often contains an unusual abundance of repetitive DNA sequences [1,4]. Sex chromosomes evolved independently many times from an ordinary pair of autosomes that stopped recombining with each other after acquiring a sex-determining role [2,3]. In the absence of recombination, these originally homologous chromosomes continued to differentiate. The X chromosome, which continues to recombine in females, maintains most of its ancestral genes whereas the non-recombining region of the Y chromosome degenerates [2,3]. Various population genetic processes can lead to the observed loss of gene function from a non-recombining chromosome [1-3].
For example, the recently completed sequence of the human Y chromosome  has revealed that the 24 Mb male-specific portion of this chromosome contains only 78 protein-coding genes. However, old Y chromosomes, such as the human Y (about 300 million years old), retain few traces of the evolutionary processes that led to their degeneration. The best hope for identifying the causes of Y chromosome evolution is to investigate a young Y chromosome that is still in the process of degeneration.
Drosophila miranda, a close relative of D. pseudoobscura, has a recently formed new sex chromosome pair (called the neo-sex chromosomes). Due to a fusion of an autosome to the Y chromosome about one million years ago , a neo-Y chromosome has been formed. Its unfused homolog, which is in one copy in males and two copies in females, is called the neo-X chromosome. The neo-Y chromosome is transmitted in association with the ancestral Y chromosome through males only. Since male Drosophila lack recombination, this chromosomal arm is completely sheltered from recombination, and thus subject to the same evolutionary forces as a true Y chromosome [7,8]. Interestingly, most of the human Y chromosome is derived from a Y-autosome fusion; thus, the human Y chromosome is also a degenerate neo-Y .
Previous studies have shown that the neo-Y chromosome of D. miranda is in transition from an ordinary autosome into a degenerate Y chromosome [8,10,11]. About one-third of all genes studied to date are clearly non-functional on the neo-Y, containing frame-shift mutations or stop codons . Most remaining genes show various other signs of degeneration, such as an elevated rate of amino-acid substitutions [6,10-13] and less constraint in regulatory regions . Moreover, in situ hybridization experiments and sequence analysis have revealed an accumulation of transposable elements on the neo-Y relative to the neo-X [8,14,15].
However, these previous studies in D. miranda have investigated only relatively small genomic regions (mainly 1 kb fragments amplified by PCR and a few clones isolated from a λ genomic library). These studies are biased towards identifying conserved regions between the neo-sex chromosomes, and may greatly underestimate the fraction of non-functional or missing genes on the neo-Y, or the level of intergenic divergence, including the amount of transposable element DNA on the neo-Y. Here, we study the genomic organization of the neo-Y chromosome using sequenced bacterial artificial chromosome (BAC) clones from 7 regions scattered along the neo-sex chromosomes of D. miranda, which contain over 100 homologous gene pairs. The BAC sequencing approach here should remove the bias of identifying conserved regions, and allows us to obtain an unbiased estimate on the amount of degeneration of the neo-Y. In addition, this much larger data set also enables us to test which genomic features influence Y degeneration.
In particular, we address the following questions. What fraction of genes is non-functional or missing on the neo-Y? Is selection against amino acid changes reduced on the neo-Y? Are certain classes of genes more prone to degeneration? How abundant are transposable elements on the neo-Y? How much heterogeneity in the amount of degeneration and TE abundance is there among regions? Does an accumulation of TEs trigger degeneration at adjacent protein-coding regions? Are genes with female-biased expression more prone to degeneration on the neo-Y, and male-biased genes more conserved?
To investigate the genomic organization of the neo-sex chromosomes, a BAC library made from male D. miranda was screened. We used a pool of probes for 20 genes that were shown previously to be neo-sex linked . A total of 269 positive BAC clones were identified. BAC clones were end-sequenced using the SP6 and T7 primers. The sequences were blasted against the D. pseudoobscura genome sequence to orient the BAC clones along chromosome 3 of D. pseudoobscura, which is homologous to the neo-sex chromosomes of D. miranda. BAC clones were subsequently typed as being from the neo-X or neo-Y using sex-specific polymorphisms. A total of 202 BAC clones were mapped and oriented to the neo-sex chromosomes of D. miranda. Fourteen BAC clones were chosen for sequence analysis, from seven homologous regions on the neo-X and neo-Y that maximized both the number of genes sequenced and the overlap between the neo-X and neo-Y chromosomes (Table (Table1).1). Six BAC clones were sequenced using Sanger sequencing technology, and eight BAC clones were sequenced using 454 technology. Assemblies using these two technologies resulted in a very similar number of large contigs containing >500 bp per BAC (that is, roughly 10-20 large contigs for neo-X linked BAC clones, or 30-50 large contigs for neo-Y linked BAC clones; Table Table1).1). The total length in base pairs of large contigs is higher for the neo-X clones than for neo-Y clones (Table (Table1).1). This likely reflects the presence of transposable element (TE) fragments and repetitive DNA on the neo-Y chromosome (see below), resulting in poorer sequence assemblies on the neo-Y, and the collapse of homologous repetitive elements into the same contig.
Contigs were assembled and orientated against the homologous D. pseudoobscura region using BLAST analysis. Concatenated neo-X and neo-Y contigs were aligned to D. pseudoobscura, and annotated using the D. pseudoobscura genome annotation as a guide (release r2.0). Figure Figure11 shows a schematic overview of the orthologous genomic regions investigated on the neo-sex chromosomes, anchored to the D. pseudoobscura genome annotation. One BAC clone on the neo-Y, CH229-47N8, spans a 350 kb region in D. pseudoobscura, with the middle 300 kb being deleted from the neo-Y. This could reflect a real structural rearrangement on the neo-Y (deletion, duplication or inversion), or instead be an artifact of a recombinant BAC clone.
Accumulation of repetitive DNA and transposable elements is predicted to occur on a non-recombining genome, such as the Y chromosome. RepeatMasker  was used to characterize the distribution of known transposable elements on the neo-sex chromosomes of D. miranda (Table (Table2).2). As expected for euchromatic regions in Drosophila, transposable element insertions are rare on the neo-X; only about 1% of the DNA analyzed was classified as being transposable element derived. In contrast, about 20% of the DNA derived from the neo-Y chromosome consists of transposable elements, ranging from 14% to 29% among the BAC clones studied (Table (Table2).2). There is no significant heterogeneity in the amount of repetitive DNA identified among neo-Y linked regions (χ2 = 8.1; p = 0.71) .
The true total amount of repetitive DNA on the neo-Y, however, is clearly much larger than 20%. In particular, TEs were identified using mainly a D. melanogaster transposable element library. Comparisons of the sequenced neo-sex linked regions with the D. pseudoobscura genome sequence suggest that up to 50% of the neo-Y chromosome might be made of repetitive DNA. Specifically, the 1.46 Mb of neo-X derived sequence span a 1.40 Mb region in D. pseudoobscura (Table (Table1),1), suggesting that the neo-X chromosome has a similar repeat content to the autosomal orthologous regions in D. pseudoobscura. In contrast, the 0.94 Mb of neo-Y derived sequence span only a 0.42 Mb region in D. pseudoobscura (excluding BAC clone CH229-47N8, which either is a recombinant BAC clone or has large genomic deletions on the neo-Y; Table Table1).1). This suggests that more than half of neo-Y derived DNA consists of repetitive and junk DNA, including transposable elements and tandem duplications. Retroelements are the most abundant type of repetitive DNA identified on the neo-Y, accounting for more than 90% of the transposable element DNA of the neo-Y. This accumulation of transposable elements on the neo-Y chromosome of D. miranda is consistent with previous observations based on in situ hybridization experiments [8,14] or DNA sequence analysis [8,10].
Several other evolving Y chromosome systems have now been characterized in both plants (Silene latifolia, liverwort, papaya [17-19]) and animals (medaka, threespine stickleback [20,21]), and show a similar accumulation of transposable elements. These data suggest that transposable elements are an early invader of newly formed Y chromosomes, and play a general and widespread role in Y chromosome degeneration.
A total of 118 gene pairs were contained within the overlapping regions of the neo-sex linked BAC clones. Of these, 63 are potentially functional on the neo-Y chromosome (that is, they can be translated into a functional protein), while 55 genes are clearly non-functional. In particular, 4 genes are completely deleted from the neo-Y, 27 genes contain frameshift mutations, 7 genes harbor premature stop codons and 17 genes have both frameshift mutations and premature stop codons (Table (Table3).3). Thus, almost half of the genes have become pseudogenized after only about one million years of evolution on this non-recombining Y chromosome. This estimate is higher than previous ones (that is, a third of all genes were found to be non-functional ), and likely reflects that most previous genes investigated were isolated by PCR, which biases the amplification towards conserved genes.
Combining this new data set with 73 previously studied [10,11,22] and 18 novel genes or gene fragments isolated by PCR (see Materials and methods), the total number of gene pairs investigated to date on the neo-sex chromosomes of D. miranda amounts to 209 (that is, roughly 10% of all the genes that were originally present on the neo-Y chromosome). Of these, 123 have a potentially functional copy on the neo-Y chromosome, while 86 have clearly become pseudogenes. The rate of amino acid evolution is significantly higher at non-functional genes on the neo-Y compared to potentially functional ones, with non-functional genes evolving about twice as fast (dN/dS = 0.67 versus dN/dS = 0.38 for non-functional versus functional genes, Wilcoxon test, p < 0.001; Table Table4).4). This is consistent with the notion that non-functional genes evolve under little or no selective constraint and thus accumulate amino acid changes in a neutral manner. However, both functional and non-functional genes evolve significantly faster at the protein level on the neo-Y compared to their neo-X homologs (Table (Table4).4). Amino acid substitutions accumulate on the neo-Y approximately 2-fold faster than on the neo-X (dN/dS = 0.24 versus dN/dS = 0.50 for neo-X versus neo-Y genes, Wilcoxon test, p < 0.001; Table Table4).4). This supports the idea that the efficacy of natural selection is reduced on a non-recombining genome, resulting in an increased rate of accumulation of slightly deleterious amino acid substitutions on the neo-Y. In principal, amino acid accumulation can also result from adaptive protein evolution, and positive selection using polymorphism data has been detected on the neo-Y of D. miranda . However, it is likely that this adaptation occurs only at few genes, and the majority of amino acid accumulation is due to the reduced efficacy of natural selection due to selection at linked sites. Thus, most genes in fact suffer from the adaptation that occurs at a few loci on the neo-Y.
Interestingly, genes that have a functional copy on the neo-Y chromosome also evolve slower on the neo-X (average dN/dS = 0.19 for functional genes and dN/dS = 0.31 for genes whose homolog is a pseudogene on the neo-Y, Wilcoxon test, p < 0.03; Table Table4).4). This suggests that genes that generally evolve under stronger functional constraints (that is, lower dN/dS on the neo-X) are more likely to be conserved on an evolving Y chromosome. This is expected under evolutionary models of Y chromosome degeneration if genes that have fewer amino acid changes on the neo-X are genes in which mutations are more strongly deleterious, since weakly deleterious mutations are more likely to accumulate on a non-recombining chromosome.
Of the 209 genes investigated to date, 118 genes were isolated using the BAC library screen, and 91 genes using mainly a PCR-based approach. A larger fraction of genes is non-functional (47%) isolated from the BAC library, relative to the PCR biased set (35%), although this difference is not statistically significant (p = 0.17, Fisher's exact test). This is probably due to both the preferential amplification of conserved genes by PCR, and the fact that for many genes isolated by PCR only partial coding sequence was studied, which means that some stop codons or frameshift mutations will have gone undetected. Also, genes isolated by the BAC library screen evolve faster relative to the PCR biased set (average dN/dS = 0.53 for BAC genes and dN/dS = 0.45 for genes isolated by PCR, Wilcoxon test, p = 0.09). Again, a bias towards isolating conserved genes by PCR and a higher fraction of pseudogenes (which have higher dN/dS ratios (see above)) can account for faster rates of protein evolution in the set of genes isolated by the BAC library screen.
TEs can have both direct and indirect roles in contributing to Y chromosome degeneration. Direct effects are apparent as insertions into coding and regulatory regions on the neo-Y. Using Repeatmasker, we identified 13 TE insertions inside genes on the neo-Y chromosome (all inside introns). The fraction of transposable element insertions is similar between functional and non-functional genes; 7 out of the 51 pseudogenes annotated on the neo-Y contain a TE insertion, and 6 of the 63 genes with intact open reading frames (Fisher's exact test, p = 0.3). Thus, transposable elements do not simply accumulate neutrally in inactive genes, but may actively contribute to gene inactivation of adjacent genes on the neo-Y. Transposable elements inserted into pseudogenes are significantly longer than those inserted into functional genes (average length 604 versus 104 bp, Wilcoxon test, p = 0.01). We cannot distinguish whether larger TE insertions are simply more tolerated in inactive neo-Y genes, or whether large TE insertions trigger the inactivation of adjacent genes. Gene expression studies should help to clarify this question. TEs can also indirectly interfere with regular gene expression by producing antisense transcripts of adjacent genes, or by altering the chromatin structure of the Y chromosome. We detect no association between the functional status of a gene, and its mean distance to the closest TE insertion. Also, we find no correlation between transposable element abundance and the fraction of pseudogenes or rates of protein evolution of protein-coding genes among the BAC clones investigated. This does not support the idea that an accumulation of TEs indirectly interferes with the function of adjacent genes but, instead, TE insertions might have only very local effects . Detailed gene expression analysis should reveal the effect of individual TE insertions on the function of neo-Y-linked genes.
The contributions of the individual alleles to the phenotype are often non-additive; at many loci, wild-type alleles are dominant and mutant alleles are recessive. The dominance of wild-type alleles has been suggested to be a by-product of the 'physiology of the organism' , and that, in dosage-sensitive genes, wild-type alleles should be recessive . Compatible with the physiological theory of dominance, human enzymes are more likely to be haplosufficient (that is, the wild-type is dominant) while structural proteins are more likely to be haploinsufficient, that is, the wild-type is recessive . Genes on the neo-Y chromosome are usually sheltered by a functional copy on the neo-X. Thus, dosage-insensitive genes (like metabolic enzymes) might be more prone to degeneration than dosage-sensitive ones (like regulatory genes). To investigate this prediction, we compared the functional status of a neo-Y gene (that is, functional versus pseudogene) and its rate of amino acid evolution (dN/dS) to the Gene Ontology (GO) annotations. Of the 209 genes investigated, 142 had GO level 2 information. We investigated the same major functional categories as ref. . We observe no significant differences in the proportion of functional and non-functional genes among different functional categories (Table (Table5).5). GO categories also do not predict rates of protein evolution of neo-Y genes (results not shown). Thus, there is no evidence in our data that genes that encode proteins with enzymatic functions are more likely to degenerate, or genes encoding regulatory or structural proteins are less likely to degenerate, as might be expected under the physiological theory of dominance .
A large fraction of the Drosophila genome shows sex-biased expression [26,27]. Since the neo-Y chromosome of D. miranda is passed through males only, there may be less selective pressure to maintain genes with female-biased expression on this chromosome. Genes with male-biased expression, in contrast, may be more conserved on the neo-Y than genes with female-biased or unbiased expression . To investigate this prediction, we compared the functional status of a neo-Y gene (that is, functional versus pseudogene) and its rate of amino acid evolution (dN/dS) to patterns of sex-biased gene expression. We classified the neo-sex genes as male-biased, female-biased or non-biased, as identified in gene expression studies in D. pseudoobscura . Of the 209 genes investigated, 196 genes had information on patterns of sex-biased gene expression, with 25 genes showing male-biased expression, 52 genes showing female-biased expression, and 119 genes showing no sex bias in patterns of gene expression (Table (Table6).6). While genes with female-biased expression show a trend to evolve slightly faster on average on the neo-Y than male-biased or unbiased genes (mean dN/dS = 0.52 versus 0.47 and 0.47; Table Table6),6), these differences in rates of protein evolution are not significant. Also, the fraction of functional genes does not differ significantly among genes with different types of sex-biased expression (Table (Table6).6). Thus, there is no evidence in our data set that sex-biased gene expression is a major determinant in predicting patterns of Y chromosome degeneration.
While extensive sequence analysis has been performed on old Y chromosomes [5,30,31], our data represent the most extensive sequencing effort on a young Y chromosome to date. Several hallmarks of early Y chromosome evolution were identified. At least half of the neo-Y chromosome consists of repetitive DNA, and TEs, in particular, show a striking accumulation on the neo-Y compared to homologous neo-X regions. TEs accumulate uniformly among genomic regions, and no interaction between TE abundance and degeneration at protein-coding genes was detected. Roughly half of all genes originally present on the neo-Y chromosome have already become pseudogenized, after about one million years of evolution on this non-recombining chromosome. Pseudogenes evolve faster at the protein level on the neo-Y compared to functional genes, and the rate of protein evolution at both functional and non-functional genes is significantly elevated on the neo-Y genes relative to their neo-X homologs. This supports theoretical predictions that the efficacy of natural selection is reduced on a non-recombining genome. Genes that are more conserved at the protein level on the neo-X are also more likely to be functional on the neo-Y. However, functional categories or sex-biased gene expression patterns do not predict the amount of degeneration of neo-Y linked genes. It will be of great interest to study patterns of gene expression at these neo-sex linked regions, to link changes at the DNA level to Y inactivation and the evolution of dosage compensation.
How general will our observations on the processes involved in Y degeneration be? One peculiarity of the neo-sex chromosomes in Drosophila is that an entire chromosome stops recombining instantaneously, while in other systems, like humans, recombination is restricted gradually. While there is no obvious reason to expect that the types of changes accumulating on the Y would differ between these scenarios, the rate of degeneration is expected to be much slower in systems where only few genes are sheltered from recombination from the beginning . It will be of interest to study such systems, such as, for example, the young sex chromosomes of some plants  or vertebrates [20,21].
The CHORI-229 D. miranda BAC library has been constructed from adult males from a D. miranda isofemale line (MSH22) by the Children's Hospital Oakland Research Institute. High molecular weight DNA was partially digested with a combination of EcoRI restriction enzyme and EcoRI methylase and size fractionated by pulsed-field electrophoresis. DNA fragments from the appropriate size fraction were cloned into the pTARBAC2.1 vector between the two EcoRI sites. The ligation products were transformed into DH10B (T1 resistant) electro-competent cells (Invitrogen, Carlsbad, CA, USA). High density filters from the BAC library were screened with radiolabeled overgo probes designed to 20 neo-sex linked genes described in ref. . Positive BACs were endsequenced using the T7 and SP6 primers, and the sequences were blasted against the D. pseudoobscura genome sequence to orient the BAC clones along chromosome 3 of D. pseudoobscura (which is homologous to the neo-sex chromosomes of D. miranda). BAC clones were typed as neo-X or neo-Y specific using sex-specific polymorphisms. A total of 202 BAC clones were mapped and oriented to the neo-sex chromosomes of D. miranda.
Seven neo-X and seven neo-Y BACs were chosen for sequencing, employing two different sequencing strategies. Three homologous BAC clone pairs were sequenced using Sanger technology (clones isolated with probes CG3700, RnrS, and Cyp6t3). Purified BAC DNA was randomly sheared using hydroshear technology, size selected (approximately 2.5 kb) and subcloned into a pUC based vector and transformed into Escherichia coli. Universal primers and BigDye Terminator chemistry (Applied Biosystems, Foster City, CA, USA) were used for sequencing randomly selected plasmid subclones to an average sequence depth of 6×. The Phred/Phrap/Consed suite of programs was used for assembling and editing the sequences . Four homologous BAC clone pairs were sequenced using 454 technology (clones isolated with probes CG11228, CG2269, dpld, CG3167). 454 Life Sciences (Branford, CT, USA) performed one large (70 × 75) run of the BAC DNA, partitioned into eight. A total of 292,705 quality filtered sequence reads were generated with an average length of approximately 102 bp. The total sequence output was 30,171,327 bases. After E. coli removal, all quality filtered sequences were assembled using the 454 Newbler assembler. Sequences have been deposited in GenBank under accession numbers EU624504 - EU624734.
To identify repetitive elements, the neo-X and neo-Y sequence contigs were analyzed with RepeatMasker  using the Drosophila library. The masked BAC sequence contigs were oriented and mapped against the D. pseudoobscura genome sequence using BLAST analysis. Contigs with significant BLAST matches were concatenated to generate single sequence contigs for each BAC region from the neo-X and neo-Y chromosomes. A global alignment of the final neo-X, neo-Y and D. pseudoobscura sequence was performed using PipMaker .
The neo-sex linked BAC clones were annotated using the D. pseudoobscura genome sequence annotation as a guide. Pairwise alignments between D. pseudoobscura and the neo-X or neo-Y sequence, respectively, were performed using PipMaker. The positions of exon-intron boundaries on the neo-X and neo-Y sequence were determined using the alignment, and coding regions were extracted using custom PERL scripts. Coding regions were realigned using MUSCLE and by eye. Genes that contain STOP codons or frameshift insertions or deletions on the neo-Y were classified as non-functional. To estimate rates of synonymous and nonsynonymous mutations (dS and dN) on the neo-X and neo-Y branch, we used the PAML software package , which accounts for unequal transition and transversion rates and unequal base and codon frequencies.
BAC, bacterial artificial chromosome; GO, Gene Ontology; TE, transposable element.
DB conceived and supervised the study. XM isolated genomic DNA from D. miranda for BAC library construction. PJ made the D. miranda BAC library and performed the BAC library screening. EH isolated BAC clone DNA. DB analyzed the data, and KMW helped with the analysis of the GO data. DB wrote the manuscript.
This research is funded by NIH Grant GM076007 and a Sloan Fellowship to DB. BAC library construction was funded by a Wellcome Trust grant to P Keightley and B Charlesworth. P Andolfatto provided funds for the sequence of two BAC clones. We are grateful to D Sturgill for providing gene expression data prior to publication.