Many animal species use a chromosome-based mechanism of sex determination, which has led to the coordinate evolution of dosage-compensation systems. Dosage compensation not only corrects the imbalance in the number of X chromosomes between the sexes but also is hypothesized to correct dosage imbalance within cells that is due to monoallelic X-linked expression and biallelic autosomal expression, by upregulating X-linked genes twofold (termed ‘Ohno’s hypothesis’). Although this hypothesis is well supported by expression analyses of individual X-linked genes and by microarray-based transcriptome analyses, it was challenged by a recent study using RNA sequencing and proteomics. We obtained new, independent RNA-seq data, measured RNA polymerase distribution and reanalyzed published expression data in mammals, C. elegans and Drosophila. Our analyses, which take into account the skewed gene content of the X chromosome, support the hypothesis of upregulation of expressed X-linked genes to balance expression of the genome.
The evolution of a specific set of sex-determining chromosomes has led, in many species, to the existence of a single X chromosome in males and two in females. Unchecked, this mechanism of sex determination would result both in unequal X-linked gene expression between the sexes and in a naturally occurring ‘X aneuploidy’ in males. Chromosomal aneuploidy, caused by loss or gain of large regions of the genome, results in disturbances in the overall balance of gene expression networks, leading to reduced fitness and mortality1,2. Thus, mechanisms to maintain balanced expression of the genome are vital3. To compensate for haploinsufficiency of X-linked genes in males and to restore equal expression between the sexes, dosage-compensation mechanisms evolved, which differ between organisms. In D. melanogaster, expression of most X-linked genes is increased about twofold selectively in males to achieve balanced expression with the autosomes and to equalize expression with females4,5. In mammals and C. elegans, two processes have evolved: (i) a mechanism to equalize X-linked gene expression between the sexes by either silencing one X chromosome in mammalian females (X inactivation)6 or repressing both X chromosomes in C. elegans hermaphrodites7 and (ii) a mechanism hypothesized to upregulate expression from the X chromosome in both sexes8,9. Whereas the doubling of X-linked gene expression in Drosophila has been well documented, evidence for X-linked gene upregulation in mammals and C. elegans is relatively recent. Measurements of Clcn4-2 expression in different species of mice and analyses of expression microarray data provided the first evidence for the upregulation of genes on the X chromosome8–10. In XX or XY cells, the ratio of X-linked to autosomal expression as measured by microarrays was ~1. This held true in several tissues from multiple mammalian species, as well as in C. elegans and Drosophila8,9.
A recent study concluded that, in contrast to expression array data, RNA sequencing data do not support dosage compensation of the active X chromosome in mammals and adult C. elegans, thereby rejecting Ohno’s hypothesis11. Based on reanalyses of published RNA-seq data, the authors calculated that X:autosome (X:A) median expression ratios were ~0.5 in humans and 0.3 in mice, whereas in C. elegans, ratios declined from 1.0 in embryos to 0.4 in adults. However, as shown here, these ratios are strongly influenced by the preponderance of reproduction-related genes on the X chromosome that are silent in somatic tissues12 and by special aspects of X chromosome regulation in germ cells13–15. Our reanalyses of published RNA-seq data and data from new experiments, which take into consideration the skewed gene content and regulation of the X chromosome, uphold the idea that there is compensation of gene expression between the X chromosome and autosomes in both sexes of mammals, C. elegans and Drosophila.
To determine whether X-linked genes were expressed at twice the level of autosomal genes per active allele in mammals, we first investigated whether the X chromosome and autosomes had similar expression profiles. We suspected that the skewed gene content of the mammalian X chromosome would affect the distribution of gene expression because reproduction-related genes have low or no expression in somatic tissues16–18. The distribution of X-linked and autosomal gene expression was determined in multiple human and mouse tissues and cell lines by the reanalysis of a subset of the same RNA-seq data sets analyzed previously11, in addition to the evaluation of newly released RNA-seq data sets (Supplementary Table 1). In all human and mouse samples examined, the frequency of genes with no expression (0 fragments per kilobase of exon per million mapped fragments (FPKM)) was significantly higher on the X chromosome than on autosomes (P < 0.05, by Fisher’s exact test; Supplementary Fig. 1a and Supplementary Table 2). As demonstrated in the following section, many of these are multicopy genes expressed in the testis.
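The comparison of zero-FPKM frequencies above is a test on a 2×2 contingency table: X-linked or autosomal genes versus genes with 0 FPKM or >0 FPKM. The study ran Fisher's exact test in R. As a minimal, dependency-free sketch, the two-sided test can be computed directly from hypergeometric probabilities. The function name and the example counts below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: X-linked genes, autosomal genes.
    Columns: genes with 0 FPKM, genes with >0 FPKM.
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1 = a + b          # number of X-linked genes
    col1 = a + c          # number of genes with 0 FPKM
    n = a + b + c + d     # all genes
    denom = comb(n, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that x of the 0-FPKM genes are X-linked.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = table_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        p = table_prob(x)
        # Small relative tolerance so tables tied with the observed one are counted.
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


# Hypothetical counts: 250 of 850 X-linked genes and 3,000 of 19,000
# autosomal genes have 0 FPKM.
print(fisher_exact_two_sided(250, 600, 3000, 16000))
```

Exact integer arithmetic via `math.comb` avoids overflow in the binomial coefficients. Only the final division produces a float.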
The distributions of gene expression were similar between the X chromosome and autosomes in brain and five other human tissues (P > 0.05, by Kolmogorov-Smirnov test), even when including all genes with any evidence of expression (>0 FPKM) (Fig. 1a, Supplementary Fig. 1b and Supplementary Table 2). This finding was reinforced by computationally doubling the X-linked expression values. The resulting theoretical curve was shifted far to the right of the autosomal curve, whereas the two curves would be expected to overlap after doubling if expression on the X chromosome were not upregulated (Fig. 1a). A cutoff of >0 FPKM includes some genes that are actually silent but that record reads as a result of biological noise or noise from the process of sequencing and read-mapping19. For genes with more robust expression (≥1 FPKM), the distributions of X-linked and autosomal expression became statistically indistinguishable in 12 of 16 human tissues, consistent with the upregulation of expressed X-linked genes (P > 0.05, by Kolmogorov-Smirnov test; Supplementary Table 2). Additional RNA-seq data from four human ENCODE cell lines confirmed the results obtained in human tissues (Supplementary Fig. 2a and Supplementary Table 2). Computational doubling of X-linked expression values also shifted the X-chromosome curve to the right of the autosomal curve. We then compared gene expression from individual chromosomes. Box plots showed no significant differences between the X chromosome and each human autosome (P > 0.05, one-way ANOVA test for genes with >0 FPKM; Fig. 1b). In addition, the average distribution of X-linked expression in 41 human lymphoblastoid cell lines20 was not significantly different from autosomal expression (P > 0.05 for 15 of 22 autosomes, by Kolmogorov-Smirnov test for genes with >0 FPKM; Fig. 1c).
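The Kolmogorov-Smirnov comparisons and the computational doubling described above can be sketched in a few lines. The following stdlib-only Python is an illustrative assumption, not the authors' pipeline (their tests were run in R). It does three things:

- filters genes with an expression predicate such as >0 or ≥1 FPKM;
- optionally doubles the X-linked values;
- returns the two-sample statistic D with an asymptotic p-value.

The function names and toy values are hypothetical.

```python
from bisect import bisect_right
from math import exp, sqrt
from typing import Callable, Sequence


def ks_two_sample(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic two-sided p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical CDFs at any observed value.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    en = sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges poorly here, and Q(lam) is ~1 anyway.
        return d, 1.0
    # Kolmogorov survival function Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2).
    q = 2 * sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(q, 0.0), 1.0)


def compare_x_to_autosomes(
    x_fpkm: Sequence[float],
    a_fpkm: Sequence[float],
    keep: Callable[[float], bool] = lambda v: v >= 1.0,
    double_x: bool = False,
) -> tuple[float, float]:
    """Filter both gene sets with `keep`, optionally double X values, then run KS."""
    xs = [v * 2 if double_x else v for v in x_fpkm if keep(v)]
    a_ = [v for v in a_fpkm if keep(v)]
    return ks_two_sample(xs, a_)


# Toy data: identical distributions, so D = 0; doubling X separates the curves.
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(compare_x_to_autosomes(toy, toy))                  # D = 0.0, p = 1.0
print(compare_x_to_autosomes(toy, toy, double_x=True))   # D = 0.5
```

Passing the threshold as a predicate keeps the ">0 FPKM" and "≥1 FPKM" cutoffs explicit, for example `keep=lambda v: v > 0`.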
The calculated X:A median expression ratios based on RNA-seq data were strongly dependent on the inclusion or removal of genes with no or low expression (Fig. 1d and Supplementary Table 2). Comparisons between 41 male and female lymphoblastoid cell lines20 also revealed variability between individual samples, but no significant sex differences (P > 0.05, by Student’s t-test; Fig. 1d and Supplementary Fig. 2b). Given the skewed X:A distribution of genes with spatially or temporally restricted expression, for example reproduction-related genes, X:A median expression ratios are clearly distorted. A better approach is to compare the complete distributions of X-linked and autosomal gene expression, as shown above (Fig. 1). The advantage of using distributions was also seen by calculating X:A median expression ratios for genes separated into 16 bins, each containing the same number of X-linked genes (33–59, depending on tissue) and autosomal genes (933–1,396) (Fig. 1e and Supplementary Fig. 2b). Note that brain has the highest X:A median expression ratio, as previously reported9. For all tissues, the X:A median expression ratios were variable for bins containing genes with low expression and gradually increased from ~0.7 to ~1.0 for bins containing genes with expression measurements ≥1 FPKM. Thus, the inclusion of non-expressed or very weakly expressed genes masks the upregulation of X-linked genes; such genes are overrepresented because of the skewed gene content of the X chromosome.
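The sensitivity of the median ratio to silent genes, and the binned alternative, can be illustrated with a short sketch. It assumes, as one reading of the text, that X-linked and autosomal genes are each rank-ordered separately and split into 16 equal-count bins. The function names and toy values are hypothetical.

```python
from statistics import median
from typing import Callable, Sequence


def xa_median_ratio(
    x_fpkm: Sequence[float],
    a_fpkm: Sequence[float],
    keep: Callable[[float], bool] = lambda v: True,
) -> float:
    """Median X-linked expression divided by median autosomal expression."""
    return median(v for v in x_fpkm if keep(v)) / median(v for v in a_fpkm if keep(v))


def equal_count_bins(values: Sequence[float], n_bins: int) -> list[list[float]]:
    """Sort values and split them into n_bins consecutive bins of near-equal size."""
    s = sorted(values)
    n = len(s)
    return [s[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]


def binned_xa_ratios(x_fpkm: Sequence[float], a_fpkm: Sequence[float],
                     n_bins: int = 16) -> list[float]:
    """X:A median ratio per expression bin; X and autosomal genes are binned separately."""
    ratios = []
    for xb, ab in zip(equal_count_bins(x_fpkm, n_bins), equal_count_bins(a_fpkm, n_bins)):
        a_med = median(ab)
        ratios.append(median(xb) / a_med if a_med > 0 else float("nan"))
    return ratios


# Toy data: the X carries extra silent genes but is otherwise matched.
x_toy = [0, 0, 0, 0, 1, 2, 3, 4, 5, 6]
a_toy = [0, 1, 2, 3, 4, 5, 6]
print(xa_median_ratio(x_toy, a_toy))                        # 0.5
print(xa_median_ratio(x_toy, a_toy, keep=lambda v: v > 0))  # 1.0
```

The toy example shows how a surplus of silent X-linked genes alone can halve the median ratio, even when the expressed genes are identically distributed.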
Likewise, in mouse tissues, the distributions of X-linked and autosomal expression were similar for genes with >0 FPKM in brain and for genes with ≥1 FPKM in other tissues (P > 0.05, by Kolmogorov-Smirnov test; Fig. 2a, Supplementary Fig. 3 and Supplementary Table 2). There was no significant difference between the X chromosome and 14 of 19 mouse autosomes (P > 0.05, one-way ANOVA test for genes with >0 FPKM; Fig. 2b). In mouse as well, the calculated X:A median expression ratios based on RNA-seq data were strongly dependent on the inclusion or removal of genes that were expressed weakly or not expressed (Fig. 2c). The combined analyses of human and mouse tissues and cell lines reveal a high frequency of X-linked genes with no or low expression. They also support the idea that most expressed genes on the single active X chromosome are upregulated to reach a level of expression similar to that of autosomes present in two copies.
Tissue-specific genes implicated in sexual reproduction have accumulated on the mammalian X chromosome. The nonrandom distribution of genes with gonad and brain expression would have a profound effect on the evaluation of Ohno’s hypothesis. Reproduction-related genes on the human X chromosome include 13 multicopy families of cancer-testis antigen genes and represent at least 10% of the X-linked genes17,21,22. Genes expressed in brain are also abundant on the human X chromosome23–25. On the mouse X chromosome, about 18% of genes are included in 33 multicopy gene families representing a total of 273 genes that are mainly expressed in postmeiotic male germ cells18. Genes expressed in premeiotic male germ cells and somatic cells of the testis, as well as in female-specific tissues such as placenta and the ovary, also predominate on the mouse X chromosome16,26. Thus, it was not unexpected that the two tissues with the lowest percentage of non-expressed X-linked genes were testis (16%) and brain (22%), as compared to other tissues (23–40%) (P < 4 × 10⁻⁷, by Fisher’s exact test between testis and other tissues; Supplementary Fig. 1a and Supplementary Table 2).
In human testis, the distributions of gene expression remained significantly different between the X chromosome and autosomes, even for well-expressed genes with >2 FPKM (P = 1 × 10⁻⁴, by Kolmogorov-Smirnov test; Supplementary Table 2). This finding suggested that a specific set of X-linked genes is expressed in the testis. We conducted pairwise analyses comparing the testis to other human tissues. For each somatic tissue (liver, adrenal gland and lung), we plotted the distribution of testis expression for the subsets of X-linked and autosomal genes that were not expressed (0 FPKM) in that tissue. A higher proportion of X-linked than of autosomal genes fell into these subsets, confirming the preponderance of reproduction-related genes on the X chromosome (P < 1 × 10⁻⁷, by Kolmogorov-Smirnov test; Fig. 1f and Supplementary Fig. 4). For example, 3,066 genes had 0 FPKM in human liver and >0 FPKM in human testis. Of these, 192 were X linked (22% of the total number of X-linked genes assayed) and 2,874 were autosomal (14% of the total number of autosomal genes assayed) (P = 4 × 10⁻⁹, by Fisher’s exact test; Supplementary Table 3). Comparisons between somatic tissues did not show this trend (Supplementary Fig. 4). We performed Gene Ontology analyses of the subset of X-linked genes with no or very low expression (0–0.1 FPKM) in human somatic tissues. Six of the top ten functional categories (false discovery rate (FDR) score 2 × 10⁻⁸) were reproduction-related. Similarly, more than 93% of known reproduction-related genes16,18 were not expressed in mouse somatic tissues (Supplementary Table 4). We conclude that reproduction-related genes account for many of the X-linked genes that are not expressed or are expressed at low levels in somatic tissues of mammals. When we correct for this biased distribution, RNA-seq data are strongly consistent with the predictions of Ohno’s hypothesis of similar expression levels between the X chromosome and autosomes.
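The subset selection used in the pairwise comparisons can be sketched as a simple filter over per-gene values. The tuple layout, the function name and the exclusion of Y-linked and mitochondrial genes are assumptions made for illustration.

```python
from typing import Iterable


def silent_in_soma_expressed_in_testis(
    genes: Iterable[tuple[str, float, float]],
) -> dict[str, tuple[int, int, float]]:
    """Count X-linked and autosomal genes with 0 FPKM in a somatic tissue but >0 FPKM in testis.

    Each gene is (chromosome, somatic_fpkm, testis_fpkm). Returns, per group,
    (genes meeting the criterion, genes assayed, fraction of genes assayed).
    """
    hits = {"X": 0, "A": 0}
    totals = {"X": 0, "A": 0}
    for chrom, soma, testis in genes:
        if chrom in ("chrY", "chrM"):
            continue  # assumption: Y-linked and mitochondrial genes are excluded
        group = "X" if chrom == "chrX" else "A"
        totals[group] += 1
        if soma == 0 and testis > 0:
            hits[group] += 1
    # Assumes both groups are non-empty.
    return {g: (hits[g], totals[g], hits[g] / totals[g]) for g in hits}


# Hypothetical genes: (chromosome, liver FPKM, testis FPKM).
toy_genes = [
    ("chrX", 0.0, 5.0), ("chrX", 2.0, 3.0),
    ("chr1", 0.0, 1.0), ("chr1", 1.0, 1.0),
    ("chr2", 0.0, 0.0), ("chr2", 3.0, 0.5),
    ("chrY", 0.0, 9.0),
]
print(silent_in_soma_expressed_in_testis(toy_genes))
```

The resulting hit and non-hit counts for each group form the 2×2 table on which a Fisher's exact test can then be run.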
If upregulation of the X chromosome occurs at the level of transcription, one would expect higher occupancy by active forms of RNA polymerase II on the X chromosome. To test this idea, we used an undifferentiated female embryonic stem (ES) cell line (PGK12.1) with two active X chromosomes and an X:A expression ratio of 1.4 (ref. 27). Because the X:A expression ratio should theoretically be 2 if X-linked expression were doubled by dosage-compensation mechanisms, the observed lower ratio suggests that dampening of X-linked expression occurs in PGK12.1 cells, thus allowing the survival of this particular cell line28. Most female ES cell lines lose one X chromosome, likely as a result of the deleterious effects of high X-linked expression, which is also the probable cause of lethality in mouse embryos engineered to retain two active X chromosomes29.
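The expected ratio of 2 follows from simple bookkeeping. Consider an illustrative model, assumed here rather than taken from the study:

- e is the expression of one autosomal allele;
- u is the fold upregulation of each active X-linked allele;
- k is the number of active X chromosomes.

Then:

```latex
\frac{X}{A} = \frac{k\,u\,e}{2\,e} = \frac{k\,u}{2},
\qquad
k = 1,\ u = 2 \;\Rightarrow\; \frac{X}{A} = 1,
\qquad
k = 2,\ u = 2 \;\Rightarrow\; \frac{X}{A} = 2.
```

Under this model, the observed ratio of 1.4 in PGK12.1 cells (k = 2) corresponds to u = 1.4 for each active X chromosome, that is, about 70% of the fully doubled level.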
We performed chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) on PGK12.1 cells, targeting the form of RNA polymerase II that is phosphorylated at Ser5 (PolII-S5p), which is specifically associated with active transcriptional initiation30. In two separate experiments, PolII-S5p was more highly enriched at the 5′ ends of X-linked genes than at those of autosomal genes (Fig. 3a and Supplementary Fig. 5). RNA-seq analyses confirmed higher expression of X-linked genes in this cell line (Fig. 3b). When we sorted genes into nine bins based on expression, X-linked genes in the upper four bins had higher PolII-S5p occupancy than autosomal genes (Fig. 3b). As expected, no such difference between the X chromosome and autosomes was apparent among genes with low or no expression. Scatter plots confirmed that PolII-S5p occupancy depended on expression levels (Fig. 3c), as was shown in another system31. This RNA-independent evidence strongly supports upregulation of expressed X-linked genes and provides a plausible mechanism for increased X-linked expression in mammals.
In C. elegans, Xiong et al. observed X:A median expression ratios of 0.92, 0.84, 0.69 and 0.41 for hermaphrodites at the L2, L3, L4 and adult stages, respectively11. These declining ratios were interpreted to indicate that the C. elegans X chromosomes do not undergo compensatory upregulation in later developmental stages11. However, the progressive reduction in X:A expression ratio values during development is caused by a change in the ratio of somatic to germ cells in the organism.
The anatomy of C. elegans has been fully described, and the precise lineage of each somatic cell, which does not vary from animal to animal, is known32–35. When the embryo hatches, there are 558 somatic cells. A subset of these undergo further division to yield approximately 700 cells by the third larval stage (L3), and a total of exactly 959 somatic cells are present in a mature hermaphrodite. Germline development commences after the animal hatches, most rapidly during the late L3 and L4 stages. There are approximately 10 germ cells by the end of L2, 100 by the end of L3, 1,000 by the end of L4 and 2,000 in a mature XX hermaphrodite adult36. Thus, the ratio of germ cells to somatic cells increases as the animal develops. This is relevant because both X chromosomes are transcriptionally silenced in germ cells13. Suppose the true X:A expression ratio in somatic cells were 1, consistent with upregulation of X-linked genes to balance gene expression. Assuming further that each cell contributes roughly equally to autosomal expression while germ cells contribute essentially no X-linked transcripts, we would expect measured X:A expression ratios of approximately 0.98 in L2, 0.87 in L3, 0.48 in L4 and 0.32 in adults. These ratios are rough estimates, yet they closely track the decline in the reported ratios (0.92, 0.84, 0.69 and 0.41)11. These data are therefore consistent with known C. elegans biology and have no bearing on Ohno’s hypothesis.
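The expected values above can be approximated with a simple cell-weighted average. The sketch below is an illustrative assumption rather than the authors' calculation. It uses the cell counts quoted in the text, treats 558 somatic cells as the L2 count and 959 as the L4 count, and assumes equal autosomal output per cell and no X-linked output from germ cells.

```python
# Approximate (somatic, germ) cell numbers taken from the text above.
# Assumption: 558 somatic cells (count at hatching) stands in for L2, and the
# adult somatic count of 959 stands in for the end of L4.
CELL_COUNTS = {
    "L2": (558, 10),
    "L3": (700, 100),
    "L4": (959, 1000),
    "adult": (959, 2000),
}


def expected_xa_ratio(somatic: int, germ: int,
                      somatic_xa: float = 1.0, germline_xa: float = 0.0) -> float:
    """Whole-animal X:A ratio as a cell-weighted average.

    Assumes every cell contributes equally to autosomal expression and that
    silenced germline X chromosomes contribute no X-linked transcripts.
    """
    return (somatic * somatic_xa + germ * germline_xa) / (somatic + germ)


for stage, (somatic, germ) in CELL_COUNTS.items():
    print(f"{stage}: {expected_xa_ratio(somatic, germ):.2f}")
```

This gives roughly 0.98, 0.88, 0.49 and 0.32, each within about 0.01 of the estimates quoted above. Raising `germline_xa` above zero shows how partial germline X expression would soften the decline.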
To compare expression between the X chromosome and autosomes in somatic cells, where genes on the X chromosome are known to be expressed, we performed RNA-seq on glp-1(q224) XX L4 animals, which do not undergo germline proliferation. Consistent with the previous analysis11, the X:A expression ratio was 0.51 in wild-type L4 animals with a developing germline but was 0.99 in XX L4 animals lacking a germline (Table 1 and Fig. 4). This result indicates that the lower X:A expression ratio in wild-type XX animals was indeed due to the presence of the silenced X chromosomes in the germline.
Having established equalized levels of X-linked and autosomal expression in XX somatic cells throughout development (Fig. 4), we next tested whether an animal with a single X chromosome also showed equal X-linked and autosomal expression. First, we note that a wild-type XO L4 male (also subject to the germline proliferation issue described above) has an X:A expression ratio of 0.51, identical to that of an XX L4 hermaphrodite (Fig. 4 and Table 1). Second, using microarrays, we found that the mean and median X:A expression ratios in L3 animals that are karyotypically male were both 0.98 (see Online Methods), indicating that X-linked genes are upregulated in XO animals.
We also re-examined published mouse proteomics data11,37, again taking X-chromosome gene content into account in our analysis. A lower proportion of X-linked genes (10.8%) than of autosomal genes (17.5%) had their encoded proteins represented in the data set. This is fully consistent with observations that a large proportion of X-linked genes are not expressed in somatic tissues. The previous analysis used two unequal sets: only the top 10.8% of autosomal genes ranked by protein abundance, but all detected proteins from X-linked genes (also 10.8% of X-linked genes, but including very low-abundance proteins). Calculating the X:A protein ratio this way, we also obtain the low X:A median protein ratio of 0.38 reported previously (ref. 11). However, if we instead apply the same abundance threshold to both groups, considering only autosomal and X-linked genes whose corresponding proteins are detected at least once in the data set, the X:A median protein ratio is 1.07 (P = 0.73, Mann-Whitney test). We conclude that the very limited mouse proteomics data available support the idea of similar protein levels from the X chromosome and autosomes.
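The equal-threshold comparison and the Mann-Whitney test can be sketched with the standard library. In R, `wilcox.test` performs this test. The version below is a simplified stand-in using a normal approximation, and it assumes that undetected proteins are recorded as 0. The function names and toy abundances are hypothetical.

```python
from math import erfc, sqrt
from statistics import median
from typing import Sequence


def detected_only_median_ratio(x_abund: Sequence[float], a_abund: Sequence[float]) -> float:
    """X:A median ratio using only proteins detected at least once (abundance > 0)."""
    return median(v for v in x_abund if v > 0) / median(v for v in a_abund if v > 0)


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """U statistic for x versus y and a two-sided normal-approximation p-value.

    Ties receive average ranks; the variance is not tie-corrected and no
    continuity correction is applied, so small or heavily tied samples
    should use an exact test instead.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie block
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical abundances; 0 means the protein was never detected.
x_prot = [0, 0, 0, 1.5, 3.0, 4.0]
a_prot = [0, 2.0, 3.0, 5.0]
print(detected_only_median_ratio(x_prot, a_prot))  # 1.0
print(mann_whitney_u([v for v in x_prot if v > 0], [v for v in a_prot if v > 0]))
```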
The issue described above in the analysis of the mammalian proteomics data also applies to the prior analysis of the C. elegans proteomics data11. Furthermore, the C. elegans proteomics data were obtained from mixed-stage animals, meaning that a large proportion of the assayed cells were germ cells and subject to the same effect described above in the RNA-seq analysis. Therefore, analysis of these samples is not appropriate for testing X-linked gene upregulation.
As in mammals and C. elegans, there is biased gene content on the Drosophila X chromosome, such that there are fewer genes with male-biased expression on that chromosome38. Although to a lesser extent than in C. elegans, the Drosophila X chromosome is also inactivated in the late male germline14. Both of these effects reduce overall expression measured from the X chromosome. In a previous microarray study, we used sex reversal, which concomitantly results in the accumulation of mitotic germline cells, to examine X-linked expression in isolation from the effects of sex-biased gene distribution and X inactivation. We found that both somatic and germline cells have dosage compensation8.
We reanalyzed RNA-seq data from wild-type testes, which are expected to show decreased X-linked expression, and from the testes of bam mutants, which contain tumors of early mitotic male germ cells39. This analysis allowed us to remove the effects of X inactivation and most sex-biased expression. As expected, in wild-type testis, we found that expression of X-linked genes was significantly lower than that of autosomal genes (X:A median expression ratios 0.64–0.71; P < 4.4 × 10⁻⁶, by Kolmogorov-Smirnov test; Fig. 5). However, in testes from bam mutants, expression from the X chromosome was much more similar to that from autosomes (X:A median expression ratios 0.87–0.92; P > 0.01 for 3 of 4 autosomes, by Kolmogorov-Smirnov test; Fig. 5). These results indicate that there is X chromosome dosage compensation in early mitotic cells in the Drosophila germline.
Our reanalyses of published RNA-seq data and our new experimental data are consistent with balanced expression from the X chromosome and autosomes in mammals, C. elegans and Drosophila. When considering expressed genes, there is strong evidence of X-linked gene upregulation in somatic tissues in both mammals and C. elegans. This finding is similar to what has been observed in somatic tissues of Drosophila4,5. We also confirm that there is balanced expression of the genome, even in the Drosophila germline that lacks the canonical compensatory MSL complex8. Furthermore, in female mouse ES cells with two active X chromosomes, we observe greater occupancy by RNA polymerase II at expressed X-linked genes than at autosomal genes, consistent with increased X-linked gene expression in mammals27.
We arrive at a different conclusion from Xiong et al.11 for the following reasons. First, the X chromosomes of many species have a remarkably unusual gene content. Genes with no or very low expression in somatic tissues are especially abundant on the X chromosome as a result of the accumulation of genes with tissue-restricted expression (for example, in testis, ovary or brain) on the sex chromosomes12,16–18,26,40,41. The sex chromosomes are uniquely enriched in genes involved in sexual reproduction because of evolutionary selection mechanisms that favor location on the X or Y chromosome12. Our reanalysis of the distribution of gene expression in RNA-seq data from human and mouse tissues and cell lines confirms that the X chromosome is enriched in genes with low expression in somatic tissues but high expression in reproduction-related organs, especially the testis. We propose that any dosage-compensation mechanism that equalizes output between the X chromosome and autosomes is expected to target only genes that are appreciably expressed. There would be no need for such a mechanism to act on genes that are silent or strongly repressed under the cellular conditions assayed. This prediction is consistent with findings in Drosophila, in which the MSL complex mainly assembles at expressed genes to enhance transcription elongation and thus increase expression from the male X chromosome42–46. Similarly, in C. elegans, the condensin-like dosage-compensation complex binds to the promoters of expressed genes on the X chromosomes in XX hermaphrodites47. As shown here, including X-linked genes with very low or no expression in analyses of expression data11 shifts median X:A expression ratios to low values. Second, in the C. elegans germline, the X chromosome is largely silenced13. The declining X:A expression ratio during development11 therefore reflects the decreasing ratio of somatic to germ cells as the germline develops in the third and fourth larval stages and does not relate to dosage compensation. We demonstrate this by showing that the distributions of X-linked and autosomal gene expression are very similar in near-adult XX animals lacking a germline.
X chromosome–wide silencing not only occurs in germ cells of C. elegans13 but also takes place in male germ cells of mammals16 and Drosophila14. This complex regulation affects X-linked expression in certain tissues or cell types. In addition, not all X-linked genes are dosage compensated: some genes escape upregulation in Drosophila8, and some genes escape X inactivation in mammals48. These exceptional genes cause sex-specific differences in gene expression, which can vary between tissues, adding another layer of complexity to the regulation of X-linked genes49.
In conclusion, when taking into account genes implicated in sexual reproduction and unique regulatory features of the X chromosome in particular cell types such as germ cells, Ohno’s hypothesis that X-linked genes are overexpressed per active allele relative to autosomal genes to regain dosage balance is supported by RNA-seq analyses and by RNA-independent results in mammals, C. elegans and Drosophila.
The female mouse ES cell line PGK12.1 (a gift from N. Brockdorff)50 was maintained in standard ES medium with 1,000 U/ml leukemia inhibitory factor (LIF) (Millipore) on mouse embryonic fibroblast (MEF) feeders. We verified the presence of two active X chromosomes by karyotyping and by RNA-FISH for Xist (inactive X-specific transcripts), which showed the absence of a Xist cloud. ChIP was performed using 5 μg of rabbit polyclonal antibody to RNA polymerase II phosphorylated at Ser5 (PolII-S5p, Abcam) on fixed chromatin prepared from cell pellets as described51. A 10% aliquot of chromatin was saved as the input fraction. Immunoprecipitated chromatin and input fractions were treated with 0.2 M NaCl at 65 °C overnight to reverse cross-links before DNA purification using the QIAquick PCR purification kit (Qiagen). An aliquot was subjected to PCR for the gene encoding the housekeeping factor β-actin to confirm the specificity of immunoprecipitation compared to a control without antibody. ChIP DNA and input DNA were amplified using the GenomePlex Complete Whole Genome Amplification Kit (Sigma) before labeling with Cy5 and Cy3, respectively. Array hybridizations were performed at the Genomics Resource Center (Fred Hutchinson Cancer Research Center, Seattle, Washington, USA) using NimbleGen 2.1M mouse promoter arrays. GFF ratio files (log2 ChIP:input ratios) and peak files were generated using NimbleScan software. Average enrichment profiles at the 5′ ends of genes were calculated by end analysis52. A total of 16,141 autosomal genes and 647 X-linked genes were separately rank-ordered by their expression levels measured by RNA-seq and divided into nine bins (bin 1 containing all genes with 0 FPKM). Two independent biological replicates showed a high correlation (Supplementary Fig. 5).
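The binning step can be sketched as follows. The text specifies only that bin 1 holds all 0-FPKM genes. Splitting the remaining expressed genes into eight equal-count bins is an assumption. Each gene is also reduced here to a single hypothetical log2 ChIP:input value, rather than the full 5′-end profile used in the study. Run separately on X-linked and autosomal genes, this gives one mean enrichment per bin for each group.

```python
from statistics import mean
from typing import Sequence


def nine_expression_bins(
    genes: Sequence[tuple[float, float]],
) -> list[list[tuple[float, float]]]:
    """Split (fpkm, log2_enrichment) pairs into nine bins.

    Bin 1 holds every gene with 0 FPKM; the expressed genes are rank-ordered by
    FPKM and split into eight near-equal bins (an assumption about how the
    remaining bins were formed).
    """
    silent = [g for g in genes if g[0] == 0]
    expressed = sorted((g for g in genes if g[0] > 0), key=lambda g: g[0])
    n = len(expressed)
    return [silent] + [expressed[i * n // 8:(i + 1) * n // 8] for i in range(8)]


def mean_enrichment_per_bin(genes: Sequence[tuple[float, float]]) -> list[float]:
    """Mean log2 ChIP:input enrichment in each bin (NaN for an empty bin)."""
    return [mean(e for _, e in b) if b else float("nan") for b in nine_expression_bins(genes)]


# Hypothetical genes: two silent genes, then eight expressed genes.
chip_toy = [(0.0, 0.1), (0.0, 0.1)] + [(float(i), float(i)) for i in range(1, 9)]
print(mean_enrichment_per_bin(chip_toy))
```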
To generate expression data for PGK12.1 ES cells by RNA-seq, purified mRNA was fragmented into ~200-base fragments and reverse-transcribed with random hexamers into cDNA to prepare a library for Illumina sequencing, using the RNA-Seq Sample Prep kit (Illumina) as described51. Two technical replicates of the same library were prepared for sequencing. Sequences (a total of 200 million (M) 36-bp single-end reads) were aligned to the mouse reference genome (mm9) using TopHat53 with default parameters, with three exceptions: the “min-isoform-fraction” flag was set to zero, “--no-novel-juncs” was specified to constrain the search to annotated exon-exon boundaries, and the RefSeq mouse gene annotation (mm9) was used. Alignment resulted in 90 M mapped reads. FPKM values were obtained using Cufflinks with default parameters and “min-isoform-fraction” set to zero54.
Analysis of published RNA-seq data sets was based on data deposited in public databases (Supplementary Table 1). HiSeq data for 16 human tissues were obtained from Illumina. The 50-bp paired-end reads had originally been analyzed as single-end reads using ELAND and CASAVA 1.6 (ref. 55) to generate reads per kilobase of exon per million mapped reads (RPKM) values. Because the reads were analyzed as single-end reads, RPKM units are equivalent to FPKM units. A subset of the human tissues was also reanalyzed with TopHat and Cufflinks, which yielded similar expression profiles. In addition, removal of the pseudoautosomal genes did not alter the expression profiles. Data for the four ENCODE diploid cell lines were analyzed using Cufflinks with GENCODE annotation, version 3c. FPKM values computed by Cufflinks were averaged from two biological replicates, each sequenced to ~200 M paired-end 76-bp reads. Data for the 41 lymphoblastoid cell lines had been previously processed to generate RPKM (FPKM) values based on single-end reads20. The same RNA-seq data for mouse brain, liver 1 and muscle that were analyzed by Xiong et al.11 were reanalyzed using TopHat and Cufflinks as described above for PGK12.1 ES cells. Data for the mouse liver 2 sample from MORGEN (http://www.mouseatlas.org/data/mouse/) and for mouse E15 whole brain, preoptic area (POA) and prefrontal cortex (PFC)56 were also reanalyzed as described above for PGK12.1 ES cells. For simplicity, all data are presented as FPKM values. Fisher’s exact tests, one-way ANOVA tests, two-sided Kolmogorov-Smirnov tests and 95% bootstrap confidence intervals were calculated in R.
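For single-end data, each read corresponds to one sequenced fragment, so RPKM and FPKM are computed identically. The sketch below implements the standard FPKM normalization with hypothetical counts and a hypothetical helper name. It also shows a related point: counting both mates of a paired-end fragment as separate reads doubles the gene's count and the library total alike, assuming both mates map to the same gene.

```python
def fpkm(fragments: float, exon_length_bp: float, total_mapped_fragments: float) -> float:
    """Fragments per kilobase of exon per million mapped fragments."""
    return fragments / (exon_length_bp / 1e3) / (total_mapped_fragments / 1e6)


# Hypothetical gene: 100 fragments on 2,000 bp of exon, 10 million fragments mapped.
paired = fpkm(100, 2_000, 10_000_000)
# Counting each mate as a separate single-end read doubles both the gene's
# count and the library total, so the normalized value is unchanged
# (assuming both mates of every fragment map to the same gene).
as_single_end = fpkm(200, 2_000, 20_000_000)
print(paired, as_single_end)
```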
Raw RNA-seq data from the testes of wild-type and bam mutant organisms39 were obtained from GEO (GSE16960). These data consist of 30-bp single-end reads, with multiple lanes pooled together per sample. Read-mapping was performed using TopHat53 against a D. melanogaster reference (UCSC Dm3, FlyBase release 5 assembly, excluding “chrUextra”), turning off minimum isoform fraction filtering and setting the minimum intron size to 42 bp (the size of the smallest annotated intron). Unique mapping was performed using TopHat v.1.1.4 (ref. 53) with Bowtie v.0.12.7 (ref. 59). Gene-level abundance measures were calculated using Cufflinks53, turning off minimum isoform fraction filtering and supplying an annotation reference to quantify against, with all other options set to default. The annotation reference was from ENSEMBL release 60 (November 2010), file name Drosophila_melanogaster.BDGP5.25.60.gtf, accessed from ftp://ftp.ensembl.org/pub/current_gtf/. The Cufflinks version used was v.0.9.3, with SAMtools v.0.1.9 (ref. 60). Gene-level results are reported in units of FPKM. All downstream computation, including Mann-Whitney tests, was performed in R.
We used XO L3 animals with the genotype her-1(e1520) sdc-3(y126) V; xol-1(y9) X, which are sexually transformed into hermaphrodites, for ease of growth and culture61. RNA was purified by TRIzol (Invitrogen) extraction followed by purification with the Qiagen RNeasy kit. Total RNA was labeled and hybridized to single-color expression arrays at Roche NimbleGen. Normalization and data processing were performed using NimbleScan software, which normalizes intensities using Robust Multichip Average (RMA) and combines data from three probes per gene into one value per gene (call). The RMA calls were log2 transformed and used for analysis. The raw and processed data are available from the GEO database (GSE20136).
We thank A. Nelson and C. Ware (University of Washington) for expert assistance with ES cell culture and N. Brockdorff (Oxford University) for the female ES cell line PGK12.1. We thank R. Beyer (University of Washington) and X. Deng (University of Wisconsin–Madison) for help with statistical analyses and F. Yang (University of Washington) for helpful discussions. We thank I. Khrebtukova (Illumina) for RNA-seq data on human tissues. This work was supported by grants from the US National Institutes of Health GM079537 (C.M.D.), modENCODE grants HG004270 (J.D.L.), HG004263 (R.H.W.) and AG039173 (J.B.H.), ENCODE Transcriptome Project grant HG004557 (T.R.G.) and by the Intramural Research Program of the National Institute of Diabetes and Digestive and Kidney Diseases (B.O.), the William H. Gates III Endowed Chair of Biomedical Sciences (R.H.W.) and a fellowship from the Achievement Rewards for College Scientists (J.B.H.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Accession numbers. Human RNA-Seq data are available at EMBL-EBI (E-MTAB-513) and the GEO database (GSE26284 and GSE16921). Mouse RNA-Seq data are available at the GEO database (GSE22131) and the SRA database (SRA001030 and SRA012213), and RNA-seq data from mouse PGK12.1 ES cells are available at the GEO database (GSE30690). Mouse ChIP-chip array data for PolII-S5p are available at the GEO database (GSE30689). C. elegans RNA-Seq data are available from modENCODE (SRA003622 and SRA008646), and C. elegans expression array data are available at the GEO database (GSE20136). D. melanogaster RNA-seq data from the testes of wild-type and bam mutant organisms are available at the GEO database (GSE16960).
Note: Supplementary information is available on the Nature Genetics website.
AUTHOR CONTRIBUTIONS
X.D., C.M.D., J.D.L. and B.O. conceived the project and wrote the manuscript. X.D., J.B.H., D.K.N., F.S., C.A.D., T.R.G., J.S., C.M.D. and B.O. analyzed the mammalian data; R.H.W., L.W.H., J.D.L., V.J.R. and S.E. analyzed the C. elegans data; B.O. and D.S. analyzed the Drosophila data; X.D., J.B.H. and J.S. performed or analyzed the RNA-seq and ChIP-chip experiments in mouse ES cells.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.