Sexual dimorphism results in the formation of two types of individuals with specialized reproductive roles and is most evident in the germ cells and gonads.
We have undertaken a global analysis of transcription between the sexes using a 31,464 element FlyGEM microarray to determine what fraction of the genome shows sex-biased expression, what tissues express these genes, the predicted functions of these genes, and where these genes map onto the genome. Females and males (both with and without gonads), dissected testis and ovary, females and males with genetically ablated germlines, and sex-transformed flies were sampled.
Using any of a number of criteria, we find extensive sex-biased expression in adults. The majority of cases of sex differential gene expression are attributable to the germ cells. There is also a large class of genes with soma-biased expression. There is little germline-biased expression indicating that nearly all genes with germline expression also show sex-bias. Monte Carlo simulations show that some genes with sex-biased expression are non-randomly distributed in the genome.
Sexual dimorphism varies enormously, from the morphologically indistinct yeast mating types to extreme differences that can lead to the misclassification of males and females as distinct species. While the Drosophila sexes show relatively modest sexually dimorphic somatic body plans, there is striking sexual dimorphism in the germline. Indeed, it is difficult to imagine cell types that differ more than eggs and sperm. During the last century a handful of genes involved in the regulation of somatic (for example, Sex-lethal, transformer, transformer-2, doublesex, intersex) and germline sexual identity (for example, ovo, sans fille, ovarian tumor, and stand still) have been identified, but we know very little about the effector genes that actually result in a sexually dimorphic state. Major terminal genes in the current models of somatic and germline sex determination are a transcription factor (doublesex) and an RNA binding protein (Sex-lethal) [11,12]. These molecules must orchestrate a cascade of effector functions that result in sexually dimorphic gametes and the somatic support functions required for their union at fertilization.
Whilst we know relatively little about how germline sexual identity is determined in Drosophila, the downstream process of oogenesis has been well studied. Egg production occurs in the ovary ; each ovary consists of a cluster of 16-20 ovarioles where the assembly line-like production of germline and somatic cell units (egg chambers) progresses along the length of the ovariole. The germline component of each egg chamber contains the differentiated products of a single germline stem cell division. The differentiating stem cell daughter undergoes four rounds of incomplete cytokinesis to produce an interconnected 16-cell cyst surrounded by a somatic follicular epithelium.
One of these 16 germline cells becomes the oocyte and the remaining 15 cells develop into supporting nurse cells. The 16-germline cells within cysts are connected by an intercellular network facilitating the active transport of macromolecules from the nurse cells into the growing oocyte. These components include basic cellular machinery such as ribosomes, and a vast assortment of proteins and RNA species that support early embryonic development. For example, the dorsal/ventral and anterior/posterior axes of the future embryo are laid down during oogenesis . The somatic follicular epithelium surrounding each egg chamber is an important source of structural proteins, such as yolk and egg shell proteins, that become incorporated into the oocyte, as well as patterning information. Most of the yolk proteins are produced distantly in non-gonadal fat body tissue . Thus, while much of the egg is constructed by the cells of the egg chamber, there is significant contribution from distant organs. Finally, additional somatic functions in the female reproductive tract and female mating behaviors are required for productive gamete function . A global analysis of gene expression in adult females therefore captures the genes required for all of the stages of oogenesis from stem cell to early embryo.
Analogously, the Drosophila testis contains developing gametes, from the stem cells at the apical tip of the testis to fully functional sperm . As in the ovary, the germline stem cell division in the testis produces cysts of 16 primary spermatocytes. However, in males all these primary spermatocytes undergo meiosis resulting in a cyst of 64 spermatids. Each spermatid then follows an elaborate differentiation program of cytoskeletal and nuclear rearrangements to form a mature sperm cell. These changes are quite remarkable. For example, sperm chromatin is nearly crystalline and the nucleus changes from a round structure to a highly elongated and slightly hooked shape. Cytoskeletal rearrangement is equally dramatic. The round spermatid forms a flagellar axoneme that is nearly half the length of the adult. Mitochondrial differentiation in the axoneme is also striking. Individual mitochondria fuse into two large and interleaved structures extending along the length of the flagellum. Thus, while the structure of a sperm cell might suggest a simplistic developmental program for spermatogenesis (DNA and a motor), shedding the features that characterize virtually all other cells in the body (such as packing DNA into nucleosomes) is an enormous reengineering feat. As is the case in females, the male reproductive tract and male behavior are required for fertility . Therefore a global analysis of gene expression in adult males captures the genes required for all of the stages of spermatogenesis from stem cell to the fertilized egg.
Not surprisingly, both genetic and classic molecular studies indicate that spermatogenesis and oogenesis are complex events requiring extensive and often sex-specifically deployed information [13,17]. More recently, global gene expression studies using printed cDNAs, expressed sequence tags (ESTs), and full transcriptome microarrays have revealed extensive overall sex differential expression [19-24], with gene expression in the germline and gonads being particularly striking [19,20,24]. Here we report gene expression profiles as a function of sexual dimorphism and sex determination in Drosophila as analyzed using a platform including 93% of predicted genes from version 1.0 of the Drosophila genome and 75% of release 3.1. This article includes the dataset from Parisi et al. on gene expression in adults, augmented with additional microarray experiments to further track the source of sexually dimorphic expression.
There are many stories embedded in the expression data reported here. We touch on only a few to illustrate the value of the dataset. The most significant aspect of the survey we report here is the creation of a dataset that can be mined by other researchers interested in gametogenesis and sexual dimorphism. To that end, genes showing differential expression have been organized into easy to browse tables that include internet links to FlyBase [26,27], the compendium of genome information for Drosophila. We have also deposited all the data at the Gene Expression Omnibus [28,29] so that those interested in large-scale reanalysis can easily download the entire dataset.
We used a series of experiments to delimit germline and somatic sex-biased expression. By comparing flies with and without a germline we can identify genes with germline-dependent expression bias [21,23]. While most genes showing germline-dependent expression are expressed within the germline cells, this class also includes somatically-expressed genes induced or de-repressed by the presence of germ cells. Progeny of homozygous tudor1 mothers have no germline and are useful in differentiating somatic and germline-biased expression. Another measure of germline contribution was obtained by dissection. The gonad samples provide a measure of gene expression in the germline and somatic components of the gonad. Whilst the gonads clearly contain both tissues, the vast majority of the mRNA is derived from the germline cells. Indeed, we were unable to collect sufficient mRNA (600 ng of polyA+) from dissected germline-less gonads to perform a single microarray experiment. Finally, wildtype Drosophila males are XY and females are XX. We reversed the sex of XX flies from female to male using mutations in tra2 and dsx. These sex-transformed flies have rudimentary testes with very few germline cells and are thus phenotypically similar to, although less extreme than, the male progeny of tudor1 mothers. By performing replicate microarray experiments on these diverse samples we obtain a robust picture of sexually dimorphic gene expression in the germline and soma.
Pairs of mRNA samples (Figure 1) labeled using Cy3- or Cy5-labeled nonamers were applied to Incyte Fly Gene Expression Microarrays (FlyGEM), and the Cy3 and Cy5 channel intensity values were extracted for study as previously reported. The microarray was designed against release 1 of the Drosophila genome and has been updated to release 3.1. There are many changes in the annotation and future changes are also likely. As a result, some of the microarray elements no longer correspond to officially recognized genes (although some detect transcripts) and some transcripts are now recognized by multiple element species. Therefore we use the expression from the 14,611 DNA element species pairs corresponding to Drosophila transcripts (there are at least two of each printed on the array plus many control elements for a total of 31,464) as a metric for 'genes'. Using element species rather than gene names ensures that the dataset can be updated as gene models change in the ongoing annotation effort.
In order to characterize a gene as showing biased expression in a particular tissue or sex, one is forced to use arbitrary statistical or magnitude cut-offs. In our replicated heterotypic hybridizations we used both a conservative measure to capture the highest confidence data and noisier, but more inclusive, clustering methods. We express sex-biased and tissue-biased gene expression in terms of percent of the genome for didactic purposes only, as these numbers depend greatly on the arbitrary cut-off. For gene lists, see the extensive set of tables and links to FlyBase in the Additional data files.
We made conservative calls on competitive pairwise experiments using a two-fold cut-off. An extensive set of homotypic hybridization experiments (mean r^2 of 0.754, and a standard deviation of 0.097) has shown that the 99.5% tolerance interval for measurements on the FlyGEMs is <1.5-fold, and unlike the case with some array platforms the particular element sequence does not contribute significantly to variance. Therefore, this global statistic can be applied to individual elements. Outliers are very effectively dealt with by replicates (Figure 2). In heterotypic hybridizations reported here, only 0.01% of the elements (representing 0-6 in each comparison) showed two-fold differences that fail to correlate with the experimental variables in question. In short, replication effectively eliminates false positives for tissue-biased expression in theory and practice.
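The logic of a conservative two-fold call across replicates can be sketched as follows. This is a minimal illustration, assuming per-element log2(female/male) ratios are already in hand; the function name `call_bias`, the rule that every replicate must pass the cut-off in the same direction, and the toy values are assumptions made for the example, not the procedure used to build the gene lists reported here.

```python
import math

def call_bias(log2_ratios, fold=2.0):
    """Classify one element from its replicate log2(female/male) ratios.

    Hypothetical rule: every replicate must exceed the fold cut-off in the
    same direction; otherwise the element is called unbiased.
    """
    if not log2_ratios:
        raise ValueError("at least one replicate ratio is required")
    cutoff = math.log2(fold)  # two-fold corresponds to 1.0 in log2 space
    if all(r > cutoff for r in log2_ratios):
        return "female-biased"
    if all(r < -cutoff for r in log2_ratios):
        return "male-biased"
    return "unbiased"

# Toy replicate ratios, invented for illustration only.
elements = {
    "elem_A": [1.8, 2.1, 1.6],     # consistently above two-fold toward female
    "elem_B": [-3.0, -2.4, -2.9],  # consistently above two-fold toward male
    "elem_C": [1.4, 0.2, -0.1],    # a single outlier replicate, so not called
}
calls = {name: call_bias(ratios) for name, ratios in elements.items()}
```

Under this rule a single aberrant replicate, as for `elem_C`, cannot produce a biased call on its own, which is the sense in which replication suppresses false positives.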
In comparisons of whole adult females and males, 17% of the element species reporting in all experiments showed a greater than two-fold sex-biased hybridization intensity (Figure 2a). Female-biased hybridization intensity was less extensive (6%) than male-biased hybridization intensity (11%). A series of experiments suggest that most sex-biased hybridization intensity is due to the germline (Figure 2; see also tables in the Additional data). We found that 1.5% of array element species show sex-biased hybridization in experiments using probes derived from flies with no germ cells (Figure 2b). This distribution was asymmetric, with greater numbers of element species showing high hybridization to male samples (0.9%) than to female samples (0.4%). Similarly, when male or female carcasses were compared, 3% of the element species showed sex-biased hybridization (Figure 2c). The greater degree of sex-biased hybridization observed in the carcass samples may reflect strain differences, which can be substantial [21,23], or germline-dependent expression in the soma. We are currently scrutinizing these possibilities with additional array experiments using flies of identical genotypes, with and without germlines. In short, these data indicate that relatively little of the sex-biased expression in whole flies is attributable to the soma.
While the contribution of the germline to sexually dimorphic gene expression can be approximated by examining the fraction of the sex-biased expression in whole wildtype flies minus the sex-biased expression in the soma, a more direct measurement is obtained by hybridizing mRNA from dissected gonads. As we show below, this substantially concentrates the mRNAs from germline cells and increases the depth of the observed expression profile. Ovary and testis samples exhibit the highest degree of sex-biased hybridization in our experiments (Figure 2d). A total of 29% of element species exhibit ovary-biased or testis-biased hybridization. This is in contrast to 14-16% of element species deduced to show ovary or testis-biased hybridization by subtracting the somatic sex-biased genes from the list of element species exhibiting sex-biased hybridization in whole adults. The expression profile is distinctly asymmetric, with nearly twice as many element species showing greater than two-fold hybridization to testis samples (18%) versus ovary samples (11%). Additionally, there were far more element species showing a large magnitude of testis-biased hybridization. For example, 2% of element species showed >20-fold hybridization intensity in the testis channel, while <<1% of element species showed a similar magnitude in the ovary channel (n = 4). These data indicate that the cells of the testis deploy a larger battery of specific genes than those of the ovary. Additionally, these data confirm that the most sexually dimorphic tissues, the gonads, show the most distinct gene expression profiles.
A priori, one might expect that the soma would be quite different from the germline, but to what extent is this due to the expression of germline-specific or soma-specific gene batteries? To address this question, we have also examined germline versus somatic expression within a given sex. As expected, we observed a high degree of testis-biased hybridization in competitive hybridizations with samples from either males with no or few germline cells (males from tudor1 mothers, and females transformed somatically into males; not shown) or from males with gonads removed (Figure 2e). A total of 14% of element species hybridize preferentially to labeled testis samples, supporting the idea that the gene batteries deployed in a testis-biased manner are extensive. Additionally, these experiments reveal a high degree of soma-biased hybridization: 10% of element species are preferentially hybridized to samples from males with the gonads removed (Figure 2e).
Somewhat surprisingly, comparing females with or without germlines did not show the same pattern of soma versus germline hybridization that was evident in the males. Ovary-biased hybridization was modest (4%) in experiments measuring hybridization of samples from ovary versus samples from females with the ovaries removed (Figure 2f). In contrast, soma-biased hybridization was extensive (12%), as was the case when examining male soma and germline (Figure 2e). Thus, unlike what is seen in males, the asymmetry in these ratio plots favors the soma. This suggests that most of the genes expressed in the female soma are also expressed in the ovary and that many genes are expressed specifically in the female soma. The soma is distinguished from the germline by a high degree of soma-biased gene expression.
An overview of all sex-biased gene expression is provided by examining a self-organized heat map of the cross-normalized intensities from each of the hybridizations (Figure 3). This also provides an independent method for building gene lists. As expected from the pairwise comparisons, the prominent clusters in this map highlight groups of element species exhibiting testis-biased (12%), ovary-biased (9%), and soma-biased (11%) hybridization. The clusters of element species showing testis-biased hybridization are most dramatic, both in terms of the number showing sex-bias and in terms of the degree of hybridization bias. This global overview shows very little overt sex-biased expression in the soma, and very little germline-biased expression. In agreement with the scatterplot analysis, the heat map indicates that the main batteries of differentially expressed genes distinguish the soma from the germline, and the female germline from the male germline. Again, the lack of an overt class of genes showing germline-biased expression indicates that the male and female germlines have very little in common, other than the absence of expression from a large class of genes that appear to have soma-biased function.
It is standard practice to confirm the expression of genes of interest based on microarray data by RT-PCR or Northern blotting, although this is not practical when thousands of genes show biased expression. Additionally, this confirmation is usually restricted to genes showing differential expression, and therefore does not allow for a good estimate of sex-biased gene expression missed in the FlyGEM experiments. We selected 75 element species from the FlyGEM experiments, across the range of observed intensities and sex-biases, for Northern blotting experiments on adult males and females with and without germlines (Figure 4a). Signal intensities from each lane on the Northerns were measured for each experiment and then used to calculate differential expression ratios among the four sample lanes in a fashion analogous to pairwise microarray comparisons (absolute intensities correlated very poorly, indicating that the techniques have different signal and noise characteristics even though the same amplicons were used). Plotting the ratios from Northerns versus microarrays demonstrates consistent relative measurement of RNA between the two experimental systems (Figure 4b-e). The majority of the data points fall along the diagonal for wildtype males and females. In the other comparisons, both the microarray and Northern ratios tend to cluster near the origin (a ratio of 1 is 0 in log space) due to lower sex-biased expression in the germlineless progeny of tudor1 mothers. Thus, ratios determined by Northern blotting and FlyGEM compare favorably.
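The conversion of lane intensities to log-space ratios can be sketched as follows. This is only an illustration: the lane names, the intensity values, and the helper `log2_ratio` are hypothetical and do not come from the blots described here.

```python
import math

def log2_ratio(numerator, denominator):
    """Return log2(numerator / denominator); a ratio of 1 maps to 0."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(numerator / denominator)

# Hypothetical background-subtracted intensities for the four sample lanes.
lanes = {"female": 1200.0, "male": 300.0, "tud_female": 400.0, "tud_male": 400.0}

# Pairwise ratios analogous to the competitive microarray comparisons.
wildtype_ratio = log2_ratio(lanes["female"], lanes["male"])
germlineless_ratio = log2_ratio(lanes["tud_female"], lanes["tud_male"])
```

In this toy case the wildtype comparison gives a four-fold female bias (2 in log2 space), while the germline-less comparison sits at the origin, mirroring the clustering near zero described above.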
Comparing array results across platforms can cross-validate a larger fraction of the genome. Confidence is heightened for element species showing biased expression on two platforms. A Drosophila developmental profile including males and females with and without a germline has also been performed on a cDNA array platform (GEO accession GPL218). This array is less complete than the FlyGEM and is biased against testis gene expression as no clones from testis cDNA libraries were included in the platform. Thus, overall estimates of sex-biased gene expression are expected to vary between the platforms. However, FlyGEM element species designed against the genes represented by cDNA inserts should correlate if both platforms are performing as expected. We extracted data from 2,536 GPL218 elements also present on the FlyGEM platform for comparative analysis.
The correlation between male/female hybridization ratios was quite good (Figure 5a,b). The trend lines clearly demonstrate that elements showing sex-biased expression on one platform show similar sex-biased expression on the other if one applies a greater than two-fold cut-off to these ratios. The only clearly artifactual result from this comparison is the presence of a substantial number of elements reporting sex-bias only in the Arbeitman et al. dataset. This may be due to cross-hybridization as a result of low sequence complexity and repeats in the cDNA inserts, a common problem with cDNA arrays. Nevertheless, this comparison clearly cross-validates the expression profiles for a substantial set of genes. Indeed, the r^2 for male/female ratios across platforms is 0.447, which is similar to the correlation observed between replicate hybridizations within the time course dataset (r^2 = 0.334 +/- 0.114).
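For readers who wish to repeat this kind of comparison, the squared Pearson correlation between matched log ratios from two platforms can be computed as in the sketch below. The function `r_squared` and the toy ratio vectors are assumptions for illustration; they are not the values plotted in Figure 5.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)

# Invented log2(male/female) ratios for the same genes on two platforms.
platform_a = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
platform_b = [-1.5, -1.2, 0.3, 0.4, 2.5, 2.0]
agreement = r_squared(platform_a, platform_b)
```

Because the statistic is computed on ratios rather than absolute intensities, it is insensitive to platform-specific differences in overall signal scale.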
We then tested whether expression ratios from our Northern analysis correlated with ratios derived from Arbeitman et al. We have Northern data for 16 of the 2,536 elements common to both platforms. The male versus female and 'tud' male versus 'tud' female ratios were plotted (Figure 5c,d). The correlation of the Arbeitman et al. data with the Northern data (Figure 5c,d) was similar to that of the FlyGEM data with the Northern data (Figure 4b,c).
The Arbeitman et al. data have recently been re-evaluated in a more statistically rigorous manner to extract the high confidence data from the moderately reproducible primary data. The specific aim of this re-evaluation was to identify genes with sex-specific expression patterns. If the set of cDNA amplicons that most reliably hybridize to female or male samples can also be used to categorize female and male samples of differing genotype from another lab, then confidence in that list of genes is augmented. We therefore took a list of gene identifiers from this more recent study, extracted our corresponding data, and generated a cluster map of samples (Figure 5e,f). This small subset of elements was sufficient for the unsupervised classification of whole females and males, despite disparate genetic backgrounds. Tissues were also appropriately juxtaposed on the sample tree. Collectively, the scatterplot and heat map comparisons indicate that the FlyGEM and Arbeitman et al. data are in broad agreement and suggest that a full meta-analysis of Drosophila expression data from multiple platforms will be fruitful.
Genes known to have sex-biased expression or function should be present in the appropriate ovary-, testis-, or soma-biased lists. We have mapped the FlyGEM element species to the current release of the Drosophila genome and generated a large set of lists with links to FlyBase gene report pages (see Additional data). Following these links leads to more links (for example, literature, additional array data, yeast two-hybrid maps) and following these trails lends further confidence in the array data. For example, on lists of element species showing ovary-bias (Table 21 in Additional data; see also Tables 5, 15, 17, 30, 24, 36, and 38), we find six that represent chorion proteins (Cp15, Cp16, Cp18, Cp19, Cp36, and Cp38). Chorion proteins are components of the egg shell, and are known to be highly over-expressed in the ovary [37,38]. Similarly, structural components of sperm such as beta-tubulin85D, tektin-A and don juan are found on testis-biased lists (Additional data, Table 22; see also Tables 6, 7, 9, 31, 33, and 39). A number of other genes showing biased hybridization are known to be required for female or male fertility. These include alpha-tubulin67C, which is represented by over 20 female sterile alleles, and deadhead, which is represented by four female sterile alleles and a single lethal allele. Genes represented by male sterile alleles (for example, boule and fuzzy onions) are found on testis-biased lists. The boule gene is also required in mammalian spermatogenesis, highlighting the potential utility of this dataset for those interested in spermatogenesis in mammals. Genes known to encode products required maternally for germline formation in the resulting embryo are represented on lists of genes showing ovary-biased hybridization (for example, oskar, mago nashi, vasa, smaug, and cyclinB), as are genes required for germline cell division within the ovary (for example, piwi and fs(1)Ya).
Even among the genes with little functional information, there is evidence to cross validate some of them. For example, BEST:GH14656 is represented by 13 testis ESTs from the NIH collection and just two ESTs from the BDGP head library [19,50].
In addition, there are a number of genes showing soma-biased expression in our experiments (Additional data, Table 23; see also Tables 9, 18, 37, and 40). Examples include actin88F, flightin, Tropomyosin1, Tropomyosin2, and upheld, which are known to be required for flight muscle function. Similarly, genes involved in phototransduction (arrestin-2, ninaC and five genes encoding rhodopsins) exhibit soma-biased expression, as do genes expressed in the fat body, including immune response genes (Drosomycin) and genes that may be required for energy storage (fat body protein 1 and lipid storage droplet protein 1). Thus, many of the genes showing soma-biased expression are used for well-defined soma-specific functions.
Briefly, many of the genes expected to show biased expression do. Additionally, given the importance of these known genes, it seems likely that there will be interesting functions among the unstudied genes showing biased expression in our experiments.
Browsing offers some degree of confidence, but is not amenable to the analysis of large datasets. The Gene Ontology (GO) consortium has provided a controlled vocabulary to describe gene function, allowing us to determine rigorously which functions are over-represented in a particular expression profile. We queried our element species lists with the GO term analysis tools 'Database for Annotation, Visualization and Integrated Discovery' (DAVID) and 'Expression Analysis Systematic Explorer' (EASE).
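The idea behind an over-representation P-value can be illustrated with a one-sided hypergeometric test. This sketch is a generic stand-in and should not be read as the exact statistic computed by DAVID or EASE; the function name `hypergeom_upper_tail` and all counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n elements are drawn without replacement from N,
    of which K are annotated with the GO term of interest."""
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total

# Hypothetical counts: 10,000 annotated elements on the array, 200 carry the
# term, and a biased list of 1,000 elements contains 60 carrying the term.
# About 20 would be expected by chance.
p_value = hypergeom_upper_tail(k=60, n=1000, K=200, N=10000)
```

A very small `p_value` indicates that the term appears on the list far more often than expected from its frequency on the array as a whole.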
The results of GO term analysis are reassuring. Genes with oogenic functions are highly over-represented among element species with ovary-biased expression (Figure 6a). The most highly over-represented categories are associated with metabolism (P < 1 × 10^-20) and are probably associated with biosynthesis required for the manufacture of eggs. Similarly, ribosomes are highly over-represented cellular components (P < 2 × 10^-42), as are ribosomal proteins in the protein GO category (P < 8 × 10^-52). Genes encoding translational initiation complexes are also highly over-represented (P < 2 × 10^-22). This is consistent with the high level of protein production during egg development and the high ribosomal content maternally loaded into the egg. Indeed, mutations in many ribosomal protein genes result in female sterility due to defective egg formation. Also prominently represented are genes involved in DNA metabolism and cell cycle regulation (P < 1 × 10^-12, P < 1 × 10^-7). These functions may be required for the dramatic endoreplication of nurse cells during oogenesis. Similarly, the over-represented transcripts from genes involved in DNA replication (P < 1 × 10^-9), chromatin architecture (P < 4 × 10^-5), nuclear organization (P < 1 × 10^-4), and DNA packaging (P < 5 × 10^-5) may be required for nurse cell polyploidization or for the rapidly dividing embryonic cells which rely on maternally deposited gene products. Finally, molecular genetic analysis of oogenesis has revealed that a substantial number of these gene products include RNA binding proteins, and this class is also over-represented in ovaries in our microarray data (P < 2 × 10^-6).
The representation of GO terms in the testis-biased element species lists was similarly examined (Figure 6b). Again, the most significantly over-represented GO terms are consistent with what we know about spermatogenesis. Genes encoding microtubule (P < 2 × 10^-9) or movement (P < 3 × 10^-8) function are among the element species with testis-biased expression. These are required for the extensive cytoskeletal reorganization during spermiogenesis. Genes encoding transport proteins (P < 2 × 10^-6), including those with dynein-associated ATPase activity (P < 1 × 10^-9), are over-represented. Again, this is consistent with the motile nature of sperm. Chaperones (P < 3 × 10^-4) are over-represented, which is in agreement with the known requirement for heat shock proteins during spermatogenesis. In testis, cellular component analysis shows high over-representation of gene products associated with mitochondria (P < 9 × 10^-14), membranes (P < 1 × 10^-4) and the cytoskeleton (P < 3 × 10^-7), which is almost certainly due to requirements for sperm axoneme structure.
Non-gonadal female somatic tissues show sex-biased expression of genes related to reproductive processes (Figure 6c). Vitellogenesis (P < 2 × 10^-5) is required for egg development, as the vitellogenin or yolk proteins synthesized in the female fat body are the major component of a mature egg. The second category of proteolytic processing (P < 8 × 10^-4) might reflect the process required for generating yolk proteins. Ribosomal protein encoding genes are also over-represented (P < 1 × 10^-6) in the non-gonadal female soma transcription profile. Similarly, element species encoding lysosome components (P < 2 × 10^-5) are over-represented. These functions probably reflect the dual anabolic and catabolic metabolism required to support egg development.
The male reproductive tract includes several somatic structures in addition to the testis. For example, the seminal fluid components of the ejaculate are produced in the accessory glands. These glandular secretions mediate sperm competition, reduce post-copulatory mating by recipient females, and increase egg laying rates. Element species with 'extracellular' function are over-represented in the male expression profile (P < 4 × 10^-5) (Figure 6d).
A whole genome transcriptional profile enables us to investigate the genomic organization of tissue-specific gene batteries. Previous experiments have shown that there are significantly fewer genes with male-biased expression on the X chromosome [23,24]. The fourth chromosome is enriched in genes with soma-biased expression (not shown). On a more local level, Drosophila genes with tissue-biased or temporally-biased expression patterns have been shown to be concentrated in neighborhoods of contiguous genes. We asked if genes with sex-biased expression show a similar sub-chromosomal level organization.
We first examined a heat map of cross-normalized intensities by position along the chromosome arms (not shown). While there were patterns of alternating bands of genes showing greater or lesser intensity along the chromosomes, similar to but not as obvious as those reported for embryo and adult, these patterns were also seen in randomized datasets, albeit at a lower frequency. Additionally, these patterns were less striking than those seen when plotting normalized intensity versus process parameters such as position in the microarray. (The FlyGEM platform we used is printed randomly with respect to genome position, so these process artifacts do not affect our previously reported reduction in the density of genes with male-biased expression on the X chromosome, nor the neighborhoods reported below.)
Given that the concept of gene expression neighborhoods is relatively new, we were interested in identifying the most significant examples for detailed future study. To do so, we examined the distribution of element species with both highly ovary-, testis-, or soma-biased expression and high intensity to avoid low intensity noise. This resulted in 791 element species with testis-biased hybridization, 129 element species with ovary-biased hybridization, and 287 element species with soma-biased hybridization, which we mapped onto the genome (Figure 7). We also generated randomized datasets to determine if the distribution pattern of ovary-biased, testis-biased, or soma-biased expression is non-random. To do this, 100,000 randomizations were performed for each gene position over windows (consecutive genes along the chromosome) ranging from five to 200 genes. We used sliding windows of various sizes to determine how often an equal or greater number of elements reporting highly biased expression occurred in a random set versus the real data. A total of 20 gene expression neighborhoods were identified with a P-value < 10^-3, 15 at P < 10^-4 and seven at the lowest possible P-value of < 10^-5 (0/100,000). The most clear-cut examples of gene expression neighborhoods include approximately 10 genes within a chromosomal region spanning 20 to 40 genes (Figure 8). This is consistent with the report of 10- to 20-gene expression neighborhoods shown in microarray analysis of Drosophila embryonic versus adult transcription.
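A simplified version of such a sliding-window randomization test is sketched below. It differs from the analysis described above in ways that should be kept in mind: it asks a single chromosome-wide question (how often does any window in a shuffled gene order contain at least as many biased genes as the best window in the real order) rather than evaluating each gene position, it uses one window size, and it runs far fewer randomizations. The function names and the toy gene order are assumptions for illustration.

```python
import random

def max_window_count(flags, window):
    """Largest number of flagged genes in any run of `window` consecutive genes."""
    count = sum(flags[:window])
    best = count
    for i in range(window, len(flags)):
        count += flags[i] - flags[i - window]  # slide the window by one gene
        best = max(best, count)
    return best

def neighborhood_p_value(flags, window, n_random=1000, seed=0):
    """Estimate how often shuffled gene orders match or beat the observed cluster."""
    rng = random.Random(seed)
    observed = max_window_count(flags, window)
    shuffled = list(flags)
    hits = 0
    for _ in range(n_random):
        rng.shuffle(shuffled)
        if max_window_count(shuffled, window) >= observed:
            hits += 1
    return observed, hits / n_random

# Toy chromosome arm: 200 genes in order, 1 marks highly biased expression.
flags = [0] * 200
for position in list(range(50, 58)) + [20, 120, 180]:
    flags[position] = 1  # eight clustered genes plus three scattered ones

observed, p_value = neighborhood_p_value(flags, window=10)
```

As in the analysis above, the smallest P-value that can be reported is bounded by the number of randomizations, so an estimate of zero here means only "fewer than 1 in `n_random`".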
There are several possible reasons for co-expression of neighboring genes. One possibility is that they are members of gene families or are part of the same transcription unit. Indeed, a recent article suggests that operons and duplicate genes account for many of the gene expression neighborhoods in Caenorhabditis elegans. While there is no evidence for extensive operon structure in Drosophila, genome annotation errors might potentially introduce artifactual evidence for gene expression neighborhoods if genes thought to be distinct are in fact derived from a common transcript. Alternatively, gene expression neighborhoods might reflect the local transcriptional environment via a neighborhood-wide control structure [66,67]. In order to distinguish between these models, to a first approximation, we examined the seven P < 10⁻⁵ neighborhoods more closely.
All of the most significant gene expression neighborhoods contain genes expressed from either strand, indicating that neither polycistronic messages nor annotation errors are causal (Figure 8). Additionally, there are no overt gene families or groups of gene ontology terms associated with six of the neighborhoods, suggesting that they are unlikely to be derived by gene duplication, although one is a neighborhood of paralogous genes (Figure 8e). This latter soma-biased gene expression neighborhood consists of a group of distinct genes encoding trypsin-like proteases. Such gene arrangements usually represent gene duplications from a common ancestor and are often co-expressed. The unusually high number of Drosophila genes encoding trypsins has been previously noted; repeated gene duplication may be the cause.
We have carefully cataloged sex-biased expression in Drosophila adults using 88 samples of wild-type flies and tissues, germline-ablated flies, and flies with sex transformed genotypes with 14,611 pairs of elements representing exons from release 1 of the Drosophila genome. We have also carefully examined platform performance  and have extensively analyzed the dataset for internal consistency as well as cross-validation with other array datasets and Northern blotting. The lists of genes in the survey of sex-biased expression will be a valuable resource for ultimately understanding how sexual dimorphism is genetically encoded and executed. Additionally, this dataset has already proven valuable in the study of rapidly evolving genes with male-biased expression and in understanding X chromosome evolution . Sex-biased expression may also be a convenient tool for understanding functional consequences and mechanistic underpinnings of gene expression neighborhoods.
In broad terms, we identify three large batteries of genes with ovary-biased, testis-biased and soma-biased expression. The germline and soma of Drosophila are the first lineages to separate in the developing embryo, so we tend to think of germline versus somatic tissue as being perhaps the most significant dichotomy. In addition to early divergence in the embryo, the germlines of females and males are functionally similar - they undergo meiosis and produce gametes. One might therefore expect that female and male germ cells would express many genes that are silent in somatic cells. There is a small core of genes showing germline-biased expression (~3%); our data, however, suggest that most of the genes showing germline-dependent expression also show sex-biased expression. Essentially, adult ovary and testis are as different from each other as they are from somatic cells. A major difference between the germline and soma is the absence of gene expression from loci encoding terminal differentiation proteins (for specific adult somatic structures such as eyes and muscles) in the germline.
The lack of a large battery of genes showing germline-bias could also be due to expression of sperm or egg differentiation products that overwhelm the signal from genes expressed in the stem cells, which show no overt sexual dimorphism. Early undifferentiated germ cells from males and females would be expected to have more similar gene expression profiles. However, preliminary global analyses of transcripts in undifferentiated germ cells are also revealing a modest set of genes with germline-biased expression. The germline transcription profile is defined by the absence of transcripts from genes encoding somatic terminal differentiation products and by sex. Germline development is under the control of sex determination signals from the time the embryonic genome is activated. Perhaps it is better to think of the male and female germlines as quite distinct entities, rather than lumping them together as 'germline'.
Sex-biased expression in ovaries and testis was by far the most dramatic in terms of both numbers of genes and the degree of sex-bias. While most of this sex-biased expression is clearly explained by expression within the germline, there also appears to be striking germline-dependent expression outside of the gonads. Flies with and without a germline are often used to distinguish genes that are expressed in the germline from those that are expressed in the soma. Technically, a gene expressed in wildtype adults but not in adults missing a germline is said to be germline-dependent, not germline expressed. In comparing flies without a germline to those whose gonads had been removed, we found that there was greater sex-biased expression in the latter. These data suggest that the germline acts at a distance to condition the soma. This does not appear to be due to strain differences, as preliminary analysis of gonadectomized flies of identical zygotic genotypes that either had or lacked germ cells shows substantial gene expression differences. Thus, it appears that much of the sex-biased expression in the Drosophila soma is due to conditioning by the germline, rather than the somatic sex determination hierarchy.
A catalog of gene expression data is of limited value if it cannot be easily accessed. In addition to depositing all the array data at GEO, we have run a number of queries and provided the resulting gene lists in the Additional data files. Any of the lists of element species in the Additional data associated with this manuscript may be browsed in order to confirm the performance of the platform with a gene of interest, or for finding a novel expression pattern for a known gene. We have highlighted just a few of the stories that are embedded in the array dataset reported here. To list just a few examples, the preliminary data reported here suggest that germ cells are dosage compensated . The dot fourth chromosome is especially active in the soma. Odorant-binding protein-encoding genes with highly sex-biased expression might be quite interesting candidates in the pheromone response.
It is our hope that this and other datasets generated on this platform will be useful to those who are interested in data mining for candidate genes. Additionally, there was reasonably good agreement between the array datasets here and those of Arbeitman et al. [20,36], which suggests that Drosophila array data from different laboratories will be directly comparable. This further suggests that array data will be broadly useful to the wider Drosophila community. However, it is also clear that there will be artifacts in array datasets, and indeed in the underlying genome annotations. Strain differences and the presence of genetic markers are also likely to confound results; array data should be viewed with this firmly in mind.
We used standard Drosophila techniques throughout. Flies were grown at 25 ± 0.5°C on GIF or PB media (KD Scientific, Columbia, MD, USA) and aged 5-7 days prior to use. Most alleles and transgenes have been previously described and can be found, with references, at FlyBase. Flies wildtype for sex were of the genotype y1 w67c. To study the contribution of the body as a whole and the gonads, flies were dissected and snap frozen as described. To study the germline-dependent gene expression profiles, samples were prepared from the progeny of homozygous tud1 bw1 sp1 females. Flies were dissected as above to confirm germline loss (tudor1 is not fully penetrant for germlineless-ness; only individuals with atrophic gonads were retained). This sham dissection also facilitates comparisons to gonadectomized samples. The effects of somatic sex determination mutants on sex-biased gene expression were examined using flies sex-transformed by doublesex (+/+; dsxM+R45/dsxswe) and transformer 2 (w67c/+;tra2B/Df(2R)trix) mutations. Sibling w67c/BsY;tra2B/Df(2R)trix flies were used in a direct comparison for one group of experiments.
An extensive platform description has been published and is also available at the Gene Expression Omnibus website under accession number GPL20. RNA isolation, labeling, and hybridization have been previously described.
Hybridized slides were read on a GenePix 4000A to generate TIFF images, which were imported into GemTools (Incyte Genomics, Palo Alto, CA, USA) for spot finding. Data were normalized and corrected in Array Qualifier (Novation Biosciences, Palo Alto, CA, USA). Intensities <2.5-fold above local background were discarded. Raw intensity data and corrected data from Array Qualifier were loaded into an Access database and analyzed in Excel (Microsoft, Redmond, WA, USA). The GEO data accessions for these hybridizations are: GSM2456, GSM2458, GSM2460, GSM2462, GSM2464, GSM2466, GSM16542, GSM16554, GSM16556, GSM16569, GSM16570, GSM16572, GSM16574, GSM16576, GSM16578, GSM16580, GSM2457, GSM2459, GSM2461, GSM2463, GSM2465, GSM2467, GSM16650, GSM16555, GSM16571, GSM16573, GSM16575, GSM16577, GSM16579, GSM16581, GSM16582, GSM16584, GSM15686, GSM16588, GSM16593, GSM16613, GSM16614, GSM16615, GSM16619, GSM16583, GSM16585, GSM16587, GSM16589, GSM16609.
For analysis of ratios, the Cy3:Cy5 signal for each element was averaged over biological replicate experiments and dye flips and then averaged from duplicate elements. Elements failing to return a ratio value from duplicates were excluded. Boolean queries were run to generate gene lists of particular interest (for example, high in ovary and female, and low in all others). For analysis by intensity, array channels were cross-normalized in Array Qualifier.
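The averaging order described above can be illustrated with a minimal Python sketch. It is not the Array Qualifier or Excel workflow actually used. It assumes that averaging is done on log2 ratios and that a dye-flipped hybridization is re-oriented by negating its log ratio; the function names and the tuple layout are hypothetical.

```python
import math
from statistics import mean

def oriented_log_ratio(cy3, cy5, dye_flipped):
    """log2(Cy3/Cy5), negated for dye-flipped hybridizations (assumption)
    so that every value is oriented as sample A / sample B."""
    r = math.log2(cy3 / cy5)
    return -r if dye_flipped else r

def element_log_ratio(hybs):
    """Average one spotted element over replicate hybridizations.
    hybs: list of (cy3, cy5, dye_flipped) tuples (hypothetical layout)."""
    return mean(oriented_log_ratio(*h) for h in hybs)

def gene_log_ratio(dup_a, dup_b):
    """Average the two duplicate elements; return None (exclude the gene)
    if either duplicate has no usable measurement."""
    if not dup_a or not dup_b:
        return None
    return mean([element_log_ratio(dup_a), element_log_ratio(dup_b)])
```

For example, a duplicate pair measured once in each dye orientation would be passed as `gene_log_ratio([(400, 100, False), (100, 400, True)], [(800, 100, False), (100, 800, True)])`.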
For array to array comparisons, the GPL218 platform table and 40 sample datasets representing 20 biologically replicated adult stages were downloaded from the GEO website. For comparison to the GPL20 data, Log2 VALUE data (ratio of experiment to reference sample) from eight sample sets representing four biological replicates for five-day-old adult Canton S or 'tud' males and females (GEO accessions GSM3551, GSM3552, GSM3518, GSM3519, GSM3531, GSM3532, GSM3537, GSM3538) were converted to Ln values. The ratios from each biological replicate pair were averaged and these experiment/reference sample ratios were then rearranged into male/female and 'tud' male/'tud' female ratios for parallel platform comparison. A total of 2,536 overlapping elements between GPL20 and GPL218 were selected based on FlyBase IDs and plotted against the GPL20 counterpart (sample accessions: GSM2456, GSM2458, GSM2457, GSM2459, GSM16582, GSM16583) or Northern ratios.
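The rearrangement of experiment/reference ratios into male/female ratios can be sketched as follows. This is an illustrative sketch only. It assumes that male and female samples were each hybridized against the same reference, so that the reference term cancels when the log ratios are subtracted; the function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def log2_to_ln(x):
    """Convert a log2 ratio to a natural-log ratio: ln(r) = log2(r) * ln(2)."""
    return x * LN2

def sex_ratio_ln(male_log2_reps, female_log2_reps):
    """ln(male/female) from replicate log2(sample/reference) values.
    With a shared reference (assumption):
    ln(male/ref) - ln(female/ref) = ln(male/female)."""
    male = sum(log2_to_ln(v) for v in male_log2_reps) / len(male_log2_reps)
    female = sum(log2_to_ln(v) for v in female_log2_reps) / len(female_log2_reps)
    return male - female
```

Working in log space makes the rearrangement a simple subtraction and keeps male-biased and female-biased values symmetric around zero.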
To analyze sex-biased gene expression as normalized intensities among multiple microarray experiments, heat diagrams were generated by the Cluster 3.0 for Mac OSX program  formatted for OSX by Michiel deHoon . A normalized set of signal intensity data was prepared for Cluster by averaging duplicate spots (the two duplicate features within an array) from 26 cross normalized microarray experiments. The sample columns were fixed, while genes were clustered. A self organizing map (SOM) was made using default parameters (10 clusters). The resulting SOM file was then used to perform k-means clustering. The output file was visualized with JavaTreeview (adapted by Alok Saldanha at Stanford University) .
A similar heat diagram was processed for analyzing the gene neighborhoods using this normalized dataset. A Treeview-generated image of the normalized intensities was made using an input data file that sorted the data for each gene by position along the chromosome. Data was sorted first by chromosome arm and then by distance along the chromosome using the first nucleotide of the microarray element amplimer as the positional reference relative to the annotation.
To produce the array to array comparison heat map, normalized intensity data from the FlyGEM platform was extracted for gene identifiers from Arbeitman et al., Table 2 . Heat maps were performed by k-means clustering of 26 female and 33 male soma-biased element species. Both genes and samples were clustered.
Analysis of gene function was done using the vocabulary of the Gene Ontology Consortium  and applying it to the pairwise expression ratio gene lists processed through the bioinformatics tools DAVID and EASE  (search performed November, 2003). The LocusLink identifiers for all microarray features showing greater than two-fold differential expression ratios were loaded into the DAVID website and search parameters set to include all available categories from the three GO ontologies (biological process, cellular component and molecular function). A statistical measure of the representation of all GO categories was done with the EASE bioinformatics tool. The statistical probability for representation of genes within a given category relative to the total number of genes with GO function terms (EASE score) was calculated for each category using the Drosophila-specific database as a background filter. Drosophila features with GO term categories with EASE scores of P < 0.001 were reported.
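A category over-representation score of the kind described above can be approximated with a one-tailed hypergeometric test. The sketch below is not the EASE implementation. It assumes a conservative variant in which one category member is removed from the gene list before testing, in the spirit of the EASE score; the exact contingency table used by EASE may differ, and the function names are hypothetical.

```python
from math import comb

def upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the category (hypergeometric upper tail)."""
    if k <= 0:
        return 1.0
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total

def jackknifed_score(k, n, K, N):
    """Conservative variant (assumption): drop one category member from
    the list, then test k-1 hits among n-1 listed genes."""
    if k <= 0 or n <= 0:
        return 1.0
    return upper_tail(k - 1, n - 1, K, N)
```

Here `k` is the number of listed genes in the category, `n` the list size, `K` the number of background genes in the category and `N` the background size. The jackknifed score is never smaller than the plain tail probability, so categories supported by only one or two genes are penalized.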
To test if genes with sex-biased expression are non-randomly distributed into neighborhoods, we performed 100,000 simulations per window. The input data are stringently selected lists of genes that are differentially expressed in ovaries (ovary versus testis or females, no gonads), testis (testis versus ovary or males, no gonads) and somatic tissue (female and male, no gonads versus their respective gonads). The microarray data elements include those shown to be greater than three standard deviations above the mean local background in one channel while the signal from the other channel was within background levels. These lists are binary. The chromosomal locations for all genes were mapped on the genomic scaffolds from version 3.1 of the Drosophila genome annotation using the first nucleotide position from the sequence of each microarray element. Redundant elements were then removed based on their primer sequences to minimize spurious clustering.
To control for non-random distribution of gene models on chromosomes, we made use of gene order rather than exact chromosomal position. Clustering was then tested using a nonparametric permutation test approach. Because a large number of dependent tests are performed, one for every gene window (the tests are dependent because the windows are allowed to overlap), there is a multiple testing issue, and the measures of significance have to be calculated to account for the number of tests performed.
We determined a significance level to associate with the test of clustering in a given window as follows. In the given window (w) we determine the number (n) of sex-biased genes. A series of 100,000 Monte Carlo iterations is carried out in which the sex-biased genes and the non-differentially expressed genes are shuffled among the gene positions, and the relative frequency of at least n sex-biased genes occurring in some window of the same size as w on the chromosome is determined. Thus, our empirical P-value can be interpreted as the probability of such a degree of clustering occurring in at least some window of that size by chance alone.
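The permutation procedure can be sketched in Python as follows. This is a minimal illustration, not the code used for the analysis. It assumes each chromosome is represented as a list of 0/1 flags in gene order (1 for a sex-biased gene); the function names and the `seed` parameter are hypothetical.

```python
import random

def max_window_count(flags, w):
    """Largest number of flagged genes in any run of w consecutive genes."""
    count = sum(flags[:w])
    best = count
    for i in range(w, len(flags)):
        count += flags[i] - flags[i - w]  # slide the window by one gene
        if count > best:
            best = count
    return best

def window_p_value(flags, start, w, iterations=100_000, seed=0):
    """Empirical P for the window flags[start:start + w]: the fraction of
    shuffles in which some window of size w holds at least as many
    flagged genes as the observed window."""
    rng = random.Random(seed)
    n = sum(flags[start:start + w])
    shuffled = list(flags)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if max_window_count(shuffled, w) >= n:
            hits += 1
    return hits / iterations
```

Comparing the observed count against the maximum over all windows of the same size in each shuffle is what accounts for the many overlapping tests. A result of 0 hits in 100,000 iterations is best reported as P < 10⁻⁵ rather than as zero.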
A subset of 75 element species was chosen as probes for Northern blotting on Hybond-N+ membranes (Amersham, Piscataway, NJ, USA), essentially as described. Probes were selected to cover the full range of absolute intensities and male/female differential expression revealed in microarray experiments. Blotted mRNAs were from flies wildtype with respect to sex and germline (y1 w67c) or from flies with no germline (progeny of homozygous tudor1 mothers). These same genotypes were used for labeling reactions in microarray experiments. Amplicon probes were made using the same primer pairs used in microarray construction and were labeled using Redi-prime II (Amersham). Northerns were hybridized at 42°C in UltraHyb (Ambion, Houston, TX, USA) in 15 ml conical tubes in a bacterial shaker. Blots were imaged on a Storm 860 phosphorimager and quantified using ImageQuant (Molecular Dynamics, Sunnyvale, CA, USA). A total of 73 Northerns were successful (passing visual inspection and showing bands above background, see Figure 4a). While absolute intensities between microarrays and Northerns showed poor correlation (not shown), which is perhaps expected for different assays with different efficiencies, the relative measures showed very good correlation.
A zip file containing 42 Excel tables
An rtf file detailing the contents of the tables
We thank the Bloomington Stock Center for providing stocks. We also thank Dr. Margaret Cam, Dr. Thomas Brody, the Genome Biology referees and members of the Oliver lab for critical reading of the manuscript, and Bruce Baker for providing an advance copy of Arbeitman et al. .