Functional profile of EST libraries
Three non-normalized cDNA libraries were made from the eye-antennal imaginal discs and optic lobes of T. dalmanni
at three developmental stages: third instar wandering larva (L), early pupae (P1) and mid pupae (P2). We generated 33,229 high-quality expressed sequence tags (ESTs) from these libraries with 66% of the ESTs coming from the larval library, 10% from the early pupal library and 24% from the mid-pupal library (Table ). The ESTs are available through NCBI [Genbank:GO271638
]. The ESTs were assembled into 7,066 clusters containing a total of 11,545 consensus sequences (conseqs). In general, multiple conseqs within a cluster resulted from the presence of non-overlapping sequence, allelic variation or alternative transcripts. A total of 3,697 of the clusters had two or more conseqs and 142 clusters had four or more conseqs. 4,015 clusters contained at least one conseq that had a significant Blast hit (< 1e-9
) to a protein in D. melanogaster
and 3,410 unique genes were represented in this list. When blasted against the protein database for Anopheles gambiae
, 3,296 clusters, comprising 2,847 unique genes, had significant hits. Among the clusters that had a hit to D. melanogaster
, but not A. gambiae
, there were 504 unique genes, while 14 clusters had a hit to A. gambiae
but not D. melanogaster
. An additional 384 clusters that did not have significant sequence similarity to D. melanogaster
or A. gambiae
proteins had significant hits to sequences in the non-redundant protein (nr) and nucleotide (nt) databases with approximately half of these hits being to transposable elements. Fourteen clusters had significant hits to microsatellites previously identified in T. dalmanni
Summary statistics for the sequencing of the eye-antennal cDNA libraries.
Because eye-stalks are a complex structure involving modification of the brain, eye, optic nerves and head case, no single GO category is likely to include all, or even most, of the genes involved in eye stalk development. Therefore, we selected from the Gene Ontology database a number of different Biological Process categories, such as cell growth and regulation of cell size and cell shape, that are likely to be important factors in eye-stalk development and calculated for each category the percentage of D. melanogaster genes within a given category for which homologues have been identified in the Teleopsis EST database (Figure ). For instance, about 56% of the genes known to affect cell growth in D. melanogaster have been identified in the T. dalmanni libraries. Overall, we have at least 50% of the genes in most GO categories and 53.6% of all the relevant genes across all categories. This catalogue of genes provides a valuable framework for a comprehensive examination of gene expression in the developing eye disc. In addition, these numbers likely underestimate the percentage of relevant genes discovered in our EST database because the gene lists in Drosophila include all genes that are expressed in any developmental stage or participating tissue. Only a subset of these genes is likely to be relevant during eye-antennal disc development.
Figure 1 Percentage of Drosophila genes identified in T. dalmanni EST database. For Biological Process (BP) categories that are likely to be important in eye-stalk development and evolution, the bars represent the percentage of D. melanogaster genes belonging (more ...)
Developmental changes in gene expression
Comparison between the EST libraries from the larval, early pupal and mid-pupal developmental stages indicates some significant shifts in the type of genes expressed at each stage. We identified several GO categories that exhibited significant over-representation in one of the libraries relative to the EST database as a whole (Figure ). In general, the larval stage is characterized by an over-representation of genes involved in anatomical structure formation, transcription, and cell proliferation while the pupal stage exhibits an increase in genes involved in neurogenesis and energy pathways followed by a substantial increase in cuticle production.
Figure 2 Gene expression differences among developmental stages. The percentage of genes in each developmental stage library is presented for GO categories that exhibit significant over-representation in one of the three libraries relative to the EST database (more ...)
Examination of the expression levels for individual genes also reveals substantial differences across libraries as numerous genes are expressed primarily in one developmental stage. Table lists the genes with the largest stage-specific expression patterns as measured by the R-statistic [32
]. Thirty additional genes (Nop56
, Tenascin major
, Chromatin assembly factor 1 subunit
, Cyclin A
, Eukaryotic initiation factor 4B
, innexin 3
, Minute (2) 21AB
(3535660), Protein phosphatase 19C
, Ribonuclear protein at 97D
(3532397), Arginine methyltransferase 1
, Tenascin accessory
, blown fuse
, Chloride intracellular channel
, eyes absent
, frizzled 2
, ORF-141, Protein disulfide isomerase
) were represented by eight or more cDNAs in the database and were expressed exclusively in the larval library, but had lower R-statistic values due to the disproportionately high EST sampling from this library.
Abundant, stage-specific transcripts identified in the late larval (L), early pupal (P1) and mid pupal (P2) EST libraries.
In terms of gene families, all representatives of the Enhancer of split complex (E(spl) mα
, E(spl) mβ
, E(spl) m3
, E(spl) m4
, E(spl) m7
) identified in the EST libraries were expressed in the larval tissue. In contrast, we identified 11 genes from the Osiris complex (Osi1
) and 9 were expressed exclusively in the mid-pupal stage. Little is currently know about the function of the members of this gene family [33
], but their strong association with this pupal stage may provide insight about their role in development. Analysis of the GO terms (Figure ) and gene expression levels (Table ) indicate that the production of structural cuticle substantially increases during the middle of pupation. In fact, we identified a total of 19 Cuticular proteins (Cpr30F
) in the EST database comprising over 200 cDNAs and all but two of them were expressed exclusively in the mid-pupal library.
The distribution of transposable elements within the EST database of T. dalmanni exhibited a particularly strong relationship with developmental stage. We blasted (using tblastx) all the conseqs in the EST libraries against the D. melanogaster Transposable Element (TE) database and found 191 clusters with significant hits to a total of 71 different types of transposable elements (Table ). These clusters are more likely to contain reads from the larval library than either of the pupal libraries. Of the 191 clusters with TE hits, 170 are derived exclusively from cDNAs from the larval library, while 10 clusters are exclusive to the early pupal library, 7 clusters from the mid pupal library and 3 clusters from more than one developmental stage. If we look at the total number of cDNAs within these clusters, 252 of the 279 TE cDNAs (90%) come from the larval library whereas only 65% of the total number of cDNAs was sequenced from this library. Furthermore, the mid-pupal library produced 24% of all cDNAs but this stage contains only 4% of the TE reads. This distribution of TE cDNAs across libraries produces an R-statistic = 22.15 and a χ2 = 78.23 (p < 0.0001), indicating that the expression of TEs is significantly biased toward the larval developmental stage.
Transposable elements identified in the late larval (L), early pupal (P1) and mid pupal (P2) EST libraries.
Developmental shifts in the pattern of TE expression have been detected in Drosophila
] and there are numerous host factors that can influence the spatial and temporal regulation of TEs [36
]. Most Class I retrotransposable elements are dependent to some extent on the host transcriptional machinery or specific host transcription factors for their expression [38
]. The late larval developmental stage in T. dalmanni
contains an overabundance of genes involved in transcription (Figure ) that may be influencing the pattern of TE expression either through the presence of specific transcription factors or a general increase in the core transcriptional machinery. However, there is also a disproportionately high number of Class II DNA transposons in the late larval library and these TEs are less likely to be influenced by the dynamics of host transcription activity [36
]. An alternative explanation is that mechanisms involved in TE suppression are more active during the pupal stages. Organisms possess a vast array of epigenetic mechanisms that suppress TE activity [37
]. One such defense system involves the Argonaute gene family. Proteins from these genes bind to small guide RNAs to form TE silencing complexes [42
, which has been shown to form TE silencing complexes in the somatic tissue of Drosophila
], was identified in the EST database (two cDNA clones from the early pupal library) but not in sufficient quantity to determine if differences in relative distribution exist across developmental stages.
Paralogous gene assessment
Gene duplication is an important source of novel genetic variation that can facilitate morphological evolution [44
]. Analysis of EST databases provides a valuable tool for identifying gene duplications or gene expansions particular to the taxon of interest. The central issue in this analysis is to differentiate paralogous gene pairs from allelic variation or alternative transcripts. Here, we have taken a conservative approach and tentatively labeled two or more clusters as paralogous genes established in the diopsid lineage if they meet two criteria: (1) they have greater than 10% amino acid divergence from each other and (2) they are both more closely related in a phylogenetic analysis to one specific Drosophila
gene than to any other Drosophila
gene. Using these criteria, we identified 20 pairs or sets of clusters that may have arisen from gene duplication events since the split between Teleopsis
(Figure ). Alternatively, the duplications may have occurred prior to the Teleopsis
split with subsequent loss of one of the copies before the Drosophila
Figure 3 Putative gene duplication events in the T. dalmanni lineage. A) cdc2, B) CG10907, C) CG16886, D) CG31075, E) CG31917, F) CG6769, G) CG7214, H) CG7713, I) crooked legs, J) Cuticular protein 35B, K) Decondensation factor 31, L) Hemomucin, M) Leucine-rich (more ...)
The branch lengths for the trees in Figure provide some indication of the amount of divergence between the T. dalmanni clusters relative to the divergence across Diptera. In 16 of the genes, the divergence between the T. dalmanni clusters is greater than that between D. melanogaster and D. pseudoobscura suggesting that these are true paralogs and not different alleles of the same gene. The lack of monophyly for a number of the cluster pairs may result from the short length of some of the EST amino acid sequence. Sequencing of the entire protein coding region of these genes for T. dalmanni and a congeneric taxa will ultimately be necessary to confirm that these clusters are true paralogs and that both copies are functional across the entire length of their sequence. Examination of the gene ontology terms for these putative duplicates indicates that the set of 20 genes is significantly overrepresented for genes involved in spermatogenesis (p = 0.004; Ribonuclear protein at 97D, quaking related 58E-3, cdc2) and mRNA binding (p = 0.005; suppressor of variegation 205, alan shepard, Ribonuclear protein at 97D, quaking related 58E-3, Hemomucin)
Genes that play an important role in the rapid diversification of eye-stalks within diopsids may exhibit rapid rates of change at the protein level. Therefore, we used a measure of relative protein divergence specific to T. dalmanni in order to identify genes and GO categories undergoing substantially faster evolution in this lineage. This measure differs from Blast identity percentage because it standardizes the amount of change in the lineage leading to T. dalmanni by the total amount of evolutionary change across other flies. Relative divergence is expressed as the percent of total tree length comprised by the branch leading to T. dalmanni.
Translations of putative protein coding sequence data were obtained for 4,450 contigs comprising 3,230 unique genes with significant homology to a gene in D. melanogaster. After alignment to Drosophila and Anopheles sequences, trimming of poorly aligned sequences and concatenation of multiple non-overlapping fragments, we were left with 2,604 genes for analysis. The gene set contained an average of 342 aligned amino acids (aa) per gene and ranged from 3,673 aa (shortstop) to 50 aa (alignments shorter than 50 were excluded from the analysis). Based on a pairwise relative rate test, 72 genes were evolving significantly faster in T. dalmanni than in each of the three Drosophila species included in the alignment. Figure shows the branch lengths for the 20 most rapidly evolving genes within diopsids that had at least 150 aa in their alignment.
Figure 4 The twenty fastest evolving genes in T. dalmanni. A) beta-Tubulin at 85D, B) CG17293, C) rhea, D) crooked legs, E) CG9520, F) ran, G) lesswright, H) CG4598, I) CG6480, J) TBPH, K) Small ribonucleoprotein particle protein B, L) CG31352, M) alien, N) expanded (more ...)
Contrary to expectations, functional analysis of the relative gene divergence estimates did not indicate more rapid rates of evolutionary change for genes involved in biological processes expected to be important in eye-stalk development and evolution. Analysis based on the relative divergence rates for all 2,604 genes indicated only a single category, transcription initiation from RNA polymerase II promoter (GO:0006367), was evolving significantly faster than the T. dalmanni genes on average (Figure ). In contrast, several biological process categories–and some, such as neurological system process, compound eye photoreceptor fate commitment and regulation of cell growth, that are likely to be important in eye-stalk development–are evolving significantly slower than expected (Figure ). Overall, when we combine all the genes that fall within the 'important eye-stalk' BP categories listed in Figure , these genes are evolving slightly slower, but not significantly different, than the rest of the genes (P = 0.099, t = -1.65).
Figure 5 Biological process categories undergoing significantly slower or faster rates of evolutionary change. X-axis represents the percentage of the total tree length comprised by the branch leading to T. dalmanni. The sample sizes for the various process categories (more ...)
The numerous slow-evolving categories that characterize the T. dalmanni
lineage may result from strong stabilizing selection acting on genes in these categories in general. If positive selection is operating on these biological processes at all it may be limited to only a few genes in each category. For instance, the gene crooked legs
) is involved in two of the slow-evolving categories–cell adhesion and tissue development–and, recently, has been shown to represent an important step in the pathway linking ecdysone signaling and cell proliferation [47
]. Unlike other genes involved in these BP categories, crol
is also undergoing extremely rapid protein evolution (Figure ) and appears to have duplicated at least twice within the lineage leading to T. dalmanni
(Figure ). Alternatively, the gene set characterized by the list of 'important eye-stalk' BP categories may not reflect the processes that are actually important in eye-stalk evolution or rates of protein evolution may not provide a valuable indicator of their relevance to eye-stalk evolution. It is also important to note that, because these estimates of gene divergence are derived from ESTs that represent only partial gene fragments, the estimated rates of evolution may be different when the entire protein coding sequence is evaluated. Furthermore, additional sampling of species within Diptera is necessary to verify that apparent rate differences between T. dalmanni
are actually specific to diopsid lineages.
Divergence in gene expression between selection lines
Using the EST sequences to construct probes for oligonucleotide microarrays, we conducted an experiment examining differences in gene expression between lines of flies selected for longer or shorter relative eyespan. Analysis of eight replicate arrays [Genbank:GSE15444] revealed that 367 of 3,105 genes exhibit differential expression based on a false discovery rate of 1%. The d-statistics for the significant genes were either greater than 2.26 or less than -2.26. Of these genes, 44 had d-statistics exceeding 5 (Table ), 27 exhibited more than a two-fold difference in expression between the lines, and one gene, CG11577, exhibited 8 times greater expression in flies from the most extreme high line than from the most extreme low line. Among the 367 significant genes, 40 exhibited no detectable homology to any gene in the Drosophila database, and 50 belonged to the set of 'important eye-stalk' genes defined in Figure . Fourteen of the differentially expressed genes (by S6, cdc2, CG10283, CG1575, CG15835, CG31917, CG4598, CG6480, Kruppel, lesswright, lethal (2) k14505, par-6, Ribonucleoside diphosphate reductase small subunit, Small ribonucleoprotein particle protein B) are also evolving significantly faster in T. dalmanni than other flies and, of these, 2 genes (cdc2 and CG31917) have duplicates in the EST database.
Differentially expressed genes between flies that have been bred for increased and decreased eyespan.
Functional analyses of the microarray results do not reveal a strong pattern of relationships among the set of 367 significant differentially expressed genes. GeneMerge did not identify any GO categories with significant overrepresentation when all differentially expressed genes were included in the analysis set. When only genes that were significantly down-regulated (i.e. had higher expression values in the low line flies) were analyzed, genes involved in RNA splicing factor activity, transesterification mechanism (GO:0031202) were significantly overrepresented (P = 0.0362). No significant GO categories were found among the set of up-regulated genes.
We also tested for differences in the mean d-statistic for each biological process category relative to all the genes. This functional assay utilizes the expression values for all of the genes, not just those with significant expression difference between lines. Based on this analysis, genes that play a role in gamete generation (P < 0.001), embryonic development (P < 0.001), cell-cell adhesion (P = 0.015) and neurological system process (P = 0.03) have significantly higher mean d-statistics than an average gene set of similar size. This result indicates that the high line flies had higher expression values for genes in these GO categories than the low line flies. It is likely that this pattern results from a shift in either the developmental timing or allometric relationship among body parts between lines. For instance, genes involved in embryonic development are expressed relatively early in the metamorphic process compared with other genes in the EST database, so if high line flies have delayed their differentiation to allow for extra imaginal disc growth relative to low line flies, an increase in the expression of embryonic development genes might occur. Consistent with this interpretation, flies from the high lines exhibit a steeper allometric relationship between eyespan and body length [27
] and take longer to develop [48
] than flies from the low lines. Similarly, the optic nerve may represent a larger proportion of the overall tissue in high line flies than in low line flies, which could result in a slight increase in the overall level of gene expression for neurological system process genes. Distinguishing between these possibilities will require additional experiments in which gene expression between replicate and control lines is compared at multiple time points during development.
Relative expression of eight genes was estimated using quantitative reverse transcription PCR (qrtPCR). The correlation between the average relative expression detected by the microarray and by qrtPCR was 0.83 (Figure ). Variation was considerably greater for qrtPCR estimates because four, rather than eight, replicates were performed. Nevertheless, all four genes with greater high than low line expression showed increased high line expression by qrtPCR and the four genes with greater low than high line expression showed increased low line expression by qrtPCR.
Figure 6 Relative gene expression estimated by oligoarrays correlates with relative expression estimated by qrtPCR. Calculation of fold change for eight target genes was estimated relative to the expression level of a control gene, GAPDH, using the 2-ΔΔCT (more ...)