In the first genetical genomic study with RILs performed in insects, we show evidence that adult Drosophila melanog
aster (fruit flies) have 12 major master modulatory trans
-eQTLs that contain from 96 to 278 genes each, which places this organism somewhere between C. elegans
and mice in terms of the number and complexity of the trans
-eQTL targets (transbands) [14
]. We extended the genetical genomics approach by comparing control flies with flies exposed to the developmental neurotoxin lead and identified transbands that differ in the two environments. The most interesting finding is that 33 genes are co-regulated by two transbands, one on the second chromosome at 30AB and one on the 3rd
chromosome at either 72A or 73D which is located in the adjacent 5 cM window that was used to define transbands. Because the resolution of QTLs is often larger than 5 cM, it is likely that the 3rd
chromosome transbands are the same locus, which for simplicity we will call the “73D transband” in and in the following paragraphs.
Model for co-regulation by trans-eQTL at 30AB and 73D
These 33 co-regulated genes show significant genotype-by-environment (GxE) interactions for both the 30AB transband and the 73D transband. We have recently shown by traditional phenotypic QTL analysis that the 30AB locus is involved in lead-dependent changes in locomotor activity, which we believe is an important validation of the 30AB eQTL transband [53
]. We propose a model based on our data that a trans
-regulator located at 73D increases expression of its target genes when it binds lead, regardless of the genotype of the 73D transband (). In contrast, a second trans
-regulator located at 30AB only increases expression (or somehow increases steady state levels of mRNAs via transcriptional or post-transcriptional mechanisms) of the co-regulated target genes in the presence of lead when it has the ORE
genotype (). This model can explain much of the data, but other possibilities exist. For example, the putative trans
-regulator could increase the stability of the target gene mRNAs or there could be an indirect effect on steady-state mRNA levels. Fine-mapping the genes that underlie the QTLs, which will be aided by sequencing both of the ORE
genomes, should help in determining the precise mechanism involved.
The first genetical genomics study that identified genes with significant GxE interactions was a study in C. elegans
in which the authors identified a group of genes with trans
-eQTL that are induced by heat shock, so called “plastic QTL” or pQTL [72
]. Smith and Kruglyak recently did a detailed analysis of GxE-eQTL in yeast (which they call “gxe
QTL”) grown in either glucose or ethanol as a sole carbon source [104
]. Other laboratories have identified genotype-by-environment interactions in which the ‘environment’ is a different tissue (e.g .
, brain versus liver [55
]), but we believe that ours is the first ‘genetical toxicogenomic’ study that combines genetical genomics with toxicogenomics. The benefit of adding environmental perturbations in genetical genomics studies is discussed in a recent paper by Li and colleagues [73
]. Interestingly, we identified a GxE-eQTL transband with 814 genes that does not correspond to a main-effect eQTL transband. Of these 814 genes, 59 of them are in the GO category ‘nervous system development,’ and 51 of these genes have significant GxE-eQTL when L (Line) is a non-dependent covariate. We are undertaking a study to identify all of the genes with GxE-eQTL in Drosophila
, where L (line) is a non-dependent covariate, but this is beyond the scope of the present study. The meaning of a GxE-eQTL without a main-effect eQTL is controversial, and more work needs to be done to determine whether such GxE-eQTL are biologically meaningful.
We showed that there are 12 trans
-eQTL transbands; 5 control transbands and 7 lead-treated transbands at 11 different loci (). The existence of trans
-eQTL transbands has been validated in yeast [13
], but there is controversy as to whether trans
-eQTL transbands identified in other model organisms are real or correlation artifacts [30
]. This has been illustrated by a simulation study that showed that the five most populated bins of expression data contained 20% of the significant, but spurious, trans
]. One suggested solution, which we did in this manuscript, is to perform permutation analyses of the eQTL data (i.e
., permuting the genotype assignments only and keeping the original expression values for all the samples) [12
]. However, such permutation analyses, done by others, seemingly invalidate numerous of the previous studies if one only considers the size of the transband as the most important criterion [12
]. Therefore, we have more extensively analyzed the permutations in our study and found that (1) the number of transbands, (2) the total number of eQTL, and (3) the number of identical genes in control and treated transbands might be more important criteria than the absolute size of the transband.
Another approach to validating transbands is to test gene expression for individual probesets within them for significant genetic correlation with variation among lines for functionally related traits assessed independently. The 30AB region of the Drosophila
genome has been identified as both a behavioral QTL for changes in locomotion in response to lead treatment [53
] and as a transband in the present study that regulates 194 probesets in response to lead. We can therefore search for individual probesets within the 30AB region showing variation in gene expression after lead exposure among roo lines that is correlated with variation among the same lines for their behavioral response to lead to further identify potential candidate genes for both the behavioral and transband eQTL. Any probesets identified in this manner can be further screened by correlating gene expression from candidate probesets linked to 30AB with gene expression from candidate probesets regulated from the 30AB transband.
For present purposes, we focused on 122 probesets that include 30AB and the proximate half of the adjacent regions 29F and 30D (measured by base-pair distances). We identified seven probesets among the 122 spanning 30AB that showed correlations at a level of p<0.05 (). We identified Ggamma30A (CG3694), involved in G-protein signal transduction and cell cycle regulation as the probeset in the 30AB region with gene expression most strongly correlated with both lead-induced behavioral changes and lead-induced changes in gene expression among those probesets regulated from 30AB. Ggamma30A has human and other mammalian orthologs with similar function, and has been associated in mice with gustation [57
Table 6 Candidate genes from 30AB that mediate the behavioral response to lead. We took gene expression data for 122 probesets nearest our behavioral QTL and asked whether or not strain differences in gene expression levels for these, individually, were correlated (more ...)
Although these cursory analyses for purposes of discussion have not been corrected for multiple comparisons, they illustrate a simple and direct method for correlating functional traits associated with lead exposure to gene expression levels in response to lead as an independent method for screening candidate genes of interest within identified QTL. The combination of gene expression data and data on functional traits in the same RI lines provides a more precise method for a priori screening of candidate loci representing significant QTL than either method alone would allow. This approach should be able to increase the probability of identifying valid candidate loci represented by eQTL to facilitate confirmation by direct genetic methods such as deletion mapping and gene silencing, at least in cases where functional trait QTL are derived from strain differences in traits that are a direct function of strain differences in gene expression.
In the absence of data on functional traits and functionally related gene expression in the same RI lines, a priori identification of candidate genes for a QTL is limited to what is already known about the functions of closely linked loci. In the present case, examining the known, or predicted functions of the 122 genes closely linked to our behavioral QTL reveals one that is a strong candidate based on predicted function (CG3759: “laccase-like”) associated with defense against toxins, iron and copper binding (Flybase). Although gene expression from this probeset is not correlated with the behavioral response to lead, it may represent genetic variation that acts developmentally to mediate behavioral responses to lead at a physiological or anatomical level that does not depend on adult gene expression.
We have also measured variation among these roo lines for lead burden in response to the same lead treatment used here to assay gene expression in response to lead treatments, and have identified three significant QTL for this trait (unpublished) that do not overlap with either the behavioral QTL [53
] or the transband eQTL identified here in response to lead treatment. This observation suggests that our transband pattern of gene expression in response to lead is not significantly dependent on strain variation in lead burden, although further analysis of individual probesets may reveal interesting associations.
Systematic analysis of genetic correlations among probesets for gene expression and functional trait expression in response to lead exposure may help identify specific candidate probesets of interest within each of the transband QTLs. Genetic validation of the putative trans-regulators in our transbands is required to authenticate this proposition, but, if validated, it represents a significant change in how to interpret eQTL analyses.
Limitations of genetical genomics studies
There are limitations common to all eQTL studies. For example, previous genetical genomics studies have determined that probeset expression levels are often apparently polarized for expression based on genotype – i.e
., probesets of one genotype in the RILs have higher relative expression levels compared with probesets of the other genotype. For example, in studies with BXD mouse RILs, which were made from the C57BL/6J sequenced mouse strain (for which the probes were designed) and the DBA/2J strain, a significantly larger number of the probesets with the C57BL/6J genotype have an apparently higher relative expression level than with the non-sequenced genotype [19
]. The apparent polarization of expression based on genotype is now a well-known design flaw of microarray experiments because probes that fail to hybridize to a gene because of a sequence mismatch could be erroneously scored as having lower expression [95
]. We controlled for this by eliminating probesets that have “outlier probe(s)” with significantly lower signal(s) than the other probes in the probeset.
Another major limitation of the current genetical genomics study is that if the regulation of a lead-dependent master modulatory gene is identical in the progenitor strains ORE
, then these master regulators would probably not be identified in our analysis. An approach that analyzes gene expression patterns in populations of flies, such as the quantitative-trait transcript technique [88
], or an eQTL approach that uses more than two progenitor strains [40
], will likely identify further GxE-eQTL transbands and lead-dependent master-modulatory genes.
A further limitation of our experimental design is that anatomic alterations are likely induced by either lead or genetics. For example, if some of the flies have a bigger brain (or any other tissue since we analyzed whole flies) then one would find up-regulation of brain (or some other tissue) associated genes. However, this would not
necessarily be true if you compare only an equivalent amount of cells from the tissue in question. Therefore, as discussed as a general limitation of microarray design in whole organism studies [20
], the observed effect would not be a transcription effect but rather is a function of strain differences occurring on a larger anatomic scale. We and others have observed that the lead concentration that we used causes developmental delays, but these developmental delays are not accompanied by changes in the adult body weight [2
]. However, we and others have shown that certain traits such as triglyceride levels and body mass vary considerably in the RILs that we used in our studies [24
]. Therefore, these caveats should be considered in the interpretation of our data and in similar studies.
Why did we use whole flies for these studies instead of heads or brains? The co-authors extensively discussed this issue before deciding on using whole flies. We thought that starting with whole flies is the best approach because lead has numerous physiological effects throughout the body, and not just in the nervous system. We also believed that lead could have global effects on gene expression in a conserved manner, but the neuronal effects would become manifest only in the nervous system. We also argued that if we decided to analyze only the head (of which over half is brain and eyes, which are also parts of the nervous system), then we might be throwing out interesting gene expression changes that occur in the PNS (e.g
., ganglia) and other parts of the body. Futhermore, a criticism of mouse genetical genomics experiments that use whole brains is that smaller regions of the brain would have been preferred. For Drosophila
, we believe the next step should be to repeat these studies with fly heads or, even better, specific types of neurons or brain regions by using a purification technique, such as the bacTRAP technique that was developed for purifying mouse Purkinje cells and other neuronal types [35
Why did we only do one microarray experiment for each of the 75 RILs in the presence or absence of only one concentration of lead? The most obvious answer is that this would have been prohibitively expensive. However, the great advantage of genetical genomics studies over other types of gene expression studies with microarrays is that the most important unit of replication is the genotype of a particular genome region (ORE or 2B) and not the RIL. Since we performed 150 microarrays, we analyzed gene expression of the ORE genotype ~75 times and of the 2B genotype ~75 times for each region of the genome. Not many studies have run 75 replicate microarray experiments, but we have essentially done so in this study.
What is meant by a “significant” change in gene expression? We decided to go by P value (via two tailed t-test) rather than by absolute change in gene expression (e.g., a minimum 1.5-fold change used by many investigators). The main reason that some other studies use a threshold of 1.5 as being “significant” is that they wish to identify the genes that are likely important in a response, and it can be argued that a change in steady state mRNA levels of only 1.2 is unlikely to cause a significant change in protein levels. However, this is not the case in our studies because we are primarily interested in pathways (i.e., transbands) rather than individual genes, so we believe that “significance” by t-test is a more robust approach. Also, as mentioned in the previous paragraph, since we essentially replicated our studies ~75 times for the genotype of each region of the genome, a 1.2-fold change, which is a typical change in gene expression in a probeset in the transbands, can be very significant.
What is the cost effectiveness of doing genetical genomics in Drosophila
versus mice or some other model organism? In general, the smaller the genome, the more powerful the genetical genomic studies can be. For example, genetical genomic studies done with a similar number of RILs in the small-genome organism Saccharomyces cerevisia
have mapped cis
-eQTL and transbands to the exact polymorphism [32
]. Genetical genomics studies in C. elegans
, which has a similar genome size as Drosophila melanogaster
, can map eQTL to within 5 cM, which corresponds to dozens or hundreds of genes, but further fine-mapping studies can be quickly done to identify the exact polymorphisms responsible [77
]. Genetical genomics studies have also been done extensively in rats and mice [7
], which have a further 10-to-20-fold increase in genome size, but the precision of these studies is consequently much broader and fine mapping studies are much more difficult. Fortunately, next-generation “cheap sequencing” technologies have begun to enter the genetical-genomics field [11
], and these techniques might soon allow these studies to become cost effective in mammalian models for even small laboratories.
Evolution in a toxic environment
Responses to stressors must play a vital role in evolution, and our results provide insights into mechanisms that may be involved. We identified several transbands that might contain master-modulatory proteins that are affected by lead in one genotype but are insensitive to lead in the other genotype. Since we started with only two parental lines, there might be many more such master modulatory proteins in the genome in a population of Drosophila
that are similarly evolving. Another question is whether these same master-modulatory proteins uniquely acquired sensitivity to lead, which seems unlikely, or respond in similar manners to many environmental toxins. Toxin-induced changes in the regulation of these proteins might result in the development of a slightly “modified” nervous system that could enhance fitness in a contaminated environment. Consistent with this, lead-induced changes in mating and in fecundity in Drosophila
] can be interpreted in terms of changes in resource allocation by the female that may, from a long-term perspective, be adaptive. The studies presented here show many promising new avenues for investigation.
There are several practical applications for the data generated in the study. The data will be deposited in the GeneNetwork website (http://www.genenetwork.org
) so that other investigators can look for correlations between gene expression patterns and phenotypic traits. The GeneNetwork is an open resource and consists of a set of linked resources for systems genetics. It has been designed for integration of networks of genes, transcripts, and traits such as toxicity, cancer susceptibility, and behavior for several species. Phenotypic QTLs using the roo lines were identified in numerous other QTL mapping studies [46
]. For sets of phenotypes, particularly those in Gene Network's databases (Drosophila
phenotypes are not yet in this database), a variety of correlation analyses can be performed with the gene expression data. Trait values entered by users or retrieved from the databases can be correlated with any other database of phenotypes from the same mapping genetic reference panel, in our case, the roo lines. The future addition of our data and other Drosophila
QTL data to GeneNetwork will be a tremendous boon to the Drosophila
community and to the genetics community in general.