Aerobic organisms are susceptible to damage by reactive oxygen species. Oxidative stress resistance is a quantitative trait with population variation attributable to the interplay between genetic and environmental factors. Drosophila melanogaster provides an ideal system to study the genetics of variation for resistance to oxidative stress.
Methods and Findings
We used 167 wild-derived inbred lines of the Drosophila Genetic Reference Panel for a genome-wide association study of acute oxidative stress resistance to two oxidizing agents, paraquat and menadione sodium bisulfite. We found significant genetic variation in resistance to both stressors. Single nucleotide polymorphisms (SNPs) associated with variation in oxidative stress resistance were often sex-specific and agent-dependent, with a small subset common to both sexes or both treatments. Associated SNPs had moderately large effects, and effect size was inversely related to allele frequency. Linear models with up to 12 SNPs explained 67–79% and 56–66% of the phenotypic variance for resistance to paraquat and menadione sodium bisulfite, respectively. Many of the implicated genes were novel, with no previously known role in oxidative stress resistance. Bioinformatics analyses revealed a cellular network comprising DNA metabolism and neuronal development, consistent with the targets of oxidative stress-inducing agents. Mutational analysis confirmed the associations of seven candidate genes with natural variation in oxidative stress resistance.
We identified novel candidate genes with context-dependent effects on variation in resistance to oxidative stress. These results form the basis for future translational studies to identify oxidative stress susceptibility/resistance genes that are evolutionarily conserved and might play a role in human disease.
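The variance-explained figures above are coefficients of determination (R²) from multiple linear regression. The Python below is a minimal sketch that uses simulated data, not the study's genotypes. It fits an ordinary least-squares model of a phenotype on 12 biallelic SNPs and reports R². Each SNP is coded 0/1, on the assumption that inbred, largely homozygous lines carry one allele or the other. The function name `variance_explained` and all simulated values are hypothetical.

```python
import numpy as np


def variance_explained(genotypes: np.ndarray, phenotype: np.ndarray) -> float:
    """Fraction of phenotypic variance (R^2) explained by an OLS fit of the
    phenotype on several SNP columns plus an intercept."""
    n = genotypes.shape[0]
    X = np.column_stack([np.ones(n), genotypes])       # intercept + SNPs
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    residuals = phenotype - X @ beta
    ss_res = float(residuals @ residuals)              # residual sum of squares
    centered = phenotype - phenotype.mean()
    ss_tot = float(centered @ centered)                # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical simulated panel: 167 inbred lines, 12 SNPs coded 0/1
rng = np.random.default_rng(0)
n_lines, n_snps = 167, 12
G = rng.integers(0, 2, size=(n_lines, n_snps)).astype(float)
effects = rng.normal(0.0, 1.0, size=n_snps)            # simulated allelic effects
y = G @ effects + rng.normal(0.0, 1.5, size=n_lines)   # phenotype = genetics + noise

print(f"R^2 = {variance_explained(G, y):.2f}")
```

R² here is computed on the same lines used to fit the model. In a GWAS, the SNPs are also chosen because they are associated with the trait. For both reasons, in-sample R² tends to overstate predictive accuracy, and cross-validation gives a more conservative estimate.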
Phenotypic plasticity is the ability of a single genotype to produce different phenotypes in response to changing environments. We assessed variation in genome-wide gene expression and four fitness-related phenotypes of an outbred Drosophila melanogaster population under 20 different physiological, social, nutritional, chemical, and physical environments, and compared the phenotypically plastic transcripts to genetically variable transcripts in a single environment. The environmentally sensitive transcriptome comprises two transcript categories, which together account for ∼15% of expressed transcripts. Class I transcripts are genetically variable and associated with detoxification, metabolism, proteolysis, heat shock proteins, and transcriptional regulation. Class II transcripts have low genetic variance and show sexually dimorphic expression enriched for reproductive functions. Clustering analysis of Class I transcripts reveals a fragmented modular organization and distinct environmentally responsive transcriptional signatures for the four fitness-related traits. Our analysis suggests that a restricted environmentally responsive segment of the transcriptome preserves the balance between phenotypic plasticity and environmental canalization.
For Mendelian traits, the genotype allows a direct prediction of the phenotype. For complex traits, which arise from multiple segregating genes and their interactions with the environment, predicting phenotypic values is not straightforward. For such traits, a single genotype can often express different phenotypes in different environments. Such phenotypic plasticity is the counterpoint to “environmental canalization,” whereby genotypes produce the same phenotype in different environments. Whereas phenotypic plasticity allows organisms to respond rapidly to changing environments, environmental canalization buffers phenotypes against environmental perturbations. The balance between plasticity and robustness is crucial for optimal fitness, but the genetic basis for phenotypic plasticity is poorly defined. Here, we present the most comprehensive analysis to date of variation in genome-wide gene expression of an outbred Drosophila melanogaster population under 20 different environments. We find that a restricted environmentally responsive segment of the transcriptome (∼15%) preserves the balance between phenotypic plasticity and environmental canalization. Environmentally plastic transcripts can be grouped into two categories. Class I transcripts are genetically variable and associated with detoxification, metabolism, proteolysis, heat shock proteins, and transcriptional regulation. Class II transcripts have low genetic variance and show sexually dimorphic expression enriched for reproductive functions. Despite their low genetic variance, these transcripts evolve rapidly.
A central issue in evolutionary quantitative genetics is to understand how genetic variation for quantitative traits is maintained in natural populations. Evaluating theoretical models of the maintenance of genetic variation requires estimates of several quantities:

- genetic variation;
- genetic correlations and pleiotropy among multiple traits;
- inbreeding depression;
- mutation rates for fitness and quantitative traits;
- the strength and nature of selection.

Studies in Drosophila melanogaster have shown that a substantial fraction of segregating variation for fitness-related traits is due to rare deleterious alleles maintained by mutation–selection balance. A smaller but significant fraction is attributable to intermediate-frequency alleles with antagonistic pleiotropic or late-age-specific effects. The nature of segregating variation for traits under stabilizing selection is less clear. Resolving it requires more detailed knowledge of the loci, mutation rates, allelic effects, and frequencies of the molecular polymorphisms that affect variation in suites of pleiotropically connected traits. Recent studies in D. melanogaster have revealed unexpectedly complex genetic architectures for many quantitative traits, with large numbers of pleiotropic genes and alleles with sex-, environment-, and genetic background-specific effects. Future genome-wide association analyses of many quantitative traits in a common panel of fully sequenced Drosophila strains will provide much-needed empirical data on the molecular genetic basis of quantitative traits.
maintenance of quantitative genetic variation; mutation–selection balance; balancing selection; pleiotropy; context-dependent effects; genetic architecture
Interactions among genes and the environment are a common source of phenotypic variation. To characterize the interplay between genetics and the environment at single nucleotide resolution, we quantified the genetic and environmental interactions of four quantitative trait nucleotides (QTN) that govern yeast sporulation efficiency. We first constructed a panel of strains that together carry all 32 possible combinations of the 4 QTN genotypes in 2 distinct genetic backgrounds. We then measured the sporulation efficiencies of these 32 strains across 8 controlled environments. This dataset shows that variation in sporulation efficiency is shaped largely by genetic and environmental interactions. We find clear examples of QTN:environment, QTN:background, and environment:background interactions. However, we find no QTN:QTN interactions that occur consistently across the entire dataset. Instead, interactions between QTN occur only under specific combinations of environment and genetic background. Thus, what might appear to be a QTN:QTN interaction in one background and environment becomes a more complex QTN:QTN:environment:background interaction when the entire dataset is considered. As a result, the phenotypic impact of a set of QTN alleles cannot be predicted from genotype alone. Our results instead demonstrate that the effects of QTN and their interactions are inextricably linked both to genetic background and to environmental variation.
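The factorial design above can be enumerated directly. Each background carries 2^4 = 16 allele combinations at the four QTN, so 2 backgrounds give 32 strains. Measuring each strain in 8 environments gives 256 strain-by-environment combinations. This sketch uses hypothetical placeholder names for the QTN and environments, and it illustrates only the counting, not how the strains were built.

```python
from itertools import product

# Placeholder labels; the real QTN and environments are not named here
QTN = ["QTN1", "QTN2", "QTN3", "QTN4"]
ALLELES = ("oak", "vineyard")        # each QTN carries the oak or vineyard allele
BACKGROUNDS = ("oak", "vineyard")    # parental genetic background
ENVIRONMENTS = [f"env{i}" for i in range(1, 9)]

# Every allele combination at the four QTN, in each background: 16 x 2 = 32
strains = [
    {"background": bg, **dict(zip(QTN, alleles))}
    for bg in BACKGROUNDS
    for alleles in product(ALLELES, repeat=len(QTN))
]

# Every strain measured in every environment: 32 x 8 = 256
measurements = list(product(range(len(strains)), ENVIRONMENTS))

print(len(strains), len(measurements))
```

Each strain-by-environment measurement can then be modeled with QTN, background, and environment terms and their interactions. This is why a QTN:QTN interaction seen in one cell of the design need not hold across all 256.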
Phenotypic variation among individuals is caused by naturally occurring genetic differences, or alleles. The relationship between an allele and the phenotype is extremely complex; for example, the effect of an allele often depends upon both the environment and the individual's genetic background. To better understand these complex relationships, we examined the effects of four quantitative trait nucleotides (QTN) in three genes that cause variation in sporulation efficiency between vineyard and oak tree strains of yeast. We measured the effects of the QTN while varying both the genetic makeup of the strains and their growth environments. We found that the effects of each of the four QTN alleles depended upon the genotypes at the other QTN, the growth environment, and whether the strain carried the oak or vineyard parent genome. There were no simple rules that describe the effects of the alleles across all environments; instead, detailed models were needed to account for environmental and genetic variation in order to predict the effects of alleles in specific individuals.
Understanding the genetic and environmental factors that affect variation in life span and senescence is of major interest for human health and evolutionary biology. Multiple mechanisms affect longevity, many of which are conserved across species, but the genetic networks underlying each mechanism and the cross-talk between networks are unknown. We report the results of a screen for mutations affecting Drosophila life span. One third of the 1,332 homozygous P-element insertion lines assessed had quantitative effects on life span; mutations reducing life span were twice as common as mutations increasing life span. We confirmed 58 mutations with increased longevity, only one of which is in a gene previously associated with life span. The effects of the mutations increasing life span were highly sex-specific, with a trend towards opposite effects in males and females. Mutations in the same gene were associated with both increased and decreased life span, depending on the location and orientation of the P-element insertion and on genetic background. We observed substantial, sex-specific epistasis among a sample of ten mutations with increased life span. All mutations increasing life span had at least one deleterious pleiotropic effect on stress resistance or general health, with different patterns of pleiotropy for males and females. Whole-genome transcript profiles of seven of the mutant lines and the wild type revealed 4,488 differentially expressed transcripts, 553 of which were common to four or more of the mutant lines. These include genes previously associated with life span and novel genes implicated by this study. Therefore, longevity has a large mutational target size, and genes affecting life span have variable allelic effects. Alleles affecting life span exhibit antagonistic pleiotropy and form epistatic networks, and sex-specific mutational effects are ubiquitous.
Comparison of transcript profiles of long-lived mutations and the control line reveals a transcriptional signature of increased life span.
Recent advances in medical science as well as vastly improved living conditions have resulted in a steady increase in human life span, with a concomitant increase in health issues associated with aging. In addition, understanding life history evolution requires that we know why organisms age and why there is variation in aging and senescence. To identify genes involved in aging, we assessed longevity in a collection of over 1,300 Drosophila lines homozygous for a single P transposable element mutation. We found 58 mutations in novel loci that increase life span by up to 33%. Most mutations had different effects on male and female life span, and for some the effects were opposite between the sexes. Effects of these mutations on starvation resistance, chill coma recovery, and climbing ability varied, but all had a deleterious effect on at least one other trait. A sample of ten mutations with increased life span formed genetic interaction networks, but the genetic interactions were different, and sometimes in opposite directions, in males and females. Transcript profiles of seven long-lived mutations and the control line reveal a core transcriptional signature of increased life span involving novel candidate genes for future analysis.
For most organisms, chemosensation is critical for survival and is mediated by large families of chemoreceptor proteins, whose expression must be tuned appropriately to changes in the chemical environment. We asked four questions:

- whether expression of chemoreceptor genes that are clustered in the genome would be regulated independently;
- whether expression of certain chemoreceptor genes would be especially sensitive to environmental changes;
- whether groups of chemoreceptor genes undergo coordinated expression;
- how plastic the expression of chemoreceptor genes is with regard to sex, development, reproductive state, and social context.

To answer these questions we used Drosophila melanogaster, because its chemosensory systems are well characterized and both the genotype and environment can be controlled precisely. Using customized cDNA microarrays, we showed that chemoreceptor genes that are clustered in the genome undergo independent transcriptional regulation at different developmental stages and between sexes. Expression of distinct subgroups of chemoreceptor genes is sensitive to reproductive state and social interactions. Furthermore, exposing flies only to the odor of the opposite sex alters transcript abundance of chemoreceptor genes. These genes are distinct from those that show transcriptional plasticity when flies are allowed physical contact with same- or opposite-sex members. We analyzed covariance in transcript abundance of chemosensory genes across all environmental conditions and found that they segregated into 20 relatively small, biologically relevant modules of highly correlated transcripts. This finely pixelated modular organization of the chemosensory subgenome enables fine tuning of the expression of the chemoreceptor repertoire in response to ecologically relevant environmental and physiological conditions.
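Grouping transcripts into modules of highly correlated expression can be illustrated with a deliberately simple procedure. It computes pairwise Pearson correlations across conditions, links transcripts whose absolute correlation meets a threshold, and reports connected components. This is an assumed stand-in, not necessarily the clustering method used in the study. The threshold of 0.8, the function name `correlation_modules`, and the synthetic expression profiles are all hypothetical.

```python
import numpy as np


def correlation_modules(expr: np.ndarray, threshold: float = 0.8) -> list[list[int]]:
    """Group transcripts (rows of expr, columns = conditions) into modules:
    connected components of a graph whose edges join pairs of transcripts
    with |Pearson r| >= threshold."""
    corr = np.corrcoef(expr)                 # rows are treated as variables
    adjacent = np.abs(corr) >= threshold
    n = corr.shape[0]
    seen = [False] * n
    modules = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search collects every transcript reachable from `start`
        stack, component = [start], []
        seen[start] = True
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in np.flatnonzero(adjacent[node]):
                if not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(int(neighbor))
        modules.append(sorted(component))
    return modules


# Synthetic data: 12 conditions, three mutually uncorrelated expression profiles
rng = np.random.default_rng(1)
t = np.arange(12) * 2 * np.pi / 12
noise = lambda: 0.05 * rng.normal(size=t.size)
expr = np.vstack(
    [np.sin(t) + noise() for _ in range(3)]        # transcripts 0-2 co-vary
    + [np.cos(t) + noise() for _ in range(2)]      # transcripts 3-4 co-vary
    + [np.sin(2 * t) + noise()]                    # transcript 5 stands alone
)
print(correlation_modules(expr))
```

A single hard threshold makes module boundaries sensitive to the cutoff chosen, and more elaborate clustering methods reduce that sensitivity. The input (a transcript-by-condition matrix) and the output (groups of co-varying transcripts) have the same shape either way.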
Rapid adaptation and phenotypic plasticity to the chemical environment are essential prerequisites for survival; and, consequently, large families of genes that mediate the recognition of olfactory and gustatory cues have evolved. We asked how flexible the expression of these genes is in the face of rapidly changing conditions encountered during an individual's lifetime. We used the fruit fly, Drosophila melanogaster, to address this question, since both the genetic composition and environmental rearing conditions can be controlled precisely in this experimentally amenable model organism. By measuring expression levels of all chemosensory genes simultaneously, we identified genes that show altered expression at different developmental stages, during aging, in males and females, following mating, and in different social conditions. We asked whether chemosensory genes are regulated independently or whether their regulation is structured. We found that chemosensory genes that are located in close proximity to one another on the chromosome are often regulated independently. However, statistical analysis showed that groups of chemosensory genes are coordinately expressed in response to a range of environmental conditions, revealing an underlying modular organization of the phenotypic plasticity of the chemosensory receptor repertoire.
Determining the genetic architecture of complex traits is challenging because phenotypic variation arises from interactions between multiple, environmentally sensitive alleles. We quantified genome-wide transcript abundance and phenotypes for six ecologically relevant traits in D. melanogaster wild-derived inbred lines. We observed 10,096 genetically variable transcripts and high heritabilities for all organismal phenotypes. The transcriptome is highly genetically inter-correlated, forming 241 transcriptional modules. Modules are enriched for transcripts in common pathways, gene ontology categories, tissue-specific expression, and transcription factor binding sites. The high transcriptional connectivity allows us to infer genetic networks and the function of predicted genes based on annotations of other genes in the network. Regressions of organismal phenotypes on transcript abundance implicate several hundred candidate genes that form modules of biologically meaningful correlated transcripts affecting each phenotype. Overlapping transcripts in modules associated with different traits provide insight into the molecular basis of pleiotropy between complex traits.
Sleep disorders are common in humans, and sleep loss increases the risk of obesity and diabetes [1]. Studies in Drosophila [2, 3] have revealed molecular pathways [4–7] and neural tissues [8–10] regulating sleep; however, genes that maintain genetic variation for sleep in natural populations are unknown. Here, we characterized sleep in 40 wild-derived Drosophila lines and observed abundant genetic variation in sleep architecture. We associated sleep with genome-wide variation in gene expression [11] to identify candidate genes. We independently confirmed that molecular polymorphisms in Catecholamines up are associated with variation in sleep; and that P-element mutations in four candidate genes affect sleep and gene expression. Transcripts associated with sleep grouped into biologically plausible genetically correlated transcriptional modules. We confirmed co-regulated gene expression using P-element mutants. Genes associated with sleep duration are evolutionarily conserved. Quantitative genetic analysis of natural phenotypic variation is an efficient method for revealing candidate genes and pathways.
Glaucoma is the world's second leading cause of bilateral blindness, with progressive loss of vision due to retinal ganglion cell death. Myocilin has been associated with congenital glaucoma and 2–4% of primary open angle glaucoma (POAG) cases, but the pathogenic mechanisms remain largely unknown. Among several hypotheses, activation of the unfolded protein response (UPR) has emerged as a possible disease mechanism.
Methodology / Principal Findings
We used a transgenic Drosophila model to analyze whole-genome transcriptional profiles in flies that express human wild-type or mutant MYOC in their eyes. The transgenic flies display ocular fluid discharge, reflecting ocular hypertension, and a progressive decline in their behavioral responses to light. Transcriptional analysis shows altered transcriptional regulation of genes associated with the UPR, ubiquitination, and proteolysis, as well as with metabolism of reactive oxygen species and photoreceptor activity. Following up on these transcriptional analyses, we used immunoblots to demonstrate the formation of MYOC aggregates. We showed that the formation of such aggregates leads to induction of the UPR, as evident from activation of the fluorescent UPR marker, xbp1-EGFP.
Conclusions / Significance
Our results show that aggregation of MYOC in the endoplasmic reticulum activates the UPR, an evolutionarily conserved stress pathway that culminates in apoptosis. We infer from the Drosophila model that MYOC-associated ocular hypertension in the human eye may result from aggregation of MYOC and induction of the UPR in trabecular meshwork cells. This process could occur at a late age with wild-type MYOC, but might be accelerated by MYOC mutants, which would account for juvenile-onset glaucoma.
Successful reproduction is critical for passing genes to the next generation. Seminal proteins contribute to important reproductive processes that lead to fertilization in species ranging from insects to mammals. In Drosophila, the male's accessory gland is a source of seminal fluid proteins that affect the reproductive output of males and females by altering female post-mating behavior and physiology. Protein classes found in the seminal fluid of Drosophila are similar to those of other organisms, including mammals. By using RNA interference (RNAi) to knock down levels of individual accessory gland proteins (Acps), we investigated the role of 25 Acps in mediating three post-mating female responses: egg production, receptivity to remating, and storage of sperm. We detected roles for five Acps in these post-mating responses. CG33943 is required for full stimulation of egg production on the first day after mating. Four other Acps (CG1652, CG1656, CG17575, and CG9997) appear to modulate the long-term response, which is the maintenance of post-mating behavior and physiological changes. The long-term post-mating response requires the presence of sperm in storage and, until now, had been known to require only a single Acp. Here, we discovered several novel Acps that together are required for three effects in the mated female:

- sustained egg production;
- reduced receptivity to remating;
- promotion of stored sperm release from the seminal receptacle.

Our results also show that members of conserved protein classes found in seminal plasma from insects to mammals are essential for important reproductive processes.
In sexually reproducing organisms, sperm enter the female in combination with seminal proteins that are critical for fertility. These proteins can activate sperm or enhance sperm storage within the female, and can improve the chance that sperm will fertilize eggs. Understanding the action of seminal proteins has potential utility in insect pest control and in the diagnosis of certain human infertilities. However, the precise function of very few seminal proteins is known. To address this, we knocked down the levels of 25 seminal proteins individually in male fruit flies, and tested the males' abilities to modulate egg production, sperm storage/release, or behavior of their mates. We found five seminal proteins that are necessary to elevate offspring production in mated females. Four of these proteins are needed for efficient release of sperm from storage to fertilize eggs, a function that had not been previously assigned to any seminal protein. All four are in biochemical classes that are conserved in seminal fluid from insects to humans, suggesting they may play similar sperm-related roles in other animals. In addition to assigning functions to particular seminal proteins, our results suggest that fruit flies can serve as a model with which to dissect the functions of conserved protein classes in seminal fluid.
Aggressive behavior is important for animal survival and reproduction, and excessive aggression is an enormous social and economic burden for human society. Although the role of biogenic amines in modulating aggressive behavior is well characterized, other genetic mechanisms affecting this complex behavior remain elusive. Here, we developed an assay to rapidly quantify aggressive behavior in Drosophila melanogaster, and generated replicate selection lines with divergent levels of aggression. The realized heritability of aggressive behavior was approximately 0.10, and the phenotypic response to selection specifically affected aggression. We used whole-genome expression analysis to identify 1,539 probe sets with different expression levels between the selection lines when pooled across replicates, at a false discovery rate of 0.001. We quantified the aggressive behavior of 19 mutations in candidate genes that were generated in a common co-isogenic background, and identified 15 novel genes affecting aggressive behavior. Expression profiling of genetically divergent lines is an effective strategy for identifying genes affecting complex traits.
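Realized heritability is conventionally estimated from the breeder's equation, R = h²S. It is the slope of the cumulative response to selection (R) on the cumulative selection differential (S). The sketch below is minimal and uses hypothetical per-generation values rather than the study's data, with the regression forced through the origin. The function name is illustrative.

```python
import numpy as np


def realized_heritability(cum_selection_diff, cum_response) -> float:
    """Slope of cumulative response on cumulative selection differential,
    fitted through the origin (breeder's equation R = h^2 * S)."""
    s = np.asarray(cum_selection_diff, dtype=float)
    r = np.asarray(cum_response, dtype=float)
    return float(s @ r / (s @ s))   # least-squares slope with no intercept


# Hypothetical cumulative values over five generations of selection
cum_S = [1.8, 3.5, 5.4, 7.1, 8.9]
cum_R = [0.20, 0.30, 0.55, 0.70, 0.90]
print(f"realized h^2 = {realized_heritability(cum_S, cum_R):.3f}")
```

Under divergent selection, the same calculation is often applied to the divergence between the high and low lines, in both the response and the selection differential.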
Aggressive behavior is a complex trait affected by numerous interacting genes whose expression depends on the environment. Aggression can be selectively advantageous in the pursuit of mates, territory, or food; however, excessive aggression may be deleterious. Pathological levels of aggression in humans create an enormous burden to society. Although dysfunction of the biogenic amine systems is often associated with alterations in aggressive behavior, this represents only the “tip of the iceberg” of the complex genetic architecture of aggressive behavior. The fruit fly Drosophila melanogaster is an excellent model genetic system for exploring the genetic basis of aggressive behavior. The authors have developed a rapid assay to quantify Drosophila aggression. They used it to select genetically divergent replicate lines for increased and decreased aggression from a genetically heterogeneous base population. They used whole-genome expression profiling to identify variation in gene expression among these lines, and identified 1,539 transcripts that differed between the selection lines, illustrating the complex genomic basis of aggressive behavior. The authors evaluated aggressive behavior of flies with mutations in 19 genes that were implicated by the analysis of differential transcript abundance, and identified 15 novel candidate genes affecting this complex trait, eight of which have human orthologs.
Numbers of Drosophila sensory bristles present an ideal model system to elucidate the genetic basis of variation for quantitative traits. Here, we review recent evidence that the genetic architecture of variation for bristle numbers is surprisingly complex. A substantial fraction of the Drosophila genome affects bristle number, indicating pervasive pleiotropy of genes that affect quantitative traits. Further, a large number of loci, often with sex- and environment-specific effects that are also conditional on background genotype, affect natural variation in bristle number. Despite this complexity, an understanding of the molecular basis of natural variation in bristle number is emerging from linkage disequilibrium mapping studies of individual candidate genes that affect the development of sensory bristles. We show that there is naturally segregating genetic variance for environmental plasticity of abdominal and sternopleural bristle number. For abdominal bristle number this variance can be attributed in part to an abnormal abdomen-like phenotype that resembles the phenotype of mutants defective in catecholamine biosynthesis. Dopa decarboxylase (Ddc) encodes the enzyme that catalyses the final step in the synthesis of dopamine, a major Drosophila catecholamine and neurotransmitter. We found that molecular polymorphisms at Ddc are indeed associated with variation in environmental plasticity of abdominal bristle number.
P-element mutagenesis; quantitative trait loci mapping; linkage disequilibrium mapping; genetic variance of environmental plasticity
Epistasis is an important feature of the genetic architecture of quantitative traits, but the dynamics of epistatic interactions in natural populations and the relationship between epistasis and pleiotropy remain poorly understood. Here, we studied the effects of epistatic modifiers that segregate in a wild-derived Drosophila melanogaster population on the mutational effects of P-element insertions in Semaphorin-5C (Sema-5c) and Calreticulin (Crc). These pleiotropic genes affect olfactory behaviour and startle behaviour and, in the case of Crc, sleep phenotypes. We introduced Canton-S B (CSB) third chromosomes with or without a P-element insertion at the Crc or Sema-5c locus into multiple wild-derived inbred lines of the Drosophila melanogaster Genetic Reference Panel (DGRP). We then assessed the effects of epistasis on the olfactory response to benzaldehyde and, for Crc, also on sleep. In each case, we found substantial epistasis and significant variation in the magnitude of epistasis. The predominant direction of epistatic effects was to suppress the mutant phenotype. These observations support a previous study on startle behaviour using the same D. melanogaster chromosome substitution lines, which concluded that suppressing epistasis may buffer the effects of new mutations. However, epistatic effects are not correlated among the different phenotypes. Thus, suppressing epistasis appears to be a pervasive general feature of natural populations that protects against the effects of new mutations, but different epistatic interactions modulate different phenotypes affected by mutations at the same pleiotropic gene.
The relative importance of additive and non-additive genetic variance has been widely debated in quantitative genetics. By approaching this question from an evolutionary perspective we show that, while additive variance can be maintained under selection at a low level for some patterns of epistasis, the majority of the genetic variance that will persist is actually non-additive. We propose that one reason the problem of the “missing heritability” arises is that the additive genetic variation estimated to contribute to the variance of a trait will most likely be an artefact of the non-additive variance that can be maintained over evolutionary time. In addition, we show that even a small reduction in linkage disequilibrium between causal variants and observed SNPs rapidly erodes estimates of epistatic variance, leading to an inflation in the perceived importance of additive effects. We demonstrate that the perception of independent additive effects comprising the majority of the genetic architecture of complex traits is biased upwards. The search for causal variants in complex traits under selection is also potentially underpowered when parameterised for additive effects alone. Given dense SNP panels, the detection of causal variants through genome-wide association studies may be improved by searching for epistatic effects explicitly.
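In a chromosome-substitution design, epistasis can be expressed as the difference between a mutation's effect in a given background and its effect in the reference background. The sketch below makes a hypothetical assumption: each background contributes one mean phenotype for the control chromosome and one for the P-element chromosome. It calls a deviation suppressing when the mutational effect is smaller in magnitude than in the reference, which is a simplification. The class and function names and all numbers are illustrative, and the study's own analysis may have used a different statistical model.

```python
from dataclasses import dataclass


@dataclass
class Background:
    """Hypothetical summary of one wild-derived background."""
    name: str
    control: float  # mean phenotype with the control third chromosome
    mutant: float   # mean phenotype with the P-element-bearing third chromosome


def epistasis_summary(reference_effect: float,
                      backgrounds: list[Background]) -> dict[str, tuple[float, str]]:
    """For each background, return (epistatic deviation, direction).

    The mutational effect in a background is mutant - control; the epistatic
    deviation is that effect minus the effect in the reference background.
    """
    summary = {}
    for bg in backgrounds:
        effect = bg.mutant - bg.control
        deviation = effect - reference_effect
        if abs(effect) < abs(reference_effect):
            direction = "suppressing"   # mutant phenotype is weakened
        elif abs(effect) > abs(reference_effect):
            direction = "enhancing"     # mutant phenotype is strengthened
        else:
            direction = "none"
        summary[bg.name] = (deviation, direction)
    return summary


# Hypothetical reference effect and two backgrounds
lines = [Background("line_A", control=2.0, mutant=1.8),
         Background("line_B", control=1.5, mutant=0.7)]
for name, (dev, kind) in epistasis_summary(-0.5, lines).items():
    print(f"{name}: deviation = {dev:+.2f} ({kind})")
```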
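The point that non-additive gene action can appear as additive variance can be checked numerically. The sketch below assumes two unlinked biallelic loci in Hardy-Weinberg and linkage equilibrium. It computes the total genetic variance (V_G) and the additive variance (V_A), defining V_A as the variance of the best probability-weighted linear predictor of genotypic value from allele counts. The example uses a purely additive-by-additive (multiplicative) table of genotypic values. It illustrates the variance decomposition only and does not model selection or linkage disequilibrium. The function name is hypothetical.

```python
import numpy as np


def variance_components(genotypic_values: np.ndarray, p1: float, p2: float) -> tuple[float, float]:
    """Return (V_G, V_A) for two unlinked biallelic loci in Hardy-Weinberg and
    linkage equilibrium. genotypic_values[i, j] is the value of an individual
    carrying i copies of the focal allele at locus 1 and j copies at locus 2."""
    def hwe(p):
        return np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

    weights = np.outer(hwe(p1), hwe(p2)).ravel()      # joint genotype frequencies
    x1, x2 = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
    X = np.column_stack([np.ones(9), x1.ravel(), x2.ravel()])
    g = genotypic_values.ravel()

    # Weighted least squares: best linear predictor from allele counts
    W = np.diag(weights)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ g)
    fitted = X @ beta

    v_g = weights @ (g - weights @ g) ** 2
    v_a = weights @ (fitted - weights @ fitted) ** 2
    return float(v_g), float(v_a)


# Purely additive-by-additive epistasis: value = (x1 - 1) * (x2 - 1)
counts = np.arange(3)
aa = np.outer(counts - 1, counts - 1).astype(float)

for p in (0.5, 0.2):
    v_g, v_a = variance_components(aa, p, p)
    print(f"p = {p}: V_G = {v_g:.4f}, V_A = {v_a:.4f}, non-additive = {v_g - v_a:.4f}")
```

With this table, all of the genetic variance is non-additive (V_A = 0) at allele frequency 0.5. At 0.2, about 69% of it (0.2304 of 0.3328) is captured by the additive predictor, even though neither locus acts additively.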
In this study we have shown that two independent problems may have a common cause. Why do traits under selection exhibit additive genetic variance, and why is the proportion of the heritability explained by additive effects much smaller than the total heritability estimated to exist? Our results indicate that epistatic interactions can allow deleterious mutations to persist under selection and that these interactions can abate the depletion of additive genetic variation. Furthermore, a much larger element of non-additive genetic variance is maintained, which supports the notion that the heritability estimated from family studies could be a mixture of both additive and non-additive components. We show that searching directly for epistatic effects greatly improves the discovery of variants under selection, despite the multiple testing penalty being much larger. Finally, we demonstrate that common practices in genome-wide association studies could lead to both an ascertainment bias in detecting additive effects and a confirmation bias in perceiving that most of the genetic variance is additive.
Understanding the relationship between genetic and phenotypic variation is one of the great outstanding challenges in biology. To meet this challenge, comprehensive genomic variation maps of human as well as of model organism populations are required. Here, we present a nucleotide resolution catalog of single-nucleotide, multi-nucleotide, and structural variants in 39 Drosophila melanogaster Genetic Reference Panel inbred lines. Using an integrative, local assembly-based approach for variant discovery, we identify more than 3.6 million distinct variants, among which were more than 800,000 unique insertions, deletions (indels), and complex variants (1 to 6,000 bp). While the SNP density is higher near other variants, we find that variants themselves are not mutagenic, nor are regions with high variant density particularly mutation-prone. Rather, our data suggest that the elevated SNP density around variants is mainly due to population-level processes. We also provide insights into the regulatory architecture of gene expression variation in adult flies by mapping cis-expression quantitative trait loci (cis-eQTLs) for more than 2,000 genes. Indels comprise around 10% of all cis-eQTLs and show larger effects than SNP cis-eQTLs. In addition, we identified two-fold more gene associations in males as compared to females and found that most cis-eQTLs are sex-specific, revealing a partial decoupling of the genomic architecture between the sexes as well as the importance of genetic factors in mediating sex-biased gene expression. Finally, we performed RNA-seq-based allelic expression imbalance analyses in the offspring of crosses between sequenced lines, which revealed that the majority of strong cis-eQTLs can be validated in heterozygous individuals.
One of the principal challenges in current biology is to understand the relationship between genetic and phenotypic variation. The increasing availability of genomic variation maps of human as well as of model organism populations (mouse and Arabidopsis) constitutes an important step towards meeting this challenge. However, despite its excellent track record as a premier model to understand genome function, no genome-wide variation data beyond single-nucleotide variants and microsatellites are currently available for D. melanogaster. Here, we present a comprehensive, nucleotide-resolution catalogue of variants of various types (single-nucleotide, multi-nucleotide, and structural variants) for 39 wild-derived inbred D. melanogaster lines based on high-throughput sequencing. This catalogue confirms that non-SNP variants account for more than half of genomic variation, allowing us to provide new insights into the non-random distribution of variants in the Drosophila genome. We further present genome-wide cis-associations with gene expression based on whole adult fly microarray data, revealing significant associations for about 2,000 genes. Most associations are sex-specific, providing evidence for a decoupling of the genomic, regulatory architecture between males and females.
Reactive oxygen species (ROS) are a common byproduct of mitochondrial energy metabolism, and can also be induced by exogenous sources, including UV light, radiation, and environmental toxins. ROS generation is essential for maintaining homeostasis by triggering cellular signaling pathways and host defense mechanisms. However, an imbalance of ROS induces oxidative stress and cellular death and is associated with human disease, including age-related locomotor impairment. To identify genes affecting sensitivity and resistance to ROS-induced locomotor decline, we assessed locomotion of aged flies of the sequenced, wild-derived lines from the Drosophila melanogaster Genetic Reference Panel on standard medium and following chronic exposure to medium supplemented with 3 mM menadione sodium bisulfite (MSB). We found substantial genetic variation in sensitivity to oxidative stress with respect to locomotor phenotypes. We performed genome-wide association analyses to identify candidate genes associated with variation in sensitivity to ROS-induced decline in locomotor performance, and confirmed the effects for 13 of 16 mutations tested in these candidate genes. Candidate genes associated with variation in sensitivity to MSB-induced oxidative stress form networks of genes involved in neural development, immunity, and signal transduction. Many of these genes have human orthologs, highlighting the utility of genome-wide association in Drosophila for studying complex human disease.
Predicting organismal phenotypes from genotype data is important for plant and animal breeding, medicine, and evolutionary biology. Genomic-based phenotype prediction has been applied to data from single-nucleotide polymorphism (SNP) genotyping platforms, but not to complete genome sequences. Here, we report genomic prediction for starvation stress resistance and startle response in Drosophila melanogaster, using ∼2.5 million SNPs determined by sequencing the Drosophila Genetic Reference Panel population of inbred lines. We constructed a genomic relationship matrix from the SNP data and used it in a genomic best linear unbiased prediction (GBLUP) model. We assessed predictive ability by cross-validation as the correlation between predicted genetic values and observed phenotypes, and found a predictive ability of 0.239 ± 0.008 for starvation resistance and 0.230 ± 0.012 for startle response. The predictive ability of BayesB, a Bayesian method with internal SNP selection, was not greater than that of GBLUP. Selecting the 5% of SNPs with either the highest absolute effect or the highest variance explained did not improve predictive ability. Predictive ability decreased only when fewer than 150,000 SNPs were used to construct the genomic relationship matrix. We hypothesize that predictive power in this population stems from the SNP-based modeling of the subtle relationship structure caused by long-range linkage disequilibrium and not from population structure or SNPs in linkage disequilibrium with causal variants. We discuss the implications of these results for genomic prediction in other organisms.
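The GBLUP workflow described above (relationship matrix, BLUP of genetic values, cross-validated correlation) can be sketched in a few lines of NumPy. The sketch rests on several assumptions. Genotypes are coded as 0/1/2 allele counts, so inbred lines carry only 0 or 2. The relationship matrix follows VanRaden's first method. The variance ratio λ = σ²_e/σ²_g is fixed rather than estimated by REML. The data are simulated toy values, not DGRP phenotypes. This is not the authors' pipeline.

```python
import numpy as np


def vanraden_grm(X: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from allele counts coded 0/1/2.

    X has shape (n_lines, n_snps). Assumes VanRaden's first method:
    G = Z Z' / (2 * sum p(1 - p)), with Z the frequency-centred genotypes.
    For fully inbred lines the diagonal averages 2 (that is, 1 + F with F = 1).
    """
    p = X.mean(axis=0) / 2.0                   # allele frequency per SNP
    Z = X - 2.0 * p                            # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, lam):
    """BLUP of genetic values for `test` lines given phenotypes of `train` lines.

    lam = sigma_e^2 / sigma_g^2 is taken as known; in practice it would be
    estimated (for example by REML). The mean is the training average for simplicity.
    """
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    G_vt = G[np.ix_(test, train)]
    weights = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return G_vt @ weights


def cv_predictive_ability(G, y, lam=1.0, k=5, seed=0):
    """Mean Pearson correlation between predicted genetic values and observed
    phenotypes across k random cross-validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cors = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        g_hat = gblup_predict(G, y, train, test, lam)
        cors.append(np.corrcoef(g_hat, y[test])[0, 1])
    return float(np.mean(cors))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, m = 200, 2000                           # toy sizes, not the DGRP's
    X = 2.0 * rng.integers(0, 2, size=(n, m))  # inbred lines: homozygous 0 or 2
    beta = rng.normal(0.0, 0.05, size=m)       # many small simulated effects
    y = X @ beta + rng.normal(0.0, 1.0, size=n)
    G = vanraden_grm(X)
    print(f"predictive ability: {cv_predictive_ability(G, y):.3f}")
```

Subsampling the columns of `X` before calling `vanraden_grm` is one way to explore how predictive ability depends on SNP density, the question the abstract addresses with its 150,000-SNP result.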
The ability to accurately predict values of complex phenotypes from genotype data will revolutionize plant and animal breeding, personalized medicine, and evolutionary biology. To date, genomic prediction has utilized high-density single-nucleotide polymorphism (SNP) genotyping arrays, but the availability of sequence data opens new frontiers for genomic prediction methods. This article is the first application of genomic phenotype prediction using whole-genome sequence data in a substantial sample of a higher eukaryote. We use ∼2.5 million SNPs with minor allele frequency greater than 2.5% derived from genomic sequences of the “Drosophila Genetic Reference Panel” to predict phenotypes for two traits, starvation resistance and startle-induced locomotor behavior. We systematically address prediction within versus across sexes, genomic best linear unbiased prediction (GBLUP) versus a Bayesian approach, and the effect of SNP density. We find that (i) genomic prediction can be efficiently implemented using sequence data via GBLUP, (ii) there is little gain in predictive ability if the number of SNPs is increased above 150,000, and (iii) neither implicit nor explicit marker selection substantially improves the predictive ability. Although the findings must be seen against the background of small sample sizes, the results illustrate both the potential of the approach and the challenges ahead.
An epistatic interaction between two genes occurs when the phenotypic impact of one gene depends on another gene, often exposing a functional association between them. Because of experimental scalability and evolutionary significance, much work has focused on studying how epistasis affects cellular growth rate, most notably in yeast. However, epistasis likely influences many different phenotypes, affecting our capacity to understand cellular functions, biochemical network adaptation, and genetic diseases. Despite its broad significance, the extent and nature of epistasis relative to different phenotypes remain fundamentally unexplored. Here we use genome-scale metabolic network modeling to investigate the extent and properties of epistatic interactions relative to multiple phenotypes. Specifically, using an experimentally refined stoichiometric model for Saccharomyces cerevisiae, we computed a three-dimensional matrix of epistatic interactions between any two enzyme gene deletions, with respect to all metabolic flux phenotypes. We found that the total number of epistatic interactions between enzymes increases rapidly as phenotypes are added, plateauing at approximately 80 phenotypes, to an overall connectivity that is roughly 8-fold larger than the one observed relative to growth alone. Looking at interactions across all phenotypes, we found that gene pairs interact incoherently relative to different phenotypes, i.e. antagonistically relative to some phenotypes and synergistically relative to others. Specific deletion-deletion-phenotype triplets can be explained metabolically, suggesting a highly informative role of multi-phenotype epistasis in mapping cellular functions. Finally, we found that genes involved in many interactions across multiple phenotypes are more highly expressed, evolve slower, and tend to be associated with diseases, indicating that the importance of genes is hidden in their total phenotypic impact.
Our predictions indicate a pervasiveness of nonlinear effects in how genetic perturbations affect multiple metabolic phenotypes. The approaches and results reported could influence future efforts in understanding metabolic diseases and the role of biochemical regulation in the cell.
An epistatic interaction between two genes occurs when the phenotypic impact of one gene is dependent on the other. While different phenotypes have been used to uncover epistasis in different contexts, little is known about how cell-scale genetic interaction networks vary across multiple phenotypes. Here we use a genome-scale mathematical model of yeast metabolism to compute a three-dimensional matrix of interactions between any two gene deletions with respect to all metabolic flux phenotypes. We find that this multi-phenotype epistasis map contains many more interactions than found relative to any single phenotype. The unique contribution of examining multiple phenotypes is further demonstrated by the fact that individual interactions may be synergistic relative to some phenotypes and antagonistic relative to others. This observation indicates that different phenotypes are indeed capturing different aspects of the functional relationships between genes. Furthermore, the observation that genes involved in many epistatic interactions across all metabolic flux phenotypes are found to be highly expressed and under strong selective pressure seems to indicate that these interactions are important to the cell and are not just the unavoidable consequence of the connectivity of biological networks. Multi-phenotype epistasis maps may help elucidate the functional organization of biological systems and the role of epistasis in the manifestation of complex genetic diseases.
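The multi-phenotype epistasis matrix can be illustrated with a common multiplicative definition, ε = P_ab - P_a·P_b, where each P is a deletion phenotype scaled so that wild type equals 1. The sketch applies this definition independently to each phenotype. It then checks whether a pair is "incoherent" and counts how many pairs interact for one phenotype versus for any phenotype. The sign convention, the tolerance, and all function names are assumptions for illustration. The study's own scoring scheme may normalize ε differently.

```python
import numpy as np


def epistasis_profile(p_a, p_b, p_ab, tol=1e-3):
    """Multiplicative epistasis for one gene pair across several phenotypes.

    p_a, p_b, p_ab: phenotype values of the two single deletions and the double
    deletion, each scaled so that wild type = 1 (one entry per phenotype).
    Assumed sign convention (deletions lower the phenotype): eps < 0 is labelled
    synergistic (worse than expected), eps > 0 antagonistic (buffered).
    """
    p_a, p_b, p_ab = (np.asarray(v, dtype=float) for v in (p_a, p_b, p_ab))
    eps = p_ab - p_a * p_b                     # deviation from independence
    labels = np.where(eps > tol, "antagonistic",
                      np.where(eps < -tol, "synergistic", "none"))
    return eps, labels


def is_incoherent(labels):
    """True if a pair is synergistic for some phenotypes and antagonistic for others."""
    return {"synergistic", "antagonistic"} <= set(labels.tolist())


def count_interacting_pairs(eps3d, tol=1e-3):
    """eps3d[i, j, k] holds epsilon for genes i and j on phenotype k.

    Returns (pairs interacting for phenotype 0 alone, pairs interacting for any
    phenotype), counting each unordered pair once.
    """
    hit = np.abs(eps3d) > tol
    iu = np.triu_indices(eps3d.shape[0], k=1)  # upper triangle: i < j
    per_pair = hit[iu]                         # shape (n_pairs, n_phenotypes)
    return int(per_pair[:, 0].sum()), int(per_pair.any(axis=1).sum())


if __name__ == "__main__":
    # Hypothetical relative values for one enzyme pair on three flux phenotypes.
    eps, labels = epistasis_profile([0.8, 0.9, 0.5], [0.7, 0.9, 0.6], [0.40, 0.90, 0.30])
    print(labels, is_incoherent(labels))
```

Treating phenotype 0 as growth, the gap between the two numbers returned by `count_interacting_pairs` shows, in miniature, how a multi-phenotype map can contain many more interactions than a growth-only map.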
Genome-wide association studies (GWAS) have demonstrated the ability to identify the strongest causal common variants in complex human diseases. However, to date, the massive data generated from GWAS have not been maximally explored to identify true associations that fail to meet the stringent level of association required to achieve genome-wide significance. Genetics of gene expression (GGE) studies have shown promise towards identifying DNA variations associated with disease and providing a path to functionally characterize findings from GWAS. Here, we present the first empirical study to systematically characterize the set of single nucleotide polymorphisms associated with expression (eSNPs) in liver, subcutaneous fat, and omental fat tissues, demonstrating that these eSNPs are significantly more enriched for SNPs associated with type 2 diabetes (T2D) in three large-scale GWAS than a matched set of randomly selected SNPs. This enrichment for T2D association increases as we restrict the analysis to eSNPs that correspond to genes comprising gene networks constructed from adipose gene expression data isolated from a mouse population segregating a T2D phenotype. Finally, by restricting to eSNPs corresponding to genes comprising an adipose subnetwork strongly predicted as causal for T2D, we dramatically increased the enrichment for SNPs associated with T2D and were able to identify a functionally related set of diabetes susceptibility genes. We identified and validated malic enzyme 1 (Me1) as a key regulator of this T2D subnetwork in mouse and provided support for the association of this gene with T2D in humans. This integration of eSNPs and networks provides a novel approach to identify disease susceptibility networks rather than the single SNPs or genes traditionally identified through GWAS, thereby extracting additional value from the wealth of data currently being generated by GWAS.
Genome-wide association studies (GWAS) seek to identify loci in which changes in DNA are correlated with disease. However, GWAS do not necessarily lead directly to genes associated with disease, and they do not typically inform the broader context in which disease genes operate, thereby providing limited insights into the mechanisms driving disease. One critical step toward extracting further insight from GWAS is developing an understanding of the genetics of gene expression (GGE). We present the first empirical study demonstrating that SNPs in human cohorts that associate with gene expression in liver and adipose tissues are enriched for association with type 2 diabetes (T2D) in humans. By filtering “eSNPs” based on causal gene networks defined in an experimental cross population segregating T2D traits, we demonstrate a dramatically increased enrichment of T2D SNPs that enhances our ability to assess T2D risk. We demonstrate the utility of this approach by identifying malic enzyme 1 (ME1) as a novel T2D susceptibility gene in humans and then functionally validating the causal connection between ME1 and T2D in a mouse knockout model for Me1. This approach provides a path to identifying disease susceptibility networks rather than the single SNPs or genes traditionally identified through GWAS.
Flowering time is a key life-history trait in the plant life cycle. Most studies to unravel the genetics of flowering time in Arabidopsis thaliana have been performed under greenhouse conditions. Here, we describe a study of the genetics of flowering time that differs from previous studies in two important ways: first, we measure flowering time in a more complex and ecologically realistic environment; and, second, we combine the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping. Our experiments involved phenotyping nearly 20,000 plants over 2 winters under field conditions, including 184 worldwide natural accessions genotyped for 216,509 SNPs and 4,366 RILs derived from 13 independent crosses chosen to maximize genetic and phenotypic diversity. Flowering time variation in our field experiment, scored using a photothermal time model, was poorly correlated with the flowering time variation previously obtained under greenhouse conditions, reinforcing previous demonstrations of the importance of genotype by environment interactions in A. thaliana and the need to study adaptive variation under natural conditions. The use of 4,366 RILs provides great power for dissecting the genetic architecture of flowering time in A. thaliana under our specific field conditions. We describe more than 60 additive QTLs, all with relatively small to medium effects and organized in 5 major clusters. We show that QTL mapping increases our power to distinguish true from false associations in GWA mapping. QTL mapping also permits the identification of false negatives, that is, causative SNPs that are lost when applying GWA methods that control for population structure. Major genes underpinning flowering time in the greenhouse were not associated with flowering time in this study. Instead, we found a prevalence of genes involved in the regulation of the plant circadian clock. Furthermore, we identified new genomic regions lacking obvious candidate genes.
Dissecting the genetic bases of adaptive traits is of primary importance in evolutionary biology. In this study, we combined a genome-wide association (GWA) study with traditional linkage mapping in order to detect the genetic bases underlying natural variation in flowering time in ecologically realistic conditions in the plant Arabidopsis thaliana. Our study involved phenotyping nearly 20,000 plants over 2 winters under field conditions in a temperate climate. We show that combined linkage and association mapping clearly outperforms each method alone when it comes to identifying true associations. This highlights the utility of combining different methods to localize genes involved in complex trait natural variation. Most candidate genes found in this study are involved in the regulation of the plant circadian clock and, surprisingly, were not associated with flowering time scored under greenhouse conditions. While rapid advances have been made in high-throughput genotyping and sequencing, high-throughput phenotyping of complex traits under natural conditions will be the next challenge for dissecting the genetic bases of adaptive variation in “laboratory” model organisms.
Genome-wide association studies have identified hundreds of genetic variants associated with complex human diseases and traits, and have provided valuable insights into their genetic architecture. Most variants identified so far confer relatively small increments in risk, and explain only a small proportion of familial clustering, leading many to question how the remaining, ‘missing’ heritability can be explained. Here we examine potential sources of missing heritability and propose research strategies, including and extending beyond current genome-wide association approaches, to illuminate the genetics of complex diseases and enhance its potential to enable effective disease prevention or treatment.
DNA sequence polymorphism in a regulatory protein can have a widespread transcriptional effect. Here we present a computational approach for analyzing modules of genes with a common regulation that are affected by specific DNA polymorphisms. We identify such regulatory-linkage modules by integrating genotypic and expression data for individuals in a segregating population with complementary expression data of strains mutated in a variety of regulatory proteins. Our procedure searches simultaneously for groups of co-expressed genes, for their common underlying linkage interval, and for their shared regulatory proteins. We applied the method to a cross between laboratory and wild strains of S. cerevisiae, demonstrating its ability to correctly suggest modules and to outperform extant approaches. Our results suggest that middle sporulation genes are under the control of polymorphism in the sporulation-specific tertiary complex Sum1p/Rfm1p/Hst1p. In another example, our analysis reveals novel inter-relations between Swi3 and two mitochondrial inner membrane proteins underlying variation in a module of aerobic cellular respiration genes. Overall, our findings demonstrate that this approach provides a useful framework for the systematic mapping of quantitative trait loci and their role in gene expression variation.
High-throughput genotypic and expression data for individuals in a segregating population can provide important information regarding causal regulatory events. However, it has proven difficult to predict these regulatory relations, largely because of statistical power limitations. The use of additional available resources may increase the accuracy of predictions and suggest possible mechanisms through which the target genes are regulated. In this study, we combine genotypic and expression data across the segregating population with complementary regulatory information to identify modules of genes that are jointly affected by changes in activity of regulatory proteins, as well as by genotypic changes. We develop a novel approach called ReL analysis, which automatically learns such modules. A unique feature of our approach is that all three components of the module—the genes, the underlying polymorphism, and the regulatory proteins—are predicted simultaneously. The integrated analysis makes it possible to capture weaker linkage signals and suggests possible mechanisms underlying expression changes. We demonstrate the power of the method on data from yeast segregants, by identifying the roles of new as well as known polymorphisms.
The relative proportion of additive and non-additive variation for complex traits is important in evolutionary biology, medicine, and agriculture. We address a long-standing controversy and paradox about the contribution of non-additive genetic variation, namely that knowledge about biological pathways and gene networks implies that epistasis is important. Yet empirical data across a range of traits and species imply that most genetic variance is additive. We evaluate the evidence from empirical studies of genetic variance components and find that additive variance typically accounts for over half, and often close to 100%, of the total genetic variance. We present new theoretical results, based upon the distribution of allele frequencies under neutral and other population genetic models, that show why this is the case even if there are non-additive effects at the level of gene action. We conclude that interactions at the level of genes are not likely to generate much interaction at the level of variance.
Genetic variation in quantitative or complex traits can be partitioned into many components due to additive, dominance, and interaction effects of genes. The most important is the additive genetic variance because it determines most of the correlation of relatives and the opportunities for genetic change by natural or artificial selection. From reviews of the literature and presentation of a summary analysis of human twin data, we show that a high proportion, typically over half, of the total genetic variance is additive. This is surprising as there are many potential interactions of gene effects within and between loci, some revealed in recent QTL analyses. We demonstrate that under the standard model of neutral mutation, which leads to a U-shaped distribution of gene frequencies with most near 0 or 1, a high proportion of additive variance would be expected regardless of the amount of dominance or epistasis at the individual loci. We also show that the model is compatible with observations in populations undergoing selection and results of QTL analyses on F2 populations.
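The frequency argument can be checked numerically for the simplest case, a single biallelic locus with dominance and no epistasis. The sketch uses the textbook single-locus results α = a + d(q - p), V_A = 2pqα², and V_D = (2pqd)², where genotypic values are AA = a, Aa = d, aa = -a. It also assumes a symmetric U-shaped density f(p) ∝ 1/(p(1 - p)) as a stand-in for the neutral allele-frequency distribution. With complete dominance (d = a), dominance accounts for a third of the genetic variance at p = 0.5. Averaged over the assumed U-shaped density, however, the additive share works out to 0.8.

```python
import numpy as np


def single_locus_components(p, a, d):
    """Additive and dominance variance at one biallelic locus.

    Genotypic values AA = a, Aa = d, aa = -a; p is the frequency of A.
    alpha = a + d(q - p), V_A = 2pq * alpha^2, V_D = (2pqd)^2.
    """
    q = 1.0 - p
    alpha = a + d * (q - p)                 # average effect of allele substitution
    v_a = 2.0 * p * q * alpha ** 2
    v_d = (2.0 * p * q * d) ** 2
    return v_a, v_d


def additive_fraction_u_shaped(a, d, n_grid=100_000):
    """V_A / (V_A + V_D) averaged over an assumed U-shaped density
    f(p) proportional to 1 / (p(1 - p)), via a midpoint grid on (0, 1)."""
    p = (np.arange(n_grid) + 0.5) / n_grid  # midpoints avoid p = 0 and p = 1
    w = 1.0 / (p * (1.0 - p))               # unnormalised density; constant cancels
    v_a, v_d = single_locus_components(p, a, d)
    return float(np.sum(w * v_a) / np.sum(w * (v_a + v_d)))


if __name__ == "__main__":
    # Complete dominance (d = a) at an intermediate frequency:
    v_a, v_d = single_locus_components(0.5, 1.0, 1.0)
    print(f"p = 0.5: additive share {v_a / (v_a + v_d):.3f}")   # 0.667
    # The same locus averaged over the U-shaped frequency density:
    print(f"U-shaped: additive share {additive_fraction_u_shaped(1.0, 1.0):.3f}")  # 0.800
```

Most of the weight sits at extreme frequencies, where the average effect α absorbs much of the dominance deviation. This is why V_A dominates the total even though the locus itself is completely dominant.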
The genetic basis of odorant-specific variations in human olfactory thresholds, and in particular of enhanced odorant sensitivity (hyperosmia), remains largely unknown. Olfactory receptor (OR) segregating pseudogenes, displaying both functional and nonfunctional alleles in humans, are excellent candidates to underlie these differences in olfactory sensitivity. To explore this hypothesis, we examined the association between olfactory detection threshold phenotypes of four odorants and segregating pseudogene genotypes of 43 ORs genome-wide. A strong association signal was observed between the single nucleotide polymorphism variants in OR11H7P and sensitivity to the odorant isovaleric acid. This association was largely due to the low frequency of the homozygous pseudogenized genotype in individuals with specific hyperosmia to this odorant, implying a possible functional role of OR11H7P in isovaleric acid detection. This predicted receptor–ligand functional relationship was further verified using the Xenopus oocyte expression system, whereby the intact allele of OR11H7P exhibited a response to isovaleric acid. Notably, we also uncovered another mechanism affecting general olfactory acuity that manifested as a significant inter-odorant threshold concordance, resulting in an overrepresentation of individuals who were hyperosmic to several odorants. An involvement of polymorphisms in other downstream transduction genes is one possible explanation for this observation. Thus, human hyperosmia to isovaleric acid is a complex trait, contributed to by both receptor and other mechanisms in the olfactory signaling pathway.
Humans can accurately discern thousands of odors, yet there is considerable inter-individual variation in the ability to detect different odors, with individuals exhibiting low sensitivity (hyposmia), high sensitivity (hyperosmia), or even “blindness” (anosmia) to particular odors. Such differences are thought to stem from genetic differences in olfactory receptor (OR) genes, which encode proteins that initiate olfactory signaling. OR segregating pseudogenes, which have both functional and inactive alleles in the population, are excellent candidates for producing this olfactory phenotype diversity. Here, we provide evidence that a particular segregating OR gene is related to sensitivity to a sweaty odorant, isovaleric acid. We show that hypersensitivity towards this odorant is seen predominantly in individuals who carry at least one copy of the intact allele. Furthermore, we demonstrate that this hyperosmia is a complex trait, being driven by additional factors affecting general olfactory acuity. Our results highlight a functional role of segregating pseudogenes in human olfactory variability, and constitute a step towards deciphering the genetic basis of human olfactory variability.
Genetic epidemiology analysis reveals a multifaceted mechanism underlying enhanced olfactory sensitivity to the sweaty odor of isovaleric acid in humans.
Deficits in prepulse inhibition (PPI) are a biological marker for schizophrenia. To unravel the mechanisms that control PPI, we performed quantitative trait loci (QTL) analysis on 1,010 F2 mice derived by crossing C57BL/6 (B6) animals that show high PPI with C3H/He (C3) animals that show low PPI. We detected six major loci for PPI, six for the acoustic startle response, and four for latency to response peak, some of which were sex-dependent. A promising candidate on the Chromosome 10-QTL was Fabp7 (fatty acid binding protein 7, brain), a gene with functional links to the N-methyl-D-aspartic acid (NMDA) receptor and expression in astrocytes. Fabp7-deficient mice showed decreased PPI and a shortened startle response latency, typical of the QTL's proposed effects. A quantitative complementation test supported Fabp7 as a potential PPI-QTL gene, particularly in male mice. Disruption of Fabp7 attenuated neurogenesis in vivo. Human FABP7 showed altered expression in schizophrenic brains and genetic association with schizophrenia, which were both evident in males when samples were divided by sex. These results suggest that FABP7 plays a novel and crucial role, linking the NMDA, neurodevelopmental, and glial theories of schizophrenia pathology and the PPI endophenotype, with larger or overt effects in males. We also discuss the results from the perspective of fetal programming.
A startle response to an unexpected, strong startling stimulus can be suppressed by an immediately preceding low-intensity stimulus, thereby eliciting little behavioral response. This phenomenon, called prepulse inhibition (PPI), has been observed in all mammals tested and is thought to reflect sensory-motor gating functions in organisms. PPI is diminished in human schizophrenia, raising the possibility that PPI might serve as a potential biological marker for the disease. Once the genes regulating PPI in lower animals are identified, it is expected that the human orthologs will be strong candidate genes for schizophrenia. In this study, we first performed a genetic dissection of mouse PPI using quantitative trait loci analysis, which detects chromosomal regions harboring causative genes. Further analyses, including those of knockout mice, allowed us to identify one potential causative gene, Fabp7 (fatty acid binding protein 7, brain), a chaperone for the essential fatty acid docosahexaenoic acid. Human studies showed that the FABP7 gene is modestly associated with schizophrenia and that transcript expression levels are up-regulated in schizophrenic brains. From these results, we propose that a FABP7 protein-mediated disturbance of essential lipid metabolism in developing brains may be one risk factor in the development of schizophrenia, with a greater effect in males.
The search for responsible genes for prepulse inhibition, a measure deemed to be a biological trait in schizophrenia, has exposed a gene encoding essential fatty acid-binding protein.