Sexual antagonism, whereby mutations are favourable in one sex and disfavourable in the other, is common in natural populations, yet the root causes of sexual antagonism are rarely considered in evolutionary theories of adaptation. Here, we explore the evolutionary consequences of sex-differential selection and genotype-by-sex interactions for adaptation in species with separate sexes. We show that sexual antagonism emerges naturally from sex differences in the direction of selection on phenotypes expressed by both sexes, or from genotype-by-sex interactions affecting the expression of such phenotypes. Moreover, even modest sex differences in selection or modest genotype-by-sex effects profoundly influence the long-term evolutionary trajectories of populations with separate sexes, because these conditions trigger the evolution of strong sexual antagonism as a by-product of adaptively driven evolutionary change. The theory demonstrates that sexual antagonism is an inescapable by-product of adaptation in species with separate sexes, whether or not selection favours evolutionary divergence between males and females.
intralocus sexual conflict; Fisher's geometric model; sexual dimorphism; pleiotropy; evolutionary constraint; adaptation
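The geometric argument can be made concrete with a minimal Python sketch of Fisher's geometric model in which both sexes share one phenotype but have different optima. It assumes Gaussian fitness around each optimum (log fitness equals minus half the squared distance, unit width), and the function names and two-trait example are hypothetical illustrations, not the model analysed in the paper. A mutation shifts the phenotype by the same vector in both sexes, so any difference in the sign of its effects comes entirely from the optima pointing in different directions.

```python
from typing import Sequence


def log_fitness(phenotype: Sequence[float], optimum: Sequence[float], width: float = 1.0) -> float:
    """Gaussian fitness on the log scale: 0 at the optimum, falling with squared distance."""
    d2 = sum((z - o) ** 2 for z, o in zip(phenotype, optimum))
    return -d2 / (2 * width ** 2)


def sex_specific_effects(phenotype, mutation, opt_female, opt_male, width=1.0):
    """Return the log-fitness change (s_f, s_m) caused by a mutation in each sex.

    The mutation shifts the shared phenotype by the same vector in both sexes,
    i.e. this toy version has no genotype-by-sex interaction.
    """
    mutant = [z + m for z, m in zip(phenotype, mutation)]
    s_f = log_fitness(mutant, opt_female, width) - log_fitness(phenotype, opt_female, width)
    s_m = log_fitness(mutant, opt_male, width) - log_fitness(phenotype, opt_male, width)
    return s_f, s_m


def is_sexually_antagonistic(s_f: float, s_m: float) -> bool:
    """A mutation is sexually antagonistic if it helps one sex and harms the other."""
    return s_f * s_m < 0


# Hypothetical example: the population sits at the origin; the two optima agree
# on trait 1 but point in opposite directions along trait 2.
resident = [0.0, 0.0]
female_opt, male_opt = [1.0, 1.0], [1.0, -1.0]
for mut in ([0.5, 0.0], [0.0, 0.5]):
    s_f, s_m = sex_specific_effects(resident, mut, female_opt, male_opt)
    print(mut, round(s_f, 3), round(s_m, 3), is_sexually_antagonistic(s_f, s_m))
```

In this example the first mutation moves toward the shared trait-1 optimum and benefits both sexes equally (0.375 each), while the second moves toward the female optimum and away from the male one (0.375 versus -0.625), so it is sexually antagonistic.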
Cactophilic Drosophila species provide a valuable model for studying gene–environment interactions and ecological adaptation. Drosophila buzzatii and Drosophila mojavensis are two cactophilic species that belong to the repleta group but have very different geographical distributions and primary host plants. To investigate the genomic basis of ecological adaptation, we sequenced the genome and developmental transcriptome of D. buzzatii and compared its gene content with that of D. mojavensis and two noncactophilic Drosophila species in the same subgenus. The newly sequenced D. buzzatii genome (161.5 Mb) comprises 826 scaffolds (>3 kb) and contains 13,657 annotated protein-coding genes. Using RNA sequencing data from five life stages, we found expression of 15,026 genes, of which 80% are protein-coding genes and 20% are noncoding RNA genes. In total, we detected 1,294 genes putatively under positive selection. Interestingly, among genes under positive selection in the D. mojavensis lineage, there is an excess of genes involved in the metabolism of heterocyclic compounds, which are abundant in Stenocereus cacti and toxic to nonresident Drosophila species. We found 117 orphan genes in the shared D. buzzatii–D. mojavensis lineage. In addition, gene duplication analysis identified lineage-specific expanded families with functional annotations associated with proteolysis, zinc ion binding, chitin binding, sensory perception, ethanol tolerance, immunity, physiology, and reproduction. In summary, we identified genetic signatures of adaptation in the shared D. buzzatii–D. mojavensis lineage and in the two separate D. buzzatii and D. mojavensis lineages. Many of the novel lineage-specific genomic features are promising candidates for explaining the adaptation of these species to their distinct ecological niches.
cactophilic Drosophila; genome sequence; ecological adaptation; positive selection; orphan genes; gene duplication
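A claimed "excess" of genes in a functional category among positively selected genes is usually backed by an over-representation test. The abstract does not say which test was used. One common choice is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test; the Python sketch below implements it. The function names are hypothetical and the counts in the example are illustrative only, not the study's data.

```python
from math import comb


def enrichment_p_value(k: int, n_selected: int, k_annotated: int, n_total: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    n_total:     genes tested (the background)
    k_annotated: background genes carrying the annotation (e.g. a GO term)
    n_selected:  genes flagged, e.g. as putatively under positive selection
    k:           flagged genes that carry the annotation
    """
    upper = min(k_annotated, n_selected)
    tail = sum(
        comb(k_annotated, i) * comb(n_total - k_annotated, n_selected - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(n_total, n_selected)


def fold_enrichment(k: int, n_selected: int, k_annotated: int, n_total: int) -> float:
    """Annotated fraction among flagged genes relative to the background fraction."""
    return (k / n_selected) / (k_annotated / n_total)


# Hypothetical counts for illustration only.
print(enrichment_p_value(12, 300, 150, 10_000), fold_enrichment(12, 300, 150, 10_000))
```

Exact integer arithmetic via `math.comb` avoids underflow in the tail sum. When many categories are tested, the resulting p-values still need a multiple-testing correction.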
Recent efforts have attempted to describe the population structure of the common chimpanzee, focusing on four subspecies: Pan troglodytes verus, P. t. ellioti, P. t. troglodytes, and P. t. schweinfurthii. However, few studies have examined how natural selection has shaped their responses to pathogens and their reproduction. Whey acidic protein (WAP) four-disulfide core domain (WFDC) genes and neighboring semenogelin (SEMG) genes encode proteins with combined roles in immunity and fertility. They display a strikingly high rate of amino acid replacement (dN/dS), indicative of adaptive pressures during primate evolution. In human populations, three signals of selection at the WFDC locus have been described, possibly influencing the proteolytic profile and antimicrobial activities of the male reproductive tract. To evaluate the patterns of genomic variation and selection at the WFDC locus in chimpanzees, we sequenced 17 WFDC genes and 47 autosomal pseudogenes in 68 chimpanzees (15 P. t. troglodytes, 22 P. t. verus, and 31 P. t. ellioti). We found a clear differentiation of P. t. verus and estimated that P. t. troglodytes and P. t. ellioti diverged 0.173 Myr ago. Furthermore, at the WFDC locus we identified a signature of strong selective constraint common to the three subspecies in WFDC6, a recent paralog of the epididymal protease inhibitor EPPIN. Overall, chimpanzees and humans do not display similar footprints of selection across the WFDC locus, possibly owing to different selective pressures between the two species related to immune response and reproductive biology.
WFDC; natural selection; chimpanzees; serine protease inhibitor; reproduction; innate immunity
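The dN/dS ratio mentioned above compares nonsynonymous (amino-acid-changing) substitutions per nonsynonymous site with synonymous substitutions per synonymous site. Values above 1 suggest positive selection, and values well below 1 suggest selective constraint. The Python sketch below assumes that site and difference counts for a pair of aligned coding sequences have already been obtained, for instance by a Nei-Gojobori-style counting procedure (not shown). It then applies the Jukes-Cantor correction for multiple hits. This is a textbook simplification, not necessarily the method used in these studies, and the counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance from a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); the distance is undefined at saturation")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float, syn_diffs: float, syn_sites: float) -> float:
    """dN/dS from pre-computed difference and site counts (Jukes-Cantor corrected)."""
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    d_s = jukes_cantor(syn_diffs / syn_sites)
    if d_s == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


# Hypothetical counts for a pair of orthologous coding sequences.
print(dn_ds(nonsyn_diffs=24, nonsyn_sites=300, syn_diffs=6, syn_sites=100))
```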
Technologies for genome-wide sequence interrogation have dramatically improved our ability to identify loci associated with complex human disease. However, a chasm remains between correlations and causality that stems, in part, from a limiting theoretical framework derived from Mendelian genetics, and an incomplete understanding of disease physiology. Here we propose a set of criteria, akin to Koch’s postulates for infectious disease, for assigning causality between genetic variants and human disease phenotypes.
Adult house flies, Musca domestica L., are mechanical vectors of more than 100 devastating diseases that have severe consequences for human and animal health. House fly larvae play a vital role as decomposers of animal wastes, and thus live in intimate association with many animal pathogens.
We have sequenced and analyzed the genome of the house fly using DNA from female flies. The sequenced genome is 691 Mb. Compared with Drosophila melanogaster, the genome contains a rich resource of shared and novel protein coding genes, a significantly higher amount of repetitive elements, and substantial increases in copy number and diversity of both the recognition and effector components of the immune system, consistent with life in a pathogen-rich environment. There are 146 P450 genes, plus 11 pseudogenes, in M. domestica, representing a significant increase relative to D. melanogaster and suggesting the presence of enhanced detoxification in house flies. Relative to D. melanogaster, M. domestica has also evolved an expanded repertoire of chemoreceptors and odorant binding proteins, many associated with gustation.
This represents the first genome sequence of an insect that lives in intimate association with abundant animal pathogens. The house fly genome provides a rich resource for developing innovative methods of insect control, for understanding the mechanisms of insecticide resistance and genetic adaptation to high pathogen loads, and for exploring the basic biology of this important pest. The genome of this species will also serve as a close outgroup to Drosophila in comparative genomic studies.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0466-3) contains supplementary material, which is available to authorized users.
Markers of the chromosome 9p21 region are regarded as the strongest and most reliably significant genome-wide association study (GWAS) signals for coronary heart disease (CHD) risk; this was recently confirmed by the CARDIoGRAMplusC4D Consortium meta-analysis. However, while these associations are significant at the population level, they may not be clinically relevant predictors of risk for all individuals. We describe here the results of a study designed to address the question: what is the contribution of context, defined by traditional risk factors, in determining the utility of DNA sequence variations marking the 9p21 region for explaining variation in CHD risk? We analyzed a sample of 7,589 European American participants (3,869 females and 3,720 males) in the Atherosclerosis Risk in Communities (ARIC) study. We confirmed CHD-SNP genotype associations for two 9p21 region marker SNPs previously identified by the CARDIoGRAMplusC4D Consortium study, of which ARIC was a part. We then tested the effect of each marker SNP genotype on prediction of CHD within subgroups of the ARIC sample defined by traditional CHD risk factors, by applying a novel multi-model strategy, PRIM. We observed that the effects of SNP genotypes in the 9p21 region were strongest in a subgroup of hypertensives. We subsequently validated the effect of the region in an independent sample from the Copenhagen City Heart Study. Our study suggests that marker SNPs identified as predictors of CHD risk in large population-based GWAS may have their greatest utility in explaining risk of disease in particular subgroups characterized by biological and environmental effects measured by the traditional CHD risk factors.
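A marker's effect can depend on context. One simple way to see this is to estimate the same genotype effect separately within subgroups defined by a traditional risk factor. The Python sketch below computes an allelic odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table of allele counts. It is not the PRIM strategy used in the study, and the subgroup counts are hypothetical.

```python
import math


def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for the risk allele with a Woolf confidence interval.

    Arguments are allele counts in cases and controls; all must be > 0.
    """
    or_ = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


# Hypothetical allele counts (case risk, case other, control risk, control other),
# stratified by a traditional risk factor such as hypertension status.
subgroups = {
    "hypertensive": (260, 180, 900, 1100),
    "normotensive": (200, 190, 1400, 1500),
}
for name, counts in subgroups.items():
    or_, lo, hi = allelic_odds_ratio(*counts)
    print(f"{name}: OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Comparing per-stratum estimates in this way is only descriptive. A formal test of heterogeneity, such as a genotype-by-stratum interaction term, would be needed before concluding that the effect differs between subgroups.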
Innate immunity involves direct interactions between the host and the microbial world, both pathogenic and symbiotic, so natural selection is expected to strongly influence genes involved in these processes. Population genetics investigates the impact of past natural selection events on the genomes of present-day human populations, and complements immunological, clinical, and epidemiological genetic studies. Recent data show that the impact of selection varies considerably across the different families of innate immune receptors and their downstream signalling molecules. This Review discusses these findings and highlights how they help delineate the relative functional importance of innate immune pathways, which can range from essential to redundant.
Wolbachia pipientis is one of the most widely studied endosymbionts today, yet we know little about its short-term adaptation and evolution. Here, using a set of 91 inbred Drosophila melanogaster lines from five populations, we explore patterns of diversity and recent evolution in the Wolbachia strain wMel. Within the D. melanogaster lines, we identify six major mitochondrial clades, including one not yet described in the literature. Using Bayesian analysis informed with demographic estimates of colonization times, we estimate that all extant D. melanogaster mitochondrial haplotypes coalesce to a Wolbachia-infected ancestor approximately 2,200 years ago. Concordant with past studies, the Wolbachia haplotypes contain an overall low level of nucleotide diversity, yet they still display geographic structuring. Finally, we show that fly populations vary in wMel titre. This demonstration of local phenotypic divergence suggests that intra-specific host genetic variation plays a key role in shaping this model symbiotic system.
Wolbachia; Drosophila melanogaster; endosymbiosis; mtDNA; population genetics
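Nucleotide diversity (π) summarizes the low diversity reported for wMel haplotypes. It is the average proportion of sites that differ between two sequences drawn from the sample. Below is a minimal Python sketch, assuming equal-length aligned sequences without gaps or missing data; real haplotype data would need explicit handling of both.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (Nei's pi) over all sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Three toy haplotypes: pairwise differences are 1, 1 and 2 over 8 sites.
print(nucleotide_diversity(["ACGTACGT", "ACGTACGA", "ACGAACGT"]))
```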
The whey acidic protein (WAP) four-disulfide core domain (WFDC) locus located on human chromosome 20q13 spans 19 genes with WAP and/or Kunitz domains. These genes participate in antimicrobial, immune, and tissue homoeostasis activities. Neighboring SEMG genes encode the seminal proteins semenogelin 1 and 2 (SEMG1 and SEMG2). WFDC and SEMG genes have a strikingly high rate of amino acid replacement (dN/dS), indicative of responses to adaptive pressures during vertebrate evolution. To better understand the selection pressures acting on WFDC genes in human populations, we resequenced 18 genes and 54 noncoding segments in 71 European (CEU), African (YRI), and Asian (CHB + JPT) individuals. Overall, we identified 484 single-nucleotide polymorphisms (SNPs), including 65 coding variants (of which 49 are nonsynonymous). Using classic neutrality tests, we confirmed the signature of short-term balancing selection on WFDC8 in Europeans and a signature of positive selection spanning the genes PI3, SEMG1, SEMG2, and SLPI. Associated with the latter signal, we identified an unusually homogeneous derived 100-kb haplotype with a frequency of 88% in Asian populations. A putative candidate variant targeted by selection is Thr56Ser in SEMG1, which may alter the proteolytic profile of SEMG1 and the antimicrobial activities of semen. All the well-characterized genes residing in the WFDC locus encode proteins that appear to have a role in immunity and/or fertility, two processes that are often associated with adaptive evolution. This study provides further evidence that the WFDC and SEMG loci have been under strong adaptive pressure within the short timescale of modern humans.
WFDC; semenogelins; natural selection; innate immunity; serine protease inhibitors; reproduction
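The abstract does not name the classic neutrality tests it used. Tajima's D is a standard example. It compares two estimates of θ: mean pairwise differences (π) and the number of segregating sites scaled by a1 (Watterson's estimator). Positive values (an excess of intermediate-frequency variants) are consistent with balancing selection. Negative values (an excess of rare variants) are consistent with a recent sweep or population expansion, so demographic explanations must be ruled out separately. Below is a Python sketch, assuming aligned sequences, no missing data, and infinite-sites mutation.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D from a list of aligned, equal-length sequences (n >= 4)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # Segregating sites: alignment columns with more than one state.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences (per sequence, not per site).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy samples: a single singleton variant versus a variant at intermediate frequency.
print(tajimas_d(["AAAA", "AAAA", "AAAA", "AAAT"]))  # negative
print(tajimas_d(["AAAA", "AAAA", "AAAT", "AAAT"]))  # positive
```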
Antagonistically selected alleles (those with opposing fitness effects between sexes, environments, or fitness components) represent an important component of additive genetic variance in fitness-related traits, and stably balanced polymorphisms are often hypothesized to contribute to observed quantitative genetic variation. Balancing selection hypotheses imply that intermediate-frequency alleles disproportionately contribute to the genetic variance of life history traits and fitness. Such alleles may also be associated with population genetic footprints of recent selection, including reduced genetic diversity and inflated linkage disequilibrium at linked, neutral sites. Here, we compare the evolutionary dynamics of different balancing selection models, and characterize the evolutionary timescale and hitchhiking effects of partial selective sweeps generated under antagonistic versus non-antagonistic (e.g., overdominant and frequency-dependent selection) processes. We show that the evolutionary timescales of partial sweeps tend to be much longer, and hitchhiking effects drastically weaker, under scenarios of antagonistic selection. These results predict an interesting mismatch between molecular population genetic and quantitative genetic patterns of variation. Balanced, antagonistically selected alleles are expected to contribute more to additive genetic variance for fitness than alleles maintained by classic, non-antagonistic mechanisms. Nevertheless, classic mechanisms of balancing selection are much more likely to generate strong population genetic signatures of recent balancing selection.
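Overdominance (heterozygote advantage) is one of the classic non-antagonistic mechanisms named above, and it provides a reference point for the antagonistic case. It can be sketched with the standard one-locus, two-allele recursion under random mating. With genotype fitnesses 1 - s, 1, and 1 - t for A1A1, A1A2, and A2A2, the frequency of A1 converges to the stable equilibrium p* = t / (s + t). This deterministic Python sketch ignores drift and linked sites, so it says nothing about hitchhiking footprints. It also does not model the two-sex dynamics of sexually antagonistic selection. The parameter values are hypothetical.

```python
def next_frequency(p: float, w11: float, w12: float, w22: float) -> float:
    """One generation of viability selection on allele A1 (frequency p) under random mating."""
    q = 1.0 - p
    mean_fitness = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * p * w11 + p * q * w12) / mean_fitness


def iterate(p0: float, w11: float, w12: float, w22: float, generations: int) -> float:
    """Apply the selection recursion for a fixed number of generations."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, w11, w12, w22)
    return p


s, t = 0.1, 0.2  # hypothetical selection against A1A1 and A2A2 homozygotes
print(iterate(0.01, 1 - s, 1.0, 1 - t, 2000), t / (s + t))
```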
Imprinted genes have been extensively documented in eutherian mammals and found to exhibit significant interspecific variation in the suites of genes that are imprinted and in their regulation between tissues and developmental stages. Much less is known about imprinted loci in metatherian (marsupial) mammals, wherein studies have been limited to a small number of genes previously known to be imprinted in eutherians. We describe the first ab initio search for imprinted marsupial genes, in fibroblasts from the opossum, Monodelphis domestica, based on a genome-wide ChIP-seq strategy to identify promoters that are simultaneously marked by mutually exclusive, transcriptionally opposing histone modifications.
We identified a novel imprinted gene (Meis1) and two additional monoallelically expressed genes, one of which (Cstb) showed allele-specific, but non-imprinted expression. Imprinted vs. allele-specific expression could not be resolved for the third monoallelically expressed gene (Rpl17). Transcriptionally opposing histone modifications H3K4me3, H3K9Ac, and H3K9me3 were found at the promoters of all three genes, but differential DNA methylation was not detected at CpG islands at any of these promoters.
In generating the first genome-wide histone modification profiles for a marsupial, we identified the first gene that is imprinted in a marsupial but not in eutherian mammals. This outcome demonstrates the practicality of an ab initio discovery strategy and implicates histone modification, but not differential DNA methylation, as a conserved mechanism for marking imprinted genes in all therian mammals. Our findings suggest that marsupials use multiple epigenetic mechanisms for imprinting and support the concept that lineage-specific selective forces can produce sets of imprinted genes that differ between metatherian and eutherian lines.
Genomic imprinting; Monoallelic expression; Histone modification; ChIP-seq; Monodelphis domestica; Marsupial
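Monoallelic expression is typically inferred from RNA-seq reads covering a heterozygous SNP. Biallelic expression predicts roughly equal read counts for the two alleles, whereas monoallelic expression predicts that nearly all reads carry one allele. The abstract does not describe the statistical test used. The Python sketch below applies an exact two-sided binomial test against a 50:50 expectation; the helper names and read counts are hypothetical. The sketch does not attempt to tell imprinting (parent-of-origin-dependent expression) apart from non-imprinted allele-specific expression, which additionally requires knowing which parent transmitted the expressed allele.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with the observed one are not lost to rounding.
    return min(1.0, sum(pr for pr in probs if pr <= observed * (1 + 1e-7)))


def allelic_summary(ref_reads: int, alt_reads: int):
    """Fraction of reads carrying the reference allele and the p-value against 50:50."""
    n = ref_reads + alt_reads
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


# Hypothetical read counts at heterozygous SNPs in two genes.
print(allelic_summary(48, 52))  # consistent with biallelic expression
print(allelic_summary(97, 3))   # strongly skewed, consistent with monoallelic expression
```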
The initial site of smoking-induced lung disease is the small airway epithelium, which is difficult and time-consuming to sample by fiberoptic bronchoscopy. We developed a rapid, office-based procedure to obtain trachea epithelium without conscious sedation from healthy nonsmokers (n=26) and healthy smokers (n=19, 27 ± 15 pack-yr). Gene expression differences (fold-change >1.5, p<0.01, Benjamini-Hochberg correction) were assessed with Affymetrix microarrays. A total of 1,057 probe sets, representing >500 genes, were differentially expressed in healthy smokers vs nonsmokers. Trachea gene expression was compared to an independent group of small airway epithelial samples (n=23 healthy nonsmokers, n=19 healthy smokers, 25 ± 12 pack-yr). The trachea epithelium is more sensitive to smoking, responding with 3-fold more differentially expressed genes than small airway epithelium. The trachea transcriptome paralleled the small airway epithelium, with 156 of 167 (93%) genes that are significantly up- and down-regulated by smoking in the small airway epithelium showing similar direction and magnitude of response to smoking in the trachea. Trachea epithelium can be obtained without conscious sedation, representing a less invasive surrogate “canary” for smoking-induced changes in the small airway epithelium. This should prove useful in epidemiologic studies correlating gene expression with clinical outcome in assessing smoking-induced lung disease.
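The selection rule described above (fold change > 1.5, p < 0.01 with Benjamini-Hochberg correction) can be sketched in Python as follows. The sketch makes two assumptions: fold changes are given as ratios of smoker to nonsmoker expression, so down-regulation corresponds to ratios below 1/1.5, and the p < 0.01 threshold applies to BH-adjusted values. Gene names and numbers are hypothetical.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, fold_changes, pvals, fc_cut=1.5, q_cut=0.01):
    """Genes passing both the fold-change cut (either direction) and the BH cut."""
    qvals = benjamini_hochberg(pvals)
    return [
        g for g, fc, q in zip(genes, fold_changes, qvals)
        if (fc > fc_cut or fc < 1 / fc_cut) and q < q_cut
    ]


print(differentially_expressed(["GENE_A", "GENE_B", "GENE_C"], [2.0, 1.2, 0.5], [0.001, 0.0001, 0.002]))
```

In practice the same filter would be applied to every probe set on the array, since BH adjustment depends on the full set of p-values tested.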
The innate immune system in insects consists of a conserved core signaling network and rapidly diversifying effector and recognition components, often containing a high proportion of taxonomically-restricted genes. In the absence of functional annotation, genes encoding immune system proteins can thus be difficult to identify, as homology-based approaches generally cannot detect lineage-specific genes. Here, we use RNA-seq to compare the uninfected and infection-induced transcriptome in the parasitoid wasp Nasonia vitripennis to identify genes regulated by infection. We identify 183 genes significantly up-regulated by infection and 61 genes significantly down-regulated by infection. We also produce a new homology-based immune catalog in N. vitripennis, and show that most infection-induced genes cannot be assigned an immune function from homology alone, suggesting the potential for substantial novel immune components in less well-studied systems. Finally, we show that a high proportion of these novel induced genes are taxonomically restricted, highlighting the rapid evolution of immune gene content. The combination of functional annotation using RNA-seq and homology-based annotation provides a robust method to characterize the innate immune response across a wide variety of insects, and reveals significant novel features of the Nasonia immune response.
Variation in reproductive success has long been thought to be mediated in part by genes encoding seminal proteins. Here we explore the effect on male reproductive phenotypes of polymorphisms on the X chromosome, which is depauperate in genes encoding seminal proteins. Using 57 X chromosome substitution lines, we tested sperm competition both when males from the wild-extracted line were the first to mate, followed by a tester male (“defense” crosses), and when extracted-line males were the second to mate, after a tester male (“offense” crosses). We scored the proportion of progeny sired by each male, fecundity, remating rate, and refractoriness to remating, and tested the significance of variation among lines. Eleven candidate genes were chosen based on previous studies, and portions of these genes were sequenced in all 57 lines. A total of 131 polymorphisms were tested for associations with the reproductive phenotypes using linear models. Nine polymorphisms in four genes showed significant associations (at a 5% FDR). Overall, the X chromosomes sampled here appear to harbor abundant variation in sperm competition, especially considering their paucity of seminal protein genes. This suggests that much of the variation in male reproductive phenotypes lies outside genes that encode seminal proteins.
The detailed study of breakpoints associated with copy number variants (CNVs) can elucidate the mutational mechanisms that generate them and the comparison of breakpoints across species can highlight differences in genomic architecture that may lead to lineage-specific differences in patterns of CNVs. Here, we provide a detailed analysis of Drosophila CNV breakpoints and contrast it with similar analyses recently carried out for the human genome.
By applying split-read methods to a total of 10x coverage of 454 shotgun sequence across nine lines of D. melanogaster, and by re-examining a previously published dataset of CNVs detected using tiling arrays, we identified the precise breakpoints of more than 600 insertions, deletions, and duplications. Contrasting these CNVs with those found in humans showed that in both taxa CNV breakpoints fall into three classes: blunt breakpoints; simple breakpoints associated with microhomology; and breakpoints with additional nucleotides inserted/deleted and no microhomology. In both taxa, CNV breakpoints are enriched for non-B DNA sequence structures, which may impair DNA replication and/or repair. However, in contrast to human genomes, non-allelic homologous recombination (NAHR) plays a negligible role in CNV formation in Drosophila. In flies, non-homologous repair mechanisms are responsible for simple, recurrent, and complex CNVs, including insertions of de novo sequence as large as 60 bp.
Humans and Drosophila differ considerably in the importance of homology-based mechanisms for the formation of CNVs, likely as a consequence of the differences in the abundance and distribution of both segmental duplications and transposable elements between the two genomes.
Copy number variants; CNVs; Non-allelic homologous-recombination; NAHR; Single-strand annealing; SSA; Non-homologous end-joining; NHEJ; Replication-associated repair; Alternative end-joining; Microhomology-mediated end-joining; MMEJ; Filler DNA
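Once split-read alignment has placed a deletion's breakpoints, microhomology can be read directly off the reference sequence. If the bases at the start of the deleted segment also appear immediately after it, or the bases just before it also end the deleted segment, the same deletion can be placed at several equivalent positions. The length of that positional ambiguity is the microhomology. Below is a Python sketch, assuming 0-based, half-open coordinates for a simple deletion. The class labels loosely mirror the three breakpoint classes described above, and the function names are hypothetical.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Length of sequence shared at both ends of the deletion ref[start:end]."""
    right = 0  # how far the deletion can slide right without changing the product
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0  # how far it can slide left
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    return left + right


def classify_breakpoint(ref: str, start: int, end: int, inserted: str = "") -> str:
    """Assign a deletion breakpoint to one of three coarse classes."""
    if inserted:
        return "inserted nucleotides"
    if microhomology_length(ref, start, end) > 0:
        return "microhomology"
    return "blunt"


ref = "CCCACGGGGACTTT"  # deleting ACGGGG leaves "AC" at the junction ambiguously placed
print(microhomology_length(ref, 3, 9), classify_breakpoint(ref, 3, 9))
print(classify_breakpoint("CCCAGGGGTTT", 3, 8))
print(classify_breakpoint("CCCAGGGGTTT", 3, 8, inserted="TA"))
```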
Screening of small molecule libraries offers the potential to identify compounds that inhibit specific biological processes and, ultimately, to identify macromolecules that are important players in such processes. To date, however, most screens of small molecule libraries have focused on identification of compounds that inhibit known proteins or particular steps in a given process, and have emphasized automated primary screens. Here we have used “low tech” in vivo primary screens to identify small molecules that inhibit both cytokinesis and single cell wound repair, two complex cellular processes that possess many common features. The “diversity set”, an ordered array of 1990 compounds available from the National Cancer Institute, was screened in parallel to identify compounds that inhibit cytokinesis in D. excentricus (sand dollar) embryos and single cell wound repair in X. laevis (frog) oocytes. Two small molecules were thus identified: Sph1 and Sph2. Sph1 reduces Rho activation in wound repair and suppresses formation of the spindle midzone during cytokinesis. Sph2 also reduces Rho activation in wound repair and may inhibit cytokinesis by blocking membrane fusion. The results identify two small molecules of interest for analysis of wound repair and cytokinesis, show that these processes are more similar than often realized, and demonstrate the potential power of low tech screens of small molecule libraries for analysis of complex cellular processes.
This study addresses the question of how purifying selection operates during recent rapid population growth such as has been experienced by human populations. This is not a straightforward problem because the human population is not at equilibrium: population genetics predicts that, on the one hand, the efficacy of natural selection increases as population size increases, eliminating ever more weakly deleterious variants; on the other hand, a larger number of deleterious mutations will be introduced into the population and will be more likely to increase in copy number as the population grows. To understand how patterns of human genetic variation have been shaped by the interaction of natural selection and population growth, we used computer simulations to examine the trajectories of mutations with varying selection coefficients. We observed that while population growth dramatically increases the number of deleterious segregating sites in the population, it only mildly increases the number carried by each individual. Our simulations also show an increased efficacy of natural selection, reflected in a higher fraction of deleterious mutations eliminated at each generation and a more efficient elimination of the most deleterious ones. As a consequence, while each individual carries a larger number of deleterious alleles than expected in the absence of growth, each segregating allele is, on average, less deleterious. Combined, our results suggest that the genetic risk of complex diseases in growing populations might be distributed across a larger number of more weakly deleterious rare variants.
purifying selection; exponential growth; deleterious mutations; demographic history; human
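The interplay between selection and growth described above can be explored with a toy forward simulation. The Python sketch below follows a single deleterious allele in a Wright-Fisher population that grows geometrically. Each generation applies genic selection (fitness 1 - s per copy, roughly additive) and then binomial sampling of 2N gene copies. The parameters are hypothetical, there is no recurrent mutation, and the study's own simulation design may differ substantially.

```python
import random


def simulate_allele(s, n0, growth, generations, p0, seed=0):
    """Return the frequency trajectory of a deleterious allele (fitness 1 - s per copy)."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for gen in range(1, generations + 1):
        # Deterministic selection step (genic selection).
        p_sel = p * (1 - s) / (p * (1 - s) + (1 - p))
        # Geometric growth from the initial size, at least one diploid individual.
        n = max(1, round(n0 * growth ** gen))
        # Drift: binomial sampling of 2N gene copies.
        copies = sum(rng.random() < p_sel for _ in range(2 * n))
        p = copies / (2 * n)
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed
            break
    return trajectory


# Hypothetical comparison of a constant-size and a growing population.
for growth in (1.0, 1.03):
    runs = [simulate_allele(0.01, 50, growth, 100, 0.05, seed=i) for i in range(10)]
    lost = sum(run[-1] == 0.0 for run in runs)
    print(f"growth={growth}: {lost}/10 replicates lost the allele")
```

Sampling gene copies one at a time keeps the sketch dependency-free but is slow for large populations; a binomial draw per generation would be the natural replacement.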
The parasitoid wasp Nasonia vitripennis is an emerging genetic model for functional analysis of DNA methylation. Here, we characterize genome-wide methylation at base-pair resolution, and compare these results to gene expression across five developmental stages and to methylation patterns reported in other insects. An accurate assessment of DNA methylation across the genome is accomplished using bisulfite sequencing of adult females from a highly inbred line. One-third of genes show extensive methylation over the gene body, yet methylated DNA is not found in non-coding regions and is rarely found in transposons. Methylated genes occur in small clusters across the genome. Methylation demarcates exon-intron boundaries, with elevated levels over exons, primarily in the 5′ regions of genes. It is also elevated near the sites of translational initiation and termination, with reduced levels in 5′ and 3′ UTRs. Methylated genes have higher median expression levels and lower expression variation across developmental stages than non-methylated genes. There is no difference in the frequency of differential splicing between methylated and non-methylated genes, and as yet no established role for methylation in regulating alternative splicing in Nasonia. Phylogenetic comparisons indicate that many genes maintain methylation status across long evolutionary time scales. Nasonia methylated genes are more likely to be conserved in insects, but even those that are not conserved show broader expression across development than comparable non-methylated genes. Finally, examination of duplicated genes shows that those paralogs that have lost methylation in the Nasonia lineage following gene duplication evolve more rapidly, show decreased median expression levels, and show increased specialization in expression across development. Methylation of Nasonia genes signals constitutive transcription across developmental stages, whereas non-methylated genes show more dynamic developmental expression patterns.
We speculate that loss of methylation may result in increased developmental specialization in evolution and acquisition of methylation may lead to broader constitutive expression.
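Bisulfite sequencing relies on sodium bisulfite converting unmethylated cytosines to uracil (read as T after PCR), while methylated cytosines remain C. The methylation level at a cytosine is therefore estimated as the fraction of covering reads that still show C. The Python sketch below computes per-site levels and a simple gene-body average. The depth cutoff and the averaging rule are hypothetical choices rather than the study's pipeline, and it assumes there are no C/T SNPs relative to the reference, which would otherwise mimic unmethylated sites.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def methylation_level(c_reads: int, t_reads: int, min_depth: int = 5) -> Optional[float]:
    """Fraction of bisulfite reads retaining C at a cytosine; None if coverage is too low."""
    depth = c_reads + t_reads
    if depth < min_depth:
        return None
    return c_reads / depth


def gene_body_methylation(sites: Sequence[Tuple[int, int]], min_depth: int = 5) -> Optional[float]:
    """Mean methylation over sufficiently covered CpG sites in a gene body."""
    levels = [methylation_level(c, t, min_depth) for c, t in sites]
    covered = [m for m in levels if m is not None]
    return mean(covered) if covered else None


# Hypothetical (C reads, T reads) counts at four CpGs in one gene body.
print(gene_body_methylation([(18, 2), (15, 5), (1, 0), (20, 0)]))
```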
Insects use methylation to modulate genome function in a different manner from vertebrates. Here, we quantified the global methylation profile in a parasitic wasp species, Nasonia vitripennis, a model with some advantages over ants and honeybees for functional and genetic analyses of methylation, such as short generation time, inbred lines, and interfertile species. Using a highly inbred line allowed us to characterize DNA methylation precisely; we compared it with gene expression variation across developmental stages and contrasted it with methylation in other insect species. DNA methylation is found almost exclusively in coding exons within the 5′-most 1 kbp of genes, and ∼1/3 of protein-coding genes are methylated. Methylated genes tend to occur in small clusters in the genome. Unlike many organisms, Nasonia leaves nearly all transposable element genes non-methylated. Methylated genes exhibit more uniform expression across developmental stages for both moderately and highly expressed genes, suggesting that DNA methylation marks genes for constitutive expression. Among pairs of differentially methylated duplicated genes, the paralogs that lose DNA methylation after duplication in the Nasonia lineage show lower expression and greater specialization of expression. Finally, by comparative analysis, we show that methylated genes are more conserved at three different time scales during evolution.
Characterizing and understanding the complex spectrum of lipids in higher organisms lags far behind our analysis of genome and transcriptome sequences. Here we generate and evaluate comprehensive lipid profiles (>200 lipids) of 92 inbred lines from five different Drosophila melanogaster populations. We find that the majority of lipid species are highly heritable; even lipids with odd-chain fatty acids, which the fly cannot generate itself, have high heritabilities. Abundance of the endosymbiont Wolbachia, a potential provider of odd-chain lipids, was positively correlated with this group of lipids. Additionally, we show that despite years of laboratory rearing on the same medium, the lipid profiles of the five geographic populations are sufficiently distinct to discriminate among populations. Our data predict a strikingly different membrane fluidity for flies from the Netherlands, which is supported by their increased ethanol tolerance. We find that 18% of lipids show strong concentration differences between males and females. Through an analysis of the correlation structure of the lipid classes, we find modules of co-regulated lipids and begin to associate these with metabolic constraints. Our data provide a foundation for developing associations between variation in lipid composition and variation in other metabolic attributes, genome-wide variation, and metrics of health and overall reproductive fitness.
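For panels of inbred lines, heritability is commonly estimated from a one-way ANOVA with line as a random effect. The among-line variance component is σ²_L = (MS_between − MS_within)/n for n replicates per line, and broad-sense heritability is H² = σ²_L / (σ²_L + σ²_E), with σ²_E estimated by MS_within. The Python sketch below assumes a balanced design with equal replicates per line. The abstract does not state the exact model used, and conventions for inbred-line heritability differ between studies. The example measurements are hypothetical.

```python
from statistics import mean


def broad_sense_h2(lines):
    """H^2 from a balanced one-way random-effects ANOVA.

    lines: list of replicate measurements per inbred line, equal length for all lines.
    """
    k = len(lines)
    n = len(lines[0])
    if k < 2 or n < 2 or any(len(r) != n for r in lines):
        raise ValueError("need a balanced design with >= 2 lines and >= 2 replicates")
    line_means = [mean(r) for r in lines]
    grand = mean(line_means)
    ms_between = n * sum((m - grand) ** 2 for m in line_means) / (k - 1)
    ms_within = sum((x - m) ** 2 for r, m in zip(lines, line_means) for x in r) / (k * (n - 1))
    var_line = max(0.0, (ms_between - ms_within) / n)  # truncate negative estimates at 0
    total = var_line + ms_within
    return var_line / total if total > 0 else 0.0


# Hypothetical abundances of one lipid species in three lines, three replicates each.
print(broad_sense_h2([[1.0, 1.2, 1.1], [2.0, 2.1, 1.9], [1.5, 1.4, 1.6]]))
```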
Molecular evolutionary theory predicts that the ratio of autosomal to X-linked adaptive substitution (KA/Kx) is primarily determined by the average dominance coefficient of beneficial mutations. Although this theory has profoundly influenced analysis and interpretation of comparative genomic data, its predictions are based upon two unverified assumptions about the genetic basis of adaptation. The theory assumes that 1) the rate of adaptively driven molecular evolution is limited by the availability of beneficial mutations, and 2) the scaling of evolutionary parameters between the X and the autosomes (e.g., the beneficial mutation rate, and the fitness effect distribution of beneficial alleles, per X-linked versus autosomal locus) is constant across molecular evolutionary timescales. Here, we show that the genetic architecture underlying bouts of adaptive substitution can influence both assumptions, and consequently, the theoretical relationship between KA/Kx and mean dominance. Quantitative predictions of prior theory apply when 1) many genomically dispersed genes potentially contribute beneficial substitutions during individual steps of adaptive walks, and 2) the population beneficial mutation rate, summed across the set of potentially contributing genes, is sufficiently small to ensure that adaptive substitutions are drawn from new mutations rather than standing genetic variation. Current research into the genetic basis of adaptation suggests that both assumptions are plausibly violated. We find that the qualitative positive relationship between mean dominance and KA/Kx is relatively robust to the specific conditions underlying adaptive substitution, yet the quantitative relationship between dominance and KA/Kx is quite flexible and context dependent. This flexibility may partially account for the puzzlingly variable X versus autosome substitution patterns reported in the empirical evolutionary genomics literature. 
The new theory unites the previously separate analysis of adaptation using new mutations versus standing genetic variation and makes several useful predictions about the interaction between genetic architecture, evolutionary genetic constraints, and effective population size in determining the ratio of adaptive substitution between autosomal and X-linked genes.
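Under the new-mutation and constant-scaling assumptions listed above, the classical single-locus approximation makes the dominance dependence explicit. An autosomal locus receives about 2N new copies of a beneficial mutation per generation, each fixing with probability about 2hs. An X-linked locus receives about 3N/2 copies, each fixing with probability about (2/3)(1 + 2h)s. The ratio is therefore K_A/K_X = 4h/(1 + 2h). The sketch assumes equal per-locus mutation rates, equal selection in the two sexes, an even sex ratio and a single dominance value h. It reproduces only the prior-theory baseline, not the standing-variation or genetic-architecture extensions.

```python
def ka_over_kx(h):
    """Classical K_A/K_X for adaptive substitution from new mutations.

    Assumes (for illustration): equal per-locus beneficial mutation rates on
    X and autosomes, identical selection in males and females, an even sex
    ratio, and no standing genetic variation. Common factors N*u*s cancel.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("dominance coefficient h must lie in [0, 1]")
    k_a = 4.0 * h        # 2N new copies x fixation probability 2hs
    k_x = 1.0 + 2.0 * h  # (3N/2) new copies x fixation probability (2/3)(1+2h)s
    return k_a / k_x
```

The ratio equals 1 at h = 1/2, falls below 1 for recessive beneficial mutations (the faster-X regime), and rises toward 4/3 as h approaches 1. This is the positive relationship between mean dominance and K_A/K_X that the abstract describes as qualitatively robust.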
dominance; epistasis; genetics of adaptation; soft sweeps; molecular evolution
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.
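One evolutionary signature of protein-coding sequence is that substitutions tend to preserve the encoded amino acid. The toy measure below makes this concrete for two in-frame aligned sequences: among codons that differ, it reports the fraction that are synonymous under the standard genetic code. It is a simplified illustration only. The function name is hypothetical, and this is not the classifier used in the study, which compares many species at once.

```python
# Standard genetic code, indexed by codon with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_fraction(seq1, seq2):
    """Fraction of differing codons that encode the same amino acid."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, equal length, multiple of 3")
    differing = synonymous = 0
    for pos in range(0, len(seq1), 3):
        c1 = seq1[pos:pos + 3].upper()
        c2 = seq2[pos:pos + 3].upper()
        # Skip identical codons and codons containing gaps or ambiguous bases.
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        differing += 1
        synonymous += int(CODON_TABLE[c1] == CODON_TABLE[c2])
    return synonymous / differing if differing else float("nan")
```

In real coding exons this fraction is typically elevated compared with what random substitutions would produce, whereas noncoding regions show no such bias.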
Not all cigarette smokers develop chronic obstructive pulmonary disease (COPD), and discovering susceptibility factors is an important research priority. The oxidative burden of smoking may overwhelm antioxidant defenses, and vulnerabilities may exist as a result of sequence variants in genes encoding antioxidant enzymes. This study explored the association between genetic variation in a network of antioxidant enzymes and lung phenotypes. Linear models evaluated single-locus marker associations in 2,387 European American and African American participants in the Health, Aging, and Body Composition (Health ABC) Study. After correcting for multiple comparisons, 15 statistically significant associations were identified, all of which were SNP by smoking interactions. The most statistically significant findings were in genes encoding members of the isocitrate dehydrogenase gene family (IDH3A, IDH3B, IDH2). For rs6107100 (IDH3B), the variant genotype was associated with a difference of 6% in the FEV1/FVC ratio in African American current smokers, but the SNP had little or no association with FEV1/FVC in former and never smokers (nominal P for interaction = 5 × 10⁻⁶). A variant in the peroxiredoxin gene PRDX5 (rs9787810) was associated with lower %predicted FEV1 and a lower FEV1/FVC ratio in European American current smokers, with little or no association in other smoking groups (nominal P for interaction = 0.0001 and 0.0003, respectively). The studied genes have not been reported in previous candidate gene association studies; the findings therefore suggest novel mechanisms and targets for future research, and provide evidence that sequence variation in genes encoding antioxidant enzymes contributes to susceptibility in smokers.
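A single-locus test of SNP by smoking interaction can be written as a linear model with a product term. The sketch below assumes additive genotype coding (0, 1 or 2 copies of the variant allele) and, as a simplification of the three smoking groups, a binary current-smoker indicator. It fits y = b0 + b1*g + b2*s + b3*g*s by ordinary least squares with NumPy, where b3 is the interaction coefficient. The function name is hypothetical, and the covariates and ancestry-specific analyses of the Health ABC study are omitted.

```python
import numpy as np


def fit_snp_by_smoking(genotype, smoker, y):
    """OLS fit of y = b0 + b1*g + b2*s + b3*g*s.

    genotype: additive allele counts (0, 1, 2) -- assumed coding
    smoker:   1 for current smokers, 0 otherwise -- simplified coding
    y:        lung-function outcome, e.g. the FEV1/FVC ratio
    Returns the coefficients [b0, b1, b2, b3]; b3 is the SNP x smoking term.
    """
    g = np.asarray(genotype, dtype=float)
    s = np.asarray(smoker, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(g), g, s, g * s])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```

A nonzero b3 means the per-allele effect on the outcome differs between current smokers (b1 + b3) and everyone else (b1), which is the pattern reported for rs6107100 and rs9787810.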
Antioxidant enzymes; Lung function
X inactivation—the transcriptional silencing of one X chromosome copy per female somatic cell—is universal among therian mammals, yet the choice of which X to silence exhibits considerable variation among species. X inactivation strategies can range from strict paternally inherited X inactivation (PXI), which renders females functionally haploid, expressing only their maternally inherited X-linked alleles, to unbiased random X inactivation (RXI), which equalizes expression of maternally and paternally inherited alleles in each female tissue. However, the underlying evolutionary processes that might account for this observed diversity of X inactivation strategies remain unclear. We present a theoretical population genetic analysis of X inactivation evolution and specifically consider how conditions of dominance, linkage, recombination, and sex-differential selection each influence evolutionary trajectories of X inactivation. The results indicate that a single, critical interaction between allelic dominance and sex-differential selection can select for a broad and continuous range of X inactivation strategies, including unequal rates of inactivation between maternally and paternally inherited X chromosomes. RXI is favored over complete PXI as long as alleles deleterious to female fitness are sufficiently recessive, and the criteria for RXI evolution are considerably more restrictive when fitness variation is sexually antagonistic (i.e., alleles deleterious to females are beneficial to males) than when variation is deleterious to both sexes. Evolutionary transitions from PXI to RXI also generally increase mean relative female fitness at the expense of decreased male fitness. These results provide a theoretical framework for predicting and interpreting the evolution of chromosome-wide expression of X-linked genes and lead to several useful predictions that could motivate future studies of allele-specific gene expression variation.
With the exception of its earliest-diverging members, mammal species practice X inactivation, where one copy of each X chromosome pair is silenced in each cell of the female body. The particular copy of the X that is silenced nevertheless shows considerable variability among species, and the evolutionary causes for this variability remain unclear. Here, we show that X inactivation strategies are likely to evolve in response to the sex-differential fitness properties of X-linked genetic variation. Genetic variation with similar effects on male and female fitness will generally favor the evolution of random X inactivation, potentially including preferential inactivation of the maternally inherited X chromosome. Variation with opposing fitness effects in each sex (“sexually antagonistic” variation, which includes mutations that both decrease female fitness and enhance male fitness) selects for preferential or complete inactivation of the paternally inherited X. Paternally biased X inactivation patterns appear to be common in nature, which suggests that sexually antagonistic genetic variation might be an important factor underlying the evolution of X inactivation. The theory provides a conceptual framework for understanding the evolution of X inactivation strategies and generates several novel predictions that may soon be tested with modern genome sequencing technologies.
Human populations have experienced recent explosive growth, expanding by at least three orders of magnitude over the past 400 generations. This departure from equilibrium skews patterns of genetic variation and distorts basic principles of population genetics. We characterized the empirical signatures of explosive growth on the site frequency spectrum and found that the discrepancy in rare variant abundance across demographic modeling studies is mostly due to differences in sample size. Rapid recent growth increases the load of rare variants and is likely to play a role in the individual genetic burden of complex disease risk. Hence, the extreme recent human population growth needs to be taken into consideration in studying the genetics of complex diseases and traits.
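Departures from equilibrium are judged against the expected neutral site frequency spectrum for a constant-size population. Under that model, the expected number of sites at which the derived allele appears i times in a sample of n sequences is θ/i. The sketch computes that baseline and the expected fraction of segregating sites that are singletons. It does not model growth, which inflates the singleton class relative to this expectation. Because even the baseline singleton fraction shifts with n, this is one reason rare-variant proportions are only directly comparable across studies at matched sample sizes. Function names are illustrative.

```python
def expected_sfs(n, theta=1.0):
    """Expected unfolded neutral SFS for n sampled sequences, constant size.

    Entry i-1 is the expected number of sites with i derived copies
    (i = 1 .. n-1), equal to theta / i under the standard coalescent.
    """
    if n < 2:
        raise ValueError("need a sample of at least two sequences")
    return [theta / i for i in range(1, n)]


def singleton_fraction(n):
    """Expected share of segregating sites that are singletons at equilibrium."""
    sfs = expected_sfs(n)
    return sfs[0] / sum(sfs)
```

At equilibrium the singleton share falls slowly as the sample grows (it equals 1 divided by the harmonic number of n - 1). Explosive growth produces a larger excess of rare variants, and this excess is only fully visible in large samples.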
Various methods have been developed for identifying gene–gene interactions in genome-wide association studies (GWAS). However, most methods focus on individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel interaction tests of quantitative traits that are gene-based and that confer advantages in both statistical power and biological interpretation. The framework of gene-based gene–gene interaction (GGG) tests combines marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests are based on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations point to correct type I error rates of all tests and show that the two truncated tests are more powerful than the other tests when the markers involved in the underlying interaction are not directly genotyped and when there are multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein–protein interaction to test for gene-level interactions underlying lipid levels using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies.
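The P value combining step can be sketched as follows. The inputs are the marker-pair interaction z-scores for two genes and their correlation matrix. The study obtains that matrix from its analytical LD-based formula, which is not reproduced here, so `corr` is simply an input. The code computes the minimum-P and truncated-product statistics and calibrates both by Monte Carlo simulation of correlated null z-scores. Monte Carlo calibration, the truncation threshold tau = 0.05 and all function names are assumptions for illustration. The extended Simes and truncated tail strength tests are omitted.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc)


def two_sided_p(z):
    """Two-sided standard-normal P values for an array of z-scores."""
    return _erfc(np.abs(z) / math.sqrt(2.0))


def min_p_stat(p):
    """Minimum P value across marker pairs (smaller = more extreme)."""
    return p.min(axis=-1)


def tpm_stat(p, tau=0.05):
    """Truncated product, as -sum(log p) over p <= tau (larger = more extreme)."""
    logs = np.where(p <= tau, -np.log(np.maximum(p, 1e-300)), 0.0)
    return logs.sum(axis=-1)


def gene_level_pvalues(z_obs, corr, tau=0.05, n_sim=20000, seed=0):
    """Monte Carlo gene-level P values for the min-P and truncated-product tests.

    z_obs: marker-pair interaction z-scores for one gene pair
    corr:  their null correlation matrix (supplied by the caller)
    """
    rng = np.random.default_rng(seed)
    z_obs = np.asarray(z_obs, dtype=float)
    p_obs = two_sided_p(z_obs)
    # Simulate correlated null z-scores sharing the observed LD structure.
    null_z = rng.multivariate_normal(np.zeros(len(z_obs)), corr, size=n_sim)
    null_p = two_sided_p(null_z)
    hits_minp = np.sum(min_p_stat(null_p) <= min_p_stat(p_obs))
    hits_tpm = np.sum(tpm_stat(null_p, tau) >= tpm_stat(p_obs, tau))
    # Add-one correction keeps Monte Carlo P values strictly positive.
    return float((1 + hits_minp) / (n_sim + 1)), float((1 + hits_tpm) / (n_sim + 1))
```

Because the null draws share the markers' correlation structure, the resulting gene-level P values account for LD among marker-pair tests rather than treating them as independent.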
Epistasis is likely to play a significant role in complex diseases or traits and is one of the many possible explanations for “missing heritability.” However, epistatic interactions have been difficult to detect in genome-wide association studies (GWAS) due to the limited power caused by the multiple-testing correction from the large number of tests conducted. Gene-based gene–gene interaction (GGG) tests might hold the key to relaxing the multiple-testing correction burden and increasing the power for identifying epistatic interactions in GWAS. Here, we developed GGG tests of quantitative traits by extending four P value combining methods and evaluated their type I error rates and power using extensive simulations. All four GGG tests are more powerful than a principal component-based test. We also applied our GGG tests to data from the Atherosclerosis Risk in Communities study and found five gene-level interactions associated with the levels of total cholesterol and high-density lipoprotein cholesterol (HDL-C). One interaction between SMAD3 and NEDD9 on HDL-C was further replicated in an independent sample from the Multi-Ethnic Study of Atherosclerosis.
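The multiple-testing burden that motivates gene-level testing is easy to quantify. A marker-based scan tests every marker pair within each gene pair, whereas a GGG scan performs one test per gene pair. The sketch compares Bonferroni thresholds for the two strategies. The gene names and marker counts are hypothetical, and Bonferroni is used only as a simple, conservative reference point, not as the correction applied in the study.

```python
def bonferroni_thresholds(markers_per_gene, gene_pairs, alpha=0.05):
    """Per-test Bonferroni thresholds for marker-based vs gene-based scans.

    markers_per_gene: dict mapping gene name -> number of genotyped markers
    gene_pairs:       list of (gene_a, gene_b) tuples to be tested
    Returns (marker_level_threshold, gene_level_threshold).
    """
    marker_tests = sum(markers_per_gene[a] * markers_per_gene[b] for a, b in gene_pairs)
    gene_tests = len(gene_pairs)
    return alpha / marker_tests, alpha / gene_tests
```

For two genes with 10 and 20 markers, the marker-based scan runs 200 tests, so its threshold is 0.05/200 = 2.5 × 10⁻⁴. A single gene-level test keeps the threshold at 0.05.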