Lethal mutagenesis, the attempt to extinguish a population by elevating its mutation rate, has been endorsed in the virology literature as a promising approach for treating viral infections. In support of the concept, in vitro studies have forced viral extinction with high doses of mutagenic drugs. However, the one known mutagenic drug used on patients commonly fails to cure infections, and in vitro studies typically find a wide range of mutagenic conditions permissive for viral growth. A key question becomes how subsequent evolution is affected if the viral population is mutated but avoids extinction: is viral adaptation augmented rather than suppressed? Here we consider the evolution of highly mutated populations surviving mutagenesis, using the DNA phage T7. First, in assays using inhibitory hosts, whenever resistance mutants were observed, the mutagenized populations exhibited higher frequencies of such mutants, but some inhibitors blocked plaque formation by even the mutagenized stock. Second, outgrowth of previously mutagenized populations led to rapid and potentially complete fitness recovery, but polymorphism was slow to decay, and mutations exhibited inconsistent patterns of change. Third, the combination of population bottlenecks with mutagenesis did cause fitness declines, revealing a vulnerability that was not apparent from mutagenesis of large populations. The results show that a population surviving high mutagenesis may exhibit enhanced adaptation in some environments and experience few negative fitness consequences in many others.
evolution; extinction; fitness; virus; theory; mutation
A relatively small number of signaling pathways govern the early patterning processes of metazoan development. The architectural changes over time to these signaling pathways offer unique insights into their evolution. In the case of Hedgehog (Hh) signaling, two very divergent mechanisms of pathway transduction have evolved. In vertebrates, signaling relies on trafficking of Hh pathway components to nonmotile specialized primary cilia. In contrast, protostomes do not use cilia of any kind for Hh signal transduction. How these divergent lineages adopted such dramatically different ways of activating the signaling pathway is an unanswered question. Here, we present evidence that in the sea urchin, a basal deuterostome, motile cilia are required for embryonic Hh signal transduction, and the Hh receptor Smoothened (Smo) localizes to cilia during active Hh signaling. This is the first evidence that Hh signaling requires motile cilia and the first case of an organism outside of the vertebrate lineage requiring cilia for Hh signaling.
cell biology; evolution and development; cell signaling
Changes in gene regulation are associated with the evolution of morphologies. However, the specific sequence information controlling gene expression is largely unknown, and its discovery is time-consuming and labor-intensive. We use the intricate patterning of follicle cells to probe species’ relatedness in the absence of sequence information. We focus on one of the major families of genes that pattern the Drosophila eggshell, the Chorion protein (Cp). Systematically screening for the spatiotemporal patterning of all nine Cp genes in three species (Drosophila melanogaster, D. nebulosa, and D. willistoni), we found that most genes are expressed dynamically during mid and late stages of oogenesis. Applying an annotation code, we transformed the data into binary matrices that capture the complexity of gene expression. Gene patterning is sufficient to predict species’ relatedness, consistent with their phylogeny. Surprisingly, we found that expression domains of most genes are different among species, suggesting that Cp regulation is rapidly evolving. In addition, we found a morphological novelty along the dorsalmost side of the eggshell, the dorsal ridge. Our matrix analysis placed the dorsal ridge domain in a cluster of epidermal growth factor receptor associated domains, which was validated through genetic and chemical perturbations. Expression domains are regulated cooperatively or independently by signaling pathways, supporting the view that complex patterns are combinatorially assembled from simple domains.
tissue patterning; oogenesis; eggshell; EGFR signaling
Much of the phenotypic variation observed between even closely related species may be driven by differences in gene expression levels. The current availability of reliable techniques like RNA-Seq, which can quantify expression levels across species, has enabled comparative studies. Ornstein–Uhlenbeck (OU) processes have been proposed to model gene expression evolution as they model both random drift and stabilizing selection and can be extended to model changes in selection regimes. The OU models provide a statistical framework that allows comparisons of specific hypotheses of selective regimes, including random drift, constrained drift, and expression level shifts. In this way, inferences may be made about the mode of selection acting on the expression level of a gene. We augment this model to include within-species expression variance, allowing for modeling of nonevolutionary expression variance that could be caused by individual genetic, environmental, or technical variation. Through simulations, we explore the reliability of parameter estimates and the extent to which different selective regimes can be distinguished using phylogenies of varying size using both the typical OU model and our extended model. We find that if individual variation is not accounted for, nonevolutionary expression variation is often mistaken for strong stabilizing selection. The methods presented in this article are increasingly relevant as comparative expression data becomes more available and researchers turn to expression as a primary evolving phenotype.
expression evolution; RNA-Seq; Ornstein–Uhlenbeck; evolutionary models; expression variation
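As a rough illustration of the kind of model the abstract above describes, the sketch below simulates an Ornstein–Uhlenbeck process for a species-level expression value and then draws individuals around that value with additional within-species variance. The parameter names (`alpha` for the strength of stabilizing selection, `sigma` for drift, `theta` for the optimum, `tau` for the within-species standard deviation) and the Euler–Maruyama discretization are assumptions made for illustration; none of them is taken from the article.

```python
import math
import random

def simulate_ou(x0, theta, alpha, sigma, t_total, n_steps, rng):
    """Euler-Maruyama simulation of dX = alpha * (theta - X) dt + sigma dW.

    Returns the value of X at time t_total, starting from x0.
    """
    dt = t_total / n_steps
    x = x0
    for _ in range(n_steps):
        drift = alpha * (theta - x) * dt
        noise = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += drift + noise
    return x

def stationary_variance(alpha, sigma):
    """Long-run variance of the OU process, sigma^2 / (2 * alpha)."""
    return sigma ** 2 / (2.0 * alpha)

def sample_individuals(species_mean, tau, n, rng):
    """Draw n individuals around the species mean.

    tau is a hypothetical within-species standard deviation standing in for
    genetic, environmental, or technical variation.
    """
    return [species_mean + rng.gauss(0.0, tau) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(1)
    species_mean = simulate_ou(0.0, 1.0, 2.0, 0.5, 5.0, 5000, rng)
    print(species_mean, sample_individuals(species_mean, 0.3, 4, rng))
```

With `sigma` set to zero the simulated value relaxes deterministically toward `theta`, which is a convenient sanity check on the discretization.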
We announce the release of an advanced version of the Molecular Evolutionary Genetics Analysis (MEGA) software, which currently contains facilities for building sequence alignments, inferring phylogenetic histories, and conducting molecular evolutionary analysis. In version 6.0, MEGA now enables the inference of timetrees, as it implements the RelTime method for estimating divergence times for all branching points in a phylogeny. A new Timetree Wizard in MEGA6 facilitates this timetree inference by providing a graphical user interface (GUI) to specify the phylogeny and calibration constraints step-by-step. This version also contains enhanced algorithms to search for the optimal trees under evolutionary criteria and implements a more advanced memory management that can double the size of sequence data sets to which MEGA can be applied. Both GUI and command-line versions of MEGA6 can be downloaded from www.megasoftware.net free of charge.
software; relaxed clocks; phylogeny
Yeast species represent an ideal model system for population genomic studies but large-scale polymorphism surveys have only been reported for species of the Saccharomyces genus so far. Hence, little is known about intraspecific diversity and evolution in yeast. To obtain a new insight into the evolutionary forces shaping natural populations, we sequenced the genomes of an expansive worldwide collection of isolates from a species distantly related to Saccharomyces cerevisiae: Lachancea kluyveri (formerly S. kluyveri). We identified 6.5 million single nucleotide polymorphisms and showed that a large introgression event of 1 Mb of GC-rich sequence in the chromosomal arm probably occurred in the last common ancestor of all L. kluyveri strains. Our population genomic data clearly revealed that this 1-Mb region underwent a molecular evolution pattern very different from the rest of the genome. It is characterized by a higher recombination rate, with a dramatically elevated A:T → G:C substitution rate, which is the signature of an increased GC-biased gene conversion. In addition, the predicted base composition at equilibrium demonstrates that the chromosome-scale compositional heterogeneity will persist after the genome has reached mutational equilibrium. Altogether, the data presented herein clearly show that distinct recombination and substitution regimes can coexist and lead to different evolutionary patterns within a single genome.
population genomics; chromosome evolution; yeast
Desaturase genes are essential for biological processes, including lipid metabolism, cell signaling, and membrane fluidity regulation. Insect desaturases are particularly interesting for their role in chemical communication, and potential contribution to speciation, symbioses, and sociality. Here, we describe the acyl-CoA desaturase gene families of 15 insects, with a focus on social Hymenoptera. Phylogenetic reconstruction revealed that the insect desaturases represent an ancient gene family characterized by eight subfamilies that differ strongly in their degree of conservation and frequency of gene gain and loss. Analyses of genomic organization showed that five of these subfamilies are represented in a highly microsyntenic region conserved across holometabolous insect taxa, indicating an ancestral expansion during early insect evolution. In three subfamilies, ants exhibit particularly large expansions of genes. Despite these expansions, however, selection analyses showed that desaturase genes in all insect lineages are predominantly undergoing strong purifying selection. Finally, for three expanded subfamilies, we show that ants exhibit variation in gene expression between species, and more importantly, between sexes and castes within species. This suggests functional differentiation of these genes and a role in the regulation of reproductive division of labor in ants. The dynamic pattern of gene gain and loss of acyl-CoA desaturases in ants may reflect changes in response to ecological diversification and an increased demand for chemical signal variability. This may provide an example of how gene family expansions can contribute to lineage-specific adaptations through structural and regulatory changes acting in concert to produce new adaptive phenotypes.
desaturase genes; gene duplication; social Hymenoptera; chemical communication
The evolution of avian feathers has recently been illuminated by fossils and the identification of genes involved in feather patterning and morphogenesis. However, molecular studies have focused mainly on protein-coding genes. Using comparative genomics and more than 600,000 conserved regulatory elements, we show that patterns of genome evolution in the vicinity of feather genes are consistent with a major role for regulatory innovation in the evolution of feathers. Rates of innovation at feather regulatory elements exhibit an extended period of innovation with peaks in the ancestors of amniotes and archosaurs. We estimate that 86% of such regulatory elements and 100% of the nonkeratin feather gene set were present prior to the origin of Dinosauria. On the branch leading to modern birds, we detect a strong signal of regulatory innovation near insulin-like growth factor binding protein (IGFBP) 2 and IGFBP5, which have roles in body size reduction, and may represent a genomic signature for the miniaturization of dinosaurian body size preceding the origin of flight.
enhancer; gene regulation; comparative genomics; integument; body size; dinosaur
A fundamental question in evolutionary genetics concerns the roles of mutational pleiotropy and epistasis in shaping trajectories of protein evolution. This question can be addressed most directly by using site-directed mutagenesis to explore the mutational landscape of protein function in experimentally defined regions of sequence space. Here, we evaluate how pleiotropic trade-offs and epistatic interactions influence the accessibility of alternative mutational pathways during the adaptive evolution of hemoglobin (Hb) function in high-altitude pikas (Mammalia: Lagomorpha). By combining ancestral protein resurrection with a combinatorial protein-engineering approach, we examined the functional effects of sequential mutational steps in all possible pathways that produced an increased Hb–O₂ affinity. These experiments revealed that the effects of mutations on Hb–O₂ affinity are highly dependent on the temporal order in which they occur: Each of three β-chain substitutions produced a significant increase in Hb–O₂ affinity on the ancestral genetic background, but two of these substitutions produced opposite effects when they occurred as later steps in the pathway. The experiments revealed pervasive epistasis for Hb–O₂ affinity, but affinity-altering mutations produced no significant pleiotropic trade-offs. These results provide insights into the properties of adaptive substitutions in naturally evolved proteins and suggest that the accessibility of alternative mutational pathways may be more strongly constrained by sign epistasis for positively selected biochemical phenotypes than by antagonistic pleiotropy.
adaptation; epistasis; hemoglobin; high altitude; molecular evolution; protein evolution
An elaborated tripartite brain is considered one of the important innovations of vertebrates. Other extant chordate groups have a more basic brain organization. For instance, cephalochordates possess a relatively simple brain possibly homologous to the vertebrate forebrain and hindbrain, whereas tunicates display the tripartite organization, but without the specialized brain centers. The difference in anatomical complexity is even more pronounced if one compares chordates with other deuterostomes that have only a diffuse nerve net or alternatively a rather simple central nervous system. To gain a new perspective on the evolutionary roots of the complex vertebrate brain, we performed a phylostratigraphic analysis of gene expression patterns in the developing zebrafish (Danio rerio). The recovered adaptive landscape revealed three important periods in the evolutionary history of the zebrafish brain. The oldest period corresponds to preadaptive events in the first metazoans and the emergence of the nervous system at the metazoan–eumetazoan transition. The origin of chordates marks the next phase, where we found the overall strongest adaptive imprint in almost all analyzed brain regions. This finding supports the idea that the vertebrate brain evolved independently of the brains within the protostome lineage. Finally, at the origin of vertebrates we detected a pronounced signal coming from the dorsal telencephalon, in agreement with classical theories that consider this part of the cerebrum a genuine vertebrate innovation. Taken together, these results reveal a stepwise adaptive history of the vertebrate brain where most of its extant organization was already present in the chordate ancestor.
brain evolution; vertebrates; phylostratigraphy; zebrafish; gene expression; development
Whole-genome resequencing of experimental populations evolving under a specific selection regime has become a popular approach to determine genotype–phenotype maps and understand adaptation to new environments. Despite its conceptual appeal and success in identifying some causative genes, it has become apparent that many studies suffer from an excess of candidate loci. Several explanations have been proposed for this phenomenon, but it is clear that information about the linkage structure during such experiments is needed. Until now only Pool-Seq (whole-genome sequencing of pools of individuals) data were available, which do not provide sufficient information about the correlation between linked sites. We address this problem in two complementary analyses of three replicate Drosophila melanogaster populations evolving to a new hot temperature environment for almost 70 generations. In the first analysis, we sequenced 58 haploid genomes from the founder population and evolved flies at generation 67. We show that during the experiment linkage disequilibrium (LD) increased almost uniformly over much greater distances than typically seen in Drosophila. In the second analysis, Pool-Seq time series data of the three replicates were combined with haplotype information from the founder population to follow blocks of initial haplotypes over time. We identified 17 selected haplotype-blocks that started at low frequencies in the base population and increased in frequency during the experiment. The size of these haplotype-blocks ranged from 0.082 to 4.01 Mb. Moreover, between 42% and 46% of the top candidate single nucleotide polymorphisms from the comparison of founder and evolved populations fell into the genomic region covered by the haplotype-blocks. We conclude that LD in such rising haplotype-blocks results in long range hitchhiking over multiple kilobase-sized regions. LD in such haplotype-blocks is therefore a major factor contributing to an excess of candidate loci. Although modifications of the experimental design may help to reduce the hitchhiking effect and allow for more precise mapping of causative variants, we also note that such haplotype-blocks might be well suited to study the dynamics of selected genomic regions during experimental evolution studies.
experimental evolution; haplotype sequencing; long range genetic hitchhiking; time series; standing genetic variation; selection on rare variants
Genome evolution is shaped by a multitude of mutational processes, including point mutations, insertions, and deletions of DNA sequences, as well as segmental duplications. These mutational processes can leave distinctive qualitative marks in the statistical features of genomic DNA sequences. One such feature is the match length distribution (MLD) of exactly matching sequence segments within an individual genome or between the genomes of related species. These have been observed to exhibit characteristic power law decays in many species. Here, we show that simple dynamical models consisting solely of duplication and mutation processes can already explain the characteristic features of MLDs observed in genomic sequences. Surprisingly, we find that these features are largely insensitive to details of the underlying mutational processes and do not necessarily rely on the action of natural selection. Our results demonstrate how analyzing statistical features of DNA sequences can help us reveal and quantify the different mutational processes that underlie genome evolution.
genome evolution; sequence similarities; segmental duplication; comparative genomics
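To make the notion of a match length distribution in the abstract above concrete, here is a minimal sketch that tallies the lengths of maximal runs of identical positions between two sequences. It assumes the two sequences are already aligned and of equal length, which is a simplification chosen for illustration; the article's own procedure for finding exact matches within or between genomes is not described in the abstract, and the function name is hypothetical.

```python
from collections import Counter

def match_length_distribution(seq_a, seq_b):
    """Count maximal runs of identical positions in two aligned sequences.

    Returns a Counter mapping run length -> number of runs of that length.
    Positions beyond the shorter sequence are ignored by zip().
    """
    counts = Counter()
    run = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            run += 1
        else:
            if run:
                counts[run] += 1
            run = 0
    if run:  # close a run that reaches the end of the sequences
        counts[run] += 1
    return counts

if __name__ == "__main__":
    # One mismatch at index 3 splits the alignment into runs of 3 and 4.
    print(match_length_distribution("ACGTACGT", "ACGAACGT"))
```

Plotting the resulting counts against run length on log-log axes is one simple way to look for the power-law decay the abstract mentions.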
Local protein interactions (“molecular context” effects) dictate amino acid replacements and can be described in terms of site-specific, energetic preferences for any different amino acid. It has been recently debated whether these preferences remain approximately constant during evolution or whether, due to coevolution of sites, they change strongly. Such research highlights an unresolved and fundamental issue with far-reaching implications for phylogenetic analysis and molecular evolution modeling. Here, we take advantage of the recent availability of phenotypically supported laboratory resurrections of Precambrian thioredoxins and β-lactamases to experimentally address the change of site-specific amino acid preferences over long geological timescales. Extensive mutational analyses support the notion that evolutionary adjustment to a new amino acid may occur, but to a large extent this is insufficient to erase the primitive preference for amino acid replacements. Generally, site-specific amino acid preferences appear to remain conserved throughout evolutionary history despite local sequence divergence. We show such preference conservation to be readily understandable in molecular terms and we provide crystallographic evidence for an intriguing structural-switch mechanism: Energetic preference for an ancestral amino acid in a modern protein can be linked to reorganization upon mutation to the ancestral local structure around the mutated site. Finally, we point out that site-specific preference conservation naturally leads to one plausible evolutionary explanation for the existence of intragenic global suppressor mutations.
molecular evolution; ancestral proteins; amino acid replacements
Although parasitic organisms are found worldwide, the relative importance of host specificity and geographic isolation for parasite speciation has been explored in only a few systems. Here, we study Plasmodium parasites known to infect Asian nonhuman primates, a monophyletic group that includes the lineage leading to the human parasite Plasmodium vivax and several species used as laboratory models in malaria research. We analyze the available data together with new samples from three sympatric primate species from Borneo: The Bornean orangutan and the long-tailed and the pig-tailed macaques. We find several species of malaria parasites, including three putatively new species in this biodiversity hotspot. Among those newly discovered lineages, we report two sympatric parasites in orangutans. We find no differences in the sets of malaria species infecting each macaque species indicating that these species show no host specificity. Finally, phylogenetic analysis of these data suggests that the malaria parasites infecting Southeast Asian macaques and their relatives are speciating three to four times more rapidly than those with other mammalian hosts such as lemurs and African apes. We estimate that these events took place in approximately a 3–4-Ma period. Based on the genetic and phenotypic diversity of the macaque malarias, we hypothesize that the diversification of this group of parasites has been facilitated by the diversity, geographic distributions, and demographic histories of their primate hosts.
host range; macaques; malaria; orangutan; parasite speciation; phylogeny; Plasmodium; population structure
The resurrection of ancestral proteins provides direct insight into how natural selection has shaped proteins found in nature. By tracing substitutions along a gene phylogeny, ancestral proteins can be reconstructed in silico and subsequently synthesized in vitro. This elegant strategy reveals the complex mechanisms responsible for the evolution of protein functions and structures. However, to date, all protein resurrection studies have used simplistic approaches for ancestral sequence reconstruction (ASR), including the assumption that a single sequence alignment alone is sufficient to accurately reconstruct the history of the gene family. The impact of such shortcuts on conclusions about ancestral functions has not been investigated. Here, we show with simulations that utilizing information on species history using a model that accounts for the duplication, horizontal transfer, and loss (DTL) of genes statistically increases ASR accuracy. This underscores the importance of the tree topology in the inference of putative ancestors. We validate our in silico predictions using in vitro resurrection of the LeuB enzyme for the ancestor of the Firmicutes, a major and ancient bacterial phylum. With this particular protein, our experimental results demonstrate that information on the species phylogeny results in a biochemically more realistic and kinetically more stable ancestral protein. Additional resurrection experiments with different proteins are necessary to statistically quantify the impact of using species tree-aware gene trees on ancestral protein phenotypes. Nonetheless, our results suggest the need for incorporating both sequence and DTL information in future studies of protein resurrections to accurately define the genotype–phenotype space in which proteins diversify.
ancestral sequence reconstruction; protein resurrection; gene tree reconciliation; lateral gene transfer; protein evolution; phylogeny
Widespread premature termination codon mutations (PTCs) were recently observed in human and fly populations. We took advantage of the population resequencing data in the Drosophila Genetic Reference Panel to investigate how the expression profile and the evolutionary age of genes shaped the allele frequency distribution of PTCs. After generating a high-quality data set of PTCs, we clustered genes harboring PTCs into three categories: genes encoding low-frequency PTCs (≤1.5%), moderate-frequency PTCs (1.5–10%), and high-frequency PTCs (>10%). All three groups show narrow transcription compared with PTC-free genes, with the moderate- and high-PTC frequency groups showing a pronounced pattern. Moreover, nearly half (42%) of the PTC-encoding genes are not expressed in any tissue. Interestingly, the moderate-frequency PTC group is strongly enriched for genes expressed in midgut, whereas genes harboring high-frequency PTCs tend to have sex-specific expression. We further find that although young genes born in the last 60 My compose a mere 9% of the genome, they represent 16%, 30%, and 50% of the genes containing low-, moderate-, and high-frequency PTCs, respectively. Among DNA-based and RNA-based duplicated genes, the child copy is approximately twice as likely to contain PTCs as the parent copy, whereas young de novo genes are as likely to encode PTCs as DNA-based duplicated new genes. Based on these results, we conclude that expression profile and gene age jointly shaped the landscape of PTC-mediated gene loss. Therefore, we propose that new genes may need a long time to become stably maintained after the origination.
gene loss; premature termination codon; midgut; young gene; gene duplication
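The three frequency categories in the abstract above can be written down as a small classifier. The thresholds (≤1.5%, 1.5–10%, >10%) come from the abstract; the function name, the handling of the exact boundary values, and the example frequencies are assumptions made for illustration.

```python
from collections import Counter

def ptc_frequency_class(freq_percent):
    """Assign a PTC allele frequency (in percent) to one of three classes.

    Boundaries follow the abstract: low is <= 1.5%, high is > 10%,
    and everything in between is treated as moderate.
    """
    if not 0.0 <= freq_percent <= 100.0:
        raise ValueError("frequency must be a percentage between 0 and 100")
    if freq_percent <= 1.5:
        return "low"
    if freq_percent <= 10.0:
        return "moderate"
    return "high"

if __name__ == "__main__":
    # Hypothetical allele frequencies, not data from the study.
    example = [0.6, 1.5, 4.2, 10.0, 37.5]
    print(Counter(ptc_frequency_class(f) for f in example))
```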
We estimated the spontaneous mutation rate in Heliconius melpomene by genome sequencing of a pair of parents and 30 of their offspring, based on the ratio of number of de novo heterozygotes to the number of callable site-individuals. We detected nine new mutations, each one affecting a single site in a single offspring. This yields an estimated mutation rate of 2.9 × 10⁻⁹ (95% confidence interval, 1.3 × 10⁻⁹–5.5 × 10⁻⁹), which is similar to recent estimates in Drosophila melanogaster, the only other insect species in which the mutation rate has been directly estimated. We infer that recent effective population size of H. melpomene is about 2 million, a substantially lower value than its census size, suggesting a role for natural selection reducing diversity. We estimate that H. melpomene diverged from its Müllerian comimic H. erato about 6 Ma, a somewhat later date than estimates based on a local molecular clock.
mutation; Heliconius; genome sequencing
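One plausible way to obtain a confidence interval like the one in the abstract above is to treat the count of nine de novo mutations as Poisson distributed and compute an exact interval for its mean, then rescale by the point estimate. The abstract does not say how the interval was derived, so the Poisson assumption and all function names below are illustrative only.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def _bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def exact_poisson_ci(k, level=0.95):
    """Exact two-sided interval for a Poisson mean given an observed count k."""
    a = (1.0 - level) / 2.0
    hi_bracket = k + 10.0 * math.sqrt(k + 1.0) + 10.0
    # Lower bound: the mean at which P(X >= k) equals a (zero when k == 0).
    lower = 0.0 if k == 0 else _bisect(
        lambda lam: (1.0 - poisson_cdf(k - 1, lam)) - a, 0.0, hi_bracket)
    # Upper bound: the mean at which P(X <= k) equals a.
    upper = _bisect(lambda lam: poisson_cdf(k, lam) - a, 0.0, hi_bracket)
    return lower, upper

if __name__ == "__main__":
    k, rate = 9, 2.9e-9  # count and point estimate quoted in the abstract
    lo, hi = exact_poisson_ci(k)
    print(rate * lo / k, rate * hi / k)
```

Under this assumption the rescaled bounds fall close to the 1.3 × 10⁻⁹ and 5.5 × 10⁻⁹ quoted in the abstract, which is consistent with, but does not prove, a Poisson-based interval.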
Measuring natural selection on genomic elements involved in the cis-regulation of gene expression—such as transcriptional enhancers and promoters—is critical for understanding the evolution of genomes, yet it remains a major challenge. Many studies have attempted to detect positive or negative selection in these noncoding elements by searching for those with the fastest or slowest rates of evolution, but this can be problematic. Here, we introduce a new approach to this issue, and demonstrate its utility on three mammalian transcriptional enhancers. Using results from saturation mutagenesis studies of these enhancers, we classified all possible point mutations as upregulating, downregulating, or silent, and determined which of these mutations have occurred on each branch of a phylogeny. Applying a framework analogous to Ka/Ks in protein-coding genes, we measured the strength of selection on upregulating and downregulating mutations, in specific branches as well as entire phylogenies. We discovered distinct modes of selection acting on different enhancers: although all three have experienced negative selection against downregulating mutations, the selection pressures on upregulating mutations vary. In one case, we detected positive selection for upregulation, whereas the other two had no detectable selection on upregulating mutations. Our methodology is applicable to the growing number of saturation mutagenesis data sets, and provides a detailed picture of the mode and strength of natural selection acting on cis-regulatory elements.
natural selection; noncoding; enhancer; neutral; cis-regulation
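The Ka/Ks analogy in the abstract above can be sketched as a ratio of two normalized substitution rates: substitutions observed in one functional class (for example upregulating) per possible mutation of that class, divided by the same quantity for silent mutations. This is only a reading of the analogy, not the article's actual estimator; the function name, argument names, and example counts are hypothetical.

```python
def selection_ratio(obs_class, possible_class, obs_silent, possible_silent):
    """Ka/Ks-style ratio for a class of regulatory mutations.

    obs_class / possible_class: substitutions observed on a branch in the
    class of interest, per possible point mutation of that class.
    obs_silent / possible_silent: the same for mutations classified as silent.
    By analogy with Ka/Ks, values below 1 would suggest negative selection
    against the class and values above 1 would suggest positive selection.
    """
    if possible_class <= 0 or possible_silent <= 0:
        raise ValueError("numbers of possible mutations must be positive")
    if obs_silent <= 0:
        raise ValueError("need at least one silent substitution to normalize")
    class_rate = obs_class / possible_class
    silent_rate = obs_silent / possible_silent
    return class_rate / silent_rate

if __name__ == "__main__":
    # Hypothetical counts: 2 of 100 possible downregulating mutations fixed,
    # versus 10 of 100 possible silent mutations.
    print(selection_ratio(2, 100, 10, 100))
```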
Influenza B viruses make a considerable contribution to morbidity attributed to seasonal influenza. Currently circulating influenza B isolates are known to belong to two antigenically distinct lineages referred to as B/Victoria and B/Yamagata. Frequent exchange of genomic segments of these two lineages has been noted in the past, but the observed patterns of reassortment have not been formalized in detail. We investigate interlineage reassortments by comparing phylogenetic trees across genomic segments. Our analyses indicate that of the eight segments of influenza B viruses only segments coding for polymerase basic 1 and 2 (PB1 and PB2) and hemagglutinin (HA) proteins have maintained separate Victoria and Yamagata lineages and that currently circulating strains possess PB1, PB2, and HA segments derived entirely from one or the other lineage; other segments have repeatedly reassorted between lineages thereby reducing genetic diversity. We argue that this difference between segments is due to selection against reassortant viruses with mixed-lineage PB1, PB2, and HA segments. Given sufficient time and continued recruitment to the reassortment-isolated PB1–PB2–HA gene complex, we expect influenza B viruses to eventually undergo sympatric speciation.
influenza; reassortment; evolution; phylogenetics; speciation
A method was developed for simultaneous Bayesian inference of species delimitation and species phylogeny using the multispecies coalescent model. The method eliminates the need for a user-specified guide tree in species delimitation and incorporates phylogenetic uncertainty in a Bayesian framework. The nearest-neighbor interchange algorithm was adapted to propose changes to the species tree, with the gene trees for multiple loci altered in the proposal to avoid conflicts with the newly proposed species tree. We also modify our previous scheme for specifying priors for species delimitation models to construct joint priors for models of species delimitation and species phylogeny. As in our earlier method, the modified algorithm integrates over gene trees, taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. We conducted a simulation study to examine the statistical properties of the method using six populations (two sequences each) and a true number of three species, with values of divergence times and ancestral population sizes that are realistic for recently diverged species. The results suggest that the method tends to be conservative with high posterior probabilities being a confident indicator of species status. Simulation results also indicate that the power of the method to delimit species increases with an increase of the divergence times in the species tree, and with an increased number of gene loci. Reanalyses of two data sets of cavefish and coast horned lizards suggest considerable phylogenetic uncertainty even though the data are informative about species delimitation. We discuss the impact of the prior on models of species delimitation and species phylogeny and of the prior on population size parameters (θ) on Bayesian species delimitation.
Bayesian species delimitation; species tree; multispecies coalescent; reversible-jump MCMC; guide tree; nearest-neighbor interchange
The POU genes represent a diverse class of animal-specific transcription factors that play important roles in neurogenesis, pluripotency, and cell-type specification. Although previous attempts have been made to reconstruct the evolution of the POU class, these studies have been limited by a small number of representative taxa, and a lack of sequences from basally branching organisms. In this study, we performed comparative analyses on available genomes and sequences recovered through “gene fishing” to better resolve the topology of the POU gene tree. We then used ancestral state reconstruction to map the most likely changes in amino acid evolution for the conserved domains. Our work suggests that four of the six POU families evolved before the last common ancestor of living animals—doubling previous estimates—and were followed by extensive clade-specific gene loss. Amino acid changes are distributed unequally across the gene tree, consistent with a neofunctionalization model of protein evolution. We consider our results in the context of early animal evolution, and the role of POU5 genes in maintaining stem cell pluripotency.
POU; Metazoa; homeobox; EvoDevo; stem cells; gene duplication
Following domestication, sheep (Ovis aries) have become essential farmed animals across the world through adaptation to a diverse range of environments and varied production systems. Climate-mediated selective pressure has shaped phenotypic variation and has left genetic “footprints” in the genome of breeds raised in different agroecological zones. Unlike numerous studies that have searched for evidence of selection using only population genetics data, here, we conducted an integrated coanalysis of environmental data with single nucleotide polymorphism (SNP) variation. By examining 49,034 SNPs from 32 old, autochthonous sheep breeds that are adapted to a spectrum of different regional climates, we identified 230 SNPs with evidence for selection that is likely due to climate-mediated pressure. Among them, 189 (82%) showed significant correlation (P ≤ 0.05) between allele frequency and climatic variables in a larger set of native populations from a worldwide range of geographic areas and climates. Gene ontology analysis of genes colocated with significant SNPs identified 17 candidates related to GTPase regulator and peptide receptor activities in the biological processes of energy metabolism and endocrine and autoimmune regulation. We also observed high linkage disequilibrium and significant extended haplotype homozygosity for the core haplotype TBC1D12-CH1 of TBC1D12. The global frequency distribution of the core haplotype and allele OAR22_18929579-A showed an apparent geographic pattern and significant (P ≤ 0.05) correlations with climatic variation. Our results imply that adaptations to local climates have shaped the spatial distribution of some variants that are candidates to underpin adaptive variation in sheep.
adaptation; climate-mediated selection; genome-wide scans; GTPase regulator; peptide receptor; TBC1D12; sheep
Several methods have been proposed to test for introgression across genomes. One method tests for a genome-wide excess of shared derived alleles between taxa using Patterson’s D statistic, but does not establish which loci show such an excess or whether the excess is due to introgression or ancestral population structure. Several recent studies have extended the use of D by applying the statistic to small genomic regions, rather than genome-wide. Here, we use simulations and whole-genome data from Heliconius butterflies to investigate the behavior of D in small genomic regions. We find that D is unreliable in this situation as it gives inflated values when effective population size is low, causing D outliers to cluster in genomic regions of reduced diversity. As an alternative, we propose a related statistic f̂d, a modified version of a statistic originally developed to estimate the genome-wide fraction of admixture. f̂d is not subject to the same biases as D, and is better at identifying introgressed loci. Finally, we show that both D and f̂d outliers tend to cluster in regions of low absolute divergence (dXY), which can confound a recently proposed test for differentiating introgression from shared ancestral variation at individual loci.
ABBA–BABA; gene flow; introgression; population structure; Heliconius; simulation
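The sketch below shows how Patterson's D, and the alternative statistic proposed in the abstract above (called `f_d` in the code), might be computed from per-site derived-allele frequencies in populations P1, P2, P3 and an outgroup O. The frequency-based ABBA and BABA terms follow the form commonly used in the ABBA–BABA literature; the `f_d` denominator, which replaces P2 and P3 by whichever of the two has the higher derived-allele frequency at each site, is my reading of the statistic and should be checked against the article. All names are illustrative.

```python
def _abba_baba(p1, p2, p3, po):
    """Per-site ABBA and BABA terms from derived-allele frequencies."""
    abba = (1 - p1) * p2 * p3 * (1 - po)
    baba = p1 * (1 - p2) * p3 * (1 - po)
    return abba, baba

def patterson_d(P1, P2, P3, PO):
    """D = sum(ABBA - BABA) / sum(ABBA + BABA) over sites."""
    num = den = 0.0
    for p1, p2, p3, po in zip(P1, P2, P3, PO):
        abba, baba = _abba_baba(p1, p2, p3, po)
        num += abba - baba
        den += abba + baba
    return num / den if den else float("nan")

def f_d(P1, P2, P3, PO):
    """Observed ABBA - BABA excess relative to the excess expected if, at
    each site, P2 and P3 both carried the higher of their two frequencies."""
    num = den = 0.0
    for p1, p2, p3, po in zip(P1, P2, P3, PO):
        abba, baba = _abba_baba(p1, p2, p3, po)
        num += abba - baba
        pd = max(p2, p3)  # hypothetical donor frequency at this site
        abba_d, baba_d = _abba_baba(p1, pd, pd, po)
        den += abba_d - baba_d
    return num / den if den else float("nan")

if __name__ == "__main__":
    # Three toy sites with fixed alleles (0 = ancestral, 1 = derived).
    P1, P2, P3, PO = [0, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0]
    print(patterson_d(P1, P2, P3, PO), f_d(P1, P2, P3, PO))
```

In the toy input there are two ABBA sites and one BABA site, so D is (2 − 1) / (2 + 1); applying either function to windows of sites rather than whole genomes is the setting the abstract is concerned with.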