Estimates of the proportion of amino acid substitutions that have been fixed by selection (α) vary widely among taxa, ranging from zero in humans to over 50% in Drosophila. This wide range may reflect differences in the efficacy of selection due to differences in the effective population size (Ne). However, most comparisons have been made among distantly related organisms that differ not only in Ne but also in many other aspects of their biology. Here, we estimate α in three closely related lineages of house mice that have a similar ecology but differ widely in Ne: Mus musculus musculus (Ne ∼ 25,000–120,000), M. m. domesticus (Ne ∼ 58,000–200,000), and M. m. castaneus (Ne ∼ 200,000–733,000). Mice were genotyped using a high-density single nucleotide polymorphism array, and the proportions of replacement and silent mutations within subspecies were compared with those fixed between each subspecies and an outgroup, Mus spretus. There was significant evidence of positive selection in M. m. castaneus, the lineage with the largest Ne, with α estimated to be approximately 40%. In contrast, estimates of α for M. m. domesticus (α = 13%) and for M. m. musculus (α = 12%) were much smaller. Interestingly, the higher estimate of α for M. m. castaneus appears to reflect not only more adaptive fixations but also more effective purifying selection. These results support the hypothesis that differences in Ne contribute to differences among species in the efficacy of selection.
substitution; adaptation; evolution; effective population size; house mouse; Mus musculus
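The comparison of replacement and silent polymorphism with replacement and silent divergence described in the house mouse abstract above can be illustrated with the standard McDonald-Kreitman-based point estimate, α = 1 − (Ds·Pn)/(Dn·Ps). The abstract does not state which estimator was used, so this is only a minimal sketch under that assumption; the function name and the counts in the usage example are hypothetical and are not data from the study.

```python
def mk_alpha(dn, ds, pn, ps):
    """Point estimate of alpha from a McDonald-Kreitman table.

    dn, ds: fixed replacement and silent differences versus the outgroup.
    pn, ps: replacement and silent polymorphisms within the ingroup.
    Assumes the standard estimator alpha = 1 - (ds * pn) / (dn * ps).
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be nonzero for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts chosen only to illustrate the arithmetic.
example_alpha = mk_alpha(dn=100, ds=200, pn=60, ps=200)
print(round(example_alpha, 3))
```

With these made-up counts, the replacement-to-silent ratio is 0.5 for divergence and 0.3 for polymorphism, so the estimate is 1 − 0.3/0.5. Estimates of this kind can be biased downward by slightly deleterious polymorphisms, which is one reason more elaborate estimators exist.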
Neanderthals have been shown to share more genetic variants with present-day non-Africans than with Africans. Recent admixture between Neanderthals and modern humans outside of Africa was proposed as the most parsimonious explanation for this observation. However, the hypothesis of ancient population structure within Africa could not be ruled out as an alternative explanation. We use simulations to test whether the site frequency spectrum, conditioned on a derived Neanderthal and an ancestral Yoruba (African) nucleotide (the doubly conditioned site frequency spectrum [dcfs]), can distinguish between models that assume recent admixture or ancient population structure. We compare the simulations to the dcfs calculated from data taken from populations of European, Chinese, and Japanese descent in the Complete Genomics Diversity Panel. Simulations under a variety of plausible demographic parameters were used to examine the shape of the dcfs for both models. The observed shape of the dcfs cannot be explained by any set of parameter values used in the simulations of the ancient structure model. The dcfs simulations for the recent admixture model provide a good fit to the observed dcfs for non-Africans, thereby supporting the hypothesis that recent admixture with Neanderthals accounts for the greater similarity of Neanderthals to non-Africans than to Africans.
admixture; ancient structure; Neanderthal; frequency spectrum
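The double conditioning behind the dcfs in the abstract above can be sketched as follows: keep only sites where the Neanderthal carries the derived allele and the Yoruba sample carries the ancestral allele, then tabulate the derived-allele counts in the non-African sample. The input layout, the function name, and the choice to include the class in which the derived allele is fixed in the sample are assumptions made for illustration, not details taken from the study.

```python
from collections import Counter


def doubly_conditioned_sfs(sites, n):
    """Normalized derived-allele count spectrum for a sample of n chromosomes.

    sites: iterable of (neanderthal_derived, yoruba_ancestral, derived_count)
           tuples; this layout is hypothetical.
    Only sites passing both conditions and with 1 <= derived_count <= n count.
    """
    counts = Counter(
        k
        for neand_derived, yoruba_ancestral, k in sites
        if neand_derived and yoruba_ancestral and 1 <= k <= n
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n
    # Entry i holds the proportion of retained sites with i + 1 derived copies.
    return [counts[k] / total for k in range(1, n + 1)]


# Toy input: three sites pass both conditions with a nonzero derived count.
toy_sites = [
    (True, True, 1),
    (True, True, 1),
    (True, True, 3),
    (False, True, 2),  # Neanderthal ancestral: excluded
    (True, False, 2),  # Yoruba derived: excluded
    (True, True, 0),   # derived allele absent from the sample: excluded
]
print(doubly_conditioned_sfs(toy_sites, n=4))
```

Comparing the shape of such a spectrum between simulated and observed data is the kind of contrast the abstract describes; the simulation models themselves are outside the scope of this sketch.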
Extensive synonymous codon modification of viral genomes appears to be an effective way of attenuating strains for use as live vaccines. An assumption of this method is that codon changes have individually small effects, such that codon-attenuated viruses will be slow to evolve back to high fitness (and thus to high virulence). The major capsid gene of the bacterial virus T7 was modified to have varying levels of suboptimal synonymous codons in different constructs, and fitnesses declined linearly with the number of changes. Adaptation of the most extreme design, with 182 codon changes, resulted in a slow fitness recovery by standards of previous experimental evolution with this virus, although fitness effects of substitutions were higher than expected from the average effect of an engineered codon modification. Molecular evolution during recovery was modest, and changes evolved both within the modified gene and outside it. Some changes within the modified gene evolved in parallel across replicates, but with no obvious explanation. Overall, the study supports the premise that codon-modified viruses recover fitness slowly, although the evolution is substantially more rapid than expected from the design principle.
codon modification; vaccine design; fitness suppression; genome engineering
Although both genotypes with elevated mutation rate (mutators) and mobilization of insertion sequence (IS) elements have a substantial impact on genome diversification, their potential interactions are unknown. Moreover, the evolutionary forces driving gradual accumulation of these elements are unclear: Do these elements spread in an initially transposon-free bacterial genome as they enable rapid adaptive evolution? To address these issues, we inserted an active IS1 element into a reduced Escherichia coli genome devoid of all other mobile DNA. Evolutionary laboratory experiments revealed that IS elements increase mutational supply and occasionally generate variants with especially large phenotypic effects. However, their impact on adaptive evolution is small compared with mismatch repair mutator alleles, and hence, the latter impede the spread of IS-carrying strains. Given their ubiquity in natural populations, such mutator alleles could limit the early phase of IS element evolution in a new bacterial host. More generally, our work demonstrates the existence of an evolutionary conflict between mutation-promoting mechanisms.
evolution; IS elements; mutation rate
Endogenous retroviruses provide molecular fossils for studying the ancient evolutionary history of retroviruses. Here, we report our independent discovery and analysis of endogenous lentiviral insertions (Mustelidae endogenous lentivirus [MELV]) within the genomes of the weasel family (Mustelidae). Genome-scale screening identified MELV elements in the domestic ferret (Mustela putorius furo) genome (MELVmpf). MELVmpf exhibits a typical lentiviral genomic organization. Phylogenetic analyses position MELVmpf basal to either primate lentiviruses or feline immunodeficiency virus. Moreover, we verified the presence of MELV insertions in the genomes of several species of the Lutrinae and Mustelinae subfamilies but not the Martinae subfamily, suggesting that the invasion of MELV into the Mustelidae genomes likely took place between 8.8 and 11.8 Ma. The discovery of MELV in weasel genomes extends the host range of lentiviruses to the Caniformia (order Carnivora) and provides important insights into the prehistoric diversity of lentiviruses.
weasel; lentivirus; endogenous retrovirus
Genome-wide disease association studies contrast genetic variation between disease cohorts and healthy populations to discover single nucleotide polymorphisms (SNPs) and other genetic markers revealing underlying genetic architectures of human diseases. Despite scores of efforts over the past decade, many reproducible genetic variants that explain substantial proportions of the heritable risk of common human diseases remain undiscovered. We have conducted a multispecies genomic analysis of 5,831 putative human risk variants for more than 230 disease phenotypes reported in 2,021 studies. We find that the current approaches show a propensity for discovering disease-associated SNPs (dSNPs) at conserved genomic positions because the effect size (odds ratio) and allelic P value of genetic association of an SNP relate strongly to the evolutionary conservation of its genomic position. We propose a new measure for ranking SNPs that integrates evolutionary conservation scores and the P value (E-rank). Using published data from a large case-control study, we demonstrate that the E-rank method prioritizes SNPs with a greater likelihood of bona fide and reproducible genetic disease associations, many of which may explain greater proportions of genetic variance. Therefore, long-term evolutionary histories of genomic positions offer key practical utility in reassessing data from existing disease association studies, and in the design and analysis of future studies aimed at revealing the genetic basis of common human diseases.
phylomedicine; GWAS; heritability
The voltage-sensitive phosphoinositide phosphatases provide a mechanism to couple changes in the transmembrane electrical potential to intracellular signal transduction pathways. These proteins share a domain architecture that is conserved in deuterostomes. However, gene duplication events in primates, including humans, give rise to the paralogs TPTE and TPTE2 that retain protein domain organization but, in the case of TPTE, have lost catalytic activity. Here, we present evidence that these human proteins contain a functional voltage sensor, similar to that in nonmammalian orthologs. However, domains of these human proteins can also generate a noninactivating outward current that is not observed in zebrafish or tunicate orthologs. This outward current has the anticipated characteristics of a voltage-sensitive proton current and is due to the appearance of a single histidine residue in the S4 transmembrane segment of the voltage sensor. Histidine is observed at this position only during the eutherian radiation. Domains from both human paralogs generate proton currents. This apparent gain of proton channel function during the evolution of the TPTE protein family may account for the conservation of voltage sensor domains despite the loss of phosphatase activity in some human paralogs.
voltage sensor domain; proton channel; ion channel; phosphoinositide phosphatase; sperm
Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. We here investigate the performance of some of these new marginal likelihood estimators, specifically, path sampling (PS) and stepping-stone (SS) sampling for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), termed AICM, a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed.
The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
model comparison; marginal likelihood; Bayes factors; path sampling; stepping-stone sampling; demographic models; molecular clock; Bayesian inference; phylogeny; BEAST
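The two low-overhead estimators discussed in the abstract above, the HME and the AICM, can both be computed from the log-likelihood values sampled during a single MCMC run. The sketch below is not the BEAST implementation. It assumes the harmonic mean of the sampled likelihoods for the HME, and the moment-based form AICM = 2s² − 2l̄ (where l̄ and s² are the sample mean and variance of the posterior log-likelihoods, and lower values indicate a better model). The function names and inputs are hypothetical.

```python
import math


def log_harmonic_mean(log_liks):
    """Log marginal likelihood estimate from the harmonic mean of likelihoods.

    Computed as log(n) - logsumexp(-log_liks) to avoid numerical underflow.
    """
    negated = [-value for value in log_liks]
    largest = max(negated)
    log_sum = largest + math.log(sum(math.exp(v - largest) for v in negated))
    return math.log(len(negated)) - log_sum


def aicm(log_liks):
    """AICM = 2 * variance - 2 * mean of posterior log-likelihoods.

    Uses the unbiased sample variance; lower values indicate a better model
    under this sign convention.
    """
    n = len(log_liks)
    if n < 2:
        raise ValueError("at least two samples are needed for a variance")
    mean = sum(log_liks) / n
    variance = sum((value - mean) ** 2 for value in log_liks) / (n - 1)
    return 2.0 * variance - 2.0 * mean


# Toy posterior log-likelihood samples, for illustration only.
samples = [-10.0, -12.0, -14.0]
print(log_harmonic_mean(samples), aicm(samples))
```

Because the harmonic mean is dominated by the smallest sampled likelihoods, which are rarely visited, its variance can be very large. Path sampling and stepping-stone sampling avoid this by running additional chains along a path between prior and posterior, at the higher computational cost the abstract mentions.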
It has become increasingly clear that changes in gene regulation play important roles in physiological and phenotypic evolution. Rewiring of gene-regulatory networks, i.e., alteration of the gene-regulation system for different biological functions, has been demonstrated in various species. Posttranscriptional regulons have prominent roles in coordinating gene expression in a variety of eukaryotes. In this study, using Puf4p in fungi as an example, we demonstrate that posttranscriptional regulatory networks can also be rewired during evolution. Although Puf4p is highly conserved in fungi, targets of the posttranscriptional regulon are functionally diverse among known fungal species. In the Saccharomycotina subdivision, target genes of Puf4p mostly function in the nucleolus; however, in the Pezizomycotina subdivision, they are enriched in the mitochondria. Furthermore, we demonstrate different regulation efficiencies of mitochondrial function by PUF proteins in different fungal clades. Our results indicate that rewiring of posttranscriptional regulatory networks may be an important way of generating genetic novelties in gene regulation during evolution.
posttranscriptional regulon; evolution of gene regulation; yeast
Studies in evolutionary developmental biology suggest that the structure of genetic pathways may bias the fixation of natural variation toward particular nodes in these pathways. In an attempt to test this trend genome wide, we integrated several previously published data sets to examine whether the position of genes in the whole-genome transcriptional network of Saccharomyces cerevisiae is associated with the amount of cis-regulatory expression divergence between S. cerevisiae and its sibling species Saccharomyces paradoxus. We find little evidence for an association between connectivity and divergence in the global network that combines data from multiple conditions. However, relationships between connectivity and divergence are apparent in some of the smaller subnetworks. Despite a slight tendency for genes with more transcriptional interactions to show greater divergence, these differences explain no more than a small fraction of variation in evolutionary rates. These results suggest that the systems biology focus on large interactomes may miss some critical details of local interactions. More detailed experimental analysis will be needed to define the genetic pathways that control specific phenotypic traits and quantify the rate of regulatory changes at different points in these pathways.
gene networks; cis-regulatory evolution; pleiotropy; species divergence; S. cerevisiae; S. paradoxus
Molecular evolutionary theory predicts that the ratio of autosomal to X-linked adaptive substitution (KA/Kx) is primarily determined by the average dominance coefficient of beneficial mutations. Although this theory has profoundly influenced analysis and interpretation of comparative genomic data, its predictions are based upon two unverified assumptions about the genetic basis of adaptation. The theory assumes that 1) the rate of adaptively driven molecular evolution is limited by the availability of beneficial mutations, and 2) the scaling of evolutionary parameters between the X and the autosomes (e.g., the beneficial mutation rate, and the fitness effect distribution of beneficial alleles, per X-linked versus autosomal locus) is constant across molecular evolutionary timescales. Here, we show that the genetic architecture underlying bouts of adaptive substitution can influence both assumptions, and consequently, the theoretical relationship between KA/Kx and mean dominance. Quantitative predictions of prior theory apply when 1) many genomically dispersed genes potentially contribute beneficial substitutions during individual steps of adaptive walks, and 2) the population beneficial mutation rate, summed across the set of potentially contributing genes, is sufficiently small to ensure that adaptive substitutions are drawn from new mutations rather than standing genetic variation. Current research into the genetic basis of adaptation suggests that both assumptions are plausibly violated. We find that the qualitative positive relationship between mean dominance and KA/Kx is relatively robust to the specific conditions underlying adaptive substitution, yet the quantitative relationship between dominance and KA/Kx is quite flexible and context dependent. This flexibility may partially account for the puzzlingly variable X versus autosome substitution patterns reported in the empirical evolutionary genomics literature. 
The new theory unites the previously separate analysis of adaptation using new mutations versus standing genetic variation and makes several useful predictions about the interaction between genetic architecture, evolutionary genetic constraints, and effective population size in determining the ratio of adaptive substitution between autosomal and X-linked genes.
dominance; epistasis; genetics of adaptation; soft sweeps; molecular evolution
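For orientation, the classical dependence of the autosomal to X-linked ratio on dominance, which the abstract above takes as its starting point, can be written out under simplifying assumptions that the abstract does not spell out. These are: substitutions arise from new mutations only, the sex ratio is equal, the selection coefficient s and dominance coefficient h are the same in both sexes, males are hemizygous for the X, and the per-locus beneficial mutation rate μ is the same on the X and the autosomes. The symbols N, μ, s, and h are introduced here for illustration.

```latex
% Autosomal locus: 2N\mu new mutations per generation, each fixing with
% probability approximately 2hs.
K_A \;\propto\; 2N\mu \cdot 2hs \;=\; 4N\mu h s

% X-linked locus: \tfrac{3}{2}N\mu new mutations per generation; selection
% acts with effect s in hemizygous males (one third of X copies) and hs in
% heterozygous females (two thirds), giving fixation probability
% approximately 2s(1+2h)/3.
K_X \;\propto\; \tfrac{3}{2}N\mu \cdot \frac{2s\,(1+2h)}{3} \;=\; N\mu s\,(1+2h)

% Ratio of autosomal to X-linked adaptive substitution:
\frac{K_A}{K_X} \;=\; \frac{4h}{1+2h}
```

Under these assumptions the ratio equals 1 at h = 1/2, falls below 1 for partially recessive beneficial mutations (h < 1/2), and rises to 4/3 for fully dominant ones (h = 1). The abstract's point is that this quantitative relationship changes once standing variation or a restricted set of contributing genes is allowed for.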
The multispecies coalescent provides an elegant theoretical framework for estimating species trees and species demographics from genetic markers. However, practical applications of the multispecies coalescent model are limited by the need to integrate or sample over all gene trees possible for each genetic marker. Here we describe a polynomial-time algorithm that computes the likelihood of a species tree directly from the markers under a finite-sites model of mutation, effectively integrating over all possible gene trees. The method applies to independent (unlinked) biallelic markers such as well-spaced single nucleotide polymorphisms, and we have implemented it in SNAPP, a Markov chain Monte Carlo sampler for inferring species trees, divergence dates, and population sizes. We report results from simulation experiments and from an analysis of 1,997 amplified fragment length polymorphism loci in 69 individuals sampled from six species of Ourisia (New Zealand native foxglove).
multispecies coalescent; species trees; SNP; AFLP; effective population size; SNAPP
Myeloperoxidase (MPO) is a member of the mammalian heme peroxidase (MHP) multigene family. Whereas all MHPs oxidize specific halides to generate the corresponding hypohalous acid, MPO is unique in its capacity to oxidize chloride at physiologic pH to produce hypochlorous acid (HOCl), a potent microbicide that contributes to neutrophil-mediated host defense against infection. We have previously resolved the evolutionary relationships in this functionally diverse multigene family and predicted in silico that positive Darwinian selection played a major role in the observed functional diversities (Loughran NB, O'Connor B, O'Fagain C, O'Connell MJ. 2008. The phylogeny of the mammalian heme peroxidases and the evolution of their diverse functions. BMC Evol Biol. 8:101). In this work, we have replaced positively selected residues asparagine 496 (N496), tyrosine 500 (Y500), and leucine 504 (L504) with the amino acids present in the ancestral MHP and have examined the effects on the structure, biosynthesis, and activity of MPO. Analysis in silico predicted that N496F, Y500F, or L504T would perturb hydrogen bonding in the heme pocket of MPO and thus disrupt the structural integrity of the enzyme. Biosynthesis of the mutants stably expressed in human embryonic kidney 293 cells yielded apoproMPO, the heme-free, enzymatically inactive precursor of MPO, that failed to undergo normal maturation or proteolytic processing. As a consequence of the maturational arrest at the apoproMPO stage of development, cells expressing MPO with mutations N496F, Y500F, L504T, individually or in combination, lacked normal peroxidase or chlorinating activity. Taken together, our data provide further support for the in silico predictions of positive selection and highlight the correlation between positive selection and functional divergence. Our data demonstrate that directly probing the functional importance of positive selection can provide important insights into understanding protein evolution.
myeloperoxidase; animal peroxidase family; positive selection; protein evolution; Darwinian selection; functional shift
We previously reported a human-specific gene conversion of SIGLEC11 by an adjacent paralogous pseudogene (SIGLEC16P), generating a uniquely human form of the Siglec-11 protein, which is expressed in the human brain. Here, we show that Siglec-11 is expressed exclusively in microglia in all human brains studied, a finding of potential relevance to brain evolution, as microglia modulate neuronal survival, and Siglec-11 recruits SHP-1, a tyrosine phosphatase that modulates microglial biology. Following the recent finding of a functional SIGLEC16 allele in human populations, further analysis of the human SIGLEC11 and SIGLEC16/P sequences revealed an unusual series of gene conversion events between two loci. Two tandem and likely simultaneous gene conversions occurred from SIGLEC16P to SIGLEC11 with a potentially deleterious intervening short segment happening to be excluded. One of the conversion events also changed the 5′ untranslated sequence, altering predicted transcription factor binding sites. Both of the gene conversions have been dated to ∼1–1.2 Ma, after the emergence of the genus Homo, but prior to the emergence of the common ancestor of Denisovans and modern humans about 800,000 years ago, thus suggesting involvement in later stages of hominin brain evolution. In keeping with this, recombinant soluble Siglec-11 binds ligands in the human brain. We also address a second-round more recent gene conversion from SIGLEC11 to SIGLEC16, with the latter showing an allele frequency of ∼0.1–0.3 in a worldwide population study. Initial pseudogenization of SIGLEC16 was estimated to occur at least 3 Ma, which thus preceded the gene conversion of SIGLEC11 by SIGLEC16P. As gene conversion usually disrupts the converted gene, the fact that ORFs of hSIGLEC11 and hSIGLEC16 have been maintained after an unusual series of very complex gene conversion events suggests that these events may have been subject to hominin-specific selection forces.
pseudogene; gene conversion; human evolution; human brain; microglia
The genomes of related species contain valuable information on the history of the considered taxa. Great apes in particular exhibit variation of evolutionary patterns along their genomes. However, the great ape data also bring new challenges, such as the presence of incomplete lineage sorting and ancestral shared polymorphisms. Previous methods for genome-scale analysis are restricted to very few individuals or cannot disentangle the contribution of mutation rates and fixation biases. This represents a limitation both for the understanding of these forces as well as for the detection of regions affected by selection. Here, we present a new model designed to estimate mutation rates and fixation biases from genetic variation within and between species. We relax the assumption of instantaneous substitutions, modeling substitutions as mutational events followed by a gradual fixation. Hence, we straightforwardly account for shared ancestral polymorphisms and incomplete lineage sorting. We analyze genome-wide synonymous site alignments of human, chimpanzee, and two orangutan species. From each taxon, we include data from several individuals. We estimate mutation rates and GC-biased gene conversion intensity. We find that both mutation rates and biased gene conversion vary with GC content. We also find lineage-specific differences, with weaker fixation biases in orangutan species, suggesting a reduced historical effective population size. Finally, our results are consistent with directional selection acting on coding sequences in relation to exonic splicing enhancers.
phylogenetics-population genetics model; mutation rates; biased gene conversion; rate heterogeneity; coding sequence evolution; primates evolution
Class IV homeodomain leucine zipper (C4HDZ) genes are plant-specific transcription factors that, based on phenotypes in Arabidopsis thaliana, play an important role in epidermal development. In this study, we sampled all major extant lineages and their closest algal relatives for C4HDZ homologs. Phylogenetic analyses yield a gene tree that mirrors land plant evolution, with evidence for gene duplications in many lineages but minimal evidence for gene losses. Our analysis suggests that an ancestral C4HDZ gene originated in an algal ancestor of land plants and that a single ancestral gene was present in the last common ancestor of land plants. Independent gene duplications are evident within several lineages including mosses, lycophytes, euphyllophytes, seed plants, and, most notably, angiosperms. In recently evolved angiosperm paralogs, we find evidence of pseudogenization via mutations in both coding and regulatory sequences. The increasing complexity of the C4HDZ gene family through the diversification of land plants correlates with increasing complexity in epidermal characters.
gene family evolution; gene duplication; transcription factor; homeodomain leucine zipper
Sex determination mechanisms are highly variable across teleost fishes and sexual development is often plastic. Nevertheless, downstream factors establishing the two sexes are presumably conserved. Here, we study sequence evolution and gene expression of core genes of sexual development in a prime model system in evolutionary biology, the East African cichlid fishes. Using the available five cichlid genomes, we test for signs of positive selection in 28 genes including duplicates from the teleost whole-genome duplication, and examine the expression of these candidate genes in three cichlid species. We then focus on a particularly striking case, the A- and B-copies of the aromatase cyp19a1, and detect different evolutionary trajectories: cyp19a1A evolved under strong positive selection, whereas cyp19a1B remained conserved at the protein level, yet is subject to regulatory changes at its transcription start sites. Importantly, we find shifts in gene expression in both copies. Cyp19a1 is considered the most conserved ovary-factor in vertebrates, and in all teleosts investigated so far, cyp19a1A and cyp19a1B are expressed in ovaries and the brain, respectively. This is not the case in cichlids, where we find new expression patterns in two derived lineages: the A-copy gained a novel testis-function in the Ectodine lineage, whereas the B-copy is overexpressed in the testis of the most species-rich cichlid group, the Haplochromini. This suggests that even key factors of sexual development, including the sex steroid pathway, are not conserved in fish, supporting the idea that flexibility in sexual determination and differentiation may be a driving force of speciation.
sex determination; adaptive evolution; cichlid fishes; aromatase; developmental system drift; gene duplication
Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome, and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence and fewer TEs per base pair of exonic sequence than expected if TEs are located at random locations in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering TE families, we find a very weak effect of gene family size on TE insertions per gene, indicating that as gene family size increases the number of TE insertions in a given gene within that family also increases. TEs are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs resulting in phenotypic differences among individuals may typically fail to replicate across mapping panels since individual elements are unlikely to segregate in both panels. Our data suggest that “burden tests” that test for the effect of TEs as a class may be more fruitful.
transposable element; DGRP; DSPR; genomics; population genetics
Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges, both in fully utilizing such large data sets and in ensuring that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to the present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
haplotype homozygosity; next generation sequencing; linkage disequilibrium; effective population size; PSMC
In the Metazoa, globin proteins display an underlying unity in tertiary structure that belies an extraordinary diversity in primary structures, biochemical properties, and physiological functions. Phylogenetic reconstructions can reveal which of these functions represent novel, lineage-specific innovations, and which represent ancestral functions that are shared with homologous globin proteins in other eukaryotes and even prokaryotes. To date, our understanding of globin diversity in deuterostomes has been hindered by a dearth of genomic sequence data from the Ambulacraria (echinoderms + hemichordates), the sister group of chordates, and the phylum Xenacoelomorpha, which includes xenoturbellids, acoelomorphs, and nemertodermatids. Here, we report the results of a phylogenetic and comparative genomic analysis of the globin gene repertoire of deuterostomes. We first characterized the globin genes of the acorn worm, Saccoglossus kowalevskii, a representative of the phylum Hemichordata. We then integrated genomic sequence data from the acorn worm into a comprehensive analysis of conserved synteny and phylogenetic relationships among globin genes from representatives of the eight lineages that comprise the superphylum Deuterostomia. The primary aims were 1) to unravel the evolutionary history of the globin gene superfamily in deuterostomes and 2) to use the estimated phylogeny to gain insights into the functional evolution of deuterostome globins. Results of our analyses indicate that the deuterostome common ancestor possessed a repertoire of at least four distinct globin paralogs and that different subsets of these ancestral genes have been retained in each of the descendant organismal lineages. In each major deuterostome group, a different subset of ancestral precursor genes underwent lineage-specific expansions of functional diversity through repeated rounds of gene duplication and divergence. 
By integrating results of the phylogenetic analysis with available functional data, we discovered that circulating oxygen-transport hemoglobins evolved independently in several deuterostome lineages and that intracellular nerve globins evolved independently in chordates and acoelomorph worms.
acorn worm; Ambulacraria; chordates; gene family evolution; hemoglobin; neuroglobin
Histone modification is an important mechanism of gene regulation in eukaryotes. Why many histone modifications can be stably maintained in the midst of genetic and environmental changes is a fundamental question in evolutionary biology. We obtained genome-wide profiles of three histone marks, H3 lysine 4 tri-methylation (H3K4me3), H3 lysine 4 mono-methylation (H3K4me1), and H3 lysine 27 acetylation (H3K27ac), for several cell types from human and mouse. We identified histone modifications that were stable among different cell types in human and histone modifications that were evolutionarily conserved between mouse and human in the same cell type. We found that histone modifications that were stable among cell types were also likely to be conserved between species. This trend was consistently observed in promoter, intronic, and intergenic regions for all of the histone marks tested. Importantly, the trend was observed regardless of the expression breadth of the nearby gene, indicating that slow evolution of housekeeping genes was not the major reason for the correlation. These regions showed distinct genetic and epigenetic properties, such as clustered transcription factor binding sites (TFBSs), high GC content, and CTCF binding at flanking sides. Based on our observations, we propose that TFBS clustering in or near a histone modification plays a significant role in stabilizing and conserving the histone modification because TFBS clustering promotes TFBS conservation, which in turn promotes histone modification conservation. In summary, the results of this study support the view that in mammalian genomes a common mechanism maintains histone modifications against both genetic and environmental (cellular) changes.
histone modification; transcription factor binding site; evolution of chromatin state
The nearly neutral theory of molecular evolution predicts that the efficacy of both positive and purifying selection is a function of the long-term effective population size (Ne) of a species. Under this theory, the efficacy of natural selection should increase with Ne. Here, we tested this simple prediction by surveying ∼1.5 to 1.8 Mb of protein coding sequence in the two subspecies of the European rabbit (Oryctolagus cuniculus algirus and O. c. cuniculus), a mammal species characterized by high levels of nucleotide diversity and Ne estimates for each subspecies on the order of 1 × 10⁶. When the segregation of slightly deleterious mutations and demographic effects were taken into account, we inferred that >60% of amino acid substitutions on the autosomes were driven to fixation by positive selection. Moreover, we inferred that a small fraction of new amino acid mutations (<4%) are effectively neutral (defined as 0 < Nes < 1) and that this fraction was negatively correlated with a gene’s expression level. Consistent with models of recurrent adaptive evolution, we detected a negative correlation between levels of synonymous site polymorphism and the rate of protein evolution, although the correlation was weak and nonsignificant. No systematic X chromosome–autosome difference was found in the efficacy of selection. For example, the proportion of adaptive substitutions was significantly higher on the X chromosome compared with the autosomes in O. c. algirus but not in O. c. cuniculus. Our findings support widespread positive and purifying selection in rabbits and add to a growing list of examples suggesting that differences in Ne among taxa play a substantial role in determining rates and patterns of protein evolution.
proportion of adaptive substitutions; distribution of fitness effects; effective population size; McDonald–Kreitman test; nearly neutral theory; transcriptome
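As a rough illustration of the quantity discussed in the abstract above, the proportion of adaptive substitutions (α) can be sketched with the classic McDonald–Kreitman-style estimator, α = 1 − (Ds·Pn)/(Dn·Ps). This is not the method used in the study, which additionally accounts for segregating slightly deleterious mutations and demography; the function name `mk_alpha` and the counts below are hypothetical and chosen only for illustration.

```python
def mk_alpha(dn, ds, pn, ps):
    """Simple McDonald-Kreitman-style estimate of alpha.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence)
    pn, ps: nonsynonymous and synonymous polymorphisms
    Returns alpha = 1 - (ds * pn) / (dn * ps).
    Note: this simple estimator ignores slightly deleterious
    polymorphism and demographic effects (an assumption of the sketch).
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, not taken from any data set
alpha = mk_alpha(dn=60, ds=100, pn=30, ps=100)
print(f"alpha = {alpha:.2f}")
```

Because slightly deleterious nonsynonymous polymorphisms inflate Pn, this simple form tends to underestimate α, which is one reason more elaborate approaches are used in practice.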
Chromosomal inversions are usually portrayed as simple two-breakpoint rearrangements changing gene order but not gene number or structure. However, increasing evidence suggests that inversion breakpoints may often have a complex structure and entail gene duplications with potential functional consequences. Here, we used a combination of different techniques to investigate the breakpoint structure and the functional consequences of a complex rearrangement fixed in Drosophila buzzatii and comprising two tandemly arranged inversions sharing the middle breakpoint: 2m and 2n. By comparing the sequence in the breakpoint regions between D. buzzatii (inverted chromosome) and D. mojavensis (noninverted chromosome), we corroborate the breakpoint reuse at the molecular level and infer that inversion 2m was associated with a duplication of a ∼13 kb segment and likely generated by staggered breaks plus repair by nonhomologous end joining. The duplicated segment contained the gene CG4673, involved in nuclear transport, and its two nested genes CG5071 and CG5079. Interestingly, we found that, in addition to the inversion and the associated duplication, both breakpoints underwent further rearrangements; for example, the proximal breakpoint experienced a microinversion event associated at both ends with a 121-bp-long duplication that contains a promoter. As a consequence of all these different rearrangements, CG5079 has been lost from the genome, CG5071 is now a single-copy nonnested gene, and CG4673 has a transcript ∼9 kb shorter and seems to have acquired a more complex gene regulation. Our results illustrate the complex effects of chromosomal rearrangements and highlight the need to complement genomic approaches with detailed sequence-level and functional analyses of breakpoint regions if we are to fully understand genome structure, function, and evolutionary dynamics.
inversion; breakpoint; Drosophila; BAC; shotgun sequencing; transposable elements
Heterogeneity among life traits in mammals has resulted in considerable phylogenetic conflict, particularly concerning the position of the placental root. Layered upon this are gene- and lineage-specific variation in amino acid substitution rates and compositional biases. Life traits whose variation may affect mutation rates include longevity, metabolic rate, body size, and germ line generation time. Over the past 12 years, three main conflicting hypotheses have emerged for the placement of the placental root. These hypotheses place the Atlantogenata (common ancestor of Xenarthra plus Afrotheria), the Afrotheria, or the Xenarthra as the sister group to all other placental mammals. Model adequacy is critical for accurate tree reconstruction, and by failing to account for these compositional and character exchange heterogeneities across the tree and data set, previous studies have not provided a strongly supported hypothesis for the placental root. For the first time, models that accommodate both tree and data set heterogeneity have been applied to mammal data. Here, we show the impact of accurate model assignment and the importance of data sets in accommodating model parameters while maintaining the power to reject competing hypotheses. Through these sophisticated methods, we demonstrate the importance of model adequacy and data set power, and we provide strong support for the Atlantogenata over other competing hypotheses for the position of the placental root.
mammal phylogeny; phylogenetic reconstruction; evolutionary models; placental root; heterogeneous modeling
The advent of modern DNA sequencing technology is the driving force in obtaining complete intraspecific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows SweeD to be deployed on cluster systems with queue execution time restrictions and long-running analyses to be resumed after processor failures. In addition, the user can specify various demographic models via the command line to calculate their theoretically expected site frequency spectra. Therefore, in contrast to SweepFinder, the neutral site frequencies can optionally be calculated directly from a given demographic model. We show that an increase in sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes by using SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning human chromosome 1 from the 1000 Genomes Project for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
selective sweep; positive selection; high-performance computing; site frequency spectrum
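To make the notion of a site frequency spectrum in the abstract above concrete, the following minimal sketch computes an unfolded spectrum from a small 0/1 haplotype matrix and the standard neutral expectation for a constant-size population, E[ξ_i] = θ/i. It is not SweeD's code, input format, or likelihood computation; the function names `unfolded_sfs` and `neutral_expected_sfs`, the toy data, and the value of θ are all hypothetical.

```python
from collections import Counter


def unfolded_sfs(haplotypes):
    """Unfolded site frequency spectrum.

    haplotypes: list of n equal-length sequences of 0 (ancestral)
    and 1 (derived) alleles, one sequence per sampled chromosome.
    Returns a list of length n - 1 whose entry i - 1 is the number of
    sites at which exactly i chromosomes carry the derived allele.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    derived_counts = Counter(
        sum(h[j] for h in haplotypes) for j in range(n_sites)
    )
    # Monomorphic sites (0 or n derived copies) are not part of the SFS
    return [derived_counts.get(i, 0) for i in range(1, n)]


def neutral_expected_sfs(n, theta):
    """Expected SFS under the standard neutral, constant-size model."""
    return [theta / i for i in range(1, n)]


# Toy data: 4 chromosomes, 4 sites (illustrative only)
sample = [
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(unfolded_sfs(sample))
print(neutral_expected_sfs(n=4, theta=6.0))
```

Sweep-detection methods of this family compare the spectrum observed near a candidate site with a neutral reference spectrum; the sketch stops at computing the two spectra and does not attempt that comparison.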