Emerging data from the coelacanth genome are beginning to shed light on the origin and evolution of tetrapod genes and noncoding elements. Of particular relevance is the realization that the coelacanth retains active copies of transposable elements that once served as raw material for the evolution of new functional sequences in the vertebrate lineage. Recognizing the evolutionary significance of the coelacanth genome in this regard, we employed an ab initio search strategy to further classify its repetitive complement. This analysis uncovered a class of interspersed elements (Latimeria Harbinger 1; LatiHarb1) that is a major contributor to coelacanth genome structure and gene content (∼1% to 4% of the genome). Sequence analyses indicate that 1) each ∼8.7 kb LatiHarb1 element contains two coding regions, a transposase gene and a gene whose function is as yet unknown (MYB-like) and 2) copies of LatiHarb1 retain biological activity in the coelacanth genome. Functional analyses verify transcriptional and enhancer activities of LatiHarb1 in vivo and reveal transcriptional decoupling that could permit MYB-like genes to play functional roles not directly linked to transposition. Thus, LatiHarb1 represents the first known instance of a harbinger-superfamily transposon with contemporary activity in a vertebrate genome. Analyses of LatiHarb1 further corroborate the notion that exaptation of anciently active harbinger elements gave rise to at least two vertebrate genes (harbi1 and naif1) and indicate that the vertebrate gene tsnare1 also traces its ancestry to this transposon superfamily. Based on our analyses of LatiHarb1, we speculate that several functional features of harbinger elements may predispose the transposon superfamily toward recurrent exaptive evolution of cellular coding genes. In addition, these analyses further reinforce the broad utility of the coelacanth genome and other “outgroup” genomes in understanding the ancestry and evolution of vertebrate genes and genomes.
doi:10.1093/molbev/msr267
PMCID: PMC3278475
PMID: 22045999
genome evolution; coelacanth; harbinger; exaptation; Latimeria
Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple binding sites for the same transcription factor. Such “homotypic site clustering” has been hypothesized to arise from functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon: multisite enhancers are common because they are favored by evolutionary sampling of the genotype–phenotype landscape. To test this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more readily reached by the simulated evolutionary process. We show that there are more ways to “build” a fit genotype with many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape, that is, an “evolutionary signature” in enhancer sequences. Finally, we investigated potential effects of other factors, such as rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic site clustering.
Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. This work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be evaluated, and cautions against “evolutionary mirages” present in common features of genomic sequence. The quantitative framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their composition and evolution.
doi:10.1093/molbev/msr277
PMCID: PMC3278477
PMID: 22075113
enhancer evolution; homotypic site clustering; complex genotypes; thermodynamic model
We introduce a new model for relaxing the assumption of a strict molecular clock for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes lineages of a phylogenetic tree are distributed into distinct rate classes. Under the Dirichlet process, the number of rate classes, assignment of branches to rate classes, and the rate value associated with each class are treated as random variables. The performance of this model was evaluated by conducting analyses on data sets simulated under a range of different models. We compared the Dirichlet process model with two alternative models for rate variation: the strict molecular clock and the independent rates model. Our results show that divergence time estimation under the DPP provides robust estimates of node ages and branch rates without significantly reducing power. Further analyses were conducted on a biological data set, and we provide examples of ways to summarize Markov chain Monte Carlo samples under this model.
doi:10.1093/molbev/msr255
PMCID: PMC3350323
PMID: 22049064
divergence time estimation; relaxed clock; phylogenetics; Bayesian estimation; Markov chain Monte Carlo; Dirichlet process prior; mixed model; simulation
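The Dirichlet process prior described in the abstract above treats the number of rate classes, the assignment of branches to classes, and the rate of each class as random variables. One common way to picture such a prior is the Chinese restaurant process. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the concentration parameter `alpha`, and the gamma base distribution for class rates are all assumptions made for illustration.

```python
import random


def sample_rate_classes(n_branches, alpha, rng):
    """Partition branches into rate classes with a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values tend to
    produce more rate classes. Both names are illustrative assumptions.
    """
    assignments = []  # rate-class index for each branch
    class_sizes = []  # number of branches currently in each class
    for _ in range(n_branches):
        # An existing class k is chosen with weight class_sizes[k];
        # a brand-new class is opened with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes


def sample_branch_rates(n_branches, alpha, shape, scale, seed=0):
    """Draw one set of branch rates from the sketched prior.

    Each class rate is drawn from a gamma(shape, scale) base distribution,
    which is an assumption of this sketch, not a detail from the abstract.
    """
    rng = random.Random(seed)
    assignments, class_sizes = sample_rate_classes(n_branches, alpha, rng)
    class_rates = [rng.gammavariate(shape, scale) for _ in class_sizes]
    branch_rates = [class_rates[k] for k in assignments]
    return branch_rates, assignments
```

Calling `sample_branch_rates(20, 1.0, 2.0, 0.5)` returns one prior draw for a tree with 20 branches; branches that share a class index share a rate, which is the behavior that sits between a strict clock (one class) and fully independent rates (one class per branch).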
Drosophila melanogaster has long been used as a model for the molecular genetics of innate immunity. Such work has uncovered several immune receptors that recognize bacterial and fungal pathogens by binding unique components of their cell walls and membranes. Drosophila also act as hosts to metazoan pathogens such as parasitic wasps, which can infect a majority of individuals in natural populations, but many aspects of their immune responses against these more closely related pathogens are poorly understood. Here, we present data describing the transcriptional induction and molecular evolution of a candidate Drosophila anti-wasp immunity gene, lectin-24A. Lectin-24A has a secretion signal sequence and its lectin domain suggests a function in sugar group binding. Transcript levels of lectin-24A were induced significantly more strongly and more rapidly following wasp attack than following wounding or bacterial infection, demonstrating that lectin-24A is not a general stress response or defense response gene but is instead part of a specific response against wasps. The major site of lectin-24A transcript production is the fat body, the main humoral immune tissue of flies. Interestingly, lectin-24A is a new gene of the D. melanogaster/Drosophila simulans clade, displaying very little homology to any other Drosophila lectins. Population genetic analyses of lectin-24A DNA sequence data from African and North American populations of D. melanogaster and D. simulans revealed gene length polymorphisms segregating at high frequencies as well as strong evidence of repeated and recent selective sweeps. Thus, lectin-24A is a rapidly evolving new gene that has seemingly developed functional importance for fly resistance against infection by parasitic wasps.
doi:10.1093/molbev/msr191
PMCID: PMC3258034
PMID: 21873297
molecular evolution; new gene; immunity; lectin; Drosophila; parasitic wasp
Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that assessing the robustness of results to biological factors that may systematically mislead (bias) the outcomes of statistical estimation will be key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
doi:10.1093/molbev/msr202
PMCID: PMC3258035
PMID: 21873298
molecular evolution; statistical inference; phylogenetics; evolutionary tree; statistical bias; variance
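The interplay between effect size and P value raised in the abstract above can be illustrated with a generic two-sample z-test. This is a textbook illustration under stated assumptions (equal group sizes, known unit variance, a standardized effect size), not an analysis from the paper; the function names are hypothetical.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_for_mean_difference(effect_size, n_per_group):
    """z statistic for a standardized mean difference between two groups.

    Assumes equal group sizes and unit variance in both groups, so the
    standard error of the difference is sqrt(2 / n_per_group).
    """
    return effect_size * math.sqrt(n_per_group / 2.0)


def p_value_for_effect(effect_size, n_per_group):
    """P value obtained when the observed difference equals effect_size."""
    return two_sided_p_from_z(z_for_mean_difference(effect_size, n_per_group))
```

With a fixed, very small standardized difference such as 0.01, `p_value_for_effect(0.01, 100)` is far from significant, whereas `p_value_for_effect(0.01, 1_000_000)` is extremely small. The difference itself has not changed, only the amount of data, which is why a small P value on a genome-scale data set says little on its own about biological relevance or about bias.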
The DARC (Duffy antigen/receptor for chemokines) gene, also called Duffy or FY, encodes a membrane-bound chemokine receptor. Two malaria parasites, Plasmodium vivax and Plasmodium knowlesi, use DARC to trigger internalization into red blood cells. Although much has been reported on the evolution of DARC null alleles, little is known about the evolution of the coding portion of this gene or the role that protein sequence divergence in this receptor may play in disease susceptibility or zoonosis. Here, we show that the Plasmodium interaction domain of DARC is nearly invariant in the human population, suggesting that coding polymorphism there is unlikely to play a role in differential susceptibility to infection. However, an analysis of DARC orthologs from 35 simian primate species reveals high levels of sequence divergence in the Plasmodium interaction domain. Signatures of positive selection in this domain indicate that species-specific mutations in the protein sequence of DARC could serve as barriers to the transmission of Plasmodium between primate species.
doi:10.1093/molbev/msr204
PMCID: PMC3258036
PMID: 21878684
malaria; positive selection; arms race; species tropism; zoonosis
Sub-Saharan Africa has consistently been shown to be the most genetically diverse region in the world. Despite the fact that a substantial portion of this variation is partitioned between groups practicing a variety of subsistence strategies and speaking diverse languages, there is currently no consensus on the genetic relationships of sub-Saharan African populations. San (a subgroup of KhoeSan) and many Pygmy groups maintain hunter-gatherer lifestyles and cluster together in autosomal-based analysis, whereas non-Pygmy Niger-Kordofanian speakers (non-Pygmy NKs) predominantly practice agriculture and show substantial genetic homogeneity despite their wide geographic range throughout sub-Saharan Africa. However, KhoeSan, who speak a set of relatively unique click-based languages, have long been thought to be an early branch of anatomically modern humans based on phylogenetic analysis. To formally test models of divergence among the ancestors of modern African populations, we resequenced a sample of San, Eastern and Western Pygmy, and non-Pygmy NK individuals at 40 nongenic (∼2 kb) regions and then analyzed these data within an Approximate Bayesian Computation (ABC) framework. We find substantial support for a model of an early divergence of KhoeSan ancestors from a proto-Pygmy-non-Pygmy NKs group ∼110 thousand years ago over a model incorporating a proto-KhoeSan–Pygmy hunter-gatherer divergence from the ancestors of non-Pygmy NKs. The results of our analyses are consistent with previously identified signals of a strong bottleneck in Mbuti Pygmies and a relatively recent expansion of non-Pygmy NKs. We also develop a number of methodologies that utilize “pseudo-observed” data sets to optimize our ABC-based inference. This approach is likely to prove an invaluable tool for demographic inference using genome-wide resequencing data.
doi:10.1093/molbev/msr212
PMCID: PMC3258037
PMID: 21890477
sub-Saharan Africa; resequencing; Approximate Bayesian Computation; KhoeSan; Pygmy; demographic history
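The Approximate Bayesian Computation framework mentioned in the abstract above can be pictured with its simplest variant, rejection sampling. The abstract does not say which ABC algorithm or summary statistics were used, so the sketch below is a generic illustration on a toy problem; every name in it (`abc_rejection`, `toy_prior`, `toy_simulator`) and the toy model itself are assumptions.

```python
import random


def abc_rejection(observed_stat, prior_sampler, simulator, n_draws, tolerance, rng):
    """Keep parameter draws whose simulated summary statistic is close to the observed one.

    The accepted draws approximate the posterior distribution of the parameter.
    """
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)              # draw a parameter from the prior
        simulated_stat = simulator(theta, rng)  # simulate data, reduce to a summary
        if abs(simulated_stat - observed_stat) <= tolerance:
            accepted.append(theta)
    return accepted


def toy_prior(rng):
    """Toy prior: uniform on [0, 10] (illustrative assumption)."""
    return rng.uniform(0.0, 10.0)


def toy_simulator(theta, rng, sample_size=20):
    """Toy model: the summary statistic is the mean of normal draws centered on theta."""
    draws = [rng.gauss(theta, 1.0) for _ in range(sample_size)]
    return sum(draws) / sample_size
```

For example, `abc_rejection(4.0, toy_prior, toy_simulator, 5000, 0.1, random.Random(42))` returns the subset of prior draws that reproduce an observed mean of 4.0 to within 0.1. In a demographic setting the simulator would instead generate sequence data under a candidate divergence model, and the tolerance and choice of summary statistics would need tuning.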
Males and females share the same genome; thus, phenotypic divergence requires differential gene expression and sex-specific regulation. Accordingly, the analysis of expression patterns is pivotal to the understanding of sex determination mechanisms. Many bivalves are stable gonochoric species, but the mechanism of gonad sexualization and the genes involved are still unknown. Moreover, during the period of sexual rest, a gonad is not present and sex cannot be determined. A mechanism associated with germ line differentiation in some bivalves, including the Manila clam Ruditapes philippinarum, is the doubly uniparental inheritance (DUI) of mitochondria, a variation of strict maternal inheritance. Two mitochondrial lineages are present, one transmitted through eggs and the other through sperm, and the progeny show a mother-dependent sex bias. We produced a de novo annotation of 17,186 transcripts from R. philippinarum, compared the transcriptomes of males and females, and identified 1,575 genes with strong sex-specific expression and 166 sex-specific single nucleotide polymorphisms, obtaining preliminary information about genes that could be involved in sex determination. We then compared the transcriptomes of a family producing predominantly females and a family producing predominantly males to identify candidate genes involved in the regulation of sex-specific aspects of the DUI system, finding a relationship between sex bias and differential expression of several ubiquitination genes. In mammalian embryos, sperm mitochondria are degraded by ubiquitination. A modification of this mechanism is hypothesized to be responsible for the retention of sperm mitochondria in male embryos of DUI species. Ubiquitination can additionally regulate gene expression, playing a role in sex determination of several animals. These data enable us to develop a model that incorporates both the DUI literature and our new findings.
doi:10.1093/molbev/msr248
PMCID: PMC3258041
PMID: 21976711
Ruditapes philippinarum; de novo; transcriptome; doubly uniparental inheritance; sex bias; sex determination
Rate heterogeneity among lineages is a common feature of molecular evolution, and it has long impeded our ability to accurately estimate the age of evolutionary divergence events. The development of relaxed molecular clocks, which model variable substitution rates among lineages, was intended to rectify this problem. Major subtypes of pandemic HIV-1 group M are thought to exemplify closely related lineages with different substitution rates. Here, we report that inferring the time of the most recent common ancestor of all these subtypes in a single phylogeny under a single (relaxed) molecular clock produces significantly different dates for many of the subtypes than does analysis of each subtype on its own. We explore various methods to ameliorate this problem. We conclude that current molecular dating methods are inadequate for dealing with this type of substitution rate variation in HIV-1. Through simulation, we show that heterotachy causes root ages to be overestimated.
doi:10.1093/molbev/msr266
PMCID: PMC3258043
PMID: 22045998
molecular clock; rate variation; HIV-1
β-Catenin is a multifunctional scaffolding protein with roles in Wnt signaling, cell adhesion, and centrosome separation. Here, we report on independent duplications of the insect β-Catenin ortholog armadillo (arm) in the red flour beetle Tribolium castaneum and the pea aphid Acyrthosiphon pisum. Detailed sequence analysis shows that in both species, one paralog lost critical residues of the α-Catenin binding domain, which is essential for cell adhesion, and accumulated a dramatically higher number of amino acid substitutions in the central Arm repeat domain. Residues associated with aspects of Wnt signaling, however, are conserved in both paralogs. Consistent with these molecular signatures, the effects of specific and combinatorial knockdown experiments in the Tribolium embryo indicate that the duplication resulted in redundant involvement in Wnt signaling of both β-Catenin paralogs but differential inheritance of the ancestral cell adhesion and centrosome separation functions. We conclude that the duplicated pea aphid and flour beetle β-catenin genes experienced partial subfunctionalization, which appears to be evolutionarily favored. Providing the first evidence of genetic separability of the cell adhesion and centrosome separation functions, the duplicated Tribolium and Acyrthosiphon arm paralogs offer new inroads for context-specific analyses of β-Catenin. Our data also revealed the conservation of a C-terminally truncated Arm isoform in both singleton and duplicated homologs, suggesting an as yet unexplored role in Wnt signaling.
doi:10.1093/molbev/msr219
PMCID: PMC3283115
PMID: 21890476
gene duplication; Drosophila; Tribolium; pea aphid; β-Catenin; armadillo; subfunctionalization; centrosome; cell adhesion; Wnt-signaling; redundancy
Despite the great utility of mitochondrial DNA (mtDNA) sequence data in population genetics and phylogenetics, key parameters describing the process of mitochondrial mutation (e.g., the rate and spectrum of mutational change) are based on few direct estimates. Furthermore, the variation in the mtDNA mutation process within species or between lineages with contrasting reproductive strategies remains poorly understood. In this study, we directly estimate the mtDNA mutation rate and spectrum using Daphnia pulex mutation-accumulation (MA) lines derived from sexual (cyclically parthenogenetic) and asexual (obligately parthenogenetic) lineages. The nearly complete mitochondrial genome sequences of 82 sexual and 47 asexual MA lines reveal high mtDNA mutation rates of 1.37 × 10⁻⁷ and 1.73 × 10⁻⁷ per nucleotide per generation, respectively. The Daphnia mtDNA mutation rate is among the highest in eukaryotes, and its spectrum is dominated by insertions and deletions (70%), largely due to the presence of mutational hotspots at homopolymeric nucleotide stretches. Maximum likelihood estimates of the Daphnia mitochondrial effective population size reveal that between five and ten copies of mitochondrial genomes are transmitted per female per generation. Comparison between sexual and asexual lineages reveals no statistically significant difference in mutation rates and highly similar mutation spectra.
doi:10.1093/molbev/msr243
PMCID: PMC3350313
PMID: 21998274
asexuality; mitochondrial DNA; mutation-accumulation; mutation hotspots; mitochondria effective population size; mitochondrial evolution
Chimeric genes form through the combination of portions of existing coding sequences to create a new open reading frame. These new genes can create novel protein structures that are likely to serve as a strong source of novelty upon which selection can act. We have identified 14 chimeric genes that formed through DNA-level mutations in Drosophila melanogaster, and we investigate expression profiles, domain structures, and population genetics for each of these genes to examine their potential to effect adaptive evolution. We find that chimeric gene formation commonly produces mid-domain breaks and unites portions of wholly unrelated peptides, creating novel protein structures that are entirely distinct from other constructs in the genome. These new genes are often involved in selective sweeps. We further find a disparity between chimeric genes that have recently formed and swept to fixation versus chimeric genes that have been preserved over long periods of time, suggesting that preservation and adaptation are distinct processes. Finally, we demonstrate that chimeric gene formation can produce qualitative expression changes that are difficult to mimic through duplicate gene formation, and that extremely young chimeric genes (dS < 0.03) are more likely to be associated with selective sweeps than duplicate genes of the same age. Hence, chimeric genes can serve as an exceptional source of genetic novelty that can have a profound influence on adaptive evolution in D. melanogaster.
doi:10.1093/molbev/msr184
PMCID: PMC3350314
PMID: 21771717
chimeric genes; Drosophila melanogaster; regulatory evolution; evolutionary novelty; adaptive evolution; duplicate genes
Transposable elements (TEs) are highly abundant in the genome and capable of mobility, two properties that make them particularly prone to transfer horizontally between organisms. Although the impact of horizontal transfer (HT) of TEs is well recognized in prokaryotes, the frequency of this phenomenon and its contribution to genome evolution in eukaryotes remain poorly appreciated. Here, we provide evidence that a DNA transposon called SPIN has colonized the genomes of 17 species of reptiles representing nearly every major lineage of squamates, including 14 families of lizards, snakes, and amphisbaenians. Slot blot analyses indicate that SPIN has amplified to high copy numbers in most of these species, ranging from 2,000 to 28,000 copies per haploid genome. In contrast, we could not detect the presence of SPIN in any of the turtles (seven species from seven families) and crocodiles (four species) examined. Genetic distances between SPIN sequences from species belonging to different squamate families are consistently very low (average = 0.1), considering the deep evolutionary divergence of the families investigated (most are >100 My diverged). Furthermore, these distances fall below interfamilial distances calculated for two genes known to have evolved under strong functional constraint in vertebrates (RAG1, average = 0.24 and C-mos, average = 0.27). These data, combined with phylogenetic analyses, indicate that the widespread distribution of SPIN among squamates is the result of at least 13 independent HT events. Molecular dating and paleobiogeographical data suggest that these transfers took place during the last 50 My on at least three different continents (North America, South America, and Africa).
Together, these results triple the number of known SPIN transfer events among tetrapods, provide evidence for a previously hypothesized transoceanic movement of SPIN transposons during the Cenozoic, and further underscore the role of HT in the evolution of vertebrate genomes.
doi:10.1093/molbev/msr181
PMCID: PMC3350315
PMID: 21771716
horizontal transfer; transposable elements; reptiles
Homologous long segments along the genomes of close or remote relatives that are identical by descent (IBD) from a common ancestor provide clues for recent events in human genetics. We set out to extensively map such IBD segments in large cohorts and investigate their distribution within and across different populations. We report analysis of several data sets, demonstrating that IBD is more common than expected by naïve models of population genetics. We show that the frequency of IBD pairs is population dependent and can be used to cluster individuals into populations, detect a homogeneous subpopulation within a larger cohort, and infer bottleneck events in such a subpopulation. Specifically, we show that Ashkenazi Jewish individuals are all connected through transitive remote family ties, evidenced by sharing of 50 cM IBD to a publicly available data set of fewer than 400 individuals. We further expose regions where long-range haplotypes are shared significantly more often than elsewhere in the genome, observed across multiple populations, and enriched for common long structural variation. These are inconsistent with recent relatedness and suggest ancient common ancestry, with limited recombination between haplotypes.
doi:10.1093/molbev/msr133
PMCID: PMC3350316
PMID: 21984068
population genetics; identity by descent; haplotypes; computational tools; structural variations
We analyzed the genome-wide pattern of single nucleotide polymorphisms (SNPs) in a sample with 12 strains of Staphylococcus aureus. Population structure of S. aureus seems to be complex, and the 12 strains were divided into five groups, named A, B, C, D, and E. We conducted a detailed analysis of the topologies of gene genealogies across the genomes and observed a high rate and frequency of tree-shape switching, indicating extensive homologous recombination. Most of the detected recombination occurred in the ancestral population of A, B, and C, whereas there are a number of small regions that exhibit evidence for homologous recombination with a distinct related species. As such regions would contain a number of novel mutations, it is suggested that homologous recombination would play a crucial role to maintain genetic variation within species. In the A-B-C ancestral population, we found multiple lines of evidence that the coalescent pattern is very similar to what is expected in a panmictic population, suggesting that this population is suitable to apply the standard population genetic theories. Our analysis showed that homologous recombination caused a dramatic decay in linkage disequilibrium (LD) and there is almost no LD between SNPs with distance more than 10 kb. Coalescent simulations demonstrated that a high rate of homologous recombination—a relative rate of 0.6 to the mutation rate with an average tract length of about 10 kb—is required to produce patterns similar to those observed in the S. aureus genomes. Our results call for more research into the evolutionary role of homologous recombination in bacterial populations.
doi:10.1093/molbev/msr249
PMCID: PMC3350317
PMID: 22009061
population genomics; bacteria; homologous recombination; demography; linkage disequilibrium
Many of the eukaryotic phylogenomic analyses published to date were based on alignments of hundreds to thousands of genes. In such analyses, the most realistic evolutionary models currently available are often used to minimize the impact of systematic error. However, controversy remains over whether or not idiosyncratic gene family dynamics (i.e., gene duplications and losses) and incorrect orthology assignments are always appropriately taken into account. In this paper, we present an innovative strategy for overcoming orthology assignment problems. Rather than identifying and eliminating genes with paralogy problems, we have constructed a data set composed exclusively of conserved single-copy protein domains that, unlike most of the commonly used phylogenomic data sets, should be less confounded by orthology misassignments. To evaluate the power of this approach, we performed maximum likelihood and Bayesian analyses to infer the evolutionary relationships within the opisthokonts (which include Metazoa, Fungi, and related unicellular lineages). We used this approach to test 1) whether Filasterea and Ichthyosporea form a clade, 2) the interrelationships of early-branching metazoans, and 3) the relationships among early-branching fungi. We also assessed the impact of some methods that are known to minimize systematic error, including reducing the distance between the outgroup and ingroup taxa or using the CAT evolutionary model. Overall, our analyses support the Filozoa hypothesis in which Ichthyosporea are the first holozoan lineage to emerge followed by Filasterea, Choanoflagellata, and Metazoa. Blastocladiomycota appears as a lineage separate from Chytridiomycota, although this result is not strongly supported. These results represent independent tests of previous phylogenetic hypotheses, highlighting the importance of sophisticated approaches for orthology assignment in phylogenomic analyses.
doi:10.1093/molbev/msr185
PMCID: PMC3350318
PMID: 21771718
Capsaspora; Filasterea; Filozoa; Holozoa; Ichthyosporea; multicellularity
The Opisthokonta clade includes Metazoa, Fungi, and several unicellular lineages, such as choanoflagellates, filastereans, ichthyosporeans, and nucleariids. To date, studies of the evolutionary diversity of opisthokonts have focused exclusively on metazoans, fungi, and, very recently, choanoflagellates. Thus, very little is known about diversity among the filastereans, ichthyosporeans, and nucleariids. To better understand the evolutionary diversity and ecology of the opisthokonts, here we analyze published environmental data from nonfungal unicellular opisthokonts and report 18S ribosomal DNA phylogenetic analyses. Our data reveal extensive diversity among all unicellular opisthokonts, except for the filastereans. We identify several clades that consist exclusively of environmental sequences, especially among ichthyosporeans and choanoflagellates. Moreover, we show that the ichthyosporeans represent a significant percentage of overall unicellular opisthokont diversity, with a greater ecological role in marine environments than previously believed. Our results provide a useful phylogenetic framework for future ecological and evolutionary studies of these poorly known lineages.
doi:10.1093/molbev/mst006
PMCID: PMC3603316
PMID: 23329685
unicellular opisthokonts; diversity; distribution; choanoflagellates; ichthyosporeans
Sex chromosome evolution is usually seen as a process that, once initiated, will inevitably progress toward an advanced stage of degeneration of the nonrecombining chromosome. However, despite evidence that avian sex chromosome evolution was initiated >100 Ma, ratite birds have been trapped in an arrested stage of sex chromosome divergence. We performed RNA sequencing of several tissues from male and female ostriches and assembled the transcriptome de novo. A total of 315 Z-linked genes fell into two categories: those that have equal expression level in the two sexes (for which Z–W recombination still occurs) and those that have a 2-fold excess of male expression (for which Z–W recombination has ceased). We suggest that failure to evolve dosage compensation has constrained sex chromosome divergence in this basal avian lineage. Our results indicate that dosage compensation is a prerequisite for, not only a consequence of, sex chromosome evolution.
doi:10.1093/molbev/mst009
PMCID: PMC3603317
PMID: 23329687
We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.
doi:10.1093/molbev/mst010
PMCID: PMC3603318
PMID: 23329690
multiple sequence alignment; metagenome; protein structure; progressive alignment; parallel processing
The sequences of the untranslated regions (UTRs) of mRNAs play important roles in posttranscriptional regulation, but whether a change in UTR length can significantly affect the regulation of gene expression is not clear. In this study, we examined the connection between UTR length and Expression Correlation with cytosolic ribosomal proteins (CRP) genes (ECC), which measures the level of expression similarity of a group of genes with CRP genes under various growth conditions. We used data from the aerobic fermentation yeast Saccharomyces cerevisiae and the aerobic respiration yeast Candida albicans. To reduce statistical fluctuations, we computed the ECC for the genes in a Gene Ontology (GO) functional group. We found that in both species, ECC is strongly correlated with the 5′ UTR length but not with the 3′ UTR length and that the 5′ UTR length is evolutionarily better conserved than the 3′ UTR length. Interestingly, we found 11 GO groups that have had a substantial increase in 5′ UTR length in the S. cerevisiae lineage and that the length increase was associated with a substantial decrease in ECC. Moreover, 9 of the 11 GO groups of genes are involved in mitochondrial respiration function, whose expression reprogramming has been shown to be a major factor for the evolution of aerobic fermentation. Finally, we found that an increase in 5′ UTR length may decrease the +1 nucleosome occupancy. This study provides a new angle to understand the role of 5′ UTR in gene expression regulation and evolution.
doi:10.1093/molbev/msr143
PMCID: PMC3245540
PMID: 21965341
UTR length; gene expression evolution; aerobic fermentation
It has been hypothesized that two successive rounds of whole-genome duplication (WGD) in the stem lineage of vertebrates provided genetic raw materials for the evolutionary innovation of many vertebrate-specific features. However, it has seldom been possible to trace such innovations to specific functional differences between paralogous gene products that derive from a WGD event. Here, we report genomic evidence for a direct link between WGD and key physiological innovations in the vertebrate oxygen transport system. Specifically, we demonstrate that key globin proteins that evolved specialized functions in different aspects of oxidative metabolism (hemoglobin, myoglobin, and cytoglobin) represent paralogous products of two WGD events in the vertebrate common ancestor. Analysis of conserved macrosynteny between the genomes of vertebrates and amphioxus (subphylum Cephalochordata) revealed that homologous chromosomal segments defined by myoglobin + globin-E, cytoglobin, and the α-globin gene cluster each descend from the same linkage group in the reconstructed proto-karyotype of the chordate common ancestor. The physiological division of labor between the oxygen transport function of hemoglobin and the oxygen storage function of myoglobin played a pivotal role in the evolution of aerobic energy metabolism, supporting the hypothesis that WGDs helped fuel key innovations in vertebrate evolution.
doi:10.1093/molbev/msr207
PMCID: PMC3245541
PMID: 21965344
cytoglobin; genome duplication; globin gene family; hemoglobin; myoglobin
Huff, Chad D. | Witherspoon, David J. | Zhang, Yuhua | Gatenbee, Chandler | Denson, Lee A. | Kugathasan, Subra | Hakonarson, Hakon | Whiting, April | Davis, Chadwick T. | Wu, Wilfred | Xing, Jinchuan | Watkins, W. Scott | Bamshad, Michael J. | Bradfield, Jonathan P. | Bulayeva, Kazima | Simonson, Tatum S. | Jorde, Lynn B. | Guthery, Stephen L.
Inflammatory bowel disease 5 (IBD5) is a 250 kb haplotype on chromosome 5 that is associated with an increased risk of Crohn’s disease in Europeans. The OCTN1 gene is centrally located on IBD5 and encodes a transporter of the antioxidant ergothioneine (ET). The 503F variant of OCTN1 is strongly associated with IBD5 and is a gain-of-function mutation that increases absorption of ET. Although 503F has been implicated as the variant potentially responsible for Crohn’s disease susceptibility at IBD5, there is little evidence beyond statistical association to support its role in disease causation. We hypothesize that 503F is a recent adaptation in Europeans that swept to relatively high frequency and that disease association at IBD5 results not from 503F itself, but from one or more nearby hitchhiking variants, in the genes IRF1 or IL5. To test for evidence of recent positive selection on the 503F allele, we employed the iHS statistic, which was significant in the European CEU HapMap population (P = 0.0007) and European Human Genome Diversity Panel populations (P ≤ 0.01). To evaluate the hypothesis of disease-variant hitchhiking, we performed haplotype association tests on high-density microarray data in a sample of 1,868 Crohn’s disease cases and 5,550 controls. We found that 503F haplotypes with recombination breakpoints between OCTN1 and IRF1 or IL5 were not associated with disease (odds ratio [OR]: 1.05, P = 0.21). In contrast, we observed strong disease association for 503F haplotypes with no recombination between these three genes (OR: 1.24, P = 2.6 × 10⁻⁸), as expected if the sweeping haplotype harbored one or more disease-causing mutations in IRF1 or IL5. To further evaluate these disease-gene candidates, we obtained expression data from lower gastrointestinal biopsies of healthy individuals and Crohn’s disease patients. We observed a 72% increase in gene expression of IRF1 among Crohn’s disease patients (P = 0.0006) and no significant difference in expression of OCTN1. Collectively, these data indicate that the 503F variant has increased in frequency due to recent positive selection and that disease-causing variants in linkage disequilibrium with 503F have hitchhiked to relatively high frequency, thus forming the IBD5 risk haplotype. Finally, our association results and expression data support IRF1 as a strong candidate for Crohn’s disease causation.
doi:10.1093/molbev/msr151
PMCID: PMC3245542
PMID: 21816865
positive selection; genetic hitchhiking; Crohn's disease; IBD5; IRF1
Anaplasma phagocytophilum is an obligately intracellular tick-transmitted bacterial pathogen of humans and other animals. During the course of infection, A. phagocytophilum utilizes gene conversion to shuffle ∼100 functional pseudogenes into a single expression cassette of the msp2(p44) gene, which codes for the major surface antigen and major surface protein 2 (MSP2). The role and extent of msp2(p44) recombination, particularly in hosts that only experience acute infections, is not clear. In the present study, we explored patterns of recombination and expression of the msp2(p44) gene of A. phagocytophilum in a serially infected mouse model. Even though the bacterium was passed rapidly among mice, minimizing the opportunities for the host to develop adaptive immunity, we detected the emergence of 34 unique msp2(p44) expression cassette variants. The expression of msp2(p44) pseudogenes did not follow a consistent pattern among different groups of mice, although some pseudogenes were expressed more frequently than others. In addition, among 263 expressed pseudogenes, 3 mosaic sequences each consisting of 2 different pseudogenes were identified. Population genetic analysis showed that genetic diversity and subpopulation differentiation tended to increase over time until stationarity was reached but that the variance that was observed in allele (expressed pseudogene) frequency could occur by drift alone only if a high variance in bacterial reproduction could be assumed. These findings suggest that evolutionary forces influencing antigen variation in A. phagocytophilum may comprise random genetic drift as well as some innate but apparently nonpurifying selection prior to the strong frequency-dependent selection that occurs cyclically after hosts develop strong adaptive immunity.
doi:10.1093/molbev/msr229
PMCID: PMC3245543
PMID: 21965342
Anaplasma phagocytophilum; msp2(p44); antigen variation; recombination; drift; selection
Anopheles gambiae sensu stricto exists as two often-sympatric races termed the M and S molecular forms, characterized by fixed differences at an X-linked marker. Extreme divergence between M and S forms at pericentromeric “genomic islands” suggested that selection on variants therein could be driving interform divergence in the presence of ongoing gene flow, but recent work has detected much more widespread genomic differentiation. Whether such genomic islands are important in reproductive isolation or represent ancestral differentiation preserved by low recombination is currently unclear. A critical test of these competing hypotheses could be provided by comparing genomic divergence when rates of recent introgression vary. We genotyped 871 single nucleotide polymorphisms (SNPs) in A. gambiae sensu stricto from locations of M and S sympatry and allopatry, encompassing the full range of observed hybridization rates (0–25%). M and S forms were readily partitioned based on genomewide SNP variation in spite of evidence for ongoing introgression that qualitatively reflects hybridization rates. Yet both the level and the heterogeneity of genomic divergence varied markedly in line with levels of introgression. A few genomic regions of differentiation between M and S were common to each sampling location, the most pronounced being two centromere–proximal speciation islands identified previously but with at least one additional region outside of areas expected to exhibit reduced recombination. Our results demonstrate that extreme divergence at genomic islands does not simply represent segregating ancestral polymorphism in regions of low recombination and can be resilient to substantial gene flow. This highlights the potential for islands comprising a relatively small fraction of the genome to play an important role in early-stage speciation when reproductive isolation is limited.
doi:10.1093/molbev/msr199
PMCID: PMC3259608
PMID: 21836185
Anopheles gambiae; malaria vector; speciation island; single nucleotide polymorphism; hybridization
In order to gain further insight into the processes underlying rapid reproductive protein evolution, we have conducted a population genetic survey of 44 reproductive tract–expressed proteases, protease inhibitors, and targets of proteolysis in Drosophila melanogaster and Drosophila simulans. Our findings suggest that positive selection on this group of genes is temporally heterogeneous, with different patterns of selection inferred using tests sensitive at different time scales. Such variation in the strength and targets of selection through time may be expected under models of sexual conflict and/or host–pathogen interaction. Moreover, available functional information concerning the genes that show evidence of selection suggests that both sexual selection and immune processes have been important in the evolutionary history of this group of molecules.
doi:10.1093/molbev/msr197
PMCID: PMC3283112
PMID: 21940639
seminal proteins; sexual conflict; sperm competition; proteolysis; coevolution; immunity