Streptococcus mutans is widely recognized as one of the key etiological agents of human dental caries. Despite its role in this important disease, our present knowledge of gene content variability across the species and its relationship to adaptation is minimal, and no estimates of its demographic history are available. In this study, we generated genome sequences of 57 S. mutans isolates, as well as representative strains of the species most closely related to S. mutans (S. ratti, S. macacae, and S. criceti), to identify the overall structure and potential adaptive features of the dispensable and core components of the genome. We also performed population genetic analyses on the core genome of the species aimed at understanding its demographic history and the impact of selection on its genetic variation. The maximum gene content divergence among strains was approximately 23%, with the majority of strains diverging by 5–15%. The core genome consisted of 1,490 genes and the pan-genome of approximately 3,296 genes. Maximum likelihood analysis of the synonymous site frequency spectrum (SFS) suggested that the S. mutans population started expanding exponentially approximately 10,000 years ago (95% confidence interval [CI]: 3,268–14,344 years ago), coinciding with the onset of human agriculture. Analysis of the replacement SFS indicated that a majority of these substitutions are under strong negative selection, and the remainder evolved neutrally. A set of 14 genes was identified as being under positive selection, most of which were involved in either sugar metabolism or acid tolerance. Analysis of the core genome suggested that, among 73 genes present in all isolates of S. mutans but absent in other species of the mutans taxonomic group, the majority can be associated with metabolic processes that could have contributed to the successful adaptation of S. mutans to its new niche, the human mouth, and with the dietary changes that accompanied the origin of agriculture.
Streptococcus mutans; demographic inference; cavities; bacterial evolution; pan and core genome; infectious disease
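The core and pan-genome figures above reduce to set operations on per-strain gene repertoires. The sketch below is a minimal illustration only, assuming each strain is given as a set of orthologous-gene identifiers and assuming gene content divergence is measured as a Jaccard distance (1 minus shared genes over the union); the study's exact divergence measure is not stated here, and the strain and gene names are hypothetical.

```python
from itertools import combinations

def pan_core(gene_sets):
    """Return (core, pan): genes present in every strain, and in any strain."""
    sets = list(gene_sets.values())
    return set.intersection(*sets), set.union(*sets)

def content_divergence(a, b):
    """ASSUMPTION: divergence = 1 - |A & B| / |A | B| (Jaccard distance)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def max_divergence(gene_sets):
    """Largest pairwise gene content divergence among all strains."""
    return max(content_divergence(gene_sets[x], gene_sets[y])
               for x, y in combinations(gene_sets, 2))

# Hypothetical toy presence/absence data for three strains.
strains = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g3", "g5"},
    "strainC": {"g1", "g2", "g6"},
}
core, pan = pan_core(strains)
print(sorted(core), len(pan))            # ['g1', 'g2'] 6
print(round(max_divergence(strains), 3))  # 0.6
```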
Sociocultural phenomena, such as exogamy or philopatry, can largely determine human sex-specific demography. In Central Africa, diverging patterns of sex-specific genetic variation have been observed between mobile hunter–gatherer Pygmies and sedentary agricultural non-Pygmies. However, their sex-specific demography remains largely unknown. Using population genetics and approximate Bayesian computation approaches, we inferred male and female effective population sizes, sex-specific migration, and admixture rates in 23 Central African Pygmy and non-Pygmy populations, genotyped for autosomal, X-linked, Y-linked, and mitochondrial markers. We found much larger effective population sizes and migration rates among non-Pygmy populations than among Pygmies, in agreement with the recent expansions and migrations of non-Pygmies and, conversely, the isolation and stationary demography of Pygmy groups. We found larger effective sizes and migration rates for males than for females for Pygmies, and vice versa for non-Pygmies. Thus, although most Pygmy populations have patrilocal customs, their sex-specific genetic patterns resemble those of matrilocal populations. In fact, our results are consistent with a lower prevalence of polygyny and patrilocality in Pygmies compared with non-Pygmies and a potential female transmission of reproductive success in Pygmies. Finally, Pygmy populations showed variable admixture levels with the non-Pygmies, with often much larger introgression from male than from female lineages. Social discrimination against Pygmies triggering complex movements of spouses in intermarriages can explain these male-biased admixture patterns in a patrilocal context. We show how gender-related sociocultural phenomena can determine highly variable sex-specific demography among populations, and how population genetic approaches contrasting chromosomal types allow inferring detailed human sex-specific demographic history.
approximate Bayesian computation; African Pygmy; demography; history; human population genetics; sex specific
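Contrasting chromosome types is informative because each type is transmitted through a different mix of males and females. As a worked example (standard textbook formulas, not the ABC procedure used in the study), with N_m breeding males and N_f breeding females Wright's effective sizes are N_e(A) = 4N_mN_f/(N_m + N_f) for autosomes and N_e(X) = 9N_mN_f/(4N_m + 2N_f) for the X chromosome, while Y-linked and mitochondrial variation track males and females respectively. The census numbers below are hypothetical.

```python
def autosomal_ne(n_males, n_females):
    """Wright's effective size for autosomal loci with an unequal sex ratio."""
    return 4 * n_males * n_females / (n_males + n_females)

def x_linked_ne(n_males, n_females):
    """Effective size for X-linked loci (males carry a single X)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)

def x_to_autosome_ratio(n_males, n_females):
    """Ratio N_e(X) / N_e(A); equals 0.75 when the sexes are equally numerous."""
    return x_linked_ne(n_males, n_females) / autosomal_ne(n_males, n_females)

print(x_to_autosome_ratio(50, 50))  # 0.75
print(x_to_autosome_ratio(25, 75))  # 0.9
```

A ratio above 0.75 therefore points to a larger female than male effective size, and a ratio below 0.75 to the reverse.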
The whey acidic protein (WAP) four-disulfide core domain (WFDC) locus located on human chromosome 20q13 spans 19 genes with WAP and/or Kunitz domains. These genes participate in antimicrobial, immune, and tissue homeostasis activities. Neighboring SEMG genes encode seminal proteins Semenogelin 1 and 2 (SEMG1 and SEMG2). WFDC and SEMG genes have a strikingly high rate of amino acid replacement (dN/dS), indicative of responses to adaptive pressures during vertebrate evolution. To better understand the selection pressures acting on WFDC genes in human populations, we resequenced 18 genes and 54 noncoding segments in 71 European (CEU), African (YRI), and Asian (CHB + JPT) individuals. Overall, we identified 484 single-nucleotide polymorphisms (SNPs), including 65 coding variants (of which 49 are nonsynonymous differences). Using classic neutrality tests, we confirmed the signature of short-term balancing selection on WFDC8 in Europeans and a signature of positive selection spanning genes PI3, SEMG1, SEMG2, and SLPI. Associated with the latter signal, we identified an unusually homogeneous derived 100-kb haplotype with a frequency of 88% in Asian populations. A putative candidate variant targeted by selection is Thr56Ser in SEMG1, which may alter the proteolytic profile of SEMG1 and antimicrobial activities of semen. All the well-characterized genes residing in the WFDC locus encode proteins that appear to have a role in immunity and/or fertility, two processes that are often associated with adaptive evolution. This study provides further evidence that the WFDC and SEMG loci have been under strong adaptive pressure within the short timescale of modern humans.
WFDC; semenogelins; natural selection; innate immunity; serine protease inhibitors; reproduction
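Tajima's D is one example of the kind of classic neutrality test mentioned above: it compares the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by a1 = Σ_{i=1}^{n−1} 1/i. Positive values indicate an excess of intermediate-frequency variants, as expected under balancing selection; negative values indicate an excess of rare variants, as after a selective sweep or population expansion. This is a minimal sketch assuming gap-free, equal-length haplotypes, not the study's own analysis pipeline.

```python
import math
from itertools import combinations

def tajimas_d(haplotypes):
    """Tajima's D for aligned haplotypes of equal length (no gaps or missing data)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    # Segregating sites: alignment columns with more than one distinct base.
    s = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: mean number of differences over all pairs of haplotypes.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Tajima's (1989) normalizing constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

print(tajimas_d(["AA", "AA", "TT", "TT"]) > 0)  # True: intermediate-frequency variants
print(tajimas_d(["AA", "AA", "AT", "TA"]) < 0)  # True: two singletons
```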
Interspersed and tandem repeat sequences comprise the bulk of mammalian genomes. Interspersed repeats result from successive replication by transposable elements, such as Alu and long interspersed element type 1 (L1). Microsatellites are tandem repeats of units 1–6 base pairs long, among which poly(A) microsatellites are the most abundant in the human genome. The rise and fall of a microsatellite has been depicted as a life cycle. Previous studies have demonstrated that Alu and L1 insertions are a major source of A-rich microsatellites owing to the concurrent formation of a poly(A) DNA tract at the 3′-end of each insertion. The fate of such poly(A) tracts has been studied by surveying the length distribution of genomic resident Alu and L1 insertions. However, these cross-sectional studies provide no information about the tempo of mutation immediately after birth. In this study, de novo L1 insertions were created using a transgenic L1 mouse model and traced through generations to investigate the early life of poly(A) microsatellites. High frequencies of intra-individual and intergenerational shortening were observed for long poly(A) tracts, creating somatic and germline mosaicism at the insertion site, whereas little variation was observed for short poly(A) alleles. As poly(A) microsatellites are the major intrinsic signal for nucleosome positioning, their remarkable abundance and variability make them a significant source of epigenetic variation. Thus, the birth of poly(A) microsatellites from retrotransposons and the subsequent rapid and variable shortening represent a new way in which retrotransposons can modify the genetic and epigenetic architecture of our genome.
development; LINE-1; mononucleotide microsatellite repeat; mosaicism; mouse model; poly(A) tract shortening
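Following a poly(A) tract through generations comes down to measuring its length at each insertion site in each sample. A minimal sketch, assuming the insertion-site sequence is already assembled and that tracts are uninterrupted runs of A (real tracts can contain interruptions); the 10-base minimum is an arbitrary illustrative cutoff, and the example sequences are hypothetical.

```python
import re

def poly_a_runs(seq, min_len=10):
    """Return (start, length) for each run of at least min_len consecutive A's."""
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]

def terminal_poly_a_length(insertion_seq):
    """Length of the uninterrupted poly(A) tract at the 3' end of an insertion."""
    return len(insertion_seq) - len(insertion_seq.upper().rstrip("A"))

print(poly_a_runs("GGC" + "A" * 12 + "TG" + "A" * 5 + "C"))  # [(3, 12)]
print(terminal_poly_a_length("ACGT" + "A" * 20))               # 20
```

Comparing such lengths between tissues of one animal, or between parent and offspring, is what separates somatic from germline changes.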
Dissecting the genetic basis for the evolution of species differences requires a combination of phylogenetic and molecular genetic perspectives. By mapping the genetic changes and their phenotypic effects onto the phylogeny, it is possible to distinguish changes that may have been directly responsible for a new character state from those that fine-tune the transition. Here, we use phylogenetic and functional methods to trace the evolution of substrate specificity in dihydroflavonol-4-reductase (Dfr), an anthocyanin pathway gene known to be involved in the transition from blue to red flowers in Iochroma. Ancestral state reconstruction indicates that three substitutions occurred during the flower color transition, whereas several additional substitutions followed the transition. Comparisons of enzymatic function between ancestral proteins in blue- and red-flowered lineages and proteins from present-day taxa demonstrate that evolution of specificity for red pigment precursors was caused by the first three substitutions, which were fixed by positive selection and which differ from previously documented mutations affecting specificity. Two inferred substitutions subsequent to the initial flower color transition were also adaptive and resulted in an additional increase in specificity for red precursors. Epistatic interactions among both sets of substitutions may have limited the order of substitutions along branches of the phylogeny leading from blue-pigmented ancestors to the present-day red-flowered taxa. These results suggest that the species differences in DFR specificity may arise by a combination of selection on flower color and selection for improved pathway efficiency but that the exact series of genetic changes resulting in the evolution of specificity is likely to be highly contingent on the starting state.
ancestral state reconstruction; anthocyanins; dihydroflavonol-4-reductase; flower color; Iochroma
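The logic of placing substitutions on branches can be illustrated with Fitch parsimony for a single amino acid site. This is a simpler, swapped-in method (the abstract does not say which reconstruction method the study used); it performs only the bottom-up pass, and the tree, taxon names, and residues are hypothetical.

```python
def fitch(tree, states):
    """Fitch small parsimony for one character.

    tree: nested 2-tuples of leaf-name strings; states: leaf name -> state.
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1

# Hypothetical tree with an outgroup and one residue per taxon.
tree = ("outgroup", ((("sp1", "sp2"), "sp3"), ("sp4", "sp5")))
states = {"outgroup": "N", "sp1": "N", "sp2": "N", "sp3": "N", "sp4": "D", "sp5": "D"}
print(fitch(tree, states))  # ({'N'}, 1)
```

When the root set contains more than one state, the reconstruction is ambiguous, and a top-down pass or a likelihood method is needed to choose among them.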
Effective population size is fundamental in population genetics and characterizes genetic diversity. To infer past population dynamics from molecular sequence data, coalescent-based models have been developed for Bayesian nonparametric estimation of effective population size over time. Among the most successful is a Gaussian Markov random field (GMRF) model for a single gene locus. Here, we present a generalization of the GMRF model that allows for the analysis of multilocus sequence data. Using simulated data, we demonstrate the improved performance of our method to recover true population trajectories and the time to the most recent common ancestor (TMRCA). We analyze a multilocus alignment of HIV-1 CRF02_AG gene sequences sampled from Cameroon. Our results are consistent with HIV prevalence data and uncover some aspects of the population history that go undetected in Bayesian parametric estimation. Finally, we recover an older and more reconcilable TMRCA for a classic ancient DNA data set.
coalescent; smoothing; effective population size; Gaussian Markov random fields
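The GMRF prior smooths the trajectory by penalizing differences in log effective population size between adjacent time intervals. A minimal sketch of the unnormalized log density of a first-order random-walk GMRF with precision τ, log p(γ | τ) = ((k − 1)/2) log τ − (τ/2) Σ (γ_{i+1} − γ_i)², where γ_i is log N_e in interval i. The coalescent likelihood, any time-aware weighting, and the multilocus extension described above are omitted, and the example trajectories are hypothetical.

```python
import math

def gmrf_log_prior(log_ne, precision):
    """Unnormalized log density of a first-order random-walk GMRF on log Ne.

    Larger precision penalizes jumps between adjacent intervals more strongly.
    """
    k = len(log_ne)
    sq_diffs = sum((b - a) ** 2 for a, b in zip(log_ne, log_ne[1:]))
    # The RW1 precision matrix has rank k - 1, hence the (k - 1)/2 factor.
    return 0.5 * (k - 1) * math.log(precision) - 0.5 * precision * sq_diffs

smooth = [math.log(1000.0)] * 5
rough = [math.log(v) for v in (1000.0, 8000.0, 500.0, 6000.0, 900.0)]
print(gmrf_log_prior(smooth, 2.0) > gmrf_log_prior(rough, 2.0))  # True
```

Larger τ makes jumps more costly, which lets the model borrow strength across intervals without fixing a parametric growth curve.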
The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable for selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich’s ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants.
microsatellites; fitness landscape; natural selection; population genetic inference; Friedreich’s ataxia; tandem repeats
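A toy forward simulation helps make mutation-selection-drift dynamics at a multiallelic locus concrete. The sketch below is a haploid Wright–Fisher population under a symmetric stepwise mutation model (each mutation adds or removes one repeat unit, with a floor of one unit). It is not the study's microsatellite mutation model or simulation algorithm, and the threshold fitness function is a hypothetical placeholder for a fitness surface over allele lengths.

```python
import random
from collections import Counter

def simulate_wf_smm(pop_size, generations, mu, start_len=10, fitness=None, seed=0):
    """Return a Counter of repeat lengths after drift, optional selection,
    and symmetric stepwise mutation (haploid Wright-Fisher model)."""
    rng = random.Random(seed)
    pop = [start_len] * pop_size
    for _ in range(generations):
        # Selection + drift: offspring pick parents in proportion to fitness
        # (uniformly when no fitness function is given).
        weights = None if fitness is None else [fitness(a) for a in pop]
        pop = rng.choices(pop, weights=weights, k=pop_size)
        # Mutation: with probability mu, gain or lose one repeat unit.
        pop = [max(1, a + rng.choice((-1, 1))) if rng.random() < mu else a
               for a in pop]
    return Counter(pop)

# HYPOTHETICAL fitness surface: alleles longer than 20 units pay a 5% cost.
def threshold_fitness(length):
    return 0.95 if length > 20 else 1.0

neutral = simulate_wf_smm(500, 1000, 0.005)
selected = simulate_wf_smm(500, 1000, 0.005, fitness=threshold_fitness)
print(sorted(neutral.items()))
print(sorted(selected.items()))
```

Swapping in a different `fitness` callable is the only change needed to explore other hypothetical fitness surfaces.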
Diatoms are the most species-rich group of microalgae, and their contribution to marine primary production is important on a global scale. Diatoms can form dense blooms through rapid asexual reproduction; mutations acquired and propagated during blooms likely provide the genetic, and thus phenotypic, variability upon which natural selection may act. Positive selection was tested using genome- and transcriptome-wide pairwise comparisons of homologs in three genera of diatoms (Pseudo-nitzschia, Ditylum, and Thalassiosira) that represent decreasing phylogenetic distances. The signal of positive selection was greatest between two strains of Thalassiosira pseudonana. Further testing among seven strains of T. pseudonana yielded 809 candidate genes under positive selection, representing 7% of the protein-coding genes. Orphan genes and genes encoding protein-binding domains and transcriptional regulators were enriched within the set of positively selected genes relative to the genome as a whole. Positively selected genes were linked to the potential selective pressures of nutrient limitation and sea surface temperature based on analysis of gene expression profiles and identification of positively selected genes in subsets of strains from locations with similar environmental conditions. The identification of positively selected genes presents an opportunity to test new hypotheses in natural populations and the laboratory that integrate selected genotypes in T. pseudonana with their associated phenotypes and selective forces.
positive selection; natural selection; diatom; evolution
Reconstruction of the past is an important task of evolutionary biology. It takes place at different points in a hierarchy of molecular variation, including genes, individuals, populations, and species. Statistical inference about population histories has recently received considerable attention, following the development of computational tools to provide tractable approaches to this very challenging problem. Here, we introduce a likelihood-based approach which generalizes a recently developed model for random fluctuations in allele frequencies based on an approximation to the neutral Wright–Fisher diffusion. Our new framework approximates the infinite alleles Wright–Fisher model and uses an implementation with an adaptive Markov chain Monte Carlo algorithm. The method is especially well suited to data sets harboring large population samples and relatively few loci for which other likelihood-based models are currently computationally intractable. Using our model, we reconstruct the global population history of a major human pathogen, Streptococcus pneumoniae. The results illustrate the potential to gain important biological insights into an evolutionary process by a population genetics approach, which can appropriately accommodate very large population samples.
population history; genetic drift; infinite alleles Wright–Fisher model
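Under the neutral infinite-alleles model every mutation creates a new allele. A classic consequence (from Ewens' sampling formula) is that the expected number of distinct alleles in a sample of n genes is E[K] = Σ_{i=0}^{n−1} θ/(θ + i), with θ = 4N_eμ for a diploid population. The snippet below only evaluates this expectation; it is not the diffusion-based likelihood described above, and the θ and n values are illustrative.

```python
def expected_num_alleles(theta, n):
    """Expected number of distinct alleles in a sample of n genes under the
    neutral infinite-alleles model (Ewens)."""
    return sum(theta / (theta + i) for i in range(n))

print(expected_num_alleles(1.0, 2))  # 1.5
print(expected_num_alleles(1.0, 1000))
```

Because E[K] grows roughly as θ ln n for large samples, doubling an already large sample adds only about θ ln 2 new alleles on average.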
Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike’s information criterion through Markov chain Monte Carlo (AICM), in Bayesian model selection of demographic and molecular clock models. Almost simultaneously, a Bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may become unreliable when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets.
model comparison; marginal likelihood; Bayes factors; path sampling; stepping-stone sampling; model averaging; molecular clock; Bayesian inference; phylogeny; BEAST
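With Bayesian model averaging, the Bayes factor between two clock models can be estimated from MCMC draws of the model indicator as posterior odds divided by prior odds. The minimal sketch below, with hypothetical model labels, also shows why the estimate degrades as the posterior probability of the MAP model approaches 1: if the competing model is never sampled, the ratio is unbounded.

```python
import math
from collections import Counter

def bayes_factor(model_samples, m1, m2, prior_odds=1.0):
    """Estimate BF(m1 : m2) from MCMC draws of a model indicator.

    prior_odds = P(m1) / P(m2); use 1.0 for equal prior model probabilities.
    """
    counts = Counter(model_samples)
    if counts[m1] == 0 and counts[m2] == 0:
        raise ValueError("neither model was sampled")
    if counts[m2] == 0:
        return math.inf  # m2 never visited: the estimate is unbounded
    return (counts[m1] / counts[m2]) / prior_odds

draws = ["clockA"] * 900 + ["clockB"] * 100
print(bayes_factor(draws, "clockA", "clockB"))              # 9.0
print(bayes_factor(["clockA"] * 1000, "clockA", "clockB"))  # inf
```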
Plasmids of the incompatibility group IncP-1 can transfer and replicate in many genera of the Proteobacteria. They are composed of backbone genes that encode a variety of essential functions and accessory genes that have implications for human health and environmental remediation. Although it is well understood that the accessory genes are transferred horizontally between plasmids, recent studies have also provided examples of recombination in the backbone genes of IncP-1 plasmids. As a consequence, phylogeny estimation based on backbone genes is expected to produce conflicting gene tree topologies. The main goal of this study was therefore to infer the evolutionary history of IncP-1 plasmids in the presence of both vertical and horizontal gene transfer. This was achieved by quantifying the incongruence among gene trees and attributing it to known causes such as 1) phylogenetic uncertainty, 2) coalescent stochasticity, and 3) horizontal inheritance. Topologies of gene trees exhibited more incongruence than could be attributed to phylogenetic uncertainty alone. Species-tree estimation using a Bayesian framework that takes coalescent stochasticity into account was well supported, but it differed slightly from the maximum-likelihood tree estimated by concatenation of backbone genes. After removal of the gene that demonstrated a signal of intergroup recombination, the concatenated tree was congruent with the species-tree estimate, which itself was robust to inclusion/exclusion of the recombinant gene. Thus, in spite of horizontal gene exchange both within and among IncP-1 subgroups, the backbone genome of these IncP-1 plasmids retains a detectable vertical evolutionary history.
plasmid; phylogeny; species tree; genomics; horizontal gene transfer
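One common way to quantify incongruence between two gene trees is the Robinson–Foulds distance, the number of groupings found in one tree but not the other. The standard measure uses unrooted bipartitions; for brevity, this sketch uses the rooted-cluster variant on trees written as nested tuples. It is an illustration of the idea, not the specific measure used in the study, and the taxon names are hypothetical.

```python
def clusters(tree):
    """Non-trivial clusters (leaf sets of internal nodes) of a rooted tree
    given as nested tuples of leaf-name strings."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root cluster is shared by every tree
    return found

def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clusters."""
    return len(clusters(t1) ^ clusters(t2))

print(rf_distance((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))))  # 4
print(rf_distance((("A", "B"), ("C", "D")), (("B", "A"), ("D", "C"))))  # 0
```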
Network characteristics of biochemical pathways are believed to influence the rate of evolutionary change in constituent enzymes. One characteristic that may affect rate heterogeneity is control of the amount of product produced by a biochemical pathway or flux control. In particular, theoretical analyses suggest that adaptive substitutions should be concentrated in the enzyme(s) that exert the greatest control over flux. Although a handful of studies have found a correlation between position in a pathway and evolutionary rate, these investigations have not examined the relationship between evolutionary rate and flux control. Given that genes with greater control will experience stronger selection and that the probability of fixation is proportional to the selective advantage, we ask the following: 1) do upstream enzymes have majority flux control, 2) do enzymes with majority flux control accumulate adaptive substitutions, and 3) are upstream enzymes under higher selective constraint? First, by perturbing the enzymes in the aliphatic glucosinolate pathway in Arabidopsis thaliana with gene insertion lines, we show that flux control is focused in the first enzyme in the pathway. Next, by analyzing several sequence signatures of selection, we also show that this enzyme is the only one in the pathway that shows convincing evidence of selection. Our results support the hypothesis that natural selection preferentially acts on enzymes with high flux control.
flux control; evolution; Arabidopsis thaliana; natural selection; glucosinolates
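The flux control coefficient of enzyme i is C_i = ∂ln J / ∂ln e_i, the relative change in pathway flux per relative change in that enzyme's activity, and the coefficients of all enzymes sum to 1. The sketch below estimates them by perturbing one enzyme at a time. The flux function is a HYPOTHETICAL toy in which each enzyme contributes a "resistance" 1/e_i (J = 1/Σ 1/e_i); it is not a model of the glucosinolate pathway, and insertion lines produce much larger perturbations than the small step used here.

```python
import math

def toy_flux(activities):
    """HYPOTHETICAL toy pathway: J = 1 / sum(1 / e_i)."""
    return 1.0 / sum(1.0 / e for e in activities)

def flux_control_coefficients(flux_fn, activities, rel_step=1e-4):
    """Finite-difference estimate of C_i = dlnJ/dln(e_i) for each enzyme."""
    base = math.log(flux_fn(activities))
    coeffs = []
    for i, e in enumerate(activities):
        perturbed = list(activities)
        perturbed[i] = e * (1 + rel_step)  # small relative increase in enzyme i
        coeffs.append((math.log(flux_fn(perturbed)) - base) / math.log(1 + rel_step))
    return coeffs

c = flux_control_coefficients(toy_flux, [1.0, 5.0, 5.0, 10.0])
print([round(x, 3) for x in c])
print(round(sum(c), 3))
```

In this toy the first, least active enzyme holds about two-thirds of the control (analytically 2/3, 2/15, 2/15, 1/15), mirroring the kind of concentrated control the abstract reports for the first enzyme.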
Dinoflagellates produce a variety of toxic secondary metabolites that have a significant impact on marine ecosystems and fisheries. Saxitoxin (STX), the cause of paralytic shellfish poisoning, is produced by three marine dinoflagellate genera and is also made by some freshwater cyanobacteria. Genes involved in STX synthesis have been identified in cyanobacteria but are yet to be reported in the massive genomes of dinoflagellates. We have assembled comprehensive transcriptome data sets for several STX-producing dinoflagellates and a related non-toxic species and have identified 265 putative homologs of 13 cyanobacterial STX synthesis genes, including all of the genes directly involved in toxin synthesis. Putative homologs of four proteins group closely in phylogenies with cyanobacteria and are likely the functional homologs of sxtA, sxtG, and sxtB in dinoflagellates. However, the phylogenies do not support the transfer of these genes directly between toxic cyanobacteria and dinoflagellates. SxtA is split into two proteins in the dinoflagellates corresponding to the N-terminal portion containing the methyltransferase and acyl carrier protein domains and a C-terminal portion with the aminotransferase domain. Homologs of sxtB and N-terminal sxtA are present in non-toxic strains, suggesting their functions may not be limited to saxitoxin production. Only homologs of the C-terminus of sxtA and sxtG were found exclusively in toxic strains. A more thorough survey of STX+ dinoflagellates will be needed to determine if these two genes may be specific to STX production in dinoflagellates. The Alexandrium tamarense transcriptome does not contain homologs for the remaining STX genes. Nevertheless, we identified candidate genes with similar predicted biochemical activities that account for the missing functions. 
These results suggest that the STX synthesis pathway was likely assembled independently in the distantly related cyanobacteria and dinoflagellates, although using some evolutionarily related proteins. The biological role of STX is not well understood in either cyanobacteria or dinoflagellates. However, STX production in these two ecologically distinct groups of organisms suggests that this toxin confers a benefit to producers that we do not yet fully understand.
Saxitoxin; dinoflagellate; evolution; gene transfer; secondary metabolism; toxin
The inverse correlation between skin pigmentation and latitude observed in human populations is thought to have been shaped by selective pressures favoring lighter skin to facilitate vitamin D synthesis in regions far from the equator. Several candidate genes for skin pigmentation have been shown to exhibit patterns of polymorphism that overlap the geospatial variation in skin color. However, little work has focused on estimating the time frame over which skin pigmentation has changed and on the intensity of selection acting on different pigmentation genes. To provide a temporal framework for the evolution of lighter pigmentation, we used forward Monte Carlo simulations coupled with a rejection sampling algorithm to estimate the time of onset of selective sweeps and selection coefficients at four genes associated with this trait in Europeans: KITLG, TYRP1, SLC24A5, and SLC45A2. Using compound haplotype systems consisting of rapidly evolving microsatellites linked to one single-nucleotide polymorphism in each gene, we estimate that the onset of the sweep shared by Europeans and East Asians at KITLG occurred approximately 30,000 years ago, after the out-of-Africa migration, whereas the selective sweeps for the European-specific alleles at TYRP1, SLC24A5, and SLC45A2 started much later, within the last 11,000–19,000 years, well after the first migrations of modern humans into Europe. We suggest that these patterns were influenced by recent increases in size of human populations, which favored the accumulation of advantageous variants at different loci.
pigmentation genes; age of selection; selection coefficient; European populations
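How a selection coefficient trades off against sweep duration can be seen from the deterministic approximation in which the log-odds of the favored allele, ln[p/(1 − p)], increase by roughly s per generation (genic selection, small s). This sketch ignores drift, dominance, and the linked microsatellites, all of which the study's forward simulations with rejection sampling take into account; the 25-year generation time and the start and end frequencies are assumptions for illustration.

```python
import math

def sweep_generations(s, p0, p1):
    """Approximate generations for an allele to rise from frequency p0 to p1
    when its log-odds grow by about s per generation (no drift)."""
    return math.log(p1 * (1 - p0) / (p0 * (1 - p1))) / s

GENERATION_YEARS = 25  # ASSUMPTION: illustrative human generation time

gens = sweep_generations(0.01, 0.01, 0.99)
print(round(gens), round(gens * GENERATION_YEARS))  # 919 22976
```

Halving s doubles the sweep duration, so for a fixed present-day frequency an older onset implies weaker selection.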
We present a novel method to identify sites under selection in protein-coding genes. Our method combines the traditional Goldman–Yang model of coding-sequence evolution with the information obtained from the 3D structure of the evolving protein, specifically the relative solvent accessibility (RSA) of individual residues. We develop a random-effects likelihood sites model in which rate classes are RSA dependent. The RSA dependence is modeled with linear functions. We demonstrate that our RSA-dependent model provides a significantly better fit to molecular sequence data than does a traditional, RSA-independent model. We further show that our model provides a natural, RSA-dependent neutral baseline for the evolutionary rate ratio ω = dN/dS. Sites that deviate from this neutral baseline likely experience selection pressure for function. We apply our method to the influenza proteins hemagglutinin and neuraminidase. For hemagglutinin, our method recovers positively selected sites near the sialic acid-binding site and negatively selected sites that may be important for trimerization. For neuraminidase, our method recovers the oseltamivir resistance site and otherwise suggests that few sites deviate from the neutral baseline. Our method is broadly applicable to any protein sequences for which structural data are available or can be obtained via homology modeling or threading.
positive selection; protein evolution; relative solvent accessibility; influenza
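The idea of an RSA-dependent neutral baseline can be illustrated with a much simpler stand-in than the random-effects likelihood model described above: given per-site ω estimates (assumed to come from some other method) and per-site RSA values, fit a straight line by ordinary least squares and flag sites whose ω lies far above or below it. The threshold and the synthetic data are hypothetical, and a robust fit would reduce the pull that outlying sites exert on the baseline.

```python
def fit_line(x, y):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def flag_deviating_sites(rsa, omega, threshold):
    """Flag sites whose omega deviates from a linear RSA baseline by more than threshold."""
    intercept, slope = fit_line(rsa, omega)
    flags = {}
    for i, (r, w) in enumerate(zip(rsa, omega)):
        residual = w - (intercept + slope * r)
        if residual > threshold:
            flags[i] = "above"  # candidate positive selection
        elif residual < -threshold:
            flags[i] = "below"  # candidate extra constraint
    return (intercept, slope), flags

# Synthetic data: omega rises linearly with RSA, plus one outlying site.
rsa = [i / 10 for i in range(11)]
omega = [0.1 + 0.5 * r for r in rsa]
omega[5] += 1.0
(intercept, slope), flags = flag_deviating_sites(rsa, omega, threshold=0.3)
print(flags)  # {5: 'above'}
```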
Recently, Lee et al. (Lee JH, Silhavy JL, Lee JE, et al. (30 co-authors). 2012. Evolutionarily assembled cis-regulatory module at a human ciliopathy locus. Science 335:966–969) demonstrated that mutation in either of the transmembrane protein-encoding genes, TMEM138 or TMEM216, causes phenotypically indistinguishable ciliopathy. Furthermore, on the basis of the observation that their orthologs are linked in a head-to-tail configuration in other mammals and Anolis, but present on different scaffolds or chromosomes in Xenopus tropicalis and zebrafish, the authors concluded that the two genes were joined by chromosomal rearrangement at the evolutionary amphibian-to-reptile transition to form a functional module. We have sequenced these gene loci in a cartilaginous fish, the elephant shark, and found that the two genes together with a related gene (Tmem80) constitute a tandem cluster. This suggests that the two genes were already linked in the vertebrate ancestor and then rearranged independently in Xenopus and zebrafish. Analyses of the coelacanth and lamprey genomes support this hypothesis. Our study highlights the importance of basal vertebrates as critical reference genomes.
Callorhinchus milii; coelacanth; teleost fishes; comparative genomics
The nuclear genomes of euglenids contain three types of introns: conventional spliceosomal introns, nonconventional introns for which a splicing mechanism is unknown (variable noncanonical borders, RNA secondary structure bringing together intron ends), and so-called intermediate introns, which combine features of conventional and nonconventional introns. Analysis of two genes, tubA and tubB, from 20 species of euglenids reveals contrasting distribution patterns of conventional and nonconventional introns—positions of conventional introns are conserved, whereas those of the nonconventional ones are unique to individual species or small groups of closely related taxa. Moreover, in the group of phototrophic euglenids, 11 events of conventional intron loss versus 15 events of nonconventional intron gain were identified. A comparison of all nonconventional intron sequences highlighted the most conserved elements in their sequence and secondary structure. Our results led us to put forward two hypotheses. 1) The first one posits that mutational changes in intron sequence could lead to a change in their excision mechanism—intermediate introns would then be a transitional form between the conventional and nonconventional introns. 2) The second hypothesis concerns the origin of nonconventional introns—because of the presence of inverted repeats near their ends, insertion of MITE-like transposon elements is proposed as a possible source of new introns.
euglenids; nonconventional introns; conventional spliceosomal introns; tubulin gene
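The transposon-origin hypothesis rests on inverted repeats near intron ends. A minimal check is whether the first k bases of an intron are the reverse complement of its last k bases. This sketch only detects perfect repeats located exactly at the ends; imperfect repeats, or repeats a few bases inside the intron, would need an alignment-based search, and the example sequence is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def terminal_inverted_repeat(intron, k):
    """True if the first k bases are the reverse complement of the last k bases."""
    if len(intron) < 2 * k:
        return False
    return intron[:k] == reverse_complement(intron[-k:])

def longest_terminal_inverted_repeat(intron, max_k=30):
    """Largest k (up to max_k) for which the intron ends form a perfect inverted repeat."""
    best = 0
    for k in range(1, min(max_k, len(intron) // 2) + 1):
        if terminal_inverted_repeat(intron, k):
            best = k
    return best

intron = "GATCCA" + "T" * 8 + "TGGATC"
print(longest_terminal_inverted_repeat(intron))  # 6
```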
In bacteria, physiological change may be effected by a single gene acquisition, producing ecological differentiation without genetic isolation. Natural selection acting on such differences can reduce the frequency of genotypes that arise from recombination at these loci. However, gene acquisition can only account for recombination interference in the fraction of the genome that is tightly linked to the integration site. To identify additional loci that contribute to adaptive differences, we examined orthologous genes in species of Enterobacteriaceae for significant differences in the degree of codon selection. Significance was assessed using the Adaptive Codon Enrichment metric, which accounts for the variation in codon usage bias that is expected to arise from mutation and drift; large differences in codon usage bias were identified in more genes than would be expected to arise from stochastic processes alone. Genes in the same operon showed parallel differences in codon usage bias, suggesting that changes in the overall levels of gene expression led to changes in the degree of adaptive codon usage. Most significant differences between orthologous operons were found among those involved with specific environmental adaptations, whereas "housekeeping" genes rarely showed significant changes. When considered together, the loci experiencing significant changes in codon selection outnumber potentially adaptive gene acquisition events. The identity of genes under strong codon selection seems to be influenced by the habitat from which the bacteria were isolated. We propose a two-stage model for how adaptation to different selective regimes can drive bacterial speciation. Initially, gene acquisitions catalyze rapid ecological differentiation, which modifies the utilization of genes, thereby changing the strength of codon selection on them. 
Alleles develop fitness variation by substitution, producing recombination interference at these loci in addition to those flanking acquired genes, allowing sequences to diverge across the entire genome and establishing genetic isolation (i.e., protection from frequent homologous recombination).
codon usage bias; codon selection; speciation; recombination interference
The level of within-species polymorphism differs greatly among genes in a genome. Many genomic studies have investigated the relationship between gene polymorphism and factors such as recombination rate or expression pattern. However, the polymorphism of a gene is affected not only by its physical properties or functional constraints but also by natural selection on organisms in their environments. Specifically, if functionally divergent alleles enable adaptation to different environments, locus-specific polymorphism may be maintained by spatially heterogeneous natural selection. To test this hypothesis and estimate the extent to which environmental selection shapes the pattern of genome-wide polymorphism, we define the "environmental relevance" of a gene as the proportion of genetic variation explained by environmental factors, after controlling for population structure. We found substantial effects of environmental relevance on patterns of polymorphism among genes. In addition, the correlation between environmental relevance and gene polymorphism is positive, consistent with the expectation that balancing selection among heterogeneous environments maintains genetic variation at ecologically important genes. Comparison of the gene ontology annotations shows that genes with high environmental relevance are enriched in unknown function categories. These results suggest an important role for environmental factors in shaping genome-wide patterns of polymorphism and point to a new direction for genomic studies.
environment; Arabidopsis thaliana; genome-wide polymorphism; genetic variation
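The abstract defines environmental relevance as the proportion of genetic variation explained by environmental factors after controlling for population structure. One common way to operationalize such a quantity is a partial R² from two nested linear regressions. The sketch below assumes that formulation, with population structure summarized by hypothetical covariates (for example, principal-component scores) and a single SNP's genotype as the response; it is not necessarily the estimator used in the study.

```python
import numpy as np


def environmental_relevance(genotype, structure, environment) -> float:
    """Partial R^2 of environment on genotype, given population structure.

    Fits genotype ~ intercept + structure (reduced model) and
    genotype ~ intercept + structure + environment (full model), then returns
    (RSS_reduced - RSS_full) / RSS_reduced: the share of variation left
    unexplained by structure that the environmental covariates account for.
    """
    y = np.asarray(genotype, dtype=float)
    n = y.shape[0]
    intercept = np.ones((n, 1))
    # Accept either 1-D vectors or (n, k) matrices of covariates.
    reduced = np.hstack([intercept, np.asarray(structure, dtype=float).reshape(n, -1)])
    full = np.hstack([reduced, np.asarray(environment, dtype=float).reshape(n, -1)])

    def rss(design: np.ndarray) -> float:
        # Ordinary least squares; lstsq returns (coefficients, residuals, rank, sv).
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    rss_reduced = rss(reduced)
    if rss_reduced < 1e-12:
        return 0.0  # structure alone already explains all variation
    return (rss_reduced - rss(full)) / rss_reduced
```

With structure `[0, 0, 1, 1]` and environment `[0, 1, 0, 1]`, a genotype equal to their sum gives a value near 1, while a genotype whose leftover variation is orthogonal to the environmental covariate gives a value near 0. Controlling for structure first matters because allele frequencies and environments are often both geographically patterned, so a raw genotype-environment correlation would otherwise be inflated.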
Genome reduction in obligately intracellular bacteria is one of the best-established patterns in the field of molecular evolution. In the extreme, many sap-feeding insects harbor nutritional symbionts with genomes so reduced that it is not clear how they perform basic cellular functions. For example, the primary symbiont of psyllids (Carsonella) maintains one of the smallest and most AT-rich bacterial genomes ever identified and has surprisingly lost many genes that are thought to be essential for its role in provisioning its host with amino acids. However, our understanding of this extreme case of genome reduction is limited, as genomic data for Carsonella are available from only a single host species, and little is known about the functional role of “secondary” bacterial symbionts in psyllids. To address these limitations, we analyzed complete Carsonella genomes from pairs of congeneric hosts in three divergent genera within the Psyllidae (Ctenarytaina, Heteropsylla, and Pachypsylla) as well as complete secondary symbiont genomes from two of these host species (Ctenarytaina eucalypti and Heteropsylla cubana). Although the Carsonella genomes are generally conserved in size, structure, and GC content and exhibit genome-wide signatures of purifying selection, we found that gene loss has remained active since the divergence of the host species and has had a particularly large impact on the amino acid biosynthesis pathways that define the symbiotic role of Carsonella. In some cases, additional bacterial symbionts may compensate for gene loss in Carsonella, as functional gene content indicates a high degree of metabolic complementarity between co-occurring symbionts. The genomes of the secondary symbionts also show signatures of long-term evolution as vertically transmitted, intracellular bacteria, including more extensive genome reduction than is typically observed in facultative symbionts. Therefore, a history of co-evolution with secondary bacterial symbionts can partially explain the ongoing genome reduction in Carsonella. However, the absence of these secondary symbionts in other host lineages indicates that the relationships are dynamic and that other mechanisms, such as changes in host diet or functional coordination with the host genome, must also be at play.
amino acid biosynthesis; Carsonella; endosymbiont; gene loss; purifying selection
Proteins in the superfamily of voltage-gated ion channels mediate behavior across the tree of life. These proteins regulate the movement of ions across cell membranes by opening and closing a central pore that controls ion flow. The best-known members of this superfamily are the voltage-gated potassium, calcium (Cav), and sodium (Nav) channels, which underlie impulse conduction in nerve and muscle. Not all members of this superfamily are opened by changes in voltage, however. NALCN (Na+ leak channel, nonselective) channels, which form a voltage-insensitive “sodium leak” conductance, have attracted growing interest. This study examines the phylogenetic relationships among Nav/Cav voltage-gated and voltage-insensitive channels in the eukaryotic group Opisthokonta, which includes animals, fungi, and their unicellular relatives. We show that NALCN channels diverged from voltage-gated channels before the divergence of fungi and animals and that the closest relatives of NALCN channels are fungal calcium channels, which they functionally resemble.
NALCN; Cch1; maximum likelihood; pore motif
Gene expression levels correlate with multiple aspects of gene sequence and gene structure in phylogenetically diverse taxa, suggesting an important role of gene expression levels in the evolution of protein-coding genes. Here we present results of a genome-wide study of the influence of gene expression on synonymous codon usage, amino acid composition, and gene structure in the red flour beetle, Tribolium castaneum. Consistent with the action of translational selection, we find that synonymous codon usage bias increases with gene expression. However, the correspondence between tRNA gene copy number and optimal codons is weak. At the amino acid level, translational selection is suggested by the positive correlation between tRNA gene numbers and amino acid usage, which is stronger for highly expressed genes. In addition, there is a clear trend toward increased use of metabolically cheaper, less complex amino acids as gene expression increases. tRNA gene numbers also correlate negatively with amino acid size/complexity (S/C) score, indicating a coupling between translational selection and selection to minimize the use of large/complex amino acids. Interestingly, an analysis of 10 additional genomes suggests that the correlation between tRNA gene numbers and amino acid S/C score is widespread and might be explained by selection against the negative consequences of protein misfolding. At the level of gene structure, three major trends are detected: 1) the length of the complete coding region increases across low and intermediate expression levels but decreases in highly expressed genes; 2) average intron size shows the opposite trend, first decreasing with expression and then increasing slightly in highly expressed genes; and 3) intron density remains nearly constant across all expression levels. These changes in gene architecture are only partially consistent with selection favoring a reduced cost of biosynthesis.
Tribolium castaneum; expression; translational selection; size/complexity score; tRNA abundance; gene structure
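Correlations such as the one between tRNA gene numbers and amino acid usage are typically assessed with a rank-based statistic, since copy numbers are small integers with many ties. The abstract does not name the statistic, so the sketch below assumes Spearman's rank correlation, written with the standard library only; tied values receive the average of the ranks they span. The function names are illustrative, not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

In this setting `x` would hold the tRNA gene count for each amino acid and `y` that amino acid's frequency in a set of genes; computing rho separately for genes binned by expression level is one way to check whether the association strengthens in highly expressed genes, as the abstract reports. For instance, `average_ranks([5, 1, 5])` returns `[2.5, 1.0, 2.5]`.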
The HLA region varies in the number and content of DRB genes present per haplotype. Similar observations have been made for the equivalent regions in other primate species. To elucidate the evolutionary history of the various HLA-DRB genes, a large panel of intron sequences obtained from humans, chimpanzees, rhesus macaques, and common marmosets was subjected to phylogenetic analyses. Special attention was paid to the presence and absence of particular transposable elements or segments of them. The sharing of different parts of the same long interspersed nuclear element-2 (LINE2, L2) and of various Alu insertions by the species studied demonstrates that one precursor gene must have been duplicated several times before the Old World monkey (OWM) and hominid (HOM) divergence. At least four ancestral DRB gene families appear to have been present before the radiation of OWM and HOM, and one of these even predates the speciation of Old and New World primates. Two of these families represent the pseudogenes DRB6/DRB2 and DRB7, which have persisted in the genomes of various primate species over long evolutionary time spans. Furthermore, all phylogenies of different intron segments consistently show that, apart from the pseudogenes, only DRB5 genes are shared by OWM and HOM, and they demonstrate the common history of certain DRB genes/lineages of humans and chimpanzees. In contrast, the evolutionary history of some other DRB loci is difficult to decipher, illustrating the complex evolution of DRB genes through a combination of mutations and recombination-like events. This approach allowed us to shed light on the ancestral DRB gene pool in primates and on the evolutionary relationships of the various HLA-DRB genes.
MHC; HLA; evolution; transposable elements; introns; primates
An improved understanding of the biological and numerical properties of measures of population differentiation across loci is becoming increasingly important because of their growing use in analyzing genome-wide polymorphism data to detect population structure, infer migration rates, and identify local adaptations. In a genome-wide analysis, we discovered that estimates of population differentiation (e.g., FST, θ, and Jost’s D) calculated for human single-nucleotide polymorphisms (SNPs) are strongly and positively correlated with the position-specific evolutionary rates measured from multispecies alignments. That is, genomic positions (loci) experiencing stronger purifying selection (lower evolutionary rates) show lower population differentiation than those evolving at faster rates. We show that this pattern is entirely mediated by the negative effects of purifying selection on the minor allele frequency (MAF) at individual loci. Our results suggest that inferences and methods relying on the comparison of population differentiation estimates (FST, θ, and Jost’s D) based on SNPs across genomic positions should be restricted to loci with similar MAFs and/or similar rates of evolution in genome-scale surveys.
FST; minor allele frequency; population differentiation; purifying selection; evolutionary rate
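One way to see how MAF can constrain per-SNP differentiation is to compute the largest FST a SNP can reach at a given allele frequency. The sketch below assumes Hudson's FST in its population-level form, (p1 − p2)² / [p1(1 − p2) + p2(1 − p1)], for two equally weighted populations and no sample-size correction; it is one of several FST definitions and does not cover θ or Jost's D. For a pooled minor-allele frequency m ≤ 0.5, placing every minor allele in one population (p1 = 0, p2 = 2m) maximizes this quantity, and the ceiling works out to exactly 2m. The helper names are illustrative.

```python
def hudson_fst(p1: float, p2: float) -> float:
    """Population-level Hudson F_ST for one biallelic SNP.

    Equals 1 - H_within / H_between, where H_between = p1(1-p2) + p2(1-p1)
    is the chance two alleles drawn from different populations differ.
    No sample-size correction is applied.
    """
    h_between = p1 * (1 - p2) + p2 * (1 - p1)
    if h_between == 0:
        return 0.0  # SNP is fixed for the same allele in both populations
    return (p1 - p2) ** 2 / h_between


def max_fst_for_mean_frequency(m: float) -> float:
    """Largest Hudson F_ST when the equal-weight pooled minor-allele
    frequency is m: all minor alleles sit in one population (p1=0, p2=2m)."""
    if not 0 <= m <= 0.5:
        raise ValueError("m must be a minor-allele frequency in [0, 0.5]")
    return hudson_fst(0.0, 2 * m)
```

Under this estimator a SNP with pooled MAF 0.05 can never exceed FST = 0.1, whereas a SNP at MAF 0.5 can reach 1.0. Loci whose MAF is held down by purifying selection are therefore confined to lower differentiation values, which is why comparing FST across loci without matching on MAF can be misleading.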