In contrast to RNA viruses, double-stranded DNA viruses have low mutation rates, yet must still adapt rapidly in response to changing host defenses. To determine mechanisms of adaptation, we subjected the model poxvirus vaccinia to serial propagation in human cells, where its anti-host factor K3L is maladapted against the anti-viral Protein Kinase R (PKR). Viruses rapidly acquired higher fitness via recurrent K3L gene amplifications, incurring up to 7-10% increases in genome size. These transient gene expansions were necessary and sufficient to counteract human PKR and facilitated the gain of an adaptive amino acid substitution in K3L that also defeats PKR. Subsequent reductions in gene amplifications offset the costs associated with larger genome size while retaining adaptive substitutions. Our discovery of viral ‘gene-accordions’ explains how poxviruses can rapidly adapt to defeat different host defenses despite low mutation rates and reveals how classical Red Queen conflicts can progress through unrecognized intermediates.
The human SAMHD1 protein potently restricts lentiviral infection in dendritic cells and monocyte/macrophages, but is antagonized by the primate lentiviral protein Vpx, which targets SAMHD1 for degradation. However, only two of eight primate lentivirus lineages encode Vpx, whereas its paralog, Vpr, is conserved across all extant primate lentiviruses. We find that not only multiple Vpx but also some Vpr proteins are able to degrade SAMHD1, and that such antagonism led to dramatic positive selection of SAMHD1 in the primate subfamily Cercopithecinae. Residues that have evolved under positive selection precisely determine sensitivity to Vpx/Vpr degradation and alter binding specificity. By overlaying these functional analyses on a phylogenetic framework of Vpr and Vpx evolution, we can decipher the chronology of acquisition of SAMHD1-degrading abilities in lentiviruses. We conclude that vpr neofunctionalized to degrade SAMHD1 even prior to the birth of a separate vpx gene, thereby initiating an evolutionary arms race with SAMHD1.
Pax6 is a developmental control gene essential for eye development throughout the animal kingdom. In addition, Pax6 plays key roles in other parts of the CNS, olfactory system, and pancreas. In mammals a single Pax6 gene encoding multiple isoforms delivers these pleiotropic functions. Here we provide evidence that the genomes of many other vertebrate species contain multiple Pax6 loci. We sequenced Pax6-containing BACs from the cartilaginous elephant shark (Callorhinchus milii) and found two distinct Pax6 loci. Pax6.1 is highly similar to mammalian Pax6, while Pax6.2 encodes a paired-less Pax6. Using synteny relationships, we identify homologs of this novel paired-less Pax6.2 gene in lizard and in frog, as well as in zebrafish and in other teleosts. In zebrafish, two full-length Pax6 duplicates were known previously, originating from the fish-specific genome duplication (FSGD) and expressed in divergent patterns due to paralog-specific loss of cis-elements. We show that teleosts other than zebrafish also maintain duplicate full-length Pax6 loci, but differences in gene and regulatory domain structure suggest that these Pax6 paralogs originate from a more ancient duplication event; they are hence renamed Pax6.3. Sequence comparisons between mammalian and elephant shark Pax6.1 loci highlight the presence of short- and long-range conserved noncoding elements (CNEs). Functional analysis demonstrates the ancient role of long-range enhancers for Pax6 transcription. We show that the paired-less Pax6.2 ortholog in zebrafish is expressed specifically in the developing retina. Transgenic analysis of elephant shark and zebrafish Pax6.2 CNEs with homology to the mouse NRE/Pα internal promoter revealed highly specific retinal expression. Finally, morpholino depletion of zebrafish Pax6.2 resulted in a “small eye” phenotype, supporting a role in retinal development.
In summary, our study reveals that the pleiotropic functions of Pax6 in vertebrates are served by a divergent family of Pax6 genes, forged by ancient duplication events and by independent, lineage-specific gene losses.
Pax6 is a highly conserved transcription factor with key roles in eye, brain, pancreas, and olfactory system development. In mammals multiple Pax6 isoforms are encoded by a single Pax6 gene, embedded within a complex regulatory landscape. Here we provide evidence for the presence of multiple Pax6 loci in other vertebrate species. We show that two Pax6 genes (Pax6.1 and Pax6.2) are present in the genome of elephant shark (a cartilaginous fish). Pax6.1 is highly similar to mammalian Pax6 in terms of structure of the gene locus, protein sequence, and expression pattern, whereas the second gene, Pax6.2, codes for a protein lacking the paired domain. We identify orthologs of Pax6.2 in other vertebrate genomes, such as lizard, Xenopus, and teleost fishes, and show that it is important for eye development in zebrafish. Additionally, we have characterised a third Pax6 (Pax6.3) present only in some teleost fishes. Phylogenetic analyses indicate that the evolutionary history of the Pax6 gene family in vertebrates is a result of ancient duplications followed by independent gene losses in different lineages. Sequence comparison of the cis-regulatory landscapes around the genes has led to the identification of novel Pax6 enhancers that provide a link between the diverged Pax6 family members.
When growing populations of bacteria are confronted with bactericidal antibiotics, the vast majority of cells are killed, but subpopulations of genetically susceptible but phenotypically resistant bacteria survive. According to the prevailing view, these “persisters” are non- or slowly dividing cells randomly generated from the dominant population; antibiotics enrich populations for pre-existing persisters but play no role in their generation. The results of recent studies with Escherichia coli suggest that at least one antibiotic, ciprofloxacin, can contribute to the generation of persisters. To more generally elucidate the role of antibiotics in the generation of and selection for persisters, and the nature of persistence in general, we use mathematical models and experiments with Staphylococcus aureus (Newman) and the antibiotics ciprofloxacin, gentamicin, vancomycin, and oxacillin. Our results indicate that the level of persistence varies among these drugs and their concentrations, and there is considerable variation in this level among independent cultures and mixtures of independent cultures. A model that assumes that the rate of production of persisters is low and that persisters grow slowly in the presence of antibiotics can account for these observations. As predicted by this model, pre-treatment with sub-MIC concentrations of antibiotics substantially increases the level of persistence to drugs other than those with which the population is pre-treated. Collectively, the results of this jointly theoretical and experimental study, along with other observations, support the hypothesis that persistence is the product of many different kinds of errors in cell replication that result in transient periods of non-replication and/or slowed metabolism by individual cells in growing populations. This Persistence as Stuff Happens (PaSH) hypothesis can account for the ubiquity of this phenomenon. Like mutation, persistence is inevitable rather than an evolved character.
What evolved and have been identified are genes and processes that affect the frequency of persisters.
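The kind of model summarized above can be illustrated with a minimal two-state sketch. It is not the authors' model: the compartments, the rate names (`kill`, `switch`, `growth`) and their default values below are hypothetical, chosen only so that persisters are produced rarely from normal cells and grow slowly despite the antibiotic, as the abstract describes.

```python
import math


def simulate(n0=1e8, p0=0.0, kill=3.0, switch=1e-5, growth=0.05, hours=24, dt=0.001):
    """Forward-Euler integration of a hypothetical two-state persistence model.

    n: normal (susceptible) cells, killed by the drug at per-hour rate `kill`
       and converted to persisters at a low per-cell rate `switch`.
    p: persisters, which tolerate the drug and grow slowly at rate `growth`.
    Returns (hour, n, p) tuples sampled at the end of every hour.
    """
    n, p = n0, p0
    samples = [(0, n, p)]
    steps_per_hour = round(1.0 / dt)
    for hour in range(1, hours + 1):
        for _ in range(steps_per_hour):
            dn = -(kill + switch) * n          # killing plus conversion to persisters
            dp = switch * n + growth * p       # rare production plus slow growth
            n += dn * dt
            p += dp * dt
        samples.append((hour, n, p))
    return samples


for hour, n, p in simulate()[::4]:
    # log10 of total viable cells traces a biphasic kill curve
    print(f"t={hour:2d} h  log10(N+P)={math.log10(n + p):5.2f}  persister fraction={p / (n + p):.3g}")
```

Under these assumed parameters, log10(N + P) falls steeply while normal cells are killed and then levels off into a plateau that creeps upward as the few persisters grow, so the surviving population is set by the rare production of persisters and their slow growth rather than by a fixed pre-existing fraction.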
Because of its importance to therapy, a great deal of effort has been devoted to understanding the mechanisms responsible for, and the genetic basis of, persistence in inherently susceptible but phenotypically antibiotic-resistant subpopulations of bacteria. Much of this research is based on the premise that persisters are produced at random from the susceptible population and that the antibiotics used to detect them play no role in their generation. The results of this jointly theoretical and experimental study are inconsistent with this hypothesis. These results, along with observations reported by other investigators (including the failure to find bacteria that do not produce persisters, alongside an abundance of genes that modify persister frequency), support the hypothesis that there are many mechanisms responsible for persistence. Based on these collective theoretical and experimental results, along with evolutionary considerations, we postulate that persistence is analogous to mutation: it is an inevitable product of errors and glitches in cell replication and metabolism rather than an evolved character.
Drosophila melanogaster has played a pivotal role in the development of modern population genetics. However, many basic questions regarding the demographic and adaptive history of this species remain unresolved. We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection within an African population, between African populations, and between European and African populations. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. 
Outliers for Europe-Africa FST were found to be enriched in genomic regions of locally elevated cosmopolitan admixture, possibly reflecting a role for some of these loci in driving the introgression of non-African alleles into African populations.
Improvements in DNA sequencing technology have allowed genetic variation to be studied at the level of fully sequenced genomes. We have sequenced more than 100 D. melanogaster genomes originating from sub-Saharan Africa, which is thought to contain the ancestral range of this model organism. We found evidence for recent and substantial non-African gene flow into African populations, which may be driven by natural selection. The data also helped to refine our understanding of the species' history, which may have involved a geographic expansion from southern central Africa (e.g. Zambia). Lastly, we identified a large number of genes and functions that may have experienced recent adaptive evolution in one or more populations. An understanding of genomic variation in ancestral range populations of D. melanogaster will improve our ability to make population genetic inferences for worldwide populations. The results presented here should motivate statistical, mathematical, and computational studies to identify evolutionary models that are most compatible with observed data. Finally, the potential signals of natural selection identified here should facilitate detailed follow-up studies on the genetic basis of adaptive evolutionary change.
Chromosomal inversions have been an enduring interest of population geneticists since their discovery in Drosophila melanogaster. Numerous lines of evidence suggest powerful selective pressures govern the distributions of polymorphic inversions, and these observations have spurred the development of many explanatory models. However, due to a paucity of nucleotide data, little progress has been made towards investigating selective hypotheses or towards inferring the genealogical histories of inversions, which can inform models of inversion evolution and suggest selective mechanisms. Here, we utilize population genomic data to address persisting gaps in our knowledge of D. melanogaster's inversions. We develop a method, termed Reference-Assisted Reassembly, to assemble unbiased, highly accurate sequences near inversion breakpoints, which we use to estimate the age and the geographic origins of polymorphic inversions. We find that inversions are young, and most are African in origin, which is consistent with the demography of the species. The data suggest that inversions interact with polymorphism not only in breakpoint regions but also chromosome-wide. Inversions remain differentiated at low levels from standard haplotypes even in regions that are distant from breakpoints. Although genetic exchange appears fairly extensive, we identify numerous regions that are qualitatively consistent with selective hypotheses. Finally, we show that In(1)Be, which we estimate to be ∼60 years old (95% CI 5.9 to 372.8 years), has likely achieved high frequency via sex-ratio segregation distortion in males. With deeper sampling, it will be possible to build on our inferences of inversion histories to rigorously test selective models—particularly those that postulate that inversions achieve a selective advantage through the maintenance of co-adapted allele complexes.
Chromosomal inversions are known to respond to powerful natural selection in many species. Despite this evidence, little progress has been made towards understanding the nature of selection that affects inversions. Here, we utilize two recently released population-resequencing projects from D. melanogaster to address many of the unknown features of polymorphic inversions. We find evidence that inversions in this species are generally very young, with ages on the order of hundreds to tens of thousands of years, and that the majority of inversions originated in ancestral African populations. Inversions are also the source of the majority of genetic structure within populations and affect polymorphism chromosome-wide. We are able to confirm experimentally that one X-chromosome inversion achieves an advantage by selfishly increasing its transmission through males. Future work will build on our basic inferences to identify potential selective mechanisms and candidate genes in the other inversions studied.
We present a hidden Markov model (HMM) for inferring gradual isolation between two populations during speciation, modelled as a time interval with restricted gene flow. The HMM describes the history of adjacent nucleotides in two genomic sequences, such that the nucleotides can be separated by recombination, can migrate between populations, or can coalesce at variable time points, all dependent on the parameters of the model, which are the effective population sizes, splitting times, recombination rate, and migration rate. We show by extensive simulations that the HMM can accurately infer all parameters except the recombination rate, which is biased downwards. Inference is robust to variation in the mutation rate and the recombination rate over the sequence, and also robust to unknown phase of genomes unless they are very closely related. We provide a test for whether divergence is gradual or instantaneous, and we apply the model to three key divergence processes in great apes: (a) the bonobo and common chimpanzee, (b) the eastern and western gorilla, and (c) the Sumatran and Bornean orang-utan. We find that the bonobo and chimpanzee appear to have undergone a clear split, whereas the divergence processes of the gorilla and orang-utan species occurred over several hundred thousand years, with gene flow stopping quite recently. We also apply the model to the Homo/Pan speciation event and find that the most likely scenario involves an extended period of gene flow during speciation.
Next-generation sequencing technology has enabled the generation of whole-genome data for many closely related species. For population genetic inference, such data provide many loci, but typically for only a few individuals. We present a new method that allows inference of the divergence process based on two closely related genomes, modelled as gradual isolation in an isolation with migration model. This allows estimation of the initial time of restricted gene flow and the cessation of gene flow, as well as the population sizes, migration rates, and recombination rates. We show by simulations that the parameter estimation is accurate with genome-wide data and use the model to disentangle the divergence processes among three sets of closely related great ape species: bonobo/chimpanzee, eastern/western gorillas, and Sumatran/Bornean orang-utans. We find allopatric speciation for bonobo and chimpanzee and non-allopatric speciation for the gorillas and orang-utans. We also consider the split between humans and chimpanzees/bonobos and find evidence for non-allopatric speciation, similar to that within gorillas and orang-utans.
Mimivirus and Megavirus are the best characterized representatives of an expanding new family of giant viruses infecting Acanthamoeba. Their most distinctive features, megabase-sized genomes carried in particles of size comparable to that of small bacteria, fill the gap between the viral and cellular worlds. These giant viruses are also uniquely equipped with genes coding for central components of the translation apparatus. The presence of those genes, thought to be hallmarks of cellular organisms, revived fundamental questions about the evolutionary origin of these viruses and the link they might have with the emergence of eukaryotes. In this work, we focused on the Mimivirus-encoded translation termination factor gene, the detailed primary structure of which was elucidated using computational and experimental approaches. We demonstrated that the translation of this protein proceeds through two internal stop codons via two distinct recoding events, a frameshift and a readthrough, the combined occurrence of which is unique to these viruses. Unexpectedly, the viral gene carries an autoregulatory mechanism exclusively encountered in bacterial termination factors, though the viral sequence is related to the eukaryotic/archaeal class-I release factors. This finding hints that the virally encoded translation functions may not be strictly redundant with those provided by the host. Lastly, the perplexing occurrence of a bacterial-like regulatory mechanism in a gene homologous to eukaryotic/archaeal genes is yet another oddity brought about by the study of giant viruses.
Giant viruses, such as Mimivirus and Megavirus, have huge near-micron-sized particles and possess more genes than several cellular organisms. Furthermore, their genomes encode functions not expected in a virus, such as components of the protein translation apparatus. Since Lwoff (1957), viruses have been defined as ultimate obligate intracellular parasites because of their need to hijack the peptide synthesis machinery of their host to replicate. We looked at the Mimivirus and Megavirus proteins that recognize the stop codons, the translation termination factors. We found that these genes contain two internal stop codons, meaning that their translation bypasses two distinct stop codons to produce a functional translation termination factor. Such autoregulatory mechanisms are found in bacterial termination factors, although they involve only a single internal stop codon rather than two, and they are absent from the eukaryotic and archaeal homologs. Despite these bacterial-like features, giant viruses' termination factors have sequences that do not resemble bacterial genes but are clearly related to the eukaryotic and archaeal termination factors. Thus, giant viruses' termination factors surprisingly combine elements from eukaryotes/archaea and bacteria.
Drosophila telomere maintenance depends on the transposition of the specialized retrotransposons HeT-A, TART, and TAHRE. Controlling the activation and silencing of these elements is crucial for precise telomere function without compromising genomic integrity. Here we describe two chromosomal proteins, JIL-1 and Z4 (also known as Putzig), which are necessary for establishing fine-tuned regulation of the transcription of the major component of Drosophila telomeres, the HeT-A retrotransposon, thus guaranteeing genome stability. We found that mutant alleles of JIL-1 have decreased HeT-A transcription, identifying this kinase as the first positive regulator of telomere transcription in Drosophila described to date. We describe how the decrease in HeT-A transcription in JIL-1 alleles correlates with an increase in silencing chromatin marks such as H3K9me3 and HP1a at the HeT-A promoter. Moreover, we found that Z4 mutant alleles show moderate telomere instability, suggesting an important role of the JIL-1-Z4 complex in establishing and maintaining an appropriate chromatin environment at Drosophila telomeres. Interestingly, we detected a biochemical interaction between Z4 and the HeT-A Gag protein, which could explain how the Z4-JIL-1 complex is targeted to the telomeres. Accordingly, we demonstrate that knocking down the gene that encodes the HeT-A Gag protein produces a telomere instability phenotype similar to that observed for Z4 mutant alleles. We propose a model to explain the observed transcriptional and stability changes in relation to other heterochromatin components characteristic of Drosophila telomeres, such as HP1a.
Drosophila telomeres constitute a remarkable exception to the general telomerase mechanism of telomere maintenance in eukaryotes. The essential role of the telomeric transposons HeT-A, TART, and TAHRE (HTT) in this organism contrasts with the strong conservation of their retrotransposon personalities. The particularities of this system add an extra layer of complexity to the control of telomere length in Drosophila: on the one hand, telomere expression should be fine-tuned in order to achieve telomere function whenever needed; on the other, terminal transposition should be tightly controlled to guarantee genomic stability. Here, we report the dual role of the JIL-1-Z4 complex in regulating HeT-A retrotransposon transcription (by the action of the JIL-1 kinase) and in guaranteeing the stability of telomeres (by the zinc finger protein Z4). We show how the loss of JIL-1 and Z4 causes major changes in the chromatin of the HeT-A promoter that can explain the phenotypes that we observe in JIL-1 and Z4 mutant alleles. Moreover, we give evidence of the involvement of the HeT-A Gag protein in the recruitment of Z4 to the HTT array, and we demonstrate how the disruption of this interaction has fatal consequences for telomere stability.
Although hypoxia is a major stress on physiological processes, several human populations have survived for millennia at high altitudes, suggesting that they have adapted to hypoxic conditions. This hypothesis was recently corroborated by studies of Tibetan highlanders, which found that polymorphisms in candidate genes carry signatures of natural selection and show well-replicated associations with variation in hemoglobin levels. We extended genomic analysis to two Ethiopian ethnic groups: Amhara and Oromo. For each ethnic group, we sampled low and high altitude residents, thus allowing genetic and phenotypic comparisons across altitudes and across ethnic groups. Genome-wide SNP genotype data were collected in these samples by using Illumina arrays. We find that variants associated with hemoglobin variation among Tibetans or other variants at the same loci do not influence the trait in Ethiopians. However, in the Amhara, SNP rs10803083 is associated with hemoglobin levels at genome-wide levels of significance. No significant genotype association was observed for oxygen saturation levels in either ethnic group. Approaches based on allele frequency divergence did not detect outliers in candidate hypoxia genes, but the most differentiated variants between high- and lowlanders have a clear role in pathogen defense. Interestingly, a significant excess of allele frequency divergence was consistently detected for genes involved in cell cycle control and DNA damage and repair, thus pointing to new pathways for high altitude adaptations. Finally, a comparison of CpG methylation levels between high- and lowlanders found several significant signals at individual genes in the Oromo.
Although hypoxia is a major stress on physiological processes, several human populations have survived for millennia at high altitudes, suggesting that they have adapted to hypoxic conditions. Consistent with this idea, previous studies have identified genetic variants in Tibetan highlanders associated with reduced hemoglobin levels, an advantageous phenotype at high altitude. To compare the genetic bases of adaptations to high altitude, we collected genetic and epigenetic data in Ethiopians living at high and at low altitude. We find that variants associated with hemoglobin variation among Tibetans or other variants at the same loci do not influence the trait in Ethiopians. However, we find a different variant that is significantly associated with hemoglobin levels in Ethiopians. Approaches based on the difference in allele frequency between high- and lowlanders detected strong signals in genes with a clear role in defense from pathogens, consistent with known differences in pathogens between altitudes. Finally, we found a few genome-wide significant epigenetic differences between altitudes. Taken together, these results imply that Ethiopian and Tibetan highlanders adapted to the same environmental stress through different variants and genetic loci.
The modulation of fitness by single mutational substitutions during environmental change is the most fundamental consequence of natural selection. The antagonistic tradeoffs of pleiotropic mutations that can be selected under changing environments therefore lie at the foundation of evolutionary biology. However, the molecular basis of fitness tradeoffs is rarely determined in terms of how these pleiotropic mutations affect protein structure. Here we use an interdisciplinary approach to study how antagonistic pleiotropy and protein function dictate a fitness tradeoff. We challenged populations of an RNA virus, bacteriophage Φ6, to evolve in a novel temperature environment where heat shock imposed extreme virus mortality. A single amino acid substitution in the viral lysin protein P5 (V207F) conferred improved stability, and hence survival of challenged viruses, despite a concomitant tradeoff that decreased viral reproduction. This mutation increased the thermostability of P5. Crystal structures of wild-type, mutant, and ligand-bound P5 reveal the molecular basis of this thermostabilization (the Phe207 side chain fills a hydrophobic cavity that is unoccupied in the wild-type protein) and identify P5 as a lytic transglycosylase. The mutation did not reduce the enzymatic activity of P5, suggesting that the reproduction tradeoff stems from other factors such as inefficient capsid assembly or disassembly. Our study demonstrates how combining experimental evolution, biochemistry, and structural biology can identify the mechanisms that drive the antagonistic pleiotropic phenotypes of an individual point mutation in the classic evolutionary tug-of-war between survival and reproduction.
The most fundamental mechanism of natural selection in a changing environment is the modulation of fitness by mutations. It is the tradeoffs offered by these mutations that drive evolution. However, fitness tradeoffs are rarely understood at the molecular level, in terms of how the selected mutations affect protein structure and function. Here, we merge experimental evolution and structural biology to study the fundamental tradeoff between survival and reproduction. We challenged populations of an RNA virus to evolve in a novel temperature environment where heat shock imposed extreme virus mortality. A single mutation in a specific viral protein increased the stability, and hence survival of challenged viruses, despite a concomitant tradeoff that decreased viral reproduction. This mutation increased the thermal stability of the mutant protein. Atomic structures of the wild-type and mutant protein reveal the molecular basis of this stabilization. The mutation did not reduce the enzymatic activity of the protein, suggesting that the reproduction tradeoff stems from other factors, such as inefficient virus assembly or disassembly. Our study uncovers the mechanism that drives the antagonistic effects of an individual point mutation in the classic evolutionary tug-of-war between survival and reproduction.
It is generally believed that the last eukaryotic common ancestor (LECA) was a unicellular organism with motile cilia. In the vertebrates, the winged-helix transcription factor FoxJ1 functions as the master regulator of motile cilia biogenesis. Despite the antiquity of cilia, their highly conserved structure, and their mechanism of motility, the evolution of the transcriptional program controlling ciliogenesis has remained incompletely understood. In particular, it is presently not known how the generation of motile cilia is programmed outside of the vertebrates, and whether and to what extent the FoxJ1-dependent regulation is conserved. We have performed a survey of numerous eukaryotic genomes and discovered that genes homologous to foxJ1 are restricted to organisms belonging to the unikont lineage. Using a mis-expression assay, we then obtained evidence of a conserved ability of FoxJ1 proteins from a number of diverse phyletic groups to activate the expression of a host of motile ciliary genes in zebrafish embryos. Conversely, we found that inactivation of a foxJ1 gene in Schmidtea mediterranea, a platyhelminth (flatworm) that utilizes motile cilia for locomotion, led to a profound disruption in the differentiation of motile cilia. Together, these findings provide the first evolutionary perspective into the transcriptional control of motile ciliogenesis and allow us to propose a conserved FoxJ1-regulated mechanism for motile cilia biogenesis dating back to the origin of the metazoans.
Cilia are microtubule-based, hair-like organelles that project from the surfaces of eukaryotic cells. Protists use motile cilia for locomotion as well as for sensory perception. In metazoans, motile cilia also function in fluid transport over epithelia, such as in the mammalian lungs. Most vertebrate and some invertebrate cell types differentiate non-motile primary cilia, which function exclusively in sensory transduction. It is believed that primary cilia arose from motile cilia through the loss of the motility apparatus. Cilia are complex organelles: a large number of proteins are involved in their assembly and maintenance. FoxJ1, a forkhead-domain transcription factor, is the master regulator of motile ciliogenesis in vertebrates. It is not known to what extent this transcriptional control is conserved and how it may have evolved. Here, we document the existence of FoxJ1 orthologs in several eukaryotic groups besides the vertebrates. FoxJ1 proteins from three representative phyla (Placozoa, Platyhelminthes, and Echinodermata) were able to activate the expression of ciliary genes when mis-expressed in zebrafish embryos. Moreover, inactivation of FoxJ1 in planaria (Platyhelminthes) abolished motile cilia differentiation. These results provide new insights into the transcriptional regulation of motile cilia biogenesis outside the vertebrates and demonstrate a remarkable conservation of the activity of FoxJ1.
Comparative ChIP-seq data reveal adaptive evolution of insulator protein CTCF binding in multiple Drosophila species.
Changes in the physical interaction between cis-regulatory DNA sequences and proteins drive the evolution of gene expression. However, it has proven difficult to accurately quantify evolutionary rates of such binding change or to estimate the relative effects of selection and drift in shaping binding evolution. Here we examine the genome-wide binding of CTCF in four species of Drosophila separated by between ∼2.5 and 25 million years. CTCF is a highly conserved protein known to be associated with insulator sequences in the genomes of human and Drosophila. Although the binding preference of CTCF is highly conserved, we find that CTCF binding itself is highly evolutionarily dynamic and has evolved adaptively. Between species, binding divergence increased linearly with evolutionary distance, and CTCF binding profiles are diverging rapidly, at a rate of 2.22% per million years (Myr). At least 89 new CTCF binding sites have originated in the Drosophila melanogaster genome since the most recent common ancestor with Drosophila simulans. Comparing these data to genome sequence data from 37 different strains of Drosophila melanogaster, we detected signatures of selection in both newly gained and evolutionarily conserved binding sites. Newly evolved CTCF binding sites show a significantly stronger signature of positive selection than older sites. Comparative gene expression profiling revealed that expression divergence of genes adjacent to CTCF binding sites is significantly associated with the gain and loss of CTCF binding. Further, the birth of new genes is associated with the birth of new CTCF binding sites. Our data indicate that binding of the Drosophila CTCF protein has evolved under natural selection, and that CTCF binding evolution has shaped both the evolution of gene expression and genome evolution during the birth of new genes.
A large proportion of the diversity of living organisms results from differential regulation of gene transcription. Transcriptional regulation is thought to differ between species because of evolutionary changes in the physical interactions between regulatory DNA elements and DNA-binding proteins; these can generate variation in the spatial and temporal patterns of gene expression. The mechanisms by which these protein–DNA interactions evolve are therefore an important question in evolutionary biology. Does adaptive evolution play a role, or is the process dominated by neutral genetic drift? Insulator proteins are a special group of DNA-binding proteins: instead of directly activating or repressing genes, they can coordinate the interactions between other regulatory elements (such as enhancers and promoters). Additionally, insulator proteins can limit the spreading of chromatin condensation and help to demarcate the boundaries of regulatory domains in the genome. In spite of their critical role in genome regulation, little is known about the evolution of interactions between insulator proteins and DNA. Here, we use ChIP-seq to examine the distribution of binding sites for CTCF, a highly conserved insulator protein, in four closely related Drosophila species. We find that genome-wide binding profiles of CTCF are highly dynamic across evolutionary time, with frequent births of new CTCF–DNA interactions, and we demonstrate that this evolutionary process is driven by natural selection. By comparing these profiles with RNA-seq data, we find that gain or loss of CTCF binding affects the expression levels of nearby genes and correlates with structural evolution of the genome. Together, these results suggest a potential mechanism of regulatory rewiring through adaptive evolution of CTCF binding.
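To make the reported rate of 2.22% per Myr concrete, the sketch below treats binding divergence as a straight line through the origin, D(t) = r·t. This zero-intercept form is a simplifying assumption for illustration only (the study reports a linear increase, but its fitted line need not pass through zero), and the helper names `expected_divergence` and `fit_rate_through_origin` are hypothetical. The second function shows how such a rate could be estimated from pairwise (distance, divergence) measurements by least squares with no intercept.

```python
RATE_PERCENT_PER_MYR = 2.22  # reported divergence rate of CTCF binding profiles


def expected_divergence(distance_myr, rate=RATE_PERCENT_PER_MYR):
    """Percent binding divergence under an assumed zero-intercept linear model."""
    if distance_myr < 0:
        raise ValueError("evolutionary distance must be non-negative")
    return rate * distance_myr


def fit_rate_through_origin(distances_myr, divergences_percent):
    """Least-squares slope of divergence on distance, intercept fixed at zero."""
    sxx = sum(d * d for d in distances_myr)
    if sxx == 0:
        raise ValueError("need at least one non-zero distance")
    sxy = sum(d * y for d, y in zip(distances_myr, divergences_percent))
    return sxy / sxx
```

With made-up pairs generated exactly from 2.22% per Myr (for example at 2.5, 5 and 25 Myr), the fitted slope recovers 2.22; real ChIP-seq comparisons would scatter around any fitted line, and a model with a free intercept may be more appropriate.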
MOV10 protein, a putative RNA helicase and component of the RNA-induced silencing complex (RISC), inhibits retrovirus replication. We show that MOV10 also severely restricts human LINE1 (L1), Alu, and SVA retrotransposons. MOV10 associates with the L1 ribonucleoprotein particle, along with other RNA helicases including DDX5, DHX9, DDX17, DDX21, and DDX39A. However, unlike MOV10, these other helicases do not strongly inhibit retrotransposition; inhibition by MOV10 depends upon its intact helicase domains. MOV10 association with retrotransposons is further supported by its colocalization with L1 ORF1 protein in stress granules, cytoplasmic structures associated with RNA silencing, and by the ability of MOV10 to reduce endogenous and ectopic L1 expression. The majority of the human genome is repetitive DNA, most of which is the detritus of millions of years of accumulated retrotransposition. Retrotransposons remain active mutagens, and their insertion can disrupt gene function. Therefore, the host has evolved defense mechanisms to protect against retrotransposition, an arsenal we are only beginning to understand. With homologs in other vertebrates, insects, and plants, MOV10 may represent an ancient and innate form of immunity against both infective viruses and endogenous retroelements.
LINE1s, the only active autonomous mobile DNA in humans, occupy at least 17% of our genome. It is believed that about 100 L1s are potentially active in any individual diploid genome. L1 has also been responsible for the genomic insertion of processed pseudogenes and of more than a million non-autonomous retrotransposons, mainly Alus and SVAs. Together, this mass of genomic baggage has had, and continues to have, profound effects on gene organization and expression. Consequently, a number of molecular mechanisms have evolved to prevent the unchecked expansion of endogenous retroelements. We demonstrate that the putative RNA helicase MOV10, recently discovered to limit production and infectivity of retroviruses, also profoundly inhibits retrotransposition of L1s, Alus, and SVAs in cell culture. Microscopy and immunoprecipitation show a close association of MOV10 protein with the L1 ribonucleoprotein particle. This study reveals a novel factor that interacts with the L1 retrotransposon to modulate its activity, and it increases our understanding of the means by which the cell coexists with these genomic “parasites.”
Insertions of parasitic DNA within coding sequences are usually deleterious and are generally counter-selected during evolution. Thanks to nuclear dimorphism, ciliates provide unique models to study the fate of such insertions. Their germline genome undergoes extensive rearrangements during development of a new somatic macronucleus from the germline micronucleus following sexual events. In Paramecium, these rearrangements include precise excision of unique-copy Internal Eliminated Sequences (IES) from the somatic DNA, requiring the activity of a domesticated piggyBac transposase, PiggyMac. We have sequenced Paramecium tetraurelia germline DNA, establishing a genome-wide catalogue of ∼45,000 IESs, in order to gain insight into their evolutionary origin and excision mechanism. We obtained direct evidence that PiggyMac is required for excision of all IESs. Homology with known P. tetraurelia Tc1/mariner transposons, described here, indicates that at least a fraction of IESs derive from these elements. Most IES insertions occurred before a recent whole-genome duplication that preceded diversification of the P. aurelia species complex, but IES invasion of the Paramecium genome appears to be an ongoing process. Once inserted, IESs decay rapidly by accumulation of deletions and point substitutions. Over 90% of the IESs are shorter than 150 bp and present a remarkable size distribution with a ∼10 bp periodicity, corresponding to the helical repeat of double-stranded DNA and suggesting DNA loop formation during assembly of a transpososome-like excision complex. IESs are equally frequent within and between coding sequences; however, excision is not 100% efficient and there is selective pressure against IES insertions, in particular within highly expressed genes. 
We discuss the possibility that ancient domestication of a piggyBac transposase favored subsequent propagation of transposons throughout the germline by allowing insertions in coding sequences, a fraction of the genome in which parasitic DNA is not usually tolerated.
Ciliates are unicellular eukaryotes that rearrange their genomes at every sexual generation when a new somatic macronucleus, responsible for gene expression, develops from a copy of the germline micronucleus. In Paramecium, assembly of a functional somatic genome requires precise excision of interstitial DNA segments, the Internal Eliminated Sequences (IES), involving a domesticated piggyBac transposase, PiggyMac. To study IES origin and evolution, we sequenced germline DNA and identified 45,000 IESs. We found that at least some of these unique-copy elements are decayed Tc1/mariner transposons and that IES insertion is likely an ongoing process. After insertion, elements decay rapidly by accumulation of deletions and substitutions. The 93% of IESs shorter than 150 bp display a remarkable size distribution with a periodicity of 10 bp, the helical repeat of double-stranded DNA, consistent with the idea that evolution has only retained IESs that can form a double-stranded DNA loop during assembly of an excision complex. We propose that the ancient domestication of a piggyBac transposase, which provided a precise excision mechanism, enabled transposons to subsequently invade Paramecium coding sequences, a fraction of the genome that does not usually tolerate parasitic DNA.
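One generic way to see what a ~10 bp periodicity in IES length means is to compute the autocorrelation of the length histogram: if lengths cluster at peaks spaced about one helical turn apart, the autocorrelation is highest at a lag near 10 bp. The sketch below is an illustration of that idea, not the analysis performed in the study; the function names, the lag window and the synthetic length set are all hypothetical.

```python
from collections import Counter


def length_histogram(lengths):
    """Count of elements at each length from 0 bp up to the longest observed."""
    counts = Counter(lengths)
    return [counts.get(n, 0) for n in range(max(lengths) + 1)]


def autocorrelation(series, lag):
    """Mean-centred sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom


def dominant_period(lengths, min_lag=5, max_lag=20):
    """Lag (bp) with the highest histogram autocorrelation within the window."""
    hist = length_histogram(lengths)
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(hist, lag))


# Hypothetical lengths: 3-bp-wide peaks every 10 bp, centred from 26 to 136 bp.
synthetic_lengths = [26 + 10 * k + offset
                     for k in range(12) for offset in (-1, 0, 0, 1)]
```

On this synthetic set the strongest lag in the 5 to 20 bp window is 10; real IES length data are noisier, so the peak would be broader and a significance test against shuffled data would be needed.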
The emerging field of paleovirology aims to study the evolutionary age and impact of ancient viruses (paleoviruses) on host biology. Despite a historical emphasis on retroviruses, paleoviral ‘fossils’ have recently been uncovered from a broad swathe of viruses. These viral imprints have upended long-held notions of the age and mutation rate of viruses. While ‘direct’ paleovirology relies on the insertion of viral genes in animal genomes, examination of adaptive changes in host genes that occurred in response to paleoviral infections provides a complementary strategy for making ‘indirect’ paleovirological inferences. Finally, viruses have also impacted host biology by providing genes that hosts have domesticated for their own purposes.
Phenotypes that appear to be conserved could be maintained not only by strong purifying selection on the underlying genetic systems, but also by stabilizing selection acting via compensatory mutations with balanced effects. Such coevolution has been invoked to explain experimental results, but has rarely been the focus of study. Conserved expression driven by the unc-47 promoters of Caenorhabditis elegans and C. briggsae persists despite divergence within a cis-regulatory element and between this element and the trans-regulatory environment. Compensatory changes in cis and trans are revealed when these promoters are used to drive expression in the other species. Functional changes in the C. briggsae promoter, which has experienced accelerated sequence evolution, did not lead to alteration of gene expression in its endogenous environment. Coevolution among promoter elements suggests that complex epistatic interactions within cis-regulatory elements may facilitate their divergence. Our results offer a detailed picture of regulatory evolution in which subtle, lineage-specific, and compensatory modifications of interacting cis and trans regulators together maintain conserved gene expression patterns.
Some phenotypes, including gene expression patterns, are conserved between distantly related species. However, the molecular bases of those phenotypes are not necessarily conserved. Instead, regulatory DNA sequences and the proteins with which they interact can change over time with balanced effects, preserving expression patterns and concealing regulatory divergence. Coevolution between interacting molecules makes gene regulation highly species-specific, and it can be detected when the cis-regulatory DNA of one species is used to drive expression in another species. In this way, we identified regions of the C. elegans and C. briggsae unc-47 promoters that have coevolved with the lineage-specific trans-regulatory environments of these organisms. The C. briggsae promoter experienced accelerated sequence change relative to related species. All of this evolution occurred without changing the expression pattern driven by the promoter in its endogenous environment.
Alu elements are trans-mobilized by the autonomous non-LTR retroelement LINE-1 (L1). Alu-induced insertion mutagenesis contributes to about 0.1% of human genetic disease and is responsible for the majority of the documented instances of human retroelement insertion-induced disease. Here we introduce a SINE recovery method that provides a complementary approach for comprehensive analysis of the impact and biological mechanisms of Alu retrotransposition. Using this approach, we recovered 226 de novo tagged Alu inserts in HeLa cells. Our analysis reveals that, in human cells, marked Alu inserts driven by either exogenously supplied full-length L1 or ORF2 protein are indistinguishable. Four percent of de novo Alu inserts were associated with genomic deletions and rearrangements and lacked the hallmarks of retrotransposition. In contrast to L1 inserts, 5′ truncations of Alu inserts are rare, as most of the recovered inserts (96.5%) are full length. De novo Alus show a random pattern of insertion across chromosomes, but further characterization revealed an insertion bias favoring sites near other SINEs and highly conserved elements, with almost 60% landing within genes. De novo Alu inserts show no evidence of RNA editing. Priming for reverse transcription rarely occurred within the first 20 bp (most 5′) of the A-tail. The A-tails of recovered inserts show significant expansion, with many at least doubling in length. Sequence manipulation of the construct showed that A-tail expansion likely occurs during insertion, owing to slippage by the L1 ORF2 protein. We postulate that A-tail expansion directly impacts Alu evolution by reintroducing new active source elements, counteracting the natural loss of active Alus and thereby minimizing Alu extinction.
SINEs are mobile elements that are found ubiquitously throughout a large diversity of genomes from plants to mammals. The human SINE, Alu, is among the most successful mobile elements, with more than one million copies in the genome. Due to its high activity and ability to insert throughout the genome, Alu retrotransposition is responsible for the majority of diseases reported to be caused by mobile element activity. To further evaluate the genomic impact of SINEs, we recovered and characterized over 200 de novo Alu inserts under controlled conditions. Our data reinforce observations on the mutagenic potential of Alu, with newly retrotransposed Alu elements favoring insertion into genes and highly conserved elements. Alu-mediated deletions and rearrangements are infrequent and lack the typical hallmarks of retrotransposition by target-primed reverse transcription (TPRT), suggesting the use of an alternate method for resolving retrotransposition intermediates or an atypical insertion mechanism. Our data also provide novel insights into SINE retrotransposition biology. We found that slippage of L1 ORF2 protein during reverse transcription expands the A-tails of de novo insertions. We propose that the L1 ORF2 protein plays a major role in minimizing Alu extinction by reintroducing active Alu elements to counter the natural loss of Alu source elements.
The imprint of natural selection on protein coding genes is often difficult to identify because selection is frequently transient or episodic, i.e., it affects only a subset of lineages. Existing computational techniques, which are designed to identify sites subject to pervasive selection, may fail to recognize sites where selection is episodic, and such sites make up a large proportion of positively selected sites. We present a mixed effects model of evolution (MEME) that is capable of identifying instances of both episodic and pervasive positive selection at the level of an individual site. Using empirical and simulated data, we demonstrate the superior performance of MEME over older models under a broad range of scenarios. We find that episodic selection is widespread and conclude that the number of sites experiencing positive selection may have been vastly underestimated.
Identifying regions of protein coding genes that have undergone adaptive evolution is important for answering many questions in evolutionary biology and genetics. In order to tease out genetic evidence for natural selection, genes from a diverse array of taxa must be analyzed, only a subset of which may have undergone adaptive evolution; the same gene region may be under stabilizing or relaxed selection in lineages leading to other taxa. Most current computational methods designed to detect the imprint of natural selection at a site in a protein coding gene assume that the strength and direction of natural selection are constant across all lineages. Here, we present a method to detect adaptive evolution even when the selective forces are not constant across taxa. Using a variety of well-characterized genes, we find evidence suggesting that natural selection is generally episodic, and that modeling it as such reveals many more sites subject to episodic positive selection than previously appreciated.
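Schematically, and as we read the published model, MEME gives each site s its own synonymous rate α_s and two nonsynonymous rates, β_s^- ≤ α_s and an unrestricted β_s^+, and lets every branch b independently fall into one of the two classes. The transition matrix used along branch b (length t_b) at that site is then a weighted mixture of two codon-model matrices Q; the symbols below are our own notation, chosen for illustration:

$$
T_b^{(s)} = p_s^{-}\,\exp\!\big(t_b\,Q(\alpha_s,\beta_s^{-})\big) + \big(1-p_s^{-}\big)\,\exp\!\big(t_b\,Q(\alpha_s,\beta_s^{+})\big),
\qquad \omega_s^{-} = \frac{\beta_s^{-}}{\alpha_s} \le 1,\quad \omega_s^{+} = \frac{\beta_s^{+}}{\alpha_s}.
$$

The site likelihood is obtained with the usual pruning algorithm over these branch matrices, and episodic positive selection at the site is assessed by comparing this model against a null in which β_s^+ is also constrained to be ≤ α_s. Setting p_s^- = 0 collapses the mixture to a single ω on every branch, which is the pervasive-selection assumption of older site-level models.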
Animal heterotrimeric G proteins are activated by guanine nucleotide exchange factors (GEFs), typically seven-transmembrane receptors that trigger GDP release and subsequent GTP binding. In contrast, the Arabidopsis thaliana G protein (AtGPA1) rapidly activates itself without a GEF and is instead regulated by a seven-transmembrane Regulator of G protein Signaling (7TM-RGS) protein that promotes GTP hydrolysis to reset the inactive (GDP-bound) state. It is not known whether this unusual activation is a major and constraining part of the evolutionary history of G protein signaling in eukaryotes. In particular, it is not known whether this is an ancestral form or whether this mechanism is maintained, and therefore constrained, within the plant kingdom. To determine whether this mode of signal regulation is conserved throughout the plant kingdom, we analyzed available plant genomes for G protein signaling components, and we individually purified the plant components encoded in an informative set of plant genomes in order to determine their activation properties in vitro. While the subunits of the heterotrimeric G protein complex are encoded in vascular plant genomes, the 7TM-RGS genes were lost in all investigated grasses. Despite the absence of a Gα-inactivating protein in grasses, all vascular plant Gα proteins examined rapidly released GDP without a receptor and slowly hydrolyzed GTP, indicating that these Gα are self-activating. We further showed that a single amino acid substitution found naturally in grass Gα proteins reduced the Gα-RGS interaction, and that this substitution occurred before the loss of the RGS gene in the grass lineage. Like grasses, non-vascular plants also appear to lack RGS proteins. However, unlike grasses, one representative non-vascular plant Gα showed rapid GTP hydrolysis, likely compensating for the loss of the RGS gene.
Our findings (the loss of a regulatory gene alongside retention of the “self-activating” trait) indicate the existence of divergent Gα regulatory mechanisms in the plant kingdom. In the grasses, purifying selection on the regulatory gene was lost after the physical decoupling of the RGS protein from its cognate Gα partner. More broadly, these findings show extreme divergence in Gα activation and regulation that played a critical role in the evolution of G protein signaling pathways.
Extracellular signals activate intracellular changes that lead to cell behaviors. This spatial coupling is mediated by cell-surface receptor activation of the heterotrimeric G protein complex located on the cytoplasmic side of the plasma membrane. Unlike their metazoan counterparts, plant G proteins are constitutively active. Plants use multiple mechanisms to keep the G protein complex in its resting state, and activation occurs when these mechanisms are inhibited. One mechanism involves a cell-surface receptor that accelerates the return to the resting state through direct interaction with the G protein at a specific protein interface. This unique protein, AtRGS1, has both an animal-like receptor domain and a domain (the RGS box) responsible for accelerating deactivation. One group of plants (cereals) lost this protein in two steps: first, a mutation at the protein interface reduced the affinity of the RGS box for the G protein; then the gene itself was lost.
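The contrast between self-activating and RGS-regulated Gα can be illustrated with a minimal two-state model. It assumes that GDP release is the rate-limiting step of activation (GTP binding after release is treated as instantaneous) and that GTP hydrolysis returns Gα to the GDP-bound state, so the GTP-bound fraction at steady state is k_ex / (k_ex + k_hyd). The rate constants are hypothetical, and `rgs_fold` is a hypothetical multiplier standing in for RGS-accelerated hydrolysis; this is a sketch, not the kinetic analysis performed in the study.

```python
def steady_state_active(k_exchange, k_hydrolysis, rgs_fold=1.0):
    """GTP-bound fraction when exchange (GDP->GTP) and hydrolysis (GTP->GDP) balance."""
    k_off = k_hydrolysis * rgs_fold  # RGS is modelled as a fold-increase in hydrolysis
    return k_exchange / (k_exchange + k_off)


def simulate_active(k_exchange, k_hydrolysis, rgs_fold=1.0, dt=0.01, steps=20_000):
    """Forward-Euler integration of d[active]/dt, starting from fully GDP-bound Galpha."""
    k_off = k_hydrolysis * rgs_fold
    active = 0.0
    for _ in range(steps):
        active += dt * (k_exchange * (1.0 - active) - k_off * active)
    return active


# Hypothetical rates (per minute): fast GDP release, slow intrinsic GTP hydrolysis.
self_activating = steady_state_active(k_exchange=1.0, k_hydrolysis=0.1)
with_rgs = steady_state_active(k_exchange=1.0, k_hydrolysis=0.1, rgs_fold=10.0)
fast_hydrolysis = steady_state_active(k_exchange=1.0, k_hydrolysis=1.0)
```

Fast exchange with slow hydrolysis keeps most Gα GTP-bound (about 91% with these numbers); either an RGS protein or an intrinsically faster hydrolysis rate pulls the steady state down to the same 50% here, which mirrors the compensation suggested for the non-vascular plant Gα that lacks an RGS partner.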
Viperin, also known as RSAD2, is an interferon-inducible protein that potently restricts a broad range of viruses, including influenza virus, hepatitis C virus, human cytomegalovirus, and West Nile virus. Viperin is thought to affect virus budding by modifying the lipid environment within the cell. Since HIV-1 and other retroviruses depend on lipid domains of the host cell for budding and infectivity, we investigated the possibility that Viperin also restricts human immunodeficiency virus and other retroviruses.
We find that, like other host restriction factors with a broad antiviral range, Viperin has been evolving under positive selection in primates. The pattern of positive selection is indicative of Viperin's escape from multiple viral antagonists over the course of primate evolution. Furthermore, we find that Viperin is interferon-induced in primary HIV target cells. We show that exogenous expression of Viperin restricts the LAI strain of HIV-1 at the stage of virus release from the cell. Nonetheless, Viperin restriction is highly strain-specific and does not affect most HIV-1 strains or the other retroviruses tested. Moreover, knockdown of endogenous Viperin in a lymphocytic cell line did not significantly affect spreading HIV-1 infection.
Despite positive selection having acted on Viperin throughout primate evolution, our findings indicate that Viperin is not a major restriction factor against HIV-1 and other retroviruses. Therefore, other viral lineages are likely responsible for the evolutionary signatures of positive selection in Viperin among primates.
Histone variants are non-allelic protein isoforms that play key roles in diversifying chromatin structure. The known number of such variants has greatly increased in recent years, but the lack of naming conventions for them has led to a variety of naming styles, multiple synonyms and misleading homographs that obscure variant relationships and complicate database searches. We propose here a unified nomenclature for variants of all five classes of histones that uses consistent but flexible naming conventions to produce names that are informative and readily searchable. The nomenclature builds on historical usage and incorporates phylogenetic relationships, which are strong predictors of structure and function. A key feature is the consistent use of punctuation to represent phylogenetic divergence, making explicit the relationships among variant subtypes that have previously been implicit or unclear. We recommend that by default new histone variants be named with organism-specific paralog-number suffixes that lack phylogenetic implication, while letter suffixes be reserved for structurally distinct clades of variants. For clarity and searchability, we encourage the use of descriptors that are separate from the phylogeny-based variant name to indicate developmental and other properties of variants that may be independent of structure.
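To illustrate why consistent punctuation makes variant names informative and searchable, the sketch below treats each dot-delimited component of a name as one level of phylogenetic divergence, then recovers the deepest level two names share and tests clade membership. The helper names are hypothetical, the example names are only meant to follow the dot-separated pattern, and the full nomenclature includes prefixes and descriptors that this simple split does not interpret.

```python
def name_path(variant_name):
    """Split a histone variant name into its dot-delimited levels."""
    return tuple(part for part in variant_name.split(".") if part)


def shared_lineage(name_a, name_b):
    """Deepest dot-delimited level shared by two variant names ('' if none)."""
    shared = []
    for level_a, level_b in zip(name_path(name_a), name_path(name_b)):
        if level_a != level_b:
            break
        shared.append(level_a)
    return ".".join(shared)


def belongs_to(variant_name, clade_name):
    """True if variant_name lies at or below clade_name in the dot hierarchy."""
    clade = name_path(clade_name)
    return name_path(variant_name)[: len(clade)] == clade
```

Under this reading, "H2A.Z.1" and "H2A.Z.2.2" share the level "H2A.Z", while "H2A.Z.1" and "H2A.X" share only "H2A". Matching whole components rather than raw string prefixes also avoids false hits: a plain `startswith("H2A.Z")` check would wrongly accept a hypothetical name such as "H2A.Zx" as a member of H2A.Z.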
Heterochromatin is the gene-poor, satellite-rich eukaryotic genome compartment that supports many essential cellular processes. The functional diversity of proteins that bind and often epigenetically define heterochromatic DNA sequence reflects the diverse functions supported by this enigmatic genome compartment. Moreover, heterogeneous signatures of selection at chromosomal proteins often mirror the heterogeneity of evolutionary forces that act on heterochromatic DNA. To identify additional protein surrogates for dissecting heterochromatin function and evolution, we conducted a comprehensive phylogenomic analysis of the Heterochromatin Protein 1 gene family across 40 million years of Drosophila evolution. Our study expands this gene family from 5 genes to at least 26 genes, including several uncharacterized genes in Drosophila melanogaster. The 21 newly defined HP1s introduce unprecedented structural diversity, lineage restriction, and germline-biased expression patterns into the HP1 family. We find little evidence of positive selection at these HP1 genes in either population-genetic or molecular-evolution analyses. Instead, we find that dynamic evolution occurs via prolific gene gains and losses. Despite this dynamic gene turnover, the number of HP1 genes is relatively constant across species. We propose that karyotype evolution drives at least some HP1 gene turnover. For example, the loss of the male germline-restricted HP1E in the obscura group coincides with one episode of dramatic karyotypic evolution, including the gain of a neo-Y in this lineage. The expanded compendium of ovary- and testis-restricted HP1 genes revealed by our study, together with correlated gain/loss dynamics and chromosome fission/fusion events, will guide functional analyses of novel roles supported by germline chromatin.
Our genome is composed of two compartments. The euchromatin harbors abundant genes and regulatory information, while heterochromatin harbors few genes and abundant repetitive DNA. These characteristic features of heterochromatin challenge traditional methods of sequence assembly and molecular dissection. Instead, analysis of proteins that localize to and often functionally define heterochromatic sequence has illuminated numerous heterochromatin-dependent, essential cellular processes, including chromosome segregation, telomere stability, and gene regulation. With the aim of increasing our sample of heterochromatin-localizing proteins, we performed a comprehensive search for new members of the Heterochromatin Protein 1 gene family over 40 million years of Drosophila evolution. Our report expands this family from a modest five genes to 26 genes. Unlike the founding family members, the HP1s we describe are structurally diverse, largely restricted to male reproductive tissue, and highly dynamic over evolutionary time. Despite recurrent HP1 gene birth and death, gene numbers per species are relatively constant. These gene “replacements” likely support a dynamic biological process. We propose, and present evidence for, the hypothesis that recurrent chromosomal rearrangements drive at least some of the observed HP1 gene family dynamics. We anticipate that these HP1 genes will help define new heterochromatin-dependent processes in the male germline.
Changes in gene expression are commonly observed during evolution. However, the phenotypic consequences of expression divergence are frequently unknown and difficult to measure. Transcriptional regulators provide a mechanism by which phenotypic divergence can occur through multiple, coordinated changes in gene expression during development or in response to environmental changes. Yet, some changes in transcriptional regulators may be constrained by their pleiotropic effects on gene expression. Here, we use a genome-wide screen for promoters that are likely to have diverged in function and identify a yeast transcription factor, FZF1, that has evolved substantial differences in its ability to confer resistance to sulfites. Chimeric alleles from four Saccharomyces species show that divergence in FZF1 activity is due to changes in both its coding and upstream noncoding sequence. Between the two closest species, noncoding changes affect the expression of FZF1, whereas coding changes affect the expression of SSU1, a sulfite efflux pump activated by FZF1. Both coding and noncoding changes also affect the expression of many other genes. Our results show how divergence in the coding and promoter region of a transcription factor alters the response to an environmental stress.
Changes in gene regulation are thought to play an important role in evolution. While variation in gene expression between species is common, it is hard to identify the phenotypic consequences of this variation, since many changes in gene expression may have subtle or no phenotypic effects. In this study, we investigate changes in sulfite resistance and gene expression caused by the transcription factor FZF1, which has evolved rapidly during the divergence of related yeast species. We find that divergence in the ability of FZF1 to confer sulfite resistance is mediated by changes in its expression as well as changes in its protein structure, both of which cause changes in the expression of other genes. Our results show how the combination of multiple changes within a transcription factor can produce substantial changes in phenotype and in the expression of many genes.
The Apobec3 family of cytidine deaminases can inhibit the replication of retroviruses and retrotransposons. Human and chimpanzee genomes encode seven Apobec3 paralogs; of these, Apobec3DE has the greatest sequence divergence between humans and chimpanzees. Here we show that even though human and chimpanzee Apobec3DEs are very divergent, the two orthologs similarly restrict long terminal repeat (LTR) and non-LTR retrotransposons (MusD and Alu, respectively). However, chimpanzee Apobec3DE also potently restricts two lentiviruses, human immunodeficiency virus type 1 (HIV-1) and the simian immunodeficiency virus (SIV) that infects African green monkeys (SIVagmTAN), unlike human Apobec3DE, which has poor antiviral activity against these same viruses. This difference between human and chimpanzee Apobec3DE in the ability to restrict retroviruses is not due to different levels of Apobec3DE protein incorporation into virions but rather to the ability of Apobec3DE to deaminate the viral genome in target cells. We further show that Apobec3DE rapidly evolved in chimpanzee ancestors approximately 2 to 6 million years ago and that this evolution drove the increased breadth of chimpanzee Apobec3DE antiviral activity to its current high activity against some lentiviruses. Despite a difference in target specificities between human and chimpanzee Apobec3DE, Apobec3DE is likely to currently play a role in host defense against retroelements in both species.