1.  Adaptation of HIV-1 to Its Human Host 
Molecular Biology and Evolution 2007;24(8):1853-1860.
Human immunodeficiency virus type 1 (HIV-1) originated from three independent cross-species transmissions of simian immunodeficiency virus (SIVcpzPtt) infecting chimpanzees (Pan troglodytes troglodytes) in west central Africa, giving rise to pandemic (group M) and non-pandemic (groups N and O) clades of HIV-1. To identify host-specific adaptations in HIV-1 we compared the inferred ancestral sequences of HIV-1 groups M, N and O to 12 full-length genome sequences of SIVcpzPtt and four of the outlying but closely related SIVcpzPts (from P. t. schweinfurthii). This analysis revealed a single site that was completely conserved among SIVcpzPtt strains but different (due to the same change) in all three groups of HIV-1. This site, Gag-30, lies within p17, the gag-encoded matrix protein. It is Met in SIVcpzPtt, underwent a conservative replacement by Leu in one lineage of SIVcpzPts, but changed radically to Arg on all three lineages leading to HIV-1. During subsequent diversification this site has been conserved as a basic residue (Arg or Lys) in most lineages of HIV-1. Retrospective analysis revealed that Gag-30 had reverted to Met in a previous experiment in which HIV-1 was passaged through chimpanzees. To examine whether this substitution conferred a species-specific growth advantage, we used site-directed mutagenesis to generate variants of these chimpanzee-adapted HIV-1 strains with Lys at Gag-30, and tested their replication in both human and chimpanzee CD4+ T lymphocytes. Remarkably, viruses encoding Met replicated to higher titers than viruses encoding Lys in chimpanzee T cells, but the opposite was found in human T cells. Taken together, these observations provide compelling evidence for host-specific adaptation during the emergence of HIV-1 and identify the viral matrix protein as a modulator of viral fitness following transmission to the new human host.
doi:10.1093/molbev/msm110
PMCID: PMC4053193  PMID: 17545188
HIV-1; SIV; matrix protein; cross-species transmission; host-specific adaptation
2.  Ancestral Ca2+ signaling machinery in early animal and fungal evolution 
Molecular Biology and Evolution 2011;29(1):91-100.
Animals and fungi diverged from a common unicellular ancestor of Opisthokonta, yet they exhibit significant differences in their components of Ca2+ signaling pathways. Many Ca2+ signaling molecules appear to be either animal-specific or fungal-specific, which is generally believed to result from lineage-specific adaptations to distinct physiological requirements. Here, by analyzing the genomic data from several close relatives of animals and fungi, we demonstrate that many components of animal and fungal Ca2+ signaling machineries are present in the apusozoan protist Thecamonas trahens, which belongs to the putative unicellular sister group to Opisthokonta. We also identify the conserved portion of Ca2+ signaling molecules in early evolution of animals and fungi following their divergence. Furthermore, our results reveal the lineage-specific expansion of Ca2+ channels and transporters in the unicellular ancestors of animals and in basal fungi. These findings provide novel insights into the evolution and regulation of Ca2+ signaling critical for animal and fungal biology.
doi:10.1093/molbev/msr149
PMCID: PMC4037924  PMID: 21680871
Ca2+ signaling; Ca2+ channel; evolutionary genomics; fungi; metazoan evolution
3.  Next-Generation Sequencing Reveals the Impact of Repetitive DNA Across Phylogenetically Closely Related Genomes of Orobanchaceae 
Molecular Biology and Evolution 2012;29(11):10.1093/molbev/mss168.
We used next-generation sequencing to characterize the genomes of nine species of Orobanchaceae that have known phylogenetic relationships, represent different life forms, and include a polyploid species. The study species are the autotrophic, nonparasitic Lindenbergia philippensis, the hemiparasitic Schwalbea americana, and seven nonphotosynthetic parasitic species of Orobanche (Orobanche crenata, Orobanche cumana, Orobanche gracilis (tetraploid), and Orobanche pancicii) and Phelipanche (Phelipanche lavandulacea, Phelipanche purpurea, and Phelipanche ramosa). Ty3/Gypsy elements comprise 1.93%–28.34% of the nine genomes, and Ty1/Copia elements comprise 8.09%–22.83%. When compared with L. philippensis and S. americana, the nonphotosynthetic species contain higher proportions of repetitive DNA sequences, perhaps reflecting relaxed selection on genome size in parasitic organisms. Among the parasitic species, those in the genus Orobanche have smaller genomes but higher proportions of repetitive DNA than those in Phelipanche, mostly due to a diversification of repeats and an accumulation of Ty3/Gypsy elements. Genome downsizing in the tetraploid O. gracilis probably led to sequence loss across most repeat types.
doi:10.1093/molbev/mss168
PMCID: PMC3859920  PMID: 22723303
next-generation sequencing; polyploidy; genome size; genome downsizing; transposable elements; LTR retrotransposons; Ty3/Gypsy; Orobanche; Phelipanche; Orobanchaceae
4.  Y-chromosomal variation in Sub-Saharan Africa: insights into the history of Niger-Congo groups 
Molecular Biology and Evolution 2010;28(3):1255-1269.
Technological and cultural innovations, as well as climate changes, are thought to have influenced the diffusion of major language phyla in sub-Saharan Africa. The most widespread and the richest in diversity is the Niger-Congo phylum, thought to have originated in West Africa ~10,000 years ago. The expansion of Bantu languages (a family within the Niger-Congo phylum) ~5,000 years ago represents a major event in the past demography of the continent. Many previous studies on Y chromosomal variation in Africa associated the Bantu expansion with haplogroup E1b1a (and sometimes its sub-lineage E1b1a7). However, the distribution of these two lineages extends far beyond the area occupied nowadays by Bantu speaking people, raising questions about the actual genetic structure behind this expansion. To address these issues, we directly genotyped 31 biallelic markers and 12 microsatellites on the Y chromosome in 1195 individuals of African ancestry focusing on areas that were previously poorly characterized (Botswana, Burkina Faso, D.R.C., and Zambia). With the inclusion of published data, we analyzed 2736 individuals from 26 groups representing all linguistic phyla and covering a large portion of Sub-Saharan Africa. Within the Niger-Congo phylum, we ascertain for the first time differences in haplogroup composition between Bantu and non-Bantu groups via two markers (U174 and U175) on the background of haplogroup E1b1a (and E1b1a7), which were directly genotyped in our samples and for which genotypes were inferred from published data using Linear Discriminant Analysis on STR haplotypes. No reduction in STR diversity levels was found across the Bantu groups, suggesting the absence of serial founder effects. In addition, the homogeneity of haplogroup composition and pattern of haplotype sharing between Western and Eastern Bantu groups suggest that their expansion throughout Sub-Saharan Africa reflects a rapid spread followed by backward and forward migrations.
Overall, we found that linguistic affiliations played a notable role in shaping sub-Saharan African Y chromosomal diversity, although the impact of geography is clearly discernible.
doi:10.1093/molbev/msq312
PMCID: PMC3561512  PMID: 21109585
Human; Language; Geography; Migration; Y chromosome; Bantu
6.  Population Genetic Structure in Indian Austroasiatic Speakers: The Role of Landscape Barriers and Sex-Specific Admixture 
Molecular Biology and Evolution 2010;28(2):1013-1024.
The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in south and southeast Asia, remain disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in southeast Asia with a later dispersal to south Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from south Asia. To test the two alternative models, this study combines the analysis of uniparentally inherited markers with 610,000 common single nucleotide polymorphism loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17–28 thousand years ago) in southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and “structure-like” analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterized by two ancestral components—one represented in the pattern of Y chromosomal and EDAR results and the other by mitochondrial DNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from southeast Asia, followed by extensive sex-specific admixture with local Indian populations.
doi:10.1093/molbev/msq288
PMCID: PMC3355372  PMID: 20978040
Austroasiatic; mtDNA; Y chromosome; autosomes; admixture
7.  Parallel Evolution of Genes and Languages in the Caucasus Region 
Molecular Biology and Evolution 2011;28(10):2905-2920.
We analyzed 40 SNP and 19 STR Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees, and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language co-evolution occurred within geographically isolated populations, probably due to its mountainous terrain.
doi:10.1093/molbev/msr126
PMCID: PMC3355373  PMID: 21571925
Y chromosome; glottochronology; Caucasus; gene geography
8.  Higher Intensity of Purifying Selection on >90% of the Human Genes Revealed by the Intrinsic Replacement Mutation Rates 
Molecular Biology and Evolution 2006;23(12):2283-2287.
For over three decades, the rate of replacement mutations has been assumed to be equal to, and estimated from, the rate of “strictly” neutral sequence divergence in noncoding regions and in silent-codon positions where mutations do not alter the amino acid encoded. This assumption is fundamental to estimating the fraction of harmful protein mutations and to identifying adaptive evolution at individual codons and proteins. We show that the assumption is not justifiable because a much larger fraction of codon positions is involved in hypermutable CpG dinucleotides as compared with the introns, leading to a higher expected replacement mutation rate per site in a vast majority of the genes. Consideration of this difference reveals a higher intensity of purifying natural selection than previously inferred in human genes. We also show that far fewer genes are expected to be evolving under positive selection than predicted from sequence divergence at intron and silent positions in the human genome. These patterns indicate the need for new approaches to estimating rates of amino acid–altering mutations in order to find positively selected genes and codons in genomes that contain hypermutable CpGs.
doi:10.1093/molbev/msl123
PMCID: PMC3072915  PMID: 16982819
adaptive evolution; mutation rate; test of selection; comparative genomics
9.  Observations of Amino Acid Gain and Loss during Protein Evolution Are Explained by Statistical Bias 
Molecular Biology and Evolution 2006;23(7):1444-1449.
The authors of a recent manuscript in “Nature” claim to have discovered “universal trends” of amino acid gain and loss in protein evolution. Here, we show that this universal trend can be simply explained by a bias that is unavoidable with the 3-taxon trees used in the original analysis. We demonstrate that a rigorously reversible equilibrium model, when analyzed with the same methods as the “Nature” manuscript, yields identical (and in this case, clearly erroneous) conclusions. A main source of the bias is the division of the sequence data into “informative” and “noninformative” sites, which favors the observation of certain transitions.
doi:10.1093/molbev/msl010
PMCID: PMC2943954  PMID: 16698770
amino acid bias; ancestral reconstruction; molecular evolution; parsimony
10.  From DNA to Fitness Differences: Sequences and Structures of Adaptive Variants of Colias Phosphoglucose Isomerase (PGI) 
Molecular Biology and Evolution 2005;23(3):499-512.
Colias eurytheme butterflies display extensive allozyme polymorphism in the enzyme phosphoglucose isomerase (PGI). Earlier studies on biochemical and fitness effects of these genotypes found evidence of strong natural selection maintaining this polymorphism in the wild. Here we analyze the molecular features of this polymorphism by sequencing multiple alleles and modeling their structures. PGI is a dimer with rotational symmetry. Each monomer provides a critical residue to the other monomer’s catalytic center. Sequenced alleles differ at multiple amino acid positions, including cryptic charge-neutral variation, but most consistent differences among the electromorph alleles are at the charge-changing amino acid sites. Principal candidate sites of selection, identified by structural and functional analyses and by their variants’ population frequencies, occur in interpenetrating loops across the interface between monomers, where they may alter subunit interactions and catalytic center geometry. Comparison to a second (and basal) species, Colias meadii, also polymorphic for PGI under natural selection, reveals one fixed amino acid difference between their PGIs, which is located in the interpenetrating loop and accompanies functional differences among their variants. We also study nucleotide variability among the PGI alleles, comparing these data to similar data from another glycolytic enzyme gene, glyceraldehyde-3-phosphate dehydrogenase. Despite extensive nonsynonymous and synonymous polymorphism at PGI in each species, the only base changes fixed between species are the two causing the amino acid replacement; this absence of synonymous fixation yields a significant McDonald-Kreitman test. Analyses of these data suggest historical population expansion. Positive peaks of Tajima’s D statistic, representing regions of neutral “hitchhiking,” are found around the principal candidate sites of selection. 
This study provides novel views of molecular-structural mechanisms, and beginnings of historical evidence, for a long-persistent balanced enzyme polymorphism at PGI in these and perhaps other species.
doi:10.1093/molbev/msj062
PMCID: PMC2943955  PMID: 16292000
adaptive evolution; G3PD; balancing selection; dimeric enzyme evolution; molecular tests of selection; structural basis of heterosis
11.  The Dynamic Nature of Eukaryotic Genomes 
Molecular Biology and Evolution 2008;25(4):787-794.
Analyses from diverse eukaryotes reveal that genomes are dynamic, sometimes dramatically so. In numerous lineages across the eukaryotic tree of life, DNA content varies within individuals throughout life cycles and among individuals within species. Novel genome features continue to be discovered, and our understanding of the extent of genome dynamism grows as more genomes are sequenced. Though most completed eukaryotic genomes are from animals, fungi, and plants, these lineages represent only three of the 60–200 lineages of eukaryotes. Here, we discuss the diverse genomic strategies in exemplar eukaryotic lineages, including several microbial eukaryotes, to reveal dramatic variation that challenges established views of genome evolution. For example, in the life cycle of some members of the ‘radiolaria’, ploidy increases from haploid (N) to approximately 1000N, while intrapopulation variability of the enteric parasite Entamoeba ranges from 4N to 40N. Variation has also been found within our own species, with substantial differences in both gene content and chromosome lengths between individuals. Data on the dynamic nature of genomes shift the perception of the genome from being fixed and characteristic of a species (typological) to plastic due to variation within and between species.
doi:10.1093/molbev/msn032
PMCID: PMC2933061  PMID: 18258610
Ploidy; Microbial Eukaryotes; Genome Evolution; Copy Number Variation; Nuclear Life Cycle
12.  Adaptive Evolution of Proteins Secreted during Sperm Maturation: An Analysis of the Mouse Epididymal Transcriptome 
Molecular Biology and Evolution 2007;25(2):383-392.
A common pattern observed in molecular evolution is that reproductive genes tend to evolve rapidly. However, most previous studies documenting this rapid evolution are based on genes expressed in just a few male reproductive organs. In mammals, sperm become motile and capable of fertilization only after leaving the testis, during their transit through the epididymis. Thus, genes expressed in the epididymis are expected to play important roles in male fertility. Here, we performed evolutionary genetic analyses on the epididymal transcriptome of mice. Overall, epididymis-expressed genes showed evidence of strong evolutionary constraint, a finding that contrasts with most previous analyses of genes expressed in other male reproductive organs. However, a subset of epididymis-specialized, secreted genes showed several signatures of adaptive evolution, including an increased rate of nonsynonymous evolution. Furthermore, this subset of genes was overrepresented on the X chromosome. Immunity and protein modification functions were significantly overrepresented among epididymis-specialized, secreted genes. These analyses identified a group of genes likely to be important in male reproductive success.
doi:10.1093/molbev/msm265
PMCID: PMC2915769  PMID: 18056076
reproduction; epididymis; evolution; selection
13.  Roles of cis- and trans-Changes in the Regulatory Evolution of Genes in the Gluconeogenic Pathway in Yeast 
Molecular Biology and Evolution 2008;25(9):1863-1875.
The yeast Saccharomyces cerevisiae proliferates rapidly in glucose-containing media. As glucose becomes depleted, yeast cells enter the transition from fermentative to nonfermentative metabolism, known as the diauxic shift, which is associated with major changes in gene expression. To understand the expression evolution of genes involved in the diauxic shift and in nonfermentative metabolism within species, a laboratory strain (BY), a wild strain (RM), and a clinical isolate (YJM) were used in this study. Our data showed that the RM strain enters the diauxic shift ∼1 h earlier than the BY strain with an earlier, higher induction of many key transcription factors (TFs) involved in the diauxic shift. Our sequence data revealed sequence variations between BY and RM in both coding and promoter regions of the majority of these TFs. The key TF Cat8p, a zinc-finger cluster protein, is required for the expression of many genes in gluconeogenesis under nonfermentative growth, and its derepression is mediated by deactivation of Mig1p. Our kinetic study of CAT8 expression revealed that CAT8 induction corresponded to the timing of glucose depletion in both BY and RM, and CAT8 was induced up to 50- to 90-fold in RM but only 20- to 30-fold in BY. In order to decipher the relative importance of cis- and trans-variations in expression divergence in the gluconeogenic pathway during the diauxic shift, we studied the expression levels of MIG1, CAT8, and their downstream target genes in the cocultures and in the hybrid diploids of BY–RM, BY–YJM, and RM–YJM and in strains with swapped promoters. Our data showed that the differences between BY and RM in the expression of MIG1, the upstream regulator of CAT8, were affected mainly by changes in cis-elements, though also by changes in trans-acting factors, whereas those of CAT8 and its downstream target genes were predominantly affected by changes in trans-acting factors.
doi:10.1093/molbev/msn138
PMCID: PMC2515871  PMID: 18573843
cis-regulation; trans-regulation; diauxic shift; expression evolution
15.  Inferring Selection on Amino Acid Preference in Protein Domains 
Molecular Biology and Evolution 2008;26(3):527-536.
Models that explicitly account for the effect of selection on new mutations have been proposed to account for “codon bias” or the excess of “preferred” codons that results from selection for translational efficiency and/or accuracy. In principle, such models can be applied to any mutation that results in a preferred allele, but in most cases, the fitness effect of a specific mutation cannot be predicted. Here we show that it is possible to assign preferred and unpreferred states to amino acid changing mutations that occur in protein domains. We propose that mutations that lead to more common amino acids (at a given position in a domain) can be considered “preferred alleles” just as are synonymous mutations leading to codons for more abundant tRNAs. We use genome-scale polymorphism data to show that alleles for preferred amino acids in protein domains occur at higher frequencies in the population, as has been shown for preferred codons. We show that this effect is quantitative, such that there is a correlation between the shift in frequency of preferred alleles and the predicted fitness effect. As expected, we also observe a reduction in the numbers of polymorphisms and substitutions at more important positions in domains, consistent with stronger selection at those positions. We examine the derived allele frequency distribution and polymorphism to divergence ratios of preferred and unpreferred differences and find evidence for both negative and positive selection acting to maintain protein domains in the human population. Finally, we analyze a model for selection on amino acid preferences in protein domains and find that it is consistent with the quantitative effects that we observe.
doi:10.1093/molbev/msn286
PMCID: PMC2716081  PMID: 19095755
weakly selected variants; protein domains; polymorphism; McDonald–Kreitman test; deleterious; advantageous
16.  Widespread ultraconservation divergence in primates 
Molecular Biology and Evolution 2008;25(8):1668-1676.
The distribution and evolution of ultraconserved elements (UCEs, DNA stretches that are perfectly identical in primates and rodents) were examined in genomes of three primate species (human, chimpanzee, and rhesus macaque). It was found that the number of UCEs has decreased throughout primate evolution. At least 26% of ancestral UCEs have diverged in hominoids, while an additional 17% have accumulated one or more single nucleotide polymorphisms (SNPs) in the human genome. Sequence polymorphism analyses indicate that mutation fixation within a UCE can trigger a relaxation in the selective constraint on that element. Homogeneous mutation accumulations in UCEs served as a template by which purifying selection acted more effectively on protein-coding UCEs. Gene ontology annotation suggests that UCE sequence variation, primarily occurring in noncoding regions, might be linked to the reprogramming of the expression pattern of transcription factors and developmentally important genes. Many of these genes are expressed in the central nervous system. Finally, UCE sequence variability within human populations has been identified, including population-specific non-synonymous changes in protein-coding regions.
doi:10.1093/molbev/msn116
PMCID: PMC2464743  PMID: 18492662
17.  Multicopy Suppression Underpins Metabolic Evolvability 
Molecular Biology and Evolution 2007;24(12):2716-2722.
Our understanding of the origins of new metabolic functions is based upon anecdotal genetic and biochemical evidence. Some auxotrophies can be suppressed by overexpressing substrate-ambiguous enzymes (i.e., those that catalyze the same chemical transformation on different substrates). Other enzymes exhibit weak but detectable catalytic promiscuity in vitro (i.e., they catalyze different transformations on similar substrates). Cells adapt to novel environments through the evolution of these secondary activities, but neither their chemical natures nor their frequencies of occurrence have been characterized en bloc. Here, we systematically identified multifunctional genes within the Escherichia coli genome. We screened 104 single-gene knockout strains and discovered that many (20%) of these auxotrophs were rescued by the overexpression of at least one noncognate E. coli gene. The deleted gene and its suppressor were generally unrelated, suggesting that promiscuity is a product of contingency. This genome-wide survey demonstrates that multifunctional genes are common and illustrates the mechanistic diversity by which their products enhance metabolic robustness and evolvability.
doi:10.1093/molbev/msm204
PMCID: PMC2678898  PMID: 17884825
catalytic promiscuity; directed evolution; multicopy suppression; substrate ambiguity
18.  Deciphering Past Human Population Movements in Oceania: Provably Optimal Trees of 127 mtDNA Genomes 
Molecular Biology and Evolution 2006;23(10):1966-1975.
The settlement of the many island groups of Remote Oceania occurred relatively late in prehistory, beginning approximately 3,000 years ago when people sailed eastwards into the Pacific from Near Oceania, where evidence of human settlement dates from as early as 40,000 years ago. Archeological and linguistic analyses have suggested the settlers of Remote Oceania had ancestry in Taiwan, as descendants of a proposed Neolithic expansion that began approximately 5,500 years ago. Other researchers have suggested that the settlers were descendants of peoples from Island Southeast Asia or the existing inhabitants of Near Oceania alone. To explore patterns of maternal descent in Oceania, we have assembled and analyzed a data set of 137 mitochondrial DNA (mtDNA) genomes from Oceania, Australia, Island Southeast Asia, and Taiwan that includes 19 sequences generated for this project. Using the MinMax Squeeze Approach (MMS), we report the consensus network of 165 most parsimonious trees for the Oceanic data set, increasing by many orders of magnitude the numbers of trees for which a provable minimal solution has been found. The new mtDNA sequences highlight the limitations of partial sequencing for assigning sequences to haplogroups and dating recent divergence events. The provably optimal trees found for the entire mtDNA sequences using the MMS method provide a reliable and robust framework for the interpretation of evolutionary relationships and confirm that the female settlers of Remote Oceania descended from both the existing inhabitants of Near Oceania and more recent migrants into the region.
doi:10.1093/molbev/msl063
PMCID: PMC2674580  PMID: 16855009
human; mtDNA; Oceania; MMS; prehistory
19.  Evolution of the Rho family of ras-like GTPases in eukaryotes 
Molecular Biology and Evolution 2006;24(1):203-216.
GTPases of the Rho family are molecular switches that play important roles in converting and amplifying external signals into cellular effects. Originally demonstrated to control the dynamics of the F-actin cytoskeleton, Rho GTPases have been implicated in many basic cellular processes that influence cell proliferation, differentiation, motility, adhesion, survival or secretion. To elucidate the evolutionary history of the Rho family, we have analyzed over twenty species covering major eukaryotic clades from unicellular organisms to mammals, including platypus and opossum, and have reconstructed the ontogeny and the chronology of emergence of the different subfamilies. Our data establish that the 20 mammalian Rho members are structured into eight subfamilies, among which Rac is the founder of the whole family. Rho, Cdc42, RhoUV and RhoBTB subfamilies appeared before Coelomates, and RhoJQ, RhoDF and Rnd emerged in Chordates. In Vertebrates, gene duplications and retrotranspositions increased the size of each chordate Rho subfamily, while RhoH, the last subfamily, arose probably by horizontal gene transfer. Rac1b, a Rac1 isoform generated by alternative splicing, emerged in amniotes, and RhoD, only in therians. Analysis of Rho mRNA expression patterns in mouse tissues shows that recent subfamilies have tissue-specific and low-level expression, which supports their implication only in narrow time windows or in differentiated metabolic functions. These findings give a comprehensive view of the evolutionary canvas of the Rho family and provide guides for future structure and evolution studies of other components of Rho signaling pathways, in particular regulators of the RhoGEF family.
doi:10.1093/molbev/msl145
PMCID: PMC2665304  PMID: 17035353
Amino Acid Sequence; Animals; Evolution, Molecular; Fungi/genetics; Gene Duplication; Humans; Invertebrates/genetics; Molecular Sequence Data; Phylogeny; Plants/genetics; Pseudogenes; Sequence Alignment; Vertebrates/genetics; rho GTP-Binding Proteins/genetics
20.  Patterns of Evolution in the Unique tRNA Gene Arrays of the Genus Entamoeba 
Molecular biology and evolution  2007;25(1):187-198.
Genome sequencing of the protistan parasite Entamoeba histolytica HM-1:IMSS revealed that almost all the tRNA genes are organized into tandem arrays that make up over 10% of the genome. The 25 distinct array units contain up to 5 tRNA genes each, and some also encode the 5S RNA. Between adjacent genes in array units are complex short tandem repeats (STRs) resembling microsatellites. To investigate the origins and evolution of this unique gene organization, we have undertaken a genome survey to determine the array unit organization in 4 other species of Entamoeba (Entamoeba dispar, Entamoeba moshkovskii, Entamoeba terrapinae, and Entamoeba invadens) and have explored the STR structure in other isolates of E. histolytica. The genome surveys revealed that E. dispar has the same array unit organization as E. histolytica, including the presence and numerical variation of STRs between adjacent genes. However, the individual repeat sequences are completely different from those in E. histolytica. All other species of Entamoeba studied also have tandem arrays of clustered tRNA genes, but the gene composition of the array units often differs from that in E. histolytica/E. dispar. None of the other species' arrays exhibit the complex STRs between adjacent genes, although simple tandem duplications are occasionally seen. The degree of similarity in organization reflects the phylogenetic relationships among the species studied. Within individual isolates of E. histolytica, most copies of the array unit are uniform in sequence, with only minor variation in the number and organization of the STRs. Between isolates, however, substantial differences in STR number and organization can exist, although the individual repeat sequences tend to be conserved. The origin of this unique gene organization in the genus Entamoeba clearly predates the common ancestor of the species investigated to date, and its function remains unclear.
doi:10.1093/molbev/msm238
PMCID: PMC2652664  PMID: 17974548
Entamoeba; tRNA genes; repeated DNA; recombination
21.  Widespread Positive Selection in Synonymous Sites of Mammalian Genes 
Molecular biology and evolution  2007;24(8):1821-1831.
Evolution of protein sequences is largely governed by purifying selection, with a small fraction of proteins evolving under positive selection. The evolution at synonymous positions in protein-coding genes is far less well understood, and the extent and types of selection acting there remain largely unclear. We developed a statistical test to identify purifying and positive selection at synonymous sites in protein-coding genes. The method compares the rate of evolution at synonymous sites (Ks) to that in intron sequences of the same gene, after sampling the aligned intron sequences to mimic the statistical properties of coding sequences. We detected purifying selection at synonymous sites in ∼28% of the 1,562 analyzed orthologous genes from mouse and rat, and positive selection in ∼12% of the genes. Thus, the fraction of genes with readily detectable positive selection at synonymous sites is much greater than the fraction of genes with comparable positive selection at nonsynonymous sites, i.e., at the level of the protein sequence. Unlike other genes, the genes with positive selection at synonymous sites showed no correlation between Ks and the rate of evolution at nonsynonymous sites (Ka), indicating that evolution of synonymous sites under positive selection is decoupled from protein evolution. The genes with purifying selection at synonymous sites showed a significant anticorrelation between Ks and both expression level and expression breadth, indicating that highly and broadly expressed genes evolve slowly at synonymous sites. The genes with positive selection at synonymous sites showed the opposite trend, i.e., highly expressed genes had, on average, higher Ks. For the genes with positive selection at synonymous sites, significantly lower mRNA stability is predicted than for the genes with purifying selection. Thus, mRNA destabilization could be an important factor driving positive selection at synonymous sites, probably through regulation of expression at the level of mRNA degradation and, possibly, also of translation rate. Unexpectedly, then, positive selection at synonymous sites of mammalian genes is substantially more common than positive selection at the level of protein sequences, and it might act through mRNA destabilization affecting mRNA levels and translation.
doi:10.1093/molbev/msm100
PMCID: PMC2632937  PMID: 17522087
synonymous sites; nonsynonymous sites; positive selection; purifying selection; introns
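The comparison described in this abstract (synonymous divergence judged against intron divergence from the same gene) can be illustrated with a simple resampling test. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses raw counts of differences rather than a corrected Ks, treats the number of synonymous sites as an integer, and draws intron columns with replacement instead of the paper's sampling scheme designed to mimic coding-sequence properties. The function name `intron_null_test` and its parameters are hypothetical.

```python
import random
from typing import Tuple


def intron_null_test(
    intron_a: str,
    intron_b: str,
    n_syn_sites: int,
    syn_diffs: int,
    n_resamples: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Hypothetical resampling test: synonymous divergence vs. intron divergence.

    Returns (p_low, p_high), one-sided empirical p-values that the observed
    number of synonymous differences is lower (consistent with purifying
    selection) or higher (consistent with positive selection) than expected
    from intron sites of the same gene.
    """
    if len(intron_a) != len(intron_b):
        raise ValueError("intron sequences must be aligned to the same length")
    # Keep only gap-free alignment columns; record whether each column differs.
    diffs = [a != b for a, b in zip(intron_a, intron_b) if a != "-" and b != "-"]
    if not diffs:
        raise ValueError("no gap-free intron columns to sample")

    rng = random.Random(seed)
    n_low = n_high = 0
    for _ in range(n_resamples):
        # Draw as many intron columns as there are synonymous sites (with replacement)
        # and count how many of them differ between the two species.
        d = sum(rng.choices(diffs, k=n_syn_sites))
        if d <= syn_diffs:
            n_low += 1
        if d >= syn_diffs:
            n_high += 1
    # Add-one correction so that empirical p-values are never exactly zero.
    return (n_low + 1) / (n_resamples + 1), (n_high + 1) / (n_resamples + 1)


if __name__ == "__main__":
    intron_mouse = "ACGT" * 25                   # hypothetical aligned intron, 100 columns
    intron_rat = "ACGA" * 10 + "ACGT" * 15       # differs at 10 of the 100 columns
    print(intron_null_test(intron_mouse, intron_rat, n_syn_sites=100, syn_diffs=40))
```

With 10% intron divergence, 40 differences among 100 synonymous sites would fall far in the upper tail, so `p_high` is small; a gene with almost no synonymous differences would instead give a small `p_low`.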
22.  Excavating past population structures by surname-based sampling: the genetic legacy of the Vikings in northwest England 
Molecular biology and evolution  2007;25(2):301-309.
The genetic structures of past human populations are obscured by recent migrations and expansions, and can be observed only indirectly by inference from modern samples. However, the unique link between a heritable cultural marker, the patrilineal surname, and a genetic marker, the Y chromosome, provides a means to target sets of modern individuals that might resemble populations at the time of surname establishment. As a test case, we studied samples from the Wirral peninsula and West Lancashire, in northwest England. Place names and archaeology show clear evidence of a past Viking presence, but heavy immigration and population growth since the Industrial Revolution are likely to have weakened the genetic signal of a thousand-year-old Scandinavian contribution. Samples ascertained on the basis of two generations of residence were compared with independent samples based on known ancestry in the region, plus the possession of a surname known from historical records to have been present there in medieval times. The Y-chromosomal haplotypes of these two sets of samples are significantly different, and in admixture analyses the surname-ascertained samples show markedly greater Scandinavian ancestry proportions, supporting the idea that northwest England was once heavily populated by Scandinavian settlers. The method of historical surname-based ascertainment promises to allow investigation of the influence of migration and drift over the last few centuries in changing the population structure of Britain, and will have general utility in other regions where surnames are patrilineal and suitable historical records survive.
doi:10.1093/molbev/msm255
PMCID: PMC2628767  PMID: 18032405
Human; Y chromosome; surnames; population; Vikings; admixture
23.  Inferring Selection on Amino Acid Preference in Protein Domains 
Molecular Biology and Evolution  2008;26(3):527-536.
Models that explicitly account for the effect of selection on new mutations have been proposed to explain "codon bias," the excess of "preferred" codons that results from selection for translational efficiency and/or accuracy. In principle, such models can be applied to any mutation that results in a preferred allele, but in most cases the fitness effect of a specific mutation cannot be predicted. Here we show that it is possible to assign preferred and unpreferred states to amino acid changing mutations that occur in protein domains. We propose that mutations that lead to more common amino acids (at a given position in a domain) can be considered "preferred alleles," just as synonymous mutations leading to codons for more abundant tRNAs are. We use genome-scale polymorphism data to show that alleles for preferred amino acids in protein domains occur at higher frequencies in the population, as has been shown for preferred codons. We show that this effect is quantitative, such that there is a correlation between the shift in frequency of preferred alleles and the predicted fitness effect. As expected, we also observe a reduction in the numbers of polymorphisms and substitutions at more important positions in domains, consistent with stronger selection at those positions. We examine the derived allele frequency distribution and polymorphism to divergence ratios of preferred and unpreferred differences and find evidence for both negative and positive selection acting to maintain protein domains in the human population. Finally, we analyze a model for selection on amino acid preferences in protein domains and find that it is consistent with the quantitative effects that we observe.
doi:10.1093/molbev/msn286
PMCID: PMC2716081  PMID: 19095755
weakly selected variants; protein domains; polymorphism; McDonald–Kreitman test; deleterious; advantageous
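The classification proposed in this abstract (a change toward the more common amino acid at a domain position counts as "preferred") can be sketched as follows. Assumptions, all hypothetical: the position is given as a string of residues from one column of a domain alignment, gaps and non-standard characters are ignored, and the pseudocount and log-ratio score are illustrative stand-ins for a predicted fitness effect, not the measure used in the paper.

```python
import math
from collections import Counter
from typing import Dict, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_frequencies(column: str, pseudocount: float = 1.0) -> Dict[str, float]:
    """Smoothed amino-acid frequencies for one domain-alignment column (gaps ignored)."""
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def classify_change(column: str, ancestral: str, derived: str) -> Tuple[str, float]:
    """Label an ancestral -> derived substitution as preferred, unpreferred, or neutral.

    The score log(f_derived / f_ancestral) is a hypothetical proxy for the
    fitness effect: positive when the derived residue is more common at this
    domain position, negative when it is rarer.
    """
    freqs = residue_frequencies(column)
    score = math.log(freqs[derived.upper()] / freqs[ancestral.upper()])
    if score > 0:
        label = "preferred"
    elif score < 0:
        label = "unpreferred"
    else:
        label = "neutral"
    return label, score


if __name__ == "__main__":
    column = "LLLLLLIV-"  # hypothetical domain position dominated by Leu
    print(classify_change(column, ancestral="I", derived="L"))
    print(classify_change(column, ancestral="L", derived="I"))
```

A log ratio is used here only because it is symmetric: reversing a substitution flips the sign of the score, matching the idea that the reverse of a preferred change is unpreferred.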
24.  Early Vertebrate Evolution of the TATA-Binding Protein, TBP 
Molecular biology and evolution  2003;20(11):1932-1939.
TBP functions in transcription initiation in all eukaryotes and in Archaebacteria. Although the 181–amino acid (aa) carboxyl (C-) terminal core of the protein is highly conserved, TBP proteins from different phyla exhibit diverse sequences in their amino (N-) terminal region. In mice, the TBP N-terminus plays a role in protecting the placenta from maternal rejection; however, the presence of similar TBP N-termini in nontherian tetrapods suggests that this domain also has more primitive functions. To gain insights into the pretherian functions of the N-terminus, we investigated its phylogenetic distribution. TBP cDNAs were isolated from representative nontetrapod jawed vertebrates (zebrafish and shark), from more primitive jawless vertebrates (lamprey and hagfish), and from a prevertebrate cephalochordate (amphioxus). Results showed that the tetrapod N-terminus likely arose coincident with the earliest vertebrates. The primary structures of vertebrate N-termini indicate that, historically, this domain has undergone intragenic duplication and modification of short oligopeptide-encoding DNA sequences, which might have provided a mechanism for de novo evolution of this polypeptide.
doi:10.1093/molbev/msg205
PMCID: PMC2577151  PMID: 12885957
transcription; TFIID; cyclostome; minisatellite duplication; polypeptide genesis
25.  A Unified Model Explaining the Offsets of Overlapping and Near-Overlapping Prokaryotic Genes 
Molecular biology and evolution  2007;24(9):2091-2098.
Overlapping genes are a common phenomenon. Among sequenced prokaryotes, more than 29% of all annotated genes overlap at least 1 of their 2 flanking genes. We present a unified model for the creation and repair of overlaps among adjacent genes where the 3′ ends either overlap or nearly overlap. Our model, derived from a comprehensive analysis of complete prokaryotic genomes in GenBank, explains the nonuniform distribution of the lengths of such overlap regions far more simply than previously proposed models. Specifically, we explain the distribution of overlap lengths based on random extensions of genes to the next occurring downstream stop codon. Our model also explains a pattern, first reported here, in the distribution of the separation distances of closely spaced nonoverlapping genes. We provide evidence that this biased distribution of separation distances is driven by the same phenomenon that creates the uneven distribution of overlap lengths. This suggests a dynamic picture of continual overlap creation and elimination.
doi:10.1093/molbev/msm145
PMCID: PMC2429982  PMID: 17642473
genome analysis; gene finding; overlapping genes; prokaryotes
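The mechanism at the heart of this model, a gene extending to the next in-frame stop codon downstream, can be simulated in a few lines. This is a hedged sketch, not the authors' analysis: it assumes the original stop codon has been lost, that the downstream sequence is random with equal and independent base frequencies, and that the next gene starts at a fixed, hypothetical offset; the function names are also hypothetical. Under these assumptions each codon is a stop with probability 3/64, so extension lengths are geometric with a mean of 64 nucleotides, and because the extended gene always ends a whole number of codons past its old end, every resulting offset for a given start position falls in a single residue class modulo 3.

```python
import random
from collections import Counter
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def extension_length(downstream: str) -> Optional[int]:
    """Nucleotides read (including the new stop codon) until the first in-frame stop.

    `downstream` begins immediately after the lost stop codon, in the gene's frame.
    Returns None if no in-frame stop codon occurs.
    """
    for i in range(0, len(downstream) - 2, 3):
        if downstream[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


def simulate_offsets(next_gene_start: int, n_trials: int = 5_000,
                     seq_len: int = 600, seed: int = 0) -> Counter:
    """Tally (end of extended gene) - (start of next gene) over random sequences.

    Positions are measured from the end of the lost stop codon. Positive values
    are overlap lengths; zero or negative values mean the genes do not overlap.
    """
    rng = random.Random(seed)
    offsets: Counter = Counter()
    for _ in range(n_trials):
        seq = "".join(rng.choices("ACGT", k=seq_len))
        end = extension_length(seq)
        if end is not None:
            offsets[end - next_gene_start] += 1
    return offsets


if __name__ == "__main__":
    counts = simulate_offsets(next_gene_start=20)
    overlaps = {k: v for k, v in counts.items() if k > 0}
    print(sorted(overlaps.items())[:10])
```

A real analysis would use the actual downstream sequence, the codon usage of the genome, and the observed start positions of neighboring genes; the uniform random sequence here only isolates the random-extension mechanism.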