Related Articles

1.  Molecular evolution of the hyperthermophilic archaea of the Pyrococcus genus: analysis of adaptation to different environmental conditions 
BMC Genomics  2009;10:639.
Prokaryotic microorganisms are able to survive and proliferate in severe environmental conditions. The increasing number of complete sequences of prokaryotic genomes has provided the basis for studying the molecular mechanisms of their adaptation at the genomic level. We apply here a computer-based approach to compare the genomes and proteomes from P. furiosus, P. horikoshii, and P. abyssi to identify features of their molecular evolution related to adaptation strategy to diverse environmental conditions.
Phylogenetic analysis of rRNA genes from 26 Pyrococcus strains suggested that the divergence of P. furiosus, P. horikoshii and P. abyssi might have occurred from ancestral deep-sea organisms. It was demonstrated that the function of genes that have been subject to positive Darwinian selection is closely related to abiotic and biotic conditions to which archaea managed to become adapted. Divergence of the P. furiosus archaea might have been due to loss of some genes involved in cell motility or signal transduction, and/or to evolution under positive selection of the genes for translation machinery. In the course of P. horikoshii divergence, positive selection was found to operate mainly on the transcription machinery; divergence of P. abyssi was associated with positive selection for the genes mainly involved in inorganic ion transport. Analysis of radical amino acid replacement rate in evolving P. furiosus, P. horikoshii and P. abyssi showed that the fixation rate was higher for radical substitutions relative to the volume of amino acid side-chain.
The current results give due credit to the important role of hydrostatic pressure as a cause of variability in the P. furiosus, P. horikoshii and P. abyssi genomes evolving in different habitats. Nevertheless, adaptation to pressure does not appear to be the sole factor ensuring adaptation to environment. For example, at the stage of the divergence of P. horikoshii and P. abyssi, an essential evolutionary role may be assigned to changes in the trophic chain, namely, acquisition of a consumer status at a high (P. horikoshii) or low level (P. abyssi).
PMCID: PMC2816203  PMID: 20042074
2.  Ecotype Diversity and Conversion in Photobacterium profundum Strains 
PLoS ONE  2014;9(5):e96953.
Photobacterium profundum is a cosmopolitan marine bacterium capable of growth at low temperature and high hydrostatic pressure. Multiple strains of P. profundum have been isolated from different depths of the ocean and display remarkable differences in their physiological responses to pressure. The genome sequence of the deep-sea piezopsychrophilic strain Photobacterium profundum SS9 has provided some clues regarding the genetic features required for growth in the deep sea. The sequenced genome of Photobacterium profundum strain 3TCK, a non-piezophilic strain isolated from a shallow-water environment, is now available and its analysis expands the identification of unique genomic features that correlate to environmental differences and define the Hutchinsonian niche of each strain. These differences range from variations in gene content to specific gene sequences under positive selection. Genome plasticity between Photobacterium bathytypes was investigated when strain 3TCK-specific genes involved in photorepair were introduced to SS9, demonstrating that horizontal gene transfer can provide a mechanism for rapid colonisation of new environments.
PMCID: PMC4019646  PMID: 24824441
3.  Laterally transferred elements and high pressure adaptation in Photobacterium profundum strains 
BMC Genomics  2005;6:122.
Oceans cover approximately 70% of the Earth's surface with an average depth of 3800 m and a pressure of 38 MPa, thus a large part of the biosphere is occupied by high pressure environments. Piezophilic (pressure-loving) organisms are adapted to deep-sea life and grow optimally at pressures higher than 0.1 MPa. To better understand high pressure adaptation from a genomic point of view three different Photobacterium profundum strains were compared. Using the sequenced piezophile P. profundum strain SS9 as a reference, microarray technology was used to identify the genomic regions missing in two other strains: a pressure adapted strain (named DSJ4) and a pressure-sensitive strain (named 3TCK). Finally, the transcriptome of SS9 grown under different pressure (28 MPa; 45 MPa) and temperature (4°C; 16°C) conditions was analyzed taking into consideration the differentially expressed genes belonging to the flexible gene pool.
These studies indicated the presence of a large flexible gene pool in SS9 characterized by various horizontally acquired elements. This was verified by extensive analysis of GC content, codon usage and genomic signature of the SS9 genome. 171 open reading frames (ORFs) were found to be specifically absent or highly divergent in the piezosensitive strain, but present in the two piezophilic strains. Among these genes, six were found to also be up-regulated by high pressure.
These data provide information on horizontal gene flow in the deep sea, provide additional details of P. profundum genome expression patterns and suggest genes which could perform critical functions for abyssal survival, including perhaps high pressure growth.
PMCID: PMC1239915  PMID: 16162277
4.  Strong Purifying Selection at Synonymous Sites in D. melanogaster 
PLoS Genetics  2013;9(5):e1003527.
Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in Drosophila melanogaster evolve under very strong selective constraint while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for those positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization generating canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupae, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the observed widespread constraint acting on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.
Author Summary
Synonymous mutations do not alter the sequence of amino acids encoded by the gene in which they occur. These synonymous mutations were thus long thought to have no effect on the function of the ensuing protein or the fitness of the organism. At four-fold degenerate sites, every possible mutation is synonymous. For this reason, they are often neglected as a possible source of important functional changes. Using a deep sampling of the variation within a population of the fruit fly Drosophila melanogaster, we show that, contrary to this expectation, 22% of synonymous mutations at four-fold degenerate sites are strongly deleterious to the point of absence in the Drosophila population. The underlying biological function disrupted by these mutations is unknown, but is not related to the forces generally believed to be the principal actors shaping the evolution of synonymous sites. Genes with many such possible deleterious synonymous mutations tend to be particularly functionally important, highly expressed, and often involved in key developmental pathways. Given that the observed functional importance of synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.
PMCID: PMC3667748  PMID: 23737754
5.  High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus 
BMC Genomics  2010;11:559.
Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity, we carried out a high-throughput sequence analysis of the freshly collected B. azoricus transcriptome, using gill tissue as the primary source of immune transcripts given its strategic role in filtering potentially infectious waterborne microorganisms from the surrounding water. Additionally, a substantial EST data set was produced, from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent", the first deep-sea vent animal transcriptome database based on 454 pyrosequencing technology.
A normalized cDNA library from gill tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino acid sequences, of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative Reverse Transcription-Polymerase Chain Reaction (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments.
We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels.
The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptation processes during post-capture long-term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism.
PMCID: PMC3091708  PMID: 20937131
6.  Correlated Evolution of Nearby Residues in Drosophilid Proteins 
PLoS Genetics  2011;7(2):e1001315.
Here we investigate the correlations between coding sequence substitutions as a function of their separation along the protein sequence. We consider both substitutions between the reference genomes of several Drosophilids as well as polymorphisms in a population sample of Zimbabwean Drosophila melanogaster. We find that amino acid substitutions are “clustered” along the protein sequence, that is, the frequency of additional substitutions is strongly enhanced within ≈10 residues of a first such substitution. No such clustering is observed for synonymous substitutions, supporting a “correlation length” associated with selection on proteins as the causative mechanism. Clustering is stronger between substitutions that arose in the same lineage than it is between substitutions that arose in different lineages. We consider several possible origins of clustering, concluding that epistasis (interactions between amino acids within a protein that affect function) and positional heterogeneity in the strength of purifying selection are primarily responsible. The role of epistasis is directly supported by the tendency of nearby substitutions that arose on the same lineage to preserve the total charge of the residues within the correlation length and by the preferential cosegregation of neighboring derived alleles in our population sample. We interpret the observed length scale of clustering as a statistical reflection of the functional locality (or modularity) of proteins: amino acids that are near each other on the protein backbone are more likely to contribute to, and collaborate toward, a common subfunction.
Author Summary
Genes are templates for proteins, yet evolutionary studies of genes and proteins often bear little resemblance. Analyses of gene evolution typically treat each codon independently, quantifying gene evolution by summing over the constituent codons. In contrast, studies of protein evolution generally incorporate protein structure and interactions between amino acids explicitly. We investigate correlations in the evolution of codons as a function of their distance from each other along the protein coding sequence. This approach is motivated by the expectation that codons near each other in sequence often encode amino acids belonging to the same functional unit. Consequently, these amino acids are more likely to interact and/or experience similar selective regimes, introducing correlation between the evolution of the underlying codons. We find codon evolution in Drosophilids to be correlated over a characteristic length scale of ≈10 codons. Specifically, the presence of a non-synonymous substitution substantially increases the probability of further such substitutions nearby, particularly within that lineage. Further analysis suggests both functional interactions between amino acids and correlation in the strength of selection contribute to this effect. These findings are relevant for understanding the relative importance of different modes of selection, and particularly the role of epistasis, in gene and protein evolution.
PMCID: PMC3044683  PMID: 21383965
7.  Superiority of a mechanistic codon substitution model even for protein sequences in phylogenetic analysis 
Nucleotide and amino acid substitution tendencies are characteristic of each species, organelle, and protein family. Hence, various empirical amino acid substitution rate matrices have needed to be estimated for phylogenetic analysis: JTT, WAG, and LG for nuclear proteins, mtREV for mitochondrial proteins, cpREV10 and cpREV64 for chloroplast-encoded proteins, and FLU for influenza proteins. On the other hand, in a mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the ratio of fixation depending on the type of amino acid replacement, mutation rates and the strength of selective constraint on amino acids can be tailored to each protein family with additional 11 parameters. As a result, in the evolutionary analysis of codon sequences it outperforms codon substitution models equivalent to empirical amino acid substitution matrices. Is it superior even for amino acid sequences, among which synonymous substitutions cannot be identified?
Nucleotide mutations are assumed to occur independently of codon positions but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene with a linear function of a given estimate of selective constraints, which were estimated by maximizing the likelihood of an empirical amino acid or codon substitution frequency matrix, each of JTT, WAG, LG, and KHG. It is shown that the mechanistic codon substitution model with the assumption of equal codon usage yields better values of Akaike and Bayesian information criteria for all three phylogenetic trees of mitochondrial, chloroplast, and influenza-A hemagglutinin proteins than the empirical amino acid substitution models with mtREV, cpREV64, and FLU, which were designed specifically for those protein families, respectively. The variation of selective constraint across sites fits the datasets significantly better than variable codon mutation rates, confirming that substitution rate variations across sites detected by amino acid substitution models are caused primarily by the variation of selective constraint against amino acid substitutions rather than the variation of codon mutation rate.
The mechanistic codon substitution model is superior to amino acid substitution models even in the evolutionary analysis of protein sequences.
PMCID: PMC4225520  PMID: 24256155
Amino acid substitution model; Empirical amino acid substitution rate matrix; Mechanistic codon substitution model; Structural constraints; Functional constraints; Selective constraints; Variable selective constraint across sites; Variable mutation rate across sites; multiple nucleotide change
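As a reading aid for entry 7, the verbal definition of the mechanistic codon substitution model can be written as a proportionality. The symbols are assumptions introduced here, not notation taken from the paper: R is the substitution rate from codon μ to codon ν, M is the codon mutation rate, and F is the ratio of fixation, which depends only on the pair of amino acids a(μ) and a(ν) that the two codons encode.

```latex
R_{\mu\nu} \;\propto\; M_{\mu\nu}\, F_{a(\mu)\,a(\nu)}, \qquad \mu \neq \nu
```

Under this reading, tailoring the model to a protein family means re-estimating the parameters behind M and F rather than replacing a whole empirical rate matrix.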
8.  Extreme Population Differences in the Human Zinc Transporter ZIP4 (SLC39A4) Are Explained by Positive Selection in Sub-Saharan Africa 
PLoS Genetics  2014;10(2):e1004128.
Extreme differences in allele frequency between West Africans and Eurasians were observed for a leucine-to-valine substitution (Leu372Val) in the human intestinal zinc uptake transporter, ZIP4, yet no further evidence was found for a selective sweep around the ZIP4 gene (SLC39A4). By interrogating allele frequencies in more than 100 diverse human populations and resequencing Neanderthal DNA, we confirmed the ancestral state of this locus and found a strong geographical gradient for the derived allele (Val372), with near fixation in West Africa. In extensive coalescent simulations, we show that the extreme differences in allele frequency, yet absence of a classical sweep signature, can be explained by the effect of a local recombination hotspot, together with directional selection favoring the Val372 allele in Sub-Saharan Africans. The possible functional effect of the Leu372Val substitution, together with two pathological mutations at the same codon (Leu372Pro and Leu372Arg) that cause acrodermatitis enteropathica (a disease phenotype characterized by extreme zinc deficiency), was investigated by transient overexpression of human ZIP4 protein in HeLa cells. Both acrodermatitis mutations cause absence of the ZIP4 transporter cell surface expression and nearly absent zinc uptake, while the Val372 variant displayed significantly reduced surface protein expression, reduced basal levels of intracellular zinc, and reduced zinc uptake in comparison with the Leu372 variant. We speculate that reduced zinc uptake by the ZIP4-derived Val372 isoform may act by starving certain pathogens of zinc, and hence may have been advantageous in Sub-Saharan Africa. Moreover, these functional results may indicate differences in zinc homeostasis among modern human populations with possible relevance for disease risk.
Author Summary
Zinc is an essential trace element with many biological functions in the body, whose concentrations are tightly regulated by different membrane transporters. Here we report an unusual case of positive natural selection for an amino acid replacement in the human intestinal zinc uptake transporter ZIP4. This substitution is recognized as one of the most strongly differentiated genome-wide polymorphisms among human populations. However, since the extreme population differentiation of this non-synonymous site was not accompanied by additional signatures of natural selection, it was unclear whether it was the result of genetic adaptation. Using computer simulations we demonstrate that such an unusual pattern can be explained by the effect of local recombination, together with positive selection in Sub-Saharan Africa. Moreover, we provide evidence to suggest functional differences between the two ZIP4 isoforms in terms of the transporter cell surface expression and zinc uptake. This result is the first genetic indication that zinc regulation may differ among modern human populations, a finding that may have implications for health research. Further, we speculate that reduced zinc uptake mediated by the derived variant may have been advantageous in Sub-Saharan Africa, possibly by reducing access of a geographically restricted pathogen to this micronutrient.
PMCID: PMC3930504  PMID: 24586184
9.  Properties of the Glucose Transport System in Some Deep-Sea Bacteria 
Many deep-sea bacteria are specifically adapted to flourish under the high hydrostatic pressures which exist in their natural environment. For better understanding of the physiology and biochemistry of these microorganisms, properties of the glucose transport systems in two barophilic isolates (PE-36, CNPT-3) and one psychrophilic marine bacterium (Vibrio marinus MP1) were studied. These bacteria use a phosphoenolpyruvate:sugar phosphotransferase system (PTS) for glucose transport, similar to that found in many members of the Vibrionaceae and Enterobacteriaceae. The system was highly specific for glucose and its nonmetabolizable analog, methyl α-glucoside (α-MG), and exhibited little affinity for other sugars tested. The temperature optimum for glucose phosphorylation in vitro was approximately 20°C. Membrane-bound PTS components of deep-sea bacteria were capable of enzymatically cross-reacting with the soluble PTS enzymes of Salmonella typhimurium, indicating functional similarities between the PTS systems of these organisms. In CNPT-3 and V. marinus, increased pressure had an inhibitory effect on α-MG uptake, to the greatest extent in V. marinus. Relative to atmospheric pressure, increased pressure stimulated sugar uptake in the barophilic isolate PE-36 considerably. Increased hydrostatic pressure inhibited in vitro phosphoenolpyruvate-dependent α-MG phosphorylation catalyzed by crude extracts of V. marinus and PE-36 but enhanced this activity in crude extracts of the barophile CNPT-3. Both of the pressure-adapted barophilic bacteria were capable of α-MG uptake at higher pressures than was the nonbarophilic psychrophile, V. marinus.
PMCID: PMC203701  PMID: 16347302
10.  The Deep-Sea Bacterium Photobacterium profundum SS9 Utilizes Separate Flagellar Systems for Swimming and Swarming under High-Pressure Conditions 
Applied and Environmental Microbiology  2008;74(20):6298-6305.
Motility is a critical function needed for nutrient acquisition, biofilm formation, and the avoidance of harmful chemicals and predators. Flagellar motility is one of the most pressure-sensitive cellular processes in mesophilic bacteria; therefore, it is ecologically relevant to determine how deep-sea microbes have adapted their motility systems for functionality at depth. In this study, the motility of the deep-sea piezophilic bacterium Photobacterium profundum SS9 was investigated and compared with that of the related shallow-water piezosensitive strain Photobacterium profundum 3TCK, as well as that of the well-studied piezosensitive bacterium Escherichia coli. The SS9 genome contains two flagellar gene clusters: a polar flagellum gene cluster (PF) and a putative lateral flagellum gene cluster (LF). In-frame deletions were constructed in the two flagellin genes located within the PF cluster (flaA and flaC), the one flagellin gene located within the LF cluster (flaB), a component of a putative sodium-driven flagellar motor (motA2), and a component of a putative proton-driven flagellar motor (motA1). SS9 PF flaA, flaC, and motA2 mutants were defective in motility under all conditions tested. In contrast, the flaB and motA1 mutants were defective only under conditions of high pressure and high viscosity. flaB and motA1 gene expression was strongly induced by elevated pressure plus increased viscosity. Direct swimming velocity measurements were obtained using a high-pressure microscopic chamber, where increases in pressure resulted in a striking decrease in swimming velocity for E. coli and a gradual reduction for 3TCK which proceeded up to 120 MPa, while SS9 increased swimming velocity at 30 MPa and maintained motility up to a maximum pressure of 150 MPa. Our results indicate that P. profundum SS9 possesses two distinct flagellar systems, both of which have acquired dramatic adaptations for optimal functionality under high-pressure conditions.
PMCID: PMC2570297  PMID: 18723648
11.  Tubulin evolution in insects: gene duplication and subfunctionalization provide specialized isoforms in a functionally constrained gene family 
The completion of 19 insect genome sequencing projects spanning six insect orders provides the opportunity to investigate the evolution of important gene families, here tubulins. Tubulins are a family of eukaryotic structural genes that form microtubules, fundamental components of the cytoskeleton that mediate cell division, shape, motility, and intracellular trafficking. Previous in vivo studies in Drosophila find a stringent relationship between tubulin structure and function; small, biochemically similar changes in the major alpha 1 or testis-specific beta 2 tubulin protein render each unable to generate a motile spermtail axoneme. This has evolutionary implications: not a single non-synonymous substitution is found in beta 2 among 17 species of Drosophila and Hirtodrosophila flies spanning 60 Myr of evolution. This raises an important question: how do tubulins evolve while maintaining their function? To answer it, we use molecular evolutionary analyses to characterize the evolution of insect tubulins.
Sixty-six alpha tubulin and eighty-six beta tubulin gene copies were retrieved and subjected to molecular evolutionary analyses. Four ancient clades of alpha and beta tubulins are found in insects, a major isoform clade (alpha 1, beta 1) and three minor, tissue-specific clades (alpha 2-4, beta 2-4). Based on a Homarus americanus (lobster) outgroup, these were generated through gene duplication events on major beta and alpha tubulin ancestors, followed by subfunctionalization in expression domain. Strong purifying selection acts on all tubulins, yet maximum pairwise amino acid distances between tubulin paralogs are large (0.464 substitutions/site for beta tubulins, 0.707 for alpha tubulins). Conversely, orthologs, with the exception of reproductive tissue isoforms, show little sequence variation except in the last 15 carboxy terminus tail (CTT) residues, which serve as sites for post-translational modifications (PTMs) and interactions with microtubule-associated proteins. CTT residues overwhelmingly comprise the co-evolving residues between Drosophila alpha 2 and beta 3 tubulin proteins, indicating CTT specializations can be mediated at the level of the tubulin dimer. Gene duplications post-dating separation of the insect orders are unevenly distributed, most often appearing in major alpha 1 and minor beta 2 clades. More than 40 introns are found in tubulins. Their distribution among tubulins reveals that insertion and deletion events are common, which is surprising given their potential for disrupting tubulin coding sequence. Compensatory evolution is found in Drosophila beta 2 tubulin cis-regulation, and reveals selective pressures acting to maintain testis expression without the use of previously identified testis cis-regulatory elements.
Tubulins have stringent structure/function relationships, indicated by strong purifying selection, the loss of many gene duplication products, alpha-beta co-evolution in the tubulin dimer, and compensatory evolution in beta 2 tubulin cis-regulation. They evolve through gene duplication, subfunctionalization in expression domain and divergence of duplication products, largely in CTT residues that mediate interactions with other proteins. This has resulted in the tissue-specific minor insect isoforms, and in particular the highly diverse α3, α4, and β2 reproductive tissue-specific tubulin isoforms, illustrating that even a highly conserved protein family can participate in the adaptive process and respond to sexual selection.
PMCID: PMC2880298  PMID: 20423510
12.  Contrasted patterns of selective pressure in three recent paralogous gene pairs in the Medicago genus (L.) 
Gene duplications are a molecular mechanism potentially mediating generation of functional novelty. However, the probabilities of maintenance and functional divergence of duplicated genes are shaped by selective pressures acting on gene copies immediately after the duplication event. The ratio of non-synonymous to synonymous substitution rates in protein-coding sequences provides a means to investigate selective pressures based on genic sequences. Three molecular signatures can reveal early stages of functional divergence between gene copies: change in the level of purifying selection between paralogous genes, occurrence of positive selection, and transient relaxed purifying selection following gene duplication. We studied three pairs of genes that are known to be involved in an interaction with symbiotic bacteria and were recently duplicated in the history of the Medicago genus (Fabaceae). We sequenced two pairs of polygalacturonase genes (Pg11-Pg3 and Pg11a-Pg11c) and one pair of auxin transporter-like genes (Lax2-Lax4) in 17 species belonging to the Medicago genus, and looked for molecular signatures of differentiation between copies.
Selective histories revealed by these three signatures of molecular differentiation were found to be markedly different between each pair of paralogs. We found sites under positive selection in the Pg11 paralogs while Pg3 has mainly evolved under purifying selection. The most recent paralogs examined, Pg11a and Pg11c, are both undergoing positive selection and might be acquiring new functions. Lax2 and Lax4 paralogs are both under strong purifying selection, but still underwent a temporary relaxation of purifying selection immediately after duplication.
This study illustrates the variety of selective pressures undergone by duplicated genes and the effect of age of the duplication. We found that relaxation of selective constraints immediately after duplication might promote adaptive divergence.
PMCID: PMC3517903  PMID: 23025552
Duplication; Medicago; Neofunctionalization; Subfunctionalization; Paralogs evolution
13.  WSPMaker: a web tool for calculating selection pressure in proteins and domains using window-sliding 
BMC Bioinformatics  2008;9(Suppl 12):S13.
In the study of adaptive evolution, it is important to detect the protein coding sites where natural selection is acting. In general, the ratio of the rate of non-synonymous substitutions (Ka) to the rate of synonymous substitutions (Ks) is used to estimate either negative or positive selection for an entire gene region of interest. However, since each amino acid in a region has a different function and structure, the type and strength of natural selection may be different for each amino acid. Specifically, domain sites on the protein are indicative of structurally and functionally important sites. Therefore, window-sliding tools can be used to detect evolutionary forces acting on mutation sites.
This paper reports the development of a web-based tool, WSPMaker (Window-sliding Selection pressure Plot Maker), for calculating selection pressures (estimated by Ka/Ks) in the sub-regions of two protein-coding DNA sequences (CDSs). The program uses a sliding window on DNA with a user-defined window length. This enables the investigation of adaptive protein evolution and shows selective constraints of the overall/specific region(s) of two orthologous gene-coding DNA sequences. The method accommodates various evolutionary models and options such as the sliding window size. WSPMaker uses domain information from Pfam HMM models to detect highly conserved residues within orthologous proteins.
WSPMaker is a web tool for scanning and calculating selection pressures (estimated by Ka/Ks) in sub-regions of two protein-coding DNA sequences (CDSs).
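The window-sliding idea described above can be sketched in a few lines of Python. This is a simplified illustration, not WSPMaker's own code: it slides a window over two aligned, ungapped, uppercase CDSs and counts codon positions that differ synonymously or non-synonymously. The function name `window_counts` and its parameters are hypothetical, and the raw counts are not a true Ka/Ks estimate, which would also need per-site normalization and a substitution model.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def window_counts(cds_a, cds_b, window=3, step=1):
    """Return (start_codon, synonymous, nonsynonymous) for each window.

    Assumes both sequences are aligned, ungapped, uppercase, and in frame.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must have equal length divisible by 3")
    codons = [(cds_a[i:i + 3], cds_b[i:i + 3]) for i in range(0, len(cds_a), 3)]
    results = []
    for start in range(0, len(codons) - window + 1, step):
        syn = nonsyn = 0
        for codon_a, codon_b in codons[start:start + window]:
            if codon_a == codon_b:
                continue  # identical codons carry no substitution
            if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
                syn += 1  # same amino acid: synonymous difference
            else:
                nonsyn += 1  # different amino acid: non-synonymous difference
        results.append((start, syn, nonsyn))
    return results


if __name__ == "__main__":
    # Toy pair: Met-Ala-Lys versus Met-Ala-Arg (GCT/GCC is a silent change).
    for row in window_counts("ATGGCTAAA", "ATGGCCAGA", window=2):
        print(row)
```

Counting per window rather than per gene is what lets a locally elevated non-synonymous signal stand out even when the gene-wide ratio suggests purifying selection.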
PMCID: PMC2638153  PMID: 19091012
14.  Mining a database of single amplified genomes from Red Sea brine pool extremophiles—improving reliability of gene function prediction using a profile and pattern matching algorithm (PPMA) 
Reliable functional annotation of genomic data is the key step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO). Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO sequences consisting of 58 SAGs from six different taxa of bacteria and archaea was selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable if at least two relevant descriptors (GO-terms and/or consensus patterns) are present.
Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website.
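The reliability rule stated in the abstract (a hit counts as reliable when at least two relevant descriptors are present) might be sketched as follows. This is a hedged illustration, not the INDIGO scripts: the function name `reliable_genes`, the dictionary layout, and the gene and descriptor identifiers are placeholders invented for the example.

```python
def reliable_genes(profile_hits, pattern_hits, min_descriptors=2):
    """Keep genes supported by at least `min_descriptors` distinct descriptors.

    profile_hits: dict mapping gene ID -> set of GO-term IDs (profile filter)
    pattern_hits: dict mapping gene ID -> set of PROSITE IDs (pattern filter)
    Both layouts are assumptions made for this sketch.
    """
    reliable = {}
    for gene in set(profile_hits) | set(pattern_hits):
        # Pool descriptors from both filters; GO-terms and patterns count alike.
        descriptors = set(profile_hits.get(gene, ())) | set(pattern_hits.get(gene, ()))
        if len(descriptors) >= min_descriptors:
            reliable[gene] = sorted(descriptors)
    return reliable


if __name__ == "__main__":
    # Placeholder identifiers, not real annotations.
    profile = {"gene_1": {"GO:A"}, "gene_2": {"GO:A", "GO:B"}}
    pattern = {"gene_1": {"PS1"}, "gene_3": {"PS2"}}
    print(reliable_genes(profile, pattern))
```

Here `gene_3` is dropped because a single consensus pattern is its only support, which is the kind of weakly supported annotation the two-descriptor rule is meant to filter out.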
PMCID: PMC3985023  PMID: 24778629
bioinformatics; single amplified genomes; halophiles; extremophile; protein sequence consensus patterns; PROSITE IDs; GO-terms; functional genomics
15.  Frequent Toggling between Alternative Amino Acids Is Driven by Selection in HIV-1 
PLoS Pathogens  2008;4(12):e1000242.
Host immune responses against infectious pathogens exert strong selective pressures favouring the emergence of escape mutations that prevent immune recognition. Escape mutations within or flanking functionally conserved epitopes can occur at a significant cost to the pathogen in terms of its ability to replicate effectively. Such mutations come under selective pressure to revert to the wild type in hosts that do not mount an immune response against the epitope. Amino acid positions exhibiting this pattern of escape and reversion are of interest because they tend to coincide with immune responses that control pathogen replication effectively. We have used a probabilistic model of protein coding sequence evolution to detect sites in HIV-1 exhibiting a pattern of rapid escape and reversion. Our model is designed to detect sites that toggle between a wild type amino acid, which is susceptible to a specific immune response, and amino acids with lower replicative fitness that evade immune recognition. Through simulation, we show that this model has significantly greater power to detect selection involving immune escape and reversion than standard models of diversifying selection, which are sensitive to an overall increased rate of non-synonymous substitution. Applied to alignments of HIV-1 protein coding sequences, the model of immune escape and reversion detects a significantly greater number of adaptively evolving sites in env and nef. In all genes tested, the model provides a significantly better description of adaptively evolving sites than standard models of diversifying selection. Several of the sites detected are corroborated by association between Human Leukocyte Antigen (HLA) and viral sequence polymorphisms. Overall, there is evidence for a large number of sites in HIV-1 evolving under strong selective pressure, but exhibiting low sequence diversity. 
A phylogenetic model designed to detect rapid toggling between wild type and escape amino acids identifies a larger number of adaptively evolving sites in HIV-1, and can in some cases correctly identify the amino acid that is susceptible to the immune response.
Author Summary
Viruses, such as HIV, are able to evade host immune responses through escape mutations, yet sometimes they do so at a cost. This cost is the reduction in the ability of the virus to replicate, and thus selective pressure exists for a virus to revert to its original state in the absence of the host immune response that caused the initial escape mutation. This pattern of escape and reversion typically occurs when viruses are transmitted between individuals with different immune responses. We develop a phylogenetic model of immune escape and reversion and provide evidence that it outperforms existing models for the detection of selective pressure associated with host immune responses. Finally, we demonstrate that amino acid toggling is a pervasive process in HIV-1 evolution, such that many of the positions in the virus that evolve rapidly, under the influence of positive Darwinian selection, nonetheless display quite low sequence diversity. This highlights the limitations of HIV-1 evolution, and sites such as these are potentially good targets for HIV-1 vaccines.
PMCID: PMC2592544  PMID: 19096508
16.  Lineage-Specific Differences in the Amino Acid Substitution Process 
Journal of molecular biology  2010;396(5):1410-1421.
In Darwinian evolution, mutations occur approximately at random in a gene, turned into amino acid mutations by the genetic code. Some mutations are fixed to become substitutions and some are eliminated from the population. Partitioning pairs of closely related species with complete genome sequences by average population size of each pair, we looked at the substitution matrices generated for these partitions and compared the substitution patterns between species. We estimated a population genetic model that relates the relative fixation probabilities of different types of mutations to the selective pressure and population size. Parameterizations of the average and distribution of selective pressures for different amino acid substitution types in different population size comparisons were generated with a Bayesian framework. We found that partitions in population size as well as in substitution type are required to explain the substitution data. Selection coefficients were found to decrease with increasingly radical amino acid substitution and with increasing effective population size.
To further explore the role of underlying processes in amino acid substitution, we analyzed embryophyte (plant) gene families from TAED (The Adaptive Evolution Database), where solved structures for at least one member exist in the Protein Data Bank. Using PAML, we assigned branches to three categories: strong negative selection, moderate negative selection/neutrality, and positive diversifying selection. Focusing on the first and third categories, we identified sites changing along gene family lineages and observed the spatial patterns of substitution. Selective sweeps were expected to create primary sequence clustering under positive diversifying selection. Co-evolution through direct physical interaction was expected to cause tertiary structural clustering. Under both positive and negative selection, the substitution patterns were found to be nonrandom. Under positive diversifying selection, significant independent signals were found for primary and tertiary sequence clustering, suggesting roles for both selective sweeps and direct physical interaction. Under strong negative selection, the signals were not found to be independent. Altogether, a complex interplay of population genetic and protein thermodynamic forces is suggested.
PMCID: PMC2850115  PMID: 20004669
molecular evolution; protein structure; sequence–structure relationships; population genetics; selection
17.  Lophelia pertusa corals from the Ionian and Barents seas share identical nuclear ITS2 and near-identical mitochondrial genome sequences 
BMC Research Notes  2013;6:144.
Lophelia pertusa is a keystone cold-water coral species with a widespread distribution. Due to the lack of a mitochondrial marker variable enough for intraspecific analyses, the population structure of this species has only been studied using ITS and microsatellites so far. We therefore decided to sequence and compare complete mitochondrial genomes from two distant L. pertusa populations putatively isolated from each other (in the Barents Sea off Norway and in the Mediterranean Sea off Italy) in the hope of finding regions variable enough for population genetic and phylogeographic studies.
The mitogenomes of two L. pertusa individuals collected in the Mediterranean and Barents seas differed at only one position, which was a non-synonymous substitution, but comparison with another recently published L. pertusa mitochondrial genome sequence from Norway revealed 18 nucleotide differences. These included two synonymous and nine non-synonymous substitutions in protein-coding genes (dN/dS > 1): hence, the mitogenome of L. pertusa may be experiencing positive selection. To test for the presence of cryptic species, the mitochondrial control region and the nuclear ITS2 were sequenced for five individuals from each site: Italian and Norwegian populations turned out to share haplotypes of both markers, indicating that they belonged to the same species.
L. pertusa corals collected 7,500 km apart shared identical nuclear ITS2 and near-identical mitogenomes, supporting the hypothesis of a recent connection between Lophelia reefs in the Mediterranean and in the Northern Atlantic. Multi-locus or population genomic approaches will be required to shed further light on the genetic connectivity between L. pertusa reefs across Europe; nevertheless, ITS2 and the mitochondrial control region may be useful markers for investigating the phylogeography and species boundaries of the keystone genus Lophelia across its worldwide area of distribution.
PMCID: PMC3637110  PMID: 23578100
Mitogenomics; Control region; Internal transcribed spacer; Haploweb; Phylogeography; Mediterranean outflow water
18.  Development of a genetic system for the deep-sea psychrophilic bacterium Pseudoalteromonas sp. SM9913 
Pseudoalteromonas species are a group of marine gammaproteobacteria frequently found in deep-sea sediments, which may play important roles in deep-sea sediment ecosystem. Although genome sequence analysis of Pseudoalteromonas has revealed some specific features associated with adaptation to the extreme deep-sea environment, it is still difficult to study how Pseudoalteromonas adapt to the deep-sea environment due to the lack of a genetic manipulation system. The aim of this study is to develop a genetic system in the deep-sea sedimentary bacterium Pseudoalteromonas sp. SM9913, making it possible to perform gene mutation by homologous recombination.
The sensitivity of Pseudoalteromonas sp. SM9913 to antibiotics was investigated and the erythromycin resistance gene was chosen as the selective marker. A shuttle vector pOriT-4Em was constructed and transferred into Pseudoalteromonas sp. SM9913 through intergeneric conjugation with an efficiency of 1.8 × 10⁻³, which is high enough to perform the gene knockout assay. A suicide vector pMT was constructed using pOriT-4Em as the backbone vector and the sacB gene as the counterselective marker. The epsT gene encoding the UDP-glucose lipid carrier transferase was selected as the target gene for inactivation by in-frame deletion. The epsT gene was deleted in-frame using a two-step integration–segregation strategy after transferring the suicide vector pMT into Pseudoalteromonas sp. SM9913. The ΔepsT mutant showed an approximately 73% decrease in the yield of exopolysaccharides, indicating that epsT is an important gene involved in the EPS production of SM9913.
A conjugal transfer system was constructed in Pseudoalteromonas sp. SM9913 with a wide temperature range for selection and a high transfer efficiency, which will lay the foundation of genetic manipulation in this strain. The epsT gene of SM9913 was successfully deleted with no selective marker left in the chromosome of the host, which thus makes it possible to knock out other genes in the same host. The construction of a gene knockout system for Pseudoalteromonas sp. SM9913 will contribute to the understanding of the molecular mechanism of how Pseudoalteromonas adapt to the deep-sea environment.
PMCID: PMC3930924  PMID: 24450434
19.  Adaptive evolution of the matrix extracellular phosphoglycoprotein in mammals 
Matrix extracellular phosphoglycoprotein (MEPE) belongs to a family of small integrin-binding ligand N-linked glycoproteins (SIBLINGs) that play a key role in skeleton development, particularly in mineralization, phosphate regulation and osteogenesis. MEPE associated disorders cause various physiological effects, such as loss of bone mass, tumors and disruption of renal function (hypophosphatemia). The study of this developmental gene from an evolutionary perspective could provide valuable insights on the adaptive diversification of morphological phenotypes in vertebrates.
Here we studied the adaptive evolution of the MEPE gene in 26 Eutherian mammals and three birds. The comparative genomic analyses revealed a high degree of evolutionary conservation of some coding and non-coding regions of the MEPE gene across mammals indicating a possible regulatory or functional role likely related with mineralization and/or phosphate regulation. However, the majority of the coding region had a fast evolutionary rate, particularly within the largest exon (1467 bp). Rodentia and Scandentia had distinct substitution rates with an increased accumulation of both synonymous and non-synonymous mutations compared with other mammalian lineages. Characteristics of the gene (e.g. biochemical, evolutionary rate, and intronic conservation) differed greatly among lineages of the eight mammalian orders. We identified 20 sites with significant positive selection signatures (codon and protein level) outside the main regulatory motifs (dentonin and ASARM) suggestive of an adaptive role. Conversely, we find three sites under selection in the signal peptide and one in the ASARM motif that were supported by at least one selection model. The MEPE protein tends to accumulate amino acids promoting disorder and potential phosphorylation targets.
MEPE shows a high number of selection signatures, revealing the crucial role of positive selection in the evolution of this SIBLING member. The selection signatures were found mainly outside the functional motifs, reinforcing the idea that other regions outside the dentonin and the ASARM might be crucial for the function of the protein and future studies should be undertaken to understand its importance.
PMCID: PMC3250972  PMID: 22103247
20.  Mechanisms of wavelength tuning in the rod opsins of deep-sea fishes. 
The main object of this study was to investigate the molecular basis for changes in the spectral sensitivity of the visual pigments of deep-sea fishes. The four teleost species studied, Hoplostethus mediterraneus, Cataetyx laticeps, Gonostoma elongatum and Histiobranchus bathybius, are phylogenetically distant from each other and live at depths ranging from 500 to almost 5000 m. A single fragment of the intronless rod opsin gene was PCR-amplified from each fish and sequenced. The wavelength of peak sensitivity for the rod visual pigments of the four deep-sea species varies from 483 nm in H. mediterraneus and G. elongatum to 468 nm in C. laticeps. Six amino acids at sites on the inner face of the chromophore-binding pocket formed by the seven transmembrane α-helices are identified as candidates for spectral tuning. Substitutions at these sites involve either a change of charge, or a gain or loss of a hydroxyl group. Two of these, at positions 83 and 292, are consistently substituted in the visual pigments of all four species and are likely to be responsible for the shortwave sensitivity of the pigments. Shifts to wavelengths shorter than 480 nm may involve substitution at one or more of the remaining four sites. None of the modifications found in the derived sequences of these opsins suggest functional adaptations, such as increased content of hydroxyl-bearing or proline residues, to resist denaturation by the elevated hydrostatic pressures of the deep sea. Phylogenetic evidence for the duplication of the rod opsin gene in the Anguilliform lineage is presented.
PMCID: PMC1688238  PMID: 9061967
21.  The transcriptional landscape of the deep-sea bacterium Photobacterium profundum in both a toxR mutant and its parental strain 
BMC Genomics  2012;13:567.
The deep-sea bacterium Photobacterium profundum is an established model for studying high pressure adaptation. In this paper we analyse the parental strain DB110 and the toxR mutant TW30 by massively parallel cDNA sequencing (RNA-seq). ToxR is a transmembrane DNA-binding protein first discovered in Vibrio cholerae, where it regulates a considerable number of genes involved in environmental adaptation and virulence. In P. profundum the abundance and activity of this protein is influenced by hydrostatic pressure and its role is related to the regulation of genes in a pressure-dependent manner.
To better characterize the ToxR regulon, we compared the expression profiles of wt and toxR strains in response to pressure changes. Our results revealed a complex expression pattern with a group of 22 genes having expression profiles similar to OmpH, an outer membrane protein transcribed in response to high hydrostatic pressure. Moreover, RNA-seq allowed a deep characterization of the transcriptional landscape that led to the identification of 460 putative small RNA genes and the detection of 298 previously unknown protein-coding genes. We were also able to perform a genome-wide prediction of operon structure, transcription start and termination sites, revealing an unexpectedly high number of genes (992) with large 5′-UTRs, long enough to harbour cis-regulatory RNA structures, suggesting a correlation between intergenic region size and UTR length.
This work led to a better understanding of high-pressure response in P. profundum. Furthermore, the high-resolution RNA-seq analysis revealed several unexpected features about transcriptional landscape and general mechanisms of controlling bacterial gene expression.
PMCID: PMC3505737  PMID: 23107454
High-pressure adaptation; Deep sea; Extremophile; Transcription; Operon; RNA-seq; UTR; Vibrionaceae; Photobacterium profundum; ToxR
22.  Low endemism, continued deep-shallow interchanges, and evidence for cosmopolitan distributions in free-living marine nematodes (order Enoplida) 
Nematodes represent the most abundant benthic metazoa in one of the largest habitats on earth, the deep sea. Characterizing major patterns of biodiversity within this dominant group is a critical step towards understanding evolutionary patterns across this vast ecosystem. The present study has aimed to place deep-sea nematode species into a phylogenetic framework, investigate relationships between shallow water and deep-sea taxa, and elucidate phylogeographic patterns amongst the deep-sea fauna.
Molecular data (18S and 28S rRNA) confirm a high diversity amongst deep-sea Enoplids. There is no evidence for endemic deep-sea lineages in Maximum Likelihood or Bayesian phylogenies, and Enoplids do not cluster according to depth or geographic location. Tree topologies suggest frequent interchanges between deep-sea and shallow water habitats, as well as a mixture of early radiations and more recently derived lineages amongst deep-sea taxa. This study also provides convincing evidence of cosmopolitan marine species, recovering a subset of Oncholaimid nematodes with identical gene sequences (18S, 28S and cox1) at trans-Atlantic sample sites.
The complex clade structures recovered within the Enoplida support a high global species richness for marine nematodes, with phylogeographic patterns suggesting the existence of closely related, globally distributed species complexes in the deep sea. True cosmopolitan species may additionally exist within this group, potentially driven by specific life history traits of Enoplids. Although this investigation aimed to intensively sample nematodes from the order Enoplida, specimens were only identified down to genus (at best) and our sampling regime focused on an infinitesimally small fraction of the deep-sea floor. Future nematode studies should incorporate an extended sample set covering a wide depth range (shelf, bathyal, and abyssal sites), utilize additional genetic loci (e.g. mtDNA) that are informative at the species level, and apply high-throughput sequencing methods to fully assay community diversity. Finally, further molecular studies are needed to determine whether phylogeographic patterns observed in Enoplids are common across other ubiquitous marine groups (e.g. Chromadorida, Monhysterida).
PMCID: PMC3022606  PMID: 21167065
23.  GroEL dependency affects codon usage—support for a critical role of misfolding in gene evolution 
Integrating genome-scale sequence, expression, structural and protein interaction data from E. coli we establish an interaction between chaperone (GroEL) dependency and optimal codon usage. Highly expressed sporadic substrates of GroEL employ more optimal codons than expected, show enrichment for optimal codons at structurally sensitive sites and greater conservation of codon optimality under conditions of relaxed purifying selection. We suggest that highly expressed genes cannot routinely utilize GroEL for error control so that codon usage has evolved to provide complementary error limitation, whereas obligate GroEL substrates experience relaxed selection on codon usage. Our results support a critical role of misfolding prevention in gene evolution.
Errors during gene expression are relatively commonplace, which has prompted speculations that many features of gene and genome anatomy and organization have evolved to reduce or mitigate such errors. One type of error that can be particularly costly occurs when the polypeptide chain that emerges from the ribosome fails to fold into its native structure. Some aberrantly folded proteins, exposing hydrophobic residues that would normally be buried, may begin to promiscuously interact with other proteins, become toxic to the cell and thus pose a substantial fitness concern (Gregersen et al, 2006).
In trans, molecular chaperones have long been recognized to play crucial roles in misfolding prevention and remedy. In cis, it has recently been suggested that the use of optimal codons limits mistranslation-induced protein misfolding (Drummond and Wilke, 2008). Evidence for the latter is centred on the argument that synonymous codons differ in their propensity to cause mistranslation. Translationally optimal codons, typically represented by more abundant cognate tRNAs (Duret, 2000), are thought less likely to cause ribosomal stalling and/or incorporation of the wrong amino acid.
Here, we suggest that the role, if any, of error limitation in cis can be revealed by studying its interaction with well-established error management systems in trans (chaperones). If codon usage does indeed play a tangible role in misfolding prevention, we would expect selection on codon identity to vary with the degree to which a protein can rely on other error control mechanisms, namely chaperones. We use the E. coli chaperonin GroEL as a model system to explore whether there is any interaction between optimal codon usage and chaperone dependency.
Kerner et al (2005) had previously determined GroEL substrates on a genome-wide scale. Based on enrichment in GroEL complexes the authors assigned ∼250 proteins to three classes reflecting GroEL dependency: class-I proteins, only a small fraction of which (<1%) associates with GroEL and which spontaneously regain some activity; class-II proteins, which only exhibit spontaneous refolding at more permissive temperatures and class-III proteins, which are obligate substrates of GroEL and largely fail to refold even under more benign conditions. Notably, although on average less abundant than class-I/II proteins (‘sporadic clients'), class-III proteins (‘obligate clients') occupy ∼80% of GroEL's capacity in vivo. Consequently, a higher proportion (∼100% versus ∼20% for class-II and ∼1% for class-I) of these proteins is routinely processed by the GroEL system.
We demonstrate that sporadic but not obligate clients of GroEL exhibit enhanced codon adaptation, carefully controlling for possible confounding factors, notably expression level and protein length (Figure 1). We also point out that genes that recently entered the E. coli genome via horizontal gene transfer will distort equilibrium analyses of codon usage in bacteria and should thus be routinely eliminated from analysis.
Building on earlier work by Zhou et al (2009), we further show that sporadic substrates are conspicuously enriched for optimal codons at structurally sensitive sites, consistent with more severe fitness implications of codon choice for these proteins.
Lastly, we reveal that codon optimality in sporadic clients is more highly conserved in S. dysenteriae. S. dysenteriae is an E. coli clone that has diverged relatively recently from the E. coli K12 strain and has adopted an intracellular lifestyle (Balbi et al, 2009). Concomitant with that lifestyle, Shigella has experienced a lower effective population size and therefore reduced efficiency of purifying selection. This has generated conditions where, overall, codon optimality has started to decay. However, when we followed the fate of ancestrally optimal codons at buried sites in the S. dysenteriae and E. coli K12 genomes, we found that a lower fraction of buried sites has lost codon optimality in sporadic substrates (Figure 4), again consistent with greater structural importance of codon choice in these substrates.
Based on these findings, we suggest the following explanation: As mentioned above, class-III substrates are defined not only by GroEL being critical for proper folding, but also by occupying most of GroEL's capacity (∼80%). With a high proportion of class-III protein passaged through the GroEL system, mistranslation errors in these proteins weigh less severely as GroEL can remedy at least some misfolding that ensues. In contrast, class-I and II genes are more highly expressed and cannot routinely rely on GroEL to rectify folding errors. Yet class-I/II proteins are clearly liable to misfold as testified by their sporadic association with GroEL. We argue that augmenting GroEL's capacity to address the misfolding propensity of these genes would be prohibitively costly to the organism and that, as an alternative strategy, these genes employ optimal codons to reduce the rate of misfolding error.
Our findings (a) reveal a cis–trans interaction between codon usage and chaperones in providing an integrated error management system, (b) provide independent evidence for a role of misfolding in shaping gene evolution and (c) suggest that the burden of deleterious mutations in long-term bottlenecking populations like that of the insect endosymbiont Buchnera not only comprises unfavourable amino-acid (Moran, 1996) but also synonymous substitutions.
It has recently been suggested that the use of optimal codons limits mistranslation-induced protein misfolding, yet evidence for this remains largely circumstantial. In contrast, molecular chaperones have long been recognized to play crucial roles in misfolding prevention and remedy. We propose that putative error limitation in cis can be elucidated by examining the interaction between codon usage and chaperoning processes. Using Escherichia coli as a model system, we find that codon optimality covaries with dependency on the chaperonin GroEL. Sporadic but not obligate substrates of GroEL exhibit higher average codon adaptation and are conspicuously enriched for optimal codons at structurally sensitive sites. Further, codon optimality of sporadic clients is more conserved in the E. coli clone Shigella dysenteriae. We suggest that highly expressed genes cannot routinely use GroEL for error control so that codon usage has evolved to provide complementary error limitation. These findings provide independent evidence for a role of misfolding in shaping gene evolution and highlight the need to co-characterize adaptations in cis and trans to unravel the workings of integrated molecular systems.
PMCID: PMC2824523  PMID: 20087338
codon bias; GroEL; misfolding
24.  Purifying Selection in Deeply Conserved Human Enhancers Is More Consistent than in Coding Sequences 
PLoS ONE  2014;9(7):e103357.
Comparison of polymorphism at synonymous and non-synonymous sites in protein-coding DNA can provide evidence for selective constraint. Non-coding DNA that forms part of the regulatory landscape presents more of a challenge since there is not such a clear-cut distinction between sites under stronger and weaker selective constraint. Here, we consider putative regulatory elements termed Conserved Non-coding Elements (CNEs) defined by their high level of sequence identity across all vertebrates. Some mutations in these regions have been implicated in developmental disorders; we analyse CNE polymorphism data to investigate whether such deleterious effects are widespread in humans. Single nucleotide variants from the HapMap and 1000 Genomes Projects were mapped across nearly 2000 CNEs. In the 1000 Genomes data we find a significant excess of rare derived alleles in CNEs relative to coding sequences; this pattern is absent in HapMap data, apparently obscured by ascertainment bias. The distribution of polymorphism within CNEs is not uniform; we could identify two categories of sites by exploiting deep vertebrate alignments: stretches that are non-variant, and those that have at least one substitution. The conserved category has fewer polymorphic sites and a greater excess of rare derived alleles, which can be explained by a large proportion of sites under strong purifying selection within humans – higher than that for non-synonymous sites in most protein coding regions, and comparable to that at the strongly conserved trans-dev genes. Conversely, the more evolutionarily labile CNE sites have an allele frequency distribution not significantly different from non-synonymous sites. Future studies should exploit genome-wide re-sequencing to obtain better coverage in selected non-coding regions, given the likelihood that mutations in evolutionarily conserved enhancer sequences are deleterious. 
Discovery pipelines should validate non-coding variants to aid in identifying causal and risk-enhancing variants in complex disorders, in contrast to the current focus on exome sequencing.
PMCID: PMC4111549  PMID: 25062004
25.  Substrate Specificity and Signal Transduction Pathways in the Glucose-Specific Enzyme II (EIIGlc) Component of the Escherichia coli Phosphotransferase System 
Journal of Bacteriology  2000;182(16):4437-4442.
Escherichia coli adapted to glucose-limited chemostats contained mutations in ptsG resulting in V12G, V12F, and G13C substitutions in glucose-specific enzyme II (EIIGlc) and resulting in increased transport of glucose and methyl-α-glucoside. The mutations also resulted in faster growth on mannose and glucosamine in a PtsG-dependent manner. By use of enhanced growth on glucosamine for selection, four further sites were identified where substitutions caused broadened substrate specificity (G176D, A288V, G320S, and P384R). The altered amino acids include residues previously identified as changing the uptake of ribose, fructose, and mannitol. The mutations belonged to two classes. First, at two sites, changes affected transmembrane residues (A288V and G320S), probably altering sugar selectivity directly. More remarkably, the five other specificity mutations affected residues unlikely to be in transmembrane segments and were additionally associated with increased ptsG transcription in the absence of glucose. Increased expression of wild-type EIIGlc was not by itself sufficient for growth with other sugars. A model is proposed in which the protein conformation determining sugar accessibility is linked to transcriptional signal transduction in EIIGlc. The conformation of EIIGlc elicited by either glucose transport in the wild-type protein or permanently altered conformation in the second category of mutants results in altered signal transduction and interaction with a regulator, probably Mlc, controlling the transcription of pts genes.
PMCID: PMC94614  PMID: 10913076
