Related Articles

1.  The Ortholog Conjecture Is Untestable by the Current Gene Ontology but Is Supported by RNA Sequencing Data 
PLoS Computational Biology  2012;8(11):e1002784.
The ortholog conjecture posits that orthologous genes are functionally more similar than paralogous genes. This conjecture is a cornerstone of phylogenomics and is used daily by both computational and experimental biologists in predicting, interpreting, and understanding gene functions. A recent study, however, challenged the ortholog conjecture on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data in human and mouse. It instead proposed that the functional similarity of homologous genes is primarily determined by the cellular context in which the genes act, explaining why a greater functional similarity of (within-species) paralogs than (between-species) orthologs was observed. Here we show that GO-based functional similarity between human and mouse orthologs, relative to that between paralogs, has been increasing in the last five years. Further, compared with paralogs, orthologs are less likely to be included in the same study, causing an underestimation in their functional similarity. A close examination of functional studies of homologs with identical protein sequences reveals experimental biases, annotation errors, and homology-based functional inferences that are labeled in GO as experimental. These problems and the temporary nature of the GO-based finding make the current GO inappropriate for testing the ortholog conjecture. RNA sequencing (RNA-Seq) is known to be superior to microarray for comparing the expressions of different genes or in different species. Our analysis of a large RNA-Seq dataset of multiple tissues from eight mammals and the chicken shows that the expression similarity between orthologs is significantly higher than that between within-species paralogs, supporting the ortholog conjecture and refuting the cellular context hypothesis for gene expression. 
We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.
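The expression comparison described in this abstract can be illustrated with a minimal sketch. It assumes that expression similarity is measured as the Pearson correlation between tissue expression profiles; the abstract does not state the measure the authors used, and the gene names and expression values below are invented for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def mean_similarity(pairs, profiles):
    """Average Pearson correlation over a list of gene pairs."""
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


# Hypothetical expression levels across four tissues (not real data).
profiles = {
    "human_A": [10, 50, 5, 1],
    "mouse_A": [12, 45, 6, 2],    # assumed ortholog of human_A
    "human_A2": [1, 5, 40, 30],   # assumed within-species paralog of human_A
}
ortholog_pairs = [("human_A", "mouse_A")]
paralog_pairs = [("human_A", "human_A2")]

print("orthologs:", mean_similarity(ortholog_pairs, profiles))
print("paralogs: ", mean_similarity(paralog_pairs, profiles))
```

In this toy data the ortholog pair keeps the same tissue pattern while the paralog has shifted to other tissues, which is the pattern the abstract reports on average for the RNA-Seq dataset.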
Author Summary
Today's exceedingly high speed of genome sequencing, compared with the generally slow pace of functional assay, means that the functions of most genes identified from genome sequences will be annotated only through computational prediction. The primary source of information for this prediction is the functions of orthologous genes in model organisms, because orthologs are widely believed to be functionally similar, especially when compared with paralogs. This belief, known as the ortholog conjecture, was recently challenged on the basis of experimentally derived Gene Ontology (GO) annotations and microarray gene expression data, because these data revealed greater functional and expressional similarities of paralogs than orthologs. Here we show that GO-based estimates of functional similarities are temporary and unreliable, due to experimental biases, annotation errors, and homology-based functional inferences that are incorrectly labeled as experimental in GO. RNA sequencing (RNA-Seq) is superior to microarray for comparing the expressions of different genes or in different species, and our analysis of a large RNA-Seq dataset provides strong support to the ortholog conjecture for gene expression. We conclude that the ortholog conjecture remains largely valid to the extent that it has been tested, but further scrutiny using more and better functional data is needed.
doi:10.1371/journal.pcbi.1002784
PMCID: PMC3510086  PMID: 23209392
2.  On the Use of Gene Ontology Annotations to Assess Functional Similarity among Orthologs and Paralogs: A Short Report 
PLoS Computational Biology  2012;8(2):e1002386.
A recent paper (Nehrt et al., PLoS Comput. Biol. 7:e1002073, 2011) has proposed a metric for the “functional similarity” between two genes that uses only the Gene Ontology (GO) annotations directly derived from published experimental results. Applying this metric, the authors concluded that paralogous genes within the mouse genome or the human genome are more functionally similar on average than orthologous genes between these genomes, an unexpected result with broad implications if true. We suggest, based on both theoretical and empirical considerations, that this proposed metric should not be interpreted as a functional similarity, and therefore cannot be used to support any conclusions about the “ortholog conjecture” (or, more properly, the “ortholog functional conservation hypothesis”). First, we reexamine the case studies presented by Nehrt et al. as examples of orthologs with divergent functions, and come to a very different conclusion: they actually exemplify how GO annotations for orthologous genes provide complementary information about conserved biological functions. We then show that there is a global ascertainment bias in the experiment-based GO annotations for human and mouse genes: particular types of experiments tend to be performed in different model organisms. We conclude that the reported statistical differences in annotations between pairs of orthologous genes do not reflect differences in biological function, but rather complementarity in experimental approaches. 
Our results underscore two general considerations for researchers proposing novel types of analysis based on the GO: 1) that GO annotations are often incomplete, potentially in a biased manner, and subject to an “open world assumption” (absence of an annotation does not imply absence of a function), and 2) that conclusions drawn from a novel, large-scale GO analysis should whenever possible be supported by careful, in-depth examination of examples, to help ensure the conclusions have a justifiable biological basis.
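The "open world assumption" and the ascertainment bias described above can be made concrete with a small sketch. It uses a Jaccard index over annotation sets as a stand-in similarity measure; this is an assumption for illustration and is not the metric of Nehrt et al. The term labels are placeholders, not real GO identifiers.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index between two sets of annotation terms."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        raise ValueError("both annotation sets are empty")
    return len(a & b) / len(a | b)


# Assume both orthologs truly share the same four functions (placeholder labels).
true_functions = {"TERM_A", "TERM_B", "TERM_C", "TERM_D"}

# Hypothetical ascertainment bias: different experiment types in each organism
# capture different, complementary subsets of the same underlying biology.
human_annotated = {"TERM_A", "TERM_B"}  # e.g. found by biochemical assays
mouse_annotated = {"TERM_C", "TERM_D"}  # e.g. found by knockout phenotypes

print("true similarity:     ", jaccard(true_functions, true_functions))
print("annotated similarity:", jaccard(human_annotated, mouse_annotated))
```

The annotated sets do not overlap even though the underlying functions are identical, so a set-overlap score reads complementary knowledge as disagreement. An absent annotation here means "not yet tested", not "function absent".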
Author Summary
Understanding gene function—how individual genes contribute to the biology of an organism at the molecular, cellular and organism levels—is one of the primary aims of biomedical research. It has been a longstanding tenet of model organism research that experimental knowledge obtained in one organism is often applicable to other organisms, particularly if the organisms share the relevant genes because they inherited them from their common ancestor. Nevertheless this tenet is, like any hypothesis, not beyond question. A recent paper has termed this hypothesis a “conjecture,” and performed a statistical analysis, the results of which were interpreted as evidence against the hypothesis. This statistical analysis relied on a computational representation of gene function, the Gene Ontology (GO). As representatives of the international consortium that produces the GO, we show how the apparent evidence against the “ortholog conjecture” can be better explained as an artifact of how molecular biology knowledge is accumulated. In short, a complementarity between knowledge obtained in mouse and human experimental systems was incorrectly interpreted as a disagreement. We discuss the proper interpretation of GO annotations and potential sources of bias, with an eye toward enhancing the informed use of the GO by the scientific community.
doi:10.1371/journal.pcbi.1002386
PMCID: PMC3280971  PMID: 22359495
3.  Testing the Ortholog Conjecture with Comparative Functional Genomic Data from Mammals 
PLoS Computational Biology  2011;7(6):e1002073.
A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the “ortholog conjecture”). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.
Author Summary
The use of model organisms in biological research rests upon the assumption that gene and protein functions discovered in one organism are likely to be the same or similar in another organism. Hence, the assumption that experiments in mouse will tell us about the function of genes in humans. A guiding principle in the assignment of function from one organism to another is that single-copy genes (“orthologs”) are statistically more likely to provide functional information than are multi-copy genes, whether in the same organism or different organisms. Here we have tested this idea by examining genes with known functions in human and mouse. Surprisingly, we find that multi-copy genes are equally or more likely to provide accurate functional information than are single-copy genes. Our results suggest that the organism itself plays at least as large a role in determining the function of genes as does the particular sequence of the gene alone. This insight will benefit the assignment of function to genes whose roles are not yet known by widening the pool of appropriate genes from which function can be inferred.
doi:10.1371/journal.pcbi.1002073
PMCID: PMC3111532  PMID: 21695233
4.  Complexity of Gene Expression Evolution after Duplication: Protein Dosage Rebalancing 
Ongoing debates about functional importance of gene duplications have been recently intensified by a heated discussion of the “ortholog conjecture” (OC). Under the OC, which is central to functional annotation of genomes, orthologous genes are functionally more similar than paralogous genes at the same level of sequence divergence. However, a recent study challenged the OC by reporting a greater functional similarity, in terms of gene ontology (GO) annotations and expression profiles, among within-species paralogs compared to orthologs. These findings were taken to indicate that functional similarity of homologous genes is primarily determined by the cellular context of the genes, rather than evolutionary history. Subsequent studies suggested that the OC appears to be generally valid when applied to mammalian evolution but the complete picture of evolution of gene expression also has to incorporate lineage-specific aspects of paralogy. The observed complexity of gene expression evolution after duplication can be explained through selection for gene dosage effect combined with the duplication-degeneration-complementation model. This paper discusses expression divergence of recent duplications occurring before functional divergence of proteins encoded by duplicate genes.
doi:10.1155/2014/516508
PMCID: PMC4150538  PMID: 25197576
5.  Domain architecture conservation in orthologs 
BMC Bioinformatics  2011;12:326.
Background
As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence.
To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.
Results
The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation.
The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.
Conclusions
On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.
doi:10.1186/1471-2105-12-326
PMCID: PMC3215765  PMID: 21819573
6.  Functional Evolution of Mammalian Odorant Receptors 
PLoS Genetics  2012;8(7):e1002821.
The mammalian odorant receptor (OR) repertoire is an attractive model to study evolution, because ORs have been subjected to rapid evolution between species, presumably caused by changes of the olfactory system to adapt to the environment. However, functional assessment of ORs in related species remains largely untested. Here we investigated the functional properties of primate and rodent ORs to determine how well evolutionary distance predicts functional characteristics. Using human and mouse ORs with previously identified ligands, we cloned 18 OR orthologs from chimpanzee and rhesus macaque and 17 mouse-rat orthologous pairs that are broadly representative of the OR repertoire. We functionally characterized the in vitro responses of ORs to a wide panel of odors and found similar ligand selectivity but dramatic differences in response magnitude. 87% of human-primate orthologs and 94% of mouse-rat orthologs showed differences in receptor potency (EC50) and/or efficacy (dynamic range) to an individual ligand. Notably dN/dS ratio, an indication of selective pressure during evolution, does not predict functional similarities between orthologs. Additionally, we found that orthologs responded to a common ligand 82% of the time, while human OR paralogs of the same subfamily responded to the common ligand only 33% of the time. Our results suggest that, while OR orthologs tend to show conserved ligand selectivity, their potency and/or efficacy dynamically change during evolution, even in closely related species. These functional changes in orthologs provide a platform for examining how the evolution of ORs can meet species-specific demands.
Author Summary
The mammalian odorant receptor repertoire has been subjected to significant gene duplication and gene loss between species, presumably to adapt to the environment of an organism. However, even in distantly related species, a clear orthologous relationship exists for many genes. While ligands have been identified for several ORs, many of these receptors remain uncharacterized, especially in species other than human and mouse. Due to this paucity of functional data, it is assumed that ORs with similar sequence share functional characteristics. Here we investigate the functional evolution of OR orthologs—genes related via speciation—and OR paralogs—genes related via a duplication event—to provide insight as to how this large gene family has evolved. We show that OR orthologs have similar ligand selectivity to a panel of odors but differ in response magnitude. Additionally, orthologs respond to a common ligand more often than human OR paralogs, but there are vast differences in the potency and efficacy of individual receptors. This result stresses the broad importance of combining evolutionary genomics and molecular biology approaches to study gene function.
doi:10.1371/journal.pgen.1002821
PMCID: PMC3395614  PMID: 22807691
7.  Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods 
PLoS Computational Biology  2009;5(1):e1000262.
Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically only assessed in terms of function conservation, despite the phylogeny-based original definition of Fitch. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required the mapping of millions of sequences, the handling of hundreds of millions of predicted pairs of orthologs, and the computation of tens of thousands of trees. In phylogenetic analysis or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest to build broad functional grouping, but the method is not specific enough for phylogenetic or detailed function analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.
Author Summary
The identification of orthologs, pairs of homologous genes in different species that started diverging through speciation events, is a central problem in genomics with applications in many research areas, including comparative genomics, phylogenetics, protein function annotation, and genome rearrangement. An increasing number of projects aim at inferring orthologs from complete genomes, but little is known about their relative accuracy or coverage. Because the exact evolutionary history of entire genomes remains largely unknown, predictions can only be validated indirectly, that is, in the context of the different applications of orthology. The few comparison studies published so far have assessed orthology exclusively from the expectation that orthologs have conserved protein function. In the present work, we introduce methodology to verify orthology in terms of phylogeny and perform a comprehensive comparison of nine leading ortholog inference projects and two methods using both phylogenetic and functional tests. The results show large variations among the different projects in terms of performances, which indicates that the choice of orthology database can have a strong impact on any downstream analysis.
doi:10.1371/journal.pcbi.1000262
PMCID: PMC2612752  PMID: 19148271
8.  Improving the specificity of high-throughput ortholog prediction 
BMC Bioinformatics  2006;7:270.
Background
Orthologs (genes that have diverged after a speciation event) tend to have similar function, and so their prediction has become an important component of comparative genomics and genome annotation. The gold standard phylogenetic analysis approach of comparing available organismal phylogeny to gene phylogeny is not easily automated for genome-wide analysis; therefore, ortholog prediction for large genome-scale datasets is typically performed using a reciprocal-best-BLAST-hits (RBH) approach. One problem with RBH is that it will incorrectly predict a paralog as an ortholog when incomplete genome sequences or gene loss is involved. In addition, there is an increasing interest in identifying orthologs most likely to have retained similar function.
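The reciprocal-best-hit idea, and the gene-loss failure mode mentioned above, can be sketched as follows. This is a simplified illustration and not the Ortholuge software or any BLAST wrapper: the input is assumed to be a dictionary of precomputed similarity scores, and the gene names and scores are invented.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) -> similarity score.
    On ties, the first pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical case: species B has lost b1, the true ortholog of a1,
# but retains b1p, an older paralog of the lost gene.
a_vs_b = {("a1", "b1p"): 180.0, ("a1", "b9"): 40.0}
b_vs_a = {("b1p", "a1"): 180.0, ("b9", "a1"): 40.0}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Because the true ortholog is missing, a1 and b1p are each other's best remaining hit, so the pair passes the reciprocal test even though the two genes are paralogs. This is the kind of false positive the abstract says RBH produces when gene loss or incomplete genome sequence is involved.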
Results
To address these issues, we present here a high-throughput computational method named Ortholuge that further evaluates previously predicted orthologs (including those predicted using an RBH-based approach) – identifying which orthologs most closely reflect species divergence and may more likely have similar function. Ortholuge analyzes phylogenetic distance ratios involving two comparison species and an outgroup species, noting cases where relative gene divergence is atypical. It also identifies some cases of gene duplication after species divergence. Through simulations of incomplete genome data/gene loss, we show that the vast majority of genes falsely predicted as orthologs by an RBH-based method can be identified. Ortholuge was then used to estimate the number of false-positives (predominantly paralogs) in selected RBH-predicted ortholog datasets, identifying approximately 10% paralogs in a eukaryotic data set (mouse-rat comparison) and 5% in a bacterial data set (Pseudomonas putida – Pseudomonas syringae species comparison). Higher quality (more precise) datasets of orthologs, which we term "ssd-orthologs" (supporting-species-divergence-orthologs), were also constructed. These datasets, as well as Ortholuge software that may be used to characterize other species' datasets, are available at (software under GNU General Public License).
Conclusion
The Ortholuge method reported here appears to significantly improve the specificity (precision) of high-throughput ortholog prediction for both bacterial and eukaryotic species. This method, and its associated software, will aid those performing various comparative genomics-based analyses, such as the prediction of conserved regulatory elements upstream of orthologous genes.
doi:10.1186/1471-2105-7-270
PMCID: PMC1524997  PMID: 16729895
9.  Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human 
PLoS Computational Biology  2006;2(9):e133.
Accurate predictions of orthology and paralogy relationships are necessary to infer human molecular function from experiments in model organisms. Previous genome-scale approaches to predicting these relationships have been limited by their use of protein similarity and their failure to take into account multiple splicing events and gene prediction errors. We have developed PhyOP, a new phylogenetic orthology prediction pipeline based on synonymous rate estimates, which accurately predicts orthology and paralogy relationships for transcripts, genes, exons, or genomic segments between closely related genomes. We were able to identify orthologue relationships to human genes for 93% of all dog genes from Ensembl. Among 1:1 orthologues, the alignments covered a median of 97.4% of protein sequences, and 92% of orthologues shared essentially identical gene structures. PhyOP accurately recapitulated genomic maps of conserved synteny. Benchmarking against predictions from Ensembl and Inparanoid showed that PhyOP is more accurate, especially in its predictions of paralogy. Nearly half (46%) of PhyOP paralogy predictions are unique. Using PhyOP to investigate orthologues and paralogues in the human and dog genomes, we found that the human assembly contains 3-fold more gene duplications than the dog. Species-specific duplicate genes, or “in-paralogues,” are generally shorter and have fewer exons than 1:1 orthologues, which is consistent with selective constraints and mutation biases based on the sizes of duplicated genes. In-paralogues have experienced elevated amino acid and synonymous nucleotide substitution rates. Duplicates possess similar biological functions for either the dog or human lineages. Having accounted for 2,954 likely pseudogenes and gene fragments, and after separating 346 erroneously merged genes, we estimated that the human genome encodes a minimum of 19,700 protein-coding genes, similar to the gene count of nematode worms. 
PhyOP is a fast and robust approach to orthology prediction that will be applicable to whole genomes from multiple closely related species. PhyOP will be particularly useful in predicting orthology for mammalian genomes that have been incompletely sequenced, and for large families of rapidly duplicating genes.
Synopsis
Biologists often exploit the evolutionary relationships between proteins in order to explain how their findings are relevant to the biology of other species, including Homo sapiens. The most natural way to define these relationships is to draw family trees showing, for example, which human protein is the counterpart (“orthologue”) of a protein in dog, and which human proteins have arisen by recent duplication of existing genes (“paralogues”). On a small-scale this is relatively straightforward, but it is difficult to do this automatically on a genome-wide scale. In this paper the authors describe a new approach to drawing a giant family tree of all proteins from humans and dogs. They show how this tree allows them to refine some protein predictions and discard others that are likely to be nonfunctional dead sequences. Family relationships can show how the dog and human genomes have been rearranged since their last common ancestor. In addition, they help to identify the proteins that are specific to either dog or human, and which contribute to these species' biological differences. Giant trees, drawn from this method, will help to associate the differences, duplications, and evolution of proteins in different mammals with their distinctive physiologies and behaviours.
doi:10.1371/journal.pcbi.0020133
PMCID: PMC1584324  PMID: 17009864
10.  Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants 
BMC Genomics  2008;9:183.
Background
Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations.
Results
We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions.
Conclusion
Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods.
doi:10.1186/1471-2164-9-183
PMCID: PMC2377279  PMID: 18426584
11.  COCO-CL: hierarchical clustering of homology relations based on evolutionary correlations 
Bioinformatics (Oxford, England)  2006;22(7):779-788.
Motivation
Determining orthology relations among genes across multiple genomes is an important problem in the post-genomic era. Identifying orthologous genes can not only help predict functional annotations for newly sequenced or poorly characterized genomes, but can also help predict new protein–protein interactions. Unfortunately, determining orthology relation through computational methods is not straightforward due to the presence of paralogs. Traditional approaches have relied on pairwise sequence comparisons to construct graphs, which were then partitioned into putative clusters of orthologous groups. These methods do not attempt to preserve the non-transitivity and hierarchic nature of the orthology relation.
Results
We propose a new method, COCO-CL, for hierarchical clustering of homology relations and identification of orthologous groups of genes. Unlike previous approaches, which are based on pairwise sequence comparisons, our method explores the correlation of evolutionary histories of individual genes in a more global context. COCO-CL can be used as a semi-independent method to delineate the orthology/paralogy relation for a refined set of homologous proteins obtained using a less-conservative clustering approach, or as a refiner that removes putative out-paralogs from clusters computed using a more inclusive approach. We analyze our clustering results manually, with support from literature and functional annotations. Since our orthology determination procedure does not employ a species tree to infer duplication events, it can be used in situations when the species tree is unknown or uncertain.
doi:10.1093/bioinformatics/btl009
PMCID: PMC1620014  PMID: 16434444
12.  Evolutionary relationships of Aurora kinases: Implications for model organism studies and the development of anti-cancer drugs 
Background
As key regulators of mitotic chromosome segregation, the Aurora family of serine/threonine kinases play an important role in cell division. Abnormalities in Aurora kinases have been strongly linked with cancer, which has led to the recent development of new classes of anti-cancer drugs that specifically target the ATP-binding domain of these kinases. From an evolutionary perspective, the species distribution of the Aurora kinase family is complex. Mammals uniquely have three Aurora kinases, Aurora-A, Aurora-B, and Aurora-C, while for other metazoans, including the frog, fruitfly and nematode, only Aurora-A and Aurora-B kinases are known. The fungi have a single Aurora-like homolog. Based on the tacit assumption of orthology to human counterparts, model organism studies have been central to the functional characterization of Aurora kinases. However, the ortholog and paralog relationships of these kinases across various species have not been rigorously examined. Here, we present comprehensive evolutionary analyses of the Aurora kinase family.
Results
Phylogenetic trees suggest that all three vertebrate Auroras evolved from a single urochordate ancestor. Specifically, Aurora-A is an orthologous lineage in cold-blooded vertebrates and mammals, while structurally similar Aurora-B and Aurora-C evolved more recently in mammals from a duplication of an ancestral Aurora-B/C gene found in cold-blooded vertebrates. All so-called Aurora-A and Aurora-B kinases of non-chordates are ancestral to the clade of chordate Auroras and, therefore, are not strictly orthologous to vertebrate counterparts. Comparisons of human Aurora-B and Aurora-C sequences to the resolved 3D structure of human Aurora-A lends further support to the evolutionary scenario that vertebrate Aurora-B and Aurora-C are closely related paralogs. Of the 26 residues lining the ATP-binding active site, only three were variant and all were specific to Aurora-A.
Conclusions
In this study, we found that invertebrate Aurora-A and Aurora-B kinases are highly divergent protein families from their chordate counterparts. Furthermore, while the Aurora-A family is ubiquitous among all vertebrates, the Aurora-B and Aurora-C families in humans arose from a gene duplication event in mammals. These findings show the importance of understanding evolutionary relationships in the interpretation and transference of knowledge from studies of model organism systems to human cellular biology. In addition, given the important role of Aurora kinases in cancer, evolutionary analysis and comparisons of ATP-binding domains suggest a rationale for designing dual action anti-tumor drugs that inhibit both Aurora-B and Aurora-C kinases.
doi:10.1186/1471-2148-4-39
PMCID: PMC524484  PMID: 15476560
13.  Molecular evolution of the polyamine oxidase gene family in Metazoa 
Background
Polyamine oxidase enzymes catalyze the oxidation of polyamines and acetylpolyamines. Since polyamines are basic regulators of cell growth and proliferation, their homeostasis is crucial for cell life. Members of the polyamine oxidase gene family have been identified in a wide variety of animals, including vertebrates, arthropods, nematodes, and placozoa, as well as in plants and fungi. Polyamine oxidases (PAOs) from yeast can oxidize spermine, N1-acetylspermine, and N1-acetylspermidine; in vertebrates, however, two different enzymes, namely spermine oxidase (SMO) and acetylpolyamine oxidase (APAO), specifically catalyze the oxidation of spermine and of N1-acetylspermine/N1-acetylspermidine, respectively. Little is known about the molecular evolutionary history of these enzymes. However, since the yeast PAO is able to catalyze the oxidation of both acetylated and non-acetylated polyamines, and in vertebrates these functions are carried out by two specialized polyamine oxidase subfamilies (APAO and SMO), it can be hypothesized that the former enzyme resembles an ancestral form from which the latter were derived.
Results
We analysed 36 SMO, 26 APAO, and 14 PAO homologue protein sequences from 54 taxa including various vertebrates and invertebrates. The analysis of the full-length sequences and the principal domains of vertebrate and invertebrate PAOs yielded consensus primary protein sequences for vertebrate SMOs and APAOs, and invertebrate PAOs. This analysis, coupled to molecular modeling techniques, also unveiled sequence regions that confer specific structural and functional properties, including substrate specificity, on the different PAO subfamilies. Molecular phylogenetic trees revealed a basal position of all the invertebrate PAO enzymes relative to vertebrate SMOs and APAOs. PAOs from insects constitute a monophyletic clade. Two PAO variants sampled in the amphioxus are basal to the dichotomy between two well-supported monophyletic clades including, respectively, all the SMOs and APAOs from vertebrates. The two vertebrate monophyletic clades clustered in a pattern strictly mirroring the organismal phylogeny of fishes, amphibians, reptiles, birds, and mammals. Evidence from comparative genomic analysis, structural evolution and functional divergence in a phylogenetic framework across Metazoa suggested an evolutionary scenario in which the ancestral PAO coding sequence, present in invertebrates as an orthologous gene, was duplicated in the vertebrate branch to originate the paralogous SMO and APAO genes. A further genome evolution event concerns the SMO gene of placental, but not marsupial and monotreme, mammals, which increased its functional variation through an alternative splicing (AS) mechanism.
Conclusions
In this study, the explicit integration in a phylogenomic framework of phylogenetic tree construction, structure prediction, and biochemical function data/prediction allowed us to infer the molecular evolutionary history of the PAO gene family and to disambiguate paralogous genes related by a duplication event (SMO and APAO) from orthologous genes related by speciation events (PAOs, SMOs/APAOs). Further, while in vertebrates experimental data corroborate SMO and APAO molecular function predictions, in invertebrates the finding of a supported phylogenetic cluster of insect PAOs and the co-occurrence of two PAO variants in the amphioxus call for future structure-function studies.
doi:10.1186/1471-2148-12-90
PMCID: PMC3517346  PMID: 22716069
14.  The evolution of protostome GATA factors: Molecular phylogenetics, synteny, and intron/exon structure reveal orthologous relationships 
Background
Invertebrate and vertebrate GATA transcription factors play important roles in ectoderm and mesendoderm development, as well as in cardiovascular and blood cell fate specification. However, the assignment of evolutionarily conserved roles to GATA homologs requires a detailed framework of orthologous relationships. Although two distinct classes, GATA123 and GATA456, have been unambiguously recognized among deuterostome GATA genes, it has been difficult to resolve exact orthologous relationships among protostome homologs. Protostome GATA genes are often present in multiple copies within any one genome, and rapidly evolving gene sequences have obscured orthology among arthropod and nematode GATA homologs. In addition, a lack of taxonomic sampling has prevented a stepwise reconstruction of protostome GATA gene family evolution.
Results
We have identified the complete GATA complement (53 genes) from a diverse sampling of protostome genomes, including six arthropods, three lophotrochozoans, and two nematodes. Reciprocal best hit BLAST analysis suggested orthology of these GATA genes to either the ancestral bilaterian GATA123 or the GATA456 class. Using molecular phylogenetic analyses of gene sequences, together with conserved synteny and comparisons of intron/exon structure, we inferred the evolutionary relationships among these 53 protostome GATA homologs. In particular, we resolved the orthology and evolutionary birth order of all arthropod GATA homologs including the highly divergent Drosophila GATA genes.
Conclusion
Our combined analyses confirm that all protostome GATA transcription factor genes are members of either the GATA123 or GATA456 class, and indicate that there have been multiple protostome-specific duplications of GATA456 homologs. Three GATA456 genes exhibit linkage in multiple protostome species, suggesting that this gene cluster arose by tandem duplications from an ancestral GATA456 gene. Within arthropods this GATA456 cluster appears orthologous and widely conserved. Furthermore, the intron/exon structures of the arthropod GATA456 orthologs suggest a distinct order of gene duplication events. At present, however, the evolutionary relationship to similarly linked GATA456 paralogs in lophotrochozoans remains unclear. Our study shows how sampling of additional genomic data, especially from less derived and interspersed protostome taxa, can be used to resolve the orthologous relationships within more divergent gene families.
doi:10.1186/1471-2148-8-112
PMCID: PMC2383905  PMID: 18412965
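The reciprocal best hit criterion mentioned in the Results above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the score dictionaries, the gene names (`a1`, `b1`, ...), and the function names are hypothetical, and a real analysis would parse BLAST output rather than use hand-written scores.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` is a dict {(query, subject): score}. This layout is an
    assumption made for illustration, not a BLAST output format.
    On ties, the first pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: genome A searched against genome B, and the reverse.
a_vs_b = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b2"): 120.0,
    ("a3", "b2"): 200.0,
}
b_vs_a = {
    ("b1", "a1"): 245.0,
    ("b2", "a3"): 198.0,
    ("b2", "a2"): 118.0,
}

rbh_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh_pairs)
```

In this toy input, `a2` hits `b2` best, but `b2` prefers `a3`, so only the mutually best pairs are reported. This is why the criterion tends to be conservative when several paralogs are present.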
15.  Gene-oriented ortholog database: a functional comparison platform for orthologous loci 
The accumulation of complete genomic sequences increases the need for functional annotation. Transferring existing functional annotation between orthologs can speed up the annotation process and can even be used to check existing annotation. However, current protein sequence-based ortholog databases provide ambiguous and incomplete orthology in eukaryotes. This is because isoforms derived by alternative splicing (AS) often share high sequence similarity, which interferes with sequence-based identification. Gene-Oriented Ortholog Database (GOOD) employs genomic locations of transcripts to cluster AS-derived isoforms prior to ortholog delineation to eliminate the interference from AS. In the gene-oriented presentation, isoforms can be clearly associated with their genes to provide comprehensive ortholog information and can further be discriminated from paralogs. In addition, displaying clusters of isoforms between orthologous genes can present evolutionary variation at the transcription level. Based on orthology, GOOD additionally incorporates functional annotation from the Gene Ontology (GO) database. However, the GO database contains redundant annotations, in which both parent and child terms are assigned to the same gene. This makes it difficult to compare term counts precisely between orthologous genes annotated with redundant terms. Rather than giving term descriptions only, GOOD also provides GO graphs to reveal the hierarchical relationships among divergent functionalities. Therefore, the redundancy of GO terms can be examined, and the context among compared terms is more comprehensive. In sum, GOOD can improve the interpretation of molecular function from experiments in model organisms and provide clear comparative genomic annotation across organisms.
Database URL: http://goods.ibms.sinica.edu.tw/goods/
doi:10.1093/database/baq002
PMCID: PMC2860896  PMID: 20428317
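The redundancy problem described above, where a parent and a child GO term are both assigned to the same gene, can be made concrete with a small Python sketch. It is an illustration only and not how GOOD works (the abstract says GOOD displays GO graphs rather than filtering terms). The toy hierarchy, the `parents` mapping, and the function names are assumptions.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` by walking the parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific_terms(annotated, parents):
    """Drop any annotated term that is an ancestor of another annotated term."""
    redundant = set()
    for term in annotated:
        redundant |= ancestors(term, parents)
    return set(annotated) - redundant


# Toy hierarchy (child -> list of parents), assumed for illustration only.
parents = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "protein binding": ["binding"],
    "binding": ["molecular function"],
}

# A gene annotated with both a child term and its parent.
annotated = {"kinase activity", "catalytic activity", "protein binding"}
specific = most_specific_terms(annotated, parents)
print(sorted(specific))
```

Counting only the most specific terms is one possible way to make term counts comparable between orthologs; inspecting the graph, as GOOD does, keeps the context that such a filter discards.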
16.  Orthology Inference in Nonmodel Organisms Using Transcriptomes and Low-Coverage Genomes: Improving Accuracy and Matrix Occupancy for Phylogenomics 
Molecular Biology and Evolution  2014;31(11):3081-3092.
Orthology inference is central to phylogenomic analyses. Phylogenomic data sets commonly include transcriptomes and low-coverage genomes that are incomplete and contain errors and isoforms. These properties can severely violate the underlying assumptions of orthology inference with existing heuristics. We present a procedure that uses phylogenies for both homology and orthology assignment. The procedure first uses similarity scores to infer putative homologs that are then aligned, constructed into phylogenies, and pruned of spurious branches caused by deep paralogs, misassembly, frameshifts, or recombination. These final homologs are then used to identify orthologs. We explore four alternative tree-based orthology inference approaches, of which two are new. These accommodate gene and genome duplications as well as gene tree discordance. We demonstrate these methods in three published data sets including the grape family, Hymenoptera, and millipedes with divergence times ranging from approximately 100 to over 400 Ma. The procedure significantly increased the completeness and accuracy of the inferred homologs and orthologs. We also found that data sets that are more recently diverged and/or include more high-coverage genomes had more complete sets of orthologs. To explicitly evaluate sources of conflicting phylogenetic signals, we applied serial jackknife analyses of gene regions keeping each locus intact. The methods described here can scale to over 100 taxa. They have been implemented in Python with independent scripts for each step, making it easy to modify or incorporate them into existing pipelines. All scripts are available from https://bitbucket.org/yangya/phylogenomic_dataset_construction.
doi:10.1093/molbev/msu245
PMCID: PMC4209138  PMID: 25158799
Diplopoda; phylotranscriptomics; RNA-seq; Vitaceae
17.  Phyletic Profiling with Cliques of Orthologs Is Enhanced by Signatures of Paralogy Relationships 
PLoS Computational Biology  2013;9(1):e1002852.
New microbial genomes are sequenced at a high pace, allowing insight into the genetics of not only cultured microbes, but a wide range of metagenomic collections such as the human microbiome. To understand the deluge of genomic data we face, computational approaches for gene functional annotation are invaluable. We introduce a novel model for computational annotation that refines two established concepts: annotation based on homology and annotation based on phyletic profiling. The phyletic profiling-based model that includes both inferred orthologs and paralogs—homologs separated by a speciation and a duplication event, respectively—provides more annotations at the same average Precision than the model that includes only inferred orthologs. For experimental validation, we selected 38 poorly annotated Escherichia coli genes for which the model assigned one of three GO terms with high confidence: involvement in DNA repair, protein translation, or cell wall synthesis. Results of antibiotic stress survival assays on E. coli knockout mutants showed high agreement with our model's estimates of accuracy: out of 38 predictions obtained at the reported Precision of 60%, we confirmed 25 predictions, indicating that our confidence estimates can be used to make informed decisions on experimental validation. Our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time. Our predictions for 998 prokaryotic genomes include ∼400000 specific annotations with the estimated Precision of 90%, ∼19000 of which are highly specific—e.g. “penicillin binding,” “tRNA aminoacylation for protein translation,” or “pathogenesis”—and are freely available at http://gorbi.irb.hr/.
Author Summary
While both the number and the diversity of sequenced prokaryotic genomes grow rapidly, the number of specific assignments of gene functions in the databases remains low and skewed toward the model prokaryote Escherichia coli. To aid in understanding the full set of newly sequenced genes, we created a computational model for assignment of function to prokaryotic genomes. The result is an innovative framework for orthology and paralogy-aware phyletic profiling that provides a large number of computational annotations with high predictive accuracy in train/test evaluations. Our predictions include annotations for 1.3 million genes with the estimated Precision of 90%; these, and many more predictions for 998 prokaryotic genomes are freely available at http://gorbi.irb.hr/. More importantly, we show a proof of principle that our functional annotation model can be used to generate new biological hypotheses: we performed experiments on 38 E. coli knockout mutants and showed that our annotation model provides realistic estimates of predictive accuracy. With this, our work will contribute to making experimental validation of computational predictions more approachable, both in cost and time.
doi:10.1371/journal.pcbi.1002852
PMCID: PMC3536626  PMID: 23308060
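The validation figures quoted above (25 of 38 predictions confirmed, against a reported Precision of 60%) can be compared with a few lines of Python. This is only a worked arithmetic check on the numbers in the abstract; the variable names are illustrative.

```python
# Numbers taken from the abstract above.
confirmed = 25             # predictions confirmed experimentally
tested = 38                # predictions tested
reported_precision = 0.60  # Precision estimated by the model

# Observed fraction of confirmed predictions.
observed_precision = confirmed / tested

# Number of confirmations expected if the estimate were exact.
expected_confirmed = reported_precision * tested

print(f"observed precision: {observed_precision:.3f}")
print(f"expected confirmations at 60%: {expected_confirmed:.1f}")
```

The observed fraction (about 0.66) is slightly above the reported 60%, in line with the abstract's statement that the experiments agreed well with the model's accuracy estimates.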
18.  A phylogenomic gene cluster resource: the Phylogenetically Inferred Groups (PhIGs) database 
BMC Bioinformatics  2006;7:201.
Background
We present here the PhIGs database, a phylogenomic resource for sequenced genomes. Although many methods exist for clustering gene families, very few attempt to create truly orthologous clusters sharing descent from a single ancestral gene across a range of evolutionary depths. Although these non-phylogenetic gene family clusters have been used broadly for gene annotation, errors are known to be introduced by the artifactual association of slowly evolving paralogs and lack of annotation for those more rapidly evolving. A full phylogenetic framework is necessary for accurate inference of function and for many studies that address pattern and mechanism of the evolution of the genome. The automated generation of evolutionary gene clusters, creation of gene trees, determination of orthology and paralogy relationships, and the correlation of this information with gene annotations, expression information, and genomic context is an important resource to the scientific community.
Discussion
The PhIGs database currently contains 23 completely sequenced genomes of fungi and metazoans, containing 409,653 genes that have been grouped into 42,645 gene clusters. Each gene cluster is built such that the gene sequence distances are consistent with the known organismal relationships, thereby maximizing the likelihood that the clusters represent truly orthologous genes. The PhIGs website contains tools that allow the study of genes within their phylogenetic framework through keyword searches on annotations, such as GO and InterPro assignments, and sequence similarity searches by BLAST and HMM. In addition to displaying the evolutionary relationships of the genes in each cluster, the website also allows users to view the relative physical positions of homologous genes in specified sets of genomes.
Summary
Accurate analyses of genes and genomes can only be done within their full phylogenetic context. The PhIGs database and corresponding website address this problem for the scientific community. Our goal is to expand the content as more genomes are sequenced and use this framework to incorporate more analyses.
doi:10.1186/1471-2105-7-201
PMCID: PMC1523372  PMID: 16608522
19.  RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs 
BMC Bioinformatics  2002;3:14.
Background
When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication). The utility of phylogenetic information in high-throughput genome annotation ("phylogenomics") is widely recognized, but existing approaches are either manual or not explicitly based on phylogenetic trees.
Results
Here we present RIO (Resampled Inference of Orthologs), a procedure for automated phylogenomics using explicit phylogenetic inference. RIO analyses are performed over bootstrap resampled phylogenetic trees to estimate the reliability of orthology assignments. We also introduce supplementary concepts that are helpful for functional inference. RIO has been implemented as a Perl pipeline connecting several C and Java programs. It is available at http://www.genetics.wustl.edu/eddy/forester/. A web server is at http://www.rio.wustl.edu/. RIO was tested on the Arabidopsis thaliana and Caenorhabditis elegans proteomes.
Conclusion
The RIO procedure is particularly useful for the automated detection of first representatives of novel protein subfamilies. We also describe how some orthologies can be misleading for functional inference.
doi:10.1186/1471-2105-3-14
PMCID: PMC116988  PMID: 12028595
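The idea of estimating the reliability of an orthology assignment over bootstrap resampled trees, as described in the Results above, can be sketched in Python. This is not the RIO implementation: RIO infers orthology from each resampled tree itself, whereas this sketch assumes the per-tree orthology calls are already available. The sequence names and the data layout are hypothetical.

```python
from collections import Counter


def orthology_support(replicate_calls):
    """Fraction of bootstrap replicates in which each pair was called orthologous.

    `replicate_calls` is a list with one entry per resampled tree; each
    entry is a set of (query, subject) pairs called orthologous in that
    tree. This layout is an assumption made for illustration.
    """
    counts = Counter()
    for calls in replicate_calls:
        counts.update(calls)
    total = len(replicate_calls)
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical calls from four bootstrap replicates for one query "q".
replicates = [
    {("q", "x")},
    {("q", "x"), ("q", "y")},
    {("q", "x")},
    set(),
]

support = orthology_support(replicates)
print(support)
```

Here the pair `("q", "x")` is supported in three of four replicates and `("q", "y")` in one, so a user could rank candidate orthologs by support instead of relying on a single tree.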
20.  Selection in the evolution of gene duplications 
Genome Biology  2002;3(2):research0008.1-research0008.9.
Background
Gene duplications have a major role in the evolution of new biological functions. Theoretical studies often assume that a duplication per se is selectively neutral and that, following a duplication, one of the gene copies is freed from purifying (stabilizing) selection, which creates the potential for evolution of a new function.
Results
In search of systematic evidence of accelerated evolution after duplication, we used data from 26 bacterial, six archaeal, and seven eukaryotic genomes to compare the mode and strength of selection acting on recently duplicated genes (paralogs) and on similarly diverged, unduplicated orthologous genes in different species. We find that the ratio of nonsynonymous to synonymous substitutions (Kn/Ks) in most paralogous pairs is <<1 and that paralogs typically evolve at similar rates, without significant asymmetry, indicating that both paralogs produced by a duplication are subject to purifying selection. This selection is, however, substantially weaker than the purifying selection affecting unduplicated orthologs that have diverged to the same extent as the analyzed paralogs. Most of the recently duplicated genes appear to be involved in various forms of environmental response; in particular, many of them encode membrane and secreted proteins.
Conclusions
The results of this analysis indicate that recently duplicated paralogs evolve faster than orthologs with the same level of divergence and similar functions, but apparently do not experience a phase of neutral evolution. We hypothesize that gene duplications that persist in an evolving lineage are beneficial from the time of their origin, due primarily to a protein dosage effect in response to variable environmental conditions; duplications are likely to give rise to new functions at a later phase of their evolution once a higher level of divergence is reached.
PMCID: PMC65685  PMID: 11864370
21.  Evaluating Ortholog Prediction Algorithms in a Yeast Model Clade 
PLoS ONE  2011;6(4):e18755.
Background
Accurate identification of orthologs is crucial for evolutionary studies and for functional annotation. Several algorithms have been developed for ortholog delineation, but so far, manually curated genome-scale biological databases of orthologous genes for algorithm evaluation have been lacking. We evaluated four popular ortholog prediction algorithms (MultiParanoid; OrthoMCL; RBH: Reciprocal Best Hit; and RSD: Reciprocal Smallest Distance; the last two were extended into the clustering algorithms cRBH and cRSD, respectively, so that they can predict orthologs across multiple taxa) against a set of 2,723 groups of high-quality curated orthologs from 6 Saccharomycete yeasts in the Yeast Gene Order Browser.
Results
Examination of sensitivity [TP/(TP+FN)], specificity [TN/(TN+FP)], and accuracy [(TP+TN)/(TP+TN+FP+FN)] across a broad parameter range showed that cRBH was the most accurate and specific algorithm, whereas OrthoMCL was the most sensitive. Evaluation of the algorithms across a varying number of species showed that cRBH had the highest accuracy and lowest false discovery rate [FP/(FP+TP)], followed by cRSD. Of the six species in our set, three descended from an ancestor that underwent whole genome duplication. Subsequent differential duplicate loss events in the three descendants resulted in distinct classes of gene loss patterns, including cases where the genes retained in the three descendants are paralogs, constituting ‘traps’ for ortholog prediction algorithms. We found that the false discovery rate of all algorithms dramatically increased in these traps.
Conclusions
These results suggest that simple algorithms, like cRBH, may be better ortholog predictors than more complex ones (e.g., OrthoMCL and MultiParanoid) for evolutionary and functional genomics studies where the objective is the accurate inference of single-copy orthologs (e.g., molecular phylogenetics), but that all algorithms fail to accurately predict orthologs when paralogy is rampant.
doi:10.1371/journal.pone.0018755
PMCID: PMC3076445  PMID: 21533202
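The four evaluation measures defined in the Results above can be computed directly from confusion-matrix counts. The following Python sketch uses exactly the formulas given in the abstract; the counts in the example are invented for illustration and are not taken from the study, and the function assumes every denominator is non-zero.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute the measures defined in the abstract from raw counts.

    Assumes every denominator is non-zero; a production version would
    need to handle empty classes explicitly.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "false_discovery_rate": fp / (fp + tp),
    }


# Invented counts for a hypothetical ortholog predictor.
metrics = confusion_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.970 while the false discovery rate is 0.200, which shows how a predictor can look accurate overall when true negatives dominate and still report many false orthologs.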
22.  Functional diversification of sonic hedgehog paralog enhancers identified by phylogenomic reconstruction 
Genome Biology  2007;8(6):R106.
Investigation of the ar-C midline enhancer of sonic hedgehog orthologs and paralogs from distantly related vertebrate lineages identified lineage-specific motif changes; exchanging motifs between paralog enhancers resulted in the reversal of enhancer specificity.
Background
Cis-regulatory modules of developmental genes are targets of evolutionary changes that underlie the morphologic diversity of animals. Little is known about the 'grammar' of interactions between transcription factors and cis-regulatory modules and therefore about the molecular mechanisms that underlie changes in these modules, particularly after gene and genome duplications. We investigated the ar-C midline enhancer of sonic hedgehog (shh) orthologs and paralogs from distantly related vertebrate lineages, from fish to human, including the basal vertebrate Latimeria menadoensis.
Results
We demonstrate that the sonic hedgehog b (tiggy winkle hedgehog; shhb) genes of fishes, which are paralogs of sonic hedgehog a (shha), have a modified ar-C enhancer, which specifies a diverged function at the embryonic midline. We have identified several conserved motifs that are indicative of putative transcription factor binding sites by local alignment of ar-C enhancers of numerous vertebrate sequences. To trace the evolutionary changes among paralog enhancers, phylogenomic reconstruction was carried out and lineage-specific motif changes were identified. The relation between motif composition and observed developmental differences was evaluated through transgenic functional analyses. Altering and exchanging motifs between paralog enhancers resulted in reversal of enhancer specificity in the floor plate and notochord. A model reconstructing enhancer divergence during vertebrate evolution was developed.
Conclusion
Our model suggests that the identified motifs of the ar-C enhancer function as binary switches that are responsible for specific activity between midline tissues, and that these motifs are adjusted during functional diversification of paralogs. The unraveled motif changes can also account for the complex interpretation of activator and repressor input signals within a single enhancer.
doi:10.1186/gb-2007-8-6-r106
PMCID: PMC2394741  PMID: 17559649
23.  Correlated changes between regulatory cis elements and condition-specific expression in paralogous gene families 
Nucleic Acids Research  2009;38(3):738-749.
Gene duplication is integral to evolution, providing novel opportunities for organisms to diversify in function. One fundamental pathway of functional diversification among initially redundant gene copies, or paralogs, is via alterations in their expression patterns. Although the mechanisms underlying expression divergence are not completely understood, transcription factor binding sites and nucleosome occupancy are known to play a significant role in the process. Previous attempts to detect genomic variations mediating expression divergence in orthologs have had limited success for two primary reasons. First, it is inherently challenging to compare expressions among orthologs due to variable trans-acting effects and second, previous studies have quantified expression divergence in terms of an overall similarity of expression profiles across multiple samples, thereby obscuring condition-specific expression changes. Moreover, the inherently inter-correlated expressions among homologs present statistical challenges, not adequately addressed in many previous studies. Using rigorous statistical tests, here we characterize the relationship between cis element divergence and condition-specific expression divergence among paralogous genes in Saccharomyces cerevisiae. In particular, among all combinations of gene family and TFs analyzed, we found a significant correlation between TF binding and the condition-specific expression patterns in over 20% of the cases. In addition, incorporating nucleosome occupancy reveals several additional correlations. For instance, our results suggest that GAL4 binding plays a major role in the expression divergence of the genes in the sugar transporter family. Our work presents a novel means of investigating the cis regulatory changes potentially mediating expression divergence in paralogous gene families under specific conditions.
doi:10.1093/nar/gkp989
PMCID: PMC2817486  PMID: 19933262
24.  Complex patterns of divergence among green-sensitive (RH2a) African cichlid opsins revealed by Clade model analyses 
Background
Gene duplications play an important role in the evolution of functional protein diversity. Some models of duplicate gene evolution predict complex forms of paralog divergence; orthologous proteins may diverge as well, further complicating patterns of divergence among and within gene families. Consequently, studying the link between protein sequence evolution and duplication requires the use of flexible substitution models that can accommodate multiple shifts in selection across a phylogeny. Here, we employed a variety of codon substitution models, primarily Clade models, to explore how selective constraint evolved following the duplication of a green-sensitive (RH2a) visual pigment protein (opsin) in African cichlids. Past studies have linked opsin divergence to ecological and sexual divergence within the African cichlid adaptive radiation. Furthermore, biochemical and regulatory differences between the RH2aα and RH2aβ paralogs have been documented. It thus seems likely that selection varies in complex ways throughout this gene family.
Results
Clade model analysis of African cichlid RH2a opsins revealed a large increase in the nonsynonymous-to-synonymous substitution rate ratio (ω) following the duplication, as well as an even larger increase, one consistent with positive selection, for Lake Tanganyikan cichlid RH2aβ opsins. Analysis using the popular Branch-site models, by contrast, revealed no such alteration of constraint. Several amino acid sites known to influence spectral and non-spectral aspects of opsin biochemistry were found to be evolving divergently, suggesting that orthologous RH2a opsins may vary in terms of spectral sensitivity and response kinetics. Divergence appears to be occurring despite intronic gene conversion among the tandemly-arranged duplicates.
Conclusions
Our findings indicate that variation in selective constraint is associated with both gene duplication and divergence among orthologs in African cichlid RH2a opsins. At least some of this variation may reflect an adaptive response to differences in light environment. Interestingly, these patterns only became apparent through the use of Clade models, not through the use of the more widely employed Branch-site models; we suggest that this difference stems from the increased flexibility associated with Clade models. Our results thus bear both on studies of cichlid visual system evolution and on studies of gene family evolution in general.
doi:10.1186/1471-2148-12-206
PMCID: PMC3514295  PMID: 23078361
Codon substitution model; Visual pigment evolution; Nonsynonymous-to-synonymous substitution rate ratio; dN/dS; Clade model; Maximum likelihood; Gene family evolution
25.  Positive Darwinian selection is a driving force for the diversification of terpenoid biosynthesis in the genus Oryza 
BMC Plant Biology  2014;14(1):239.
Background
Terpenoids constitute the largest class of secondary metabolites made by plants and display vast chemical diversity among and within species. Terpene synthases (TPSs) are the pivotal enzymes for terpenoid biosynthesis that create the basic carbon skeletons of this class. Functional divergence of paralogous and orthologous TPS genes is a major mechanism for the diversification of terpenoid biosynthesis. However, little is known about the evolutionary forces that have shaped the evolution of plant TPS genes leading to terpenoid diversity.
Results
The orthologs of Oryza Terpene Synthase 1 (OryzaTPS1), a rice terpene synthase gene involved in indirect defense against insects in Oryza sativa, were cloned from six additional Oryza species. In vitro biochemical analysis showed that the enzymes encoded by these OryzaTPS1 genes functioned either as (E)-β-caryophyllene synthases (ECS), or (E)-β-caryophyllene & germacrene A synthases (EGS), or germacrene D & germacrene A synthases (DAS). Because the orthologs of OryzaTPS1 in maize and sorghum function as ECS, the ECS activity was inferred to be ancestral. Molecular evolutionary analysis detected the signature of positive Darwinian selection in five codon substitutions in the evolution from ECS to DAS. Homology-based structure modeling and the biochemical analysis of laboratory-generated protein variants validated the contribution of the five positively selected sites to functional divergence of OryzaTPS1. The changes in the in vitro product spectra of OryzaTPS1 proteins also correlated closely with the changes in in vivo blends of volatile terpenes released from insect-damaged rice plants.
Conclusions
In this study, we found that positive Darwinian selection is a driving force for the functional divergence of OryzaTPS1. This finding suggests that the diverged sesquiterpene blend produced by the Oryza species containing DAS may be adaptive, likely in the attraction of the natural enemies of insect herbivores.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0239-x) contains supplementary material, which is available to authorized users.
doi:10.1186/s12870-014-0239-x
PMCID: PMC4172859  PMID: 25224158
Plant secondary metabolism; Terpene synthase; Positive selection
