1.  ApoB-containing lipoproteins regulate angiogenesis by modulating expression of VEGF receptor 1 
Nature medicine  2012;18(6):967-973.
Despite the clear major contribution of hyperlipidemia to the prevalence of cardiovascular disease in the developed world, the direct effects of lipoproteins on endothelial cells have remained obscure and are under debate. Here we report a previously uncharacterized mechanism of vessel growth modulation by lipoprotein availability. Using a genetic screen for vascular defects in zebrafish, we initially identified a mutation, stalactite (stl), in the gene encoding microsomal triglyceride transfer protein (mtp), which is involved in the biosynthesis of apolipoprotein B (ApoB)-containing lipoproteins. By manipulating lipoprotein concentrations in zebrafish, we found that ApoB negatively regulates angiogenesis and that it is the ApoB protein particle, rather than lipid moieties within ApoB-containing lipoproteins, that is primarily responsible for this effect. Mechanistically, we identified downregulation of vascular endothelial growth factor receptor 1 (VEGFR1), which acts as a decoy receptor for VEGF, as a key mediator of the endothelial response to lipoproteins, and we observed VEGFR1 downregulation in hyperlipidemic mice. These findings may open new avenues for the treatment of lipoprotein-related vascular disorders.
doi:10.1038/nm.2759
PMCID: PMC3959651  PMID: 22581286
2.  Hidden Markov Models for Evolution and Comparative Genomics Analysis 
PLoS ONE  2013;8(6):e65012.
The problem of reconstruction of ancestral states given a phylogeny and data from extant species arises in a wide range of biological studies. The continuous-time Markov model for discrete-state evolution is generally used for the reconstruction of ancestral states. We modify this model to account for a case when the states of the extant species are uncertain. This situation appears, for example, if the states for extant species are predicted by some program and thus are known only with some level of reliability; this is common in bioinformatics. The main idea is to formulate the problem as a hidden Markov model on a tree (tree HMM, tHMM), where the basic continuous-time Markov model is expanded with the introduction of emission probabilities of observed data (e.g. prediction scores) for each underlying discrete state. Our tHMM decoding algorithm allows us to predict states at the ancestral nodes as well as to refine states at the leaves on the basis of quantitative comparative genomics. The test on the simulated data shows that the tHMM approach applied to the continuous variable reflecting the probabilities of the states (i.e. prediction score) appears to be more accurate than the reconstruction from the discrete states assignment defined by the best score threshold. We provide examples of applying our model to the evolutionary analysis of N-terminal signal peptides and transcription factor binding sites in bacteria. The program is freely available at http://bioinf.fbb.msu.ru/~nadya/tHMM and via web-service at http://bioinf.fbb.msu.ru/treehmmweb.
doi:10.1371/journal.pone.0065012
PMCID: PMC3676395  PMID: 23762278
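The idea of combining a continuous-time Markov model with emission probabilities at the leaves can be illustrated with a minimal sketch. This is not the authors' tHMM program: it assumes a two-state symmetric model with a single hypothetical `rate` parameter, a hypothetical dictionary layout for the tree, and emission probabilities supplied directly for each leaf. It computes only the posterior state distribution at the root by a standard upward (pruning) pass.

```python
import math


def transition_matrix(rate, t):
    """P(t) for a two-state symmetric continuous-time Markov model (assumed)."""
    same = 0.5 * (1.0 + math.exp(-2.0 * rate * t))
    return [[same, 1.0 - same], [1.0 - same, same]]


def partial_likelihood(node, rate):
    """Return [L(state 0), L(state 1)] for the subtree rooted at `node`.

    Hypothetical tree layout:
      leaf:          {"emission": [p0, p1]}, p_s = P(observed score | state s)
      internal node: {"children": [(child_node, branch_length), ...]}
    """
    if "emission" in node:
        return list(node["emission"])
    result = [1.0, 1.0]
    for child, t in node["children"]:
        child_l = partial_likelihood(child, rate)
        p = transition_matrix(rate, t)
        for s in (0, 1):
            result[s] *= sum(p[s][c] * child_l[c] for c in (0, 1))
    return result


def root_posterior(tree, rate, prior=(0.5, 0.5)):
    """Posterior probability of each hidden state at the root."""
    like = partial_likelihood(tree, rate)
    joint = [prior[s] * like[s] for s in (0, 1)]
    total = sum(joint)
    return [j / total for j in joint]


# Three leaves: two with scores that favour state 1, one that weakly favours state 0.
tree = {
    "children": [
        ({"emission": [0.1, 0.9]}, 0.1),
        ({"emission": [0.1, 0.9]}, 0.1),
        ({"emission": [0.6, 0.4]}, 0.1),
    ]
}
posterior = root_posterior(tree, rate=1.0)
```

Because the leaves contribute graded emission probabilities rather than hard 0/1 calls, a weakly supported leaf state shifts the root posterior less than a confidently predicted one.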
3.  Computational identification of functional introns: high positional conservation of introns that harbor RNA genes 
Nucleic Acids Research  2013;41(11):5604-5613.
An appreciable fraction of introns is thought to have some function, but there is no obvious way to predict which specific intron is likely to be functional. We hypothesize that functional introns experience a different selection regime than non-functional ones and will therefore show distinct evolutionary histories. In particular, we expect functional introns to be more resistant to loss, and that this would be reflected in high conservation of their position with respect to the coding sequence. To test this hypothesis, we focused on introns whose function comes about from microRNAs and snoRNAs that are embedded within their sequence. We built a data set of orthologous genes across 28 eukaryotic species, reconstructed the evolutionary histories of their introns and compared functional introns with the rest of the introns. We found that, indeed, the position of microRNA- and snoRNA-bearing introns is significantly more conserved. In addition, we found that both families of RNA genes settled within introns early during metazoan evolution. We identified several easily computable intronic properties that can be used to detect functional introns in general, thereby suggesting a new strategy to pinpoint non-coding cellular functions.
doi:10.1093/nar/gkt244
PMCID: PMC3675471  PMID: 23605046
4.  The Impact of Gene Duplication, Insertion, Deletion, Lateral Gene Transfer and Sequencing Error on Orthology Inference: A Simulation Study 
PLoS ONE  2013;8(2):e56925.
The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another.
Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
doi:10.1371/journal.pone.0056925
PMCID: PMC3581572  PMID: 23451112
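Of the generic approaches named in this abstract, bidirectional best hit is simple enough to sketch. The code below is an illustration only, not any of the benchmarked packages: the input format (a dictionary of pairwise alignment scores keyed by `(query, target)`) and all gene names are hypothetical, and ties are broken by whichever hit is encountered first.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target in the other genome."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical alignment scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b2"): 120}
b_vs_a = {("b1", "a1"): 198, ("b2", "a3"): 118, ("b2", "a2"): 91}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

Here `a2` is excluded because its best hit `b2` prefers `a3`, which shows why a one-sided best hit alone is not taken as evidence of orthology.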
5.  The Ecoresponsive Genome of Daphnia pulex 
Science (New York, N.Y.)  2011;331(6017):555-561.
We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 Mb and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than 1/3 of Daphnia’s genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The co-expansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes – including many additional loci within sequenced regions that are otherwise devoid of annotations – are the most responsive genes to ecological challenges.
doi:10.1126/science.1197761
PMCID: PMC3529199  PMID: 21292972
6.  De Novo Origin of Protein-Coding Genes in Murine Rodents 
PLoS ONE  2012;7(11):e48650.
Background
New genes in eukaryotes are created through a variety of different mechanisms. De novo origin from non-coding DNA is a mechanism that has recently gained attention. So far, de novo genes have been described in a handful of organisms, with Drosophila being the most extensively studied. We searched for genes that have appeared de novo in the mouse and rat lineages.
Methodology
Using a rigorous and conservative approach we identify 75 murine genes (69 mouse genes and 6 rat genes) for which there is good evidence of de novo origin since the divergence of mouse and rat. Each of these genes is only found in either the mouse or rat lineages, with no candidate orthologs nor evidence for potentially-unannotated orthologs in the other lineage. The veracity of each of these genes is supported by expression evidence. Additionally, their presence in one lineage and absence in the other cannot be explained by sequencing gaps. For 11 of the 75 candidate novel genes we could identify a mouse-specific mutation that led to the creation of the open reading frame (ORF) specifically in mouse. None of the six rat-specific genes had an unequivocal rat-specific mutation creating the ORF, which may at least be partly due to lower data quality for that genome.
Conclusions
All 75 candidate genes presented in this study are relatively small and encode short peptides. A large number of them (51 out of 69 mouse genes and 3 out of 6 rat genes) also overlap with other genes, either within introns, or on the opposite strand. These characteristics have previously been documented for de novo genes. The description of these genes opens up the opportunity to integrate this evolutionary analysis with the rich experimental data available for these two model organisms.
doi:10.1371/journal.pone.0048650
PMCID: PMC3504067  PMID: 23185269
7.  Dependencies among Editing Sites in Serotonin 2C Receptor mRNA 
PLoS Computational Biology  2012;8(9):e1002663.
The serotonin 2C receptor (5-HT2CR)–a key regulator of diverse neurological processes–exhibits functional variability derived from editing of its pre-mRNA by site-specific adenosine deamination (A-to-I pre-mRNA editing) in five distinct sites. Here we describe a statistical technique that was developed for analysis of the dependencies among the editing states of the five sites. The statistical significance of the observed correlations was estimated by comparing editing patterns in multiple individuals. For both human and rat 5-HT2CR, the editing states of the physically proximal sites A and B were found to be strongly dependent. In contrast, the editing states of sites C and D, which are also physically close, seem not to be directly dependent but instead are linked through the dependencies on sites A and B, respectively. We observed pronounced differences between the editing patterns in humans and rats: in humans site A is the key determinant of the editing state of the other sites, whereas in rats this role belongs to site B. The structure of the dependencies among the editing sites is notably simpler in rats than it is in humans, implying more complex regulation of 5-HT2CR editing and, by inference, function in the human brain. Thus, exhaustive statistical analysis of the 5-HT2CR editing patterns indicates that the editing state of sites A and B is the primary determinant of the editing states of the other three sites, and hence the overall editing pattern. Taken together, these findings allow us to propose a mechanistic model of concerted action of ADAR1 and ADAR2 in 5-HT2CR editing. The statistical approach developed here can be applied to other cases of interdependencies among modification sites in RNA and proteins.
Author Summary
The serotonin receptor 2C is a key regulator of diverse neurological processes that affect feeding behavior, sleep, sexual behavior, anxiety and depression. The function of the receptor itself is regulated via so-called pre-mRNA editing, i.e. site-specific adenosine deamination in five distinct sites. The greater the number of edited sites in the serotonin receptor mRNA, the lower the activity of the receptor it encodes. Here we used the results of extensive massively parallel sequencing from human and rat brains to elucidate the dependencies among the editing states of the five sites. Despite the apparent simplicity of the problem, disambiguation of these dependencies is a difficult task that required development of a new statistical technique. We employed this method to analyse the dependencies among editing in the 5 susceptible sites of the receptor mRNA and found that the proximal, juxtaposed sites A and B are strongly interdependent, and that the editing state of these two sites is a major determinant of the editing states of the other three sites, and hence the overall editing pattern. The statistical approach we developed for the analysis of mRNA editing can be applied to other cases of multiple site modification in RNA and proteins.
doi:10.1371/journal.pcbi.1002663
PMCID: PMC3435259  PMID: 22969417
8.  A Maximum Likelihood Method for Reconstruction of the Evolution of Eukaryotic Gene Structure 
Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, different large-scale studies disagree about even the most basic features of their evolution. In order to come up with a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. This model is rich in parameters, and estimating them accurately is infeasible by straightforward likelihood maximization. Thus, we have developed an expectation-maximization algorithm that allows for efficient maximization. Here, we outline the model and describe the expectation-maximization algorithm in detail. Since the method works with intron presence–absence maps, it is expected to be instrumental for the analysis of the evolution of other binary characters as well.
doi:10.1007/978-1-59745-243-4_16
PMCID: PMC3410445  PMID: 19381540
Maximum likelihood; expectation-maximization; intron evolution; ancestral reconstruction; eukaryotic gene structure
9.  A Model-Based Method for Gene Dependency Measurement 
PLoS ONE  2012;7(7):e40918.
Many computational methods have been widely used to identify transcription regulatory interactions based on gene expression profiles. The selection of a dependency measure is very important for successful regulatory network inference. In this paper, we develop a new method–DBoMM (Difference in BIC of Mixture Models)–for estimating the dependency between genes by fitting gene expression profiles to Gaussian mixture models. We show that DBoMM outperforms 4 other existing methods, including Kendall’s tau correlation (TAU), Pearson Correlation (COR), Euclidean distance (EUC) and Mutual information (MI) using Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster, Arabidopsis thaliana data and synthetic data. DBoMM can also identify condition-dependent regulatory interactions and is robust to noisy data. Of the 741 Escherichia coli regulatory interactions inferred by DBoMM at a 60% true positive rate, 65 are previously known interactions and 676 are novel predictions. To validate the new predictions, the promoter sequences of target genes regulated by the same transcription factors were analyzed and significant motifs were identified.
doi:10.1371/journal.pone.0040918
PMCID: PMC3400631  PMID: 22829898
10.  Origin and evolution of spliceosomal introns 
Biology Direct  2012;7:11.
Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’, held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section.
doi:10.1186/1745-6150-7-11
PMCID: PMC3488318  PMID: 22507701
Intron sliding; Intron gain; Intron loss; Spliceosome; Splicing signals; Evolution of exon/intron structure; Alternative splicing; Phylogenetic trees; Mobile domains; Eukaryotic ancestor
11.  The Function of Introns 
The intron–exon architecture of many eukaryotic genes raises the intriguing question of whether this unique organization serves any function or is simply a result of the spread of functionless introns in eukaryotic genomes. In this review, we show that introns in contemporary species fulfill a broad spectrum of functions, and are involved in virtually every step of mRNA processing. We propose that this great diversity of intronic functions supports the notion that introns were indeed selfish elements in early eukaryotes, but then independently gained numerous functions in different eukaryotic lineages. We suggest a novel criterion of evolutionary conservation, dubbed intron positional conservation, which can identify functional introns.
doi:10.3389/fgene.2012.00055
PMCID: PMC3325483  PMID: 22518112
intron function; gene architecture; intron–exon structure; intron positional conservation; expression regulation; non-coding RNAs; exon-junction complex; splicing
12.  Using Paleogenomics to Study the Evolution of Gene Families: Origin and Duplication History of the Relaxin Family Hormones and Their Receptors 
PLoS ONE  2012;7(3):e32923.
Recent progress in the analysis of whole genome sequencing data has resulted in the emergence of paleogenomics, a field devoted to the reconstruction of ancestral genomes. Ancestral karyotype reconstructions have been used primarily to illustrate the dynamic nature of genome evolution. In this paper, we demonstrate how they can also be used to study individual gene families by examining the evolutionary history of relaxin hormones (RLN/INSL) and relaxin family peptide receptors (RXFP). Relaxin family hormones are members of the insulin superfamily, and are implicated in the regulation of a variety of primarily reproductive and neuroendocrine processes. Their receptors are G-protein coupled receptors (GPCR's) and include members of two distinct evolutionary groups, an unusual characteristic. Although several studies have tried to elucidate the origins of the relaxin peptide family, the evolutionary origin of their receptors and the mechanisms driving the diversification of the RLN/INSL-RXFP signaling systems in non-placental vertebrates has remained elusive. Here we show that the numerous vertebrate RLN/INSL and RXFP genes are products of an ancestral receptor-ligand system that originally consisted of three genes, two of which apparently trace their origins to invertebrates. Subsequently, diversification of the system was driven primarily by whole genome duplications (WGD, 2R and 3R) followed by almost complete retention of the ligand duplicates in most vertebrates but massive loss of receptor genes in tetrapods. Interestingly, the majority of 3R duplicates retained in teleosts are potentially involved in neuroendocrine regulation. Furthermore, we infer that the ancestral AncRxfp3/4 receptor may have been syntenically linked to the AncRln-like ligand in the pre-2R genome, and show that syntenic linkages among ligands and receptors have changed dynamically in different lineages. 
This study ultimately shows the broad utility, with some caveats, of incorporating paleogenomics data into understanding the evolution of gene families.
doi:10.1371/journal.pone.0032923
PMCID: PMC3310001  PMID: 22470432
13.  Endothelial cells promote migration and proliferation of enteric neural crest cells via β1 integrin signaling 
Developmental biology  2009;330(2):263-272.
Enteric neural crest-derived cells (ENCCs) migrate along the intestine to form a highly organized network of ganglia that comprises the enteric nervous system (ENS). The signals driving the migration and patterning of these cells are largely unknown. Examining the spatiotemporal development of the intestinal neurovasculature in avian embryos, we find endothelial cells (ECs) present in the gut prior to the arrival of migrating ENCCs. These ECs are patterned in concentric rings that are predictive of the positioning of later arriving crest-derived cells, leading us to hypothesize that blood vessels may serve as a substrate to guide ENCC migration. Immunohistochemistry at multiple stages during ENS development reveals that ENCCs are positioned adjacent to vessels as they colonize the gut. A similar close anatomic relationship between vessels and enteric neurons was observed in zebrafish larvae. When EC development is inhibited in cultured avian intestine, ENCC migration is arrested and distal aganglionosis results, suggesting that ENCCs require the presence of vessels to colonize the gut. Neural tube and avian midgut were explanted onto a variety of substrates, including components of the extracellular matrix and various cell types, such as fibroblasts, smooth muscle cells, and endothelial cells. We find that crest-derived cells from both the neural tube and the midgut migrate avidly onto cultured endothelial cells. This EC-induced migration is inhibited by the presence of CSAT antibody, which blocks binding to β1 integrins expressed on the surface of crest-derived cells. These results demonstrate that ECs provide a substrate for the migration of ENCCs via an interaction between β1 integrins on the ENCC surface and extracellular matrix proteins expressed by the intestinal vasculature. These interactions may play an important role in guiding migration and patterning in the developing ENS.
doi:10.1016/j.ydbio.2009.03.025
PMCID: PMC2690696  PMID: 19345201
enteric nervous system; endothelial cells; blood vessels; Hirschsprung’s disease; integrins; avian; zebrafish
14.  EREM: Parameter Estimation and Ancestral Reconstruction by Expectation-Maximization Algorithm for a Probabilistic Model of Genomic Binary Characters Evolution 
Advances in Bioinformatics  2010;2010:167408.
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary event, and consequently, their evolution is analyzed using various flavors of parsimony. However, when gain and loss of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model to describe the evolution of binary characters on a bifurcating phylogenetic tree. A fast software tool, EREM, is provided, using maximum likelihood to estimate the parameters of the model and to reconstruct ancestral states (presence and absence in internal nodes) and events (gain and loss events along branches).
doi:10.1155/2010/167408
PMCID: PMC2866244  PMID: 20467467
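The building block of any probabilistic treatment of binary characters is the probability of gain or loss along a branch. The sketch below uses the textbook two-state continuous-time Markov chain with constant gain and loss rates; it is an assumption for illustration, and EREM's actual parameterization is not described in this abstract and may differ. The function name and arguments are hypothetical.

```python
import math


def gain_loss_probabilities(gain, loss, t):
    """Transition matrix over a branch of length t for a binary character.

    Rows are the state at the start of the branch (0 = absent, 1 = present),
    columns are the state at the end. Assumes constant rates `gain` (0 -> 1)
    and `loss` (1 -> 0), at least one of them positive.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    p_gain = gain / total * (1.0 - decay)  # absent at start, present at end
    p_loss = loss / total * (1.0 - decay)  # present at start, absent at end
    return [[1.0 - p_gain, p_gain], [p_loss, 1.0 - p_loss]]


short_branch = gain_loss_probabilities(gain=0.2, loss=0.8, t=0.1)
long_branch = gain_loss_probabilities(gain=0.2, loss=0.8, t=50.0)
```

On a short branch both events are unlikely, which is the regime where parsimony works well; on a long branch the probabilities approach the stationary values `gain / (gain + loss)` and `loss / (gain + loss)`, and multiple events per branch can no longer be ignored.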
15.  A Universal Nonmonotonic Relationship between Gene Compactness and Expression Levels in Multicellular Eukaryotes 
Analysis of gene architecture and expression levels of four organisms, Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, reveals a surprising, nonmonotonic, universal relationship between expression level and gene compactness. With increasing expression level, the genes tend at first to become longer but, from a certain level of expression, they become more and more compact, resulting in an approximate bell-shaped dependence. There are two leading hypotheses to explain the compactness of highly expressed genes. The selection hypothesis predicts that gene compactness is predominantly driven by the level of expression, whereas the genomic design hypothesis predicts that expression breadth across tissues is the driving force. We observed the connection between gene expression breadth in humans and gene compactness to be significantly weaker than the connection between expression level and compactness, a result that is compatible with the selection hypothesis but not the genomic design hypothesis. The initial gene elongation with increasing expression level could be explained, at least in part, by accumulation of regulatory elements enhancing expression, in particular, in introns. This explanation is compatible with the observed positive correlation between intron density and expression level of a gene. Conversely, the trend toward increasing compactness for highly expressed genes could be caused by selection for minimization of energy and time expenditure during transcription and splicing and for increased fidelity of transcription, splicing, and/or translation that is likely to be particularly critical for highly expressed genes. Regardless of the exact nature of the forces that shape the gene architecture, we present evidence that, at least in animals, coding and noncoding parts of genes show similar architectonic trends.
doi:10.1093/gbe/evp038
PMCID: PMC2817431  PMID: 20333206
eukaryotic gene structure; eukaryotic gene architecture; selection on gene compactness; genomic design; intron functionality; intron density
16.  Widespread Positive Selection in Synonymous Sites of Mammalian Genes 
Molecular biology and evolution  2007;24(8):1821-1831.
Evolution of protein sequences is largely governed by purifying selection, with a small fraction of proteins evolving under positive selection. The evolution at synonymous positions in protein-coding genes is not nearly as well understood, with the extent and types of selection remaining, largely, unclear. A statistical test to identify purifying and positive selection at synonymous sites in protein-coding genes was developed. The method compares the rate of evolution at synonymous sites (Ks) to that in intron sequences of the same gene after sampling the aligned intron sequences to mimic the statistical properties of coding sequences. We detected purifying selection at synonymous sites in ∼28% of the 1,562 analyzed orthologous genes from mouse and rat, and positive selection in ∼12% of the genes. Thus, the fraction of genes with readily detectable positive selection at synonymous sites is much greater than the fraction of genes with comparable positive selection at nonsynonymous sites, i.e., at the level of the protein sequence. Unlike other genes, the genes with positive selection at synonymous sites showed no correlation between Ks and the rate of evolution in nonsynonymous sites (Ka), indicating that evolution of synonymous sites under positive selection is decoupled from protein evolution. The genes with purifying selection at synonymous sites showed significant anticorrelation between Ks and expression level and breadth, indicating that highly expressed genes evolve slowly. The genes with positive selection at synonymous sites showed the opposite trend, i.e., highly expressed genes had, on average, higher Ks. For the genes with positive selection at synonymous sites, a significantly lower mRNA stability is predicted compared to the genes with negative selection. 
Thus, mRNA destabilization could be an important factor driving positive selection in synonymous sites, probably through regulation of expression at the level of mRNA degradation and, possibly, also translation rate. So, unexpectedly, we found that positive selection at synonymous sites of mammalian genes is substantially more common than positive selection at the level of protein sequences. Positive selection at synonymous sites might act through mRNA destabilization affecting mRNA levels and translation.
doi:10.1093/molbev/msm100
PMCID: PMC2632937  PMID: 17522087
synonymous sites; nonsynonymous sites; positive selection; purifying selection; introns
17.  Superposition of Transcriptional Behaviors Determines Gene State 
PLoS ONE  2008;3(8):e2901.
We introduce a novel technique to determine the expression state of a gene from quantitative information measuring its expression. Adopting a productive abstraction from current thinking in molecular biology, we consider two expression states for a gene - Up or Down. We determine this state by using a statistical model that assumes the data behaves as a combination of two biological distributions. Given a cohort of hybridizations, our algorithm predicts, for the single reading, the probability of each gene's being in an Up or a Down state in each hybridization. Using a series of publicly available gene expression data sets, we demonstrate that our algorithm outperforms the prevalent algorithm. We also show that our algorithm can be used in conjunction with expression adjustment techniques to produce a more biologically sound gene-state call. The technique we present here enables a routine update, where the continuously evolving expression level adjustments feed into gene-state calculations. The technique can be applied in almost any multi-sample gene expression experiment, and holds equal promise for protein abundance experiments.
doi:10.1371/journal.pone.0002901
PMCID: PMC2488367  PMID: 18682855
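The two-state idea in this abstract, in which a reading is treated as a draw from a combination of two distributions, can be sketched with Bayes' rule. This is not the authors' algorithm: it assumes Gaussian Up and Down components whose means and standard deviations are already known, and a hypothetical mixing weight `weight_up`; how those parameters are estimated from a cohort of hybridizations is not covered here.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_up(x, down, up, weight_up=0.5):
    """Posterior probability that reading x comes from the Up component.

    `down` and `up` are (mean, sd) tuples for the two assumed Gaussian
    components; `weight_up` is the assumed prior share of Up readings.
    """
    like_up = weight_up * normal_pdf(x, *up)
    like_down = (1.0 - weight_up) * normal_pdf(x, *down)
    return like_up / (like_up + like_down)


# Hypothetical log-expression components: Down centred at 4, Up centred at 8.
down, up = (4.0, 1.0), (8.0, 1.0)
calls = {x: prob_up(x, down, up) for x in (3.0, 6.0, 9.0)}
```

Returning a probability rather than a hard Up/Down call keeps readings near the boundary between the two components marked as uncertain.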
18.  Homoplasy in genome-wide analysis of rare amino acid replacements: the molecular-evolutionary basis for Vavilov's law of homologous series 
Biology Direct  2008;3:7.
Background
Rare genomic changes (RGCs) that are thought to comprise derived shared characters of individual clades are becoming an increasingly important class of markers in genome-wide phylogenetic studies. Recently, we proposed a new type of RGCs designated RGC_CAMs (after Conserved Amino acids-Multiple substitutions) that were inferred using genome-wide identification of amino acid replacements that were: i) located in unambiguously aligned regions of orthologous genes, ii) shared by two or more taxa in positions that contain a different, conserved amino acid in a much broader range of taxa, and iii) required two or three nucleotide substitutions. When applied to animal phylogeny, the RGC_CAM approach supported the coelomate clade that unites deuterostomes with arthropods as opposed to the ecdysozoan (molting animals) clade. However, a non-negligible level of homoplasy was detected.
Results
We provide a direct estimate of the level of homoplasy caused by parallel changes and reversals among the RGC_CAMs using 462 alignments of orthologous genes from 19 eukaryotic species. It is shown that the impact of parallel changes and reversals on the results of phylogenetic inference using RGC_CAMs cannot explain the observed support for the Coelomata clade. In contrast, the evidence in support of the Ecdysozoa clade, in large part, can be attributed to parallel changes. It is demonstrated that parallel changes are significantly more common in internal branches of different subtrees that are separated from the respective common ancestor by relatively short times than in terminal branches separated by longer time intervals. A similar but much weaker trend was detected for reversals. The observed evolutionary trend of parallel changes is explained in terms of the covarion model of molecular evolution. As the overlap between the covarion sets in orthologous genes from different lineages decreases with time after divergence, the likelihood of parallel changes decreases as well.
Conclusion
The level of homoplasy observed here appears to be low enough to justify the utility of RGC_CAMs and other types of RGCs for resolution of hard problems in phylogeny. Parallel changes, one of the major classes of events leading to homoplasy, occur much more often in relatively recently diverged lineages than in those separated from their last common ancestor by longer time intervals. This pattern seems to provide the molecular-evolutionary underpinning of Vavilov's law of homologous series and is readily interpreted within the framework of the covarion model of molecular evolution.
Reviewers
This article was reviewed by Alex Kondrashov, Nicolas Galtier, and Maximilian Telford and Robert Lanfear (nominated by Laurence Hurst).
doi:10.1186/1745-6150-3-7
PMCID: PMC2292158  PMID: 18346278
19.  Predicting the Receptive Range of Olfactory Receptors 
PLoS Computational Biology  2008;4(2):e18.
Although the family of genes encoding olfactory receptors was identified more than 15 years ago, the difficulty of functionally expressing these receptors in a heterologous system has, with only some exceptions, rendered the receptive range of given olfactory receptors largely unknown. Furthermore, even when successfully expressed, the task of probing such a receptor with thousands of odors/ligands remains daunting. Here we provide proof of concept for a solution to this problem. Using computational methods, we tune an electronic nose to the receptive range of an olfactory receptor. We then use this electronic nose to predict the receptor's response to other odorants. Our method can be used to identify the receptive range of olfactory receptors, and can also be applied to other questions involving receptor–ligand interactions in non-olfactory settings.
Author Summary
A key goal in biology is to identify specific ligands for specific receptors. One example is where the ligand is a drug. In turn, in the olfactory system the ligand is the odorant that binds to olfactory receptors. There are many olfactory receptor types, and which odorants will activate which receptors remains largely unknown. One way to answer this is to systematically vary the molecular features of ligands and to measure the olfactory receptor response. However, the vast number of molecular features and their combinations renders such an effort potentially unsolvable. Here, rather than looking at the trees (each molecular feature), we looked at the forest (the smell they generate). We used a device called an electronic nose that generates a patterned response to odorants. We then obtained the response to a set of odorants that are known to activate a particular olfactory receptor, and we used this pattern to predict the response of that receptor to other odorants. We found that, on average, in three out of four cases we could predict the response of olfactory receptors. This result provides a new method for probing the olfactory system, and also suggests a novel method for identifying potential drugs.
doi:10.1371/journal.pcbi.0040018
PMCID: PMC2222922  PMID: 18248088
21.  Patterns of intron gain and conservation in eukaryotic genes 
Background:
The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes typically contain multiple introns, a substantial fraction of which share positions in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions.
Results:
We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are practically no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability of being retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be approximately one in seven basepairs.
Conclusion:
We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.
doi:10.1186/1471-2148-7-192
PMCID: PMC2151770  PMID: 17935625
22.  Unifying measures of gene function and evolution 
Recent genome analyses revealed intriguing correlations between variables characterizing the functioning of a gene, such as expression level (EL), connectivity of genetic and protein–protein interaction networks, and knockout effect, and variables describing gene evolution, such as sequence evolution rate (ER) and propensity for gene loss. Typically, variables within each of these classes are positively correlated, e.g. products of highly expressed genes also have a propensity to be involved in many protein–protein interactions, whereas variables between classes are negatively correlated, e.g. highly expressed genes, on average, evolve slower than weakly expressed genes. Here, we describe principal component (PC) analysis of seven genome-related variables and propose biological interpretations for the first three PCs. The first PC reflects a gene's ‘importance’, or the ‘status’ of a gene in the genomic community, with positive contributions from knockout lethality, EL, number of protein–protein interaction partners and the number of paralogues, and negative contributions from sequence ER and gene loss propensity. The next two PCs define a plane that seems to reflect the functional and evolutionary plasticity of a gene. Specifically, PC2 can be interpreted as a gene's ‘adaptability’ whereby genes with high adaptability readily duplicate, have many genetic interaction partners and tend to be non-essential. PC3 also might reflect the role of a gene in organismal adaptation albeit with a negative rather than a positive contribution of genetic interactions; we provisionally designate this PC ‘reactivity’. The interpretation of PC2 and PC3 as measures of a gene's plasticity is compatible with the observation that genes with high values of these PCs tend to be expressed in a condition- or tissue-specific manner. 
Functional classes of genes substantially vary in status, adaptability and reactivity, with the highest status characteristic of the translation system and cytoskeletal proteins, highest adaptability seen in cellular processes and signalling genes, and top reactivity characteristic of metabolic enzymes.
doi:10.1098/rspb.2006.3472
PMCID: PMC1560323  PMID: 16777745
gene expression; gene dispensability; protein–protein interaction; sequence evolution rate; gene loss; principal component analysis
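The principal component analysis described in the abstract above can be sketched on a toy scale. The following is an illustration only: it uses three variables instead of the seven analysed in the paper, the numbers are invented, and the power-iteration routine and function names are assumptions of this sketch rather than the authors' procedure. It shows how a first PC with positive loadings for expression level and interaction partners and a negative loading for evolution rate can arise.

```python
import math


def standardize(columns):
    """Scale each variable to zero mean and unit sample variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def correlation_matrix(columns):
    """Pearson correlation matrix of the given variables."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_principal_component(matrix, n_iter=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    size = len(matrix)
    vec = [1.0 + 0.1 * i for i in range(size)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(size)) for i in range(size)
    )
    if vec[0] < 0:  # fix the sign so the first loading is positive
        vec = [-v for v in vec]
    return vec, eigenvalue


# Invented values for six hypothetical genes (not data from the paper).
expression_level = [9.0, 7.5, 6.0, 4.0, 2.5, 1.0]
interaction_partners = [30, 22, 25, 10, 8, 3]
evolution_rate = [0.1, 0.2, 0.25, 0.6, 0.7, 0.9]

corr = correlation_matrix([expression_level, interaction_partners, evolution_rate])
loadings, eigenvalue = first_principal_component(corr)
names = ["expression level", "interaction partners", "evolution rate"]
for name, loading in zip(names, loadings):
    print(f"{name:22s} loading = {loading:+.3f}")
print(f"share of variance explained by PC1 = {eigenvalue / len(names):.2f}")
```

Working from the correlation matrix rather than the covariance matrix keeps variables measured on very different scales (counts, rates, expression units) from dominating the component; whether the paper made the same choice is not stated in the abstract.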
