Search tips
Search criteria

Results 1-25 (1296609)

Clipboard (0)

Related Articles

1.  Rooting Gene Trees without Outgroups: EP Rooting 
Genome Biology and Evolution  2012;4(8):821-831.
Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167–181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301–316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60–76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489–493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763–766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255–260).
PMCID: PMC3509888  PMID: 22593551
rooting; trees; gene sequences; evolutionary parsimony; metazoa; Bayesian statistics; linear invariants
2.  Genetic Analysis of Fin Development in Zebrafish Identifies Furin and Hemicentin1 as Potential Novel Fraser Syndrome Disease Genes 
PLoS Genetics  2010;6(4):e1000907.
Using forward genetics, we have identified the genes mutated in two classes of zebrafish fin mutants. The mutants of the first class are characterized by defects in embryonic fin morphogenesis, which are due to mutations in a Laminin subunit or an Integrin alpha receptor, respectively. The mutants of the second class display characteristic blistering underneath the basement membrane of the fin epidermis. Three of them are due to mutations in zebrafish orthologues of FRAS1, FREM1, or FREM2, large basement membrane protein encoding genes that are mutated in mouse bleb mutants and in human patients suffering from Fraser Syndrome, a rare congenital condition characterized by syndactyly and cryptophthalmos. Fin blistering in a fourth group of zebrafish mutants is caused by mutations in Hemicentin1 (Hmcn1), another large extracellular matrix protein the function of which in vertebrates was hitherto unknown. Our mutant and dose-dependent interaction data suggest a potential involvement of Hmcn1 in Fraser complex-dependent basement membrane anchorage. Furthermore, we present biochemical and genetic data suggesting a role for the proprotein convertase FurinA in zebrafish fin development and cell surface shedding of Fras1 and Frem2, thereby allowing proper localization of the proteins within the basement membrane of forming fins. Finally, we identify the extracellular matrix protein Fibrillin2 as an indispensable interaction partner of Hmcn1. Thus we have defined a series of zebrafish mutants modelling Fraser Syndrome and have identified several implicated novel genes that might help to further elucidate the mechanisms of basement membrane anchorage and of the disease's aetiology. In addition, the novel genes might prove helpful to unravel the molecular nature of thus far unresolved cases of the human disease.
Author Summary
There are a large number of human genetic syndromes with limb and digit deformities. It has been shown that the genes underlying these syndromes are well conserved in evolution, and most perform the same role even in the fins of fish. One such human syndrome is Fraser Syndrome, characterized by a number of defects including fusion of the fingers (syndactyly). Data obtained with corresponding mouse mutants suggest that all of these defects are due to transient basement membrane disruptions and epithelial blistering during development. Whilst some of the Fraser Syndrome genes have been identified, others are unknown. We show that mutation of the known Fraser Syndrome genes in zebrafish generate comparable blistering defects in the fins. Importantly, we have also identified additional genes and mechanisms required for the same processes. Included in this are hemicentin1, a gene whose function had thus far only been studied in nematodes, and furinA, encoding a proprotein convertase, for which we reveal a novel role in ectodomain shedding of Fras/Frem proteins. This work thus expands our understanding, not only of Fraser Syndrome, but also of the common processes of basement membrane formation and function during fin and limb development.
PMCID: PMC2855323  PMID: 20419147
3.  Long-branch attraction and the phylogeny of true water bugs (Hemiptera: Nepomorpha) as estimated from mitochondrial genomes 
Most previous studies of morphological and molecular data have consistently supported the monophyly of the true water bugs (Hemiptera: Nepomorpha). An exception is a recent study by Hua et al. (BMC Evol Biol 9: 134, 2009) based on nine nepomorphan mitochondrial genomes. In the analysis of Hua et al. (BMC Evol Biol 9: 134, 2009), the water bugs in the group Pleoidea formed the sister group to a clade that consisted of Nepomorpha (the remaining true water bugs) + Leptopodomorpha (shore bugs) + Cimicomorpha (assassin bugs and relatives) + Pentatomomorpha (stink bugs and relatives), thereby suggesting that fully aquatic hemipterans evolved independently at least twice. Based on these results, Hua et al. (BMC Evol Biol 9: 134, 2009) elevated the Pleoidea to a new infraorder, the Plemorpha.
Our reanalysis suggests that the lack of support for the monophyly of the true water bugs (including Pleoidea) by Hua et al. (BMC Evol Biol 9: 134, 2009) likely resulted from inadequate taxon sampling. In particular, long-branch attraction (LBA) between the distant outgroup taxa and Pleoidea, as well as LBA among taxa in the ingroup, made Nepomorpha appear to be polyphyletic. We used three complementary strategies to test and alleviate the effects of LBA: (1) the removal of distant outgroups from the analysis; (2) the addition of closely related outgroups; and (3) the addition of a mitochondrial genome from a second family of Pleoidea. We also performed likelihood-ratio tests to examine the support for monophyly of Nepomorpha with different combinations of taxa included in the analysis. Furthermore, we found that specimens of Helotrephes sp. were misidentified as Paraplea frontalis (Fieber, 1844) by Hua et al. (BMC Evol Biol 9: 134, 2009).
All analyses that included the addition of more taxa significantly and consistently supported the placement of Pleoidea within the Nepomorpha (i.e., supported the monophyly of the traditional true water bugs). Our analyses further support a close relationship between Notonectoidea and Pleoidea within Nepomorpha, and the superfamilies Nepoidea, Ochteroidea, Naucoroidea, and Pleoidea are resolved as monophyletic in all trees with strong support. Our results also confirmed that monophyly of Nepomorpha clearly is not refuted by the mitochondrial genome data.
PMCID: PMC4101842  PMID: 24884699
Long-branch attraction; Nepomorpha; Mitochondrial genome; Taxon sampling; Likelihood-ratio test
4.  Metabolic modeling of endosymbiont genome reduction on a temporal scale 
This study explores the order in which individual metabolic genes are lost in an in silico evolutionary process leading from the metabolic network of Eschericia coli to that of the genome-reduced endosymbiont Buchnera aphidicola.
Simulating the reductive evolutionary process under several growth conditions, a remarkable correlation between in silico and phylogenetically reconstructed gene loss time is obtained.A gene's k-robustness (its depth of backups) is prime determinant of its loss time.In silico gene loss time is a better predictor of their actual loss times than genomic features and network properties.Simulating the reductive evolutionary process by the loss of large blocks followed by single-gene deletions, as known to occur in evolution, yields a remarkable correspondence with the phylogenetic reconstruction and the block loss reported in the literature.
An open fundamental challenge in Systems Biology is whether a genome-scale model can predict patterns of genome evolution by realistically accounting for the associated biochemical constraints. In this study, we explore the order in which individual genes are lost in an in silico evolutionary process, leading from the metabolic network of Eschericia coli to that of the endosymbiont Buchnera aphidicola.
To evaluate the in silico gene loss time, we repeated the reductive evolutionary process introduced by Pál et al (2006), denoting the in silico deletion time of a gene in a single run of the reductive evolutionary process as the number of genes deleted before its own deletion occurred. By comparing the in silico evaluations of the gene loss time to that obtained by a phylogenetic reconstruction (Figure 1), we could evaluate the ability of an in silico process to predict temporal patterns of genome reduction. Applying this procedure on a literature-based viable media, we obtained a mean Spearman's correlation of 0.46 (53% of the maximal correlation, empirical P-value <9.9e−4) between in silico and phylogenetically reconstructed loss times. In order to provide an upper bound on evolutionary necessity stemming from metabolic constraints, we searched the space of potential growth media and biomass functions via a simulated annealing search algorithm aimed at identifying an environment/biomass function that maximizes the target correlation between in silico and reconstructed loss times. Simulating the reductive evolutionary process under the growth conditions and biomass function obtained in this process, we managed to improve the correlation between in silico and reconstructed loss times to a mean Spearman's correlation of 0.54 (63% of the maximal correlation, empirical P-value <9.9e−4, Figure 3).
Examining the dependency of the predicted loss time of each gene on its intrinsic network-level properties we find a very strong inverse Spearman's correlation of −0.84 (empirical P-value <9.9e−4) between the order of gene loss predicted in silico and the k-robustness levels of the genes, the latter denoting the depth of their functional backups in the network (Deutscher et al, 2006). Moreover, in order to examine whether the relative loss time of a gene is influenced by its functional dependencies with other genes, we performed a flux-coupling analysis and identified pairs of reactions whose activities asymmetrically depend on each other, i.e., are directionally coupled (Burgard et al, 2004). We find that genes encoding reactions whose activity is needed for activating the other reaction (and not vice versa) have a tendency to be lost later, as one would expect (binomial P-value <1e−14).
To assess the scale of these results, we examined as a control how well genomic features and network properties predict the phylogenetically reconstructed gene loss times. We examined the dependency of the latter on several factors that are known be inversely correlated with the propensity of a gene to be lost (Brinza et al, 2009; Delmotte et al, 2006; Tamames et al, 2007), including the genes' mRNA levels, tAI values (Covert et al, 2004; Reis et al, 2004; Sharp and Li, 1987; Tuller et al, 2010a) and the number of partners the gene products have in a protein–protein interaction network. Remarkably, these genomic features yield considerably lower Spearman's correlation than that obtained by the in silico simulations. Moreover, multiply regressing the loss times from the phylogenetic reconstruction on the in silico gene loss time predictions and the genomic and network variables, we found that the (normalized) coefficient of the in silico predictions in the regression is much higher than those of the genomic features, further testifying to the considerable independent predictive power of the metabolic model.
Finally, simulating the evolutionary process as large block deletions at first followed by single-gene deletions as is thought to occur in evolution (Moran and Mira, 2001; van Ham et al, 2003), a remarkable correspondence with the phylogenetic reconstruction was found. Namely, we find that after a certain amount of genes are deleted from the genome, no further block deletions can occur due to the increasing density of essential genes. Notably, the maximum amount of genes that can be deleted in blocks (i.e., until no more blocks can be deleted) corresponds to the number of genes appearing in our phylogenetic reconstruction from the LCA (last common ancestor of Buchnera and E. coli) to the LCSA (last common symbiotic ancestor, nodes 1–3 in Figure 1A), as described in the literature.
A fundamental challenge in Systems Biology is whether a cell-scale metabolic model can predict patterns of genome evolution by realistically accounting for associated biochemical constraints. Here, we study the order in which genes are lost in an in silico evolutionary process, leading from the metabolic network of Eschericia coli to that of the endosymbiont Buchnera aphidicola. We examine how this order correlates with the order by which the genes were actually lost, as estimated from a phylogenetic reconstruction. By optimizing this correlation across the space of potential growth and biomass conditions, we compute an upper bound estimate on the model's prediction accuracy (R=0.54). The model's network-based predictive ability outperforms predictions obtained using genomic features of individual genes, reflecting the effect of selection imposed by metabolic stoichiometric constraints. Thus, while the timing of gene loss might be expected to be a completely stochastic evolutionary process, remarkably, we find that metabolic considerations, on their own, make a marked 40% contribution to determining when such losses occur.
PMCID: PMC3094061  PMID: 21451589
constraint-based modeling; endosymbiont; evolution; metabolism
5.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are ‘hidden' due to buffering.Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function.The genetic interactions of duplicate genes evolve in an extremely asymmetric way and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes.Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serves as a primary source for new genes and new functions, and as such has broad implications on the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability for duplicate genes to buffer the deletion of a partner has three main consequences. First it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to common function from experimental detection. Third, this buffering of common interactions serves to reduce profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than they do to their neighbors, the reverse is true for duplicates in general. That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicates pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encode proteins that form distinct heterodimeric complexes with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depend on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
6.  Understanding protein evolutionary rate by integrating gene co-expression with protein interactions 
BMC Systems Biology  2010;4:179.
Among the many factors determining protein evolutionary rate, protein-protein interaction degree (PPID) has been intensively investigated in recent years, but its precise effect on protein evolutionary rate is still heavily debated.
We first confirmed that the correlation between protein evolutionary rate and PPID varies considerably across different protein interaction datasets. Specifically, because of the maximal inconsistency between yeast two-hybrid and other datasets, we reasoned that the difference in experimental methods contributes to our inability to clearly define how PPID affects protein evolutionary rate. To address this, we integrated protein interaction and gene co-expression data to derive a co-expressed protein-protein interaction degree (ePPID) measure, which reflects the number of partners with which a protein can permanently interact. Thus, irrespective of the experimental method employed, we found that (1) ePPID is a better predictor of protein evolutionary rate than PPID, (2) ePPID is a more robust predictor of protein evolutionary rate than PPID, and (3) the contribution of ePPID to protein evolutionary rate is statistically independent of expression level. Analysis of hub proteins in the Structural Interaction Network further supported ePPID as a better predictor of protein evolutionary rate than the number of distinct binding interfaces and clarified the slower evolution of co-expressed multi-interface hub proteins over that of other hub proteins.
Our study firmly established ePPID as a robust predictor of protein evolutionary rate, irrespective of experimental method, and underscored the importance of permanent interactions in shaping the evolutionary outcome.
PMCID: PMC3022652  PMID: 21190591
7.  An Experimentally Informed Evolutionary Model Improves Phylogenetic Fit to Divergent Lactamase Homologs 
Molecular Biology and Evolution  2014;31(10):2753-2769.
Phylogenetic analyses of molecular data require a quantitative model for how sequences evolve. Traditionally, the details of the site-specific selection that governs sequence evolution are not known a priori, making it challenging to create evolutionary models that adequately capture the heterogeneity of selection at different sites. However, recent advances in high-throughput experiments have made it possible to quantify the effects of all single mutations on gene function. I have previously shown that such high-throughput experiments can be combined with knowledge of underlying mutation rates to create a parameter-free evolutionary model that describes the phylogeny of influenza nucleoprotein far better than commonly used existing models. Here, I extend this work by showing that published experimental data on TEM-1 beta-lactamase (Firnberg E, Labonte JW, Gray JJ, Ostermeier M. 2014. A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol. 31:1581–1592) can be combined with a few mutation rate parameters to create an evolutionary model that describes beta-lactamase phylogenies much better than most common existing models. This experimentally informed evolutionary model is superior even for homologs that are substantially diverged (about 35% divergence at the protein level) from the TEM-1 parent that was the subject of the experimental study. These results suggest that experimental measurements can inform phylogenetic evolutionary models that are applicable to homologs that span a substantial range of sequence divergence.
PMCID: PMC4166927  PMID: 25063439
phylogenetics; deep mutational scanning; lactamase; protein evolution; substitution model
8.  Inteins, introns, and homing endonucleases: recent revelations about the life cycle of parasitic genetic elements 
Self splicing introns and inteins that rely on a homing endonuclease for propagation are parasitic genetic elements. Their life-cycle and evolutionary fate has been described through the homing cycle. According to this model the homing endonuclease is selected for function only during the spreading phase of the parasite. This phase ends when the parasitic element is fixed in the population. Upon fixation the homing endonuclease is no longer under selection, and its activity is lost through random processes. Recent analyses of these parasitic elements with functional homing endonucleases suggest that this model in its most simple form is not always applicable. Apparently, functioning homing endonuclease can persist over long evolutionary times in populations and species that are thought to be asexual or nearly asexual. Here we review these recent findings and discuss their implications. Reasons for the long-term persistence of a functional homing endonuclease include: More recombination (sexual and as a result of gene transfer) than previously assumed for these organisms; complex population structures that prevent the element from being fixed; a balance between active spreading of the homing endonuclease and a decrease in fitness caused by the parasite in the host organism; or a function of the homing endonuclease that increases the fitness of the host organism and results in purifying selection for the homing endonuclease activity, even after fixation in a local population. In the future, more detailed studies of the population dynamics of the activity and regulation of homing endonucleases are needed to decide between these possibilities, and to determine their relative contributions to the long term survival of parasitic genes within a population. Two outstanding publications on the amoeba Naegleria group I intron (Wikmark et al. BMC Evol Biol 2006, 6:39) and the PRP8 inteins in ascomycetes (Butler et al.BMC Evol Biol 2006, 6:42) provide important stepping stones towards integrated studies on how these parasitic elements evolve through time together with, or despite, their hosts.
PMCID: PMC1654191  PMID: 17101053
9.  Isolation and characterization of a sperm-specific gene family in the nematode Caenorhabditis elegans. 
Molecular and Cellular Biology  1984;4(3):529-537.
The major sperm protein (MSP) of the nematode Caenorhabditis elegans is a low-molecular-weight (15,000) basic protein implicated in the pseudopodial movement of mature spermatozoa. Its synthesis occurs in a specific region of the gonad and is regulated at the level of transcription (M. Klass and D. Hirsh, Dev. Biol. 84:299-312, 1981; S. Ward and M. Klass, Dev. Biol. 92:203-208, 1982; Klass et al., Dev. Biol. 93:152-164, 1982). A developmentally regulated gene family has been identified that codes for this MSP. Whole genomic blots, as well as analysis of genomic clone banks, indicate that there are between 15 and 25 copies of the MSP gene in the nematode genome. Southern blot analysis also indicates that there is no rearrangement or amplification within the MSP gene family during development. No evidence was found of methylation at various restriction sites surrounding the MSP gene family, and similarly, no correlation between methylation and expression was observed. Three distinct members of this MSP gene family have been cloned, and their nucleotide sequences have been determined. Differential screening of a cDNA clone bank made from polyadenylated mRNA from adult males yielded 45 male-specific clones, 32 of which were clones of MSP genes. One of these cDNA clones was found to contain the entire nucleotide sequence for the MSP, including part of the 5' leader and all of the 3' trailing sequence. Genomic clones bearing copies of the MSP genes have been isolated. At least one of the members of this gene family is a pseudogene. Another member of the MSP gene family that has been cloned from genomic DNA contains the entire uninterrupted structural sequence for the MSP in addition to a 5' flanking sequence containing a promoter-like region with the classic TATA box, a sequence resembling the CAAT box, and a putative ribosome-binding sequence. The 3' trailing sequences of the genomic and the cDNA clones contain an AATAAA polyadenylation site.
PMCID: PMC368732  PMID: 6325882
10.  Zebrafish Pou5f1-dependent transcriptional networks in temporal control of early development 
Time-resolved transcriptome analysis of early pou5f1 mutant zebrafish embryos identified groups of developmental regulators, including SoxB1 genes, that depend on Pou5f1 activity, and a large cluster of differentiation genes which are prematurely expressed.Pou5f1 represses differentiation genes indirectly via activation of germlayer-specific transcriptional repressor genes, including her3, which may mediate in part Pou5f1-dependent repression of neural genes.A dynamic mathematical model is established for Pou5f1 and SoxB1 activity-dependent temporal behaviour of downstream transcriptional regulatory networks. The model predicts that Pou5f1-dependent increase in SoxB1 activity significantly contributes to developmental timing in the early gastrula.Comparison to mouse Pou5f1/Oct4 reveals evolutionary conserved targets. We show that Pou5f1 developmental function is also conserved by demonstrating rescue of Pou5f1 mutant zebrafish embryos by mouse POU5F1/OCT4.
The transcription factor Pou5f1/Oct4 controls pluripotency of mouse embryonic inner cell mass cells (Nichols et al, 1998), and of mouse and human ES cell lines (Boiani and Scholer, 2005). Although Pou5f1/Oct4-dependent pluripotency transcriptional circuits and many transcriptional targets have been characterized, little is known about the mechanisms by which Pou5f1/Oct4 controls early developmental events. A detailed understanding of Pou5f1/Oct4 functions during mammalian blastocyst and gastrula development as well as studies of the temporal changes in the Pou5f1/Oct4-regulated networks are precluded by the early lineage defects in pou5f1/oct4 mutant mice. To investigate Pou5f1-dependent transcriptional circuits in developmental control, we used the zebrafish (Danio rerio) as a genetic and experimental model representing an earlier state of vertebrate evolution. Zebrafish have one pou5f1/pou2 gene (Takeda et al, 1994) orthologous to the mammalian gene (Niwa et al, 2008; Frankenberg et al, 2009). Both fish and mammalian orthologs are expressed broadly in tissues giving rise to the embryo proper during blastula and early gastrula stages, as well as in the neural plate (Belting et al, 2001; Reim and Brand, 2002; Downs, 2008).
Zebrafish pou5f1 loss-of-function mutant embryos, MZspg (abbreviated ‘MZ'), are completely devoid of maternal and zygotic Pou5f1 activity (Lunde et al, 2004; Reim et al, 2004). MZ embryos have gastrulation abnormalities (Lachnit et al, 2008), dorsoventral patterning defects (Reim and Brand, 2006), and do not develop endoderm (Lunde et al, 2004; Reim et al, 2004). In contrast to Pou5f1/Oct4 mutant mice, which are blocked in development due to loss of inner cell mass, MZ mutant embryos are neither blocked in development nor display a general delay. Therefore, zebrafish present a good model system to identify specific transcriptional targets of Pou5f1 during development.
Our study aims to understand the structure, regulatory logic, and developmental temporal changes in the Pou5f1-dependent transcriptional network in the context of an intact embryo. Therefore, we investigated transcriptome changes in MZ compared with WT zebrafish by microarray analysis at 10 distinct time-points during development, from ovaries to late gastrulation. We identified changes in Pou5f1 target gene expression both with respect to their expression level and temporal behavior. We used correlation analysis to identify clusters of target genes enriched for genes with developmentally regulated expression profiles. This correlation analysis revealed a cluster of genes, which were not activated or were significantly delayed in MZ. Interestingly, there was also a large gene cluster with premature onset of expression in MZ.
Several targets activated by Pou5f1 encode known repressors of differentiation (RODs), of which we analyzed her3 in detail. Pou5f1 also activates several SoxB1 group transcription factors, which are known to act together with Pou5f1 in mammalian systems. Among the large group of genes prematurely activated in MZ, many genes encode developmental regulators of differentiation normally acting during organogenesis (promoters of differentiation—PODs). Our analysis of potential direct transcriptional interactions by suppression of translation of intermediate zygotic Pou5f1 or SoxB1 targets, enabled us to distinguish Sox-dependent and independent subgroups of the Pou5f1 transcriptional network. Interestingly, tissue-specific expression of Pou5f1 targets correlated with their regulation by Sox2, with Sox-dependent targets being mostly localized to ectoderm and neuroectoderm, whereas Sox-independent targets localized to mesendoderm of the developing zebrafish embryo. Further, SoxB1 independent Pou5f1 targets (for example foxD3) differ from SoxB1-dependent targets (e.g her3) in temporal dynamics of expression. Most Sox-independent direct Pou5f1 targets in WT reach maximal expression levels soon after midblastula transition (MBT) at 3–4 h postfertilization (hpf). In contrast, genes depending both on Sox2 and Pou5f1 tend to have a biphasic temporal expression curve or are activated with >2 h delay after MBT to reach maximum levels at 6–7 hpf only.
To better understand the impact of our findings on Pou5f1/SoxB1-dependent versus Pou5f1-only regulation on developmental mechanisms, we built a small dynamic network model that links the temporal control of target genes to regulatory principles exerted by Pou5f1 and SoxB1 proteins (Figure 6A). The model is based on ordinary differential equations, and parameters were determined by a fit to the WT and MZ gene expression data. The optimized model highlights two qualitatively different temporal expression modes of Pou5f1 downstream targets: monophasic for targets depending only on Pou5f1 (foxd3), and biphasic for Pou5f1- and SoxB1-dependent targets (sox2 and her3; Figure 6B). To test whether the model is also able to correctly predict a different genetic condition, we simulated the M mutant, which is lacking maternal Pou5f1, but gradually rescued by the paternal pou5f1 contribution after MBT (Figure 6B, blue, dashed curve). The model predicts an overall shift in the developmental program. Most importantly, the sox2 and her3 expression is rescued with a delay of about 2 h. The model predictions were checked experimentally by quantitative RT–PCR (Figure 6B, blue dots). Most predictions are in good agreement with the experimental data, for example the delayed rescue of the sox2 and her3 temporal expression profile. With respect to the ‘POD' nr2f1, the model correctly predicts the efficient downregulation by zygotic targets of Pou5f1 (Figure 6B).
We identified an evolutionary conserved core set of Pou5f1 targets, by comparing our gene list with the lists of mouse Pou5f1/Oct4 targets (Loh et al, 2006; Sharov et al, 2008). The evolutionary conservation suggests equivalent Pou5f1 functions during the pregastrulation and gastrulation period of vertebrate embryogenesis. Therefore, we tested whether mouse Pou5f1/Oct4 was able to rescue MZ embryos. Injection of mRNA encoding mouse Pou5f1/Oct4 into MZ embryos (Figure 8A) was able to restore normal zebrafish development to an extent comparable with zebrafish pou5f1/pou2 mRNA (Figure 8B and C). The significant overlap between zebrafish and mammalian Pou5f1 targets together with the ability of mouse Pou5f1/Oct4 to functionally replace the zebrafish Pou5f1/Pou2 (Figure 8A–C), suggests that the mammalian network may have evolved from a basal situation similar to what is observed in teleosts. We propose models that emphasize the evolution of Pou5f1-dependent transcriptional networks during development of the zebrafish (Figure 8D) and mammals (Figure 8E). Our representation highlights the evolutionary ancient germlayer-specific subnetworks downstream of Pou5f1, which are presumably used for controlling the timing of differentiation during gastrulation in all vertebrates (Figure 8D and E, black arrows). As the Pou5f1 downstream regulatory nodes revealed in our zebrafish model are likely conserved across vertebrates, we envision that their knowledge will contribute to the effort of directing differentiation of pluripotent stem cells to defined cell fates.
The transcription factor POU5f1/OCT4 controls pluripotency in mammalian ES cells, but little is known about its functions in the early embryo. We used time-resolved transcriptome analysis of zebrafish pou5f1 MZspg mutant embryos to identify genes regulated by Pou5f1. Comparison to mammalian systems defines evolutionary conserved Pou5f1 targets. Time-series data reveal many Pou5f1 targets with delayed or advanced onset of expression. We identify two Pou5f1-dependent mechanisms controlling developmental timing. First, several Pou5f1 targets are transcriptional repressors, mediating repression of differentiation genes in distinct embryonic compartments. We analyze her3 gene regulation as example for a repressor in the neural anlagen. Second, the dynamics of SoxB1 group gene expression and Pou5f1-dependent regulation of her3 and foxD3 uncovers differential requirements for SoxB1 activity to control temporal dynamics of activation, and spatial distribution of targets in the embryo. We establish a mathematical model of the early Pou5f1 and SoxB1 gene network to demonstrate regulatory characteristics important for developmental timing. The temporospatial structure of the zebrafish Pou5f1 target networks may explain aspects of the evolution of the mammalian stem cell networks.
PMCID: PMC2858445  PMID: 20212526
developmental timing; mathematical modeling; Oct4; transcriptional networks
11.  Complaints in for-profit, non-profit and public nursing homes in two Canadian provinces 
Open Medicine  2011;5(4):e183-e192.
Nursing homes provide long-term housing, support and nursing care to frail elders who are no longer able to function independently. Although studies conducted in the United States have demonstrated an association between for-profit ownership and inferior quality, relatively few Canadian studies have made performance comparisons with reference to type of ownership. Complaints are one proxy measure of performance in the nursing home setting. Our study goal was to determine whether there is an association between facility ownership and the frequency of nursing home complaints.
We analyzed publicly available data on complaints, regulatory measures, facility ownership and size for 604 facilities in Ontario over 1 year (2007/08) and 62 facilities in British Columbia (Fraser Health region) over 4 years (2004–2008). All analyses were carried out at the facility level. Negative binomial regression analysis was used to assess the association between type of facility ownership and frequency of complaints.
The mean (standard deviation) number of verified/substantiated complaints per 100 beds per year in Ontario and Fraser Health was 0.45 (1.10) and 0.78 (1.63) respectively. Most complaints related to resident care. Complaints were more frequent in facilities with more citations, i.e., violations of the legislation or regulations governing a home, (Ontario) and inspection violations (Fraser Health). Compared with Ontario’s for-profit chain facilities, adjusted incident rate ratios and 95% confidence intervals of verified complaints were 0.56 (0.27–1.16), 0.58 (0.34–1.00), 0.43 (0.21– 0.88), and 0.50 (0.30– 0.84) for for-profit single-site, non-profit, charitable, and public facilities respectively. In Fraser Health, the adjusted incident rate ratio of substantiated complaints in non-profit facilities compared with for-profit facilities was 0.18 (0.07–0.45).
Compared with for-profit chain facilities, non-profit, charitable and public facilities had significantly lower rates of complaints in Ontario. Likewise, in British Columbia’s Fraser Health region, non-profit owned facilities had significantly lower rates of complaints compared with for-profit owned facilities.
PMCID: PMC3345377  PMID: 22567074
12.  Protein protein interactions, evolutionary rate, abundance and age 
BMC Bioinformatics  2006;7:128.
Does a relationship exist between a protein's evolutionary rate and its number of interactions? This relationship has been put forward many times, based on a biological premise that a highly interacting protein will be more restricted in its sequence changes. However, to date several studies have voiced conflicting views on the presence or absence of such a relationship.
Here we perform a large scale study over multiple data sets in order to demonstrate that the major reason for conflict between previous studies is the use of different but overlapping datasets. We show that lack of correlation, between evolutionary rate and number of interactions in a data set is related to the error rate. We also demonstrate that the correlation is not an artifact of the underlying distributions of evolutionary distance and interactions and is therefore likely to be biologically relevant. Further to this, we consider the claim that the dependence is due to gene expression levels and find some supporting evidence. A strong and positive correlation between the number of interactions and the age of a protein is also observed and we show this relationship is independent of expression levels.
A correlation between number of interactions and evolutionary rate is observed but is dependent on the accuracy of the dataset being used. However it appears that the number of interactions a protein participates in depends more on the age of the protein than the rate at which it changes.
PMCID: PMC1431566  PMID: 16533385
13.  Phylogenetic mixture models for proteins 
Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code; solvent exposure; secondary and tertiary structure; protein function; etc. These impact the substitution pattern and, in most cases, a single replacement matrix is not enough to represent all the complexity of the evolutionary processes. This paper explores in maximum-likelihood framework phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP, and test the performance using independent alignments from TreeBase. We compare unsupervised learning approaches, where the site categories are unknown, to supervised ones, where in estimations we use the known category of each site, based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site with TreeBase test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307–1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 alignments (among 57), and significantly worse for 1 alignment only. Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures impacts not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from
PMCID: PMC2607422  PMID: 18852096
amino acid replacement matrices; JTT, WAG and LG; CAT profile model; maximum-likelihood estimations; phylogenetic inference
14.  Regulatory Network Structure as a Dominant Determinant of Transcription Factor Evolutionary Rate 
PLoS Computational Biology  2012;8(10):e1002734.
The evolution of transcriptional regulatory networks has thus far mostly been studied at the level of cis-regulatory elements. To gain a complete understanding of regulatory network evolution we must also study the evolutionary role of trans-factors, such as transcription factors (TFs). Here, we systematically assess genomic and network-level determinants of TF evolutionary rate in yeast, and how they compare to those of generic proteins, while carefully controlling for differences of the TF protein set, such as expression level. We found significantly distinct trends relating TF evolutionary rate to mRNA expression level, codon adaptation index, the evolutionary rate of physical interaction partners, and, confirming previous reports, to protein-protein interaction degree and regulatory in-degree. We discovered that for TFs, the dominant determinants of evolutionary rate lie in the structure of the regulatory network, such as the median evolutionary rate of target genes and the fraction of species-specific target genes. Decomposing the regulatory network by edge sign, we found that this modular evolution of TFs and their targets is limited to activating regulatory relationships. We show that fast evolving TFs tend to regulate other TFs and niche-specific processes and that their targets show larger evolutionary expression changes than targets of other TFs. We also show that the positive trend relating TF regulatory in-degree and evolutionary rate is likely related to the species-specificity of the transcriptional regulation modules. Finally, we discuss likely causes for TFs' different evolutionary relationship to the physical interaction network, such as the prevalence of transient interactions in the TF subnetwork. This work suggests that positive and negative regulatory networks follow very different evolutionary rules, and that transcription factor evolution is best understood at a network- or systems-level.
Author Summary
Transcription factors (TFs) are proteins which regulate the expression of genes by interacting with DNA. Mutations in TF protein sequences can affect the expression levels of regulated genes throughout evolution. In this study, we look into the factors which cause the different TFs in baker's yeast to be more or less tolerant of mutations during recent evolution. This tolerance is measured as the evolutionary rate, defined for each protein as the relative rate of protein-changing DNA mutations over other mutations (Ka/Ks). We found that the typical determinants of protein evolutionary rate, such as expression level and network interactions have a very different influence on TF evolutionary rate. We found that TF evolutionary rate is most highly correlated to the evolutionary properties of the genes which they regulate and specifically genes which they activate. We also show that TF evolutionary rate predicts actual evolutionary expression differences of regulated genes and we discuss some of the features unique to TFs which likely contribute to their different evolutionary trends, such as the types of protein-protein interactions prevalent in the TF subnetwork or TFs' potential role in adaptive evolution.
PMCID: PMC3475661  PMID: 23093926
15.  Functional Consequence of Positive Selection Revealed through Rational Mutagenesis of Human Myeloperoxidase 
Molecular Biology and Evolution  2012;29(8):2039-2046.
Myeloperoxidase (MPO) is a member of the mammalian heme peroxidase (MHP) multigene family. Whereas all MHPs oxidize specific halides to generate the corresponding hypohalous acid, MPO is unique in its capacity to oxidize chloride at physiologic pH to produce hypochlorous acid (HOCl), a potent microbicide that contributes to neutrophil-mediated host defense against infection. We have previously resolved the evolutionary relationships in this functionally diverse multigene family and predicted in silico that positive Darwinian selection played a major role in the observed functional diversities (Loughran NB, O'Connor B, O'Fagain C, O'Connell MJ. 2008. The phylogeny of the mammalian heme peroxidases and the evolution of their diverse functions. BMC Evol Biol. 8:101). In this work, we have replaced positively selected residues asparagine 496 (N496), tyrosine 500 (Y500), and leucine 504 (L504) with the amino acids present in the ancestral MHP and have examined the effects on the structure, biosynthesis, and activity of MPO. Analysis in silico predicted that N496F, Y500F, or L504T would perturb hydrogen bonding in the heme pocket of MPO and thus disrupt the structural integrity of the enzyme. Biosynthesis of the mutants stably expressed in human embryonic kidney 293 cells yielded apoproMPO, the heme-free, enzymatically inactive precursor of MPO, that failed to undergo normal maturation or proteolytic processing. As a consequence of the maturational arrest at the apoproMPO stage of development, cells expressing MPO with mutations N496F, Y500F, L504T, individually or in combination, lacked normal peroxidase or chlorinating activity. Taken together, our data provide further support for the in silico predictions of positive selection and highlight the correlation between positive selection and functional divergence. Our data demonstrate that directly probing the functional importance of positive selection can provide important insights into understanding protein evolution.
PMCID: PMC3408071  PMID: 22355012
myeloperoxidase; animal peroxidase family; positive selection; protein evolution; Darwinian selection; functional shift
16.  Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution 
Biology Direct  2008;3:40.
Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constrains were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate.
This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a mutlidomain protein, all domains are translated at the same rate. We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude.
Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution.
This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section.
PMCID: PMC2572155  PMID: 18840284
17.  Expression and Association of Group IV Nitrogenase NifD and NifH Homologs in the Non-Nitrogen-Fixing Archaeon Methanocaldococcus jannaschii▿ † 
Journal of Bacteriology  2007;189(20):7392-7398.
Using genomic analysis, researchers previously identified genes coding for proteins homologous to the structural proteins of nitrogenase (J. Raymond, J. L. Siefert, C. R. Staples, and R. E. Blankenship, Mol. Biol. Evol. 21:541-554, 2004). The expression and association of NifD and NifH nitrogenase homologs (named NflD and NflH for “Nif-like” D and H, respectively) have been detected in a non-nitrogen-fixing hyperthermophilic methanogen, Methanocaldococcus jannaschii. These homologs are expressed constitutively and do not appear to be directly involved with nitrogen metabolism or detoxification of compounds such as cyanide or azide. The NflH and NflD proteins were found to interact with each other, as determined by bacterial two-hybrid studies. Upon immunoisolation, NflD and NflH copurified, along with three other proteins whose functions are as yet uncharacterized. The apparent presence of genes coding for NflH and NflD in all known methanogens, their constitutive expression, and their high sequence similarity to the NifH and NifD proteins or the BchL and BchN/BchB proteins suggest that NflH and NflD participate in an indispensable and fundamental function(s) in methanogens.
PMCID: PMC2168459  PMID: 17660283
18.  Segmental dataset and whole body expression data do not support the hypothesis that non-random movement is an intrinsic property of Drosophila retrogenes 
Several studies in Drosophila have shown excessive movement of retrogenes from the X chromosome to autosomes, and that these genes are frequently expressed in the testis. This phenomenon has led to several hypotheses invoking natural selection as the process driving male-biased genes to the autosomes. Metta and Schlötterer (BMC Evol Biol 2010, 10:114) analyzed a set of retrogenes where the parental gene has been subsequently lost. They assumed that this class of retrogenes replaced the ancestral functions of the parental gene, and reported that these retrogenes, although mostly originating from movement out of the X chromosome, showed female-biased or unbiased expression. These observations led the authors to suggest that selective forces (such as meiotic sex chromosome inactivation and sexual antagonism) were not responsible for the observed pattern of retrogene movement out of the X chromosome.
We reanalyzed the dataset published by Metta and Schlötterer and found several issues that led us to a different conclusion. In particular, Metta and Schlötterer used a dataset combined with expression data in which significant sex-biased expression is not detectable. First, the authors used a segmental dataset where the genes selected for analysis were less testis-biased in expression than those that were excluded from the study. Second, sex-biased expression was defined by comparing male and female whole-body data and not the expression of these genes in gonadal tissues. This approach significantly reduces the probability of detecting sex-biased expressed genes, which explains why the vast majority of the genes analyzed (parental and retrogenes) were equally expressed in both males and females. Third, the female-biased expression observed by Metta and Schlötterer is mostly found for parental genes located on the X chromosome, which is known to be enriched with genes with female-biased expression. Fourth, using additional gonad expression data, we found that autosomal genes analyzed by Metta and Schlötterer are less up regulated in ovaries and have higher chance to be expressed in meiotic cells of spermatogenesis when compared to X-linked genes.
The criteria used to select retrogenes and the sex-biased expression data based on whole adult flies generated a segmental dataset of female-biased and unbiased expressed genes that was unable to detect the higher propensity of autosomal retrogenes to be expressed in males. Thus, there is no support for the authors’ view that the movement of new retrogenes, which originated from X-linked parental genes, was not driven by selection. Therefore, selection-based genetic models remain the most parsimonious explanations for the observed chromosomal distribution of retrogenes.
PMCID: PMC3532075  PMID: 22950647
19.  Negative Correlation between Expression Level and Evolutionary Rate of Long Intergenic Noncoding RNAs 
Genome Biology and Evolution  2011;3:1390-1404.
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insights into the mode of evolution and the functional range of the lncRNA, they can be compared with much better characterized protein-coding genes. The evolutionary rate of the protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than the genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing), however, the contributions of evolutionary rates and folding to the expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
PMCID: PMC3242500  PMID: 22071789
long noncoding RNA; ncRNA; RNA expression; genomic alignments; introns; RNA folding
20.  Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe 
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe.
The model eukaryote S. pombe features substantial numbers of ncRNAs many of which are antisense regulatory transcripts (ARTs), ncRNAs expressed on the opposing strand to coding sequences.Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation to downregulate the levels of proteins that drive and coordinate sexual differentiation.Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter', it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011).
Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance.
Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci.
At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression.
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation.
If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs.
The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus.
Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome.
Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product.
Repression of Dis1 function by gene deletion or antisense control revealed a key role this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase.
Non-coding transcripts have often been viewed as simple ‘chatter', maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates.
These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but that this lack of a hard and fast boundary is actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell.
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic, locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail' stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter', this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
PMCID: PMC3738847  PMID: 22186733
antisense; meiosis; ncRNA; S. pombe; siRNA
21.  No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly 
It has been suggested that rates of protein evolution are influenced, to a great extent, by the proportion of amino acid residues that are directly involved in protein function. In agreement with this hypothesis, recent work has shown a negative correlation between evolutionary rates and the number of protein-protein interactions. However, the extent to which the number of protein-protein interactions influences evolutionary rates remains unclear. Here, we address this question at several different levels of evolutionary relatedness.
Manually curated data on the number of protein-protein interactions among Saccharomyces cerevisiae proteins was examined for possible correlation with evolutionary rates between S. cerevisiae and Schizosaccharomyces pombe orthologs. Only a very weak negative correlation between the number of interactions and evolutionary rate of a protein was observed. Furthermore, no relationship was found between a more general measure of the evolutionary conservation of S. cerevisiae proteins, based on the taxonomic distribution of their homologs, and the number of protein-protein interactions. However, when the proteins from yeast were assorted into discrete bins according to the number of interactions, it turned out that 6.5% of the proteins with the greatest number of interactions evolved, on average, significantly slower than the rest of the proteins. Comparisons were also performed using protein-protein interaction data obtained with high-throughput analysis of Helicobacter pylori proteins. No convincing relationship between the number of protein-protein interactions and evolutionary rates was detected, either for comparisons of orthologs from two completely sequenced H. pylori strains or for comparisons of H. pylori and Campylobacter jejuni orthologs, even when the proteins were classified into bins by the number of interactions.
The currently available comparative-genomic data do not support the hypothesis that the evolutionary rates of the majority of proteins substantially depend on the number of protein-protein interactions they are involved in. However, a small fraction of yeast proteins with the largest number of interactions (the hubs of the interaction network) tend to evolve slower than the bulk of the proteins.
PMCID: PMC140311  PMID: 12515583
22.  Network Hubs Buffer Environmental Variation in Saccharomyces cerevisiae 
PLoS Biology  2008;6(11):e264.
Regulatory and developmental systems produce phenotypes that are robust to environmental and genetic variation. A gene product that normally contributes to this robustness is termed a phenotypic capacitor. When a phenotypic capacitor fails, for example when challenged by a harsh environment or mutation, the system becomes less robust and thus produces greater phenotypic variation. A functional phenotypic capacitor provides a mechanism by which hidden polymorphism can accumulate, whereas its failure provides a mechanism by which evolutionary change might be promoted. The primary example to date of a phenotypic capacitor is Hsp90, a molecular chaperone that targets a large set of signal transduction proteins. In both Drosophila and Arabidopsis, compromised Hsp90 function results in pleiotropic phenotypic effects dependent on the underlying genotype. For some traits, Hsp90 also appears to buffer stochastic variation, yet the relationship between environmental and genetic buffering remains an important unresolved question. We previously used simulations of knockout mutations in transcriptional networks to predict that many gene products would act as phenotypic capacitors. To test this prediction, we use high-throughput morphological phenotyping of individual yeast cells from single-gene deletion strains to identify gene products that buffer environmental variation in Saccharomyces cerevisiae. We find more than 300 gene products that, when absent, increase morphological variation. Overrepresented among these capacitors are gene products that control chromosome organization and DNA integrity, RNA elongation, protein modification, cell cycle, and response to stimuli such as stress. Capacitors have a high number of synthetic-lethal interactions but knockouts of these genes do not tend to cause severe decreases in growth rate. Each capacitor can be classified based on whether or not it is encoded by a gene with a paralog in the genome. Capacitors with a duplicate are highly connected in the protein–protein interaction network and show considerable divergence in expression from their paralogs. In contrast, capacitors encoded by singleton genes are part of highly interconnected protein clusters whose other members also tend to affect phenotypic variability or fitness. These results suggest that buffering and release of variation is a widespread phenomenon that is caused by incomplete functional redundancy at multiple levels in the genetic architecture.
Author Summary
Most species maintain abundant genetic variation and experience a wide range of environmental conditions, yet phenotypic differences between individuals are usually small. This phenomenon, known as phenotypic robustness, presents an apparent contradiction: if biological systems are so resistant to variation, how do they diverge and adapt through evolutionary time? Here, we address this question by investigating the molecular mechanisms that underlie phenotypic robustness and how these mechanisms can be broken to produce phenotypic heterogeneity. We identify genes that contribute to phenotypic robustness in yeast by analyzing the variance of morphological phenotypes in a comprehensive collection of single-gene knockout strains. We find that ∼5% of yeast genes break phenotypic robustness when knocked out. The products of these genes tend to be involved in critical cellular processes, including maintaining DNA stability, processing RNA, modifying proteins, and responding to stressful environments. These genes tend to interact genetically with a large number of other genes, and their products tend to interact physically with a large number of other gene products. Our results suggest that loss of phenotypic robustness might be a common phenomenon during evolution that occurs when cellular networks are disrupted.
A genome-wide screen inSaccharomyces cerevisiae identifies over 300 gene products that buffer environmental variation--dubbed phenotypic capacitors--and function as hubs in protein-protein and synthetic-lethal interaction networks.
PMCID: PMC2577700  PMID: 18986213
23.  Evolutionary Dynamics of the wnt Gene Family: A Lophotrochozoan Perspective 
Molecular Biology and Evolution  2010;27(7):1645-1658.
The wnt gene family encodes a set of secreted glycoproteins involved in key developmental processes, including cell fate specification and regulation of posterior growth (Cadigan KM, Nusse R. 1997. Wnt signaling: a common theme in animal development. Genes Dev. 11:3286–3305.; Martin BL, Kimelman D. 2009. Wnt signaling and the evolution of embryonic posterior development. Curr Biol. 19:R215–R219.). As for many other gene families, evidence for expansion and/or contraction of the wnt family is available from deuterostomes (e.g., echinoderms and vertebrates [Nusse R, Varmus HE. 1992. Wnt genes. Cell. 69:1073–1087.; Schubert M, Holland LZ, Holland ND, Jacobs DK. 2000. A phylogenetic tree of the Wnt genes based on all available full-length sequences, including five from the cephalochordate amphioxus. Mol Biol Evol. 17:1896–1903.; Croce JC, Wu SY, Byrum C, Xu R, Duloquin L, Wikramanayake AH, Gache C, McClay DR. 2006. A genome-wide survey of the evolutionarily conserved Wnt pathways in the sea urchin Strongylocentrotus purpuratus. Dev Biol. 300:121–131.]) and ecdysozoans (e.g., arthropods and nematodes [Eisenmann DM. 2005. Wnt signaling. WormBook. 1–17.; Bolognesi R, Farzana L, Fischer TD, Brown SJ. 2008. Multiple Wnt genes are required for segmentation in the short-germ embryo of Tribolium castaneum. Curr Biol. 18:1624–1629.]), but little is known from the third major bilaterian group, the lophotrochozoans (e.g., mollusks and annelids [Prud'homme B, Lartillot N, Balavoine G, Adoutte A, Vervoort M. 2002. Phylogenetic analysis of the Wnt gene family. Insights from lophotrochozoan members. Curr Biol. 12:1395.]). To obtain a more comprehensive scenario of the evolutionary dynamics of this gene family, we exhaustively mined wnt gene sequences from the whole genome assemblies of a mollusk (Lottia gigantea) and two annelids (Capitella teleta and Helobdella robusta) and examined them by phylogenetic, genetic linkage, intron–exon structure, and embryonic expression analyses. The 36 wnt genes obtained represent 11, 12, and 9 distinct wnt subfamilies in Lottia, Capitella, and Helobdella, respectively. Thus, two of the three analyzed lophotrochozoan genomes retained an almost complete ancestral complement of wnt genes emphasizing the importance and complexity of this gene family across metazoans. The genome of the leech Helobdella reflects significantly more dynamism than those of Lottia and Capitella, as judged by gene duplications and losses, branch length, and changes in genetic linkage. Finally, we performed a detailed expression analysis for all the Helobdella wnt genes during embryonic development. We find that, although the patterns show substantial overlap during early cleavage stages, each wnt gene has a unique expression pattern in the germinal plate and during tissue morphogenesis. Comparisons of the embryonic expression patterns of the duplicated wnt genes in Helobdella with their orthologs in Capitella reveal extensive regulatory diversification of the duplicated leech wnt genes.
PMCID: PMC2912473  PMID: 20176615
wnt family genes; lophotrochozoan genomes; gene duplication and diversification; annelid; leech; polychaete
24.  Short-term memory in gene induction reveals the regulatory principle behind stochastic IL-4 expression 
Combining experiments on primary T cells and mathematical modeling, we characterized the stochastic expression of the interleukin-4 cytokine gene in its physiologic context, showing that a two-step model of transcriptional regulation acting on chromatin rearrangement and RNA polymerase recruitment accounts for the level, kinetics, and population variability of expression.A rate-limiting step upstream of transcription initiation, but occurring at the level of an individual allele, controls whether the interleukin-4 gene is expressed during antigenic stimulation, suggesting that the observed stochasticity of expression is linked to the dynamics of chromatin rearrangement.The computational analysis predicts that the probability to re-express an interleukin-4 gene that has been expressed once is transiently increased. In support, we experimentally demonstrate a short-term memory for interleukin-4 expression at the predicted time scale of several days.The model provides a unifying framework that accounts for both graded and binary modes of gene regulation. Graded changes in expression level can be achieved by controlling transcription initiation, whereas binary regulation acts at the level of chromatin rearrangement and is targeted during the differentiation of T cells that specialize in interleukin-4 production.
Cell populations are typically heterogeneous with respect to protein expression even when clonally derived from a single progenitor. In bacteria and yeast, such heterogeneity has been shown to be due to intrinsically stochastic dynamics of gene expression (Raj and van Oudenaarden, 2008). Thus, cross-population heterogeneity may be an unavoidable by-product of random fluctuations in molecular interactions (Raser and O'Shea, 2004; Pedraza and van Oudenaarden, 2005). The phenotypic variability deriving from it may also be beneficial for cell function, differentiation, or adaptation to changing environments (Chang et al, 2008; Feinerman et al, 2008; Losick and Desplan, 2008). However, little is known about how gene-expression variability is caused in mammalian cells.
Two principal modes of gene regulation have been identified: graded and binary. In the graded mode, transcriptional regulators can tune the level of a gene product in a continuous manner (Hazzalin and Mahadevan, 2002). In the binary mode, the gene is expressed at an invariant level, whereas its probability of being expressed in a given cell is regulated, so that the gene has discrete ‘on' and ‘off' states (Walters et al, 1995; Hume, 2000; Biggar and Crabtree, 2001). In humans and mice, cytokine genes are expressed in a binary manner (Bix and Locksley, 1998; Riviere et al, 1998; Hu-Li et al, 2001; Apostolou and Thanos, 2008). A particularly well-studied case is the interleukin-4 (il4) gene that is critical for antibody-based immune responses. This gene is expressed by antigen-stimulated T cells initially with low probability, so that in most IL-4-positive cells only one allele is active (Bix and Locksley, 1998; Riviere et al, 1998). The expressed allele is not imprinted but chosen stochastically during each cell stimulation (Hu-Li et al, 2001).
Here, we have studied the dynamics of IL-4 expression quantitatively. Primary murine CD4+ T cells have been differentiated uniformly into type-2 T-helper (Th2) cells that express the lineage-specifying transcription factor (TF) Gata-3 and are competent to activate the il4 gene upon challenge with antigen. Using T cells heterozygous for an il4 wild-type allele and an il4 allele with GFP knock-in after the promoter, the alleles are found to be expressed stochastically and in an uncorrelated manner (Figure 2A; Hu-Li et al, 2001). To account for the observed stochastic dynamics of IL-4 expression, we considered a basic model of gene transcription, mRNA translation, turnover, and protein secretion (Figure 2B). However, our experimental estimates of the intracellular life times of IL-4 mRNA and protein (∼1 h) and their absolute numbers (mRNA∼103, protein∼105) rule out random fluctuations in transcription, translation as well as mRNA and protein turnover as an explanation for the observed stochastic properties of IL-4 expression (Thattai and van Oudenaarden, 2001; Paulsson, 2004).
As il4 is known to be strongly regulated at the chromatin level (Ansel et al, 2006), we included in the model a reversible step of chromatin opening that is permissive for transcription (Figure 2C and D). Both chromatin opening and transcription initiation are driven by TFs that are transiently activated during the antigen stimulus, with NFAT1 playing a prominent role (Agarwal et al, 2000; Avni et al, 2002; Guo et al, 2004). The model accounts for the kinetics of NFAT1 TF activity (Figure 2E) (Loh et al, 1996). Using a best-fit procedure for estimating the kinetics of the chromatin transition and TF activity from experimental data, we found that the model accurately reproduces the distribution of IL-4 expression within the cell population over the entire time course of a stimulation (Figure 3A). At the same time, it accounts for the measured kinetics of IL-4 mRNA, intracellular and secreted protein (Figure 3B). Additional data show that the model can also explain IL-4 expression at different stages of Th2 differentiation and upon pharmacological inhibition of NFAT1 activity. In each case, the model predicts a slow and stochastic chromatin opening (Step 1 in Figure 2C) that is the limiting step for the activation of the gene.
The slowness of chromatin opening inferred by the model implies an extended lifetime of the open chromatin state (several days), which lasts longer than TF activity during antigenic stimulation (several hours). This indicates that acute IL-4 expression is terminated by the cessation of TF activity (Step 2 in Figure 2C), rather than by the closing of the chromatin (Step 1). In support of this prediction, we observed an elevated fraction of IL-4-producing cells after secondary stimulations administered within a few days of the primary stimulus. Consistent with the model, this elevation disappeared with a half-life of ∼3 days (Figure 4B). To test whether this ‘short-term memory' for activation of the il4 gene is indeed due to the IL-4 producers in the primary stimulation, we sorted stimulated Th2 cells into viable IL-4-producing and non-producing fractions using the cytokine secretion assay (Ouyang et al, 2000) and cultured them separately for different resting periods. The probability of IL-4 re-expression in the positive-sorted cells was consistently larger than in negative-sorted cells and decreased progressively over several days (Figure 4C). By contrast, the sorted IL-4 negative cells exhibited a constant induction probability indistinguishable from the unsorted population. This behavior was not due to differential cell proliferation in the sorted populations or different success of Th2 differentiation. Moreover, using heterozygous il4-wild-type/il4-gfp cells, and sorting for expression of the wild-type allele, we observed that expression of the il4-gfp allele was similar in IL-4-positive and negative sorted fractions. Taken together, these findings imply that stochastic, slow chromatin changes at individual il4 genes govern the binary expression pattern of this cytokine.
In conclusion, we propose an experimentally based model of inducible gene expression where strong stochasticity arises from slow (hours to days) chromatin opening and closing transitions, rather than being due to small numbers of mRNA or protein molecules or transcriptional bursting (Raj et al, 2006). This rate-limiting step upstream of transcription initiation (which may entail several interacting epigenetic processes) naturally gives rise to a binary expression pattern of the gene. By contrast, regulation at the level of transcription initiation can have a graded effect on the expression level. We provide evidence that both binary and graded regulation can occur for the il4 gene. Physiological regulation of il4 seems to be mainly binary, thus enabling a dose–response within a population while producing an unequivocal all-or-none signal at the single-cell level.
Although cell-to-cell variability has been recognized as an unavoidable consequence of stochasticity in gene expression, it may also serve a functional role for tuning physiological responses within a cell population. In the immune system, remarkably large variability in the expression of cytokine genes has been observed in homogeneous populations of lymphocytes, but the underlying molecular mechanisms are incompletely understood. Here, we study the interleukin-4 gene (il4) in T-helper lymphocytes, combining mathematical modeling with the experimental quantification of expression variability and critical parameters. We show that a stochastic rate-limiting step upstream of transcription initiation, but acting at the level of an individual allele, controls il4 expression. Only a fraction of cells reaches an active, transcription-competent state in the transient time window determined by antigen stimulation. We support this finding by experimental evidence of a previously unknown short-term memory that was predicted by the model to arise from the long lifetime of the active state. Our analysis shows how a stochastic mechanism acting at the chromatin level can be integrated with transcriptional regulation to quantitatively control cell-to-cell variability.
PMCID: PMC2872609  PMID: 20393579
cytokines; cytokine secretion assay; epigenetic regulation; gene expression; stochastic model
25.  Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? 
Numerous studies, using in aggregate some 28 genes, have achieved a consensus in recognizing three groups of plants, including Amborella, as comprising the basal-most grade of all other angiosperms. A major exception is the recent study by Goremykin et al. (2003; Mol. Biol. Evol. 20:1499–1505), whose analyses of 61 genes from 13 sequenced chloroplast genomes of land plants nearly always found 100% support for monocots as the deepest angiosperms relative to Amborella, Calycanthus, and eudicots. We hypothesized that this conflict reflects a misrooting of angiosperms resulting from inadequate taxon sampling, inappropriate phylogenetic methodology, and rapid evolution in the grass lineage used to represent monocots.
We used two main approaches to test this hypothesis. First, we sequenced a large number of chloroplast genes from the monocot Acorus and added these plus previously sequenced Acorus genes to the Goremykin et al. (2003) dataset in order to explore the effects of altered monocot sampling under the same analytical conditions used in their study. With Acorus alone representing monocots, strongly supported Amborella-sister trees were obtained in all maximum likelihood and parsimony analyses, and in some distance-based analyses. Trees with both Acorus and grasses gave either a well-supported Amborella-sister topology or else a highly unlikely topology with 100% support for grasses-sister and paraphyly of monocots (i.e., Acorus sister to "dicots" rather than to grasses). Second, we reanalyzed the Goremykin et al. (2003) dataset focusing on methods designed to account for rate heterogeneity. These analyses supported an Amborella-sister hypothesis, with bootstrap support values often conflicting strongly with cognate analyses performed without allowing for rate heterogeneity. In addition, we carried out a limited set of analyses that included the chloroplast genome of Nymphaea, whose position as a basal angiosperm was also, and very recently, challenged.
These analyses show that Amborella (or Amborella plus Nymphaea), but not monocots, is the sister group of all other angiosperms among this limited set of taxa and that the grasses-sister topology is a long-branch-attraction artifact leading to incorrect rooting of angiosperms. These results highlight the danger of having lots of characters but too few and, especially, molecularly divergent taxa, a situation long recognized as potentially producing strongly misleading molecular trees. They also emphasize the importance in phylogenetic analysis of using appropriate evolutionary models.
PMCID: PMC543456  PMID: 15453916

Results 1-25 (1296609)