Ciliates are an ancient and diverse group of microbial eukaryotes that have emerged as powerful models for RNA-mediated epigenetic inheritance. They possess extensive sets of both tiny and long noncoding RNAs that, together with a suite of proteins that includes transposases, orchestrate a broad cascade of genome rearrangements during somatic nuclear development. This Review emphasizes three important themes: the remarkable role of RNA in shaping genome structure, recent discoveries that unify many deeply diverged ciliate genetic systems, and a surprising evolutionary “sign change” in the role of small RNAs between major species groups.
Organisms represented by the root of the universal evolutionary tree were most likely complex cells with a sophisticated protein translation system and a DNA genome encoding hundreds of genes. The growth of bioinformatics data from taxonomically diverse organisms has made it possible to infer the likely properties of early life in greater detail. Here we present LUCApedia (http://eeb.princeton.edu/lucapedia), a unified framework for simultaneously evaluating multiple data sets related to the Last Universal Common Ancestor (LUCA) and its predecessors. This unification is achieved by mapping eleven such data sets onto UniProt, KEGG and BioCyc IDs. LUCApedia may be used to rapidly acquire evidence that a certain gene or set of genes is ancient, to examine the early evolution of metabolic pathways, or to test specific hypotheses related to ancient life by corroborating them against the rest of the database.
Genome duality in ciliated protozoa offers a unique system to showcase their epigenome as a model of inheritance. In Oxytricha, the somatic genome is responsible for vegetative growth, while the germline contributes DNA to the next sexual generation. Somatic nuclear development removes all transposons and other so-called “junk DNA”, which comprise ~95% of the germline. We demonstrate that Piwi-interacting small RNAs (piRNAs) from the maternal nucleus can specify genomic regions for retention in this process. Oxytricha piRNAs map primarily to the somatic genome, representing the ~5% of the germline that is retained. Furthermore, injection of synthetic piRNAs corresponding to normally-deleted regions leads to their retention in later generations. Our findings highlight small RNAs (sRNAs) as powerful transgenerational carriers of epigenetic information for genome programming.
Several independent lines of evidence suggest that the modern genetic system was preceded by the ‘RNA world’ in which RNA genes encoded RNA catalysts. Current gaps in our conceptual framework of early genetic systems make it difficult to imagine how a stable RNA genome may have functioned and how the transition to a DNA genome could have taken place. Here we use the single-celled ciliate, Oxytricha, as an analog to some of the genetic and genomic traits that may have been present in organisms before and during the establishment of a DNA genome. Oxytricha and its close relatives have a unique genome architecture involving two differentiated nuclei, one of which encodes the genome on small, linear nanochromosomes. While its unique genomic characteristics are relatively modern, some physiological processes related to the genomes and nuclei of Oxytricha may exemplify primitive states of the developing genetic system.
Ciliated protists rearrange their genomes dramatically during nuclear development via chromosome fragmentation and DNA deletion to produce a trimmer and highly reorganized somatic genome. The deleted portion of the genome includes potentially active transposons or transposon-like sequences that reside in the germline. Three independent studies recently showed that transposase proteins of the DDE/DDD superfamily are indispensable for DNA processing in three distantly related ciliates. In the spirotrich Oxytricha trifallax, high copy-number germline-limited transposons mediate their own excision from the somatic genome but also contribute to programmed genome rearrangement through a remarkable transposon mutualism with the host. By contrast, the genomes of two oligohymenophorean ciliates, Tetrahymena thermophila and Paramecium tetraurelia, encode homologous PiggyBac-like transposases as single-copy genes in both their germline and somatic genomes. These domesticated transposases are essential for deletion of thousands of different internal sequences in these species. This review contrasts the events underlying somatic genome reduction in three different ciliates and considers their evolutionary origins and the relationships among their distinct mechanisms for genome remodeling.
Oxytricha trifallax — an established model organism for studying genome rearrangements, chromosome structure, scrambled genes, RNA-mediated epigenetic inheritance, and other phenomena — has been the subject of a nomenclature controversy for several years. Originally isolated as a sibling species of O. fallax, O. trifallax was reclassified in 1999 as Sterkiella histriomuscorum, a previously identified species, based on morphological similarity. The proper identification of O. trifallax is crucial to resolve in order to prevent confusion in both the comparative genomics and the general scientific communities. We analyzed nine conserved nuclear gene sequences between the two given species and several related ciliates. Phylogenetic analyses suggest that O. trifallax and a bona fide S. histriomuscorum have accumulated significant evolutionary divergence from each other relative to other ciliates such that they should be unequivocally classified as separate species. We also describe the original isolation of O. trifallax, including its comparison to O. fallax, and we provide criteria to identify future isolates of O. trifallax.
Oxytricha fallax; Oxytricha trifallax; Sterkiella histriomuscorum; ciliate; spirotrich; hypotrich; evolution; phylogeny; concatenated tree
With more chromosomes than any other sequenced genome, the macronuclear genome of Oxytricha trifallax has a unique and complex architecture, including alternative fragmentation and predominantly single-gene chromosomes.
The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor “silent” germline micronuclear genome by a process of “unscrambling” and fragmentation. The tiny macronuclear “nanochromosomes” typically encode single, protein-coding genes (a small portion, 10%, encode 2–8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report the high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that vary from 469 bp to 66 kb long (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes of eukaryotes. Comparison to other ciliates with nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease.
The macronuclear genome of the ciliate Oxytricha trifallax, contained in its somatic nucleus, has a unique genome architecture. Unlike its diploid germline genome, which is transcriptionally inactive during normal cellular growth, the macronuclear genome is fragmented into at least 16,000 tiny (∼3.2 kb mean length) chromosomes, most of which encode single actively transcribed genes and are differentially amplified to a few thousand copies each. The smallest chromosome is just 469 bp, while the largest is 66 kb and encodes a single enormous protein. We found considerable variation in the genome, including frequent alternative fragmentation patterns, generating chromosome isoforms with shared sequence. We also found limited variation in chromosome amplification levels, though insufficient to explain mRNA transcript level variation. Another remarkable feature of Oxytricha's macronuclear genome is its inordinate fondness for telomeres. In conjunction with its possession of tens of millions of chromosome-ending telomeres per macronucleus, we show that Oxytricha has evolved multiple putative telomere-binding proteins. In addition, we identified two new domesticated transposase-like protein classes that we propose may participate in the process of genome rearrangement. The macronuclear genome now provides a crucial resource for ongoing studies of genome rearrangement processes that use Oxytricha as an experimental or comparative model.
Interchromosomal chimeric RNA molecules are often transcription products from genomic rearrangement in cancerous cells. Here we report the computational detection of an interchromosomal RNA fusion between ZC3HAV1L and CHMP1A from RNA-seq data of normal human mammary epithelial cells, and experimental confirmation of the chimeric transcript in multiple human cells and tissues. Our experimental characterization also detected three variants of the ZC3HAV1L-CHMP1A chimeric RNA, suggesting that these genes are involved in complex splicing. The fusion sequence at the novel exon-exon boundary, and the absence of corresponding DNA rearrangement suggest that this chimeric RNA is likely produced by trans-splicing in human cells.
This article was reviewed by Rory Johnson (nominated by Fyodor Kondrashov); Gal Avital and Itai Yanai
Chimeric transcripts; RNA fusion; trans-splicing; Genome rearrangement
RNA, normally thought of as a conduit in gene expression, has a novel mode of action in ciliated protozoa. Maternal RNA templates provide both an organizing guide for DNA rearrangements and a template that can transport somatic mutations to the next generation. This opportunity for RNA-mediated genome rearrangement and DNA repair is profound in the ciliate Oxytricha, which deletes 95% of its germline genome during development in a process that severely fragments its chromosomes and then sorts and reorders the hundreds of thousands of pieces remaining. Oxytricha’s somatic nuclear genome is therefore an epigenome formed through RNA templates and signals arising from the previous generation. Furthermore, this mechanism of RNA-mediated epigenetic inheritance can function across multiple generations, and the discovery of maternal template RNA molecules has revealed new biological roles for RNA and has hinted at the power of RNA molecules to sculpt genomic information in cells.
noncoding RNA; maternal inheritance; Lamarckian inheritance; scrambled genes; ciliates; Oxytricha
In a process similar to exon splicing, ciliates use DNA splicing to produce a new somatic macronuclear genome from their germline micronuclear genome after sexual reproduction. This extra layer of DNA rearrangement permits novel mechanisms to create genetic complexity during both evolution and development. Here we describe a chimeric macronuclear chromosome in Oxytricha trifallax constructed from two smaller macronuclear chromosomes. To determine how the chimera was generated, we cloned and sequenced the corresponding germline loci. The chimera derives from a novel locus in the micronucleus that arose by partial duplication of the loci for the two smaller chromosomes. This suggests that an exon shuffling-like process, which we call MDS shuffling, enables ciliates to generate novel genetic material and gene products using different combinations of genomic DNA segments.
Ciliate; Duplication; Gene shuffling; Chimeric chromosome; Micronucleus; Oxytricha
Although transposons comprise much of the eukaryotic genome, few are active, and they usually confer no benefit to the host. Through an exaggerated process of genome rearrangement, Oxytricha trifallax destroys 95% of its germline genome during development. This includes the elimination of all transposon DNA. We show that germline-limited transposase genes play key roles in this process of genome-wide DNA excision, which suggests that transposases function in large eukaryotic genomes containing thousands of active transposons. We show that transposase gene expression occurs during germline-soma differentiation and that silencing of transposase by RNA interference leads to abnormal DNA rearrangement in the offspring. This study suggests a new important role in Oxytricha for this large portion of genomic DNA that was previously thought of as junk.
Cytosine methylation of DNA is conserved across eukaryotes and plays important functional roles regulating gene expression during differentiation and development in animals, plants and fungi. Hydroxymethylation was recently identified as another epigenetic modification marking genes important for pluripotency in embryonic stem cells.
Here we describe de novo cytosine methylation and hydroxymethylation in the ciliate Oxytricha trifallax. These DNA modifications occur only during nuclear development and programmed genome rearrangement. We detect methylcytosine and hydroxymethylcytosine directly by high-resolution nano-flow UPLC mass spectrometry, and indirectly by immunofluorescence, methyl-DNA immunoprecipitation and bisulfite sequencing. We describe these modifications in three classes of eliminated DNA: germline-limited transposons and satellite repeats, aberrant DNA rearrangements, and DNA from the parental genome undergoing degradation. Methylation and hydroxymethylation generally occur on the same sequence elements, modifying cytosines in all sequence contexts. We show that the DNA methyltransferase-inhibiting drugs azacitidine and decitabine induce demethylation of both somatic and germline sequence elements during genome rearrangements, with consequent elevated levels of germline-limited repetitive elements in exconjugant cells.
These data strongly support a functional link between cytosine DNA methylation/hydroxymethylation and DNA elimination. We identify a motif strongly enriched in methylated/hydroxymethylated regions, and we propose that this motif recruits DNA modification machinery to specific chromosomes in the parental macronucleus. No recognizable methyltransferase enzyme has yet been described in O. trifallax, raising the possibility that it might employ a novel cytosine methylation machinery to mark DNA sequences for elimination during genome rearrangements.
epigenetics; DNA degradation; heterochromatin; methyltransferase; 5-Aza-2'-deoxycytidine; 5-azacytidine; azacitidine; decitabine
Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts responses (slope and intercept of the regression line on genome GC content) of individual codons and amino acids to genome composition.
Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure.
Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.
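The two predictors named above (a codon's GC content relative to its synonyms, and the mean GC content of an amino acid's codons) can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function names are hypothetical, and it does not reproduce the regression analysis itself.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_count(codon):
    """Number of G or C bases in a codon (0 to 3)."""
    return sum(base in "GC" for base in codon)


def mean_codon_gc(amino_acid):
    """Mean GC count over all codons that encode one amino acid."""
    family = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    return sum(gc_count(c) for c in family) / len(family)


def relative_gc(codon):
    """GC count of a codon minus the mean GC count of its synonymous family."""
    return gc_count(codon) - mean_codon_gc(CODON_TABLE[codon])


if __name__ == "__main__":
    for codon in ("GCC", "GCT", "AAA", "ATG"):
        print(codon, CODON_TABLE[codon], relative_gc(codon))
```

Under this sketch, codons with a positive relative GC value would be the ones expected to become more frequent as genome GC content rises, and amino acids with a high mean codon GC would be expected to respond in the same direction.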
The Oxytricha trifallax mitochondrial genome contains the largest sequenced ciliate mitochondrial chromosome (∼70 kb) plus a ∼5-kb linear plasmid bearing mitochondrial telomeres. We identify two new ciliate split genes (rps3 and nad2) as well as four new mitochondrial genes (ribosomal small subunit protein genes: rps- 2, 7, 8, 10), previously undetected in ciliates due to their extreme divergence. The increased size of the Oxytricha mitochondrial genome relative to other ciliates is primarily a consequence of terminal expansions, rather than the retention of ancestral mitochondrial genes. Successive segmental duplications, visible in one of the two Oxytricha mitochondrial subterminal regions, appear to have contributed to the genome expansion. Consistent with pseudogene formation and decay, the subtermini possess shorter, more loosely packed open reading frames than the remainder of the genome. The mitochondrial plasmid shares a 251-bp region with 82% identity to the mitochondrial chromosome, suggesting that it most likely integrated into the chromosome at least once. This region on the chromosome is also close to the end of the most terminal member of a series of duplications, hinting at a possible association between the plasmid and the duplications. The presence of mitochondrial telomeres on the mitochondrial plasmid suggests that such plasmids may be a vehicle for lateral transfer of telomeric sequences between mitochondrial genomes. We conjecture that the extreme divergence observed in ciliate mitochondrial genomes may be due, in part, to repeated invasions by relatively error-prone DNA polymerase-bearing mobile elements.
split genes; segmental duplication; genome expansion; linear mitochondrial plasmid; mobile elements; extreme mitochondrial divergences
Genome-wide DNA rearrangements occur in many eukaryotes but are most exaggerated in ciliates, making them ideal model systems for epigenetic phenomena. During development of the somatic macronucleus, Oxytricha trifallax destroys 95% of its germ line, severely fragmenting its chromosomes, and then unscrambles hundreds of thousands of remaining fragments by permutation or inversion. Here we demonstrate that DNA or RNA templates can orchestrate these genome rearrangements in Oxytricha, supporting an epigenetic model for sequence-dependent comparison between germline and somatic genomes. A complete RNA cache of the maternal somatic genome may be available at a specific stage during development to provide a template for correct and precise DNA rearrangement. We show the existence of maternal RNA templates that could guide DNA assembly, and that disruption of specific RNA molecules disables rearrangement of the corresponding gene. Injection of artificial templates reprogrammes the DNA rearrangement pathway, suggesting that RNA molecules guide genome rearrangement.
Nyctotherus ovalis is a single-celled eukaryote that has hydrogen-producing mitochondria and lives in the hindgut of cockroaches. Like all members of the ciliate taxon, it has two types of nuclei, a micronucleus and a macronucleus. N. ovalis generates its macronuclear chromosomes by forming polytene chromosomes that subsequently develop into macronuclear chromosomes by DNA elimination and rearrangement.
We examined the structure of these gene-sized macronuclear chromosomes in N. ovalis. We determined the telomeres, subtelomeric regions, UTRs, coding regions and introns by sequencing a large set of macronuclear DNA sequences (4,242) and cDNAs (5,484) and comparing them with each other. The telomeres consist of repeats CCC(AAAACCCC)n, similar to those in spirotrichous ciliates such as Euplotes, Sterkiella (Oxytricha) and Stylonychia. Per sequenced chromosome we found evidence for either a single protein-coding gene, a single tRNA, or the complete ribosomal RNA cluster. Hence the chromosomes appear to encode single transcripts. In the short subtelomeric regions we identified a few overrepresented motifs that could be involved in gene regulation, but there is no consensus polyadenylation site. The introns are short (21–29 nucleotides), and a significant fraction (1/3) of the tiny introns is conserved in the distantly related ciliate Paramecium tetraurelia. As has been observed in P. tetraurelia, the N. ovalis introns tend to contain in-frame stop codons or have a length that is not divisible by three. This pattern causes premature termination of mRNA translation in the event of intron retention, and potentially degradation of unspliced mRNAs by the nonsense-mediated mRNA decay pathway.
The combination of short leaders, tiny introns and single genes leads to very minimal macronuclear chromosomes. The smallest we identified contained only 150 nucleotides.
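The intron property described above (an in-frame stop codon, or a length that is not a multiple of three) can be checked with a short Python sketch. The function name, the `phase` parameter and the default stop-codon set are assumptions made for illustration; ciliates use variant genetic codes, so the stop codons should be set to match the organism under study. Codons that span the exon-intron boundary are ignored here.

```python
def disrupts_reading_frame(intron, phase=0, stops=("TAA", "TAG", "TGA")):
    """Return True if retaining this intron would be expected to break translation.

    intron: intron DNA sequence.
    phase:  number of bases of the interrupted codon that precede the intron (0, 1 or 2).
    stops:  stop codons to test (assumed default: the standard genetic code).
    """
    intron = intron.upper()
    # A length that is not a multiple of three shifts the downstream frame.
    frameshift = len(intron) % 3 != 0
    # The first complete in-frame codon inside the intron starts at this offset.
    start = (3 - phase) % 3
    in_frame_stop = any(
        intron[i:i + 3] in stops for i in range(start, len(intron) - 2, 3)
    )
    return frameshift or in_frame_stop


if __name__ == "__main__":
    print(disrupts_reading_frame("GTCAAACCCAAACCCAAACAG"))   # 21 nt, no in-frame stop
    print(disrupts_reading_frame("GTCTAACCCAAACCCAAACAG"))   # 21 nt, in-frame TAA
    print(disrupts_reading_frame("GTCAAACCCAAACCCAAACCAG"))  # 22 nt, frameshift
```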
Programmed DNA elimination and reorganization frequently occur during cellular differentiation. Development of the somatic macronucleus in some ciliates presents an extreme case, involving excision of internal eliminated sequences (IESs) that interrupt coding DNA segments (macronuclear destined sequences, MDSs), as well as removal of transposon-like elements and extensive genome fragmentation, leading to 98% genome reduction in Stylonychia lemnae. Approximately 20–30% of the genes are estimated to be scrambled in the germline micronucleus, with coding segment order permuted and present in either orientation on micronuclear chromosomes. Massive genome rearrangements are therefore critical for development.
To understand the process of DNA deletion and reorganization during macronuclear development, we examined the population of DNA molecules during assembly of different scrambled genes in two related organisms in a developmental time-course by PCR. The data suggest that removal of conventional IESs usually occurs first, accompanied by a surprising level of error at this step. The complex events of inversion and translocation seem to occur after repair and excision of all conventional IESs and via multiple pathways.
This study reveals a temporal order of DNA rearrangements during the processing of a scrambled gene, with simpler events usually preceding more complex ones. The surprising observation of a hidden layer of errors, absent from the mature macronucleus but present during development, also underscores the need for repair or screening of incorrectly-assembled DNA molecules.
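As a way to picture what unscrambling must accomplish, the following Python sketch reorders permuted MDSs and reverse-complements inverted ones. It is a toy model only: the tuple layout and function names are hypothetical, IESs are assumed to have been removed already, and the pointer repeats, errors and multiple pathways discussed above are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def unscramble(segments):
    """Assemble a macronuclear sequence from scrambled germline MDSs.

    segments: list of (mds_number, inverted, sequence) tuples in germline order,
              where mds_number is the position of the segment in the mature gene
              and inverted marks segments stored on the opposite strand.
    """
    ordered = sorted(segments, key=lambda segment: segment[0])
    return "".join(
        reverse_complement(seq) if inverted else seq
        for _, inverted, seq in ordered
    )


if __name__ == "__main__":
    germline = [(3, False, "GGG"), (1, False, "ATG"), (2, True, "GGTT")]
    print(unscramble(germline))
```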
We present BLAST on Orthologous groups (BLASTO), a modified BLAST tool for searching orthologous group data. It treats each orthologous group as a unit and outputs a ranked list of orthologous groups instead of single sequences. By filtering out redundancy and putative paralogs, sequence comparisons to orthologous groups, instead of to single sequences in the database, can improve both functional prediction and phylogenetic inference. BLASTO computes the significance score of each orthologous group based on the individual BLAST hits in the orthologous group, using the number of taxa in the group as an optional weight. This allows users to control the species diversity of the orthologous groups. BLASTO incorporates the best-known multispecies ortholog databases, including NCBI Clusters of Orthologous Group, NCBI euKaryotic Orthologous Group database, OrthoMCL, MultiParanoid and TIGR Eukaryotic Gene Orthologues database, and offers a useful platform to integrate orthology information into functional inference and evolutionary studies of individual sequences. BLASTO is accessible online at http://oxytricha.princeton.edu/BlastO.
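The idea of scoring orthologous groups rather than single sequences can be sketched in Python as follows. The scoring rule used here (mean bit score of the hits in a group, optionally multiplied by the number of distinct taxa) is an assumption chosen for illustration and is not BLASTO's actual formula; the function name and the hit tuple layout are likewise hypothetical.

```python
from collections import defaultdict


def rank_groups(hits, weight_by_taxa=False):
    """Rank orthologous groups from individual BLAST hits.

    hits: iterable of (group_id, taxon, bit_score) tuples.
    weight_by_taxa: if True, multiply each group score by its number of distinct taxa.
    Returns a list of (group_id, score) sorted from best to worst.
    """
    scores = defaultdict(list)
    taxa = defaultdict(set)
    for group_id, taxon, bit_score in hits:
        scores[group_id].append(bit_score)
        taxa[group_id].add(taxon)

    ranked = []
    for group_id, values in scores.items():
        score = sum(values) / len(values)  # assumed rule: mean bit score
        if weight_by_taxa:
            score *= len(taxa[group_id])
        ranked.append((group_id, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example_hits = [
        ("OG1", "human", 100.0),
        ("OG1", "mouse", 80.0),
        ("OG2", "yeast", 120.0),
    ]
    print(rank_groups(example_hits))
    print(rank_groups(example_hits, weight_by_taxa=True))
```

In this toy example the taxon weight changes the ranking, which is the kind of control over species diversity that the optional weight is described as providing.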
The somatic DNA molecules of spirotrichous ciliates are present as linear chromosomes containing mostly single-gene coding sequences with short 5' and 3' flanking regions. Only a few conserved motifs have been found in the flanking DNA. Motifs that may play roles in promoting and/or regulating transcription have not been consistently detected. Moreover, comparing subtelomeric regions of 1,356 end-sequenced somatic chromosomes failed to identify more putatively conserved motifs.
We sequenced and compared DNA and RNA versions of the DNA polymerase α (pol α) gene from nine diverged spirotrichous ciliates. We identified a G-C rich motif aaTACCGC(G/C/T) upstream from transcription start sites in all nine pol α orthologs. Furthermore, we consistently found likely polyadenylation signals, similar to the eukaryotic consensus AAUAAA, within 35 nt upstream of the polyadenylation sites. Numbers of introns differed among orthologs, suggesting independent gain or loss of some introns during the evolution of this gene. Finally, we discuss the occurrence of short direct repeats flanking some introns in the DNA pol α genes. These introns flanked by direct repeats resemble a class of DNA sequences called internal eliminated sequences (IES) that are deleted from ciliate chromosomes during development.
Our results suggest that conserved motifs are present at both 5' and 3' untranscribed regions of the DNA pol α genes in nine spirotrichous ciliates. We also show that several independent gains and losses of introns in the DNA pol α genes have occurred in the spirotrichous ciliate lineage. Finally, our statistical results suggest that proven introns might also function in an IES removal pathway. This could strengthen a recent hypothesis that introns evolve into IESs, explaining the scarcity of introns in spirotrichs. Alternatively, the analysis suggests that ciliates might occasionally use intron splicing to correct, at the RNA level, failures in IES excision during developmental DNA elimination.
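A search for the upstream motif aaTACCGC(G/C/T) reported above can be sketched with a regular expression in Python. The sketch assumes that the lower-case "aa" marks weakly conserved positions and therefore matches only the core TACCGC followed by G, C or T; the function name and example sequences are hypothetical.

```python
import re

# Core of the reported motif: TACCGC followed by G, C or T.
# The leading lower-case "aa" is treated as weakly conserved and not required (assumption).
MOTIF = re.compile(r"TACCGC[GCT]")


def find_motif(upstream):
    """Return 0-based start positions of the core motif in an upstream sequence."""
    return [match.start() for match in MOTIF.finditer(upstream.upper())]


if __name__ == "__main__":
    print(find_motif("ttaaTACCGCGttt"))
    print(find_motif("AATACCGCATT"))
```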
This article was reviewed by Dr. Alexei Fedorov (referred by Dr. Manyuan Long), Dr. Martin A. Huynen and Dr. John M. Logsdon.
We present a bioinformatic web server (SWAKK) for detecting amino acid sites or regions of a protein under positive selection. It estimates the ratio of non-synonymous to synonymous substitution rates (KA/KS) between a pair of protein-coding DNA sequences, by sliding a 3D window, or sphere, across one reference structure. The program displays the results on the 3D protein structure. In addition, for comparison or when a reference structure is unavailable, the server can also perform a sliding window analysis on the primary sequence. The SWAKK web server is available at .
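The primary-sequence mode of a sliding-window analysis can be illustrated with a simplified Python sketch. It only counts non-synonymous and synonymous codon differences per window between two aligned, gap-free coding sequences; it does not normalize by the number of sites or correct for multiple substitutions, so the counts are not KA/KS estimates, and the 3D window is not reproduced. The standard genetic code, the function names and the default window size are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def sliding_counts(seq_a, seq_b, window=3):
    """Count codon differences in each window of two aligned coding sequences.

    Returns a list of (window_start_codon, nonsynonymous, synonymous) tuples.
    """
    pairs = list(zip(codons(seq_a), codons(seq_b)))
    results = []
    for start in range(len(pairs) - window + 1):
        nonsyn = syn = 0
        for a, b in pairs[start:start + window]:
            if a == b:
                continue
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn += 1
            else:
                nonsyn += 1
        results.append((start, nonsyn, syn))
    return results


if __name__ == "__main__":
    print(sliding_counts("ATGGCTAAAGGT", "ATGGCCAGAGGT", window=2))
```

Windows in which non-synonymous differences dominate would be candidates for closer inspection with a proper KA/KS estimator.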