The visualization of massive datasets, such as those resulting from comparative metatranscriptome analyses or the analysis of microbial population structures using ribosomal RNA sequences, is a challenging task. We developed a new method called CoVennTree (Comparative weighted Venn Tree) that simultaneously compares up to three multifarious datasets by aggregating and propagating information from the bottom to the top level and produces a graphical output in Cytoscape. With the introduction of weighted Venn structures, the contents and relationships of various datasets can be correlated and simultaneously aggregated without losing information. We demonstrate the suitability of this approach using a dataset of 16S rDNA sequences obtained from microbial populations at three different depths of the Gulf of Aqaba in the Red Sea. CoVennTree has been integrated into the Galaxy ToolShed and can be directly downloaded and integrated into the user instance.
CoVennTree; weighted Venn diagram; VDS value; massive comparative analysis; rooted tree
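The bottom-up aggregation described in the abstract can be illustrated with a minimal sketch. It assumes a rooted tree given as a child-to-parent map and per-leaf counts for three datasets; the function name `aggregate_counts` and the toy taxonomy are hypothetical, and the sketch does not compute the VDS value or any Cytoscape output.

```python
from collections import defaultdict


def aggregate_counts(parent, leaf_counts):
    """Propagate per-dataset counts from the leaves up to the root.

    parent:      dict mapping each node to its parent (the root maps to None)
    leaf_counts: dict mapping a leaf to a tuple of counts, one per dataset
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for leaf, counts in leaf_counts.items():
        node = leaf
        # Walk from the leaf to the root, adding the leaf's counts at every level.
        while node is not None:
            for i, value in enumerate(counts):
                totals[node][i] += value
            node = parent[node]
    return {node: tuple(values) for node, values in totals.items()}


# Hypothetical toy taxonomy with counts for three samples.
parent = {
    "root": None,
    "Bacteria": "root",
    "Archaea": "root",
    "Cyanobacteria": "Bacteria",
    "Proteobacteria": "Bacteria",
}
leaf_counts = {
    "Cyanobacteria": (10, 4, 0),
    "Proteobacteria": (5, 5, 5),
    "Archaea": (0, 1, 9),
}
totals = aggregate_counts(parent, leaf_counts)
print(totals["Bacteria"], totals["root"])
```

Each inner node then carries the summed counts of its subtree for every dataset, which is the kind of information a weighted Venn structure at that node would be drawn from.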
RNA-seq and especially differential RNA-seq-type transcriptomic analyses (dRNA-seq) are powerful analytical tools, as they not only provide insights into gene expression changes but also provide detailed information about all promoters active at a given moment, effectively giving a deep insight into the transcriptional landscape. Synechocystis sp. PCC 6803 (Synechocystis 6803) is a unicellular model cyanobacterium that is widely used in research fields from ecology, photophysiology to systems biology, modelling and biotechnology. Here, we analysed the response of the Synechocystis 6803 primary transcriptome to different, environmentally relevant stimuli. We established genome-wide maps of the transcriptional start sites active under 10 different conditions relevant for photosynthetic growth and identified 4,091 transcriptional units, which provide information about operons, 5′ and 3′ untranslated regions (UTRs). Based on a unique expression factor, we describe regulons and relevant promoter sequences at single-nucleotide resolution. Finally, we report several sRNAs with an intriguing expression pattern and therefore likely function, specific for carbon depletion (CsiR1), nitrogen depletion (NsiR4), phosphate depletion (PsiR1), iron stress (IsaR1) or photosynthesis (PsrR1). This dataset is accompanied by comprehensive information providing extensive visualization and data access to allow an easy-to-use approach for the design of experiments, the incorporation into modelling studies of the regulatory system and for comparative analyses.
comparative transcriptome analysis; cyanobacteria; regulation of gene expression; sRNA; transcriptional unit
An RNA-based screen was performed to reveal a possible evolutionary scenario for the CRISPR-Cas systems in two cyanobacterial model strains. Following the analysis of a draft genome sequence of Synechocystis sp PCC6714, three different CRISPR-Cas systems were characterized that have different degrees of relatedness to another three CRISPR-Cas systems in Synechocystis sp PCC6803. A subtype III-B system was identified that is extremely conserved between both strains. Strong signals in northern hybridizations and the presence of different spacers (but identical repeats) indicated this system to be active, despite the absence of a known endonuclease candidate gene involved in the maturation of its crRNAs in the two strains. The other two systems were found to differ significantly from each other, with different sets of repeat-spacer arrays and different Cas genes. In view of the otherwise very close relatedness of the two analyzed strains, this is suggestive of an unknown mechanism involved in the replacement of CRISPR-Cas cassettes as a whole. Further RNA analyses revealed the accumulation of crRNAs to be impacted by environmental conditions critical for photoautotrophic growth. All six systems are associated with a gene for a possible transcriptional repressor. Indeed, we identified one of these genes, sll7009, as encoding a negative regulator specific for the CRISPR1 subtype I-D system in Synechocystis sp PCC6803.
comparative genomics; CRISPR; cyanobacteria; defense mechanisms; crRNA maturation; transcriptional regulator
RNA-seq and its variant differential RNA-seq (dRNA-seq) are today routine methods for transcriptome analysis in bacteria. While expression profiling and transcriptional start site prediction are standard tasks today, the problem of identifying transcriptional units in a genome-wide fashion is still not solved for prokaryotic systems.
We present RNAseg, an algorithm for the prediction of transcriptional units based on dRNA-seq data. A key feature of the algorithm is that, based on the data, it distinguishes between transcribed and un-transcribed genomic segments. Furthermore, the program provides many different predictions in a single run, which can be used to infer the significance of transcriptional units in a consensus procedure. We show the performance of our method based on a well-studied dRNA-seq data set for Helicobacter pylori.
With our algorithm it is possible to identify operons and 5’- and 3’-UTRs in an automated fashion. This alleviates the need for labour-intensive manual inspection and enables large-scale studies in the area of comparative transcriptomics.
RNA-seq; Differential RNA-seq; Segmentation; Transcriptional unit; Transcriptome; Transcriptional start site; Dynamic programming
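To make the idea of separating transcribed from un-transcribed segments by dynamic programming concrete, here is a generic two-state segmentation sketch. It is not the RNAseg algorithm, whose scoring scheme the abstract does not specify; the cost function, the `threshold` and `switch_penalty` parameters and both function names are assumptions chosen purely for illustration.

```python
def segment_coverage(coverage, threshold=5.0, switch_penalty=10.0):
    """Label each position 1 (transcribed) or 0 (un-transcribed).

    Assumed cost model: a transcribed label is penalised by how far the
    coverage falls below `threshold`, an un-transcribed label by how far it
    rises above it, and every change of label costs `switch_penalty`.
    """
    n = len(coverage)
    if n == 0:
        return []

    def cost(x, state):
        return max(0.0, threshold - x) if state == 1 else max(0.0, x - threshold)

    best = [[cost(coverage[0], 0), cost(coverage[0], 1)]]
    back = [[0, 1]]
    for i in range(1, n):
        row, ptr = [], []
        for state in (0, 1):
            stay = best[i - 1][state]
            switch = best[i - 1][1 - state] + switch_penalty
            ptr.append(state if stay <= switch else 1 - state)
            row.append(min(stay, switch) + cost(coverage[i], state))
        best.append(row)
        back.append(ptr)

    # Trace back the cheapest labelling from the last position.
    state = 0 if best[-1][0] <= best[-1][1] else 1
    labels = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        labels.append(state)
    labels.reverse()
    return labels


def transcribed_segments(labels):
    """Convert labels into half-open (start, end) intervals of state 1."""
    segments, start = [], None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i
        elif label == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


# Hypothetical read coverage with a short dip inside one transcribed region.
coverage = [0, 0, 1, 20, 25, 3, 22, 30, 0, 0, 0]
print(transcribed_segments(segment_coverage(coverage)))
```

The switch penalty keeps a brief drop in coverage from splitting a unit in two; running such a procedure with several penalty values is one conceivable way to obtain multiple predictions for a consensus step.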
RNA molecules, especially non-coding RNAs, play vital roles in the cell and their biological functions are mostly determined by structural properties. Often, these properties are related to dynamic changes in the structure, as in the case of riboswitches, and thus the analysis of RNA folding kinetics is crucial for their study. Exact approaches to kinetic folding are computationally expensive and, thus, limited to short sequences. In a previous study, we introduced a position-specific abstraction based on helices which we termed helix index shapes (hishapes) and a hishape-based algorithm for near-optimal folding pathway computation, called HiPath. The combination of these approaches provides an abstract view of the folding space that offers information about the global features.
In this paper we present HiKinetics, an algorithm that can predict RNA folding kinetics for sequences up to several hundred nucleotides long. This algorithm is based on RNAHeliCes, which decomposes the folding space into abstract classes, namely hishapes, and an improved version of HiPath, namely HiPath2, which estimates plausible folding pathways that connect these classes. Furthermore, we analyse the relationship of hishapes to locally optimal structures, the results of which strengthen the use of the hishape abstraction for studying folding kinetics. Finally, we show the application of HiKinetics to the folding kinetics of two well-studied RNAs.
HiKinetics can calculate kinetic folding based on a novel hishape decomposition. HiKinetics, together with HiPath2 and RNAHeliCes, is available for download at http://www.cyanolab.de/software/RNAHeliCes.htm.
RNA; Folding space; Kinetics; Abstraction
Synechocystis sp. PCC 6803 is the most popular cyanobacterial model for prokaryotic photosynthesis and for metabolic engineering to produce biofuels. Genomic and transcriptomic comparisons between closely related bacteria are powerful approaches to infer insights into their metabolic potentials and regulatory networks. To enable a comparative approach, we generated the draft genome sequence of Synechocystis sp. PCC 6714, a closely related strain of 6803 (16S rDNA identity 99.4%) that also is amenable to genetic manipulation. Both strains share 2838 protein-coding genes, leaving 845 unique genes in Synechocystis sp. PCC 6803 and 895 genes in Synechocystis sp. PCC 6714. The genetic differences include a prophage in the genome of strain 6714, a different composition of the pool of transposable elements, and a ∼40 kb genomic island encoding several glycosyltransferases and transport proteins. We verified several physiological differences that were predicted on the basis of the respective genome sequence. Strain 6714 exhibited a lower tolerance to Zn2+ ions, associated with the lack of a corresponding export system, and a lowered potential of salt acclimation due to the absence of a transport system for the re-uptake of the compatible solute glucosylglycerol. These new data will support more detailed comparative analyses of this important cyanobacterial group than have been possible thus far. Genome information for Synechocystis sp. PCC 6714 has been deposited in GenBank (accession no. AMZV01000000).
comparative genomics; cyanophages; genome sequence; prophage; salt acclimation
Marine cyanobacteria of the genus Acaryochloris are the only known organisms that use chlorophyll d as a photosynthetic pigment. However, based on chemical sediment analyses, chlorophyll d has been recognized to be widespread in oceanic and lacustrine environments. Therefore it is highly relevant to understand the genetic basis for different physiologies and possible niche adaptation in this genus. Here we show that unlike all other known isolates of Acaryochloris, the strain HICR111A, isolated from waters around Heron Island, Great Barrier Reef, possesses a unique genomic region containing all the genes for the structural and enzymatically active proteins of nitrogen fixation and cofactor biosynthesis. Their phylogenetic analysis suggests a close relation to nitrogen fixation genes from certain other marine cyanobacteria. We show that nitrogen fixation in Acaryochloris sp. HICR111A is regulated in a light–dark-dependent fashion. We conclude that nitrogen fixation, one of the most complex physiological traits known in bacteria, might be transferred among oceanic microbes by horizontal gene transfer more often than anticipated so far. Our data show that the two powerful processes of oxygenic photosynthesis and nitrogen fixation co-occur in one and the same cell also in this branch of marine microbes and characterize Acaryochloris as a physiologically versatile inhabitant of an ecological niche, which is primarily driven by the absorption of far-red light.
Acaryochloris; chlorophyll d; cyanobacteria; dinitrogen fixation; microbial diversity; nitrogenase
Nodularia spumigena is a filamentous diazotrophic cyanobacterium that dominates the annual late summer cyanobacterial blooms in the Baltic Sea. But N. spumigena also is common in brackish water bodies worldwide, suggesting special adaptation allowing it to thrive at moderate salinities. A draft genome analysis of N. spumigena sp. CCY9414 yielded a single scaffold of 5,462,271 nucleotides in length on which genes for 5,294 proteins were annotated. A subsequent strand-specific transcriptome analysis identified more than 6,000 putative transcriptional start sites (TSS). Orphan TSSs located in intergenic regions led us to predict 764 non-coding RNAs, among them 70 copies of a possible retrotransposon and several potential RNA regulators, some of which are also present in other N2-fixing cyanobacteria. Approximately 4% of the total coding capacity is devoted to the production of secondary metabolites, among them the potent hepatotoxin nodularin, the linear spumigin and the cyclic nodulapeptin. The transcriptional complexity associated with genes involved in nitrogen fixation and heterocyst differentiation is considerably smaller compared to other Nostocales. In contrast, sophisticated systems exist for the uptake and assimilation of iron and phosphorus compounds, for the synthesis of compatible solutes, and for the formation of gas vesicles, required for the active control of buoyancy. Hence, the annotation and interpretation of this sequence provides a vast array of clues into the genomic underpinnings of the physiology of this cyanobacterium and indicates in particular a competitive edge of N. spumigena in nutrient-limited brackish water ecosystems.
Synechocystis sp. PCC 6803 is a widely used model cyanobacterium for studying photosynthesis, phototaxis, the production of biofuels and many other aspects. Here we present a re-sequencing study of the genome and seven plasmids of one of the most widely used Synechocystis sp. PCC 6803 substrains, the glucose tolerant and motile Moscow or ‘PCC-M’ strain, revealing considerable evidence for recent microevolution. Seven single nucleotide polymorphisms (SNPs) specifically shared between ‘PCC-M’ and the ‘PCC-N and PCC-P’ substrains indicate that ‘PCC-M’ belongs to the ‘PCC’ group of motile strains. The identified indels and SNPs in ‘PCC-M’ are likely to affect glucose tolerance, motility, phage resistance, certain stress responses as well as functions in the primary metabolism, potentially relevant for the synthesis of alkanes. Three SNPs in intergenic regions could affect the promoter activities of two protein-coding genes and one cis-antisense RNA. Two deletions in ‘PCC-M’ affect parts of clustered regularly interspaced short palindromic repeats-associated spacer-repeat regions on plasmid pSYSA, in one case by an unusual recombination between spacer sequences.
CRISPR; genome sequence; plasmid; substrain; Synechocystis sp. PCC 6803
Information on the numbers and functions of naturally occurring antisense RNAs (asRNAs) in eubacteria has thus far remained incomplete. Here, we screened the model cyanobacterium Synechocystis sp. PCC 6803 for asRNAs using four different methods. In the final data set, the number of known noncoding RNAs rose from 6 previously identified to 60, and that of asRNAs from 1 to 73 (28 were verified using at least three methods). Among these, there are many asRNAs to housekeeping, regulatory or metabolic genes, as well as to genes encoding electron transport proteins. Transferring cultures to high light, carbon-limited conditions or darkness influenced the expression levels of several asRNAs, suggesting their functional relevance. Examples include the asRNA to rpl1, which accumulates in a light-dependent manner and may be required for processing the L11 r-operon, and the SyR7 noncoding RNA, which is antisense to the murF 5′ UTR, possibly modulating murein biosynthesis. Extrapolated to the whole genome, ∼10% of all genes in Synechocystis are influenced by asRNAs. Thus, chromosomally encoded asRNAs may have an important function in eubacterial regulatory networks.
antisense RNA; cyanobacteria; microarray; noncoding RNA; Synechocystis
In bacteria, non-coding RNAs (ncRNA) are crucial regulators of gene expression, controlling various stress responses, virulence, and motility. Previous work revealed a relatively high number of ncRNAs in some marine cyanobacteria. However, for efficient genetic and biochemical analysis it would be desirable to identify a set of ncRNA candidate genes in model cyanobacteria that are easy to manipulate and for which extended mutant, transcriptomic and proteomic data sets are available.
Here we have used comparative genome analysis for the biocomputational prediction of ncRNA genes and other sequence/structure-conserved elements in intergenic regions of the three unicellular model cyanobacteria Synechocystis PCC6803, Synechococcus elongatus PCC6301 and Thermosynechococcus elongatus BP1 plus the toxic Microcystis aeruginosa NIES843. The unfiltered numbers of predicted elements in these strains are 383, 168, 168, and 809, respectively, combined into 443 sequence clusters, whereas the numbers of individual elements with high support are 94, 56, 64, and 406, respectively. After additionally removing transposon-associated repeats, 78, 53, 42 and 168 sequences, respectively, remain, belonging to 109 different clusters in the data set. Experimental analysis of selected ncRNA candidates in Synechocystis PCC6803 validated new ncRNAs originating from the fabF-hoxH and apcC-prmA intergenic spacers and three highly expressed ncRNAs belonging to the Yfr2 family of ncRNAs. Yfr2a promoter-luxAB fusions confirmed a very strong activity of this promoter and indicated a stimulation of expression if the cultures were exposed to elevated light intensities.
Comparison to entries in Rfam and experimental testing of selected ncRNA candidates in Synechocystis PCC6803 indicate a high reliability of the current prediction, despite some contamination by the high number of repetitive sequences in some of these species. In particular, we identified in the four species altogether 8 new ncRNA homologs belonging to the Yfr2 family of ncRNAs. Modelling of RNA secondary structures indicated two conserved single-stranded sequence motifs that might be involved in RNA-protein interactions or in the recognition of target RNAs. Since our analysis has been restricted to finding ncRNA candidates with a reasonably high degree of conservation among these four cyanobacteria, there might be many more, requiring direct experimental approaches for their identification.
Non-coding RNAs (ncRNA) are regulators of gene expression in all domains of life. They control growth and differentiation, virulence, motility and various stress responses. The identification of ncRNAs can be a tedious process due to the heterogeneous nature of this molecule class and the missing sequence similarity of orthologs, even among closely related species. The small ncRNA Yfr1 has previously been found in the Prochlorococcus/Synechococcus group of marine cyanobacteria.
Here we show that screening available genome sequences based on an RNA motif, followed by experimental analysis, successfully detects this RNA in all lineages of cyanobacteria. Yfr1 is an abundant ncRNA between 54 and 69 nt in size that is ubiquitous in cyanobacteria except for two low light-adapted strains of Prochlorococcus, MIT 9211 and SS120, in which it must have been lost secondarily. Yfr1 consists of two predicted stem-loop elements separated by an unpaired sequence of 16–20 nucleotides containing the ultraconserved undecanucleotide 5'-ACUCCUCACAC-3'.
Starting with an ncRNA previously found in a narrow group of cyanobacteria only, we show here the highly specific and sensitive identification of its homologs within all lineages of cyanobacteria, whereas it was not detected within the genome sequences of E. coli and of 7 other eubacteria belonging to the alpha-proteobacteria, chlorobiaceae and spirochaete. The integration of RNA motif prediction into computational pipelines for the detection of ncRNAs in bacteria appears to be a promising step to improve the quality of such predictions.
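As a simplified illustration of the sequence part of such a screen, the sketch below scans both strands of a DNA sequence for the conserved undecanucleotide. It covers only the exact sequence match, not the flanking stem-loop structure that the RNA motif also describes; the function names and the test sequence are hypothetical.

```python
MOTIF_RNA = "ACUCCUCACAC"  # ultraconserved undecanucleotide of Yfr1

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_motif(dna, motif_rna=MOTIF_RNA):
    """Return sorted (position, strand) hits of the motif in a DNA sequence.

    Positions are 0-based start coordinates on the forward strand.
    """
    motif = motif_rna.replace("U", "T")
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        start = seq.find(motif)
        while start != -1:
            # Map minus-strand matches back to forward-strand coordinates.
            pos = start if strand == "+" else len(dna) - start - len(motif)
            hits.append((pos, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical sequence carrying the motif once on each strand.
genome = "GGGACTCCTCACACTTTGTGTGAGGAGTCC"
print(find_motif(genome))
```

A realistic screen would additionally require the two predicted stem-loops and the 16–20 nt unpaired spacer around each hit, for example by passing candidate regions to an RNA motif search tool.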
MicroRNAs (miRNAs) are regulatory RNA molecules that are specified by their mode of action, the structure of primary transcripts, and their typical size of 20–24 nucleotides. Frequently, not only single miRNAs but whole families of closely related miRNAs have been found in animals and plants. Some families are widely conserved among different plant taxa. Hence, it is evident that these conserved miRNAs are of ancient origin and indicate essential functions that have been preserved over long evolutionary time scales. In contrast, other miRNAs seem to be species-specific and consequently must possess very distinct functions. Thus, the analysis of an early-branching species provides a window into the early evolution of fundamental regulatory processes in plants.
Based on a combined experimental-computational approach, we report on the identification of 48 novel miRNAs and their putative targets in the moss Physcomitrella patens. From these, 18 miRNAs and two targets were verified in independent experiments. As a result of our study, the number of known miRNAs in Physcomitrella has been raised to 78. Functional assignments to mRNAs targeted by these miRNAs revealed a bias towards genes that are involved in regulation, cell wall biosynthesis and defense. Eight miRNAs were detected with different expression in protonema and gametophore tissue. The miRNAs 1–50 and 2–51 are located on a shared precursor, separated by only one nucleotide, and are processed in a tissue-specific way.
Our data provide evidence for a surprisingly diverse and complex miRNA population in Physcomitrella. Thus, the number and function of miRNAs must have significantly expanded during the evolution of early land plants. The coupled maturation of two miRNAs from a shared precursor, as described here, has not previously been identified in plants.
The knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast, and it is mainly the structure that is the common characteristic property shared by members of the same class. For correct characterization of such classes it is therefore of great importance to analyse the structural features in great detail. In this manuscript I present RNAlishapes, which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which allows covariation of paired bases to be rewarded. Applied to a set of bacterial trp-operon leaders using shape abstraction, the algorithm was able to identify the two alternating conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows fairly good prediction accuracy. Therefore, RNAlishapes provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at .
Soon after the first algorithms for RNA folding became available, it was recognised that the prediction of only one energetically optimal structure is insufficient to achieve reliable results. An in-depth analysis of the folding space as a whole appeared necessary to deduce the structural properties of a given RNA molecule reliably. Folding space analysis comprises various methods such as suboptimal folding, computation of base pair probabilities, sampling procedures and abstract shape analysis. Common to many approaches is the idea of partitioning the folding space into classes of structures, for which certain properties can be derived.
In this paper we extend the approach of abstract shape analysis. We show how to compute the accumulated probabilities of all structures that share the same shape. While this implies a complete (non-heuristic) analysis of the folding space, the computational effort depends only on the size of the shape space, which is much smaller. This approach has been integrated into the tool RNAshapes, and we apply it to various RNAs.
Analyses of conformational switches show the existence of two shapes with probabilities approximately $\frac{2}{3}$ vs. $\frac{1}{3}$, whereas the analysis of a microRNA precursor reveals one shape with a probability near to 1.0. Furthermore, it is shown that a shape can outperform an energetically more favourable one by achieving a higher probability. From these results, and the fact that we use a complete and exact analysis of the folding space, we conclude that this approach opens up new and promising routes for investigating and understanding RNA secondary structure.
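The quantity being computed, the accumulated Boltzmann probability of all structures sharing a shape, can be illustrated with a small sketch. Unlike the approach in the abstract, which avoids enumerating structures, this sketch sums over an explicit list; the function name, the gas-constant value and the toy energies are assumptions for illustration only.

```python
import math
from collections import defaultdict

GAS_CONSTANT = 0.0019872  # kcal/(mol*K), assumed unit system


def shape_probabilities(structures, temperature_c=37.0):
    """Accumulate Boltzmann probabilities per shape.

    structures: iterable of (shape, free_energy_in_kcal_per_mol) pairs,
                one entry per secondary structure in the folding space.
    """
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    weights = defaultdict(float)
    for shape, energy in structures:
        # Boltzmann weight of one structure, added to its shape class.
        weights[shape] += math.exp(-energy / rt)
    partition = sum(weights.values())
    return {shape: w / partition for shape, w in weights.items()}


# Hypothetical folding space: the single best structure has shape "[]",
# but shape "[][]" collects several slightly less stable structures.
toy_space = [
    ("[]", -10.0),
    ("[][]", -9.8),
    ("[][]", -9.7),
    ("[][]", -9.6),
]
probs = shape_probabilities(toy_space)
print(max(probs, key=probs.get))
```

In this toy space the shape containing the minimum free energy structure is not the most probable one, which mirrors the abstract's observation that a shape can outperform an energetically more favourable one.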
The function of a non-protein-coding RNA is often determined by its structure. Since experimental determination of RNA structure is time-consuming and expensive, its computational prediction is of great interest, and efficient solutions based on thermodynamic parameters are known. Frequently, however, the predicted minimum free energy structures are not the native ones, leading to the necessity of generating suboptimal solutions. While this can be accomplished by a number of programs, the user is often confronted with large outputs of similar structures, although he or she is interested in structures with more fundamental differences, or, in other words, with different abstract shapes. Here, we formalize the concept of abstract shapes and introduce their efficient computation. Each shape of an RNA molecule comprises a class of similar structures and has a representative structure of minimal free energy within the class. Shape analysis is implemented in the program RNAshapes. We applied RNAshapes to the prediction of optimal and suboptimal abstract shapes of several RNAs. For a given energy range, the number of shapes is considerably smaller than the number of structures, and in all cases, the native structures were among the top shape representatives. This demonstrates that the researcher can quickly focus on the structures of interest, without processing up to thousands of near-optimal solutions. We complement this study with a large-scale analysis of the growth behaviour of structure and shape spaces. RNAshapes is available for download and as an online version on the Bielefeld Bioinformatics Server.
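A minimal sketch of one possible shape abstraction follows. It assumes the coarsest view, in which unpaired bases are dropped and helices interrupted only by bulges or internal loops are merged, so that only nesting and adjacency of helices remain; the abstract does not define the abstraction at this level of detail, and the function name is hypothetical.

```python
def abstract_shape(dot_bracket):
    """Map a dot-bracket structure to a coarse abstract shape string."""
    root = []        # children of the exterior loop; each pair is a list of its child pairs
    stack = [root]
    for ch in dot_bracket:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure")

    def render(node):
        # A pair enclosing exactly one further pair continues the same helix
        # (stacking, bulge or internal loop), so it adds no new bracket.
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    return "".join(render(child) for child in root)


print(abstract_shape("((..((...))..))"))
print(abstract_shape("((..((...))..((...))..))"))
```

Under this assumption, a hairpin with an internal loop collapses to a single bracket pair, while a multiloop with two branches keeps its nesting, so many near-optimal structures fall into the same class.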