Motivation: Prochlorococcus possesses the smallest genome of all sequenced photoautotrophs. Although the number of regulatory proteins in the genome is very small, the relative number of small regulatory RNAs is comparable with that of other bacteria. The compact genome size of Prochlorococcus offers an ideal system to search for targets of small RNAs (sRNAs) and to refine existing target prediction algorithms.
Results: Target predictions for the cyanobacterial sRNA Yfr1 were carried out with INTARNA in Prochlorococcus MED4. The ultraconserved Yfr1 sequence motif was defined as the putative interaction seed. To study the impact of Yfr1 on its predicted mRNA targets, a reporter system based on green fluorescent protein (GFP) was applied. We show that Yfr1 inhibits the translation of two predicted targets. We used mutation analysis to confirm that Yfr1 directly regulates its targets by an antisense interaction sequestering the ribosome binding site, and to assess the importance of interaction site accessibility.
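The seed idea can be illustrated with a minimal sketch. This is not the IntaRNA algorithm, which also scores hybridization energy and site accessibility; it only scans an mRNA for stretches that are perfectly Watson-Crick complementary to a seed taken from an sRNA. The function names and the example sequences are hypothetical and are not taken from the study.

```python
from __future__ import annotations

# Hypothetical sketch: locate perfect Watson-Crick seed matches on an mRNA.
# It ignores hybridization energies, G-U wobble pairs and accessibility.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def find_seed_sites(mrna: str, seed: str) -> list[int]:
    """Return 0-based start positions on the mRNA that pair with the full seed."""
    target = reverse_complement(seed)
    mrna = mrna.upper()
    hits = []
    start = mrna.find(target)
    while start != -1:
        hits.append(start)
        # Continue one base further so overlapping sites are also reported.
        start = mrna.find(target, start + 1)
    return hits
```

A site that overlaps the ribosome binding site would be a candidate for the antisense mechanism described above; whether it is functional still has to be tested experimentally, as was done here with the GFP reporter.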
Contact: firstname.lastname@example.org; email@example.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Whole genome sequencing of marine cyanobacteria has revealed an unprecedented degree of genomic variation and streamlining. With a size of 1.66 megabase-pairs, Prochlorococcus sp. MED4 has the most compact of these genomes and it is enigmatic how the few identified regulatory proteins efficiently sustain the lifestyle of an ecologically successful marine microorganism. Small non-coding RNAs (ncRNAs) control a plethora of processes in eukaryotes as well as in bacteria; however, systematic searches for ncRNAs are still lacking for most eubacterial phyla outside the enterobacteria.
Based on a computational prediction we show the presence of several ncRNAs (cyanobacterial functional RNA or Yfr) in several different cyanobacteria of the Prochlorococcus-Synechococcus lineage. Some ncRNA genes are present only in two or three of the four strains investigated, whereas the RNAs Yfr2 through Yfr5 are structurally highly related and are encoded by a rapidly evolving gene family as their genes exist in different copy numbers and at different sites in the four investigated genomes. One ncRNA, Yfr7, is present in at least seven other cyanobacteria. In addition, control elements for several ribosomal operons were predicted as well as riboswitches for thiamine pyrophosphate and cobalamin.
This is the first genome-wide and systematic screen for ncRNAs in cyanobacteria. Several ncRNAs were computationally predicted and their presence was biochemically verified. These RNAs may have regulatory functions and each shows a distinct phylogenetic distribution. Our approach can be applied to any group of microorganisms for which more than one complete genome sequence is available for comparative analysis.
In bacteria, non-coding RNAs (ncRNA) are crucial regulators of gene expression, controlling various stress responses, virulence, and motility. Previous work revealed a relatively high number of ncRNAs in some marine cyanobacteria. However, for efficient genetic and biochemical analysis it would be desirable to identify a set of ncRNA candidate genes in model cyanobacteria that are easy to manipulate and for which extended mutant, transcriptomic and proteomic data sets are available.
Here we have used comparative genome analysis for the biocomputational prediction of ncRNA genes and other sequence/structure-conserved elements in intergenic regions of the three unicellular model cyanobacteria Synechocystis PCC6803, Synechococcus elongatus PCC6301 and Thermosynechococcus elongatus BP1 plus the toxic Microcystis aeruginosa NIES843. The unfiltered numbers of predicted elements in these strains are 383, 168, 168, and 809, respectively, combined into 443 sequence clusters, whereas the numbers of individual elements with high support are 94, 56, 64, and 406, respectively. After transposon-associated repeats are also removed, 78, 53, 42 and 168 sequences, respectively, remain, belonging to 109 different clusters in the data set. Experimental analysis of selected ncRNA candidates in Synechocystis PCC6803 validated new ncRNAs originating from the fabF-hoxH and apcC-prmA intergenic spacers and three highly expressed ncRNAs belonging to the Yfr2 family of ncRNAs. Yfr2a promoter-luxAB fusions confirmed a very strong activity of this promoter and indicated that expression is stimulated when cultures are exposed to elevated light intensities.
Comparison to entries in Rfam and experimental testing of selected ncRNA candidates in Synechocystis PCC6803 indicate a high reliability of the current prediction, despite some contamination by the high number of repetitive sequences in some of these species. In particular, we identified a total of 8 new ncRNA homologs belonging to the Yfr2 family of ncRNAs in the four species. Modelling of RNA secondary structures indicated two conserved single-stranded sequence motifs that might be involved in RNA-protein interactions or in the recognition of target RNAs. Since our analysis has been restricted to finding ncRNA candidates with a reasonably high degree of conservation among these four cyanobacteria, there might be many more, requiring direct experimental approaches for their identification.
Using gene order as a phylogenetic character has the potential to resolve previously unresolved species relationships. This character was used to resolve the evolutionary history within the genus Prochlorococcus, a group of marine cyanobacteria.
Orthologous gene sets and their genomic positions were identified from 12 species of Prochlorococcus and 1 outgroup species of Synechococcus. From these data, inversion and breakpoint distance-based phylogenetic trees were computed by GRAPPA and FastME. Statistical support of the resulting topology was obtained by application of a 50% jackknife resampling technique. The result was consistent and congruent with nucleotide sequence-based and gene-content based trees. In addition, a previously unresolved clade, that of MIT9211 and SS120, was resolved.
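For readers unfamiliar with gene-order distances, the breakpoint distance can be sketched as follows. This is a minimal illustration and not the GRAPPA or FastME implementation. It assumes linear genomes that share the same genes, labelled 1..n, with a negative sign marking a gene on the opposite strand; the function names are hypothetical.

```python
from __future__ import annotations


def _framed(order: list[int]) -> list[int]:
    """Add artificial start (0) and end (n + 1) markers to a signed gene order."""
    return [0] + list(order) + [len(order) + 1]


def adjacencies(order: list[int]) -> set[tuple[int, int]]:
    """All adjacencies of a signed gene order, read in both directions."""
    framed = _framed(order)
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        # The same adjacency read from the other strand.
        adj.add((-b, -a))
    return adj


def breakpoint_distance(order_a: list[int], order_b: list[int]) -> int:
    """Count adjacencies of order_a that are not conserved in order_b."""
    if sorted(abs(g) for g in order_a) != sorted(abs(g) for g in order_b):
        raise ValueError("both genomes must contain the same genes")
    adj_b = adjacencies(order_b)
    framed = _framed(order_a)
    return sum((a, b) not in adj_b for a, b in zip(framed, framed[1:]))
```

For example, a single inversion of genes 2 and 3 in a four-gene genome creates two breakpoints, so `breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4])` returns 2. A matrix of such pairwise distances is the kind of input a distance-based tree method works from.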
This is the first study to use gene order data to resolve a bacterial phylogeny at the genus level. It suggests that the technique is useful in resolving the Tree of Life.
Cyanobacteria of the genera Synechococcus and Prochlorococcus are the most abundant photosynthetic organisms on earth, occupying a key position at the base of marine food webs. The cynS gene that encodes cyanase was identified among bacterial, fungal, and plant sequences in public databases, and the gene was particularly prevalent among cyanobacteria, including numerous Prochlorococcus and Synechococcus strains. Phylogenetic analysis of cynS sequences retrieved from the Global Ocean Survey database identified >60% as belonging to unicellular marine cyanobacteria, suggesting an important role for cyanase in their nitrogen metabolism. We demonstrate here that marine cyanobacteria have a functionally active cyanase, the transcriptional regulation of which varies among strains and reflects the genomic context of cynS. In Prochlorococcus sp. strain MED4, cynS was presumably transcribed as part of the cynABDS operon, implying cyanase involvement in cyanate utilization. In Synechococcus sp. strain WH8102, expression was not related to nitrogen stress responses and here cyanase presumably serves in the detoxification of cyanate resulting from intracellular urea and/or carbamoyl phosphate decomposition. Lastly, we report on a cyanase activity encoded by cynH, a novel gene found in marine cyanobacteria only. The presence of dual cyanase genes in the genomes of seven marine Synechococcus strains and their respective roles in nitrogen metabolism remain to be clarified.
Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs in genome sequences has proven to be a hard task; in particular, past attempts at ab initio ncRNA search have mostly failed, with the exception of tools that can identify microRNAs.
We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences.
Our method, once trained on a set of ncRNAs from a given species, can be applied to genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability: http://compbio.cs.sfu.ca/taverna/smyrna
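The general idea of scoring sequences by how differently motifs are distributed in ncRNAs and in background sequence can be sketched with k-mer log-odds. This is only an assumed, simplified stand-in and not the published method: the motif definition (plain k-mers), the pseudocount and all function names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import product


def kmer_counts(seqs: list[str], k: int) -> Counter:
    """Count overlapping k-mers across a list of DNA sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds_table(ncrnas: list[str], background: list[str],
                   k: int = 3, pseudocount: float = 1.0) -> dict[str, float]:
    """Log ratio of each k-mer's frequency in ncRNAs versus background."""
    fg = kmer_counts(ncrnas, k)
    bg = kmer_counts(background, k)
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    fg_total = sum(fg[m] for m in kmers) + pseudocount * len(kmers)
    bg_total = sum(bg[m] for m in kmers) + pseudocount * len(kmers)
    return {
        m: math.log(((fg[m] + pseudocount) / fg_total)
                    / ((bg[m] + pseudocount) / bg_total))
        for m in kmers
    }


def score_window(seq: str, table: dict[str, float], k: int) -> float:
    """Sum the log-odds of every k-mer in a window; higher is more ncRNA-like."""
    seq = seq.upper()
    # k-mers with ambiguous bases are not in the table and contribute 0.
    return sum(table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))
```

In such a scheme a genome would be scanned with a sliding window and high-scoring windows reported as candidates; the threshold would have to be calibrated on held-out data.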
Cyanobacteria are an ancient group of photoautotrophic prokaryotes with wide variations in genome size and ecological habitat. Metacaspases (MCAs) are cysteine proteinases that have sequence homology to caspases and play essential roles in programmed cell death (PCD). MCAs have been identified in several prokaryotes, fungi and plants; however, knowledge about cyanobacterial metacaspases remains limited. With the availability of sequenced genomes of 33 cyanobacteria, we perform a comparative analysis of metacaspases and explore their distribution, domain structure and evolution.
A total of 58 putative MCAs were identified, which are abundant in filamentous diazotrophic cyanobacteria and Acaryochloris marina MBIC 11017 and absent in all Prochlorococcus and marine Synechococcus strains, except Synechococcus sp. PCC 7002. The Cys-His dyad of the caspase superfamily is conserved, although substitutions (Tyr in place of His and Ser/Asn/Gln/Gly instead of Cys) are also detected in some cyanobacteria. MCAs can be classified into two major families (α and β) based on their additional domain structure. Ten types and a total of 276 additional domains were identified, most of which are involved in signal transduction. The apoptosis-related NACHT domain was also found in two cyanobacterial MCAs. The phylogenetic tree of MCA catalytic P20 domains coincides well with the domain structure and the phylogenies based on 16S rRNA.
The existence and number of MCA genes in unicellular and filamentous cyanobacteria are a function of genome size and ecological habitat. MCAs of families α and β seem to have evolved separately, and the recruitment of the additional WD40 domain occurred after the divergence of the two families. In this study, a general framework of sequence-structure-function connections for the metacaspases has been revealed, which may provide new targets for functional investigation.
Marine cyanobacteria of the genera Prochlorococcus and Synechococcus are the most abundant photosynthetic prokaryotes in oceanic environments, and are key contributors to global CO2 fixation, chlorophyll biomass and primary production. Cyanophages, viruses infecting cyanobacteria, are a major force in the ecology of their hosts. These phages contribute greatly to cyanobacterial mortality, therefore acting as a powerful selective force upon their hosts. Phage reproduction is based on utilization of the host transcription and translation mechanisms; therefore, differences in the G+C genomic content between cyanophages and their hosts could be a limiting factor for the translation of cyanophage genes. On the basis of comprehensive genomic analyses conducted in this study, we suggest that cyanophages of the Myoviridae family, which can infect both Prochlorococcus and Synechococcus, overcome this limitation by carrying additional sets of tRNAs in their genomes accommodating AT-rich codons. Whereas the tRNA genes are less needed when infecting their Prochlorococcus hosts, which possess a similar G+C content to the cyanophage, the additional tRNAs may increase the overall translational efficiency of their genes when infecting a Synechococcus host (with high G+C content), therefore potentially enabling the infection of multiple hosts.
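The two quantities this argument rests on, genomic G+C content and the AT-richness of codons, are simple to compute. The sketch below is illustrative only; the function names are hypothetical, and the fraction of codons ending in A or T is used here as an assumed, crude proxy for AT-rich codon usage rather than the measure used in the study.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    return sum(b in "GC" for b in bases) / len(bases)


def at_ending_codon_fraction(cds: str) -> float:
    """Fraction of codons in a coding sequence whose third base is A or T."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("coding sequence is shorter than one codon")
    return sum(codon[2] in "AT" for codon in codons) / len(codons)
```

Comparing these values for phage genes and for each host genome would show how far phage codon preferences sit from those of a high-G+C Synechococcus host.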
codon usage; cross-infectivity; marine cyanophages; Prochlorococcus; Synechococcus; tRNA
Noncoding RNAs (ncRNAs) play important roles in many biological processes. Existing genome-scale ncRNA search tools identify ncRNAs in local sequence alignments generated by conventional sequence comparison methods. However, some types of ncRNA lack strong sequence conservation and tend to be missed or misaligned by conventional sequence comparison.
In this paper, we propose an ncRNA identification framework that is complementary to existing sequence comparison tools. By integrating a filtration step based on Hamming distance and ncRNA alignment programs such as FOLDALIGN or PLAST-ncRNA, the proposed ncRNA search framework can identify ncRNAs that lack strong sequence conservation. In addition, as the ratio of transition and transversion mutation is often used as a discriminative feature for functional ncRNA identification, we incorporate this feature into the filtration step using a coding strategy. We apply Hamming distance seeds to ncRNA search in the intergenic regions of human and mouse genomes and between the Burkholderia cenocepacia J2315 genome and the Ralstonia solanacearum genome. The experimental results demonstrate that a carefully designed Hamming distance seed can achieve better sensitivity in searching for poorly conserved ncRNAs than conventional sequence comparison tools.
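The filtration step can be pictured with a brute-force sketch of Hamming distance seeding. It is not the authors' implementation: the exhaustive window comparison, the parameter names, and in particular the way transitions are handled (collapsing bases to purine/pyrimidine codes so that transitions do not count as mismatches) are assumptions made for illustration.

```python
from __future__ import annotations

# Map bases to purine (R) or pyrimidine (Y); a transition (A<->G, C<->T)
# leaves the code unchanged, a transversion changes it.
PURINE_PYRIMIDINE = str.maketrans("AGCTU", "RRYYY")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def seed_hits(query: str, target: str, width: int, max_mismatches: int,
              ignore_transitions: bool = False) -> list[tuple[int, int]]:
    """Return (query_pos, target_pos) pairs whose windows pass the filter."""
    q, t = query.upper(), target.upper()
    if ignore_transitions:
        q, t = q.translate(PURINE_PYRIMIDINE), t.translate(PURINE_PYRIMIDINE)
    hits = []
    for i in range(len(q) - width + 1):
        for j in range(len(t) - width + 1):
            if hamming(q[i:i + width], t[j:j + width]) <= max_mismatches:
                hits.append((i, j))
    return hits
```

Only the window pairs that pass this filter would be handed to a slower structural alignment program. A practical tool would index the windows rather than compare all pairs, since the loop above is quadratic in sequence length.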
Hamming distance seeds provide better sensitivity as a filtration strategy for genome-wide ncRNA homology search than the existing seeding strategies used in BLAST-like tools. By combining Hamming distance seed matching and ncRNA alignment, we are able to find ncRNAs with sequence similarities below 60%.
The phylogeny and taxonomy of cyanobacteria are currently poorly understood due to a paucity of reliable markers for identification and circumscription of its major clades.
A combination of phylogenomic and protein signature based approaches was used to characterize the major clades of cyanobacteria. Phylogenetic trees were constructed for 44 cyanobacteria based on 44 conserved proteins. In parallel, Blastp searches were carried out on each ORF in the genomes of Synechococcus WH8102, Synechocystis PCC6803, Nostoc PCC7120, Synechococcus JA-3-3Ab, Prochlorococcus MIT9215 and Prochlorococcus marinus subsp. marinus CCMP1375 to identify proteins that are specific for various main clades of cyanobacteria. These studies have identified 39 proteins that are specific for all (or most) cyanobacteria and large numbers of proteins for other cyanobacterial clades. The identified signature proteins include: (i) 14 proteins for a deep branching clade (Clade A) of Gloeobacter violaceus and two diazotrophic Synechococcus strains (JA-3-3Ab and JA2-3-B'a); (ii) 5 proteins that are present in all other cyanobacteria except those from Clade A; (iii) 60 proteins that are specific for a clade (Clade C) consisting of various marine unicellular cyanobacteria (viz. Synechococcus and Prochlorococcus); (iv) 14 and 19 signature proteins that are specific for the Clade C Synechococcus and Prochlorococcus strains, respectively; (v) 67 proteins that are specific for the Low B/A ecotype Prochlorococcus strains, which contain a lower ratio of chl b/a2 and are adapted to growth at high light intensities; (vi) 65 and 8 proteins that are specific for the Nostocales and Chroococcales orders, respectively; and (vii) 22 and 9 proteins that are uniquely shared by the Nostocales and Oscillatoriales orders, or by these two orders and the Chroococcales, respectively. We also describe 3 conserved indels in flavoprotein, heme oxygenase and protochlorophyllide oxidoreductase proteins that are specific for either Clade C cyanobacteria or for various subclades of Prochlorococcus. Many other conserved indels for cyanobacterial clades have been described recently.
These signature proteins and indels provide novel means for circumscription of various cyanobacterial clades in clear molecular terms. Their functional studies should lead to discovery of novel properties that are unique to these groups of cyanobacteria.
Prochlorococcus is a genus of marine cyanobacteria characterized by small cell and genome size, an evolutionary trend toward low GC content, the possession of chlorophyll b, and the absence of phycobilisomes. Whereas many shared derived characters define Prochlorococcus as a clade, many genome-based analyses recover them as paraphyletic, with some low-light adapted Prochlorococcus spp. grouping with marine Synechococcus. Here, we use 18 Prochlorococcus and marine Synechococcus genomes to analyze gene flow within and between these taxa. We introduce embedded quartet scatter plots as a tool to screen for genes whose phylogeny agrees or conflicts with the plurality phylogenetic signal, with accepted taxonomy and naming, with GC content, and with the ecological adaptation to high and low light intensities. We find that most gene families support high-light adapted Prochlorococcus spp. as a monophyletic clade and low-light adapted Prochlorococcus sp. as a paraphyletic group. But we also detect 16 gene families that were transferred between high-light adapted and low-light adapted Prochlorococcus sp. and 495 gene families, including 19 ribosomal proteins, that do not cluster designated Prochlorococcus and Synechococcus strains in the expected manner. To explain the observed data, we propose that frequent gene transfer between marine Synechococcus spp. and low-light adapted Prochlorococcus spp. has created a “highway of gene sharing” (Beiko RG, Harlow TJ, Ragan MA. 2005. Highways of gene sharing in prokaryotes. Proc Natl Acad Sci USA. 102:14332–14337) that tends to erode genus boundaries without erasing the Prochlorococcus-specific ecological adaptations.
marine cyanobacteria; horizontal gene transfer; introgression; quartet decomposition; supertree; genome evolution
Serine/threonine kinases (STKs) have been found in an increasing number of prokaryotes, where they play important roles in signal transduction that supplement the well-known role of two-component systems. Cyanobacteria are photoautotrophic prokaryotes able to grow in a wide range of ecological environments, and their signal transduction systems are important in adaptation to the environment. Sequence information from several cyanobacterial genomes offers a unique opportunity to conduct a comprehensive comparative analysis of this kinase family. In this study, we extracted information regarding Ser/Thr kinases from 21 species of sequenced cyanobacteria and investigated their diversity, conservation, domain structure, and evolution.
286 putative STK homologues were identified. STKs are absent in four Prochlorococcus strains and one marine Synechococcus strain and abundant in filamentous nitrogen-fixing cyanobacteria. Motifs and invariant amino acids typical in eukaryotic STKs were conserved well in these proteins, and six more cyanobacteria- or bacteria-specific conserved residues were found. These STK proteins were classified into three major families according to their domain structures. Fourteen types and a total of 131 additional domains were identified, some of which are reported to participate in the recognition of signals or substrates. Cyanobacterial STKs show rather complicated phylogenetic relationships that correspond poorly with phylogenies based on 16S rRNA and those based on additional domains.
The number of STK genes in different cyanobacteria is the result of the genome size, ecophysiology, and physiological properties of the organism. Similar conserved motifs and amino acids indicate that cyanobacterial STKs make use of a similar catalytic mechanism as eukaryotic STKs. Gene gain-and-loss is significant during STK evolution, along with domain shuffling and insertion. This study has established an overall framework of sequence-structure-function interactions for the STK gene family, which may facilitate further studies of the role of STKs in various organisms.
Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. Recent findings have shown that RNA-mediated regulatory mechanisms influence a substantial portion of typical microbial genomes. We present an efficient method for finding potential ncRNAs in bacteria by clustering genomic sequences based on homology inferred from both primary sequence and secondary structure. We evaluate our approach using a set of predominantly Firmicutes sequences. Our results showed that, although primary sequence-based homology search was inaccurate for diverged ncRNA sequences, our clustering method allowed us to infer motifs that recovered nearly all members of most known ncRNA families. Hence, our method shows promise for discovering new families of ncRNA.
ncRNAs; noncoding RNA; RNA discovery; hierarchical clustering; motif discovery
Prochlorococcus, an extremely small cyanobacterium that is very abundant in the world's oceans, has a very streamlined genome. On average, these cells have about 2,000 genes and very few regulatory proteins. The limited capability of regulation is thought to be a result of selection imposed by a relatively stable environment in combination with a very small genome. Furthermore, only ten non-coding RNAs (ncRNAs), which play crucial regulatory roles in all forms of life, have been described in Prochlorococcus. Most strains also lack the RNA chaperone Hfq, raising the question of how important this mode of regulation is for these cells. To explore this question, we examined the transcription of intergenic regions of Prochlorococcus MED4 cells subjected to a number of different stress conditions: changes in light qualities and quantities, phage infection, or phosphorus starvation. Analysis of Affymetrix microarray expression data from intergenic regions revealed 276 novel transcriptional units. Among these were 12 new ncRNAs, 24 antisense RNAs (asRNAs), as well as 113 short mRNAs. Two additional ncRNAs were identified by homology, and all 14 new ncRNAs were independently verified by Northern hybridization and 5′RACE. Unlike its reduced suite of regulatory proteins, the number of ncRNAs relative to genome size in Prochlorococcus is comparable to that found in other bacteria, suggesting that RNA regulators likely play a major role in regulation in this group. Moreover, the ncRNAs are concentrated in previously identified genomic islands, which carry genes of significance to the ecology of this organism, many of which are not of cyanobacterial origin. Expression profiles of some of these ncRNAs suggest involvement in light stress adaptation and/or the response to phage infection consistent with their location in the hypervariable genomic islands.
Prochlorococcus is the most abundant phototroph in the vast, nutrient-poor areas of the ocean. It plays an important role in the ocean carbon cycle, and is a key component of the base of the food web. All cells share a core set of about 1,200 genes, augmented with a variable number of “flexible” genes. Many of the latter are located in genomic islands—hypervariable regions of the genome that encode functions important in differentiating the niches of “ecotypes.” Of major interest is how cells with such a small genome regulate cellular processes, as they lack many of the regulatory proteins commonly found in bacteria. We show here that contrary to the regulatory proteins, ncRNAs are present at levels typical of bacteria, revealing that they might have a disproportional regulatory role in Prochlorococcus—likely an adaptation to the extremely low-nutrient conditions of the open oceans, combined with the constraints of a small genome. Some of the ncRNAs were differentially expressed under stress conditions, and a high number of them were found to be associated with genomic islands, suggesting functional links between these RNAs and the response of Prochlorococcus to particular environmental challenges.
Fatty acid desaturases are enzymes that introduce double bonds into the hydrocarbon chains of fatty acids. The fatty acid desaturases from 37 cyanobacterial genomes were identified and classified based upon their conserved histidine-rich motifs and phylogenetic analysis, which helped to determine the numbers and distributions of desaturases in cyanobacterial species. Filamentous or N2-fixing cyanobacteria usually possess more types of fatty acid desaturases than unicellular species. The pathway of acyl-lipid desaturation in the unicellular marine cyanobacteria Synechococcus and Prochlorococcus differs from that of other cyanobacteria, indicating that the phylogenetic histories of these two genera differ from those of other cyanobacteria isolated from freshwater, soil, or symbiotic associations. Gloeobacter violaceus PCC 7421 was isolated from calcareous rock and lacks thylakoid membranes. The types and numbers of desaturases in this strain are distinct from those of other cyanobacteria, reflecting its earliest divergence from the cyanobacterial line. Three thermophilic unicellular strains, Thermosynechococcus elongatus BP-1 and two Synechococcus Yellowstone species, lack highly unsaturated fatty acids in lipids and contain only one Δ9 desaturase, in contrast with mesophilic strains, probably because of their hot habitats. Thus, the numbers and types of fatty acid desaturases vary among cyanobacterial species, which may result from adaptation to their environments during evolution.
Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting cells against reactive oxygen species (ROS). PRXs have been identified in mammals, fungi and higher plants. However, knowledge of cyanobacterial PRXs remains limited. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution.
Overall, 244 putative prx genes were identified; they were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot springs, but scarce in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. The classical thioredoxin motif (CXXC) was detected in protein sequences from the PRX-like subfamily. The phylogenetic tree constructed from catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16S rRNA.
The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria, especially those of subfamilies such as PRX-like or 1-Cys PRX, correlates with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt catalytic mechanisms similar to those of eukaryotes. All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms.
Peroxiredoxin; Structure; Phylogeny and evolution; Comparative genomics; Cyanobacteria
NONCODE is an integrated knowledge database dedicated to non-coding RNAs (ncRNAs), that is to say, RNAs that function without being translated into proteins. All ncRNAs in NONCODE were filtered automatically from literature and GenBank, and were later manually curated. The distinctive features of NONCODE are as follows: (i) The ncRNAs in NONCODE include almost all the types of ncRNAs, except transfer RNAs and ribosomal RNAs. (ii) All ncRNA sequences and their related information (e.g. function, cellular role, cellular location, chromosomal information, etc.) in NONCODE have been confirmed manually by consulting relevant literature: more than 80% of the entries are based on experimental data. (iii) Based on the cellular process and function in which a given ncRNA is involved, we introduced a novel classification system, labeled process function class, to integrate existing classification systems. (iv) In addition, some 1100 ncRNAs have been grouped into nine other classes according to whether they are specific to gender or tissue or associated with tumors and diseases, etc. (v) NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient recovery of sequence, regulatory elements in the flanking sequences, secondary structure, related publications and other information. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, viruses and viroids. Access is free for all users through a web interface at http://noncode.bioinfo.org.cn.
Horizontal or lateral transfer of genetic material between distantly related prokaryotes has been shown to play a major role in the evolution of bacterial and archaeal genomes, but exchange of genes between prokaryotes and eukaryotes is not as well understood. In particular, gene flow from eukaryotes to prokaryotes is rarely documented with strong support, which is unusual since prokaryotic genomes appear to readily accept foreign genes.
Here, we show that abundant marine cyanobacteria in the related genera Synechococcus and Prochlorococcus acquired a key Calvin cycle/glycolytic enzyme from a eukaryote. Two non-homologous forms of fructose bisphosphate aldolase (FBA) are characteristic of eukaryotes and prokaryotes respectively. However, a eukaryotic gene has been inserted immediately upstream of the ancestral prokaryotic gene in several strains (ecotypes) of Synechococcus and Prochlorococcus. In one lineage this new gene has replaced the ancestral gene altogether. The eukaryotic gene is most closely related to the plastid-targeted FBA from red algae. This eukaryotic-type FBA once replaced the plastid/cyanobacterial type in photosynthetic eukaryotes, hinting at a possible functional advantage in Calvin cycle reactions. The strains that now possess this eukaryotic FBA are scattered across the tree of Synechococcus and Prochlorococcus, perhaps because the gene has been transferred multiple times among cyanobacteria, or more likely because it has been selectively retained only in certain lineages.
A gene for plastid-targeted FBA has been transferred from red algae to cyanobacteria, where it has inserted itself beside its non-homologous, functional analogue. Its current distribution in Prochlorococcus and Synechococcus is punctate, suggesting a complex history since its introduction to this group.
Non-coding RNAs (ncRNAs) have diverse essential biological functions in all organisms, and in eukaryotes, two such classes of ncRNAs are the small nucleolar (sno) and small nuclear (sn) RNAs. In this study, we have identified and characterized a collection of sno and snRNAs in Giardia lamblia, by exploiting our discovery of a conserved 12 nt RNA processing sequence motif found in the 3′ end regions of a large number of G. lamblia ncRNA genes. RNA end mapping and other experiments indicate the motif serves to mediate ncRNA 3′ end formation from mono- and di-cistronic RNA precursor transcripts. Remarkably, we find the motif is also utilized in the processing pathway of all four previously identified trans-spliced G. lamblia introns, revealing a common RNA processing pathway for ncRNAs and trans-spliced introns in this organism. Motif sequence conservation then allowed for the bioinformatic and experimental identification of additional G. lamblia ncRNAs, including new U1 and U6 spliceosomal snRNA candidates. The U6 snRNA candidate was then used as a tool to identity novel U2 and U4 snRNAs, based on predicted phylogenetically conserved snRNA–snRNA base-pairing interactions, from a set of previously identified G. lamblia ncRNAs without assigned function. The Giardia snRNAs retain the core features of spliceosomal snRNAs but are sufficiently evolutionarily divergent to explain the difficulties in their identification. Most intriguingly, all of these snRNAs show structural features diagnostic of U2-dependent/major and U12-dependent/minor spliceosomal snRNAs.
Searching for members of characterized ncRNA families containing pseudoknots is an important component of genome-scale ncRNA annotation. However, state-of-the-art search for known ncRNAs is based on context-free grammars (CFGs), which cannot effectively model pseudoknots. Thus, existing CFG-based ncRNA identification tools usually ignore pseudoknots during search. As a result, these tools report dozens of sequences that do not contain the native pseudoknots. When pseudoknot structures are vital to the functions of the ncRNAs, these sequences may not be true members.
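To make the limitation concrete: a pseudoknot contains at least two base pairs (i, j) and (k, l) whose positions interleave as i < k < j < l, whereas the structures a context-free grammar can generate are strictly nested. The following is a minimal Python sketch of that crossing test. The function name and the representation of a structure as a list of index pairs are illustrative assumptions and are not taken from any tool discussed here.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l.

    `pairs` is a list of (i, j) sequence positions that are base-paired.
    This representation is an assumption made for illustration only.
    """
    # Order each pair as (smaller, larger), then sort pairs by first position.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        # Sorting guarantees i <= k. The pairs cross when k lies inside
        # (i, j) while l lies outside it.
        if i < k < j < l:
            return True
    return False


nested = [(0, 10), (2, 8)]    # one helix enclosed in another: knot-free
crossing = [(0, 5), (3, 8)]   # interleaved pairs: pseudoknot
print(has_pseudoknot(nested), has_pseudoknot(crossing))  # prints: False True
```

A nested structure can be written with a single bracket type in dot-bracket notation, while a crossing structure needs a second bracket type, which is one informal way to see why a single context-free derivation does not capture it.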
In this work, we design a pseudoknot search tool using multiple simple sub-structures, which are derived from knot-free and bifurcation-free structural motifs in the underlying family. We test our tool on a contiguous 22-Mb region of the maize genome. The experimental results show that our tool competes favorably with other pseudoknot search methods.
Our sub-structure-based tool can conduct genome-scale pseudoknot-containing ncRNA search effectively and efficiently. It provides a complementary pseudoknot search tool to Infernal. The source code is available at http://www.cse.msu.edu/~chengy/knotsearch.
Prochlorococcus and Synechococcus are the two most abundant marine cyanobacteria. They represent a significant fraction of the total primary production of the world oceans and comprise a major fraction of the prey biomass available to phagotrophic protists. Despite relatively rapid growth rates, picocyanobacterial cell densities in open-ocean surface waters remain fairly constant, implying steady mortality due to viral infection and consumption by predators. There have been several studies on grazing by specific protists on Prochlorococcus and Synechococcus in culture, and of cell loss rates due to overall grazing in the field. However, the specific sources of mortality of these primary producers in the wild remain unknown. Here, we use a modification of the RNA stable isotope probing technique (RNA-SIP), which involves adding labelled cells to natural seawater, to identify active predators that are specifically consuming Prochlorococcus and Synechococcus in the surface waters of the Pacific Ocean. Four major groups were identified as having their 18S rRNA highly labelled: Prymnesiophyceae (Haptophyta), Dictyochophyceae (Stramenopiles), Bolidomonas (Stramenopiles) and Dinoflagellata (Alveolata). For the first three of these, the closest relative of the sequences identified was a photosynthetic organism, indicating the presence of mixotrophs among picocyanobacterial predators. We conclude that RNA-SIP is a useful method to identify specific predators of picocyanobacteria in situ, and that the method could possibly be used to identify other bacterial predators important in the microbial food-web.
Unicellular nitrogen-fixing cyanobacteria are important components of marine phytoplankton. Although non-nitrogen-fixing marine phytoplankton generally exhibit high gene sequence and genomic diversity, gene sequences of natural populations and isolated strains of Crocosphaera watsonii, one of the two most abundant open ocean unicellular cyanobacteria groups, have been shown to be 98–100% identical. The low sequence diversity in Crocosphaera contrasts dramatically with that of sympatric species of Prochlorococcus and Synechococcus, and raises the question of how genome differences can explain observed phenotypic diversity among Crocosphaera strains. Here we show, through whole genome comparisons of two phenotypically different strains, that there are strain-specific sequences in each genome, and numerous genome rearrangements, despite exceptionally low sequence diversity in shared genomic regions. Some of the strain-specific sequences encode functions that explain observed phenotypic differences, such as exopolysaccharide biosynthesis. The pattern of strain-specific sequences distributed throughout the genomes, along with rearrangements in shared sequences, is evidence of significant genetic mobility that may be attributed to the hundreds of transposase genes found in both strains. Furthermore, such genetic mobility appears to be the main mechanism of strain divergence in Crocosphaera, which does not accumulate DNA microheterogeneity over the vast majority of its genome. The strain-specific sequences found in this study provide tools for future physiological studies, as well as genetic markers to help determine the relative abundance of phenotypes in natural populations.
Keywords: comparative genomics; Crocosphaera; exopolysaccharide biosynthesis; genome conservation; mobile genetic elements; nitrogen fixation
Summary: Marine picocyanobacteria of the genera Prochlorococcus and Synechococcus numerically dominate the picophytoplankton of the world ocean, making a key contribution to global primary production. Prochlorococcus was isolated around 20 years ago and is probably the most abundant photosynthetic organism on Earth. The genus comprises specific ecotypes which are phylogenetically distinct and differ markedly in their photophysiology, allowing growth over a broad range of light and nutrient conditions within the 45°N to 40°S latitudinal belt that they occupy. Synechococcus and Prochlorococcus are closely related, together forming a discrete picophytoplankton clade, but are distinguishable by their possession of dissimilar light-harvesting apparatuses and differences in cell size and elemental composition. Synechococcus strains have a ubiquitous oceanic distribution compared to that of Prochlorococcus strains and are characterized by phylogenetically discrete lineages with a wide range of pigmentation. In this review, we put our current knowledge of marine picocyanobacterial genomics into an environmental context and present previously unpublished genomic information arising from extensive genomic comparisons in order to provide insights into the adaptations of these marine microbes to their environment and how they are reflected at the genomic level.
Motivation: Small non-coding RNAs (ncRNAs) play important roles in various cellular functions in all clades of life. With next-generation sequencing techniques, it has become possible to study ncRNAs in a high-throughput manner, and specialized algorithms can detect ncRNA classes such as miRNAs in deep sequencing data. Typically, such methods are targeted to a certain class of ncRNA. Many methods rely on RNA secondary structure prediction, which is not always accurate; moreover, not all ncRNA classes are characterized by a common secondary structure. Unbiased classification methods for ncRNAs could be important to improve accuracy and to detect new ncRNA classes in sequencing data.
Results: Here, we present a scoring system called ALPS (alignment of pattern matrices score) that only uses primary information from a deep sequencing experiment, i.e. the relative positions and lengths of reads, to classify ncRNAs. ALPS makes no further assumptions, e.g. about common structural properties in the ncRNA class and is nevertheless able to identify ncRNA classes with high accuracy. Since ALPS is not designed to recognize a certain class of ncRNA, it can be used to detect novel ncRNA classes, as long as these unknown ncRNAs have a characteristic pattern of deep sequencing read lengths and positions. We evaluate our scoring system on publicly available deep sequencing data and show that it is able to classify known ncRNAs with high sensitivity and specificity.
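The idea of describing a locus only by the relative positions and lengths of its reads can be illustrated with a small sketch. The Python code below builds a normalized count matrix indexed by read length (rows) and read start position relative to the locus (columns). This layout, the function name `pattern_matrix`, its parameters and the example reads are all assumptions made for illustration; the sketch does not reproduce the actual ALPS pattern matrix definition or its alignment score.

```python
def pattern_matrix(reads, locus_start, locus_length, min_len, max_len):
    """Build a normalized (read length x relative start) count matrix.

    `reads` is an iterable of (start, length) tuples in genome coordinates.
    The matrix layout is a hypothetical illustration, not the ALPS format.
    """
    n_rows = max_len - min_len + 1
    matrix = [[0.0] * locus_length for _ in range(n_rows)]
    total = 0
    for start, length in reads:
        rel = start - locus_start  # position relative to the locus
        if 0 <= rel < locus_length and min_len <= length <= max_len:
            matrix[length - min_len][rel] += 1
            total += 1
    if total:
        # Convert raw counts to relative frequencies so that loci with
        # different read depths become comparable.
        matrix = [[count / total for count in row] for row in matrix]
    return matrix


# Made-up reads: two identical 22-nt reads at the locus start, plus two others.
reads = [(100, 22), (100, 22), (101, 21), (140, 23)]
m = pattern_matrix(reads, locus_start=100, locus_length=50, min_len=20, max_len=24)
print(m[22 - 20][0])  # fraction of reads of length 22 starting at offset 0
```

Two loci whose matrices show a similar pattern, for example a dominant stack of reads of one length at one offset, would be candidates for the same class under this kind of representation, with no secondary structure prediction involved.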
Availability: Calculated pattern matrices of the datasets hESC and EB are available at the project web site http://www.bio.ifi.lmu.de/ALPS. An implementation of the described method is available upon request from the authors.
From genomic sequencing it has become apparent that the marine cyanomyoviruses capable of infecting strains of unicellular cyanobacteria assigned to the genera Synechococcus and Prochlorococcus are not only morphologically similar to T4, but are also genetically related, typically sharing some 40-48 genes. The large majority of these common genes are the same in all marine cyanomyoviruses so far characterized. Given the fundamental physiological differences between marine unicellular cyanobacteria and heterotrophic hosts of T4-like phages it is not surprising that the study of cyanomyoviruses has revealed novel and fascinating facets of the phage-host relationship. One of the most interesting features of the marine cyanomyoviruses is their possession of a number of genes that are clearly of host origin such as those involved in photosynthesis, like the psbA gene that encodes a core component of the photosystem II reaction centre. Other host-derived genes encode enzymes involved in carbon metabolism, phosphate acquisition and ppGpp metabolism. The impact of these host-derived genes on phage fitness has still largely to be assessed and represents one of the most important topics in the study of this group of T4-like phages in the laboratory. However, these phages are also of considerable environmental significance by virtue of their impact on key contributors to oceanic primary production and the true extent and nature of this impact has still to be accurately assessed.