The oceanic cyanobacteria Prochlorococcus are globally important, ecologically diverse primary producers. It is thought that their viruses (phages) mediate population sizes and affect the evolutionary trajectories of their hosts. Here we present an analysis of genomes from three Prochlorococcus phages: a podovirus and two myoviruses. The morphology, overall genome features, and gene content of these phages suggest that they are quite similar to T7-like (P-SSP7) and T4-like (P-SSM2 and P-SSM4) phages. Using the existing phage taxonomic framework as a guideline, we examined genome sequences to establish “core” genes for each phage group. We found that the podovirus contained 15 of 26 core T7-like genes and the two myoviruses contained 43 and 42 of 75 core T4-like genes. In addition to these core genes, each genome contains a significant number of “cyanobacterial” genes, i.e., genes with significant best BLAST hits to genes found in cyanobacteria. Some of these, we speculate, represent “signature” cyanophage genes. For example, all three phage genomes contain photosynthetic genes (psbA, hliP) that are thought to help maintain host photosynthetic activity during infection, as well as an aldolase family gene (talC) that could facilitate alternative routes of carbon metabolism during infection. The podovirus genome also contains an integrase gene (int) and other features that suggest it is capable of integrating into its host. If indeed it is, this would be unprecedented among cultured T7-like phages or marine cyanophages and would have significant evolutionary and ecological implications for phage and host. Further, both myoviruses contain phosphate-inducible genes (phoH and pstS) that are likely to be important for phage and host responses to stress caused by scarcity of phosphate, a commonly limiting nutrient in marine systems.
Thus, these marine cyanophages appear to be variations of two well-known phages—T7 and T4—but contain genes that, if functional, reflect adaptations for infection of photosynthetic hosts in low-nutrient oceanic environments.
An analysis of the genome sequences of three phages capable of infecting the marine unicellular cyanobacterium Prochlorococcus reveals that they are genetically complex, with intriguing adaptations related to their oceanic environment.
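The core-gene counts reported above can be turned into percentages to make the comparison between the two phage groups concrete. The short Python sketch below is illustrative only: the labels "myovirus A" and "myovirus B" are placeholders, because the text does not say which of P-SSM2 and P-SSM4 carries 43 core genes and which carries 42.

```python
# Core-gene counts as stated in the abstract: (genes found, size of core set).
# "myovirus A/B" are placeholder labels; the text does not assign 43 vs 42
# to a specific phage.
core_hits = {
    "podovirus P-SSP7 (T7-like core)": (15, 26),
    "myovirus A (T4-like core)": (43, 75),
    "myovirus B (T4-like core)": (42, 75),
}


def percent(found: int, total: int) -> float:
    """Percentage of the core gene set that is present in a genome."""
    return 100 * found / total


for phage, (found, total) in core_hits.items():
    print(f"{phage}: {found}/{total} = {percent(found, total):.1f}%")
```

On these numbers each phage carries a little over half of its group's core set, which fits the description of the phages as variations on T7 and T4 rather than close copies.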
Marine Synechococcus spp. and marine Prochlorococcus spp. are numerically dominant photoautotrophs in the open oceans and contributors to the global carbon cycle. Syn5 is a short-tailed cyanophage isolated from the Sargasso Sea on Synechococcus strain WH8109. Syn5 has been grown in WH8109 to high titer in the laboratory, then purified and concentrated while retaining infectivity. Genome sequencing and annotation of Syn5 revealed that the linear genome is 46,214 bp long with a 237 bp terminal direct repeat. Sixty-one open reading frames (ORFs) were identified. Based on genomic organization and sequence similarity to known protein sequences within GenBank, Syn5 shares features with T7-like phages. The presence of a putative integrase suggests access to a temperate life cycle. Assignment of eleven ORFs to structural proteins found within the phage virion was confirmed by mass spectrometry and N-terminal sequencing. Eight of these identified structural proteins exhibited amino acid sequence similarity to enteric phage proteins. The remaining three virion proteins did not resemble any known phage sequences in GenBank as of August 2006. Cryoelectron micrographs of purified Syn5 virions revealed that the capsid has a single “horn”, a novel fibrous structure protruding from the end of the capsid opposite the tail. The tail appendage displayed an apparent three-fold rather than six-fold symmetry. An 18 Å resolution icosahedral reconstruction of the capsid revealed a T=7 lattice, but with an unusual pattern of surface knobs. This phage/host system should allow detailed investigation of the physiology and biochemistry of phage propagation in marine photosynthetic bacteria.
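A terminal direct repeat, such as the 237 bp repeat reported for Syn5, means that the first bases of the linear genome reappear in the same orientation at its end. A minimal Python sketch of how such a repeat could be located is given below. The function name and the toy sequence are hypothetical, only exact matches are considered, and a real assembly would need tolerance for sequencing errors.

```python
def terminal_direct_repeat(genome: str, max_len: int = 1000) -> int:
    """Return the length of the longest exact repeat shared by both ends.

    A direct terminal repeat of length k means genome[:k] == genome[-k:]
    (same strand, same orientation). Returns 0 when no repeat is found.
    """
    genome = genome.upper()
    # The two copies cannot overlap, so k is at most half the genome length.
    limit = min(max_len, len(genome) // 2)
    for k in range(limit, 0, -1):  # try the longest candidate first
        if genome[:k] == genome[-k:]:
            return k
    return 0


# Toy example (not Syn5 sequence): an 8-base repeat flanking a unique middle.
toy_genome = "ACGTTGCA" + "GGGCCCTTTAAACCCGGGTT" + "ACGTTGCA"
print(terminal_direct_repeat(toy_genome))
```

Searching from the longest candidate downward returns the full repeat rather than a short chance match at the ends.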
Cyanophages (cyanobacterial viruses) are important agents of horizontal gene transfer among marine cyanobacteria, the numerically dominant photosynthetic organisms in the oceans. Some cyanophage genomes carry and express host-like photosynthesis genes, presumably to augment the host photosynthetic machinery during infection. To study the prevalence and evolutionary dynamics of this phenomenon, 33 cultured cyanophages of known family and host range and viral DNA from field samples were screened for the presence of two core photosystem reaction center genes, psbA and psbD. Combining this expanded dataset with published data for nine other cyanophages, we found that 88% of the phage genomes contain psbA, and 50% contain both psbA and psbD. The psbA gene was found in all myoviruses and Prochlorococcus podoviruses, but could not be amplified from Prochlorococcus siphoviruses or Synechococcus podoviruses. Nearly all of the phages that encoded both psbA and psbD had broad host ranges. We speculate that the presence or absence of psbA in a phage genome may be determined by the length of the latent period of infection. Whether it also carries psbD may reflect constraints on coupling of viral- and host-encoded PsbA–PsbD in the photosynthetic reaction center across divergent hosts. Phylogenetic clustering patterns of these genes from cultured phages suggest that whole genes have been transferred from host to phage in a discrete number of events over the course of evolution (four for psbA, and two for psbD), followed by horizontal and vertical transfer between cyanophages. Clustering patterns of psbA and psbD from Synechococcus cells were inconsistent with other molecular phylogenetic markers, suggesting genetic exchanges involving Synechococcus lineages. Signatures of intragenic recombination, detected within the cyanophage gene pool as well as between hosts and phages in both directions, support this hypothesis. The analysis of cyanophage psbA and psbD genes from field populations revealed significant sequence diversity, much of which is represented in our cultured isolates. Collectively, these findings show that photosynthesis genes are common in cyanophages and that significant genetic exchanges occur from host to phage, phage to host, and within the phage gene pool. This generates genetic diversity among the phage, which serves as a reservoir for their hosts, and in turn influences photosystem evolution.
Analysis of 33 cultured cyanophages of known family and host range, as well as viral DNA from field samples, reveals the prevalence of photosynthesis genes in cyanophages and demonstrates significant genetic exchanges between host and phage.
Prochlorococcus, an extremely small cyanobacterium that is very abundant in the world's oceans, has a very streamlined genome. On average, these cells have about 2,000 genes and very few regulatory proteins. The limited capability of regulation is thought to be a result of selection imposed by a relatively stable environment in combination with a very small genome. Furthermore, only ten non-coding RNAs (ncRNAs), which play crucial regulatory roles in all forms of life, have been described in Prochlorococcus. Most strains also lack the RNA chaperone Hfq, raising the question of how important this mode of regulation is for these cells. To explore this question, we examined the transcription of intergenic regions of Prochlorococcus MED4 cells subjected to a number of different stress conditions: changes in light qualities and quantities, phage infection, or phosphorus starvation. Analysis of Affymetrix microarray expression data from intergenic regions revealed 276 novel transcriptional units. Among these were 12 new ncRNAs, 24 antisense RNAs (asRNAs), as well as 113 short mRNAs. Two additional ncRNAs were identified by homology, and all 14 new ncRNAs were independently verified by Northern hybridization and 5′RACE. Unlike its reduced suite of regulatory proteins, the number of ncRNAs relative to genome size in Prochlorococcus is comparable to that found in other bacteria, suggesting that RNA regulators likely play a major role in regulation in this group. Moreover, the ncRNAs are concentrated in previously identified genomic islands, which carry genes of significance to the ecology of this organism, many of which are not of cyanobacterial origin. Expression profiles of some of these ncRNAs suggest involvement in light stress adaptation and/or the response to phage infection consistent with their location in the hypervariable genomic islands.
Prochlorococcus is the most abundant phototroph in the vast, nutrient-poor areas of the ocean. It plays an important role in the ocean carbon cycle, and is a key component of the base of the food web. All cells share a core set of about 1,200 genes, augmented with a variable number of “flexible” genes. Many of the latter are located in genomic islands—hypervariable regions of the genome that encode functions important in differentiating the niches of “ecotypes.” Of major interest is how cells with such a small genome regulate cellular processes, as they lack many of the regulatory proteins commonly found in bacteria. We show here that contrary to the regulatory proteins, ncRNAs are present at levels typical of bacteria, revealing that they might have a disproportional regulatory role in Prochlorococcus—likely an adaptation to the extremely low-nutrient conditions of the open oceans, combined with the constraints of a small genome. Some of the ncRNAs were differentially expressed under stress conditions, and a high number of them were found to be associated with genomic islands, suggesting functional links between these RNAs and the response of Prochlorococcus to particular environmental challenges.
Prochlorococcus is a marine cyanobacterium that numerically dominates the mid-latitude oceans and is the smallest known oxygenic phototroph. Numerous isolates from diverse areas of the world's oceans have been studied and shown to be physiologically and genetically distinct. All isolates described thus far can be assigned to either a tightly clustered high-light (HL)-adapted clade, or a more divergent low-light (LL)-adapted group. The 16S rRNA sequences of the entire Prochlorococcus group differ by at most 3%, and the four initially published genomes revealed patterns of genetic differentiation that help explain physiological differences among the isolates. Here we describe the genomes of eight newly sequenced isolates and combine them with the first four genomes for a comprehensive analysis of the core (shared by all isolates) and flexible genes of the Prochlorococcus group, and the patterns of loss and gain of the flexible genes over the course of evolution. There are 1,273 genes that represent the core shared by all 12 genomes. They are apparently sufficient, according to metabolic reconstruction, to encode a functional cell. We describe a phylogeny for all 12 isolates by subjecting their complete proteomes to three different phylogenetic analyses. For each non-core gene, we used a maximum parsimony method to estimate which ancestor likely first acquired or lost each gene. Many of the genetic differences among isolates, especially for genes involved in outer membrane synthesis and nutrient transport, are found within the same clade. Nevertheless, we identified some genes defining HL and LL ecotypes, and clades within these broad ecotypes, helping to demonstrate the basis of HL and LL adaptations in Prochlorococcus. Furthermore, our estimates of gene gain events allow us to identify highly variable genomic islands that are not apparent through simple pairwise comparisons. 
These results emphasize the functional roles, especially those connected to outer membrane synthesis and transport that dominate the flexible genome and set it apart from the core. Besides identifying islands and demonstrating their role throughout the history of Prochlorococcus, reconstruction of past gene gains and losses shows that much of the variability exists at the “leaves of the tree,” between the most closely related strains. Finally, the identification of core and flexible genes from this 12-genome comparison is largely consistent with the relative frequency of Prochlorococcus genes found in global ocean metagenomic databases, further closing the gap between our understanding of these organisms in the lab and the wild.
Prochlorococcus—the most abundant photosynthetic microbe living in the vast, nutrient-poor areas of the ocean—is a major contributor to the global carbon cycle. Prochlorococcus is composed of closely related, physiologically distinct lineages whose differences enable the group as a whole to proliferate over a broad range of environmental conditions. We compare the genomes of 12 strains of Prochlorococcus representing its major lineages in order to identify genetic differences affecting the ecology of different lineages and their evolutionary origin. First, we identify the core genome: the 1,273 genes shared among all strains. This core set of genes encodes the essentials of a functional cell, enabling it to make living matter out of sunlight and carbon dioxide. We then create a genomic tree that maps the gain and loss of non-core genes in individual strains, showing that a striking number of genes are gained or lost even among the most closely related strains. We find that lost and gained genes commonly cluster in highly variable regions called genomic islands. The level of diversity among the non-core genes, and the number of new genes added with each new genome sequenced, suggest far more diversity to be discovered.
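The two computational ideas in this study, splitting genes into core and flexible sets and placing gene gains or losses on a tree by parsimony, can be sketched in a few lines of Python. Everything here is a toy illustration: the strain names, gene labels, and tree are hypothetical, and the Fitch algorithm is used as one simple form of maximum parsimony, since the text does not specify the exact procedure the authors applied.

```python
# Hypothetical gene content for four toy strains (not real annotations).
genomes = {
    "HL_1": {"g1", "g2", "g3", "g4"},
    "HL_2": {"g1", "g2", "g3", "g5"},
    "LL_1": {"g1", "g2", "g6"},
    "LL_2": {"g1", "g2", "g6", "g7"},
}


def core_and_flexible(genomes):
    """Core = genes present in every genome; flexible = all remaining genes."""
    core = set.intersection(*genomes.values())
    pan = set.union(*genomes.values())
    return core, pan - core


def fitch(tree, states):
    """Bottom-up Fitch pass for one gene on a rooted binary tree.

    tree   : nested 2-tuples whose leaves are strain names
    states : strain name -> 1 (gene present) or 0 (gene absent)
    Returns (possible states at this node, minimum number of gain/loss events).
    """
    if isinstance(tree, str):  # leaf: state is observed
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # children agree: no extra event needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical rooted tree: two high-light and two low-light strains.
tree = (("HL_1", "HL_2"), ("LL_1", "LL_2"))

core, flexible = core_and_flexible(genomes)
print("core:", sorted(core))
for gene in sorted(flexible):
    presence = {strain: int(gene in genes) for strain, genes in genomes.items()}
    root_states, events = fitch(tree, presence)
    print(gene, "minimum gain/loss events:", events)
```

In this toy data a gene such as g6, shared by both low-light strains, needs a single event on the branch separating the clades, while g4, g5, and g7 each need one event at a single leaf, mirroring the observation that much variability sits between closely related strains.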
ProPortal (http://proportal.mit.edu/) is a database containing genomic, metagenomic, transcriptomic and field data for the marine cyanobacterium Prochlorococcus. Our goal is to provide a source of cross-referenced data across multiple scales of biological organization—from the genome to the ecosystem—embracing the full diversity of ecotypic variation within this microbial taxon, its sister group, Synechococcus and phage that infect them. The site currently contains the genomes of 13 Prochlorococcus strains, 11 Synechococcus strains and 28 cyanophage strains that infect one or both groups. Cyanobacterial and cyanophage genes are clustered into orthologous groups that can be accessed by keyword search or through a genome browser. Users can also identify orthologous gene clusters shared by cyanobacterial and cyanophage genomes. Gene expression data for Prochlorococcus ecotypes MED4 and MIT9313 allow users to identify genes that are up or downregulated in response to environmental stressors. In addition, the transcriptome in synchronized cells grown on a 24-h light–dark cycle reveals the choreography of gene expression in cells in a ‘natural’ state. Metagenomic sequences from the Global Ocean Survey from Prochlorococcus, Synechococcus and phage genomes are archived so users can examine the differences between populations from diverse habitats. Finally, an example of cyanobacterial population data from the field is included.
Phages infecting marine picocyanobacteria often carry a psbA gene, which encodes a homolog to the photosynthetic reaction center protein, D1. Host encoded D1 decays during phage infection in the light. Phage encoded D1 may help to maintain photosynthesis during the lytic cycle, which in turn could bolster the production of deoxynucleoside triphosphates (dNTPs) for phage genome replication.
Methodology / Principal Findings
To explore the consequences to a phage of encoding and expressing psbA, we derive a simple model of infection for a cyanophage/host pair (cyanophage P-SSP7 and Prochlorococcus MED4) for which pertinent laboratory data are available. We first use the model to describe phage genome replication and the kinetics of psbA expression by host and phage. We then examine the contribution of phage psbA expression to phage genome replication under constant low irradiance (25 µE m⁻² s⁻¹). We predict that while phage psbA expression could lead to an increase in the number of phage genomes produced during a lytic cycle of between 2.5 and 4.5% (depending on parameter values), this advantage can be nearly negated by the cost of psbA in elongating the phage genome. Under higher irradiance conditions that promote D1 degradation, however, phage psbA confers a greater advantage to phage genome replication.
Conclusions / Significance
These analyses illustrate how psbA may benefit phage in the dynamic ocean surface mixed layer.
Cyanophage infecting the marine cyanobacteria Prochlorococcus and Synechococcus require light and host photosystem activity for optimal reproduction. Many cyanophages encode multiple photosynthetic electron transport (PET) proteins, which are presumed to maintain electron flow and produce ATP and NADPH for nucleotide biosynthesis and phage genome replication. However, evidence suggests phage augment NADPH production via the pentose phosphate pathway (PPP), thus calling into question the need for NADPH production by PET. Genes implicated in cyclic PET have since been identified in cyanophage genomes. It remains an open question which mode of PET, cyclic or linear, predominates in infected cyanobacteria, and thus whether the balance is towards producing ATP or NADPH. We sequenced transcriptomes of a cyanophage (P-HM2) and its host (Prochlorococcus MED4) throughout infection in the light or in the dark, and analyzed these data in the context of phage replication and metabolite measurements. Infection was robust in the light, but phage were not produced in the dark. Host gene transcripts encoding high-light inducible proteins and two terminal oxidases (plastoquinol terminal oxidase and cytochrome c oxidase)—implicated in protecting the photosynthetic membrane from light stress—were the most enriched in light but not dark infection. Among the most diminished transcripts in both light and dark infection was ferredoxin–NADP+ reductase (FNR), which uses the electron acceptor NADP+ to generate NADPH in linear photosynthesis. The phage gene for CP12, which putatively inhibits the Calvin cycle enzyme that receives NADPH from FNR, was highly expressed in light infection. Therefore, both PET production of NADPH and its consumption by carbon fixation are putatively repressed during phage infection in light. 
Transcriptomic evidence is thus consistent with cyclic photophosphorylation using oxygen as the terminal electron acceptor as the dominant mode of PET under infection, with ATP from PET and NADPH from the PPP producing the energy and reducing equivalents for phage nucleotide biosynthesis and replication.
Halophage HF2 is a lytic, broad-host-range bacteriophage of the extremely halophilic domain Archaea. It has a 79.7-kb double-stranded DNA genome which is linear, contains no modified nucleotides, and is not susceptible to cleavage by many type II restriction endonucleases. This insensitivity is attributed to selection against palindromic restriction sites, a commonly observed feature of broad-host-range phages. Interestingly, enzymes that did cut the genome recognized AT-rich sites, and five such enzymes, DraI, AseI, HpaI, HindIII, and SspI, were used to construct a physical map of the genome. Southern hybridization experiments used to order fragments on the map indicated homologies between the phage termini, and subsequent sequence analysis showed that HF2 possessed 306-bp direct terminal repeats. The presence of such repeats suggested replication through concatameric intermediates, and this was confirmed by analysis of the state of the phage genome in infected cells. This is a replication strategy adopted by many well-studied bacterial phages, for example T3 and T7. Other similarities between the terminal repeats of T3 or T7 and HF2 include a putative nick site at the repeat border and a series of short imperfect repeats. These observations suggest a long evolutionary history for concatamer-based strategies of phage replication, possibly predating the divergence of Archaea/Eucarya and Bacteria, or alternatively, indicate possible lateral transfer of phage genes or modules between the domains Archaea and Bacteria.
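The two sequence properties discussed here, palindromic recognition sites and AT-richness, are easy to check computationally. The Python sketch below does so for the five enzymes used to build the map. The recognition sequences are the commonly cited ones for these enzymes and are supplied as outside knowledge, not taken from the text; the function names and the toy sequence are illustrative.

```python
# Commonly cited recognition sequences (5'->3'); not stated in the text above.
SITES = {
    "DraI": "TTTAAA",
    "AseI": "ATTAAT",
    "HpaI": "GTTAAC",
    "HindIII": "AAGCTT",
    "SspI": "AATATT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic when it equals its own reverse complement."""
    return site == reverse_complement(site)


def at_fraction(site: str) -> float:
    """Fraction of positions in the site that are A or T."""
    return sum(base in "AT" for base in site) / len(site)


def count_sites(genome: str, site: str) -> int:
    """Count occurrences of a site in a genome, allowing overlaps."""
    genome = genome.upper()
    count, start = 0, genome.find(site)
    while start != -1:
        count += 1
        start = genome.find(site, start + 1)
    return count


for enzyme, site in SITES.items():
    print(enzyme, site, is_palindromic(site), f"AT = {at_fraction(site):.2f}")

# Toy sequence (not HF2): two DraI sites back to back.
print(count_sites("TTTAAATTTAAA", SITES["DraI"]))
```

All five sites are palindromic and at least two-thirds A or T, consistent with the statement that the enzymes able to cut the genome recognize AT-rich sites.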
Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The ∼108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element ‘mobilome’.
Prochlorococcus is the numerically dominant phototroph in the oligotrophic subtropical ocean and carries out a significant fraction of marine primary productivity. Although field studies have provided evidence for nitrate uptake by Prochlorococcus, little is known about this trait because axenic cultures capable of growth on nitrate have not been available. Additionally, all previously sequenced genomes lacked the genes necessary for nitrate assimilation. Here we introduce three Prochlorococcus strains capable of growth on nitrate and analyze their physiology and genome architecture. We show that the growth of high-light (HL) adapted strains on nitrate is ∼17% slower than their growth on ammonium. By analyzing 41 Prochlorococcus genomes, we find that genes for nitrate assimilation have been gained multiple times during the evolution of this group, and can be found in at least three lineages. In low-light adapted strains, nitrate assimilation genes are located in the same genomic context as in marine Synechococcus. These genes are located elsewhere in HL adapted strains and may often exist as a stable genetic acquisition as suggested by the striking degree of similarity in the order, phylogeny and location of these genes in one HL adapted strain and a consensus assembly of environmental Prochlorococcus metagenome sequences. In another HL adapted strain, nitrate utilization genes may have been independently acquired as indicated by adjacent phage mobility elements; these genes are also duplicated with each copy detected in separate genomic islands. These results provide direct evidence for nitrate utilization by Prochlorococcus and illuminate the complex evolutionary history of this trait.
Cyanobacteria and their phages are significant microbial components of the freshwater and marine environments. We identified a lytic phage, Ma-LMM01, infecting Microcystis aeruginosa, a cyanobacterium that forms toxic blooms on the surfaces of freshwater lakes. Here, we describe the first sequenced freshwater cyanomyovirus genome of Ma-LMM01. The linear, circularly permuted, and terminally redundant genome has 162,109 bp and contains 184 predicted protein-coding genes and two tRNA genes. The genome exhibits no colinearity with previously sequenced genomes of cyanomyoviruses or other Myoviridae. The majority of the predicted genes have no detectable homologues in the databases. These findings indicate that Ma-LMM01 is a member of a new lineage of the Myoviridae family. The genome lacks homologues for the photosynthetic genes that are prevalent in marine cyanophages. However, it has a homologue of nblA, which is essential for the degradation of the major cyanobacteria light-harvesting complex, the phycobilisomes. The genome codes for a site-specific recombinase and two prophage antirepressors, suggesting that it has the capacity to integrate into the host genome. Ma-LMM01 possesses six genes, including three coding for transposases, that are highly similar to homologues found in cyanobacteria, suggesting that recent gene transfers have occurred between Ma-LMM01 and its host. We propose that the Ma-LMM01 NblA homologue possibly reduces the absorption of excess light energy and confers benefits to the phage living in surface waters. This phage genome study suggests that light is central in the phage-cyanobacterium relationships where the viruses use diverse genetic strategies to control their host's photosynthesis.
A myovirus-like temperate phage, ΦHAP-1, was induced with mitomycin C from a Halomonas aquamarina strain isolated from surface waters in the Gulf of Mexico. The induced cultures produced significantly more virus-like particles (VLPs) (3.73 × 10¹⁰ VLP ml⁻¹) than control cultures (3.83 × 10⁷ VLP ml⁻¹) when observed with epifluorescence microscopy. The induced phage was sequenced by using linker-amplified shotgun libraries and contained a genome 39,245 nucleotides in length with a G+C content of 59%. The ΦHAP-1 genome contained 46 putative open reading frames (ORFs), with 76% sharing significant similarity (E value of <10⁻³) at the protein level with other sequences in GenBank. Putative functional gene assignments included small and large terminase subunits, capsid and tail genes, an N6-DNA adenine methyltransferase, and lysogeny-related genes. Although no integrase was found, the ΦHAP-1 genome contained ORFs similar to protelomerase and parA genes found in linear plasmid-like phages with telomeric ends. Southern probing and PCR analysis of host genomic, plasmid, and ΦHAP-1 DNA indicated a lack of integration of the prophage with the host chromosome and a difference in genome arrangement between the prophage and virion forms. The linear plasmid prophage form of ΦHAP-1 begins with the protelomerase gene, presumably due to the activity of the protelomerase, while the induced phage particle has a circularly permuted genome that begins with the terminase genes. The ΦHAP-1 genome shares synteny and gene similarity with coliphage N15 and vibriophages VP882 and VHML, suggesting an evolutionary heritage from an N15-like linear plasmid prophage ancestor.
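The size of the induction effect is easier to judge as a fold change. The short Python calculation below assumes the particle counts are read as 3.73 × 10^10 VLP per ml for induced cultures and 3.83 × 10^7 VLP per ml for controls; the variable names are illustrative.

```python
import math

# Virus-like particle counts per ml, as reported for the two culture types.
induced_vlp_per_ml = 3.73e10
control_vlp_per_ml = 3.83e7

# Ratio of induced to control counts, and the same ratio on a log10 scale.
fold_increase = induced_vlp_per_ml / control_vlp_per_ml
log10_increase = math.log10(fold_increase)

print(f"{fold_increase:.0f}-fold increase "
      f"(about {log10_increase:.1f} orders of magnitude)")
```

Mitomycin C treatment therefore raised particle counts by nearly three orders of magnitude over the untreated control.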
A large fraction of any bacterial genome consists of hypothetical protein-coding open reading frames (ORFs). While most of these ORFs are present only in one or a few sequenced genomes, a few are conserved, often across large phylogenetic distances. Such conservation provides clues to likely uncharacterized cellular functions that need to be elucidated. Marine cyanobacteria from the Prochlorococcus/marine Synechococcus clade are dominant bacteria in oceanic waters and are significant contributors to global primary production. A Hyper Conserved Protein (PSHCP) of unknown function is 100% conserved at the amino acid level in genomes of Prochlorococcus/marine Synechococcus, but lacks homologs outside of this clade. In this study we investigated Prochlorococcus marinus strains MED4 and MIT 9313 and Synechococcus sp. strain WH 8102 for the transcription of the PSHCP gene using RT-Q-PCR, for the presence of the protein product through quantitative immunoblotting, and for the protein's binding partners in a pull-down assay. Significant transcription of the gene was detected in all strains. The PSHCP protein content varied between 8±1 fmol and 26±9 fmol per µg total protein, depending on the strain. The 50S ribosomal protein L2, the Photosystem I protein PsaD and the Ycf48-like protein were found associated with the PSHCP protein in all strains and not appreciably or at all in control experiments. We hypothesize that PSHCP is a protein associated with the ribosome, and is possibly involved in photosystem assembly.
The entire double-stranded DNA genome of the Actinobacillus actinomycetemcomitans bacteriophage AaΦ23 was sequenced. Linear DNA contained in the phage particles is circularly permuted and terminally redundant. Therefore, the physical map of the phage genome is circular. Its size is 43,033 bp with an overall molar G+C content of 42.5 mol%. Sixty-six potential open reading frames (ORFs) were identified, including an ORF resulting from a translational frameshift. A putative function could be assigned to 23 of them. Twenty-three other ORFs share homologies only with hypothetical proteins present in several bacteria or bacteriophages, and 20 ORFs seem to be specific for phage AaΦ23. The organization of the phage genome and several genetic functions share extensive similarities to that of the lambdoid phages. However, AaΦ23 encodes a DNA adenine methylase, and the DNA packaging strategy is more closely related to the P22 system. The attachment sites of AaΦ23 (attP) and several A. actinomycetemcomitans hosts (attB) are 49 bp long.
Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, and Treponema pallidum ssp. pertenue (TPE), the causative agent of yaws, are closely related spirochetes causing diseases with distinct clinical manifestations. The TPA Mexico A strain was isolated in 1953 from a male with primary syphilis living in Mexico. Attempts to cultivate the TPA Mexico A strain under in vitro conditions have revealed lower growth potential compared to other tested TPA strains.
The complete genome sequence of the TPA Mexico A strain was determined using the Illumina sequencing technique. The genome sequence assembly was verified using the whole genome fingerprinting technique and the final sequence was annotated. The genome size of the Mexico A strain was determined to be 1,140,038 bp with 1,035 predicted ORFs. The Mexico A genome sequence was compared to the whole genome sequences of three TPA (Nichols, SS14 and Chicago) and three TPE (CDC-2, Samoa D and Gauthier) strains. No large rearrangements in the Mexico A genome were found and the identified nucleotide changes occurred most frequently in genes encoding putative virulence factors. Nevertheless, the genome of the Mexico A strain revealed two genes (TPAMA_0326 (tp92) and TPAMA_0488 (mcp2-1)) that combine TPA- and TPE-specific nucleotide sequences. Both genes were found to be under positive selection within TPA strains and also between TPA and TPE strains.
The observed mosaic character of the TPAMA_0326 and TPAMA_0488 loci is likely a result of inter-strain recombination between TPA and TPE strains during simultaneous infection of a single host suggesting horizontal gene transfer between treponemal subspecies.
Treponema pallidum is a Gram-negative spirochete that causes diseases with distinct clinical manifestations and uses different transmission strategies. While syphilis (caused by subspecies pallidum) is a worldwide venereal and congenital disease, yaws (caused by subspecies pertenue) is a tropical disease transmitted by direct skin contact. Currently the genetic basis and evolution of these diseases remain unknown.
In this study, we describe a high quality whole genome sequence of T. pallidum ssp. pallidum strain Mexico A, determined using the “next generation” sequencing technique (Illumina). Although the genome of this strain contains no large rearrangements in comparison with other treponemal genomes, we found two genes which combined sequences from both subspecies pallidum and pertenue. The observed mosaic character of these two genes is likely a result of inter-strain recombination between pallidum and pertenue during simultaneous infection of a single host.
Lactococci isolated from non-dairy sources have been found to possess enhanced metabolic activity when compared to dairy strains. These capabilities may be harnessed through the use of these strains as starter or adjunct cultures to produce more diverse flavor profiles in cheese and other dairy products. To understand the interactions between these organisms and the phages that infect them, a number of phages were isolated against lactococcal strains of non-dairy origin. One such phage, ΦL47, was isolated from a sewage sample using the grass isolate L. lactis ssp. cremoris DPC6860 as a host. Visualization of phage virions by transmission electron microscopy established that this phage belongs to the family Siphoviridae and possesses a long tail fiber, previously unseen in dairy lactococcal phages. Determination of the lytic spectrum revealed a broader than expected host range, with ΦL47 capable of infecting 4 industrial dairy strains, including ML8, HP and 310, and 3 additional non-dairy isolates. Whole genome sequencing of ΦL47 revealed a dsDNA genome of 128,546 bp, making it the largest sequenced lactococcal phage to date. In total, 190 open reading frames (ORFs) were identified, and comparative analysis revealed that the predicted products of 117 of these ORFs shared greater than 50% amino acid identity with those of L. lactis phage Φ949, a phage isolated from cheese whey. Despite their different ecological niches, the genomic content and organization of ΦL47 and Φ949 are quite similar, with both containing 4 gene clusters oriented in different transcriptional directions. Other features that distinguish ΦL47 from Φ949 and other lactococcal phages, in addition to the presence of the tail fiber and the genome length, include a low GC content (32.5%) and a high number of predicted tRNA genes (8). Comparative genome analysis supports the conclusion that ΦL47 is a new member of the 949 lactococcal phage group which currently includes the dairy Φ949.
Lactococcus lactis; non-dairy; phage; tail fiber; genome
Vibrio parahaemolyticus O3:K6 pandemic strains recovered in Chile frequently possess a 42-kb plasmid which is the prophage of a myovirus. We studied the prototype phage VP58.5 and show that it does not integrate into the host cell chromosome but replicates as a linear plasmid (Vp58.5) with covalently closed ends (telomeres). The Vp58.5 replicon coexists with other plasmid prophages (N15, PY54, and ΦKO2) in the same cell and thus belongs to a new incompatibility group of telomere phages. We determined the complete nucleotide sequence (42,612 nucleotides) of the VP58.5 phage DNA and compared it with that of the plasmid prophage. The two molecules share the same nucleotide sequence but are 35% circularly permuted to each other. In contrast to the hairpin ends of the plasmid, VP58.5 phage DNA contains 5′-protruding ends. The VP58.5 sequence is 92% identical to the sequence of phage VHML, which was reported to integrate into the host chromosome. However, the gene order and termini of the phage DNAs are different. The VHML genome exhibits the same gene order as does the Vp58.5 plasmid. VHML phage DNA has been reported to contain terminal inverted repeats. This repetitive sequence is similar to the telomere resolution site (telRL) of VP58.5 which, after processing by the phage protelomerase, forms the hairpin ends of the Vp58.5 prophage. It is discussed why these closely related phages may be so different in terms of their genome ends and their lifestyle.
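The notion of two molecules that share a sequence but are circularly permuted relative to each other can be illustrated with a minimal Python sketch. The function name `circular_offset` and the 20-nt toy sequence are hypothetical and are not taken from the VP58.5 data; the sketch also ignores the differing end structures (hairpins versus 5′-protruding ends) and treats both molecules as plain strings of equal length.

```python
def circular_offset(reference, permuted):
    """Return the 0-based rotation that turns `reference` into `permuted`.

    Returns None when the two sequences are not circular permutations
    of each other. Every rotation of `reference` appears as a substring
    of `reference + reference`, so a single search is sufficient.
    """
    if len(reference) != len(permuted):
        return None
    index = (reference + reference).find(permuted)
    return index if index != -1 else None


# Toy example (hypothetical sequence): a 20-nt molecule rotated by 7 nt,
# i.e. permuted by 7/20 = 35% of its length.
plasmid_like = "ACGTTGCAAGCTTAGGCATC"
phage_like = plasmid_like[7:] + plasmid_like[:7]
offset = circular_offset(plasmid_like, phage_like)
fraction = offset / len(plasmid_like)
```

For real genomes, sequencing differences between the two molecules would make an exact string search too strict, and an alignment-based comparison would be the more realistic choice.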
The complete sequence of the 46,267 bp genome of the lytic bacteriophage tf specific to Pseudomonas putida PpG1 has been determined. The phage genome has two sets of convergently transcribed genes and 186 bp long direct terminal repeats. The overall genomic architecture of the tf phage is similar to that of the previously described Pseudomonas aeruginosa phages PaP3, LUZ24 and phiMR299-2, and 39 out of the 72 products of predicted tf open reading frames have orthologs in these phages. Accordingly, tf was classified as belonging to the LUZ24-like bacteriophage group. However, taking into account very low homology levels between tf DNA and that of the other phages, tf should be considered as an evolutionarily divergent member of the group. Two distinguishing features not reported for other members of the group were found in the tf genome. Firstly, a unique end structure – a blunt right end and a 4-nucleotide 3′-protruding left end – was observed. Secondly, 14 single-chain interruptions (nicks) were found in the top strand of the tf DNA. All nicks were mapped within a consensus sequence 5′-TACT/RTGMC-3′. Two nicks were analyzed in detail and were shown to be present in more than 90% of the phage population. Although localized nicks were previously found only in the DNA of T5-like and phiKMV-like phages, it seems increasingly likely that this enigmatic structural feature is common to various other bacteriophages.
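As an illustration of how candidate sites matching the nick consensus 5′-TACT/RTGMC-3′ could be located in a top-strand sequence, the following Python sketch expands the IUPAC codes (R = A or G, M = A or C) into a regular expression. The function name `find_nick_sites` and the test sequences are hypothetical; this is a sketch of a generic motif scan, not the mapping procedure used for the tf genome, and a sequence match does not by itself show that a nick is present.

```python
import re

# IUPAC codes appearing in the consensus: R = A or G, M = A or C.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "M": "[AC]"}


def find_nick_sites(top_strand, consensus="TACT/RTGMC"):
    """Return 0-based indices of the base immediately 3' of each candidate nick.

    The '/' in `consensus` marks the nick position. A lookahead is used
    so that overlapping motif occurrences are all reported.
    """
    nick_offset = consensus.index("/")
    motif = consensus.replace("/", "")
    pattern = "".join(IUPAC[base] for base in motif)
    matches = re.finditer(f"(?=({pattern}))", top_strand.upper())
    return [match.start() + nick_offset for match in matches]


# Toy sequences (hypothetical, not from the tf genome).
sites_a = find_nick_sites("GGTACTATGACCC")   # motif TACT/ATGAC starts at index 2
sites_b = find_nick_sites("tactgtgcc")       # lower case input, R = G and M = C
sites_none = find_nick_sites("TACTCTGAC")    # C is not allowed at the R position
```

Only the strand passed in is scanned; searching the complementary strand would require a separate call on its reverse complement.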
Sequencing analysis of mitochondrial genomes is important for understanding the evolution and genome structures of various plant species. Barley is a self-pollinated diploid plant with seven chromosomes comprising a large haploid genome of 5.1 Gbp. Wild barley (Hordeum vulgare ssp. spontaneum) and cultivated barley (H. vulgare ssp. vulgare) have cross compatibility and closely related genomes, although a significant number of nucleotide polymorphisms have been reported between their genomes.
We determined the complete nucleotide sequences of the mitochondrial genomes of wild and cultivated barley. Two independent circular maps of the 525,599 bp barley mitochondrial genome were constructed by de novo assembly of high-throughput sequencing reads of barley lines H602 and Haruna Nijo, with only three SNPs detected between haplotypes. These mitochondrial genomes contained 33 protein-coding genes, three ribosomal RNAs, 16 transfer RNAs, 188 new ORFs, six major repeat sequences and several types of transposable elements. Of the barley mitochondrial genome-encoded proteins, NAD6, NAD9 and RPS4 had unique structures among grass species.
The mitochondrial genome of barley was similar to those of other grass species in terms of gene content, but the configuration of the genes was highly differentiated from that of other grass species. Mitochondrial genome sequencing is essential for annotating the barley nuclear genome; our mitochondrial sequencing identified a significant number of fragmented mitochondrial sequences in the reported nuclear genome sequences. Little polymorphism was detected in the barley mitochondrial genome sequences, which should be explored further to elucidate the evolution of barley.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-3159-3) contains supplementary material, which is available to authorized users.
Hordeum vulgare; Mitochondrial genome; De novo assembly; Comparative genomics
Bacteriophage asccφ28 infects dairy fermentation strains of Lactococcus lactis. This report describes characterization of asccφ28 and its full genome sequence. Phage asccφ28 has a prolate head, whiskers, and a short tail (C2 morphotype). This morphology and DNA hybridization to L. lactis phage P369 DNA showed that asccφ28 belongs to the P034 phage species, a group rarely encountered in the dairy industry. The burst size of asccφ28 was found to be 121 ± 18 PFU per infected bacterial cell after a latent period of 44 min. The linear genome (18,762 bp) contains 28 possible open reading frames (ORFs) comprising 90% of the total genome. The ORFs are arranged bidirectionally in recognizable functional modules. The genome contains 577 bp inverted terminal repeats (ITRs) and putatively eight promoters and four terminators. The presence of ITRs, a phage-encoded DNA polymerase, and a terminal protein that binds to the DNA, along with BLAST and morphology data, show that asccφ28 more closely resembles streptococcal phage Cp-1 and the φ29-like phages that infect Bacillus subtilis than it resembles common lactococcal phages. The sequence of this phage is the first published sequence of a P034 species phage genome.
The purpose of this study was to investigate the characteristics of transfer RNA (tRNA) responsible for the association between tRNA genes and genes of apparently foreign origin (genomic islands) in five high-light adapted Prochlorococcus strains. Both bidirectional best BLASTP (basic local alignment search tool for proteins) search and the conservation of gene order against each other were utilized to identify genomic islands, and 7 genomic islands were found to be immediately adjacent to tRNAs in Prochlorococcus marinus AS9601, 11 in P. marinus MIT9515, 8 in P. marinus MED4, 6 in P. marinus MIT9301, and 6 in P. marinus MIT9312. Monte Carlo simulation showed that tRNA genes are hotspots for the integration of genomic islands in Prochlorococcus strains. The tRNA genes associated with genomic islands showed the following characteristics: (1) the association was biased towards a specific subset of all iso-accepting tRNA genes; (2) the codon usages of genes within genomic islands appear to be unrelated to the codons recognized by associated tRNAs; and, (3) the majority of the 3′ ends of associated tRNAs lack CCA ends. These findings contradict previous hypotheses concerning the molecular basis for the frequent use of tRNA as the insertion site for foreign genetic materials. The analysis of a genomic island associated with a tRNA-Asn gene in P. marinus MIT9301 suggests that foreign genetic material is inserted into the host genomes by means of site-specific recombination, with the 3′ end of the tRNA as the target, and during the process, a direct repeat of the 3′ end sequence of a boundary tRNA (namely, a scar from the process of insertion) is formed elsewhere in the genomic island. Through the analysis of the sequences of these targets, it can be concluded that a region characterized by both high GC content and a palindromic structure is the preferred insertion site.
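The logic of a Monte Carlo test for integration hotspots can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the simulation used in the study: the null model (islands placed uniformly at random among distinct candidate sites), the function name `hotspot_p_value`, and all site counts in the example are hypothetical.

```python
import random


def hotspot_p_value(n_sites, n_trna_sites, n_islands, observed,
                    n_trials=10_000, seed=0):
    """Estimate P(at least `observed` islands fall next to a tRNA gene).

    Assumed null model: each of the `n_islands` islands occupies a
    distinct site drawn uniformly from `n_sites` candidate insertion
    sites, of which `n_trna_sites` are adjacent to a tRNA gene.
    """
    rng = random.Random(seed)
    # Under a uniform null it does not matter which sites are labelled tRNA.
    trna_sites = set(range(n_trna_sites))
    hits = 0
    for _ in range(n_trials):
        placed = rng.sample(range(n_sites), n_islands)
        adjacent = sum(1 for site in placed if site in trna_sites)
        if adjacent >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_trials + 1)


# Hypothetical numbers: 1900 candidate sites, 40 of them next to a tRNA,
# 20 islands in total, 8 of which are observed next to a tRNA.
p_enriched = hotspot_p_value(1900, 40, 20, observed=8)
# With observed=0 every trial qualifies, so the estimate is 1.
p_trivial = hotspot_p_value(100, 50, 10, observed=0)
```

A small estimated p-value indicates that the observed number of tRNA-adjacent islands is unlikely under uniform placement, which is the sense in which tRNA genes would be called hotspots.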
Genomic islands; Prochlorococcus; Transfer RNA (tRNA); Palindromic structure; Codon usage
The complete genome of φEcoM-GJ1, a lytic phage that attacks porcine enterotoxigenic Escherichia coli of serotype O149:H10:F4, was sequenced and analyzed. The morphology of the phage and the identity of the structural proteins were also determined. The genome consisted of 52,975 bp with a G+C content of 44% and was terminally redundant and circularly permuted. Seventy-five potential open reading frames (ORFs) were identified and annotated, but only 29 possessed homologs. The proteins of five ORFs showed homology with proteins of phages of the family Myoviridae, nine with proteins of phages of the family Podoviridae, and six with proteins of phages of the family Siphoviridae. ORF 1 encoded a T7-like single-subunit RNA polymerase and was preceded by a putative E. coli σ70-like promoter. Nine putative phage promoters were detected throughout the genome. The genome included a tRNA gene of 95 bp that had a putative 18-bp intron. The phage morphology was typical of phages of the family Myoviridae, with an icosahedral head, a neck, and a long contractile tail with tail fibers. The analysis shows that φEcoM-GJ1 is unique, having the morphology of the Myoviridae, a gene for RNA polymerase, which is characteristic of phages of the T7 group of the Podoviridae, and several genes that encode proteins with homology to proteins of phages of the family Siphoviridae.
Phage genome analysis is a rapidly growing field. Recurrent obstacles include software access and usability, as well as genome sequences that vary in sequence orientation and/or start position. Here we describe modifications to the phage comparative genomics software program, Phamerator, provide public access to the code, and include instructions for creating custom Phamerator databases. We further report genomic analysis techniques to determine phage packaging strategies and identification of the physical ends of phage genomes.
The original Phamerator code can be successfully modified and custom databases can be generated using the instructions we provide. Results of genome map comparisons within a custom database reveal obstacles in performing the comparisons if a published genome has an incorrect complementarity or an incorrect location of the first base of the genome, which are common issues in GenBank-downloaded sequence files. To address these issues, we review phage packaging strategies and provide results that demonstrate identification of the genome start location and orientation using raw sequencing data and software programs such as PAUSE and Consed to establish the location of the physical ends of the genome. These results include determination of exact direct terminal repeats (DTRs) or cohesive ends, or whether phages may use a headful packaging strategy. Phylogenetic analysis using ClustalO and phamily circles in Phamerator demonstrate that the large terminase gene can be used to identify the phage packaging strategy and thereby aid in identifying the physical ends of the genome.
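Once the correct strand and first base of a genome have been established, re-orienting a deposited sequence is a simple operation, sketched below in Python. The helper names `reverse_complement` and `reorient` are hypothetical and are not part of Phamerator, PAUSE or Consed; the sketch assumes an unambiguous A/C/G/T sequence that can legitimately be rotated (for example, a circularly assembled contig), with `start` given as a 0-based index on the strand that is kept.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(seq, start, reverse=False):
    """Rotate `seq` so that it begins at 0-based position `start`.

    If `reverse` is True the sequence is reverse-complemented first,
    and `start` is then interpreted on the reverse-complemented strand.
    """
    if reverse:
        seq = reverse_complement(seq)
    if not 0 <= start < len(seq):
        raise ValueError("start must lie within the sequence")
    return seq[start:] + seq[:start]


# Toy examples (hypothetical sequences).
rotated = reorient("ATGCCGTA", 3)
flipped = reorient("AAGCT", 2, reverse=True)
```

Feature coordinates in an accompanying annotation file would need the same transformation; that bookkeeping is omitted here.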
Using available online code, the Phamerator program can be customized and utilized to generate databases with individually selected genomes. These databases can then provide fruitful information in the comparative analysis of phages. Researchers can identify packaging strategies and physical ends of phage genomes using raw data from high-throughput sequencing in conjunction with phylogenetic analyses of large terminase proteins and the use of custom Phamerator databases. We promote publication of phage genomes in an orientation consistent with the physical structure of the phage chromosome and provide guidance for determining this structure.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-3018-2) contains supplementary material, which is available to authorized users.
Phage; Terminase; Phamerator; Phylogenetic tree; DNA packaging; Comparative genomics; Sequencing
Epidemics and pandemics of cholera, a diarrheal disease, are attributed to Vibrio cholerae serogroups O1 and O139. In recent years, specific lytic phages of V. cholerae have been proposed to be important factors in the cyclic occurrence of cholera in endemic areas. However, the role and potential participation of lytic phages during long interepidemic periods of cholera in non-endemic regions have not yet been described. The purpose of this study was to isolate and characterize specific lytic phages of V. cholerae O1 strains.
Sixteen phages were isolated from wastewater samples collected at the Endhó Dam in Hidalgo State, Mexico, concentrated with PEG/NaCl, and purified by density gradient. The lytic activity of the purified phages was tested using different V. cholerae O1 and O139 strains. Phage morphology was visualized by transmission electron microscopy (TEM), and phage genome sequencing was performed using the Genome Analyzer IIx System. Genome assembly and bioinformatics analysis were performed using a set of high-throughput programs. Phage structural proteins were analyzed by mass spectrometry.
Sixteen phages with lytic and lysogenic activity were isolated; only phage ØVC8 showed specific lytic activity against V. cholerae O1 strains. TEM images of ØVC8 revealed a phage with a short tail and an isometric head. The ØVC8 genome comprises linear double-stranded DNA of 39,422 bp with 50.8 % G + C. Of the 48 annotated ORFs, 16 exhibit homology with sequences of known function and several conserved domains. Bioinformatics analysis showed multiple conserved domains, including an Ig domain, suggesting that ØVC8 might adhere to different mucus substrates such as the human intestinal epithelium. The results suggest that the ØVC8 genome utilizes the “single-stranded cohesive ends” packaging strategy of the lambda-like group. The two structural proteins sequenced and analyzed are proteins of known function.
ØVC8 is a lytic phage with specific activity against V. cholerae O1 strains and is grouped as a member of the VP2-like phage subfamily. The encoding of an Ig domain by ØVC8 makes this phage a good candidate for use in phage therapy and an alternative tool for monitoring V. cholerae populations.
Electronic supplementary material
The online version of this article (doi:10.1186/s12985-016-0490-x) contains supplementary material, which is available to authorized users.
Vibrio cholerae; Bacteriophage; Caudovirales; Podoviridae; ØVC8