Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, resulting in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also raised expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interest is expressed in a particular bacterial genome, its sequencing is likely to stop at the draft stage, and the painstaking tasks of completing and annotating it will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. It also discusses the factors that may be driving the increase in draft deposits and the consequent loss of relevant biological information.
Next-generation sequencing; Drafts; Prokaryotic genomes; Computational tools; Omics
Corynebacterium ulcerans is an important bacterial species because it causes infections in animals and, more rarely, in humans. Its virulence mechanisms remain unclear. This study describes the draft genome of C. ulcerans FRC58, which was isolated from the bronchial aspirate of a patient in France.
The completion of whole-genome sequencing of Corynebacterium pseudotuberculosis strain 1002 has enabled major advances in research aimed at understanding the biology of this microorganism. This bacterium causes significant losses to goat and sheep farmers because it is the causative agent of caseous lymphadenitis, an infectious disease whose outcomes range from skin lesions to animal death. In this study, we simulated the conditions experienced by the bacterium during host infection. By sequencing transcripts on the SOLiD™ 3 Plus platform, we identified new targets expected to contribute to the survival and replication of the pathogen in adverse environments. These results may also point to candidates for the development of vaccines, diagnostic kits or therapies aimed at reducing losses in agribusiness.
Under the three simulated conditions (acid, osmotic and thermal shock), 474 differentially expressed genes with at least a 2-fold change in expression were identified. Genes important to the infection process were induced, such as those involved in virulence, defence against oxidative stress, adhesion and regulation, and many of the genes identified encoded hypothetical proteins, indicating that further investigation of the bacterium is necessary. These data will contribute to a better understanding of the biology of C. pseudotuberculosis and to studies of strategies to control the disease.
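The 2-fold criterion above can be illustrated with a minimal Python sketch. It assumes, hypothetically, that normalized expression values per gene are already available for a control and a stress condition, and it applies only the fold-change threshold; the statistical testing that usually accompanies RNA-seq differential expression analysis, and the exact pipeline used in this study, are not shown. The gene IDs and values are invented for illustration.

```python
import math

def fold_change_filter(control, treated, min_fold=2.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes changing at least `min_fold`-fold.

    `control` and `treated` are hypothetical dicts mapping gene ID to a normalized
    expression value. A pseudocount avoids division by zero for unexpressed genes.
    Both induced and repressed genes are kept (absolute log2 change).
    """
    threshold = math.log2(min_fold)  # a 2-fold change corresponds to |log2 FC| >= 1
    selected = {}
    for gene in control.keys() | treated.keys():
        base = control.get(gene, 0.0) + pseudocount
        value = treated.get(gene, 0.0) + pseudocount
        log2_fc = math.log2(value / base)
        if abs(log2_fc) >= threshold:
            selected[gene] = log2_fc
    return selected


# Hypothetical gene IDs and values, for illustration only.
control = {"geneA": 10.0, "geneB": 50.0, "geneC": 30.0}
acid_shock = {"geneA": 41.0, "geneB": 20.0, "geneC": 35.0}
print(sorted(fold_change_filter(control, acid_shock)))  # ['geneA', 'geneB']
```

Here geneA is induced (log2 FC of about 1.9) and geneB is repressed (about -1.3), so both pass, while geneC changes too little.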
Despite the veterinary importance of C. pseudotuberculosis, the bacterium is poorly characterised; therefore, effective treatments for caseous lymphadenitis have been difficult to establish. Using RNA-seq, this study provides a better biological understanding of this bacterium, sheds light on the survival mechanisms most likely used by this microorganism in adverse environments and identifies candidates that may help reduce or even eradicate the problems caused by this disease.
Differential gene expression; Transcripts; RNAseq; SOLiD™; Stress; C. pseudotuberculosis
Serratia fonticola UTAD54 is an environmental isolate that is resistant to carbapenems due to the presence of a class A carbapenemase and a metallo-β-lactamase that are unique to this strain. Its draft genome sequence was obtained to clarify the molecular basis of its carbapenem resistance and identify the genomic context of its carbapenem resistance determinants.
Serratia fonticola is a Gram-negative bacterium with a wide distribution in aquatic environments. On some occasions, it has also been regarded as a significant human pathogen. In this work, we report the first draft genome sequence of an S. fonticola strain (LMG 7882T), which was isolated from freshwater.
An epidemic of surgical-site infections by a single strain of Mycobacterium abscessus subsp. bolletii affected >1,700 patients in Brazil from 2004 to 2008. The genome of the epidemic prototype strain M. abscessus subsp. bolletii INCQS 00594, deposited in the collection of the National Institute for Health Quality Control (INCQS), was sequenced.
Since the first successful sequencing of the Corynebacterium pseudotuberculosis genome, large amounts of genomic, transcriptomic and proteomic data have been generated. C. pseudotuberculosis is an interesting bacterium because of its great zoonotic potential and because it causes considerable economic losses worldwide. Furthermore, different strains of C. pseudotuberculosis can cause various diseases in different hosts. We currently seek to clarify the phylogenetic relationships between C. pseudotuberculosis strains isolated from different hosts across the world and to use these data to develop tools to diagnose and eradicate the diseases these strains cause. In this review, we present the latest findings on C. pseudotuberculosis obtained with the most advanced techniques for sequencing and genomic organization. We also discuss the development of in silico tools for processing these data to promote a better understanding of this pathogen.
Corynebacterium pseudotuberculosis; SOLiD next generation sequencing; Ion Torrent next generation sequencing; SDS-PAGE; mass spectrometry; RNA-seq
Microcystis aeruginosa strain SPC777 is an important toxin-producing cyanobacterium, isolated from a water bloom of the Billings reservoir (São Paulo State, Brazil). Here, we report the draft genome sequence and initial findings from a preliminary analysis of strain SPC777, including several gene clusters involved in nonribosomal and ribosomal synthesis of secondary metabolites.
Methylobacterium mesophilicum strain SR1.6/6 is an endophytic bacterium isolated from a surface-sterilized Citrus sinensis branch. Ecological and biotechnological aspects of this bacterium, such as the genes involved in its association with the host plant and the primary oxidation of methanol, were annotated in the draft genome.
Corynebacterium pseudotuberculosis is of major veterinary importance because it affects many animal species, causing economically significant livestock diseases and losses. Therefore, sequencing the genomes of various strains of this organism, isolated from different hosts, will aid in the development of diagnostic methods and new prevention and treatment strategies and improve our knowledge of the biology of this microorganism. In this study, we present the genome of C. pseudotuberculosis Cp31, isolated from a buffalo in Egypt.
Exiguobacterium antarcticum is a psychrotrophic bacterium first isolated from microbial mats of Lake Fryxell in Antarctica. Many organisms of the genus Exiguobacterium are extremophiles with properties of biotechnological interest, e.g., the capacity to adapt to cold, which make this genus a target for the discovery of new enzymes, such as lipases and proteases, as well as for improving our understanding of the mechanisms of adaptation and survival at low temperatures. This study presents the genome of E. antarcticum B7, isolated from a biofilm sample of Ginger Lake on King George Island, Antarctic Peninsula.
The bacterium Corynebacterium pseudotuberculosis is of major veterinary importance because it affects livestock, particularly sheep, goats, and horses, in several countries, including Australia, Brazil, the United States, and Canada, resulting in significant economic losses. In the present study, we describe the complete genome of the Corynebacterium pseudotuberculosis Cp316 strain, biovar equi, isolated from the abscess of a North American horse.
Streptococcus agalactiae (Lancefield group B; GBS) is the causative agent of meningoencephalitis in fish, mastitis in cows, and neonatal sepsis in humans. Meningoencephalitis is a major health problem for tilapia farming and is responsible for high economic losses worldwide. Despite its importance, the genomic characteristics and the main molecular mechanisms involved in the virulence of S. agalactiae isolated from fish are still poorly understood. Here, we present the genomic features of the 1,820,886-bp complete genome sequence of S. agalactiae SA20-06, isolated from a meningoencephalitis outbreak in Nile tilapia (Oreochromis niloticus) in Brazil, and its annotation, consisting of 1,710 protein-coding genes (excluding pseudogenes), 7 rRNA operons, 79 tRNA genes and 62 pseudogenes.
Streptococcus agalactiae; fish pathogen; genome sequencing
Methanosarcina mazei is a strictly anaerobic methanogen of the order Methanosarcinales that is known for its broad catabolic range among methanogens and is widespread in diverse environments. The strain whose draft genome is presented here was cultivated from sediment samples collected from the reservoir of the Tucuruí hydroelectric power station.
An extended outbreak of mycobacterial surgical infections occurred in Brazil during 2004–2008. Most infections were caused by a single strain of Mycobacterium abscessus subsp. bolletii, which was characterized by a specific rpoB sequevar and two highly similar pulsed-field gel electrophoresis (PFGE) patterns differentiated by the presence of a ∼50 kb band. The nature of this band was investigated.
Genomic sequencing of the prototype outbreak isolate INCQS 00594 using the SOLiD platform demonstrated the presence of a 56,264-bp circular plasmid, designated pMAB01. Identity matrices, genetic distances and phylogenetic analyses indicated that pMAB01 belongs to the broad-host-range plasmid subgroup IncP-1β and is highly related to BRA100, pJP4, pAKD33 and pB10. The presence of pMAB01-derived sequences in 41 M. abscessus subsp. bolletii isolates was evaluated using PCR, PFGE and Southern blot hybridization. Sixteen of the 41 isolates carried the plasmid, which was visualized as a ∼50-kb band by PFGE and Southern blot hybridization in 12 of them. The remaining 25 isolates showed no evidence of the plasmid. The plasmid was successfully transferred to Escherichia coli by conjugation and transformation. However, lateral transfer of pMAB01 to Mycobacterium smegmatis mc²155, a strain with highly efficient plasmid transformation, could not be demonstrated.
This is the first report of a broad-host-range IncP-1β plasmid in mycobacteria. Such genetic exchange could result in the emergence of specific strains that might be better adapted to cause human disease.
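Identity matrices and genetic distances such as those used to place pMAB01 are typically computed from a multiple sequence alignment. The Python sketch below assumes the sequences are already aligned to equal length (gaps written as "-"), skips gap-containing columns, and uses the uncorrected p-distance (1 - identity) as the genetic distance. The sequence names are hypothetical, and the software and distance model actually used in the study are not specified here.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identity of two aligned sequences, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def identity_matrix(aligned):
    """Symmetric matrix {(name1, name2): percent identity} for all pairs."""
    matrix = {(name, name): 100.0 for name in aligned}
    for n1, n2 in combinations(aligned, 2):
        value = percent_identity(aligned[n1], aligned[n2])
        matrix[(n1, n2)] = matrix[(n2, n1)] = value
    return matrix

def p_distance(identity_percent):
    """Uncorrected p-distance: proportion of differing aligned sites."""
    return 1.0 - identity_percent / 100.0


# Hypothetical aligned fragments.
aligned = {"seqA": "ATGC-A", "seqB": "ATGCTA", "seqC": "TTGC-A"}
matrix = identity_matrix(aligned)
print(matrix[("seqA", "seqC")], p_distance(matrix[("seqA", "seqC")]))
```

A distance matrix built this way is the usual input to distance-based tree methods; model-corrected distances would replace `p_distance` in practice.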
Corynebacterium pseudotuberculosis is a pathogen of great veterinary and economic importance because it affects livestock, mainly sheep and goats, worldwide, and its presence has also been reported in camels in several Arab, Asian, and East and West African countries, as well as in Australia. In this article, we report the genome sequence of Corynebacterium pseudotuberculosis strain Cp162, collected from an external neck abscess of a camel in the United Kingdom.
Here, we report the whole-genome sequences of two ovine-pathogenic Corynebacterium pseudotuberculosis isolates: strain 3/99-5, which represents the first C. pseudotuberculosis genome originating from the United Kingdom, and 42/02-A, the second from Australia. These genome sequences will contribute to the objective of determining the global pan-genome of this bacterium.
New sequencing platforms have enabled the rapid decoding of complete prokaryotic genomes at relatively low cost. The Ion Torrent platform is one example of these technologies; it is characterized by lower coverage, which poses challenges for genome assembly. One particular problem is the lack of reference genomes for reference-based assembly of organisms such as the one used in the present study, Corynebacterium pseudotuberculosis biovar equi, which causes high economic losses in the US equine industry. The quality treatment strategy incorporated into the assembly pipeline enabled 16-fold greater use of the sequencing data obtained compared with traditional quality filter approaches. Preprocessing the data prior to de novo assembly made it possible to apply established methodologies for assembling next-generation sequencing data. Moreover, manual curation proved essential for ensuring a high-quality assembly, which was validated by comparative genomics with other species of the genus Corynebacterium. This study presents a modus operandi that enables greater and better use of data obtained from semiconductor sequencing to obtain the complete genome of a prokaryotic microorganism, C. pseudotuberculosis, which is not a traditional biological model such as Escherichia coli.
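The quality treatment strategy itself is not detailed here, so the Python sketch below only contrasts two generic approaches to show why preprocessing can retain more data: a strict filter that discards any read containing a base below a Phred threshold, and 3'-end trimming that keeps the high-quality prefix of each read. The Phred+33 encoding, the Q20 threshold, the 30-base minimum length and all function names are assumptions chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]

def passes_strict_filter(quality, min_q=20):
    """Strict approach: keep a read only if every base reaches min_q."""
    return all(q >= min_q for q in phred_scores(quality))

def trim_3prime(seq, quality, min_q=20, min_len=30):
    """Trim low-quality bases from the 3' end; return None if too little remains."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None
    return seq[:end], quality[:end]

def retained_bases(reads, min_q=20, min_len=30):
    """Total bases kept by each strategy for a list of (sequence, quality) pairs."""
    strict = sum(len(seq) for seq, qual in reads if passes_strict_filter(qual, min_q))
    trimmed = 0
    for seq, qual in reads:
        result = trim_3prime(seq, qual, min_q, min_len)
        if result is not None:
            trimmed += len(result[0])
    return strict, trimmed


# Two hypothetical 50-base reads: one with a low-quality 3' tail ('#' = Q2), one clean ('I' = Q40).
reads = [("A" * 50, "I" * 40 + "#" * 10), ("C" * 50, "I" * 50)]
print(retained_bases(reads))  # (50, 90)
```

This sketch ignores internal low-quality regions and adapter sequences, which a real preprocessing step would also need to handle.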
Corynebacterium pseudotuberculosis biovar equi is a Gram-positive pathogenic bacterium that affects a variety of hosts. Besides the great economic losses it causes to horse breeders, this organism is also known to be an important infectious agent in cattle and buffaloes. As an outcome of efforts to characterize the molecular basis of its virulence, several complete genome sequences have been made available in recent years, enabling the large-scale assessment of genes across distinct isolates. Meanwhile, RNA-seq has stood out as the technology of choice for comprehensive transcriptome studies, which may bring valuable information regarding active genomic regions, despite its still prohibitive costs. In an attempt to increase the number of usable reads per instrument run by effectively eliminating unwanted rRNAs from total RNA samples without relying on any commercially available kits, we applied denaturing high-performance liquid chromatography (DHPLC) as an alternative method to assess the transcriptional profile of C. pseudotuberculosis. We found that DHPLC depletion, combined with Ion Torrent sequencing, allows comprehensive mapping of transcripts and the identification of novel transcripts when a de novo approach is used. These data encourage us to use DHPLC in future transcriptional evaluations of C. pseudotuberculosis.
Corynebacterium pseudotuberculosis causes disease in several animal species, although distinct biovars exist that appear to be restricted to specific hosts. In order to facilitate a better understanding of the differences between biovars, we report here the complete genome sequence of the equine pathogen Corynebacterium pseudotuberculosis strain 1/06-A.
Vibrio cholerae is the causative agent of cholera, which is most prevalent in developing and underdeveloped countries. However, the incidence of cholera in developed countries is also alarming. Although several generic drugs and vaccines have been developed over time, the emergence of new drug-resistant strains means that Vibrio infections remain a global health problem that calls for the development of novel drugs and vaccines against the pathogen. Here, applying comparative proteomic and reverse vaccinology approaches to the exoproteome and secretome of the pathogen, we identified three candidate targets (ompU, uppP and yajC) for most pathogenic Vibrio strains. Two targets (uppP and yajC) are novel to Vibrio, and two targets (uppP and ompU) can be used to develop both drugs and vaccines (dual targets) against a broad spectrum of Vibrio serotypes. Using our novel computational approach, we identified, from the two dual targets, three peptide vaccine candidates with high potential to induce both B- and T-cell-mediated immune responses. These two targets were modeled and subjected to virtual screening against natural compounds derived from Piper betel. Seven compounds from Piper betel were identified, for the first time, as highly effective at inhibiting the function of these targets, marking them as emerging potential drugs against Vibrio. Our preliminary validation suggests that these peptide vaccines and betel compounds are highly effective against Vibrio cholerae. We are currently validating these targets, candidate peptide vaccines and betel-derived lead compounds exhaustively against a number of Vibrio species.
Corynebacterium pseudotuberculosis is a facultative intracellular pathogen and the causative agent of several infectious and contagious chronic diseases, including caseous lymphadenitis, ulcerative lymphangitis, mastitis, and edematous skin disease, in a broad spectrum of hosts. In addition, Corynebacterium pseudotuberculosis infections pose a rising worldwide economic problem in ruminants. The complete genome sequences of 15 C. pseudotuberculosis strains isolated from different hosts and countries were comparatively analyzed using a pan-genomic strategy. Phylogenomic, pan-genomic, core genomic, and singleton analyses revealed close relationships among pathogenic corynebacteria, the clonal-like behavior of C. pseudotuberculosis and slow increases in the sizes of pan-genomes. According to extrapolations based on the pan-genomes, core genomes and singletons, C. pseudotuberculosis biovar ovis shows more clonal-like behavior than C. pseudotuberculosis biovar equi. Most of the variable genes of the biovar ovis strains were acquired in a block through horizontal gene transfer and are highly conserved, whereas the biovar equi strains show great variability, both intra- and inter-biovar, in the 16 detected pathogenicity islands (PAIs). With respect to the gene content of the PAIs, the most interesting finding is the high similarity of the pilus genes in the biovar ovis strains compared with the great variability of these genes in the biovar equi strains. In conclusion, the polymerization of complete pilus structures in biovar ovis could be responsible for a remarkable ability of these strains to spread throughout host tissues and penetrate cells to live intracellularly, in contrast with biovar equi, which rarely attacks visceral organs. Intracellularly, the biovar ovis strains are expected to have less contact with other organisms than the biovar equi strains, which would explain the significant clonal-like behavior of the biovar ovis strains.
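The pan-genome, core genome and singleton categories used above reduce to simple set operations once genes have been clustered into families shared across strains. The Python sketch below assumes that clustering step has already been done (the strain names and gene-family IDs are hypothetical) and computes the three categories plus a pan/core accumulation curve for one order of strains. Extrapolations such as those reported here would typically fit a model to such curves averaged over many strain orders, which is not shown.

```python
from collections import Counter

def pan_core_singletons(genomes):
    """genomes: dict mapping strain name -> set of gene-family IDs."""
    sets = list(genomes.values())
    if not sets:
        return set(), set(), set()
    pan = set().union(*sets)            # present in at least one strain
    core = set.intersection(*sets)      # present in every strain
    counts = Counter(gene for genes in sets for gene in genes)
    singletons = {gene for gene, n in counts.items() if n == 1}  # unique to one strain
    return pan, core, singletons

def accumulation_curve(genomes, order):
    """(pan size, core size) after adding each strain in `order`."""
    pan, core, curve = set(), None, []
    for name in order:
        genes = genomes[name]
        pan |= genes
        core = set(genes) if core is None else core & genes
        curve.append((len(pan), len(core)))
    return curve


# Hypothetical strains and gene families.
genomes = {
    "strain1": {"famA", "famB", "famC", "famD"},
    "strain2": {"famA", "famB", "famC", "famE"},
    "strain3": {"famA", "famB", "famF"},
}
pan, core, singletons = pan_core_singletons(genomes)
print(len(pan), sorted(core), sorted(singletons))  # 6 ['famA', 'famB'] ['famD', 'famE', 'famF']
print(accumulation_curve(genomes, ["strain1", "strain2", "strain3"]))  # [(4, 4), (5, 3), (6, 2)]
```

A pan-genome that grows slowly as strains are added, as described for biovar ovis, would show up as a flattening first element in this curve.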
Genome assembly has always been complicated by the inherent difficulties of sequencing technologies, as well as by the computational methods used to process sequences. Although many of the problems in generating contigs from reads are well known, especially those involving short reads, the orientation and ordering of contigs in the finishing stages is still very challenging and time consuming, because it requires manual curation of the contigs to guarantee their correct identification and to prevent misassembly. Because of the large number of sequences produced, especially by next-generation sequencers, this process demands considerable manual effort, and few software options are available to facilitate it. To address this problem, we developed the Graphic Contig Analyzer for All Sequencing Platforms (G4ALL): a stand-alone multi-user tool that facilitates the editing of the contigs produced in the assembly process. Besides providing information on the gene products contained in each contig, obtained through a search of the available biological databases, G4ALL produces a scaffold of the genome based on the overlap of the contigs after curation.
The software is available at: http://www.genoma.ufpa.br/rramos/softwares/g4all.xhtml
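G4ALL's own scaffolding algorithm is not described in this abstract; as a minimal sketch of the general idea of building a scaffold from contig overlaps, the Python code below assumes the curated contigs are already ordered and oriented, merges neighbours that share an exact suffix-prefix overlap of at least `min_overlap` bases, and otherwise joins them with a run of `N` characters. The function names, thresholds and gap length are hypothetical.

```python
def suffix_prefix_overlap(left, right, min_overlap=10):
    """Length of the longest suffix of `left` equal to a prefix of `right`, or 0."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):  # try longest overlaps first
        if left.endswith(right[:length]):
            return length
    return 0

def build_scaffold(contigs, min_overlap=10, gap_size=100):
    """Join ordered, oriented contigs into a single scaffold sequence."""
    if not contigs:
        return ""
    scaffold = contigs[0]
    for contig in contigs[1:]:
        overlap = suffix_prefix_overlap(scaffold, contig, min_overlap)
        if overlap:
            scaffold += contig[overlap:]  # merge across the shared sequence
        else:
            scaffold += "N" * gap_size + contig  # unknown gap between contigs
    return scaffold


# Hypothetical contigs: the first two overlap by 8 bases, the third does not overlap.
contigs = ["ACGTACGTAAGGCCTT", "AAGGCCTTGGATCC", "TTTTTTTTTTTT"]
print(build_scaffold(contigs, min_overlap=8, gap_size=5))
# ACGTACGTAAGGCCTTGGATCCNNNNNTTTTTTTTTTTT
```

Requiring exact matches keeps the sketch short; a practical finishing tool would tolerate sequencing errors in the overlap and let the curator confirm each join.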
Genome assembly; Bioinformatic tools; sequence analysis; software
Next-generation sequencing technologies have increased the amount of biological data generated. Consequently, bioinformatics has become increasingly important, because new methods and algorithms are needed to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads from high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers lack a graphical interface, and the complexity of their command syntax makes them difficult to use for people without computing experience. To make these assemblers accessible to users without a computing background, we developed AutoAssemblyD, a graphical tool for submitting genome assembly jobs to multiple assemblers and managing them remotely through XML templates.
AutoAssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun JDK 6 or higher.
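AutoAssemblyD's template schema is not described in this abstract, so the following Python sketch uses a hypothetical XML layout to show the general idea of turning a per-assembler template into concrete command lines. The Velvet commands (`velveth` and `velvetg`) follow Velvet's basic usage, but the `<assembler>`, `<step>` and `<arg>` elements and the `render_commands` function are invented for illustration; job submission and remote monitoring are not shown.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical template layout; AutoAssemblyD's real template format may differ.
VELVET_TEMPLATE = """
<assembler name="velvet">
  <step executable="velveth">
    <arg value="{outdir}"/>
    <arg value="{kmer}"/>
    <arg value="-fastq"/>
    <arg value="-short"/>
    <arg value="{reads}"/>
  </step>
  <step executable="velvetg">
    <arg value="{outdir}"/>
  </step>
</assembler>
"""

def render_commands(template_xml, **params):
    """Fill {placeholders} in each <arg> and return one shell command per <step>."""
    root = ET.fromstring(template_xml.strip())
    commands = []
    for step in root.findall("step"):
        argv = [step.get("executable")]
        argv += [arg.get("value").format(**params) for arg in step.findall("arg")]
        commands.append(" ".join(shlex.quote(part) for part in argv))  # safe for a shell
    return commands


for command in render_commands(VELVET_TEMPLATE, outdir="out", kmer=31, reads="reads.fq"):
    print(command)
# velveth out 31 -fastq -short reads.fq
# velvetg out
```

Keeping assembler-specific syntax in data files rather than in code is one way a single graphical front end could drive several assemblers: supporting a new assembler would mean adding a template, not changing the program.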
Next-generation sequencing; Genome Assembly; Bioinformatics