Search criteria

Results 1-18 (18)

author:("gina, Paul")
1.  Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains 
PLoS Biology  2014;12(8):e1001920.
This manuscript calls for an international effort to generate a comprehensive catalog from genome sequences of all the archaeal and bacterial type strains.
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
PMCID: PMC4122341  PMID: 25093819
2.  Genome Sequences of Industrially Relevant Saccharomyces cerevisiae Strain M3707, Isolated from a Sample of Distillers Yeast and Four Haploid Derivatives 
Genome Announcements  2013;1(3):e00323-13.
Saccharomyces cerevisiae strain M3707 was isolated from a sample of commercial distillers yeast, and its genome sequence together with the genome sequences for the four derived haploid strains M3836, M3837, M3838, and M3839 has been determined. Yeasts have potential for consolidated bioprocessing (CBP) for biofuel production, and access to these genome sequences will facilitate their development.
PMCID: PMC3675515  PMID: 23792743
3.  Complete genome sequence of Rhodospirillum rubrum type strain (S1T) 
Standards in Genomic Sciences  2011;4(3):293-302.
Rhodospirillum rubrum (Esmarch 1887) Molisch 1907 is the type species of the genus Rhodospirillum, which is the type genus of the family Rhodospirillaceae in the class Alphaproteobacteria. The species is of special interest because it is an anoxygenic phototroph that produces extracellular elemental sulfur (instead of oxygen) while harvesting light. It contains one of the most simple photosynthetic systems currently known, lacking light harvesting complex 2. Strain S1T can grow on carbon monoxide as sole energy source. With currently over 1,750 PubMed entries, R. rubrum is one of the most intensively studied microbial species, in particular for physiological and genetic studies. Next to R. centenum strain SW, the genome sequence of strain S1T is only the second genome of a member of the genus Rhodospirillum to be published, but the first type strain genome from the genus. The 4,352,825 bp long chromosome and 53,732 bp plasmid with a total of 3,850 protein-coding and 83 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2002.
PMCID: PMC3156396  PMID: 21886856
facultatively anaerobic; photolithotrophic; mesophile; Gram-negative; motile; Rhodospirillaceae; Alphaproteobacteria; DOEM 2002
4.  Meeting Report from the Genomic Standards Consortium (GSC) Workshops 6 and 7 
Standards in Genomic Sciences  2009;1(1):68-71.
This report summarizes the proceedings of the 6th and 7th workshops of the Genomic Standards Consortium (GSC), held back-to-back in 2008. GSC 6 focused on furthering the activities of GSC working groups, while GSC 7 focused on outreach to the wider community. GSC 6 was held October 10-14, 2008 at the European Bioinformatics Institute, Cambridge, United Kingdom, and included a two-day workshop focused on the refinement of the Genomic Contextual Data Markup Language (GCDML). GSC 7 was held as the opening day of the International Congress on Metagenomics 2008 in San Diego, California. Major achievements of these combined meetings included an agreement from the International Nucleotide Sequence Database Consortium (INSDC) to create a “MIGS” keyword for capturing “Minimum Information about a Genome Sequence” compliant information within INSDC (DDBJ/EMBL/GenBank) records, launch of GCDML 1.0, MIGS compliance of the first set of “Genomic Encyclopedia of Bacteria and Archaea” project genomes, approval of a proposal to extend MIGS to 16S rRNA sequences within a “Minimum Information about an Environmental Sequence”, finalization of plans for the GSC eJournal, “Standards in Genomic Sciences” (SIGS), and the formation of a GSC Board. Subsequently, the GSC has been awarded a Research Co-ordination Network (RCN4GSC) grant from the National Science Foundation, held the first SIGS workshop and launched the journal. The GSC will also be hosting outreach workshops at both ISMB 2009 and PSB 2010 focused on “Metagenomics, Metadata and MetaAnalysis” (M3). Further information about the GSC and its range of activities can be found at, including videos of all the presentations at GSC 7.
PMCID: PMC3035212  PMID: 21304639
5.  Identification of ribosomal RNA genes in metagenomic fragments 
Bioinformatics  2009;25(10):1338-1340.
Motivation: Identification of genes coding for ribosomal RNA (rRNA) is considered an important goal in the analysis of data from metagenomics projects. Here, we report the development of a software program designed for the identification of rRNA genes from metagenomic fragments based on hidden Markov models (HMMs). This program provides rRNA gene predictions with high sensitivity and specificity on artificially fragmented genomic DNAs.
Availability: Supplementary files, scripts and sample data are available at
Supplementary information: Supplementary Data are available at Bioinformatics online.
PMCID: PMC2677747  PMID: 19346323
6.  Detection of Large Numbers of Novel Sequences in the Metatranscriptomes of Complex Marine Microbial Communities 
PLoS ONE  2008;3(8):e3042.
Sequencing the expressed genetic information of an ecosystem (metatranscriptome) can provide information about the response of organisms to varying environmental conditions. Until recently, metatranscriptomics has been limited to microarray technology and random cloning methodologies. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities.
Methodology/Principal Findings
We present a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX Pyrosequencing technology. Eight samples, four DNA and four mRNA, were processed from two time points in a controlled coastal ocean mesocosm study (Bergen, Norway) involving an induced phytoplankton bloom, producing a total of 323,161,989 base pairs. Our study confirms the finding of the first published metatranscriptomic studies of marine and soil environments that metatranscriptomics targets highly expressed sequences which are frequently novel. Our alternative methodology increases the range of experimental options available for conducting such studies and is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. Analysis of corresponding metagenomes confirms much higher levels of assembly in the metatranscriptomic samples and a far higher yield of large gene families with >100 members, ∼91% of which were novel.
This study provides further evidence that metatranscriptomic studies of natural microbial communities are not only feasible, but when paired with metagenomic data sets, offer an unprecedented opportunity to explore both structure and function of microbial communities – if we can overcome the challenges of elucidating the functions of so many never-seen-before gene families.
PMCID: PMC2518522  PMID: 18725995
7.  The minimum information about a genome sequence (MIGS) specification 
Nature biotechnology  2008;26(5):541-547.
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
PMCID: PMC2409278  PMID: 18464787
8.  Genome Sequence of the Cellulolytic Gliding Bacterium Cytophaga hutchinsonii
Applied and Environmental Microbiology  2007;73(11):3536-3546.
The complete DNA sequence of the aerobic cellulolytic soil bacterium Cytophaga hutchinsonii, which belongs to the phylum Bacteroidetes, is presented. The genome consists of a single, circular, 4.43-Mb chromosome containing 3,790 open reading frames, 1,986 of which have been assigned a tentative function. Two of the most striking characteristics of C. hutchinsonii are its rapid gliding motility over surfaces and its contact-dependent digestion of crystalline cellulose. The mechanism of C. hutchinsonii motility is not known, but its genome contains homologs for each of the gld genes that are required for gliding of the distantly related bacteroidete Flavobacterium johnsoniae. Cytophaga-Flavobacterium gliding appears to be novel and does not involve well-studied motility organelles such as flagella or type IV pili. Many genes thought to encode proteins involved in cellulose utilization were identified. These include candidate endo-β-1,4-glucanases and β-glucosidases. Surprisingly, obvious homologs of known cellobiohydrolases were not detected. Since such enzymes are needed for efficient cellulose digestion by well-studied cellulolytic bacteria, C. hutchinsonii either has novel cellobiohydrolases or has an unusual method of cellulose utilization. Genes encoding proteins with cohesin domains, which are characteristic of cellulosomes, were absent, but many proteins predicted to be involved in polysaccharide utilization had putative D5 domains, which are thought to be involved in anchoring proteins to the cell surface.
PMCID: PMC1932680  PMID: 17400776
9.  Genome of Methylobacillus flagellatus, Molecular Basis for Obligate Methylotrophy, and Polyphyletic Origin of Methylotrophy
Journal of Bacteriology  2007;189(11):4020-4027.
Along with methane, methanol and methylated amines represent important biogenic atmospheric constituents; thus, not only methanotrophs but also nonmethanotrophic methylotrophs play a significant role in global carbon cycling. The complete genome of a model obligate methanol and methylamine utilizer, Methylobacillus flagellatus (strain KT) was sequenced. The genome is represented by a single circular chromosome of approximately 3 Mbp, potentially encoding a total of 2,766 proteins. Based on genome analysis as well as the results from previous genetic and mutational analyses, methylotrophy is enabled by methanol and methylamine dehydrogenases and their specific electron transport chain components, the tetrahydromethanopterin-linked formaldehyde oxidation pathway and the assimilatory and dissimilatory ribulose monophosphate cycles, and by a formate dehydrogenase. Some of the methylotrophy genes are present in more than one (identical or nonidentical) copy. The obligate dependence on single-carbon compounds appears to be due to the incomplete tricarboxylic acid cycle, as no genes potentially encoding alpha-ketoglutarate, malate, or succinate dehydrogenases are identifiable. The genome of M. flagellatus was compared in terms of methylotrophy functions to the previously sequenced genomes of three methylotrophs, Methylobacterium extorquens (an alphaproteobacterium, 7 Mbp), Methylibium petroleiphilum (a betaproteobacterium, 4 Mbp), and Methylococcus capsulatus (a gammaproteobacterium, 3.3 Mbp). Strikingly, metabolically and/or phylogenetically, the methylotrophy functions in M. flagellatus were more similar to those in M. capsulatus and M. extorquens than to the ones in the more closely related M. petroleiphilum species, providing the first genomic evidence for the polyphyletic origin of methylotrophy in Betaproteobacteria.
PMCID: PMC1913398  PMID: 17416667
10.  Complete Genomic Characterization of a Pathogenic A.II Strain of Francisella tularensis Subspecies tularensis 
PLoS ONE  2007;2(9):e947.
Francisella tularensis is the causative agent of tularemia, which is a highly lethal disease from nature and potentially from a biological weapon. This species contains four recognized subspecies including the North American endemic F. tularensis subsp. tularensis (type A), whose genetic diversity is correlated with its geographic distribution including a major population subdivision referred to as A.I and A.II. The biological significance of the A.I – A.II genetic differentiation is unknown, though there are suggestive ecological and epidemiological correlations. In order to understand the differentiation at the genomic level, we have determined the complete sequence of an A.II strain (WY96-3418) and compared it to the genome of Schu S4 from the A.I population. We find that this A.II genome is 1,898,476 bp in size with 1,820 genes, 1,303 of which code for proteins. While extensive genomic variation exists between “WY96” and Schu S4, there is only one whole gene difference. This one gene difference is a hypothetical protein of unknown function. In contrast, there are numerous SNPs (3,367), small indels (1,015), IS element differences (7) and large chromosomal rearrangements (31), including both inversions and translocations. The rearrangement borders are frequently associated with IS elements, which would facilitate intragenomic recombination events. The pathogenicity island duplicated regions (DR1 and DR2) are essentially identical in WY96 but vary relative to Schu S4 at 60 nucleotide positions. Other potential virulence-associated genes (231) varied at 559 nucleotide positions, including 357 non-synonymous changes. Molecular clock estimates for the divergence time between A.I and A.II genomes for different chromosomal regions ranged from 866 to 2131 years before present. This paper is the first complete genomic characterization of a member of the A.II clade of Francisella tularensis subsp. tularensis.
PMCID: PMC1978527  PMID: 17895988
11.  The Complete Genome Sequence of Bacillus thuringiensis Al Hakam
Journal of Bacteriology  2007;189(9):3680-3681.
Bacillus thuringiensis is an insect pathogen that is widely used as a biopesticide (E. Schnepf, N. Crickmore, J. Van Rie, D. Lereclus, J. Baum, J. Feitelson, D. R. Zeigler, and D. H. Dean, Microbiol. Mol. Biol. Rev. 62:775-806, 1998). Here we report the finished, annotated genome sequence of B. thuringiensis Al Hakam, which was collected in Iraq by the United Nations Special Commission (L. Radnedge, P. Agron, K. Hill, P. Jackson, L. Ticknor, P. Keim, and G. Andersen, Appl. Environ. Microbiol. 69:2755-2764, 2003).
PMCID: PMC1855882  PMID: 17337577
13.  CAMERA: A Community Resource for Metagenomics 
PLoS Biology  2007;5(3):e75.
The CAMERA (Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis) community database for metagenomic data deposition is an important first step in developing methods for monitoring microbial communities.
PMCID: PMC1821059  PMID: 17355175
14.  The Methanosarcina barkeri Genome: Comparative Analysis with Methanosarcina acetivorans and Methanosarcina mazei Reveals Extensive Rearrangement within Methanosarcinal Genomes
Journal of Bacteriology  2006;188(22):7922-7931.
We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. The genome of M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcina spp. in the region proximal to the origin of replication, with interspecies gene similarities as high as 95%. However, it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the distal semigenome. Of the 3,680 open reading frames (ORFs) in M. barkeri, 746 had homologs with better than 80% identity to both M. acetivorans and M. mazei, while 128 nonhypothetical ORFs were unique (nonorthologous) among these species, including a complete formate dehydrogenase operon, genes required for N-acetylmuramic acid synthesis, a 14-gene gas vesicle cluster, and a bacterial-like P450-specific ferredoxin reductase cluster not previously observed or characterized for this genus. A cryptic 36-kbp plasmid sequence that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143-nucleotide motif was detected in M. barkeri. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the relatively large M. acetivorans genome is the result of uniformly distributed multiple gene scale insertions and duplications, while the M. barkeri genome is characterized by localized inversions associated with the loss of gene content. In contrast, the short M. mazei genome most closely approximates the putative ancestral organizational state of these species.
PMCID: PMC1636319  PMID: 16980466
16.  Pathogenomic Sequence Analysis of Bacillus cereus and Bacillus thuringiensis Isolates Closely Related to Bacillus anthracis
Journal of Bacteriology  2006;188(9):3382-3390.
Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis are closely related gram-positive, spore-forming bacteria of the B. cereus sensu lato group. While independently derived strains of B. anthracis reveal conspicuous sequence homogeneity, environmental isolates of B. cereus and B. thuringiensis exhibit extensive genetic diversity. Here we report the sequencing and comparative analysis of the genomes of two members of the B. cereus group, B. thuringiensis 97-27 subsp. konkukian serotype H34, isolated from a necrotic human wound, and B. cereus E33L, which was isolated from a swab of a zebra carcass in Namibia. These two strains, when analyzed by amplified fragment length polymorphism within a collection of over 300 B. cereus, B. thuringiensis, and B. anthracis isolates, appear closely related to B. anthracis. The B. cereus E33L isolate appears to be the nearest relative to B. anthracis identified thus far. Whole-genome sequencing of B. thuringiensis 97-27 and B. cereus E33L was undertaken to identify shared and unique genes among these isolates in comparison to the genomes of pathogenic strains B. anthracis Ames and B. cereus G9241 and nonpathogenic strains B. cereus ATCC 10987 and B. cereus ATCC 14579. Comparison of these genomes revealed differences in terms of virulence, metabolic competence, structural components, and regulatory mechanisms.
PMCID: PMC1447445  PMID: 16621833
17.  GenBank 
Nucleic Acids Research  1991;19(Suppl):2221-2225.
The GenBank nucleotide sequence database now contains sequence data and associated annotation corresponding to 56,000,000 nucleotides in 45,000 entries. The input stream of data coming into the database has largely been shifted to direct submissions from the scientific community on electronic media. The data have been installed in a relational database management system and are made available in this form through on-line access, and through various network and off-line computer-readable media. In addition, GenBank provides the U.S. distribution center for the BIOSCI electronic bulletin board service.
PMCID: PMC331354  PMID: 2041806
18.  GenBank 
Nucleic Acids Research  1992;20(Suppl):2065-2069.
The GenBank nucleotide sequence database now contains sequence data and associated annotation corresponding to 85,000,000 nucleotides in 67,000 entries from a total of 3,000 organisms. The input stream of data coming into the database is primarily as direct submissions from the scientific community on electronic media, with little or no data being keyboarded from the printed page by the databank staff. The data are maintained in a relational database management system and are made available in flatfile form through on-line access, and through various network and off-line computer-readable media. The data are also distributed in relational form through satellite copies at a number of institutions in the U.S. and elsewhere. In addition, GenBank provides the U.S. distribution center for the BIOSCI electronic bulletin board service.
PMCID: PMC333982  PMID: 1598235
