AIM: To determine the diagnostic accuracy and radiation dose of conventional radiography and multidetector computed tomography (MDCT) in suspected scaphoid fractures.
METHODS: One hundred twenty-four consecutive patients who had sustained wrist trauma and showed typical clinical symptoms suspicious for an acute scaphoid fracture were enrolled in our study. All patients had initially undergone conventional radiography. Subsequent MDCT was performed within 10 d because of persisting clinical symptoms. Using the MDCT data as the reference standard, a fourfold table was used to classify the test results. The effective dose and imparted energy were assessed in order to compare the radiation burden of the two techniques. The Wilcoxon test was performed to compare the two diagnostic modalities.
RESULTS: Conventional radiography showed 34 acute fractures of the scaphoid in 124 patients (42.2%). Subsequent MDCT revealed a total of 42 scaphoid fractures. The sensitivity of conventional radiography for scaphoid fracture detection was 42.8% and its specificity was 80% resulting in an overall accuracy of 59.6%. Conventional radiography was significantly inferior to MDCT (P < 0.01) concerning scaphoid fracture detection. The mean effective dose of MDCT was 0.1 mSv compared to 0.002 mSv of conventional radiography.
CONCLUSION: Conventional radiography is insufficient for accurate scaphoid fracture detection. Given its almost negligible effective dose, MDCT should serve as the first imaging modality in wrist trauma.
Musculoskeletal imaging; Scaphoid fracture; Multidetector computed tomography; Biplane radiography; Emergency radiology; Diagnostic accuracy; Wrist trauma; Dose calculation
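The fourfold-table evaluation described in the METHODS above can be illustrated with a minimal Python sketch that derives sensitivity, specificity and overall accuracy from the four cells of a 2x2 table. The class name and the example counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourfoldTable:
    """Cells of a 2x2 table: index test versus reference standard."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    def sensitivity(self) -> float:
        # Share of reference-positive cases that the index test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of reference-negative cases that the index test clears.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Share of all cases classified in agreement with the reference.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical counts for illustration only (not taken from the study).
table = FourfoldTable(tp=30, fp=10, fn=20, tn=40)
print(f"sensitivity = {table.sensitivity():.3f}")
print(f"specificity = {table.specificity():.3f}")
print(f"accuracy    = {table.accuracy():.3f}")
```

With MDCT as the reference standard, "positive" and "negative" in the reference columns would correspond to the MDCT finding, and the rows to the radiographic finding.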
This manuscript calls for an international effort to generate a comprehensive catalog from genome sequences of all the archaeal and bacterial type strains.
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data, which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are hampered.
FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software that needs to parse large amounts of sequence data quickly and accurately. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines.
The accuracy and performance of the FastaValidator library qualify it for large data sets such as those commonly produced by massively parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.
FASTA; Data validation; High-throughput
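To make the idea of streaming FASTA validation concrete, the following is a minimal Python sketch. It is not the FastaValidator API (which is a Java library); the function name, the accepted alphabet and the specific rules checked here are assumptions chosen for illustration.

```python
import io
from typing import Iterable, Optional

# Assumed alphabet: IUPAC nucleotide codes plus gap characters.
ALLOWED = frozenset("ACGTUNRYKMSWBDHV-.")


def validate_fasta(lines: Iterable[str]) -> Optional[str]:
    """Return None if the input is valid, else a message naming the first problem.

    The input is consumed line by line, so memory use stays constant
    regardless of file size.
    """
    in_record = False     # True once a header line has been seen
    has_sequence = False  # True once the current record has sequence data
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            if in_record and not has_sequence:
                return f"line {number}: previous record has no sequence"
            if len(line.strip()) == 1:
                return f"line {number}: empty header"
            in_record, has_sequence = True, False
        else:
            if not in_record:
                return f"line {number}: sequence data before first header"
            bad = set(line.upper()) - ALLOWED
            if bad:
                return f"line {number}: invalid character(s) {sorted(bad)}"
            has_sequence = True
    if not in_record:
        return "no records found"
    if not has_sequence:
        return "last record has no sequence"
    return None


good = io.StringIO(">seq1\nACGT\nNNRY\n>seq2\nacgu\n")
bad = io.StringIO(">seq1\nACGT\n>seq2\nAC!T\n")
print(validate_fasta(good))
print(validate_fasta(bad))
```

Returning the first problem with its line number, rather than raising on malformed input, is one possible design for a non-interactive pipeline mode; a production validator would also need to handle protein alphabets and configurable strictness.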
In recent years, representatives of the Bacteroidetes have been increasingly recognized as specialists for the degradation of macromolecules. Formosa constitutes a Bacteroidetes genus within the class Flavobacteria, and the members of this genus have been found in marine habitats with high levels of organic matter, such as in association with algae, invertebrates, and fecal pellets. Here we report on the generation and analysis of the genome of the type strain of Formosa agariphila (KMM 3901T), an isolate from the green alga Acrosiphonia sonderi. F. agariphila is a facultative anaerobe with the capacity for mixed acid fermentation and denitrification. Its genome harbors 129 proteases and 88 glycoside hydrolases, indicating a pronounced specialization for the degradation of proteins, polysaccharides, and glycoproteins. Sixty-five of the glycoside hydrolases are organized in at least 13 distinct polysaccharide utilization loci, where they are clustered with TonB-dependent receptors, SusD-like proteins, sensors/transcription factors, transporters, and often sulfatases. These loci play a pivotal role in bacteroidetal polysaccharide biodegradation and in the case of F. agariphila revealed the capacity to degrade a wide range of algal polysaccharides from green, red, and brown algae and thus a strong specialization toward an alga-associated lifestyle. This was corroborated by growth experiments, which confirmed usage particularly of those monosaccharides that constitute the building blocks of abundant algal polysaccharides, as well as distinct algal polysaccharides, such as laminarins, xylans, and κ-carrageenans.
Phaeobacter gallaeciensis CIP 105210T (= DSM 26640T = BS107T) is the type strain of the species Phaeobacter gallaeciensis. The genus Phaeobacter belongs to the marine Roseobacter group (Rhodobacteraceae, Alphaproteobacteria). Phaeobacter species are effective colonizers of marine surfaces, including frequent associations with eukaryotes. Strain BS107T was isolated from a rearing of the scallop Pecten maximus. Here we describe the features of this organism, together with the complete genome sequence, comprising eight circular replicons with a total of 4,448 genes. In addition to a high number of extrachromosomal replicons, the genome contains six genomic islands and three putative prophage regions, as well as a hybrid between a plasmid and a circular phage. Phylogenomic analyses confirm previous results, which indicated that the originally reported P. gallaeciensis type-strain deposit DSM 17395 belongs to P. inhibens and that CIP 105210T (= DSM 26640T) is the sole genome-sequenced representative of P. gallaeciensis.
Alphaproteobacteria; Roseobacter group; Plasmid wealth; Replication systems; Sister species; Phaeobacter inhibens
This report summarizes the proceedings of the 14th workshop of the Genomic Standards Consortium (GSC) held at the University of Oxford in September 2012. The primary goal of the workshop was to work towards the launch of the Genomic Observatories (GOs) Network under the GSC. For the first time, it brought together potential GOs sites, GSC members, and a range of interested partner organizations. It thus represented the first meeting of the GOs Network (GOs1). Key outcomes include the formation of a core group of “champions” ready to take the GOs Network forward, as well as the formation of working groups. The workshop also served as the first meeting of a wide range of participants in the Ocean Sampling Day (OSD) initiative, a first GOs action. Three projects with complementary interests – COST Action ES1103, MG4U and Micro B3 – organized joint sessions at the workshop. A two-day GSC Hackathon followed the main three days of meetings.
The co-authors of this paper hereby state their intention to work together to launch the Genomic Observatories Network (GOs Network), for which this document will serve as the Founding Charter. We define a Genomic Observatory as an ecosystem and/or site subject to long-term scientific research, including (but not limited to) the sustained study of genomic biodiversity from single-celled microbes to multicellular organisms.
An international group of 64 scientists first published the call for a global network of Genomic Observatories in January 2012. The vision for such a network was expanded in a subsequent paper and developed over a series of meetings in Bremen (Germany), Shenzhen (China), Moorea (French Polynesia), Oxford (UK), Pacific Grove (California, USA), Washington (DC, USA), and London (UK). While this community-building process continues, here we express our mutual intent to establish the GOs Network formally, and to describe our shared vision for its future. The views expressed here are ours alone as individual scientists, and do not necessarily represent those of the institutions with which we are affiliated.
Biodiversity; Genomics; Biocode; Earth observations
The Genomic Standards Consortium (GSC) is an open-membership community that was founded in 2005 to work towards the development, implementation and harmonization of standards in the field of genomics. Starting with the defined task of establishing a minimal set of descriptions, the GSC has evolved into an active standards-setting body that currently has 18 ongoing projects, with additional projects regularly proposed from within and outside the GSC. Here we describe our recently enacted policy for proposing new activities that are intended to be taken on by the GSC, along with the template for proposing such new activities.
Dinoroseobacter shibae, a member of the Roseobacter clade abundant in marine environments, maintains morphological heterogeneity throughout growth, with small cells dividing by binary fission and large cells dividing by budding from one or both cell poles. This morphological heterogeneity is lost if the quorum sensing (QS) system is silenced, concurrent with a decreased expression of the CtrA phosphorelay, a regulatory system conserved in Alphaproteobacteria and the master regulator of the Caulobacter crescentus cell cycle. It consists of the sensor histidine kinase CckA, the phosphotransferase ChpT and the transcriptional regulator CtrA. Here we tested if the QS induced differentiation of D. shibae is mediated by the CtrA phosphorelay.
Mutants for ctrA, chpT and cckA showed almost homogeneous cell morphology and divided by binary fission. For ctrA and chpT, expression in trans on a plasmid caused the fraction of cells containing more than two chromosome equivalents to increase above wild-type level, indicating that gene copy number directly controls chromosome number. Transcriptome analysis revealed that CtrA is a master regulator for flagellar biosynthesis and has a great influence on the transition to stationary phase. Interestingly, the expression of the autoinducer synthase genes luxI2 and luxI3 was strongly reduced in all three mutants, resulting in loss of biosynthesis of acylated homoserine-lactones with C14 side-chain, but could be restored by expressing these genes in trans. Several phylogenetic clusters of Alphaproteobacteria revealed a CtrA binding site in the promoters of QS genes, including Roseobacters and Rhizobia.
The CtrA phosphorelay induces differentiation of a marine Roseobacter strain that is strikingly different from that of C. crescentus. Instead of a tightly regulated cell cycle and a switch between two morphotypes, the morphology and cell division of Dinoroseobacter shibae are highly heterogeneous. We discovered for the first time that the CtrA phosphorelay controls the biosynthesis of signaling molecules. Thus cell-cell communication and differentiation are interlinked in this organism. This may be a common strategy, since we found a similar genetic set-up in other species in the ecologically relevant group of Alphaproteobacteria. D. shibae will be a valuable model organism to study bacterial differentiation into pleomorphic cells.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-130) contains supplementary material, which is available to authorized users.
SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive resource for up-to-date quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA provides a manually curated taxonomy for all three domains of life, based on representative phylogenetic trees for the small- and large-subunit rRNA genes. This article describes the improvements the SILVA taxonomy has undergone in the last 3 years. Specifically, we focus on the curation process, the various resources used for curation and the comparison of the SILVA taxonomy with Greengenes and RDP-II taxonomies. Our comparisons not only reveal a reasonable overlap between the taxa names, but also point to significant differences in both names and numbers of taxa between the three resources.
The pathomechanism of mycosis fungoides (MF), the most common type of primary cutaneous T-cell lymphomas (CTCLs) and a malignancy of non-recirculating, skin-resident T-cells, is unknown, although underlying viral infections have been sought. Human endogenous retroviruses (HERVs) are ancient retroviral sequences in the human genome and their transcription is often deregulated in cancers. We explored the transcriptional activity of HERV sequences in a total of 34 samples comprising MF and psoriasis skin lesions, as well as corresponding non-malignant skin using a retrovirus-specific microarray and quantitative RT-PCR. To identify active HERV-W loci, we cloned the HERV-W specific RT-PCR products, sequenced the cDNA clones and assigned the sequences to HERV-W loci. Finally, we used immunohistochemistry on MF patient and non-malignant inflammatory skin samples to confirm specific HERV-encoded protein expression. Firstly, a distinct, skin-specific transcription profile consisting of five constitutively active HERV groups was established. Although individual variability was common, HERV-W showed significantly increased transcription in MF lesions compared to clinically intact skin from the same patient. Predominantly transcribed HERV-W loci were found to be located in chromosomes 6q21 and 7q21.2, chromosomal regions typically altered in CTCL. Surprisingly, we also found the expression of 7q21.2/ERVWE1-encoded Syncytin-1 (Env) protein in MF biopsies and expression of Syncytin-1 was seen in malignant lymphocytes, especially in the epidermotropic ones, in 15 of 30 cases studied. Most importantly, no Syncytin-1 expression was detected in inflammatory dermatosis (Lichen ruber planus) with skin-homing, non-malignant T lymphocytes. The expression of ERVWE1 mRNA was further confirmed in 3/7 MF lesions analyzed. Our observations strengthen the association between activated HERVs and cancer.
The study offers a new perspective into the pathogenesis of CTCL since we demonstrate that differences in HERV-W transcription levels between lesional MF and non-malignant skin are significant, and that ERVWE1-encoded Syncytin-1 is expressed in MF lymphoma cells.
Members of the Planctomycetes clade share many unusual features for bacteria. Their cytoplasm contains membrane-bound compartments, they lack peptidoglycan and FtsZ, they divide by polar budding, and they are capable of endocytosis. Planctomycete genomes have remained enigmatic, generally being quite large (up to 9 Mb), and on average, 55% of their predicted proteins are of unknown function. Importantly, proteins related to the unusual traits of Planctomycetes remain largely unknown. Thus, we embarked on bioinformatic analyses of these genomes in an effort to predict proteins that are likely to be involved in compartmentalization, cell division, and signal transduction. We used three complementary strategies. First, we defined the Planctomycetes core genome and subtracted genes of well-studied model organisms. Second, we analyzed the gene content and synteny of morphogenesis and cell division genes and combined both methods using a “guilt-by-association” approach. Third, we identified signal transduction systems as well as sigma factors. These analyses provide a manageable list of candidate genes for future genetic studies and provide evidence for complex signaling in the Planctomycetes akin to that observed for bacteria with complex life-styles, such as Myxococcus xanthus.
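The first strategy above (defining a core genome and subtracting genes of well-studied model organisms) amounts to set intersection followed by set difference. The sketch below assumes that genes have already been grouped into families with shared identifiers; the function names, organism labels and family identifiers are all hypothetical.

```python
from typing import Dict, Set


def core_genome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Gene families present in every genome of the group."""
    sets = list(genomes.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def clade_specific_core(
    group: Dict[str, Set[str]], models: Dict[str, Set[str]]
) -> Set[str]:
    """Core families of the group that occur in none of the model organisms."""
    known = set().union(*models.values())
    return core_genome(group) - known


# Hypothetical gene-family identifiers, for illustration only.
planctomycetes = {
    "genome_1": {"fam01", "fam02", "fam03", "fam07"},
    "genome_2": {"fam01", "fam02", "fam03", "fam09"},
    "genome_3": {"fam01", "fam02", "fam03"},
}
model_organisms = {
    "model_a": {"fam01", "fam20"},
    "model_b": {"fam21"},
}
print(sorted(clade_specific_core(planctomycetes, model_organisms)))
```

In this toy input the core is fam01 to fam03, and fam01 is removed because a model organism also carries it, leaving the candidates that are conserved in the clade but absent from the well-studied references.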
The proportion of conserved DNA sequences with no clear function is steadily growing in bioinformatics databases. Studies of sequence and structural homology have indicated that many uncharacterized protein domain sequences are variants of functionally described domains. If these variants promote an organism's ecological fitness, they are likely to be conserved in the genome of its progeny and the population at large. The genetic composition of microbial communities in their native ecosystems is accessible through metagenomics. We hypothesize that the co-variation of protein domain sequences across metagenomes from similar ecosystems will provide insights into their potential roles and aid further investigation.
We calculated the correlation of Pfam protein domain sequences across the Global Ocean Sampling metagenome collection, employing conservative detection and correlation thresholds to limit results to well-supported hits and associations. We then examined intercorrelations between domains of unknown function (DUFs) and domains involved in known metabolic pathways using network visualization and cluster-detection tools. We used a cautious “guilt-by-association” approach, referencing knowledge-level resources to identify and discuss associations that offer insight into DUF function. We observed numerous DUFs associated with photobiologically active domains and prevalent in the Cyanobacteria. Other clusters included DUFs associated with DNA maintenance and repair, inorganic nutrient metabolism, and sodium-translocating transport domains. We also observed a number of clusters reflecting known metabolic associations and cases that predicted functional reclassification of DUFs.
Critically examining domain covariation across metagenomic datasets can grant new perspectives on the roles and associations of DUFs in an ecological setting. Targeted attempts at DUF characterization in the laboratory or in silico may draw from these insights, and opportunities to discover new associations and corroborate existing ones will arise as more large-scale metagenomic datasets emerge.
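The co-variation analysis described above can be sketched in a few lines of Python: filter domains by a detection threshold, correlate their abundance profiles across metagenomes, and keep only strongly correlated pairs as network edges. The choice of Pearson correlation, the threshold values, the function names and the toy abundance profiles are all assumptions for illustration; the abstract does not specify them.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return sxy / sqrt(sxx * syy)


def associations(
    abundance: Dict[str, List[float]],
    min_detected: int = 3,  # assumed detection threshold (number of samples)
    min_r: float = 0.9,     # assumed correlation threshold
) -> List[Tuple[str, str, float]]:
    """Return domain pairs whose profiles correlate at or above min_r."""
    kept = {
        domain: profile
        for domain, profile in abundance.items()
        if sum(1 for count in profile if count > 0) >= min_detected
    }
    edges = []
    for a, b in combinations(sorted(kept), 2):
        r = pearson(kept[a], kept[b])
        if r >= min_r:
            edges.append((a, b, r))
    return edges


# Hypothetical per-metagenome abundances; labels are placeholders, not Pfam IDs.
profiles = {
    "DUF_x": [1, 2, 3, 4],
    "PF_known": [2, 4, 6, 8.5],
    "PF_other": [5, 1, 4, 2],
    "PF_rare": [0, 0, 0, 7],
}
edges = associations(profiles)
for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:.3f}")
```

The resulting edge list is the kind of input a network visualization or cluster-detection tool would consume; a DUF that lands in a cluster of domains with a known role becomes a candidate for that role under the guilt-by-association reasoning.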
SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive web resource for up-to-date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. The referred database release 111 (July 2012) contains 3 194 778 small subunit and 288 717 large subunit rRNA gene sequences. Since the initial description of the project, substantial new features have been introduced, including advanced quality control procedures, an improved rRNA gene aligner, online tools for probe and primer evaluation and optimized browsing, searching and downloading on the website. Furthermore, the extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches.
Metagenomics has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, whose bulk is as yet non-cultivable. Continual progress in next-generation sequencing allows for generating increasingly large metagenomes and studying multiple metagenomes over time or space. Recently, a new type of holistic ecosystem study has emerged that seeks to combine metagenomics with biodiversity, meta-expression and contextual data. Such ‘ecosystems biology’ approaches not only bear the potential to advance our understanding of environmental microbes to a new level but also impose challenges due to increasing data complexities, in particular with respect to bioinformatic post-processing. This mini review aims to address selected opportunities and challenges of modern metagenomics from a bioinformatics perspective and hopefully will serve as a useful resource for microbial ecologists and bioinformaticians alike.
16S rRNA biodiversity; binning; bioinformatics; Genomic Standards Consortium; metagenomics; next-generation sequencing
16S ribosomal RNA gene (rDNA) amplicon analysis remains the standard approach for the cultivation-independent investigation of microbial diversity. The accuracy of these analyses depends strongly on the choice of primers. The overall coverage and phylum spectrum of 175 primers and 512 primer pairs were evaluated in silico with respect to the SILVA 16S/18S rDNA non-redundant reference dataset (SSURef 108 NR). Based on this evaluation, a selection of ‘best available’ primer pairs for Bacteria and Archaea for three amplicon size classes (100–400, 400–1000, ≥1000 bp) is provided. The most promising bacterial primer pair (S-D-Bact-0341-b-S-17/S-D-Bact-0785-a-A-21), with an amplicon size of 464 bp, was experimentally evaluated by comparing the taxonomic distribution of the 16S rDNA amplicons with 16S rDNA fragments from directly sequenced metagenomes. The results of this study may be used as a guideline for selecting primer pairs with the best overall coverage and phylum spectrum for specific applications, thereby reducing the bias in PCR-based microbial diversity studies.
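An in silico coverage evaluation of the kind described above can be approximated as follows: translate each degenerate primer into a regular expression via the IUPAC ambiguity codes, then count the reference sequences in which the forward primer site is followed by the reverse primer site. This sketch assumes exact matching (no mismatches allowed), and the primers and sequences shown are short hypothetical strings, not the primers or the reference dataset named in the abstract.

```python
import re
from typing import Iterable

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement that also handles ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer: str) -> "re.Pattern[str]":
    """Compile a degenerate primer into a regex over A, C, G, T."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def pair_coverage(forward: str, reverse: str, sequences: Iterable[str]) -> float:
    """Fraction of sequences with the forward site upstream of the reverse site."""
    fwd = primer_regex(forward)
    # The reverse primer anneals to the opposite strand, so its site appears
    # as the reverse complement on the strand being searched.
    rev = primer_regex(reverse_complement(reverse))
    seqs = [s.upper() for s in sequences]
    hits = 0
    for s in seqs:
        match = fwd.search(s)
        if match and rev.search(s, match.end()):
            hits += 1
    return hits / len(seqs) if seqs else 0.0


# Hypothetical toy primers and sequences, for illustration only.
references = [
    "AAGGACCCCTGAAGG",  # both sites present, in the right order
    "AAGGTCCCCTAAAGG",  # both sites present via the ambiguity codes
    "AAGGCCCCCTGAAGG",  # forward site missing
    "TGAACCGGACAA",     # reverse site lies upstream of the forward site
]
coverage = pair_coverage("GGWC", "TTYA", references)
print(f"coverage = {coverage:.2f}")
```

Grouping the hit counts by the phylum of each reference sequence, rather than pooling them as done here, would give the per-phylum spectrum that the abstract refers to.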
Preventive health care is an important part of general practice; however, uptake of activities by patients is variable. Monetary incentives for doctors have been used in the UK and Australia to improve rates of screening and immunisation. Few studies have focussed on incentives for patients to attend preventive health care examinations. Our objective was to investigate the use of a monetary incentive to increase patient attendance with their general practitioner for a cardiovascular risk assessment (CVRA).
A pragmatic RCT was conducted in two Australian general practices. Participating GPs underwent academic detailing for cardiovascular risk assessment. 301 patients aged 40–74, who did not have cardiovascular disease, were independently randomised to receive a letter inviting them to a no-cost cardiovascular risk assessment with their GP, or the same letter plus an offer of a $25 shopping voucher if they attended. An audit of patient medical records was also undertaken and a patient questionnaire administered to a subsample of participants. Our main outcome measure was attendance for cardiovascular risk assessment.
In the RCT, 56/301 (18.6%) patients attended for cardiovascular risk assessment: 29/182 (15.9%) in the control group and 27/119 (22.7%) in the intervention group. The estimated difference of 6.8% (95% CI: -2.5% to 16.0%) was not statistically significant (P = 0.15). The audit showed that GPs may underestimate patients’ absolute cardiovascular risk, and the questionnaire indicated that mailed invitations from GPs for a CVRA may encourage patients to attend.
A small monetary incentive does not improve attendance for cardiovascular risk assessment. Further research should be undertaken to determine if there are other incentives that may increase attendance for preventive activities in the general practice setting.
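The attendance difference and its confidence interval reported above can be recomputed from the group counts. The sketch below uses a Wald (normal-approximation) interval for the difference of two independent proportions; whether the authors used exactly this method is an assumption, although it yields figures consistent with those reported. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def risk_difference(
    events_a: int, n_a: int, events_b: int, n_b: int, level: float = 0.95
) -> Tuple[float, float, float]:
    """Difference of two proportions (a minus b) with a Wald interval."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    # Standard error of the difference, without pooling the two groups.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Counts from the abstract: intervention 27/119, control 29/182.
diff, low, high = risk_difference(27, 119, 29, 182)
print(f"difference = {diff:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

Because the interval spans zero, the data are compatible with no effect of the voucher as well as with a moderate increase in attendance.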
Clinical trials registration
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
Due to its extreme salinity and high Mg concentration, the Dead Sea is characterized by a very low density of cells, most of which are Archaea. We discovered several underwater fresh to brackish water springs in the Dead Sea harboring dense microbial communities. We provide the first characterization of these communities, discuss their possible origin, hydrochemical environment, energetic resources and the putative biogeochemical pathways they are mediating. Pyrosequencing of the 16S rRNA gene and community fingerprinting methods showed that the spring community originates from the Dead Sea sediments and not from the aquifer. Furthermore, it suggested that there is a dense Archaeal community in the shoreline pore water of the lake. Sequences of bacterial sulfate reducers, nitrifiers, iron oxidizers and iron reducers were identified as well. Analysis of white and green biofilms suggested that sulfide oxidation through chemolithotrophy and phototrophy is highly significant. Hyperspectral analysis showed a tight association between abundant green sulfur bacteria and cyanobacteria in the green biofilms. Together, our findings show that the Dead Sea floor harbors diverse microbial communities, part of which is not known from other hypersaline environments. Analysis of the water’s chemistry shows evidence of microbial activity along the path and suggests that the springs supply nitrogen, phosphorus and organic matter to the microbial communities in the Dead Sea. The underwater springs are a newly recognized water source for the Dead Sea. Their input of microorganisms and nutrients needs to be considered in the assessment of possible impact of dilution events of the lake surface waters, such as those that will occur in the future due to the intended establishment of the Red Sea−Dead Sea water conduit.
Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements.
Results: In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands.
SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks.
Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.
Supplementary data are available at Bioinformatics online.
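The k-mer searching step mentioned in the Results can be pictured as a lookup that ranks reference sequences by the number of k-mers they share with a query, so that only the closest references need to be considered during alignment. The following Python sketch illustrates that general idea only; it is not SINA's implementation, it omits the partial order alignment step entirely, and the function names, the value of k and the toy sequences are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the names of the references that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def closest_references(
    query: str, index: Dict[str, Set[str]], k: int, top: int = 2
) -> List[Tuple[str, int]]:
    """Rank references by the number of distinct k-mers shared with the query."""
    shared: Counter = Counter()
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            shared[name] += 1
    # Sort by descending shared count, then by name for a stable order.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))[:top]


# Hypothetical toy sequences, for illustration only.
refs = {
    "ref_a": "ACGTACGTTGCA",
    "ref_b": "ACGTTTTTGGCA",
    "ref_c": "GGGGCCCCAAAA",
}
index = build_index(refs, k=4)
ranked = closest_references("ACGTACGTTGAA", index, k=4)
print(ranked)
```

Building the index once and reusing it for every query is what makes this kind of search attractive at high throughput: the cost per query depends on the query length and the number of matching references rather than on the size of the whole reference set.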
Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not been applied to broad-scale NGS-based metagenomics yet. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion.
binning; metagenomics; molecular ecology; self-organizing map (SOM); taxonomic classification; TaxSOM
Planctomycetes represent a remarkable clade in the domain Bacteria because they play crucial roles in global carbon and nitrogen cycles and display cellular structures that closely parallel those of eukaryotic cells. Studies on Planctomycetes have been hampered by the lack of genetic tools, which we developed for Planctomyces limnophilus.
Marine phages have an astounding global abundance and ecological impact. However, little knowledge is derived from phage genomes, as most of the open reading frames in their small genomes are unknown, novel proteins. To infer potential functional and ecological relevance of sequenced marine Pseudoalteromonas phage H105/1, two strategies were used. First, similarity searches were extended to include six viral and bacterial metagenomes paired with their respective environmental contextual data. This approach revealed ‘ecogenomic’ patterns of Pseudoalteromonas phage H105/1, such as its estuarine origin. Second, intrinsic genome signatures (phylogenetic, codon adaptation and tetranucleotide (tetra) frequencies) were evaluated on a resolved intra-genomic level to shed light on the evolution of phage functional modules. On the basis of differential codon adaptation of Phage H105/1 proteins to the sequenced Pseudoalteromonas spp., regions of the phage genome with the most ‘host’-adapted proteins also have the strongest bacterial tetra signature, whereas the least ‘host’-adapted proteins have the strongest phage tetra signature. Such a pattern may reflect the evolutionary history of the respective phage proteins and functional modules. Finally, analysis of the structural proteome identified seven proteins that make up the mature virion, four of which were previously unknown. This integrated approach combines both novel and classical strategies and serves as a model to elucidate ecological inferences and evolutionary relationships from phage genomes that typically abound with unknown gene content.
ecogenomics; genome signatures; genomics; marine; phage; Pseudoalteromonas
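A tetranucleotide signature of the kind referred to above starts from the relative frequency of each of the 256 possible 4-mers in a sequence or genome region. The sketch below computes plain single-strand frequencies; published tetranucleotide analyses often go further (for example by counting both strands or converting frequencies to z-scores), so this is a simplified illustration with a hypothetical function name and toy input.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "ACGT"


def tetranucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of every 4-mer over A, C, G, T in the sequence.

    Windows containing any other character (such as N) are skipped.
    """
    seq = seq.upper()
    valid = set(BASES)
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= valid
    )
    total = sum(counts.values())
    return {
        "".join(tetra): (counts["".join(tetra)] / total if total else 0.0)
        for tetra in product(BASES, repeat=4)
    }


# Hypothetical toy sequence, for illustration only.
freqs = tetranucleotide_frequencies("ACGTACGTAC")
top = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))[:4]
for tetra, value in top:
    print(f"{tetra}: {value:.3f}")
```

Applying the function to successive regions of a genome and comparing each resulting vector with reference signatures (for example by correlation) is one way to resolve such signatures at the intra-genomic level.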
State-of-the-art (DNA) sequencing methods applied in “Omics” studies grant insight into the ‘blueprints’ of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment, are rarely submitted along with the sequence data. If these contextual data or metadata are missing, key opportunities for comparison and analysis across studies and habitats are hampered or even lost. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To help the scientific community significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the GNU Lesser General Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion.
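As a rough illustration of what integrating contextual data with (Multi)FASTA sequence data can mean in practice, the sketch below appends key-value pairs to every header line of a FASTA stream. The bracketed `[key=value]` header layout, the function name and the example values are assumptions made for this sketch; they do not describe the format that CDinFusion itself produces.

```python
from typing import Dict, Iterable, Iterator


def annotate_headers(lines: Iterable[str], context: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines with contextual data appended to each header.

    The "[key=value]" layout is an assumed convention for this sketch.
    """
    suffix = " ".join(f"[{key}={value}]" for key, value in sorted(context.items()))
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">") and suffix:
            yield f"{line} {suffix}"
        else:
            yield line  # sequence lines pass through unchanged


# Hypothetical records and contextual values, for illustration only.
fasta = [">sample1_read1\n", "ACGTACGT\n", ">sample1_read2\n", "GGCCTTAA\n"]
context = {"collection_date": "2012-06-21", "lat_lon": "54.18 N 7.90 E"}
annotated = list(annotate_headers(fasta, context))
print("\n".join(annotated))
```

Keeping the contextual data attached to each record means it travels with the sequences through submission, which is the property the paragraph above identifies as usually missing.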