Previous studies of epidermal kinetics in psoriasis have relied on invasive biopsy procedures or the use of radioactive labels. We previously developed a non-invasive method for measuring keratin synthesis in human skin using deuterated water labeling, serial collection of tape strips and measurement of deuterium enrichment in protein by mass spectrometry. This powerful method can be applied to measure other skin proteins and lipids collected by tape stripping. Here, for the first time, we apply this technique to investigate the epidermal kinetics of psoriasis, the first step in defining a kinetic profile for normal skin versus activated or quiescent psoriatic skin.
Psoriatic subjects were given 2H2O orally as twice-daily doses for 16–38 days. Affected and unaffected skin was sampled by tape stripping and washing (modified Pachtman method). Proteins were isolated from the tape strips by a method that enriches for keratin. Turnover times were determined by gas chromatography/mass spectrometry. Kinetic data were compared to transepidermal water loss (TEWL).
Deuterium-labeled protein from lesional psoriatic skin appeared at the skin surface within 3–8 days of label administration, whereas labeled protein from non-lesional skin required 10–20 days to appear. Psoriatic skin had a similar rate of growth regardless of anatomic location. Proteins recovered from tape strips were identified by nanoscale liquid chromatography/tandem mass spectrometry. Isolated peptides were >98% keratin in uninvolved skin but >72% keratin in psoriatic skin, revealing that one-quarter of all newly synthesized proteins in psoriatic skin are antimicrobial defense and other immune-related proteins. TEWL values were greater in lesional than in non-lesional skin, suggesting barrier compromise in psoriatic skin despite increased clinical thickness.
This simple, elegant, and non-invasive method for measuring epidermal protein synthesis, which can also be adapted to measure epidermal lipids, provides a metric that may reveal new insights into the mechanisms and dynamic processes underlying psoriasis. It may also provide an objective scale for determining response to therapeutic agents in pre-clinical and clinical trials, opening a pathway to the non-invasive study of the kinetics of protein formation in psoriasis or other skin diseases.
Psoriasis; Kinetics; Keratin; Skin; Stable isotopes
Analyses of the taxonomic diversity associated with the human microbiome continue to be an area of great importance. The study of the nature and extent of the commonly shared taxa (“core”), versus those less prevalent, establishes a baseline for comparing healthy and diseased groups by quantifying the variation among people, across body habitats and over time. The National Institutes of Health (NIH)-sponsored Human Microbiome Project (HMP) has provided an unprecedented opportunity to examine and better define what constitutes the taxonomic core within and across body habitats and individuals through pyrosequencing-based profiling of 16S rRNA gene sequences from oral, skin, distal gut (stool), and vaginal body habitats from over 200 healthy individuals. A two-parameter model is introduced to quantitatively identify the core taxonomic members of each body habitat’s microbiota across the healthy cohort. Using only cutoffs for taxonomic ubiquity and abundance, core taxonomic members were identified for each of the 18 body habitats and also for the 4 higher-level body regions. Although many microbes were shared at low abundance, they exhibited a relatively continuous spread in both their abundance and ubiquity, as opposed to a more discretized separation. The numbers of core taxa in the body regions are comparatively small and stable, reflecting the relatively high, but conserved, interpersonal variability within the cohort. Core sizes increased across the body regions in the order vagina, skin, stool, and oral cavity. A number of “minor” oral core taxa were also identified by their presence in a majority of the cohort, but with relatively low and stable abundances. A method for quantifying the difference between two cohorts was introduced and applied to samples collected on a second visit, revealing that, over time, the oral, skin, and stool body regions tended to be more transient in their taxonomic structure than the vaginal body region.
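The two-parameter idea can be sketched as a filter over a cohort's per-sample relative abundances. This is a minimal sketch, assuming that the abundance parameter is a per-sample relative-abundance cutoff and the ubiquity parameter is the fraction of individuals meeting it; the study's exact definitions and cutoff values are not given here, and the taxon names and numbers below are hypothetical.

```python
def core_taxa(samples, min_ubiquity, min_abundance):
    """Return the taxa that pass both cutoffs across a cohort.

    samples: one dict per individual, mapping taxon name -> relative abundance (0..1).
    A taxon counts as present in a sample when its abundance >= min_abundance;
    it is core when the fraction of samples where it is present >= min_ubiquity.
    (Assumed interpretation of the two-parameter model.)
    """
    if not samples:
        return set()
    all_taxa = set().union(*samples)  # union of every sample's taxon keys
    core = set()
    for taxon in all_taxa:
        present = sum(1 for s in samples if s.get(taxon, 0.0) >= min_abundance)
        if present / len(samples) >= min_ubiquity:
            core.add(taxon)
    return core


# Hypothetical three-person cohort for one body habitat.
cohort = [
    {"Streptococcus": 0.30, "Veillonella": 0.05, "Rothia": 0.001},
    {"Streptococcus": 0.25, "Veillonella": 0.02},
    {"Streptococcus": 0.40, "Rothia": 0.02},
]
print(sorted(core_taxa(cohort, min_ubiquity=0.9, min_abundance=0.01)))  # ['Streptococcus']
print(sorted(core_taxa(cohort, min_ubiquity=0.6, min_abundance=0.01)))  # ['Streptococcus', 'Veillonella']
```

Sweeping both cutoffs over a grid and recording core size is one way to see whether taxa separate discretely or spread continuously in abundance and ubiquity.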
Antibiotics administered in low doses have been widely used as growth promoters in the agricultural industry since the 1950s, yet the mechanisms for this effect are unclear. Because antimicrobial agents of different classes and varying activity are effective across several vertebrate species, we hypothesized that such subtherapeutic administration alters the population structure of the gut microbiome as well as its metabolic capabilities. We generated a model of adiposity by giving subtherapeutic antibiotic therapy to young mice and evaluated changes in the composition and capabilities of the gut microbiome. Administration of subtherapeutic antibiotic therapy increased adiposity in young mice and increased hormones related to metabolism. We observed substantial taxonomic changes in the microbiome, changes in copies of key genes involved in the metabolism of carbohydrates to short-chain fatty acids, increases in colonic short-chain fatty acid levels, and alterations in the regulation of hepatic metabolism of lipids and cholesterol. In this model, we demonstrate the alteration of early-life murine metabolic homeostasis through antibiotic manipulation.
In a high-throughput environment, using degenerate PCR primers is an important strategy for PCR amplifying and sequencing large sets of viral isolates from populations that are potentially heterogeneous and continuously evolving. Degenerate primers allow a wider range of viral isolates to be amplified with a single set of pre-mixed primers, thus increasing amplification success rates and minimizing the need for genome finishing activities. Selecting the large set of degenerate PCR primers needed to tile across an entire viral genome, while maximizing their success, is best performed computationally.
We have developed a fully automated degenerate PCR primer design system that plays a key role in the J. Craig Venter Institute’s (JCVI) high-throughput viral sequencing pipeline. A consensus viral genome, or a set of consensus segment sequences in the case of a segmented virus, is specified using IUPAC ambiguity codes in the consensus template sequence to represent the allelic diversity of the target population. PCR primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the full length of the specified target region. As part of the tiling process, primer pairs are computationally screened to meet the criteria for successful PCR with one of two described amplification protocols. We describe the actual sequencing success rates of primers designed for measles virus, mumps virus, human parainfluenza viruses 1 and 3, human respiratory syncytial viruses A and B, and human metapneumovirus, for which >90% of designed primer pairs consistently amplified >75% of the isolates.
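To illustrate how IUPAC ambiguity codes in a consensus template translate into primer degeneracy, the sketch below counts and expands the concrete oligonucleotides encoded by a degenerate primer. The code table is the standard IUPAC nucleotide alphabet; the functions `degeneracy` and `expand` are hypothetical helpers, not part of the JCVI pipeline.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer represents."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer):
    """List every concrete oligonucleotide encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


print(degeneracy("ACRTYN"))  # 1*1*2*1*2*4 = 16
print(expand("ARY"))         # ['AAC', 'AAT', 'AGC', 'AGT']
```

Because each concrete oligo makes up only a fraction of a pre-mixed degenerate primer, a design tool may cap degeneracy; any such limit would be a pipeline-specific parameter not stated here.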
By augmenting our previously developed and published JCVI Primer Design Pipeline, we achieved similarly high sequencing success rates with only minor software modifications. We also formally describe the recommended methodology for constructing the consensus sequence that encapsulates the allelic variation of the targeted population, a key step before designing degenerate primers.
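One simple way to build such a consensus is to merge, column by column, every base observed above a minimum frequency into its IUPAC code. The sketch below assumes an existing alignment of equal-length sequences and a hypothetical `min_freq` threshold; it is not the formally described JCVI methodology, which may weight or filter isolates differently.

```python
from collections import Counter

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> ambiguity code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus(alignment, min_freq=0.05):
    """Build an IUPAC consensus from equal-length aligned sequences.

    At each column, bases occurring at frequency >= min_freq (gaps and
    non-ACGT characters ignored) are merged into one ambiguity code. The most
    frequent base is always kept so every covered column gets a code.
    Columns with no A/C/G/T at all are emitted as '-'.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in alignment if seq[i].upper() in "ACGT")
        total = sum(counts.values())
        if total == 0:
            out.append("-")
            continue
        kept = {base for base, n in counts.items() if n / total >= min_freq}
        kept.add(counts.most_common(1)[0][0])
        out.append(CODE_FOR[frozenset(kept)])
    return "".join(out)


isolates = ["ACGTA", "ACGTG", "ATGTA", "ACGTA"]  # hypothetical aligned isolates
print(consensus(isolates, min_freq=0.2))  # AYGTR
print(consensus(isolates, min_freq=0.3))  # ACGTA
```

The threshold trades coverage against degeneracy: a lower `min_freq` captures rarer alleles but yields more ambiguous positions for the primer designer to handle.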
High-throughput computational degenerate PCR primer design; sequencing viral isolates
Calorie restriction (CR) promotes longevity. A prevalent mechanistic hypothesis explaining this effect suggests that protein degradation, including mitochondrial autophagy, is increased with CR, removing damaged proteins and improving cellular fitness. At steady state, increased catabolism must be balanced by increased mitochondrial biogenesis and protein synthesis, resulting in faster protein replacement rates. To test this hypothesis, we measured replacement kinetics and relative concentrations of hundreds of proteins in vivo in long-term CR and ad libitum-fed mice using metabolic 2H2O-labeling combined with the Stable Isotope Labeling in Mammals protocol and LC-MS/MS analysis of mass isotopomer abundances in tryptic peptides. CR reduced absolute synthesis and breakdown rates of almost all measured hepatic proteins and prolonged the half-lives of most (∼80%), particularly mitochondrial proteins (but not ribosomal subunits). Proteins with related functions exhibited coordinated changes in relative concentration and replacement rates. In silico expression pathway interrogation allowed us to test potential regulators of altered network dynamics (e.g. peroxisome proliferator-activated receptor gamma coactivator 1-alpha). In summary, our combination of dynamic and quantitative proteomics suggests that long-term CR reduces mitochondrial biogenesis and mitophagy. Our findings contradict the theory that CR increases mitochondrial protein turnover and provide compelling evidence that cellular fitness is accompanied by reduced global protein synthetic burden.
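Replacement rates of this kind are commonly summarized with a first-order (single-pool) model in which the fraction of newly synthesized protein rises as f(t) = 1 − exp(−kt), so that the half-life is ln 2 / k. The sketch below fits k by least squares on the linearized form −ln(1 − f) = kt, constrained through the origin. It assumes the fraction-new values have already been derived from the mass isotopomer data (a step not implemented here), and the time course shown is synthetic.

```python
import math


def fit_turnover(times_days, fraction_new):
    """Fit f(t) = 1 - exp(-k t) through the origin; return (k per day, half-life in days).

    Uses the linearization y = -ln(1 - f) = k t and least squares with no
    intercept: k = sum(t * y) / sum(t * t). Requires 0 <= f < 1.
    """
    if any(not 0.0 <= f < 1.0 for f in fraction_new):
        raise ValueError("fraction_new values must lie in [0, 1)")
    ys = [-math.log(1.0 - f) for f in fraction_new]
    k = sum(t * y for t, y in zip(times_days, ys)) / sum(t * t for t in times_days)
    return k, math.log(2) / k


# Synthetic, noise-free labeling time course generated with k = 0.1 per day.
times = [1, 2, 4, 8]
fractions = [1 - math.exp(-0.1 * t) for t in times]
k, half_life = fit_turnover(times, fractions)
print(round(k, 4), round(half_life, 2))  # 0.1 6.93
```

Under this model a prolonged half-life under CR corresponds directly to a smaller fitted k, that is, a lower fractional replacement rate.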
Analysis of human body microbial diversity is fundamental to understanding community structure, biology and ecology. The National Institutes of Health Human Microbiome Project (HMP) has provided an unprecedented opportunity to examine microbial diversity within and across body habitats and individuals through pyrosequencing-based profiling of 16S rRNA gene sequences (16S) from habitats of the oral, skin, distal gut, and vaginal body regions from over 200 healthy individuals, enabling the application of statistical techniques. In this study, two approaches were applied to elucidate the nature and extent of human microbiome diversity. First, bootstrap and parametric curve fitting techniques were evaluated to estimate the maximum number of unique taxa, Smax, and the taxa discovery rate for habitats across individuals. Next, we showed that the variation of diversity within low-abundance taxa across habitats and individuals was not sufficiently quantified by standard ecological diversity indices. This motivated us to introduce a novel rank-based diversity measure, the Tail statistic (τ), based on the standard deviation of the rank abundance curve when made symmetric by reflection around the most abundant taxon. Because of τ’s greater sensitivity to low-abundance taxa, its application to diversity estimation of taxonomic units, using taxonomy-dependent and taxonomy-independent methods, revealed a greater range of values between individuals than between body habitats, and different patterns of diversity within habitats. The greatest range of τ values within and across individuals was found in stool, which also exhibited the most undiscovered taxa. Oral and skin habitats revealed variable diversity patterns, while vaginal habitats were consistently the least diverse.
Collectively, these results demonstrate the importance, and motivate the introduction, of several visualization and analysis methods tuned specifically for next-generation sequence data, further revealing that low-abundance taxa serve as an important reservoir of genetic diversity in the human microbiome.
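A minimal sketch of τ, under one reading of the definition above: rank taxa by abundance, reflect the rank-abundance curve about the most abundant taxon (rank 1), and take the standard deviation of the resulting symmetric distribution. Because the reflected distribution is centered on rank 1, its variance is the abundance-weighted mean of (rank − 1)², giving τ = sqrt(Σᵢ pᵢ (i − 1)²), where pᵢ is the relative abundance of the i-th ranked taxon. This normalization is an assumption; the published formula may differ in detail.

```python
import math


def tail_statistic(counts):
    """Tail statistic tau for one sample's taxon counts (assumed definition).

    Taxa are ranked by decreasing abundance (rank 1 = most abundant). Reflecting
    the rank-abundance curve about rank 1 yields a symmetric distribution with
    mean at rank 1, so its standard deviation is sqrt(sum_i p_i * (i - 1)**2).
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    total = sum(ranked)
    # enumerate() starts at 0, which is exactly the offset (i - 1) from rank 1.
    return math.sqrt(sum((c / total) * offset ** 2 for offset, c in enumerate(ranked)))


print(tail_statistic([100]))                  # 0.0: a single taxon has no tail
print(tail_statistic([90, 5, 5]))             # about 0.5
print(tail_statistic([90, 5, 1, 1, 1, 1, 1])) # larger: same top taxon, longer tail
```

The last two samples share the same dominant taxon, yet spreading the remaining reads over more low-abundance taxa raises τ, which is the tail sensitivity the statistic is meant to capture.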
The aim was to evaluate the outcome of patients who underwent surgery for perforated gastric malignancies.
A review of all patients who underwent surgery for perforated gastric malignancy was performed.
Twelve patients (nine with gastric adenocarcinoma and three with B-cell lymphoma) formed the study group. Ten (83.3%) underwent subtotal gastrectomy, while two (16.7%) underwent total gastrectomy. All eight patients with adenocarcinoma who survived the initial operation fared poorly. The two patients with lymphoma who survived surgery received subsequent chemotherapy and currently have no disease recurrence.
Surgery in perforated gastric malignancy is fraught with numerous challenges.
emergency; surgery; perforation; treatment outcome; malignancy
Summary: JCVI Metagenomics Reports (METAREP) is a Web 2.0 application designed to help scientists analyze and compare annotated metagenomics datasets. It utilizes Solr/Lucene, a high-performance scalable search engine, to quickly query large data collections. Users can also filter and refine datasets with its SQL-like query syntax. METAREP provides graphical summaries for top taxonomic and functional classifications as well as a GO, NCBI Taxonomy and KEGG Pathway Browser. Users can compare absolute and relative counts of multiple datasets at various functional and taxonomic levels. Advanced comparative features comprise statistical tests as well as multidimensional scaling, heatmap and hierarchical clustering plots. Summaries can be exported as tab-delimited files or as publication-quality plots in PDF format. A data management layer allows collaborative data analysis and result sharing.
Availability: Web site http://www.jcvi.org/metarep; source code http://github.com/jcvi/METAREP
Supplementary information: Supplementary data are available at Bioinformatics online.
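For a sense of what comparing absolute and relative counts involves, the sketch below converts per-category counts to relative abundances and applies a Pearson chi-square test (1 degree of freedom, no continuity correction) to one category's share in two datasets. This is a generic statistical illustration with hypothetical category names and counts; it is not METAREP's code, and the specific tests METAREP offers are not detailed here.

```python
import math


def relative_counts(counts):
    """Convert absolute counts per category into fractions of the dataset total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def chi_square_2x2(a_hits, a_total, b_hits, b_total):
    """Pearson chi-square (1 df, no continuity correction) and its p-value.

    Tests whether one category makes up the same share of dataset A and dataset B.
    Assumes every expected cell count is positive.
    """
    observed = [[a_hits, a_total - a_hits], [b_hits, b_total - b_hits]]
    n = a_total + b_total
    row_totals = [a_total, b_total]
    col_totals = [a_hits + b_hits, n - a_hits - b_hits]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical read counts assigned to a functional category in two samples.
sample_a = {"photosynthesis": 30, "other": 70}
sample_b = {"photosynthesis": 10, "other": 90}
print(relative_counts(sample_a))  # {'photosynthesis': 0.3, 'other': 0.7}
stat, p = chi_square_2x2(30, 100, 10, 100)
print(stat, p)  # stat is 12.5; p is about 4e-4
```

Relative counts make datasets of different sizes comparable, while a test on the underlying absolute counts keeps track of how much evidence each dataset contributes.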
The advancements in DNA sequencing technologies have allowed researchers to progress from the analyses of a single organism towards the deep sequencing of a sample of organisms. With sufficient sequencing depth, it is now possible to detect subtle variations between members of the same species, or between mixed species with shared biomarkers, such as the 16S rRNA gene. However, traditional sequencing analyses of samples from largely homogeneous populations are often still based on multiple sequence alignments (MSA), where each sequence is placed along a separate row and similarities between aligned bases can be followed down each column. While this visual format is intuitive for a small set of aligned sequences, the representation quickly becomes cumbersome as sequencing depths cover loci hundreds or thousands of reads deep.
We have developed ANDES, a software library and a suite of applications, written in Perl and R, for the statistical ANalyses of DEep Sequencing. The fundamental data structure underlying ANDES is the position profile, which contains the nucleotide distributions for each genomic position derived from a multiple sequence alignment (MSA). Tools include the root mean square deviation (RMSD) plot, which allows for the visual comparison of multiple samples on a position-by-position basis, and the computation of base conversion frequencies (transition/transversion rates), variation (Shannon entropy), inter-sample clustering and visualization (dendrogram and multidimensional scaling (MDS) plot), threshold-driven consensus sequence generation and polymorphism detection, and the estimation of empirically determined sequencing quality values.
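The position-profile idea and two of the listed statistics can be sketched in a few lines: per-column base frequencies, Shannon entropy in bits, and a per-position root mean square deviation between two samples' profiles. The RMSD normalization here (averaging squared differences over the four bases) is an assumption, the functions are illustrative rather than ANDES's Perl/R interface, and gaps are simply ignored.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_profile(alignment):
    """Base frequency distribution at each column of equal-length aligned reads (gaps ignored)."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(b for b in (c.upper() for c in column) if b in BASES)
        total = sum(counts.values()) or 1  # avoid dividing by zero for all-gap columns
        profile.append({b: counts[b] / total for b in BASES})
    return profile


def shannon_entropy(dist):
    """Shannon entropy (bits) of one position's base distribution."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


def rmsd(profile_a, profile_b):
    """Per-position RMSD between two samples' base frequencies (assumed normalization)."""
    return [
        math.sqrt(sum((pa[b] - pb[b]) ** 2 for b in BASES) / len(BASES))
        for pa, pb in zip(profile_a, profile_b)
    ]


sample_1 = position_profile(["AC", "AC", "AT", "AT"])  # hypothetical reads
sample_2 = position_profile(["AC", "AC", "AC", "AC"])
print([shannon_entropy(d) for d in sample_1])  # [0.0, 1.0]
print(rmsd(sample_1, sample_2))                # 0 at position 1, about 0.354 at position 2
```

Working from profiles rather than from the alignment rows is what keeps this tractable at depths of hundreds or thousands of reads: each position collapses to four numbers regardless of coverage.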
As new sequencing technologies evolve, deep sequencing will become increasingly cost-efficient and the inter- and intra-sample comparisons of largely homogeneous sequences will become more common. We have provided a software package and demonstrated its application on various empirically-derived datasets. Investigators may download the software from Sourceforge at https://sourceforge.net/projects/andestools.
High-throughput DNA sequencing has produced a large number of closed and well annotated genomes. As the focus moves from whole genome sequencing and assembly towards resequencing, variant data is becoming more accessible and large quantities of polymorphisms are being detected. An easy-to-use tool for quickly assessing the potential importance of these discovered variants is therefore increasingly important.
Written in Perl, the VariantClassifier receives a list of polymorphisms and genome annotation, and generates a hierarchically-structured classification for each variant. Depending on the available annotation, the VariantClassifier may assign each polymorphism to a large variety of feature types, such as intergenic or genic; upstream promoter region, intronic region, exonic region or downstream transcript region; 5' splice site or 3' splice site; 5' untranslated region (UTR), 3' UTR or coding sequence (CDS); impacted protein domain; substitution, insertion or deletion; synonymous or non-synonymous; conserved or unconserved; and frameshift or amino acid insertion or deletion (indel). If applicable, the truncated or altered protein sequence is also predicted. For organisms with annotation maintained at Ensembl, a software application for downloading the necessary annotation is also provided, although the classifier will function with properly formatted annotation provided through alternative means.
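Several of the coding-level categories above (synonymous or non-synonymous, frameshift or in-frame indel) reduce to codon arithmetic once a variant is placed within a coding sequence. The sketch below assumes an uppercase, in-frame CDS string and 0-based CDS coordinates, uses the standard genetic code, and ignores splice sites, UTRs and strand handling; it is illustrative only, not the VariantClassifier's Perl implementation.

```python
from itertools import product

# Standard genetic code (NCBI table 1), amino acids listed in TCAG codon order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AA)}


def classify_snv(cds, pos, alt):
    """Classify a single-base substitution at 0-based position pos of an in-frame CDS."""
    offset = pos % 3
    ref_codon = cds[pos - offset:pos - offset + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "non-synonymous (premature stop)"
    return "non-synonymous"


def classify_indel(ref, alt):
    """An indel keeps the reading frame only if its net length change is a multiple of 3."""
    delta = len(alt) - len(ref)
    if delta == 0:
        return "substitution"
    kind = "insertion" if delta > 0 else "deletion"
    return f"amino acid {kind}" if delta % 3 == 0 else f"frameshift {kind}"


cds = "ATGGCTTGGTAA"  # hypothetical CDS: Met-Ala-Trp-Stop
print(classify_snv(cds, 5, "C"))    # GCT -> GCC, Ala -> Ala: synonymous
print(classify_snv(cds, 3, "A"))    # GCT -> ACT, Ala -> Thr: non-synonymous
print(classify_snv(cds, 8, "A"))    # TGG -> TGA, Trp -> Stop
print(classify_indel("A", "AT"))    # frameshift insertion
print(classify_indel("ATGC", "A"))  # amino acid deletion
```

A full classifier would first map genomic coordinates onto transcript and CDS coordinates using the annotation; that mapping is where most of the hierarchical feature types (intronic, UTR, splice site) are decided.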
We have used the VariantClassifier for several projects since its implementation to quickly assess hundreds of thousands of variants on several genomes, and we have received requests to make the tool publicly available. The project website can be found at: http://www.jcvi.org/cms/research/projects/variantclassifier.
Measurements of cell proliferation and matrix synthesis in cartilage explants have identified regulatory factors (e.g., interleukin 1, IL-1) that contribute to osteoarthritis and anabolic mediators (e.g., BMP-7) that may have therapeutic potential. The objective of this study was to develop a robust method for measuring cell proliferation and glycosaminoglycan synthesis in articular cartilage that could be applied in vivo.
A stable isotope-mass spectrometry approach was validated by measuring the metabolic effects of IL-1 and BMP-7 in cultures of mature and immature bovine cartilage explants. The method was also applied in vivo to quantify physiologic turnover rates of matrix and cells in the articular cartilage of normal rats. Heavy water was administered to explants in the culture medium and to rats via drinking water, and cartilage was analyzed for labeling of chondroitin sulfate (CS), hyaluronic acid (HA) and DNA.
As expected, IL-1 inhibited the synthesis of DNA and CS in cartilage explants. However, IL-1 inhibited HA synthesis only in immature cartilage. Furthermore, BMP-7 was generally stimulatory, but immature cartilage was significantly more responsive than mature cartilage, particularly in terms of HA and DNA synthesis. In vivo, labeling of CS and DNA in normal rats for up to a year indicated half-lives of 22 and 862 days, respectively, in the joint.
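To put these half-lives in perspective, a first-order (single-pool) model gives the fraction of a pool replaced after t days as 1 − 2^(−t / t½). That model is an assumption of this sketch, not a statement of the study's analysis; applied to the reported in vivo half-lives, it implies that essentially all joint CS but only about a quarter of the DNA would be renewed within one year.

```python
def fraction_replaced(days, half_life_days):
    """Fraction of a pool renewed after `days`, assuming first-order single-pool turnover."""
    return 1.0 - 2.0 ** (-days / half_life_days)


# Half-lives reported above for rat articular cartilage in vivo.
for label, half_life in [("chondroitin sulfate", 22), ("DNA", 862)]:
    print(f"{label}: {fraction_replaced(365, half_life):.3f} replaced after 365 days")
```

Run as written, this prints 1.000 for chondroitin sulfate and 0.254 for DNA, which illustrates why year-long labeling is needed to observe slowly dividing chondrocytes while matrix CS equilibrates within weeks.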
We describe a method by which deuterium from heavy water is traced into multiple metabolites from a single cartilage specimen to profile its metabolic activity. This method was demonstrated in tissue culture and rodents but may have significant clinical applications.
Most RNA viruses lack the mechanisms to recognize and correct mutations that arise during genome replication, resulting in quasispecies diversity that is required for pathogenesis and adaptation. However, it is not known how viruses encoding large viral RNA genomes such as the Coronaviridae (26 to 32 kb) balance the requirements for genome stability and quasispecies diversity. Further, the limits of replication infidelity during replication of large RNA genomes and how decreased fidelity impacts virus fitness over time are not known. Our previous work demonstrated that genetic inactivation of the coronavirus exoribonuclease (ExoN) in nonstructural protein 14 (nsp14) of murine hepatitis virus results in a 15-fold decrease in replication fidelity. However, it is not known whether nsp14-ExoN is required for the replication fidelity of all coronaviruses, or how decreased fidelity affects genome diversity and fitness during replication and passage. We report here the engineering and recovery of nsp14-ExoN mutant viruses of severe acute respiratory syndrome coronavirus (SARS-CoV) that have stable growth defects and demonstrate a 21-fold increase in mutation frequency during replication in culture. Analysis of complete genome sequences from SARS-ExoN mutant viral clones revealed unique mutation sets in every genome examined from the same round of replication and a total of 100 unique mutations across the genome. Using novel bioinformatic tools and deep sequencing across the full-length genome following 10 population passages in vitro, we demonstrate retention of ExoN mutations and continued increased diversity and mutational load compared to wild-type SARS-CoV.
The results define a novel genetic and bioinformatics model for introduction and identification of multi-allelic mutations in replication competent viruses that will be powerful tools for testing the effects of decreased fidelity and increased quasispecies diversity on viral replication, pathogenesis, and evolution.
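Mutation frequency in clonal sequencing studies is often expressed as mutations per 10,000 nucleotides sequenced, and a fold change is the ratio of mutant to wild-type frequencies. The sketch below also shows how per-clone mutation sets can be pooled to count unique mutations. All counts and coordinates are hypothetical, not data from this study, and the normalization unit is an assumption.

```python
def mutation_frequency(n_mutations, n_clones, bases_per_clone, per=10_000):
    """Mutations per `per` nucleotides sequenced across all clones."""
    return n_mutations * per / (n_clones * bases_per_clone)


def unique_mutations(clone_mutation_sets):
    """Pool per-clone mutation sets, each mutation a (position, ref, alt) tuple."""
    return set().union(*clone_mutation_sets)


# Hypothetical counts for illustration only.
mutant = mutation_frequency(30, n_clones=10, bases_per_clone=3_000)
wild_type = mutation_frequency(2, n_clones=10, bases_per_clone=4_000)
print(mutant, wild_type, mutant / wild_type)  # 10.0 0.5 20.0

clones = [{(1021, "A", "G"), (5530, "C", "U")}, {(1021, "A", "G")}, {(27004, "G", "A")}]
print(len(unique_mutations(clones)))  # 3
```

Normalizing by total nucleotides sequenced, rather than by clone count, keeps frequencies comparable when mutant and wild-type clones are sequenced over different lengths.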
Quasispecies diversity is critical to virus fitness, adaptation, and pathogenesis. However, the relationship of fidelity to population diversity is less studied because viral systems with engineered differences in fidelity and bioinformatic methods that robustly measure and compare fidelity and diversity during replication and passage have not been available. Coronaviruses contain the largest and most complex RNA genomes, and encode multiple novel replicase nonstructural proteins (nsps). We previously demonstrated that murine hepatitis virus nsp14-exonuclease (ExoN) activity is required for replication fidelity. In the present report we have generated nsp14-ExoN inactivation mutants of SARS-coronavirus (S-ExoN) that have stable growth defects and dramatically decreased replication fidelity during replication in culture. We used the S-ExoN mutant viruses to define the diversity and stability of the genome during replication and passage, and to test the capacity of deep sequencing to track virus population diversity over time. The experiments demonstrate that viable S-ExoN mutants accumulate large numbers of predominantly unique mutations across the genome, and that increased diversity is continuous over passage. The results establish methods for directly comparing consensus genome sequences with total population diversity and for assessing the impact of that diversity on viral growth and adaptation.
Background and Objectives:
Right paraduodenal hernia (PDH) results from primitive gut malrotation. The resulting jejunal mesenteric defect posterior to the superior mesenteric vessels allows decompressed jejunum to herniate retroperitoneally. PDHs make up 53% of all internal hernias but account for only 0.2% to 5.8% of all cases of intestinal obstruction. In addition, PDH exhibits male and left-sided predominance. Ours is the second report to describe the preoperative diagnosis and totally laparoscopic repair of a right PDH.
We report the case of a 26-year-old female with symptoms suggestive of partial small bowel obstruction and a 6-year history of intermittent abdominal pain. Physical examination demonstrated lower quadrant tenderness. Plain abdominal radiographs and ultrasonography were nondiagnostic. Contrast-enhanced computed tomography of the abdomen revealed jejunum encased within the right upper quadrant, suspicious for right PDH.
The patient underwent successful laparoscopic right PDH repair and was discharged home on postoperative day 1 without late sequelae.
In the outpatient setting, clinical suspicion and comprehensive radiological investigation permit preoperative diagnosis of right PDH. In acute situations, clinical presentation, plain radiographs, and then diagnostic laparoscopy may be an expeditious diagnostic algorithm. Subsequent laparoscopic repair of right PDH is feasible and may shorten hospital length of stay.
Paraduodenal hernia; Mesocolic hernia; Internal hernia; Intestinal malrotation; Embryology; Small bowel obstruction; Gastrointestinal radiology; Laparoscopic repair
Motivation: DNA sequence reads from Sanger and pyrosequencing platforms differ in cost, accuracy, typical coverage, average read length and the variety of available paired-end protocols. Both read types can complement one another in a ‘hybrid’ approach to whole-genome shotgun sequencing projects, but assembly software must be modified to accommodate their different characteristics. This is true even of pyrosequencing mated and unmated read combinations. Without special modifications, assemblers tuned for homogeneous sequence data may perform poorly on hybrid data.
Results: Celera Assembler was modified for combinations of ABI 3730 and 454 FLX reads. The revised pipeline, called CABOG (Celera Assembler with the Best Overlap Graph), is robust to homopolymer run length uncertainty, high read coverage and heterogeneous read lengths. In tests on four genomes, it generated the longest contigs among all assemblers tested. It exploited the mate constraints provided by paired-end reads from either platform to build larger contigs and scaffolds, which were validated by comparison to a finished reference sequence. A low rate of contig mis-assembly was detected in some CABOG assemblies, but this was reduced in the presence of sufficient mate pair data.
Availability: The software is freely available as open-source from http://wgs-assembler.sf.net under the GNU General Public License.
Supplementary information: Supplementary data are available at Bioinformatics online.
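The core of a best overlap graph can be sketched as follows: for every read end, keep only its single best overlap, then retain the edges where the two read ends choose each other, which are the edges a unitig path could follow. This sketch ranks overlaps by length alone (an assumption; CABOG's actual scoring, containment handling and error-rate filters are not reproduced), and the read names and overlap lengths are hypothetical.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    a: str       # first read id
    a_end: str   # end of read a in the overlap: "5" or "3"
    b: str       # second read id
    b_end: str   # end of read b in the overlap
    length: int  # overlap length in bases


def best_overlaps(overlaps):
    """Map each (read, end) to its longest overlap as (other_read, other_end, length)."""
    best = {}
    for ov in overlaps:
        for read, end, other, other_end in ((ov.a, ov.a_end, ov.b, ov.b_end),
                                            (ov.b, ov.b_end, ov.a, ov.a_end)):
            key = (read, end)
            if key not in best or ov.length > best[key][2]:
                best[key] = (other, other_end, ov.length)
    return best


def mutual_best_edges(best):
    """Edges whose two read ends are each other's best overlap."""
    edges = set()
    for (read, end), (other, other_end, _) in best.items():
        partner = best.get((other, other_end))
        if partner is not None and partner[:2] == (read, end):
            edges.add(tuple(sorted([(read, end), (other, other_end)])))
    return edges


overlaps = [
    Overlap("r1", "3", "r2", "5", 400),
    Overlap("r1", "3", "r3", "5", 200),
    Overlap("r2", "3", "r3", "5", 300),
]
print(sorted(mutual_best_edges(best_overlaps(overlaps))))
# [(('r1', '3'), ('r2', '5')), (('r2', '3'), ('r3', '5'))]
```

Keeping one best edge per read end discards the weaker r1/r3 overlap, so the graph stays sparse even at high coverage, which is part of why a best-overlap approach copes with deep 454 data.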
There is much interest in characterizing the variation in a human individual, because this may elucidate what contributes significantly to a person's phenotype, thereby enabling personalized genomics. We focus here on the variants in a person's ‘exome,’ which is the set of exons in a genome, because the exome is believed to harbor much of the functional variation. We provide an analysis of the ∼12,500 variants that affect the protein coding portion of an individual's genome. We identified ∼10,400 nonsynonymous single nucleotide polymorphisms (nsSNPs) in this individual, of which ∼15–20% are rare in the human population. We predict that ∼1,500 nsSNPs affect protein function and that these tend to be heterozygous, rare, or novel. Of the ∼700 coding indels, approximately half have lengths that are a multiple of three, which causes insertions/deletions of amino acids in the corresponding protein rather than frameshifts. Coding indels also occur frequently at the termini of genes, so even if an indel causes a frameshift, an alternative start or stop site in the gene can still be used to make a functional protein. In summary, we reduced the set of ∼12,500 nonsilent coding variants by ∼8-fold to a set of variants that are most likely to have major effects on their proteins' functions. This is our first glimpse of an individual's exome and a snapshot of the current state of personalized genomics. The majority of coding variants in this individual are common and appear to be functionally neutral. Our results also indicate that some variants can be used to improve the current NCBI human reference genome. As more genomes are sequenced, many rare variants and non-SNP variants will be discovered. We present an approach to analyze the coding variation in humans by proposing multiple bioinformatic methods to home in on possible functional variation.
Characterizing the functional variation in an individual is an important step towards the era of personalized medicine. Protein-coding exons are thought to be especially enriched in functional variation. In 2007, we published the genome sequence of J. Craig Venter. Here we analyze the genetic variation of J. Craig Venter's exome, focusing on variation in the coding portion of genes, which is thought to contribute significantly to a person's physical make-up. We survey ∼12,500 nonsilent coding variants and, by applying multiple bioinformatic approaches, we reduce the number of potential phenotypic variants by ∼8-fold. Our analysis provides a snapshot of the current state of personalized genomics. We find that <1% of variants are linked to any known phenotypes; this demonstrates the dearth of scientific knowledge for phenotype-genotype associations. However, ∼80% of an individual's nonsynonymous variants are commonly found in the human population and, because phenotypic associations to common variants will be elucidated via genome-wide association studies over the next few years, the capability to interpret personalized genomes will expand and evolve. As sequencing of individual genomes becomes more prevalent, the bioinformatic approaches we present in this study can be used as a paradigm to pursue the study of protein-coding variants for the genomes of many individuals.
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases.
Polymerase chain reaction (PCR) is used in directed sequencing for the discovery of novel polymorphisms. As the first step in PCR-directed sequencing, effective PCR primer design is crucial for obtaining high-quality sequence data for target regions. Because current computational primer design tools are not fully tuned to stable underlying laboratory protocols, researchers may still be forced to iteratively optimize protocols for failed amplifications after the primers have been ordered. Furthermore, potentially identifiable factors that contribute to PCR failures have yet to be elucidated. This inefficient approach to primer design is further exacerbated in a high-throughput laboratory, where hundreds of genes may be targeted in one experiment.
We have developed a fully integrated computational PCR primer design pipeline that plays a key role in our high-throughput directed sequencing pipeline. Investigators may specify target regions defined through a rich set of descriptors, such as Ensembl accessions and arbitrary genomic coordinates. Primer pairs are then selected computationally to produce a minimal amplicon set capable of tiling across the specified target regions. As part of the tiling process, primer pairs are computationally screened to meet the criteria for success with one of two PCR amplification protocols. In the process of improving our sequencing success rate, which currently exceeds 95% for exons, we have discovered novel and accurate computational methods capable of identifying primers that may lead to PCR failures. We reveal the laboratory protocols and their associated, empirically determined computational parameters, as well as describe the novel computational methods which may benefit others in future primer design research.
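As an illustration of the kind of computational screen a primer might pass before ordering, the sketch below checks GC content, a rough Wallace-rule melting temperature, and whether a primer's 3' end can anneal to another copy of itself. The thresholds, the 4-base 3' window, and the function names are hypothetical placeholders; the empirically determined parameters and the novel failure predictors described above are not reproduced.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer):
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough Tm in deg C: 2*(A+T) + 4*(G+C). Only a crude guide, best for short oligos."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def three_prime_self_dimer(primer, k=4):
    """True if the last k bases could base-pair with some k-base stretch of another copy of the primer."""
    return reverse_complement(primer[-k:]) in primer


def passes_screen(primer, gc_range=(0.40, 0.60), tm_range=(52, 62)):
    """Hypothetical acceptance rule combining the three checks (uppercase ACGT input assumed)."""
    return (gc_range[0] <= gc_fraction(primer) <= gc_range[1]
            and tm_range[0] <= wallace_tm(primer) <= tm_range[1]
            and not three_prime_self_dimer(primer))


candidate = "GACCTGAAGTCATCGAAG"  # hypothetical 18-mer
print(gc_fraction(candidate), wallace_tm(candidate), passes_screen(candidate))  # 0.5 54 True
print(three_prime_self_dimer("TTTTACGT"))  # True: ACGT is its own reverse complement
```

Keeping each check as a separate small function mirrors the modular design described in the conclusions: a new critique test learned from laboratory feedback can be added to the acceptance rule without touching the others.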
The high-throughput PCR primer design pipeline has been very successful in providing the basis for high-quality directed sequencing results and for minimizing costs associated with labor and reprocessing. The modular architecture of the primer design software has made it possible to readily integrate additional primer critique tests based on iterative feedback from the laboratory. As a result, the primer design software, coupled with the laboratory protocols, serves as a powerful tool for low- and high-throughput primer design to enable successful directed sequencing.
The world's oceans contain a complex mixture of micro-organisms that are, for the most part, uncharacterized both genetically and biochemically. We report here a metagenomic study of the marine planktonic microbiota in which surface (mostly marine) water samples were analyzed as part of the Sorcerer II Global Ocean Sampling expedition. These samples, collected across a several-thousand-km transect from the North Atlantic through the Panama Canal and ending in the South Pacific, yielded an extensive dataset consisting of 7.7 million sequencing reads (6.3 billion bp). Though a few major microbial clades dominate the planktonic marine niche, the dataset contains great diversity, with 85% of the assembled sequence and 57% of the unassembled data being unique at a 98% sequence identity cutoff. Using the metadata associated with each sample and sequencing library, we developed new comparative genomic and assembly methods. One comparative genomic method, termed “fragment recruitment,” addressed questions of genome structure, evolution, and taxonomic or phylogenetic diversity, as well as the biochemical diversity of genes and gene families. A second method, termed “extreme assembly,” made possible the assembly and reconstruction of large segments of abundant but clearly nonclonal organisms. Within all abundant populations analyzed, we found extensive intra-ribotype diversity in several forms: (1) extensive sequence variation within orthologous regions throughout a given genome; despite coverage of individual ribotypes approaching 500-fold, most individual sequencing reads are unique; (2) numerous changes in gene content, some with direct adaptive implications; and (3) hypervariable genomic islands that are too variable to assemble. The intra-ribotype diversity is organized into genetically isolated populations that have overlapping but independent distributions, implying distinct environmental preferences.
We present novel methods for measuring the genomic similarity between metagenomic samples and show how they may be grouped into several community types. Specific functional adaptations can be identified both within individual ribotypes and across the entire community, including proteorhodopsin spectral tuning and the presence or absence of the phosphate-binding gene PstS.
Marine microbes remain elusive and mysterious, even though they are the most abundant life form in the ocean, form the base of the marine food web, and drive energy and nutrient cycling. We know so little about the vast majority of microbes because only a small percentage can be cultivated and studied in the lab. Here we report on the Global Ocean Sampling expedition, an environmental metagenomics project that aims to shed light on the role of marine microbes by sequencing their DNA without first needing to isolate individual organisms. A total of 41 samples were taken from a wide variety of aquatic habitats spanning more than 8,000 km. The resulting 7.7 million sequencing reads provide an unprecedented look at the incredible diversity and heterogeneity in naturally occurring microbial populations. We have developed new bioinformatic methods to reconstitute large portions of both cultured and uncultured microbial genomes. Organism diversity is analyzed in relation to sampling locations and environmental pressures. Taken together, these data and analyses serve as a foundation for greatly expanding our understanding of individual microbial lineages and their evolution, the nature of marine microbial communities, and how they are impacted by and impact our world.
The Sorcerer II GOS expedition, its data sampling, and the analysis are described. The immense diversity in the sequence data required novel comparative genomic and assembly methods, which uncovered genomic differences that marker-based methods could not.
Cyclin T1 (CycT1), a component of positive transcription elongation factor b (P-TEFb), is an essential cofactor for transcriptional activation by lentivirus Tat proteins. It is thought that low CycT1 expression levels restrict human immunodeficiency virus type 1 (HIV-1) expression and replication in resting CD4+ lymphocytes. In this study, we undertook a functional analysis of the cycT1 promoter to determine which, if any, promoter elements might be responsible for cellular activation state-dependent CycT1 expression. The cycT1 gene contains a complex promoter that exhibits an extreme degree of functional redundancy: five nonoverlapping fragments were found to exhibit significant promoter activity in immortalized cell lines, and these elements could interact in a synergistic or redundant manner to mediate cycT1 transcription. Reporter gene expression mediated by the cycT1 promoter was detectable in unstimulated transfected primary lymphocytes, and multiple sites within the promoter could serve to initiate transcription. While utilization of these start sites was significantly altered by the application of exogenous stimuli to primary lymphocytes, and two distinct promoter elements exhibited enhanced activity in the presence of phorbol ester, overall cycT1 transcription was only modestly enhanced in response to cell activation. These observations prompted a reexamination of CycT1 protein expression in primary lymphocytes. In fact, steady-state CycT1 expression is only slightly lower in unstimulated lymphocytes than in phorbol ester-treated cells or a panel of immortalized cell lines. Importantly, CycT1 is expressed at sufficient levels in unstimulated primary cells to support robust Tat activity. These results strongly suggest that CycT1 expression levels in unstimulated primary lymphocytes neither profoundly limit HIV-1 gene expression nor provide an adequate mechanistic explanation for proviral latency in vivo.