1.  Nasal microenvironments and interspecific interactions influence nasal microbiota complexity and S. aureus carriage 
Cell host & microbe  2013;14(6):631-640.
Summary
The indigenous microbiota of the nasal cavity plays important roles in human health and disease. Patterns of spatial variation in microbiota composition may help explain Staphylococcus aureus colonization, and reveal interspecies and species-host interactions. To assess the biogeography of the nasal microbiota, we sampled healthy subjects, representing both S. aureus carriers and non-carriers, at 3 nasal sites (anterior naris, middle meatus, and sphenoethmoidal recess). Phylogenetic compositional and sparse linear discriminant analyses revealed communities that differed according to site epithelium type and S. aureus culture-based carriage status. Corynebacterium accolens and C. pseudodiphtheriticum were identified as the most important microbial community determinants of S. aureus carriage, with competitive interactions evident only at sites with ciliated pseudostratified columnar epithelium. In vitro co-cultivation experiments provided supporting evidence of interactions among these species. These results highlight spatial variation in nasal microbial communities and differences in community composition between S. aureus carriers and non-carriers.
doi:10.1016/j.chom.2013.11.005
PMCID: PMC3902146  PMID: 24331461
16S rDNA; biogeography; epithelium type; microbiome; nose; sparse linear discriminant analysis; Staphylococcus aureus carrier; Corynebacterium accolens; Corynebacterium pseudodiphtheriticum; species-species interactions
2.  Shiny-phyloseq: Web application for interactive microbiome analysis with provenance tracking 
Bioinformatics  2014;31(2):282-283.
Summary: We have created a Shiny-based Web application, called Shiny-phyloseq, for dynamic interaction with microbiome data that runs on any modern Web browser and requires no programming, making phyloseq and related R tools more accessible and lowering the barrier to entry for using them. Along with a data- and context-aware dynamic interface for exploring the effects of parameter and method choices, Shiny-phyloseq also records the complete user input and subsequent graphical results of a user’s session, allowing the user to archive, share and reproduce the sequence of steps that created their result, without writing any new code themselves.
Availability and implementation: Shiny-phyloseq is implemented entirely in the R language. It can be hosted/launched by any system with R installed, including Windows, Mac OS and most Linux distributions. Information technology administrators can also host Shiny-phyloseq from a remote server, in which case users need only have a Web browser installed. Shiny-phyloseq is provided free of charge under a GPL-3 open-source license through GitHub at http://joey711.github.io/shiny-phyloseq/.
Contact: mcmurdie@alumni.stanford.edu.
doi:10.1093/bioinformatics/btu616
PMCID: PMC4287943  PMID: 25262154
3.  Harvester ants use interactions to regulate forager activation and availability 
Animal behaviour  2013;86(1):197-207.
Social groups balance flexibility and robustness in their collective response to environmental changes using feedback between behavioural processes that operate at different timescales. Here we examine how behavioural processes operating at two timescales regulate the foraging activity of colonies of the harvester ant, Pogonomyrmex barbatus, allowing them to balance their response to food availability and predation. Previous work showed that the rate at which foragers return to the nest with food influences the rate at which foragers leave the nest. To investigate how interactions inside the nest link the rates of returning and outgoing foragers, we observed outgoing foragers inside the nest in field colonies using a novel observation method. We found that the interaction rate experienced by outgoing foragers inside the nest corresponded to forager return rate, and that the interactions of outgoing foragers were spatially clustered. Activation of a forager occurred on the timescale of seconds: a forager left the nest 3–8 s after a substantial increase in interactions with returning foragers. The availability of outgoing foragers to become activated was adjusted on the timescale of minutes: when forager return was interrupted for more than 4–5 min, available foragers waiting near the nest entrance went deeper into the nest. Thus, forager activation and forager availability both increased with the rate at which foragers returned to the nest. This process was checked by negative feedback between forager activation and forager availability. Regulation of foraging activation on the timescale of seconds provides flexibility in response to fluctuations in food abundance, whereas regulation of forager availability on the timescale of minutes provides robustness in response to sustained disturbance such as predation.
doi:10.1016/j.anbehav.2013.05.012
PMCID: PMC3767282  PMID: 24031094
collective behaviour; complex system; flexibility; foraging; interaction rate; Pogonomyrmex barbatus; regulation; robustness; temporal dynamics; timescale
4.  Nucleoside Reverse Transcriptase Inhibitor Resistance Mutations Associated with First-Line Stavudine-Containing Antiretroviral Therapy: Programmatic Implications for Countries Phasing Out Stavudine 
The Journal of Infectious Diseases  2013;207(Suppl 2):S70-S77.
Background
The World Health Organization Antiretroviral Treatment Guidelines recommend phasing-out stavudine because of its risk of long-term toxicity. There are two mutational pathways of stavudine resistance with different implications for zidovudine and tenofovir cross-resistance, the primary candidates for replacing stavudine. However, because resistance testing is rarely available in resource-limited settings, it is critical to identify the cross-resistance patterns associated with first-line stavudine failure.
Methods
We analyzed HIV-1 resistance mutations following first-line stavudine failure from 35 publications comprising 1,825 individuals. We also assessed the influence of concomitant nevirapine vs. efavirenz, therapy duration, and HIV-1 subtype on the proportions of mutations associated with zidovudine vs. tenofovir cross-resistance.
Results
Mutations with preferential zidovudine activity, K65R or K70E, occurred in 5.3% of individuals. Mutations with preferential tenofovir activity, ≥2 thymidine analog mutations (TAMs) or Q151M, occurred in 22% of individuals. Nevirapine increased the risk of TAMs, K65R, and Q151M. Longer therapy increased the risk of TAMs and Q151M but not K65R. Subtype C and CRF01_AE increased the risk of K65R, but only CRF01_AE increased the risk of K65R without Q151M.
Conclusions
Regardless of concomitant nevirapine vs. efavirenz, therapy duration, or subtype, tenofovir was more likely than zidovudine to retain antiviral activity following first-line stavudine (d4T) therapy.
doi:10.1093/infdis/jit114
PMCID: PMC3657117  PMID: 23687292
HIV-1; drug resistance; mutations; nucleoside reverse transcriptase inhibitor; NRTI; stavudine; d4T; zidovudine; AZT; tenofovir; TDF; subtypes
5.  Detection of Cytomegalovirus Drug Resistance Mutations by Next-Generation Sequencing 
Journal of Clinical Microbiology  2013;51(11):3700-3710.
Antiviral therapy for cytomegalovirus (CMV) plays an important role in the clinical management of solid organ and hematopoietic stem cell transplant recipients. However, CMV antiviral therapy can be complicated by drug resistance associated with mutations in the phosphotransferase UL97 and the DNA polymerase UL54. We have developed an amplicon-based high-throughput sequencing strategy for detecting CMV drug resistance mutations in clinical plasma specimens using a microfluidics PCR platform for multiplexed library preparation and a benchtop next-generation sequencing instrument. Plasmid clones of the UL97 and UL54 genes were used to demonstrate the low overall empirical error rate of the assay (0.189%) and to develop a statistical algorithm for identifying authentic low-abundance variants. The ability of the assay to detect resistance mutations was tested with mixes of wild-type and mutant plasmids, as well as clinical CMV isolates and plasma samples that were known to contain mutations that confer resistance. Finally, 48 clinical plasma specimens with a range of viral loads (394 to 2,191,011 copies/ml plasma) were sequenced using multiplexing of up to 24 specimens per run. This led to the identification of seven resistance mutations, three of which were present in <20% of the sequenced population. Thus, this assay offers more sensitive detection of minor variants and a higher multiplexing capacity than current methods for the genotypic detection of CMV drug resistance mutations.
doi:10.1128/JCM.01605-13
PMCID: PMC3889754  PMID: 23985916
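The abstract of entry 5 mentions a statistical algorithm, built on the assay's empirical error rate (0.189%), for separating authentic low-abundance variants from sequencing noise, but does not describe it. A common generic approach, shown here only as an assumption-laden sketch and not as the authors' method, is a one-sided binomial test: given the read depth at a position and an assumed uniform, independent per-read error rate, how likely is it to see at least the observed number of variant reads from errors alone? The function names, the significance level and the example depths are hypothetical.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def is_authentic_variant(variant_reads: int, depth: int,
                         error_rate: float = 0.00189, alpha: float = 0.001) -> bool:
    """Hypothetical call rule: reject 'every variant read is an error' at level alpha.

    Assumes each read at the position has the same independent error probability
    (here the assay-wide 0.189%), which real data only approximate.
    """
    return binomial_upper_tail(variant_reads, depth, error_rate) < alpha


# Example: one position covered by 1,000 reads
print(is_authentic_variant(10, 1000))  # True: 10 variant reads vs ~1.9 expected from error
print(is_authentic_variant(3, 1000))   # False: 3 variant reads is plausible noise
```

Because such a test is repeated at every position of UL97 and UL54, a real pipeline would also need a multiple-testing correction when choosing `alpha`.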
6.  Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible 
PLoS Computational Biology  2014;10(4):e1003531.
Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to rarefy counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model, for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package phyloseq.
Author Summary
The term microbiome refers to the ecosystem of microbes that live in a defined environment. The decreasing cost and increasing speed of DNA sequencing technology have recently given scientists affordable and timely access to the genes and genomes of microbiomes that inhabit our planet and even our own bodies. In these investigations many microbiome samples are sequenced at the same time on the same DNA sequencing machine, but the resulting total numbers of sequences per sample are often vastly different. The common procedure for addressing this difference in sequencing effort across samples (different library sizes) is either to (1) base analyses on the proportional abundance of each species in a library, or (2) rarefy, that is, throw away sequences from the larger libraries so that all have the same, smallest size. We show that both of these normalization methods can work when comparing obviously different whole microbiomes, but that neither works well when comparing the relative proportions of each bacterial species across microbiome samples. We show that alternative methods based on a statistical mixture model perform much better and can be easily adapted from a separate biological sub-discipline, RNA-Seq analysis.
doi:10.1371/journal.pcbi.1003531
PMCID: PMC3974642  PMID: 24699258
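Entry 6 contrasts two common normalizations, simple proportions and rarefying (subsampling each library without replacement down to the smallest library size), with mixture-model methods. The minimal sketch below (standard library only; the function names and toy counts are hypothetical) shows what the two criticized procedures do to a pair of libraries, including how many reads rarefying throws away. It does not implement the Negative Binomial approach the paper recommends, which is available in edgeR and DESeq and, with microbiome-specific extensions, in phyloseq.

```python
import random


def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Normalize one library to relative abundances (each taxon's share of reads)."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample `depth` reads without replacement; all other reads are discarded."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds library size")
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    result = {taxon: 0 for taxon in counts}
    for taxon in rng.sample(pool, depth):
        result[taxon] += 1
    return result


libraries = {
    "sample_1": {"taxon_A": 900, "taxon_B": 90, "taxon_C": 10},
    "sample_2": {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10},
}
depth = min(sum(c.values()) for c in libraries.values())  # smallest library: 100 reads
rarefied = {name: rarefy(c, depth, seed=1) for name, c in libraries.items()}
discarded = {name: sum(c.values()) - depth for name, c in libraries.items()}
print(discarded)  # {'sample_1': 900, 'sample_2': 0}
print(proportions(libraries["sample_1"]))
```

Rarefying here discards 90% of `sample_1`, and the result also depends on the random seed; that loss of information and added sampling noise is what the paper argues against.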
7.  HIV-1 Transmission Networks in a Small World 
The Journal of Infectious Diseases  2013;209(2):180-182.
doi:10.1093/infdis/jit525
PMCID: PMC3873789  PMID: 24151310
HIV; phylogeny; transmission; network
8.  Nest site and weather affect the personality of harvester ant colonies 
Behavioral Ecology  2012;23(5):1022-1029.
Environmental conditions and physical constraints both influence an animal's behavior. We investigate whether behavioral variation among colonies of the black harvester ant, Messor andrei, remains consistent across foraging and disturbance situations and ask whether consistent colony behavior is affected by nest site and weather. We examined variation among colonies in responsiveness to food baits and to disturbance, measured as a change in numbers of active ants, and in the speed with which colonies retrieved food and removed debris. Colonies differed consistently, across foraging and disturbance situations, in both responsiveness and speed. Increased activity in response to food was associated with a smaller decrease in response to alarm. Speed of retrieving food was correlated with speed of removing debris. In all colonies, speed was greater in dry conditions, reducing the amount of time ants spent outside the nest. While a colony occupied a certain nest site, its responsiveness was consistent in both foraging and disturbance situations, suggesting that nest structure influences colony personality.
doi:10.1093/beheco/ars066
PMCID: PMC3431114  PMID: 22936841
behavioral syndromes; collective behavior; harvester ant; Messor andrei; nest structure; personality; plasticity; social insects; temperament
9.  Prototypical Recombinant Multi-Protease-Inhibitor-Resistant Infectious Molecular Clones of Human Immunodeficiency Virus Type 1 
The many genetic manifestations of HIV-1 protease inhibitor (PI) resistance present challenges to research into the mechanisms of PI resistance and the assessment of new PIs. To address these challenges, we created a panel of recombinant multi-PI-resistant infectious molecular clones designed to represent the spectrum of clinically relevant multi-PI-resistant viruses. To assess the representativeness of this panel, we examined the sequences of the panel's viruses in the context of a correlation network of PI resistance amino acid substitutions in sequences from more than 10,000 patients. The panel of recombinant infectious molecular clones comprised 29 of 41 study-defined PI resistance amino acid substitutions and 23 of the 27 tightest amino acid substitution clusters. Based on their phenotypic properties, the clones were classified into four groups with increasing cross-resistance to the PIs most commonly used for salvage therapy: lopinavir (LPV), tipranavir (TPV), and darunavir (DRV). The panel of recombinant infectious molecular clones has been made available without restriction through the NIH AIDS Research and Reference Reagent Program. The public availability of the panel makes it possible to compare the inhibitory activities of different PIs with one another. The diversity of the panel and the high-level PI resistance of its clones suggest that investigational PIs active against the clones in this panel will retain antiviral activity against most if not all clinically relevant PI-resistant viruses.
doi:10.1128/AAC.00614-13
PMCID: PMC3754322  PMID: 23796938
10.  Low-Level Persistence of Drug Resistance Mutations in Hepatitis B Virus-Infected Subjects with a Past History of Lamivudine Treatment 
We sought to determine the prevalence of hepatitis B virus (HBV) lamivudine (LAM)-resistant minority variants in subjects who once received LAM but had discontinued it prior to virus sampling. We performed direct PCR Sanger sequencing and ultradeep pyrosequencing (UDPS) of HBV reverse transcriptase (RT) of plasma viruses from 45 LAM-naive subjects and 46 LAM-experienced subjects who had discontinued LAM a median of 24 months earlier. UDPS was performed to a depth of ∼3,000 reads per nucleotide. Minority variants were defined as differences from the Sanger sequence present in ≥0.5% of UDPS reads in a sample. Sanger sequencing identified ≥1 LAM resistance mutations (rtL80I/V, rtM204I, and rtA181T) in samples from 5 (11%) of 46 LAM-experienced and none of 45 LAM-naive subjects (0%; P = 0.06). UDPS detected ≥1 LAM resistance mutations (rtL80I/V, rtV173L, rtL180M, rtA181T, and rtM204I/V) in 10 (22%) of the 46 LAM-experienced subjects, including 5 in whom LAM resistance mutations were not identified by Sanger sequencing. Overall, LAM resistance mutations were more likely to be present in LAM-experienced (10/46, 22%) than LAM-naive subjects (0/45, 0%; P = 0.001). The median time since LAM discontinuation was 12.8 months in the 10 subjects with a LAM resistance mutation compared to 30.5 months in the 36 LAM-experienced subjects without a LAM resistance mutation (P < 0.001). The likelihood of detecting a LAM resistance mutation was significantly increased using UDPS compared to Sanger sequencing and was inversely associated with the time since LAM discontinuation.
doi:10.1128/AAC.01601-12
PMCID: PMC3535911  PMID: 23114756
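Entry 10 defines minority variants by a 0.5% read-frequency threshold and compares the proportion of subjects with LAM resistance mutations between groups, but the abstract does not name the statistical test. Assuming a two-sided Fisher's exact test (an assumption, not stated in the source), the standard-library sketch below applies it to the reported counts; the helper names are hypothetical.

```python
import math


def hypergeom_pmf(x: int, row1: int, row2: int, col1: int) -> float:
    """Probability that the top-left cell equals x, given fixed 2x2 margins."""
    return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(row1 + row2, col1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is no
    more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    observed = hypergeom_pmf(a, row1, row2, col1)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        prob = hypergeom_pmf(x, row1, row2, col1)
        if prob <= observed * (1 + 1e-9):  # small tolerance for floating-point ties
            p_value += prob
    return min(p_value, 1.0)


def is_minority_variant(variant_reads: int, depth: int, threshold: float = 0.005) -> bool:
    """Entry 10's definition: a difference from the Sanger sequence in >= 0.5% of UDPS reads."""
    return depth > 0 and variant_reads / depth >= threshold


# Rows: LAM-experienced (46), LAM-naive (45); columns: mutation detected, not detected
p_udps = fisher_exact_two_sided(10, 36, 0, 45)   # UDPS: 10/46 vs 0/45
p_sanger = fisher_exact_two_sided(5, 41, 0, 45)  # Sanger: 5/46 vs 0/45
print(f"UDPS P = {p_udps:.3f}, Sanger P = {p_sanger:.2f}")
```

Under this assumption the two P values format to the reported P = 0.001 and P = 0.06. Agreement with the rounded figures is consistent with, but does not prove, that this test was the one used.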
11.  phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data 
PLoS ONE  2013;8(4):e61217.
Background
The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.
Results
Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics, all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.
Conclusions
The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
doi:10.1371/journal.pone.0061217
PMCID: PMC3632530  PMID: 23630581
12.  PRC2/EED-EZH2 Complex Is Up-Regulated in Breast Cancer Lymph Node Metastasis Compared to Primary Tumor and Correlates with Tumor Proliferation In Situ 
PLoS ONE  2012;7(12):e51239.
Background
Lymph node metastasis is a key event in the progression of breast cancer. Therefore it is important to understand the underlying mechanisms which facilitate regional lymph node metastatic progression.
Methodology/Principal Findings
We performed gene expression profiling of purified tumor cells from human breast tumors and lymph node metastases. By microarray network analysis, we found increased expression of the polycomb repressive complex 2 (PRC2) core subunits EED and EZH2 in lymph node metastatic tumor cells compared with primary tumor cells, which was validated by real-time PCR. Additionally, immunohistochemical (IHC) staining and quantitative image analysis of whole tissue sections showed a significant increase in EZH2-expressing tumor cells in lymph nodes over paired primary breast tumors, which strongly correlated with tumor cell proliferation in situ. We further explored the mechanisms of PRC2 gene up-regulation in metastatic tumor cells and, through gene set enrichment analysis (GSEA), found up-regulation of E2F genes and MYC targets and down-regulation of targets of the tumor suppressor gene E-cadherin in lymph node metastasis. Using IHC, the expression of a potential EZH2 target, E-cadherin, was examined in paired primary/lymph node samples and was found to be significantly decreased in lymph node metastases relative to paired primary tumors.
Conclusions/Significance
This study identified overexpression of the epigenetic silencing complex PRC2/EED-EZH2 in breast cancer lymph node metastases compared with primary tumors, and its positive association with tumor cell proliferation in situ. Concurrently, the PRC2 target protein E-cadherin was significantly decreased in lymph node metastases, suggesting that PRC2 promotes epithelial-mesenchymal transition (EMT) in the lymph node metastatic process through repression of E-cadherin. These results indicate that epigenetic regulation mediated by PRC2 proteins may provide an additional advantage for the outgrowth of metastatic tumor cells in lymph nodes. This opens up possibilities for epigenetic drug development for the treatment and prevention of lymph node metastasis in breast cancer.
doi:10.1371/journal.pone.0051239
PMCID: PMC3519681  PMID: 23251464
13.  Denoising PCR-amplified metagenome data 
BMC Bioinformatics  2012;13:283.
Background
PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy.
Results
We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche’s 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise.
Conclusions
DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.
doi:10.1186/1471-2105-13-283
PMCID: PMC3563472  PMID: 23113967
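Entry 13 says that DADA infers sample genotypes and error parameters without training data and fully utilizes sequence-abundance information, but the abstract does not give the statistics. One way to use abundance information, sketched here under stated assumptions rather than as DADA's exact procedure, is an abundance test: is a sequence too abundant to be explained as errors of a more abundant "parent" sequence? The sketch assumes Poisson-distributed error counts and a single, hypothetical probability that a parent read is misread as the candidate sequence; DADA estimates error parameters from the data and can use context-dependent PCR error rates, which this sketch does not attempt.

```python
import math


def poisson_upper_tail(a: int, lam: float) -> float:
    """P(X >= a) for X ~ Poisson(lam), summed term by term from k = a upward."""
    if a <= 0:
        return 1.0
    term = math.exp(-lam + a * math.log(lam) - math.lgamma(a + 1))  # P(X = a)
    total, k = 0.0, a
    while term > 0.0 and term > total * 1e-16:
        total += term
        k += 1
        term *= lam / k
    return total


def abundance_p_value(abundance: int, parent_reads: int, p_error: float) -> float:
    """How surprising `abundance` copies of a sequence are if every copy were an
    error derived from a more abundant parent sequence.

    lam is the expected number of parent reads misread as exactly this sequence.
    Dividing by P(X >= 1) conditions on the sequence having been observed at all.
    """
    lam = parent_reads * p_error
    if lam <= 0:
        raise ValueError("expected error count must be positive")
    return min(1.0, poisson_upper_tail(abundance, lam) / -math.expm1(-lam))


# Hypothetical numbers: a parent with 1,000 reads and a 0.1% chance per read of
# being misread as one particular neighbouring sequence (lam = 1)
print(abundance_p_value(1, 1000, 0.001))   # a singleton: fully consistent with error
print(abundance_p_value(10, 1000, 0.001))  # 10 copies: very unlikely from error alone
```

A small p-value flags the sequence as a likely genuine variant rather than noise; in a divisive scheme such a sequence would be split off as a new cluster.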
14.  Ultra-Deep Pyrosequencing of Hepatitis B Virus Quasispecies from Nucleoside and Nucleotide Reverse-Transcriptase Inhibitor (NRTI)–Treated Patients and NRTI-Naive Patients 
The Journal of Infectious Diseases  2009;199(9):1275-1285.
The dynamics of emerging nucleoside and nucleotide reverse-transcriptase inhibitor (NRTI) resistance in hepatitis B virus (HBV) are not well understood because standard dideoxynucleotide direct polymerase chain reaction (PCR) sequencing assays detect drug-resistance mutations only after they have become dominant. To obtain insight into NRTI resistance, we used a new sequencing technology to characterize the spectrum of low-prevalence NRTI-resistance mutations in HBV obtained from 20 plasma samples from 11 NRTI-treated patients and 17 plasma samples from 17 NRTI-naive patients, by using standard direct PCR sequencing and ultra-deep pyrosequencing (UDPS). UDPS detected drug-resistance mutations that were not detected by PCR in 10 samples from 5 NRTI-treated patients, including the lamivudine-resistance mutation V173L (in 5 samples), the entecavir-resistance mutations T184S (in 2 samples) and S202G (in 1 sample), the adefovir-resistance mutation N236T (in 1 sample), and the lamivudine and adefovir–resistance mutations V173L, L180M, A181T, and M204V (in 1 sample). G-to-A hypermutation mediated by the apolipoprotein B mRNA editing enzyme, catalytic polypeptide–like family of cytidine deaminases was estimated to be present in 0.6% of reverse-transcriptase genes. Genotype A coinfection was detected by UDPS in each of 3 patients in whom genotype G virus was detected by direct PCR sequencing. UDPS detected low-prevalence HBV variants with NRTI-resistance mutations, G-to-A hypermutation, and low-level dual genotype infection with a sensitivity not previously possible.
doi:10.1086/597808
PMCID: PMC3353721  PMID: 19301976
15.  Colony variation in the collective regulation of foraging by harvester ants 
Behavioral Ecology  2011;22(2):429-435.
This study investigates variation in collective behavior in a natural population of colonies of the harvester ant, Pogonomyrmex barbatus. Harvester ant colonies regulate foraging activity to adjust to current food availability; the rate at which inactive foragers leave the nest on the next trip depends on the rate at which successful foragers return with food. This study investigates differences among colonies in foraging activity and how these differences are associated with variation among colonies in the regulation of foraging. Colonies differ in the baseline rate at which patrollers leave the nest, without stimulation from returning ants. This baseline rate predicts a colony's foraging activity, suggesting there is a colony-specific activity level that influences how quickly any ant leaves the nest. When a colony's foraging activity is high, the colony is more likely to regulate foraging. Moreover, colonies differ in the propensity to adjust the rate of outgoing foragers to the rate of forager return. Naturally occurring variation in the regulation of foraging may lead to variation in colony survival and reproductive success.
doi:10.1093/beheco/arq218
PMCID: PMC3071749  PMID: 22479133
behavioral reaction norm; behavioral syndrome; individual variation
16.  THE DUALITY DIAGRAM IN DATA ANALYSIS: EXAMPLES OF MODERN APPLICATIONS 
The annals of applied statistics  2011;5(4):2266-2277.
Today's data-heavy research environment requires the integration of different sources of information into structured datasets that cannot be analyzed as simple matrices. We introduce an old technique, known in European data analysis circles as the Duality Diagram Approach, and put it to new uses through a variety of metrics and ways of combining different diagrams. This issue of the Annals of Applied Statistics contains contemporary examples of how this approach provides solutions to hard problems in data integration. We present here the genesis of the technique and show how it can be seen as a precursor of modern kernel-based approaches.
PMCID: PMC3265363  PMID: 22282721
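Entry 16 describes the duality diagram without formulas. In the notation commonly used for this approach, an analysis is specified by a triplet (X, Q, D): a column-centred n×p data matrix X, a p×p metric Q on the variables and an n×n diagonal matrix D of observation weights; the principal axes come from the eigendecomposition of V Q, where V = X^T D X. With Q the identity and D = (1/n)I this reduces to ordinary PCA, and other choices of Q and D give other methods. The pure-Python sketch below (helper names and toy data are hypothetical) finds the leading axis by power iteration and shows how changing the metric Q moves it.

```python
import math


def mat_mul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def diag(values):
    """Square diagonal matrix with `values` on the diagonal."""
    return [[v if i == j else 0.0 for j in range(len(values))] for i, v in enumerate(values)]


def leading_eigenpair(M, iterations=500):
    """Power iteration for the dominant eigenvalue and eigenvector of a square matrix."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in M]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return lam, v


def duality_diagram_axis(X, Q, D):
    """Leading eigenpair of V Q with V = X^T D X, for the triplet (X, Q, D).

    X is assumed to be column-centred already.
    """
    V = mat_mul(mat_mul(transpose(X), D), X)
    return leading_eigenpair(mat_mul(V, Q))


# Toy column-centred data: 4 observations, 2 variables; D = uniform weights 1/n
X = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
D = diag([1 / len(X)] * len(X))

lam_pca, axis_pca = duality_diagram_axis(X, diag([1.0, 1.0]), D)  # Q = I: ordinary PCA
lam_q, axis_q = duality_diagram_axis(X, diag([1.0, 9.0]), D)      # up-weight variable 2
print(round(lam_pca, 6), [round(x, 6) for x in axis_pca])
print(round(lam_q, 6), [round(x, 6) for x in axis_q])
```

Here V = diag(2, 0.5), so with Q = I the leading axis is the first variable (eigenvalue 2), while Q = diag(1, 9) makes the second variable the leading axis (eigenvalue 4.5). A dense n×n D is fine for a toy example but wasteful for real data.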
17.  A multifaceted analysis of HIV-1 protease multidrug resistance phenotypes 
BMC Bioinformatics  2011;12:477.
Background
Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to arise within the population. Understanding the phenotypic and genotypic patterns responsible for multi-PI resistance is necessary for developing PIs that are active against clinically-relevant PI-resistant HIV-1 variants.
Results
In this work, we use globally optimal integer programming-based clustering techniques to elucidate multi-PI phenotypic resistance patterns using a data set of 398 HIV-1 protease sequences, each phenotyped for susceptibility to the nine clinically approved HIV-1 PIs. We validate the information content of the clusters by evaluating, with a cross-validation procedure, their ability to predict the level of decreased susceptibility to each of the available PIs. We find that, as a result of phenotypic cross-resistance, the clinical HIV-1 protease isolates considered are confined to ~6% or less of the clinically relevant phenotypic space. Clustering and feature selection methods are used to find representative sequences and mutations for the major resistance phenotypes and so elucidate their genotypic signatures. We show that phenotypic similarity does not imply genotypic similarity: different PI-resistance mutation patterns can give rise to HIV-1 isolates with similar phenotypic profiles.
Conclusion
Rather than characterizing HIV-1 susceptibility toward each PI individually, our study offers a unique perspective on the phenomenon of PI class resistance by uncovering major multidrug-resistant phenotypic patterns and their often diverse genotypic determinants, providing a methodology that can be applied to understand clinically-relevant phenotypic patterns to aid in the design of novel inhibitors that target other rapidly evolving molecular targets as well.
doi:10.1186/1471-2105-12-477
PMCID: PMC3305535  PMID: 22172090
18.  A classification model for G-to-A hypermutation in hepatitis B virus ultra-deep pyrosequencing reads 
Bioinformatics  2010;26(23):2929-2932.
Motivation: G → A hypermutation is an innate antiviral defense mechanism, mediated by host enzymes, which leads to the mutational impairment of viruses. Sensitive and specific identification of host-mediated G → A hypermutation is a novel sequence analysis challenge, particularly for viral deep sequencing studies. For example, two of the most common hepatitis B virus (HBV) reverse transcriptase (RT) drug-resistance mutations, A181T and M204I, arise from G → A changes and are routinely detected as low-abundance variants in nearly all HBV deep sequencing samples.
Results: We developed a classification model using measures of G → A excess and predicted indicators of lethal mutation and applied this model to 325 920 unique deep sequencing reads from plasma virus samples from 45 drug treatment-naïve HBV-infected individuals. The 2.9% of sequence reads that were classified as hypermutated by our model included most of the reads with A181T and/or M204I, indicating the usefulness of this model for distinguishing viral adaptive changes from host-mediated viral editing.
Availability: Source code and sequence data are available at http://hivdb.stanford.edu/pages/resources.html.
Contact: ereuman@stanfordalumni.org
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq570
PMCID: PMC2982158  PMID: 20937597
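The abstract of entry 18 names the two ingredients of its classification model (measures of G → A excess and predicted indicators of lethal mutation) but not the model itself. The sketch below is a hypothetical, rule-based stand-in for illustration only: the thresholds, the function names and the use of newly created in-frame stop codons as the lethal-mutation indicator are assumptions, and each read is assumed to be aligned gap-free to its reference and to start in codon frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def g_to_a_profile(ref: str, read: str) -> dict:
    """Summarise G->A changes versus all other changes in a gap-free aligned read."""
    if len(ref) != len(read):
        raise ValueError("ref and read must be aligned to the same length")
    ref, read = ref.upper(), read.upper()
    g_sites = ref.count("G")
    g_to_a = sum(1 for r, q in zip(ref, read) if r == "G" and q == "A")
    other = sum(1 for r, q in zip(ref, read) if r != q and not (r == "G" and q == "A"))
    # Stop codons in the read (frame starting at position 0) that the reference lacks,
    # e.g. TGG (Trp) -> TAG or TGA after a single G->A change
    new_stops = sum(
        1 for i in range(0, len(ref) - 2, 3)
        if read[i:i + 3] in STOP_CODONS and ref[i:i + 3] not in STOP_CODONS
    )
    return {
        "g_to_a": g_to_a,
        "g_to_a_rate": g_to_a / g_sites if g_sites else 0.0,
        "other_rate": other / len(ref) if ref else 0.0,
        "new_stops": new_stops,
    }


def looks_hypermutated(profile: dict, min_g_to_a: int = 3, min_excess: float = 5.0) -> bool:
    """Hypothetical rule: G->A rate well above the rate of other changes, and either
    several G->A changes or at least one newly created stop codon."""
    excess = profile["g_to_a_rate"] > min_excess * profile["other_rate"]
    return excess and (profile["g_to_a"] >= min_g_to_a or profile["new_stops"] >= 1)


ref = "ATGTGGGGATGGGAAGGC"
hyper_read = "ATGTAGGAATAGAAAGGC"   # four G->A changes, two of which create TAG stops
normal_read = "GTGTGGGGATGGGAAGGT"  # one A->G and one C->T change
print(looks_hypermutated(g_to_a_profile(ref, hyper_read)))   # True
print(looks_hypermutated(g_to_a_profile(ref, normal_read)))  # False
```

A rule of this shape judges a G → A change such as A181T or M204I by the read it occurs in: the same change is treated differently in a read full of other G → A changes than in an otherwise clean read, which is the distinction between host-mediated editing and viral adaptation that the paper draws.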
19.  Site-Specific Mobilization of Vinyl Chloride Respiration Islands by a Mechanism Common in Dehalococcoides 
BMC Genomics  2011;12:287.
Background
Vinyl chloride is a widespread groundwater pollutant and Group 1 carcinogen. A previous comparative genomic analysis revealed that the vinyl chloride reductase operon, vcrABC, of Dehalococcoides sp. strain VS is embedded in a horizontally-acquired genomic island that integrated at the single-copy tmRNA gene, ssrA.
Results
We targeted conserved positions in available genomic islands to amplify and sequence four additional vcrABC-containing genomic islands from previously-unsequenced vinyl chloride respiring Dehalococcoides enrichments. We identified a total of 31 ssrA-specific genomic islands from Dehalococcoides genomic data, accounting for 47 reductive dehalogenase homologous genes and many other non-core genes. Sixteen of these genomic islands contain a syntenic module of integration-associated genes located adjacent to the predicted site of integration, and among these islands, eight contain vcrABC as genetic 'cargo'. These eight vcrABC-containing genomic islands are syntenic across their ~12 kbp length, but have two phylogenetically discordant segments that unambiguously differentiate the integration module from the vcrABC cargo. Using available Dehalococcoides phylogenomic data we estimate that these ssrA-specific genomic islands are at least as old as the Dehalococcoides group itself, which in turn is much older than human civilization.
Conclusions
The vcrABC-containing genomic islands are a recently acquired subset of a diverse collection of ssrA-specific mobile elements that are a major contributor to strain-level diversity in Dehalococcoides, and may have been throughout its evolution. The high similarity between vcrABC sequences is quantitatively consistent with recent horizontal acquisition driven by ~100 years of industrial pollution with chlorinated ethenes.
doi:10.1186/1471-2164-12-287
PMCID: PMC3146451  PMID: 21635780
20.  An Interactive Java Statistical Image Segmentation System: GemIdent 
Supervised learning can be used to segment and identify regions of interest in images using both color and morphological information. A novel object identification algorithm was developed in Java to locate immune and cancer cells in images of immunohistochemically-stained lymph node tissue from a recent study published by Kohrt et al. (2005). The algorithms are also showing promise in other domains. The success of the method depends heavily on the use of color, the relative homogeneity of object appearance and on interactivity. As is often the case in segmentation, an algorithm specifically tailored to the application works better than broader methods that perform passably on any problem. Our main innovation is the interactive feature extraction from color images. We also enable the user to improve the classification with an interactive visualization system. This is then coupled with the statistical learning algorithms and intensive feedback from the user over many classification-correction iterations, resulting in a highly accurate and user-friendly solution. The system ultimately provides the locations of every cell recognized in the entire tissue in a text file tailored to be easily imported into R (Ihaka and Gentleman 1996; R Development Core Team 2009) for further statistical analyses. These data are invaluable in the study of spatial and multidimensional relationships between cell populations and tumor structure. This system is available at http://www.GemIdent.com/ together with three demonstration videos and a manual.
PMCID: PMC3100170  PMID: 21614138
interactive boosting; cell recognition; image segmentation; Java
21.  The effect of individual variation on the structure and function of interaction networks in harvester ants 
Social insects exhibit coordinated behaviour without central control. Local interactions among individuals determine their behaviour and regulate the activity of the colony. Harvester ants are recruited for outside work through networks of brief antennal contacts in the nest chamber closest to the nest exit, the entrance chamber. Here, we combine empirical observations, image analysis and computer simulations to investigate the structure and function of the interaction network in the entrance chamber. Ant interactions were distributed heterogeneously in the chamber, with an interaction hot-spot at the entrance leading further into the nest. The total number of interactions per ant followed a right-skewed distribution, indicating the presence of highly connected individuals. The number of ant encounters observed was positively correlated with the duration of observation. Individuals varied in interaction frequency, even after accounting for the duration of observation. An ant's interaction frequency was explained by its path shape and location within the entrance chamber. Computer simulations demonstrate that variation among individuals in connectivity accelerates information flow to an extent equivalent to an increase in the total number of interactions. Individual variation in connectivity, arising from variation among ants in location and spatial behaviour, creates interaction centres, which may expedite information flow.
doi:10.1098/rsif.2011.0059
PMCID: PMC3177612  PMID: 21490001
agent-based model; movement pattern; network analysis; Pogonomyrmex barbatus; spatial behaviour; weighted degree
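Weighted degree, one of the keywords for this entry, can be computed from a list of pairwise antennal contacts by summing the weights of each ant's contacts, where a weight might be a contact count or a duration. This is a minimal sketch that assumes contacts are recorded as hypothetical (ant_a, ant_b, weight) tuples. It also divides by each ant's observation time. This is one plausible way to compare ants observed for different durations, since the abstract reports that encounter counts grew with observation time. It is not necessarily the normalization used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (ant_a, ant_b, weight)
Contact = Tuple[str, str, float]


def weighted_degree(contacts: Iterable[Contact]) -> Dict[str, float]:
    """Sum contact weights per ant; each contact counts for both participants."""
    degree: Dict[str, float] = defaultdict(float)
    for a, b, w in contacts:
        if a == b:
            continue  # ignore self-contacts, which are tracking artefacts
        degree[a] += w
        degree[b] += w
    return dict(degree)


def interaction_rate(degree: Dict[str, float],
                     observed_seconds: Dict[str, float]) -> Dict[str, float]:
    """Weighted degree per second of observation, for ants observed for > 0 s."""
    return {ant: degree.get(ant, 0.0) / t
            for ant, t in observed_seconds.items() if t > 0}
```

With contacts `[("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 2.0)]`, ant "a" has weighted degree 2.0 and ants "b" and "c" have 3.0. An ant with a large weighted degree per unit time is a candidate for the "highly connected individuals" the abstract describes.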
22.  Threshold Graph Limits and Random Threshold Graphs 
Internet mathematics  2008;5(3):267-320.
We study the limit theory of large threshold graphs and apply this to a variety of models for random threshold graphs. The results give a nice set of examples for the emerging theory of graph limits.
PMCID: PMC2930250  PMID: 20811581
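A standard characterization of threshold graphs says that a graph is a threshold graph exactly when it can be built from the empty graph by repeatedly adding an isolated vertex or a dominating vertex. One common random model gives each vertex an i.i.d. weight and joins i and j when w_i + w_j exceeds a threshold t. The sketch below assumes that model with Uniform(0, 1) weights and t = 1. It generates such a graph and checks the characterization by peeling off isolated or dominating vertices. The function names are illustrative, and the code is not taken from the paper.

```python
import random
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def random_threshold_graph(n: int, threshold: float = 1.0, seed: int = 0) -> Graph:
    """Assumed model: i.i.d. Uniform(0,1) weights; i ~ j iff w_i + w_j > threshold."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n)]
    graph: Graph = {v: set() for v in range(n)}
    for i, j in combinations(range(n), 2):
        if weights[i] + weights[j] > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def is_threshold_graph(graph: Graph) -> bool:
    """Repeatedly delete an isolated or dominating vertex; succeed if none remain.

    Greedy deletion is safe because induced subgraphs of threshold graphs
    are again threshold graphs.
    """
    remaining = {v: set(nbrs) for v, nbrs in graph.items()}
    while remaining:
        size = len(remaining)
        pick = next((v for v, nbrs in remaining.items()
                     if len(nbrs) == 0 or len(nbrs) == size - 1), None)
        if pick is None:
            return False  # no isolated or dominating vertex: not threshold
        for u in remaining.pop(pick):
            remaining[u].discard(pick)
    return True
```

For example, a star graph passes the check. The path on four vertices fails it, because no vertex in that path is isolated or adjacent to all the others.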
23.  Quantitative, Architectural Analysis of Immune Cell Subsets in Tumor-Draining Lymph Nodes from Breast Cancer Patients and Healthy Lymph Nodes 
PLoS ONE  2010;5(8):e12420.
Background
To date, pathological examination of specimens remains largely qualitative. Quantitative measures of tissue spatial features are generally not captured. To gain additional mechanistic and prognostic insights, a need for quantitative architectural analysis arises in studying immune cell-cancer interactions within the tumor microenvironment and tumor-draining lymph nodes (TDLNs).
Methodology/Principal Findings
We present a novel, quantitative image analysis approach incorporating 1) multi-color tissue staining, 2) high-resolution, automated whole-section imaging, 3) custom image analysis software that identifies cell types and locations, and 4) spatial statistical analysis. As a proof of concept, we applied this approach to study the architectural patterns of T and B cells within tumor-draining lymph nodes from breast cancer patients versus healthy lymph nodes. We found that the spatial grouping patterns of T and B cells differed between healthy and breast cancer lymph nodes, and this could be attributed to the lack of B cell localization in the extrafollicular region of the TDLNs.
Conclusions/Significance
Our integrative approach has made quantitative analysis of complex visual data possible. Our results highlight spatial alterations of immune cells within lymph nodes from breast cancer patients as an independent variable from numerical changes. This opens up new areas of investigations in research and medicine. Future application of this approach will lead to a better understanding of immune changes in the tumor microenvironment and TDLNs, and how they affect clinical outcomes.
doi:10.1371/journal.pone.0012420
PMCID: PMC2928294  PMID: 20811638
24.  Constrained patterns of covariation and clustering of HIV-1 non-nucleoside reverse transcriptase inhibitor resistance mutations 
Objectives
We characterized pairwise and higher order patterns of non-nucleoside reverse transcriptase inhibitor (NNRTI)-selected mutations because multiple mutations are usually required for clinically significant resistance to second-generation NNRTIs.
Patients and methods
We analysed viruses from 13 039 individuals with sequences containing at least one of 52 published NNRTI-selected mutations, including 1133 viruses from individuals who received efavirenz but no other NNRTI and 1510 viruses from individuals who received nevirapine but no other NNRTI. Of the 17 reported etravirine resistance-associated mutations (RAMs), Y181C/I/V, L100I, K101P and M230L were considered major based on published in vitro susceptibility data.
Results
Efavirenz preferentially selected for 16 mutations, including L100I (14% versus 0.1%, P < 0.001), K101P (3.3% versus 0.4%, P < 0.001) and M230L (2.8% versus 1.3%, P = 0.004), whereas nevirapine preferentially selected for 12 mutations, including Y181C/I/V (48% versus 6.9%, P < 0.001). Twenty-nine pairs of NNRTI-selected mutations covaried significantly, including Y181C with seven other mutations (A98G, K101E/H, V108I, G190A/S and H221Y), L100I with K103N, and K101P with K103S. Two pairs (Y181C + V179F and Y181C + G190S) were predicted to confer >10-fold decreased etravirine susceptibility. Seventeen percent of sequences had three or more NNRTI-selected mutations, mostly in clusters of covarying mutations. Many clusters had Y181C plus a non-major etravirine RAM; few had more than one major etravirine RAM.
Conclusions
Although major etravirine RAMs rarely occur in combination, 2 of 29 pairs of covarying mutations were associated with >10-fold decreased etravirine susceptibility. Viruses with three or more NNRTI-selected mutations often contained Y181C in combination with one or more minor etravirine RAMs; however, phenotypic and clinical correlates for most of these higher order combinations have not been published.
doi:10.1093/jac/dkq140
PMCID: PMC2882873  PMID: 20462946
Multidrug resistance; etravirine; antiviral therapy
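Pairwise covariation of mutations is commonly assessed from a 2x2 table that records how many sequences have both mutations, only one, or neither. Those counts are then summarized with an odds ratio and tested with Fisher's exact test. This is a minimal, standard-library sketch of that general approach. It is not necessarily the statistical procedure used in the study. The Haldane pseudocount of 0.5 in the odds ratio is an assumption added to avoid division by zero. When many pairs are tested, as in this analysis, the p-values would also need a multiple-comparison adjustment.

```python
from math import comb
from typing import Sequence, Set, Tuple


def pair_table(sequences: Sequence[Set[str]], m1: str, m2: str) -> Tuple[int, int, int, int]:
    """Counts (both, m1 only, m2 only, neither) over each sequence's mutation set."""
    both = only1 = only2 = neither = 0
    for muts in sequences:
        has1, has2 = m1 in muts, m2 in muts
        if has1 and has2:
            both += 1
        elif has1:
            only1 += 1
        elif has2:
            only2 += 1
        else:
            neither += 1
    return both, only1, only2, neither


def odds_ratio(a: int, b: int, c: int, d: int, pseudocount: float = 0.5) -> float:
    """Odds ratio (a*d)/(b*c) with an assumed Haldane pseudocount in every cell."""
    return ((a + pseudocount) * (d + pseudocount)) / ((b + pseudocount) * (c + pseudocount))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value: sum of hypergeometric probabilities of
    tables (with the same margins) no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For a perfectly co-occurring table (3, 0, 0, 3), the two-sided p-value is 0.1 and the pseudocount-adjusted odds ratio is 49.0. With thousands of sequences, the same code would give far smaller p-values for strongly covarying pairs.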
25.  Nonpolymorphic Human Immunodeficiency Virus Type 1 Protease and Reverse Transcriptase Treatment-Selected Mutations
Antimicrobial Agents and Chemotherapy  2009;53(11):4869-4878.
The spectrum of human immunodeficiency virus type 1 (HIV-1) protease and reverse transcriptase (RT) mutations selected by antiretroviral (ARV) drugs requires ongoing reassessment as ARV treatment patterns evolve and increasing numbers of protease and RT sequences of different viral subtypes are published. Accordingly, we compared the prevalences of protease and RT mutations in HIV-1 group M sequences from individuals with and without a history of previous treatment with protease inhibitors (PIs) or RT inhibitors (RTIs). Mutations in protease sequences from 26,888 individuals and in RT sequences from 25,695 individuals were classified according to whether they were nonpolymorphic in untreated individuals and whether their prevalence increased fivefold with ARV therapy. This analysis showed that 88 PI-selected and 122 RTI-selected nonpolymorphic mutations had a prevalence that was fivefold higher in individuals receiving ARVs than in ARV-naïve individuals. This was an increase of 47% and 77%, respectively, compared with the 60 PI- and 69 RTI-selected mutations identified in a similar analysis that we published in 2005 using subtype B sequences obtained from one-fourth as many individuals. In conclusion, many nonpolymorphic mutations in protease and RT are under ARV selection pressure. The spectrum of treatment-selected mutations is changing as data for more individuals are collected, treatment exposures change, and the number of available sequences from non-subtype B viruses increases.
doi:10.1128/AAC.00592-09
PMCID: PMC2772298  PMID: 19721070
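The selection criterion in this abstract has two parts. A mutation must be nonpolymorphic in untreated individuals, and its prevalence must be at least fivefold higher with ARV therapy. This is a minimal sketch of that two-part rule. The abstract does not state the cutoff that defines "nonpolymorphic", so the 0.5% naive-prevalence cutoff below is a hypothetical placeholder, as are the class and function names. The sketch also checks the abstract's arithmetic: 88 versus 60 PI-selected mutations is a 47% increase, and 122 versus 69 RTI-selected mutations is a 77% increase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationCounts:
    name: str
    treated_with: int   # treated individuals whose sequence has the mutation
    treated_total: int
    naive_with: int     # untreated individuals whose sequence has the mutation
    naive_total: int


def prevalence(k: int, n: int) -> float:
    return k / n if n else 0.0


def is_treatment_selected(m: MutationCounts, fold: float = 5.0,
                          max_naive_prevalence: float = 0.005) -> bool:
    """Nonpolymorphic in naive sequences (hypothetical cutoff) and >= fold-times
    more prevalent in treated sequences."""
    p_treated = prevalence(m.treated_with, m.treated_total)
    p_naive = prevalence(m.naive_with, m.naive_total)
    nonpolymorphic = p_naive <= max_naive_prevalence
    enriched = p_treated > 0 if p_naive == 0 else p_treated >= fold * p_naive
    return nonpolymorphic and enriched


def percent_increase(new: int, old: int) -> float:
    return 100.0 * (new - old) / old
```

Here `round(percent_increase(88, 60))` gives 47 and `round(percent_increase(122, 69))` gives 77, matching the figures quoted above. A full analysis would also test whether each prevalence difference is statistically significant rather than relying on the fold change alone.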