Current practice in the normalization of microbiome count data is inefficient in the statistical sense. For apparently historical reasons, the common approach is either to use simple proportions (which does not address heteroscedasticity) or to use rarefying of counts, even though both of these approaches are inappropriate for detection of differentially abundant species. Well-established statistical theory is available that simultaneously accounts for library size differences and biological variability using an appropriate mixture model. Moreover, specific implementations for DNA sequencing read count data (based on a Negative Binomial model for instance) are already available in RNA-Seq focused R packages such as edgeR and DESeq. Here we summarize the supporting statistical theory and use simulations and empirical data to demonstrate substantial improvements provided by a relevant mixture model framework over simple proportions or rarefying. We show how both proportions and rarefied counts result in a high rate of false positives in tests for species that are differentially abundant across sample classes. Regarding microbiome sample-wise clustering, we also show that the rarefying procedure often discards samples that can be accurately clustered by alternative methods. We further compare different Negative Binomial methods with a recently-described zero-inflated Gaussian mixture, implemented in a package called metagenomeSeq. We find that metagenomeSeq performs well when there is an adequate number of biological replicates, but it nevertheless tends toward a higher false positive rate. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
The term microbiome refers to the ecosystem of microbes that live in a defined environment. The decreasing cost and increasing speed of DNA sequencing technology have recently provided scientists with affordable and timely access to the genes and genomes of microbiomes that inhabit our planet and even our own bodies. In these investigations many microbiome samples are sequenced at the same time on the same DNA sequencing machine, but often result in total numbers of sequences per sample that are vastly different. The common procedure for addressing this difference in sequencing effort across samples (different library sizes) is to either (1) base analyses on the proportional abundance of each species in a library, or (2) rarefy, that is, throw away sequences from the larger libraries so that all have the same, smallest size. We show that both of these normalization methods can work when comparing obviously-different whole microbiomes, but that neither method works well when comparing the relative proportions of each bacterial species across microbiome samples. We show that alternative methods based on a statistical mixture model perform much better and can be easily adapted from a separate biological sub-discipline, called RNA-Seq analysis.
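The two normalization procedures described above can be sketched in a few lines. This is a minimal illustration, not the phyloseq implementation: the function names, the toy species counts, and the fixed random seed are all assumptions made for the example.

```python
import random


def to_proportions(counts):
    """Convert one sample's species counts to proportions of its library size."""
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample."""
    # Expand the counts into one label per read, then draw without replacement.
    reads = [species for species, n in counts.items() for _ in range(n)]
    kept = rng.sample(reads, depth)
    rarefied = {species: 0 for species in counts}
    for species in kept:
        rarefied[species] += 1
    return rarefied


# Hypothetical toy libraries of very different sizes (1000 and 100 reads).
samples = {
    "sample_a": {"sp1": 900, "sp2": 90, "sp3": 10},
    "sample_b": {"sp1": 40, "sp2": 50, "sp3": 10},
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
min_depth = min(sum(c.values()) for c in samples.values())

proportions = {name: to_proportions(c) for name, c in samples.items()}
rarefied = {name: rarefy(c, min_depth, rng) for name, c in samples.items()}

# Number of reads thrown away by rarefying to the smallest library size.
total_reads = sum(sum(c.values()) for c in samples.values())
discarded = total_reads - min_depth * len(samples)
```

In this toy case rarefying keeps 100 reads per sample and discards 900 of the 1,100 reads sequenced, which illustrates the loss of data that the abstract objects to; proportions keep every read but ignore the fact that a proportion estimated from 100 reads is far noisier than one estimated from 1,000.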
HIV; phylogeny; transmission; network
Environmental conditions and physical constraints both influence an animal's behavior. We investigate whether behavioral variation among colonies of the black harvester ant, Messor andrei, remains consistent across foraging and disturbance situations and ask whether consistent colony behavior is affected by nest site and weather. We examined variation among colonies in responsiveness to food baits and to disturbance, measured as a change in numbers of active ants, and in the speed with which colonies retrieved food and removed debris. Colonies differed consistently, across foraging and disturbance situations, in both responsiveness and speed. Increased activity in response to food was associated with a smaller decrease in response to alarm. Speed of retrieving food was correlated with speed of removing debris. In all colonies, speed was greater in dry conditions, reducing the amount of time ants spent outside the nest. While a colony occupied a certain nest site, its responsiveness was consistent in both foraging and disturbance situations, suggesting that nest structure influences colony personality.
behavioral syndromes; collective behavior; harvester ant; Messor andrei; nest structure; personality; plasticity; social insects; temperament
The many genetic manifestations of HIV-1 protease inhibitor (PI) resistance present challenges to research into the mechanisms of PI resistance and the assessment of new PIs. To address these challenges, we created a panel of recombinant multi-PI-resistant infectious molecular clones designed to represent the spectrum of clinically relevant multi-PI-resistant viruses. To assess the representativeness of this panel, we examined the sequences of the panel's viruses in the context of a correlation network of PI resistance amino acid substitutions in sequences from more than 10,000 patients. The panel of recombinant infectious molecular clones comprised 29 of 41 study-defined PI resistance amino acid substitutions and 23 of the 27 tightest amino acid substitution clusters. Based on their phenotypic properties, the clones were classified into four groups with increasing cross-resistance to the PIs most commonly used for salvage therapy: lopinavir (LPV), tipranavir (TPV), and darunavir (DRV). The panel of recombinant infectious molecular clones has been made available without restriction through the NIH AIDS Research and Reference Reagent Program. The public availability of the panel makes it possible to compare the inhibitory activities of different PIs with one another. The diversity of the panel and the high-level PI resistance of its clones suggest that investigational PIs active against the clones in this panel will retain antiviral activity against most if not all clinically relevant PI-resistant viruses.
We sought to determine the prevalence of hepatitis B virus (HBV) lamivudine (LAM)-resistant minority variants in subjects who once received LAM but had discontinued it prior to virus sampling. We performed direct PCR Sanger sequencing and ultradeep pyrosequencing (UDPS) of HBV reverse transcriptase (RT) of plasma viruses from 45 LAM-naive subjects and 46 LAM-experienced subjects who had discontinued LAM a median of 24 months earlier. UDPS was performed to a depth of ∼3,000 reads per nucleotide. Minority variants were defined as differences from the Sanger sequence present in ≥0.5% of UDPS reads in a sample. Sanger sequencing identified ≥1 LAM resistance mutations (rtL80I/V, rtM204I, and rtA181T) in samples from 5 (11%) of 46 LAM-experienced and none of 45 LAM-naive subjects (0%; P = 0.06). UDPS detected ≥1 LAM resistance mutations (rtL80I/V, rtV173L, rtL180M, rtA181T, and rtM204I/V) in 10 (22%) of the 46 LAM-experienced subjects, including 5 in whom LAM resistance mutations were not identified by Sanger sequencing. Overall, LAM resistance mutations were more likely to be present in LAM-experienced (10/46, 22%) than LAM-naive subjects (0/45, 0%; P = 0.001). The median time since LAM discontinuation was 12.8 months in the 10 subjects with a LAM resistance mutation compared to 30.5 months in the 36 LAM-experienced subjects without a LAM resistance mutation (P < 0.001). The likelihood of detecting a LAM resistance mutation was significantly increased using UDPS compared to Sanger sequencing and was inversely associated with the time since LAM discontinuation.
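The abstract above does not name the statistical test behind its two-group comparisons. Assuming a two-sided Fisher's exact test on the 2x2 tables of subjects with and without a LAM resistance mutation, the reported P values can be checked with the sketch below. The function name is hypothetical and the implementation uses only the Python standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Unnormalised hypergeometric weight of the table with k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables at least as unlikely as the observed one.
    # Weights are exact integers, so no floating-point tolerance is needed.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)


# Sanger sequencing: 5 of 46 LAM-experienced vs 0 of 45 LAM-naive subjects.
p_sanger = fisher_exact_two_sided(5, 41, 0, 45)

# UDPS: 10 of 46 LAM-experienced vs 0 of 45 LAM-naive subjects.
p_udps = fisher_exact_two_sided(10, 36, 0, 45)
```

Under this assumption the two values round to the reported P = 0.06 and P = 0.001, which is consistent with, but does not prove, the use of Fisher's exact test in the study.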
The analysis of microbial communities through DNA sequencing brings many challenges: the integration of different types of data with methods from ecology, genetics, phylogenetics, multivariate statistics, visualization and testing. With the increased breadth of experimental designs now being pursued, project-specific statistical analyses are often needed, and these analyses are often difficult (or impossible) for peer researchers to independently reproduce. The vast majority of the requisite tools for performing these analyses reproducibly are already implemented in R and its extensions (packages), but with limited support for high throughput microbiome census data.
Here we describe a software project, phyloseq, dedicated to the object-oriented representation and analysis of microbiome census data in R. It supports importing data from a variety of common formats, as well as many analysis techniques. These include calibration, filtering, subsetting, agglomeration, multi-table comparisons, diversity analysis, parallelized Fast UniFrac, ordination methods, and production of publication-quality graphics; all in a manner that is easy to document, share, and modify. We show how to apply functions from other R packages to phyloseq-represented data, illustrating the availability of a large number of open source analysis techniques. We discuss the use of phyloseq with tools for reproducible research, a practice common in other fields but still rare in the analysis of highly parallel microbiome census data. We have made available all of the materials necessary to completely reproduce the analysis and figures included in this article, an example of best practices for reproducible research.
The phyloseq project for R is a new open-source software package, freely available on the web from both GitHub and Bioconductor.
Lymph node metastasis is a key event in the progression of breast cancer. Therefore it is important to understand the underlying mechanisms which facilitate regional lymph node metastatic progression.
We performed gene expression profiling of purified tumor cells from human breast tumor and lymph node metastasis. By microarray network analysis, we found an increased expression of polycomb repressive complex 2 (PRC2) core subunits EED and EZH2 in lymph node metastatic tumor cells over primary tumor cells, which was validated through real-time PCR. Additionally, immunohistochemical (IHC) staining and quantitative image analysis of whole tissue sections showed a significant increase of EZH2-expressing tumor cells in lymph nodes over paired primary breast tumors, which strongly correlated with tumor cell proliferation in situ. We further explored the mechanisms of PRC2 gene up-regulation in metastatic tumor cells and found up-regulation of E2F genes and MYC targets, and down-regulation of tumor suppressor gene E-cadherin targets, in lymph node metastasis through GSEA analyses. Using IHC, the expression of the potential EZH2 target E-cadherin was examined in paired primary/lymph node samples and was found to be significantly decreased in lymph node metastases over paired primary tumors.
This study identified an overexpression of the epigenetic silencing complex PRC2/EED-EZH2 in breast cancer lymph node metastasis as compared to primary tumor and its positive association with tumor cell proliferation in situ. Concurrently, the PRC2 target protein E-cadherin was significantly decreased in lymph node metastases, suggesting that PRC2 promotes epithelial-mesenchymal transition (EMT) in the lymph node metastatic process through repression of E-cadherin. These results indicate that epigenetic regulation mediated by PRC2 proteins may provide an additional advantage for the outgrowth of metastatic tumor cells in lymph nodes. This opens up epigenetic drug development possibilities for the treatment and prevention of lymph node metastasis in breast cancer.
PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy.
We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche’s 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise.
DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.
D6 is a scavenging receptor for inflammatory CC chemokines that is essential for resolution of inflammatory responses in mice. Here, we demonstrate that D6 plays a central role in controlling cutaneous inflammation, and that D6 deficiency is associated with development of a psoriasis-like pathology in response to varied inflammatory stimuli in mice. Examination of D6 expression in human psoriatic skin revealed markedly elevated expression in both the epidermis and lymphatic endothelium in “uninvolved” psoriatic skin (i.e., skin that was more than 8 cm distant from psoriatic plaques). Notably, this increased D6 expression is associated with elevated inflammatory chemokine expression, but an absence of plaque development, in uninvolved skin. Along with our previous observations of the ability of epidermally expressed transgenic D6 to impair cutaneous inflammatory responses, our data support a role for elevated D6 levels in suppressing inflammatory chemokine action and lesion development in uninvolved psoriatic skin. D6 expression consistently dropped in perilesional and lesional skin, coincident with development of psoriatic plaques. D6 expression in uninvolved skin also was reduced after trauma, indicative of a role for trauma-mediated reduction in D6 expression in triggering lesion development. Importantly, D6 is also elevated in peripheral blood leukocytes in psoriatic patients, indicating that upregulation may be a general protective response to inflammation. Together our data demonstrate a novel role for D6 as a regulator of the transition from uninvolved to lesional skin in psoriasis.
The dynamics of emerging nucleoside and nucleotide reverse-transcriptase inhibitor (NRTI) resistance in hepatitis B virus (HBV) are not well understood because standard dideoxynucleotide direct polymerase chain reaction (PCR) sequencing assays detect drug-resistance mutations only after they have become dominant. To obtain insight into NRTI resistance, we used a new sequencing technology to characterize the spectrum of low-prevalence NRTI-resistance mutations in HBV obtained from 20 plasma samples from 11 NRTI-treated patients and 17 plasma samples from 17 NRTI-naive patients, by using standard direct PCR sequencing and ultra-deep pyrosequencing (UDPS). UDPS detected drug-resistance mutations that were not detected by PCR in 10 samples from 5 NRTI-treated patients, including the lamivudine-resistance mutation V173L (in 5 samples), the entecavir-resistance mutations T184S (in 2 samples) and S202G (in 1 sample), the adefovir-resistance mutation N236T (in 1 sample), and the lamivudine and adefovir–resistance mutations V173L, L180M, A181T, and M204V (in 1 sample). G-to-A hypermutation mediated by the apolipoprotein B mRNA editing enzyme, catalytic polypeptide–like family of cytidine deaminases was estimated to be present in 0.6% of reverse-transcriptase genes. Genotype A coinfection was detected by UDPS in each of 3 patients in whom genotype G virus was detected by direct PCR sequencing. UDPS detected low-prevalence HBV variants with NRTI-resistance mutations, G-to-A hypermutation, and low-level dual genotype infection with a sensitivity not previously possible.
This study investigates variation in collective behavior in a natural population of colonies of the harvester ant, Pogonomyrmex barbatus. Harvester ant colonies regulate foraging activity to adjust to current food availability; the rate at which inactive foragers leave the nest on the next trip depends on the rate at which successful foragers return with food. This study investigates differences among colonies in foraging activity and how these differences are associated with variation among colonies in the regulation of foraging. Colonies differ in the baseline rate at which patrollers leave the nest, without stimulation from returning ants. This baseline rate predicts a colony's foraging activity, suggesting there is a colony-specific activity level that influences how quickly any ant leaves the nest. When a colony's foraging activity is high, the colony is more likely to regulate foraging. Moreover, colonies differ in the propensity to adjust the rate of outgoing foragers to the rate of forager return. Naturally occurring variation in the regulation of foraging may lead to variation in colony survival and reproductive success.
behavioral reaction norm; behavioral syndrome; individual variation
Today's data-heavy research environment requires the integration of different sources of information into structured datasets that cannot be analyzed as simple matrices. We introduce an old technique, known in European data analysis circles as the Duality Diagram Approach, put to new uses through a variety of metrics and ways of combining different diagrams together. This issue of the Annals of Applied Statistics contains contemporary examples of how this approach provides solutions to hard problems in data integration. We present here the genesis of the technique and how it can be seen as a precursor of the modern kernel-based approaches.
Great strides have been made in the effective treatment of HIV-1 with the development of second-generation protease inhibitors (PIs) that are effective against historically multi-PI-resistant HIV-1 variants. Nevertheless, mutation patterns that confer decreasing susceptibility to available PIs continue to arise within the population. Understanding the phenotypic and genotypic patterns responsible for multi-PI resistance is necessary for developing PIs that are active against clinically-relevant PI-resistant HIV-1 variants.
In this work, we use globally optimal integer programming-based clustering techniques to elucidate multi-PI phenotypic resistance patterns using a data set of 398 HIV-1 protease sequences that have each been phenotyped for susceptibility toward the nine clinically-approved HIV-1 PIs. We validate the information content of the clusters by evaluating their ability to predict the level of decreased susceptibility to each of the available PIs using a cross validation procedure. We find that, as a result of phenotypic cross resistance, the considered clinical HIV-1 protease isolates are confined to ~6% or less of the clinically-relevant phenotypic space. Clustering and feature selection methods are used to find representative sequences and mutations for major resistance phenotypes to elucidate their genotypic signatures. We show that phenotypic similarity does not imply genotypic similarity: different PI-resistance mutation patterns can give rise to HIV-1 isolates with similar phenotypic profiles.
Rather than characterizing HIV-1 susceptibility toward each PI individually, our study offers a unique perspective on the phenomenon of PI class resistance by uncovering major multidrug-resistant phenotypic patterns and their often diverse genotypic determinants, providing a methodology that can be applied to understand clinically-relevant phenotypic patterns to aid in the design of novel inhibitors that target other rapidly evolving molecular targets as well.
Motivation: G → A hypermutation is an innate antiviral defense mechanism, mediated by host enzymes, which leads to the mutational impairment of viruses. Sensitive and specific identification of host-mediated G → A hypermutation is a novel sequence analysis challenge, particularly for viral deep sequencing studies. For example, two of the most common hepatitis B virus (HBV) reverse transcriptase (RT) drug-resistance mutations, A181T and M204I, arise from G → A changes and are routinely detected as low-abundance variants in nearly all HBV deep sequencing samples.
Results: We developed a classification model using measures of G → A excess and predicted indicators of lethal mutation and applied this model to 325 920 unique deep sequencing reads from plasma virus samples from 45 drug treatment-naïve HBV-infected individuals. The 2.9% of sequence reads that were classified as hypermutated by our model included most of the reads with A181T and/or M204I, indicating the usefulness of this model for distinguishing viral adaptive changes from host-mediated viral editing.
Availability: Source code and sequence data are available at http://hivdb.stanford.edu/pages/resources.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
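The hypermutation abstract above refers to "measures of G → A excess" without defining them. One simple way such a measure could be computed, assuming each read is already aligned to a reference of the same length, is to compare the number of G → A substitutions with all other substitutions. The function names and the toy sequences below are hypothetical, and this sketch is not the authors' classification model.

```python
def substitution_counts(reference, read):
    """Count G->A and all other substitutions between two aligned sequences."""
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned to the same length")
    g_to_a = 0
    other = 0
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base == read_base or "-" in (ref_base, read_base):
            continue  # skip matching positions and alignment gaps
        if ref_base == "G" and read_base == "A":
            g_to_a += 1
        else:
            other += 1
    return g_to_a, other


def g_to_a_fraction(reference, read):
    """Fraction of all substitutions in the read that are G->A (0.0 if none)."""
    g_to_a, other = substitution_counts(reference, read)
    total = g_to_a + other
    return g_to_a / total if total else 0.0


# Hypothetical aligned toy sequences: three G->A changes and one G->C change.
reference = "GGATGGCAGTGG"
read = "AGATAGCAATGC"
counts = substitution_counts(reference, read)
fraction = g_to_a_fraction(reference, read)
```

A read whose substitutions are dominated by G → A changes would score close to 1 on this fraction. A real classifier would also need to account for sequencing error and base composition, and the abstract additionally uses predicted indicators of lethal mutation, which this sketch does not model.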
Vinyl chloride is a widespread groundwater pollutant and Group 1 carcinogen. A previous comparative genomic analysis revealed that the vinyl chloride reductase operon, vcrABC, of Dehalococcoides sp. strain VS is embedded in a horizontally-acquired genomic island that integrated at the single-copy tmRNA gene, ssrA.
We targeted conserved positions in available genomic islands to amplify and sequence four additional vcrABC-containing genomic islands from previously-unsequenced vinyl chloride respiring Dehalococcoides enrichments. We identified a total of 31 ssrA-specific genomic islands from Dehalococcoides genomic data, accounting for 47 reductive dehalogenase homologous genes and many other non-core genes. Sixteen of these genomic islands contain a syntenic module of integration-associated genes located adjacent to the predicted site of integration, and among these islands, eight contain vcrABC as genetic 'cargo'. These eight vcrABC-containing genomic islands are syntenic across their ~12 kbp length, but have two phylogenetically discordant segments that unambiguously differentiate the integration module from the vcrABC cargo. Using available Dehalococcoides phylogenomic data we estimate that these ssrA-specific genomic islands are at least as old as the Dehalococcoides group itself, which in turn is much older than human civilization.
The vcrABC-containing genomic islands are a recently-acquired subset of a diverse collection of ssrA-specific mobile elements that are a major contributor to strain-level diversity in Dehalococcoides, and may have been throughout its evolution. The high similarity between vcrABC sequences is quantitatively consistent with recent horizontal acquisition driven by ~100 years of industrial pollution with chlorinated ethenes.
Supervised learning can be used to segment/identify regions of interest in images using both color and morphological information. A novel object identification algorithm was developed in Java to locate immune and cancer cells in images of immunohistochemically-stained lymph node tissue from a recent study published by Kohrt et al. (2005). The algorithms are also showing promise in other domains. The success of the method depends heavily on the use of color, the relative homogeneity of object appearance and on interactivity. As is often the case in segmentation, an algorithm specifically tailored to the application works better than using broader methods that work passably well on any problem. Our main innovation is the interactive feature extraction from color images. We also enable the user to improve the classification with an interactive visualization system. This is then coupled with the statistical learning algorithms and intensive feedback from the user over many classification-correction iterations, resulting in a highly accurate and user-friendly solution. The system ultimately provides the locations of every cell recognized in the entire tissue in a text file tailored to be easily imported into R (Ihaka and Gentleman 1996; R Development Core Team 2009) for further statistical analyses. This data is invaluable in the study of spatial and multidimensional relationships between cell populations and tumor structure. This system is available at http://www.GemIdent.com/ together with three demonstration videos and a manual.
interactive boosting; cell recognition; image segmentation; Java
Social insects exhibit coordinated behaviour without central control. Local interactions among individuals determine their behaviour and regulate the activity of the colony. Harvester ants are recruited for outside work, using networks of brief antennal contacts, in the nest chamber closest to the nest exit: the entrance chamber. Here, we combine empirical observations, image analysis and computer simulations to investigate the structure and function of the interaction network in the entrance chamber. Ant interactions were distributed heterogeneously in the chamber, with an interaction hot-spot at the entrance leading further into the nest. The distribution of the total interactions per ant followed a right-skewed distribution, indicating the presence of highly connected individuals. Numbers of ant encounters observed positively correlated with the duration of observation. Individuals varied in interaction frequency, even after accounting for the duration of observation. An ant's interaction frequency was explained by its path shape and location within the entrance chamber. Computer simulations demonstrate that variation among individuals in connectivity accelerates information flow to an extent equivalent to an increase in the total number of interactions. Individual variation in connectivity, arising from variation among ants in location and spatial behaviour, creates interaction centres, which may expedite information flow.
agent-based model; movement pattern; network analysis; Pogonomyrmex barbatus; spatial behaviour; weighted degree
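The per-ant interaction totals described in the abstract above correspond to the weighted degree of each ant in the contact network. A minimal sketch of that calculation follows, together with a rate that accounts for observation duration. The contact list, ant labels, and observation times are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter


def interactions_per_ant(contacts):
    """Total number of contacts each ant took part in (its weighted degree)."""
    degree = Counter()
    for ant_a, ant_b in contacts:
        # Every contact counts once for each of the two ants involved,
        # so repeated contacts between the same pair add to the weight.
        degree[ant_a] += 1
        degree[ant_b] += 1
    return degree


def interaction_rate(degree, seconds_observed):
    """Contacts per second, so ants observed for different durations are comparable."""
    return {ant: degree[ant] / seconds_observed[ant] for ant in seconds_observed}


# Hypothetical contact events between four ants in the entrance chamber.
contacts = [("a", "b"), ("a", "c"), ("a", "b"), ("c", "d"), ("a", "d")]
seconds_observed = {"a": 40.0, "b": 20.0, "c": 20.0, "d": 10.0}

degree = interactions_per_ant(contacts)
rates = interaction_rate(degree, seconds_observed)
```

In this toy data ant "a" has the most contacts in total, but ant "d" has the highest rate once observation time is taken into account, which is why the abstract distinguishes raw encounter counts from interaction frequency.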
We study the limit theory of large threshold graphs and apply this to a variety of models for random threshold graphs. The results give a nice set of examples for the emerging theory of graph limits.
To date, pathological examination of specimens remains largely qualitative. Quantitative measures of tissue spatial features are generally not captured. To gain additional mechanistic and prognostic insights, a need for quantitative architectural analysis arises in studying immune cell-cancer interactions within the tumor microenvironment and tumor-draining lymph nodes (TDLNs).
We present a novel, quantitative image analysis approach incorporating 1) multi-color tissue staining, 2) high-resolution, automated whole-section imaging, 3) custom image analysis software that identifies cell types and locations, and 4) spatial statistical analysis. As a proof of concept, we applied this approach to study the architectural patterns of T and B cells within tumor-draining lymph nodes from breast cancer patients versus healthy lymph nodes. We found that the spatial grouping patterns of T and B cells differed between healthy and breast cancer lymph nodes, and this could be attributed to the lack of B cell localization in the extrafollicular region of the TDLNs.
Our integrative approach has made quantitative analysis of complex visual data possible. Our results highlight spatial alterations of immune cells within lymph nodes from breast cancer patients as an independent variable from numerical changes. This opens up new areas of investigations in research and medicine. Future application of this approach will lead to a better understanding of immune changes in the tumor microenvironment and TDLNs, and how they affect clinical outcomes.
We characterized pairwise and higher order patterns of non-nucleoside reverse transcriptase inhibitor (NNRTI)-selected mutations because multiple mutations are usually required for clinically significant resistance to second-generation NNRTIs.
Patients and methods
We analysed viruses from 13 039 individuals with sequences containing at least one of 52 published NNRTI-selected mutations, including 1133 viruses from individuals who received efavirenz but no other NNRTI and 1510 viruses from individuals who received nevirapine but no other NNRTI. Of the 17 reported etravirine resistance-associated mutations (RAMs), Y181C/I/V, L100I, K101P and M230L were considered major based on published in vitro susceptibility data.
Efavirenz preferentially selected for 16 mutations, including L100I (14% versus 0.1%, P < 0.001), K101P (3.3% versus 0.4%, P < 0.001) and M230L (2.8% versus 1.3%, P = 0.004), whereas nevirapine preferentially selected for 12 mutations, including Y181C/I/V (48% versus 6.9%, P < 0.001). Twenty-nine pairs of NNRTI-selected mutations covaried significantly, including Y181C with seven other mutations (A98G, K101E/H, V108I, G190A/S and H221Y), L100I with K103N, and K101P with K103S. Two pairs (Y181C + V179F and Y181C + G190S) were predicted to confer >10-fold decreased etravirine susceptibility. Seventeen percent of sequences had three or more NNRTI-selected mutations, mostly in clusters of covarying mutations. Many clusters had Y181C plus a non-major etravirine RAM; few had more than one major etravirine RAM.
Although major etravirine RAMs rarely occur in combination, 2 of 29 pairs of covarying mutations were associated with >10-fold decreased etravirine susceptibility. Viruses with three or more NNRTI-selected mutations often contained Y181C in combination with one or more minor etravirine RAMs; however, phenotypic and clinical correlates for most of these higher order combinations have not been published.
Multidrug resistance; etravirine; antiviral therapy
The spectrum of human immunodeficiency virus type 1 (HIV-1) protease and reverse transcriptase (RT) mutations selected by antiretroviral (ARV) drugs requires ongoing reassessment as ARV treatment patterns evolve and increasing numbers of protease and RT sequences of different viral subtypes are published. Accordingly, we compared the prevalences of protease and RT mutations in HIV-1 group M sequences from individuals with and without a history of previous treatment with protease inhibitors (PIs) or RT inhibitors (RTIs). Mutations in protease sequences from 26,888 individuals and in RT sequences from 25,695 individuals were classified according to whether they were nonpolymorphic in untreated individuals and whether their prevalence increased fivefold with ARV therapy. This analysis showed that 88 PI-selected and 122 RTI-selected nonpolymorphic mutations had a prevalence that was fivefold higher in individuals receiving ARVs than in ARV-naïve individuals. This was an increase of 47% and 77%, respectively, compared with the 60 PI- and 69 RTI-selected mutations identified in a similar analysis that we published in 2005 using subtype B sequences obtained from one-fourth as many individuals. In conclusion, many nonpolymorphic mutations in protease and RT are under ARV selection pressure. The spectrum of treatment-selected mutations is changing as data for more individuals are collected, treatment exposures change, and the number of available sequences from non-subtype B viruses increases.
Vinyl chloride (VC) is a human carcinogen and widespread priority pollutant. Here we report the first, to our knowledge, complete genome sequences of microorganisms able to respire VC, Dehalococcoides sp. strains VS and BAV1. Notably, the respective VC reductase encoding genes, vcrAB and bvcAB, were found embedded in distinct genomic islands (GEIs) with different predicted integration sites, suggesting that these genes were acquired horizontally and independently by distinct mechanisms. A comparative analysis that included two previously sequenced Dehalococcoides genomes revealed a contextually conserved core that is interrupted by two high plasticity regions (HPRs) near the Ori. These HPRs contain the majority of GEIs and strain-specific genes identified in the four Dehalococcoides genomes, an elevated number of repeated elements including insertion sequences (IS), as well as 91 of 96 rdhAB, genes that putatively encode terminal reductases in organohalide respiration. Only three core rdhA orthologous groups were identified, and only one of these groups is supported by synteny. The low number of core rdhAB, contrasted with the high rdhAB numbers per genome (up to 36 in strain VS), as well as their colocalization with GEIs and other signatures for horizontal transfer, suggests that niche adaptation via organohalide respiration is a fundamental ecological strategy in Dehalococcoides. This adaptation has been exacted through multiple mechanisms of recombination that are mainly confined within HPRs of an otherwise remarkably stable, syntenic, streamlined genome among the smallest of any free-living microorganism.
Dehalococcoides are free-living sediment and subsurface bacteria with remarkably small, streamlined genomes and an unusual degree of niche specialization. These strictly anaerobic bacteria gain metabolic energy exclusively through a novel type of respiration that results in reductive elimination of chlorides from organochlorines, many of which are priority pollutants. In this article, we compare the first complete genome sequences of Dehalococcoides strains that grow via respiration of vinyl chloride (VC), a human carcinogen and abundant groundwater pollutant. Our work provides novel insights into Dehalococcoides chromosome organization and evolution, identifies specific positions in the chromosomes where new genes (like the genes responsible for growth on VC) are integrated, and offers clues as to how these dechlorinating bacteria adapt to anthropogenic contamination. This information sheds new light on Dehalococcoides biology and ecology, with implications for enhanced bioremediation to protect dwindling drinking water reservoirs.
T215 revertant mutations such as T215C/D/E/S that evolve from the nucleoside reverse transcriptase (RT) inhibitor mutations T215Y/F have been found in about 3% of human immunodeficiency virus type 1 (HIV-1) isolates from newly diagnosed HIV-1-infected persons. We used a newly developed sequencing method—ultradeep pyrosequencing (UDPS; 454 Life Sciences)—to determine the frequency with which T215Y/F or other RT inhibitor resistance mutations could be detected as minority variants in samples from untreated persons that contain T215 revertants (“revertant” samples) compared with samples from untreated persons that lack such revertants (“control” samples). Among the 22 revertant and 29 control samples, UDPS detected a mean of 3.8 and 4.8 additional RT amino acid mutations, respectively. In 6 of 22 (27%) revertant samples and in 4 of 29 control samples (14%; P = 0.4), UDPS detected one or more RT inhibitor resistance mutations. T215Y or T215F was not detected in any of the revertant or control samples; however, 4 of 22 revertant samples had one or more T215 revertants that were detected by UDPS but not by direct PCR sequencing. The failure to detect viruses with T215Y/F in the 22 revertant samples in this study may result from the overwhelming replacement of transmitted T215Y variants by the more fit T215 revertants or from the primary transmission of a T215 revertant in a subset of persons with T215 revertants.
HIV-1 integrase is the third enzymatic target of antiretroviral (ARV) therapy. However, few data have been published on the distribution of naturally occurring amino acid variation in this enzyme. We therefore characterized the distribution of integrase variants among more than 1,800 published group M HIV-1 isolates from more than 1,500 integrase inhibitor (INI)-naïve individuals. Polymorphism rates equal to or above 0.5% were found for 34% of the central core domain positions, 42% of the C-terminal domain positions, and 50% of the N-terminal domain positions. Among 727 ARV-naïve individuals in whom the complete pol gene was sequenced, integrase displayed significantly decreased inter- and intra-subtype diversity and a lower Shannon's entropy than protease or RT. All primary INI-resistance mutations with the exception of E157Q (which was present in 1.1% of sequences) were nonpolymorphic. Several accessory INI-resistance mutations including L74M, T97A, V151I, G163R, and S230N were also polymorphic, with polymorphism rates ranging from 0.5% to 2.0%.
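The two per-position quantities mentioned in this abstract, Shannon's entropy and the polymorphism rate, can be illustrated with a minimal Python sketch. It makes several assumptions not stated in the abstract: each alignment column is given as a list of one-letter residues, entropy is measured in bits (log base 2), and a position is called polymorphic when any single non-consensus residue reaches the 0.5% cutoff. The function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(residues):
    """Shannon entropy, in bits, of the residues observed at one position."""
    counts = Counter(residues)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over observed residues, with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def variant_rates(residues):
    """Frequency of each residue other than the most common (consensus) one."""
    counts = Counter(residues)
    total = sum(counts.values())
    consensus = counts.most_common(1)[0][0]
    return {aa: c / total for aa, c in counts.items() if aa != consensus}


def is_polymorphic(residues, cutoff=0.005):
    """True if any non-consensus residue occurs at a rate >= cutoff (assumed rule)."""
    return any(rate >= cutoff for rate in variant_rates(residues).values())
```

A fully conserved column has an entropy of zero, and a column split evenly between two residues has an entropy of one bit. Averaging the per-position values over a gene would give one way of comparing integrase with protease or RT, although the abstract does not say how the comparison was made.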
The purpose of this article is to catalogue in a systematic way the available information about factors that may influence the outcome and variability of cascade impactor (CI) measurements of pharmaceutical aerosols for inhalation, such as those obtained from metered dose inhalers (MDIs), dry powder inhalers (DPIs) or products for nebulization, and to suggest ways to minimize the influence of such factors. To accomplish this task, the authors constructed a cause-and-effect Ishikawa diagram for a CI measurement and considered the influence of each root cause based on industry experience and a thorough literature review. The results illustrate the intricate network of underlying causes of CI variability, with the potential for several multi-way statistical interactions. It was also found that significantly more quantitative information exists about impactor-related causes than about operator-derived influences, the contribution of drug assay methodology, and product-related causes, suggesting a need for further research in those areas. The understanding and awareness of all these factors should aid in the development of optimized CI methods and appropriate quality control measures for aerodynamic particle size distribution (APSD) of pharmaceutical aerosols, in line with the current regulatory initiatives involving quality-by-design (QbD).
aerosol; impactor; inhaler; nebulizer; variability