Related Articles

1.  Optimizing transformations for automated, high throughput analysis of flow cytometry data 
BMC Bioinformatics  2010;11:546.
In a high throughput setting, effective flow cytometry data analysis depends heavily on proper data preprocessing. While usual preprocessing steps of quality assessment, outlier removal, normalization, and gating have received considerable scrutiny from the community, the influence of data transformation on the output of high throughput analysis has been largely overlooked. Flow cytometry measurements can vary over several orders of magnitude, cell populations can have variances that depend on their mean fluorescence intensities, and may exhibit heavily-skewed distributions. Consequently, the choice of data transformation can influence the output of automated gating. An appropriate data transformation aids in data visualization and gating of cell populations across the range of data. Experience shows that the choice of transformation is data specific. Our goal here is to compare the performance of different transformations applied to flow cytometry data in the context of automated gating in a high throughput, fully automated setting. We examine the most common transformations used in flow cytometry, including the generalized hyperbolic arcsine, biexponential, linlog, and generalized Box-Cox, all within the BioConductor flowCore framework that is widely used in high throughput, automated flow cytometry data analysis. All of these transformations have adjustable parameters whose effects upon the data are non-intuitive for most users. By making some modelling assumptions about the transformed data, we develop maximum likelihood criteria to optimize parameter choice for these different transformations.
We compare the performance of parameter-optimized and default-parameter (in flowCore) data transformations on real and simulated data by measuring the variation in the locations of cell populations across samples, discovered via automated gating in both the scatter and fluorescence channels. We find that parameter-optimized transformations improve visualization, reduce variability in the location of discovered cell populations across samples, and decrease the misclassification (mis-gating) of individual events when compared to default-parameter counterparts.
Our results indicate that the preferred transformation for fluorescence channels is a parameter-optimized biexponential or generalized Box-Cox, in accordance with current best practices. Interestingly, for populations in the scatter channels, we find that the optimized hyperbolic arcsine may be a better choice in a high-throughput setting than the current standard practice of no transformation. However, generally speaking, the choice of transformation remains data-dependent. We have implemented our algorithm in the BioConductor package, flowTrans, which is publicly available.
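A minimal sketch of the maximum-likelihood idea, assuming that the transformed data form a single normal population and using the parameterization asinh(a + b·x) + c. The function names and the grid are hypothetical; this is not the flowTrans API or its exact criterion. The log-Jacobian term is what makes likelihoods comparable across different values of b, so stretching the scale is not rewarded for free.

```python
import math


def asinh_transform(x, a, b, c=0.0):
    """Generalized hyperbolic arcsine: asinh(a + b*x) + c (c only shifts)."""
    return math.asinh(a + b * x) + c


def profile_loglik(data, a, b):
    """Normal log-likelihood of the raw data under y = asinh(a + b*x).

    Assumes the transformed values follow one normal distribution whose mean
    and variance are set to their maximum-likelihood estimates. The term
    log|dy/dx| = log(b) - 0.5*log(1 + (a + b*x)^2) is the log-Jacobian.
    """
    n = len(data)
    y = [asinh_transform(x, a, b) for x in data]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    log_jac = sum(math.log(b) - 0.5 * math.log(1.0 + (a + b * x) ** 2) for x in data)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0) + log_jac


def optimize_b(data, a=0.0, grid=None):
    """Grid search (hypothetical helper) for the b maximizing the likelihood."""
    if grid is None:
        grid = [10.0 ** (k / 4.0) for k in range(-16, 5)]  # 1e-4 ... 10
    return max(grid, key=lambda b: profile_loglik(data, a, b))
```

A finer grid or a one-dimensional optimizer could replace the grid search, and a can be searched the same way.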
PMCID: PMC3243046  PMID: 21050468
2.  QUAliFiER: An automated pipeline for quality assessment of gated flow cytometry data 
BMC Bioinformatics  2012;13:252.
Effective quality assessment is an important part of any high-throughput flow cytometry data analysis pipeline, especially when considering the complex designs of the typical flow experiments applied in clinical trials. Technical issues like instrument variation, problematic antibody staining, or reagent lot changes can lead to biases in the extracted cell subpopulation statistics. These biases can manifest themselves in non-obvious ways that can be difficult to detect without leveraging information about the study design or other experimental metadata. Consequently, a systematic and integrated approach to quality assessment of flow cytometry data is necessary to effectively identify technical errors that impact multiple samples over time. Gated cell populations and their statistics must be monitored within the context of the experimental run, assay, and the overall study.
We have developed two new packages, flowWorkspace and QUAliFiER, to construct a pipeline for quality assessment of gated flow cytometry data. flowWorkspace makes manually gated data accessible to BioConductor’s computational flow tools by importing pre-processed and gated data from the widely used manual gating tool, FlowJo (Tree Star Inc, Ashland OR). The QUAliFiER package takes advantage of the manual gates to perform an extensive series of statistical quality assessment checks on the gated cell sub-populations while taking into account the structure of the data and the study design to monitor the consistency of population statistics across staining panels, subjects, aliquots, channels, or other experimental variables. QUAliFiER implements SVG-based interactive visualization methods, allowing investigators to examine quality assessment results across different views of the data, and it has a flexible interface allowing users to tailor quality checks and outlier detection routines to suit their data analysis needs.
We present a pipeline constructed from two new R packages for importing manually gated flow cytometry data and performing flexible and robust quality assessment checks. The pipeline addresses the increasing demand for tools capable of performing quality checks on large flow data sets generated in typical clinical trials. The QUAliFiER tool objectively, efficiently, and reproducibly identifies outlier samples in an automated manner by monitoring cell population statistics from gated or ungated flow data conditioned on experiment-level metadata.
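One kind of metadata-conditioned check described above can be sketched as follows: compare each sample's population proportion with the median of comparable samples (here, the same staining panel) using a median/MAD robust z-score. The field names, the grouping key, and the 3.5 cutoff are assumptions for illustration; this is not the QUAliFiER interface.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(records, group_key="panel", value_key="proportion", cutoff=3.5):
    """Return sample IDs whose statistic is a robust outlier within its group.

    records: list of dicts with a 'sample' ID, a grouping field, and a value.
    Uses the modified z-score 0.6745 * (x - median) / MAD within each group.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    flagged = []
    for recs in groups.values():
        values = [r[value_key] for r in recs]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread in this group: robust z-score is undefined
        for r in recs:
            z = 0.6745 * (r[value_key] - med) / mad
            if abs(z) > cutoff:
                flagged.append(r["sample"])
    return flagged
```

Grouping by a different metadata field (subject, aliquot, channel) only changes `group_key`.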
PMCID: PMC3499158  PMID: 23020243
Flow cytometry; Quality assessment; BioConductor package
3.  Automated in-silico detection of cell populations in flow cytometry readouts and its application to leukemia disease monitoring 
BMC Bioinformatics  2006;7:282.
Identification of minor cell populations, e.g. leukemic blasts within blood samples, has become increasingly important in therapeutic disease monitoring. Modern flow cytometers enable researchers to reliably measure six or more variables, describing cellular size, granularity and expression of cell-surface and intracellular proteins, for thousands of cells per second. Currently, analysis of cytometry readouts relies on visual inspection and manual gating of one- or two-dimensional projections of the data. This procedure, however, is labor-intensive and misses potential characteristic patterns in higher dimensions.
Leukemic samples from patients with acute lymphoblastic leukemia at initial diagnosis and during induction therapy have been investigated by 4-color flow cytometry. We have utilized multivariate classification techniques, Support Vector Machines (SVM), to automate leukemic cell detection in cytometry. Classifiers were built on conventionally diagnosed training data. We assessed the detection accuracy on independent test data and analyzed marker expression of incongruently classified cells. SVM classification can recover manually gated leukemic cells with 99.78% sensitivity and 98.87% specificity.
Multivariate classification techniques allow for automating cell population detection in cytometry readouts for diagnostic purposes. They potentially reduce time, costs and arbitrariness associated with these procedures. Due to their multivariate classification rules, they also allow for the reliable detection of small cell populations.
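The sensitivity and specificity reported above compare classifier calls with manual gating taken as the reference. A minimal sketch of that bookkeeping, with toy labels (1 = leukemic, 0 = normal) that are not data from the study:

```python
def sensitivity_specificity(manual, predicted):
    """Sensitivity and specificity of predicted labels against manual gating.

    manual, predicted: sequences of 0/1 labels (1 = leukemic cell).
    """
    pairs = list(zip(manual, predicted))
    tp = sum(1 for m, p in pairs if m == 1 and p == 1)  # leukemic, called leukemic
    fn = sum(1 for m, p in pairs if m == 1 and p == 0)  # leukemic, missed
    tn = sum(1 for m, p in pairs if m == 0 and p == 0)  # normal, called normal
    fp = sum(1 for m, p in pairs if m == 0 and p == 1)  # normal, called leukemic
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 3 of 4 leukemic cells and 4 of 5 normal cells recovered.
sens, spec = sensitivity_specificity([1, 1, 1, 1, 0, 0, 0, 0, 0],
                                     [1, 1, 1, 0, 0, 0, 0, 0, 1])
```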
PMCID: PMC1501051  PMID: 16753055
4.  Flow Cytometry Bioinformatics 
PLoS Computational Biology  2013;9(12):e1003365.
Flow cytometry bioinformatics is the application of bioinformatics to flow cytometry data, which involves storing, retrieving, organizing, and analyzing flow cytometry data using extensive computational resources and tools. Flow cytometry bioinformatics requires extensive use of and contributes to the development of techniques from computational statistics and machine learning. Flow cytometry and related methods allow the quantification of multiple independent biomarkers on large numbers of single cells. The rapid growth in the multidimensionality and throughput of flow cytometry data, particularly in the 2000s, has led to the creation of a variety of computational analysis methods, data standards, and public databases for the sharing of results. Computational methods exist to assist in the preprocessing of flow cytometry data, identifying cell populations within it, matching those cell populations across samples, and performing diagnosis and discovery using the results of previous steps. For preprocessing, this includes compensating for spectral overlap, transforming data onto scales conducive to visualization and analysis, assessing data for quality, and normalizing data across samples and experiments. For population identification, tools are available to aid traditional manual identification of populations in two-dimensional scatter plots (gating), to use dimensionality reduction to aid gating, and to find populations automatically in higher dimensional space in a variety of ways. It is also possible to characterize data in more comprehensive ways, such as the density-guided binary space partitioning technique known as probability binning, or by combinatorial gating. Finally, diagnosis using flow cytometry data can be aided by supervised learning techniques, and discovery of new cell types of biological importance by high-throughput statistical methods, as part of pipelines incorporating all of the aforementioned methods.
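Probability binning, mentioned above, can be sketched as recursive splitting: each bin is cut at the median of the dimension with the largest variance until every bin holds at most a chosen number of events, so bins are small where events are dense. This is a simplified illustration under those assumptions; published implementations also define a statistic for comparing binned samples, which is omitted here.

```python
from statistics import median, pvariance


def probability_bins(events, max_events):
    """Recursively split events (tuples of floats) into bins of <= max_events.

    Each split cuts at the median of the dimension with the largest variance,
    so the two children receive (nearly) equal numbers of events.
    """
    if len(events) <= max_events:
        return [events]
    ndim = len(events[0])
    dim = max(range(ndim), key=lambda d: pvariance([e[d] for e in events]))
    cut = median([e[dim] for e in events])
    left = [e for e in events if e[dim] < cut]
    right = [e for e in events if e[dim] >= cut]
    if not left or not right:  # all values tied at the median: stop splitting
        return [events]
    return probability_bins(left, max_events) + probability_bins(right, max_events)
```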
Open standards, data, and software are also key parts of flow cytometry bioinformatics. Data standards include the widely adopted Flow Cytometry Standard (FCS) defining how data from cytometers should be stored, but also several new standards under development by the International Society for Advancement of Cytometry (ISAC) to aid in storing more detailed information about experimental design and analytical steps. Open data is slowly growing with the opening of the CytoBank database in 2010 and FlowRepository in 2012, both of which allow users to freely distribute their data, and the latter of which has been recommended as the preferred repository for MIFlowCyt-compliant data by ISAC. Open software is most widely available in the form of a suite of Bioconductor packages, but is also available for web execution on the GenePattern platform.
PMCID: PMC3867282  PMID: 24363631
5.  CCAST: A Model-Based Gating Strategy to Isolate Homogeneous Subpopulations in a Heterogeneous Population of Single Cells 
PLoS Computational Biology  2014;10(7):e1003664.
A model-based gating strategy is developed for sorting cells and analyzing populations of single cells. The strategy, named CCAST (Clustering, Classification and Sorting Tree), identifies a gating strategy for isolating homogeneous subpopulations from a heterogeneous population of single cells using a data-derived decision tree representation that can be applied to cell sorting. Because CCAST does not rely on expert knowledge, it removes human bias and variability when determining the gating strategy. It combines any clustering algorithm with silhouette measures to identify underlying homogeneous subpopulations, then applies recursive partitioning techniques to generate a decision tree that defines the gating strategy. CCAST produces an optimal strategy for cell sorting by automating the selection of gating markers, the corresponding gating thresholds, and the gating sequence; all of these parameters are typically defined manually. Even though CCAST is optimized for cell sorting, it can be applied to the identification and analysis of homogeneous subpopulations among heterogeneous single cell data. We apply CCAST to single cell data from both breast cancer cell lines and normal human bone marrow. On the SUM159 breast cancer cell line data, CCAST indicates at least five distinct cell states based on two surface markers (CD24 and EPCAM) and provides a gating strategy for sorting that produces more homogeneous subpopulations than previously reported. When applied to normal bone marrow data, CCAST reveals an efficient strategy for gating T-cells without prior knowledge of the major T-cell subtypes and the markers that best define them. On the normal bone marrow data, CCAST also reveals two major mature B-cell subtypes, namely CD123+ and CD123- cells, which were not revealed by manual gating but show distinct intracellular signaling responses.
More generally, the CCAST framework could be used on other biological and non-biological high dimensional data types that are mixtures of unknown homogeneous subpopulations.
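The recursive-partitioning step chooses, at each node, a marker and a threshold that best separate previously found clusters. A minimal sketch of a single such split, assuming cluster labels are already available and using Gini impurity as the split criterion (a common decision-tree choice; the criterion CCAST itself uses may differ). Marker values in the usage example are toy numbers.

```python
def gini(labels):
    """Gini impurity of a list of cluster labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_gate(cells, labels, markers):
    """Find the (marker, threshold) split minimizing weighted Gini impurity.

    cells: list of dicts mapping marker name -> expression value.
    labels: cluster label for each cell.
    Returns (marker, threshold, impurity); cells with value <= threshold go left.
    """
    n = len(cells)
    best = (None, None, float("inf"))
    for m in markers:
        values = sorted(set(c[m] for c in cells))
        # candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for c, lab in zip(cells, labels) if c[m] <= t]
            right = [lab for c, lab in zip(cells, labels) if c[m] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (m, t, score)
    return best
```

Applying `best_gate` again to each side of the chosen split, until nodes are pure, yields a gating tree.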
Author Summary
Sorting out homogeneous subpopulations in a heterogeneous population of single cells enables downstream characterization of specific cell types, such as cell-type specific genomic profiling. This study proposes a data-driven gating strategy, CCAST, for sorting out homogeneous subpopulations from a heterogeneous population of single cells without relying on expert knowledge, thereby removing human bias and variability. In a fully automated manner, CCAST identifies the relevant gating markers, gating hierarchy and partitions that isolate homogeneous cell subpopulations. CCAST is optimized for cell sorting but can be applied to the identification and analysis of homogeneous subpopulations. CCAST is shown to identify more homogeneous breast cancer subpopulations in SUM159 than prior sorting strategies. When applied to normal bone marrow single cell data, CCAST proposes an efficient strategy for gating out T-cells without relying on expert knowledge; on B-cells, it reveals a new characterization of mature B-cell subtypes not revealed by manual gating.
PMCID: PMC4117418  PMID: 25078380
6.  A model of yeast cell-cycle regulation based on multisite phosphorylation 
Multisite phosphorylation of CDK target proteins provides the requisite nonlinearity for cell cycle modeling using elementary reaction mechanisms. Stochastic simulations, based on Gillespie's algorithm and using realistic numbers of protein and mRNA molecules, compare favorably with single-cell measurements in budding yeast. The role of transcription–translation coupling is critical in the robust operation of protein regulatory networks in yeast cells.
Progression through the eukaryotic cell cycle is governed by the activation and inactivation of a family of cyclin-dependent kinases (CDKs) and auxiliary proteins that regulate CDK activities (Morgan, 2007). The many components of this protein regulatory network are interconnected by positive and negative feedback loops that create bistable switches and transient pulses (Tyson and Novak, 2008). The network must ensure that cell-cycle events proceed in the correct order, that cell division is balanced with respect to cell growth, and that any problems encountered (in replicating the genome or partitioning chromosomes to daughter cells) are corrected before the cell proceeds to the next phase of the cycle. The network must operate robustly in the context of unavoidable molecular fluctuations in a yeast-sized cell. With a volume of only 5 × 10⁻¹⁴ L, a yeast cell contains one copy of the gene for each component of the network, a handful of mRNA transcripts of each gene, and a few hundred to a few thousand protein molecules carrying out each gene's function. How large are the molecular fluctuations implied by these numbers, and what effects do they have on the functioning of the cell-cycle control system?
To answer these questions, we have built a new model (Figure 1) of the CDK regulatory network in budding yeast, based on the fact that the targets of CDK activity are typically phosphorylated on multiple sites. The activity of each target protein depends on how many sites are phosphorylated. The target proteins feed back on CDK activity by controlling cyclin synthesis (SBF's role) and degradation (Cdh1's role) and by releasing a CDK-counteracting phosphatase (Cdc14). Every reaction in Figure 1 can be described by a mass-action rate law, with an accompanying rate constant that must be estimated from experimental data. As the transcription and translation of mRNA molecules have major effects on fluctuating numbers of protein molecules (Pedraza and Paulsson, 2008), we have included mRNA transcripts for each protein in the model.
To create a deterministic model, the rate laws are combined, according to standard principles of chemical kinetics, into a set of 60 differential equations that govern the temporal dynamics of the control system. In the stochastic version of the model, the rate law for each reaction determines the probability per unit time that a particular reaction occurs, and we use Gillespie's stochastic simulation algorithm (Gillespie, 1976) to compute possible temporal sequences of reaction events. Accurate stochastic simulations require knowledge of the expected numbers of mRNA and protein molecules in a single yeast cell. Fortunately, these numbers are available from several sources (Ghaemmaghami et al, 2003; Zenklusen et al, 2008). Although the experimental estimates are not always in good agreement with each other, they are sufficiently reliable to populate a stochastic model with realistic numbers of molecules.
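Gillespie's direct method can be sketched for the simplest transcription–translation unit: mRNA is made at rate k_m and decays at rate g_m·M, while protein is made at rate k_p·M and decays at rate g_p·P. The four reactions and all rate constants are illustrative assumptions, not the 60-equation cell-cycle model.

```python
import math
import random


def gillespie_gene(k_m, g_m, k_p, g_p, t_end, seed=0):
    """Gillespie direct-method simulation of a one-gene mRNA/protein model.

    Returns a list of (time, mRNA count, protein count) after each event.
    """
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    trajectory = [(t, m, p)]
    while True:
        # propensities: transcription, mRNA decay, translation, protein decay
        a = [k_m, g_m * m, k_p * m, g_p * p]
        a0 = sum(a)
        # waiting time to the next reaction is exponential with rate a0
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # choose which reaction fires, with probability a_i / a0
        r = rng.random() * a0
        if r < a[0]:
            m += 1
        elif r < a[0] + a[1]:
            m -= 1
        elif r < a[0] + a[1] + a[2]:
            p += 1
        else:
            p -= 1
        trajectory.append((t, m, p))
    return trajectory
```

Repeating such runs with different seeds gives an ensemble of cells from which means and standard deviations can be estimated.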
By simulating thousands of cells (as in Figure 5), we can build up representative samples for computing the mean and s.d. of any measurable cell-cycle property (e.g. interdivision time, size at division, duration of G1 phase). The excellent fit of simulated statistics to observations of cell-cycle variability is documented in the main text and Supplementary Information.
Of particular interest to us are observations of Di Talia et al (2007) of the timing of a crucial G1 event (export of Whi5 protein from the nucleus) in a population of budding yeast cells growing at a specific growth rate α=ln2/(mass-doubling time). Whi5 export is a consequence of Whi5 phosphorylation, and it occurs simultaneously with the release (activation) of SBF (see Figure 1). Using fluorescently labeled Whi5, Di Talia et al could easily measure (in individual yeast cells) the time, T1, from cell birth to the abrupt loss of Whi5 from the nucleus. Correlating T1 to the size of the cell at birth, Vbirth, they found that, for a sample of daughter cells, αT1 versus ln(Vbirth) could be fit with two straight lines of slope −0.7 and −0.3. Our simulation of this experiment (Figure 7 of the main text) compares favorably with Figure 3d and e in Di Talia et al (2007).
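Fitting αT1 against ln(Vbirth) with two straight lines is a segmented regression. A minimal sketch, assuming ordinary least squares on each side of a breakpoint chosen to minimize the total squared error; Di Talia et al may have fitted their lines differently, and the test data below are synthetic.

```python
def ols(xs, ys):
    """Least-squares slope and intercept of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def sse(xs, ys, slope, intercept):
    """Sum of squared residuals around a fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def two_line_fit(xs, ys, min_points=3):
    """Split x-sorted data into two segments and fit a line to each.

    Returns (breakpoint_index, (slope1, icpt1), (slope2, icpt2)) for the split
    with the smallest total squared error.
    """
    pairs = sorted(zip(xs, ys))
    xs = [q[0] for q in pairs]
    ys = [q[1] for q in pairs]
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        left = ols(xs[:k], ys[:k])
        right = ols(xs[k:], ys[k:])
        err = sse(xs[:k], ys[:k], *left) + sse(xs[k:], ys[k:], *right)
        if best is None or err < best[0]:
            best = (err, k, left, right)
    return best[1], best[2], best[3]
```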
The major sources of noise in our model (and in protein regulatory networks in yeast cells in general) are related to gene transcription and the small number of unique mRNA transcripts. As each mRNA molecule may instruct the synthesis of dozens of protein molecules, the coefficient of variation of molecular fluctuations at the protein level (CV_P) may be dominated by fluctuations at the mRNA level, as expressed in the formula (Pedraza and Paulsson, 2008)

$$\mathrm{CV}_P^2 = \frac{1}{N_P} + \frac{1}{N_M}\,\frac{\rho}{1+\rho} \qquad (3)$$

where N_M and N_P denote the number of mRNA and protein molecules, respectively, and ρ = τ_M/τ_P is the ratio of half-lives of mRNA and protein molecules. For a yeast cell, typical values of N_M and N_P are 8 and 800, respectively (Ghaemmaghami et al, 2003; Zenklusen et al, 2008). If ρ = 1, then CV_P ≈ 25%. Such large fluctuations in protein levels are inconsistent with the observed variability of size and age at division in yeast cells, as shown in the simplified cell-cycle model of Kar et al (2009) and as we have confirmed with our more realistic model. The size of these fluctuations can be reduced to a more acceptable level by assuming a shorter half-life for mRNA (say, ρ = 0.1).
There must be some mechanism by which yeast cells lessen the protein fluctuations implied by transcription–translation coupling. Following Pedraza and Paulsson (2008), we suggest that mRNA gestation and senescence may resolve this problem. Equation (3) is based on a simple, one-stage, birth–death model of mRNA turnover. In Supplementary Appendix 1, we show that a model of mRNA processing, with 10 stages each of mRNA gestation and senescence, gives reasonable fluctuations at the protein level (CV_P ≈ 5%), even if the effective half-life of mRNA is 10 min. A one-stage model with τ_M = 1 min gives comparable fluctuations (CV_P ≈ 5%). In the main text, we use a simple birth–death model of mRNA turnover with an ‘effective’ half-life of 1 min, in order to limit the computational complexity of the full cell-cycle model.
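The protein-noise figures quoted above can be checked numerically, assuming the standard one-stage form of the transcription–translation noise formula, CV_P^2 = 1/N_P + (1/N_M)·ρ/(1 + ρ) with ρ = τ_M/τ_P. For N_M = 8 and N_P = 800 this gives about 25% for ρ = 1 and about 11% for ρ = 0.1, so shortening the mRNA half-life lowers protein noise.

```python
import math


def protein_cv(n_m, n_p, rho):
    """Coefficient of variation of protein number.

    Assumes CV_P^2 = 1/N_P + (1/N_M) * rho / (1 + rho), rho = tau_M / tau_P.
    """
    return math.sqrt(1.0 / n_p + (1.0 / n_m) * rho / (1.0 + rho))


for rho in (1.0, 0.1):
    print(f"rho = {rho}: CV_P = {protein_cv(8, 800, rho):.1%}")
```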
In order for the cell's genome to be passed intact from one generation to the next, the events of the cell cycle (DNA replication, mitosis, cell division) must be executed in the correct order, despite the considerable molecular noise inherent in any protein-based regulatory system residing in the small confines of a eukaryotic cell. To assess the effects of molecular fluctuations on cell-cycle progression in budding yeast cells, we have constructed a new model of the regulation of Cln- and Clb-dependent kinases, based on multisite phosphorylation of their target proteins and on positive and negative feedback loops involving the kinases themselves. To account for the significant role of noise in the transcription and translation steps of gene expression, the model includes mRNAs as well as proteins. The model equations are simulated deterministically and stochastically to reveal the bistable switching behavior on which proper cell-cycle progression depends and to show that this behavior is robust to the level of molecular noise expected in yeast-sized cells (∼50 fL volume). The model gives a quantitatively accurate account of the variability observed in the G1-S transition in budding yeast, which is governed by an underlying sizer+timer control system.
PMCID: PMC2947364  PMID: 20739927
bistability; cell-cycle variability; size control; stochastic model; transcription–translation coupling
7.  A flow cytometry-based workflow for detection and quantification of anti-plasmodial antibodies in vaccinated and naturally exposed individuals 
Malaria Journal  2012;11:367.
Antibodies play a central role in naturally acquired immunity against Plasmodium falciparum. Current assays to detect anti-plasmodial antibodies against native antigens within their cellular context are prone to bias and cannot be automated, although they provide important information about natural exposure and vaccine immunogenicity. A novel, cytometry-based workflow for quantitative detection of anti-plasmodial antibodies in human serum is presented.
Fixed red blood cells (RBCs) infected with late stages of P. falciparum were used to detect malaria-specific antibodies by flow cytometry, with subsequent automated data analysis. Available methods for data-driven analysis of cytometry data were assessed, and a new overlap subtraction algorithm (OSA) based on open source software was developed. The complete workflow was evaluated using sera from two GMZ2 malaria vaccine trials in semi-immune adults and pre-school children residing in a malaria-endemic area.
Fixation, permeabilization, and staining of infected RBCs were adapted for best operation in flow cytometry. As asexual blood-stage vaccine candidates are designed to induce antibody patterns similar to those in semi-immune adults, serial dilutions of sera from heavily exposed individuals were compared to naïve controls to determine optimal antibody dilutions. To eliminate investigator effects introduced by manual gating, a non-biased algorithm (OSA) for data-driven gating was developed. OSA-derived results correlated well with those obtained by manual gating (r between 0.79 and 0.99) and outperformed other model-driven gating methods. Bland-Altman plots confirmed the agreement of manual gating and OSA-derived results. In a subgroup of pre-school children vaccinated with 100 μg GMZ2, the number of positive cells increased 1.33-fold after vaccination (p=0.003); in vaccinated adults from the same region, we measured a baseline-corrected, vaccine-induced 1.23-fold increase in the mean fluorescence intensity of positive cells (p=0.03).
The current workflow advances the detection and quantification of anti-plasmodial antibodies by replacing a bias-prone, low-throughput method with an unbiased, semi-automated, scalable one. In conclusion, this work presents a novel method for immunofluorescence assays in malaria research.
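A Bland-Altman comparison of two methods reduces to the mean of the paired differences (the bias) and limits of agreement at bias ± 1.96 standard deviations of those differences. A minimal sketch with toy values, not data from the study:

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Differences are method_b - method_a; limits are bias +/- 1.96 * SD.
    """
    diffs = [b - a for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy paired results: manual gating vs. an automated gating method.
bias, lower, upper = bland_altman([10, 20, 30, 40], [11, 19, 32, 40])
```

A Bland-Altman plot then shows each pair's mean against its difference, with horizontal lines at the bias and the two limits.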
PMCID: PMC3545855  PMID: 23130649
Malaria; Flow cytometry-based IFA; Algorithmic data analysis; Anti-malarial antibodies; Human serum
8.  Automatic Clustering of Flow Cytometry Data with Density-Based Merging 
Advances in Bioinformatics  2009;2009:686759.
The ability of flow cytometry to allow fast single cell interrogation of a large number of cells has made this technology ubiquitous and indispensable in the clinical and laboratory setting. A current limit to the potential of this technology is the lack of automated tools for analyzing the resulting data. We describe methodology and software to automatically identify cell populations in flow cytometry data. Our approach replaces the paradigm of manually gating sequential two-dimensional projections of the data with a procedure that automatically produces gates based on statistical theory. Our approach is nonparametric and can reproduce nonconvex subpopulations that are known to occur in flow cytometry samples, but which cannot be produced with current parametric model-based approaches. We illustrate the methodology with a sample of mouse spleen and peritoneal cavity cells.
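One way to see how a nonparametric, density-based approach can find clusters without assuming their shape: bin the events, then let every bin climb to its highest neighbour until it reaches a local maximum; bins that reach the same maximum form one cluster. This one-dimensional sketch illustrates the general idea only and is not the authors' density-merging algorithm.

```python
def hill_climb_clusters(counts):
    """Assign each histogram bin to the local density maximum it climbs to.

    counts: list of event counts per bin (1-D histogram).
    Returns a list giving, for each bin, the index of its mode.
    """
    n = len(counts)

    def uphill(i):
        # move to the higher neighbour (if any); stay put at a local maximum
        best = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and counts[j] > counts[best]:
                best = j
        return best

    modes = []
    for i in range(n):
        j = i
        while uphill(j) != j:
            j = uphill(j)
        modes.append(j)
    return modes
```

Because clusters are defined by which peak a bin drains to, a cluster can take any shape the histogram allows, including nonconvex ones in higher dimensions.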
PMCID: PMC2801806  PMID: 20069107
9.  Multiparameter Flow Cytometry for the Diagnosis and Monitoring of Small GPI-deficient Cellular Populations 
Glycosyl-phosphatidylinositol (GPI)-negative blood cells are diagnostic for Paroxysmal Nocturnal Hemoglobinuria (PNH). Marrow failure states are often associated with GPI-negative cell populations. Quantification of small clonal populations of GPI-negative cells influences clinical decisions to administer immunosuppressive therapy in marrow failure states (aplastic anemia or myelodysplastic syndrome) and to monitor minimal residual disease after allogeneic blood or marrow transplantation (BMT). We studied the reliability of high-resolution flow cytometry markers operating at the limits of detection.
We performed serial quantification of the PNH clone size in 38 samples using multiparameter flow cytometry. Granulocytes, monocytes and RBCs were gated using forward and side scatter as well as lineage-specific markers. The GPI-linked markers fluorescent aerolysin (FLAER), CD55 and CD59 were comparatively evaluated. We also evaluated CD16 on granulocytes and CD14 on monocytes. The sensitivity of detection by each marker was further defined by serial dilution experiments on a flow-sorted sample. Two patients had quantification of their GPI-negative clones before and after allogeneic BMT.
FLAER was the most discriminant marker and allowed identification of 0.1% of GPI-negative cells despite other markers having superior signal-to-noise characteristics. CD14 and CD16 were inferior to CD55 at lower concentrations and in clinical application.
Multiparameter flow cytometry permits quantification of small GPI-negative clones with a sensitivity limit of about 0.1%. The single most reliable marker to monitor small granulocyte or monocyte PNH clones is FLAER, especially in conditions such as myelodysplastic syndromes or BMT, when traditional GPI-linked surface marker expression can be significantly altered.
PMCID: PMC2937167  PMID: 20533383
PNH; flow cytometry; FLAER; sensitivity; specificity; BMT
10.  Multispectral Imaging of Hematopoietic Cells: Where Flow Meets Morphology 
Journal of immunological methods  2008;336(2):91-97.
Normal and abnormal blood cells are typically analyzed by either histologic or flow cytometric approaches. Histology allows morphological examination of complex visual traits but with relatively limited numbers of cells. Flow cytometry can quantify multiple fluorescent parameters on millions of cells, but lacks morphological or sub-cellular spatial detail. In this review we present how a new flow technology, the ImageStream (Amnis Corporation, Seattle, WA), blends morphology and flow cytometry and can be used to analyze cell populations in ways not possible by standard histology or flow cytometry alone. The ImageStream captures brightfield, darkfield and multiple fluorescent images of individual cells in flow. The images can then be analyzed for levels of fluorescence intensity in multiple ways (e.g., maximum, minimum, or mean) as well as the shape and size of the area of fluorescence. Combinatorial measurements can also be defined to compare levels and spatial associations for multiple fluorescent channels. We demonstrate an application of this technology to distinguish six stages of erythroid maturation, which have been classically defined by morphological criteria, by measuring changes in Ter119 mean intensity and area, DNA (Draq5 stain) mean intensity and area, and RNA content (thiazole orange stain). Using this approach, we find that other characteristics of erythroid maturation, such as marker expression and nuclear offset, vary appropriately within the defined cell subsets. Finally, we show that additional measurements of cell characteristics not classically analyzed in cytometry, including surface unevenness and unusually high contrast in brightfield images, combined with fluorescent markers, allow complex discriminations of rare populations.
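The mean intensity and area of fluorescence used above can be illustrated for one fluorescence channel of one cell image: threshold the pixels, then report the number of pixels in the mask (area) and their mean intensity. The fixed threshold and the toy image are assumptions; the instrument's analysis software computes its own masks and features.

```python
def mask_features(image, threshold):
    """Area (pixel count) and mean intensity of pixels above a threshold.

    image: 2-D list of pixel intensities for one fluorescence channel.
    """
    masked = [v for row in image for v in row if v > threshold]
    area = len(masked)
    mean_intensity = sum(masked) / area if area else 0.0
    return area, mean_intensity


# Toy 3x3 image: three pixels exceed the threshold of 5.
area, mean_intensity = mask_features([[0, 0, 0], [0, 10, 20], [0, 30, 0]], 5)
```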
PMCID: PMC2529019  PMID: 18539294
Erythropoiesis; Multispectral Imaging; ImageStream
11.  A Numerical Approach to Ion Channel Modelling Using Whole-Cell Voltage-Clamp Recordings and a Genetic Algorithm 
PLoS Computational Biology  2007;3(8):e169.
The activity of trans-membrane proteins such as ion channels is the essence of neuronal transmission. Currently, the most accurate method for determining ion channel kinetic mechanisms is single-channel recording and analysis. Yet, the limitations and complexities in interpreting single-channel recordings discourage many physiologists from using them. Here we show that a genetic search algorithm in combination with a gradient descent algorithm can be used to fit whole-cell voltage-clamp data to kinetic models with a high degree of accuracy. Previously, ion channel stimulation traces were analyzed one at a time, the results of these analyses being combined to produce a picture of channel kinetics. Here the entire set of traces from all stimulation protocols is analysed simultaneously. The algorithm was initially tested on simulated current traces produced by several Hodgkin-Huxley–like and Markov chain models of voltage-gated potassium and sodium channels. Currents were also simulated with the levels of noise expected from actual patch recordings. Finally, the algorithm was used to find the kinetic parameters of several voltage-gated sodium and potassium channel models by matching its results to data recorded from layer 5 pyramidal neurons of the rat cortex in the nucleated outside-out patch configuration. The minimization scheme gives electrophysiologists a tool for reproducing and simulating voltage-gated ion channel kinetics at the cellular level.
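Fitting all traces at once can be sketched with a tiny elitist genetic algorithm: each candidate parameter set is scored by the summed squared error over every trace from every voltage step, the best candidates survive, and averaged, mutated copies of them fill the next generation. The model I(t) = g·(V − E)·(1 − e^(−t/τ)), the reversal potential E = −80 mV, and all GA settings are illustrative assumptions, and the gradient-descent refinement stage is omitted.

```python
import math
import random

E_REV = -80.0  # assumed reversal potential (mV), illustration only


def model_current(g, tau, v, t):
    """Toy activating current for a voltage step to v (mV) at time t (ms)."""
    return g * (v - E_REV) * (1.0 - math.exp(-t / tau))


def total_error(params, traces):
    """Summed squared error over all traces from all voltage steps at once."""
    g, tau = params
    return sum(
        (model_current(g, tau, v, t) - i_obs) ** 2
        for v, samples in traces.items()
        for t, i_obs in samples
    )


def fit_ga(traces, pop_size=40, n_keep=10, generations=60, seed=0):
    """Fit (g, tau) with a simple elitist genetic algorithm.

    Returns the best parameter pair and the best error of each generation.
    """
    rng = random.Random(seed)
    pop = [(rng.uniform(0.01, 1.0), rng.uniform(0.5, 20.0)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda p: total_error(p, traces))
        history.append(total_error(pop[0], traces))
        parents = pop[:n_keep]  # elitism: the best candidates survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # crossover by averaging two parents, then a multiplicative mutation
            g = 0.5 * (a[0] + b[0]) * math.exp(rng.gauss(0.0, 0.1))
            tau = 0.5 * (a[1] + b[1]) * math.exp(rng.gauss(0.0, 0.1))
            children.append((g, tau))
        pop = parents + children
    best = min(pop, key=lambda p: total_error(p, traces))
    return best, history
```

Because the best candidates are carried over unchanged, the best error can only stay the same or fall from one generation to the next; a local gradient-based search would then polish the GA's best candidate.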
Author Summary
Voltage-gated ion channels affect neuronal integration of information. Some neurons express more than ten different types of voltage-gated ion channels, making information processing a highly convoluted process. Kinetic modelling of ion channels is an important method for unravelling the role of each channel type in neuronal function. However, the most commonly used analysis techniques suffer from shortcomings that limit the ability of researchers to rapidly produce physiologically relevant models of voltage-gated ion channels and of neuronal physiology. We show that combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols enables the semi-automatic production of models of voltage-gated ion channels. Once fully automated, this approach may be used for high throughput analysis of voltage-gated currents. This in turn will greatly shorten the time required for building models of neuronal physiology to facilitate our understanding of neuronal behaviour.
PMCID: PMC1963494  PMID: 17784781
12.  Misty Mountain clustering: application to fast unsupervised flow cytometry gating 
BMC Bioinformatics  2010;11:502.
There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets such as flow cytometry data, prove unsatisfactory because of their speed, their susceptibility to local minima, or their cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time-consuming, and different criteria frequently result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to be applied to datasets on the order of 10^6 points, which are often generated by high throughput experiments.
To circumvent these limitations, we developed a new, unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and that efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map to identify clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing the cross sections of the histogram only once. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. The clustering of 10^6 data points in 2D data space takes about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state-of-the-art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and the accuracy of cluster assignment.
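The "clouds over a relief map" picture can be made concrete with a level-set sketch: bin the events into a 2D histogram, lower a threshold from the highest bin downward, and count the connected islands of bins that rise above it. This Python version is a simplified illustration, not Misty Mountain itself; the bin count, number of levels, 4-connectivity and the `min_bins` size filter (standing in for the paper's statistical criteria for distinct peaks) are all assumptions, as are the helper names.

```python
import numpy as np
from collections import deque


def label_islands(mask):
    """4-connected component labelling of a boolean 2D grid via BFS."""
    labels = np.zeros(mask.shape, dtype=int)
    sizes = []
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        sizes.append(0)
        labels[start] = len(sizes)
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            sizes[-1] += 1
            for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                        and mask[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = len(sizes)
                    queue.append((rr, cc))
    return labels, sizes


def peaks_by_level(x, y, bins=64, levels=20, min_bins=5):
    """Lower a horizontal 'cloud' from the top of the 2D histogram and, at each
    level, count islands of at least `min_bins` bins that poke above it."""
    hist, _, _ = np.histogram2d(x, y, bins=bins)
    thresholds = np.linspace(hist.max(), 0.0, levels, endpoint=False)
    counts = []
    for t in thresholds:
        _, sizes = label_islands(hist > t)
        counts.append((float(t), sum(s >= min_bins for s in sizes)))
    return counts


# Usage: two well-separated synthetic populations of 20,000 events each.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 1.0, (20000, 2)),
                 rng.normal(10.0, 1.0, (20000, 2))])
profile = peaks_by_level(pts[:, 0], pts[:, 1])
print(profile)
```

Each threshold needs only one pass over the histogram, so the cost depends on the number of bins rather than on pairwise comparisons between events; only the histogram construction scales with the number of data points.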
Misty Mountain is fast, unbiased for cluster shape, identifies stable clusters and is robust to noise. It provides a useful, general solution for multidimensional clustering problems. We demonstrate its suitability for automated gating of flow cytometry data.
PMCID: PMC2967560  PMID: 20932336
13.  Hierarchical Modeling for Rare Event Detection and Cell Subset Alignment across Flow Cytometry Samples 
PLoS Computational Biology  2013;9(7):e1003130.
Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found at extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are detecting extremely low frequency event subsets without biasing the estimate through pre-processing enrichment, and aligning cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model that captures both commonalities and variations across multiple samples. HDPGMM also increases the sensitivity to extremely low frequency events by sharing information across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We provide open source software that can take advantage of both multiple processors and GPU acceleration to perform the numerically demanding computations.
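Why a hierarchical mixture yields aligned clusters can be seen from its generative structure: every sample draws from the same set of components, and only the mixing weights vary per sample. The Python sketch below samples from a finite (weak-limit) approximation of a hierarchical Dirichlet process mixture; it performs no inference, and the truncation level `K`, the concentrations `gamma` and `alpha`, the unit-variance components and the function names are assumptions for illustration rather than the authors' software.

```python
import numpy as np


def stick_breaking(gamma, K, rng):
    """Truncated GEM(gamma) stick-breaking weights for K shared components."""
    v = rng.beta(1.0, gamma, size=K)
    v[-1] = 1.0                              # close the stick: weights sum to 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_hierarchical_mixture(n_samples, n_events, K=20, gamma=2.0,
                                alpha=5.0, dim=2, seed=0):
    """Generate data from a weak-limit approximation of an HDP mixture.

    All samples share the same K component means, so component k denotes the
    same cell subset in every sample; only its weight varies between samples.
    """
    rng = np.random.default_rng(seed)
    beta = stick_breaking(gamma, K, rng)            # global subset frequencies
    means = rng.normal(0.0, 5.0, size=(K, dim))     # shared subset locations
    data, labels = [], []
    for _ in range(n_samples):
        pi = rng.dirichlet(alpha * beta + 1e-6)     # sample-level frequencies
        z = rng.choice(K, size=n_events, p=pi)      # subset label of each event
        data.append(means[z] + rng.normal(0.0, 1.0, size=(n_events, dim)))
        labels.append(z)
    return beta, means, data, labels


beta, means, data, labels = sample_hierarchical_mixture(n_samples=3, n_events=1000)
print(beta.round(3))
```

Because component `k` has one shared location, a rare subset contributes events from every sample to the estimate of that location, which is the information sharing the abstract credits for higher sensitivity to low frequency events.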
We show that hierarchical modeling is a useful probabilistic approach that can provide a consistent labeling of cell subsets and increase the sensitivity of rare event detection in the context of quantifying antigen-specific immune responses.
Author Summary
The use of flow cytometry to count antigen-specific T cells is essential for vaccine development, monitoring of immune-based therapies and immune biomarker discovery. Analysis of such data is challenging because antigen-specific cells are often present in frequencies of less than 1 in 1,000 peripheral blood mononuclear cells (PBMC). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. Consequently, there is intense interest in automated approaches for cell subset identification. One popular class of such automated approaches is the use of statistical mixture models. We propose a hierarchical extension of statistical mixture models that has two advantages over standard mixture models. First, it increases the ability to detect extremely rare event clusters that are present in multiple samples. Second, it enables direct comparison of cell subsets by aligning clusters across multiple samples in a natural way arising from the hierarchical formulation. We demonstrate the algorithm on clinically relevant reference PBMC samples with known frequencies of CD8 T cells engineered to express T cell receptors specific for the cancer-testis antigen (NY-ESO-1) and compare its performance with other popular automated analysis approaches.
PMCID: PMC3708855  PMID: 23874174
14.  Single cell cytometry of protein function in RNAi treated cells and in native populations 
BMC Cell Biology  2008;9:43.
High Content Screening has been shown to improve results of RNAi and other perturbations; however, significant intra-sample heterogeneity is common and can complicate some analyses. Single cell cytometry can extract important information from subpopulations within these samples. Such approaches are important for immune cells analyzed by flow cytometry, but have not been broadly available for adherent cells, which are critical to the study of solid-tumor cancers and other disease models.
We have directly quantitated the effect of resolving RNAi treatments at the single cell level in experimental systems for both exogenous and endogenous targets. Analyzing the effect of an siRNA that targets GFP at the single cell level permits a stronger measure of the absolute function of the siRNA by gating to eliminate background levels of GFP intensities. Extending these methods to endogenous proteins, we have shown that at the well level, knockdown of PTEN results in an increase in phospho-S6 levels, but at the single cell level, the correlation reveals the role of other inputs into the pathway. In a third example, reduction of STAT3 levels by siRNA causes an accumulation of cells in the G1 phase of the cell cycle, but does not induce apoptosis or necrosis when compared to control cells that express the same levels of STAT3. In a final example, the effect of reduced p53 levels on increased adriamycin sensitivity for colon carcinoma cells was demonstrated at the whole-well level using siRNA knockdown, and at the single cell level in control and untreated cells.
We find that single cell analysis methods are generally applicable to a wide range of experiments in adherent cells using technology that is becoming increasingly available to most laboratories. They are well-suited to emerging models of signaling dysfunction, such as oncogene addiction and oncogenic shock. Single cell cytometry can demonstrate effects on cell function for protein levels that differ by as little as 20%. Biological differences that result from changes in protein level or pathway activation state can be modulated directly by RNAi treatment or extracted from the natural variability intrinsic to cells grown under normal culture conditions.
PMCID: PMC2529295  PMID: 18673568
15.  Elucidation of Seventeen Human Peripheral Blood B cell Subsets and Quantification of the Tetanus Response Using a Density-Based Method for the Automated Identification of Cell Populations in Multidimensional Flow Cytometry Data 
Cytometry. Part B, Clinical cytometry  2010;78(Suppl 1):S69-S82.
Advances in multi-parameter flow cytometry (FCM) now allow for the independent detection of larger numbers of fluorochromes on individual cells, generating data with increasingly higher dimensionality. The increased complexity of these data has made it difficult to identify cell populations from high-dimensional FCM data using traditional manual gating strategies based on single-color or two-color displays.
To address this challenge, we developed a novel program, FLOCK (FLOw Clustering without K), that uses a density-based clustering approach to algorithmically identify biologically relevant cell populations from multiple samples in an unbiased fashion, thereby eliminating operator-dependent variability.
FLOCK was used to objectively identify seventeen distinct B cell subsets in a human peripheral blood sample and to identify and quantify novel plasmablast subsets responding transiently to tetanus and other vaccinations in peripheral blood. FLOCK has been implemented in the publicly available Immunology Database and Analysis Portal (ImmPort) for open use by the immunology research community.
FLOCK is able to identify cell subsets in experiments that use multi-parameter flow cytometry through an objective, automated computational approach. The use of algorithms like FLOCK for FCM data analysis obviates the need for subjective and labor intensive manual gating to identify and quantify cell subsets. Novel populations identified by these computational approaches can serve as hypotheses for further experimental study.
PMCID: PMC3084630  PMID: 20839340
flow cytometry; density-based analysis; data clustering; tetanus vaccination; B lymphocyte subsets
16.  Rapid Cell Population Identification in Flow Cytometry Data
We have developed flowMeans, a time-efficient and accurate method for automated identification of cell populations in flow cytometry (FCM) data based on K-means clustering. Unlike traditional K-means, flowMeans can identify concave cell populations by modelling a single population with multiple clusters. flowMeans uses a change point detection algorithm to determine the number of sub-populations, enabling the method to be used in high throughput FCM data analysis pipelines. Our approach compares favourably to manual analysis by human experts and current state-of-the-art automated gating algorithms. flowMeans is freely available as an open source R package through Bioconductor.
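The overall flowMeans strategy (over-cluster with K-means so that one irregular population can be covered by several clusters, merge clusters, then pick the number of populations at a change point) can be sketched in a few functions. flowMeans itself is an R/Bioconductor package; this Python sketch does not reproduce its distance measure or merging rules. The Euclidean distance between size-weighted centroids, the farthest-first initialisation, the two-line change-point fit and all function names are simplifying assumptions.

```python
import numpy as np


def kmeans(x, k, iters=50, seed=0):
    """Plain Lloyd's K-means with farthest-first initialisation."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)

    def assign(c):
        return np.argmin(((x[:, None, :] - c[None]) ** 2).sum(-1), axis=1)

    for _ in range(iters):
        labels = assign(centers)
        for j in range(k):
            if np.any(labels == j):            # leave empty clusters in place
                centers[j] = x[labels == j].mean(axis=0)
    return centers, assign(centers)


def merge_distances(centers, counts):
    """Repeatedly merge the two closest clusters and record each distance."""
    centers = [c.astype(float) for c in centers]
    counts = [float(n) for n in counts]
    dists = []
    while len(centers) > 1:
        best = None
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                d = np.linalg.norm(centers[i] - centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        n = counts[i] + counts[j]
        centers[i] = (counts[i] * centers[i] + counts[j] * centers[j]) / n
        counts[i] = n
        del centers[j], counts[j]
        dists.append(d)
    return np.array(dists)


def change_point(d):
    """Split index minimising the total squared error of two straight-line fits."""
    x = np.arange(len(d), dtype=float)
    best_s, best_sse = None, np.inf
    for s in range(2, len(d) - 1):             # each segment needs >= 2 points
        sse = 0.0
        for xs, ys in ((x[:s], d[:s]), (x[s:], d[s:])):
            coef = np.polyfit(xs, ys, 1)
            sse += float(np.sum((np.polyval(coef, xs) - ys) ** 2))
        if sse < best_sse:
            best_s, best_sse = s, sse
    return best_s


def estimate_populations(x, k_max=10):
    centers, labels = kmeans(x, k_max)
    counts = np.bincount(labels, minlength=k_max)
    keep = counts > 0
    d = merge_distances(centers[keep], counts[keep])
    # merges before the change point join pieces of the same population
    return int(keep.sum()) - change_point(d)


rng = np.random.default_rng(0)
x = np.vstack([rng.normal(c, 1.0, (300, 2)) for c in ((0, 0), (10, 0), (0, 10))])
print(estimate_populations(x))
```

Merges inside a population happen at small, slowly growing distances, while the first merge across populations produces a jump; the change point marks that jump, so no user-supplied cluster number is needed.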
PMCID: PMC3137288  PMID: 21182178
flow cytometry; data analysis; cluster analysis; model selection; bioinformatics; statistics
17.  Flow-Based Cytometric Analysis of Cell Cycle via Simulated Cell Populations 
PLoS Computational Biology  2010;6(4):e1000741.
We present a new approach to handling and interrogating large flow cytometry data sets in which cell status and function can be described, at the population level, by global descriptors such as the distribution mean or the coefficient of variation of the experimental data. Here we use the “real” data to initialise a computer simulation of the cell cycle that mimics the evolution of individual cells within a larger population and simulates the associated changes in fluorescence intensity of functional reporters. The model is based on stochastic formulations of cell cycle progression and cell division and uses evolutionary algorithms, allied to further experimental data sets, to optimise the system variables. At the population level, the in-silico cells provide the same statistical distributions of fluorescence as their real counterparts; in addition, the model maintains information at the single cell level. The cell model is demonstrated in the analysis of cell cycle perturbation in human osteosarcoma tumour cells, using the topoisomerase II inhibitor ICRF-193. The simulation gives a continuous temporal description of the pharmacodynamics between discrete experimental analysis points separated by a 24-hour interval, providing quantitative assessment of inter-mitotic time variation, drug interaction time constants and sub-population fractions within normal and polyploid cell cycles. Repeated simulations indicate a model accuracy of ±5%. The development of a simulated cell model, initialised and calibrated by reference to experimental data, provides an analysis tool in which biological knowledge can be obtained directly via interrogation of the in-silico cell population. It is envisaged that this approach to the study of cell biology, by simulating a virtual cell population pertinent to the data available, can be applied to “generic” cell-based outputs, including experimental data from imaging platforms.
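A minimal stochastic population of the kind described can be sketched in Python: each in-silico cell carries its own inter-mitotic time, ages in small time steps, divides into two daughters, and can be read out as a phase label and a DNA-content value. The normal inter-mitotic time distribution (clipped at 1 h), the fixed G1/S/G2-M boundaries, the absence of any drug effect and the omission of the evolutionary parameter optimisation are all assumptions; the function names are hypothetical and this is not the authors' model.

```python
import numpy as np

# Assumed phase boundaries as fractions of each cell's own cycle time.
G1_END, S_END = 0.45, 0.80


def simulate_population(n0=500, hours=48.0, dt=0.1, mean_imt=20.0, cv=0.15, seed=0):
    """Advance a stochastic population; each cell has its own inter-mitotic
    time (IMT) drawn from a normal distribution clipped at 1 h."""
    rng = np.random.default_rng(seed)

    def draw(n):
        return np.maximum(rng.normal(mean_imt, cv * mean_imt, n), 1.0)

    imt = draw(n0)
    age = rng.uniform(0.0, 1.0, n0) * imt       # asynchronous starting population
    for _ in range(int(round(hours / dt))):
        age += dt
        dividing = age >= imt
        n_div = int(dividing.sum())
        if n_div:
            # each dividing cell is replaced by two newborn daughters
            age = np.concatenate([age[~dividing], np.zeros(2 * n_div)])
            imt = np.concatenate([imt[~dividing], draw(2 * n_div)])
    return age, imt


def phase_summary(age, imt):
    """Phase fractions plus a simulated DNA-content reading per cell
    (2N in G1, rising linearly from 2N to 4N through S, 4N in G2/M)."""
    frac = age / imt
    g1 = frac < G1_END
    s = (frac >= G1_END) & (frac < S_END)
    g2m = frac >= S_END
    s_progress = (frac - G1_END) / (S_END - G1_END)
    dna = np.where(g1, 2.0, np.where(s, 2.0 + 2.0 * s_progress, 4.0))
    fractions = {"G1": g1.mean(), "S": s.mean(), "G2/M": g2m.mean()}
    return fractions, dna


age, imt = simulate_population()
fractions, dna = phase_summary(age, imt)
print(len(age), fractions)
```

Because every cell is tracked individually, the same run yields both population-level readouts (phase fractions, a DNA histogram from `dna`) and single-cell histories, which is the dual view the abstract emphasises.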
Author Summary
One of the key challenges facing cell biologists today is understanding the influence of molecular controls in shaping and controlling cell growth and proliferation. There is growing recognition that abnormal progression through the cell cycle, and the associated effects on the growth of cell populations, has a major impact on a wide range of biological and clinical problems, including tumour growth, developmental control, origins of chromosomal instability and drug resistance. Multiparameter flow cytometry with fluorescent reporters is frequently used to assess the proliferation dynamics of cellular populations, generating large data sets that can inform simulation models. We have developed stochastic computing approaches allied to evolutionary algorithms to produce simulated cell populations, providing a new approach to the analysis of real multi-variate data sets obtained by flow cytometry. The methodology delivers new insight into biological processes by providing a continuous simulation of the dynamic evolution of a cellular system between fixed sampling points, thereby converting static data into dynamic data that reveal the effective traverse of the cell cycle, restriction points and commitment gateways. The approach also permits the visualisation of the variation between individual cells, reflecting biological heterogeneity and potentially Darwinian fitness, given that the simulation delivers a report on population dynamics in which each and every cell can be tracked.
PMCID: PMC2855319  PMID: 20419143
18.  Heterogeneity of alveolar macrophages in experimental silicosis. 
The alveolar macrophage (AM) population has been shown to be heterogeneous in composition as well as in function. The aim of our study was to assess morphological and functional features of AM in an experimental model of quartz-induced lung fibrosis by flow cytometric methods. Twelve cynomolgus monkeys were exposed 8 hr/day, 5 days/week for 26 months to either normal atmosphere (n = 5) or 5 mg/m³ of DQ12 quartz dust with particles smaller than 5 µm (n = 7). After 20 months of exposure, we studied AM phagocytosis by incubating bronchoalveolar lavage cells with fluorescent polystyrene microspheres (mean diameter 1.91 µm). Using a fluorescence-activated cell sorter analyzer, AM subpopulations were identified via their volume/side scatter properties. After selective electronic "gating" of the AM populations, both the percentage of phagocytic AM and the mean number of ingested microspheres per AM were determined. In addition, a phagocytic index (microspheres/AM × % phagocytic AM × 10^-2) and a hypothetical total phagocytic capacity of one lung (phagocytic index × total number of AM × 10^-6) were calculated. The total bronchoalveolar lavage cell counts rose significantly after quartz exposure (75.6 ± 11.3 × 10^6 versus 10.1 ± 0.8 × 10^6). In contrast, the percentage of phagocytic AM was significantly (p < 0.05) reduced (43.5 ± 5.0% versus 74.2 ± 1.4%). Flow cytometric measurements revealed the appearance of an AM subpopulation characterized by size/granularity features identical to those of blood monocytes. Only minimal numbers of these cells were found under normal conditions, but they constituted 50% of the entire AM population in the quartz group. (ABSTRACT TRUNCATED AT 250 WORDS)
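The two derived quantities are simple products: the 10^-2 factor converts the percentage of phagocytic AM into a fraction, and the 10^-6 factor expresses the lung total in millions. The worked Python sketch below uses the reported percentages, but the microspheres-per-AM value and the total AM count are hypothetical placeholders, because the truncated abstract does not report them.

```python
def phagocytic_index(microspheres_per_am, percent_phagocytic):
    """Phagocytic index = microspheres/AM x % phagocytic AM x 10^-2."""
    return microspheres_per_am * percent_phagocytic * 1e-2


def total_phagocytic_capacity(index, total_am):
    """Hypothetical lung capacity = phagocytic index x total number of AM x 10^-6."""
    return index * total_am * 1e-6


# Hypothetical input: 3.0 microspheres per AM (not reported in the abstract).
pi_quartz = phagocytic_index(3.0, 43.5)    # 43.5% phagocytic AM, quartz group
pi_control = phagocytic_index(3.0, 74.2)   # 74.2% phagocytic AM, controls
capacity = total_phagocytic_capacity(pi_quartz, 50e6)  # assumed 50 x 10^6 AM
bal_fold = 75.6 / 10.1                     # total BAL cell count, quartz vs control
print(round(pi_quartz, 3), round(pi_control, 3), round(capacity, 2), round(bal_fold, 2))
```

With equal uptake per cell, the lower percentage of phagocytic AM alone lowers the index; the roughly 7.5-fold rise in total lavage cells shows why a per-lung capacity can still be computed from it as a separate measure.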
PMCID: PMC1519543  PMID: 1396469
19.  The importance of Foxp3 antibody and fixation/permeabilization buffer combinations in identifying CD4+CD25+Foxp3+ regulatory T cells 
Foxp3 is a key marker for CD4+ regulatory T cells (Tregs) and was utilized in developing a multiparameter flow cytometric panel to identify Tregs. Achieving reproducible staining and analysis first required optimization of Foxp3 staining.
We present a comparative study of PCH101, 236A/E7, 3G3, 206D, 150D, and 259D/C7 clones of anti-human-Foxp3 antibodies, used in combination with five different fixation/permeabilization buffers. Staining for CD25, CD152, and CD127 was also compared between fixation/permeabilization treatments. Promising antibody/buffer combinations were tested in a panel of PBMCs from 10 individuals, then on fresh versus frozen cells from four individuals. Finally, different fluorochromes coupled to two representative antibodies were compared to optimize separation of Foxp3+ from Foxp3- events. Foxp3 gates were set using two gating strategies, based on CD127+CD25- “non-Tregs” or based on isotype controls.
For Foxp3 staining, the best conditions for fixation/permeabilization were obtained using the eBioscience Foxp3, Imgenex, BioLegend, and BD Foxp3 buffers. Comparing results from 10 subjects, the 259D/C7, PCH101, 236A/E7, and 206D antibodies yielded statistically significantly higher levels of Foxp3+ cells than the 150D and 3G3 antibodies (mean = 6.9%, 5.1%, 4.7%, and 3.7% compared with 1.7% and 0.3% of CD25+Foxp3+ events within CD4+ cells, respectively). Importantly, the “non-specificity” of some antibodies observed with a Foxp3 gate based on isotype controls could be eliminated by setting the Foxp3 gate on “non-Tregs”. Better separation of Foxp3+ and Foxp3- populations was observed using the PCH101 clone coupled to Alexa647 compared with FITC, or the 259D/C7 clone coupled to PE compared with Alexa488.
Foxp3 staining can be highly variable and depends on the choice of antibody/buffer pair and the fluorochrome used. Selecting the correct population for setting the Foxp3 gate is critical to avoid including non-Tregs in the Foxp3+ gate. The experiments presented here will aid in optimization of flow cytometry staining panels to quantify Treg frequencies in humans.
PMCID: PMC2862733  PMID: 19845018
Foxp3 staining; Anti-human Foxp3 antibodies; PCH101; 259D/C7; 236A/E7; 3G3; 206D; 150D; Regulatory CD4+ T cells
20.  Cluster Cytometry for High Capacity Bioanalysis 
Cytometry  2012;81(5):419-429.
Flow cytometry specializes in high content measurements of cells and particles in suspension. Although flow cytometry has long excelled in the analytical throughput of single cells and particles, only recently, with the advent of HyperCyt sampling technology, has its multi-experiment throughput begun to approach the point of practicality for efficiently analyzing hundreds of thousands of samples, the realm of high throughput screening (HTS). To extend performance and automation compatibility, we built a HyperCyt-linked Cluster Cytometer platform, a network of flow cytometers for analyzing samples displayed in high-density, 1536-well plate format. To assess performance, we used cell- and microsphere-based HTS assays that had been well characterized in previous studies. Experiments addressed important technical issues: the challenges of small wells (assay volumes of 10 μL or less, reagent mixing, cell and particle suspension), detecting and correcting for differences in performance of individual flow cytometers, and the ability to reanalyze a plate in the event of problems encountered during the primary analysis. Boosting sample throughput an additional four-fold, this platform is uniquely positioned to synergize with expanding suspension array and cell barcoding technologies in which as many as 100 experiments are performed in a single well or sample. As high-performance flow cytometers shrink in cost and size, cluster cytometry promises to become a practical, productive approach for HTS and other large scale investigations of biological complexity.
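The abstract lists correcting for performance differences between individual cytometers as a technical issue but does not describe the correction. One simple possibility, sketched here in Python as an assumption rather than the platform's method, is to divide every well's readout by the median of the control wells read on the same instrument. The four-instrument interleaving, the control-well layout, the simulated gains and the function name are all hypothetical.

```python
import numpy as np


def normalize_by_instrument(values, instrument, is_control):
    """Divide every well's readout by the median of the control wells read on
    the same cytometer, so instruments with different gains become comparable
    (hypothetical correction scheme)."""
    values = np.asarray(values, dtype=float)
    instrument = np.asarray(instrument)
    is_control = np.asarray(is_control, dtype=bool)
    out = np.empty_like(values)
    for inst in np.unique(instrument):
        on_inst = instrument == inst
        ref = np.median(values[on_inst & is_control])
        out[on_inst] = values[on_inst] / ref
    return out


rng = np.random.default_rng(0)
wells = np.arange(1536)
instrument = wells % 4                      # assumed: 4 cytometers, interleaved wells
is_control = (wells // 4) % 16 == 0         # assumed control-well layout
true_signal = rng.normal(100.0, 5.0, wells.size)
gain = np.array([1.0, 1.2, 0.8, 1.1])[instrument]   # simulated instrument differences
normalized = normalize_by_instrument(true_signal * gain, instrument, is_control)
print([round(float(np.median(normalized[instrument == i])), 3) for i in range(4)])
```

A median of control wells is used rather than a mean so that a few failed or mis-aspirated control wells do not distort an instrument's correction factor.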
PMCID: PMC3331957  PMID: 22438314
Flow cytometry; suspension array; high content analysis; high throughput screening
21.  FIND: A new software tool and development platform for enhanced multicolor flow analysis 
BMC Bioinformatics  2011;12:145.
Flow Cytometry is a process by which cells, and other microscopic particles, can be identified, counted, and sorted mechanically through the use of hydrodynamic pressure and laser-activated fluorescence labeling. As immunostained cells pass individually through the flow chamber of the instrument, laser pulses cause fluorescence emissions that are recorded digitally for later analysis as multidimensional vectors. Current, widely adopted analysis software limits users to manual separation of events based on viewing two or three simultaneous dimensions. While this may be adequate for experiments using four or fewer colors, advances have led to laser flow cytometers capable of recording 20 different colors simultaneously. In addition, mass-spectrometry-based machines capable of recording at least 100 separate channels are being developed. Analysis of such high-dimensional data by visual exploration alone can be error-prone and susceptible to unnecessary bias. Fortunately, the field of Data Mining provides many tools for automated group classification of multi-dimensional data, and many algorithms have been adapted or created for flow cytometry. However, the majority of this research has not been made available to users through analysis software packages and, as such, is not in wide use.
We have developed a new software application for analysis of multi-color flow cytometry data. The main goals of this effort were to provide a user-friendly tool for automated gating (classification) of multi-color data as well as a platform for development and dissemination of new analysis tools. With this software, users can easily load single or multiple data sets, perform automated event classification, and graphically compare results within and between experiments. We also make available a simple plugin system that enables researchers to implement and share their data analysis and classification/population discovery algorithms.
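A plugin system of the kind described lets researchers add a classification algorithm without modifying the host application. The abstract does not specify FIND's plugin interface, so the Python registry below is a hypothetical illustration of the pattern only; `PLUGINS`, `register`, `classify` and the toy threshold plugin are invented names, not FIND's API.

```python
import numpy as np
from typing import Callable, Dict

# Hypothetical plugin registry: FIND's real plugin API is not reproduced here.
PLUGINS: Dict[str, Callable[[np.ndarray], np.ndarray]] = {}


def register(name):
    """Decorator that makes a classification algorithm available by name."""
    def wrap(func):
        if name in PLUGINS:
            raise ValueError(f"plugin {name!r} already registered")
        PLUGINS[name] = func
        return func
    return wrap


@register("threshold")
def threshold_classifier(events):
    """Toy plugin: label events by whether channel 0 exceeds its median."""
    return (events[:, 0] > np.median(events[:, 0])).astype(int)


def classify(name, events):
    """Host-side entry point: look up the plugin and validate its output."""
    labels = np.asarray(PLUGINS[name](events))
    if labels.shape != (events.shape[0],):
        raise ValueError("a plugin must return one label per event")
    return labels


# Usage: events are rows, channels are columns.
events = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
print(classify("threshold", events))
```

Validating the plugin's output in the host keeps downstream features (plotting, comparing results between experiments) independent of whichever algorithm produced the labels.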
The FIND (Flow Investigation using N-Dimensions) platform presented here provides a powerful, user-friendly environment for analysis of Flow Cytometry data as well as providing a common platform for implementation and distribution of new automated analysis techniques to users around the world.
PMCID: PMC3119067  PMID: 21569257
22.  A Computational Framework to Emulate the Human Perspective in Flow Cytometric Data Analysis 
PLoS ONE  2012;7(5):e35693.
In recent years, intense research efforts have focused on developing methods for automated flow cytometric data analysis. However, in designing such applications, little or no attention has been paid to the human perspective that is central to the manual gating process of identifying and characterizing cell populations. In particular, the assumption of many common techniques that cell populations can be modeled reliably with pre-specified distributions may not hold for real-life samples, which can contain populations of arbitrary shapes and considerable inter-sample variation.
To address this, we developed a new framework, flowScape, for emulating certain key aspects of the human perspective in analyzing flow data, which we implemented in multiple steps. First, flowScape creates a mathematically rigorous map of the high-dimensional flow data landscape based on dense and sparse regions defined by relative concentrations of events around modes. In the second step, these modal clusters are connected with a global hierarchical structure. This representation allows flowScape to perform ridgeline analysis for both traversing the landscape and isolating cell populations at different levels of resolution. Finally, we extended manual gating with a new capacity for constructing templates that can identify target populations in terms of their relative parameters, as opposed to the more commonly used absolute or physical parameters. This allows flowScape to apply such templates in batch mode for detecting the corresponding populations in a flexible, sample-specific manner. We also demonstrate different applications of our framework to flow data analysis and show its superiority over other analytical methods.
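The benefit of templates stated in relative rather than absolute parameters shows up when a whole sample shifts, for example through instrument drift. flowScape's templates operate on modal clusters and ridgelines; this Python sketch reduces the idea to one channel, defining the gate as an offset from a reference population measured in the same sample. The two-robust-SD offset, the scaled-MAD estimator, the assumption that the reference population is already identified, and the function names are all illustrative assumptions.

```python
import numpy as np


def robust_location(x):
    """Median and scaled MAD (a robust standard deviation estimate)."""
    med = np.median(x)
    return med, 1.4826 * np.median(np.abs(x - med))


def apply_relative_template(sample, reference_mask, k=2.0):
    """Flag events brighter than the reference population by k robust SDs.

    The gate is stated relative to a population found in the same sample,
    so it follows that population when the whole sample shifts."""
    med, sd = robust_location(sample[reference_mask])
    return sample > med + k * sd


rng = np.random.default_rng(0)
sample1 = np.concatenate([rng.normal(1000.0, 50.0, 5000), rng.normal(1400.0, 50.0, 500)])
ref = np.arange(sample1.size) < 5000        # assumed: reference population known
sample2 = sample1 + 300.0                   # same cells, shifted (e.g. drift)

abs_gate = 1100.0                           # a fixed, absolute gate for contrast
frac_abs = [float((s > abs_gate).mean()) for s in (sample1, sample2)]
frac_rel = [float(apply_relative_template(s, ref).mean()) for s in (sample1, sample2)]
print(frac_abs, frac_rel)
```

The absolute gate calls most of the shifted sample positive, while the relative gate returns essentially the same fraction for both samples, mirroring the sample-specific batch application described above.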
The human perspective, built on top of intuition and experience, is a very important component of flow cytometric data analysis. By emulating some of its approaches and extending these with automation and rigor, flowScape provides a flexible and robust framework for computational cytomics.
PMCID: PMC3341382  PMID: 22563466
23.  Analysis of High-Throughput Flow Cytometry Data Using plateCore 
Advances in Bioinformatics  2009;2009:356141.
Flow cytometry (FCM) software packages from R/Bioconductor, such as flowCore and flowViz, serve as an open platform for development of new analysis tools and methods. We created plateCore, a new package that extends the functionality in these core packages to enable automated negative control-based gating and make the processing and analysis of plate-based data sets from high-throughput FCM screening experiments easier. plateCore was used to analyze data from a BD FACS CAP screening experiment where five peripheral blood mononuclear cell (PBMC) samples were assayed for 189 different human cell surface markers. This same data set was also manually analyzed by a cytometry expert using the FlowJo data analysis software package (TreeStar, USA). We show that the expression values for markers characterized using the automated approach in plateCore are in good agreement with those from FlowJo, and that using plateCore allows for more reproducible analyses of FCM screening data.
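Negative control-based gating places the positivity threshold from a control well (for example an isotype or unstained control) and applies it to the stained well. plateCore is an R/Bioconductor package and its function names and defaults are not reproduced here; this Python sketch assumes a gate at the 99th percentile of the control intensities and uses simulated log-normal intensities.

```python
import numpy as np


def control_gate(control, quantile=0.99):
    """Gate placed at an upper quantile of the negative-control intensities
    (the 0.99 default is an assumption, not plateCore's setting)."""
    return float(np.quantile(control, quantile))


def percent_positive(stained, gate):
    """Percentage of stained events above the control-derived gate."""
    return 100.0 * float(np.mean(np.asarray(stained) > gate))


rng = np.random.default_rng(0)
isotype = rng.lognormal(mean=5.0, sigma=0.5, size=10000)   # simulated control well
marker = rng.lognormal(mean=6.5, sigma=0.5, size=10000)    # simulated stained well
gate = control_gate(isotype)
print(percent_positive(isotype, gate), percent_positive(marker, gate))
```

Deriving the gate from each plate's own control wells, instead of drawing it by hand, is what makes the analysis repeatable across the many wells of a screening plate.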
PMCID: PMC2777006  PMID: 19956418
24.  Assay validation for the assessment of adipogenesis of multipotential stromal cells—a direct comparison of four different methods 
Cytotherapy  2013;15(1):89-101.
Background aims
Mesenchymal stromal cells (MSCs) are regenerative and immuno-privileged cells that are used for both tissue regeneration and treatment of severe inflammation-related disease. For quality control of manufactured MSC batches in regard to mature fat cell contamination, a quantitative method for measuring adipogenesis is needed.
Four previously proposed methods were validated with the use of bone marrow (BM) MSCs during a 21-day in vitro assay. Oil red staining was scored semiquantitatively; peroxisome proliferator activated receptor-γ and fatty acid binding protein (FABP)4 transcripts were measured by quantitative real-time polymerase chain reaction; FABP4 protein accumulation was evaluated by flow cytometry; and Nile red/4′,6-diamidino-2-phenylindole (DAPI) ratios were measured in fluorescent microplate assay. Skin fibroblasts and MSCs from fat pad, cartilage and umbilical cord were used as controls.
Oil red staining indicated considerable heterogeneity between BM donors and individual cells within the same culture. FABP4 transcript levels increased 100- to 5000-fold by day 21, with large donor variability observed. Flow cytometry revealed increasing intra-culture heterogeneity over time; more granular cells accumulated more FABP4 protein and Nile red fluorescence compared with less granular cells. Nile red increase in day-21 MSCs was ∼5- and 4-fold, measured by flow cytometry or microplate assay, respectively. MSC proliferation/apoptosis was accounted for through the use of Nile red/DAPI ratios; adipogenesis levels in day-21 BM MSCs increased ∼13-fold, with significant correlations with oil red scoring observed for MSC from other sources.
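Normalising Nile red (lipid) fluorescence to DAPI (DNA, a proxy for cell number) separates lipid accumulation from changes in cell number caused by proliferation or apoptosis. The Python sketch below uses hypothetical plate-reader values, not data from the study, to show how the ratio-based fold change can differ from the raw Nile red fold change.

```python
def nile_red_dapi_ratio(nile_red, dapi):
    """Lipid signal per unit of nuclear (DNA) signal in one well."""
    return nile_red / dapi


def fold_change(day0, day21):
    """Fold increase of the Nile red/DAPI ratio between two time points."""
    return nile_red_dapi_ratio(*day21) / nile_red_dapi_ratio(*day0)


# Hypothetical plate-reader values (arbitrary units): (Nile red, DAPI).
day0 = (200.0, 100.0)
day21 = (1500.0, 60.0)                 # more lipid, but fewer cells than on day 0
raw_fold = day21[0] / day0[0]          # ignores the change in cell number
ratio_fold = fold_change(day0, day21)  # normalised per DAPI signal
print(raw_fold, ratio_fold)
```

Here the raw Nile red signal rises 7.5-fold, but because the DAPI signal fell the per-cell estimate rises 12.5-fold; a fall in cell number would otherwise mask part of the adipogenic response.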
Flow cytometry permits the study of MSC differentiation at the single-cell level and sorting more and less mature cells from mixed cell populations. The microplate assay with the use of the Nile red/DAPI ratio provides rapid quantitative measurements and could be used as a low-cost, high-throughput method to quality-control MSC batches from different tissue sources.
PMCID: PMC3539160  PMID: 23260089
adipogenesis; flow cytometry; multipotential stromal cells, Nile red
Nature methods  2013;10(3):228-238.
Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks – mammalian cell population identification to determine if automated algorithms can reproduce expert manual gating, and sample classification to determine if analysis pipelines can identify characteristics that correlate with external variables (e.g., clinical outcome). This analysis presents the results of the first of these challenges. Several methods performed well compared to manual gating or external variables using statistical performance measures, suggesting that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
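Comparing an automated method with manual gating requires a score that tolerates arbitrary cluster IDs. One widely used measure for this is the F-measure, the harmonic mean of precision and recall; the Python sketch below assumes each manual population is matched to its best-scoring automated cluster and weighted by population size. That matching rule is an assumption and is not necessarily the exact procedure used in FlowCAP.

```python
import numpy as np


def f_measure(reference, predicted):
    """Size-weighted F-measure: each manual population is matched to the
    automated cluster with the highest F1 score (assumed matching rule)."""
    reference = np.asarray(reference)
    predicted = np.asarray(predicted)
    total = 0.0
    for r in np.unique(reference):
        in_r = reference == r
        best = 0.0
        for p in np.unique(predicted):
            in_p = predicted == p
            tp = np.sum(in_r & in_p)
            if tp == 0:
                continue
            precision, recall = tp / in_p.sum(), tp / in_r.sum()
            best = max(best, 2 * precision * recall / (precision + recall))
        total += in_r.mean() * best        # weight by population size
    return float(total)


manual = [0, 0, 1, 1]
print(f_measure(manual, [5, 5, 7, 7]))     # same partition, different cluster IDs
print(f_measure(manual, [0, 0, 0, 0]))     # everything lumped into one cluster
```

Because only the partition matters, relabelled but identical clusterings score 1, while merging two manual populations into one cluster is penalised through lower precision.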
PMCID: PMC3906045  PMID: 23396282
