Results 1-13 (13)

1.  A Survey of Flow Cytometry Data Analysis Methods 
Advances in Bioinformatics  2009;2009:584603.
Flow cytometry (FCM) is widely used in health research and in treatment for a variety of tasks, such as in the diagnosis and monitoring of leukemia and lymphoma patients, providing the counts of helper-T lymphocytes needed to monitor the course and treatment of HIV infection, the evaluation of peripheral blood hematopoietic stem cell grafts, and many other diseases. In practice, FCM data analysis is performed manually, a process that requires an inordinate amount of time and is error-prone, nonreproducible, nonstandardized, and not open for re-evaluation, making it the most limiting aspect of this technology. This paper reviews state-of-the-art FCM data analysis approaches using a framework introduced to report each of the components in a data analysis pipeline. Current challenges and possible future directions in developing fully automated FCM data analysis tools are also outlined.
PMCID: PMC2798157  PMID: 20049163
2.  Automatic Clustering of Flow Cytometry Data with Density-Based Merging 
Advances in Bioinformatics  2009;2009:686759.
The ability of flow cytometry to allow fast single cell interrogation of a large number of cells has made this technology ubiquitous and indispensable in the clinical and laboratory setting. A current limit to the potential of this technology is the lack of automated tools for analyzing the resulting data. We describe methodology and software to automatically identify cell populations in flow cytometry data. Our approach advances the paradigm of manually gating sequential two-dimensional projections of the data to a procedure that automatically produces gates based on statistical theory. Our approach is nonparametric and can reproduce nonconvex subpopulations that are known to occur in flow cytometry samples, but which cannot be produced with current parametric model-based approaches. We illustrate the methodology with a sample of mouse spleen and peritoneal cavity cells.
PMCID: PMC2801806  PMID: 20069107
3.  Fluorescence Intensity Normalisation: Correcting for Time Effects in Large-Scale Flow Cytometric Analysis 
Advances in Bioinformatics  2009;2009:476106.
A next step to interpret the findings generated by genome-wide association studies is to associate molecular quantitative traits with disease-associated alleles. To this end, researchers are linking disease risk alleles with gene expression quantitative trait loci (eQTL). However, gene expression at the mRNA level is only an intermediate trait and flow cytometry analysis can provide more downstream and biologically valuable protein level information in multiple cell subsets simultaneously using freshly obtained samples. Because the throughput of flow cytometry is currently limited, experiments may need to span over several weeks or months to obtain a sufficient sample size to demonstrate genetic association. Therefore, normalisation methods are needed to control for technical variability and compare flow cytometry data over an extended period of time. We show how the use of normalising fluorospheres improves the repeatability of a cell surface CD25-APC mean fluorescence intensity phenotype on CD4+ memory T cells. We investigate two types of normalising beads: broad spectrum and spectrum matched. Lastly, we propose two alternative normalisation procedures that are usable in the absence of normalising beads.
PMCID: PMC2798117  PMID: 20049162
4.  Assessing the Quality of Whole Genome Alignments in Bacteria 
Advances in Bioinformatics  2009;2009:749027.
Comparing genomes is an essential preliminary step to solve many problems in biology. Matching long similar segments between two genomes is a precondition for their evolutionary, genetic, and genome rearrangement analyses. Though various comparison methods have been developed in recent years, a quantitative assessment of their performance is lacking. Here, we describe two families of assessment measures whose purpose is to evaluate bacteria-oriented comparison tools. The first measure is based on how well the genome segmentation fits the gene annotation of the studied organisms; the second uses the number of segments created by the segmentation and the percentage of the two genomes that are conserved. The effectiveness of the two measures is demonstrated by applying them to the results of genome comparison tools obtained on 41 pairs of bacterial species. Despite the difference in the nature of the two types of measurements, both show consistent results, providing insights into the subtle differences between the mapping tools.
PMCID: PMC2798158  PMID: 20049164
5.  Merging Mixture Components for Cell Population Identification in Flow Cytometry 
Advances in Bioinformatics  2009;2009:247646.
We present a framework for the identification of cell subpopulations in flow cytometry data based on merging mixture components using the flowClust methodology. We show that the cluster merging algorithm under our framework improves model fit and provides a better estimate of the number of distinct cell subpopulations than either Gaussian mixture models or flowClust, especially for complicated flow cytometry data distributions. Our framework allows the automated selection of the number of distinct cell subpopulations and we are able to identify cases where the algorithm fails, thus making it suitable for application in a high throughput FCM analysis pipeline. Furthermore, we demonstrate a method for summarizing complex merged cell subpopulations in a simple manner that integrates with the existing flowClust framework and enables downstream data analysis. We demonstrate the performance of our framework on simulated and real FCM data. The software is available in the flowMerge package through the Bioconductor project.
PMCID: PMC2798116  PMID: 20049161
6.  iFlow: A Graphical User Interface for Flow Cytometry Tools in Bioconductor 
Advances in Bioinformatics  2009;2009:103839.
Flow cytometry (FCM) has become an important analysis technology in health care and medical research, but the large volume of data produced by modern high-throughput experiments has presented significant new challenges for computational analysis tools. The development of an FCM software suite in Bioconductor represents one approach to overcome these challenges. In the spirit of the R programming language (Tree Star Inc., “FlowJo”), these tools are predominantly console-driven, allowing for programmatic access and rapid development of novel algorithms. Using this software requires a solid understanding of programming concepts and of the R language. However, some of these tools, in particular the statistical graphics and novel analytical methods, are also useful for nonprogrammers. To this end, we have developed an open source, extensible graphical user interface (GUI) iFlow, which sits on top of the Bioconductor backbone, enabling basic analyses by means of convenient graphical menus and wizards. We envision iFlow to be easily extensible in order to quickly integrate novel methodological developments.
PMCID: PMC2798115  PMID: 20049160
7.  Analysis of High-Throughput Flow Cytometry Data Using plateCore 
Advances in Bioinformatics  2009;2009:356141.
Flow cytometry (FCM) software packages from R/Bioconductor, such as flowCore and flowViz, serve as an open platform for development of new analysis tools and methods. We created plateCore, a new package that extends the functionality in these core packages to enable automated negative control-based gating and make the processing and analysis of plate-based data sets from high-throughput FCM screening experiments easier. plateCore was used to analyze data from a BD FACS CAP screening experiment where five Peripheral Blood Mononuclear Cell (PBMC) samples were assayed for 189 different human cell surface markers. This same data set was also manually analyzed by a cytometry expert using the FlowJo data analysis software package (TreeStar, USA). We show that the expression values for markers characterized using the automated approach in plateCore are in good agreement with those from FlowJo, and that using plateCore allows for more reproducible analyses of FCM screening data.
PMCID: PMC2777006  PMID: 19956418
8.  The KM-Algorithm Identifies Regulated Genes in Time Series Expression Data 
Advances in Bioinformatics  2009;2009:284251.
We present a statistical method to rank observed genes in gene expression time series experiments according to their degree of regulation in a biological process. The ranking may be used to focus on specific genes or to select meaningful subsets of genes from which gene regulatory networks can be built. Our approach is based on a state space model that incorporates hidden regulators of gene expression. Kalman (K) smoothing and maximum (M) likelihood estimation techniques are used to derive optimal estimates of the model parameters upon which a proposed regulation criterion is based. The statistical power of the proposed algorithm is investigated, and a real data set is analyzed for the purpose of identifying regulated genes in time dependent gene expression data. This statistical approach supports the concept that meaningful biological conclusions can be drawn from gene expression time series experiments by focusing on strong regulation rather than large expression values.
PMCID: PMC2777010  PMID: 19956417
9.  Bridging the Divide between Manual Gating and Bioinformatics with the Bioconductor Package flowFlowJo 
Advances in Bioinformatics  2009;2009:809469.
In flow cytometry, different cell types are usually selected or “gated” by a series of 1- or 2-dimensional geometric subsets of the measurements made on each cell. This is easily accomplished in commercial flow cytometry packages but it is difficult to work computationally with the results of this process. The ability to retrieve the results and work with both them and the raw data is critical; our experience points to the importance of bioinformatics tools that will allow us to examine gating robustness, combine manual and automated gating, and perform exploratory data analysis. To provide this capability, we have developed a Bioconductor package called flowFlowJo that can import gates defined by the commercial package FlowJo and work with them in a manner consistent with the other flow packages in Bioconductor. We present this package and illustrate some of the ways in which it can be used.
PMCID: PMC2775689  PMID: 19956421
10.  FlowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data 
Advances in Bioinformatics  2009;2009:193947.
A new software package called flowFP for the analysis of flow cytometry data is introduced. The package, which is tightly integrated with other Bioconductor software for analysis of flow cytometry, provides tools to transform raw flow cytometry data into a form suitable for direct input into conventional statistical analysis and empirical modeling software tools. The approach of flowFP is to generate a description of the multivariate probability distribution function of flow cytometry data in the form of a “fingerprint.” As such, it is independent of a presumptive functional form for the distribution, in contrast with model-based methods such as Gaussian Mixture Modeling. FlowFP is computationally efficient and able to handle extremely large flow cytometry data sets of arbitrary dimensionality. Algorithms and software implementation of the package are described. Use of the software is exemplified with applications to data quality control and to the automated classification of Acute Myeloid Leukemia.
PMCID: PMC2777013  PMID: 19956416
11.  A Combinatory Approach for Selecting Prognostic Genes in Microarray Studies of Tumour Survivals 
Advances in Bioinformatics  2009;2009:480486.
Unlike significant gene expression analysis, which looks for genes that are differentially regulated, feature selection in microarray-based prognostic gene expression analysis aims at finding a subset of marker genes that are not only differentially expressed but also informative for prediction. Unfortunately, feature selection in the microarray literature is dominated by the simple heuristic univariate gene filter paradigm, which selects differentially expressed genes according to their statistical significance. We introduce a combinatory feature selection strategy that integrates differential gene expression analysis with the Gram-Schmidt process to identify prognostic genes that are both statistically significant and highly informative for predicting tumour survival outcomes. Empirical application to leukemia and ovarian cancer survival data through within- and cross-study validations shows that the feature space can be largely reduced while achieving improved testing performance.
PMCID: PMC2777003  PMID: 19956419
12.  Tumor Classification Using High-Order Gene Expression Profiles Based on Multilinear ICA 
Advances in Bioinformatics  2009;2009:926450.
Motivation. Independent Components Analysis (ICA) maximizes the statistical independence of the representational components of a training gene expression profile (GEP) ensemble, but it cannot distinguish relations between the different factors, or modes, and it is not applicable to high-order GEP data mining. To generalize ICA, we introduce Multilinear-ICA and apply it to tumor classification using high-order GEP. First, we introduce the basic concepts and operations of tensors and recommend the Support Vector Machine (SVM) classifier and Multilinear-ICA. Second, the higher-scoring genes of the original high-order GEP are selected using t-statistics and tabulated into tensors. Third, the tensors are processed by Multilinear-ICA. Finally, the SVM is used to classify the tumor subtypes. Results. To show the validity of the proposed method, we apply it to tumor classification using high-order GEP. Though we use only three datasets, the experimental results show that the method is effective and feasible. Through this survey, we hope to gain some insight into the problem of high-order GEP tumor classification, in aid of developing more effective tumor classification algorithms.
PMCID: PMC2778791  PMID: 19956422
13.  The FAST-AIMS Clinical Mass Spectrometry Analysis System 
Advances in Bioinformatics  2009;2009:598241.
Within clinical proteomics, mass spectrometry analysis of biological samples is emerging as an important high-throughput technology, capable of producing powerful diagnostic and prognostic models and identifying important disease biomarkers. As interest in this area grows, and the number of such proteomics datasets continues to increase, the need has developed for efficient, comprehensive, reproducible methods of mass spectrometry data analysis by both experts and nonexperts. We have designed and implemented a stand-alone software system, FAST-AIMS, which seeks to meet this need through automation of data preprocessing, feature selection, classification model generation, and performance estimation. FAST-AIMS is an efficient and user-friendly stand-alone software for predictive analysis of mass spectrometry data. The present resource review paper will describe the features and use of the FAST-AIMS system. The system is freely available for download for noncommercial use.
PMCID: PMC2775698  PMID: 19956420
