The QUAliFiER Workflow
The workflow for using QUAliFiER is relatively straightforward. It involves importing the data, extracting cell population statistics, defining QA tasks, performing outlier calling, and then generating a quality assessment report. The first three steps are handled by the QUALIFIER function, which essentially combines the different pieces of information necessary to perform QA on a dataset. A more detailed description of these steps follows.
Importing the QA gating template with flowWorkspace
The flowWorkspace package is used to import the gating template from the FlowJo workspace for the ITN study dataset into R. Note that although the workflow presented here uses template gates, the approach can be applied without loss of generality to any set of samples that has been gated in the same manner (i.e. where corresponding populations can be identified across samples). This template includes the set of hierarchical manual gates that identify all the cell sub-populations of interest for QA (Figure 1A). A call to openWorkspace creates a flowJoWorkspace object from the XML workspace file. Then parseWorkspace reads the template and constructs the necessary R-level objects for the gates, compensation matrices, and data transformations. Optionally, it also reads in the FCS files and performs the preprocessing and gating while calculating the population statistics for each gated cell population. The results for each FCS file are stored in an object of the GatingHierarchy class, with multiple files grouped in a GatingSet object. This is the object that is ultimately passed on to the QUAliFiER package.
Figure 1 Functionality of flowWorkspace applied to the QA template gating hierarchy of the test data in the QUAliFiER pipeline. A) The design ...
The gating hierarchy for any sample can be inspected via plot (Figure 1B), and the success of the import procedure can be verified via the concordance of the imported cell counts against FlowJo’s cell counts using plotPopCV (Figure 1C). Slight discrepancies (a few fractions of a percent in the coefficient of variation) are due to FlowJo’s quantization of the data transformation function, which must be interpolated by flowWorkspace. Larger CVs may indicate either errors in the import process or small populations (containing few cells), where differences of two or three cells between the computed and imported counts result in a large coefficient of variation. Individual gates and samples can be visualized with plotGate (Figure 1D) to inspect populations flagged with a large coefficient of variation. Importantly, these statistics and plots can be exported (via ExportTSVAnalysis) to the LabKey tool, which provides a web-based front-end for visualizing gated flow cytometry data.
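To see why small populations inflate the coefficient of variation, the following base R sketch computes the CV between a FlowJo count and a recomputed count for one large and one small population. The helper name count_cv and the example counts are hypothetical, and the statistic plotted by plotPopCV may be computed differently.

```r
# Coefficient of variation between two counts of the same population:
# standard deviation of the pair divided by the mean of the pair.
count_cv <- function(flowjo_count, computed_count) {
  counts <- cbind(flowjo_count, computed_count)
  apply(counts, 1, sd) / rowMeans(counts)
}

# Hypothetical counts: a large and a small population, each off by 3 cells.
flowjo   <- c(large = 50000, small = 12)
computed <- c(large = 50003, small = 15)

cv <- count_cv(flowjo, computed)
print(signif(cv, 3))
```

The same three-cell difference is negligible for the large population but produces a CV above 10% for the small one, which is why a large CV alone does not prove an import error.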
Extracting population statistics
After importing the data from FlowJo, QUAliFiER extracts population statistics from the GatingSet (internally via the getQAStats function) and stores them in a local database. Subsequent quality assessment makes use of this database to rapidly query and manipulate the data. QUAliFiER can apply filters to the population statistics and perform outlier calls based on grouping and conditioning variables defined in the associated study metadata. Each quality assessment task is defined in a qaTask object. The details for all the qaTasks are provided in a qaTask definition file (described below), whereas the study metadata is supplied as an associated comma-separated value (CSV) file. This file associates each FCS data file with study metadata (e.g. subject, date, dose, aliquot, and so forth). The GatingSet, qaTask definition file, study metadata file, and database connection are passed to the qaPreprocess() function, which extracts and combines the relevant information from each source into a coherent data structure. Importantly, the QUAliFiER package can be used to QA any manually gated data file format supported by flowWorkspace and is not limited to the template gating QA process highlighted here. Additionally, QUAliFiER can be used in a stand-alone fashion to perform QA on a set of extracted cell-population statistics and study metadata. flowWorkspace simplifies access to extracted statistics, but is not strictly required for use with QUAliFiER.
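As a rough illustration of the stand-alone case, the sketch below joins a table of extracted statistics to study metadata by FCS file name using base R. The column names (name, population, proportion, Tube, RecdDt) and all values are hypothetical, and this is not the package's internal storage format.

```r
# Hypothetical extracted statistics: one row per FCS file and population.
stats <- data.frame(
  name       = c("a.fcs", "a.fcs", "b.fcs", "b.fcs"),
  population = c("MNC", "WBC_perct", "MNC", "WBC_perct"),
  proportion = c(0.62, 0.91, 0.58, 0.76),
  stringsAsFactors = FALSE
)

# Hypothetical study metadata; in practice this would come from read.csv().
metadata <- data.frame(
  name   = c("a.fcs", "b.fcs"),
  Tube   = c("CD8/CD25/CD4/CD3/CD62", "CD8/CD25/CD4/CD3/CD62"),
  RecdDt = as.Date(c("2007-05-01", "2007-05-08")),
  stringsAsFactors = FALSE
)

# Attach metadata to every statistic so later steps can group and condition.
qa_table <- merge(stats, metadata, by = "name")
print(qa_table)
```

Once every statistic carries its metadata, grouping and conditioning variables can be applied to the combined table directly.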
A qaTask defines a specific quality assessment procedure and requires the following information:
1. The specific cell population or gate for QA.
2. The cell population statistic (e.g. counts or proportion) to QA.
3. The metadata variables for stratification and outlier calling.
4. How to present the data to the user (i.e. plot type).
The qaTask class is a general container that allows users to define different quality assessment tasks using the information above. The class uses R’s familiar formula interface as a compact and flexible description of the QA task. Briefly, it is generally of the form y ~ x | g1 * g2 * ..., where y is the population statistic to monitor and takes four possible values:
• MFI: Median or Mean Fluorescence Intensity of the cell population (the mean or median is user–defined).
• proportion: The percent of the parent population represented by the population being QC’d.
• count: The number of events in the cell population.
• spike: Applicable to each channel of an FCS file measured over the acquisition time. A windowed, cumulative Z–score that quantifies spikes in the MFI of a channel over the acquisition time of the sample. In the absence of spikes, this is approximately zero.
On the right-hand side of the formula, x specifies the x-axis variable for plotting. It can be any variable defined in the associated study metadata, such as date or sample id. Variables to the right of the vertical bar are conditioning (grouping) variables used to stratify the population statistics for outlier detection. These must also appear in the study metadata. Outlier detection is performed within each level of the cross product of the grouping variables. If they are omitted, outlier detection is performed on the entire set of samples.
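The roles of the three parts of the formula can be sketched in base R as follows. The helper parse_qa_formula and the example data are hypothetical, and the package's own formula handling may differ. The sketch only shows how a formula of this shape maps to a statistic, an x-axis variable, and strata.

```r
# Split a QA formula of the form y ~ x | g1 * g2 into its components.
parse_qa_formula <- function(f) {
  rhs <- f[[3]]
  has_bar <- is.call(rhs) && identical(rhs[[1]], as.name("|"))
  list(
    statistic = all.vars(f[[2]]),
    x         = all.vars(if (has_bar) rhs[[2]] else rhs),
    groups    = if (has_bar) all.vars(rhs[[3]]) else character(0)
  )
}

parts <- parse_qa_formula(proportion ~ RecdDt | Tube)
str(parts)

# Hypothetical statistics already joined to study metadata.
qa_table <- data.frame(
  proportion = c(0.91, 0.89, 0.76, 0.93),
  RecdDt     = as.Date(c("2007-05-01", "2007-05-08", "2007-05-01", "2007-05-08")),
  Tube       = c("T1", "T1", "T2", "T2")
)

# Outlier detection would then run separately within each stratum;
# without conditioning variables, all samples form a single stratum.
strata <- if (length(parts$groups) > 0) {
  split(qa_table, qa_table[parts$groups], drop = TRUE)
} else {
  list(all = qa_table)
}
print(names(strata))
```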
The qaTask also requires a plot type to be specified. This can be any of the standard lattice plot types, such as xyplot or bwplot. The plot type defines how the data will be summarized and presented to the user. QUAliFiER defines some default qaTasks, such as monitoring the stability of the MFI for a channel over time, or monitoring the variation in the percentage of a cell population within and across aliquots (Figures 2 and 3).
Figure 2 QA result of MFI stability vs time for the FITC channel. We can see clear examples where the MFI is not stable over time, i.e. it is either increasing (HLADR, CD8, CD11c), or decreasing (LD). Some stains show residuals that are not normally distributed, ...
Figure 3 Consistency of the mononuclear cell gate across aliquots. The plot shows the consistency of the MNC population across aliquots (coresampleid). The plot type is bwplot, and the formula for generating this output is coresampleid~proportion, while ...
The cell population to be monitored by the qaTask is passed as a name to the pop argument of the qaTask constructor. All of this information (the formula, population name, plot type, and other details) can be provided for all the qaTasks to be performed on a data set via an external csv file passed to the qaPreprocess function. Internally, the makeQaTask function can read a set of these task definitions from the csv file and construct all the qaTask objects simultaneously. Users may also create individual qaTasks directly via the new method.
Aggregate QA populations
The population name defining a qaTask generally refers to a unique gated cell population, either via the terminal gate name (e.g. “WBC_perct”), or via a unique gating path (e.g. “/MNC/FITC-A MFI”) (Figure 1B). QUAliFiER also supports aggregating populations using common portions of gate names (e.g. “MFI” or “margin”) (Figure 1B). The tool supports regular expression and substring matching to select multiple, non-unique cell populations for quality assessment. In this way, the population “MFI” selects all five terminal populations matching the string “MFI”, which can then be visualized simultaneously in separate plot panels, with each panel representing a different channel, as defined in the formula (see Figure 2). Aggregating multiple cell populations in this way provides further flexibility to tailor the quality checks to the needs of the user. This aggregate approach is used throughout the template gates applied to the sample data set in this paper.
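Substring and regular expression matching over gating paths can be sketched with base R's grepl. The helper select_populations and most of the gating paths below are hypothetical; only “WBC_perct” and “/MNC/FITC-A MFI” are taken from the text, and the package's own matching rules may differ.

```r
# Hypothetical gating paths, one per gated population.
gating_paths <- c(
  "/MNC",
  "/MNC/FITC-A MFI",
  "/MNC/PE-A MFI",
  "/MNC/APC-A MFI",
  "/WBC_perct",
  "/margin/FSC-A margin"
)

# Select every population whose path matches a substring or a regex.
select_populations <- function(paths, pattern, fixed = FALSE) {
  paths[grepl(pattern, paths, fixed = fixed)]
}

# Substring match: all populations that carry "MFI" in their name.
mfi_pops <- select_populations(gating_paths, "MFI", fixed = TRUE)

# Regular expression: only terminal gates whose name ends in "margin".
margin_pops <- select_populations(gating_paths, "margin$")

print(mfi_pops)
print(margin_pops)
```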
Outlier Detection and Visualization
Once data is imported and quality assessment tasks are defined, the qaCheck and plot methods perform the quality assessment and visualization based on the definitions stored in each qaTask object.
The actual outlier calls are performed by the qaCheck method. The method reads the population statistics from the database and performs outlier detection within the groups defined in the formula. The qaCheck method can accept a default or user–defined outlier detection function.
The package defines several outlier functions for general use in common QA tasks. These are summarized in Table 1. Briefly, the outlier.cutoff function is used to call outliers based on a threshold value of the statistic. The outlier.norm function is used for outlier detection in most QA tasks. It models the data within each group using a normal distribution with a robust estimate of the mean and variance. Outlier calls are made based on a threshold, α (significance level), or a Z-score cutoff, either of which can be provided as an argument to the function, which also allows for one- or two-sided tests. If the plot type is bwplot (box plot), then outlier.norm is used to call between-group outliers (i.e. boxes with a larger than expected variation) with a default Z-score cutoff of 3, based on the distribution of the log-transformed IQRs (interquartile ranges) of the groups. If the plot type is xyplot, the user can add a regression line to the plot via the rFunc argument (defaults to rlm robust regression). Individual observations are then flagged as outliers based on the residuals. qoutlier implements the default box-plot outlier detection algorithm for outlier calls within groups, flagging any observation that lies more than 1.5 × IQR beyond the quartiles of its group.
Summary of outlier detection methods in the QUAliFiER package
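The three kinds of rule described above can be sketched in base R as follows. The function names are hypothetical stand-ins and are not the package's outlier.cutoff, outlier.norm, or qoutlier. The use of the median and MAD as the robust location and scale estimates is an assumption, since the text does not say which robust estimator the package uses. The between-group rule on log-transformed IQRs is not covered here.

```r
# Threshold rule: flag values outside fixed bounds.
cutoff_outliers <- function(x, lower = -Inf, upper = Inf) {
  x < lower | x > upper
}

# Robust normal rule (assumed estimator: median and MAD): flag values
# whose robust Z-score exceeds the cutoff in absolute value.
robust_norm_outliers <- function(x, z_cutoff = 3) {
  z <- (x - median(x)) / mad(x)
  abs(z) > z_cutoff
}

# Box-plot rule: flag values more than k * IQR beyond the quartiles.
boxplot_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - k * iqr | x > q[2] + k * iqr
}

# Hypothetical proportions for one group; the last sample is aberrant.
x <- c(0.90, 0.92, 0.91, 0.89, 0.93, 0.90, 0.91, 0.55)

print(which(cutoff_outliers(x, lower = 0.8)))
print(which(robust_norm_outliers(x)))
print(which(boxplot_outliers(x)))
```

In a stratified task, each rule would be applied separately within each group defined by the conditioning variables.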
The qaCheck method records the outlier calls in the database. Plots can be generated without outlier detection by simply omitting the call to qaCheck. In some applications it may be desirable to simply examine trends rather than make explicit outlier calls (e.g., for monitoring MFI stability over time, Figure 2).
We show an example for monitoring the efficiency of red blood cell lysis (Figure 4) from the ITN data set. Efficiency of lysis is measured as the fraction of total cells collected in the WBC_perct (white blood cell) gate (Figure 1B). The qaTask definition used to monitor this population statistic over time, conditioning on staining panel (Tube), is:
Figure 4 Consistency of red blood cell lysis across staining panels. If red blood cells are not properly lysed, they will be detected as events in the FCM experiment. Under ideal conditions, only white blood cells would be detected. The outlier threshold is set ...
Description="Sufficient RBC lysis",
plotType="xyplot", qaName="RBCLysis", qaID=1L, db=db)
Level : Tube
Description : Sufficient RBC lysis
Plot type : xyplot
Gated node : WBC_perct
Default formula : proportion ~ RecdDt | Tube
The call to data loads the study data that has already been parsed and combined with metadata and quality assessment tasks as defined in the previous section. When constructing a qaTask via new it is also necessary to supply a unique qaID, and the database (an environment) holding the extracted statistics and metadata (this is initially passed to the qaPreprocess() function, where it is populated).
To perform the outlier detection, the qaCheck function is called on the rbc.lysis task and the results are stored in the database. A call to the plot method then generates the summary plot in Figure 4, passing additional plotting parameters via the par argument.
The plot method generates figures summarizing the outlier detection and quality assessment checks. It takes the qaTask as an argument, as well as options similar to those of the lattice package, such as subset, which allows a subset of the levels in the grouping variables to be plotted. For example, samples can be subset based on a range of dates, or the plot of the quality assessment task defined above could be restricted to samples within a single staining panel (Tube) by passing subset=Tube%in%'CD8/CD25/CD4/CD3/CD62' to the plot method. This allows for flexibility in visualizing or analyzing subsets of the data.
Adding robust regression lines to scatterplots
As data accumulates over the course of a study (e.g. a longitudinal study), QUAliFiER stores this data in the QA database, and it becomes trivial to monitor trends in data collected over longer periods of time. As an example, the QA task for monitoring fluorescence stability in the FITC channel over time benefits from the addition of a robust regression line to the output plots within each panel in order to identify groups of samples where there are either non–linear effects or where the MFI is not stable over time (i.e. the slope of the regression is significantly different from zero). The outlier detection task for this procedure is defined in the following way:
"Fluorescence stability vs time",
> plot(MFIvsTime,y=MFI~RecdDt|stain
Note the rFunc argument to the qaCheck and plot functions. It allows us to fit a robust linear regression within each group in order to help visualize the changes in MFI over time. Outliers within each level of the stain grouping factor are detected based on the deviation of the residuals from the regression line. By default, an observation is called an outlier when the absolute Z-score of its standardized residual exceeds 3 (Figure 2). If the qaCheck call is omitted, but rFunc is passed to the plot function, the resulting plots will be generated without outlier detection, which may be desirable in some circumstances. Importantly, all the qaTask definitions can be pre-defined in a csv file read in by qaPreprocess(), with column names for each argument to the qaTask constructor.
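The residual-based rule can be sketched outside the package as follows. The data are synthetic, the variable names are hypothetical, and standardizing the residuals by their median and MAD is an assumption rather than the package's documented method. The sketch uses rlm from the MASS package when it is installed and falls back to ordinary least squares (lm) otherwise.

```r
# Synthetic MFI values that drift upward over 30 days, with bounded
# deterministic noise and one aberrant sample on day 17.
day <- 1:30
mfi <- 1000 + 5 * day + 10 * sin(day)
mfi[17] <- mfi[17] + 400

# Robust regression if MASS is available, ordinary least squares if not.
fit <- if (requireNamespace("MASS", quietly = TRUE)) {
  MASS::rlm(mfi ~ day)
} else {
  lm(mfi ~ day)
}

# Standardize residuals robustly and flag those beyond a Z-score of 3.
res <- residuals(fit)
z <- (res - median(res)) / mad(res)
flagged <- unname(which(abs(z) > 3))

print(coef(fit))
print(flagged)
```

A slope that differs clearly from zero, as in this synthetic series, indicates an MFI that is not stable over time even when no individual sample is flagged.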
Summary of the Quality Assessment Report for an ITN Clinical Trial Dataset
The flowWorkspace and QUAliFiER packages were applied to a dataset of 3000 FCS files from the Immune Tolerance Network. The QA report identified instances where issues with sample quality merited further review by domain scientists. The stability of the MFI (Figure 2) for each antibody stain showed non-linear effects and changes in stability in some instances, which may have been associated with experimental factors such as changes in the intensity of the staining antibody. The consistency of lymphocyte gating across sample aliquots identified several instances where an elevated amount of debris in the sample resulted in a lower proportion of lymphocytes and mononuclear cells in the MNC gate (Figures 3 and 6). Evaluation of redundant staining (see the QA report online, and Figure B, C) across sample aliquots allowed for rapid identification of samples with inconsistent staining. The quality of individual aliquots was evaluated by looking at the number of events collected for each aliquot, and identifying those samples where fewer events than expected were collected (see the QA report online). Another approach to assess the quality of individual aliquots was to examine the consistency of lysis of red blood cells in each aliquot (Figure 4). Aliquots with fewer than 80% of red blood cells lysed were flagged for further investigation. Instrument stability during sample acquisition was evaluated by monitoring spikes or drifts in each measured channel for each sample (see the QA report online). Plots of cumulative Z-scores of those drifts or spikes allowed identification of samples that showed significant deviations.
Figure 6 Dot plots of outlier MNC samples. Dot plots of the MNC gates for coresampleid 11732, one of the group outliers identified by the MNC stability qaTask. Samples with lower proportions of lymphocytes inside the gate are readily visible, caused by elevated ...
One of the key advantages of QUAliFiER is that it provides an integrated environment for review of quality assurance data by flow domain experts. In the past, the flow analyst would either spot check and manually review plots within flow gating software tools or have data exported from such tools into spreadsheets for sorting, plotting, and viewing of trends over time or across tubes. Should specific anomalies be found, the analyst would have to shuffle between applications, sort through files to review plots within the flow gating tool, and return to summary statistics or plots of trends for confirmation. This disjointed process was cumbersome.
QUAliFiER takes much of this frustration out of the process so that domain scientists can focus on the scientific questions of interest. It should be noted that the intended use of QUAliFiER, whether in a research or clinical trial setting, is for the flow cytometry domain expert to always review trends and patterns rather than simply rely on automatic exclusion of flagged samples. There may be instances where a trend is due to administration of therapy or another clinical event of interest. In those instances, having the system within the R/BioConductor framework allows us to easily overlay QA concerns with potentially biological events in an integrated, seamless fashion, further demonstrating the ease and utility of the tool. To our knowledge it is the first tool to integrate this level of extensive quality assessment for large-scale gated FCM data in a cohesive pipeline.
Ongoing improvements to the software include complete FlowJo support, as well as FACSDiva (BD Biosciences, San Jose, CA) experiment files, improvements to the HTML report formatting, and generation of PDF output for quality assessment reports. The tool will also be integrated into LabKey (Seattle, WA).
The features and description of the software herein refer to flowWorkspace version 1.2 and QUAliFiER version 1.0.1, found at the BioConductor website (see Availability and Requirements). The development version of flowWorkspace supports Windows and Mac versions of FlowJo, including the latest version (version X, Chimera), which is Gating-ML compliant. Support for BD’s (Franklin Lakes, New Jersey) FACS DiVA is actively being developed, and the next release of flowWorkspace will support some of the most frequently used manual gating tools (DiVA and FlowJo reach approximately 50% of users). flowWorkspace data import and gating has also been reimplemented in C++ in the development release, for a 100-fold speed-up over the current R-only version of the package.