Today, commercial flow cytometers capable of measuring 10 parameters are common, and those capable of 12–20 parameters are widespread. This rapid dissemination of hardware technology has revealed pressing needs in other areas: 1) practical and efficient methods for instrument calibration and quality control (QC), 2) availability of a wider variety of fluorochromes and antibody conjugates, 3) assistance with experimental design, and 4) better strategies for data analysis.
Instrument Calibration and QC
The success of polychromatic flow cytometry depends critically on instrument calibration; however, the old paradigm of analyzing unstained cells and adjusting the gain (voltage) of every detector until the signal falls within the second log decade persists. This method is problematic: it neither considers the precision of fluorescence measurements when choosing voltages, nor ensures that the signal from a particular fluorochrome is detected maximally in the channel designated for it. The former can impair resolution of dim populations by increasing the “spillover spreading” of data28, while the latter complicates compensation for spectral overlap. Most critically, the method places enormous importance on the ability to quantify “negative” signals (unstained cells), where the relative error in measurement is huge. The consequence can be poor performance of staining panels and an inability to interpret experimental data.
In recent years, two systems for instrument calibration have emerged. Becton Dickinson (BD) has developed a proprietary system, known as Cytometer Setup and Tracking (CS&T), for integration with its instruments and analysis software. The cornerstone of this system is a set of beads that yield varying amounts of fluorescent signal, mimicking positive, dim, and negative staining. Analysis of these beads (performed automatically within the instrument software) checks laser and detector performance and linearity, and reports the gain settings that minimize the CV of dim populations. The method is fast and automatically provides a wealth of information about instrument performance, which can be tracked over time as a QC tool. However, there are important caveats beyond the system’s restriction to instruments from this single manufacturer. First, the signal of the brightest peak in each channel must be calibrated against a sample stained with the antibodies used in the experiment; thus, the calibration routine requires a stained sample and must be repeated every time a new staining panel is employed. Second, the algorithm requires that the positive signal be limited to roughly 10× the background signal. This setting is arbitrary and may result in suboptimal quantification of extreme signals (very bright markers). Third, the method is currently incompatible with high-power lasers, which perform so well that there is no measurable difference between the CVs of the lower and upper peaks. This is an important limitation, because it excludes green-laser systems, which have been shown to provide much improved resolution of signals from PE and PE-based tandems compared to blue lasers. It also excludes high-power red lasers, where the increased generation of photons may help resolve far-red fluorochromes. Finally, the beads do not employ dyes commonly used in flow cytometry and vary from lot to lot. As such, calculations must be performed by downloading conversion factors, both for flow cytometry dyes (some of which may not be supported by BD) and for the lot identities.
A second system for instrument calibration was developed in our lab and has been described previously29. Briefly, this method employs antibody capture beads, which can be incubated with the fluorochrome-conjugated antibodies commonly used in experiments. In this way, optimal gain values are calculated using all the fluorochromes the investigator might use. The beads are manufactured with varying amounts of capture antibody, so that they provide a mix of five fluorescence levels (not simply bright, dim, and negative), thereby giving more information about the resolution of dim markers. When the beads are analyzed, the CVs of the signals are determined across a range of gain settings, and the voltage that provides the lowest CV while still giving the brightest signal is chosen. Singly-stained compensation beads, and cell samples stained with a prototypic panel, are then analyzed to verify that the gain settings are optimal. This method has important advantages over the CS&T system: first, it is applicable to all experiments, regardless of the fluorochromes chosen. Second, the calibration need not be performed daily (in contrast to the CS&T system). Third, it allows more consistent instrument monitoring, using an independent bead system (8-peak rainbow beads).
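The voltage-selection rule described above can be sketched as a small search over a calibration sweep. The numbers below are invented for illustration, and taking the lowest voltage at which the dim peak's CV has plateaued is just one common reading of the rule, not the published algorithm:

```python
import numpy as np

# Hypothetical calibration sweep for one detector: robust CVs of a dim
# bead peak measured across a range of PMT voltages (illustrative only).
voltages = np.array([350, 400, 450, 500, 550, 600, 650])
cvs      = np.array([18.0, 9.5, 6.2, 5.1, 4.9, 4.8, 4.8])  # percent

def choose_gain(voltages, cvs, tolerance=0.05):
    """Pick the lowest voltage whose CV is within `tolerance` (fractional)
    of the best CV observed, i.e. the point where further gain increases
    no longer improve resolution of the dim peak."""
    best = cvs.min()
    on_plateau = cvs <= best * (1 + tolerance)
    return int(voltages[on_plateau].min())

print(choose_gain(voltages, cvs))  # lowest voltage on the CV plateau
```

In this sweep the CV stops improving around 550 V, so raising the voltage further only pushes bright signals toward the top of scale without gaining resolution.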
The next generation of cytometry will address the major limitations of these two systems. First, automated analysis of calibration data (possible to some extent with the CS&T system, but not at all with ours) will likely become a reality soon, as flow cytometry software vendors offer calibration platforms. Second, these systems will eventually employ better measures of instrument performance than changes in gain settings over time or the CVs of stained populations. Those numbers can vary dramatically from detector to detector with no consequence for cell staining, or remain consistent even in the face of dramatic changes, because the variance in detector performance is relatively large. Therefore, new measures that are applicable across detectors and from instrument to instrument are needed. Two relevant measures should be considered in this regard: the efficiency (Q) of a detector in measuring a particular fluorochrome, and the optical background (B) or electronic noise. With such measures, it may eventually be possible to standardize performance across instruments, a great advantage over current systems, which can only standardize the behavior of a single instrument over time.
A Wider Variety of Fluorochromes and Antibody Conjugates
To perform (up to) 18-color flow cytometry analyses, a wide range of reagents is required. At a minimum, 18 different fluorescent dyes must be available, and a library of multiple antibodies (both in terms of specificity and clone) must be conjugated to each dye. In practice, many more fluorochromes are needed, since some fluorochrome conjugates of the same antibody perform better than others.
For many years, the variety of antibody-fluorochrome conjugates available was limited. This was especially true for UV- and violet-laser-excited fluorochromes. With the introduction of quantum dots in 2004, seven or eight additional parameters could be measured in flow cytometry experiments, with excellent sensitivity. However, despite their utility, the availability of these materials has been limited. Moreover, many conjugates have not been suitable for intracellular staining (because of unpurified free dye)30. Finally, quantum dot fluorescence can be fatally compromised by exposure to low-level heavy metal contaminants in buffers31.
Common fluorochromes used in flow cytometry, and the variety of reagents commercially available for each
Recently, the development of a new class of dyes has been reported. Brilliant Violet (BV) dyes were developed as a consequence of the Nobel Prize-winning finding that some organic polymers can conduct electrons as well as inorganic materials do. This led to the development of π-conjugated polymers, in which each monomer harvests laser light and transfers the resulting excitation energy along the polymer chain; the collected energy is cooperatively emitted as fluorescence, providing a highly sensitive probe. In fact, the first of these dyes, emitting light at 421 nm (BV421), has a staining index rivaling that of PE. Because the design of these polymers is well defined, conjugation to antibodies occurs at precise sites (limiting aggregation and multimeric complexes), and the materials are very stable. These dyes are suitable for high-sensitivity applications, such as the identification of antigen-specific T cells (by peptide-MHC class I multimer or CD154 staining), and can easily be substituted for Pacific Blue in multicolor staining panels. Notably, BV421 can be conjugated to other fluorochromes to make tandem dyes that emit light at a wide variety of wavelengths. These tandem dyes can be used intracellularly, providing a new, sensitive option for intracellular detection from the violet laser.
Properties and applications of BV421
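The staining index cited above is conventionally computed as the separation between the positive and negative medians, scaled by twice the spread of the negative population. A minimal sketch with purely illustrative values:

```python
import statistics

def staining_index(pos, neg):
    """Conventional staining index: separation of the positive population
    from background, scaled by twice the spread of the negatives."""
    return ((statistics.median(pos) - statistics.median(neg))
            / (2 * statistics.stdev(neg)))

# Illustrative values only (arbitrary fluorescence units).
pos = [980, 1020, 1000, 995, 1005]
neg = [48, 52, 50, 49, 51]
si = staining_index(pos, neg)
```

Because the denominator is the spread of the negatives, a dye can score well either by being bright or by producing a tight background distribution.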
The Challenges Associated with Experimental Design
A major obstacle in polychromatic flow cytometry is the development of optimized staining panels. Without carefully constructed panels, artifacts in staining patterns may be observed (caused by incorrect compensation or incompatible antibody combinations) and reproducibility may suffer. The importance of optimized panels cannot be overemphasized. However, panel design is a lengthy and labor-intensive process. The process is reviewed elsewhere32, but briefly, it consists of the following steps: 1) stratification of the markers into those required for the study, those that would be useful, and those that could be interesting but are not necessary; 2) purchase and titration of a wide variety of antibody reagents specific for the markers under consideration; 3) ranking of the best antibody reagents for each marker; 4) characterization of expression levels (does the marker exhibit on/off expression, a continuum of staining, or is it dim?); 5) construction of putative panels by placing the dim markers on the brightest channels (PE, APC, and now BV421) and markers with on/off expression on the dimmest channels (QD565) or channels that are subject to significant spreading (Cy7APC); and 6) testing of the panels by staining tubes with increasing numbers of reagents, and examining the staining patterns and spreading of each panel upon the addition of each new antibody. Repeated iteration of the last two steps with alternate reagent combinations is needed to optimize the panel.
As with instrument calibration, there are opportunities to improve this process using automated tools. In fact, such tools could be integrated into instrument calibration software by using calculated Q and B values to determine the minimal signal that can be separated from background for each detector. This information could be used to determine the best channels for markers with on/off, continuum, or dim expression, and to build putative panels. Using theoretical compensation matrices for each putative panel, built from the titration data of each reagent, the spreading error in each detector could be calculated to predict experimental results. Dim markers that overlap with the negative population, after adjustment of spreading, could be flagged, or excluded, so that a final list of optimal staining combinations would be reported to the investigator.
Better Strategies for Data Analysis
Once instrument calibration, reagent choice, and panel design are complete, investigators face the enormous hurdle of data analysis. A fundamental advantage of polychromatic flow cytometry is the ability to examine finely defined subsets of cells. A few such subsets can be identified a priori by the investigator; however, in typical experiments, there is an interest in exploring hundreds of phenotypic combinations (spanning all the markers). A number of tools and approaches are currently being evaluated for this purpose; these include multidimensional visualization (exploration) tools, gating tools, and post-analysis data aggregation tools.
When multiple markers and hundreds of phenotypic combinations are available for exploration, the ability to visualize data in multiple dimensions becomes important. To this end, polychromatic plots33 have been developed. These are similar to standard dot plots, with the important exception that the color of each dot varies according to the expression of three other markers. Thus, every event is encoded with a shade of red, green, and blue to reflect its expression levels in three additional dimensions. The function that encodes the color mapping can be altered, as can the priority of colors/markers, so that various populations can be emphasized. In this way, a two-dimensional dot plot is translated into a five-dimensional visualization. Although great care must be taken in analyzing and interpreting data generated this way, the method is powerful and accessible (since it preserves the familiar dot plot format).
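The color-encoding idea can be sketched in a few lines: two markers supply the dot-plot axes, and three more are mapped onto the red, green, and blue channels of each dot. The linear rescaling and the marker-to-color assignment below are arbitrary choices, standing in for the adjustable mapping function described above:

```python
import numpy as np

rng = np.random.default_rng(0)
events = rng.random((1000, 5))   # synthetic data: 5 markers per event

# Dot-plot axes come from markers 0 and 1; markers 2-4 become the red,
# green, and blue intensity of each dot.
x, y = events[:, 0], events[:, 1]
rgb = events[:, 2:5].copy()

# Rescale each color channel to [0, 1]. The mapping (linear here) and
# the marker-to-color priority are both adjustable.
rgb = (rgb - rgb.min(axis=0)) / (rgb.max(axis=0) - rgb.min(axis=0))

# With matplotlib, plt.scatter(x, y, c=rgb, s=4) would render the
# resulting five-dimensional view.
```

Swapping which marker drives which color channel, or replacing the linear rescale with a nonlinear one, changes which populations stand out, which is exactly the "priority" adjustment mentioned in the text.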
When gates defining each phenotypic combination are required for downstream analysis, Boolean algorithms are helpful. Such algorithms require only a single gate identifying positive cells for each marker; from these, negative gates are imputed, and Boolean gates covering every possible combination are constructed. For example, if CD45RA, CCR7, and CD27 are put into the algorithm, the gates defining the following cell types are constructed: CD45RA+ CCR7+ CD27+, CD45RA+ CCR7+ CD27−, CD45RA+ CCR7− CD27+, CD45RA+ CCR7− CD27−, etc. This allows rapid enumeration of cells expressing these combinations of markers, through automated construction of the series of gates necessary for identification. A disadvantage of this tool is that it assumes all subsets can be discriminated with equal sensitivity; this may not always be the case.
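Enumerating the Boolean combinations is mechanical: n single-marker gates yield 2**n phenotypes. A sketch:

```python
from itertools import product

markers = ["CD45RA", "CCR7", "CD27"]

# One positive gate per marker implies a negative gate; the full set of
# phenotypes is every +/- assignment across the markers (2**n gates).
phenotypes = ["".join(f"{m}{sign}" for m, sign in zip(markers, signs))
              for signs in product("+-", repeat=len(markers))]

print(len(phenotypes))   # 8 phenotypes for 3 markers
print(phenotypes[0])     # CD45RA+CCR7+CD27+
```

For an 18-color panel this explodes to 2**18 (262,144) combinations, which is why automated gate construction, rather than manual drawing, becomes essential.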
In addition to the tools available for visualization of staining patterns, gating, and phenotyping, specialized software can aggregate the frequency of every cell type across multiple specimens. For example, SPICE software34 performs this function and links in categorical data (time point, disease condition), allowing rapid statistical comparison of cell frequencies across many different conditions. In addition, the complete dataset can be visualized as scatter plots, bar graphs, or pie charts, and overlaid with categorical variables. Finally, data can be normalized against background biological controls, as is required for intracellular cytokine assays (where data from mock-stimulated control samples are subtracted from each condition).
There are a number of considerations when employing these and similar approaches. First, when performing hypothesis-driven research, a single subset (or a few related subsets) is identified to test against a biological or disease outcome. However, this ignores the bulk of the data generated in the experiment and limits the ability to detect new relationships. Using Boolean gating and SPICE, the majority of the data can be examined; however, many users test only the terminal phenotypes defined by the Boolean gates (for example, CD45RA− CCR7+ CD127+ CD27+ CCR5− CD57− CD28− cells) against the outcome. What if the cell type relevant to disease is defined by some lesser combination of markers (a parent population, such as CD45RA− CD127+ CD57− cells)? Such tests are simple to perform in SPICE, but the software cannot adjust for multiple comparisons, a need that is compounded by the additional populations tested. To compensate, the p-value threshold at which differences are considered significant must be lowered in SPICE.
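The manual correction described above can be as simple as a Bonferroni adjustment; this is named here as one standard option, since SPICE does not prescribe a method:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate
    at `alpha` across n_tests phenotype comparisons."""
    return alpha / n_tests

# E.g. testing all 2**8 = 256 Boolean phenotypes of an 8-marker panel:
threshold = bonferroni_threshold(0.05, 256)
print(threshold)
```

With 256 phenotypes, a difference must reach p < ~0.0002 rather than the usual 0.05 before it is declared significant; including parent populations in the test set raises n_tests, and the threshold shrinks accordingly.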
The next generation of cytometry tools may automate many of these processes. Using tools like FlowMeans35, gates can be identified automatically (with user input if needed) by computational methods that examine the pattern of staining, and Boolean gates for every combination of markers can be constructed. Thus, the subjectivity and labor of gating are partially eliminated, and a greater number of subsets (e.g., “parent” populations) are included in the analysis.
A related tool, FlowType, takes these phenotypes and tests their relationship to a biological measurement such as clinical outcome. For example, the tool can identify the cell types for which the association between subset frequency and patient survival time is statistically significant. Since tens of thousands of phenotypes can be tested in such a scheme, statistical significance must be adjusted for multiple comparisons. A subsequent analysis can test the correlation between all significant phenotypes and define clusters of phenotypes that are highly related. From each cluster, a representative phenotype is chosen, and each of the markers defining that phenotype is tested for whether it is necessary for predicting the biological outcome. Markers with less impact can be dropped from the phenotype, thereby reducing a dataset with tens of thousands of cell populations to only the simplest phenotypes with the strongest association to biological outcome. By defining the simplest biologically relevant phenotypes, this approach informs future experiments (which can eliminate the less relevant markers) and allows researchers in resource-poor or strictly regulated clinical settings to design the simplest, most meaningful immunophenotyping panels possible.
It is important to realize that these approaches rely on clusters of cells (cell populations) that are tested for biological importance. However, what if the cells of interest do not fall into a distinct cluster defined by a particular phenotype, but are instead enriched within a region of multidimensional space that cuts across defined gates? Two rarely employed methods, known as probability binning and frequency difference gating36,37, are useful in this regard. The process can be described as follows. First, data from patients within the same disease group are concatenated into a single file; this has the important advantage of increasing the number of events examined, particularly for rare, antigen-specific responses. Data from one patient group are then divided into multidimensional “bins” (spanning the measurement space) that contain roughly equal numbers of events; these bins are then applied to the concatenated data from the other disease groups. For each bin, the numbers of events in the disease groups are compared, and a test statistic describing the degree of difference is generated. The bins are then ranked by this test statistic (i.e., from most similar to least similar). The bins with the most significant differences in frequency represent regions of multidimensional measurement space that identify cells present at different frequencies between disease states; importantly, these are not just present versus absent, but different in representation. Furthermore, the bins themselves can be used as a “gate” for further analysis and enumeration of those cells.
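A one-dimensional sketch of the binning procedure follows; the real method bins in many dimensions at once, and the per-bin statistic below is a simplified stand-in for the published metric:

```python
import numpy as np

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, 4000)  # concatenated events, disease group A
group_b = rng.normal(0.3, 1.0, 4000)  # concatenated events, disease group B

n_bins = 8
# Bins hold roughly equal numbers of group-A events (quantile edges)...
edges = np.quantile(group_a, np.linspace(0, 1, n_bins + 1))
edges[0], edges[-1] = -np.inf, np.inf

# ...and the same bins are then applied to group B.
counts_a, _ = np.histogram(group_a, edges)
counts_b, _ = np.histogram(group_b, edges)

# Per-bin statistic comparing the two frequencies; bins are then ranked
# from most to least different between the groups.
frac_a = counts_a / counts_a.sum()
frac_b = counts_b / counts_b.sum()
stat = (frac_a - frac_b) ** 2 / (frac_a + frac_b)
ranked = np.argsort(stat)[::-1]  # ranked[0] is the most discrepant bin
```

The edges of the top-ranked bins can then be reused directly as gates to enumerate the cells they contain in each group, as the text describes.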
Recently, two unique approaches for data analysis have been introduced; both can examine the relationships between markers or cell populations more directly than the approaches described above. The first is called spanning-tree progression analysis of density-normalized events (SPADE), and consists of four steps38. The initial step takes the dataset, consisting of hundreds of thousands of events, and downsamples it in a density-dependent manner so that it is computationally manageable (but still reflects the original population frequencies). The second step clusters the data, while the third step links the clusters using minimum spanning trees. Finally, the data are upsampled to restore all the cells from the dataset. Results appear as tree-like graphics, in which branches represent the relationships/hierarchy underlying the dataset, and the cell populations are depicted as nodes along these branches (colored, as in heat maps, to reflect frequency). The algorithm has particular utility for complex, multiparameter datasets generated to explore the relationships between cell types or the response to stimuli; however, it is less suited for comparison of samples across multiple individuals or disease conditions (where statistics are required to confirm differences). In some respects, it is similar to a second recently introduced approach, employed in Gemstone software (Verity Software House). This system uses probability state modeling to reveal relationships among many markers by examining them in the context of one, or a few, markers that describe the progression of a cell population39. To do this, an initial modeling step is necessary, after which the variation in markers across the complete dataset can be visualized in the context of the modeled markers using ribbon plots.
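The four SPADE steps can be caricatured on synthetic two-marker data. Everything here (the density estimate, the bare-bones k-means, the parameter values) is a simplified stand-in for the published algorithm, kept small enough to run:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 2-marker dataset: three populations of very unequal size.
data = np.vstack([rng.normal(m, 0.3, (n, 2))
                  for m, n in [((0, 0), 1000), ((3, 0), 300), ((0, 3), 100)]])

# Step 1 -- density-dependent downsampling: dense regions are thinned so
# the rare population survives at comparable density.
pairwise = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=-1)
density = (pairwise < 0.5).sum(axis=1)
sample = data[rng.random(len(data)) < np.minimum(1.0, 20.0 / density)]

# Step 2 -- cluster the downsampled events (bare-bones k-means).
k = 6
centers = sample[rng.choice(len(sample), k, replace=False)].copy()
for _ in range(20):
    labels = ((sample[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    for j in range(k):
        if (labels == j).any():
            centers[j] = sample[labels == j].mean(0)

# Step 3 -- link the cluster nodes with a minimum spanning tree (Prim's).
dist = np.linalg.norm(centers[:, None] - centers[None], axis=-1)
in_tree, edges = {0}, []
while len(in_tree) < k:
    i, j = min(((i, j) for i in in_tree for j in range(k)
                if j not in in_tree), key=lambda e: dist[e])
    in_tree.add(j)
    edges.append((i, j))

# Step 4 -- upsample: assign every original event to its nearest node, so
# node sizes/colors can reflect the true population frequencies.
node = ((data[:, None] - centers[None]) ** 2).sum(-1).argmin(1)

print(len(edges), node.shape[0])  # k-1 tree edges; one node per event
```

The tree (`edges`) plus the per-node event counts from step 4 are what a SPADE plot renders: branches for the hierarchy, node color for frequency.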
In theory, the methods described above have the power to detect biologically important cells even when the precise phenotypic definition of those cells is not known. That is, a biologically important cell type may uniquely express a marker that was not measured in the polychromatic staining panel. However, this cell type is likely to share expression of other markers with closely related, but less biologically relevant, cells. In this scenario, probability binning/frequency difference gating may detect these cells within a slice (or bin) of multidimensional space, or SPADE and Gemstone may reveal other candidate markers that describe the key population. Notably, the possibility that our assays fail to identify a uniquely important marker (because we did not, or could not, measure it) suggests we need even more multiparametric technology than is available in current, state-of-the-art flow cytometers.