Monoclonal antibodies are an important resource for defining molecular expression and probing molecular function. The characterization of monoclonal antibody reactivity patterns, however, can be costly and inefficient in nonhuman experimental systems. To develop a computational approach to the pattern analysis of monoclonal antibody reactivity, we analyzed a panel of 128 monoclonal antibodies recognizing sheep antigens. Quantitative single parameter flow cytometry histograms were obtained from five cell types isolated from normal animals. The resulting 640 histograms were smoothed using a Gaussian kernel over a range of bandwidths. Histogram features were selected by SiZer—an analytic tool that identifies statistically significant features. The extracted histogram features were compared and grouped using hierarchical clustering. The validity of the clustering was indicated by the accurate pairing of externally verified molecular reactivity. We conclude that our computational algorithm is a potentially useful tool for both monoclonal antibody classification and molecular taxonomy in nonhuman experimental systems.
The focus of the early Human Leukocyte Differentiation Antigens Workshops (HLDA) was the use of monoclonal antibodies (Mab) and flow cytometry to identify new molecules (1–7). The Workshops characterized the reactivity patterns of large panels of antibodies against a panel of cell populations. For practical purposes, most of the expression analyses were performed with widely available cell lines. Since multi-passaged cell lines generally represented uniform cell populations, the results of the analyses were typically binary; that is, the antibody either reacted with the cell line or it did not. Although the histograms derived from cell lines were typically amenable to parametric analyses, the antibody reactivity was effectively summarized as the percentage of cells with fluorescence intensity greater than the negative control. This “percent positive” data was subjected to hierarchical clustering to identify “clusters of differentiation” or CD specificities (1). In human biology, the ability to identify the gene corresponding to the target antigen has rendered this Workshop approach to identifying new molecules obsolete. Mab, however, continue to be an important resource for defining molecular expression and probing molecular function.
Despite the utility of monoclonal antibodies in biologic investigations, attempts to develop antibody panels in other species have had limited success. A practical problem is that the HLDA approach requires widely available cell lines, a resource that is uncommon in most species. An alternative approach is to use isolated cell populations such as peripheral blood, thymocytes and splenocytes. These cell populations, however, are nonuniform. Flow cytometry analyses with most Mab produce histograms that cannot be effectively clustered using “percent positive” data. Furthermore, the visual inspection method of pairwise comparison becomes overwhelming with even a few dozen antibodies.
To compare reactivity patterns of 128 monoclonal antibodies (Mab) produced against sheep antigens, we developed a hierarchical clustering algorithm based on nonparametric feature identification of quantitative single parameter flow cytometry histograms. After kernel smoothing, statistically significant features were compared by hierarchical clustering. The accurate pairing of known Mab suggests the utility of this approach in the characterization and comparison of Mab in nonhuman experimental systems.
Balb/c mice (Jackson Laboratory, Bar Harbor, Maine, USA), 25–33g, were used in all Mab production experiments. The care of the animals was consistent with guidelines of the American Association for Accreditation of Laboratory Animal Care (Bethesda, MD).
Randomly bred female sheep (Parsons Farm, Springfield, Massachusetts, USA), 6–9 months old, ranging in weight from 35–50kg were used in these studies. They were given free access to food and water. Sheep were excluded from the study if there was any gross or microscopic evidence of infection. The care of the animals was consistent with the guidelines of the American Association for Accreditation of Laboratory Animal Care (AAALAC) (Bethesda, Maryland, USA) and the Guide for the Care and Use of Laboratory Animals (Department of Health and Human Services, Publication No. 85-23).
Of the 128 monoclonal antibodies (Mab), 112 were produced in our laboratory. Myeloma and spleen cells were fused at a ratio of five splenocytes per NS1 myeloma cell (8) with polyethylene glycol 4000 (Sigma; St. Louis, Missouri, USA) (50%, weight/volume) as a fusing agent (9). Mouse myeloma cells and immune splenocytes in serum-free medium were centrifuged at 300×g for 10 minutes. The cells were gently suspended in 1ml of fusogen for 30 seconds at 37°C and then slowly diluted with 20ml of serum-free medium. The cells were again centrifuged for five minutes at 250×g and gently resuspended in culture medium containing hypoxanthine-aminopterin-thymidine (HAT; Gibco Laboratories, Grand Island, New York, USA) supplemented with fetal serum (20%, v/v) and endothelial growth supplement (Sigma) (10). The cells were plated in 96-well flat-bottom microtiter culture plates (Costar; Corning, New York, USA) at approximately 5×10⁴ cells per well. The cells were cultured at 37°C in a 5% CO2 incubator and examined for hybridoma colonies at two-day intervals starting seven days after the fusion. The productive hybridomas, identified both by antibody concentration and antigen binding (11,12), were selected for further analysis.
The Mab were cloned and subcloned by limiting dilution. The hybridoma colonies were monoclonal based on statistical analysis of both limiting dilution and multi-plexed bead isotyping (13). The anti-CD4 Mab 17D (14), the anti-LFA-1 Mab F10–150 (14), the beta1 integrin Mab ERD2/117 and FW4-101-1 (15) and the class I Mab T2/39 (16) have been previously described. Additionally, the anti-macrophage Mab M3/246 as well as the anti-lymphocyte Mab L2/34 and T4/268 were produced and characterized as previously described (12,17,18). In addition to flow cytometry, all antibodies were characterized by immunohistochemistry and immunoprecipitation. Affinity-purified Mab or culture supernatants were used at five-fold saturating concentrations in all experiments.
Cells were washed twice with phosphate-buffered saline containing 2.5% fetal calf serum and 0.02% sodium azide. Approximately 10⁶ cells were incubated on ice for 30 minutes with an excess concentration of Mab. The cells were washed twice and stained with fluorescein-conjugated goat F(ab′)2 anti-mouse immunoglobulin (SouthernBiotech, Birmingham, Alabama, USA) diluted 1:10. The cells were incubated on ice for another 30 minutes. After 3 washes, the cells were fixed in 1% paraformaldehyde and analyzed on an Epics XL flow cytometer (Beckman Coulter, Miami, Florida, USA). The antibodies were tested against a panel of 5 sheep cell types: thymocytes, lymph node cells, alveolar macrophages, efferent lymphocytes and splenocytes.
The thymus was harvested immediately after euthanasia through a median sternotomy and dissection of both lobes of the thymic gland. The thymus was dissected into 3mm cubes of tissue and placed in RPMI 1640, 10% fetal calf serum (FCS) and squeezed through a steel mesh. After mechanical dissociation, the thymocytes were washed twice in RPMI containing 2.5% FCS prior to immunostaining.
Prescapular efferent lymph duct cannulations of normal 30–45 kg ewes were used as a source of lymphocytes (19). The lymphocyte-containing lymph plasma was collected in sterile polypropylene tubes (Falcon, Franklin Lakes, New Jersey, USA) or plastic bags (Baxter Healthcare, Deerfield, Illinois, USA) containing 300 units of heparin, 1000 units of penicillin and 1mg of streptomycin (Sigma-Aldrich, St. Louis, Missouri, USA). The cellular component was separated by velocity centrifugation and resuspended in cell media containing RPMI 1640 (Sigma-Aldrich) supplemented by 3% FCS (Sigma-Aldrich) and 1% deoxyribonuclease (Worthington Biochemical, Lakewood, New Jersey, USA) at a concentration of 5.0×10⁶ cells/ml.
The sheep prescapular lymph node was harvested immediately after euthanasia and perinodal fat removed by sharp dissection. The lymph node was dissected into 3mm cubes of tissue and processed by mechanical shear using the glazed ends of two glass microscope slides. The cells were filtered and serially washed prior to preparation for immunostaining.
With airway protection using a 9Fr endotracheal tube (Bard, Covington, California, USA), an 18Fr nasogastric tube (Bard) was passed into the mainstem bronchi. The tube was used to instill 1000cc of 27°C normal saline. A separate suction catheter was subsequently placed within the endotracheal tube to remove the lavage fluid. The procedure was repeated for the contralateral lung. The samples were passed through a strainer and washed twice in culture media (RPMI 1640; Sigma) containing 10% FCS. The cells were counted and viability assessed prior to preparation for immunostaining.
The spleen was harvested immediately after euthanasia through a midline laparotomy incision. The splenic hilum was ligated and the spleen removed. The spleen was dissected into 3mm cubes of tissue and placed in RPMI 1640, 10% FCS and squeezed through a steel mesh. After mechanical dissociation, the cells were filtered and washed. The cells were centrifuged at 200×g for 10 minutes and resuspended in an ammonium chloride lysis buffer for 10 minutes at 27°C. The splenocytes were washed twice in RPMI containing FCS with the final resuspension in preparation for immunostaining.
The immunostained cells were analyzed on an Epics XL (Beckman Coulter; Miami, Florida, USA) equipped with a single laser with excitation wavelength at 488nm and three emission detectors. Gain settings were calibrated to 4 peak Rainbow calibration particles (Spherotech; Libertyville, Illinois, USA). During the experiments the laser power, photomultiplier tube voltage, light scatter and fluorescent gains were kept constant. A total of 10,000 events were acquired from each sample. The data was processed using WinList 5.0 (Verity; Topsham, Maine, USA) and exported to Microsoft Excel (Redmond, Washington, USA) for further analysis. The output is a summary graph, or histogram, of the frequency distribution of the data. In flow cytometry nomenclature, the data is grouped in “channels”, which correspond to “bins” in statistical nomenclature.
The correspondence between histogram features identified by the computational algorithm and visual inspection was assessed by a panel of flow cytometry users judged to be “experts” by peer evaluation. The panel was independently shown the 640 histograms in random order and asked to identify the number of biologically meaningful peaks.
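Kernel smoothing is a general-purpose statistical technique for removing noise and highlighting structure in nonparametric datasets (20). The spatial resolution of kernel smoothing is determined by the bandwidth: a narrow bandwidth provides a highly resolved density estimation whereas a wide bandwidth produces a highly smoothed density estimation (21). In kernel smoothing, the estimation of the density at a particular location x and for bandwidth h is
\[ \hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i), \qquad K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right), \]
where $X_1, \dots, X_n$ are the n observed fluorescence intensities.
The kernel K in our application was a Gaussian kernel; one implication of which was that the number of zero crossings of the derivative is always a decreasing function of h (22).
The statistical technique for defining statistically significant peaks has been described in detail (23). Briefly, the technique uses a family of curves
\[ \{ \hat{f}_h(x) : h \in [h_{\min}, h_{\max}] \} \]
to assess the significant features. The bandwidth range
\[ [h_{\min}, h_{\max}] \]
is taken to be $h_{\min}$ = 2 × (bin width) and $h_{\max}$ = the data range. The subsequent map was based on the confidence limits given by
\[ \hat{f}'_h(x) \pm q \cdot \widehat{\mathrm{sd}}\left( \hat{f}'_h(x) \right), \]
where q is an appropriate quantile and $\widehat{\mathrm{sd}}$ is the estimated standard deviation of the derivative estimate (23).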
The application of SiZer was possible using Matlab 7.0 (MathWorks, Natick, Massachusetts, USA) and software downloadable from Professor Marron’s website (http://www.stat.unc.edu/faculty/marron/marron_software.html).
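As an illustration of the smoothing step only, the following minimal sketch applies a Gaussian kernel to binned channel counts over a family of bandwidths running from 2 × (bin width) to the data range. It is written in Python rather than the Matlab/SiZer software used in the study. The function names (`gaussian_smooth`, `bandwidth_family`, `count_peaks`), the logarithmic spacing of the bandwidths and the toy histogram are all assumptions made for the example and are not taken from the study.

```python
import math

def gaussian_smooth(counts, h, bin_width=1.0):
    """Gaussian kernel density estimate at each bin centre, from binned counts."""
    n = sum(counts)
    centres = [(j + 0.5) * bin_width for j in range(len(counts))]
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(c * math.exp(-0.5 * ((x - xj) / h) ** 2)
                   for c, xj in zip(counts, centres) if c)
        for x in centres
    ]

def bandwidth_family(bin_width, data_range, levels=11):
    """Bandwidths from 2 * bin_width up to the data range (log spacing is assumed)."""
    h_min, h_max = 2.0 * bin_width, float(data_range)
    ratio = (h_max / h_min) ** (1.0 / (levels - 1))
    return [h_min * ratio ** k for k in range(levels)]

def count_peaks(curve):
    """Count interior local maxima of a smoothed curve."""
    return sum(1 for i in range(1, len(curve) - 1)
               if curve[i - 1] < curve[i] >= curve[i + 1])

# Made-up bimodal histogram over 64 channels with alternating-channel noise.
counts = [round(200 * math.exp(-0.5 * ((j - 15) / 3) ** 2)
                + 120 * math.exp(-0.5 * ((j - 45) / 5) ** 2)) + 3 * (j % 2)
          for j in range(64)]

bandwidths = bandwidth_family(bin_width=1.0, data_range=64.0)
curves = [gaussian_smooth(counts, h) for h in bandwidths]
peaks = [count_peaks(curve) for curve in curves]
print(peaks)  # number of local maxima at each smoothing level
```

Printing the peak count per level shows how the apparent number of modes depends on the bandwidth, which is the reason a single bandwidth was not chosen in advance.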
First, the listmode standard FCS 2.0 files (24) were converted to a Matlab (ver. 7.0.4; MathWorks) array. In addition to the raw data, the FCS 2.0 structure specifies the instrument used for the measurement and the sample measured. The clustering process contains the following steps:
In this experiment, we chose 11 smoothing levels and extracted features from levels 2 to 9, which gave y=8. The number of peaks per histogram used for clustering was set to 4, i.e. x=4.
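The role of x and y can be illustrated with a small sketch that assembles a fixed-length feature vector of x × y = 32 entries per histogram. The text gives only the values of x and y, so the choice of features here is an assumption: the example keeps the channel positions of the x tallest peaks at each retained smoothing level and pads with zeros when fewer peaks exist. The helper names and the toy curves are hypothetical.

```python
def peak_positions(curve):
    """Channel indices of interior local maxima."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i - 1] < curve[i] >= curve[i + 1]]

def level_features(curve, x=4, pad=0):
    """Positions of the x tallest peaks, in channel order, padded to length x.

    Padding with 0 is unambiguous here because an interior peak is never at channel 0.
    """
    tallest = sorted(peak_positions(curve), key=lambda i: curve[i], reverse=True)[:x]
    tallest.sort()
    return tallest + [pad] * (x - len(tallest))

def histogram_features(curves_by_level, first=2, last=9, x=4):
    """Concatenate features from smoothing levels first..last (1-based, inclusive)."""
    features = []
    for level in range(first, last + 1):
        features.extend(level_features(curves_by_level[level - 1], x))
    return features

# Toy stand-ins for 11 smoothed curves: rough at narrow bandwidths, smooth at wide ones.
rough = [0, 3, 1, 4, 1, 5, 1, 2, 0]
smooth = [0, 1, 2, 3, 4, 5, 3, 1, 0]
curves_by_level = [rough] * 5 + [smooth] * 6

features = histogram_features(curves_by_level)
print(len(features), features)
```

Because every histogram yields a vector of the same length, vectors from different antibodies (and, by concatenation, from different cell types) can be compared directly with a distance measure.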
Meaningful clustering requires that the histogram features are both varied and reproducible. To provide this variability and avoid duplication, we selected a panel of 128 unique monoclonal antibodies. When tested against naturally occurring sheep cells, the single parameter flow cytometry analysis of Mab staining demonstrated a range of histogram features (Figure 1). More than one Mab recognizing LFA-1 (N=2), CD29 (N=2), class II (N=2), and class I (N=3) were included in the panel because these antibodies recognized unique epitopes as demonstrated by crossblocking experiments (not shown). These antibodies provided an externally verified test of clustering accuracy.
To simplify machine and reagent calibration, the analysis was performed with single parameter flow cytometry. The use of a single fluorochrome 1) eliminated the need to normalize the integral scale of different fluorochromes (e.g. fluorescein and phycoerythrin), and 2) facilitated quantitative flow cytometry data collection. Flow cytometry histograms obtained over a 6 year period were highly reproducible (Figure 2A). Biologic variability, as reflected in the population dynamics of tissues such as the efferent lymph, lymph node, alveolar macrophage, spleen and thymus (Figure 2B–F) was minimized by selecting a modal histogram representative of the results obtained from multiple animals.
To identify the statistically significant underlying structure in flow cytometry histograms, the histogram was smoothed using a Gaussian kernel (22) and a confidence interval was determined for the derivative of the smoothed curve at each location along the histogram. This technique, developed by Chaudhuri and Marron (23), identifies a statistically significant peak when the confidence interval is completely above zero; hence, “SiZer” defines the Significant Zero crossing. Of note, only 152 of the 640 histograms (23.8%) were judged to be unimodal by an expert panel.
Although bin width is fixed by the cytometer calibration, data analysis must still choose the level of smoothing or bandwidth (h). Since there is no broadly applicable “data-driven” method for bandwidth selection (20), we included a range of smoothing levels in the data analysis by constructing a confidence interval for the derivative of the smoothed curve at each location along the histogram and at a range of bandwidths (Figure 3A,B). Since the location of significant zero crossings varies with location on the intensity (x) axis, the SiZer output can be plotted graphically. The location of significant histogram features can be identified by color transition (Figure 3C,D).
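The idea of a confidence interval for the derivative can be sketched as follows for one bandwidth, that is, one row of a SiZer-style map. This is a simplified illustration and not the SiZer software: it assumes a Gaussian kernel, estimates the standard deviation of the derivative from the spread of the per-event kernel derivatives, and uses a fixed pointwise quantile q = 1.96, whereas SiZer uses a quantile adjusted for simultaneous inference (23). The function names and toy data are hypothetical.

```python
import math

def sizer_row(counts, h, q=1.96, bin_width=1.0):
    """Classify the slope of the smoothed histogram at each bin centre.

    +1: confidence interval for the derivative lies above zero (significant increase)
    -1: interval lies below zero (significant decrease)
     0: interval contains zero (no significant slope)
    """
    n = sum(counts)
    centres = [(j + 0.5) * bin_width for j in range(len(counts))]
    scale = 1.0 / (h ** 3 * math.sqrt(2.0 * math.pi))
    row = []
    for x in centres:
        s1 = s2 = 0.0
        for c, xj in zip(counts, centres):
            if c:
                u = x - xj
                k = -u * scale * math.exp(-0.5 * (u / h) ** 2)  # Gaussian kernel derivative
                s1 += c * k
                s2 += c * k * k
        slope = s1 / n                                   # derivative estimate
        sd = math.sqrt(max(s2 / n - slope * slope, 0.0) / n)  # its standard deviation
        if slope - q * sd > 0:
            row.append(1)
        elif slope + q * sd < 0:
            row.append(-1)
        else:
            row.append(0)
    return row

def significant_peaks(row):
    """Count transitions from significant increase to significant decrease."""
    peaks, last = 0, 0
    for s in row:
        if s == -1 and last == 1:
            peaks += 1
        if s != 0:
            last = s
    return peaks

# Made-up bimodal histogram over 64 channels.
counts = [round(200 * math.exp(-0.5 * ((j - 15) / 3) ** 2)
                + 120 * math.exp(-0.5 * ((j - 45) / 5) ** 2))
          for j in range(64)]

row_narrow = sizer_row(counts, h=3.0)
row_wide = sizer_row(counts, h=64.0)
print(significant_peaks(row_narrow), significant_peaks(row_wide))
```

Stacking such rows over the whole bandwidth family gives a map in which a significant peak appears as a transition from the "increasing" to the "decreasing" class.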
To test the utility of SiZer-derived feature identification in data clustering, the reactivity of a panel of 128 Mab was tested against five sheep cell types: efferent lymphocytes, lymph node cells, alveolar macrophages, splenocytes and thymocytes. These cell types were chosen because of their availability and our interest in leukocyte biology. The dataset for hierarchical clustering comprised the modal histogram of three or more replicates. By convention, the resulting dendrogram is presented with the horizontal position reflecting the combined metric information (Figure 4). Given the cell types analyzed in our panel, antibodies recognizing target molecules with high cell surface density, such as major histocompatibility complex class I or CD45 molecules, are displayed rightward. In contrast, antibodies with little or no reactivity with these cell types would be plotted on the left side of the dendrogram.
The utility of hierarchical clustering depends upon the accurate estimation of similarity or dissimilarity between two objects. In the clustering of flow cytometry histograms, the similarity/dissimilarity was calculated as the Euclidean distance between Mab histograms. This distance measure can be expressed as a “dissimilarity index” with lower distance measures reflecting greater similarity and higher distance measures reflecting greater dissimilarity (Figure 4B). To validate the clustering result, antibodies recognizing the same target molecule—independently confirmed by biochemical and genetic studies (15,25)—were examined. Mab recognizing class I, LFA-1 and β1 integrin demonstrated high similarity indices (Figure 4B: yellow, green and red). Visual inspection of the corresponding flow cytometry results demonstrated correspondence between the histograms and the dissimilarity indices (Figure 5: yellow, green and red). For comparison, the pair with the highest dissimilarity index (Figure 4B: blue) demonstrated markedly disparate flow cytometry histograms (Figure 5: blue).
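A minimal sketch of the distance calculation and agglomerative clustering is shown below. The Euclidean distance follows the text; the average-linkage rule, the feature vectors and the antibody labels are assumptions chosen for illustration, since the linkage method is not stated here and the vectors are not the study data.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(features):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    def dist(ca, cb):
        pairs = [(a, b) for a in ca for b in cb]
        return sum(euclidean(features[a], features[b]) for a, b in pairs) / len(pairs)

    clusters = [(name,) for name in sorted(features)]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the lowest dissimilarity.
        ca, cb = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((ca, cb, dist(ca, cb)))
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca + cb]
    return merges

# Hypothetical feature vectors: two antibodies per target, with similar peak positions.
features = {
    "mab_a1": [10, 40, 0, 0],
    "mab_a2": [11, 41, 0, 0],
    "mab_b1": [25, 0, 0, 0],
    "mab_b2": [27, 0, 0, 0],
}

merges = average_linkage(features)
for cluster_a, cluster_b, distance in merges:
    print(cluster_a, cluster_b, round(distance, 2))
```

In this toy case the antibodies sharing a target merge first at low dissimilarity, and the two groups merge last at a much higher dissimilarity, which mirrors the validation logic used with the externally verified pairs.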
To determine the relative advantage of examining multiple cell types, we compared the dissimilarity indices of the anti-LFA-1, anti-CD29, anti-class II and anti-class I Mab. For controls, we chose two Mab pairs with comparable surface expression to the anti-LFA-1 Mab on sheep splenocytes. As expected, the hierarchical clustering of individual cell types was less discriminating than the combination of all five cell types: the combined analysis demonstrated high dissimilarity in the control antibody comparisons and low dissimilarity indices in antibodies with shared specificities (Figure 6). Of interest, Mab recognizing related molecular subunits (e.g. CD29 and CD49a) showed marked similarity in some tissues and dissimilarity in others (Figure 6). Mab recognizing polymorphic epitopes on class II showed comparable variability (Figure 6). The intermediate results with antibodies recognizing members of a molecular family (the heterodimeric subunits of the CD29 family and polymorphic epitopes of class II molecules) suggest that the dissimilarity index, when compared across cell types, may also provide a nuanced measure of molecular relatedness.
Characterizing and comparing Mab reactivity patterns continues to be important for both basic and applied biologic research. In this report, we used an anti-sheep Mab panel to illustrate the application of statistical and informatics tools in Mab pattern analysis. Our approach has similarities to traditional workshops; that is, Mab were used as specific probes of target molecules, flow cytometry was used as a quantitative measure of molecular expression, and hierarchical clustering was performed with the cytometry results. Because of the absence of cell lines in sheep, however, Mab reactivity was tested against naturally occurring cell types. A disadvantage of using isolated cell populations is that the resulting histograms are frequently multimodal and difficult to model using parametric families, a data structure incompatible with a “percent positive” analysis. Identifying the underlying structure in these histograms required the application of kernel smoothing and an advanced statistical tool called SiZer. Using SiZer to identify significant features of the histograms from 5 different cell types, we performed hierarchical clustering of 128 Mab. The validity of the clustering algorithm was inferred by the reliable identification of externally verified molecular reactivity.
Our analysis of single parameter histograms is different from, but complementary to, current trends in flow cytometry (26). Recent technologic advances have focused on increasing the number of dimensions that can be simultaneously measured by flow cytometers. This multi-dimensional approach takes advantage of the unique ability of flow cytometry to analyze cell populations on a “per cell” basis. Implicit in this approach are the assumptions that 1) the analysis will yield a binary result if sufficient subdivisions are analyzed, and 2) the result will reflect static cell populations with a stable phenotype. An advantage of this approach is that it reduces a multi-dimensional dataset to a string of binary descriptors (e.g. CD45RO−CD44+L-selectin−) (27). A disadvantage is that the analysis is relatively inflexible to independently regulated molecular expression and evolving phenotypes. An additional limitation of this approach is that the discriminating dimensions (parameters) must be “built into” the analysis. As a result, conceptual distinctions may be driven more by methodologic decisions than biologic patterns.
In contrast, our data analysis is based on simple single parameter histograms. Implicit in our approach are the assumptions that 1) combinatorial complexity—multiple cell types—can be used to effectively discriminate molecular expression patterns, and 2) stochastic properties of cell populations may reflect insights that are nonidentical to the detailed phenotypic description of individual cells (28–30). An advantage of our approach is that single parameter analyses are currently more technically reproducible than multi-parameter analyses (31). Similarly, the use of a single fluorochrome preserves the integral scale of the distance measures used for clustering. Third, our approach limits the a priori judgements required for the analysis—perhaps enhancing the knowledge discovery process. Finally, it is important to underscore that patterns observed in our analysis of single parameter histograms can and should be explored using contemporary multi-dimensional (multi-parameter) flow cytometry techniques.
A major accomplishment of our work is the development of a computational approach to the reliable identification of “real” peaks. The underlying structure of nonparametric flow cytometry histograms can be masked by both biologic and technical variability. A statistical technique designed to preserve the underlying histogram structure, while limiting channel-to-channel variability, is kernel smoothing (20). In studies of smoothing techniques based on a Gaussian kernel, we found that clustering results varied substantially depending upon the bandwidth or degree of smoothing. Narrow bandwidths risked undersmoothing and the inclusion of unmeaningful peaks in the clustering algorithm. Alternatively, wide bandwidths risked oversmoothing and the exclusion of small, but meaningful, peaks. To address this problem we used the statistical methodology referred to as SiZer (Significant Zero crossings)(23).
SiZer provided three major advantages. First, SiZer produced an assessment of the significance of a histogram feature. A peak was present when there was a zero crossing of the derivative of the smoothed curve (or “density estimate”). The peak was statistically significant when the derivative of the estimate was significantly positive to the left and significantly negative to the right. Second, SiZer enabled data analysis and clustering without the arbitrary selection of a bandwidth parameter. Although an appropriate bandwidth could be reasonably selected by visual inspection when analyzing one histogram, this approach became impractical when clustering dozens of histograms. Using a family of bandwidths avoided the necessity of arbitrary bandwidth selection. Third, SiZer provided an objective tool for differentially weighting prominent peaks. Since the clustering algorithm used data from the entire family of SiZer curves, prominent peaks that were present in all bandwidths were more influential in the hierarchical clustering analysis than peaks that were present in only 1 or 2 bandwidths.
Another important advantage of our approach is that it permits the development of a longitudinal database independent of pairwise comparisons. Because features are identified relative to an absolute calibration standard, pairwise comparisons used in many approaches (32,33) are unnecessary. As a result, the database is cumulative; that is, new antibody data can be added to the database without the necessity of testing the entire antibody “panel.” A cumulative database is a particular advantage in nonhuman species because international antibody workshops have become increasingly impractical. A challenge to our technique, shared by almost all approaches to Mab pattern recognition, is the requirement for “real” peaks on which to base the clustering analysis. In circumstances in which the histogram is only slightly shifted from the “negative” peak, the reliable identification of a significant peak depends on minimizing technical and biologic variability. Similarly, when target molecules are infrequently expressed in normal tissues, the analysis must systematically include cell populations that express the appropriate target antigens.
A theoretical advantage of clustering data derived from naturally occurring cell populations, rather than immortalized cell lines, is biological relevance. We speculate that similarities and dissimilarities of cell surface molecule expression identified by hierarchical clustering of naturally occurring cell populations will provide insights into a variety of biological processes. Parallel molecular expression levels may reflect receptor-ligand relationships, common regulation, or membrane linkage. Similarly, a parallel increase in cell surface molecule expression after cytokine exposure may imply a related function or the common participation in a biologic process. Of practical significance, at least for nonhuman experimental systems, will be the comparison of hierarchical clustering results between species. Cross-species comparisons may facilitate Mab characterization. Further, these clustering comparisons may enhance the connection between animal models and human disease.
Supported in part by NIH Grants HL47078 and HL75426.
The authors would like to thank Drs. Howard Shapiro and Edgar Milford for their helpful comments and insights, and Drs. Matt Wand and J. Stephen Marron for their introduction to SiZer.