Recent advances in proteomic profiling technologies, such as surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS; Ciphergen Biosystems, Inc., Fremont, CA, http://www.ciphergen.com), have allowed preliminary profiling and identification of biomarkers in biological fluids for biological, toxicological, and clinical research [1]. ProteinChip technology coupled with SELDI-TOF MS is an effective tool for simultaneously detecting the relative expression levels of proteins over a wide range of molecular weights in biological samples under different conditions. Differences in protein expression level can then be used to identify disease, or to distinguish between stages of a disease, between toxicant treatment and control, or between time points following toxicant treatment [11].
Analysis of SELDI-TOF MS data presents challenges similar to those of gene expression profile analysis from microarray technologies. Global profiling analyses strive to identify reliable and reproducible expression patterns that are signatures specific to each state, such as disease versus healthy control or different experimental conditions (e.g., treated with a toxicant of interest versus untreated). The identification of biomarkers for diagnosis or prognosis depends on the analysis of high-dimensional protein expression profiles. Data must be correctly analyzed before valid interpretations can be made and reliable biological conclusions can be drawn from protein expression profiles. Analysis of poor-quality, noise-laden protein expression profiles, however, will likely lead to results lacking biological relevance. Therefore, assessing the quality of the protein expression profiles and determining the reproducibility of SELDI-TOF MS experiments prior to data analysis is of critical importance.
Using SELDI-TOF MS coupled with protein chip technologies for biomarker development is a complicated process that involves many steps, including sample collection and preparation, protein chip selection and preparation, matrix selection and application, spectral calibration, loading of samples onto the chip, washing away of non-specifically bound proteins, SELDI-TOF MS parameter settings, data recording, and data pre-processing. Any of these steps could introduce noise, adversely affecting the quality of the experiment and the reliability of the protein expression profile. A high degree of variability in protein expression profiles is not uncommon in SELDI experiments: the coefficient of variation for absolute intensity measures can be as high as 50–60% [15]. Quality control (QC) has recently been recognized as an important issue in SELDI experiments, and several efforts have been made to apply QC techniques to improve the reproducibility of SELDI profiling data [3]. For example, QC samples pooled from multiple samples have been used to assess the reproducibility of a SELDI experiment [18], while technical replicates have been used to assess reproducibility within the same samples [21]. Because of the complex nature of SELDI-TOF MS and ProteinChip experiments, even with experimental QC, the resultant data must be subjected to stringent quality assessment prior to data analysis. Specifically, low-quality spectra should be identified and eliminated from analysis to ensure the reliability of the biomarkers and associated patterns discovered during analysis. In addition, systematic variability in experiments may introduce further sources of error into the data, and this possibility should be examined prior to data analysis.
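The coefficient of variation mentioned above is the sample standard deviation of a peak's intensity across replicate spectra, expressed as a percentage of the mean. The following is a minimal sketch of that calculation, not the procedure used in any of the cited studies; the function name `coefficient_of_variation`, the peak labels, and the intensity values are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return the CV (in percent) of one peak's intensities across replicates.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; the source does not state which estimator was used.
    """
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(intensities) / m


# Hypothetical intensities of two peaks measured in four replicate QC spectra.
qc_peaks = {
    "peak_A": [10.0, 12.0, 11.0, 9.0],   # fairly reproducible
    "peak_B": [40.0, 20.0, 60.0, 40.0],  # highly variable
}

cv_by_peak = {name: coefficient_of_variation(v) for name, v in qc_peaks.items()}

for name, cv in cv_by_peak.items():
    print(f"{name}: CV = {cv:.1f}%")
```

In this illustrative data set the second peak has a CV of roughly 41%, which is the kind of variability that would prompt a closer look at the experimental steps listed above.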
We investigated systematic variability for plates, chips, and spot positions in two independent SELDI biomarker studies. As recommended by Ciphergen, all peaks appearing in all QC samples (five in our study) should be used to assess the reproducibility of experiments, because those peaks represent proteins common to the QC samples and should be expressed at similar levels. The high level of reproducibility of our experiments was demonstrated by low coefficients of variation for the five peaks that appeared in all 144 spectra from QC samples. No systematic bias was detected in the experiments: no single source of variation remained consistent when samples were switched between spots, plates, and chips. To identify spectra of low quality, a Pearson correlation matrix computed from all peaks in all spectra was developed as a QC tool for SELDI profiling data analysis. The rationale behind the use of a correlation matrix as a QC tool is the assumption that protein expression profiles from biological replicates and technical replicates should be similar. Thus, the correlation matrix measures the similarity among the spectra and is useful for quantifying how consistently the experiments have been conducted. We applied the correlation matrix to the SELDI data from the study of biomarkers for liver cancer and liver toxicity, as well as myeloma-associated lytic bone disease. We found that the correlation matrix was an efficient and reliable means of detecting low-quality spectra that should be removed. Removing such spectra should result in more reliable biomarker identification in the final protein expression profiles.
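The idea of using a Pearson correlation matrix as a QC tool can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation used in the study: each spectrum is assumed to be a list of intensities for the same ordered set of peaks, and the flagging rule (median correlation with all other spectra below a cutoff) and the cutoff value of 0.8 are hypothetical choices that the text does not specify.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant spectrum")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(spectra):
    """Return the symmetric matrix of pairwise Pearson correlations."""
    n = len(spectra)
    return [[pearson(spectra[i], spectra[j]) for j in range(n)] for i in range(n)]


def flag_low_quality(spectra, threshold=0.8):
    """Return indices of spectra whose median correlation with the others is low.

    Both the median-based rule and the default threshold are assumptions
    made for this sketch.
    """
    corr = correlation_matrix(spectra)
    flagged = []
    for i, row in enumerate(corr):
        others = [r for j, r in enumerate(row) if j != i]
        if median(others) < threshold:
            flagged.append(i)
    return flagged


# Hypothetical peak intensities for four replicate spectra (five peaks each).
spectra = [
    [10.0, 20.0, 30.0, 40.0, 50.0],
    [11.0, 19.0, 31.0, 42.0, 49.0],
    [9.0, 21.0, 29.0, 39.0, 52.0],
    [50.0, 10.0, 45.0, 5.0, 20.0],  # deliberately inconsistent replicate
]

corr = correlation_matrix(spectra)
flagged = flag_low_quality(spectra)
print("Flagged spectrum indices:", flagged)
```

The median is used here rather than the mean so that a single poor spectrum does not pull down the summary score of otherwise consistent replicates; with only a few spectra, a mean-based rule could flag good spectra as well.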