In this study, we developed multi-gene biomarker models to predict early-stage (asymptomatic) atherosclerosis based on different types of leukocytes. In particular, we have shown a proof-of-concept strategy for predicting asymptomatic atherosclerosis using molecular biomarkers from peripheral blood samples, and we have shown that common molecular expression signatures of atherogenic risk exist across different white blood cell (WBC) types. We believe these predictions were possible for several reasons. First, gene expression signatures from patient blood samples appear to have high potential for predicting atherosclerosis at an early stage, since certain molecular changes in blood cells may occur well before plaque development or serious clinical symptoms. Second, genome-wide microarray profiling enabled us to comprehensively identify relevant molecular expression signatures of atherogenic risk beyond the information obtained from standard clinical parameters. We also believe that our multi-gene predictors could predict the risk of atherosclerosis more accurately than predictors based on single biomarkers or a small number of clinical parameters. Finally, we found that many biomarkers of atherosclerosis shared consistent expression patterns across different leukocyte subsets. It will be interesting to further investigate the specific roles of these common biomarkers in the disease.
We note that FH1 and FH2 were from the identical set of familial hypercholesterolemia (FH) patients and healthy controls, so our significant prediction on FH2 may be correlated with the use of the same patient set. However, we believe that this prediction is not mainly due to the shared patient set, for the following reasons. First, the molecular data of the two sets came from completely different immune cells: FH1 from monocytes and FH2 from circulating T cells. Since our biomarker discovery and predictive modeling were performed strictly on monocytes of the FH1 set, FH2 is independent of FH1 in its molecular characteristics and data. Second, we observed that the identical prediction model performed considerably better for the white blood cell samples of FH3, a set of FH patients and controls completely independent of the FH1 set. We think this is because white blood cells include monocytes, so our monocyte-based predictor showed better predictability for that set. Therefore, the common molecular information appears to be more important than the use of a specific patient set for our training and prediction.
We believe our approach can be highly useful for the clinical diagnosis of atherosclerosis for the following reasons. First, we used molecular signatures from patients' blood samples, which can be conveniently obtained in routine clinical practice. Second, we found that molecular signatures of atherogenic risk exist and are shared among different types of leukocytes, so diagnostic tests may be chosen and further refined based on one or multiple cell types depending on their accuracy and clinical applicability. For example, clinical data have shown that monocytes are involved in the very early pathogenetic stages of atherosclerosis [18]. In particular, when atherosclerotic plaques (or foam cells) differentiate from monocytes recruited from circulating blood, critical molecular changes appear to occur well before any clinical symptoms of atherosclerosis. Thus, a molecular test based on blood monocytes may serve as an effective diagnostic tool for early-stage atherosclerosis. Also, as seen in the several patient data sets we investigated here, microarray profiling of a small number of blood cells can now be performed efficiently and cost-effectively, which makes it practical for clinical applications as well as scientific investigations. Moreover, once the final biomarkers and prediction models are identified, diagnostic tests and assays can also be developed with more economical and convenient techniques such as RT-PCR.
There are several limitations in our current study. First, our primary biomarker discovery and prediction model training were performed by contrasting familial hypercholesterolemia (FH) patients against healthy controls. Likely due to this restriction, our predictors were better at distinguishing FH patients from healthy controls than at distinguishing general subclinical atherosclerosis patients from healthy controls. When we reversed the roles of the training and test sets in our preliminary analysis, i.e. used the subclinical atherosclerosis patient set for model training and the FH patient sets for independent model testing, the prediction results generally deteriorated, possibly due to the small sample size of the subclinical atherosclerosis patient set (data not shown). Whether predictors perform significantly better when they are trained on patients with the same disease type and/or on the same subtype of blood cells requires more careful investigation with a larger number of patients in a future study. Also, we found that COXEN biomarker discovery and model training based on monocyte data were more successful at predicting risks in other cell types than training in the other directions, e.g. training on T-cell data to predict the others. This may be due to the data quality or the biological information in the monocyte data, which requires further investigation.
Even though we did not use patients' outcome information in our COXEN-based predictions, we partially used the molecular information of our test patient data sets in the current study. The prediction performance of these predictors should thus be evaluated more rigorously using a third patient set from the same leukocyte cell type. Our current predictors were constructed solely from molecular data because our datasets lacked other clinical information on the patients. However, we believe additional predictive information can be obtained from many clinical parameters such as age, gender [19], LDL level, HDL level, apolipoproteins or triglyceride level. Another key to improving the success rate of the prediction model in the future would be to construct it on the "six stages of atherosclerotic lesion" rather than using only "0" or "1" to represent the atherogenic risk of atherosclerosis patients. If relevant clinical data become available for our model development, we believe the prediction of atherogenic risk can be further improved by constructing models with both molecular and clinical parameters.
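As a purely illustrative sketch of what a model with both molecular and clinical parameters could look like, the example below fits a plain logistic regression on two inputs per patient: a multi-gene score and a standardized LDL level. This is not the COXEN procedure and not a model from this study; logistic regression is simply a familiar stand-in. The names `fit_logistic` and `predict_risk`, the choice of features, and all numbers are hypothetical and made up for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    Returns a list [bias, w_1, ..., w_k] for k input features.
    """
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = p - y  # gradient of the log-loss w.r.t. the linear score
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def predict_risk(weights, x):
    """Predicted probability of belonging to the case (at-risk) group."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


# Hypothetical toy data: (multi-gene score, standardized LDL level).
# These values are invented for illustration, not taken from any patient set.
rows = [
    (1.2, 0.8), (0.9, 1.1), (1.5, 0.3), (0.7, 0.9),        # cases
    (-1.0, -0.6), (-0.8, -1.2), (-1.3, 0.1), (-0.5, -0.9),  # controls
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(rows, labels)
risks = [predict_risk(weights, x) for x in rows]
```

In practice such a model would need to be trained and evaluated on separate patient sets, and a multi-stage outcome such as lesion stage would call for an ordinal or multi-class model rather than the binary one sketched here.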