In this proof-of-principle study, we applied our approach to data from three patients and from healthy controls. Samples were stimulated as described and fixed 8 minutes after stimulation of BCR signaling [3]. Six phospho-proteins were profiled, three at a time: SYK, ERK and p38 were measured in one stain set, and CBL, SFK and BTK in a second. Two stain sets were used because measuring the phosphorylated forms of all six proteins together, along with the surface markers needed to identify the relevant cell types, would have caused logistical and technical difficulties. Additionally, the data were not originally generated for multivariate analysis, so no attempt was made to increase the dimensionality of each experiment. We augmented the data matrix as described above, adding a state node for each patient. Because both stimulated and unstimulated data were included, a stimulation state node was included as well, to avoid confounding the distributions. As anticipated, this state node was fully connected, so it has been omitted from the results graphs for visual convenience. Data were discretized to 6 levels, and structure learning was performed as previously described [7], with all state nodes constrained to be root nodes. The resulting model () shows very high, nearly full, connectivity, with nearly all patients pointing to nearly all phospho-proteins. None of the patients pointed to p38, even though p38 is higher in these patient samples than in the normal controls (see [3]). Although the abundance of phosphorylated p38 changed in the disease state, this difference was explained by the influence of ERK on p38. Thus, the absolute amount of phosphorylated p38 was altered, but the conditional distribution P(p38 | ERK) remained the same. Patient 1 was the only patient that did not point to ERK. Consistent with this, the correlation between ERK and SYK in the Patient 1 sample was similar to that in the healthy samples (R ≈ 0.8, data not shown), while the correlation in the Patient 2 and Patient 3 samples was distinct.
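The augmentation and discretization steps described above can be sketched as follows. This is a minimal illustration and not the original pipeline: the function names, the pooled equal-frequency binning, and the synthetic input are all assumptions, since the discretization scheme itself is only given by reference to [7].

```python
import numpy as np

def discretize(values, n_levels=6):
    """Equal-frequency binning into integer levels 0..n_levels-1.

    The binning scheme is an assumption; the text only states that
    6 levels were used.
    """
    edges = np.quantile(values, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(values, edges)

def augment(samples, n_levels=6):
    """Stack per-sample cell matrices and append state-node columns.

    samples: list of (cells, patient_id, stimulated), where cells is an
             (n_cells, n_proteins) array and patient_id is None for a
             healthy control.
    Returns the augmented integer matrix and the ordered patient ids.
    Columns: discretized proteins, one 0/1 column per patient, then a
    0/1 stimulation column.
    """
    patient_ids = sorted({pid for _, pid, _ in samples if pid is not None})
    cells = np.vstack([c for c, _, _ in samples])
    # Discretize each protein over the pooled cells so that levels are
    # comparable across samples (an assumption of this sketch).
    levels = np.column_stack(
        [discretize(cells[:, j], n_levels) for j in range(cells.shape[1])]
    )
    state_rows = []
    for c, pid, stim in samples:
        row = [int(pid == p) for p in patient_ids] + [int(stim)]
        state_rows.append(np.tile(row, (len(c), 1)))
    return np.hstack([levels, np.vstack(state_rows)]), patient_ids

# Synthetic stand-in for one stain set (three phospho-proteins per cell).
rng = np.random.default_rng(0)
demo_samples = [
    (rng.normal(0.0, 1.0, size=(100, 3)), None, False),  # healthy, unstimulated
    (rng.normal(1.0, 1.0, size=(50, 3)), "P1", True),    # patient, stimulated
    (rng.normal(0.5, 1.0, size=(60, 3)), "P2", False),   # patient, unstimulated
]
matrix, patient_ids = augment(demo_samples)
```

Each row of `matrix` is one cell. Healthy cells carry zeros in every patient column, so the patient columns encode only departures from the healthy baseline. Constraining the state columns to be roots would be handled by the structure learner, which is outside this sketch.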
Model results. Structure learning was performed on the phospho-protein variables augmented with A. patient nodes only or B. patient nodes in addition to a disease state node. All state nodes are constrained to be roots.
The disease state produces general changes that make the patients as a group distinct from the healthy samples. These differences obscure patient-to-patient variation in the model results. We addressed this by including a disease state node, which indicates for each cell in the data whether it came from a healthy sample or a disease sample, in a patient-nonspecific manner. The resulting model () is considerably sparser than the original model, with general disease differences indicated by the disease state node. From stain set 1, the disease state node points to SYK, but not to ERK or p38. Irish et al. [3] reported a difference in activation of all three of these phospho-proteins, but our model was able to discern that the differences in ERK and p38 were due to the difference in SYK; their conditional distributions remain unchanged from the healthy to the disease state. Additionally, the original study did not explore the role of CBL or SFK, but our model discerned a change in these phospho-proteins. Examination of the data revealed a change in the distributions of these molecules (); however, because the data are non-normal and not unimodal, the summary statistics employed in [3] missed these changes. Thus, our approach successfully identifies differences between healthy and disease states.
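The reasoning behind a state node pointing to SYK but not to ERK or p38 can be illustrated with a small synthetic example. The sketch below is not the scoring procedure of [7]: it uses BIC as a stand-in score, and the data, function names, and parameters are assumptions. Data are simulated along a chain disease → SYK → ERK → p38, and the score change from adding the disease node as a parent is computed for each variable.

```python
import numpy as np

def discretize(values, n_levels=6):
    """Equal-frequency binning into integer levels 0..n_levels-1 (assumed scheme)."""
    edges = np.quantile(values, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(values, edges)

def bic(child, child_card, parents):
    """BIC of a discrete child given parents, a list of (column, cardinality)."""
    n = len(child)
    config = np.zeros(n, dtype=int)  # index of each row's joint parent setting
    n_configs = 1
    for column, card in parents:
        config = config * card + column
        n_configs *= card
    counts = np.zeros((n_configs, child_card))
    np.add.at(counts, (config, child), 1)
    totals = counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        log_p = np.where(counts > 0, np.log(counts / totals), 0.0)
    log_lik = (counts * log_p).sum()
    n_params = n_configs * (child_card - 1)
    return log_lik - 0.5 * n_params * np.log(n)

def edge_gain(child, child_card, parents, state):
    """BIC change from adding the state node as an extra parent of child."""
    return bic(child, child_card, parents + [state]) - bic(child, child_card, parents)

# Synthetic chain: disease shifts SYK; ERK depends only on SYK; p38 only on ERK.
rng = np.random.default_rng(0)
n = 20000
disease = rng.integers(0, 2, size=n)
syk = discretize(rng.normal(loc=1.5 * disease, scale=1.0))
erk = discretize(syk + rng.normal(scale=1.0, size=n))
p38 = discretize(erk + rng.normal(scale=1.0, size=n))
state = (disease, 2)

gains = {
    "SYK": edge_gain(syk, 6, [], state),
    "ERK, no parents": edge_gain(erk, 6, [], state),
    "p38, no parents": edge_gain(p38, 6, [], state),
    "ERK given SYK": edge_gain(erk, 6, [(syk, 6)], state),
    "p38 given ERK": edge_gain(p38, 6, [(erk, 6)], state),
}
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}")
```

In this simulation the marginal distributions of ERK and p38 differ between the two states, so the disease edge improves the score when the upstream protein is ignored. Once the upstream protein is included as a parent, the extra edge no longer pays for its added parameters, which mirrors the situation in which the abundance changes while the conditional distribution does not.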
Raw data. A. Three-dimensional and B. histogram plots of stain set 2 phospho-proteins. Patient 7 corresponds to patient 1, patient 10 to patient 2 and patient 11 to patient 3. A healthy control is included for comparison.
Differences among the patients could be seen more clearly once the major disease/healthy alterations were represented separately (by the disease state node). As before, Patient 1 did not point to ERK because its SYK-ERK correlation was similar to that of the healthy samples. Note that the disease state node did not change this, because it did not itself point to ERK. Patient 2 did not point to SYK, which is consistent with the data: the SYK distribution was about average among the patient samples. However, the SYK-ERK correlation for Patient 2 was different from that of the other patients, explaining the presence of an edge from Patient 2 to ERK. (See )
Raw data. Two-dimensional plot of SYK versus ERK. Patient 7 corresponds to patient 1, patient 10 to patient 2 and patient 11 to patient 3. A healthy control is included for comparison.
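The per-sample SYK-ERK comparison discussed above amounts to computing a correlation separately within each sample. A minimal sketch is given below, assuming the Pearson coefficient (the text does not state which coefficient R denotes) and using synthetic values; the sample labels and the function name are hypothetical.

```python
import numpy as np

def per_sample_correlation(x, y, sample_labels):
    """Pearson correlation of x and y computed separately within each sample."""
    sample_labels = np.asarray(sample_labels)
    result = {}
    for label in np.unique(sample_labels).tolist():
        mask = sample_labels == label
        result[label] = float(np.corrcoef(x[mask], y[mask])[0, 1])
    return result

# Synthetic cells: two samples share a strong SYK-ERK relationship,
# the third has a much weaker one.
rng = np.random.default_rng(1)
n = 5000
labels = np.repeat(["healthy", "patient_a", "patient_b"], n)
slope = np.repeat([0.8, 0.8, 0.2], n)
syk = rng.normal(size=3 * n)
erk = slope * syk + np.sqrt(1 - slope**2) * rng.normal(size=3 * n)

correlations = per_sample_correlation(syk, erk, labels)
for label, r in correlations.items():
    print(f"{label}: R = {r:.2f}")
```

A sample whose correlation matches the healthy value gives the learner no reason to add an edge from that sample's state node to ERK once SYK is a parent, whereas a sample with a clearly different correlation does.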
Notably, for the stain set 2 phospho-proteins, Patient 3 pointed to no phospho-proteins. This is unsurprising, as the values of the three phospho-proteins for this patient were about average for the disease state. Patient 2 had a level of CBL that was about average for the disease state, but a drastically altered distribution of SFK and BTK, as discerned by the model. A visual inspection of the data demonstrates that the joint distributions of the stain set 2 phospho-proteins differed among the three patients, though it does not clearly show the specific points of difference for each patient, aside from those mentioned above (). In general, the technique was more sensitive to changes than visual inspection. As the dimensionality of our data increases to 4 dimensions and beyond, a thorough visual inspection of the complex interactions becomes impossible, necessitating a computational examination of patient differences.