Objective: To apply advanced statistical model-building procedures to derive proteomic “signatures” from 2D gels and validate the approach by predicting double-blinded samples.
Methods: A large experiment was used to explore the power of the predictive modeling process (340 samples, 18 groups, seven double-blinded and three unknown).
The images were geometrically corrected and then analyzed at the pixel level. Once this was complete, the areas most important for group discrimination were identified automatically.
The areas were ranked and visually examined, and up to 10 areas per group were selected for the next stage of analysis. The 117 resulting spots were then used to build predictive models. The models were explored both in the context of the experiment and for their prediction performance. This process allows the selection of candidate spots that may fall below standard univariate thresholds (such as p < 0.05 or 1.5-fold change).
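As a minimal sketch of why a spot below a univariate threshold can still matter in a multivariate model, consider the toy example below. The intensities, group labels, and helper names (`fold_change`, `ranges_overlap`) are invented for illustration and are not taken from the experiment; the example only shows the general idea, not the modeling procedure used in the study.

```python
from statistics import mean

# Hypothetical intensities (spot 1, spot 2) for four samples per group.
# These numbers are made up purely to illustrate the idea.
group_a = [(1.0, 1.1), (2.0, 2.1), (3.0, 3.1), (4.0, 4.1)]
group_b = [(1.0, 1.4), (2.0, 2.4), (3.0, 3.4), (4.0, 4.4)]


def fold_change(a, b):
    """Ratio of the larger group mean to the smaller group mean."""
    mean_a, mean_b = mean(a), mean(b)
    return max(mean_a, mean_b) / min(mean_a, mean_b)


def ranges_overlap(a, b):
    """True if the value ranges of the two groups intersect."""
    return max(min(a), min(b)) <= min(max(a), max(b))


# Univariate view of spot 2: small fold change, heavily overlapping ranges.
spot2_a = [sample[1] for sample in group_a]
spot2_b = [sample[1] for sample in group_b]
fc_spot2 = fold_change(spot2_a, spot2_b)
univariate_overlap = ranges_overlap(spot2_a, spot2_b)

# Multivariate view: spot 2 relative to spot 1 separates the groups cleanly.
diff_a = [s2 - s1 for s1, s2 in group_a]
diff_b = [s2 - s1 for s1, s2 in group_b]
combined_overlap = ranges_overlap(diff_a, diff_b)

print(f"spot 2 fold change: {fc_spot2:.2f}")
print(f"spot 2 alone overlaps between groups: {univariate_overlap}")
print(f"spot 2 minus spot 1 overlaps between groups: {combined_overlap}")
```

In this toy case spot 2 would fail a 1.5-fold filter, yet in combination with spot 1 it discriminates the two groups perfectly.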
Results: The models built gave perfect performance on the training sets. The blind samples were successfully predicted, and the models produced interesting information on the unknown samples, which is the subject of further experimentation. The effective “systems” dimension for the set of 18 groups was estimated to be 12, which suggests that the experiment may contain more groups than the data support. A “minimal spot set” was calculated and showed that prediction performance saturates at around 60 spots. A follow-on procedure was used to choose the best spots for group discrimination and to characterize the relationship between spot number and performance.
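One simple way to read a saturation point off a spot-number vs. performance curve is sketched below. The function name `saturation_point`, the tolerance, and the curve values are assumptions made for illustration; they are not the study's data or its actual procedure.

```python
def saturation_point(spot_counts, accuracies, tolerance=0.01):
    """Return the smallest spot count whose accuracy is within
    `tolerance` of the best accuracy on the curve."""
    best = max(accuracies)
    for count, accuracy in zip(spot_counts, accuracies):
        if best - accuracy <= tolerance:
            return count


# Hypothetical curve: accuracy rises quickly, then flattens out.
toy_counts = [5, 10, 20, 30, 40]
toy_accuracies = [0.60, 0.78, 0.90, 0.95, 0.955]

toy_saturation = saturation_point(toy_counts, toy_accuracies)
print(f"performance saturates at about {toy_saturation} spots")
```

The tolerance sets how much residual improvement is treated as negligible, so the reported saturation point depends on that choice.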
Conclusion: Proteomics data provide a rich source for advanced statistical modeling techniques, and using standard double-blind procedures can add an intuitive confidence to the experimental results. The techniques are very powerful in assisting in the exploration of the complex relationships intrinsic to the data.