Reproducibility
The more precisely the position is determined, the less precisely the momentum is known in this instant, and vice versa.
– Heisenberg
The heart of the issue for “omics” methodology has been reproducibility. Healthy skepticism on this point has pushed genomic studies to derive training sets from well-characterized cohorts and to validate them in separate cohorts [20]. The proteome is somewhat different, but the simplest solution, using well-characterized samples with standard operating procedures, greatly enhances reproducibility.
Nevertheless, unlike a relatively stable genome, the proteins present in a given system will necessarily differ not only from one individual to the next, but also from one moment to the next. Proteomic studies are extraordinarily sensitive to such changes as well as to differences in sample preparation protocols and instrumentation. As confounders, such variables can quickly overwhelm experimental data with extrinsic noise. But, in the context of well-designed and well-controlled experiments, the same sensitivity to individual and temporal variation can augment the wealth of biologically useful information. For example, the study of a well-phenotyped individual organism over time may offer more reliable hypothesis-generating targets than pooled heterogeneous samples of large sizes.
Accordingly, it can equally be argued that sensitive measurement of rapidly changing biological variability potentially leaves no room for a traditional sense of “reproducibility.” Is it actually possible to reproduce or recapture both quantity and temporal profile from experiment to experiment, or from one individual to another? Heisenberg’s uncertainty principle states that the position and momentum of a particle cannot both be determined simultaneously with arbitrary precision. The principle is closely tied to the “wave-particle duality” of quantum mechanics, which addressed the inadequacy of the classical debate between one-dimensional theories of particle versus wave. In acknowledging the “uncertainty” and complexity of matter, the theory laid foundations for quantum mechanics, calling for alternative ways to describe matter. Perhaps an analogous principle should be applied to proteomics, where the unique biological signature must also be described in relation to both space and time. That is, in a biological entity, the precise quantity of any protein is expressed as a function of many “phenotypic dimensions” over time. To be fully descriptive, proteomics should not simply be “reproducible” in one dimension; rather, proteomic patterns will have to be “continuously convergent.”
Innovative bioinformatics and mathematical tools can help us to store, visualize, sort, and reconstruct data, such that similar conclusions can be drawn from multiple datasets. But this may not mean that exactly the same proteins are elevated. Instead, global pathways or interactions may be preserved in phenotypically different ways. Individual proteomic profiles may never be identical (“reproducible”), but by making the right interpretive comparisons in experiments constructed to focus on relevant changes within each single subject, they may consistently tell us the same story, and that is what matters.
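As a minimal sketch of this idea, the example below compares two datasets at the protein level and then at the pathway level. Everything in it is hypothetical: the protein identifiers (`P1` to `P5`), the pathway labels, the `pathway_of` annotation table, and the choice of the Jaccard index as the overlap measure are assumptions made for illustration, not a method taken from the studies discussed here.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical protein-to-pathway annotation (invented for illustration).
pathway_of = {
    "P1": "pathway_X",
    "P2": "pathway_X",
    "P3": "pathway_Y",
    "P4": "pathway_Y",
    "P5": "pathway_Z",
}

# Proteins found elevated in two hypothetical experiments.
elevated_1 = {"P1", "P3"}
elevated_2 = {"P2", "P4"}

# Protein-level comparison: do the same proteins recur?
protein_overlap = jaccard(elevated_1, elevated_2)

# Pathway-level comparison: do the same pathways recur?
pathways_1 = {pathway_of[p] for p in elevated_1}
pathways_2 = {pathway_of[p] for p in elevated_2}
pathway_overlap = jaccard(pathways_1, pathways_2)

print("protein-level overlap:", protein_overlap)
print("pathway-level overlap:", pathway_overlap)
```

In this toy case the two experiments share no elevated protein, yet they implicate the same pathways, which is one simple way to picture profiles that are not identical but still tell the same story.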
By analogy, each organism is still the same or its own “replicate” as it ages. Its biological and physical features may change drastically over time as any biological entity cannot stay static or “noise-free.” But, it is still the same organism! So, perhaps a new set of criteria needs to be established to measure this “continuous convergence” of a distinct biological signature with respect to time and intrinsic biological “noise.”
Noise
How do you make sense of your life? Signal to noise: What’s signal? What’s noise?
– Neil Gaiman
Accordingly, it may be necessary to recalibrate our understanding of what reproducibility means in the context of noise. And all biological systems have noise. For example, early protein chemistry, in what are now termed “targeted” or “candidate” approaches focusing on one or a few biomarkers, has been successful. Examples of useful clinical markers in cancer (e.g., PSA for prostate cancer diagnosis), infectious disease (CD4 count to follow progression of HIV), or cardiovascular disease (CRP in cardiac risk stratification) all pre-date some of the “omics” explosion and are strongly rooted in bench research. A major introspective question within the omics/biomarker field is, “why is it that over the last decade, we are equipped with better instrumentation and omics technology, and yet we have fewer discoveries of clinically relevant biomarkers” [17]? And, even when there are new findings, they often seem to be nonspecific inflammatory “noise,” and they become irreproducible and non-predictive in another individual or another cohort. Of course, many potential reasons have been discussed, such as overfitting of data and variable sample and instrumentation quality [18, 19].
However, it is unreasonable to expect the same mass spectra to be generated by different investigators working with different protocols on different machines, using pooled samples taken at different times from different cohorts of multiple individuals. Though such methods were important in early efforts to chart the human proteome (understood as an abstract entity analogous to the genome), they may not be appropriate for translational research directed at clinically relevant variations in individual patients.
Pragmatically and methodologically, if omics technology affords us the sensitivity to study a multitude of markers at the same time, can we leverage this power to study a smaller number of individuals over time and space, across specific pathways, interactions, phenotypes, or disease states? Using each individual as their own control could reduce the caveats of complexity, dimensionality, and confounders. Just as we each carry our own blood type for future matching, can we each have our own non-diseased proteome baseline, such that, when we are ill, a novel signal can be detected more readily? Is the “noise” also worth investigating and incorporating into the new perspective? Not just philosophically but pragmatically, perhaps we need another set of criteria for reproducibility in the context of the noise of life. Once again, expecting everything to be the same or reproducible at all times, just because subjects share similar disease states, may not reflect our ever-changing homeostasis. Proteomic investigations may instead require a search for a continued convergence of patterns.
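The personal-baseline idea can be illustrated with a small numerical sketch. The abundance values, subject names, and the use of a simple standard score are all assumptions chosen for illustration; real proteomic data would need normalization and a proper statistical model. The point is only to show how the same measurement can stand out against a subject's own healthy baseline while being invisible against a pooled cohort.

```python
from statistics import mean, stdev

def z_against(reference, value):
    """Standard score of `value` relative to a list of reference measurements."""
    return (value - mean(reference)) / stdev(reference)

# Hypothetical abundances (arbitrary units) of one protein, measured
# repeatedly in three healthy subjects. All values are invented.
healthy = {
    "subject_a": [10.0, 10.4, 9.8, 10.2, 9.6],
    "subject_b": [20.0, 19.5, 20.6, 20.1, 19.8],
    "subject_c": [30.0, 30.3, 29.6, 30.2, 29.9],
}

# A new measurement from subject_a during illness (also invented).
ill_value = 13.0

# Reference 1: the subject's own healthy baseline.
z_own = z_against(healthy["subject_a"], ill_value)

# Reference 2: every healthy measurement pooled across the cohort.
pooled = [v for series in healthy.values() for v in series]
z_pooled = z_against(pooled, ill_value)

print(f"z vs own baseline:  {z_own:.2f}")
print(f"z vs pooled cohort: {z_pooled:.2f}")
```

Because between-subject differences dominate the pooled reference, the shift in `subject_a` falls well within the cohort's spread, whereas it is many standard deviations away from that subject's own baseline.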
Fishing
Cast the net on the right side of the boat, and ye shall find.
– John 21:6
Proteomics is not blind fishing, nor is it entirely new, since it is built on decades of careful work in the study of proteins, which even pre-dates the discovery of DNA. It is a method, enabled by advancing technology in mass spectrometry and protein separation chemistry, for studying protein expression and interaction in real time. Though a powerful methodology, it cannot take the place of a well-designed experiment, years of well-studied targets, or thoughtful hypotheses about the focused biological questions at hand. The science remains the same, but omics technology can now offer a more multi-dimensional probing of complex organisms. While it offers more detail, much as the introduction of perspective into painting gave a more realistic 3D depiction of the world, the subjects of rigorous scientific inquiry remain the same.
One important innovation from “discovery” proteomics is its ability to yield multiple novel candidates for further study. To simply label this as “fishing” is to turn our back on a great potential benefit. Of course, we should be on guard as always against poorly designed experiments and unfocused hypotheses. But, real fishermen possess a great deal of detailed skill and knowledge about fish, the water in which they live, and how/when/where to catch the fish they want. Similarly, the type of protein–protein interactions, pathways, and candidates should fit the context of the specific and focused hypothesis.
For example, well-planned discovery proteomics should aim to capture an augmented signal by asking questions where the expected effect is large, such as comparing samples before and after a therapeutic challenge or cellular stimulus to gauge therapeutic efficacy or response. A combination of targeted and discovery approaches may also yield more fruitful results that are specific to the mechanism under investigation. Sometimes, starting with discovery proteomics in a less complex mixture such as cell culture, and using it to guide the more complex in vivo exploration with targeted methodology, may yield helpful results. Conversely, starting with clinical exploration and taking the findings back to an animal model for validation can help to confirm biological relevance. Incorporating a longitudinal temporal component and a crossover design can also help reduce noise and improve signal. Therefore, a well-designed proteomic study has the potential to answer specific scientific questions while at the same time discovering new candidates for further investigation, greatly accelerating the pace of new hypothesis generation to complement traditional methods. If one never goes fishing, one cannot catch fish. So, happy (wise, well-planned, and productive) fishing!