Evolution and translation of research findings do not have to be a roundtrip journey from bench to nowhere. In Box 4, I list some suggestions that may improve the situation. As we work on integrating scientific disciplines and materializing discoveries, translation would benefit from robust evidence. Translating non-credible, non-replicated research findings may have bleak consequences. We already have several useless prognostic and diagnostic tests, ineffective and possibly harmful therapies, and redundant sub-specialties sustained by unsubstantiated optimism about their benefits [55]. We should not add more junk to this pile.
Box 4. Improving the Credibility, Replication, and Translation of Research Findings: Thoughts for Possible Solutions
- Promote multidisciplinary communication.
- Foster systematic, evidence-based approaches to research.
- Acknowledge in earnest the difficulty and even the failures of the scientific enterprise.
- Examine which pathways have led to specific successes and failures in translation.
- Focus on credibility rather than simply the statistical significance of research findings.
- Synthesize evidence systematically from many studies and teams of investigators and anticipate this integration from the design phase of research.
- Give credit to original ideas, good-quality work, and robust methodology rather than to impressive claims and magazine hype.
- Encourage rigorous replication, not just discovery.
As researchers, we should acknowledge difficulties and failures. In a world where everyone struggles to impress with achievements, public trust in science may be enhanced if it is seen as an enterprise where its workers do not simply try to impress, but seek the truth under often unfavorable odds of success. We also need to examine systematically what really has worked to date and the pathways of discovery for such successes. Moreover, we have a large evidence base where we can find out what has not worked so far and where and why we have been misled.
Research findings should be ascribed a credibility level that is different from their formal statistical significance. In the current era of massive hypothesis testing, levels of statistical significance are almost non-interpretable. The p-value threshold of 0.05, which barely worked when there were few hypotheses and investigators, is currently impractical. Circulating p-values increasingly reach depths of 10⁻⁴, 10⁻¹⁰, or 10⁻⁶⁰. “Details” on how the data are collected, handled, and analyzed can change p-values by log scales.
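One way to make the gap between credibility and statistical significance concrete is a simple Bayesian calculation, sketched here under stated assumptions. It is not a method given in this text. The function name `post_study_probability` and the parameters `prior_odds`, `alpha`, and `power` are illustrative choices, and the sketch ignores bias entirely. It treats credibility as the probability that a "significant" finding is true, given the pre-study odds that a tested relationship is real.

```python
def post_study_probability(prior_odds: float, alpha: float = 0.05,
                           power: float = 0.8) -> float:
    """Probability that a statistically significant finding is true.

    Assumptions (illustrative, not from the article):
      prior_odds -- ratio of true to null relationships among those tested
      alpha      -- significance threshold (type I error rate)
      power      -- probability of detecting a true relationship
    Bias and selective reporting are ignored in this sketch.
    """
    if prior_odds < 0 or not 0 < alpha <= 1 or not 0 <= power <= 1:
        raise ValueError("prior_odds must be >= 0, alpha in (0, 1], power in [0, 1]")
    true_positives = power * prior_odds   # expected true hits per null hypothesis tested
    false_positives = alpha               # expected false hits per null hypothesis tested
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    scenarios = [
        ("few, well-motivated hypotheses", 1.0, 0.05),
        ("massive testing, 1 true per 1,000", 0.001, 0.05),
        ("massive testing, stricter threshold", 0.001, 1e-6),
    ]
    for label, odds, alpha in scenarios:
        ppv = post_study_probability(odds, alpha=alpha)
        print(f"{label}: alpha={alpha:g}, probability true = {ppv:.3f}")
```

Under these assumptions, the same 0.05 threshold gives very different credibility depending on how many of the tested hypotheses could plausibly be true, which is one reason a p-value alone cannot serve as a credibility level.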
In the past, we had few research findings; currently we have too many. This is exciting, but we don't know what they mean and how to use them. Credibility of research findings may be visualized in the form of a wide-based pyramid, where most findings have low credibility, and few have high credibility. RCTs can test findings that are somewhere between the middle and the top of the credibility pyramid. Target selection should be careful and systematically evidence-based. Apart from attention to design, power, and protection from biases, this also requires careful strategic planning for designing research agendas and making sense of the overall picture of all RCTs in each field [56]. Designing trials in isolation or with non-scientific priorities creates fragmented, irrelevant evidence.
Finally, replication in the current era is probably as important as or even more important than discovery. Replication alone does not protect against bias. Studies with inherently bad design may be prone to replication if the same errors are repeated, while well-designed studies tend to replicate only when they are correct [58]. Replication requires rigorous evaluation with consistency in a variety of repeated tests. Scientific credit has traditionally been given to discoverers, but for many research fronts, discovery is currently an automated multiple testing process. The more difficult challenge is to dismiss false discoveries and materialize some truly useful findings.