The CONSORT [34,35] statement provides guidance and structure to investigators when reporting the results of clinical trials. These guidelines are intended to clarify the key outcomes of these investigations and to ensure that their description is detailed and consistent across the abstract, methods, results and tables. However, although the CONSORT statement recommends a single primary outcome, it does not directly specify statistical methods for appropriately handling multiple outcomes. A recent study of the statistical problems identified by reviewers for high-impact psychiatry journals demonstrated the need to improve the reporting of multiple statistical tests [22].
The CONSORT 2010 [36] statement strengthens the discussion of multiple outcomes, noting that while a trial may have more than one primary outcome, "having several primary outcomes, however, incurs the problems of interpretation associated with multiplicity of analyses … and is not recommended." (p. 7).
Nearly half of the depression clinical trials published between January 2007 and October 2008 in leading medical and psychiatry journals reported more than one primary outcome, and nearly all reported more than one primary or secondary outcome. The median number of outcomes (excluding our category of tertiary outcomes, such as side effects) was seven. Depression is a multifaceted disorder that manifests in many ways across multiple domains, and no single primary outcome is appropriate for all depression studies. This makes it all the more important to state clearly which outcomes are being considered and how they will be accounted for in the analysis.
We also found that determining the number of primary and secondary outcomes for many of the articles included in this study was not straightforward, with relatively few clearly and consistently specifying primary and secondary outcomes [e.g. 28,30,37].
Separate analyses, with no correction for multiplicity, were the most common method of analyzing multiple outcomes. A familiar drawback of this approach is inflation of the familywise Type I error rate (the probability of obtaining at least one statistically significant result by chance alone). While it is critically important that multiple domains of a disorder are discussed, interpreting a large number of p-values can be challenging for a clinical reader. Although we focused on randomized trials, similar issues arise in observational studies. Failure to account for the multiplicity of comparisons can lead to invalid inferences and spurious conclusions. At the very least, researchers reporting a profusion of results without adjustment should address the internal consistency of their findings [38].
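To make the inflation concrete, the short Python sketch below computes the familywise error rate for k tests, each carried out at level alpha. It assumes the tests are independent and that every null hypothesis is true; both are simplifying assumptions, and correlated outcomes would typically give a somewhat smaller rate. Setting k = 7 mirrors the median number of outcomes reported above, and alpha / k is the Bonferroni per-test level.

```python
def familywise_error_rate(k: int, alpha: float) -> float:
    """Probability of at least one false-positive result among k independent
    tests, each at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


k, alpha = 7, 0.05
print(f"Unadjusted (alpha = {alpha}): {familywise_error_rate(k, alpha):.3f}")      # 0.302
print(f"Bonferroni (alpha / k):       {familywise_error_rate(k, alpha / k):.3f}")  # 0.049
```

Under these assumptions, testing seven outcomes separately at 0.05 gives roughly a 30% chance of at least one spurious "significant" finding, whereas the Bonferroni level keeps the rate just under 5%.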
The appropriate use of corrections for multiplicity is not always straightforward [38,39]. Rothman [40] notes that scientists need to explore multiple leads in the search for better interventions and treatments, and that inappropriate use of multiplicity adjustment may obscure potentially important findings. Nonetheless, inflation of the Type I error rate is a serious concern, and in randomized trials it must be accounted for in the trial protocol. Several papers employed a Bonferroni-type correction to address multiplicity. Lesperance et al. [30] took a particularly creative approach, splitting the overall 0.05 significance level unequally between outcomes: the primary outcome (HAM-D) was tested at alpha = 0.033 and the secondary outcome (BDI) at alpha = 0.017.
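The allocation used by Lesperance et al. is a weighted Bonferroni procedure: because the per-outcome levels sum to 0.05, the familywise error rate stays at or below 0.05 whatever the correlation between the outcomes. The Python sketch below illustrates the decision rule; the function name and the p-values are hypothetical, chosen for illustration and not taken from that trial.

```python
def weighted_bonferroni(p_values: dict[str, float],
                        alpha_split: dict[str, float],
                        overall_alpha: float = 0.05) -> dict[str, bool]:
    """Reject the null for outcome j when p_j <= alpha_j.

    By the Bonferroni inequality the familywise error rate is at most
    sum(alpha_j), so the allocation must not exceed the overall level."""
    if sum(alpha_split.values()) > overall_alpha + 1e-9:  # tolerance for float rounding
        raise ValueError("alpha allocation exceeds the overall significance level")
    return {name: p_values[name] <= level for name, level in alpha_split.items()}


alpha_split = {"HAM-D": 0.033, "BDI": 0.017}  # split reported for Lesperance et al.
p_values = {"HAM-D": 0.02, "BDI": 0.03}       # hypothetical p-values
print(weighted_bonferroni(p_values, alpha_split))  # {'HAM-D': True, 'BDI': False}
```

Weighting the split toward the primary outcome preserves more power for it, at the cost of a stricter threshold for the secondary outcome.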
A common critique of the Bonferroni method is that it tends to be conservative when the outcomes are correlated. However, the simulations of Yoon et al. [4] indicated that, in settings similar to that of the CATIE trial with five outcomes, the Bonferroni adjustment performed adequately when correlations were moderate. In psychiatric studies, highly correlated endpoints are rare.
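The conservativeness of Bonferroni under correlation can be checked with a small Monte Carlo experiment. The Python sketch below is not the simulation of Yoon et al.; it assumes equicorrelated, standard normal test statistics with all null hypotheses true, and estimates the familywise error rate of two-sided Bonferroni tests across five outcomes for several correlation values. The function name and simulation settings are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def bonferroni_fwer(k: int, rho: float, alpha: float = 0.05,
                    n_sims: int = 20_000, seed: int = 2024) -> float:
    """Monte Carlo estimate of the familywise error rate of two-sided
    Bonferroni tests on k outcomes when all null hypotheses are true and
    the k z-statistics share a common correlation rho (0 <= rho <= 1)."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / (2 * k))  # per-test two-sided cutoff
    w_shared, w_own = math.sqrt(rho), math.sqrt(1 - rho)
    false_alarms = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)
        # z_j = sqrt(rho) * shared + sqrt(1 - rho) * e_j: unit variance, correlation rho
        z = [w_shared * shared + w_own * rng.gauss(0.0, 1.0) for _ in range(k)]
        if any(abs(zj) > crit for zj in z):
            false_alarms += 1
    return false_alarms / n_sims


for rho in (0.0, 0.3, 0.5, 0.8):
    print(f"rho = {rho:.1f}: estimated FWER = {bonferroni_fwer(5, rho):.4f}")
```

Under these assumptions the estimate should sit near the nominal 0.05 for independent outcomes and fall further below it as the common correlation increases, so the loss from Bonferroni's conservativeness grows with the strength of the correlation.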
Wider use of more sophisticated approaches to account for multiplicity may be warranted. Joint testing is particularly attractive in this setting [4,7]. By capitalizing on the correlation among multiple outcomes, these methods are generally more powerful than separate analyses [7] or Bonferroni adjustment [4]. Although more complicated than separate testing of each outcome with multiplicity adjustment, these approaches are straightforward to fit in general-purpose statistical software [4,7]. Changes in the scale of research and the use of large data banks to test hypotheses will complicate future evidence-based medicine and are likely to exacerbate these issues [41].
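As a minimal illustration of how a global test pools correlated outcomes, the Python sketch below sums the standardized treatment-effect statistics (z-scores) and rescales the sum by its standard deviation under the null. It assumes the z-scores are approximately standard normal under the null, share a known common correlation rho, and that the treatment is expected to act in the same direction on every outcome. This sum-of-z statistic is only one simple example of a global test, not the specific joint modeling approaches described in [4,7]; the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def sum_z_global_test(z_scores: list[float], rho: float) -> tuple[float, float]:
    """Global test of 'no treatment effect on any outcome'.

    Under the null, each z is ~N(0, 1) with common pairwise correlation rho,
    so sum(z) has variance k + k * (k - 1) * rho."""
    k = len(z_scores)
    null_sd = math.sqrt(k + k * (k - 1) * rho)
    t = sum(z_scores) / null_sd
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p_two_sided


# Hypothetical z-scores for three outcomes, none significant on its own
z = [1.9, 1.7, 1.8]
t, p = sum_z_global_test(z, rho=0.4)
print(f"global t = {t:.2f}, p = {p:.3f}")  # global t = 2.32, p = 0.020
```

Individually, the two-sided p-values for these z-scores range from about 0.06 to 0.09, so none would pass even an unadjusted 0.05 test, yet the pooled statistic does. The flip side is that effects in opposite directions cancel, so this kind of test suits outcomes that are expected to move together.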
Another troubling problem, unrelated to the multiplicity issue, concerns missing data. When outcome data are only partially observed, separate analyses of the outcomes will lead to the inclusion of different subjects for the analysis of each outcome. The reader is then faced with interpreting treatment effects based on different samples of subjects, as well as assessing assumptions regarding missingness. Joint models are particularly attractive in this setting, since they incorporate partially observed data and pool information across outcomes.
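The Python sketch below illustrates the point with a hypothetical five-subject data set (the subject IDs, outcome names and values are invented for illustration). Analyzing each outcome separately on its complete cases uses a different subset of subjects for each outcome, and only a minority of subjects have every outcome observed.

```python
from typing import Optional

# Hypothetical outcome data; None marks a value that was not observed
trial = {
    "s1": {"HAM-D": 12, "BDI": 20, "QoL": 55},
    "s2": {"HAM-D": 15, "BDI": None, "QoL": 60},
    "s3": {"HAM-D": None, "BDI": 18, "QoL": 58},
    "s4": {"HAM-D": 9, "BDI": 14, "QoL": None},
    "s5": {"HAM-D": 11, "BDI": 16, "QoL": 62},
}


def complete_case_sets(data: dict[str, dict[str, Optional[float]]]) -> dict[str, set[str]]:
    """For each outcome, the subjects a separate complete-case analysis would use."""
    outcomes = sorted({name for record in data.values() for name in record})
    return {
        name: {sid for sid, record in data.items() if record.get(name) is not None}
        for name in outcomes
    }


per_outcome = complete_case_sets(trial)
for name, ids in per_outcome.items():
    print(name, sorted(ids))
# BDI ['s1', 's3', 's4', 's5']
# HAM-D ['s1', 's2', 's4', 's5']
# QoL ['s1', 's2', 's3', 's5']
print("all outcomes observed:", sorted(set.intersection(*per_outcome.values())))
# all outcomes observed: ['s1', 's5']
```

Each separate analysis uses four of the five subjects, but a different four, and only two subjects contribute to all three. A joint model, by contrast, can draw on every observed value in a single analysis, subject to assumptions about why the data are missing.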
The concordance between the protocols published in trial registries and the number of outcomes reported in publications was also discouragingly low, albeit similar to findings reported for cardiology, rheumatology and gastroenterology RCTs [42]. Although the FDA Amendments Act of 2007 now requires investigators and sponsors to submit information on any applicable clinical trial to NIH/NLM, it will take some time before full adherence to this act is reflected in published trial results. While we anticipate that, in response to new journal requirements, more investigators will publish their protocols in a more timely and complete fashion, selective reporting remains a potential problem [8,9]. Investigators must not "torture their data until they speak" [38] by examining additional outcomes, undertaking unplanned subgroup analyses or similar mischief. The addition of a CONSORT checklist item noting changes to trial outcomes after the trial commences should also help with this issue.
To help improve practice in this area, we suggest that all clinical trial reports:
- Clearly specify a single primary outcome of the study (potentially a clinically interpretable composite), or include multiple primary outcomes along with a strategy to account for multiplicity (e.g. adjustment for multiple comparisons or analysis using a joint model or global test),
- Specify a limited number of secondary outcomes, along with a justification for their inclusion,
- Report these analytic decisions in the protocol published in a recognized trial registry prior to the start of trial analysis,
- Ensure that the discussion of these outcomes is consistent across the protocol, abstract, methods, results and tables, and
- Consider using more principled approaches to account for multiple outcomes, to help minimize the chance of spurious results due to multiplicity and to help ensure that maximal evidence-based knowledge accrues from these important and expensive trials.
These recommendations, all of which flow from the CONSORT guidelines and are consistent with the FDA Amendments Act of 2007, could easily be incorporated into common practice. If widely adopted, they could help improve the timely dissemination and appropriate interpretation of results from clinical trials.