Pooled analysis of individual patient data from stroke trials can deliver more precise estimates of treatment effect, enhance power to examine pre-specified subgroups, and facilitate exploration of treatment-modifying influences. Analysis plans should be declared, and preferably published, before trial results are known. For pooling trials that used diverse analytic approaches, an ordinal analysis is favoured, with justification for considering deaths and severe disability jointly. Since trial pooling is an incremental process, analyses should follow a sequential approach, with statistical adjustment for iterations. Updated analyses should be published when revised conclusions have a clinical implication. However, caution is recommended in declaring pooled findings that may prejudice ongoing trials, unless clinical implications are compelling. All contributing trial teams should contribute to leadership, data verification, and authorship of pooled analyses. Development work is needed to enable reliable inferences to be drawn about individual drug or device effects that contribute to a pooled analysis, versus a class effect, if the treatment strategy combines two or more such drugs or devices. Despite the practical challenges, pooled analyses are powerful and essential tools in interpreting clinical trial findings and advancing clinical care.
Scientific advancement is based on hypothesis testing and replication. Clinical trials are interpreted on this basis: a single positive trial may be encouraging for any new therapy but two such trials are typically required for marketing authorisation and establishment into clinical practice. For many reasons, that may include insufficient statistical power, suboptimal design, inexperience with treatment delivery, and use of prototype treatment approaches, initial clinical trials of a useful treatment may declare a falsely neutral result; however, publication bias also contributes to the trend for later trials to be positive. It has become recognised practice to pool trials to refine our assessment of the treatment effect, helping to indicate not just whether it is effective, but also how effective it may be, and in which circumstances.
Typically, data are pooled at the trial level (i.e., meta-analysis) or, occasionally, at the subgroup level. For example, the Cochrane review of thrombolysis for acute ischaemic stroke management used trial-level data for its main analysis and considered two treatment time windows for subgroup analyses. While this type of trial-level pooling is useful, it disregards potentially valuable information at the patient level that could prevent false conclusions. Taking the principal results of the Cochrane thrombolysis review, a reader may conclude that use of iv alteplase is justified only if administered within 3 hours of stroke onset, since the 3–6 hour subgroup analysis showed no significant benefit; or that treatment at any time within 6 hours is justified, since the primary analysis of the 0–6 hour data was positive.1 Conversely, a reader of the pooled analysis of individual patient data (IPD) would likely draw a different conclusion: seeing that treatment benefit is closely dependent on delay since stroke onset, and that benefit remains statistically significant until at least 4.5 hours, the reader may favour treatment beyond 3 hours but only until 4.5 hours.
Pooling of IPD also opens the way for more powerful analyses, since results can be adjusted for multiple covariates (i.e., variables that may influence the outcome such as time to treatment, age, stroke severity, sex, diabetes, prior stroke, and baseline neuroimaging features in the setting of thrombolysis trial data). Exploration of individual covariates in larger samples allows for a better estimate of treatment effect size in future populations and subgroups, restricts the confidence interval around these estimates, and indicates which are the important factors to consider when selecting patients for treatment. The analysis of pooled IPD releases the restrictions imposed by the individual trial protocols and publications: fresh criteria for defining subgroups and applying a common outcome measure become possible. Furthermore, subgroup analyses that are pre-specified (i.e., prior to release of trial results) and adequately powered could go beyond being hypothesis-generating to achieving a new level of evidence. Individual trials may be underpowered to assess a given subgroup and, in that circumstance, a pooled analysis might bring key confirmatory data for regulatory considerations. It is acknowledged that in addition to prespecifying subgroups of interest, pooled analyses must still protect against the risks inherent in multiple testing by prespecifying the primary endpoint and incorporating statistical adjustment where necessary.
These advantages of IPD carry a greater burden, however. Cooperation among trialists is required and needs to be coordinated; the necessary technical skills, time, effort and costs are increased; it is essential to understand and allow for the varied context and conditions under which data were collected across trials; and the risks of data mining become substantially greater.
This manuscript describes conclusions arising from a workshop held at the ninth Stroke Academic Industry Roundtable (STAIR) on 5 October 2015 in Bethesda, USA. This workshop was designed to discuss principles that would facilitate and optimise value from pooling of stroke trial data. Participants, who included academic, industry and regulatory experts, are listed in the Appendix. The approach taken to develop STAIR guidelines has been described elsewhere.2 Key recommendations are summarized in the Table.
Though most acute stroke trials have chosen the modified Rankin Scale (mRS) as their principal outcome measure, and those that instead targeted vessel patency have retained mRS as a secondary measure, several analytic approaches and definitions of good results have been used. For example, these have included dichotomising modified Rankin at 0–1 versus 2–6, at 0–2 versus 3–6 or even at 0–4 versus 5–6; examining a ‘shift’ in distribution of the full scale or examining the distribution after combining category 5 (bedbound) with 6 (dead); or finally examining patient-centred ‘utility’ of the mRS scores, which has a similar effect to the previous approach (Figure). Each has merit, but a pooled analysis may have different aims from individual RCT objectives. A common approach is needed when combining trials, if the influence of covariates is to be correctly estimated. Since switching the choice of endpoint may change the formal interpretation of the trial between neutral and positive, and since a common endpoint likely will not already exist, the collaborators planning a pooled IPD analysis must take care when pre-specifying their endpoint.
If none of the trials to be included has already been unblinded, then any rational approach to analysis may be justified. If, however, one or more trial results were known, then this would influence or be perceived as influencing the choice of common endpoint. The least restrictive approach is needed (i.e., the one that invokes fewest assumptions). It must still be an endpoint that is rational for the treatment being tested and useful for clinical interpretation. The available choices each have pros and cons.
Dichotomisation considers the mRS in only two categories, such as mRS 0–1 as good outcome and mRS 2–6 as bad outcome. This endpoint can readily be used to assess statistical significance and to generate a measure of effect size with an associated confidence interval; can be converted easily to a number needed to treat (NNT); and is simple to explain to patients and clinicians. However, it also suffers from three disadvantages. First, it may conceal harmful effects within the ‘poor outcome’ stratum: for example, an increase in mortality due to increased intracranial bleeding. This separates benefit from risk. It may be desirable to do so, particularly if the timescale for these two differs, such as when fatal bleeding due to treatment may be somewhat balanced by later survival gains among the less disabled survivors of treatment. Second, for many stroke trial populations, it also conceals benefits among a majority of patients who participated and were destined at best to achieve partly disabled survival (mRS 2, 3 or 4). It is neither ethical to include such patients if they will not contribute usefully to interpretation of the trial, nor is it statistically sensible to disregard the richness of the information that they provide; indeed, an ordinal approach to analysis typically contributes 36% more information and thus statistical power than a dichotomised approach.3 Third, dichotomisation requires a combination of advance knowledge of the treatment’s effects, the case mix of the trial, and luck. Without these, the chosen cut point for dichotomisation may turn out to show a smaller treatment effect than other thresholds that have been disregarded.
Although this has been discussed in the stroke literature, several recent trials retained dichotomization of primary endpoints and reported neutral results, whereas they would have declared positive results if different cut points or ordinal analyses had been selected as favoured by the European Stroke Organisation Outcomes Working Group.4–6
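The sensitivity of a dichotomised result to the chosen cut point can be illustrated with a minimal sketch. The patient counts below are invented purely for illustration, and the function name is hypothetical; the calculation is simply the difference in proportion of good outcomes and its reciprocal, the NNT.

```python
from typing import Sequence, Tuple


def dichotomised_effect(
    treated: Sequence[int], control: Sequence[int], cut: int
) -> Tuple[float, float]:
    """Return (risk difference, NNT) with 'good outcome' defined as mRS <= cut.

    treated and control hold patient counts for mRS 0..6 (seven entries each).
    A negative NNT would indicate harm rather than benefit.
    """
    p_treated = sum(treated[: cut + 1]) / sum(treated)
    p_control = sum(control[: cut + 1]) / sum(control)
    risk_difference = p_treated - p_control
    nnt = float("inf") if risk_difference == 0 else 1 / risk_difference
    return risk_difference, nnt


# Hypothetical counts per mRS category 0..6 (assumption, not trial data).
treated = [10, 15, 25, 20, 15, 5, 10]
control = [10, 10, 20, 20, 20, 10, 10]

for cut in (1, 2, 4):
    rd, nnt = dichotomised_effect(treated, control, cut)
    print(f"mRS 0-{cut} vs rest: risk difference {rd:.2f}, NNT {nnt:.1f}")
```

With these invented counts the apparent effect at the 0–2 threshold is twice that at 0–1 or 0–4, which is the kind of dependence on cut point that the text describes.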
An ordinal approach also invokes certain assumptions and requires some choices, however. The first assumption of ordinality is that each step on the scale reflects a genuine improvement from the preceding step, as perceived by all relevant parties. This may not be universally accepted for mRS, because in some societies and among certain age groups, survival with severe disability (bedbound, incontinent and totally dependent, i.e., mRS 5) is considered to be as bad as or even worse than death.7 This creates an argument for combining mRS categories 5 and 6 in an ordinal analysis approach.8 The second assumption, which has less importance and which does not compromise statistical analysis but has an impact on presentation of results, is that all steps are of equal value (i.e., assumption of proportionality). It is evident that this assumption is violated for mRS: many patients regard the steps between mRS 5 and 4 (being released from bed) and from 4 to 3 (recovering independent mobility) as carrying greater value than returning to all usual activities (mRS 2 to 1) or being free from non-disabling symptoms (mRS 1 to 0). Describing a trial result by showing average improvement of a certain proportion for each mRS category is complex. The statistical approaches to ordinal analysis suffer from some disadvantages also. We usually compare overall differences in mRS distributions between two treatment groups using the Cochran-Mantel-Haenszel test, which is a nonparametric test and therefore does not assume a normal distribution of the data. We then adjust for imbalances in covariates by using its van Elteren variant, but this approach requires the covariates to be categorical rather than continuous; for example, age and stroke severity must be grouped into strata. It provides a p value but expresses neither the direction of change nor the size of effect.
It typically is followed by ordinal logistic regression to estimate the odds ratio of the treatment effect and its associated confidence interval. This second step introduces an assumption of proportionality of odds, implying that the treatment has changed the odds of moving from mRS 5 to mRS 4 by a similar amount to the odds of moving from mRS 3 to 2, etc. This assumption has been violated when examining thrombolysis treatment for acute stroke (Lees, unpublished data, 2016).9 It also creates a second problem: the logistic regression generates its own p value associated with the estimated confidence interval for the odds ratio, and that p value generally differs slightly from the overall p value calculated from the van Elteren test. However, logistic regression permits use of both continuous and categorical covariates. Even so, there are several approaches to describing the effect size that do not invoke the proportionality assumption10–12 and also several circumstances where any violation of the assumption has limited impact.
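One informal way to inspect the proportional odds assumption is to compute the treatment odds ratio separately at every possible cut point of the scale and see whether the values are broadly similar. The sketch below does this for invented counts, with mRS 5 and 6 already combined into a single category; it is a descriptive check only, not a formal test, and the function name and data are assumptions made for illustration.

```python
from typing import Dict, Sequence


def cumulative_odds_ratios(
    treated: Sequence[int], control: Sequence[int]
) -> Dict[int, float]:
    """Odds ratio (treated vs control) of reaching category <= k, for each cut k.

    Categories are ordered from best (index 0) to worst (last index).
    Assumes no empty cell on either side of any cut, so no division by zero.
    """
    ratios = {}
    for k in range(len(treated) - 1):
        good_t, poor_t = sum(treated[: k + 1]), sum(treated[k + 1 :])
        good_c, poor_c = sum(control[: k + 1]), sum(control[k + 1 :])
        ratios[k] = (good_t * poor_c) / (poor_t * good_c)
    return ratios


# Hypothetical counts for mRS 0, 1, 2, 3, 4 and combined 5-6 (assumption).
treated = [10, 15, 25, 20, 15, 15]
control = [10, 10, 20, 20, 20, 20]

for cut, ratio in cumulative_odds_ratios(treated, control).items():
    print(f"mRS <= {cut}: odds ratio {ratio:.2f}")
```

If the cut-point-specific odds ratios differ widely, a single common odds ratio from ordinal logistic regression summarises the data less faithfully, and one of the alternative effect size descriptions mentioned above may be preferable.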
A further variation in ordinal approach is to adjust the weight given to mRS categories according to the perceived preference (i.e., utility) for each mRS category by multiplying the number of patients within each mRS category by that utility weight, and then statistically comparing the sum of these products from each of two treatment arms.13,14 This approach solves several of the weaknesses of the earlier methods but its main disadvantages are that social, geographical and demographic factors may influence the weights given to mRS categories, and that some disabilities such as dysphasia cannot be ranked because many stroke survivors with dysphasia cannot respond to such surveys.
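The utility-weighted calculation described above can be sketched in a few lines. The weights and counts here are illustrative assumptions and are not taken from the text; in practice the weights would come from a validated preference survey, and a formal comparison would also need a variance estimate, which this sketch omits. The sum of products is divided by arm size so that arms of unequal size can be compared.

```python
from typing import Sequence

# Illustrative utility weights for mRS 0..6 (assumption for this sketch).
# Giving mRS 5 and 6 the same weight mirrors the effect of combining them.
UTILITY_WEIGHTS = [1.0, 0.91, 0.76, 0.65, 0.33, 0.0, 0.0]


def mean_utility(
    counts: Sequence[int], weights: Sequence[float] = UTILITY_WEIGHTS
) -> float:
    """Sum of (patients in category x utility weight), divided by arm size."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must cover the same categories")
    return sum(n * w for n, w in zip(counts, weights)) / sum(counts)


# Hypothetical counts per mRS category 0..6 (assumption, not trial data).
treated = [10, 15, 25, 20, 15, 5, 10]
control = [10, 10, 20, 20, 20, 10, 10]

difference = mean_utility(treated) - mean_utility(control)
print(f"Mean utility difference (treated - control): {difference:.3f}")
```

Changing the weights changes the result, which is the sensitivity to social, geographical and demographic factors noted above.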
In considering all of these issues, the STAIR workshop participants concluded that a standard methodology for pooled analyses would be desirable. An ordinal approach should generally be favoured for an IPD pooled analysis, in which contributory trials have varying endpoints, because this has greater statistical power and reduced reliance on assumptions around the nature of the treatment effect. The participants also favoured collapsing mRS categories 5 and 6, since this better reflected perceived value of the steps. Although there was considerable enthusiasm for the utility-weighted approach, it was considered still to be less validated and subject to geographic or cultural biases. The participants noted that, though an ordinal approach for distribution of mRS (where mRS 5&6 are combined) should be the primary approach, results should also be converted to the utility-weighted and dichotomised approaches for descriptive purposes. Finally, they noted also that their conclusion should not restrict the analytic approaches of individual trials, where different considerations may apply.
A pooled IPD analysis must also harmonise the timing of final assessment used for its principal analysis, though the choice here is less controversial, less under control of the trialists and possibly may have less impact on interpretation. The latest common assessment that is available in all trial datasets should be used, recognising the usual convention that recovery is unlikely to have stabilised before 3 months. For example, the stroke thrombolysis trialists’ collaboration chose to accept outcomes at 3 months for 8 trials, but at 6 months for a ninth trial in their pooled analysis, rather than describe outcome at one month or earlier.15
Just as interim analysis of a single trial for efficacy or futility influences the probability of reaching a final positive result and thus requires reduction of the final p value for significance, assimilation over time of trial datasets to a pooled IPD analysis must be recognised as a sequential approach. Even if the protocol for a pooled analysis were published in advance of unblinding of any of its contributory trials, specifying the number, size and identity of the trials that will be included before a result will be announced, it is conceivable that a further trial will be created later to extend, confirm or refine some aspect of its findings. Pooled analysis is a continual process. The participants at STAIR recommend that a sequential analysis approach be taken to control for the potential bias generated when analysis may be undertaken repeatedly, on an expanding sample. The statistical analysis plan for the TREAT collaborators’ pooling of the thrombectomy trials describes an appropriate approach.8 Bayesian approaches were also suggested, and further work in this area is needed to consider the advantages of one over the other.
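As a generic illustration of statistical adjustment for repeated looks, the sketch below evaluates the Lan-DeMets O'Brien-Fleming-type alpha spending function, which allots very little of the overall type I error to early analyses. This is one widely used group sequential technique and is not necessarily the method in the TREAT plan. It assumes a planned maximum amount of information has been declared, so that each update of the pooled dataset corresponds to an information fraction between 0 and 1; the fractions used here are hypothetical.

```python
from statistics import NormalDist


def obrien_fleming_spent_alpha(info_fraction: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent by information fraction t (0 < t <= 1).

    Lan-DeMets O'Brien-Fleming-type spending function:
        alpha(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t)))
    """
    if not 0 < info_fraction <= 1:
        raise ValueError("info_fraction must lie in (0, 1]")
    standard_normal = NormalDist()
    z = standard_normal.inv_cdf(1 - alpha / 2)
    return 2 * (1 - standard_normal.cdf(z / info_fraction ** 0.5))


# Hypothetical information fractions at successive pooled analyses (assumption).
for t in (0.25, 0.5, 0.75, 1.0):
    print(f"information fraction {t:.2f}: cumulative alpha spent "
          f"{obrien_fleming_spent_alpha(t):.5f}")
```

The cumulative alpha reaches the full nominal level only at the final planned analysis, so early pooled results must clear a much stricter threshold to be declared significant.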
This requirement to adjust for potential repeated looks at the data applies not just to an overall result from the pooled analyses (is treatment effective or not?); it also, perhaps more importantly, applies to subgroups, which likely will expand at different rates since trials vary in their case mix. Further, for subgroups especially, a sample size calculation should be described. This need not restrict analysis prior to attainment of that sample, but will assist in interpretation of neutral findings for such subgroups. Again, the TREAT statistical analysis plan covers both issues.8
Third, due to variations in trial design, timescale and geographical location, there will likely be variation in treatment effects that cluster within trials. Pooling IPD allows powerful analyses of individual factors that contribute to variation but the analysis approach should still stratify by trial to control for possible heterogeneity between trials. This was done by the STTC and is planned by TREAT investigators.8,9
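The principle of stratifying by trial can be shown with the simplest stratified estimator, the Mantel-Haenszel pooled odds ratio, in which each trial contributes its own treated-versus-control comparison. This is a minimal sketch on invented dichotomised counts; pooled IPD analyses would more usually stratify within a regression model, and nothing here should be read as the STTC or TREAT method.

```python
from typing import Iterable, Tuple

# One 2x2 table per trial: (treated good, treated poor, control good, control poor)
TrialTable = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(strata: Iterable[TrialTable]) -> float:
    """Pooled odds ratio that compares arms only within each trial (stratum)."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for two trials with different case mix (assumption).
trials = [(50, 50, 40, 60), (30, 70, 20, 80)]

stratified = mantel_haenszel_odds_ratio(trials)
print(f"Odds ratio stratified by trial: {stratified:.3f}")
```

Because patients are compared only with controls from their own trial, differences between trials in baseline prognosis do not leak into the estimated treatment effect.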
The sequential nature of such pooled IPD analyses leads to a question over timing of publication of results. There are arguments in favour of lodging such papers in an accessible online repository each time that they are updated, but the STAIR participants recommend that formal publication in a peer-reviewed journal be considered each time that a fresh analysis produces a finding that may change clinical management. Reporting should follow PRISMA-IPD recommendations.16
There may be conflict regarding pooled outcomes as a group treatment effect (e.g., thrombectomy by any reasonable means improves outcome) versus device- or drug-specific effects (e.g., thrombolysis via rtPA but not streptokinase is effective). There is a need to define circumstances in which the scientific community, and regulators, should accept a group effect. This may be reasonable if each component drug or device in isolation shows a point estimate for effect above a certain threshold, but it is uncertain whether absence of significant heterogeneity is sufficient. In circumstances of a new treatment, a non-inferiority analytic approach may be taken. This could be extended to allow for a drug-by-drug (or device-by-device) comparison of individual effects against the pooled effect of the remaining treatments. There is a need for development work in this area, to consider technical aspects and to formulate guidance on managing such exploratory analyses.
It would be ideal if the planned interpretation in this regard were published in advance of any trial result being known, and certainly preferable that it should be decided in advance of any pooled analysis. If plans are not prespecified, then any heterogeneity within the result will need cautious interpretation.
Ideally, all datasets would be collected in a common format and would be shared immediately upon conclusion of each trial. In practice, neither is realistic. Trials are individually designed and require time to publish individual primary and secondary results. Common data elements for NINDS trials have been defined elsewhere17 but are variably observed. Data are stored and shared in varied formats using diverse definitions for each variable. A substantial part of the work of pooling involves understanding each trial properly. This requires a skilled, stroke-experienced statistician working in close collaboration with the original investigators of each trial. These original investigators should also meet and collaborate actively in the writing of protocols for pooled analysis and in the interpretation of findings. The trial protocols, statistical analysis plans, manuals of procedures and case report forms should be shared to aid interpretation of the dataset. It is not sufficient to send a file with data to the pooling group and hope that they will correctly understand the documents from an individual study.
The timing of data sharing presents another challenge. Trial investigators must have an opportunity to present and publish the primary and planned secondary analyses of their study without compromising this intellectual property by releasing raw data to the public domain or having the research questions answered from a pooled source beforehand. The pooling collaboration should be able to offer firm guarantees that shared data will be used only for the approved pooled analyses and will not be released to a third party without prior agreement, and that pooled analyses that compete with the trialists’ existing plans will not be released in advance of their individual publication. At some later point, these issues become less relevant, particularly for government-sponsored and investigator-initiated trials. For example, NIH-funded clinical trials are required to have data-sharing plans upon initiation to ensure “timely” release of data to the public.18 More broadly, the Institute of Medicine recommends public release of data associated with the primary publication of the trial results within six months of primary publication, and the full data set no longer than 18 months after study completion (unless the data are part of a regulatory application).19 Even so, the STAIR participants recognised that IPD analyses should be undertaken as a joint, collaborative venture for scientific as well as political reasons, and these rules about data do not directly guarantee cooperation.
More complex is the situation in which a pooled dataset may already answer a question that is being tackled specifically by an ongoing or planned trial, and may thereby compromise completion of that trial. For example, an analysis of IPD from recently published thrombectomy trials may indicate an apparent relation of treatment benefit to time elapsed from stroke onset. At the same time, ongoing trials are examining late time windows. The pooling collaborators must consider the merits of such cases, taking into account the relative size of the datasets, the timescale over which the ongoing trial(s) may be completed, the clinical impact of any early announcement and the ethical dimension. Potential conflicts of interest among investigators must be handled carefully. These questions are similar to issues that regularly face independent data monitoring committees.
A collegiate spirit and recognition of colleagues’ contributions and concerns is also required for leadership and authorship purposes. Pooling projects require representation from every contributing trial. These representatives should ideally be in place even during the planning phase, though there must be a mechanism to add contributors when new trials become available. It is a good principle that authorship should also have one or more representative from each trial that contributes data to the collaboration, even if a small writing group will draft the manuscripts, and it is desirable that the author byline should refer to each of the component trial groups, with a listing of their steering committees in an appendix. Pooled analyses can have considerable academic impact, and it would be unreasonable for the original authors of the contributing trials not to share in the final reports. Pooled analyses should not be undertaken by independent groups without full participation of the original trialists, for both academic and practical reasons. Some of these, for example relating to checks of data integrity, are reflected in the PRISMA-IPD statement.16
A further challenge arises from the contribution of funders. Sponsorship of research should merit access to output from pooled analyses at an early stage but should influence neither the design of the analysis, the interpretation of findings, nor the timing of publication. Handling of subsets of data and of drug- or device-specific analyses, as discussed earlier, may need cautious consideration if these would have commercial implications.
Pooling of individual patient data from individual clinical trials provides the power to determine treatment effects with more precision, especially within subgroups, and to explore modifiers of treatment effect. To be unbiased, detailed analysis plans should be declared before trial results are available. An ordinal analysis with a sequential approach, with statistical adjustment for each iteration, is favoured. All contributing trial teams should contribute to leadership, data verification, and authorship of pooled analyses. With careful planning and collaborative approaches, pooled analyses can meaningfully and rapidly advance clinical care for our stroke patients by providing supportive data and new observations.
We thank Gary Houser for his invaluable help in organizing the STAIR conference.
STAIR IX Collaborators:
KRL chairs the Virtual International Stroke Trials Archive (VISTA) collaboration and the European Stroke Organisation (ESO) Outcomes Working Group, is a member of the Stroke Thrombolysis Trialists’ Collaboration (STTC), the ThRombEctomy And tPA (TREAT) Collaboration and reports receipt of fees and expenses from American Stroke Association, Applied Clinical Intelligence, Atrium, Boehringer Ingelheim, EVER NeuroPharma, Hilicon, Nestle, Novartis, Stroke Academic Industry Roundtable, University of Lancaster; and research funding to the University of Glasgow and to the Virtual International Stroke Trials Archive from Genentech.
PK is a member of the TREAT Collaboration within VISTA; reports payment to University of Cincinnati Dept of Neurology for her research efforts from NIH/NINDS (StrokeNET NCC Co-PI and RCC PI), Genentech, Inc (PRISMS Lead PI), and Penumbra, Inc (THERAPY Neurology PI); and receives fees from Grand Rounds Experts, Inc (online clinical consultations), UpToDate, Inc (royalties), and medicolegal consultations.