Recent evidence has highlighted deficiencies in clinical trial protocols, with implications for many groups. Existing guidelines for randomized clinical trial (RCT) protocol content vary substantially and most do not describe systematic methodology for their development. As one of three prespecified steps for the systematic development of a guideline for trial protocol content, the objective of this study was to conduct a three-round Delphi consensus survey to develop and refine minimum content for RCT protocols.
Panellists were identified using a multistep iterative approach, met prespecified minimum criteria and represented key stakeholders who develop or use clinical trial protocols. They were asked to rate concepts for importance in a minimum set of items for RCT protocols. The main outcome measures were degree of importance (scale of 1 to 10; higher scores indicating higher importance) and level of consensus for items. Results were presented as medians, interquartile ranges, counts and percentages.
Ninety-six expert panellists participated in the Delphi consensus survey including trial investigators, methodologists, research ethics board members, funders, industry, regulators and journal editors. Response rates were between 88 and 93% per round. Overall, panellists rated 63 of 88 concepts of high importance (of which 50 had a 25th percentile rating of 8 or greater), 13 of moderate importance (median 6 or 7) and 12 of low importance (median less than or equal to 5) for minimum trial protocol content. General and item-specific comments and subgroup results provided valuable insight for further discussions.
This Delphi process achieved consensus from a large panel of experts from diverse stakeholder groups on essential content for RCT protocols. It also highlighted areas of divergence. These results, complemented by other empirical research and consensus meetings, are helping guide the development of a guideline for protocol content.
The protocol of a randomized clinical trial (RCT) serves many purposes. Protocols provide investigators with a document to guide trial conduct; trial participants with a detailed description of trial methodology; research ethics committees/institutional review boards (REC/IRBs) with a foreknowledge of predefined safeguards to protect participants’ interests and safety; research funders with a means of assessing proposed methods; and systematic reviewers and others with a description of prespecified methods to evaluate potential biases [1-8]. To fulfill these purposes, protocols must be clear, detailed and transparent.
Unfortunately, many protocols do not adequately describe important methodological details such as allocation concealment (59%), primary outcomes (25%), power calculations (27%) and sponsor and investigators’ roles in aspects of trial conduct - all of which have been associated with exaggerated effect sizes and potential bias in trials. The lack of transparency and incomplete description of methods make critical assessment of trials difficult.
Reporting guidelines have been developed to help improve deficiencies in research reports [11-18]. A recent systematic review examined 40 guidelines for trial protocols; only 20% included any description of their methodological development process. Of those reporting consensus methods, none described formal processes for achieving consensus among stakeholders (for example, the nominal group technique or Delphi consensus) and none described a systematic consideration of empirical evidence for guideline development. Additionally, recommendations differed considerably across guidelines and many did not include concepts supported by empirical evidence. These inconsistencies and deficiencies have implications for those preparing, using, and reviewing clinical trial protocols.
An international group of researchers launched the SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) Initiative in 2007, with the primary aim of increasing the transparency and completeness of trial protocols. The main product of this initiative is a checklist of key items to address in protocols of clinical trials. This guideline is being developed with systematic and transparent methodology.
In line with current recommendations, three complementary methods were specified a priori to develop the SPIRIT checklist: 1) a Delphi consensus survey involving key expert stakeholders in the development and use of clinical trial protocols; 2) a systematic review of empirical evidence supporting the importance of specific checklist items; and 3) face-to-face consensus meetings to develop and finalize the SPIRIT Statement and its associated explanatory document. This paper describes in detail the first component of this research.
The objective of this study was to develop and refine minimum content for RCT protocols by expert consensus. We conducted a three-round electronic Delphi survey. Ethics approval was obtained through the Children’s Hospital of Eastern Ontario.
Invited expert panellists represented the main stakeholders involved in clinical trials: investigators, methodologists, statisticians and senior study coordinators from academia, pharmaceutical industry and government; REC/IRB members; members of funding and regulatory agencies; and major healthcare journal editors. Experts had to meet the following predefined criteria: relevant knowledge and experience; capacity, willingness and sufficient time to participate; and ability to communicate effectively in English. Participants were selected based on expertise and, where possible, were ranked and selected according to objective criteria (trialists were required to be an author on a minimum of five English-language RCT publications over the past 10 years).
We identified potential panellists using a multistep, iterative approach, which included nomination/snowballing, authors of relevant methodological research and the Institute for Scientific Information’s ‘Highly cited researchers in clinical medicine’. This search was supplemented by specific location-based PubMed searches and targeted Internet searching to increase geographical distribution and areas of panellist expertise. Our objective was to include approximately 100 panellists (40 trialists/clinicians, 20 methodologists, 15 study coordinators, 10 ethics board heads/members, 10 funding/regulatory agency representatives and 5 healthcare journal editors) to enable detection of any divergent opinions between expert groups.
An initial list of 59 potential checklist items was collated based on existing protocol guidance and known empirical evidence. Items were grouped under the following broad headings: a) General information; b) Introduction; c) Methods; d) Trial organization and administration; e) Ethical considerations; f) Reporting and dissemination; and g) Other. Each item included a heading and description; wording and structure were kept similar to existing guidelines, where possible.
All correspondence occurred via email or facsimile. Approximately two weeks before the survey was administered (August 2007), we informed potential participants of the objectives of the SPIRIT Initiative and Delphi process, and invited them to participate. We solicited reasons for declining, where relevant. Participant anonymity and confidentiality of responses were ensured; individual responses were known only to the moderator (JT). Each survey round was conducted over five to six weeks: one week for pilot testing, three weeks for response acquisition (including two reminders prior to the round closing date) and one week for collating the results and preparing the subsequent round.
Each candidate item was rated in at least two rounds. In each round, respondents were asked to rate items on a 10-point scale (or ‘No judgement’) for their suitability for inclusion in a minimum checklist for RCT protocols. A rating of one corresponded to ‘unimportant - should be dropped as an item to consider’ and ten corresponded to ‘very important - must be included’. We provided panellists with a space following each item and encouraged them to add free text comments, suggest revisions or suggest additional items they felt would be of benefit for inclusion in the SPIRIT checklist, if relevant. Round 1 also collected demographic information (occupation/field and place of employment) and panellists’ self-rated level of expertise in participating in this process.
Round 2 of the survey contained all Round 1 items grouped categorically by median scores rounded to the nearest whole number (median ≥ 8; 6 ≤ median ≤ 7; median ≤ 5). No changes were made to checklist items, aside from the addition of newly nominated items from Round 1, which were drafted to include a heading and description; as before, wording and structure were kept similar to existing guidelines, where possible. For each item, panellists were provided with their previous rating, group summary ratings (medians, interquartile ranges (IQRs) and frequency distributions) and anonymized free text comments from Round 1 (Figure 1). They were asked to re-rate the items and respond to existing comments, if desired. Panellists were informed that, following Round 2, consensus would be defined by the consistency of median scores between rounds (median ≥ 8 = high importance; median ≤ 5 = low importance) and the absence of significant issues noted in text comments.
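As a minimal sketch, the rounded-median classification rule used to group items can be expressed as follows (the ratings shown are hypothetical; this is an illustration of the rule, not the study’s actual analysis code):

```python
from statistics import median

def classify(ratings):
    """Classify an item by its median rating rounded to the nearest
    whole number: >= 8 high importance, 6-7 moderate, <= 5 low."""
    m = round(median(ratings))
    if m >= 8:
        return "high"
    if m >= 6:
        return "moderate"
    return "low"

# Hypothetical panellist ratings on the 1-10 scale
print(classify([9, 8, 10, 8, 7]))  # high
print(classify([6, 7, 6, 7, 6]))   # moderate
print(classify([4, 5, 3, 5, 4]))   # low
```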
The third and final round presented results of items reaching consensus (Parts 1 and 2) and three sections requiring additional feedback. Part 3 included items introduced in Round 2 (to be rated as before: from 1 to 10). Parts 4 and 5 included items requiring a third round of feedback: those rated of moderate importance (median 6 to 7) after two rounds (Part 4; Figure 2a) and items where comments suggested that single items contained multiple concepts of differing importance (Part 5; Figure 2b). For the latter, concepts were delineated and respondents were asked to rate each subitem separately. Each item in Parts 4 and 5 had the following response options: ‘Include’, ‘Exclude’ or ‘Unsure’.
Medians and IQRs were calculated for each item. Subgroup analyses were explored by respondents’ occupation and self-rated expertise.
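The per-item summary statistics can be sketched briefly (the ratings are hypothetical and the quantile method is an assumption; the paper does not specify the software or quartile convention used):

```python
from statistics import median, quantiles

# Hypothetical panel ratings for a single checklist item (1-10 scale)
ratings = [6, 7, 8, 8, 8, 9, 9, 10]

item_median = median(ratings)
# 'inclusive' treats the data as the whole population of responses
q1, _, q3 = quantiles(ratings, n=4, method="inclusive")

print(f"median={item_median}, IQR={q1}-{q3}")  # median=8.0, IQR=7.75-9.0
```

A narrow IQR (as here) would indicate agreement among panellists; an IQR spanning from 5 or less to 8 or greater would flag the divergent items discussed in the results.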
Invitations to participate in the Delphi survey were sent (by email) to 167 experts; we received a response from 123 experts, of whom 104 (85%) accepted the invitation. Reasons for declining (n=19; 15%) were too busy/unable (n=15), not interested (n=1) or no reason provided (n=3). Of the panellists agreeing to participate, eight were unable to respond to either Round 1 or 2 and were not invited to participate in Round 3. Thus, 96 experts comprised the final panel.
Panellists met our a priori goals for profession/expertise representation (Table 1). Eighty-nine (93%) panellists from 17 countries responded to Round 1; 86 (90%) panellists from 17 countries responded to Round 2; and 84 (88%) panellists from 16 countries responded to Round 3 of the survey. Seventy-seven percent responded to all three rounds, 16% to two rounds and 7% to one round of the Delphi. Most initiated surveys had 100% completion; missing data were sparse and were clarified individually with the respondent.
Figure 3 presents the flow of items through the Delphi and Tables 2 and 3 present the final results for each concept. In Round 1, respondents collectively rated 56 of the original 59 items with a median of 8 or greater, three with a median of 6 or 7 (Personnel, Logistics and Budget) and none with a median of 5 or less. All items were recirculated in Round 2, where consensus was achieved for 46 (78%) of the original 59 items; 45 items were considered to be of high importance and 1 (Budget) of low importance. The remaining items - four rated of moderate importance in Round 2 and nine where comments suggested that clarification was necessary - were recirculated for Round 3. Of the four rated of moderate importance, three were ultimately recommended for exclusion (General Approach, Personnel and Logistics). The fifteen panellist-nominated items (rated in Rounds 2 and 3; denoted by § in Tables 2 and 3) included seven with a median of 8 or greater, six with a median of 6 or 7 and two with a median of 5 or less (Signatures, Insurance).
Where clarification was required after Round 2 (N=9 items), panellists’ ratings in Round 3 commonly demonstrated differential support for specific subcomponents (denoted by ‡ in Tables 2 and 3). For example, in general, where items requested specific information plus a justification, respondents strongly favoured the main concept but not the justification (for example Study locations (Include (I)=87%, justification: I=46%) and Eligibility criteria (I=99%, justification: I=66%)). The four components of the item Monetary and material support also received differing levels of support (source of support: I=95%; type of support - material, financial: I=70%; amount of support: I=30%; how support is provided: I=35%).
Overall, the Delphi panellists rated 63 concepts of high importance (of which 50 had a lower quartile of 8 or more), 13 of moderate importance and 12 of low importance for inclusion in a minimum set of concepts for RCT protocols (Tables 2 and 3). Most items had narrow IQRs, suggesting agreement between panel members. However, some items had IQRs that spanned from recommendations to exclude the item (five or less) to recommendations to include the item (eight or greater), such as Reporting of early stopping, Ancillary and substudies, trial Feasibility, Signatures and plans to monitor the health of pregnant women and their children (Pregnancy). These items were very often associated with comments stating that the concept is important but is either too specific for recommending in a minimum set for all trials or could be encompassed within another existing item.
Many general and item-specific comments were received during the three survey rounds and were retained for discussion by the SPIRIT group at subsequent face-to-face consensus meetings; examples are highlighted here.
In general, many respondents stated that, although there were many items, most were important and hence rated highly. While some stated that there must be a ‘balance between guiding researchers and being too prescriptive’, others stated that a comprehensive list is more useful in light of the evidence for poor reporting in protocols and due to the ‘serious business’ of clinical trials that ‘deserve(s) a detailed reporting at any stage’. A few respondents were concerned, however, about the possible increased burden on trialists. Some suggested that some concepts may be addressed in associated documents (for example contracts, statistical and Data and Safety Monitoring Board (DSMB) charters, laboratory manuals) - with reference to such documents in the protocol - or through other sources (for example websites). Finally, some panellists suggested excluding items requiring repeated protocol amendments (for example Personnel, REC/IRB approval) to avoid jeopardizing trial progress with required official amendments and resubmissions.
Other general comments related to ambiguity of the term ‘protocol’, the desired scope of study designs that the checklist should address, and the potential need to define intended users of the protocol or checklist. Some noted that, while all items were potentially important elements, the importance of some may be relative to the target end user.
Item-specific comments consisted mostly of explanations to substantiate chosen ratings, suggested revisions, notes of potential overlap between and opportunities for merging items (for example Background, Rationale and Preliminary data; Risks, Harms and Adverse event reporting), and requests for clarification where items contained more than one concept or were vague. All comments were circulated to panellists in each round and delineations provided, where appropriate, in the final survey round.
Subgroup analyses showed few differences between respondents by profession or level of self-perceived expertise (not shown). As examples, REC/IRB members and journal editors were more likely than other groups to support some concepts including a lay summary, a list of abbreviations, and justifications for study locations or eligibility criteria. There were no cases of bimodal results; rather, any differences were in the strength of support with overlapping IQRs. In some cases, the subgroup results enabled examination of the potential validity of additional comments. For example, while some panellists suggested that the items Logistics and Feasibility (which received low support overall) would be important to funding agencies but not to other end users, we found no difference between the opinions of our expert funding agency representatives and other groups for these items. This enabled greater insight and confidence for generating recommendations from the results.
This Delphi survey produced rich information for further development of the SPIRIT Initiative, which aims to develop a guideline for clinical trial protocol content. Recent studies suggest that PubMed indexes over 6,000 RCTs annually and this number has likely increased over time. This finding does not account for trials indexed in other databases (between 20% and 70% of trials depending on the discipline) and the minimum of 40% of trials not reaching full publication. Given that all clinical trials should have a protocol, this Delphi and the SPIRIT Initiative have broad applicability.
Our panellists rated many concepts as highly important for inclusion in RCT protocols, most of which had a strong majority favouring inclusion, indicating consensus within the panel (for example narrow IQRs). The importance of some of these concepts, such as allocation concealment, outcomes (including delineation of primary outcomes), roles of sponsors, and conflicts of interest, is substantiated by strong empirical evidence associating them with risk of bias in trials [2,8,28-34]. Other concepts are supported by more pragmatic, regulatory or ethical rationale. Importantly, many of these concepts are often not described in protocols of RCTs [1,3,9,10,35]. This may be, in part, because most existing protocol content guidelines do not recommend such concepts. The reasons for the variation between existing guidelines and our results are unclear as most guidelines do not report their methods of development.
Our results also indicate where panellists favoured excluding concepts and where a clear consensus was not attained. For the former, such as Budget and Logistics, the lack of support does not suggest that such items should not be included in protocols; only that they may be context-specific (for example not necessary for journal publication of protocols) and thus are not appropriate in a minimum set of requirements. Examples of the latter include items where wide IQRs remained. We believe that a systematic review of the methodological literature is important to complement the Delphi results and to guide and substantiate final recommendations.
Beyond the utility of the Delphi results for trialists, REC/IRB representatives, funding agencies and the SPIRIT group, this research may be relevant to those developing reporting guidelines and our experience has already helped shape the methodology of other ongoing initiatives. Selecting potential panellists should be given adequate time and attention to ensure they meet the criteria suggested by previous guidance, as this is pivotal to both the internal and external validity (generalizability) of the Delphi results. Future endeavours should also consider empirically supported strategies to help increase response rates [36-42] including those used in the current study: survey prenotification/invitation to participate, personalized invitations and surveys, notification of and adherence to expected timelines, clear outline of expectations including time-commitments, written commitment by panellists to participate (reply by email), follow-up reminders to non-respondents, provision of previous rounds’ responses and assurance of confidentiality. We also pilot tested each round, collected panellists’ comments and employed a flexible survey design. Using an Internet-based tool may substantially increase Delphi efficiency and is recommended for future work.
Despite the many benefits of the Delphi consensus technique, the results are only as valid as the opinions of the experts constituting the panel. Even if consensus is attained, validating whether this consensus represents the ‘truth’ is not possible, and we recognize that expert opinion remains among the lowest levels of empirical evidence. To safeguard the validity of our results, we carefully selected a panel representing key stakeholders. Structured, predefined methods were employed to minimize biased response collation. Importantly, our panellists were experienced and committed to completing the process, increasing internal validity of the results.
We chose the Delphi consensus method for this work for several reasons: the research problem was felt to benefit from expert opinion on a collective basis; a larger and more diverse group could be consulted than could effectively meet face-to-face due to expense, size and the logistics of group interaction; and the preservation of participant anonymity allowed for open discussion. This method also shares the advantages of other integrative methods of knowledge translation, ideally resulting in a guideline that, beyond being founded on transparent and systematic methods, is externally valid and ultimately meets the needs of end users. We recommend this technique to others embarking on similar initiatives.
This Delphi consensus has provided a large volume of rich information to guide the development of the SPIRIT checklist, an evidence-based guideline for the content of trial protocols. By applying a formal consensus method, engaging experts from diverse areas and complementing the results with empirical evidence from the methodological literature, we aim to collate guidance on important concepts to address in protocols. The SPIRIT Initiative ultimately aspires to help increase transparency and completeness of information in trial protocols, ideally helping to improve the reliability and validity of the medical literature guiding healthcare decisions.
CONSORT: Consolidated Standards of Reporting Trials; IQR: interquartile range; RCT: randomized controlled trial; REC/IRB: research ethics committee/institutional review board; SPIRIT: Standard Protocol Items: Recommendations for Interventional Trials.
The authors have declared that no competing interests exist. All authors are members of the SPIRIT Initiative Steering Group.
JMT, AWC and DM conceived the study and prepared the initial list of survey items. JMT designed and administered the survey, and collected and analyzed data. JMT wrote the first draft of the manuscript with input from all authors. All authors read and approved the final manuscript.
No direct funding was received for this study. The authors were personally salaried by their institutions during the period of writing (though no specific salary was set aside or given for the writing of this paper). Dr. Moher is supported, in part, by a University (of Ottawa) Research Chair. No funding bodies had any role in the study design, data collection, analysis, decision to publish or preparation of the manuscript.
We thank the Delphi panellists for their participation and for their dedication to seeing this process succeed. Their substantive feedback and support were very much appreciated given the significant request for their time.