This paper describes the use of the PRECIS tool for the multidimensional evaluation of the ACCESSATION Study, and provides a thorough exploration of the study design that impacts its pragmatic/explanatory nature. Use of the tool highlighted study design features for which discrepancy of opinion existed among the authors regarding the degree of pragmatism within the trial, and provided a basis for discussing those areas more explicitly. However, this occurred after the study was initiated, but before data collection was completed. The high variability in ratings at the first scoring round was primarily due to differences in interpretation of the criteria described in the PRECIS tool and of how the design elements of the study fitted these dimensions. The dimensions were discussed and ratings were clarified based on the PRECIS tool. Therefore, it appears that deliberate discussion about each dimension is necessary, especially when there is considerable variability between raters. Use of a Delphi method is appropriate to reach consensus on such complex and subjective material.
If we were to design the study to be more pragmatic, we would reduce the frequency of visits for assessments and use a patient- and physician-defined primary outcome measure. For example, we would ask patients if they had quit or not, as opposed to using a validated scale. To make the study completely pragmatic on the primary outcome measure, we would use an administrative database to see if there was reduction in healthcare utilization in those who received coverage versus the control group.
Developers of the PRECIS tool [16] considered it to be an initial attempt to identify and quantify trial characteristics that distinguish between pragmatic and explanatory trials, and requested suggestions for its further development. Since 2010, five papers describing modifications to the PRECIS tool have been published, all of which employ quantification of the ratings on each dimension [17]. Each paper is summarized in Table .
Studies attempting to quantify PRECIS dimensions since 2010
In a similar analysis to ours, Riddle et al. used the PRECIS tool to design a randomized, controlled trial of pain-coping skills [17]. The authors also used the PRECIS tool to assist with face-to-face meetings and found the approach helpful, for similar reasons. They, too, added a semi-quantitative scale, but of 4 cm in length, and had three rounds of discussions. Their final evaluation led to greater agreement on all dimensions, whereby they increased the explanatory scores of each domain. The timing of their exercise prompted the authors to make revisions to the design of their randomized trial prior to submission for funding [17].
Tosh et al. [19] had three reviewers (co-authors) use a 1- to 5-point scale to review published trials in mental health; they referred to this as the Pragmascope. If a dimension could not be rated, it received a score of 0. Each trial could be allocated a total possible score of 50, with a range of 0 to 30 indicating an explanatory trial, 31 to 39 indicating a balanced trial and any score >35 indicating a pragmatic trial. However, in Figures and 3, Tosh et al. use ranges of 0 to 16 to describe explanatory trials and 16 to 35 to describe an interim trial that is balanced. They had independent ratings and averaging of scores, but did not describe an explicit process to be used to reach consensus.
Several limitations are associated with the use of the Pragmascope at this time. First, a dimension that cannot be rated receives a score of 0, which biases the total towards an explanatory classification. Second, the use of cut-offs on the total score to categorize trials reverts to the problem of viewing trial design as purely explanatory or pragmatic [8]. Third, the rationale for the chosen cut-offs is not specified: it is not clear why the midpoint of 25 was not chosen to indicate a balanced trial, with any lower score favouring an explanatory study and any higher score favouring a pragmatic study. Finally, we agree with Glasgow et al. [5] and Spigt and Kotz [22] that composite scores should be avoided because widely disparate trials can receive the same score, which defeats the purpose of having a dimensional approach to the rating.
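The concern about composite scores can be made concrete with a small sketch. It assumes ten dimensions each rated from 1 to 5, with an unrated dimension counted as 0, as described above for the Pragmascope. The function names and the three rating profiles are hypothetical and invented purely for illustration; they are not data from any trial, and the code is not the scoring procedure of Tosh et al.

```python
from typing import Optional, Sequence

N_DIMENSIONS = 10  # number of PRECIS dimensions, as stated in the text


def composite_score(ratings: Sequence[Optional[int]]) -> int:
    """Sum ten 1-to-5 ratings, counting an unrated dimension (None) as 0."""
    if len(ratings) != N_DIMENSIONS:
        raise ValueError("expected one rating per dimension")
    for r in ratings:
        if r is not None and not 1 <= r <= 5:
            raise ValueError("ratings must lie between 1 and 5")
    return sum(0 if r is None else r for r in ratings)


def mean_of_rated(ratings: Sequence[Optional[int]]) -> float:
    """Average over the rated dimensions only, ignoring unrated ones."""
    rated = [r for r in ratings if r is not None]
    if not rated:
        raise ValueError("no dimension was rated")
    return sum(rated) / len(rated)


# Made-up profiles for illustration only.
uniform = [3] * 10                       # middling on every dimension
polarised = [5] * 5 + [1] * 5            # extreme in both directions
partly_unrated = [5] * 6 + [None] * 4    # top rating wherever it was rated

totals = [composite_score(p) for p in (uniform, polarised, partly_unrated)]
print(totals)                         # the three totals are identical
print(mean_of_rated(partly_unrated))  # yet every rated dimension scored 5
```

All three profiles receive the same total, although they describe very different designs, and the third is pulled down only because four dimensions could not be rated.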
The PRECIS Review (PR) tool was developed by Koppenaal et al. [21] to evaluate systematic reviews and the randomized controlled trials included in them, to help policy makers decide which trials are applicable to their work. Like us, they quickly realised that a Visual Analog Scale (VAS) of 0 to 10 was arbitrary and so converted it to a Likert-type scale of 1 to 5, also including a percentage score. They used two reviewers, with an additional reviewer rating the score when consensus could not be reached. The scoring scale appeared to be valid for the stated purpose and they acknowledged the limitations of broader applicability. Again, given that the purpose of the PRECIS tool is to introduce multidimensionality to the evaluation of a study design, scores are important to initiate and guide discussion, but broader consensus on the rating is still required to inform decision making.
Glasgow et al. [20] also used a 5-point (0 to 4) scale to rate three interrelated, yet separate studies by investigators from three separate institutions. They describe a similar process of training reviewers and noted that investigators tended to rate their own papers as being pragmatic. The scoring revealed moderate levels of variability with most variability within 1 point on the 5-point scale. However, several telephone calls were required to develop consensus on the meaning of each score. It is possible that the scale was not sufficiently sensitive to detect a difference, which would be important if the group was interested in achieving consensus, but less so if trying to evaluate the study per se and categorize the protocol dichotomously.
Our proposed refinement also identified the need for a rating scale, but included a modified Delphi technique to reach consensus [23]. We chose a 20-point numerical scale to approximate a continuous scale. This permitted easier, more accurate and more stable coding of the response using e-mail. A VAS with measurement is appropriate when standardized in pen and paper format rather than e-mail, which distorts the dimensions. We also used extreme anchor points, 1 to 20, to discourage rating the domains beyond the numbers provided. Moreover, Likert scales have increased reliability with up to 11 steps, 7 steps being the minimum. Therefore, the scale we used was sensitive enough to capture inter-individual differences and so better target our discussions. This may be one reason why the spider graphs do not reach the extremes, but it is also possible that the raters appreciated that elements existed in each dimension to prevent an extreme rating. Use of the iterative technique provided a sound basis for discussing the intricacies of the trial design and allowed individuals to provide viewpoints anonymously and then offer their opinions during face-to-face meetings.
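One way a team might use the ratings from a Delphi round to target discussion is to summarise each dimension by its median and range across raters and flag the dimensions on which raters disagree most. The sketch below assumes the 1-to-20 scale described above. The function name, the disagreement threshold of 5 points, the dimension labels and the ratings are all hypothetical, chosen only for illustration, and do not reproduce the procedure or data of the study.

```python
from statistics import median
from typing import Dict, List, Tuple

SCALE_MIN, SCALE_MAX = 1, 20  # anchor points of the scale described in the text


def summarise_round(
    ratings: Dict[str, List[int]], max_range: int = 5
) -> Tuple[Dict[str, Tuple[float, int]], List[str]]:
    """Return (median, range) per dimension and the dimensions to discuss.

    A dimension is flagged when the gap between the highest and lowest
    rating exceeds max_range (an assumed, arbitrary threshold).
    """
    summary: Dict[str, Tuple[float, int]] = {}
    flagged: List[str] = []
    for dimension, scores in ratings.items():
        if not scores or any(not SCALE_MIN <= s <= SCALE_MAX for s in scores):
            raise ValueError(f"invalid ratings for {dimension!r}")
        spread = max(scores) - min(scores)
        summary[dimension] = (median(scores), spread)
        if spread > max_range:
            flagged.append(dimension)
    return summary, flagged


# Made-up first-round ratings from four anonymous raters.
round_one = {
    "follow-up intensity": [4, 12, 15, 7],
    "primary outcome": [9, 10, 11, 10],
}

summary, flagged = summarise_round(round_one)
print(summary)
print(flagged)  # dimensions with wide disagreement, to be discussed next
```

Reporting the range alongside the median keeps the disagreement visible, rather than averaging it away, which is the information a subsequent discussion round needs.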
Taken together, these examples demonstrate that depending on the purpose of the application of the PRECIS tool (study evaluation versus study design), different scales and methods may need to be used to rate studies. However, our method may be particularly helpful to trialists to ensure common understanding of a study design when working in teams with disparate expertise. Therefore, other investigative teams may find these approaches helpful.
The multidimensional PRECIS tool can be implemented easily by investigators and represents a major advance in the design and evaluation of clinical trials that inform practice, as demonstrated by our own experience and that of others. All clinical trialists need to make compromises in their design due to a variety of practical factors that affect the conduct of a large study. Collaborative research by a team requires consensus on study design to ensure the methods are appropriate to answer the study question. Methods to evaluate study design and reach consensus are needed to ensure that disparate views and perspectives can be reconciled so that the best possible course of action is adopted.
Although most agree that the 10 dimensions are necessary to understand the explanatory–pragmatic continuum, numerical scales run the risk of dichotomously classifying the study, so we did not provide a composite score for the study. This required a qualitative approach. Therefore, a more structured process using the Delphi technique that we employed, or a similar nominal group technique used by Riddle et al. [17], allowed a more democratic process of consensus among the investigators, who hailed from different disciplines and institutions. This process may be helpful to investigators during the design stage of a multicentre collaborative study to resolve disagreements and assist in reaching a common understanding of the design of the study.