A.V. provided a detailed presentation on the use of advanced pharmacokinetic (PK) and pharmacodynamic (PD) modeling and simulation as a strategy for determining the antipsychotic dose that should maximize the separation between efficacy (eg, PANSS improvement or dopamine receptor occupancy as an intermediate indicator) and side effects (eg, extrapyramidal symptoms, prolactin elevation, and other adverse events) while still differentiating from placebo response. Specifically, population-based PK/PD modeling and simulation can provide a priori projections to guide the design of RCTs (ie, dose selection, timing of efficacy measures, etc) in a manner that maximizes the probability that a given study will detect the “signal” of the compound despite the “noise” of placebo response. A.V. also reviewed how advanced mixed-effects models that account for many relevant factors (eg, participant characteristics, disease progression, treatment effects, placebo effects, differential dropout rates) can be applied post hoc to provide valuable insight into the parameters that contribute most directly to differences in response rates among participants. Collectively, these modeling and simulation methods are relatively underutilized analytic tools that could contribute directly to overcoming existing obstacles to detecting drug-placebo differences in antipsychotic clinical trials.
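As a minimal illustration of the kind of a priori projection described above, the Python sketch below uses entirely hypothetical parameters (the dose levels, clearance distribution, and EC50 are invented for illustration and are not from the presentation) to propagate between-subject PK variability into the fraction of a simulated population falling within a putative D2 receptor occupancy window for efficacy without extrapyramidal symptoms (EPS):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population PK/PD parameters (illustrative values only)
N = 2000                                  # simulated participants
CL = 10 * np.exp(rng.normal(0, 0.3, N))  # clearance (L/h), log-normal between-subject variability
EC50 = 5.0                                # plasma conc. (ng/mL) yielding 50% D2 occupancy

for dose_mg in (2, 4, 8, 16):             # candidate daily doses (mg)
    css = dose_mg * 1000 / 24 / CL        # steady-state concentration (ng/mL)
    occ = css / (css + EC50)              # Emax (Hill = 1) occupancy model
    window = ((occ >= 0.65) & (occ <= 0.80)).mean()  # putative efficacy-without-EPS window
    eps = (occ > 0.80).mean()                        # occupancy associated with EPS risk
    print(f"{dose_mg:>2} mg/d: {window:.0%} in occupancy window, {eps:.0%} above EPS threshold")
```

In practice, such projections would come from nonlinear mixed-effects models (eg, implemented in NONMEM) fit to prior data, with far richer covariate, placebo-response, and dropout structure than this toy example.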
The refinement of assessment instruments may also provide a way to improve the detection of differences. This has become a familiar topic in forums addressing clinical trial methodology. In a review of these issues, A.C.L. presented a statistical perspective on how improved assessment procedures translate directly into increased power to detect changes across time and between treatment arms. According to the Guidelines for Statistical Practice from the Committee on Professional Ethics (American Statistical Association, 1999), researchers should “avoid the use of excessive or inadequate numbers of research subjects by making informed recommendations for study size.” Such informed recommendations stem from statistical power analyses, which, in the practice of most clinical trial designers, amounts to increasing the sample size until power is sufficient to detect a statistically significant change.
Alternatively, substantial improvements in the reliability of assessment procedures can result in decreased within-group variability, increased between-group effect sizes, and consequently smaller sample size requirements to achieve acceptable statistical power.10–12
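The variance-power link can be made concrete with a small sketch. Under classical test theory, observed-score variance equals true-score variance divided by reliability, so the required per-arm sample size for a fixed true drug-placebo difference scales inversely with reliability; the numbers below are illustrative only and do not come from the cited studies:

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-arm n for a two-sample comparison of means."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(2 * (z * sd / delta) ** 2)

# Classical test theory: observed variance = true-score variance / reliability,
# so better reliability shrinks the SD against which the drug-placebo
# difference must be detected. All values are hypothetical.
delta, true_sd = 5.0, 10.0                # assumed treatment effect and true-score SD
for reliability in (0.70, 0.80, 0.90):
    sd_obs = true_sd / reliability ** 0.5
    print(f"reliability {reliability:.2f}: n per arm = {n_per_arm(delta, sd_obs)}")
```

With these illustrative inputs, raising reliability from 0.70 to 0.90 cuts the required sample per arm from 90 to 70, with no change in the true treatment effect.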
This precept was illustrated in a poster presented at the session by A.S.K., which suggested that the improved reliability afforded by computerized administration of neurocognitive assessments could yield a 28% reduction in the sample size required to detect a 10% improvement on these measures. This estimate was derived from the respective means and SDs obtained in a direct comparison of test-retest reliability and concurrent validity between standard and computerized administration of a representative battery of neurocognitive tests, including those selected by the Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE) and Measurement and Treatment Research to Improve Cognition in Schizophrenia (MATRICS) consortia.13
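Because the poster’s underlying means and SDs are not reproduced here, the arithmetic behind such a figure can only be sketched with placeholders: for a fixed detectable difference, required sample size scales with variance, so the ratio of required sample sizes is the squared ratio of the observed SDs:

```python
# Placeholder SDs (not the poster's values), chosen only to show the arithmetic:
# a ~15% reduction in observed SD translates into a ~28% reduction in required n.
sd_standard, sd_computerized = 1.00, 0.85   # hypothetical, same units
n_ratio = (sd_computerized / sd_standard) ** 2
print(f"sample-size reduction: {1 - n_ratio:.0%}")   # prints "sample-size reduction: 28%"
```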
A.C.L. also provided evidence that within-group variance could be substantially reduced by enhancing interrater reliability and the validity of ratings, using a limited cadre of highly trained raters blinded not only to treatment but also to time point in the study. He described a method for such assessments in which raters at a central site are connected to study participants via a secure internet video link and presented data indicating an improved ability to detect drug-placebo differences.
The importance of site characteristics, along with potential solutions for improving site performance, was presented by L.E. Among the many issues reviewed, the concern that “professional” participants and rater inflation are problems unique to US-based sites was characterized as somewhat premature, particularly because entrepreneurs throughout the world will inevitably follow capital investment in this market. Site selection should therefore rest on individual site and investigator characteristics that indicate the investigators’ commitment to the ethical, unbiased execution of study protocols. Among the attributes that L.E. suggested a “quality” site must possess were staff with considerable clinical experience with both the patient population and the assessment instruments employed, demonstrated “in-house” training procedures and quality assurance metrics, ongoing programs to prevent rater drift and ensure consistency amid staff turnover, a reliable source of participant recruitment across a variety of settings, and facilities adequate to fully serve the clinical needs of the participants and the requirements of the study. An additional issue raised in the discussion following this presentation was the need to enhance dialogue between the sponsor and the participating sites so as to foster greater investigator involvement in the planning and design of the study. In closing, L.E. stated that the “culture” of a site is best judged by the involvement of the principal investigator, which, in turn, is a critical determinant of the quality of the data that the study as a whole will yield.
Following the formal presentations, roundtable discussions were conducted among the session participants and a panel composed of L.E., R.A., M.D., J.-P.L., and A.C.L. These discussions were moderated by the session chairs, A.H.K. and N.R.S., and served as a platform for debating the overall implications of the issues raised throughout the session among expert attendees from industry, academia, clinical sites, and governmental agencies. A poster session provided an additional outlet for sharing findings of direct relevance to the issues discussed.