This study was an initial effort to evaluate the feasibility of multi-site measures of acoustic startle and startle plasticity in a community comparison sample, with the ultimate goal of using this multi-site approach to identify genes responsible for startle and PPI abnormalities in clinical populations. We assessed the consistency of startle and PPI measures in CCS across 7 geographically distributed testing sites, and examined differences in methodology and subject characteristics that might have contributed to the variance observed in these measures across sites. Substantial efforts were undertaken to ensure that subject recruitment, data acquisition and data analysis were standardized across the 7 sites. Each site had many years of experience in subject recruitment and psychophysiological testing, and one site (UCSD) had over 25 years of experience in measures of acoustic startle and PPI in normal and patient populations. Testers from all sites were trained during a several-day session, and testing instructions were distributed in the form of manuals and videotapes (cf. Calkins et al. 2006). Equipment was assembled and calibrated at one site to enhance uniformity, and then distributed to the remaining 6 sites. Substantial effort was dedicated to quality assurance, and new personnel required “certification” before they were permitted to collect data. Annual retraining and re-certification were also required of all testers.
Adding to the standardization process, all waveforms were analyzed at a central site; while almost all waveform analyses were automated, all non-automated assessments (e.g. visual inspection of trials automatically marked as artifact) were made by a single investigator (JS), who remained blind to all subject characteristics. By utilizing multiple sites for data collection, it is possible to ascertain and test large numbers of subjects from rare populations (e.g. large schizophrenia pedigrees), and thereby overcome what would otherwise be a rate-limiting step in these types of studies. In contrast, data processing and analysis are not rate-limiting, and are most efficiently standardized at one site with expertise in a particular measure.
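To make the automated step concrete, the following is a minimal sketch of the kind of rule an automated artifact-flagging pass might apply to rectified EMG startle trials before routing flagged trials to visual inspection. The sampling rate, baseline limit, onset-latency window and 5-SD onset criterion are illustrative assumptions, not the criteria used in this study's pipeline.

```python
import numpy as np

# Assumed acquisition parameters; not the study's actual settings.
FS_HZ = 1000                  # sampling rate
ONSET_WINDOW_MS = (20, 120)   # plausible blink-onset latency window
BASELINE_LIMIT = 10.0         # max tolerated mean baseline EMG (arbitrary units)

def flag_for_inspection(baseline: np.ndarray, post: np.ndarray) -> bool:
    """Return True if a trial should be marked for visual inspection.

    baseline : rectified EMG preceding the startle stimulus
    post     : rectified EMG from stimulus onset onward
    """
    # Excessive pre-stimulus activity suggests movement or electrode noise.
    if baseline.mean() > BASELINE_LIMIT:
        return True
    # Treat the first post-stimulus sample exceeding baseline mean + 5 SD
    # as response onset (an assumed criterion).
    threshold = baseline.mean() + 5.0 * baseline.std()
    above = np.flatnonzero(post > threshold)
    if above.size == 0:
        return False  # no measurable response: a non-response, not an artifact
    onset_ms = above[0] * 1000.0 / FS_HZ
    lo, hi = ONSET_WINDOW_MS
    return not (lo <= onset_ms <= hi)  # onset outside window: flag the trial
```

Whatever the specific criteria, centralizing such a rule at one analysis site, as was done here, guarantees that every trial in the pooled dataset is judged by identical standards.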
Through the QA process, a number of differences in experimental methodology were detected across sites. Some reflected methodologies previously used at a particular site, which were then applied to the present study (e.g. electrode type or placement, use of a video camera, split testing). Others likely reflected user-specific patterns (e.g. styles of skin abrasion and gel application, which can affect electrode impedance) or equipment and test environment differences (e.g. hearing threshold), and still others reflected user error (e.g. failure to balance test orders, accidental reversal of electrode channels). In addition, geographic or socioeconomic differences, or differences in participant recruitment practices, may have contributed to the differences across sites in sex and ethnic distributions, WRAT performance and prevalence of psychopathology among CCS. Among the methodological differences that could be quantified or categorized, none could independently account for the observed site variability in key dependent measures.
Given the complexities of this study, it is notable that no statistically significant differences were detected across the 7 sites in any of the four primary measures: startle magnitude (during blocks 2-3), reflex habituation, reflex latency / latency facilitation, and PPI. Significant differences across sites were detected only in block 1 startle magnitude. Nonetheless, a closer inspection reveals non-trivial variability across the 7 sites in each of these measures. For example, we examined %PPI at the 60 ms prepulse interval, where differences between medicated schizophrenia patients and CCS are often most robust (Braff et al. 1978; Leumann et al. 2002; Swerdlow et al. 2006; Weike et al. 2000). At this interval, the effect size (Cohen's d) for site differences in PPI exceeded, in 8 of the 21 pair-wise contrasts among the 7 sites (A vs. B, A vs. C, A vs. D, etc.), the effect size (0.24) for the comparison between schizophrenia patients and CCS tested at one site in a recent single-site study of 103 patients and 66 CCS (Swerdlow et al. 2006). Five of these pair-wise comparisons reached at least small effect sizes (d>0.3), and one reached a medium effect size (d>0.5). Despite this variability across sites, there was some evidence that the total sample variability of 60 ms %PPI in this study, as reflected by the standard deviation (37.82), compares favorably with the variability of 60 ms PPI collected in 70 consecutive CCS at one test site over a comparable time interval (site A; SD = 37.95, unpublished observation). Furthermore, after correcting alpha for multiple comparisons, no pair-wise contrast among the 7 sites reached statistical significance for this key measure (one pair reached p<0.05, a second p<0.065, and a third p<0.10).
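For reference, the key quantities in this paragraph are conventionally computed as follows; the Bonferroni form is shown as one plausible choice for the alpha correction, since the exact correction applied is not specified above:

\[
\%\mathrm{PPI} = 100 \times \frac{M_{\text{pulse alone}} - M_{\text{prepulse+pulse}}}{M_{\text{pulse alone}}},
\qquad
d = \frac{\bar{x}_1 - \bar{x}_2}{s_{\text{pooled}}}
\]

\[
\alpha_{\text{corrected}} = \frac{0.05}{\binom{7}{2}} = \frac{0.05}{21} \approx 0.0024
\]

where \(M\) denotes mean startle magnitude and \(s_{\text{pooled}}\) the pooled standard deviation of the two groups being compared.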
Post-hoc analyses narrowed the list of sources contributing to the variability of measures across sites. Site-related patterns in the startle variables did not appear to be related to differences in hearing threshold, electrode impedance, the use of split testing, or levels of ambient room noise. Sex differences were evident in measures of both latency facilitation and PPI, the latter interacting with test order. Because both sex distributions and test order differed significantly across the 7 test sites, it is possible that these variables contributed significantly to the overall cross-site variability of latency facilitation and PPI.
In assessing the feasibility of multi-site measures of startle and PPI, we considered whether the pattern of findings from an integration of data across 7 test sites is consistent with that reported in single-site studies. This is clearly the case for many of the key measures: for example, the observed patterns of startle habituation, and the effects of prepulses and prepulse intervals on startle magnitude and latency, are consistent with numerous previous reports (cf. Braff et al. 2001). Lateralized (R>L) patterns of PPI among CCS, though less often examined, have also been reported previously (e.g. Cadenhead et al. 2000).
Other findings from the present study do not agree perfectly with those previously reported in single-site studies. For example, sex differences have been reported previously in startle measures, particularly PPI (Jovanovic et al. 2004; Swerdlow et al. 1993). In the present study, however, the pattern usually reported in the literature (male > female PPI) was evident only in test order A, and not in test order B. Startle testing in order A most closely resembles that used in “stand alone” single-site startle studies. In contrast, PPI testing in order B was preceded by several hours of neurocognitive and other measures that might lead to participant fatigue. The substantial reduction in male PPI in order B vs. A is consistent with recent evidence linking fatigue with reduced PPI in men (van der Linden et al. 2006). No comparable published data address the effects of fatigue on PPI in women. It may also be relevant that we previously failed to detect male > female PPI when startle was measured in the context of a cognitive “challenge” (Talledo et al. 2005). Conceivably, whatever accounts for the ability of cognitive tasks to mitigate the male > female PPI differences observed in uninstructed, automatic measures of PPI might also contribute to our failure to detect these sex differences in order B, where PPI testing followed several hours of neurocognitive batteries. While this study cannot elucidate the mechanisms underlying such an effect, the present findings at least suggest that test sequence should be controlled in multi-site PPI studies. Certainly, this applies not only to studies with CCS, but also to those with patient populations.
Any multi-site startle study would benefit from the type of initial and continuing on-site standardization procedures used in the present study, which detected a number of methodological differences across sites. With the increasing facility of web-based video monitoring, however, future studies might employ more “real time” supervision by a central QA site. Even at sites with substantial experience in psychophysiological testing, methodological elements with long institutional histories can be introduced unknowingly into startle testing protocols, creating potential sources of cross-site variance. Presumably, this process might also occur in multi-site studies using other psychophysiological or neuropsychological measures. Conceivably, real-time monitoring might detect this inadvertent contamination of methods in a manner that is more sensitive than the annual QA visits used here.
In summary, the present findings suggest that, even with substantial energy and detailed planning directed towards methodological standardization and quality assurance among an experienced group of investigators at different sites, startle data collected from community comparison subjects exhibit variability across sites. Based on our experience to date, a list of recommendations for future multi-site startle studies is provided in the table below. Some sources of variability can be controlled relatively easily, e.g. by ensuring equal sex distributions among test groups and by using a standardized testing order that places startle testing relatively early in any test battery. Other sources of variability are more difficult to identify, but may include features of the testing environment or participant population. Some sources of variability might diminish as sites gain expertise through the testing of larger samples. However, in multi-site studies necessitated by the relative scarcity of a particular test population, test samples may remain small at any single site; infrequent testing and normal personnel turnover will complicate effective quality assurance and make it difficult to gain expertise in startle measures. Given the amount of variability detected under the conditions of the present study, considerable thought should be given in the design of startle studies to the relative benefits of using many geographically dispersed test sites, each testing comparatively few subjects, versus fewer test sites, each testing a larger number of participants and clinical subjects drawn from larger geographic catchment areas.
Recommendations for minimizing cross-site variability in multi-site studies of the startle reflex in CCS*