Startle and its inhibition by weak lead stimuli (“prepulse inhibition”: PPI) are studied to understand the neurobiology of information processing in patients and community comparison subjects (CCS). PPI has a strong genetic basis in infrahumans, and there is evidence for its heritability, stability and reliability in humans. PPI has gained increasing use as an endophenotype to identify vulnerability genes for brain disorders, including schizophrenia. Genetic studies now often employ multiple, geographically dispersed test sites to accommodate the need for large and complex study samples. Here, we assessed the feasibility of using PPI in multi-site studies.
Within a 7-site investigation with multiple measures, the Consortium on the Genetics of Schizophrenia conducted a methodological study of acoustic startle and PPI in CCS. Methods were manualized, videotaped and standardized across sites with intensive in-person training sessions. Equipment was acquired and programmed at the “PPI site” (UCSD), and stringent quality assurance (QA) procedures were used. Testing was completed on 196 CCS over 2.5 years, with 5 primary startle dependent measures: eyeblink startle magnitude, habituation, peak latency, latency facilitation and PPI.
Analyses identified significant variability across sites in some but not all primary measures, and determined factors both within the testing process and subject characteristics that influenced a number of test measures. QA procedures also identified non-standardized practices with respect to testing methods and procedural “drift”, which may be particularly relevant to multi-site studies using these measures.
With thorough oversight and QA procedures, measures of acoustic startle and PPI can be acquired reliably across multiple testing sites. Nonetheless, even among sites with substantial expertise in utilizing psychophysiological measures, multi-site studies using startle and PPI as dependent measures require careful attention to methodological procedures.
Acoustic startle and its inhibition by weak lead stimuli (“prepulse inhibition”: PPI) are quantified easily via electromyographic (EMG) measures of the blink reflex in normal and disordered human populations (Graham 1975; cf. Braff et al. 2001b). Neurobiological substrates of automatic, uninstructed PPI have been elucidated in humans and non-human mammalian species (Kumari et al. 2003; Postma et al. 2006; cf. Swerdlow et al. 2000, 2001a), and quantitative trait loci associated with PPI have been identified in rodents (Joober et al. 2002). Preclinical studies have demonstrated that strain differences in PPI reflect genetic rather than epigenetic influences (Francis et al. 2003; Swerdlow et al. 2004). When measured over intervals of several weeks, PPI is stable (Cadenhead et al. 1999), and some evidence suggests that it is a heritable phenotype in humans (Anokhin et al. 2003). These features underscore the importance of PPI deficits in certain complex neuropsychiatric disorders with suspected or identified genetic origins, including schizophrenia (cf. Braff et al. 2001b; Braff et al. 1978, 1992, 1999, 2001a, 2005; Leumann et al. 2002; Swerdlow et al. 2006; Weike et al. 2000) and Tourette Syndrome (Castellanos et al. 1996; Swerdlow et al. 2001b). For these reasons, there has been increasing interest in the use of PPI as a quantitative physiological phenotype, or endophenotype, to identify vulnerability genes for these disorders (cf. Braff and Freedman 2002; Anokhin et al. 2003; Cadenhead et al. 2000; Gottesman and Gould 2003; Kumari et al. 2005; Braff and Light 2005; Braff et al. 2006; Turetsky et al. 2006).
Genetic studies of complex quantitative behavioral disorders often require multiple data collection sites to ascertain large samples or family cohorts. Such geographically diverse studies present challenges for the use of endophenotypes, because differences in methodologies or test conditions across sites can introduce uncontrolled variance into the experimental measures. This variance could potentially obscure detection of physiological signals used to identify vulnerability genes. Thus, before PPI can be used with full confidence as an endophenotype in studies of complex behavioral disorders, it will be important to determine whether PPI can be reliably acquired at multiple test sites, using uniform techniques. This issue is most easily examined in non-patient populations, since in this case, site differences in patient characteristics, symptoms, medications and other clinical variables do not impact the findings. The present study reports the initial PPI data gathered from non-patient community comparison subjects (CCS) in a multisite collaborative study, the Consortium on the Genetics of Schizophrenia (COGS) (cf. Calkins et al. 2006). Subjects completed standardized neurocognitive and neurophysiological tests, including tests of the acoustic startle response and PPI at seven sites, all of which used identical equipment, techniques, training and subject selection criteria. All data were analyzed at a central PPI Quality Assurance site (UCSD: see Figure 1), with substantial experience in startle waveform analysis. We hypothesized that we would observe comparable levels of acoustic startle and PPI across sites, thereby confirming the suitability of these measures for multisite genetic studies. Differences in subject characteristics across sites were also assessed, as such differences might be expected in multi-site studies, and might contribute to variance that impacts the reliability of startle measures and PPI.
Participants were recruited as part of the COGS project and tested at seven geographically dispersed, University affiliated sites (in alphabetical order): Harvard University, Mount Sinai School of Medicine, University of California at Los Angeles, University of California at San Diego, University of Colorado, University of Pennsylvania and University of Washington. Participants were 18-65 years old and fluent in English. Only data from CCS are described in this report. At each site, medically healthy CCS were recruited directly through flyers, print, and electronic media. CCS were excluded from the study for: electroconvulsive therapy in the past 6 months, positive illicit drug or alcohol screen, diagnosis of substance abuse disorder in past 30 days or substance dependence in past 6 months, estimated IQ lower than 70, history of significant head injury (including any of: loss of consciousness > 1 min, post-concussive syndrome > 1 week, or abnormal brain imaging or electroencephalography after event), seizure disorder, and other ocular, auditory, neurological or major systemic medical problems, a personal history of cluster A personality disorder or psychosis, or a family history of psychosis in a first- or second-degree relative. The local Institutional Review Board at each testing site approved the study, and all participants provided signed informed consent before participation in the study procedures.
All participants underwent standardized diagnostic and clinical assessments by diagnosticians trained according to a standardized procedure, including a modified version of the Diagnostic Interview for Genetic Studies (DIGS; Nurnberger et al. 1994), the Scale for the Assessment of Negative Symptoms (SANS; Andreasen 1984a) and the Scale for the Assessment of Positive Symptoms (SAPS; Andreasen 1984b), and a review of relevant medical records. Each subject was assigned DSM-IV best estimate final diagnoses through a consensus process that included review by at least two faculty level clinicians.
Sample characteristics are seen in Table 1. Of the 196 CCS who underwent valid testing during 30 months of data collection, 23 (11.7 %) were excluded from analysis based on established criteria for unacceptably low startle magnitude (see Figure 2). One participant was excluded for a positive saliva alcohol measurement, one for a current diagnosis of alcohol dependence, and one for a current diagnosis of alcohol abuse. The final study sample included 170 participants. A complete description of clinical and neurocognitive instruments used to characterize participants is found in a published report (Calkins et al. 2006).
Startle testing was administered as part of the COGS research protocol consisting of 4 hours of clinical assessment and 6 hours of neurophysiological and neuropsychological testing. The COGS neurophysiological and neuropsychological tasks were presented in one of two standardized orders, and a brief rest period was allowed between tasks (Table 2). Dependent measure scoring was conducted blind to diagnostic group membership.
Startle testing was initiated after completion of a specific set of diagnostic or experimental measures (Table 2). In 46 subjects, testing was divided over two days, but the test sequence was maintained.
A list of test equipment is provided as a footnote to this report. Audiometric screening excluded hearing impairment (exclusion for threshold >40 dB at 1000 Hz). For startle testing, subjects sat in a recliner chair in a sound-attenuated room. Methods followed previous reports (e.g. Braff et al. 2001a, 2005). Two 4 mm Ag/AgCl electrodes were positioned below and lateral to each eye over orbicularis oculi, with a ground electrode behind the left ear. Electrode resistances were <10 kOhm. The eye-blink component of the acoustic startle response was measured using an EMG startle system that recorded 250 1-ms epochs, starting with startle stimulus onset. Recorded EMG activity was band-pass filtered (100-1000 Hz). Amplification was set at 10,000 (a 0.25 mV signal triggered a 2.5 V amplifier output). Blink scoring parameters were based on criteria established by Graham (1975), described previously (Braff et al. 1992). A square wave calibrator established sensitivity (1.31 μV/digital unit). A 60-Hz notch filter was used to reduce interference. Stimuli were presented binaurally through headphones. Sound levels were calibrated monthly. Ambient room noise was measured using an artificial ear coupler on the headphones. Electrical noise in the test room was measured using a magnetic field meter.
Laboratory staff and key faculty members from each site participated in a two-day in-person training session at UCSD prior to the study's initiation. Staff also received annual, in-person, refresher training on the startle testing procedures. A comprehensive manual describing the equipment, software, subject placement, scripts for interacting with participants and task administration was written and distributed to all sites (available on request), as were videotapes of all testing procedures. Ongoing quality assurance consisted of data quality review, biweekly conference calls and consultation with the central site. In addition, the project administrator visited each site annually to review procedures, inspect equipment, and undergo startle testing (Calkins et al. 2006).
Data were scored using San Diego Instruments' SRREDNEW program, which generates baseline EMG levels during the 20 ms epoch after startle stimulus delivery, onset and peak response latency (ms) and response amplitude (digital units). Software parameters by which voluntary and spontaneous blinks were recognized and excluded were derived from published criteria (Braff et al. 1978; Geyer & Braff 1982; Graham 1975). Onset latency was defined by an algorithm based on the initial deviation from baseline EMG levels, and peak latency was defined as the point of maximal amplitude within 100 ms from startle stimulus onset. To exclude non-startle-induced blinks, rectified waveforms were automatically flagged for visual inspection when the onset and peak latencies differed by more than 100 ms. Responses were also assigned an error code when baseline values shifted by more than 10 units. Exclusion of all but stimulus-elicited eye blinks precludes contamination of the measures by group-related differences in rates of spontaneous blinks, which may be abnormal in patient populations.
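The automated flagging rules described above can be sketched as follows. This is an illustrative sketch only; the function name and trial representation are hypothetical and do not reflect the actual SRREDNEW implementation, though the thresholds follow the text.

```python
# Hypothetical sketch of the automated artifact-flagging rules described
# above; the thresholds (100 ms latency mismatch, 10-unit baseline shift)
# follow the text, but the function and argument names are illustrative.

def flag_trial(onset_latency_ms, peak_latency_ms,
               baseline_start, baseline_end):
    """Return a list of reasons this trial should be visually inspected."""
    flags = []
    # Probable non-startle-induced blink: onset and peak latencies
    # differing by more than 100 ms.
    if abs(peak_latency_ms - onset_latency_ms) > 100:
        flags.append("latency mismatch > 100 ms")
    # Baseline drift: a shift of more than 10 digital units.
    if abs(baseline_end - baseline_start) > 10:
        flags.append("baseline shift > 10 units")
    return flags
```

A trial returning an empty list would pass automated screening; any flagged trial would be routed to blinded visual inspection, as described in the next section.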
Each trial was visually inspected for spontaneous and voluntary blinks, noise created by movement artifact and on-going EMG activity occurring at the onset of the stimulus. Trials with artifact were excluded from analysis on a trial-by-trial basis (<4% total trial exclusion). The SRREDNEW program automatically selected the peak of the startle response waveform within the 250 ms post-stimulus window for analysis, and this selection was monitored, blind to subject identity. Mean startle latencies and amplitudes were generated for each trial type within each block for the right and left eye for each participant. The QA site then uploaded these averages to the central data website, along with comments on the validity and quality of the data.
Demographic differences across test sites were analyzed by 1-way ANOVAs for continuous variables and chi-square tests for categorical variables. For experimental measures (startle magnitude, habituation, latency and PPI), repeated measures ANOVAs with Greenhouse-Geisser corrections and Fisher's PLSD post-hoc comparisons were performed with test site and sex as between-subject factors. The ratio of the mean magnitude of startle with a prepulse to the mean magnitude of startle without a prepulse was determined. %PPI is equal to 1 minus this ratio, expressed as a percentage, i.e. 100 × (1 − (magnitude of startle to pulse preceded by prepulse)/(magnitude of startle to pulse without a preceding prepulse)). For %PPI, within-subject factors were block, prepulse interval and eye side. Analyses also compared measures of onset and peak reflex latency, latency facilitation (latency reduction on trials with a prepulse followed by pulse vs. pulse alone trials) and reflex habituation (startle magnitude reduction in trial block 4 vs. 1, calculated as both difference scores and % reduction) across sites. Additional "secondary" analyses examined the impact of specific testing features (e.g. test order, time of day, one vs. two test days) and participant characteristics (e.g. age, smoking behavior) on the key dependent measures, particularly where there were known inter-site sample differences (e.g. different sex distributions) or methodological differences (e.g. differences in test order). Because patterns of onset and peak latency were comparable for most analyses, only peak latency values are reported for secondary analyses. Correlations were assessed via simple regression, or when using a single value per site (e.g. ambient noise level), via Spearman Rank analyses. Alpha for all comparisons was 0.05. Effect sizes (Cohen's d (Cohen 1988)) are reported where appropriate.
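The two derived measures defined above, %PPI and percent habituation, can be expressed as short functions; this is a minimal sketch of the stated formulas, with illustrative values rather than study data.

```python
# Minimal sketch of the derived measures defined in the text;
# inputs are mean startle magnitudes in digital units.

def percent_ppi(prepulse_pulse_mag, pulse_alone_mag):
    """%PPI = 100 * (1 - magnitude with prepulse / magnitude without)."""
    return 100.0 * (1.0 - prepulse_pulse_mag / pulse_alone_mag)

def percent_habituation(block1_mag, block4_mag):
    """Percent reduction in startle magnitude from trial block 1 to block 4."""
    return 100.0 * (block1_mag - block4_mag) / block1_mag

# Example: a startle magnitude of 30 units on prepulse+pulse trials vs.
# 60 units on pulse-alone trials yields 50% PPI.
```

Positive %PPI indicates inhibition (smaller startle on prepulse trials); a negative value would indicate facilitation.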
Recruitment and ascertainment practices across sites yielded samples comparable in some but not all characteristics (Table 1). ANOVAs revealed significant differences in the Wide Range Achievement Test, Third Edition (WRAT3) reading subscale scores across sites (p=0.03); Chi-square analyses revealed significant differences in CCS sex distributions across sites (% female, maximum vs. minimum = 75% vs. 29%; p<0.01), current Axis I non-schizophrenia diagnoses (% with current diagnoses, maximum vs. minimum = 22% vs. 0%, p<0.05) and ethnic representation (% non-Caucasian, maximum vs. minimum = 50% vs. 4%; p<0.05), and trends for differences in the proportion of CCS with lifetime Axis I diagnoses (% ever diagnosis, maximum vs. minimum = 50% vs. 12.5 %, p<0.1), and the proportion of CCS that are currently using psychotropic medications (% using psychotropic medication, maximum vs. minimum = 16.7% vs. 0%; p<0.2).
Axis I diagnoses were identified in 45 (26.5%) of the CCS. Of these, 29 (17.1%) were diagnosed with a non-psychotic affective disorder (Major Depressive Disorder or Dysthymia), and 23 (13.5%) were diagnosed with a substance-related disorder (Dependence or Abuse). No CCS carried diagnoses known to be associated with reduced levels of PPI. Medications were used by 22 (12.9%) of the final sample; of these, most (13; 7.6%) were taking herbal supplements or non-prescription medications, and 8 (4.7%) were taking antidepressants.
Despite significant efforts towards cross-site standardization of methods (see above and Calkins et al. (2006)), QA efforts detected a number of differences across sites in the experimental conditions and methodologies related to startle testing (Table 3). These differences related to the placement of EMG electrodes, EMG channel configurations, subject instructions, levels of ambient acoustic noise (range: 34-45 dB(A)) and electrical noise (range = 0.1-1.2 gauss) in the test room, test order (p<0.001; % sample tested in order “A”, maximum vs. minimum = 94% vs. 35%) and use of “split test” designs (testing over more than one day) (p<0.001; % sample tested over > 1 day, maximum vs. minimum = 90% vs. 0%). Differences across sites were also detected in EMG electrode impedance (p<0.0001; range (mean/site) = 5.16-9.10 kΩ) and hearing threshold (p<0.0001; range (mean/site) = 7.00-20.31 dB(A)). Despite these differences, all mean site values for electrode impedance and hearing threshold fell within designated acceptable ranges.
Primary analyses compared four major startle characteristics across sites. Secondary analyses were then used to identify specific predicted patterns of startle characteristics, and to examine any impact of inter-test site differences on these measures. Of the sample characteristics and methodological variables found to differ across sites (above), WRAT3 Reading scores, hearing threshold, electrode impedance, participant sex and ethnicity, test order and use of split testing could be assessed as potential contributors to variability in startle measures.
ANOVA of startle magnitude on pulse alone (PA) trials during blocks 2 and 3, when PPI was assessed, revealed no significant main effect of test site (F=1.18, df 6, 163, ns), a significant effect of trial block (2 vs. 3: F=168.03, df 1,163, p<0.0001), and no significant interaction of site × block (F<1). There were no significant effects of eye side (F<1), or other 2- or 3-way interactions. Across the 7 test sites, mean startle magnitude (SD) on PA trials during blocks 2 and 3 ranged from 43.0 (28.0) units to 78.7 (37.3) units (56.3 (36.7) - 103.1 (48.9) μV). It is possible that this range of startle magnitude values across sites reflected the impact of any one of several identified cross-site differences in participant characteristics or methodologies. However, startle magnitude did not correlate significantly with either hearing threshold (r=0.04) or electrode impedance (r=0.04), suggesting that site differences in these measures did not contribute substantially to the range of startle values. Startle magnitude differed based on ethnicity, with Caucasians showing significantly greater magnitude than non-Caucasian participants (F=9.73, df 1,133, p<0.003), consistent with previous reports (Swerdlow et al. 2005). Because site C used predominantly one test order, and split-testing was never done at site A but always done at site D, ANOVA was performed without a “site” factor, and detected no significant effects of sex, test order or split testing on startle magnitude, and no significant interactions, suggesting that these factors did not contribute significantly to the overall range of startle values. There was no significant correlation between mean startle magnitude and ambient noise levels at each site (rs=0.40, p=0.33). Thus, none of the methodological variables that differed across sites independently accounted for the observed site variability in startle magnitude.
Analysis of startle magnitude on no-stim (blank) trials revealed no significant effects of test site, trial block, eye side or sex (all F's<1), or any significant 2, 3 or 4-way interactions.
Startle magnitude normally declines with repeated stimulus presentation, and the amount of reflex habituation was assessed by comparing reflex magnitudes in trial blocks 1 vs. 4. ANOVA of startle magnitude during blocks 1 and 4 revealed a trend for effect of test site (F=1.98, df 6,162, p<0.071), a significant effect of trial block (1 vs. 4: F=430.68, df 1,162, p<0.0001), and a significant interaction of site × block (F=2.90, df 6,162, p=0.01). Rather than a difference in habituation per se, this interaction reflected a significant difference in block 1 startle magnitude across sites (F=2.73, df 6,162, p<0.015), based on significant differences between site B and sites A (p<0.0009), E (p<0.001) and F (p<0.05). Consistent with this, ANOVA of percent startle habituation revealed no main effect of test site (F=1.06, df 6,162, ns).
Post-hoc comparisons examined the impact of inter-site sample or methodological differences on block 1 startle magnitude. Block 1 startle magnitude did not correlate significantly with either hearing threshold or electrode impedance (all r's<0.1). Block 1 startle magnitude was significantly related to ethnicity; as described above for startle magnitude in blocks 2 and 3, this characteristic could account for site differences in block 1 startle magnitude. ANOVAs also detected no significant effects of sex, test order or split testing on block 1 startle magnitude or habituation, and no significant interactions, suggesting that these factors did not contribute significantly to site-related differences in these measures. Mean block 1 startle magnitude did not correlate significantly with ambient noise levels across sites (rs=0.37, p=0.38).
ANOVA of peak startle latency on pulse alone trials revealed no significant effect of test site (F=1.74, df 6,162, ns) or trial block (F<1), a significant effect of eye side (R > L: F=6.46, df 1,162, p<0.015), and no significant 2- or 3-way interactions. Site differences were also not evident for reflex onset latency (F=1.04, df 6,155, ns), and lateralized differences did not reach statistical significance (F=2.45, df 1,155, ns). Reflex latency normally becomes smaller ("latency facilitation" (Hoffman and Searle 1968; Ison et al. 1973)) when startling stimuli are preceded by short interval prepulses, and this latency facilitation was assessed by comparing peak latency across all trial types (PA and prepulse+pulse). Across all trial types, ANOVA of peak startle latency revealed no significant main effect of test site (F=1.83, df 6,144, ns) and significant main effects of sex (men > women: F=6.31, df 1,149, p<0.015) and trial type (F=143.77, df 3,144, p<0.0001), the latter reflecting the expected reduction in startle latency for 30, 60 and 120 ms prepulse conditions compared to PA trials (all p's<0.0001). The effect of eye side remained significant (R>L: F=11.30, df 1,144, p=0.001), and a significant interaction of site by eye side reflected substantial R>L asymmetry at some test sites (e.g. site C: mean (SEM) R vs. L (ms)=62.7 (0.80) vs. 59.9 (0.81); p<0.002) but not others (e.g. site B: mean (SEM) R vs. L (ms) = 57.8 (0.47) vs. 58.4 (0.50); ns). Similar patterns were observed for onset startle latency: no significant effect of site (F=1.39, df 1,137, ns), significant effects of sex (men > women: F=6.31, df 1,149, p<0.015) and trial type (F=55.24, df 3,411, p<0.0001, reflecting significant facilitation for 30 and 60 ms prepulse trials (p's<0.0001)) and side (R>L: F=7.61, df 1,137, p<0.007), though the site by eye side interaction did not reach significance (F=1.27, df 6,137, ns).
ANOVA of %PPI revealed no significant effect of test site (F<1), but significant effects of prepulse interval (120 ms > 60 ms > 30 ms; F=68.02, df 2,324, p<0.0001) and eye side (R > L; F=7.78, df 1,162, p<0.006), though not trial block (F=1.83, df 1,162, ns). There were no other significant 2-, 3- or 4-way interactions. Despite the lack of significant main effect of test site, there was a substantial range of mean %PPI values across sites (maximum (SEM) vs. minimum (SEM) averaged across prepulse intervals = 47.08 (1.83) vs. 35.20 (1.81)).
We examined factors that might have contributed to this range of PPI values across sites. Mean %PPI did not correlate significantly with either hearing threshold or electrode impedance (r's<0.03), or startle magnitude during blocks 2-3 (r<0.01), suggesting that these variables did not contribute substantially to inter-site variability in PPI. Neither WRAT scores nor ethnicity were significantly related to PPI values. Across sites, mean PPI did not correlate significantly with ambient noise levels (rs= − 0.03, p=0.96). ANOVA across all sites revealed a significant effect of prepulse interval (F=55.76, df 2,322, p<0.0001), and a significant interaction of sex × test order (F=5.70, df 1,161, p<0.02; Figure 6B), but no significant effect of sex (F<1), test order (F=2.00, df 1,161, ns), or split testing (F<1), and no other significant 2-, 3- or 4-way interactions. Post-hoc comparisons revealed that the interaction of sex and test order reflected the expected pattern of significantly greater PPI in men than in women in test order A (F=6.16, df 1,99, p<0.01; Figure 6B), but an opposite trend in test order B (F=2.28, df 1,66, ns). Post-hoc comparisons revealed that, while PPI in women did not differ across test orders, PPI in men was significantly greater in test order A vs. B (F=15.36, df 1,70, p<0.0003). In considering possible factors accounting for this effect of test order on PPI in men, this sex-specific pattern did not appear to reflect the influence of smoking habits (smoking × test order: F < 1) or test site (site × test order: F < 1, for all sites except site C, which used only one test order).
This study was an initial effort to evaluate the feasibility of multi-site measures of acoustic startle and startle plasticity in a community comparison sample, with an ultimate goal of using this multi-site approach to identify genes responsible for startle and PPI abnormalities in clinical populations. We assessed the consistency of startle and PPI measures in CCS across 7 geographically distributed testing sites, and examined differences in methodology and subject characteristics that might have contributed to the variance in these measures observed across sites. Substantial efforts were undertaken to ensure that subject recruitment, data acquisition and data analysis were standardized across the 7 sites. Each of the 7 sites in this study had many years of experience in subject recruitment and testing in psychophysiological measures, and one site (UCSD) had over 25 years of experience in measures of acoustic startle and PPI in normal and patient populations. Testers from all sites were trained during a several-day session, and testing instructions were distributed in the form of manuals and videotapes (cf. Calkins et al. 2006). Equipment was assembled and calibrated at one site to enhance uniformity, and then distributed to the remaining 6 sites. Substantial efforts were dedicated towards quality assurance, and new personnel required "certification" before they were permitted to collect data. Annual re-training and re-certification procedures were also performed for all testers.
Adding to the standardization process, all waveforms were analyzed at a central site; while almost all waveform analyses are automated, all non-automated assessments (e.g. visual inspection of trials automatically marked as artifact) were made by a single investigator (JS), who remained blind to all subject characteristics. By utilizing multiple sites for data collection, it is possible to ascertain and test a large number of subjects from rare populations (e.g. large schizophrenia pedigrees), and thereby overcome what would otherwise be a rate-limiting step in these types of studies. In contrast, data processing and analysis are not rate-limiting, and are most efficiently standardized at one site with expertise in a particular measure.
Through the QA process, a number of differences in experimental methodology were detected across sites. Some reflected methodologies previously used at a particular site, which were then applied to the present study (e.g. electrode type or placement, use of a video camera, split testing). Geographic or socioeconomic differences or differences in participant recruitment practices across sites may have contributed to differences in the sex and ethnic distributions, WRAT performance and prevalence of psychopathology in CCS across sites. Other differences likely reflected user-specific patterns (e.g. styles of skin abrasion and gel application can impact electrode impedance), equipment or test environment differences (e.g. hearing threshold), and others reflected user error (e.g. failure to balance test orders, accidental reversal of electrode channels). Among the methodological differences that could be quantified or categorized, none could independently account for the observed site variability in key dependent measures.
Given the complexities of this study, it is notable that no statistically significant differences were detected across the 7 sites in any of the four primary measures: startle magnitude (during blocks 2-3), reflex habituation, reflex latency / latency facilitation, and PPI. Significant differences across sites were detected only in block 1 startle magnitude. Nonetheless, a closer inspection reveals non-trivial variability across the 7 sites in each of these measures. For example, we examined %PPI at the 60 ms prepulse interval, where differences between medicated schizophrenia patients and CCS are often most robust (Braff et al. 1978, 2005; Leumann et al. 2002; Swerdlow et al. 2006; Weike et al. 2000). At this interval, the effect size (Cohen's d) for site differences in PPI in 8 out of the 21 pair-wise contrasts among the 7 sites in this study (A vs. B, A vs. C, A vs. D, etc.) exceeds that for the comparison between schizophrenia patients vs. CCS tested at one site (0.24) in a recent single-site study of 103 patients and 66 CCS (Swerdlow et al. 2006). Five of these pair-wise comparisons reached at least small effect sizes (d>0.3), and one reached a medium effect size (d>0.5). Despite this variability across sites, there was some evidence that the total sample variability of this study of %PPI for 60 ms intervals - as reflected by the standard deviation (37.82) - compares favorably with the variability of 60 ms PPI collected in 70 consecutive CCS at one test site over a comparable time interval (site A; SD = 37.95, unpublished observation). Furthermore, correcting alpha for multiple comparisons, no pair-wise contrasts among the 7 sites in this study reached statistical significance for this key measure (one pair reached p<0.05, a second reached p<0.065, and a third reached p<0.10).
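The pair-wise contrasts above rely on Cohen's d, the difference between two group means scaled by their pooled standard deviation. The following is a minimal sketch of that standard formula (Cohen 1988), not the study's analysis code, with illustrative inputs.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    # Pooled SD weights each group's variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd
```

By the conventional benchmarks used in the text, |d| > 0.2 is a small effect, |d| > 0.5 medium, and |d| > 0.8 large.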
Post-hoc analyses narrowed the list of sources contributing to variability of measures across sites. Site-related patterns of startle variables do not appear to be related to differences in hearing threshold, electrode impedance, the use of split testing or levels of ambient room noise. Sex differences were evident in measures of both latency facilitation and PPI, the latter interacting with test order. Because both sex distributions and test order differed significantly across the 7 test sites, it is possible that these variables contributed significantly to the overall cross-site variability of latency facilitation and PPI.
In assessing the feasibility of multi-site measures of startle and PPI, we considered whether the pattern of findings from an integration of data across 7 test sites is consistent with the literature from single-site studies. This is clearly the case for many of the key measures: for example, the observed patterns of startle habituation, and the effects of prepulses and prepulse intervals on startle magnitude and latency, are consistent with numerous previous reports (cf. Braff et al. 2001). Lateralized (R>L) patterns of PPI among CCS, though less often examined, have also been reported previously (e.g. Cadenhead et al. 2000).
Other findings from the present study do not agree perfectly with those previously reported in single-site studies. For example, sex differences have been reported previously in startle measures, particularly PPI (Jovanovic et al. 2004; Swerdlow et al. 1993, 1997). In the present study, however, the pattern usually reported in the literature - male > female PPI - was evident only in test order A, and not in test order B. Startle testing in order A most closely resembles that used in “stand alone” single-site startle studies. In contrast, PPI testing in order B was preceded by several hours of neurocognitive and other measures that might lead to participant fatigue. The substantial reduction in male PPI in order B vs. A is consistent with recent evidence linking fatigue with reduced PPI in men (van der Linden et al. 2006). No comparable published data addresses the effects of fatigue on PPI in women. It may also be relevant that we previously failed to detect male > female PPI levels when startle was measured in the context of a cognitive “challenge” (Talledo et al. 2005). Conceivably, whatever accounts for the fact that cognitive tasks mitigate male > female PPI differences (observed in uninstructed, automatic measures of PPI) might contribute to our failure to detect these sex differences in order B, where PPI testing follows several hours of neurocognitive batteries. While this study cannot elucidate the mechanisms underlying such an effect, the present findings at least suggest that test sequence should be controlled in multi-site PPI studies. Certainly, this would apply not only to studies with CCS, but also to those with patient populations.
Any multi-site startle study would benefit from the type of initial and continuing on-site standardization procedures used in the present study, which detected a number of methodological differences across sites. However, with the increasing facility of web-based video monitoring, future studies might employ more "real time" supervision by a central QA test site. Even at sites with substantial experience in psychophysiological testing, methodological elements with long institutional histories can be introduced unknowingly into startle testing protocols and create potential sources of cross-site variance. Presumably, this process might also occur in multi-site studies using other psychophysiological or neuropsychological measures. Conceivably, real-time monitoring might help detect this inadvertent contamination of methods in a manner that is more sensitive than the annual QA visits used here.
In summary, the present findings suggest that, even with substantial energy and detailed planning directed towards methodological standardization and quality assurance among an experienced group of investigators at different sites, startle data collected from community comparison subjects exhibit variability across sites. Based on our experience to date, a list of recommendations for future multi-site startle studies can be found in Table 4. Some sources of variability can be controlled relatively easily, including ensuring equal sex distributions among test groups and using a standardized testing order that places startle testing relatively early in any test battery. Other sources of variability are more difficult to identify, but may include features of the testing environment or participant population. Some sources of variability might diminish as sites gain expertise through the testing of larger samples. However, in multi-site studies necessitated by the relative scarcity of a particular test population, test samples may remain small at any single site; infrequent testing and normal personnel turnover will complicate effective quality assurance and make it difficult to gain expertise in startle measures. Given the amount of variability detected under the conditions of the present study, the design of future startle studies should weigh the benefits of using many geographically dispersed test sites, each testing relatively few subjects, against those of using fewer test sites, each testing larger numbers of participants and clinical subjects drawn from larger geographic catchment areas.
We thank the following key personnel for their dedicated efforts to the COGS: Harvard University: Lynda Jacobs, Monica Landi, Erica Lee, Andrea Roe, Frances Schopick, and Alison Thomas. Mount Sinai School of Medicine: Rui Ferreira, Robert Fieo, Fran Schopick, Christopher Smith, Rebecca West. University of California Los Angeles: William Horan, Mark Sergi. University of California San Diego: Andras Kovach, Katrin Meyer-Gomes, Barbara Haugeland, Kari Tweedale, Sheldrick Holmes, Emmeline Crowley. University of Colorado: Jamey Ellis, Jeff Hollis, Vicki Pender, Bernadette Sullivan, Bettye Clement, Christopher Cason, Alexis Ritvo. University of Pennsylvania: Alexandra Duncan Ramos, Jarrod Gutman, Carla Ann Henry, Paul Hughett, Jennifer Jackson, Adrienne Mishkin, J. Dan Ragland, Leslie Ramsey, David Rice, Jan Richard, Devon Seward, Felipe Silva and Robert Witalec. University of Washington: Kate B. Alvey, Andrew C. David, Sean P. Meichle, Denise O. Pritzl.
Role of the Funding Source. Supported by the National Institute of Mental Health (NIMH), via grants listed below by site. The NIMH had no further role in study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication. Funding sources for this work were: Harvard University: R01-MH065562, MH43518, Commonwealth Research Center of the Massachusetts Department of Mental Health; Mount Sinai School of Medicine: R01-MH065554; University of California Los Angeles: R01-MH65707; University of California San Diego: R01-MH065571, MH01436 (NRS); University of Colorado: R01-MH65588; University of Pennsylvania: R01-MH65578; University of Washington: R01-MH65558.
Conflict of Interest: No authors had a conflict of interest related to the contents of this manuscript.
Test equipment for this multi-site study included:
1. System for stimulus programming and delivery, and response acquisition:
San Diego Instruments EMG-SR Startle System SR-Lab (San Diego, CA); includes EMG-SR software, PC interface board, Stimulus Module, all cables and connectors;
Headphones for startle stimulus delivery (Maico Model TDH-39-P; Minneapolis, MN);
Stimulus calibration: Quest Model 2700-10 dB Meter (Quest Electronics, Oconomowoc, WI), QC-10 sound calibrator and calibrator adapter ring (#056-990), adapter ring (#58-928), metal EC9A 6 cc earphone coupler and weight, W-440.
2. Equipment and supplies for electrode preparation and application:
Electrodes: In Vivo Metrics (Healdsburg, CA) model #E220X-LS, 4 mm, 40 inch lead, gray;
Electrode washers: In Vivo Metrics, model #E401;
Gel: Electro-Cap International, Inc. electrode gel, model E9 (Eaton, OH);
Impedance meter: UFI (Morro Bay, CA) model 1089 MKIII CHECKTRODE;
Skin preparation: Kendall Curity Gauze Sponges, 12-ply, 4 in × 3 in, USP Type VII Gauze, with rubbing alcohol, or PDI Electrode Prep Pads (with pumice and rubbing alcohol);
Electrical noise measurements: Sypris Triaxial ELF Magnetic Field Meter (Model 4080; Orlando, FL).