The methods C1, C2, and C3 were named according to their degree of sensitivity, with C1 being the least sensitive and C3 the most sensitive. All 3 methods are based on a positive 1-sided CUSUM calculation. For C1 and C2, the CUSUM threshold reduces to the mean plus 3 standard deviations (SD). The mean and SD for the C1 calculation are based on information from the past 7 days. The mean and SD for the C2 and C3 calculations are based on information from 7 days, ignoring the 2 most recent days. Because the mean and SD used by the methods are based on a week's information, the methods account for daily variation; because the mean and SD are calculated within the same season as the data value in question, they also account for seasonality.
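These thresholds can be written compactly. The Python sketch below computes the C1 and C2 comparisons (count versus mean plus 3 SD of a 7-day baseline, with C2 ignoring the 2 most recent days). The C3 variant shown accumulates recent C2 excesses; its offset of 1 and threshold of 2 follow the usual EARS formulation rather than this passage and should be read as assumptions.

```python
import numpy as np

def ears_statistic(counts, t, lag):
    """Standardized excess of day t's count over a 7-day baseline that
    ends `lag` days before day t (lag=0 for C1; lag=2 for C2 and C3)."""
    baseline = np.asarray(counts, dtype=float)[t - 7 - lag : t - lag]
    mu, sd = baseline.mean(), baseline.std(ddof=1)
    return (counts[t] - mu) / max(sd, 1e-9)  # guard against a zero SD

def c1_flag(counts, t):
    return ears_statistic(counts, t, lag=0) > 3   # mean + 3 SD threshold

def c2_flag(counts, t):
    return ears_statistic(counts, t, lag=2) > 3   # same threshold, 2-day lag

def c3_flag(counts, t):
    # Accumulate C2-style excesses over the current and 2 preceding days;
    # the offset of 1 and threshold of 2 are assumed (common EARS values).
    s = sum(max(0.0, ears_statistic(counts, t - j, lag=2) - 1) for j in range(3))
    return s > 2
```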
Since 1989, results from the historical limits method have been used to produce Figure 1 in the Morbidity and Mortality Weekly Report. This method compares the number of reported cases in the 4 most recent time periods for a given health outcome with historical incidence data on the same outcome from the preceding 5 years; the method is based on comparing the ratio of current reports with the historical mean and SD. The historical mean and SD are derived from 15 totals, taken from 3 intervals (the same 4 periods, the preceding 4 periods, and the subsequent 4 periods) in each of the preceding 5 years of historical data.
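A minimal sketch of that comparison, assuming weekly counts and 52-week years, might look as follows; the function name, the index arithmetic for the 4-week windows, and the 2-SD flagging rule are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def historical_limits(weekly_counts, current_week, n_sd=2.0):
    """Compare the current 4-week total with 15 historical totals: the
    same, preceding, and subsequent 4-week periods in each of the 5
    preceding years. Requires current_week >= 5*52 + 7."""
    cur = sum(weekly_counts[current_week - 3 : current_week + 1])
    totals = []
    for year in range(1, 6):                  # each of the 5 preceding years
        for shift in (-4, 0, 4):              # preceding, same, subsequent period
            end = current_week - 52 * year + shift
            totals.append(sum(weekly_counts[end - 3 : end + 1]))
    mu, sd = np.mean(totals), np.std(totals, ddof=1)
    ratio = cur / mu                          # ratio of current to historical mean
    return ratio, cur > mu + n_sd * sd        # flag if beyond the historical limit
```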
The seasonally adjusted CUSUM method is based on the positive 1-sided CUSUM where the count of interest is compared to the 5-year mean and the 5-year SD for that period. The seasonally adjusted CUSUM was originally applied to laboratory-based Salmonella serotype data.
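As a sketch, a positive 1-sided CUSUM against such a seasonal baseline could be written as below; the reference value k and decision limit h are conventional CUSUM parameters assumed for illustration, not values given in the text.

```python
def seasonal_cusum(counts, seasonal_mean, seasonal_sd, k=0.5, h=5.0):
    """Positive 1-sided CUSUM in which each count is standardized against
    the 5-year mean and SD for its period (arrays aligned with counts).
    k and h are assumed illustrative parameters."""
    s, flags = 0.0, []
    for y, mu, sd in zip(counts, seasonal_mean, seasonal_sd):
        z = (y - mu) / sd if sd > 0 else 0.0   # standardized deviation
        s = max(0.0, s + z - k)                # accumulate only upward drift
        flags.append(s > h)
    return flags
```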
To calculate sensitivity, specificity, and time to detection, all 5 detection methods of EARS were used to independently analyze 56,000 sets of artificially generated case-count data based on 56 sets of parameters. These 56 sets of parameters each generated 1,000 iterations of 6 years of daily data, 1994–1999, by using a negative binomial distribution with superimposed outbreaks. Means and standard deviations were based on observed values from national and local public health systems and syndromic surveillance systems. Examples of the data included national and state pneumonia and influenza data and hospital influenzalike illness data. Adjustments were made for days of the week, holidays, postholiday periods, seasonality, and trend. Any 6 years could have been used, but 1994–1999 were chosen to set day-of-the-week and holiday patterns and to avoid any problems that programs might have with the year 2000. Fifty (89%) of these datasets then had outbreaks superimposed throughout the data. Three types of outbreaks were used, each representing a type of naturally occurring event: log normal, a rapidly increasing outbreak; inverted log normal, a slowly starting outbreak; and a single-day spike. These outbreak shapes were combined with different SDs and incubation times to create 10 different types of outbreaks that had equal probability of being included in the simulated data. A year of final simulated data, showing the original data and the outbreaks that were added, can be seen in the Figure. As a result of these analyses, the statistically marked aberrations, or flags, produced by the 5 detection methods were evaluated for their specificity, sensitivity, and time to detection. These data can be obtained at http://www.bt.cdc.gov/surveillance/ears/datasets.asp
Figure. Example of 1 year of simulated data with simulated outbreaks. Simulated data are based on real means and standard deviations with different types of simulated outbreaks randomly inserted.
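A simplified sketch of the simulation scheme follows; the baseline mean, dispersion, outbreak size, and lognormal shape below are placeholder assumptions rather than the study's values, and the day-of-week, holiday, seasonal, and trend adjustments are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_series(n_days=6 * 365, mean=20.0, dispersion=5.0,
                    n_outbreaks=10, outbreak_cases=60):
    """Negative binomial baseline counts with lognormal-shaped outbreaks
    superimposed on randomly chosen 14-day windows. All numeric values
    are illustrative placeholders, not the study's parameters."""
    p = dispersion / (dispersion + mean)           # numpy's (n, p) parameterization
    series = rng.negative_binomial(dispersion, p, size=n_days).astype(float)
    truth = np.zeros(n_days, dtype=bool)           # days that contain outbreak cases
    for start in rng.choice(n_days - 14, size=n_outbreaks, replace=False):
        # spread the outbreak's cases across days following a lognormal curve
        offsets = np.clip(rng.lognormal(1.0, 0.5, size=outbreak_cases).astype(int), 0, 13)
        added = np.bincount(offsets, minlength=14)
        series[start:start + 14] += added
        truth[start:start + 14] |= added > 0
    return series, truth
```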
In our study, sensitivity was defined as the number of outbreaks in which ≥1 day was flagged, divided by the total number of outbreaks in the data. An outbreak was defined as a period of consecutive days in which varying numbers of aberrant cases were added to the baseline number of cases. An outbreak had days before and after it when no aberrant cases were added to the baseline case counts. Specificity was defined as the number of days that contained no aberrant cases and were not flagged, divided by the total number of days that contained no aberrant cases. Based on these definitions, actual values for sensitivity and specificity were calculated.
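These definitions translate directly into code. In the sketch below, flags and outbreak_days are boolean arrays over days, outbreak_ids labels each outbreak's days with a distinct positive integer (0 elsewhere), and all names are illustrative.

```python
import numpy as np

def sensitivity_specificity(flags, outbreak_days, outbreak_ids):
    """Sensitivity: outbreaks with at least 1 flagged day / all outbreaks.
    Specificity: unflagged non-outbreak days / all non-outbreak days."""
    ids = set(outbreak_ids[outbreak_ids > 0])
    detected = sum(1 for i in ids if np.any(flags & (outbreak_ids == i)))
    sensitivity = detected / len(ids) if ids else float("nan")
    quiet = ~outbreak_days                      # days with no aberrant cases added
    specificity = np.sum(quiet & ~flags) / np.sum(quiet)
    return sensitivity, specificity
```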
Time to detection was defined as the number of complete days between the beginning of an outbreak and the first day the outbreak was flagged. For example, if a method flags an outbreak on its first day, the time to detection is 0; if it flags on the second day, the time to detection is 1, and so on. The reported time to detection is an average of the times to detection over the outbreaks in each dataset. Only outbreaks that were flagged on at least 1 day were included in the average; therefore, sensitivity is needed to interpret time to detection completely. We calculated 2-sided 95% confidence intervals, which were relatively narrow and consistent.
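Continuing the sketch above, mean time to detection for one dataset could be computed as follows, again with illustrative names and excluding outbreaks that were never flagged, per the definition.

```python
import numpy as np

def mean_time_to_detection(flags, outbreak_ids):
    """Average number of complete days from outbreak onset to first flag;
    a flag on the first outbreak day counts as 0. Unflagged outbreaks are
    excluded from the average."""
    delays = []
    for i in set(outbreak_ids[outbreak_ids > 0]):
        days = np.flatnonzero(outbreak_ids == i)    # days belonging to outbreak i
        hits = days[flags[days]]                    # flagged days within the outbreak
        if hits.size:
            delays.append(hits[0] - days[0])
    return float(np.mean(delays)) if delays else float("nan")
```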
Overall, the CUSUM methods (the seasonally adjusted CUSUM, C1, C2, and C3) had similar times to detection, but their sensitivities varied (Table). Specifically, C1, C2, and C3 showed increasing sensitivity: 60%, 71%, and 82%, respectively. The seasonally adjusted CUSUM and C3 methods had similar sensitivities (82.5% and 82.3%), but C3 had a higher specificity (95.4% vs. 88.7%). The historical limits, C1, and C2 methods showed varying sensitivities (44%–71%), with C1 and C2 having the highest, but all 3 demonstrated similar specificities (96%–97%).
When results were stratified by outbreak type, 1-day outbreaks (i.e., spikes) exhibited the lowest sensitivities. Analysis was also broken down by dataset and outbreak type.
For the 6 datasets that contained noise but no outbreaks, no sensitivity or time to detection could be calculated. The overall specificities for the seasonally adjusted CUSUM, historical limits, C1, C2, and C3 methods were 88.7%, 98.3%, 97.2%, 97.2%, and 95.2%, respectively. The specificities for these 6 datasets were consistent with the general results. The historical limits method showed superior specificity in all but the last dataset.