PLoS Comput Biol. Apr 2012; 8(4): e1002472.
Published online Apr 12, 2012. doi: 10.1371/journal.pcbi.1002472
PMCID: PMC3325176
Optimizing Provider Recruitment for Influenza Surveillance Networks
Samuel V. Scarpino,1* Nedialko B. Dimitrov,2 and Lauren Ancel Meyers1,3
1The University of Texas at Austin, Section of Integrative Biology, Austin, Texas, United States of America
2Naval Postgraduate School, Operations Research Department, Monterey, California, United States of America
3The Santa Fe Institute, Santa Fe, New Mexico, United States of America
Mark M. Tanaka, Editor
University of New South Wales, Australia
* E-mail: scarpino@utexas.edu
Conceived and designed the experiments: SVS NBD LAM. Performed the experiments: SVS NBD. Analyzed the data: SVS NBD. Contributed reagents/materials/analysis tools: SVS NBD. Wrote the paper: SVS NBD LAM.
Received August 9, 2011; Accepted February 29, 2012.
Abstract
The increasingly complex and rapid transmission dynamics of many infectious diseases necessitate the use of new, more advanced methods for surveillance, early detection, and decision-making. Here, we demonstrate that a new method for optimizing surveillance networks can improve the quality of epidemiological information produced by typical provider-based networks. Using past surveillance and Internet search data, it determines the precise locations where providers should be enrolled. When applied to redesigning the provider-based, influenza-like-illness surveillance network (ILINet) for the state of Texas, the method identifies networks that are expected to significantly outperform the existing network with far fewer providers. This optimized network avoids informational redundancies and is thereby more effective than networks designed by conventional methods and a recently published algorithm based on maximizing population coverage. We show further that Google Flu Trends data, when incorporated into a network as a virtual provider, can enhance but not replace traditional surveillance methods.
Author Summary
Public health agencies use surveillance systems to detect and monitor chronic and infectious diseases. These systems often rely on data sources that are chosen based on loose guidelines or out of convenience. In this paper, we introduce a new, data-driven method for designing and improving surveillance systems. Our approach is a geographic optimization of data sources designed to achieve specific surveillance goals. We tested our method by redesigning Texas' provider-based influenza surveillance system (ILINet). The resulting networks better predicted influenza-associated hospitalizations and contained fewer providers than the existing ILINet. Furthermore, our study demonstrates that the integration of Internet-based data sources, such as Google Flu Trends, into surveillance systems can enhance traditional, provider-based networks.
Introduction
Since the Spanish Flu Pandemic of 1918, the global public health community has made great strides towards the effective surveillance of infectious diseases. However, modern travel patterns, heterogeneity in human population densities, proximity to wildlife populations, and variable immunity interact to drive increasingly complex patterns of disease transmission and emergence. As a result, there is an increasing need for effective, evidence-based surveillance, early detection, and decision-making methods [1]–[3]. This need was clearly articulated in […] by a directive from the Department of Homeland Security and the Centers for Disease Control and Prevention to develop a nationwide, real-time public health surveillance network [4], [5].
The U.S. Outpatient Influenza-Like Illness Surveillance Network (ILINet) gathers data from thousands of healthcare providers across all fifty states. Throughout the influenza season (the CDC mandates reporting during weeks 40 through 20, approximately October through mid-May), participating providers are asked to report weekly the number of cases of influenza-like illness (ILI) treated and the total number of patients seen, by age group. Cases qualify as ILI if they manifest fever in excess of 100°F along with a cough and/or a sore throat, without another known cause. Although the CDC receives reports covering approximately […] million patient visits per year, many of the reports may use a loose application of the ILI case definition and/or may simply be inaccurate. The data are used in conjunction with other sources of laboratory, hospitalization, and mortality data to monitor regional and national influenza activity and associated mortality. Similar national surveillance networks are in place in […] EU countries and elsewhere around the globe [6]–[9].
Each US state is responsible for recruiting and managing ILINet providers. The CDC advises states to recruit one regularly reporting sentinel provider per […] residents, with a state-wide minimum of […] sentinel providers. Since […], the Texas Department of State Health Services (DSHS) has enrolled a total of […] volunteer providers. Participating providers regularly drop out of the network; Texas DSHS aims to maintain approximately […] active participants through year-round recruitment of providers in heavily populated areas (cities with populations of at least […]). DSHS also permits other (non-targeted) providers of family medicine, internal medicine, pediatrics, university student health services, emergency medicine, infectious disease, OB/GYN, and urgent care to participate in the network. During the […] influenza season, the Texas ILINet included […] providers, with approximately […] reporting most weeks of the influenza season.
A number of statistical studies have demonstrated that ILI surveillance data are adequate for characterizing past influenza epidemics, monitoring populations for abnormal influenza activity, and forecasting the onsets and peaks of local influenza epidemics [10]–[16]. However, the surveillance networks are often limited by non-representative samples [17], inaccurate and variable reporting [12]–[14], and low reporting rates [6]. Some of these studies have yielded specific recommendations for improving the performance of the surveillance network, for example, inclusion of particular categories of hospitals in China [12], preference for general practitioners over pediatricians in Paris, France [14], and a general guideline to target practices with high reporting rates and high numbers of patient visits (per capita) [6]. Polgreen et al. (2009) recently described a computational method for selecting ILINet providers so as to maximize coverage, that is, the number of people living within a specified distance of a provider [17]. They applied the approach to optimizing the placement of the […] providers in the Iowa ILINet. While their algorithm ensures maximum coverage, it is not clear that maximum coverage is, in general, the most appropriate criterion for building a statistically informative ILINet.
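For contrast with the regression-based objective used later in the paper, a maximal-coverage criterion in the spirit of [17] can be sketched as follows. This is a minimal illustration only: the sites, towns, populations, and coverage distance are invented, not the Iowa or Texas data.

```python
# Greedy maximal-coverage placement: repeatedly add the site that brings the
# most not-yet-covered population within the coverage distance.
# All inputs here are toy values for illustration.
import math

sites = {            # candidate site -> (x, y) location, arbitrary units
    "A": (0, 0), "B": (1, 0), "C": (5, 5), "D": (5, 6),
}
towns = {            # town -> ((x, y), population)
    "t1": ((0, 1), 500), "t2": ((1, 1), 300),
    "t3": ((5, 5), 800), "t4": ((6, 6), 200),
}
COVER_DIST = 1.5

def covered_by(site):
    """Towns within the coverage distance of a site."""
    sx, sy = sites[site]
    return {t for t, ((tx, ty), _) in towns.items()
            if math.hypot(tx - sx, ty - sy) <= COVER_DIST}

def greedy_coverage(k):
    """Pick k sites, each step maximizing newly covered population."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max((s for s in sites if s not in chosen),
                   key=lambda s: sum(towns[t][1]
                                     for t in covered_by(s) - covered))
        chosen.append(best)
        covered |= covered_by(best)
    return chosen, sum(towns[t][1] for t in covered)

chosen, pop = greedy_coverage(k=2)
```

Note that this objective tracks population only; it is agnostic to how statistically informative the resulting reports would be, which is the concern raised above.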
In 2008, Google.org launched Google Flu Trends, a website that translates the daily volume of Google search queries associated with signs, symptoms, and treatment of acute respiratory infections into an estimate of the number of ILI patients per […] people. Google Flu Trends has been shown to reliably estimate national influenza activity in the US [18], in the state of Utah [18], and in some European countries [19], but it provided imperfect data regarding the 2009 H1N1 pandemic in New Zealand [20]. We assessed the correlation between Google Flu Trends for Texas and Texas' ILINet data and found a correlation of […], similar to those presented in Ginsberg et al. 2009 [18] (see Text S1). The Google Flu Trends website includes ILI-related search activity down to the level of cities (in beta version as of November […]). Thus, Google Flu Trends may serve as a valuable resource for influenza detection and forecasting if effectively integrated with public health data such as those coming from state ILINets.
Here, we present an evaluation of the Texas Influenza-Like-Illness Surveillance Network (ILINet) in terms of its ability to forecast statewide hospitalizations due to influenza (ICD9 487 and 488) and unspecified pneumonia (ICD9 486). Although we henceforth refer to this subset of hospitalizations as influenza-like hospitalizations, we emphasize that these data do not perfectly reflect influenza-related hospitalizations: some unrelated pneumonias may be classified under ICD9 486, and some influenza cases may not be correctly diagnosed and/or recorded as influenza. Nonetheless, this subset of hospitalizations likely includes a large fraction of hospitalized influenza cases and exhibits strong seasonal dynamics that mirror ILINet trends. The inclusion of all three ICD9 codes was suggested by health officials at Texas DSHS who seek to use ILINet to ascertain seasonal influenza-related hospitalization rates throughout the state (Texas DSHS contract numbers […] and […]). Hospitalizations associated with these three codes in Texas accounted for between […] and […] of all hospitalizations due to infections and roughly […] billion dollars of hospitalization payments in […] (see Text S1).
Using almost a decade of state-level ILINet and hospitalization data, we find that the existing network performs reasonably well in its ability to predict influenza-like hospitalizations. However, smaller, more carefully chosen sets of providers should yield higher quality surveillance data, which can be further enhanced with the integration of state-level Google Flu Trends data. For this analysis, we adapted a new, computationally tractable, multilinear regression approach to solving complex subset selection problems. The details of this method are presented below and can be tailored to meet a broad range of surveillance objectives.
Results
Using a submodular ILINet optimization algorithm, we investigate two scenarios for improving the Texas ILINet: designing a network from scratch and augmenting the existing network. We then evaluate the utility of incorporating Google Flu Trends as a virtual provider into an existing ILINet.
Designing a New ILINet
To construct new sentinel surveillance networks, we choose individual providers sequentially from a pool of approximately […] mock providers, one for each zip code in Texas, until we reach […] total providers. At each step, the provider that most improves the quality of the epidemiological information produced by the network is added to the network. We optimize and evaluate the networks in terms of the time-lagged statistical correlation between aggregated ILINet provider reports (simulated by the model) and actual statewide influenza-like hospitalizations. Specifically, for each candidate network, we perform a least squares multilinear regression from the simulated ILINet time series to the actual Texas hospitalization time series, and use the coefficient of determination, R², as the indicator of ILINet performance. Henceforth, we will refer to these models as ILINet regression models.
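The greedy selection step and the R² objective described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' actual implementation; all names and the toy inputs are our own assumptions.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares multilinear regression from provider time
    series X (weeks x providers) to the hospitalization series y (weeks,)."""
    A = np.column_stack([X, np.ones(len(y))])   # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def greedy_select(reports, y, k):
    """Build a k-provider network, each step adding the candidate provider
    whose inclusion most improves R^2 (the greedy step described above)."""
    chosen = []
    remaining = list(range(reports.shape[1]))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: r_squared(reports[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy inputs: 52 weekly reports from 10 candidate providers; hospitalizations
# are driven mostly by providers 0 and 3, so greedy should recover them.
rng = np.random.default_rng(0)
reports = rng.poisson(20, size=(52, 10)).astype(float)
y = 2.0 * reports[:, 0] + 1.5 * reports[:, 3] + rng.normal(0, 2, 52)
network = greedy_select(reports, y, k=2)
```

In the paper's setting, the columns of `reports` would be expected reports from mock providers (one per zip code) and `y` the statewide influenza-like hospitalization series.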
We compare the networks generated by this method to networks generated by two naive models and a published computational method [17] (Figure 1). Random selection models an open call for providers and entails selecting providers randomly with probabilities proportional to their zip code's population; Greedy selection prioritizes providers strictly by the population density of their zip code. Submodular optimization significantly outperforms these naive methods, particularly for small networks, with Random selection producing slightly more informative networks than Greedy selection. The Geographic optimization method of Polgreen et al. [17] selects providers to maximize the number of people that live within a specified “coverage distance” of a provider. Submodular optimization consistently produces more informative networks than this method at a […] mile coverage distance (Figure 1) ([…], […], and […] mile coverage distances perform worse; not shown). To visualize the relative performance of several of these networks, we compared their estimates of influenza-like hospitalizations (obtained by applying each ILINet regression model to simulated ILINet report data) to the true state-wide hospitalization data (Figure 2). The time series estimated by a network designed using submodular optimization matches true hospitalizations more closely and smoothly than both the actual 2008 Texas ILINet and a network designed using geographic optimization (each with […] providers).
Figure 1. Expected performance of optimized ILINets.
Figure 2. Comparing ILINet estimates to actual state-wide influenza hospitalizations.
The submodular optimization algorithm is not guaranteed to find the highest performing provider network, and an exhaustive search for the optimal […]-provider network from the pool of […] providers is computationally intractable. However, the submodular property of the objective function allows us to compute an upper bound on the performance of the optimal network without knowing its actual composition (Figure 1). The performance gap between the theoretical upper bound and the optimized networks may indicate that the upper bound is loose (higher than the performance of the true optimal network) and/or that better networks exist which might be found using more powerful optimization methods.
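The kind of bound referred to here can be illustrated with a toy monotone submodular function: for any set S built so far, submodularity implies f(OPT_k) ≤ f(S) + (sum of the k largest remaining marginal gains), so an upper bound is computable without knowing the optimum. The coverage function and all names below are our own stand-ins for the paper's R² objective (which is treated analogously, though R² need not be exactly submodular).

```python
# Data-dependent upper bound for greedy maximization of a monotone
# submodular set function. Toy example: f = size of the union of chosen sets.
def f_cover(S, sets):
    """Monotone submodular objective: number of elements covered by S."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

def greedy_with_bound(sets, k):
    # Greedily build a k-element set.
    S = []
    for _ in range(k):
        gains = {i: f_cover(S + [i], sets) - f_cover(S, sets)
                 for i in range(len(sets)) if i not in S}
        S.append(max(gains, key=gains.get))
    # Upper bound on f(OPT_k): current value plus the k largest
    # remaining marginal gains, computed without knowing OPT itself.
    gains = sorted((f_cover(S + [i], sets) - f_cover(S, sets)
                    for i in range(len(sets)) if i not in S), reverse=True)
    upper = f_cover(S, sets) + sum(gains[:k])
    return S, f_cover(S, sets), upper

sets = [{1, 2, 3, 4}, {3, 4}, {4, 5, 6}, {1, 6}]
S, val, upper = greedy_with_bound(sets, k=2)
```

Here the greedy value and the bound coincide, certifying optimality; in general the bound may exceed the true optimum, which is exactly the looseness discussed above.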
The networks selected by submodular optimization reveal some unexpected design principles. Most of the Texas population resides in Houston and the “I-35 corridor” – a North-South transportation corridor spanning San Antonio, Austin, and Dallas (Figure 3a). The first ten provider locations selected by submodular optimization are spread throughout the eastern half of the state (Figure 4a, pink circles). While most of the providers are concentrated closer to Texas' population belt, only two are actually located within Texas' major population centers (in this case, College Station).
Figure 3. Statewide influenza activity mirrors population distribution.
Figure 4. Location and population coverage of optimized ILINets.
The submodular networks are qualitatively different from the networks created by the other algorithms considered, which focus providers within the major population centers (Figure 4b). The higher performance of the submodular ILINets suggests that over-concentration of providers in major population centers is unnecessary. Influenza levels in the major population centers are strongly correlated (Figure 3b). Thus, ILINet information from San Antonio, for example, will also be indicative of influenza levels in Austin and Dallas. This synchrony probably arises, in part, from extensive travel between the major Texas population centers.
Subsampling and Augmenting an ILINet
Using submodular optimization, we augment the 2008 Texas ILINet by first subsampling from the […] enrolled providers and then adding up to […] new providers. When subsampling, performance does not reach a maximum until all […] providers are included in the network (Figure 5), indicating that each provider adds predictive value to the network. However, the theoretical upper bound plateaus around […] providers, suggesting that smaller (more optimally chosen) networks of equal predictive value may exist. During the second stage, […] additional providers improve the R² objective by […]. Most of these providers are located in relatively remote areas of the state.
Figure 5. Augmenting an existing ILINet.
We also considered the inclusion of Internet trend data sources as virtual providers, specifically the freely available Google Flu Trends data for the state of Texas [21]. Google Flu Trends alone is able to explain about […] of the variation in state-wide hospitalizations; it outperforms the 2008 Texas ILINet and matches the performance of a network of […] traditional providers constructed from scratch using submodular optimization (Figure 6). However, the best networks include both traditional providers and Google Flu Trends. For example, by adding […] providers to Google Flu Trends using submodular optimization, we improve the R² objective by a third and halve the optimality gap (from a trivial upper bound of one). The additional providers are located in non-urban areas (Figure 4a, green circles) distinct from those selected when Google Flu Trends is not allowed as a provider.
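Mechanically, folding an Internet signal into the regression framework amounts to one extra, always-reporting column in the design matrix. A sketch on synthetic data follows; the seasonal signal, noise levels, and all names are illustrative assumptions, not the paper's actual inputs.

```python
import numpy as np

def r2(X, y):
    """In-sample R^2 of a least-squares regression of y on X plus intercept."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
weeks = 104
flu = np.abs(np.sin(np.arange(weeks) * 2 * np.pi / 52)) * 100  # seasonal signal
# Three noisy traditional providers and one less-noisy "virtual provider"
# standing in for an Internet signal such as Google Flu Trends.
providers = (flu[:, None] * rng.uniform(0.5, 1.5, (weeks, 3))
             + rng.normal(0, 15, (weeks, 3)))
virtual = flu + rng.normal(0, 5, weeks)
hosp = flu + rng.normal(0, 10, weeks)

base = r2(providers, hosp)                                # providers only
with_virtual = r2(np.column_stack([providers, virtual]), hosp)
```

Because the virtual provider enters the regression exactly like a traditional one, the same greedy machinery can decide whether, and alongside which providers, it is worth including.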
Figure 6. Google Flu Trends as a virtual ILINet provider.
Out-of-Sample Validation
To further validate our methodology, we simulated the real-world scenario in which historical data are used to design an ILINet and build forecasting models, and then current ILINet reports are used to make forecasts. Specifically, we used […] data to design ILINets and estimate multilinear regression models relating influenza-like hospitalizations to mock provider reports, and then used […] data to test the models' ability to forecast influenza-like hospitalizations. For networks with fewer than […] providers, the ILINets designed using submodular optimization consistently outperform ILINets designed using the other three strategies (Figure 7). Above […] providers, the predictive performance of the submodular optimization ILINet begins to decline with additional providers. As the number of providers approaches […] (the number of weeks in the training period), the estimated prediction models become overfit to the […] period. Thus, the slightly increased performance of the Random method over submodular optimization beyond […] providers is spurious. For the R² values presented in Figure 7, the effects of noise and variable reporting are integrated out when calculating the expected provider reports. An alternative approach to out-of-sample validation is presented in Text S1; it yields the same rank-order of model performance.
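The train-then-forecast check and the overfitting effect described above can be sketched as follows: fit the regression on a training window, score R² on a held-out window, and watch out-of-sample performance collapse as the number of providers approaches the number of training weeks. The data are synthetic and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_prov = 60, 40, 55
# Synthetic weekly provider reports; hospitalizations depend on two of them.
X = rng.normal(size=(n_train + n_test, n_prov))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1.0, n_train + n_test)

def oos_r2(p):
    """Fit on the first n_train weeks using p providers; score R^2 on the
    held-out n_test weeks (out-of-sample, unlike the in-sample objective)."""
    A = np.column_stack([X[:n_train, :p], np.ones(n_train)])
    coef, *_ = np.linalg.lstsq(A, y[:n_train], rcond=None)
    B = np.column_stack([X[n_train:, :p], np.ones(n_test)])
    resid = y[n_train:] - B @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y[n_train:] - y[n_train:].mean()) ** 2)

small = oos_r2(2)    # compact network containing the informative providers
large = oos_r2(54)   # nearly as many providers as training weeks: overfit
```

With 54 regressors and only 60 training weeks, the model memorizes the training noise and its held-out R² degrades sharply, mirroring the spurious in-sample gains discussed above.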
Figure 7. Predictive performance of ILINets.
Discussion
Since the mid-twentieth century, influenza surveillance has been recognized as an increasingly complex problem of global concern [22]. However, the majority of statistical research has focused on the analysis of surveillance data rather than on the data collection itself, with a few notable exceptions [12], [17]. High-quality data are essential for effectively monitoring seasonal dynamics, detecting anomalies such as emerging pandemic strains, and implementing effective, time-sensitive control measures. Using a new method for optimizing provider-based surveillance systems, we have shown that the Texas state ILINet would benefit from the inclusion of a few strategically selected providers and the use of Internet data streams.
Our method works by iteratively selecting providers that contribute the most information about influenza-like hospitalizations. We quantified the performance of various ILINets using the coefficient of determination, R², resulting from a multilinear regression between each provider's time series and state-wide influenza-like hospitalizations. Importantly, these simulated providers have reporting rates and error distributions estimated from actual ILINet providers in Texas (see Text S1). The result is a prioritized list of zip codes for inclusion in an ILINet that can be used for future ILINet recruiting. Although this analysis was specifically motivated by the Texas DSHS interest in predicting hospitalizations with ICD9 codes 486, 487, and 488, our method can be readily extended to design a network for any disease or influenza definition with the appropriate historical data. In general, the method requires both historical provider reports and historical time series of the prediction target. However, if one has reasonable estimates of provider reporting rates and informational noise from another source (e.g., estimates from a surveillance network in another region or for another disease), then historical provider reports are not necessary.
ILINet provider reports do not necessarily reflect true influenza activity. Rather, they are meant to indicate the number of patients that meet the clinical ILI case definition, which results in a substantial number of false positives (reported non-influenza cases) and false negatives (missed cases of influenza) [23]. The case definition for ILI is often loosely applied, further confounding the relationship between these measures and true influenza. Similarly, the ICD9 codes used in our analysis do not correspond perfectly to influenza hospitalizations: some influenza cases will fail to be classified under those codes, and some non-influenza cases will be. Nonetheless, public health agencies are interested in monitoring and forecasting the large numbers of costly hospitalizations associated with these codes. We find that ILINet surveillance data correlate strongly with this set of influenza-like hospitalizations, and that the networks can be designed to be even more informative.
Although we provide only a single example here, this optimization method can be readily applied to designing surveillance networks for a wide range of diseases on any geographic scale, provided historical data are available and the goals of the surveillance network can be quantified. For example, surveillance networks could be designed to detect emerging strains of influenza on a global scale, monitor influenza in countries without surveillance networks, or track other infectious diseases such as malaria, whooping cough, or tuberculosis or non-infectious diseases and chronic conditions such as asthma, diabetes, cancer or obesity that exhibit heterogeneity in space, time or by population subgroup. As we have shown with Google Flu Trends, our method can be leveraged to evaluate the potential utility of incorporating other Internet trend data mined from search, social media, and online commerce platforms into traditional surveillance systems.
While optimized networks meet their specified goals, they may suffer from over-optimization and be unable to provide valuable information for other diseases, or even for the focal disease during atypical situations. For example, a surveillance network designed for detecting the early emergence of pandemic influenza may look very different from one optimized to monitor seasonal influenza. Furthermore, an ILINet optimized to predict influenza-like hospitalizations in a specific socio-economic group, geographic region, or race/ethnicity may look very different from an ILINet optimized to predict state-wide hospitalizations. When optimizing networks, it is thus important to carefully consider the full range of possible applications of the network and integrate diverse objectives into the optimization analysis.
The optimized Texas ILINets described above exhibit much less redundancy (geographic overlap in providers) than the actual Texas ILINet. Whereas CDC guidelines have led Texas DSHS to focus the majority of recruitment on high population centers, the optimizer only sparsely covered the major urban areas because of their synchrony in influenza activity. This is an important distinction between submodular optimization and the other methods considered (Geographic, Random and Greedy). The submodular method does not track population density and instead adds providers who contribute the most marginal information to the network. Consequently, it places far more providers in rural areas than the other methods (Figure 4b). There can be substantial year-to-year variation in spatial synchrony for seasonal influenza, driven by the predominant influenza strains and commuter traffic between population centers [24]. As long as the historical data used during optimization reflect this stochasticity, the resulting networks will be robust. However, synchrony by geography and population density does not occur for all diseases including emerging pandemic influenza [24]; thus the relatively sparse networks designed for forecasting seasonal influenza hospitalizations may not be appropriate for other surveillance objectives, like detecting emerging pandemic strains or other rare events. For example, a recent study of influenza surveillance in Beijing, PRC suggested that large hospitals provided the best surveillance information for seasonal influenza, while smaller provincial hospitals were more useful for monitoring H5N1 [12].
Although our method outperforms the Maximal Coverage Method (MCM) of Polgreen et al. (2009), referred to here as Geographic, there are several caveats. First, population densities and travel patterns within Texas are highly non-uniform. The two methods might perform similarly for regions with greater spatial uniformity. Second, our method is data intensive, requiring historical surveillance data that may not be available, for example, in developing nations, whereas the population density data required for MCM is widely available. However, the type of data used in this study is readily available to most state public health agencies in the United States. For example, the CDC's Influenza Hospitalization Network (FluSurv-NET) collects weekly reports on laboratory-confirmed influenza-related hospitalizations in fourteen states. In addition, alternative Internet-based data sources like Google Flu Trends are becoming available. Third, as discussed above, our networks are optimized towards specific goals and may thus carry no guaranteed level of performance for alternate surveillance goals. Important future research should focus on designing networks able to perform well under a range of surveillance goals. Fourth, neither ILINet data nor influenza-like hospitalizations correspond perfectly to actual influenza activity. One could instead optimize ILINets using historical time series of laboratory-confirmed cases of influenza. Although some provider locations and the estimated regression models may change, we conjecture that the general geospatial distribution of providers would not change significantly. Fifth, we followed Polgreen et al.'s (2009) use of Euclidean distances. However, travel distance is known to correlate more strongly with influenza transmission than Euclidean distance [24], and thus alternative distance metrics might improve the performance of the MCM.
Finally, while submodular optimization generally outperforms the other design methods in out-of-sample prediction of influenza-like hospitalizations, it suffers from overfitting when the number of providers in the network approaches the number of data points in the historical time series.
The impressive performance of Google Flu Trends leads us to question the role of traditional methods, such as provider-based surveillance networks, in next-generation disease surveillance systems. While Texas Google Flu Trends alone provides almost as much information about state-wide influenza hospital discharges as the entire 2008 Texas ILINet, an optimized ILINet of the same size contains more information than Google Flu Trends alone. Adding Google Flu Trends to this optimized network as a virtual provider increases its performance further. Internet-driven data streams, like Google Flu Trends, may have age and socio-economic biases that over-represent certain groups, a possible explanation for the difference in providers selected when Google Flu Trends is included (Figure 4a). Given the relatively low cost of voluntary provider surveillance networks, synergistic approaches that combine data from conventional and Internet sources offer a promising path forward for public health surveillance.
This optimization method was conceived through a collaboration between The University of Texas at Austin and the Texas Department of State Health Services to evaluate and improve the Texas ILINet. The development and utility of quantitative methods to support public health decision making hinges on the continued partnership between researchers and public health agencies.
Data
The Texas Department of State Health Services (DSHS) provided (1) ILINet data containing weekly records in which each provider reports the number of patients with influenza-like-illness and the total number of patients seen, and (2) individual discharge records for every hospital in Texas (excluding hospitals in counties below a minimum population, hospitals in counties below a minimum number of total hospital beds, and hospitals that do not seek insurance payment or government reimbursement). We classified all hospital discharges containing ICD9 codes 486, 487, or 488 as influenza-related. Google Flu Trends data were downloaded from the Google Flu Trends site [21] and contain estimates of the rate of ILI cases per physician visit determined using Google searches [25]. Data on population size and density were obtained from the US census [26].
Provider Reporting Model
The first step in the ILINet optimization is to build a data-driven model reflecting actual provider reporting rates and informational noise, that is, inconsistencies between provider reports and true local influenza prevalence.
We model reporting as a Markov process, where each provider is in a “reporting” or “non-reporting” state. A provider in the reporting state enters weekly reports, while a provider in the non-reporting state does not enter reports. At the end of each week, providers independently transition between the reporting and non-reporting states. Such a Markov process model allows for streaks of reporting and streaks of non-reporting for each provider, which is typical for ILINet providers. We estimate transition probabilities between states from actual ILINet provider report data. For each provider, the transition probability from reporting to non-reporting is estimated by dividing the number of times the transition occurred by the number of times any transition out of reporting is observed. The probabilities of remaining in the current reporting state and transitioning from non-reporting to reporting are estimated similarly.
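For concreteness, the per-provider estimate described above can be sketched as follows (Python; the function name and data layout are illustrative, not from the authors' implementation). Transition probabilities are maximum-likelihood estimates: the count of each observed transition divided by the total transitions out of that state.

```python
from collections import Counter

def estimate_transition_probs(reported):
    """Estimate Markov transition probabilities from a weekly reporting
    indicator sequence (True = provider filed a report that week).

    Returns a dict mapping (from_state, to_state) -> probability, where
    states are 'R' (reporting) and 'N' (non-reporting)."""
    counts = Counter()
    for prev, cur in zip(reported, reported[1:]):
        counts[(prev, cur)] += 1
    probs = {}
    for a in (True, False):
        # Total observed transitions out of state a (including self-transitions)
        total = counts[(a, True)] + counts[(a, False)]
        for b in (True, False):
            probs[('R' if a else 'N', 'R' if b else 'N')] = (
                counts[(a, b)] / total if total else 0.0
            )
    return probs
```

A "streaky" sequence such as `[True, True, True, False, False, True, True]` yields a high probability of remaining in the current state, matching the reporting streaks seen in actual ILINet data.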
We model noise in reports using a standard regression noise model of the form

y(i) = β0 + β1 x(i) + ε,  ε ~ N(0, σ²)   (1)

where y(i) denotes the number of ILI cases reported by the provider in week i; x(i) denotes the estimated prevalence of ILI in the provider's zip code in week i; β0 and β1 are regression constants fixed for the provider; and ε is a normally distributed noise term with variance σ², also fixed for the provider. For existing providers, we use empirical time series (their past ILINet reporting data matched with local ILI prevalence, described below) to estimate the constants β0 and β1 using least squares linear regression. This noise model has the intuitive interpretation that each provider's reports are a noisy reading of the percent of the population with ILI in the provider's zip code.
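A minimal sketch of fitting equation (1) for one provider by ordinary least squares (Python; function and variable names are illustrative) uses the closed-form slope and intercept for simple linear regression, with the residual variance as the noise estimate:

```python
def fit_noise_model(x, y):
    """Least-squares fit of the noise model y(i) = b0 + b1*x(i) + eps.
    x: local ILI prevalence by week; y: the provider's reported ILI counts.
    Returns (b0, b1, sigma2), where sigma2 is the residual variance."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope
    b0 = my - b1 * mx                    # intercept
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / n
    return b0, b1, sigma2
```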
We use the Texas hospital discharge data to estimate the local ILI prevalences x(i) for each zip code. Given an estimate of the influenza hospitalization rate [27] and assuming that each individual with ILI is hospitalized independently, we can obtain a distribution for the number of influenza-related hospitalizations in a zip code, given the number of ILI cases in the zip code. Using Bayes' rule, a uniform prior, and the observed number of influenza-related hospitalizations (from the hospital discharge data), we derive distributions for the number of ILI cases for each zip code and each week. We then set x(i) for each zip code equal to the mean of the distribution of ILI cases in that zip code for week i, divided by the population of the zip code.
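The Bayesian step above can be sketched as follows (Python; assumes a binomial hospitalization model and a uniform prior truncated at a cap `n_max`, which are our reading of the text rather than the authors' exact implementation). The posterior over the number of ILI cases n, given h observed hospitalizations, is proportional to the binomial likelihood for n ≥ h:

```python
from math import comb

def posterior_mean_ili(h, p, n_max):
    """Posterior mean of the number of ILI cases n in a zip code, given
    h observed influenza-related hospitalizations, a per-case
    hospitalization probability p, and a uniform prior on n in [0, n_max].
    Likelihood: h ~ Binomial(n, p); the likelihood is zero for n < h."""
    support = range(h, n_max + 1)
    weights = [comb(n, h) * p**h * (1 - p) ** (n - h) for n in support]
    total = sum(weights)
    return sum(n * w for n, w in zip(support, weights)) / total
```

Dividing this posterior mean by the zip code's population gives the prevalence estimate x(i) used in equation (1).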
Generating Pools of Mock Providers
The second step in the ILINet optimization is to generate a pool of mock providers. For each actual provider in the Texas ILINet, we estimate a reporting profile specified by 1) transition probabilities between reporting and non-reporting (Markov) states, and 2) the constants β0, β1, and σ² modeling noise in the weekly ILI reports. To generate a mock provider in a specified zip code, we select a uniformly random reporting profile out of all reporting profiles estimated from existing providers. The generated mock providers are thereby given reporting characteristics typical of existing providers. We can then generate an ILI report time series for a mock provider by 1) simulating reporting and non-reporting weeks from the Markov transition probabilities, generating reports only during reporting weeks, and 2) calculating reports using equation (1) with the constants given in the provider's reporting profile and estimates of x(i) for the mock provider's zip code.
We select providers from pools consisting of a single mock provider from each zip code. Zip codes offer a convenient spatial resolution, because they have geographic specificity and are recorded in both the Texas ILINet and hospital discharge data. The optimization algorithm is not aware of a mock provider's reporting profile when the provider is selected (discussed below).
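Putting the two pieces together, a mock provider's report series can be simulated as in the sketch below (Python; the profile keys `p_rr`, `p_nr`, `b0`, `b1`, `sigma2` are illustrative names for the Markov and regression constants described above):

```python
import random

def mock_reports(weeks, x, profile, seed=0):
    """Simulate a mock provider's weekly ILI report series.
    x: estimated local prevalence x(i) per week; profile: dict with Markov
    probabilities 'p_rr' (stay reporting) and 'p_nr' (resume reporting),
    regression constants 'b0', 'b1', and noise variance 'sigma2'.
    Returns one value per week, with None for non-reporting weeks."""
    rng = random.Random(seed)
    reporting = True          # assume the provider starts in the reporting state
    out = []
    for i in range(weeks):
        if reporting:
            noise = rng.gauss(0.0, profile['sigma2'] ** 0.5)
            out.append(profile['b0'] + profile['b1'] * x[i] + noise)
        else:
            out.append(None)
        # Weekly Markov transition between reporting states
        stay = profile['p_rr'] if reporting else 1 - profile['p_nr']
        if rng.random() >= stay:
            reporting = not reporting
    return out
```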
Provider Selection Optimization
The final step in our ILINet design method is selecting an optimized subset of providers from the mock provider pool. We seek the subset that most effectively predicts a target time series (henceforth, goal), as measured by the coefficient of determination (R²) from a least squares multilinear regression to the goal from the report time series for all providers in the subset. Specifically, the objective function is given by

R²(S) = [ Var(y) − E(y − Σ_{j∈S} β_j r_j)² ] / Var(y)

where y is the goal random variable; S is a subset of the mock provider pool; r_j is the report time series for provider j; and the β_j are the best multilinear regression coefficients (the values that minimize the second term in the numerator).
There are several advantages to this objective function. First, it allows us to optimize an ILINet for predicting a particular random variable. Here, we set the goal to be state-wide influenza-related hospitalizations for Texas. This method can be applied similarly to design surveillance networks that predict, for example, morbidity and/or mortality within specific age groups or high risk groups.
Second, the objective function is submodular in the set of providers, S [28], implying generally that adding a new provider to a small network will improve performance more than adding the same provider to a larger network. The submodular property enables computationally efficient searches for near-optimal networks and guarantees a good level of performance from the resulting network [29]. Without a submodular objective function, optimizing a k-provider ILINet may require an exhaustive search over all subsets of k providers from the provider pool, which quickly becomes intractable: the number of regressions required grows combinatorially in both k and the pool size.
Taking advantage of the submodular property, we rapidly build high-performing networks (with k providers) according to the following algorithm:
  • Let P be the entire provider pool, S be the providers selected thus far, and f be a submodular function in S. We begin without any providers in S.
  • Repeat until there are k providers in S:
    • Let s* be the provider in P that maximizes f(S ∪ {s*}).
    • Add s* to S.
This is guaranteed to produce a network that performs within a fraction of (1 − 1/e) of the optimal network [28]. The submodularity property also allows us to compute a posterior bound on the distance from optimality, which is often much better than (1 − 1/e). Finally, even if implemented naively, the algorithm only requires approximately kn regressions to select k providers from a pool of n.
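The greedy loop above is generic in the submodular function f. The sketch below (Python; names are illustrative) implements it for an arbitrary set function; for brevity the demonstration uses a set-coverage objective, which is submodular, as a stand-in for the regression-based R² objective used in the paper:

```python
def greedy_select(pool, k, f):
    """Greedy maximization of a set function f: at each step add the
    element of the pool that yields the largest value of f(selected)."""
    selected = set()
    for _ in range(min(k, len(pool))):
        best = max((p for p in pool if p not in selected),
                   key=lambda p: f(selected | {p}))
        selected.add(best)
    return selected

# Demonstration with a coverage objective: each "provider" covers a set
# of zip codes, and f counts the total zip codes covered.
coverage = {'A': {1, 2, 3}, 'B': {3, 4}, 'C': {1, 2}}
f = lambda S: len(set().union(*(coverage[p] for p in S))) if S else 0
```

With k = 2 the greedy algorithm picks A (covering three zip codes) and then B (adding one new zip code), skipping C, whose coverage is entirely redundant with A — the same redundancy-avoidance behavior described for the optimized ILINets.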
When optimizing, it is important to consider potential noise (underreporting and discrepancies between provider reports and actual ILI activity in the zip code). However, we assume that one cannot predict the performance of a particular provider before the provider is recruited into the network. To address this issue, the optimization's objective function is an expectation over the possible provider reporting profiles. Specifically, we define Θ as a random variable describing the provider reporting profiles for the entire pool of mock providers. If θ is a specific realization of the reporting profiles, then the R² objective function can be written as

R²(S | θ) = [ Var(y) − E(y − Σ_{j∈S} β_j r_j(θ))² ] / Var(y)

To design the ILINet, we solve the following optimization problem

max_S E_Θ[ R²(S | Θ) ]
The objective function is a convex combination of submodular functions, and thus is also submodular. This allows us to use the above algorithm along with its theoretical guarantees to design ILINets using a realistic model of reporting practices and informational stochasticity, without assuming that the designer knows the quality of specific providers a priori.
Maximal Coverage Model
We implemented the Maximal Coverage Model (MCM) following Polgreen et al. (2009). Briefly, a greedy algorithm was used to minimize the number of people in Texas who live outside a pre-defined coverage distance, d, of at least one provider in the selected set, S. A general version of this algorithm was developed by Church and ReVelle (1974) to solve this class of MCMs [30]. As per Polgreen et al. (2009), we assumed that the population of each zip code exists entirely at the geographic center of the zip code and used Euclidean distance to measure the distance between zip codes. Using a matrix of inter-zip-code distances, we select providers iteratively, choosing zip codes that cover the greatest amount of population within the pre-defined coverage distance, d. We considered several coverage distances, including 10, 20, and 25 miles, and report results for the distance that yielded the most informative networks.
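The iterative coverage step can be sketched as follows (Python; the coordinate/population layout and planar Euclidean distance are simplifications for illustration — real zip-code centroids would use geographic distance):

```python
from math import hypot

def maximal_coverage(zips, k, d):
    """Greedy MCM: zips maps zip -> (x, y, population). Iteratively pick
    the zip code covering the most not-yet-covered population within
    Euclidean distance d of its centroid."""
    selected, covered = [], set()
    for _ in range(k):
        def gain(z):
            zx, zy, _ = zips[z]
            return sum(pop for u, (ux, uy, pop) in zips.items()
                       if u not in covered and hypot(ux - zx, uy - zy) <= d)
        best = max((z for z in zips if z not in selected), key=gain)
        selected.append(best)
        bx, by, _ = zips[best]
        covered.update(u for u, (ux, uy, _) in zips.items()
                       if hypot(ux - bx, uy - by) <= d)
    return selected
```

Note that once a zip code is covered, it contributes no further gain, so later picks move to uncovered regions regardless of how much population the first pick's neighborhood holds.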
Naive Methods
We used two naive methods to model common design practices for state-level provider-based surveillance networks.
  • Greedy selection by population density - All zip codes are ordered by population density and added to the provider pool P. Providers are then moved from P to the selected set S from highest to lowest density. The algorithm stops when S reaches a pre-determined size or P is empty.
  • Uniform random by population size - Zip codes are randomly selected from P and moved to S with probability proportional to their population size. The algorithm proceeds until either S reaches a pre-determined size or P is empty.
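Both naive baselines are straightforward to implement; the sketch below (Python; data layout is illustrative) mirrors the two bullet points above:

```python
import random

def greedy_by_density(zips, k):
    """Select the k densest zip codes; zips maps zip -> (density, population)."""
    return sorted(zips, key=lambda z: zips[z][0], reverse=True)[:k]

def random_by_population(zips, k, seed=0):
    """Select up to k distinct zip codes, each drawn with probability
    proportional to its population."""
    rng = random.Random(seed)
    pool = dict(zips)
    chosen = []
    while pool and len(chosen) < k:
        names = list(pool)
        weights = [pool[z][1] for z in names]
        pick = rng.choices(names, weights=weights)[0]
        chosen.append(pick)
        del pool[pick]          # sample without replacement
    return chosen
```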
Principal Component Analysis of Hospitalization Time Series across Texas Zip Codes
To analyze similarities in ILI hospitalizations across different zip codes, we apply principal component analysis (PCA) [31]. Specifically, we perform PCA on the centered (mean zero), standardized (unit variance) hospitalization time series of all zip codes in Texas. We first compute a time series for the first principal component, and then compute an R² for each zip code, based on a linear regression from the first principal component to the zip code's centered, standardized hospitalizations. Zip codes with high R² values have hospitalization patterns that exhibit high temporal synchronicity with the first principal component.
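As a self-contained sketch of this analysis (Python; a stand-alone power iteration is used in place of a linear-algebra library, and for a standardized series regressed on a single predictor, R² equals the squared correlation):

```python
def first_pc_r2(series):
    """series: dict zip -> list of weekly values. Standardize each series,
    extract the first principal component time series by power iteration,
    and return each zip code's R^2 against it (= squared correlation)."""
    def standardize(v):
        n = len(v)
        m = sum(v) / n
        sd = (sum((x - m) ** 2 for x in v) / n) ** 0.5
        return [(x - m) / sd for x in v]

    Z = {z: standardize(v) for z, v in series.items()}
    T = len(next(iter(Z.values())))
    t = [float(i + 1) for i in range(T)]  # non-degenerate starting vector
    for _ in range(200):
        # Apply sum_z (z z^T) to t, i.e. the Gram matrix of the series
        new = [0.0] * T
        for v in Z.values():
            proj = sum(vi * ti for vi, ti in zip(v, t))
            for i in range(T):
                new[i] += v[i] * proj
        norm = sum(x * x for x in new) ** 0.5
        t = [x / norm for x in new]

    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
        da = sum((p - ma) ** 2 for p in a) ** 0.5
        db = sum((q - mb) ** 2 for q in b) ** 0.5
        return num / (da * db)

    return {z: corr(t, v) ** 2 for z, v in Z.items()}
```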
Out-of-Sample Validation
To validate our method, we first use submodular optimization to create a network of k providers, using only data from 2001 to 2007, and then evaluate the performance of the network in predicting 2008 influenza-like hospitalizations. Specifically, after creating the k-provider network S, we use actual hospitalization data and mock provider reports for the 2001–2007 period to fit a multilinear regression model of the form h(t) = Σ_{j∈S} β_j r_j(t), where h(t) is the time series of state-wide influenza-like hospitalizations at week t for weeks in 2001 to 2007, r_j(t) is the mock report time series of provider j during week t for weeks in 2001 to 2007, and β_j is the best multilinear regression coefficient associated with provider j.
We then use the estimated multilinear regression function to forecast state-wide influenza-like hospitalizations during 2008 from mock provider reports of 2008, and compare these forecasts to actual 2008 hospitalization data. This simulates a real-world prediction, where only historical data are available to create the provider network (S) and estimate the prediction function (the β_j), and then the most recent provider reports (the r_j) are used to make predictions. We evaluate the 2008 predictions using a variance reduction measure R̂², similar to R² except that the multilinear prediction model uses coefficients estimated from prior data, as given by

R̂² = [ Var(h) − E(h − Σ_{j∈S} β_j r_j(Θ))² ] / Var(h)

where h is the hospitalization time series in 2008, Θ is the provider noise profile, and r_j(Θ) are the mock provider reports in 2008. Importantly, we first calculate an expected value for the provider reports, E[r_j(Θ)], given the noise profiles Θ, before calculating R̂². We also considered an alternative validation method in which we first calculate an R̂² for each provider report and noise-profile combination, and then analyze the resulting distribution of R̂² values (see Text S1 for results).
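The variance-reduction score itself reduces to a short computation once a prediction series is in hand; a sketch (Python; names are illustrative):

```python
def oos_r2(actual, predicted):
    """Out-of-sample variance-reduction score: 1 - MSE(predicted)/Var(actual).
    Because the regression coefficients were fit on prior data, the score can
    be negative when predictions are more variable than the actual series."""
    n = len(actual)
    mean = sum(actual) / n
    var = sum((a - mean) ** 2 for a in actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return 1 - mse / var
```

Unlike an in-sample R², this quantity is not bounded below by zero, which is what makes it a useful diagnostic for the over-fitting behavior described above.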
Figure S1
Proportion of hospitalizations associated with ICD9s 486, 487 and 488 - We present the proportion of respiratory-illness-related hospitalizations that were also associated with ICD9s 486, 487 and 488. The total number of respiratory-illness-related hospitalizations was estimated from the Texas hospitalization database, the same database used to determine the number of ICD9 486-, 487- and 488-associated cases. There is a strong seasonality in the proportion, with peaks in the winter between 0.30 and 0.37 and valleys in the summer around 0.24.
(TIF)
Figure S2
Weekly costs associated with ICD9s 486, 487 and 488 - The total weekly billing charges associated with influenza-like hospitalizations are plotted from the end of 2001 through the beginning of 2009. On average, 500 million dollars in hospital charges were billed per month for patients with discharges associated with ICD9s 486, 487 and 488. However, it is important to note the more than two-fold increase in this amount since 2002. For the 2007–2008 influenza season, this corresponded to a total billed amount of 9.3 billion dollars, nearly 1 percent of the yearly GDP of Texas and not much less than the state's year-to-year economic growth.
(TIF)
Figure S3
Texas ILINet provider reporting rates - (a) Histograms are presented for the four transition probabilities used in our Markov model of provider reporting. The change in skew between panels i and iv as compared to panels ii and iii is expected given the observed “streaky” reporting of ILINet providers in Texas. The providers with a score of one in panel ii are those ideal providers who are likely to resume reporting after missing a week. (b) A scatter plot of the values in S3a-i and S3a-ii (Report given Reported and Report given Failed to Report) indicates that both reliable and unreliable providers are enrolled in the Texas ILINet, with darker blue indicating a more reliable provider and light blue to white a less reliable provider.
(TIF)
Figure S4
Out-of-Sample Model Validation - We used data from 2001–2007 to design ILINets and to fit multilinear prediction functions, and then generated provider-report-based forecasts of hospitalizations during 2008 (without using any data from 2008) and compared these predictions to actual 2008 hospitalization data (see text for details). The R̂² values reflect the predictive performance of the different ILINets. For each ILINet, we predicted 100 time series from simulated provider reports, each time drawing random deviates from the provider noise and reporting distributions, and then compared them to actual 2008 hospitalizations by calculating R̂². Lines indicate the average R̂² and shaded regions indicate the middle quantiles of the R̂² distribution. Negative values indicate that the predicted hospitalization time series are more variable than the actual time series. The increasingly poor performance and uncertainty with additional providers is a result of over-fitting of the prediction model to data from the 2001–2007 training period. The submodular method is the only one to yield ILINets with an R̂² greater than zero.
(TIF)
Figure S5
The importance of realistic reporting rates and noise - We compared the first ten providers selected by the submodular optimization method when providers either contained (a) perfect information and perfect reporting rates or (b) were subject to the patterns of imperfect and variable reporting exhibited by actual ILINet providers. When simulated providers had reporting probabilities and noise similar to actual providers the resulting network contained more geographic redundancy than one built from simulated providers with perfect information and reporting rates. All results presented in the manuscript were determined using simulated providers with patterns of imperfect and variable reporting derived from actual ILINet data. The stark difference highlights the importance of incorporating the characteristics of actual ILINet provider reporting.
(TIF)
Text S1
In text S1, we present the results of five supplementary analyses. 1) The importance of influenza-like hospitalizations in terms of total respiratory disease related hospitalizations and health care charges in Texas, 2) The details of actual ILINet provider reporting in Texas, the data described here were used to derive our provider reporting model, 3) The time-lagged, linear relationship between influenza-like hospitalizations, ILINet, and Google Flu Trends in Texas, 4) Additional model validation results, which support and confirm those presented in the main text, and 5) The importance of incorporating realistic provider reporting rates and noise illustrated by the dramatic difference in the results when perfect information and reporting is assumed.
(PDF)
Acknowledgments
The authors thank Marilyn Felkner and Jeff Taylor at the Texas Department of State Health Services and the Meyers and Wilke labs at the University of Texas at Austin for useful comments and discussion.
Footnotes
The authors have declared that no competing interests exist.
This study was supported by the National Science Foundation DEB-0749097 to LAM and graduate research fellowship to SVS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
1. Brownstein JS, Freifeld CC, Reis BY, Mandl KD. Surveillance Sans Frontières: Internet-Based Emerging Infectious Disease Intelligence and the HealthMap Project. PLoS Med. 2008;5:e151. [PMC free article] [PubMed]
2. Khan AS, Fleischauer A, Casani J, Groseclose SL. The Next Public Health Revolution: Public Health Information Fusion and Social Networks. Am J Public Health. 2010;100:1237–1242. [PubMed]
3. Mnatsakanyan ZR, Burkom HS, Coberly JS, Lombardo JS. Bayesian Information Fusion Networks for Biosurveillance Applications. J Am Med Inform Assoc. 2011;16(6):855–863. [PMC free article] [PubMed]
4. Bush GW. Homeland security presidential directive 21: public health and medical preparedness. 2007. Available: http://www.dhs.gov/xabout/laws/gc_1219263961449.shtm. Accessed February 13th 2012.
5. US Centers for Disease Control and Prevention. National biosurveillance strategy for human health. 2010. Available: http://www.cdc.gov/osels/ph_surveillance/bc.html. Accessed February 13th 2012.
6. Clothier H, Turner J, Hampson A, Kelly H. Geographic representativeness for sentinel influenza surveillance: implications for routine surveillance and pandemic preparedness. Aust NZ J Public Health. 2006;30:337–341. [PubMed]
7. Deckers JG, Paget WJ, Schellevis FG, Fleming DM. European primary care surveillance networks: their structure and operation. Fam Pract. 2006;23:151–158. [PubMed]
8. Carrat F, Flahault A, Boussard E, Farran N, Dangoumau L, et al. Surveillance of influenza-like illness in France. The example of the 1995/1996 epidemic. J Epidemiol Community Health. 1998;52:32S–38S. [PubMed]
9. Ordobás M, Zorrilla B, Arias P. Influenza in Madrid, Spain, 1991–92: validity of the sentinel network. J Epidemiol Community Health. 1995;49:14–16. [PMC free article] [PubMed]
10. Viboud C, Boëlle P, Carrat F, Valleron A, Flahault A. Prediction of the spread of influenza epidemics by the method of analogues. Am J Epidemiol. 2003;158:996–1006. [PubMed]
11. Rath T, Carreras M, Sebastiani P. Automated detection of influenza epidemics with hidden markov models. In: Berthold M, Lenz H, Bradley E, Kruse R, Borgelt C, editors. Advances in Intelligent Data Analysis V. Berlin-Heidelberg: Springer; 2003. pp. 521–532.
12. Yang P, Duan W, Lv M, Shi W, Peng X, et al. Review of an influenza surveillance system, Beijing, People's Republic of China. Emerg Infect Dis. 2009;15:1603–1608. [PMC free article] [PubMed]
13. Fleming D, Zambon M, Bartelds A, de Jong J. The duration and magnitude of influenza epidemics: A study of surveillance data from sentinel general practices in England, Wales and the Netherlands. Eur J Epidemiol. 1999;15:467–473. [PubMed]
14. Quénel P, Dab W. Influenza A and B epidemic criteria based on time-series analysis of health services surveillance data. Eur J Epidemiol. 1998;14:275–285. [PubMed]
15. Cowling BJ, Wong IOL, Ho LM, Riley S, Leung GM. Methods for monitoring influenza surveillance data. Int J Epidemiol. 2006;35:1314–1321. [PubMed]
16. Jiang X, Wallstrom G, Cooper GF, Wagner MM. Bayesian prediction of an epidemic curve. J Biomed Inform. 2009;42:90–99. [PubMed]
17. Polgreen P, Chen Z, Segre A, Harris M, Pentella M, et al. Optimizing influenza sentinel surveillance at the state level. Am J Epidemiol. 2009;170 [PMC free article] [PubMed]
18. Ginsberg J, Mohebbi M, Patel R, Brammer L, Smolinki M, et al. Detecting influenza epidemics using search engine query data. Nature. 2009;457:1012–1014. [PubMed]
19. Valdivia A, López-Alcalde J, Vicente M, Pichiule M, Ruiz M, et al. Monitoring influenza activity in Europe with Google Flu Trends: comparison with the findings of sentinel physician networks – results for 2009–10. Euro Surveill. 2010;15:1–6. [PubMed]
20. Wilson N, Mason K, Tobias M, Peacey M, Huang Q, et al. Pandemic influenza A(H1N1)v in New Zealand: the experience from April to August 2009. Euro Surveill. 2009;14:19386. [PubMed]
21. Google.org. Explore flu trends - United States. 2003. Available: http://www.google.org/flutrends/us/#US-TX. Accessed February 13th 2012.
22. Langmuir A, Housworth J. A critical evaluation of influenza surveillance. Bull World Health Organ. 1969;41:393–398. [PubMed]
23. Monto AS, Gravenstein S, Elliott M, Colopy M, Schweinle J. Clinical Signs and Symptoms Predicting Influenza Infection. Arch Intern Med. 2000;160:3242–3247. [PubMed]
24. Viboud C, Bjørnstad ON, Smith DL, Simonsen L, Miller MA, et al. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science. 2006;312:447–451. [PubMed]
25. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, et al. Detecting influenza epidemics using search engine query data. Nature. 2008;457:1012–1014. [PubMed]
26. Census.gov. Census 2000 US Gazetteer Files. 2002. Available: http://www2.census.gov/census_2000/datasets/. Accessed February 13th 2012.
27. Thompson WW, Shay DK, Weintraub E, Brammer L, Bridges CB, et al. Influenza-associated hospitalizations in the United States. JAMA. 2004;292:1333–1340. [PubMed]
28. Das A, Kempe D. Proceedings of the 40th annual ACM symposium on Theory of computing. New York: ACM; 2008. Algorithms for subset selection in linear regression. pp. 45–54.
29. Nemhauser GL, Wolsey LA, Fisher ML. An analysis of approximations for maximizing submodular set functions–I. Math Program. 1978;14:265–294.
30. Church R, ReVelle C. The maximal covering location problem. Papers in Regional Science. 1974;32:101–118.
31. Jolliffe I. Principal Component Analysis. New York: Springer series in statistics; 2002.
Articles from PLoS Computational Biology are provided here courtesy of Public Library of Science.