We present simple methods to estimate the reproduction number of emerging zoonoses from routine surveillance data. This research project was initiated to answer a seemingly straightforward question: 50% of the H3N2v-M cases detected in the US in 2011 had no contact with swine; what did this imply for the level of human-to-human transmission? At the time, the answer did not appear as straightforward as our analysis now shows it to be: simple estimators of R are available, and which one applies depends on the surveillance scenario.
Our approach has specific properties that potentially overcome some of the limitations of existing methods. First, the investigation effort required is less than that for other methods. For example, if there is active case finding (surveillance scenario 2), one only needs to investigate the source of infection of the first case detected in each cluster. Second, the statistical treatment of the data is extremely simple, making it possible for anyone to interpret raw surveillance statistics about the source of infection of cases (statistics G or F) in terms of human-to-human transmissibility (reproduction number R). Third, the method is robust to selection bias (i.e., the fact that larger clusters are more likely to be detected) and to under-ascertainment (i.e., the failure to detect all cases in a cluster once the cluster is identified).
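The link between the raw statistic and R can be illustrated with a minimal simulation sketch. This section does not restate the estimators, so the sketch assumes that the scenario 1 estimator takes the form R ≈ 1 − G, with G the proportion of detected cases infected by the reservoir. It further assumes Poisson-distributed numbers of secondary cases and independent detection of each case with a fixed probability. All function and parameter names are illustrative and are not taken from the paper.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_cluster_size(rng, r):
    """Size of one cluster seeded by a single reservoir-infected case.

    Assumption: each case infects a Poisson(r) number of other humans, r < 1.
    """
    size, active = 1, 1
    while active > 0:
        new_cases = sum(poisson(rng, r) for _ in range(active))
        size += new_cases
        active = new_cases
    return size


def estimate_r_scenario1(r_true, detection_prob, n_clusters, seed=0):
    """Simulate clusters, detect each case independently, return 1 - G.

    G is the proportion of detected cases that were infected by the reservoir
    (assumed form of the scenario 1 estimator).
    """
    rng = random.Random(seed)
    detected_reservoir = 0
    detected_total = 0
    for _ in range(n_clusters):
        size = simulate_cluster_size(rng, r_true)
        # The single primary case of the cluster is the reservoir-infected one.
        if rng.random() < detection_prob:
            detected_reservoir += 1
            detected_total += 1
        # The remaining cases were infected by another human.
        for _ in range(size - 1):
            if rng.random() < detection_prob:
                detected_total += 1
    if detected_total == 0:
        raise ValueError("no case detected; increase n_clusters or detection_prob")
    g = detected_reservoir / detected_total
    return 1.0 - g


if __name__ == "__main__":
    print(estimate_r_scenario1(r_true=0.5, detection_prob=0.2, n_clusters=20000))
```

Under these assumptions, the estimate should stay close to the simulated R even when only a small fraction of cases is detected, which is the intuition behind the robustness to under-ascertainment described above.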
However, our methods do require that the source of infection (i.e., human or natural reservoir) can be identified for either the first detected cases of a cluster or a random subset of detected cases, depending on the surveillance scenario. In the US H3N2v-M context, where most individuals have no contact with swine, it is usually relatively easy to rule out the natural reservoir as a source of infection. Determining the source of infection might, however, be harder in situations where a large part of the population has regular contact with the natural reservoir (e.g., backyard poultry in rural areas). In this case, an in-depth epidemiological investigation of the potential sources of infection is required. Clearly, our approach cannot be used if the zoonotic source has not yet been identified. That being said, the examples we have presented show that proxy measures for the source of transmission (e.g., contacts with the natural reservoir) are often available. Although imperfect, these proxy measures are often already used by the scientific and public health community. For example, the World Health Organization fact sheet on Nipah virus states that “in Bangladesh, half of reported cases between 2001 and 2008 were due to human-to-human transmission”. Our methods improve this existing practice in at least three ways: (1) they add an essential layer of interpretation by providing estimates of the reproduction number, R, which allow assessment of how far transmission is from being sustained (R ≥ 1); (2) they clarify sources of potential biases; and (3) they provide simple guidelines to reduce these biases (see summary in Box 1).
If the source of infection is unknown for a subset of cases, different options are available. If data are believed to be missing at random, these cases can simply be excluded from the analysis. However, it is possible that certain sources of infection are more likely to generate missing data. In such a scenario, upper bounds for R can be obtained by assuming that all cases with missing data were due to human-to-human transmission and lower bounds by assuming the reverse. If human-to-human cases are prone to be classified as reservoir-to-human transmission, R may be underestimated. If the source of infection is uncertain, the analysis of detailed outbreak data might allow estimation of the probability that the case was infected by the reservoir, which could then be used to estimate F using our methods. Phylogenetic analysis might also help resolve uncertainties about the source of infection. Clearly, our approach will start to break down if the proportion of cases with missing data becomes too large or if the classification of cases is too unreliable. We note that other methods, for instance based on contact tracing, would also struggle in these situations.
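The bounding argument for cases with an unknown source can be written out as a short sketch. It again assumes an estimator of the form R ≈ 1 − G, which this section does not restate, and the counts in the usage example are hypothetical and chosen only for illustration.

```python
def r_bounds_with_missing(n_reservoir, n_human, n_missing):
    """Return (lower, complete_case, upper) values for R under assumed R = 1 - G.

    n_reservoir: cases attributed to the natural reservoir
    n_human:     cases attributed to human-to-human transmission
    n_missing:   cases whose source of infection is unknown
    """
    n_known = n_reservoir + n_human
    n_total = n_known + n_missing
    if n_known == 0:
        raise ValueError("at least one case with a known source is required")

    # Lower bound on R: every unknown case is counted as reservoir-infected.
    lower = 1.0 - (n_reservoir + n_missing) / n_total
    # Complete-case value: unknown cases are dropped (missing at random).
    complete_case = 1.0 - n_reservoir / n_known
    # Upper bound on R: every unknown case is counted as human-infected.
    upper = 1.0 - n_reservoir / n_total
    return lower, complete_case, upper


if __name__ == "__main__":
    # Hypothetical counts: 6 reservoir, 4 human, 2 unknown.
    print(r_bounds_with_missing(6, 4, 2))
```

The width of the interval between the two bounds grows with the share of cases with missing data, which is one way to see why the approach degrades when that share becomes large.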
Surveillance intensity might change over time because of increasing media attention or health concerns. In such a context, for the surveillance scenario with outbreak investigations (scenario 2), it is possible to derive bounds on R. In surveillance scenario 1, a change in the case detection rate is not expected to affect the summary statistic G or the estimate of R.
A bigger source of concern is if the increase in surveillance intensity focuses disproportionately on those cases that were exposed to the natural reservoir, as this would lead to overestimating F and underestimating R. For example, with the substantial increase in H3N2v-M virus infections during the summer of 2012, CDC changed its recommendations and asked clinicians to obtain respiratory specimens from ill persons with recent swine exposure. Therefore, in summer 2012, ill persons with recent swine exposure may have been more likely to be tested for H3N2v-M infection than those without such exposure. For this reason, we cannot use our method to analyze data collected in 2012.
Nonetheless, it is interesting to note that our estimate of R of around 0.5 for H3N2v-M in 2011 seems larger than what is suggested by data collected in summer 2012. We believe that these differences could at least partly be explained by seasonal variations in the ability of the virus to transmit. Indeed, five of the six human-to-human transmission events detected in 2011 occurred in November–December; the remaining one happened in late August.
Our method has been developed for routine surveillance systems where cases are detected independently of each other. It would require modification to be applied to data collected from cluster detection surveillance systems of the type developed after the SARS epidemic, which target unexplained clusters of severe respiratory infection.
While a positive property of our approach is that it does not require a full-blown outbreak investigation, there are still many good reasons to target and investigate large outbreaks. Detailed investigations of large outbreaks are indeed needed to estimate essential parameters such as transmission risk factors or the generation time with adequate power.
Our method is designed to estimate R in the context of subcritical outbreaks, i.e., R<1. As illustrated by the example of cholera in the Dominican Republic, the method can give a hint that transmission is close to or above the level needed for sustained transmission (R≈1 or R>1). However, if R≥1, other estimation methods are necessary to derive a point estimate of R.
If transmissibility of a zoonotic infection suddenly increased (for example due to seasonal factors or genetic changes in the virus), it might take time for estimates to adjust (since the method uses data from all clusters detected so far). Future developments of the method could aim to detect sudden changes in the frequency of cases linked to the reservoir, building on methods for the sequential detection of change points in quality control and dynamical systems.
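As an illustration of what sequential change-point detection could look like here, the sketch below applies a standard one-sided Bernoulli CUSUM to a stream of case indicators (1 if the case is linked to the reservoir, 0 otherwise). This is not part of the method presented in this paper. The baseline proportion `p0`, the alternative proportion `p1`, and the alarm threshold are hypothetical tuning parameters that would need to be chosen for a given surveillance system.

```python
import math


def bernoulli_cusum(indicators, p0, p1, threshold):
    """Return the index of the first alarm, or None if no alarm is raised.

    indicators: iterable of 0/1 values, 1 meaning "case linked to the reservoir"
    p0:         baseline proportion of reservoir-linked cases
    p1:         proportion after the change to be detected (p1 < p0 for a drop)
    threshold:  alarm level for the cumulative log-likelihood ratio
    """
    if not (0.0 < p0 < 1.0 and 0.0 < p1 < 1.0):
        raise ValueError("p0 and p1 must lie strictly between 0 and 1")

    # Log-likelihood ratio contributions of a single observation.
    llr_if_one = math.log(p1 / p0)
    llr_if_zero = math.log((1.0 - p1) / (1.0 - p0))

    score = 0.0
    for index, value in enumerate(indicators):
        increment = llr_if_one if value else llr_if_zero
        # Resetting at zero keeps the statistic sensitive to recent data only.
        score = max(0.0, score + increment)
        if score >= threshold:
            return index
    return None


if __name__ == "__main__":
    # Hypothetical stream: ten reservoir-linked cases, then five that are not.
    stream = [1] * 10 + [0] * 5
    print(bernoulli_cusum(stream, p0=0.8, p1=0.5, threshold=3.0))
```

Because the statistic resets to zero whenever the evidence favors the baseline, it reacts to a recent run of cases without reservoir exposure rather than to the cumulative proportion over all clusters detected so far.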
Finally, we hope that the simplicity of our method and its limited data requirements will facilitate more robust monitoring of the epidemic potential of many zoonoses known to cause occasional human case clusters (such as Crimean-Congo hemorrhagic fever, Monkeypox virus, E. coli O157:H7, or Mycobacterium bovis) around the world.