We simulated a comparative study of a ‘healthy cohort’ of participants who are not infected with influenza at recruitment. We assume that participants in the cohort are individually randomized in equal proportions between an intervention arm and a control arm (or between two intervention arms), as balanced studies tend to have greater statistical power than unbalanced studies. We proceed under the assumption that the intervention being considered is an NPI, such as wearing face masks or shields, or increasing hand hygiene. We assume that all participants are recruited independently and are not members of the same households or schools, or otherwise clustered.
A range of syndromic definitions have been used as proxy outcomes in influenza studies. Some aim at greater sensitivity, such as “acute respiratory illness” (ARI), defined as any two of a range of respiratory and systemic symptoms (e.g. fever ≥37.8°C, cough, headache, sore throat, or myalgia). Others aim at greater specificity by restricting to febrile ARI (FARI), for example the CDC surveillance definition of “influenza-like illness” as fever ≥37.8°C plus cough or sore throat.
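The two kinds of definition can be made concrete with a small classifier. The following is a minimal sketch, assuming the five example symptoms listed above are the full symptom set for ARI and that FARI follows the influenza-like illness rule quoted in the text; the `SymptomReport` record and the function names are hypothetical and not taken from the study.

```python
from dataclasses import dataclass

FEVER_THRESHOLD_C = 37.8  # fever cut-off quoted in the text


@dataclass
class SymptomReport:
    """One participant's reported symptoms (hypothetical record layout)."""
    temperature_c: float
    cough: bool = False
    headache: bool = False
    sore_throat: bool = False
    myalgia: bool = False


def has_fever(report):
    return report.temperature_c >= FEVER_THRESHOLD_C


def is_ari(report):
    """ARI: at least two of the example symptoms (assumed here to be the full list)."""
    signs = [has_fever(report), report.cough, report.headache,
             report.sore_throat, report.myalgia]
    return sum(signs) >= 2


def is_fari(report):
    """FARI, using the influenza-like illness rule: fever plus cough or sore throat."""
    return has_fever(report) and (report.cough or report.sore_throat)
```

Under these assumed rules every FARI episode is also an ARI episode, which is why the FARI-based definition is the more specific and less sensitive of the two.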
We consider seven alternative approaches to identification of influenza infections in the comparative study:
- Collection and testing by RT-PCR of respiratory specimens from participants reporting FARI.
- Collection and testing by RT-PCR of respiratory specimens from participants reporting ARI.
- Collection and testing by RT-PCR of respiratory specimens collected from all participants at biweekly intervals regardless of illness, as well as from any participants reporting ARI.
- Collection of paired serum from all participants plus collection and testing by RT-PCR of respiratory specimens from participants reporting FARI.
- Collection of paired serum from all participants plus collection and testing by RT-PCR of respiratory specimens from participants reporting ARI.
- Collection of paired serum from all participants plus collection and testing by RT-PCR of respiratory specimens collected from all participants at biweekly intervals regardless of illness, as well as from any participants reporting ARI.
- Collection of paired serum from all participants but no collection of respiratory specimens.
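The seven approaches can be read as combinations of two choices: whether paired sera are collected, and which respiratory-specimen strategy (if any) is used. A minimal sketch of that enumeration follows; the dictionary keys and strategy labels are illustrative names, not the study's own.

```python
from itertools import product

# Respiratory-specimen strategies named in the list above; None means no specimens.
SPECIMEN_STRATEGIES = ["FARI-triggered", "ARI-triggered",
                       "biweekly plus ARI-triggered", None]


def enumerate_designs():
    """Return every serology x specimen-strategy combination that confirms infection."""
    designs = []
    for paired_serology, specimens in product([False, True], SPECIMEN_STRATEGIES):
        if not paired_serology and specimens is None:
            continue  # no laboratory confirmation at all, so not a candidate design
        designs.append({"paired_serology": paired_serology, "specimens": specimens})
    return designs


DESIGNS = enumerate_designs()
```

Two serology options times four specimen options gives eight combinations; dropping the one with no laboratory testing leaves the seven designs, in the same order as the list.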
In these approaches, an ARI or FARI trigger means that respiratory specimens are collected within 1–3 days of illness onset, and only if and when a study participant reports ARI or FARI. Because our interest is in studies that can demonstrate effectiveness of interventions against influenza specifically, we did not consider ARI or FARI as primary outcomes in our analysis, and they are therefore not of primary relevance to the present optimal design considerations, although they might be included as secondary outcomes. For analysis of paired sera, a 4-fold or greater rise in antibody titers on hemagglutination inhibition (HI) assays is used to indicate infection. We did not consider proxy outcomes such as absenteeism, or clinical outcomes such as hospital admissions or outpatient visits, because they were believed to have low power as study endpoints.
The parameter values used in our simulations are shown in the accompanying table. We assumed that the intervention could reduce the risk of influenza virus infection by 30%, with a consequent reduction in the rates of ARI and FARI episodes associated with influenza. Our simulations also allowed for an effect of the NPI on the rates of ARI and FARI episodes not associated with influenza. For simplicity we assume that the risk of ARI and FARI associated with non-influenza infections is independent of the transmission dynamics of, and infection with, influenza virus, and vice versa. For each study design variant, we used a Monte Carlo approach to simulate a set of 2500 datasets. For each dataset we used a chi-squared test of the difference between arms in the proportion of laboratory-confirmed infections. Statistical power was defined as the proportion of datasets in which the null hypothesis of no difference was rejected at the 0.05 significance level. Further technical details are provided in Text S1.
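The Monte Carlo power calculation described above can be sketched as follows. This is a simplified illustration rather than the simulation in Text S1: it assumes perfect detection of infection (no test sensitivity or specificity), independent binomial outcomes, and a Pearson chi-squared test without continuity correction. The function names and the control-arm risk passed in are hypothetical.

```python
import math
import random


def chi2_pvalue_2x2(x1, n1, x2, n2):
    """Pearson chi-squared test (1 df, no continuity correction) for two proportions."""
    total = n1 + n2
    events = x1 + x2
    if events == 0 or events == total:
        return 1.0  # no variation in the outcome, so no evidence of a difference
    # Closed form of the Pearson statistic for a 2x2 table.
    stat = (total * (x1 * (n2 - x2) - x2 * (n1 - x1)) ** 2
            / (n1 * n2 * events * (total - events)))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def simulate_power(n_per_arm, risk_control, risk_reduction=0.30,
                   n_datasets=2500, alpha=0.05, seed=1):
    """Proportion of simulated datasets in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    risk_intervention = risk_control * (1.0 - risk_reduction)
    rejections = 0
    for _ in range(n_datasets):
        # Count confirmed infections in each arm of one simulated study.
        x_control = sum(rng.random() < risk_control for _ in range(n_per_arm))
        x_intervention = sum(rng.random() < risk_intervention for _ in range(n_per_arm))
        if chi2_pvalue_2x2(x_control, n_per_arm, x_intervention, n_per_arm) < alpha:
            rejections += 1
    return rejections / n_datasets
```

For example, `simulate_power(500, 0.20)` estimates power for 500 participants per arm under an assumed (illustrative) 20% cumulative incidence in the control arm. Setting `risk_reduction=0.0` should return a value close to the 0.05 significance level, which is a useful sanity check.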
For each study budget, we calculated the number of participants per arm that could be recruited given the chosen diagnostic method and consequent costs of follow-up, as well as the anticipated ‘base case’ level of ARI and FARI incidence. As a key sensitivity analysis, we investigated the effect on study power of variability in the activity of influenza and other respiratory viruses during the study. This was done because, when respiratory specimen collection is triggered by ARI or FARI, the number of specimens collected could exceed the allotted budget if the activity of influenza and non-influenza respiratory viruses was higher than anticipated. If that occurred in our simulation, only the number of specimens allowed by the study budget was tested. Simulations were performed under three scenarios. In scenario I, we assume that the cumulative incidences of ARI and FARI not associated with influenza in the control arm are 0.40 and 0.10 respectively, and that these are correctly estimated in advance of the study. In scenarios II and III, the cumulative incidences of non-influenza ARI and FARI are again 0.40 and 0.10, but for purposes of study planning they are incorrectly believed to be 0.20 and 0.06 (scenario II) or 0.60 and 0.14 (scenario III). Scenarios II and III illustrate how underestimation or overestimation of ARI and FARI attack rates reduces the power of detection methods relying on ARI or FARI report or trigger. Power, sample size, and cumulative incidence of infection in the control arm (i.e. the proportion of control arm participants identified as having influenza) were plotted as a function of field budget for these three scenarios.
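The budget logic can be illustrated with a short sketch. It assumes a simple linear cost model (a fixed follow-up cost per participant plus a cost per specimen tested), and it counts only non-influenza ARI when estimating the expected number of tests. The cost figures, the budget, and the function names are hypothetical placeholders, not the study's parameter values.

```python
def participants_per_arm(budget, cost_per_participant, cost_per_test,
                         expected_tests_per_participant):
    """Largest equal arm size whose expected field cost fits within the budget."""
    expected_cost = cost_per_participant + cost_per_test * expected_tests_per_participant
    return int(budget // (2 * expected_cost))


def specimens_tested(specimens_collected, budget, n_per_arm,
                     cost_per_participant, cost_per_test):
    """Cap the number of specimens tested at what the remaining budget can pay for."""
    remaining = budget - 2 * n_per_arm * cost_per_participant
    affordable = max(0, int(remaining // cost_per_test))
    return min(specimens_collected, affordable)


# Scenario II style example with placeholder costs: ARI incidence is planned
# at 0.20 but turns out to be 0.40, so more specimens arrive than were budgeted.
BUDGET, COST_PARTICIPANT, COST_TEST = 100_000, 50, 25  # hypothetical values
n = participants_per_arm(BUDGET, COST_PARTICIPANT, COST_TEST, 0.20)
collected = round(2 * n * 0.40)  # specimens actually triggered across both arms
tested = specimens_tested(collected, BUDGET, n, COST_PARTICIPANT, COST_TEST)
```

When incidence is underestimated, `tested` falls below `collected` and some infections go unconfirmed. When it is overestimated, fewer participants are recruited than the budget could have supported. Either way power is lost relative to correct planning.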
Because of uncertainty in model parameters, we performed several sensitivity analyses to examine how power estimates respond to variations in those parameters. Specifically, we examined the sensitivity of power estimates to differing influenza cumulative incidences, the effect of the NPI on the rate of non-influenza ARI and FARI, the cost of RT-PCR testing, the cost of serological testing, the sensitivity of RT-PCR testing, and the sensitivity and specificity of serology. In another sensitivity analysis, we assumed a longer, six-month influenza season with lower incidence rates but the same cumulative incidence of infection across the study as the base case.
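The relationship between season length and incidence rate in the last sensitivity analysis can be illustrated with a worked formula. This is a sketch under the assumption of a constant infection rate over the season, which may differ from the epidemic curve used in the simulations; the symbols $\lambda$, $T$, and $C$ are introduced here for illustration only.

```latex
% C: cumulative incidence of infection over the season (held fixed)
% T: base-case season length, T' = 6 months: longer season
% \lambda, \lambda': constant incidence rates in each case
C = 1 - e^{-\lambda T}
\quad\Longrightarrow\quad
\lambda = -\frac{\ln(1 - C)}{T},
\qquad
\lambda' = -\frac{\ln(1 - C)}{T'} = \lambda \, \frac{T}{T'} .
```

Under this assumption, stretching the season to $T' > T$ while holding $C$ fixed lowers the incidence rate by the factor $T/T'$.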
Parameter values and ranges of the input values in sensitivity analysis.