We identified 82 cohort studies of 1,000 or more participants that were published from 2000 to 2009 and which analysed exposure data collected from repeated follow-up waves. The reporting of missing data in these studies was found to be inconsistent and generally did not follow the recommendations set out by the STROBE guidelines [10
] or the guidelines set out by Sterne et al. [12
]. The STROBE guidelines recommend that authors report the number of participants who take part in each wave of the study and give reasons why participants did not attend a wave. Only three papers [30
] followed the STROBE guidelines fully. The majority of papers did not provide a reason or comment for why study participants did not attend each wave of follow-up. Sterne et al. [12
] recommend that the reasons for missing data be described with respect to other variables and that authors investigate potentially important differences between participants and non-participants.
The STROBE guidelines were published in 2007. Of the nine papers published after 2007, only one followed the STROBE guidelines fully. This suggests that either journal editors are not using these guidelines or authors are not considering the impact of missing covariate data in their research.
A review of missing data in cancer prognostic studies published in 2004 by Burton et al. [36
] and a review of developmental psychology studies published in 2009 by Jelicic et al. [3
] reported similar findings to ours. Burton et al. [36
] found a deficiency in the reporting of missing covariate data in cancer prognostic studies. After reviewing 100 articles, they found that only 40% of articles provided information about the method used to handle missing covariate data and only 12 articles would have satisfied their proposed guidelines for the reporting of missing data. We observed in our review, of articles published from 2000 to 2009, that a larger proportion of articles reported the method used to handle the missing data in the analysis but that many articles were still not reporting the amount of missing data and the reasons for missingness.
The cohort studies we identified used numerous methods to handle missing data in the exposure-outcome analyses. Although some studies used advanced statistical modelling procedures (e.g. MI and Bayesian), the majority removed individuals with missing data and performed a complete-case analysis; a method that may produce biased results if the missing data are not MCAR. Jelicic et al. also found in their review that a large proportion of studies used complete-case analysis to handle their missing data [3
]. For studies with a large proportion of missing data, excluding participants with missing data may also reduce the precision of the analysis substantially. Ad hoc
methods (e.g. LOCF, the missing indicator method and mean value substitution), which are generally not recommended [16
] because they fail to account for the uncertainty in the data and may produce biased estimates [12
], continue to be used. Although MI is becoming more accessible, only five studies used this method. The reporting of the imputation procedure was inconsistent and often incomplete. This was also observed by two independent reviews of the reporting of MI in the medical journals: BMJ, JAMALancet
and the New England Journal of Medicine
]. Future studies should follow the recommendations outlined by Sterne et al. [12
] to ensure that enough details are provided about the MI procedure, especially the implementation and details of the imputation modelling process.
Strengths and limitations of the literature review
We aimed to complete a comprehensive review of all papers published that analysed exposure variables measured at multiple follow-up waves. Several keywords were used in order to obtain as many articles as possible. The keyword search was then supplemented with cohort studies identified from a pooled analysis of 61 cohort studies. Although a large number of abstracts and studies were identified, some cohort studies might have been missed. If multiple papers were identified from one study, the most recent article was included in the review, which might have led us to omit papers from the same study that used a more appropriate missing data method. Our search criteria only included papers written in English and only PubMed was searched. Our search strategy was limited to articles published between 2000 and 2009. On average three papers of the type we focussed on were published each year from 2000 to 2002 and the number has increased since then, so it seems unlikely that many papers were published before this time. Also, MI was not as accessible prior to 1997, so papers published before 2000 were more likely to have used complete case analysis or other ad hoc methods.