We compared maternal telephone interview data with medical record data to assess maternal recall in a case–control study of infant leukaemia. Case and noncase mothers perfectly recalled having a Caesarean section. All other medical conditions we studied exhibited differences between telephone interview responses and medical records, although the extent of these differences was estimated imprecisely. Different data sources for maternal and infant characteristics led to qualitative differences in some instances, which appeared especially marked for hypertension.
Our analyses use slightly different sets of mothers for the record-based and interview-based estimates due to missing and ‘don’t know’ responses. Hence, our comparisons represent the total difference between the two, combining inclusion effects (for medical record vs. telephone interview) with recall effects. Due to the minor inclusion differences, however, the inclusion effect appeared negligible; e.g. when we recomputed the hypertension ORs using only mothers with both medical record and interview data, the bootstrap ratio of unadjusted ORs (interview to medical record) became 0.58 [95% CI 0.33, 1.01] (vs. 0.53 [95% CI 0.30, 0.95] in ). Therefore, our discussion focuses on recall effects.
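The restricted-sample check described above can be sketched in a few lines. This is a minimal illustration rather than the study's code: the record layout (`case`, `interview`, `record`), the percentile interval, the 0.5 cell correction and the toy data are all assumptions made for the example, and the text does not state which bootstrap scheme was actually used.

```python
import random


def odds_ratio(rows, key):
    """Unadjusted odds ratio of case status by the exposure stored under `key`."""
    a = sum(1 for r in rows if r["case"] and r[key])          # exposed cases
    b = sum(1 for r in rows if r["case"] and not r[key])      # unexposed cases
    c = sum(1 for r in rows if not r["case"] and r[key])      # exposed noncases
    d = sum(1 for r in rows if not r["case"] and not r[key])  # unexposed noncases
    # Assumption: add 0.5 to every cell so resamples with an empty cell stay finite.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def bootstrap_or_ratio(rows, n_boot=2000, seed=0):
    """Ratio of ORs (interview to medical record) with a percentile 95% interval.

    `rows` should contain only mothers with both data sources, so that any
    difference between the two ORs cannot be caused by inclusion differences.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))  # resample mothers with replacement
        ratios.append(odds_ratio(sample, "interview") / odds_ratio(sample, "record"))
    ratios.sort()
    lower = ratios[int(0.025 * n_boot)]
    upper = ratios[int(0.975 * n_boot) - 1]
    point = odds_ratio(rows, "interview") / odds_ratio(rows, "record")
    return point, lower, upper


def make_rows(n, case, interview, record):
    """Build n identical toy records (hypothetical data, not from the study)."""
    return [{"case": case, "interview": interview, "record": record} for _ in range(n)]


# Hypothetical mothers with both an interview response and a medical record value.
toy = (
    make_rows(10, True, True, True) + make_rows(8, True, False, True)
    + make_rows(2, True, True, False) + make_rows(80, True, False, False)
    + make_rows(12, False, True, True) + make_rows(2, False, False, True)
    + make_rows(3, False, True, False) + make_rows(83, False, False, False)
)

if __name__ == "__main__":
    point, lower, upper = bootstrap_or_ratio(toy)
    print(f"ratio of ORs = {point:.2f} [95% CI {lower:.2f}, {upper:.2f}]")
```

Running the same function on the full and on the restricted set of mothers, and comparing the two intervals, mirrors the comparison reported in the text.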
Despite the 37% longer time between birth and interview among noncases, we did not see evidence of systematically poorer recall among noncase mothers. Hypertension was also the only condition for which a clear direction of differentiality seemed apparent: recall of actual hypertension among cases was about half that among noncases (sensitivity = 46% for cases vs. 90% for noncases, P = 0.003). Nonetheless, given the number of comparisons made, this result could easily be a chance artefact. In fact, the exact test of nondifferentiality for maternal hypertension recall (which simultaneously compares the sensitivities and specificities) gave P = 0.11, meaning that even the direction of differentiality is not well determined by our data.
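The comparison of recall between case and noncase mothers can be illustrated with a short sketch. The counts below are hypothetical and are not the study's data. The two-sided Fisher exact test on the sensitivities is one common choice, assumed here for illustration; the text does not say which test produced the reported P-values, and the joint exact test of nondifferentiality is not reproduced.

```python
from math import comb


def sensitivity(true_pos, false_neg):
    """Share of record-positive mothers who reported the condition at interview."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of record-negative mothers who reported no condition at interview."""
    return true_neg / (true_neg + false_pos)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    # Hypothetical record-positive mothers: (recalled, did not recall).
    cases_recalled, cases_missed = 7, 8
    noncases_recalled, noncases_missed = 11, 2
    print("sensitivity, cases:   ", sensitivity(cases_recalled, cases_missed))
    print("sensitivity, noncases:", sensitivity(noncases_recalled, noncases_missed))
    print("Fisher exact P:", fisher_exact_two_sided(
        cases_recalled, cases_missed, noncases_recalled, noncases_missed))
```

With many conditions tested in this way, some small P-values are expected by chance alone, which is the caution raised in the paragraph above.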
We know of only two published validation studies of maternal risk factors for childhood leukaemia.1,2 Infante-Rivard and Jacques1 used mothers' self-reported responses as the gold standard for validating radiographic examinations during pregnancy, whereas our gold standard measure was the medical record. Both studies, however, estimated that sensitivity for ultrasound exceeded 95%. On the other hand, Olson et al. focused on validating birthweight, reproductive history and maternal medical procedures, also using medical records as the gold standard. They coded unrecorded medical record values as if the patient did not have the condition, an alternative approach to our method of coding them as not recorded/not abstracted. As a result, differences between those results and ours are not unexpected. Unfortunately, there was not sufficient information in the paper by Olson and colleagues to allow for calculation of P-values or CIs for these differences. Nonetheless, in accord with our findings, Olson et al. observed perfect recall of Caesarean section and very high recall for ultrasound examinations. Specificities were mostly similar to ours except for peripheral oedema in cases (63% in Olson et al. vs. 47% here). Sensitivities appeared most different for child received oxygen (cases: 64% vs. 25%; noncases: 70% vs. 24%), maternal hypertension (cases: 75% vs. 46%; noncases: 59% vs. 90%) and peripheral oedema (cases: 69% vs. 86%; noncases: 68% vs. 89%). Among noncases alone, differences in sensitivities appeared for diabetes mentioned in the medical record (60% vs. 83%) and child moved for special treatment (67% vs. 81%), and among cases alone for toxaemia or pre-eclampsia (75% vs. 47%).
Maternal recall of perinatal conditions has been explored in mothers of healthy children and of children with other illnesses.10–17 In a case–control study of Sudden Infant Death Syndrome, Drews et al. found Caesarean section to have high sensitivity (cases = 100%; noncases = 97%) and maternal hypertension to have low sensitivity (cases = 53%; noncases = 61%). Other studies11–17 also suggest that recall of many perinatal events is not perfect, but that method of delivery has high agreement.
The data for our analyses are from the largest case–control study of infant leukaemia to date. Cases and noncases resided throughout the United States and Canada and were diagnosed at COG institutions, providing a near population-based study.18,19 Nonetheless, our analysis involved only 234 out of 348 (67%) eligible case mothers and 215 out of 430 (50%) eligible noncase mothers. These limited participation rates reflect the fact that our analyses required both completion of telephone interview and release of medical records. Consequently, our findings may not reflect the results that would have been found had everyone in our target population participated. Moreover, as with all studies we cannot be sure that associations we observed (even if they are valid) generalise to other populations.
Additional data sources are rarely available to check interview responses. Perinatal medical records have the advantage of being collected prior to the diagnosis of cancer, thereby eliminating much recall bias. However, medical records can be difficult and expensive to collect. Participants (especially healthy individuals) may also be reluctant to permit collection of medical records because of privacy and confidentiality concerns.20 In addition, medical records can have errors. In fact, some maternal and infant characteristics may be more accurate using self-reported data than medical records (e.g. morning sickness).2
In addition to maternal recall bias, our agreement study may be subject to other sources of error. Our study depends on completeness as well as accuracy of medical records. Lack of information recorded in a medical record does not necessarily imply that a medical condition was not discussed or medically tested, and it is important to distinguish lack of recording from lack of the condition. This can be accomplished by checking each medical record for method of data capture (e.g. specific checkbox acknowledging condition vs. open form to write in the condition), a task unfortunately not feasible in this study due to the multiplicity of forms from different hospitals. Failure to record items would result in some mothers being classified as negative when actually positive (false negatives), and others being classified as positive who were actually negative (false positives).
Another possible source of error concerns the fact that the medical record data in our study were abstracted only once by two nurse abstractors. Although both nurse abstractors were trained and blinded to case–control status, we cannot rule out reporting errors that could have occurred during the abstraction process. No formal quality control process was implemented; however, data checks were performed during data cleaning. Nonetheless, we would expect these reporting errors to be sporadic rather than systematic, and more often in the form of failure to note. Despite such errors, medical records can still identify problematic recall items, especially if they exhibit conditions that were not reported by the mother (indicating that recall failures are present).
Even with these limitations, sensitivity and specificity estimates from agreement studies like ours can be used as starting points for sensitivity and bias analyses of recall error.6,7,21 The results of our study and previous validation studies2,10–17 can also be useful when designing future case–control studies that will collect perinatal information. While medical procedures such as Caesarean sections and ultrasounds may be recalled accurately, maternal reports of pregnancy and neonatal complications appear to be much less reliable, especially if they are more occult conditions with technical names (e.g. proteinuria or toxaemia). Hence, instead of relying on only one data source for perinatal events, multiple measurement tools may be necessary to construct more accurate perinatal measurements. Therefore, agreement studies such as ours can also guide which data source should be used for gathering exposure data for future epidemiological case–control studies, and suggest ways to combine measurements into a single more accurate summary.
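As one illustration of how sensitivity and specificity estimates can feed a bias analysis of recall error, the sketch below applies the standard back-calculation for a misclassified binary exposure. If A* of N subjects are observed as exposed, and the measurement has sensitivity Se and specificity Sp, the corrected exposed count is (A* − (1 − Sp)·N) / (Se + Sp − 1). The function names and all numbers are hypothetical, and this simple correction is an assumption for illustration, not the method of the cited references.

```python
def correct_exposed_count(observed_exposed, total, se, sp):
    """Back-calculate the exposed count implied by sensitivity and specificity.

    Solves observed = se * true + (1 - sp) * (total - true) for `true`.
    """
    denom = se + sp - 1
    if denom <= 0:
        raise ValueError("se + sp must exceed 1 for the correction to be defined")
    return (observed_exposed - (1 - sp) * total) / denom


def corrected_odds_ratio(case_exposed, case_total, ctrl_exposed, ctrl_total,
                         se_case, sp_case, se_ctrl, sp_ctrl):
    """Odds ratio after correcting each group with its own sensitivity and specificity.

    Allowing different values for cases and noncases represents differential recall.
    """
    a = correct_exposed_count(case_exposed, case_total, se_case, sp_case)
    b = case_total - a
    c = correct_exposed_count(ctrl_exposed, ctrl_total, se_ctrl, sp_ctrl)
    d = ctrl_total - c
    if min(a, b, c, d) <= 0:
        raise ValueError("corrected cell count is not positive; inputs are incompatible")
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical interview-based counts and recall parameters.
    print(corrected_odds_ratio(20, 100, 10, 100,
                               se_case=0.8, sp_case=0.95,
                               se_ctrl=0.9, sp_ctrl=0.95))
```

Repeating the calculation over a range of plausible sensitivity and specificity values shows how strongly an interview-based odds ratio depends on the assumed recall accuracy.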
Agreement and validation substudies are valuable but need some forethought to produce useful information for designing future studies and analyses. Ideally, these substudies would be included in the initial study proposals to allow their review before implementation. When designing substudies, we recommend first identifying the information collected by the best available measurement and then using this information to design the tool that will obtain the primary measurements (e.g. a telephone interview). For instance, one could gather medical records to understand the type of data collected by the best available measurement. One would then include interview questions that match the content of the best available measurements so that agreement can be examined whenever the latter can be obtained.