In a case–control study of infant leukaemia, we assessed agreement between medical records and mother’s self-reported pregnancy-related conditions and procedures and infant treatments. Interview and medical record data were available for 234 case and 215 control mothers. Sensitivity, specificity and predictive values for maternal report were estimated for case and control mothers separately, taking the medical record as correct. For most perinatal conditions, sensitivity and specificity were over 75%. Low sensitivity was observed for maternal protein or albumin in the urine (cases: 12% [95% exact confidence interval (CI) 8%, 18%]; controls: 11% [95% CI 7%, 17%]) and infant supplemental oxygen use (cases: 25% [95% CI 11%, 43%]; controls: 24% [95% CI 13%, 37%]). Low specificity was found for peripheral oedema (cases: 47% [95% CI 37%, 58%]; controls: 54% [95% CI 43%, 64%]). Sensitivity for maternal hypertension appeared much lower for cases (cases: 46% [95% CI 28%, 66%]; controls: 90% [95% CI 70%, 99%]; P = 0.003). We did not detect other case–control differences in recall (differentiality), even though the average time between childbirth and interview was 2.7 years for case and 3.7 years for control mothers. Many conditions exhibited notable differences between interview and records. We recommend use of multiple measurement sources to allow both cross-checking and synthesis of results into more accurate measures.
Case–control studies are often the only feasible design for epidemiological research on the aetiology of childhood cancer. It is unclear, however, how much misclassification arises from inaccurate maternal recall in these studies. In some instances, the index pregnancy occurred years earlier, so recollecting past experiences may be difficult and subject to considerable error. To our knowledge, two case–control studies of childhood cancer have evaluated the validity of maternal recall of perinatal conditions. The first study included children aged 9 years or less diagnosed with leukaemia between 1980 and 1993;1 the second study included children diagnosed at 18 months of age or younger between 1983 and 1988.2
Medical records are often taken as a ‘gold standard’ to assess the accuracy of self-reported data although such records may have errors. Here, we report results for agreement between maternal reports and medical records in a North American study of infant leukaemia.
The current study uses data from a Children’s Oncology Group (COG) case–control study of infant leukaemia (COG-AE24) with a particular focus on maternal and infant characteristics as risk factors.3–5 The initial phase of the study included infants diagnosed with leukaemia (cases) during the first year of life at a COG institution between 1 January 1996 and 20 August 2002. Noncase (control) mothers were identified by random digit dialling and frequency matched on child’s birth year. Of those eligible, 240 out of 348 (69%) case and 254 out of 430 (59%) noncase mothers participated in a telephone interview about events during and after pregnancy, including medical conditions, menstrual, hormone-use and contraceptive history, labour and delivery, and infant birth characteristics and hospitalisation.
Mothers were also asked for consent to obtain all hospital and health care provider records beginning 6 months prior to conception of the index child until the end of 1 week after birth or discharge from hospital (whichever occurred later). Records from all providers seen during pregnancy and delivery were requested. Providers were re-contacted if record information was incomplete. Medical records for 234 (98%) case and 215 (85%) noncase mothers who completed the telephone interview were obtained. Of the six case mothers without any medical record information, five refused to release medical records and one mother’s medical record was not in English and was therefore not requested. Among noncase mothers, medical records were not available for 39 (34 refused to release medical records; for five who consented, records were either not received after request and follow-up or were incorrect). Written informed consent was obtained before the telephone interview and written release of medical records was obtained after the interview. Each of the participating COG institutional review boards approved the study.
For simplicity and brevity, we restricted maternal recall to dichotomies included in the telephone interview whose content was similar to information expected in the medical record. In calculating measures of agreement, the medical record was taken as correct. The following conditions were included: gestational diabetes, hypertension, maternal protein or albumin in urine, peripheral oedema, toxaemia or pre-eclampsia, ultrasound, Caesarean section, and whether infant received phototherapy, blood products, supplemental oxygen, or was moved for special treatment. All medical conditions were identified as being mentioned in a medical record; none were based on diagnostic criteria or defined values. If a condition was not specifically noted or ruled out in a medical record, we coded it as not recorded/not abstracted and therefore not included in calculations.
Sensitivity was estimated by taking self-reported presence of the condition as a percentage of mothers having the condition indicated in the medical record. Specificity was estimated by taking the self-reported absence of the condition as a percentage of mothers not having the condition in the medical record. Both were calculated separately for mothers of cases and noncases. To test the hypothesis of nondifferentiality (which is no association of interview data with childhood cancer conditional on the medical record data), we computed an exact P-value for the association of case–control status with telephone interview data stratified on the medical record data for each perinatal condition.6 We also estimated positive and negative predictive values, stratified on the telephone interview data, for mothers of cases and noncases separately. Positive predictive values were estimated as the percentage showing the condition in the medical record among mothers reporting the condition in the interview. Negative predictive values were estimated as the percentage not showing the condition in the medical record among mothers not reporting the condition in the interview. Predictive values depend on exposure prevalences, which can vary across study populations.7 Nonetheless, we have included them because they illustrate directly the limitations of maternal report in this study.
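As a hedged illustration of the definitions above, the following Python sketch computes the four measures from a single 2 × 2 table and attaches an exact binomial interval. The counts are invented for illustration and are not taken from this study, and the function names are hypothetical. The interval is assumed to be the Clopper-Pearson interval, a common meaning of "exact" that the text does not confirm. The last function shows why predictive values shift with prevalence even when sensitivity and specificity are held fixed.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""

    def solve(kk, target):
        # binom_cdf(kk, n, p) decreases in p, so bisect for the p hitting target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(kk, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


def agreement_measures(tp, fn, fp, tn):
    """Interview report judged against the medical record (taken as correct).

    tp: reported and recorded      fn: recorded but not reported
    fp: reported but not recorded  tn: neither reported nor recorded
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_from_prevalence(sens, spec, prev):
    """Positive predictive value implied by sensitivity, specificity, prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts for one condition in one group of mothers.
measures = agreement_measures(tp=40, fn=10, fp=20, tn=130)
sens_ci = exact_ci(40, 50)
print(measures, sens_ci)
```

With fixed sensitivity and specificity, calling `ppv_from_prevalence` over a range of prevalences shows the positive predictive value falling as the condition becomes rarer, which is the reason predictive values from one study population may not transfer to another.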
Lastly, we calculated odds ratios (ORs) and 95% confidence intervals (CIs) for the association between mother and infant conditions and the odds of infant leukaemia for the maternal telephone interview and medical record data separately, provided each cell in the 2 × 2 table exceeded 3. Telephone interview ORs were also adjusted for length of time in years between childbirth and telephone interview (continuous variable); due to small cells for gestational diabetes and toxaemia/pre-eclampsia, for their regressions we added two extra records (one case, one control) to stabilise the estimates, as described in Greenland (p. 526).8 We compared telephone interview and medical record ORs using nonparametric bootstrap methods,9 with 10 000 data resamplings to estimate the standard errors of the log ORs and log ratio of ORs; we added 1/2 to each cell when resampling produced a zero cell.
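The bootstrap comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the data are invented, the field names (`case`, `interview`, `record`) are hypothetical, whole mothers are resampled with replacement without stratifying on case status, and the interval is a Wald-type interval built from the bootstrap standard error of the log ratio. The time-adjusted regressions and the extra-records stabilisation are not reproduced.

```python
import math
import random


def odds_ratio(rows, key):
    """OR of exposure `key` with case status; add 1/2 to every cell if any is zero."""
    cells = [[0, 0], [0, 0]]  # cells[case][exposed]
    for r in rows:
        cells[r["case"]][r[key]] += 1
    a, b = cells[1][1], cells[1][0]  # exposed / unexposed cases
    c, d = cells[0][1], cells[0][0]  # exposed / unexposed noncases
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def log_or_ratio(rows):
    """Log of (interview OR / medical record OR) on the same mothers."""
    return math.log(odds_ratio(rows, "interview") / odds_ratio(rows, "record"))


def bootstrap_ratio(rows, n_boot=10_000, seed=0):
    """Point estimate, Wald-type 95% interval and bootstrap SE for the OR ratio."""
    rng = random.Random(seed)
    draws = [log_or_ratio(rng.choices(rows, k=len(rows))) for _ in range(n_boot)]
    mean = sum(draws) / n_boot
    se = math.sqrt(sum((x - mean) ** 2 for x in draws) / (n_boot - 1))
    point = log_or_ratio(rows)
    ci = (math.exp(point - 1.96 * se), math.exp(point + 1.96 * se))
    return math.exp(point), ci, se


def make_rows(case, interview, record, n):
    """Build n identical hypothetical mothers (all values are 0/1 flags)."""
    return [{"case": case, "interview": interview, "record": record} for _ in range(n)]


# Invented data: (case, interview, record, count).
rows = (
    make_rows(1, 1, 1, 20) + make_rows(1, 0, 1, 10)
    + make_rows(1, 1, 0, 8) + make_rows(1, 0, 0, 160)
    + make_rows(0, 1, 1, 25) + make_rows(0, 0, 1, 5)
    + make_rows(0, 1, 0, 10) + make_rows(0, 0, 0, 160)
)

# Fewer resamples than the 10 000 described in the text, to keep the demo quick.
ratio, ci, se = bootstrap_ratio(rows, n_boot=1000)
print(ratio, ci, se)
```

Resampling mothers rather than the two tables separately keeps the interview and record values paired, so the dependence between the two ORs is carried into the standard error of their ratio.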
Most mothers provided information during the telephone interview. There were 14 missing responses for the telephone interview, and one control mother responded ‘don’t know’ for gestational diabetes. Some perinatal conditions had low prevalences in medical records. Nonetheless, we calculated sensitivities, specificities and predictive values for these conditions because they have been previously linked aetiologically to infant leukaemia.
Table 1 provides the characteristics of the mothers who completed the telephone interview and for whom at least one medical record (prenatal or labour and delivery) was obtained. Mothers of cases and noncases were primarily married and White, with a higher proportion of case mothers being Hispanic (11% vs. 3%). Noncase mothers were somewhat more educated and reported higher household incomes than case mothers. Case mothers were younger than noncase mothers at the time of the telephone interview. Further, the average length of time between the index child’s birth and telephone interview was 2.7 and 3.7 years (37% longer) for case and noncase mothers, respectively. Over 90% of case and noncase mothers consented to the release of prenatal records and labour and delivery records. The median number of prenatal visits was 12 for 220 case mothers (range 3–22) and 12 for 206 noncase mothers (range 1–18).
Table 2 provides the numbers for maternal report and medical records for the selected conditions, along with P-values testing nondifferentiality. Table 3 provides the estimated sensitivities, specificities, positive predictive values and negative predictive values comparing medical records and mothers’ self-reports of maternal and infant characteristics. Caesarean section was perfectly classified. For most other perinatal factors of interest, sensitivity and specificity were over 75%. Nonetheless, maternal protein or albumin in the urine and infant supplemental oxygen had estimated sensitivities less than 26%, while peripheral oedema had estimated specificities less than 60%. Case and control sensitivities appeared to differ for maternal hypertension (cases: 46% [95% CI 28%, 66%]; noncases: 90% [95% CI 70%, 99%]; Fisher’s exact P = 0.003). Other systematic case–control differences in recall (differentiality, Table 2) were not detected despite an average time between childbirth and telephone interview that was 1 year longer for noncase mothers (Table 1). Nevertheless, except for maternal hypertension, the direction of differentiality remains ambiguous and nondifferentiality remains a possibility.
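A comparison of two sensitivities with Fisher’s exact test, as reported above for hypertension, can be sketched in a few lines. The counts below are hypothetical and are not the study’s data, the function name is illustrative, and the two-sided P-value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one; the text does not state which two-sided convention was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small tolerance so tables tied with the observed one are included.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical record-positive mothers: (recalled, not recalled) per group.
cases_recalled, cases_missed = 12, 18
noncases_recalled, noncases_missed = 18, 2

p_value = fisher_exact_two_sided(
    cases_recalled, cases_missed, noncases_recalled, noncases_missed
)
print(p_value)
```

Only mothers whose medical record shows the condition enter this table, because sensitivity is defined among record-positive mothers; the analogous table for specificity would use record-negative mothers.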
High predictive values were estimated for Caesarean section and ultrasound, but only four noncase mothers reported not having an ultrasound (Table 3). Negative predictive values for all pregnancy complications except protein or albumin in urine were greater than 75%. Positive predictive values were less than 75% for all conditions except protein or albumin in urine. For both case and noncase mothers, predictive values for infant received supplemental oxygen and infant moved for special treatment were greater than 60%.
Telephone interview data exhibited inverse associations with leukaemia for hypertension and infant received supplemental oxygen, whereas these perinatal conditions were positively associated using medical record data (Table 4). Protein or albumin in urine was inversely associated for medical records but positively associated for telephone interview data. Gestational diabetes was inversely associated for interview and medical record data, whereas increased odds for leukaemia were estimated for all other conditions in both data sources. Adjusting interview data for length of time between index child’s birth and telephone interview made little difference, although there was a minor reversal in association for peripheral oedema and a near null result for gestational diabetes (Table 4). Comparison of unadjusted ORs for hypertension suggested that the maternal telephone interview and medical record ORs differed (bootstrap ratio of unadjusted interview OR vs. unadjusted medical record OR: 0.53 [95% CI 0.30, 0.95]; P-value = 0.03).
We compared maternal telephone interview data with medical record data to assess maternal recall in a case–control study of infant leukaemia. Case and noncase mothers perfectly recalled having a Caesarean section. All other medical conditions we studied exhibited differences between telephone interview responses and medical records, albeit the extent of these differences was estimated imprecisely. Different data sources for maternal and infant characteristics led to qualitative differences in some instances, which appeared especially marked for hypertension.
Our analyses use slightly different sets of mothers for the record-based and interview-based estimates due to missing and ‘don’t know’ responses. Hence, our comparisons represent the total difference between the two, combining inclusion effects (for medical record vs. telephone interview) with recall effects. Due to the minor inclusion differences, however, the inclusion effect appeared negligible; e.g. when we recomputed the hypertension ORs using only mothers with both medical record and interview data, the bootstrap ratio of unadjusted ORs (interview to medical record) became 0.58 [95% CI 0.33, 1.01] (vs. 0.53 [95% CI 0.30, 0.95] in Table 4). Therefore, our discussion focuses on recall effects.
Despite the 37% longer time between birth and interview among noncases, we did not see evidence of systematically poorer recall among noncase mothers. Hypertension was the only condition for which a clear direction of differentiality seemed apparent, with about half the recall of actual hypertension among cases compared to noncases (sensitivity = 46% for cases vs. 90% for noncases, P = 0.003). Nonetheless, given the number of comparisons made, this result could easily be a chance artefact. In fact, the exact test of nondifferentiality for maternal hypertension recall (which simultaneously compares the sensitivities and specificities) gave P = 0.11, meaning that even the direction of differentiality is not well determined by our data.
We know of only two published validation studies of maternal risk factors for childhood leukaemia.1,2 Infante-Rivard and Jacques1 used mother’s self-reported responses as the gold standard for validating radiographic examinations during pregnancy, whereas our gold standard measure was the medical record. Both studies, however, estimated that sensitivity for ultrasound exceeded 95%. On the other hand, Olson et al.2 focused on validating birthweight, reproductive history and maternal medical procedures also using medical records as the gold standard. They coded unrecorded medical record values as if the patient did not have the condition, an alternative approach to our method of coding them as not recorded/not abstracted. As a result, differences between those results and ours are not unexpected. Unfortunately, there was not sufficient information in the paper by Olson and colleagues to allow for calculation of P-values or CIs for these differences. Nonetheless, in accord with our findings, Olson et al. observed perfect recall of Caesarean section and very high recall for ultrasound examinations. Specificities were mostly similar to ours except for peripheral oedema in cases (63% in Olson et al. vs. 47% here). Sensitivities appeared most different for child received oxygen (cases: 64% vs. 25%; noncases 70% vs. 24%), maternal hypertension (cases: 75% vs. 46%; noncases: 59% vs. 90%) and peripheral oedema (cases: 69% vs. 86%; noncases: 68% vs. 89%). Among noncases alone, differences in sensitivities appeared for diabetes mentioned in the medical record (60% vs. 83%) and child moved for special treatment (67% vs. 81%), and among cases alone for toxaemia or pre-eclampsia (75% vs. 47%).
Maternal recall of perinatal conditions has been explored in mothers of healthy children and of children with other illnesses.10–17 In a case–control study of Sudden Infant Death Syndrome, Drews et al.10 found Caesarean section to have high sensitivity (cases = 100%; noncases = 97%) and maternal hypertension to have low sensitivity (cases = 53%; noncases = 61%). Other studies11–17 also suggest that recall of many perinatal events is imperfect, but that method of delivery has high agreement.
The data for our analyses are from the largest case–control study of infant leukaemia to date. Cases and noncases resided throughout the United States and Canada and were diagnosed at COG institutions, providing a near population-based study.18,19 Nonetheless, our analysis involved only 234 out of 348 (67%) eligible case mothers and 215 out of 430 (50%) eligible noncase mothers. These limited participation rates reflect the fact that our analyses required both completion of telephone interview and release of medical records. Consequently, our findings may not reflect the results that would have been found had everyone in our target population participated. Moreover, as with all studies we cannot be sure that associations we observed (even if they are valid) generalise to other populations.
Additional data sources are rarely available to check interview responses. Perinatal medical records have the advantage of being collected prior to the diagnosis of cancer, thereby eliminating much recall bias. However, medical records can be difficult and expensive to collect. Participants (especially healthy individuals) may also be reluctant to permit collection of medical records because of privacy and confidentiality concerns.20 In addition, medical records can have errors. In fact, some maternal and infant characteristics (e.g. morning sickness) may be captured more accurately by self-report than by medical records.2
In addition to maternal recall bias, our agreement study may have other sources of error. Our study depends on the completeness as well as the accuracy of medical records. Lack of information recorded in a medical record does not necessarily imply that a medical condition was not discussed or medically tested, and it is important to distinguish lack of recording from lack of the condition. This can be accomplished by checking each medical record for method of data capture (e.g. specific checkbox acknowledging condition vs. open form to write in the condition), a task unfortunately not feasible in this study due to the multiplicity of forms from different hospitals. Where failure to record a condition that was present leads to the record being read as negative, a mother who correctly reported the condition would be counted as a false positive, and a mother who failed to report it would be counted as a true negative.
Another possible source of error is that each medical record in our study was abstracted only once, by one of two nurse abstractors. Although both nurse abstractors were trained and blinded to case–control status, we cannot rule out reporting errors that could have occurred during the abstraction process. No formal quality control process was implemented; however, data checks were performed during data cleaning. Nonetheless, we would expect these reporting errors to be sporadic rather than systematic, and more often in the form of failure to note a condition. Despite such errors, medical records can still identify problematic recall items, especially if they show conditions that were not reported by the mother (indicating that recall failures are present).
Even with these limitations, sensitivity and specificity estimates from agreement studies like ours can be used as starting points for sensitivity and bias analyses of recall error.6,7,21 The results of our study and previous validation studies2,10–17 can also be useful when designing future case–control studies that will collect perinatal information. While medical procedures such as Caesarean sections and ultrasounds may be recalled accurately, maternal reports of pregnancy and neonatal complications appear to be much less reliable, especially if they are more occult conditions with technical names (e.g. proteinuria or toxaemia). Hence, instead of relying on only one data source for perinatal events, multiple measurement tools may be necessary to construct more accurate perinatal measurements. Agreement studies such as ours can therefore also guide which data source should be used for gathering exposure data for future epidemiological case–control studies, and suggest ways to combine measurements into a single more accurate summary.
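One simple form that such a bias analysis can take is the standard back-calculation of expected true exposure counts from misclassified counts, given assumed sensitivities and specificities. The Python sketch below illustrates that general technique under stated assumptions; it was not performed in this study, all numbers are invented, and the function names are hypothetical. Sensitivity and specificity are allowed to differ between cases and noncases so that differential recall can be explored.

```python
def correct_exposed_count(observed_exposed, total, sens, spec):
    """Expected true number exposed, given a misclassified exposed count.

    Solves observed = sens * true + (1 - spec) * (total - true) for `true`.
    """
    denom = sens + spec - 1
    if denom <= 0:
        raise ValueError("sens + spec must exceed 1 for this correction")
    return (observed_exposed - (1 - spec) * total) / denom


def corrected_or(obs_cases, n_cases, obs_noncases, n_noncases, case_acc, noncase_acc):
    """Odds ratio after back-correcting both groups.

    case_acc and noncase_acc are (sensitivity, specificity) pairs.
    """
    a = correct_exposed_count(obs_cases, n_cases, *case_acc)
    c = correct_exposed_count(obs_noncases, n_noncases, *noncase_acc)
    b, d = n_cases - a, n_noncases - c
    if min(a, b, c, d) <= 0:
        raise ValueError("assumed accuracy is incompatible with the observed counts")
    return (a * d) / (b * c)


# Invented interview counts: 55 of 200 cases and 44 of 200 noncases report exposure.
observed_or = (55 * (200 - 44)) / ((200 - 55) * 44)
adjusted_or = corrected_or(
    55, 200, 44, 200,
    case_acc=(0.8, 0.9),      # assumed (sensitivity, specificity) for cases
    noncase_acc=(0.9, 0.9),   # assumed (sensitivity, specificity) for noncases
)
print(observed_or, adjusted_or)
```

Repeating the calculation over a range of plausible sensitivity and specificity values, for example values spanning the confidence intervals from an agreement study, shows how sensitive an interview-based OR is to the assumed recall error.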
Agreement and validation substudies are valuable but need some forethought to produce useful information for designing future studies and analyses. Ideally, these substudies would be included in the initial study proposals to allow their review before implementation. When designing substudies, we recommend identifying information collected by the best available measurement. Then, use this information to design the tool that is used to obtain the primary measurements (e.g. a telephone interview). For instance, one could gather medical records to understand the type of data collected by the best available measurement. One would then include interview questions that match the content of the best available measurements so that agreement can be examined whenever the latter can be obtained.
This research was supported by NIH R03 CA141403, R01 CA79940, U10 CA13539, U10 CA98543, U10 CA98413, the Children’s Cancer Research Fund (Minneapolis) and NIH P30 CA77598 (University of Minnesota Masonic Cancer Center shared resource: Health Survey Research Center). The authors would like to thank the anonymous reviewers for helpful comments on a previous draft.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute or the National Institutes of Health.
Conflict of interest: none declared.