Birth records are an important source of data for examining population-level birth outcomes, but questions about the reliability of these vital records exist. We sought to assess the reliability of birth certificate data by comparing them with data from a large prospective cohort. Pregnancy, Infection, and Nutrition cohort study participants were matched with their birth certificates to assess agreement for maternal demographics, health behaviours, previous pregnancies and major pregnancy events. Agreement among categorical variables was assessed using percentage agreement and kappa statistics; for continuous variables, Spearman’s correlations and concordance correlation coefficients were used.
The majority of variables had high agreement between the two data sources, especially maternal demographic and birth outcome variables. Variables measuring anaemia, gestational diabetes and alcohol consumption showed the lowest agreement. Agreement for number of cigarettes smoked and number of previous pregnancies differed by education category. For most variables, birth records appear to be a good source of reliable information. Apart from the few variables whose agreement differed by education, agreement did not differ by race or education stratum. Our research further supports the use of birth certificates as a reliable source of population-level data.
Vital records are widely used in research monitoring maternal and child health status in the United States.1–5 Administrative records such as birth certificates are commonly used because they offer several advantages. Birth records represent the total population of births in a given geographical area during a specific time. Because the birth record form is standardised, some information, such as infant birthweight, is collected uniformly across geographical areas and over time. In addition, birth record data are relatively inexpensive to obtain.
Vital records have improved over time, and the amount of information collected on birth certificates has increased dramatically. In 1906, a birth certificate included only about seven fields of information, but by the end of the century some states collected data on >200 items.6 The infant information collected has grown from the child’s name and birth date to include congenital abnormalities and method of delivery.6 Similarly, before 1925 the maternal information collected comprised a woman’s name, address, age, birthplace, occupation and number of previous children.6 Now, information on obstetric procedures, labour complications, delivery methods and medical risk factors treated during the pregnancy is also recorded.
The concerns about using vital records to monitor public health are numerous. They include variable data quality (Table 1), especially for data on maternal health behaviours,7–10 inconsistent vital records data collection11 and reliance on maternal recall of past events.11 In light of these issues, researchers generally recommend caution when interpreting birth certificate data for research.
One approach to assessing the validity and reliability of vital records data has been to match them with other data sources, such as hospital medical records.7,10,12–15 We recently reviewed the literature assessing the quality of vital records data and found that variables describing demographics, prenatal care, pregnancy history, insurance, delivery method and birth outcomes were consistently reported to show good agreement (Table 1). Other variables, including behavioural risk factors, concurrent illnesses or medical conditions, and pregnancy and delivery complications, were reported to show moderate to poor agreement. Because birth records are commonly used as a data source for both outcomes (e.g. infant birthweight) and exposures (e.g. maternal age) in maternal and child health research, the validity and reliability of the reported information are crucial.
Of particular concern to researchers using vital records for health disparities work is the possibility that birth record data are reported differentially by maternal socioeconomic status or race/ethnicity. For instance, one recent study found that smoking behaviour was reported differentially by maternal education level and infant birthweight,16 while another found that limited English proficiency was associated with under-reporting of birth certificate elements.17 Both studies noted that differential reporting could produce biased associations.16,17 Given persistent racial and social class disparities in maternal and child health outcomes, differential vital record reporting may account for some portion of the disparate associations noted in the literature. To assess this possibility, and to assess the reliability of selected variables in the North Carolina vital records data among our study population, we compared vital records with data from the Pregnancy, Infection, and Nutrition (PIN) cohort study.
In this study, we assessed the extent of agreement between the vital records and the cohort study data for selected demographic, socioeconomic, health behaviour, maternal complication and birth outcome variables. We further assessed whether agreement differed by race and by maternal educational level.
Data were from the PIN cohort study. Between 2000 and 2004, the study recruited 2006 women before 20 weeks’ gestation through the residents’ and private physicians’ obstetrics clinics of the University of North Carolina Hospitals. Women were excluded if they were <16 years old, did not speak English, had a multiple pregnancy, did not plan to continue care or deliver at the study site, or did not have a telephone number at which they could be reached for interviews. Study participants completed two self-administered questionnaires and two telephone interviews. Participants consented to medical chart review, and trained PIN project personnel abstracted information on medical conditions and clinical tests from the participants’ medical charts. Further details on the study methodology can be found elsewhere.18,19
We obtained North Carolina birth records for 2001–05 from the North Carolina State Center for Vital Statistics for the five counties containing the majority of PIN participants (Alamance, Chatham, Durham, Orange and Wake). PIN participants were matched to their birth records using the mother’s name, her residential address, and the birth date and sex of the child. Of the 95 261 birth records available, 1685 were successfully matched to PIN participants, an 87% match rate among PIN participants for whom delivery information was available.
Researchers often use birth records as a source of geocodable outcome data, and in this study we chose to assess variables that potentially lie on the causal pathway between neighbourhoods and health, including maternal demographic and behavioural variables. We were also interested in health conditions that develop over the course of pregnancy, such as anaemia, gestational diabetes and pregnancy-induced hypertension, which could be affected by neighbourhood conditions and stressful environments.20
A priori, we chose the cohort data to serve as the ‘gold standard’ for reporting. The cohort data were collected during the pregnancy, not after the birth outcome. Also, research interviewers worked with the participants for an extended period of time, developing trust that might promote more honest responses from the participants.
We did not assess agreement on multiple gestation because women with multiple pregnancies were excluded from the PIN study. We also did not compare month of entry into prenatal care, number of prenatal care visits or insurance information, because these variables were largely uniform in the PIN dataset: women were recruited early in pregnancy and all PIN women had some form of insurance.
Most continuous variables were constructed similarly in the PIN study and on the birth records; only two were defined slightly differently. In the PIN study, women were asked the average number of cigarettes smoked per day during months 1–6 of the pregnancy, whereas the birth record asked about smoking over the full pregnancy. For the number of previous pregnancies, the PIN study included stillbirths in its count, whereas the birth records did not.
Differences in categorical variable construction between the PIN study and the birth records were resolved by collapsing the original categories into the most detailed categories common to both data sources. For instance, maternal race became White non-Hispanic, Black non-Hispanic or other (hereafter referred to simply as White, Black and other), marital status became married or not married, and alcohol consumption became <5 or ≥5 drinks per week while pregnant. The PIN study recorded the presence of anaemia during each trimester of pregnancy, whereas the birth records ask about anaemia during the entire pregnancy; therefore, we compared the presence or absence of anaemia at any time during pregnancy.
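The category harmonisation described above can be sketched as a few small recoding functions. This is a minimal Python illustration only: the input codes (lower-case race strings, a boolean Hispanic flag, per-trimester True/False/None anaemia flags) and the function names are hypothetical and do not reflect the actual PIN or North Carolina birth record codebooks. Treating a trimester with no information (None) as "no anaemia reported" is also an assumption.

```python
from typing import Optional, Sequence


def race_category(race: str, hispanic: bool) -> str:
    """Collapse to White non-Hispanic, Black non-Hispanic or other (hypothetical input codes)."""
    race = race.strip().lower()
    if not hispanic and race == "white":
        return "White"
    if not hispanic and race == "black":
        return "Black"
    return "other"


def marital_category(status: str) -> str:
    """Collapse any marital-status coding to married / not married."""
    return "married" if status.strip().lower() == "married" else "not married"


def alcohol_category(drinks_per_week: float) -> str:
    """Collapse alcohol intake while pregnant to <5 or >=5 drinks per week."""
    if drinks_per_week < 0:
        raise ValueError("drinks_per_week must be non-negative")
    return ">=5" if drinks_per_week >= 5 else "<5"


def any_anaemia(trimester_flags: Sequence[Optional[bool]]) -> bool:
    """Collapse per-trimester anaemia flags to presence at any time in pregnancy.

    None (no information for that trimester) is treated as 'not reported'.
    """
    return any(flag is True for flag in trimester_flags)
```

For example, `any_anaemia([False, True, None])` returns `True`, so a woman anaemic only in the second trimester is compared as "anaemia present" against the birth record.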
Categorical variables were compared using percentage agreement and kappa statistics, and continuous variables using Spearman’s correlations and concordance correlation coefficients (CCC). The kappa statistic estimates chance-corrected agreement by subtracting the degree of concordance expected by chance alone from the observed concordance.21 We used the unweighted kappa statistic, which gives no ‘partial credit’ for near-agreement in multicategory variables. A kappa value of 0 corresponds to the degree of concordance expected under the null hypothesis that two scores agree only by chance, a value of +1 indicates perfect agreement, and negative values indicate less agreement than expected by chance. The CCC is a comparable statistic for assessing agreement on a continuous measure.22,23 It is the product of the Pearson correlation coefficient r, which measures precision, and a bias-correction factor, which measures accuracy. The 95% confidence intervals (CI) were estimated by bootstrapping because asymptotic intervals perform poorly for estimates close to 1.00 and can yield upper limits >1, outside the logical range of the statistic. Each interval was taken as the central 95% (2.5th to 97.5th percentiles) of the statistic across 1000 resamples drawn with replacement from the observed data.24 CIs for the Spearman’s correlations were calculated in the same way.
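The statistics described above can be sketched in standard-library Python. This is a minimal illustration, not the code used in this study (the authors’ software is not stated): `cohen_kappa` computes the unweighted kappa κ = (p_o − p_e)/(1 − p_e), `lin_ccc` computes Lin’s CCC using moments with divisor n, and `percentile_bootstrap_ci` resamples paired observations with replacement and takes the 2.5th and 97.5th percentiles of the replicates by a simple nearest-rank rule. The function names are ours, and skipping resamples on which a statistic is undefined (for example, both sources recording the same single category) is an assumption.

```python
import math
import random
from collections import Counter
from statistics import mean
from typing import Callable, Hashable, Sequence, Tuple


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e); no partial credit."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("need two non-empty sequences of equal length")
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    # chance agreement implied by the two sets of marginal proportions
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both sources use one category")
    return (p_o - p_e) / (1 - p_e)


def lin_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's CCC = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), divisor n."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("need two non-empty sequences of equal length")
    mx, my = mean(x), mean(y)
    sxx = sum((u - mx) * (u - mx) for u in x) / n
    syy = sum((v - my) * (v - my) for v in y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both variables are equal constants")
    return 2 * sxy / denom


def percentile_bootstrap_ci(
    x: Sequence, y: Sequence, stat: Callable[[Sequence, Sequence], float],
    n_boot: int = 1000, level: float = 0.95, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI from resampling (x_i, y_i) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            reps.append(stat([x[i] for i in idx], [y[i] for i in idx]))
        except ValueError:
            continue  # skip degenerate resamples where the statistic is undefined
    if not reps:
        raise ValueError("no valid bootstrap replicates")
    reps.sort()
    alpha = (1 - level) / 2
    lower = reps[math.floor(alpha * (len(reps) - 1))]
    upper = reps[math.ceil((1 - alpha) * (len(reps) - 1))]
    return lower, upper
```

Because the percentile interval is built from replicates of the statistic itself, it cannot extend beyond the statistic’s possible range (for example, above 1), which is the property the text relies on for estimates near 1.00.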
In addition, we investigated whether agreement differed by race or education stratum by comparing the race- and education-stratified kappas (for categorical variables) and CCCs (for continuous variables) and examining whether their 95% CIs overlapped.
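The CI-overlap comparison can be sketched as a simple interval check. In this minimal Python illustration the function name is ours, and the example intervals are the stratum-specific 95% CIs reported in the results (Tables 3 and 4). Non-overlap of 95% CIs is a conservative screen: two estimates can differ significantly even when their intervals overlap somewhat.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(ci_a: Interval, ci_b: Interval) -> bool:
    """True if two closed intervals (lower, upper) share at least one point."""
    (lo_a, hi_a), (lo_b, hi_b) = ci_a, ci_b
    return lo_a <= hi_b and lo_b <= hi_a


# Marital-status kappa, education <12 vs >12 years: no overlap, flagged as differing
print(intervals_overlap((0.59, 0.85), (0.86, 0.93)))  # False
# Maternal weight-gain CCC, White vs Black women: no overlap
print(intervals_overlap((0.84, 0.87), (0.66, 0.72)))  # False
# Pregnancy-induced hypertension kappa, lowest vs highest stratum estimate: overlap
print(intervals_overlap((0.60, 0.75), (0.56, 0.96)))  # True
```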
The PIN study recruited 2006 women, of whom 69% classified themselves as White. Over 71% were married and 56% had at least 16 years of education. Of the PIN study women successfully matched to birth records, 70% classified themselves as White, 73% were married and 60% had at least 16 years of education, indicating that the matched women were similar to the full PIN cohort on these characteristics. Other characteristics of the matched women are given in Table 2.
The majority of responses given in the PIN study matched those recorded on the birth records. Of the eight categorical variables we examined, agreement exceeded 93% for seven, and four had a kappa statistic of at least 0.80. Anaemia during pregnancy had the lowest percentage agreement (71%) and a kappa statistic of 0.18 [95% CI 0.14, 0.23]. For the continuous variables, all CCCs were at least 0.80, and four were above 0.95. The remaining variables, years of education, maternal weight gain and number of cigarettes smoked during pregnancy, had CCCs of 0.86 [95% CI 0.85, 0.87], 0.82 [95% CI 0.80, 0.83] and 0.80 [95% CI 0.78, 0.82], respectively. The Spearman’s correlations were similar, with five of the seven above 0.90 and two between 0.80 and 0.90.
We evaluated agreement between the cohort study and vital records data stratified by White and Black race (Table 3). For both Whites and Blacks, marital status, birthweight and preterm birth had percentage agreement above 80% and kappa statistics above 0.80. Pregnancy-induced hypertension/eclampsia and gestational diabetes had percentage agreement above 90% but kappa statistics below 0.75. For both Whites and Blacks, anaemia had percentage agreement below 75% and a kappa statistic <0.20. Among the continuous variables, maternal age, years of education, number of weeks of gestation and birthweight had Spearman’s correlation coefficients of 0.90 or greater and CCCs of at least 0.80 for both Whites and Blacks. For the number of previous pregnancies, both Whites and Blacks had Spearman’s correlation coefficients >0.95, but the CCCs were close to 0.60. Both race categories also had similar Spearman’s correlations, approximately 0.80, for the number of cigarettes women reported smoking during pregnancy. The CCC for White women showed a similar level of agreement, 0.81 [95% CI 0.79, 0.83]; for Black women, however, the CCC was lower (0.68 [95% CI 0.61, 0.74]). Maternal weight gain had higher agreement among White women than Black women (CCC 0.85 [95% CI 0.84, 0.87] vs. 0.72 [95% CI 0.66, 0.72]).
We analysed agreement for the same variables stratified by women’s educational attainment, categorised as <12 years, 12 years or >12 years of education (Table 4). For race, birthweight and preterm birth, percentage agreement exceeded 90% and kappa statistics exceeded 0.90 within all education strata. The kappa statistic for marital status varied by years of education, with more educated women showing higher agreement between the PIN and vital records data (education <12 years: kappa 0.73 [95% CI 0.59, 0.85]; education >12 years: kappa 0.89 [95% CI 0.86, 0.93]).
As in the race-stratified analysis, gestational diabetes and pregnancy-induced hypertension had percentage agreement above 90% but lower kappa statistics, ranging across education strata from 0.68 [95% CI 0.60, 0.75] to 0.79 [95% CI 0.56, 0.96] for pregnancy-induced hypertension and from 0.06 [95% CI 0.00, 0.27] to 0.17 [95% CI 0.00, 0.52] for gestational diabetes. Anaemia had percentage agreement below 75% and a kappa statistic below 0.30 in all education strata. Alcohol intake could not be evaluated because few women reported consuming alcohol while pregnant. Maternal age and birthweight showed the highest correlations among the continuous variables, with Spearman’s correlation coefficients and CCCs >0.98 in all education strata. Gestational age also had correlations of ≥0.90 across all education strata. As in the race-stratified results, for the number of previous pregnancies each education stratum had a Spearman’s correlation >0.95 but a CCC below 0.65, except for the lowest educated group (CCC 0.75 [95% CI 0.68, 0.80]). Agreement for the number of cigarettes smoked during pregnancy was also higher for women with <12 years of education than for those with 12 or more years (education <12 years: CCC 0.83 [95% CI 0.76, 0.88]; education = 12 years: CCC 0.72 [95% CI 0.65, 0.78]; education >12 years: CCC 0.77 [95% CI 0.74, 0.79]). Finally, for maternal weight gain, women with at least 12 years of education had a CCC of 0.85 [95% CI 0.83, 0.86], whereas women with <12 years of education had a CCC of 0.72 [95% CI 0.63, 0.80].
We used data from the PIN prospective cohort study and North Carolina birth records to assess the reliability of information obtained from the birth certificate. As in previous studies, we found high agreement for maternal demographic and birth outcome variables.9,10,14 We also found moderate agreement for behavioural risk factors and medical events, with the exception of alcohol consumption, anaemia and gestational diabetes, which showed poor agreement. This level of agreement is similar to that reported in some studies of vital record reliability7,9,10,12,16 but better than in others.8,13,14 As in previous research,7–9 alcohol consumption showed low agreement between the two data sources; however, <1% of women reported consuming at least five drinks per week while pregnant, and this low prevalence affected the agreement statistics.
Overall, anaemia showed poor percentage agreement and a low kappa statistic. This could be due to the way the variable was constructed. For the PIN study, women’s medical records were checked for any report of anaemia in each trimester of pregnancy, whereas the birth record captured anaemia only at the end of pregnancy. Women may not remember to report a brief anaemic period early in their pregnancy. Whatever the cause, anaemia during pregnancy as reported on the birth record was not a reliable variable in our study. Gestational diabetes was rare, reported by <4% of the sample, and this low prevalence likely contributed to its low agreement.
The only variable that showed a difference in reliability by race in our study cohort was maternal weight gain: agreement between weight gain reported in the PIN study and on the birth records was higher for Whites than for Blacks. Agreement for maternal weight gain also differed by education, with women with <12 years of education showing lower agreement than women with 12 or more years of education.
The majority of variables in this study showed no apparent difference in reporting by education level, and we generally found similar patterns of agreement across all education categories. More educated women had higher agreement for marital status but lower agreement for the number of cigarettes smoked and for the number of previous pregnancies. The latter may reflect the exclusion of stillbirths from the birth record count of previous pregnancies: more educated women may delay pregnancy, increasing their chances of pregnancy difficulties such as stillbirth, which the PIN study counted but the birth record did not. Finding differential reporting of birth record elements by educational strata is consistent with other reported research.17
Some variables had high percentage agreement but low kappa statistics, which indicates that high agreement would be expected by chance alone, leaving little room for agreement beyond what one would expect from random assignment. This generally occurs when one category of a variable is very common (equivalently, when the other is rare). For example, consider a binary variable with 90% of values equal to 1 in both data sources, and suppose that these values are assigned completely at random (i.e. the null hypothesis of the kappa statistic holds, because the value in one data source is independent of the value in the other). The proportion in agreement will then be (0.9 × 0.9) + (0.1 × 0.1) = 0.82, even though the kappa statistic equals zero.
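The arithmetic in this example can be checked directly. This is a minimal Python sketch using the standard definition κ = (p_o − p_e)/(1 − p_e); the function names are ours, and the observed agreement of 0.93 and the 50/50 split in the last two lines are hypothetical values chosen to show how the same percentage agreement yields a lower kappa when one category dominates.

```python
def chance_agreement(p_a: float, p_b: float) -> float:
    """Expected agreement on a binary variable when the two sources are independent.

    p_a and p_b are the proportions of 1s in each data source.
    """
    return p_a * p_b + (1 - p_a) * (1 - p_b)


def kappa_from_proportions(p_observed: float, p_expected: float) -> float:
    """Cohen's kappa from observed and chance-expected agreement."""
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


p_e_skewed = chance_agreement(0.9, 0.9)    # 90% of values are 1 in both sources
p_e_balanced = chance_agreement(0.5, 0.5)  # hypothetical 50/50 split

print(round(p_e_skewed, 2))                                    # 0.82
print(kappa_from_proportions(p_e_skewed, p_e_skewed))          # 0.0
print(round(kappa_from_proportions(0.93, p_e_skewed), 2))      # 0.61
print(round(kappa_from_proportions(0.93, p_e_balanced), 2))    # 0.86
```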
More information is being collected on birth records than ever before, and perinatal researchers continue to be interested in using these data for surveillance and for estimating health associations. The additional variables may allow researchers to begin exploring possible mechanisms linking maternal demographics, health behaviours and pregnancy events to birth outcomes. As interest in contextual and neighbourhood-level analyses has grown, vital records have increasingly been recognised as a source of readily available geocodable data. The combination of geocoded addresses and sensitive data, however, calls for careful attention to the privacy and confidentiality not only of individual women but also of their neighbourhoods. We do not suggest that the quantity and nature of the data collected on today’s birth certificates are a drawback; rather, we want to stress the importance of keeping individual and neighbourhood information confidential.
This study has several strengths. We were able to examine agreement stratified by race and by highest level of education achieved. We included counties with urban, suburban and rural areas. Unlike previous research linking birth certificate data with hospital discharge summaries, which have their own data-quality problems, we linked birth certificates with data sources in which we have considerable confidence. PIN study interviewers received substantial training in collecting sensitive and other data reliably and built rapport with the women they interviewed.
One important limitation of this study concerns how well PIN participants represent the general population in this area of North Carolina.25 Although we compared PIN cohort data only with PIN participants’ own birth records, the cohort’s limited generalisability may hinder broad inferences about vital record reliability for all women. Additionally, only 87% of PIN participants with available delivery information were matched with birth certificates and included in this analysis. Some variables had low prevalence, which hindered our assessment of agreement; in particular, the low prevalence of alcohol consumption and gestational diabetes contributed substantially to the poor agreement for those variables. Further, data abstracted from medical records are not necessarily perfectly valid or reliable, so agreement between medical record abstractions and birth records is less informative to the extent that the former are not an ideal gold standard. In the PIN study, trained study personnel abstracted the relevant information from medical charts, reducing the likelihood of transcription errors and reproduction of questionable values.
In conclusion, for most variables, birth records appear to be a good source of reliable information. Agreement for the majority of variables did not differ by race, which suggests that differential reporting is unlikely to contribute meaningfully to racial disparities in maternal health behaviours, medical events and birth outcomes. Agreement was also similar across education strata, except for maternal weight gain, cigarette smoking and marital status. We support the use of birth records for studying how individual sociodemographic and health behaviour characteristics are influenced by social and environmental factors.
Funding for this study was provided by the Department of Health and Human Services, Health Resources and Services Administration, Maternal and Child Health Bureau (#1 R40MC07841-01-00) and the National Institutes of Health (NIH)/National Cancer Institute (#CA109804-01). Data collection was supported by NIH/National Institute of Child Health and Human Development (#HD37584) and NIH General Clinical Research Center (#RR00046). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The PIN Study is a joint effort of many investigators and staff members whose work is gratefully acknowledged.