We updated Miranda et al.’s (2002) analysis using additional blood lead surveillance data and updated tax parcel data from 18 North Carolina counties, and by substituting 2000 Census data for the 1990 data used in the earlier analysis (U.S. Census Bureau 2000). The 18 counties span the state and represent very different populations, climates, economies, and housing stocks. They are Buncombe, Carteret, Craven, Cumberland, Durham, Edgecombe, Forsyth, Guilford, Henderson, Lenoir, Mecklenburg, Nash, New Hanover, Orange, Stanly, Wake, Wayne, and Wilson Counties, as shown in the accompanying map. Methods for receiving, storing, linking, and analyzing data and presenting results related to this study were all governed by a research protocol approved by the Duke University Institutional Review Board.
Map of 18 counties in North Carolina included in the analysis.
The models include demographic data from the 2000 Census at the block group and block levels (U.S. Census Bureau 2000). We overlaid the Census data on publicly available tax assessor data from each of the counties. In the model, we focused on residential tax parcels, which typically include single- or multifamily housing structures. Digital tax assessor data vary from county to county, but models for each county include year of construction.
We characterized the relationship between BLLs and housing and demographic characteristics by geocoding to the tax parcel the blood lead test results for children who were 9 months to 6 years of age and who were tested between 1995 and 2003. Eighty-nine percent of blood lead samples were capillary draws, with only 5% reported as venous draws. The sample collection method was reported as unknown for the remaining 6% of lead test results. Access to the blood lead data was granted via a negotiated confidentiality agreement with the North Carolina Childhood Lead Poisoning Prevention Program. We geocoded the lead surveillance data to the individual tax parcel unit (as opposed to larger geographic units such as census block groups or tracts) using the tax assessor databases.
Parcel geocoding is a critical step in developing these highly resolved spatial models. Geocoding refers to the process of assigning a geographic coordinate (latitude and longitude) to observations from one data set (in this case, blood lead screens) using reference data (in this case, tax parcel data). This process facilitates the linking of multiple data sets via spatial relationships. Thus, we linked an environmental exposure biomarker (BLL test results) to a polygonal areal unit (tax parcel) via the residential address common to both data sets.
We had 467,204 BLL test results for this population, representing 336,736 individual children. We attempted to geocode all records with complete addresses, defined as addresses with at least street number, street name, and street type, to county tax parcel data. Parcel address information varied by county in both quality and level of completeness. Quality of surveillance address information was also not uniform across local health departments, clinics, and laboratories. Percentages of records geocoded by county ranged from 42.5% to 89.0% for all records and from 56.1% to 96.2% for records with complete addresses. Tax assessor address data tend to be of poorer quality for housing authority parcels and other multifamily complexes, leading to lower match rates for children who reside in these tax parcels.
We implemented three stages of geocoding, described below. Level I geocoding was an exact match of “as-reported” address information to reference parcel data. With the North Carolina lead screening data, we geocoded about 36.4% of all records by the level I process, which took 7–9 days with one trained staff member working 8 hr/day. Level II geocoding matched data after standardizing the lead screening data to reflect the reference data structure (e.g., by converting all versions of “street,” such as str., street, and st, to ST). This process took about 20 days and yielded an additional 10.4% of records. Level III geocoding processed records, one by one, using visual analysis of and matching to tax parcel address data. This stage required an additional 3–4 months and led to an additional 22.0% of records being geocoded. Even after the most intensive level III geocoding, 31.2% of records remained ungeocoded, although one-third of these did not include a complete address.
Geocoding processes for 18 North Carolina counties.
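The stage-by-stage percentages reported above can be tallied to confirm that they are internally consistent. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the only inputs are the figures quoted in the text.

```python
# Shares of all records geocoded at each stage, as reported in the text (percent).
stage_share = {"level I": 36.4, "level II": 10.4, "level III": 22.0}

# Running total of records geocoded after each stage.
cumulative = {}
running = 0.0
for stage, share in stage_share.items():
    running = round(running + share, 1)
    cumulative[stage] = running

# Records still ungeocoded after all three stages.
ungeocoded = round(100.0 - running, 1)

# About one-third of the ungeocoded records lacked a complete address.
incomplete_share = round(ungeocoded / 3, 1)

print(cumulative)
print(ungeocoded, incomplete_share)
```

Under these figures, the three stages together geocode 68.8% of records, which leaves the reported 31.2% ungeocoded; roughly 10.4% of all records were therefore ungeocoded for lack of a complete address.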
In general, the level I stage geocoded records rapidly, but in some counties the number of records geocoded by this process alone may not be sufficient. Level II geocoding provided additional data with little additional effort, whereas level III geocoding required substantial time and effort and might be more prone to errors in positional accuracy, that is, locating an observation at the wrong parcel. Such errors could stem from the quality of the recorded address or from the manual matching process itself. Geocoding outcomes differed substantially across individual counties. For instance, the level III process geocoded approximately 40% of records in Carteret and Lenoir Counties but no records in Henderson County.
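The level I and level II stages can be illustrated with a short sketch. This is not the procedure used in the study, which relied on tax assessor databases and GIS software; it is a simplified illustration under stated assumptions. The abbreviation table, the `standardize` and `geocode` functions, and the sample addresses are all hypothetical, and the parcel addresses are assumed to be stored already in the standardized form.

```python
import re

# Hypothetical abbreviation table; a real one would follow each county's parcel data.
STREET_TYPES = {
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "AVENUE": "AVE", "AVE": "AVE",
    "ROAD": "RD", "RD": "RD",
}


def standardize(address):
    """Uppercase, drop punctuation, collapse spaces, and normalize the street type."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    if tokens and tokens[-1] in STREET_TYPES:
        tokens[-1] = STREET_TYPES[tokens[-1]]
    return " ".join(tokens)


def geocode(records, parcels):
    """Return {record_id: (parcel_id, level)} for records matched at level I or II."""
    by_address = {addr: pid for pid, addr in parcels.items()}
    matches = {}
    for rid, addr in records.items():
        if addr in by_address:
            # Level I: the as-reported address matches the parcel address exactly.
            matches[rid] = (by_address[addr], "I")
        else:
            # Level II: match after standardizing the screening address.
            std = standardize(addr)
            if std in by_address:
                matches[rid] = (by_address[std], "II")
    return matches


# Hypothetical reference parcels and screening records.
parcels = {"P1": "101 MAIN ST", "P2": "22 OAK AVE"}
records = {"r1": "101 MAIN ST", "r2": "22 Oak Avenue", "r3": "5 Elm Str."}

matches = geocode(records, parcels)
unmatched = [rid for rid in records if rid not in matches]
print(matches, unmatched)
```

Records left in `unmatched` correspond to those that would pass to the manual, record-by-record level III review.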
After all three geocoding processes were complete, we included only the record with the highest BLL at each parcel for each child in the analysis. We used this conservative, more protective selection method because the highest results provide information regarding levels of exposure to biologically available lead. This method is consistent with the approach used by Lanphear et al. (1998). Because of right skewness, the natural logarithm of BLLs from the blood lead data served as the dependent variable in our multivariate statistical analysis, in which we used a weighted regression model to avoid having model output influenced excessively by tax parcels with multiple records. We performed the regression with clustering by block group to adjust standard errors for correlation within the same block group. Explanatory variables included median household income, percentage of households receiving public assistance, percent African Americans, and percent Hispanics, all taken from the 2000 U.S. Census. We also used year of construction from the tax assessor data and accounted for seasonal changes in lead exposure by including three dummy variables for seasons when the blood samples were taken (winter as reference) (Miranda et al. 2007). We also included dummy variables for each of the counties. Using ArcGIS, version 9.1 (ESRI, Redlands, CA), we combined the spatially linked data listed above into a single GIS database to prepare for statistical analysis. We performed statistical analysis using Stata, version 9.0 (StataCorp., College Station, TX).
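The record selection and variable construction described above can be sketched as follows. This is an illustration only, not the Stata code used in the study. The sample records are hypothetical, the month-to-season cut points are an assumption, and the weighting scheme (each parcel contributes a total weight of 1) is one plausible reading of the weighted regression, which the text does not specify.

```python
import math
from collections import Counter

# Hypothetical test records: (child_id, parcel_id, BLL in ug/dL, month of blood draw).
tests = [
    ("c1", "P1", 4.0, 1),
    ("c1", "P1", 7.0, 7),
    ("c2", "P1", 3.0, 4),
    ("c3", "P2", 10.0, 10),
]


def season(month):
    """Assumed season cut points; the source does not define them."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


# Keep only the highest BLL for each child at each parcel.
highest = {}
for child, parcel, bll, month in tests:
    key = (child, parcel)
    if key not in highest or bll > highest[key][0]:
        highest[key] = (bll, month)

# Assumed weighting: retained records at a parcel share a total weight of 1.
per_parcel = Counter(parcel for _, parcel in highest)

rows = []
for (child, parcel), (bll, month) in highest.items():
    s = season(month)
    rows.append({
        "child": child,
        "parcel": parcel,
        "ln_bll": math.log(bll),        # dependent variable: natural log of BLL
        "spring": int(s == "spring"),   # winter is the reference season
        "summer": int(s == "summer"),
        "fall": int(s == "fall"),
        "weight": 1.0 / per_parcel[parcel],
    })

for row in rows:
    print(row)
```

The resulting rows would then be joined to the Census and tax assessor covariates before fitting the weighted regression with standard errors clustered by block group.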
We ran three models using combinations of data from the different geocoding processes: a) level I geocoding only, b) levels I and II geocoding, and c) levels I–III geocoding. We compared the results from the models to investigate how much the additional data from more intensive geocoding processes improve performance of childhood lead exposure risk models in identifying areas of elevated lead exposure risk.
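The three model samples are cumulative: each adds the records from the next geocoding stage to those already included. A minimal sketch of how such samples might be assembled is shown below, assuming each geocoded record carries a tag for the stage that matched it; the record structure and names are hypothetical.

```python
# Hypothetical geocoded records, each tagged with the stage (1, 2, or 3) that matched it.
geocoded = [
    {"id": "r1", "level": 1},
    {"id": "r2", "level": 2},
    {"id": "r3", "level": 3},
    {"id": "r4", "level": 1},
]

# Model a uses level I only, model b adds level II, and model c adds level III.
MAX_LEVEL = {"a": 1, "b": 2, "c": 3}


def model_sample(records, max_level):
    """Return the records geocoded at or below the given stage."""
    return [r for r in records if r["level"] <= max_level]


samples = {name: model_sample(geocoded, lvl) for name, lvl in MAX_LEVEL.items()}
sizes = {name: len(sample) for name, sample in samples.items()}
print(sizes)
```

Fitting the same model specification to each sample allows differences in results to be attributed to the additional records rather than to changes in the model.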