The validation effort identified 2,208 open food outlets, including 160 supermarkets/grocery stores, 504 convenience stores, 120 dollar/variety stores, 79 drug stores, 36 specialty stores, 650 full-service restaurants, 312 franchised limited-service restaurants, and 347 nonfranchised limited-service restaurants. Fifty-two percent of all food outlets were located in Richland County, an urban area.
shows the results of the validation effort relative to the secondary data sources. The DHEC database had the largest number of listed outlets (n = 1,694), including 417 stores and 1,277 restaurants. InfoUSA listed 1,657 outlets (672 stores and 985 restaurants), and D&B listed 1,573 (751 and 822, respectively). Of the outlets listed by the DHEC, only about 11% could not be confirmed by the field census, because they were either not found (6.6%) or closed (4.2%). The InfoUSA database was similar: Approximately 14% of listed outlets were not confirmed, largely because they were not found (9%) or closed (3.5%). In contrast, D&B had the highest proportion of outlets that could not be confirmed (22%), with 12.3% not being found, 2.3% not being found because of a post office box address, and 7.6% being closed.
We found 183 food outlets during the validation effort that were not listed in any of the 3 data sources. The number of outlets newly found ranged from 696 for the DHEC to 774 for InfoUSA to 985 for D&B. The majority of outlets discovered relative to the DHEC were stores, which is not surprising since many stores do not fall under the licensing regulations enforced by the DHEC. However, 183 restaurants were discovered that were not listed by the DHEC. The InfoUSA database was missing 349 stores and 425 restaurants. The D&B database was missing 330 stores and a very large number of restaurants (n = 655). Combining D&B and InfoUSA data sources would have yielded 2,170 unique, listed outlets (944 stores and 1,226 restaurants), of which 1,727 were found and open during the field survey (data not shown). Relative to this combined listing, 481 new outlets were discovered during fieldwork.
Validity statistics are shown in . For all outlets combined, the sensitivity (i.e., the ability to capture food outlets that truly existed in the area) was moderate to fair (68% for the DHEC, 65% for InfoUSA, and 55% for D&B). Both D&B and InfoUSA exhibited moderate sensitivities for food stores (63% for D&B and 61% for InfoUSA) and ranked significantly better than the DHEC (43%). This implies notable undercounting of existing stores for both commercial databases—approximately 37% for D&B and 39% for InfoUSA. Combining the 2 commercial databases would have resulted in a significant improvement in sensitivity for food store identification (81%). Likewise, for supermarkets and grocery stores, all 3 databases had similar levels of sensitivity, ranging from 71% to 76%. The combination of InfoUSA and D&B data would have resulted in a marked improvement in supermarket sensitivity (90%). Combination of DHEC and InfoUSA data or DHEC and D&B data or data from all 3 sources would have resulted in very good to excellent sensitivity for the identification of supermarkets: 86%, 91%, and 97%, respectively (data not shown). No data are shown for DHEC with respect to dollar/variety stores and drug stores/pharmacies because of the DHEC's focus on licensing prepared food outlets. Had we excluded dollar/variety stores and drug stores/pharmacies entirely from the DHEC analysis, the sensitivity of the overall food store category would have improved to 54% (95% confidence interval: 51, 58). Of the other types of food stores, D&B and InfoUSA demonstrated very good sensitivities at 80% or above for drug stores and pharmacies.
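As an arithmetic check, the overall and store-level sensitivities can be reproduced from the counts reported above. The sketch below assumes that sensitivity is the number of open outlets that were listed divided by all open outlets identified in the field census, with the "newly found" counts treated as the outlets missed by each database. The function and variable names are illustrative and are not taken from the study.

```python
# Counts taken from the text; names are illustrative, not the authors' code.
TOTAL_OPEN = 2208   # all open outlets identified by the field census
TOTAL_STORES = 899  # 160 + 504 + 120 + 79 + 36 open food stores

# Open outlets found in the field but not listed in each database
missed_all = {"DHEC": 696, "InfoUSA": 774, "D&B": 985}
missed_stores = {"InfoUSA": 349, "D&B": 330}


def sensitivity(total_open: int, not_listed: int) -> float:
    """Share of truly open outlets that the database listed."""
    return (total_open - not_listed) / total_open


overall = {name: sensitivity(TOTAL_OPEN, n) for name, n in missed_all.items()}
stores = {name: sensitivity(TOTAL_STORES, n) for name, n in missed_stores.items()}

for name, value in overall.items():
    print(f"all outlets, {name}: {value:.0%}")
for name, value in stores.items():
    print(f"stores, {name}: {value:.0%}")
```

Under these assumptions the rounded results agree with the reported values: 68%, 65%, and 55% for all outlets, and 61% (InfoUSA) and 63% (D&B) for stores.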
Validity of Food Outlet Locations Listed in Secondary Data Sources in an 8-County Region as Compared With a Field Census, South Carolina, 2008–2009
With respect to restaurants (lower portion of ), the DHEC database had very good sensitivity of 86% (i.e., an undercount, or discovery rate, of only 14%). InfoUSA (67%) and D&B (50%) performed significantly worse. The ranking of the 3 databases was consistent across restaurant types. Combining the 2 commercial databases would have resulted in a significant increase in sensitivity compared with using either one alone, but sensitivity would not have reached the level of the DHEC database. However, the combination of DHEC data with InfoUSA or D&B data or the combination of all 3 databases would have resulted in excellent sensitivity values for restaurants (91%, 92%, and 94%, respectively; data not shown).
also shows PPVs, which can also be interpreted as verification rates—that is, the likelihood that a listed food outlet actually existed and was open. PPVs ranged from good to very good: 78% for D&B, 86% for InfoUSA, and 89% for DHEC, for all food outlets combined. For stores, the PPV was highest (92%) for the DHEC database, significantly better than for any other database (D&B, 76%; InfoUSA, 82%). For restaurants, DHEC and InfoUSA data performed equally well with respect to PPV (88% and 90%, respectively), with D&B performing significantly worse (79%). This ranking between the databases remained consistent for all 3 restaurant types.
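The overall PPVs can be approximated in the same way from the listed and newly found counts. The sketch below assumes that the number of confirmed (listed and open) outlets equals the 2,208 open outlets minus those newly found relative to each database; this is an inference from the reported counts, not a figure taken from the tables, so the results should be expected to match the reported PPVs only to within about 1 percentage point. All names are illustrative.

```python
# Counts taken from the text; the confirmed counts are inferred, not reported.
TOTAL_OPEN = 2208
listed = {"DHEC": 1694, "InfoUSA": 1657, "D&B": 1573}
newly_found = {"DHEC": 696, "InfoUSA": 774, "D&B": 985}
reported_ppv = {"DHEC": 89, "InfoUSA": 86, "D&B": 78}  # percent, from the text


def ppv(confirmed_open: int, n_listed: int) -> float:
    """Share of listed outlets that were found and open in the field."""
    return confirmed_open / n_listed


approx_ppv = {}
for name in listed:
    confirmed = TOTAL_OPEN - newly_found[name]  # assumption: listed and open
    approx_ppv[name] = ppv(confirmed, listed[name])
    print(f"{name}: {approx_ppv[name]:.1%} (reported {reported_ppv[name]}%)")
```

Small discrepancies are plausible because the inferred confirmed counts may not correspond exactly to the numerators used in the original analysis.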
We subsequently evaluated potential differences in the validity of the 3 secondary data sources across levels of urbanization (). For stores, there were no marked differences between levels of urbanization in any of the 3 databases or the combined D&B and InfoUSA databases. A similar picture emerged for restaurants, the exception being significantly higher sensitivity in urban areas in the D&B data. We additionally evaluated the potential influence of tract racial composition or poverty on the validity estimates but found no evidence for any systematic differences (data not shown).
Validity of Secondary Data Sources for Locations of Food Outlets in an 8-County Region as Compared With a Field Census, by Level of Urbanicity, South Carolina, 2008–2009
Geospatial accuracy statistics are shown in and are limited to located and open outlets because of the need to have both geocodes from the database and GPS coordinates from the field census. The geospatial accuracy varied widely, with a median Euclidean difference of 76 m (n = 1,507) for DHEC, 92 m for D&B (n = 1,213), and 92 m for InfoUSA (n = 1,434). The percentage of outlets for which the geocoded position was less than 100 m from the GPS location ranged from 53% for both D&B and InfoUSA to 56% for DHEC. The correct allocation of outlets to census tracts was high (83%, 85%, and 84% for DHEC, D&B, and InfoUSA, respectively). No notable differences in the median distances were observed by type of food outlet; hence, all outlet types were combined. As expected, the Euclidean distance differences were lowest for the urban areas for all 3 data sources, intermediate for the suburban and large-town areas, and highest for small-town and rural areas (P < 0.0001 for all contrasts).
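To illustrate how such positional summaries can be derived, the sketch below computes the distance between a geocoded position and a GPS position, then the median distance and the share of outlets within 100 m. It uses the haversine (great-circle) formula as a stand-in for a Euclidean distance in projected coordinates; at distances of tens to hundreds of meters the two are very close. The projection and software used in the study are not described in this section, and the coordinate pairs are hypothetical, not study data.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical (geocoded, GPS) coordinate pairs; NOT data from the study.
pairs = [
    ((34.0000, -81.0000), (34.0005, -81.0000)),
    ((34.0000, -81.0000), (34.0000, -81.0020)),
    ((34.0000, -81.0000), (34.0010, -81.0010)),
]

distances = [haversine_m(*geocoded, *gps) for geocoded, gps in pairs]
median_distance = median(distances)
share_within_100m = sum(d < 100 for d in distances) / len(distances)

print(f"median distance: {median_distance:.0f} m")
print(f"within 100 m: {share_within_100m:.0%}")
```

Stratifying the same summaries by level of urbanicity would only require grouping the pairs by tract classification before taking the median.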
Geospatial Accuracy of Secondary Data Sources for Locations of Food Outlets (All Types Combined) in an 8-County Region, by Level of Urbanicity, South Carolina, 2008–2009
Finally, to combine our evaluation of count accuracy with the geospatial accuracy, we calculated the proportion of open outlets that had been both listed in the respective database and geocoded to a position less than 100 m from the actual GPS-recorded location (). Overall, between 29% (D&B) and 39% (DHEC) of open food outlets were listed with geocodes that would place them less than 100 m from their actual location. DHEC performed best for restaurants (49%) and worst for stores (24%). D&B and InfoUSA did not differ significantly for stores (31% vs. 29%), but InfoUSA performed significantly better than D&B for restaurants (38% vs. 28%).
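The combined measure can be read, approximately, as the product of sensitivity and the share of matched outlets geocoded within 100 m. The sketch below checks this reading against the rounded percentages reported in the text. It assumes that the within-100 m percentages apply to essentially all listed and open outlets; because the inputs are rounded, agreement is expected only to within about 1 percentage point. Names are illustrative.

```python
# Rounded values from the text: overall sensitivity and the share of matched
# outlets geocoded less than 100 m from their GPS location.
sens = {"DHEC": 0.68, "InfoUSA": 0.65, "D&B": 0.55}
within_100m = {"DHEC": 0.56, "InfoUSA": 0.53, "D&B": 0.53}

# Reported combined percentages for all outlets (only the extremes are given).
reported_combined = {"DHEC": 39, "D&B": 29}

# Approximation: P(listed and within 100 m) = P(listed) * P(within 100 m | listed)
approx_combined = {name: sens[name] * within_100m[name] for name in sens}

for name, value in approx_combined.items():
    note = reported_combined.get(name)
    suffix = f" (reported {note}%)" if note is not None else ""
    print(f"{name}: {value:.1%}{suffix}")
```

The product form makes explicit that a database can list most outlets and still place well under half of them within 100 m of their true position.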
Percentage of Located and Open Food Outlets That Were Correctly Allocated to Within 100 m of Their Actual Position, by Secondary Data Source, South Carolina, 2008–2009