Many geographic studies use distance as a simple measure of accessibility, risk, or disparity. Straight-line (Euclidean) distance is most often used because of the ease of its calculation. Actual travel distance over a road network is a superior alternative, although obtaining it has historically been an expensive and labor-intensive undertaking. This is no longer true, as travel distance and travel time can be calculated directly from commercial Web sites, without the need to own or purchase specialized geographic information system software or street files. Taking advantage of this feature, we compare straight-line distance with travel distance and travel time to community hospitals from a representative sample of more than 66,000 locations in the fifty states of the United States, the District of Columbia, and Puerto Rico. The measures are very highly correlated (r² > 0.9), but important local exceptions can be found near shorelines and other physical barriers. We conclude that for nonemergency travel to hospitals, the added precision offered by the substitution of travel distance, travel time, or both for straight-line distance is largely inconsequential.
Numerous geographic studies have analyzed the distances between residential locations and locations of features in an effort to identify risks, gaps, shortages, and disparities. These include emergency services, such as distances from fire stations or ambulance response times (Jones and Bentham 1995; Lyon et al. 2004; Liu, Huang, and Chandramouli 2006; Nicholl et al. 2007; Schuurman et al. 2009); medical services, such as distances and travel times to primary care physicians, hospitals, or specialists (Luo 2004; Wang and Luo 2005; Patel, Waters, and Ghali 2007; Ludwick et al. 2009); and proximity to amenities such as schools, playgrounds, and greengrocers (Pearce, Witten, and Bartie 2006; Pearce et al. 2007; Larsen and Gilliland 2008; Sharkey 2009). There are also a host of studies that look at spatial proximity to adverse features such as gambling centers, liquor stores, and pollution sources (Burdette and Whitaker 2004; Cradock et al. 2005; Pearce et al. 2008; Hart et al. 2009; Hay et al. 2009; Kearney and Kiros 2009). The preponderance of these studies has found that distance is a relevant explanatory variable, with shorter distance corresponding to higher utilization or exposure.
In many of these studies, geographic distance is the single measure of accessibility or exposure. Others incorporate additional measures involving population size, density of features, and choice among competing options (Guagliardo 2004), but even for these more complex analyses, distance is a necessary component. A majority of studies define distance as the straight-line distance between locations, using either Euclidean distance with projected coordinates or spherical distance with latitude and longitude coordinates. Locations are often first aggregated to some geographic unit for which the data have been collected, such as postal codes or census-defined areas. An advantage of this approach is that calculations are straightforward, not necessarily requiring specialized geographic information systems (GIS) software. A smaller number of studies measure distance using actual road network distances or automobile travel times. This approach offers greater sophistication and precision, although traditionally at the expense of purchasing and managing specialized GIS software or street network data. In our experience, driving distances and times are perceived to be substantially more precise than straight-line distance. For example, this perception was a major impetus for the recent development of a shortest path calculator by the North American Association of Central Cancer Registries and the University of Southern California GIS Research Laboratory (2009).
Recent technological advances have essentially eliminated the cost of using street-network distance in analyses. There are now at least five commercial Web sites offering precise driving directions between nearly all locations in the developed world (Google, Yahoo!, Mapquest, Bing, and Rand McNally). Simple programs written in open-source programming languages such as Python can be used to make repeated calls to these sites to obtain the travel time and distance information for any number of locations. In this article, we make use of this functionality to conduct a large-scale comparison of straight-line distances and travel times and distances for the fifty states of the United States, District of Columbia, and Puerto Rico. Our aim is to assess the extent to which using travel time or distance confers a genuine advantage over straight-line distance and to identify locations where differences between the two are most pronounced.
Interest in this question dates at least to the 1960s and research on network models in geography (Haggett 1967). Cole and King’s (1968) Quantitative Geography defined the ratio of travel distance to straight-line distance as the “detour index” and reported typical values of 1.2 to 1.6 for rural areas in various parts of Britain, with the calculations done by having students trace roadways on paper maps. This ratio has been applied in other fields ranging from ecology to computer science where travel over networks is measured, often under different names such as the “index of circuitry” or “route factor” (Cardillo et al. 2006; Bebber et al. 2007; Buhl et al. 2009).
Several studies have found that the correlation between straight-line distance and road-network distance or time is extremely high and that substituting one for the other is unlikely to have a substantial impact on analytic results (Martin et al. 2002; Wood and Gatrell 2002; Jordan et al. 2004; Fone, Christie, and Lester 2006; Apparicio et al. 2008). In particular, a New York State study considering travel from postal ZIP codes to hospitals via state highways found a near-perfect correlation between straight-line distance and road-network distance (Phibbs and Luft 1995). In contrast, a study of access to renal units in England reported that the use of road-network distance represented a “significant improvement” (Martin et al. 1998), and a Spanish study found that travel distance was a better predictor of transit ridership than straight-line distance (Gutierrez and Garcia-Palomares 2008).
It is reasonable to expect the correlation between straight-line distance and road-network distance to be very high in the United States given the overall density of roads. Of course, it is not a perfect correlation. Islands, points along an irregular coastline or lakeshore, and locations separated by uncrossable lakes, rivers, and mountains would be expected to have longer travel times than straight-line distance alone would suggest. This was the case for many locations in a study of hospital access in rural British Columbia (Schuurman et al. 2006). In the New York State study just cited, points on opposing sides of the Hudson River where there were no nearby bridges were among the largest outliers. In this article, we evaluate the magnitude of these deviations throughout the United States and the extent to which they argue for the standard use of travel distance in social scientific research.
We developed a population-based nationwide sample of travel paths by selecting one point from each census tract as origins and locations of community hospitals as destinations. Census tracts are designed to be demographically homogeneous and roughly equal in population, with an average of about 4,000 people per tract. There were 66,125 census tracts containing at least one road in the fifty states, District of Columbia, and Puerto Rico according to the 2000 Census. The geographic centroid of each tract, derived from the Census cartographic boundary files, was snapped to the nearest vertex of the nearest road, and this was taken to be a representative location in the tract.1
The straight-line distance, travel time, and travel distance between these points and the nearest community hospital were then calculated. Community hospitals consisted of the set of hospitals categorized as general acute care hospitals (N = 5,111) by the Centers for Medicare and Medicaid Services (2009) as of March 2009. These are nonfederal, publicly accessible hospitals that mirror the population distribution generally, with 40 percent located outside metropolitan statistical areas. Geocoding of hospital locations was done using QualityStage Geolocator software, version 2.0.1, and the Dynamap street reference file, version 9. Hospitals that could not be geocoded (< 1 percent) were manually reviewed and geocoded using Google Maps (Google 2009).
The nearest hospital was found by first identifying all candidate tract–hospital pairs within one degree of latitude and longitude of each other, calculating the straight-line distance for each pair, and retaining the minimum distance. For tracts not within one degree of latitude and longitude of a hospital, the search was expanded to two degrees, and so on. This method reduced the number of potential calculations by 98 percent. Straight-line distances were computed as great circles assuming a spherical earth. Results summarizing the difference between the predicted and actual driving distances for all tracts were viewed on a scatterplot, basic statistics were calculated, and the magnitude and locations of substantial outliers were noted.
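The search just described can be illustrated with a minimal Python sketch. It is not the authors' implementation (their analysis used SAS and ArcGIS); the function names, the tuple layout, and the mean earth radius of 6,371 km are assumptions made here for illustration. The haversine formula is used as one common way of computing great-circle distance on a sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hospital(tract, hospitals, max_window_deg=180):
    """Return (hospital_id, distance_km) for the closest candidate hospital.

    tract is a (lat, lon) pair; hospitals is a list of (id, lat, lon) tuples.
    Candidates are limited to a box of +/- window degrees around the tract,
    and the box grows by one degree until at least one hospital falls inside.
    Longitude wrap-around at +/-180 degrees is ignored in this sketch.
    """
    lat, lon = tract
    for window in range(1, max_window_deg + 1):
        candidates = [
            (hid, great_circle_km(lat, lon, hlat, hlon))
            for hid, hlat, hlon in hospitals
            if abs(hlat - lat) <= window and abs(hlon - lon) <= window
        ]
        if candidates:
            # Keep the candidate with the minimum straight-line distance.
            return min(candidates, key=lambda pair: pair[1])
    return None  # no hospital found within the largest window
```

For example, `nearest_hospital((40.0, -75.0), [("A", 40.5, -75.0), ("B", 40.1, -75.0)])` selects hospital "B". One caveat of any box-based filter is that a hospital just outside the box could, in principle, be closer than one near a corner of the box, so a stricter implementation might re-check one window further out before accepting the minimum.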
Travel distance and travel time were obtained through repeated calls to the Google Maps Web page using the SAS FILENAME URL method in SAS version 9.1 (Helf 2005; Zdeb 2009). Each call generates a vast amount of HTML code from which the travel distance and travel time can be extracted using character string functions. (Walking time and distance can also be obtained in this manner, along with driving time during peak traffic periods and public transportation time for selected metropolitan areas. These were explored but not used in this study.) The SAS code is available from the authors.
Once straight-line distances, driving distances, and driving times were obtained for all tracts, the linear relation and correlations between these three measures were assessed using ordinary least-squares regression. The model was fitted using a zero intercept given that a trip of zero distance requires zero travel time. The ratio of driving distance to straight-line distance was defined as the detour index. The difference between the actual driving distance and the predicted driving distance as derived from the regression equation was calculated for all tracts and used to examine outliers. The “predicted” travel distance or travel time was defined as the straight-line distance multiplied by the slope of the regression line. Except where noted previously, all analyses were performed using SAS version 9.1 and ArcGIS version 9.3.
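As a rough illustration of the zero-intercept fit, the following Python sketch computes the least-squares slope through the origin, the per-pair detour index, and the difference between actual and predicted driving distance. The function names and the three data points are invented for illustration and are not taken from the study, which used SAS.

```python
def slope_through_origin(straight_km, drive_km):
    """Least-squares slope for the model drive = slope * straight (no intercept)."""
    sxx = sum(s * s for s in straight_km)
    sxy = sum(s * d for s, d in zip(straight_km, drive_km))
    return sxy / sxx


def prediction_errors(straight_km, drive_km, slope):
    """Actual minus predicted driving distance for each origin-destination pair."""
    return [d - slope * s for s, d in zip(straight_km, drive_km)]


def detour_indices(straight_km, drive_km):
    """Per-pair ratio of driving distance to straight-line distance."""
    return [d / s for s, d in zip(straight_km, drive_km)]


# Made-up illustrative values in kilometers, not data from the study.
straight = [2.0, 5.0, 10.0]
drive = [3.0, 7.0, 14.5]

b = slope_through_origin(straight, drive)        # slope of the fitted line
errors = prediction_errors(straight, drive, b)   # positive: longer than predicted
ratios = detour_indices(straight, drive)         # one detour index per pair
```

With no intercept term, the slope reduces to the sum of the products of the two measures divided by the sum of squared straight-line distances, so no statistics library is needed for this step.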
There were 66,011 census tracts with a valid driving route to a hospital. The 114 remaining tracts mainly consisted of islands without ferries or bridges, along with a very small number where the selected point fell within a gated residential community.2 Straight-line distance predicted travel distance very well in nearly all locations, with the r² for the United States as a whole equal to 0.94. The largest outliers were disproportionately located in Alaska, which has significant roadless areas and locations connected by ferry, but excluding Alaska did not alter the r². The detour index was 1.417 for the entire data set. Straight-line distance also predicted travel time very well, with r² = 0.91 for the United States as a whole, which is reasonable given that travel distance and travel time are themselves highly correlated (Table 1).
The r² values are presented merely to establish that they are extremely high; their exact interpretation is confounded by the spatial autocorrelation of the observations. Of greater interest is identifying the number and location of tracts where the straight-line distance is a poor predictor of driving distance. Both the absolute and relative differences between the actual and predicted driving distances were used to measure this (Table 2). Over 90 percent of the tracts have good agreement using thresholds within 10 percent or 5 kilometers. For the remainder, positive relative errors represent locations where the actual driving distance exceeds the predicted driving distance; negative values represent the converse. Large positive relative errors are found near irregular shorelines, on islands, in very low-population-density wilderness areas, adjacent to other impassable physical features, or some combination of these. Large negative relative errors are in locations that are not close to a hospital but that have a very straight drivable route to the nearest one. The lowest possible relative error is −41.7 percent, which occurs when a route exactly follows the straight line between the two points.
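The 41.7 percent bound can be checked with a short calculation. The sketch below assumes, because the text does not spell it out, that the relative error is expressed as a fraction of the actual driving distance and that the predicted distance is the straight-line distance multiplied by the nationwide slope of 1.417. The symbols s (straight-line distance), d_actual, and d_pred are introduced here for illustration only. When the route is perfectly straight, d_actual = s, so:

```latex
\[
\text{relative error}
  = \frac{d_{\text{actual}} - d_{\text{pred}}}{d_{\text{actual}}}
  = \frac{s - 1.417\,s}{s}
  = -0.417
\]
```

The sign is negative because the actual distance falls short of the prediction. If the error were instead expressed relative to the predicted distance, the same case would give (1 − 1.417)/1.417 ≈ −0.294, so the quoted figure appears consistent with the first convention.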
The most extreme difference between straight-line distance and travel distance is found between Grand Marais, Minnesota, and Houghton, Michigan (Figure 1). Here, an 8.5-hour drive through three states is required to cover a distance that via straight line is just 155 kilometers, yielding a detour index of 3.4. This example also misassigns the hospital that is truly closest, as the hospital in Duluth, Minnesota, would be reached before the one in Houghton. Other large differences are found at other locations in the western Great Lakes; between the eastern end of Long Island, New York, and Connecticut; and in remote parts of western states such as Utah and Idaho. The same type of pattern can be found within urban areas, albeit on a much reduced scale. In New York City, there is a section of Queens close to the East River where the closest hospital is 1.1 km across the river in Manhattan. With no bridge immediately nearby, driving this route requires traveling 9.4 km, a detour index of 8.5, but there are hospitals in Queens closer than this.
The differences between straight-line distance and travel distance are further illustrated by mapping the outliers in Nevada, a state with some of the largest outliers (Figure 2). The map reveals longer-than-expected travel distances in the mountainous suburbs west of Reno, where roads are sparse and serpentine. Meanwhile, in the small town of Elko, travel distances are shorter than expected owing to a very direct route to the nearest hospital—albeit one that is in an adjacent state, roughly four hours away. Overall, though, the two measures agree to within 10 kilometers for over 90 percent of the tracts.
In terms of computational complexity, few studies involving geographic distance use as many points as we have used here (66,000 origins and 5,000 destinations). For studies larger than this, processing time could become an issue. In our study, identifying the nearest hospitals from each sample point and calculating the straight-line distances took about 1.5 hours using a desktop computer with dual 3.16 GHz processors, 3.3 GB of RAM, and 250 GB of free drive space. Finding the travel times via repeated calls to Google Maps took about five hours. Our approach would be inappropriate for the calculation of travel-distance or travel-time buffers, where very large numbers of travel routes would need to be evaluated. In contrast, the calculation of buffers based on straight-line distance is trivial.
In nearly all locations within the United States, the straight-line distance is an adequate proxy for travel distance, after applying a detour index of about 1.4. Exceptions are limited to areas located near uncrossable physical features such as lakes, rivers, and mountains and in wilderness areas of the western United States and Alaska. If errors up to 5 kilometers or 10 percent are tolerated, then the two distance measures are equivalent for over 90 percent of the population; relaxing the threshold to 10 kilometers or 10 percent raises this figure to over 96 percent. These are conservative tolerances in the area of nonemergency medical care, where variations in travel of less than thirty minutes generally do not pose significant barriers (Lee 1991).
These results strongly suggest that the many past studies where straight-line distance was used remain valid, and they contradict the widespread perception that travel distance or time represents a tremendous improvement in precision that should be pursued even at significant cost. But because the cost of obtaining travel distance and travel time has become negligible, we do recommend incorporating their small added precision into future studies that relate residential location to geographic features. In the area of emergency response, where results are sensitive to even small differences, such inclusion is essential. Although we focused on community hospitals, we believe that our findings logically extend to other geographic destinations of varying densities and spatial scales such as parks, schools, and shopping centers that typically involve car travel. They would not necessarily extend to fine spatial scales where driving is unlikely (Okabe and Kitamura 1996). Our results assume the accuracy of the route choices and drive-time estimates using Google’s database. Although we find these data reliable, we are unaware of any formal evaluation of their accuracy. The many comments that have been posted online on this subject tend to describe inaccuracies of several hundred meters at most.
Finally, we call attention to the observation that the nationwide detour index of 1.417 is virtually equal to the diagonal of a unit square (1.414). This means that, on average, traveling from an arbitrary address in the United States to the nearest community hospital is equivalent to the maximum possible Manhattan distance between those two points (that is, the distance measured along the two equal axes of an isosceles right triangle). We leave as a future project the determination of whether this is merely an interesting coincidence or a theoretically meaningful result.
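The geometric statement can be made explicit with a brief derivation. As a sketch, let a straight-line segment of length d make an angle θ with one axis of a rectangular street grid; d and θ are symbols introduced here and do not appear in the original analysis. The Manhattan distance is the sum of the two axis-parallel legs, so its ratio to the straight-line distance is:

```latex
\[
\frac{d_{\text{Manhattan}}}{d}
  = \frac{d\,|\cos\theta| + d\,|\sin\theta|}{d}
  = |\cos\theta| + |\sin\theta|
  \le \sqrt{2} \approx 1.414
\]
```

The ratio equals 1 when the segment runs along a grid axis and reaches its maximum of √2 when θ = 45°, which is the isosceles right triangle case. A detour index of 1.417 therefore sits marginally above the largest ratio a perfect grid can produce.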
FRANCIS P. BOSCOE is a Research Scientist at the New York State Cancer Registry, 150 Broadway, Suite 361, Menands, NY 12204. His research interests include cancer and chronic disease epidemiology, medical geography, environmental health, spatial methods, and data visualization.
KEVIN A. HENRY is an Associate Director and Research Scientist at the New Jersey Cancer Registry, The Cancer Institute of New Jersey, UMDNJ-Robert Wood Johnson Medical School, 120 Albany Street, Tower II 5th Floor, New Brunswick, NJ 08901. His research interests include geographic analysis of disease, applied methods in health geography, and socioeconomic disparities in cancer stage at diagnosis, treatment, and survival.
MICHAEL S. ZDEB is an Assistant Professor in the Department of Epidemiology and Statistics at the University at Albany School of Public Health, Rensselaer, NY 12144. His research interests include biostatistical methods, data visualization, and analysis of vital records data.
1 The alternative of choosing a random location in the tract does not affect the results. The average resulting displacement from the centroid is about 1 kilometer and is independent of the hospital locations.
2The Google Maps database structure does not allow these points to be connected to the greater road network. This database characteristic was introduced into the Google Maps database during the course of our research, and it is unclear whether this was an intended feature or a bug. We considered using a competing database, but the number of problematic locations was small enough to have no impact on our results.