The geographic location of hospitalizations for TBI, aggregated to regional counts by municipality, was examined for two separate periods eight years apart in the province of Ontario, Canada. Hospitalization rates for TBI were mapped by age and by major mechanism of injury (e.g., motor vehicle collisions or falls). A province-wide exploratory analysis was used to identify potential geographic areas of high risk. Further, mapping incidence at two different times was intended to show changes in rates over time and to identify areas with a persistently high risk of TBI. Although other studies have compared the incidence of TBI in urban and rural areas [18], this study collects and analyses within-province hospitalizations for TBI for each municipality.
This study was designed to assess the potential of GIS methods, focusing on data exploration and hypothesis generation rather than on formal hypothesis testing. As a result, the study went through two iterations of data preparation and analysis: the first to identify data characteristics and to test the software and methodological approach, and the second to try to resolve some of the methodological issues identified in the first iteration and to apply the most promising methods. A prior technical paper provides additional methodological details [26]. Since the purpose of this paper is to explain the methods through the example data set, the methods and results sections are combined. This section discusses how and why the different analytical methods were used, and how to interpret the results for these data.
The data on hospitalizations for TBI were obtained from the Ontario Trauma Registry's Minimum Data Set for two time periods, 1993-94 and 2001-02. These individual-level data were geographically located using the Ministry of Health "Residence Code", which the ministry uses for service provision and which is based on the address recorded for each patient by the Ontario Health Insurance Plan, the province's public health insurance plan. The population and socio-demographic data were publicly available census information, collected and distributed by Statistics Canada ("StatCan"), for the census years 1991 and 2001. Through StatCan's Data Liberation Initiative, the University of Toronto Library System licenses these data for research purposes. Geographic reference map files from StatCan's census geography are also available through this program. Supplementary geographic data files from the library and other sources were also used for map creation and data analysis. Table summarizes the two main data sources.
Summary of two main data sources
Uses of GIS to inform public health decision-making
For the purposes of this paper, the main uses of GIS can be categorized into four phases: visualization, exploratory data analysis, geographic (spatial) analysis, and presentation of results (see Figure , derived from Dragićević et al [27]). The boundaries between these uses are sometimes blurred, and the distinctions between them may be somewhat semantic; in a GIS environment they function more as way stations along a continuous process than as discrete steps. In fact, each successive use may be seen as an extension or enhancement of the previous one.
Figure 1 Uses of GIS to inform public health decision-making. Derived from the "exploratory spatial data analysis" process of Dragićević et al. 
The first phase, visualization, is defined here as representing a single data set on a map and examining it for patterns. In this study, counts of TBI hospitalizations were aggregated and mapped by geographical area, as shown in Figure . Exploratory data analysis takes the visualization process further by comparing data sets, for example by overlaying one time frame on another or by calculating statistics (Figure ), and includes tools ancillary to mapping, such as graphing or data brushing. The third phase, geographic (spatial) analysis, explicitly uses the methods of spatial statistics, which incorporate location and topological (i.e., neighbouring) relationships into the analysis of a data set (Figure ). Finally, presentation of results is the graphic communication of the results of analysis to an audience.
Figure 2 Examples of maps showing visualization, exploratory data analysis, and geographic (spatial) analysis. 2a: Example of visualization. Providing an overview and visual illustration of data sets, and putting them in geographic context, is an important function.
In this study, individual incident data were aggregated to regional counts; that is, the total number of injuries of each type and for each subpopulation was cross-tabulated by municipality. Using regional count data to analyze spatial clustering raises a number of issues and limitations related to imposing the "filter" of the aggregation units on the data [28]. Prime among these is the risk of ecological fallacy, or its geographical equivalent, the modifiable areal unit problem (MAUP). Simply put, this is the demonstrable risk that aggregation by different geographic "containers" will lead to different results in statistical correlation [29]. Regional count analysis must "also balance the small-number problem with the spatial scale of the data" [28, pg. 201]. That is, small geographic units produce small counts, which reduce the statistical stability of observed and estimated rates. Choosing the geographic units is therefore a key decision.
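To make the small-number problem concrete, here is a minimal Python sketch using hypothetical counts and populations (not study data). It computes crude rates per 100,000 and an approximate standard error, assuming the count is Poisson distributed; that assumption is a common simplification and is not stated in the source.

```python
import math


def rate_per_100k(count: int, population: int) -> float:
    """Crude rate per 100,000 population."""
    return 100_000 * count / population


def poisson_se_per_100k(count: int, population: int) -> float:
    """Approximate standard error of the rate, treating the count as Poisson."""
    return 100_000 * math.sqrt(count) / population


# Hypothetical units: (label, TBI count, population)
units = [
    ("small_township_period_1", 1, 500),
    ("small_township_period_2", 0, 500),
    ("large_city", 400, 500_000),
]

for name, count, pop in units:
    r = rate_per_100k(count, pop)
    se = poisson_se_per_100k(count, pop)
    print(f"{name}: rate = {r:.1f} per 100k, SE = {se:.1f}")
```

In this illustration a single case moves the township's rate between 0 and 200 per 100,000, with a standard error as large as the rate itself, while the city's rate of 80 per 100,000 has a standard error of about 4. This instability is what pooling small areas and rate smoothing are intended to reduce.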
In this study, census subdivisions (CSDs) were used, as these were the smallest units of census geography that could be related to the incidence data available at the municipal level (Figure ). Census units were needed to link the incidence data to the available demographic and socio-economic information; for example, age-cohort data were used to calculate the SMR for each CSD. Since another goal of the study was to compare rates from 1993-94 and 2001-02, using census geography also allowed comparable aggregation units to be created, and data to be compared, across time. Lastly, the potential correlation of rates with socio-economic variables was also explored.
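As an illustration of how age-cohort data can feed an SMR, the Python sketch below uses indirect standardization: reference age-specific rates are applied to a CSD's age-specific population to obtain an expected count, and the ratio is observed divided by expected. The age groups, reference rates, and populations are hypothetical, and the source does not state which reference population or age bands the study used.

```python
from typing import Dict


def expected_count(ref_rates: Dict[str, float], local_pop: Dict[str, int]) -> float:
    """Sum over age groups of reference rate (cases per person) times local population."""
    return sum(ref_rates[age] * local_pop[age] for age in local_pop)


def smr(observed: int, ref_rates: Dict[str, float], local_pop: Dict[str, int]) -> float:
    """Standardized ratio: observed cases divided by expected cases."""
    expected = expected_count(ref_rates, local_pop)
    if expected == 0:
        raise ValueError("expected count is zero; the ratio is undefined")
    return observed / expected


# Hypothetical reference rates (cases per person over the study period) by age group
ref_rates = {"0-19": 0.0005, "20-64": 0.0004, "65+": 0.0015}
# Hypothetical age structure of one CSD
csd_pop = {"0-19": 2_000, "20-64": 5_000, "65+": 2_000}

print(smr(9, ref_rates, csd_pop))
```

Here the expected count is 1 + 2 + 3 = 6 cases, so 9 observed cases give a ratio of about 1.5. Because the expected count is built from the CSD's own age structure, an older population does not by itself inflate the ratio.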
Ontario Census subdivisions (CSDs) in 1991 (n = 951), and 2001 (n = 586).
The first iteration of data exploration mapped and examined a wide variety of data comparisons, including comparisons of hospitalization rates with age distribution and mechanism of injury. Generally, expected patterns seemed to emerge: areas with older populations had higher rates, presumably due to falls, and rural locations appeared to have higher rates than urban ones. Although some intriguing patterns emerged, interpretation was limited by the methodological issues described above (e.g., the MAUP), as well as by some limitations in GIS functionality. Regarding the latter, the areas that needed improvement were:
1. Ability to visualize and explore multivariate data relationships
2. Ability to control the method of creating neighbour relationships and other parameters for aggregation and spatial clustering analysis
3. Ability to compare patterns of spatial clustering over time
4. Ability to do regression analysis incorporating a spatial component.1
Therefore, in preparation for the second iteration of analysis, the methodological issues raised by regional count data in this context, and the limitations in GIS functionality, needed to be addressed.
The Local Indicator of Spatial Association (LISA statistic) was used for analysis [30]. This statistic identifies High-high clusters of CSDs (units with high rates surrounded by units with high rates, where the local association is statistically significant after a randomization process and significance testing are applied) and High-low clusters (units with high rates surrounded by units with low rates). The statistic also identifies significant clusters of low rates, but these were not considered in this study. Persistence of clusters between the two time periods was also examined. In addition to the LISA analysis, an alternative measure of clustering, the Getis-Ord Gi* statistic [32], was used to corroborate the results.2
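The randomization-based LISA can be sketched in Python with NumPy as follows. This is a minimal illustration of Local Moran's I with row-standardized binary weights and conditional permutation (the unit's own value is held fixed while its neighbours are drawn at random from the other units); it is not necessarily the software, weights, or permutation count used in the study. The neighbour lists stand in for the network-based neighbours described in the text, and the example data are hypothetical.

```python
import numpy as np


def local_moran(x, neighbours, permutations=999, seed=0):
    """Local Moran's I with row-standardized binary weights.

    x: sequence of rates, one per unit.
    neighbours: neighbours[i] lists the indices of unit i's neighbours.
    Returns (I, p, quadrant): local statistics, folded pseudo p-values from
    conditional permutation, and labels 'HH', 'HL', 'LH' or 'LL'.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    z = x - x.mean()                  # deviations from the global mean
    m2 = (z ** 2).sum() / n           # second moment used as the scale
    rng = np.random.default_rng(seed)

    I = np.zeros(n)
    p = np.ones(n)
    quadrant = []
    for i in range(n):
        nb = list(neighbours[i])
        k = len(nb)
        lag = z[nb].mean() if k else 0.0   # average neighbour deviation
        I[i] = z[i] * lag / m2
        # First letter: the unit itself; second letter: its neighbours.
        quadrant.append(("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L"))
        if k == 0:
            continue                       # units without neighbours are not tested
        # Conditional permutation: keep z[i] fixed and give it k random
        # neighbours drawn from all other units.
        others = np.delete(z, i)
        sims = np.array([
            z[i] * rng.choice(others, size=k, replace=False).mean() / m2
            for _ in range(permutations)
        ])
        larger = int((sims >= I[i]).sum())
        larger = min(larger, permutations - larger)  # fold into the nearer tail
        p[i] = (larger + 1) / (permutations + 1)
    return I, p, quadrant


# Hypothetical example: six units in a chain, each bordering its neighbours.
rates = [10, 12, 11, 2, 1, 3]
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
I, p, quadrant = local_moran(rates, nbrs, permutations=199)
print(quadrant)  # ['HH', 'HH', 'HH', 'LL', 'LL', 'LL']
```

A map of High-high clusters would keep units labelled "HH" whose pseudo p-value falls below the chosen cutoff; "HL" with a small p-value corresponds to the High-low category. With 199 permutations the smallest attainable pseudo p-value is 0.005, which is one reason larger permutation counts are usually preferred.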
GIS methodological and functional issues
We dealt with the methodological issues in two ways. First, we aggregated CSDs to achieve a "minimum population threshold", intended to stabilize rate calculations. This involves pooling areas with small populations so that the combined population is large enough to support rate estimation. Second, we applied a "rate-smoothing" method to overcome the problem of extreme rates based on small base populations [28]. The most common solution in the literature is spatial empirical Bayes smoothing, which smooths the data surface and eliminates "zero" values. Both operations required identifying the "nearest neighbours" of each geographic unit based on its relationship to surrounding units; these neighbour definitions were then used to aggregate CSDs and smooth rates. Notably, in this study we decided that neighbour relationships should be based primarily on a network analysis, using transportation connections and distance to define nearest neighbours. This was also the basis for the definition of neighbours in the cluster analysis described below.
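A minimal Python sketch of spatial empirical Bayes smoothing is shown below. It uses one common method-of-moments formulation with a local prior: each unit's window is itself plus its neighbours, the prior mean is the pooled rate of that window, and the raw rate is shrunk toward it in proportion to how unreliable it is. The source does not give the exact estimator or software, so this formulation is an assumption; the neighbour lists stand in for the network-based neighbours described above, and the data are hypothetical.

```python
import numpy as np


def spatial_eb(counts, pops, neighbours):
    """Spatial empirical Bayes smoothed rates (method-of-moments local prior).

    counts, pops: case counts and populations at risk, one per unit.
    neighbours: neighbours[i] lists the indices of unit i's neighbours; the
    local window for unit i is the unit itself plus those neighbours.
    Returns smoothed rates per person.
    """
    O = np.asarray(counts, dtype=float)
    P = np.asarray(pops, dtype=float)
    r = O / P                                   # raw (crude) rates
    smoothed = np.empty_like(r)
    for i in range(r.size):
        idx = [i] + list(neighbours[i])
        m = O[idx].sum() / P[idx].sum()         # local prior mean: pooled rate
        p_bar = P[idx].mean()
        # Population-weighted spread of local rates minus expected Poisson noise
        v = (P[idx] * (r[idx] - m) ** 2).sum() / P[idx].sum() - m / p_bar
        v = max(v, 0.0)                         # negative estimates truncated to 0
        denom = v + m / P[i]
        w = v / denom if denom > 0 else 0.0     # weight given to the unit's raw rate
        smoothed[i] = w * r[i] + (1.0 - w) * m
    return smoothed


# Hypothetical example: a tiny unit with zero cases next to larger units.
s = spatial_eb([0, 3, 40], [200, 1500, 20000], [[1], [0, 2], [1]])
print(s * 100_000)   # smoothed rates per 100,000
```

Small populations make m / P[i] large, which pushes the weight w toward 0 and the smoothed value toward the local mean; this is how a zero rate in a tiny unit is replaced by a plausible non-zero estimate, while rates in large units stay close to their raw values when local variation is genuine.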
The limitations in GIS functionality were overcome by using two different GIS packages. In addition, after the second iteration's visualization and data exploration stages, we decided to focus on points 2 and 3 above: analysis of clustering of standardized rates of TBI hospitalization, and comparison of spatial clustering patterns over time. This seemed to be the best way to identify areas with significantly high TBI hospitalization rates.
Visualization and exploratory data analysis
Both the 1991 and 2001 maps of age-standardized TBI counts by municipality show a strong correlation with overall population distribution, as would be expected. (These maps are not illustrated here for confidentiality reasons.) Thus, large counts are concentrated in more urbanized Southern Ontario. All other maps represent age-standardized TBI rates rather than counts, so differences in population size are no longer an issue.3
When the SMR is mapped (see Figure ), the pattern is generally the inverse of the population distribution: rural areas and the North show more high rates of hospitalization, with a few outliers in the South. This pattern is maintained when empirical Bayes smoothed rates (EBR) are mapped (Figure ), and it generally applies to both the 1993-94 and the 2001-02 data. Within this pattern, there are several areas where higher values tend to cluster or where high outliers occur. Described generally, these are:
Examples of mapping of TBI rates and cluster analyses.
1. a large number of isolated communities in Northwestern Ontario
2. North Central Ontario (a collection of high rates)
3. on or near Manitoulin Island in Lake Huron (a large concentration of high rates)
4. spread across Southwestern Ontario (a collection of high rates)
5. scattered parts of South Central and Eastern Ontario (a few large outliers; these vary between time periods)
These findings highlight potential problem areas for further investigation, which we followed up with a spatial analysis of clustering.
One often-neglected aspect of this exploration is that the cartographic methods used to represent data have a significant impact on their visual interpretation. For example, many municipalities are small in area and so are practically invisible in the shaded area (choropleth) maps generated by default in GIS statistical software. However, when data by CSD are mapped using circles proportional in size to the data, these data become more visible, and distinct clustering patterns may be perceived more easily (see Figure ). To accomplish this, the default cartographic rendering in most GIS packages must be overridden with appropriate customized representation, or symbolization.
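The text does not say how circle sizes were scaled. A common cartographic convention, assumed in this Python sketch, is to make circle *area* proportional to the value by scaling the radius with the square root; scaling the radius linearly would visually exaggerate large values. The maximum radius and the example rates are hypothetical.

```python
import math


def proportional_radius(value: float, max_value: float, max_radius: float = 20.0) -> float:
    """Radius of a circle whose area is proportional to value.

    max_value maps to max_radius (e.g., in points); zero maps to zero.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical smoothed rates per 100,000 for four CSDs
rates = {"A": 50.0, "B": 200.0, "C": 800.0, "D": 0.0}
largest = max(rates.values())
for csd, rate in rates.items():
    print(csd, proportional_radius(rate, largest))
```

With these values a CSD with one quarter of the largest rate gets a circle with half the radius and therefore one quarter of the area. Because the symbol size depends only on the value and not on the polygon's area, small municipalities remain visible.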
Spatial analysis of clustering - LISA and Getis-Ord Gi* statistics
The importance of geographic clustering of high hospitalization rates has not yet been established. Since TBI is not "contagious", the assumption is that an underlying phenomenon related to proximity, connectivity, or other contextual environmental factors may cause high rates to group together spatially. If such clustering is found, these potential factors should be investigated further.
The LISA (Local Moran's I) and Getis-Ord Gi* methods for identifying clusters each take a slightly different approach to the task. In terms of practical interpretation, both methods identify significant High and Low clusters, i.e., High or Low geographic units neighbouring similarly High or Low units, where the units in this case are CSDs. The LISA also identifies anomalous clusters, i.e., High units surrounded by Low ones, or vice versa. In this study we are interested in clusters of High values only. For interpretation, maps were constructed showing only significant High clusters, as classed, colour-coded circles sized according to multiple significance levels (p < .01, p < .02, or p < .05) (See Figure and .) This provides a more nuanced interpretive tool than a simple binary (significant or not significant) representation.
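For comparison with the LISA sketch, the Python code below computes Getis-Ord Gi* z-scores using the standard analytical formula, in which each unit is included in its own neighbourhood (non-zero diagonal weights), and then bins High values into the three significance classes used for the maps. Two points are assumptions not stated in the source: p-values come from a two-sided normal approximation rather than permutation, and the weights are simple binary contiguity. The example data are hypothetical.

```python
import math

import numpy as np


def getis_ord_gi_star(x, W):
    """Getis-Ord Gi* z-scores.

    x: values per unit (e.g., smoothed rates).
    W: n x n spatial weights matrix with a non-zero diagonal, because Gi*
       counts each unit as part of its own neighbourhood.
    """
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = x.std()                              # population standard deviation
    w_sum = W.sum(axis=1)
    w_sq_sum = (W ** 2).sum(axis=1)
    numerator = W @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator


def significance_class(z):
    """Legend class for a High (positive) z-score, using a two-sided normal p-value."""
    if z <= 0:
        return None                          # only High clusters are mapped
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    for cutoff in (0.01, 0.02, 0.05):
        if p < cutoff:
            return f"p < {cutoff}"
    return None                              # not significant at .05


# Hypothetical example: five units in a chain, binary weights including self.
n = 5
W = np.zeros((n, n))
for i in range(n):
    for j in range(max(0, i - 1), min(n, i + 2)):
        W[i, j] = 1.0
z = getis_ord_gi_star([10, 12, 11, 2, 1], W)
print([significance_class(v) for v in z])
```

Unlike Local Moran's I, Gi* has no High-low category: a positive z-score means the unit and its neighbours together sum higher than expected, which is why isolated high outliers surrounded by moderate values tend not to appear in either method.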
The results of the spatial analysis of clustering generally reinforce the visual analysis of the data exploration maps: many of the groupings of high EBR values identified visually appeared as significant High-high clusters in the LISA analysis, although at varied levels of significance. In contrast, many of the geographically isolated high outliers did not reappear as significant clusters at all, in either the High-high or the High-low category. This is to be expected, as the relationships among neighbouring CSDs affect the cluster analysis; for example, a value can be high, but if it is surrounded by moderate values it will not be identified as a significant cluster. Comparing the LISA clusters with the Getis-Ord Gi* clusters, most of the LISA High-high clusters are repeated as Gi* High values. There are some exceptions in both directions, but overall the two methods corroborate the clustering results.
Comparison of two time periods: 1993-94 and 2001-02
To make the mapping and analysis of the 1993-1994 and 2001-2002 TBI data compatible, we aggregated the data to common geographic units, merging the more numerous 1991 CSDs to match the 2001 CSD boundaries as closely as possible. This created a "lowest common denominator" map of comparable geographic units. Comparison was done in two ways: a visual comparison of patterns in the EBR and clustering maps for the two time periods, and an analysis of the persistence of significantly high clusters between the two periods.
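Aggregating to common units amounts to applying a crosswalk from each source unit to the comparable unit it is merged into. The Python sketch below assumes such a one-to-one-into crosswalk already exists; in practice, boundaries that do not nest cleanly must first be assigned by hand or by overlay. Unit IDs and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def aggregate_to_common_units(values: Dict[str, float], crosswalk: Dict[str, str]) -> Dict[str, float]:
    """Sum per-unit values (counts or populations) into common comparable units.

    crosswalk maps each source unit ID to exactly one common unit ID. Units whose
    boundaries do not nest cleanly are assumed to have been assigned to their
    best-matching common unit beforehand.
    """
    totals: Dict[str, float] = defaultdict(float)
    for unit, value in values.items():
        totals[crosswalk[unit]] += value   # a KeyError flags an unmapped unit
    return dict(totals)


# Hypothetical IDs: three source units merged into two common units.
crosswalk = {"A": "X", "B": "X", "C": "Y"}
cases = aggregate_to_common_units({"A": 3, "B": 1, "C": 5}, crosswalk)
population = aggregate_to_common_units({"A": 800, "B": 400, "C": 2500}, crosswalk)
rates = {u: cases[u] / population[u] for u in cases}
print(cases, rates)
```

Counts and populations are passed through the same crosswalk before rates are computed; averaging the source units' rates instead would give small units the same influence as large ones.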
Visual comparison showed strong similarities in the patterns of high EBR values and clustering between the earlier and later data series. The most stable were the patterns noted above as points 1, 2 and 3, that is, high rates of TBI incidence in Northwestern and North Central Ontario and around Manitoulin Island. The analysis of persistence of clustering, however, found a fairly small number of individual CSDs identified as clusters in both time periods (see Figure ). This map compares the significant clustering statistics (LISA and Gi*) for smoothed TBI rates (EBR) between the two time periods, showing persistence of clusters by each method. For the LISA statistic, there are few persistent clusters: only nine keep the same classification from one time period to the next. A greater number of Gi* High clusters persist, but still only a small proportion. This reflects the fact that even in the "stable" areas of high rates and clustering, closer examination reveals some shifting of high rates among neighbouring CSDs. Even where clusters do not persist, however, comparable patterns may be repeated. A good example is the High-low clusters identified by the LISA analysis in Southern Ontario (See Figure ). These represent elevated rates with moderately low neighbours. That these exact clusters do not persist indicates that they may be the result of a temporary situation or a unique event. However, the fact that other CSDs in the same general area show similar cluster characteristics may indicate that some mechanism with a geographic component is at work, or that similar conditions in these Southern Ontario communities result in similar TBI rate profiles eight years apart.
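Once both periods' results are attached to the same common units, the persistence check described above reduces to comparing labels unit by unit. This Python sketch assumes one cluster label per unit per period (None where the unit was not a significant cluster) and treats a cluster as persistent only if it keeps the same classification; the IDs and labels are hypothetical.

```python
from typing import Dict, Optional, Set


def persistent_clusters(
    first: Dict[str, Optional[str]], second: Dict[str, Optional[str]]
) -> Set[str]:
    """Common-unit IDs that keep the same non-null cluster label in both periods.

    Labels might be 'HH' or 'HL' for LISA, or 'High' for Gi*; None marks a unit
    that was not a significant cluster in that period.
    """
    shared = first.keys() & second.keys()
    return {u for u in shared if first[u] is not None and first[u] == second[u]}


# Hypothetical LISA labels for four common units.
lisa_1993 = {"X": "HH", "Y": "HL", "Z": None, "W": "HH"}
lisa_2001 = {"X": "HH", "Y": "HH", "Z": None, "W": None}
print(sorted(persistent_clusters(lisa_1993, lisa_2001)))  # ['X']
```

Unit "Y" is a cluster in both periods but changes from High-low to High-high, so under this strict definition it does not count as persistent; a looser rule (any High label in both periods) would be a one-line change and would generally report more persistence.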
Persistent high clusters for 1993-94 and 2001-02 data, identified by LISA and Getis-Ord Gi* cluster analyses.
LISA Cluster maps contrasting results using 1993-94 data with 2001-02 data, each aggregated to 2001-comparable CSDs. High-low LISA clusters in Southern Ontario show similar patterns, but in different CSDs.
Interpreting the results of the GIS analyses for this data
The persistence of high rates of TBI-related hospitalizations between the two time periods suggests a possible chronic or recurring problem that may result from persistent risk factors. Moreover, similar geographic patterns of occurrence, even where exact locational persistence does not occur, may also signal a contextual element related to high incidence of TBI in which geographic location or contact between neighbouring populations plays a role. The logical follow-up to this analysis would be to focus on these identified areas of high rates (and of persistent clusters) for a more detailed study of demographics, mechanism(s) of injury and risk factors, to see whether these potential underlying operational factors can be discovered.
At this time, the accuracy of the patients' residence codes has not been thoroughly assessed. In addition, examining changes in rates over time presented many challenges, because the underlying geographic groupings had changed such that smaller areas were merged into larger ones. This raises methodological issues noted above, such as the modifiable areal unit problem. It should also be noted that our study focused on hospitalization for TBI and as such represents the most severe injuries. A much larger and more representative sample of TBI cases would have been available if data from all emergency rooms and acute care hospitalizations had been included. However, emergency room cases that are not associated with a subsequent hospitalization are less likely to require post-acute services such as inpatient rehabilitation. We recognize, however, that even a "mild" TBI can have long-term implications [33] and require subsequent care [34]. Thus, future analyses should be conducted on data that include all acute care, emergency room, and, where possible, physician visits.
In addition, mechanisms of injury for TBI vary by level of severity; mechanisms such as being struck by an object may be more common when emergency room data are included [21]. Through ongoing research we plan to examine the counts by mechanism of injury. Preliminary analyses show different geographic patterns depending on which mechanism accounts for a higher percentage of cases. For instance, TBIs due to falls are more concentrated in large city cores, whereas those due to motor vehicle collisions are more likely to be in suburban areas, presumably where there is a greater need for road travel.
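The mechanism-share comparison described above can be sketched as a per-area cross-tabulation followed by percentages. This Python example assumes one (area, mechanism) record per hospitalization and picks the mechanism with the largest share in each area; the area names, mechanism codes, and records are hypothetical, and the source does not describe its actual procedure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def mechanism_shares(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Per-area share of TBI cases by mechanism.

    records: (area_id, mechanism) pairs, one per hospitalization.
    """
    by_area: Dict[str, Counter] = defaultdict(Counter)
    for area, mechanism in records:
        by_area[area][mechanism] += 1
    shares: Dict[str, Dict[str, float]] = {}
    for area, counts in by_area.items():
        total = sum(counts.values())
        shares[area] = {m: c / total for m, c in counts.items()}
    return shares


def dominant_mechanism(shares: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Mechanism with the highest share in each area (ties go to the first seen)."""
    return {area: max(s, key=s.get) for area, s in shares.items()}


# Hypothetical records: (area, mechanism)
records = [
    ("city_core", "fall"), ("city_core", "fall"), ("city_core", "mvc"),
    ("suburb", "mvc"), ("suburb", "mvc"), ("suburb", "fall"),
]
shares = mechanism_shares(records)
print(dominant_mechanism(shares))
```

Shares like these can be mapped directly or used as a filter before rate mapping by mechanism; with small areas, the same small-number caution applies to percentages as to rates.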
It is also possible to compare the degree to which patients receive care in their own geographical area or go outside their regional funding units. In previous work, we have shown the percentage of persons who obtain care within and outside their geographical area (local health integration network) [36]. This information can be used to direct funds towards unmet needs or to monitor the financial impact of care in specific geographical areas. This concordance can also be mapped in future analyses; however, modeling shifts between hospitals would be extremely complex. Mapping according to patients' residence allows long-term planning of home-based services and could thus greatly benefit patients, since many persons with serious brain injury require long-term support.
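The concordance measure mentioned above, the percentage of patients treated within their own area, can be sketched in Python as follows. It assumes one (residence area, treatment area) pair per patient; the area identifiers are hypothetical placeholders, not actual local health integration network codes.

```python
from typing import Dict, Iterable, Tuple


def within_area_percentage(episodes: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of patients from each residence area treated in that same area.

    episodes: (residence_area, treatment_area) pairs, one per patient.
    """
    totals: Dict[str, int] = {}
    local: Dict[str, int] = {}
    for home, treated in episodes:
        totals[home] = totals.get(home, 0) + 1
        if home == treated:
            local[home] = local.get(home, 0) + 1
    return {area: 100.0 * local.get(area, 0) / n for area, n in totals.items()}


# Hypothetical episodes: (residence area, treatment area)
episodes = [
    ("AREA_A", "AREA_A"),
    ("AREA_A", "AREA_B"),
    ("AREA_A", "AREA_A"),
    ("AREA_B", "AREA_B"),
]
print(within_area_percentage(episodes))
```

Keying the result by residence area, as here, matches the residence-based mapping the text recommends for planning home-based services; keying by treatment area instead would describe hospital catchments rather than population need.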