Linking electronic health record (EHR) systems with community information systems (CIS) holds great promise for addressing inequities in social determinants of health (SDH). While EHRs are rich in location-specific data that allow us to uncover geographic inequities in health outcomes, CIS are rich in data that allow us to describe community-level characteristics relating to health. When meaningfully integrated, these data systems enable clinicians, researchers, and public health professionals to actively address the social etiologies of health disparities.
This article describes a process for exploring SDH by geocoding and integrating EHR data with a comprehensive CIS covering a large metropolitan area. Because the systems were initially designed for different purposes and had different teams of experts involved in their development, integrating them presents challenges that require multidisciplinary expertise in informatics, geography, public health, and medicine. We identify these challenges and the means of addressing them and discuss the significance of the project as a model for similar projects.
Electronic health record (EHR) systems have the potential to improve the quality of care and decrease overall health-care utilization costs.1,2 By augmenting clinical data captured in a health information exchange (HIE) with spatially enabled community data, these systems also can more effectively identify and characterize public health trends and events, predict future public health outcomes, and help devise more effective health interventions.3,4 In this article, we discuss the role of spatially enabled data and community information systems (CIS) in the context of health care.
Spatial data describe the geospatial location(s) of patients and associated geographic entities, such as neighborhoods, census tracts, or counties. These data also may include attributes associated with the patient or the geographic entities in which the patient lives. Residential street address is commonly recorded in electronic patient records. Addresses are translated into spatial data via a geocoding process called address matching. This process involves matching the input address with addresses in a digital reference map and extracting the associated geographic coordinates (latitude and longitude) to define the position of the associated point on the earth. Once geographic coordinates are defined for a patient record, they can be used to identify any other place that shares this location or any catchment area associated with it (e.g., ZIP code or neighborhood). As health disparities are often geographically specific, it is particularly important to consider place to understand and address their causal factors.5–9
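As a minimal illustration of address matching, the following Python sketch normalizes an input address and looks it up in a small reference table. Everything in it is an assumption made for illustration: the reference addresses and coordinates are invented, the abbreviation list is deliberately tiny, and only exact matches on the normalized string are handled. Operational geocoders add fuzzy matching, interpolation along street segments, and match scoring.

```python
import re

# Hypothetical reference table: normalized street address -> (latitude, longitude).
# The addresses and coordinates are invented for illustration only.
REFERENCE_ADDRESSES = {
    "100 EXAMPLE ST": (39.7800, -86.1600),
    "250 SAMPLE AVE": (39.7900, -86.1500),
}

# A few common street-word abbreviations; a real address standard is far larger.
ABBREVIATIONS = {
    "WEST": "W", "EAST": "E", "NORTH": "N", "SOUTH": "S",
    "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD", "ROAD": "RD",
}


def normalize(address):
    """Uppercase, replace punctuation with spaces, and abbreviate street words."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def geocode(address):
    """Return (lat, lon) for an exact match on the normalized address, else None."""
    return REFERENCE_ADDRESSES.get(normalize(address))
```

Differently written forms of the same address, such as "100 Example Street" and "100 example st.", normalize to the same key and therefore receive the same coordinates. An address absent from the reference table returns `None` rather than a guessed location.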
Community data in this context refer to both compositional and contextual characteristics of the areas, or geographic entities, associated with a location. Compositional data can refer to population characteristics (e.g., adolescent fertility rates or socioeconomic status [SES]) and contextual data can refer to proximity to risk factors (e.g., nearness to high-crime areas or affordable clinical care).10–15 Contextual data can also refer to qualitative data, such as written histories of a place or interviews with residents. Community data often describe well-known geographic entities, such as neighborhoods, census tracts, and counties. Alternatively, community data may be defined by specific criteria, such as distance zones around a point location. Depending on the analysis and the theorized geographic level of influence, areas of interest may include the associated neighborhood, primary care service area, and/or county, among others.16
Data from the U.S. decennial censuses are commonly used to describe demographic, socioeconomic, and housing characteristics of a place. The annual American Community Survey data promise to provide population and housing data more regularly, although these data often have insufficient sample sizes to analyze the context of small geographic entities.17 In addition, census geographic entities, such as census tracts, may not be the most relevant to examine social processes influencing health.5,8
CIS, an ideal source of compositional and contextual community data, typically are developed by local stakeholders interested in assembling data to help assess local issues. In addition to U.S. census data, CIS commonly integrate data from a wide variety of state and local sources18 that provide more detailed and varied information not covered by the census, such as crime incidence or availability of community resources. Although some CIS focus primarily on neighborhood-level indicators, others provide data at multiple geographic levels to support a wider range of uses, including multilevel analysis, and/or allow data to be aggregated to custom boundaries of interest to the user. Once local administrative datasets are incorporated into a CIS, these data are typically updated on a recurring basis and made publicly available.
Equally important, selected data and associated CIS indicators reflect local concerns and interests, which can be informative to the researcher studying social determinants of health (SDH). Additionally, because CIS commonly incorporate geographic information systems to geocode, integrate, and visualize data, these systems can provide a good source of geocoding expertise, tools, and reference data.
We demonstrate the challenges and means of leveraging EHR systems and CIS by integrating two well-established information systems in central Indiana. The interdisciplinary Indiana Center of Excellence in Public Health Informatics manages a large EHR system called the Indiana Network for Patient Care (INPC). The Polis Center, also at Indiana University-Purdue University Indianapolis, has developed and manages the SAVI Community Information System (hereafter, SAVI).
INPC is one of the nation's most comprehensive and longest-running HIEs. It is operated by the Regenstrief Institute, an internationally recognized informatics and health-care research organization dedicated to improving health through research, development, and education. In addition to providing clinical data at the point of care to more than 14,000 physicians, INPC provides statewide syndromic surveillance, public health case detection, and physician alerting services to local and state public health. Its data repository contains more than one billion coded standardized clinical observations dating back more than 30 years and receives 350,000 to one million clinical transactions daily from more than 200 data sources, including nearly 80 emergency departments and 35 hospitals, more than 100 clinics, local and state health departments, and multiple ancillary care data sources.19 In addition to offering daily access to community-wide clinical data to providers in routine health-care settings and supporting real-time public health surveillance, INPC offers a rich source of clinical data for researchers to study pediatric obesity, adolescent and young adult sexually transmitted infections (STIs), and asthma, among other health topics.19–29
SAVI is one of the leading CIS noted by the U.S. Government Accountability Office.30 SAVI was developed and is managed by The Polis Center, which is dedicated to using collaboration, interdisciplinary research, and knowledge of advanced spatial technologies to provide reliable information for improving communities in Indiana and beyond. SAVI collects, geocodes, organizes, and presents integrated data on communities in the 11-county Indianapolis metropolitan statistical area drawn from more than 30 federal, state, and local providers, all linked to the lowest available geographic level (often, the street address). It is the nation's largest CIS, with more than 10,000 time-series variables from 1980 to the present, including welfare, education, health, public safety, housing, and demographics. SAVI also includes information on the locations of health facilities, health and human services, community facilities, and associated service areas. In addition to providing compositional and contextual data for local public health research, SAVI provides data for cross-site (multiple city and institution) studies on public health and other community issues.24–29
Until recently, researchers who wanted to link patient and community data from INPC and SAVI had to do so manually on a project-by-project basis. This ad hoc approach was inefficient and lacked the flexibility for data analysis that an integrated system could provide. Overlapping datasets would be extracted for geocoding by different researchers, with the results not returned for subsequent use. Also, when researchers who had geocoded their own datasets shared their data, there was often uncertainty about the relative accuracy of the geocodes, particularly because there was not a standard method for capturing spatial metadata or documenting the associated methodology and reference data used. In addition to being redundant, this manual approach was time-consuming. To address this problem, an INPC-SAVI team is augmenting the clinical data with geospatial attributes by designing, implementing, deploying, and evaluating a near real-time geocoding process capable of handling high-transaction volumes. Within this system linkage, all current and incoming patient records are geocoded, which allows the clinical data to be spatially associated with extensive, locally developed datasets about the social, economic, and physical environment.
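One way to reduce uncertainty about the relative accuracy of shared geocodes is to store spatial metadata with every coordinate pair. The Python sketch below is a hypothetical record layout, not the schema used by INPC or SAVI. The field names, the example method labels, and the rule for deciding whether two geocodes are comparable are all assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GeocodeResult:
    """A coordinate pair plus the metadata needed to judge its provenance."""
    latitude: float
    longitude: float
    method: str             # e.g., "address_point" or "street_centerline" (assumed labels)
    reference_source: str   # name of the reference dataset (hypothetical)
    reference_vintage: int  # year of the reference dataset
    match_score: float      # 0.0 to 1.0, as reported by the geocoder
    geocoded_on: date


def comparable(a, b):
    """Treat two geocodes as comparable only if method and reference data agree."""
    return (a.method, a.reference_source, a.reference_vintage) == (
        b.method, b.reference_source, b.reference_vintage)
```

With such a record, a researcher receiving a shared dataset could check whether its geocodes were produced with the same method and reference data as their own before pooling them.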
Requirements for the initial version of this system were defined based on a relatively simple administrative use (i.e., generation of aggregate-level counts of notifiable disease cases per county). The accuracy and completeness of geocoding required for this administrative use are not as stringent as needed for many clinical and research uses. Additional use cases have been and will continue to be defined to guide the ongoing refinement and expansion of system requirements.
To help demonstrate the opportunities and challenges associated with systemized integration of data from INPC with geospatial attributes and associated community data of SAVI, we describe an example use case—disparities in the social determinants of STIs.
Despite dramatic health disparities in STI rates by race/ethnicity and SES, previous work has recognized the limitations of individual-level and behavioral approaches in effectively targeting STI risk factors31,32 and has called for further investigation of social33 and structural34 characteristics as potential factors for contributing to the geospatial clustering of STIs.12,30,33,35–42 Investigation of how STI cases cluster geographically may identify key associations with the social and physical environment and, thus, inform research on STI prevalence and disparities. Considering geographic context and subsequently addressing characteristics of core areas contributing to spatial heterogeneity of STIs is crucial in effectively tailoring and implementing interventions.33 Furthermore, determining significant “areas of exposure” may be important in targeting prevention efforts and understanding mechanisms of local sexual networks.
In considering how to integrate community-level exposures and STI risk, we have added to a conceptual model recently presented and discussed by Buffardi et al. (Figure).43 In this model, ecosocial and individual psychosocial factors (distal risk factors) contribute to STI risk via altered partner characteristics and high-risk sexual behaviors (proximal risk factors). We have modified this model by adding another layer to the distal risk factors box, which includes compositional and contextual measures. We hypothesize that these factors may also modify partner characteristics and high-risk sexual behaviors, thus increasing STI acquisition and diagnosis.
For the use case, we are interested in assessing associations between an individual's disease risk (occurrence of a particular STI during that year) and the selected community measures (low SES, incarceration rates, marriageable males, percentage of vacant houses, and institutional resources) at different geographic levels to evaluate significance of exposure proximity. We use incarceration as an example case because STIs have been well-documented in incarcerated populations and among arrestees,44–48 including adolescents.49–55 In addition, area-level incarceration has been linked to area-level STI rates. Researchers have suggested that associations between STI prevalence and incarceration rates may reflect the impact of ex-offenders on the general population.56 Incarceration may also produce a change in network structures while community members are incarcerated.32,56,57 Communities with high incarceration rates may also carry disproportionate risk through contact with individuals with high STI risk prior to their incarceration.58
Most studies of community risk factors rely on commonly available administrative boundaries, such as census-tract boundaries, to define a patient's area of exposure. This is likely a decision of convenience rather than a scientifically based one. While we are interested in developing measures at the census-tract level for comparison with previous studies, we are also interested in developing measures for zones around a patient's residence (e.g., 200 meters, 400 meters, and 800 meters). As such, we identified the following data-processing and integration steps for our example use case.
Residential address indicates where the individual likely spent a significant portion of his/her time, although other relevant exposures may include those associated with place of work, school, and other daily activities.59,60 Multiple residential addresses can be stored if a patient changed residence between clinical visits. Access to these time-series address data allows analysis of effects of residential mobility and longitudinal exposures. Once geographic coordinates are defined for a patient record, they can be used to identify the associated areas (i.e., census tracts) for which compositional and contextual data are available.
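To show how stored address histories could support longitudinal analysis, the Python sketch below selects the address in effect on a given date. It assumes, purely for illustration, that each address is stamped with the date on which it was first recorded and that it remains valid until the next recorded address. The function name and data layout are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def address_on(history, when):
    """Return the address in effect on `when`, or None if none was yet recorded.

    `history` is a list of (first_recorded_date, address) tuples sorted by date.
    Assumption: an address applies from its recorded date until the next entry.
    """
    start_dates = [start for start, _ in history]
    index = bisect_right(start_dates, when)
    return history[index - 1][1] if index else None


# Invented example history for one patient.
history = [
    (date(2005, 1, 10), "100 EXAMPLE ST"),
    (date(2008, 6, 15), "250 SAMPLE AVE"),
]
```

Calling `address_on(history, date(2006, 3, 1))` returns the first address, and any date on or after June 15, 2008, returns the second. The true move date usually falls somewhere between two visits, so exposure windows derived this way are approximate.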
We are interested in where convicted criminals lived, as opposed to where crimes occurred. While crime event data are collected and geocoded by SAVI, incarceration data currently are not. As such, individual-level incarceration data must be collected from other agencies and geocoded.
Using the generated geographic coordinates of each positive STI, we will be able to calculate incarceration rates for various geographies, including census boundaries or buffer zones.
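A buffer-zone count can be sketched as follows. This Python example uses the haversine great-circle distance to count how many geocoded records fall within 200, 400, and 800 meters of a residence. The coordinates are invented, straight-line distance is an assumption (network distance may be more appropriate for some uses), and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def counts_by_buffer(residence, points, radii_m=(200, 400, 800)):
    """Count the points lying within each radius (in meters) of the residence."""
    distances = [haversine_m(*residence, *point) for point in points]
    return {radius: sum(d <= radius for d in distances) for radius in radii_m}


# Invented coordinates: four records at increasing distances due north.
residence = (39.780, -86.160)
records = [(39.781, -86.160), (39.783, -86.160), (39.786, -86.160), (39.790, -86.160)]
```

The result is a count per zone. Converting a count to a rate would require a population denominator for the same zone, which this sketch does not attempt.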
The remaining community variables of interest are available from SAVI at the census-tract level. We will be able to readily link these variables to the STI cases using census-tract identification as the joining field. For analysis using our buffer zones of interest, we will be able to aggregate the available record-level data in SAVI to the generated buffer zones.
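The tract-level link described above amounts to a join on a shared identifier. The Python sketch below attaches tract-level measures to individual case records. The field names (`tract_id`, `pct_vacant_housing`), the identifiers, and the values are hypothetical and do not reflect the SAVI or INPC schemas.

```python
def link_cases_to_tract_measures(cases, tract_measures):
    """Attach tract-level community measures to each case by census-tract ID.

    Cases whose tract has no measures are kept unchanged so that unmatched
    records remain visible instead of being silently dropped.
    """
    linked = []
    for case in cases:
        measures = tract_measures.get(case["tract_id"], {})
        linked.append({**case, **measures})
    return linked


# Invented example data.
cases = [
    {"case_id": 1, "tract_id": "T001"},
    {"case_id": 2, "tract_id": "T999"},  # no community measures for this tract
]
tract_measures = {"T001": {"pct_vacant_housing": 12.5}}
linked = link_cases_to_tract_measures(cases, tract_measures)
```

Keeping unmatched cases in the output makes it easy to report how many records could not be linked, which matters when judging the completeness of geocoding.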
While the integration of spatially enabled EHR systems with CIS holds great promise for understanding health disparities, the automated integration of these data presents challenges.
The list of use cases that could potentially be supported by such an integrated system is endless. Also, the output of spatial data integration and analysis will need to be translated into formats useful for a wide range of public health users and accessible to multiple agencies and jurisdictions. Prioritizing use cases and associated system requirements is important for project planning and managing stakeholder expectations.
Geocoding capabilities are available via a wide range of geographic information system tools and services. In addition, there are multiple types of reference data (e.g., street address centerlines, property parcels, and address points) and sources (e.g., federal, commercial, and academic) for any given geographic area. Different reference data may produce different geospatial attributes. Thus, appropriate selection depends on the use case and an associated assessment of available data, including the cost and associated data types, geometries, accuracies, and resolutions. Finally, geocoding technology and reference layers will continue to evolve.
Both INPC and SAVI had to develop relationships and trust among their data-sharing partners; INPC and SAVI must continuously demonstrate that their partners are getting value from sharing their data and that their interests are being protected. This sociocultural component is a greater challenge than the technical design and implementation of such a system. Data-sharing memoranda of understanding include oversight provisions, guarantees of confidentiality, and guides on allowable data use. Only aggregate data at a geographic level that appropriately protects the confidentiality of an individual's data are publicly released. In addition to assurances of data security, many stakeholders seek the greatest community benefit possible from their data-sharing efforts. To increase the possible uses of the integrated data while maintaining individual confidentiality, we have discussed the future integration of geomasking techniques into the system.
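The text does not specify which geomasking technique is under discussion. As one commonly described option, the Python sketch below applies a "donut" displacement: each point is moved in a random direction by a random distance between a minimum and a maximum. The function name, the distance limits, and the flat-earth conversion from meters to degrees are assumptions suitable only for small displacements.

```python
import math
import random

METERS_PER_DEGREE_LAT = 111_320  # approximate length of one degree of latitude


def donut_mask(lat, lon, min_m, max_m, rng=None):
    """Displace a point by a random distance in [min_m, max_m] and a random bearing.

    Note: drawing the distance uniformly concentrates points toward the inner
    ring; this is a simplification, not a recommendation.
    """
    rng = rng or random.Random()
    distance = rng.uniform(min_m, max_m)
    bearing = rng.uniform(0, 2 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE_LAT
    dlon = distance * math.sin(bearing) / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

The minimum distance keeps a masked point from landing on the true residence, while the maximum bounds the spatial error introduced into later analysis. Choosing these limits involves a trade-off between confidentiality and analytic precision that depends on the use case.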
To meaningfully facilitate interdisciplinary research of SDH, the integration of clinical and community data systems requires active collaboration among information scientists, geographers, data providers, clinicians, and public health investigators experienced in using clinical, as well as community, data. Specifically, the collaboration helps to accomplish the following tasks.
The productive synthesis of geospatially enabled clinical and community data requires careful consideration of the data needs and uses associated with different groups, including, but not limited to, data system administrators, clinicians, public health researchers, public health agencies, and nongovernmental public health organizations. Multidisciplinary collaboration on system design can require extensive communication efforts due to different theoretical frameworks and terminology. Clear documentation of agreed-upon data classifications, standards, and processes is essential.
To develop integrated data systems that meaningfully facilitate exploration of SDH, we need input from all collaborators experienced in using clinical and community data. The coordinated integration of an EHR system with a CIS provides the opportunity to build consistency in geocoding methods and reference data. Consistent methods and reference data for geocoding clinical and community datasets are necessary because inconsistent assignment of geographic coordinates can bias subsequent analysis. As such, choice of geocoding technology and reference maps requires expert knowledge of geographic information science and an understanding of use cases.
Acquisition of administrative datasets, particularly at the individual record level, typically requires memoranda of understanding with the source data provider(s), usually with strict data-handling protocols designed to protect confidentiality. Establishing a memorandum of understanding can be time-consuming. With the explicit, written permission of source providers and Institutional Review Board approval, many of the individual-level data geocoded for CIS inclusion can be made available for research and other public health uses in a de-identified or aggregated format. CIS typically have data advisory committees composed of community stakeholders that recommend datasets for inclusion based on understanding the needs of the local community. These committees provide a valuable source of community perspective.
Ultimately, we want to make our integrated data available for public health uses. An EHR system in combination with a CIS can be used to significantly augment existing community health assessment tools. These tools typically report health outcomes by county or other geographic administrative unit relevant for policymaking but not necessarily for understanding SDH. Our envisioned system will incorporate the generated spatial attributes into existing standards-based, bidirectional communications between public health and clinical stakeholders. For this full potential to be realized, multiple public health sectors must become invested, contributing members of use case development.
The benefits of integrating EHR systems with CIS include a better fit between the intended use of clinical, spatial, and community data and the associated geoprocessing options, with the ultimate aim of developing a sound theoretical and practical guide for applying these data for addressing SDH. Benefits also include the opportunity to generate and test new hypotheses to address the social etiologies of health disparities. Our integration work will lead to the identification of nexus elements in system design that will inform the work of other systems' integration efforts and continually improve upon our current efforts. For instance, system integration will address whether geospatial data may improve patient-matching methods that are used to aggregate patient data.
The growth in the number of EHR systems and CIS nationwide holds promise for this type of system integration and the associated benefits to occur in cities and states across the nation. In 2009, there were an estimated 193 initiatives across the U.S. pursuing HIE activities.61 Additional investment from the American Recovery and Reinvestment Act of 2009 is likely to increase this number.62 The Centers for Disease Control and Prevention (CDC) has invested in HIE via the Nationwide Health Information Network (NHIN), envisioned to be a network of networks allowing for data integration across states and regions among clinical and public health information systems.63,64 CDC also has funded initiatives that would connect HIEs to state health departments and state health departments to CDC via NHIN.65,66
U.S. agencies are increasingly developing, integrating, and using local data for community building and policymaking, as reflected by the growth and activity of the National Neighborhood Indicators Partnership (NNIP), which grew from six original partner cities in 1995 to 36 in 2011 and has been involved in many multi-city initiatives to advance the development and use of local CIS. All NNIP partner cities have built advanced, spatially enabled data systems on neighborhood conditions. The local specificity of these data systems makes them a valuable resource for those interested in investigating the relationships between health and prevention and social, cultural, and other environmental factors. A 2004 U.S. Government Accountability Office study surveyed such systems for potential inclusion in a national indicators system.29 These systems serve different purposes, some with comprehensive missions (e.g., SAVI), and others focused on specific topics such as education or housing; however, most use geospatial information technologies and have efficient geocoding mechanisms.67 Despite the significant potential for geospatial integration with HIE, CIS do not have a standard design, as most have developed in situ in response to local interests and needs68 and, thus, their possibilities and challenges vary.
While it is not uncommon to geocode clinical data for ad hoc analyses and queries, it is uncommon to routinely maintain an up-to-date geocoded patient registry containing billions of clinical data results using operational automated mechanisms. We believe that this type of system will become more common in the future, and our system serves as an example of a novel data-enhancement process that links clinical data to important community-level information. Once the integration of CIS as a core part of HIE infrastructure becomes accepted practice, more in-depth investigation of SDH disparities will be possible.
Understanding SDH is critical in addressing health disparities. However, relatively few research projects use individually identified data tracked longitudinally with point-level address data, which may help explain why the resulting findings and associated interventions have been inadequate to address the problem. With the nationwide development of HIEs and CIS, there is an increased opportunity for meaningful compositional, contextual, and geographic analysis of SDH. To realize the full potential of this opportunity, several challenges must be met, including the need for (1) a multidisciplinary approach; (2) research in related fields, such as information science, geoinformatics, informatics, and health geographics; and (3) tools that can be used by multiple public health sectors. The benefits, however, clearly outweigh the challenges.
With system integration recently completed, including the geocoding of more than 27 million historic clinical records and an additional 300,000 new clinical records every evening, we are poised to proceed with analysis of the linked data for several local research projects, including the study of STIs, obesity, and diabetes health disparities.
With our advancing technology, interdisciplinary collaborations, and developing experience, we will be well placed to investigate health disparities resulting from social determinants further and, we hope, to address them fully.