We evaluated six end-user applications and research network processes for data sharing. None met our security requirements for local control and vetting of individual queries, and several required programming and database staff at the clinics to execute queries manually, expertise the clinics did not have. Given the immaturity of end-user applications that supported our data sharing security and governance needs, and the enormity of developing the requisite software ourselves, we limited the project scope to implementing the ETL process without an end-user application layer. Thus, for the pilot project, data sharing across sites would be supported manually through ITHS and vendor collaborations rather than through an end-user application. As we developed the ETL process, we also developed data quality and metadata management approaches to establish a foundation for a data sharing architecture to which an end-user application layer could be added in the future.
Our sites use diverse EMR products without agreed-upon data standards and data practices. Among the 10 evaluation sites, four different EMR products were in use, and the ownership and physical location of EMR data varied across our six selected partner sites. These variations complicated the cost and governance of gaining access to perform extractions. Three sites had both physical access to and ownership of their EMR data, providing the easiest access. Two sites had either ownership or physical access, but not both, which complicated access.
Our last site had neither physical access to nor ownership of its EMR data, so we postponed including it in this new architecture until it migrated to a new EMR system.
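The access situation above can be viewed as a two-by-two grid of data ownership and physical access. The following Python sketch classifies sites into the three access tiers described; the site names, and which of the partially accessible sites held ownership versus physical access, are hypothetical placeholders rather than data from the pilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    owns_data: bool        # site holds ownership of its EMR data
    physical_access: bool  # site can physically reach its EMR data


def access_tier(site: Site) -> str:
    """Map the ownership/physical-access combination to an access tier."""
    if site.owns_data and site.physical_access:
        return "best"         # both conditions met: easiest path to extraction
    if site.owns_data or site.physical_access:
        return "complicated"  # the missing condition must be negotiated
    return "deferred"         # neither: wait for an EMR migration


# Hypothetical placeholders mirroring the 3 / 2 / 1 split described above.
sites = [
    Site("site-1", True, True),
    Site("site-2", True, True),
    Site("site-3", True, True),
    Site("site-4", True, False),
    Site("site-5", False, True),
    Site("site-6", False, False),
]

for s in sites:
    print(s.name, access_tier(s))
```

Treating the two partial cases as one tier reflects the text, which groups them together regardless of which condition is missing.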
We recognized that engaging a vendor who specialized in ETL across diverse practice settings would be the most cost-effective method for performing ETL at multiple sites with multiple vendor-supplied EMRs. We therefore sought vendors with previous experience with ETL processes at primary care clinics, experience with multiple EMR products, and strategies for data quality and semantic alignment. Discussions with national CTSA colleagues building similar architectures and with Practice Based Research Networks (PBRNs) led us to focus on point-of-care clinical decision support (CDS) tools for our data quality approach. Employing a CDS tool at sites offered two key features: 1) a natural data quality feedback loop that sustains the usability of the repositories, because the extracted clinical data are actively used and iterated on in practice; and 2) immediate benefit to our partners. We therefore screened vendors for experience delivering ETL services in medical settings, point-of-care CDS tools, and solutions to semantic alignment. These requirements, together with our original requirements of a federated architecture and remote management, formed the core set of system requirements that we used to evaluate vendors.
Specifically, we identified and evaluated four vendors (W, X, Y, Z) with experience in providing data services to medical settings. Table 1 summarizes our system requirements and vendor evaluations. All vendors had the ability to remotely manage their systems. Vendors W and X's primary business was primary care CDS tools built on an extracted data repository. Vendor W extracted EMR data into repositories located at practice sites, while Vendor X extracted the data into a repository located remotely at its own facilities. Vendor Y was a clinical data warehousing consulting firm with extensive experience performing custom ETL projects at large hospitals, but not in small primary care clinic settings. Vendor Z specialized in health data exchange services. Neither Vendor Y nor Vendor Z had the necessary ETL experience or a CDS tool. Vendor W met all of our criteria, met our budgetary constraints, and brought additional expertise in national health guidelines and in delivering data extractions to support comparative effectiveness research.
Table 1. Vendor evaluation matrix. Vendors A and B specialized in clinical decision support products. Vendor C was a clinical data warehousing consulting firm. Vendor D specialized in health data exchange. The system requirements are listed in the left column.
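One way to read the vendor evaluation is as a requirements checklist. The following Python sketch is a simplified, hypothetical encoding of the narrative in this section, not a reproduction of Table 1: the criterion names are our own labels, and any criterion the text does not establish for a vendor is simply omitted and treated as unmet.

```python
# Hypothetical criterion labels; a missing entry means "not established" and counts as unmet.
REQUIREMENTS = (
    "remote_management",
    "primary_care_etl",   # ETL experience in primary care clinics
    "cds_tool",           # point-of-care CDS product
    "site_repositories",  # repositories located at practice sites (federated)
    "within_budget",
)

vendors = {
    "W": {"remote_management": True, "primary_care_etl": True, "cds_tool": True,
          "site_repositories": True, "within_budget": True},
    "X": {"remote_management": True, "primary_care_etl": True, "cds_tool": True,
          "site_repositories": False},
    "Y": {"remote_management": True, "primary_care_etl": False, "cds_tool": False},
    "Z": {"remote_management": True, "primary_care_etl": False, "cds_tool": False},
}


def unmet(name):
    """Return the requirements a vendor does not clearly satisfy, in checklist order."""
    return [req for req in REQUIREMENTS if vendors[name].get(req) is not True]


def qualifying():
    """Vendors that satisfy every requirement."""
    return [name for name in vendors if not unmet(name)]


for name in vendors:
    print(name, unmet(name) or "meets all requirements")
```

Treating unknowns as unmet is a deliberately conservative choice: a vendor only qualifies when every requirement is positively established.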
Figure 2 summarizes the resulting ETL and data quality components of our technical architecture. Our vendor extracts a set of common data elements into individual LC Data QUEST repositories located at each practice site. Once the data are loaded into a site's LC Data QUEST repository, they can be shared with researchers to support various research activities, including cohort discovery, randomized controlled trials, and comparative effectiveness research. Individual practices can also analyze their own repository data to target quality improvement initiatives or to support other practice-based activities, using a registry tool licensed by the vendor. Data are owned by the individual partner sites, and no data are shared with outside collaborators unless explicitly approved by the site.
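The governance rule that sites own their data and release it only with explicit approval can be sketched as an approval gate on each local repository. The class and method names below (`LocalRepository`, `approve`, `share`) are hypothetical and assume one approval per research request; they do not describe the vendor's software.

```python
from dataclasses import dataclass, field


@dataclass
class LocalRepository:
    """Hypothetical model of one practice's repository; the site owns the data."""
    site: str
    records: list = field(default_factory=list)
    approved_requests: set = field(default_factory=set)

    def approve(self, request_id):
        # Only the owning site records approval for an outside request.
        self.approved_requests.add(request_id)

    def query_internal(self, predicate):
        # Practice-internal use (e.g., quality improvement) needs no outside approval.
        return [r for r in self.records if predicate(r)]

    def share(self, request_id, predicate):
        # Nothing leaves the site unless this specific request was approved.
        if request_id not in self.approved_requests:
            raise PermissionError(f"{self.site}: request {request_id} not approved")
        return self.query_internal(predicate)


repo = LocalRepository("clinic-a", records=[
    {"patient": 1, "dx": "diabetes"},
    {"patient": 2, "dx": "asthma"},
])
repo.approve("req-1")
print(repo.share("req-1", lambda r: r["dx"] == "diabetes"))
```

Keying approvals to individual requests, rather than to researchers, mirrors the requirement for local vetting of individual queries.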
Figure 2. LC Data QUEST technical system architecture illustrating ETL and data quality management activities. At each practice, a standard set of EMR data elements is loaded daily in batch into a local repository. The repository supports generation of point-of-care decision support reports.
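The daily batch load described in the Figure 2 caption can be sketched as an extract step that keeps only a common element set, followed by a transactional load into a local database. The element names, the table schema, and the choice to replace a day's batch on rerun (so a repeated load does not duplicate rows) are assumptions for illustration, using Python's built-in `sqlite3` module in place of the vendor's repository.

```python
import sqlite3

# Hypothetical common data element set; the actual LC Data QUEST element list is not shown here.
COMMON_ELEMENTS = ("patient_id", "encounter_date", "diagnosis_code", "lab_name", "lab_value")


def extract(emr_rows):
    """Keep only the agreed common data elements from EMR-specific rows."""
    return [tuple(row.get(col) for col in COMMON_ELEMENTS) for row in emr_rows]


def load_daily_batch(conn, batch_date, emr_rows):
    """Replace the given day's batch in the local repository in one transaction."""
    rows = extract(emr_rows)
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM observations WHERE batch_date = ?", (batch_date,))
        conn.executemany(
            "INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)",
            [(batch_date, *r) for r in rows],
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (batch_date TEXT, patient_id TEXT, encounter_date TEXT,"
    " diagnosis_code TEXT, lab_name TEXT, lab_value REAL)"
)
```

Fields outside the common set (for example, a vendor-specific note column) are silently dropped by `extract`, which is one simple way to keep every site's repository on the same element list.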
At the bottom of the local data loop, a program generates a point-of-care report that includes CDS for recommended national guidelines of care. Patients and practitioners review the point-of-care report during visits and can correct data errors.
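The point-of-care loop can be illustrated with a single guideline rule: the report lists overdue items, and a correction captured at the visit removes a false alert. The rule below (an HbA1c result no older than 180 days for patients with diabetes) is a hypothetical example rather than the vendor's guideline content. In practice a correction would be made in the EMR and picked up by the next daily extraction; this sketch writes it directly to the record for brevity.

```python
from datetime import date

# Hypothetical guideline rules: (condition, required measure, maximum age in days).
RULES = [
    ("diabetes", "HbA1c", 180),
]


def poc_report(patient, today):
    """List guideline items that are missing or overdue for this patient."""
    lines = []
    for condition, measure, max_days in RULES:
        if condition not in patient["conditions"]:
            continue
        last = patient["last_done"].get(measure)
        if last is None or (today - last).days > max_days:
            lines.append(f"{measure} due ({condition})")
    return lines


def apply_correction(patient, measure, done_on):
    """Record a correction found during the visit (simplified: updates the record directly)."""
    patient["last_done"][measure] = done_on


patient = {"conditions": {"diabetes"}, "last_done": {}}
print(poc_report(patient, date(2012, 6, 1)))
apply_correction(patient, "HbA1c", date(2012, 5, 1))
print(poc_report(patient, date(2012, 6, 1)))
```

The feedback loop is what makes the report useful for data quality: each false alert a practitioner resolves is also a data error surfaced and fixed.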
Our initial federated data sets included variables that support management of common diseases covered by national guidelines, although the design can be expanded to other domains that sites find desirable or that future data sharing projects require. As an initial proof of concept, providing decision support based on national clinical guidelines was of immediate benefit to practices, while the extracted data allowed us to test our data quality and metadata management strategies. Two of the six pilot sites have implemented ETL, two are in the installation process, and two have been delayed by administrative and technical issues (i.e., governance requirements and EMR migrations). Beyond the initial pilot funding, LC Data QUEST supports three funded research projects, with several more in development and a growing set of collaborations.
The need to inventory and manage the complex set of shared clinical data led to the development of the Federated Information Dictionary Tool (FInDiT), designed to catalogue the type, quantity, and quality of the data available across the LC Data QUEST data sharing architecture. This design allows for easy addition of future sites by: 1) defining a fixed extract format for the aggregated data content and metadata needed from any additional federated repository wishing to be added; 2) allowing simple upload of this extract into a SQL database; and 3) providing dynamic access to the data via a web-based graphical user interface.
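The three FInDiT steps (a fixed aggregate extract format, upload into a SQL database, and cross-site querying) can be sketched with Python's `csv` and `sqlite3` modules. The column names and schema below are hypothetical, since the actual FInDiT extract format is not specified here, and the `inventory` function only stands in for the kind of query a web front end might issue.

```python
import csv
import io
import sqlite3

# Hypothetical aggregate extract format: one row per site and data element.
EXTRACT_FIELDS = ["site", "data_element", "data_type", "record_count", "pct_complete"]


def upload_extract(conn, csv_text):
    """Validate one site's aggregate extract and load it into the dictionary database."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXTRACT_FIELDS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = [
        (r["site"], r["data_element"], r["data_type"],
         int(r["record_count"]), float(r["pct_complete"]))
        for r in reader
    ]
    with conn:  # one transaction per uploaded extract
        conn.executemany("INSERT INTO inventory VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def inventory(conn, data_element):
    """Cross-site view of one data element: record count and completeness per site."""
    return conn.execute(
        "SELECT site, record_count, pct_complete FROM inventory "
        "WHERE data_element = ? ORDER BY site",
        (data_element,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (site TEXT, data_element TEXT, data_type TEXT,"
    " record_count INTEGER, pct_complete REAL)"
)
```

Because only aggregate counts and completeness figures are uploaded, a new site can join the catalogue by producing one file in the agreed format, without exposing patient-level data.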