The GEM database was designed to facilitate these standardization and sharing processes across scientific disciplines. It was therefore built to be flexible enough to accommodate different types of measures (e.g., from self-report to biological). The system can accommodate independent and dependent variables across the health continuum, from prevention through diagnosis and treatment to end-of-life issues, regardless of disease or wellness focus. Using principles of Science 2.0, GEM solicits community participation in contributing, vetting, and selecting measures for harmonizing data in a grid-enabled world. More importantly, it is a doorway, or portal, to the use of harmonized measures and the creation of a more semantically integrated science.
As a result, the GEM database creates an environment for “prospective meta-analyses,” in which research is designed for integration from the outset. Comparisons across studies can become the norm, rather than the product of the current, often awkward, retrofitting process. The purpose and structure of the GEM approach are reflected in the design of the online tool’s primary interface, which uses a tab-based architecture with three main tabs: constructs, measures, and data sets. Briefly, users can search for or add information about theoretically based constructs (including meta-data such as the definition and synonyms); find the measures linked to each construct (a one-to-many relationship, since one construct may be assessed by many measures) along with associated meta-data such as author, reliability, and validity; and search for and share harmonized data that use any common measures (see figure below).
Tabbed architecture of the GEM portal leading the user through an evaluation of construct, measure, and associated data sets.
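To make the construct-to-measure-to-data-set navigation concrete, here is a minimal Python sketch of the data model. The class names, fields, and the helper `data_sets_for_construct` are hypothetical illustrations, not GEM’s actual schema; the sketch assumes only what the text states: each construct carries a definition and synonyms, links to many measures, and data sets are reached through the measures they use.

```python
from dataclasses import dataclass, field


@dataclass
class Measure:
    name: str
    author: str
    reliability: str = ""  # free-text psychometric meta-data (hypothetical field)
    validity: str = ""


@dataclass
class Construct:
    name: str
    definition: str
    synonyms: list[str] = field(default_factory=list)
    # One construct can be linked to many measures (one-to-many).
    measures: list[Measure] = field(default_factory=list)


@dataclass
class DataSet:
    title: str
    # Names of documented measures that this data set uses.
    measure_names: set[str] = field(default_factory=set)


def data_sets_for_construct(construct: Construct, data_sets: list[DataSet]) -> list[DataSet]:
    """Follow construct -> measures -> data sets, mirroring the three tabs."""
    names = {m.name for m in construct.measures}
    return [ds for ds in data_sets if ds.measure_names & names]


# Hypothetical example records.
stress = Construct("Perceived stress", "(example definition)", ["stress appraisal"],
                   [Measure("Scale A", "Author 1"), Measure("Scale B", "Author 2")])
studies = [DataSet("Study 1", {"Scale A"}), DataSet("Study 2", {"Scale C"})]
print([ds.title for ds in data_sets_for_construct(stress, studies)])  # ['Study 1']
```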
The GEM database is based on four primary design principles: (1) Architecture for participation: barriers to participation (or collaboration, in the scientific realm) are removed, so that contributing is as easy as “plug and play” (think of how easy it is to upload videos to YouTube); (2) Data-driven decision making: decisions, whether to buy a stereo or to choose a doctor, hospital, or self-report measure, are based on objective data; (3) Wisdom of the masses: under certain circumstances (such as decentralization and diversity of opinion), the masses can make more intelligent decisions than an individual expert [40]; (4) Open access: data can be accessed and manipulated, and results are made available to those who need them and can use them in a functional manner.
Architectures for Participation
As its name implies, the GEM database is integrated with the caBIG® grid-computing infrastructure and its resources. caBIG® and GEM intersect in many ways. First, and most directly, the meta-data available in the GEM database (i.e., the information about the constructs, measures, and data sets) are published through a publicly available caGrid data service. This data service was implemented to be fully compatible with the caBIG®/caGrid integrated framework (“the Grid”). This means the GEM meta-data can be used by other data systems and/or analytic web services connected to the Grid through a well-defined architecture. While the caGrid object-oriented architecture ensures interoperability between systems, reconciling the definitions of terms ensures semantic interoperability through a common vocabulary and concept mapping. Therefore, any system that understands the concepts associated with behavioral science measures and constructs can use the meta-data available through the GEM database service for its own purposes. Among other things, this flexibility could facilitate data sharing between systems that catalog scientific measures and constructs.
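The role of a shared vocabulary can be illustrated with a small sketch. Two hypothetical systems label the same concept differently; mapping each local label to a common concept identifier lets a program recognize that the labels refer to the same thing. The identifiers such as `CONCEPT-001` and the function `shared_concepts` are placeholders, not real caBIG® vocabulary codes or caGrid APIs.

```python
# Hypothetical local vocabularies of two systems that catalog measures.
# "CONCEPT-..." identifiers are illustrative placeholders, not real codes.
SYSTEM_A_TERMS = {"Perceived Stress": "CONCEPT-001", "Self-Efficacy": "CONCEPT-002"}
SYSTEM_B_TERMS = {"perceived_stress": "CONCEPT-001", "smoking_status": "CONCEPT-003"}


def shared_concepts(terms_a: dict[str, str], terms_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return concept IDs used by both systems, paired with each system's local label."""
    # Invert system B's mapping so it can be looked up by concept ID.
    by_concept_b = {concept: label for label, concept in terms_b.items()}
    return {
        concept: (label_a, by_concept_b[concept])
        for label_a, concept in terms_a.items()
        if concept in by_concept_b
    }


print(shared_concepts(SYSTEM_A_TERMS, SYSTEM_B_TERMS))
# {'CONCEPT-001': ('Perceived Stress', 'perceived_stress')}
```

The labels never have to match textually; agreement on the concept identifier is what makes the two catalogs comparable.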
In addition to publishing its meta-data through a data service, the GEM database allows users to provide meta-data about research data sets (e.g., description, dates of data collection, sample characteristics) that are available as data services on the Grid. To work successfully, these data sets must be available as caGrid-compatible data services and must use at least one of the measures documented in the GEM database. The meta-data about each data set include the information required to access the associated Grid data service. With this functionality, researchers can find measures of interest and then immediately find data sets that include those measures. At a minimum, this allows researchers to review the type of data produced by a measure of interest. Depending on a researcher’s objectives and technical capabilities, a query could then be executed to bring together data from several data services that all use common constructs and associated measures. This interoperability could open the door to new data-sharing opportunities within the scientific community, although novel statistical techniques will very likely be needed to analyze the combined data. Ultimately, this approach to common data collection and the resulting sharing has the potential to accelerate the accumulation of knowledge.
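The workflow of locating data sets by measure and then combining them can be sketched as follows. In-memory dictionaries stand in for Grid data services, and the study names, measure names, and functions are hypothetical; a real client would use the access information stored in each data set’s meta-data to query the corresponding caGrid service.

```python
from collections import defaultdict

# Hypothetical stand-ins for Grid data services: each maps a data set name
# to its records, and every record stores scores keyed by measure name.
DATA_SERVICES = {
    "Study 1": [{"Scale A": 12, "age": 34}, {"Scale A": 9, "age": 51}],
    "Study 2": [{"Scale B": 3, "age": 29}],
    "Study 3": [{"Scale A": 15, "Scale B": 4, "age": 45}],
}


def data_sets_using(measure: str, services: dict[str, list[dict]]) -> list[str]:
    """Names of data sets whose records include the given measure."""
    return [name for name, rows in services.items() if any(measure in r for r in rows)]


def pool_measure(measure: str, services: dict[str, list[dict]]) -> dict[str, list]:
    """Gather scores on one common measure from every data set that uses it."""
    pooled = defaultdict(list)
    for name in data_sets_using(measure, services):
        pooled[name].extend(r[measure] for r in services[name] if measure in r)
    return dict(pooled)


print(data_sets_using("Scale A", DATA_SERVICES))  # ['Study 1', 'Study 3']
print(pool_measure("Scale A", DATA_SERVICES))     # {'Study 1': [12, 9], 'Study 3': [15]}
```

Scores stay grouped by data set rather than being merged into one list, since any pooled analysis would still need to account for differences between studies.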
Templates Available for Uploading Meta-Data
From its inception, one of the core goals of GEM has been data sharing. As discussed previously, the GEM database shares meta-data via a public data service on the Grid and promotes the sharing of data through its “Datasets” tab. However, true data sharing is reciprocal. That is, the system shares its information but also receives information from outside sources. Because outside database systems already contain meta-data about measures and constructs, the creators of the GEM database wanted to make sure it could benefit from these existing repositories without requiring duplicate data entry. In the caBIG® world, the ideal solution would be for the GEM database to import these meta-data through another data service on the Grid. However, at this time, most outside sources of meta-data for measures and constructs are not connected to the Grid; they primarily hold proprietary data belonging to individual investigators. Therefore, a file-based approach to importing data into the GEM database needed to be developed.
During the initial deployment phase, the GEM database supported the import of information about constructs and measures via Microsoft Excel spreadsheets. Standardized import template spreadsheets were developed for both constructs and measures. Institutions wishing to contribute meta-data from an existing database used these templates to create import spreadsheets, which were then processed and their contents imported into the GEM database. Although the GEM database team successfully imported meta-data from several outside institutions using these spreadsheets, automating the import process was challenging. Most of the barriers to automation were related to handling data values that were inconsistent with the database’s internal checks and balances. A future update will address these issues using XML (Extensible Markup Language), a markup language for encoding documents in machine-readable form.
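As an illustration of how a structured import could enforce the database’s checks record by record, the sketch below parses an XML document with Python’s standard `xml.etree.ElementTree` module and separates valid records from those that fail. The element and attribute names, and the specific checks, are assumptions for illustration only; they are not an actual GEM import schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical import format; element and attribute names are illustrative only.
SAMPLE = """
<measures>
  <measure name="Scale A" construct="Perceived stress" items="10" reliability="0.85"/>
  <measure name="Scale B" construct="" items="ten" reliability="0.78"/>
</measures>
"""


def check(attrs: dict[str, str]) -> list[str]:
    """Apply simple (assumed) internal checks; return problems, empty if valid."""
    problems = []
    if not attrs.get("name"):
        problems.append("missing name")
    if not attrs.get("construct"):
        problems.append("missing construct")
    if not attrs.get("items", "").isdigit():
        problems.append("items is not a whole number")
    try:
        if not 0.0 <= float(attrs.get("reliability", "")) <= 1.0:
            problems.append("reliability outside 0-1")
    except ValueError:
        problems.append("reliability is not a number")
    return problems


def import_measures(xml_text: str) -> tuple[list[dict], list[tuple]]:
    """Split records into accepted ones and rejected ones with their reasons."""
    accepted, rejected = [], []
    for el in ET.fromstring(xml_text.strip()).iter("measure"):
        problems = check(el.attrib)
        if problems:
            rejected.append((el.attrib.get("name"), problems))
        else:
            accepted.append(dict(el.attrib))
    return accepted, rejected


ok, bad = import_measures(SAMPLE)
print([m["name"] for m in ok])  # ['Scale A']
print(bad)  # [('Scale B', ['missing construct', 'items is not a whole number'])]
```

Reporting each rejected record with its reasons, rather than aborting the whole file, is one way to handle the inconsistent values that made spreadsheet imports hard to automate.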
Wiki-like processes to edit meta-data
In the simplest terms, a wiki is an open, collaborative website where anyone can contribute (for more information, see http://en.wikipedia.org/wiki/Wiki). It is in this spirit of enabling community-driven science that the GEM database was developed. It provides a means by which any registered user within the research community can contribute by adding meta-data about constructs, measures, and research data. The contributor does not need to be the author of a construct or measure in order to document it in the GEM database. Any registered user can add to or change any meta-data in the database, but not anonymously: to ensure accountability, the author of each addition or edit is identified.
Just as with other wiki-like systems, the GEM database depends on its community of users to monitor the accuracy and validity of the information being contributed. Information the community finds erroneous or suspect can be discussed and subsequently corrected by the community itself. As a safeguard, the GEM database keeps a full audit trail so that the “who, what and when” of each update is always available to every user. To those unfamiliar with wikis, a site where anyone with an authorized login can contribute and edit existing information may still seem a recipe for chaos and inaccurate entries. However, Nature has reported that the accuracy of Wikipedia’s coverage of science is on par with that of the Encyclopedia Britannica [41].
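The combination of open but non-anonymous editing and a full audit trail can be sketched as an append-only history attached to each record. This minimal illustration assumes a simple set of registered user names; the class names, field names, and example values are hypothetical, since the text does not describe how GEM stores its audit trail.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    user: str        # who
    field_name: str  # what changed
    old: str
    new: str
    when: datetime   # when


class MeasureRecord:
    """A wiki-like record: any registered user may edit; every edit is logged."""

    def __init__(self, registered_users: set[str], **attributes: str):
        self.registered_users = registered_users
        self.attributes = dict(attributes)
        self.history: list[AuditEntry] = []  # append-only audit trail

    def edit(self, user: str, field_name: str, value: str) -> None:
        # Anonymous or unregistered users cannot edit.
        if user not in self.registered_users:
            raise PermissionError("edits require a registered, identified user")
        old = self.attributes.get(field_name, "")
        self.attributes[field_name] = value
        self.history.append(
            AuditEntry(user, field_name, old, value, datetime.now(timezone.utc))
        )


# Hypothetical users and values.
record = MeasureRecord({"alice", "bob"}, name="Scale A", reliability="0.58")
record.edit("bob", "reliability", "0.85")
for entry in record.history:
    print(entry.user, entry.field_name, entry.old, "->", entry.new)
# bob reliability 0.58 -> 0.85
```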
Although the GEM database promotes wiki-like open authorship and editing of meta-data, it is more structured than a traditional wiki. For example, all contributed information must be entered into a fixed set of attributes instead of an open narrative format. These attributes are specific to constructs, measures, or data sets. The attributes associated with measures include name, description, construct measured, target population, validity, and reliability, among others. In addition, some of these attributes are required before a construct or measure can be submitted for public review. This process ensures a minimum level of completeness before the content is visible to the larger research community.
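The completeness gate can be expressed as a simple check run before submission. The text does not say which attributes are mandatory, so the tuple `REQUIRED_FOR_REVIEW` below is an assumed, hypothetical choice, as are the attribute keys.

```python
# Hypothetical required set: the text lists name, description, construct
# measured, target population, validity, and reliability among the measure
# attributes, but does not say which ones are mandatory.
REQUIRED_FOR_REVIEW = ("name", "description", "construct_measured")


def missing_required(attributes: dict[str, str]) -> list[str]:
    """List required attributes that are absent or blank."""
    return [a for a in REQUIRED_FOR_REVIEW if not attributes.get(a, "").strip()]


def can_submit_for_review(attributes: dict[str, str]) -> bool:
    """A record may go to public review only when nothing required is missing."""
    return not missing_required(attributes)


draft = {"name": "Scale A", "description": "", "target_population": "Adults"}
print(missing_required(draft))       # ['description', 'construct_measured']
print(can_submit_for_review(draft))  # False
```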
Rating/commenting on measures
An important aspect of community-driven websites is communication among the members of the community. The ability to provide feedback and comment on important concepts gives participants a voice and encourages them to engage fully in the project. The GEM database allows for community feedback in two ways. First, users can post comments about any aspect of the constructs, measures, and data sets. These comments are visible at the bottom of each detail page and can be used to provide feedback to content authors and to potential users of measures and data sets. Second, the GEM database encourages researchers to rate measures using a 5-star rating system. The scale has no pre-specified meaning; instead, it is intended to capture each user’s overall evaluative impression (gestalt) of the measure. Users can always change their previous ratings, but each user can have only one active rating for a given measure, so there is less concern that someone may ‘pad’ the rating of a favorite measure. An average rating for each measure is calculated and made available through both the GEM database interface and the caGrid data service. This rating system lets the research community bring data into the process of identifying which measures should become the standard.
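The one-active-rating rule follows naturally if ratings are stored per user rather than appended to a list: re-rating overwrites the previous value, so repeated votes cannot inflate the average. The class below is a hypothetical sketch of that design, not GEM code; the user and measure names are made up.

```python
from typing import Optional


class RatingBook:
    """One active 1-5 star rating per (user, measure); re-rating replaces it."""

    def __init__(self) -> None:
        self._ratings: dict[str, dict[str, int]] = {}  # measure -> {user: stars}

    def rate(self, user: str, measure: str, stars: int) -> None:
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        # Keyed by user, so a new rating overwrites rather than adds.
        self._ratings.setdefault(measure, {})[user] = stars

    def average(self, measure: str) -> Optional[float]:
        stars = list(self._ratings.get(measure, {}).values())
        return sum(stars) / len(stars) if stars else None


book = RatingBook()
book.rate("alice", "Scale A", 5)
book.rate("alice", "Scale A", 5)  # a repeated rating does not pad the average
book.rate("alice", "Scale A", 4)  # changing a rating replaces the old one
book.rate("bob", "Scale A", 2)
print(book.average("Scale A"))    # 3.0
```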
Identifying one measure as better than others, and deciding to promote it for standard use, should be data-driven. In the GEM database, users are given data to help with this process. These data include subjective indicators such as the average and distribution of ratings, objective indicators such as the number of times a measure has been downloaded from the database, and more traditional psychometric information such as a measure’s reliability and validity.
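The kinds of evidence listed above can be gathered side by side for candidate measures, as in this sketch. All figures are made-up examples and the field names are hypothetical. The summary deliberately stops short of combining the indicators into a single score, because the text does not prescribe any weighting; that judgment is left to the researchers.

```python
from collections import Counter
from statistics import mean

# Hypothetical figures for two candidate measures of the same construct.
CANDIDATES = {
    "Scale A": {"ratings": [5, 4, 4, 2], "downloads": 120, "reliability": 0.85},
    "Scale B": {"ratings": [3, 3, 4], "downloads": 45, "reliability": 0.78},
}


def summarize(stats: dict) -> dict:
    """Collect subjective, objective, and psychometric evidence side by side."""
    dist = Counter(stats["ratings"])
    return {
        "mean_rating": round(mean(stats["ratings"]), 2),
        # Count of ratings at each star level, including empty levels.
        "rating_distribution": {s: dist.get(s, 0) for s in range(1, 6)},
        "downloads": stats["downloads"],
        "reliability": stats["reliability"],
    }


for name, stats in CANDIDATES.items():
    print(name, summarize(stats))
```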