The potential for genome-wide association studies to relate phenotypes to specific genetic variation is greatly increased when data can be combined or compared across multiple studies. To facilitate replication and validation across studies, RTI International (Research Triangle Park, North Carolina) and the National Human Genome Research Institute (Bethesda, Maryland) are collaborating on the consensus measures for Phenotypes and eXposures (PhenX) project. The goal of PhenX is to identify 15 high-priority, well-established, and broadly applicable measures for each of 21 research domains. PhenX measures are selected by working groups of domain experts using a consensus process that includes input from the scientific community. The selected measures are then made freely available to the scientific community via the PhenX Toolkit. Thus, the PhenX Toolkit provides the research community with a core set of high-quality, well-established, low-burden measures intended for use in large-scale genomic studies. PhenX measures will have the most impact when included at the experimental design stage. The PhenX Toolkit also includes links to standards and resources in an effort to facilitate data harmonization to legacy data. Broad acceptance and use of PhenX measures will promote cross-study comparisons to increase statistical power for identifying and replicating variants associated with complex diseases and with gene-gene and gene-environment interactions.
The incorporation of genomics data into population-based studies has led to the emergence of genome-wide association studies and a revolution in the way that scientists think about genetics and the etiology of common, complex diseases (1). Because of the rapid progress in genomic technology, investigators can now analyze hundreds of thousands of genetic polymorphisms (2, 3) against an array of disease phenotypes to identify associations. Genome-wide association studies have the potential to complement research focused on biochemical pathways and/or regulatory cascades and thus inspire new hypotheses (4). Increased understanding of disease etiology and mechanisms will facilitate development of interventions, such as novel prophylactic or therapeutic agents.
Although recent reports from genome-wide association studies have identified a large number of associations between chromosomal loci and complex human diseases (5), to date, most of these studies have had few measures in common (6–8). It is important to compare findings across studies to validate results and to detect the relatively weak statistical associations that are commonly found when multiple genetic polymorphisms make small contributions to common disorders. Moreover, environmental exposures, including the ambient environment, personal behaviors, and treatments, can influence susceptibility to disease as well as its presentation and progression. Several groups of investigators have successfully expanded study populations by incorporating extracted metadata from complementary studies (9, 10). For some diseases, such as diabetes and Crohn’s disease, pooling of multiple genome-wide association studies by meta-analysis has led to the discovery of new gene associations (11–13). However, standard measures could greatly simplify the task of combining studies and validating findings. Over time, the use of standard measures should make it possible to build larger populations for cross-study analysis, thus providing increased statistical power and the ability to detect moderate associations and gene-gene and gene-environment interactions.
The consensus measures for Phenotypes and eXposures (PhenX) Toolkit is designed to provide a core set of well-established, low-burden, high-quality measures for use in large-scale genomic studies. In this report, we describe the rationale and development of the PhenX Toolkit and highlight collaborations and harmonization efforts.
There are compelling reasons to promote the use of standard (common) measures for genome-wide association studies and other large-scale genomic research efforts.
The PhenX project is led by RTI International (Research Triangle Park, North Carolina) and is funded by the National Human Genome Research Institute (Bethesda, Maryland) at the National Institutes of Health (NIH). The goal of PhenX is to identify and catalog 15 high-quality, low-burden, well-established measures and accompanying standard protocols for each of 21 research domains. The PhenX measures are available to the scientific community via a Web-based toolkit (https://www.phenxtoolkit.org/).
The PhenX Steering Committee is composed of 12 scientists with a range of expertise in epidemiology, biostatistics, and genomics research who provide guidance throughout the project. The Steering Committee originally selected and defined 20 research domains that are the focus of the project (Table 1). Through collaboration with the Office of Behavioral and Social Science Research at the NIH, an additional Social Environments domain was added to the PhenX project. A PhenX domain is a field of research with a unifying theme and easily enumerated quantitative and qualitative measures. Domains include Demographics, Anthropometrics, organ systems (e.g., Neurology, Gastrointestinal), complex diseases (e.g., Cancer, Cardiovascular), and lifestyle factors (e.g., Alcohol, Tobacco and Other Substances; Physical Activity and Physical Fitness). Liaisons from the NIH institutes and centers participate in PhenX activities, including nominating Steering Committee and working group members, and are invited to participate in all Steering Committee and working group meetings. The liaisons exchange relevant information with their institutes and centers, help ensure that PhenX is coordinated with related NIH initiatives, and provide additional content expertise.
To address each PhenX research domain, a working group of domain experts is assembled. The working groups are composed of 6–9 domain experts from academic and government institutions. Working group members are carefully selected to include a balance of members with domain expertise and experience in epidemiology and genomics research. Each working group member commits to participating in the consensus process, which typically takes 7–9 months and includes 1 in-person meeting and 4–6 conference calls. The working group chairs play a key role, leading the working group throughout the consensus-based process. Working group participants are recognized on the project portal (https://www.phenx.org/). The working groups convene and use a consensus-based process to select 15 measures to be included in the PhenX Toolkit. Limiting the number of measures to 15 per domain ensures that the Toolkit includes only the highest-priority, well-established measures and also keeps the Toolkit a manageable size.
The PhenX Toolkit is designed primarily for investigators who wish to expand their disease-specific studies into other areas and who are unlikely to have sufficient resources to add more than a few measures from additional research domains that are outside their primary focus. The Toolkit provides a variety of measures; it is up to investigators to decide which PhenX measures (and how many) they want to incorporate into their overall study design. The overall process of selecting PhenX measures is outlined in Figure 1.
A major concern in any genomics-based study is ensuring accurate assessment of the phenotypes and exposures of interest. If the data used for the analyses do not reliably and accurately reflect the phenotypes or exposures, then the associations will not be valid. It is expected that investigators will almost certainly use multiple, more detailed, and potentially higher burden measures to assess their primary research interest but will use PhenX measures to expand their study to include measures from other research domains.
The Steering Committee developed the following criteria to guide the working groups in their deliberations:
The working groups review and discuss many measures pertinent to their respective domains and select preliminary measures (up to 25) for outreach to the broader scientific community (Figure 1). This outreach effort seeks to engage additional experts from the scientific community to review and comment on these preliminary measures. The working groups then consider this input in their final deliberations. Deciding on the measures to be included in the PhenX Toolkit is a difficult task, and each working group has to balance the criteria for selection put forth by the Steering Committee. If a measure is highly burdensome or is too cutting-edge to be suitable for the Toolkit but the working group thinks it is highly relevant to the research domain, then the working group may decide to include it in the Supplemental Information section of the Toolkit. The supplemental information may include gold-standard, high-burden measures and/or preliminary measures that were ultimately not selected for inclusion in the Toolkit. Other information that the working group agrees may be of value to the user may also be included. Thus, the supplemental information gives the PhenX Toolkit user additional access to the expertise and guidance of the working groups.
The PhenX Toolkit presents the measures and protocols selected by the working groups (https://www.phenxtoolkit.org/). Users can search or browse the Toolkit, selecting measures of interest by adding them to a cart. From the cart, the user can request reports that provide the information needed to collect data on the measures. The Toolkit provides a description of the measure, detailed protocols associated with the measure, and other related information: for example, rationale for selection, equipment and training required, and references. The Steering Committee envisioned the first few domains as building blocks for the entire PhenX Toolkit. Thus, Demographics, Anthropometrics, and Alcohol, Tobacco and Other Substances were selected as the first 3 domains to be addressed by working groups. With this approach, subsequent working groups, such as the Cancer or Diabetes working group, can review measures already in the Toolkit and then decide whether they are sufficient for their research domain. Setup of the initial 21 domains was completed near the end of 2010, and the selected measures are all available in the PhenX Toolkit.
Because the PhenX Toolkit includes detailed protocols for obtaining data on the measures, Toolkit users can review and assess whether or not a specific protocol is suitable for their study. The expectation is that researchers who visit the Toolkit site will be able to identify some PhenX measures that are suitable for their study population and their available resources. Figure 2 presents a screen shot of the home page of the PhenX Toolkit; a general summary of the PhenX Toolkit is shown in Table 2 (17).
Visitors to the PhenX Toolkit site can browse by research domain or search using keywords. PhenX Toolkit users can select measures and save them in a cart. Users can easily add to or remove measures from their cart as they decide which PhenX measures would be most helpful for their study. The PhenX Toolkit provides a brief description of each measure, its purpose and rationale for inclusion, standardized protocols for collecting data on the measure, supporting information, and references. The PhenX Toolkit describes the requirements for each measure, including details about the personnel and equipment needed to collect data on the measures. Users can request a report that provides the details of their selections, thus facilitating incorporation of these measures into the study design. In addition, the Toolkit alerts users if additional measures (essential data) are needed to interpret a selected measure. For example, if Toolkit users select “blood pressure,” the users are prompted to also add “current age,” “gender,” “race,” and “ethnicity” to their cart (i.e., a specific collection of measures). After following a simple registration process, registered Toolkit users can save multiple carts and can share their carts with other registered users via a Toolkit network. This allows investigators who are planning different studies (or expanding an existing study) to work together to include a common set of PhenX measures for future analyses. A data collection form that will help investigators collect the data associated with PhenX measures is currently in development. The data collection form will also make it easy for investigators to integrate PhenX measures into their primary study design.
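The essential-data prompt described above can be pictured as a small dependency lookup over the contents of a cart. The following Python sketch is illustrative only and is not the Toolkit's implementation: the "blood pressure" example and its 4 essential measures are taken from the text, whereas the dictionary layout and the function name missing_essentials are assumptions made for this example.

```python
# Hypothetical lookup table: each measure maps to the essential measures
# needed to interpret it. Only the "blood pressure" entry comes from the text;
# the data structure itself is an assumption made for illustration.
ESSENTIAL_MEASURES = {
    "blood pressure": ["current age", "gender", "race", "ethnicity"],
}


def missing_essentials(cart):
    """Return essential measures required by the cart but not yet in it.

    The order follows the lookup table, and each measure is listed once.
    """
    present = set(cart)
    missing = []
    for measure in cart:
        for needed in ESSENTIAL_MEASURES.get(measure, []):
            if needed not in present and needed not in missing:
                missing.append(needed)
    return missing


# A cart that already holds one of the four essential measures.
cart = ["blood pressure", "current age"]
print(missing_essentials(cart))  # ['gender', 'race', 'ethnicity']
```

In this sketch the user would be prompted to add only the measures returned by the function, so measures that are already in the cart are not suggested again.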
Once an individual has been genotyped, that genotype can potentially be related to any trait, not just the primary phenotype in the original study (14). Because many of the target (primary) phenotypes of research studies are complex conditions or disorders, data are commonly collected on multiple risk factors and comorbid conditions. This opens the door to cross-study analyses of not only the primary phenotypes but also secondary phenotypes (18–21).
Although reports have clearly demonstrated the value of integrating data across related studies and even across disciplines (22), most genome-wide association studies to date have focused on a specific disease or trait. The PhenX Toolkit is designed to aid investigators who are interested in expanding their study to include measures that are outside of their primary area of expertise. For example, an investigator who is planning a neurology study may choose PhenX measures in the Nutrition and Dietary Supplements, Cancer, and Respiratory domains in addition to PhenX Neurology measures. It is also worth noting that some conditions or diseases may be associated with the same phenotype, such as obesity with cardiovascular disease and diabetes. Perhaps even more important, expanding genomics-based studies to include phenotypes outside of the primary research interest is essential to understanding pleiotropic genetic effects (23). Thus, as investigators extend their studies to incorporate PhenX measures, new relations between seemingly unrelated disciplines are likely to be uncovered. Figure 3 illustrates the incorporation of PhenX measures into individual studies and the resulting ability to combine data from multiple studies.
To achieve data interoperability, the adoption of standard data formats and vocabularies is essential (24). The incorporation of PhenX measures into individual studies at the experimental design stage and/or prior to collecting the data will make it possible to easily combine data from multiple, largely unrelated studies. Combining studies generates increased statistical power and the ability to detect both more subtle and more complex—and, perhaps, unexpected—gene associations.
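The gain in statistical power from combining studies can be illustrated with a rough calculation. The sketch below uses a normal approximation for a two-sided test of a single variant; the standardized effect size, the sample sizes, and the conventional genome-wide significance threshold of 5e-8 are all assumptions chosen for illustration and do not come from the PhenX project.

```python
from statistics import NormalDist


def approx_power(effect_size, n, alpha=5e-8):
    """Approximate power of a two-sided z-test.

    effect_size is a standardized effect per observation (an assumed value),
    n is the number of participants, and alpha is the significance threshold.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # expected value of the test statistic
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: one study alone versus four studies that collected
# the same measure with the same protocol and can therefore be pooled.
single_study = approx_power(effect_size=0.03, n=15_000)
pooled_studies = approx_power(effect_size=0.03, n=60_000)
print(f"single study: {single_study:.3f}, pooled: {pooled_studies:.3f}")
```

Under these assumptions, a weak effect that a single study would rarely detect becomes detectable once the studies are pooled, which is the practical argument for collecting common measures at the design stage.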
The PhenX Toolkit is designed to help investigators effectively expand their studies, but there are limitations. Each working group is asked to balance multiple criteria for selecting measures (defined by the Steering Committee) as they decide what measures to include in the Toolkit. Current limitations are: 1) the Toolkit does not necessarily include the gold standard for each research domain, as these measures are often quite burdensome to administer; 2) promising but relatively new measures are not included in the Toolkit because they are not yet well established; and 3) established protocols are not modified (although some working groups indicated that this could be beneficial).
The PhenX investigators are currently collaborating with administrators of the database of Genotypes and Phenotypes (dbGaP) (8) (http://www.ncbi.nlm.nih.gov/gap/), the Public Population Project in Genomics (P3G) (25) (http://www.p3g.org/), the Data Schema and Harmonization Platform for Epidemiological Research (DataSHaPER) (26) (http://www.datashaper.org/), and the National Library of Medicine (http://www.nlm.nih.gov/). This work is focused on developing a consistent rule set for mapping PhenX measures to dbGaP study variables and DataSHaPER measures and variables. The plan is to highlight PhenX measures in dbGaP and DataSHaPER. The value of this approach is that investigators who visit the dbGaP or DataSHaPER site will be able to readily identify PhenX measures in these resources, thus facilitating data-sharing and data harmonization. In addition, researchers may be able to identify opportunities to extend studies to include data and samples associated with P3G biorepositories. The PhenX investigators are working collaboratively with the National Library of Medicine to ensure that PhenX is aligned with NIH bioinformatics efforts such as Logical Observation Identifiers Names and Codes (27, 28). They are also collaborating with the Electronic Medical Records and Genomics consortium (https://www.mc.vanderbilt.edu/) to facilitate sharing of data captured in electronic medical records.
The use of PhenX measures will facilitate downstream harmonization and meta-analysis. The compelling need to combine studies and to take advantage of legacy data has led to efforts to harmonize similar data elements. Harmonization efforts such as P3G, DataSHaPER, and the Gene Environment Association Studies (GENEVA) consortium are currently under way. DataSHaPER is focused on developing tools for retrospective data harmonization (26). The GENEVA consortium has established a unified framework for genotyping, data quality control, analysis, and interpretation (14). Harmonization methods that make it possible to compare or combine related data types for meta-analysis have proven to be very effective (29) and will always be an option.
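As a concrete picture of what meta-analysis across studies involves, the sketch below pools per-study effect estimates by inverse-variance weighting under a fixed-effect model. This is one common approach, chosen here for illustration; the text does not state which methods the cited studies used, and the input numbers are invented.

```python
def fixed_effect_meta(estimates, std_errors):
    """Pool per-study estimates by inverse-variance weighting.

    Returns the pooled estimate and its standard error. Assumes that all
    studies measured the same quantity on the same scale, which is what
    standard measures and protocols are meant to make possible.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Invented per-study effect estimates and standard errors for one variant.
pooled, pooled_se = fixed_effect_meta([0.10, 0.30], [0.20, 0.20])
print(f"pooled estimate: {pooled:.3f}, standard error: {pooled_se:.3f}")
```

The pooled standard error is smaller than that of either study, which reflects the increase in precision gained by combining studies. When studies measure a phenotype differently, an additional harmonization step is needed before estimates can be pooled in this way.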
As a result of supplemental funding provided under the American Recovery and Reinvestment Act of 2009 (Public Law 111–5), PhenX is extending the Toolkit’s browse and search capabilities to better reflect the interrelatedness of measures across the research domains and to present statistics collected from Toolkit users (such as “top 10” measures). Measures are currently organized into various groups or collections to allow investigators to browse the Toolkit from a variety of perspectives. For example, in addition to being able to browse measures by “research domain,” users may identify measures of interest by browsing collections of measures such as “risk factors” or “life stages.” This approach could be extended to help Toolkit users assess complex diseases and conditions. For example, investigators could come to the Toolkit and find measures associated with Sjögren’s syndrome or metabolic syndrome even though the measures may have been selected by several different working groups. The Smart Query tool helps Toolkit users find measures using keywords or concepts and traverses the entire Toolkit to provide relevant measures for consideration. A data collection form and a data dictionary are being developed for the Toolkit that will make it easier for investigators to collect and analyze the data associated with PhenX measures. Also in development is a comprehensive bioinformatics mapping document that will link PhenX measures to various resources and standards.
We are developing a strategy to raise the visibility of the Toolkit and promote its use by epidemiologists and other investigators. Based on Toolkit user feedback, we expect to continue to update the functionality of the Toolkit. We also plan to establish a process for updating Toolkit content. As complementary and related research efforts mature, some of their measures may be incorporated into the PhenX Toolkit. For example, the Patient-Reported Outcomes Measurement Information System (PROMIS) (http://www.nihpromis.org/) is developing new instruments for effectively capturing patient-provided information, and the NIH Toolbox (http://www.nihtoolbox.org/) is focused on developing new protocols for neurologic and behavioral assessments.
We also envision that the results of our current collaborative efforts will facilitate the mapping and highlighting of PhenX measures in additional data repositories and resources. The PhenX team will continue to welcome additional opportunities to collaborate.
The PhenX Toolkit provides the research community with a core set of high-quality, well-established, low-burden measures intended for use in genome-wide association studies and other population-based studies. More specifically, the PhenX Toolkit will make it easy for researchers to effectively expand a study to include standard measures outside of their primary research focus. Broad acceptance and use of PhenX measures will promote cross-study comparisons to increase statistical power for identifying and replicating variants associated with complex diseases and with gene-gene and gene-environment interactions. The hope is that the PhenX Toolkit will be widely adopted by the scientific community, fostering a new era of cooperation and collaboration and facilitating cross-study, transdisciplinary, and translational research.
Author affiliations: RTI International, Research Triangle Park, North Carolina (Carol M. Hamilton, Lisa C. Strader, Joseph G. Pratt, Deborah Maiese, Tabitha Hendershot, Jane A. Hammond, Wayne Huggins, Dean Jackman, Huaqin Pan, Destiney S. Nettles); National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland (Heather A. Junkins, Erin M. Ramos); Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland (Terri H. Beaty); Departments of Medicine, Neurology, Ophthalmology, and Genetics and Genomics, School of Medicine, Boston University, Boston, Massachusetts (Lindsay A. Farrer); Departments of Epidemiology and Biostatistics, School of Public Health, Boston University, Boston, Massachusetts (Lindsay A. Farrer); Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts (Peter Kraft); Center for Craniofacial and Dental Genetics, Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania (Mary L. Marazita); Human Nutrition Research Center on Aging, Tufts University, Boston, Massachusetts (Jose M. Ordovas); Zilkha Neurogenetic Institute, University of Southern California, Altadena, California (Carlos N. Pato); M. D. Anderson Cancer Center, University of Texas, Houston, Texas (Margaret R. Spitz); RTI International, San Diego, California (Diane Wagener); Department of Epidemiology, School of Public Health and Community Medicine, University of Washington, Seattle, Washington (Michelle Williams); Office of the Director, National Institutes of Health, Chevy Chase, Maryland (William R. Harlan (retired)); Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee (Jonathan Haines); and Epidemiology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina (Richard K. Kwok, Destiney S. Nettles).
This work was supported by the National Human Genome Research Institute (award U01 HG004597-01).
Guidance is provided to the PhenX project by the PhenX Steering Committee: Jonathan Haines (Chair), William R. Harlan (Vice-Chair), Terri H. Beaty, Lindsay A. Farrer, Peter Kraft, Mary L. Marazita, Jose M. Ordovas, Carlos N. Pato, Erin Ramos, Margaret R. Spitz, Diane Wagener, and Michelle Williams. The PhenX working groups have made key contributions to this project. In particular, the authors acknowledge the expertise and significant contributions of the PhenX working group chairs (to date): Deborah Hasin (Alcohol, Tobacco and Other Substances); Michele Forman (Anthropometrics); Co-Chairs Christine B. Ambrosone and Neil Caporaso (Cancer); Tom Pearson (Cardiovascular); Craig Hanis (Diabetes); Myles Cockburn (Demographics); Lynn Goldman (Environmental Exposures); Jeffery Vance (Neurology); Patrick J. Stover (Nutrition and Dietary Supplements); Co-Chairs James Beck and Bryan Michalowicz (Oral Health); Janey Wiggs (Ocular); Co-Chairs Bill Haskell and Rick Troiano (Physical Activity and Physical Fitness); Co-Chairs Kenneth S. Kendler and Jordan Smoller (Psychiatric); Carol Hogue (Reproductive Health); and Edwin K. Silverman (Respiratory).
The authors thank Dr. Teri Manolio and Dr. Kimberly Tryka for critical review of the manuscript, Michal Zmuda for help with figure design, and August Gering and Laura Small for editorial review.
Conflict of interest: none declared.