Genome-wide association studies (GWAS) have revealed novel genes and pathways involved in lung disease, many of which are potential targets for therapy. However, despite numerous successes, a large proportion of the genetic variance in disease risk remains unexplained, and the function of the associated genetic variations identified by GWAS and the mechanisms by which they alter individual risk for disease or pathogenesis are still largely unknown. The National Heart, Lung, and Blood Institute (NHLBI) convened a 2-day workshop to address these shortcomings and to make recommendations for future research areas that will move the scientific community beyond gene discovery. Topics of individual sessions ranged from data integration and systems genetics to functional validation of genetic variations in humans and model systems. There was broad consensus among the participants for five high-priority areas for future research, including the following: (1) integrated approaches to characterize the function of genetic variations, (2) studies on the role of environment and mechanisms of transcriptional and post-transcriptional regulation, (3) development of model systems to study gene function in complex biological systems, (4) comparative phenomic studies across lung diseases, and (5) training in and applications of bioinformatic approaches for comprehensive mining of existing data sets. Last, it was agreed that future research on lung diseases should integrate approaches across “-omic” technologies and include ethnically/racially diverse populations in human studies of lung disease whenever possible.
Recent results of genome-wide association studies (GWAS) for asthma (1–6), chronic obstructive pulmonary disease (7), sarcoidosis (8), idiopathic pulmonary fibrosis (9), and other lung-relevant phenotypes (10–13) have highlighted both the power and the shortcomings of this approach (14–16). Although novel genes or genetic variants have been identified and, therefore, implicated in disease pathogenesis, a large proportion of the genetic variance in each case remains unexplained. Moreover, among those genes and genetic variations implicated in disease pathogenesis by the GWAS approach, the function of those variations and the mechanisms by which they contribute to disease pathogenesis are still largely unknown.
To address these shortcomings and consider future directions for the genetic dissection of complex lung phenotypes, the Lung Division of the National Heart, Lung, and Blood Institute (NHLBI) sponsored a 2-day workshop, “Getting from Genes to Function in Lung Disease,” on September 3 and 4, 2009. Two overview presentations by Dr. Carole Ober (University of Chicago) on the genetics of lung disease and by Dr. Ronald Crystal (Weill Cornell Medical College) on lung disease phenotypes emphasized both the complex etiology of lung diseases (Figure 1) and common pathogenic features across lung diseases. They suggested that additional insights may be gleaned from studying common features of diseases with different disease endpoints and emphasized the need for better phenotyping in studies of lung disease and development. Dr. Eric Schadt (Pacific Biosciences) then reviewed genomics and systems biology approaches to studying complex phenotypes. The workshop then covered topics on data integration (A. Butte, Chair), systems genetics (A. Lusis, Chair), functional validation in model organisms (J. Elias, Chair), and in humans (D. Schwartz, Chair), and translational and integrative genomics in clinical settings (C. Ober, Chair). The workshop attendees participated in lively discussions on these topics with the ultimate goal of defining a series of recommendations to the Institute on future research initiatives that will fill in our gaps of knowledge of the genetic basis of lung disease by applying integrated state-of-the-art “-omic” approaches.
In the opening scientific session of the workshop, Dr. William Cookson (Imperial College, London) spoke on “Integration of Genetics and Gene Expression Profiling in Lung Disease.” Dr. Cookson reviewed expression quantitative trait loci (eQTL) mapping, the increasingly popular analytic method for integrating gene expression measurements and genetic measurements made from the same samples or individuals. In this approach, gene expression levels are treated as quantitative traits and mapped to chromosomal loci, and Dr. Cookson showed how this method has been used to study traits related to asthma. However, Dr. Cookson argued that a single measurement of gene expression in a single tissue or cell type may not be sufficient, and time-series measurements and measurements in multiple tissues may be more informative. As a cautionary tale, Dr. Cookson illustrated how eQTL mapping results differ when different genotyping platforms are used. Because probes differ between array platforms, the results of studies using different platforms will also differ, and these differences must be taken into account when interpreting results. He pointed out, however, that this will only be an issue until RNA sequencing, which is already the preferred approach for measuring transcript levels, becomes more widely available.
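As a rough illustration of the eQTL idea described above (expression treated as a quantitative trait and tested against genotype), the sketch below regresses one transcript's expression level on allele dosage (0, 1, or 2 copies) for each of several SNPs and ranks the SNPs by variance explained. It is a minimal sketch under stated assumptions, not the analysis presented at the workshop: the function names (`eqtl_fit`, `eqtl_scan`), the SNP labels, and the toy data are all hypothetical, and a real analysis would add covariates, significance testing, and multiple-testing correction.

```python
from statistics import mean


def eqtl_fit(dosages, expression):
    """Least-squares fit of expression on allele dosage.

    Returns (slope, r_squared) for a simple additive model.
    """
    g_bar = mean(dosages)
    e_bar = mean(expression)
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    syy = sum((e - e_bar) ** 2 for e in expression)
    sxy = sum((g - g_bar) * (e - e_bar) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("no variation in genotype or expression")
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def eqtl_scan(expression, snps):
    """Fit every SNP against one transcript; strongest association first."""
    results = []
    for name, dosages in snps.items():
        slope, r2 = eqtl_fit(dosages, expression)
        results.append((name, slope, r2))
    return sorted(results, key=lambda row: row[2], reverse=True)


# Hypothetical toy data: six individuals, one transcript, two SNPs.
toy_expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
toy_snps = {
    "snp_a": [0, 1, 2, 0, 1, 2],  # dosage tracks expression closely
    "snp_b": [0, 0, 1, 1, 2, 2],  # weaker relationship
}
ranked = eqtl_scan(toy_expression, toy_snps)
for name, slope, r2 in ranked:
    print(f"{name}: slope={slope:.2f}, r2={r2:.2f}")
```

In this toy scan the SNP whose dosage tracks expression most closely is ranked first; mapping in cis versus in trans would additionally require the genomic positions of the SNP and the transcript, which are omitted here.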
Dr. Atul Butte (Stanford University) next spoke on “Exploring Systems Medicine Using Translational Bioinformatics.” Dr. Butte focused on the highly enabling nature of publicly available molecular measurements, and showed that measurements from more than 300,000 gene expression microarrays and DNA samples from tens of thousands of individuals can be downloaded from the National Center for Biotechnology Information. Dr. Butte illustrated how these kinds of measurements can be used for integrative genomics, giving one case example of their use for finding novel biomarkers for solid organ transplantation rejection and for enabling the discovery of novel ligands and receptors associated with type 2 diabetes mellitus. His final points warned against the development of reference repositories and bioinformatics methodology as end goals, but instead promoted investments in the application of computational methodologies on available data to further the development of diagnostics and therapeutics.
The final speaker in this session, Dr. Dan Roden (Vanderbilt University Medical Center) spoke on “Integration of the Health System into Genetic and Genomic Data.” Dr. Roden showed Vanderbilt's impressive progress in obtaining DNA samples on tens of thousands of patients within their health system, and tying these deidentified samples with a linked but also deidentified copy of their electronic medical record, a system called BioVU. With more than 60,000 samples already obtained, Dr. Roden illustrated the next hardest challenge: determining patients' “medical phenotype” from the mostly text-based records that describe them. Obtaining DNA on this many patients will enable phenotype-wide association studies in the future, which Dr. Roden defined as the search for differences in specific medical phenotypes that correspond to variance at a genetic locus. This novel approach was contrasted to GWAS, in which variance in one narrowly defined phenotype is correlated against genotype at every locus to identify associations.
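The contrast between the two designs can be sketched in a few lines: a GWAS fixes one phenotype and scans many loci, whereas a phenotype-wide scan fixes one locus and scans many phenotypes. The example below does the latter using a simple odds ratio per phenotype. This is an illustrative sketch only and does not describe how BioVU works; the function names, phenotype labels, and data are hypothetical, and the Haldane-Anscombe 0.5 correction is an assumption added here to avoid division by zero in sparse tables.

```python
def odds_ratio(carrier, has_phenotype):
    """Odds ratio of a phenotype in carriers versus non-carriers.

    Adds 0.5 to every cell of the 2x2 table (Haldane-Anscombe correction).
    """
    a = b = c = d = 0
    for is_carrier, affected in zip(carrier, has_phenotype):
        if is_carrier and affected:
            a += 1
        elif is_carrier:
            b += 1
        elif affected:
            c += 1
        else:
            d += 1
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def phenotype_scan(carrier, phenotypes):
    """Test one variant against every phenotype; largest odds ratio first."""
    scored = [(name, odds_ratio(carrier, flags)) for name, flags in phenotypes.items()]
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Hypothetical toy data: eight patients, carrier status at one locus,
# and two record-derived phenotypes coded as 0/1.
toy_carrier = [1, 1, 1, 1, 0, 0, 0, 0]
toy_phenotypes = {
    "asthma": [1, 1, 1, 0, 0, 0, 0, 1],
    "eczema": [1, 0, 1, 0, 1, 0, 1, 0],
}
scan = phenotype_scan(toy_carrier, toy_phenotypes)
for name, oratio in scan:
    print(f"{name}: OR={oratio:.2f}")
```

The hard part noted in the talk, deriving the 0/1 phenotype flags from mostly text-based records, is exactly what this sketch takes as given.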
Discussant Dr. Joe (Skip) Garcia (The University of Illinois, Chicago) closed the session with a discussion on the current barriers to fully integrating medical and biological data and suggestions for overcoming these limitations in the future. One point raised in discussion was that we might use as models the organized structures around data storage and curation that have already been implemented in the National Cancer Institute, with their Cancer Biomedical Informatics Grid, and the National Human Genome Research Institute, with their Cancer Genome Atlas. Whether the NHLBI should consider building a similar infrastructure (e.g., “pulmBIG”), was debated. The laudatory history of the investments that NHLBI has made in genomics was also discussed, including the Programs in Genomic Application, which developed several reference technologies and data sets that are now widely used. Last, the need to fund junior faculty and trainees in these new data-driven, computationally intensive areas of research was also viewed as an important next step.
Dr. Nancy Cox (The University of Chicago) led off a session on systems genetics with a talk on “Using Genome-Wide Association Studies and Integrative Network Approaches to Identify Genes and Pathways in Lung Disease.” She indicated that, although GWAS have clearly been successful in identifying novel genes for lung diseases, the findings thus far have explained a relatively small fraction of the heritability of the diseases and, most likely, all results to date are the “low-hanging fruit.” Moreover, she emphasized that relatively little biology has come from GWAS of lung or other diseases. The remainder of her talk focused on the use of genome-wide expression data to complement standard GWAS. She showed how expression array data could be used to in silico map loci controlling gene expression in cis or in trans and how eQTL analysis can help prioritize candidate genes in regions identified by GWAS. She pointed out that this information will also enhance our understanding of the underlying biology of the disease/trait and that eQTLs provide new ways of looking for gene–gene and even gene–environment interactions.
Dr. A. Jake Lusis (University of California, Los Angeles) built on this by speaking on “Systems Genetics Approaches to Complex Traits.” The goal of systems biology is to define all the elements present in a given system and to create an interaction network between these components so that the behavior of the system can be explained under specified conditions. Systems genetics is a form of systems biology in which the perturbations used to construct the biologic network are common variations in the population. The elements, or nodes, in the network correspond to molecular phenotypes, such as transcript, metabolite, or protein levels. Dr. Lusis described two systems being studied in his lab using systems genetics. The first consists of a series of primary endothelial cells studied both before and after treatment with oxidized phospholipids to model inflammation that occurs in atherosclerosis. The cells were then examined for global transcript levels using microarrays and the individuals were genotyped using a high-density single nucleotide polymorphism array. This allowed the identification of eQTL and the modeling of biologic networks using coexpression analysis. The networks, and predictions made from the networks, were validated using siRNA knockdown. Dr. Lusis described a second system that consisted of 100 inbred strains of mice, termed the Hybrid Mouse Diversity Panel, which allows relatively high-resolution mapping in mice using association rather than linkage. Because the inbred strains are permanent, the mice can be characterized for many different molecular and clinical traits and then readily mapped using standard mapping approaches.
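To make the notion of a coexpression network concrete, the sketch below links two transcripts with an edge whenever the absolute Pearson correlation of their expression profiles across samples meets a threshold. It is a simplified stand-in for the coexpression analysis mentioned above, not the method used in Dr. Lusis's studies: the function names, gene labels, toy profiles, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical toy data: four transcripts measured in four samples.
toy_profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],    # rises with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],    # falls as gene_a rises
    "gene_d": [1.0, -1.0, -1.0, 1.0],  # unrelated to the others
}
edges = coexpression_edges(toy_profiles)
for g1, g2, r in edges:
    print(f"{g1} -- {g2}: r={r}")
```

In a systems genetics setting, the samples would be genetically distinct individuals or strains, so that common variation supplies the perturbations that drive the correlations.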
Discussant Dr. Scott Weiss (Harvard Medical School) summarized the highlights of this session and led a discussion on future directions in applying systems genetics to airways diseases.
The first speaker in this session, Dr. Marcelo Nobrega (The University of Chicago) spoke on “In Vivo Platforms to Follow-up Noncoding Variants Emerging from Genome-Wide Association Studies.” The experimental follow-up of noncoding variants that map to gene deserts represents some of the most challenging aspects emerging from multiple GWAS. Focusing specifically on the hypothesis that functional noncoding variants often disrupt cis-regulatory elements, he described experimental platforms to identify these distant long-range regulatory sequences and infer the functional impact of variants within them. He showed how a strategy to convert bacterial artificial chromosomes into enhancer trapping systems allows for the characterization of regulatory landscapes using in vivo enhancer assays in mice and zebrafish. Dr. Nobrega showed how this strategy uncovered multiple enhancers in the gene desert surrounding the TBX20 gene. By showing that the expression of TBX20 was abrogated after the deletion of specific enhancers, he demonstrated that this strategy successfully uncovers most cis-acting elements at a locus. He concluded by applying these principles to identify a human prostate-specific enhancer in a gene desert and showing that this enhancer contains a single nucleotide polymorphism that has been associated with prostate cancer in multiple GWAS and confers allele-specific in vivo activity to this enhancer.
Dr. Jack A. Elias (Yale University) spoke on “Genetic Manipulation of Mice to Define Functionality Relevant to Human Lung Disease.” He discussed the need to go from genetic associations to biologic clarification, pathway identification, and therapeutic target validation; and he outlined the murine approaches that can be used to achieve these goals. Dr. Elias emphasized five basic questions: What does the gene do? What pathways does it use? How does a given genetic variant alter expression or function and relate to the pathogenesis of the associated disease? What is the role of the gene in development or early life origins of disease? Are the protein or its regulators viable therapeutic targets? The use and limitations of systemic and tissue-localized and constitutive and inducible null mutations, overexpressing transgenic, and knock-in approaches were addressed. Limiting issues included the need for (1) easier ways of generating mutant animals, (2) banks of affordable mutant animals, embryonic stem cells, and tissue-targeted Cre recombinase transgenic mice, (3) integrated analytic approaches to these models, and (4) improved methodologies to address issues relating to microRNA, glycobiology, and epigenetics. The need for better models of human disease that will allow more accurate extrapolations from mice to man and for an iterative approach that combines findings from mice, cells, and human investigations was emphasized.
Dr. Donata Vercelli discussed the “Functional Dissection of Human Genetic Variants, Or Finding the Mechanisms that Link Genotype to Phenotype.” She focused on functional studies of genetic variants associated with human complex diseases using asthma as a case in point, and emphasized the need for mechanistic studies that go beyond mere associations to define how complex disease-associated variants dysregulate gene expression and function. This knowledge is in turn necessary to identify targets for effective preventive and treatment strategies. She criticized the reductionist in vitro approaches that are commonly used as being artificial and unable to model interactions among multiple gene variants within complex haplotypes, or gene–gene and gene–environment interactions. Dr. Vercelli convincingly argued that in vivo models relying on mice that carry wild-type or polymorphic human haplotypes as transgenes or knock-in alleles provide a workable solution to this problem. Her studies on mice transgenic for the entire human Th2 locus on chromosome 5q (including IL13, one of the most robust asthma/allergy susceptibility genes) show that the human genes are appropriately regulated, both transcriptionally and epigenetically, in this model. Thus, the functional impact of polymorphic haplotypes on the expression of these genes, and the biological events they control in response to relevant stimuli, can be rigorously tested in vivo. Importantly, these models can be expanded to study gene–gene and gene–environment interactions, including response to drugs (i.e., pharmacogenetics).
Dr. David Schwartz (National Jewish Health) elaborated on this theme by pointing out that most human diseases are caused by genetic variation and environmental exposures, and that including epigenetic marks in genetic studies would, in part, account for this interaction. Moreover, he demonstrated that when biology is conserved across evolution, such as seen in innate immunity, model organisms (such as flies, worms, and yeast) represent ideal biological systems to exploit to discover novel genes and mechanisms involved in these common biological processes. Finally, Dr. Schwartz indicated that genetic, genomic, and molecular profiles can be used to define human disease, identify disease earlier, understand the dynamic biology of disease within an individual, and individualize therapy and prognosis.
Discussant Dr. Fernando Martinez (University of Arizona) closed the session on functional validation by further highlighting the importance of considering developmental stage and environmental exposures in functional studies.
In the final session of the workshop, Dr. Damien Chaussabel (Baylor Institute for Immunology Research) discussed “Translational Human Immunology,” an approach for integrating patient-based clinical studies and trials with high throughput genomic and proteomic profiling, flow cytometry, and high-resolution cellular studies to better classify human diseases, with the ultimate goal of improved (customized) therapeutics. He discussed the challenges that arise from integrating data from these various sources, as well as from investigators on different continents and using different formats. Once integrated, however, these data can be mined in ways that provide novel insights into specific diseases.
Dr. Joseph Loscalzo (Harvard Medical School) followed by speaking on “Human Disease Classification in the Post-Genomic Era: A Complex Systems Approach,” arguing against a reductionist approach to understanding human disease. Instead, he proposed using quantitative approaches to examine complex biological systems that comprise networks and simultaneously consider ensembles of models. Ultimately, this approach would enable the definition of pathophenotypes (clinical syndromes and disease) that are determined by genetic, protein, cellular, and environmental components of a network. Both speakers provided elegant examples of applying sophisticated bioinformatic approaches that integrate across data sets to characterize disease processes and to personalize therapeutic approaches.
A road map for moving from gene discovery to function, biology, and discovery of therapeutic targets is illustrated in Figure 2. In this section we focus on the early steps in this journey that require decisions on which variants/genes identified in a GWAS should be considered for functional studies.
It was widely recognized by the speakers and attendees at the conference that prioritizing specific variants or genes themselves for functional studies is a first critical step in moving beyond genetic association to function. Often a GWAS identifies many variants or genes as potential candidates for lung health or disease that cannot be further prioritized based on the statistical evidence of association. Because studies elucidating the function and biology of associated variants or the biology of newly discovered genes can be costly and time-consuming, prioritization of these GWAS discoveries for further study is an important and challenging first step.
There were two major themes that emerged from these discussions. First, in selecting variants or genes for functional studies, it is important to consider the robustness of the association. Although consideration should be given to the strength of the association, the strongest signal in any one GWAS may not always be the most robust association overall. Therefore, associations with particular variants, or even with different variants within the same gene, that replicate broadly across studies should be given highest priority. Often, these will not be the strongest associations in any one study, but the consistent evidence for association in many different studies (e.g., as revealed in a metaanalysis) would further suggest that the variant and gene have main effects on the phenotype, are less likely influenced by gene–gene or gene–environment interactions, and are most likely to be true associations.
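The point that consistent replication can outweigh a single strong signal can be illustrated with a standard inverse-variance, fixed-effect metaanalysis, in which each study's effect estimate is weighted by the reciprocal of its squared standard error. The sketch below is an illustration under assumed numbers, not a procedure endorsed by the workshop: the function name, variant labels, effect sizes, and standard errors are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling.

    `estimates` is a list of (beta, standard_error) pairs, one per study.
    Returns (pooled_beta, pooled_se, z_score).
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-study results (beta, standard error) for two variants.
consistent_variant = [(0.2, 0.1), (0.2, 0.1), (0.2, 0.1), (0.2, 0.1)]
one_hit_variant = [(0.5, 0.1), (0.0, 0.1), (0.0, 0.1), (0.0, 0.1)]

best_single_z_consistent = max(beta / se for beta, se in consistent_variant)
best_single_z_one_hit = max(beta / se for beta, se in one_hit_variant)
_, _, pooled_z_consistent = fixed_effect_meta(consistent_variant)
_, _, pooled_z_one_hit = fixed_effect_meta(one_hit_variant)

print(f"consistent: best single z={best_single_z_consistent:.1f}, pooled z={pooled_z_consistent:.1f}")
print(f"one-hit:    best single z={best_single_z_one_hit:.1f}, pooled z={pooled_z_one_hit:.1f}")
```

With these made-up numbers, the second variant has the largest z score in any one study, yet the first variant has the larger pooled z score, which is the pattern the discussion argues should receive priority. A fixed-effect model assumes one shared true effect across studies; where heterogeneity is expected, a random-effects model would be a more cautious choice.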
Second, the large amassing of publicly available data on gene expression (including eQTLs) and of bioinformatic tools to predict potential functionality of genetic variants allows in silico studies of function or putative function that could both provide additional confidence in the association and inform the types of functional studies to consider. In this same vein, incorporating systems genetics, biological networks, data mining, and predictive structural changes can provide context for newly discovered genes and motivate specific functional studies, at a relatively low cost. Using in silico approaches for prioritizing variants or genes after a GWAS is particularly important in situations in which replication studies are not available, as may be the case for rare phenotypes or phenotypes that may be costly to obtain on a large number of patients (e.g., through imaging studies). In those cases, integrating bioinformatic approaches that include complementary data sets, such as gene expression networks and eQTL mapping, could reveal plausible biological pathways for the newly discovered gene and motivate subsequent functional studies.
The participants in the workshop identified specific high-priority areas for future research directions (Table 1), which fell into five broad categories: (1) functional characterization of genes involved in the pathogenesis of lung disease and of their associated genetic variations, (2) identification and characterization of interactions between genes and between genes and environment that impact lung development and pathology, (3) more sophisticated phenotyping (phenomics) that incorporates genetic, molecular, cellular, and/or physiologic biomarkers (transcriptomics, proteomics, metabolomics, etc.), and imaging, (4) better and more comprehensive mining of existing data, and (5) high throughput biological screens in embryonic stem cells and model organisms that focus on gene targets identified in genome-wide association and linkage studies. It was further recommended that future research related to each of these categories should be addressed using integrated approaches that include more than one of the following: “-omic” technologies, systems genetics/biology and pathway/network analysis, animal models of lung disease, human studies in ethnically/racially diverse populations, multiple lung diseases, and mining of data from existing resources. The latter could include, but is not limited to, data from clinical trial cohorts and population-based cohorts, studies of transcriptional and proteomic profiling in relevant tissues, mouse models of lung disease, and repositories of banked tissues. Last, there was a strong consensus for the need to train young investigators to use bioinformatic tools that will allow the mining of existing data sets to address hypotheses on the pathogenesis of lung disease and to make lung-related data sets more accessible to the community at large.
All authors contributed equally to this article.
Sponsored by the Division of Lung Diseases, National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services.
A complete list of workshop participants may be found at the end of the article.
Originally Published in Press as DOI: 10.1164/rccm.201002-0180PP on June 17, 2010
Author Disclosure: C.O. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. A.J.B. received $1,001–$5,000 from Tercicia and $1,001–$5,000 from Lilly in consultancy fees; $50,001–$100,000 from Johnson and Johnson, up to $1,000 from Numedii, and $10,001–$50,000 from Genstruct in advisory board fees; $5,001–$10,000 from Siemens in lecture fees; more than $100,001 from the Hewlett Packard Foundation in institutional grants; holds a patent from Stanford University for biomarkers and therapeutics (not related to submitted paper); and received up to $1,000 from MIT Press in royalties and up to $1,000 from the American Medical Informatics Association for serving on an advisory board. J.A.E. received $1,001–$5,000 from Intermune Inc. for serving on a scientific advisory board; holds patents from Yale University on chitinases in lung inflammation, mir-1 in VEGF tissue responses, IL-18 in COPD, and VEGF in asthma; and holds $1,001–$5,000 from Merck and $5,001–$10,000 from Intermune in stock ownership or options. A.J.L. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. W.G. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. S.B-S. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. D.S. received $10,001–$50,000 from Wallace and Graham for serving as an expert witness on workers compensation evaluations; received $5,001–$10,000 from Brayton and Purcell, $50,001–$100,000 from Weitz and Luxemberg, and $10,001–$50,000 from Waters and Kraus for serving as an expert witness on determination of asbestos induced lung disease; holds a patent from MedImmune for TLR4 hyporesponsive polymorphisms used in RSV vaccine research (less than $10,000); and is employed by the NIH as a Director, NIEHS, and National Jewish Health as a professor.
Co-Chairs: Atul Butte, M.D., Ph.D., Stanford University, Stanford, CA; Jack Elias, M.D., Yale University, New Haven, CT; Aldons Jake Lusis, Ph.D., University of California, Los Angeles, CA; Carole Ober, Ph.D., University of Chicago, Chicago, IL; David Schwartz, M.D., National Jewish Health, Denver, CO.
Members: Michael J Bamshad, M.D., University of Washington, Seattle, WA; Kathleen Barnes, Ph.D., Johns Hopkins University, Baltimore, MD; Eugene Bleecker, M.D., Wake Forest University, Winston-Salem, NC; Pat Brooks, Pacific Biosciences, Menlo Park, CA; Esteban G. Burchard, M.D., M.P.H., University of California, San Francisco, CA; Damien Chaussabel, Ph.D., Baylor Research Institute, Houston, TX; Bohao Chen, M.D., University of Chicago, Chicago, IL; Geoffrey L. Chupp, M.D., Yale University, New Haven, CT; F. Sessions Cole, M.D., Washington University School of Medicine, St. Louis, MO; William Cookson, M.D., D.Phil., F.R.C.P., National Heart and Lung Institute, Royal Brompton Campus, Imperial College of London, London, UK; David B. Corry, M.D., Baylor College of Medicine, Houston, TX; Nancy Cox, Ph.D., University of Chicago, Chicago, IL; James D. Crapo, M.D., National Jewish Health, Denver, CO; Ronald G. Crystal, M.D., Cornell University, New York, NY; Joe (Skip) G. Garcia, M.D., University of Illinois, Chicago, IL; Frank Gilliland, M.D., Ph.D., University of Southern California, Los Angeles, CA; Hakon Hakonarson, M.D., Ph.D., Children's Hospital of Philadelphia, Philadelphia, PA; Howard J. Huang, M.D., Washington University School of Medicine, St. Louis, MO; Naftali Kaminski, M.D., University of Pittsburgh School of Medicine, Pittsburgh, PA; Michael Knowles, M.D., University of North Carolina, Chapel Hill, Chapel Hill, NC; Abigail Lara, M.D., University of Colorado, Denver, CO; Stephanie London, M.D., Dr. Ph., National Institute of Environmental Health Sciences, Research Triangle Park, NC; Joseph Loscalzo, M.D., Ph.D., Brigham and Women's Hospital, Boston, MA; James E. Loyd, Ph.D., Vanderbilt University School of Medicine, Nashville, TN; Fernando D. Martinez, M.D., University of Arizona, Tucson, AZ; Nuala Meyer, M.D., University of Pennsylvania School of Medicine, Philadelphia, PA; Deborah A. Meyers, Ph.D., Wake Forest University, Winston-Salem, NC; Deborah Nickerson, Ph.D., University of Washington, Seattle, WA; Dan Nicolae, Ph.D., University of Chicago, Chicago, IL; Marcelo A. Nobrega, M.D., Ph.D., University of Chicago, Chicago, IL; Rudy Pascual, M.D., Wake Forest University, Winston-Salem, NC; Vincinio de Jesus Perez, M.D., Ph.D., Stanford University School of Medicine, Stanford, CA; Diego A. Preciado, M.D., Ph.D., Children's National Medical Center, Washington, DC; Benjamin Raby, M.D., Brigham and Women's Hospital, Boston, MA; Dan M. Roden, M.D., Vanderbilt University School of Medicine, Nashville, TN; Eric Schadt, Ph.D., Pacific Biosciences, Menlo Park, CA; Sunita Sharma, M.D., M.P.H., Channing Laboratory, Harvard Medical School, Boston, MA; Edwin Silverman, M.D., Ph.D., Brigham and Women's Hospital, Boston, MA; Avrum Spira, M.D., M.Sc., Boston University School of Medicine, Boston, MA; Donata Vercelli, M.D., University of Arizona, Tucson, AZ; Scott T. Weiss, M.D., M.S., Brigham and Women's Hospital, Boston, MA; Marsha Wills-Karp, Ph.D., Cincinnati Children's Hospital Medical, Cincinnati, OH; Prescott G. Woodruff, M.D., M.P.H., University of California, San Francisco, CA; Fred Wright, Ph.D., University of North Carolina, Chapel Hill, Chapel Hill, NC; Mark M. Wurfel, M.D., Ph.D., University of Washington, Seattle, WA; John R. Yates, Ph.D., The Scripps Research Institute, La Jolla, CA; Fei Zou, Ph.D., University of North Carolina, Chapel Hill, Chapel Hill, NC.
NHLBI Staff: Susan Banks-Schlegel, Ph.D., Division of Lung Diseases, NHLBI, Bethesda, MD; Sandra Colombini-Hatch, M.D., Division of Lung Diseases, NHLBI, Bethesda, MD; Weiniu Gan, Ph.D., Division of Lung Diseases, NHLBI, Bethesda, MD; Dorothy Gail, Ph.D., Division of Lung Diseases, NHLBI, Bethesda, MD; James P. Kiley, Ph.D., Division of Lung Diseases, NHLBI, Bethesda, MD; Alan M. Michelson, M.D., Ph.D., Office of the Director, NHLBI, Bethesda, MD.