Despite overwhelming data that cigarette smoking causes chronic obstructive pulmonary disease (COPD), only a minority of chronic smokers are affected, strongly suggesting that genetic factors modify susceptibility to this disease. We hypothesized that there are individual variations in the response to cigarette smoking, with variability among smokers in expression levels of protective / susceptibility genes.
Affymetrix arrays and TaqMan PCR were used to assess the variability of gene expression in the small airway epithelium obtained by fiberoptic bronchoscopy of 18 normal non-smokers, 18 normal smokers and 18 smokers with COPD.
We identified 201 probesets representing 152 smoking-responsive genes that were significantly up- or down-regulated, and assessed the coefficient of variation in expression levels among the study population. Variation was a reproducible property of each gene as assessed by different microarray probesets and real-time PCR and was observed in both normal smokers and smokers with COPD. There was greater individual variability in smokers with COPD than in normal smokers. The majority of these highly variable smoking-responsive genes were in the functional categories of signal transduction, xenobiotic degradation, metabolism, transport, oxidant-related and transcription. A similar pattern of the same highly variable genes was observed in an independent data set.
We propose that there is likely genetic diversity within this subset of genes with highly variable individual-to-individual responses of the small airway epithelium to smoking, and that this subset of genes represents putative candidates for assessment of susceptibility/protection from disease in future gene-based epidemiological studies of smokers’ risk for COPD.
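The coefficient of variation mentioned above can be illustrated with a minimal Python sketch. It assumes the usual definition (sample standard deviation divided by the mean, expressed as a percentage); the gene names and expression values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(values) / m


def rank_by_variability(expression):
    """Sort genes from most to least variable across subjects."""
    cv = {gene: coefficient_of_variation(levels) for gene, levels in expression.items()}
    return sorted(cv.items(), key=lambda item: item[1], reverse=True)


# Hypothetical expression levels for three genes measured in four smokers.
expression = {
    "GENE_A": [10.0, 10.0, 10.0, 10.0],
    "GENE_B": [5.0, 15.0, 5.0, 15.0],
    "GENE_C": [8.0, 12.0, 9.0, 11.0],
}
ranking = rank_by_variability(expression)
```

Genes at the top of `ranking` would be the ones whose expression differs most from one subject to the next, which is the property the abstract uses to nominate candidates.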
Chronic obstructive pulmonary disease and emphysema develop in 15% of ex-smokers despite sustained quitting, while 10% are free of emphysema or severe lung obstruction. The cause of the incapacity of the immune system to clear the inflammation in the first group remains unclear.
Methods and Findings
We searched for genes that were protecting ex-smokers without emphysema, using microarrays on portions of human lungs surgically removed; we found that loss of lung function in patients with chronic obstructive pulmonary disease and emphysema was associated with a lower expression of CD46 and verified this finding by qRT-PCR and flow cytometry. Also, there was a significant association of decreased CD46+ cells with decreased CD4+ T cells, apoptosis mediator CD95 and increased CD8+ T cells that were protecting patients without emphysema or severe chronic obstructive pulmonary disease. CD46 not only regulates the production of T regulatory cells, which suppress CD8+ T cell proliferation, but also the complement cascade by degradation of C3b. These results were replicated in the murine smoking model, which showed increased C5a (produced by C3b) that suppressed IL12 mediated bias to T helper 1 cells and elastin co-precipitation with C3b, suggesting that elastin could be presented as an antigen. Thus, using ELISA from elastin peptides, we verified that 43% of the patients with severe early onset of chronic obstructive pulmonary disease tested positive for IgG to elastin in their serum compared to healthy controls.
These data suggest that higher expression of CD46 in the lungs of ex-smokers protects them from emphysema and chronic obstructive pulmonary disease by clearing the inflammation impeding the proliferation of CD8+ T cells and necrosis, achieved by production of T regulatory cells and degradation of C3b; restraining the complement cascade favors apoptosis over necrosis, protecting them from autoimmunity and chronic inflammation.
Chronic obstructive pulmonary disease affects millions worldwide. It is America’s third leading cause of death, and results in significant morbidity and cost. Although many therapies exist and are being developed to alleviate symptoms and decrease morbidity and mortality in chronic obstructive pulmonary disease, most have only been studied in placebo-controlled efficacy studies in highly selected populations. Comparative effectiveness and translational research in chronic obstructive pulmonary disease will require the development of infrastructures to support collaboration between researchers and the stakeholders who generate, disseminate and use new knowledge. Methodologies need to evolve to both prioritize research questions and to conduct collaborative comparative effectiveness research studies. Given the impracticality of testing every clinical intervention in comparative pragmatic trials for comparative effectiveness research in chronic obstructive pulmonary disease, we advocate expanding methodology that includes the use of observational databases with serially performed effectiveness analyses and quasi-experimental designs that include following healthcare changes longitudinally over time to assess benefit, harm, subgroups and cost.
chronic obstructive pulmonary disease; comparative effectiveness research; data warehouse; emphysema; health services research; outcome research; quasi-experimental design; registry; translational research
The toll-like receptors (TLRs) are important components of the respiratory epithelium host innate defense, enabling the airway surface to recognize and respond to a variety of insults in inhaled air. Based on the knowledge that smokers are more susceptible to pulmonary infection and that the airway epithelium of smokers with chronic obstructive pulmonary disease (COPD) is characterized by bacterial colonization and acute exacerbation of airway infections, we assessed whether smoking alters expression of TLRs in human small airway epithelium, the primary site of smoking-induced disease. Microarrays were used to survey the TLR family gene expression in small airway (10th–12th order) epithelium from healthy nonsmokers (n=60), healthy smokers (n=73) and smokers with COPD (n=36). Using the criterion of a detection call of "present" in ≥50%, 6 of 10 TLRs (1, 2, 3, 4, 5 and 8) were expressed. Compared to nonsmokers, the most striking change was for TLR5, which was down-regulated in healthy smokers (1.4-fold, p<10^−10) and smokers with COPD (1.6-fold, p<10^−11). TaqMan RT-PCR confirmed these observations. Bronchial biopsy immunofluorescence studies showed that TLR5 was expressed mainly on the apical side of the epithelium and was decreased in healthy smokers and smokers with COPD. In vitro, the levels of the TLR5 downstream genes, IL-6 and IL-8, were highly induced by flagellin in TLR5 high-expressing cells compared to TLR5 low-expressing cells. In the context that TLR5 functions to recognize pathogens and activate innate immune responses, the smoking-induced down-regulation of TLR5 may contribute to smoking-related susceptibility to airway infection, at least for flagellated bacteria.
Complex chronic diseases are usually not caused by changes in a single causal gene but by an unbalanced regulating network resulting from the dysfunctions of multiple genes or their products. Therefore, network based systems approach can be helpful for the identification of candidate genes related to complex diseases and their relationships. Axial spondyloarthropathy (SpA) is a group of chronic inflammatory joint diseases that mainly affect the spine and the sacroiliac joints. The pathogenesis of SpA remains largely unknown.
In this paper, we conducted a network study of the pathogenesis of SpA. We integrated data related to SpA, from the OMIM database, proteomics and microarray experiments of SpA, to prioritize SpA candidate disease genes in the context of human protein interactome. Based on the top ranked SpA related genes, we constructed a SpA specific PPI network, identified potential pathways associated with SpA, and finally sketched an overview of biological processes involved in the development of SpA.
The protein-protein interaction (PPI) network and pathways reflect the link between the two pathological processes of SpA, i.e., immune-mediated inflammation and imbalanced bone modelling that causes new bone formation and bone loss. We found that some known disease causative genes, such as TNF and ILs, play pivotal roles in this interaction.
Summary: The prioritization of candidate disease genes is often based on integrated datasets and their network representation with genes as nodes connected by edges for biological relationships. However, the majority of prioritization methods do not allow for a straightforward integration of the user’s own input data. Therefore, we developed the Cytoscape plugin NetworkPrioritizer that particularly supports the integrative network-based prioritization of candidate disease genes or other molecules. Our versatile software tool computes a number of important centrality measures to rank nodes based on their relevance for network connectivity and provides different methods to aggregate and compare rankings.
Availability: NetworkPrioritizer and the online documentation are freely available at http://www.networkprioritizer.de.
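The idea of ranking nodes by centrality and then aggregating the rankings can be sketched in a few lines of Python. This is not the NetworkPrioritizer implementation: the abstract does not name the centrality measures or aggregation methods it offers, so degree centrality, closeness centrality and a Borda-style count are assumptions chosen for illustration, and the toy network is hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Number of direct neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the summed shortest-path lengths (BFS)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def to_ranking(scores):
    """Order nodes from highest to lowest score; ties are broken by name."""
    return sorted(scores, key=lambda n: (-scores[n], n))


def borda_aggregate(rankings):
    """Assumed aggregation: each ranking awards n points to first place, 1 to last."""
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, node in enumerate(ranking):
            points[node] = points.get(node, 0) + (n - position)
    return sorted(points, key=lambda n: (-points[n], n))


# Hypothetical undirected network stored as adjacency sets.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}
by_degree = to_ranking(degree_centrality(graph))
by_closeness = to_ranking(closeness_centrality(graph))
aggregated = borda_aggregate([by_degree, by_closeness])
```

Comparing `by_degree` with `by_closeness` shows how two measures can disagree on the middle of the list while agreeing on the most central node, which is why an aggregation step is useful.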
Finding a genetic disease-related gene is not a trivial task. Therefore, computational methods are needed to give the biomedical community clues about which genes are more likely to be related to a specific disease as biomarkers. We address the biomarker identification problem using a gene prioritization method called gene prioritization from microarray data based on shortest paths, extended with structural and biological properties and edge flux using a voting scheme (GP-MIDAS-VXEF). The method is based on finding relevant interactions in protein interaction networks, then scoring the genes using shortest paths and topological analysis, and integrating the results using a voting scheme and biological boosting. We ran two experiments: one on prostate primary and normal samples, and the other on prostate primary tumor with and without lymph node metastasis. We used 137 true prostate cancer genes as a benchmark. In the first experiment, GP-MIDAS-VXEF outperforms all the other state-of-the-art methods in the benchmark by retrieving the most truly related genes from the candidate set in the top 50 scores found. We applied the same technique to infer the significant biomarkers in prostate cancer with lymph node metastasis, which are not well established.
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state of the art methods exploit a protein-protein interaction (PPI) network for this task. They are based on the observation that genes causing phenotypically-similar diseases tend to lie close to one another in a PPI network. However, to date, these methods have used a static picture of human PPIs, while diseases impact specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve the prioritization results compared to those obtained using a generic PPI network. Furthermore, they allow predicting novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
Identifying the genes causing genetic disease is a key challenge in human health, and a crucial step on the road for developing novel diagnostics and treatments. Modern discovery methods involve genome-wide association studies that reveal regions of the genome where the causal gene is likely to reside, and then prioritizing the candidate genes within these regions and experimentally examining the most promising candidates' potential influence on the disease. Many computational methods were developed to automatically prioritize candidate genes. Some of the most successful methods use a biological network of interacting genes or proteins as an input. However, these networks – and subsequently, these methods – do not take into account the differences between tissues. In other words, a heart disease is analyzed using the same network as a skin disease. We constructed tissue-specific protein interaction networks and explored their effect on an existing prioritization algorithm by comparing the algorithm's performance on the tissue-specific networks and the generic network. We find that integrating tissue-specific data indeed leads to better prioritization. We also used the prioritization results of different tissues in order to suggest new disease-tissue associations.
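A tissue-specific network of the kind described above can be sketched as a filter over a generic PPI network. The abstracts do not state how expression data were integrated, so the rule used here (keep an interaction only when both partners are expressed in the tissue at or above a threshold) is an assumption, as are the gene names, expression values and threshold.

```python
def tissue_specific_network(ppi_edges, expression, tissue, threshold=1.0):
    """Keep only interactions whose two partners are both expressed in `tissue`.

    ppi_edges  -- iterable of (gene_a, gene_b) pairs from the generic network
    expression -- dict mapping gene -> {tissue: expression level}
    threshold  -- assumed cutoff above which a gene counts as expressed
    """
    expressed = {
        gene
        for gene, levels in expression.items()
        if levels.get(tissue, 0.0) >= threshold
    }
    return {(a, b) for a, b in ppi_edges if a in expressed and b in expressed}


# Hypothetical generic network and expression levels.
ppi_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
expression = {
    "G1": {"heart": 5.0, "skin": 0.1},
    "G2": {"heart": 3.0, "skin": 2.0},
    "G3": {"heart": 0.2, "skin": 4.0},
    "G4": {"heart": 0.0, "skin": 6.0},
}
heart_network = tissue_specific_network(ppi_edges, expression, "heart")
skin_network = tissue_specific_network(ppi_edges, expression, "skin")
```

The same generic edge list yields different networks for heart and skin, so a prioritization algorithm run on each one sees a different neighborhood for the same gene.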
Computational analysis of microarray data has provided an effective way to identify disease-related genes. Traditional methods for selecting disease genes from microarray data, such as statistical tests, focus on genes that are differentially expressed between samples and prioritize each gene individually. These traditional methods might miss differentially coexpressed (DCE) gene subsets because they ignore the interaction between genes. In this paper, the MIClique algorithm is proposed to identify DCE gene subsets based on mutual information and clique analysis. Mutual information is used to measure the coexpression relationship between each pair of genes in two different kinds of samples. Clique analysis is a commonly used method in biological networks, where a clique generally represents a biological module of similar function. By applying the MIClique algorithm to real gene expression data, some DCE gene subsets that are correlated under one experimental condition but uncorrelated under another are detected from the graphs of a colon dataset and a leukemia dataset.
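The mutual information step can be illustrated with a small Python sketch. It covers only the pairwise coexpression measure, not the clique analysis, and it is not the MIClique implementation: the abstract does not say how expression values are discretized or how mutual information is estimated, so the median split and the plug-in estimator below are assumptions, and the expression values are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median


def discretize(values):
    """Assumed binarization: 1 if a value is above the gene's median, else 0."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in bits) between two discrete series."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# Hypothetical expression of one gene pair under two conditions.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2_condition_a = [2.0, 4.0, 6.0, 8.0]   # rises together with gene1
gene2_condition_b = [8.0, 2.0, 4.0, 6.0]   # no consistent relationship

mi_a = mutual_information(discretize(gene1), discretize(gene2_condition_a))
mi_b = mutual_information(discretize(gene1), discretize(gene2_condition_b))
```

A pair with high mutual information in one condition and near-zero mutual information in the other is the kind of differentially coexpressed relationship the abstract describes.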
Motivation: A challenging problem after a genome-wide association study (GWAS) is to balance the statistical evidence of genotype–phenotype correlation with a priori evidence of biological relevance.
Results: We introduce a method for systematically prioritizing single nucleotide polymorphisms (SNPs) for further study after a GWAS. The method combines evidence across multiple domains including statistical evidence of genotype–phenotype correlation, known pathways in the pathologic development of disease, SNP/gene functional properties, comparative genomics, prior evidence of genetic linkage, and linkage disequilibrium. We apply this method to a GWAS of nicotine dependence, and use simulated data to test it on several commercial SNP microarrays.
Availability: A comprehensive database of biological prioritization scores for all known SNPs is available at http://zork.wustl.edu/gin. This can be used to prioritize nicotine dependence association studies through a straightforward mathematical formula—no special software is necessary.
Supplementary information: Supplementary data are available at Bioinformatics online.
BACKGROUND AND AIM:
The multi‐drug resistant‐1 (MDR‐1) gene is located on human chromosome 7 and encodes a glycosylated membrane protein that is a member of the ATP‐binding cassette transporters superfamily. The aim of the study was to reveal the role of the C3435T MDR‐1 gene polymorphism in chronic obstructive pulmonary disease.
DNA samples from 41 patients with chronic obstructive pulmonary disease and 50 healthy control participants were used to compare MDR‐1 gene profiles. Genotyping assays were performed using the StripAssay technique that is based on reverse‐hybridization.
The T allele polymorphism in the MDR‐1 gene located at position 3435 in exon 26 was shown to correlate with chronic obstructive pulmonary disease.
These preliminary results suggest that the T allele polymorphism of the MDR‐1 gene is associated with chronic obstructive pulmonary disease.
COPD; MDR‐1 gene; T allele frequency; Transporter glycoprotein; Reverse‐hybridization
Cystatin A (gene: CSTA) is up-regulated in non-small-cell lung cancer (NSCLC) and dysplastic vs normal human bronchial epithelium. In the context that chronic obstructive pulmonary disease (COPD), a small airway epithelium (SAE) disorder, is independently associated with NSCLC (especially squamous cell carcinoma, SCC), but only occurs in a subset of smokers, we hypothesized that genetic variation, smoking and COPD modulate CSTA gene expression levels in SAE, with further up-regulation in SCC. Gene expression was assessed by microarray in SAE of 178 individuals [healthy nonsmokers (n=60), healthy smokers (n=82), and COPD smokers (n=36)], with corresponding large airway epithelium (LAE) data in a subset (n=52). Blood DNA was genotyped by SNP microarray. Twelve SNPs upstream of the CSTA gene were all significantly associated with CSTA SAE gene expression (p<0.04 to 5 × 10^−4). CSTA gene expression levels in SAE were higher in COPD smokers (28.4 ± 2.0) than healthy smokers (19.9 ± 1.4, p<10^−3), who in turn had higher levels than nonsmokers (16.1 ± 1.1, p<0.04). CSTA LAE gene expression was also smoking-responsive (p<10^−3). Using comparable publicly available NSCLC expression data, CSTA was up-regulated in SCC vs LAE (p<10^−2) and down-regulated in adenocarcinoma vs SAE (p<10^−7). All phenotypes were associated with significantly different proportional gene expression of CSTA to cathepsins. The data demonstrate that regulation of CSTA expression in human airway epithelium is influenced by genetic variability, smoking, and COPD, and is further up-regulated in SCC, all of which should be taken into account when considering the role of CSTA in NSCLC pathogenesis.
cystatin; small airway epithelium; gene expression; genotype; COPD
Differentially expressed genes are more likely to have variants associated with disease. A new tool, fitSNP, prioritizes candidate SNPs from association studies.
Candidate single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWASs) were often selected for validation based on their functional annotation, which was inadequate and biased. We propose to use the more than 200,000 microarray studies in the Gene Expression Omnibus to systematically prioritize candidate SNPs from GWASs.
We analyzed all human microarray studies from the Gene Expression Omnibus, and calculated the observed frequency of differential expression, which we called differential expression ratio, for every human gene. Analysis conducted in a comprehensive list of curated disease genes revealed a positive association between differential expression ratio values and the likelihood of harboring disease-associated variants. By considering highly differentially expressed genes, we were able to rediscover disease genes with 79% specificity and 37% sensitivity. We successfully distinguished true disease genes from false positives in multiple GWASs for multiple diseases. We then derived a list of functionally interpolating SNPs (fitSNPs) to analyze the top seven loci of Wellcome Trust Case Control Consortium type 1 diabetes mellitus GWASs, rediscovered all type 1 diabetes mellitus genes, and predicted a novel gene (KIAA1109) for an unexplained locus 4q27. We suggest that fitSNPs would work equally well for both Mendelian and complex diseases (being more effective for cancer) and proposed candidate genes to sequence for their association with 597 syndromes with unknown molecular basis.
Our study demonstrates that highly differentially expressed genes are more likely to harbor disease-associated DNA variants. FitSNPs can serve as an effective tool to systematically prioritize candidate SNPs from GWASs.
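The differential expression ratio and the sensitivity and specificity figures reported above can be illustrated with a short Python sketch. The abstract defines the ratio only as the observed frequency of differential expression, so dividing by the number of studies in which the gene was measured is an assumption; the threshold, gene names and calls are hypothetical.

```python
def differential_expression_ratio(calls):
    """Fraction of studies in which a gene was called differentially expressed.

    calls -- list of booleans, one per study in which the gene was measured
             (the choice of denominator is an assumption).
    """
    if not calls:
        raise ValueError("gene was not measured in any study")
    return sum(calls) / len(calls)


def sensitivity_specificity(ratios, disease_genes, threshold):
    """Treat genes with ratio >= threshold as predicted disease genes."""
    tp = fp = tn = fn = 0
    for gene, ratio in ratios.items():
        predicted = ratio >= threshold
        actual = gene in disease_genes
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical per-study calls for four genes; G1 and G2 are known disease genes.
calls = {
    "G1": [True, True, True, False],
    "G2": [True, False, False, False],
    "G3": [False, False, False, False],
    "G4": [True, True, False, False],
}
ratios = {gene: differential_expression_ratio(c) for gene, c in calls.items()}
sensitivity, specificity = sensitivity_specificity(ratios, {"G1", "G2"}, threshold=0.5)
```

Raising the threshold trades sensitivity for specificity, which is consistent with the abstract reporting a high specificity alongside a lower sensitivity when only highly differentially expressed genes are considered.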
Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cure for many diseases.
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.
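Scoring a candidate by the differential expression of its network neighborhood can be sketched as follows. The abstract does not give PINTA's scoring function, so the mean absolute differential expression of direct neighbors used here is an assumption, and the network, gene names and values are hypothetical.

```python
def neighborhood_score(gene, network, diff_expr):
    """Mean absolute differential expression of the gene's direct interaction partners."""
    values = [abs(diff_expr[n]) for n in network.get(gene, ()) if n in diff_expr]
    return sum(values) / len(values) if values else 0.0


def prioritize(candidates, network, diff_expr):
    """Rank candidates from highest to lowest neighborhood score; ties broken by name."""
    return sorted(
        candidates,
        key=lambda g: (-neighborhood_score(g, network, diff_expr), g),
    )


# Hypothetical interaction partners and log fold changes from a disease data set.
network = {"C1": ["N1", "N2"], "C2": ["N3"], "C3": []}
diff_expr = {"N1": 2.0, "N2": -1.0, "N3": 0.5}
ranking = prioritize(["C1", "C2", "C3"], network, diff_expr)
```

Note that a candidate can rank highly without being differentially expressed itself, because only its neighbors contribute to the score in this sketch.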
The majority of common diseases are multi-factorial and modified by genetically and mechanistically complex polygenic interactions and environmental factors. High-throughput genome-wide studies like linkage analysis and gene expression profiling tend to be most useful for classification and characterization but do not provide sufficient information to identify or prioritize specific disease causal genes.
Extending an earlier hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships, we show, for the first time, the utility of mouse phenotype data in human disease gene prioritization. We study the effect of different data integration methods, and based on the validation studies, we show that our approach, ToppGene, outperforms two of the existing candidate gene prioritization methods, SUSPECTS and ENDEAVOUR.
The incorporation of phenotype information for mouse orthologs of human genes greatly improves the human disease candidate gene analysis and prioritization.
Gene expression technologies have the ability to generate vast amounts of data, yet there often resides only limited resources for subsequent validation studies. This necessitates the ability to perform sorting and prioritization of the output data. Previously described methodologies have used functional pathways or transcriptional regulatory grouping to sort genes for further study. In this paper we demonstrate a comparative genomics based method to leverage data from animal models to prioritize genes for validation. This approach allows one to develop a disease-based focus for the prioritization of gene data, a process that is essential for systems that lack significant functional pathway data yet have defined animal models. This method is made possible through the use of highly controlled spotted cDNA slide production and the use of comparative bioinformatics databases without the use of cross-species slide hybridizations.
Using gene expression profiling, we have demonstrated similar whole-transcriptome gene expression patterns in prostate cancer cells from human and rat prostate cancer cell lines, both at baseline expression levels and after treatment with physiologic concentrations of the proposed chemopreventive agent Selenium. Using both the human PC3 and rat PAII prostate cancer cell lines, we have gone on to identify a subset of one hundred and fifty-four genes that demonstrate a similar level of differential expression in response to Selenium treatment in both species. Further analysis and data mining for two genes, the Insulin like Growth Factor Binding protein 3 and Retinoic X Receptor alpha, demonstrate an association with prostate cancer, functional pathway links, and protein-protein interactions that make these genes prime candidates for explaining the mechanism of Selenium's chemopreventive effect in prostate cancer. These genes are subsequently validated by western blots showing Selenium-based induction and by tissue microarrays demonstrating a significant association between downregulated protein expression and tumorigenesis, a process that is the reverse of what is seen in the presence of Selenium.
Thus the outlined process demonstrates similar baseline and selenium induced gene expression profiles between rat and human prostate cancers, and provides a method for identifying testable functional pathways for the action of Selenium's chemopreventive properties in prostate cancer.
Motivation: During the past decade, we have seen an exponential growth of vast amounts of genetic data generated for complex disease studies. Currently, across a variety of complex biological problems, there is a strong trend towards the integration of data from multiple sources. So far, candidate gene prioritization approaches have been designed for specific purposes, by utilizing only some of the available sources of genetic studies, or by using a simple weight scheme. Specifically to psychiatric disorders, there has been no prioritization approach that fully utilizes all major sources of experimental data.
Results: Here we present a multi-dimensional evidence-based candidate gene prioritization approach for complex diseases and demonstrate it in schizophrenia. In this approach, we first collect and curate genetic studies for schizophrenia from four major categories: association studies, linkage analyses, gene expression and literature search. Genes in these data sets are initially scored by category-specific scoring methods. Then, an optimal weight matrix is searched by a two-step procedure (core genes and unbiased P-values in independent genome-wide association studies). Finally, genes are prioritized by their combined scores using the optimal weight matrix. Our evaluation suggests this approach generates prioritized candidate genes that are promising for further analysis or replication. The approach can be applied to other complex diseases.
Availability: The collected data, prioritized candidate genes, and gene prioritization tools are freely available at http://bioinfo.mc.vanderbilt.edu/SZGR/.
Supplementary information: Supplementary data are available at Bioinformatics online.
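The final step of the approach above, combining category-specific scores with a set of weights, can be sketched in Python. The abstract does not give the scoring formulas or the weights found by the two-step search, so the plain weighted sum, the weights and the per-category scores below are all hypothetical.

```python
# The four evidence categories named in the abstract.
CATEGORIES = ("association", "linkage", "expression", "literature")


def combined_score(scores, weights):
    """Assumed combination rule: weighted sum of category scores (missing = 0)."""
    return sum(weights[c] * scores.get(c, 0.0) for c in CATEGORIES)


def prioritize(gene_scores, weights):
    """Rank genes from highest to lowest combined score; ties broken by name."""
    return sorted(
        gene_scores,
        key=lambda g: (-combined_score(gene_scores[g], weights), g),
    )


# Hypothetical weights and category-specific scores.
weights = {"association": 2.0, "linkage": 1.0, "expression": 0.5, "literature": 1.0}
gene_scores = {
    "GENE_X": {"association": 3.0, "linkage": 1.0},
    "GENE_Y": {"expression": 4.0, "literature": 2.0},
    "GENE_Z": {"association": 1.0, "linkage": 1.0, "expression": 1.0, "literature": 1.0},
}
ranking = prioritize(gene_scores, weights)
```

In this sketch a gene with strong evidence in one heavily weighted category can outrank a gene with modest evidence in every category, which is why the choice of weights matters.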
Schizophrenia is a chronic psychiatric disorder that affects about 1% of the population globally. A tremendous amount of effort has been expended in the past decade, including more than 2400 association studies, to identify genes influencing susceptibility to the disorder. However, few genes or markers have been reliably replicated. The wealth of this information calls for an integration of gene association data, evidence based gene ranking, and follow-up replication in large sample. The objective of this study is to develop and evaluate evidence based gene ranking methods and to examine the features of top-ranking candidate genes for schizophrenia.
We proposed a gene-based approach for selecting and prioritizing candidate genes by combining odds ratios (ORs) of multiple markers in each association study and then combining ORs in multiple studies of a gene. We named it combination-combination OR method (CCOR). CCOR is similar to our recently published method, which first selects the largest OR of the markers in each study and then combines these ORs in multiple studies (i.e., selection-combination OR method, SCOR), but differs in selecting representative OR in each study. Features of top-ranking genes were examined by gene ontology terms and gene expression in tissues.
Our evaluation suggested that the SCOR method overall outperforms the CCOR method. Using the SCOR, a list of 75 top-ranking genes was selected for schizophrenia candidate genes (SZGenes). We found that SZGenes had strong correlation with neuro-related functional terms and were highly expressed in brain-related tissues.
The scientific landscape for schizophrenia genetics and other complex disease studies is expected to change dramatically in the next few years; thus, the gene-based combined OR method is useful in candidate gene selection for follow-up association studies and in further artificial intelligence in medicine. This method for prioritization of candidate genes can be applied to other complex diseases such as depression, anxiety, nicotine dependence, alcohol dependence, and cardiovascular diseases.
Schizophrenia; Candidate genes; Odds ratio; Association studies
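The difference between the two methods described above (SCOR selects the largest marker OR in each study before combining across studies, whereas CCOR combines all marker ORs within each study first) can be sketched in Python. The abstract does not state how ORs are combined, so the geometric mean used here is an assumption, and the odds ratios are hypothetical.

```python
from math import exp, log


def combine(odds_ratios):
    """Assumed combination rule: geometric mean (mean of the log odds ratios)."""
    return exp(sum(log(o) for o in odds_ratios) / len(odds_ratios))


def scor(studies):
    """Selection-combination: take the largest marker OR per study, then combine."""
    return combine([max(markers) for markers in studies])


def ccor(studies):
    """Combination-combination: combine marker ORs per study, then combine studies."""
    return combine([combine(markers) for markers in studies])


# Hypothetical gene with two association studies, each reporting two marker ORs.
studies = [[1.0, 4.0], [2.0, 2.0]]
scor_value = scor(studies)
ccor_value = ccor(studies)
```

Because SCOR keeps only the strongest marker from each study, its gene-level value is never below the CCOR value under this combination rule; a fuller treatment would also need to decide how protective effects (OR below 1) are handled.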
The World Health Organization has estimated that by 2030, chronic obstructive pulmonary disease will be the third leading cause of death worldwide. Most knowledge of chronic obstructive pulmonary disease is based on studies performed in Europe or North America and little is known about the prevalence, patient characteristics and change in lung function over time in patients in developing countries, such as those of Latin America. This lack of knowledge is in sharp contrast to the high levels of tobacco consumption and exposure to biomass fuels exhibited in Latin America, both major risk factors for the development of chronic obstructive pulmonary disease. Studies have also demonstrated that most Latin American physicians frequently do not follow international chronic obstructive pulmonary disease diagnostic and treatment guidelines. The PRISA Study will expand the current knowledge regarding chronic obstructive pulmonary disease and risk factors in Argentina, Chile and Uruguay to inform policy makers and health professionals on the best policies and practices to address this condition.
PRISA is an observational, prospective cohort study with at least four years of follow-up. In the first year, PRISA has employed a randomized three-staged stratified cluster sampling strategy to identify 6,000 subjects from Marcos Paz and Bariloche, Argentina, Temuco, Chile, and Canelones, Uruguay. Information, such as comorbidities, socioeconomic status and tobacco and biomass exposure, will be collected and spirometry, anthropometric measurements, blood sampling and electrocardiogram will be performed. In year four, subjects will have repeat measurements taken.
There are no longitudinal data on chronic obstructive pulmonary disease incidence and risk factors in the southern cone of Latin America; therefore, this population-based prospective cohort study will fill knowledge gaps in the prevalence and incidence of chronic obstructive pulmonary disease, patient characteristics and changes in lung function over time as well as quality of life and health care resource utilization. Information gathered during the PRISA Study will inform public health interventions and prevention practices to reduce risk of COPD in the region.
Chronic Obstructive Pulmonary Disease; Risk Factors; South America; Cohort
Recognizing the importance of improving lung health through lung disease research, the National Heart, Lung, and Blood Institute (NHLBI) convened a workshop of multidisciplinary experts for the following purpose: (1) to review the current scientific knowledge underlying the basis for treatment of adults and children with pulmonary vascular diseases (PVDs); (2) to identify gaps, barriers, and emerging scientific opportunities in translational PVD research and the means to capitalize on these opportunities; (3) to prioritize new research directions that would be expected to affect the clinical course of PVDs; and (4) to make recommendations to the NHLBI on how to fill identified gaps in adult and pediatric PVD clinical research. Workshop participants reviewed experiences from previous PVD clinical trials and ongoing clinical research networks with other lung disorders, including acute respiratory distress syndrome, chronic obstructive lung disease, and idiopathic pulmonary fibrosis, as well. Bioinformatics experts discussed strategies for applying cutting-edge health information technology to clinical studies. Participants in the workshop considered approaches in the following broad concept areas: (1) improved phenotyping to identify potential subjects for appropriate PVD clinical studies; (2) identification of potential new end points for assessing key outcomes and developing better-designed PVD clinical trials; and (3) the establishment of priorities for specific clinical research needed to advance care of patients with various subsets of PVDs from childhood through adulthood. This report provides a summary of the objectives and recommendations to the NHLBI concentrating on clinical research efforts that are needed to better diagnose and treat PVDs.
clinical trials; pediatrics; pulmonary hypertension; pulmonary vascular changes
Genome-wide expression profiling using microarrays or sequence-based technologies allows us to identify genes and genetic pathways whose expression patterns influence complex traits. Different methods to prioritize gene sets, such as the genes in a given molecular pathway, have been described. In many cases, these methods test one gene set at a time, and therefore do not consider overlaps among the pathways. Here, we present a Bayesian variable selection method to prioritize gene sets that overcomes this limitation by considering all gene sets simultaneously. We applied Bayesian variable selection to differential expression to prioritize the molecular and genetic pathways involved in the responses to Escherichia coli infection in Danish Holstein cows.
We used a Bayesian variable selection method to prioritize Kyoto Encyclopedia of Genes and Genomes pathways. We used our data to study how the variable selection method was affected by overlaps among the pathways. In addition, we compared our approach to another that ignores the overlaps, and studied the differences in the prioritization. The variable selection method was robust to a change in prior probability and stable given a limited number of observations.
Bayesian variable selection is a useful way to prioritize gene sets while considering their overlaps. Ignoring the overlaps gives different and possibly misleading results. Additional procedures may be needed in cases of highly overlapping pathways that are hard to prioritize.
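To make the notion of pathway overlap concrete, the following minimal sketch computes the pairwise Jaccard overlap between gene sets. It is not the Bayesian variable selection method itself; it only illustrates the shared-gene structure that a one-set-at-a-time test ignores. The pathway names and gene identifiers are hypothetical placeholders.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of genes shared by two sets, relative to their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical gene sets; real pathways would come from a pathway database.
pathways: Dict[str, Set[str]] = {
    "pathway_1": {"g1", "g2", "g3", "g4"},
    "pathway_2": {"g3", "g4", "g5"},
    "pathway_3": {"g6", "g7"},
}

# Pairwise overlaps, keyed by the (alphabetically ordered) pathway pair.
overlaps: Dict[Tuple[str, str], float] = {
    (p, q): jaccard(pathways[p], pathways[q])
    for p, q in combinations(sorted(pathways), 2)
}

for pair, value in overlaps.items():
    print(pair, round(value, 2))
```

Pairs with a high overlap are the ones for which testing each set separately can attribute the same signal to both pathways.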
Bayesian variable selection; Gene set; Overlap
Based on mouse lens gene expression profiling, a systems tool was developed for identification of genes associated with human congenital cataract. iSyTE ranked 88% of known isolated congenital cataract–associated genes within the top two of all candidates in the originally mapped genomic intervals.
To facilitate the identification of genes associated with cataract and other ocular defects, the authors developed and validated a computational tool termed iSyTE (integrated Systems Tool for Eye gene discovery; http://bioinformatics.udel.edu/Research/iSyTE). iSyTE uses a mouse embryonic lens gene expression data set as a bioinformatics filter to select candidate genes from human or mouse genomic regions implicated in disease and to prioritize them for further mutational and functional analyses.
Microarray gene expression profiles were obtained for microdissected embryonic mouse lens at three key developmental time points in the transition from the embryonic day (E)10.5 stage of lens placode invagination to E12.5 lens primary fiber cell differentiation. Differentially regulated genes were identified by in silico comparison of lens gene expression profiles with those of whole embryo body (WB) lacking ocular tissue.
Gene set analysis demonstrated that this strategy effectively removes highly expressed but nonspecific housekeeping genes from lens tissue expression profiles, allowing identification of less highly expressed lens disease–associated genes. Among 24 previously mapped human genomic intervals containing genes associated with isolated congenital cataract, the mutant gene is ranked within the top two iSyTE-selected candidates in approximately 88% of cases. Finally, in situ hybridization confirmed lens expression of several novel iSyTE-identified genes.
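The filtering idea described above can be sketched as follows: candidates in a mapped interval are ranked by lens expression relative to whole embryo body (WB), so that a highly expressed but nonspecific gene falls below a lens-enriched one. This is a simplified illustration under stated assumptions, not the iSyTE implementation; the function name, the pseudocount, the gene labels, and the expression values are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    lens: Dict[str, float],
    whole_body: Dict[str, float],
    interval_genes: List[str],
    pseudocount: float = 1.0,  # assumed stabilizer for low expression values
) -> List[Tuple[str, float]]:
    """Rank genes in a mapped interval by lens/WB expression ratio."""
    scored = []
    for gene in interval_genes:
        if gene not in lens or gene not in whole_body:
            continue  # skip genes without expression data in both profiles
        ratio = (lens[gene] + pseudocount) / (whole_body[gene] + pseudocount)
        scored.append((gene, ratio))
    # Highest lens enrichment first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical expression values (arbitrary units).
lens_expr = {"GENE_A": 900.0, "GENE_B": 5000.0, "GENE_C": 120.0}
wb_expr = {"GENE_A": 30.0, "GENE_B": 4800.0, "GENE_C": 100.0}

ranking = rank_candidates(lens_expr, wb_expr, ["GENE_A", "GENE_B", "GENE_C"])
for gene, ratio in ranking:
    print(gene, round(ratio, 2))
```

In this toy example GENE_B has the highest absolute lens expression but is equally abundant in WB, so it ranks last, which is the behavior expected of a housekeeping gene under this kind of filter.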
iSyTE is a publicly available Web resource that can be used to prioritize candidate genes within mapped genomic intervals associated with congenital cataract for further investigation. Extension of this approach to other ocular tissue components will facilitate eye disease gene discovery.
Chronic obstructive pulmonary disease (COPD) is a major public health problem. The aim of this study was to identify genes involved in emphysema severity in COPD patients.
Gene expression profiling was performed on total RNA extracted from non-tumor lung tissue from 30 smokers with emphysema. Class comparison analysis based on gas transfer measurement was performed to identify differentially expressed genes. Genes were then selected for technical validation by quantitative reverse transcriptase-PCR (qRT-PCR) if also represented on microarray platforms used in previously published emphysema studies. Genes technically validated advanced to tests of biological replication by qRT-PCR using an independent test set of 62 lung samples.
Class comparison identified 98 differentially expressed genes (p < 0.01). Fifty-one of those genes had been previously evaluated in differentiation between normal and severe emphysema lung. qRT-PCR confirmed the direction of change in expression for 29 of the 51 genes, and 11 of those were validated, remaining significant at p < 0.05. Biological replication in an independent cohort confirmed the altered expression of eight genes, with seven genes differentially expressed by greater than 1.3 fold, identifying these as candidate determinants of emphysema severity.
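The staged filtering described above (agreement in direction of change, significance at p < 0.05, and a fold change greater than 1.3) can be sketched as a sequence of simple filters. This is an illustrative sketch only: the record layout, function names, and all numerical values are hypothetical, and for brevity the fold-change filter is applied to the same qRT-PCR measurement rather than to a separate replication cohort.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GeneResult:
    symbol: str
    array_log2_fc: float  # log2 fold change on the microarray
    pcr_log2_fc: float    # log2 fold change by qRT-PCR
    pcr_p: float          # p-value of the qRT-PCR comparison


def technically_validated(results: List[GeneResult], alpha: float = 0.05) -> List[GeneResult]:
    """Keep genes whose qRT-PCR change matches the array direction and is significant."""
    return [
        r for r in results
        if r.array_log2_fc * r.pcr_log2_fc > 0 and r.pcr_p < alpha
    ]


def exceeds_fold(results: List[GeneResult], min_fold: float = 1.3) -> List[GeneResult]:
    """Keep genes changed by more than min_fold in either direction."""
    threshold = math.log2(min_fold)
    return [r for r in results if abs(r.pcr_log2_fc) > threshold]


# Hypothetical results for four placeholder genes.
results = [
    GeneResult("G1", 1.2, 0.9, 0.01),    # same direction, significant, large change
    GeneResult("G2", -0.8, 0.4, 0.02),   # direction disagrees with the array
    GeneResult("G3", -1.0, -0.2, 0.03),  # validated but below the 1.3-fold cutoff
    GeneResult("G4", 0.7, 0.6, 0.20),    # not significant
]

validated = technically_validated(results)
candidates = exceeds_fold(validated)
print([r.symbol for r in validated], [r.symbol for r in candidates])
```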
Gene expression profiling of lung from emphysema patients identified seven candidate genes associated with emphysema severity: COL6A3, SERPINF1, ZNHIT6, NEDD4, CDKN2A, NRN1 and GSTM3.
Disease activity measurement is a key component of rheumatoid arthritis (RA) management. Biomarkers that capture the complex and heterogeneous biology of RA have the potential to complement clinical disease activity assessment.
To develop a multi-biomarker disease activity (MBDA) test for rheumatoid arthritis.
Candidate serum protein biomarkers were selected from extensive literature screens, bioinformatics databases, mRNA expression and protein microarray data. Quantitative assays were identified and optimized for measuring candidate biomarkers in RA patient sera. Biomarkers with qualifying assays were prioritized in a series of studies based on their correlations to RA clinical disease activity (e.g. the Disease Activity Score 28-C-Reactive Protein [DAS28-CRP], a validated metric commonly used in clinical trials) and their contributions to multivariate models. Prioritized biomarkers were used to train an algorithm to measure disease activity, assessed by correlation to DAS and area under the receiver operating characteristic curve for classification of low vs. moderate/high disease activity. The effect of comorbidities on the MBDA score was evaluated using linear models with adjustment for multiple hypothesis testing.
130 candidate biomarkers were tested in feasibility studies and 25 were selected for algorithm training. Multi-biomarker statistical models outperformed individual biomarkers at estimating disease activity. Biomarker-based scores were significantly correlated with DAS28-CRP and could discriminate patients with low vs. moderate/high clinical disease activity. Such scores were also able to track changes in DAS28-CRP and were significantly associated with both joint inflammation measured by ultrasound and damage progression measured by radiography. The final MBDA algorithm uses 12 biomarkers to generate an MBDA score between 1 and 100. No significant effects on the MBDA score were found for common comorbidities.
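The discrimination of low vs. moderate/high disease activity is summarized by the area under the receiver operating characteristic curve (AUROC). As a minimal sketch, the AUROC can be computed as the probability that a randomly chosen moderate/high-activity patient has a higher score than a randomly chosen low-activity patient, with ties counted as one half. The scores below are hypothetical and are not study data; the function name is an illustrative choice.

```python
from typing import Sequence


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which makes this equal to the normalized
    Mann-Whitney U statistic.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical biomarker-based scores on a 1-100 scale.
moderate_high = [55, 62, 48, 71]  # patients with moderate/high clinical activity
low = [22, 35, 50, 28]            # patients with low clinical activity

auc = auroc(moderate_high, low)
print(auc)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.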
We followed a stepwise approach to develop a quantitative serum-based measure of RA disease activity, based on 12 biomarkers, which was consistently associated with clinical disease activity levels.