|Home | About | Journals | Submit | Contact Us | Français|
Bronchodilator response tests measure the effect of β2-agonists, the most commonly used short-acting reliever drugs for asthma. We sought to relate candidate gene SNP data with bronchodilator response and measure the predictive accuracy of a model constructed with genetic variants.
Bayesian networks, multivariate models that are able to account for simultaneous associations and interactions among variables, were used to create a predictive model of bronchodilator response using candidate gene SNP data from 308 Childhood Asthma Management Program Caucasian subjects.
The model found that 15 SNPs in 15 genes predict bronchodilator response with fair accuracy, as established by a fivefold cross-validation area under the receiver-operating characteristic curve of 0.75 (standard error: 0.03).
Bayesian networks are an attractive approach to analyze large-scale pharmacogenetic SNP data because of their ability to automatically learn complex models that can be used for the prediction and discovery of novel biological hypotheses.
Asthma is a chronic respiratory disease that affects over 20 million Americans and 300 million people worldwide [1,2]. There are two main types of asthma therapies that differ in their onset time: reliever drugs, which are fast-acting and traditionally target acute bronchoconstriction, and controller drugs, which act long-term to reduce the severity of airway inflammation and obstruction . The most common short-acting reliever drugs are inhaled β2-agonists (e.g., albuterol) that stimulate β2-adrenergic receptors (β2-AR) on airway smooth muscle cells to reduce bronchoconstriction . Despite their routine clinical use, β2-agonist drugs have variable efficacy among patients . The effect of β2-agonists is commonly measured by the acute bronchodilator response (BDR) test, which is a lung function test used to evaluate reversible airway obstruction and aid in the diagnosis of asthma. The physiological response to a bronchodilator is a complex trait that is heritable and partly under genetic control , involving intricate interactions among airway epithelial and smooth muscle cells as well as sympathetic nerves. Although a comprehensive pathophysiologic understanding of BDR has not been realized, some evidence for the genetic basis of variable BDR has been established by genetic association studies [7–15].
The most studied gene related to BDR is that encoding the β2-AR (ADRB2). This transmembrane G protein-coupled receptor mediates airway smooth muscle relaxation when β2-agonist drugs bind to it. Several studies have looked at SNPs and haplotypes in this gene and related them to decreased pulmonary function , response to β2-agonist treatment , an increased frequency of asthma exacerbations  and BDR [10,11]. A meta-analysis of 21 studies of ADRB2 polymorphisms found that most of the positive associations identified in individual studies cannot be compared with findings in other studies due to different study designs, different phenotypes considered and potentially, selective reporting, making the number of variants with true replications very small . Other candidate genes have been reported to be associated with BDR, including adenylyl cyclase type 9 (AC9) , corticotrophin-releasing hormone receptor 2 (CRHR2)  and arginase 1 (ARG1) . Overall, these candidate gene studies suggest that genetic polymorphisms play a role in the success of β2-agonist therapy and that a set of variants that accurately predicts successful treatment may be identified. Unraveling the genetic basis of BDR would be helpful to identify which patients are responsive to β2-agonists and which biological mechanisms are responsible for variability in patient response to such drugs.
In this study, a genetic predictive model of BDR was created with Bayesian networks, multivariate models that are able to account for simultaneous associations and interactions among variables to make predictions of BDR [16–18]. In addition to Bayesian networks, other machine learning techniques have been successfully developed to create multivariate predictive models, including artificial neural networks , support vector machines  and random forests . Among these techniques, Bayesian networks have the advantage of representing relationships among variables in intuitive graphical models and of calculating probabilities for individual outcomes that help to determine the reliability of predictions. Bayesian networks have been used to study a wide variety of clinical phenotypes, including the prediction of risk of death among cardiac surgery  and sickle cell anemia  patients, the diagnosis of acute appendicitis , the assessment of ballistic penetrating trauma  and the identification of asthma patients at risk for chronic obstructive pulmonary disease . In addition, genetic data has been used to create Bayesian networks for the assessment of stroke risk in sickle cell anemia patients , and the probability of cardioembolic stroke . Here, Bayesian networks are being applied to pharmacogenetics to try to identify a set of genetic variants that may be used to predict drug response.
The Childhood Asthma Management Program (CAMP) is a clinical trial that followed 1041 asthmatic children for 4 years and nearly 80% of the original participants for an additional 12 years . Stringent inclusion criteria ensured that participants had mild-to-moderate asthma, which was assessed as having asthma symptoms at least twice per week, using asthma medication daily, or using an inhaled bronchodilator twice per week for 6 or more months of the year prior to recruitment. CAMP subjects had increased airway responsiveness, as established by a 20% or greater decrease in forced expiratory volume in 1 second (FEV1) after administration of up to 12.5 mg/dl of methacholine. At thrice yearly follow-up visits, spirometry was performed according to American Thoracic Society recommendations with a volume-displacement spirometer . Spirometry testing was performed by pulmonary function technicians trained and certified specifically for the CAMP protocol and procedures, and took place at least 4 h after the use of a short-acting bronchodilator and 24 h after the use of a long-acting bronchodilator. Post-bronchodilator (two puffs albuterol by metered-dose inhaler) measurements were taken at each spirometry session. After administration of the bronchodilator, the minimal elapsed time before the post-bronchodilator test was 15 min. BDR was defined as the percentage change in FEV1 after bronchodilator administration. Additional data collected for CAMP subjects has been described previously . CAMP participants and their parents provided DNA for genetic studies. Informed consent was obtained from all CAMP participants and their parents. The studies were approved by the Institutional Review Board of Partners’ Healthcare and by all eight CAMP clinical centers and the CAMP Data Coordinating Center.
A cohort of 308 Caucasian CAMP probands who had data on multiple genetic polymorphisms gathered for various candidate genes were selected to create a predictive model of BDR. These subjects were not part of the inhaled corticosteroid treatment group of CAMP, as inhaled corticosteroid treatment can influence BDR , and had at least nine out of 11 BDR measures available from follow-up visits. Subjects were classified as cases (n = 113) if they had a mean BDR across all visits of 12% or greater, and controls (n = 195) if they had a mean BDR across all visits of less than 8%. These thresholds of BDR are established clinical thresholds for classifying patients as ‘responders’ (12% or greater BDR) and being within natural measurement variability (8% or smaller BDR) [31,32]. Because it is less clear how to classify subjects with BDR between 8 and 12%, we focused on subjects with extreme values to maximize any potential underlying genetic difference.
Candidate genes were selected for genotyping based on previously published associations with asthma or related phenotypes, including involvement in innate immunity and steroid and β-agonist pharmacogenetic pathways. SNP selection was performed such that a small set of SNPs distinguished the common haplotypes of the genes of interest . Haplotypes were inferred using Bayesian methods as implemented in PHASE . SNPs that distinguished the most common haplotypes were identified using the best enumeration of SNP tags (BEST) algorithm . Only haplotypes that were found in the Caucasian population at a frequency of 5% or greater were considered. Rare SNPs (minor allele frequency <5%) were considered for genotyping only if the SNP led to a nonconservative amino acid change, implying greater potential for functional significance.
As described previously, SNPs were genotyped either with an Illumina BeadStation 500G (Illumima Inc., CA, USA) using the GoldenGate assay  or a Sequenom MassARRAY® matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer (Sequenom, CA, USA) .
Prior to quality control filters, there were 329 subjects and 3305 SNPs. Subjects missing more than 5% of SNP data (n = 19), and SNPs missing in more than 5% of subjects (n = 590) were dropped from further analysis. Hardy–Weinberg equilibrium was checked in all SNPs among control subjects using the exact procedure used by Wigginton et al. , with a p = 0.01 significance threshold, and those that were not in equilibrium (n = 27) were dropped from further analysis. SNPs with minor allele frequency less than 5% (n = 244) and intergenic SNPs (n = 402) were dropped from further analysis. PLINK was used to obtain a subset of the 2042 SNPs that were independent, as established by an r2 threshold of 0.10 . Missing alleles were imputed by randomly selecting a genotype based on the observed genotype frequency distribution among controls. The resulting data corresponded to 426 SNPs from 254 candidate genes. The number of SNPs in each of these genes, assigned according to the September 2006 NCBI data (Build 36.2), are given in Supplementary Table 1.
Clinical variable differences among cases and controls were measured in R , with Fisher’s Exact test for categorical variables and with Welch Two Sample t-tests for continuous variables. Logistic regression and forward stepwise logistic regression procedures in R were used to create SNP multivariate models of BDR . Genetic case–control analyses measuring the association of individual SNPs to BDR were performed in PLINK using genotypic tests . Power calculations were performed using the Genetic Power Calculator by Purcell et al. with high risk allele frequency of 0.17, prevalence of 0.30, D′ of 1, marker allele frequency of 0.30 and with default error rates (α= 0.05, power = 0.80) .
A Bayesian network is a directed acyclic graph where nodes represent variables and edges between nodes represent probabilistic relationships between variables (i.e., a node that has an incoming arrow is dependent on the node from which the arrow originates). The topology of a Bayesian network and the associated probabilistic relationships among variables can be learned directly from data, making Bayesian networks suitable for extracting complex and unbiased relationships among variables. In this work, a Bayesian network was constructed using a ‘phenocentric’ search algorithm developed by some of the authors (Blanca E Himes, Marco F Ramoni) in which SNPs that directly modulate BDR were found . This set consisted of nodes with edges originating from BDR (i.e., children of BDR) and those with edges directed to children of BDR (i.e., parents of children of BDR). The search algorithm proceeded by greedily selecting SNPs to become children of BDR based on Bayes factors comparing the likelihood that BDR was dependent versus independent on each SNP, and, for each BDR child, greedily selected its parental SNPs based on Bayes factors comparing the likelihood that the new SNP should be connected versus not connected to the BDR child. The final network returned is that with the highest likelihood of predicting BDR, given a set of SNPs, among the networks explored. More details regarding Bayesian networks can be found in [16–18].
The network’s goodness of fit to the data used in its creation was assessed using fitted values corresponding to BDR prediction for each subject used in the Bayesian network construction. The predictive accuracy of the network was determined via a fivefold cross-validation in which each of five non-overlapping data subsets, obtained by randomly splitting the original dataset, is used as an independent dataset while the remaining four are used to learn the network dependencies. The probability of BDR given the genotype of an individual subject was calculated using the Clique algorithm implemented in Bayesware Discoverer . The predictive performance of the network was evaluated with receiver operating characteristic (ROC) curves calculated with both fitted values and cross-validation results. Predictive accuracy was measured as the area under the ROC curve (AUROC), and significance for this accuracy was obtained by comparing the classification ability of models obtained to random classification. The standard error (SE) for AUROCs and for the difference between AUROCs of two curves were estimated using the nonparametric asymptotic method proposed by DeLong et al.  and described in Lasko et al. . For ROC plots, convex hulls were estimated. All ROC analyses were performed in R . To compare the performance of the Bayesian Network to a single-SNP approach, the predictive performance of single SNPs was compared with the performance of all SNPs in the network using corresponding fitted value and fivefold cross-validation procedures.
Clinical characteristics of the subjects are provided in Table 1. In a Bayesian network that was learned directly from the genetic data, it was found that 15 of 426 (3.5%) SNPs from 15 of 254 (5.9%) genes were predictive of BDR (Figure 1). The model’s goodness of fit, assessed using fitted values, had an AUROC of 0.81 (SE: 0.03) and was significantly better than a random classifier (p < 10−16). Fivefold cross-validation was performed to test model robustness, and the corresponding AUROC of 0.75 (SE: 0.03), which was also significantly better than a random classifier (p = 2 × 10−13), demonstrated that the network had fair predictive accuracy (Figure 2). The genotypic p-values measuring the association of individual SNPs with BDR are consistent with Bayes factors measuring the likelihood that each SNP in the network is related directly to BDR (Table 2). Corresponding allelic odds ratios demonstrate that some of the SNPs increase the risk of having higher BDR, while others confer a protective effect.
Comparison of the performance of single SNPs to all SNPs in the Bayesian network found that the predictive accuracy, according to fitted value AUROCs having an area significantly greater than 0.5 (p < 0.05), of 13 SNPs individually is slightly better than a random classifier (Table 3 & Figure 3). However, according to fivefold cross-validation AUROCs, only three SNPs (rs880028, rs1110904 and rs163688) performed better than a random classifier. By either fitted value or cross-validation results comparison, the model performed considerably worse individually at predicting BDR than did using all SNP information (p < 10−14).
A forward stepwise logistic regression performed with the same SNPs that were used to learn the Bayesian network found that 47 SNPs were predictive of BDR, though none had a significant p-value for its individual effect in the model (Supplementary Table 2). The logistic regression model had a fitted value AUROC of 1.0 (SE: 0.0; p < 1 × 10−16) and a fivefold cross-validation AUROC of 0.55 (SE: 0.04; p = 0.04). A logistic regression model created using the SNPs that were in the BDR Bayesian network had an AUROC of 0.73 (SE: 0.03; p = 3 × 10−12) for fitted values and 0.66 (SE: 0.03; p = 1 × 10−6) for fivefold cross-validation. Six of the 15 SNPs in this smaller regression model had significant individual effects (Table 4).
An important aspect of the growth of pharmacogenetics over the past 15 years has been the collection of candidate gene data from participants of drug trials to find genetic variants correlated with drug response . Consistent with the approach taken in most genetic studies, the analyses in pharmacogenetics have been mostly limited to consideration of individual gene variants in single genes. When the effect of multiple genes and variants is taken into account, it is usually done so using traditional additive models (e.g., logistic regression). The consensus reached from these small-scale genetic association studies, and even from more recent genome-wide studies, is that SNPs individually contribute very modestly towards phenotypic differences and a very large number of variants are required to explain the genetic underpinnings of each complex trait [44,45]. Traditional approaches that analyze individual variants one at a time do not address the complex interactions that are known to underlie complex traits. If such interactions were taken into account by attempting to directly model epistasis, it is likely that a larger portion of phenotypic variance could be explained with fewer SNPs than the number estimated using additive models [46,47]. Therefore, pharmacogenetic studies will benefit from analytical methods that consider complex genetic interactions because such approaches may more accurately predict drug responses and characterize their complex biological underpinnings.
Bayesian networks are a promising approach for the analysis of complex traits because of their ability to model complex multivariate dependencies and make predictions [16–18]. Unlike traditional regression models, they are not limited to representing the dependencies of a single outcome variable on predictor variables. Instead, Bayesian networks go beyond single or pairwise gene interactions with a phenotype to account for complex multivariate interactions. In addition, they can be used to make predictions of a phenotype of interest for individual subjects that allow for ascertainment of model validity. Relationships found in a Bayesian network suggest true relationships among variables and may even reflect causal relationships. However, proof of any causal relationship among variables requires experimental verification. In this work, we have demonstrated that Bayesian networks can automatically integrate candidate gene data to make predictions of BDR, a pharmacogenetic phenotype. The resulting network provides a model that can also be used for both clinical prediction and to investigate potential biological relationships among the genes that compose it.
An important test of model goodness is its ability to make accurate predictions. Initially, a model is tested by using fitted values to ensure that it is adequate for the data used to construct it. The BDR Bayesian network had an AUROC of 0.81 (SE: 0.03) on the fitted values, demonstrating that it is good at predicting BDR in the subjects who were used to learn the model. Ideally, the next step in testing the generalizability of the model is to use it for prediction in an independent population to ensure that the predictive accuracy in fitted values is generalizable. Because we did not have access to an independent population with similar clinical characteristics as our training population, we used cross-validation, a standard procedure used to estimate the predictive accuracy of a model in the absence of an independent population. The fivefold cross-validation AUROC, 0.75 (SE: 0.03), demonstrated that some overfitting was present in the fitted values because the AUROC dropped significantly by 0.05 (p < 10−16). However, as shown in Table 3 & Figure 3, when the information of multiple SNPs is used to predict BDR, the accuracy of prediction is significantly higher than when using the information of single SNPs. The fairly good predictive accuracy of the network and the better predictive performance when using all, rather than single, SNP information, suggest that some of the relationships among SNPs and BDR are not due to overfitting and potentially, reflect true biological relationships.
Some of the genes corresponding to SNPs in the BDR Bayesian network belong to biological pathways expected to be involved in BDR: DDC and PRKCA are part of the β-agonist pathway, and IL12B and IL8RA are involved in immune responses. Some genes that are intuitively expected to be in the network are absent. Most notably, polymorphisms of ADRB2, the most studied gene related to BDR, are not part of the network despite this gene having three SNPs in our candidate gene set. Because ADRB2 variants have been associated with BDR  and related phenotypes (decreased pulmonary function, response to β2-agonist treatment [7,8], increased frequency of asthma exacerbations ), it might be expected that ADRB2 SNPs would be part of the network. Even in CAMP families, some ADRB2 SNPs were previously found to be associated with spirometric measures of lung function . In the current study, traditional genetic association results for the three ADRB2 SNPs that were part of the set of SNPs used to learn the BDR Bayesian network failed to show any association to BDR (Supplementary Table 3). These results are in agreement with some of the inconsistent ADRB2 association results observed when considering different phenotypes and populations as mentioned in the Introduction.
Traditional association statistics of the SNPs in the BDR Bayesian network reveals that the SNPs that are children of BDR have significant genotypic p-values, ranging from 0.0011 to 0.025 (Table 2). By contrast, some of the SNPs have significant allelic p-values, while others do not. The Bayes Factor column in this table contains the Bayes factors that tested whether each SNP should be a child of BDR. Those SNPs for which the Bayes factor was greater than 1 were selected as children of BDR. A significant p-value is, therefore, analogous to a Bayes factor greater than 1. The genotypic model p-value results are more consistent with those of the Bayes factors than those of the allelic p-values because both genotypic p-values and the Bayes factors are based on a SNPs genotype distributions. The two SNPs (rs7310083 and CL11302) from parents of BDR children were selected to be part of the Bayesian network based on Bayes factors indicating that it was more likely that both they and BDR should be parents of SNPs rs6711781 and rs1493193, rather than not. This type of relationship cannot be captured using individual allelic or genotypic p-values, but is readily found by the Bayesian network search algorithm.
The results from a stepwise logistic regression procedure performed on the 426 SNPs that were available to create the Bayesian network demonstrate that the Bayesian network learning algorithm was much better at selecting predictors of BDR and not as prone to overfitting. Although the fitted value AUROC of the stepwise logistic regression, 1.0 (SE: 0), suggested that the logistic regression model was perfect at predicting BDR, a fivefold cross-validation procedure had an AUROC of 0.55 (SE: 0.04), showing that the regression procedure was very poor at creating a generalizable model of BDR. If instead of using a stepwise procedure to select the predictors from the list of all SNPs, a logistic regression model is built with the same SNPs that were in the BDR Bayesian network, the results improve. The fitted values AUROC drops to 0.73 (SE: 0.03), but there is less overfitting as indicated by a cross-validation AUROC of 0.66 (SE: 0.03). However, the BDR Bayesian network, which had cross-validation AUROC of 0.75 (SE: 0.03), is much better at predicting BDR than the logistic regression model. Therefore, the logistic regression model is adequate to predict BDR when the appropriate SNPs are used in the model, but it is not an appropriate procedure to determine the best predictive SNPs from a long list of candidates.
As the cross-validation AUROC (0.75, SE: 0.03) of our model demonstrates, there is room for improvement in predictive accuracy. Some notable limitations of this work that may have reduced the predictive accuracy are: BDR phenotype definition, selection of candidate genes, and low power to detect genetic effects. The definition we chose for BDR was percentage change in FEV1 after bronchodilator administration, which is the phenotype most commonly used in a clinical setting. Because BDR is a continuous trait, with no clear boundary separating responders from nonresponders, arbitrary thresholds must be chosen to distinguish these groups. We used the average of multiple measures of BDR and selected subjects with values at two extremes, according to clinical standards, to define cases and controls. In so doing, we hoped to maximize any genetic difference between the two phenotype groups with a stable BDR. In addition to BDR being a continuous trait that is influenced by many environmental factors, it is a trait that is difficult to isolate from other factors that indicate poor lung function, such as baseline FEV1. As shown in Table 1, baseline measures of prebronchodilator FEV1, log-transformed provocative concentration of methacholine causing a 20% drop in FEV1 (PC20), log-transformed IgE level, eosinophil level and BMI are different in cases and controls. Most of these factors have trends that reflect more severe disease in cases, and therefore, it is likely that the BDR phenotype that we defined does not exclusively reflect bronchodilator response. These trends are also consistent with previous work in CAMP subjects that has demonstrated that bronchodilator response, bronchoconstriction, and baseline lung function effects are difficult to isolate [30,48]. Attempts to adjust for these known confounders of BDR may diminish the phenotype’s magnitude erroneously because all may be partly due to some overlapping genetic risk factors. Therefore, we chose to use BDR measures only to define our BDR phenotype. In future studies, it may be informative to look at genetic models that predict other lung function phenotypes to measure the extent to which the SNPs and genes composing them overlap.
The second limitation influencing the model’s predictive accuracy is that the genetic data used to learn the Bayesian network was a set of SNPs gathered from candidate gene studies. The selection of such SNPs is limited in number and biased as it was based on previous literature and not on the whole genome. It is therefore highly likely that important genes contributing to BDR were not included in the search. Inclusion of such genes may have resulted in higher predictive accuracy and lead to the discovery of more novel biological pathways. In future studies, it will be interesting to find out whether an unbiased genome-wide search leads to finding the same SNPs in the BDR Bayesian network presented here. Lastly, the number of subjects used to create the network (113 cases, 195 controls) is not of adequate size to measure small genetic effects, as exemplified by power calculations using the mean allelic frequencies and effect sizes observed for the children SNPs in the BDR Bayesian network yielding estimates of 0.32 for a genotypic model and 0.40 for an allelic model. Although these power estimates do not directly apply to the Bayesian network, having adequate sample sizes and large genetic effects increases the ability of any search algorithm to find true relationships. Low power to detect genetic effects is a limitation of many previous pharmacogenetics studies because the number of subjects in clinical trials is smaller than necessary for highly powered association studies. This limitation may be reduced with newly available subject populations that are being recruited through electronic medical records . Due to the limitations of our study, we would not recommend that the current BDR Bayesian network be used clinically at this time. Eventually, if a successful predictive model of BDR is found, it could be used clinically as a genetic test to help determine which patients will have a poor response to bronchodilator medications. In addition, a successful predictive model of BDR would likely contain polymorphisms that suggest novel hypotheses regarding biological variants involved in asthma pathophysiology.
A genetic predictive model of BDR was made using Bayesian networks. This model found that 15 of 426 (3.5%) SNPs from 15 of 254 (5.9%) genes were predictive of BDR. The predictive accuracy of the model is fair, as established with a fivefold cross-validation having an AUROC of 0.75 (SE: 0.03). The multivariate model was compared with a single-gene approach, and the superior predictive accuracy of the multivariate approach was demonstrated. Bayesian networks are an attractive approach to analyze pharmacogenetic data because of their ability to automatically learn complex models that can be used for prediction and the exploration of biological hypotheses.
We thank all CAMP subjects for their ongoing participation in this study.
Financial & competing interests disclosure
We acknowledge the CAMP investigators and research team, supported by the National Heart, Lung and Blood Institute (NHLBI), for collection of CAMP Genetic Ancillary Study data. All work on data collected from the CAMP Genetic Ancillary Study was conducted at the Channing Laboratory of the Brigham and Women’s Hospital under appropriate CAMP policies and human subject’s protections. The CAMP Genetics Ancillary Study is supported by U01 HL075419, U01 HL65899, and P01 HL083069 from the NHLBI, NIH. Support for BEH was provided by 2T15LM007092–16, National Library of Medicine, NIH. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
For reprint orders, please contact: moc.enicidemerutuf@stnirper
Ethical conduct of research
The authors state that they have obtained appropriate institutional review board approval or have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations. In addition, for investigations involving human subjects, informed consent has been obtained from the participants involved.
Papers of special note have been highlighted as:
of considerable interest