1.  Effects of BMI, Fat Mass, and Lean Mass on Asthma in Childhood: A Mendelian Randomization Study 
PLoS Medicine  2014;11(7):e1001669.
In this study, Granell and colleagues used Mendelian randomization to investigate causal effects of BMI, fat mass, and lean mass on current asthma at age 7½ years in the Avon Longitudinal Study of Parents and Children (ALSPAC) and found that higher BMI increases the risk of asthma in mid-childhood.
Please see later in the article for the Editors' Summary
Observational studies have reported associations between body mass index (BMI) and asthma, but confounding and reverse causality remain plausible explanations. We aim to investigate evidence for a causal effect of BMI on asthma using a Mendelian randomization approach.
Methods and Findings
We used Mendelian randomization to investigate causal effects of BMI, fat mass, and lean mass on current asthma at age 7½ y in the Avon Longitudinal Study of Parents and Children (ALSPAC). A weighted allele score based on 32 independent BMI-related single nucleotide polymorphisms (SNPs) was derived from external data, and its associations with BMI, fat mass, lean mass, and asthma were estimated. We derived instrumental variable (IV) estimates of causal risk ratios (RRs). In total, 4,835 children had data available on BMI-associated SNPs, asthma, and BMI. The weighted allele score was strongly associated with BMI, fat mass, and lean mass (all p-values<0.001) and with childhood asthma (RR 2.56, 95% CI 1.38–4.76 per unit score, p = 0.003). The estimated causal RR for the effect of BMI on asthma was 1.55 (95% CI 1.16–2.07) per kg/m2, p = 0.003. This effect appeared stronger for non-atopic (1.90, 95% CI 1.19–3.03) than for atopic asthma (1.37, 95% CI 0.89–2.11), though there was little evidence of heterogeneity (p = 0.31). The estimated causal RRs for the effects of fat mass and lean mass on asthma were 1.41 (95% CI 1.11–1.79) per 0.5 kg and 2.25 (95% CI 1.23–4.11) per kg, respectively. The possibility of genetic pleiotropy could not be discounted completely; however, additional IV analyses using the FTO variant rs1558902 and the other BMI-related SNPs separately provided similar causal effects with wider confidence intervals. Loss to follow-up was unlikely to bias the estimated effects.
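The two steps described above, building a weighted allele score and turning score associations into a causal RR, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the score is a plain weighted sum of effect-allele counts and that the IV estimate is a Wald ratio on the log scale (log RR of asthma per unit score divided by the change in BMI per unit score). The three SNPs, their weights, and the regression inputs are hypothetical.

```python
import math

def weighted_allele_score(allele_counts, weights):
    """Weighted allele score: sum of effect-allele counts (0, 1 or 2) times per-allele weights.

    The weights are assumed to be external per-allele effects on BMI (e.g. from a
    published GWAS), so the score is expressed in the units of those weights."""
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per SNP")
    return sum(count * w for count, w in zip(allele_counts, weights))

def wald_ratio_rr(log_rr_score_outcome, beta_score_exposure):
    """Causal RR per unit of exposure via the Wald ratio on the log scale:
    exp(log RR(score -> outcome) / beta(score -> exposure))."""
    if beta_score_exposure == 0:
        raise ValueError("the score must be associated with the exposure")
    return math.exp(log_rr_score_outcome / beta_score_exposure)

# Hypothetical child with three SNPs (the study used 32) and illustrative weights.
score = weighted_allele_score([2, 1, 0], [0.08, 0.05, 0.03])

# Hypothetical regression results: log RR of asthma per unit score (0.1) and
# change in BMI in kg/m2 per unit score (0.5).
rr_per_kg_m2 = wald_ratio_rr(log_rr_score_outcome=0.1, beta_score_exposure=0.5)
```

Working on the log scale keeps the ratio estimator consistent with a multiplicative (log-linear) model for risk.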
Higher BMI increases the risk of asthma in mid-childhood. Higher BMI may have contributed to the increase in asthma risk toward the end of the 20th century.
Editors' Summary
The global burden of asthma, a chronic (long-term) condition caused by inflammation of the airways (the tubes that carry air in and out of the lungs), has been rising steadily over the past few decades. It is estimated that, nowadays, 200–300 million adults and children worldwide are affected by asthma. Although asthma can develop at any age, it is often diagnosed in childhood—asthma is the most common chronic disease in children. In people with asthma, the airways can react very strongly to allergens such as animal fur or to irritants such as cigarette smoke, becoming narrower so that less air can enter the lungs. Exercise, cold air, and infections can also trigger asthma attacks, which can be fatal. The symptoms of asthma include wheezing, coughing, chest tightness, and shortness of breath. Asthma cannot be cured, but drugs can relieve its symptoms and prevent acute asthma attacks.
Why Was This Study Done?
We cannot halt the ongoing rise in global asthma rates without understanding the causes of asthma. Some experts think obesity may be one cause of asthma. Obesity, like asthma, is increasingly common, and observational studies (investigations that ask whether individuals exposed to a suspected risk factor for a condition develop that condition more often than unexposed individuals) in children have reported that body mass index (BMI, an indicator of body fat calculated by dividing a person's weight in kilograms by their height in meters squared) is positively associated with asthma. Observational studies cannot prove that obesity causes asthma, partly because of “confounding.” Overweight children with asthma may share another unknown characteristic (confounder) that actually causes both obesity and asthma. Moreover, children with asthma may be less active than unaffected children and so become overweight (reverse causality). Here, the researchers use “Mendelian randomization” to assess whether BMI has a causal effect on asthma. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the effect of a modifiable risk factor and the outcome of interest. Because gene variants are allocated randomly when they are passed from parents to offspring, they are largely unrelated to the lifestyle and environmental factors that cause confounding, and because they are fixed at conception, they cannot be changed by asthma (so reverse causation is avoided). So, if a higher BMI leads to asthma, genetic variants associated with increased BMI should be associated with an increased risk of asthma.
What Did the Researchers Do and Find?
The researchers investigated causal effects of BMI, fat mass, and lean mass on current asthma at age 7½ years in 4,835 children enrolled in the Avon Longitudinal Study of Parents and Children (ALSPAC, a long-term health project that started in 1991). They calculated an allele score for each child based on 32 BMI-related genetic variants, and estimated associations between this score and BMI, fat mass, and lean mass (both measured using a special type of X-ray scanner; in children BMI is not a good indicator of “fatness”), and asthma. They report that the allele score was strongly associated with BMI, fat mass, and lean mass, and with childhood asthma. The estimated causal relative risk (risk ratio) for the effect of BMI on asthma was 1.55 per kg/m2. That is, the risk of asthma increased by 55% for every extra unit (1 kg/m2) of BMI. The estimated causal relative risks for the effects of fat mass and lean mass on asthma were 1.41 per 0.5 kg and 2.25 per kg, respectively.
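Because these relative risks are quoted per unit (1 kg/m2 of BMI, 0.5 kg of fat mass), a short calculation shows how they scale to other amounts. This sketch assumes a log-linear (multiplicative) dose-response, so the risk ratio for k units is the per-unit risk ratio raised to the power k; whether that assumption holds over larger changes in BMI is not established by the summary.

```python
def rr_for_change(rr_per_unit, units):
    """Risk ratio for a change of `units`, assuming a log-linear (multiplicative) dose-response."""
    return rr_per_unit ** units

bmi_2_units = rr_for_change(1.55, 2)  # two extra kg/m2 of BMI: about 2.40
fat_per_kg = rr_for_change(1.41, 2)   # 1 kg of fat mass is two 0.5 kg steps: about 1.99
```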
What Do These Findings Mean?
These findings suggest that a higher BMI increases the risk of asthma in mid-childhood and that global increases in BMI toward the end of the 20th century may have contributed to the global increase in asthma that occurred at the same time. It is possible that the observed association between BMI and asthma reported in this study is underpinned by “genetic pleiotropy” (a potential limitation of all Mendelian randomization analyses). That is, some of the genetic variants included in the BMI allele score could conceivably also increase the risk of asthma. Nevertheless, these findings suggest that public health interventions designed to reduce obesity may also help to limit the global rise in asthma.
Additional Information
Please access these websites via the online version of this summary at
The US Centers for Disease Control and Prevention provides information on asthma and on all aspects of overweight and obesity (in English and Spanish)
The World Health Organization provides information on asthma and on obesity (in several languages)
The UK National Health Service Choices website provides information about asthma, about asthma in children, and about obesity (including real stories)
The Global Asthma Report 2011 is available
The Global Initiative for Asthma released its updated Global Strategy for Asthma Management and Prevention on World Asthma Day 2014
Information about the Avon Longitudinal Study of Parents and Children is available
MedlinePlus provides links to further information on obesity in children, on asthma, and on asthma in children (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
PMCID: PMC4077660  PMID: 24983943
2.  Association of Adenotonsillectomy with Asthma Outcomes in Children: A Longitudinal Database Analysis 
PLoS Medicine  2014;11(11):e1001753.
Rakesh Bhattacharjee and colleagues use data from a US private health insurance database to compare asthma severity measures in children one year before and one year after they underwent adenotonsillectomy with asthma measures in those who did not undergo adenotonsillectomy.
Please see later in the article for the Editors' Summary
Childhood asthma and obstructive sleep apnea (OSA), both disorders of airway inflammation, were associated in recent observational studies. Although childhood OSA is effectively treated by adenotonsillectomy (AT), it remains unclear whether AT also improves childhood asthma. We hypothesized that AT, the first line of therapy for childhood OSA, would be associated with improved asthma outcomes and would reduce the usage of asthma therapies in children.
Methods and Findings
Using the 2003–2010 MarketScan database, we identified 13,506 children with asthma in the United States who underwent AT. Asthma outcomes during 1 y preceding AT were compared to those during 1 y following AT. In addition, 27,012 age-, sex-, and geographically matched children with asthma without AT were included to examine asthma outcomes among children without known adenotonsillar tissue morbidity. Primary outcomes included the occurrence of a diagnostic code for acute asthma exacerbation (AAE) or acute status asthmaticus (ASA). Secondary outcomes included temporal changes in asthma medication prescriptions, the frequency of asthma-related emergency room visits (ARERs), and asthma-related hospitalizations (ARHs). Comparing the year following AT to the year prior, AT was associated with significant reductions in AAE (30.2%; 95% CI: 25.6%–34.3%; p<0.0001), ASA (37.9%; 95% CI: 29.2%–45.6%; p<0.0001), ARERs (25.6%; 95% CI: 16.9%–33.3%; p<0.0001), and ARHs (35.8%; 95% CI: 19.6%–48.7%; p = 0.02). Moreover, AT was associated with significant reductions in most asthma prescription refills, including bronchodilators (16.7%; 95% CI: 16.1%–17.3%; p<0.001), inhaled corticosteroids (21.5%; 95% CI: 20.7%–22.3%; p<0.001), leukotriene receptor antagonists (13.4%; 95% CI: 12.9%–14.0%; p<0.001), and systemic corticosteroids (23.7%; 95% CI: 20.9%–26.5%; p<0.001). In contrast, there were no significant reductions in these outcomes in children with asthma who did not undergo AT over an overlapping follow-up period. Limitations of the MarketScan database include lack of information on race and obesity status. Also, the MarketScan database does not include information on children with public health insurance (i.e., Medicaid) or uninsured children.
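The before/after comparisons above are reported as percent reductions with 95% CIs. One common way to obtain such figures from event counts is sketched below; it is an assumption, not the authors' stated method. It treats the year-before and year-after counts as independent Poisson counts with equal follow-up and puts a Wald interval on the log rate ratio. Because the same children contribute to both years, a real analysis would need to account for that within-child correlation. The counts are hypothetical.

```python
import math

def percent_reduction(before, after, z=1.96):
    """Percent reduction in events from the year before to the year after, with a
    Wald CI on the log rate ratio (independent Poisson counts, equal follow-up assumed).

    Returns (estimate, lower, upper) in percent."""
    if before <= 0 or after <= 0:
        raise ValueError("counts must be positive")
    rr = after / before
    se = math.sqrt(1 / before + 1 / after)
    rr_lo, rr_hi = rr * math.exp(-z * se), rr * math.exp(z * se)
    # A smaller rate ratio means a larger reduction, so the bounds swap.
    return 100 * (1 - rr), 100 * (1 - rr_hi), 100 * (1 - rr_lo)

est, lo, hi = percent_reduction(before=1000, after=700)  # hypothetical event counts
```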
In a very large sample of privately insured children, AT was associated with significant improvements in several asthma outcomes. Contingent on validation through prospectively designed clinical trials, this study supports the premise that detection and treatment of adenotonsillar tissue morbidity may serve as an important strategy for improving asthma control.
Editors' Summary
The global burden of asthma has been rising steadily over the past few decades. Nowadays, about 200–300 million adults and children worldwide are affected by asthma, a chronic condition caused by inflammation of the airways (the tubes that carry air in and out of the lungs). Although asthma can develop at any age, it is often diagnosed in childhood—asthma is one of the commonest chronic diseases in children. In the US, for example, asthma affects around 7.1 million children under the age of 18 years and is the third leading cause of hospitalization of children under the age of 15 years. In people with asthma, the airways can react very strongly to allergens such as animal fur or to irritants such as cigarette smoke. Exercise, cold air, and infections can trigger asthma attacks, which can be fatal. The symptoms of asthma include wheezing, coughing, chest tightness, and shortness of breath. Asthma cannot be cured, but drugs can relieve its symptoms and prevent acute asthma attacks.
Why Was This Study Done?
Recent studies have found an association between severe childhood asthma and obstructive sleep apnea (OSA). In OSA, airway inflammation promotes hypertrophy (excess growth) of the adenoids and the tonsils, immune system tissues in the upper airway. During sleep, the presence of hypertrophic adenotonsillar tissues predisposes the walls of the throat to collapse, which results in apnea—a brief interruption in breathing. People with OSA often snore loudly and frequently wake from deep sleep as they struggle to breathe. Childhood OSA, which affects 2%–3% of children, can be effectively treated by removal of the adenoids and tonsils (adenotonsillectomy). Given the association between childhood OSA and severe asthma and given the involvement of airway inflammation in both conditions, might adenotonsillectomy also improve childhood asthma? Here, the researchers analyze data from the MarketScan database, a large database of US patients with private health insurance, to investigate whether adenotonsillectomy is associated with improvements in asthma outcomes and with reductions in the use of asthma therapies in children.
What Did the Researchers Do and Find?
The researchers used the database to identify 13,506 children with asthma who had undergone adenotonsillectomy and to obtain information about asthma outcomes among these children for the year before and the year after the operation. Because asthma severity tends to decrease with age, the researchers also used the database to identify 27,012 age-, sex-, and geographically matched children with asthma who did not have the operation so that they could examine asthma outcomes over an equivalent two-year period in the absence of complications related to adenotonsillar hypertrophy. Comparing the year after adenotonsillectomy with the year before the operation, adenotonsillectomy was associated with a 30% reduction in acute asthma exacerbations, a 37.9% reduction in acute status asthmaticus (an asthma attack that is unresponsive to the drugs usually used to treat attacks), a 25.6% reduction in asthma-related emergency room visits, and a 35.8% reduction in asthma-related hospitalizations. By contrast, among the control children, there was only a 2% reduction in acute asthma exacerbations and only a 7% reduction in acute status asthmaticus over an equivalent two-year period. Adenotonsillectomy was also associated with significant reductions (changes unlikely to have occurred by chance) in prescription refills for most types of drugs used to treat asthma, whereas there were no significant reductions in prescription refills among children with asthma who had not undergone adenotonsillectomy. The study was limited by the lack of measures of race and obesity, which are both associated with severity of asthma.
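The control group matters because asthma severity tends to fall with age. A crude way to put the two groups side by side is to divide the after/before ratio in the adenotonsillectomy group by the after/before ratio in the controls. This "ratio of ratios" is an illustrative calculation using the rounded percentages quoted above, not an analysis reported by the researchers, and it ignores the matching and any differences in baseline event rates.

```python
def ratio_of_ratios(pct_reduction_treated, pct_reduction_control):
    """After/before ratio in treated children divided by the after/before ratio in controls.
    Values below 1 mean the treated group improved more than the controls."""
    treated = 1 - pct_reduction_treated / 100
    control = 1 - pct_reduction_control / 100
    return treated / control

exacerbations = ratio_of_ratios(30, 2)        # about 0.71
status_asthmaticus = ratio_of_ratios(37.9, 7)  # about 0.67
```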
What Do These Findings Mean?
These findings show that in a large sample of privately insured children in the US, adenotonsillectomy was associated with significant improvements in several asthma outcomes. These results do not show, however, that adenotonsillectomy caused a reduction in the severity of childhood asthma. It could be that the children who underwent adenotonsillectomy (but not those who did not have the operation) shared another unknown factor that led to improvements in their asthma over time. To prove a causal link, it will be necessary to undertake a randomized controlled trial in which the outcomes of groups of children with asthma who are chosen at random to undergo or not undergo adenotonsillectomy are compared. However, with the proviso that there are some risks associated with adenotonsillectomy, these findings suggest that the detection and treatment of adenotonsillar hypertrophy may help to improve asthma control in children.
Additional Information
Please access these websites via the online version of this summary at
The US Centers for Disease Control and Prevention provides information on asthma, including videos, games, and links to other resources for children with asthma
The American Lung Association provides detailed information about asthma and a fact sheet on asthma in children; it also has information about obstructive sleep apnea
The National Sleep Foundation provides information on snoring and obstructive sleep apnea in children
The UK National Health Service Choices website provides information (including some personal stories) about asthma, about asthma in children, and about obstructive sleep apnea
The “Global Asthma Report 2014” will be available in October 2014
MedlinePlus provides links to further information on asthma, on asthma in children, on sleep apnea, and on tonsils and adenoids (in English and Spanish)
PMCID: PMC4219664  PMID: 25369282
3.  Analyses of shared genetic factors between asthma and obesity in children 
Epidemiological studies consistently show associations between asthma and obesity. Shared genetics may account for this association.
To identify genetic variants associated with both asthma and obesity.
Based on a literature search, we identified genes from: 1) Genome-wide association studies (GWAS) of Body Mass Index (BMI) (n=17 genes), 2) GWAS of asthma (n=14) and 3) candidate gene studies of BMI and asthma (n=7). We used GWAS data from the Childhood Asthma Management Program (CAMP) to analyze associations between single nucleotide polymorphisms (SNPs) in these genes and asthma (n=359 subjects) and BMI (n=537).
One top BMI GWAS SNP from the literature, rs10938397 near GNPDA2, was associated with both BMI (p=4 × 10⁻⁴) and asthma (p=0.03). Of the top asthma GWAS SNPs and the candidate gene SNPs, none was associated with both BMI and asthma. Gene-based analyses that included all available SNPs in each gene found associations (p<0.05) with both phenotypes for several genes: NEGR1, ROBO1, DGKG, FAIM2, FTO, and CHST8 among the BMI GWAS genes; IL1RL1/IL18R1, DPP10, PDE4D, MYB, PDE10A, IL33, and especially PTPRD among the asthma GWAS genes; and PRKCA among the BMI and asthma candidate genes.
SNPs within several genes showed associations to BMI and asthma at a gene level, but none of these associations were significant after correction for multiple testing. Our analysis of known candidate genes reveals some evidence for shared genetics between asthma and obesity, but other shared genetic determinants are likely to be identified in novel loci.
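To see why nominal p-values such as 0.03 do not survive correction for multiple testing, a Bonferroni check is sketched below. The size of the test family is an assumption for illustration: one test per gene across the three literature-derived lists (17 + 14 + 7 = 38). The abstract does not say which correction or family size was used, and correcting over individual SNPs would give a stricter threshold.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests

def survives_bonferroni(p_value, alpha=0.05, n_tests=1):
    return p_value < bonferroni_threshold(alpha, n_tests)

N_GENES = 17 + 14 + 7  # hypothetical family: one test per gene in the three lists

asthma_ok = survives_bonferroni(0.03, 0.05, N_GENES)  # threshold is about 0.0013
bmi_ok = survives_bonferroni(4e-4, 0.05, N_GENES)
```

Under this assumed family, the BMI association of rs10938397 would pass and the asthma association would not, which is consistent with no SNP being significant for both phenotypes after correction.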
PMCID: PMC2941152  PMID: 20816195
Association; Asthma; BMI; Children; Genetics; GWAS; Obesity; Polymorphism; SNP
4.  Genome-Wide Association Study Identifies HLA-DP as a Susceptibility Gene for Pediatric Asthma in Asian Populations 
PLoS Genetics  2011;7(7):e1002170.
Asthma is a complex phenotype influenced by genetic and environmental factors. We conducted a genome-wide association study (GWAS) with 938 Japanese pediatric asthma patients and 2,376 controls. Single-nucleotide polymorphisms (SNPs) showing strong associations (P<1×10⁻⁸) in the GWAS were further genotyped in independent Japanese samples (818 cases and 1,032 controls) and in Korean samples (835 cases and 421 controls). SNP rs987870, located between HLA-DPA1 and HLA-DPB1, was consistently associated with pediatric asthma in 3 independent populations (Pcombined = 2.3×10⁻¹⁰, odds ratio [OR] = 1.40). HLA-DP allele analysis showed that DPA1*0201 and DPB1*0901, which were in strong linkage disequilibrium, were strongly associated with pediatric asthma (DPA1*0201: P = 5.5×10⁻¹⁰, OR = 1.52, and DPB1*0901: P = 2.0×10⁻⁷, OR = 1.49). Our findings show that genetic variants in the HLA-DP locus are associated with the risk of pediatric asthma in Asian populations.
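Combining evidence from the three populations into a single odds ratio and P value can be done in several ways; the abstract does not say which method produced Pcombined. As one standard option, an inverse-variance fixed-effect meta-analysis of log odds ratios is sketched below, with a two-sided normal-approximation P value. The per-population inputs in the usage lines are hypothetical.

```python
import math

def fixed_effect_meta(odds_ratios, standard_errors, z_crit=1.96):
    """Inverse-variance fixed-effect pooling of log odds ratios.

    standard_errors are SEs of the log ORs. Returns (pooled OR, CI lower, CI upper,
    two-sided p-value from the normal approximation)."""
    weights = [1 / se ** 2 for se in standard_errors]
    log_ors = [math.log(or_) for or_ in odds_ratios]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P(|Z| > |z|)
    return math.exp(pooled), math.exp(pooled - z_crit * se), math.exp(pooled + z_crit * se), p

# Hypothetical: three populations with similar ORs and SEs of the log OR.
pooled_or, ci_lo, ci_hi, p = fixed_effect_meta([1.4, 1.4, 1.4], [0.1, 0.1, 0.1])
```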
Author Summary
Asthma is the most common chronic disorder in children, and asthma exacerbation is an important cause of childhood morbidity and hospitalization. Here, taking advantage of recent technological advances in human genetics, we performed a genome-wide association study and follow-up validation studies to identify genetic variants for asthma. By examining 6,428 Asians, we found that rs987870 and HLA-DPA1*0201/DPB1*0901 were associated with pediatric asthma. The association signal extended across the region containing HLA-DPB2, collagen type XI alpha 2 (COL11A2), and retinoid X receptor beta (RXRB), but strong linkage disequilibrium in this region made it difficult to identify the causative variants. Interestingly, the SNP (or the HLA-DP allele) associated with pediatric asthma (a Th-2 type immune disease) in the present study confers protection against Th-1 type immune diseases, such as type 1 diabetes and rheumatoid arthritis. Therefore, the association results obtained in the present study could partially explain the inverse relationship between asthma and Th-1 type immune diseases and may lead to a better understanding of Th-1/Th-2 immune diseases.
PMCID: PMC3140987  PMID: 21814517
5.  Genome-Wide Association Study Implicates Chromosome 9q21.31 as a Susceptibility Locus for Asthma in Mexican Children 
PLoS Genetics  2009;5(8):e1000623.
Many candidate genes have been studied for asthma, but replication has varied. Novel candidate genes have been identified for various complex diseases using genome-wide association studies (GWASs). We conducted a GWAS in 492 Mexican children with asthma, predominantly atopic by skin prick test, and their parents using the Illumina HumanHap 550 K BeadChip to identify novel genetic variation for childhood asthma. The 520,767 autosomal single nucleotide polymorphisms (SNPs) passing quality control were tested for association with childhood asthma using log-linear regression with a log-additive risk model. Eleven of the most significantly associated GWAS SNPs were tested for replication in an independent study of 177 Mexican case–parent trios with childhood-onset asthma and atopy using log-linear analysis. The chromosome 9q21.31 SNP rs2378383 (p = 7.10×10⁻⁶ in the GWAS), located upstream of transducin-like enhancer of split 4 (TLE4), gave a p-value of 0.03 and the same direction and magnitude of association in the replication study (combined p = 6.79×10⁻⁷). Ancestry analysis on chromosome 9q supported an inverse association between the rs2378383 minor allele (G) and childhood asthma. This work identifies chromosome 9q21.31 as a novel susceptibility locus for childhood asthma in Mexicans. Further, analysis of genome-wide expression data in 51 human tissues from the Novartis Research Foundation showed that median GWAS significance levels for SNPs in genes expressed in the lung differed most significantly from genes not expressed in the lung when compared to 50 other tissues, supporting the biological plausibility of our overall GWAS findings and the multigenic etiology of childhood asthma.
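The study analyzed case–parent trios with log-linear models. A simpler family-based test on the same kind of data is the transmission disequilibrium test (TDT), sketched here as a swapped-in illustration rather than the authors' method: among heterozygous parents, it compares how often an allele is transmitted to the affected child (b) with how often it is not (c). The counts below are hypothetical and chosen so the allele is under-transmitted, mirroring the inverse association reported for the rs2378383 minor allele.

```python
import math

def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test from heterozygous parents of affected children.

    b = transmissions of the allele, c = non-transmissions. Returns the 1-df
    chi-square (b - c)^2 / (b + c), its p-value, and the transmission ratio b / c."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a 1-df chi-square
    ratio = b / c if c else math.inf
    return chi2, p, ratio

chi2, p, ratio = tdt(transmitted=60, untransmitted=100)  # ratio 0.6: under-transmitted
```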
Author Summary
Asthma is a leading chronic childhood disease with a presumed strong genetic component, but no genes have been definitively shown to influence asthma development. Few genetic studies of asthma have included Hispanic populations. Here, we conducted a genome-wide association study of asthma in 492 Mexican children with asthma, predominantly atopic by skin prick test, and their parents to identify novel genetic variation for childhood asthma. We implicated several polymorphisms in or near TLE4 on chromosome 9q21.31 (a novel candidate region for childhood asthma) and replicated one polymorphism in an independent study of childhood-onset asthmatics with atopy and their parents of Mexican ethnicity. Hispanics have differing proportions of Native American, European, and African ancestries, and we found less Native American ancestry than expected at chromosome 9q21.31. This suggests that chromosome 9q21.31 may underlie ethnic differences in childhood asthma and that future replication would be most effective in populations with Native American ancestry. Analysis of publicly available genome-wide expression data revealed that association signals in genes expressed in the lung differed most significantly from genes not expressed in the lung when compared to 50 other tissues, supporting the biological plausibility of the overall GWAS findings and the multigenic etiology of asthma.
PMCID: PMC2722731  PMID: 19714205
6.  Screening large-scale association study data: exploiting interactions using random forests 
BMC Genetics  2004;5:32.
Genome-wide association studies for complex diseases will produce genotypes on hundreds of thousands of single nucleotide polymorphisms (SNPs). A logical first approach to dealing with massive numbers of SNPs is to use some test to screen the SNPs, retaining only those that meet some criterion for further study. For example, SNPs can be ranked by p-value, and those with the lowest p-values retained. When SNPs have large interaction effects but small marginal effects in a population, they are unlikely to be retained when univariate tests are used for screening. However, model-based screens that pre-specify interactions are impractical for data sets with thousands of SNPs. Random forest analysis is an alternative method that produces a single measure of importance for each predictor variable that takes into account interactions among variables without requiring model specification. Interactions increase the importance for the individual interacting variables, making them more likely to be given high importance relative to other variables. We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex disease models with up to 32 loci, incorporating both genetic heterogeneity and multi-locus interaction.
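The failure mode that motivates random forests, a pair of interacting SNPs with no marginal effect, can be demonstrated with a univariate Fisher exact screen. This sketch does not implement random forests; it only shows the blind spot of single-SNP screening. The data are a hypothetical, idealized two-locus model in which a child is a case exactly when one, but not both, of two SNPs carries the risk allele (carrier status coded 0/1).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]: the sum of
    hypergeometric probabilities of tables with the same margins that are no more
    likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def make_xor_data(n_per_cell=25):
    """Hypothetical pure interaction: case (1) iff exactly one SNP carries the risk allele."""
    rows = []
    for x1 in (0, 1):
        for x2 in (0, 1):
            rows += [(x1, x2, int(x1 != x2))] * n_per_cell
    return rows

def two_by_two(rows, feature):
    """Counts (exposed cases, exposed controls, unexposed cases, unexposed controls)."""
    a = sum(1 for r in rows if feature(r) == 1 and r[2] == 1)
    b = sum(1 for r in rows if feature(r) == 1 and r[2] == 0)
    c = sum(1 for r in rows if feature(r) == 0 and r[2] == 1)
    d = sum(1 for r in rows if feature(r) == 0 and r[2] == 0)
    return a, b, c, d

data = make_xor_data()
p_snp1 = fisher_exact_two_sided(*two_by_two(data, lambda r: r[0]))         # univariate screen
p_snp2 = fisher_exact_two_sided(*two_by_two(data, lambda r: r[1]))
p_pair = fisher_exact_two_sided(*two_by_two(data, lambda r: r[0] ^ r[1]))  # joint genotype
```

Each SNP on its own gives p = 1 and would be discarded by a p-value screen, while the joint genotype is perfectly associated with case status. A method that can pick up the joint pattern without pre-specifying it, as random forest importance is designed to do, can retain such SNPs.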
Keeping other factors constant, if risk SNPs interact, the random forest importance measure significantly outperforms the Fisher Exact test as a screening tool. As the number of interacting SNPs increases, the improvement in performance of random forest analysis relative to Fisher Exact test for screening also increases. Random forests perform similarly to the univariate Fisher Exact test as a screening tool when SNPs in the analysis do not interact.
In the context of large-scale genetic association studies where unknown interactions exist among true risk-associated SNPs or SNPs and environmental covariates, screening SNPs using random forest analyses can significantly reduce the number of SNPs that need to be retained for further study compared to standard univariate screening methods.
PMCID: PMC545646  PMID: 15588316
7.  Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification 
PLoS Genetics  2013;9(8):e1003609.
Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
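The first effect described above, differential genotyping error pulling evidence toward better-genotyped SNPs, can be illustrated with a back-of-envelope approximation. The sketch assumes that the expected 1-df association chi-square at a SNP is 1 plus the causal SNP's non-centrality scaled by the SNP's r² with the causal variant and by a genotype-quality factor in [0, 1] (an effective-sample-size approximation for imputation or low-coverage error). It is not the authors' re-ranking procedure, and all SNP values are hypothetical.

```python
def expected_chi2(ncp_causal, r2_with_causal, genotype_quality):
    """Approximate expected 1-df association chi-square at a SNP.

    Assumes the non-centrality at the SNP equals the causal SNP's non-centrality
    scaled by r^2 with the causal SNP and by a genotype-quality factor in [0, 1].
    The leading 1 is the mean of a central 1-df chi-square."""
    return 1 + ncp_causal * r2_with_causal * genotype_quality

def rank_snps(snps, ncp_causal):
    """Sort SNPs by expected chi-square, highest first."""
    return sorted(snps, key=lambda s: expected_chi2(ncp_causal, s["r2"], s["quality"]),
                  reverse=True)

snps = [
    {"id": "causal", "r2": 1.0, "quality": 0.8},  # imputed, imperfect genotypes
    {"id": "tag", "r2": 0.9, "quality": 1.0},     # directly genotyped proxy
]
ranking = [s["id"] for s in rank_snps(snps, ncp_causal=30)]
```

Under these assumptions the well-genotyped tag (expected chi-square 28) outranks the imputed causal SNP (25), even though the causal SNP would rank first if both were genotyped equally well.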
Author Summary
As next-generation sequencing (NGS) costs continue to fall and genome-wide association study (GWAS) platform coverage improves, the human genetics community is positioned to identify potentially causal variants. However, current NGS or imputation-based studies of either the whole genome or regions previously identified by GWAS have not yet been very successful in identifying causal variants. A major hurdle is the development of methods to distinguish disease-causing variants from their highly correlated proxies within an associated region. We show that various common factors, such as differential sequencing or imputation accuracy rates and linkage disequilibrium patterns, with or without GWAS-informed region selection, can substantially decrease the probability of identifying the correct causal SNP, often by more than half. We then describe a novel and easy-to-implement re-ranking procedure that can double the probability that the causal SNP is top-ranked in many settings. Application to the NCI Breast and Prostate Cancer (BPC3) Cohort Consortium aggressive prostate cancer data identified new top SNPs within two associated loci previously established via GWAS, as well as several additional possible causal SNPs that had been previously overlooked.
PMCID: PMC3738448  PMID: 23950724
8.  Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data 
Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity.
We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations from various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected using random forests with variable selection from an optimal SNP dataset. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA) considering pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. The prediction error rates are measured for SNP sets from functional module-based filtration, which selects SNPs within functional modules from genome-wide SNPs based on expanded GSEA.
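The filtration step combines a p-value threshold with LD pruning. A common greedy form of that combination is sketched below as an assumption about the general technique, not the authors' exact procedure: SNPs passing the threshold are visited from smallest p-value upward, and a SNP is kept only if its r² with every already-kept SNP is at most a cutoff. The default thresholds, SNP identifiers, p-values, and r² values are hypothetical.

```python
def ld_prune(snps, r2, max_r2=0.2, p_threshold=5e-8):
    """Greedy p-value-ordered LD pruning.

    snps: dict of SNP id -> p-value.
    r2: dict of frozenset({id1, id2}) -> r^2; pairs not listed are treated as unlinked.
    Keeps SNPs with p <= p_threshold, in order of increasing p-value, skipping any SNP
    whose r^2 with an already-kept SNP exceeds max_r2."""
    kept = []
    for snp, p in sorted(snps.items(), key=lambda kv: kv[1]):
        if p > p_threshold:
            break  # remaining SNPs have even larger p-values
        if all(r2.get(frozenset((snp, k)), 0.0) <= max_r2 for k in kept):
            kept.append(snp)
    return kept

pvals = {"rs1": 1e-6, "rs2": 1e-5, "rs3": 1e-4, "rs4": 0.2}
ld = {frozenset(("rs1", "rs2")): 0.8}  # rs2 is a close proxy of rs1
selected = ld_prune(pvals, ld, max_r2=0.2, p_threshold=1e-3)
```

Here rs2 is dropped as a proxy of the stronger rs1, and rs4 fails the p-value threshold, leaving rs1 and rs3. A Bonferroni-style threshold would simply be passed in as `p_threshold`.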
A T2D causal SNP combination containing 101 SNPs is selected from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset using optimal filtration criteria, with an error rate of 10.25%. Matching the 101 SNPs with known T2D genes and functional modules reveals relationships between T2D and the SNP combinations. The prediction error rates of SNP sets from functional module-based filtration do not differ significantly from those of randomly selected SNP sets or of the T2D causal SNP combinations from optimal filtration.
We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.
PMCID: PMC3618247  PMID: 23566118
9.  Development of a Pharmacogenetic Predictive Test in asthma: proof of concept 
Pharmacogenetics and genomics  2010;20(2):86-93.
To assess the feasibility of developing a Combined Clinical and Pharmacogenetic Predictive Test, composed of multiple single nucleotide polymorphisms (SNPs), that is associated with poor bronchodilator response (BDR).
We genotyped SNPs tagging the whole genome in the parents and children of the Childhood Asthma Management Program (CAMP) and implemented an algorithm using a family-based association test that ranked SNPs by statistical power. The top eight SNPs associated with BDR made up the Pharmacogenetic Predictive Test. The Clinical Predictive Test consisted of baseline forced expiratory volume in 1 s (FEV1). We evaluated these predictive tests and a Combined Clinical and Pharmacogenetic Predictive Test in three distinct populations: the children of the CAMP trial and two additional clinical trial populations with asthma. Our outcome measure was poor BDR, defined as BDR below the 20th percentile in each population. BDR was calculated as the percent difference between the prebronchodilator and postbronchodilator (two puffs of albuterol at 180 μg/puff) FEV1 values. To assess the predictive ability of each test, the area under the receiver operating characteristic curve (AUROC) was calculated for each population.
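The outcome definition and evaluation metric described above can be sketched in a few lines of Python. This is a minimal illustration, assuming BDR is the percent change from pre- to post-bronchodilator FEV1 and that the 20th-percentile cut-off uses linear interpolation (the abstract does not state the interpolation rule); all FEV1 values and predictor scores are hypothetical. AUROC is computed in its Mann-Whitney form, so an uninformative classifier scores about 0.50.

```python
import statistics


def percent_bdr(pre_fev1, post_fev1):
    """BDR: percent change from pre- to post-bronchodilator FEV1."""
    return 100.0 * (post_fev1 - pre_fev1) / pre_fev1


def poor_bdr_labels(bdr_values):
    """Flag subjects whose BDR falls below the population's 20th percentile."""
    cutoff = statistics.quantiles(bdr_values, n=5, method="inclusive")[0]
    return [b < cutoff for b in bdr_values]


def auroc(scores, labels):
    """AUROC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical FEV1 values (litres) before and after albuterol.
pre = [2.0, 2.5, 1.8, 3.0, 2.2]
post = [2.4, 2.6, 2.0, 3.1, 2.6]
bdr = [percent_bdr(a, b) for a, b in zip(pre, post)]
poor = poor_bdr_labels(bdr)
# Hypothetical predictor: a higher score means higher predicted risk of poor BDR.
risk = [0.2, 0.6, 0.3, 0.7, 0.1]
print(poor, auroc(risk, poor))
```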
The AUROC values for the Clinical Predictive Test alone were not significantly different from 0.50, the AUROC of a random classifier. Our Combined Clinical and Pharmacogenetic Predictive Test, which combined the genetic polymorphisms with FEV1, predicted poor BDR with an AUROC of 0.65 in the CAMP children (n = 422) and of 0.60 (n = 475) and 0.63 (n = 235) in the two independent populations. Both the Combined Clinical and Pharmacogenetic Predictive Test and the Pharmacogenetic Predictive Test were significantly more accurate than the Clinical Predictive Test (AUROC between 0.44 and 0.55) in each of the populations.
Our finding that genetic polymorphisms, together with a clinical trait, are associated with BDR suggests that there is promise in using multiple genetic polymorphisms simultaneously to predict which asthmatics are likely to respond poorly to bronchodilators.
PMCID: PMC3654515  PMID: 20032818
asthma; bronchodilator response; personalized medicine; pharmacogenetic test; predictive medicine
10.  Genome-wide association study of body mass index in 23,000 individuals with and without asthma 
Both asthma and obesity are complex disorders that are influenced by environmental and genetic factors. Shared genetic factors between asthma and obesity have been proposed to partly explain epidemiological findings of co-morbidity between these conditions.
To identify genetic variants that are associated with body mass index (BMI) in asthmatic children and adults, and to evaluate if there are differences between the genetics of BMI in asthmatics and healthy individuals.
In total, 19 studies contributed genome-wide association study (GWAS) data from more than 23,000 individuals of predominantly European descent, of whom 8,165 are asthmatics.
We report associations between several DENND1B variants on chromosome 1q31 and BMI (p=2.2 × 10^−7 for rs4915551) from a meta-analysis of GWAS data on 2,691 asthmatic children (screening data). The top DENND1B SNPs were then evaluated in seven independent replication data sets comprising 2,014 asthmatics; rs4915551 was nominally replicated (p<0.05) in two of the seven studies and was of borderline significance in one (p=0.059). However, strong evidence of effect heterogeneity was observed, and overall the association between rs4915551 and BMI was not significant in the total replication data set (p=0.71). Using a random effects model, BMI was estimated to increase by 0.30 kg/m² per additional G allele of this DENND1B SNP (p=0.01 for the combined screening and replication data sets, N=4,705). FTO was confirmed as an important gene for adult and childhood BMI regardless of asthma status.
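The abstract does not say which random-effects estimator was used. As an illustration only, the widely used DerSimonian-Laird estimator pools per-study effects (here, change in BMI per G allele) while allowing for the between-study heterogeneity noted above. The function name is hypothetical and the inputs in the usage line are invented, not the study's data.

```python
import math


def dersimonian_laird(betas, ses):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2 estimator.

    Returns (pooled effect, its standard error, between-study variance tau^2).
    """
    w = [1.0 / se**2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    # Cochran's Q measures how far the studies scatter around the fixed estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # truncated at zero when Q <= df
    # Re-weight with the between-study variance added to each study's variance.
    w_star = [1.0 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2


# Hypothetical per-study effects (kg/m2 per G allele) and standard errors.
print(dersimonian_laird([0.1, 0.5, -0.2], [0.15, 0.2, 0.25]))
```

When the studies disagree strongly, tau^2 grows and the weights become more equal, which widens the pooled confidence interval.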
Conclusions and Clinical Relevance
DENND1B was recently identified as an asthma susceptibility gene in a GWAS of children, and here we find evidence that DENND1B variants may also be associated with BMI in asthmatic children. However, the association was not replicated overall in the independent data sets, and the heterogeneous effect of DENND1B points to complex associations with the studied diseases that deserve further study.
PMCID: PMC3608930  PMID: 23517042
Association; Asthma; BMI; Genetics; Genome-wide; Obesity
11.  Genome-Wide Association Analysis in Asthma Subjects Identifies SPATS2L as a Novel Bronchodilator Response Gene 
PLoS Genetics  2012;8(7):e1002824.
Bronchodilator response (BDR) is an important asthma phenotype that measures reversibility of airway obstruction by comparing lung function (i.e. FEV1) before and after the administration of a short-acting β2-agonist, the most common rescue medication used to treat asthma. BDR also serves as a test of β2-agonist efficacy. BDR is a complex trait that is partly under genetic control. A genome-wide association study (GWAS) of BDR, quantified as percent change in baseline FEV1 after administration of a β2-agonist, was performed with 1,644 non-Hispanic white asthmatic subjects from six drug clinical trials: CAMP, LOCCS, LODO, a medication trial conducted by Sepracor, CARE, and ACRN. The association of 469,884 single-nucleotide polymorphisms (SNPs) with BDR was tested using a linear regression model adjusted for age, sex, and height. Replication of the primary P-values was attempted in 501 white subjects from SARP and 550 white subjects from DAG. Experimental evidence supporting the top gene was obtained via siRNA knockdown and Western blotting analyses. The lowest overall combined P-value was 9.7 × 10^−7, for SNP rs295137 near the SPATS2L gene. Among subjects in the primary analysis, those with the rs295137 TT genotype had a median BDR of 16.0 (IQR = [6.2, 32.4]), while those with CC or TC genotypes had a median BDR of 10.9 (IQR = [5.0, 22.2]). SPATS2L mRNA knockdown resulted in increased β2-adrenergic receptor levels. Our results suggest that SPATS2L may be an important regulator of β2-adrenergic receptor down-regulation and that GWAS holds promise for better understanding the biological mechanisms of differential response to β2-agonists.
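The association model described above (BDR regressed on genotype, adjusted for age, sex and height) can be sketched with ordinary least squares. This minimal, dependency-free Python version assumes additive genotype coding (0, 1 or 2 copies of the tested allele), which the abstract does not specify; the helper name `ols` and all data are hypothetical. A real GWAS would use an optimized tool and would also report standard errors and P-values, which are omitted here.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical subjects: (allele count, age, sex 0/1, height cm, BDR %).
data = [(0, 10, 0, 140, 8.0), (1, 12, 1, 152, 12.5), (2, 9, 0, 138, 18.0),
        (1, 15, 0, 149, 11.0), (0, 11, 1, 147, 7.5), (2, 14, 1, 160, 16.0)]
# Design matrix columns: intercept, genotype, age, sex, height.
X = [[1.0, row[0], row[1], row[2], row[3]] for row in data]
y = [row[4] for row in data]
beta = ols(X, y)
print("per-allele change in BDR (%):", beta[1])
```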
Author Summary
Bronchodilator response (BDR) is an important asthma phenotype that measures reversibility of airway obstruction by comparing lung function before and after the administration of short-acting β2-agonists, common medications used for asthma treatment. We performed a genome-wide association study of BDR with 1,644 white asthmatic subjects from six drug clinical trials and attempted to replicate these findings in 1,051 white subjects from two independent cohorts. The most significantly associated variant was near the SPATS2L gene. We knocked down SPATS2L mRNA in human airway smooth muscle cells and found that β2-adrenergic receptor levels increased, suggesting that SPATS2L may be a regulator of BDR. Our results highlight the promise of pursuing GWAS results that do not necessarily reach genome-wide significance and are an example of how results from pharmacogenetic GWAS can be studied functionally.
PMCID: PMC3390407  PMID: 22792082
12.  Breast cancer risk assessment with five independent genetic variants and two risk factors in Chinese women 
Recently, several genome-wide association studies (GWAS) have identified novel single nucleotide polymorphisms (SNPs) associated with breast cancer risk. However, most of these studies were conducted among Caucasians, and only one in a Chinese population.
In the current study, we first tested whether 15 SNPs identified by previous GWAS were also breast cancer marker SNPs in this Chinese population. We then grouped the marker SNPs and modeled them with clinical risk factors to evaluate the usefulness of these factors in breast cancer risk assessment. Two methods (risk-factor counting and odds ratio (OR)-weighted risk scoring) were used to evaluate the cumulative effects of the five significant SNPs and two clinical risk factors (age at menarche and age at first live birth).
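The two scoring schemes differ only in how each risk factor is weighted. A minimal sketch restricted to the five marker SNPs, using the adjusted ORs reported in the results; the clinical risk factors are left out because their ORs are not given here, and weighting by the natural log of the OR (so that scores add on the log-odds scale) is an assumption about what "OR weighted" means. The function names and genotypes are hypothetical.

```python
import math

# Adjusted ORs for the five marker SNPs as reported in the results (dominant model).
SNP_ORS = {"rs13387042": 1.26, "rs2307032": 1.24, "rs2180341": 1.22,
           "rs2046210": 1.51, "rs2981582": 1.31}


def carriers(genotypes):
    """Dominant coding: a woman 'carries' a SNP if she has >= 1 risk allele."""
    return {snp for snp, count in genotypes.items() if count >= 1}


def count_score(genotypes):
    """Risk-factor counting: number of marker SNPs carried."""
    return len(carriers(genotypes))


def weighted_score(genotypes):
    """OR-weighted score (assumed form): sum of log(OR) over marker SNPs carried."""
    return sum(math.log(SNP_ORS[snp]) for snp in carriers(genotypes))


# Two hypothetical women, each carrying risk alleles at one SNP only.
woman_a = {"rs2046210": 1, "rs2180341": 0}
woman_b = {"rs2046210": 0, "rs2180341": 2}
print(count_score(woman_a), count_score(woman_b))        # tied by counting
print(weighted_score(woman_a) > weighted_score(woman_b))  # separated by weighting
```

Counting treats every risk factor as equal, whereas weighting lets the stronger rs2046210 signal (OR 1.51) outrank the weaker rs2180341 signal (OR 1.22), which is one plausible reason the weighted score discriminated better.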
Five SNPs located at 2q35, 3p24, 6q22, 6q25 and 10q26 were consistently associated with breast cancer risk in both the testing set (878 cases and 900 controls) and the validation set (914 cases and 967 controls). Overall, all five SNPs contributed to breast cancer susceptibility under a dominant genetic model (2q35, rs13387042: adjusted OR = 1.26, P = 0.006; 3q24.1, rs2307032: adjusted OR = 1.24, P = 0.005; 6q22.33, rs2180341: adjusted OR = 1.22, P = 0.006; 6q25.1, rs2046210: adjusted OR = 1.51, P = 2.40 × 10^−8; 10q26.13, rs2981582: adjusted OR = 1.31, P = 1.96 × 10^−4). Risk score analysis (area under the curve (AUC): 0.649, 95% confidence interval (CI): 0.631 to 0.667; sensitivity = 62.60%, specificity = 57.05%) gave better discrimination than risk-factor counting (AUC: 0.637, 95% CI: 0.619 to 0.655; sensitivity = 62.16%, specificity = 60.03%) (P < 0.0001). Absolute risk was then calculated with the modified Gail model, and an AUC of 0.658 (95% CI = 0.640 to 0.676) (sensitivity = 61.98%, specificity = 60.26%) was obtained for the combination of the five marker SNPs, age at menarche and age at first live birth.
This study shows that five GWAS-identified variants were also consistently validated in this Chinese population and that combining these genetic variants with other risk factors can improve breast cancer risk prediction. However, more breast cancer-associated risk variants should be incorporated to optimize risk assessment.
PMCID: PMC3496134  PMID: 22269215
13.  Genome-wide association analysis of circulating vitamin D levels in children with asthma 
Human genetics  2012;131(9):1495-1505.
Vitamin D deficiency is becoming more apparent in many populations. Genetic factors may play a role in the maintenance of vitamin D levels. The objective of this study was to perform a genome-wide association study (GWAS) of vitamin D levels, including replication of prior GWAS results. We measured 25-hydroxyvitamin D (25(OH)D) levels in serum collected at enrollment and at year 4 in 572 Caucasian children with asthma who were part of a multi-center clinical trial, the Childhood Asthma Management Program (CAMP). Replication was performed in a second cohort of 592 asthmatics from Costa Rica and a third cohort of 516 Puerto Rican asthmatics. In addition, we attempted replication of three SNPs previously identified in a large GWAS of Caucasian individuals. The setting included data from a clinical trial of childhood asthmatics and two cohorts of asthmatics recruited for genetic studies of asthma. The main outcome measure was circulating 25(OH)D levels. The 25(OH)D levels at the two time-points were only modestly correlated with each other (intraclass correlation coefficient = 0.33) in the CAMP population. We identified SNPs that were nominally associated with 25(OH)D levels at the two time-points in CAMP, and replicated four SNPs in the Costa Rican cohort: rs11002969, rs163221, rs1678849, and rs4864976. However, these SNPs were not significantly associated with 25(OH)D levels in the third population, of Puerto Rican asthmatics. We replicated the SNP with the strongest effect previously reported in a large GWAS, rs2282679 (GC), and, to a lesser degree, another SNP, rs10741657 (CYP2R1). Thus, two of three prior significant GWAS findings for 25(OH)D levels were replicated. Other SNPs may additionally be associated with 25(OH)D levels in certain populations.
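The abstract does not state which form of intraclass correlation coefficient was used. As a sketch under that uncertainty, the one-way random-effects ICC(1) for subjects measured at k time-points compares between-subject and within-subject variance: (MSB − MSW) / (MSB + (k − 1) MSW). The function name and the 25(OH)D values below are hypothetical.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC(1) for subjects each measured k times."""
    n = len(measurements)
    k = len(measurements[0])
    grand = sum(sum(m) for m in measurements) / (n * k)
    means = [sum(m) / k for m in measurements]
    # Between-subject and within-subject sums of squares.
    ssb = k * sum((mu - grand) ** 2 for mu in means)
    ssw = sum((x - mu) ** 2 for m, mu in zip(measurements, means) for x in m)
    msb = ssb / (n - 1)
    msw = ssw / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical 25(OH)D levels (ng/mL) at enrollment and at year 4.
levels = [(30.0, 24.0), (18.0, 25.0), (22.0, 21.0), (35.0, 28.0)]
print(icc_oneway(levels))
```

An ICC near 1 means a single measurement ranks children reliably; a value such as 0.33 means much of the variation is within-child, so one measurement is a noisy phenotype for GWAS.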
PMCID: PMC3648789  PMID: 22673963
14.  Genome-wide Association Identifies the T Gene as a Novel Asthma Pharmacogenetic Locus 
Rationale: To date, most studies aimed at discovering genetic factors influencing treatment response in asthma have focused on biologic candidate genes. Genome-wide association studies (GWAS) can rapidly identify novel pharmacogenetic loci.
Objectives: To investigate if GWAS can identify novel pharmacogenetic loci in asthma.
Methods: Using phenotypic and GWAS genotype data available through the NHLBI-funded Single-nucleotide polymorphism Health association-Asthma Resource Project, we analyzed differences in FEV1 in response to inhaled corticosteroids in 418 white subjects with asthma. Of the 444,088 single nucleotide polymorphisms (SNPs) analyzed, the 50 SNPs with the lowest P values were genotyped in an independent clinical trial population of 407 subjects with asthma.
Measurements and Main Results: The lowest P value in the GWAS analysis was 2.09 × 10^−6. Of the 47 SNPs successfully genotyped in the replication population, three were associated under the same genetic model and in the same direction, including two of the top four SNPs ranked by P value. Combined P values for these SNPs were 1.06 × 10^−5 for rs3127412 and 6.13 × 10^−6 for rs6456042. Although these two SNPs were not located within a gene, they were tightly correlated with three variants mapping to potentially functional regions within the T gene. When genotyped, each T gene variant was also associated with lung function response to inhaled corticosteroids in each of the trials that showed association with rs3127412 and rs6456042 in the initial GWAS analysis. On average, there was a twofold to threefold difference in FEV1 response between subjects homozygous for the wild-type and for the mutant allele of each T gene SNP.
Conclusions: Genome-wide association has identified the T gene as a novel pharmacogenetic locus for inhaled corticosteroid response in asthma.
PMCID: PMC3381232  PMID: 22538805
polymorphism; genome; pharmacogenomics; glucocorticoid
15.  Exonic Variants Associated with Development of Aspirin Exacerbated Respiratory Diseases 
PLoS ONE  2014;9(11):e111887.
Aspirin-exacerbated respiratory disease (AERD) is one phenotype of asthma, often occurring in the form of a severe and sudden attack. Because oral aspirin challenge (OAC), used for AERD diagnosis, is time-consuming and difficult, non-invasive biomarkers have been sought. The aim of this study was to identify AERD-associated exonic SNPs and to examine the diagnostic potential of a combination of these candidate SNPs for predicting AERD. DNA from 165 AERD patients, 397 subjects with aspirin-tolerant asthma (ATA), and 398 normal controls was genotyped with an Exome BeadChip assay containing 240K SNPs. A total of 1,023 models (2^10 − 1) were generated from combinations of the top 10 SNPs, selected by their p-values for association with AERD. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was calculated for each model. SNP Function Portal and PolyPhen-2 were used to validate the functional significance of candidate SNPs. An exonic SNP, exm537513 in HLA-DPB1, showed the lowest p-value (p = 3.40 × 10^−8) for association with AERD risk. Among models built from the top 10 SNPs, a combination of 7 SNPs (exm537513, exm83523, exm1884673, exm538564, exm2264237, exm396794, and exm791954) showed the best AUC of 0.75 (asymptotic p-value of 7.94 × 10^−21), with 34% sensitivity and 93% specificity for discriminating AERD from ATA. Amino acid changes due to exm83523 in CHIA were predicted to be “probably damaging” to the structure and function of the protein, with a high score of ‘1’. A combination model of seven SNPs may provide a useful, non-invasive genetic marker combination for predicting AERD.
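The exhaustive model search is easy to reproduce: ten candidate SNPs give 2^10 − 1 = 1,023 non-empty subsets. In this minimal sketch each model's score is simply the summed risk-allele count of its SNPs, a hypothetical choice since the abstract does not say how SNPs were combined within a model (a logistic regression fit is another common option); the helper names, genotypes and labels are hypothetical.

```python
from itertools import combinations


def all_models(snp_ids):
    """Every non-empty subset of the candidate SNPs: 2**n - 1 models."""
    return [m for r in range(1, len(snp_ids) + 1) for m in combinations(snp_ids, r)]


def auroc(scores, labels):
    """Mann-Whitney AUROC: P(case score > control score), ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y]
    ctrls = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c in cases for n in ctrls)
    return wins / (len(cases) * len(ctrls))


def best_model(genotypes, labels, snp_ids):
    """Score each model by its summed risk-allele count; keep the highest AUROC."""
    best, best_auc = None, -1.0
    for model in all_models(snp_ids):
        scores = [sum(g[s] for s in model) for g in genotypes]
        auc = auroc(scores, labels)
        if auc > best_auc:
            best, best_auc = model, auc
    return best, best_auc


print(len(all_models(range(10))))  # 1023 candidate models from 10 SNPs
# Hypothetical risk-allele counts at 3 SNPs; True = AERD, False = ATA.
geno = [[2, 0, 1], [1, 0, 2], [0, 1, 0], [0, 2, 1]]
labels = [True, True, False, False]
print(best_model(geno, labels, [0, 1, 2]))
```

Selecting the best of many models on the same data tends to inflate the apparent AUC, so the chosen combination is best confirmed in an independent sample.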
PMCID: PMC4221198  PMID: 25372592
16.  Decision Forest Analysis of 61 Single Nucleotide Polymorphisms in a Case-Control Study of Esophageal Cancer; a novel method 
BMC Bioinformatics  2005;6(Suppl 2):S4.
Systematic evaluation and study of single nucleotide polymorphisms (SNPs), made possible by high-throughput genotyping technologies and bioinformatics, promises to provide breakthroughs in the understanding of complex diseases. Understanding how the millions of SNPs in the human genome are involved in conferring susceptibility or resistance to disease, or in rendering a drug efficacious or toxic in the individual, is a major goal of the relatively new field of pharmacogenomics. Esophageal squamous cell carcinoma is a high-mortality cancer with complex etiology and progression involving both genetic and environmental factors. We examined the association between esophageal cancer risk and patterns of 61 SNPs in a case-control study of a population from Shanxi Province in North Central China, which has among the highest rates of esophageal squamous cell carcinoma in the world.
High-throughput Masscode mass spectrometry genotyping was performed on genomic DNA from 574 individuals (394 cases and 180 age-frequency matched controls). SNPs were chosen from genes encoding DNA repair enzymes and Phase I and Phase II enzymes.
We developed a novel adaptation of the Decision Forest pattern recognition method, named Decision Forest for SNPs (DF-SNPs), designed to analyze SNP data.
The DF-SNPs classifier separating cases from controls achieved concordance, sensitivity and specificity of 94.7%, 99.0% and 85.1%, respectively, suggesting its usefulness for hypothesizing which SNPs or combinations of SNPs could be involved in susceptibility to esophageal cancer. Importantly, the DF-SNPs algorithm incorporates a randomization test for assessing the relevance (or importance) of individual SNPs, SNP types (homozygous common, heterozygous and homozygous variant) and patterns of SNP types (SNP patterns) that differentiate cases from controls. For example, we found that the different genotypes of SNP GADD45B E1122 are all associated with cancer risk.
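The details of the DF-SNPs randomization test are not given in this abstract. As a generic stand-in, not the DF-SNPs procedure, a label-permutation test asks how often randomly reshuffled case/control labels produce an association at least as strong as the observed one. Here the statistic is the case-control difference in variant-carrier frequency for one SNP; the function names and data are hypothetical.

```python
import random


def carrier_diff(genotypes, labels):
    """Absolute case-control difference in the fraction carrying >= 1 variant allele."""
    cases = [g > 0 for g, y in zip(genotypes, labels) if y]
    ctrls = [g > 0 for g, y in zip(genotypes, labels) if not y]
    return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))


def permutation_p(genotypes, labels, n_perm=2000, seed=1):
    """Permutation P value: share of label shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = carrier_diff(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if carrier_diff(genotypes, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so P is never exactly 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical variant allele counts at one SNP; True = case.
genotypes = [2] * 20 + [0] * 20
labels = [True] * 20 + [False] * 20
print(permutation_p(genotypes, labels))
```

Shuffling labels preserves the number of cases and controls while breaking any genotype-phenotype link, which is what makes the shuffled statistics a valid null distribution.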
The DF-SNPs method can be used to differentiate esophageal squamous cell carcinoma cases from controls based on individual SNPs, SNP types and SNP patterns. The method could be useful to identify potential biomarkers from the SNP data and complement existing methods for genotype analyses.
PMCID: PMC1637030  PMID: 16026601
17.  Performance of random forest when SNPs are in linkage disequilibrium 
BMC Bioinformatics  2009;10:78.
Single nucleotide polymorphisms (SNPs) may be correlated due to linkage disequilibrium (LD). Association studies look for both direct and indirect associations with disease loci. In a Random Forest (RF) analysis, correlation between a true risk SNP and SNPs in LD may lead to diminished variable importance for the true risk SNP. One approach to address this problem is to select SNPs in linkage equilibrium (LE) for analysis. Here, we explore alternative methods for dealing with SNPs in LD: change the tree-building algorithm by building each tree in an RF only with SNPs in LE, modify the importance measure (IM), and use haplotypes instead of SNPs to build an RF.
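One way to select SNPs approximately in linkage equilibrium, the baseline approach mentioned above, is greedy LD pruning. A minimal sketch, assuming unphased genotype dosages (0/1/2) and using their squared Pearson correlation as the LD measure; the r² cut-off of 0.8, the function names and the SNP data are hypothetical, and a stricter cut-off moves the retained set closer to true linkage equilibrium.

```python
def r_squared(x, y):
    """Squared Pearson correlation of two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def prune(snps, threshold=0.8):
    """Greedy pruning: keep a SNP only if its r^2 with every kept SNP is <= threshold."""
    kept = []
    for name, dosages in snps:
        if all(r_squared(dosages, d) <= threshold for _, d in kept):
            kept.append((name, dosages))
    return [name for name, _ in kept]


# Hypothetical dosages for 5 individuals; rs2 duplicates rs1 (perfect LD).
snps = [("rs1", [0, 1, 2, 1, 0]), ("rs2", [0, 1, 2, 1, 0]), ("rs3", [2, 0, 1, 0, 1])]
print(prune(snps))
```

The drawback the abstract points to is visible here: pruning keeps whichever correlated SNP comes first, which need not be the true risk SNP.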
We evaluated the performance of our alternative methods by simulating a spectrum of complex genetic models. When a haplotype rather than an individual SNP is the risk factor, we find that the original Random Forest method applied to SNPs performs well. When individual, genotyped SNPs are the risk factors, we find that the stronger the genetic effect, the stronger the effect LD has on the performance of the original RF. A revised importance measure used with the original RF is relatively robust to LD among SNPs, whereas the same revised importance measure used with the revised RF is sometimes inflated. Overall, we find that the revised importance measure used with the original RF is the best choice when the genetic model and the number of SNPs in LD with risk SNPs are unknown. For the haplotype-based method, under a multiplicative heterogeneity model, we observed a decrease in the performance of RF with increasing LD among the SNPs in the haplotype.
Our results suggest that by strategically revising the Random Forest tree-building or importance measure calculation, power can increase when LD exists between SNPs. We conclude that the revised Random Forest method applied to SNPs has the advantage of not requiring genotype phase, making it a viable tool for analyses involving thousands of SNPs, such as candidate gene studies and follow-up of top candidates from genome-wide association studies.
PMCID: PMC2666661  PMID: 19265542
18.  Integrative Modeling of eQTLs and Cis-Regulatory Elements Suggests Mechanisms Underlying Cell Type Specificity of eQTLs 
PLoS Genetics  2013;9(8):e1003649.
Genetic variants in cis-regulatory elements or trans-acting regulators frequently influence the quantity and spatiotemporal distribution of gene transcription. Recent interest in expression quantitative trait locus (eQTL) mapping has paralleled the adoption of genome-wide association studies (GWAS) for the analysis of complex traits and disease in humans. Under the hypothesis that many GWAS associations tag non-coding SNPs with small effects, and that these SNPs exert phenotypic control by modifying gene expression, it has become common to interpret GWAS associations using eQTL data. To fully exploit the mechanistic interpretability of eQTL-GWAS comparisons, an improved understanding of the genetic architecture and causal mechanisms of cell type specificity of eQTLs is required. We address this need by performing an eQTL analysis in three parts: first we identified eQTLs from eleven studies on seven cell types; then we integrated eQTL data with cis-regulatory element (CRE) data from the ENCODE project; finally we built a set of classifiers to predict the cell type specificity of eQTLs. The cell type specificity of eQTLs is associated with eQTL SNP overlap with hundreds of cell type specific CRE classes, including enhancer, promoter, and repressive chromatin marks, regions of open chromatin, and many classes of DNA binding proteins. These associations provide insight into the molecular mechanisms generating the cell type specificity of eQTLs and the mode of regulation of corresponding eQTLs. Using a random forest classifier with cell specific CRE-SNP overlap as features, we demonstrate the feasibility of predicting the cell type specificity of eQTLs. We then demonstrate that CREs from a trait-associated cell type can be used to annotate GWAS associations in the absence of eQTL data for that cell type. 
We anticipate that such integrative, predictive modeling of cell specificity will improve our ability to understand the mechanistic basis of human complex phenotypic variation.
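The classifier features described above are binary indicators of whether an eQTL SNP falls inside each cell type specific CRE class. A minimal sketch of building one such feature row, assuming sorted, non-overlapping, half-open intervals on the SNP's chromosome; the function name, class names and coordinates are hypothetical. Rows like this could then be passed to any random forest implementation.

```python
import bisect


def overlap_features(snp_pos, cre_tracks):
    """Binary features: 1 if the SNP lies inside any interval of a CRE class.

    cre_tracks maps a class name to a sorted, non-overlapping list of
    half-open (start, end) intervals on the same chromosome as the SNP.
    """
    features = {}
    for name, intervals in sorted(cre_tracks.items()):
        starts = [s for s, _ in intervals]
        # Index of the last interval starting at or before the SNP (-1 if none).
        i = bisect.bisect_right(starts, snp_pos) - 1
        features[name] = int(i >= 0 and snp_pos < intervals[i][1])
    return features


# Hypothetical CRE classes for two cell types.
tracks = {"enhancer_cellA": [(100, 200), (500, 600)],
          "promoter_cellA": [(150, 160)],
          "dnase_cellB": [(0, 50)]}
print(overlap_features(155, tracks))
```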
Author Summary
When interpreting genome-wide association studies showing that specific genetic variants are associated with disease risk, scientists look for a link between the genetic variant and a biological mechanism behind that disease. One functional mechanism is that the genetic variant may influence gene transcription via a co-localized genomic regulatory element, such as a transcription factor binding site within an open chromatin region. Often this type of regulation occurs in some cell types but not others. In this study, we look across eleven gene expression studies with seven cell types and consider how genetic transcription regulators, or eQTLs, replicate within and between cell types. We identify pervasive allelic heterogeneity, or transcriptional control of a single gene by multiple, independent eQTLs. We integrate extensive data on cell type specific regulatory elements from ENCODE to identify general mechanisms of transcription regulation through enrichment of eQTLs within regulatory elements. We also build a classifier to predict eQTL replication across cell types. The results in this paper present a path to an integrative, predictive approach to improve our ability to understand the mechanistic basis of human phenotypic variation.
PMCID: PMC3731231  PMID: 23935528
19.  Genomewide association study of the age of onset of childhood asthma 
Childhood asthma is a complex disease with known heritability and phenotypic diversity. Although an earlier onset has been associated with more severe disease, there has been no genome-wide association study of the age of onset of asthma in children.
To identify genetic variants associated with earlier onset of childhood asthma.
We conducted the first genome-wide association study (GWAS) of the age of onset of childhood asthma among participants in the Childhood Asthma Management Program (CAMP), and used three independent cohorts from North America, Costa Rica, and Sweden for replication.
Two SNPs were associated with earlier onset of asthma in the combined analysis of CAMP and the replication cohorts: rs9815663 (Fisher’s P value = 2.31 × 10^−8) and rs7927044 (P = 6.54 × 10^−9). Of these two SNPs, rs9815663 was also significantly associated with earlier asthma onset in an analysis including only the replication cohorts. Ten SNPs in linkage disequilibrium with rs9815663 were also associated with earlier asthma onset (2.24 × 10^−7 < P < 8.22 × 10^−6). Having ≥1 risk allele of the two SNPs of interest (rs9815663 and rs7927044) was associated with lower lung function and higher asthma medication use during 4 years of follow-up in CAMP.
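The combined analysis reports Fisher's P values. Fisher's method turns k independent P values into X = −2 Σ ln p, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; for even degrees of freedom the upper tail has a closed form, so the sketch below needs only the standard library. The function name and the per-cohort P values in the usage line are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0.

    For even df the chi-square upper tail is exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)**i / i!, built incrementally
        total += term
    return math.exp(-half_x) * total


# Hypothetical per-cohort P values for one SNP (discovery plus three replications).
print(fisher_combined_p([0.01, 0.03, 0.2, 0.04]))
```

Fisher's method ignores the direction of effect, so consistency of direction across cohorts has to be checked separately.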
We have identified two SNPs associated with earlier onset of childhood asthma in four independent cohorts.
PMCID: PMC3387331  PMID: 22560479
Asthma; pediatrics; age of onset; asthma genetics; C1orf100; genome-wide association study; pediatric asthma
20.  Pooled Genome-Wide Analysis to Identify Novel Risk Loci for Pediatric Allergic Asthma 
PLoS ONE  2011;6(2):e16912.
Genome-wide association studies of pooled DNA samples have been shown to be a valuable tool for identifying candidate SNPs associated with a phenotype. No such study had yet been applied to childhood allergic asthma, even though the very high complexity of asthma genetics makes it an appropriate field in which to explore the potential of the pooled GWAS approach.
Methodology/Principal Findings
We performed a pooled GWAS and individual genotyping in 269 children with allergic respiratory diseases, comparing allergic children with and without asthma. We used a modular approach to identify the loci most significantly associated with asthma by combining silhouette statistics and a physical distance method with cluster-adapted thresholding. We found 97% concordance between pooled GWAS and individual genotyping, with 36 of the 37 top-scoring SNPs significant at the individual genotyping level. The most significant SNP is located inside the coding sequence of C5, an already identified asthma susceptibility gene, while the other loci regulate functions relevant to bronchial pathophysiology, such as immune- or inflammation-mediated mechanisms and airway smooth muscle contraction. Integration with gene expression data showed that almost half of the putative susceptibility genes are differentially expressed in experimental asthma mouse models.
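The exact way silhouette statistics were combined with physical-distance thresholds is not described here. For reference, the standard silhouette width of a point is s = (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b its mean distance to the nearest other cluster; values near 1 indicate tight, well-separated clusters. A minimal sketch for SNP positions on one chromosome, with a hypothetical function name, positions and cluster labels:

```python
def silhouette(points, labels):
    """Mean silhouette width for 1-D points (e.g. SNP positions in bp).

    Requires at least two clusters; singleton clusters get width 0 by convention.
    """
    clusters = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)
    widths = []
    for x, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            widths.append(0.0)
            continue
        # Mean distance to the other members of the point's own cluster.
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # Mean distance to the closest other cluster.
        b = min(sum(abs(x - y) for y in pts) / len(pts)
                for other, pts in clusters.items() if other != c)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Hypothetical positions forming two well-separated groups of SNPs.
print(silhouette([0, 10, 1000, 1010], [0, 0, 1, 1]))
```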
Combined silhouette statistics and cluster-adapted physical distance threshold analysis of pooled GWAS data is an efficient method for identifying candidate SNPs associated with asthma development in an allergic pediatric population.
PMCID: PMC3040188  PMID: 21359210
21.  Data Mining Approaches for Genome-Wide Association of Mood Disorders 
Psychiatric Genetics  2012;22(2):55-61.
Mood disorders are highly heritable forms of major mental illness. A major breakthrough in elucidating the genetic architecture of mood disorders was anticipated with the advent of genome-wide association studies (GWAS). However, to date few susceptibility loci have been conclusively identified. The genetic etiology of mood disorders appears to be quite complex, and as a result, alternative approaches for analyzing GWAS data are needed. Recently, a polygenic scoring approach that captures the effects of alleles across multiple loci was successfully applied to the analysis of GWAS data in schizophrenia and bipolar disorder (BP). However, this method may be overly simplistic in its approach to the complexity of genetic effects. Data mining methods are available that may be applied to analyze the high dimensional data generated by GWAS of complex psychiatric disorders. We sought to compare the performance of five data mining methods, namely, Bayesian Networks (BN), Support Vector Machine (SVM), Random Forest (RF), Radial Basis Function network (RBF), and Logistic Regression (LR), against the polygenic scoring approach in the analysis of GWAS data on BP. The different classification methods were trained on GWAS datasets from the Bipolar Genome Study (2,191 cases with BP and 1,434 controls) and their ability to accurately classify case/control status was tested on a GWAS dataset from the Wellcome Trust Case Control Consortium. The performance of the classifiers in the test dataset was evaluated by comparing area under the receiver operating characteristic curves (AUC). BN performed the best of all the data mining classifiers, but none of these did significantly better than the polygenic score approach. We further examined a subset of SNPs in genes that are expressed in the brain, under the hypothesis that these might be most relevant to BP susceptibility, but all the classifiers performed worse with this reduced set of SNPs. 
The discriminative accuracy of all of these methods is unlikely to be of diagnostic or clinical utility at the present time. Further research is needed to develop strategies for selecting sets of SNPs likely to be relevant to disease susceptibility and to determine if other data mining classifiers that utilize other algorithms for inferring relationships among the sets of SNPs may perform better.
PMCID: PMC3306768  PMID: 22081063
data mining; Genome-Wide Association; Mood Disorders
22.  Polygenic risk and the development and course of asthma: Evidence from a 4-decade longitudinal study 
The lancet. Respiratory medicine  2013;1(6):453-461.
Genome-wide association studies (GWAS) have discovered loci that predispose to asthma. To integrate these new discoveries with emerging models of asthma pathobiology, research is needed to test how genetic discoveries relate to developmental and biological characteristics of asthma.
We derived a multi-locus profile of genetic risk from published GWAS of asthma case status. We then tested associations between this “genetic risk score” and developmental and biological characteristics of asthma in a population-based long-running birth cohort, the Dunedin Longitudinal Study (n=1,037). We evaluated asthma onset, persistence, atopy, airway hyperresponsiveness, incompletely reversible airflow obstruction, and asthma-related school and work absenteeism and hospitalization during 9 prospective assessments spanning ages 9–38 years, when 95% of surviving cohort members were seen.
Cohort members at higher genetic risk experienced asthma onset earlier in life (HR=1.12 [1.01–1.26]). Childhood-onset asthma cases at higher genetic risk were more likely to become life-course-persistent asthma cases (RR=1.36 [1.14–1.63]). Asthma cases at higher genetic risk more often manifested atopy (RR=1.07 [1.01–1.14]), airway hyperresponsiveness (RR=1.16 [1.03–1.32]), and incompletely reversible airflow obstruction (RR=1.28 [1.04–1.57]). They were also more likely to miss school or work due to asthma (IRR=1.38 [1.02–1.86]) and to be hospitalized with breathing problems (HR=1.38 [1.07–1.79]). Genotypic information about asthma risk was independent of and additive to information derived from cohort members’ family histories of asthma.
Findings from this population study confirm that GWAS discoveries for asthma are associated with a childhood-onset phenotype and advance asthma genetics beyond the original GWAS discoveries in three ways: (1) we show that genetic risks predict which childhood-onset asthma cases remit and which become life-course-persistent cases, although these predictions are not sufficiently sensitive or specific to support immediate clinical translation; (2) we elucidate a biological profile of the asthma that arises from these genetic risks: asthma characterized by atopy and airway hyperresponsiveness and leading to incompletely reversible airflow obstruction; and (3) we describe the real-life impact of GWAS discoveries by quantifying genetic associations with missed school and work and hospitalization.
PMCID: PMC3899706  PMID: 24429243
23.  Genetics of Asthma Susceptibility and Severity 
Clinics in chest medicine  2012;33(3):431-443.
The interaction of genes and environmental exposures influences the development of asthma and determines asthma severity. This review focuses on recent developments in genetic studies of asthma onset and progression. Genome-wide association studies (GWAS) are currently the most effective approach to study genetics of complex diseases. There have been two large meta-analyses of asthma susceptibility, GABRIEL and EVE, which identified the same four chromosomal regions, many of which had also been identified in previous GWAS: loci in the ORMDL3 region of 17q21, IL1RL/IL18R genes on chromosome 2q, the TSLP gene region on 5q22, and IL33 on chromosome 9p24. These regions were associated with asthma in individuals of different ethnic backgrounds. EVE also identified a novel asthma susceptibility locus, PYHIN1, in individuals of African descent. Genome-wide screens for asthma susceptibility in Asian adults and children both identified genetic variants in the major histocompatibility complex gene region (HLA region) on chromosome 6p21 as highly associated with asthma risk. This locus was one of the first candidate genes identified for asthma and has been a significant predictor of asthma risk in several GWAS.
There is also a need to understand asthma disease heterogeneity, as different phenotypes may reflect several pathogenic pathways. Genes that are associated with phenotypes including lung function, biomarker levels and asthma therapeutic responses provide insight into mechanisms of asthma severity and progression. For example, the HHIP gene is a significant predictor of pulmonary function changes in asthma and in the normal population. A joint model of risk variants in lung function genes was highly associated with lower FEV1 and increased asthma severity criteria. In addition, a genome-wide screen to discover pharmacogenetic associations related to response to inhaled glucocorticoids identified two correlated SNPs in the GLCCI1 gene that confer a significant lung function response to this asthma therapy.
Future genetic studies of asthma susceptibility and severity will incorporate exome or whole-genome sequencing to identify common and rare genetic variants. Studying these variants in comprehensively phenotyped asthmatics will lead to the development of personalized therapy for individuals with asthma.
PMCID: PMC3431509  PMID: 22929093
Asthma; genetics; susceptibility; severity; personalized medicine; therapy; lung function
24.  Genome-wide association study of asthma identifies RAD50-IL13 and HLA-DR/DQ regions 
Asthma is a heterogeneous disease that is caused by the interaction of genetic susceptibility with environmental influences. Genome-wide association studies (GWAS) represent a powerful approach to investigate the association of DNA variants with disease susceptibility. To date, few GWAS for asthma have been reported.
GWAS was performed on a population of severe or difficult-to-treat asthmatics to identify genes that are involved in the pathogenesis of asthma.
292,443 SNPs were tested for association with asthma in 473 TENOR cases and 1,892 Illumina general population controls. Asthma-related quantitative traits (total serum IgE, FEV1, FVC, and FEV1/FVC) were also tested in identified candidate regions in 473 TENOR cases and 363 phenotyped controls without a history of asthma to further analyze GWAS results. Imputation was performed in identified candidate regions for analysis with denser SNP coverage.
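The abstract does not name the association test. One standard choice for a case-control scan, assumed here for illustration, is the allelic test: each person contributes two alleles, the risk-allele counts in cases and controls form a 2×2 table, and a 1-degree-of-freedom Pearson chi-square statistic is computed. Logistic regression and the Cochran–Armitage trend test are common alternatives. The function name and allele counts below are hypothetical; only the totals follow from 473 cases and 1,892 controls.

```python
import math


def allelic_chi_square(case_risk: int, case_other: int, ctrl_risk: int, ctrl_other: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 473 cases give 946 alleles, 1,892 controls give 3,784 alleles.
chi2, p = allelic_chi_square(420, 526, 1460, 2324)
print(f"chi2={chi2:.2f}, P={p:.2e}")
```

The same test is simply repeated for every SNP in the scan, which is why the number of tests matters when judging the resulting p-values.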
Multiple SNPs in the RAD50-IL13 region on chromosome 5q31.1 were associated with asthma: rs2244012 in intron 2 of RAD50 (P = 3.04E-07). The HLA-DR/DQ region on chromosome 6p21.3 was also associated with asthma: rs1063355 in the 3’ UTR of HLA-DQB1 (P = 9.55E-06). Imputation identified several significant SNPs in the TH2 locus control region (LCR) 3’ of RAD50. Imputation also identified a more significant SNP, rs3998159 (P = 1.45E-06), between HLA-DQB1 and HLA-DQA2.
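With 292,443 SNPs tested, it helps to compare the reported p-values with thresholds commonly used for genome-wide scans. The abstract does not state which threshold the authors applied; the Bonferroni level (0.05 divided by the number of genotyped tests) and the conventional 5 × 10^-8 level below are assumptions for illustration. Bonferroni is conservative when nearby SNPs are in linkage disequilibrium, and imputed SNPs such as rs3998159 would add further tests.

```python
N_TESTS = 292_443            # genotyped SNPs tested in the scan
ALPHA = 0.05
BONFERRONI = ALPHA / N_TESTS  # roughly 1.71e-7
GENOME_WIDE = 5e-8            # conventional genome-wide significance level

# P-values as reported in the abstract.
reported = {"rs2244012": 3.04e-7, "rs1063355": 9.55e-6, "rs3998159": 1.45e-6}

for snp, p in reported.items():
    print(f"{snp}: P={p:.2e}  below Bonferroni: {p < BONFERRONI}  below 5e-8: {p < GENOME_WIDE}")
```

On these numbers, none of the three reported p-values falls below either threshold.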
This GWAS confirmed the important role of TH2 cytokine and antigen presentation genes in asthma at a genome-wide level and highlighted the need for additional investigation of these two regions to delineate their structural complexity and biological function in the development of asthma.
PMCID: PMC2824608  PMID: 20159242
Asthma; GWAS; RAD50; IL13; HLA-DQB1; TENOR
25.  Application of Multi-SNP Approaches Bayesian LASSO and AUC-RF to Detect Main Effects of Inflammatory-Gene Variants Associated with Bladder Cancer Risk 
PLoS ONE  2013;8(12):e83745.
The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest, a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. Thirteen SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study, we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factors for this tumor. A 9-SNP signature was detected by BTL. Here we report, for the first time, a set of SNPs in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.
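As we understand it, AUC-RF selects the SNP subset whose random forest achieves the highest area under the ROC curve (AUC). As a reminder of what that metric measures, the sketch below computes the AUC directly from predicted scores as the probability that a randomly chosen case scores higher than a randomly chosen control, counting ties as one half (the Mann–Whitney formulation). It is not the AUC-RF algorithm itself, and the function name and scores are hypothetical.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), with ties counted as 0.5."""
    wins = 0.0
    for s1 in case_scores:
        for s0 in control_scores:
            if s1 > s0:
                wins += 1.0
            elif s1 == s0:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from some classifier.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]))
```

This pairwise loop costs O(n·m) comparisons; for cohorts of a few thousand it is still fast, and a rank-based formula gives the same value more efficiently.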
PMCID: PMC3877090  PMID: 24391818

