To create surveillance algorithms to detect diabetes and classify type 1 versus type 2 diabetes using structured electronic health record (EHR) data.
RESEARCH DESIGN AND METHODS
We extracted 4 years of data from the EHR of a large, multisite, multispecialty ambulatory practice serving ∼700,000 patients. We flagged possible cases of diabetes using laboratory test results, diagnosis codes, and prescriptions. We assessed the sensitivity and positive predictive value of novel combinations of these data to classify type 1 versus type 2 diabetes among 210 individuals. We applied an optimized algorithm to a live, prospective, EHR-based surveillance system and reviewed 100 additional cases for validation.
The diabetes algorithm flagged 43,177 patients. All criteria contributed unique cases: 78% had diabetes diagnosis codes, 66% fulfilled laboratory criteria, and 46% had suggestive prescriptions. The sensitivity and positive predictive value of ICD-9 codes for type 1 diabetes were 26% (95% CI 12–49) and 94% (83–100) for type 1 codes alone; 90% (81–95) and 57% (33–86) for two or more type 1 codes plus any number of type 2 codes. An optimized algorithm incorporating the ratio of type 1 versus type 2 codes, plasma C-peptide and autoantibody levels, and suggestive prescriptions flagged 66 of 66 (100% [96–100]) patients with type 1 diabetes. On validation, the optimized algorithm correctly classified 35 of 36 patients with type 1 diabetes (raw sensitivity, 97% [87–100], population-weighted sensitivity, 65% [36–100], and positive predictive value, 88% [78–98]).
Algorithms applied to EHR data detect more cases of diabetes than claims codes and reasonably discriminate between type 1 and type 2 diabetes.
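The raw sensitivity and positive predictive value reported above follow from simple counts. A minimal sketch, assuming a plain two-by-two tabulation of chart-review results; the function names are illustrative and are not taken from the surveillance system itself. Only the 66-of-66 and 35-of-36 counts come from the abstract, and the population-weighted estimates are not reproduced here because the sampling weights are not given.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of true cases that the algorithm flagged."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged patients who are true cases."""
    return true_pos / (true_pos + false_pos)


# Counts taken from the abstract: 66 of 66 in development, 35 of 36 on validation.
dev_sens = sensitivity(true_pos=66, false_neg=0)
raw_sens = sensitivity(true_pos=35, false_neg=1)

print(f"development sensitivity: {dev_sens:.0%}")
print(f"validation raw sensitivity: {raw_sens:.0%}")
```

The confidence intervals in the abstract would need an interval method that the abstract does not name, so they are left out of the sketch.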
Motivation: Galaxy is a software application supporting high-throughput biology analyses and work flows, available as a free on-line service or as source code for local deployment. New tools can be written to extend Galaxy, and these can be shared using public Galaxy Tool Shed (GTS) repositories, but converting even simple scripts into tools requires effort from a skilled developer.
Results: The Tool Factory is a novel Galaxy tool that automates the generation of all code needed to execute user-supplied scripts, and wraps them into new Galaxy tools for upload to a GTS, ready for review and installation through the Galaxy administrative interface.
Availability and implementation: The Galaxy administrative interface supports automated installation from the main GTS. Source code and support are available at the project website, https://bitbucket.org/fubar/galaxytoolfactory. The Tool Factory is implemented as an installable Galaxy tool.
Rationale: Variability in pulmonary disease severity is found in patients with cystic fibrosis (CF) who have identical mutations in the CF transmembrane conductance regulator (CFTR) gene. We hypothesized that one factor accounting for heterogeneity in pulmonary disease severity is variation in the family of genes affecting the biology of interleukin-1 (IL-1), which impacts acquisition and maintenance of Pseudomonas aeruginosa infection in animal models of chronic infection. Methods: We genotyped 58 single nucleotide polymorphisms (SNPs) in the IL-1 gene cluster in 808 CF subjects from the University of North Carolina and Case Western Reserve University (UNC/CWRU) joint cohort. All were homozygous for ΔF508, and categories of “severe” (cases) or “mild” (control subjects) lung disease were defined by the lowest or highest quartile of forced expired volume (FEV1) for age in the CF population. After adjustment for age and gender, genotypic data were tested for association with lung disease severity. Odds ratios (ORs) comparing severe versus mild CF were also calculated for each genotype (with the homozygote major allele as the reference group) for all 58 SNPs. From these analyses, nine SNPs with a moderate effect size, OR ≤ 0.5 or > 1.5, were selected for further testing. To replicate the case-control study results, we genotyped the same nine SNPs in a second population of CF parent-offspring trios (recruited from Children’s Hospital Boston), in which the offspring had similar pulmonary phenotypes. For the trio analysis, both family-based and population-based associations were performed. Results: SNPs rs1143634 and rs1143639 in the IL1B gene demonstrated a consistent association with lung disease severity categories (P < 0.10) and longitudinal analysis of lung disease severity (P < 0.10) in CF in both the case-control and family-based studies.
In females, there was a consistent association (false discovery rate adjusted joint P-value < 0.06 for both SNPs) in both the analysis of lung disease severity in the UNC/CWRU cohort and the family-based analysis of affection status. Conclusion: Our findings suggest that IL1β is a clinically relevant modulator of CF lung disease.
gene modifiers; cystic fibrosis; CFTR; IL-1 gene family
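The SNP selection step in the Methods above (keep genotypes with OR ≤ 0.5 or > 1.5 relative to the major-allele homozygote) can be sketched as follows. This is a simplified illustration that computes an unadjusted odds ratio from a two-by-two table, whereas the study adjusted for age and gender; the function names and the example counts are hypothetical.

```python
def odds_ratio(case_geno: int, case_ref: int, control_geno: int, control_ref: int) -> float:
    """Unadjusted OR for a genotype versus the reference genotype.

    case_* are counts among severe subjects, control_* among mild subjects.
    """
    return (case_geno * control_ref) / (case_ref * control_geno)


def has_moderate_effect(or_value: float) -> bool:
    """Selection threshold quoted in the abstract: OR <= 0.5 or OR > 1.5."""
    return or_value <= 0.5 or or_value > 1.5


# Hypothetical counts, for illustration only (not study data).
example_or = odds_ratio(case_geno=30, case_ref=70, control_geno=15, control_ref=85)
print(round(example_or, 2), has_moderate_effect(example_or))
```

Note that the threshold is asymmetric at the boundaries as written in the source: an OR of exactly 0.5 is selected, while an OR of exactly 1.5 is not.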
IL10 is an anti-inflammatory cytokine that has been found to have lower production in macrophages and mononuclear cells from asthmatics. Since reduced IL10 levels may influence the severity of asthma phenotypes, we examined IL10 single-nucleotide polymorphisms (SNPs) for association with asthma severity and allergy phenotypes as quantitative traits. Utilizing DNA samples from 518 Caucasian asthmatic children from the Childhood Asthma Management Program (CAMP) and their parents, we genotyped six IL10 SNPs: 3 in the promoter, 2 in introns, and one in the 3′ UTR. Using family-based association tests, each SNP was tested for association with asthma and allergy phenotypes individually. Population-based association analysis was performed with each SNP locus, the promoter haplotypes and the 6-loci haplotypes. The 3′ UTR SNP was significantly associated with FEV1 as a percent of predicted (FEV1PP) (P=0.0002) in both the family and population analyses. The promoter haplotype GCC was positively associated with IgE levels and FEV1PP (P=0.007 and 0.012, respectively). The promoter haplotype ATA was negatively associated with lnPC20 and FEV1PP (P=0.008 and 0.043, respectively). Polymorphisms in IL10 are associated with asthma phenotypes in this cohort. Further studies of variation in the IL10 gene may help elucidate the mechanism of asthma development in children.
interleukin 10 (IL10); single nucleotide polymorphism (SNP); genetic association; family-based association test (FBAT); haplotype; promoter; 3′ untranslated region (3′UTR)
The response to treatment for asthma is characterized by wide interindividual variability, with a significant number of patients who have no response. We hypothesized that a genomewide association study would reveal novel pharmacogenetic determinants of the response to inhaled glucocorticoids.
We analyzed a small number of statistically powerful variants selected on the basis of a family-based screening algorithm from among 534,290 single-nucleotide polymorphisms (SNPs) to determine changes in lung function in response to inhaled glucocorticoids. A significant, replicated association was found, and we characterized its functional effects.
We identified a significant pharmacogenetic association at SNP rs37972, replicated in four independent populations totaling 935 persons (P = 0.0007), which maps to the glucocorticoid-induced transcript 1 gene (GLCCI1) and is in complete linkage disequilibrium (i.e., perfectly correlated) with rs37973. Both rs37972 and rs37973 are associated with decrements in GLCCI1 expression. In isolated cell systems, the rs37973 variant is associated with significantly decreased luciferase reporter activity. Pooled data from treatment trials indicate reduced lung function in response to inhaled glucocorticoids in subjects with the variant allele (P = 0.0007 for pooled data). Overall, the mean (± SE) increase in forced expiratory volume in 1 second in the treated subjects who were homozygous for the mutant rs37973 allele was only about one third of that seen in similarly treated subjects who were homozygous for the wild-type allele (3.2 ± 1.6% vs. 9.4 ± 1.1%), and their risk of a poor response was significantly higher (odds ratio, 2.36; 95% confidence interval, 1.27 to 4.41), with genotype accounting for about 6.6% of overall inhaled glucocorticoid response variability.
A functional GLCCI1 variant is associated with substantial decrements in the response to inhaled glucocorticoids in patients with asthma. (Funded by the National Institutes of Health and others; ClinicalTrials.gov number, NCT00000575.)
Reliable identification of cis regulatory elements influencing transcription remains a challenging problem in molecular
bioinformatics. This is especially true for enhancer elements which are often located hundreds of kilobases from the gene promoter.
High resolution DNase hypersensitivity and connectivity profiling by the ENCODE consortium provides evidence of millions of
interacting cis-acting elements in the human genome. This prior knowledge can be incorporated into genome-wide expression
analyses, in the form of gene sets sharing regulatory sequence motifs in known DNase hypersensitivity peak regions. High
proportions of enrichment among the most extreme differentially transcribed genes from controlled biological experiments may
suggest novel hypotheses about signalling pathways. The utility of this approach is demonstrated with the reanalysis of a
microarray-derived gene expression data set through the Gene Set Enrichment Analysis pipeline, uncovering new putative distal
cis elements in the context of innate immunity. The DNase Hypersensitivity Connectivity informed Motif Enrichment in Gene
Expression (DHC-MEGE) method described here has the advantage of identifying distal elements such as enhancers, which are
often overlooked with standard promoter motif analysis.
The DHC-MEGE shell script can be obtained from Sourceforge https://sourceforge.net/projects/dhcmege/ and the
generated GMT file is attached as supplementary data.
DNAse hypersensitivity; motif enrichment; gene expression; enhancer; gene set enrichment analysis
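For readers unfamiliar with the GMT gene set format mentioned above, each line holds a set name, a description, and then the member genes, all tab-separated. A minimal sketch of writing such lines, assuming motif-to-gene assignments are already available; the set name and gene symbols below are placeholders and this is not the DHC-MEGE script itself.

```python
def format_gmt_line(set_name: str, description: str, genes: list[str]) -> str:
    """Build one GMT record: name, description, then genes, tab-separated."""
    return "\t".join([set_name, description, *genes])


def write_gmt(path: str, gene_sets: dict[str, list[str]], description: str = "na") -> None:
    """Write one GMT line per gene set."""
    with open(path, "w", encoding="utf-8") as handle:
        for set_name, genes in gene_sets.items():
            handle.write(format_gmt_line(set_name, description, genes) + "\n")


# Placeholder set name and gene symbols, for illustration only.
example_line = format_gmt_line("MOTIF_X_DISTAL", "na", ["GENE1", "GENE2", "GENE3"])
print(example_line)
```

A file produced this way can be supplied as a custom gene set database to tools that accept GMT input, such as Gene Set Enrichment Analysis.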
Electronic medical record (EMR) systems are a rich potential source for detailed, timely, and efficient surveillance of large populations. We created the Electronic medical record Support for Public Health (ESP) system to facilitate and demonstrate the potential advantages of harnessing EMRs for public health surveillance. ESP organizes and analyzes EMR data for events of public health interest and transmits electronic case reports or aggregate population summaries to public health agencies as appropriate. It is designed to be compatible with any EMR system and can be customized to different states’ messaging requirements. All ESP code is open source and freely available. ESP currently has modules for notifiable disease, influenza-like illness syndrome, and diabetes surveillance.
An intelligent presentation system for ESP called the RiskScape is under development. The RiskScape displays surveillance data in an accessible and intelligible format by automatically mapping results by zip code, stratifying outcomes by demographic and clinical parameters, and enabling users to specify custom queries and stratifications. The goal of RiskScape is to provide public health practitioners with rich, up-to-date views of health measures that facilitate timely identification of health disparities and opportunities for targeted interventions. ESP installations are currently operational in Massachusetts and Ohio, providing live, automated surveillance on over 1 million patients. Additional installations are underway at two more large practices in Massachusetts.
Genome-wide association studies of human gene expression promise to identify functional regulatory genetic variation that contributes to phenotypic diversity. However, it is unclear how useful this approach will be for the identification of disease-susceptibility variants. We generated gene expression profiles for 22 184 mRNA transcripts using RNA derived from peripheral blood CD4+ lymphocytes, and genome-wide genotype data for 516 512 autosomal markers in 200 subjects. We screened for cis-acting variants by testing variants mapping within 50 kb of expressed transcripts for association with transcript abundance using generalized linear models. Significant associations were identified for 1585 genes at a false discovery rate of 0.05 (corresponding to P-values ranging from 1 × 10−91 to 7 × 10−4). Importantly, we identified evidence of regulatory variation for 119 previously mapped disease genes, including 24 examples where the variant with the strongest evidence of disease-association demonstrates strong association with specific transcript abundance. The prevalence of cis-acting variants among disease-associated genes was 63% higher than the genome-wide rate in our data set (P = 6.41 × 10−6), and although many of the implicated loci were associated with immune-related diseases (including asthma, connective tissue disorders and inflammatory bowel disease), associations with genes implicated in non-immune-related diseases including lipid profiles, anthropomorphic measurements, cancer and neurologic disease were also observed. Genetic variants that confer inter-individual differences in gene expression represent an important subset of variants that contribute to disease susceptibility. Population-based integrative genetic approaches can help identify such variation and enhance our understanding of the genetic basis of complex traits.
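The cis-screening window described above can be illustrated with a small filter. This is a sketch under the assumption that "within 50 kb" is measured from either end of the transcript interval on the same chromosome; the abstract does not specify the reference point, and the function name and coordinates are hypothetical.

```python
CIS_WINDOW_BP = 50_000  # 50 kb window quoted in the abstract


def is_cis(variant_chrom: str, variant_pos: int,
           tx_chrom: str, tx_start: int, tx_end: int,
           window: int = CIS_WINDOW_BP) -> bool:
    """True if the variant lies within `window` bp of the transcript interval."""
    if variant_chrom != tx_chrom:
        return False
    return tx_start - window <= variant_pos <= tx_end + window


# Hypothetical coordinates, for illustration only.
print(is_cis("chr1", 150_000, "chr1", 200_000, 210_000))  # exactly on the window edge
print(is_cis("chr1", 149_999, "chr1", 200_000, 210_000))  # one base outside
```

Only variants passing such a filter would then be tested for association with the abundance of that transcript.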
A 900-kb inversion exists within a large region of conserved linkage disequilibrium (LD) on chromosome 17. CRHR1 is located within the inversion region and associated with inhaled corticosteroid response in asthma. We hypothesized that CRHR1 variants are in LD with the inversion, supporting a potential role for natural selection in the genetic response to corticosteroids. We genotyped 6 single nucleotide polymorphisms (SNPs) spanning chr17:40,410,565–42,372,240, including 4 SNPs defining inversion status. Similar allele frequencies and strong LD were noted between the inversion and a CRHR1 SNP previously associated with lung function response to inhaled corticosteroids. Each inversion-defining SNP was strongly associated with inhaled corticosteroid response in adult asthma (p-values 0.002–0.005). The CRHR1 response to inhaled corticosteroids may thus be explained by natural selection resulting from inversion status or by long-range LD with another gene. Additional pharmacogenetic investigations into regions of chromosomal diversity, including copy number variation and inversions, are warranted.
CRHR1; tau haplotype; MAPT; inversion; asthma; corticosteroid; pharmacogenetics
Corticotropin-releasing hormone receptor 2 (CRHR2) participates in the smooth muscle relaxation response and may influence the acute airway bronchodilator response to short-acting β2 agonist treatment of asthma. We aimed to assess associations between genetic variants of CRHR2 and acute bronchodilator response in asthma.
We investigated 28 single nucleotide polymorphisms in CRHR2 for associations with acute bronchodilator response to albuterol in 607 Caucasian asthmatic subjects recruited as part of the Childhood Asthma Management Program (CAMP). Replication was conducted in two Caucasian adult asthma cohorts – a cohort of 427 subjects enrolled in a completed clinical trial conducted by Sepracor Inc. (MA, USA) and a cohort of 152 subjects enrolled in the Clinical Trial of Low-Dose Theophylline and Montelukast (LODO) conducted by the American Lung Association Asthma Clinical Research Centers.
Five variants were significantly associated with acute bronchodilator response in at least one cohort (p-value ≤ 0.05). Variant rs7793837 was associated in CAMP and LODO (p-value = 0.05 and 0.03, respectively) and haplotype blocks residing at the 5’ end of CRHR2 were associated with response in all three cohorts.
We report for the first time, at the gene level, replicated associations between CRHR2 and acute bronchodilator response. While no single variant was significantly associated in all three cohorts, the finding that variants at the 5’ end of CRHR2 are associated in each of the three cohorts strongly suggests that the causative variants reside in this region and that their genetic effect, although present, is likely to be weak.
Asthma; genetics; corticotrophin releasing hormone receptor 2; CRHR2; bronchodilator response; polymorphism; β2 adrenergic receptor agonist
Electronic health records (EHRs) have the potential to improve completeness and timeliness of tuberculosis (TB) surveillance relative to traditional reporting, particularly for culture-negative disease. We report on the development and validation of a TB detection algorithm for EHR data followed by implementation in a live surveillance and reporting system.
We used structured electronic data from an ambulatory practice in eastern Massachusetts to develop a screening algorithm aimed at achieving 100% sensitivity for confirmed active TB with the highest possible positive predictive value (PPV) for physician-suspected disease. We validated the algorithm in 16 years of retrospective electronic data and then implemented it in a real-time EHR-based surveillance system. We assessed PPV and the completeness of case capture relative to conventional reporting in 18 months of prospective surveillance.
The final algorithm required a prescription for pyrazinamide, an International Classification of Diseases, Ninth Revision (ICD-9) code for TB and prescriptions for two antituberculous medications, or an ICD-9 code for TB and an order for a TB diagnostic test. During validation, this algorithm had a PPV of 84% (95% confidence interval 78, 88) for physician-suspected disease. One-third of confirmed cases were culture-negative. All false-positives were instances of latent TB. In 18 months of prospective EHR-based surveillance with this algorithm, seven additional cases of physician-suspected active TB were detected, including two patients with culture-negative disease. A review of state health department records revealed no cases missed by the algorithm.
Live, prospective TB surveillance using EHR data is feasible and promising.
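The final screening rule described above is a disjunction of three criteria and can be sketched as a simple predicate. This is an illustration only: the record fields are hypothetical, "two antituberculous medications" is read here as two or more, and any time windows the live system may apply between codes, prescriptions, and orders are not modeled because the abstract does not state them.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical summary of one patient's structured EHR data."""
    has_pyrazinamide_rx: bool
    has_tb_icd9_code: bool
    n_antituberculous_rx: int  # count of distinct antituberculous medications prescribed
    has_tb_diagnostic_order: bool


def meets_tb_criteria(p: PatientRecord) -> bool:
    """Flag a patient if any one of the three criteria in the abstract is met."""
    return (
        p.has_pyrazinamide_rx
        or (p.has_tb_icd9_code and p.n_antituberculous_rx >= 2)
        or (p.has_tb_icd9_code and p.has_tb_diagnostic_order)
    )


# A code plus a diagnostic test order is sufficient; a code alone is not.
flagged = meets_tb_criteria(PatientRecord(False, True, 0, True))
not_flagged = meets_tb_criteria(PatientRecord(False, True, 1, False))
print(flagged, not_flagged)
```

Treating pyrazinamide as sufficient on its own reflects the rule as stated in the abstract; the other two criteria both require an ICD-9 code for TB.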
Rationale: Several family-based studies have identified genetic linkage for lung function and airflow obstruction to chromosome 2q.
Objectives: We hypothesized that merging results of high-resolution single nucleotide polymorphism (SNP) mapping in four separate populations would lead to the identification of chronic obstructive pulmonary disease (COPD) susceptibility genes on chromosome 2q.
Methods: Within the chromosome 2q linkage region, 2,843 SNPs were genotyped in 806 COPD cases and 779 control subjects from Norway, and 2,484 SNPs were genotyped in 309 patients with severe COPD from the National Emphysema Treatment Trial and 330 community control subjects. Significant associations from the combined results across the two case-control studies were followed up in 1,839 individuals from 603 families from the International COPD Genetics Network (ICGN) and in 949 individuals from 127 families in the Boston Early-Onset COPD Study.
Measurements and Main Results: Merging the results of the two case-control analyses, 14 of the 790 overlapping SNPs had a combined P < 0.01. Two of these 14 SNPs were consistently associated with COPD in the ICGN families. The association with one SNP, located in the gene XRCC5, was replicated in the Boston Early-Onset COPD Study, with a combined P = 2.51 × 10−5 across the four studies, which remains significant when adjusted for multiple testing (P = 0.02). Genotype imputation confirmed the association with SNPs in XRCC5.
Conclusions: By combining data from COPD genetic association studies conducted in four independent patient samples, we have identified XRCC5, an ATP-dependent DNA helicase, as a potential COPD susceptibility gene.
emphysema; genetic linkage; metaanalysis; single nucleotide polymorphism
Pathogens have represented an important selective force during the adaptation of modern human populations to changing social and other environmental conditions. The evolution of the immune system has therefore been influenced by these pressures. Genomic scans have revealed that the immune system is one of the functions enriched with genes under adaptive selection.
Here, we describe how the innate immune system has responded to these challenges, through the analysis of resequencing data for 132 innate immunity genes in two human populations. Results are interpreted in the context of the functional and interaction networks defined by these genes. Nucleotide diversity is lower in the adaptors and modulators functional classes, and is negatively correlated with the centrality of the proteins within the interaction network. We also produced a list of candidate genes under positive or balancing selection in each population detected by neutrality tests and showed that some functional classes are preferential targets for selection.
We found evidence that the role of each gene in the network conditions the capacity to evolve or their evolvability: genes at the core of the network are more constrained, while adaptation mostly occurred at particular positions at the network edges. Interestingly, the functional classes containing most of the genes with signatures of balancing selection are involved in autoinflammatory and autoimmune diseases, suggesting a counterbalance between the beneficial and deleterious effects of the immune response.
Network modeling of whole transcriptome expression data enables characterization of complex epistatic (gene-gene) interactions that underlie cellular functions. Though numerous methods have been proposed and successfully implemented to develop these networks, there are no formal methods for comparing differences in network connectivity patterns as a function of phenotypic trait.
Here we describe a novel approach for quantifying the differences in gene-gene connectivity patterns across disease states based on Graphical Gaussian Models (GGMs). We compare the posterior probabilities of connectivity for each gene pair across two disease states, expressed as a posterior odds-ratio (postOR) for each pair, which can be used to identify network components most relevant to disease status. The method can also be generalized to model differential gene connectivity patterns within previously defined gene sets, gene networks and pathways. We demonstrate that the GGM method reliably detects differences in network connectivity patterns in datasets of varying sample size. Applying this method to two independent breast cancer expression data sets, we identified numerous reproducible differences in network connectivity across histological grades of breast cancer, including several published gene sets and pathways. Most notably, our model identified two gene hubs (MMP12 and CXCL13) that each exhibited differential connectivity to more than 30 transcripts in both datasets. Both genes have been previously implicated in breast cancer pathobiology, but are not themselves differentially expressed by histologic grade in either dataset, and would thus not have been identified using traditional differential gene expression testing approaches. In addition, 16 curated gene sets demonstrated significant differential connectivity in both data sets, including the matrix metalloproteinases, PPAR alpha sequence targets, and the PUFA synthesis pathway.
Our results suggest that GGM can be used to formally evaluate differences in global interactome connectivity across disease states, and can serve as a powerful tool for exploring the molecular events that contribute to disease at a systems level.
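The posterior odds-ratio comparison described above can be illustrated with a short calculation. This is a sketch assuming the postOR for a gene pair is the ratio of the posterior odds of connectivity in one disease state to the odds in the other; the abstract does not give the exact formula, and the function name, the clipping constant, and the example probabilities are hypothetical.

```python
def posterior_odds_ratio(p_state1: float, p_state2: float, eps: float = 1e-12) -> float:
    """Ratio of posterior odds of an edge in state 1 versus state 2.

    Probabilities are clipped away from 0 and 1 to avoid division by zero.
    """
    p1 = min(max(p_state1, eps), 1.0 - eps)
    p2 = min(max(p_state2, eps), 1.0 - eps)
    odds1 = p1 / (1.0 - p1)
    odds2 = p2 / (1.0 - p2)
    return odds1 / odds2


# Hypothetical posterior edge probabilities for one gene pair in two disease states.
post_or = posterior_odds_ratio(0.9, 0.5)
print(round(post_or, 3))
```

Under this reading, values far from 1 in either direction mark gene pairs whose connectivity differs between the two states.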
Rationale: Animal models demonstrate that aberrant gene expression in utero can result in abnormal pulmonary phenotypes.
Objectives: We sought to identify genes that are differentially expressed during in utero airway development and test the hypothesis that variants in these genes influence lung function in patients with asthma.
Methods: Stage 1 (Gene Expression): Differential gene expression analysis across the pseudoglandular (n = 27) and canalicular (n = 9) stages of human lung development was performed using regularized t tests with multiple comparison adjustments. Stage 2 (Genetic Association): Genetic association analyses of lung function (FEV1, FVC, and FEV1/FVC) for variants in five differentially expressed genes were conducted in 403 parent-child trios from the Childhood Asthma Management Program (CAMP). Associations were replicated in 583 parent-child trios from the Genetics of Asthma in Costa Rica study.
Measurements and Main Results: Of the 1,776 differentially expressed genes between the pseudoglandular (gestational age: 7–16 wk) and the canalicular (gestational age: 17–26 wk) stages, we selected 5 genes in the Wnt pathway for association testing. Thirteen single nucleotide polymorphisms in three genes demonstrated association with lung function in CAMP (P < 0.05), and associations for two of these genes were replicated in the Costa Ricans: Wnt1-inducible signaling pathway protein 1 with FEV1 (combined P = 0.0005) and FVC (combined P = 0.0004), and Wnt inhibitory factor 1 with FVC (combined P = 0.003) and FEV1/FVC (combined P = 0.003).
Conclusions: Wnt signaling genes are associated with impaired lung function in two childhood asthma cohorts. Furthermore, gene expression profiling of human fetal lung development can be used to identify genes implicated in the pathogenesis of lung function impairment in individuals with asthma.
asthma; lung development; lung function; genetic variation; gene expression
Prior studies suggest a role for a variant (rs5743836) in the promoter of toll-like receptor 9 (TLR9) in asthma and other inflammatory diseases. We performed detailed genetic association studies of the functional variant rs5743836 with asthma susceptibility and asthma-related phenotypes in three independent cohorts.
rs5743836 was genotyped in two family-based cohorts of children with asthma and a case-control study of adult asthmatics. Association analyses were performed using chi square, family-based and population-based testing. A luciferase assay was performed to investigate whether rs5743836 genotype influences TLR9 promoter activity.
Contrary to prior reports, rs5743836 was not associated with asthma in any of the three cohorts. Marginally significant associations were found with FEV1 and FVC (p = 0.003 and p = 0.008, respectively) in one of the family-based cohorts, but these associations were not significant after correcting for multiple comparisons. Higher promoter activity of the CC genotype was demonstrated by luciferase assay, confirming the functional importance of this variant.
Although rs5743836 confers regulatory effects on TLR9 transcription, this variant does not appear to be an important asthma-susceptibility locus.
Genetic variation at the MYH9 locus is linked to the high incidence of focal segmental glomerulosclerosis (FSGS) and non-diabetic end-stage renal disease among African Americans. To further define risk alleles with FSGS we performed a genome-wide association analysis using more than one million single nucleotide polymorphisms in 56 African and 61 European American patients with biopsy-confirmed FSGS. Results were compared to 1641 European and 1800 African Americans as unselected controls. While no association was observed in the cohort of European Americans, the case-control comparison of African Americans found variants within a 60 kb region of chromosome 22 containing part of the APOL1 and MYH9 genes associated with increased risk of FSGS. This region spans different linkage disequilibrium blocks and variants associating with disease within this region are in linkage disequilibrium with variants which have shown signals of natural selection. APOL1 is a strong candidate for a gene that has undergone recent natural selection and is known to be involved in the infection by Trypanosoma brucei, a parasite common in Africa that has recently adapted to infect human hosts. Further studies will be required to establish which variants are causally related to kidney disease, what mutations caused the selective sweep, and to ultimately determine if these are the same.
focal segmental glomerulosclerosis; end stage kidney disease; genetic renal disease
Low plasma B-vitamin levels and elevated homocysteine have been associated with cancer, cardiovascular disease and neurodegenerative disorders. Common variants in FUT2 on chromosome 19q13 were associated with plasma vitamin B12 levels among women in a genome-wide association study in the Nurses’ Health Study (NHS) NCI-Cancer Genetic Markers of Susceptibility (CGEMS) project. To identify additional loci associated with plasma vitamin B12, homocysteine, folate and vitamin B6 (active form pyridoxal 5′-phosphate, PLP), we conducted a meta-analysis of three GWA scans (total n = 4763, consisting of 1658 women in NHS-CGEMS, 1647 women in Framingham-SNP-Health Association Resource (SHARe) and 1458 men in SHARe). On chromosome 19q13, we confirm the association of plasma vitamin B12 with rs602662 and rs492602 (P-value = 1.83 × 10−15 and 1.30 × 10−14, respectively) in strong linkage disequilibrium (LD) with rs601338 (P = 6.92 × 10−15), the FUT2 W143X nonsense mutation. We identified additional genome-wide significant loci for plasma vitamin B12 on chromosomes 6p21 (P = 4.05 × 10−08), 10p12 (P-value=2.87 × 10−9) and 11q11 (P-value=2.25 × 10−10) in genes with biological relevance. We confirm the association of the well-studied functional candidate SNP 5,10-methylene tetrahydrofolate reductase (MTHFR) Ala222Val (dbSNP ID: rs1801133; P-value=1.27 × 10−8), on chromosome 1p36 with plasma homocysteine and identify an additional genome-wide significant locus on chromosome 9q22 (P-value=2.06 × 10−8) associated with plasma homocysteine. We also identified genome-wide associations with variants on chromosome 1p36 with plasma PLP (P-value=1.40 × 10−15). Genome-wide significant loci were not identified for plasma folate. These data reveal new biological candidates and confirm prior candidate genes for plasma homocysteine, plasma vitamin B12 and plasma PLP.
Asthma is a chronic respiratory disease whose genetic basis has been explored for over two decades, most recently via genome-wide association studies. We sought to find asthma-susceptibility variants by using probands from a single population in both family-based and case-control association designs.
We used probands from the Childhood Asthma Management Program (CAMP) in two primary genome-wide association study designs: (1) probands were combined with publicly available population controls in a case-control design, and (2) probands and their parents were used in a family-based design. We followed a two-stage replication process utilizing three independent populations to validate our primary findings.
We found that single nucleotide polymorphisms with similar case-control and family-based association results were more likely to replicate in the independent populations than those with the smallest p-values in either the case-control or family-based design alone. The single nucleotide polymorphism that showed the strongest evidence for association with asthma was rs17572584, which replicated in 2/3 independent populations with an overall p-value among replication populations of 3.5E-05. This variant is near a gene that encodes an enzyme that has been implicated to act coordinately with modulators of Th2 cell differentiation and is expressed in human lung.
Our results suggest that using probands from family-based studies in case-control designs, and combining results of both family-based and case-control approaches, may be a way to augment our ability to find SNPs associated with asthma and other complex diseases.
Rationale: Association studies have implicated many genes in asthma pathogenesis, with replicated associations between single-nucleotide polymorphisms (SNPs) and asthma reported for more than 30 genes. Genome-wide genotyping enables simultaneous evaluation of most of this variation, and facilitates more comprehensive analysis of other common genetic variation around these candidate genes for association with asthma.
Objectives: To use available genome-wide genotypic data to assess the reproducibility of previously reported associations with asthma and to evaluate the contribution of additional common genetic variation surrounding these loci to asthma susceptibility.
Methods: We genotyped 422 nuclear families participating in the Childhood Asthma Management Program using Illumina Human Hap 550Kv3 BeadChip (Illumina, San Diego, CA) SNP arrays. Genes with at least one SNP demonstrating prior association with asthma in two or more populations were tested for evidence of association with asthma, using family-based association testing.
Measurements and Main Results: We identified 39 candidate genes from the literature, using prespecified criteria. Of the 160 SNPs previously genotyped in these 39 genes, 10 SNPs in 6 genes were significantly associated with asthma (including the first independent replication for asthma-associated integrin β3 [ITGB3]). Evaluation of 619 additional common variants included in the Illumina 550K array revealed additional evidence of asthma association for 15 genes, although none were significant after adjustment for multiple comparisons.
Conclusions: We replicated asthma associations for a minority of candidate genes. Pooling genome-wide association study results from multiple studies will increase the power to appreciate marginal effects of genes and further clarify which candidates are true “asthma genes.”
asthma; replication; single-nucleotide polymorphism; integrin β3; association
Although asthma is highly prevalent among certain Hispanic subgroups, genetic determinants of asthma and asthma‐related traits have not been conclusively identified in Hispanic populations. A study was undertaken to identify genomic regions containing susceptibility loci for pulmonary function and bronchodilator responsiveness (BDR) in Costa Ricans.
Eight extended pedigrees were ascertained through schoolchildren with asthma in the Central Valley of Costa Rica. Short tandem repeat (STR) markers were genotyped throughout the genome at an average spacing of 8.2 cM. Multipoint variance component linkage analyses of forced expiratory volume in 1 second (FEV1) and FEV1/forced vital capacity (FVC; both pre‐bronchodilator and post‐bronchodilator) and BDR were performed in these eight families (pre‐bronchodilator spirometry, n = 640; post‐bronchodilator spirometry and BDR, n = 624). Nine additional STR markers were genotyped on chromosome 7. Secondary analyses were repeated after stratification by cigarette smoking.
Among all subjects, the highest logarithm of the odds of linkage (LOD) score for FEV1 (post‐bronchodilator) was found on chromosome 7q34–35 (LOD = 2.45, including the additional markers). The highest LOD scores for FEV1/FVC (pre‐bronchodilator) and BDR were found on chromosomes 2q (LOD = 1.53) and 9p (LOD = 1.53), respectively. Among former and current smokers there was near‐significant evidence of linkage to FEV1/FVC (post‐bronchodilator) on chromosome 5p (LOD = 3.27) and suggestive evidence of linkage to FEV1 on chromosomes 3q (pre‐bronchodilator, LOD = 2.74) and 4q (post‐bronchodilator, LOD = 2.66).
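For readers more used to p-values than LOD scores, the sketch below converts a LOD score to an approximate pointwise p-value. It assumes the standard asymptotic result for variance component linkage, in which the likelihood ratio statistic (2 ln(10) × LOD) follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom. The abstract reports no p-values, so the function names are hypothetical and the results are pointwise approximations, not genome-wide significance levels.

```python
import math


def lod_to_chi2(lod):
    """Convert a LOD score to the equivalent likelihood ratio statistic."""
    return 2.0 * math.log(10.0) * lod


def lod_to_pointwise_p(lod):
    """Approximate pointwise p-value for a variance component LOD score.

    Assumes a 50:50 mixture of a point mass at 0 and chi-square(1 df).
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative")
    chi2 = lod_to_chi2(lod)
    # erfc(sqrt(x / 2)) is the chi-square(1 df) survival function.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# LOD scores reported in the abstract above.
for label, lod in [("FEV1, 7q34-35", 2.45), ("FEV1/FVC, 5p, smokers", 3.27)]:
    print(f"{label}: LOD = {lod}, pointwise p ~ {lod_to_pointwise_p(lod):.1e}")
```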
In eight families of children with asthma in Costa Rica, there is suggestive evidence of linkage to FEV1 on chromosome 7q34–35. In these families, FEV1/FVC may be influenced by an interaction between cigarette smoking and a locus (loci) on chromosome 5p.
Bayesian hierarchical models that characterize the distributions of (transformed) gene profiles have proven very useful and flexible in selecting differentially expressed genes across different types of tissue samples (e.g. Lo and Gottardo, 2007). However, the marginal mean and variance of these models are assumed to be the same for different gene clusters and for different tissue types. Moreover, it is not easy to determine which of the many competing Bayesian hierarchical models provides the best fit for a specific microarray data set. To address these two issues, we propose a marginal mixture model that directly models the marginal distribution of transformed gene profiles. Specifically, we approximate the marginal distributions of transformed gene profiles via a three-component mixture of multivariate Normal distributions, each component of which has the same structures of marginal mean vector and covariance matrix as those for Bayesian hierarchical models, but the values can differ. Based on the proposed model, a method is derived to select genes differentially expressed across two types of tissue samples. The derived gene selection method performs well on a real microarray data set and consistently has the best performance (based on class agreement indices) compared with several other gene selection methods on simulated microarray data sets generated from three different mixture models.
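The idea of selecting genes from a three-component mixture can be illustrated with a deliberately simplified, univariate sketch. It computes the posterior probability that a single per-gene summary value belongs to each component of a Normal mixture. Everything specific here is an assumption made for illustration: the reduction to one dimension, the interpretation of the components as under-expressed, non-differentially expressed, and over-expressed, and the parameter values. The proposed model itself is multivariate, with structured mean vectors and covariance matrices.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_membership(x, weights, means, sds):
    """Posterior probability of each mixture component given the value x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical parameters: (under-expressed, non-differential, over-expressed).
weights = (0.1, 0.8, 0.1)
means = (-2.0, 0.0, 2.0)
sds = (1.0, 0.5, 1.0)

# Assign each gene summary to the component with the highest posterior.
for value in (-3.0, 0.1, 2.5):
    post = posterior_membership(value, weights, means, sds)
    print(value, post.index(max(post)))
```

In a full analysis the mixture parameters would be estimated from the data (for example with an EM algorithm) rather than fixed by hand as they are here.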
Rationale: Inhaled β-agonists are one of the most widely used classes of drugs for the treatment of asthma. However, a substantial proportion of patients with asthma do not have a favorable response to these drugs, and identifying genetic determinants of drug response may aid in tailoring treatment for individual patients.
Objectives: To screen variants in candidate genes in the steroid and β-adrenergic pathways for association with response to inhaled β-agonists.
Methods: We genotyped 844 single nucleotide polymorphisms (SNPs) in 111 candidate genes in 209 children and their parents participating in the Childhood Asthma Management Program. We screened the association of these SNPs with acute response to inhaled β-agonists (bronchodilator response [BDR]) using a novel algorithm implemented in a family-based association test that ranked SNPs in order of statistical power. Genes that had SNPs with median power in the highest quartile were then taken for replication analyses in three other asthma cohorts.
Measurements and Main Results: We identified 17 genes from the screening algorithm and genotyped 99 SNPs from these genes in a second population of patients with asthma. We then genotyped 63 SNPs from four genes with significant associations with BDR, for replication in a third and fourth population of patients with asthma. Evidence for association from the four asthma cohorts was combined, and SNPs from ARG1 were significantly associated with BDR. SNP rs2781659 survived Bonferroni correction for multiple testing (combined P value = 0.00048, adjusted P value = 0.047).
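The Bonferroni adjustment mentioned above is simple arithmetic: the raw p-value is multiplied by the number of tests and capped at 1. The sketch below applies it to the reported combined P value. The number of tests is not stated in the abstract; using the 99 SNPs genotyped in the second population is an assumption made here for illustration, and it yields a value close to the reported adjusted P value.

```python
def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


combined_p = 0.00048    # combined P value reported for rs2781659
assumed_n_tests = 99    # assumption: one test per SNP genotyped in the second population
adjusted_p = bonferroni_adjust(combined_p, assumed_n_tests)
print(f"adjusted P ~ {adjusted_p:.4f}")
```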
Conclusions: These findings identify ARG1 as a novel gene for acute BDR in both children and adults with asthma.
pharmacogenetics; asthma; bronchodilator agents
Health care providers are legally obliged to report cases of specified diseases to public health authorities, but existing manual, provider-initiated reporting systems generally result in incomplete, error-prone, and tardy information flow. Automated laboratory-based reports are more likely to be accurate and timely, but lack clinical information and treatment details. Here, we describe the Electronic Support for Public Health (ESP) application, a robust, automated, secure, portable public health detection and messaging system for cases of notifiable diseases. The ESP application applies disease-specific logic to any complete source of electronic medical data in a fully automated process, and supports an optional case management workflow system for case notification control. All relevant clinical, laboratory and demographic details are securely transferred to the local health authority as an HL7 message. The ESP application has operated continuously in production mode since January 2007, applying rigorously validated case identification logic to ambulatory EMR data from more than 600,000 patients. Source code for this highly interoperable application is freely available under an approved open-source license at http://esphealth.org.
With the recent development of microarray technologies, the comparability of gene expression data obtained from different platforms poses an important problem. We evaluated two widely used platforms, Affymetrix U133 Plus 2.0 and the Illumina HumanRef-8 v2 Expression Bead Chips, for comparability in a biological system in which changes may be subtle, namely fetal lung tissue as a function of gestational age.
We performed the comparison via sequence-based probe matching between the two platforms. "Significance grouping" was defined as a measure of comparability. Using both expression correlation and significance grouping as measures of comparability, we demonstrated that despite overall cross-platform differences at the single gene level, increased correlation between the two platforms was found in genes with higher expression level, higher probe overlap, and lower p-value. We also demonstrated that biological function as determined via KEGG pathways or GO categories is more consistent across platforms than single gene analysis.
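As an illustration of the expression-correlation measure, the sketch below computes a Pearson correlation between expression values for probes matched across two platforms. The abstract does not say which correlation coefficient was used, so Pearson is an assumption, and the expression values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 expression values for five sequence-matched probe pairs.
platform_a = [7.1, 8.4, 5.9, 10.2, 6.5]
platform_b = [6.8, 8.9, 6.1, 9.7, 6.0]
print(f"cross-platform r = {pearson(platform_a, platform_b):.2f}")
```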
We conclude that while the comparability of the platforms at the single gene level may be increased by increasing sample size, they are highly comparable ontologically even for subtle differences in a relatively small sample size. Biologically relevant inference should therefore be reproducible across laboratories using different platforms.