Current computational methods used to analyze changes in DNA methylation and chromatin modification rely on sequenced genomes. Here we describe a pipeline for the detection of these changes from short-read sequence data that does not require a reference genome. Open source software packages were used for sequence assembly, alignment, and measurement of differential enrichment. The method was evaluated by comparing results with reference-based results showing a strong correlation between chromatin modification and gene expression. We then used our de novo sequence assembly to build the DNA methylation profile for the non-referenced Psammomys obesus genome. The pipeline described uses open source software for fast annotation and visualization of unreferenced genomic regions from short-read data.
ChIP-seq; DNA methylation; de novo assembly; epigenomic integration; High-throughput sequencing; MBD-seq; Psammomys obesus
Massively parallel cDNA sequencing (RNA-seq) experiments are gradually superseding microarrays in quantitative gene expression profiling. However, many biologists are uncertain about the choice of differentially expressed gene (DEG) analysis methods and the validity of cost-saving sample pooling strategies for their RNA-seq experiments. Hence, we performed experimental validation of DEGs identified by Cuffdiff2, edgeR, DESeq2 and Two-stage Poisson Model (TSPM) in an RNA-seq experiment involving mice amygdalae micro-punches, using high-throughput qPCR on independent biological replicate samples. Moreover, we sequenced RNA-pools and compared their results with those from sequencing the corresponding individual RNA samples.
The false-positivity rate of Cuffdiff2 and the false-negativity rates of DESeq2 and TSPM were high. Among the four investigated DEG analysis methods, the sensitivity and specificity of edgeR were relatively high. We documented the pooling bias and found that the DEGs identified in pooled samples suffered from low positive predictive values.
Our results highlighted the need for combined use of more sensitive DEG analysis methods and high-throughput validation of identified DEGs in future RNA-seq experiments. They indicated limited utility of sample pooling strategies for RNA-seq in similar setups and supported increasing the number of biological replicate samples.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1767-y) contains supplementary material, which is available to authorized users.
Gene expression; Next-generation RNA Sequencing; Predictive value of tests; Quantitative real-time polymerase chain reaction; Sensitivity and specificity
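The validation measures named in the RNA-seq abstract above (sensitivity, specificity, positive predictive value) follow directly from a confusion table of called DEGs against qPCR-validated DEGs. A minimal sketch, assuming hypothetical counts (the function name and the numbers are illustrative and are not taken from the study):

```python
def validation_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = tp / (tp + fn)   # validated DEGs that the method called
    specificity = tn / (tn + fp)   # non-DEGs that the method left uncalled
    ppv = tp / (tp + fp)           # called DEGs that validated
    return sensitivity, specificity, ppv

# Hypothetical counts for one DEG method, not data from the study.
sens, spec, ppv = validation_metrics(tp=40, fp=10, tn=90, fn=20)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

A method with many false positives lowers the positive predictive value, while a method with many false negatives lowers the sensitivity.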
Defensins are antimicrobial peptides that may take part in airway inflammation and hyperresponsiveness.
We characterized the genetic diversity in the defensin β-1 (DEFB1) locus and tested for an association between common genetic variants and asthma diagnosis.
To identify single nucleotide polymorphisms (SNPs), we resequenced this gene in 23 self-defined European Americans and 24 African Americans. To test whether DEFB1 genetic variants are associated with asthma, we genotyped 4 haplotype-tag SNPs in 517 asthmatic and 519 control samples from the Nurses’ Health Study (NHS) and performed a case-control association analysis. To replicate these findings, we evaluated the DEFB1 polymorphisms in a second cohort from the Childhood Asthma Management Program.
Within the NHS, single SNP testing suggested an association between asthma diagnosis and a 5′ genomic SNP (g.–1816 T>C; P = .025) and an intronic SNP (IVS+692 G>A; P = .054). A significant association between haplotype (Adenine, Cytosine, Thymine, Adenine [ACTA]) and asthma (P = .024) was also identified. Associations between asthma diagnosis and both DEFB1 polymorphisms were observed in the Childhood Asthma Management Program, the second cohort: g.–1816 T>C and IVS+692 G>A demonstrated significant transmission distortion (P = .05 and .007, respectively). Transmission distortion was not observed in male subjects. The rare alleles (–1816C and +692A) were undertransmitted to offspring with asthma, suggesting a protective effect, contrary to the findings in the NHS cohort. Similar effects were evident at the haplotype level: ACTA was undertransmitted (P = .04), and the effect was more prominent in female subjects (P = .007).
Variation in DEFB1 contributes to asthma diagnosis, with apparent gender-specific effects.
Asthma; asthma genetics; defensin; association studies
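Transmission distortion in parent-offspring trios, as reported in the DEFB1 abstract above, can be illustrated with the classic transmission disequilibrium test (TDT). This is a swapped-in textbook statistic and an assumption on our part: the abstract does not state which family-based test was used. The counts below are hypothetical.

```python
import math

def tdt(transmitted, untransmitted):
    """Classic TDT on allele transmissions from heterozygous parents.

    Returns the McNemar-type chi-square statistic (1 degree of freedom)
    and its p-value.
    """
    chi2 = (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the rare allele is passed on 30 times and withheld 50 times.
chi2, p_value = tdt(transmitted=30, untransmitted=50)
print(f"chi2={chi2:.2f} p={p_value:.4f}")
```

Fewer transmissions than non-transmissions corresponds to undertransmission of the allele to affected offspring.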
Corticosteroids exert their anti-inflammatory action by binding and activating the intracellular glucocorticoid receptor (GR) hetero-complex.
To evaluate the effects of the genes HSPCB, HSPCA, STIP1, HSPA8, DNAJB1, PTGES3, FKBP5, and FKBP4 on corticosteroid response.
Caucasian asthmatics (n = 382) randomized to once daily flunisolide or conventional inhaled corticosteroid therapy were genotyped. Outcome measures were baseline FEV1, % predicted FEV1, and % change in FEV1 after corticosteroid treatment. Multivariable analyses, adjusted for age, gender, and height, were performed by fitting the most appropriate genetic model, based on quantitative means derived from ANOVA models, to determine whether polymorphisms had an effect on change in FEV1 independent of baseline level.
Positive recessive model correlations for STIP1 SNPs were observed for baseline FEV1 [rs4980524, p=0.009; rs6591838, p=0.0045; rs2236647, p=0.002; and rs2236648, p=0.013]; baseline % predicted FEV1 [rs4980524, p=0.002; rs6591838, p=0.017; rs2236647, p=0.003; and rs2236648, p=0.008]; and % change in FEV1 at 4 weeks [rs4980524, p=0.044; rs6591838, p=0.016; rs2236647, p=0.01] and 8 weeks therapy [rs4980524, p=0.044; rs6591838, p=0.016; rs2236647, p=0.01]. Haplotypic associations were observed for baseline FEV1 and % change in FEV1 at 4 weeks therapy [p=0.05 and p=0.01, respectively]. Significant trends towards association were observed for baseline % predicted FEV1 and % change in FEV1 at 8 weeks therapy. Positive correlations between haplotypes and % change in FEV1 were also observed.
STIP1 genetic variations may play a role in regulating corticosteroid response in asthmatics with reduced lung function. Replication in a second asthma population is required to confirm these observations.
Identifying genes that regulate corticosteroid responses could allow a priori determination of individual responses to corticosteroid therapy, leading to more effective dosing and/or selection of drug therapies for treating asthma.
corticosteroid; pharmacogenetics; glucocorticoid receptor; SNP; heat shock protein; heat shock organizing protein; immunophilin
We compared an electronic health record-based influenza-like illness (ILI) surveillance system with manual sentinel surveillance and virologic data to evaluate the utility of the automated system for routine ILI surveillance.
We obtained weekly aggregate ILI reports from the Electronic medical record Support for Public Health (ESP) disease-detection and reporting system, which used an automated algorithm to identify ILI visits among a patient population of about 700,000 in Eastern Massachusetts. The percentage of total visits for ILI (“percent ILI”) in ESP, percent ILI in the Massachusetts Department of Public Health's sentinel surveillance system, and percentage of laboratory specimens submitted to participating Massachusetts laboratories that tested positive for influenza were compared for the period October 2007–September 2011. We calculated Spearman's correlation coefficients and compared ESP and sentinel surveillance systems qualitatively, in terms of simplicity, flexibility, data quality, acceptability, timeliness, and usefulness.
ESP and sentinel surveillance percent ILI always peaked within one week of each other. There was 80% correlation between the two and 71%–73% correlation with laboratory data. Sentinel surveillance percent ILI was higher than ESP percent ILI during influenza seasons. The amplitude of variation in ESP percent ILI was greatest for 5- to 49-year-olds and typically peaked for the 5- to 24-year-old age group before the others.
The ESP system produces percent ILI data of similar quality to sentinel surveillance and offers the advantages of shifting disease reporting burden from clinicians to information systems, allowing tracking of disease by age group, facilitating efficient surveillance for very large populations, and producing consistent and timely reports.
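The comparison of weekly percent ILI series in the surveillance abstract above relies on Spearman's correlation coefficient, which is the Pearson correlation of the ranks. A minimal pure-Python sketch; the two weekly series are hypothetical values, not ESP or sentinel data:

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical weekly percent ILI values for two surveillance systems.
esp_ili = [0.5, 0.8, 1.9, 3.2, 2.1, 0.9]
sentinel_ili = [0.7, 1.1, 2.5, 4.0, 3.1, 1.0]
rho = spearman(esp_ili, sentinel_ili)
print(f"Spearman rho = {rho:.3f}")
```

Because the statistic depends only on ranks, a systematically higher percent ILI in one system does not by itself reduce the correlation.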
High-throughput data production has revolutionized molecular biology. However, massive increases in data generation capacity require analysis approaches that are more sophisticated, and often very computationally intensive. Thus, making sense of high-throughput data requires informatics support. Galaxy (http://galaxyproject.org) is a software system that provides this support through a framework that gives experimentalists simple interfaces to powerful tools, while automatically managing the computational details. Galaxy is available both as a public web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, and as a downloadable package that can be deployed in individual labs. Either way, it allows experimentalists without informatics or programming expertise to perform complex large-scale analysis with just a web browser.
Galaxy; analysis; bioinformatics; workflow; algorithm; pipeline; genomics; SNPs
Allergic rhinitis is a common disease whose genetic basis is incompletely explained. We report an integrated genomic analysis of allergic rhinitis.
We performed genome-wide association studies (GWAS) of allergic rhinitis in 5633 ethnically diverse North American subjects. Next, we profiled gene expression in disease-relevant tissue (peripheral blood CD4+ lymphocytes) collected from subjects who had been genotyped. We then integrated the GWAS and gene expression data using expression single nucleotide polymorphism (eSNP), coexpression network, and pathway approaches to identify the biologic relevance of our GWAS.
GWAS revealed ethnicity-specific findings, with 4 genome-wide significant loci among Latinos and 1 genome-wide significant locus in the GWAS meta-analysis across ethnic groups. To identify biologic context for these results, we constructed a coexpression network to define modules of genes with similar patterns of CD4+ gene expression (coexpression modules) that could serve as constructs of broader gene expression. Six of the 22 GWAS loci with P-value ≤ 1 × 10^−6 tagged one particular coexpression module (4.0-fold enrichment, P-value 0.0029), and this module also had the greatest enrichment (3.4-fold enrichment, P-value 2.6 × 10^−24) for allergic rhinitis-associated eSNPs (genetic variants associated with both gene expression and allergic rhinitis). The integrated GWAS, coexpression network, and eSNP results therefore supported this coexpression module as an allergic rhinitis module. Pathway analysis revealed that the module was enriched for mitochondrial pathways (8.6-fold enrichment, P-value 4.5 × 10^−72).
Our results highlight mitochondrial pathways as a target for further investigation of allergic rhinitis mechanism and treatment. Our integrated approach can be applied to provide biologic context for GWAS of other diseases.
Genome-wide association study; Allergic rhinitis; Coexpression network; Expression single-nucleotide polymorphism; Coexpression module; Pathway; Mitochondria; Hay fever; Allergy
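The fold-enrichment figures in the allergic rhinitis abstract above compare an observed fraction with the fraction expected by chance. A minimal sketch, assuming a one-sided hypergeometric test (the abstract does not state which test was used) and hypothetical counts for the module and gene universe:

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """Observed fraction (k of n hits) divided by expected fraction (K of N)."""
    return (k / n) / (K / N)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n items from N, of which K belong to the module."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

# Hypothetical numbers: 6 of 20 loci fall in a module holding 100 of 1000 genes.
fold = fold_enrichment(k=6, n=20, K=100, N=1000)
p_value = hypergeom_upper_tail(k=6, n=20, K=100, N=1000)
print(f"fold enrichment = {fold:.1f}, P = {p_value:.4f}")
```

With these illustrative counts, 2 loci would be expected in the module by chance, so observing 6 gives a 3-fold enrichment.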
Reversibility of airway obstruction in response to β2-agonists is highly variable among asthmatics, which is partially attributed to genetic factors. In a genome-wide association study of acute bronchodilator response (BDR) to inhaled albuterol, 534,290 single nucleotide polymorphisms (SNPs) were tested in 403 white trios from the Childhood Asthma Management Program using five statistical models to determine the most robust genetic associations. The primary replication phase included 1397 polymorphisms in three asthma trials (pooled n=764). The second replication phase tested 13 SNPs in three additional asthma populations (n=241, n=215, and n=592). An intergenic SNP on chromosome 10, rs11252394, proximal to several excellent biological candidates, significantly replicated (p=1.98×10^−7) in the primary replication trials. An intronic SNP (rs6988229) in the collagen (COL22A1) locus also provided strong replication signals (p=8.51×10^−6). This study applied a robust approach for testing the genetic basis of BDR and identified novel loci associated with this drug response in asthmatics.
pharmacogenetics; asthma; bronchodilator response; genome-wide association study; albuterol
To create surveillance algorithms to detect diabetes and classify type 1 versus type 2 diabetes using structured electronic health record (EHR) data.
RESEARCH DESIGN AND METHODS
We extracted 4 years of data from the EHR of a large, multisite, multispecialty ambulatory practice serving ∼700,000 patients. We flagged possible cases of diabetes using laboratory test results, diagnosis codes, and prescriptions. We assessed the sensitivity and positive predictive value of novel combinations of these data to classify type 1 versus type 2 diabetes among 210 individuals. We applied an optimized algorithm to a live, prospective, EHR-based surveillance system and reviewed 100 additional cases for validation.
The diabetes algorithm flagged 43,177 patients. All criteria contributed unique cases: 78% had diabetes diagnosis codes, 66% fulfilled laboratory criteria, and 46% had suggestive prescriptions. The sensitivity and positive predictive value of ICD-9 codes for type 1 diabetes were 26% (95% CI 12–49) and 94% (83–100) for type 1 codes alone; 90% (81–95) and 57% (33–86) for two or more type 1 codes plus any number of type 2 codes. An optimized algorithm incorporating the ratio of type 1 versus type 2 codes, plasma C-peptide and autoantibody levels, and suggestive prescriptions flagged 66 of 66 (100% [96–100]) patients with type 1 diabetes. On validation, the optimized algorithm correctly classified 35 of 36 patients with type 1 diabetes (raw sensitivity, 97% [87–100], population-weighted sensitivity, 65% [36–100], and positive predictive value, 88% [78–98]).
Algorithms applied to EHR data detect more cases of diabetes than claims codes and reasonably discriminate between type 1 and type 2 diabetes.
Motivation: Galaxy is a software application supporting high-throughput biology analyses and workflows, available as a free online service or as source code for local deployment. New tools can be written to extend Galaxy, and these can be shared using public Galaxy Tool Shed (GTS) repositories, but converting even simple scripts into tools requires effort from a skilled developer.
Results: The Tool Factory is a novel Galaxy tool that automates the generation of all code needed to execute user-supplied scripts, and wraps them into new Galaxy tools for upload to a GTS, ready for review and installation through the Galaxy administrative interface.
Availability and implementation: The Galaxy administrative interface supports automated installation from the main GTS. Source code and support are available at the project website, https://bitbucket.org/fubar/galaxytoolfactory. The Tool Factory is implemented as an installable Galaxy tool.
Rationale: Variability in pulmonary disease severity is found in patients with cystic fibrosis (CF) who have identical mutations in the CF transmembrane conductance regulator (CFTR) gene. We hypothesized that one factor accounting for heterogeneity in pulmonary disease severity is variation in the family of genes affecting the biology of interleukin-1 (IL-1), which impacts acquisition and maintenance of Pseudomonas aeruginosa infection in animal models of chronic infection. Methods: We genotyped 58 single nucleotide polymorphisms (SNPs) in the IL-1 gene cluster in 808 CF subjects from the University of North Carolina and Case Western Reserve University (UNC/CWRU) joint cohort. All were homozygous for ΔF508, and categories of “severe” (cases) or “mild” (control subjects) lung disease were defined by the lowest or highest quartile of forced expired volume (FEV1) for age in the CF population. After adjustment for age and gender, genotypic data were tested for association with lung disease severity. Odds ratios (ORs) comparing severe versus mild CF were also calculated for each genotype (with the homozygote major allele as the reference group) for all 58 SNPs. From these analyses, nine SNPs with a moderate effect size, OR ≤ 0.5 or > 1.5, were selected for further testing. To replicate the case-control study results, we genotyped the same nine SNPs in a second population of CF parent-offspring trios (recruited from Children’s Hospital Boston), in which the offspring had similar pulmonary phenotypes. For the trio analysis, both family-based and population-based association analyses were performed. Results: SNPs rs1143634 and rs1143639 in the IL1B gene demonstrated a consistent association with lung disease severity categories (P < 0.10) and longitudinal analysis of lung disease severity (P < 0.10) in CF in both the case-control and family-based studies.
In females, there was a consistent association (false discovery rate adjusted joint P-value < 0.06 for both SNPs) in both the analysis of lung disease severity in the UNC/CWRU cohort and the family-based analysis of affection status. Conclusion: Our findings suggest that IL1β is a clinically relevant modulator of CF lung disease.
gene modifiers; cystic fibrosis; CFTR; IL-1 gene family
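The SNP selection step in the cystic fibrosis abstract above (an odds ratio per genotype against the homozygous major-allele reference, then a filter of OR ≤ 0.5 or > 1.5) can be sketched as follows. This is an unadjusted 2×2 calculation with hypothetical counts; the study itself adjusted for age and gender, which this sketch does not do.

```python
def odds_ratio(severe_geno, mild_geno, severe_ref, mild_ref):
    """Odds of severe disease with the genotype relative to the reference genotype."""
    return (severe_geno * mild_ref) / (mild_geno * severe_ref)

def has_moderate_effect(or_value):
    """Selection rule quoted in the abstract: OR <= 0.5 or OR > 1.5."""
    return or_value <= 0.5 or or_value > 1.5

# Hypothetical genotype counts, not data from the study.
or_value = odds_ratio(severe_geno=30, mild_geno=15, severe_ref=100, mild_ref=100)
print(f"OR = {or_value:.2f}, selected = {has_moderate_effect(or_value)}")
```

The asymmetric thresholds capture both protective genotypes (OR at or below 0.5) and risk genotypes (OR above 1.5).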
IL10 is an anti-inflammatory cytokine that has been found to have lower production in macrophages and mononuclear cells from asthmatics. Since reduced IL10 levels may influence the severity of asthma phenotypes, we examined IL10 single-nucleotide polymorphisms (SNPs) for association with asthma severity and allergy phenotypes as quantitative traits. Utilizing DNA samples from 518 Caucasian asthmatic children from the Childhood Asthma Management Program (CAMP) and their parents, we genotyped six IL10 SNPs: 3 in the promoter, 2 in introns, and one in the 3′ UTR. Using family-based association tests, each SNP was tested for association with asthma and allergy phenotypes individually. Population-based association analysis was performed with each SNP locus, the promoter haplotypes and the 6-loci haplotypes. The 3′ UTR SNP was significantly associated with FEV1 as a percent of predicted (FEV1PP) (P=0.0002) in both the family and population analyses. The promoter haplotype GCC was positively associated with IgE levels and FEV1PP (P=0.007 and 0.012, respectively). The promoter haplotype ATA was negatively associated with lnPC20 and FEV1PP (P=0.008 and 0.043, respectively). Polymorphisms in IL10 are associated with asthma phenotypes in this cohort. Further studies of variation in the IL10 gene may help elucidate the mechanism of asthma development in children.
interleukin 10 (IL10); single nucleotide polymorphism (SNP); genetic association; family-based association test (FBAT); haplotype; promoter; 3′ untranslated region (3′UTR)
The response to treatment for asthma is characterized by wide interindividual variability, with a significant number of patients who have no response. We hypothesized that a genomewide association study would reveal novel pharmacogenetic determinants of the response to inhaled glucocorticoids.
We analyzed a small number of statistically powerful variants selected on the basis of a family-based screening algorithm from among 534,290 single-nucleotide polymorphisms (SNPs) to determine changes in lung function in response to inhaled glucocorticoids. A significant, replicated association was found, and we characterized its functional effects.
We identified a significant pharmacogenetic association at SNP rs37972, replicated in four independent populations totaling 935 persons (P = 0.0007), which maps to the glucocorticoid-induced transcript 1 gene (GLCCI1) and is in complete linkage disequilibrium (i.e., perfectly correlated) with rs37973. Both rs37972 and rs37973 are associated with decrements in GLCCI1 expression. In isolated cell systems, the rs37973 variant is associated with significantly decreased luciferase reporter activity. Pooled data from treatment trials indicate reduced lung function in response to inhaled glucocorticoids in subjects with the variant allele (P = 0.0007 for pooled data). Overall, the mean (± SE) increase in forced expiratory volume in 1 second in the treated subjects who were homozygous for the mutant rs37973 allele was only about one third of that seen in similarly treated subjects who were homozygous for the wild-type allele (3.2 ± 1.6% vs. 9.4 ± 1.1%), and their risk of a poor response was significantly higher (odds ratio, 2.36; 95% confidence interval, 1.27 to 4.41), with genotype accounting for about 6.6% of overall inhaled glucocorticoid response variability.
A functional GLCCI1 variant is associated with substantial decrements in the response to inhaled glucocorticoids in patients with asthma. (Funded by the National Institutes of Health and others; ClinicalTrials.gov number, NCT00000575.)
Reliable identification of cis regulatory elements influencing transcription remains a challenging problem in molecular
bioinformatics. This is especially true for enhancer elements which are often located hundreds of kilobases from the gene promoter.
High resolution DNase hypersensitivity and connectivity profiling by the ENCODE consortium provides evidence of millions of
interacting cis-acting elements in the human genome. This prior knowledge can be incorporated into genome-wide expression
analyses, in the form of gene sets sharing regulatory sequence motifs in known DNase hypersensitivity peak regions. High
proportions of enrichment among the most extreme differentially transcribed genes from controlled biological experiments may
suggest novel hypotheses about signalling pathways. The utility of this approach is demonstrated with the reanalysis of a
microarray-derived gene expression data set through the Gene Set Enrichment Analysis pipeline, uncovering new putative distal
cis elements in the context of innate immunity. The DNase Hypersensitivity Connectivity informed Motif Enrichment in Gene
Expression (DHC-MEGE) method described here has the advantage of identifying distal elements such as enhancers, which are
often overlooked with standard promoter motif analysis.
The DHC-MEGE shell script can be obtained from SourceForge at https://sourceforge.net/projects/dhcmege/ and the
generated GMT file is attached as supplementary data.
DNase hypersensitivity; motif enrichment; gene expression; enhancer; gene set enrichment analysis
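The gene sets described in the DHC-MEGE abstract above are distributed as a GMT file, the tab-delimited format read by Gene Set Enrichment Analysis: each line holds a set name, a description, and then the member genes. A minimal sketch of writing and reading one such line; the set name, description, and gene symbols are hypothetical and are not taken from the published GMT file.

```python
def gmt_line(set_name, description, genes):
    """Format one gene set as a GMT record (tab-separated fields)."""
    return "\t".join([set_name, description, *genes])

def parse_gmt_line(line):
    """Split a GMT record into (set name, description, list of genes)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], fields[2:]

# Hypothetical motif-based gene set for illustration only.
record = gmt_line("MOTIF_X_DHS_LINKED", "na", ["TLR4", "NFKB1", "IL6"])
name, description, genes = parse_gmt_line(record)
print(name, len(genes))
```

Grouping genes by a shared motif in linked DNase hypersensitivity peaks, one set per line, is what lets a standard enrichment pipeline consume the prior knowledge without modification.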
Electronic medical record (EMR) systems are a rich potential source for detailed, timely, and efficient surveillance of large populations. We created the Electronic medical record Support for Public Health (ESP) system to facilitate and demonstrate the potential advantages of harnessing EMRs for public health surveillance. ESP organizes and analyzes EMR data for events of public health interest and transmits electronic case reports or aggregate population summaries to public health agencies as appropriate. It is designed to be compatible with any EMR system and can be customized to different states’ messaging requirements. All ESP code is open source and freely available. ESP currently has modules for notifiable disease, influenza-like illness syndrome, and diabetes surveillance.
An intelligent presentation system for ESP called the RiskScape is under development. The RiskScape displays surveillance data in an accessible and intelligible format by automatically mapping results by zip code, stratifying outcomes by demographic and clinical parameters, and enabling users to specify custom queries and stratifications. The goal of RiskScape is to provide public health practitioners with rich, up-to-date views of health measures that facilitate timely identification of health disparities and opportunities for targeted interventions. ESP installations are currently operational in Massachusetts and Ohio, providing live, automated surveillance on over 1 million patients. Additional installations are underway at two more large practices in Massachusetts.
Genome-wide association studies of human gene expression promise to identify functional regulatory genetic variation that contributes to phenotypic diversity. However, it is unclear how useful this approach will be for the identification of disease-susceptibility variants. We generated gene expression profiles for 22 184 mRNA transcripts using RNA derived from peripheral blood CD4+ lymphocytes, and genome-wide genotype data for 516 512 autosomal markers in 200 subjects. We screened for cis-acting variants by testing variants mapping within 50 kb of expressed transcripts for association with transcript abundance using generalized linear models. Significant associations were identified for 1585 genes at a false discovery rate of 0.05 (corresponding to P-values ranging from 1 × 10^−91 to 7 × 10^−4). Importantly, we identified evidence of regulatory variation for 119 previously mapped disease genes, including 24 examples where the variant with the strongest evidence of disease-association demonstrates strong association with specific transcript abundance. The prevalence of cis-acting variants among disease-associated genes was 63% higher than the genome-wide rate in our data set (P = 6.41 × 10^−6), and although many of the implicated loci were associated with immune-related diseases (including asthma, connective tissue disorders and inflammatory bowel disease), associations with genes implicated in non-immune-related diseases including lipid profiles, anthropometric measurements, cancer and neurologic disease were also observed. Genetic variants that confer inter-individual differences in gene expression represent an important subset of variants that contribute to disease susceptibility. Population-based integrative genetic approaches can help identify such variation and enhance our understanding of the genetic basis of complex traits.
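The cis screen in the expression abstract above restricts testing to variants mapping within 50 kb of an expressed transcript. A minimal sketch of that filter, assuming the window is measured from the transcript boundaries (the abstract does not say whether the start, the end, or both were used); the dictionary layout and coordinates are hypothetical.

```python
CIS_WINDOW_BP = 50_000  # 50 kb, as stated in the abstract

def is_cis(variant, transcript, window=CIS_WINDOW_BP):
    """True if the variant lies on the transcript's chromosome within the window."""
    same_chrom = variant["chrom"] == transcript["chrom"]
    lower = transcript["start"] - window
    upper = transcript["end"] + window
    return same_chrom and lower <= variant["pos"] <= upper

# Hypothetical coordinates for illustration only.
transcript = {"chrom": "chr1", "start": 1_000_000, "end": 1_020_000}
variants = [
    {"id": "snp_a", "chrom": "chr1", "pos": 960_000},    # 40 kb upstream
    {"id": "snp_b", "chrom": "chr1", "pos": 1_080_000},  # 60 kb downstream
    {"id": "snp_c", "chrom": "chr2", "pos": 1_010_000},  # other chromosome
]
cis_ids = [v["id"] for v in variants if is_cis(v, transcript)]
print(cis_ids)
```

Limiting the tests to nearby variants keeps the number of variant-transcript pairs, and hence the multiple-testing burden, far below that of an all-against-all scan.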
A 900-kb inversion exists within a large region of conserved linkage disequilibrium (LD) on chromosome 17. CRHR1 is located within the inversion region and associated with inhaled corticosteroid response in asthma. We hypothesized that CRHR1 variants are in LD with the inversion, supporting a potential role for natural selection in the genetic response to corticosteroids. We genotyped 6 single nucleotide polymorphisms (SNPs) spanning chr17:40,410,565–42,372,240, including 4 SNPs defining inversion status. Similar allele frequencies and strong LD were noted between the inversion and a CRHR1 SNP previously associated with lung function response to inhaled corticosteroids. Each inversion-defining SNP was strongly associated with inhaled corticosteroid response in adult asthma (p-values 0.002–0.005). The CRHR1 response to inhaled corticosteroids may thus be explained by natural selection resulting from inversion status or by long-range LD with another gene. Additional pharmacogenetic investigations into regions of chromosomal diversity, including copy number variation and inversions, are warranted.
CRHR1; tau haplotype; MAPT; inversion; asthma; corticosteroid; pharmacogenetics
Corticotropin-releasing hormone receptor 2 (CRHR2) participates in smooth muscle relaxation response and may influence acute airway bronchodilator response to short-acting β2 agonist treatment of asthma. We aim to assess associations between genetic variants of CRHR2 and acute bronchodilator response in asthma.
We investigated 28 single nucleotide polymorphisms in CRHR2 for associations with acute bronchodilator response to albuterol in 607 Caucasian asthmatic subjects recruited as part of the Childhood Asthma Management Program (CAMP). Replication was conducted in two Caucasian adult asthma cohorts: a cohort of 427 subjects enrolled in a completed clinical trial conducted by Sepracor Inc. (MA, USA) and a cohort of 152 subjects enrolled in the Clinical Trial of Low-Dose Theophylline and Montelukast (LODO) conducted by the American Lung Association Asthma Clinical Research Centers.
Five variants were significantly associated with acute bronchodilator response in at least one cohort (p-value ≤ 0.05). Variant rs7793837 was associated in CAMP and LODO (p-value = 0.05 and 0.03, respectively) and haplotype blocks residing at the 5’ end of CRHR2 were associated with response in all three cohorts.
We report for the first time, at the gene level, replicated associations between CRHR2 and acute bronchodilator response. While no single variant was significantly associated in all three cohorts, the finding that variants at the 5’ end of CRHR2 are associated in each of the three cohorts strongly suggests that the causative variants reside in this region and that their genetic effect, although present, is likely to be weak.
Asthma; genetics; corticotrophin releasing hormone receptor 2; CRHR2; bronchodilator response; polymorphism; β2 adrenergic receptor agonist
Electronic health records (EHRs) have the potential to improve completeness and timeliness of tuberculosis (TB) surveillance relative to traditional reporting, particularly for culture-negative disease. We report on the development and validation of a TB detection algorithm for EHR data followed by implementation in a live surveillance and reporting system.
We used structured electronic data from an ambulatory practice in eastern Massachusetts to develop a screening algorithm aimed at achieving 100% sensitivity for confirmed active TB with the highest possible positive predictive value (PPV) for physician-suspected disease. We validated the algorithm in 16 years of retrospective electronic data and then implemented it in a real-time EHR-based surveillance system. We assessed PPV and the completeness of case capture relative to conventional reporting in 18 months of prospective surveillance.
The final algorithm required a prescription for pyrazinamide, an International Classification of Diseases, Ninth Revision (ICD-9) code for TB and prescriptions for two antituberculous medications, or an ICD-9 code for TB and an order for a TB diagnostic test. During validation, this algorithm had a PPV of 84% (95% confidence interval 78, 88) for physician-suspected disease. One-third of confirmed cases were culture-negative. All false-positives were instances of latent TB. In 18 months of prospective EHR-based surveillance with this algorithm, seven additional cases of physician-suspected active TB were detected, including two patients with culture-negative disease. A review of state health department records revealed no cases missed by the algorithm.
Live, prospective TB surveillance using EHR data is feasible and promising.
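The final TB detection rule stated in the abstract above is a disjunction of three criteria, which can be written as a single boolean expression. A minimal sketch; the function and parameter names are hypothetical, and reading "two antituberculous medications" as "at least two" is an assumption.

```python
def flags_possible_tb(has_pyrazinamide_rx, has_tb_icd9_code,
                      n_antituberculous_rx, has_tb_test_order):
    """Apply the three alternative criteria described in the abstract."""
    return (
        has_pyrazinamide_rx
        # Assumption: "two antituberculous medications" means two or more.
        or (has_tb_icd9_code and n_antituberculous_rx >= 2)
        or (has_tb_icd9_code and has_tb_test_order)
    )

# Hypothetical patient summaries for illustration only.
print(flags_possible_tb(True, False, 0, False))   # pyrazinamide alone
print(flags_possible_tb(False, True, 1, False))   # code plus a single drug
print(flags_possible_tb(False, True, 0, True))    # code plus diagnostic test order
```

Under this reading, a diagnosis code on its own does not trigger a flag; it must be paired with either treatment or a diagnostic work-up.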
Rationale: Several family-based studies have identified genetic linkage for lung function and airflow obstruction to chromosome 2q.
Objectives: We hypothesized that merging results of high-resolution single nucleotide polymorphism (SNP) mapping in four separate populations would lead to the identification of chronic obstructive pulmonary disease (COPD) susceptibility genes on chromosome 2q.
Methods: Within the chromosome 2q linkage region, 2,843 SNPs were genotyped in 806 COPD cases and 779 control subjects from Norway, and 2,484 SNPs were genotyped in 309 patients with severe COPD from the National Emphysema Treatment Trial and 330 community control subjects. Significant associations from the combined results across the two case-control studies were followed up in 1,839 individuals from 603 families from the International COPD Genetics Network (ICGN) and in 949 individuals from 127 families in the Boston Early-Onset COPD Study.
Measurements and Main Results: Merging the results of the two case-control analyses, 14 of the 790 overlapping SNPs had a combined P < 0.01. Two of these 14 SNPs were consistently associated with COPD in the ICGN families. The association with one SNP, located in the gene XRCC5, was replicated in the Boston Early-Onset COPD Study, with a combined P = 2.51 × 10^−5 across the four studies, which remains significant when adjusted for multiple testing (P = 0.02). Genotype imputation confirmed the association with SNPs in XRCC5.
Conclusions: By combining data from COPD genetic association studies conducted in four independent patient samples, we have identified XRCC5, an ATP-dependent DNA helicase, as a potential COPD susceptibility gene.
emphysema; genetic linkage; metaanalysis; single nucleotide polymorphism
Pathogens have represented an important selective force during the adaptation of modern human populations to changing social and other environmental conditions. The evolution of the immune system has therefore been influenced by these pressures. Genomic scans have revealed that the immune system is one of the functions enriched with genes under adaptive selection.
Here, we describe how the innate immune system has responded to these challenges, through the analysis of resequencing data for 132 innate immunity genes in two human populations. Results are interpreted in the context of the functional and interaction networks defined by these genes. Nucleotide diversity is lower in the adaptors and modulators functional classes, and is negatively correlated with the centrality of the proteins within the interaction network. We also produced a list of candidate genes under positive or balancing selection in each population detected by neutrality tests and showed that some functional classes are preferential targets for selection.
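For readers unfamiliar with the statistic, nucleotide diversity is conventionally the average number of pairwise differences per site among aligned sequences. The sketch below illustrates that standard definition on toy data; it is not the authors' pipeline, and the function name and input format are assumptions.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site for aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment: 4 differences over 3 pairs and 4 sites.
pi = nucleotide_diversity(["ACGT", "ACGA", "ACTT"])
```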
We found evidence that the role of each gene in the network conditions its capacity to evolve, or evolvability: genes at the core of the network are more constrained, while adaptation mostly occurred at particular positions at the network edges. Interestingly, the functional classes containing most of the genes with signatures of balancing selection are involved in autoinflammatory and autoimmune diseases, suggesting a counterbalance between the beneficial and deleterious effects of the immune response.
Network modeling of whole transcriptome expression data enables characterization of complex epistatic (gene-gene) interactions that underlie cellular functions. Though numerous methods have been proposed and successfully implemented to develop these networks, there are no formal methods for comparing differences in network connectivity patterns as a function of phenotypic trait.
Here we describe a novel approach for quantifying the differences in gene-gene connectivity patterns across disease states based on Graphical Gaussian Models (GGMs). We compare the posterior probabilities of connectivity for each gene pair across two disease states, expressed as a posterior odds-ratio (postOR) for each pair, which can be used to identify network components most relevant to disease status. The method can also be generalized to model differential gene connectivity patterns within previously defined gene sets, gene networks and pathways. We demonstrate that the GGM method reliably detects differences in network connectivity patterns in datasets of varying sample size. Applying this method to two independent breast cancer expression datasets, we identified numerous reproducible differences in network connectivity across histological grades of breast cancer, including several published gene sets and pathways. Most notably, our model identified two gene hubs (MMP12 and CXCL13) that each exhibited differential connectivity to more than 30 transcripts in both datasets. Both genes have been previously implicated in breast cancer pathobiology, but neither is itself differentially expressed by histologic grade in either dataset, and thus neither would have been identified using traditional differential gene expression testing approaches. In addition, 16 curated gene sets demonstrated significant differential connectivity in both datasets, including the matrix metalloproteinases, PPAR alpha sequence targets, and the PUFA synthesis pathway.
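The exact form of the postOR is not given here. A minimal sketch, assuming it is the conventional ratio of posterior odds of an edge between the two disease states, might look as follows; the function name, argument names, and clipping constant are hypothetical.

```python
def posterior_odds_ratio(p_state_a: float, p_state_b: float,
                         eps: float = 1e-12) -> float:
    """Ratio of posterior odds of connectivity in state A versus state B.

    p_state_a and p_state_b are posterior probabilities that a gene pair
    is connected in each disease state. Probabilities are clipped to
    [eps, 1 - eps] to avoid division by zero (an implementation choice).
    """
    p_a = min(max(p_state_a, eps), 1.0 - eps)
    p_b = min(max(p_state_b, eps), 1.0 - eps)
    odds_a = p_a / (1.0 - p_a)
    odds_b = p_b / (1.0 - p_b)
    return odds_a / odds_b


# A pair likely connected in state A (0.9) but not in state B (0.1)
# yields a large ratio; equal probabilities yield a ratio of 1.
example_ratio = posterior_odds_ratio(0.9, 0.1)
```

Under this reading, values far from 1 in either direction mark gene pairs whose connectivity differs between states.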
Our results suggest that GGM can be used to formally evaluate differences in global interactome connectivity across disease states, and can serve as a powerful tool for exploring the molecular events that contribute to disease at a systems level.
Rationale: Animal models demonstrate that aberrant gene expression in utero can result in abnormal pulmonary phenotypes.
Objectives: We sought to identify genes that are differentially expressed during in utero airway development and test the hypothesis that variants in these genes influence lung function in patients with asthma.
Methods: Stage 1 (Gene Expression): Differential gene expression analysis across the pseudoglandular (n = 27) and canalicular (n = 9) stages of human lung development was performed using regularized t tests with multiple comparison adjustments. Stage 2 (Genetic Association): Genetic association analyses of lung function (FEV1, FVC, and FEV1/FVC) for variants in five differentially expressed genes were conducted in 403 parent-child trios from the Childhood Asthma Management Program (CAMP). Associations were replicated in 583 parent-child trios from the Genetics of Asthma in Costa Rica study.
Measurements and Main Results: Of the 1,776 differentially expressed genes between the pseudoglandular (gestational age: 7–16 wk) and the canalicular (gestational age: 17–26 wk) stages, we selected 5 genes in the Wnt pathway for association testing. Thirteen single nucleotide polymorphisms in three genes demonstrated association with lung function in CAMP (P < 0.05), and associations for two of these genes were replicated in the Costa Ricans: Wnt1-inducible signaling pathway protein 1 with FEV1 (combined P = 0.0005) and FVC (combined P = 0.0004), and Wnt inhibitory factor 1 with FVC (combined P = 0.003) and FEV1/FVC (combined P = 0.003).
Conclusions: Wnt signaling genes are associated with impaired lung function in two childhood asthma cohorts. Furthermore, gene expression profiling of human fetal lung development can be used to identify genes implicated in the pathogenesis of lung function impairment in individuals with asthma.
asthma; lung development; lung function; genetic variation; gene expression
Prior studies suggest a role for a variant (rs5743836) in the promoter of toll-like receptor 9 (TLR9) in asthma and other inflammatory diseases. We performed detailed genetic association studies of the functional variant rs5743836 with asthma susceptibility and asthma-related phenotypes in three independent cohorts.
rs5743836 was genotyped in two family-based cohorts of children with asthma and a case-control study of adult asthmatics. Association analyses were performed using chi-square, family-based and population-based testing. A luciferase assay was performed to investigate whether rs5743836 genotype influences TLR9 promoter activity.
Contrary to prior reports, rs5743836 was not associated with asthma in any of the three cohorts. Marginally significant associations were found with FEV1 and FVC (p = 0.003 and p = 0.008, respectively) in one of the family-based cohorts, but these associations were not significant after correcting for multiple comparisons. Higher promoter activity of the CC genotype was demonstrated by luciferase assay, confirming the functional importance of this variant.
Although rs5743836 confers regulatory effects on TLR9 transcription, this variant does not appear to be an important asthma-susceptibility locus.
Genetic variation at the MYH9 locus is linked to the high incidence of focal segmental glomerulosclerosis (FSGS) and non-diabetic end-stage renal disease among African Americans. To further define risk alleles for FSGS, we performed a genome-wide association analysis using more than one million single nucleotide polymorphisms in 56 African American and 61 European American patients with biopsy-confirmed FSGS. Results were compared to 1641 European and 1800 African Americans as unselected controls. While no association was observed in the cohort of European Americans, the case-control comparison of African Americans found variants within a 60 kb region of chromosome 22 containing part of the APOL1 and MYH9 genes associated with increased risk of FSGS. This region spans different linkage disequilibrium blocks, and variants associated with disease within this region are in linkage disequilibrium with variants that have shown signals of natural selection. APOL1 is a strong candidate for a gene that has undergone recent natural selection and is known to be involved in infection by Trypanosoma brucei, a parasite common in Africa that has recently adapted to infect human hosts. Further studies will be required to establish which variants are causally related to kidney disease, what mutations caused the selective sweep, and ultimately to determine whether these are the same.
focal segmental glomerulosclerosis; end stage kidney disease; genetic renal disease