Responses by resident cells are likely to play a key role in determining the severity of respiratory disease. However, sampling of the airways poses a significant challenge, particularly in infants and children. Here, we report a reliable method for obtaining nasal epithelial cell RNA from infants for genome-wide transcriptomic analysis, and describe baseline expression characteristics in an asymptomatic cohort. Nasal epithelial cells were collected by brushing of the inferior turbinates, and gene expression was interrogated by RNA-seq analysis. Reliable recovery of RNA occurred in the absence of adverse events. We observed high expression of epithelial cell markers and similarity to the transcriptome for intrapulmonary airway epithelial cells. We identified genes displaying low and high expression variability, both inherently and in response to environmental exposures. The greatest gene expression differences in this asymptomatic cohort were associated with the presence of known pathogenic viruses and/or bacteria. Robust bacteria-associated gene expression patterns were significantly associated with the presence of Moraxella. In summary, we have developed a reliable method for interrogating the infant airway transcriptome by sampling the nasal epithelium. Our data demonstrate both the fidelity and feasibility of our methodology, and describe normal gene expression and variation within a healthy infant cohort.
DNA methylation, a major epigenetic mechanism, may regulate coordinated expression of multiple genes at specific time points during alveolar septation in lung development. The objective of this study was to identify genes regulated by methylation during normal septation in mice and during disordered septation in bronchopulmonary dysplasia. In mice, newborn lungs (preseptation) and adult lungs (postseptation) were evaluated by microarray analysis of gene expression and immunoprecipitation of methylated DNA followed by sequencing (MeDIP-Seq). In humans, microarray gene expression data were integrated with genome-wide DNA methylation data from bronchopulmonary dysplasia versus preterm and term lung. Genes with reciprocal changes in expression and methylation, suggesting regulation by DNA methylation, were identified. In mice, 95 genes with inverse correlation between expression and methylation during normal septation were identified. In addition to genes known to be important in lung development (Wnt signaling, Angpt2, Sox9, etc.) and its extracellular matrix (Tnc, Eln, etc.), genes involved with immune and antioxidant defense (Stat4, Sod3, Prdx6, etc.) were also observed. In humans, 23 genes were differentially methylated with reciprocal changes in expression in bronchopulmonary dysplasia compared with preterm or term lung. Genes of interest included those involved with detoxifying enzymes (Gstm3) and transforming growth factor-β signaling (bone morphogenetic protein 7 [Bmp7]). In terms of overlap, 20 genes and three pathways methylated during mouse lung development also demonstrated changes in methylation between preterm and term human lung. Changes in methylation correspond to altered expression of a number of genes associated with lung development, suggesting that DNA methylation of these genes may regulate normal and abnormal alveolar septation.
lung development; premature infant; epigenetics; bronchopulmonary dysplasia
To identify single nucleotide polymorphisms (SNPs) and pathways
associated with bronchopulmonary dysplasia (BPD), because the risk of O2
requirement at 36 weeks’ post-menstrual age is strongly
influenced by heritable factors.
A genome-wide scan was conducted on 1.2 million genotyped SNPs, and
an additional 7 million imputed SNPs, using a DNA repository of extremely
low birth weight infants. Genome-wide association and gene set analyses were
performed for BPD or death, severe BPD or death, and severe BPD in
survivors. Specific targets were validated using gene expression in BPD lung
tissue and in mouse models.
Of 751 infants analyzed, 428 developed BPD or died. No SNPs achieved
genome-wide significance (p<10^−8), although
multiple SNPs in adenosine deaminase (ADARB2), CD44, and other genes were
just below p<10^−6. Of approximately 8000
pathways, 75 were significant at False Discovery Rate (FDR) <0.1 and
p<0.001 for BPD/death, 95 for severe BPD/death, and 90 for severe
BPD in survivors. The pathway with lowest FDR was miR-219 targets
(p=1.41E-08, FDR 9.5E-05) for BPD/death and Phosphorus Oxygen Lyase
Activity (includes adenylate and guanylate cyclases) for both severe
BPD/death (p=5.68E-08, FDR 0.00019) and severe BPD in survivors (p=3.91E-08,
FDR 0.00013). Gene expression analysis confirmed significantly increased
miR-219 and CD44 in BPD.
Pathway analyses confirmed involvement of known pathways of lung
development and repair (CD44, Phosphorus Oxygen Lyase Activity) and
indicated novel molecules and pathways (ADARB2, Targets of miR-219) involved
in genetic predisposition to BPD.
Bronchopulmonary dysplasia; Infant; premature; Infant mortality; Single nucleotide polymorphisms
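The pathway screen above keeps gene sets that pass both a False Discovery Rate cutoff (FDR < 0.1) and a nominal cutoff (p < 0.001). The abstract does not say which FDR procedure was used, so the sketch below assumes the Benjamini-Hochberg adjustment. The function name and the toy p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here for illustration only; the abstract
    does not name the FDR method.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical pathway p-values (toy data, not study results).
toy_p = [0.0001, 0.004, 0.03, 0.2, 0.8]
q = benjamini_hochberg(toy_p)

# Apply both cutoffs quoted in the abstract: FDR < 0.1 and p < 0.001.
significant = [i for i, (p, qv) in enumerate(zip(toy_p, q))
               if qv < 0.1 and p < 0.001]
```

With roughly 8000 pathways tested, the multiplier `m / rank` is what makes a small nominal p-value necessary before a gene set survives the FDR cutoff.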
A greater understanding of the regulatory processes contributing to lung development could be helpful to identify strategies to ameliorate morbidity and mortality in premature infants and to identify individuals at risk for congenital and/or chronic lung diseases. Over the past decade, genomics technologies have enabled the production of rich gene expression databases providing information for all genes across developmental time or in diseased tissue. These data sets facilitate systems biology approaches for identifying underlying biological modules and programs contributing to the complex processes of normal development, and those that may be associated with disease states. The next decade will undoubtedly see rapid and significant advances in redefining both lung development and disease at the systems level.
Rationale: Bronchopulmonary dysplasia (BPD) is a major complication of premature birth. Risk factors for BPD are complex and include prenatal infection and O2 toxicity. BPD pathology is equally complex and characterized by inflammation and dysmorphic airspaces and vasculature. Due to the limited availability of clinical samples, an understanding of the molecular pathogenesis of this disease and its causal mechanisms and associated biomarkers is limited.
Objectives: Apply genome-wide expression profiling to define pathways affected in BPD lungs.
Methods: Lung tissue was obtained at autopsy from 11 BPD cases and 17 age-matched control subjects without BPD. RNA isolated from these tissue samples was interrogated using microarrays. Standard gene selection and pathway analysis methods were applied to the data set. Abnormal expression patterns were validated by quantitative reverse transcriptase–polymerase chain reaction and immunohistochemistry.
Measurements and Main Results: We identified 159 genes differentially expressed in BPD tissues. Pathway analysis indicated previously appreciated (e.g., DNA damage regulation of cell cycle) as well as novel (e.g., B-cell development) biological functions were affected. Three of the five most highly induced genes were mast cell (MC)-specific markers. We confirmed an increased accumulation of connective tissue MCTC (chymase expressing) mast cells in BPD tissues. Increased expression of MCTC markers was also demonstrated in an animal model of BPD-like pathology.
Conclusions: We present a unique genome-wide expression data set from human BPD lung tissue. Our data provide information on gene expression patterns associated with BPD and facilitated the discovery that MCTC accumulation is a prominent feature of this disease. These observations have significant clinical and mechanistic implications.
microarray; tryptase; chymase; carboxypeptidase A3; bronchopulmonary dysplasia
Traditional genome-wide association studies (GWAS) of large cohorts of subjects with chronic obstructive pulmonary disease (COPD) have successfully identified novel candidate genes, but several other plausible loci do not meet strict criteria for genome-wide significance after correction for multiple testing.
We hypothesize that by applying unbiased weights derived from unique populations we can identify additional COPD susceptibility loci.
We performed a homozygosity haplotype analysis on a group of subjects with and without COPD to identify regions of conserved homozygosity (RCHH). Weights were constructed based on the frequency of these RCHH in cases vs. controls, and used to adjust the P values from a large collaborative GWAS of COPD.
We identified 2,318 regions of conserved homozygosity, of which 576 were significantly (P < .05) overrepresented in cases. After applying the weights constructed from these regions to a collaborative GWAS of COPD, we identified two single nucleotide polymorphisms in a novel gene (FGF7) that gained genome-wide significance by the false discovery rate method. In a follow-up analysis, both SNPs (rs12591300 and rs4480740) were significantly associated with COPD in an independent population (combined P values of 7.9E-07 and 2.8E-06 respectively). In another independent population, increased lung tissue FGF7 expression was associated with worse measures of lung function.
Weights constructed from a homozygosity haplotype analysis of an isolated population successfully identify novel genetic associations from a GWAS on a separate population. This method can be used to identify promising candidate genes that fail to meet strict correction for multiple testing.
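One common way to combine prior weights with GWAS P values is a weighted false discovery rate procedure: each P value is divided by a weight, the weights are normalized to average 1, and a standard step-up rule is applied to the result. The abstract does not give the exact weighting formula, so the sketch below is an assumption for illustration. The function names, weights, and P values are hypothetical.

```python
def weighted_p_values(p_values, raw_weights):
    """Divide each P value by its weight after scaling weights to mean 1.

    Assumption: this p/w form is one standard weighting scheme; the
    abstract does not state the formula actually used.
    """
    mean_w = sum(raw_weights) / len(raw_weights)
    weights = [w / mean_w for w in raw_weights]
    return [min(1.0, p / w) for p, w in zip(p_values, weights)]


def bh_reject(p_values, alpha):
    """Return sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    return sorted(order[:k_max])


# Toy data: the first two SNPs lie in regions over-represented in cases,
# so they receive larger (hypothetical) raw weights.
toy_p = [0.02, 0.03, 0.3, 0.6]
toy_w = [3.0, 3.0, 1.0, 1.0]

unweighted_hits = bh_reject(toy_p, alpha=0.05)
weighted_hits = bh_reject(weighted_p_values(toy_p, toy_w), alpha=0.05)
```

In this toy case no SNP passes without weights, while the two up-weighted SNPs pass after weighting, which mirrors the idea of recovering plausible loci that narrowly miss strict correction.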
Rationale: Chromosome 12p has been linked to chronic obstructive pulmonary disease (COPD) in the Boston Early-Onset COPD Study (BEOCOPD), but a susceptibility gene in that region has not been identified.
Objectives: We used high-density single-nucleotide polymorphism (SNP) mapping to implicate a COPD susceptibility gene and an animal model to determine the potential role of SOX5 in lung development and COPD.
Methods: On chromosome 12p, we genotyped 1,387 SNPs in 386 COPD cases from the National Emphysema Treatment Trial and 424 control smokers from the Normative Aging Study. SNPs with significant associations were then tested in the BEOCOPD study and the International COPD Genetics Network. Based on the human results, we assessed histology and gene expression in the lungs of Sox5−/− mice.
Measurements and Main Results: In the case-control analysis, 27 SNPs were significant at P ≤ 0.01. The most significant SNP in the BEOCOPD replication was rs11046966 (National Emphysema Treatment Trial–Normative Aging Study P = 6.0 × 10^−4, BEOCOPD P = 1.5 × 10^−5, combined P = 1.7 × 10^−7), located 3′ to the gene SOX5. Association with rs11046966 was not replicated in the International COPD Genetics Network. Sox5−/− mice showed abnormal lung development, with a delay in maturation before the saccular stage, as early as E16.5. Lung pathology in Sox5−/− lungs was associated with a decrease in fibronectin expression, an extracellular matrix component critical for branching morphogenesis.
Conclusions: Genetic variation in the transcription factor SOX5 is associated with COPD susceptibility. A mouse model suggests that the effect may be due, in part, to its effects on lung development and/or repair processes.
chronic obstructive pulmonary disease; emphysema; knockout mice; lung development; single nucleotide polymorphism
To identify non-invasive gene expression markers for chronic obstructive pulmonary disease (COPD), we performed genome-wide expression profiling of peripheral blood samples from 12 subjects with significant airflow obstruction and an equal number of non-obstructed controls. RNA was isolated from Peripheral Blood Mononuclear Cells (PBMCs) and gene expression was assessed using Affymetrix U133 Plus 2.0 arrays.
Tests for gene expression changes that discriminate between COPD cases (FEV1 < 70% predicted, FEV1/FVC < 0.7) and controls (FEV1 > 80% predicted, FEV1/FVC > 0.7) were performed using Significance Analysis of Microarrays (SAM) and Bayesian Analysis of Differential Gene Expression (BADGE). Using either test at high stringency (SAM median FDR = 0 or BADGE p < 0.01) we identified differential expression for 45 known genes. Correlation of gene expression with lung function measurements (FEV1 & FEV1/FVC), using both Pearson and Spearman correlation coefficients (p < 0.05), identified a set of 86 genes. A total of 16 markers showed evidence of significant correlation (p < 0.05) with quantitative traits and differential expression between cases and controls. We further compared our peripheral gene expression markers with those we previously identified from lung tissue of the same cohort. Two genes, RP9 and NAPE-PLD, were identified as decreased in COPD cases compared to controls in both lung tissue and blood. These results contribute to our understanding of gene expression changes in the peripheral blood of patients with COPD and may provide insight into potential mechanisms involved in the disease.
Microarray; Biomarkers; PBMC
Rationale: The mechanisms contributing to alveolar formation are poorly understood. A better understanding of these processes will improve efforts to ameliorate lung disease of the newborn and promote alveolar repair in the adult. Previous studies have identified impaired alveogenesis in mice bearing compound mutations of fibroblast growth factor (FGF) receptors (FGFRs) 3 and 4, indicating that these receptors cooperatively promote postnatal alveolar formation.
Objectives: To determine the molecular and cellular mechanisms of FGF-mediated alveolar formation.
Methods: Compound FGFR3/FGFR4-deficient mice were assessed for temporal changes in lung growth, airspace morphometry, and genome-wide expression. Observed gene expression changes were validated using quantitative real-time RT-PCR, tissue biochemistry, histochemistry, and ELISA. Autocrine and paracrine regulatory mechanisms were investigated using isolated lung mesenchymal cells and type II pneumocytes.
Measurements and Main Results: Quantitative analysis of airspace ontogeny confirmed a failure of secondary crest elongation in compound mutant mice. Genome-wide expression profiling identified molecular alterations in these mice involving aberrant expression of numerous extracellular matrix molecules. Biochemical and histochemical analysis confirmed that changes in elastic fiber gene expression resulted in temporal increases in elastin deposition with the loss of typical spatial restriction. No abnormalities in elastic fiber gene expression were observed in isolated mesenchymal cells, indicating that abnormal elastogenesis in compound mutant mice is not cell autonomous. Increased expression of paracrine factors, including insulin-like growth factor-1, in freshly isolated type II pneumocytes indicated that these cells contribute to the observed pathology.
Conclusions: Epithelial/mesenchymal signaling mechanisms appear to contribute to FGFR-dependent alveolar elastogenesis and proper airspace formation.
lung development; fibroblast growth factor receptor; alveogenesis; insulin-like growth factor-1; microarray
Rationale: Current understanding of the molecular regulation of lung development is limited and derives mostly from animal studies.
Objectives: To define global patterns of gene expression during human lung development.
Methods: Genome-wide expression profiling was used to measure the developing lung transcriptome in RNA samples derived from 38 normal human lung tissues at 53 to 154 days post conception. Principal component analysis was used to characterize global expression variation and to identify genes and bioontologic attributes contributing to these variations. Individual gene expression patterns were verified by quantitative reverse transcriptase–polymerase chain reaction analysis.
Measurements and Main Results: Gene expression analysis identified attributes not previously associated with lung development, such as chemokine-immunologic processes. Attributes characteristic of the lung (e.g., surfactant function) were observed at an earlier-than-anticipated age. We defined a 3,223-gene developing lung characteristic subtranscriptome capable of describing a majority of the process. In gene expression space, the samples formed a time-contiguous trajectory with transition points correlating with histological stages and suggesting the existence of novel molecular substages. Induction of surfactant gene expression characterized a pseudoglandular “molecular phase” transition. Individual gene expression patterns were independently validated. We predicted the age of independent human lung transcriptome profiles with a median absolute error of 5 days, supporting the validity of the data and modeling approach.
Conclusions: This study extends our knowledge of key gene expression patterns and bioontologic attributes underlying early human lung developmental processes. The data also suggest the existence of molecular phases of lung development.
microarrays; surfactant; principal component analysis
Chronic obstructive pulmonary disease (COPD) is an inflammatory lung disorder with complex pathological features and largely unknown etiology. The identification of biomarkers for this disease could aid the development of methods to facilitate earlier diagnosis, classify disease subtypes, and define therapeutic response. To identify gene expression biomarkers, we completed expression profiling of RNA derived from the lung tissue of 56 subjects with varying degrees of airflow obstruction using the Affymetrix U133 Plus 2.0 array. We applied multiple, independent analytical methods to define biomarkers for either discrete or quantitative disease phenotypes. Analysis of differential expression between cases (n = 15) and controls (n = 18) identified a set of 65 discrete biomarkers. Correlation of gene expression with quantitative measures of airflow obstruction (FEV1 % predicted or FEV1/FVC) identified a set of 220 biomarkers. Biomarker genes were enriched in functions related to DNA binding and regulation of transcription. We used this group of biomarkers to predict disease in an unrelated data set, generated from patients with severe emphysema, with 97% accuracy. Our data contribute to the understanding of gene expression changes occurring in the lung tissue of patients with obstructive lung disease and provide additional insight into potential mechanisms involved in the disease process. Furthermore, we present the first gene expression biomarker for COPD validated in an independent data set.
microarray; gene expression; emphysema; lung function
A greater understanding of the regulatory processes contributing to lung development could help ameliorate morbidity and mortality in premature infants and identify individuals at risk for congenital and/or chronic lung diseases. Genomics technologies have provided rich gene expression datasets for the developing lung that enable systems biology approaches for identifying large-scale molecular signatures within this complex phenomenon. Here, we applied unsupervised principal component analysis on two developing lung datasets and identified common dominant transcriptomic signatures. Of particular interest, we identify an overlying biological program we term “time-to-birth,” which describes the distance in age from the day of birth. We identify groups of genes contributing to the time-to-birth molecular signature. Statistically overrepresented are genes involved in oxygen and gas transport activity, as expected for a transition to air breathing, as well as host defense function. In addition, we identify genes with expression patterns associated with the initiation of alveolar formation. Finally, we present validation of gene expression patterns across the two datasets, and independent validation of select genes by qPCR and immunohistochemistry. These data contribute to our understanding of genetic components contributing to large-scale biological processes and may be useful, particularly in animal models of abnormal lung development, to predict the state of organ development or preparation for birth.
lung development; microarray; principal component analysis
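Unsupervised principal component analysis, as applied to the developing lung datasets above, finds the direction in gene expression space that captures the most variance across samples. The sketch below computes the first principal component by power iteration on a tiny samples-by-genes matrix. The data, gene roles, and function name are hypothetical, and a real analysis would typically use a numerical library on thousands of genes.

```python
def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: list of samples, each a list of gene expression values.
    Power iteration is used only to keep this sketch dependency-free.
    """
    n, d = len(rows), len(rows[0])
    # Center each gene (column) on its mean across samples.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Gene-by-gene sample covariance matrix.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Non-uniform start vector; a start orthogonal to the leading
    # eigenvector would not converge to it.
    v = [1.0 / (2 ** j) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy samples ordered by developmental age (hypothetical values):
# gene 0 rises with age, gene 1 falls, gene 2 is nearly constant.
toy_samples = [
    [2.0, 4.0, 5.0],
    [4.0, 3.0, 5.1],
    [6.0, 2.0, 4.9],
    [8.0, 1.0, 5.0],
]
loadings, scores = first_principal_component(toy_samples)
```

In this toy matrix the first-component scores order the samples by age, which is the kind of time-contiguous signature the abstract describes; the sign of a principal component is arbitrary, so only the ordering is meaningful.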
With the recent development of microarray technologies, the comparability of gene expression data obtained from different platforms poses an important problem. We evaluated two widely used platforms, Affymetrix U133 Plus 2.0 and the Illumina HumanRef-8 v2 Expression Bead Chips, for comparability in a biological system in which changes may be subtle, namely fetal lung tissue as a function of gestational age.
We performed the comparison via sequence-based probe matching between the two platforms. "Significance grouping" was defined as a measure of comparability. Using both expression correlation and significance grouping as measures of comparability, we demonstrated that despite overall cross-platform differences at the single gene level, increased correlation between the two platforms was found in genes with higher expression level, higher probe overlap, and lower p-value. We also demonstrated that biological function as determined via KEGG pathways or GO categories is more consistent across platforms than single gene analysis.
We conclude that while the comparability of the platforms at the single gene level may be increased by increasing sample size, they are highly comparable ontologically even for subtle differences in a relatively small sample size. Biologically relevant inference should therefore be reproducible across laboratories using different platforms.
The utility of previously generated microarray data is severely limited owing to small study size, which leads to under-powered analysis and failure of replication. Multiplicity of platforms and various sources of systematic noise limit the ability to compile existing data from similar studies. We present a model for transformation of data across different generations of Affymetrix arrays, developed using previously published datasets describing technical replicates performed with two generations of arrays. The transformation is based upon a probe set-specific regression model, generated from replicate measurements across platforms, performed using correlation coefficients. The model, when applied to the expression intensities of 5069 shared, sequence-matched probe sets in three different generations of Affymetrix Human oligonucleotide arrays, showed significant improvement in intergeneration correlations between sample-wide means and individual probe set pairs. The approach was further validated by an observed reduction in Euclidean distance between signal intensities across generations for the predicted values. Finally, application of the model to independent, but related datasets resulted in improved clustering of samples based upon their biological, as opposed to technical, attributes. Our results suggest that this transformation method is a valuable tool for integrating microarray datasets from different generations of arrays.
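A probe set-specific regression can be pictured as one fitted line per probe set, learned from technical replicates measured on two array generations and then applied to new intensities from the older generation. The sketch below assumes ordinary least squares on log-scale intensities. The abstract does not specify the model form, so this is an illustration only, and the probe names and values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical technical replicates: (older generation, newer generation)
# intensities for the same samples, one entry per probe set.
replicates = {
    "probe_A": ([6.0, 7.0, 8.0, 9.0], [6.5, 7.5, 8.5, 9.5]),      # constant offset
    "probe_B": ([5.0, 6.0, 7.0, 8.0], [10.0, 12.0, 14.0, 16.0]),  # scale change
}

# One regression model per probe set.
models = {probe: fit_line(old, new) for probe, (old, new) in replicates.items()}


def transform(probe, old_value):
    """Map an older-generation intensity onto the newer-generation scale."""
    slope, intercept = models[probe]
    return slope * old_value + intercept


predicted_a = transform("probe_A", 10.0)
predicted_b = transform("probe_B", 4.0)
```

Fitting each probe set separately lets the correction absorb probe-specific offsets and scale differences, which a single global normalization would average away.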