1.  Gene Expression Profiling in Human Lung Development: An Abundant Resource for Lung Adenocarcinoma Prognosis 
PLoS ONE  2014;9(8):e105639.
A tumor can be viewed as a special “organ” that undergoes aberrant and poorly regulated organogenesis. Progress in cancer prognosis and therapy might be facilitated by re-examining distinctive processes that operate during normal development, to elucidate the intrinsic features of cancer that are significantly obscured by its heterogeneity. The global gene expression signatures of 44 human lung tissues at four developmental stages from subjects of Asian descent and 69 lung adenocarcinoma (ADC) tissue samples from ethnic Chinese patients were profiled using microarrays. All of the genes were classified into 27 distinct groups based on their expression patterns (named PTN1 to PTN27) during the developmental process. In lung ADC, genes whose expression levels decreased steadily during lung development (genes in PTN1) generally had their expression reactivated, while those with uniformly increasing expression levels (genes in PTN27) had their expression suppressed. The genes in PTN1 contain many n-gene signatures that are of prognostic value for lung ADC. The prognostic relevance of a 12-gene demonstrator for patient survival was characterized in five cohorts of healthy and ADC patients [ADC_CICAMS (n = 69, p = 0.007), ADC_PNAS (n = 125, p = 0.0063), ADC_GSE13213 (n = 117, p = 0.0027), ADC_GSE8894 (n = 62, p = 0.01), and ADC_NCI (n = 282, p = 0.045)] and in four groups of stage I patients [ADC_CICAMS (n = 22, p = 0.017), ADC_PNAS (n = 76, p = 0.018), ADC_GSE13213 (n = 79, p = 0.02), and ADC_qPCR (n = 62, p = 0.006)]. In conclusion, by comparing gene expression profiles during the human lung developmental process and lung ADC progression, we revealed that the genes with a uniformly decreasing expression pattern during lung development are of enormous prognostic value for lung ADC.
PMCID: PMC4139381  PMID: 25141350
2.  Clinical biomarkers of pulmonary carcinoid tumors in never smokers via profiling miRNA and target mRNA 
Cell & Bioscience  2014;4:35.
miRNAs play key regulatory roles in cellular pathological processes. We aimed to identify clinically meaningful biomarkers in pulmonary carcinoid tumors (PCTs), a type of neuroendocrine neoplasm, by profiling miRNAs and mRNAs.
From a total of 1145 miRNAs, we obtained 16 and 17 miRNAs that showed positive and negative fold changes (FCs, tumors vs. normal tissues), respectively, among the top 1% differentially expressed miRNAs. We identified the target genes that were predicted by at least two prediction tools and shared by at least one-half of the top miRNAs, which yielded 44 genes (FC<-2) and 56 genes (FC>2), respectively. Higher expression of CREB5, PTPRB and COL4A3 predicted favorable disease-free survival (Hazard ratio: 0.03, 0.19 and 0.36; P value: 0.03, 0.03 and 0.08). Additionally, 79 mutated genes were found in nine PCTs, among which TP53 was the only repeated mutation.
We found that the expression of three genes has clinical implications in PCTs. The biological functions of these biomarkers warrant further studies.
PMCID: PMC4124500  PMID: 25105010
miRNA; mRNA; Carcinoid; Survival
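The target-gene selection rule described in the abstract above (predicted by at least two tools, shared by at least one-half of the top miRNAs) can be sketched as a simple filter. This is only an illustrative sketch: the function name `select_targets`, the triple-based input format, and the miRNA, gene, and tool names in the example are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def select_targets(predictions, top_mirnas, min_tools=2, min_fraction=0.5):
    """Filter predicted miRNA target genes (hypothetical helper).

    predictions: iterable of (mirna, gene, tool) triples.
    top_mirnas:  set of miRNA names considered "top" differentially expressed.
    A (miRNA, gene) pair counts only if at least `min_tools` distinct tools
    predict it; a gene is kept if such pairs cover at least `min_fraction`
    of the top miRNAs.
    """
    # Collect the distinct tools supporting each (miRNA, gene) pair.
    tools_by_pair = defaultdict(set)
    for mirna, gene, tool in predictions:
        if mirna in top_mirnas:
            tools_by_pair[(mirna, gene)].add(tool)

    # Collect the top miRNAs that target each gene with enough tool support.
    mirnas_by_gene = defaultdict(set)
    for (mirna, gene), tools in tools_by_pair.items():
        if len(tools) >= min_tools:
            mirnas_by_gene[gene].add(mirna)

    needed = min_fraction * len(top_mirnas)
    return sorted(g for g, m in mirnas_by_gene.items() if len(m) >= needed)

# Made-up example data, for illustration only.
example_predictions = [
    ("miR-A", "GENE1", "toolX"),
    ("miR-A", "GENE1", "toolY"),
    ("miR-B", "GENE1", "toolX"),
    ("miR-A", "GENE2", "toolX"),
]
selected = select_targets(example_predictions, {"miR-A", "miR-B"})
```

In this made-up example only GENE1 passes: it is supported by two tools for miR-A, which is one-half of the two top miRNAs, whereas GENE2 has support from a single tool.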
3.  Regulation of the SK3 channel by MicroRNA-499 - potential role in atrial fibrillation 
MicroRNAs are important regulators of gene expression, including those involving electrical remodeling in atrial fibrillation (AF). Recently, KCNN3, the gene that encodes the small conductance calcium-activated potassium channel 3 (SK3), was found to be strongly associated with AF.
This study sought to evaluate the changes in atrial myocardial microRNAs in patients with permanent AF and to determine the role of microRNA on the regulation of cardiac SK3 expression.
Atrial tissue obtained during cardiac surgery from patients (4 sinus rhythm and 4 permanent AF) was analyzed by microRNA arrays. Potential targets of microRNAs were predicted by software programs. The effects of specific microRNAs on target gene expression were evaluated in HL-1 cells from a continuously proliferating mouse hyperplastic atrial cardiomyocyte cell line. Interactions between microRNAs and targets were further evaluated by luciferase reporter assay and by Argonaute pull-down assay.
Twenty-one microRNAs showed significant, greater than two-fold changes in AF. miR-499 was upregulated by 2.33-fold (P<0.01) in AF atria, whereas SK3 protein expression was downregulated by 46% (P<0.05). Transfection of miR-499 mimic in HL-1 cells resulted in the downregulation of SK3 protein expression, while that of miR-499 inhibitor upregulated SK3 expression. Binding of miR-499 to the 3′UTR of KCNN3 was confirmed by luciferase reporter assay and by the enhanced presence of SK3 mRNA in Argonaute pulled-down microRNA-induced silencing complexes (mRISC) after transfection with miR-499.
Atrial miRNA-499 is significantly upregulated in AF, leading to SK3 downregulation and possibly contributing to the electrical remodeling in AF.
PMCID: PMC3710704  PMID: 23499625
atrial fibrillation; microRNA; SK3 channel; electrical remodeling; small-conductance calcium-activated potassium channel
4.  Association of MAPT haplotypes with Alzheimer’s disease risk and MAPT brain gene expression levels 
MAPT encodes tau, the predominant component of neurofibrillary tangles that are neuropathological hallmarks of Alzheimer’s disease (AD). Genetic association of MAPT variants with late-onset AD (LOAD) risk has been inconsistent, although insufficient power and incomplete assessment of MAPT haplotypes may account for this.
We examined the association of MAPT haplotypes with LOAD risk in more than 20,000 subjects (n-cases = 9,814, n-controls = 11,550) from Mayo Clinic (n-cases = 2,052, n-controls = 3,406) and the Alzheimer’s Disease Genetics Consortium (ADGC, n-cases = 7,762, n-controls = 8,144). We also assessed associations with brain MAPT gene expression levels measured in the cerebellum (n = 197) and temporal cortex (n = 202) of LOAD subjects. Six single nucleotide polymorphisms (SNPs) which tag MAPT haplotypes with frequencies greater than 1% were evaluated.
H2-haplotype tagging rs8070723-G allele associated with reduced risk of LOAD (odds ratio, OR = 0.90, 95% confidence interval, CI = 0.85-0.95, p = 5.2E-05) with consistent results in the Mayo (OR = 0.81, p = 7.0E-04) and ADGC (OR = 0.89, p = 1.26E-04) cohorts. rs3785883-A allele was also nominally significantly associated with LOAD risk (OR = 1.06, 95% CI = 1.01-1.13, p = 0.034). Haplotype analysis revealed significant global association with LOAD risk in the combined cohort (p = 0.033), with significant association of the H2 haplotype with reduced risk of LOAD as expected (p = 1.53E-04) and suggestive association with additional haplotypes. MAPT SNPs and haplotypes also associated with brain MAPT levels in the cerebellum and temporal cortex of AD subjects with the strongest associations observed for the H2 haplotype and reduced brain MAPT levels (β = -0.16 to -0.20, p = 1.0E-03 to 3.0E-03).
These results confirm the previously reported MAPT H2 associations with LOAD risk in two large series, show that this haplotype has the strongest effect on brain MAPT expression amongst those tested, and identify additional haplotypes with suggestive associations, which require replication in independent series. These biologically congruent results provide compelling evidence to screen the MAPT region for regulatory variants which confer LOAD risk by influencing its brain gene expression.
PMCID: PMC4198935  PMID: 25324900
5.  Conserved recurrent gene mutations correlate with pathway deregulation and clinical outcomes of lung adenocarcinoma in never-smokers 
BMC Medical Genomics  2014;7:32.
Novel and targetable mutations are needed for improved understanding and treatment of lung cancer in never-smokers.
Twenty-seven lung adenocarcinomas from never-smokers were sequenced by both exome and mRNA-seq with respective normal tissues. Somatic mutations were detected and compared with pathway deregulation, tumor phenotypes and clinical outcomes.
Although somatic mutations in DNA or mRNA ranged from hundreds to thousands in each tumor, the overlapping mutations between the two numbered only a few to a couple of hundred. The number of somatic mutations from either DNA or mRNA was not significantly associated with clinical variables; however, the number of overlapping mutations was associated with cancer subtype. These overlapping mutants were preferentially expressed in mRNA, with consistently higher allele frequency in mRNA than in DNA. Ten genes (EGFR, TP53, KRAS, RPS6KB2, ATXN2, DHX9, PTPN13, SP1, SPTAN1 and MYOF) had recurrent mutations, and these mutations were highly correlated with pathway deregulation and patient survival.
The recurrent mutations present in both DNA and RNA are likely drivers of tumor biology, pathway deregulation and clinical outcomes. The information may be used for patient stratification and therapeutic target development.
PMCID: PMC4060138  PMID: 24894543
Lung adenocarcinoma of never smoker; Somatic mutations; Pathway deregulation; Patient survival
6.  Network-based approach identified cell cycle genes as predictor of overall survival in lung adenocarcinoma patients 
Lung adenocarcinoma is the most common type of primary lung cancer. The purpose of this study was to delineate gene expression patterns for survival prediction in lung adenocarcinoma. Gene expression profiles of 82 (discovery set) and 442 (validation set 1) lung adenocarcinoma tumor tissues were analyzed using a systems biology-based network approach. We also examined the expression profiles of 78 adjacent normal lung tissues from 82 patients. We found a significant correlation of an expression module with overall survival (adjusted hazard ratio or HR=1.71; 95% CI=1.06-2.74 in discovery set; adjusted HR=1.26; 95% CI=1.08-1.49 in validation set 1). This expression module contained genes enriched in the biological process of the cell cycle. Interestingly, the cell cycle gene module and overall survival association were also significant in normal lung tissues (adjusted HR=1.91; 95% CI, 1.32-2.75). From these survival-related modules, we further defined three hub genes (UBE2C, TPX2 and MELK) whose expression-based risk indices were more strongly associated with poor 5-year survival (HR=3.85, 95% CI=1.34-11.05 in discovery set; HR=1.72, 95% CI=1.21-2.46 in validation set 1; and HR=3.35, 95% CI=1.08-10.04 in normal lung set). The 3-gene prognostic result was further validated using 92 adenocarcinoma tumor samples (validation set 2); patients with a high-risk gene signature had a 1.52-fold higher risk of death (95% CI, 1.02–2.24) than patients with a low-risk gene signature. These results suggest that a network-based approach may facilitate discovery of key genes that are closely linked to survival in patients with lung adenocarcinoma.
PMCID: PMC3595338  PMID: 23357462
Lung cancer; survival; gene expression profiling; cell cycle; systems biology
8.  Gene Expression, Single Nucleotide Variant and Fusion Transcript Discovery in Archival Material from Breast Tumors 
PLoS ONE  2013;8(11):e81925.
Advantages of RNA-Seq over array-based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs, nine pairs of matched fresh-frozen, and FFPE RNA isolated from breast tumor with the hybridization-based NanoString nCounter (226 gene panel) and with whole transcriptome RNA-Seq using RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole transcriptome protocols respectively, p < 2 × 10^−16. Specifically for lincRNAs, we observed superb Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high confidence SNVs and 24% of high confidence fusion transcripts. Sensitivity of fusion transcript detection was not overcome by an increase in depth of sequencing up to 3-fold (increase from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests discovery-based whole transcriptome studies from FFPE material will produce reliable expression data.
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNV and fusion transcripts was less sensitive.
PMCID: PMC3838386  PMID: 24278466
9.  Skeletal Muscle and Bone Marrow Derived Stromal Cells: A Comparison of Tenocyte Differentiation Capabilities 
This study investigated the comparative ability of bone marrow and skeletal muscle derived stromal cells (BMSCs and SMSCs) to express a tenocyte phenotype, and whether this expression could be augmented by growth and differentiation factor-5 (GDF-5).
Tissue harvest was performed on the hind limbs of seven dogs. Stromal cells were isolated via serial expansion in culture. After 4 passages, tenogenesis was induced using either ascorbic acid alone or in conjunction with GDF-5. CD44, tenomodulin, collagen I, and collagen III expression levels were compared for each culture condition at 7 and 14 days following induction. Immunohistochemistry (IHC) was performed to evaluate cell morphology and production of tenomodulin and collagen I.
SMSCs and BMSCs were successfully isolated in culture. Following tenocytic induction, SMSCs demonstrated an increased mean relative expression of tenomodulin, collagen I, and collagen III at 14 days. BMSCs only showed increased mean relative expression of collagen I and collagen III at 14 days. IHC revealed positive staining for tenomodulin and collagen I at 14 days for both cell types. The morphology of skeletal muscle derived stromal cells at 14 days had an organized appearance in contrast to the haphazard arrangement of the bone marrow derived cells. GDF-5 did not affect gene expression, cell staining, or cell morphology significantly.
Stromal cells from either bone marrow or skeletal muscle can be induced to increase expression of matrix genes; however, based on expression of tenomodulin and cell culture morphology, SMSCs may be a better candidate for tenocytic differentiation.
PMCID: PMC3402710  PMID: 22511232
Progenitors and Stem Cells; Tissue Engineering; Tendon Biology; Collagen and Matrix Proteins; Biomarkers
10.  Novel late-onset Alzheimer disease loci variants associate with brain gene expression 
Neurology  2012;79(3):221-228.
Recent genome-wide association studies (GWAS) of late-onset Alzheimer disease (LOAD) identified 9 novel risk loci. Discovery of functional variants within genes at these loci is required to confirm their role in Alzheimer disease (AD). Single nucleotide polymorphisms that influence gene expression (eSNPs) constitute an important class of functional variants. We therefore investigated the influence of the novel LOAD risk loci on human brain gene expression.
We measured gene expression levels in the cerebellum and temporal cortex of autopsied AD subjects and those with other brain pathologies (∼400 total subjects). To determine whether any of the novel LOAD risk variants are eSNPs, we tested their cis-association with expression of 6 nearby LOAD candidate genes detectable in human brain (ABCA7, BIN1, CLU, MS4A4A, MS4A6A, PICALM) and an additional 13 genes ±100 kb of these SNPs. To identify additional eSNPs that influence brain gene expression levels of the novel candidate LOAD genes, we identified SNPs ±100 kb of their location and tested for cis-associations.
CLU rs11136000 (p = 7.81 × 10^−4) and MS4A4A rs2304933/rs2304935 (p = 1.48 × 10^−4 to 1.86 × 10^−4) significantly influence temporal cortex expression levels of these genes. The LOAD-protective CLU and risky MS4A4A locus alleles associate with higher brain levels of these genes. There are other cis-variants that significantly influence brain expression of CLU and ABCA7 (p = 4.01 × 10^−5 to 9.09 × 10^−9), some of which also associate with AD risk (p = 2.64 × 10^−2 to 6.25 × 10^−5).
CLU and MS4A4A eSNPs may at least partly explain the LOAD risk association at these loci. CLU and ABCA7 may harbor additional strong eSNPs. These results have implications in the search for functional variants at the novel LOAD risk loci.
PMCID: PMC3398432  PMID: 22722634
11.  Increased miR-708 Expression in NSCLC and Its Association with Poor Survival in Lung Adenocarcinoma from Never Smokers 
MicroRNA plays an important role in human diseases and cancer. We seek to investigate the expression status, clinical relevance, and functional role of microRNA in non-small cell lung cancer.
Experimental Design
We performed miRNA expression profiling in matched lung adenocarcinoma and uninvolved lung using 56 pairs of fresh-frozen (FF) and 47 pairs of formalin-fixed, paraffin-embedded (FFPE) samples from never smokers. The most differentially expressed miRNA genes were evaluated by Cox analysis and Log-Rank test. Among the best candidates, miR-708 was further examined for differential expression in two independent cohorts. Functional significance of miR-708 expression in lung cancer was examined by identifying its candidate mRNA target and by manipulating its expression levels in cultured cells.
Among the 20 miRNAs most differentially expressed between tested tumor and normal samples, high expression level of miR-708 in the tumors was most strongly associated with an increased risk of death after adjustments for all clinically significant factors including age, sex, and tumor stage (FF cohort: HR, 1.90; 95% CI, 1.08-3.35; P=.025 and FFPE cohort: HR, 1.93; 95% CI, 1.02-3.63; P=.042). The transcript for the TMEM88 gene has a miR-708 binding site in its 3′ UTR and was significantly reduced in tumors with high miR-708 expression. Forced miR-708 expression reduced TMEM88 transcript levels and increased the rate of cell proliferation, invasion, and migration in culture.
MicroRNA-708 acts as an oncogene contributing to tumor growth and disease progression by directly downregulating TMEM88, a negative regulator of the Wnt signaling pathway in lung cancer.
PMCID: PMC3616503  PMID: 22573352
NSCLC; adenocarcinoma; miR-708; never smoker; survival; TMEM88; Wnt signaling
12.  SH3GL2 is frequently deleted in non-small cell lung cancer and downregulates tumor growth by modulating EGFR signaling 
The purpose of this study was to identify key genetic pathways involved in non-small cell lung cancer (NSCLC) and understand their role in tumor progression. We performed genome-wide scanning of paired tumors and 16 corresponding mucosal biopsies from four follow-up lung cancer patients on the Affymetrix 250K-NSpI array platform. We found that a single gene, SH3GL2, located on human chromosome 9p22, was most frequently deleted in all the tumors and corresponding mucosal biopsies. We further validated the alteration pattern of SH3GL2 in a substantial number of primary NSCLC tumors at the DNA and protein level. We also overexpressed wild-type SH3GL2 in three NSCLC cell lines to understand its role in NSCLC progression. Validation in 116 primary NSCLC tumors confirmed frequent loss of heterozygosity of SH3GL2 in 51% (49/97) of the informative cases overall. We found significantly low (p=0.0015) SH3GL2 protein expression in 71% (43/60) of primary tumors. Forced over-expression of wild-type (wt) SH3GL2 in three NSCLC cell lines resulted in a marked reduction of active epidermal growth factor receptor (EGFR) expression and an increase in EGFR internalization and degradation. Significantly decreased in vitro (p=0.0015–0.030) and in vivo (p=0.016) cellular growth, invasion (p=0.029–0.049), and colony formation (p=0.023–0.039) were also evident in the wt-SH3GL2-transfected cells, accompanied by markedly low expression of activated AKT(Ser473), STAT3 (Tyr705), and PI3K. Downregulation of SH3GL2 interactor USP9X and activated β-catenin was also evident in the SH3GL2-transfected cells. Our results indicate that SH3GL2 is frequently deleted in NSCLC and regulates cellular growth and invasion by modulating EGFR function.
PMCID: PMC3691869  PMID: 22968441
Single nucleotide polymorphism array; Lung cancer; SH3GL2; Deletion
13.  EGFR Somatic Mutations in Lung Tumors: Radon Exposure and Passive-smoking in Former- and Never-smoking U.S. Women 
Lung cancer patients with mutations in EGFR tyrosine kinase have improved prognosis when treated with EGFR inhibitors. We hypothesized that EGFR mutations may be related to residential radon or passive tobacco smoke.
This hypothesis was investigated by analyzing EGFR mutations in seventy lung tumors from a population of never and long-term former female smokers from Missouri with detailed exposure assessments. The relationship with passive-smoking was also examined in never-smoking female lung cancer cases from the Mayo Clinic.
Overall, the frequency of EGFR mutation was 41% [95% Confidence Interval (CI): 32-49%]. Neither radon nor passive-smoking exposure was consistently associated with EGFR mutations in lung tumors.
The results suggest that EGFR mutations are common in female, never-smoking lung cancer cases from the U.S., and that EGFR mutations are unlikely to be due to exposure to radon or passive-smoking.
PMCID: PMC3372599  PMID: 22523180
EGFR mutations; never-smokers; lung cancer; radon; passive-smoking; second hand smoke; tobacco smoke
14.  High Throughput Quantitative PCR Using Low-input Samples for mRNA and MicroRNA Gene Expression Analyses 
Technical advancements in quantitative PCR (qPCR) instrumentation have made it possible to perform gene expression measurements using small sample input to support both basic and clinical research studies. As part of the strategic goals to assess new technologies and identify protocols that best fit the needs of the Mayo Clinic, we compared the Fluidigm BioMark system with standard Applied Biosystems (AB) instrumentation for mRNA and miRNA gene expression measurements. We also examined the performance of the BioMark system when using very low-input RNA.
We evaluated a set of control samples using the same TaqMan assays with both systems. We observed that the BioMark-generated data routinely yield Ct values approximately 10 cycles lower than those obtained with AB instrumentation. The correlations between the two platforms were high (r = 0.96) for both mRNA and miRNA expression experiments. For miRNA expression, a similarly high correlation was observed between fresh frozen and formalin-fixed paraffin embedded (FFPE) samples.
In an effort to accommodate our customers' needs, we also evaluated the performance of the BioMark for measuring gene expression in very low-input samples. Using six standard TaqMan control assays (having high, medium and low expression levels), we observed that high quality RNA samples as low as 10 pg achieved linear amplification across four different pre-amplification cycles (10, 14, 18 and 22). At 10 pg total RNA input, low-expression control assay IPO8 demonstrated a correlation of r = 0.999 among the four pre-amplification cycles. This linearity was also observed at higher RNA input levels, up to 10 ng. The only control assay that did not perform in a linear fashion across all input amounts and all pre-amplification cycles was 18S ribosomal RNA. The highest correlation observed for 18S was r = 0.801, and this supports the vendor suggestion that 18S is not the best control assay option.
PMCID: PMC3635317
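As background for the Ct comparisons and the pre-amplification linearity described in the abstract above, the textbook relationship between template amount and Ct can be sketched as follows. This is a generic idealized model rather than the authors' analysis: the function name `expected_ct_shift` and the assumption of a constant per-cycle amplification efficiency are illustrative only.

```python
import math

def expected_ct_shift(fold_change_input, efficiency=1.0):
    """Idealized number of cycles by which Ct drops when the amount of
    template increases `fold_change_input`-fold.

    Assumes (hypothetically) that every cycle multiplies the template by
    (1 + efficiency); efficiency=1.0 means perfect doubling per cycle.
    """
    return math.log(fold_change_input) / math.log(1.0 + efficiency)

# Under perfect doubling, each extra pre-amplification cycle doubles the
# template, so the pre-amplification steps 10 -> 14 -> 18 -> 22 would each
# be expected to lower Ct by about 4 cycles.
shift_per_step = expected_ct_shift(2 ** 4)

# A 10-fold increase in input RNA corresponds to log2(10) cycles.
shift_per_decade = expected_ct_shift(10)
```

Under this model, a linear response means that measured Ct values fall on a straight line when plotted against the number of pre-amplification cycles or against the logarithm of the input amount.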
15.  Glucose transporter-1 in pulmonary neuroendocrine carcinomas: expression and survival analysis 
Glucose transporter-1 (GLUT-1) mediates the transport of glucose across the cellular membrane. Its elevated levels and/or activation have been shown to be associated with malignancy. The aim of this study was to investigate GLUT-1 expression in pulmonary neuroendocrine carcinomas. Tissue microarray-based samples of 178 neuroendocrine carcinomas, including 48 typical carcinoids, 31 atypical carcinoids, 27 large cell neuroendocrine carcinomas and 72 small cell carcinomas from different patients, were studied immunohistochemically for GLUT-1 expression. Forty-seven percent (75/161) of pulmonary neuroendocrine carcinomas were immunoreactive for GLUT-1. GLUT-1 was observed in 7% (3/46) of typical carcinoid, 21% (6/29) of atypical carcinoid, 74% (17/23) of large cell neuroendocrine carcinoma and 78% (49/63) of small cell carcinoma. GLUT-1 expression correlated with increasing patient age (P = 0.01) and with neuroendocrine differentiation/tumor type (P < 0.001), but not with gender, tumor size or stage. GLUT-1 expression was seen in a characteristic membranous pattern of staining along the luminal borders or adjacent to necrotic areas. GLUT-1 expression was associated with an increased risk of death for neuroendocrine carcinomas as a group (risk ratio = 2.519; 95% confidence interval = 1.519–4.178; P < 0.001) and carcinoids (risk ratio = 4.262; 95% confidence interval = 1.472–12.343; P = 0.01). In conclusion, GLUT-1 is expressed in approximately half of the pulmonary neuroendocrine carcinomas and shows a strong correlation with neuroendocrine differentiation/grade, but not with other clinicopathologic variables. Further studies are warranted to elucidate the prognostic significance of GLUT-1 expression in pulmonary carcinoids.
PMCID: PMC3616507  PMID: 19234439
GLUT-1; neuroendocrine carcinoma; carcinoid; survival; lung
16.  SMAD6 Contributes to Patient Survival in Non–Small Cell Lung Cancer and Its Knockdown Reestablishes TGF-β Homeostasis in Lung Cancer Cells 
Cancer research  2008;68(23):9686-9692.
The malignant transformation in several types of cancer, including lung cancer, results in a loss of growth inhibition by transforming growth factor-β (TGF-β). Here, we show that SMAD6 expression is associated with a reduced survival in lung cancer patients. Short hairpin RNA (shRNA)–mediated knockdown of SMAD6 in lung cancer cell lines resulted in reduced cell viability and increased apoptosis as well as inhibition of cell cycle progression. However, these results were not seen in Beas2B, a normal bronchial epithelial cell line. To better understand the mechanism underlying the association of SMAD6 with poor patient survival, we used a lentivirus construct carrying shRNA for SMAD6 to knock down expression of the targeted gene. Through gene expression analysis, we observed that knockdown of SMAD6 led to the activation of TGF-β signaling through up-regulation of plasminogen activator inhibitor-1 and phosphorylation of SMAD2/3. Furthermore, SMAD6 knockdown activated the c-Jun NH2-terminal kinase pathway and reduced phosphorylation of Rb-1, resulting in increased G0-G1 cell arrest and apoptosis in the lung cancer cell line H1299. These results jointly suggest that SMAD6 plays a critical role in supporting lung cancer cell growth and survival. Targeted inactivation of SMAD6 may provide a novel therapeutic strategy for lung cancers expressing this gene.
PMCID: PMC3617041  PMID: 19047146
17.  Multi-Platform Analysis of MicroRNA Expression Measurements in RNA from Fresh Frozen and FFPE Tissues 
PLoS ONE  2013;8(1):e52517.
MicroRNAs play a role in regulating diverse biological processes and have considerable utility as molecular markers for diagnosis and monitoring of human disease. Several technologies are available commercially for measuring microRNA expression. However, cross-platform comparisons do not necessarily correlate well, making it difficult to determine which platform most closely represents the true microRNA expression level in a tissue. To address this issue, we have analyzed RNA derived from cell lines, as well as fresh frozen and formalin-fixed paraffin embedded tissues, using Affymetrix, Agilent, and Illumina microRNA arrays, NanoString counting, and Illumina Next Generation Sequencing. We compared the performance within and between the different platforms, and then verified these results against quantitative PCR data. Our results demonstrate that the within-platform reproducibility for each method is consistently high and, although the gene expression profiles from each platform show unique traits, comparison of genes that were commonly detectable showed that detection of microRNA transcripts was similar across multiple platforms.
PMCID: PMC3561362  PMID: 23382819
18.  Quality assessment metrics for whole genome gene expression profiling of paraffin embedded samples 
BMC Research Notes  2013;6:33.
Formalin-fixed, paraffin-embedded tissues are most commonly used for routine pathology analysis and for long-term tissue preservation in the clinical setting. Many institutions have large archives of formalin-fixed, paraffin-embedded tissues that provide a unique opportunity for understanding genomic signatures of disease. However, genome-wide expression profiling of formalin-fixed, paraffin-embedded samples has been challenging due to RNA degradation. Because of the significant heterogeneity in tissue quality, normalization and analysis of these data present particular challenges. The distribution of intensity values from archival tissues is inherently noisy and skewed due to differential sample degradation, raising two primary concerns: whether a highly skewed array will unduly influence initial normalization of the data and whether outlier arrays can be reliably identified.
Two simple extensions of common regression diagnostic measures are introduced that measure the stress an array undergoes during normalization and how much a given array deviates from the remaining arrays post-normalization. These metrics are applied to a study involving 1618 formalin-fixed, paraffin-embedded HER2-positive breast cancer samples from the N9831 adjuvant trial processed with Illumina’s cDNA-mediated Annealing Selection extension and Ligation assay.
Proper assessment of array quality within a research study is crucial for controlling unwanted variability in the data. The metrics proposed in this paper have direct biological interpretations and can be used to identify arrays that should either be removed from analysis altogether or down-weighted to reduce their influence in downstream analyses.
PMCID: PMC3626608  PMID: 23360712
High-dimensional array quality; Formalin-Fixed; Paraffin-embedded tissue; Outlier detection
19.  Brain Expression Genome-Wide Association Study (eGWAS) Identifies Human Disease-Associated Variants 
PLoS Genetics  2012;8(6):e1002707.
Genetic variants that modify brain gene expression may also influence risk for human diseases. We measured expression levels of 24,526 transcripts in brain samples from the cerebellum and temporal cortex of autopsied subjects with Alzheimer's disease (AD, cerebellar n = 197, temporal cortex n = 202) and with other brain pathologies (non–AD, cerebellar n = 177, temporal cortex n = 197). We conducted an expression genome-wide association study (eGWAS) using 213,528 cisSNPs within ±100 kb of the tested transcripts. We identified 2,980 cerebellar cisSNP/transcript level associations (2,596 unique cisSNPs) significant in both ADs and non–ADs (q < 0.05, p = 7.70 × 10^−5 to 1.67 × 10^−82). Of these, 2,089 were also significant in the temporal cortex (p = 1.85 × 10^−5 to 1.70 × 10^−141). The top cerebellar cisSNPs had 2.4-fold enrichment for human disease-associated variants (p < 10^−6). We identified novel cisSNP/transcript associations for human disease-associated variants, including progressive supranuclear palsy SLCO1A2/rs11568563, Parkinson's disease (PD) MMRN1/rs6532197, Paget's disease OPTN/rs1561570; and we confirmed others, including PD MAPT/rs242557, systemic lupus erythematosus and ulcerative colitis IRF5/rs4728142, and type 1 diabetes mellitus RPS26/rs1701704. In our eGWAS, there was 2.9–3.3 fold enrichment (p < 10^−6) of significant cisSNPs with suggestive AD–risk association (p < 10^−3) in the Alzheimer's Disease Genetics Consortium GWAS. These results demonstrate the significant contributions of genetic factors to human brain gene expression, which are reliably detected across different brain regions and pathologies. The significant enrichment of brain cisSNPs among disease-associated variants advocates gene expression changes as a mechanism for many central nervous system (CNS) and non–CNS diseases. Combined assessment of expression and disease GWAS may provide complementary information in discovery of human disease variants with functional implications.
Our findings have implications for the design and interpretation of eGWAS in general and the use of brain expression quantitative trait loci in the study of human disease genetics.
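The cisSNP definition used above (SNPs within ±100 kb of a tested transcript) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Transcript` and `Snp` records, the function names, and the choice to measure the window from the transcript's start and end coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Window size taken from the abstract (±100 kb); everything else is hypothetical.
CIS_WINDOW_BP = 100_000


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int  # assumed: genomic start coordinate of the transcript
    end: int    # assumed: genomic end coordinate of the transcript


@dataclass(frozen=True)
class Snp:
    snp_id: str
    chrom: str
    pos: int


def is_cis(snp: Snp, transcript: Transcript, window: int = CIS_WINDOW_BP) -> bool:
    """True if the SNP is on the same chromosome and within `window` bp
    of the transcript boundaries (assumed definition of 'cis')."""
    return (
        snp.chrom == transcript.chrom
        and transcript.start - window <= snp.pos <= transcript.end + window
    )


def cis_pairs(snps, transcripts, window: int = CIS_WINDOW_BP):
    """List every (SNP id, transcript name) pair that would be tested in cis."""
    return [
        (s.snp_id, t.name)
        for t in transcripts
        for s in snps
        if is_cis(s, t, window)
    ]
```

Each pair returned by `cis_pairs` would then be tested for association between genotype and transcript level. The association test and the q-value correction are outside the scope of this sketch.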
Author Summary
Genetic variants that regulate gene expression levels can also influence human disease risk. Discovery of genomic loci that alter brain gene expression levels (brain expression quantitative trait loci = eQTLs) can be instrumental in the identification of genetic risk underlying both central nervous system (CNS) and non–CNS diseases. To systematically assess the role of brain eQTLs in human disease and to evaluate the influence of brain region and pathology in eQTL mapping, we performed an expression genome-wide association study (eGWAS) in 773 brain samples from the cerebellum and temporal cortex of ∼200 autopsied subjects with Alzheimer's disease (AD) and ∼200 with other brain pathologies (non–AD). We identified ∼3,000 significant associations between cisSNPs near ∼700 genes and their cerebellar transcript levels, which replicate in ADs and non–ADs. More than 2,000 of these associations were reproducible in the temporal cortex. The top cisSNPs are enriched for both CNS and non–CNS disease-associated variants. We identified novel and confirmed previous cisSNP/transcript associations for many disease loci, suggesting gene expression regulation as their mechanism of action. These findings demonstrate the reproducibility of the eQTL approach across different brain regions and pathologies, and advocate the combined use of gene expression and disease GWAS for identification and functional characterization of human disease-associated variants.
PMCID: PMC3369937  PMID: 22685416
20.  Glutathione S-transferase omega genes in Alzheimer and Parkinson disease risk, age-at-diagnosis and brain gene expression: an association study with mechanistic implications 
Glutathione S-transferase omega-1 and 2 genes (GSTO1, GSTO2), residing within an Alzheimer and Parkinson disease (AD and PD) linkage region, have diverse functions including mitigation of oxidative stress and may underlie the pathophysiology of both diseases. GSTO polymorphisms were previously reported to associate with risk and age-at-onset of these diseases, although inconsistent follow-up study designs make interpretation of results difficult. We assessed two previously reported SNPs, GSTO1 rs4925 and GSTO2 rs156697, in AD (3,493 ADs vs. 4,617 controls) and PD (678 PDs vs. 712 controls) for association with disease risk (case-controls), age-at-diagnosis (cases) and brain gene expression levels (autopsied subjects).
We found that the rs156697 minor allele associates with significantly increased risk (odds ratio = 1.14, p = 0.038) in the older ADs with age-at-diagnosis > 80 years. The minor allele of GSTO1 rs4925 associates with decreased risk in familial PD (odds ratio = 0.78, p = 0.034). There was no other association with disease risk or age-at-diagnosis. The minor alleles of both GSTO SNPs associate with lower brain levels of GSTO2 (p = 4.7 × 10^-11 to 1.9 × 10^-27), but not GSTO1. Pathway analysis of significant genes in our brain expression GWAS identified significant enrichment for glutathione metabolism genes (p = 0.003).
These results suggest that GSTO locus variants may lower brain GSTO2 levels and consequently confer AD risk in older age. Other glutathione metabolism genes should be assessed for their effects on AD and other chronic, neurologic diseases.
PMCID: PMC3393625  PMID: 22494505
GSTO genes; Disease risk; Gene expression; Association
21.  Loss of Cytoplasmic CDK1 Predicts Poor Survival in Human Lung Cancer and Confers Chemotherapeutic Resistance 
PLoS ONE  2011;6(8):e23849.
The dismal lethality of lung cancer is due to late stage at diagnosis and inherent therapeutic resistance. The incorporation of targeted therapies has modestly improved clinical outcomes, but the identification of new targets could further improve clinical outcomes by guiding stratification of poor-risk early stage patients and individualizing therapeutic choices. We hypothesized that a sequential, combined microarray approach would be valuable to identify and validate new targets in lung cancer. We profiled gene expression signatures during lung epithelial cell immortalization and transformation, and showed that genes involved in mitosis were progressively enhanced in carcinogenesis. 28 genes were validated by immunoblotting and 4 genes were further evaluated in non-small cell lung cancer tissue microarrays. Although CDK1 was highly expressed in tumor tissues, its loss from the cytoplasm unexpectedly predicted poor survival and conferred resistance to chemotherapy in multiple cell lines, especially microtubule-directed agents. An analysis of expression of CDK1 and CDK1-associated genes in the NCI60 cell line database confirmed the broad association of these genes with chemotherapeutic responsiveness. These results have implications for personalizing lung cancer therapy and highlight the potential of combined approaches for biomarker discovery.
PMCID: PMC3161069  PMID: 21887332
22.  Genetic variants and risk of lung cancer in never smokers: a genome-wide association study 
The lancet oncology  2010;11(4):321-330.
Lung cancer in individuals who have never smoked tobacco products is an increasing medical and public-health issue. We aimed to unravel the genetic basis of lung cancer in never smokers.
We did a four-stage investigation. First, a genome-wide association study of single nucleotide polymorphisms (SNPs) was done with 754 never smokers (377 matched case-control pairs at Mayo Clinic, Rochester, MN, USA). Second, the top candidate SNPs from the first study were validated in two independent studies among 735 (MD Anderson Cancer Center, Houston, TX, USA) and 253 (Harvard University, Boston, MA, USA) never smokers. Third, further replication of the top SNP was done in 530 never smokers (UCLA, Los Angeles, CA, USA). Fourth, expression quantitative trait loci (eQTL) and gene-expression differences were analysed to further elucidate the causal relation between the validated SNPs and the risk of lung cancer in never smokers.
44 top candidate SNPs were identified that might alter the risk of lung cancer in never smokers. rs2352028 at chromosome 13q31.3 was subsequently replicated with an additive genetic model in the four independent studies, with a combined odds ratio of 1·46 (95% CI 1·26–1·70, p=5·94×10^−6). A cis eQTL analysis showed there was a strong correlation between genotypes of the replicated SNPs and the transcription level of the gene GPC5 in normal lung tissues (p=1·96×10^−4), with the high-risk allele linked with lower expression. Additionally, the transcription level of GPC5 in normal lung tissue was twice that detected in matched lung adenocarcinoma tissue (p=6·75×10^−11).
Genetic variants at 13q31.3 alter the expression of GPC5, and are associated with susceptibility to lung cancer in never smokers. Downregulation of GPC5 might contribute to the development of lung cancer in never smokers.
PMCID: PMC2945218  PMID: 20304703
23.  Quantitative miRNA Expression Analysis Using Fluidigm Microfluidics Dynamic Arrays 
BMC Genomics  2011;12:144.
MicroRNAs (miRNAs) represent a growing class of small non-coding RNAs that are important regulators of gene expression in both plants and animals. Studies have shown that miRNAs play a critical role in human cancer and that they can influence the level of cell proliferation and apoptosis by modulating gene expression. Currently, methods for the detection and measurement of miRNA expression include small and moderate-throughput technologies, such as standard quantitative PCR and microarray-based analysis. However, these methods have several limitations when used in large clinical studies, where a high-throughput and highly quantitative technology is needed for the efficient characterization of a large number of miRNA transcripts in clinical samples. Furthermore, archival formalin-fixed, paraffin-embedded (FFPE) samples are increasingly becoming the primary resource for gene expression studies because fresh frozen (FF) samples are often difficult to obtain and require special storage conditions. In this study, we evaluated the miRNA expression levels in FFPE and FF samples as well as several lung cancer cell lines employing a high-throughput qPCR-based microfluidic technology. The results were compared to standard qPCR and hybridization-based microarray platforms using the same samples.
We demonstrated highly correlated Ct values between multiplex and singleplex RT reactions in standard qPCR assays for miRNA expression using total RNA from A549 (R = 0.98; p < 0.0001) and H1299 (R = 0.95; p < 0.0001) lung cancer cell lines. The Ct values generated by the microfluidic technology (Fluidigm 48.48 dynamic array systems) resulted in a left-shift toward lower Ct values compared to those observed by ABI 7900 HT (mean difference, 3.79), suggesting that the microfluidic technology exhibited a greater sensitivity. In addition, we show that as little as 10 ng total RNA can be used to reliably detect all 48 or 96 tested miRNAs using a 96-multiplexing RT reaction in both FFPE and FF samples. Finally, we compared miRNA expression measurements in both FFPE and FF samples by qPCR using the 96.96 dynamic array and Affymetrix microarrays. Fold change comparisons for comparable genes between the two platforms indicated that the overall correlation was R = 0.60. The maximum fold change detected by the Affymetrix microarray was 3.5 compared to 13 by the 96.96 dynamic array.
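To put the reported mean Ct difference of 3.79 in perspective, the shift can be converted to an approximate fold difference. The sketch below is a rough illustration rather than the authors' calculation. It assumes ideal amplification (template doubles each cycle), and the function name and `efficiency` parameter are hypothetical.

```python
def fold_difference_from_delta_ct(delta_ct: float, efficiency: float = 1.0) -> float:
    """Fold difference implied by a Ct shift of `delta_ct` cycles.

    `efficiency` is the assumed per-cycle amplification efficiency:
    1.0 means perfect doubling, so the base of the exponent is 2.
    """
    return (1.0 + efficiency) ** delta_ct


# Mean Ct difference between platforms, as reported in the abstract.
print(round(fold_difference_from_delta_ct(3.79), 1))  # prints 13.8
```

Under the doubling assumption, a 3.79-cycle shift corresponds to a roughly 14-fold difference in the template amount at which signal crosses threshold. Differences in threshold settings or reaction chemistry between platforms can also shift Ct, so this figure is an upper-level intuition, not a measured sensitivity gain.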
The qPCR-array based microfluidic dynamic array platform can be used in conjunction with multiplexed RT reactions for miRNA gene expression profiling. We showed that this approach is highly reproducible and that the results correlate closely with the existing singleplex qPCR platform, at a throughput that is 5 to 20 times higher and with sample and reagent usage approximately 50-100 times lower than conventional assays. We established optimal conditions for using the Fluidigm microfluidic technology for rapid, cost-effective, and customizable arrays for miRNA expression profiling and validation.
PMCID: PMC3062620  PMID: 21388556
24.  Expression profiling of formalin-fixed paraffin-embedded primary breast tumors using cancer-specific and whole genome gene panels on the DASL® platform 
BMC Medical Genomics  2010;3:60.
The cDNA-mediated Annealing, extension, Selection and Ligation (DASL) assay has become a suitable gene expression profiling system for degraded RNA from paraffin-embedded tissue. We examined assay characteristics and the performance of the DASL 502-gene Cancer Panel v1 (1.5K) and 24,526-gene panel (24K) platforms at differentiating nine human epidermal growth factor receptor 2-positive (HER2+) and 11 HER2-negative (HER2-) paraffin-embedded breast tumors.
Bland-Altman plots and Spearman correlations evaluated intra/inter-panel agreement of normalized expression values. Unequal-variance t-statistics tested for differences in expression levels between HER2+ and HER2- tumors. Regulatory network analysis was performed using Metacore (GeneGo Inc., St. Joseph, MI).
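The unequal-variance (Welch) t-statistic mentioned above can be sketched as follows. This is a minimal standard-library illustration, not the authors' code. The function name is hypothetical, and any expression values passed to it here would be made-up inputs rather than data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's unequal-variance t-statistic and its approximate
    degrees of freedom (Welch-Satterthwaite) for two samples."""
    na, nb = len(group_a), len(group_b)
    # Per-group squared standard errors, using sample variances.
    se2_a = variance(group_a) / na
    se2_b = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df
```

For a single gene, `group_a` and `group_b` would hold the normalized expression values of the HER2+ and HER2- tumors respectively. A p-value follows from the t distribution with `df` degrees of freedom, which this sketch leaves to a statistics library.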
Technical replicate correlations ranged between 0.815-0.956 and 0.986-0.997 for the 1.5K and 24K panels, respectively. Inter-panel correlations of expression values for the common 498 genes across the two panels ranged between 0.485-0.573. Inter-panel correlations of expression values of 17 probes with base-pair sequence matches between the 1.5K and 24K panels ranged between 0.652-0.899. In both panels, erythroblastic leukemia viral oncogene homolog 2 (ERBB2) was the most differentially expressed gene between the HER2+ and HER2- tumors, and seven additional genes had p-values < 0.05 and absolute log2 fold changes > 0.5 in expression between HER2+ and HER2- tumors: topoisomerase II alpha (TOP2A), cyclin a2 (CCNA2), v-fos fbj murine osteosarcoma viral oncogene homolog (FOS), wingless-type mmtv integration site family, member 5a (WNT5A), growth factor receptor-bound protein 7 (GRB7), cell division cycle 2 (CDC2), and baculoviral iap repeat-containing protein 5 (BIRC5). The top 52 discriminating probes from the 24K panel are enriched with genes belonging to the regulatory networks centered around v-myc avian myelocytomatosis viral oncogene homolog (MYC), tumor protein p53 (TP53), and estrogen receptor α (ESR1). Network analysis with a two-step extension also showed that the eight discriminating genes common to the 1.5K and 24K panels are functionally linked together through MYC, TP53, and ESR1.
The relative RNA abundances obtained from two gene panels of highly differing density are correlated, with eight common genes differentiating HER2+ and HER2- breast tumors. Network analyses demonstrated biological consistency between the 1.5K and 24K gene panels.
PMCID: PMC3022545  PMID: 21172013
25.  Childhood Exposure to Secondhand Smoke and Functional Mannose Binding Lectin Polymorphisms Are Associated with Increased Lung Cancer Risk 
Exposure to secondhand smoke during adulthood has detrimental health effects, including increased lung cancer risk. Compared with adults, children may be more susceptible to secondhand smoke. This susceptibility may be exacerbated by alterations in inherited genetic variants of innate immunity genes. We hypothesized a positive association between childhood secondhand smoke exposure and lung cancer risk that would be modified by genetic polymorphisms in the mannose binding lectin-2 (MBL2) gene resulting in well-known functional changes in innate immunity.
Childhood secondhand smoke exposure and lung cancer risk were assessed among men and women in the ongoing National Cancer Institute-Maryland Lung Cancer (NCI-MD) study, which included 624 cases and 348 controls. Secondhand smoke history was collected via in-person interviews. DNA was used for genotyping the MBL2 gene. To replicate, we used an independent case-control study from Mayo Clinic consisting of 461 never smokers, made up of 172 cases and 289 controls. All statistical tests were two-sided.
In the NCI-MD study, secondhand smoke exposure during childhood was associated with increased lung cancer risk among never smokers [odds ratio (OR), 2.25; 95% confidence interval (95% CI), 1.04-4.90]. This was confirmed in the Mayo study (OR, 1.47; 95% CI, 1.00-2.15). A functional MBL2 haplotype associated with high circulating levels of MBL and increased MBL2 activity was associated with increased lung cancer risk among those exposed to childhood secondhand smoke in both the NCI-MD and Mayo studies (OR, 2.52; 95% CI, 1.13-5.60, and OR, 2.78; 95% CI, 1.18-3.85, respectively).
Secondhand smoke exposure during childhood is associated with increased lung cancer risk among never smokers, particularly among those possessing a haplotype corresponding to a known overactive complement pathway of the innate immune system.
PMCID: PMC2951599  PMID: 19959685

