Lung adenocarcinoma is the most common type of primary lung cancer. The purpose of this study was to delineate gene expression patterns for survival prediction in lung adenocarcinoma. Gene expression profiles of 82 (discovery set) and 442 (validation set 1) lung adenocarcinoma tumor tissues were analyzed using a systems biology-based network approach. We also examined the expression profiles of 78 adjacent normal lung tissues from 82 patients. We found a significant correlation of an expression module with overall survival (adjusted hazard ratio or HR=1.71; 95% CI=1.06-2.74 in discovery set; adjusted HR=1.26; 95% CI=1.08-1.49 in validation set 1). This expression module contained genes enriched in the biological process of the cell cycle. Interestingly, the cell cycle gene module and overall survival association were also significant in normal lung tissues (adjusted HR=1.91; 95% CI, 1.32-2.75). From these survival-related modules, we further defined three hub genes (UBE2C, TPX2 and MELK) whose expression-based risk indices were more strongly associated with poor 5-year survival (HR=3.85, 95% CI=1.34-11.05 in discovery set; HR=1.72, 95% CI=1.21-2.46 in validation set 1; and HR=3.35, 95% CI=1.08-10.04 in normal lung set). The 3-gene prognostic result was further validated using 92 adenocarcinoma tumor samples (validation set 2); patients with a high-risk gene signature had a 1.52-fold higher risk of death (95% CI, 1.02–2.24) than patients with a low-risk gene signature. These results suggest that a network-based approach may facilitate discovery of key genes that are closely linked to survival in patients with lung adenocarcinoma.
Lung cancer; survival; gene expression profiling; cell cycle; systems biology
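As a reading aid for the hazard ratios reported in the lung adenocarcinoma abstract above, the following minimal sketch recovers the log-scale standard error, z statistic, and an approximate two-sided p-value from a hazard ratio and its 95% confidence interval. It assumes the interval is a Wald interval that is symmetric on the log scale, which the abstract does not state; the function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover SE(log HR), z, and a two-sided p-value from an HR and its 95% CI.

    Assumes a Wald interval symmetric on the log scale (an assumption,
    not something stated in the abstract).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return se, z, p

# Cell cycle module in the discovery set: HR=1.71, 95% CI 1.06-2.74
se, z, p = wald_from_ci(1.71, 1.06, 2.74)
print(f"SE(log HR)={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under the same assumption, the geometric mean of the two interval bounds should land close to the reported hazard ratio, which gives a quick consistency check on transcribed values.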
Caloric restriction (CR) mitigates many detrimental effects of aging and prolongs lifespan. CR has been suggested to increase mitochondrial biogenesis, thereby attenuating age-related declines in mitochondrial function; a concept that is challenged by recent studies. Here we show that lifelong CR in mice prevents age-related loss of mitochondrial oxidative capacity and efficiency, measured in isolated mitochondria and permeabilized muscle fibers. We find that these beneficial effects of CR occur without increasing mitochondrial abundance. Whole-genome expression profiling and large-scale proteomic surveys revealed expression patterns inconsistent with increased mitochondrial biogenesis, which is further supported by lower mitochondrial protein synthesis with CR. We find that CR decreases oxidant emission, increases antioxidant scavenging, and minimizes oxidative damage to DNA and protein. These results demonstrate that CR preserves mitochondrial function by protecting the integrity and function of existing cellular components rather than by increasing mitochondrial biogenesis.
Caloric Restriction; Calorie restriction; dietary restriction; aging; mitochondria; protein synthesis; skeletal muscle; mitochondrial biogenesis
Advantages of RNA-Seq over array-based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA isolated from breast tumor, using the hybridization-based NanoString nCounter (226-gene panel) and whole transcriptome RNA-Seq with RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole transcriptome protocols, respectively (p < 2×10^-16). Specifically for lincRNAs, we observed superb Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high confidence SNVs and 24% of high confidence fusion transcripts. Sensitivity of fusion transcript detection was not overcome by an increase in depth of sequencing up to 3-fold (increase from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests discovery-based whole transcriptome studies from FFPE material will produce reliable expression data.
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNV and fusion transcripts was less sensitive.
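The fresh-frozen versus FFPE comparison above rests on Pearson correlations of gene expression between matched samples. A minimal sketch of that calculation follows; the expression values are hypothetical, and the log2(x + 1) transform is an assumption, since the abstract does not say how values were normalized before correlation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical normalized counts for five genes in one matched sample pair
fresh_frozen = [120.0, 15.0, 980.0, 45.0, 300.0]
ffpe = [95.0, 22.0, 760.0, 30.0, 410.0]

# Log-transform (assumed) so a few highly expressed genes do not dominate
log_ff = [math.log2(v + 1) for v in fresh_frozen]
log_ffpe = [math.log2(v + 1) for v in ffpe]

r = pearson(log_ff, log_ffpe)
print(f"Pearson r = {r:.3f}")
```

A mean correlation such as those reported would then be the average of `r` over all matched pairs.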
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple negative, and HER2-positive tumors to identify 685 differentially expressed genes, 102 alternatively spliced genes, and 303 genes that expressed single nucleotide sequence variants (eSNVs) that were associated with the HER2-positive tumors in our survey panel. These features were integrated into a transcriptome landscape model that identified 12 highly interconnected genomic modules, each of which represents a cellular processes pathway that appears to define the genomic architecture of the HER2-positive tumors in our test set. The generality of the model was confirmed by the observation that several key pathways were enriched in HER2-positive TCGA breast tumors. The ability of this model to make relevant predictions about the biology of breast cancer cells was established by the observation that integrin signaling was linked to lapatinib sensitivity in vitro and strongly associated with risk of relapse in the NCCTG N9831 adjuvant trastuzumab clinical trial dataset. Additional modules from the HER2 transcriptome model, including ubiquitin-mediated proteolysis, TGF-beta signaling, RHO-family GTPase signaling, and M-phase progression, were linked to response to lapatinib and paclitaxel in vitro and/or risk of relapse in the N9831 dataset. These data indicate that an integrated transcriptome landscape model derived from a test set of HER2-positive breast tumors has potential for predicting outcome and for identifying novel potential therapeutic strategies for this breast cancer subtype.
PolyA selection is the most common approach to library preparation for RNA sequencing. For limited or degraded RNA, alternative protocols such as NuGEN have been developed. However, it is not yet clear how the different library preparations affect downstream analyses across the broad applications of RNA sequencing.
Methods and Materials
Eight human mammary epithelial cell (HMEC) lines with high quality RNA were sequenced by Illumina’s mRNA-Seq PolyA selection and NuGEN ENCORE library preparation. The following analyses and comparisons were conducted: 1) the numbers of genes captured by each protocol; 2) the impact of protocols on differentially expressed gene detection between biological replicates; 3) expressed single nucleotide variant (SNV) detection; 4) non-coding RNAs, particularly lincRNA detection; and 5) intragenic gene expression.
Sequences from the NuGEN protocol had a lower alignment rate (75%) than those from the PolyA protocol (over 90%). The NuGEN protocol detected 12–20% fewer genes, with a significant portion of reads mapped to non-coding regions. A large number of genes were differentially detected between the two protocols. About 17–20% of the differentially expressed genes between biological replicates were detected by both protocols. Significantly more SNVs (5–6 times as many) were detected in the NuGEN samples, largely from intragenic and intergenic regions. The NuGEN protocol captured 25% fewer exons and had higher base-level coverage variance. While 6.3% of reads were mapped to intragenic regions in the PolyA samples, the percentages were much higher (20–25%) for the NuGEN samples. The NuGEN protocol did not detect more known non-coding RNAs such as lincRNAs, but targeted small and “novel” lincRNAs.
Different library preparations can have significant impacts on downstream analysis and interpretation of RNA-seq data. The NuGEN provides an alternative for limited or degraded RNA but it has limitations for some RNA-seq applications.
The objective of this study was to analyze genome-wide differential methylation patterns in maternal leukocyte DNA in early pregnant and non-pregnant states. This is an age and body mass index matched case-control study comparing the methylation patterns of 27,578 cytosine-guanine (CpG) sites in 14,495 genes in maternal leukocyte DNA in early pregnancy (n = 14), in the same women postpartum (n = 14), and in nulligravid women (n = 14) on a BeadChip platform. Transient widespread hypomethylation was found in early pregnancy as compared with the non-pregnant states. Methylation of nine genes was significantly different in early pregnancy compared with both postpartum and nulligravid states (< 10% False Discovery Rate). Early pregnancy may be characterized by widespread hypomethylation compared with non-pregnant states; there is no apparent permanent methylation imprint after a normal term gestation. Nine potential candidate genes were identified as differentially methylated in early pregnancy and may play a role in the maternal adaptation to pregnancy.
DNA Methylation; epigenetics; immune; leukocyte; maternal; postpartum; pregnancy
MicroRNA plays an important role in human diseases and cancer. We seek to investigate the expression status, clinical relevance, and functional role of microRNA in non-small cell lung cancer.
We performed miRNA expression profiling in matched lung adenocarcinoma and uninvolved lung using 56 pairs of fresh-frozen (FF) and 47 pairs of formalin-fixed, paraffin-embedded (FFPE) samples from never smokers. The most differentially expressed miRNA genes were evaluated by Cox analysis and Log-Rank test. Among the best candidates, miR-708 was further examined for differential expression in two independent cohorts. Functional significance of miR-708 expression in lung cancer was examined by identifying its candidate mRNA target and by manipulating its expression levels in cultured cells.
Among the 20 miRNAs most differentially expressed between tested tumor and normal samples, a high expression level of miR-708 in the tumors was most strongly associated with an increased risk of death after adjustment for all clinically significant factors including age, sex, and tumor stage (FF cohort: HR, 1.90; 95% CI, 1.08-3.35; P=.025 and FFPE cohort: HR, 1.93; 95% CI, 1.02-3.63; P=.042). The transcript for the TMEM88 gene has a miR-708 binding site in its 3′ UTR and was significantly reduced in tumors with high miR-708 expression. Forced miR-708 expression reduced TMEM88 transcript levels and increased the rate of cell proliferation, invasion, and migration in culture.
MicroRNA-708 acts as an oncogene contributing to tumor growth and disease progression by directly downregulating TMEM88, a negative regulator of the Wnt signaling pathway in lung cancer.
NSCLC; adenocarcinoma; miR-708; never smoker; survival; TMEM88; Wnt signaling
Exosomes, endosome-derived membrane microvesicles, contain specific RNA transcripts that are thought to be involved in cell-cell communication. These RNA transcripts have great potential as disease biomarkers. To characterize exosomal RNA profiles systematically, we performed RNA sequencing analysis using three human plasma samples and evaluated the efficacies of small RNA library preparation protocols from three manufacturers. In all, we evaluated 14 libraries (7 replicates).
From the 14 size-selected sequencing libraries, we obtained a total of 101.8 million raw single-end reads, an average of about 7.27 million reads per library. Sequence analysis showed that there was a diverse collection of the exosomal RNA species among which microRNAs (miRNAs) were the most abundant, making up over 42.32% of all raw reads and 76.20% of all mappable reads. At the current read depth, 593 miRNAs were detectable. The five most common miRNAs (miR-99a-5p, miR-128, miR-124-3p, miR-22-3p, and miR-99b-5p) collectively accounted for 48.99% of all mappable miRNA sequences. MiRNA target gene enrichment analysis suggested that the highly abundant miRNAs may play an important role in biological functions such as protein phosphorylation, RNA splicing, chromosomal abnormality, and angiogenesis. From the unknown RNA sequences, we predicted 185 potential miRNA candidates. Furthermore, we detected significant fractions of other RNA species including ribosomal RNA (9.16% of all mappable counts), long non-coding RNA (3.36%), piwi-interacting RNA (1.31%), transfer RNA (1.24%), small nuclear RNA (0.18%), and small nucleolar RNA (0.01%); fragments of coding sequence (1.36%), 5′ untranslated region (0.21%), and 3′ untranslated region (0.54%) were also present. In addition to the RNA composition of the libraries, we found that the three tested commercial kits generated a sufficient number of DNA fragments for sequencing but each had significant bias toward capturing specific RNAs.
This study demonstrated that a wide variety of RNA species are embedded in the circulating vesicles. To our knowledge, this is the first report that applied deep sequencing to discover and characterize profiles of plasma-derived exosomal RNAs. Further characterization of these extracellular RNAs in diverse human populations will provide reference profiles and open new doors for the development of blood-based biomarkers for human diseases.
Exosome; microRNA; Next generation sequencing; Plasma; Biomarker
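The read statistics in the exosomal RNA abstract above can be cross-checked with simple arithmetic. The sketch below reproduces the per-library average and derives the fraction of raw reads that were mappable. That second figure is not reported in the abstract; it is inferred here on the assumption that the two miRNA percentages (42.32% of raw reads, 76.20% of mappable reads) describe the same set of miRNA reads.

```python
# Figures taken from the abstract
total_raw_reads = 101.8e6   # raw single-end reads across all libraries
n_libraries = 14

# Average reads per library (the abstract reports about 7.27 million)
mean_reads_per_library = total_raw_reads / n_libraries

# miRNA share of raw reads and of mappable reads, as reported
mirna_of_raw = 0.4232
mirna_of_mappable = 0.7620

# Inferred, not reported: if both shares count the same miRNA reads,
# mappable / raw = (miRNA / raw) / (miRNA / mappable)
mappable_fraction = mirna_of_raw / mirna_of_mappable

print(f"mean reads per library: {mean_reads_per_library / 1e6:.2f} million")
print(f"implied mappable fraction of raw reads: {mappable_fraction:.3f}")
```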
Insulin regulates many cellular processes, but the full impact of insulin deficiency on cellular functions remains to be defined. Applying a mass spectrometry–based nontargeted metabolomics approach, we report here alterations of 330 plasma metabolites representing 33 metabolic pathways during an 8-h insulin deprivation in type 1 diabetic individuals. These pathways included those known to be affected by insulin such as glucose, amino acid and lipid metabolism, Krebs cycle, and immune responses and those hitherto unknown to be altered including prostaglandin, arachidonic acid, leukotrienes, neurotransmitters, nucleotides, and anti-inflammatory responses. A significant concordance of metabolome and skeletal muscle transcriptome–based pathways supports an assumption that plasma metabolites are chemical fingerprints of cellular events. Although insulin treatment normalized plasma glucose and many other metabolites, there were 71 metabolites and 24 pathways that differed between nondiabetes and insulin-treated type 1 diabetes. Confirmation of many known pathways altered by insulin using a single blood test offers confidence in the current approach. Future research needs to be focused on newly discovered pathways affected by insulin deficiency and systemic insulin treatment to determine whether they contribute to the high morbidity and mortality in T1D despite insulin treatment.
Chronic Obstructive Pulmonary Disease (COPD) is a strong risk factor for lung cancer. Published studies regarding variations of genes encoding glutathione metabolism, DNA repair, and inflammatory response pathways in susceptibility to COPD were inconclusive.
We evaluated 470 single nucleotide polymorphisms (SNPs) from 56 genes of these 3 pathways in 620 cases and 893 controls to identify susceptibility markers for COPD risk, using existing resources. We assessed SNP- and gene-level effects adjusting for sex, age, and smoking status. Differential genetic effects on disease risk with and without lung cancer were also assessed; cumulative risk models were established.
Twenty-one SNPs were found to be significantly associated with risk of COPD (P<0.01); gene-based analyses confirmed 2 genes (GCLC and GSS) and identified 3 additional (GSTO2, ERCC1, and RRM1). Carrying 12 high-risk alleles may increase risk by 2.7-fold; 8 SNPs altered COPD risk with lung cancer 3.1-fold, and 4 SNPs altered the risk without lung cancer 2.3-fold.
Our findings indicate that multiple genetic variations in the 3 selected pathways contribute to COPD risk through GCLC, GSS, GSTO2, ERCC1, and RRM1 genes. Functional studies are needed to elucidate the mechanisms of these genes in the development of COPD, lung cancer, or both.
Chronic Obstructive Pulmonary Disease; Glutathione Metabolism Pathway; DNA Repair Pathway; Inflammatory Response Pathway
Background and objective
Gemcitabine is widely used to treat non-small cell lung cancer (NSCLC). To assess the pharmacogenomic effects of the entire gemcitabine metabolic pathway, we genotyped SNPs within the 17 pathway genes using DNA samples from NSCLC patients treated with gemcitabine and determined the effect of these genetic variants on overall survival (OS) after gemcitabine treatment.
Eight of the 17 pathway genes were resequenced with DNA samples from Coriell lymphoblastoid cell lines (LCLs) using Sanger sequencing for all exons, exon-intron junctions and 5′-, 3′-UTRs. A total of 107 tag SNPs were selected based on the resequencing data for the 8 genes and on HapMap data for the remaining 9 genes, followed by successful genotyping of 394 NSCLC patient DNA samples. Association of SNPs/haplotypes with OS was performed using the Cox regression model, followed by functional studies performed with LCLs and NSCLC cell lines.
Five SNPs in 4 genes (CDA, NT5C2, RRM1, and SLC29A1) and 9 haplotypes in 4 genes (RRM1, RRM2, SLC28A3, and SLC29A1) were associated with OS in these NSCLC patients (P < 0.05). Genotype imputation using the LCLs was performed for a region of 200 kb surrounding those SNPs, followed by association studies with gemcitabine cytotoxicity. Functional studies demonstrated that downregulation of SLC29A1, NT5C2, and RRM1 in NSCLC cell lines altered cell susceptibility to gemcitabine.
These studies help identify biomarkers to predict gemcitabine response in NSCLC, a step toward the individualized chemotherapy of lung cancer.
gemcitabine; pharmacogenomics; metabolic pathway; lymphoblastoid cell lines; non-small cell lung cancer (NSCLC)
Endoxifen, a cytochrome P450 mediated tamoxifen metabolite, is being developed as a drug for the treatment of estrogen receptor (ER) positive breast cancer. Endoxifen is known to be a potent anti-estrogen and its mechanisms of action are still being elucidated. Here, we demonstrate that endoxifen-mediated recruitment of ERα to known target genes differs from that of 4-hydroxy-tamoxifen (4HT) and ICI-182,780 (ICI). Global gene expression profiling of MCF7 cells revealed substantial differences in the transcriptome following treatment with 4HT, endoxifen and ICI, both in the presence and absence of estrogen. Alterations in endoxifen concentrations also dramatically altered the gene expression profiles of MCF7 cells, even in the presence of clinically relevant concentrations of tamoxifen and its metabolites, 4HT and N-desmethyl-tamoxifen (NDT). Pathway analysis of differentially regulated genes revealed substantial differences related to endoxifen concentrations including significant induction of cell cycle arrest and markers of apoptosis following treatment with high, but not low, concentrations of endoxifen. Taken together, these data demonstrate that endoxifen’s mechanism of action is different from that of 4HT and ICI and provide mechanistic insight into the potential importance of endoxifen in the suppression of breast cancer growth and progression.
Taxanes are among the first-line treatments for lung cancer. To identify novel single nucleotide polymorphisms (SNPs) that might contribute to taxane response, we performed a genome-wide association study (GWAS) for two taxanes, paclitaxel and docetaxel, using 276 lymphoblastoid cell lines (LCLs), followed by genotyping of top candidate SNPs in 874 lung cancer patient samples treated with paclitaxel.
GWAS was performed using 1.3 million SNPs and taxane cytotoxicity IC50 values for 276 LCLs. The association of selected SNPs with overall survival in 76 small cell lung cancer (SCLC) and 798 non-small cell lung cancer (NSCLC) patients was analyzed using the Cox regression model, followed by integrated SNP-microRNA-expression association analysis in LCLs and siRNA screening of candidate genes in SCLC (H196) and NSCLC (A549) cell lines.
In the LCLs, 147 and 180 SNPs were associated with paclitaxel and docetaxel IC50s, respectively, with p-values <10^-4. Genotyping of 153 candidate SNPs in 874 lung cancer patient samples identified 8 SNPs (p-value < 0.05) associated with either SCLC or NSCLC patient overall survival. Knockdown of PIP4K2A, CCT5, CMBL, EXO1, KMO and OPN3, genes within 200 kb up-/downstream of the 3 SNPs that were associated with SCLC overall survival (rs1778335, rs2662411 and rs7519667), significantly desensitized H196 to paclitaxel. SNPs rs2662411 and rs1778335 were associated with mRNA expression of CMBL or PIP4K2A through microRNA (miRNA) hsa-miR-584 or hsa-miR-1468.
GWAS in an LCL model system, joined with clinical translational and functional studies, might help us identify genetic variations associated with overall survival of lung cancer patients treated with paclitaxel.
Taxane; Genome-wide association; Lymphoblastoid cell line; Lung cancer; Overall survival
Studies from selected candidate genes suggest that single nucleotide polymorphisms (SNP) involved in glutathione metabolism, DNA repair, or inflammatory responses may affect overall survival (OS) in stages I-II or low stage non-small cell lung cancer (LS-NSCLC); however, results are inconclusive. In this study, we took a systematic pathway-based approach to simultaneously evaluate the impact of genetic variation from these three pathways on OS following LS-NSCLC diagnosis.
DNA from 647 patients with LS-NSCLC was genotyped for 480 SNPs (tagSNPs) tagging 57 genes from the three candidate pathways. Associations of tagSNPs with OS were assessed at the individual SNP and whole gene levels, adjusting for age, tumor stage, surgery type, and adjuvant therapy. The genotype combinations of the SNPs associated with OS were also estimated.
Among the 412 tagSNPs that were successfully genotyped and passed multi-step quality assessments, 28 showed association with OS (p<0.05). Two of the 28 were estimated to have less than a 20% chance of being false positives (rs3768490 in GSTM4 gene: p=1.32×10^-4, q=0.06; rs1729786 in ABCC4 gene: p=9.25×10^-4, q=0.20). Gene-based analysis suggested that, in addition to GSTM4 and ABCC4, variation in two other genes, PTGS2 and GSTA2, was also associated with OS.
We describe further evidence that variations in genes involved in the glutathione and inflammatory response pathways are associated with OS in patients with LS-NSCLC. Further studies are warranted to verify our findings and elucidate their functional mechanisms and clinical utility leading to improved survival for lung cancer patients.
glutathione metabolism; DNA repair; inflammation response; genetic polymorphisms; non-small-cell lung cancer; survival analysis
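The q-values quoted for the two tagSNPs in the LS-NSCLC abstract above are false discovery rate estimates. The abstract does not name the procedure used, so the sketch below uses the Benjamini-Hochberg adjustment purely as an illustration. The short p-value list is hypothetical, and the final two lines assume that the reported SNPs rank first and second among the 412 tagSNPs tested.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, for illustration only
example = bh_adjust([0.001, 0.008, 0.039, 0.041, 0.60])
print(example)

# Rank-based values p * m / rank for the two reported SNPs, assuming they
# hold ranks 1 and 2 among m = 412 tests (upper bounds on the BH values)
m = 412
raw_first = 1.32e-4 * m / 1
raw_second = 9.25e-4 * m / 2
print(f"{raw_first:.3f}, {raw_second:.3f}")
```

Under these assumptions the rank-based values come out near, though not identical to, the reported q-values, so the authors may have used a related but different estimator.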
Inherited variability in the prognosis of lung cancer patients treated with platinum-based chemotherapy has been widely investigated. However, the overall contribution of genetic variation to platinum response is not well established. To identify novel candidate SNPs/genes, we performed a genome-wide association study (GWAS) for cisplatin cytotoxicity using lymphoblastoid cell lines (LCLs), followed by an association study of selected SNPs from the GWAS with overall survival (OS) in lung cancer patients.
GWAS for cisplatin were performed with 283 ethnically diverse LCLs. 168 top SNPs were genotyped in 222 small cell and 961 non-small cell lung cancer (SCLC, NSCLC) patients treated with platinum-based therapy. Association of the SNPs with OS was determined using the Cox regression model. Selected candidate genes were functionally validated by siRNA knockdown in human lung cancer cells.
Among 157 successfully genotyped SNPs, 9 and 10 SNPs were top SNPs associated with OS for patients with NSCLC and SCLC, respectively, although they were not significant after adjusting for multiple testing. Fifteen genes, including 7 located within 200 kb up or downstream of the four top SNPs and 8 genes for which expression was correlated with three SNPs in LCLs were selected for siRNA screening. Knockdown of DAPK3 and METTL6, for which expression levels were correlated with the rs11169748 and rs2440915 SNPs, significantly decreased cisplatin sensitivity in lung cancer cells.
This series of clinical and complementary laboratory-based functional studies identified several candidate genes/SNPs that might help predict treatment outcomes for platinum-based therapy of lung cancer.
Lung cancer; cisplatin; pharmacogenomics; lymphoblastoid cell lines; GWAS
Copy number variants (CNVs) have been implicated in many complex diseases. We examined whether inherited CNVs were associated with overall survival among women with invasive epithelial ovarian cancer. Germline DNA from 1,056 cases (494 deceased, average of 3.7 years follow-up) was interrogated with the Illumina 610 quad genome-wide array containing, after quality control exclusions, 581,903 single nucleotide polymorphisms (SNPs) and 17,917 CNV probes. Comprehensive analysis capitalized upon the strengths of three complementary approaches to CNV classification. First, to identify small CNVs, single markers were evaluated and, where associated with survival, consecutive markers were combined. Two chromosomal regions were associated with survival using this approach (14q31.3 rs2274736 p = 1.59 × 10^-6, p = 0.001; 22q13.31 rs2285164 p = 4.01 × 10^-5, p = 0.009), but were not significant after multiple testing correction. Second, to identify large CNVs, genome-wide segmentation was conducted to characterize chromosomal gains and losses, and association with survival was evaluated by segment. Four regions were associated with survival (1q21.3 loss p = 0.005, 5p14.1 loss p = 0.004, 9p23 loss p = 0.002, and 15q22.31 gain p = 0.002); however, again, after correcting for multiple testing, no regions were statistically significant, and none were in common with the single marker approach. Finally, to evaluate associations with general amounts of copy number changes across the genome, we estimated CNV burden based on genome-wide numbers of gains and losses; no associations with survival were observed (p > 0.40). Although CNVs that were not well-covered by the Illumina 610 quad array merit investigation, these data suggest no association between inherited CNVs and survival after ovarian cancer.
association testing; copy number variation; genotyping array; ovarian cancer; overall survival
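The ovarian cancer CNV abstract above reports single-marker p-values that did not survive multiple testing correction. The abstract does not say which correction was applied; as a rough illustration only, the sketch below applies a Bonferroni threshold across all markers on the array after quality control and compares it with the first p-value listed for each region.

```python
# Marker counts after quality control, as reported in the abstract
n_snps = 581_903
n_cnv_probes = 17_917
n_tests = n_snps + n_cnv_probes

# Bonferroni threshold at a family-wise alpha of 0.05
# (an assumed correction; the abstract does not name the method)
alpha = 0.05
threshold = alpha / n_tests

# First p-value reported for each single-marker region
reported = {
    "14q31.3 rs2274736": 1.59e-6,
    "22q13.31 rs2285164": 4.01e-5,
}
passes = {marker: p < threshold for marker, p in reported.items()}

print(f"tests: {n_tests}, threshold: {threshold:.2e}")
print(passes)
```

Under this assumed correction neither marker reaches the threshold, which agrees with the abstract's statement that the regions were not significant after correction.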
Summary: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good quality and analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report to biological interpretation.
Availability and implementation: SAAP-RRBS is freely available to non-commercial users at the web site http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.
Contact: email@example.com or firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Variations in genes related to anticancer drugs' biologic activity could influence treatment responses and lung cancer prognosis. Genetic variants in four biological pathways, i.e., glutathione metabolism, DNA repair, cell cycle, and EGFR, were systematically investigated to examine their association with survival in advanced-stage NSCLC treated with chemotherapy.
A total of 894 tagging single-nucleotide polymorphisms (tagSNPs) in 70 genes from the four pathways were genotyped and analyzed in a 1076-patient cohort. Association with overall survival was analyzed at single-SNP and whole-gene levels within all patients and major chemotherapy agent combination groups.
A poorer overall survival was observed in patients with genetic variations in GSS (glutathione pathway) and MAP3K1 (EGFR pathway) (HR=1.45, 95% CI=1.20–1.70 and HR=1.25, 95% CI=1.05–1.50, respectively). In stratified analysis on patients receiving platinum plus taxane treatment, we observed a hazardous effect on overall survival by MAP3K1 variant (HR=1.38, 95% CI =1.11–1.72) and a protective effect by RAF1 (HR=0.64, 95% CI=0.5–0.82) in the EGFR pathway. In patients receiving platinum plus gemcitabine treatment, RAF and GPX5 (glutathione pathway) genetic variations showed protective effects on survival (HR=0.54, 95% CI=0.38–0.77; HR=0.67, 95% CI=0.52–0.85, respectively); in contrast, NRAS (EGFR pathway) and GPX7 (glutathione pathway) variations showed hazardous effects on overall survival (HR=1.91, 95% CI=1.30–2.80; HR=1.83, 95% CI=1.27–2.63, respectively). All genes that harbored these significant SNPs remained significant by whole-gene analysis.
Common genetic variations in genes of the EGFR and glutathione pathways may be associated with overall survival among patients with advanced-stage NSCLC treated with platinum, taxane, and/or gemcitabine combinations.
non-small cell lung cancer; survival; single-nucleotide polymorphisms; pathway; chemotherapy
The androgen receptor (AR) is the principal target for treatment of non-organ confined prostate cancer (PCa). Androgen deprivation therapies (ADTs) directed against the AR ligand-binding domain do not fully inhibit androgen-dependent signaling critical for PCa progression. Thus, information that could direct the development of more effective ADTs is desired. Systems and bioinformatics approaches suggest that considerable variation exists in the mechanisms by which AR regulates expression of effector genes, pointing to a role for secondary transcription factors. A combination of microarray and in silico analyses led us to identify a 158-gene signature that relies on AR along with the transcription factor SRF, representing < 6% of androgen-dependent genes. This AR-SRF signature is sufficient to distinguish microdissected benign and malignant prostate samples, and it correlates with the presence of aggressive disease and poor outcome. Compared to other AR target gene signatures of similar size, the AR-SRF signature described here associates more strongly with biochemical failure. Further, it is enriched in malignant versus benign prostate tissues, compared to other signatures. To our knowledge, this profile represents the first demonstration of a distinct mechanism of androgen action with clinical relevance in PCa, offering a possible rationale to develop novel and more effective forms of ADT.
prostate cancer; androgen receptor; transcription; gene expression; disease progression
The Epstein-Barr virus (EBV) nuclear antigen 2 (EBNA-2) plays a key role in B-cell growth transformation by initiating and maintaining the proliferation of infected B cells upon EBV infection in vitro. Most studies of EBNA-2 have focused on its functions, yet little is known about its intertypic polymorphisms.
The coding region for amino acids (aa) 148-487 of the EBNA-2 gene was sequenced in 25 EBV-associated gastric carcinomas (EBVaGCs), 56 nasopharyngeal carcinomas (NPCs) and 32 throat washings (TWs) from healthy donors in Northern China. Three variations (g48991t, c48998a, t49613a) were detected in all of the samples (113/113, 100%). EBNA-2 could be classified into four distinct subtypes, E2-A, E2-B, E2-C and E2-D, based on the deletion status of three aa (294Q, 357K and 358G). Subtypes E2-A and E2-C were detected in 56/113 (49.6%) and 38/113 (33.6%) samples, respectively. E2-A was observed more often in EBVaGC samples, and subtype E2-D was detected only in the NPC samples. In the variation analysis of EBNA-2 functional domains, changes at the TAD residue (I438L) and the NLS residues (E476G, P484H and I486T), which are located in the carboxyl terminus of EBNA-2, were detected only in NPC samples.
The subtypes E2-A and E2-C were the dominant genotypes of the EBNA-2 gene in Northern China. The subtype E2-D may be associated with the tumorigenesis of NPC. The NPC isolates were prone to harbor more mutations in the functional domains than the other two groups.
Epstein-Barr virus; Gastric carcinoma; Nasopharyngeal carcinoma; Nuclear antigen 2; Polymorphism
The heparin-degrading endosulfatases sulfatase 1 (SULF1) and sulfatase 2 (SULF2) have opposing effects in hepatocarcinogenesis despite structural similarity. Using mRNA expression arrays, we analyzed the correlations of SULF expression with signaling networks in human hepatocellular carcinomas (HCCs) and the associations of SULF expression with tumor phenotype and patient survival. Data from two mRNA microarray analyses of 139 and 36 HCCs and adjacent tissues were used as training and validation sets. Partek and Metacore software were used to identify SULF correlated genes and their associated signaling pathways. Associations between SULF expression, the hepatoblast subtype of HCC, and survival were examined. Both SULF1 and 2 had strong positive correlations with periostin, IQGAP1, TGFB1, and vimentin and inverse correlations with HNF4A and IQGAP2. Genes correlated with both SULFs were highly associated with the cell adhesion, cytoskeletal remodeling, blood coagulation, TGFB, and Wnt/β-catenin and epithelial mesenchymal transition signaling pathways. Genes uniquely correlated with SULF2 were more associated with neoplastic processes than genes uniquely correlated with SULF1. High SULF expression was associated with the hepatoblast subtype of HCC. There was a bimodal effect of SULF1 expression on prognosis, with patients in the lowest or highest tertile having a worse prognosis than those in the middle tertile. SULFs have complex effects on HCC signaling and patient survival. There are functionally similar associations with cell adhesion, ECM remodeling, TGFB, and WNT pathways, but also unique associations of SULF1 and SULF2. The roles and targeting of the SULFs in cancer require further investigation.
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants (SNVs), were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted based on gene–gene connections from the integrated network and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to targeted therapeutics directed at these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.
transcriptome sequencing; RNA-Seq; KRAS mutation; NSCLC; bioinformatics; network analysis; data integration and computational methods
Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. Similar to other microarray experiments, methylation data is susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to issues related to normalization and batch effect correction for this kind of data.
We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated on the HumanMethylation27 platform: quantile normalization at average β value (QNβ); two-step quantile normalization at probe signals implemented in the "lumi" package of R (lumi); and quantile normalization of A and B signals separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated.
Each normalization method could remove a portion of the batch effects, and their effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects, and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Datasets 2 and 3, respectively. After QNβ, lumi or ABnorm, the percentage of CpGs associated with batch effects was reduced to 24, 32, and 26 percent for Dataset 2 and to 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed the remaining non-biological effects. More importantly, the two-step procedure almost tripled the number of CpGs associated with the outcome of interest for the two datasets.
Genome-wide methylation data from the Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can reduce some, but not all, batch effects. EB correction along with normalization is recommended for effective batch effect removal.
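As an illustration of the normalization step discussed above, the following is a minimal sketch of generic quantile normalization applied to per-sample values such as average β values. It is not the authors' QNβ code, nor the "lumi" or ABnorm procedures; the function name `quantile_normalize` and the list-of-lists input layout are assumptions made for this example, and ties are broken by original probe order for simplicity.

```python
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Hypothetical helper for illustration: each value is replaced by the mean,
    across samples, of the values that share its within-sample rank.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must contain the same number of probes")

    # Sort each sample, then average across samples at each rank.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(s[i] for s in sorted_samples) for i in range(n)]

    normalized = []
    for s in samples:
        # Probe indices ordered by value (stable sort, so ties keep input order).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this step every sample shares the same set of values, so differences in overall distribution between arrays are removed. Batch effects that act on individual probes can persist, which is consistent with the finding above that normalization alone was insufficient for the datasets with obvious batch effects.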
Summary: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways.
Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website: http://ndc.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm
Contact: Hossain.Asif@mayo.edu; Kocher.JeanPierre@mayo.edu
Supplementary information: Supplementary data are provided at Bioinformatics online.
Gene expression is suppressed by DNA methylation. The goal of this study was to identify genes whose CpG site methylation and mRNA expression are associated with recurrence after surgical resection for hepatocellular carcinoma (HCC). Sixty-two HCCs were examined by both whole genome DNA methylation and transcriptome analysis. The Cox model was used to select genes associated with recurrence. Validation was performed in an independent cohort of 66 HCC patients. Among fifty-nine common genes, increased CpG site methylation and decreased mRNA expression were associated with recurrence for 12 genes (Group A), whereas decreased CpG site methylation and increased mRNA expression were associated with recurrence for 25 genes (Group B). The remaining 22 genes were defined as Group C. Complement factor H (CFH) and myosin VIIA and Rab interacting protein (MYRIP) in Group A; proline/serine-rich coiled-coil 1 (PSRC1), meiotic recombination 11 homolog A (MRE11A), and myosin IE (MYO1E) in Group B; and autophagy-related protein LC3 A (MAP1LC3A) and NADH dehydrogenase 1 alpha subcomplex assembly factor 1 (NDUFAF1) in Group C were validated. In conclusion, potential tumor suppressors (CFH, MYRIP) and oncogenes (PSRC1, MRE11A, MYO1E) in HCC are reported. The regulation of individual genes by methylation in hepatocarcinogenesis needs to be validated.
Carcinoma, Hepatocellular; Gene Expression Profiling; Microarray analysis; DNA methylation; Survival