Results 1-25 (54)

author:("Sun, hifu")
1.  Increased BCAR1 predicts poor outcomes of non-small cell lung cancer in multiple-center patients 
Annals of surgical oncology  2013;20(Suppl 3):S701-S708.
We aimed to study the prognostic value of BCAR1 expression and its associations with clinical and demographic characteristics in non-small cell lung cancer (NSCLC) patients from multiple centers.
Gene expression microarray (mRNA) data from 77 adenocarcinomas from Mayo Clinic, RNA sequencing of 508 NSCLC cases from The Cancer Genome Atlas (TCGA), and immunohistochemistry (IHC) staining of BCAR1 protein expression in 150 cases from Daping Hospital were included in the study. The association of mRNA or protein expression with patient clinical characteristics and overall survival was assessed in each dataset. We also predicted microRNAs (miRNAs) that target BCAR1 using bioinformatics prediction tools and evaluated miRNA expression patterns against BCAR1 expression in miRNA-sequencing data from 74 lung cancer cases in the TCGA dataset.
In the Mayo Clinic dataset, a higher BCAR1 mRNA level was significantly correlated with more advanced tumor stage and lymphatic metastasis. Similar changes were observed in the TCGA RNA-seq dataset. Additionally, higher BCAR1 mRNA levels predicted poorer survival in adenocarcinoma and squamous carcinoma from the TCGA dataset. Protein levels in adenocarcinoma cases with lymphatic metastasis were significantly higher than in those without metastasis. Tumor tissues showed markedly higher protein levels than matched normal tissues, although no significant difference in BCAR1 mRNA expression was detected between tumor and matched normal tissues. Among miRNAs down-regulated in the tumors, Let-7f-2 and miR-22 differed the most (P<0.001 and P=0.007, respectively).
We confirmed that increased BCAR1 expression predicts poorer prognosis in NSCLC. We postulate that the mRNA-protein decoupling of BCAR1 may result from reduced inhibition by specific miRNAs in tumor tissues, which warrants further study.
PMCID: PMC4010387  PMID: 23904007
BCAR1; NSCLC; prognosis; mRNA; miRNA
2.  MACE: model based analysis of ChIP-exo 
Nucleic Acids Research  2014;42(20):e156.
Understanding the role of a given transcription factor (TF) in regulating gene expression requires precise mapping of its binding sites in the genome. Chromatin immunoprecipitation-exo (ChIP-exo), an emerging technique that uses λ exonuclease to digest TF-unbound DNA after ChIP, is designed to reveal transcription factor binding site (TFBS) boundaries with near-single-nucleotide resolution. Although ChIP-exo promises deeper insights into transcriptional regulation, no dedicated bioinformatics tool exists to leverage its advantages. Most ChIP-seq and ChIP-chip analytic methods are not tailored to ChIP-exo and thus cannot take full advantage of high-resolution ChIP-exo data. Here we describe a novel analysis framework, termed MACE (model-based analysis of ChIP-exo), dedicated to ChIP-exo data analysis. The MACE workflow consists of four steps: (i) sequencing data normalization and bias correction; (ii) signal consolidation and noise reduction; (iii) single-nucleotide-resolution border peak detection using the Chebyshev inequality; and (iv) border matching using the Gale-Shapley stable matching algorithm. When applied to published human CTCF, yeast Reb1 and our own mouse ONECUT1/HNF6 ChIP-exo data, MACE defines TFBSs with high sensitivity, specificity and spatial resolution, as evidenced by multiple criteria including motif enrichment, sequence conservation, direct sequence pileup, nucleosome positioning and open chromatin states. In addition, we show that the fundamental advance of MACE is the high-resolution identification of both boundaries of a TFBS, whereas other methods report only a single location for the same event. The two boundaries help elucidate the in vivo binding structure of a given TF, e.g. whether the TF may bind as a dimer or in a complex with other co-factors.
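Steps (iii) and (iv) can be illustrated with a minimal Python sketch. It is not MACE's implementation: it assumes per-position counts of read 5′ ends on each strand, flags positions more than `k` standard deviations above the mean as borders (Chebyshev's inequality bounds the fraction of such positions by 1/k² for any background distribution), and pairs forward-strand (left) borders with downstream reverse-strand (right) borders via Gale-Shapley, with both sides preferring closer partners. The threshold `k`, the `max_dist` window and the distance-based preferences are illustrative assumptions.

```python
import statistics
from typing import Dict, List, Tuple


def chebyshev_borders(counts: List[float], k: float = 5.0) -> List[int]:
    """Positions whose count exceeds mean + k * sd.

    Chebyshev's inequality: for any distribution, at most 1/k**2 of values
    lie k or more standard deviations from the mean, so the cutoff needs no
    assumption about the shape of the background.
    """
    mu = statistics.fmean(counts)
    sd = statistics.pstdev(counts)
    cutoff = mu + k * sd
    return [i for i, c in enumerate(counts) if c > cutoff]


def stable_match(left: List[int], right: List[int], max_dist: int = 50) -> List[Tuple[int, int]]:
    """Gale-Shapley matching of left (forward-strand) borders to right
    (reverse-strand) borders lying 1..max_dist bp downstream.

    Both sides prefer closer partners. Returns sorted (left, right) pairs.
    """
    prefs: Dict[int, List[int]] = {
        lb: sorted((rb for rb in right if 0 < rb - lb <= max_dist), key=lambda rb: rb - lb)
        for lb in left
    }
    next_choice = {lb: 0 for lb in left}
    engaged: Dict[int, int] = {}  # right border -> currently matched left border
    free = list(left)
    while free:
        lb = free.pop()
        if next_choice[lb] >= len(prefs[lb]):
            continue  # lb has proposed to every acceptable partner; stays unmatched
        rb = prefs[lb][next_choice[lb]]
        next_choice[lb] += 1
        current = engaged.get(rb)
        if current is None:
            engaged[rb] = lb
        elif rb - lb < rb - current:  # rb prefers the closer proposer
            engaged[rb] = lb
            free.append(current)
        else:
            free.append(lb)
    return sorted((lb, rb) for rb, lb in engaged.items())
```

With `k = 5`, at most 4% of positions could pass the cutoff under any background, which is why the bound suits noisy, non-Gaussian read-end counts.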
PMCID: PMC4227761  PMID: 25249628
3.  HiChIP: a high-throughput pipeline for integrative analysis of ChIP-Seq data 
BMC Bioinformatics  2014;15(1):280.
Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (ChIP-Seq) has been widely used to identify genomic loci of transcription factor (TF) binding and histone modifications. ChIP-Seq data analysis involves multiple steps, from read mapping and peak calling to data integration and interpretation. Processing large amounts of ChIP-Seq data derived from different antibodies or experimental designs with a single approach remains challenging and time-consuming. Addressing this challenge requires a comprehensive analysis pipeline with flexible settings that accelerates the use of this powerful technology in epigenetics research.
We have developed a highly integrative pipeline, termed HiChIP, for the systematic analysis of ChIP-Seq data. HiChIP incorporates several open-source software packages selected on the basis of internal assessments and published comparisons, together with a set of tools developed in-house. The workflow handles both paired-end and single-end ChIP-Seq reads, with or without replicates, for the characterization and annotation of both punctate and diffuse binding sites. The main functionality of HiChIP includes: (a) read quality checking; (b) read mapping and filtering; (c) peak calling and peak consistency analysis; and (d) result visualization. In addition, the pipeline contains modules for generating binding profiles over selected genomic features, de novo motif finding from transcription factor (TF) binding sites, and functional annotation of peak-associated genes.
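Step (b) typically includes removal of PCR duplicates, one of the filtering steps named in this entry's keywords. A minimal Python sketch, assuming single-end alignments reduced to a hypothetical `Read` tuple and treating reads that share chromosome, strand and 5′ position as duplicates; this is not HiChIP's code, and production pipelines usually delegate the step to dedicated tools such as Picard MarkDuplicates.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based leftmost aligned position
    end: int     # exclusive end of the alignment


def five_prime(read: Read) -> int:
    """5' end of the read: leftmost base on '+', last aligned base on '-'."""
    return read.start if read.strand == "+" else read.end - 1


def remove_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep only the first read seen at each (chrom, strand, 5' position)."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.strand, five_prime(read))
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept
```

Keying on the 5′ end rather than the start coordinate matters for reverse-strand reads, whose sequenced end is the rightmost aligned base.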
HiChIP is a comprehensive analysis pipeline that can be configured to analyze ChIP-Seq data derived from varying antibodies and experiment designs. Using public ChIP-Seq data we demonstrate that HiChIP is a fast and reliable pipeline for processing large amounts of ChIP-Seq data.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-280) contains supplementary material, which is available to authorized users.
PMCID: PMC4152589  PMID: 25128017
ChIP-Seq; Next-generation sequencing; Peak calling; Duplicate filtering; Irreproducible discovery rate
4.  Clinical biomarkers of pulmonary carcinoid tumors in never smokers via profiling miRNA and target mRNA 
Cell & Bioscience  2014;4:35.
miRNAs play key regulatory roles in cellular pathological processes. We aimed to identify clinically meaningful biomarkers in pulmonary carcinoid tumors (PCTs), a type of neuroendocrine neoplasm, by profiling miRNAs and mRNAs.
Of the 1145 miRNAs profiled, 16 with positive and 17 with negative fold changes (FCs, tumors vs. normal tissues) fell within the top 1% of differentially expressed miRNAs. We identified target genes that were predicted by at least two prediction tools and shared by at least one-half of the top miRNAs, which yielded 44 genes (FC < -2) and 56 genes (FC > 2), respectively. Higher expression of CREB5, PTPRB and COL4A3 predicted favorable disease-free survival (hazard ratios: 0.03, 0.19 and 0.36; P values: 0.03, 0.03 and 0.08). Additionally, 79 mutated genes were found in nine PCTs, in which TP53 was the only recurrent mutation.
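The consensus rule for target genes can be sketched in Python. The prediction tools, miRNA names and gene names below are hypothetical placeholders; predictions are assumed to be sets of (miRNA, gene) pairs per tool, and "at least one-half" is taken to mean at least ⌈n/2⌉ of the n top miRNAs.

```python
from collections import Counter
from math import ceil
from typing import Dict, List, Set, Tuple


def consensus_targets(
    predictions: Dict[str, Set[Tuple[str, str]]],
    top_mirnas: List[str],
    min_tools: int = 2,
    min_mirna_fraction: float = 0.5,
) -> Set[str]:
    """Genes predicted (by >= min_tools tools) as targets of at least
    min_mirna_fraction of the top miRNAs.

    predictions maps a tool name to its set of (miRNA, gene) pairs.
    """
    # How many tools support each (miRNA, gene) pair.
    support = Counter(pair for pairs in predictions.values() for pair in pairs)
    top = set(top_mirnas)
    # For each gene, how many top miRNAs target it with enough tool support.
    mirnas_per_gene = Counter(
        gene for (mirna, gene), n in support.items() if mirna in top and n >= min_tools
    )
    needed = ceil(min_mirna_fraction * len(top))
    return {gene for gene, n in mirnas_per_gene.items() if n >= needed}
```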
We found that the expression of three genes has clinical implications in PCTs. The biological functions of these biomarkers warrant further study.
PMCID: PMC4124500  PMID: 25105010
miRNA; mRNA; Carcinoid; Survival
5.  Conserved recurrent gene mutations correlate with pathway deregulation and clinical outcomes of lung adenocarcinoma in never-smokers 
BMC Medical Genomics  2014;7:32.
Identification of novel, targetable mutations is needed for better understanding and treatment of lung cancer in never-smokers.
Twenty-seven lung adenocarcinomas from never-smokers were sequenced, together with their matched normal tissues, by both exome sequencing and mRNA-seq. Somatic mutations were detected and compared with pathway deregulation, tumor phenotypes and clinical outcomes.
Although somatic mutations in DNA or mRNA ranged from hundreds to thousands in each tumor, only a few to a couple of hundred mutations overlapped between the two. The number of somatic mutations from either DNA or mRNA was not significantly associated with clinical variables; however, the number of overlapping mutations was associated with cancer subtype. These overlapping mutations were preferentially expressed, with consistently higher allele frequencies in mRNA than in DNA. Ten genes (EGFR, TP53, KRAS, RPS6KB2, ATXN2, DHX9, PTPN13, SP1, SPTAN1 and MYOF) had recurrent mutations, and these mutations were highly correlated with pathway deregulation and patient survival.
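The DNA/RNA overlap and recurrence logic can be sketched in Python. This is an illustration under assumptions, not the study's pipeline: somatic calls are keyed by a hypothetical (chromosome, position, ref, alt) tuple mapped to allele frequency, and a gene counts as recurrent when overlap mutations hit it in at least two tumors (the threshold is an assumption). The coordinates in any example input are toy values.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def overlap_mutations(
    dna: Dict[Variant, float], rna: Dict[Variant, float]
) -> Dict[Variant, Tuple[float, float]]:
    """Somatic calls present in both exome (DNA) and mRNA-seq (RNA) data,
    mapped to (DNA allele frequency, RNA allele frequency)."""
    return {v: (dna[v], rna[v]) for v in dna.keys() & rna.keys()}


def recurrent_genes(tumor_genes: List[Set[str]], min_tumors: int = 2) -> Set[str]:
    """Genes hit by an overlap mutation in at least min_tumors tumors.

    tumor_genes holds, for each tumor, the set of genes carrying overlap mutations.
    """
    counts = Counter(g for genes in tumor_genes for g in genes)
    return {g for g, n in counts.items() if n >= min_tumors}
```

Comparing the two allele frequencies of each overlapping call is then a one-liner, e.g. checking whether the RNA value exceeds the DNA value.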
The recurrent mutations present in both DNA and RNA are likely drivers of tumor biology, pathway deregulation and clinical outcomes. This information may be used for patient stratification and therapeutic target development.
PMCID: PMC4060138  PMID: 24894543
Lung adenocarcinoma of never smoker; Somatic mutations; Pathway deregulation; Patient survival
6.  CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data 
BMC Genomics  2014;15(1):423.
miRNAs play a key role in normal physiology and various diseases. miRNA profiling through next-generation sequencing (miRNA-seq) has become the main platform for biological research and biomarker discovery. However, analyzing miRNA sequencing data is challenging because it requires significant computational resources and bioinformatics expertise. Several web-based analytical tools have been developed, but they are limited to processing one or a pair of samples at a time and are not suitable for large-scale studies. Lack of flexibility and reliability is also a common issue with these web applications.
We developed a Comprehensive Analysis Pipeline for microRNA Sequencing data (CAP-miRSeq) that integrates read pre-processing, alignment, mature/precursor/novel miRNA detection and quantification, data visualization, variant detection in miRNA coding regions, and more flexible differential expression analysis between experimental conditions. Depending on their computational infrastructure, users can install the package locally or deploy it on the Amazon cloud, running samples sequentially or, for large numbers of samples, in parallel for faster analyses. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well-characterized data, we demonstrated the pipeline’s superior performance, flexibility, and practical use in research and biomarker discovery.
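Read pre-processing for miRNA-seq usually means 3′ adapter trimming followed by a length filter. The Python sketch below is a simplified stand-in, not CAP-miRSeq's code: it assumes the Illumina TruSeq small RNA 3′ adapter sequence, allows no mismatches (dedicated trimmers such as cutadapt tolerate sequencing errors), and uses an assumed 17–30 nt window for mature miRNAs.

```python
from typing import Optional

ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # assumed 3' adapter; check your library kit


def trim_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 6) -> str:
    """Remove the 3' adapter by exact matching (no mismatches allowed)."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # The adapter may run off the end of the read: look for its longest
    # prefix (at least min_overlap bases) sitting at the read's 3' end.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def preprocess(read: str, min_len: int = 17, max_len: int = 30) -> Optional[str]:
    """Trim, then keep only inserts within an assumed mature-miRNA size window."""
    insert = trim_adapter(read)
    return insert if min_len <= len(insert) <= max_len else None
```

Reads that return `None` (adapter dimers, degraded fragments, longer RNAs) would be set aside before alignment.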
CAP-miRSeq is a powerful and flexible tool for processing and analyzing miRNA-seq data that scales from a few to hundreds of samples. The results are presented in a convenient way for investigators or analysts to conduct further investigation and discovery.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-423) contains supplementary material, which is available to authorized users.
PMCID: PMC4070549  PMID: 24894665
miRNA sequencing; Analysis pipeline; Differential expression; Variant detection
7.  Network-based approach identified cell cycle genes as predictor of overall survival in lung adenocarcinoma patients 
Lung adenocarcinoma is the most common type of primary lung cancer. The purpose of this study was to delineate gene expression patterns for survival prediction in lung adenocarcinoma. Gene expression profiles of 82 (discovery set) and 442 (validation set 1) lung adenocarcinoma tumor tissues were analyzed using a systems biology-based network approach. We also examined the expression profiles of 78 adjacent normal lung tissues from 82 patients. We found a significant correlation of an expression module with overall survival (adjusted hazard ratio (HR) = 1.71; 95% CI = 1.06-2.74 in the discovery set; adjusted HR = 1.26; 95% CI = 1.08-1.49 in validation set 1). This expression module contained genes enriched in the biological process of the cell cycle. Interestingly, the association between the cell cycle gene module and overall survival was also significant in normal lung tissues (adjusted HR = 1.91; 95% CI, 1.32-2.75). From these survival-related modules, we further defined three hub genes (UBE2C, TPX2 and MELK) whose expression-based risk indices were more strongly associated with poor 5-year survival (HR = 3.85, 95% CI = 1.34-11.05 in the discovery set; HR = 1.72, 95% CI = 1.21-2.46 in validation set 1; and HR = 3.35, 95% CI = 1.08-10.04 in the normal lung set). The 3-gene prognostic result was further validated using 92 adenocarcinoma tumor samples (validation set 2): patients with a high-risk gene signature had a 1.52-fold increased risk of death (95% CI, 1.02–2.24) compared with patients with a low-risk gene signature. These results suggest that a network-based approach may facilitate the discovery of key genes closely linked to survival in patients with lung adenocarcinoma.
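The abstract does not describe how modules and hub genes were derived. One common way to rank hub genes, sketched here in Python as an assumption rather than the study's method, is WGCNA-style soft thresholding: adjacency |r|^β from pairwise Pearson correlations, and connectivity as the sum of a gene's adjacencies. The power β = 6 and any toy input are illustrative.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def hub_genes(expr: Dict[str, List[float]], beta: int = 6, top: int = 3) -> List[str]:
    """Rank genes by connectivity in a soft-thresholded co-expression network.

    Adjacency a_ij = |cor(i, j)| ** beta; connectivity k_i = sum of a_ij over j != i.
    """
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return sorted(genes, key=lambda g: k[g], reverse=True)[:top]
```

Raising |r| to a power shrinks weak correlations much more than strong ones, so connectivity is dominated by tightly co-expressed partners.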
PMCID: PMC3595338  PMID: 23357462
Lung cancer; survival; gene expression profiling; cell cycle; systems biology
8.  Chronic Caloric Restriction Preserves Mitochondrial Function in Senescence Without Increasing Mitochondrial Biogenesis 
Cell metabolism  2012;16(6):777-788.
Caloric restriction (CR) mitigates many detrimental effects of aging and prolongs lifespan. CR has been suggested to increase mitochondrial biogenesis, thereby attenuating age-related declines in mitochondrial function, a concept challenged by recent studies. Here we show that lifelong CR in mice prevents age-related loss of mitochondrial oxidative capacity and efficiency, measured in isolated mitochondria and permeabilized muscle fibers. We find that these beneficial effects of CR occur without increasing mitochondrial abundance. Whole-genome expression profiling and large-scale proteomic surveys revealed expression patterns inconsistent with increased mitochondrial biogenesis, a conclusion further supported by lower mitochondrial protein synthesis with CR. We find that CR decreases oxidant emission, increases antioxidant scavenging, and minimizes oxidative damage to DNA and protein. These results demonstrate that CR preserves mitochondrial function by protecting the integrity and function of existing cellular components rather than by increasing mitochondrial biogenesis.
PMCID: PMC3544078  PMID: 23217257
Caloric Restriction; Calorie restriction; dietary restriction; aging; mitochondria; protein synthesis; skeletal muscle; mitochondrial biogenesis
9.  Gene Expression, Single Nucleotide Variant and Fusion Transcript Discovery in Archival Material from Breast Tumors 
PLoS ONE  2013;8(11):e81925.
The advantages of RNA-Seq over array-based platforms are quantitative gene expression and the discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these applications is unknown. We measured gene expression in a set of manually degraded RNAs and in nine pairs of matched fresh-frozen and FFPE RNA samples isolated from breast tumors, using the hybridization-based NanoString nCounter (226-gene panel) and whole transcriptome RNA-Seq with RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNAs and the discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with the NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for the NanoString (226 genes) and ScriptSeq whole transcriptome protocols, respectively (p < 2×10⁻¹⁶). For lincRNAs specifically, we observed an excellent Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across the NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high-confidence SNVs and 24% of high-confidence fusion transcripts. The limited sensitivity of fusion transcript detection was not overcome by increasing sequencing depth up to 3-fold (from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between the NanoString and RNA-Seq platforms suggests that discovery-based whole transcriptome studies of FFPE material will produce reliable expression data.
The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNV and fusion transcripts was less sensitive.
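The matched-pair comparison can be sketched in Python. The log2(count + 1) transform and the restriction to genes measured in both samples are assumptions; the abstract does not state how expression values were prepared before computing Pearson correlations.

```python
from math import log2, sqrt
from statistics import fmean
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; assumes neither vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def matched_pair_correlation(pairs: List[Tuple[Dict[str, float], Dict[str, float]]]) -> float:
    """Mean Pearson r across matched (fresh-frozen, FFPE) sample pairs.

    Each pair is compared on log2(count + 1) over genes measured in both samples.
    """
    rs = []
    for ff, ffpe in pairs:
        genes = sorted(ff.keys() & ffpe.keys())
        x = [log2(ff[g] + 1) for g in genes]
        y = [log2(ffpe[g] + 1) for g in genes]
        rs.append(pearson(x, y))
    return fmean(rs)
```

The log transform keeps a handful of very highly expressed genes from dominating the correlation.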
PMCID: PMC3838386  PMID: 24278466
10.  An Integrated Model of the Transcriptome of HER2-Positive Breast Cancer 
PLoS ONE  2013;8(11):e79298.
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple-negative, and HER2-positive tumors to identify 685 differentially expressed genes, 102 alternatively spliced genes, and 303 genes that expressed single nucleotide sequence variants (eSNVs) that were associated with the HER2-positive tumors in our survey panel. These features were integrated into a transcriptome landscape model that identified 12 highly interconnected genomic modules, each of which represents a cellular process or pathway that appears to define the genomic architecture of the HER2-positive tumors in our test set. The generality of the model was confirmed by the observation that several key pathways were enriched in HER2-positive TCGA breast tumors. The ability of this model to make relevant predictions about the biology of breast cancer cells was established by the observation that integrin signaling was linked to lapatinib sensitivity in vitro and strongly associated with risk of relapse in the NCCTG N9831 adjuvant trastuzumab clinical trial dataset. Additional modules from the HER2 transcriptome model, including ubiquitin-mediated proteolysis, TGF-beta signaling, RHO-family GTPase signaling, and M-phase progression, were linked to response to lapatinib and paclitaxel in vitro and/or risk of relapse in the N9831 dataset. These data indicate that an integrated transcriptome landscape model derived from a test set of HER2-positive breast tumors has potential for predicting outcome and for identifying novel potential therapeutic strategies for this breast cancer subtype.
PMCID: PMC3815156  PMID: 24223926
11.  Pharmacologic reversion of epigenetic silencing of the PRKD1 promoter blocks breast tumor cell invasion and metastasis 
DNA methylation-induced silencing of genes encoding tumor suppressors is common in many types of cancer, but little is known about how such epigenetic silencing can contribute to tumor metastasis. The PRKD1 gene encodes protein kinase D1 (PKD1), a serine/threonine kinase that is expressed in cells of the normal mammary gland, where it maintains the epithelial phenotype by preventing epithelial-to-mesenchymal transition.
The status of PRKD1 promoter methylation was analyzed by reduced representation bisulfite deep sequencing, methylation-specific PCR (MSP-PCR) and in situ MSP-PCR in invasive and noninvasive breast cancer cell lines, as well as in human samples: 34 cases of “normal” tissue, 22 cases of ductal carcinoma in situ, 22 cases of estrogen receptor positive, HER2-negative (ER+/HER2-) invasive lobular carcinoma, 43 cases of ER+/HER2- invasive ductal carcinoma (IDC), 93 cases of HER2+ IDC and 96 cases of triple-negative IDC. A re-expression strategy using the DNA methyltransferase inhibitor decitabine was applied in vitro in MDA-MB-231 cells and in vivo in a tumor xenograft model, and PKD1 re-expression was measured by RT-PCR, immunoblotting and immunohistochemistry. The effect of PKD1 re-expression on cell invasion was analyzed in vitro by transwell invasion assay. Tumor growth and metastasis were monitored in vivo using the IVIS Spectrum Pre-clinical In Vivo Imaging System.
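Methylation levels from bisulfite sequencing are conventionally summarized per CpG as the fraction of reads that retain C. A minimal Python sketch under assumptions: read evidence has already been reduced to (C, T) counts per CpG position, the minimum depth of 10 is illustrative, and a promoter is treated as a hypothetical coordinate window. It is not the study's analysis code.

```python
from typing import Dict, Optional, Tuple


def cpg_methylation(counts: Dict[int, Tuple[int, int]], min_depth: int = 10) -> Dict[int, float]:
    """Per-CpG methylation level = C / (C + T).

    Bisulfite converts unmethylated C to U (read as T); methylated C stays C.
    counts maps CpG position -> (C reads, T reads) at that cytosine.
    CpGs covered by fewer than min_depth reads are dropped.
    """
    return {pos: c / (c + t) for pos, (c, t) in counts.items() if c + t >= min_depth}


def region_methylation(
    counts: Dict[int, Tuple[int, int]], start: int, end: int, min_depth: int = 10
) -> Optional[float]:
    """Mean methylation of covered CpGs in [start, end), e.g. a promoter window."""
    levels = [m for pos, m in cpg_methylation(counts, min_depth).items() if start <= pos < end]
    return sum(levels) / len(levels) if levels else None
```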
Herein we show that the PRKD1 promoter is aberrantly methylated and PRKD1 expression silenced in invasive breast cancer cells and during breast tumor progression, with methylation increasing with tumor aggressiveness. Using an animal model, we show that reversion of PRKD1 promoter methylation with the DNA methyltransferase inhibitor decitabine restores PKD1 expression and blocks tumor spread and metastasis to the lung in a PKD1-dependent fashion.
Our data suggest that the status of epigenetic regulation of the PRKD1 promoter can provide valid information on the invasiveness of breast tumors and therefore could serve as an early diagnostic marker. Moreover, targeted upregulation of PKD1 expression may be used as a therapeutic approach to reverse the invasive phenotype of breast cancer cells.
PMCID: PMC4052945  PMID: 23971832
Decitabine; Invasion; Metastasis; PKD1; Protein kinase D1
12.  Impact of Library Preparation on Downstream Analysis and Interpretation of RNA-Seq Data: Comparison between Illumina PolyA and NuGEN Ovation Protocol 
PLoS ONE  2013;8(8):e71745.
PolyA selection is the most common approach to RNA-seq library preparation. For limited or degraded RNA, alternative protocols such as NuGEN's have been developed. However, it is not yet clear how the different library preparations affect the downstream analyses of the broad applications of RNA sequencing.
Methods and Materials
Eight human mammary epithelial cell (HMEC) lines with high quality RNA were sequenced by Illumina’s mRNA-Seq PolyA selection and NuGEN ENCORE library preparation. The following analyses and comparisons were conducted: 1) the numbers of genes captured by each protocol; 2) the impact of protocols on differentially expressed gene detection between biological replicates; 3) expressed single nucleotide variant (SNV) detection; 4) non-coding RNAs, particularly lincRNA detection; and 5) intragenic gene expression.
Sequences from the NuGEN protocol had a lower alignment rate (75%) than those from the PolyA protocol (over 90%). The NuGEN protocol detected fewer genes (12–20% fewer), with a significant portion of reads mapped to non-coding regions. A large number of genes were differentially detected between the two protocols. About 17–20% of the differentially expressed genes between biological replicates were detected by both protocols. Significantly higher numbers of SNVs (5–6 times) were detected in the NuGEN samples, largely from intragenic and intergenic regions. The NuGEN protocol captured fewer exons (25% fewer) and had higher base-level coverage variance. While 6.3% of reads were mapped to intragenic regions in the PolyA samples, the percentages were much higher (20–25%) for the NuGEN samples. The NuGEN protocol did not detect more known non-coding RNAs such as lincRNAs, but it targeted small and “novel” lincRNAs.
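Overlap percentages such as the 17–20% above depend on the chosen denominator, which the abstract does not specify. A small Python sketch that reports the shared fraction relative to each protocol's gene list and relative to their union (the Jaccard index); the gene names in any input are placeholders.

```python
from typing import Dict, Set


def overlap_summary(de_a: Set[str], de_b: Set[str]) -> Dict[str, float]:
    """Share of DE genes called by both protocols, under three denominators."""
    shared = de_a & de_b
    union = de_a | de_b
    return {
        "of_a": len(shared) / len(de_a) if de_a else 0.0,      # relative to protocol A
        "of_b": len(shared) / len(de_b) if de_b else 0.0,      # relative to protocol B
        "jaccard": len(shared) / len(union) if union else 0.0,  # relative to the union
    }
```

When one protocol calls many more genes than the other, the three numbers can differ widely, so it helps to state which one is reported.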
Different library preparations can have significant impacts on downstream analysis and interpretation of RNA-seq data. The NuGEN protocol provides an alternative for limited or degraded RNA, but it has limitations for some RNA-seq applications.
PMCID: PMC3747248  PMID: 23977132
13.  Normal early pregnancy 
Epigenetics  2012;7(7):729-734.
The objective of this study was to analyze genome-wide differential methylation patterns in maternal leukocyte DNA in early pregnant and non-pregnant states. This is an age- and body mass index-matched case-control study comparing the methylation patterns of 27,578 cytosine-guanine (CpG) sites in 14,495 genes in maternal leukocyte DNA in early pregnancy (n = 14), in the same women postpartum (n = 14), and in nulligravid women (n = 14) on a BeadChip platform. Transient widespread hypomethylation was found in early pregnancy as compared with the non-pregnant states. Methylation of nine genes was significantly different in early pregnancy compared with both postpartum and nulligravid states (< 10% False Discovery Rate). Early pregnancy may be characterized by widespread hypomethylation compared with non-pregnant states; there is no apparent permanent methylation imprint after a normal term gestation. Nine potential candidate genes were identified as differentially methylated in early pregnancy and may play a role in the maternal adaptation to pregnancy.
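A < 10% false discovery rate cutoff is commonly implemented with the Benjamini-Hochberg procedure; the abstract does not name the procedure or the per-site test, so the Python sketch below is an assumption. It converts per-site p-values into adjusted q-values and keeps the sites below the cutoff.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def significant(pvalues: List[float], fdr: float = 0.10) -> List[int]:
    """Indices of tests passing the FDR cutoff."""
    return [i for i, q in enumerate(bh_qvalues(pvalues)) if q < fdr]
```

With tens of thousands of CpG sites tested, the correction is what keeps the reported handful of genes from being swamped by chance findings.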
PMCID: PMC3414393  PMID: 22647708
DNA Methylation; epigenetics; immune; leukocyte; maternal; postpartum; pregnancy
14.  Increased miR-708 Expression in NSCLC and Its Association with Poor Survival in Lung Adenocarcinoma from Never Smokers 
MicroRNAs play an important role in human diseases, including cancer. We sought to investigate the expression status, clinical relevance, and functional role of microRNAs in non-small cell lung cancer.
Experimental Design
We performed miRNA expression profiling in matched lung adenocarcinoma and uninvolved lung tissue using 56 pairs of fresh-frozen (FF) and 47 pairs of formalin-fixed, paraffin-embedded (FFPE) samples from never smokers. The most differentially expressed miRNA genes were evaluated by Cox analysis and the log-rank test. Among the best candidates, miR-708 was further examined for differential expression in two independent cohorts. The functional significance of miR-708 expression in lung cancer was examined by identifying its candidate mRNA target and by manipulating its expression levels in cultured cells.
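The log-rank comparison of, for example, high versus low miR-708 groups can be sketched in Python. This is a textbook two-group log-rank test, not the study's code; how patients are split into groups (e.g., at the median expression) is an assumption, and the covariate-adjusted Cox models reported for this study are not reproduced here.

```python
from math import erfc, sqrt
from typing import List, Tuple


def logrank(times: List[float], events: List[int], groups: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    events: 1 = death observed, 0 = censored; groups: 0 or 1.
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    observed1 = expected1 = variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e == 1}):
        n = sum(1 for ti, _, _ in data if ti >= t)              # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)   # deaths at time t
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed1 - expected1) ** 2 / variance  # assumes variance > 0
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```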
Among the 20 miRNAs most differentially expressed between tumor and normal samples, a high expression level of miR-708 in the tumors was most strongly associated with an increased risk of death after adjustment for all clinically significant factors, including age, sex, and tumor stage (FF cohort: HR, 1.90; 95% CI, 1.08-3.35; P=.025; FFPE cohort: HR, 1.93; 95% CI, 1.02-3.63; P=.042). The TMEM88 transcript has a miR-708 binding site in its 3′ UTR and was significantly reduced in tumors with high miR-708 expression. Forced miR-708 expression reduced TMEM88 transcript levels and increased the rates of cell proliferation, invasion, and migration in culture.
MicroRNA-708 acts as an oncogene, contributing to tumor growth and disease progression by directly downregulating TMEM88, a negative regulator of the Wnt signaling pathway in lung cancer.
PMCID: PMC3616503  PMID: 22573352
NSCLC; adenocarcinoma; miR-708; never smoker; survival; TMEM88; Wnt signaling
15.  Characterization of human plasma-derived exosomal RNAs by deep sequencing 
BMC Genomics  2013;14:319.
Exosomes, endosome-derived membrane microvesicles, contain specific RNA transcripts that are thought to be involved in cell-cell communication. These RNA transcripts have great potential as disease biomarkers. To characterize exosomal RNA profiles systematically, we performed RNA sequencing analysis of three human plasma samples and evaluated the efficacy of small RNA library preparation protocols from three manufacturers. In all, we evaluated 14 libraries (7 replicates).
From the 14 size-selected sequencing libraries, we obtained a total of 101.8 million raw single-end reads, an average of about 7.27 million reads per library. Sequence analysis revealed a diverse collection of exosomal RNA species, among which microRNAs (miRNAs) were the most abundant, making up 42.32% of all raw reads and 76.20% of all mappable reads. At the current read depth, 593 miRNAs were detectable. The five most common miRNAs (miR-99a-5p, miR-128, miR-124-3p, miR-22-3p, and miR-99b-5p) collectively accounted for 48.99% of all mappable miRNA sequences. MiRNA target gene enrichment analysis suggested that the highly abundant miRNAs may play an important role in biological functions such as protein phosphorylation, RNA splicing, chromosomal abnormality, and angiogenesis. From the unknown RNA sequences, we predicted 185 potential miRNA candidates. Furthermore, we detected significant fractions of other RNA species including ribosomal RNA (9.16% of all mappable counts), long non-coding RNA (3.36%), piwi-interacting RNA (1.31%), transfer RNA (1.24%), small nuclear RNA (0.18%), and small nucleolar RNA (0.01%); fragments of coding sequence (1.36%), 5′ untranslated region (0.21%), and 3′ untranslated region (0.54%) were also present. Beyond the RNA composition of the libraries, we found that all three commercial kits generated a sufficient number of DNA fragments for sequencing, but each had a significant bias toward capturing specific RNAs.
This study demonstrated that a wide variety of RNA species are embedded in the circulating vesicles. To our knowledge, this is the first report that applied deep sequencing to discover and characterize profiles of plasma-derived exosomal RNAs. Further characterization of these extracellular RNAs in diverse human populations will provide reference profiles and open new doors for the development of blood-based biomarkers for human diseases.
PMCID: PMC3653748  PMID: 23663360
Exosome; microRNA; Next generation sequencing; Plasma; Biomarker
16.  Concordance of Changes in Metabolic Pathways Based on Plasma Metabolomics and Skeletal Muscle Transcriptomics in Type 1 Diabetes 
Diabetes  2012;61(5):1004-1016.
Insulin regulates many cellular processes, but the full impact of insulin deficiency on cellular functions remains to be defined. Applying a mass spectrometry–based nontargeted metabolomics approach, we report here alterations of 330 plasma metabolites representing 33 metabolic pathways during an 8-h insulin deprivation in type 1 diabetic individuals. These pathways included those known to be affected by insulin, such as glucose, amino acid and lipid metabolism, Krebs cycle, and immune responses, and those hitherto unknown to be altered, including prostaglandin, arachidonic acid, leukotrienes, neurotransmitters, nucleotides, and anti-inflammatory responses. A significant concordance of metabolome and skeletal muscle transcriptome–based pathways supports an assumption that plasma metabolites are chemical fingerprints of cellular events. Although insulin treatment normalized plasma glucose and many other metabolites, there were 71 metabolites and 24 pathways that differed between nondiabetes and insulin-treated type 1 diabetes. Confirmation of many known pathways altered by insulin using a single blood test offers confidence in the current approach. Future research should focus on newly discovered pathways affected by insulin deficiency and systemic insulin treatment to determine whether they contribute to the high morbidity and mortality in type 1 diabetes (T1D) despite insulin treatment.
PMCID: PMC3331761  PMID: 22415876
17.  Genetic Variants Associated with the Risk of Chronic Obstructive Pulmonary Disease with and without Lung Cancer 
Chronic Obstructive Pulmonary Disease (COPD) is a strong risk factor for lung cancer. Published studies regarding variations of genes encoding glutathione metabolism, DNA repair, and inflammatory response pathways in susceptibility to COPD were inconclusive.
We evaluated 470 single nucleotide polymorphisms (SNPs) from 56 genes of these 3 pathways in 620 cases and 893 controls to identify susceptibility markers for COPD risk, using existing resources. We assessed SNP- and gene-level effects adjusting for sex, age, and smoking status. Differential genetic effects on disease risk with and without lung cancer were also assessed; cumulative risk models were established.
Twenty-one SNPs were significantly associated with risk of COPD (P<0.01); gene-based analyses confirmed 2 genes (GCLC and GSS) and identified 3 additional genes (GSTO2, ERCC1, and RRM1). Carrying 12 high-risk alleles may increase risk 2.7-fold; 8 SNPs altered the risk of COPD with lung cancer 3.1-fold, and 4 SNPs altered the risk of COPD without lung cancer 2.3-fold.
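The cumulative risk model can be illustrated by counting risk alleles per person and comparing carriers above a threshold. A Python sketch under assumptions: genotypes are coded 0/1/2 risk-allele copies for hypothetical SNP identifiers, and the odds ratio is unadjusted, whereas the study adjusted for sex, age and smoking status (typically done with logistic regression).

```python
from typing import Dict, List


def risk_allele_burden(genotypes: Dict[str, int]) -> int:
    """Total risk alleles carried; each SNP is coded 0, 1 or 2 risk-allele copies."""
    return sum(genotypes.values())


def burden_odds_ratio(cases: List[int], controls: List[int], threshold: int) -> float:
    """Unadjusted odds ratio for carrying >= threshold risk alleles.

    cases and controls hold per-person risk-allele burdens.
    """
    exposed_cases = sum(b >= threshold for b in cases)
    unexposed_cases = len(cases) - exposed_cases
    exposed_controls = sum(b >= threshold for b in controls)
    unexposed_controls = len(controls) - exposed_controls
    # Cross-product ratio of the 2x2 table (assumes no empty cell).
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
```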
Our findings indicate that multiple genetic variations in the 3 selected pathways contribute to COPD risk through GCLC, GSS, GSTO2, ERCC1, and RRM1 genes. Functional studies are needed to elucidate the mechanisms of these genes in the development of COPD, lung cancer, or both.
PMCID: PMC3414259  PMID: 22044695
Chronic Obstructive Pulmonary Disease; Glutathione Metabolism Pathway; DNA Repair Pathway; Inflammatory Response Pathway
18.  Gemcitabine metabolic pathway genetic polymorphisms and response in non-small cell lung cancer patients 
Pharmacogenetics and Genomics  2012;22(2):105-116.
Background and objective
Gemcitabine is widely used to treat non-small cell lung cancer (NSCLC). To assess the pharmacogenomic effects of the entire gemcitabine metabolic pathway, we genotyped SNPs within the 17 pathway genes using DNA samples from gemcitabine-treated NSCLC patients and determined the effect of these genetic variants on overall survival (OS) after gemcitabine treatment.
Eight of the 17 pathway genes were resequenced by Sanger sequencing of all exons, exon-intron junctions, and 5′- and 3′-UTRs, using DNA samples from Coriell lymphoblastoid cell lines (LCLs). A total of 107 tag SNPs were selected based on the resequencing data for these 8 genes and on HapMap data for the remaining 9 genes, and were successfully genotyped in 394 NSCLC patient DNA samples. Associations of SNPs/haplotypes with OS were assessed using the Cox regression model, followed by functional studies in LCLs and NSCLC cell lines.
Five SNPs in 4 genes (CDA, NT5C2, RRM1, and SLC29A1) and 9 haplotypes in 4 genes (RRM1, RRM2, SLC28A3, and SLC29A1) were associated with OS of these NSCLC patients (P < 0.05). Genotype imputation using the LCLs was performed for a 200 kb region surrounding these SNPs, followed by association studies with gemcitabine cytotoxicity. Functional studies demonstrated that downregulation of SLC29A1, NT5C2, and RRM1 in NSCLC cell lines altered cell susceptibility to gemcitabine.
These studies help identify biomarkers to predict gemcitabine response in NSCLC, a step toward the individualized chemotherapy of lung cancer.
PMCID: PMC3259218  PMID: 22173087
gemcitabine; pharmacogenomics; metabolic pathway; lymphoblastoid cell lines; non-small cell lung cancer (NSCLC)
19.  Endoxifen’s Molecular Mechanisms of Action Are Concentration Dependent and Different than That of Other Anti-Estrogens 
PLoS ONE  2013;8(1):e54613.
Endoxifen, a cytochrome P450 mediated tamoxifen metabolite, is being developed as a drug for the treatment of estrogen receptor (ER) positive breast cancer. Endoxifen is known to be a potent anti-estrogen and its mechanisms of action are still being elucidated. Here, we demonstrate that endoxifen-mediated recruitment of ERα to known target genes differs from that of 4-hydroxy-tamoxifen (4HT) and ICI-182,780 (ICI). Global gene expression profiling of MCF7 cells revealed substantial differences in the transcriptome following treatment with 4HT, endoxifen and ICI, both in the presence and absence of estrogen. Alterations in endoxifen concentrations also dramatically altered the gene expression profiles of MCF7 cells, even in the presence of clinically relevant concentrations of tamoxifen and its metabolites, 4HT and N-desmethyl-tamoxifen (NDT). Pathway analysis of differentially regulated genes revealed substantial differences related to endoxifen concentrations including significant induction of cell cycle arrest and markers of apoptosis following treatment with high, but not low, concentrations of endoxifen. Taken together, these data demonstrate that endoxifen’s mechanism of action is different from that of 4HT and ICI and provide mechanistic insight into the potential importance of endoxifen in the suppression of breast cancer growth and progression.
PMCID: PMC3557294  PMID: 23382923
20.  Genetic association with overall survival of taxane-treated lung cancer patients - a genome-wide association study in human lymphoblastoid cell lines followed by a clinical association study 
BMC Cancer  2012;12:422.
Taxanes are among the first-line treatments for lung cancer. To identify novel single nucleotide polymorphisms (SNPs) that might contribute to taxane response, we performed a genome-wide association study (GWAS) for two taxanes, paclitaxel and docetaxel, using 276 lymphoblastoid cell lines (LCLs), followed by genotyping of top candidate SNPs in 874 lung cancer patient samples treated with paclitaxel.
GWAS was performed using 1.3 million SNPs and taxane cytotoxicity IC50 values for 276 LCLs. The association of selected SNPs with overall survival in 76 small cell lung cancer (SCLC) and 798 non-small cell lung cancer (NSCLC) patients was analyzed with the Cox regression model, followed by integrated SNP-microRNA-expression association analysis in LCLs and siRNA screening of candidate genes in SCLC (H196) and NSCLC (A549) cell lines.
In the LCLs, 147 and 180 SNPs were associated with paclitaxel and docetaxel IC50 values, respectively, with p-values < 10⁻⁴. Genotyping of 153 candidate SNPs in 874 lung cancer patient samples identified 8 SNPs (p < 0.05) associated with either SCLC or NSCLC patient overall survival. Knockdown of PIP4K2A, CCT5, CMBL, EXO1, KMO, and OPN3, genes within 200 kb up- or downstream of the 3 SNPs associated with SCLC overall survival (rs1778335, rs2662411, and rs7519667), significantly desensitized H196 cells to paclitaxel. SNPs rs2662411 and rs1778335 were associated with mRNA expression of CMBL or PIP4K2A through microRNA (miRNA) hsa-miR-584 or hsa-miR-1468.
GWAS in an LCL model system, combined with clinical translational and functional studies, might help identify genetic variations associated with overall survival of lung cancer patients treated with paclitaxel.
PMCID: PMC3573965  PMID: 23006423
Taxane; Genome-wide association; Lymphoblastoid cell line; Lung cancer; Overall survival
21.  Systematic evaluation of genetic variants in three biological pathways on patient survival in low stage non-small cell lung cancer 
Studies of selected candidate genes suggest that single nucleotide polymorphisms (SNPs) in genes involved in glutathione metabolism, DNA repair, or inflammatory responses may affect overall survival (OS) in stage I–II (low-stage) non-small cell lung cancer (LS-NSCLC); however, results are inconclusive. In this study, we took a systematic pathway-based approach to simultaneously evaluate the impact of genetic variation in these three pathways on OS following LS-NSCLC diagnosis.
DNA from 647 patients with LS-NSCLC was genotyped for 480 tagging SNPs (tagSNPs) covering 57 genes from the three candidate pathways. Associations of tagSNPs with OS were assessed at the individual SNP and whole-gene levels, adjusting for age, tumor stage, surgery type, and adjuvant therapy. The effects of genotype combinations of the SNPs associated with OS were also estimated.
Among the 412 tagSNPs that were successfully genotyped and passed multi-step quality assessment, 28 were associated with OS (p<0.05). Two of the 28 were estimated to have less than a 20% chance of being false positives (rs3768490 in the GSTM4 gene: p=1.32×10⁻⁴, q=0.06; rs1729786 in the ABCC4 gene: p=9.25×10⁻⁴, q=0.20). Gene-based analysis suggested that, in addition to GSTM4 and ABCC4, variation in two other genes, PTGS2 and GSTA2, was also associated with OS.
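The q-values quoted above express each SNP's estimated false-discovery rate after testing hundreds of tagSNPs. The abstract does not state how they were computed, so the following is only a minimal sketch assuming the Benjamini–Hochberg step-up procedure (the study may instead have used a different method, such as Storey's q-value). The function name `bh_qvalues` and the example p-values are hypothetical and are not data from the study.

```python
from typing import List


def bh_qvalues(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Assumption: BH is used here for illustration only; it is not confirmed
    to be the method used in the study summarized above.
    """
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # caps every q-value at 1
    # Walk from the largest p-value down, keeping a running minimum so that
    # a smaller p-value never receives a larger q-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical p-values for four SNPs (not data from the study).
    example = [0.01, 0.04, 0.03, 0.005]
    print([round(x, 3) for x in bh_qvalues(example)])  # prints [0.02, 0.04, 0.04, 0.02]
```

Because each q-value is taken from a running minimum that starts at 1, the output preserves the ordering of the p-values and never exceeds 1.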
We describe further evidence that variations in genes involved in the glutathione and inflammatory response pathways are associated with OS in patients with LS-NSCLC. Further studies are warranted to verify our findings and elucidate their functional mechanisms and clinical utility leading to improved survival for lung cancer patients.
PMCID: PMC3158278  PMID: 21792076
glutathione metabolism; DNA repair; inflammation response; genetic polymorphisms; non-small-cell lung cancer; survival analysis
22.  Genetic variation predicting cisplatin cytotoxicity associated with overall survival in lung cancer patients receiving platinum-based chemotherapy
Inherited variability in the prognosis of lung cancer patients treated with platinum-based chemotherapy has been widely investigated. However, the overall contribution of genetic variation to platinum response is not well established. To identify novel candidate SNPs/genes, we performed a genome-wide association study (GWAS) for cisplatin cytotoxicity using lymphoblastoid cell lines (LCLs), followed by an association study of selected SNPs from the GWAS with overall survival (OS) in lung cancer patients.
Experimental Design
A GWAS for cisplatin cytotoxicity was performed with 283 ethnically diverse LCLs. The top 168 SNPs were genotyped in 222 small cell lung cancer (SCLC) and 961 non-small cell lung cancer (NSCLC) patients treated with platinum-based therapy. Association of the SNPs with OS was determined using the Cox regression model. Selected candidate genes were functionally validated by siRNA knockdown in human lung cancer cells.
Of the 157 successfully genotyped SNPs, 9 and 10 were the top SNPs associated with OS in patients with NSCLC and SCLC, respectively, although they were not significant after adjusting for multiple testing. Fifteen genes, including 7 located within 200 kb up- or downstream of the four top SNPs and 8 genes whose expression was correlated with three SNPs in LCLs, were selected for siRNA screening. Knockdown of DAPK3 and METTL6, whose expression levels were correlated with the rs11169748 and rs2440915 SNPs, significantly decreased cisplatin sensitivity in lung cancer cells.
This series of clinical and complementary laboratory-based functional studies identified several candidate genes/SNPs that might help predict treatment outcomes for platinum-based therapy of lung cancer.
PMCID: PMC3167019  PMID: 21775533
Lung cancer; cisplatin; pharmacogenomics; lymphoblastoid cell lines; GWAS
23.  Germline Copy Number Variation and Ovarian Cancer Survival 
Frontiers in Genetics  2012;3:142.
Copy number variants (CNVs) have been implicated in many complex diseases. We examined whether inherited CNVs were associated with overall survival among women with invasive epithelial ovarian cancer. Germline DNA from 1,056 cases (494 deceased, average of 3.7 years follow-up) was interrogated with the Illumina 610 quad genome-wide array containing, after quality control exclusions, 581,903 single nucleotide polymorphisms (SNPs) and 17,917 CNV probes. Comprehensive analysis capitalized upon the strengths of three complementary approaches to CNV classification. First, to identify small CNVs, single markers were evaluated and, where associated with survival, consecutive markers were combined. Two chromosomal regions were associated with survival using this approach (14q31.3 rs2274736 p = 1.59 × 10−6, p = 0.001; 22q13.31 rs2285164 p = 4.01 × 10−5, p = 0.009), but were not significant after multiple testing correction. Second, to identify large CNVs, genome-wide segmentation was conducted to characterize chromosomal gains and losses, and association with survival was evaluated by segment. Four regions were associated with survival (1q21.3 loss p = 0.005, 5p14.1 loss p = 0.004, 9p23 loss p = 0.002, and 15q22.31 gain p = 0.002); however, again, after correcting for multiple testing, no regions were statistically significant, and none were in common with the single marker approach. Finally, to evaluate associations with general amounts of copy number changes across the genome, we estimated CNV burden based on genome-wide numbers of gains and losses; no associations with survival were observed (p > 0.40). Although CNVs that were not well-covered by the Illumina 610 quad array merit investigation, these data suggest no association between inherited CNVs and survival after ovarian cancer.
PMCID: PMC3413872  PMID: 22891074
association testing; copy number variation; genotyping array; ovarian cancer; overall survival
24.  SAAP-RRBS: streamlined analysis and annotation pipeline for reduced representation bisulfite sequencing 
Bioinformatics  2012;28(16):2180-2181.
Summary: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS data is challenging, and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with high-quality, analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting, and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report and on to biological interpretation.
Availability and implementation: SAAP-RRBS is freely available to non-commercial users at the web site
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3413387  PMID: 22689387
25.  Genetic Variations in Multiple Drug Action Pathways and Survival in Advanced-Stage Non-small Cell Lung Cancer Treated with Chemotherapy 
Variations in genes related to anticancer drugs' biologic activity could influence treatment responses and lung cancer prognosis. Genetic variants in four biological pathways, i.e., glutathione metabolism, DNA repair, cell cycle, and EGFR, were systematically investigated to examine their association with survival in advanced-stage non-small cell lung cancer (NSCLC) treated with chemotherapy.
Experimental Design
A total of 894 tagging single-nucleotide polymorphisms (tagSNPs) in 70 genes from the four pathways were genotyped and analyzed in a 1,076-patient cohort. Association with overall survival was analyzed at the single-SNP and whole-gene levels in all patients and within the major chemotherapy combination groups.
Poorer overall survival was observed in patients with genetic variations in GSS (glutathione pathway) and MAP3K1 (EGFR pathway) (HR=1.45, 95% CI=1.20–1.70 and HR=1.25, 95% CI=1.05–1.50, respectively). In a stratified analysis of patients receiving platinum plus taxane treatment, we observed an adverse effect of a MAP3K1 variant on overall survival (HR=1.38, 95% CI=1.11–1.72) and a protective effect of RAF1 (HR=0.64, 95% CI=0.5–0.82) in the EGFR pathway. In patients receiving platinum plus gemcitabine treatment, RAF and GPX5 (glutathione pathway) genetic variations showed protective effects on survival (HR=0.54, 95% CI=0.38–0.77; HR=0.67, 95% CI=0.52–0.85, respectively); in contrast, NRAS (EGFR pathway) and GPX7 (glutathione pathway) variations showed adverse effects on overall survival (HR=1.91, 95% CI=1.30–2.80; HR=1.83, 95% CI=1.27–2.63, respectively). All genes harboring these significant SNPs remained significant in whole-gene analysis.
Common genetic variations in genes of the EGFR and glutathione pathways may be associated with overall survival among patients with advanced-stage NSCLC treated with platinum, taxane, and/or gemcitabine combinations.
PMCID: PMC3124814  PMID: 21636554
non-small cell lung cancer; survival; single-nucleotide polymorphisms; pathway; chemotherapy