Results 1-9 (9)
author:("Tang, xiaoi")
1.  RNA Toxicity and Missplicing in the Common Eye Disease Fuchs Endothelial Corneal Dystrophy
The Journal of Biological Chemistry  2015;290(10):5979-5990.
Background: Expansion of intronic (CTG·CAG)n repeats in TCF4 is found in most Fuchs endothelial corneal dystrophy (FECD) patients.
Results: RNA foci co-localizing with the splicing factor MBNL1 are found in FECD cells, and changes in mRNA splicing occur.
Conclusion: Trinucleotide repeat expansion in FECD is associated with RNA focus formation and missplicing.
Significance: RNA toxicity occurs in a disease affecting millions of patients.
Fuchs endothelial corneal dystrophy (FECD) is an inherited degenerative disease that affects the internal endothelial cell monolayer of the cornea and can result in corneal edema and vision loss in severe cases. FECD affects ∼5% of middle-aged Caucasians in the United States and accounts for >14,000 corneal transplantations annually. Among the several genes and loci associated with FECD, the strongest association is with an intronic (CTG·CAG)n trinucleotide repeat expansion in the TCF4 gene, which is found in the majority of affected patients. Corneal endothelial cells from FECD patients harbor a poly(CUG)n RNA that can be visualized as RNA foci containing this condensed RNA and associated proteins. Similar to myotonic dystrophy type 1, the poly(CUG)n RNA co-localizes with and sequesters the mRNA-splicing factor MBNL1, leading to missplicing of essential MBNL1-regulated mRNAs. Such foci and missplicing are not observed in similar cells from FECD patients who lack the repeat expansion. RNA-Seq splicing data from the corneal endothelia of FECD patients and controls reveal hundreds of differential alternative splicing events. These include events previously characterized in the context of myotonic dystrophy type 1 and epithelial-to-mesenchymal transition, as well as splicing changes in genes related to proposed mechanisms of FECD pathogenesis. We report the first instance of RNA toxicity and missplicing in a common non-neurological/neuromuscular disease associated with a repeat expansion. The FECD patient population with this (CTG·CAG)n trinucleotide repeat expansion exceeds the combined number of patients in all other microsatellite expansion disorders.
PMCID: PMC4358235  PMID: 25593321
Alternative Splicing; Cornea; Eye; RNA Splicing; Trinucleotide Repeat Disease; Fuchs Corneal Dystrophy; RNA Foci; RNA Toxicity
2.  The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data 
Nucleic Acids Research  2014;42(22):e172.
Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at
PMCID: PMC4267611  PMID: 25352556
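The precision and sensitivity figures quoted in this abstract follow the standard definitions, which can be sketched as below. This is a minimal illustration and not the eSNV-Detect code: the function names are hypothetical, and treating the 29 Sanger-confirmed candidates as true positives and the remaining 2 as false positives is an assumption made only to show the arithmetic.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of called variants that are real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


# Assumption: 29 of 31 candidate eSNVs confirmed by Sanger sequencing,
# so TP = 29 and FP = 31 - 29 = 2 for this validation set.
validated_precision = precision(true_pos=29, false_pos=2)
print(f"precision on Sanger-validated candidates: {validated_precision:.3f}")
```

Sensitivity cannot be derived from the Sanger validation alone, because the abstract does not say how many true variants went uncalled in that sample.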
3.  MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing 
BMC Bioinformatics  2014;15:224.
Although the costs of next generation sequencing technology have decreased over the past years, there is still a lack of simple-to-use applications for a comprehensive analysis of RNA sequencing data. There is no one-stop shop for transcriptomic genomics. We have developed MAP-RSeq, a comprehensive computational workflow that can be used for obtaining genomic features from transcriptomic sequencing data, for any genome.
For optimization of tools and parameters, MAP-RSeq was validated using both simulated and real datasets. The MAP-RSeq workflow consists of six major modules: alignment of reads; quality assessment of reads; gene expression assessment and exon read counting; identification of expressed single nucleotide variants (SNVs); detection of fusion transcripts; and summarization of transcriptomics data with a final report. This workflow is available for human transcriptome analysis and can be easily adapted and used for other genomes. Several clinical and research projects at the Mayo Clinic have applied the MAP-RSeq workflow for RNA-Seq studies. The results from MAP-RSeq have thus far enabled clinicians and researchers to understand the transcriptomic landscape of diseases for better diagnosis and treatment of patients.
Our software provides gene counts, exon counts, fusion candidates, expressed single nucleotide variants, mapping statistics, visualizations, and a detailed research data report for RNA-Seq. The workflow can be executed on a standalone virtual machine or on a parallel Sun Grid Engine cluster. The software can be downloaded from
PMCID: PMC4228501  PMID: 24972667
Transcriptomic sequencing; RNA-Seq; Bioinformatics workflow; Gene expression; Exon counts; Fusion transcripts; Expressed single nucleotide variants; RNA-Seq reports
4.  Transcriptome-Wide Analysis of UTRs in Non-Small Cell Lung Cancer Reveals Cancer-Related Genes with SNV-Induced Changes on RNA Secondary Structure and miRNA Target Sites 
PLoS ONE  2014;9(1):e82699.
Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. 
These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways.
PMCID: PMC3885406  PMID: 24416147
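The enrichment comparison in this abstract (188 cancer-associated genes among 803 genes with disruptive SNVs, against 1347 among all 6462 genes) can be sketched as a one-sided Fisher's exact test. The 2×2 table below is an assumption reconstructed from those counts; the abstract does not give the exact table the authors used, so this sketch may not reproduce the reported p-value. The function name is hypothetical and only the Python standard library is used.

```python
from math import comb


def fisher_one_sided_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` in the top-left cell with all margins fixed.
    """
    row1 = a + b           # e.g. genes with disruptive SNVs
    col1 = a + c           # e.g. known cancer-associated genes
    total = a + b + c + d
    denom = comb(total, row1)
    tail = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / denom


# Assumed table built from the counts in the abstract:
#                      cancer-associated   other
# disruptive genes           188            615   (803 total)
# remaining genes           1159           4500   (5659 total)
p_value = fisher_one_sided_greater(188, 803 - 188, 1347 - 188, 6462 - 803 - (1347 - 188))
print(f"one-sided p-value for the assumed table: {p_value:.4f}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow in the very large binomial coefficients; the single division at the end converts the result to a float.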
5.  An Integrated Model of the Transcriptome of HER2-Positive Breast Cancer 
PLoS ONE  2013;8(11):e79298.
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple negative, and HER2-positive tumors to identify 685 differentially expressed genes, 102 alternatively spliced genes, and 303 genes that expressed single nucleotide sequence variants (eSNVs) that were associated with the HER2-positive tumors in our survey panel. These features were integrated into a transcriptome landscape model that identified 12 highly interconnected genomic modules, each of which represents a cellular processes pathway that appears to define the genomic architecture of the HER2-positive tumors in our test set. The generality of the model was confirmed by the observation that several key pathways were enriched in HER2-positive TCGA breast tumors. The ability of this model to make relevant predictions about the biology of breast cancer cells was established by the observation that integrin signaling was linked to lapatinib sensitivity in vitro and strongly associated with risk of relapse in the NCCTG N9831 adjuvant trastuzumab clinical trial dataset. Additional modules from the HER2 transcriptome model, including ubiquitin-mediated proteolysis, TGF-beta signaling, RHO-family GTPase signaling, and M-phase progression, were linked to response to lapatinib and paclitaxel in vitro and/or risk of relapse in the N9831 dataset. These data indicate that an integrated transcriptome landscape model derived from a test set of HER2-positive breast tumors has potential for predicting outcome and for identifying novel potential therapeutic strategies for this breast cancer subtype.
PMCID: PMC3815156  PMID: 24223926
6.  Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis 
BMC Plant Biology  2012;12:138.
Identification of novel genes relevant to plant cell-wall (PCW) synthesis represents a highly important and challenging problem. Although substantial efforts have been invested into studying this problem, the vast majority of the PCW related genes remain unknown.
Here we present a computational study focused on identification of novel PCW genes in Arabidopsis based on the co-expression analyses of transcriptomic data collected under 351 conditions, using a bi-clustering technique. Our analysis identified 217 highly co-expressed gene clusters (modules) under some experimental conditions, each containing at least one gene annotated as PCW related according to the Purdue Cell Wall Gene Families database. These co-expression modules cover 349 known/annotated PCW genes and 2,438 new candidates. For each candidate gene, we annotated the specific PCW synthesis stages in which it is involved and predicted the detailed function. In addition, for the co-expressed genes in each module, we predicted and analyzed their cis regulatory motifs in the promoters using our motif discovery pipeline, providing strong evidence that the genes in each co-expression module are transcriptionally co-regulated. From all the co-expression modules, we infer that 108 modules are related to four major PCW synthesis components, using three complementary methods.
We believe our approach and data presented here will be useful for further identification and characterization of PCW genes. All the predicted PCW genes, co-expression modules, motifs and their annotations are available at a web-based database:
PMCID: PMC3463447  PMID: 22877077
Plant cell wall; Arabidopsis; Co-expression network analysis; Bi-clustering; Cis regulatory motifs
7.  Deep Sequence Analysis of Non-Small Cell Lung Cancer: Integrated Analysis of Gene Expression, Alternative Splicing, and Single Nucleotide Variations in Lung Adenocarcinomas with and without Oncogenic KRAS Mutations 
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants, were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted based on gene–gene connections from the integrated network, and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to therapeutics targeting these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.
PMCID: PMC3356053  PMID: 22655260
transcriptome sequencing; RNA-Seq; KRAS mutation; NSCLC; bioinformatics; network analysis; data integration and computational methods
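The filtering step described in this abstract, keeping genes with two or more connections in the network before pathway analysis, can be sketched as a degree filter on an undirected edge list. This is an illustrative sketch and not the authors' method: the function name and the placeholder gene identifiers are hypothetical, and counting each gene pair once regardless of direction is an assumption.

```python
from collections import Counter


def genes_with_min_connections(edges, min_degree=2):
    """Return genes that have at least `min_degree` distinct neighbours.

    `edges` is an iterable of (gene_a, gene_b) pairs. Duplicate pairs,
    reversed pairs, and self-loops are ignored (an assumption here).
    """
    unique_edges = {frozenset((a, b)) for a, b in edges if a != b}
    degree = Counter(gene for edge in unique_edges for gene in edge)
    return sorted(gene for gene, count in degree.items() if count >= min_degree)


# Placeholder identifiers, not genes from the study.
example_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
    ("GENE_B", "GENE_A"),  # reversed duplicate, counted once
]
print(genes_with_min_connections(example_edges))
```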
8.  Systems Biology of the qa Gene Cluster in Neurospora crassa 
PLoS ONE  2011;6(6):e20671.
An ensemble of genetic networks that describe how the model fungal system, Neurospora crassa, utilizes quinic acid (QA) as a sole carbon source has been identified previously. A genetic network for QA metabolism involves the genes qa-1F and qa-1S, which encode a transcriptional activator and repressor, respectively, and the structural genes qa-2, qa-3, qa-4, qa-x, and qa-y. By a series of 4 separate and independent, model-guided microarray experiments, a total of 50 genes are identified as QA-responsive and hypothesized to be under QA-1F control and/or the control of a second QA-responsive transcription factor (NCU03643), both in the fungal binuclear Zn(II)2Cys6 cluster family. QA-1F regulation is not sufficient to explain the quantitative variation in expression profiles of the 50 QA-responsive genes. QA-responsive genes include genes with products in 8 mutually connected metabolic pathways, with 7 of them one step removed from the tricarboxylic acid (TCA) cycle and with 7 of them one step removed from glycolysis: (1) starch and sucrose metabolism; (2) glycolysis/gluconeogenesis; (3) TCA cycle; (4) butanoate metabolism; (5) pyruvate metabolism; (6) aromatic amino acid and QA metabolism; (7) valine, leucine, and isoleucine degradation; and (8) transport of sugars and amino acids. Gene products both in aromatic amino acid and QA metabolism and transport show an immediate response to shift to QA, while genes with products in the remaining 7 metabolic modules generally show a delayed response to shift to QA. The additional QA-responsive cutinase transcription factor-1β (NCU03643) is found to have a delayed response to shift to QA. The series of microarray experiments are used to expand the previously identified genetic network describing the qa gene cluster to include all 50 QA-responsive genes including the second transcription factor (NCU03643).
These studies illustrate new methodologies from systems biology to guide model-driven discoveries about a core metabolic network involving carbon and amino acid metabolism in N. crassa.
PMCID: PMC3114802  PMID: 21695121
9.  Systems Biology of the Clock in Neurospora crassa 
PLoS ONE  2008;3(8):e3105.
A model-driven discovery process, Computing Life, is used to identify an ensemble of genetic networks that describe the biological clock. A clock mechanism involving the genes white-collar-1 and white-collar-2 (wc-1 and wc-2) that encode a transcriptional activator (as well as a blue-light receptor) and an oscillator frequency (frq) that encodes a cyclin that deactivates the activator is used to guide this discovery process through three cycles of microarray experiments. Central to this discovery process is a new methodology for the rational design of a Maximally Informative Next Experiment (MINE), based on the genetic network ensemble. In each experimentation cycle, the MINE approach is used to select the most informative new experiment in order to mine for clock-controlled genes, the outputs of the clock. As much as 25% of the N. crassa transcriptome appears to be under clock control. Clock outputs include genes with products in DNA metabolism, ribosome biogenesis in RNA metabolism, cell cycle, protein metabolism, transport, carbon metabolism, isoprenoid (including carotenoid) biosynthesis, development, and varied signaling processes. Genes under the control of the transcription factor complex WCC (= WC-1/WC-2) were resolved into four classes: circadian only (612 genes), light-responsive only (396), both circadian and light-responsive (328), and neither circadian nor light-responsive (987). In each of three cycles of microarray experiments, the data support that wc-1 and wc-2 are auto-regulated by WCC. Among 11,000 N. crassa genes, a total of 295 genes, including a large fraction of phosphatases/kinases, appear to be under the immediate control of the FRQ oscillator, as validated by 4 independent microarray experiments. Ribosomal RNA processing and assembly, rather than its transcription, appears to be under clock control, suggesting a new mechanism for the post-transcriptional control of clock-controlled genes.
PMCID: PMC2518617  PMID: 18769678