Results 1-10 (10)

1.  Aberrant signature methylome by DNMT1 hot spot mutation in hereditary sensory and autonomic neuropathy 1E 
Epigenetics  2014;9(8):1184-1193.
DNA methyltransferase 1 (DNMT1) is essential for DNA methylation, gene regulation and chromatin stability. We previously discovered that DNMT1 mutations cause hereditary sensory and autonomic neuropathy type 1 with dementia and hearing loss (HSAN1E; OMIM 614116). HSAN1E is the first adult-onset neurodegenerative disorder caused by a defect in a methyltransferase gene. HSAN1E patients appear clinically normal until young adulthood, then begin developing the characteristic symptoms involving the central and peripheral nervous systems. Some HSAN1E patients also develop narcolepsy, and it has recently been suggested that HSAN1E is allelic to autosomal dominant cerebellar ataxia, deafness, with narcolepsy (ADCA-DN; OMIM 604121), which is also caused by mutations in DNMT1. A hotspot mutation, Y495C, within the targeting sequence domain of DNMT1 has been identified among HSAN1E patients. The mutant DNMT1 protein shows premature degradation and reduced DNA methyltransferase activity. Herein, we investigate genome-wide DNA methylation at single-base resolution through whole-genome bisulfite sequencing of germline DNA in 3 pairs of HSAN1E patients and their gender- and age-matched siblings. Over 1 billion 75-bp single-end reads were generated for each sample. In the 3 affected siblings, overall methylation loss was consistently found in all chromosomes, with chromosomes X and 18 most affected. Paired sample analysis identified 564,218 differentially methylated CpG sites (DMCs; P < 0.05), of which 300,134 were intergenic and 264,084 were genic CpGs. Hypomethylation was predominant in both genic and intergenic regions, including promoters, exons, most CpG islands, L1, L2, Alu, and satellite repeats and simple repeat sequences. In some CpG islands, hypermethylated CpGs outnumbered hypomethylated CpGs. In 201 imprinted genes, there were more DMCs than in non-imprinted genes and most were hypomethylated.
Differentially methylated region (DMR) analysis identified 5649 hypomethylated and 1872 hypermethylated regions. Importantly, pathway analysis revealed that the 1693 genes associated with the identified DMRs were strongly associated with diverse neurological disorders, and that NAD+/NADH metabolism pathways are implicated in the pathogenesis. Our results provide novel insights into the epigenetic mechanism of neurodegeneration arising from a hotspot DNMT1 mutation and reveal pathways potentially important in a broad category of neurological and psychological disorders.
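The counts quoted in this abstract can be cross-checked with a little arithmetic. The sketch below uses only the figures reported above; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the HSAN1E abstract (whole-genome bisulfite sequencing).
total_dmcs = 564_218       # differentially methylated CpG sites
intergenic_dmcs = 300_134
genic_dmcs = 264_084
hypo_dmrs = 5_649          # hypomethylated regions
hyper_dmrs = 1_872         # hypermethylated regions

# The genic and intergenic DMCs should add up to the reported total.
dmcs_consistent = intergenic_dmcs + genic_dmcs == total_dmcs

# Share of differentially methylated regions that lost methylation.
total_dmrs = hypo_dmrs + hyper_dmrs
hypo_fraction = hypo_dmrs / total_dmrs

print(f"DMC counts consistent: {dmcs_consistent}")
print(f"Total DMRs: {total_dmrs}")
print(f"Hypomethylated share of DMRs: {hypo_fraction:.1%}")
```

Roughly three quarters of the reported DMRs are hypomethylated, in line with the abstract's statement that hypomethylation was predominant.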
PMCID: PMC4164503  PMID: 25033457
DNA methylation; neurodegeneration; OMIM 614116; OMIM 604121; whole genome bisulfite sequencing; HSAN1E
2.  Comprehensive Assessment of Genetic Variants Within TCF4 in Fuchs' Endothelial Corneal Dystrophy 
The single nucleotide variant (SNV), rs613872, in the transcription factor 4 (TCF4) gene was previously found to be strongly associated (P = 6 × 10⁻²⁶) with Fuchs' endothelial corneal dystrophy (FECD). Subsequently, an intronic expansion of the repeating trinucleotides, TGC, was found to be even more predictive of disease. We performed comprehensive sequencing of the TCF4 gene region in order to identify the best marker for FECD within TCF4 and to identify other novel variants that may be associated with FECD.
Leukocyte DNA was isolated from 68 subjects with FECD and 16 unaffected individuals. A custom capture panel was used to isolate the region surrounding the two previously validated markers of FECD. Sequencing of the TCF4 coding region, introns and flanking sequence, spanning 465 kb, was performed at >1000× average coverage using the Illumina HiSeq 2000.
TGC expansion (>50 repeats) was present in 46 (68%) FECD-affected subjects and one (6%) normal subject. A total of 1866 variants, including 1540 SNVs, were identified. Only two previously reported SNVs resided in the TCF4 coding region, neither of which segregated with disease. No variant, including TGC expansion, correlated perfectly with disease status. Trinucleotide repeat expansion was a better predictor of disease than any other variant.
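As a rough illustration of how well repeat expansion separates the two groups, the counts above can be turned into a sensitivity and specificity estimate. This is a minimal sketch using only the numbers quoted in the abstract; the sensitivity/specificity framing and the variable names are assumptions added here, not an analysis reported by the authors.

```python
# Counts quoted in the abstract.
fecd_total = 68        # subjects with FECD
fecd_expanded = 46     # FECD subjects carrying >50 TGC repeats
control_total = 16     # unaffected individuals
control_expanded = 1   # unaffected individuals carrying >50 TGC repeats

# Standard definitions, treating ">50 repeats" as a positive marker call.
sensitivity = fecd_expanded / fecd_total
specificity = (control_total - control_expanded) / control_total

print(f"Affected subjects with expansion: {sensitivity:.1%}")
print(f"Unaffected subjects without expansion: {specificity:.1%}")
```

With only 16 unaffected individuals, the specificity estimate is coarse: a single subject shifts it by more than six percentage points.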
Complete sequencing of the TCF4 genomic region revealed no single causative variant for FECD. The intronic trinucleotide repeat expansion within TCF4 continues to be more strongly associated with FECD than any other genetic variant.
Complete sequencing of the TCF4 gene was performed in 68 patients with Fuchs' dystrophy and 16 controls. No variant was a perfect predictor of disease or was more strongly associated with Fuchs' than TGC trinucleotide repeat expansion, suggesting the role of repeat expansion in disease pathogenesis.
PMCID: PMC4179444  PMID: 25168903
Fuchs' dystrophy; transcription factor 4; TCF4; trinucleotide; repeat expansion
3.  The eSNV-detect: a computational system to identify expressed single nucleotide variants from transcriptome sequencing data 
Nucleic Acids Research  2014;42(22):e172.
Rapid development of next generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system that utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell-line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6–96.8% precision and 91.6–95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MD-MB231 breast cancer cell-line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at
PMCID: PMC4267611  PMID: 25352556
4.  MAP-RSeq: Mayo Analysis Pipeline for RNA sequencing 
BMC Bioinformatics  2014;15:224.
Although the costs of next generation sequencing technology have decreased over the past years, there is still a lack of simple-to-use applications for a comprehensive analysis of RNA sequencing data. There is no one-stop shop for transcriptomic genomics. We have developed MAP-RSeq, a comprehensive computational workflow that can be used for obtaining genomic features from transcriptomic sequencing data, for any genome.
For optimization of tools and parameters, MAP-RSeq was validated using both simulated and real datasets. The MAP-RSeq workflow consists of six major modules: alignment of reads, quality assessment of reads, gene expression assessment and exon read counting, identification of expressed single nucleotide variants (SNVs), detection of fusion transcripts, and summarization of transcriptomics data with a final report. This workflow is available for human transcriptome analysis and can be easily adapted and used for other genomes. Several clinical and research projects at the Mayo Clinic have applied the MAP-RSeq workflow for RNA-Seq studies. The results from MAP-RSeq have thus far enabled clinicians and researchers to understand the transcriptomic landscape of diseases for better diagnosis and treatment of patients.
Our software provides gene counts, exon counts, fusion candidates, expressed single nucleotide variants, mapping statistics, visualizations, and a detailed research data report for RNA-Seq. The workflow can be executed on a standalone virtual machine or on a parallel Sun Grid Engine cluster. The software can be downloaded from
PMCID: PMC4228501  PMID: 24972667
Transcriptomic sequencing; RNA-Seq; Bioinformatics workflow; Gene expression; Exon counts; Fusion transcripts; Expressed single nucleotide variants; RNA-Seq reports
5.  From Days to Hours: Reporting Clinically Actionable Variants from Whole Genome Sequencing 
PLoS ONE  2014;9(2):e86803.
As the cost of whole genome sequencing (WGS) decreases, clinical laboratories will be looking at broadly adopting this technology to screen for variants of clinical significance. To fully leverage this technology in a clinical setting, results need to be reported quickly, as the turnaround time could potentially impact patient care. The latest sequencers can sequence a whole human genome in about 24 hours. However, depending on the computing infrastructure available, the processing of data can take several days, with the majority of computing time devoted to aligning reads to genomic regions that are to date not clinically interpretable. In an attempt to accelerate the reporting of clinically actionable variants, we have investigated the utility of a multi-step alignment algorithm focused on aligning reads and calling variants in genomic regions of clinical relevance prior to processing the remaining reads on the whole genome. This iterative workflow significantly accelerates the reporting of clinically actionable variants with no loss of accuracy when compared to genotypes obtained with the OMNI SNP platform or to variants detected with a standard workflow that combines Novoalign and GATK.
PMCID: PMC3914798  PMID: 24505267
6.  SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations 
PLoS ONE  2013;8(12):e83356.
Structural variation (SV) represents a significant, yet poorly understood contribution to an individual’s genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints.
We developed and validated SoftSearch using real and synthetic datasets. SoftSearch’s key features are that it 1) does not require secondary (or exhaustive primary) alignment, 2) is portable into established sequencing workflows, and 3) is applicable to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call.
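To make the idea of soft-clipped evidence concrete, the sketch below scans SAM-style CIGAR strings for soft clips and reports reference positions supported by more than one read. It is not SoftSearch's implementation: the function names, the `min_clip` and `min_support` thresholds, and the toy reads are all hypothetical, and the discordant read-pair evidence that SoftSearch also uses is left out.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")

def soft_clip_positions(pos, cigar, min_clip=5):
    """Return 1-based reference positions adjacent to a soft clip.

    pos is the 1-based leftmost aligned position (the SAM POS field).
    Clips shorter than min_clip bases are ignored.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # drop hard clips
    positions = []
    # Leading soft clip: the breakpoint sits at the first aligned base.
    if core and core[0][1] == "S" and core[0][0] >= min_clip:
        positions.append(pos)
    # Trailing soft clip: the breakpoint sits at the last aligned base.
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_clip:
        positions.append(pos + ref_len - 1)
    return positions

def candidate_breakpoints(alignments, min_support=2, min_clip=5):
    """Return sorted (chrom, pos) sites clipped in at least min_support reads."""
    counts = Counter()
    for chrom, pos, cigar in alignments:
        for p in soft_clip_positions(pos, cigar, min_clip):
            counts[(chrom, p)] += 1
    return sorted(site for site, n in counts.items() if n >= min_support)

# Toy alignments (chromosome, POS, CIGAR); coordinates are made up.
reads = [
    ("chr13", 100, "60M15S"),
    ("chr13", 120, "40M35S"),
    ("chr13", 160, "20S55M"),
    ("chr13", 300, "75M"),
]
print(candidate_breakpoints(reads))
```

In this toy input, two reads end their aligned portion at the same reference base, so that site is the only one reported; the single left-clipped read and the fully aligned read do not reach the support threshold on their own.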
We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.
PMCID: PMC3865185  PMID: 24358278
7.  An Integrated Model of the Transcriptome of HER2-Positive Breast Cancer 
PLoS ONE  2013;8(11):e79298.
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple negative, and HER2-positive tumors to identify 685 differentially expressed genes, 102 alternatively spliced genes, and 303 genes that expressed single nucleotide sequence variants (eSNVs) that were associated with the HER2-positive tumors in our survey panel. These features were integrated into a transcriptome landscape model that identified 12 highly interconnected genomic modules, each of which represents a cellular processes pathway that appears to define the genomic architecture of the HER2-positive tumors in our test set. The generality of the model was confirmed by the observation that several key pathways were enriched in HER2-positive TCGA breast tumors. The ability of this model to make relevant predictions about the biology of breast cancer cells was established by the observation that integrin signaling was linked to lapatinib sensitivity in vitro and strongly associated with risk of relapse in the NCCTG N9831 adjuvant trastuzumab clinical trial dataset. Additional modules from the HER2 transcriptome model, including ubiquitin-mediated proteolysis, TGF-beta signaling, RHO-family GTPase signaling, and M-phase progression, were linked to response to lapatinib and paclitaxel in vitro and/or risk of relapse in the N9831 dataset. These data indicate that an integrated transcriptome landscape model derived from a test set of HER2-positive breast tumors has potential for predicting outcome and for identifying novel potential therapeutic strategies for this breast cancer subtype.
PMCID: PMC3815156  PMID: 24223926
8.  SAAP-RRBS: streamlined analysis and annotation pipeline for reduced representation bisulfite sequencing 
Bioinformatics  2012;28(16):2180-2181.
Summary: Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good quality and analyzable data is still lacking. To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting and visualization. This package facilitates a rapid transition from sequencing reads to a fully annotated CpG methylation report to biological interpretation.
Availability and implementation: SAAP-RRBS is freely available to non-commercial users at the web site
Contact: or
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3413387  PMID: 22689387
9.  Deep Sequence Analysis of Non-Small Cell Lung Cancer: Integrated Analysis of Gene Expression, Alternative Splicing, and Single Nucleotide Variations in Lung Adenocarcinomas with and without Oncogenic KRAS Mutations 
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants, were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted based on gene–gene connections from the integrated network, and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to therapeutics targeted toward these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.
PMCID: PMC3356053  PMID: 22655260
transcriptome sequencing; RNA-Seq; KRAS mutation; NSCLC; bioinformatics; network analysis; data integration and computational methods
10.  TREAT: a bioinformatics tool for variant annotations and visualizations in targeted and exome sequencing data 
Bioinformatics  2011;28(2):277-278.
Summary: TREAT (Targeted RE-sequencing Annotation Tool) is a tool for facile navigation and mining of the variants from both targeted resequencing and whole exome sequencing. It provides a rich integration of publicly available as well as in-house developed annotations and visualizations for variants, variant-hosting genes and host-gene pathways.
Availability and implementation: TREAT is freely available to non-commercial users as either a stand-alone annotation and visualization tool, or as a comprehensive workflow integrating sequencing alignment and variant calling. The executables, instructions and the Amazon Cloud Images of TREAT can be downloaded at the website:
Supplementary information: Supplementary data are provided at Bioinformatics online.
PMCID: PMC3259432  PMID: 22088845