Results 1-7 (7)

1.  HiTC: exploration of high-throughput ‘C’ experiments 
Bioinformatics  2012;28(21):2843-2844.
Summary: The R/Bioconductor package HiTC facilitates the exploration of high-throughput 3C-based data. It allows users to import and export ‘C’ data, to transform, normalize, annotate and visualize interaction maps. The package operates within the Bioconductor framework and thus offers new opportunities for future development in this field.
Availability and implementation: The R package HiTC is available from the Bioconductor website. A detailed vignette provides additional documentation and help for using the package.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3476334  PMID: 22923296
2.  HMCan: a method for detecting chromatin modifications in cancer samples using ChIP-seq data 
Bioinformatics  2013;29(23):2979-2986.
Motivation: Cancer cells are often characterized by epigenetic changes, which include aberrant histone modifications. In particular, local or regional epigenetic silencing is a common mechanism in cancer for silencing expression of tumor suppressor genes. Though several tools have been created to enable detection of histone marks in ChIP-seq data from normal samples, it is unclear whether these tools can be efficiently applied to ChIP-seq data generated from cancer samples. Indeed, cancer genomes are often characterized by frequent copy number alterations: gains and losses of large regions of chromosomal material. Copy number alterations may create a substantial statistical bias in the evaluation of histone mark signal enrichment and result in underdetection of the signal in the regions of loss and overdetection of the signal in the regions of gain.
Results: We present HMCan (Histone modifications in cancer), a tool specially designed to analyze histone modification ChIP-seq data produced from cancer genomes. HMCan corrects for the GC-content and copy number bias and then applies Hidden Markov Models to detect the signal from the corrected data. On simulated data, HMCan outperformed several commonly used tools developed to analyze histone modification data produced from genomes without copy number alterations. HMCan also showed superior results on a ChIP-seq dataset generated for the repressive histone mark H3K27me3 in a bladder cancer cell line. HMCan predictions matched well with experimental data (qPCR validated regions) and included, for example, the previously detected H3K27me3 mark in the promoter of the DLEC1 gene, missed by other tools we tested.
Availability: Source code and binaries can be downloaded at, implemented in C++.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3834794  PMID: 24021381
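The copy number bias described in the HMCan abstract can be illustrated with a minimal sketch. It assumes that the ChIP-seq density in each window scales linearly with the local copy number, so dividing by the copy number relative to the baseline ploidy removes the bias. The function name, the windowed inputs and the linear model are illustrative assumptions, not HMCan's actual implementation, and the GC-content correction and Hidden Markov Model steps are not shown.

```python
def correct_for_copy_number(densities, copy_numbers, ploidy=2):
    """Rescale per-window ChIP densities to what they would be at baseline ploidy.

    Hypothetical helper: assumes signal is proportional to copy number.
    Windows with copy number 0 carry no usable signal and yield None.
    """
    corrected = []
    for density, copy_number in zip(densities, copy_numbers):
        if copy_number <= 0:
            corrected.append(None)
        else:
            corrected.append(density * ploidy / copy_number)
    return corrected


# Three windows with the same underlying enrichment, observed in a normal
# region (2 copies), a gained region (4 copies) and a lost region (1 copy).
raw = [10, 20, 5]
copies = [2, 4, 1]
print(correct_for_copy_number(raw, copies))
```

Without the rescaling, a fixed threshold on the raw values would overcall the gained window and undercall the lost one, which is the bias the abstract refers to.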
3.  Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data 
Bioinformatics  2011;28(3):423-425.
Summary: More and more cancer studies use next-generation sequencing (NGS) data to detect various types of genomic variation. However, even when researchers have such data at hand, single-nucleotide polymorphism arrays have been considered necessary to assess copy number alterations and especially loss of heterozygosity (LOH). Here, we present the tool Control-FREEC that enables automatic calculation of copy number and allelic content profiles from NGS data, and consequently predicts regions of genomic alteration such as gains, losses and LOH. Taking as input aligned reads, Control-FREEC constructs copy number and B-allele frequency profiles. The profiles are then normalized, segmented and analyzed in order to assign genotype status (copy number and allelic content) to each genomic region. When a matched normal sample is provided, Control-FREEC discriminates somatic from germline events. Control-FREEC is able to analyze overdiploid tumor samples and samples contaminated by normal cells. Low mappability regions can be excluded from the analysis using provided mappability tracks.
Availability: C++ source code is available at:
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3268243  PMID: 22155870
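To make the idea of assigning a genotype status from copy number and B-allele frequency (BAF) concrete, here is a rough sketch. It assumes that a region with copy number n and b copies of the B allele has an expected BAF of b/n, folds the observed BAFs around 0.5, and picks the closest expected value. The function name and the nearest-value rule are assumptions made for illustration; Control-FREEC's actual statistical model is not described in the abstract and is not reproduced here.

```python
from statistics import median


def genotype_status(copy_number, bafs):
    """Guess the allelic content of a region, e.g. 'AB', 'BB' or 'ABB'.

    Hypothetical helper: bafs are B-allele frequencies observed at
    germline-heterozygous SNPs inside a region of the given copy number.
    """
    if copy_number <= 0 or not bafs:
        return ""
    # Fold BAFs around 0.5 so that 0.1 and 0.9 describe the same imbalance.
    mirrored = median(max(b, 1.0 - b) for b in bafs)
    # After folding, the major allele has between ceil(n/2) and n copies.
    candidates = range((copy_number + 1) // 2, copy_number + 1)
    major = min(candidates, key=lambda b: abs(b / copy_number - mirrored))
    return "A" * (copy_number - major) + "B" * major


print(genotype_status(2, [0.48, 0.52, 0.50, 0.47, 0.55]))  # balanced
print(genotype_status(2, [0.02, 0.97, 0.05, 0.99]))        # copy-neutral LOH
print(genotype_status(3, [0.33, 0.68, 0.35, 0.66]))        # gain of one allele
```

A genotype made of a single letter (for example 'BB') corresponds to loss of heterozygosity in this simplified scheme.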
4.  Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization 
Bioinformatics  2010;27(2):268-269.
Summary: We present a tool for control-free copy number alteration (CNA) detection using deep-sequencing data, particularly useful for cancer studies. The tool deals with two frequent problems in the analysis of cancer deep-sequencing data: absence of control sample and possible polyploidy of cancer cells. FREEC (control-FREE Copy number caller) automatically normalizes and segments copy number profiles (CNPs) and calls CNAs. If ploidy is known, FREEC assigns absolute copy number to each predicted CNA. To normalize raw CNPs, the user can provide a control dataset if available; otherwise GC content is used. We demonstrate that for Illumina single-end, mate-pair or paired-end sequencing, GC-content normalization provides smooth profiles that can be further segmented and analyzed in order to predict CNAs.
Availability: Source code and sample data are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3018818  PMID: 21081509
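The GC-content normalization and absolute copy number assignment mentioned in the FREEC abstract can be sketched as follows. This version groups windows by GC fraction and divides each read count by the median count of its GC group, which is one common way to remove GC bias. The binning scheme, the function names and the rounding rule are assumptions for illustration; the abstract does not specify how FREEC models the GC dependence, and segmentation is not shown.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, bin_width=0.05):
    """Return count / (median count of windows with similar GC) per window."""
    groups = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        groups[int(gc / bin_width)].append(count)
    expected = {key: median(counts) for key, counts in groups.items()}

    ratios = []
    for count, gc in zip(read_counts, gc_fractions):
        reference = expected[int(gc / bin_width)]
        ratios.append(count / reference if reference > 0 else None)
    return ratios


def absolute_copy_number(ratios, ploidy):
    """Scale normalized ratios by a known ploidy and round to integer copies."""
    return [None if r is None else round(r * ploidy) for r in ratios]


# Windows with ~42% GC are sequenced twice as deeply as windows with ~62% GC.
counts = [100, 100, 100, 150, 100, 50, 50, 50, 25, 50]
gc = [0.42, 0.41, 0.43, 0.42, 0.44, 0.62, 0.61, 0.63, 0.62, 0.64]
ratios = gc_normalize(counts, gc)
print(absolute_copy_number(ratios, ploidy=2))
```

In this toy input the fourth window stands out as a gain and the ninth as a loss once the GC-driven depth difference has been removed. The median-based reference only works when most windows in each GC group are at the baseline copy number.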
5.  girafe – an R/Bioconductor package for functional exploration of aligned next-generation sequencing reads 
Bioinformatics  2010;26(22):2902-2903.
Summary: The R/Bioconductor package girafe facilitates the functional exploration of alignments of sequence reads from next-generation sequencing data to a genome. It allows users to investigate the genomic intervals together with the aligned reads and to work with, visualise and export these intervals. Moreover, the package operates within and extends the ever-growing Bioconductor framework and thus enables users to leverage a multitude of methods for their data in order to answer specific research questions.
Availability and Implementation: The R package girafe is available from the Bioconductor web site:
An extensive vignette and the Bioconductor mailing lists provide additional documentation and help for using the package.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2971573  PMID: 20861030
6.  SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data 
Bioinformatics  2010;26(15):1895-1896.
Summary: We present SVDetect, a program designed to identify genomic structural variations from paired-end and mate-pair next-generation sequencing data produced by the Illumina GA and ABI SOLiD platforms. Applying both sliding-window and clustering strategies, we use anomalously mapped read pairs provided by current short read aligners to localize genomic rearrangements and classify them according to their type, e.g. large insertions–deletions, inversions, duplications and balanced or unbalanced inter-chromosomal translocations. SVDetect outputs predicted structural variants in various file formats for appropriate graphical visualization.
Availability: Source code and sample data are available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2905550  PMID: 20639544
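As a rough illustration of how anomalously mapped read pairs can hint at a rearrangement type, the sketch below applies widely used textbook heuristics for forward/reverse (Illumina paired-end style) libraries: mates on different chromosomes suggest a translocation, same-strand mates an inversion, everted mates a duplication, and an unexpected mapping distance a deletion or insertion. The class, the function and the thresholds are hypothetical and simplified; they are not SVDetect's actual rules, and the sliding-window and clustering steps are not shown.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, mean_insert, sd_insert, n_sd=3):
    """Label one read pair under the assumed forward/reverse library layout."""
    if pair.chrom1 != pair.chrom2:
        return "inter-chromosomal translocation"
    # Order the two mates by mapping position on the shared chromosome.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        return "duplication"
    distance = right_pos - left_pos
    if distance > mean_insert + n_sd * sd_insert:
        return "deletion"
    if distance < mean_insert - n_sd * sd_insert:
        return "insertion"
    return "normal"


pairs = [
    ReadPair("chr1", 1000, "+", "chr1", 1300, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 9000, "-"),
    ReadPair("chr1", 1000, "+", "chr1", 1300, "+"),
    ReadPair("chr1", 1000, "+", "chr7", 5000, "-"),
]
for p in pairs:
    print(classify_pair(p, mean_insert=300, sd_insert=30))
```

A single anomalous pair is weak evidence; in practice a call would be supported by several pairs of the same type that map close together.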
7.  Classification of arrayCGH data using fused SVM 
Bioinformatics  2008;24(13):i375-i382.
Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profiles are characterized by a large number of variables usually measured on a limited number of samples. However, arrayCGH profiles have a particular structure of correlations between variables, due to the spatial organization of bacterial artificial chromosomes along the genome. This suggests that classical classification methods, often based on the selection of a small number of discriminative features, may not be the most accurate methods and may not produce easily interpretable prediction rules.
Results: We propose a new method for supervised classification of arrayCGH data. The method is a variant of support vector machine that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge. The resulting classifier is a sparse linear classifier based on a limited number of regions automatically selected on the chromosomes, leading to easy interpretation and identification of discriminative regions of the genome. We test this method on three classification problems for bladder and uveal cancer, involving both diagnosis and prognosis. We demonstrate that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome.
Availability: All data and algorithms are publicly available.
PMCID: PMC2718663  PMID: 18586737
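One way to read the "sparse linear classifier based on a limited number of regions" in the fused SVM abstract is as a hinge-loss classifier whose weights are penalized both for being non-zero and for differing between neighbouring probes. The sketch below only evaluates such an objective for a given weight vector. The penalized form, the parameter names lam and mu, and the treatment of all probes as one contiguous chain are assumptions for illustration; the abstract does not give the exact formulation, and no optimizer is shown.

```python
def fused_svm_objective(w, X, y, lam=1.0, mu=1.0):
    """Hinge loss + lam * sum|w_j| + mu * sum|w_{j+1} - w_j|.

    w: weights, one per probe, ordered by genomic position (assumed to lie
       on a single chromosome in this simplified version).
    X: list of arrayCGH profiles; y: labels in {-1, +1}.
    """
    hinge = sum(
        max(0.0, 1.0 - label * sum(wj * xj for wj, xj in zip(w, profile)))
        for profile, label in zip(X, y)
    )
    sparsity = sum(abs(wj) for wj in w)
    fusion = sum(abs(w[j + 1] - w[j]) for j in range(len(w) - 1))
    return hinge + lam * sparsity + mu * fusion


X = [[0, 1, 1, 0], [0, -1, -1, 0]]  # gain vs loss over probes 2 and 3
y = [1, -1]
region_weights = [0, 1, 1, 0]  # one contiguous region
single_probe = [0, 2, 0, 0]    # same margin, concentrated on one probe
print(fused_svm_objective(region_weights, X, y))
print(fused_svm_objective(single_probe, X, y))
```

Both weight vectors separate the two toy profiles with the same margin, but the piecewise-constant one has the lower objective, which is how a fusion penalty favours contiguous discriminative regions over isolated probes.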
