Results 1-7 (7)
1.  Effective normalization for copy number variation detection from whole genome sequencing 
BMC Genomics  2012;13(Suppl 6):S16.
Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to the genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates, but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations.
We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions.
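As a rough illustration of what read-count normalization by GC content can look like, the sketch below rescales each window's read count by the median count of windows with similar GC content. This is a simple median-by-bin correction chosen for illustration only; it is not FREEC's own procedure, and the function name `gc_normalize`, the `bin_width` parameter, and the example numbers are all assumptions.

```python
from collections import defaultdict
from statistics import median


def gc_normalize(read_counts, gc_fractions, bin_width=0.05):
    """Rescale per-window read counts so that windows with different
    GC content share the same median (hypothetical helper)."""
    # Group window read counts by GC-content bin.
    bins = defaultdict(list)
    for count, gc in zip(read_counts, gc_fractions):
        bins[int(gc / bin_width)].append(count)

    overall = median(read_counts)
    bin_median = {key: median(values) for key, values in bins.items()}

    normalized = []
    for count, gc in zip(read_counts, gc_fractions):
        m = bin_median[int(gc / bin_width)]
        # Windows in a bin with zero median coverage carry no signal.
        normalized.append(count * overall / m if m > 0 else 0.0)
    return normalized


# Made-up example: two low-GC windows and two high-GC windows.
counts = [100, 120, 50, 60]
gc = [0.41, 0.42, 0.61, 0.62]
print(gc_normalize(counts, gc))
```

After correction, the lower window in each GC bin ends up at the same value, so the remaining differences reflect within-bin variation rather than GC content.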
The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07-0.3 range, whereas control genome normalization yields Jaccard index values around 0.4 when compared with normalization based on GC content. The most critical impact of using mappability as a normalization factor is a substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable gene and CNV region calls.
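For readers unfamiliar with the Jaccard index as a concordance measure, here is a minimal sketch that computes it over two sets of CNV calls as base pairs shared divided by base pairs covered by either set. It assumes calls on a single chromosome given as half-open `(start, end)` intervals; whether the study measured overlap in base pairs or in region counts is not stated in the abstract, so treat this as one plausible definition. The function names and coordinates are hypothetical.

```python
def merge(intervals):
    """Merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Total number of bases covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def jaccard(calls_a, calls_b):
    """Jaccard index |A and B| / |A or B| measured in base pairs."""
    a, b = merge(calls_a), merge(calls_b)
    union = total_length(merge(a + b))
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    intersection = total_length(a) + total_length(b) - union
    return intersection / union


# Made-up coordinates: 50 shared bases out of 250 covered bases.
calls_one = [(100, 200), (300, 400)]
calls_two = [(150, 250)]
print(jaccard(calls_one, calls_two))  # 0.2
```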
Choice of read-count normalization methodology has a substantial effect on CNV calls and the use of genomic mappability or an appropriately chosen control genome can optimize the output of CNV analysis.
PMCID: PMC3481445  PMID: 23134596
2.  Major Chromosomal Breakpoint Intervals in Breast Cancer Co-Localize with Differentially Methylated Regions 
Frontiers in Oncology  2012;2:197.
Solid tumors exhibit chromosomal rearrangements resulting in gain or loss of multiple chromosomal loci (copy number variation, or CNV), and translocations that occasionally result in the creation of novel chimeric genes. In the case of breast cancer, although most individual tumors each have a unique CNV landscape, the breakpoints, as measured over large datasets, appear to be non-randomly distributed in the genome. Breakpoints show a significant regional concentration at genomic loci spanning perhaps several megabases. The proximal cause of these breakpoint concentrations is a subject of speculation, but is, as yet, largely unknown. To shed light on this issue, we have performed a bio-statistical analysis on our previously published data for a set of 119 breast tumors and normal controls (Wiedswang et al., 2003), where each sample has both high-resolution CNV and methylation data. The method examined the distribution of closeness of breakpoint regions with differentially methylated regions (DMR), coupled with additional genomic parameters, such as repeat elements and designated “fragile sites” in the reference genome. Through this analysis, we have identified a set of 93 regional loci called breakpoint enriched DMR (BEDMRs) characterized by altered DNA methylation in cancer compared to normal cells that are associated with frequent breakpoint concentrations within a distance of 1 Mb. BEDMR loci are further associated with local hypomethylation (66%), concentrations of the Alu SINE repeats within 3 Mb (35% of the cases), and tend to occur near a number of cancer-related genes such as the protocadherins, AKT1, DUB3, GAB2. Furthermore, BEDMRs seem to deregulate members of the histone gene family and chromatin remodeling factors, e.g., JMJD1B, which might affect the chromatin structure and disrupt coordinate signaling and repair.
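The "within a distance of 1 Mb" criterion can be pictured with a small proximity check. The sketch below counts the fraction of breakpoints that lie within a given distance of any DMR, assuming single-chromosome point coordinates. It only illustrates the distance test, not the statistical analysis used in the study; the function names and coordinates are hypothetical.

```python
from bisect import bisect_left


def near_any(position, sorted_sites, max_distance=1_000_000):
    """True if `position` lies within `max_distance` of any site.
    `sorted_sites` must be sorted in ascending order."""
    i = bisect_left(sorted_sites, position)
    # Only the closest site on each side can be the nearest one.
    neighbors = sorted_sites[max(i - 1, 0):i + 1]
    return any(abs(position - s) <= max_distance for s in neighbors)


def fraction_near(breakpoints, dmr_positions, max_distance=1_000_000):
    """Fraction of breakpoints within `max_distance` of a DMR."""
    if not breakpoints:
        return 0.0
    sites = sorted(dmr_positions)
    hits = sum(near_any(b, sites, max_distance) for b in breakpoints)
    return hits / len(breakpoints)


# Made-up coordinates on one chromosome.
breakpoints = [5_000_000, 12_000_000, 30_000_000]
dmrs = [5_400_000, 13_500_000, 29_100_000]
print(fraction_near(breakpoints, dmrs))  # two of three breakpoints qualify
```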
From this analysis we propose that preference for chromosomal breakpoints is related to genome structure coupled with alterations in DNA methylation and hence, chromatin structure, associated with tumorigenesis.
PMCID: PMC3530719  PMID: 23293768
DNA methylation; copy number variation; Alu repeat element; genome instability; multi-modal analysis; breast cancer
3.  Identification of Tumor Suppressors and Oncogenes from Genomic and Epigenetic Features in Ovarian Cancer 
PLoS ONE  2011;6(12):e28503.
The identification of genetic and epigenetic alterations from primary tumor cells has become a common method to identify genes critical to the development and progression of cancer. We seek to identify those genetic and epigenetic aberrations that have the most impact on gene function within the tumor. First, we perform a bioinformatic analysis of copy number variation (CNV) and DNA methylation covering the genetic landscape of ovarian cancer tumor cells. We separately examined CNV and DNA methylation for 42 primary serous ovarian cancer samples using MOMA-ROMA assays and 379 tumor samples analyzed by The Cancer Genome Atlas. We have identified 346 genes with significant deletions or amplifications among the tumor samples. Utilizing associated gene expression data, we predict 156 genes with altered copy number and correlated changes in expression. Among these genes, CCNE1, POP4, UQCRB, PHF20L1 and C19orf2 were identified within both data sets. We were specifically interested in copy number variation as our base genomic property in the prediction of tumor suppressors and oncogenes in the altered ovarian tumor. We therefore identify changes in DNA methylation and expression for all amplified and deleted genes. We statistically define tumor suppressor and oncogenic features for these modalities and perform a correlation analysis with expression. We predicted 611 potential oncogene and tumor suppressor candidates by integrating these data types. Genes with a strong correlation for methylation dependent expression changes exhibited at varying copy number aberrations include CDCA8, ATAD2, CDKN2A, RAB25, AURKA, BOP1 and EIF2C3. We provide copy number variation and DNA methylation analysis for over 11,500 individual genes covering the genetic landscape of ovarian cancer tumors. We show the extent of genomic and epigenetic alterations for known tumor suppressors and oncogenes and also use these defined features to identify potential ovarian cancer gene candidates.
PMCID: PMC3234280  PMID: 22174824
4.  Multi-cancer computational analysis reveals invasion-associated variant of desmoplastic reaction involving INHBA, THBS2 and COL11A1 
BMC Medical Genomics  2010;3:51.
Despite extensive research, the details of the biological mechanisms by which cancer cells acquire motility and invasiveness are largely unknown. This study identifies an invasion-associated gene signature that sheds light on these mechanisms.
We analyze data from multiple cancers using a novel computational method identifying sets of genes whose coordinated overexpression indicates the presence of a particular phenotype, in this case high-stage cancer.
We conclude that there is one shared "core" metastasis-associated gene expression signature corresponding to a specific variant of stromal desmoplastic reaction, present in a large subset of samples that have exceeded a threshold of invasive transition specific to each cancer, indicating that the corresponding biological mechanism is triggered at that point. For example, this threshold is reached at stage IIIc in ovarian cancer and at stage II in colorectal cancer. Therefore, its presence indicates that the corresponding stage has been reached. It has several features, such as coordinated overexpression of particular collagens, mainly COL11A1, and other genes, mainly THBS2 and INHBA. The composition of the overexpressed genes indicates invasion-facilitating altered proteolysis in the extracellular matrix. The prominent presence in the signature of INHBA in all cancers strongly suggests a biological mechanism centered on activin A-induced TGF-β signaling, because activin A is a member of the TGF-β superfamily consisting of an INHBA homodimer. Furthermore, we establish that the signature is predictive of neoadjuvant therapy response in at least one breast cancer data set.
Therefore, these results can be used for developing high specificity biomarkers sensing cancer invasion and predicting response to neoadjuvant therapy, as well as potential multi-cancer metastasis inhibiting therapeutics targeting the corresponding biological mechanism.
PMCID: PMC2988703  PMID: 21047417
5.  PAPAyA: a platform for breast cancer biomarker signature discovery, evaluation and assessment 
BMC Bioinformatics  2009;10(Suppl 9):S7.
The decision environment for cancer care is becoming increasingly complex due to the discovery and development of novel genomic tests that offer information regarding therapy response, prognosis and monitoring, in addition to traditional histopathology. There is, therefore, a need for translational clinical tools based on molecular bioinformatics, particularly in current cancer care, that can acquire and analyze data, and interpret and present information from multiple diagnostic modalities to help the clinician make effective decisions.
We present a platform for molecular signature discovery and clinical decision support that relies on genomic and epigenomic measurement modalities as well as clinical parameters such as histopathological results and survival information. Our Physician Accessible Preclinical Analytics Application (PAPAyA) integrates a powerful set of statistical and machine learning tools that leverage the connections among the different modalities. It is easily extendable and reconfigurable to support integration of existing research methods and tools into powerful data analysis and interpretation pipelines. A current configuration of PAPAyA with examples of its performance on breast cancer molecular profiles is used to present the platform in action.
PAPAyA enables analysis of data from (pre)clinical studies and formulation of new clinical hypotheses, and facilitates clinical decision support by abstracting molecular profiles for clinicians.
PMCID: PMC2745694  PMID: 19761577
6.  Inference of Disease-Related Molecular Logic from Systems-Based Microarray Analysis 
PLoS Computational Biology  2006;2(6):e68.
Computational analysis of gene expression data from microarrays has been useful for medical diagnosis and prognosis. The ability to analyze such data at the level of biological modules, rather than individual genes, has been recognized as important for improving our understanding of disease-related pathways. It has proved difficult, however, to infer pathways from microarray data by deriving modules of multiple synergistically interrelated genes, rather than individual genes. Here we propose a systems-based approach called Entropy Minimization and Boolean Parsimony (EMBP) that identifies, directly from gene expression data, modules of genes that are jointly associated with disease. Furthermore, the technique provides insight into the underlying biomolecular logic by inferring a logic function connecting the joint expression levels in a gene module with the outcome of disease. Coupled with biological knowledge, this information can be useful for identifying disease-related pathways, suggesting potential therapeutic approaches for interfering with the functions of such pathways. We present an example providing such gene modules associated with prostate cancer from publicly available gene expression data, and we successfully validate the results on additional independently derived data. Our results indicate a link between prostate cancer and cellular damage from oxidative stress combined with inhibition of apoptotic mechanisms normally triggered by such damage.
Diseases such as cancer are often associated with malfunctioning pathways involving several genes. Identifying modules of such genes and how the genes in each module interact with each other is helpful toward understanding the nature of these diseases. Here the authors provide a novel computational method for discovering such modules of genes merely from two sets of gene expression data, one from healthy tissues and one from tissues suffering from a particular disease. The method is based on the concept of identifying sets of genes whose joint expression state predicts the presence or absence of a particular disease with minimum uncertainty. Once such gene sets have been identified, we can then further use the microarray data to determine the “logic” that connects the genes' individual expression states to the outcome of the disease. In turn, this logic may give us valuable insight into the nature of the pathways and how we may target some elements of these pathways for therapeutic purposes. The authors apply this methodology in a particular example and conclude that prostate cancer is often associated with cellular damage from oxidative stress combined with the inhibition of the apoptotic mechanisms normally activated by such damage.
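The idea of a gene set whose joint expression state predicts disease "with minimum uncertainty" can be made concrete with conditional entropy. The sketch below assumes expression has already been binarized to 0/1 per gene and searches all gene sets of a fixed size exhaustively. Both choices are simplifying assumptions made here; the actual search strategy and the Boolean parsimony step of EMBP are not shown. The function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def conditional_entropy(samples, labels, genes):
    """H(label | joint expression state of `genes`) in bits.
    Each sample is a dict mapping gene name to 0 or 1."""
    joint = Counter()
    state_totals = Counter()
    for expr, label in zip(samples, labels):
        state = tuple(expr[g] for g in genes)
        joint[(state, label)] += 1
        state_totals[state] += 1
    n = len(labels)
    h = 0.0
    for (state, _label), count in joint.items():
        # p(state, label) * log2 p(label | state)
        h -= (count / n) * log2(count / state_totals[state])
    return h


def best_gene_set(samples, labels, size):
    """Exhaustively pick the gene set of `size` with minimum entropy."""
    genes = sorted(samples[0])
    return min(combinations(genes, size),
               key=lambda gs: conditional_entropy(samples, labels, gs))


# Toy data: disease is present exactly when A is on and B is off.
samples = [{"A": a, "B": b, "C": c}
           for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [int(s["A"] == 1 and s["B"] == 0) for s in samples]
print(best_gene_set(samples, labels, 2))  # ('A', 'B')
```

In this toy case the pair (A, B) leaves no uncertainty about the label, and the implied logic function is "A AND NOT B".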
PMCID: PMC1479089  PMID: 16789819
7.  Variable window binding for mutually exclusive alternative splicing 
Genome Biology  2006;7(1):R2.
A mechanism called Variable Window Binding for mutually exclusive alternative splicing is described for exon 6 of the Drosophila Dscam gene, which represents an extreme case of alternative splicing.
Genes of advanced organisms undergo alternative splicing, which can be mutually exclusive, in the sense that only one exon is included in the mature mRNA out of a cluster of alternative choices, often arranged in a tandem array. In many cases, however, the details of the underlying biologic mechanisms are unknown.
We describe 'variable window binding' - a mechanism used for mutually exclusive alternative splicing by which a segment ('window') of a conserved nucleotide 'anchor' sequence upstream of the exon 6 cluster in the pre-mRNA of the fruitfly Dscam gene binds to one of the introns, thereby activating selection of the exon directly downstream from the binding site. This mechanism is supported by the fact that the anchor sequence can be inferred solely from a comparison of the intron sequences using a genetic algorithm. Because the window location varies for each exon choice, regulation can be achieved by obstructing part of that sequence. We also describe a related mechanism based on competing pre-mRNA stem-loop structures that could explain the mutually exclusive choice of exon 17 of the Dscam gene.
On the basis of comparative sequence analysis, we propose efficient biologic mechanisms of alternative splicing of the Drosophila Dscam gene that rely on the inherent structure of the pre-mRNA. Related mechanisms employing 'locus control regions' could be involved on other occasions of mutually exclusive choices of exons or genes.
PMCID: PMC1431710  PMID: 16507134