1.  Bambino: a variant detector and alignment viewer for next-generation sequencing data in the SAM/BAM format 
Bioinformatics  2011;27(6):865-866.
Summary: Bambino is a variant detector and graphical alignment viewer for next-generation sequencing data in the SAM/BAM format, which is capable of pooling data from multiple source files. The variant detector takes advantage of SAM-specific annotations, and produces detailed output suitable for genotyping and identification of somatic mutations. The assembly viewer can display reads in the context of either a user-provided or automatically generated reference sequence, retrieve genome annotation features from a UCSC genome annotation database, display histograms of non-reference allele frequencies, and predict protein-coding changes caused by SNPs.
Availability: Bambino is written in platform-independent Java and available from https://cgwb.nci.nih.gov/goldenPath/bamview/documentation/index.html, along with documentation and example data. Bambino may be launched online via Java Web Start or downloaded and run locally.
Contact: edmonson@nih.gov.
doi:10.1093/bioinformatics/btr032
PMCID: PMC3051333  PMID: 21278191
2.  CREBBP mutations in relapsed acute lymphoblastic leukaemia 
Nature  2011;471(7337):235-239.
Relapsed acute lymphoblastic leukaemia (ALL) is a leading cause of death due to disease in young people, but the biologic determinants of treatment failure remain poorly understood. Recent genome-wide profiling of structural DNA alterations in ALL has identified multiple submicroscopic somatic mutations targeting key cellular pathways [1,2], and has demonstrated substantial evolution in genetic alterations from diagnosis to relapse [3]. However, detailed analysis of sequence mutations in ALL has not been performed. To identify novel mutations in relapsed ALL, we resequenced 300 genes in matched diagnosis and relapse samples from 23 patients with ALL. This identified 52 somatic non-synonymous mutations in 32 genes, many of which were novel, including the transcriptional coactivators CREBBP and NCOR1, the transcription factors ERG, SPI1, TCF4 and TCF7L2, components of the Ras signalling pathway, histone genes, genes involved in histone modification (CREBBP and CTCF), and genes previously shown to be targets of recurring DNA copy number alteration in ALL. Analysis of an extended cohort of 71 diagnosis-relapse cases and 270 acute leukaemia cases that did not relapse found that 18.3% of relapse cases had sequence or deletion mutations of CREBBP, which encodes the transcriptional coactivator and histone acetyltransferase (HAT) CREB-binding protein (CBP) [4]. The mutations were either present at diagnosis or acquired at relapse, and resulted in truncated alleles or deleterious substitutions in conserved residues of the HAT domain. Functionally, the mutations impaired histone acetylation and transcriptional regulation of CREBBP targets, including glucocorticoid responsive genes. Several mutations acquired at relapse were detected in subclones at diagnosis, suggesting that the mutations may confer resistance to therapy.
These results extend the landscape of genetic alterations in leukaemia, and identify mutations targeting transcriptional and epigenetic regulation as a mechanism of resistance in ALL.
doi:10.1038/nature09727
PMCID: PMC3076610  PMID: 21390130
3.  Detecting Cancer Gene Networks Characterized by Recurrent Genomic Alterations in a Population 
PLoS ONE  2011;6(1):e14437.
High-resolution, system-wide characterizations have demonstrated the capacity to identify genomic regions that undergo genomic aberrations. Such research efforts often aim at associating these regions with disease etiology and outcome. Identifying the corresponding biologic processes that are responsible for disease and its outcome remains challenging. Using novel analytic methods that utilize the structure of biologic networks, we are able to identify the specific networks that are altered, nonrandomly and with high significance, by regions of copy number amplification observed in a system-wide analysis. We demonstrate this method in breast cancer, where the state of a subset of the pathways identified through these regions is shown to be highly associated with disease survival and recurrence.
doi:10.1371/journal.pone.0014437
PMCID: PMC3014942  PMID: 21283511
4.  Novel pathway analysis of genomic polymorphism-cancer risk interaction in the breast cancer prevention trial 
Purpose
Tamoxifen was approved for breast cancer risk reduction in high-risk women based on the National Surgical Adjuvant Breast and Bowel Project's Breast Cancer Prevention Trial (P-1:BCPT), which showed 50% fewer breast cancers with tamoxifen versus placebo, supporting tamoxifen's efficacy in preventing breast cancer. Poor metabolizing CYP2D6 variants are currently the subject of intensive scrutiny regarding their impact on clinical outcomes in the adjuvant setting. Our study extends to variants in a wider spectrum of tamoxifen-metabolizing genes and applies to the prevention setting.
Methods
Our case-only study, nested within P-1:BCPT, explored associations of polymorphisms in estrogen/tamoxifen-metabolizing genes with responsiveness to preventive tamoxifen. Thirty-nine candidate polymorphisms in 17 candidate genes were genotyped in 249 P-1:BCPT cases.
Results
CYP2D6_C1111T, individually and within a CYP2D6 haplotype, showed borderline significant association with treatment arm. Path analysis of the entire tamoxifen pathway gene network showed that the tamoxifen pathway model was consistent with the pattern of observed genotype variability within the placebo-arm dataset. However, correlation of variations in genes in the tamoxifen arm differed significantly from the predictions of the tamoxifen pathway model. Strong correlations between allelic variation in the tamoxifen pathway at CYP1A1-CYP3A4, CYP3A4-CYP2C9, and CYP2C9-SULT1A2, in addition to CYP2D6 and its adjacent genes, were seen in the placebo-arm but not the tamoxifen-arm. In conclusion, beyond reinforcing a role for CYP2D6 in tamoxifen response, our pathway analysis strongly suggests that specific combinations of allelic variants in other genes make major contributions to the tamoxifen-resistance phenotype.
PMCID: PMC2998292  PMID: 21152245
Breast cancer; tamoxifen resistance; chemoprevention; pathway analysis; breast cancer risk; genomic polymorphisms
6.  Global transcription in pluripotent embryonic stem cells 
Cell stem cell  2008;2(5):437-447.
SUMMARY
The molecular mechanisms underlying pluripotency and lineage specification from embryonic stem (ES) cells are largely unclear. Differentiation pathways may be determined by the targeted activation of lineage specific genes or by selective silencing of genome regions during differentiation. Here we show that the ES cell genome is transcriptionally globally hyperactive and undergoes global silencing as cells differentiate. Normally silent repeat regions are active in ES cells and tissue-specific genes are sporadically expressed at low levels. Whole genome tiling arrays demonstrate widespread transcription in both coding and non-coding regions in pluripotent ES cells whereas the transcriptional landscape becomes more discrete as differentiation proceeds. The transcriptional hyperactivity in ES cells is accompanied by disproportionate expression of chromatin-remodeling genes and the general transcription machinery, but not histone modifying activities. Interference with several chromatin remodeling activities in ES cells affects their proliferation and differentiation behavior. We propose that global transcriptional activity is a hallmark of pluripotent ES cells that contributes to their plasticity and that lineage specification is strongly driven by reduction of the actively transcribed portion of the genome.
doi:10.1016/j.stem.2008.03.021
PMCID: PMC2435228  PMID: 18462694
7.  PID: the Pathway Interaction Database 
Nucleic Acids Research  2008;37(Database issue):D674-D679.
The Pathway Interaction Database (PID, http://pid.nci.nih.gov) is a freely available collection of curated and peer-reviewed pathways composed of human molecular signaling and regulatory events and key cellular processes. Created in a collaboration between the US National Cancer Institute and Nature Publishing Group, the database serves as a research tool for the cancer research community and others interested in cellular pathways, such as neuroscientists, developmental biologists and immunologists. PID offers a range of search features to facilitate pathway exploration. Users can browse the predefined set of pathways or create interaction network maps centered on a single molecule or cellular process of interest. In addition, the batch query tool allows users to upload long list(s) of molecules, such as those derived from microarray experiments, and either overlay these molecules onto predefined pathways or visualize the complete molecular connectivity map. Users can also download molecule lists, citation lists and complete database content in extensible markup language (XML) and Biological Pathways Exchange (BioPAX) Level 2 format. The database is updated with new pathway content every month and supplemented by specially commissioned articles on the practical uses of other relevant online tools.
doi:10.1093/nar/gkn653
PMCID: PMC2686461  PMID: 18832364
8.  Superposition of Transcriptional Behaviors Determines Gene State 
PLoS ONE  2008;3(8):e2901.
We introduce a novel technique to determine the expression state of a gene from quantitative information measuring its expression. Adopting a productive abstraction from current thinking in molecular biology, we consider two expression states for a gene - Up or Down. We determine this state by using a statistical model that assumes the data behaves as a combination of two biological distributions. Given a cohort of hybridizations, our algorithm predicts, for the single reading, the probability of each gene's being in an Up or a Down state in each hybridization. Using a series of publicly available gene expression data sets, we demonstrate that our algorithm outperforms the prevalent algorithm. We also show that our algorithm can be used in conjunction with expression adjustment techniques to produce a more biologically sound gene-state call. The technique we present here enables a routine update, where the continuously evolving expression level adjustments feed into gene-state calculations. The technique can be applied in almost any multi-sample gene expression experiment, and holds equal promise for protein abundance experiments.
doi:10.1371/journal.pone.0002901
PMCID: PMC2488367  PMID: 18682855
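The two-state idea in entry 8 (each reading is drawn from either a Down or an Up distribution) can be illustrated with a small mixture-model sketch. This is not the authors' algorithm: the abstract does not say which distributions are used, so the Gaussian components, the EM fitting procedure, and the names `normal_pdf`, `fit_two_state`, and `readings` are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_state(values, iterations=50):
    """Fit a two-component Gaussian mixture to one gene's readings by EM.

    Returns (p_up, means): p_up[i] is the posterior probability that
    values[i] belongs to the higher-mean (Up) component, and means holds
    the fitted [Down, Up] component means.
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) or 1.0          # avoid zero spread for constant data
    means = [lo, hi]                   # start the components at the extremes
    sds = [spread / 4.0, spread / 4.0]
    weights = [0.5, 0.5]
    p_up = [0.5] * len(values)
    for _ in range(iterations):
        # E-step: posterior probability of the Up component for each reading.
        p_up = []
        for x in values:
            down = weights[0] * normal_pdf(x, means[0], sds[0])
            up = weights[1] * normal_pdf(x, means[1], sds[1])
            total = down + up
            p_up.append(up / total if total > 0.0 else 0.5)
        # M-step: re-estimate weight, mean, and sd of each component.
        memberships = ([1.0 - p for p in p_up], p_up)
        for k, resp in enumerate(memberships):
            mass = sum(resp)
            if mass < 1e-9:
                continue               # component is empty; keep old values
            weights[k] = mass / len(values)
            means[k] = sum(r * x for r, x in zip(resp, values)) / mass
            var = sum(r * (x - means[k]) ** 2
                      for r, x in zip(resp, values)) / mass
            sds[k] = max(math.sqrt(var), 1e-3 * spread)  # floor the sd
    return p_up, means


# Hypothetical log-expression readings for one gene across eight hybridizations.
readings = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
p_up, means = fit_two_state(readings)
```

For well-separated readings like these, the first four hybridizations receive an Up probability near 0 and the last four a probability near 1; overlapping distributions yield intermediate probabilities, which is what makes a probabilistic state call more informative than a fixed threshold.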
9.  Identification of Key Processes Underlying Cancer Phenotypes Using Biologic Pathway Analysis 
PLoS ONE  2007;2(5):e425.
Cancer is recognized to be a family of gene-based diseases whose causes are to be found in disruptions of basic biologic processes. An increasingly deep catalogue of canonical networks details the specific molecular interaction of genes and their products. However, mapping of disease phenotypes to alterations of these networks of interactions is accomplished indirectly and non-systematically. Here we objectively identify pathways associated with malignancy, staging, and outcome in cancer through application of an analytic approach that systematically evaluates differences in the activity and consistency of interactions within canonical biologic processes. Using large collections of publicly accessible genome-wide gene expression, we identify small, common sets of pathways – Trka Receptor, Apoptosis response to DNA Damage, Ceramide, Telomerase, CD40L and Calcineurin – whose differences robustly distinguish diverse tumor types from corresponding normal samples, predict tumor grade, and distinguish phenotypes such as estrogen receptor status and p53 mutation state. Pathways identified through this analysis perform as well or better than phenotypes used in the original studies in predicting cancer outcome. This approach provides a means to use genome-wide characterizations to map key biological processes to important clinical features in disease.
doi:10.1371/journal.pone.0000425
PMCID: PMC1855990  PMID: 17487280
10.  Allele-Specific Chromatin Immunoprecipitation Studies Show Genetic Influence on Chromatin State in Human Genome 
PLoS Genetics  2007;3(5):e81.
Several recent studies have shown a genetic influence on gene expression variation, including variation between the two chromosomes within an individual and variation between individuals at the population level. We hypothesized that genetic inheritance may also affect variation in chromatin states. To test this hypothesis, we analyzed chromatin states in 12 lymphoblastoid cells derived from two Centre d'Etude du Polymorphisme Humain families using an allele-specific chromatin immunoprecipitation (ChIP-on-chip) assay with the Affymetrix 10K SNP chip. We performed the allele-specific ChIP-on-chip assays for the 12 lymphoblastoid cells using antibodies targeting RNA polymerase II and five post-translationally modified forms of the histone H3 protein. The use of multiple cell lines from the Centre d'Etude du Polymorphisme Humain families allowed us to evaluate variation of chromatin states across pedigrees. These studies demonstrated that chromatin state clustered by family. Our results support the idea that genetic inheritance can determine the epigenetic state of the chromatin, as shown previously in model organisms. To our knowledge, this is the first demonstration in humans that genetics may be an important factor that influences global chromatin state mediated by histone modification, the hallmark of epigenetic phenomena.
Author Summary
Human health and disease are determined by an interaction between genetic background and environmental exposures. Both normal development and disease are mediated by epigenetic regulation of gene expression. Epigenetic regulation causes heritable changes in gene expression that are not associated with DNA sequence changes. Instead, it is mediated by chemical modification of DNA such as DNA methylation or by protein modifications such as histone acetylation and methylation. Although much is known about epigenetic inheritance during development, little is known about the influence of the genetic background on epigenetic processes such as histone modifications. In this report the authors studied five histone modifications on a genome-wide level in cells from different families. Global epigenetic states, as measured by these histone modifications, showed a similar pattern for cells derived from the same family. This study demonstrates that genetic inheritance may be an important factor influencing global chromatin states mediated by histone modifications in humans. These observations illustrate the importance of integrating genetic and epigenetic information into studies of human health and complex diseases.
doi:10.1371/journal.pgen.0030081
PMCID: PMC1868950  PMID: 17511522
12.  SNPdetector: A Software Tool for Sensitive and Accurate SNP Detection 
PLoS Computational Biology  2005;1(5):e53.
Identification of single nucleotide polymorphisms (SNPs) and mutations is important for the discovery of genetic predisposition to complex diseases. PCR resequencing is the method of choice for de novo SNP discovery. However, manual curation of putative SNPs has been a major bottleneck in the application of this method to high-throughput screening. Therefore it is critical to develop a more sensitive and accurate computational method for automated SNP detection. We developed a software tool, SNPdetector, for automated identification of SNPs and mutations in fluorescence-based resequencing reads. SNPdetector was designed to model the process of human visual inspection and has very low false positive and false negative rates. We demonstrate the superior performance of SNPdetector in SNP and mutation analysis by comparing its results with those derived by human inspection, PolyPhred (a popular SNP detection tool), and independent genotype assays in three large-scale investigations. The first study identified and validated inter- and intra-subspecies variations in 4,650 traces of 25 inbred mouse strains that belong to either the Mus musculus species or the M. spretus species. Unexpected heterozygosity in the CAST/Ei strain was observed in two out of 1,167 mouse SNPs. The second study identified 11,241 candidate SNPs in five ENCODE regions of the human genome covering 2.5 Mb of genomic sequence. Approximately 50% of the candidate SNPs were selected for experimental genotyping; the validation rate exceeded 95%. The third study detected ENU-induced mutations (at 0.04% allele frequency) in 64,896 traces of 1,236 zebra fish. Our analysis of three large and diverse test datasets demonstrated that SNPdetector is an effective tool for genome-scale research and for large-sample clinical studies. SNPdetector runs on Unix/Linux platforms and is publicly available (http://lpg.nci.nih.gov).
Synopsis
Single nucleotide polymorphisms (SNPs) are an abundant and important class of heritable genetic variations, and many of them contribute to genetic diseases. Accurate and automated detection of SNPs as heterozygous alleles in fluorescence-based sequencing traces from diploid DNA samples is challenging because of the low signal-to-noise ratio in the data, and because of sequencing artifacts associated with the various DNA sequencing chemistries.
The authors of this publication have developed a new computer program, SNPdetector, that improves upon existing software tools. The main design principle of SNPdetector was to model the process of human visual inspection of experienced analysts. The new tool is able to cut down significantly on both false positive and false negative discovery rates. Good performance can be achieved, without the need for retraining, in substantially different datasets such as SNP discovery in human resequencing data, mutation discovery in zebra fish candidate genes, and discovery of inter- and intra-subspecies variations in inbred mouse strains. The results demonstrate that this software tool is suitable for the automation of SNP discovery in diploid sequencing traces, and permits a substantial reduction of costly and laborious visual data analysis.
doi:10.1371/journal.pcbi.0010053
PMCID: PMC1274293  PMID: 16261194
13.  HapScope: a software system for automated and visual analysis of functionally annotated haplotypes 
Nucleic Acids Research  2002;30(23):5213-5221.
We have developed a software analysis package, HapScope, which includes a comprehensive analysis pipeline and a sophisticated visualization tool for analyzing functionally annotated haplotypes. The HapScope analysis pipeline supports: (i) computational haplotype construction with an expectation-maximization or Bayesian statistical algorithm; (ii) SNP classification by protein coding change, homology to model organisms or putative regulatory regions; and (iii) minimum SNP subset selection by either a Brute Force Algorithm or a Greedy Partition Algorithm. The HapScope viewer displays genomic structure with haplotype information in an integrated environment, providing eight alternative views for assessing genetic and functional correlation. It has a user-friendly interface for: (i) haplotype block visualization; (ii) SNP subset selection; (iii) haplotype consolidation with subset SNP markers; (iv) incorporation of both experimentally determined haplotypes and computational results; and (v) data export for additional analysis. Comparison of haplotypes constructed by the statistical algorithms with those determined experimentally shows variation in haplotype prediction accuracies in genomic regions with different levels of nucleotide diversity. We have applied HapScope in analyzing haplotypes for candidate genes and genomic regions with extensive SNP and genotype data. We envision that the systematic approach of integrating functional genomic analysis with population haplotypes, supported by HapScope, will greatly facilitate current genetic disease research.
PMCID: PMC137968  PMID: 12466546
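Entry 13 mentions minimum SNP subset selection by a greedy algorithm. The sketch below shows one generic greedy strategy for that problem: repeatedly pick the SNP that separates the most haplotype pairs not yet told apart. It is an assumption-laden illustration, not HapScope's Greedy Partition Algorithm, whose details the abstract does not give; the function name `greedy_tag_snps` and the example haplotypes are hypothetical.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Choose SNP positions that distinguish every distinct haplotype.

    haplotypes: equal-length strings, one allele character per SNP.
    Returns a sorted list of 0-based SNP positions. Greedy selection gives
    a small subset, but it is not guaranteed to be the minimum one.
    """
    distinct = sorted(set(haplotypes))
    n_snps = len(distinct[0])
    # Every pair of distinct haplotypes starts out unresolved.
    unresolved = set(combinations(range(len(distinct)), 2))
    chosen = []
    while unresolved:
        best_snp, best_split = None, set()
        for snp in range(n_snps):
            if snp in chosen:
                continue
            # Pairs that this SNP would tell apart.
            split = {(a, b) for a, b in unresolved
                     if distinct[a][snp] != distinct[b][snp]}
            if len(split) > len(best_split):
                best_snp, best_split = snp, split
        if best_snp is None:
            break                      # no remaining SNP separates any pair
        chosen.append(best_snp)
        unresolved -= best_split
    return sorted(chosen)


# Four hypothetical haplotypes over four SNPs; SNPs 1 and 3 are monomorphic.
example_haplotypes = ["AACG", "AATG", "GACG", "GATG"]
tag_snps = greedy_tag_snps(example_haplotypes)
```

Here positions 0 and 2 suffice to tell all four haplotypes apart. A brute-force search over all subsets would guarantee the smallest set but scales exponentially with the number of SNPs, which is presumably why a greedy alternative is offered alongside it.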
14.  Life sciences domain analysis model 
Objective
Meaningful exchange of information is a fundamental challenge in collaborative biomedical research. To help address this, the authors developed the Life Sciences Domain Analysis Model (LS DAM), an information model that provides a framework for communication among domain experts and technical teams developing information systems to support biomedical research. The LS DAM is harmonized with the Biomedical Research Integrated Domain Group (BRIDG) model of protocol-driven clinical research. Together, these models can facilitate data exchange for translational research.
Materials and methods
The content of the LS DAM was driven by analysis of life sciences and translational research scenarios and the concepts in the model are derived from existing information models, reference models and data exchange formats. The model is represented in the Unified Modeling Language and uses ISO 21090 data types.
Results
The LS DAM v2.2.1 comprises 130 classes and covers several core areas including Experiment, Molecular Biology, Molecular Databases and Specimen. Nearly half of these classes originate from the BRIDG model, emphasizing the semantic harmonization between these models. Validation of the LS DAM against independently derived information models, research scenarios and reference databases supports its general applicability to represent life sciences research.
Discussion
The LS DAM provides unambiguous definitions for concepts required to describe life sciences research. The processes established to achieve consensus among domain experts will be applied in future iterations and may be broadly applicable to other standardization efforts.
Conclusions
The LS DAM provides common semantics for life sciences research. Through harmonization with BRIDG, it promotes interoperability in translational science.
doi:10.1136/amiajnl-2011-000763
PMCID: PMC3486731  PMID: 22744959
Semantics; knowledge representation (computer); interoperability; life sciences; information model; knowledge bases; knowledge representations; data models; clinical; OMICS; genomics; cancer genomics
15.  e-Science, caGrid, and Translational Biomedical Research 
Computer  2008;41(11):58-66.
Translational research projects target a wide variety of diseases, test many different kinds of biomedical hypotheses, and employ a large assortment of experimental methodologies. Diverse data, complex execution environments, and demanding security and reliability requirements make the implementation of these projects extremely challenging and require novel e-Science technologies.
doi:10.1109/MC.2008.459
PMCID: PMC3035203  PMID: 21311723
16.  Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation 
PLoS Biology  2009;7(11):e1000247.
A novel method for quantifying the similarity between phenotypes by the use of ontologies can be used to search for candidate genes, pathway members, and human disease models on the basis of phenotypes alone.
Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has posed a hindrance to the important task of identifying candidate genes and models for human diseases and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We utilized four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology, and a variety of similarity metrics can identify biologically meaningful similarities between genes by comparing phenotypes alone. 
This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
Author Summary
Model organisms such as fruit flies, mice, and zebrafish are useful for investigating gene function because they are easy to grow, dissect, and genetically manipulate in the laboratory. By examining mutations in these organisms, one can identify candidate genes that cause disease in humans, and develop models to better understand human disease and gene function. A fundamental roadblock for analysis is, however, the lack of a computational method for describing and comparing phenotypes of mutant animals and of human diseases when the genetic basis is unknown. We describe here a novel method using ontologies to record and quantify the similarity between phenotypes. We tested our method by using the annotated mutant phenotype of one member of the Hedgehog signaling pathway in zebrafish to identify other pathway members with similar recorded phenotypes. We also compared human disease phenotypes to those produced by mutation in model organisms, and show that orthologous and biologically relevant genes can be identified by this method. Given that the genetic basis of human disease is often unknown, this method provides a means for identifying candidate genes, pathway members, and disease models by computationally identifying similar phenotypes within and across species.
doi:10.1371/journal.pbio.1000247
PMCID: PMC2774506  PMID: 19956802
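Entry 16 compares phenotypes through the hierarchy of ontology terms. A minimal sketch of that idea follows, using a Jaccard index over each phenotype's terms plus all their ancestors. The abstract does not name its four similarity metrics, so the choice of Jaccard, the toy hierarchy, and the names `ancestors`, `phenotype_similarity`, and `toy_parents` are assumptions for illustration only.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the hierarchy."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def phenotype_similarity(terms_a, terms_b, parents):
    """Jaccard similarity between the ancestor closures of two annotation sets."""
    closure_a = set().union(*(ancestors(t, parents) for t in terms_a))
    closure_b = set().union(*(ancestors(t, parents) for t in terms_b))
    union = closure_a | closure_b
    if not union:
        return 0.0
    return len(closure_a & closure_b) / len(union)


# Toy hierarchy (hypothetical terms): child term -> list of parent terms.
toy_parents = {
    "small eye": ["abnormal eye"],
    "absent eye": ["abnormal eye"],
    "abnormal eye": ["abnormal head"],
    "abnormal head": ["phenotype"],
    "short fin": ["abnormal fin"],
    "abnormal fin": ["phenotype"],
}

eye_vs_eye = phenotype_similarity({"small eye"}, {"absent eye"}, toy_parents)
eye_vs_fin = phenotype_similarity({"small eye"}, {"short fin"}, toy_parents)
```

Two different eye phenotypes share most of their ancestors and score higher than an eye phenotype compared with a fin phenotype, even though no pair shares an exact term. Metrics that also weight terms by annotation frequency, as the abstract describes, would additionally down-weight very general shared ancestors such as the root.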
17.  Genome-wide loss of heterozygosity and copy number alteration in esophageal squamous cell carcinoma using the Affymetrix GeneChip Mapping 10 K array 
BMC Genomics  2006;7:299.
Background
Esophageal squamous cell carcinoma (ESCC) is a common malignancy worldwide. Comprehensive genomic characterization of ESCC will further our understanding of the carcinogenesis process in this disease.
Results
Genome-wide detection of chromosomal changes was performed using the Affymetrix GeneChip 10 K single nucleotide polymorphism (SNP) array, including loss of heterozygosity (LOH) and copy number alterations (CNA), for 26 pairs of matched germ-line and micro-dissected tumor DNA samples. LOH regions were identified by two methods – using Affymetrix's genotype call software and using Affymetrix's copy number alteration tool (CNAT) software – and both approaches yielded similar results. Non-random LOH regions were found on 10 chromosomal arms (in decreasing order of frequency: 17p, 9p, 9q, 13q, 17q, 4q, 4p, 3p, 15q, and 5q), including 20 novel LOH regions (10 kb to 4.26 Mb). Fifteen CNA-loss regions (200 kb to 4.3 Mb) and 36 CNA-gain regions (200 kb to 9.3 Mb) were also identified.
Conclusion
These studies demonstrate that the Affymetrix 10 K SNP chip is a valid platform to integrate analyses of LOH and CNA. The comprehensive knowledge gained from this analysis will enable improved strategies to prevent, diagnose, and treat ESCC.
doi:10.1186/1471-2164-7-299
PMCID: PMC1687196  PMID: 17134496
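The genotype-based LOH detection in entry 17 rests on comparing matched germ-line and tumor calls at each SNP. The sketch below shows the standard logic under simplifying assumptions: a SNP is informative only when the germ-line genotype is heterozygous, and LOH is called when the matched tumor genotype is homozygous. It does not reproduce the Affymetrix software used in the study; the function names and the example calls are hypothetical.

```python
def classify_snp(germline, tumor):
    """Classify one SNP from matched germ-line and tumor genotype calls.

    Genotypes are "AA", "AB", or "BB"; anything else (for example a failed
    call) makes the SNP non-informative.
    """
    if germline != "AB" or tumor not in ("AA", "AB", "BB"):
        return "non-informative"
    return "retention" if tumor == "AB" else "LOH"


def loh_fraction(pairs):
    """Fraction of informative SNPs showing LOH, or None if none are informative."""
    calls = [classify_snp(germline, tumor) for germline, tumor in pairs]
    informative = [call for call in calls if call != "non-informative"]
    if not informative:
        return None
    return informative.count("LOH") / len(informative)


# Hypothetical (germ-line, tumor) calls for five SNPs on one chromosomal arm.
arm_calls = [("AB", "AA"), ("AB", "AB"), ("AA", "AA"), ("AB", "BB"), ("BB", "BB")]
arm_loh = loh_fraction(arm_calls)
```

In this example three SNPs are informative and two of them show LOH. Restricting the denominator to informative SNPs matters because germ-line homozygous positions cannot reveal allelic loss from genotype calls alone.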
