Positional MEDLINE (PosMed; http://biolod.org/PosMed) is a powerful Semantic Web Association Study engine that ranks biomedical resources such as genes, metabolites, diseases and drugs, based on the statistical significance of associations between user-specified phenotypic keywords and resources connected directly or inferentially through a Semantic Web of biological databases such as MEDLINE, OMIM, pathways, co-expressions, molecular interactions and ontology terms. Since 2005, PosMed has been used for in silico positional cloning studies to infer candidate disease-responsible genes within chromosomal intervals. PosMed has been redesigned as a workbench for discovering possible functional interpretations of the numerous genetic variants found by exome sequencing of human disease samples. We also show that the association search engine enhances the value of mouse bioresources, because most knockout mouse resources have no phenotypic annotation but can be associated inferentially with phenotypes via genes and biomedical documents. For this purpose, we established text-mining rules for the biomedical documents through careful human curation and created a large number of correct links between genes and documents. As of May 2013, PosMed associates any phenotypic keyword with mouse resources using 20 public databases and four original data sets.
The identification of genes underlying human genetic disorders requires the combination of data related to cytogenetic localization, phenotypes and expression patterns, to generate a list of candidate genes. In the field of human genetics, it is normal to perform this combination analysis by hand. We report on GeneSeeker, a web server that gathers and combines data from a series of databases. All database searches are performed via the web interfaces provided with the original databases, guaranteeing that the most recent data are queried, and obviating data warehousing. GeneSeeker makes the same selection of candidate genes as human geneticists would have made, thus reducing the time-consuming process to a few minutes. GeneSeeker is particularly well suited for syndromes in which the disease gene displays altered expression patterns in the affected tissue(s).
This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.
We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries.
Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins.
Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces.
Semantic Web; Semantic mashup; Nicotine dependence; Information integration; Ontologies
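The hub-gene result reported in this abstract (genes whose products participate in many pathways) amounts to a group-and-count over gene-to-pathway links. The following is a minimal Python sketch of that idea, not the authors' SPARQL query: the `PARTICIPATION` pairs, the `hub_genes` function and the `min_pathways` threshold are all hypothetical names and data introduced for illustration.

```python
from collections import defaultdict

# Hypothetical (gene, pathway) participation pairs; the identifiers are
# illustrative and not taken from the integrated knowledge base.
PARTICIPATION = [
    ("geneA", "pathway1"),
    ("geneA", "pathway2"),
    ("geneA", "pathway3"),
    ("geneB", "pathway1"),
    ("geneC", "pathway2"),
    ("geneC", "pathway3"),
]


def hub_genes(pairs, min_pathways):
    """Return (gene, pathway count) for genes found in at least
    min_pathways distinct pathways, highest count first, then by name."""
    pathways_by_gene = defaultdict(set)
    for gene, pathway in pairs:
        pathways_by_gene[gene].add(pathway)
    hubs = [
        (gene, len(pathways))
        for gene, pathways in pathways_by_gene.items()
        if len(pathways) >= min_pathways
    ]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(hub_genes(PARTICIPATION, min_pathways=2))
```

In a triple store the same grouping would typically be expressed with an aggregate query; the threshold that defines a "hub" is a modelling choice and is not specified in the abstract.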
The ability to deduce other persons' mental states and emotions, which has been termed ‘theory of mind (ToM)’, is highly heritable. The first molecular genetic studies focused on some dopamine-related genes, while the genetic basis underlying different components of ToM (affective ToM and cognitive ToM) remains unknown. The current study tested 7 candidate polymorphisms (rs4680, rs4633, rs2020917, rs2239393, rs737865, rs174699 and rs5993883) on the catechol-O-methyltransferase (COMT) gene. We investigated how these polymorphisms relate to different components of ToM. 101 adults participated in our study; all were genetically unrelated, non-clinical and healthy Chinese subjects. Different ToM tasks were applied to assess their theory of mind ability. The results showed that the COMT gene rs2020917 and rs737865 SNPs were associated with cognitive ToM performance, while the COMT gene rs5993883 SNP was related to affective ToM, for which a significant gender-genotype interaction was found (p = 0.039). Our results highlight the contribution of the dopamine (DA)-related COMT gene to ToM performance. Moreover, we found that different SNPs of the same gene relate to distinct aspects of ToM. Our research provides preliminary evidence for the genetic basis of theory of mind, which awaits further study.
We demonstrate the use of Semantic Web technology to integrate the ALFRED allele frequency database and the Starpath pathway resource. The linking of population-specific genotype data with cancer-related pathway data is potentially useful given the growing interest in personalized medicine and the exploitation of pathway knowledge for cancer drug discovery. We model our data using the Web Ontology Language (OWL), drawing upon ideas from existing standard formats BioPAX for pathway data and PML for allele frequency data. We store our data within an Oracle database, using Oracle Semantic Technologies. We then query the data using Oracle’s rule-based inference engine and SPARQL-like RDF query language. The ability to perform queries across the domains of population genetics and pathways offers the potential to answer a number of cancer-related research questions. Among the possibilities is the ability to identify genetic variants which are associated with cancer pathways and whose frequency varies significantly between ethnic groups. This sort of information could be useful for designing clinical studies and for providing background data in personalized medicine. It could also assist with the interpretation of genetic analysis results such as those from genome-wide association studies.
Semantic Web technologies have been developed to overcome the limitations of the current Web and conventional data integration solutions. The Semantic Web is expected to link all the data present on the Internet instead of linking just documents. One of the foundations of the Semantic Web technologies is the knowledge representation language Resource Description Framework (RDF). Knowledge expressed in RDF is typically stored in so-called triple stores (also known as RDF stores), from which it can be retrieved with SPARQL, a language designed for querying RDF-based models. The Semantic Web technologies should allow federated queries over multiple triple stores. In this paper we compare the efficiency of a set of biologically relevant queries as applied to a number of different triple store implementations.
Previously we developed a library of queries to guide the use of our knowledge base Cell Cycle Ontology implemented as a triple store. We have now compared the performance of these queries on five non-commercial triple stores: OpenLink Virtuoso (Open-Source Edition), Jena SDB, Jena TDB, SwiftOWLIM and 4Store. We examined three performance aspects: the data uploading time, the query execution time and the scalability. The queries we had chosen addressed diverse ontological or biological questions, and we found that individual store performance was quite query-specific. We identified three groups of queries displaying similar behaviour across the different stores: 1) relatively short response time queries, 2) moderate response time queries and 3) relatively long response time queries. SwiftOWLIM proved to be a winner in the first group, 4Store in the second one and Virtuoso in the third one.
Our analysis showed that some queries behaved idiosyncratically, in a triple store specific manner, mainly with SwiftOWLIM and 4Store. Virtuoso, as expected, displayed a very balanced performance - its load time and its response time for all the tested queries were better than average among the selected stores; it showed a very good scalability and a reasonable run-to-run reproducibility. Jena SDB and Jena TDB were consistently slower than the other three implementations. Our analysis demonstrated that most queries developed for Virtuoso could be successfully used for other implementations.
The TOM1 gene of Arabidopsis thaliana encodes a putative multipass transmembrane protein which is necessary for the efficient multiplication of tobamoviruses. We have previously shown that mutations severely destructive to the TOM1 gene reduce tobamovirus multiplication to low levels but do not impair it completely. In this report, we subjected one of the tom1 mutants (tom1-1) to another round of mutagenesis and isolated a new mutant which did not permit a detectable level of tobamovirus multiplication. In addition to tom1-1, this mutant carried a mutation referred to as tom3-1. Positional cloning showed that TOM3 was one of two TOM1-like genes in Arabidopsis. Based on the similarity between the amino acid sequences of TOM1 and TOM3, together with the results of a Sos recruitment assay suggesting that both TOM1 and TOM3 bind tobamovirus-encoded replication proteins, we propose that TOM1 and TOM3 play parallel and essential roles in the replication of tobamoviruses.
Endeavour (http://www.esat.kuleuven.be/endeavourweb; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes. Using a training set of genes known to be involved in a biological process of interest, our approach consists of (i) inferring several models (based on various genomic data sources), (ii) applying each model to the candidate genes to rank those candidates against the profile of the known genes and (iii) merging the several rankings into a global ranking of the candidate genes. In the present article, we describe the latest developments of Endeavour. First, we provide a web-based user interface, in addition to our Java client, to make Endeavour more universally accessible. Second, we support multiple species: in addition to Homo sapiens, we now provide gene prioritization for three major model organisms: Mus musculus, Rattus norvegicus and Caenorhabditis elegans. Third, Endeavour makes use of additional data sources and now includes numerous databases: ontologies and annotations, protein–protein interactions, cis-regulatory information, gene expression data sets, sequence information and text-mining data. We tested the new version of Endeavour on 32 recent disease gene associations from the literature. Additionally, we describe a number of recent independent studies that made use of Endeavour to prioritize candidate genes for obesity and Type II diabetes, cleft lip and cleft palate, and pulmonary fibrosis.
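Step (iii) above, merging several per-source rankings into one global ranking, can be illustrated with a simple rank-aggregation sketch in Python. This is an assumption-laden illustration and not Endeavour's own fusion method: it simply averages each gene's rank ratio (rank divided by list length) across rankings. The function name `merge_rankings` and the gene labels are hypothetical.

```python
def merge_rankings(rankings):
    """Merge several rankings (lists of gene names, best first) into one
    global ranking by averaging each gene's rank ratio (rank / list length).
    A gene missing from a ranking is simply skipped for that ranking."""
    ratios = {}
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking, start=1):
            ratios.setdefault(gene, []).append(position / n)
    mean_ratio = {gene: sum(r) / len(r) for gene, r in ratios.items()}
    # Lower mean rank ratio means a better global position; ties by name.
    return sorted(mean_ratio, key=lambda gene: (mean_ratio[gene], gene))


if __name__ == "__main__":
    # Hypothetical rankings from three data sources.
    per_source = [
        ["g1", "g2", "g3"],
        ["g2", "g1", "g3"],
        ["g1", "g3", "g2"],
    ]
    print(merge_rankings(per_source))
```

Averaging rank ratios treats every data source as equally informative; a statistically grounded fusion would weight or test the combined ranks instead.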
Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs.
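To give a feel for a query language based on list comprehension, the sketch below uses a Python comprehension to combine two object types (genes and reactions) with multiple conditions and no explicit join. It is not BioVelo syntax and does not use the Pathway Tools schema; the `Gene` and `Reaction` classes, their fields and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    product: str  # name of the enzyme encoded by the gene


@dataclass(frozen=True)
class Reaction:
    name: str
    enzymes: tuple  # names of enzymes catalysing the reaction
    pathway: str


# Hypothetical objects standing in for database entries.
GENES = [Gene("geneA", "enzA"), Gene("geneB", "enzB"), Gene("geneC", "enzC")]
REACTIONS = [
    Reaction("rxn1", ("enzA",), "glycolysis"),
    Reaction("rxn2", ("enzB", "enzC"), "TCA cycle"),
    Reaction("rxn3", ("enzC",), "TCA cycle"),
]


def genes_in_pathway(genes, reactions, pathway):
    """Names of genes whose product catalyses a reaction in the pathway."""
    return sorted({
        g.name
        for g in genes
        for r in reactions
        if r.pathway == pathway and g.product in r.enzymes
    })


if __name__ == "__main__":
    print(genes_in_pathway(GENES, REACTIONS, "TCA cycle"))
```

The two `for` clauses play the role of the join, and the `if` clause carries the conditions, which is why a comprehension-style query can relate object types without exposing the join concept.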
Tom40 is the major subunit of the translocase of the outer mitochondrial membrane (the TOM complex). To study the assembly pathway of Tom40, we have followed the integration of the protein into the TOM complex in vitro and in vivo using wild-type and altered versions of the Neurospora crassa Tom40 protein. Upon import into isolated mitochondria, Tom40 precursor proteins lacking the first 20 or the first 40 amino acid residues were assembled as the wild-type protein. In contrast, a Tom40 precursor lacking residues 41 to 60, which contains a highly conserved region of the protein, was arrested at an intermediate stage of assembly. We constructed mutant versions of Tom40 affecting this region and transformed the genes into a sheltered heterokaryon containing a tom40 null nucleus. Homokaryotic strains expressing the mutant Tom40 proteins had growth rate defects and were deficient in their ability to form conidia. Analysis of the TOM complex in these strains by blue native gel electrophoresis revealed alterations in electrophoretic mobility and a tendency to lose Tom40 subunits from the complex. Thus, both in vitro and in vivo studies implicate residues 41 to 60 as containing a sequence required for proper assembly/stability of Tom40 into the TOM complex. Finally, we found that TOM complexes in the mitochondrial outer membrane were capable of exchanging subunits in vitro. A model is proposed for the integration of Tom40 subunits into the TOM complex.
The relationship between disease susceptibility and genetic variation is complex, and many different types of data are relevant. We describe a web resource and database that provides and integrates as much information as possible on disease/gene relationships at the molecular level.
The resource has three primary modules. One module identifies which genes are candidates for involvement in a specified disease. A second module provides information about the relationships between sets of candidate genes. The third module analyzes the likely impact of non-synonymous SNPs on protein function. Disease/candidate gene relationships and gene-gene relationships are derived from the literature using simple but effective text profiling. SNP/protein function relationships are derived by two methods, one using principles of protein structure and stability, the other based on sequence conservation. Entries for each gene include a number of links to other data, such as expression profiles, pathway context, mouse knockout information and papers. Gene-gene interactions are presented in an interactive graphical interface, providing rapid access to the underlying information, as well as convenient navigation through the network. Use of the resource is illustrated with aspects of the inflammatory response and hypertension.
The combination of SNP impact analysis, a knowledge based network of gene relationships and candidate genes, and access to a wide range of data and literature allow a user to quickly assimilate available information, and so develop models of gene-pathway-disease interaction.
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.
Forward genetic screens have been used as a powerful strategy to dissect complex biological pathways in many model systems. A significant limitation of this approach has been the time-consuming and costly process of positional cloning and molecular characterization of the mutations isolated in these screens. Here, the authors describe a strategy using microarray hybridizations to facilitate positional cloning. This method relies on the fact that premature stop codons (i.e., nonsense mutations) constitute a frequent class of mutations isolated in screens and that nonsense mutant messenger RNAs are efficiently degraded by the conserved nonsense-mediated decay pathway. They validate this strategy by identifying two previously uncharacterized mutations: (1) tom-1, a mutation found in a forward genetic screen for enhanced acetylcholine secretion in Caenorhabditis elegans, and (2) an apparently spontaneous mutation in the hif-1 transcription factor gene. They further demonstrate the broad applicability of this strategy using other known mutants in C. elegans, Arabidopsis, and mouse. Characterization of tom-1 mutants suggests that TOM-1, the C. elegans ortholog of mammalian tomosyn, functions as an endogenous inhibitor of neurotransmitter secretion. These results also suggest that microarray hybridizations have the potential to significantly reduce the time and effort required for positional cloning.
Genetic screens are commonly used to figure out which genes are involved in a biological process. The first step in a genetic screen is to isolate mutant animals that are defective in the process being studied. The next step is to find which of the thousands of genes has the mutation that causes the observed defect. Positional cloning, the tried-and-true method for locating mutations, is slow and expensive. The authors propose using microarray hybridizations to speed the process. Their approach relies on the fact that a large fraction of the mutations found in screens are the results of premature stop codons, a particularly severe type of mutation. In cells, messages containing premature stop codons are rapidly destroyed by a protective pathway, called nonsense-mediated decay, thus making them directly detectable by microarray hybridization.
The authors apply this strategy retrospectively to known mutants in Caenorhabditis elegans, Arabidopsis, and mouse. They identify two uncharacterized mutations in C. elegans, including one, tom-1, found in a forward genetic screen for enhancers of neurotransmission. Interestingly, their characterization of tom-1 mutants suggests that the highly conserved protein tomosyn inhibits neurotransmission in neurons. This study shows that microarray hybridizations will help reduce the time and effort required for positional cloning.
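The core of the strategy described above is that a transcript carrying a premature stop codon should show a reduced microarray signal in the mutant, so candidates are the genes inside the mapped interval with the largest drop. The Python sketch below illustrates that filtering step only; it is not the authors' analysis pipeline. The function `nmd_candidates`, the `min_log2_drop` cutoff and the intensity values are hypothetical.

```python
import math


def nmd_candidates(wild_type, mutant, interval_genes, min_log2_drop=1.0):
    """Return (gene, log2 drop) for genes in the mapped interval whose
    signal in the mutant is lower than in wild type by at least
    min_log2_drop, largest drop first. Intensities must be positive."""
    hits = []
    for gene in interval_genes:
        if gene not in wild_type or gene not in mutant:
            continue  # no measurement for this gene on one of the arrays
        drop = math.log2(wild_type[gene] / mutant[gene])
        if drop >= min_log2_drop:
            hits.append((gene, drop))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical array intensities; g4 lies outside the mapped interval.
    wild_type = {"g1": 800.0, "g2": 400.0, "g3": 1000.0, "g4": 640.0}
    mutant = {"g1": 780.0, "g2": 100.0, "g3": 980.0, "g4": 80.0}
    print(nmd_candidates(wild_type, mutant, ["g1", "g2", "g3"]))
```

Restricting the comparison to the genetically mapped interval is what keeps the candidate list short: a strongly reduced gene elsewhere in the genome (g4 here) is ignored.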
Bipolar disorder is a highly heritable mental illness. The global burden of bipolar disorder is complicated by its comorbidity with substance abuse. Several genome-wide linkage/association studies on bipolar disorder as well as substance abuse have focused on the identification and/or prioritization of candidate disease genes. A useful step for translational research of these identified/prioritized genes is to identify sets of genes that have particular kinds of publicly available data. Therefore, we have leveraged the availability of links to related resources in the Entrez Gene database to develop a web-based resource for selecting genes based on presence or absence in particular biological data resources. The utility of our approach is demonstrated using a set of 3,399 genes from multiple eukaryotes that have been studied in the context of bipolar disorder and/or substance abuse. A web resource to automate the selection of genes that contain certain database links is available at http://compbio.jsums.edu/bpd.
The Mouse Genome Informatics (MGI; http://www.informatics.jax.org/) database integrates genetic and genomic data with the primary mission of facilitating the use of the mouse as a model system for understanding human biology and disease processes. MGI is the authoritative source of official mouse genetic nomenclature, gene ontology annotations, mammalian phenotype annotations, and mouse anatomy terms. MGI staff enforce the use of standardized genetic nomenclature, ontologies, and controlled vocabularies to describe mouse sequence data, genes, strains, expression data, alleles, and phenotypes. Extensive links between gene-centric information in MGI and other informatics resources (e.g., OMIM, Ensembl, UCSC, NCBI, UniProt) are maintained and updated on a regular basis.
Using the Web-based query interfaces for MGI, users can query for a mouse gene or genes according to diverse biological attributes of those genes, including phenotype associations, gene expression, functional annotation, and genome location. The MGI MouseBLAST server allows users to interrogate the MGI database using nucleotide and/or protein sequences. Functional and phenotypic data from MGI can be viewed in a broader genomic context using an interactive genome browser called Mouse GBrowse. The power of the MGI database as a research tool for biomedicine stems from the degree to which data from diverse sources are integrated. Integration, in turn, allows the data to be evaluated in new contexts. For example, integration makes possible such complex queries as “Find all genes from Chromosome 1 where the function is annotated as transcription factor and there is a knockout allele that results in eye dysmorphology.”
We have designed and implemented a web-based database system, called PlantQTL-GE, to facilitate quantitative trait locus (QTL) based candidate gene identification and gene function analysis. We collected a large number of genes, gene expression information from microarray data and expressed sequence tags (ESTs), and genetic markers from multiple sources for Oryza sativa and Arabidopsis thaliana. The system integrates these diverse data sources and has a uniform web interface for easy access. It supports QTL queries specifying QTL marker intervals or genomic loci, and displays, on the rice or Arabidopsis genome, known genes, microarray data, ESTs and candidate genes, as well as similar putative genes in the other plant. Candidate genes in QTL intervals are further annotated based on matching ESTs, microarray gene expression data and cis-elements in regulatory sequences. The system is freely available.
Rare disease research requires a broad range of disease-related information to discover the causes of genetic disorders, which are maladies caused by abnormalities in genes or chromosomes. The rarity of cases makes it difficult for researchers to establish a definite origin. This knowledge base will be a major resource not only for clinicians but also for the general public, who are unable to find consistent information on rare diseases in a single location.
We design a compact database schema for faster querying; its structure is optimized to store heterogeneous data sources. Then, clinicians at Seoul National University Hospital (SNUH) review and revise those resources. Additionally, we integrated other sources to capture genomic resources and clinical trials in detail on the Korean Rare Disease Knowledge base (KRDK).
As a result, we have developed a Web-based knowledge base, KRDK, suitable for the study of Mendelian diseases that commonly occur among Koreans. This knowledge base comprises disease summaries and reviews, a causal gene list, a laboratory and clinic directory, a patient registry, and so on. Furthermore, a database for analyzing and giving access to human biological information and a clinical trial management system are integrated into KRDK.
We expect that KRDK, the first rare disease knowledge base in Korea, may contribute to collaborative research and be a reliable reference for application to clinical trials. Additionally, this knowledge base supports querying of drug information, so that visitors can search for a list of rare diseases related to specific drugs. Visitors can access KRDK via http://www.snubi.org/software/raredisease/.
Rare Diseases; Knowledge Bases; Korean; Genetic Databases; Online Systems
RiceGeneThresher is a public online resource for mining genes underlying genome regions of interest or quantitative trait loci (QTL) in the rice genome. It is a compendium of rice genomic resources consisting of genetic markers, genome annotation, expressed sequence tags (ESTs), protein domains, gene ontology, plant stress-responsive genes, metabolic pathways and predicted protein–protein interactions. The RiceGeneThresher system integrates these diverse data sources and provides powerful web-based applications and flexible tools for delivering customized sets of biological data on rice. The system supports whole-genome gene mining for QTL through queries by DNA marker intervals or genomic loci. RiceGeneThresher provides biologically supported evidence that is essential for targeting groups or networks of genes involved in controlling traits underlying QTL. Users can use it to discover and assign the most promising candidate genes in preparation for further gene function validation analysis. The web-based application is freely available at http://rice.kps.ku.ac.th.
Gene-to-gene coexpression analysis is a powerful approach to infer the function of uncharacterized genes. Here, we report comprehensive identification of coexpression gene modules of tomato (Solanum lycopersicum) and experimental verification of coordinated expression of module member genes. On the basis of the gene-to-gene correlation coefficient calculated from 67 microarray hybridization data points, we performed a network-based analysis. This facilitated the identification of 199 coexpression modules. A gene ontology annotation search revealed that 75 out of the 199 modules are enriched with genes associated with common functional categories. To verify the coexpression relationships between module member genes, we focused on one module enriched with genes associated with the flavonoid biosynthetic pathway. A non-enzyme, non-transcription factor gene encoding a zinc finger protein in this module was overexpressed in S. lycopersicum cultivar Micro-Tom, and expression levels of flavonoid pathway genes were investigated. Flavonoid pathway genes included in the module were up-regulated in the plant overexpressing the zinc finger gene. This result demonstrates that coexpression modules, at least the ones identified in this study, represent actual transcriptional coordination between genes, and can facilitate the inference of tomato gene function.
coexpression; flavonoid; Solanum lycopersicum; tomato; zinc finger
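The network-based analysis in this abstract starts from gene-to-gene correlation coefficients computed over expression profiles. A minimal Python sketch of one way to go from correlations to modules follows. It assumes Pearson correlation, a fixed threshold, and connected components as modules; the abstract does not state the module-detection method or cutoff, so these, along with the function names and the four-point profiles, are illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_modules(profiles, threshold):
    """Link genes whose correlation is >= threshold and return the
    connected components of that network as sorted lists of gene names."""
    genes = sorted(profiles)
    neighbours = {gene: set() for gene in genes}
    for a, b in combinations(genes, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    modules, seen = [], set()
    for gene in genes:
        if gene in seen:
            continue
        stack, module = [gene], set()
        while stack:  # depth-first walk over one component
            current = stack.pop()
            if current in module:
                continue
            module.add(current)
            stack.extend(neighbours[current] - module)
        seen |= module
        modules.append(sorted(module))
    return modules


if __name__ == "__main__":
    # Hypothetical expression profiles over four hybridizations.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.5],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [8.0, 6.0, 4.5, 2.0],
    }
    print(coexpression_modules(profiles, threshold=0.9))
```

Genes with no partner above the threshold come back as single-gene modules; a real analysis would usually drop those or apply a minimum module size.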
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. One of the key requirements to perform GWAS is the identification of subject cohorts with accurate classification of disease phenotypes. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical data stored in electronic health records (EHRs) to accurately identify subjects with specific diseases for inclusion in cohort studies. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR data and enabling federated querying and inferencing via standardized Web protocols for identifying subjects with Diabetes Mellitus. Our study highlights the potential of using Web-scale data federation approaches to execute complex queries.
Even in the post-genomic era, the identification of candidate genes within loci associated with human genetic diseases is a very demanding task, because the critical region may typically contain hundreds of positional candidates. Since genes implicated in similar phenotypes tend to share very similar expression profiles, high throughput gene expression data may represent a very important resource to identify the best candidates for sequencing. However, so far, gene coexpression has not been used very successfully to prioritize positional candidates.
We show that it is possible to reliably identify disease-relevant relationships among genes from massive microarray datasets by concentrating only on genes sharing similar expression profiles in both human and mouse. Moreover, we show systematically that the integration of human-mouse conserved coexpression with a phenotype similarity map allows the efficient identification of disease genes in large genomic regions. Finally, using this approach on 850 OMIM loci characterized by an unknown molecular basis, we propose high-probability candidates for 81 genetic diseases.
Our results demonstrate that conserved coexpression, even at the human-mouse phylogenetic distance, represents a very strong criterion to predict disease-relevant relationships among human genes.
One of the most limiting aspects of biological research in the post-genomic era is the capability to integrate massive datasets on gene structure and function for producing useful biological knowledge. In this report we have applied an integrative approach to address the problem of identifying likely candidate genes within loci associated with human genetic diseases. Despite the recent progress in sequencing technologies, approaching this problem from an experimental perspective still represents a very demanding task, because the critical region may typically contain hundreds of positional candidates. We found that by concentrating only on genes sharing similar expression profiles in both human and mouse, massive microarray datasets can be used to reliably identify disease-relevant relationships among genes. Moreover, we found that integrating the coexpression criterion with systematic phenome analysis allows efficient identification of disease genes in large genomic regions. Using this approach on 850 OMIM loci characterized by unknown molecular basis, we propose high-probability candidates for 81 genetic diseases.
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located in the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data on gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.
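The two assumptions stated above (a good candidate is differentially expressed itself, or sits close to other good candidates in the protein interaction network) can be combined in a very simple scoring sketch. The Python code below is an illustration of that idea under stated assumptions, not the authors' algorithm: each candidate's score is its own differential-expression value plus a weighted mean of its direct neighbours' values. The function `prioritize`, the `neighbour_weight` parameter and all gene names and scores are hypothetical.

```python
def prioritize(candidates, diff_expr, interactions, neighbour_weight=0.5):
    """Rank candidate genes by their own differential-expression score plus
    neighbour_weight times the mean score of their direct interaction
    partners. Returns (gene, score) pairs, best first."""
    neighbours = {}
    for a, b in interactions:  # interactions are undirected pairs
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {}
    for gene in candidates:
        own = diff_expr.get(gene, 0.0)
        linked = neighbours.get(gene, set())
        if linked:
            support = sum(diff_expr.get(n, 0.0) for n in linked) / len(linked)
        else:
            support = 0.0
        scores[gene] = own + neighbour_weight * support
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical scores; d1 is a known disease gene outside the interval.
    diff_expr = {"g1": 2.0, "g2": 1.0, "g3": 0.0, "d1": 3.0}
    interactions = [("g2", "d1"), ("g3", "g1")]
    print(prioritize(["g1", "g2", "g3"], diff_expr, interactions))
```

In this toy input, g2 outranks g1 despite weaker differential expression because it interacts with the strongly scoring gene d1, which is the behaviour the second assumption is meant to capture.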
Struct2Net is a web server for predicting interactions between arbitrary protein pairs using a structure-based approach. Prediction of protein–protein interactions (PPIs) is a central area of interest and successful prediction would provide leads for experiments and drug design; however, the experimental coverage of the interactome remains inadequate. We believe that Struct2Net is the first community-wide resource to provide structure-based PPI predictions that go beyond homology modeling. Most web resources for predicting PPIs currently rely on functional genomic data (e.g. GO annotation, gene expression and cellular localization). Our structure-based approach is independent of such methods and only requires the sequence information of the proteins being queried. The web service allows multiple querying options, aimed at maximizing flexibility. For the most commonly studied organisms (fly, human and yeast), predictions have been pre-computed and can be retrieved almost instantaneously. For proteins from other species, users have the option of getting a quick-but-approximate result (using orthology over pre-computed results) or having a full-blown computation performed. The web service is freely available at http://struct2net.csail.mit.edu.
CLUB (“Candidate List of yoUr Biomarkers”) is a freely available, web-based resource designed to support cancer biomarker research. It aims to provide a comprehensive list of candidate biomarkers for various cancers that have been reported by the research community. CLUB provides tools for comparing marker candidates from different experimental platforms, with the ability to filter, search, query and explore molecular interaction networks associated with cancer biomarkers from the published literature and from data uploaded by the community. This complex and ambitious project is being implemented in phases. As a first step, we have compiled from the literature an initial set of differentially expressed human candidate cancer biomarkers. Each candidate is annotated with information from publicly available databases such as Gene Ontology, the Swiss-Prot database, the National Center for Biotechnology Information’s reference sequences, the Biomolecular Interaction Network Database and the IntAct interaction database. The user has the option to maintain private lists of biomarker candidates or to share and export them for use by the community. Furthermore, users may customize and combine commonly used sets of selection procedures and apply them as a stored workflow to selected candidate lists. To enable an assessment by the user before taking a candidate biomarker to the experimental validation stage, the platform can identify pathways associated with cancer risk, staging, prognosis, outcome and other clinically associated phenotypes. The system is available at http://club.bii.a-star.edu.sg.
The development of inexpensive high throughput methods to identify individual DNA sequence differences is important to the future growth of medical genetics. This has become increasingly apparent as epidemiologists, pathologists, and clinical geneticists focus more attention on the molecular basis of complex multifactorial diseases. Such undertakings will rely upon genetic maps based upon newly discovered, common, single nucleotide polymorphisms. Furthermore, candidate gene approaches used in identifying disease associated genes necessitate screening large sequence blocks for changes tracking with the disease state. Even after such genes are isolated, large scale mutational analyses will often be needed for risk assessment studies to define the likely medical consequences of carrying a mutated gene.
This review concentrates on the use of oligonucleotide arrays for hybridisation-based comparative sequence analysis. Technological advances within the past decade have made it possible to apply this technology to many different aspects of medical genetics. These applications range from the detection and scoring of single nucleotide polymorphisms to mutational analysis of large genes. Although we discuss published scientific reports, unpublished work from the private sector [1, 2] could also significantly affect the future of this technology.
Keywords: mutational analysis; oligonucleotide microarrays; DNA chips