Results 1-12 (12)
 

1.  Dalliance: interactive genome viewing on the web 
Bioinformatics  2011;27(6):889-890.
Summary: Dalliance is a new genome viewer which offers a high level of interactivity while running within a web browser. All data is fetched using the established distributed annotation system (DAS) protocol, making it easy to customize the browser and add extra data.
Availability and Implementation: Dalliance runs entirely within your web browser, and relies on existing DAS server infrastructure. Browsers for several mammalian genomes are available at http://www.biodalliance.org/, and the use of DAS means you can add your own data to these browsers. In addition, the source code (JavaScript) is available under the BSD license, and is straightforward to install on your own web server and embed within other documents.
Contact: thomas@biodalliance.org
doi:10.1093/bioinformatics/btr020
PMCID: PMC3051325  PMID: 21252075
2.  iMotifs: an integrated sequence motif visualization and analysis environment 
Bioinformatics  2010;26(6):843-844.
Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have recently been developed for detecting regulatory factor binding sites in vivo and in vitro, and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important.
iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces.
The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.
Availability: iMotifs is licensed under the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library, libxms, for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files.
Contact: matias.piipari@gmail.com; imotifs@googlegroups.com
doi:10.1093/bioinformatics/btq026
PMCID: PMC2832821  PMID: 20106815
3.  CASP5 Target Classification 
Proteins  2003;53(Suppl 6):340-351.
This report summarizes the Critical Assessment of Protein Structure Prediction (CASP5) target proteins, which included 67 experimental models submitted from various structural genomics efforts and independent research groups. Throughout this special issue, CASP5 targets are referred to with the identification numbers T0129–T0195. Several of these targets were excluded from the assessment for various reasons: T0164 and T0166 were cancelled by the organizers; T0131, T0144, T0158, T0163, T0171, T0175, and T0180 were not available in time; T0145 was “natively unfolded”; the T0139 structure became available before the target expired; and T0194 was solved for a different sequence than the one submitted. Table I outlines the sequence and structural information available for CASP5 proteins in the context of existing folds and evolutionary relationships. This information provided the basis for a domain-based classification of the target structures into three assessment categories: comparative modeling (CM), fold recognition (FR), and new fold (NF). The FR category was further subdivided into homologues [FR(H)] and analogs [FR(A)] based on evolutionary considerations, and the overlap between assessment categories was classified as CM/FR(H) and FR(A)/NF. CASP5 domains are illustrated in Figure 1. Examples of nontrivial links between CASP5 target domains and existing structures that support our classifications are provided.
doi:10.1002/prot.10555
PMCID: PMC2656935  PMID: 14579323
4.  Adding Some SPICE to DAS 
Bioinformatics (Oxford, England)  2005;21(Suppl 2):ii40-ii41.
Summary
The distributed annotation system (DAS) defines a communication protocol used to exchange biological annotations. It is motivated by the idea that annotations should not be provided by single centralized databases but instead be spread over multiple sites. Data distribution, performed by DAS servers, is separated from visualization, which is carried out by DAS clients. The original DAS protocol was designed to serve annotation of genomic sequences. We have extended the protocol to be applicable to macromolecular structures. Here we present SPICE, a new DAS client that can be used to visualize protein sequence and structure annotations.
Availability
http://www.efamily.org.uk/software/dasclients/spice/
doi:10.1093/bioinformatics/bti1106
PMCID: PMC2656757  PMID: 16204122
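As an illustration of the client/server split described in entry 4, the following is a minimal sketch of how a DAS client might build a features request for one sequence segment. It assumes the DAS 1.x URL pattern (server prefix, "das", data source name, command, and a "segment" query parameter). The server address and source name are hypothetical placeholders, not real SPICE or efamily endpoints, and the sketch only constructs the URL without contacting any server.

```python
from urllib.parse import urlencode


def das_features_url(server, source, segment_id, start, stop):
    """Build a DAS 1.x 'features' request URL for one segment.

    `server` and `source` are supplied by the caller; the values used
    below are hypothetical and exist only for illustration.
    """
    # DAS addresses a region as "<reference>:<start>,<stop>".
    # Keep ':' and ',' unescaped so the segment stays readable.
    query = urlencode({"segment": f"{segment_id}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


# Hypothetical server and data source name.
url = das_features_url("http://das.example.org", "demo_source", "1", 1000, 2000)
print(url)
```

A real client would then fetch this URL and parse the XML response; because every server answers the same kind of request, one client can combine annotations from several independent sites.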
5.  A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis 
Nature biotechnology  2008;26(7):779-785.
DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation.
doi:10.1038/nbt1414
PMCID: PMC2644410  PMID: 18612301
6.  Data growth and its impact on the SCOP database: new developments 
Nucleic Acids Research  2007;36(Database issue):D419-D425.
The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
doi:10.1093/nar/gkm993
PMCID: PMC2238974  PMID: 18000004
7.  Large-Scale Discovery of Promoter Motifs in Drosophila melanogaster 
A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.
Author Summary
In contrast to the genomic sequences that encode proteins, little is known about the regulatory elements that instruct the cell as to when and where a given gene should be active. Regulatory elements are thought to consist of clusters of short DNA words (motifs), each of which acts as a binding site for a sequence-specific DNA-binding protein. Thus, building a comprehensive dictionary of such motifs is an important step towards a broader understanding of gene regulation. Using the recently published NestedMICA method for detecting overrepresented motifs in a set of sequences, we build a dictionary of 120 motifs from regulatory sequences in the fruitfly genome, 87 of which are novel. Analysis of positional biases, conservation across species, and association with specific patterns of gene expression in fruitfly embryos suggests that the great majority of these newly discovered motifs represent functional regulatory elements. In addition to providing an initial motif dictionary for one of the most intensively studied model organisms, this work provides an analytical framework for the comprehensive discovery of regulatory motifs in complex animal genomes.
doi:10.1371/journal.pcbi.0030007
PMCID: PMC1779301  PMID: 17238282
8.  SISYPHUS—structural alignments for proteins with non-trivial relationships 
Nucleic Acids Research  2006;35(Database issue):D253-D259.
With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment-swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified during expert analysis, but their automatic identification using the existing computational tools still remains difficult or impossible. Such non-trivial cases of protein relationships are known to pose a problem to multiple alignment algorithms and to impede comparative modeling studies. They support a new emerging concept of an evolutionarily changeable protein fold, which creates practical difficulties for the hierarchical classifications of protein structures. To facilitate the understanding of, and to provide a comprehensive annotation of, proteins with such non-trivial structural relationships we have created SISYPHUS ([Σισυϕος], in Greek crafty), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions that range from oligomeric biological units, or individual domains, to fragments of different size. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from .
doi:10.1093/nar/gkl746
PMCID: PMC1635320  PMID: 17068077
9.  NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence 
Nucleic Acids Research  2005;33(5):1445-1453.
NestedMICA is a new, scalable, pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs which were good predictors for all five abundant classes of annotated binding sites.
doi:10.1093/nar/gki282
PMCID: PMC1064142  PMID: 15760844
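To make the idea of a probabilistic motif model in entry 9 concrete, here is a minimal sketch of how a position weight matrix can score candidate sites against a background. This is not the NestedMICA inference procedure (no Nested Sampling and no mixture fitting is performed); it only shows how an already-known motif assigns a log-odds score to each window of a sequence. The three-column motif, the uniform background, and the function names are all hypothetical.

```python
import math

# Hypothetical 3-column motif: per-position probabilities for A, C, G, T.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def log_odds(site):
    """Log2 ratio of motif probability to background probability for one site."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(MOTIF, site))


def best_hit(sequence):
    """Return (score, start) for the highest-scoring window in the sequence."""
    width = len(MOTIF)
    windows = [
        (log_odds(sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)


score, start = best_hit("TTAGCTT")
print(f"best window starts at {start} with log-odds {score:.3f}")
```

A positive score means the window is better explained by the motif than by the background; motif discovery methods work in the opposite direction, inferring the matrix itself from unlabelled sequences.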
10.  SCOP database in 2004: refinements integrate structure and sequence family data 
Nucleic Acids Research  2004;32(Database issue):D226-D229.
The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. Protein domains in SCOP are hierarchically classified into families, superfamilies, folds and classes. The continual accumulation of sequence and structural data allows more rigorous analysis and provides important information for understanding the protein world and its evolutionary repertoire. SCOP participates in a project that aims to rationalize and integrate the data on proteins held in several sequence and structure databases. As part of this project, starting with release 1.63, we have initiated a refinement of the SCOP classification, which introduces a number of changes mostly at the levels below superfamily. The pending SCOP reclassification will be carried out gradually through a number of future releases. In addition to the expanded set of static links to external resources, available at the level of domain entries, we have started modernization of the interface capabilities of SCOP allowing more dynamic links with other databases. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
doi:10.1093/nar/gkh039
PMCID: PMC308773  PMID: 14681400
11.  SCOP database in 2002: refinements accommodate structural genomics 
Nucleic Acids Research  2002;30(1):264-267.
The SCOP (Structural Classification of Proteins) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. Protein domains in SCOP are grouped into species and hierarchically classified into families, superfamilies, folds and classes. Recently, we introduced a new set of features with the aim of standardizing access to the database, and providing a solid basis to manage the increasing number of experimental structures expected from structural genomics projects. These features include: a new set of identifiers, which uniquely identify each entry in the hierarchy; a compact representation of protein domain classification; a new set of parseable files, which fully describe all domains in SCOP and the hierarchy itself. These new features are reflected in the ASTRAL compendium. The SCOP search engine has also been updated, and a set of links to external resources added at the level of domain entries. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
PMCID: PMC99154  PMID: 11752311
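The hierarchy described in entry 11 (class, fold, superfamily, family) lends itself to a compact dotted string per domain. The sketch below assumes the dotted form used by SCOP's concise classification strings, for example "a.1.1.2", and shows how such a string could be split into levels and compared. The type and function names are hypothetical and are not part of any SCOP distribution.

```python
from typing import NamedTuple


class Classification(NamedTuple):
    """One domain's position in the hierarchy (hypothetical helper type)."""
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs):
    """Split a dotted string such as 'a.1.1.2' into its four levels."""
    cls, fold, superfamily, family = sccs.split(".")
    return Classification(cls, int(fold), int(superfamily), int(family))


def same_superfamily(a, b):
    """True when two domains share class, fold and superfamily."""
    return parse_sccs(a)[:3] == parse_sccs(b)[:3]


print(parse_sccs("a.1.1.2"))
print(same_superfamily("a.1.1.2", "a.1.1.3"))
```

Because each level is nested inside the one before it, comparing a prefix of the string is enough to test whether two domains fall in the same fold or superfamily.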
12.  SCOP: a Structural Classification of Proteins database 
Nucleic Acids Research  2000;28(1):257-259.
The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the relationships of known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and distant evolutionary relationships; the third, fold, describes geometrical relationships. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is a feature that is unique to this database so far. The sequences of proteins in SCOP provide the basis of the ASTRAL sequence libraries that can be used as a source of data to calibrate sequence search algorithms and for the generation of statistics on, or selections of, protein structures. Links can be made from SCOP to PDB-ISL: a library containing sequences homologous to proteins of known structure. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. The database and its associated files are freely accessible from a number of WWW sites mirrored from URL http://scop.mrc-lmb.cam.ac.uk/scop/
PMCID: PMC102479  PMID: 10592240
