Results 1-25 (33)
 

1.  Dfam: a database of repetitive DNA based on profile hidden Markov models 
Nucleic Acids Research  2012;41(D1):D70-D82.
We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
doi:10.1093/nar/gks1265
PMCID: PMC3531169  PMID: 23203985
3.  Making your database available through Wikipedia: the pros and cons 
Nucleic Acids Research  2011;40(D1):D9-D12.
Wikipedia, the online encyclopedia, is the most famous wiki in use today. It contains over 3.7 million pages of content, with many pages written on scientific subject matters that include peer-reviewed citations yet are written in an accessible manner and generally reflect the consensus opinion of the community. In this, the 19th Annual Database Issue of Nucleic Acids Research, there are 11 articles that describe the use of a wiki in relation to a biological database. In this commentary, we discuss how biological databases can be integrated with Wikipedia, thereby utilising the pre-existing infrastructure, tools and above all, large community of authors (or Wikipedians). The limitations to the content that can be included in Wikipedia are highlighted, with examples drawn from articles found in this issue and other wiki-based resources, indicating why other wiki solutions are necessary. We discuss the merits of using open wikis, like Wikipedia, versus other models, with particular reference to potential vandalism. Finally, we raise the question about the future role of dedicated database biocurators in the context of the thousands of crowdsourced, community annotations that are now being stored in wikis.
doi:10.1093/nar/gkr1195
PMCID: PMC3245093  PMID: 22144683
4.  The Pfam protein families database 
Nucleic Acids Research  2011;40(D1):D290-D301.
Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the ‘sunburst’ representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
doi:10.1093/nar/gkr1065
PMCID: PMC3245129  PMID: 22127870
5.  InterPro in 2011: new developments in the family and domain prediction database 
Nucleic Acids Research  2011;40(D1):D306-D312.
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
doi:10.1093/nar/gkr948
PMCID: PMC3245097  PMID: 22096229
6.  The crystal structure of a bacterial Sufu-like protein defines a novel group of bacterial proteins that are similar to the N-terminal domain of human Sufu 
Sufu (Suppressor of Fused), a two-domain protein, plays a critical role in regulating Hedgehog signaling and is conserved from flies to humans. A few bacterial Sufu-like proteins have previously been identified based on sequence similarity to the N-terminal domain of eukaryotic Sufu proteins, but none have been structurally or biochemically characterized and their function in bacteria is unknown. We have determined the crystal structure of a more distantly related Sufu-like homolog, NGO1391 from Neisseria gonorrhoeae, at 1.4 Å resolution, which provides the first biophysical characterization of a bacterial Sufu-like protein. The structure revealed a striking similarity to the N-terminal domain of human Sufu (r.m.s.d. of 2.6 Å over 93% of the NGO1391 protein), despite an extremely low sequence identity of ∼15%. Subsequent sequence analysis revealed that NGO1391 defines a new subset of smaller, Sufu-like proteins that are present in ∼200 bacterial species and has resulted in expansion of the SUFU (PF05076) family in Pfam.
doi:10.1002/pro.497
PMCID: PMC3005784  PMID: 20836087
Neisseria gonorrhoeae; NGO1391; UniProt Q5F6Z8; Pfam PF05076; suppressor of fused; sufu-like; structural genomics
7.  HMMER web server: interactive sequence similarity searching 
Nucleic Acids Research  2011;39(Web Server issue):W29-W37.
HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.
doi:10.1093/nar/gkr367
PMCID: PMC3125773  PMID: 21593126
8.  Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation 
PLoS ONE  2011;6(4):e18910.
The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership threshold (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT are also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.
doi:10.1371/journal.pone.0018910
PMCID: PMC3083393  PMID: 21556138
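The grouping idea in the Representative Proteomes abstract above (proteomes placed in the same group when their co-membership in UniRef50 clusters reaches a threshold) can be illustrated with a small sketch. The abstract does not give the exact co-membership formula or the clustering procedure, so both are assumptions here: co-membership is taken as the shared clusters divided by the size of the smaller proteome's cluster set, and groups are formed by single-linkage merging. The function names and the toy cluster IDs are hypothetical.

```python
from itertools import combinations


def co_membership(clusters_a, clusters_b):
    """Assumed definition: shared clusters / clusters of the smaller proteome."""
    if not clusters_a or not clusters_b:
        return 0.0
    shared = len(clusters_a & clusters_b)
    return shared / min(len(clusters_a), len(clusters_b))


def group_proteomes(clusters_by_proteome, cmt):
    """Single-linkage grouping (an assumption) at co-membership threshold cmt."""
    parent = {name: name for name in clusters_by_proteome}

    def find(name):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(clusters_by_proteome, 2):
        score = co_membership(clusters_by_proteome[a], clusters_by_proteome[b])
        if score >= cmt:
            parent[find(a)] = find(b)

    groups = {}
    for name in clusters_by_proteome:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy data: proteome name -> set of (made-up) UniRef50 cluster IDs.
toy = {
    "A": {"c1", "c2", "c3", "c4"},
    "B": {"c1", "c2", "c3", "c5"},
    "C": {"c9", "c10"},
}
print(group_proteomes(toy, 0.55))
print(group_proteomes(toy, 0.90))
```

In this toy setup, lowering the threshold merges more proteomes into each group and raising it splits them apart, which mirrors the coarser or finer granularity the abstract describes for the different CMT levels.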
9.  Rfam: Wikipedia, clans and the “decimal” release 
Nucleic Acids Research  2010;39(Database issue):D141-D145.
The Rfam database aims to catalogue non-coding RNAs through the use of sequence alignments and statistical profile models known as covariance models. In this contribution, we discuss the pros and cons of using the online encyclopedia, Wikipedia, as a source of community‐derived annotation. We discuss the addition of groupings of related RNA families into clans and new developments to the website. Rfam is available on the Web at http://rfam.sanger.ac.uk.
doi:10.1093/nar/gkq1129
PMCID: PMC3013711  PMID: 21062808
11.  Cytochrome b5 null mouse: a new model for studying inherited skin disorders and the role of unsaturated fatty acids in normal homeostasis 
Transgenic Research  2010;20(3):491-502.
Microsomal cytochrome b5 is a ubiquitous, 15.2 kDa haemoprotein implicated in a number of cellular processes such as fatty acid desaturation, drug metabolism, steroid hormone biosynthesis and methaemoglobin reduction. As a consequence of these functions this protein has been considered essential for life. Most of the ascribed functions of cytochrome b5, however, stem from in vitro studies and for this reason we have carried out a germline deletion of this enzyme. We have unexpectedly found that cytochrome b5 null mice were viable and fertile, with pups being born at expected Mendelian ratios. However, a number of intriguing phenotypes were identified, including altered drug metabolism, methaemoglobinemia and disrupted steroid hormone homeostasis. In addition to these previously identified roles for this protein, cytochrome b5 null mice displayed skin defects closely resembling those observed in autosomal recessive congenital ichthyosis and retardation of neonatal development, indicating that this protein, possibly as a consequence of its role in the de novo biosynthesis of unsaturated fatty acids, plays a central role in skin development and neonatal nutrition. Results from fatty acid profile analysis of several tissues suggest that cytochrome b5 plays a role controlling saturated/unsaturated homeostasis. These data demonstrate that regional concentrations of unsaturated fatty acids are controlled by endogenous metabolic pathways and not by diet alone.
Electronic supplementary material
The online version of this article (doi:10.1007/s11248-010-9426-1) contains supplementary material, which is available to authorized users.
doi:10.1007/s11248-010-9426-1
PMCID: PMC3090575  PMID: 20676935
Cytochrome b5; Ichthyosis; Methaemoglobinemia; Nutrition; Skin; Unsaturated fatty acids
12.  DUFs: families in search of function 
Domains of unknown function (DUFs) are a large set of uncharacterized protein families that structural genomics is helping biologists to understand functionally.
Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.
doi:10.1107/S1744309110001685
PMCID: PMC2954198  PMID: 20944204
structural genomics; domain of unknown function (DUF); uncharacterized protein family (UPF); Pfam
13.  The structure of BVU2987 from Bacteroides vulgatus reveals a superfamily of bacterial periplasmic proteins with possible inhibitory function 
The crystal structure of the BVU2987 gene product from B. vulgatus (UniProt A6L4L1) reveals that members of the new Pfam family PF11396 (domain of unknown function; DUF2874) are similar to β-lactamase inhibitor protein and YpmB.
Proteins that contain the DUF2874 domain constitute a new Pfam family PF11396. Members of this family have predominantly been identified in microbes found in the human gut and oral cavity. The crystal structure of one member of this family, BVU2987 from Bacteroides vulgatus, has been determined, revealing a β-lactamase inhibitor protein-like structure with a tandem repeat of domains. Sequence analysis and structural comparisons reveal that BVU2987 and other DUF2874 proteins are related to β-lactamase inhibitor protein, PepSY and SmpA_OmlA proteins and hence are likely to function as inhibitory proteins.
doi:10.1107/S1744309109046788
PMCID: PMC2954215  PMID: 20944221
BVU2987; DUF2874; PF11396; human gut microbiome; β-lactamase inhibitor protein-like fold; putative inhibitor proteins
14.  Bacterial Pleckstrin Homology Domains: A Prokaryotic Origin for the PH Domain 
Journal of Molecular Biology  2010;396(1):31-46.
Pleckstrin homology (PH) domains have been identified only in eukaryotic proteins to date. We have determined crystal structures for three members of an uncharacterized protein family (Pfam PF08000), which provide compelling evidence for the existence of PH-like domains in bacteria (PHb). The first two structures contain a single PHb domain that forms a dome-shaped, oligomeric ring with C5 symmetry. The third structure has an additional helical hairpin attached at the C-terminus and forms a similar but much larger ring with C12 symmetry. Thus, both molecular assemblies exhibit rare, higher-order, cyclic symmetry but preserve a similar arrangement of their PHb domains, which gives rise to a conserved hydrophilic surface at the intersection of the β-strands of adjacent protomers that likely mediates protein–protein interactions. As a result of these structures, additional families of PHb domains were identified, suggesting that PH domains are much more widespread than originally anticipated. Thus, rather than being a eukaryotic innovation, the PH domain superfamily appears to have existed before prokaryotes and eukaryotes diverged.
doi:10.1016/j.jmb.2009.11.006
PMCID: PMC2817789  PMID: 19913036
PH, Pleckstrin homology; PHb, bacterial PH domain; PTB, phosphotyrosine binding; VPS36, vacuolar protein sorting protein 36; DUF1696, domain of unknown function family 1696; JCSG, Joint Center for Structural Genomics; MAD, multiwavelength anomalous diffraction; PEG, polyethylene glycol; asu, asymmetric unit; PDB, Protein Data Bank; PIPE, Polymerase Incomplete Primer Extension; TEV, tobacco etch virus; TCEP, tris(2-carboxyethyl)phosphine–HCl; SSRL, Stanford Synchrotron Radiation Lightsource; ALS, Advanced Light Source; Pleckstrin homology (PH) domain; bacterial PH domain (PHb); higher-order symmetry; protein assembly; protein evolution
15.  The structure of pyogenecin immunity protein, a novel bacteriocin-like immunity protein from Streptococcus pyogenes 
Background
Many Gram-positive lactic acid bacteria (LAB) produce anti-bacterial peptides and small proteins called bacteriocins, which enable them to compete against other bacteria in the environment. These peptides fall structurally into three different classes, I, II, III, with class IIa being pediocin-like single entities and class IIb being two-peptide bacteriocins. Self-protective cognate immunity proteins are usually co-transcribed with these toxins. Several examples of cognates for IIa have already been solved structurally. Streptococcus pyogenes, closely related to LAB, is one of the most common human pathogens, so knowledge of how it competes against other LAB species is likely to prove invaluable.
Results
We have solved the crystal structure of the gene-product of locus Spy_2152 from S. pyogenes, (PDB:2fu2), and found it to comprise an anti-parallel four-helix bundle that is structurally similar to other bacteriocin immunity proteins. Sequence analyses indicate this protein to be a possible immunity protein protective against class IIa or IIb bacteriocins. However, given that S. pyogenes appears to lack any IIa pediocin-like proteins but does possess class IIb bacteriocins, we suggest this protein confers immunity to IIb-like peptides.
Conclusions
Combined structural, genomic and proteomic analyses have allowed the identification and in silico characterization of a new putative immunity protein from S. pyogenes, possibly the first structure of an immunity protein protective against potential class IIb two-peptide bacteriocins. We have named the two pairs of putative bacteriocins found in S. pyogenes pyogenecin 1, 2, 3 and 4.
doi:10.1186/1472-6807-9-75
PMCID: PMC2806384  PMID: 20017931
16.  The Pfam protein families database 
Nucleic Acids Research  2009;38(Database issue):D211-D222.
Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is ∼100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).
doi:10.1093/nar/gkp985
PMCID: PMC2808889  PMID: 19920124
17.  DASMI: exchanging, annotating and assessing molecular interaction data 
Bioinformatics  2009;25(10):1321-1328.
Motivation: Ever increasing amounts of biological interaction data are being accumulated worldwide, but they are currently not readily accessible to the biologist at a single site. New techniques are required for retrieving, sharing and presenting data spread over the Internet.
Results: We introduce the DASMI system for the dynamic exchange, annotation and assessment of molecular interaction data. DASMI is based on the widely used Distributed Annotation System (DAS) and consists of a data exchange specification, web servers for providing the interaction data and clients for data integration and visualization. The decentralized architecture of DASMI affords the online retrieval of the most recent data from distributed sources and databases. DASMI can also be extended easily by adding new data sources and clients. We describe all DASMI components and demonstrate their use for protein and domain interactions.
Availability: The DASMI tools are available at http://www.dasmi.de/ and http://ipfam.sanger.ac.uk/graph. The DAS registry and the DAS 1.53E specification are found at http://www.dasregistry.org/.
Contact: mario.albrecht@mpi-inf.mpg.de
Supplementary information: Supplementary data and all figures in color are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp142
PMCID: PMC2677739  PMID: 19420069
18.  SCOOP: a simple method for identification of novel protein superfamily relationships 
Bioinformatics (Oxford, England)  2007;23(7):809-814.
Motivation
Profile searches of sequence databases are a sensitive way to detect sequence relationships. Sophisticated profile-profile comparison algorithms that have been recently introduced increase search sensitivity even further.
Results
In this article, a simpler approach than profile-profile comparison is presented that has a comparable performance to state-of-the-art tools such as COMPASS, HHsearch and PRC. This approach is called SCOOP (Simple Comparison Of Outputs Program), and is shown to find known relationships between families in the Pfam database as well as detect novel distant relationships between families. Several novel discoveries are presented including the discovery that a domain of unknown function (DUF283) found in Dicer proteins is related to double-stranded RNA-binding domains.
Availability
SCOOP is freely available under a GNU GPL license from http://www.sanger.ac.uk/Users/agb/SCOOP/
doi:10.1093/bioinformatics/btm034
PMCID: PMC2603044  PMID: 17277330
19.  Unsaturated fatty acid regulation of cytochrome P450 expression via a CAR-dependent pathway 
Biochemical Journal  2008;417(Pt 1):43-54.
The liver is responsible for key metabolic functions, including control of normal homoeostasis in response to diet and xenobiotic metabolism/detoxification. We have shown previously that inactivation of the hepatic cytochrome P450 system through conditional deletion of POR (P450 oxidoreductase) induces hepatic steatosis, liver growth and P450 expression. We have exploited a new conditional model of POR deletion to investigate the mechanism underlying these changes. We demonstrate that P450 induction, liver growth and hepatic triacylglycerol (triglyceride) homoeostasis are intimately linked and provide evidence that the observed phenotypes result from hepatic accumulation of unsaturated fatty acids, which mediate these phenotypes by activation of the nuclear receptor CAR (constitutive androstane receptor) and, to a lesser degree, PXR (pregnane X receptor). To our knowledge this is the first direct evidence that P450s play a major role in controlling unsaturated fatty acid homoeostasis via CAR. The regulation of P450s involved in xenobiotic metabolism by this mechanism has potentially significant implications for individual responses to drugs and environmental chemicals.
doi:10.1042/BJ20080740
PMCID: PMC2605957  PMID: 18778245
constitutive androstane receptor (CAR); cytochrome P450; linoleic acid; P450 oxidoreductase (POR); pregnane X receptor (PXR); steatosis; AhR, aryl hydrocarbon receptor; CAR, constitutive androstane receptor; Cpt1a, carnitine palmitoyltransferase 1a; CYP, cytochrome P450; FAS, fatty acid synthase; HRN, hepatic reductase-null; i.p., intraperitoneal; 3MC, 3-methylcholanthrene; P450, cytochrome P450; PB, phenobarbital; POR, P450 oxidoreductase; PPAR, peroxisome-proliferator-activated receptor; PUFA, polyunsaturated fatty acid; PXR, pregnane X receptor; TCPOBOP, 1,4-bis-[2-(3,5-dichloropyridyloxy)]benzene
20.  Defining the in Vivo Role for Cytochrome b5 in Cytochrome P450 Function through the Conditional Hepatic Deletion of Microsomal Cytochrome b5 
The Journal of Biological Chemistry  2008;283(46):31385-31393.
In vitro, cytochrome b5 modulates the rate of cytochrome P450-dependent mono-oxygenation reactions. However, the role of this enzyme in determining drug pharmacokinetics in vivo and the consequential effects on drug absorption, distribution, metabolism, excretion, and toxicity are unclear. In order to resolve this issue, we have carried out the conditional deletion of microsomal cytochrome b5 in the liver to create the hepatic microsomal cytochrome b5 null mouse. These mice develop and breed normally and have no overt phenotype. In vitro studies using a range of substrates for different P450 enzymes showed that in hepatic microsomal cytochrome b5 null mice, NADH-mediated metabolism was essentially abolished for most substrates, and the NADPH-dependent metabolism of many substrates was reduced by 50–90%. This reduction in metabolism was also reflected in the in vivo elimination profiles of several drugs, including midazolam, metoprolol, and tolbutamide. In the case of chlorzoxazone, elimination was essentially unchanged. For some drugs, the pharmacokinetics were also markedly altered; for example, when administered orally, the maximum plasma concentration for midazolam was increased by 2.5-fold, and the clearance decreased by 3.6-fold in hepatic microsomal cytochrome b5 null mice. These data indicate that microsomal cytochrome b5 can play a major role in the in vivo metabolism of certain drugs and chemicals but in a P450- and substrate-dependent manner.
doi:10.1074/jbc.M803496200
PMCID: PMC2581580  PMID: 18805792
21.  Phospholipid scramblases and Tubby-like proteins belong to a new superfamily of membrane tethered transcription factors 
Bioinformatics  2008;25(2):159-162.
Motivation: Phospholipid scramblases (PLSCRs) constitute a family of cytoplasmic membrane-associated proteins that were identified based upon their capacity to mediate a Ca2+-dependent bidirectional movement of phospholipids across membrane bilayers, thereby collapsing the normally asymmetric distribution of such lipids in cell membranes. The exact function and mechanism(s) of these proteins nevertheless remains obscure: data from several laboratories now suggest that in addition to their putative role in mediating transbilayer flip/flop of membrane lipids, the PLSCRs may also function to regulate diverse processes including signaling, apoptosis, cell proliferation and transcription. A major impediment to deducing the molecular details underlying the seemingly disparate biology of these proteins is the current absence of any representative molecular structures to provide guidance to the experimental investigation of their function.
Results: Here, we show that the enigmatic PLSCR family of proteins is directly related to another family of cellular proteins with a known structure. The Arabidopsis protein At5g01750 from the DUF567 family was solved by X-ray crystallography and provides the first structural model for this family. This model identifies that the presumed C-terminal transmembrane helix is buried within the core of the PLSCR structure, suggesting that palmitoylation may represent the principal membrane anchorage for these proteins. The fold of the PLSCR family is also shared by Tubby-like proteins. A search of the PDB with the HHpred server suggests a common evolutionary ancestry. Common functional features also suggest that tubby and PLSCR share a functional origin as membrane tethered transcription factors with capacity to modulate phosphoinositide-based signaling.
Contact: agb@sanger.ac.uk
doi:10.1093/bioinformatics/btn595
PMCID: PMC2639001  PMID: 19010806
22.  Modifier Effects between Regulatory and Protein-Coding Variation 
PLoS Genetics  2008;4(10):e1000244.
Genome-wide associations have shown a lot of promise in dissecting the genetics of complex traits in humans with single variants, yet a large fraction of the genetic effects is still unaccounted for. Analyzing genetic interactions between variants (epistasis) is one of the potential ways forward. We investigated the abundance and functional impact of a specific type of epistasis, namely the interaction between regulatory and protein-coding variants. Using genotype and gene expression data from the 210 unrelated individuals of the original four HapMap populations, we have explored the combined effects of regulatory and protein-coding single nucleotide polymorphisms (SNPs). We predict that about 18% (1,502 out of 8,233 nsSNPs) of protein-coding variants are differentially expressed among individuals and demonstrate that regulatory variants can modify the functional effect of a coding variant in cis. Furthermore, we show that such interactions in cis can affect the expression of downstream targets of the gene containing the protein-coding SNP. In this way, a cis interaction between regulatory and protein-coding variants has a trans impact on gene expression. Given the abundance of both types of variants in human populations, we propose that joint consideration of regulatory and protein-coding variants may reveal additional genetic effects underlying complex traits and disease and may shed light on causes of differential penetrance of known disease variants.
Author Summary
The ultimate goal of genome-wide association studies (GWAS) is to explain the proportion of variation in a phenotypic trait that can be attributed to genetic factors. The past two years have seen a plethora of successes in this field, yet, for most traits, a large fraction of variation remains unexplained. Epistasis, or interaction between genetic variants, is a largely under-explored factor, which may shed some light in this area. We use the HapMap populations to investigate interactions between regulatory and protein-coding variants and their impact on gene expression. We show that if a specific protein-coding variant has a functional impact, this can be modified by a co-segregating regulatory variant (cis interaction). Furthermore, we demonstrate that such modification effects between variants at one locus may affect the expression of other genes in the cell in a trans manner. The aim of this article is to present a framework through which variation can be considered in the context of GWAS. Viewing variation from this underappreciated angle may, in some cases, provide an explanation for differential penetrance of complex disease traits, but also for non-replication of GWAS results that may arise as a consequence of such interactions.
doi:10.1371/journal.pgen.1000244
PMCID: PMC2570624  PMID: 18974877
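The "about 18%" figure quoted in the abstract above follows directly from the two counts it reports (1,502 out of 8,233 nsSNPs). A minimal check of that arithmetic, with variable and function names chosen here purely for illustration:

```python
# Counts taken from the abstract; the names are illustrative only.
differentially_expressed_nssnps = 1502
total_nssnps = 8233


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


share = percentage(differentially_expressed_nssnps, total_nssnps)
print(f"{share:.1f}% of nsSNPs are predicted to be differentially expressed")
```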
23.  Rfam: updates to the RNA families database 
Nucleic Acids Research  2008;37(Database issue):D136-D140.
Rfam is a collection of RNA sequence families, represented by multiple sequence alignments and covariance models (CMs). The primary aim of Rfam is to annotate new members of known RNA families on nucleotide sequences, particularly complete genomes, using sensitive BLAST filters in combination with CMs. A minority of families with a very broad taxonomic range (e.g. tRNA and rRNA) provide the majority of the sequence annotations, whilst the majority of Rfam families (e.g. snoRNAs and miRNAs) have a limited taxonomic range and provide a limited number of annotations. Recent improvements to the website, methodologies and data used by Rfam are discussed. Rfam is freely available on the Web at http://rfam.sanger.ac.uk/ and http://rfam.janelia.org/.
doi:10.1093/nar/gkn766
PMCID: PMC2686503  PMID: 18953034
24.  InterPro: the integrative protein signature database 
Nucleic Acids Research  2008;37(Database issue):D211-D215.
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein–protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
doi:10.1093/nar/gkn785
PMCID: PMC2686546  PMID: 18940856
25.  Integrating biological data – the Distributed Annotation System 
BMC Bioinformatics  2008;9(Suppl 8):S3.
Background
The Distributed Annotation System (DAS) is a widely adopted protocol for dynamically integrating a wide range of biological data from geographically diverse sources. DAS continues to expand its applicability and evolve in response to new challenges facing integrative bioinformatics.
Results
Here we describe the various infrastructure components of DAS and present a new extended version of the DAS specification. Version 1.53E incorporates several recent developments, including its extension to serve new data types and an ontology for protein features.
Conclusion
Our extensions to the DAS protocol have facilitated the integration of new data types, and our improvements to the existing DAS infrastructure have addressed recent challenges. The steadily increasing number of available data sources demonstrates further adoption of the DAS protocol.
doi:10.1186/1471-2105-9-S8-S3
PMCID: PMC2500094  PMID: 18673527
