Tau is an intrinsically disordered protein (IDP) whose primary physiological role is to stabilize microtubules in neuronal axons at all stages of development. In Alzheimer's and other tauopathies, tau forms intracellular insoluble amyloid aggregates known as neurofibrillary tangles, a process that appears in many cases to be preceded by hyperphosphorylation of tau monomers. Understanding the shift in conformational bias induced by hyperphosphorylation is key to elucidating the structural factors that drive tau pathology; however, as an IDP, tau is not amenable to conventional structural characterization. In this work, we employ a straightforward technique based on Time-Resolved ElectroSpray Ionization Mass Spectrometry (TRESI-MS) and Hydrogen/Deuterium Exchange (HDX) to provide a detailed picture of residual structure in tau, and the shifts in conformational bias induced by hyperphosphorylation. By comparing the native and hyperphosphorylated ensembles, we are able to define specific conformational biases that can easily be rationalized as enhancing amyloidogenic propensity. Representative structures for the native and hyperphosphorylated tau ensembles were generated by refinement of a broad sample of conformations generated by low-computational complexity modeling, based on agreement with the TRESI-HDX profiles.
Time series data can provide valuable insight into the complexity of biological reactions. Such information can be obtained by mass-spectrometry-based approaches that measure pre-steady-state kinetics. These methods are based on a mixing device that rapidly mixes the reactants prior to the on-line mass measurement of the transient intermediate steps. Here, we describe an improved continuous-flow mixing apparatus for real-time electrospray mass spectrometry measurements. Our setup was designed to minimize metal–solution interfaces and provide a sheath flow of nitrogen gas for generating stable and continuous spray that consequently enhances the signal-to-noise ratio. Moreover, the device was planned to enable easy mounting onto a mass spectrometer replacing the commercial electrospray ionization source. We demonstrate the performance of our apparatus by monitoring the unfolding reaction of cytochrome C, yielding improved signal-to-noise ratio and reduced experimental repeat errors.
kinetics; mass spectrometry; on-line measurements; proteins; rapid mixing device; time-resolved experiments; structure characterization of biomolecules
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.
The evolutionary importance of hybridization and introgression has long been debated1. We used genomic tools to investigate introgression in Heliconius, a rapidly radiating genus of neotropical butterflies widely used in studies of ecology, behaviour, mimicry and speciation2-5. We sequenced the genome of Heliconius melpomene and compared it with other taxa to investigate chromosomal evolution in Lepidoptera and gene flow among multiple Heliconius species and races. Among 12,657 predicted genes for Heliconius, biologically important expansions of families of chemosensory and Hox genes are particularly noteworthy. Chromosomal organisation has remained broadly conserved since the Cretaceous, when butterflies split from the silkmoth lineage. Using genomic resequencing, we show hybrid exchange of genes between three co-mimics, H. melpomene, H. timareta, and H. elevatus, especially at two genomic regions that control mimicry pattern. Closely related Heliconius species clearly exchange protective colour pattern genes promiscuously, implying a major role for hybridization in adaptive radiation.
Bone biomechanical performance is a complex trait or, more properly, an ensemble of complex traits. Biomechanical performance incorporates flexibility under loading, yield and failure load, and energy to failure; all are important measures of bone function. To date, the vast majority of work has focused on yield and failure load and its surrogate, bone mineral density. We performed a reciprocal intercross of the mouse strains HcB-8 and HcB-23 to map and ultimately identify genes that contribute to differences in biomechanical performance. Mechanical testing was performed by 3-point bending of the femora. We measured femoral diaphysis cross-sectional anatomy from photographs of the fracture surfaces. We used beam equations to calculate material level mechanical properties. We performed a principal component (PC) analysis of normalized whole bone phenotypes (17 input traits). We measured distances separating mandibular landmarks from calibrated digital photographs and performed linkage analysis. Experiment-wide α = 0.05 significance thresholds were established by permutation testing. Three quantitative trait loci (QTLs) identified in these studies illustrate the advantages of the comprehensive phenotyping approach. A pleiotropic QTL on chromosome 4 affected multiple whole bone phenotypes with LOD scores as large as 17.5, encompassing size, cross-sectional ellipticity, stiffness, yield and failure load, and bone mineral density. This locus was linked to 3 of the PCs but unlinked to any of the tissue level phenotypes. From this pattern, we infer that the QTL operates by modulating the proliferative response to mechanical loading. On this basis, we successfully predicted that this locus also affects the length of a specific region of the mandible. A pleiotropic locus on chromosome 10 displays opposite effects on failure load and toughness, with LOD scores of 4.5 and 5.5, respectively, so that the allele that increases failure load decreases toughness.
A chromosome 19 QTL for PC2 with an LOD score of 4.8 was not detected with either the whole bone or tissue level phenotypes. We conclude that first, comprehensive, system-oriented phenotyping provides much information that could not be obtained by focusing on bone mineral density alone. Second, mechanical performance includes inherent trade-offs between strength and brittleness. Third, considering the aggregate phenotypic data allows prediction of novel QTLs.
Biomechanics; Bone modeling; Quantitative trait loci; Linkage; Pleiotropy; Principal components
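The permutation-based, experiment-wide threshold mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the single-marker regression LOD, the 0/1/2 genotype coding, the function names `marker_lod` and `permutation_threshold`, and the synthetic data are all assumptions made for illustration. The idea is to shuffle the phenotype, record the genome-wide maximum LOD for each shuffle, and take the (1 − α) quantile of those maxima as the threshold.

```python
import numpy as np


def marker_lod(genotypes, phenotype):
    """LOD score per marker from single-marker linear regression.

    genotypes: (n_individuals, n_markers) array, hypothetical 0/1/2 allele coding
    phenotype: (n_individuals,) array of trait values
    """
    n = phenotype.shape[0]
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    # Squared Pearson correlation between each marker and the trait.
    r2 = (g.T @ y) ** 2 / ((g ** 2).sum(axis=0) * (y ** 2).sum())
    # For simple regression RSS1/RSS0 = 1 - r^2, so LOD = -(n/2) * log10(1 - r^2).
    return -(n / 2.0) * np.log10(1.0 - r2)


def permutation_threshold(genotypes, phenotype, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wide LOD threshold: (1 - alpha) quantile of permutation maxima."""
    rng = np.random.default_rng(seed)
    max_lods = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the phenotype breaks any genotype-trait association
        # while preserving the correlation structure among markers.
        shuffled = rng.permutation(phenotype)
        max_lods[i] = marker_lod(genotypes, shuffled).max()
    return float(np.quantile(max_lods, 1.0 - alpha))


# Synthetic example data (made up; not from the study).
demo_rng = np.random.default_rng(1)
demo_geno = demo_rng.integers(0, 3, size=(200, 50)).astype(float)
demo_pheno = demo_rng.normal(size=200)
demo_threshold = permutation_threshold(demo_geno, demo_pheno, n_perm=200)
print(f"experiment-wide LOD threshold at alpha = 0.05: {demo_threshold:.2f}")
```

Taking the maximum over all markers within each permutation is what makes the threshold experiment-wide rather than per-marker; a real analysis would typically use interval mapping and handle missing genotypes, which this sketch omits.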
VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
The SUPERFAMILY resource provides protein domain assignments at the structural classification of protein (SCOP) superfamily level for over 1400 completely sequenced genomes, over 120 metagenomes and other gene collections such as UniProt. All models and assignments are available to browse and download at http://supfam.org. A new hidden Markov model library based on SCOP 1.75 has been created and a previously ignored class of SCOP, coiled coils, is now included. Our scoring component now uses HMMER3, which is orders of magnitude faster and produces superior results. A cloud-based pipeline was implemented and is publicly available on the Amazon Web Services Elastic Compute Cloud. The SUPERFAMILY reference tree of life has been improved, allowing the user to highlight a chosen superfamily, family or domain architecture on the tree of life. The most significant advance in SUPERFAMILY is that it now contains a domain-based gene ontology (GO) at the superfamily and family levels. A new methodology was developed to ensure a high quality GO annotation. The new methodology is general purpose and has been used to produce domain-based phenotypic ontologies in addition to GO.
DNA-binding domains (DBDs) are essential components of sequence-specific transcription factors (TFs). We have investigated the distribution of all known DBDs in more than 500 completely sequenced genomes from the three major superkingdoms (Bacteria, Archaea and Eukaryota) and documented conserved and specific DBD occurrence in diverse taxonomic lineages. By combining DBD occurrence in different species with taxonomic information, we have developed an automatic method for inferring the origins of DBD families and their specific combinations with other protein families in TFs. We found only three out of 131 (2%) DBD families shared by the three superkingdoms.
Sequence-specific transcription factors (TFs) are important to genetic regulation in all organisms because they recognize and directly bind to regulatory regions on DNA. Here, we survey and summarize the TF resources available. We outline the organisms for which TF annotation is provided, and discuss the criteria and methods used to annotate TFs by different databases. By using genomic TF repertoires from ∼700 genomes across the tree of life, covering Bacteria, Archaea and Eukaryota, we review TF abundance with respect to the number of genes, as well as their structural complexity in diverse lineages. While typical eukaryotic TFs are longer than the average eukaryotic protein, the inverse is true for prokaryotes. Only in eukaryotes does the same family of DNA-binding domain (DBD) occur multiple times within one polypeptide chain. This potentially increases the length and diversity of DNA-recognition sequences by reusing DBDs from the same family. We examined the increase in TF abundance with the number of genes in genomes, using the largest set of prokaryotic and eukaryotic genomes to date. As pointed out before, prokaryotic TFs increase faster than linearly. We further observe a similar relationship in eukaryotic genomes with a slower increase in TFs.
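A "faster than linear" increase of TFs with gene number is commonly summarized as a power law, TFs ≈ a · genes^b with b > 1. The following is a minimal sketch of how such an exponent could be estimated by least squares on log-transformed counts. The function name `fit_power_law`, the synthetic genome data, and the exponent used to generate them are arbitrary assumptions for illustration, not values or methods taken from the survey.

```python
import numpy as np


def fit_power_law(n_genes, n_tfs):
    """Fit n_tfs ~ a * n_genes**b by linear regression on log10 counts.

    Returns (a, b); b > 1 indicates a faster-than-linear increase.
    """
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(np.log10(n_genes), np.log10(n_tfs), deg=1)
    return 10.0 ** intercept, slope


# Synthetic genomes (made up): gene counts and TF counts generated with an
# arbitrary exponent of 1.8 plus multiplicative noise.
rng = np.random.default_rng(0)
genes = rng.integers(500, 10000, size=100).astype(float)
tfs = 1e-4 * genes ** 1.8 * 10.0 ** rng.normal(0.0, 0.05, size=100)

a, b = fit_power_law(genes, tfs)
print(f"fitted prefactor a = {a:.2e}, fitted exponent b = {b:.2f}")
```

Fitting in log-log space turns the power law into a straight line whose slope is the exponent, so separate fits for prokaryotic and eukaryotic genome sets could be compared directly by their slopes.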
FlyTF (http://www.flytf.org) is a database of computationally predicted and/or experimentally verified site-specific transcription factors (TFs) in the fruit fly Drosophila melanogaster. The manual classification of TFs in the initial version of FlyTF that concentrated primarily on the DNA-binding characteristics of the proteins has now been extended to a more fine-grained annotation of both DNA binding and regulatory properties in the new release. Furthermore, experimental evidence from the literature was classified into a defined vocabulary, and in collaboration with FlyBase, translated into Gene Ontology (GO) annotation. While our GO annotations will also be available through FlyBase as they will be incorporated into the genes’ official GO annotation in the future, the entire evidence used for classification including computational predictions and quotes from the literature can be accessed through FlyTF. The FlyTF website now builds upon the InterMine framework, which provides experimental and computational biologists with powerful search and filter functionality, list management tools and access to genomic information associated with the TFs.
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein–protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
DNA-binding domain (DBD) is a database of predicted sequence-specific DNA-binding transcription factors (TFs) for all publicly available proteomes. The proteomes have increased from 150 in the initial version of DBD to over 700 in the current version. All predicted TFs must contain a significant match to a hidden Markov model representing a sequence-specific DNA-binding domain family. Access to TF predictions is provided through http://transcriptionfactor.org, where new search options are now provided such as searching by gene names in model organisms, searching for all proteins in a particular DBD family and specific organism. We illustrate the application of this type of search facility by contrasting trends of DBD family occurrence throughout the tree of life, highlighting the clear partition between eukaryotic and prokaryotic DBD expansions. The website content has been expanded to include dedicated pages for each TF containing domain assignment details, gene names, links to external databases and links to TFs with similar domain arrangements. We compare the increase in number of predicted TFs with proteome size in eukaryotes and prokaryotes. Eukaryotes follow a slower rate of increase in TFs than prokaryotes, which could be due to the presence of splice variants or an increase in combinatorial control.
The Dynesys, a flexible posterior stabilization system that provides an alternative to fusion, is designed to preserve intersegmental kinematics and alleviate loading at the facet joints. Recent biomechanical evidence suggests that the overall range of motion (ROM) with the Dynesys is less than that of the intact spine. The purpose of this investigation was to conduct a comprehensive characterization of the three-dimensional kinematic behaviour of the Dynesys and determine if the length of the Dynesys polymer spacer contributes to differences in the kinematic behaviour at the implanted level. Ten cadaveric lumbar spine segments (L2–L5) were tested by applying a pure moment of ±7.5 Nm in flexion–extension, lateral bending, and axial rotation, with and without a follower preload of 600 N. Test conditions included: (a) intact; (b) injury; (c) injury stabilized with Dynesys at L3–L4 (standard spacer); (d) long spacer (+2 mm); and (e) short spacer (−2 mm). Intervertebral rotations were measured using an optoelectronic camera system. The intersegmental range of motion (ROM), neutral zone (NZ), and three-dimensional helical axis of motion (HAM) were calculated. Statistical significance of changes in ROM, NZ, and HAM was determined using repeated measures analysis of variance (ANOVA) and Student–Newman–Keuls post-hoc analysis with P<0.05. Implantation of the standard length Dynesys significantly reduced ROM compared to the intact and injured specimens, with the least significant changes seen in axial rotation. Injury typically increased the NZ, but implantation of the Dynesys restored the NZ to a magnitude less than that of the intact spine. The Dynesys produced a significant posterior shift in the HAM in flexion–extension and axial rotation. The spacer length had a significant effect on ROM with the long spacer resulting in the largest ROM in all loading directions without a follower preload. The largest differences were in axial rotation.
A 4 mm increase in spacer length led to an average intersegmental motion increase of 30% in axial rotation, 23% in extension, 14% in flexion, and 11% in lateral bending. There were no significant changes in NZ with different spacer lengths. Typically, the short spacer caused a greater shift and a greater change in orientation of the HAM than the long spacer. The long spacer resulted in a ROM and a motion pattern, as represented by the HAM, that was closer to that seen in an intact specimen. The results of this study suggest that the length of the Dynesys spacer altered the segmental position and therefore affected kinematic behaviour.
Biomechanics; Lumbar spine; Non-fusion; Stabilization; Surgical treatment
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver (), and for download by anonymous FTP (). The InterProScan search tool is now also available via a web service at .
The SUPERFAMILY database provides protein domain assignments, at the SCOP ‘superfamily’ level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download from . The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment.
Three cases of human leptospirosis occurred on a small dairy farm at the foot of the Black Mountains in Powys. We describe the clinical course of these three patients and consider the sources of infection and the industrial implications.