Related Articles

1.  Automated nucleic acid chain tracing in real time 
IUCrJ  2014;1(Pt 6):387-392.
A method is presented for the automatic building of nucleotide chains into electron density which is fast enough to be used in interactive model-building software. Likely nucleotides lying in the vicinity of the current view are located and then grown into connected chains in a fraction of a second. When this development is combined with existing tools, assisted manual model building is as simple as or simpler than for proteins.
The crystallographic structure solution of nucleotides and nucleotide complexes is now commonplace. The resulting electron-density maps are often poorer than for proteins, and as a result interpretation in terms of an atomic model can require significant effort, particularly in the case of large structures. While model building can be performed automatically, as with proteins, the process is time-consuming, taking minutes to days depending on the software and the size of the structure. A method is presented for the automatic building of nucleotide chains into electron density which is fast enough to be used in interactive model-building software, with extended chain fragments built around the current view position in a fraction of a second. The speed of the method arises from the determination of the ‘fingerprint’ of the sugar and phosphate groups in terms of conserved high-density and low-density features, coupled with a highly efficient scoring algorithm. Use cases include the rapid evaluation of an initial electron-density map, addition of nucleotide fragments to prebuilt protein structures, and in favourable cases the completion of the structure while automated model-building software is still running. The method has been incorporated into the Coot software package.
PMCID: PMC4224457  PMID: 25485119
nucleic acid chain tracing; Coot
2.  A Probabilistic Fragment-Based Protein Structure Prediction Algorithm 
PLoS ONE  2012;7(7):e38799.
Conformational sampling is one of the bottlenecks in fragment-based protein structure prediction approaches. They generally start with a coarse-grained optimization where mainchain atoms and centroids of side chains are considered, followed by a fine-grained optimization with an all-atom representation of proteins. It is during this coarse-grained phase that fragment-based methods sample intensely the conformational space. If the native-like region is sampled more, the accuracy of the final all-atom predictions may be improved accordingly. In this work we present EdaFold, a new method for fragment-based protein structure prediction based on an Estimation of Distribution Algorithm. Fragment-based approaches build protein models by assembling short fragments from known protein structures. Whereas the probability mass functions over the fragment libraries are uniform in the usual case, we propose an algorithm that learns from previously generated decoys and steers the search toward native-like regions. A comparison with Rosetta AbInitio protocol shows that EdaFold is able to generate models with lower energies and to enhance the percentage of near-native coarse-grained decoys on a benchmark of proteins. The best coarse-grained models produced by both methods were refined into all-atom models and used in molecular replacement. All atom decoys produced out of EdaFold’s decoy set reach high enough accuracy to solve the crystallographic phase problem by molecular replacement for some test proteins. EdaFold showed a higher success rate in molecular replacement when compared to Rosetta. Our study suggests that improving low resolution coarse-grained decoys allows computational methods to avoid subsequent sampling issues during all-atom refinement and to produce better all-atom models. EdaFold can be downloaded from
PMCID: PMC3400640  PMID: 22829868
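The estimation-of-distribution idea in the abstract above (start from uniform probabilities over the fragment library, then learn from earlier decoys) can be sketched roughly as follows. This is a toy illustration, not EdaFold's implementation: `eda_search`, `hamming_energy`, the elite fraction and the smoothing constant are all hypothetical, and the energy function is a stand-in for a real coarse-grained score.

```python
import random


def hamming_energy(target):
    """Toy energy (assumption): number of positions that differ from a target."""
    def energy(decoy):
        return sum(1 for d, t in zip(decoy, target) if d != t)
    return energy


def eda_search(n_positions, n_fragments, energy, n_decoys=200,
               elite_frac=0.2, n_iters=10, smoothing=1.0, seed=0):
    """Sample decoys from per-position fragment distributions and re-estimate
    those distributions from the lowest-energy decoys of each iteration."""
    rng = random.Random(seed)
    # Uniform probability mass function over the fragment library, per position.
    pmf = [[1.0 / n_fragments] * n_fragments for _ in range(n_positions)]
    best = None
    for _ in range(n_iters):
        # Each decoy picks one fragment index per position according to the pmf.
        decoys = [
            [rng.choices(range(n_fragments), weights=pmf[p])[0]
             for p in range(n_positions)]
            for _ in range(n_decoys)
        ]
        decoys.sort(key=energy)
        if best is None or energy(decoys[0]) < energy(best):
            best = decoys[0]
        # Re-estimate the pmf from the elite (lowest-energy) decoys.
        elite = decoys[:max(1, int(elite_frac * n_decoys))]
        for p in range(n_positions):
            counts = [smoothing] * n_fragments  # smoothing keeps every fragment reachable
            for decoy in elite:
                counts[decoy[p]] += 1.0
            total = sum(counts)
            pmf[p] = [c / total for c in counts]
    return best, pmf
```

Calling `eda_search(8, 6, hamming_energy([3, 1, 4, 1, 5, 0, 2, 3]))` returns the best decoy found and the learned distributions; the smoothing term is one simple way to keep the search from collapsing onto a single fragment too early.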
3.  Reconstruction of Protein Backbones from the BriX Collection of Canonical Protein Fragments 
PLoS Computational Biology  2008;4(5):e1000083.
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone similar to the successful rotamer approach in side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was realized through identification of recurrent backbone fragments from a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant amount of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.
Author Summary
Large-scale DNA sequencing efforts produce large amounts of protein sequence data. However, in order to understand the function of a protein, its tertiary three-dimensional structure is required. Despite worldwide efforts in structural biology, experimental protein structures are determined at a significantly slower pace. As a result, computational methods for protein structure prediction receive significant attention. A large part of the structure prediction problem lies in the enormous size of the problem: proteins seem to occur in an infinite variety of shapes. Here, we propose that this huge complexity may be overcome by identifying recurrent protein fragments, which are frequently reused as building blocks to construct proteins that were hitherto thought to be unrelated. The BriX database is the outcome of identifying about 2,000 canonical shapes among 1,261 protein structures. We show any given protein can be reconstructed from this library of building blocks at a very high resolution, suggesting that the modelling of protein backbones may be greatly aided by our database.
PMCID: PMC2367438  PMID: 18483555
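The coverage figure quoted in the BriX abstract (fragments matched within an RMSD of 1 Angstrom) rests on a simple distance measure, sketched below under stated assumptions. The functions `rmsd` and `coverage` are hypothetical names, coordinates are plain (x, y, z) tuples, and the fragments are assumed to be already superposed; a real comparison would first fit one fragment onto the other.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between two equally long lists of (x, y, z)
    points. Assumes the two fragments are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("fragments must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def coverage(query_fragments, library, cutoff=1.0):
    """Fraction of query fragments that have at least one library fragment
    of the same length within `cutoff` Angstrom RMSD."""
    if not query_fragments:
        raise ValueError("no query fragments given")
    covered = sum(
        1 for q in query_fragments
        if any(len(q) == len(f) and rmsd(q, f) <= cutoff for f in library)
    )
    return covered / len(query_fragments)
```

A coverage of 0.99 from this function would correspond to the "more than 99%" statement in the abstract, given a comparable fragment library and cutoff.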
4.  Automated main-chain model building by template matching and iterative fragment extension 
A method for automated macromolecular main-chain model building is described.
An algorithm for the automated macromolecular model building of polypeptide backbones is described. The procedure is hierarchical. In the initial stages, many overlapping polypeptide fragments are built. In subsequent stages, the fragments are extended and then connected. Identification of the locations of helical and β-strand regions is carried out by FFT-based template matching. Fragment libraries of helices and β-strands from refined protein structures are then positioned at the potential locations of helices and strands and the longest segments that fit the electron-density map are chosen. The helices and strands are then extended using fragment libraries consisting of sequences three amino acids long derived from refined protein structures. The resulting segments of polypeptide chain are then connected by choosing those which overlap at two or more Cα positions. The fully automated procedure has been implemented in RESOLVE and is capable of model building at resolutions as low as 3.5 Å. The algorithm is useful for building a preliminary main-chain model that can serve as a basis for refinement and side-chain addition.
PMCID: PMC2745878  PMID: 12499537
model building; template matching; fragment extension
5.  Kinesin-like protein CENP-E is upregulated in rheumatoid synovial fibroblasts 
Arthritis Research  1999;1(1):71-80.
Our aim was to identify specifically expressed genes using RNA arbitrarily primed (RAP)-polymerase chain reaction (PCR) for differential display in patients with rheumatoid arthritis (RA). In RA, amplification of a distinct PCR product suitable for sequencing could be observed. Sequence analysis identified the PCR product as highly homologous to a 434 base pair segment of the human centromere kinesin-like protein CENP-E. Differential expression of CENP-E was confirmed by quantitative reverse transcription PCR, immunohistochemistry and in situ hybridization. CENP-E expression was independent from prednisolone and could not be completely inhibited by serum starvation. RAP-PCR is a suitable method to identify differentially expressed genes in rheumatoid synovial fibroblasts. Also, because motifs of CENP-E show homologies to jun and fos oncogene products and are involved in virus assembly, CENP-E may be involved in the pathophysiology of RA.
Articular destruction by invading synovial fibroblasts is a typical feature in rheumatoid arthritis (RA). Recent data support the hypothesis that key players in this scenario are transformed-appearing synovial fibroblasts at the site of invasion into articular cartilage and bone. They maintain their aggressive phenotype toward cartilage, even when first cultured and thereafter coimplanted together with normal human cartilage into severe combined immunodeficient mice for an extended period of time. However, little is known about the upregulation of genes that leads to this aggressive fibroblast phenotype. To inhibit this progressive growth without interfering with pathways of physiological matrix remodelling, identification of pathways that operate specifically in RA synovial fibroblasts is required. In order to achieve this goal, identification of genes showing upregulation restricted to RA synovial fibroblasts is essential.
To identify specifically expressed genes using RNA arbitrarily primed (RAP)-polymerase chain reaction (PCR) for differential display in patients with RA.
RNA was extracted from cultured synovial fibroblasts from 10 patients with RA, four patients with osteoarthritis (OA), and one patient with psoriatic arthritis. RAP-PCR was performed using different arbitrary primers for first-strand and second-strand synthesis: US6 (5'-GTGGTGACAG-3') for the first strand, and Nuclear 1+ (5'-ACGAAGAAGAG-3'), OPN28 (5'-GCACCAGGGG-3'), Kinase A2+ (5'-GGTGCCTTTGG-3') and OPN24 (5'-AGGGGCACCA-3') for the second strand. PCR reactions were loaded onto 8 mol/l urea/6% polyacrylamide-sequencing gels and electrophoresed. Gel slices carrying the target fragment were then excised with a razor blade, eluted and reamplified. After verifying their correct size and purity on 4% agarose gels, the reamplified products derived from the single-strand conformation polymorphism gel were cloned, and five clones per transcript were sequenced. Thereafter, a GenBank® analysis was performed. Quantitative reverse transcription PCR of the segments was performed using the PCR MIMIC® technique. In situ expression of centromere kinesin-like protein-E (CENP-E) messenger (m)RNA in RA synovium was assessed using digoxigenin-labelled riboprobes, and CENP-E protein expression in fibroblasts and synovium was assessed by immunogold-silver immunohistochemistry and cytochemistry. Functional analysis of CENP-E was done using different approaches (e.g. glucocorticoid stimulation, serum starvation and growth rate analysis of synovial fibroblasts that expressed CENP-E).
In RA, amplification of a distinct PCR product suitable for sequencing could be observed. The indicated complementary DNA fragment of 434 base pairs from RA mRNA corresponded to nucleotides 6615-7048 in the human centromere kinesin-like protein CENP-E mRNA (GenBank® accession No. emb/Z15005). The isolated sequence shared greater than 99% nucleic acid (P = 2.9e-169) identity with the human centromere kinesin-like protein CENP-E. Two base changes at positions 6624 (A to C) and 6739 (A to G) did not result in alteration in the amino acid sequence, and therefore 100% amino acid identity could be confirmed. The amplification of 10 clones of the cloned RAP product revealed the presence of CENP-E mRNA in every fibroblast culture examined, showing from 50% (271.000 ± 54.000 phosphor imager arbitrary units) up to fivefold (961.000 ± 145.000 phosphor imager arbitrary units) upregulation when compared with OA fibroblasts. Neither therapy with disease-modifying antirheumatic drugs such as methotrexate, gold, resochine or cyclosporine A, nor therapy with oral steroids influenced CENP-E expression in the RA fibroblasts. Of the eight RA fibroblast populations from RA patients who were receiving disease-modifying antirheumatic drugs, five showed CENP-E upregulation; and of the eight fibroblast populations from RA patients receiving steroids, four showed CENP-E upregulation.
Numerous synovial cells of the patients with RA showed a positive in situ signal for the isolated CENP-E gene segment, confirming CENP-E mRNA production in rheumatoid synovium, whereas in OA synovial tissue CENP-E mRNA could not be detected. In addition, CENP-E expression was independent from medication. This was further confirmed by analysis of the effect of prednisolone on CENP-E expression, which revealed no alteration in CENP-E mRNA after exposure to different (physiological) concentrations of prednisolone. Serum starvation also could not suppress CENP-E mRNA completely.
Since its introduction in 1992, numerous variants of the differential display method and continuous improvements including RAP-PCR have proved to have both efficiency and reliability in examination of differentially regulated genes. The results of the present study reveal that RAP-PCR is a suitable method to identify differentially expressed genes in rheumatoid synovial fibroblasts.
The mRNA, which has been found to be upregulated in rheumatoid synovial fibroblasts, codes for a kinesin-like motor protein named CENP-E, which was first characterized in 1991. It is a member of a family of centromere-associated proteins, of which six (CENP-A to CENP-F) are currently known. CENP-E itself is a kinetochore motor, which accumulates transiently at kinetochores in the G2 phase of the cell cycle before mitosis takes place, appears to modulate chromosome movement and spindle elongation, and is degraded at the end of mitosis. The presence or upregulation of CENP-E has never been associated with RA.
The three-dimensional structure of CENP-E includes a coiled-coil domain. This has important functions and shows links to known pathways in RA pathophysiology. Coiled-coil domains can also be found in jun and fos oncogene products, which are frequently upregulated in RA synovial fibroblasts. They are also involved in DNA binding and transactivation processes resembling the situation in AP-1 (Jun/Fos)-dependent DNA-binding in rheumatoid synovium. Most interestingly, these coiled-coil motifs are crucial for the assembly of viral proteins, and the upregulation of CENP-E might reflect the influence of infectious agents in RA synovium. We also performed experiments showing that serum starvation decreased, but did not completely inhibit, CENP-E mRNA expression. This shows that CENP-E is related to, but does not completely depend on, proliferation of these cells. In addition, we determined the growth rate of CENP-E high and low expressors, showing that it was independent of the amount of CENP-E expression, which supports the statement that upregulation of CENP-E reflects an activated RA fibroblast phenotype. In summary, the results of the present study support the hypothesis that CENP-E, presumably independently of medication, may not only be upregulated, but may also be involved in RA pathophysiology.
PMCID: PMC17776  PMID: 11056662
arthritis; centromere; differential display; immunohistochemistry; in situ hybridization; RNA fingerprinting
6.  De novo protein sequence analysis of Macaca mulatta 
BMC Genomics  2007;8:270.
Macaca mulatta is one of the most utilized non-human primate species in biomedical research, offering unique behavioral, neuroanatomical, and neurobiochemical similarities to humans. This makes it a unique organism to model various diseases such as psychiatric and neurodegenerative illnesses while also providing insight into the complexities of the primate brain. A major obstacle in utilizing rhesus monkey models for human disease is the paucity of protein annotations for this species (~42,000 protein annotations) compared to 330,210 protein annotations for humans. The lack of available information limits the use of rhesus monkey for proteomic scale studies which rely heavily on database searches for protein identification. While characterization of proteins of interest from Macaca mulatta using the standard database search engines (e.g., MASCOT) can be accomplished, searches must be performed using a 'broad species database' which does not provide optimal confidence in protein annotation. Therefore, it becomes necessary to determine partial or complete amino acid sequences using either manual or automated de novo peptide sequence analysis methods.
The recently popularized MALDI-TOF-TOF mass spectrometer yields a complex MS/MS fragmentation pattern that is difficult to characterize by manual de novo sequencing on a proteomics scale. Therefore, PEAKS-assisted de novo sequencing was performed on nucleus accumbens cytosolic proteins from Macaca mulatta. The most abundant peptide fragments (b-ions and y-ions), the less abundant peptide fragments (a-ions), and the immonium ions were utilized to develop confident and complete peptide sequences de novo from MS/MS spectra. The generated sequences were used to perform homology searches to identify the proteins.
The current study validates a robust method to confidently characterize the proteins from an incomplete sequence database of Macaca mulatta, using the PEAKS de novo sequencing software, facilitating the use of this animal model in various neuroproteomics studies.
PMCID: PMC1965481  PMID: 17686166
7.  SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information 
Nucleic Acids Research  2014;42(Web Server issue):W252-W258.
Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates and select the ones to be used for model building. In cases where multiple alternative template structures are available for a protein of interest, a user-guided template selection step allows building models in different functional states. SWISS-MODEL is available at
PMCID: PMC4086089  PMID: 24782522
8.  A workflow learning model to improve geovisual analytics utility 
This paper describes the design and implementation of the G-EX Portal Learn Module, a web-based, geocollaborative application for organizing and distributing digital learning artifacts. G-EX falls into the broader context of geovisual analytics, a new research area with the goal of supporting visually-mediated reasoning about large, multivariate, spatiotemporal information. Because this information is unprecedented in amount and complexity, GIScientists are tasked with the development of new tools and techniques to make sense of it. Our research addresses the challenge of implementing these geovisual analytics tools and techniques in a useful manner.
The objective of this paper is to develop and implement a method for improving the utility of geovisual analytics software. The success of software is measured by its usability (i.e., how easy the software is to use) and utility (i.e., how useful the software is). The usability and utility of software can be improved by refining the software, increasing user knowledge about the software, or both. It is difficult to achieve transparent usability (i.e., software that is immediately usable without training) of geovisual analytics software because of the inherent complexity of the included tools and techniques. In these situations, improving user knowledge about the software through the provision of learning artifacts is as important as, if not more important than, iterative refinement of the software itself. Therefore, our approach to improving utility is focused on educating the user.
The research reported here was completed in two steps. First, we developed a model for learning about geovisual analytics software. Many existing digital learning models assist only with use of the software to complete a specific task and provide limited assistance with its actual application. To move beyond task-oriented learning about software use, we propose a process-oriented approach to learning based on the concept of scientific workflows. Second, we implemented an interface in the G-EX Portal Learn Module to demonstrate the workflow learning model. The workflow interface allows users to drag learning artifacts uploaded to the G-EX Portal onto a central whiteboard and then annotate the workflow using text and drawing tools. Once completed, users can visit the assembled workflow to get an idea of the kind, number, and scale of analysis steps, view individual learning artifacts associated with each node in the workflow, and ask questions about the overall workflow or individual learning artifacts through the associated forums. An example learning workflow in the domain of epidemiology is provided to demonstrate the effectiveness of the approach.
In the context of geovisual analytics, GIScientists are not only responsible for developing software to facilitate visually-mediated reasoning about large and complex spatiotemporal information, but also for ensuring that this software works. The workflow learning model discussed in this paper and demonstrated in the G-EX Portal Learn Module is one approach to improving the utility of geovisual analytics software. While development of the G-EX Portal Learn Module is ongoing, we expect to release the G-EX Portal Learn Module by Summer 2009.
PMCID: PMC3186065  PMID: 21983545
geovisual analytics; workflows; learning; utility; usability; geocollaboration; G-EX Portal; epidemiology
9.  In silico fragmentation for computer assisted identification of metabolite mass spectra 
BMC Bioinformatics  2010;11:148.
Mass spectrometry has become the analytical method of choice in metabolomics research. The identification of unknown compounds is the main bottleneck. In addition to the precursor mass, tandem MS spectra carry informative fragment peaks, but spectral libraries of measured reference compounds are far from covering the complete chemical space. Compound libraries such as PubChem or KEGG describe a larger number of compounds, whose in silico fragmentation can be compared with the spectra of unknown metabolites.
We created the MetFrag suite to obtain a candidate list from compound libraries based on the precursor mass, subsequently ranked by the agreement between measured and in silico fragments. In the evaluation MetFrag was able to rank most of the correct compounds within the top 3 candidates returned by an exact mass query in KEGG. Compared to a previously published study, MetFrag obtained better results than the commercial MassFrontier software. Especially for large compound libraries, the candidates with a good score show a high structural similarity or differ only in stereochemistry; a subsequent clustering based on chemical distances reduces this redundancy. The in silico fragmentation requires less than a second to process a molecule, and MetFrag performs a search in KEGG or PubChem on average within 30 or 300 seconds, respectively, on an average desktop PC.
We presented a method that is able to identify small molecules from tandem MS measurements, even without spectral reference data or a large set of fragmentation rules. With today's massive general purpose compound libraries we obtain dozens of very similar candidates, which still allows a confident estimate of the correct compound class. Our tool MetFrag improves the identification of unknown substances from tandem MS spectra and delivers better results than comparable commercial software. MetFrag is available through a web application, web services and as a Java library. The web frontend allows the end-user to analyse single spectra and browse the results, whereas the web service and console application are intended for batch searches and evaluation.
PMCID: PMC2853470  PMID: 20307295
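The two-stage procedure described in the MetFrag abstract (select candidates by precursor mass, then rank them by agreement between measured peaks and in silico fragments) might be sketched as below. This is a deliberately simplified illustration and not MetFrag's scoring: `within_ppm`, `rank_candidates`, the ppm tolerance and the candidate tuples are hypothetical, fragment masses are assumed to be precomputed, and "agreement" is reduced to the fraction of measured peaks explained.

```python
def within_ppm(measured, theoretical, ppm):
    """True if `measured` lies within `ppm` parts per million of `theoretical`."""
    return abs(measured - theoretical) <= theoretical * ppm * 1e-6


def rank_candidates(precursor_mass, peaks, candidates, ppm=10.0):
    """Filter candidates by precursor mass, then rank by explained peaks.

    `candidates` is a list of (name, exact_mass, fragment_masses) tuples;
    the fragment masses stand in for an in silico fragmentation step that
    is not implemented here.
    """
    ranked = []
    for name, mass, fragments in candidates:
        if not within_ppm(precursor_mass, mass, ppm):
            continue  # candidate does not match the precursor mass
        explained = sum(
            1 for peak in peaks
            if any(within_ppm(peak, frag, ppm) for frag in fragments)
        )
        score = explained / len(peaks) if peaks else 0.0
        ranked.append((score, name))
    # Highest score first; ties are broken alphabetically by name.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in ranked]
```

Candidates that tie on this score would typically be near-identical structures, which is where the clustering step mentioned in the abstract would come in.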
10.  A supersecondary structure library and search algorithm for modeling loops in protein structures 
Nucleic Acids Research  2006;34(7):2085-2097.
We present a fragment-search based method for predicting loop conformations in protein models. A hierarchical and multidimensional database has been set up that currently classifies 105 950 loop fragments and loop flanking secondary structures. Besides the length of the loops and types of bracing secondary structures the database is organized along four internal coordinates, a distance and three types of angles characterizing the geometry of stem regions. Candidate fragments are selected from this library by matching the length, the types of bracing secondary structures of the query and satisfying the geometrical restraints of the stems and subsequently inserted in the query protein framework where their fit is assessed by the root mean square deviation (r.m.s.d.) of stem regions and by the number of rigid body clashes with the environment. In the final step remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and fit of predicted and observed ϕ/ψ main chain dihedral angle propensities. Confidence Z-score cut-offs were determined for each loop length that identify those predicted fragments that outperform a competitive ab initio method. A web server implements the method, regularly updates the fragment library and performs prediction. Predicted segments are returned, or optionally, these can be completed with side chain reconstruction and subsequently annealed in the environment of the query protein by conjugate gradient minimization. The prediction method was tested on artificially prepared search datasets where all trivial sequence similarities on the SCOP superfamily level were removed. Under these conditions it is possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% with at least 0.22, 1.38 and 2.47 Å of r.m.s.d. accuracy, respectively.
In a head-to-head comparison on loops extracted from freshly deposited new protein folds, the current method outperformed an earlier database search method by a ratio of ∼5:1.
PMCID: PMC1440879  PMID: 16617149
11.  Use of RNA structure flexibility data in nanostructure modeling 
Methods (San Diego, Calif.)  2010;54(2):239-250.
In the emerging field of RNA-based nanotechnology there is a need for automation of the structure design process. Our goal is to develop computer methods for aiding in this process. Towards that end, we created the RNAJunction database, which is a repository of RNA junctions, i.e. internal, multi-branch and kissing loops with emanating stem stubs, extracted from the larger RNA structures stored in the PDB database. These junctions can be used as building blocks for nanostructures. Two programs developed in our laboratory, NanoTiler and RNA2D3D, can combine such building blocks with idealized fragments of A-form helices to produce desired 3D nanostructures. Initially, the building blocks are treated as rigid objects and the resulting geometry is tested against the design objectives. Experimental data, however, shows that RNA accommodates its shape to the constraints of larger structural contexts. Therefore we are adding analysis of the flexibility of our building blocks to the full design process. Here we present an example of RNA-based nanostructure design, putting emphasis on the need to characterize the structural flexibility of the building blocks to induce ring closure in the automated exploration. We focus on the use of kissing loops (KL) in nanostructure design, since they have been shown to play an important role in RNA self-assembly. By using an experimentally proven system, the RNA tectosquare, we show that considering the flexibility of the KLs as well as distortions of helical regions may be necessary to achieve a realistic design.
PMCID: PMC3107926  PMID: 21163354
RNA; Nanostructure; Design; Modeling; Flexibility; Molecular dynamics
12.  Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads 
PLoS Computational Biology  2008;4(9):e1000186.
Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that the most efficient technology produces the shortest read lengths. Short-read sequencing has been applied successfully to resequence the human genome and those of other species but not to whole-genome sequencing of novel organisms. Here we describe the sequencing and assembly of a novel clinical isolate of Pseudomonas aeruginosa, strain PAb1, using very short read technology. From 8,627,900 reads, each 33 nucleotides in length, we assembled the genome into one scaffold of 76 ordered contiguous sequences containing 6,290,005 nucleotides, including one contig spanning 512,638 nucleotides, plus an additional 436 unordered contigs containing 416,897 nucleotides. Our method includes a novel gene-boosting algorithm that uses amino acid sequences from predicted proteins to build a better assembly. This study demonstrates the feasibility of very short read sequencing for the sequencing of bacterial genomes, particularly those for which a related species has been sequenced previously, and expands the potential application of this new technology to most known prokaryotic species.
Author Summary
In this paper we demonstrate that a bacterial genome, Pseudomonas aeruginosa, can be decoded using very short DNA sequences, namely, those produced by the newest generation of DNA sequencers such as the Solexa sequencer from Illumina. Our method includes a novel algorithm that uses the protein sequences from other species to assist the assembly of the new genome. This algorithm breaks up the genome into gene-sized chunks that can be put back together relatively easily, even from sequence fragments as short as 30 bases of DNA. We also take advantage of the genomes of related species, using them as reference strains to assist the assembly. By combining these and other techniques, we were able to assemble 94% of the 6.7 million bases of P. aeruginosa into just 76 large pieces. The remaining 6% is contained in 436 smaller fragments. We have made all of our software available for free under open-source licenses, and we have deposited the newly assembled genome in the public GenBank database.
PMCID: PMC2529408  PMID: 18818729
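The figures quoted for the P. aeruginosa PAb1 assembly above can be cross-checked with a little arithmetic. This is a rough sketch only: the variable names are illustrative, and the depth estimate assumes that every read contributes to the assembly and that the assembled length approximates the genome size, neither of which the abstract states.

```python
# Numbers taken directly from the abstract and author summary.
READS = 8_627_900          # number of short reads
READ_LENGTH = 33           # nucleotides per read
SCAFFOLD_BASES = 6_290_005 # bases in the 76 ordered contigs
UNORDERED_BASES = 416_897  # bases in the 436 unordered contigs

# Total assembled sequence (ordered scaffold plus unordered contigs).
assembled_bases = SCAFFOLD_BASES + UNORDERED_BASES

# Share of the assembly held in the single scaffold (quoted as 94%).
scaffold_fraction = SCAFFOLD_BASES / assembled_bases

# Rough mean sequencing depth, assuming all reads map to the assembly.
mean_depth = READS * READ_LENGTH / assembled_bases

if __name__ == "__main__":
    print(f"assembled bases:   {assembled_bases:,}")
    print(f"scaffold fraction: {scaffold_fraction:.1%}")
    print(f"approx. depth:     {mean_depth:.1f}x")
```

The assembled total of about 6.7 million bases and the scaffold share of roughly 94% agree with the author summary; the depth figure is an estimate under the assumptions above, not a value reported by the authors.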
13.  Toward the automated generation of genome-scale metabolic networks in the SEED 
BMC Bioinformatics  2007;8:139.
Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process.
We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis.
Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.
PMCID: PMC1868769  PMID: 17462086
14.  Macromolecular ab initio phasing enforcing secondary and tertiary structure 
IUCrJ  2015;2(Pt 1):95-105.
ARCIMBOLDO replaces the atomicity constraints required for ab initio phasing by enforcement of model stereochemistry. Small model fragments and local folds are exploited at resolutions up to 2 Å in different contexts, from supercomputers to the standalone ARCIMBOLDO_LITE, which solves straightforward cases on a single multicore machine.
Ab initio phasing of macromolecular structures, from the native intensities alone with no experimental phase information or previous particular structural knowledge, has been the object of a long quest, limited by two main barriers: structure size and resolution of the data. Current approaches to extend the scope of ab initio phasing include use of the Patterson function, density modification and data extrapolation. The authors’ approach relies on the combination of locating model fragments such as polyalanine α-helices with the program PHASER and density modification with the program SHELXE. Given the difficulties in discriminating correct small substructures, many putative groups of fragments have to be tested in parallel; thus calculations are performed in a grid or supercomputer. The method has been named after the Italian painter Arcimboldo, who used to compose portraits out of fruit and vegetables. With ARCIMBOLDO, most collections of fragments remain a ‘still-life’, but some are correct enough for density modification and main-chain tracing to reveal the protein’s true portrait. Beyond α-helices, other fragments can be exploited in an analogous way: libraries of helices with modelled side chains, β-strands, predictable fragments such as DNA-binding folds or fragments selected from distant homologues up to libraries of small local folds that are used to enforce nonspecific tertiary structure; thus restoring the ab initio nature of the method. Using these methods, a number of unknown macromolecules with a few thousand atoms and resolutions around 2 Å have been solved. In the 2014 release, use of the program has been simplified. The software mediates the use of massive computing to automate the grid access required in difficult cases but may also run on a single multicore workstation to solve straightforward cases.
PMCID: PMC4285884  PMID: 25610631
ab initio phasing; α-helices; macromolecular structure; ARCIMBOLDO
15.  A versatile, efficient strategy for assembly of multi-fragment expression vectors in Saccharomyces cerevisiae using 60 bp synthetic recombination sequences 
In vivo recombination of overlapping DNA fragments for assembly of large DNA constructs in the yeast Saccharomyces cerevisiae holds great potential for pathway engineering on a small laboratory scale as well as for automated high-throughput strain construction. However, the current in vivo assembly methods are not consistent with respect to yields of correctly assembled constructs and standardization of parts required for routine laboratory implementation has not been explored. Here, we present and evaluate an optimized and robust method for in vivo assembly of plasmids from overlapping DNA fragments in S. cerevisiae.
To minimize the occurrence of misassembled plasmids and increase the versatility of the assembly platform, two main improvements were introduced: i) the essential elements of the vector backbone (yeast episome and selection marker) were disconnected and ii) standardized 60 bp synthetic recombination sequences non-homologous with the yeast genome were introduced at each flank of the assembly fragments. These modifications led to a 100-fold decrease in false positive transformants originating from the backbone as compared to previous methods. Implementation of the 60 bp synthetic recombination sequences enabled high flexibility in the design of complex expression constructs and allowed for fast and easy construction of all assembly fragments by PCR. The functionality of the method was demonstrated by the assembly of a 21 kb plasmid out of nine overlapping fragments carrying six glycolytic genes with a correct assembly yield of 95%. The assembled plasmid was shown to be a high-fidelity replica of the in silico design and all glycolytic genes carried by the plasmid were proven to be functional.
The presented method delivers a substantial improvement for assembly of multi-fragment expression vectors in S. cerevisiae. Not only does it improve the efficiency of in vivo assembly, but it also offers a versatile platform for easy and rapid design and assembly of synthetic constructs. The presented method is therefore ideally suited for the construction of complex pathways and for high throughput strain construction programs for metabolic engineering purposes. In addition its robustness and ease of use facilitate the construction of any plasmid carrying two or more genes.
PMCID: PMC3669052  PMID: 23663359
In vivo assembly; Saccharomyces cerevisiae; Synthetic biology; Pathway engineering; Homologous recombination
16.  Activation of synovial fibroblasts in rheumatoid arthritis: lack of expression of the tumour suppressor PTEN at sites of invasive growth and destruction 
Arthritis Research  1999;2(1):59-64.
In the present study, we searched for mutant PTEN transcripts in aggressive rheumatoid arthritis synovial fibroblasts (RA-SF) and studied the expression of PTEN in RA. By automated sequencing, no evidence for the presence of mutant PTEN transcripts was found. However, in situ hybridization on RA synovium revealed a distinct expression pattern of PTEN, with negligible staining in the lining layer but abundant expression in the sublining. Normal synovial tissue exhibited homogeneous staining for PTEN. In cultured RA-SF, only 40% expressed PTEN. Co-implantation of RA-SF and normal human cartilage into severe combined immunodeficiency (SCID) mice showed only limited expression of PTEN, with no staining in those cells aggressively invading the cartilage. Although PTEN is not genetically altered in RA, these findings suggest that a lack of PTEN expression may constitute a characteristic feature of activated RA-SF in the lining, and may thereby contribute to the invasive behaviour of RA-SF by maintaining their aggressive phenotype at sites of cartilage destruction.
PTEN is a novel tumour suppressor which exhibits tyrosine phosphatase activity as well as homology to the cytoskeletal proteins tensin and auxilin. Mutations of PTEN have been described in several human cancers and associated with their invasiveness and metastatic properties. Although not malignant, rheumatoid arthritis synovial fibroblasts (RA-SF) exhibit certain tumour-like features such as attachment to cartilage and invasive growth. In the present study, we analyzed whether mutant transcripts of PTEN were present in RA-SF. In addition, we used in situ hybridization to study the expression of PTEN messenger (m)RNA in tissue samples of RA and normal individuals as well as in cultured RA-SF and in the severe combined immunodeficiency (SCID) mouse model of RA.
Synovial tissue specimens were obtained from seven patients with RA and from two nonarthritic individuals. Total RNA was isolated from synovial fibroblasts and after first strand complementary (c)DNA synthesis, polymerase chain reaction (PCR) was performed to amplify a 1063 base pair PTEN fragment that encompassed the coding sequence of PTEN including the phosphatase domain and all mutation sites described so far. The PCR products were subcloned in Escherichia coli, and up to four clones were picked from each plate for automated sequencing. For in situ hybridization, digoxigenin-labelled PTEN-specific RNA probes were generated by in vitro transcription. For control in situ hybridization, a matrix metalloproteinase (MMP)-2-specific probe was prepared. To investigate the expression of PTEN in the absence of human macrophage or lymphocyte derived factors, we implanted RA-SF from three patients together with normal human cartilage under the renal capsule of SCID mice. After 60 days, mice were sacrificed, the implants removed and embedded into paraffin.
PCR revealed the presence of the expected 1063 base pair PTEN fragment in all (9/9) cell cultures (Fig. 1). No additional bands that could account for mutant PTEN variants were detected. Sequence analysis revealed 100% homology of all RA-derived PTEN fragments to those from normal SF as well as to the published GenBank sequence (accession number U93051). However, in situ hybridization demonstrated considerable differences in the expression of PTEN mRNA within the lining and the sublining layers of RA synovial membranes. As shown in Figure 2a, no staining was observed within the lining layer which has been demonstrated to mediate degradation of cartilage and bone in RA. In contrast, abundant expression of PTEN mRNA was found in the sublining of all RA synovial tissues (Figs 2a and b). Normal synovial specimens showed homogeneous staining for PTEN within the thin synovial membrane (Fig. 2c). In situ hybridization using the sense probe gave no specific staining (Fig. 2d). We also performed in situ hybridization on four of the seven cultured RA-SF and followed one cell line from the first to the sixth passage. Interestingly, only 40% of cultured RA-SF expressed PTEN mRNA (Fig. 3a), and the proportion of PTEN expressing cells did not change throughout the passages. In contrast, control experiments using a specific RNA probe for MMP-2 revealed mRNA expression by nearly all cultured cells (Fig. 3b). As seen before, implantation of RA-SF into the SCID mice showed considerable cartilage degradation. Interestingly, only negligible PTEN expression was found in those RA-SF aggressively invading the cartilage (Fig. 3c). In situ hybridization for MMP-2 showed abundant staining in these cells (Fig. 3d).
Although this study found no evidence for mutations of PTEN in RA synovium, the observation that PTEN expression is lacking in the lining layer of RA synovium as well as in more than half of cultured RA-SF is of interest. It suggests that loss of PTEN function may not exclusively be caused by genetic alterations, yet at the same time links the low expression of PTEN to a phenotype of cells that have been shown to invade cartilage aggressively.
It has been proposed that the tyrosine phosphatase activity of PTEN is responsible for its tumour suppressor activity by counteracting the actions of protein tyrosine kinases. As some studies have demonstrated an upregulation of tyrosine kinase activity in RA synovial cells, it might be speculated that the lack of PTEN expression in aggressive RA-SF contributes to the imbalance of tyrosine kinases and phosphatases in this disease. However, the extensive amino-terminal homology of the predicted protein to the cytoskeletal proteins tensin and auxilin suggests a complex regulatory function involving cellular adhesion molecules and phosphatase-mediated signalling. The tyrosine phosphatase TEP1 has been shown to be identical to the protein encoded by PTEN, and gene transcription of TEP1 has been demonstrated to be downregulated by transforming growth factor (TGF)-β. Therefore, it could be hypothesized that TGF-β might be responsible for the downregulation of PTEN. However, the expression of TGF-β is not restricted to the lining but found throughout the synovial tissue in RA. Moreover, in our study the percentage of PTEN expressing RA-SF remained stable for six passages in culture, whereas molecules that are cytokine-regulated in vivo frequently change their expression levels when cultured over several passages. Also, cultured RA-SF that were implanted into SCID mice and deeply invaded the cartilage did not show significant expression of PTEN after 60 days. The drop in the percentage of PTEN expressing cells from the original cell cultures to the SCID mouse implants is of interest as this observation goes along with data from previous studies that have shown the prominent expression of activation-related molecules in the SCID mice implants that in vivo are found predominantly in the lining layer. Therefore, our data point to endogenous mechanisms rather than to the influence of exogenous human cytokines or factors in the downregulation of PTEN. 
Low expression of PTEN may be one of the features that distinguish the activated phenotype of RA-SF from the proliferating but nondestructive cells of the sublining.
PMCID: PMC17804  PMID: 11219390
rheumatoid arthritis; synovial membrane; fibroblasts; PTEN tumour suppressor; severe combined immunodeficiency (SCID) mouse model; cartilage destruction; in situ hybridization
17.  ESTuber db: an online database for Tuber borchii EST sequences 
BMC Bioinformatics  2007;8(Suppl 1):S13.
The ESTuber database includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface.
Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes.
Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure.
The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.
PMCID: PMC1885842  PMID: 17430557
18.  Matt: Local Flexibility Aids Protein Multiple Structure Alignment 
PLoS Computational Biology  2008;4(1):e10.
Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these “bent” alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, but outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of α-helices and β-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. 
For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.
Author Summary
Proteins fold into complicated highly asymmetrical 3-D shapes. When a protein is found to fold in a shape that is sufficiently similar to other proteins whose functional roles are known, this can significantly aid in predicting function in the new protein. In addition, the areas where structure is highly conserved in a set of such similar proteins may indicate functional or structural importance of the conserved region. Given a set of protein structures, the protein structural alignment problem is to determine the superimposition of the backbones of these protein structures that places as much of the structures as possible into close spatial alignment. We introduce an algorithm that allows local flexibility in the structures when it brings them into closer alignment. The algorithm performs as well as its competitors when the structures to be aligned are highly similar, and outperforms them by a larger and larger margin as similarity decreases. In addition, for the related classification problem that asks if the degree of structural similarity between two proteins implies if they likely evolved from a common ancestor, a scoring function assesses, based on the best alignment generated for each pair of protein structures, whether they should be declared sufficiently structurally similar or not. This score can be used to predict when two proteins have sufficiently similar shapes to likely share functional characteristics.
PMCID: PMC2186361  PMID: 18193941
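The Matt abstract above relies on root mean squared deviation (RMSD) as one ingredient of its p-value score. As a reminder of what that quantity measures, here is a minimal sketch of RMSD over paired 3-D coordinates. It is not Matt's scoring function: the function name is hypothetical, and the sketch assumes the two structures have already been superimposed and that residues are paired by index.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the point sets are already superimposed and that
    coords_a[i] is aligned with coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        squared += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared / len(coords_a))

if __name__ == "__main__":
    core_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    core_b = [(0.0, 0.5, 0.0), (1.5, 0.5, 0.0), (3.0, 0.5, 0.0)]
    print(f"RMSD over {len(core_a)} aligned positions: {rmsd(core_a, core_b):.2f}")
```

A score of the kind described in the abstract would combine a value like this with the length of the common core; how the two are weighted is not given in the abstract and is not attempted here.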
19.  A computational platform to maintain and migrate manual functional annotations for BioCyc databases 
BMC Systems Biology  2014;8(1):115.
BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database.
We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers.
Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.
Electronic supplementary material
The online version of this article (doi:10.1186/s12918-014-0115-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4203924  PMID: 25304126
Annotation tool; BioCyc; Pathway/Genome database; JavaCycO
20.  CycADS: an annotation database system to ease the development and update of BioCyc databases 
In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Linked to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms.
PMCID: PMC3072769  PMID: 21474551
21.  Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome 
BMC Bioinformatics  2007;8(Suppl 1):S9.
The structure annotation of a genome is based either on ab initio methodologies or on similarity searches versus molecules that have been already annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task the common approach is to align species-specific full length cDNA and EST sequences along the genomic sequences in order to define exon/intron structure of mRNA coding genes.
GeneModelEST is the software here proposed for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences.
GeneModelEST requires the genome coordinates of the spliced-alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file.
The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign to the candidate gene models preliminary functional annotations.
We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium.
The contig classification procedure used by GeneModelEST supports the detection of candidate gene models and the identification of potential alternative transcripts, and it helps filter out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large-scale analysis in order to provide species-specific gene models that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human-curated annotation.
PMCID: PMC1885861  PMID: 17430576
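The GeneModelEST entry above takes spliced-alignment coordinates in General Feature Format (GFF), which is a nine-column, tab-separated text format. The sketch below shows one way such input could be read and exon intervals grouped per contig. It is not GeneModelEST's code: the function names, the `exon` feature type, the sample records and the choice to use the raw attributes column as the contig identifier are all assumptions made for illustration, since the abstract does not specify the GFF version or attribute layout.

```python
from collections import defaultdict

# The nine standard GFF columns, in order.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")

def parse_gff(text):
    """Parse GFF text into a list of dicts, skipping comments and blanks."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GFF_COLUMNS):
            raise ValueError(f"expected 9 tab-separated fields: {line!r}")
        record = dict(zip(GFF_COLUMNS, fields))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        features.append(record)
    return features

def exons_by_group(features):
    """Collect sorted (start, end) exon intervals keyed by the attributes column."""
    grouped = defaultdict(list)
    for feature in features:
        if feature["type"] == "exon":  # assumed feature type
            grouped[feature["attributes"]].append((feature["start"], feature["end"]))
    return {key: sorted(intervals) for key, intervals in grouped.items()}

# Hypothetical sample input; contig names and coordinates are made up.
EXAMPLE = "\n".join([
    "##gff-version 2",
    "chr1\tspliced_aln\texon\t400\t520\t.\t+\t.\tcontig_A",
    "chr1\tspliced_aln\texon\t100\t250\t.\t+\t.\tcontig_A",
    "chr1\tspliced_aln\texon\t900\t1000\t.\t-\t.\tcontig_B",
])

if __name__ == "__main__":
    for contig, exons in exons_by_group(parse_gff(EXAMPLE)).items():
        print(contig, exons)
```

Grouping exons per contig in this way would be a natural first step before evaluating each exon and classifying contigs by quality, as the abstract describes; the evaluation criteria themselves are not given there and are left out.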
22.  RNA Nanotechnology: Engineering, Assembly and Applications in Detection, Gene Delivery and Therapy 
Biological macromolecules including DNA, RNA, and proteins, have intrinsic features that make them potential building blocks for the bottom-up fabrication of nanodevices. RNA is unique in nanoscale fabrication due to its amazing diversity of function and structure. RNA molecules can be designed and manipulated with a level of simplicity characteristic of DNA while possessing versatility in structure and function similar to that of proteins. RNA molecules typically contain a large variety of single stranded loops suitable for inter- and intra-molecular interaction. These loops can serve as mounting dovetails obviating the need for external linking dowels in fabrication and assembly.
The self-assembly of nanoparticles from RNA involves cooperative interaction of individual RNA molecules that spontaneously assemble in a predefined manner to form a larger two- or three-dimensional structure. Within the realm of self-assembly there are two main categories, namely template and non-template. Template assembly involves interaction of RNA molecules under the influence of specific external sequence, forces, or spatial constraints such as RNA transcription, hybridization, replication, annealing, molding, or replicas. In contrast, non-template assembly involves formation of a larger structure by individual components without the influence of external forces. Examples of non-template assembly are ligation, chemical conjugation, covalent linkage, and loop/loop interaction of RNA, especially the formation of RNA multimeric complexes. The best characterized RNA multimer and the first to be described in RNA nanotechnological application is the motor pRNA of bacteriophage phi29, which forms dimers, trimers, and hexamers via hand-in-hand interaction. phi29 pRNA can be redesigned to form a variety of structures and shapes including twins, tetramers, rods, triangles, and 3D arrays several microns in size via interaction of programmed helical regions and loops. 3D RNA array formation requires a defined nucleotide number for twisting and a palindromic sequence. Such arrays are unusually stable and resistant to a wide range of temperatures, salt concentrations, and pH. Both the therapeutic siRNA or ribozyme and a receptor-binding RNA aptamer or other ligands have been engineered into individual pRNAs. Individual chimeric RNA building blocks harboring siRNA or other therapeutic molecules have been fabricated subsequently into a trimer through hand-in-hand interaction of the engineered right and left interlocking RNA loops. 
The incubation of these particles containing the receptor-binding aptamer or other ligands results in the binding and co-entry of trivalent therapeutic particles into cells. Such particles were subsequently shown to modulate the apoptosis of cancer cells in both cell cultures and animal trials. The use of such antigen-free 20–40 nm particles holds promise for the repeated long-term treatment of chronic diseases. Other potentially useful RNA molecules that form multimers include HIV RNA that contain kissing loop to form dimers, tecto-RNA that forms a “jigsaw puzzle,” and the Drosophila bicoid mRNA that forms multimers via “hand-by-arm” interactions.
Applications of RNA molecules involving replication, molding, embossing, and other related techniques, have recently been described that allow the utilization of a variety of materials to enhance diversity and resolution of nanomaterials. It should eventually be possible to adapt RNA to facilitate construction of ordered, patterned, or pre-programmed arrays or superstructures. Given the potential for 3D fabrication, the chance to produce reversible self-assembly, and the ability of self-repair, editing and replication, RNA self-assembly will play an increasingly significant role in integrated biological nanofabrication. A random 100-nucleotide RNA library may exist in 1.6 × 10^60 varieties with multifarious structure to serve as a vital system for efficient fabrication, with a complexity and diversity far exceeding that of any current nanoscale system.
This review covers the basic concepts of RNA structure and function, certain methods for the study of RNA structure, the approaches for engineering or fabricating RNA into nanoparticles or arrays, and special features of RNA molecules that form multimers. The most recent development in exploration of RNA nanoparticles for pathogen detection, drug/gene delivery, and therapeutic application is also introduced in this review.
PMCID: PMC2842999  PMID: 16430131
RNA; Nanotechnology; Self-Assembly; RNA Application; phi29 pRNA
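The library-size figure in the RNA nanotechnology abstract above follows from simple counting: with four possible nucleotides at each of 100 positions, the number of distinct sequences is 4 to the power 100. A short check, with illustrative constant names:

```python
ALPHABET_SIZE = 4     # A, C, G, U
LIBRARY_LENGTH = 100  # nucleotides per random RNA

# Number of distinct sequences of the given length.
library_size = ALPHABET_SIZE ** LIBRARY_LENGTH

if __name__ == "__main__":
    print(f"{library_size:.1e}")
```

The result is a 61-digit integer, about 1.6 × 10^60, which matches the figure quoted in the abstract.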
23.  "TOF2H": A precision toolbox for rapid, high density/high coverage hydrogen-deuterium exchange mass spectrometry via an LC-MALDI approach, covering the data pipeline from spectral acquisition to HDX rate analysis 
BMC Bioinformatics  2008;9:387.
Protein-amide proton hydrogen-deuterium exchange (HDX) is used to investigate protein conformation, conformational changes and surface binding sites for other molecules. To our knowledge, software tools to automate data processing and analysis from sample fractionating (LC-MALDI) mass-spectrometry-based HDX workflows are not publicly available.
An integrated data pipeline (Solvent Explorer/TOF2H) has been developed for the processing of LC-MALDI-derived HDX data. Based on an experiment-wide template, and taking an ab initio approach to chromatographic and spectral peak finding, initial data processing is based on accurate mass-matching to fully deisotoped peaklists accommodating, in MS/MS-confirmed peptide library searches, ambiguous mass-hits to non-target proteins. Isotope-shift re-interrogation of library search results allows quick assessment of the extent of deuteration from peaklist data alone. During raw spectrum editing, each spectral segment is validated in real time, consistent with the manageable spectral numbers resulting from LC-MALDI experiments. A semi-automated spectral-segment editor includes a semi-automated or automated assessment of the quality of all spectral segments as they are pooled across an XIC peak for summing, centroid mass determination, building of rates plots on-the-fly, and automated back exchange correction. The resulting deuterium uptake rates plots from various experiments can be averaged, subtracted, re-scaled, error-barred, and/or scatter-plotted from individual spectral segment centroids, compared to solvent exposure and hydrogen bonding predictions and receive a color suggestion for 3D visualization. This software lends itself to a "divorced" HDX approach in which MS/MS-confirmed peptide libraries are built via nano or standard ESI without source modification, and HDX is performed via LC-MALDI using a standard MALDI-TOF. The complete TOF2H package includes additional (eg LC analysis) modules.
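Two of the steps named above, centroid mass determination and back exchange correction, can be illustrated with a minimal sketch. It is not the TOF2H implementation. It assumes an intensity-weighted centroid over a spectral segment and the commonly used correction that scales the observed mass shift by the shift of a fully deuterated control; the function and parameter names are hypothetical.

```python
def centroid_mass(masses, intensities):
    """Intensity-weighted mean mass of one spectral segment."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("segment has no signal")
    return sum(m * i for m, i in zip(masses, intensities)) / total


def corrected_uptake(m_t, m_0, m_100, n_exchangeable):
    """Back-exchange-corrected deuterium uptake (assumed standard formula).

    m_t   : centroid mass after labeling for time t
    m_0   : centroid mass of the undeuterated control
    m_100 : centroid mass of the fully deuterated control
    n_exchangeable : number of exchangeable amide hydrogens in the peptide
    """
    span = m_100 - m_0
    if span <= 0:
        raise ValueError("fully deuterated control must be heavier")
    return (m_t - m_0) / span * n_exchangeable


# Usage with made-up numbers: two equal peaks, then a half-way mass shift.
segment_centroid = centroid_mass([1000.0, 1001.0], [1.0, 1.0])
uptake = corrected_uptake(1002.0, 1000.0, 1004.0, 8)
```

Plotting such corrected uptake values against labeling time would give a rates plot of the kind the abstract describes.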
"TOF2H" provides a comprehensive HDX data analysis package that has accelerated the processing of LC-MALDI-based HDX data in the authors' lab from weeks to hours. It runs in a standard MS Windows (XP or Vista) environment, and can be downloaded or obtained from the authors at no cost.
PMCID: PMC2561049  PMID: 18803853
24.  Air Cleaning Technologies 
Executive Summary
This health technology policy assessment will answer the following questions:
When should in-room air cleaners be used?
How effective are in-room air cleaners?
Are in-room air cleaners that use combined HEPA and UVGI air cleaning technology more effective than those that use HEPA filtration alone?
What is the Plasmacluster ion air purifier in the pandemic influenza preparation plan?
The experience of severe acute respiratory syndrome (SARS) locally, nationally, and internationally underscored the importance of administrative, environmental, and personal protective infection control measures in health care facilities. In the aftermath of the SARS crisis, there was a need for a clearer understanding of Ontario’s capacity to manage suspected or confirmed cases of airborne infectious diseases. In so doing, the Walker Commission thought that more attention should be paid to the potential use of new technologies such as in-room air cleaning units. It recommended that the Medical Advisory Secretariat of the Ontario Ministry of Health and Long-Term Care evaluate the appropriate use and effectiveness of such new technologies.
Accordingly, the Ontario Health Technology Advisory Committee asked the Medical Advisory Secretariat to review the literature on the effectiveness and utility of in-room air cleaners that use high-efficiency particle air (HEPA) filters and ultraviolet germicidal irradiation (UVGI) air cleaning technology.
Additionally, the Ontario Health Technology Advisory Committee prioritized a request from the ministry’s Emergency Management Unit to investigate the possible role of the Plasmacluster ion air purifier manufactured by Sharp Electronics Corporation, in the pandemic influenza preparation plan.
Clinical Need
Airborne transmission of infectious diseases depends in part on the concentration of breathable infectious pathogens (germs) in room air. Infection control is achieved by a combination of administrative, engineering, and personal protection methods. Engineering methods that are usually carried out by the building’s heating, ventilation, and air conditioning (HVAC) system function to prevent the spread of airborne infectious pathogens by diluting (dilution ventilation) and removing (exhaust ventilation) contaminated air from a room, and by controlling the direction of airflow and the airflow patterns in a building. However, general wear and tear over time may compromise the HVAC system’s ability to maintain adequate indoor air quality. Likewise, economic issues may curtail the completion of necessary renovations to increase its effectiveness. Therefore, when exposure to airborne infectious pathogens is a risk, the use of an in-room air cleaner to reduce the concentration of airborne pathogens and prevent the spread of airborne infectious diseases has been proposed as an alternative to renovating an HVAC system.
Airborne transmission is the spread of infectious pathogens over large distances through the air. Infectious pathogens, which may include fungi, bacteria, and viruses, vary in size and can be dispersed into the air in drops of moisture after coughing or sneezing. Small drops of moisture carrying infectious pathogens are called droplet nuclei. Droplet nuclei are about 1 to 5 μm in diameter. This small size in part allows them to remain suspended in the air for several hours and be carried by air currents over considerable distances. Large drops of moisture carrying infectious pathogens are called droplets. Droplets, being larger than droplet nuclei, travel shorter distances (about 1 metre) before rapidly falling out of the air to the ground. Because droplet nuclei remain airborne for longer periods than do droplets, they are more amenable to engineering infection control methods than are droplets.
Droplet nuclei are responsible for the airborne transmission of infectious diseases such as tuberculosis, chicken pox (varicella), measles (rubeola), and disseminated herpes zoster, whereas close contact is required for the direct transmission of infectious diseases transmitted by droplets, such as influenza (the flu) and SARS.
The Technology
In-room air cleaners are supplied as portable or fixed devices. Fixed devices can be attached to either a wall or ceiling and are preferred over portable units because they have a greater degree of reliability (if installed properly) for achieving adequate room air mixing and airflow patterns, which are important for optimal effectiveness.
Through a method of air recirculation, an in-room air cleaner can be used to increase room ventilation rates and if used to exhaust air out of the room it can create a negative-pressure room for airborne infection isolation (AII) when the building’s HVAC system cannot do so. A negative-pressure room is one where clean air flows into the room but contaminated air does not flow out of it. Contaminated room air is pulled into the in-room air cleaner and cleaned by passing through a series of filters, which remove the airborne infectious pathogens. The cleaned air is either recirculated into the room or exhausted outside the building. By filtering contaminated room air and then recirculating the cleaned air into the room, an in-room air cleaner can improve the room’s ventilation. By exhausting the filtered air to the outside the unit can create a negative-pressure room. There are many types of in-room air cleaners. They vary widely in the airflow rates through the unit, the type of air cleaning technology used, and the technical design.
Crucial to maximizing the efficiency of any in-room air cleaner is its strategic placement and set-up within a room, which should be done in consultation with ventilation engineers, infection control experts, and/or industrial hygienists. A poorly positioned air cleaner may disrupt airflow patterns within the room and through the air cleaner, thereby compromising its air cleaning efficiency.
The effectiveness of an in-room air cleaner in removing airborne pathogens from room air depends on several factors, including the airflow rate through the unit’s filter and the airflow patterns in the room. Tested under a variety of conditions, in-room air cleaners, including portable or ceiling mounted units with either a HEPA or a non-HEPA filter, portable units with UVGI lights only, or ceiling mounted units with combined HEPA filtration and UVGI lights, have been estimated to be between 30% and 90%, 99%, and between 12% and 80% effective, respectively. Although their effectiveness is variable, the United States Centers for Disease Control and Prevention has acknowledged in-room air cleaners as an alternative technology for increasing room ventilation when this cannot be achieved by the building’s HVAC system, with preference given to fixed recirculating systems over portable ones.
Importantly, the use of an in-room air cleaner does not preclude either the need for health care workers and visitors to use personal protective equipment (N95 mask or equivalent) when entering AII rooms or health care facilities from meeting current regulatory requirements for airflow rates (ventilation rates) in buildings and airflow differentials for effective negative-pressure rooms.
The Plasmacluster ion technology, developed in 2000, is an air purification technology. Its manufacturer, Sharp Electronics Corporation, says that it can disable airborne microorganisms through the generation of both positive and negative ions. (1) The functional unit is the hydroxyl, which is comprised of one oxygen atom and one hydrogen atom.
The Plasmacluster ion air purifier uses a multilayer filter system composed of a prefilter, a carbon filter, an antibacterial filter, and a HEPA filter, combined with an ion generator to purify the air. The ion generator uses an alternating plasma discharge to split water molecules into positively and negatively charged ions. When these ions are emitted into the air, they are surrounded by water molecules and form cluster ions which are attracted to airborne particles. The cluster ion surrounds the airborne particle, and the positive and negative ions react to form hydroxyls. These hydroxyls steal the airborne particle’s hydrogen atom, which creates a hole in the particle’s outer protein membrane, thereby rendering it inactive.
Because influenza is primarily acquired by large droplets and direct and indirect contact with an infectious person, any in-room air cleaner will have little benefit in controlling and preventing its spread. Therefore, there is no role for the Plasmacluster ion air purifier or any other in-room air cleaner in the control of the spread of influenza. Accordingly, for purposes of this review, the Medical Advisory Secretariat presents no further analysis of the Plasmacluster.
Review Strategy
The objective of the systematic review was to determine the effectiveness of in-room air cleaners with built-in UVGI lights and HEPA filtration compared with those using HEPA filtration only.
The Medical Advisory Secretariat searched the databases of MEDLINE, EMBASE, Cochrane Database of Systematic Reviews, INAHTA (International Network of Agencies for Health Technology Assessment), Biosis Previews, Bacteriology Abstracts, Web of Science, Dissertation Abstracts, and NIOSHTIC 2.
A meta-analysis was conducted if adequate data were available from 2 or more studies and if statistical and clinical heterogeneity among studies was not an issue. Otherwise, a qualitative review was completed. The GRADE system was used to summarize the quality of the body of evidence comprised of 1 or more studies.
Summary of Findings
No existing health technology assessments on air cleaning technology were located during the literature review. The literature search yielded 59 citations, of which none were retained. One study was retrieved from a reference list of a guidance document from the United States Centers for Disease Control and Prevention. It evaluated an in-room air cleaner with combined UVGI lights and HEPA filtration under 2 conditions: UVGI lights on and UVGI lights off. Experiments were performed using different ventilation rates and using an aerosolized pathogen comprised of Mycobacterium parafortuitum, a surrogate for the bacterium that causes tuberculosis. Effectiveness was measured as equivalent air changes per hour (eACH). This single study formed the body of evidence for our systematic review research question.
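The eACH measure can be illustrated with a minimal sketch. It assumes a well-mixed room in which the airborne concentration decays exponentially, so that the cleaner's equivalent air changes per hour is the decay rate with the cleaner running minus the decay rate without it. This is a common way to define the quantity, not necessarily the exact procedure of the study described above; the function names and the concentrations are made up.

```python
import math


def decay_rate_per_hour(c_start, c_end, hours):
    """First-order decay rate k (per hour) from C(t) = C0 * exp(-k * t)."""
    if c_start <= 0 or c_end <= 0 or hours <= 0:
        raise ValueError("concentrations and elapsed time must be positive")
    return math.log(c_start / c_end) / hours


def equivalent_ach(rate_cleaner_on, rate_cleaner_off):
    """eACH attributed to the cleaner: extra decay beyond the baseline."""
    return rate_cleaner_on - rate_cleaner_off


# Usage with made-up concentrations measured 30 minutes apart.
rate_on = decay_rate_per_hour(1000.0, 1000.0 * math.exp(-8 * 0.5), 0.5)
rate_off = decay_rate_per_hour(1000.0, 1000.0 * math.exp(-2 * 0.5), 0.5)
each = equivalent_ach(rate_on, rate_off)
```

In this example the baseline removal is 2 air changes per hour and the total with the cleaner is 8, so the cleaner contributes about 6 eACH.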
Experimental Results
The eACH rate for the HEPA-UVGI in-room air cleaner was statistically significantly greater when the UV lights were on compared with when the UV lights were off (P < .05). However, subsequent experiments could not attribute this to the UVGI. Consequently, the results are inconclusive and an estimate of effect (benefit) is uncertain.
The study was reviewed by a scientific expert and rated moderate for quality. Further analysis determined that there was some uncertainty in the directness of the outcome measure (eACH); thus, the GRADE level for the quality of the evidence was low, indicating that an estimate of effect is very uncertain.
There is uncertainty in the benefits of using in-room air cleaners with combined UVGI lights and HEPA filtration over systems that use HEPA filtration alone. However, there are no known risks to using systems with combined UVGI and HEPA technology compared with those with HEPA alone. Using an in-room air cleaner with combined UVGI and HEPA technology instead of one with HEPA alone increases the burden of cost, including capital costs (cost of the device), operating costs (electricity usage), and maintenance costs (cleaning and replacement of UVGI lights). Given the uncertainty of the estimate of benefits, an in-room air cleaner with HEPA technology only may be an equally reasonable alternative to using one with combined UVGI and HEPA technology.
In-room air cleaners may be used to protect health care staff from airborne infectious pathogens such as those that cause tuberculosis, chicken pox, measles, and disseminated herpes zoster. In addition, although in-room air cleaners are not effective at protecting staff and preventing the spread of droplet-transmitted diseases such as influenza and SARS, they may be deployed in situations with a novel/emerging infectious agent whose epidemiology is not yet defined and where airborne transmission is suspected.
It is preferable that in-room air cleaners be used with a fixed and permanent room placement when ventilation requirements must be improved and the HVAC system cannot be used. However, for acute (temporary) situations where a novel/emerging infectious agent presents whose epidemiology is not yet defined and where airborne transmission is suspected, it may be prudent to use the in-room air cleaner as a portable device until the mode of transmission is confirmed. To maximize effectiveness, consultation with an environmental engineer and infection control expert should be undertaken before using an in-room air cleaner, and protocols for maintenance and monitoring of these devices should be in place.
If properly installed and maintained, in-room air cleaners with HEPA or combined HEPA and UVGI air cleaning technology are effective in removing airborne pathogens. However, there is only weak evidence available at this time regarding the benefit of using an in-room air cleaner with combined HEPA and UVGI air cleaning technology instead of those with HEPA filter technology only.
PMCID: PMC3382390  PMID: 23074468
25.  Saturating representation of loop conformational fragments in structure databanks 
Short fragments of proteins are fundamental starting points in various structure prediction applications, such as in fragment based loop modeling methods but also in various full structure build-up procedures. The applicability and performance of these approaches depend on the availability of short fragments in structure databanks.
We studied the representation of protein loop fragments up to 14 residues in length. All possible query fragments found in sequence databases (Sequence Space) were clustered and cross-referenced with available structural fragments in the Protein Data Bank (Structure Space). We found that the expansion of the PDB in the last few years resulted in a dense coverage of loop conformational fragments. For every loop of length 8 in the current Sequence Space there is at least one loop in Structure Space with 50% or higher sequence identity. By correlating sequence and structure clusters of loops we found that a 50% sequence identity generally guarantees structural similarity. The percentages of coverage at the 50% sequence identity cutoff drop to 96, 94, 68, 53, 33 and 13% for loops of length 9, 10, 11, 12, 13, and 14, respectively. There is not a single loop in the current Sequence Space at any length up to 14 residues that is not matched with a conformational segment that shares at least 20% sequence identity. This minimum observed identity is 40% for loops of 12 residues or shorter and is as high as 50% for loops of 10 residues or shorter. We also assessed the impact of rapidly growing sequence databanks on the estimated number of new loop conformations and found that while the number of unique sequence segments increased about sixfold during the last five years, there are almost no unique conformational segments among these fragments of up to 12 residues.
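The coverage statistic can be illustrated with a minimal sketch. It assumes ungapped, position-by-position identity between fragments of equal length, which simplifies the clustering procedure the abstract describes; the function names and example sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of identical residues at matching positions."""
    if len(a) != len(b) or not a:
        raise ValueError("fragments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def coverage(queries, structures, cutoff=50.0):
    """Percentage of query loops with at least one structural loop
    of the same length at or above the identity cutoff."""
    if not queries:
        raise ValueError("no query fragments given")
    covered = sum(
        any(len(q) == len(s) and percent_identity(q, s) >= cutoff
            for s in structures)
        for q in queries
    )
    return 100.0 * covered / len(queries)


# Usage with made-up 8-residue loops.
identity = percent_identity("GSGDNGKT", "GSADNAKT")              # 6 of 8 match
covered_pct = coverage(["GSGDNGKT", "PPPPPPPP"], ["GSADNAKT"])   # 1 of 2 covered
```

Raising `cutoff` or lengthening the loops would lower the coverage, which is the trend the abstract reports for loops of 9 to 14 residues.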
The results suggest that fragment based prediction approaches are not limited any more by the completeness of fragments in databanks but rather by the effective scoring and search algorithms to locate them. The current favorable coverage and trends observed will be further accentuated with the progress of Protein Structure Initiative that targets new protein folds and ultimately aims at providing an exhaustive coverage of the structure space.
PMCID: PMC1574324  PMID: 16820050
