Situs is a modular and widely used software package for the integration of biophysical data across spatial resolution scales. It has been developed over the last decade with a focus on bridging the resolution gap between atomic structures, coarse-grained models, and volumetric data from low-resolution biophysical origins, such as electron microscopy, tomography, or small-angle scattering. Structural models can be created and refined with various flexible and rigid-body docking strategies. The software consists of multiple stand-alone programs for the format conversion, analysis, visualization, manipulation, and assembly of 3D data sets. The programs have been ported to numerous platforms, in both serial and shared-memory parallel implementations, and can be combined in various ways for specific modeling applications. The modular design facilitates the updating of individual programs and the development of novel application workflows. This review provides an overview of the Situs package as it exists today, with an emphasis on functionality and workflows supported by version 2.5.
Electronic supplementary material
The online version of this article (doi:10.1007/s12551-009-0026-3) contains supplementary material, which is available to authorized users.
Structural models; 3D data sets; Multi-platform; Modeling
Gray scale images make up the bulk of data in biomedical image analysis, and hence the main focus of many image processing tasks lies in the processing of these monochrome images. With ever-improving acquisition devices, spatial and temporal image resolution increases, and data sets become very large.
Various image processing frameworks exist that make the development of new algorithms easy by using high-level programming languages or visual programming. These frameworks are also accessible to researchers who have little or no background in software development, because they take care of otherwise complex tasks. Specifically, the management of working memory is handled automatically, usually at the price of requiring more of it. As a result, processing large data sets with these tools becomes increasingly difficult on workstation-class computers.
One alternative to using these high-level processing tools is the development of new algorithms in a language like C++, which gives the developer full control over how memory is handled; however, the resulting workflow for prototyping new algorithms is rather time-intensive, and again not well suited to a researcher with little or no background in software development.
Another alternative is to use command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation through shell scripts. Although not as convenient as, e.g., visual programming, this approach is still accessible to researchers without a background in computer science. However, only a few tools exist that provide this kind of processing interface; they are usually quite task-specific, and they do not offer a clear path from a prototype shell script to a new command line tool.
The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that makes it possible to run image processing tasks interactively in a command shell and to prototype by using the corresponding shell scripting language. Since the hard disk serves as temporary storage, memory management is usually a non-issue in the prototyping phase. By using string-based descriptions for filters, optimizers, and the like, the transition from shell scripts to full-fledged programs implemented in C++ is also made easy. In addition, its design based on atomic plug-ins and single-task command line tools makes MIA easy to extend, usually without the need to touch or recompile existing code.
In this article, we describe the general design of MIA, a general-purpose framework for gray scale image processing. We demonstrate the applicability of the software with example applications from three different research scenarios, namely motion compensation in myocardial perfusion imaging, the processing of high-resolution image data that arises in virtual anthropology, and retrospective analysis of treatment outcome in orthognathic surgery. With MIA, prototyping algorithms by using shell scripts that combine small, single-task command line tools is a viable alternative to the use of high-level languages, an approach that is especially useful when large data sets need to be processed.
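The string-based descriptions mentioned above can be illustrated with a short sketch. MIA's actual plugin grammar is not reproduced here; the "name:param=value" syntax, the filter names, and the function names below are all hypothetical, chosen only to show how a textual description can map onto a plugin registry:

```python
# Minimal sketch of the string-based filter-description idea.
# The filter names and the "name:param=value" syntax are illustrative,
# not MIA's actual plugin grammar.

def parse_filter(description):
    """Turn a string like 'scale:factor=2' into (name, params)."""
    name, _, rest = description.partition(":")
    params = {}
    if rest:
        for pair in rest.split(","):
            key, _, value = pair.partition("=")
            params[key] = float(value)
    return name, params

# A tiny registry standing in for a plugin system.
FILTERS = {
    "scale": lambda data, factor=1.0: [x * factor for x in data],
    "shift": lambda data, offset=0.0: [x + offset for x in data],
}

def run_pipeline(data, descriptions):
    """Apply a chain of string-described filters to the data."""
    for description in descriptions:
        name, params = parse_filter(description)
        data = FILTERS[name](data, **params)
    return data

result = run_pipeline([1.0, 2.0], ["scale:factor=2", "shift:offset=0.5"])
print(result)  # [2.5, 4.5]
```

Because the same description strings can be passed on a command line, embedded in a shell script, or handed to a C++ plugin factory, the prototype and the final program can share one configuration vocabulary.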
The molecular graphics program Sculptor and the command-line suite Situs are software packages for the integration of biophysical data across spatial resolution scales. Herein, we provide an overview of recently developed tools relevant to cryo-electron tomography (cryo-ET), with an emphasis on functionality supported by Situs 2.7 and Sculptor 2.1. We describe a workflow for automatically segmenting filaments in cryo-ET maps, including denoising, local normalization, feature detection, and tracing. Tomograms of cellular actin networks exhibit both cross-linked and bundled filament densities. Such filamentous regions in cryo-ET data sets can then be segmented using a stochastic template-based search, VolTrac. The approach combines a genetic algorithm and a bidirectional expansion with a tabu search strategy to localize and characterize filamentous regions. The automated filament segmentation by VolTrac compares well to a manual one performed by expert users, and it allows an efficient and reproducible analysis of large data sets. The software is free, open source, and can be used on Linux, Macintosh, or Windows computers.
Tomograms; 3D analysis; filament detection; actin networks; denoising; segmentation
Microsatellites (MSs) are DNA markers with high analytical power that are widely used in population genetics, genetic mapping, and forensic studies. Currently available software solutions for high-throughput MS design (i) have shortcomings in detecting and distinguishing imperfect and perfect MSs, (ii) often lack necessary interactive design steps, and (iii) do not allow for the development of primers for multiplex amplifications. We present a set of new tools implemented as extensions to the STADEN package, which provides the backbone functionality for flexible sequence analysis workflows. The ability to assemble overlapping reads into unique contigs (provided by the base functionality of the STADEN package) is important to avoid developing redundant markers, a feature missing from most other similar tools.
Our extensions to the STADEN package provide the following functionality to facilitate microsatellite (and also minisatellite) marker design: The new modules (i) integrate the state-of-the-art tandem repeat detection and analysis software PHOBOS into workflows, (ii) provide two separate repeat detection steps – with different search criteria – one for masking repetitive regions during assembly of sequencing reads and the other for designing repeat-flanking primers for MS candidate loci, (iii) incorporate the widely used primer design program PRIMER3 into STADEN workflows, enabling the interactive design and visualization of flanking primers for microsatellites, and (iv) provide the functionality to find optimal locus and primer-pair combinations for multiplex primer design. Furthermore, our extensions include a module for storing analysis results in an SQLite database, providing a transparent solution for data access from within as well as from outside of the STADEN package.
The STADEN package is enhanced by our modules into a highly flexible, high-throughput, interactive tool for conventional and multiplex microsatellite marker design. It gives the user detailed control over the workflow, enabling flexible combinations of manual and automated analysis steps. The software is available under the OpenBSD License [1,2]. The high efficiency of our automated marker design workflow has been confirmed in three microsatellite development projects.
There is a significant demand in the life sciences for creating pipelines or workflows that chain a number of discrete compute- and data-intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments that are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, significant overhead required to configure tools, and the inability for users to simply manage files across heterogeneous HPC storage infrastructure.
In this paper, we describe the architecture of a software system that is adaptable to a range of both pluggable execution and data backends in an open source implementation called Yabi. Enabling seamless and transparent access to heterogeneous HPC environments at its core, Yabi then provides an analysis workflow environment that can create and reuse workflows as well as manage large amounts of both raw and processed data in a secure and flexible way across geographically distributed compute resources. Yabi can be used via a web-based environment to drag-and-drop tools to create sophisticated workflows. Yabi can also be accessed through the Yabi command line, which is designed for users who are more comfortable with writing scripts or for enabling external workflow environments to leverage the features in Yabi. Configuring tools can be a significant overhead in workflow environments. Yabi greatly simplifies this task by enabling system administrators to configure as well as manage running tools via a web-based environment and without the need to write or edit software programs or scripts. In this paper, we highlight Yabi's capabilities through a range of bioinformatics use cases that arise from large-scale biomedical data analysis.
The Yabi system encapsulates a considered design of both execution and data models, while abstracting technical details away from users who are not skilled in HPC and providing an intuitive, scalable, drag-and-drop web-based workflow environment where the same tools can also be accessed via a command line. Yabi is currently in use and deployed at multiple institutions and is available at http://ccg.murdoch.edu.au/yabi.
Bioinformatics; workflows; Internet; high performance computing
This paper analyzes the workflow and implementation of electronic health record (EHR) systems across different functions in small physician offices. We characterize the differences in the offices based on the levels of computerization in terms of workflow, sources of time delay, and barriers to using EHR systems to support the entire workflow. The study was based on a combination of questionnaires, interviews, in situ observations, and data collection efforts. This study was not intended to be a full-scale time-and-motion study with precise measurements but was intended to provide an overview of the potential sources of delays while performing office tasks. The study follows an interpretive model of case studies rather than a large-sample statistical survey of practices. To identify time-consuming tasks, workflow maps were created based on the aggregated data from the offices. The results from the study show that specialty physicians are more favorable toward adopting EHR systems than primary care physicians are. The barriers to adoption of EHR systems by primary care physicians can be attributed to the complex workflows that exist in primary care physician offices, leading to nonstandardized workflow structures and practices. Also, primary care physicians would benefit more from EHR systems if the systems could interact with external entities.
Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services.
We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems.
We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
In biomedical research, a huge variety of different techniques is currently available for the structural examination of small specimens, including conventional light microscopy (LM), transmission electron microscopy (TEM), confocal laser scanning microscopy (CLSM), microscopic X-ray computed tomography (microCT), and many others. Since every imaging method is physically limited by certain parameters, a correlative use of complementary methods often yields a significantly broader range of information. Here we demonstrate the advantages of the correlative use of microCT, light microscopy, and transmission electron microscopy for the analysis of small biological samples.
We used a small juvenile bivalve mollusc (Mytilus galloprovincialis, approximately 0.8 mm length) to demonstrate the workflow of a correlative examination by microCT, LM serial section analysis, and TEM-re-sectioning. Initially these three datasets were analyzed separately, and subsequently they were fused in one 3D scene. This workflow is very straightforward. The specimen was processed as usual for transmission electron microscopy including post-fixation in osmium tetroxide and embedding in epoxy resin. Subsequently it was imaged with microCT. Post-fixation in osmium tetroxide yielded sufficient X-ray contrast for microCT imaging, since the X-ray absorption of epoxy resin is low. Thereafter, the same specimen was serially sectioned for LM investigation. The serial section images were aligned and specific organ systems were reconstructed based on manual segmentation and surface rendering. According to the region of interest (ROI), specific LM sections were detached from the slides, re-mounted on resin blocks and re-sectioned (ultrathin) for TEM. For analysis, image data from the three different modalities was co-registered into a single 3D scene using the software AMIRA®. We were able to register both the LM section series volume and TEM slices neatly to the microCT dataset, with small geometric deviations occurring only in the peripheral areas of the specimen. Based on co-registered datasets the excretory organs, which were chosen as ROI for this study, could be investigated regarding both their ultrastructure as well as their position in the organism and their spatial relationship to adjacent tissues. We found structures typical for mollusc excretory systems, including ultrafiltration sites at the pericardial wall, and ducts leading from the pericardium towards the kidneys, which exhibit a typical basal infolding system.
The presented approach allows a comprehensive analysis and presentation of small objects regarding both the overall organization as well as cellular and subcellular details. Although our protocol involves a variety of different equipment and procedures, we maintain that it offers savings in both effort and cost. Co-registration of datasets from different imaging modalities can be accomplished with high-end desktop computers and offers new opportunities for understanding and communicating structural relationships within organisms and tissues. In general, the correlative use of different microscopic imaging techniques will continue to become more widespread in morphological and structural research in zoology. Classical TEM serial section investigations are extremely time consuming, and modern methods for 3D analysis of ultrastructure such as SBF-SEM and FIB-SEM are limited to very small volumes for examination. Thus, the re-sectioning of LM sections is suitable for substantially speeding up TEM examination, while microCT could become a key method for complementing ultrastructural examinations.
Advances in electron cryo-microscopy have enabled structure determination of macromolecules at near-atomic resolution. However, structure determination, even using de novo methods, remains susceptible to model bias and overfitting. Here we describe a complete workflow for data acquisition, image processing, all-atom modelling and validation of brome mosaic virus, an RNA virus. Data were collected with a direct electron detector in integrating mode and an exposure beyond the traditional radiation damage limit. The final density map has a resolution of 3.8 Å as assessed by two independent data sets and maps. We used the map to derive an all-atom model with a newly implemented real-space optimization protocol. The validity of the model was verified by its match with the density map and a previous model from X-ray crystallography, as well as the internal consistency of models from independent maps. This study demonstrates a practical approach to obtain a rigorously validated atomic resolution electron cryo-microscopy structure.
Recent developments in cryo-electron microscopy have enabled structure determination of large protein complexes at almost atomic resolution. Wang et al. combine some of these technologies into an effective workflow, and demonstrate the protocol by solving the atomic structure of an icosahedral RNA virus.
A suite of GUI programs written in MATLAB for advanced data collection and analysis of full-field transmission X-ray microscopy data including mosaic imaging, tomography and XANES imaging is presented.
Transmission X-ray microscopy (TXM) has been well recognized as a powerful tool for non-destructive investigation of the three-dimensional inner structure of a sample with spatial resolution down to a few tens of nanometers, especially when combined with synchrotron radiation sources. Recent developments of this technique have presented a need for new tools for both system control and data analysis. Here a software package developed in MATLAB for script command generation and analysis of TXM data is presented. The first toolkit, the script generator, allows automating complex experimental tasks which involve up to several thousand motor movements. The second package was designed to accomplish computationally intense tasks such as data processing of mosaic and mosaic tomography datasets; dual-energy contrast imaging, where data are recorded above and below a specific X-ray absorption edge; and TXM X-ray absorption near-edge structure imaging datasets. Furthermore, analytical and iterative tomography reconstruction algorithms were implemented. The compiled software package is freely available.
X-ray microscopy; full-field; tomography; XANES imaging
Metabolomics is a systems approach to the analysis of cellular processes through small-molecule metabolite profiling. Standardisation of sample handling and acquisition approaches has contributed to reproducibility. However, the development of robust methods for the analysis of metabolomic data is a work-in-progress. The tools that do exist are often not well integrated, requiring manual data handling and custom scripting on a case-by-case basis. Furthermore, existing tools often require experience with programming environments such as MATLAB® or R to use, limiting accessibility. Here we present Pathomx, a workflow-based tool for the processing, analysis and visualisation of metabolomic and associated data in an intuitive and extensible environment.
The core application provides a workflow editor, IPython kernel and a HumanCyc™-derived database of metabolites, proteins and genes. Toolkits provide reusable tools that may be linked together to create complex workflows. Pathomx is released with a base set of plugins for the import, processing and visualisation of data. The IPython backend provides integration with existing platforms including MATLAB® and R, allowing data to be seamlessly transferred. Pathomx is supplied with a series of demonstration workflows and datasets. To demonstrate the use of the software we here present an analysis of 1D and 2D 1H NMR metabolomic data from a model system of mammalian cell growth under hypoxic conditions.
Pathomx is a useful addition to the analysis toolbox. The intuitive interface lowers the barrier to entry for non-experts, while scriptable tools and integration with existing tools supports complex analysis. We welcome contributions from the community.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0396-9) contains supplementary material, which is available to authorized users.
Metabolomics; Omics; NMR; Analysis; Visualisation; Workflow; Automation; Python
In bioinformatics projects, scientific workflow systems are widely used to manage computational procedures. Full-featured workflow systems have been proposed to fulfil the demand for workflow management. However, such systems tend to be too heavyweight for actual bioinformatics practice. We realize that quick deployment of cutting-edge software implementing advanced algorithms and data formats, and continuous adaptation to changes in computational resources and the environment, are often prioritized in scientific workflow management. These features have a greater affinity with the agile software development method of iterating development phases through trial and error.
Here, we show the application of a scientific workflow system Pwrake to bioinformatics workflows. Pwrake is a parallel workflow extension of Ruby's standard build tool Rake, the flexibility of which has been demonstrated in the astronomy domain. Therefore, we hypothesize that Pwrake also has advantages in actual bioinformatics workflows.
We implemented the Pwrake workflows to process next generation sequencing data using the Genomic Analysis Toolkit (GATK) and Dindel. GATK and Dindel workflows are typical examples of sequential and parallel workflows, respectively. We found that in practice, actual scientific workflow development iterates over two phases, the workflow definition phase and the parameter adjustment phase. We introduced separate workflow definitions to help focus on each of the two developmental phases, as well as helper methods to simplify the descriptions. This approach increased iterative development efficiency. Moreover, we implemented combined workflows to demonstrate modularity of the GATK and Dindel workflows.
Pwrake enables agile management of scientific workflows in the bioinformatics domain. Its internal domain-specific language, built on Ruby, gives rakefiles the flexibility needed for writing scientific workflows. Furthermore, the readability and maintainability of rakefiles may facilitate sharing workflows within the scientific community. Workflows for GATK and Dindel are available at http://github.com/misshie/Workflows.
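Pwrake inherits Rake's file-task model, in which an output file is rebuilt only when a prerequisite is missing or newer, so re-running a workflow after a parameter change redoes only the affected steps. Rake itself is a Ruby tool; the following is a minimal Python sketch of that dependency-driven idea, with hypothetical function names, and is not Pwrake's actual syntax:

```python
# Sketch of a Rake-style file task: rebuild a target only when it is absent
# or older than its prerequisite. Names are hypothetical, not Pwrake's API.
import os

def file_task(target, prerequisite, action):
    """Run `action` only if `target` is missing or stale; report whether it ran."""
    if (not os.path.exists(target)
            or os.path.getmtime(target) < os.path.getmtime(prerequisite)):
        action(prerequisite, target)
        return True   # task executed
    return False      # target is up to date, nothing to do

def copy_upper(src, dst):
    # Stand-in for a real processing step (e.g. an alignment or variant call).
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(fin.read().upper())

# Usage: the first invocation builds the target, the second is a no-op.
with open("input.txt", "w") as f:
    f.write("reads")
ran_first = file_task("output.txt", "input.txt", copy_upper)
ran_second = file_task("output.txt", "input.txt", copy_upper)
print(ran_first, ran_second)  # True False
```

This timestamp check is what makes the two-phase development cycle described above cheap: during parameter adjustment, only tasks downstream of a changed file are re-executed.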
We demonstrate that it is feasible to determine high-resolution protein structures by electron crystallography of three-dimensional crystals in an electron cryo-microscope (CryoEM). Lysozyme microcrystals were frozen on an electron microscopy grid, and electron diffraction data collected to 1.7 Å resolution. We developed a data collection protocol to collect a full-tilt series in electron diffraction to atomic resolution. A single tilt series contains up to 90 individual diffraction patterns collected from a single crystal with tilt angle increment of 0.1–1° and a total accumulated electron dose less than 10 electrons per angstrom squared. We indexed the data from three crystals and used them for structure determination of lysozyme by molecular replacement followed by crystallographic refinement to 2.9 Å resolution. This proof of principle paves the way for the implementation of a new technique, which we name ‘MicroED’, that may have wide applicability in structural biology.
X-ray crystallography has been used to work out the atomic structure of a large number of proteins. In a typical X-ray crystallography experiment, a beam of X-rays is directed at a protein crystal, which scatters some of the X-ray photons to produce a diffraction pattern. The crystal is then rotated through a small angle and another diffraction pattern is recorded. Finally, after this process has been repeated enough times, it is possible to work backwards from the diffraction patterns to figure out the structure of the protein.
The crystals used for X-ray crystallography must be large to withstand the damage caused by repeated exposure to the X-ray beam. However, some proteins do not form crystals at all, and others only form small crystals. It is possible to overcome this problem by using extremely short pulses of X-rays, but this requires a very large number of small crystals and ultrashort X-ray pulses are only available at a handful of research centers around the world. There is, therefore, a need for other approaches that can determine the structure of proteins that only form small crystals.
Electron crystallography is similar to X-ray crystallography in that a protein crystal scatters a beam to produce a diffraction pattern. However, the interactions between the electrons in the beam and the crystal are much stronger than those between the X-ray photons and the crystal. This means that meaningful amounts of data can be collected from much smaller crystals. However, it is normally only possible to collect one diffraction pattern from each crystal because of beam induced damage. Researchers have developed methods to merge the diffraction patterns produced by hundreds of small crystals, but to date these techniques have only worked with very thin two-dimensional crystals that contain only one layer of the protein of interest.
Now Shi et al. report a new approach to electron crystallography that works with very small three-dimensional crystals. Called MicroED, this technique involves placing the crystal in a transmission electron cryo-microscope, which is a fairly standard piece of equipment in many laboratories. The standard ‘low-dose’ electron beam in one of these microscopes would normally damage the crystal after a single diffraction pattern had been collected. However, Shi et al. realized that it was possible to obtain diffraction patterns without severely damaging the crystal if they dramatically reduced the intensity of the normal low-dose electron beam. By reducing the electron dose by a factor of 200, it was possible to collect up to 90 diffraction patterns from the same, very small, three-dimensional crystal, and then—similar to what happens in X-ray crystallography—work backwards to figure out the structure of the protein. Shi et al. demonstrated the feasibility of the MicroED approach by using it to determine the structure of lysozyme, which is widely used as a test protein in crystallography, at a resolution of 2.9 Å. This proof-of-principle study paves the way for crystallographers to study proteins that cannot be studied with existing techniques.
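The dose budget quoted in the abstract above (up to 90 diffraction patterns within a total accumulated dose of under 10 electrons per square angstrom) can be checked with simple arithmetic; the tilt-range figures assume the stated 0.1–1° increments:

```python
# Dose-budget arithmetic for the MicroED tilt series described above.
total_dose = 10.0   # upper bound, electrons per square angstrom
patterns = 90       # diffraction patterns per tilt series

# Average dose available per diffraction pattern.
per_pattern = total_dose / patterns
print(round(per_pattern, 3))  # ~0.111 e/A^2 per pattern

# Angular coverage: n patterns span n - 1 tilt increments.
increments = (0.1, 1.0)  # degrees per step, range given in the abstract
tilt_range = tuple(round((patterns - 1) * inc, 1) for inc in increments)
print(tilt_range)  # (8.9, 89.0) degrees, depending on the increment
```

The per-pattern dose of roughly 0.1 e/Å² is what makes it possible to record an entire tilt series from one crystal before radiation damage accumulates.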
electron crystallography; electron diffraction; electron cryomicroscopy (cryo-EM); MicroED; protein structure; microcrystals
Circuitry mapping of metazoan neural systems is difficult because canonical neural regions (regions containing one or more copies of all components) are large, regional borders are uncertain, neuronal diversity is high, and potential network topologies are so numerous that only anatomical ground truth can resolve them. Complete mapping of a specific network requires synaptic resolution, canonical region coverage, and robust neuronal classification. Though transmission electron microscopy (TEM) remains the optimal tool for network mapping, the process of building large serial section TEM (ssTEM) image volumes is rendered difficult by the need to precisely mosaic distorted image tiles and register distorted mosaics. Moreover, most molecular neuronal class markers are poorly compatible with optimal TEM imaging. Our objective was to build a complete framework for ultrastructural circuitry mapping. This framework combines strong TEM-compliant small molecule profiling with automated image tile mosaicking, automated slice-to-slice image registration, and gigabyte-scale image browsing for volume annotation. Specifically, we show how ultrathin molecular profiling datasets and their resultant classification maps can be embedded into ssTEM datasets and how scripted acquisition tools (SerialEM), mosaicking and registration (ir-tools), and large slice viewers (MosaicBuilder, Viking) can be used to manage terabyte-scale volumes. These methods enable large-scale connectivity analyses of new and legacy data. In well-posed tasks (e.g., complete network mapping in retina), terabyte-scale image volumes that previously would require decades of assembly can now be completed in months. Perhaps more importantly, the fusion of molecular profiling, image acquisition by SerialEM, ir-tools volume assembly, and data viewers/annotators also allows ssTEM to be used as a prospective tool for discovery in nonneural systems and a practical screening methodology for neurogenetics.
Finally, this framework provides a mechanism for parallelization of ssTEM imaging, volume assembly, and data analysis across an international user base, enhancing the productivity of a large cohort of electron microscopists.
Building an accurate neural network diagram of the vertebrate nervous system is a major challenge in neuroscience. Diverse groups of neurons that function together form complex patterns of connections often spanning large regions of brain tissue, with uncertain borders. Although serial-section transmission electron microscopy remains the optimal tool for fine anatomical analyses, the time and cost of the undertaking has been prohibitive. We have assembled a complete framework for ultrastructural mapping using conventional transmission electron microscopy that tremendously accelerates image analysis. This framework combines small-molecule profiling to classify cells, automated image acquisition, automated mosaic formation, automated slice-to-slice image registration, and large-scale image browsing for volume annotation. Terabyte-scale image volumes requiring decades or more to assemble manually can now be automatically built in a few months. This makes serial-section transmission electron microscopy practical for high-resolution exploration of all complex tissue systems (neural or nonneural) as well as for ultrastructural screening of genetic models.
A framework for analysis of terabyte-scale serial-section transmission electron microscopic (ssTEM) datasets overcomes computational barriers and accelerates high-resolution tissue analysis, providing a practical way of mapping complex neural circuitry and an effective screening tool for neurogenetics.
There has been a dramatic increase in the amount of quantitative data derived from the measurement of changes at different levels of biological complexity during the post-genomic era. However, there are a number of issues associated with the use of computational tools employed for the analysis of such data. For example, computational tools such as R and MATLAB require prior knowledge of their programming languages in order to implement statistical analyses on data. Combining two or more tools in an analysis may also be problematic since data may have to be manually copied and pasted between separate user interfaces for each tool. Furthermore, this transfer of data may require a reconciliation step in order for there to be interoperability between computational tools.
Developments in the Taverna workflow system have enabled pipelines to be constructed and enacted for generic and ad hoc analyses of quantitative data. Here, we present an example of such a workflow involving the statistical identification of differentially-expressed genes from microarray data, followed by the annotation of their relationships to cellular processes. This workflow makes use of customised maxdBrowse web services, a system that allows Taverna to query and retrieve gene expression data from the maxdLoad2 microarray database. These data are then analysed in R to identify differentially-expressed genes, using the Taverna RShell processor, which was developed to invoke R when it is deployed as a service using the RServe library. In addition, the workflow uses Beanshell scripts to reconcile mismatches of data between services, as well as to implement a form of user interaction for selecting subsets of microarray data for analysis during the workflow execution. A new plugin system in the Taverna software architecture is demonstrated by the use of renderers for displaying PDF files and CSV-formatted data within the Taverna workbench.
Taverna can be used by data analysis experts as a generic tool for composing ad hoc analyses of quantitative data by combining the use of scripts written in the R programming language with tools exposed as services in workflows. When these workflows are shared with colleagues and the wider scientific community, they provide an approach for other scientists wanting to use tools such as R without having to learn the corresponding programming language to analyse their own data.
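For illustration, the statistical step that the workflow delegates to R — per-gene significance testing of expression differences — can be sketched outside any workflow engine. The sketch below uses Python with NumPy/SciPy rather than R, and the gene names, group sizes and effect size are invented for the example.

```python
# A minimal sketch of differential-expression testing (here with SciPy's
# t-test rather than R); gene names and expression values are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
genes = ["geneA", "geneB", "geneC"]
control = rng.normal(5.0, 0.5, size=(3, 4))  # 3 genes x 4 replicates
treated = rng.normal(5.0, 0.5, size=(3, 4))
treated[0] += 2.0                            # simulate geneA up-regulation

# Per-gene two-sample t-test, a stand-in for the workflow's R step.
p_values = {g: stats.ttest_ind(control[i], treated[i]).pvalue
            for i, g in enumerate(genes)}
significant = [g for g, p in p_values.items() if p < 0.05]
print(significant)
```

In the actual workflow this computation runs inside R via the RShell processor; the point here is only the shape of the statistical step that the pipeline automates.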
The 155-kDa plasma glycoprotein factor H (FH), which consists of 20 complement control protein (CCP) modules, protects self-tissue but not foreign organisms from damage by the complement cascade. Protection is achieved by selective engagement of FH, via CCPs 1–4, CCPs 6–8 and CCPs 19–20, with polyanion-rich host surfaces that bear covalently attached, activation-specific, fragments of complement component C3. The role of intervening CCPs 9–18 in this process is obscured by lack of structural knowledge. We have concatenated new high-resolution solution structures of overlapping recombinant CCP pairs, 10–11 and 11–12, to form a three-dimensional structure of CCPs 10–12 and validated it by small-angle X-ray scattering of the recombinant triple‐module fragment. Superimposing CCP 12 of this 10–12 structure with CCP 12 from the previously solved CCP 12–13 structure yielded an S-shaped structure for CCPs 10–13 in which modules are tilted by 80–110° with respect to immediate neighbors, but the bend between CCPs 10 and 11 is counter to the arc traced by CCPs 11–13. Including this four-CCP structure in interpretation of scattering data for the longer recombinant segments, CCPs 10–15 and 8–15, implied flexible attachment of CCPs 8 and 9 to CCP 10 but compact and intimate arrangements of CCP 14 with CCPs 12, 13 and 15. Taken together with difficulties in recombinant production of module pairs 13–14 and 14–15, the aberrant structure of CCP 13 and the variability of 13–14 linker sequences among orthologues, a structural dependency of CCP 14 on its neighbors is suggested; this has implications for the FH mechanism.
► The 20-CCP‐module human protein FH prevents complement-mediated tissue damage. ► NMR structures of CCPs 10–11 and 11–12 suggest that this region enhances flexional strength of FH. ► Concatenating bi-modules helps interpret small‐angle X‐ray scattering data, revealing highly compacted arrangement of CCPs 13, 14 and 15. ► Apparent structural dependency of CCP 14 on neighbors could provide a switch between ordered and flexible FH architectures.
CCP, complement control protein; CR1, complement receptor type 1; DAF, decay accelerating factor; FH, factor H; EOM, ensemble optimization method; HSQC, heteronuclear single quantum coherence; MCP, membrane cofactor protein; NOE, nuclear Overhauser enhancement; SAXS, small-angle X-ray scattering; TOCSY, total correlated spectroscopy; protein NMR; protein domains; complement system; small-angle X-ray scattering; regulators of complement activation
New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples.
AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated. 
This workflow was run once on the same 96 samples that the group had examined manually; it cycled successfully through all of the samples, collected data from the same samples that had been selected manually, and located the same peaks of unmodeled density in the resulting difference Fourier maps.
AutoDrug; fragment-based drug discovery; workflow automation
Recently, the availability of high-resolution microscopy, together with advances in the development of biomarkers as reporters of biomolecular interactions, has increased the importance of imaging methods in molecular cell biology. These techniques enable the investigation of cellular characteristics such as volume, size and geometry, as well as the volume and geometry of intracellular compartments and the amount of proteins present, in a spatially resolved manner. Such detailed investigations have opened up many new areas of research in the study of spatial, complex and dynamic cellular systems. One of the crucial challenges in the study of such systems is the design of a well-structured and optimized workflow that provides systematic and efficient hypothesis verification. Computer science can efficiently address this task by providing software that facilitates handling, analysis, and evaluation of biological data to the benefit of experimenters and modelers.
The Spatio-Temporal Simulation Environment (STSE) is a set of open-source tools provided to conduct spatio-temporal simulations in discrete structures based on microscopy images. The framework contains modules to digitize, represent, analyze, and mathematically model spatial distributions of biochemical species. Graphical user interface (GUI) tools provided with the software enable meshing of the simulation space based on the Voronoi concept. In addition, it supports automatic transfer of spatial information from the images to the mesh based on pixel luminosity (e.g. corresponding to molecular levels in microscopy images). STSE is freely available either as a stand-alone version or included in the Linux live distribution Systems Biology Operational Software (SB.OS) and can be downloaded from http://www.stse-software.org/. The Python source code as well as a comprehensive user manual and video tutorials are also offered to the research community. We discuss the main concepts of the STSE design and workflow. We demonstrate its usefulness using the example of a signaling cascade leading to formation of a morphological gradient of Fus3 within the cytoplasm of the mating yeast cell Saccharomyces cerevisiae.
STSE is an efficient and powerful novel platform, designed for the computational handling and evaluation of microscopic images. It allows for an uninterrupted workflow including digitization, representation, analysis, and mathematical modeling. By providing the means to relate the simulation to the image data, it allows for systematic, image-driven model validation or rejection. STSE can be scripted and extended using the Python language. STSE should be considered an API together with workflow guidelines and a collection of GUI tools, rather than a stand-alone application. The priority of the project is to provide an easy and intuitive way of extending and customizing the software using the Python language.
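The Voronoi-based meshing concept can be illustrated independently of STSE's own API (which is not reproduced here): partition the pixel grid by nearest seed point and summarize luminosity per cell. The image, seed points and cell count below are arbitrary stand-ins for the example.

```python
# Conceptual sketch (not STSE's API): Voronoi-style partition of an image
# grid by nearest seed, then mean luminosity per cell.  All data invented.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
image = rng.random((64, 64))               # stand-in for a microscopy image
seeds = rng.uniform(0, 64, size=(20, 2))   # mesh generator points

# Assign each pixel to its nearest seed -> discrete Voronoi cells.
yy, xx = np.mgrid[0:64, 0:64]
pixels = np.column_stack([yy.ravel(), xx.ravel()])
_, cell = cKDTree(seeds).query(pixels)

# One scalar per mesh cell: the mean luminosity of its pixels, mirroring
# how pixel intensities would be transferred onto a simulation mesh.
cell_mean = np.array([image.ravel()[cell == k].mean() for k in range(20)])
print(cell_mean.shape)
```

Once each cell carries a scalar (e.g. a molecular level), reaction-diffusion dynamics can be simulated on the resulting discrete structure.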
Scientific workflows integrate data and computing interfaces as configurable, semi-automatic graphs to solve a scientific problem. Kepler is such a software system for designing, executing, reusing, evolving, archiving and sharing scientific workflows. Electron tomography (ET) enables high-resolution views of complex cellular structures, such as cytoskeletons, organelles, viruses and chromosomes. Imaging investigations produce large datasets. For instance, in electron tomography the size of a 16-fold image tilt series is about 65 gigabytes, with each projection image comprising 4096 × 4096 pixels. When serial sections or montage techniques are used for large-field ET, the dataset becomes even larger. For higher-resolution images with multiple tilt series, the data size may be in the terabyte range. The demands of mass data processing and complex algorithms require the integration of diverse codes into flexible software structures. This paper describes a workflow for Electron Tomography Programs in Kepler (EPiK). The EPiK workflow embeds the tracking process of IMOD and realizes the main reconstruction algorithms, including filtered backprojection (FBP) from TxBR and iterative reconstruction methods. We have tested the three-dimensional (3D) reconstruction process using EPiK on ET data. EPiK is a potential toolkit for biology researchers, with the advantages of logical viewing, easy handling, convenient sharing and future extensibility.
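The filtered backprojection idea at the heart of the reconstruction step can be sketched compactly. The toy below is not TxBR or IMOD code: it simulates a projection series of a synthetic 2D slice with scipy.ndimage.rotate, ramp-filters the projections in Fourier space, and smears them back; all sizes and angles are chosen for the example.

```python
# Toy filtered backprojection (FBP) on a synthetic 2D slice.  This is an
# illustration of the algorithm only, not the TxBR/IMOD implementation.
import numpy as np
from scipy.ndimage import rotate

size = 64
yy, xx = np.mgrid[0:size, 0:size]
phantom = (((xx - 32) ** 2 + (yy - 32) ** 2) < 100).astype(float)  # a disc

# Simulate a projection series: rotate the slice, then sum columns.
angles = np.linspace(0.0, 180.0, 60, endpoint=False)
sinogram = np.array(
    [rotate(phantom, a, reshape=False, order=1).sum(axis=0) for a in angles])

# Ramp filter in Fourier space, then backproject (smear and rotate back).
ramp = np.abs(np.fft.fftfreq(size))
filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))
recon = np.zeros((size, size))
for a, row in zip(angles, filtered):
    recon += rotate(np.tile(row, (size, 1)), -a, reshape=False, order=1)
recon *= np.pi / (2 * len(angles))

print(recon[32, 32] > recon[2, 2])  # interior brighter than background
```

Real ET reconstruction additionally handles tilt-series alignment, limited tilt ranges and nonlinear projection models, which is what TxBR contributes in EPiK.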
Electron Tomography; Scientific workflows; EPiK; TxBR; Kepler
We describe a collection of standardized image-processing protocols for electron microscopy single-particle analysis using the XMIPP software package. These protocols allow the entire processing workflow to be performed, starting from digitized micrographs up to the final refinement and evaluation of 3D models. Particular emphasis has been placed on the treatment of structurally heterogeneous data through maximum-likelihood refinements and self-organizing maps, as well as on the generation of initial 3D models for such data sets through random conical tilt reconstruction methods. All protocols presented have been implemented as stand-alone, executable Python scripts, for which a dedicated graphical user interface has been developed. Thereby, they may provide novice users with a convenient tool to quickly obtain useful results with minimal effort in learning the details of this comprehensive package. Examples of applications are presented for a negative-stain random conical tilt data set on the hexameric helicase G40P and for a structurally heterogeneous data set on 70S Escherichia coli ribosomes embedded in vitrified ice.
Biological processes occur on a wide range of spatial and temporal scales: from femtoseconds to hours and from angstroms to meters. Many new biological insights can be expected from a better understanding of the processes that occur on these very fast and very small scales. In this regard, new instruments that use fast X-ray or electron pulses are expected to reveal novel mechanistic details for macromolecular protein dynamics. To ensure that any observed conformational change is physiologically relevant and not constrained by 3D crystal packing, it would be preferable for experiments to utilize small protein samples such as single particles or 2D crystals that mimic the target protein's native environment. These samples are not typically amenable to X-ray analysis, but transmission electron microscopy has imaged such sample geometries for over 40 years using both direct imaging and diffraction modes. While conventional transmission electron microscopes (TEM) have visualized biological samples with atomic resolution in an arrested or frozen state, the recent development of the dynamic TEM (DTEM) extends electron microscopy into a dynamic regime using pump-probe imaging. A new second-generation DTEM, which is currently being constructed, has the potential to observe live biological processes with unprecedented spatiotemporal resolution by using pulsed electron packets to probe the sample on micro- and nanosecond timescales. This article reviews the experimental parameters necessary for coupling DTEM with in situ liquid microscopy to enable direct imaging of protein conformational dynamics in a fully hydrated environment and visualize reactions propagating in real time.
in situ microscopy; liquid TEM; dynamic TEM; time-resolved imaging
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.
Python; large-scale models; reproducibility; computational neuroscience; workflow; integration
Advanced grazing-incidence techniques have developed significantly during recent years. With the ongoing progress in instrumentation, novel methods have emerged which allow for an in-depth morphology characterization of modern soft-matter materials. Examples are in situ and in operando grazing-incidence small-angle X-ray scattering (GISAXS), micro- and nanofocused GISAXS, time-of-flight (TOF) grazing-incidence small-angle neutron scattering (GISANS) and surface-sensitive resonant soft X-ray scattering techniques, including the potential to investigate polarization. Progress in software for data analysis is another important aspect.
The complex nano-morphology of modern soft-matter materials is successfully probed with advanced grazing-incidence techniques. Based on grazing-incidence small- and wide-angle X-ray and neutron scattering (GISAXS, GIWAXS, GISANS and GIWANS), new possibilities arise, which are discussed with selected examples. Due to instrumental progress, highly interesting possibilities for local structure analysis in this material class arise from the use of micro- and nanometer-sized X-ray beams in micro- or nanofocused GISAXS and GIWAXS experiments. The feasibility of very short data acquisition times down to milliseconds creates exciting possibilities for in situ and in operando GISAXS and GIWAXS studies. Tuning the energy of GISAXS and GIWAXS in the soft X-ray regime and in time-of-flight GISANS allows the tailoring of contrast conditions and thereby the probing of more complex morphologies. In addition, recent progress in software packages useful for data analysis of advanced grazing-incidence techniques is discussed.
grazing-incidence techniques; GISAXS; GIWAXS; resonant soft X-ray scattering; GISANS; morphology; soft matter
The application of fluorescence microscopy in cell biology often generates a huge amount of imaging data. Automated whole-cell segmentation of such data enables the detection and analysis of individual cells, where manual delineation is often time-consuming or practically infeasible. Furthermore, compared with manual analysis, automation normally offers a higher degree of reproducibility. CellSegm, the software presented in this work, is a Matlab-based command-line software toolbox providing automated whole-cell segmentation of images of surface-stained cells acquired by fluorescence microscopy. It has options for both fully automated and semi-automated cell segmentation. The major algorithmic steps are: (i) smoothing, (ii) Hessian-based ridge enhancement, (iii) marker-controlled watershed segmentation, and (iv) feature-based classification of cell candidates. Using a wide selection of image recordings and code snippets, we demonstrate that CellSegm can detect various types of surface-stained cells in 3D. After detection and outlining of individual cells, the cell candidates can be subjected to software-based analysis specified and programmed by the end-user, or they can be analyzed by other software tools. Tissue samples with appropriate characteristics are also shown to be resolvable with CellSegm. The command-line interface of CellSegm facilitates scripting of the separate tools, all implemented in Matlab, offering a high degree of flexibility and tailored workflows for the end-user. The modularity and scripting capabilities of CellSegm enable automated workflows and quantitative analysis of microscopic data, suited for high-throughput image-based screening.
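The sequence of steps — smoothing, foreground detection, marker-controlled watershed — can be illustrated in a few lines. CellSegm itself is Matlab; the sketch below uses Python with SciPy and scikit-image on a synthetic two-cell image, so every name and number in it is an assumption for the example, not CellSegm's API.

```python
# Python analogue of a CellSegm-style pipeline on a synthetic image:
# (i) smoothing, (ii) foreground mask, (iii) marker-controlled watershed.
# CellSegm itself is Matlab; nothing here is its actual API.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

# Two blurred bright spots stand in for two stained cells.
img = np.zeros((64, 64))
img[20, 20] = img[40, 44] = 1.0
img = ndi.gaussian_filter(img, sigma=6)            # (i) smoothing

mask = img > 0.2 * img.max()                       # (ii) foreground mask
peaks = ndi.maximum_filter(img, size=15) == img    # local intensity maxima
markers, _ = ndi.label(peaks & mask)               # one marker per cell

labels = watershed(-img, markers, mask=mask)       # (iii) watershed
print(len(np.unique(labels[labels > 0])))          # -> 2 cells found
```

The Hessian-based ridge enhancement step is omitted here for brevity; on real surface-stained data it sharpens cell boundaries before the watershed is applied.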
Automated analysis; Cell segmentation; CellSegm; High-throughput; Nucleus staining; Surface staining
The development of a high-duty-cycle microsecond time-resolution SAXS capability at the Biophysics Collaborative Access Team beamline (BioCAT) 18ID at the Advanced Photon Source, Argonne National Laboratory, USA, is reported.
Small-angle X-ray scattering (SAXS) is a well established technique to probe the nanoscale structure and interactions in soft matter. It allows one to study the structure of native particles in near physiological environments and to analyze structural changes in response to variations in external conditions. The combination of microfluidics and SAXS provides a powerful tool to investigate dynamic processes on a molecular level with sub-millisecond time resolution. Reaction kinetics in the sub-millisecond time range has been achieved using continuous-flow mixers manufactured using micromachining techniques. The time resolution of these devices has previously been limited, in part, by the X-ray beam sizes delivered by typical SAXS beamlines. These limitations can be overcome using optics to focus X-rays to the micrometer size range providing that beam divergence and photon flux suitable for performing SAXS experiments can be maintained. Such micro-SAXS in combination with microfluidic devices would be an attractive probe for time-resolved studies. Here, the development of a high-duty-cycle scanning microsecond-time-resolution SAXS capability, built around the Kirkpatrick–Baez mirror-based microbeam system at the Biophysics Collaborative Access Team (BioCAT) beamline 18ID at the Advanced Photon Source, Argonne National Laboratory, is reported. A detailed description of the microbeam small-angle-scattering instrument, the turbulent flow mixer, as well as the data acquisition and control and analysis software is provided. Results are presented where this apparatus was used to study the folding of cytochrome c. Future prospects for this technique are discussed.
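The link between beam size and achievable time resolution in a continuous-flow mixer can be made concrete with back-of-the-envelope arithmetic: positions downstream of the mixer map to reaction times via t = d/v, so the probed time window scales with the beam footprint along the flow direction. The numbers below are illustrative assumptions, not the BioCAT instrument's specifications.

```python
# Illustrative arithmetic only -- assumed numbers, not BioCAT specs.
def time_window_us(beam_size_m, flow_velocity_m_s):
    """Reaction-time window probed by a beam of the given size, in us."""
    return beam_size_m / flow_velocity_m_s * 1e6

# A conventional ~100 um beam vs a ~5 um microbeam at an assumed 1 m/s flow:
print(time_window_us(100e-6, 1.0))  # ~100 us window
print(time_window_us(5e-6, 1.0))    # ~5 us window: microsecond resolution
```

This is why micrometer-focused beams, provided they retain sufficient flux and low divergence, open the sub-millisecond regime that larger SAXS beams cannot reach in such mixers.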
micro-SAXS; time-resolved; protein folding