
1.  Using Situs for the integration of multi-resolution structures 
Biophysical Reviews  2010;2(1):21-27.
Situs is a modular and widely used software package for the integration of biophysical data across the spatial resolution scales. It has been developed over the last decade with a focus on bridging the resolution gap between atomic structures, coarse-grained models, and volumetric data from low-resolution biophysical origins, such as electron microscopy, tomography, or small-angle scattering. Structural models can be created and refined with various flexible and rigid body docking strategies. The software consists of multiple, stand-alone programs for the format conversion, analysis, visualization, manipulation, and assembly of 3D data sets. The programs have been ported to numerous platforms in both serial and shared memory parallel architectures and can be combined in various ways for specific modeling applications. The modular design facilitates the updating of individual programs and the development of novel application workflows. This review provides an overview of the Situs package as it exists today with an emphasis on functionality and workflows supported by version 2.5.
Electronic supplementary material
The online version of this article (doi:10.1007/s12551-009-0026-3) contains supplementary material, which is available to authorized users.
PMCID: PMC2821521  PMID: 20174447
Structural models; 3D data sets; Multi-platform; Modeling
2.  MIA - A free and open source software for gray scale medical image analysis 
Gray scale images make up the bulk of data in bio-medical image analysis, and hence the main focus of many image processing tasks lies in the processing of these monochrome images. With ever-improving acquisition devices, spatial and temporal image resolution increases, and data sets become very large.
Various image processing frameworks exist that make the development of new algorithms easy by using high-level programming languages or visual programming. These frameworks are also accessible to researchers who have little or no background in software development, because they take care of otherwise complex tasks. Specifically, the management of working memory is handled automatically, usually at the price of requiring more of it. As a result, processing large data sets with these tools becomes increasingly difficult on workstation-class computers.
One alternative to using these high-level processing tools is the development of new algorithms in a language like C++, which gives the developer full control over how memory is handled; but the resulting workflow for prototyping new algorithms is rather time-intensive, and also not appropriate for a researcher with little or no knowledge of software development.
Another alternative is to use command line tools that run image processing tasks, use the hard disk to store intermediate results, and provide automation through shell scripts. Although not as convenient as, e.g., visual programming, this approach is still accessible to researchers without a background in computer science. However, only few tools exist that provide this kind of processing interface; they are usually quite task-specific, and they do not offer a clear path for turning a prototype shell script into a new command line tool.
The proposed framework, MIA, provides a combination of command line tools, plug-ins, and libraries that make it possible to run image processing tasks interactively in a command shell and to prototype by using the corresponding shell scripting language. Since the hard disk serves as the temporary storage, memory management is usually a non-issue in the prototyping phase. By using string-based descriptions for filters, optimizers, and the like, the transition from shell scripts to full-fledged programs implemented in C++ is also made easy. In addition, its design based on atomic plug-ins and single-task command line tools makes it easy to extend MIA, usually without the requirement to touch or recompile existing code.
In this article, we describe the general design of MIA, a general-purpose framework for gray scale image processing. We demonstrate the applicability of the software with example applications from three different research scenarios, namely motion compensation in myocardial perfusion imaging, the processing of high-resolution image data that arises in virtual anthropology, and retrospective analysis of treatment outcome in orthognathic surgery. With MIA, prototyping algorithms by using shell scripts that combine small, single-task command line tools is a viable alternative to the use of high-level languages, an approach that is especially useful when large data sets need to be processed.
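The prototyping pattern described here, chaining small single-task tools through intermediate files on disk so that working memory stays small, can be sketched generically. The stage functions below are hypothetical stand-ins for illustration, not MIA's actual tools:

```python
import json
import tempfile
from pathlib import Path

# Each "tool" reads an input file and writes an output file, like a
# single-task command line program; intermediate results live on disk,
# so working memory stays small. These stages are invented stand-ins.
def smooth(src: Path, dst: Path) -> None:
    pixels = json.loads(src.read_text())
    out = [(a + b) / 2 for a, b in zip(pixels, pixels[1:] + pixels[-1:])]
    dst.write_text(json.dumps(out))

def threshold(src: Path, dst: Path, level: float) -> None:
    pixels = json.loads(src.read_text())
    dst.write_text(json.dumps([1 if p >= level else 0 for p in pixels]))

def run_pipeline(data, workdir: Path):
    a = workdir / "raw.json"
    b = workdir / "smooth.json"
    c = workdir / "mask.json"
    a.write_text(json.dumps(data))
    smooth(a, b)                 # stage 1: denoise
    threshold(b, c, level=0.5)   # stage 2: binarize
    return json.loads(c.read_text())

with tempfile.TemporaryDirectory() as d:
    mask = run_pipeline([0.1, 0.9, 0.8, 0.2], Path(d))
```

In a shell-script prototype each stage would be a separate executable invoked on the intermediate files; the structure of the pipeline is the same.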
PMCID: PMC4015836  PMID: 24119305
3.  Workflows for microarray data processing in the Kepler environment 
BMC Bioinformatics  2012;13:102.
Microarray data analysis has been the subject of extensive and ongoing pipeline development due to its complexity, the availability of several options at each analysis step, and the development of new analysis demands, including integration with new data sources. Bioinformatics pipelines are usually custom built for different applications, making them typically difficult to modify, extend and repurpose. Scientific workflow systems are intended to address these issues by providing general-purpose frameworks in which to develop and execute such pipelines. The Kepler workflow environment is a well-established system under continual development that is employed in several areas of scientific research. Kepler provides a flexible graphical interface, featuring clear display of parameter values, for design and modification of workflows. It has capabilities for developing novel computational components in the R, Python, and Java programming languages, all of which are widely used for bioinformatics algorithm development, along with capabilities for invoking external applications and using web services.
We developed a series of fully functional bioinformatics pipelines addressing common tasks in microarray processing in the Kepler workflow environment. These pipelines consist of a set of tools for GFF file processing of NimbleGen chromatin immunoprecipitation on microarray (ChIP-chip) datasets and more comprehensive workflows for Affymetrix gene expression microarray bioinformatics and basic primer design for PCR experiments, which are often used to validate microarray results. Although functional in themselves, these workflows can be easily customized, extended, or repurposed to match the needs of specific projects and are designed to be a toolkit and starting point for specific applications. These workflows illustrate a workflow programming paradigm focusing on local resources (programs and data) and therefore are close to traditional shell scripting or R/BioConductor scripting approaches to pipeline design. Finally, we suggest that microarray data processing task workflows may provide a basis for future example-based comparison of different workflow systems.
We provide a set of tools and complete workflows for microarray data analysis in the Kepler environment, which has the advantages of offering graphical, clear display of conceptual steps and parameters and the ability to easily integrate other resources such as remote data and web services.
PMCID: PMC3431220  PMID: 22594911
4.  Automated Tracing of Filaments in 3D Electron Tomography Reconstructions using Sculptor and Situs 
Journal of structural biology  2012;178(2):121-128.
The molecular graphics program Sculptor and the command-line suite Situs are software packages for the integration of biophysical data across spatial resolution scales. Herein, we provide an overview of recently developed tools relevant to cryo-electron tomography (cryo-ET), with an emphasis on functionality supported by Situs 2.7 and Sculptor 2.1. We describe a work flow for automatically segmenting filaments in cryo-ET maps including denoising, local normalization, feature detection, and tracing. Tomograms of cellular actin networks exhibit both cross-linked and bundled filament densities. Such filamentous regions in cryo-ET data sets can then be segmented using a stochastic template-based search, VolTrac. The approach combines a genetic algorithm and a bidirectional expansion with a tabu search strategy to localize and characterize filamentous regions. The automated filament segmentation by VolTrac compares well to a manual one performed by expert users, and it allows an efficient and reproducible analysis of large data sets. The software is free, open source, and can be used on Linux, Macintosh or Windows computers.
PMCID: PMC3440181  PMID: 22433493
Tomograms; 3D analysis; filament detection; actin networks; denoising; segmentation
5.  BioFlow: a web based workflow management software for design and execution of genomics pipelines 
Bioinformatics data analysis is usually done sequentially by chaining together multiple tools. Such pipelines are created by writing scripts and tracking the inputs and outputs of all stages, which requires programming skills. Executing multiple pipelines in parallel and keeping track of all the generated files is difficult and error prone. Checking results and task completion requires users to remotely log in to their servers and run commands to identify process status. Users would benefit from a web-based tool that allows creation and execution of pipelines remotely. The tool should also keep track of all the files generated and maintain a history of user activities.
A software tool for building and executing workflows is described here. The individual tools in the workflows can be any command line executable or script. The software has an intuitive mechanism for adding new tools to be used in workflows. It contains a workflow designer where workflows can be created by visually connecting various components. Workflows are executed by job runners. The outputs and the job history are saved. The tool is a web-based software tool, and all actions can be performed remotely.
Users without scripting knowledge can utilize the tool to build pipelines for executing tasks. Pipelines can be modeled as reusable workflows. BioFlow enables users to easily add new tools to the database. The workflows can be created and executed remotely. A number of parallel jobs can be easily controlled. Distributed execution is possible by running multiple instances of the application. Any number of tasks can be executed, and the output will be stored, making it easy to correlate the outputs to the jobs executed.
PMCID: PMC4179862
6.  A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines 
BMC Bioinformatics  2011;12:61.
Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts.
To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats).
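The dataflow idea of successive map transformations evaluated lazily over adjustable batches can be illustrated with a minimal standard-library sketch. This is a generic illustration of the paradigm, not PaPy's actual API:

```python
from itertools import islice

# Generic dataflow sketch (not PaPy's actual API): each stage is a map
# over items, stages compose into a pipeline, and input is consumed
# lazily in adjustable batches, trading memory for parallelism
# opportunities.
def batches(iterable, size):
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def pipeline(source, stages, batch_size=2):
    for chunk in batches(source, batch_size):  # lazy: one batch in memory
        for stage in stages:                   # successive transformations
            chunk = [stage(item) for item in chunk]
        yield from chunk

stages = [str.strip, str.upper]                # re-usable components
result = list(pipeline([" a ", "b", " c "], stages, batch_size=2))
```

A larger `batch_size` exposes more items to a worker pool at once; a smaller one keeps peak memory low, which is the trade-off the abstract describes.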
PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy can also be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code and includes extensive documentation and annotated usage examples.
PMCID: PMC3051902  PMID: 21352538
7.  STAMP: Extensions to the STADEN sequence analysis package for high throughput interactive microsatellite marker design 
BMC Bioinformatics  2009;10:41.
Microsatellites (MSs) are DNA markers with high analytical power, which are widely used in population genetics, genetic mapping, and forensic studies. Currently available software solutions for high-throughput MS design (i) have shortcomings in detecting and distinguishing imperfect and perfect MSs, (ii) often lack necessary interactive design steps, and (iii) do not allow for the development of primers for multiplex amplifications. We present a set of new tools implemented as extensions to the STADEN package, which provides the backbone functionality for flexible sequence analysis workflows. The possibility to assemble overlapping reads into unique contigs (provided by the base functionality of the STADEN package) is important to avoid developing redundant markers, a feature missing from most other similar tools.
Our extensions to the STADEN package provide the following functionality to facilitate microsatellite (and also minisatellite) marker design: The new modules (i) integrate the state-of-the-art tandem repeat detection and analysis software PHOBOS into workflows, (ii) provide two separate repeat detection steps – with different search criteria – one for masking repetitive regions during assembly of sequencing reads and the other for designing repeat-flanking primers for MS candidate loci, (iii) incorporate the widely used primer design program PRIMER3 into STADEN workflows, enabling the interactive design and visualization of flanking primers for microsatellites, and (iv) provide the functionality to find optimal locus- and primer pair combinations for multiplex primer design. Furthermore, our extensions include a module for storing analysis results in an SQLite database, providing a transparent solution for data access from within as well as from outside of the STADEN Package.
The STADEN package is enhanced by our modules into a highly flexible, high-throughput, interactive tool for conventional and multiplex microsatellite marker design. It gives the user detailed control over the workflow, enabling flexible combinations of manual and automated analysis steps. The software is available under the OpenBSD License [1,2]. The high efficiency of our automated marker design workflow has been confirmed in three microsatellite development projects.
PMCID: PMC2644677  PMID: 19183437
8.  Yabi: An online research environment for grid, high performance and cloud computing 
There is a significant demand for creating pipelines or workflows in the life science discipline that chain a number of discrete compute- and data-intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments that are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, significant overhead required to configure tools, and the inability for users to simply manage files across heterogeneous HPC storage infrastructure.
In this paper, we describe the architecture of a software system that is adaptable to a range of both pluggable execution and data backends in an open source implementation called Yabi. Enabling seamless and transparent access to heterogeneous HPC environments at its core, Yabi then provides an analysis workflow environment that can create and reuse workflows as well as manage large amounts of both raw and processed data in a secure and flexible way across geographically distributed compute resources. Yabi can be used via a web-based environment to drag-and-drop tools to create sophisticated workflows. Yabi can also be accessed through the Yabi command line, which is designed for users who are more comfortable with writing scripts or for enabling external workflow environments to leverage the features in Yabi. Configuring tools can be a significant overhead in workflow environments. Yabi greatly simplifies this task by enabling system administrators to configure and manage running tools via a web-based environment, without the need to write or edit software programs or scripts. In this paper, we highlight Yabi's capabilities through a range of bioinformatics use cases that arise from large-scale biomedical data analysis.
The Yabi system encapsulates considered design of both execution and data models, while abstracting technical details away from users who are not skilled in HPC and providing an intuitive, scalable drag-and-drop web-based workflow environment where the same tools can also be accessed via a command line. Yabi is currently in use and deployed at multiple institutions.
PMCID: PMC3298538  PMID: 22333270
Bioinformatics; workflows; Internet; high performance computing
9.  Workflow and Electronic Health Records in Small Medical Practices 
This paper analyzes the workflow and implementation of electronic health record (EHR) systems across different functions in small physician offices. We characterize the differences in the offices based on the levels of computerization in terms of workflow, sources of time delay, and barriers to using EHR systems to support the entire workflow. The study was based on a combination of questionnaires, interviews, in situ observations, and data collection efforts. This study was not intended to be a full-scale time-and-motion study with precise measurements but was intended to provide an overview of the potential sources of delays while performing office tasks. The study follows an interpretive model of case studies rather than a large-sample statistical survey of practices. To identify time-consuming tasks, workflow maps were created based on the aggregated data from the offices. The results from the study show that specialty physicians are more favorable toward adopting EHR systems than primary care physicians are. The barriers to adoption of EHR systems by primary care physicians can be attributed to the complex workflows that exist in primary care physician offices, leading to nonstandardized workflow structures and practices. Also, primary care physicians would benefit more from EHR systems if the systems could interact with external entities.
PMCID: PMC3329208  PMID: 22737096
10.  An atomic model of brome mosaic virus using direct electron detection and real-space optimization 
Nature Communications  2014;5:4808.
Advances in electron cryo-microscopy have enabled structure determination of macromolecules at near-atomic resolution. However, structure determination, even using de novo methods, remains susceptible to model bias and overfitting. Here we describe a complete workflow for data acquisition, image processing, all-atom modelling and validation of brome mosaic virus, an RNA virus. Data were collected with a direct electron detector in integrating mode and an exposure beyond the traditional radiation damage limit. The final density map has a resolution of 3.8 Å as assessed by two independent data sets and maps. We used the map to derive an all-atom model with a newly implemented real-space optimization protocol. The validity of the model was verified by its match with the density map and a previous model from X-ray crystallography, as well as the internal consistency of models from independent maps. This study demonstrates a practical approach to obtain a rigorously validated atomic resolution electron cryo-microscopy structure.
Recent developments in cryo-electron microscopy have enabled structure determination of large protein complexes at almost atomic resolution. Wang et al. combine some of these technologies into an effective workflow, and demonstrate the protocol by solving the atomic structure of an icosahedral RNA virus.
PMCID: PMC4155512  PMID: 25185801
11.  TXM-Wizard: a program for advanced data collection and evaluation in full-field transmission X-ray microscopy 
Journal of Synchrotron Radiation  2012;19(Pt 2):281-287.
A suite of GUI programs written in MATLAB for advanced data collection and analysis of full-field transmission X-ray microscopy data including mosaic imaging, tomography and XANES imaging is presented.
Transmission X-ray microscopy (TXM) has been well recognized as a powerful tool for non-destructive investigation of the three-dimensional inner structure of a sample with spatial resolution down to a few tens of nanometers, especially when combined with synchrotron radiation sources. Recent developments of this technique have presented a need for new tools for both system control and data analysis. Here a software package developed in MATLAB for script command generation and analysis of TXM data is presented. The first toolkit, the script generator, allows automating complex experimental tasks which involve up to several thousand motor movements. The second package was designed to accomplish computationally intense tasks such as data processing of mosaic and mosaic tomography datasets; dual-energy contrast imaging, where data are recorded above and below a specific X-ray absorption edge; and TXM X-ray absorption near-edge structure imaging datasets. Furthermore, analytical and iterative tomography reconstruction algorithms were implemented. The compiled software package is freely available.
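The dual-energy contrast idea mentioned above, recording one image above and one below an element's absorption edge and comparing them, can be shown with a toy calculation. All numerical values below are invented for illustration and are not taken from the article:

```python
import math

# Toy dual-energy contrast: transmitted intensity I = I0 * exp(-mu * t).
# The element of interest absorbs strongly only above its edge, so the
# log-ratio of the two images isolates its thickness map. Values are
# invented for illustration.
I0 = 1000.0
mu_above, mu_below = 4.0, 1.0        # absorption per unit thickness
thickness = [0.0, 0.2, 0.5]          # sample thickness at three pixels

above = [I0 * math.exp(-mu_above * t) for t in thickness]
below = [I0 * math.exp(-mu_below * t) for t in thickness]

# Edge contrast: ln(I_below / I_above) = (mu_above - mu_below) * t,
# i.e. proportional to the thickness of the absorbing element only.
edge_signal = [math.log(b / a) for a, b in zip(above, below)]
```

Pixels containing none of the element give zero edge signal regardless of overall sample thickness, which is what makes the subtraction element-specific.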
PMCID: PMC3284347  PMID: 22338691
X-ray microscopy; full-field; tomography; XANES imaging
12.  A correlative approach for combining microCT, light and transmission electron microscopy in a single 3D scenario 
Frontiers in Zoology  2013;10:44.
In biomedical research, a huge variety of different techniques is currently available for the structural examination of small specimens, including conventional light microscopy (LM), transmission electron microscopy (TEM), confocal laser scanning microscopy (CLSM), microscopic X-ray computed tomography (microCT), and many others. Since every imaging method is physically limited by certain parameters, a correlative use of complementary methods often yields a significant broader range of information. Here we demonstrate the advantages of the correlative use of microCT, light microscopy, and transmission electron microscopy for the analysis of small biological samples.
We used a small juvenile bivalve mollusc (Mytilus galloprovincialis, approximately 0.8 mm length) to demonstrate the workflow of a correlative examination by microCT, LM serial section analysis, and TEM-re-sectioning. Initially these three datasets were analyzed separately, and subsequently they were fused in one 3D scene. This workflow is very straightforward. The specimen was processed as usual for transmission electron microscopy including post-fixation in osmium tetroxide and embedding in epoxy resin. Subsequently it was imaged with microCT. Post-fixation in osmium tetroxide yielded sufficient X-ray contrast for microCT imaging, since the X-ray absorption of epoxy resin is low. Thereafter, the same specimen was serially sectioned for LM investigation. The serial section images were aligned and specific organ systems were reconstructed based on manual segmentation and surface rendering. According to the region of interest (ROI), specific LM sections were detached from the slides, re-mounted on resin blocks and re-sectioned (ultrathin) for TEM. For analysis, image data from the three different modalities was co-registered into a single 3D scene using the software AMIRA®. We were able to register both the LM section series volume and TEM slices neatly to the microCT dataset, with small geometric deviations occurring only in the peripheral areas of the specimen. Based on co-registered datasets the excretory organs, which were chosen as ROI for this study, could be investigated regarding both their ultrastructure as well as their position in the organism and their spatial relationship to adjacent tissues. We found structures typical for mollusc excretory systems, including ultrafiltration sites at the pericardial wall, and ducts leading from the pericardium towards the kidneys, which exhibit a typical basal infolding system.
The presented approach allows a comprehensive analysis and presentation of small objects regarding both the overall organization as well as cellular and subcellular details. Although our protocol involves a variety of different equipment and procedures, we maintain that it offers savings in both effort and cost. Co-registration of datasets from different imaging modalities can be accomplished with high-end desktop computers and offers new opportunities for understanding and communicating structural relationships within organisms and tissues. In general, the correlative use of different microscopic imaging techniques will continue to become more widespread in morphological and structural research in zoology. Classical TEM serial section investigations are extremely time-consuming, and modern methods for 3D analysis of ultrastructure such as SBF-SEM and FIB-SEM are limited to very small volumes for examination. Thus the re-sectioning of LM sections is suitable for speeding up TEM examination substantially, while microCT could become a key method for complementing ultrastructural examinations.
PMCID: PMC3750762  PMID: 23915384
13.  MassCascade: Visual Programming for LC-MS Data Processing in Metabolomics 
Molecular Informatics  2014;33(4):307-310.
Liquid chromatography coupled to mass spectrometry (LC-MS) is commonly applied to investigate the small molecule complement of organisms. Several software tools are typically joined in custom pipelines to semi-automatically process and analyse the resulting data. General workflow environments like the Konstanz Information Miner (KNIME) offer the potential of an all-in-one solution to process LC-MS data by allowing easy integration of different tools and scripts. We describe MassCascade and its workflow plug-in for processing LC-MS data. The Java library integrates frequently used algorithms in a modular fashion, thus enabling it to serve as back-end for graphical front-ends. The functions available in MassCascade have been encapsulated in a plug-in for the workflow environment KNIME, allowing combined use with e.g. statistical workflow nodes from other providers and making the tool intuitive to use without knowledge of programming. The design of the software guarantees a high level of modularity where processing functions can be quickly replaced or concatenated. MassCascade is an open-source library for LC-MS data processing in metabolomics. It embraces the concept of visual programming through its KNIME plug-in, simplifying the process of building complex workflows. The library was validated using open data.
PMCID: PMC4524413  PMID: 26279687
Mass spectrometry; Data analysis; Workflow platform; Metabolomics
14.  EPiK-a Workflow for Electron Tomography in Kepler* 
Procedia computer science  2014;20:2295-2305.
Scientific workflows integrate data and computing interfaces as configurable, semi-automatic graphs to solve a scientific problem. Kepler is such a software system for designing, executing, reusing, evolving, archiving and sharing scientific workflows. Electron tomography (ET) enables high-resolution views of complex cellular structures, such as cytoskeletons, organelles, viruses and chromosomes. Imaging investigations produce large datasets. For instance, in electron tomography, the size of a 16-fold image tilt series is about 65 gigabytes, with each projection image comprising 4096 by 4096 pixels. When serial sections or montage techniques are used for large-field ET, the dataset becomes even larger. For higher-resolution images with multiple tilt series, the data size may be in the terabyte range. The demands of mass data processing and complex algorithms require the integration of diverse codes into flexible software structures. This paper describes a workflow for Electron Tomography Programs in Kepler (EPiK). The EPiK workflow embeds the tracking process of IMOD and realizes the main algorithms, including filtered backprojection (FBP) from TxBR and iterative reconstruction methods. We have tested the three-dimensional (3D) reconstruction process using EPiK on ET data. EPiK can be a potential toolkit for biology researchers, with the advantages of logical viewing, easy handling, convenient sharing and future extensibility.
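The quoted ~65-gigabyte figure is consistent with, for example, 16 tilt series of about 121 projections each at 4096 × 4096 pixels and 16 bits per pixel. The projection count and bit depth are assumptions made for this back-of-the-envelope check, not values stated in the abstract:

```python
# Back-of-the-envelope check of the ~65 GB figure for a 16-fold tilt
# series. The bit depth (16-bit) and projections per series (121, e.g.
# -60 to +60 degrees in 1-degree steps) are assumptions, not values
# from the article.
bytes_per_pixel = 2
pixels = 4096 * 4096
projections_per_series = 121
series = 16

per_image = pixels * bytes_per_pixel              # ~33.6 MB per projection
total = series * projections_per_series * per_image
total_gb = total / 1e9                            # ~65 GB
```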
PMCID: PMC4304086  PMID: 25621086
Electron Tomography; Scientific workflows; EPiK; TxBR; Kepler
15.  Biowep: a workflow enactment portal for bioinformatics applications 
BMC Bioinformatics  2007;8(Suppl 1):S19.
The huge amount of biological information, its distribution over the Internet and the heterogeneity of available software tools make the adoption of new data integration and analysis network tools a necessity in bioinformatics. ICT standards and tools, like Web Services and Workflow Management Systems (WMS), can support the creation and deployment of such systems. Many Web Services are already available and some WMS have been proposed. They assume that researchers know which bioinformatics resources can be reached through a programmatic interface and that they are skilled in programming and building workflows. Therefore, they are not viable for the majority of unskilled researchers. A portal enabling these researchers to profit from the new technologies is still missing.
We designed biowep, a web-based client application that allows for the selection and execution of a set of predefined workflows. The system is available on-line. The biowep architecture includes a Workflow Manager, a User Interface and a Workflow Executor. The task of the Workflow Manager is the creation and annotation of workflows. These can be created by using either the Taverna Workbench or BioWMS. Enactment of workflows is carried out by FreeFluo for Taverna workflows and by BioAgent/Hermes, a mobile agent-based middleware, for BioWMS ones. The main processing steps of workflows are annotated on the basis of their input and output, elaboration type and application domain, using a classification of bioinformatics data and tasks. The interface supports user authentication and profiling. Workflows can be selected on the basis of users' profiles and can be searched through their annotations. Results can be saved.
We developed a web system that supports the selection and execution of predefined workflows, thus simplifying access for all researchers. The implementation of Web Services that allow specialized software to interact with an exhaustive set of biomedical databases and analysis software, together with the creation of effective workflows, can significantly improve the automation of in-silico analysis. Biowep is available for interested researchers as a reference portal. They are invited to submit their workflows to the workflow repository. Biowep is being further developed in the sphere of the Laboratory of Interdisciplinary Technologies in Bioinformatics – LITBIO.
PMCID: PMC1885848  PMID: 17430563
16.  CellSegm - a MATLAB toolbox for high-throughput 3D cell segmentation 
The application of fluorescence microscopy in cell biology often generates a huge amount of imaging data. Automated whole cell segmentation of such data enables the detection and analysis of individual cells, where manual delineation is often time-consuming or practically infeasible. Furthermore, compared to manual analysis, automation normally has a higher degree of reproducibility. CellSegm, the software presented in this work, is a MATLAB-based command-line software toolbox providing an automated whole cell segmentation of images showing surface stained cells, acquired by fluorescence microscopy. It has options for both fully automated and semi-automated cell segmentation. Major algorithmic steps are: (i) smoothing, (ii) Hessian-based ridge enhancement, (iii) marker-controlled watershed segmentation, and (iv) feature-based classification of cell candidates. Using a wide selection of image recordings and code snippets, we demonstrate that CellSegm has the ability to detect various types of surface stained cells in 3D. After detection and outlining of individual cells, the cell candidates can be subject to software-based analysis, specified and programmed by the end-user, or they can be analyzed by other software tools. A segmentation of tissue samples with appropriate characteristics is also shown to be resolvable in CellSegm. The command-line interface of CellSegm facilitates scripting of the separate tools, all implemented in MATLAB, offering a high degree of flexibility and tailored workflows for the end-user. The modularity and scripting capabilities of CellSegm enable automated workflows and quantitative analysis of microscopic data, suited for high-throughput image-based screening.
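The four algorithmic steps lend themselves to a compact illustration. CellSegm itself is implemented in MATLAB; the following Python sketch, using only NumPy/SciPy, mirrors steps (i)-(iii) on a 2D image, with a gradient magnitude standing in for Hessian-based ridge enhancement. Function and parameter names are illustrative assumptions, not CellSegm's API.

```python
import numpy as np
from scipy import ndimage as ndi

def segment_cells(image, smooth_sigma=2.0, marker_frac=0.1):
    """Marker-controlled watershed segmentation of a surface-stained image.

    Mirrors CellSegm's pipeline in spirit: (i) smoothing, (ii) edge
    enhancement via a gradient magnitude (a stand-in for Hessian-based
    ridge enhancement), (iii) marker-controlled watershed.
    """
    smoothed = ndi.gaussian_filter(image.astype(float), smooth_sigma)   # (i)
    gradient = ndi.gaussian_gradient_magnitude(smoothed, smooth_sigma)  # (ii)
    # (iii) markers: connected low-gradient regions (cell interiors and
    # background), then flood the gradient relief from those markers
    markers, _ = ndi.label(gradient < marker_frac * gradient.max())
    relief = np.round(255 * gradient / gradient.max()).astype(np.uint16)
    return ndi.watershed_ift(relief, markers)
```

Step (iv), feature-based classification, would then filter the resulting label regions on properties such as size and shape.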
PMCID: PMC3850890  PMID: 23938087
Automated analysis; Cell segmentation; CellSegm; High-throughput; Nucleus staining; Surface staining
17.  A Computational Framework for Ultrastructural Mapping of Neural Circuitry 
PLoS Biology  2009;7(3):e1000074.
Circuitry mapping of metazoan neural systems is difficult because canonical neural regions (regions containing one or more copies of all components) are large, regional borders are uncertain, neuronal diversity is high, and potential network topologies so numerous that only anatomical ground truth can resolve them. Complete mapping of a specific network requires synaptic resolution, canonical region coverage, and robust neuronal classification. Though transmission electron microscopy (TEM) remains the optimal tool for network mapping, the process of building large serial section TEM (ssTEM) image volumes is rendered difficult by the need to precisely mosaic distorted image tiles and register distorted mosaics. Moreover, most molecular neuronal class markers are poorly compatible with optimal TEM imaging. Our objective was to build a complete framework for ultrastructural circuitry mapping. This framework combines strong TEM-compliant small molecule profiling with automated image tile mosaicking, automated slice-to-slice image registration, and gigabyte-scale image browsing for volume annotation. Specifically we show how ultrathin molecular profiling datasets and their resultant classification maps can be embedded into ssTEM datasets and how scripted acquisition tools (SerialEM), mosaicking and registration (ir-tools), and large slice viewers (MosaicBuilder, Viking) can be used to manage terabyte-scale volumes. These methods enable large-scale connectivity analyses of new and legacy data. In well-posed tasks (e.g., complete network mapping in retina), terabyte-scale image volumes that previously would require decades of assembly can now be completed in months. Perhaps more importantly, the fusion of molecular profiling, image acquisition by SerialEM, ir-tools volume assembly, and data viewers/annotators also allow ssTEM to be used as a prospective tool for discovery in nonneural systems and a practical screening methodology for neurogenetics. 
Finally, this framework provides a mechanism for parallelization of ssTEM imaging, volume assembly, and data analysis across an international user base, enhancing the productivity of a large cohort of electron microscopists.
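Mosaicking distorted tiles begins with estimating pairwise offsets between overlapping tiles, commonly via phase correlation. The following Python sketch recovers the integer shift between two tiles; it illustrates the principle only and is not the ir-tools implementation.

```python
import numpy as np

def phase_correlation_offset(tile_a, tile_b):
    """Estimate the integer (row, col) shift mapping tile_a onto tile_b
    using phase correlation (assumes circular overlap, no distortion)."""
    fa = np.fft.fft2(tile_a)
    fb = np.fft.fft2(tile_b)
    cross = np.conj(fa) * fb
    cross /= np.abs(cross) + 1e-12        # whitening sharpens the peak
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
```

In a real mosaic, such pairwise offsets seed a global optimization that also models lens and section distortion.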
Author Summary
Building an accurate neural network diagram of the vertebrate nervous system is a major challenge in neuroscience. Diverse groups of neurons that function together form complex patterns of connections often spanning large regions of brain tissue, with uncertain borders. Although serial-section transmission electron microscopy remains the optimal tool for fine anatomical analyses, the time and cost of the undertaking has been prohibitive. We have assembled a complete framework for ultrastructural mapping using conventional transmission electron microscopy that tremendously accelerates image analysis. This framework combines small-molecule profiling to classify cells, automated image acquisition, automated mosaic formation, automated slice-to-slice image registration, and large-scale image browsing for volume annotation. Terabyte-scale image volumes requiring decades or more to assemble manually can now be automatically built in a few months. This makes serial-section transmission electron microscopy practical for high-resolution exploration of all complex tissue systems (neural or nonneural) as well as for ultrastructural screening of genetic models.
A framework for analysis of terabyte-scale serial-section transmission electron microscopic (ssTEM) datasets overcomes computational barriers and accelerates high-resolution tissue analysis, providing a practical way of mapping complex neural circuitry and an effective screening tool for neurogenetics.
PMCID: PMC2661966  PMID: 19855814
18.  Data processing and analysis with the autoPROC toolbox 
Typical topics and problems encountered during data processing of diffraction experiments are discussed and the tools provided in the autoPROC software are described.
A typical diffraction experiment will generate many images and data sets from different crystals in a very short time. This creates a challenge for the high-throughput operation of modern synchrotron beamlines as well as for the subsequent data processing. Novice users in particular may feel overwhelmed by the tables, plots and numbers that the different data-processing programs and software packages present to them. Here, some of the more common problems that a user has to deal with when processing a set of images that will finally make up a processed data set are shown, concentrating on difficulties that may often show up during the first steps along the path of turning the experiment (i.e. data collection) into a model (i.e. interpreted electron density). Difficulties such as unexpected crystal forms, issues in crystal handling and suboptimal choices of data-collection strategies can often be dealt with, or at least diagnosed, by analysing specific data characteristics during processing. In the end, one wants to distinguish problems over which one has no immediate control once the experiment is finished from problems that can be remedied a posteriori. A new software package, autoPROC, is also presented that combines third-party processing programs with new tools and an automated workflow script that is intended to provide users with both guidance and insight into the offline processing of data affected by the difficulties mentioned above, with particular emphasis on the automated treatment of multi-sweep data sets collected on multi-axis goniostats.
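As an example of the data characteristics a user inspects during processing, merging statistics such as R_merge summarize the internal agreement of symmetry-equivalent observations: the sum of |I - <I>| over all observations divided by the sum of all intensities. A minimal plain-Python sketch (a textbook statistic, not autoPROC code):

```python
def r_merge(groups):
    """R_merge over groups of symmetry-equivalent intensity measurements:
    sum_hkl sum_i |I_i - <I>_hkl| / sum_hkl sum_i I_i."""
    numerator = denominator = 0.0
    for intensities in groups:
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator
```

A sudden rise of this statistic in high-resolution or late-dose bins is one of the diagnostics that flags radiation damage or a wrongly assigned symmetry.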
PMCID: PMC3069744  PMID: 21460447
autoPROC; data processing
19.  Molgenis-impute: imputation pipeline in a box 
BMC Research Notes  2015;8:359.
Genotype imputation is an important procedure in current genomic analysis such as genome-wide association studies, meta-analyses and fine mapping. Although high quality tools are available that perform the steps of this process, considerable effort and expertise is required to set up and run a best practice imputation pipeline, particularly for larger genotype datasets, where imputation has to scale out in parallel on computer clusters.
Here we present MOLGENIS-impute, an ‘imputation in a box’ solution that seamlessly and transparently automates the setup and running of all the steps of the imputation process. These steps include genome build liftover, genotype phasing with SHAPEIT2, quality control, sample and chromosomal chunking/merging, and imputation with IMPUTE2. MOLGENIS-impute builds on MOLGENIS-compute, a simple pipeline management platform for submission and monitoring of bioinformatics tasks in High Performance Computing (HPC) environments like local/cloud servers, clusters and grids. All the required tools, data and scripts are downloaded and installed in a single step. Researchers with diverse backgrounds and expertise have tested MOLGENIS-impute at different locations and imputed over 30,000 samples so far, using the 1,000 Genomes Project and new Genome of the Netherlands data as the imputation reference. The tests were performed on PBS/SGE clusters, cloud VMs and in a grid HPC environment.
MOLGENIS-impute gives priority to the ease of setting up, configuring and running an imputation. It has minimal dependencies and wraps the pipeline in a simple command line interface, without sacrificing flexibility to adapt or limiting the options of underlying imputation tools. It does not require knowledge of a workflow system or programming, and is targeted at researchers who just want to apply best practices in imputation via simple commands. It is built on the MOLGENIS compute workflow framework to enable customization with additional computational steps or it can be included in other bioinformatics pipelines. It is available as open source from:
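The chromosomal chunking step (splitting each chromosome into windows so imputation jobs can run in parallel on a cluster, then merging the results) can be sketched in a few lines. The function name and the 5 Mb default window are illustrative assumptions, not MOLGENIS-impute's actual code.

```python
def chunk_region(chrom, start, end, chunk_size=5_000_000):
    """Split a chromosomal region into fixed-size windows, one per
    parallel imputation job. Returns (chrom, start, end) tuples with
    inclusive, non-overlapping coordinates."""
    chunks = []
    pos = start
    while pos <= end:
        chunks.append((chrom, pos, min(pos + chunk_size - 1, end)))
        pos += chunk_size
    return chunks
```

Each tuple would then be passed to a phasing/imputation job, and the per-window outputs concatenated back into one file per chromosome.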
Electronic supplementary material
The online version of this article (doi:10.1186/s13104-015-1309-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4541731  PMID: 26286716
Imputation; Genotyping; GWAS
20.  Solution Structure of CCP Modules 10–12 Illuminates Functional Architecture of the Complement Regulator, Factor H 
Journal of Molecular Biology  2012;424(5):295-312.
The 155-kDa plasma glycoprotein factor H (FH), which consists of 20 complement control protein (CCP) modules, protects self-tissue but not foreign organisms from damage by the complement cascade. Protection is achieved by selective engagement of FH, via CCPs 1–4, CCPs 6–8 and CCPs 19–20, with polyanion-rich host surfaces that bear covalently attached, activation-specific, fragments of complement component C3. The role of intervening CCPs 9–18 in this process is obscured by lack of structural knowledge. We have concatenated new high-resolution solution structures of overlapping recombinant CCP pairs, 10–11 and 11–12, to form a three-dimensional structure of CCPs 10–12 and validated it by small-angle X-ray scattering of the recombinant triple‐module fragment. Superimposing CCP 12 of this 10–12 structure with CCP 12 from the previously solved CCP 12–13 structure yielded an S-shaped structure for CCPs 10–13 in which modules are tilted by 80–110° with respect to immediate neighbors, but the bend between CCPs 10 and 11 is counter to the arc traced by CCPs 11–13. Including this four-CCP structure in interpretation of scattering data for the longer recombinant segments, CCPs 10–15 and 8–15, implied flexible attachment of CCPs 8 and 9 to CCP 10 but compact and intimate arrangements of CCP 14 with CCPs 12, 13 and 15. Taken together with difficulties in recombinant production of module pairs 13–14 and 14–15, the aberrant structure of CCP 13 and the variability of 13–14 linker sequences among orthologues, a structural dependency of CCP 14 on its neighbors is suggested; this has implications for the FH mechanism.
Graphical abstract
► The 20-CCP‐module human protein FH prevents complement-mediated tissue damage. ► NMR structures of CCPs 10–11 and 11–12 suggest that this region enhances flexional strength of FH. ► Concatenating bi-modules helps interpret small‐angle X‐ray scattering data, revealing highly compacted arrangement of CCPs 13, 14 and 15. ► Apparent structural dependency of CCP 14 on neighbors could provide a switch between ordered and flexible FH architectures.
PMCID: PMC4068365  PMID: 23017427
CCP, complement control protein; CR1, complement receptor type 1; DAF, decay accelerating factor; FH, factor H; EOM, ensemble optimization method; HSQC, heteronuclear single quantum coherence; MCP, membrane cofactor protein; NOE, nuclear Overhauser enhancement; SAXS, small-angle X-ray scattering; TOCSY, total correlated spectroscopy; protein NMR; protein domains; complement system; small-angle X-ray scattering; regulators of complement activation
21.  A user-friendly workflow for analysis of Illumina gene expression bead array data available at the portal 
BMC Genomics  2015;16(1):482.
Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily applicable by less experienced users. The portal provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open-source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows.
To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for this portal that provides a free and user-friendly web interface for quality control and pre-processing of these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis.
The module accepts data exported from Illumina's GenomeStudio and provides the user with quality-control plots and normalized data. The outputs are directly linked to the portal's existing statistics module, but can also be downloaded for further downstream analysis in third-party tools.
The Illumina bead array analysis module is available on the portal. A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for statistical evaluation and pathway analysis provided on the website, or to generate processed input data for a broad range of applications in life sciences research.
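The module's pre-processing wraps R/Bioconductor routines; as an illustration of one standard normalization step for expression arrays, here is a NumPy sketch of quantile normalization. This is a generic textbook method, not the module's own code.

```python
import numpy as np

def quantile_normalize(matrix):
    """Quantile-normalize a (probes x samples) expression matrix so that
    every sample shares the same empirical distribution: each value is
    replaced by the mean of the values holding the same rank across
    samples. Ties are broken by argsort order in this simple sketch."""
    order = np.argsort(matrix, axis=0)                    # per-sample ranking
    mean_quantiles = np.sort(matrix, axis=0).mean(axis=1) # reference distribution
    normalized = np.empty_like(matrix, dtype=float)
    for j in range(matrix.shape[1]):
        normalized[order[:, j], j] = mean_quantiles
    return normalized
```

After normalization, every column has an identical sorted profile, which removes between-array intensity differences before statistical testing.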
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1689-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4486126  PMID: 26122086
Microarray; Illumina bead array; Transcriptomics; Data analysis; Normalization; Quality control
22.  Total disc replacement using a tissue-engineered intervertebral disc in vivo: new animal model and initial results  
Study type: Basic science
Introduction: Chronic back pain due to degenerative disc disease (DDD) is among the most important medical conditions causing morbidity and significant health care costs. Surgical treatment options include disc replacement or fusion surgery, but are associated with significant short- and long-term risks.1 Biological tissue-engineering of human intervertebral discs (IVD) could offer an important alternative.2 Recent in vitro data from our group have shown successful engineering and growth of ovine intervertebral disc composites with circumferentially aligned collagen fibrils in the annulus fibrosus (AF) (Figure 1).3
Tissue-engineered composite disc a Experimental steps to generate composite tissue-engineered IVDs3 b Example of different AF formulations on collagen alignment in the AF. Second harmonic generation and two-photon excited fluorescence images of seeded collagen gels (for AF) of 1 and 2.5 mg/ml over time. At seeding, cells and collagen were homogenously distributed in the gels. Over time, AF cells elongated and collagen aligned parallel to cells. Less contraction and less alignment is noted after 3 days in the 2.5 mg/mL gel. c Imaging-based creation of a virtual disc model that will serve as template for the engineered disc. Total disc dimensions (AF and NP) were retrieved from micro-computer tomography (CT) (left images), and nucleus pulposus (NP) dimensions alone were retrieved from T2-weighted MRI images (right images). Merging of MRI and micro-CT models revealed a composite disc model (middle image)—Software: Microview, GE Healthcare Inc., Princeton, NJ; and slicOmatic v4.3, TomoVision, Montreal, Canada. d Flow chart describing the process for generating multi-lamellar tissue engineered IVDs. IVDs are produced by allowing cell-seeded collagen layers to contract around a cell-seeded alginate core (NP) over time
Objective: The next step is to investigate if biological disc implants survive, integrate, and restore function to the spine in vivo. A model will be developed that allows efficient in vivo testing of tissue-engineered discs of various compositions and characteristics.
Methods: Athymic rats were anesthetized and a dorsal approach was chosen to perform a microsurgical discectomy in the rat caudal spine (Fig. 2,Fig. 3). Control group I (n = 6) underwent discectomy only, Control group II (n = 6) underwent discectomy, followed by reimplantation of the autologous disc. Two treatment groups (group III, n = 6, 1 month survival; group IV, n = 6, 6 months survival) received a tissue-engineered composite disc implant. The rodents were followed clinically for signs of infection, pain level and wound healing. X-rays and magnetic resonance imaging (MRI) were assessed postoperatively and up to 6 months after surgery (Fig. 6,Fig. 7). A 7 Tesla MRI (Bruker) was implemented for assessment of the operated level as well as the adjacent disc (hydration). T2-weighted sequences were interpreted by a semiquantitative score (0 = no signal, 1 = weak signal, 2 = strong signal and anatomical features of a normal disc). Histology was performed with staining for proteoglycans (Alcian blue) and collagen (Picrosirius red) (Fig. 4,Fig. 5).
Disc replacement surgery a Operative situs with native disc that has been disassociated from both adjacent vertebrae b Native disc (left) and tissue-engineered implant (right) c Implant in situ before wound closure. AF: annulus fibrosus, NP: nucleus pulposus, EP: endplate, M: muscle, T: tendon, S: skin, Art: artery, GP: growth plate, B: bone
Disc replacement surgery. Anatomy of the rat caudal disc space a Picrosirius red stained axial cut of native disc space b Safranin-O stained sagittal cut of native disc space
Histologies of three separate motion segments from three different rats. Animal one = native IVD, Animal two = status after discectomy, Animal three = tissue-engineered implant (1 month) a–c H&E (overall tissue staining for light microscopy) d–f Alcian blue (proteoglycans) g–i Picrosirius red (collagen I and II)
Histology from one motion segment four months after implantation of a bio-engineered disc construct a Picrosirius red staining (collagen) b Polarized light microscopy showing collagen staining and collagen organization in AF region c Increased Safranin-O staining (proteoglycans) in NP region of the disc implant d Higher magnification of figure 5c: Integration between implanted tissue-engineered total disc replacement and vertebral body bone
MRI a Disc space height measurements in flash/T1 sequence (top: implant (714.0 micrometer), bottom: native disc (823.5 micrometer) b T2 sequence, red circle surrounding the implant NP
7 Tesla MRI imaging of rat tail IVDs showing axial images (preliminary pilot data) a Diffusion tensor imaging (DTI) on two explanted rat tail discs in Formalin b Higher magnification of a, showing directional alignment of collagen fibers (red and green) when compared to the color ball on top which maps fibers' directional alignment (eg, fibers directing from left to right: red, from top to bottom: blue) c Native IVD in vivo (successful imaging of top and bottom of the IVD (red) d Gradient echo sequence (GE) showing differentiation between NP (light grey) and AF (dark margin) e GE of reimplanted tail IVD at the explantation level f T1Rho sequence demonstrating the NP (grey) within the AF (dark margin), containing the yellow marked region of interest for value acquisition (preliminary data are consistent with values reported in the literature). g T2 image of native IVD in vivo for monitoring of hydration (white: NP)
Results: The model allowed reproducible and complete discectomies as well as disc implantation in the rat tail spine without any surgical or postoperative complications. Discectomy resulted in immediate collapse of the disc space. Preliminary results indicate that disc space height was maintained after disc implantation in groups II, III and IV over time. MRI revealed high resolution images of normal intervertebral discs in vivo. Eight out of twelve animals (groups III and IV) showed a positive signal in T2-weighted images after 1 month (grade 0 = 4, grade 1 = 4, grade 2 = 4). Positive staining was seen for collagen as well as proteoglycans at the site of disc implantation after 1 month in each of the six animals with engineered implants (group III). Analysis of group IV showed positive T2 signal in five out of six animals and disc-height preservation in all animals after 6 months.
Conclusions: This study demonstrates for the first time that tissue-engineered composite IVDs with circumferentially aligned collagen fibrils survive and integrate with surrounding vertebral bodies when placed in the rat spine for up to 6 months. Tissue-engineered composite IVDs restored function to the rat spine as indicated by maintenance of disc height and vertebral alignment. A significant finding was that maintenance of the composite structure in group III was observed, with increased proteoglycan staining in the nucleus pulposus region (Figure 4d–f). Proteoglycan and collagen matrix as well as disc height preservation and positive T2 signals in MRI are promising parameters and indicate functionality of the implants.
PMCID: PMC3623095  PMID: 23637671
23.  Automatically visualise and analyse data on pathways using PathVisioRPC from any programming environment 
BMC Bioinformatics  2015;16(1):267.
Biological pathways are descriptive diagrams of biological processes widely used for functional analysis of differentially expressed genes or proteins. Primary data analysis, such as quality control, normalisation, and statistical analysis, is often performed in scripting languages like R, Perl, and Python. Subsequent pathway analysis is usually performed using dedicated external applications. Workflows involving manual use of multiple environments are time consuming and error prone. Therefore, tools are needed that enable pathway analysis directly within the same scripting languages used for primary data analyses. Existing tools have limited capability in terms of available pathway content, pathway editing and visualisation options, and export file formats. Consequently, making the full-fledged pathway analysis tool PathVisio available from various scripting languages will benefit researchers.
We developed PathVisioRPC, an XMLRPC interface for the pathway analysis software PathVisio. PathVisioRPC enables creating and editing biological pathways, visualising data on pathways, performing pathway statistics, and exporting results in several image formats in multiple programming environments.
We demonstrate PathVisioRPC functionalities using examples in Python. Subsequently, we analyse a publicly available NCBI GEO gene expression dataset studying tumour bearing mice treated with cyclophosphamide in R. The R scripts demonstrate how calls to existing R packages for data processing and calls to PathVisioRPC can directly work together. To further support R users, we have created RPathVisio simplifying the use of PathVisioRPC in this environment. We have also created a pathway module for the microarray data analysis portal that calls the PathVisioRPC interface to perform pathway analysis. This module allows users to use PathVisio functionality online without having to download and install the software and exemplifies how the PathVisioRPC interface can be used by data analysis pipelines for functional analysis of processed genomics data.
PathVisioRPC enables data visualisation and pathway analysis directly from within various analytical environments used for preliminary analyses. It supports the use of existing pathways from WikiPathways or pathways created using the RPC itself. It also enables automation of tasks performed using PathVisio, making it useful to PathVisio users performing repeated visualisation and analysis tasks. PathVisioRPC is freely available for academic and commercial use at
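Because PathVisioRPC speaks standard XML-RPC, any language with an XML-RPC client library can drive it. A minimal Python sketch follows; the default port and the method name shown in the commented usage are assumptions to be checked against the PathVisioRPC documentation.

```python
import xmlrpc.client

def connect_pathvisio(url="http://localhost:7777"):
    """Return an XML-RPC proxy for a running PathVisioRPC server.

    The default URL/port here is an assumption; use the address your
    PathVisioRPC instance reports on startup. Creating the proxy does
    not open a connection, so no server is needed until a call is made.
    """
    return xmlrpc.client.ServerProxy(url)

# Hypothetical usage against a running server (method name is an assumption):
# server = connect_pathvisio()
# server.createPathway("/tmp/demo", "MyPathway")
```

The same pattern applies from R or Perl with their respective XML-RPC packages, which is what makes the interface usable from whichever environment hosts the primary data analysis.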
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0708-8) contains supplementary material, which is available to authorized users.
PMCID: PMC4546821  PMID: 26298294
Automation; Biological pathways; Data visualisation; Multi-omics; Pathway analysis; Pathway building; R package; Workflow integration
24.  A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing 
Nucleic Acids Research  2010;38(17):e171.
Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, including for researchers with limited bioinformatics expertise. The custom-written Perl, Python and Unix shell computer scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.
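The glue logic of such a semi-automated workflow (running each script in order and halting on the first failure) can be sketched in a few lines of Python. The step names and commands below are placeholders, not the published system's actual Perl/Python/shell scripts.

```python
import subprocess

def run_pipeline(steps):
    """Run named shell-command steps in order, recording each exit code
    and stopping at the first failure (a minimal workflow-runner sketch).

    `steps` is a list of (name, shell_command) pairs; returns a dict of
    exit codes for the steps that were attempted.
    """
    exit_codes = {}
    for name, command in steps:
        proc = subprocess.run(command, shell=True)
        exit_codes[name] = proc.returncode
        if proc.returncode != 0:
            break   # skip downstream steps once a step fails
    return exit_codes
```

A real workflow system adds logging, resumption from the failed step, and per-step resource requests, but the execution model is the same ordered chain.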
PMCID: PMC2943614  PMID: 20682560
25.  AutoDrug: fully automated macromolecular crystallography workflows for fragment-based drug discovery 
New software has been developed for automating the experimental and data-processing stages of fragment-based drug discovery at a macromolecular crystallography beamline. A new workflow-automation framework orchestrates beamline-control and data-analysis software while organizing results from multiple samples.
AutoDrug is software based upon the scientific workflow paradigm that integrates the Stanford Synchrotron Radiation Lightsource macromolecular crystallography beamlines and third-party processing software to automate the crystallography steps of the fragment-based drug-discovery process. AutoDrug screens a cassette of fragment-soaked crystals, selects crystals for data collection based on screening results and user-specified criteria and determines optimal data-collection strategies. It then collects and processes diffraction data, performs molecular replacement using provided models and detects electron density that is likely to arise from bound fragments. All processes are fully automated, i.e. are performed without user interaction or supervision. Samples can be screened in groups corresponding to particular proteins, crystal forms and/or soaking conditions. A single AutoDrug run is only limited by the capacity of the sample-storage dewar at the beamline: currently 288 samples. AutoDrug was developed in conjunction with RestFlow, a new scientific workflow-automation framework. RestFlow simplifies the design of AutoDrug by managing the flow of data and the organization of results and by orchestrating the execution of computational pipeline steps. It also simplifies the execution and interaction of third-party programs and the beamline-control system. Modeling AutoDrug as a scientific workflow enables multiple variants that meet the requirements of different user groups to be developed and supported. A workflow tailored to mimic the crystallography stages comprising the drug-discovery pipeline of CoCrystal Discovery Inc. has been deployed and successfully demonstrated.
This workflow was run once on the same 96 samples that the group had examined manually and the workflow cycled successfully through all of the samples, collected data from the same samples that were selected manually and located the same peaks of unmodeled density in the resulting difference Fourier maps.
PMCID: PMC3640469  PMID: 23633588
AutoDrug; fragment-based drug discovery; workflow automation