Results 1-25 (489261)

1.  Software platform virtualization in chemistry research and university teaching 
Background
Modern chemistry laboratories operate with a wide range of software applications under different operating systems, such as Windows, Linux or Mac OS X. Instead of installing software on different computers it is possible to install those applications on a single computer using Virtual Machine software. Software platform virtualization allows a single host operating system to execute multiple guest operating systems on the same computer. We apply and discuss the use of virtual machines in chemistry research and teaching laboratories.
Results
Virtual machines are commonly used for cheminformatics software development and testing. Benchmarking multiple chemistry software packages, we confirmed that the computational speed penalty for using virtual machines is low, around 5% to 10%. Software virtualization in a teaching environment allows faster deployment and easy use of commercial and open source software in hands-on computer teaching labs.
Conclusion
Software virtualization in chemistry, mass spectrometry and cheminformatics is needed for software testing and development of software for different operating systems. In order to obtain maximum performance the virtualization software should be multi-core enabled and allow the use of multiprocessor configurations in the virtual machine environment. Server consolidation, by running multiple tasks and operating systems on a single physical machine, can lead to lower maintenance and hardware costs especially in small research labs. The use of virtual machines can prevent software virus infections and security breaches when used as a sandbox system for internet access and software testing. Complex software setups can be created with virtual machines and are easily deployed later to multiple computers for hands-on teaching classes. We discuss the popularity of bioinformatics compared to cheminformatics, as well as the lack of cheminformatics education at universities worldwide.
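The 5% to 10% penalty reported above is the kind of figure one can reproduce by timing an identical workload natively and inside a VM. A minimal sketch, assuming a stand-in compute kernel rather than the paper's actual benchmark suite:

    import time

    def workload():
        # Stand-in compute kernel; substitute the real chemistry job here.
        s = 0.0
        for i in range(1, 10_000_000):
            s += 1.0 / (i * i)
        return s

    def best_time(repeats=5):
        # Take the best of several runs to reduce scheduling noise.
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            workload()
            best = min(best, time.perf_counter() - t0)
        return best

    print(f"best wall time: {best_time():.2f} s")
    # Run once on the host OS and once inside the VM; the relative penalty is
    # (t_vm - t_native) / t_native, e.g. 0.05-0.10 for the 5-10% reported above.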
doi:10.1186/1758-2946-1-18
PMCID: PMC2820496  PMID: 20150997
2.  A comparative analysis of dynamic grids vs. virtual grids using the A3pviGrid framework 
Bioinformation  2010;5(5):186-190.
With the proliferation of quad/multi-core microprocessors in mainstream platforms such as desktops and workstations, a large number of unused CPU cycles can be utilized for running virtual machines (VMs) as dynamic nodes in distributed environments. Grid services and their service-oriented business counterpart, now termed cloud computing, could deploy image-based virtualization platforms enabling agent-based resource management and dynamic fault management. In this paper we present an efficient way of utilizing heterogeneous virtual machines on idle desktops as an environment for consumption of high-performance grid services. Spurious and exponential increases in the size of datasets are constant concerns in the medical and pharmaceutical industries due to the constant discovery and publication of large sequence databases. Traditional algorithms are not designed to handle large data sizes under sudden and dynamic changes in the execution environment, as previously discussed. This research was undertaken to compare our previous results with running the same test dataset on a virtual Grid platform using virtual machines (virtualization). The implemented architecture, A3pviGrid, utilizes game-theoretic optimization and agent-based team formation (coalition) algorithms to improve scalability with respect to team formation. Due to the dynamic nature of distributed systems (as discussed in our previous work), all interactions were made local within a team transparently. This paper is a proof of concept of an experimental mini-Grid test-bed compared to running the platform on local virtual machines on a local test cluster. This was done to give every agent its own execution platform, enabling anonymity and better control of the dynamic environmental parameters. We also analyze the performance and scalability of BLAST in a multiple virtual node setup and present our findings. This paper is an extension of our previous research on improving the BLAST application framework using dynamic Grids on virtualization platforms such as VirtualBox.
PMCID: PMC3040497  PMID: 21364795
Agents; BLAST; Coalition; Grids; Virtual Machines; Virtualization
3.  Free Factories: Unified Infrastructure for Data Intensive Web Services 
We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general purpose web server, along with access to distributed batch processing, cache and storage services. Each cluster exploits idle RAM and disk space for cache, and reserves disks in each node for high bandwidth storage. The batch processing service uses a variation of the MapReduce model. Virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4-8 gigabytes per second of disk I/O. Our intent is to use multiple clusters to process hundreds of simultaneous requests on multi-hundred terabyte data sets. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center.
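The batch service is described only as "a variation of the MapReduce model"; the following generic single-process sketch (not the Freegol API, which the abstract does not detail) shows the model the service builds on:

    from collections import defaultdict

    def map_phase(records, mapper):
        # Apply the mapper to every record and group emitted values by key.
        groups = defaultdict(list)
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
        return groups

    def reduce_phase(groups, reducer):
        # Apply the reducer independently to each key's group of values.
        return {key: reducer(key, values) for key, values in groups.items()}

    # Word count, the canonical MapReduce example:
    docs = ["virtual machines share disks", "machines cache disks"]
    groups = map_phase(docs, lambda doc: [(w, 1) for w in doc.split()])
    counts = reduce_phase(groups, lambda key, values: sum(values))
    print(counts)  # {'virtual': 1, 'machines': 2, 'share': 1, ...}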
PMCID: PMC2877279  PMID: 20514356
4.  Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community 
BMC Bioinformatics  2012;13:42.
Background
A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure.
Results
Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatics tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and a VirtualBox appliance are also publicly available for download and use by researchers with access to private clouds.
Conclusions
Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.
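Provisioning a Cloud BioLinux instance on EC2 follows the ordinary VM-launch pattern. A sketch using today's boto3 client; the AMI ID, instance type, and key pair name below are placeholders, not values from the paper:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Launch one instance from a (hypothetical) Cloud BioLinux machine image.
    response = ec2.run_instances(
        ImageId="ami-00000000",   # placeholder: look up the current Cloud BioLinux AMI
        InstanceType="m1.large",  # placeholder instance type
        KeyName="my-keypair",     # existing EC2 key pair for SSH access
        MinCount=1,
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]
    print("started", instance_id)
    # After boot, connect to the instance's desktop or SSH in and use the
    # pre-installed bioinformatics packages.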
doi:10.1186/1471-2105-13-42
PMCID: PMC3372431  PMID: 22429538
5.  TiArA: A Virtual Appliance for the Analysis of Tiling Array Data 
PLoS ONE  2010;5(4):e9993.
Background
Genomic tiling arrays have been described in the scientific literature since 2003, yet there is a shortage of user-friendly applications available for their analysis.
Methodology/Principal Findings
Tiling Array Analyzer (TiArA) is a software program that provides a user-friendly graphical interface for the background subtraction, normalization, and summarization of data acquired through the Affymetrix tiling array platform. The background signal is empirically measured using a group of nonspecific probes with varying levels of GC content and normalization is performed to enforce a common dynamic range.
Conclusions/Significance
TiArA is implemented as a standalone program for Linux systems and is available as a cross-platform virtual machine that will run under most modern operating systems using virtualization software such as Sun VirtualBox or VMware. The software is available as a Debian package or a virtual appliance at http://purl.org/NET/tiara.
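The processing steps are only outlined above; the following numpy sketch illustrates the general idea of GC-binned background subtraction followed by rescaling to a common dynamic range (the exact TiArA procedure may differ):

    import numpy as np

    def subtract_background(signal, gc, bg_signal, bg_gc):
        # Estimate mean background per GC-content bin from the nonspecific
        # probes, then subtract each probe's matching bin estimate.
        bg_per_bin = {b: bg_signal[bg_gc == b].mean() for b in np.unique(bg_gc)}
        overall = bg_signal.mean()
        bg = np.array([bg_per_bin.get(g, overall) for g in gc])
        return signal - bg

    def to_common_range(x, low=0.0, high=1.0):
        # Rescale so every array shares the same dynamic range.
        return low + (x - x.min()) * (high - low) / (x.max() - x.min())

    signal = np.array([120.0, 340.0, 95.0, 410.0])   # probe intensities (toy)
    gc     = np.array([10, 12, 10, 14])              # GC bases per probe
    bg_sig = np.array([80.0, 90.0, 100.0])           # nonspecific probe intensities
    bg_gc  = np.array([10, 12, 14])
    print(to_common_range(subtract_background(signal, gc, bg_sig, bg_gc)))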
doi:10.1371/journal.pone.0009993
PMCID: PMC2848623  PMID: 20376318
6.  Design and Analysis of Self-Adapted Task Scheduling Strategies in Wireless Sensor Networks 
Sensors (Basel, Switzerland)  2011;11(7):6533-6554.
In a wireless sensor network (WSN), the usage of resources is usually highly related to the execution of tasks which consume a certain amount of computing and communication bandwidth. Parallel processing among sensors is a promising solution to provide the demanded computation capacity in WSNs. Task allocation and scheduling is a typical problem in the area of high performance computing. Although task allocation and scheduling in wired processor networks have been well studied in the past, their counterparts for WSNs remain largely unexplored. Existing traditional high performance computing solutions cannot be directly implemented in WSNs due to the limitations of WSNs such as limited resource availability and the shared communication medium. In this paper, a self-adapted task scheduling strategy for WSNs is presented. First, a multi-agent-based architecture for WSNs is proposed and a mathematical model of dynamic alliance is constructed for the task allocation problem. Then an effective discrete particle swarm optimization (PSO) algorithm for the dynamic alliance (DPSO-DA) with a well-designed particle position code and fitness function is proposed. A mutation operator which can effectively improve the algorithm’s ability of global search and population diversity is also introduced in this algorithm. Finally, the simulation results show that the proposed solution can achieve significantly better performance than other algorithms.
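The particle position code and fitness function of DPSO-DA are specific to the paper; the sketch below is a generic discrete PSO with a mutation operator for task-to-node allocation, using an illustrative makespan fitness:

    import random

    N_TASKS, N_NODES, SWARM, ITERS = 8, 3, 20, 100
    cost = [[random.uniform(1, 10) for _ in range(N_NODES)] for _ in range(N_TASKS)]

    def fitness(assign):
        # Illustrative fitness: makespan (max load over nodes); lower is better.
        load = [0.0] * N_NODES
        for t, n in enumerate(assign):
            load[n] += cost[t][n]
        return max(load)

    # A particle is a full assignment: task index -> node index.
    particles = [[random.randrange(N_NODES) for _ in range(N_TASKS)] for _ in range(SWARM)]
    pbest = [p[:] for p in particles]
    gbest = min(pbest, key=fitness)

    for _ in range(ITERS):
        for i, p in enumerate(particles):
            for t in range(N_TASKS):
                r = random.random()
                if r < 0.4:                        # move toward personal best
                    p[t] = pbest[i][t]
                elif r < 0.7:                      # move toward global best
                    p[t] = gbest[t]
                if random.random() < 0.05:         # mutation keeps diversity
                    p[t] = random.randrange(N_NODES)
            if fitness(p) < fitness(pbest[i]):
                pbest[i] = p[:]
        gbest = min(pbest + [gbest], key=fitness)

    print("best makespan:", round(fitness(gbest), 2))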
doi:10.3390/s110706533
PMCID: PMC3231676  PMID: 22163971
wireless sensor networks; task scheduling; particle swarm optimization; dynamic alliance
7.  SaTScan on a Cloud: On-Demand Large Scale Spatial Analysis of Epidemics 
Online Journal of Public Health Informatics  2010;2(1):ojphi.v2i1.2910.
By using cloud computing it is possible to provide on-demand resources for epidemic analysis using computationally intensive applications like SaTScan. Using 15 virtual machines (VMs) on the Nimbus cloud we were able to reduce the total execution time for the same ensemble run from 8896 seconds on a single machine to 842 seconds in the cloud. Using the caBIG tools and our iterative software development methodology, implementing the SaTScan cloud system required approximately 200 man-hours, an effort that can be secured within the resources available at State Health Departments. The approach proposed here is technically advantageous and practically feasible.
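In sketch form, the reported numbers imply a speedup of 8896/842, about 10.6x on 15 VMs, i.e. roughly 70% parallel efficiency:

    t_single, t_cloud, n_vms = 8896, 842, 15
    speedup = t_single / t_cloud       # ~10.6x
    efficiency = speedup / n_vms       # ~0.70, i.e. ~70% parallel efficiency
    print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0%}")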
doi:10.5210/ojphi.v2i1.2910
PMCID: PMC3615753  PMID: 23569576
8.  Power Consumption Analysis of Operating Systems for Wireless Sensor Networks 
Sensors (Basel, Switzerland)  2010;10(6):5809-5826.
In this paper four wireless sensor network operating systems are compared in terms of power consumption. The analysis takes into account the most common operating systems—TinyOS v1.0, TinyOS v2.0, Mantis and Contiki—running on Tmote Sky and MICAz devices. With the objective of ensuring a fair evaluation, a benchmark composed of four applications has been developed, covering the most typical tasks that a wireless sensor network performs. The results show the instantaneous and average current consumption of the devices during the execution of these applications. The experimental measurements provide a good insight into the power mode in which the device components are running at every moment, and they can be used to compare the performance of different operating systems executing the same tasks.
doi:10.3390/s100605809
PMCID: PMC3247733  PMID: 22219688
wireless sensor network operating systems; TinyOS; Mantis; Contiki; MICAz; Tmote
9.  The Virtual Instrument: Support for Grid-Enabled MCell Simulations 
Ensembles of widely distributed, heterogeneous resources, or Grids, have emerged as popular platforms for large-scale scientific applications. In this paper we present the Virtual Instrument project, which provides an integrated application execution environment that enables end-users to run and interact with running scientific simulations on Grids. This work is performed in the specific context of MCell, a computational biology application. While MCell provides the basis for running simulations, its capabilities are currently limited in terms of scale, ease-of-use, and interactivity. These limitations preclude usage scenarios that are critical for scientific advances. Our goal is to create a scientific “Virtual Instrument” from MCell by allowing its users to transparently access Grid resources while being able to steer running simulations. In this paper, we motivate the Virtual Instrument project and discuss a number of relevant issues and accomplishments in the area of Grid software development and application scheduling. We then describe our software design and report on the current implementation. We verify and evaluate our design via experiments with MCell on a real-world Grid testbed.
doi:10.1177/1094342004041290
PMCID: PMC2916200  PMID: 20689618
grid computing; computational neuroscience
10.  CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing 
BMC Bioinformatics  2011;12:356.
Background
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software.
Results
We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports the use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms.
Conclusion
The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
doi:10.1186/1471-2105-12-356
PMCID: PMC3228541  PMID: 21878105
11.  Non-synonymous variations in cancer and their effects on the human proteome: workflow for NGS data biocuration and proteome-wide analysis of TCGA data 
BMC Bioinformatics  2014;15:28.
Background
Next-generation sequencing (NGS) technologies have resulted in petabytes of scattered data, decentralized in archives, databases and sometimes in isolated hard-disks which are inaccessible for browsing and analysis. It is expected that curated secondary databases will help organize some of this Big Data, thereby allowing users to better navigate, search and compute on it.
Results
To address the above challenge, we have implemented a NGS biocuration workflow and are analyzing short read sequences and associated metadata from cancer patients to better understand the human variome. Curation of variation and other related information from control (normal tissue) and case (tumor) samples will provide comprehensive background information that can be used in genomic medicine research and application studies. Our approach includes a CloudBioLinux Virtual Machine which is used upstream of an integrated High-performance Integrated Virtual Environment (HIVE) that encapsulates a Curated Short Read archive (CSR) and a proteome-wide variation effect analysis tool (SNVDis). As a proof-of-concept, we have curated and analyzed control and case breast cancer datasets from the NCI cancer genomics program - The Cancer Genome Atlas (TCGA). Our efforts include reviewing and recording in CSR the available clinical information on patients, mapping the reads to the reference, followed by identification of non-synonymous Single Nucleotide Variations (nsSNVs), and integrating the data with tools that allow analysis of the effect of nsSNVs on the human proteome. Furthermore, we have also developed a novel phylogenetic analysis algorithm that uses SNV positions and can be used to classify the patient population. The workflow described here lays the foundation for analysis of short read sequence data to identify rare and novel SNVs that are not present in dbSNP and therefore provides a more comprehensive understanding of the human variome. Variation results for single genes as well as the entire study are available from the CSR website (http://hive.biochemistry.gwu.edu/dna.cgi?cmd=csr).
Conclusions
Availability of thousands of sequenced samples from patients provides a rich repository of sequence information that can be utilized to identify individual-level SNVs and their effect on the human proteome beyond what the dbSNP database provides.
doi:10.1186/1471-2105-15-28
PMCID: PMC3916084  PMID: 24467687
SRA; TCGA; nsSNV; SNV; SNP; Next-gen; NGS; Phylogenetics; Cancer
12.  MOLA: a bootable, self-configuring system for virtual screening using AutoDock4/Vina on computer clusters 
Background
Virtual screening of small molecules using molecular docking has become an important tool in drug discovery. However, large-scale virtual screening is time-consuming and usually requires dedicated computer clusters. There are a number of software tools that perform virtual screening using AutoDock4, but they require access to dedicated Linux computer clusters. Also, no software is available for performing virtual screening with Vina using computer clusters. In this paper we present MOLA, an easy-to-use graphical user interface tool that automates parallel virtual screening using AutoDock4 and/or Vina on bootable non-dedicated computer clusters.
Implementation
MOLA automates several tasks including: ligand preparation, parallel AutoDock4/Vina job distribution and result analysis. When the virtual screening project finishes, an OpenOffice spreadsheet file opens with the ligands ranked by binding energy and distance to the active site. All result files can automatically be recorded on a USB flash drive or on the hard-disk drive when using VirtualBox. MOLA works inside a customized live-CD GNU/Linux operating system, developed by us, that bypasses the original operating system installed on the computers used in the cluster. This operating system boots from a CD on the master node and then clusters other computers as slave nodes via Ethernet connections.
Conclusion
MOLA is an ideal virtual screening tool for non-experienced users with a limited number of multi-platform heterogeneous computers available and no access to dedicated Linux computer clusters. When a virtual screening project finishes, the computers can simply be restarted to their original operating system. The originality of MOLA lies in the fact that any platform-independent computer available can be added to the cluster, without ever using the computer hard-disk drive and without interfering with the installed operating system. With a cluster of 10 processors and a potential maximum speed-up of 10×, the parallel algorithm of MOLA achieved a speed-up of 8.64× using AutoDock4 and 8.60× using Vina.
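The job-distribution pattern MOLA automates can be sketched with a process pool driving the Vina command line; the file paths and docking box below are placeholders, and MOLA's own scripts are not reproduced:

    import subprocess
    from multiprocessing import Pool
    from pathlib import Path

    RECEPTOR = "receptor.pdbqt"                     # placeholder receptor file
    LIGANDS = sorted(Path("ligands").glob("*.pdbqt"))

    def dock(ligand):
        out = f"out/{ligand.stem}_out.pdbqt"
        # One Vina run per ligand; the docking box here is a placeholder.
        subprocess.run(
            ["vina", "--receptor", RECEPTOR, "--ligand", str(ligand),
             "--center_x", "0", "--center_y", "0", "--center_z", "0",
             "--size_x", "20", "--size_y", "20", "--size_z", "20",
             "--out", out],
            check=True,
        )
        return ligand.name

    if __name__ == "__main__":
        Path("out").mkdir(exist_ok=True)
        with Pool(processes=10) as pool:            # one worker per cluster CPU
            for name in pool.imap_unordered(dock, LIGANDS):
                print("finished", name)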
doi:10.1186/1758-2946-2-10
PMCID: PMC2987878  PMID: 21029419
13.  A Hybrid OFDM-TDM Architecture with Decentralized Dynamic Bandwidth Allocation for PONs 
The Scientific World Journal  2013;2013:561984.
One of the major challenges of passive optical networks is to achieve a fair arbitration mechanism that prevents possible collisions from occurring on the upstream channel when multiple users attempt to access the common fiber at the same time. Therefore, in this study we mainly focus on fair bandwidth allocation among users, and present a hybrid Orthogonal Frequency Division Multiplexed/Time Division Multiplexed architecture with a dynamic bandwidth allocation scheme that provides satisfactory service quality to users depending on their varying bandwidth requirements. Unnecessary delays occurring during the bandwidth assignment stage in centralized schemes are eliminated by utilizing a decentralized approach. Instead of sending bandwidth demands to the optical line terminal (OLT), which is the only competent authority, each optical network unit (ONU) runs the same bandwidth demand determination algorithm. ONUs inform each other via a signaling channel about the status of their queues. This information is fed to the bandwidth determination algorithm, which is run by each ONU in a distributed manner. Furthermore, the Light Load Penalty, a phenomenon in optical communications, is mitigated by limiting the amount of bandwidth that an ONU can demand.
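The core of the decentralized scheme is that every ONU applies the same deterministic rule to the same exchanged queue reports, so all ONUs arrive at identical grants without consulting the OLT. A simplified sketch with a capped proportional rule (the paper's exact algorithm is not reproduced):

    def allocate(queue_lengths, capacity, cap):
        # Same deterministic rule at every ONU over the shared queue reports,
        # so all ONUs compute identical grants without asking the OLT.
        demands = [min(q, cap) for q in queue_lengths]   # cap limits what an ONU may demand
        total = sum(demands)
        if total <= capacity:
            return demands
        return [d * capacity / total for d in demands]   # proportional scaling

    # Queue reports exchanged over the signaling channel (bytes, illustrative):
    queues = [5000, 200, 12000, 800]
    print(allocate(queues, capacity=10000, cap=6000))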
doi:10.1155/2013/561984
PMCID: PMC3806392  PMID: 24194684
14.  The Influence Relevance Voter: An Accurate and Interpretable Virtual High-Throughput Screening Method 
Given activity training data from High-Throughput Screening (HTS) experiments, virtual High-Throughput Screening (vHTS) methods aim to predict in silico the activity of untested chemicals. We present a novel method, the Influence Relevance Voter (IRV), specifically tailored for the vHTS task. The IRV is a low-parameter neural network which refines a k-nearest neighbor classifier by non-linearly combining the influences of a chemical's neighbors in the training set. Influences are decomposed, also non-linearly, into a relevance component and a vote component.
The IRV is benchmarked using the data and rules of two large, open competitions, and its performance is compared to that of other participating methods, as well as an in-house Support Vector Machine (SVM) method. On these benchmark datasets, the IRV achieves state-of-the-art results, comparable to the SVM in one case and significantly better than the SVM in the other, retrieving three times as many actives in the top 1% of its prediction-sorted list.
The IRV presents several other important advantages over SVMs and other methods: (1) the output predictions have probabilistic semantics; (2) the underlying inferences are interpretable; (3) the training time is very short, on the order of minutes even for very large datasets; (4) the risk of overfitting is minimal, due to the small number of free parameters; and (5) additional information can easily be incorporated into the IRV architecture. Combined with its performance, these qualities make the IRV particularly well suited for vHTS.
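A minimal sketch of the IRV idea: each of the query's nearest training neighbors contributes an influence that factors into a relevance term (from similarity and rank) times a vote term (from its class label), and the summed influences pass through a logistic output. The weights and relevance form below are illustrative stand-ins, not the fitted model:

    import math

    def irv_predict(similarities, labels, w_active=1.0, w_inactive=-1.0, bias=0.0):
        # similarities: similarity of the query to its k nearest training
        # neighbors, most similar first; labels: 1 = active, 0 = inactive.
        z = bias
        for rank, (s, y) in enumerate(zip(similarities, labels), start=1):
            relevance = s / rank                       # illustrative relevance term
            vote = w_active if y == 1 else w_inactive  # class-dependent vote
            z += relevance * vote                      # this neighbor's influence
        return 1.0 / (1.0 + math.exp(-z))              # probabilistic output

    # Query with 3 neighbors: two similar actives, one less similar inactive.
    print(irv_predict([0.9, 0.8, 0.5], [1, 1, 0]))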
doi:10.1021/ci8004379
PMCID: PMC2750043  PMID: 19391629
15.  NeuReal: an interactive simulation system for implementing artificial dendrites and large hybrid networks 
Journal of neuroscience methods  2007;169(2):290-301.
The dynamic clamp is a technique which allows the introduction of artificial conductances into living cells. Up to now, this technique has been mainly used to add small numbers of ‘virtual’ ion channels to real cells or to construct small hybrid neuronal circuits. In this paper we describe a prototype computer system, NeuReal, that extends the dynamic clamp technique to include (i) the attachment of artificial dendritic structures consisting of multiple compartments and (ii) the construction of large hybrid networks comprising several hundred biophysically realistic modelled neurons. NeuReal is a fully interactive system that runs on Windows XP, is written in a combination of C++ and assembler, and uses the Microsoft DirectX application programming interface (API) to achieve high-performance graphics. By using the sampling hardware-based representation of membrane potential at all stages of computation and by employing simple look-up tables, NeuReal can simulate over 1000 independent Hodgkin-Huxley-type conductances in real-time on a modern personal computer (PC). In addition, whilst not being a hard real-time system, NeuReal still offers reliable performance and tolerable jitter levels up to an update rate of 50 kHz. A key feature of NeuReal is that rather than being a simple dedicated dynamic clamp, it operates as a fast simulation system within which neurons can be specified as either real or simulated. We demonstrate the power of NeuReal with several example experiments and argue that it provides an effective tool for examining various aspects of neuronal function.
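The look-up-table trick mentioned above, precomputing voltage-dependent rate functions so each real-time update is a table read instead of exp() evaluations, can be sketched with standard Hodgkin-Huxley m-gate rates (NeuReal's C++/assembler internals are not reproduced):

    import numpy as np

    # Rate tables built once, before the real-time loop (0.01 mV resolution).
    V = np.arange(-100.0, 50.0, 0.01)
    with np.errstate(invalid="ignore"):
        alpha_m = 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
    alpha_m[np.isnan(alpha_m)] = 1.0           # limit of alpha_m at V = -40 mV
    beta_m = 4.0 * np.exp(-(V + 65.0) / 18.0)

    def lookup(table, v):
        # A table read replaces exp() evaluation inside the real-time loop.
        return table[int(round((v + 100.0) / 0.01))]

    def step_m(m, v, dt=0.02):                 # dt in ms, i.e. a 50 kHz update rate
        a, b = lookup(alpha_m, v), lookup(beta_m, v)
        return m + dt * (a * (1.0 - m) - b * m)    # forward-Euler gating update

    m, v = 0.05, -65.0                         # gating variable, membrane potential
    for _ in range(5):
        m = step_m(m, v)
    print(round(m, 4))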
doi:10.1016/j.jneumeth.2007.10.014
PMCID: PMC3017968  PMID: 18067972
Dynamic Clamp; Thalamus; Oscillations; Computer Simulation; Gap Junctions
16.  A Virtual Sensor for Online Fault Detection of Multitooth-Tools 
Sensors (Basel, Switzerland)  2011;11(3):2773-2795.
The installation of suitable sensors close to the tool tip on milling centres is not possible in industrial environments. It is therefore necessary to design virtual sensors for these machines to perform online fault detection in many industrial tasks. This paper presents a virtual sensor for online fault detection of multitooth tools based on a Bayesian classifier. The device that performs this task applies mathematical models that function in conjunction with physical sensors. Only two experimental variables are collected from the milling centre that performs the machining operations: the electrical power consumption of the feed drive and the time required for machining each workpiece. The task of achieving reliable signals from a milling process is especially complex when multitooth tools are used, because each kind of cutting insert in the milling centre only works on each workpiece during a certain time window. Great effort has gone into designing a robust virtual sensor that can avoid re-calibration due to, e.g., maintenance operations. The virtual sensor developed as a result of this research is successfully validated under real conditions on a milling centre used for the mass production of automobile engine crankshafts. Recognition accuracy, calculated with k-fold cross-validation, averaged a true-positive rate of 0.957 and a true-negative rate of 0.986. Moreover, measured accuracy was 98%, which suggests that the virtual sensor correctly identifies new cases.
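With only two measured variables, the Bayesian classifier can be illustrated as a tiny Gaussian naive Bayes over feed-drive power and machining time; all numbers below are invented for illustration, not the paper's data:

    import math

    def gaussian(x, mu, sigma):
        # Class-conditional likelihood of one feature under a normal model.
        return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

    # Per-class means/stds for (feed-drive power in W, machining time in s).
    classes = {
        "tool_ok":   {"prior": 0.9, "power": (1500.0, 80.0),  "time": (42.0, 1.5)},
        "tool_worn": {"prior": 0.1, "power": (1750.0, 120.0), "time": (45.0, 2.0)},
    }

    def classify(power, seconds):
        # Naive Bayes: prior times the likelihood of each feature, normalized.
        scores = {name: c["prior"] * gaussian(power, *c["power"]) * gaussian(seconds, *c["time"])
                  for name, c in classes.items()}
        total = sum(scores.values())
        return {name: s / total for name, s in scores.items()}

    print(classify(power=1700.0, seconds=44.5))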
doi:10.3390/s110302773
PMCID: PMC3231587  PMID: 22163766
virtual sensor; Bayesian classifier; industrial applications; tool condition monitoring; multitooth-tools
17.  Parallelization of multicategory support vector machines (PMC-SVM) for classifying microarray data 
BMC Bioinformatics  2006;7(Suppl 4):S15.
Background
Multicategory Support Vector Machines (MC-SVM) are powerful classification systems with excellent performance in a variety of data classification problems. Since the process of generating models in traditional multicategory support vector machines for large datasets is very computationally intensive, there is a need to improve the performance using high performance computing techniques.
Results
In this paper, Parallel Multicategory Support Vector Machines (PMC-SVM) have been developed based on the sequential minimal optimization-type decomposition method for support vector machines (SMO-SVM). It was implemented in parallel using MPI and C++ libraries and executed on both a shared-memory supercomputer and a Linux cluster for multicategory classification of microarray data. PMC-SVM has been analyzed and evaluated using four microarray datasets with multiple diagnostic categories, such as different cancer types and normal tissue types.
Conclusion
The experiments show that the PMC-SVM can significantly improve the performance of classification of microarray data without loss of accuracy, compared with previous work.
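One straightforward way to parallelize multicategory SVM training with MPI is to give each rank its share of the one-vs-rest binary problems; the mpi4py/scikit-learn sketch below shows that pattern on toy data, not the paper's SMO-level decomposition:

    # Run with e.g.: mpiexec -n 4 python pmc_svm_sketch.py
    from mpi4py import MPI
    import numpy as np
    from sklearn.svm import SVC

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Toy stand-in for microarray data: 120 samples, 50 features, 4 classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 50))
    y = rng.integers(0, 4, size=120)

    # Each rank trains the one-vs-rest classifiers for its share of the classes.
    my_classes = [c for c in range(4) if c % size == rank]
    models = {c: SVC(kernel="linear").fit(X, (y == c).astype(int)) for c in my_classes}

    # Gather the per-class decision values on rank 0 and pick the argmax class.
    scores = {c: m.decision_function(X) for c, m in models.items()}
    all_scores = comm.gather(scores, root=0)
    if rank == 0:
        merged = {c: s for part in all_scores for c, s in part.items()}
        pred = np.argmax(np.vstack([merged[c] for c in range(4)]), axis=0)
        print("training accuracy:", (pred == y).mean())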
doi:10.1186/1471-2105-7-S4-S15
PMCID: PMC1780126  PMID: 17217507
18.  Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube. 
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
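Linda's model reduces to a few operations on a shared tuple space: out() deposits a tuple and in() atomically withdraws a matching one, blocking until it appears. A minimal thread-based sketch of the model (not the Linda or Network Linda implementation):

    import threading

    class TupleSpace:
        def __init__(self):
            self._tuples, self._cv = [], threading.Condition()

        def out(self, tup):                    # deposit a tuple
            with self._cv:
                self._tuples.append(tup)
                self._cv.notify_all()

        def in_(self, pattern):                # withdraw a match, blocking; None = wildcard
            def matches(t):
                return len(t) == len(pattern) and all(
                    p is None or p == v for p, v in zip(pattern, t))
            with self._cv:
                while True:
                    for t in self._tuples:
                        if matches(t):
                            self._tuples.remove(t)
                            return t
                    self._cv.wait()

    ts = TupleSpace()
    threading.Thread(target=lambda: ts.out(("result", 42))).start()
    print(ts.in_(("result", None)))            # blocks until ("result", 42) arrives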
PMCID: PMC2247563  PMID: 1807632
19.  Simulated Medical Learning Environments on the Internet 
Learning anatomy and surgical procedures requires both a conceptual understanding of three-dimensional anatomy and hands-on manipulation of tools and tissue. Such resources are not widely available, are expensive, and may be culturally disallowed. Simulation technology, using high-performance computers and graphics, permits realistic real-time display of anatomy. Haptics technology supports the ability to probe and feel this virtual anatomy through the use of virtual tools. The Internet permits world-wide access to resources. We have brought together high-performance servers and high-bandwidth communication using the Next Generation Internet and complex bimanual haptics to simulate a tool-based learning environment for wide use. This article presents the technologic basis of this environment and some evaluation of its use in the gross anatomy course at Stanford University.
doi:10.1197/jamia.M1089
PMCID: PMC346631  PMID: 12223496
20.  RseqFlow: workflows for RNA-Seq data analysis 
Bioinformatics  2011;27(18):2598-2600.
Summary: We have developed an RNA-Seq analysis workflow for single-ended Illumina reads, termed RseqFlow. This workflow includes a set of analytic functions, such as quality control for sequencing data, generation of signal tracks of mapped reads, calculation of expression levels, identification of differentially expressed genes and calling of coding SNPs. This workflow is formalized and managed by the Pegasus Workflow Management System, which maps the analysis modules onto available computational resources, automatically executes the steps in the appropriate order and supervises the whole running process. RseqFlow is available as a Virtual Machine with all the necessary software, which eliminates any complex configuration and installation steps.
Availability and implementation: http://genomics.isi.edu/rnaseq
Contact: wangying@xmu.edu.cn; knowles@med.usc.edu; deelman@isi.edu; tingchen@usc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr441
PMCID: PMC3167055  PMID: 21795323
21.  A Cloud-Based Simulation Architecture for Pandemic Influenza Simulation 
High-fidelity simulations of pandemic outbreaks are resource-intensive. Cluster-based solutions have been suggested for executing such complex computations. We present a cloud-based simulation architecture that utilizes computing resources both locally available and dynamically rented online. The approach uses the Condor framework for job distribution and management of the Amazon Elastic Compute Cloud (EC2) as well as local resources. The architecture has a web-based user interface that allows users to monitor and control simulation execution. In a benchmark test, the best cost-adjusted performance was recorded for the EC2 High-CPU Medium instance, while a field trial showed that the job configuration had significant influence on the execution time and that the network capacity of the master node could become a bottleneck. We conclude that it is possible to develop a scalable simulation environment that uses cloud-based solutions, while providing an easy-to-use graphical user interface.
PMCID: PMC3243184  PMID: 22195089
22.  A Biomimetic Adaptive Algorithm and Low-Power Architecture for Implantable Neural Decoders 
Algorithmically and energetically efficient computational architectures that operate in real time are essential for clinically useful neural prosthetic devices. Such devices decode raw neural data to obtain direct control signals for external devices. They can also perform data compression and vastly reduce the bandwidth and consequently power expended in wireless transmission of raw data from implantable brain-machine interfaces. We describe a biomimetic algorithm and micropower analog circuit architecture for decoding neural cell ensemble signals. The decoding algorithm implements a continuous-time artificial neural network, using a bank of adaptive linear filters with kernels that emulate synaptic dynamics. The filters transform neural signal inputs into control-parameter outputs, and can be tuned automatically in an on-line learning process. We provide experimental validation of our system using neural data from thalamic head-direction cells in an awake behaving rat.
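In its simplest digital form, a bank of adaptive linear filters tuned online corresponds to LMS filtering; the sketch below decodes a toy one-dimensional signal and stands in for neither the synaptic-dynamics kernels nor the analog circuitry of the paper:

    import numpy as np

    def lms_decode(spike_counts, target, taps=5, mu=0.01):
        # One adaptive linear filter: convolve recent neural input with a
        # learned kernel, updating the kernel online via the LMS rule.
        w = np.zeros(taps)
        estimates = []
        for t in range(len(spike_counts)):
            x = spike_counts[max(0, t - taps + 1): t + 1]
            x = np.pad(x, (taps - len(x), 0))     # recent history window
            y = w @ x                             # filter output (control signal)
            e = target[t] - y                     # error against training signal
            w += mu * e * x                       # online LMS weight update
            estimates.append(y)
        return np.array(estimates), w

    rng = np.random.default_rng(1)
    rate = rng.poisson(5, size=500).astype(float)             # toy "neural" input
    head_dir = np.convolve(rate, [0.3, 0.2, 0.1], mode="same")  # toy target signal
    est, w = lms_decode(rate, head_dir)
    print("final MSE:", round(float(np.mean((head_dir[-100:] - est[-100:]) ** 2)), 3))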
doi:10.1109/IEMBS.2009.5333793
PMCID: PMC2905462  PMID: 19964345
Brain-machine interface; Neural decoding; Biomimetic; Adaptive algorithms; Analog; Low-power
23.  Not all physical errors can be linear CPTP maps in a correlation space 
Scientific Reports  2012;2:508.
In the framework of quantum computational tensor network, which is a general framework of measurement-based quantum computation, the resource many-body state is represented in a tensor-network form (or a matrix-product form), and universal quantum computation is performed in a virtual linear space, which is called a correlation space, where tensors live. Since any unitary operation, state preparation, and the projection measurement in the computational basis can be simulated in a correlation space, it is natural to expect that fault-tolerant quantum circuits can also be simulated in a correlation space. However, we point out that not all physical errors on physical qudits appear as linear completely positive trace-preserving (CPTP) errors in a correlation space. Since the theories of fault-tolerant quantum circuits known so far assume such noises, this means that the simulation of fault-tolerant quantum circuits in a correlation space is not so straightforward for general resource states.
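The mechanism at issue can be written compactly in standard matrix-product notation (textbook notation, not the paper's own). The resource state is

    |\psi\rangle = \sum_{m_1,\dots,m_N} \langle L|\, A[m_N] \cdots A[m_1] \,|R\rangle \; |m_1,\dots,m_N\rangle ,

where the virtual space the tensors A[m] act on is the correlation space. Measuring site k with outcome m_k inserts that site's tensor into the product, updating the correlation-space state as

    |\phi\rangle \;\mapsto\; A[m_k]\,|\phi\rangle \,/\, \lVert A[m_k]\,|\phi\rangle \rVert ,

whereas a physical error channel \mathcal{E}(\rho) = \sum_j K_j \rho K_j^\dagger on the measured qudit reshuffles the outcome statistics, and the induced map on |\phi\rangle need not itself be a linear CPTP map on the correlation space, which is the paper's point.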
doi:10.1038/srep00508
PMCID: PMC3396152  PMID: 22798985
24.  Simple re-instantiation of small databases using cloud computing 
BMC Genomics  2013;14(Suppl 5):S13.
Background
Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation, and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress.
Results
We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on-demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines on two popular full-virtualization cloud-computing platforms, Xen Hypervisor or VMware vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases.
Conclusions
Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.
doi:10.1186/1471-2164-14-S5-S13
PMCID: PMC3852246  PMID: 24564380
Database archival; Re-instantiation; Cloud computing; BioSLAX; biodb100; MIABi
25.  Effects of system tuning and RAM disk on the performance of a clinical laboratory information system 
Improvements in the performance of a laboratory computer system do not necessarily require the replacement of major portions of the system and may not require the acquisition of any hardware at all. Major bottlenecks may exist in the ways that the operating system manages its resources and in the algorithm used for timesharing decisions. Moreover, significant throughput improvements may be attainable by switching to a faster storage device if substantial disk activity is performed. In this study the fractions of time used for each of the types of tasks a laboratory computer system performs (e.g. application programs, disk transfer, queue cycler) are defined and measured. Methods for reducing the time fractions of the various types of overhead are evaluated in before-and-after studies. The combined results of the three studies indicated that a 50% improvement could be gained through system tuning and faster storage without replacement of the computer itself.
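A gain of this kind can be reasoned about with a generalized Amdahl's law: if a fraction f_i of total time goes to activity i and that activity is sped up by a factor s_i, the new total time is the sum of f_i/s_i. A sketch with invented fractions, not the study's measurements:

    def combined_speedup(fractions, speedups):
        # Generalized Amdahl's law: time fractions sum to 1, and each
        # fraction is divided by the speedup applied to that activity.
        new_time = sum(f / s for f, s in zip(fractions, speedups))
        return 1.0 / new_time

    # Invented example: 40% disk transfer (RAM disk, 10x faster), 20% queue
    # cycler / scheduling overhead (tuning, 2x), 40% application programs
    # (unchanged).
    print(f"{combined_speedup([0.40, 0.20, 0.40], [10.0, 2.0, 1.0]):.2f}x")  # ~1.85x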
doi:10.1155/S146392468900026X
PMCID: PMC2547769  PMID: 18925220
