An adaptive Cartesian grid (ACG) concept is presented for the fast and robust numerical solution of the 3D Poisson-Boltzmann Equation (PBE) governing the electrostatic interactions of large-scale biomolecules and highly charged multi-biomolecular assemblies such as ribosomes and viruses. The ACG offers numerous advantages over competing grid topologies such as regular 3D lattices and unstructured grids. For very large biological molecules and multi-biomolecule assemblies, the total number of grid points is several orders of magnitude less than that required in a conventional lattice grid used in the current PBE solvers, thus allowing the end user to obtain accurate and stable nonlinear PBE solutions on a desktop computer. Compared to tetrahedral-based unstructured grids, ACG offers a simpler hierarchical grid structure, which is naturally suited to multigrid, relieves indirect addressing requirements and uses fewer neighboring nodes in the finite difference stencils. Construction of the ACG and determination of the dielectric/ionic maps are straightforward and fast, and require minimal user intervention. Charge singularities are eliminated by reformulating the problem to produce the reaction field potential in the molecular interior and the total electrostatic potential in the exterior ionic solvent region. This approach minimizes grid-dependency and alleviates the need for fine grid spacing near atomic charge sites. The technical portion of this paper contains three parts. First, the ACG and its construction for general biomolecular geometries are described. Next, a discrete approximation to the PBE upon this mesh is derived. Finally, the overall solution procedure and multigrid implementation are summarized.
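The singularity-removal reformulation can be illustrated with a short sketch: the interior potential is split into a known analytic Coulomb part plus a smooth reaction field, so the grid only ever resolves the smooth component. The function below computes the analytic part that is subtracted out; function names are illustrative, units are omitted, and a uniform interior dielectric is assumed.

```python
import math

def coulomb_potential(point, charges, eps_in=2.0):
    """Analytic Coulomb potential of point charges in a uniform
    dielectric: the singular part that is removed from the grid problem,
    leaving only the smooth reaction field to be solved numerically."""
    phi = 0.0
    for q, pos in charges:
        r = math.dist(point, pos)
        if r == 0.0:
            raise ValueError("evaluation point coincides with a charge site")
        phi += q / (eps_in * r)
    return phi

# The grid solver then computes only the smooth reaction field; the total
# interior potential is recovered as phi_total = phi_coulomb + phi_reaction.
```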
Results obtained with the ACG-based PBE solver are presented for: (i) a low dielectric spherical cavity, containing interior point charges, embedded in a high dielectric ionic solvent – analytical solutions are available for this case, thus allowing rigorous assessment of the solution accuracy; (ii) a pair of low dielectric charged spheres embedded in an ionic solvent, to compute electrostatic interaction free energies as a function of the distance between sphere centers; (iii) surface potentials of proteins, nucleic acids and their larger-scale assemblies such as ribosomes; and (iv) electrostatic solvation free energies and their salt sensitivities – obtained with both the linear and nonlinear Poisson-Boltzmann equations – for a large set of proteins. These latter results, along with timings, can serve as benchmarks for comparing the performance of different PBE solvers.
Poisson-Boltzmann equation; biomolecular electrostatics; implicit solvent model; algorithm; finite difference methods; Cartesian grid; adaptive; electrostatic potential
1. When there is projected on the retina (man, monocularly) the shadow of a grid which forms a visual field in several distinct pieces (not including the fovea in the present tests), the ordinary properties of the flicker recognition contour (F vs. log I) as a function of the light-time cycle fraction (tL) can be markedly disturbed. In the present experiments flicker was produced by the rotation of a cylinder with opaque vertical stripes. In the absence of such a grid shadow the "cone" segments of the contours form a set in which Fmax. and the abscissa of inflection are opposite but rectilinear functions of tL, while the third parameter of the probability integral (σ'log I) remains constant. This is the case also with diverse other animals tested. In the data with the grid, however, analysis shows that even for low values of tL (up to 0.50) there occurs an enhancement of the production of elements of neural effect, so that Fmax. rises rather than falls as ordinarily with increase of tL, although σ'log I stays constant and hence the total number of acting units is presumed not to change. This constitutes valid evidence for neural integration of effects due to the illumination of separated retinal patches. Beginning at tL = 0.75, and at 0.90, the slope of the "cone" curve is sharply increased, and the maximum F is far above its position in the absence of the grid. The decrease of σ'log I (the slope constant) signifies, in terms of other information, an increase in the number of acting cone units. The abscissa of inflection is also much lowered, relatively, whereas without the grid it increases as tL is made larger. These effects correspond subjectively to the fact that at the end-point flicker is most pronounced, on the "cone" curve, along the edges of the grid shadow where contrast is particularly evident with the longer light-times. The "rod" portion of the F - log I contour is not specifically affected by the presence of the grid shadow. 
Its form is obtainable at tL = 0.90 free from the influence of summating "cone" contributions, because then almost no overlapping occurs. Analysis shows that when overlapping does occur a certain number of rod units are inhibited by concurrent cone excitation, and that the mean contribution of elements of neural action from each of the non-inhibited units is also reduced to an extent depending on the degree of overlap. The isolated "rod" curve at tL = 0.90 is quite accurately in the form of a probability integral. The data thus give a new experimental proof of the occurrence of two distinct but interlocking populations of visual effects, and experimentally justify the analytical procedures which have been used to separate them. 2. The changing form of the F - log I contour as a function of tL, produced in man when the illuminated field is divided into parts by a shadow pattern, is normally found with the bird Taeniopygia castanotis (Gould), the zebra finch. The retina has elements of one general structural type (cones), and the F - log I contour is a simplex symmetrical probability integral. The eye of this bird has a large, complex, and darkly pigmented pecten, which casts a foliated shadow on the retina. The change in form of the F - log I curve occurs with tL above 0.50, and at tL = 0.90 is quite extreme. It is more pronounced than the one that is secured in the human data with the particular grid we have used, but there is no doubt that it could be mimicked completely by the use of other grids. The increase of flicker acuity due to the pecten shadow is considerable, when the dark spaces are brief relative to the light. The evidence thus confirms the suggestion (Menner) drawn from comparative natural history that the visual significance of the avian pecten might be to increase the sensory effect of small moving images.
It is theoretically important that (as in the human experiment) this may be brought about by an actual decrease of effective retinal area illuminated. It is also significant theoretically that despite the presence of shadows of pecten or of grid, and of the sensory influences thus introduced, the probability integral formulation remains effective.
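The probability-integral formulation that the analysis relies on can be written out explicitly: F as a function of log I is a cumulative normal curve with the three parameters named above (Fmax, the abscissa of inflection, and the slope constant σ'log I). A minimal sketch with illustrative parameter values:

```python
import math

def flicker_contour(log_I, F_max, mu, sigma):
    """Probability-integral (cumulative normal) form of the F - log I
    contour: F rises sigmoidally from 0 to F_max, with its inflection at
    log I = mu and slope constant sigma (the sigma'_log I of the text)."""
    z = (log_I - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * F_max * (1.0 + math.erf(z))
```

At the inflection point the contour is exactly half of Fmax, which is how the abscissa of inflection is read off from fitted data.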
The noninvasive assessment of cardiac function is of prime importance for the diagnosis of cardiovascular diseases. Among all medical scanners, only a few enable radiologists to evaluate local cardiac motion, and tagged cardiac MRI is one of them. This protocol generates, on Short-Axis (SA) sequences, a dark grid which is deformed in accordance with the cardiac motion. Tracking the grid allows specialists to estimate local cardiac geometrical parameters within the myocardium. The work described in this paper aims to automate the detection of myocardial contours in order to optimize the detection and tracking of the grid of tags within the myocardium. The method we have developed for endocardial and epicardial contour detection is based on texture analysis and active contour models. Texture analysis allows us to define energy maps that are more effective than those usually used in active contour methods, whose attractors are often gradient-based; such gradient-based attractors were unusable in our case of study because the quality of tagged cardiac MRI is very poor.
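The active-contour component of such a pipeline can be sketched with the classical greedy snake update, here driven by an arbitrary external energy map (which, per the method above, would come from texture analysis rather than image gradients). This is a generic illustration, not the authors' exact formulation:

```python
def greedy_snake_step(contour, energy, alpha=1.0, beta=1.0):
    """One pass of the greedy active-contour update: each point moves to
    the neighbouring pixel minimising internal (continuity + curvature)
    plus external (energy-map) terms. `energy` is a 2-D list, e.g. a
    texture-based map rather than a gradient map."""
    n = len(contour)
    new = list(contour)
    for i, (y, x) in enumerate(contour):
        (py, px), (ny, nx) = new[i - 1], contour[(i + 1) % n]
        best, best_cost = (y, x), float("inf")
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                cy, cx = y + dy, x + dx
                if not (0 <= cy < len(energy) and 0 <= cx < len(energy[0])):
                    continue
                cont = (cy - py) ** 2 + (cx - px) ** 2                     # continuity
                curv = (py - 2 * cy + ny) ** 2 + (px - 2 * cx + nx) ** 2  # curvature
                cost = alpha * cont + beta * curv + energy[cy][cx]
                if cost < best_cost:
                    best, best_cost = (cy, cx), cost
        new[i] = best
    return new
```

Iterating this step until no point moves drives the contour into the low-energy valleys of the map.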
Medial entorhinal grid cells and hippocampal place cells provide neural correlates of spatial representation in the brain. A place cell typically fires whenever an animal is present in one or more spatial regions, or places, of an environment. A grid cell typically fires in multiple spatial regions that form a regular hexagonal grid structure extending throughout the environment. Different grid and place cells prefer spatially offset regions, with their firing fields increasing in size along the dorsoventral axes of the medial entorhinal cortex and hippocampus. The spacing between neighboring fields for a grid cell also increases along the dorsoventral axis. This article presents a neural model whose spiking neurons operate in a hierarchy of self-organizing maps, each obeying the same laws. This spiking GridPlaceMap model simulates how grid cells and place cells may develop. It responds to realistic rat navigational trajectories by learning grid cells with hexagonal grid firing fields of multiple spatial scales and place cells with one or more firing fields that match neurophysiological data about these cells and their development in juvenile rats. The place cells represent much larger spaces than the grid cells, which enables them to support navigational behaviors. Both self-organizing maps amplify and learn to categorize the most frequent and energetic co-occurrences of their inputs. The current results build upon a previous rate-based model of grid and place cell learning, and thus illustrate a general method for converting rate-based adaptive neural models, without the loss of any of their analog properties, into models whose cells obey spiking dynamics. New properties of the spiking GridPlaceMap model include the appearance of theta band modulation.
The spiking model also opens a path for implementation in brain-emulating nanochips comprised of networks of noisy spiking neurons with multiple-level adaptive weights for controlling autonomous adaptive robots capable of spatial navigation.
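The self-organizing-map learning law at the heart of the model can be illustrated, in a much-reduced rate-style form, by the classic winner-take-all update: each unit's weight vector moves toward the inputs it wins, so units come to categorize frequent input patterns. This sketch omits the spiking dynamics, neighborhood cooperation, and the entorhinal-hippocampal hierarchy, and all names are illustrative:

```python
import random

def train_som(inputs, n_units, dim, epochs=20, lr=0.3, seed=0):
    """Minimal self-organizing map: on each input, the closest unit
    (the winner) moves a fraction `lr` toward that input, so units
    self-organize to categorize frequent input co-occurrences."""
    rng = random.Random(seed)
    w = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for _ in range(epochs):
        for x in inputs:
            # winner-take-all: the unit closest to the input
            win = min(range(n_units),
                      key=lambda u: sum((w[u][k] - x[k]) ** 2 for k in range(dim)))
            for k in range(dim):  # move the winner toward the input
                w[win][k] += lr * (x[k] - w[win][k])
    return w
```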
Clearly visualized biopathways provide a great help in understanding biological systems. However, manual drawing of large-scale biopathways is time consuming. We proposed a grid layout algorithm that can handle gene-regulatory networks and signal transduction pathways by considering edge-edge crossings, node-edge crossings, distance measures between nodes, and subcellular localization information from Gene Ontology. Consequently, the layout algorithm succeeded in drastically reducing these crossings in the apoptosis model. However, for larger-scale networks, we encountered three problems: (i) the initial layout is often very far from any local optimum because nodes are initially placed at random; (ii) from a biological viewpoint, human layouts are still easier to understand than automatic layouts because, apart from subcellular localization, the algorithm does not fully utilize the biological information of pathways; and (iii) it employs a local search strategy in which the neighborhood is obtained by moving one node at each step, and automatic layouts suggest that simultaneous movements of multiple nodes are necessary for better layouts, although such an extension may worsen the time complexity.
We propose a new grid layout algorithm. To address problem (i), we devised a new force-directed algorithm whose output is suitable as the initial layout. For (ii), we considered that an appropriate alignment of nodes having the same biological attribute is one of the most important factors for comprehension, and we defined a new score function that gives an advantage to such configurations. For solving problem (iii), we developed a search strategy that considers swapping nodes as well as moving a node, while keeping the order of the time complexity. Though a naïve implementation increases the time complexity by one order, we overcame this difficulty by devising a method that caches the differences between the scores of a layout and those of its possible updates.
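The flavor of such a local search (evaluating only the score terms that a candidate move touches, so a step costs work proportional to the moved node's degree rather than to the whole layout) can be sketched as follows. Names and the pairwise score are illustrative, and the real algorithm also swaps node pairs and handles crossing penalties:

```python
def layout_search(nodes, edges, grid, score_pair, max_iter=100):
    """Grid-layout local search sketch: a move is accepted when it lowers
    the layout score, and the score difference is computed from the terms
    involving the moved node only. `score_pair(u, v, pos)` is an
    illustrative pairwise cost (e.g. a distance penalty between adjacent
    nodes)."""
    pos = {v: p for v, p in zip(nodes, grid)}
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def node_cost(v):
        # only terms involving v: the part of the score a move can change
        return sum(score_pair(v, u, pos) for u in adj[v])

    for _ in range(max_iter):
        improved = False
        for v in nodes:
            free = [c for c in grid if c not in pos.values()]
            for cell in free:
                before = node_cost(v)
                old = pos[v]
                pos[v] = cell
                if node_cost(v) < before:
                    improved = True   # keep the improving move
                else:
                    pos[v] = old      # revert
        if not improved:
            return pos
    return pos
```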
Layouts produced by the new grid layout algorithm are compared with those of the previous algorithm and a human layout in an endothelial cell model three times as large as the apoptosis model. The total cost of the result from the new grid layout algorithm is similar to that of the human layout. In addition, its convergence time is drastically reduced (a 40% reduction).
Several systems have been presented in recent years to manage the complexity of large microarray experiments. Although good results have been achieved, most systems fall short in one or more areas. A Grid based approach may provide a shared, standardized and reliable solution for storage and analysis of biological data, in order to maximize the results of experimental efforts. A Grid framework has therefore been adopted due to the necessity of remotely accessing large amounts of distributed data as well as to scale computational performances for terabyte datasets. Two different biological studies have been planned in order to highlight the benefits that can emerge from our Grid based platform. The described environment relies on storage services and computational services provided by the gLite Grid middleware. The Grid environment is also able to exploit the added value of metadata in order to let users better classify and search experiments. A state-of-the-art Grid portal has been implemented in order to hide the complexity of the framework from end users and to let them easily access available services and data. The functional architecture of the portal is described. As a first test of the system performances, a gene expression analysis has been performed on a dataset of Affymetrix GeneChip® Rat Expression Array RAE230A, from the ArrayExpress database. The sequence of analysis includes three steps: (i) group opening and image set uploading, (ii) normalization, and (iii) model based gene expression (based on the PM/MM difference model). Two different Linux versions (sequential and parallel) of the dChip software have been developed to implement the analysis and have been tested on a cluster. The results show that the parallelization of the analysis process and the execution of parallel jobs on distributed computational resources actually improve the performances.
Moreover, the Grid environment has been tested both for the possibility of uploading and accessing distributed datasets through the Grid middleware and for its ability to manage the execution of jobs on distributed computational resources. Results from the Grid test will be discussed in a further paper.
The spatial responses of many of the cells recorded in layer II of rodent medial entorhinal cortex (MEC) show a triangular grid pattern, which appears to provide an accurate population code for animal spatial position. In layers III, V and VI of the rat MEC, grid cells are also selective to head-direction and are modulated by the speed of the animal. Several putative mechanisms of grid-like maps have been proposed, including attractor network dynamics, interactions with theta oscillations or single-unit mechanisms such as firing rate adaptation. In this paper, we present a new attractor network model that accounts for the conjunctive position-by-velocity selectivity of grid cells. Our network model is able to perform robust path integration even when the recurrent connections are subject to random perturbations.
How do animals self-localize when they explore environments at variable velocities? One mechanism is dead reckoning, or path integration. Recent experiments on rodents show that such computation may be performed by grid cells in the medial entorhinal cortex. Each grid cell fires strongly when the animal enters locations that define the vertices of a triangular grid. Some of the grid cells show grid firing patterns only when the animal runs along particular directions. Here, we propose that grid cells collectively represent arbitrary conjunctions of positions and movements of the animal. Due to asymmetric recurrent connections, the network has grid patterns as states that are able to move intrinsically in all possible directions and at all possible speeds. A velocity-tuned input activates a subset of the population that prefers similar movements, and the pattern in the network moves with a velocity proportional to the movement of the animal in physical space, up to a fixed rotation. Thus the network ‘imagines’ the movement of the animal, and produces single cell grid firing responses in space with different degrees of head-direction selectivity. We propose testable predictions for new experiments to verify our model.
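The dead-reckoning computation itself is simple to state: position is the running integral of self-motion (velocity) signals. A minimal sketch of what the network is proposed to compute with its moving activity pattern (names are illustrative):

```python
def path_integrate(start, velocities, dt=1.0):
    """Dead reckoning: integrate velocity signals over time to maintain a
    position estimate, the computation the grid-cell network is proposed
    to carry out by intrinsically shifting its activity pattern."""
    x, y = start
    trajectory = [(x, y)]
    for vx, vy in velocities:
        x += vx * dt
        y += vy * dt
        trajectory.append((x, y))
    return trajectory
```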
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position.
To develop software infrastructure that will provide support for discovery, characterization, integrated access, and management of diverse and disparate collections of information sources, analysis methods, and applications in biomedical research.
An enterprise Grid software infrastructure, called caGrid version 1.0 (caGrid 1.0), has been developed as the core Grid architecture of the NCI-sponsored cancer Biomedical Informatics Grid (caBIG™) program. It is designed to support a wide range of use cases in basic, translational, and clinical research, including 1) discovery, 2) integrated and large-scale data analysis, and 3) coordinated study.
The caGrid is built as a Grid software infrastructure and leverages Grid computing technologies and the Web Services Resource Framework standards. It provides a set of core services, toolkits for the development and deployment of new community provided services, and application programming interfaces for building client applications.
The caGrid 1.0 was released to the caBIG community in December 2006. It is built on open source components and caGrid source code is publicly and freely available under a liberal open source license. The core software, associated tools, and documentation can be downloaded from the following URL: https://cabig.nci.nih.gov/workspaces/Architecture/caGrid.
While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields. In this respect, caGrid 1.0 is the realization of a framework that can benefit the entire biomedical community.
Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient data sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the gene variations of each subject. In particular, this is valid in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of cancer tumours. The vast amounts of data, however, coupled with the use of different terms - or semantic heterogeneity - in each discipline makes the retrieval and integration of information difficult.
Existing software infrastructures for data-sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction, including syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality is given at the structural metadata level, without capitalising on the ontology-based annotations. This paper presents the design of and theoretical foundations for distributed ontology-based queries over cancer research data. Concept-based queries are reformulated to the target query language, where join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure. The approach is applicable to other model-driven architectures. A graphical user interface has been developed, supporting ontology-based queries over caGrid data sources. An extensive evaluation of the query reformulation technique is included.
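The role of the semantic annotations in reformulation can be shown in miniature: columns are selected by the ontology concept they are annotated with, and join conditions are proposed between columns of different sources annotated with the same concept. This toy stands in for the model-driven metadata machinery, and every name in it is invented for illustration:

```python
def reformulate(concepts, annotations):
    """Toy concept-based query reformulation. `annotations` maps each
    (source, table, column) triple to the ontology concept it is annotated
    with. Columns matching the requested concepts are selected, and join
    conditions are proposed between columns of different sources that
    share a concept annotation."""
    selected = [loc for loc, c in annotations.items() if c in concepts]
    joins = []
    for i, a in enumerate(selected):
        for b in selected[i + 1:]:
            # join candidates: different sources, same concept annotation
            if a[0] != b[0] and annotations[a] == annotations[b]:
                joins.append((a, b))
    return selected, joins
```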
To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data in an efficient manner. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures.
Tagged Magnetic Resonance Imaging (tagged MRI or tMRI) provides a means of directly and noninvasively displaying the internal motion of the myocardium. Reconstruction of the motion field is needed to quantify important clinical information, e.g., the myocardial strain, and detect regional heart functional loss. In this paper, we present a three-step method for this task. First, we use a Gabor filter bank to detect and locate tag intersections in the image frames, based on local phase analysis. Next, we use an improved version of the Robust Point Matching (RPM) method to sparsely track the motion of the myocardium, by establishing a transformation function and a one-to-one correspondence between grid tag intersections in different image frames. In particular, the RPM helps to minimize the impact on the motion tracking result of: 1) through-plane motion, and 2) relatively large deformation and/or relatively small tag spacing. In the final step, a meshless deformable model is initialized using the transformation function computed by RPM. The model refines the motion tracking and generates a dense displacement map, by deforming under the influence of image information, and is constrained by the displacement magnitude to retain its geometric structure. The 2D displacement maps in short and long axis image planes can be combined to drive a 3D deformable model, using the Moving Least Square method, constrained by the minimization of the residual error at tag intersections. The method has been tested on a numerical phantom, as well as on in vivo heart data from normal volunteers and heart disease patients. The experimental results show that the new method has a good performance on both synthetic and real data. Furthermore, the method has been used in an initial clinical study to assess the differences in myocardial strain distributions between heart disease (left ventricular hypertrophy) patients and the normal control group. 
The final results show that the proposed method is capable of separating patients from healthy individuals. In addition, the method detects, and makes it possible to quantify, local abnormalities in the myocardial strain distribution, which is critical for quantitative analysis of patients’ clinical conditions. This motion tracking approach can improve the throughput and reliability of quantitative strain analysis of heart disease patients, and has the potential for further clinical applications.
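The first step, tag detection with a Gabor filter bank, hinges on kernels tuned to the tag spacing and orientation. A minimal sketch of one such kernel (real part only; the parameter values are illustrative, and a full bank would sample several wavelengths and orientations):

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real part of a 2-D Gabor kernel: a Gaussian envelope times a
    sinusoidal carrier tuned to the tag spacing (`wavelength`) and tag
    orientation (`theta`). Filtering a tagged image with such kernels
    gives a local-phase response that peaks along tag lines; intersections
    are located where two orientations respond together."""
    half = size // 2
    k = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            y, x = i - half, j - half
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            k[i][j] = envelope * carrier
    return k
```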
Tagged MRI; Motion Tracking; Gabor Filter; RPM; Deformable Model; Strain
In this paper, we study several interesting optimal-ratio region detection (ORD) problems in d-D (d ≥ 3) discrete geometric spaces, which arise in high dimensional medical image segmentation. Given a d-D voxel grid of n cells, two classes of geometric regions that are enclosed by a single or two coupled smooth heightfield surfaces defined on the entire grid domain are considered. The objective functions are normalized by a function of the desired regions, which avoids a bias toward producing an overly large or small region as a result of data noise. The normalization functions that we employ are used in real medical image segmentation. To the best of our knowledge, no previous results on these problems in high dimensions are known. We develop a unified algorithmic framework based on a careful characterization of the intrinsic geometric structures and a nontrivial graph transformation scheme, yielding efficient polynomial time algorithms for solving these ORD problems. Our main ideas include the following. We observe that the optimal solution to the ORD problems can be obtained via the construction of a convex hull for a set of O(n) unknown 2-D points using the hand probing technique. The probing oracles are implemented by computing a minimum s-t cut in a weighted directed graph. The ORD problems are then solved by O(n) calls to the minimum s-t cut algorithm. For the class of regions bounded by a single heightfield surface, our further investigation shows that the O(n) calls to the minimum s-t cut algorithm are on a monotone parametric flow network, which enables detection of the optimal-ratio region within the complexity of computing a single maximum flow.
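The ratio-maximization core can be illustrated with a Dinkelbach-style iteration on a toy separable objective: each round solves the linearised problem max Σ(w_i − λ·c_i), here by simple selection of items with positive reduced weight, whereas in the paper the analogous oracle is a minimum s-t cut on the transformed voxel graph:

```python
def optimal_ratio(items, eps=1e-9):
    """Dinkelbach-style iteration for maximising sum(w)/sum(c) over
    subsets of `items` (pairs (w, c) with c > 0). At each step, lam is set
    to the ratio of the best subset found for the current linearisation;
    when no subset has positive linearised gain, lam is the optimal ratio."""
    lam = 0.0
    while True:
        # linearised oracle: keep items with positive reduced weight
        chosen = [(w, c) for w, c in items if w - lam * c > eps]
        if not chosen:
            return lam  # no subset improves on lam: it is the optimal ratio
        lam = sum(w for w, _ in chosen) / sum(c for _, c in chosen)
```

Because λ strictly increases whenever an improving subset exists and the subsets are finite, the loop terminates with the maximum achievable ratio.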
Hand probing; parametric search; minimum closed set; optimal-ratio region detection; algorithms
Despite continuous efforts of the international community to reduce the impact of malaria on developing countries, no significant progress has been made in the recent years and the discovery of new drugs is more than ever needed. Out of the many proteins involved in the metabolic activities of the Plasmodium parasite, some are promising targets to carry out rational drug discovery.
Recent years have witnessed the emergence of grids, which are highly distributed computing infrastructures particularly well fitted for embarrassingly parallel computations like docking. In 2005, a first attempt at using grids for large-scale virtual screening focused on plasmepsins and ended up in the identification of previously unknown scaffolds, which were confirmed in vitro to be active plasmepsin inhibitors. Following this success, a second deployment took place in the fall of 2006 focussing on one well known target, dihydrofolate reductase (DHFR), and on a new promising one, glutathione-S-transferase.
In silico drug design, especially virtual high-throughput screening (vHTS), is a widely used and well-accepted technology for lead identification and lead optimization. This approach therefore builds upon the progress made in computational chemistry, to achieve more accurate in silico docking, and in information technology, to design and operate large-scale grid infrastructures.
On the computational side, a sustained infrastructure has been developed: docking at large scale, using different strategies in result analysis; storing the results on the fly into MySQL databases; and applying molecular dynamics refinement with MM-PBSA and MM-GBSA rescoring. The modeling results obtained are very promising. Based on these results, in vitro tests are underway for all the targets against which screening was performed.
The current paper describes this rational drug discovery activity at large scale, especially molecular docking using the FlexX software on computational grids, to find hits against three different targets implicated in malaria: PfGST, PfDHFR, and PvDHFR (wild type and mutant forms). This grid-enabled virtual screening approach is proposed as a way to produce focused compound libraries for other biological targets relevant to fighting the infectious diseases of the developing world.
Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform.
The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model.
(1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure.
Materials and Methods:
We have considered two scenarios for algorithm evaluation: (1) algorithm comparison, where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation, where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogeneously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput.
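The effect of grid-based spatial indexing on join cost can be sketched with a toy bounding-box join: boxes are binned into fixed-size tiles, and only boxes sharing a tile are tested for intersection, the same quick-subsetting idea the platform applies inside the database. The sketch is Python rather than SQL, and all names are illustrative:

```python
from collections import defaultdict

def grid_join(boxes_a, boxes_b, cell=10.0):
    """Grid-partitioned spatial join on bounding boxes
    (xmin, ymin, xmax, ymax): bin boxes into fixed tiles, then test only
    tile-sharing pairs, avoiding the all-pairs comparison."""
    def tiles(b):
        x0, y0, x1, y1 = b
        for tx in range(int(x0 // cell), int(x1 // cell) + 1):
            for ty in range(int(y0 // cell), int(y1 // cell) + 1):
                yield (tx, ty)

    index = defaultdict(list)          # tile -> indices of boxes_b
    for j, b in enumerate(boxes_b):
        for t in tiles(b):
            index[t].append(j)

    def intersects(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    pairs = set()
    for i, a in enumerate(boxes_a):
        for t in tiles(a):             # only candidates sharing a tile
            for j in index[t]:
                if intersects(a, boxes_b[j]):
                    pairs.add((i, j))
    return sorted(pairs)
```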
Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download.
Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation.
Algorithm validation; parallel database; pathology imaging; spatial database
Understanding complex networks of protein-protein interactions (PPIs) is one of the foremost challenges of the post-genomic era. Due to recent advances in experimental biotechnology, including yeast two-hybrid (Y2H), tandem affinity purification (TAP) and other high-throughput methods for protein-protein interaction (PPI) detection, huge amounts of PPI network data are becoming available. Of major concern, however, are the levels of noise and incompleteness. For example, for Y2H screens, the false positive rate is thought to be as high as 64%, and the false negative rate may range from 43% to 71%. TAP experiments are believed to have comparable levels of noise.
We present a novel technique to assess the confidence levels of interactions in PPI networks obtained from experimental studies. We use it for predicting new interactions and thus for guiding future biological experiments. This technique is the first to utilize geometric graphs, currently the best-fitting network model for PPI networks. Our approach achieves specificity of 85% and sensitivity of 90%. We use it to assign confidence scores to physical protein-protein interactions in the human PPI network downloaded from BioGRID. Using our approach, we predict 251 interactions in the human PPI network, a statistically significant fraction of which correspond to protein pairs sharing common GO terms. Moreover, we validate a statistically significant portion of our predicted interactions in the HPRD database and the newer release of BioGRID. The data and Matlab code implementing the methods are freely available from the web site: http://www.kuchaev.com/Denoising.
Proteins are responsible for much of the biological ‘heavy lifting’ that keeps our cells functioning. However, proteins don't usually work alone; instead they typically bind together to form geometrically and chemically complex structures that are tailored for a specific task. Experimental techniques allow us to detect whether two types of proteins are capable of binding together, or ‘interacting’. This creates a network where two proteins are connected if they have been seen to interact, just as we could regard two people as being connected if they are linked on Facebook. Such protein-protein interaction networks have been developed for several organisms, using a range of methods, all of which are subject to experimental errors. These network data reveal a fascinating and intricate pattern of connections. In particular, it is known that proteins can be arranged into a low-dimensional space, such as a three-dimensional cube, so that interacting proteins are close together. Our work shows that this structure can be exploited to assign confidence levels to recorded protein-protein interactions and predict new interactions that were overlooked experimentally. In tests, we predicted 251 new human protein-protein interactions, and through literature curation we independently validated a statistically significant number of them.
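The geometric intuition described above can be sketched as follows; the embedding coordinates, the distance threshold and the linear scoring rule are all invented for illustration and are not the paper's actual method:

```python
# Hypothetical illustration of the geometric-graph idea: proteins are
# embedded as points in a low-dimensional space, and a candidate
# interaction is scored by how close the two proteins lie.
import math

embedding = {                      # hypothetical 3-D embedding
    "A": (0.0, 0.0, 0.0),
    "B": (0.1, 0.0, 0.0),          # close to A -> high confidence
    "C": (5.0, 5.0, 5.0),          # far from A -> low confidence
}

def confidence(p, q, radius=1.0):
    """Map inter-point distance to a [0, 1] confidence score."""
    d = math.dist(embedding[p], embedding[q])
    return max(0.0, 1.0 - d / radius)
```

Under such an embedding, recorded interactions between distant proteins receive low confidence (candidate false positives), while close non-interacting pairs become predictions of overlooked interactions.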
Examining whether disease cases are clustered in space is an important part of epidemiological research. Another important part of spatial epidemiology is testing whether patients suffering from a disease are more, or less, exposed to environmental factors of interest than adequately defined controls. Both approaches involve determining the number of cases and controls (or population at risk) in specific zones. For cluster searches, this often must be done for millions of different zones. Doing this by calculating distances can lead to very lengthy computations. In this work we discuss the computational advantages of geographical grid-based methods, and introduce an open source software program (FGBASE) which we have created for this purpose.
Geographical grids based on the Lambert Azimuthal Equal Area projection are well suited for spatial epidemiology because they preserve area: each cell of the grid has the same area. We describe how data is projected onto such a grid, as well as grid-based algorithms for spatial epidemiological data-mining. The software program (FGBASE) that we have developed implements these grid-based methods.
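The core of the grid approach can be sketched in a few lines (this is not FGBASE code; the cell size and coordinates are illustrative, and the real software first applies the Lambert Azimuthal Equal Area projection):

```python
# Grid-based counting: after projecting case locations with an
# equal-area projection, each point is assigned to a square cell by
# integer division, so counting cases per zone needs no pairwise
# distance computations. Coordinates below are already-projected toy
# values in kilometres.
from collections import Counter

CELL_KM = 10.0

def cell_of(x_km, y_km):
    """Index of the grid cell containing a projected point."""
    return (int(x_km // CELL_KM), int(y_km // CELL_KM))

cases = [(3.0, 4.0), (7.5, 2.0), (12.0, 4.0), (8.0, 9.9)]
counts = Counter(cell_of(x, y) for x, y in cases)
```

Assigning a point to its cell is O(1), which is why the grid version of a cluster search avoids the quadratic cost of computing all pairwise distances.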
The grid-based algorithms are extremely fast, particularly for cluster searches. When applied to a cohort of French Type 1 Diabetes (T1D) patients, as an example, the grid-based algorithms detected potential clusters in a few seconds on a modern laptop. This compares very favorably with an equivalent cluster search using distance calculations instead of a grid, which took over 4 hours on the same computer. In the case study we discovered 4 potential clusters of T1D cases near the cities of Le Havre, Dunkerque, Toulouse and Nantes. One example of environmental analysis with our software was to test whether a significant association could be found between T1D and distance to vineyards with heavy pesticide use; none was found. In both examples, the software facilitates the rapid testing of hypotheses.
Grid-based algorithms for mining spatial epidemiological data provide advantages in terms of computational complexity thus improving the speed of computations. We believe that these methods and this software tool (FGBASE) will lower the computational barriers to entry for those performing epidemiological research.
Electronic supplementary material
The online version of this article (doi:10.1186/1476-072X-13-46) contains supplementary material, which is available to authorized users.
Computational epidemiology; Cluster; Environmental factors; Software; Geographical grid; Type 1 diabetes
In data grids, scientific and business applications produce huge volumes of data that must be transferred among distributed, heterogeneous nodes. Data replication provides a solution for managing data files efficiently in large grids: it enhances data availability and thereby reduces overall file access time. In this paper, an agent-based algorithm for data grids, EDRA, is proposed and implemented. EDRA performs dynamic replication over a hierarchical structure, which is taken into account when selecting the best replica. The selection decision is based on scheduling parameters: bandwidth, load gauge, and the computing capacity of the node. Scheduling in the data grid helps reduce data access time, and the load is distributed evenly across the nodes by considering these parameters. EDRA is implemented using the OptorSim data grid simulator, with the European Data Grid CMS testbed topology used in the experiments. Simulation results compare EDRA with BHR, LRU, and No Replication, and show the efficiency of the EDRA algorithm in terms of mean job execution time, network usage, and node storage usage.
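A best-replica decision based on the three scheduling parameters named above might look like the following sketch; the weighting of the parameters is an assumption for illustration and is not the paper's actual decision rule:

```python
# Hypothetical EDRA-style replica selection: each candidate node is
# scored from bandwidth, current load and computing capacity, and the
# highest-scoring node serves the file. Values are illustrative.

replicas = {
    "node_a": {"bandwidth": 100.0, "load": 0.8, "capacity": 16.0},
    "node_b": {"bandwidth": 80.0,  "load": 0.2, "capacity": 8.0},
}

def score(r):
    """Prefer high bandwidth and capacity; penalize heavily loaded nodes."""
    return r["bandwidth"] * (1.0 - r["load"]) + r["capacity"]

best = max(replicas, key=lambda name: score(replicas[name]))
```

Here the lightly loaded node wins despite its lower raw bandwidth, which is the kind of trade-off even load distribution requires.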
Many instruments for evaluating obstetric care quality in low-resource settings are freely available. However, this profusion can be confusing; moreover, evaluation instruments need to be adapted to local issues. In this article, we present tools we developed to guide the choice of instruments and describe how we used them in Burkina Faso to facilitate the participative development of a locally adapted instrument.
Based on a literature review, we developed two tools: a conceptual framework and an analysis grid of existing evaluation instruments. Subsequently, we facilitated several sessions with evaluation stakeholders in Burkina Faso. They used the tools to develop a locally adapted evaluation instrument that was subsequently tested in six healthcare facilities.
Three outputs emerged from this process:
1) A comprehensive conceptual framework for the quality of obstetric care, each component of which is a potential criterion for evaluation.
2) A grid analyzing 37 instruments for evaluating the quality of obstetric care in low-resource settings. We highlight their key characteristics and describe how the grid can be used to prepare a new evaluation.
3) An evaluation instrument adapted to Burkina Faso. We describe the experience of the Burkinabé stakeholders in developing this instrument using the conceptual framework and the analysis grid, while taking into account local realities.
This experience demonstrates how drawing upon existing instruments can inspire and rationalize the process of developing a new, tailor-made instrument. Two tools that came out of this experience can be useful to other teams: a conceptual framework for the quality of obstetric care and an analysis grid of existing evaluation instruments. These provide an easily accessible synthesis of the literature and are useful in integrating it with the context-specific knowledge of local actors, resulting in evaluation instruments that have both scientific and local legitimacy.
This study concerns the development of a high-performance workflow that, using grid technology, correlates different kinds of Bioinformatics data, from the base pairs of the nucleotide sequence to the exposed residues of the protein surface. The implementation of this workflow is based on the infrastructure of the Italian Grid.it project, a network of computational resources and storage facilities distributed across several grid sites.
Workflows are very common in Bioinformatics because they allow large quantities of data to be processed by delegating the management of resources to the information stream. Grid technology optimizes the computational load across the different workflow steps by dividing the more expensive tasks into sets of small jobs.
Grid technology also enables efficient database management, a crucial requirement for obtaining good results in Bioinformatics applications. The proposed workflow integrates huge amounts of data, and the results themselves are stored in a relational database, which constitutes the added value to the global knowledge.
A web interface has been developed to make this technology accessible to grid users. Once the workflow has been started via this simplified interface, it is possible to follow all the different steps of the data processing. When the workflow has completed, the different features of the protein, such as the amino acids exposed on the protein surface, can be compared with the data in the output database.
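The split-into-small-jobs pattern described above can be sketched as follows; the input sequence and the per-job analysis (GC content) are placeholders, not the workflow's actual sequence/structure tools:

```python
# Toy sketch of the grid pattern: an expensive workflow step is divided
# into many small jobs that could run on separate grid nodes, and the
# partial results are gathered afterwards.

def split_into_jobs(sequence, chunk=4):
    """Divide one large input into small, independently runnable jobs."""
    return [sequence[i:i + chunk] for i in range(0, len(sequence), chunk)]

def run_job(fragment):
    """Placeholder per-job analysis: GC content of a DNA fragment."""
    return sum(base in "GC" for base in fragment) / len(fragment)

jobs = split_into_jobs("ATGCGCGTATTAGCGC")
results = [run_job(j) for j in jobs]   # would be gathered from grid nodes
```

In the real workflow, each job is submitted to a grid site and the gathered results are written into the relational output database.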
Computational fluid dynamics (CFD) simulations are becoming a reliable tool to understand hemodynamics and disease progression in pathological blood vessels and to predict medical device performance. The immersed boundary method (IBM) has emerged as an attractive methodology because of its ability to efficiently handle complex moving and rotating geometries on structured grids. However, its application to study blood flow in complex, branching, patient-specific anatomies is scarce. This is because grid nodes in the exterior of the fluid domain dominate over the useful grid nodes in the interior, which incurs an inevitable memory and computational overhead. To alleviate this problem, we propose a novel multiblock-based IBM that preserves the simplicity and effectiveness of the IBM on structured Cartesian meshes and enables handling of complex anatomical geometries at reduced memory overhead by minimizing the grid nodes in the exterior of the fluid domain. As pathological and medical device hemodynamics often involve complex, unsteady transitional or turbulent flow fields, a scale-resolving turbulence model such as large eddy simulation (LES) is used in the present work. The proposed solver (hereafter referred to as WenoHemo) is developed by enhancing an existing in-house high-order incompressible flow solver that was previously validated for its numerics and several LES models by Shetty et al. [Journal of Computational Physics 2010; 229 (23), 8802-8822]. In the present work, WenoHemo is systematically validated for the additional numerics introduced, namely the IBM and the multiblock approach, by simulating laminar flow over a sphere and laminar flow over a backward-facing step, respectively. We then validate the entire solver methodology by simulating laminar and transitional flow in an abdominal aortic aneurysm (AAA).
Finally, we perform blood flow simulations in the challenging clinically relevant thoracic aortic aneurysm (TAA), to gain insights into the type of fluid flow patterns that exist in pathological blood vessels. Results obtained from the TAA simulations reveal complex vortical and unsteady flow fields that need to be considered in designing and implanting medical devices such as stent grafts.
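The memory argument behind the multiblock approach can be quantified with a simple count; the geometry (a sphere in its tightly bounding single Cartesian block) and the grid size are illustrative only:

```python
# For a sphere immersed in a single Cartesian block that bounds it
# tightly, only about pi/6 ~ 52% of the nodes are useful interior
# nodes; the rest are exterior overhead. Tighter multiblock coverage of
# a branching vessel reduces exactly this wasted fraction.

def interior_fraction(n=40, radius=1.0):
    """Fraction of the n^3 nodes of the bounding block inside the sphere."""
    h = 2.0 * radius / (n - 1)
    inside = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = -radius + i * h
                y = -radius + j * h
                z = -radius + k * h
                if x * x + y * y + z * z <= radius * radius:
                    inside += 1
    return inside / n**3

frac = interior_fraction()   # close to pi/6, about 0.52
```

For elongated, branching anatomies the single-block interior fraction is far lower than this, which is why covering the vessel with several smaller blocks pays off.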
Large-eddy simulation; WENO; Multiblock; Immersed boundary method; High-order finite difference; Incompressible; Biomechanical flows
This poster describes an approach that leverages grid technology for the epidemiological analysis of public health data. Through a virtual environment, users, particularly epidemiologists and others unfamiliar with the application, can perform powerful statistical analyses on demand.
Currently, there is little effective communication and collaboration among public health departments. This lack of collaboration has resulted in more than 300 separate biosurveillance systems (1), which are disease specific, not integrated or interoperable, and may be duplicative (1). Grid architecture is a promising methodology for building a decentralized health surveillance infrastructure because it encourages an ecosystem development culture (2), which has the potential to increase collaboration and decrease duplication.
This project had two major steps: creation and validation of the grid service. For the first step [creation of the service], we first determined the parameter set required to execute R from the command line. We then used the caGrid Introduce toolkit (3) and Grid Rapid Application Virtualization Interface (gRAVI) (4) to wrap the R command line interface into a grid service. The service was then deployed to the caGrid training grid. After deployment, the service was invoked using the R grid service client which was automatically created by Introduce and gRAVI.
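The essence of wrapping a command-line tool as a callable service can be sketched as follows; this is not Introduce/gRAVI code (those are grid service toolkits), and the script and file names are hypothetical:

```python
# Minimal sketch of service-wrapping a CLI: the service assembles the
# Rscript command line from the parameter set determined in step one,
# then would execute it on the machine hosting R.

def build_command(script_path, data_files, executable="Rscript"):
    """Assemble the command line a service invocation would execute."""
    return [executable, script_path] + list(data_files)

cmd = build_command("surveillance_demo.R", ["counts.csv"])
# subprocess.run(cmd, check=True)  # run on a machine with R installed
```

The grid service generated by Introduce and gRAVI plays the same role: it accepts the script and data files over the grid, runs the command line remotely, and returns the outputs.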
Our second step was aimed at validating the service by using the grid service client to illustrate the working principles of R in a grid environment. For this illustration, we selected the article by Hohle et al (5). In this article, the ‘surveillance’ package was developed to provide different algorithms for the detection of aberrations in routinely collected surveillance data. For validation purposes, only a subset of the analyses presented in the article, namely the Farrington and CUSUM algorithms, were reproduced. Using the grid web client, we uploaded the necessary data files for processing, as well as the R script used to replicate the results of (5). The application then ran the R script on the execution machine, which had all the R packages needed for the specific scenario.
The implementation was validated by showing that the results of the original paper can be reproduced using the grid-based version of R. Figure 1 shows the plots related to the steps described above; the plots illustrating the Farrington and CUSUM algorithms are identical to those in (5).
We demonstrated that it is possible to easily deploy applications for public health surveillance uses. We conclude that the techniques we used can be generalized to any application that has a command line interface. Future work will be aimed at creating a workflow to access data services and grid-enabled text processing and analytic tools. We believe that providing a set of examples demonstrating the benefit of this technology to the public health surveillance infrastructure may yield insights that lead to a better, more collaborative system of tools that will become the future of public health surveillance.
Grid computing; Public health grid; analytical service
We examined the accuracy with which the location of an agent moving within an environment could be decoded from the simulated firing of systems of grid cells. Grid cells were modelled with Poisson spiking dynamics and organized into multiple ‘modules’ of cells, with firing patterns of similar spatial scale within modules and a wide range of spatial scales across modules. The number of grid cells per module, the spatial scaling factor between modules and the size of the environment were varied. Errors in decoded location can take two forms: small errors of precision and larger errors resulting from ambiguity in decoding periodic firing patterns. With enough cells per module (e.g. eight modules of 100 cells each) grid systems are highly robust to ambiguity errors, even over ranges much larger than the largest grid scale (e.g. over a 500 m range when the maximum grid scale is 264 cm). Results did not depend strongly on the precise organization of scales across modules (geometric, co-prime or random). However, independent spatial noise across modules, which would occur if modules receive independent spatial inputs and might increase with spatial uncertainty, dramatically degrades the performance of the grid system. This effect of spatial uncertainty can be mitigated by uniform expansion of grid scales. Thus, in the realistic regimes simulated here, the optimal overall scale for a grid system represents a trade-off between minimizing spatial uncertainty (requiring large scales) and maximizing precision (requiring small scales). Within this view, the temporary expansion of grid scales observed in novel environments may be an optimal response to increased spatial uncertainty induced by the unfamiliarity of the available spatial cues.
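A noise-free, one-dimensional sketch of the decoding idea above follows; the module periods, the two-cell (phase-pair) representation per module and the least-squares matching are illustrative simplifications of the paper's Poisson-spiking, two-dimensional model:

```python
# Each module represents position periodically at its own scale; the
# animal's position is recovered by finding the candidate location
# whose pattern of module responses best matches the observed one.
# With several coprime-ish scales, the combined code is unambiguous
# over a range much larger than the largest single period.
import math

SCALES = [3.0, 7.0, 11.0]          # hypothetical module periods (cm)

def rates(x):
    """Responses of two phase-offset cells per module at position x."""
    out = []
    for s in SCALES:
        phase = 2.0 * math.pi * x / s
        out += [math.cos(phase), math.sin(phase)]
    return out

def decode(observed, positions):
    """Pick the candidate position whose response vector best matches."""
    def err(x):
        return sum((r - o) ** 2 for r, o in zip(rates(x), observed))
    return min(positions, key=err)

candidates = [0.5 * k for k in range(200)]      # 0 to 99.5 cm
decoded = decode(rates(42.5), candidates)       # recovers 42.5
```

Adding Poisson spike noise to the observed vector, as in the paper, introduces both small precision errors and occasional large ambiguity errors at positions where the periodic patterns nearly repeat.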
grid cell; spatial navigation; Poisson noise; spatial uncertainty
As the core technologies of the existing NCI SEER platform age and their operating costs increase, essential resources in the fight against cancer such as these will eventually have to be migrated to Grid-based systems. To model this migration, a simulation based on agent modeling technology is proposed. This modeling technique allows simulation of the complex and distributed services provided by a large-scale Grid computing platform such as the caBIG™ project’s caGRID. To investigate such a migration, this paper uses agent-based modeling simulations to predict the performance of current and Grid configurations of the NCI SEER system, integrated with the existing translational opportunities afforded by caGRID. The model illustrates how the use of Grid technology can potentially improve system response time as the systems under test are scaled. In modeling SEER nodes accessing multiple registry silos, we show that SEER applications re-implemented in a Grid-native manner exhibit a nearly constant user response time as the number of distributed registry silos increases, whereas the current application architecture exhibits a linear increase in response time.
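The constant-versus-linear scaling claim can be illustrated with a back-of-envelope model; the per-silo query time is an invented value and this is not the agent-based simulation itself:

```python
# If each registry silo takes ~t seconds to answer, a serial
# architecture responds in n * t, while issuing the queries to all
# silos in parallel responds in roughly max(t) ~ t, nearly constant
# in the number of silos.

T_SILO = 0.3   # hypothetical per-silo query time (s)

def serial_response(n_silos):
    """Current architecture: silos queried one after another."""
    return n_silos * T_SILO

def parallel_response(n_silos):
    """Grid-native architecture: bounded by the slowest silo."""
    return T_SILO

growth = [(n, serial_response(n), parallel_response(n)) for n in (1, 4, 16)]
```

The agent-based model in the paper captures the same qualitative behavior while also accounting for service queuing and network effects.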
The Poisson-Boltzmann (PB) equation is an established multiscale model for electrostatic analysis of biomolecules and other dielectric systems. PB-based molecular dynamics (MD) approaches have the potential to tackle large biological systems. Obstacles that hinder the current development of PB-based MD methods are concerns about accuracy, stability, efficiency and reliability. The presence of a complex solvent-solute interface, geometric singularities and charge singularities leads to challenges in the numerical solution of the PB equation and in electrostatic force evaluation in PB-based MD methods. Recently, the matched interface and boundary (MIB) method has been utilized to develop the first second-order accurate PB solver that is numerically stable in dealing with discontinuous dielectric coefficients, complex geometric singularities and singular source charges. The present work develops a PB-based MD approach using the MIB method. A new formulation of electrostatic forces is derived to allow the use of sharp molecular surfaces. Accurate reaction field forces are obtained by directly differentiating the electrostatic potential. Dielectric boundary forces are evaluated at the solvent-solute interface using an accurate Cartesian-grid surface integration method. The electrostatic forces located at reentrant surfaces are appropriately assigned to related atoms. Extensive numerical tests are carried out to validate the accuracy and stability of the present electrostatic force calculation. The new PB-based MD method is implemented in conjunction with the AMBER package. MIB-based MD simulations of biomolecules are demonstrated via a few example systems.
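The step "forces are obtained by directly differentiating the electrostatic potential" can be illustrated numerically; the test potential below is an invented analytic stand-in, not the MIB-computed PB potential, and the scheme shown is plain central differencing rather than the MIB discretization:

```python
# The force on a unit charge is F = -grad(phi). Here grad is
# approximated with central differences and checked against a
# potential whose gradient is known in closed form.

def phi(x, y, z):
    """Test potential with known gradient: grad(phi) = (2x, 2y, 2z)."""
    return x * x + y * y + z * z

def force(p, h=1e-5):
    """F = -grad(phi) via central differences at point p."""
    x, y, z = p
    return (
        -(phi(x + h, y, z) - phi(x - h, y, z)) / (2 * h),
        -(phi(x, y + h, z) - phi(x, y - h, z)) / (2 * h),
        -(phi(x, y, z + h) - phi(x, y, z - h)) / (2 * h),
    )

fx, fy, fz = force((0.5, -1.0, 2.0))   # expect about (-1.0, 2.0, -4.0)
```

In the actual method the differentiation must respect the sharp dielectric interface, which is exactly what the MIB treatment of the discontinuous coefficients provides.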
Implicit solvent model; Poisson-Boltzmann equation; Molecular dynamics; Interface method; Matched interface and boundary; Biomolecules
An approach to automated acquisition of cryoEM image data from lacey carbon grids using the Leginon program is described. Automated liquid nitrogen top-up of the specimen holder dewar was used as a step towards full automation, without operator intervention during the course of data collection. During cryoEM studies of actin labelled with myosin V, we have found it necessary to work with lacey grids rather than Quantifoil or C-flat grids due to interaction of myosin V with the support film. Lacey grids have irregular holes of variable shape and size, in contrast to Quantifoil or C-flat grids, which have a regular array of similar circular holes on each grid square. Other laboratories also prefer to work with grids with irregular holes for a variety of reasons. It was therefore necessary to develop a strategy different from normal Leginon usage for targeting holes for image acquisition and suitable areas for focussing on lacey grids. This approach was implemented using the extensible framework provided by Leginon and by developing a new MSI application within that framework, which includes a new Leginon node implementing a novel method for finding focus targets.
Electronmicroscopy; Data collection; Single-particle reconstruction; High throughput