Search tips
Search criteria

Results 1-5 (5)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
Document Types
1.  PD5: A General Purpose Library for Primer Design Software 
PLoS ONE  2013;8(11):e80156.
Complex PCR applications for large genome-scale projects require fast, reliable and often highly sophisticated primer design software applications. Presently, such applications use pipelining methods to utilise many third party applications and this involves file parsing, interfacing and data conversion, which is slow and prone to error. A fully integrated suite of software tools for primer design would considerably improve the development time, the processing speed, and the reliability of bespoke primer design software applications.
The PD5 software library is an open-source collection of classes and utilities, providing a complete collection of software building blocks for primer design and analysis. It is written in object-oriented C++ with an emphasis on classes suitable for efficient and rapid development of bespoke primer design programs. The modular design of the software library simplifies the development of specific applications and also integration with existing third party software where necessary. We demonstrate several applications created using this software library that have already proved to be effective, but we view the project as a dynamic environment for building primer design software and it is open for future development by the bioinformatics community. Therefore, the PD5 software library is published under the terms of the GNU General Public License, which guarantee access to source-code and allow redistribution and modification.
The PD5 software library is downloadable from Google Code and the accompanying Wiki includes instructions and examples:
PMCID: PMC3836914  PMID: 24278254
2.  Towards Robot Scientists for autonomous scientific discovery 
We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.
PMCID: PMC2813846  PMID: 20119518
3.  The EXACT description of biomedical protocols 
Bioinformatics  2008;24(13):i295-i303.
Motivation: Many published manuscripts contain experiment protocols which are poorly described or deficient in information. This means that the published results are very hard or impossible to repeat. This problem is being made worse by the increasing complexity of high-throughput/automated methods. There is therefore a growing need to represent experiment protocols in an efficient and unambiguous way.
Results: We have developed the Experiment ACTions (EXACT) ontology as the basis of a method of representing biological laboratory protocols. We provide example protocols that have been formalized using EXACT, and demonstrate the advantages and opportunities created by using this formalization. We argue that the use of EXACT will result in the publication of protocols with increased clarity and usefulness to the scientific community.
Availability: The ontology, examples and code can be downloaded from
Contact: Larisa Soldatova
PMCID: PMC2718634  PMID: 18586727
4.  Locational distribution of gene functional classes in Arabidopsis thaliana 
BMC Bioinformatics  2007;8:112.
We are interested in understanding the locational distribution of genes and their functions in genomes, as this distribution has both functional and evolutionary significance. Gene locational distribution is known to be affected by various evolutionary processes, with tandem duplication thought to be the main process producing clustering of homologous sequences. Recent research has found clustering of protein structural families in the human genome, even when genes identified as tandem duplicates have been removed from the data. However, this previous research was hindered as they were unable to analyse small sample sizes. This is a challenge for bioinformatics as more specific functional classes have fewer examples and conventional statistical analyses of these small data sets often produces unsatisfactory results.
We have developed a novel bioinformatics method based on Monte Carlo methods and Greenwood's spacing statistic for the computational analysis of the distribution of individual functional classes of genes (from GO). We used this to make the first comprehensive statistical analysis of the relationship between gene functional class and location on a genome. Analysis of the distribution of all genes except tandem duplicates on the five chromosomes of A. thaliana reveals that the distribution on chromosomes I, II, IV and V is clustered at P = 0.001. Many functional classes are clustered, with the degree of clustering within an individual class generally consistent across all five chromosomes. A novel and surprising result was that the locational distribution of some functional classes were significantly more evenly spaced than would be expected by chance.
Analysis of the A. thaliana genome reveals evidence of unexplained order in the locational distribution of genes. The same general analysis method can be applied to any genome, and indeed any sequential data involving classes.
PMCID: PMC1855069  PMID: 17397552
5.  Accurate Prediction of Protein Functional Class From Sequence in the Mycobacterium Tuberculosis and Escherichia Coli Genomes Using Data Mining 
Yeast (Chichester, England)  2000;17(4):283-293.
The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class from information only available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60–80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli.
PMCID: PMC2448385  PMID: 11119305

Results 1-5 (5)