1.  LabTrove: A Lightweight, Web Based, Laboratory “Blog” as a Route towards a Marked Up Record of Work in a Bioscience Research Laboratory 
PLoS ONE  2013;8(7):e67460.
Background
The electronic laboratory notebook (ELN) has the potential to replace the paper notebook with a marked-up digital record that can be searched and shared. However, it is a challenge to achieve these benefits without losing the usability and flexibility of traditional paper notebooks. We investigate a blog-based platform that addresses the issues associated with the development of a flexible system for recording scientific research.
Methodology/Principal Findings
We chose a blog-based approach with the journal characteristics of traditional notebooks in mind, recognizing the potential for linking together procedures, materials, samples, observations, data, and analysis reports. We implemented the LabTrove blog system as a server process written in PHP, using a MySQL database to persist posts and other research objects. We incorporated a metadata framework that is both extensible and flexible while promoting consistency and structure where appropriate. Our experience thus far is that LabTrove is capable of providing a successful electronic laboratory recording system.
Conclusions/Significance
LabTrove implements a one-item one-post system, which enables us to uniquely identify each element of the research record, such as data, samples, and protocols. This unique association between a post and a research element affords advantages for monitoring the use of materials and samples and for inspecting research processes. The combination of the one-item one-post system, consistent metadata, and full-text search provides us with a much more effective record than a paper notebook. The LabTrove approach provides a route towards reconciling the tensions and challenges that lie ahead in working towards the long-term goals for ELNs. LabTrove, an electronic laboratory notebook (ELN) system from the Smart Research Framework, based on a blog-type framework with full access control, facilitates the scientific experimental recording requirements for reproducibility, reuse, repurposing, and redeployment.
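The one-item one-post idea above maps naturally onto a post table plus a key-value metadata table. The sketch below is not LabTrove's PHP/MySQL code; it is a minimal illustration of that data model in Python with SQLite, and all table and field names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database LabTrove uses
conn.executescript("""
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,   -- one post per research item (data, sample, protocol)
        title   TEXT NOT NULL,
        body    TEXT NOT NULL
    );
    CREATE TABLE metadata (
        post_id INTEGER REFERENCES posts(post_id),
        key     TEXT NOT NULL,         -- extensible, user-defined metadata
        value   TEXT NOT NULL
    );
""")

# One item, one post: the sample gets its own uniquely identified record.
cur = conn.execute("INSERT INTO posts (title, body) VALUES (?, ?)",
                   ("Sample A12", "Prepared from overnight culture, OD600 = 0.6"))
post_id = cur.lastrowid
conn.executemany("INSERT INTO metadata (post_id, key, value) VALUES (?, ?, ?)",
                 [(post_id, "type", "sample"), (post_id, "project", "demo")])

# Later posts (e.g. an analysis) can cite the sample by its unique post_id,
# which is what makes the use of materials and samples traceable.
for row in conn.execute("""SELECT p.title, m.key, m.value
                           FROM posts p JOIN metadata m USING (post_id)
                           WHERE m.key = 'type' AND m.value = 'sample'"""):
    print(row)
```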
doi:10.1371/journal.pone.0067460
PMCID: PMC3720848  PMID: 23935832
2.  An automated and reproducible workflow for running and analyzing neural simulations using Lancet and IPython Notebook 
Lancet is a new, simulator-independent Python utility for succinctly specifying, launching, and collating results from large batches of interrelated computationally demanding program runs. This paper demonstrates how to combine Lancet with IPython Notebook to provide a flexible, lightweight, and agile workflow for fully reproducible scientific research. This informal and pragmatic approach uses IPython Notebook to capture the steps in a scientific computation as it is gradually automated and made ready for publication, without mandating the use of any separate application that can constrain scientific exploration and innovation. The resulting notebook concisely records each step involved in even very complex computational processes that led to a particular figure or numerical result, allowing the complete chain of events to be replicated automatically. Lancet was originally designed to help solve problems in computational neuroscience, such as analyzing the sensitivity of a complex simulation to various parameters, or collecting the results from multiple runs with different random starting points. However, because it is never possible to know in advance what tools might be required in future tasks, Lancet has been designed to be completely general, supporting any type of program as long as it can be launched as a process and can return output in the form of files. For instance, Lancet is also heavily used by one of the authors in a separate research group for launching batches of microprocessor simulations. This general design will allow Lancet to continue supporting a given research project even as the underlying approaches and tools change.
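Lancet's actual API is not shown here; the following Python sketch only illustrates the general pattern the abstract describes, launching a batch of runs over a parameter grid as separate processes and collating their output files. The "simulator" is a placeholder program so the sketch runs as-is.

```python
import itertools
import subprocess
import sys
from pathlib import Path

outdir = Path("runs")
outdir.mkdir(exist_ok=True)

# Stand-in "simulator": any program that can be launched as a process and writes
# its result to a file. Here Python itself plays that role for illustration.
simulator_src = "import sys; open(sys.argv[3], 'w').write(' '.join(sys.argv[1:3]) + '\\n')"

# The Cartesian product of parameter values defines the batch of interrelated runs.
for rate, seed in itertools.product([0.1, 0.5, 1.0], [42, 43]):
    outfile = outdir / f"run_rate{rate}_seed{seed}.txt"
    subprocess.run([sys.executable, "-c", simulator_src,
                    str(rate), str(seed), str(outfile)], check=True)

# Collation step: every result file is tied to the parameters that produced it,
# so the complete batch can be replicated automatically from this one script.
results = {f.name: f.read_text().strip() for f in sorted(outdir.glob("run_*.txt"))}
print(f"collected {len(results)} result files")
```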
doi:10.3389/fninf.2013.00044
PMCID: PMC3874632  PMID: 24416014
IPython; pandas; reproducibility; workflow; simulation; batch computation; provenance; big data
3.  Make it better but don't change anything 
With massive amounts of data being generated in electronic format, there is a need in basic science laboratories to adopt new methods for tracking and analyzing data. An electronic laboratory notebook (ELN) is not just a replacement for a paper lab notebook; it is a new method of storing and organizing data while maintaining the data entry flexibility and legal recording functions of paper notebooks. Paper notebooks are regarded as highly flexible since the user can configure them to store almost anything that can be written or physically pasted onto the pages. However, data retrieval and data sharing from paper notebooks are labor-intensive processes, and notebooks can be misplaced, a single point of failure that loses all entries in the volume. Additional features provided by electronic notebooks include searchable indices, data sharing, automatic archiving for security against loss, and ease of data duplication. Furthermore, ELNs can be tasked with additional functions not commonly found in paper notebooks, such as inventory control. While ELNs have been on the market for some time now, adoption of an ELN in academic basic science laboratories has been lagging. Issues that have restrained the development and adoption of ELNs in research laboratories include the sheer variety and frequency of changes in protocols, together with the need for users to control notebook configuration without relying on professional IT staff support. In this commentary, we look at some of the issues and experiences in academic laboratories that have proved challenging in implementing an electronic lab notebook.
doi:10.1186/1759-4499-1-5
PMCID: PMC2810290  PMID: 20098591
4.  Mediators between Theoretical and Practical Medieval Knowledge: Medical Notebooks from the Cairo Genizah and their Significance 
Medical History  2013;57(4):487-515.
This article presents a plethora of fragments from the medical notebooks found in the Cairo Genizah that comprise a unique source of historical data for scholarly study and for a better understanding of the ways in which medieval medical knowledge in Egypt was transferred from theory to practice and vice versa. These documents provide the most direct evidence we have for preferred practical medical recipes because they record the choices of medical practitioners in medieval Cairo. Since the language most commonly used in them was Judaeo-Arabic, they were evidently written by Jews. The medical genre in the notebooks was primarily pharmacopoeic, consisting of apparently original recipes for the treatment of various diseases. There are also a few notebooks on materia medica. The subject matter of the Genizah medical notebooks shows that they were mostly of an eclectic nature, i.e. the writers had probably learnt about these treatments and recipes from their teachers, applied them at the hospitals where they worked or copied them from the books they read. Foremost among the subjects dealt with were eye diseases, followed by skin diseases, coughs and colds, dentistry and oral hygiene, and gynaecological conditions. The writers of the Genizah notebooks apparently recorded the practical medical knowledge they wished to preserve for their future use as amateur physicians, students, traditional healers or professional practitioners.
doi:10.1017/mdh.2013.56
PMCID: PMC3865955  PMID: 24069914
Cairo Genizah; History of Medicine; Jewish; Medieval Middle East; Middle Ages; Notebook
5.  Reduced inclination of cervical spine in a novel notebook screen system - implications for rehabilitation 
Background
Professional work at notebook computers places high demands on body posture in the seated position. Because of the high, continuous static muscle stress this position entails, professionals who work at notebooks for long hours are exposed to an increased risk of musculoskeletal complaints. Especially for subjects with back pain, new notebook designs should be evaluated with a focus on rehabilitative issues.
Methods
In a field study, a new notebook design with an adjustable screen was analyzed and compared with the standard notebook position.
Results
There are highly significant differences in the visual axis of individuals seated in the novel notebook position compared with the standard position. Differences are also present between further alternative notebook positions. Testing for effects of gender and of wearing glasses did not reveal any influence.
Conclusion
This study demonstrates that notebooks with an adjustable screen may be used to improve posture. Future studies may focus on patients with musculoskeletal diseases.
doi:10.1186/1745-6673-6-30
PMCID: PMC3253038  PMID: 22118159
6.  iLAP: a workflow-driven software for experimental protocol development, data acquisition and analysis 
BMC Bioinformatics  2009;10:390.
Background
In recent years, the genome biology community has expended considerable effort to confront the challenges of managing heterogeneous data in a structured and organized way and has developed laboratory information management systems (LIMS) for both raw and processed data. On the other hand, electronic notebooks were developed to record and manage scientific data and facilitate data-sharing. Software which enables both the management of large datasets and the digital recording of laboratory procedures would serve a real need in laboratories using medium- and high-throughput techniques.
Results
We have developed iLAP (Laboratory data management, Analysis, and Protocol development), a workflow-driven information management system specifically designed to create and manage experimental protocols, and to analyze and share laboratory data. The system combines experimental protocol development, wizard-based data acquisition, and high-throughput data analysis into a single, integrated system. We demonstrate the power and the flexibility of the platform using a microscopy case study based on a combinatorial multiple fluorescence in situ hybridization (m-FISH) protocol and 3D-image reconstruction. iLAP is freely available under the open source license AGPL from http://genome.tugraz.at/iLAP/.
Conclusion
iLAP is a flexible and versatile information management system, which has the potential to close the gap between electronic notebooks and LIMS and can therefore be of great value for a broad scientific community.
doi:10.1186/1471-2105-10-390
PMCID: PMC2789074  PMID: 19941647
7.  Trees and networks before and after Darwin 
Biology Direct  2009;4:43.
It is well-known that Charles Darwin sketched abstract trees of relationship in his 1837 notebook, and depicted a tree in the Origin of Species (1859). Here I attempt to place Darwin's trees in historical context. By the mid-Eighteenth century the Great Chain of Being was increasingly seen to be an inadequate description of order in nature, and by about 1780 it had been largely abandoned without a satisfactory alternative having been agreed upon. In 1750 Donati described aquatic and terrestrial organisms as forming a network, and a few years later Buffon depicted a network of genealogical relationships among breeds of dogs. In 1764 Bonnet asked whether the Chain might actually branch at certain points, and in 1766 Pallas proposed that the gradations among organisms resemble a tree with a compound trunk, perhaps not unlike the tree of animal life later depicted by Eichwald. Other trees were presented by Augier in 1801 and by Lamarck in 1809 and 1815, the latter two assuming a transmutation of species over time. Elaborate networks of affinities among plants and among animals were depicted in the late Eighteenth and very early Nineteenth centuries. In the two decades immediately prior to 1837, so-called affinities and/or analogies among organisms were represented by diverse geometric figures. Series of plant and animal fossils in successive geological strata were represented as trees in a popular textbook from 1840, while in 1858 Bronn presented a system of animals, as evidenced by the fossil record, in a form of a tree. Darwin's 1859 tree and its subsequent elaborations by Haeckel came to be accepted in many but not all areas of biological sciences, while network diagrams were used in others. Beginning in the early 1960s trees were inferred from protein and nucleic acid sequences, but networks were re-introduced in the mid-1990s to represent lateral genetic transfer, increasingly regarded as a fundamental mode of evolution at least for bacteria and archaea. In historical context, then, the Network of Life preceded the Tree of Life and might again supersede it.
Reviewers
This article was reviewed by Eric Bapteste, Patrick Forterre and Dan Graur.
doi:10.1186/1745-6150-4-43
PMCID: PMC2793248  PMID: 19917100
8.  WASP: Wiki-based Automated Sequence Processor for Epigenomics and Genomics Applications 
The advent of massively parallel sequencing (MPS) technology has led to the development of assays which facilitate the study of epigenomics and genomics at the genome-wide level. However, the computational burden resulting from the need to store and process the gigabytes of data streaming from sequencing machines, in addition to collecting metadata and returning data to users, is becoming a major issue for both sequencing cores and users alike. We present WASP, a LIMS designed to automate MPS data pre-processing and analysis. WASP integrates a user-friendly MediaWiki front end, a network file system (NFS) and MySQL database for recording experimental data and metadata, plus a multi-node cluster for data processing. The workflow includes capture of sample submission information to the database using web forms on the wiki, recording of core facility operations on samples, and linking of samples to flowcells in the database, followed by automatic processing of sequence data and running of data analysis pipelines after the sequence run. WASP currently supports MPS using the Illumina GAIIx. For epigenomics applications we provide a pipeline for our novel HpaII-tiny fragment enrichment by ligation-mediated PCR (HELP)-tag method, which enables us to quantify the methylation status of ∼1.8 million CpGs located in 70% of the HpaII sites (CCGG) in the human genome. We also provide ChIP-seq analysis using MACS, which is also applicable to methylated DNA immunoprecipitation (MeDIP) assays, in addition to miRNA and mRNA analyses using custom pipelines. Output from the analysis pipelines is automatically linked to a user's wiki-space, and the data generated can be immediately viewed as tracks in a local mirror of the UCSC genome browser. WASP also provides capabilities for automated billing and keeping track of facility costs. We believe WASP represents a suitable model on which to develop LIMS systems for supporting MPS applications.
PMCID: PMC2918104
9.  Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database 
BioData Mining  2008;1:7.
Background
In computer science, database systems are based on the relational model introduced by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. reports that a protein pair both does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL.
Results
To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and have designed a half-formatted database. As proof of principle, a database of flavonoids with 6,902 molecular structures from over 1,687 plant species was implemented on MediaWiki, the system underlying Wikipedia. Registered users can describe any information in an arbitrary format; the structured part is subject to text-string searches that realize relational operations. The system was written in PHP as an extension of MediaWiki. All modifications are open source and publicly available.
Conclusion
This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated.
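The extension itself is written in PHP; purely to illustrate the idea of realizing relational-style selection over half-formatted wiki text by string matching, here is a small Python sketch. The page text and the "field: value" layout are invented for the example.

```python
import re

# Half-formatted wiki pages: free text plus loosely structured "field: value" lines.
pages = {
    "Quercetin": "A flavonol found in many plants.\nspecies: Allium cepa\nclass: flavonol",
    "Naringenin": "Notes on extraction.\nspecies: Citrus paradisi\nclass: flavanone",
}

def select(pages, field, pattern):
    """Relational-style selection: return pages whose 'field' value matches pattern."""
    hits = {}
    for title, text in pages.items():
        m = re.search(rf"^{field}:\s*(.+)$", text, flags=re.MULTILINE)
        if m and re.search(pattern, m.group(1)):
            hits[title] = m.group(1)
    return hits

# Roughly: SELECT title, class FROM pages WHERE class LIKE 'flavonol'
print(select(pages, "class", "flavonol"))   # {'Quercetin': 'flavonol'}
```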
doi:10.1186/1756-0381-1-7
PMCID: PMC2556319  PMID: 18822113
10.  The benefits of integrated systems for managing both samples and experimental data: An opportunity for labs in universities and government research institutions to lead the way 
Currently most biomedical labs in universities and government funded research institutions use paper lab notebooks for recording experimental data and spreadsheets for managing sample data. One consequence is that sample management and documenting experiments are viewed as separate and distinct activities, notwithstanding that samples and aliquots are an integral part of a majority of the experiments carried out by these labs.
Various drivers are pushing labs towards integrated management of sample data and experimental data. These include the ever-increasing amounts of both kinds of data, the increasing adoption of online collaborative tools, changing expectations about online communication, and the increasing affordability of electronic lab notebooks and sample management software. There is now an opportunity for smaller labs, which have been slow to move from paper to electronic record keeping, to leapfrog better-resourced commercial labs and lead the way in adopting the new generation of tools which permit integrated management of samples and experimental data and offer a range of tangible benefits to conducting research, including:
1. Fewer lost and mislabelled samples
2. Clearer visualization of relationships between samples and experiments
3. Reduction of experimental error
4. More effective search
5. Productivity gains
6. More efficient use of freezers, leading to cost reduction and enhanced sustainability
7. Improved archiving and enhanced memory at the lab and institutional levels
doi:10.1186/1759-4499-3-2
PMCID: PMC3146905  PMID: 21707999
11.  MyLabStocks: a web-application to manage molecular biology materials 
Yeast (Chichester, England)  2014;31(5):179-184.
Laboratory stocks are the hardware of research. They must be stored and managed with minimum loss of material and information. Plasmids, oligonucleotides and strains are regularly exchanged between collaborators within and between laboratories. Managing and sharing information about every item is crucial for retrieval of reagents, for planning experiments and for reproducing past experimental results. We have developed a web-based application to manage stocks commonly used in a molecular biology laboratory. Its functionalities include user-defined privileges, visualization of plasmid maps directly from their sequences, and the capacity to search items by annotation fields or directly from a query sequence using BLAST. It is designed to handle records of plasmids, oligonucleotides, yeast strains, antibodies, pipettes and notebooks. Based on PHP/MySQL, it can easily be extended to handle other types of stocks and it can be installed on any server architecture. MyLabStocks is freely available from: https://forge.cbp.ens-lyon.fr/redmine/projects/mylabstocks under an open source licence.
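MyLabStocks is a PHP/MySQL application; as a hedged sketch of its "search stocks by query sequence" capability, the snippet below simply shells out to the standard NCBI BLAST+ command-line tools. It assumes blastn and makeblastdb are installed and that the stock sequences have already been exported to a plasmids.fasta file; none of this reflects MyLabStocks' internal code.

```python
import subprocess

# Build a nucleotide BLAST database from the lab's plasmid sequences (one-off step).
subprocess.run(["makeblastdb", "-in", "plasmids.fasta", "-dbtype", "nucl"], check=True)

# Search the stock collection with a query sequence; tabular output (-outfmt 6)
# gives one hit per line: query id, subject id, % identity, alignment length, ...
result = subprocess.run(
    ["blastn", "-query", "query.fasta", "-db", "plasmids.fasta", "-outfmt", "6"],
    capture_output=True, text=True, check=True,
)

for line in result.stdout.splitlines():
    qid, sid, pct_identity = line.split("\t")[:3]
    print(f"query {qid} matches stock item {sid} at {pct_identity}% identity")
```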
doi:10.1002/yea.3008
PMCID: PMC4019915  PMID: 24643870
yeast strains; plasmids; oligonucleotides; software; laboratory management
13.  InChI in the wild: an assessment of InChIKey searching in Google 
While chemical databases can be queried using the InChI string and the InChIKey (IK), the latter was designed for open-web searching. It is becoming increasingly effective for this since more sources enhance crawling of their websites by the Googlebot and consequent IK indexing. Searchers who use Google as an adjunct to database access may be less familiar with the advantages of using the IK as explored in this review. As an example, the IK for atorvastatin retrieves ~200 low-redundancy links from a Google search in about 0.3 seconds. These include most major databases, with a very low false-positive rate. Results encompass less familiar but potentially useful sources and can be extended to isomer capture by using just the skeleton layer of the IK. Google Advanced Search can be used to filter large result sets. Image searching with the IK is also effective and complementary to open-web queries. Results can be particularly useful for less-common structures, as exemplified by a major metabolite of atorvastatin giving only three hits. Testing also demonstrated document-to-document and document-to-database joins via structure matching. The necessary generation of an IK from chemical names can be accomplished using open tools and resources for patents, papers, abstracts or other text sources. Active global sharing of local IK-linked information can be accomplished via surfacing in open laboratory notebooks, blogs, Twitter, figshare and other routes. While information-rich chemistry (e.g. approved drugs) can exhibit swamping and redundancy effects, the much smaller IK result sets for link-poor structures become a transformative first-pass option. IK indexing has therefore turned Google into a de facto open global chemical information hub by merging links to most significant sources, including over 50 million PubChem and ChemSpider records. The simplicity, specificity and speed of matching make it a useful option for biologists or others less familiar with chemical searching. However, compared to rigorously maintained major databases, users need to be circumspect about the consistency of Google results and the provenance of retrieved links. In addition, community engagement may be necessary to ameliorate possible future degradation of utility.
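Two of the text-level operations the review relies on are easy to sketch: taking only the first (connectivity/skeleton) block of an InChIKey for isomer-insensitive matching, and turning a key into a web search URL. The key used below is the widely published InChIKey for ethanol, quoted only for illustration.

```python
from urllib.parse import quote_plus

# InChIKey for ethanol (widely published value, used here only as an example).
ik = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"

# The first 14-character block encodes the molecular skeleton (connectivity);
# searching on it alone also captures stereoisomers and isotopologues.
skeleton = ik.split("-")[0]

exact_query = f"https://www.google.com/search?q={quote_plus(ik)}"
isomer_query = f"https://www.google.com/search?q={quote_plus(skeleton)}"
print(exact_query)
print(isomer_query)
```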
doi:10.1186/1758-2946-5-10
PMCID: PMC3598674  PMID: 23399051
InChI; InChIKey; Databases; Google; Chemical structures; Patents; PubChem; ChemSpider
14.  Mining the Gene Wiki for functional genomic knowledge 
BMC Genomics  2011;12:603.
Background
Ontology-based gene annotations are important tools for organizing and analyzing genome-scale biological data. Collecting these annotations is a valuable but costly endeavor. The Gene Wiki makes use of Wikipedia as a low-cost, mass-collaborative platform for assembling text-based gene annotations. The Gene Wiki comprises more than 10,000 review articles, each describing one human gene. The goal of this study is to define and assess a computational strategy for translating the text of Gene Wiki articles into ontology-based gene annotations. We specifically explore the generation of structured annotations using the Gene Ontology and the Human Disease Ontology.
Results
Our system produced 2,983 candidate gene annotations using the Disease Ontology and 11,022 candidate annotations using the Gene Ontology from the text of the Gene Wiki. Based on manual evaluations and comparisons to reference annotation sets, we estimate a precision of 90-93% for the Disease Ontology annotations and 48-64% for the Gene Ontology annotations. We further demonstrate that this data set can systematically improve the results from gene set enrichment analyses.
Conclusions
The Gene Wiki is a rapidly growing corpus of text focused on human gene function. Here, we demonstrate that the Gene Wiki can be a powerful resource for generating ontology-based gene annotations. These annotations can be used immediately to improve workflows for building curated gene annotation databases and knowledge-based statistical analyses.
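For orientation on the precision figures quoted above: precision is the fraction of candidate annotations confirmed by the reference set, and recall is the fraction of reference annotations recovered. A minimal sketch with invented gene-to-term pairs:

```python
def precision_recall(candidates, reference):
    """Precision and recall of a candidate annotation set against a reference set."""
    candidates, reference = set(candidates), set(reference)
    tp = len(candidates & reference)                 # true positives
    precision = tp / len(candidates) if candidates else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall

# Toy (gene, ontology term) pairs; identifiers are invented for the example.
candidate = {("TP53", "DOID:162"), ("BRCA1", "DOID:1612"), ("EGFR", "GO:0007173")}
reference = {("TP53", "DOID:162"), ("BRCA1", "DOID:1612"), ("BRCA1", "GO:0006281")}

p, r = precision_recall(candidate, reference)
print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.67 recall=0.67
```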
doi:10.1186/1471-2164-12-603
PMCID: PMC3271090  PMID: 22165947
15.  Charles Darwin and the Origin of Life 
When Charles Darwin published The Origin of Species 150 years ago he consciously avoided discussing the origin of life. However, analysis of some other texts written by Darwin, and of the correspondence he exchanged with friends and colleagues, demonstrates that he took for granted the possibility of a natural emergence of the first life forms. As shown by notes from the pages he excised from his private notebooks, as early as 1837 Darwin was convinced that “the intimate relation of Life with laws of chemical combination, & the universality of latter render spontaneous generation not improbable”. Like many of his contemporaries, Darwin rejected the idea that putrefaction of preexisting organic compounds could lead to the appearance of organisms. Although he favored the possibility that life could appear by natural processes from simple inorganic compounds, his reluctance to discuss the issue resulted from his recognition that at the time it was not possible to undertake the experimental study of the emergence of life.
doi:10.1007/s11084-009-9172-7
PMCID: PMC2745620  PMID: 19633921
Darwin; Warm little pond; Origin of life; Spontaneous generation
16.  A novel collaborative e-learning platform for medical students - ALERT STUDENT 
BMC Medical Education  2014;14:143.
Background
The increasing complexity of medical curricula would benefit from adaptive computer supported collaborative learning systems that support study management using instructional design and learning object principles. However, to our knowledge, there are scarce reports regarding applications developed to meet this goal and encompass the complete medical curriculum. The aim of this study was to develop and assess the usability of an adaptive computer supported collaborative learning system for medical students to manage study sessions.
Results
A study platform named ALERT STUDENT was built as a free web application. Content chunks are represented as Flashcards that hold knowledge and open-ended questions. These can be created in a collaborative fashion. Multiple Flashcards can be combined into custom stacks called Notebooks that can be accessed in study Groups that belong to the user's institution. The system provides a Study Mode that features text markers, text notes, timers and color-coded content prioritization based on self-assessment of open-ended questions presented in a Quiz Mode. Time spent studying and Perception of knowledge are displayed for each student and peers using charts. Computer supported collaborative learning is achieved by allowing for simultaneous creation of Notebooks and self-assessment questions by many users in a pre-defined Group. Past personal performance data are retrieved when studying new Notebooks containing previously studied Flashcards. Self-report surveys showed that students highly agreed that the system was useful and were willing to use it as a reference tool.
Conclusions
The platform employs various instructional design and learning object principles in a computer supported collaborative learning platform for medical students that allows for study management. The application broadens student insight over learning results and supports informed decisions based on past learning performance. It serves as a potential educational model for the medical education setting that has gathered strong positive feedback from students at our school.
This platform provides a case study in how instructional design and learning object principles can be blended effectively to manage study, and it takes an important step towards providing information management tools that support study decisions and improve learning outcomes.
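As a hedged sketch of the color-coded prioritization described above, the snippet below maps a flashcard's self-assessment score to a study priority; the score scale, thresholds and color names are assumptions made for illustration, not the platform's actual rules.

```python
from dataclasses import dataclass

@dataclass
class Flashcard:
    question: str
    self_score: float  # self-assessment from the quiz mode, 0.0 (unknown) to 1.0 (mastered)

def priority_color(card: Flashcard) -> str:
    """Assumed mapping: weaker self-assessment means higher study priority."""
    if card.self_score < 0.4:
        return "red"      # revisit first
    if card.self_score < 0.8:
        return "yellow"   # review soon
    return "green"        # well known

notebook = [Flashcard("Define cardiac output", 0.3),
            Flashcard("List the branches of the aorta", 0.9)]
for card in sorted(notebook, key=lambda c: c.self_score):
    print(priority_color(card), card.question)
```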
doi:10.1186/1472-6920-14-143
PMCID: PMC4131539  PMID: 25017028
Medical education; Computer supported collaborative learning; E-learning; Information management; Memory retention; Computer-assisted instruction; Tailored learning; Student-centered learning; Spaced repetition
17.  Building a Virtual Network in a Community Health Research Training Program 
Objective: To describe the experiences, lessons, and implications of building a virtual network as part of a two-year community health research training program in a Canadian province.
Design: An action research field study in which 25 health professionals from 17 health regions participated in a seven-week training course on health policy, management, economics, research methods, data analysis, and computer technology. The participants then returned to their regions to apply the knowledge in different community health research projects. Ongoing faculty consultations and support were provided as needed. Each participant was given a notebook computer with the necessary software, Internet access, and technical support for two years, to access information resources, engage in group problem solving, share ideas and knowledge, and collaborate on projects.
Measurements: Data collected over two years consisted of program documents, records of interviews with participants and staff, meeting notes, computer usage statistics, automated online surveys, computer conference postings, program Web site, and course feedback. The analysis consisted of detailed review and comparison of the data from different sources. NUD*IST was then used to validate earlier study findings.
Results: The ten key lessons are that role clarity, technology vision, implementation staging, protected time, just-in-time training, ongoing facilitation, work integration, participatory design, relationship building, and the demonstration of results are essential ingredients for building a successful network.
Conclusion: This study provides a descriptive model of the processes involved in developing, in the community health setting, virtual networks that can be used as the basis for future research and as a practical guide for managers.
PMCID: PMC61441  PMID: 10887165
18.  eCAT: Online electronic lab notebook for scientific research 
Background
eCAT is an electronic lab notebook (ELN) developed by Axiope Limited. It is the first online ELN, the first ELN to be developed in close collaboration with lab scientists, and the first ELN to be targeted at researchers in non-commercial institutions. eCAT was developed in response to feedback from users of a predecessor product. By late 2006 the basic concept had been clarified: a highly scalable web-based collaboration tool that possessed the basic capabilities of commercial ELNs, i.e. a permissions system, controlled sharing, an audit trail, electronic signature and search, and a front end that looked like the electronic counterpart to a paper notebook.
Results
During the development of the beta version, feedback was incorporated from many groups including the FDA's Center for Biologics Evaluation & Research, Uppsala University, Children's Hospital Boston, Alex Swarbrick's lab at the Garvan Institute in Sydney and Martin Spitaler at Imperial College. More than 100 individuals and groups worldwide then participated in the beta testing between September 2008 and June 2009. The generally positive response is reflected in the following quote about how one lab is making use of eCAT: "Everyone uses it as an electronic notebook, so they can compile the diverse collections of data that we generate as biologists, such as images and spreadsheets. We use it to take minutes of meetings. We also use it to manage our common stocks of antibodies, plasmids and so on. Finally, perhaps the most important feature for us is the ability to link records, reagents and experiments."
Conclusion
By developing eCAT in close collaboration with lab scientists, Axiope has come up with a practical and easy-to-use product that meets the need of scientists to manage, store and share data online. eCAT is already being perceived as a product that labs can continue to use as their data management and sharing grows in scale and complexity.
doi:10.1186/1759-4499-1-4
PMCID: PMC2809322  PMID: 20334629
19.  CrasyDSE: A framework for solving Dyson–Schwinger equations☆ 
Computer Physics Communications  2012;183(11):2441-2457.
Dyson–Schwinger equations are important tools for non-perturbative analyses of quantum field theories. For example, they are very useful for investigations in quantum chromodynamics and related theories. However, sometimes progress is impeded by the complexity of the equations. Thus automating parts of the calculations will certainly be helpful in future investigations. In this article we present a framework for such an automation based on a C++ code that can deal with a large number of Green functions. Since the creation of the expressions for the integrals of the Dyson–Schwinger equations also needs to be automated, we defer this task to a Mathematica notebook. We illustrate the complete workflow with an example from Yang–Mills theory coupled to a fundamental scalar field that has been investigated recently. As a second example we calculate the propagators of pure Yang–Mills theory. Our code can serve as a basis for many further investigations where the equations are too complicated to tackle by hand. It can also easily be combined with DoFun, a program for the derivation of Dyson–Schwinger equations.
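CrasyDSE performs the symbolic-to-C++ conversion inside Mathematica. As a language-neutral analogue of that step, the sketch below uses SymPy's C code printer; it is not the CrasyDSE tool, and the expression is a toy stand-in rather than an actual Dyson–Schwinger integrand.

```python
import sympy as sp

p, m, g = sp.symbols("p m g", positive=True)

# A toy propagator-like expression; the physics content is illustrative only.
expr = g**2 / (p**2 + m**2) * sp.log(p**2 / m**2 + 1)

# sympy.ccode renders the symbolic expression as a C/C++-compatible expression string,
# mirroring the "convert symbolic expressions into C++ expressions" step.
print(sp.ccode(expr))
# e.g.  pow(g, 2)*log(pow(p, 2)/pow(m, 2) + 1)/(pow(m, 2) + pow(p, 2))
```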
Program summary
Program title: CrasyDSE
Catalogue identifier: AEMY_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMY_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 49030
No. of bytes in distributed program, including test data, etc.: 303958
Distribution format: tar.gz
Programming language: Mathematica 8 and higher, C++.
Computer: All on which Mathematica and C++ are available.
Operating system: All on which Mathematica and C++ are available (Windows, Unix, Mac OS).
Classification: 11.1, 11.4, 11.5, 11.6.
Nature of problem: Solve (large) systems of Dyson–Schwinger equations numerically.
Solution method: Create C++ functions in Mathematica to be used for the numeric code in C++. This code uses structures to handle large numbers of Green functions.
Unusual features: Provides a tool to convert Mathematica expressions into C++ expressions including conversion of function names.
Running time: Depending on the complexity of the investigated system solving the equations numerically can take seconds on a desktop PC to hours on a cluster.
doi:10.1016/j.cpc.2012.05.019
PMCID: PMC4267552  PMID: 25540463
Dyson–Schwinger equations; Correlation functions; Quantum field theory
20.  Evaluation of text-mining systems for biology: overview of the Second BioCreative community challenge 
Genome Biology  2008;9(Suppl 2):S1.
Background:
Genome sciences have experienced an increasing demand for efficient text-processing tools that can extract biologically relevant information from the growing amount of published literature. In response, a range of text-mining and information-extraction tools have recently been developed specifically for the biological domain. Such tools are only useful if they are designed to meet real-life tasks and if their performance can be estimated and compared. The BioCreative challenge (Critical Assessment of Information Extraction in Biology) consists of a collaborative initiative to provide a common evaluation framework for monitoring and assessing the state-of-the-art of text-mining systems applied to biologically relevant problems.
Results:
The Second BioCreative assessment (2006 to 2007) attracted 44 teams from 13 countries worldwide, with the aim of evaluating current information-extraction/text-mining technologies developed for one or more of the three tasks defined for this challenge evaluation. These tasks included the recognition of gene mentions in abstracts (gene mention task); the extraction of a list of unique identifiers for human genes mentioned in abstracts (gene normalization task); and finally the extraction of physical protein-protein interaction annotation-relevant information (protein-protein interaction task). The 'gold standard' data used for evaluating submissions for the third task was provided by the interaction databases MINT (Molecular Interaction Database) and IntAct.
Conclusion:
The Second BioCreative assessment almost doubled the number of participants for each individual task when compared with the first BioCreative assessment. An overall improvement in terms of balanced precision and recall was observed for the best submissions for the gene mention task (F score 0.87); for the gene normalization task, the best results (F score 0.81) were comparable to those obtained for similar tasks posed at the first BioCreative challenge. In the case of the protein-protein interaction task, the importance and difficulties of extracting experimentally confirmed annotations from full-text articles were explored, yielding different results depending on the step of the annotation extraction workflow. A common characteristic observed in all three tasks was that the combination of system outputs could yield better results than any single system. Finally, the development of the first text-mining meta-server was promoted within the context of this community challenge.
doi:10.1186/gb-2008-9-s2-s1
PMCID: PMC2559980  PMID: 18834487
21.  Integrating text mining into the MGI biocuration workflow 
A major challenge for functional and comparative genomics resource development is the extraction of data from the biomedical literature. Although text mining for biological data is an active research field, few applications have been integrated into production literature curation systems such as those of the model organism databases (MODs). Not only are most available biological natural language processing (bioNLP) and information retrieval and extraction solutions difficult to adapt to existing MOD curation workflows, but many also have high error rates or are unable to process documents available in the formats preferred by scientific journals.
In September 2008, Mouse Genome Informatics (MGI) at The Jackson Laboratory initiated a search for dictionary-based text mining tools that we could integrate into our biocuration workflow. MGI has rigorous document triage and annotation procedures designed to identify appropriate articles about mouse genetics and genome biology. We currently screen ∼1000 journal articles a month for Gene Ontology terms, gene mapping, gene expression, phenotype data and other key biological information. Although we do not foresee that curation tasks will ever be fully automated, we are eager to implement named entity recognition (NER) tools for gene tagging that can help streamline our curation workflow and simplify gene indexing tasks within the MGI system. Gene indexing is an MGI-specific curation function that involves identifying which mouse genes are being studied in an article, then associating the appropriate gene symbols with the article reference number in the MGI database.
Here, we discuss our search process, performance metrics and success criteria, and how we identified a short list of potential text mining tools for further evaluation. We provide an overview of our pilot projects with NCBO's Open Biomedical Annotator and Fraunhofer SCAI's ProMiner. In doing so, we prove the potential for the further incorporation of semi-automated processes into the curation of the biomedical literature.
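A minimal sketch of the kind of dictionary-based gene tagging being evaluated: scan text for terms from a symbol/synonym dictionary and return the mentions found together with their database identifiers. The dictionary entries below are invented for the example; production tools such as ProMiner use curated dictionaries and disambiguation well beyond this.

```python
import re

# Toy dictionary mapping gene symbols/synonyms to MGI-style identifiers (invented values).
gene_dict = {
    "Trp53": "MGI:0000001",
    "p53": "MGI:0000001",
    "Brca1": "MGI:0000002",
}

def tag_genes(text, dictionary):
    """Return (mention, identifier, offset) for every dictionary term found in the text."""
    hits = []
    for term, ident in dictionary.items():
        for m in re.finditer(rf"\b{re.escape(term)}\b", text):
            hits.append((m.group(), ident, m.start()))
    return sorted(hits, key=lambda h: h[2])

abstract = "Loss of Trp53 accelerates tumour formation in Brca1-deficient mice."
for mention, ident, pos in tag_genes(abstract, gene_dict):
    print(f"{mention!r} -> {ident} at offset {pos}")
```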
doi:10.1093/database/bap019
PMCID: PMC2797454  PMID: 20157492
22.  Knowledge Extraction and Semantic Annotation of Text from the Encyclopedia of Life 
PLoS ONE  2014;9(3):e89550.
Numerous digitization and ontological initiatives have focused on translating biological knowledge from narrative text to machine-readable formats. In this paper, we describe two workflows for knowledge extraction and semantic annotation of text data objects featured in an online biodiversity aggregator, the Encyclopedia of Life. One workflow tags text with DBpedia URIs based on keywords. Another workflow finds taxon names in text using GNRD for the purpose of building a species association network. Both workflows work well: the annotation workflow has an F1 Score of 0.941 and the association algorithm has an F1 Score of 0.885. Existing text annotators such as Terminizer and DBpedia Spotlight performed well, but require some optimization to be useful in the ecology and evolution domain. Important future work includes scaling up and improving accuracy through the use of distributional semantics.
doi:10.1371/journal.pone.0089550
PMCID: PMC3940440  PMID: 24594988
23.  Bioinformatics process management: information flow via a computational journal 
This paper presents the Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments in bioinformatics and computational biology. These experiments often involve series of computations, data searches, filters, and annotations which can benefit from a structured environment. Systems to manage computational experiments exist, ranging from libraries with standard data models to elaborate schemes to chain together input and output between applications. Yet, although such frameworks are available, their use is not widespread: ad hoc scripts are often required to bind applications together. The BCJ explores another solution to this problem through a computer-based environment suitable for on-site use, which builds on the traditional laboratory notebook paradigm. It provides an intuitive, extensible paradigm designed for expressive composition of applications. Extensive features facilitate sharing data, computational methods, and entire experiments. By focusing on the bioinformatics and computational biology domain, the scope of the computational framework was narrowed, permitting us to implement a capable set of features for this domain. This report discusses the features determined critical by our system and other projects, along with design issues. We illustrate the use of our implementation of the BCJ on two domain-specific examples.
doi:10.1186/1751-0473-2-9
PMCID: PMC2228283  PMID: 18053179
24.  Keeping Track of Interactomes Using the ProHits LIMS 
Affinity purification coupled with mass spectrometry (AP-MS) is a robust technique used to identify protein-protein interactions. With recent improvements in sample preparation, and dramatic advances in MS instrumentation speed and sensitivity, this technique is becoming more widely used throughout the scientific community. To meet the needs of research groups both large and small, we have developed software solutions for tracking, scoring and analyzing AP-MS data. Here, we provide details for the installation and utilization of ProHits, a Laboratory Information Management System designed specifically for AP-MS interaction proteomics that we distribute freely to the scientific community at ProHitsMS.com, and which is under continuous development. The complete ProHits solution [1] performs scheduled backup of mass spectrometry data and initiates database searches (Mascot, X!Tandem, COMET, SEQUEST and the output from the TransProteomics Pipeline are now supported). It stores search results and enables linking the mass spectrometry data to entries in the relational database module called “Analyst”, which is also available as a stand-alone application (including as an easy-to-install virtual machine implementation [2]). ProHits Analyst is organized in a hierarchical manner by project, bait, experiment and sample, and also serves as an electronic notebook. When a sample is created, mass spectrometry search results can be uploaded. Search results can be explored using a series of viewers, filtered based on mass spectrometry quality, frequency of detection or background lists, viewed in Cytoscape-Web, or exported to text or to PSI XML format for deposition in interaction databases. Importantly, search results can be further analyzed using the SAINT statistical tool, which is seamlessly integrated within ProHits to derive interaction confidence scores [3-5]. Through its integration with a number of open-source tools and public repositories, ProHits facilitates transparent analysis and reporting of AP-MS data. References: [1] PMID:20944583; [2] PMID:22948730; [3] PMID:20489023; [4] PMID:21131968; [5] PMID:22948729.
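Not ProHits code, but a hedged sketch of one of the filtering steps mentioned above: dropping candidate interactors that appear on a background/contaminant list or are detected across too many unrelated baits. The protein names, counts and threshold are invented for illustration.

```python
# Candidate interactors for one bait: protein -> number of baits (out of all experiments)
# in which it was detected. Values are invented for illustration.
candidates = {"PROT_A": 1, "PROT_B": 18, "KRT1": 2, "PROT_C": 2}

background = {"KRT1"}          # known contaminant list (e.g. keratins)
total_baits = 20
max_frequency = 0.5            # drop anything detected with more than half of all baits

filtered = {
    prot: n for prot, n in candidates.items()
    if prot not in background and n / total_baits <= max_frequency
}
print(filtered)   # {'PROT_A': 1, 'PROT_C': 2}
```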
PMCID: PMC3635280
25.  ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses 
BMC Bioinformatics  2008;9(Suppl 6):S18.
Background
A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis-parameter information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Knowing chip data parameters such as pre-processing steps (e.g., normalization, artefact removal, etc.), and knowing any previous biological validation of the dataset, is essential due to the heterogeneity of the data. However, most of the microarray repositories do not have meta-data information in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources.
Results
To address the problems discussed, we have developed a community maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface. Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers (Semantic Agents) such as Google to further enhance data discovery.
Conclusions
Microarray data and meta information in ArrayWiki are distributed and visualized using a novel and compact data storage format, BioPNG. Also, they are open to the research community for curation, modification, and contribution. By making a small investment of time to learn the syntax and structure common to all sites running MediaWiki software, domain scientists and practitioners can all contribute to make better use of microarray technologies in research and medical practices. ArrayWiki is available at .
doi:10.1186/1471-2105-9-S6-S18
PMCID: PMC2423441  PMID: 18541053
