The challenge of modern nutrition and health research is to identify food-based strategies promoting life-long optimal health and well-being. This research is complex because it exploits a multitude of bioactive compounds acting on an extensive network of interacting processes. Whereas nutrition research can profit enormously from the revolution in ‘omics’ technologies, it has discipline-specific requirements for analytical and bioinformatic procedures. In addition to measurements of the parameters of interest (measures of health), extensive description of the subjects of study and foods or diets consumed is central for describing the nutritional phenotype. We propose and pursue an infrastructural activity of constructing the “Nutritional Phenotype database” (dbNP). When fully developed, dbNP will be a research and collaboration tool and a publicly available data and knowledge repository. Creation and implementation of the dbNP will maximize benefits to the research community by enabling integration and interrogation of data from multiple studies, from different research groups, different countries and different omics levels. The dbNP is designed to facilitate storage of biologically relevant, pre-processed omics data, as well as study descriptive and study participant phenotype data. It is also important to enable the combination of this information at different levels (e.g. to facilitate linkage of data describing participant phenotype, genotype and food intake with information on study design and omics measurements, and to combine all of this with existing knowledge). The biological information stored in the database (i.e. genetics, transcriptomics, proteomics, biomarkers, metabolomics, functional assays, food intake and food composition) is tailored to nutrition research and embedded in an environment of standard procedures and protocols, annotations, modular data-basing, networking and integrated bioinformatics.
The dbNP is an evolving enterprise, which is only sustainable if it is accepted and adopted by the wider nutrition and health research community as an open-source, pre-competitive and publicly available resource to which many partners can contribute and from whose developments they can profit. We introduce the Nutrigenomics Organisation (NuGO, http://www.nugo.org) as a membership association responsible for establishing and curating the dbNP. Within NuGO, all efforts related to dbNP (i.e. usage, coordination, integration, facilitation and maintenance) will be directed towards a sustainable and federated infrastructure.
A primary goal of nutrition research is to optimize health by prevention, delay, or reduction in the severity of disease via dietary means. Determining optimal dietary intakes to maintain health requires relevant methods for assessing the effects of the huge range of diverse food-delivered compounds (macro- and micronutrients, and non-nutritional bioactive compounds) on individual health. There is a good understanding of the basic nutrient requirements for health maintenance, but the next steps towards quantification of the relationships between nutrition and health have proven to be difficult. Although nutrition researchers have adopted many modern approaches and technologies, tools for measuring the two major nutrition-specific “research axes” are far from perfect:
Genotypic (genetic and epigenetic) variation further complicates the picture, because the relation between the input and output axes depends on the genotype. The time factor can also be considered a research axis.
Nutrition research needs better biomarkers of both exposures and outcome. This calls for approaches where results are automatically combined with knowledge derived from different sources such as existing protein–protein interaction databases, miRNA and transcription target inference data and literature sources. A database system is required for nutrition research to facilitate such approaches.
The description and quantification of the consequences for human physiology in response to nutrition are now commonly called the nutritional phenotype. The concept of the nutritional phenotype was first introduced by Zeisel et al., who proposed that this should be defined as an integrated set of genetic, proteomic, metabolomic, functional and behavioural factors that form the basis for assessment of human nutritional status. The nutritional phenotype integrates the effects of diet on disease/wellness and is the quantitative indication of the paths by which genes and environment exert their effects on health.
The need to accurately capture subtle changes in a multitude of variables creates several challenges. Standardized technology, methodology and data formats are required for meeting these challenges. Elements of these issues are common to all biological sciences, and efforts to produce solutions and best practices for technologies and data handling in these areas are under way [27, 32]. To benefit from and to align with these developments, nutrition researchers have to adopt, adapt and customise the standards. In this paper, we address the nutrition-specific requirements and propose a strategy for meeting these challenges.
“The data handling challenges for nutrition research” describes the challenges of processing data for nutritional research. “A nutrigenomics research infrastructure” lists existing kinds of infrastructure available for nutrigenomics and describes dbNP as a system integrating such kinds of infrastructure. “The nutritional phenotype database” gives a detailed description of the dbNP. “The Nutrigenomics Organisation as a sustainable model of the nutritional phenotype database” explains the institutionalized curation of dbNP by the Nutrigenomics Organisation. “Conclusion” provides an outlook to future work and concludes this article.
Nutrition research has undergone a revolution in the last decade. To a large extent, this revolution is shared by most biology-based research and includes six areas:
The progress of biomedical research, technological advances and infrastructure developments is relevant for nutrition research. Thus, many nutrition research projects have exploited these developments, by the use of technologies and conceptual innovations, including the following six:
We propose implementation of an infrastructure for nutrition research, with an organizational framework that will facilitate optimal performance, storage, evaluation and sharing of information and results. The infrastructure is applicable to all nutritional studies, including human intervention studies and experimental studies in (transgenic) animals, and will provide necessary mechanistic insights into gene–nutrient interactions.
The core of this infrastructure rests on two pillars:
The dbNP is designed to facilitate the description along the two axes needed to perform nutrition studies (the exposure and the effect of food intake) and to connect them to information on genetic variation and study design. Capturing of the current major components of these axes (genetics, transcriptomics, proteomics, biomarkers, metabolomics, functional assays, imaging technologies, food intake and food composition) should be tailored to nutrition research. If we are to make maximum use of the collected information without introducing limitations for new research approaches or the use of new technologies, the dbNP requires extensive standard protocols and quality standards for nutritional data capturing, and nutrition-specific annotations, modular data-basing, distributed networking and integrated bioinformatics.
The dbNP will store and allow retrieval of data from high-quality nutrition studies, regardless of the technology by which the data were acquired. This is different from most existing omics databases, which often store data from one specific analytical technology only. Thus, dbNP would be the first complete systems biology study database.
This section describes the dbNP as follows: First, we introduce the pipeline in the dbNP. Next, we describe the eight characteristic design principles of the dbNP. This is followed by an overview of the three levels for accessing the dbNP. Finally, we give a detailed account of the modules from which the dbNP is built.
The integration of different data-capturing technologies in the dbNP can be illustrated by describing a typical workflow. It includes capturing the study design, (dietary) intervention, sampling protocol and the quantitative results using a dedicated study capture module. This module can combine the study description with the actual measurements, which can be used for storage in data repositories. The loading module loads the raw experimental data into technology-specific databases and links to a series of parallel technology-specific data processing pipelines (see Fig. 1).
The purpose of such pipelines is to produce and deliver “clean data” to several database modules. By “clean data”, we imply that the raw data have been transformed from their platform-specific format into a quality-controlled and statistically evaluated format providing numerical values such as activities or concentrations, fold changes and p-values. The database modules can be queried based on the study design as well as experimental data and are connected to a series of statistical and bioinformatics packages facilitating further data processing. The modular, technology-specific database structure allows each technology to provide its dedicated LIMS and data pre-processing procedures, whereas the study design module, the study query and the evaluation modules provide integrated views of the biological research questions. In practice, data derived from different technologies but providing similar types of information (e.g. LC-MS and NMR both providing “clean” metabolomics data, or two types of transcriptomics technology using different pre-processing pipelines) are all stored in the same “clean data” database. Thus, a completely modular, flexible and technology-independent database structure is created and updated continually as new datasets are uploaded.
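As an illustration of what a technology-specific pipeline might emit per transcript or metabolite, the following sketch reduces raw per-group measurements to a log2 fold change and a p-value. The intensity values are invented, and a seeded permutation test stands in for whatever statistics a real platform pipeline would apply:

```python
import math
import random

def clean_data(control, treated, n_perm=2000, seed=0):
    """Reduce raw per-group intensities to a 'clean data' record:
    a log2 fold change plus a permutation p-value."""
    rng = random.Random(seed)
    mean_c = sum(control) / len(control)
    mean_t = sum(treated) / len(treated)
    log2_fc = math.log2(mean_t / mean_c)
    observed = abs(mean_t - mean_c)
    # Permutation test: how often does a random relabelling of the
    # samples produce a group difference at least as large?
    pooled = list(control) + list(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_c = pooled[:len(control)]
        perm_t = pooled[len(control):]
        diff = abs(sum(perm_t) / len(perm_t) - sum(perm_c) / len(perm_c))
        if diff >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return {"log2_fc": round(log2_fc, 3), "p": round(p_value, 4)}

# Hypothetical normalized intensities for one transcript
result = clean_data([100, 110, 95, 105], [210, 190, 220, 205])
print(result)
```

The point is the shape of the output, not the statistics: downstream modules see only platform-independent numbers such as fold changes and p-values.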
The architecture of the dbNP has the following eight design principles:
The dbNP serves different user needs simultaneously. We distinguish between three kinds of access allowing different modes of using the dbNP: (1) single study mode; (2) the protected-environment mode; and (3) public-domain mode.
This mode of dbNP is typical of projects where results have not been made fully available within public-domain databases. Permissions to access these datasets would be limited to collaborating centres. However, other members of the dbNP community could request access, promoting novel research collaborations within the dbNP community, which might improve data processing. In this mode, all data remain fully “owner controlled”, stored on the NBX and thus accessible only to those specifically given permission by the data owner. This mode of multi-study collaboration and exchange is depicted in Fig. 3.
Results can also be uploaded to existing public-domain databases (such as GEO and ArrayExpress for transcriptomics data and PRIDE for proteomics data), using ISA-archives created at an early stage of data storage.
During analysis, data can also be combined with data from external repositories. An existing example of such an approach is the dual web-service integration of high-quality expression data from the ArrayExpress Atlas into pathways from WikiPathways. A similar approach could be extended using dbNP web services, allowing evaluation of published and unpublished studies.
As described in “Characteristics of the nutritional phenotype database”, the dbNP is organized in modules: (1) functional modules; and (2) database modules. Functional modules offer services within the dbNP workflow and interface with the user. Existing and planned functional modules are the ISA-web study capture tool, the ISA-creator and BioInvestigation Index, the bioinformatics and statistics toolbox. Database modules include data at different levels of processing. These data may be generated by functional modules. Database modules include the study database, the genetic variation module, the phenotypic module and the food intake database. A database module may consist of one or more physical databases.
The modules of dbNP are organized according to the workflow and elements presented in the scheme (Fig. 5).
For a consistent entry of study information in the dbNP, it is essential to have a tool capturing and querying study metadata, sample characteristics, study design, measurements (e.g. transcriptomics, metabolomics, and biomarkers), SOPs and sample-data relationships. Nutrition studies may have complex designs like multiple doses, sampling time points and challenge tests, and all these should be made available for data analysis.
Consistent reporting of these experimental metadata and associated data files has a positive and long-lasting impact on the value of collective scientific outputs. It is critical, however, to reach a compromise between detail and practical reporting, and thus achieve good overall compliance. For this purpose, the nutritional phenotype database project is developing ISA-web and an underlying study description database.
ISA-web is a web-based application designed to structure and edit experimental metadata in “ISA-Tab” format (http://isa-tab.sf.net) and package it with corresponding data files for submission to the study description database. ISA-web has a dropdown menu to select a certain template to comply with relevant minimal information for biological and biomedical investigation (http://mibbi.sf.net). A wizard is constructed to provide a knowledge driven/assisted creation mode, which further reduces repetitive tasks. The direct interface between ISA-web and the study description database prevents duplication.
The ISA-Tab format has been developed to deposit multi-assay study datasets at EBI. ISA-web works in concert with the other modules, detailed in the next section. This ensures data persistence to a relational database management system.
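To make the tab-delimited nature of ISA-Tab concrete, the following sketch parses a drastically simplified sample table. Real ISA-Tab study files carry many more fields plus separate investigation and assay files; the column names and sample values below are illustrative only:

```python
import csv
import io

# A minimal sample table in the tab-delimited spirit of ISA-Tab
# (column names follow ISA-Tab conventions; content is invented).
ISA_TAB = """Source Name\tCharacteristics[organism]\tFactor Value[diet]\tSample Name
subject1\tHomo sapiens\tfish oil\tsubject1.plasma.wk5
subject2\tHomo sapiens\tcontrol\tsubject2.plasma.wk5
"""

def parse_study_table(text):
    """Parse a tab-delimited sample table into per-sample dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)

samples = parse_study_table(ISA_TAB)
print(samples[0]["Sample Name"])  # subject1.plasma.wk5
# Navigate from sample characteristics to the samples of interest:
fish_oil = [s for s in samples if s["Factor Value[diet]"] == "fish oil"]
print(len(fish_oil))  # 1
```

Because the format is plain tab-delimited text, any downstream tool can recover the sample-to-data relationships without bespoke parsers.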
The study description database includes a description of experimental information like the aims and hypothesis, design of the experiment, information detailing stress and response variables (also known as independent and dependent variables), as well as sample and data processing information.
The main goal is to allow navigation from samples and their characteristics to data files holding information on molecular measurement for further analysis, irrespective of the technique used to generate those data files.
Two alternative, more generic and more professional modules are being developed by the European Bioinformatics Institute: ISA-creator and the BioInvestigation Index. These two modules are not specifically designed to store nutrition intervention studies, but rather aim at covering all possible study designs. ISA-creator is a stand-alone study capture tool. A wizard is constructed to help tailor the data capture to specific study types, which further reduces repetitive tasks. It has an Excel-like look and functionality, coupled with a dynamic graphical view. Standardized metadata capturing is facilitated by support for searching and use of OBO Foundry ontologies (http://www.obofoundry.org) accessed via the Ontology Lookup Service (http://www.ebi.ac.uk/ontology-lookup).
The BioInvestigation Index (BII) database is able to propagate the information coded in ISA-Tab syntax to a relational database, thus enabling queries to be executed. Software has been developed to read, parse and persist ISA-Tab-encoded information to BII databases. This software may also dispatch data files to technology-specific data stores and can be configured by local users. The dispatcher code is able not only to load data into BII but also to export data to a variety of public repositories, including PRIDE, ENA and ArrayExpress. It is important to note that the dispatcher component can be configured to point to local data stores, and submission to public repositories is not mandatory. Finally, an R package supporting the ISA-Tab format is under development; it is meant to facilitate data and metadata manipulation for analyses.
The purpose of this database module is to store “clean” transcriptomics data only: non-biological noise is removed from each experiment (microarray or otherwise obtained transcriptome data) as much as possible. The “clean” data are obtained by executing a dedicated “NuGOMakeCleanData” module in GenePattern. GenePattern is created and maintained by the Broad Institute. This module requires the original .CEL files, which belong to a complete single experiment, as input. The output is “clean” data stored in a so-called .gct file, plus a .chip file describing the probes on the microarray chip. The file formats are defined by the Broad Institute. A recent publication describes NuGO contributions to GenePattern, and the NuGOMakeCleanData module is a recent new contribution.
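The .gct layout is a simple tab-delimited format: a version line, a dimensions line, then a Name/Description header followed by one column per sample. A minimal writer might look as follows; the probe-set identifiers and intensities are hypothetical:

```python
def write_gct(path, rows, sample_ids):
    """Write expression values in the Broad .gct layout:
    a version line, a dimensions line, then Name/Description
    columns followed by one column per sample."""
    with open(path, "w") as fh:
        fh.write("#1.2\n")
        fh.write(f"{len(rows)}\t{len(sample_ids)}\n")
        fh.write("Name\tDescription\t" + "\t".join(sample_ids) + "\n")
        for name, desc, values in rows:
            fh.write(name + "\t" + desc + "\t" +
                     "\t".join(f"{v:.2f}" for v in values) + "\n")

# Hypothetical "clean" intensities for two probe sets
write_gct("clean.gct",
          [("1415670_at", "probe set 1", [7.81, 7.95]),
           ("1415671_at", "probe set 2", [9.12, 9.03])],
          ["sample_A", "sample_B"])
print(open("clean.gct").readline().strip())  # #1.2
```

Because the format is this regular, any tool in the downstream toolbox can read "clean" data without knowing which platform produced it.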
This NuGOMakeCleanData module is essential as each microarray experiment requires a normalization procedure to remove experimental artefacts. For an individual experiment, these normalization procedures are convenient. However, if one wishes to combine multiple microarray experiments, so-called batch-effect problems are encountered. Each experiment has its own inherent features, which adds noise to the experiment. This can be demonstrated, for example by hierarchical clustering of the samples, where the samples cluster on experiment instead of biological exposure. Performing a statistical analysis on such combined experiments is thus problematic. Moreover, performing a microarray normalization procedure is time-consuming.
Therefore, it is necessary to store all microarray experiments in such a way that the batch effect is minimized and no longer interferes with statistical analysis and more advanced queries. An advanced query in the nutritional phenotype database with transcriptomics data includes two steps: (1) collect the samples (among multiple experiments) that comply with criteria defined by the scientist; and (2) rapidly retrieve the data from the clean transcriptomics database. After these steps, no further processing of the data is required, other than performing advanced queries or follow-up analyses on the combined datasets.
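A deliberately minimal sketch of the batch-effect problem and its simplest remedy: subtracting each experiment's own mean makes the batches comparable. Real pipelines would use dedicated methods such as ComBat; the values here are invented:

```python
from statistics import mean

def center_per_batch(values, batches):
    """Subtract each batch's own mean expression so that batches
    become comparable (a minimal stand-in for dedicated
    batch-correction methods such as ComBat)."""
    batch_means = {b: mean(v for v, bb in zip(values, batches) if bb == b)
                   for b in set(batches)}
    return [round(v - batch_means[b], 2) for v, b in zip(values, batches)]

# One gene measured in two experiments; batch "exp2" sits ~3 units
# higher purely for technical reasons.
values = [5.0, 5.4, 8.0, 8.4]
batches = ["exp1", "exp1", "exp2", "exp2"]
print(center_per_batch(values, batches))  # [-0.2, 0.2, -0.2, 0.2]
```

After centering, samples no longer cluster by experiment, so combined statistical analysis becomes meaningful.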
Gene expression intensities, fold changes and their respective p-values and q-values are stored with technology-related reporter identifiers (which for instance can be probe set identifiers). Translations to any other type of gene-identifiers are made by the query tool or during pathway profile generation using the BridgeDB software framework (http://www.bridgedb.org) .
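The identifier translation step can be pictured as a lookup with bookkeeping for unmapped reporters. The mapping table below is a placeholder for the cross-references the BridgeDB framework would actually supply, and the identifier pairings are invented:

```python
# Hypothetical reporter-to-gene mapping table; in dbNP the BridgeDB
# framework would supply such cross-references.
REPORTER_TO_ENSEMBL = {
    "1415670_at": "ENSMUSG00000030058",
    "1415671_at": "ENSMUSG00000013160",
}

def translate(reporter_ids, mapping):
    """Translate technology-specific reporter identifiers to gene
    identifiers, keeping track of unmapped reporters."""
    mapped, unmapped = {}, []
    for rid in reporter_ids:
        if rid in mapping:
            mapped[rid] = mapping[rid]
        else:
            unmapped.append(rid)
    return mapped, unmapped

mapped, unmapped = translate(["1415670_at", "999999_at"], REPORTER_TO_ENSEMBL)
print(mapped)    # {'1415670_at': 'ENSMUSG00000030058'}
print(unmapped)  # ['999999_at']
```

Keeping the stored data in reporter space and translating only at query time, as described above, avoids baking one identifier scheme into the database.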
The dbNP contains a database module for storage of data on metabolites for the phenotypic characterization of biofluids and tissues. Metabolite data can be derived from metabolomics experiments or clinical chemistry measurements. Both data formats will be supported by this module. Thus, some metabolites might be reported more than once.
Non-identified compounds from metabolomics analysis receive unique identifiers. Each peak in datasets derived from the same metabolomics platform is always represented by the same ID, so that a given unknown peak always carries the same identifier. The module differentiates between annotated metabolites (peaks named via databases) and identified metabolites (peaks named via confirmation by standards). The named metabolites are at least represented by Human Metabolome Database (HMDB) identifiers (http://www.hmdb.ca). BridgeDB is used within the querying tool to translate the identifiers to other database identifiers (e.g. ChEBI IDs), making it possible to retrieve a wide variety of metabolite information. Importantly, double identities in one study caused by the use of multiple platforms are not resolved.
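One way to guarantee that the same unknown peak always receives the same identifier is to key a registry on the platform plus rounded peak coordinates. The tolerances and ID scheme below are illustrative assumptions, not the dbNP's actual convention:

```python
def peak_key(platform, mz, rt, mz_dec=3, rt_dec=1):
    """Reduce a peak to a platform-specific key so that the same
    unknown peak always maps to the same identifier."""
    return (platform, round(mz, mz_dec), round(rt, rt_dec))

class UnknownPeakRegistry:
    """Hands out one stable identifier per (platform, m/z, RT) peak."""
    def __init__(self):
        self._ids = {}

    def identifier(self, platform, mz, rt):
        key = peak_key(platform, mz, rt)
        if key not in self._ids:
            self._ids[key] = f"UNK_{platform}_{len(self._ids) + 1:05d}"
        return self._ids[key]

reg = UnknownPeakRegistry()
a = reg.identifier("LCMS_pos", 347.2212, 5.43)
b = reg.identifier("LCMS_pos", 347.2213, 5.44)  # same peak, tiny drift
print(a == b)  # True
```

Once an unknown peak is later identified against a standard, its stable ID can simply be linked to an HMDB entry without touching the stored data.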
Since not all metabolomics technologies produce results of the same quality, information on different levels of quality is defined and retrievable. First, data are expressed in standardized units, and information on reference ranges is included. Metabolite values are represented either in molar units or relative to a well-defined control plasma sample (available from TNO, via Ben van Ommen), and/or a synthetic plasma/urine sample (available from University of Copenhagen, via Lars Dragsted). For any of the other compounds, standard units will be defined. Moreover, information such as under/above detectable limit and statistical significance is incorporated. Secondly, information is included on correction methods (e.g. time drifts, alignment, and deconvolution) and standards (e.g. quality control sample, concentration curve and added internal standards). Thirdly, the database has storage capacity for the description of the applied methods and SOPs. Finally, information can be included on biomarker approval status (by EFSA, FDA and Passclaim) or most trustworthy method (by Eurreca).
For the interpretation of the phenotypic data, biological information on possible origins is essential. Examples are the possible origin of compounds (e.g. is it a bacterial metabolite, in which organ or biofluid was it detected, is it a drug metabolite), information on intake, nutrient status and involvement in biological processes such as inflammation and oxidation–reduction. This information is made available via links to other databases/resources, such as NuGOwiki (http://www.nugowiki.org) and OMIM.
Additional features of the phenotypic module are differentiation between published and non-published data, batch import of data, the possibility of updating values from raw data with improved correction and identification methods, and version management.
Protein data (e.g. measured by multiplex or other targeted methods) and clinical biochemistry enzyme activity data are identified by Entrez and/or Swiss-Prot IDs; for enzymes, EC numbers are used. A complete proteome incorporation into this module has not been scheduled yet, and collaborators are invited to join this initiative.
The genetic module of dbNP will store genetic information on the study subjects and link it to existing genomic databases: reference sequences are stored along with annotations of nucleotide variants, copy number variations and indels (insertions/deletions).
The goal of many efforts (e.g. the human variome project and dbGAP) is to associate a phenotype to each variant. dbNP does not need to replicate these data and therefore will link to NCBI, EBI, or other national and international databases for reference and variant sequences, allele frequencies, and other genetic variations.
Genetic variations cannot easily be translated into phenotypic effects. Individuals differ in response to food intake and the environment due to their unique genetic make-up. No population will display the entire range of phenotypic variation possible for a single genetic variant (SNP, copy number, deletion, insertion or other) or collection of variants (haplotypes). Epigenetic interactions (alterations in expression of genetic information not caused by changes in DNA sequence) may also alter the expression of the gene variant(s). DNA methylation, microRNAs and histone modifications contribute to different extents to produce epigenetic differences among individuals.
Deciphering the function of genetic variations thus requires extensive genotyping. Specifically, nutrition experiments will require
The first version of the database will focus on variants of interest for nutrient–gene interactions, capture and store genotyping results from array analyses, the sequence of multiple candidate genes and DNA methylation from targeted candidate genes. As DNA sequencing technologies improve and costs decrease, the future versions of dbNP will store or link to sequence information of each individual along with their global DNA methylation patterns .
These high dimensional datasets will be used to eventually create a distinct genome attribute for each individual. The analogy best describing this attribute is how biochemists refer to enzyme activity: units of activity are not as meaningful as units of activity per milligram protein. Each SNP or copy number variant could express differently in one individual (SNP1/genome A) than in another individual (SNP1/genome B). No such measure currently exists. Evidence supporting such concepts has been shown at the population level where, e.g. haplotype K has a different impact on myocardial infarction and CVD in European-Americans when compared with African-Americans .
The full range of phenotypic effects of a given SNP or copy number variant will thus require comparison of gene–nutrient interactions in widely different genetic and cultural backgrounds. Each genetic variant will be linked to the nutrients influencing its expression/phenotypic effect, and those effects will vary depending upon genome attribute (the denominator). Grouping or clustering genome attributes may reduce the variation from ~7 billion unique possibilities to a smaller set of “metabolic” groups. Linear and non-linear dimensionality reduction algorithms and more complete genotype data will allow the incorporation of genomic data into analyses of nutrient phenotypes. Recent metabolomic research has demonstrated the existence of metabolic groups [1, 2, 16, 29], and such groups may ultimately be linked to genotype data.
The major challenges for the food intake database are (1) to capture heterogeneous and complex eating behaviour in a systematic manner; and (2) to “translate” information on food intake into intakes of energy, nutrients and other bioactive compounds. Many tools are in use for assessing food intake, and all struggle to assess habitual food intake accurately. The tool of choice will depend on the purpose of the study and the resources available. Food diary and 24-h recall-based methods offer the opportunity to collect the individual richness of food ingestion behaviour but generate technical challenges and are resource-intensive when converting food descriptions to intakes of energy, nutrients and other bioactive compounds. Well-validated food frequency questionnaires (FFQ) designed to cover the whole diet are widely used, especially in large epidemiological studies. FFQ have the additional advantage that they are readily adapted to web-based versions and can be linked to food composition databases.
Sharing information between projects in different parts of the world will require data that are freed from their cultural and geographical constraints, by use of harmonized food composition databases. There are differences among countries and stakeholders in the way food data are expressed, e.g. in food description, definition of nutrients and the methods used to generate compositional values. A common European standard to describe food data is currently being developed within the European Committee for Standardization framework to enable unambiguous identification and description of food data for dissemination and interchange. A common food description system needs to be agreed upon and shared. LanguaL is one such system, which uses a multilingual thesaurus based on 14 different facets including product type, food source, part of the plant or animal from which the food is derived and cooking method (http://www.langual.org). EuroFIR provides links to several food composition data banks in Europe (http://www.eurofir.org). A growing number of such databanks, generally specific to a given country, are now accessible via the internet, and EuroFIR offers an eSearch function allowing retrieval and comparison of food composition values for macronutrients, vitamins and minerals in different databases.
Databases for the composition of bioactives are not yet well developed, with some notable exceptions such as the USDA databases for carotenoids or flavonoids and the Phenol-Explorer database for polyphenols (http://www.phenol-explorer.eu). Different classifications have been proposed for food components. These are not trivial for bioactive compounds due to the diversity of their chemical structures. A unified system adapted to the specific needs of nutrigenomic studies should be developed.
Estimates of food intake (food name and quantity consumed) can be combined with food composition data to estimate intake of energy and specific nutrients. For this, foods or diets in nutrigenomics intervention studies should be characterized in detail using the standardized food descriptors. The dbNP should also contain food composition data, if specific data have been collected as part of the study, which can eventually be compared to common composition values as described in food databases.
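Combining reported foods and amounts with a composition table is mechanically straightforward. In the sketch below, the per-100-g values are invented for illustration; in practice they would come from EuroFIR-linked food composition databases:

```python
# Hypothetical per-100-g composition values, for illustration only.
COMPOSITION = {
    "apple, raw":    {"kcal": 52,  "protein_g": 0.4,  "vit_c_mg": 4.6},
    "salmon, baked": {"kcal": 206, "protein_g": 22.0, "vit_c_mg": 0.0},
}

def nutrient_intake(diary, table):
    """Combine reported foods and amounts (grams) with a food
    composition table to estimate total nutrient intake."""
    totals = {}
    for food, grams in diary:
        for nutrient, per_100g in table[food].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100.0
    return {k: round(v, 1) for k, v in totals.items()}

# A one-day diary: food name plus quantity consumed in grams
diary = [("apple, raw", 150), ("salmon, baked", 120)]
totals = nutrient_intake(diary, COMPOSITION)
print(totals)  # {'kcal': 325.2, 'protein_g': 27.0, 'vit_c_mg': 6.9}
```

The hard part, as the text notes, is not this arithmetic but mapping free-text food descriptions onto standardized descriptors and harmonized composition values.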
Alternatively, information on the exposure to some nutrients or bioactive compounds can be captured by analysing the food metabolome, i.e. all metabolites in urine or plasma directly derived from the digestion of food. Some components of the food metabolome derive from food constituents characteristic of a given food (e.g. phloretin derived from phlorizin, which is found exclusively in apple). These components can be used as biomarkers of food intake in cohort studies [24, 30] or as markers of compliance in dietary intervention studies. This information on the food metabolome should be part of the dbNP biomarker module and treated in the same way as data on the endogenous metabolome. The dbNP database should include, or be linked to, information on the food sources containing the precursors of the different constituents of the food metabolome to provide information on food intake. It should also contain information on their quality as markers, with appropriate references (Fig. 6).
The dbNP provides study data for evaluation by storing, annotating and pre-processing of multiple layers of data. Evaluation can thus take place at various levels.
Evaluation usually is performed by statistical packages and bioinformatics tools. Many of these are available via the NBX system, especially if available as open source software. For simple analysis and evaluation (mostly level 2), these tools can read data files in specific formats or retrieve them directly from a database via web services. The job of the dbNP is thus twofold: (1) allow selection of the data of interest; and (2) deliver data to the appropriate toolboxes either by web services or as data files.
The routine of producing these data files involves selection of the study parts and studies to be analysed. This involves a metadata query. For example, the dbNP (more specifically the study description module) should respond to questions like “list all studies with PBMC transcriptome data in men after at least 5 weeks of exposure to fish oil”. The study selection query tool will use the same ontology lookup web services as the study description creation tool (see: http://www.ebi.ac.uk/ontology-lookup/). This allows the user to search, for instance, for “human” studies that were described as “Homo sapiens” during entry of the study description. This triggers the selection of a subset of transcriptome data, which after “manual” inspection can be pre-processed to clean data and presented in the right format for further statistical and bioinformatics evaluation. The complexity of the metadata query depends on the accuracy of capturing the study design in ISA-web. Nutrition-specific extensions are currently being added to ISA-Tab using ISA-configurator. ISA-web will be tested and extended to support these extensions if necessary. Results from interactive usage can be read directly from the screen, but in all cases results will be available and downloadable as text files that can be interpreted by tools during the next steps of the procedures, or as pdf files containing figures. Pathway results in graphical format with annotations can also be exported in html for direct usage and further interpretation in a web browser. Results from overrepresentation analysis can be downloaded in standard formats (e.g. MAPPFinder) and used for further analysis in tools like GO-Elite. For the more extensive pipelines, sets of data, text and graphical files can be downloaded in zip archives.
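The metadata query in the example above amounts to filtering study-description records on a set of criteria, some of which (such as “at least 5 weeks”) are ranges rather than exact values. A sketch with invented study records and field names:

```python
# Hypothetical study-description records; real dbNP metadata would
# come from the ISA-Tab-backed study description database.
STUDIES = [
    {"id": "S1", "tissue": "PBMC", "assay": "transcriptomics",
     "sex": "male", "exposure": "fish oil", "weeks": 6},
    {"id": "S2", "tissue": "liver", "assay": "transcriptomics",
     "sex": "male", "exposure": "fish oil", "weeks": 8},
    {"id": "S3", "tissue": "PBMC", "assay": "metabolomics",
     "sex": "female", "exposure": "fish oil", "weeks": 5},
]

def query(studies, **criteria):
    """Select studies whose metadata satisfy every criterion;
    callables express conditions such as 'at least 5 weeks'."""
    def matches(study):
        for field, wanted in criteria.items():
            value = study.get(field)
            ok = wanted(value) if callable(wanted) else value == wanted
            if not ok:
                return False
        return True
    return [s["id"] for s in studies if matches(s)]

# "All studies with PBMC transcriptome data in men after at least
#  5 weeks of exposure to fish oil"
hits = query(STUDIES, tissue="PBMC", assay="transcriptomics",
             sex="male", exposure="fish oil", weeks=lambda w: w >= 5)
print(hits)  # ['S1']
```

In dbNP the terms would additionally be resolved through ontology lookup, so that “human” also matches records entered as “Homo sapiens”.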
The next version of the study description database will support SOAP (‘Simple Object Access Protocol’) web services, which will allow other tools to access the study description database. This is needed for the first type of querying, directed towards study selection, and offers several advantages. First, it will no longer be necessary to propagate study description information from the study database to the different database modules; each module only needs to contain a study identifier. Even if researchers only use a direct graphical user interface to the domain-specific database, their own database will provide a study selection interface that takes the actual study descriptions directly from the study database. If the database modules are developed in a common programming environment such as Grails/Java, this development needs to be done only once rather than separately for each omics field. Thus, study descriptions will be generated only once and can be more extensive than in any domain-specific database. The query interface in any given omics database would furthermore be able to show which other databases contain related results.
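The decoupling described above can be sketched as follows. `StudyDescriptionClient` stands in for a generated SOAP client to the study description service; all class and method names here are hypothetical, not the actual dbNP interfaces.

```python
class StudyDescriptionClient:
    """Stand-in for a SOAP client to the study description database.
    The real service interface is not specified in this sketch."""
    def __init__(self, descriptions):
        self._descriptions = descriptions
    def get_description(self, study_id):
        return self._descriptions[study_id]

class TranscriptomeModule:
    """Hypothetical domain-specific module: it stores results keyed by
    study identifier only, and fetches the full study description on
    demand from the central service instead of keeping a copy."""
    def __init__(self, client):
        self.client = client
        self.results = {}   # study_id -> omics results
    def add_results(self, study_id, data):
        self.results[study_id] = data
    def describe(self, study_id):
        # No propagated copy of the description: always the live version.
        return self.client.get_description(study_id)

client = StudyDescriptionClient({"S1": "fish-oil intervention, men, 6 weeks"})
mod = TranscriptomeModule(client)
mod.add_results("S1", {"PPARA": 1.8})
print(mod.describe("S1"))
# → fish-oil intervention, men, 6 weeks
```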
The SOAP web services to the study description database will also allow queries from other tools like GenePattern and directly from R/Bioconductor. Furthermore, they can be used to develop a more complex stand-alone query tool. Finally, these web services can be used for data selection in more knowledge-driven analyses, for instance queries from PathVisio and Cytoscape. The latter type of analysis thus already combines study description data, experimental data modules and external data to interpret the experimental outcome in the light of what we already know, but it selects data only on the basis of the study description.
The second type of querying selects data not only by study descriptions but also through the data itself, relying on statistical and bioinformatics tools. Typical questions are “which genes in PBMC of men on a high-fat diet are correlated with PPAR-alpha” or “is there a significant correlation between the decrease in CRP and in IL-1 in plasma of women after at least 5 weeks of exposure to fish oil, if the PPAR-alpha pathway is activated”. This type of query needs: (1) selection on metadata; (2) pre-processing of the relevant data (in this case transcriptome and plasma proteins); and (3) preparation of either a data file in the right format or delivery of the relevant data via a web service.
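After metadata selection and pre-processing, the correlation part of such a query reduces to a standard statistic. A minimal sketch in plain Python, with invented per-subject values standing in for the pre-processed plasma measurements:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical pre-processed values: change in CRP and in IL-1 per subject.
crp_change = [-1.2, -0.8, -0.5, -1.5, -0.3]
il1_change = [-0.9, -0.7, -0.4, -1.1, -0.2]

r = pearson(crp_change, il1_change)
print(r)   # close to 1: the two decreases are strongly correlated
```

In practice dbNP would delegate this step to R/Bioconductor procedures, which also supply significance testing; the point here is only the shape of the computation.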
A query tool that can access both the study metadata databases and the clean-data databases can solve a large part of this. The tool should be able to invoke R/Bioconductor procedures for dedicated study-specific statistics. For pathway-related questions, several options are possible. The resulting data can be delivered to a pathway analysis tool, which can be used for the final pathway statistics. Visualizations of the query can be made from the pathway analysis tool itself, for instance by a query plug-in in PathVisio. PathVisio will use the same software library to access the different web services and APIs as the central query tool.
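For the final pathway statistics, one common choice in overrepresentation tools of the MappFinder/GO_Elite family is a one-sided hypergeometric test. The sketch below uses invented counts and is not the dbNP implementation:

```python
from math import comb

def overrepresentation_p(changed_in_pathway, pathway_size,
                         changed_total, genome_size):
    """One-sided hypergeometric test: probability of observing at least
    `changed_in_pathway` changed genes inside a pathway of `pathway_size`
    genes, given `changed_total` changed genes among `genome_size`."""
    p = 0.0
    for k in range(changed_in_pathway, min(pathway_size, changed_total) + 1):
        p += (comb(pathway_size, k)
              * comb(genome_size - pathway_size, changed_total - k)
              / comb(genome_size, changed_total))
    return p

# Hypothetical counts: 8 of 40 PPAR-alpha pathway genes changed,
# among 200 changed genes on a 20,000-gene array.
p = overrepresentation_p(8, 40, 200, 20000)
print(p)   # far below 0.05: the pathway is strongly overrepresented
```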
Finally, we will add yet another data level on top of the clean-data databases, containing pathway profiles and GO analysis results. This allows selection of studies that show changes in the same pathways, and clustering approaches focusing on studies that show comparable results in the pathways affected or in the GO levels where effects occur.
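Such clustering of studies by their pathway profiles could, for instance, start from a simple set-similarity measure; the profiles below are invented for illustration:

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of affected pathways."""
    return len(a & b) / len(a | b)

# Hypothetical pathway-level profiles: the pathways significantly
# affected in each study.
profiles = {
    "S1": {"PPAR signalling", "fatty acid oxidation", "inflammation"},
    "S2": {"PPAR signalling", "fatty acid oxidation"},
    "S3": {"cell cycle", "DNA repair"},
}

# Pairwise similarities; these could feed a standard clustering procedure.
pairs = {(a, b): jaccard(profiles[a], profiles[b])
         for a in profiles for b in profiles if a < b}
print(pairs)
# S1 and S2 are similar (2 of 3 pathways shared); S3 matches neither.
```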
The third type of query uses semantic web-based technology and will probably become more important in the future. In this approach, data could be extracted as triples that define relations between two entities. Each entity would be stored in a concept store and the relation itself in a triple store. Both types of data stores can be combined with large volumes of information collected as part of semantic web initiatives and used for analyses by tools like Cytoscape plug-ins, allowing data integration of concepts from different data sources (literature, different curated databases and co-expression databases) with actual correlations in dbNP.
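At its simplest, a triple store supports pattern matching over (subject, predicate, object) statements. Real implementations use RDF with SPARQL querying, but the idea can be sketched with invented relations:

```python
def match(triples, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    analogous to a variable in a SPARQL triple pattern."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Hypothetical triples: each states a relation between two entities;
# the entities themselves would live in a concept store (here, strings).
triples = [
    ("PPARA", "regulates", "CPT1A"),
    ("PPARA", "co-expressed_with", "ACOX1"),
    ("fish oil", "activates", "PPARA"),
]

print(match(triples, s="PPARA"))       # everything asserted about PPARA
print(match(triples, p="activates"))   # → [('fish oil', 'activates', 'PPARA')]
```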
Relationships in dbNP can either be selected directly by the analytical tool that also accesses concept web-based knowledge triples, or the dbNP content itself can be transformed into knowledge triples, which can be added to the concept store and combined with other triple information. Overlap between triples derived from the two types of sources can in itself be an interesting target for further evaluation. For example, co-expressed genes in dbNP studies that are frequently mentioned together in publications on completely different topics, or whose products are known from protein–protein interaction databases to interact, might be singled out for further study. Because concept triples allow inclusion of synonymous information, such combinations can be very powerful: they allow results to be combined with much more related knowledge than is currently used in pathway and network analyses. This may lead to surprising findings if we allow analyses to include information from only remotely related domains (different diseases, gene expression, etc.) (Fig. 7).
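The synonym handling needed before comparing triples from the two sources can be sketched as follows; the synonym map and triples are illustrative (NR1C1 is the nuclear-receptor alias of PPARA):

```python
# Hypothetical synonym map from a concept store: every entity is
# normalised to a canonical identifier before triples are compared.
CANONICAL = {"PPARA": "PPARA", "PPAR-alpha": "PPARA", "NR1C1": "PPARA",
             "CPT1A": "CPT1A", "CPT1": "CPT1A"}

def normalise(triple):
    """Map subject and object of a triple to canonical identifiers."""
    s, p, o = triple
    return (CANONICAL.get(s, s), p, CANONICAL.get(o, o))

# Invented triples: one derived from dbNP co-expression data, one from
# literature mining, each using a different synonym for the same genes.
dbnp = {("PPAR-alpha", "co-expressed_with", "CPT1")}
literature = {("NR1C1", "co-expressed_with", "CPT1A")}

overlap = {normalise(t) for t in dbnp} & {normalise(t) for t in literature}
print(overlap)   # the same relation, recognised despite different synonyms
```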
Although aspects of the planned work could be undertaken by a local or national approach, provided that the necessary broad skills and funding were available, such approaches would fail to address the major issue of fragmentation and lack of:
A unique feature of the described dbNP is that its infrastructure is being developed embedded in a nutrition research network, combined with expertise in analytics and IT. This guarantees that procedures, protocols and other facilities will be tailored to the specific needs of researchers in food and health, and will ensure acceptance by the cognate research community. The latter is essential, because several previous good initiatives to network or harmonize methodology in nutrition research have failed through poor acceptance and competition between standards. This is the reason for the explicit choice of an integrated, all-encompassing approach with a large network of associated parties, rather than work on a single aspect.
To build this integrated and global dbNP, we have established the Nutrigenomics Organisation. The objectives of the Nutrigenomics Organisation are to:
The nutritional phenotype database is more than a database. It is a project which spans three dimensions of nutrition research: study execution, from study design to evaluation; analysis, from food intake to genetics; and coordination, from a single laboratory to global collaboration.
The dbNP is also an ongoing project: new analytical technologies will emerge and better standard operating procedures will be incorporated. Thus, this paper is also a call for collaboration, inviting the molecular nutrition research community to join this effort supported by the Nutrigenomics Organisation.
Development of the dbNP is an open-source community effort, with central access at http://www.dbnp.org. Although many parts are functional, many others remain to be completed or initiated, including version control, detailed data management, the genetics module and the food intake module. Other modules, such as imaging and flux analysis, have not yet been designed.
Yet in launching this project, NuGO has high expectations: dbNP will grow into a global research and collaboration tool and a publicly available data and knowledge repository, as an essential basis for a molecular nutrition research infrastructure. With an increasing number of nutritional systems biology studies becoming available for full interrogation in dbNP, it will become a valuable resource for new nutrition research.
Work on dbNP is currently funded primarily from the following sources: The European Nutrigenomics Organisation (http://www.nugo.org). The Netherlands Nutrigenomics Consortium (http://www.nutrigenomicsconsortium.nl). The Netherlands Metabolomics Center (http://www.metabolomicscentre.nl/). The EU FP6 Network of Excellence Eurreca (http://www.eurreca.org). It is emphasized that dbNP is an open source project, with many individual contributions, not specifically linked to research projects.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Ben van Ommen, Email: email@example.com.
Jildau Bouwman, Email: firstname.lastname@example.org.
Lars O. Dragsted, Email: ldra@life.ku.dk.
Christian A. Drevon, Email: email@example.com.
Jim Kaput, Email: James.Kaput@fda.hhs.gov.
John C. Mathers, Email: firstname.lastname@example.org.
Jahn Saito, Email: jsaito@bigcat.unimaas.nl.
Augustin Scalbert, Email: scalbert@clermont.inra.fr.
Marijana Radonjic, Email: email@example.com.
Philippe Rocca-Serra, Email: rocca@ebi.ac.uk.
Anthony Travis, Email: firstname.lastname@example.org.
Suzan Wopereis, Email: email@example.com.
Chris T. Evelo, Email: firstname.lastname@example.org.