Adv Nutr. 2012 September; 3(5): 654–665.
Published online 2012 September 6. doi:  10.3945/an.112.002477
PMCID: PMC3648747

Online Tools for Bioinformatics Analyses in Nutrition Sciences


Recent advances in “omics” research have resulted in the creation of large datasets that were generated by consortiums and centers, small datasets that were generated by individual investigators, and bioinformatics tools for mining these datasets. It is important for nutrition laboratories to take full advantage of the analysis tools to interrogate datasets for information relevant to genomics, epigenomics, transcriptomics, proteomics, and metabolomics. This review provides guidance regarding bioinformatics resources that are currently available in the public domain, with the intent to provide a starting point for investigators who want to take advantage of the opportunities provided by the bioinformatics field.


The Human Genome Project released an advanced version of the human genome in 2003 in 2 landmark papers (1, 2). The availability of high-quality, comprehensive sequence data broke ground for a new era of research in various disciplines, including human nutrition and its role in disease prevention. Nutrigenomics emerged alongside other omics fields after the genome revolution and deals with the study of nutrient–gene interactions that could give way to possible dietary interventions in the overall goal of maintenance of optimum health or prevention of disease (3). The societal benefits of dietary interventions within the nutrigenomics framework are evident by looking at success stories in nutrition textbooks. For example, the implementation of newborn screening programs for early detection of phenylketonuria and for biotinidase deficiency, combined with the dietary restriction of phenylalanine and supplementation with biotin, respectively, has resulted in excellent prognoses for afflicted individuals.

Although dietary interventions in individuals with well-defined gene mutations have long been part of routine medical practice, they represent only the tip of the iceberg compared with the potential societal and individual benefits of taking full advantage of another recent milestone in genomics, i.e., the release of the “1000 Genomes” report (4). This report, which identifies ~15 million single nucleotide polymorphisms (SNP), copy number variations, deletions, and other interindividual variations in the human genomes from 3 projects, predicts a 5-fold greater frequency of sequence variations compared with previous estimates. One can assume with a reasonable level of confidence that many of these SNP are linked to disease risk. Dietary intervention offers an effective and cost-efficient approach to prevent disease (5). One of many classic examples in support of this theory relates to studies of SNP in the human methylene tetrahydrofolate reductase (MTHFR) gene; these SNP may predispose individuals to an increased risk of heart attack, renal disease, and birth defects (6, 7).

Based on the definition of genetics and genomics by the WHO (8), nutrigenetics can be defined as studies of nutrition and heredity, whereas nutrigenomics is the study of the mutual interactions among dietary molecules, genes, and gene function. The main difference between genomics and genetics is that genetics scrutinizes the functioning and composition of the single gene, whereas genomics addresses all genes and their interrelationships to identify their combined influence on the growth and development of the organism (8). In this review, nutrigenomics refers to both nutrigenetics and nutrigenomics.

Nutrition researchers, particularly those in the nutrigenomics area, need to make a whole-hearted and concerted effort to integrate bioinformatics expertise in their toolboxes to be considered a valuable partner in the genetics and genomics arena and to take full advantage of the many new opportunities that have emerged since the sequencing of the human genome. This review introduces the reader to publicly available datasets pertinent to nutrigenomics research and to some of the basic, yet valuable, online analysis tools for processing and interrogating such datasets. In particular, nutrigenomics core competencies in genomics, epigenomics, transcriptomics, proteomics, and metabolomics are highlighted (Table 1).

Table 1.
Nutrigenomics core competencies and their relevance


Genomics and associated databases

Access to accurate and complete genome sequences is a fundamental prerequisite for conducting genomics research. DNA sequencing has seen many breakthrough technological developments, from automated capillary-based methods to the recently developed ultrahigh-throughput methods. The field continues to develop rapidly and is now moving toward single-molecule sequencers (9). Sequence databases can be classified into 2 clusters, namely, primary sequence depositories and secondary databases derived from the primary databases (see the genome variations section) (Table 2). The primary sequence depositories store the raw sequence data obtained from various independent sequencing experiments and are generally considered starting points for subsequent research. Depositories such as GenBank (10), European Molecular Biology Laboratory (EMBL) Bank (11), and DNA Data Bank of Japan (12) maintain identical data through a daily and mutual exchange of data. The sequences are grouped by taxonomic divisions (such as bacteria, fungi, invertebrates, and vertebrates) and into various data classes [expressed sequence tags (EST), genome survey sequences, sequence tagged sites, transcriptome shotgun assembly, environmental sequences, synthetic sequences, high-throughput cDNA sequences, high-throughput genomic sequences, and whole genome shotgun sequences] based on the type of sequencing experiment (13). In addition to storing the raw and annotated general nucleotide sequences, each of the 3 centers maintains a sequence read archive and a trace archive, which store raw sequence reads from next-generation sequencers and sequence traces from the conventional capillary sequencers, respectively (14).

Table 2.
Databases in the major omics fields that contribute to nutrigenomics investigations

Despite holding the same sequence data, the primary sequence depositories have their own data submission and search interfaces. GenBank accepts either Web-based submissions (BANKIT) suitable for a limited number of sequences with simple annotations (15), or e-mail–based submissions (SEQUIN) suitable for both small and larger sequences with extended annotations (16). Files that exceed the e-mail attachment limits can be submitted using SequinMacroSend (17). There are other specific methods for submission of large batches of sequences including EST, sequence tagged sites, and genome survey sequences and sequence reads generated from capillary (trace archive) sequencing machines or one of the next-generation sequencers (sequence read archive) (18). EMBL-Bank uses the online submission portal WEBIN for both individual and bulk sequence submissions (19). Users are required to choose from a selection of templates when preparing their submission. The DNA Data Bank of Japan uses the online submission portal SAKURA for the submission of both short and individual sequences and a mass submission system for the submissions of large files and multiple sequences (20). Submission systems of each of these 3 primary sequence depositories offer intuitive interfaces and the advantage of resuming partial submissions.

Secondary databases differ from primary sequence repositories because the sequences in these databases are curated and nonredundant. The National Center for Biotechnology Information (NCBI) offers a large collection of secondary databases in various sections that can be accessed through the Entrez retrieval system (21). The RefSeq database is an important component of the Entrez system and is composed of sequences of genomic regions, transcripts, and proteins, which are identifiable through entry-specific accession numbers (22). RefSeq is the first landing point for any user looking for curated sequences in genomes, transcriptomes, and proteomes. The database Entrez-Gene (23) offers gene-centric information such as genomic location and gene products and their attributes and phenotypes, whereas Entrez-Genome (21) contains chromosomal sequences and maps for all completely sequenced genomes. Similarly, the European Bioinformatics Institute (EBI) maintains a large collection of derived databases and analysis programs. Most importantly, the databases Ensembl (24) and Ensembl-Genomes (25) provide access to gene, transcript, and protein sequences and to whole genomes of vertebrate and nonvertebrate organisms, respectively. Map viewers have been integrated in these databases and provide positional information for various genomic features. Users may benefit from using both Entrez and Ensembl because both databases offer unique perspectives. As an example, the human holocarboxylase synthetase gene (Entrez-Gene ID: 3141, Ensembl ID: ENSG00000159267) has unique transcripts in both Entrez-Gene and Ensembl. Entrez-Gene is unique by providing the corresponding RefSeq alignments and a Blast/primer search for the displayed sequence range. Ensembl-Gene is unique by providing convenient tabular information for all the associated transcripts, and its browser has information for many additional tracks related to variation and regulatory features. Note that some research centers maintain organism-specific databases, e.g., for some plant and animal model organisms (26–29), and specialized databases may focus on features such as micro RNA (30), transfer RNA (31), gene promoters (32), and other regulatory elements (33).
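The two identifiers quoted above for the holocarboxylase synthetase gene can also be used for programmatic lookups. The following is a minimal sketch that only builds request addresses; it assumes the NCBI E-utilities service (ESummary) and the Ensembl REST service (lookup by identifier), neither of which is discussed in this review, and the function names are illustrative.

```python
from urllib.parse import urlencode

# Assumed service addresses (not taken from the review itself).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
ENSEMBL_REST_BASE = "https://rest.ensembl.org"


def entrez_summary_url(database: str, record_id: str) -> str:
    """Build an ESummary request URL for one record in an Entrez database."""
    query = urlencode({"db": database, "id": record_id, "retmode": "json"})
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"


def ensembl_lookup_url(stable_id: str) -> str:
    """Build an Ensembl REST lookup URL for one stable identifier."""
    query = urlencode({"content-type": "application/json"})
    return f"{ENSEMBL_REST_BASE}/lookup/id/{stable_id}?{query}"


# Identifiers for the human holocarboxylase synthetase gene, as given in the text.
entrez_url = entrez_summary_url("gene", "3141")
ensembl_url = ensembl_lookup_url("ENSG00000159267")

print(entrez_url)
print(ensembl_url)
```

The URLs can be pasted into a browser or passed to any HTTP client; comparing the two responses is one way to see the differing transcript sets mentioned above.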

Genome variations

The phenotype of an organism is the result of a complex interaction between genotype and the environment. This interplay between genes and the environment is further complicated by the interindividual genomic variations. The Human Genome Variation Society, which fosters the discovery, characterization, documentation, and dissemination of genome variation information, maintains a categorized list of the variation databases (34). SNP are the most abundant form of genetic variation observed among individuals. The analysis of the roles of SNP in disease risk attracted considerable attention after 2 large-scale initiatives (the SNP Consortium and the Human Genome Project) generated large SNP datasets (35) (Table 2). There have been many reports in the past decade linking nutrition, genetic variation, and disease risk, as illustrated using the following examples. MTHFR catalyzes the conversion of 5,10-methylenetetrahydrofolate to 5-methyltetrahydrofolate, which is a key step in homocysteine metabolism. Two SNP in the MTHFR gene have been characterized at the biochemical level. The C677T variant impairs the stability of the protein, whereas the A1298C variant results in decreased enzyme activity without affecting enzyme stability (36). Individuals who are T677 homozygous have higher cardiovascular disease risk than C677 homozygous individuals, and the risk can be reduced to that of C677 homozygous individuals by folate supplementation (37). Heterozygotes have elevated homocysteine levels, but a causal link with cardiovascular disease risk is uncertain. Holocarboxylase synthetase activates biotin-dependent carboxylases by covalently attaching biotin to the apocarboxylase. The SNP A2096G in the coding region of the holocarboxylase synthetase gene decreases biotin-binding affinity and thus reduces enzyme activity. Supplementation with biotin can restore activity to wild-type levels (38), at least in vitro.

Currently there are >30 million human reference SNP in the SNP Database (dbSNP), which is the largest repository of SNP. dbSNP also contains less frequent types of variations such as multibase deletions or insertions, including those of retroposable elements and microsatellites (39). Distinct phenotypes have only been identified for a minority of the known SNP. Some rare SNP might have escaped detection. The dbSNP can be searched using keywords, or entries can be retrieved through unique accession numbers. Approximately half of the SNP in dbSNP have not been validated. Accordingly, entries must be interpreted with caution (40).

Numerous other databases are viable alternatives to dbSNP in studies of genome variation (Table 2). The Ensembl variation database is one such option. Many of the Ensembl data are imported from the dbSNP and other databases, and the data are linked with consequence types revealing effects on the final protein product (41). In addition, the University of California, Santa Cruz (UCSC) genome browser maintains its own collection of genomic variants (42). The Online Mendelian Inheritance in Man is one of the earliest established resources linking human diseases with allelic variants including SNP (43). Online Mendelian Inheritance in Man is indispensable for medical researchers as a comprehensive source of information for all the known disease-associated genetic variants. Other databases focus on SNP in specific organisms, specific diseases, or functional metadata. Many of these databases also have tools for experimental analysis, such as designing PCR primers. SNPper is an example of an organism (human)–specific SNP database. SNPper is derived from the dbSNP and the UCSC Human Genome Browser (44). The Cancer Genome Anatomy Project–Genetic Annotation Initiative’s genetic variation resource is an example of a disease (cancer)–specific database (45). In most of these databases, SNP can be visualized in the context of genes, transcripts, and regulatory features through the use of integrated genome browsers.

As a result of the decreasing costs of high-throughput sequencing, genome-wide association studies (GWAS) have become a common strategy for identifying the genetic basis of disease susceptibility. GWAS involves surveying SNP in samples from a large number of healthy and afflicted individuals to identify variations that associate with disease phenotypes (46). The Database of Genotypes and Phenotypes at the NCBI is one of the largest GWAS repositories (47). The summary data are freely accessible, whereas individual-level data are restricted to approved users. GWAS Central is an alternative portal for accessing summary-level information and has the added advantage of providing an integrated viewer to visualize the location of markers at the chromosomal and gene level (48). The Human Genome Epidemiology navigator collects and classifies genetic variant information from the literature and is an example of a database providing information about meta-analyses and GWAS (49). Human Genome Epidemiology navigator can be searched using gene names (Genopedia) or disease names (Phenopedia).
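The case versus control comparison that underlies a GWAS can be illustrated for a single SNP. The sketch below is not taken from any of the repositories named above; it assumes a simple allelic test (odds ratio plus Pearson chi-square on a 2x2 table of allele counts), and the counts are invented for illustration.

```python
def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Return (odds_ratio, chi_square) for a 2x2 table of allele counts.

    Rows are cases and controls; columns are alternate and reference alleles.
    The chi-square statistic has 1 degree of freedom and uses no continuity
    correction.
    """
    total = case_alt + case_ref + control_alt + control_ref
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    numerator = total * (case_alt * control_ref - case_ref * control_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (control_alt + control_ref)
        * (case_alt + control_alt)
        * (case_ref + control_ref)
    )
    return odds_ratio, numerator / denominator


# Hypothetical allele counts for one SNP (not real data).
odds_ratio, chi_square = allelic_association(
    case_alt=60, case_ref=40, control_alt=40, control_ref=60
)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi_square:.2f}")
```

In a real study this test would be repeated for every genotyped SNP, which is why stringent multiple-testing thresholds are applied to the resulting statistics.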

The value of SNP as markers for disease risk is undisputed. However, despite recent advances in genotyping protocols, it is still not feasible for single investigators to interrogate a substantial fraction of the SNP in the human genome. In this regard, haplotype analysis has gained traction. Haplotypes are the specific patterns of SNP over short stretches of genome that tend to be inherited together. Genotyping using the haplotype information for certain “tagged SNP” is more economical and has little information loss (50). The Ensembl variation database includes a list of “tagged variants” that have been identified as having high linkage disequilibrium with other proximal variants. These are useful for haplotype analysis. Haplotypes are catalogued in the HapMap Project (51) and in the 1000 Genomes Project (4). The variant and haplotype information from these databases can be combined in individual disease association studies to obtain genotype information beyond what is genotyped directly, which helps the researcher to precisely locate disease-associated regions.
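Linkage disequilibrium between two SNP is commonly summarized as the squared correlation r² of their alleles; a tagged SNP is informative about a neighbor when r² is close to 1. The sketch below assumes phased haplotype counts for two biallelic SNP (alleles A/a and B/b); the counts and the function name are illustrative and do not come from HapMap or the 1000 Genomes Project.

```python
def linkage_disequilibrium_r2(n_AB, n_Ab, n_aB, n_ab):
    """Compute r^2 between two biallelic SNP from the four haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / total    # frequency of allele B at the second SNP
    d = p_AB - p_A * p_B           # disequilibrium coefficient D
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical haplotype counts (not real data).
r2_partial = linkage_disequilibrium_r2(40, 10, 10, 40)
r2_complete = linkage_disequilibrium_r2(50, 0, 0, 50)
print(f"partial LD r2 = {r2_partial:.2f}, complete LD r2 = {r2_complete:.2f}")
```

With complete linkage disequilibrium, genotyping one of the two SNP is enough to infer the other, which is the economy that tagged SNP provide.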

Epigenomics

Apart from the genome, the organism’s phenotype is influenced by its epigenome, which is the full complement of epigenetic marks including DNA methylation and histone modification. Micro RNA and other noncoding RNA are also being classified as epigenetic factors due to the accumulating evidence of their role in mediating epigenetic mechanisms like chromatin remodeling and transcriptional and posttranscriptional regulation (52) (see further discussion in the Transcriptomics section). Epigenetic factors can control gene expression (53), DNA replication (54), DNA repair (55), and DNA recombination (56). Epigenetic factors exhibit a dynamic profile depending on the environment status and developmental status of the organism. Micronutrients and bioactive food compounds play essential roles in creating epigenetic marks [see Zempleni et al. (57) for a review]. Typically, epigenetic marks are mapped using chromatin immunoprecipitation (ChIP) assays, in combination with DNA microarrays (ChIP-chip) or high-throughput sequencing (ChIP-seq) (58). MethDB is one of the foremost public databases dedicated to storing DNA methylation data (Table 2). The database provides graphical and tabular representation of methylation profiles, including experimental details and sample phenotype characteristics; the database can be searched by species, sex, tissue, and gene (59). Because alterations in DNA methylation are implicated in tumorigenesis (60), cancer-specific methylation databases such as PubMeth have been developed. PubMeth was created by literature mining and manual curation; the database can be searched by gene and by type of cancer (61).

Researchers with an interest in histones may use the Human Histone Modification Database, which currently holds data for 43 types of histone modifications in humans; these were produced by ChIP-chip, ChIP-seq, and qChIP experiments. The database is searchable by type of histone modification, gene, chromosomal location, functional category, and type of cancer; individual datasets can be viewed in the integrated genome browser (62). Additional databases provide extensive information on histones (63), histone-modifying proteins (64), other chromatin-associated proteins (65), and chromatin-remodeling proteins (66).

The Encyclopedia of DNA Elements Project also generated valuable information for epigenetics research and offers a catalogue of functional DNA elements including genes, transcripts, and cis-regulatory elements such as promoters, enhancers, insulators, silencers, transcription factor binding sites, DNA methylation sites, and histone modification sites (67). Encyclopedia of DNA Elements data can be easily viewed using publicly available genome browsers such as the UCSC genome browser (42), NCBI workbench (68), GBrowse (69), and the Ensembl browser (70).

The NIH Roadmap Epigenomics Mapping Consortium was launched in 2008 to provide a publicly accessible resource of reference epigenomic maps in stem cells and primary ex vivo tissues from different individuals, mapping DNA methylation, histone modifications, and related chromatin features, with the objective to aid basic and disease-oriented research (71). The data generated by the consortium can be browsed from the project Web site using a matrix, in which rows correspond to tissue/cell types and columns correspond to epigenetic variables. The data can also be browsed using the visual data browser for specific stem cells and fetal and adult tissues. The consortium’s data can be visualized with the dedicated Human Epigenome Browser maintained by Washington University (72). The data tracks are uniquely represented as heat maps with color gradients denoting signal strength. The data can be selected based on cell type, assay type, epigenetic mark, phenotype, and data source. The browser also facilitates standard statistical analysis on the tracks including pairwise comparison, hypothesis testing, and correlation. The data can also be accessed using the human epigenomic atlas, the NCBI epigenomics hub, and the UCSC browser mirror for epigenomics data (Table 2). Each of these resources has unique features with respect to data representation, browsing method, data download formats, and data upload capability; each resource offers additional tools for viewing and analyzing the data (73).
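As an illustration of the kind of track correlation mentioned above, the following sketch computes a Pearson correlation coefficient between two signal tracks that have been binned over the same genomic windows. It is not the browser's own implementation; the signal values and the function name are hypothetical.

```python
import math


def pearson_correlation(track_a, track_b):
    """Pearson correlation between two equally binned signal tracks."""
    if len(track_a) != len(track_b):
        raise ValueError("tracks must cover the same number of bins")
    mean_a = sum(track_a) / len(track_a)
    mean_b = sum(track_b) / len(track_b)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(track_a, track_b))
    spread_a = sum((a - mean_a) ** 2 for a in track_a)
    spread_b = sum((b - mean_b) ** 2 for b in track_b)
    return covariance / math.sqrt(spread_a * spread_b)


# Hypothetical signal strengths in five consecutive genomic bins.
mark_in_tissue_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
mark_in_tissue_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
mark_in_tissue_3 = [5.0, 4.0, 3.0, 2.0, 1.0]

similar = pearson_correlation(mark_in_tissue_1, mark_in_tissue_2)
opposite = pearson_correlation(mark_in_tissue_1, mark_in_tissue_3)
print(f"similar tracks r = {similar:.2f}, opposite tracks r = {opposite:.2f}")
```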

Transcriptomics

Historically, the quantification of mRNA has been at the forefront of transcriptomics. More recently small RNA such as micro RNA and noncoding RNA have attracted considerable attention (74). Microarrays and next-generation sequencers are the primary analytical technologies in transcriptomics research; users can select from various platforms. The microarray technology is mature and well established. However, there are certain difficulties with data analysis and the reproducibility of the results, especially in the context of nutritional research due to the complex nature of relationships between nutrients and the target genes. Transcriptional profiling using microarrays has been used to identify cellular targets for many macronutrients and micronutrients and also to characterize gene expression differences under different nutritional conditions (75).

In the recent past, microarrays played a major role in expression profiling, and they continue to be used as a lower cost approach to gene expression analysis. Their use is limited because only known genes can be studied. Tiling arrays use contiguous stretches of genomic regions that cover both known and unknown genes; however, their use depends on the availability of a reference DNA sequence. Other contemporary sequencing methodologies, EST analysis, and serial analysis of gene expression do not have this limitation, but limited sensitivity of transcript identification and high costs have proven prohibitive for many laboratories. Currently, the transcriptomics investigations increasingly use high-throughput RNA-seq technology. In addition to providing quantitative data and a greater dynamic range of detection, RNA-seq has the ability to detect sequence variants and splicing events without bacterial cloning (76, 77).

The Gene Expression Omnibus (GEO) at NCBI is a central depository hosting both sequence-based and microarray-based gene expression data (78) (Table 2). The data can be quickly located either by searching the GEO DataSets database or by searching for specific gene expression profiles in the GEO Profiles database. This database offers a number of tools for statistical analysis of treatment-responsive genes, for cluster analyses, and for mapping the results using NCBI-BioSystems records in pathway analyses. The GEO2R tool is useful for identifying genes that are differentially expressed between ≥2 groups. The ArrayExpress archive (79) at EBI is another large public repository for microarray data from all platforms. The database can be queried for experiment, organism, array type, and author. The Gene Expression Atlas (80) is associated with the ArrayExpress archive and contains a database of summary statistics based on meta-analysis from a curated subset of expression data. The Atlas is useful for retrieving gene expression patterns in various organisms in different tissues and under different environmental, disease, or developmental states. With the recent discovery of roles for noncoding RNA in biological systems, studies are now emerging that focus on profiling the small RNA transcriptome (81); this has led to the establishment of dedicated databases (82–84).
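The idea of a two-group differential expression comparison can be sketched for a single gene as follows. This is a simplified stand-in and not the procedure that GEO2R itself applies: it assumes a log2 fold change on mean expression and a Welch t-statistic, and the expression values are invented.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """Log2 ratio of mean expression in treated versus control samples."""
    return math.log2(mean(treated) / mean(control))


def welch_t_statistic(treated, control):
    """Welch t-statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(treated) / len(treated) + variance(control) / len(control)
    )
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical expression values for one gene, three replicates per group.
treated = [7.0, 8.0, 9.0]
control = [3.0, 4.0, 5.0]

fold_change = log2_fold_change(treated, control)
t_statistic = welch_t_statistic(treated, control)
print(f"log2 fold change = {fold_change:.2f}, Welch t = {t_statistic:.2f}")
```

Applied gene by gene across an array, statistics of this kind need a multiple-testing correction before any gene is called differentially expressed.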

Several steps in the analysis and interpretation of microarray data, including data normalization and supervised and nonsupervised methods of differential gene expression analysis, can be performed with well-established open source software such as Bioconductor/R and the TM4 software suite (85) and with other free tools (86). Next-generation platforms for RNA sequencing generate terabytes of data. These platforms use specific methods for quality assessment, alignment, assembly, and further processing (87), some of which are specific for noncoding RNA (88). Bioconductor/R can be used to perform data analysis (85), but alternative software suites and stand-alone programs are also available (89).
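One simple form of normalization for sequence-based expression data is scaling raw read counts by library size. The sketch below assumes counts-per-million scaling, which is only one of several approaches and is not prescribed by the tools cited above; the gene names and counts are hypothetical.

```python
def counts_per_million(raw_counts):
    """Scale raw read counts so that each library sums to one million."""
    library_size = sum(raw_counts.values())
    return {gene: count / library_size * 1_000_000 for gene, count in raw_counts.items()}


# Hypothetical read counts for one sample (not real data).
sample_counts = {"gene_a": 500, "gene_b": 1500, "gene_c": 8000}

normalized = counts_per_million(sample_counts)
for gene, value in normalized.items():
    print(f"{gene}: {value:.0f} CPM")
```

Scaling of this kind makes samples with different sequencing depths comparable, but it does not correct for transcript length or composition effects.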

Proteomics

Proteomics focuses on the identification and the quantification of all cellular and extracellular proteins. Initially, 2-dimensional gel electrophoresis assays were frequently used for analysis of protein profiles and resulted in the establishment of many databases for these data (90). Later, protein microarrays based on the principle of immunoassays in which affinity reagents (e.g., antibodies) are spotted onto the arrays at high density were used to identify and quantify proteins (91). These technologies have important limitations; for example, 2-dimensional electrophoresis is less sensitive and less reproducible than protein arrays. Two-dimensional electrophoresis is not useful in comparison of profiles from different species. Protein arrays are limited by the requirement for a large number of high-quality antibodies. In this context, MS methods, especially matrix-assisted laser desorption ionization time of flight MS and electrospray ionization MS, have emerged as the most useful proteomics methods (92).

Database resources in proteomics are not as comprehensive as those in genomics and transcriptomics, and proteomic databases are heterogeneous with respect to content, primarily because of the high degree of complexity in the methods used for proteomics research. The publicly available proteomics databases complement each other to serve the varied needs in proteomics research (93) (Table 2). The PRIDE database at EBI is a prominent centralized repository for MS proteomics data that holds spectral data and data for peptide and protein identification. Public data as well as collaborative data can be accessed using a simple search based on protein accession numbers or using the advanced search, which uses peptide sequence or accession numbers from other databases. The results can be viewed online or downloaded as XML (extensible markup language) files. Specific experiment sets can be compared by creating visual Venn diagrams. The PRIDE-Inspector tool can be installed locally and used to interact with the database and view the results locally (94). The PeptideAtlas database at the Institute for Systems Biology is highly useful in targeted proteomics work because it provides readily accessible peptide information of high confidence extracted from existing MS spectral data. PeptideAtlas has an extensive architecture and compiles its data into several builds; in these, the raw data are processed through a series of validation, peptide identification, and genome mapping steps before making the entry accessible for users. The database can be searched against various builds using either the peptide/protein name or the peptide sequence; other useful features include searching using protein lists and searching for specific pathways (95). The Global Proteome Machine database (GPMDB) is the largest curated public proteome repository. GPMDB contains MS/MS spectra along with protein identifications (96). The database provides the X! Hunter and X! Tandem search engines, which can be used to compare user input spectra with the consensus spectra. These search engines can also identify proteins for the selected sample sources including human proteins. Users may deposit spectra in the GPMDB. Identified proteins can be viewed in gene, protein sequence, or ontology representations.
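The comparison of experiment sets that a Venn diagram displays reduces to set operations on the identified proteins. The sketch below is a generic illustration, not part of any of the databases above; the accession strings are placeholders.

```python
def venn_regions(first, second):
    """Split two sets of protein identifications into the three Venn regions."""
    return {
        "only_first": first - second,
        "shared": first & second,
        "only_second": second - first,
    }


# Placeholder accessions standing in for proteins identified in two experiments.
experiment_a = {"PROT_001", "PROT_002", "PROT_003", "PROT_004"}
experiment_b = {"PROT_003", "PROT_004", "PROT_005"}

regions = venn_regions(experiment_a, experiment_b)
for region, members in regions.items():
    print(region, sorted(members))
```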

The peptide sequences from the mass spectral analysis are usually mapped to protein databases for protein identification. The most comprehensive database in this regard is the UniProtKB (97). UniProtKB has 2 sections: TrEMBL, which contains automatically translated and annotated EMBL sequences, and Swiss-Prot, which contains nonredundant sequences that are manually annotated and reviewed. The databases can be searched by accession numbers, keywords, or protein sequence. The result pages have detailed information on function of the protein with links to specialized databases. Three-dimensional structural information on proteins can be accessed from Research Collaboratory for Structural Bioinformatics PDB database, which is a repository for experimentally determined 3-dimensional structures of proteins (98).
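The two UniProtKB sections can be told apart in downloaded sequence files. The sketch below assumes the UniProtKB FASTA header convention, in which the header starts with `sp` for Swiss-Prot entries and `tr` for TrEMBL entries, followed by the accession and entry name separated by vertical bars. The two header lines are made-up placeholders, not real records.

```python
SECTION_BY_PREFIX = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def parse_uniprot_header(header):
    """Extract section, accession, and entry name from a UniProtKB FASTA header."""
    prefix, accession, remainder = header.lstrip(">").split("|", 2)
    entry_name = remainder.split(" ", 1)[0]
    return {
        "section": SECTION_BY_PREFIX.get(prefix, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
    }


# Placeholder headers in the assumed format (accessions are not real).
reviewed = parse_uniprot_header(">sp|X00001|EXMPL_HUMAN Example protein OS=Homo sapiens")
unreviewed = parse_uniprot_header(">tr|X00002|X00002_HUMAN Example protein OS=Homo sapiens")

print(reviewed)
print(unreviewed)
```

Filtering search results to the manually reviewed section is a common way to reduce redundancy when mapping peptides to proteins.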

Finally, information on protein–protein interactions is necessary to develop pathways and molecular interaction networks. The Search Tool for the Retrieval of Interacting Genes/Proteins database at EBI contains known and predicted protein–protein interaction data from various organisms (99). Database entries are based on high-throughput experiments, literature mining, and sequence homology. The interactions are provided with confidence scores and can be viewed as a network or in other formats such as coexpressing proteins, neighborhood genes, fusion partners, pathway database links, links to specific experimental details, or PubMed citations. A large number of proteomics programs can be found at the ExPASy bioinformatics resource portal (100).
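Confidence scores are typically used to prune an interaction list before a network is drawn. The sketch below assumes scores on a 0 to 1 scale and a cutoff of 0.7; both the cutoff and the protein names are illustrative and are not taken from the database.

```python
from collections import defaultdict


def build_network(edges, min_score=0.7):
    """Keep interactions at or above min_score and return an adjacency map."""
    network = defaultdict(set)
    for first, second, score in edges:
        if score >= min_score:
            network[first].add(second)
            network[second].add(first)
    return dict(network)


# Hypothetical interactions: (protein, protein, confidence score).
interactions = [
    ("PROT_A", "PROT_B", 0.95),
    ("PROT_A", "PROT_C", 0.40),
    ("PROT_B", "PROT_D", 0.72),
    ("PROT_C", "PROT_D", 0.15),
]

network = build_network(interactions)
for protein, partners in sorted(network.items()):
    print(protein, sorted(partners))
```

Raising or lowering the cutoff trades network coverage against the proportion of predicted interactions that are likely to be spurious.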

Metabolomics

Metabolomics is the study of all small molecule metabolites in living systems (i.e., cells and tissues) and their biological fluids (e.g., blood and urine) and deals with their identification and quantification (101). The metabolome is the most dynamic component of living systems with respect to changes in composition and relative proportions over time and in response to various biotic and abiotic factors. The study of the metabolome has a wide range of applications including studies of pharmacogenetics (102) and nutritional intervention (103). The metabolome is complex and diverse; the constantly changing metabolite flux poses unique challenges. Investigators may choose to focus on single metabolites (targeted analysis), groups of metabolites belonging to a specific class or pathway (profiling), or the unbiased identification and quantification of all metabolites (fingerprinting and footprinting) (104). For general information about metabolomics, relevant technologies, databases and data analysis tools, the user is referred to the metabolomics links at Scripps Institute (105).

The 2 most widely used analytical techniques in metabolomics are MS and NMR spectroscopy. MS techniques are indispensable for identification and characterization of unknown metabolites. Coupled with various chromatographic/electrophoretic separation methods, they offer excellent sensitivity. However, MS approaches are limited to ionizable metabolites (104). Although NMR has lower sensitivity and a smaller dynamic range than MS for the identification of individual metabolites, it has advantages: NMR requires little or no sample preparation and separation, and it yields quantitative data, which are highly suitable for fingerprinting (106). Although less sensitive than MS and NMR, Fourier transform infrared spectroscopy is increasingly used as a rapid and nondestructive technique in fingerprinting.

Based on the type of primary information, metabolomic databases can be grouped into those that harbor the following: a) information about the physicochemical properties and biological annotations of metabolites; b) experimental raw data, in the form of spectra coming from various analytical platforms (mostly MS and NMR); and c) information regarding the pathways that connect metabolites (107). Some databases are hybrids of these 3 forms (Table 2).

Databases with information on metabolites

The databases in the first category are dictionaries of small molecules, providing structure and biological activity information for endogenous metabolites and xenobiotics. In this category, PubChem is probably the most comprehensive small molecule database, offering physicochemical information, bioassay information, and a literature mining tool (108). ChEMBL (109) and ChEBI (110) at EMBL-EBI focus on pharmacologically active molecules and chemicals of biological interest, respectively. KEGG Compound in Japan is also a useful source of information (111). The Royal Society of Chemistry, London, maintains ChemSpider (112), which contains information on chemicals including related metabolites; ChemSpider incorporates data from >150 sources and provides binding predictions for target/receptor proteins based on LASSO scores in an uncluttered format that is useful for molecular interaction studies. In addition, databases are available for metabolites and other small molecules from distinct compound classes such as carbohydrates (113), lipids (114), drugs (115), environmental chemicals (116), and metabolites in specific organs (117). Many of these databases fail to provide disease-relevant information such as effective concentrations, sources, and methods used for experimental identification and quantification. To serve this purpose, the Human Metabolome Database (HMDB) was developed as a part of the Human Metabolome Project. HMDB aims to catalogue all metabolites present in body fluids in significant concentrations and to validate the information for known metabolites (118). HMDB is the largest central database dedicated to the study of the human metabolome. An important feature of the HMDB is the incorporation of experimental data from specific analytical platforms, thus allowing an easy search and comparison of metabolites detected using various experimental techniques.

Databases holding raw data on metabolite detection

Databases in this category store the experimental data in the form of spectra or chromatograms from different analytical platforms and are analogous to the high-throughput genomics databases. They provide the reference data useful for the identification and quantification of unknown compounds as well as metabolic profiles for diagnostic purposes. Major databases providing reference mass spectral data include METLIN (119), MassBank (120), NIST Chemistry WebBook (121), and KNApSAcK (122). These databases can be searched by accession numbers, compound names, chemical formula, substructure, mass, and mass/charge ratios. METLIN has annotations for >42,000 metabolites and tandem mass spectral data for many of them. MassBank is a distributed metabolite database composed of mass spectra originating in several collaborating institutions. NIST contains electron impact MS and MS/MS spectral libraries, gas chromatography data libraries, and a Web service for converting between gas chromatography/MS data formats from various platforms. KNApSAcK contains species-specific data to facilitate comparative studies of mass spectra in relation to organism and taxonomic hierarchy. NMR is an indispensable tool for structural identification of completely novel compounds (123). NMR reference spectral data are essential components of NMR-based metabolomics. The Biological Magnetic Resonance Data Bank (124) is a freely accessible, curated database of original reference NMR data sets. Other notable NMR spectral databases include NMRshiftDB2 (125) and ChemSpider (112). PRIMe (126) is a Web-based service that harbors NMR and MS reference spectra and several tools for an integrated analysis of metabolomics and transcriptomics data. The Madison Metabolomics Consortium Database (127) presents the user with additional information on the metabolite in ~50 separate data fields.
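The idea behind a search by mass/charge ratio can be illustrated with a short Python sketch. It assumes a tiny, hypothetical in-memory reference library (three compounds with their elemental compositions), considers only the singly protonated ion [M+H]+, and accepts matches within a user-defined tolerance in parts per million. It is not the search algorithm of any of the databases named above, which consider many more adducts and compounds.

```python
# Monoisotopic masses (u) of the elements used in the toy library below.
ELEMENT_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}
PROTON_MASS = 1.00727647

# Hypothetical mini reference library: compound name -> elemental composition.
REFERENCE = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "phenylalanine": {"C": 9, "H": 11, "N": 1, "O": 2},
    "biotin": {"C": 10, "H": 16, "N": 2, "O": 3, "S": 1},
}


def monoisotopic_mass(composition):
    """Sum the monoisotopic element masses for an elemental composition."""
    return sum(ELEMENT_MASS[element] * count for element, count in composition.items())


def mz_protonated(composition):
    """Theoretical m/z of the singly protonated ion [M+H]+."""
    return monoisotopic_mass(composition) + PROTON_MASS


def search_by_mz(observed_mz, tolerance_ppm=10.0):
    """Return (name, theoretical m/z, error in ppm) for library entries within tolerance."""
    hits = []
    for name, composition in REFERENCE.items():
        theoretical = mz_protonated(composition)
        error_ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, round(theoretical, 4), round(error_ppm, 1)))
    # Best match (smallest absolute error) first.
    return sorted(hits, key=lambda hit: abs(hit[2]))


if __name__ == "__main__":
    print(search_by_mz(166.0863))
```

A tolerance expressed in ppm rather than in absolute mass units scales with m/z, which is how mass accuracy is commonly specified for high-resolution instruments.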

Databases storing metabolic pathway information

Databases in this category store pathway information. Such data are useful to predict and compare metabolic pathways in newly sequenced organisms and also aid in metabolic engineering in the context of disease prevention and drug/dietary intervention studies. Kyoto Encyclopedia of Genes and Genomes (KEGG) (128) is among the oldest and best established pathway databases. KEGG contains manually compiled pathway maps with metabolite and reaction information from published literature. KEGG offers sequence similarity and chemical structure similarity search features; displays may be color coded. Reactome (129) is another curated and peer-reviewed pathway database with cross-references to other databases containing information about nucleotides, proteins, metabolites, and molecule interactions. Reactome has an emphasis on chemical reactions, and the networks of molecules participating in the reactions are grouped into pathways. In addition to the human pathway data, Reactome provides inferred pathway information for 20 other model organisms/species and provides tools for comparative analysis of pathways. BioCyc (130) is a large repository of ~1690 organism-specific metabolic pathway databases. BioCyc is organized into 3 tiers based on the level of data curation. The top-tier databases are intensively curated; these include MetaCyc, which is a large collection of metabolic pathways from multiple organisms; HumanCyc for human pathways; EcoCyc for Escherichia coli pathways; AraCyc for Arabidopsis pathways; and YeastCyc for yeast pathways. The other 2 tiers provide information for computationally derived data with moderate (tier 2) or no curation (tier 3). The BioCyc Web site also provides several tools for metabolic pathway analysis including network analysis, comparative pathway analysis, and data visualization. The Small Molecule Pathway Database has >350 small molecule pathways, of which >280 are unique to this database (131). The pathways are depicted in graphical form in their cellular location, and the concentrations of metabolites in pathways are shown. The Small Molecule Pathway Database can be searched by text or by SwissProt, GenBank, or Affymetrix/Agilent microarray identifiers.
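Comparative pathway analysis can be reduced, in its simplest form, to comparing the sets of reactions annotated to the same pathway in two organisms. The following Python sketch illustrates that idea with the Jaccard similarity. The pathway names, reaction identifiers, and organisms are hypothetical placeholders, and the code does not reproduce the comparison tools offered by Reactome or BioCyc.

```python
def jaccard(first, second):
    """Jaccard similarity of two collections (defined as 1.0 when both are empty)."""
    first, second = set(first), set(second)
    union = first | second
    return len(first & second) / len(union) if union else 1.0


def compare_pathways(org1, org2):
    """Compare reaction sets, pathway by pathway, for two organisms."""
    report = {}
    for pathway in sorted(set(org1) | set(org2)):
        r1 = set(org1.get(pathway, ()))
        r2 = set(org2.get(pathway, ()))
        report[pathway] = {
            "shared": sorted(r1 & r2),
            "only_first": sorted(r1 - r2),
            "only_second": sorted(r2 - r1),
            "jaccard": jaccard(r1, r2),
        }
    return report


# Hypothetical annotations: pathway name -> reaction identifiers.
organism_a = {"pathway_1": ["R1", "R2", "R3", "R4"], "pathway_2": ["R5", "R6"]}
organism_b = {"pathway_1": ["R1", "R2", "R4"], "pathway_3": ["R7"]}

if __name__ == "__main__":
    for pathway, result in compare_pathways(organism_a, organism_b).items():
        print(pathway, result)
```

Reactions listed under "only_first" or "only_second" point to steps that may be missing, unannotated, or replaced by an alternative route in one of the organisms.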

Systems biology

Systems biology studies the organism/system as a whole and is based on the premise that the information content of the interacting parts of the whole system studied together is more than the sum of its parts, revealing what are called the emergent properties of the system (132). Traditionally, biological research has been conducted using a reductionist approach in which the individual components are investigated in detail to identify cause-and-effect relationships. However, biological systems function as interacting components rather than as isolated components (133). Although systems biology methods have been appreciated since the 1950s (134), their current popularity and widespread use are the result of the recent developments in “omics” technologies that create the large datasets necessary for a systems approach. Systems biology integrates the various “omics” datasets from genes to metabolites to model interaction networks and to study network function and evolution under normal and perturbed conditions (135). Systems biology can identify disease mechanisms that were previously not apparent, especially those of complex, chronic diseases. Nutrigenomics, more than other fields, lends itself to systems biology approaches due to the multiple input variables and the large number of diet–molecule interactions; most of these interactions cause only subtle effects that are not directly apparent if investigated in isolation (3). For example, modeling of the folate-mediated 1-carbon metabolism pathway and simulation of the impact of genetic and nutritional variation gave novel insights related to the pathway (136). Folate status is inversely associated with serum homocysteine levels, but no associations have been established with DNA methylation marks. Decreased MTHFR activity due to the C677T polymorphism may lower levels of S-adenosylmethionine, 5-methyltetrahydrofolate, and DNA methylation while increasing the levels of S-adenosylhomocysteine, homocysteine, and the rate of purine synthesis. Folate deficiency may increase the effect size in C677T homozygotes.
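The network view that underlies such models can be sketched in a few lines of Python. The example below encodes a deliberately simplified, qualitative version of the folate and methionine cycles as a directed graph (substrate, product, enzyme) and uses a breadth-first search to list the metabolites downstream of a perturbed enzyme, with their distance in reaction steps. It is an illustration only and not the kinetic model of reference 136; the edge list is a textbook-level simplification, and the function names are hypothetical.

```python
from collections import deque

# Simplified reaction edges: (substrate, product, enzyme).
EDGES = [
    ("5,10-methylene-THF", "5-methyl-THF", "MTHFR"),
    ("5-methyl-THF", "THF", "MTR"),
    ("5-methyl-THF", "methionine", "MTR"),  # methyl group donated to homocysteine
    ("homocysteine", "methionine", "MTR"),
    ("methionine", "SAM", "MAT"),
    ("SAM", "SAH", "methyltransferases"),
    ("SAH", "homocysteine", "AHCY"),
    ("THF", "5,10-methylene-THF", "SHMT"),
]


def build_graph(edges):
    """Map each substrate to the list of its direct products."""
    graph = {}
    for substrate, product, _enzyme in edges:
        graph.setdefault(substrate, []).append(product)
    return graph


def downstream_of_enzyme(enzyme, edges=EDGES):
    """Breadth-first distances (in reaction steps) from the products of an enzyme."""
    graph = build_graph(edges)
    starts = [product for _s, product, e in edges if e == enzyme]
    distances = {node: 0 for node in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


if __name__ == "__main__":
    for metabolite, steps in downstream_of_enzyme("MTHFR").items():
        print(steps, metabolite)
```

A traversal of this kind only shows which metabolites can be affected and how many steps away they are; predicting the direction and size of the change requires a quantitative model with rate equations.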

Some online resources are available for studies that take a systems biology approach. The basic requirement is that the entire genome of the model organism has been sequenced. A systems biology approach usually involves studying the biochemical network, by perturbation of the genetic or the environmental components of the system, in an iterative process of hypothesis generation and testing of the hypothesis to generate a new hypothesis (137). Therefore, the availability of databases that store molecular interactions and pathway information is essential. Because the amount of data that needs to be retrieved is large, platforms such as BioMart have been developed to interact with and retrieve data from the various omics databases (138). BioMart was designed to work with any existing or newly developed databases. This benefits users, who can rely on a single unified data access interface without having to master the advanced Web query interfaces or application programming interfaces that are specific to each data source. The power of BioMart comes from integrated querying of different Marts, whose source data may be stored at geographically distant locations, and from its easy integration with other external software. Software like Galaxy provides an integrated data analysis interface. Galaxy (139) offers a seamless approach to data analysis, including basic sequence searches and genome-wide analysis of next-generation sequencing data. Galaxy also offers tools for data storage and management. Thus, Galaxy eliminates the bottleneck at data integration and large-scale data management that can be a limiting factor in systems biology investigations. Gaggle is another tool that can link the capabilities of different software tools operating on varied data into an integrated analysis framework (140). Gaggle is a simple and extensible system; any software can be adapted into its framework with a minimum amount of coding for suitable modifications in the input and output data formats. Molecular interaction networks can be modeled, visualized, and analyzed using the free Cytoscape package (141). Cytoscape has plug-ins for a range of uses in systems biology, including advanced analysis and modeling of interaction data, with powerful visual mappings with functional annotations. Additional resources are constantly being developed in this relatively new field of research.
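To make the idea of a unified query interface more concrete, the following Python sketch assembles a BioMart-style XML query consisting of a dataset, filters, and attributes. The dataset, filter, and attribute names (hsapiens_gene_ensembl, hgnc_symbol, ensembl_gene_id, chromosome_name) and the Ensembl martservice address are assumptions based on the public Ensembl BioMart and may differ between Marts or releases; the function names are hypothetical. No request is sent by this code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed service address of the Ensembl BioMart; other Marts use other hosts.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_biomart_query(dataset, filters, attributes):
    """Return a BioMart XML query string for one dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict the rows returned; attributes select the columns.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def build_request_url(xml_query):
    """Encode the XML query as the 'query' parameter of a GET request."""
    return MARTSERVICE_URL + "?" + urlencode({"query": xml_query})


if __name__ == "__main__":
    xml_query = build_biomart_query(
        "hsapiens_gene_ensembl",
        {"hgnc_symbol": "MTHFR"},
        ["ensembl_gene_id", "chromosome_name"],
    )
    print(build_request_url(xml_query))
```

Because the same query structure applies to every Mart, switching to another data source would, under these assumptions, mainly require changing the dataset, filter, and attribute names.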


Nutrigenomics investigators seek to understand the organization and function of cellular components and characterize the various molecular phenotypes associated with health and disease. These studies are facilitated by omics technologies, which have created unprecedented opportunities. With the cost of sequencing decreasing steadily, the costs of data storage and analysis may prove to be the true bottlenecks in moving the field forward (142). This situation will likely worsen with the increasing amount of high-resolution data arising from complex disease and metagenomic investigations of the role of microbial communities in various ecological niches, including the human intestine (143). A greater understanding of the genome at the level of individual variations could eventually lead to the development of the much anticipated paradigm of personalized nutrition and medicine. Technically, this goal is feasible because we are only a few steps away from the popular benchmark of the $1000 genome (144).

The diverse nature of information content and storage formats of the current omics data precludes the possibility of having unified bioinformatics databases or analysis tools. Thus, there has been a concomitant development of a large number of bioinformatics tools and databases (145, 146). Many of these tools are freely available online, and accessing this information does not require programming skills. However, the burgeoning depth and diversity of information from all these resources can easily be overwhelming to researchers lacking a basic understanding of bioinformatics. We hope that this review will prepare the ground for nutrigenomics investigators to embrace bioinformatics and to take full advantage of the opportunities in this field.


All authors have read and approved the final manuscript.


5 These authors contributed equally to the manuscript.

6 Abbreviations used: ChIP, chromatin immunoprecipitation; dbSNP, Single Nucleotide Polymorphism Database; EBI, European Bioinformatics Institute; EMBL, European Molecular Biology Laboratory; EST, expressed sequence tags; GEO, Gene Expression Omnibus; GPMDB, Global Proteome Machine database; GWAS, genome-wide association studies; HMDB, Human Metabolome Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; MTHFR, methylene tetrahydrofolate reductase; NCBI, National Center for Biotechnology Information; SNP, single nucleotide polymorphism; UCSC, University of California, Santa Cruz.

Literature Cited

1. Collins FS, Green ED, Guttmacher AE, Guyer MS. A vision for the future of genomics research. Nature. 2003;422:835–47 [PubMed]
2. Collins FS, Morgan M, Patrinos A. The Human Genome Project: lessons from large-scale biology. Science. 2003;300:286–90 [PubMed]
3. Mutch DM, Wahli W, Williamson G. Nutrigenomics and nutrigenetics: the emerging faces of nutrition. FASEB J. 2005;19:1602–16 [PubMed]
4. 1000 Genomes Project Consortium A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–73 Erratum in: Nature. 2011;473:544 [PMC free article] [PubMed]
5. Pennisi E. Genomics. 1000 Genomes Project gives new map of genetic diversity. Science. 2010;330:574–5 [PubMed]
6. Botto LD, Yang Q. 5,10-Methylenetetrahydrofolate reductase gene variants and congenital anomalies: a HuGE review. Am J Epidemiol. 2000;151:862–77 [PubMed]
7. Wrone EM, Zehnder JL, Hornberger JM, McCann LM, Coplon NS, Fortmann SP. An MTHFR variant, homocysteine, and cardiovascular comorbidity in renal disease. Kidney Int. 2001;60:1106–13 [PubMed]
8. World Health Organization. World Health Assembly: Genomics and World Health. Fifty Seventh World Health Assembly Resolution. 2004;World Health Assembly 57.13. [cited 2012 July 21]; Available from:
9. Pareek CS, Smoczynski R, Tretyn A. Sequencing technologies and genome sequencing. J Appl Genet. 2011;52:413–35 [PMC free article] [PubMed]
10. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2010;39:D32–7 [PMC free article] [PubMed]
11. Leinonen R, Akhtar R, Birney E, Bower L, Cerdeno-Tarraga A, Cheng Y, Cleland I, Faruque N, Goodgame N, Gibson R, et al. The European Nucleotide Archive. Nucleic Acids Res. 2011;39:D28–31 [PMC free article] [PubMed]
12. Kaminuma E, Kosuge T, Kodama Y, Aono H, Mashima J, Gojobori T, Sugawara H, Ogasawara O, Takagi T, Okubo K, et al. DDBJ progress report. Nucleic Acids Res. 2011;39: Database issue:D22–7 [PMC free article] [PubMed]
13. National Center for Biotechnology Information. Accession Number prefixes: Where are the sequences from? 2012. [cited 02/20/2012]; Available from:
14. Leinonen R, Sugawara H, Shumway M. International Nucleotide Sequence Database C. The Sequence Read Archive. Nucleic Acids Res. 2010;39:D19–21 [PMC free article] [PubMed]
15. National Center for Biotechnology Information BankIt. 2012. [cited 02/20/2012]; Available from:
16. National Center for Biotechnology Information Sequin. 2012. [cited 03/29/2012]; Available from:
17. National Center for Biotechnology Information SequinMacroSend. 2012. [cited 03/29/2012]; Available from:
18. National Center for Biotechnology Information The GenBank Submissions Handbook [Internet] 2011. Available from:
19. Amid C, Birney E, Bower L, Cerdeno-Tarraga A, Cheng Y, Cleland I, Faruque N, Gibson R, Goodgame N, Hunter C, et al. Major submissions tool developments at the European Nucleotide Archive. Nucleic Acids Res. 2012;40:D43–7 [PMC free article] [PubMed]
20. DNA Data Bank of Japan SAKURA. 2012. [cited 02/20/2012]; Available from:
21. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2011;39:D38–51 [PMC free article] [PubMed]
22. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 2012;40:D130–5 [PMC free article] [PubMed]
23. Maglott D, Ostell J, Pruitt KD, Tatusova T. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 2011;39:D52–7 [PMC free article] [PubMed]
24. Flicek P, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fairley S, Fitzgerald S, et al. Ensembl 2012. Nucleic Acids Res. 2012;40:D84–90 [PMC free article] [PubMed]
25. Kersey PJ, Staines DM, Lawson D, Kulesha E, Derwent P, Humphrey JC, Hughes DS, Keenan S, Kerhornou A, Koscielny G, et al. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucleic Acids Res. 2012;40:D91–7 [PMC free article] [PubMed]
26. Yook K, Harris TW, Bieri T, Cabunoc A, Chan J, Chen WJ, Davis P, de la Cruz N, Duong A, Fang R, et al. WormBase 2012: more genomes, more data, new website. Nucleic Acids Res. 2012;40:D735–41 [PMC free article] [PubMed]
27. McQuilton P, St Pierre SE, Thurmond J. FlyBase 101–the basics of navigating FlyBase. Nucleic Acids Res. 2012;40:D706–14 [PMC free article] [PubMed]
28. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res. 2012;40:D1202–10 [PMC free article] [PubMed]
29. Blake JA, Bult CJ, Kadin JA, Richardson JE, Eppig JT. The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 2011;39:D842–8 [PMC free article] [PubMed]
30. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–7 [PMC free article] [PubMed]
31. Jühling F, Morl M, Hartmann RK, Sprinzl M, Stadler PF, Putz J. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 2009;37:D159–62 [PMC free article] [PubMed]
32. Praz V, Perier R, Bonnard C, Bucher P. The Eukaryotic Promoter Database, EPD: new entry types and links to gene expression data. Nucleic Acids Res. 2002;30:322–4 [PMC free article] [PubMed]
33. Jiang C, Xuan Z, Zhao F, Zhang MQ. TRED: a transcriptional regulatory element database, new entries and other development. Nucleic Acids Res. 2007;35:D137–40 [PMC free article] [PubMed]
34. Human Genome Variation Society Databases & Other Tools. 2012. [updated 22nd January 2009; cited 02/20/2012]; Available from:
35. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature. 2001;409:928–33 [PubMed]
36. Weisberg I, Tran P, Christensen B, Sibani S, Rozen R. A second genetic polymorphism in methylenetetrahydrofolate reductase (MTHFR) associated with decreased enzyme activity. Mol Genet Metab. 1998;64:169–72 [PubMed]
37. Klerk M, Verhoef P, Clarke R, Blom HJ, Kok FJ, Schouten EG. MTHFR 677C→T polymorphism and risk of coronary heart disease: a meta-analysis. JAMA. 2002;288:2023–31 [PubMed]
38. Esaki S, Malkaram SA, Zempleni J. Effects of single-nucleotide polymorphisms in the human holocarboxylase synthetase gene on enzyme catalysis. Eur J Hum Genet. 2012;20:428–33 [PMC free article] [PubMed]
39. Sherry ST. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–11 [PMC free article] [PubMed]
40. Musumeci L, Arthur JW, Cheung FSG, Hoque A, Lippman S, Reichardt JKV. Single nucleotide differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies. Hum Mutat. 2010;31:67–73 [PMC free article] [PubMed]
41. Chen Y, Cunningham F, Rios D, McLaren WM, Smith J, Pritchard B, Spudich GM, Brent S, Kulesha E, Marin-Garcia P, et al. Ensembl variation resources. BMC Genomics. 2010;11:293–308 [PMC free article] [PubMed]
42. Dreszer TR, Karolchik D, Zweig AS, Hinrichs AS, Raney BJ, Kuhn RM, Meyer LR, Wong M, Sloan CA, Rosenbloom KR, et al. The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res. 2012;40:D918–23 [PMC free article] [PubMed]
43. Hamosh A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2004;33:D514–7 [PMC free article] [PubMed]
44. Riva A, Kohane IS. SNPper: retrieval and analysis of human SNPs. Bioinformatics. 2002;18:1681–5 [PubMed]
45. Clifford R, Edmonson M, Hu Y, Nguyen C, Scherpbier T, Buetow KH. Expression-based genetic/physical maps of single-nucleotide polymorphisms identified by the cancer genome anatomy project. Genome Res. 2000;10:1259–65 [PubMed]
46. Wang WYS, Barratt BJ, Clayton DG, Todd JA. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet. 2005;6:109–18 [PubMed]
47. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007;39:1181–6 [PMC free article] [PubMed]
48. Anthony JB, Robert F, Robert H, Tim B, Sirisha G, Raheleh R. GWAS Central. 2012. [updated April 2012; cited 02/20/2012]; 7.0: Available from:
49. Yu W, Clyne M, Khoury MJ, Gwinn M. Phenopedia and Genopedia: disease-centered and gene-centered views of the evolving knowledge of human genetic associations. Bioinformatics. 2010;26:145–6 [PMC free article] [PubMed]
50. Clark AG. The role of haplotypes in candidate gene studies. Genet Epidemiol. 2004;27:321–33 [PubMed]
51. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–61 [PMC free article] [PubMed]
52. Kaikkonen MU, Lam MT, Glass CK. Non-coding RNAs as regulators of gene expression and epigenetics. Cardiovasc Res. 2011;90:430–40 [PMC free article] [PubMed]
53. Gibney ER, Nolan CM. Epigenetics and gene expression. Heredity (Edinb). 2010;105:4–13 [PubMed]
54. McNairn AJ, Gilbert DM. Epigenomic replication: linking epigenetics to DNA replication. Bioessays. 2003;25:647–56 [PubMed]
55. Escargueil AE, Soares DG, Salvador M, Larsen AK, Henriques JA. What histone code for DNA repair? Mutat Res. 2008;658:259–70 [PubMed]
56. Borde V, Robine N, Lin W, Bonfils S, Geli V, Nicolas A. Histone H3 lysine 4 trimethylation marks meiotic recombination initiation sites. EMBO J. 2009;28:99–111 [PubMed]
57. Zempleni J, Liu D, Xue J. Nutrition, histone epigenetic marks, and disease. In: Jirtle RL, Tyson F, editors. Epigenomics in Health and Disease. Heidelberg, Germany: Springer; 2012.
58. Pellegrini M, Ferrari R. Epigenetic analysis: ChIP-chip and ChIP-seq. Methods Mol Biol. 2012;802:377–87 [PubMed]
59. Grunau C, Renault E, Rosenthal A, Roizes G. MethDB–a public database for DNA methylation data. Nucleic Acids Res. 2001;29:270–4 [PMC free article] [PubMed]
60. Kulis M, Esteller M. DNA methylation and cancer. Adv Genet. 2010;70:27–56 [PubMed]
61. Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, Van Criekinge W. PubMeth: a cancer methylation database combining text-mining and expert annotation. Nucleic Acids Res. 2008;36: Database issue:D842–6 [PMC free article] [PubMed]
62. Zhang Y, Lv J, Liu H, Zhu J, Su J, Wu Q, Qi Y, Wang F, Li X. HHMD: the human histone modification database. Nucleic Acids Res. 2010;38:D149–54 [PMC free article] [PubMed]
63. Mariño-Ramirez L, Levine KM, Morales M, Zhang S, Moreland RT, Baxevanis AD, Landsman D. The Histone Database: an integrated resource for histones and histone fold-containing proteins. Database (Oxford). 2011;2011:bar048 [PMC free article] [PubMed]
64. Khare SP, Habib F, Sharma R, Gadewal N, Gupta S, Galande S. HIstome–a relational knowledgebase of human histone proteins and histone modifying enzymes. Nucleic Acids Res. 2012;40:D337–42 [PMC free article] [PubMed]
65. Gendler K, Paulsen T, Napoli C. ChromDB: the chromatin database. Nucleic Acids Res. 2008;36:D298–302 [PMC free article] [PubMed]
66. Shipra A, Chetan K, Rao MR. CREMOFAC–a database of chromatin remodeling factors. Bioinformatics. 2006;22:2940–4 [PubMed]
67. ENCODE Project Consortium A user's guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 2011;9:e1001046. [PMC free article] [PubMed]
68. Wolfsberg TG. Using the NCBI Map Viewer to browse genomic sequence data. Curr Protoc Hum Genet. 2011. Apr;Chapter 18:Unit 18.5 [PubMed]
69. Donlin MJ. Using the Generic Genome Browser (GBrowse). Curr Protoc Bioinformatics. 2009. Dec;Chapter 9:Unit 9 [PubMed]
70. Fernandez-Suarez XM, Schuster MK. Using the Ensembl genome server to browse genomic sequence data. Curr Protoc Bioinformatics. 2007. Jan;Chapter 1:Unit 1.15 [PubMed]
71. Chadwick LH. The NIH Roadmap Epigenomics Program data resource. Epigenomics. 2012;4:317–24 [PMC free article] [PubMed]
72. Zhou X, Maricque B, Xie M, Li D, Sundaram V, Martin EA, Koebbe BC, Nielsen C, Hirst M, Farnham P, et al. The Human Epigenome Browser at Washington University. Nat Methods. 2011;8:989–90 [PMC free article] [PubMed]
73. Pete J, Mason B, Steve J, Laura C, Josh R, Katie C, Raj M. Epigenie: Epigenetics and Non-Coding RNA News. 2012. [cited 07/03/2012]; Available from:
74. Langenberger D, Bermudez-Santana CI, Stadler PF, Hoffmann S. Identification and classification of small RNAs in transcriptome sequence data. Pac Symp Biocomput. 2010:80–7 [PubMed]
75. Kussmann M, Rezzi S, Daniel H. Profiling techniques in nutrition and health research. Curr Opin Biotechnol. 2008;19:83–99 [PubMed]
76. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–8 [PubMed]
77. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63 [PMC free article] [PubMed]
78. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, et al. NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2011;39:D1005–10 [PMC free article] [PubMed]
79. Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, Holloway E, Kapushesky M, Kemmeren P, Lara GG, et al. ArrayExpress–a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 2003;31:68–71 [PMC free article] [PubMed]
80. Kapushesky M, Emam I, Holloway E, Kurnosov P, Zorin A, Malone J, Rustici G, Williams E, Parkinson H, Brazma A. Gene expression atlas at the European bioinformatics institute. Nucleic Acids Res. 2010;38:D690–8 [PMC free article] [PubMed]
81. Jima DD, Zhang J, Jacobs C, Richards KL, Dunphy CH, Choi WW, Au WY, Srivastava G, Czader MB, Rizzieri DA, et al. Deep sequencing of the small RNA transcriptome of normal and malignant human B cells identifies hundreds of novel microRNAs. Blood. 2010;116:e118–27 [PubMed]
82. Griffiths-Jones S. miRBase: microRNA sequences and annotation. Curr Protoc Bioinformatics. 2010. Mar;Chapter 12:Unit 12.9.1–10 [PubMed]
83. Szymanski M, Erdmann VA, Barciszewski J. Noncoding RNAs database (ncRNAdb). Nucleic Acids Res. 2007;35:D162–4 [PMC free article] [PubMed]
84. Bu D, Yu K, Sun S, Xie C, Skogerbo G, Miao R, Xiao H, Liao Q, Luo H, Zhao G, et al. NONCODE v3.0: integrative annotation of long noncoding RNAs. Nucleic Acids Res. 2012;40:D210–5 [PMC free article] [PubMed]
85. Dudoit S, Gentleman RC, Quackenbush J. Open source software for the analysis of microarray data. Biotechniques. 2003;Suppl:45–51 [PubMed]
86. Stanford University Stanford Microarray Database. 2012. [cited 02/27/2012]; Available from:
87. Zhang J, Chiodini R, Badr A, Zhang G. The impact of next-generation sequencing on genomics. J Genet Genomics. 2011;38:95–109 [PMC free article] [PubMed]
88. Meyer SU, Pfaffl MW, Ulbrich SE. Normalization strategies for microRNA profiling experiments: a 'normal' way to a hidden layer of complexity? Biotechnol Lett. 2010;32:1777–88 [PubMed]
89. Li JW, Robison K, Martin M, Sjodin A, Usadel B, Young M, Olivares EC, Bolser DM. The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis. Nucleic Acids Res. 2012;40:D1313–7 [PMC free article] [PubMed]
90. Vihinen M. Bioinformatics in proteomics. Biomol Eng. 2001;18:241–8 [PubMed]
91. MacBeath G. Protein microarrays and proteomics. Nat Genet. 2002;32: Suppl:526–32 [PubMed]
92. Martin DB, Nelson PS. From genomics to proteomics: techniques and applications in cancer research. Trends Cell Biol. 2001;11:S60–5 [PubMed]
93. Vizcaíno JA, Foster JM, Martens L. Proteomics data repositories: providing a safe haven for your data and acting as a springboard for further research. J Proteomics. 2010;73:2136–46 [PMC free article] [PubMed]
94. Vizcaíno JA, Cote R, Reisinger F, Foster JM, Mueller M, Rameseder J, Hermjakob H, Martens L. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics. 2009;9:4276–83 [PMC free article] [PubMed]
95. Deutsch EW, Lam H, Aebersold R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 2008;9:429–34 [PubMed]
96. Beavis RC. Using the global proteome machine for protein identification. Methods Mol Biol. 2006;328:217–28 [PubMed]
97. Hinz U. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledge base. Cell Mol Life Sci. 2010;67:1049–64 [PMC free article] [PubMed]
98. Berman HM, Kleywegt GJ, Nakamura H, Markley JL. The Protein Data Bank at 40: reflecting on the past to prepare for the future. Structure. 2012;20:391–6 [PMC free article] [PubMed]
99. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011;39:D561–8 [PMC free article] [PubMed]
100. Swiss Institute of Bioinformatics ExPASy Bioinformatics Resource Portal. 2012. [cited 02/20/2012]; Available from:
101. Wishart DS, Tzur D, Knox C, Eisner R, Guo AC, Young N, Cheng D, Jewell K, Arndt D, Sawhney S, et al. HMDB: the Human Metabolome Database. Nucleic Acids Res. 2007;35:D521–6 [PMC free article] [PubMed]
102. Clayton TA, Lindon JC, Cloarec O, Antti H, Charuel C, Hanton G, Provost JP, Le Net JL, Baker D, Walley RJ, et al. Pharmaco-metabonomic phenotyping and personalized drug treatment. Nature. 2006;440:1073–7 [PubMed]
103. German JB, Roberts M, Watkins SM. Genomics and metabolomics as markers for the interaction of diet and health: lessons from lipids. J Nutr. 2003;133(6 Suppl 1):2078S–83S [PubMed]
104. Dettmer K, Aronov PA, Hammock BD. Mass spectrometry-based metabolomics. Mass Spectrom Rev. 2007;26:51–78 [PMC free article] [PubMed]
105. Scripps Research Institute Metabolomics Science Links at Scripps Center for Metabolomics and Mass Spectrometry. 2012. [cited 02/21/2012]; Available from:
106. Reo NV. NMR-based metabolomics. Drug Chem Toxicol. 2002;25:375–82 [PubMed]
107. Go EP. Database resources in metabolomics: an overview. J Neuroimmune Pharmacol. 2010;5:18–30 [PubMed]
108. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Bryant SH. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 2009;37:W623–33 [PMC free article] [PubMed]
109. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B, et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012;40:D1100–7 [PMC free article] [PubMed]
110. de Matos P, Adams N, Hastings J, Moreno P, Steinbeck C. A database for chemical proteomics: ChEBI. Methods Mol Biol. 2012;803:273–96 [PubMed]
111. Kyoto Encyclopedia for Genes and Genomes Japan. KEGG Compound. 2012. [cited 02/20/2012]. Available from:
112. Williams AJ, Tkachenko V, Golotvin S, Kidd R, McCann G. ChemSpider - building a foundation for the semantic web by hosting a crowd sourced databasing platform for chemistry. J Cheminform. 2010;2(Suppl 1):O16
113. Ranzinger R, Herget S, von der Lieth CW, Frank M. GlycomeDB–a unified database for carbohydrate structures. Nucleic Acids Res. 2011;39:D373–6 [PMC free article] [PubMed]
114. Yasugi E, Seyama Y. [Lipid database “LipidBank” and international collaboration] Tanpakushitsu Kakusan Koso. 2007;52:1357–62 [PubMed]
115. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, et al. DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011;39:D1035–41 [PMC free article] [PubMed]
116. Davis AP, King BL, Mockus S, Murphy CG, Saraceni-Richards C, Rosenstein M, Wiegers T, Mattingly CJ. The Comparative Toxicogenomics Database: update 2011. Nucleic Acids Res. 2011;39:D1067–72 [PMC free article] [PubMed]
117. Wishart DS, Lewis MJ, Morrissey JA, Flegel MD, Jeroncic K, Xiong Y, Cheng D, Eisner R, Gautam B, Tzur D, et al. The human cerebrospinal fluid metabolome. J Chromatogr B Analyt Technol Biomed Life Sci. 2008;871:164–73.
118. Wishart DS, Knox C, Guo AC, Eisner R, Young N, Gautam B, Hau DD, Psychogios N, Dong E, Bouatra S, et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res. 2009;37:D603–10.
119. Smith CA, O'Maille G, Want EJ, Qin C, Trauger SA, Brandon TR, Custodio DE, Abagyan R, Siuzdak G. METLIN: a metabolite mass spectral database. Ther Drug Monit. 2005;27:747–51.
120. Horai H, Arita M, Kanaya S, Nihei Y, Ikeda T, Suwa K, Ojima Y, Tanaka K, Tanaka S, Aoshima K, et al. MassBank: a public repository for sharing mass spectral data for life sciences. J Mass Spectrom. 2010;45:703–14.
121. National Institute of Standards and Technology. NIST Chemistry WebBook. 2012. [cited 02/20/2012]; Available from:
122. Yukiko N, Hiroko A, Altaf-Ul-Amin M, Ken K, Shigehiko K. KNApSAcK: A Comprehensive Species-Metabolite Relationship Database. 2012. [updated 2012/02/03; cited 02/20/2012]; Available from:
123. Cuperlović-Culf M, Barnett DA, Culf AS, Chute I. Cell culture metabolomics: applications and future directions. Drug Discov Today. 2010;15:610–21.
124. Markley JL, Anderson ME, Cui Q, Eghbalnia HR, Lewis IA, Hegeman AD, Li J, Schulte CF, Sussman MR, Westler WM, et al. New bioinformatics resources for metabolomics. Pac Symp Biocomput. 2007:157–68.
125. Steinbeck C, Kuhn S. NMRShiftDB–compound identification and structure elucidation support through a free community-built web database. Phytochemistry. 2004;65:2711–7.
126. Akiyama K, Chikayama E, Yuasa H, Shimada Y, Tohge T, Shinozaki K, Hirai MY, Sakurai T, Kikuchi J, Saito K. PRIMe: a Web site that assembles tools for metabolomics and transcriptomics. In Silico Biol. 2008;8:339–45.
127. Cui Q, Lewis IA, Hegeman AD, Anderson ME, Li J, Schulte CF, Westler WM, Eghbalnia HR, Sussman MR, Markley JL. Metabolite identification via the Madison Metabolomics Consortium Database. Nat Biotechnol. 2008;26:162–4.
128. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28:27–30.
129. Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res. 2011;39:D691–7.
130. Latendresse M, Paley S, Karp PD. Browsing metabolic and regulatory networks with BioCyc. Methods Mol Biol. 2012;804:197–216.
131. Frolkis A, Knox C, Lim E, Jewison T, Law V, Hau DD, Liu P, Gautam B, Ly S, Guo AC, et al. SMPDB: The Small Molecule Pathway Database. Nucleic Acids Res. 2010;38:D480–7.
132. Weston AD, Hood L. Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine. J Proteome Res. 2004;3:179–96.
133. Fang FC, Casadevall A. Reductionistic and holistic science. Infect Immun. 2011;79:1401–4.
134. Hodgkin AL, Huxley AF. A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol. 1952;117:500–44.
135. Huang S. Back to the biology in systems biology: what can we learn from biomolecular networks? Brief Funct Genomic Proteomic. 2004;2:279–97.
136. Reed MC, Nijhout HF, Neuhouser ML, Gregory JF 3rd, Shane B, James SJ, Boynton A, Ulrich CM. A mathematical model gives insights into nutritional and genetic aspects of folate-mediated one-carbon metabolism. J Nutr. 2006;136:2653–61.
137. Weston AD, Baliga NS, Bonneau R, Hood L. Systems approaches applied to the study of Saccharomyces cerevisiae and Halobacterium sp. Cold Spring Harb Symp Quant Biol. 2003;68:345–57.
138. Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A. BioMart–biological queries made easy. BMC Genomics. 2009;10:22.
139. The Galaxy Team. The Galaxy Project: Online bioinformatics analysis for everyone. 2012. [cited 02/21/2012]; Available from:
140. Shannon PT, Reiss DJ, Bonneau R, Baliga NS. The Gaggle: an open-source software system for integrating bioinformatics software and data sources. BMC Bioinformatics. 2006;7:176.
141. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2:2366–82.
142. Pettersson E, Lundeberg J, Ahmadian A. Generations of sequencing technologies. Genomics. 2009;93:105–11.
143. Desai N, Antonopoulos D, Gilbert JA, Glass EM, Meyer F. From genomics to metagenomics. Curr Opin Biotechnol. 2012;23:72–6.
144. Loman NJ, Misra RV, Dallman TJ, Constantinidou C, Gharbia SE, Wain J, Pallen MJ. Performance comparison of benchtop high-throughput sequencing platforms. Nat Biotechnol. 2012;30:562.
145. University of Pittsburgh. OBRC: Online Bioinformatics Resources Collection. 2012. [cited 03/22/2012]; Available from:
146. Galperin MY, Fernandez-Suarez XM. The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic Acids Res. 2012;40:D1–8.

Articles from Advances in Nutrition are provided here courtesy of American Society for Nutrition