J Food Compost Anal. Author manuscript; available in PMC 2013 March 1.
Published in final edited form as:
PMCID: PMC3352238

A structured vocabulary for indexing dietary supplements in databases in the United States


Food composition databases are critical for assessing and planning dietary intakes. Dietary supplement databases are also needed because dietary supplements make significant contributions to total nutrient intakes. However, no uniform system exists for classifying dietary supplement products and indexing their ingredients in such databases. Differing approaches to classifying these products make it difficult to retrieve or link information effectively. The need for a consistent approach to classifying information within food composition databases led to the development of LanguaL™, a structured vocabulary. LanguaL™ is being adapted as an interface tool for classifying and retrieving product information in dietary supplement databases. This paper outlines proposed changes to the LanguaL™ thesaurus for indexing dietary supplement products and ingredients in databases. The choice of 12 of the original 14 LanguaL™ facets pertinent to dietary supplements, modifications to their scopes, and applications are described. The 12 chosen facets are: Product Type; Source; Part of Source; Physical State, Shape or Form; Ingredients; Preservation Method; Packing Medium; Container or Wrapping; Contact Surface; Consumer Group/Dietary Use/Label Claim; Geographic Places and Regions; and Adjunct Characteristics of Food.

Keywords: LanguaL, Government, Dietary supplements, Databases, Indexing, Structured vocabulary, Thesaurus, Food analysis, Food composition

1 Introduction

Several dietary supplement databases are available in the United States of America (Dwyer et al., 2008; Saldanha et al., 2010). To use their data efficiently, both conventional food¹ and dietary supplement databases need to use consistent descriptive systems for classifying products.

The concept of using a faceted thesaurus to index foods originated at the United States (US) Food and Drug Administration (FDA) in the mid-1970s, because of the need to overcome barriers in accessing and exchanging information about food products (McCann et al., 1988). Barriers included differences in food names, food descriptive terms, and nutrient names and units. Because of the volume and complexity of the data at a global level, much useful data had become isolated in different and incompatible files (Pennington et al., 1995). The LanguaL™ thesaurus was created to answer the need for a consistent cataloguing system.

The term LanguaL™ is derived from the Latin words langua (language or tongue) and alimentaria (food), so it represents the language for describing food. LanguaL™ is a structured, controlled vocabulary for describing foods, organized systematically to simplify retrieval of information for data analysis. It is based on the principle that items within a database (whether they are dietary supplements or conventional food products) can be described by a combination of uniform terms chosen from “facets” that characterize various mutually exclusive attributes of these products. These facets include food groupings, main ingredient source, physical attributes, other ingredients and processing, packaging and packaging materials, dietary uses, and other miscellaneous characteristics. Within each facet of the thesaurus, descriptors are arranged in a hierarchical order from broader to narrower terms to facilitate indexing and retrieval (Hendricks, 1992; Pennington & Hendricks, 1992; Pennington & Butrum, 1991). “Scope notes” accompany the thesaurus descriptors to explain precisely when a particular descriptor should be used and to ensure uniform use of the terms by indexers and searchers. The thesaurus also provides additional information for many descriptors; this often refers to specific definitions, such as those in the US Code of Federal Regulations (CFR). Currently the European LanguaL™ Technical Committee administers LanguaL™, and Danish Food Information hosts and maintains the LanguaL™ website.

This paper outlines the development of a LanguaL™ Dietary Supplement Structured Vocabulary (LanguaL™ DS Thesaurus) for systematically indexing these products. Dietary supplement database compilers and managers can use this structured vocabulary to classify products for information retrieval and to facilitate links to other databases. This paper presents modifications to the food product thesaurus that capture the unique features of dietary supplement products, together with the preliminary schema for the LanguaL™ DS Thesaurus. The preliminary schema was developed by members of the US Federal Dietary Supplement Ingredient Database (DSID) ad hoc Working Group² in collaboration with, and drawing heavily on the work of, European experts. Expert opinion was also sought from participants at the post-conference workshop held in conjunction with the 34th National Nutrient Databank Conference in 2010.

Although the classification system described in this paper was developed for use in the US, it can be used in countries where dietary supplements and dietary ingredients are defined in a manner similar to the US Dietary Supplement Health and Education Act of 1994 (DSHEA), and it can be adapted to fit regulatory definitions in place in other countries. For example, in some European countries the terms used to describe dietary supplements are narrower than those provided for in the US. These “extra” terms can be handled through their “Scope Notes”, which define them but clearly state that they are not used in the US classification of dietary supplements.

2 Schema for dietary supplement databases

2.1 Proposed Adaptations of the LanguaL™ Thesaurus for Use in Dietary Supplement Databases

The complete LanguaL™ Thesaurus can be viewed and downloaded through the Danish Food Information website. This thesaurus provides detailed information about each descriptor and the scope of each facet. Presently foods (other than dietary supplements) can be described using descriptors chosen from 14 facets. Of these, 12 are applicable to dietary supplements. These are: A. Product Type; B. Source; C. Part of Source; E. Physical State, Shape or Form; H. Treatment Applied/Added Ingredients; J. Preservation Method; K. Packing Medium; M. Container or Wrapping; N. Food Contact Surface; P. Consumer Group/Dietary Use/Label Claim; R. Geographic Places and Regions; and Z. Adjunct Characteristics of Food. The two remaining food facets, F. Extent of Heat Treatment and G. Cooking Method, were dropped because of distinctive differences in the manufacturing processes for conventional foods versus dietary supplements (see Table 1).

Table 1
Proposed LanguaL™ facets for dietary supplements

The titles of some facets also did not seem fitting for dietary supplements, so modifications to the descriptors were proposed. For example, Facet H, Treatment Applied, was provisionally changed to “Ingredients” for the LanguaL™ DS Thesaurus, as under this facet all ingredients, other than the major source, can be indexed.
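The facet selection described in this section can be written down as a small lookup table. The following is a minimal sketch and not part of the LanguaL™ distribution: the facet letters and titles are taken from the text, while the variable names are illustrative assumptions.

```python
# Facet letters and titles as listed in Section 2.1.
# Variable names are illustrative, not LanguaL identifiers.
FOOD_FACETS = {
    "A": "Product Type",
    "B": "Source",
    "C": "Part of Source",
    "E": "Physical State, Shape or Form",
    "F": "Extent of Heat Treatment",
    "G": "Cooking Method",
    "H": "Treatment Applied/Added Ingredients",
    "J": "Preservation Method",
    "K": "Packing Medium",
    "M": "Container or Wrapping",
    "N": "Food Contact Surface",
    "P": "Consumer Group/Dietary Use/Label Claim",
    "R": "Geographic Places and Regions",
    "Z": "Adjunct Characteristics of Food",
}

# The two facets dropped for dietary supplements.
DROPPED_FOR_SUPPLEMENTS = {"F", "G"}

# The 12 facets retained for the DS Thesaurus.
DS_FACETS = {
    letter: title
    for letter, title in FOOD_FACETS.items()
    if letter not in DROPPED_FOR_SUPPLEMENTS
}

# Facet H is provisionally retitled for dietary supplements.
DS_FACETS["H"] = "Ingredients"

print(len(FOOD_FACETS), len(DS_FACETS))
```

Keeping the food facets and the supplement facets in one mapping makes the relationship explicit: the supplement schema is a subset of the food schema with one retitled facet, which is what allows the two kinds of database to be linked.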

The scope and descriptors in each facet were chosen to be consistent with the scope of dietary supplements outlined in the US Dietary Supplement Health and Education Act of 1994 (DSHEA). According to DSHEA, in the US a “dietary supplement” is defined as “a product (other than tobacco) intended to supplement the diet that bears or contains one or more of the following dietary ingredients: (A) a vitamin; (B) a mineral; (C) an herb or other botanical; (D) an amino acid; (E) a dietary substance for use by man to supplement the diet by increasing the total dietary intake; or (F) a concentrate, metabolite, constituent, extract, or combination of any ingredient described in clause (A), (B), (C), (D), or (E).” In addition, we considered the relevant regulations governing dietary supplements as published in the CFR, and US FDA guidance and practice in regulating dietary supplements (21 CFR 101; primarily 21 CFR 101.36, 21 CFR 101.4, and 21 CFR 101.9).

Only ingredients that meet the definition of dietary supplements according to DSHEA are described as “dietary (supplement) ingredients”. All other ingredients are treated as “non-dietary (supplement) ingredients”. Examples of dietary ingredients include vitamins, minerals, botanicals, amino acids, and dietary fiber. Food additives, colors, preservatives, and bulking agents are examples of non-dietary ingredients. According to US FDA labeling regulations, all dietary ingredients must be listed within the Supplement Facts panel. Non-dietary ingredients are listed outside the Supplement Facts panel (see Figure 1). This separation of ingredients is necessary, as the regulations governing dietary and non-dietary ingredients differ. For example, structure/function claims can be made only for dietary ingredients in dietary supplement products. Regulations governing dietary and non-dietary ingredients may differ in other countries.
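The split between dietary and non-dietary ingredients can be illustrated with a short sketch. The category names below are the examples given in the text and are not an exhaustive regulatory list; the class name, function names, and the placeholder preservative are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Example categories from the text; not an exhaustive regulatory list.
DIETARY = {"vitamin", "mineral", "botanical", "amino acid", "dietary fiber"}
NON_DIETARY = {"food additive", "color", "preservative", "bulking agent"}


@dataclass(frozen=True)
class Ingredient:
    name: str
    category: str


def is_dietary(ingredient: Ingredient) -> bool:
    """Return True for dietary ingredients, False for non-dietary ones."""
    category = ingredient.category.lower()
    if category in DIETARY:
        return True
    if category in NON_DIETARY:
        return False
    raise ValueError(f"category not covered by this sketch: {ingredient.category!r}")


def split_label(ingredients: List[Ingredient]) -> Tuple[List[str], List[str]]:
    """Return (inside_panel, outside_panel) name lists, preserving input order."""
    inside = [i.name for i in ingredients if is_dietary(i)]
    outside = [i.name for i in ingredients if not is_dietary(i)]
    return inside, outside


inside, outside = split_label([
    Ingredient("niacin", "vitamin"),
    Ingredient("calcium carbonate", "mineral"),
    Ingredient("example preservative", "preservative"),  # hypothetical placeholder
])
print(inside, outside)
```

Raising an error for an unrecognized category, rather than guessing, reflects the point made above that the two groups are regulated differently and should not be mixed.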

Figure 1
Generic multivitamin/mineral (MVM) Supplement Facts panel.

2.2 Indexing Dietary Supplements Using the LanguaL™ DS Thesaurus

Table 1 shows the 14 existing LanguaL™ facets and the proposed modifications to allow indexing of dietary supplements. The table also indicates whether the corresponding information can be obtained from the product label. This point is relevant, because dietary supplement databases, such as the National Health and Nutrition Examination Survey (NHANES) dietary supplement database, the National Library of Medicine’s consumer-oriented dietary supplement database, and others, have been developed using information from product labels (Dwyer et al., 2008; Saldanha et al., 2010). In the US, the requirement that labels must reflect product composition is stated in law under DSHEA. The manufacturer is legally liable if false or misleading claims are made. A product is also considered misbranded if the label does not reflect the product contents. In contrast, DSID is a quantitative database. The information in DSID provides estimates of specific ingredient levels based upon chemical analysis of products.

The classification of dietary ingredients specified in DSHEA can be used to index Product Type under LanguaL™ Facet A. In addition, the primary ingredient by weight that characterizes a dietary supplement product can be indexed under Facet B (Product Source). Any additional ingredients in a dietary supplement product can be indexed under Facet H (Ingredients), where dietary and non-dietary ingredients are listed separately in accordance with FDA regulations.

Table 2 further details the proposed categorization scheme, the narrower (more descriptive) terms, and scope for each facet in the proposed LanguaL™ DS Thesaurus. The Narrow Terms are hierarchical. When the LanguaL™ DS Thesaurus is operational, it is envisioned that there will be pop-up boxes to guide the indexer with the coding. Based on the level of detail available, the person indexing would check the appropriate boxes and code the information accordingly. For example, in Table 2, the first narrow term column indicates under which category the product would fall. If the product contains fiber and enzymes, it would fall under the “5. Other dietary substance to supplement the diet” classification.
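To illustrate why hierarchical narrow terms help retrieval, the sketch below stores each term with its broader term and lets a query on a broad term retrieve products indexed with any narrower term. The three-level fragment is an assumption built from the fiber and enzyme example above; it is not the actual Table 2 hierarchy, and the function names are hypothetical.

```python
# Hypothetical fragment of a facet hierarchy: term -> its broader term.
BROADER = {
    "fiber": "other dietary substance to supplement the diet",
    "enzyme": "other dietary substance to supplement the diet",
    "other dietary substance to supplement the diet": "dietary supplement",
}


def ancestors(term):
    """Yield each broader term above `term`, nearest first."""
    while term in BROADER:
        term = BROADER[term]
        yield term


def matches(indexed_term, query):
    """True if a product indexed with `indexed_term` is retrieved by `query`."""
    return indexed_term == query or query in ancestors(indexed_term)


print(matches("fiber", "dietary supplement"))
print(matches("dietary supplement", "fiber"))
```

A search on the broad term finds the product indexed as "fiber", but a search on "fiber" does not return products that were only indexed at the broad level, so the indexer's choice of detail determines what can later be retrieved.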

Table 2
LanguaL™ Dietary Supplement Structured Vocabulary, with indexing of the generic multivitamin/mineral (MVM) product shown in Figure 1

Every term in LanguaL™ is assigned an alphanumeric code, which is the indexing system. The letter refers to the facet. Specific terms/attributes that describe a product are assigned a numeric code. For example, the code H0311 refers to Facet H (an ingredient other than the primary ingredient) and 0311 is the code for niacin. Under Facet P, the exact language authorized by US FDA for health claims is included in the Scope Notes so that the indexer will not confuse a health claim with a structure/function claim.
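The code structure can be made concrete with a small parser. This is a minimal sketch that assumes every code is one facet letter followed by four digits, a pattern inferred only from the H0311 example above; the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class LangualCode(NamedTuple):
    facet: str   # single facet letter, e.g. "H"
    number: str  # numeric part kept as text to preserve leading zeros


# Assumed shape: one uppercase letter followed by exactly four digits.
_CODE = re.compile(r"([A-Z])(\d{4})")


def parse_code(code: str) -> LangualCode:
    """Split a code such as 'H0311' into its facet letter and numeric part."""
    match = _CODE.fullmatch(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognized code: {code!r}")
    return LangualCode(match.group(1), match.group(2))


print(parse_code("H0311"))
```

The numeric part is kept as a string because converting 0311 to an integer would drop the leading zero and no longer match the published code.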

Recently, codes have been added to LanguaL™ to classify dietary supplements at the highest level, i.e. Facet A. The Additional Information field for the new descriptor DIETARY SUPPLEMENT [A1298] contains reference to DSHEA, as well as to Codex Alimentarius and EC regulations. The new DS classification includes codes for all of the terms for indexing dietary supplements in databases in the US as well as narrower terms proposed in the European databases on dietary supplements (Finland, Netherlands, France, Denmark, and EPIC). These “extra” terms have Scope Notes that clearly state that they are not to be used in the US classification of dietary supplements. They can also be disabled in the US version for indexing dietary supplements.

2.3 Indexing a Generic Adult Multivitamin/Mineral (MVM) Product using the LanguaL™ DS Thesaurus

Table 2 also shows how a generic adult MVM product, like that shown in Figure 1, would be described using the LanguaL™ DS Thesaurus. Based on information provided in the Supplement Facts panel, the product would be described as an MVM (combination) product under Facet A. Calcium carbonate would be identified as the “primary ingredient”, as it is the major ingredient by weight. Information described under Facets B and C is the “source” and “part of the source” of calcium carbonate. The “source” in this case is not specified and therefore assumed to be synthetic. All other ingredients captured in the Supplement Facts panel are dietary ingredients and are listed under Facet H “Ingredients”. All remaining ingredients listed outside the Supplement Facts panel, e.g. cornstarch, silicon dioxide and hydrogenated palm oil, are listed as non-dietary ingredients under Facet H.
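The MVM walk-through can be sketched as a simple record keyed by facet letter. This is an illustration only: the class and field names are hypothetical, descriptive text is used in place of real LanguaL™ codes, Facet C is set to "not applicable" on the assumption that a chemical source has no plant or animal part, and "niacin" stands in for the dietary ingredients in the panel, which are not enumerated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SupplementRecord:
    product_type: str                   # Facet A
    source: str                         # Facet B (primary ingredient by weight)
    part_of_source: str                 # Facet C
    dietary_ingredients: List[str]      # Facet H, inside the Supplement Facts panel
    non_dietary_ingredients: List[str]  # Facet H, outside the panel

    def by_facet(self) -> Dict[str, object]:
        """Return the record keyed by facet letter."""
        return {
            "A": self.product_type,
            "B": self.source,
            "C": self.part_of_source,
            "H": {
                "dietary": list(self.dietary_ingredients),
                "non-dietary": list(self.non_dietary_ingredients),
            },
        }


mvm = SupplementRecord(
    product_type="multivitamin/mineral (combination) product",
    source="calcium carbonate (source not specified; assumed synthetic)",
    part_of_source="not applicable",
    dietary_ingredients=["niacin"],  # placeholder; the full panel is not listed in the text
    non_dietary_ingredients=["cornstarch", "silicon dioxide", "hydrogenated palm oil"],
)
print(mvm.by_facet()["H"]["non-dietary"])
```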

In Figure 1, the source of the nutrient or dietary ingredient listed in the Supplement Facts panel is listed outside the Supplement Facts panel with all other ingredients in descending order by weight. The source of a nutrient or dietary ingredient can also be listed in parentheses after the name of the nutrient in the Supplement Facts panel. Both approaches are permitted under US FDA regulations. All other information entered in Table 2 to describe the MVM product, such as packaging, was obtained from the product as purchased.

Although it may appear that an enormous amount of information must be compiled for each product, in practice only pertinent information is recorded. The indexer can also indicate that the information is “unknown”, i.e. simply not available, or that a certain facet is “not applicable” to a particular dietary supplement. For example, the geographic source of an ingredient is usually not available on a label (“unknown”). In contrast, Facet C (part of plant or animal) is unlikely to be relevant (“not applicable”) when the main ingredient (Facet B) is a chemical (e.g. calcium carbonate). Although a researcher may not search a database for “unknown” or “not applicable” entries, the distinction between the two is useful information for researchers.
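One way to keep "unknown" and "not applicable" distinct in a database is to store them as two separate sentinel values rather than a single empty field. The sketch below is an assumption about how an implementer might do this; the enum and function names are hypothetical and do not come from the LanguaL™ DS Thesaurus.

```python
from enum import Enum
from typing import Optional, Union


class Missing(Enum):
    UNKNOWN = "unknown"                # information exists but is not on the label
    NOT_APPLICABLE = "not applicable"  # the facet does not apply to this product


def part_of_source(source_is_chemical: bool,
                   part_on_label: Optional[str] = None) -> Union[str, Missing]:
    """Choose the Facet C value for a product's main ingredient."""
    if source_is_chemical:
        # A chemical such as calcium carbonate has no plant or animal part.
        return Missing.NOT_APPLICABLE
    if part_on_label is None:
        return Missing.UNKNOWN
    return part_on_label


print(part_of_source(source_is_chemical=True))
print(part_of_source(source_is_chemical=False))
print(part_of_source(source_is_chemical=False, part_on_label="root"))
```

With two sentinels, a later query can separate products for which a facet was never relevant from those for which the label simply did not provide the detail.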

3 Conclusion

The proposed modifications of the LanguaL™ thesaurus described in this paper were used to help build the LanguaL™ DS Thesaurus. This Thesaurus will assist with classifying products and retrieving information about dietary supplements in databases. This tool should be of value to researchers and stakeholders, as it will improve the precision of matching dietary supplement products recorded in food consumption surveys with the composition of these products as indexed in dietary supplement databases. This more precise matching should result in more accurate estimates of nutrient intakes.

There are many advantages to using the LanguaL™ thesaurus to index dietary supplements in databases. Key applications and examples of these applications, some of which are also applicable to food composition databases, are summarized in Table 3.

Table 3
Use of LanguaL™ in Dietary Supplement Databases

The LanguaL™ DS Thesaurus enables database developers to catalogue and link their data and facilitate the sharing of databases. It should aid in better estimates of nutrient intakes from dietary supplements, because there will be a uniform approach to how data about dietary supplements products will be described, recorded, and retrieved. However, it is not intended to and will not resolve all the challenges faced in the development and maintenance of dietary supplement databases.

Several considerations in the development of the schema were discussed during preliminary meetings. These considerations, and how they will be addressed in the LanguaL™ DS Thesaurus, are summarized in Table 4. The authors welcome comments and suggestions for improving the indexing system.

Table 4
Considerations in the development of LanguaL™ Indexing System for dietary supplements


  • A shared tool that enables classifying dietary supplement products in databases.
  • Supports a uniform approach to how supplements are described, recorded, and retrieved.
  • Enables database developers to link their data and share supplement databases.


The Office of Dietary Supplements, National Institutes of Health, funded the development of the LanguaL™ Dietary Supplement Structured Vocabulary for use in the United States.



¹ Under the Federal Food, Drug and Cosmetic Act and its amendments, food labeling is required for most prepared foods, such as breads, cereals, canned and frozen foods, snacks, desserts, drinks, etc. Nutrition labeling for raw produce (fruits and vegetables) and fish is voluntary. Such products are referred to as "conventional" foods. Dietary supplements are a special category of products that comes under the general umbrella of foods, but which has separate labeling requirements.

² The Dietary Supplement Ingredient Database ad hoc Working Group was formed to assist with the development of the Dietary Supplement Ingredient Database (DSID). Information about DSID can be found at the DSID website:


No conflicts of interest reported by the authors.

Disclaimer: The findings and conclusions in this report are those of the author(s) and do not necessarily represent the views of the Office of Dietary Supplements, NIH, US FDA, CDC, the USDA, or any other entity of the US government.


  • Dwyer JT, et al. Progress in developing analytical and label-based dietary supplement databases at NIH’s Office of Dietary Supplements. Journal of Food Composition and Analysis. 2008;21:S83–S93.
  • Eurofoods Working Group on Food Description, Terminology and Nomenclature, Report by the COST Action 99. [Retrieved October 9, 2011];LanguaL 2000 Thesaurus. 2000 Report No. EUR 19540. from Danish Food Information's Website:
  • Hendricks TC. LanguaL: An automated method for describing and retrieving data about food. In: Sempolus AP, Butrum RR, editors. International Food Data Bases and Information Exchange. Vol. 68. World Rev Nutr Diet. Basel, Karger; 1992. pp. 94–103.
  • McCann A, et al. FDA’s Factored Food Vocabulary for food product description. J Am Diet Assoc. 1988;88:336–341.
  • Pennington JA, Butrum RR. Food descriptions using taxonomy and the “LanguaL” system. Trends in Food Science & Technology. 1991 Nov;:285–288.
  • Pennington JA, Hendricks TC. Proposal for an international interface standard for food databases. Food Additives and Contaminants. 1992;9(3):265–275.
  • Pennington JA, et al. International interface standard for food databases. Food Additives and Contaminants. 1995;12(6):809–820.
  • Saldanha LG, et al. Online dietary supplement resources. J Am Diet Assoc. 2010;110(10):1426–1431.
  • Dietary Supplement Health and Education Act of 1994. [Retrieved October 9, 2011];Pub Law 103–417, 108 Stat. 4325, Bill Number S.784. 1994 Oct 25; from US Government Printing Office Web Site:
  • Agriculture Research Service, US Department of Agriculture. [Retrieved October 9, 2011];Dietary Supplement Ingredient Database (DSID) 2011 from the Dietary Supplement Ingredient Database (DSID) home page:
  • US Food and Drug Administration. [Retrieved October 9, 2011];A Dietary Supplement Labeling Guide. 2005 from the Dietary Supplements Website:
  • Code of Federal Regulations (CFR), US Government Printing Office, Title 21: Food and Drugs, Part 101—Food Labeling. [Retrieved October 9, 2011]; from the Government Printing Office Website: