Because of its uniqueness and size, the study by McCann et al. (2007) drew renewed attention to the food color debate, which was introduced to the environmental health community by an article in Environmental Health Perspectives (Barrett 2007), followed by a letter from Weiss (2008). Barrett (2007) solicited a response from a spokesperson for the FDA (Mike Herndon), who replied as follows:
However, we have no reason at this time to change our conclusions that the ingredients that were tested in this study that currently are permitted for food use in the United States are safe for the general population.
The article by McCann et al. (2007) elicited a petition to the FDA from the Center for Science in the Public Interest (CSPI), a public interest group that earlier had called for a ban on food colors (CSPI 2008). This petition, together with congressional interest and media publicity, led to an FDA decision to review the food color literature and to hold a public hearing before its established Food Advisory Committee. The hearing was held on 30–31 March 2011. After listening to testimony from FDA reviewers and the public, the committee concluded that the evidence was too inconclusive to link food colors to hyperactivity and insufficient to recommend warning labels for products containing artificial food colors. (I testified before the committee that the available evidence indicated a connection between adverse behavioral responses and food color consumption.)
As described by the FDA Food Advisory Committee (2011a), the FDA framed the question put to the advisory committee primarily in the form, “Are food colors a cause of hyperactivity?” Only as a secondary question did the FDA ask if food colors might be a source of other kinds of adverse behavioral responses. Although the food color question was framed quite narrowly by the FDA, it is representative of many of the questions that confront the environmental health sciences. What kind of data—and how much data—does it take to render an outcome conclusive enough for action? The committee decision and the FDA’s current view (as quoted by Barrett 2007) signify a group of persistent questions pertaining both to environmental health science and to regulatory practices. In this commentary, I try to place the FDA committee decision in this broader context.
Identifying the appropriate measures. The FDA described the committee’s mission in these terms (FDA Food Advisory Committee 2011a):
The task before this Food Advisory Committee is to consider available relevant data on the possible association between consumption of synthetic color additives in food and hyperactivity in children, and to advise FDA as to what action, if any, is warranted to ensure consumer safety.
Because the charge did not explicitly conform to the DSM-IV definition of ADHD, which is multifaceted, it was somewhat ambiguous.
Two review documents were contracted for and submitted to the Food Advisory Committee before the meeting: a background document, describing the FDA’s history of food color regulation (FDA Food Advisory Committee 2011a), and a literature review of publications about the connections between food colors and hyperactivity (FDA Food Advisory Committee 2011b). These documents provided the basis for the review presented to the committee by the FDA Office of Food Additive Safety.
In its review, the FDA apparently decided to focus on Feingold’s 35-year-old hypothesis (Feingold 1975) rather than on the broader environmental issue of whether food colors may induce adverse behavioral responses. This is a broader issue because, as noted above, most U.S. children, not just those diagnosed with ADHD, consume synthetic food colors in their diet.
Moreover, few of the artificial food color challenge studies were designed to test the hypothesis that food colors cause ADHD as defined by the DSM-IV. No one, of course, can specify any predominant cause of ADHD. It is clearly a multicausal disorder as well as one with notable variation in expression. The food color literature is aimed mostly at the short-term effects of challenges, not chronic disease. Although the questionnaires, rating scales, and performance assays prominent in ADHD research have proven useful in challenge studies, they do not encompass all the behaviors evoked by food colors. Swanson and Kinsbourne (1980) found that performance on a paired-associate learning task deteriorated after administration of a color mixture challenge. Goyette et al. (1978; see also Conners et al. 1976) identified 3 of the 16 children they assessed as responders by their performance on a visual tracking task. Even the FDA review observed that measures confined to ADHD symptoms may not reflect responses evoked by food colors. It noted the following in discussing a study by Rowe and Rowe (1994):
The behavioral effects elicited by the tartrazine challenges, however, involved irritability, fidgetiness and sleep problems which are not typically representative of hyperactivity related behaviors. Several other investigators also reported behavioral responses to color challenge that were not particularly characteristic of ADHD. (FDA Food Advisory Committee 2011b)
By narrowing the scope of the committee’s task to a judgment of whether artificial food colors are associated with ADHD, the FDA Food Advisory Committee (2011b) effectively eliminated a much more relevant and important question: Is there evidence that food colors are behaviorally toxic to the general population of children?
The large investment by the National Institute of Environmental Health Sciences (NIEHS) in bisphenol A research is, in many ways, a design for answering questions of similar scope. Bisphenol A is often labeled as “estrogenic.” Had the NIEHS bisphenol A initiative been restricted to this question (Spivey 2009), it might have limited its breadth only to questions bearing on the chemical’s alleged estrogenic properties. The NIEHS, however, recognized the scope of associations between bisphenol A exposure and health effects, including those such as obesity and externalizing behavior in young girls, that could not be linked firmly to estrogenicity, if at all. Analogously, if questions about the adverse health effects of airborne particulates had been restricted to lung function, the superficially obvious target organ, the association with cardiovascular function, its primary adverse effect, would have been overlooked.
One possible source of the FDA review’s misleading charge may be its limited view of brain–behavior relationships. In summarizing its findings, the FDA Food Advisory Committee (2011a) offered the following statement:
For certain susceptible children with attention deficit/hyperactivity disorder and other problem behaviors, however, the data suggest that their condition may be exacerbated by exposure to a number of substances in food, including, but not limited to, synthetic color additives. Findings from relevant clinical trials indicate that the effects on their behavior appear to be due to a unique intolerance to these substances and not to any inherent neurotoxic properties.
This statement surely does not mean to assert that the central nervous system is not the essential substrate for behavior or that behavior is a phenomenon independent of the brain. Its roots perhaps may be found in how toxicology was practiced in the past, when pathology—overt tissue damage—was far more important than function in assessing chemical safety.
Identifying special populations.
The literature on behavioral toxicity of food additives is replete with observations by investigators—and by much of the applicable data—that not all children are sensitive to additives in general, or food colors in particular, at common dietary levels. Indeed, not even Feingold asserted that all hyperactive children were sensitive to food additives. In a convincing example of such findings, Rowe and Rowe (1994), in a double-blind controlled challenge study with tartrazine, identified a subgroup of 24 children within their sample of 54 that responded consistently on each occasion that they consumed a color rather than a placebo capsule. Moreover, these children displayed a clear dose–response function, with the higher doses eliciting higher scores on their 30-item behavior inventory, including five clusters of related behaviors: a) irritability/control, b) sleep disturbances, c) restlessness, d) aggression, and e) attention span.
The FDA review, however, seemed to insist that proving a connection between food color ingestion and adverse behavioral effects requires a uniformity of response in the sample under study that is virtually impossible to achieve in the diverse human population. For example:
Generally, the various reported findings across these 10 reviewed post-1982 portion of Group I trials, suggests that certain susceptible subgroups of problem behavior children with and without ADHD and, possibly, certain susceptible children from the general population without particular behavioral problems may exhibit a unique intolerance to artificial food colors resulting in typically small to moderate adverse behavioral changes which may not necessarily be characteristic of the ADHD syndromes. (FDA Food Advisory Committee 2011b).
Such a rejection of evidence stemming from data suggesting a subpopulation of children with enhanced sensitivity to food colors is perplexing. The FDA review implies that, because such a subpopulation may represent only a small proportion of children (hardly a proven proposition), it does not represent a significant health problem. Such a contention is inconsistent with the tenets of public health. Much of biomedical research, including environmental health research, is devoted to identifying and treating especially sensitive or vulnerable subpopulations. The underlying health goals of the Human Genome Project surely embraced that perspective. FDA drug warnings often are directed at special subpopulations. Finally, the FDA view on how this question pertains to food colors is an outlier among federal agencies. Note how the U.S. Environmental Protection Agency (2011) interpreted the Clean Air Act:
The National Ambient Air Quality Standards (NAAQS) are designed to protect the most vulnerable populations from outdoor air pollutants. Identifying these groups more precisely and understanding why they are more susceptible is of great importance to scientists and policy makers.
In its critique of McCann et al. (2007), the FDA is somewhat dismissive of the results, at least as conveyed by this statement:
Whatever behavioral changes [in the Southampton study] may have occurred were apparently of rather low magnitude (effect size of 0.18). This would suggest that the type of treatment effects reported in this study, even though the investigators referred to increases in levels of “hyperactivity,” were not the disruptive excessive hyperactivity behaviors of ADHD but more likely the type of overactivity exhibited occasionally by the general population of preschool and school age children. (FDA Food Advisory Committee 2011b).
This is a puzzling statement because an important facet of an ADHD diagnosis is excessive or inappropriate activity. Also, the DSM-IV lists six kinds of hyperactivity, not just the variation described above. The term “occasionally,” in this context, is at least equally puzzling. Respiratory infections are also occasional events for most children. If a survey were to find, say, a significant rise in the incidence of such infections among a group of schoolchildren, questions would be asked and actions possibly taken. This question, in fact, is the theme of many reports in environmental health.
The more significant paradox about the passage above by the FDA Food Advisory Committee (2011b) is its view that an effect size of 0.18 [in the range of many of the published studies (see Schab and Trinh 2004; Stevens et al. 2011)] can be considered trivial. Effect size is often used to gauge the importance or strength of a finding; therefore, how it applies to McCann et al. (2007)—and its interpretation—is worth examining with a more familiar example.
Consider Figure 1. For an IQ (intelligence quotient) distribution with a mean of 100 and an SD of 15 (which describes a standardized IQ test such as the Stanford-Binet), 2.3% of the population will receive a score of < 70, a score that many school districts will view as warranting remedial attention. Now, define effect size, as used by McCann et al. (2007) and typically in the psychological literature, in terms of the standardized mean difference:
effect size = (mean 1 – mean 2) ÷ (pooled SD).
Figure 1. For an IQ distribution with a mean of 100 and SD of 15 (e.g., the Stanford-Binet), 2.3% of the population will have an IQ score < 70, a score that many school districts consider warranting remedial attention.
If an environmental exposure shifts the mean by 3%, equivalent to an effect size of 0.2, to a mean of 97, 3.6% of the population represented by the distribution will have a score < 70. Based on Census 2000 counts, the U.S. government (Childstats.gov 2011) estimates that there are 76 million children 0–17 years of age in the nation. Of these, 1.75 million would be presumed to have an IQ score of < 70, given a mean of 100. A shift of the mean IQ to 97 would indicate that 2.74 million children would have an IQ < 70 (an increase of 990,000 children). Most observers would not consider this to be a value of “rather low magnitude.”
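The arithmetic above can be verified directly from the normal distribution. The following sketch (my illustration, not part of the original analysis) reproduces the 2.3% and 3.6% tail proportions and the resulting count of additional children scoring < 70, using only the assumptions stated in the text (normal IQ distribution, mean 100, SD 15, 76 million children):

```python
from statistics import NormalDist

# Baseline IQ distribution and the hypothetical exposure-shifted one.
baseline = NormalDist(mu=100, sigma=15)
shifted = NormalDist(mu=97, sigma=15)   # a 3-point (3%) downward shift

effect_size = (100 - 97) / 15           # standardized mean difference = 0.2

frac_below_70_before = baseline.cdf(70) # ~0.023 (2.3% of the population)
frac_below_70_after = shifted.cdf(70)   # ~0.036 (3.6% of the population)

children = 76_000_000                   # U.S. children 0-17 (Childstats.gov 2011)
extra = children * (frac_below_70_after - frac_below_70_before)

print(f"effect size: {effect_size:.2f}")
print(f"fraction < 70 before shift: {frac_below_70_before:.3f}")
print(f"fraction < 70 after shift:  {frac_below_70_after:.3f}")
# ~1 million additional children; the text's 990,000 reflects the
# rounded percentages (2.3% and 3.6%) used in the commentary.
print(f"additional children < 70: {extra:,.0f}")
```

The exact normal-tail calculation gives roughly one million additional children, consistent with the commentary's rounded figure of 990,000.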
Figure 2 presents another set of implications based on an effect size of 0.2, or a 3% shift in IQ. It depicts the calculations by Herrnstein and Murray (1994) of the broader social consequences of a population IQ increase of 3%, which were converted into the effects of a corresponding decrease in IQ by Weiss and Bellinger (2006). Although some of the presuppositions of these authors have aroused controversy, the relationships between IQ scores and lifetime earnings (e.g., Grosse et al. 2002), and how income influences the outcomes shown in Figure 2, lend credibility to the calculations.
Figure 2. Depiction of calculations by Herrnstein and Murray (1994) of societal benefits achieved by a 3% rise in the population IQ.
One other aspect of effect size calculations that the FDA review failed to consider is how such values are influenced by population heterogeneity. In their analysis, Weiss and Bellinger (2006) showed how effect size calculations can be distorted if the sample population contains two subpopulations. Assume that the sample population consists of 70% nonresponders and 30% responders, and that the mean of the responders is shifted by 1 SD when presented with a challenge, such as a food color. Under these conditions, the effect size observed for the whole sample is diluted far below the 1.0 shift seen among responders, and a total sample of 265 subjects would be required to detect the effect. It is easy to see how true effects of a food color challenge in an unselected population can be missed if the sample size is small or if a minority of the sample consists of responders. Given such circumstances, it made sense for some investigators, such as Rowe and Rowe (1994), to screen subjects for responsiveness to an elimination diet before undertaking the tartrazine challenge portion of their study. It is analogous to the cancer bioassay strategy of using high doses to identify carcinogenic potential in reasonably small samples of rodents.
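The dilution argument can be illustrated with a simple simulation. The sketch below (my illustration, not the authors' calculation) draws a large simulated sample in which only 30% of children respond to the challenge, each responder shifting by 1 SD, and computes the pooled effect size for the whole sample:

```python
import random
from statistics import mean, stdev

random.seed(1)

def simulate(n, responder_fraction=0.3, shift_sd=1.0):
    """Pooled effect size for a mixed sample of responders and nonresponders."""
    # Unchallenged (placebo) scores: standard normal for everyone.
    control = [random.gauss(0, 1) for _ in range(n)]
    # Challenged scores: only a minority of subjects shift, by shift_sd.
    challenged = [
        random.gauss(shift_sd if random.random() < responder_fraction else 0.0, 1)
        for _ in range(n)
    ]
    pooled_sd = ((stdev(control) ** 2 + stdev(challenged) ** 2) / 2) ** 0.5
    return (mean(challenged) - mean(control)) / pooled_sd

d = simulate(100_000)
# Observed effect size is roughly 0.3, far below the 1.0 shift in responders,
# because the 70% of nonresponders dilute the group mean.
print(f"observed effect size: {d:.2f}")
```

With a 30% responder fraction, the group-level effect size falls to roughly a third of the responders' true shift, which is why small unselected samples can miss real effects and why prescreening for responsiveness, as Rowe and Rowe (1994) did, is a defensible design choice.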