Historically, toxicology has played a significant role in verifying conclusions drawn on the basis of epidemiological findings. Agents that were suggested to have a role in human diseases have been tested in animals to firmly establish a causative link. Bacterial pathogens are perhaps the oldest examples; tobacco smoke with lung cancer and asbestos with mesothelioma are two more recent ones. With the advent of toxicity testing guidelines and protocols, toxicology took on the role of anticipating or predicting potential adverse effects in humans, and epidemiology, in many cases, served to verify or negate these toxicological predictions. The coupled role of epidemiology and toxicology in discerning the human health effects of environmental agents is obvious, but there is currently no systematic and transparent way to bring the data and analysis of the two disciplines together into a unified view of a causal relationship between an agent and a disease. In working to advance the interaction between the fields of toxicology and epidemiology, we propose here a five-step “Epid-Tox” process that would focus on: (1) collection of all relevant studies, (2) assessment of their quality, (3) evaluation of the weight of evidence, (4) assignment of a scalable conclusion, and (5) placement on a causal relationship grid. The causal relationship grid provides a clear view of how epidemiological and toxicological data intersect, permits straightforward conclusions with regard to a causal relationship between agent and effect, and can show how additional data can influence conclusions of causality.
In 1775, Percivall Pott concluded, on the basis of clinical observations, that scrotal cancer in chimney sweeps was caused by chimney soot (Potter, 1962). It was almost 140 years before experimental confirmation of this was produced by Yamagiwa and Ichikawa (1918). By repeated painting of rabbit ears with a coal tar extract, they produced epithelial skin tumors, powerfully corroborating what Pott had seen in humans. In this case, an inference of causation in humans was arrived at through a combination of the two scientific disciplines.
Subsequently, animal studies were used to verify other epidemiological findings, serving to establish Koch's third postulate: the agent should cause the disease when introduced into a healthy organism (Koch, 1884, 1893). Although Koch's original intent was proving disease causation by microbiological pathogens, this third postulate has also been applied to corroborating chemical-related epidemiological findings in humans, by testing in animals.
Although there are a number of examples of how epidemiology and toxicology intersected over time, perhaps the most notable case is tobacco smoke and lung cancer. By 1964, there was ample epidemiological evidence for a causal connection between lung cancer and smoking tobacco products; at that point, the U.S. Surgeon General (Bayne-Jones et al., 1964) accepted the relationship as causal. Yet at that time, toxicologists could not reproduce similar tumors in animal models. This lack of concordance emphasized the difficulty of using Koch's postulates that were established for infectious disease in chemical-related pathogenesis. Toxicological corroboration of epidemiological evidence later became the element of “biological plausibility” in Hill's guidelines for establishing causality (Hill, 1965). Hill and others (Bayne-Jones et al., 1964) effectively modified Koch's third postulate from an orientation of proof to one of plausibility.
Whereas the concordance was high between cancer-causing agents initially discovered in humans and positive results in animal studies (Tomatis et al., 1989; Wilbourn et al., 1984), the same could not be said for the reverse relationship: carcinogenic effects in animals frequently lacked concordance with overall patterns in human cancer incidence (Pastoor and Stevens, 2005). This lack of concordance between toxicology and epidemiology might arise because the high doses used in animal studies to produce tumors are not typically seen in human populations.
Nonetheless, toxicology took on a predictive rather than a confirmatory role by providing alerts for potential effects in humans, whether carcinogenic, neurotoxic, hepatotoxic, or any other adverse outcome. These alerts became the basis for regulating chemical exposure to humans. The underlying assumption was that restricting exposure well below levels at which adverse effects were seen in animals would prevent harmful outcomes in humans.
Thus, the relationship between epidemiology and toxicology has shifted over time. Both disciplines seek to contribute data relating to the causes of human disease and occasionally lean on each other to support propositions of causality. Toxicologists and epidemiologists alike spend considerable time and effort characterizing the relationship between the putative causal agent and a response (Fig. 1). Many of the same fundamental considerations are part of the evidence-based analysis that takes place by scientists in the two disciplines. However, the two fields could arguably be said to work in parallel rather than in concert. Can toxicological experimentation augment a weak positive epidemiological finding? Conversely, when and how does low biological plausibility influence a positive epidemiological finding? Separately, the fields can derive conclusions based on paradigms illustrated in Figure 1. Together, conclusions of causality can be more firmly based, further investigations can be clearly identified, and improvements in human health protection can be achieved. In addition to highlighting the history of relevant developments in the fields, we suggest a way that the two disciplines can come together to better understand the impact, potential or real, of agents on human health.
The disciplines of toxicology and epidemiology ask the same question: can a substance cause a particular effect in humans? The data obtained in toxicological and epidemiological studies do not always lead to a straightforward interpretation, and different observers will often differ in their conclusions. Even for associations that are widely regarded as causal today (such as ingestion of water contaminated with the bacterium Vibrio cholerae and the incidence of cholera, or cigarette smoking and the incidence of lung cancer), there was considerable disagreement about the presence of a cause-effect relation for some years after the relevant data became available. Indeed, a principle underlying the philosophy of science is that causality cannot be “proven”; it can only be inferred with different degrees of certainty.
Epidemiological investigation of a null hypothesis that postulates that a variable has no effect on a health outcome can never be established to be true (Popper, 1959); there can only be a failure to show that the null hypothesis is false within the limits of specific study designs. Theories that integrate observations from multiple studies, or rely on other biological considerations, are useful when they make testable predictions. Hypotheses that are not testable do not fall within the realm of science. Likewise, expert opinion should be supported by evidence for rational science-based decision making (Guzelian et al., 2005).
Since Hill (1965) and others (Bayne-Jones et al., 1964) articulated their perspectives on causal inference, scientists have further described methods to systematically review and characterize the evidence that might be used to support an inference of causality (Cole, 1997; ECETOC, 2009; Kundi, 2007; Phillips and Goodman, 2004; Rothman, 1976; Rothman and Greenland, 2005; Susser, 1986; Weed, 2005). We suggest an expert judgment process for integrating the totality of the epidemiological findings in a weight of evidence framework. This integration takes note of the literature cited above but extends it by offering a method to systematically consider biological plausibility and epidemiological evidence together, uniting epidemiology and toxicology in a single framework for inferring causality.
Epidemiological studies document the occurrence of illness or injury in human populations. Depending on the design, epidemiological studies can provide evidence bearing on a causal relationship. For example, quantification of the efficacy of pharmaceutical agents in humans is often based on randomized controlled studies, where “exposed” and “nonexposed” persons are similar with regard to other characteristics that bear on the outcome in question. However, the focus of this paper is on causal inference for environmental agents (primarily synthetic chemicals). Because randomized trials with environmental agents are rarely feasible and may not be ethical, this study design will not be discussed here.
Studies aimed at evaluating environmental chemicals and other environmental factors are generally nonrandomized observational studies with an ecologic, case-control, or cohort design. Although these studies are fundamental in gauging possible human health effects, their design may limit the extent to which inferences about causality can be drawn. Because observational studies do not randomly allocate subjects to exposure, interpretation of the results of these studies must take into account any differences, or the possibility of differences, between exposed and nonexposed subjects. A brief description of these observational studies and their strengths and weaknesses follows.
Ecologic studies contrast the incidence of disease across populations (or population subgroups) that differ in terms of presence or degree of an exposure to an environmental factor. The incidence of disease among different population subgroups may be evaluated on the basis of, for example, geographical differences or changes in disease incidence over time within a population.
Ecologic studies have the potential to contribute to our understanding of exposure-disease relationships if …
For example, aflatoxin (a toxic product of Aspergillus flavus) was found early on in experimental evaluations in animals to be an extremely potent carcinogen. At that time, the only relevant data in humans took the form of correlations of liver cancer mortality rates across population groups with marked differences in estimated aflatoxin intake. The positive correlation observed in these studies was open to alternative interpretations—the populations with the highest rates differed in ways other than exposure to aflatoxin such as the prevalence of Hepatitis B infection—so it was largely the strength of the laboratory evidence that served as a basis for a tentative causal inference. Later, stronger epidemiologic data became available supporting a causal effect. In particular, there were ecologic studies with less potential for confounding and nested case-control studies in which prediagnosis urinary markers of aflatoxin intake could be assessed (Qian et al., 1994; Wang et al., 1996).
In practice, causal inferences that can be drawn from the results of ecologic studies may be limited because:
Case-control studies ascertain the proportion of persons who previously experienced one or more exposures among persons with a disease (cases) and a sample of persons representative of the person-time from which the cases were generated (controls).
Exposure ascertainment is a potential source of bias in case-control studies, notably in studies investigating environmental exposures because they may be incompletely or inaccurately reported or recorded and misclassification may vary by case-control status. Thus, the results can indicate either spuriously high or spuriously low estimates of the magnitude of any association. Direct measurements of blood or tissue levels of chemical exposures (or metabolites of these chemicals) obtained after diagnosis in the cases may not reflect earlier levels of exposure because the illness and its treatment may have led to an alteration in these levels. Even if exposure levels of cases were unaffected by disease status among cases, the levels measured at the time of the study may not be indicative of those present earlier in life when critical pathogenic events occurred.
Unless a case-control study can overcome the difficulties of valid retrospective ascertainment of exposure status, it cannot be confidently relied upon to provide a valid estimate of the association between an exposure and a disease. Cross-sectional studies, in which current exposure levels are compared between persons with and without a given condition at the time of the assessment of the exposure (irrespective of when that condition first developed), are particularly problematic in this regard.
In unusual circumstances—when the proportion of ill individuals with a history of a given exposure far exceeds what might be expected—an association can be inferred without the need for a formal control group. For example, because all cases of a form of pneumonia in an area of Spain during a relatively short period of time reported ingestion of adulterated rapeseed oil (Tabuenca, 1981), it was reasonable to infer a causal connection (and to take preventive action) prior to the enrollment of controls into this study.
It often happens that the same chemicals to which one or more communities are exposed also are encountered in persons who work in the manufacture or distribution of these chemicals. Because these exposures tend to be higher than those received in a community at large, any impact on disease risk from exposure to the agent is likely to be greater in magnitude in the exposed workforce and therefore easier to ascertain in an epidemiologic study.
Because it is often possible to identify workforce members and monitor their status through vital records and disease registers, epidemiologic studies based on the workers’ experience are feasible. The results of occupational cohort studies can nevertheless be difficult to interpret because of the presence of multiple chemical exposures on the job and, particularly in retrospective cohort studies, difficulties in accounting for prior work history and other disease-causing exposures not ascertained in available records such as smoking history. Nonetheless, occupational cohort studies have contributed a great deal to our understanding of health effects of chemical exposures and, when available, can assume a prominent place in the evaluation of the safety of exposure to chemicals.
In toxicology, the test agent is given to the animal or in vitro cellular system under clearly defined exposure conditions (e.g., oral, dermal, inhalation; gavage, diet; short term, long term, etc.). The physiological status of each test group is compared with the untreated group. From this body of data, a toxicologist must then decide which responses are exposure related and determine whether the responses observed are relevant to humans. In the absence of evidence to the contrary, the toxicologist assumes that findings in animals are likely to be relevant to human health.
However, as our understanding of biological systems has evolved, we realize that effects in animals may not be relevant to humans. The need for this important distinction depends on either qualitative differences in biology or quantitative differences between animals and humans in the kinetics of the chemical or the dynamics of the response.
A systematic way of drawing conclusions of human relevance (causality) was first proposed by the International Programme on Chemical Safety (Sonich-Mullin et al., 2001) and later expanded substantially with the development of frameworks for evaluating the human relevance of mode of action (MoA) in experimental animals for carcinogens (Boobis et al., 2006; Cohen et al., 2003; Klaunig et al., 2003; Meek et al., 2003) and for noncancer effects (Boobis et al., 2008; Seed et al., 2005). Julien et al. (2009) carried this concept one step further and proposed the key events dose-response framework (KEDRF). KEDRF is a step-wise decision-logic process that provides a foundation for more rigorous and quantitative descriptions of dose-response.
The essential form of MoA analysis asks three questions to establish the likelihood of a chemical's potential effect on humans (Fig. 2):
If the answer is YES to all three questions, then the effect seen in animals could plausibly occur in humans. Conversely, if the MoA is considered not relevant to humans, then the biological plausibility of the effect occurring in humans through the proposed MoA is low.
In one of the initial essays that wrestled with the bases for inferences of the causes of disease, Hill (1965) concluded that it is not possible to “lay down some hard-and-fast rules of evidence that must be obeyed before we accept cause and effect.” In practice, tentative inference regarding the presence or absence of a causal relation between exposure and disease is made through a subjective process in which one considers which of the indicated features are present and, in particular, the degree to which they are present. Occasionally, the process is straightforward—all the evidence supports a causal hypothesis—and nearly everyone who addresses the issue arrives at the same conclusion, for example, that cigarette smoke is a cause of lung cancer. The evidence is considered “conclusive”; causation is viewed as “definitely present.”
A similar conclusion occasionally can be reached when the epidemiologic data are overwhelming, even without supporting evidence from other medical disciplines. For example, the extremely strong association seen in epidemiologic studies between aspirin use and Reye's syndrome, combined with the absence of any similar association with the use of other analgesics (Halpin et al., 1982; Hurwitz and Schonberger, 1987; Forsyth et al., 1989), served as a solid basis for discouraging aspirin use in children, even without any precise knowledge at that time of how aspirin might have caused a child with flu or chicken pox to develop this illness.
In other instances, little or no evidence suggests causation, such as in the published literature relating exposure to magnetic fields and the occurrence of cancer. In this case, it is likely that most groups of experts would conclude that there is no evidence for an etiologic connection between exposure to magnetic fields and cancer in adults, that is, there is “no evidence supporting causality.” Because it is not possible to rule out a weak effect of exposure on disease incidence, it is not surprising that some debate continues regarding the safety of exposure to magnetic fields.
These instances of varied information from toxicology and epidemiology argue for a systematic approach that brings comprehensive, disciplined thinking into a complete and rational evaluation of the evidence. Such a systematic treatment lays on the table the complete story and gives practitioners a way to point to specific gaps in knowledge or lapses in logic based on the totality of information.
Overall, the Epid-Tox Framework follows a series of steps that assesses an explicit effect such as a specific cancer, neurological disease, or any tissue or system-specific adverse effect. The following steps would be to:
This may be too obvious, but a serious source of bias is the selective collection of studies. A comprehensive search for all studies relevant to the end point in question should be conducted and documented as part of the process. This step is meant to be as inclusive as possible, bearing in mind that the process begins with a specific question: does agent X cause effect Y? All studies that offer data should be included at this point. One problem that continues to plague both toxicology and epidemiology is the nonpublication of “negative” studies wherein no effects were seen and investigators and journals are reluctant to publish such information. Nonetheless, no-effect studies are an important part of the total available data set and their absence biases the overall judgment in favor of studies showing effects.
Both kinds of studies, in epidemiology and toxicology, may present the observer with a wide range of investigations carried out in variable ways, with differing entry or exclusion criteria, variable ascertainment of effects, a range of exposures or exposure estimations, and differing observational endpoints. No study should be excluded at this stage of consideration. Once all available studies have been collected, each study should be included or excluded by using a transparent rationale. Both disciplines have generally accepted criteria for assessing study quality.
The U.S. Environmental Protection Agency (USEPA) developed quality criteria that are typically applied in the evaluation of studies submitted for regulatory purposes. Various terms have been used to describe a study's suitability, relevance, conduct, and how well the study satisfies the intent of a particular guideline (USEPA, 1993), including “core guideline,” “core minimum,” “core supplementary,” or “invalid.” Core guideline indicates an acceptable study, whereas core minimum indicates that “while some things are missing, the study still fulfills the guideline requirements.” Core supplementary has been used to identify studies with “… a significant deficiency or that additional information is required.” Terms have changed over the years to “Acceptable” and “Unacceptable,” with additional statements as to whether a study is upgradable to Acceptable status (USEPA, 2001). For the purpose of the Epid-Tox framework, the extremes of Acceptable and Unacceptable are useful categories. There are clearly going to be well done studies with verifiable conclusions and on the other hand studies with inapplicable methods, inappropriate data, or unsubstantiated conclusions. As well, an intermediate category is needed to categorize studies that have deficiencies that render them less than fully acceptable, but have sufficient quality that they cannot be regarded as unacceptable. Thus, the suggested categories include Acceptable, Supplemental, and Unacceptable (Fig. 3).
Similar to all scientific investigations, no epidemiological study is perfect; all have limitations to some extent. Nonetheless, experienced epidemiologists can evaluate the strengths and weaknesses of individual studies and categorize them as to whether they can be used to inform a judgment regarding causality. ECETOC (2009) has an excellent rendition of quality criteria that are based on elements of study design, exposure information, and health effects data. However, no objective, numerical yardstick exists to grade the quality of epidemiology studies.
Certainly, there is a subjective element in the categorization process. However, it is better to take a study's quality into account, acknowledging the imperfection of the process, than to give each study an equal weight. How this is done might vary from investigator to investigator, but in any case, the process of quality categorization needs to be transparently documented in the evaluation. What one investigator may find to be an acceptable study might be rejected by another. The value of this step in the Epid-Tox Framework is to fully reveal and document not just the investigator's quality categorization but the reason for drawing a particular conclusion.
Documentation of these evaluation and categorization decisions can be provided in narrative form for individual studies. Study attributes to be considered include, but may not be limited to, the number of subjects, the range of exposure levels among these persons, study enrollment methodology, disease and exposure ascertainment methods, potential information bias, identification and measurement of potential confounders, and the statistical methodology used to assess associations and to control for confounders.
As more reports of epidemiology studies include a complete description of the design and analytic methods used, the information needed to perform a quality assessment will be more readily available. A reporting checklist for observational studies, known as the “STROBE statement,” was developed by von Elm et al. (2007) and serves as a means of evaluating the quality of a study's reporting. In a similar initiative, the London Principles for Epidemiology itemized the attributes that characterize well-conducted observational epidemiological studies (Graham, 1995; London Principles, 1996).
However it might be measured, the quality of epidemiological studies will likely be distributed from useful to useless. For practical purposes, studies can be put into discrete categories of quality similar to those used for toxicology studies. Studies that are well designed, relatively free of bias, and have adequate control of known confounders are classified as Acceptable. Supplemental studies would have more serious imperfections and be of lesser quality but still be usable. Unacceptable studies would fail to meet several or all of the quality criteria and would not be used in subsequent steps in the evaluation (Fig. 3).
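The three quality categories and the rule that Unacceptable studies drop out of later steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Study` record, its field names, and the example studies are hypothetical and are not part of the framework as published. The `rationale` field reflects the requirement that each categorization be transparently documented.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    """Quality categories named in the text (Fig. 3)."""
    ACCEPTABLE = "Acceptable"
    SUPPLEMENTAL = "Supplemental"
    UNACCEPTABLE = "Unacceptable"


@dataclass(frozen=True)
class Study:
    # All field names here are hypothetical, chosen for illustration only.
    label: str        # identifier for the study
    discipline: str   # "epidemiology" or "toxicology"
    quality: Quality  # category assigned by the evaluator
    rationale: str    # documented reason for the assigned category


def studies_for_evaluation(studies):
    """Keep Acceptable and Supplemental studies; drop Unacceptable ones."""
    return [s for s in studies if s.quality is not Quality.UNACCEPTABLE]


# Invented example studies, not taken from any real data set.
studies = [
    Study("cohort-A", "epidemiology", Quality.ACCEPTABLE,
          "adequate control of known confounders"),
    Study("case-control-B", "epidemiology", Quality.SUPPLEMENTAL,
          "retrospective exposure ascertainment is uncertain"),
    Study("bioassay-C", "toxicology", Quality.UNACCEPTABLE,
          "unsubstantiated conclusions"),
]

retained = studies_for_evaluation(studies)
print([s.label for s in retained])
```

Keeping the rationale alongside the category means that two evaluators who disagree about a study can at least see why each reached a particular conclusion.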
With all Acceptable and Supplemental studies at hand, the question is asked, “Is the effect of interest present?” If there is evidence for a specific effect in some or all animal studies, the next stage in the evaluation is to use MoA analysis to determine human relevance (Fig. 2). If the answer to all three MoA Framework questions is “yes,” then the effect is considered plausible to occur in humans. If the specific effect of interest is absent from the animal studies or the effect is present but judged by MoA analysis to be not relevant to human health, then the effect is concluded to have low biological plausibility to occur in humans.
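The decision logic in this step can be written down as a small function. The sketch below is hypothetical in its names and return values. It treats a “no” to any of the three MoA Framework questions as a judgment of non-relevance, which is an assumption made here for illustration, and it does not encode the wording of the questions themselves.

```python
def biological_plausibility(effect_in_animals: bool, moa_answers: tuple) -> str:
    """Return "plausible" or "low" following the decision logic in the text.

    effect_in_animals: whether the effect of interest appears in the
        Acceptable or Supplemental animal studies.
    moa_answers: yes/no answers (as booleans) to the three MoA Framework
        questions, in order. The questions are not represented here.
    """
    if len(moa_answers) != 3:
        raise ValueError("expected answers to exactly three MoA questions")
    # Plausible only if the effect is present and every answer is "yes".
    # Assumption: any "no" is treated as "not relevant to humans".
    if effect_in_animals and all(moa_answers):
        return "plausible"
    return "low"


print(biological_plausibility(True, (True, True, True)))
print(biological_plausibility(True, (True, False, True)))
print(biological_plausibility(False, (True, True, True)))
```

In practice each answer is itself an expert judgment resting on a body of evidence, so the booleans stand in for conclusions that would be documented elsewhere.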
Based on the evaluation of the complete set of studies categorized as Acceptable or Supplemental, a judgment is made as to whether or not there is an association between an agent and a given disease in humans as well as the strength of that association. This conclusion must be made from the totality of evidence and may require balancing conflicting studies to produce one encompassing statement about the epidemiological evidence.
Various approaches can be used to produce one encompassing statement, including the systematic use of the Hill criteria (Hill, 1965). But the essence of evaluating the weight of evidence relies upon several central concepts. These include, but are not limited to, an effect within and among the studies that is found with strength, consistency, specificity, and coherence (Cole, 1997; Lagiou et al., 2005). Whereas there are currently no hard-and-fast systematic, numerical characterizations that capture this expert judgment process, most practitioners would acknowledge that faced with an array of epidemiological studies, these concepts would guide their judgment in deriving a reliable encompassing statement of causal inference.
The ultimate value of the Epid-Tox Framework is to determine the degree of strength or likelihood of the effect of interest. Therefore, for both the epidemiological and toxicological findings, there needs to be a semiquantitative conclusion that states the degree to which the studies indicate a positive, a negative, or no relationship.
At the beginning of any epidemiological or toxicological evaluation, there has to be a starting point from which evidence pushes a conclusion toward the existence or lack thereof of causality. Starting at one end of a scale is not appropriate. Such a starting point implies that as studies are accumulated, a positive association will be identified when the reverse, a lack of association, may also become increasingly plausible as scientific evidence accumulates.
Therefore, the scaling of strength for an epidemiological or toxicological evaluation begins at the center of the scale and, depending on the presence or absence of the effect, moves in the positive or negative direction. By starting at the center of each scale (the middle of the grid), evidence of absence can be distinguished from absence of evidence for an association. For example, with few epidemiology studies of sufficient quality, one may have to state that there is an absence of evidence to conclude one way or another that there is a causal association. On the other hand, evidence for an absence of an epidemiological relationship can take either of two forms:
Figure 4 shows the Epid-Tox graphical template for establishing a causal relationship. Starting at the intersection of the x- and y-axes (middle of graph), the degree of biological plausibility (toxicology) is scaled on the y-axis and the degree (weight) of epidemiological evidence on the x-axis. The intersection of the toxicological and epidemiological scaling leads to an appropriate, evidence-based conclusion regarding causality.
The structure and appearance of the causal relationship graphic are fundamental to ensuing decisions about causality. Several factors led to the development of its form, including the impact of the degree of “positive” or “negative” data and the relative weighting of epidemiological studies versus toxicological studies. At the beginning of any analysis, there may be a dearth of either toxicological or epidemiological studies. In such a case, where the scaling remains at or near the center point, there is “insufficient information” to draw any conclusions. Note that the area of insufficient information is oblong and extends farther along the biological plausibility axis. The reason is that animal studies are surrogates for actual human data and as such require a greater degree of evidence than epidemiological studies.
In addition, as more studies and information become available, the scaling of either the toxicological plausibility or epidemiological evidence can change in a way that can be easily illustrated with the two-dimensional graphic.
At this point, the evaluator can clearly see where the epidemiological and toxicological evidence intersects and, based on that location on the graphic, draw an overall conclusion that starts with, “A causal relationship is …” and ends with words that describe the resultant area. Short, descriptive phrases are used here, but the underlying data and weight of evidence should be well understood at this point. The categories are as given below.
A causal relationship is “Likely” between the environmental factor and the disease condition. This implies that consistent, reliable evidence from epidemiological and animal studies permits a causal inference to be made. Two examples of this outcome are asbestos as a cause of mesothelioma and tobacco smoke as a cause of lung cancer.
A causal relationship is “Uncertain” between the environmental factor and the disease condition. In this case, there may be epidemiological evidence that can reasonably be interpreted as indicating a causal link. However, there may be little or no biological plausibility based on animal studies. Note in the lower right-hand corner of the grid that the transition between Likely and Uncertain favors epidemiological evidence. That is, with a high degree of epidemiological evidence, significant and compelling data for a lack of biological plausibility must exist to transition from Likely to Uncertain. This again stresses the primacy of epidemiological evidence.
For example, Kaposi's sarcoma, a normally rare tumor in humans, showed such a remarkable increased incidence following HIV infection (Sarid et al., 2002) that epidemiological criteria for a causal relationship were met (Fig. 5). However, at the time, no laboratory studies had verified the pathogen. Therefore, the association was categorized as likely but of low biological plausibility. Later, extensive laboratory investigations led to the discovery of a specific herpes virus (HHV8 or KSHV) that strengthened the inference of causality because of an increased knowledge regarding a likely pathogenesis of Kaposi's sarcoma.
Based on some early suggestive results, a number of epidemiologic studies have been done on the possible relation between exposure to electromagnetic fields (EMF) and the occurrence of brain cancer. The results from occupational studies—which typically involve higher levels of exposure than residential studies—have been summarized (Kheifets et al., 2008). The relative risk associated with EMF exposure was statistically significant (RR = 1.14, 95% CI = 1.07–1.22). The authors of the review, however, concluded that “the lack of a clear pattern of EMF exposure and outcome risk does not support a hypothesis that these exposures were responsible for the excess risk.” Biological plausibility is low in this example because “in vitro, in vivo, or mechanistic evidence has not provided clues” as to a basis for an association between exposure to EMF and the development of brain cancer (Kheifets et al., 2009). As shown in Figure 5, the initial analysis of epidemiological studies showed some evidence; however, in combination with low plausibility, the causal relationship would be considered Uncertain. With time, more recent studies—generally with relatively better exposure ascertainment—tended to observe an even smaller association than did earlier studies. The updated evaluation would move the categorization from Uncertain to Unlikely.
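The arithmetic behind the quoted relative risk can be checked with a short sketch. It uses only the three numbers given above (RR = 1.14, 95% CI = 1.07–1.22). The standard error is recovered from the interval width under the assumption of a normal approximation on the log scale, which is a common convention and not necessarily the method used by the review authors. All variable names are illustrative.

```python
import math

# Values quoted in the text for occupational EMF exposure and brain cancer.
rr, ci_lower, ci_upper = 1.14, 1.07, 1.22

# A 95% interval that excludes 1.0 corresponds to significance at the 5% level.
excludes_null = ci_lower > 1.0 or ci_upper < 1.0

# Assumption: the interval is symmetric on the log scale (normal approximation),
# so the standard error of ln(RR) is the log-width divided by 2 * 1.96.
Z_CRIT = 1.96
se_log_rr = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_CRIT)

# Approximate z statistic for the null hypothesis RR = 1.
z_score = math.log(rr) / se_log_rr

# Size of the association expressed as percent excess risk.
excess_risk_pct = (rr - 1.0) * 100

print(f"CI excludes 1.0: {excludes_null}")
print(f"approx. SE of ln(RR): {se_log_rr:.4f}")
print(f"approx. z: {z_score:.2f}")
print(f"excess risk: {excess_risk_pct:.0f}%")
```

The sketch shows why the two statements in the paragraph are compatible: the interval is narrow enough to exclude 1.0, yet the excess risk itself is small, so statistical significance alone says little about causality when no clear exposure pattern or biological basis is present.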
A causal relationship is “Uncertain” but plausible between the environmental factor and the disease condition. In this instance, the weight of evidence analysis of epidemiological studies shows little or no evidence of any effect although toxicological studies may indicate the plausibility of an effect in humans.
For example, as shown in Figure 6, melamine bladder and kidney toxicity seen in animal studies was considered relevant to human health, albeit only at very high exposures. But no epidemiological evidence supported a causal relationship. An initial evaluation placed melamine in this category (Uncertain but plausible). However, the unfortunate incidents in China after the adulteration of milk with melamine, and the resultant rise in the number of children who had melamine crystals detected in the urinary bladder or who died of kidney damage, confirmed that the mode of action understood from animal models is relevant to humans at high levels of exposure (World Health Organization, 2009). This additional epidemiological evidence moved the conclusion of causality from Uncertain to Likely.
A causal relationship is “Unlikely” between the environmental factor and the disease condition. Both epidemiological and toxicological evidence is compatible with the absence of effect.
For example, because D-limonene causes kidney toxicity in male rats, the biological plausibility was high and, without epidemiological evidence, would be considered Uncertain but plausible (Fig. 6). Subsequent investigations showed that D-limonene–induced kidney toxicity is not relevant to humans (Swenberg and Lehman-McKeeman, 1999; Meek et al., 2003), which moves the conclusion of a causal relationship to the Unlikely category.
Another example, not shown on the grid, is phenobarbital. Phenobarbital increased the incidence of liver tumors in long-term rodent bioassays (Whysner et al., 1996) by a mode of action that would be plausible in humans. Biological plausibility would be considered high for phenobarbital. However, epidemiological studies have found no evidence of liver tumors in patients on lifetime anti-epilepsy treatment with phenobarbital (IARC, 2001; Whysner et al., 1996). Without epidemiological evidence, the categorization would be Likely or Uncertain, but with sufficient epidemiological evidence for an absence of an effect in humans, the categorization would be Unlikely.
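The way a position on the grid maps to one of the four conclusions can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration: the text gives no numeric scale, so both axes are taken to run from -1.0 to +1.0 with 0.0 as the midline, and the `tilt` parameter is a hypothetical way to express that the Likely/Uncertain boundary favors epidemiological evidence. The function name `causal_category` is likewise invented. The sketch is not a substitute for the expert weight of evidence judgment that determines where a point sits.

```python
# Hypothetical scaling: each axis runs from -1.0 (evidence against) to
# +1.0 (strong evidence for), with 0.0 as the midline. Not from the source.
DEFAULT_TILT = 0.5  # assumed strength of the preference for epidemiology


def causal_category(epi: float, plaus: float, tilt: float = DEFAULT_TILT) -> str:
    """Return the grid conclusion for an (epidemiology, plausibility) point."""
    for name, value in (("epi", epi), ("plaus", plaus)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [-1.0, 1.0], got {value}")

    if epi > 0:
        # The Likely/Uncertain boundary shifts toward lower plausibility as
        # epidemiological evidence grows, so strong epidemiology needs
        # compelling implausibility before the conclusion drops to Uncertain.
        boundary = -tilt * epi
        return "Likely" if plaus >= boundary else "Uncertain"

    # Little or no epidemiological evidence of an effect.
    return "Uncertain but plausible" if plaus > 0 else "Unlikely"


if __name__ == "__main__":
    # Abstract points only; no scores are assigned to real agents.
    for point in [(0.8, 0.6), (0.9, -0.3), (0.2, -0.3), (-0.5, 0.7), (-0.5, -0.5)]:
        print(point, "->", causal_category(*point))
```

Re-evaluating a point after new data arrive, as in the melamine and D-limonene examples, amounts to calling the function again with updated coordinates. This simple version does not capture the phenobarbital case, in which sufficient epidemiological evidence for the absence of an effect leads to Unlikely despite high plausibility.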
As mentioned previously, a checkbox approach to characterize the nature of the evidence that would lead an expert team of epidemiologists and toxicologists to reach a weight of evidence decision is not practical. However, the Epid-Tox Framework described here and shown schematically in Figure 7 provides by structure and example a way of systematically working through all evidence and reaching a conclusion that can be tracked, debated, and modified with further data.
The proposed new framework represents a concerted effort to bridge the fields of epidemiology and toxicology in a way that can impact, and hopefully improve, human risk assessment. It will benefit from application and critique and will undoubtedly require some modification. This formalized set of steps, in and of itself, provides a structure for challenging both disciplines and how they can and should be brought together. Each step of the process invites improvement, including the availability of studies, the determination of quality, the proper metric for assigning the degree or strength of evidence, and the appearance and utility of the two-dimensional grid.
Availability of evidence will always be an Achilles' heel in any evaluation that seeks scope and completeness. Unpublished data that languish in a drawer or are contained only in official submissions to regulatory agencies are an unfortunate omission that is currently unavoidable. Furthermore, in both disciplines, “negative” studies (showing no effects) are usually judged to be of lesser value, either by the investigator seeking a new finding or by a journal editor requiring impactful research. Such evidence rarely appears in the literature.
One area that continues to plague both toxicology and epidemiology is the measurement of quality. Whereas poor and excellent studies can often be identified and categorized, there is no consistent and agreed-upon method for those that fall in between. For example, the STROBE statement (von Elm et al., 2007) details criteria for judging the quality and reliability of epidemiology studies, but the method is not routinely used or cited with studies or reviews. In experimental biology, the criteria for quality are less well defined and are often the product of where the work was done and where it was published.
Another area that will need debate and refinement is the degree of detail one needs to complete the two-dimensional grid proposed in the Epid-Tox Framework. It may be sufficient to declare general degrees of confidence in the scaling of the two axes. However, with more precise scaling, one could imagine dividing the grid into four quadrants, with four quadrants within each quadrant, thereby subdividing and giving greater granularity to the overall conclusions. This may add greater detail to the analysis but may also spark fruitless debates about precisely where the biological plausibility or epidemiological point should be on each axis.
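The finer subdivision described above, four quadrants with four quadrants inside each, can be sketched as a two-level lookup that yields 16 cells. The axis range of -1.0 to +1.0 and the labels "high" and "low" are assumptions made for illustration, as is the function name `nested_quadrant`. The text does not prescribe a numeric scale.

```python
from typing import Tuple

Cell = Tuple[str, str]  # (epidemiology half, plausibility half)


def _split(value: float, low: float, high: float) -> Tuple[str, float, float]:
    """Return which half of [low, high] holds value, plus that half's bounds."""
    mid = (low + high) / 2
    if value >= mid:
        return "high", mid, high
    return "low", low, mid


def nested_quadrant(
    epi: float, plaus: float, low: float = -1.0, high: float = 1.0
) -> Tuple[Cell, Cell]:
    """Locate a point as (outer quadrant, inner quadrant) on a 4 x 4 grid."""
    for name, value in (("epi", epi), ("plaus", plaus)):
        if not low <= value <= high:
            raise ValueError(f"{name} must lie in [{low}, {high}], got {value}")

    e_lo, e_hi, p_lo, p_hi = low, high, low, high
    levels = []
    for _ in range(2):  # level 1: quadrant; level 2: quadrant within it
        e_side, e_lo, e_hi = _split(epi, e_lo, e_hi)
        p_side, p_lo, p_hi = _split(plaus, p_lo, p_hi)
        levels.append((e_side, p_side))
    return levels[0], levels[1]


if __name__ == "__main__":
    print(nested_quadrant(0.75, -0.25))
```

The sketch also makes the concern raised in the paragraph concrete: a small change in either coordinate near a midpoint flips the inner cell, which is exactly where debates about precise placement would arise.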
Nonetheless, a framework can provide the logic and disciplined thinking that promotes open discourse and leads to evidence-based decisions. Furthermore, decisions about what epidemiological or toxicological study should be done can be facilitated by using the framework. For example, clear indications from animal studies for a particular effect should inform the data collected in an epidemiology study. Likewise, epidemiological findings should spur the design of in silico, in vitro, or in vivo studies that could corroborate observations in human populations. Important decisions about human safety should rely on the cohesive appreciation of both epidemiology and toxicology and the synergistic value that their combination brings to a comprehensive evaluation.
The refinement of any method occurs by working examples through it. To take that first step toward refinement, Simpkins et al. (2011) provide a case study that uses the framework to collect, evaluate, and integrate epidemiological and toxicological evidence for causal inference. Hopefully, more environmental agents will be worked through the Epid-Tox Framework to test and improve its utility.