Neurodevelopmental disabilities affect 3–8% of the four million babies born each year in the U.S. alone, with known etiology for less than 25% of those disabilities. Numerous investigations have sought to determine the role of environmental exposures in the etiology of a variety of human neurodevelopmental disorders (e.g., learning disabilities, attention deficit-hyperactivity disorder, intellectual disabilities) that are manifested in childhood, adolescence, and young adulthood. A comprehensive critical examination and discussion of the various methodologies commonly used in investigations is needed. The Hershey Medical Center Technical Workshop: Optimizing the Design and Interpretation of Epidemiologic Studies for Assessing Neurodevelopmental Effects from In Utero Chemical Exposure provided such a forum for examining these methodologies. The objective of the Workshop was to develop scientific consensus on the key principles and considerations for optimizing the design and interpretation of epidemiologic studies of in utero exposure to environmental chemicals and subsequent neurodevelopmental effects. (The Panel recognized that the nervous system develops post-natally and that critical periods of exposure can span several developmental life stages.) Discussions from the Workshop Panel generated 17 summary points representing key tenets of work in this field. These points stressed the importance of:
Neurodevelopmental disabilities affect 3–8% of the 4 million babies born each year in the U.S. alone (Weiss and Landrigan, 2000). Fewer than 25% of these neurodevelopmental disabilities have a known etiology. It is now appreciated that subtle damage that occurs to the nervous system during early development can manifest much later in life. This makes the ability to establish a relationship with events occurring during gestation even more challenging. In an effort to identify the causes of neurodevelopmental disabilities, epidemiologic research is a valuable tool that can be used to identify potential links between disease and genetic and environmental factors. Numerous epidemiologic studies have examined potential links between in utero or early postnatal exposure to specific chemicals (e.g., pharmaceuticals, environmental chemicals such as lead, methylmercury, polychlorinated biphenyls [PCBs]) and adverse developmental or behavioral effects in children (see Rice and Barone, 2000, for a recent review). These studies have been invaluable in laying the groundwork for how such investigations should be conducted, and provide an excellent foundation for future studies.
Given the current interest in expanding such studies to address issues related to adverse effects of low-level exposures to environmental factors, an examination of the methodologies commonly used would be of significant value to investigators in the design and analysis of future studies. Such a review would assess the strengths and limitations of methodological approaches used to date, and consider scientific and technical advances in relevant methodologies, such as exposure assessment, neurodevelopmental assessment, interpretation of data, and incorporation of an evidence-based approach to identify health concerns. This review would serve to identify key methodological factors that ultimately determine the value and strength of a study.
The Hershey Medical Center Technical Workshop: Optimizing the Design and Interpretation of Epidemiologic Studies for Assessing Neurodevelopmental Effects From In Utero Chemical Exposure was a one-day meeting held in conjunction with the 22nd International Neurotoxicology Conference (Environment and Neurodevelopmental Disorders), Research Triangle Park, North Carolina, September 2005. Within this framework, a multidisciplinary panel was convened to discuss issues related to the design, conduct, and interpretation of human studies examining the potential adverse effects of gestational exposure to various environmental agents, and to the dissemination of information from those studies. The Panel comprised experts in psychology, medicine, risk/exposure assessment, analytical chemistry, neuroimaging, epidemiology, toxicology, statistics, psychiatry, pediatrics, pediatric neuropsychology, and neurology. Each member had experience and interest in assessing the effects of environmental chemical exposure on human development.
This Workshop was organized to discuss the important principles for detecting the effects of environmental exposures on neurobehavioral development and to make recommendations for the design of future studies evaluating the impact of in utero exposures. (The Panel recognized that the nervous system develops post-natally and that critical periods of exposure can span several developmental life stages.) The discussions were initiated by a series of questions related to scientific methodological issues that were posed to the Panel prior to the Workshop (see Table 1). Given that adverse effects may emerge after long latent periods, the Panel discussed how effects that manifest as irreversible damage to the central nervous system, progressive neurodegeneration, or subtle neurological dysfunction first appearing in adolescence and adulthood could be considered and incorporated into study designs. The Panel focused on identifying ‘best practices’ for such studies, which often required revisiting the basic principles underlying current epidemiological studies. The Panel did not evaluate conclusions or findings from previous neurodevelopmental epidemiological studies related to the topic of environmental exposure; however, discussion of focal points from such studies served as the basis for identifying critical points for consideration in any future study designs. The outcome of this Workshop provides input both for the design of future investigations and for the establishment of standards by which one can judge the adequacy of reported studies. This report represents a summary of the Panel deliberations, including current basic scientific tenets that have been embraced by this field, as well as considerations for future work.
Developmental epidemiological studies examining perturbations to the nervous system from exposure to environmental factors have provided critical data for determining human health risks. These studies have set the framework for the design of subsequent studies. Early observations regarding adverse effects on the fetus from maternal exposure to various agents focused primarily on pronounced and obvious adverse outcomes from high exposure levels. Contemporary concerns have shifted and are now focused on more subtle alterations that may occur with low levels of exposure. Such alterations, while not overt, may have long-term impact on human health. The evaluation of such relationships between exposure and outcome presents a significant challenge to the investigator, especially when the effects are subtle or delayed in manifestation. In the discussions, the Panel recognized the importance of revisiting some of the fundamentals of neurodevelopmental epidemiology and basic critical components of study design as they relate to studies of in utero exposure. An extensive discussion of epidemiological study design was outside the scope of this Workshop, as was the evaluation of specific published research.
The charge to the Panel was to evaluate study design for determining an association between in utero environmental chemical exposure and developmental outcomes in exposed children. Given the rationale for this Workshop, the Panel focused part of the deliberations on design strategies for examining various outcomes resulting from such in utero exposures. First and foremost, the Panel emphasized the importance of formulating a specific hypothesis or set of hypotheses as part of the study design. Such hypotheses should be informed by and consistent with previous research and, whenever possible, have biological plausibility and be related to an underlying biological mechanism. The degree of focus or specificity of the hypothesis will depend to some extent on the number of different outcome variables. However, hypothesis formulation is important for the design, analysis of data and interpretation of results.
The prospective longitudinal design has proven itself as one of the most useful study designs for examining outcomes associated with in utero exposures. Challenges exist when assessing the influence of in utero exposures on delayed manifestation of neurotoxicity, and the longitudinal approach may be the primary approach to detect delayed adverse outcomes. The impact of this type of study design comes from the ability to assess exposure at the beginning of the study, with the possibility of obtaining greater accuracy. Most alternative designs require a retrospective assessment of exposure — a useful, but less attractive option. The collection of longitudinal data allows for the possibility of detecting subtle changes in development that may not be apparent in cross-sectional analyses. Longitudinal assessment of both exposure and outcome has required the development of new methods of statistical analysis that can incorporate the resulting correlation among repeated measurements. Such methods permit the assessment of change within individuals over time. This approach has greatly increased the power of the statistical inference in longitudinal studies. The prospective longitudinal design can include developmental testing and assessment of potential confounding variables at regular, fixed time intervals, typically for a period of two to five years, and possibly even longer. The ability to reassess or identify potential confounding variables over the course of the study offers a unique strength to such a study design. However, as in any longitudinal study, the positive aspects of the extended length of the follow-up period must be balanced with the difficulties of maintaining active subject participation. Special efforts are often required to prevent attrition. Over time, the attrition can become systematic as opposed to random. This, in turn, can affect the results of the data analysis. 
However, with a longitudinal study design, one can actually examine the various stages of child development. Thus, it may be possible to demonstrate subtle differences due to in utero exposure by examining rates of development, as well as specific cognitive, motor, sensory, psychological, or other outcomes.
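The idea of examining rates of development within individuals can be sketched with a deliberately simplified two-stage summary: fit a least-squares slope to each child's repeated scores, then average the slopes within each exposure group. This is only an illustration and not the statistical methodology the Panel refers to, which would typically model the correlation among repeated measurements directly (for example, with mixed-effects models). All identifiers, group labels, ages, and scores below are hypothetical.

```python
from statistics import mean

def ols_slope(ages, scores):
    """Least-squares slope of score on age for a single child."""
    a_bar, s_bar = mean(ages), mean(scores)
    sxx = sum((a - a_bar) ** 2 for a in ages)
    sxy = sum((a - a_bar) * (s - s_bar) for a, s in zip(ages, scores))
    return sxy / sxx

def slopes_by_group(visits):
    """Average per-child developmental slope within each exposure group."""
    per_child = {}
    for child, group, age, score in visits:
        per_child.setdefault((child, group), []).append((age, score))
    by_group = {}
    for (_child, group), obs in per_child.items():
        ages, scores = zip(*obs)
        by_group.setdefault(group, []).append(ols_slope(ages, scores))
    return {group: mean(slopes) for group, slopes in by_group.items()}

# Hypothetical visits: (child id, exposure group, age in months, test score)
visits = [
    ("c1", "low", 6, 10.0), ("c1", "low", 12, 16.0), ("c1", "low", 24, 28.0),
    ("c2", "low", 6, 9.0), ("c2", "low", 12, 15.0), ("c2", "low", 24, 27.0),
    ("c3", "high", 6, 10.0), ("c3", "high", 12, 13.0), ("c3", "high", 24, 19.0),
    ("c4", "high", 6, 9.0), ("c4", "high", 12, 12.0), ("c4", "high", 24, 18.0),
]

group_slopes = slopes_by_group(visits)
print(group_slopes)
```

In this constructed example the two groups start at the same scores at 6 months, so a single cross-sectional comparison at that age would show no difference, while the per-child slopes differ between groups.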
One design alternative is what is sometimes referred to as an historical cohort study. This involves enrolling a group of children within a particular age window, and retrospectively determining gestational exposure. Such assessment is difficult and will only be possible in very special circumstances, such as when sufficiently detailed, high-quality records, or even biological samples, are routinely available. Alternative useful designs, depending on the questions being asked and the hypotheses being posed, are case-control or case-cohort design (Rothman and Greenland, 1998). These designs are useful if a particular developmental problem is of interest for which standard diagnostic procedures are available. Within a cohort study, one can nest both case-control or case-cohort studies to take advantage of specific diagnostic tests.
This type of design may be especially important in the study of exposure in children, where the developmental stage may be critical in determining the ability to detect effects of neurotoxicants or other environmental risk factors in a given cohort.
The design of a particular study will, in general, dictate the participants selected. For the majority of studies examining in utero exposure, the basic design typically involves assessing exposures that occur during pregnancy and outcomes that are assessed after birth. For such a study, during an initial enrollment period a group of pregnant women is identified early in pregnancy and both the women and children are followed longitudinally for a specified amount of time. The overall goal of participant selection is to select a sample that mirrors the parent population, and thus maximizes the generalizability (external validity) of an observational study (Kalsbeek and Heiss, 2000).
If a cohort is selected completely at random, it is assumed to be representative of the population under study. In a simple random sample, however, the possibility still exists that individuals in certain categories would be missed, over-sampled, or under-sampled. To address this issue, studies may rely on stratified random sampling strategies that involve the creation of mutually exclusive and exhaustive strata (for example, based on age, ethnicity, and residence) and selecting samples from each of those strata (Lemeshow and Stroh, 1988). The main disadvantage of both simple and stratified random sampling methods is that they require a complete list of the population from which a sample is drawn. One solution to this problem is to use a cluster sampling strategy where the original lists consist of clusters, such as villages, wards, or polling districts, rather than individuals (Kalsbeek and Heiss, 2000). Another possibility is a two-stage design in which subjects are selected for recruitment after an initial survey. In many instances, this initial survey would include a preliminary estimate of exposure. Although random samples selected from well-defined sampling frames are uncommon, some thought should be given to the available pool of subjects, and whether this represents the population to which the results of the study should apply. More representative samples can be obtained by using sites with different geographic locations representing different subpopulations, as has been proposed for the National Children’s Study (http://nationalchildrensstudy.gov/).
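As a minimal sketch of stratified random sampling, the following assumes a complete sampling frame is available (the limitation noted above) and draws the same fraction from each mutually exclusive stratum. The frame, the `residence` stratum variable, the sampling fraction, and the function name are hypothetical choices for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the given fraction within each stratum."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: 150 urban and 50 rural residents
frame = [{"id": i, "residence": "urban" if i % 4 else "rural"} for i in range(200)]
chosen = stratified_sample(frame, lambda p: p["residence"], fraction=0.1)

n_rural = sum(p["residence"] == "rural" for p in chosen)
n_urban = sum(p["residence"] == "urban" for p in chosen)
print(n_rural, n_urban)
```

Because the draw is made within each stratum, neither stratum can be missed or under-sampled by chance, which is the property a simple random sample does not guarantee.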
The actual process of selecting a representative sample is an ongoing challenge in both case-control and cohort studies. Technology, privacy laws, and regulations are severely limiting the use of established methods of selection, such as random digit dialing (RDD) (Brogan et al., 2001) and Department of Motor Vehicles (DMV) records (Funkhouser et al., 2000). Hospital-based and neighborhood recruitment strategies can be time-consuming, and are more likely to produce samples that are not representative of the general population (Wacholder et al., 1992). Recently, large comprehensive population directories, merged from multiple sources, have become commercially available. It has been suggested that these directories may be more comprehensive and more cost-effective than the traditional sources of participant selection. Supportive data are not yet available regarding their ultimate utility, and unless very diverse sources of information are used in these directories, they may be subject to stratification bias.
The term selection bias refers to introduction of systematic differences in characteristics between those who take part in (and provide data for) the study and those of the population, resulting in lack of generalizability and possibly comparability between groups being studied (Grimes and Schulz, 2002). By contrast, information bias occurs from errors in obtaining the needed information (Rothman and Greenland, 1998). Examples of bias include: inappropriate selection of controls in case-control studies, Berkson’s bias (i.e., when both disease and exposure are likely to be associated with hospitalization, but unrelated to each other in the general population), differential loss to follow up in longitudinal studies, non-response/missing data, volunteer/self-selection, and misclassification of exposure or outcome (Gordis, 2000). To minimize selection bias in the initial recruitment, the researchers may choose to over-sample certain population strata where there is a higher likelihood of refusal or drop-out (Ware and Lee, 1988). Although this approach may help ensure sufficient representation of hard-to-study groups, it does not take into consideration the issue of differential drop-out within the over-sampled categories.
It is important to note that even well-designed studies can become subject to bias in the implementation phase. Participants may refuse to enroll or fail to provide necessary data, or be lost to follow up due to change in residence or lack of motivation. The refusal to participate, failure to provide data, and loss to follow-up may be of particular concern when there is evidence that participation and missing information are not independent of exposure and outcome status. For example, in a large longitudinal study of children in South Africa, Richter et al. (2004) reported that after 10 years the project had retained 70% of the cohort, with different rates of attrition in different subpopulations. In a cohort study evaluating the association between methylmercury exposure and neurobehavioral testing results among Faroese children, the final dataset included approximately 63% of the eligible cohort (Grandjean et al., 1997). This level of participation in and of itself may not introduce bias; however, the hospital with the lowest participation (33%) had the highest median blood mercury concentration (Grandjean and Weihe, 1993).
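One simple way to screen for the pattern described above, where participation is not independent of exposure, is to tabulate participation rate and median exposure by recruitment site. The sketch below assumes an exposure measure is available for every eligible subject, which will often not be true in practice; the sites and values are invented and are not data from the cited studies.

```python
from collections import defaultdict
from statistics import median

def site_summary(records):
    """Participation rate and median exposure per site.

    records: iterable of (site, participated, exposure) tuples.
    """
    by_site = defaultdict(list)
    for site, participated, exposure in records:
        by_site[site].append((participated, exposure))
    summary = {}
    for site, rows in by_site.items():
        rate = sum(1 for took_part, _ in rows if took_part) / len(rows)
        summary[site] = {
            "participation": rate,
            "median_exposure": median([e for _, e in rows]),
        }
    return summary

# Hypothetical eligible subjects: (site, participated?, exposure level)
records = [
    ("A", True, 10), ("A", True, 12), ("A", True, 14), ("A", False, 16),
    ("B", True, 20), ("B", False, 30), ("B", False, 40),
]

summary = site_summary(records)
lowest_participation = min(summary, key=lambda s: summary[s]["participation"])
highest_exposure = max(summary, key=lambda s: summary[s]["median_exposure"])
print(summary, lowest_participation, highest_exposure)
```

When the site with the lowest participation is also the site with the highest exposure, as in this constructed example, the overall participation rate alone understates the potential for bias.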
In those circumstances where a concern about selection bias is justified, an extra effort to recruit a subsample of non-participants may be warranted. This allows researchers to evaluate and quantify the differences between participants and missing subjects—including individuals lost through attrition or non-sampled sub-populations (Caetano et al., 2003). To minimize bias due to subject drop-out, one may: a) obtain information on family or friends as additional contacts; b) maintain regular contact with participants; c) utilize monetary compensation; and/or d) use methods for early identification of attrition problems for correction (Hartman, 2005). Alternatively, one can address the potential impact of missing information during the analysis phase of the study.
During conduct of the study, errors in the collection of the data may also be a concern. Sources of information bias include measurement and misclassification errors, and recall bias in retrospective assessment. There appears to be growing consensus that bias quantification should become an important part of the collection, analysis, and interpretation of observational data, with an understanding that random variability is not the only important source of error (Caetano et al., 2003; Greenland, 2005; Hernan et al. 2004; Little and Vartivarian, 2003; Maclure and Schneeweiss, 2001). Both selection and information bias issues may be particularly important in cohort studies, especially if existing cohorts are used in the future to test hypotheses that were not considered at the time of the initial recruitment.
As noted by Rothman and Greenland (1998), “In large part, the quality of exposure measurement will determine the validity of an environmental epidemiology study.” Exposure assessment is clearly a critical component in these types of studies, particularly in studies of low-level toxicant exposures. Before exposure can be assessed, it must be defined. The definition should include an understanding of the exposure setting, the source, nature, and context of the exposure, whether the exposure of interest is a single agent or a mixture, and the possibility of exposure to other compounds not under study (Rothman and Greenland, 1998).
Individual exposures can be relatively constant or highly variable over time. Daily or seasonal variability in exposure can occur as a function of location of primary residence and pattern of daily activity, including dietary preferences and cultural practices. This variability determines the sampling interval and may necessitate constructing summary measurements for analysis. Such summaries may provide a more accurate measure of total exposure than individual measurements. In addition, the biological kinetics of a compound drive the measures used for analysis. For many toxicants, an average daily exposure can be used; for others, the peak exposure is more relevant. Such decisions depend on the nature of the specific toxicant and the actual level and variability of exposure. Studies frequently rely on indirect assessments using biomarkers of exposure rather than on direct exposure measurements. While such markers can be useful, they have the potential to be misleading and may require additional confirmation.
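The distinction between average and peak exposure can be illustrated with two hypothetical measurement series that share the same mean but differ sharply in their peaks. The values and units are invented; which summary is appropriate depends on the toxicant, as noted above.

```python
from statistics import mean

def summarize_exposure(measurements):
    """Return two candidate summary measures for a series of exposure readings."""
    return {"mean": mean(measurements), "peak": max(measurements)}

# Hypothetical repeated measurements for two subjects (arbitrary units)
steady = [2.0, 2.0, 2.0, 2.0]    # constant low-level exposure
episodic = [0.5, 0.5, 6.5, 0.5]  # mostly low, with one high excursion

steady_summary = summarize_exposure(steady)
episodic_summary = summarize_exposure(episodic)
print(steady_summary, episodic_summary)
```

An analysis based only on the mean would treat these two subjects as identically exposed, whereas an analysis based on the peak would not.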
While accurate exposure measures are often difficult to obtain, they are essential if one is to interpret any statistical associations that might occur. Thus, quantitative measurements are preferred to those relying on subject recall. While occupational or acute exposures can often be determined with a degree of accuracy, chronic exposure to environmental toxicants can be variable and difficult to measure accurately. In this regard, exposure assessment for many agents is best measured prospectively during pregnancy. If applicable, a profile can be generated for postnatal exposure and information obtained retrospectively on pre-pregnancy exposure. The intensity of exposure must be sufficiently great and the length and frequency of follow-up adequate so that the exposure is more likely to affect measures of outcome in the hypothesized manner.
Environmental measurements (e.g., diet, soil, air) can be used to assess exposure as part of a neurodevelopmental study. In addition, biomonitoring has been incorporated into epidemiological studies on neurodevelopment. There have, for example, been investigations into the relationship between neurodevelopment and concentrations of selected environmental chemicals, primarily the persistent lipophilic chemicals, in human milk and maternal plasma. Over the past several years, guidance on the selection and collection of biological samples, as well as on sampling time and methods, has been developed. For example, total serum lipids are known to vary with the stage of pregnancy and decrease after delivery (Montes et al., 1984). Consequently, measured concentrations of lipophilic environmental chemicals should be adjusted for the lipid percent at the time of sample acquisition.
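A common form of lipid adjustment, assumed here for illustration, divides the wet-weight concentration by the lipid fraction of the sample to give a concentration per gram of lipid. The function name, concentrations, and lipid percentages below are hypothetical.

```python
import math

def lipid_adjusted(wet_weight_ng_per_g, lipid_percent):
    """Convert a wet-weight concentration (ng/g serum) to ng/g lipid."""
    if lipid_percent <= 0:
        raise ValueError("lipid percent must be positive")
    return wet_weight_ng_per_g / (lipid_percent / 100.0)

# Hypothetical samples with the same wet-weight concentration but
# different serum lipid content at the time of collection
higher_lipid_sample = lipid_adjusted(6.0, lipid_percent=0.8)  # about 750 ng/g lipid
lower_lipid_sample = lipid_adjusted(6.0, lipid_percent=0.6)   # about 1000 ng/g lipid
print(higher_lipid_sample, lower_lipid_sample)
```

Two samples with identical wet-weight concentrations can therefore yield different lipid-adjusted values if serum lipid content differs between sampling times, which is why the lipid percent needs to be measured when the sample is taken.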
Estimates of fetal exposure can be made based on maternal biological samples, such as urine, blood, and breast milk (Barr et al., 2005; Budtz-Jørgensen et al., 2004). Levels of chemicals in these maternal compartments, however, may not accurately reflect the actual dose to the target tissue, including the fetal compartment. For estimating fetal exposure, sampling of amniotic fluid, generally between 15 and 18 weeks of gestation, would be the most direct approach available; however, this is feasible only under very specific health conditions and is not a standard practice. In addition, this type of estimate, while accurate, would likely result in a biased population sample. Finally, it has not been determined how these levels relate to the more routinely obtained maternal blood matrix.
With birth, other tissue matrices become available. These include meconium (the infant’s first stool), umbilical cord blood, umbilical cord tissue, and the infant’s nails and hair. Meconium starts accumulating in the fetal bowels from about 12–16 weeks gestation (Bearer, 2003) and has been used as a viable matrix for assessing exposure to drugs of abuse (Lauria et al., 1997; Ostrea et al., 2001), alcohol (Bearer et al., 2003), tobacco (Ostrea et al., 1994), and environmental chemicals (Ramirez et al., 2000; Whyatt and Barr, 2001); again, the relation to more traditional matrices has not been established. Of these matrices, cord blood is the most commonly monitored. It has been estimated that in the month prior to a child’s birth, about 300 quarts of blood are pumped daily from the placenta to the fetal environment. In a recent website report, 287 chemicals were reportedly detected in cord blood (www.ewg.org/reports/bodyburden2). The chemicals included many halogenated organic chemicals, mercury, and polycyclic aromatic hydrocarbons. Although cord blood is exchanged with maternal blood, the concentrations of various chemicals are not necessarily equal. For example, it is known that the concentration of methylmercury in fetal blood is about 1.7 times that in maternal blood (Amin-Zaki et al., 1974). Sampling of cord blood for chlorpyrifos showed a relationship to indoor air levels as well as outcome measures of length and weight at birth (Whyatt et al., 2004). Burse et al. (2000) utilized umbilical cord tissue to assess in utero exposure to selected persistent pesticides and PCBs; among the difficulties they faced were gravimetrically determining the low lipid content (about 0.2%) of the cords and the variable and unmeasured presence of residual red blood cells in the vessels when the cord tissue was taken. For selenium, Lorenzo et al. (2005) reported positive correlations between cord blood and newborns’ hair, between placenta and umbilical cord, and between cord blood and maternal blood. For more details on the matrices available and volumes required for assessing in utero and early postpartum exposure, the reader is referred to Barr et al. (2005).
Compared to the measurement of exposure, outcome assessment is relatively straightforward, since a number of standard developmental test batteries with good psychometric properties are readily available. Since the early work of Hanninen (1966) examining carbon disulfide exposure, the field of human behavioral neurotoxicology has embraced a neurobehavioral approach to the assessment of outcome following exposure, although the specific neurocognitive domains that are targeted vary from study to study. Over a decade ago, Anger et al. (1994) provided a list of functional domains in adults that could be affected by chemicals. These included learning and memory, coding, sustained attention, higher intellectual function, strength, coordination, speed, vision, somatosensory function, and affect. Over the past four decades, this approach to assessing neurodevelopmental outcomes has been extended downward to examine very young children following their exposure in utero, and excellent outcome measures are now available across the lifespan.
One of the earlier efforts in this area came from the World Health Organization (WHO), which recommended the use of the Neurobehavioral Core Test Battery (NCTB). This battery comprised a variety of tasks including digit symbol, digit span, Benton visual retention, pursuit aiming II, simple reaction time, Santa Ana, and a profile of mood states. This original conceptualization has influenced the selection of neurobehavioral tools for the past 35 years (Anger, 2003). In the context of developing a testing strategy for the National Children’s Study, a workshop sponsored by the National Institute of Child Health and Human Development (NICHD) and the Environmental Protection Agency (EPA) examined the questions of what, how, and when to measure neurobehavioral function in children. The workshop participants considered batteries of tests and stand-alone metrics for use in longitudinal studies examining multiple functional domains. A major emphasis in the workshop report was placed on testing functional competence and cognitive, sensory, motor, and social/emotional development in the first four years of life (http://nationalchildrensstudy.gov/events/workshops/Neurobehavioral_Development.cfm). Paule (2005) also argued for task continuity across species to identify species similarities and differences, as well as to facilitate extrapolation of experimental animal data to humans.
While the NCTB has driven much of the application methodology in pediatric neurotoxicology, its applicability to large epidemiological studies has been questioned. The direct administration of these tasks to exposed individuals and the use of trained personnel create a labor-intensive situation for studying outcome. This is particularly true for large epidemiological studies and studies where cultural and linguistic issues are present. As a result, a number of other approaches have been employed to address these concerns. For example, in 1996 the Agency for Toxic Substances and Disease Registry designed the Pediatric Environmental Neurobehavioral Test Battery (PENTB) for children ages 1 to 16 years. The PENTB was constructed for cross-sectional studies. For children younger than 4 years of age, four informant-based scales are employed: the Parenting Stress Index (PSI), the Personality Inventory for Children (PIC), and the Vineland Adaptive Behavior Scales (VABS). For children ages 4 through 16, the PENTB evaluates cognitive, motor, sensory, and affective domains (Zeitz et al., 2002).
Another approach has embraced computerized assessment strategies, and a variety of these have been developed. They require less time and expertise and provide for rapid scoring and data entry. For example, tasks for the National Center for Toxicological Research Operant Test Battery and its variants (Paule, 1990; Paule et al., 1999) were selected based on their association with brain functions (e.g., short-term memory, attention, learning, time perception, motivation, color and position discrimination). For assessing older children, Iregren and Letz (1991) suggested the use of a Minimum Common Core Computerized Battery comprising symbol digit, finger tapping, and simple reaction time. Otto et al. (1996) used a computerized battery, the Neurobehavioral Evaluation System (NES2), in pediatric neurotoxicology studies. This has been recently updated to the NES-3 (Proctor et al., 2000). A number of other computerized batteries have been developed, including the Cambridge Neuropsychological Test Automated Battery (CANTAB®), CogSport, HeadMinder, ImPACT, Automated Neuropsychological Assessment Metrics, Cognitive Evaluation Protocol, and CNS Vital Signs. Other available batteries for use with both humans and non-human primates include the National Center for Toxicological Research (NCTR) Operant Test Battery (Paule et al., 1999) and the Behavioral Assessment and Research System (BARS) (Anger et al., 1996). Computerized batteries have been developed which combine various facets of other batteries (e.g., Rohlman et al., 2001). However, at present, no single neurobehavioral test battery has been recommended across age ranges or across cultures for any specific neurotoxic chemical, because none of the current batteries tests all functional domains of the nervous system. As an overall guiding principle, choice of cognitive outcome measures should be based on the biological evidence of nervous system structural and functional development.
To illustrate, during the infancy period (birth to 2 years of age), research suggests the developmental salience of three specific cognitive dimensions: (1) speed of processing, as assessed on tasks like habituation and visual expectancy performance (Colombo, 1993; Dougherty and Haith, 1997); (2) recognition memory, as assessed on tasks like novelty preference and conjugate reinforcement (Rovee-Collier and Barr, 2001; Fagen and Ohr, 2001); and (3) behavioral inhibition, as assessed on measures like the A not B task. As the child develops, other specific cognitive processes, such as selective attention (Ruff and Rothbart, 1996), processing speed, and working memory, become increasingly important (Schneider, 2002).
The majority of human behavioral neurotoxicology studies have focused primarily on measures of cognitive competence in children. For example, with regard to lead, most of the human developmental studies referenced in a recent review used measures of general cognitive function or cognitive-related measures such as visual-motor behavior, executive function, and attention, though some studies have reported an increased risk for anti-social behavior and impulsivity in children (Hubbs-Tait et al., 2005). The same review also reported virtually no evidence of non-cognitive outcomes as a function of exposure to mercury, manganese, or cadmium (Hubbs-Tait et al., 2005).
In recent years, developmental researchers have emphasized the multi-dimensional nature of children’s development, with specific reference to the concept of competence (Masten and Coatsworth, 1998; Masten and Powell, 2003). In addition to the acquisition of culturally- and age-appropriate specific cognitive abilities, the development of competence in children also involves the acquisition of intrapersonal characteristics and interpersonal skills that help the child to meet both major culturally salient developmental goals and to deal effectively with environmental challenges (Yates et al., 2003). This conceptual shift to a multi-dimensional approach to assessing children’s development in part reflects the fact that normality of development in one domain does not necessarily mean that there will be normality across all domains (Lester et al., 1995). A major implication of the concept of competence for research in behavioral teratology and toxicology is the need to go beyond a primary focus on cognitive outcomes, such as developmental quotient or IQ scores (Bellinger, 1995; Cory-Slechta, 1990; Rice, 2005; Weiss, 2000) to assess other critical developmental domains that also define competence. Some of these domains include: 1) temperament, including reference to indices of self-regulation and reactivity; 2) development of a secure attachment by the young child; 3) development of motivation by the young child to be actively involved with their environment; and 4) developing the ability to elicit and use other persons as resources and establishing appropriate social interaction patterns with caregivers and peers (Wachs, 1999; Masten and Powell, 2003; Werner, 1990). Children who are strong in these characteristics possess a range of flexible coping strategies to deal with day-to-day stressors, are more likely to demonstrate better performance in school, and are less likely to exhibit behavioral problems.
For example, one major dimension of temperament is self-regulation. This dimension of non-cognitive competence has unique linkages to specific cognitive executive functions, and refers to the child’s ability to control his or her own behavior and emotions (Rothbart and Bates, 2006). After the first year of life, indices of self-regulation (e.g., the ability to inhibit ongoing behavior, selectively attending to cues, joint attention) become more stable and increasingly influence an individual’s competence (Bates, 2001; Rothbart and Bates, 2006). Dimensions of self-regulation in infancy and childhood can be assessed either by validated parent report measures (e.g., Rothbart et al., 2001) or laboratory-based behavioral assessments (e.g., Kochanska et al., 2000). These measures have been demonstrated to be important in the context of environmental exposures; some studies have reported self-regulation deficits in infants and young children exposed to environmental chemicals (Burnette et al., 1999; Chasnoff et al., 1987; Mayes and Bornstein, 1995; Mendelsohn et al., 1998). Other studies indicate that infants exposed to certain environmental chemicals are harder to test and less likely to complete testing due to deficits in self-regulation skills, such as high distractibility or low consolability when distressed (Fagen and Ohr, 2001; Mayes and Bornstein, 1995; Struthers and Hansen, 1992). From these data, models have been developed for environmental exposure and disturbances in the development of regulation processes (Mayes, 2002).
A recent monograph from the Institute of Medicine (2004) presents a series of criteria that can be used to evaluate the utility of different procedures for assessing different aspects of infant competence (e.g., age appropriateness of instrument, degree to which scores on the instrument are predictive of later development, evidence that the instrument is sensitive to toxic exposure, whether performance on the instrument can be linked to perturbations in brain function/brain development). The Institute of Medicine (2004) monograph also stresses the importance of utilization of converging measurements of the same construct (aggregation) as a means of maximizing assessment sensitivity. The conclusions drawn in this monograph can serve as a guide to both dimension and instrument selection when assessing both cognitive and non-cognitive competence in early life. Choice of non-cognitive outcome measures should be based on the biological evidence of nervous system structural and functional development.
While a review of statistical methodologies in neurodevelopmental epidemiology studies was beyond the scope of the Workshop, the Panel emphasized a number of fundamentals such as the importance of laying out a statistical design for the study in advance and ensuring that the statistical methods be transparent to the reviewer and reader. The complexity and expense of most epidemiological studies confines them to confirming suspected associations rather than generating new hypotheses. Consequently, the formulation of a priori hypotheses based on prior empirical data or theoretical considerations is essential. After formulating a hypothesis, the statistical design for the primary analysis of the study should be determined in advance. An a priori primary analysis plan should provide the investigators the primary answer for whether the hypothesized associations are in fact present or not. This primary analysis plan should be adhered to and the statistical methods and results should be reported in enough detail that they could be replicated (i.e., transparency). This minimizes the possibility of identifying obscure effects or interactions as the result of conducting multiple tests. Secondary analyses can then confirm the primary study or not, answer secondary questions, explicate limitations and explore other possible associations. Secondary analyses can be very useful, but should be identified as such. For example, there may be susceptible populations to specific environmental chemicals. Secondary analyses of such susceptible groups can prevent a Type II error (failure to detect a real effect when one is present because it is confined to a particular group of individuals).
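The concern about conducting multiple tests can be made concrete with a short calculation. The sketch below is an illustration added here, not a method from the Workshop. It assumes that all tests are independent, that every null hypothesis is true, and that each test uses the same per-test alpha; the function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests independent
    tests when every null hypothesis is true (assumptions of this sketch)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests


for k in (1, 5, 20):
    # Columns: number of tests, chance of any false positive, Bonferroni threshold
    print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_alpha(0.05, k))
```

Under these assumptions, 20 unplanned tests at alpha = 0.05 carry roughly a 64% chance of at least one spurious finding, which is one reason a single pre-specified primary analysis is easier to interpret.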
In both planning and evaluating the results of a confirmatory study, it is critical that sample size be taken into account; if the sample size is too small, important effects on small sub-populations can be completely masked. It is less well-recognized that small studies can also have a greater probability of Type I errors. One solution is to take a two-prong strategy: (1) cluster sampling, which (unlike random sampling) ensures representation of small subpopulations; and (2) calculating sample sizes in advance, based on a priori established probabilities of Type I and Type II errors. A sample size calculation is a prerequisite for any confirmatory study. In addition, in order to appropriately power studies, one needs a strong knowledge base to identify effects that should be considered. Most notable is the use of mechanism or mode of action for any specific compound underlying a specific endpoint under study.
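As a rough illustration of an a priori sample size calculation, the sketch below uses the standard normal approximation for a two-sided comparison of two group means. It is a simplified example rather than the Panel's recommended procedure: the standardized effect size (Cohen's d), the default alpha of 0.05, and the default power of 0.80 are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided,
    two-sample comparison of means.

    effect_size is the standardized mean difference (Cohen's d).
    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # controls the Type I error probability
    z_beta = z.inv_cdf(power)           # power = 1 - Type II error probability
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Because the required sample size grows with the inverse square of the effect size, halving the effect the study must detect roughly quadruples the number of participants. A real study would also need to account for covariates, attrition, and any clustering in the sampling design, none of which this sketch covers.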
The goal of environmental epidemiology is to evaluate the relationship between exposure and a disease or other outcome. This is nearly always accomplished by the use of statistical modeling in which the outcome (dependent variable) is modeled as a function of the exposure (independent variable). However, there is always a concern that factors other than the exposure could explain an observed relationship. Epidemiologic methods rely on the appropriate use of confounder measurements to avoid or adjust for confounders. While this appears to be rather straightforward, neurodevelopmental studies require not only the assessment of known confounders but also a determination of how these and other factors change over time and what impact such changes have. With the expansion of studies to examine effects that occur over a lifetime, an additional effort may be needed to identify influences not previously considered. For these reasons, the Panel reconsidered the current terminology and definitions.
Briefly, a covariate is a variable associated with the outcome of interest. Its inclusion in the model may increase the precision of the parameter estimates. A confounder is related to both exposure and outcome, and affects the estimate of the association between the two. Unless the statistical model adjusts for such confounding variables, a biased estimate of the association can result; either the appearance of an effect can occur where none exists or a real association can be attenuated. Many factors (covariates) affect developmental outcomes, and they are typically unrelated to exposure. However, this may not be the case in every situation and with every environmental exposure; for example, some exposures may be associated with socioeconomic status (SES).
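The way an unadjusted confounder can create the appearance of an effect where none exists can be seen in a small simulation. The sketch below is purely illustrative: the data are synthetic, the coefficients (0.8 and -2.0) are arbitrary assumptions, and the helper names are hypothetical. The outcome is generated so that it depends only on the confounder and not on the exposure.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=5000, seed=1):
    rng = random.Random(seed)
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    # Exposure is related to the confounder.
    exposure = [0.8 * c + rng.gauss(0, 1) for c in confounder]
    # Outcome depends ONLY on the confounder; the true exposure effect is zero.
    outcome = [-2.0 * c + rng.gauss(0, 1) for c in confounder]
    crude = slope(exposure, outcome)
    # Adjusted slope via residualization: regress both exposure and outcome on
    # the confounder, then regress residual on residual. This equals the
    # exposure coefficient from a model containing exposure and confounder.
    adjusted = slope(residuals(confounder, exposure),
                     residuals(confounder, outcome))
    return crude, adjusted


crude, adjusted = simulate()
print(f"crude slope: {crude:.2f}, adjusted slope: {adjusted:.2f}")
```

In this synthetic setting the crude slope is strongly negative even though exposure has no effect, while the adjusted slope is close to zero. Real data rarely offer a confounder that is measured this cleanly, which is why residual confounding remains a concern after adjustment.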
Confounding variables that are commonly considered in studies of neurodevelopmental effects in children include participant age, race, gender, socioeconomic status (SES), and maternal variables such as age, education, and IQ. While each of these variables appears to be distinct, each may itself be a composite of multiple potential confounding factors. For example, while age is an important variable in itself, it could also encompass other aspects such as the age at first exposure, the age-period of exposure duration, or the developmental stage of the child when exposure occurred. In addition, in the case of in utero exposure, the duration of exposure could be significantly influenced by the pregnancy duration inclusive of premature birth. This would influence the accumulated dose level and whether or not the exposure occurred during a normal period of in utero development. For children, additional variables include health status, nutrition and diet, educational experiences, and social experience. In any developmental study, the influence of the parents on the offspring is significant. This presents as a collage of confounders including genetic background, prenatal environment, parent/child interactions, mental health, home environment, and health status. For example, while SES may not be a direct risk factor, many of the factors related to SES may increase the risk for the outcome under study. In addition, confounders like SES are not necessarily static and can change over time. For this reason, such confounders require repeated assessment over the course of the study. In the case of multiple confounders, evaluating a change with or without adjustment for other confounders can make a large difference in the final estimate. Even when adjustments are made, one must continue to be concerned with the remaining (residual) confounding.
Given the impact of confounders in the final interpretation of the study results, it is critical that the criteria used to select confounders for adjustment and how they were measured be reported as part of any analysis plan.
More complicated is the potential need to differentiate between confounders and mediators. If an intervening variable is functioning as a mediator, an indirect causal pathway is established between exposure and outcome. In this case, the mediator serves as the mechanism through which toxic exposure translates into biological deficits. For neurodevelopmental studies, an example would include the case where exposure to an environmental chemical produces a change in an infant’s temperament characteristics. Such changes could include an increased level of irritability or a reduced level of responsiveness or attention. These, in turn, can have a significant influence on testing performance or have other long-term behavioral consequences (Bendersky et al., 1996; Chasnoff et al., 1987; Fried, 1989; O’Connor et al., 1993; Ruff and Rothbart, 1996). Under such conditions, exposure would result in negative developmental consequences, albeit through an indirect causal pathway. Thus, treating either temperament or patterns of parent-child relations as a confounder would not be appropriate. Given that confounders and mediators may show the same pattern of relations with predictors and outcomes, it is important to develop decision rules for interpretation. One such rule would involve the degree to which one can assume a causal link between toxic exposure and mediator, as opposed to a non-causal association. Reviews of statistical approaches to analyze for mediator effects are outside the scope of the workshop and can be found in a number of sources (Evans and Lepore, 1997; MacKinnon et al., 2002).
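Why a mediator should not be adjusted for as if it were a confounder can be illustrated with synthetic data. In the sketch below, which is an illustration under stated assumptions and not an analysis from any cited study, exposure affects the outcome only through a mediator (for example, a temperament score). The path coefficients (0.5 and -1.0) and all function names are hypothetical.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=5000, seed=3):
    rng = random.Random(seed)
    exposure = [rng.gauss(0, 1) for _ in range(n)]
    # Path a: exposure changes the mediator.
    mediator = [0.5 * e + rng.gauss(0, 1) for e in exposure]
    # Path b: the mediator changes the outcome; there is no direct path.
    outcome = [-1.0 * m + rng.gauss(0, 1) for m in mediator]

    total = slope(exposure, outcome)  # total effect of exposure
    # Exposure coefficient after (inappropriately) adjusting for the mediator.
    direct = slope(residuals(mediator, exposure), residuals(mediator, outcome))
    a = slope(exposure, mediator)
    # Mediator coefficient holding exposure fixed.
    b = slope(residuals(exposure, mediator), residuals(exposure, outcome))
    return total, direct, a * b


total, direct, indirect = simulate()
print(f"total: {total:.2f}, adjusted for mediator: {direct:.2f}, "
      f"indirect (a*b): {indirect:.2f}")
```

Here the exposure has a real total effect of about -0.5, all of it indirect, yet the estimate adjusted for the mediator is near zero. Treating the mediator as a confounder would therefore hide a genuine consequence of exposure. The data alone cannot tell the two roles apart, which is why the decision rules described above rest on causal assumptions.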
While studies often attempt to isolate the effects of a single chemical agent, exposure to multiple environmental agents occurs in the population (Cory-Slechta, 2005; Mayes, 2002; Wachs, 2000). The strength of the relation between exposure to environmental chemicals and developmental outcomes can vary systematically as a function of population characteristics (Bellinger, 2000; Ruff, 1999). In this situation, one is dealing with effect modifier(s) or moderating variable(s). An effect modifier modifies the effect of the exposure on disease; that is, the magnitude of a specific measure of the exposure effect varies across levels of another variable. If moderators are not identified and tested for, exposure-outcome relationships can be under- or over-estimated because the impact of such exposure may be restricted to a specific subgroup within the larger sample population (Bellinger, 2000; Davidson et al., 2004; Jacobson and Jacobson, 2002; Rauh et al., 2004; Vreugdenhil et al., 2002; Wachs and Plomin, 1991). For developmental neurotoxicology assessments in the human population, such modifiers may become more difficult to assess as the study duration increases and the number and types of modifiers change. How to handle these changing dynamics within a population and within individuals of a population will be a major challenge for future studies.
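The dilution of a subgroup-specific effect in a pooled analysis can be sketched with synthetic data. The example below is illustrative only: it assumes a hypothetical susceptible subgroup making up 20% of the sample, an exposure slope of -1.0 within that subgroup and zero elsewhere, and invented function names.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=5000, susceptible_fraction=0.2, seed=7):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        susceptible = rng.random() < susceptible_fraction
        exposure = rng.gauss(0, 1)
        effect = -1.0 if susceptible else 0.0  # effect exists only in the subgroup
        outcome = effect * exposure + rng.gauss(0, 1)
        rows.append((susceptible, exposure, outcome))

    def fit(subset):
        return slope([r[1] for r in subset], [r[2] for r in subset])

    pooled = fit(rows)
    in_subgroup = fit([r for r in rows if r[0]])
    elsewhere = fit([r for r in rows if not r[0]])
    return pooled, in_subgroup, elsewhere


pooled, in_subgroup, elsewhere = simulate()
print(f"pooled: {pooled:.2f}, susceptible: {in_subgroup:.2f}, "
      f"others: {elsewhere:.2f}")
```

The pooled slope lands near -0.2, a fifth of the effect actually experienced by the susceptible subgroup, while the stratified estimates recover the two underlying slopes. In practice the modifier must be hypothesized and measured in advance; searching for subgroups after the fact reintroduces the multiple-testing problem.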
Epidemiologic studies of neurodevelopmental outcomes place extraordinary demands on the professional, intellectual, ethical, and communicative skills of the investigators. As a part of the standard scientific process, data obtained from one study are used to generate hypotheses to be tested in subsequent studies. While this sequence of events, in general, serves to advance our understanding of a process, hypotheses are just that. Quite often, hypotheses generated from studies on the human population can have a significant impact, in and of themselves, on the population under study. Thus, care must be exercised in the final interpretation of such data and the future use of the data. The Panel stressed the care required in order to advance the science while at the same time responsibly interpreting and communicating study data. For these reasons, the Panel reiterated the need for full, open, and transparent reporting of all aspects of a study design and analysis. In addition, the Panel encouraged the reporting of all results with a discussion inclusive of all of the findings of a study whether they identified a relationship or not. It was acknowledged that the reporting process requires at least as much planning, care, and attention to detail as other aspects of study design and implementation. However, it is often the last aspect to be considered in the effort to publish results. Ideally, report planning begins during the study design phase; this is when the investigators decide what research questions will be addressed and consider what conclusions might be drawn from various study outcomes that are possible. Use of data to address questions for which the study was not designed should proceed with caution because of the potential for misinterpretation or misattribution of the relationship between exposure and adverse effects on the nervous system. The results of such post hoc exploratory analysis may be hypothesis-generating, but not necessarily definitive.
A study report reaches many audiences, each with its own core concerns and priorities. In addition, the investigators have a responsibility to the participants of any specific study. Investigators need to consider these constituencies in reporting their findings.
Collective experience has been accumulated, by the authors and many others, from dozens of environmental health studies conducted over two decades across the United States (for review, Amler and Tinker, 1992; Friedman et al., 1999; Amler and Falk, 2000; Brauer et al., 2004; Schwartz et al., 2006). This experience demonstrates a consistent benefit of early and regular contact with study communities as an indispensable aid in report planning. As a part of the responsibility of the investigators, prospective study participants, household members, and community members can be familiarized with study objectives and study design. These participants can learn what information can be reasonably expected from the study results. Report planning is facilitated by the enhanced familiarity of study participants and community with issues. This familiarity can also assist in raising concerns that are likely to surface but are unknown to the investigator.
Results that are statistically significant may be significant only in the mathematical probabilistic sense—that the observed outcomes were unlikely to be due to chance alone. For example, a very small observed difference between two study groups in a neuropsychological test result may be inconsequential in any real sense but may achieve statistical significance simply due to the large study population or the large number of tests performed.
Because statistical significance is dependent on sample size, studies with large sample sizes can yield statistically significant results that have only trivial clinical importance. This implies that results must be interpreted carefully considering at least three distinct types of conclusions and how they will be reported: the medical implications for individual participants, the potential health impact on the study population, and the acceptance or rejection of the study hypotheses.
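The dependence of statistical significance on sample size can be shown with a simple calculation. The sketch below applies a two-sample z-test to a fixed, very small difference between groups. The 1-point difference, the standard deviation of 15, and the group sizes are assumptions chosen for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test with equal group sizes
    and a known common standard deviation (a simplifying assumption)."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    z = abs(mean_diff) / se
    return 2 * (1 - NormalDist().cdf(z))


for n in (100, 1000, 10000):
    print(f"n per group = {n:>5}: p = {two_sided_p(1.0, 15.0, n):.4g}")
```

The same 1-point difference is far from significant with 100 participants per group and highly significant with 10,000, although its practical meaning for any individual has not changed. Reporting the effect size and its confidence interval alongside the p-value keeps the two questions separate.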
A frequently encountered question is: how meaningful are study findings as they relate to individual study participants, the community, and the general advancement of environmental toxicology? For example, some of the non-cognitive neuropsychological tests as well as tests for specific cognitive functions, although useful in evaluating populations, do not have well established absolute normal values and thus must be interpreted solely in a comparison between exposed and non-exposed groups (Amler and Gibertini, 1996).
All study participants with aberrant test results are entitled to some clinical reporting and counseling, or at least referral to an appropriate competent medical resource. Virtually all such individual interpretation issues are easier to manage when they are explained in advance of the study, detailed in the informed consent documentation, and reinforced throughout the conduct of the investigation.
Therefore, a communications plan is essential, both to alleviate concerns about test results that are statistically at variance but not deemed abnormal or injurious, and to ensure proper reporting of individual results that might require medical attention. Investigators must be cautious to avoid labeling study participants, especially children, and be circumspect in reporting test results to parents. Cognitive test results can be especially provocative because many people equate them with intellectual ability. Although there are few empirical studies on the effects of such labeling (Hastings, 1994; Jellison and Duke, 1994; Sparrow et al., 1993; Townsend et al., 1993), the issue of a child's intellect is clearly emotionally charged for most parents and families. For many, this may be the first occasion on which they are confronted with such information, from which they often infer far-reaching implications. Without appropriate preparation, they may be profoundly disheartened and discouraged by this information.
Communities and the public at large, like individual study participants, are interested in whether a study finds any real or potential health impact from the toxic exposures or hazardous chemicals of concern, and if such effects are found, what remedies are available and recommended. The investigators, again, must distinguish carefully between study findings that are statistically significant merely in a probabilistic mathematical sense, and those that have real meaning for the health of the community and the public. This is best accomplished at the time the study protocol is developed, when the investigators should consider a priori what effect sizes they will consider meaningful in this latter sense. Reporting of the findings should be confined to the data as they relate to the actual research questions that the study was designed to address. Speculation regarding the impact of these findings without additional supportive data should be discouraged. With regard to any individual concerns, the investigators must be able to put the findings into perspective in that any test results detecting differences between groups do not necessarily indicate individual abnormalities. Any presentation of the findings should be forthright and disciplined in stating the implications for the individual study participants, the exposed community, and human health overall.
In many instances, a small effect size can be important on a population basis even when it is not large enough to be important for individuals. A frequently cited hypothetical example examines the implications of a 3-point decrease in IQ scores in a population. Although a 3-point decrease in an otherwise healthy individual is not likely to be noticeable, a 3-point decline shifting the IQ distribution curve for an entire population might result in thousands more people with impaired cognitive function, and thus thousands of additional people requiring intervention services. The same 3-point shift might also result in thousands fewer people with exceptional cognitive ability, and perhaps most able to contribute exceptional leadership and vision to a community. Widely published population estimates of the economic and social impacts of lead poisoning are based on this hypothetical example. However, this hypothetical example assumes that both tails of the IQ distribution curve are unaltered in slope and contour by the 3-point shift. Actual effects in any real-life population would depend on a number of variable factors, such as the pre-morbid distribution of IQ scores, and the impairing effect, if any, of the neurotoxicant on that distribution curve, as well as the IQ-score shift itself.
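The arithmetic behind this hypothetical example can be reproduced in a few lines. The sketch below assumes, as the example itself does, that IQ scores are normally distributed and that the 3-point shift moves the whole curve without changing its shape. The mean of 100, standard deviation of 15, and cutoffs of 70 and 130 are conventional values adopted here as assumptions; the function name is hypothetical.

```python
from statistics import NormalDist


def tail_fractions(mean: float, sd: float = 15.0,
                   low: float = 70.0, high: float = 130.0):
    """Fractions of a normal population scoring below `low` and above `high`."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(low), 1 - dist.cdf(high)


base_low, base_high = tail_fractions(100.0)
shift_low, shift_high = tail_fractions(97.0)

per = 100_000  # express results per 100,000 people
print(f"below 70:  {base_low * per:.0f} -> {shift_low * per:.0f} per {per:,}")
print(f"above 130: {base_high * per:.0f} -> {shift_high * per:.0f} per {per:,}")
```

Under these assumptions the share of the population scoring below 70 rises from about 2.3% to about 3.6%, and the share above 130 falls from about 2.3% to about 1.4%. As the text cautions, these figures hold only if the distribution keeps its shape; a neurotoxicant that affects the tails differently would produce different numbers.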
The complex task of explaining fairly technical study results to communities, and helping people put those results in proper perspective, is aided by a number of published guidelines. In many cases these are drawn from considerable experience in discussing the results of environmental health studies in communities located near hazardous waste sites. Effective principles of risk communication include early community involvement, clear identification of quantifiable hazards, and consideration of emotional and psychological components of risk perception (Amler and Tinker, 1992; Friedman et al., 1999; Brauer et al., 2004). Community members are often frustrated by the absence of clear answers to highly specific questions (e.g., “Do we have more children in special education because our water is polluted?”), but when adequately informed will often respond constructively to recommended solutions (e.g., how to minimize exposure and thereby risk, how to detect clinical problems early and minimize their impact) (Amler and Falk, 2000). Educating the health community in the interpretation of data from epidemiological studies will contribute substantially to accurate communication with the general public and with the population under study. Training of investigators in effective methods of risk communication will likewise be extremely helpful to this process.
Throughout report planning and generation, investigators must be disciplined and intellectually honest in asking themselves: how conclusive and how generalizable are the results, were the original study questions (and others) answered, and what are the implications for the individual study participants, the study community, the nation, and human health overall? Discipline is especially critical in restricting reporting to the actual research questions, how they were tested, and what the study found. Broader speculation should be avoided or at least kept to a minimum and clearly labeled as such. There is a parallel requirement for clinicians and policymakers to learn how to read scientific reports and interpret study findings and conclusions. The appropriate role of the scientist as a uniquely informed advocate for social change is a far more complex issue and beyond the scope of this Panel. Nevertheless, all investigators have an ethical duty to report negative as well as positive findings, and should take great care neither to minimize nor to exaggerate the findings.
The Executive Summary from this Workshop condenses the discussions of the Panel into 17 specific critical points. In addition, a number of discussion points were broached by the Panel regarding optimizing the design and interpretation of epidemiologic studies for assessing neurodevelopmental effects from in utero chemical exposure. These included the importance of understanding and integrating results from previous studies into the design of new neurodevelopmental epidemiology studies. The ability to compare results across studies has been severely limited in the past. This is not due to any specific error in design but rather to differences in design that prevent direct comparisons. The Panel stressed the importance of developing and maintaining the ability to perform interstudy comparisons. While there are “optimal” study designs, they are often not feasible to conduct, and there is a need for both small and moderate size studies to focus on a specific question in depth. These can provide the foundation for large-scale study designs that can both confirm specific hypotheses and generate new ones. The Panel discussed the importance of a study design that accounts for enrichment effects, including the concept of resilience.
With respect to the assessment of exposure and development of the nervous system, the concepts of critical windows of development and of effect come into play. While identifying critical windows would serve to significantly focus such studies, this would require previous knowledge concerning effects occurring at a specific stage of fetal development. The Panel emphasized that unless a critical gestational exposure window can be identified, exposure assessment over the entire period of gestation should be considered. In addition to providing an accurate measure of prenatal exposure, longitudinal measurements may allow identification of an especially sensitive period during pregnancy, particularly if sufficient variability is present.
In addition, the Panel stressed that practical considerations affecting participant burden should be taken into account, such as the length of time required to administer tests and the setting (home or clinic) in which testing occurs. All of these factors can influence the cost of a comprehensive evaluation of neurobehavioral function, but the hypothesis being tested should be the overarching determinant of what, when, and how the metrics are employed. Consequently, the real questions apply to the ultimate purpose of the measurement, when measurement should occur, and whether the measurement can be performed reliably. Further, as noted earlier, there is a need for the assessment of multiple neurobehavioral processes including measures of non-cognitive competence. For future studies, the Panel emphasized using all available information in the selection of outcome measures, for example, assessment of specific cognitive functions related to the development of competence in young children. Theoretical brain-based models can be used in the development of such assessments. Most importantly, the forward momentum for the field can be enhanced by the establishment of a system for neurodevelopmental surveillance. This would allow for tracking the outcomes from in utero exposure across early developmental time periods to determine whether central nervous system injuries may be lying silent until challenged. Such efforts will significantly improve the ability to identify those exposures that present a health risk to the fetus.
Other concepts and new directions discussed by the Panel included the examination of specific, a priori defined susceptible populations and the use of high risk populations. The expanded use of medical records, when possible, to confirm diagnoses; the concept of early detection and screening procedures; and the use of dosimetry technology were also discussed. The Panel concluded by formulating the 17 points listed in the Executive Summary to continue to stimulate scientific discussions on contemporary issues with respect to this important area of inquiry.
The Panel would like to thank Dr. Cranmer and the sponsors of the 22nd International Neurotoxicology Conference. In addition, the assistance of Dr. Freya Kamel of the NIEHS is greatly appreciated.
Disclaimer: The opinions expressed in this article are those of the authors and do not necessarily reflect the views and/or policies of their affiliations.
Robert W. Amler, School of Public Health, New York Medical College.
Stanley Barone, Jr., National Center for Environmental Assessment, Office of Research and Development, US Environmental Protection Agency.
Aysenil Belger, Department of Psychiatry, School of Medicine, University of North Carolina at Chapel Hill.
Cheston M. Berlin, Jr., Department of Pediatrics, Children’s Hospital, Milton S. Hershey Medical Center, Pennsylvania State University College of Medicine.
Christopher Cox, Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University.
Harry Frank, Departments of Psychology and Earth & Resource Sciences, The University of Michigan-Flint.
Michael Goodman, Department of Epidemiology, Emory University Rollins School of Public Health.
Jean Harry, Laboratory of Neurobiology, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services.
Stephen R. Hooper, Clinical Center for the Study of Development and Learning, University of North Carolina School of Medicine.
Roger Ladda, Milton S. Hershey Medical Center, Pennsylvania State University College of Medicine.
Judy S. LaKind, LaKind Associates, LLC, Milton S. Hershey Medical Center, Pennsylvania State University College of Medicine, University of Maryland School of Medicine.
Paul H. Lipkin, Division of Neurology and Developmental Medicine, The Kennedy Krieger Institute, The Johns Hopkins University School of Medicine.
Lewis P. Lipsitt, Department of Psychology, Brown University.
Matthew N. Lorber, National Center for Environmental Assessment, Office of Research and Development, US Environmental Protection Agency.
Gary Myers, Division of Pediatric Neurology, University of Rochester Medical Center.
Ann M. Mason, Research Foundation for Health and Environmental Effects.
Larry L. Needham, Division of Environmental Health Laboratory Sciences, Centers for Disease Control and Prevention.
Babasaheb Sonawane, National Center for Environmental Assessment, Office of Research and Development, US Environmental Protection Agency.
Theodore D. Wachs, Department of Psychological Sciences, Purdue University.
Janice W. Yager, Environment Division, Electric Power Research Institute.