A statistical generalizability analysis gauges the degree to which a single assessment of a parameter successfully estimates that measure over repeated assessments for that individual. The generalizability of enumerative and functional immune parameters was estimated for two species of macaque monkeys assessed every 3 months between 18 and 42 months of age. Subjects were cross-balanced by species (bonnet, Macaca radiata, n = 22; pigtail, Macaca nemestrina, n = 21), sex (male, n = 21; female, n = 22), and brief early maternal separation with reunion (control, n = 21; separated, n = 22). Cell subset analysis showed the best generalizability (35–69%). Natural cytotoxicity also performed well (44–70%), but when computed on a lysis per cell basis, removing the effect of cell phenotype, it was less stable (15–48%). For most immune parameters, at least 5 assessments would be necessary to establish conventionally reliable (.80) characterizations of stable individual differences in immunity, and 3 for minimally reliable (.60) characterizations. More reactive parameters, as well as more behaviorally reactive species, yielded more generalizable results. Cell subsets that are typically most sensitive to acute stress (CD8, CD16) were more stable than other subsets (CD4, CD20). Behaviorally reactive species (pigtail) yielded more stable natural cytotoxicity results than the less reactive species (bonnet). Sex and rearing condition (early, brief maternal separation) did not substantially affect generalizability, although females tended to generate more stable results than did males.
Individual differences in immunoregulation, as well as environmental events that reset immune set points (Coe and Lubach, 2000), are of considerable interest in understanding brain, behavior, and immune relationships (Kemeny and Laudenslager, 1999). Stable individual differences in immune responses are relevant to a wide array of phenomena, including personality, temperament, sex, early life experiences, and interventions. However, it is often unclear how many times an immune parameter should be measured in order to fully characterize that measure for the individual. Investigations of relationships between immunity and stable individual differences due to personality, temperament, sex, and early experiences often assume that point assessments of immunity generalize to the weeks, months, or years surrounding the assessment. In fact, single time point assessments reflect only a snapshot of very complex and changeable systems. Generalizability analyses gauge how successfully a single assessment generalizes in this way.
Generalizability theory provides estimates of what percentage of a single value or assessment is due to the individual, the point in time, or idiosyncratic interactions between the two by using variance estimates from ANOVA or similar models (Brennan, 2001; Shavelson and Webb, 1991). These estimations provide, first, the ability to discern whether a specific measurement is adequate in terms of its ability to generalize to a time frame of interest, and, second, the number of measurements over that time frame that would be necessary to reach an adequate level of generalizability.
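As a worked illustration, the crossed animals-by-waves design can be written in the notation of Brennan (2001). The symbols below are chosen to match the animal, wave, and animal × wave/error components reported later (S²a, S²w, S²aw,e); treating the single-wave coefficient as a relative-error index, which excludes wave variance, is an assumption about how the tables were computed.

```latex
% Observed score for animal a at wave w
X_{aw} = \mu + \nu_a + \nu_w + \nu_{aw,e}

% Total variance splits into three components
\sigma^2(X_{aw}) = \sigma^2_a + \sigma^2_w + \sigma^2_{aw,e}

% Share of total variance due to stable differences between animals
\text{animal share} = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_w + \sigma^2_{aw,e}}

% Generalizability coefficient for the mean of n'_w waves (relative error)
E\rho^2(n'_w) = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_{aw,e}/n'_w}

% Waves needed to reach a target coefficient \rho^* (e.g., .80 or .60)
n'_w = \frac{\rho^*}{1-\rho^*}\cdot\frac{\sigma^2_{aw,e}}{\sigma^2_a}
```

With n'_w = 1 the coefficient is the single-wave estimate; the last line follows from solving the previous one for n'_w, with the result rounded up to a whole number of waves.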
Two studies – one with humans and one with rhesus macaques – have examined the generalizability of immunological markers. Fletcher et al. (1992) found that proliferative responses to the mitogens phytohemagglutinin (PHA) and pokeweed mitogen (PWM) were highly generalizable across two weeks, with 74–86% of variance in results due to individuals; the remainder was due to idiosyncratic differences between people across visits. Segerstrom et al. (2006), however, found that the generalizability of proliferative responses in rhesus macaques (Macaca mulatta) was much lower across a longer span of time (7 months to 1 year), with 26–30% of variance due to individuals. In addition, only 25–39% of variance in natural killer cell cytotoxicity (NKCC) was due to stable differences between individuals; 41–63% was due to idiosyncratic differences between individuals across 4-week intervals. As a result, between 4 and 10 assessments were required to reliably assess NKCC and accurately identify the influence of a stable individual difference (in this case, handedness) on NKCC. Similarly, studies with human subjects have found that single assessments are adequate to characterize short-term (days) but not long-term (months) levels of proinflammatory cytokines (Cava et al., 2000; Rao et al., 1994).
Herein we describe a generalizability analysis of immune data from two species of macaques, pigtail and bonnet, collected at 3-month intervals between the ages of 18 and 42 months, including the period of transition into sexual maturity. The present study expands on previous investigations by including multiple types of immune assay: cell phenotypes (CD4, CD8, CD16, and CD20), differential cell counts, and natural cytotoxicity toward two tumor targets (K562 and Raji cells).
The present analyses also illuminate the effects of dispositional and environmental influences on generalizability. In terms of dispositional differences, we compare generalizability as a function of macaque species (pigtail versus bonnet). The macaque genus is widely dispersed geographically, with the rhesus macaque showing the greatest overall dispersion (Thierry et al., 2004). Wide behavioral variation exists across species with regard to both dominance structure and maternal care. Thierry (2004) has suggested that macaques can be classified by social organization from grade 1 (e.g., rhesus macaque), where social interactions are nepotistic and hierarchical, to grade 4 (e.g., Tonkean macaque), where social interactions are less likely to result in retaliation. Within laboratory-housed social groups, the young bonnet macaque (grade 3) does not contest access to limited resources, young bonnet infants are readily adopted by others in the group, and adolescents adapt more readily to a new social group by forming new peer relationships (Boccia et al., 1992, 1997; Kaufman and Stynes, 1978; Laudenslager and Boccia, 1996; Reite et al., 1989). In contrast, the pigtail macaque (grade 2) housed in social groups demonstrates a considerably more conflictual social organization, less altruistic care of the young, and more reactive biobehavioral responses to early social challenge (Boccia et al., 1989; Laudenslager et al., 1995; Reite and Short, 1980; Reite et al., 1981). These behavioral and temperamental differences may affect fluctuations in, and therefore the generalizability of, immune parameters. In particular, the more conflictual and reactive species (pigtail) may have less stable immune parameters as a function of the social environment and temperament.
In terms of environmental influences, we compare generalizability as a function of a single brief early maternal separation. In young macaque monkeys between 4 and 8 months of age, a brief separation from the mother is associated with significant alterations in cell-mediated and humoral immunity lasting many years (Coe, 1993; Coe and Laudenslager, 2007; Coe et al., 1988; Laudenslager et al., 1985, 1990, 1996). Health consequences also appear to be associated with these early challenges (Lewis et al., 2000). However, whether early separation affects fluctuations in, and therefore the generalizability of, immune parameters is unknown.
The present study, therefore, characterizes variance in primate immunity over a long period of time. Variance was partitioned into stable individual differences, systematic change over time (e.g., due to development), and random variance (e.g., due to methodological variability and random fluctuations). In addition, this study extends previous investigations (Segerstrom et al., 2006) by expanding the number and type of immune parameters subjected to generalizability analysis and by investigating whether dispositional and environmental factors significantly affect variability.
Subjects in the present analysis were part of a longitudinal study of immunity in two species of macaque, beginning at 4–5 months of age and continuing to post-adolescence. Portions of this longitudinal study have been reported previously (Laudenslager and Boccia, 1996; Laudenslager et al., 1990, 1995, 1996; Weaver et al., 2004) and will not be described further herein. Here we report a generalizability analysis of a number of immune parameters collected at approximately three-month intervals between 18 and 42 months of age. The study took place over a decade.
A total of 22 pigtail (Macaca nemestrina) and 21 bonnet (Macaca radiata) monkeys were studied between 18 and 42 months of age for the present analysis (see Table 1). All animals were socially housed in conspecific adolescent social groups in the University of Colorado Denver Primate Research Facility. Subjects entered the study as half-sibling pairs at 5–6 months of age while living in their natal social groups. On average, subjects had 1 full sibling (range = 0–3) and 8 half siblings (range = 0–15) in the study. Although bonnets tended to have more close relatives in the study (average of 1.7 full and 11.0 half siblings) than pigtails (average of 0.4 full and 5.4 half siblings), these differences were not statistically significant.
One member of each pair experienced a 2-week maternal separation at 5–6 months of age, remaining with conspecifics in its natal group while its mother was removed; the mother was returned to the natal group after the separation. The young monkeys were thus never socially isolated. Assignment to the separation condition was random once a study pair was established. At approximately 15 months, both members of each pair were transferred to an all-peer adolescent social group of 4 to 9 monkeys at any given time for the remainder of the study. Because study pairs were housed together in the same natal group and adolescent group, each adolescent group included at least one member of the natal group (the paired animal) and often others as well. The first blood draw was taken a mean of 83 days (SD = 22) after transfer to the peer group; this interval was slightly longer in pigtails than in bonnets (t(38) = −2.49, p < .05; see Table 1). This protocol was approved by the Institutional Animal Care and Use Committees of the University of Colorado Denver.
All blood samples (3.5–8.0 ml) were obtained between 0700 and 0800 hr, before the morning feeding, by venipuncture of the saphenous vein (heparinized syringes) in unanesthetized subjects under light restraint. For this and all phlebotomy procedures, a stopwatch was started when the group was first disturbed (i.e., on entry into the social group) in preparation for drawing blood and removing the targeted monkeys (up to 6) from the adolescent social group. Culling subjects took 5–10 minutes; the subsequent phlebotomy was accomplished in 3–4 minutes by two phlebotomy teams working in tandem. Mean time from initial disturbance to the end of the blood draw (i.e., disturbance time; Capitanio et al., 1996) was approximately 10 minutes for both species (see Table 1); phlebotomy took longer on occasions when more subjects were targeted for blood draw. Preferred fruit was always provided to each subject following venipuncture, and subjects were returned to their social group immediately afterward.
Before plasma was removed, the blood samples were carefully resuspended, and a small aliquot was taken for a white cell count on a Coulter Counter (Model ZBI); a differential cell count was performed on a Wright-stained blood smear by counting 100 cells. Reliability of the research assistants performing the differential cell count, assessed on 10 samples, was always greater than 90% over the course of the study.
Natural cytotoxicity was determined against K562 and Raji cell lines as previously described (Laudenslager et al., 1996). Target cells were labeled with ⁵¹Cr, washed in Hanks' balanced salt solution (HBSS), and resuspended to 1 × 10⁵ viable cells/ml in supplemented RPMI-1640. Effectors were isolated on Ficoll-Hypaque, washed three times with Dulbecco's phosphate-buffered saline, and resuspended in RPMI. In the wells of a round-bottomed microtiter plate, effector-to-target ratios of 30:1, 15:1, and 7.5:1 were plated in triplicate with 10,000 targets per well. Suspensions were incubated for 4 hours at 37°C in a humidified CO₂ incubator. At the end of the incubation period, plates were centrifuged (200 × g), and 0.1 ml of the supernatant was added to 2 ml of Optifluor (Packard) and counted on a liquid scintillation counter. Total CPM was determined from cells lysed with ZAPoglobulin (Coulter). Percent lysis at each effector-to-target ratio was determined from the median value of each triplicate determination based on the following:
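The conventional chromium-release calculation expresses lysis relative to spontaneous release (targets incubated without effectors) and maximum release (targets lysed with detergent, here the total CPM). The sketch below applies it to the median of each triplicate; the spontaneous-release control, the exact form of the expression, the function name, and the counts are assumptions drawn from standard practice rather than from this text.

```python
from statistics import median


def percent_specific_lysis(experimental_cpm, spontaneous_cpm, maximum_cpm):
    """Percent specific lysis from triplicate counts (conventional 51Cr release).

    experimental_cpm: CPM from wells containing effectors and labeled targets.
    spontaneous_cpm:  CPM from targets incubated in medium alone (assumed control).
    maximum_cpm:      CPM from detergent-lysed targets (total CPM).
    The median of each triplicate is used, as described in the text.
    """
    exp = median(experimental_cpm)
    spont = median(spontaneous_cpm)
    total = median(maximum_cpm)
    if total <= spont:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (exp - spont) / (total - spont)


# Hypothetical triplicate counts for one effector:target ratio
lysis = percent_specific_lysis([1200, 1250, 1300], [400, 420, 410], [3000, 3100, 3050])
print(f"{lysis:.1f}% specific lysis")  # 31.8% specific lysis
```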
Target cells were grown to confluence and split as necessary before assay. Cell lines were monitored daily and either fed or split the day before use. Mycoplasma contamination was indicated by reduced ³H-thymidine uptake following an 8-hr incubation; contaminated cultures were replaced with fresh cell lines stored in liquid nitrogen. Lysis per CD8+ and/or CD16+ cell was based on an estimate of the number of cytotoxic cells in the lymphocyte suspension, obtained by multiplying the percentage of lymphocytes from the differential cell count by the percentage of CD8+ plus CD16+ cells (Raji targets) or the percentage of CD16+ cells (K562 targets) (Maggio-Price et al., 1996).
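The per-cell adjustment can be sketched as follows. The effector-fraction step follows the description above; dividing percent lysis by the estimated number of effectors per target is a hypothetical normalization, because the text does not state how lysis was scaled, and the function names, parameter names, and input values are illustrative only.

```python
def effector_fraction(pct_lymphocytes, pct_effector_marker):
    """Estimated fraction of cells that are cytotoxic effectors.

    pct_lymphocytes:     percent lymphocytes from the differential count.
    pct_effector_marker: percent of cells carrying the effector marker(s)
                         relevant to the target (e.g., CD16+, or CD8+ plus CD16+).
    """
    return (pct_lymphocytes / 100.0) * (pct_effector_marker / 100.0)


def lysis_per_effector(pct_lysis, et_ratio, frac_effectors):
    """Percent lysis per estimated effector cell per target (hypothetical normalization).

    et_ratio is the plated effector:target ratio (e.g., 30 for 30:1); multiplying
    it by frac_effectors estimates how many true effectors each target encountered.
    """
    effectors_per_target = et_ratio * frac_effectors
    if effectors_per_target <= 0:
        raise ValueError("estimated effector count must be positive")
    return pct_lysis / effectors_per_target


# Hypothetical values: 60% lymphocytes, 20% marker-positive, 36% lysis at 30:1
frac = effector_fraction(60, 20)            # about 0.12
print(lysis_per_effector(36, 30, frac))     # about 10.0
```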
Isolated mononuclear cells (0.5–2.0 × 10⁶ cells) were incubated with directly conjugated monoclonal antibodies according to the manufacturer's directions (Becton Dickinson). Monoclonal antibodies conjugated with phycoerythrin (PE) included leu16 (B cells/CD20), leu5b (NK cells and T cells/CD2), leu3a (T helper/CD4), and leu2a (T suppressor/CD8). Leu11a (NK-like cells/CD16) was conjugated with FITC and run as a two-color probe with PE-labeled leu2a (CD8). Red cells were lysed with hypotonic buffer, and the remaining cells were washed with Dulbecco's phosphate-buffered saline to reduce background fluorescence. Cells were fixed in 2% paraformaldehyde and then counted on a Coulter EPICS 753 Triple Laser Cell Sorter.
For any given wave and immune parameter, a small number of animals had missing data due to problems with the blood draw, equipment problems in the laboratory, poor tumor recovery, or clots that interfered with isolation of lymphocytes. We assume that this missingness is random; it is not unexpected for a longitudinal study spanning a decade. Across animals, the median number of missing waves per animal was 3 for NKCC (K562 and Raji), CD8, CD16, basophils, monocytes, lymphocytes, segmented neutrophils, and WBC; 4 for CD20 and eosinophils; and 5 for CD2 and CD4. Because missing data can yield inaccurate ANOVA estimates, we used the procedures described by Brennan (2001) for obtaining variance and generalizability estimates from data sets with missing data. Individuals and waves were used as facets, with an emphasis on the generalizability of an animal's immune parameters at one wave to the “universe” of 2 years. We additionally estimated the standard error of each generalizability estimate using nonparametric methods described by Brennan (2001). Raw variance estimates are reported for individual animals (S²a), waves (S²w), and idiosyncratic interactions between animal and wave, which are statistically indistinguishable from error (S²aw,e). Because differences in scaling make these raw estimates impractical to compare across immune parameters, variance estimates are also presented as a percentage of total variance in each parameter. For example, the animal component of WBC variance, estimated at 4.2, corresponds to 33% of total WBC variance (Table 2).
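A minimal sketch of the variance decomposition, assuming a complete (balanced) animals × waves matrix: it estimates S²a, S²w, and S²aw,e from the expected mean squares of a two-way random-effects ANOVA, expresses each as a percentage of the total, and estimates a standard error for the animal share by resampling animals with replacement. The analyses reported here used Brennan's (2001) procedures for data with missing values, which this sketch does not reproduce; whether the simple bootstrap matches Brennan's nonparametric method is an assumption, and all function names are illustrative.

```python
import random
import statistics


def variance_components(x):
    """Variance components for a complete animals x waves matrix (list of rows).

    Uses the expected mean squares of a two-way random-effects ANOVA with one
    observation per cell, so the animal x wave interaction is confounded with error:
        E(MS_a)  = s2_aw_e + n_w * s2_a
        E(MS_w)  = s2_aw_e + n_a * s2_w
        E(MS_aw) = s2_aw_e
    Negative estimates are truncated at zero. Returns (s2_a, s2_w, s2_aw_e).
    """
    n_a, n_w = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (n_a * n_w)
    row_means = [sum(row) / n_w for row in x]
    col_means = [sum(row[j] for row in x) / n_a for j in range(n_w)]
    ms_a = n_w * sum((m - grand) ** 2 for m in row_means) / (n_a - 1)
    ms_w = n_a * sum((m - grand) ** 2 for m in col_means) / (n_w - 1)
    ss_aw = sum(
        (x[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n_a)
        for j in range(n_w)
    )
    ms_aw = ss_aw / ((n_a - 1) * (n_w - 1))
    s2_a = max((ms_a - ms_aw) / n_w, 0.0)
    s2_w = max((ms_w - ms_aw) / n_a, 0.0)
    return s2_a, s2_w, ms_aw


def percent_of_total(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components)
    return [100.0 * c / total for c in components]


def bootstrap_se_animal_share(x, n_boot=500, seed=0):
    """Bootstrap SE of the animal share of variance, resampling animals (rows)."""
    rng = random.Random(seed)
    n_a = len(x)
    shares = []
    for _ in range(n_boot):
        sample = [x[rng.randrange(n_a)] for _ in range(n_a)]
        comps = variance_components(sample)
        total = sum(comps)
        if total > 0:
            shares.append(comps[0] / total)
    return statistics.stdev(shares)


# Toy 2 x 2 example: no interaction, so all variance is animal or wave
print(percent_of_total(variance_components([[1, 2], [3, 4]])))  # [80.0, 20.0, 0.0]
```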
For each parameter, we also calculated the number of waves that would be necessary to achieve two standards of reliability (w′.80 and w′.60). Reliability of .80 is a conventional standard; however, lower levels of reliability may be sufficient under some circumstances, and so we also report the number of waves necessary to reach reliability of .60.
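The number of waves needed can be sketched with the Spearman-Brown form of the decision-study equation, assuming the single-wave coefficient excludes wave variance (relative error). Applied to .35 and .69, the endpoints of the cell-subset range given in the abstract, it returns 8 and 2 waves for the .80 standard. The function names are illustrative.

```python
import math


def reliability_of_mean(g_single, n_waves):
    """Generalizability of the mean of n_waves assessments (Spearman-Brown form)."""
    return n_waves * g_single / (1 + (n_waves - 1) * g_single)


def waves_needed(g_single, target):
    """Smallest number of waves whose mean reaches the target coefficient."""
    if not (0 < g_single < 1 and 0 < target < 1):
        raise ValueError("coefficients must lie strictly between 0 and 1")
    n = target * (1 - g_single) / (g_single * (1 - target))
    # round() guards against floating-point noise just above an integer
    return max(1, math.ceil(round(n, 9)))


for g in (0.35, 0.69):
    print(g, waves_needed(g, 0.80), waves_needed(g, 0.60))
# 0.35 8 3
# 0.69 2 1
```

Solving for the wave count directly, rather than stepping reliability_of_mean upward until it crosses the target, gives the same answer and makes the dependence on the ratio of error to animal variance explicit.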
Finally, we used generalizability estimates and standard errors produced for subgroups of animals to test whether generalizability estimates varied across three individual differences: species, rearing condition, and sex.
Generalizability estimates for flow cytometric cell counts and differential results are found in Table 2; estimates for NKCC before and after controlling for effector mix (i.e., percentage of effector cells that were CD16+ and/or CD8+) are found in Table 3. The ability of a single assessment to generalize to the 2-year “universe” varied widely across types of immune assessment. For cell counts, CD20 percentage was least generalizable, with approximately 29% of the variance due to individual animals, and CD8 percentage was the most generalizable, with approximately 69% of the variance due to individual animals. At these levels, 8 assessments of CD20 would be necessary to capture stable differences between animals at conventional reliability (.80), but only 2 assessments of CD8 would be adequate.
Results from the cell differential yielded variance estimates for individual animals that ranged from moderate to quite poor. Although single eosinophil, lymphocyte, and WBC assessments captured 33–37% of variance due to individual animals, the estimates were lower for neutrophils (28%), monocytes (20%), and basophils (1%). In fact, 97% of variance in basophil counts was idiosyncratic change and error; only 3% could be attributed to animals and waves. This effect may be due in part to measurement error. Whereas flow cytometry was performed on 10,000 cells, reducing the effect of sampling error, the differential was performed on only 100 cells located on the feather edge of the blood smear. Especially for low-frequency cells such as basophils (median percentage across all waves was 0), there may be little variance, a substantial influence of sampling, or both. To reach adequate generalizability for these parameters, it will probably be necessary to increase not only the number of waves at which they are measured but also the number of cells counted. Interestingly, many electronic counting devices perform only three-part differentials, which count lymphocytes and neutrophils as separate classes and collapse the lower-frequency populations into a mixed category.
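The sampling argument can be made concrete by treating a differential as a binomial count. The sketch below, assuming a hypothetical true basophil frequency of 1%, compares the standard error of the estimated proportion for 100 versus 10,000 classified cells and gives the chance that a 100-cell count contains no basophils at all. The function names are illustrative.

```python
import math


def proportion_se(p, n_cells):
    """Binomial standard error of a proportion estimated from n_cells classified cells."""
    return math.sqrt(p * (1 - p) / n_cells)


def prob_none_counted(p, n_cells):
    """Probability that none of n_cells counted cells belongs to a type of true frequency p."""
    return (1 - p) ** n_cells


p = 0.01  # hypothetical true basophil frequency (1%)
for n in (100, 10_000):
    print(f"n = {n:>6}: SE = {100 * proportion_se(p, n):.2f} percentage points")
print(f"P(zero basophils in 100 cells) = {prob_none_counted(p, 100):.2f}")
```

With 100 cells the standard error (about 1 percentage point) is as large as the assumed frequency itself, and roughly a third of counts would record no basophils, which helps explain how a median of 0 can arise.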
Unadjusted NKCC had the highest generalizability of all immune parameters, especially at lower effector:target ratios, with variance due to individual animals ranging from 44% to 71%. These estimates suggest that a single assessment may be adequate to capture individual differences at the lowest acceptable level of reliability (.60), although 2 or 3 assessments are necessary to reach the conventional standard (.80). However, the estimates for NKCC adjusted to the actual number of effectors, based on the cell subset analysis, suggest that this stability arose in part from stability in the effector mix. CD16+ cells are effectors against both Raji and K562 targets, whereas CD8+ cells are effectors against K562 targets only (Maggio-Price et al., 1996). These cell subsets also had the highest generalizability estimates, at 56% and 69%, respectively. Once NKCC was adjusted to the actual number of these cell types (e.g., Raji lysis per CD16+ cell or K562 lysis per CD8+ plus CD16+ cell), its generalizability dropped substantially, with variance due to individual animals ranging from 15% to 53%. Therefore, whereas total NKCC (incorporating both the number of effector cells and per-cell cytotoxicity) could be reliably assessed with only a few waves, per-cell NKCC would require at least 4 and up to 24 assessments to reliably capture stable individual differences between animals.
In Table 4, differences between species, rearing conditions, and sex in generalizability estimates are characterized by the effect size for the difference between the two groups. Small effect sizes are approximately equal to a .2 SD difference in generalizability between groups; medium effect sizes are approximately equal to a .5 SD difference between groups (Cohen, 1992). Large effect sizes were not found in this sample. Differences in NKCC were calculated only for cell-adjusted NKCC, as unadjusted NKCC confounds effector mix and cytotoxicity, as described above. Between the two species, pigtails were generally more stable than bonnets, particularly in NKCC. Although bonnets were slightly more stable in CD4 and lymphocyte percentages, the overall pattern suggested greater generalizability in pigtails.
Between the two rearing conditions, the results were mixed. Although control animals were more stable for a pair of cell count parameters (CD2 and CD20), small effect sizes and a lack of consistent effects one way or the other characterized the results. Mixed results were also obtained for sex. However, in this case, greater generalizability in females seemed to be the more commonly obtained result. Although males tended to have more generalizable basophil measurements, these measurements are suspect, for reasons enumerated above.
It is common in psychoneuroimmunology (PNI) to take one immune assessment, often assuming that such an assessment generalizes over a longer time frame. However, the results of this analysis suggest that identifying stable individual differences in nonhuman primates would typically require 2–8 assessments over a several-month period. The resulting gain in reliability would make studies more methodologically rigorous and less likely to yield null or erroneous results (Segerstrom et al., 2006). Furthermore, generalizability estimates in nonhuman primates living in a controlled laboratory environment may overestimate generalizability for primates living outside the laboratory, including humans. Few data characterize the generalizability of human immune parameters, although Stone (1992) noted that 5 daily observations were necessary to obtain minimally acceptable reliability for salivary IgA (i.e., a measure that would characterize “trait” IgA over 8 weeks), and Fletcher et al. (1992) found that proliferative responses were well characterized over a two-week period by one assessment.
The necessity of specifying methodological requirements for different subjects is clear from the differences between species of macaque in the present study. Pigtail macaques, the more reactive species (Clarke and Boinski, 1995; Clarke et al., 1994), had higher generalizability estimates than bonnet macaques for several immune parameters, including percent monocytes, WBC, and NKCC. Furthermore, estimates of NKCC generalizability were higher for both species than for the rhesus macaque (Segerstrom et al., 2006). In another study of rhesus macaques, although generalizability theory was not applied, baseline CD4+ counts were more stable than CD8+ counts over a period of 1 year (Capitanio et al., 1998). In contrast, in the present study, CD8+ counts were more stable than CD4+ for both pigtail and bonnet macaques. Clearly, more research into the particular methodological requirements for different immune parameters in various species (including Homo sapiens) is needed.
Reactivity may be an important consideration in this line of inquiry. Because blood samples were drawn from unanesthetized animals, the blood draw itself could have acted as an acute stressor. Acute stress causes marked changes in the distribution of immune cells in both humans and macaques, particularly in the neutrophil, NK, and CD8+ T cell subsets (Capitanio et al., 1996; Laudenslager et al., 1999; Ottaway and Husband, 1994; Segerstrom and Miller, 2004). Although one might expect the most reactive cells to provide the least generalizable estimates because they are influenced by changing environmental circumstances, in this case those cells (CD8+ and CD16+) yielded the most generalizable estimates, whereas cells that are more resistant to changing circumstances (e.g., CD20+) yielded lower estimates. This finding was contrary to our expectation that reactivity would increase variability and decrease generalizability. However, some evidence suggests that reactive parameters are more reliable than resting parameters. In humans, Marsland et al. (1995) reported correlations across a 2-week interval for resting and post-stressor flow cytometric cell counts and proliferative responses. Where the two estimates differed markedly (CD8+ and CD56+ lymphocytes, proliferation to PHA), the post-stressor estimate was higher. The human data therefore agree with ours from monkeys, and this pattern of results should be taken into account in making design decisions.
Furthermore, generalizability is clearly better for some immune parameters than for others, and some immune assessment techniques are more suitable for research than others because they are more reliable. Techniques that use many cells (e.g., flow cytometry) yield more reliable parameters than techniques that use few cells (e.g., differential cell counts), particularly for low-frequency cell types in the differential such as basophils. If the laboratory procedure yields unreliable estimates, even many assessments may not overcome this unreliability to reveal stable individual differences in immunity.
Capitanio and colleagues (2006) identified multiple sources of variability in Old World monkeys such as macaques. Sources located in the animal include social rank, age, sex, species, genotype, and temperament; sources located in the situation include rearing history, conditioning, diet, relocation, and social rearrangement. In the present study, we found minimal effects of age (i.e., wave) and minimal differences in generalizability due to sex and rearing history, so controlling for these variables may not increase the reliability of immune measurements. In contrast, species was associated with generalizability; however, pigtails had both higher generalizability and a longer time since introduction to the peer group (by roughly two weeks). Because some evidence suggests that immune parameters recover slowly from social rearrangement, the additional time could have allowed pigtails to recover more fully. On the other hand, the type of reorganization in the present study might be expected to have minimal residual effects, because animals were reorganized into groups of familiar peers rather than moved to individual laboratory housing (Capitanio et al., 2006). In addition to examining the effects of variables not included in the present study (e.g., social rank), future research should examine the proximal sources of species differences, such as temperament.
The present study, conducted over a decade, included variability due to the inevitable impact of personnel changes, reagent shifts, and lot number differences, which were not directly addressed at the time of the study (although these factors are now routinely considered for all assays performed in this laboratory). Improved reliability may result from strict control over these factors, as well as over procedural factors such as disturbance time (Capitanio et al., 1996). On the other hand, one methodological precaution often taken may be unnecessary. Female subjects are sometimes excluded from PNI studies because of concerns about hormonal fluctuations. However, in the present study, females provided samples that were as generalizable as, if not more generalizable than, those of males. One possibility is that hormonal changes accompanying development reduced the g estimates for males, but there was no evidence of variability due to maturation: variance estimates for wave were less than 10% for all immune parameters measured. It is important to note that the subjects of this longitudinal study had reached neither sexual maturity nor adulthood over the ages tested, based on plasma sex steroids as well as bone closure and morphometric measurements (A. J. Bennett and M. L. Laudenslager, unpublished data).
Clearly, the degree to which a single immune assessment accurately reflects an individual's immune status over days, weeks, or even years varies according to a number of important factors. These include the species (even within a genus such as macaques), the length of the time frame (generalizability will be higher for shorter time frames), the immune parameter in question (generalizability will be higher for parameters measured on large numbers of cells), and the context (generalizability will be higher for reactive than for resting assessments). Although these factors make global recommendations difficult, inspection of the tables suggests that three assessments over the period of interest (e.g., weeks or months) would often adequately capture stable differences in immunity. It does appear, however, that one assessment is rarely adequate to capture stable individual differences relevant to long-term psychological processes such as personality, temperament, sex, early experience, and the results of interventions, or to long-term health processes such as the development of disease.
This research was supported in part by grants from the NIH (MH37373 and AA013973, MLL) and the Developmental Psychobiology Endowment Fund, University of Colorado Denver, Department of Psychiatry. The authors would like to thank Christy Broussard-Berger, Mark Goldstein, and Polly Held for their expert assistance in performing the immune assays and Kristin Laudenslager for her assistance in data entry and organization for this analysis.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.