It is informative to compare the results of the same tests conducted several months apart in the three experiments, all of which were done with the same eight inbred strains in the same lab with the same apparatus but different experimenters at the University of Windsor (). Patterns of strain differences in response to ethanol indicate genotype by treatment interactions, whereas overall mean scores on a test indicate consistent environmental differences between experiments.
Slips on the balance beam after ethanol were remarkably similar in the three experiments, always being highest in A/J and lowest in FVB. Every strain showed a clear and significant (P < .05, one-tailed) increase in slips under the influence of ethanol in all three experiments, with the sole exception of FVB in Experiment 2. Baseline counts of slips showed differences between experiments, whereas slips after an ethanol injection were quite similar in all experiments. The differences among experiments appeared to arise from slightly different definitions of a slip used by the experimenters. The person counting slips was always the same one who gave the injection and therefore knew the treatment condition, and that could have influenced judgments of slips. The test might be improved by concealing information about treatment condition during observations, but this would require the services of a second experimenter. Greater consistency of scoring slips might be achieved by some kind of electronic touch circuit or video observation. That kind of automation would not address the challenge of mice that yield no data because they do not traverse the beam or engage in other non-compliant behaviors [17]. Despite these limitations, strain-specific results on the balance beam were clearly replicable across experiments and experimenters. The test is a good indicator of impairment of motor behavior by ethanol.
Baseline grip strengths were much higher in Experiments 1 and 3 than in the second study, but in all three studies, strain differences were relatively small, while ethanol effects were consistently very large. Only strain C3H in Experiment 3 failed to show a significant degree of impairment by ethanol. The grip strength test entails an intimate relation between the mouse and its human handler. Exactly how it is held, in the transfer from its holding cage and during the test itself, could influence the maximum strength of pull. In the present configuration, the handler must pull the mouse away from the strain gauge after it grasps the bar, and the rate of pull could be very important. A mechanical device that always pulls the gauge away from the mouse at a constant rate could improve consistency of the test. Pressure of the experimenter’s fingers on the tail may also play a role, and some kind of artificial cuff might provide a more consistent grasp on the tail. In any event, the current version of the test is very sensitive to effects of moderate doses of ethanol, while it indicates that strain differences in grip strength are generally quite small and therefore less consistent across experiments. The strain difference was not even statistically significant in one experiment (light-dark cycle), even though the ethanol effect on grip strength was large and unquestionably significant.
Mean scores on the accelerating rotarod were substantially but not entirely consistent across the three experiments. Over all groups, fall latencies were very similar in Experiments 1 and 2 but lower in Experiment 3, which could be a consequence of the sandpaper wearing down over time [10]. The sandpaper is a likely source of a difference between successive shipments of mice reported in Fig. 8.2 of [8]; more frequent renewal of the 320-grit sandpaper could address this problem. Strain rank orders under the saline condition showed some consistency, with C57BL/6J always being the best and BALB/cByJ ranking at or near second best, while C3H/HeJ was at or near the bottom rank. Once the overall mean fall latency in the three experiments is taken into account, substantial consistency of ethanol effects is apparent across experiments for strains SJL/J, C57BL/6J, C3H/HeJ, A/J, and 129S1/SvImJ. Strain FVB/NJ showed no impairment in two experiments but a slight improvement in another, which ranks it as least impaired in all three experiments. DBA/2J and BALB/cByJ showed clear impairment in two cases but no impairment in another. By comparing the magnitude of ethanol effects across the three experiments, it is evident that they were relatively smaller for the accelerating rotarod than for any other behavioral test. Generally speaking, it is expected that smaller effects will show less consistency across experiments.
It was interesting to compare the extent of ethanol-induced activation in the open field in Experiment 2 for the same eight strains that were also tested in the first experiment (). Under saline, strains A/J and 129S1/SvImJ were consistently low in activity while C57BL/6J was relatively high. The strain by ethanol interaction effect in Experiment 2 was an almost perfect replication of the interaction effect for the same strains in Experiment 1, the only exception being the absence of ethanol-induced activation for DBA/2J in the second experiment. This was a surprising finding in view of the well-documented sensitivity of this strain to ethanol’s locomotor stimulant effects [57]. The nature of the interaction effect depended on the specific phenotype; open field rearing was greatly reduced by ethanol for six of eight strains in Experiment 2 but not for SJL/J, which clearly showed an activation effect on distance traveled (data not shown).
Distance traveled in the open field was considerably greater across the same eight strains in Experiment 1 than Experiment 2, even though the same video camera and automated tracking software were employed. Recent tests have shown that illumination of the apparatus can markedly influence measured path distances [58]. It appeared to the experimenters that illumination was very similar in the two experiments. The experimenters themselves differed, however, and they could have influenced activity levels [59, 60], especially after the injection of saline or drug that entails human handling. It cannot be concluded that experimenters were the source of activity difference, although they could have been. A controlled study wherein different people test mice in a balanced order within the same study might provide data that are more convincing. Strain differences in locomotor activity have been an extremely stable characteristic of inbred mouse strains across decades of testing in multiple laboratories [4]. A recently completed study in the Wahlsten laboratory found a large difference between activity levels of mice following injection of ethanol by different experimenters but not before injection (see [8]).
The sample sizes for most of the groups shown were eight mice per group, and it can be seen from the standard error bars that individual variation was substantial for many measures. Some of the failures to replicate specific strain or ethanol effects across experiments could arise from sampling error. Larger sample sizes are generally needed to detect interaction effects than to detect main effects [61, 62]. Even larger sample sizes would be needed to ensure replication of an interaction effect itself, unless the interaction is very large.
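
The sample-size point can be illustrated with a small calculation. The sketch below is not taken from the cited sources; it assumes a balanced 2 x 2 between-subject layout (two strains by saline or ethanol), independent normally distributed scores with a common standard deviation `sigma`, and an interaction expressed as a difference of differences in raw units. The function names and default values are hypothetical, apart from `n = 8`, which matches the group size reported above. Under those assumptions the interaction contrast has twice the standard error of the treatment main effect, so roughly four times as many mice per group would be needed to estimate it with the same precision.

```python
import math
import random
import statistics


def se_main_effect(sigma, n):
    """SE of the treatment main effect in a balanced 2 x 2 design.

    Each marginal mean averages 2 cells of n mice, so the difference
    between the two marginal means has variance sigma**2 / n.
    """
    return sigma / math.sqrt(n)


def se_interaction(sigma, n):
    """SE of the interaction contrast (a difference of differences).

    Four independent cell means, each with variance sigma**2 / n,
    enter with weight +1 or -1, giving variance 4 * sigma**2 / n.
    """
    return 2.0 * sigma / math.sqrt(n)


def simulate(sigma=1.0, n=8, reps=3000, seed=1):
    """Monte Carlo check of both formulas with all true effects set to zero."""
    rng = random.Random(seed)
    mains, inters = [], []
    for _ in range(reps):
        # cell[strain][treatment]; treatment 0 = saline, 1 = ethanol
        cell = [[statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
                 for _ in range(2)] for _ in range(2)]
        # Main effect: mean of ethanol cells minus mean of saline cells
        mains.append((cell[0][1] + cell[1][1]) / 2 - (cell[0][0] + cell[1][0]) / 2)
        # Interaction: ethanol effect in strain 0 minus ethanol effect in strain 1
        inters.append((cell[0][1] - cell[0][0]) - (cell[1][1] - cell[1][0]))
    return statistics.stdev(mains), statistics.stdev(inters)


if __name__ == "__main__":
    sd_main, sd_inter = simulate()
    print(f"analytic SE:  main {se_main_effect(1.0, 8):.3f}, "
          f"interaction {se_interaction(1.0, 8):.3f}")
    print(f"simulated SD: main {sd_main:.3f}, interaction {sd_inter:.3f}")
```

The factor of two applies only to this simplified layout; a within-subject comparison or a design with more strains would change the constants, although the interaction would still be estimated less precisely than the main effect.
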
It might be argued that our sample size was too small to detect strain by treatment interactions involving light-dark cycle and cage enrichment. This situation is of greatest concern, however, when there appears to be an interaction in the data but the significance test does not detect it. In the present study, there was not even a hint of an interaction effect. Thus, sample size cannot account readily for the absence of interaction effects involving light-dark cycle or cage enrichment.
The present experiments were designed to assess the influences of two common variations in the laboratory environment that might alter results of studies of ethanol effects on behavior by carefully controlling the variations within an experiment in one lab. They were not explicitly designed to detect causes of different outcomes between experiments in one lab or between different labs in any more global way, but the results have some relevance to these questions. It has been argued that replicability of results in different labs will be enhanced if each lab employs more than one housing condition within a single experiment [63, 64]. This question has been addressed in a general way using computer simulation [8] with a factorial design involving several mouse strains and an experimental treatment studied independently in different labs. The critical issue in judging replicability is the strain by treatment by lab interaction effect. If it is very small, then results of the strain by treatment experiment are substantially the same across labs. The simulation shows how the outcome of the statistical analysis depends on the properties of the error term in the analysis. If the two housing conditions are included as a separate factor, the strain by treatment by lab interaction effect is not altered in any noteworthy way. If, on the other hand, variance attributable to housing condition is pooled with other sources of within-group variation into a global error term, power to detect the strain by treatment by lab interaction effect is reduced. The analysis is then less likely to find a significant interaction effect when such an effect is indeed present in the model that generates the data.
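
The effect of pooling on the error term can be shown with a toy example. This is not the simulation reported in [8]; it is a minimal sketch that assumes a single strain by treatment by lab group housed under two conditions, normally distributed scores, and a hypothetical housing effect of one standard deviation. All names and parameter values are illustrative assumptions.

```python
import random
import statistics


def error_variances(housing_effect=1.0, sigma=1.0, n_per_housing=500, seed=7):
    """Compare two error terms for one group housed under two conditions.

    Returns (separate, pooled):
      separate - within-group variance when housing is kept as a factor
      pooled   - within-group variance when housing is ignored, so that the
                 housing effect is absorbed into the error term
    """
    rng = random.Random(seed)
    standard = [rng.gauss(0.0, sigma) for _ in range(n_per_housing)]
    enriched = [rng.gauss(housing_effect, sigma) for _ in range(n_per_housing)]

    # Housing modeled as a factor: average the two within-condition variances
    separate = (statistics.variance(standard) + statistics.variance(enriched)) / 2

    # Housing ignored: one variance across all animals in the group
    pooled = statistics.variance(standard + enriched)
    return separate, pooled


if __name__ == "__main__":
    separate, pooled = error_variances()
    print(f"error variance with housing as a factor: {separate:.3f}")
    print(f"error variance with housing pooled:      {pooled:.3f}")
    # For a fixed effect mean square, F = MS_effect / MS_error, so the
    # F ratio shrinks by this factor when housing variance is pooled.
    print(f"F ratio multiplied by: {separate / pooled:.3f}")
```

With equal numbers in each housing condition, the pooled variance is expected to be close to `sigma**2 + housing_effect**2 / 4`, which is why the same interaction mean square yields a smaller F ratio, and hence lower power, when housing is folded into the error term.
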
Another possible influence on the strain differences reported here could be that the strains differ in alcohol absorption, distribution, and/or elimination after equal intraperitoneal doses based on g/kg body weight. We did not measure blood ethanol levels in these studies. Nonetheless, we have done so in many other studies with these and other strains using these and other tests after giving alcohol doses throughout the range employed here. A global analysis of the role of blood ethanol levels in ethanol intoxication found that it was not an important factor in explaining strain differences in behavioral sensitivity to ethanol intoxication, so we do not believe it was important here [12].
It is noteworthy that results of the experiments on light-dark cycle and enrichment that used the abbreviated test battery with eight strains rather than the longer battery with all 20 strains detected many alcohol effects that were large and highly significant. In our view, little information was lost because of the more compact test battery and abbreviated list of inbred strains, while the overall efficiency of subsequent experiments was greatly increased. Nevertheless, the larger sample of 20 strains did reveal some interesting facts that may warrant further study, such as the markedly greater sensitivity of strain PL/J to alcohol impairment of motor performance. PL/J was never among the least affected strains on any test. The notably greater sensitivity of C57L/J than C57BL/6J is also intriguing, but it arose mainly from greater sensitivity of C57L/J on just two tests, grip strength and hypothermia, that showed a low strain correlation of only r = 0.28 across the full set of 20 strains.
The use of a within-subject design to evaluate alcohol effects also increased the efficiency of tests of strain differences. In Experiment 2, the sizes of the ethanol main effects were large and significance levels were high for both the between-subject and within-subject comparisons. In Experiment 3, which utilized only the within-subject design with the same eight strains, alcohol effects were very large and highly significant, as were several strain by alcohol interaction effects. Results from a within-subject design can be more difficult to interpret if the direction of a trend across trials for an untreated subject is similar to the effect of a treatment. In the present tests, however, the trend for untreated mice was clearly an improvement in performance across trials because of learning to balance on the beam or rotarod or to grip the bar. In the open field, activity of untreated mice tends to habituate across trials. Alcohol, on the other hand, tends to increase slips on the balance beam, decrease grip strength, decrease latency to fall from the rotarod, and increase open field activity, at least at moderate doses. Thus, reduced performance that was evident from pre- to post-injection levels of performance in this study must represent a genuine impairment of performance by alcohol. This interpretation may not be valid for other tests or higher doses of alcohol. Preliminary evaluations of within- versus between-subject designs would be well advised when working with different kinds of tests. There could be situations where the experience with a test during pre-injection testing attenuates the effects of an alcohol injection that might be evident when alcohol is given to a naïve animal.
In the present series of experiments, the abbreviated battery of four tests applied with a within-subjects design proved to be both efficient and effective for the purposes of the study. Those methods were particularly well adapted to experiments designed to evaluate possible interactions with an environmental treatment factor, because the sample size within a group could be reasonably large. Between-subject designs with many tests in a battery are likely to suffer a loss of statistical power when spreading fewer mice thinly over more test conditions.
In conclusion, these studies provide further evidence for the influence of cage enrichment on certain behaviors in mice. Interestingly, testing mice in their most active circadian phase appeared to be no different than testing them during a period when they normally sleep. It was reassuring to find that ethanol had potent intoxicating effects regardless of circadian phase or whether mice had been reared in standard or enriched cages. Thus, there appears to be little danger of misinterpreting genetically based differences when employing common variations of these two environmental factors.