Contrary to our hypothesis, cylinder scores – not the chamber preference change score – showed the highest test-retest reliability. Additionally, cylinder scores obtained on Day 2 were more ecologically valid than Day 2 chamber scores. Although Day 1 cylinder scores also appeared more ecologically valid than Day 1 chamber scores, this difference did not reach statistical significance. Even so, every cylinder score attained a higher correlation than any chamber score. Overall, therefore, cylinder scores achieved higher test-retest reliability and ecological validity than chamber scores.
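The test used to compare correlations is not specified here. For two correlations that share a variable, such as a Day 1 cylinder score and a Day 1 chamber score each correlated with the same Phase 3 interaction score, one standard option is Williams' t as presented by Steiger (1980). The sketch below assumes that test; the function name `williams_t` and the input values are hypothetical, not data or code from this study. Comparing test-retest correlations (which share no variable) would require a different test.

```python
import math


def williams_t(r_jk: float, r_jh: float, r_kh: float, n: int) -> tuple[float, int]:
    """Williams' t (as given by Steiger, 1980) for H0: rho_jk == rho_jh.

    j is the variable shared by both correlations (e.g. a Phase 3 social
    interaction score); k and h are the two competing scores (e.g. a cylinder
    score and a chamber score); r_kh is their correlation with each other.
    Returns the t statistic and its degrees of freedom (n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 subjects")
    # Determinant of the 3x3 correlation matrix of j, k and h.
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * ((n - 1) / (n - 3)) * det_r + r_bar**2 * (1 - r_kh) ** 3
    t = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh) / denom)
    return t, n - 3


# Illustrative values only, not results from this study.
t, df = williams_t(r_jk=0.55, r_jh=0.35, r_kh=0.60, n=40)
print(f"t({df}) = {t:.2f}")
```

The resulting t would then be compared against a t distribution with n - 3 degrees of freedom; a large positive value favors the first score (k).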
Our sample size did not provide sufficient statistical power to detect the relatively small differences among social chamber/cylinder times, preference scores, and preference change scores. Notably, however, social cylinder time showed the highest test-retest reliability of all the scores. Furthermore, social cylinder time, and not the hypothesized cylinder preference change score, showed the highest ecological validity, though only slightly higher than that of the other cylinder scores. Thus, the three cylinder scores were more generally valid than the three chamber scores, and social cylinder time may be the most generally valid of the three cylinder scores; however, additional studies with larger samples would be required to confirm the latter point.
We had hypothesized that preference change scores would be the most generally valid scores because they theoretically control both for a mouse’s tendency to explore a nonsocial stimulus (as preference scores also do) and for its individual preference for a particular chamber or cylinder. It was therefore surprising that the cylinder preference and cylinder preference change scores were in no case more generally valid than social cylinder time. The information about nonsocial stimulus investigation and prior chamber/cylinder preference may be only weakly related, or unrelated, to sociability; by incorporating it, these scores may add nearly random “noise” to the “signal” of social cylinder time. The lower test-retest reliability of these complex scores, compared with social cylinder time, may support this notion, although a similar pattern does not appear for ecological validity. Regardless, experimenters should not assume that a more complex score necessarily improves the general validity of a behavioral analysis; it may sometimes decrease it.
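The three score types differ only in how much extra information they fold into the social time. The formulas are not given in this section, so the sketch below assumes common definitions: a preference score is social time divided by social plus nonsocial time, and a preference change score subtracts the mouse's Phase 1 preference for the side that later holds the stimulus mouse. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Seconds one test mouse spent on each measure (hypothetical values)."""
    phase1_social_side: float     # Phase 1: side/cylinder that later holds the stimulus mouse
    phase1_nonsocial_side: float  # Phase 1: the opposite side/cylinder
    social: float                 # Phase 2: social chamber or social cylinder time
    nonsocial: float              # Phase 2: nonsocial chamber or cylinder time


def preference(social: float, nonsocial: float) -> float:
    """Fraction of time on the social side (assumed definition).

    A mouse that visited neither side is treated as having no preference
    (0.5); this is an assumption made for the sketch.
    """
    total = social + nonsocial
    return social / total if total > 0 else 0.5


def preference_change(s: Session) -> float:
    """Phase 2 preference minus the mouse's prior Phase 1 preference for that side."""
    return preference(s.social, s.nonsocial) - preference(
        s.phase1_social_side, s.phase1_nonsocial_side
    )


mouse = Session(phase1_social_side=60, phase1_nonsocial_side=40, social=90, nonsocial=30)
print(mouse.social, preference(mouse.social, mouse.nonsocial), preference_change(mouse))
```

Each added term (nonsocial time, Phase 1 times) carries its own measurement error, which is one way the more complex scores could add noise rather than signal.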
The superior general validity of cylinder scores over chamber scores suggests that sociability should be measured primarily through the active behaviors most directly related to social investigation. The predominant active social behavior is sniffing the cylinder. When its nose is in contact with the cylinder, the test mouse can likely perceive both volatile and nonvolatile odorants from the stimulus mouse (Brennan and Kendrick, 2006, Luo et al., 2003, Sanchez-Andrade and Kendrick, 2009). Other active social behaviors include scratching, gnawing, climbing on, and rearing against the cylinder. Chamber scores may capture additional, passive social behaviors that can occur at some distance between the mice, such as when the test mouse chooses to be near another mouse, watches that mouse, or smells volatile odorants that have diffused some distance from it. But these scores also include behaviors that are not clearly social, such as sniffing the chamber walls, walking through the chamber but not toward the social cylinder, and remaining still next to the chamber wall. Likewise, low locomotor activity may substantially affect a chamber score. Excluding these behaviors by counting only active behaviors directed toward the social cylinder yields a more generally valid measurement of sociability.
Because this study analyzed a heterogeneous group of mice, some conclusions may not apply evenly across all subgroups. With no more than 20 mice in each subgroup, statistical power was not sufficient to test robustly for correlational differences among the subgroups, and the estimates of the magnitude of the correlations for each subgroup are imprecise. However, some general patterns are noteworthy.
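To make the imprecision of subgroup estimates concrete, the sketch below computes an approximate 95% confidence interval for Pearson's r using the standard Fisher z transform. The correlation of 0.5 is a hypothetical value chosen for illustration, not a result from this study; the comparison with n = 80 simply shows how the interval narrows with a larger sample.

```python
import math
from statistics import NormalDist


def pearson_r_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for Pearson's r via Fisher's z transform."""
    if not -1 < r < 1 or n <= 3:
        raise ValueError("need |r| < 1 and n > 3")
    z = math.atanh(r)                 # Fisher z
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


for n in (20, 80):
    lo, hi = pearson_r_ci(0.5, n)
    print(f"n = {n}: r = 0.50, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With 20 mice, an observed r of 0.5 is compatible with anything from a negligible to a strong true correlation, which is why subgroup patterns are described here only as general tendencies.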
Social cylinder time showed higher test-retest reliability and ecological validity than social chamber time for nearly all subgroups. The exception for test-retest reliability was the 4-week-old BALB/cJ females, which behaved inconsistently across test sessions; even so, for no experimental group did social chamber time show greater reliability than social cylinder time. For ecological validity, the 9-week-old C57BL/6J females were an exception, in which the social chamber time correlation exceeded that of social cylinder time. However, the chamber time correlation was near zero, whereas the cylinder time correlation was, surprisingly, negative, though not of high magnitude. Thus, even in this case, social cylinder time may reflect a relationship that chamber time does not. In sum, there is no evidence that chamber scores are more generally valid than cylinder scores for any subgroup.
The correlations presented here are based on the behaviors of individual mice. While assessing reliability on an individual level is a common approach (Andreatini and Bacellar, 2000, Drugan et al., 1989, Henderson, 2005, Hilakivi and Lister, 1990, Lister, 1987, Teixeira-Silva et al., 2009), studies of anxiety-related behaviors suggest that examining behaviors on a group level can yield different results (Ramos, 2008). In some cases, group-level analyses were able to detect behavioral correlations that were not present at an individual level. Thus, the possibility remains that a group-level analysis could detect higher general validity of chamber scores than has been found here. In developing the Social Choice Test, Moy et al. (2004) presented evidence that chamber scores are generally reliable at a group level: adult C57BL/6J and DBA/2 mice showed largely similar chamber scores between a test and a retest 11–12 days later. However, cylinder scores were not reported in this experiment, so it is unclear whether the chamber scores’ reliability equals that of the cylinder scores at a group level. Given the large difference in general validity between chamber and cylinder scores found here, it is unlikely that a group-level comparison of chamber and cylinder scores would undermine our recommendation to primarily use cylinder scores to evaluate sociability.
This study was limited by the use of archival data (Sankoorikal et al., 2006) that were not originally designed to answer questions about the general validity of the sociability scores. One limitation was potential test order effects: the interactions of the mice during Phase 2 (Social Choice) might have affected their subsequent interactions in Phase 3 (Free Social Interaction). Because no mice were tested in Phase 3 before Phase 2, such test order effects could not be identified. Furthermore, a test mouse was exposed to the same stimulus mouse in Phase 2 on Day 2 and in Phase 3 (also on Day 2), which may have attenuated their interaction during Phase 3 through habituation. However, any attenuation of social interaction that affected the mice fairly uniformly would not have greatly affected the correlations between Phase 2 and Phase 3, which were based on Pearson’s r; Pearson’s r does not change when every value of one variable is shifted by a constant or multiplied by the same positive factor. Notably, attenuation of social interaction (habituation) seems even less likely between Phase 2 on Day 1 and Phase 2 on Day 2, because each test mouse was tested with different stimulus mice on the two days and because of the day-long interval between tests. Additionally, any test order effects might have been minimal: the effects of prior testing experience depend on the specific paradigms used and do not necessarily affect results substantially (Henderson, 2005, McIlwain et al., 2001).
The Social Choice Test is a highly controlled assay for social affiliation, and this high level of control entails curtailing some naturalistic aspects of social interactions between mice. Confining the stimulus mouse to a cylinder in Phase 2 allows one to isolate, to some degree, the sociability of the test mouse. But it also alters the quality or nature of the social interaction, because confinement limits the stimulus mouse’s ability to initiate, maintain, and terminate a social interaction and to respond to social cues from the test mouse. Social behaviors of the test mouse may also be affected by the novel environment, which can induce exploratory and anxiety-related behaviors, and by the inability to fully contact the stimulus mouse through the partial barrier between them (the cylinder wall with holes in it). However, this controlled social interaction shows some similarity to a more naturalistic interaction, as indicated by the positive correlations between the social measures of Phase 2 (Social Choice) and Phase 3 (Free Social Interaction). Moreover, we regularly include a Free Social Interaction phase in the Social Choice Test in all of our studies (Brodkin et al., 2004, Sankoorikal et al., 2006, Fairless et al., 2008), so as to observe social interactions in both a more controlled and a more naturalistic way within the context of the Social Choice Test.
Phase 3 (Free Social Interaction) is more naturalistic than Phase 2 (Social Choice), during which the stimulus mouse is confined to a cylinder, because both mice can move freely in Phase 3. Nevertheless, Phase 3 still differs substantially from a social encounter between feral mice in their natural environment. Among other artificial factors, the mice in Phase 3 are laboratory-bred, are restricted to a novel, artificial environment, and interact in the presence of a human. Approaches that reduce or eliminate such factors to attain more naturalism, such as observing mice in home cage environments or semi-natural burrow habitats, can be related to the Social Choice Test to further investigate its ecological validity. Importantly, mice of the inbred strain BTBR T+tf/J show lower levels of social behavior than C57BL/6J mice in both the Social Choice Test and semi-natural burrow habitats (McFarlane et al., 2008, Pobbe et al., 2010). Unlike the present results, these findings are based on group-level analyses, but they do support the notion that results from the Social Choice Test can be relevant to more naturalistic social situations.
Since the archival data were collected, the procedure for the Social Choice Test has been altered. In the earlier experiment (Sankoorikal et al., 2006, this study, Experiment 1), the stimulus mouse was placed into one cylinder while the other cylinder remained empty at the start of Phase 2. When introduced, the stimulus mouse was therefore both a social stimulus and a novel stimulus. To control for novelty as a possible confound, subsequent experiments have introduced a novel object into the other (nonsocial) cylinder at the same time that the stimulus mouse is introduced into the social cylinder at the start of Phase 2 (Fairless et al., 2008; this study, Experiment 2). Given this change, the results concerning test-retest reliability and ecological validity from Experiment 1 might not apply well to subsequent experiments. We consider this unlikely because the procedural change (the novel object in the nonsocial cylinder) has not substantially changed the behaviors of test mice: under either procedure, test mice generally sniff the nonsocial cylinder little compared with the social cylinder, and experimental results in C57BL/6J and BALB/cJ mice have been very similar before and after the change (e.g., juvenile BALB/cJ mice have consistently shown lower sociability than juvenile C57BL/6J mice under both procedures; Sankoorikal et al., 2006, Fairless et al., 2008).
Tools that can automate the measurement of chamber scores are well established (Nadler et al., 2004, Page et al., 2009) and widespread, which may account for the prevalence of using only chamber scores to assess sociability in the Social Choice Test. Given the superior general validity of cylinder scores indicated by our study, exclusive use of chamber scores may produce a higher rate of undetected false positives and false negatives in the Social Choice Test. To facilitate the use of cylinder scores, we have validated the software TopScan for automated measurement of cylinder sniffing in the Social Choice Test. At the settings that we specified, TopScan performs as well as or better than human raters at this task, as we had hypothesized.
Some have suggested that a mouse’s proximity to a cylinder provides an adequate measure of sociability (Page et al., 2009). Contrary to this suggestion, our data show that proximity provides a less accurate measure of sniffing than direct measurement of cylinder sniffing, whether manual or automated. We have observed that test mice often walk beside or along the cylinder wall but orient their heads toward the cylinder only in brief, intermittent periods to sniff. This behavior may account for much of the discrepancy between the “cylinder proximity” measurements and our recommended “cylinder sniffing” approach. In summary, the cylinder proximity approach may risk a higher rate of false positives and false negatives in assessing sociability in the Social Choice Test; our results support the use of direct measurements of cylinder sniffing.
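The sketch below illustrates how walking along the cylinder wall can inflate a proximity measure relative to a sniffing measure. It is not TopScan's algorithm, whose internals are not described here: it assumes per-frame tracking of the body center and nose, and every threshold (cylinder radius, margins, head angle) and function name is hypothetical. A frame counts as "proximity" when the body center is near the wall, and as "sniffing" only when the nose is at the wall and the head points toward the cylinder.

```python
import math

CYL_CENTER = (0.0, 0.0)
CYL_RADIUS = 5.0       # cm, hypothetical cylinder radius
PROX_MARGIN = 3.0      # body center within this distance of the wall = "proximity"
SNIFF_MARGIN = 1.0     # nose within this distance of the wall ...
MAX_HEAD_ANGLE = 45.0  # ... and head pointed within this many degrees of the cylinder


def near_cylinder(center):
    """Proximity criterion: only the body center position matters."""
    return math.dist(center, CYL_CENTER) <= CYL_RADIUS + PROX_MARGIN


def sniffing_cylinder(center, nose):
    """Sniffing criterion: nose at the wall and head oriented toward the cylinder."""
    if math.dist(nose, CYL_CENTER) > CYL_RADIUS + SNIFF_MARGIN:
        return False
    heading = (nose[0] - center[0], nose[1] - center[1])
    to_cyl = (CYL_CENTER[0] - nose[0], CYL_CENTER[1] - nose[1])
    norm = math.hypot(*heading) * math.hypot(*to_cyl)
    if norm == 0:
        return False  # degenerate frame (e.g. nose and center tracked at one point)
    cos_angle = (heading[0] * to_cyl[0] + heading[1] * to_cyl[1]) / norm
    return cos_angle >= math.cos(math.radians(MAX_HEAD_ANGLE))


# Hypothetical frames as (body center, nose) coordinates in cm.
frames = [
    ((7.0, 0.0), (7.0, 2.0)),      # walking along the wall, head tangential
    ((0.0, 7.0), (-2.0, 7.0)),
    ((-7.0, 0.0), (-7.0, -2.0)),
    ((0.0, -8.0), (0.0, -5.5)),    # head turned toward the cylinder: a sniff
    ((20.0, 20.0), (18.0, 18.0)),  # elsewhere in the chamber
]
proximity_frames = sum(near_cylinder(c) for c, _ in frames)
sniff_frames = sum(sniffing_cylinder(c, n) for c, n in frames)
print(proximity_frames, sniff_frames)
```

In this toy trajectory, three of the four proximity frames involve no sniffing at all, mirroring the wall-walking behavior described above.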
The higher general validity of cylinder scores compared to chamber scores suggests that active investigation of a conspecific is the predominant component of sociability in the Social Choice Test. Sociability, the tendency to approach and affiliate with an unfamiliar conspecific, is a relatively simple social behavior, but it is important in many species as a prelude to more complex behaviors, such as the formation of social bonds. Research into the biological factors that influence sociability in mouse models of ASD may eventually yield insight into the social impairments of ASD, and optimal measurement of sociability is essential to obtaining clear results in this endeavor.