Selection of studies
With one exception, all studies included in this mega-analysis were conducted by the first author or his collaborators. All studies available at the inception of this paper were included without prejudice as long as they contained at least ten participants and the two essential types of data: (i) standard Big-Five questionnaire scores and (ii) Big Five ratings of behavior during experience sampling. Of 21 studies available, six studies were not included, leaving 15. Four were not included because standard questionnaire data were not available, one study was not included because participants were explicitly instructed to center their ratings on the number 4, eliminating any individual differences in behavior, and one study was excluded because there were only nine participants.
Because we collected all of the data ourselves, we had the rare advantage of access to all of the raw data. We were therefore able to combine all data sets into a single file and conduct a “mega-analysis” in addition to the more traditional meta-analysis (Steinberg et al., 1997).
Participants in the 15 studies used several different questionnaires to describe their behavior. For the standard questionnaires, one study used a 1 to 5 scale, two studies used a 1 to 6 scale, and twelve studies used a 1 to 7 scale. For the ESM reports, four studies used a 1 to 6 scale and eleven studies used a 1 to 7 scale. To equate the scales across studies, we POMP-transformed scores on both the standard questionnaires and the ESM reports (Cohen, Cohen, West & Aiken, 2003). Scores were rescaled so that 0 was the minimum possible score, 100 was the maximum possible score, and each intermediate score expressed its position as a percentage of the possible range, the “Percent Of Maximum Possible.” This yielded scores on a 0 to 100 scale on each of the Big Five factors for all standard questionnaires and ESM reports. Because POMP is a linear transformation within each study, it does not change the results, other than making them more interpretable and comparable across studies. The scores obtained on each Big Five domain for each standard questionnaire, or for each ESM report, were then combined to form the data sets for the mega-analysis.
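The POMP rescaling can be written as a one-line formula. The sketch below assumes that the minimum and maximum possible scores are simply the lowest and highest response options of each scale (for example 1 and 7); the function name `pomp` is illustrative and not taken from the studies' materials.

```python
def pomp(score: float, scale_min: float, scale_max: float) -> float:
    """Percent Of Maximum Possible: map scale_min to 0 and scale_max to 100.

    Assumes scale_min and scale_max are the lowest and highest response
    options of the scale the score came from.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return (score - scale_min) / (scale_max - scale_min) * 100.0


# The three response formats described in the text: 1-5, 1-6, and 1-7 scales.
print(pomp(4, 1, 7))  # midpoint of a 1-7 scale -> 50.0
print(pomp(3, 1, 5))  # midpoint of a 1-5 scale -> 50.0
print(pomp(6, 1, 6))  # top of a 1-6 scale -> 100.0
```

Because the formula is linear in `score`, correlations computed within any single study are unchanged; the rescaling only puts the 1-5, 1-6, and 1-7 formats on a common 0 to 100 metric.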
Combining the data resulted in 21,871 concatenated ESM reports provided by 495 participants from 15 different studies. Each participant provided an average of 44.18 (SD = 35.32) reports. Extraversion was assessed in every study [N (participants) = 495, n (reports) = 21,798 (excluding reports for which the Extraversion score was missing), average number of reports = 44.04], but different studies assessed different subsets of the other Big-Five traits. Agreeableness was assessed in 12 studies (N = 399, n = 15,388, average number of reports = 38.57), Conscientiousness in 14 studies (N = 474, n = 20,922, average number of reports = 44.14), Emotional Stability in 12 studies (N = 386, n = 17,827, average number of reports = 46.18), and Intellect in 10 studies (N = 311, n = 12,544, average number of reports = 40.33).
For state reports with missing items, the state score was calculated from the completed items. For some state reports, state scores were missing for one or more Big Five domains (a state score was missing if none of its items were completed, that is, all were left blank or marked “n/a”). To check the potential impact of missing data, we counted missing state scores for each trait. For Extraversion, 73 of 21,871 possible state scores were missing (0.33%; average number of missing scores per participant = 0.14). For Agreeableness, 497 of 15,885 possible state scores were missing (3.13%; average number of missing scores per participant = 1.24). For Conscientiousness, 368 of 21,290 possible state scores were missing (1.73%; average number of missing scores per participant = 0.77). For Emotional Stability, 150 of 17,977 possible state scores were missing (0.83%; average number of missing scores per participant = 0.38). For Intellect, 28 of 12,572 possible state scores were missing (0.22%; average number of missing scores per participant = 0.09). Given these small percentages, it is unlikely that the missing data, even if not missing at random, had any appreciable effect on the relationships between parameters of the density distributions and trait scores. Consequently, density distributions were calculated from the available state scores (missing data were not replaced or imputed).
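The missing-data counts can be cross-checked against the report counts given for each trait: possible state scores minus missing scores should equal the number of available reports (n). The sketch below only re-does that arithmetic on the figures quoted in the text; the dictionary layout and the helper name `check_missing` are illustrative.

```python
# (possible state scores, missing state scores, available reports n), as quoted in the text.
COUNTS = {
    "Extraversion": (21_871, 73, 21_798),
    "Agreeableness": (15_885, 497, 15_388),
    "Conscientiousness": (21_290, 368, 20_922),
    "Emotional Stability": (17_977, 150, 17_827),
    "Intellect": (12_572, 28, 12_544),
}


def check_missing(counts):
    """Verify possible - missing == available and return percent missing per trait."""
    percents = {}
    for trait, (possible, missing, available) in counts.items():
        assert possible - missing == available, f"counts do not reconcile for {trait}"
        percents[trait] = round(100 * missing / possible, 2)
    return percents


for trait, pct in check_missing(COUNTS).items():
    print(f"{trait}: {pct}% missing")
```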
It is possible that differences between studies in adjectives, time of year, or other administrative factors led to differences in the mean level of trait endorsement (or in other parameters, such as variability). Such study differences could artificially inflate correlations if they shifted both the questionnaire scores and the ESM scores of a given study in the same direction. For example, if the items in a particular study led to higher means on both the ESM reports and the questionnaire than in other studies, the correlation between the means would be artificially increased by the between-study differences. However, we are interested only in differences between individuals. To correct for this possibility, questionnaire scores and distribution parameters were centered within study before computing correlations. For the analyses predicting single states, single states were likewise centered within study before computing correlations. Note that this correction also removes any legitimate between-study differences in individuals' trait levels, and thus may lead to a slight underestimate of the true correlations.
shows two parameters of the density distributions for each study and each trait separately. The mean refers to the average person's average level of the state across the ESM portion of the study. Because the data were POMP transformed, a mean of 50 would indicate that the average state is equally close to each pole of the trait; values above 50 indicate that the average person's average state is closer to the high end of the trait than to the low end. The standard deviation refers to the amount of within-person variability for the average person in the study. A high standard deviation means that the average person varies considerably in how much his or her behavior manifests a given trait.
Density Distributions for all Studies
The mega-analysis line of shows the average across all available individuals (no centering was done here because there were no correlations that could be artificially inflated). This line shows what the behavior of a typical individual is like, as characterized by the five dimensions of the Big Five. The average person's mean is close to the midpoint for extraversion and intellect, but much closer to the agreeable, conscientious, and emotionally stable ends than to the disagreeable, unconscientious, and neurotic ends, respectively. The average person's SD is high for each of the traits, especially extraversion and conscientiousness, meaning that the typical individual routinely and regularly manifested a wide range of levels of most traits in his or her behavior (Fleeson, 2001). For example, the SD of 17.37 for extraversion means that, assuming a normal distribution, the typical individual needed a range of about 70 points (2 × 1.96 × 17.37 ≈ 68) to cover 95% of his or her states. Because the scale is only 100 points wide, the typical person routinely and regularly covered most of the possible range of extraversion states.
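The "about 70" figure follows from the normal-distribution assumption stated in the text: 95% of a normal distribution lies within about ±1.96 SD of its mean. A minimal sketch of that arithmetic, using only the SD of 17.37 reported for extraversion:

```python
from statistics import NormalDist

sd_extraversion = 17.37  # average within-person SD for extraversion (POMP units)

# Two-sided 95% interval under a normal distribution: mean +/- z * SD.
z = NormalDist().inv_cdf(0.975)  # about 1.96
width = 2 * z * sd_extraversion

print(round(width, 1))  # 68.1 POMP points, i.e. roughly 68% of the 0-100 scale
```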
Comparing the variation within people to the variation between people helps evaluate the extent to which individuals deviate from their trait standing. also shows the between-person standard deviations of within-person means and standard deviations; this line indicates how much people differed from each other in their distributions. For four of the five traits, the within-person standard deviation was higher than the between-person standard deviation. Additionally, we fit unconditional multilevel models to partition the total variance in states into the portion representing differences between people in which traits they manifest and the portion representing differences within each person in which traits they manifest on different occasions. As shown in , most of the variation in trait manifestation in behavior is within-person variation, that is, the same person manifesting different traits at different moments.
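The text partitions state variance with unconditional multilevel models. As a rough stand-in (a swapped-in technique, not the authors' estimator), the classical one-way ANOVA estimate of the intraclass correlation gives the between-person share of variance, with 1 − ICC as the within-person share. The sketch below assumes one list of state scores per person, at least two persons, and more reports than persons; `anova_icc` is a hypothetical helper name. Unlike a variance-component model, this estimator can return negative values when between-person differences are negligible.

```python
from statistics import fmean


def anova_icc(states_by_person):
    """ANOVA (method-of-moments) estimate of the intraclass correlation ICC(1).

    Returns the estimated between-person share of state variance; the
    within-person share is 1 - ICC. Handles unequal numbers of reports.
    """
    groups = [list(g) for g in states_by_person if len(g) > 0]
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = fmean(x for g in groups for x in g)
    person_means = [fmean(g) for g in groups]

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, person_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, person_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average number of reports per person for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Two hypothetical people whose states barely overlap: mostly between-person variance.
print(anova_icc([[0, 2], [10, 12]]))
```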
This high degree of within-person variability means that most of the time, participants reported acting at a level different from their typical level. This lowers the likelihood that questionnaire scores will be able to predict such behaviors accurately, and confirms one source of concern about the ability of traits to predict trait manifestation in behavior.
Each participant's scores on the standard Big-Five questionnaires were correlated with one of his or her randomly selected state reports. This procedure was repeated for ten trials, with a different random selection of states in each trial. The averages of these correlations are shown in . Several aspects of these results are important. First, the correlations are all significant. Thus, trait standing does significantly predict trait manifestation in behavior. Second, the magnitudes of the correlations are all below the traditional .40 barrier, consistent with the assumed but never tested claim that trait standing less strongly predicts naturally occurring trait manifestation in behavior at any given, randomly selected moment. Third, however, the correlations range from .18 for extraversion to .37 for intellect, with three of the traits having correlations over .30, suggesting that the typical correlation for predicting even single states is respectably high. The modest size of these correlations also means that participants do not simply report their trait standing on the ESM reports, but rather vary their responses from moment to moment.
Results of the Mega-Analysis: Relationship of Big-Five Trait Standing to Multiple Behavior Summaries
Given the large variability in states, it is necessary to summarize them in some way. We first summarized states by their central tendency, using three different operationalizations. The central tendency represents the location or center of an individual's distribution, and the three operationalizations differ in how they conceptualize the center. The mean is the balancing point of the distribution, such that the sum of deviations above the mean equals the sum of deviations below it; the mean also represents the aggregate of states, whose importance has been emphasized (Epstein, 1979). The correlations between means and trait standing, as shown in , reveal that trait standing strongly predicts mean levels of trait manifestation in behavior. All correlations exceeded .40, and one reached .56. This means that trait standing is a powerful predictor of averaged trait manifestation in behavior.
Correlations with the median of the distribution were almost as high, ranging from .41 to .55. The median is also a balancing point, but it balances the number of states above and below it rather than the magnitude of their deviations. Finally, correlations with the mode were lower but still higher than correlations with single states, ranging from .28 to .48. Together, these correlations suggest that trait standing refers to the central tendency of states.
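The three central-tendency summaries can be computed per person before being correlated with trait scores. A minimal sketch, assuming each person's states are a list of POMP scores; because the text does not say how ties for the most frequent value were resolved, this sketch takes the mean of the tied modes, which is an assumption. `central_tendencies` is a hypothetical helper name.

```python
from statistics import fmean, median, multimode


def central_tendencies(states):
    """Mean, median, and mode of one person's state scores.

    If several values tie for most frequent, the mode is taken as the mean
    of the tied values (an assumption; the source does not specify this).
    """
    return {
        "mean": fmean(states),        # balances the magnitude of deviations
        "median": median(states),     # balances the number of states above and below
        "mode": fmean(multimode(states)),  # most frequent state value(s)
    }


print(central_tendencies([50, 50, 60, 80, 100]))
# {'mean': 68.0, 'median': 60, 'mode': 50.0}
```

Each summary would then be computed for every participant and correlated with that participant's questionnaire score, one trait at a time.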
Because trait standings were correlated across traits, we also examined the unique relationship of each Big-Five trait standing with the mean. Participants' means for each state were regressed onto the five questionnaire trait standings. The resulting unique relationships were slightly smaller in magnitude but remained high. Thus, intercorrelations among traits do not explain the strong predictive power of traits.
Alternate summaries: Maximum and minimum
A person's most extreme behavior is another characteristic of the behavioral distribution that may correspond to trait standing. For example, the maximum state level may be implicated by competence views of traits, and the minimum state level may be implicated by baseline views of traits (i.e., that trait standing reflects the amount of a trait present when no other factors are acting on behavior). shows that both the maximum and the minimum state levels were predicted by trait standing. Predictions of the maximum state were similar in magnitude to those for central tendency, ranging from .34 to .54, whereas predictions of the minimum state were closer to predictions of randomly selected states, ranging from .22 to .37.
Comparison of mean, median, mode, maximum, and minimum
In the bivariate relationships above, the mean appeared to be the strongest correlate of trait standing. To determine whether the other summaries are related to trait standing independently of the mean, we conducted a series of multiple regressions. Each multiple regression predicted trait standing from the mean and one other summary, one trait at a time (note that this puts the questionnaire score in the DV position). For none of the five traits was the median or mode significantly related to trait standing when holding the mean constant. For only one trait (intellect) did the minimum have a significant independent relationship to trait standing. However, for four of the five traits (and with a p value of .06 for the fifth), the maximum was significantly related to trait standing even when controlling for the mean. In all 20 tests, the mean remained significantly related to trait standing, even when controlling for any other summary. These results suggest that trait standing, as assessed by questionnaire, primarily implicates individuals' mean trait manifestation but also carries independent information about individuals' maximum trait manifestation in behavior.
Other parameters of distributions
Trait standing may have implications not only for a single level in the distribution but also for the width and shape of the distribution. shows that trait standing was only weakly related to the amount of variability in states, with correlations ranging from -.07 to .23. To check whether a curvilinear relationship between trait standing and variability underlay the weak linear relationship, we also correlated variability in states with the quadratic term of trait standing while controlling for the linear term. These correlations were also very small, with only one being significant.
shows that even the skew (correlations ranging from -.18 to -.28) and kurtosis (correlations ranging from -.02 to .27) of state distributions were related to trait standing. However, these relationships were small.
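Per-person skew and kurtosis can be computed from the central moments of each person's states. This sketch uses the uncorrected moment estimators (skewness g1 and excess kurtosis g2); statistical packages often apply small-sample corrections, and the text does not say which variant was used, so the exact values are an assumption. `skew_and_excess_kurtosis` is a hypothetical name.

```python
from statistics import fmean


def skew_and_excess_kurtosis(states):
    """Moment-based skewness g1 and excess kurtosis g2 (no small-sample correction)."""
    m = fmean(states)
    devs = [x - m for x in states]
    n = len(states)
    m2 = sum(d ** 2 for d in devs) / n  # second central moment (variance)
    m3 = sum(d ** 3 for d in devs) / n
    m4 = sum(d ** 4 for d in devs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


print(skew_and_excess_kurtosis([1, 2, 3, 4, 5]))  # symmetric and flat: skew 0, kurtosis about -1.3
```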
It is possible that repeated completion of reports caused scale-use drift, fatigue, or increased accuracy. We divided each participant's reports into the first 24 completed (typically about one week's worth), the next 24 completed (typically a second week), and the remainder (note that studies varied in the number of reports participants completed). For four of the five traits, means did not differ significantly across these periods, suggesting that participants maintained remarkably stable standards over the course of the studies. Conscientiousness was the only exception, increasing from .61 in the first week to .66 in the third week. This might be explained by the fact that the longest study preselected participants to be high in conscientiousness. Within-person standard deviations differed slightly but significantly from the first week to the second for all five states, and again from the second to the third week for two of the states. Standard deviations dropped by about 1 to 2 points per week, relative to standard deviations of roughly 12 to 18. This means that people were less variable in later periods of the studies, possibly suggesting a change in scale use. Finally, correlations between questionnaire scores and average state levels were generally strongest in the first week, slightly lower in the second week, and weakest for longer durations, although conscientiousness did not follow this pattern. Specifically, the correlations for the three periods were .42/.32/.28 (extraversion), .54/.55/.50 (agreeableness), .45/.43/.47 (conscientiousness), .53/.46/.38 (emotional stability), and .55/.50/.42 (intellect). These results suggest that future research should examine how study length affects data quality.
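The period split described above can be sketched as simple slicing of each participant's chronologically ordered reports, followed by a mean and within-person SD per period. The block size of 24 comes from the text; the helper names and the rule of skipping periods with fewer than two reports (needed for a sample SD) are illustrative assumptions.

```python
from statistics import fmean, stdev


def split_periods(reports, block=24):
    """First `block` reports, next `block` reports, and the remainder."""
    return reports[:block], reports[block:2 * block], reports[2 * block:]


def period_summaries(reports, block=24):
    """(mean, within-person SD) for each period, or None if fewer than 2 reports."""
    return [
        (fmean(p), stdev(p)) if len(p) >= 2 else None
        for p in split_periods(reports, block)
    ]


# Hypothetical participant with 60 reports: periods of 24, 24, and 12 reports.
first, second, rest = split_periods(list(range(60)))
print(len(first), len(second), len(rest))  # 24 24 12
```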
Effects of study factors
In a typical meta-analysis, the question of whether study factors are related to differences between study results would be addressed by formal tests. Such tests compare groups of studies at different levels of the factor in question. For example, if some studies were conducted in the summer and some in the winter, a two-level variable would be created, and the variance in studies at one level (summer studies) would be compared with the variance in studies at the other level (winter studies) using a significance test. Although the total numbers of participants and ESM reports in the present study are substantial, the number of studies (15) is too small to detect study-level differences with this approach. Instead, we selected certain study factors that seemed particularly relevant to the analyses and re-computed the central analyses for the selected studies. Although these results are not compared with the results for the other studies in a formal test, they can provide some information about the potential effect of the identified study factors. Because these analyses are only tentative, we limited them to the prediction of the average (mean) state level, which was the most predictable in the earlier analyses.
First, lower reliability of either the standard trait questionnaires or the state measures is likely to reduce, and place a ceiling on, the correlation between traits and behavior. reports any cases in which reliabilities for a given scale were below .65. Although many studies had one or two of their 10 scales (five questionnaire and five ESM scales) with low reliability, three studies did not have any scale in either the self-report questionnaire or the ESM portion with reliability below .65 (ESM Standard 2, Weekly, and ESM Standard 3). We recalculated the correlation between questionnaire scores and the mean for those three studies combined, and the correlations were stronger: .55 (extraversion), .59 (agreeableness), .53 (conscientiousness), .58 (emotional stability), and .68 (intellect). These three studies suggest that the average correlation between questionnaire scores and means approaches .60, even stronger than the already strong results for the mega-analysis as a whole. As an alternative way of considering reliability, we recomputed the correlations between trait scores and mean state ratings using only scales with reliabilities of .65 or greater. The resulting correlations were: for extraversion, .48 (N (studies) = 10, n (participants) = 311); for agreeableness, .55 (N = 9, n = 293); for conscientiousness, .50 (N = 7, n = 226); for emotional stability, .48 (N = 7, n = 220); for intellect, .52 (N = 7, n = 256). The average of these correlations is .51, very similar to the average obtained using all scales.
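The "ceiling" that reliability places on a correlation is usually expressed with the classical test theory attenuation relation (Spearman's correction for attenuation). It is shown here only to illustrate the reasoning; the text does not report applying the correction. With reliabilities r_xx and r_yy for the questionnaire and state measures:

```latex
r_{xy}^{\max} = \sqrt{r_{xx}\, r_{yy}}, \qquad
\hat{\rho}_{T_x T_y} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}

% Worked case at the .65 cutoff used in the text:
r_{xx} = r_{yy} = .65 \;\Rightarrow\; r_{xy}^{\max} = \sqrt{.65 \times .65} = .65
```

So, under these classical assumptions, if both scales sat exactly at the .65 cutoff, no observed correlation between them could exceed .65.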
Second, in many studies, states were measured with items that differed slightly from, or were a subset of, the items used to assess traits on the questionnaire. However, in six of the studies (ESM Opposites, Weekly, ESM Extraversion, ESM Defined, ESM Lab, and ESM Standard 3), the ESM items were the same as the trait questionnaire items, except for an occasional single item that did not cohere with its trait. For these six studies, correlations between trait scores and means of corresponding states were again very high: .51 (extraversion), .59 (agreeableness), .56 (conscientiousness), .55 (emotional stability), and .60 (intellect).
Third, we considered the role of the time frame described in state reports, because the length of time described may affect the accuracy of the description. Correlations between trait scores and the mean were lowest for studies in which reports described the previous 20 or 30 minutes: .24 (extraversion), .44 (agreeableness), .23 (conscientiousness), .39 (emotional stability), and .32 (intellect). Correlations for reports describing the previous hour were similar to those for all studies combined: .41 (extraversion), .58 (agreeableness), .49 (conscientiousness), .51 (emotional stability), and .67 (intellect). This was also true for studies with reports describing the previous three hours: .52 (extraversion), .50 (agreeableness), .56 (conscientiousness), .57 (emotional stability), and .40 (intellect). These results suggest that 1-hour and 3-hour time frames may provide more accurate reports. It could be that describing a longer time frame is more like describing trait standing: longer periods may include more behaviors that participants summarize over, and this summarization process may be similar to how participants answer trait questionnaires. However, in these 15 studies the 1-hour reports were slightly better than the 3-hour reports (average = .53 vs. .51). Unfortunately, any statistical test of study-level factors would have an N of 15, so we are unable to test these differences statistically. Explaining these different results for different time frames may be a fruitful area for further research.
depicts the average of the correlations between traits and average states. Correlations were Fisher z-transformed, averaged, and the average was transformed back to a correlation. shows that traits were strongly predictive of average trait manifestation in behavior. Even when all 15 studies were included, the correlations were strong. However, when the analysis was limited to studies with high reliability or high item overlap, correlations were even stronger, approaching .60.
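Averaging correlations through the z-transformation can be sketched with the standard library's inverse hyperbolic functions, since Fisher's z is atanh(r) and the back-transformation is tanh(z). This is a minimal sketch assuming unweighted averaging (the text does not mention weights) and correlations strictly between -1 and 1; `average_correlation` is a hypothetical name, and the example inputs are the five high-reliability-study correlations quoted earlier in the text.

```python
from math import atanh, tanh
from statistics import fmean


def average_correlation(rs):
    """Unweighted average of correlations via Fisher's r-to-z transform.

    Each r is mapped to z = atanh(r), the z values are averaged, and the
    mean z is mapped back with tanh. Requires -1 < r < 1 for every r.
    """
    return tanh(fmean(atanh(r) for r in rs))


# Correlations for the three high-reliability studies, as quoted in the text.
rs = [0.55, 0.59, 0.53, 0.58, 0.68]
print(round(average_correlation(rs), 3), round(fmean(rs), 3))
```

For positive correlations, the z-based average is never smaller than the simple arithmetic mean, because atanh is convex on the interval from 0 to 1.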
Effects of Study Factors on State Predictability