Rounding is the familiar practice of reporting one value whenever a real number lies in an interval. Consider, for example, how American meteorologists describe surface wind direction. Weather reports issued to the general public commonly delineate eight wind directions (north, northeast, east, and so on) while those to aircraft pilots delineate thirty-six directions (360, 10, 20, 30 degrees, and so on). A report to the public that the wind is from the north means that the wind direction lies in the interval [337.5°, 22.5°] while a report to pilots that the wind direction is 360° means that the direction lies in the interval [355°, 5°]. An important feature of wind reports is that the extent of rounding is common knowledge. Hence, pilots and members of the public know the accuracy of the measurements they receive.
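The eight-sector public convention can be made concrete with a minimal sketch (the function name and the half-open interval convention are our illustrative choices, not part of any meteorological standard):

```python
# The eight public compass sectors, each covering a 45-degree interval.
SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def public_sector(degrees: float) -> str:
    """Round a wind direction to one of 8 sectors; N covers [337.5, 22.5)."""
    # Shift by half a sector width (22.5 degrees) so that N is centered
    # on 0/360, then divide the circle into 45-degree bins.
    index = int(((degrees + 22.5) % 360) // 45)
    return SECTORS[index]
```

For example, both 350° and 10° round to “N,” illustrating that the reported value identifies only an interval of possible directions.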
Whereas the extent of rounding is common knowledge in standardized communications such as weather reports, recipients of rounded data may be unsure of the extent of rounding in other settings. Consider, for example, responses to the question “What time is it?” If someone says “4:01 PM,” one might reasonably infer that the person is rounding to the nearest minute. However, if someone says “4 PM,” one might well be uncertain whether the person is rounding to the nearest minute, quarter hour, or half hour. Moreover, one might be uncertain whether a person who says “4 PM” knows the precise time and rounds to simplify communication or, contrariwise, does not know the precise time and rounds to convey partial knowledge.
Uncertainty about the extent of rounding is common when researchers analyze survey responses. Respondents are routinely asked to report their annual incomes, hours worked, and other numerical quantities. Questionnaires generally do not request that respondents round to a specified degree, nor do they ask persons to describe their rounding choices. There are no established conventions for rounding survey responses. Hence, researchers cannot be sure how much rounding there may be in survey data. Nor can researchers be sure whether respondents round to simplify communication or to convey partial knowledge. Consider, for example, responses to the question: “How many hours did you work last week?” A person who says “40 hours” may know he worked precisely 40 hours, or know he worked 42 hours but round for simplicity, or not know his hours with precision but want to convey that he has a “full-time” job.
The prevalent practice in survey research has been to ignore the possibility that responses may be rounded. Most empirical studies take numerical responses at face value. When researchers show concern about data accuracy, they typically assume the classical errors-in-variables model in which observed responses equal latent true values plus white-noise error. However, the structure of the data errors produced by rounding is different from that occurring in the errors-in-variables model.
This paper studies the intriguing forms of rounding that appear to occur in responses to survey questions asking persons to state the percent chance that some future event will occur. From the early 1990s on, questions of this type have become increasingly common in economic surveys. Manski (2004) reviews the literature.
Over a decade ago, Dominitz and Manski (1997) observed that respondents tend to report values at one-percent intervals at the extremes (i.e., 0, 1, 2 and 98, 99, 100) and at five-percent intervals elsewhere (i.e., 5, 10, …, 90, 95), with responses more bunched at 50 percent than at adjacent round values (40, 45, 55, 60). This finding has been corroborated repeatedly in subsequent studies. It seems evident that respondents to subjective probability questions round their responses, but to what extent? When someone states “3 percent,” one might reasonably infer that the person is rounding to the nearest one percent. However, when someone states “30 percent,” one might well be uncertain whether the person is rounding to the nearest one, five, or ten percent. Even more uncertain is how to interpret responses of 0, 50, and 100 percent. In some cases, these may be sharp expressions of beliefs, rounded only to the nearest one or five percent. However, some respondents may engage in gross rounding, using 0 to express any relatively small chance of an event, 50 to represent any intermediate chance, and 100 for any relatively large chance.
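The ambiguity described above can be stated mechanically: a single response is consistent with several candidate rounding grids. A minimal sketch (the function name and list ordering are our illustrative choices):

```python
# Classify a percent-chance response by the rounding grids it is
# consistent with, coarsest first, mirroring the heaping pattern above.
def response_grids(v: int) -> list:
    """Return the candidate rounding grids (in percent) that v lies on."""
    grids = []
    if v in (0, 50, 100):
        grids.append(50)    # consistent with gross (0, 50, 100) rounding
    if v % 10 == 0:
        grids.append(10)    # consistent with rounding to the nearest 10
    if v % 5 == 0:
        grids.append(5)     # consistent with rounding to the nearest 5
    grids.append(1)         # every integer lies on the 1-percent grid
    return grids
```

A response of 3 pins down the grid, while a response of 30 or 50 does not, which is precisely the interpretive problem at issue.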
Survey data do not reveal why sample members may give rounded expectations responses. Some persons may hold precise subjective probabilities for future events, as presumed in Bayesian statistics, but round their responses to simplify communication. Others may perceive the future as partially ambiguous and, hence, not feel able to place precise probabilities on events. Thus, a response of “30 percent” could mean that a respondent believes that the percent chance of the event is in the range [25, 35] but feels incapable of providing finer resolution.
Considering the extreme case of total ambiguity, Fischhoff and Bruine de Bruin (1999) suggest that when respondents feel unable to assign any subjective probability to an event, they may report the value 50 to signal epistemic uncertainty, as in the loose statement “It’s a fifty-fifty chance.” This idea is formally interpretable as the grossest possible form of rounding, where 50 means that the percent chance lies in the interval [0, 100]. Lillard and Willis (2001) offer an alternative interpretation, in which respondents first form full subjective distributions for the probability of an event and then report whichever of the three values (0, 50, 100) is closest to the mode of this subjective distribution.
Although survey data do not directly reveal the extent or reasons for rounding in observed responses, analysis of patterns of responses across questions and respondents is informative. We perform such analysis in Section 2, focusing on the expectations module in the 2006 administration of the Health and Retirement Study (HRS). We find that, for each question posed, the great preponderance of responses are multiples of five, most responses are multiples of ten, and a moderate minority are multiples of 50. Examining the module as a whole, we find that sample members vary considerably in their response tendencies. A small but non-negligible fraction use only the values (0, 50, 100) throughout the module. Most respondents make fuller use of the 0–100 percent-chance scale: about 26 percent at least once use a multiple of ten that is not one of the values (0, 50, 100), about 51 percent at least once use a multiple of five that is not a multiple of ten, and about 12 percent at least once use a value that is not a multiple of five.
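The classification of respondents by their response patterns can be sketched as follows (the function name and return convention are our illustrative assumptions, not the HRS coding):

```python
# Classify each respondent by the finest rounding grid revealed anywhere
# in their answers across the expectations module.
def finest_grid(responses):
    """Return the finest grid (in percent) a respondent's answers reveal."""
    if any(v % 5 != 0 for v in responses):
        return 1    # used a value that is not a multiple of five
    if any(v % 10 != 0 for v in responses):
        return 5    # used a multiple of five that is not a multiple of ten
    if any(v not in (0, 50, 100) for v in responses):
        return 10   # used a multiple of ten outside (0, 50, 100)
    return 50       # used only the values (0, 50, 100) throughout
```

For example, a respondent who answers (0, 50, 100) throughout is classified as a gross rounder, while one who ever reports 33 is classified on the one-percent grid.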
The findings of Section 2 indicate that respondents differ systematically in their rounding practices, with some habitually performing gross rounding and others tending to give more refined responses. In Section 3, we suggest using each person’s response pattern across questions to infer the extent to which he rounds his responses to particular questions. Suppose that one makes such inferences. That is, suppose that when person j answers question k with the value v_jk, one finds it plausible to infer from his response pattern that the quantity of interest actually lies in an interval [v_jkL, v_jkU], where v_jkL ≤ v_jk ≤ v_jkU. Then empirical research can proceed based on the assembled interval data. Research interpreting reported expectations as interval data makes weaker assumptions than does research taking responses at face value. Hence, it is more credible.
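A response together with an inferred rounding grid maps directly to such an interval. A minimal sketch (the function name and the symmetric, clipped rounding convention are our illustrative assumptions):

```python
# Convert a response v and an inferred rounding grid g (both in percent)
# into the interval of values consistent with rounding v to the nearest g,
# clipped to the 0-100 percent-chance scale.
def to_interval(v: float, g: float):
    """Return (lower, upper) bounds for a response v rounded to grid g."""
    lo = max(0.0, v - g / 2)
    hi = min(100.0, v + g / 2)
    return (lo, hi)
```

Under this convention, a response of 30 rounded to the nearest ten yields the interval [25, 35], matching the interpretation discussed earlier.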
In principle, empirical analysis with interval data is simply a matter of considering all points in the relevant interval to be feasible values of the quantity of interest. The practical feasibility of implementing this simple idea depends on the objective of the analysis. We focus on familiar problems of regression and best linear prediction, where the objective is to predict the quantity of interest conditional on specified covariates. Manski and Tamer (2002), Manski (2003), Chernozhukov, Hong, and Tamer (2007), and Beresteanu and Molinari (2008) have studied identification of and statistical inference on regressions and best linear predictors with interval data. We draw on their work and use the HRS data to illustrate.
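The simplest instance of this idea concerns the unconditional mean rather than a regression: when each outcome is known only to lie in an interval, the identified set for the population mean is the interval from the mean of the lower bounds to the mean of the upper bounds. A minimal sketch under that assumption (the function name is ours):

```python
# Sharp bounds on a mean with interval outcome data: if each y_j is known
# only to lie in [lo_j, hi_j], every value between mean(lo) and mean(hi)
# is consistent with the data, and no value outside that range is.
def mean_bounds(intervals):
    """Return (lower, upper) bounds on the mean of interval-valued data."""
    los = [lo for lo, hi in intervals]
    his = [hi for lo, hi in intervals]
    n = len(intervals)
    return (sum(los) / n, sum(his) / n)
```

The regression and best-linear-prediction problems studied in the cited papers extend this logic to prediction conditional on covariates, where the identified set is generally more complex than a single interval.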
The research approach proposed in Section 3 is logically more credible than the traditional practice of ignoring rounding, but it carries the price of weakened inferences. The only way to strengthen inference without weakening credibility is to collect richer data on expectations. Section 4 reports an exploratory study that shows what we have in mind. There we describe a sequence of survey questions that follows the usual percent-chance question with further questions probing the extent of and reasons for rounding. We use data collected in the American Life Panel to illustrate. Section 5 concludes.