The simulations showed that the method used to categorize open-ended responses strongly affected type 1 error when we compared mean intake between open-ended and predefined responses. We also observed that the chosen categorization method biased odds ratio estimates, whether the data were treated as purely categorical or as interval. When we compared intake reported by open-ended responses with intake reported by predefined responses, the P values varied, and this variation depended on how we mapped the open-ended responses into the noncontiguous, predefined categories.
By simulating the open-ended and predefined responses to have the same proportions, we observed that the method of categorizing continuous observations that fall between predefined categories inflates type 1 error. Only one method for categorizing the data did not severely inflate type 1 error: treating the data in the gaps as missing. We caution against concluding that this approach is the best for comparing means; in our analysis, this method precisely matched the method we used to simulate the data. As shown in Appendix 2, our use of proportions in the multinomial distribution matched the proportion of the open-ended data that fell into each of the predefined categories, essentially ignoring the observations in the gap. These results imply that, when making inferences about the mean difference between categorized open-ended data and predefined categorical data, one needs to categorize the open-ended data by using a method identical to the thought process a participant would use when placing his or her intake into 1 of 2 noncontiguous, predefined categories. However, such a method could be identified only by conducting cognitive interviews with participants.
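The mechanism can be illustrated with a small simulation. The sketch below is ours, not the paper's code, and every number in it is an illustrative assumption: predefined response options scored at 0.5, 1, 3, and 5.5 times/week with a gap between 1 and 2; open-ended intakes drawn from a gamma distribution; and the predefined arm drawn from a multinomial whose probabilities match the non-gap proportions of the same distribution, mirroring the simulation design described above. Under a true null of equal intake, only the "missing" rule keeps the t test near its nominal rejection rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Illustrative predefined category values (times/week); the gap lies
# between the "once/week" (1.0) and "2-4 times/week" (3.0) options.
CATS = np.array([0.5, 1.0, 3.0, 5.5])
GAP_LO, GAP_HI = 1.0, 2.0

def nearest_cat(x):
    """Score each continuous report as the closest predefined value."""
    return CATS[np.abs(x[:, None] - CATS).argmin(axis=1)].astype(float)

def categorize_open(x, gap_rule):
    """Bin open-ended reports; gap_rule decides what happens to
    reports falling strictly between 1 and 2 times/week."""
    out = nearest_cat(x)
    in_gap = (x > GAP_LO) & (x < GAP_HI)
    if gap_rule == "minimum":      # push the gap down to the lower category
        out[in_gap] = 1.0
    elif gap_rule == "maximum":    # push the gap up to the upper category
        out[in_gap] = 3.0
    elif gap_rule == "missing":    # drop gap observations entirely
        out[in_gap] = np.nan
    return out

# Predefined arm: multinomial over the categories, with probabilities
# matching the non-gap portion of the same intake distribution.
big = rng.gamma(2.0, 1.0, size=100_000)
binned = nearest_cat(big[(big <= GAP_LO) | (big >= GAP_HI)])
p = np.array([np.mean(binned == c) for c in CATS])

def type1_rate(gap_rule, n=200, reps=400):
    """Fraction of null replicates in which the t test rejects at 0.05."""
    hits = 0
    for _ in range(reps):
        open_arm = categorize_open(rng.gamma(2.0, 1.0, size=n), gap_rule)
        open_arm = open_arm[~np.isnan(open_arm)]
        pre_arm = rng.choice(CATS, size=n, p=p)
        if stats.ttest_ind(open_arm, pre_arm, equal_var=False).pvalue < 0.05:
            hits += 1
    return hits / reps

rates = {r: type1_rate(r) for r in ("minimum", "maximum", "missing")}
print(rates)  # "missing" stays near 0.05; minimum and maximum are inflated
```

The "missing" rule preserves the nominal error rate here only because, as in the paper's design, the predefined arm was generated to match the non-gap proportions exactly; with a different participant response process, a different rule could be the one that matches.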
Our simulation studies of diet-disease associations using logistic regression models to estimate odds ratios showed that categorizing the open-ended data by any method biased the estimates away from the null. This result is unsurprising, because categorizing the data can be seen as enlarging the unit change in intake, which corresponds to a larger odds ratio (17). However, once the open-ended responses are categorized, the method of handling responses that fall in a gap does not affect the odds ratios as strongly as it affects inferences about differences in mean intake. The odds ratio estimates for models 1 and 2 showed some variation, but, given the 95% confidence intervals of each estimate, the inference regarding the significance of the odds ratios was similar across categorization methods.
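The unit-change argument can be made concrete with a toy logistic simulation; this is our sketch with assumed, illustrative numbers, not the paper's models. When the log-odds rise smoothly with intake, recoding intake into ordinal categories stretches a "one-unit" step over several servings, so the fitted per-category odds ratio exceeds the per-serving odds ratio even though both fits describe the same data.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logistic(X, y, iters=25):
    """Plain Newton-Raphson logistic regression (no regularization)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))
        w = prob * (1.0 - prob)
        beta += np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - prob))
    return beta

# Hypothetical diet-disease data: log-odds slope of 0.3 per serving,
# so the true OR per one-serving increase is exp(0.3), about 1.35.
n = 5000
x = rng.uniform(0.0, 7.0, size=n)           # intake, servings/week
y = rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.3 * x)))

# Ordinal recoding into 4 categories (codes 0-3); one code step now
# spans roughly 1.5-2 servings rather than one serving.
code = np.digitize(x, [1.0, 2.5, 4.5]).astype(float)

ones = np.ones(n)
or_per_serving = np.exp(fit_logistic(np.column_stack([ones, x]), y)[1])
or_per_category = np.exp(fit_logistic(np.column_stack([ones, code]), y)[1])
print(or_per_serving, or_per_category)  # the per-category OR is larger
```

Both estimates are "correct" for their own unit of change; the bias away from the null appears only when the per-category estimate is read as if its unit were one serving.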
When the true underlying mechanism of risk is a threshold, as simulated in models 3 and 4, treating the categories as ordinal does not detect any change in risk due to intake. Only by comparing each category with a reference can the analysis correctly estimate the simulated odds ratio; even then, the estimate differs depending on the method used to categorize the data. Most categorization methods indicate that the threshold for orange juice intake to affect risk lies between once per week and 2–4 times per week, but the maximum and minimum methods obscure this finding. Recall that the true threshold is the midpoint of the gap between once per week and 2–4 times per week, whereas the minimum and maximum methods place the entire gap into one of the adjacent categories. Logistic regression therefore estimates an odds ratio for a group that contains individuals on both sides of the true risk threshold, biasing the estimate toward the null (3).
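A minimal sketch of this attenuation, using assumed, illustrative numbers rather than the paper's models: risk jumps (true OR = 3) once intake exceeds 1.5 times/week, the midpoint of the gap between the two options. Splitting the gap at its midpoint happens to separate the two risk groups cleanly, whereas pushing the whole gap into the lower category mixes above-threshold individuals into the reference group and pulls the estimated odds ratio toward the null.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical threshold mechanism: risk jumps by OR = 3 once intake
# exceeds 1.5 times/week (all values here are illustrative).
n = 20_000
x = rng.uniform(0.0, 4.0, size=n)                   # intake, times/week
logit = -1.5 + np.log(3.0) * (x > 1.5)
case = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

def odds_ratio(high, is_case):
    """2x2 odds ratio for the high-intake category vs the reference."""
    a = np.sum(high & is_case)
    b = np.sum(high & ~is_case)
    c = np.sum(~high & is_case)
    d = np.sum(~high & ~is_case)
    return (a * d) / (b * c)

# Midpoint rule: the gap (1, 2) is split at 1.5, which here coincides
# with the true threshold, so the estimate recovers an OR near 3.
or_midpoint = odds_ratio(x > 1.5, case)

# Minimum rule: the whole gap goes to the lower category, so the
# reference group now contains people above the true threshold and
# the estimate is attenuated toward the null.
or_minimum = odds_ratio(x >= 2.0, case)

print(or_midpoint, or_minimum)  # or_minimum falls below or_midpoint
```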
Given the results from the simulated data, interpreting the comparison of mean hamburger intake between the FFQ using open-ended responses and the FFQ using predefined responses is not straightforward. Comparing mean intakes in the simulated data showed inflated type 1 error; therefore, the significant difference detected by the t test comparing the current meat cookery FFQ with the modified meat cookery FFQ may be an artifact of the method chosen to categorize the open-ended data before the comparison was made.
A limitation of this research is that we could not address the participant's framework for choosing answers or the error introduced by high quantities of missing data as a result of mixing open-ended responses after a series of closed-ended questions, because the simulations model data that had already been collected. By modeling our simulations as we did, we isolated the error induced by the researcher's choice of how to pool the data and showed that treatment of gaps can modify the inference in a study independently of more common sources of variation in a nutrition epidemiology study using FFQs.
In summary, when mean intakes based on open-ended responses are compared with those measured by predefined categories, the type 1 error rate can be severely inflated; significant differences in mean intake between 2 groups collected by these 2 different methods therefore cannot be trusted. In contrast, when odds ratios are estimated and risk changes continuously with intake, the estimates are similar across categorization methods, provided the extreme maximum and minimum methods are avoided. When there is an intake threshold for the change in risk, however, and the threshold falls within a large category, the precise location of the threshold may be masked by the method selected to categorize the gaps.
Therefore, the message is to pool data of the same response types whenever possible. If different response types are used, odds ratios will have a slight, but most likely acceptable, bias, whereas mean intake cannot be compared at all when pooling data. Researchers must therefore proceed with caution when building risk models. Commonly, t tests for mean intake are computed first to determine whether a dietary variable enters multivariable logistic regression models. When dealing with pooled data of different types, researchers must rely on other multivariable model-building methods based on odds ratios, such as stepwise regression or Bayesian model averaging (18), rather than the preliminary t tests. When developing the questionnaire, one can minimize the effect of the error from categorizing the gaps simply by wording the predefined categories so they are contiguous.