Tables 1 and 2 list the RMSE and relative bias of the disease risks estimated using the reduced and full models, averaged over the 1000 replicated data sets, when contamination occurred in AB individuals not exposed to the environmental factor. When there was no contamination, the relative biases were small (less than 1.8 percent) in all six categories under both models. This implied that both the reduced and full models gave accurate and precise estimates in the absence of contamination. When contamination occurred, both the relative biases and the RMSE of the risks estimated from the reduced model increased in all categories as the proportion of contaminated individuals grew. The predicted risks increased in BB and AA individuals exposed to the environmental factor and decreased in the other four categories. The logit of the risk in the contaminated category corresponded to the intercept of the logistic model. Because the estimated coefficients of a logistic regression model are interdependent, a change in one parameter led to changes in all the others. Thus all the predicted risks deviated from their true values as a result of contamination in a single category, although the relative bias increased fastest in the contaminated category. In that category, the predicted risk differed substantially from its true value (a 15 percent difference) even when only 20 percent of individuals were contaminated, and the deviation reached nearly 60 percent when the proportion of contaminated individuals reached 0.8. When contamination status was incorporated into the model (the full model), the relative biases and RMSE remained small in all six categories despite the increasing proportion of contaminated individuals. The relative bias was less than 3 percent in all categories even with 80 percent contaminated individuals.
| Table 1. Relative bias and RMSE of the disease risks predicted using the reduced model in the first scenario. |
| Table 2. Relative bias and RMSE of the disease risks predicted using the full model in the first scenario. |
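The mechanism described for the first scenario, in which contamination in the intercept category shifts every fitted coefficient, can be illustrated with a small deterministic sketch. Everything specific here is an assumption made for illustration and is not taken from the study: a main-effects logistic model (intercept for AB unexposed, indicators for AA, BB and exposure), the coefficient values in `BETA_TRUE`, the logit shift `DELTA` for contaminated individuals, equal category sizes, and the use of the uncontaminated risk as the "true" value. The fit uses expected cell proportions rather than simulated replicates, so it shows only the systematic bias, not the RMSE.

```python
import numpy as np

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, p_obs, w, n_iter=50):
    """Weighted logistic regression on cell-level proportions via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = expit(X @ beta)
        grad = X.T @ (w * (p_obs - mu))                     # score vector
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical true coefficients: intercept (AB, unexposed), AA, BB, exposure.
BETA_TRUE = np.array([-2.0, 0.5, -0.5, 1.0])
# Hypothetical logit shift applied to contaminated individuals.
DELTA = -1.0

CATEGORIES = ["AB,NE", "AB,E", "AA,NE", "AA,E", "BB,NE", "BB,E"]
# Columns: intercept, AA indicator, BB indicator, exposure indicator.
X_CELLS = np.array([
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 1, 1],
], dtype=float)

def relative_biases(r):
    """Relative bias of the six predicted risks for a contamination
    proportion r (0 < r < 1) in the AB, unexposed category."""
    true_risk = expit(X_CELLS @ BETA_TRUE)
    # Seven rows: the six categories plus the contaminated part of AB,NE.
    X_red = np.vstack([X_CELLS, X_CELLS[0]])
    contam = np.zeros(7)
    contam[6] = 1.0
    X_full = np.column_stack([X_red, contam])
    p_obs = np.append(true_risk, expit(BETA_TRUE[0] + DELTA))
    w = np.ones(7)
    w[0], w[6] = 1.0 - r, r
    beta_red = fit_logistic(X_red, p_obs, w)    # ignores contamination status
    beta_full = fit_logistic(X_full, p_obs, w)  # includes contamination indicator
    pred_red = expit(X_CELLS @ beta_red)
    pred_full = expit(X_CELLS @ beta_full[:4])  # contamination indicator set to 0
    return (pred_red - true_risk) / true_risk, (pred_full - true_risk) / true_risk

bias_reduced, bias_full = relative_biases(0.8)
for name, b_red, b_full in zip(CATEGORIES, bias_reduced, bias_full):
    print(f"{name:6s} reduced {b_red:+.3f}  full {b_full:+.3f}")
```

Under these assumptions the reduced fit moves the risks of all six categories even though only one is contaminated, whereas the fit with the contamination indicator recovers the uncontaminated risks. The magnitudes depend entirely on the hypothetical parameter values and should not be compared with the tabulated results.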
Tables 3 and 5 present our findings on the impact of two-category contamination on the predicted risks. In general, the relative biases and RMSE increased with increasing proportions of contamination. However, there were cases where they decreased. For example, in the third scenario the absolute value of the relative bias of the estimated risk for AB individuals not exposed to the environmental factor decreased with increasing contamination in AA individuals exposed to the environmental factor, as shown in Table 5. Indeed, the relative bias varied from -0.109 to 0.008 when rAB,NE equaled 0.2. This indicated that, with increasing contamination, the predicted risk at first fell below its true value but then rose and eventually exceeded it. This was possible because the predicted risks were functions of the model coefficients, and contamination caused different coefficients to change in different directions. The predicted risks could therefore move in either direction with increasing contamination. In the third scenario, when the contamination proportions in both categories were 0.2, the estimated risk for AB individuals exposed to the environmental factor was reduced by 13 percent even though there was no contamination in this category. When the two contamination proportions were 0.6 and 0.8, the estimated risk for this category decreased by nearly 50 percent.
| Table 3. Relative bias and RMSE of the disease risks predicted using the reduced model in the second scenario. |
| Table 5. Relative bias and RMSE of the disease risks predicted using the reduced model in the third scenario. |
Tables 4 and 6 list the average relative bias and RMSE of the disease risks for the second and third contamination scenarios estimated using the full model. As in the one-category contamination case, when additional knowledge about each individual's contamination status was available, the relative biases and RMSE were greatly reduced in all categories. They remained small despite the increasing proportion of contaminated individuals. In both scenarios, the largest relative biases were about 3 percent even with 80 percent contaminated individuals. These results suggested that the additional contamination covariate was effective in modeling population heterogeneity: the differences in individual disease susceptibility were properly accounted for, and knowledge of contamination status could improve the accuracy of the predicted risks.
| Table 4. Relative bias and RMSE of the disease risks predicted using the full model in the second scenario. |
| Table 6. Relative bias and RMSE of the disease risks predicted using the full model in the third scenario. |
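For readers who wish to reproduce the summary measures reported in the tables, the following is a minimal sketch of how relative bias and RMSE could be computed from replicated risk estimates. It assumes the usual definitions (mean estimate minus true value, divided by the true value, and the square root of the mean squared error); the function name `relative_bias_and_rmse`, the array layout, and the synthetic estimates are hypothetical and serve only as an illustration.

```python
import numpy as np

def relative_bias_and_rmse(estimates, true_risk):
    """Summarise replicated risk estimates.

    estimates: array of shape (n_replicates, n_categories)
    true_risk: array of shape (n_categories,)
    Returns (relative bias, RMSE), each of shape (n_categories,).
    """
    estimates = np.asarray(estimates, dtype=float)
    true_risk = np.asarray(true_risk, dtype=float)
    rel_bias = (estimates.mean(axis=0) - true_risk) / true_risk
    rmse = np.sqrt(((estimates - true_risk) ** 2).mean(axis=0))
    return rel_bias, rmse

# Synthetic illustration: 1000 replicates of six category risks,
# generated as hypothetical true risks plus small random noise.
rng = np.random.default_rng(0)
true_risk = np.array([0.12, 0.27, 0.18, 0.38, 0.08, 0.18])
estimates = true_risk + rng.normal(scale=0.01, size=(1000, 6))
rel_bias, rmse = relative_bias_and_rmse(estimates, true_risk)
print(np.round(rel_bias, 4))
print(np.round(rmse, 4))
```

Averaging the per-replicate relative biases gives the same result as the relative bias of the mean estimate, because the true risk is constant across replicates.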