Background: In epidemiologic studies that rely on professional judgment to assess occupational exposures, accurate assessment by the raters is vital for detecting associations. We examined the influence of the type of questionnaire, type of industry, and type of rater on the raters’ ability to reliably and validly assess within-industry differences in exposure. Our aim was to identify areas where improvements in exposure assessment may be possible.
Methods: Subjects from three foundries (n = 72) and three textile plants (n = 74) in Shanghai, China, completed an occupational history (OH) and an industry-specific questionnaire (IQ). Six total dust measurements were collected per subject and were used to calculate a subject-specific measurement mean, which was used as the gold standard. Six raters independently ranked the intensity of each subject’s current job on an ordinal scale (1–4) based on the OH alone and on the OH and IQ together. Aggregate ratings were calculated for the group, for industrial hygienists, and for occupational physicians. We calculated intra-class correlation coefficients (ICCs) to evaluate the reliability of the raters. We calculated the correlation between the subject-specific measurement means and the ratings to evaluate the raters’ validity. Analyses were stratified by industry, type of questionnaire, and type of rater. We also examined the agreement between the ratings by exposure category, where the subject-specific measurement means were categorized into two and four categories.
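The two summary statistics named above can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC for single ratings, because the abstract does not state which ICC form was used, and it assumes the group rating is the mean of the raters' scores. The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings (an assumed ICC form).

    ratings: one row per subject, one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical toy data: 4 subjects rated on the 1-4 scale by 3 raters,
# plus an invented subject-specific measurement mean for each subject.
toy_ratings = [[1, 1, 2], [2, 2, 2], [3, 4, 3], [4, 4, 4]]
toy_measurement_means = [0.5, 1.2, 3.1, 6.0]

group_rating = [mean(row) for row in toy_ratings]  # aggregate (mean) rating
reliability = icc_oneway(toy_ratings)
validity = spearman_rho(group_rating, toy_measurement_means)
```

In this sketch, reliability describes how closely the raters agree with one another, while validity describes how well the aggregate rating tracks the measured exposure. The two can move in opposite directions, as the results for the foundry industry show.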
Results: The reliability and validity measures were higher for the aggregate ratings than for the ratings from the individual raters. The group’s performance was maximized with three raters. Both the reliability and validity measures were higher for the foundry industry than for the textile industry. The ICCs were consistently lower in the OH/IQ round than in the OH round in both industries. In contrast, the correlations with the measurement means were higher in the OH/IQ round than in the OH round for the foundry industry (group rating, OH/IQ: Spearman rho = 0.77; OH: rho = 0.64). No pattern by questionnaire type was observed for the textile industry (group rating, Spearman rho = 0.50, both assessment rounds). For both industries, the agreement by exposure category was higher when the task was reduced to discriminating between two rather than four exposure categories.
Conclusions: Misclassification in assessments based on professional judgment may be reduced by using two or three raters, by using questionnaires that systematically collect task information, and by defining intensity categories that are distinguishable by the raters. However, few studies have the resources to use multiple raters, and these additional efforts may not be adequate for obtaining valid subjective ratings. Thus, improving exposure assessment approaches for studies that rely on professional judgment remains an important research need.