The completeness of information on the ethnicity of cancer cases following linkage with HES varied significantly (P < 0.001 in each case) by the demographic and clinical factors listed in Table .
The value of linkage with breast cancer screening services (NBSS) information on ethnicity is shown in Table and Table . Table describes the sensitivity, specificity and PPV of ethnicity derived from the NBSS compared with that recorded in HES (i.e. using HES as a gold standard), for 5243 breast cancer cases with ethnic group recorded in both the HES and NBSS datasets. Sensitivity was high (> 90%) for White and South Asian cases, and moderately high (61.4%) for Black cases, suggesting that the NBSS could be used to determine the ethnicity of cancer registry cases where this was not recorded in HES. No cases recorded as Chinese/Other or Mixed ethnicity in HES were assigned the same ethnic group in the NBSS dataset. However, the value of the NBSS records for these two ethnic groups cannot be precisely determined because of the small numbers involved (14 and 10 cases, respectively). Table shows the effect of using the NBSS data to resolve the ethnicity of registry cases for whom it was not recorded in HES. A total of 1082/26 342 (4.1%) breast cancer cases whose ethnicity was not known in HES had an ethnic group recorded in the NBSS. Overall, linkage with the NBSS decreased the proportion of cancer cases with unknown ethnicity from 23.6% to 22.6%.
Sensitivity, Specificity and Positive Predictive Value of NBSS-derived Ethnicity for Breast Cancer Cases
Ethnicity of Cases Following Linkage with HES and NBSS Datasets
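For readers who want to reproduce this kind of validation, the following is a minimal sketch of how sensitivity, specificity and PPV can be computed for one ethnic group against a gold standard, treating that group as "positive" and all other groups as "negative". The function name and the toy records are illustrative assumptions, not the study data, and the one-versus-rest framing is assumed rather than stated in the text.

```python
def one_vs_rest_metrics(reference, predicted, group):
    """Sensitivity, specificity and PPV for one group.

    `reference` is the gold standard (e.g. ethnicity recorded in HES) and
    `predicted` is the source being evaluated (e.g. NBSS). Both are
    equal-length lists of group labels.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(r == group and p == group for r, p in pairs)
    fn = sum(r == group and p != group for r, p in pairs)
    fp = sum(r != group and p == group for r, p in pairs)
    tn = sum(r != group and p != group for r, p in pairs)

    def ratio(numerator, denominator):
        # Undefined when the denominator is zero (e.g. no cases in the group).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives / all reference positives
        "specificity": ratio(tn, tn + fp),  # true negatives / all reference negatives
        "ppv": ratio(tp, tp + fp),          # true positives / all predicted positives
    }


# Toy records only (hypothetical, not taken from the study).
hes = ["White", "White", "White", "South Asian", "Black", "Black"]
nbss = ["White", "White", "Black", "South Asian", "Black", "White"]

black = one_vs_rest_metrics(hes, nbss, "Black")
mixed = one_vs_rest_metrics(hes, nbss, "Mixed")
```

With very few reference cases in a group, as for the Chinese/Other and Mixed groups here, the estimates rest on a handful of records and are correspondingly imprecise; with none, sensitivity is undefined.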
The sensitivity, specificity and PPV of Onomap and Nam Pehchan for each ethnic group are shown in Table . The sensitivity of Onomap was high for the White and South Asian ethnic groups (99.8% and 82.1%, respectively), but low for the Black and Chinese/Other groups (4.4% and 0.0%, respectively). The sensitivity of Nam Pehchan was lower than that of Onomap for South Asian cases (71.1% vs. 82.1%), but when the two were combined, sensitivity was higher than for either application alone (90.5%). A total of 14 615 cases had their name at birth recorded on their death certificate.
Sensitivity, Specificity and Positive Predictive Value of Name Recognition Software
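The text does not state how the two name recognition applications were combined. One plausible rule, assumed here purely for illustration, is to assign a case to the South Asian group if either application does so. Under that rule the combined sensitivity can never be lower than that of either application alone, although false positives also accumulate. The sketch below uses hypothetical predictions (`tool_a`, `tool_b`) rather than actual Onomap or Nam Pehchan output.

```python
def sensitivity(reference, predicted, group):
    """Share of reference cases in `group` that were also predicted as `group`."""
    hits = [p == group for r, p in zip(reference, predicted) if r == group]
    return sum(hits) / len(hits) if hits else None


def combine_either(pred_a, pred_b, group, fallback="Other"):
    """Assumed combination rule: assign `group` if either tool assigns it."""
    return [group if group in (a, b) else fallback for a, b in zip(pred_a, pred_b)]


# Hypothetical toy data, not study results.
reference = ["South Asian"] * 4 + ["White"] * 2
tool_a = ["South Asian", "South Asian", "White", "White", "White", "White"]
tool_b = ["South Asian", "White", "South Asian", "White", "White", "South Asian"]

combined = combine_either(tool_a, tool_b, "South Asian")

sens_a = sensitivity(reference, tool_a, "South Asian")
sens_b = sensitivity(reference, tool_b, "South Asian")
sens_combined = sensitivity(reference, combined, "South Asian")
```

In this toy example each tool identifies a different subset of the South Asian cases, so the union recovers more of them than either tool alone, while the last case shows a false positive carried over from one tool.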
Table shows the sensitivity, specificity and PPV of 2001 national Census data on ethnicity as a predictor of the ethnic group of individual cases. The sensitivity of Census data for the White ethnic group is high (99.3%), but very low for all other ethnic groups (less than 7.4% for South Asian cases, 2.3% for Black cases, and 0% for the remaining two groups).
Sensitivity, Specificity and Positive Predictive Value of Census Data on Ethnicity
For cases whose ethnicity was still missing following linkage with the HES and NBSS datasets, ethnicity was imputed in Stata using ICE. The imputation model included the variables significantly associated with missingness (Table ), the ethnicity predicted for each case by Onomap and Nam Pehchan, and the ethnic breakdown of the case's area of residence. The number of imputed datasets generated for the full run was set to 23, as ethnicity was missing for 22.6% of the cases (Table ).
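The choice of 23 imputed datasets appears to follow the common rule of thumb that the number of imputations should be at least the percentage of cases with incomplete data. A minimal sketch of that arithmetic follows; the function name is hypothetical and the rule is an assumed rationale, inferred from the wording rather than stated explicitly.

```python
import math


def imputations_for(percent_missing):
    """Smallest whole number of imputed datasets that is at least the
    percentage of cases with missing data (assumed rule of thumb)."""
    return math.ceil(percent_missing)


# 22.6% of cases still had unknown ethnicity after HES and NBSS linkage.
m = imputations_for(22.6)
```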
Table shows the sensitivity, specificity and PPV of the full multinomial logistic regression model used to impute missing ethnicity. The sensitivity and specificity of the full model were comparable to those of the name recognition software alone for the White group (99.3%/56.0% vs. 99.8%/51.5%, respectively). The sensitivity of the full model was slightly higher than that of name recognition software alone for cases from the South Asian group (94.7% vs. 90.5%), and substantially higher for the Black and Chinese/Other ethnic groups (20.4% vs. 2.3% and 21% vs. 0%, respectively). The sensitivity of the full model for the Mixed ethnic group remained at 0%.
Sensitivity, Specificity and Positive Predictive Value of Full Model
Table compares the proportion of cases in each ethnic group for complete and imputed cases (all 23 imputations combined). The proportion of cases in the White, South Asian and Black groups was slightly lower among the imputed cases than among the complete cases (95.8% vs. 96%, 1.7% vs. 1.8%, and 1.6% vs. 1.7%, respectively). For the remaining ethnic groups, the proportion of cases in each group was approximately 1.5 times higher among the imputed cases (0.6% vs. 0.4%, and 0.3% vs. 0.2% for the Chinese/Other and Mixed ethnic groups, respectively).
Comparison of Distribution of Ethnic Groups: Observed and Imputed