Our example concerns data on bacterial vaginosis status for women in the HIV Epidemiology Research Study. A total of 1,310 (871 HIV-infected and 439 at-risk uninfected) women were enrolled into this prospective study across four U.S. cities from 1993 to 1995.^{34} Researchers diagnosed bacterial vaginosis semi-annually by two different techniques, referred to as the “CLIN” (clinically-based) and “LAB” (laboratory-based) methods. A CLIN diagnosis required the presence of three or more specific clinical conditions based on a modification of Amsel's criteria,^{35} while LAB diagnoses were made via a sophisticated Gram-staining technique.^{36} Prior references^{37–38} provide details on these methods in the study. As in Gallo et al.,^{38} we treat the more costly LAB method as a gold standard assessment, while the CLIN approach represents an accessible error-prone substitute. These authors found evidence of low sensitivity for the CLIN method, and suggested that its accuracy may suffer due to wide heterogeneity in bacterial vaginosis cases or due to the need for technicians to be trained in order to properly apply the subjective Amsel criteria.^{38}

A unique feature of this example is that both LAB and CLIN diagnoses were made regularly. Thus, in addition to fitting a “naïve” main study-only version of model (1) with CLIN status (Y*) substituted for LAB (Y), we were able to fit Eq. (1) to data using the assumed gold standard (Y) on all subjects. While the illustration of validation data-based adjusted analyses then requires ignoring LAB data on a random subset, an advantage is that we have an “ideal” complete-data model for comparison.
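The main/internal validation structure can be made concrete with a minimal sketch. It assumes the standard construction for outcome misclassification, in which a subject with both diagnoses contributes the joint probability of (Y*, Y), while a subject whose LAB result is ignored contributes the marginal probability of Y* summed over the unobserved Y. The function name and arguments are hypothetical, and the numeric inputs are placeholders rather than study estimates.

```python
import math

def loglik_contribution(y_star, y, p, se, sp):
    """Log-likelihood contribution of one subject (illustrative sketch).

    y_star : error-prone outcome (CLIN), coded 0/1
    y      : gold-standard outcome (LAB), coded 0/1, or None if unobserved
    p      : Pr(Y = 1 | x) from the primary model
    se     : Pr(Y* = 1 | Y = 1, x), the sensitivity
    sp     : Pr(Y* = 0 | Y = 0, x), the specificity
    """
    def pr_ystar_given(y_val):
        pr_pos = se if y_val == 1 else 1.0 - sp
        return pr_pos if y_star == 1 else 1.0 - pr_pos

    def pr_y(y_val):
        return p if y_val == 1 else 1.0 - p

    if y is None:
        # Main-study subject: sum over the unobserved true status.
        lik = sum(pr_ystar_given(v) * pr_y(v) for v in (0, 1))
    else:
        # Validation subject: joint probability of (Y*, Y).
        lik = pr_ystar_given(y) * pr_y(y)
    return math.log(lik)

# Placeholder inputs, chosen only to make the arithmetic easy to follow.
main_ll = loglik_contribution(1, None, p=0.4, se=0.5, sp=0.9)
valid_ll = loglik_contribution(1, 1, p=0.4, se=0.5, sp=0.9)
print(math.exp(main_ll), math.exp(valid_ll))
```

Summing such contributions over all subjects gives a log-likelihood that can be maximized jointly over the primary and misclassification model parameters.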

We use data from the 4^{th} semi-annual study visit on 982 black, white, and Hispanic women who were 25 years or older at enrollment. Available variables potentially associated with bacterial vaginosis status include age, race, HIV status (0 if negative, 1 if positive), and HIV risk group (0 if via sexual contact; 1 if intravenous drug use). Study site and CD4 counts among HIV positives showed little association with bacterial vaginosis status in this sample.

Median age at enrollment was 37 years. Other potential bacterial vaginosis risk factors are distributed as follows: race/ethnicity (60% black, 24% white, 16% Hispanic); HIV status (69% positive, 31% negative); HIV risk group (47% sexual, 53% intravenous drug use). Among women with data on bacterial vaginosis, 41% were positive via the LAB method, versus 25% based on CLIN. Unadjusted estimates were 0.53 (sensitivity) and 0.94 (specificity), suggesting that CLIN yields a low risk of false positives but high risk of false negatives.
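As a quick consistency check on these summary figures, the CLIN-positive proportion implied by the LAB prevalence and the unadjusted sensitivity and specificity can be computed directly. The sketch below uses only the rounded proportions quoted above; the function name is hypothetical.

```python
def implied_positive_rate(prevalence, se, sp):
    """Pr(test positive) = SE * prevalence + (1 - SP) * (1 - prevalence)."""
    return se * prevalence + (1.0 - sp) * (1.0 - prevalence)

# Rounded values reported in the text.
lab_prevalence = 0.41  # proportion positive by LAB
sensitivity = 0.53     # Pr(CLIN positive | LAB positive)
specificity = 0.94     # Pr(CLIN negative | LAB negative)

implied_clin = implied_positive_rate(lab_prevalence, sensitivity, specificity)
print(round(implied_clin, 3))
```

The implied proportion is close to the 25% observed for CLIN, so the reported prevalence, sensitivity, and specificity are mutually consistent up to rounding.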

For an “ideal” comparative analysis, we first fit Eq. (1) to all women, with the gold standard diagnosis (LAB; 1 vs. 0) as the outcome. Preliminary analyses revealed similar bacterial vaginosis prevalence among white and Hispanic women, so we created a binary variable (0 if non-black, 1 if black). Initially dichotomizing age at the median, we assessed second- and higher-order interactions among age, race, HIV status, and risk group. A likelihood ratio test supported elimination of all 11 interaction terms.

A total of 924 women, with complete data on both bacterial vaginosis assessments and all risk factors, contributed to the fitted models summarized in Table 1. The upper half of the table summarizes the fit of the resulting version of model (1) for LAB status, in which we treat age (in years) as a continuous variable.
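For orientation, a main-effects logistic model of this kind can be written as follows. This is a sketch under assumptions: the coefficient symbols and the variable names BLACK, HIV, RISKGRP, and AGE are illustrative labels and may differ from the notation of Eq. (1).

```latex
\operatorname{logit}\,\Pr(Y = 1 \mid \mathbf{X})
  = \beta_0 + \beta_1\,\mathrm{BLACK} + \beta_2\,\mathrm{HIV}
  + \beta_3\,\mathrm{RISKGRP} + \beta_4\,\mathrm{AGE}
```

Under this form, each adjusted odds ratio is the exponentiated coefficient of the corresponding predictor, for example exp(β_3) for HIV risk group.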

We then fit the same model upon substituting the error-prone CLIN diagnosis as the outcome (lower half of Table 1). The two analyses differ markedly in the magnitude of the estimated OR for HIV risk group (1.50 for LAB, 2.68 for CLIN) and in the direction of the estimated OR for HIV status (1.19 for LAB, 0.71 for CLIN).

| **Table 1** Logistic regression results on 924 women at their 4th study visit |

To illustrate misclassification adjustment, we selected a random internal validation subset of size n_{v}=300 women. Predictor selection via model (9) fit to these 300 women revealed no independent association between race and CLIN status. Pairwise and higher-order interactions among LAB status, risk group, HIV status, and age (dichotomized for purposes of estimating sensitivity and specificity) were non-significant as a group. The version of Eq. (9) utilized in the main/internal validation study likelihood therefore includes LAB status, risk group, HIV status, and AGEGTMED as predictors of CLIN status, where AGEGTMED indicates whether a subject's age at enrollment exceeded the median.
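A form of this misclassification model consistent with the description, and with the parameters (θ_0, ..., θ_4) referred to in connection with Eq. (14), can be sketched as follows. The variable names RISKGRP and HIV and the ordering of θ_2 through θ_4 are assumptions; only AGEGTMED is named in the text.

```latex
\operatorname{logit}\,\Pr(Y^{*} = 1 \mid Y, \mathbf{X})
  = \theta_0 + \theta_1\,Y + \theta_2\,\mathrm{RISKGRP}
  + \theta_3\,\mathrm{HIV} + \theta_4\,\mathrm{AGEGTMED}
```

In this form, setting Y = 1 gives the logit of the sensitivity and setting Y = 0 gives the logit of one minus the specificity, so non-zero θ_2, θ_3, or θ_4 means that both quantities vary with covariates.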

The upper half of Table 2 summarizes a complete analysis of the data via the joint likelihood in Eqs. (11)–(12). For comparison, the lower half of Table 2 gives corresponding results assuming non-differential misclassification [restricting θ_{2}=θ_{3}=θ_{4}=0 in Eq. (14)]. The likelihood ratio test comparing the joint models with and without the non-differentiality assumption was highly significant (χ^{2}=20.1, *P*<0.001), strongly supporting the need to account for dependence of the sensitivity and specificity of the CLIN diagnosis upon subject-specific covariates. Note that the analysis in the upper half of Table 2 yields the same interpretations as the “ideal” analysis (upper half, Table 1), in terms of the directions and magnitudes of the estimated ORs. In contrast, results in the lower half of Table 2 are similar to those of the “naïve” analysis (lower half, Table 1), showing an elevated estimate for risk group and a negative direction for HIV status. This highlights the value of internal validation data for modeling sensitivity and specificity.
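The reported P-value can be checked from the test statistic alone. The sketch below assumes 3 degrees of freedom, one for each of the three restricted parameters (θ_2, θ_3, θ_4), and evaluates the chi-square upper tail with the standard library only. The helper name `chi2_sf` is hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(z))   # exact tail for df = 1
    else:
        a = 1.0
        q = math.exp(-z)              # exact tail for df = 2
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1),
    # applied until a reaches df / 2.
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

lrt_statistic = 20.1  # chi-square value reported in the text
restrictions = 3      # assumed df: theta_2 = theta_3 = theta_4 = 0
p_value = chi2_sf(lrt_statistic, restrictions)
print(p_value < 0.001)
```

Under the 3-degree-of-freedom assumption the tail probability falls well below 0.001, in line with the reported result.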

| **Table 2** Results of maximum likelihood analysis of main / internal validation study data on 924 women (n_{m} = 624; n_{v} = 300) at their 4th study visit: Estimates of primary model parameters |

Table 3 provides the MLE of (θ_{0}, θ_{1}, θ_{2}, θ_{3}, θ_{4}) in Eq. (14) based on the joint likelihood in Eqs. (11)–(12). Note that all three predictors (risk group, HIV status, and age) are independently associated with sensitivity and specificity. Corresponding MLEs of (SE, SP), obtained via Eqs. (9)–(10) with multivariate delta method-based standard errors (details available from the authors), are tabulated separately. Holding other variables constant, sensitivity tends to be higher (and specificity lower) for those who are in the intravenous drug use risk group, younger, or HIV-negative. The variation in these estimates lends further credence to the differential nature of outcome misclassification in this real-data example.
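To show how covariate-specific sensitivity and specificity follow from a logistic misclassification model, the sketch below assumes the form logit Pr(Y*=1 | Y, x) = θ_0 + θ_1 Y + θ_2 RISKGRP + θ_3 HIV + θ_4 AGEGTMED. The coefficient values are placeholders chosen only to mimic the directions described above; they are not the estimates in Table 3, and the function names are hypothetical.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def se_sp(theta, riskgrp, hiv, agegtmed):
    """Sensitivity and specificity implied by an assumed logistic model."""
    t0, t1, t2, t3, t4 = theta
    covariate_part = t2 * riskgrp + t3 * hiv + t4 * agegtmed
    se = expit(t0 + t1 + covariate_part)   # Pr(Y* = 1 | Y = 1, x)
    sp = 1.0 - expit(t0 + covariate_part)  # Pr(Y* = 0 | Y = 0, x)
    return se, sp

# Placeholder coefficients for illustration only (NOT the fitted values).
theta_demo = (-3.0, 3.0, 0.5, -0.5, -0.5)

se_ref, sp_ref = se_sp(theta_demo, riskgrp=0, hiv=0, agegtmed=0)
se_idu, sp_idu = se_sp(theta_demo, riskgrp=1, hiv=0, agegtmed=0)
print(se_ref, sp_ref)
print(se_idu, sp_idu)
```

Because each covariate shifts the logit of Pr(Y*=1) by the same amount whether Y is 0 or 1, a covariate that raises sensitivity necessarily lowers specificity in this form, which matches the pattern reported in the text.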

| **Table 3** Results of maximum likelihood analysis of main / internal validation study data on 924 women (n_{m} = 624; n_{v} = 300) at their 4th study visit: Estimates of secondary model parameters |