The relatively infrequent use of IVs in epidemiology may be the result of a perceived lack of strong instruments or concerns about IV validity. In our two example studies evaluating the effectiveness of medicines in routine care, we found that PPP in almost any of its definitions or study formulations would be considered a strong instrument as compared with typical examples in the economics literature. The results also show a broad reduction in imbalance of measured covariates across our restriction and stratification variants. The reduced imbalance in measured covariates and the IV's strength lend credence to the notion that PPP may be an effective instrument for the selected drug comparison. We also noted that the association between instrument strength and imbalance in measured covariates was a mixed one; the Spearman correlation in BC was fairly high, whereas that of PA was close to zero.
Validity of an IV is an untestable property because it involves quantifying the strength of the association between the instrument and the outcome, potentially mediated through unmeasured paths. As in other approaches to controlling confounding, IV validity can be explored through subject matter expertise or empirical assessment of relationships likely to be correlated with unmeasured factors [19]. Inspection of the reduction in imbalance of measured factors achieved by applying the instrument may also be informative. In our data, application of the IV generally reduced imbalance in measured covariates, but significant imbalance remained among the measured psychiatric conditions. These conditions were correlated with one another, perhaps because of misclassification of specific psychiatric conditions [38]. Because of these strong correlations, we used the Mahalanobis distance to assess overall balance.
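As an illustration of why the Mahalanobis distance suits correlated covariates, the following is a minimal sketch rather than the exact computation used in this study. It assumes the distance is taken between the covariate mean vectors of two groups (defined by treatment or by the dichotomous IV) and scaled by the covariance matrix of the whole sample; the function name and inputs are hypothetical.

```python
import numpy as np

def mahalanobis_imbalance(X, group):
    """Mahalanobis distance between the covariate mean vectors of two groups.

    X     : (n, p) array-like of measured covariates (hypothetical input)
    group : length-n 0/1 array-like, e.g. treatment received or the IV level
    """
    X = np.asarray(X, dtype=float)
    group = np.asarray(group).astype(bool)
    # Difference in covariate means between the two groups.
    diff = X[group].mean(axis=0) - X[~group].mean(axis=0)
    # Assumption: scale by the covariance of the whole sample. The
    # pseudo-inverse keeps the sketch usable with near-collinear covariates.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    return float(np.sqrt(diff @ np.linalg.pinv(cov) @ diff))
```

Because the mean differences are scaled by the covariance matrix, a cluster of mutually correlated covariates (such as the psychiatric conditions here) is not counted several times over, and the result does not depend on the units of any single covariate.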
The reduction in Mahalanobis distance in many of the variations, along with previous work [7, 8, 19, 24], suggests that PPP was at least a reasonably valid instrument in this setting. The fact that some imbalance remained, especially in psychiatric conditions, suggests some “nonrandom” assignment of patient to practice, such as a clustering of a particular patient type within practice [19]. (For a violation of the IV assumptions to occur, the selection of patients to practice would also have to be associated with the outcome of death.) Overall, an observed decrease in Mahalanobis distance may be suggestive of increased validity but is not necessarily indicative; it is possible to imagine a circumstance where the Mahalanobis distance is dramatically reduced but IV validity is not affected. It is also possible that using an IV, even one that yields strong treatment group balance, can lead to greater bias than would occur in a non-IV setting. To avoid this, any numeric evaluation of validity also requires due consideration of potential violations of the IV assumptions based on subject matter expertise and other knowledge [14, 39].
The fairly consistent decrease in partial r² when additional past prescriptions were added to the preference estimation algorithm suggests that considering the additional prescriptions decreases the proportion of the variance in treatment explained by the instrument and weakens the predictive power of the dichotomous IV. Using a continuous rather than dichotomous IV may have mitigated this effect. Even though the IV was weaker, the additional previous prescriptions may have also yielded a better estimate of the physician's true preference because they estimated preference over a longer period of time and over more patients. This suggests that the somewhat lower partial r² values when adding previous prescriptions may be a better estimate of the PPP IV's true strength than the higher value observed in the base case.
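For readers less familiar with the strength measure, the sketch below shows one common way to compute the partial r-squared of an instrument. It assumes a linear first-stage model of treatment on the measured covariates, with and without the instrument; the function names and inputs are hypothetical and are not taken from the study's code.

```python
import numpy as np

def _sse(y, X):
    """Residual sum of squares from an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def partial_r2(treatment, instrument, covariates):
    """Share of the treatment variance left unexplained by the covariates
    that the instrument explains (linear first stage assumed)."""
    y = np.asarray(treatment, dtype=float)
    z = np.asarray(instrument, dtype=float).reshape(len(y), -1)
    C = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    sse_reduced = _sse(y, C)                       # covariates only
    sse_full = _sse(y, np.column_stack([C, z]))    # covariates + instrument
    return (sse_reduced - sse_full) / sse_reduced
```

A value near 1 means the instrument nearly determines treatment once covariates are accounted for; a value near 0 means it adds almost nothing.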
At the same time, almost all of the cases in which we saw increases in overall imbalance came from requiring that a doctor be totally consistent in his or her prescribing over the window considered (, rows P4 through P6). However, the physician may be consistent not because of his or her preference but because he or she is seeing similar patients who may have self-selected to his or her practice (“doctor shopped”), or as a result of other forms of atypical case mix. In these cases, the element of randomness in the “assignment” of patients to doctor may have been reduced or lost.
We had hypothesized that a stronger instrument would be associated with somewhat greater imbalance: as instrument strength increases, the IV starts to resemble the treatment variable more closely. If this resemblance becomes too strong, then the IV may be confounded by the same factors that confound treatment, and stratification by the strong IV should reduce imbalance less than stratification by a weaker IV that is less correlated with the treatment's confounders. In our data, by Spearman's rank-based measure of correlation between strength and balance, this played out in BC (r = 0.482) but not in PA (r = −0.049). Using Pearson's measure based on an assumed linear relationship, there was moderate correlation in both populations, though of opposite sign (BC r = 0.270; PA r = −0.249). The divergent findings suggest no clear answer to whether there was a trade-off between imbalance and strength.
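The two correlation measures can disagree because Spearman's coefficient depends only on the ordering of the values, whereas Pearson's assumes a linear relationship. The following self-contained sketch illustrates the distinction; the function names are hypothetical, and the numbers in the usage example are invented for illustration and are not study data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented values: imbalance rises monotonically with strength, not linearly.
strength = [1, 2, 3, 4, 5]
imbalance = [1, 2, 3, 4, 100]
print(spearman(strength, imbalance), pearson(strength, imbalance))
```

In this invented example the rank correlation is perfect while the linear correlation is noticeably lower, which shows how a single extreme variant can pull the two measures apart.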
The IV methods measure the effect in the marginal patient rather than the effect in the entire cohort [12, 40, 41]. By varying the cohort definitions, we may have also affected who the marginal patient would be, and therefore, any measures of effect drawn from these variations may not be comparable. We did not present second-stage-effect estimates for all variations, as the choice of the “right” estimate would be very much a decision of study design and subject matter expertise, and should not be driven by the results that appear most reasonable based on previous knowledge.
This study examined a range of implementations of the PPP instrument in two pharmacoepidemiologic studies on APM treatment. In these limited examples, applying the PPP instrument generally reduced imbalances but increased them under some of the most stringent IV definitions. Imbalances in measured covariates can be controlled for in the analysis, but the remaining imbalances suggest that the unmeasured covariates may be imbalanced as well, and may therefore lead to bias in a traditional outcome model.
In summary, we have demonstrated a number of variants of the PPP instrument and shown how empirically assessing the strength of an IV and its reduction in imbalance of covariates may inform the use of PPP in practical settings relevant to pharmacoepidemiology using claims data.