Through the creation and use of empirical covariates, the hd-PS algorithm has successfully adjusted for previously unmeasured confounders in nonrandomized studies. In this paper, we evaluated the original algorithm and several newly developed variants to test how well they function in small study populations with few exposures and events. We observed that the original hd-PS algorithm functioned well in our studies except when there were fewer than approximately 50 exposed patients with an event. Below this number, hd-PS yielded estimates similar to those obtained from standard covariate adjustment; however, by using a selection technique that considered only the covariate-exposure association, we improved the algorithm's observed performance in this range. We further observed that, in all but the smallest study sizes, the algorithm reached its full potential to adjust for confounding once approximately 300 empirically selected covariates had been added. Because of the very small number of exposed patients with events (*n* < 10), the hd-PS algorithm did not perform consistently in the SSRI Study; in all other cases, hd-PS appeared to perform as well as or better than adjustment by standard investigator-selected variables, across a range of study sizes and event frequencies.

We chose to test the zero-cell correction and exposure-only selection techniques because the hd-PS algorithm evaluates and ranks variables by their potential for confounding using 2 × 2 tables. This potential is driven by 2 factors: 1) the ratio of the prevalence of the confounder in the exposed to that in the unexposed and 2) the covariate-outcome risk ratio. If either of these values is 0 or undefined, the confounder cannot be considered for inclusion. In studies with few events, many confounder-event 2 × 2 tables are likely to have zeros in the *a* or *c* cells and thus zero or undefined confounder-event risk ratios. We sought to remedy this problem by adding 0.1 to each of the 4 cells; although doing so shrinks the confounder-event risk ratio somewhat toward the null, it also allows many confounders to remain under consideration for inclusion in the propensity score rather than be passed over (37). The zero correction aids computation but does not add information, so with small numbers in the 2 × 2 table, confounder-event risk ratios may still be high or low solely by chance, and confounders may thus be inappropriately selected or omitted.
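The effect of the 0.1 correction on a sparse covariate-event table can be illustrated with a short Python sketch. It is not the hd-PS implementation: it assumes the two factors are combined with the Bross bias formula often cited for hd-PS prioritization, Bias = [P_C1(RR_CD − 1) + 1] / [P_C0(RR_CD − 1) + 1], with covariates ranked by |log Bias|. The cell layout, the function names, and the example counts and prevalences are all hypothetical.

```python
import math
from typing import Optional, Tuple

# Hypothetical cell layout for the covariate-event 2 x 2 table:
#   a = covariate present, event     b = covariate present, no event
#   c = covariate absent,  event     d = covariate absent,  no event
Table = Tuple[float, float, float, float]


def risk_ratio(a: float, b: float, c: float, d: float) -> Optional[float]:
    """Covariate-event risk ratio, or None when it would be 0 or undefined."""
    if a <= 0 or c <= 0 or a + b <= 0 or c + d <= 0:
        return None  # a == 0 gives RR = 0; c == 0 divides by zero
    return (a / (a + b)) / (c / (c + d))


def bross_log_bias(p_exposed: float, p_unexposed: float,
                   rr_cd: Optional[float]) -> Optional[float]:
    """|log| of the Bross multiplicative bias (assumed ranking metric)."""
    if rr_cd is None or p_exposed <= 0 or p_unexposed <= 0:
        return None  # prevalence ratio or risk ratio is 0/undefined
    bias = (p_exposed * (rr_cd - 1) + 1) / (p_unexposed * (rr_cd - 1) + 1)
    return abs(math.log(bias))


def confounding_score(table: Table, p_exposed: float, p_unexposed: float,
                      zero_correction: float = 0.0) -> Optional[float]:
    """Score one covariate, optionally adding a constant to every cell."""
    a, b, c, d = (cell + zero_correction for cell in table)
    return bross_log_bias(p_exposed, p_unexposed, risk_ratio(a, b, c, d))


# No events among the 40 patients who have the covariate: the a cell is 0.
table = (0, 40, 12, 948)
print(confounding_score(table, 0.08, 0.02))                       # None
print(confounding_score(table, 0.08, 0.02, zero_correction=0.1))  # about 0.05
```

Without the correction the covariate is dropped from consideration; with it, the covariate receives a small finite score whose risk ratio has been pulled toward the null, which mirrors the trade-off described above.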

We observed that the original hd-PS algorithm with no correction performed optimally when there were 50 or more exposed patients with an event. When there were 25–49 exposed events, the zero correction sometimes aided the selection of variables for the hd-PS and consequently seemed to improve confounding adjustment, but the exposure-only selection technique provided more reliable results across all examples in this range. The SSRI Study, which had only 7 exposed events overall, did not have sufficient information for hd-PS adjustment to function optimally in the subsamples, and the full-cohort estimate may also be underadjusted.
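Exposure-only selection ranks covariates without consulting the covariate-event table, so sparse events cannot make a covariate unscorable. The sketch below assumes the covariate-exposure association is measured as the absolute log prevalence ratio; that metric, the function names, and the example prevalences are hypothetical. The parameter `k` is the number of empirically selected covariates; in these studies about 300 captured most of the achievable confounding control.

```python
import math
from typing import Dict, List, Optional, Tuple


def exposure_only_score(p_exposed: float, p_unexposed: float) -> Optional[float]:
    """|log| of the covariate prevalence ratio (exposed vs. unexposed)."""
    if p_exposed <= 0 or p_unexposed <= 0:
        return None  # prevalence ratio is 0 or undefined
    return abs(math.log(p_exposed / p_unexposed))


def top_k(prevalences: Dict[str, Tuple[float, float]], k: int) -> List[str]:
    """Names of the k covariates with the strongest covariate-exposure association."""
    scored = [(name, exposure_only_score(p1, p0))
              for name, (p1, p0) in prevalences.items()]
    usable = [(name, s) for name, s in scored if s is not None]
    usable.sort(key=lambda item: item[1], reverse=True)  # strongest first
    return [name for name, _ in usable[:k]]


# Hypothetical covariates: (prevalence in exposed, prevalence in unexposed)
covariates = {
    "dx_A": (0.30, 0.10),  # |log 3| is about 1.10
    "dx_B": (0.05, 0.05),  # |log 1| = 0
    "rx_C": (0.02, 0.08),  # |log 0.25| is about 1.39
    "px_D": (0.00, 0.10),  # never seen in the exposed: not scored
}
print(top_k(covariates, k=2))  # ['rx_C', 'dx_A']
```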

A second issue with few events is that small-sample bias may result in overestimation (38). Including indicator terms for each decile of the propensity score adds 9 variables to the outcome model, which by the usual rule of about 10 events per variable would call for 90 or more exposed patients with an event (34). We attempted to address this issue in a sensitivity analysis in which we used the continuous propensity score rather than decile indicators in the outcome model. This approach makes strong assumptions about the functional relation between propensity score and outcome, but, in line with findings that these assumptions are likely to be more of a theoretical concern than a practical one (39, 40), we observed results closer to the referent value in the smallest sizes of the non-SSRI studies. Overall, however, decile-based exposure-only selection still offered equal or better performance in these small studies, and the flexible functional form of the 9 indicators leads us to favor a decile-based approach where possible.
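The arithmetic behind the 90-event figure can be made concrete with a small Python sketch that builds the 9 decile indicators (lowest decile as reference) and applies a 10-events-per-variable rule of thumb. The function names and the example scores are hypothetical; ties in the propensity score are split by sort order, which is a simplification, and the exposure term is not counted, matching the arithmetic in the text.

```python
from typing import List, Sequence


def decile_indicators(ps: Sequence[float]) -> List[List[int]]:
    """9 indicator columns per subject; the lowest decile is the reference."""
    n = len(ps)
    order = sorted(range(n), key=lambda i: ps[i])  # subjects ranked by score
    decile = [0] * n
    for rank, i in enumerate(order):
        decile[i] = rank * 10 // n  # 0 (lowest) .. 9 (highest)
    return [[1 if decile[i] == j else 0 for j in range(1, 10)] for i in range(n)]


def events_needed(n_terms: int, events_per_variable: int = 10) -> int:
    """Rule-of-thumb number of exposed events for n_terms outcome-model terms."""
    return n_terms * events_per_variable


ps = [i / 20 for i in range(20)]    # 20 hypothetical propensity scores
rows = decile_indicators(ps)
print(len(rows[0]))                 # 9 indicator terms
print(events_needed(len(rows[0])))  # 90 with decile indicators
print(events_needed(1))             # 10 with one continuous score term
```

The continuous-score alternative spends a single parameter, which is why it is attractive when events are scarce, at the cost of assuming a particular functional form.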

We also sought to find an optimal number of empirically selected covariates to include in the propensity score model. We observed that by *k* ≈ 300 we had achieved most of the confounding control that the algorithm had to offer, particularly in the larger studies, where adding more empirically selected covariates had no appreciable effect on estimation. One concern about large values of *k*, overfitting of the propensity score model, is unwarranted, because the propensity score is meant to describe the data at hand, not to generalize to other data sets (41). Another concern, that including instruments in the propensity score may amplify the effect of unmeasured confounders (“Z-bias”) (42, 43), was allayed by reviewing the algorithm's output that flags variables with a strong association with the exposure but a very weak association with the outcome. On the basis of this output, we removed several potential instruments before beginning the analyses described in this paper. If any instruments remained, their potentially harmful effect was likely outweighed by the beneficial effects of improved confounding control.
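A screen of the kind described above, flagging covariates that are strongly associated with exposure but barely associated with the outcome, might look like the following Python sketch. The thresholds (prevalence ratio of at least 2, covariate-outcome risk ratio within about 10% of the null), the function name, and the example covariates are hypothetical and are not the algorithm's actual alert rule.

```python
import math
from typing import Dict, List, Optional, Tuple

# name -> (prevalence in exposed, prevalence in unexposed, covariate-outcome RR)
Covariates = Dict[str, Tuple[float, float, Optional[float]]]


def flag_potential_instruments(covs: Covariates,
                               min_exposure_log_pr: float = math.log(2.0),
                               max_outcome_log_rr: float = math.log(1.1)) -> List[str]:
    """Covariates with a strong exposure association but a near-null outcome RR."""
    flagged = []
    for name, (p1, p0, rr_cd) in covs.items():
        if p1 <= 0 or p0 <= 0 or rr_cd is None or rr_cd <= 0:
            continue  # association not estimable; nothing to judge
        strong_exposure = abs(math.log(p1 / p0)) >= min_exposure_log_pr
        weak_outcome = abs(math.log(rr_cd)) <= max_outcome_log_rr
        if strong_exposure and weak_outcome:
            flagged.append(name)
    return flagged


covs = {
    "rx_formulary": (0.40, 0.10, 1.02),  # strong PR, near-null RR: flagged
    "dx_diabetes":  (0.30, 0.10, 1.80),  # strong PR, real outcome link
    "px_screen":    (0.11, 0.10, 1.00),  # weak PR
}
print(flag_potential_instruments(covs))  # ['rx_formulary']
```

Flagged variables are candidates for investigator review rather than automatic removal, consistent with the manual step described above.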

A third concern, that the selected variables may have been intermediates or colliders, was mitigated in part by our choice of an incident-user design (29). This design requires that every observed exposure be the first observed exposure after at least 1 year of nonuse, so that all covariates measured at or before baseline occurred before any exposure. An incident-user design or its equivalent should be considered in any study using hd-PS. However, colliders may have remained. Conditioning on a collider that is associated with 2 or more unmeasured confounders, but is not itself a confounder of the exposure-disease association under study, can lead to “M-bias” (44). Although preliminary research shows that the resulting bias may be small (45), removing all colliders is not possible. In either an automated or an investigator-driven approach, it is virtually impossible to distinguish colliders from confounders: there is no test that separates the 2 cases, and in a complex study, a variable that is a collider on one pathway may well be a confounder on another. We therefore took a pragmatic approach, acknowledging but not acting on this potential bias. We believe that hd-PS adjustment can be an important source of bias reduction, with the vast majority of selected covariates serving to improve validity.
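The incident-user restriction can be sketched in Python for a single patient. This assumes a claims-style record with an enrollment start date and a list of dispensing dates; the index date is the first observed dispensing, accepted only if the patient was observable for at least 365 days beforehand, and baseline covariates are codes dated on or before the index date. The function names and dates are hypothetical, and real studies typically add further eligibility rules not shown here.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Set, Tuple

WASHOUT = timedelta(days=365)  # "at least 1 year of nonuse"


def incident_index_date(enrollment_start: date,
                        dispensings: Iterable[date]) -> Optional[date]:
    """First observed dispensing if preceded by a full year of observable nonuse."""
    dates = sorted(dispensings)
    if not dates:
        return None  # never exposed
    first = dates[0]
    # Nonuse can only be confirmed if the patient was observable for a full year.
    if first - enrollment_start < WASHOUT:
        return None
    return first


def baseline_codes(index: date, records: Iterable[Tuple[date, str]]) -> Set[str]:
    """Covariate codes recorded at or before the index (baseline) date."""
    return {code for when, code in records if when <= index}


idx = incident_index_date(date(2005, 1, 1), [date(2006, 4, 1), date(2006, 3, 1)])
print(idx)  # 2006-03-01
print(sorted(baseline_codes(idx, [(date(2006, 1, 5), "dx1"),
                                  (date(2006, 3, 1), "dx2"),
                                  (date(2006, 5, 1), "dx3")])))  # ['dx1', 'dx2']
```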

The goal of our study was to describe the functionality of hd-PS in 4 real-world pharmacoepidemiology studies with varying cohort sizes. The empirical nature of the resampling experiment is both a strength and a limitation: it uses real-world settings to explore the reliability of hd-PS in small samples, whereas a fully specified simulation could have explored more extreme settings than those we observed and would have provided true odds ratios. Any such simulation, however, would need to reproduce the multilevel interdependence of covariates present in data collected in health-care settings. Further, when evaluating the overall performance of hd-PS, we had to rely on subject matter expertise to judge whether the hd-PS-adjusted point estimates were closer to the true value of the association than the conventionally adjusted point estimates were. We believe that using the odds ratios observed in the full cohort as referent values provided a reasonable evaluation of the variable identification and prioritization process at smaller sample sizes.
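Comparing subsample estimates with a full-cohort referent can be sketched as follows. To keep the example self-contained, a crude odds ratio stands in for the hd-PS-adjusted estimate, and the mean absolute deviation of the log odds ratio is only one possible summary; the cohort, the function names, and the summary itself are hypothetical rather than the procedure used in this study.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Record = Tuple[int, int]  # (exposed 0/1, event 0/1) for one hypothetical patient


def log_odds_ratio(records: Sequence[Record]) -> Optional[float]:
    """Crude log odds ratio; None if any cell of the 2 x 2 table is empty."""
    a = sum(1 for x, y in records if x == 1 and y == 1)
    b = sum(1 for x, y in records if x == 1 and y == 0)
    c = sum(1 for x, y in records if x == 0 and y == 1)
    d = sum(1 for x, y in records if x == 0 and y == 0)
    if 0 in (a, b, c, d):
        return None
    return math.log((a * d) / (b * c))


def deviations_from_referent(cohort: List[Record], sample_size: int,
                             replicates: int, seed: int = 0) -> List[float]:
    """log OR in random subsamples minus the full-cohort (referent) log OR."""
    referent = log_odds_ratio(cohort)
    if referent is None:
        raise ValueError("full cohort has an empty 2 x 2 cell")
    rng = random.Random(seed)
    deviations = []
    for _ in range(replicates):
        estimate = log_odds_ratio(rng.sample(cohort, sample_size))
        if estimate is not None:  # skip subsamples with an empty cell
            deviations.append(estimate - referent)
    return deviations


cohort = [(1, 1)] * 60 + [(1, 0)] * 440 + [(0, 1)] * 100 + [(0, 0)] * 1400
small = deviations_from_referent(cohort, sample_size=200, replicates=500)
if small:
    print(len(small), sum(abs(x) for x in small) / len(small))
```

Counting how many subsamples are skipped for empty cells gives a rough sense of how often small samples lack the information needed for estimation at all.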

This study furthered our understanding of how the hd-PS algorithm functions in real-world study situations and strengthened the evidence that hd-PS is a valuable addition to the epidemiology toolbox. On the basis of this evaluation, we feel confident in recommending hd-PS for many study situations. In small studies, the approach that considered only the covariate-exposure association and, to a lesser extent, the zero-cell correction were beneficial. Both are now options in version 2 of the hd-PS algorithm.