In our Monte Carlo simulations, we found that, on average, increasing the number of untreated subjects matched to each treated subject increased the bias of the estimated treatment effect; conversely, it tended to result in increased precision. When using nearest-neighbor matching, we found that MSE was minimized in 67.7% of the 96 scenarios when 1 untreated subject was matched to each treated subject. For either matching method, MSE was minimized in at least 84% of the scenarios when either 1 or 2 untreated subjects were matched to each treated subject. These findings suggest that in the majority of settings, using 1:1 or 2:1 matching will result in optimal estimation of treatment effects when employing fixed *M*:1 matching. When using caliper matching, we observed that the optimal value of *M* tended to decrease as the proportion of treated subjects increased.
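
The interplay between bias and precision summarized here follows from the standard decomposition of mean squared error. As a reminder, with $\hat{\theta}_M$ denoting the treatment effect estimated under *M*:1 matching and $\theta$ the true effect (notation introduced here for illustration only):

```latex
\mathrm{MSE}(\hat{\theta}_M)
  = E\!\left[(\hat{\theta}_M - \theta)^2\right]
  = \left(E[\hat{\theta}_M] - \theta\right)^2 + \mathrm{Var}(\hat{\theta}_M)
```

Increasing *M* therefore lowers MSE only while the reduction in variance exceeds the accompanying increase in squared bias.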

There is a paucity of explicit research into the effect of increasing the number of untreated subjects matched to each treated subject. Imbens suggested that “within the class of matching estimators, using only a single match leads to the most credible inference with the least bias, at most sacrificing some precision” (20, p. 14). The results of our extensive Monte Carlo simulations provided confirmation of this suggestion: We observed increased bias as the number of untreated subjects matched to each treated subject increased. Furthermore, when using nearest-neighbor matching, MSE was minimized in 67.7% of the scenarios when 1:1 matching was employed.

Rosenbaum and Rubin (28) examined the bias due to incomplete and inexact matching when matching treated and untreated subjects on a set of baseline covariates. Incomplete matching occurs when there are treated subjects for whom no appropriate untreated subjects are identified. Inexact matching occurs when a treated subject and an untreated subject whose covariates are not identical are matched. In an empirical example, Rosenbaum and Rubin showed that the bias due to incomplete matching can be substantial (28). Thus, in conventional matching, an important issue is not whether there are unmatched untreated subjects; rather, a much more important issue is whether there are unmatched treated subjects. Rosenbaum and Rubin suggested that, rather than use exact matching and risk bias due to incomplete matching, one can match using a multivariate nearest-neighbor method (such as matching on the propensity score) and thus avoid bias due to incomplete matching (28). The resultant cost is only minor bias due to inexact matching. In the current study, estimation using nearest-neighbor matching would not have suffered from incomplete matching bias because sufficient matches were found for all treated subjects, since no constraints were placed upon the maximum difference in propensity score between treated and untreated subjects in the same matched set. However, estimation using caliper matching may have suffered from incomplete matching bias to a limited extent, since no matches may have been found for some treated subjects due to the constraint that the difference in the logit of the propensity score between treated and untreated subjects was required not to exceed a maximal value.
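
The distinction between the two matching methods can be illustrated with a minimal sketch. The function below is hypothetical and is not the implementation used in the simulations. It assumes greedy matching without replacement, processes treated subjects in the order given, and, as a further assumption, discards any treated subject for whom fewer than *M* acceptable untreated subjects remain.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_match(ps_treated, ps_untreated, m=1, caliper=None):
    """Greedy M:1 matching without replacement on the logit of the propensity score.

    caliper=None gives plain nearest-neighbor matching. Otherwise a candidate is
    accepted only if its logit differs from the treated subject's by at most
    `caliper`. Returns {treated index: [untreated indices]}. Treated subjects
    with fewer than m acceptable candidates are omitted (an assumption of this
    sketch), which is what produces incomplete matching.
    """
    available = {j: logit(p) for j, p in enumerate(ps_untreated)}
    matches = {}
    for i, p in enumerate(ps_treated):
        target = logit(p)
        # Rank the remaining untreated subjects by distance on the logit scale.
        ranked = sorted(available, key=lambda j: abs(available[j] - target))
        chosen = [
            j for j in ranked[:m]
            if caliper is None or abs(available[j] - target) <= caliper
        ]
        if len(chosen) == m:
            matches[i] = chosen
            for j in chosen:
                del available[j]  # without replacement
    return matches


treated = [0.50, 0.90]
untreated = [0.49, 0.60, 0.30]
print(greedy_match(treated, untreated))               # no caliper
print(greedy_match(treated, untreated, caliper=0.2))  # caliper on the logit scale
```

Without a caliper, both treated subjects are matched, although the second is paired with a distant untreated subject (inexact matching). With the caliper, the second treated subject is left unmatched (incomplete matching).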

In the current study, we examined the impact of the number of untreated subjects matched to each treated subject in the context of propensity-score matching. The issue of how many subjects to include in a matched set has received greater attention in case-control studies. In case-control studies, cases (subjects who experienced the outcome of interest) are matched with controls (subjects who did not experience the outcome of interest). Ury demonstrated that “the theoretical efficiency of a 1:*M* case-control ratio for estimating a relative risk of about 1, relative to having complete information on the control population (*M* = ∞), is *M*/(*M* + 1). Thus, 1 control per case is 50% efficient, while 4 per case is 80% efficient” (35, p. 169). Thus, in case-control studies, increasing the number of controls matched to each case results in improved efficiency; however, the relative gains in efficiency are minor once *M* exceeds 5 or so. In contrast, when using propensity-score matching, there is a trade-off between bias and variance that does not exist in case-control studies. We have shown that, in many settings, the trade-off can be optimized by matching either 1 or 2 untreated subjects to each treated subject. In only a very small minority of settings was matching 5 untreated subjects to each treated subject optimal.
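
The diminishing returns implied by Ury's formula can be tabulated directly. The short sketch below simply evaluates *M*/(*M* + 1); the function name is ours and is used for illustration only.

```python
def relative_efficiency(m):
    """Theoretical efficiency of a 1:M case-control ratio, M / (M + 1)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return m / (m + 1)


for m in range(1, 7):
    print(f"M = {m}: efficiency = {relative_efficiency(m):.3f}")
```

Moving from 1 to 2 controls per case raises the theoretical efficiency from 0.500 to 0.667, whereas moving from 5 to 6 raises it only from 0.833 to 0.857.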

We have demonstrated that increasing the number of untreated subjects matched to each treated subject can result in increased bias in estimating treatment effects. However, there are additional limitations to having more than 1 untreated subject matched to each treated subject. In particular, it can make estimation of the variance of the estimated treatment effect more difficult. For instance, when outcomes are binary, McNemar's test can be used to compare the proportion of successes between the 2 treatment groups when 1:1 matching is employed. However, when multiple untreated subjects are matched to each treated subject, it is unclear how the statistical significance of the risk difference should be determined.
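
For the 1:1 case, McNemar's test depends only on the discordant pairs. The sketch below assumes the exact (binomial) form of the test, which is one common variant and not necessarily the one an analyst would choose. The function name and arguments are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: matched pairs in which only the treated subject had the outcome
    c: matched pairs in which only the untreated subject had the outcome
    Under the null hypothesis, b follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence against the null
    k = min(b, c)
    # Probability of a split at least as extreme as the observed one, in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(mcnemar_exact(10, 0))
print(mcnemar_exact(5, 5))
```

No equally simple pairwise tabulation exists when each matched set contains several untreated subjects, which is part of the difficulty noted above.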

We have examined criteria for determining the optimal number of untreated subjects to match to each treated subject when using fixed *M*:1 matching on the propensity score. There are alternatives to *M*:1 matching that we have not examined in the current paper because of space constraints. Ming and Rosenbaum (36) demonstrated that matching with a variable number of controls can reduce bias substantially in comparison with matching with a fixed number of controls. Furthermore, we have not considered full matching, in which multiple untreated subjects are matched to each treated subject or multiple treated subjects are matched to each untreated subject, resulting in all subjects being included in a matched set (37–39). Full matching may often have superior performance to fixed *M*:1 matching (38). In this study, we have focused on *M*:1 matching because of its more frequent use in the medical literature.

In summary, we recommend that, in most settings, researchers match either 1 or 2 untreated subjects to each treated subject when using fixed *M*:1 matching on the propensity score. Using only 1 untreated subject for each treated subject will tend to minimize bias. In some settings, attempting to match 2 untreated subjects to each treated subject will result in improved precision without a commensurate increase in bias.