A simulation study was carried out to evaluate the finite sample properties of our proposed locally efficient estimator ( _{loc,eff}, _{loc,eff}) of the parameters γ and α. We conducted seven experiments, each consisting of 1000 repetitions, so that the Monte Carlo estimate of the coverage of a 95% confidence interval would have a margin of error of roughly 1.3%. We generated independent vectors *W*_{i} = (*Z*_{i}, *X*_{i}, *S*_{i}(0), *S*_{i}(1), *Y*_{i}(0), *Y*_{i}(1)), *i* = 1,…,*n*, where *n* = 2000 or 20000 depending on the experiment, so that conditions (1), (2), 1–4, (9) and (8) would hold with

where the values of

and β are given below, and

This set-up implies that treatment *Z* has indeed no effect on the conditional mean of the outcome in the subpopulation _{E,E}. In addition, under our data-generating process the distributions of *Y*(0) and *Y*(1) given *X* in the subpopulation _{E,E} are the same and equal to *N*{*m*(*z*, *X*; γ*), 1}. The distribution of *Y*(0) in the subpopulation _{E,Ē} is normal with mean *m*(*z*, *X*; γ*) − β and variance 1. Our parameter values were chosen so that roughly 70% of the subjects with *S*(0) = 1 and *X* = 38 (38 being the mean of *X*, as indicated next) would also have *S*(1) = 1, i.e. so that Pr{*S*(1) = 1 | *S*(0) = 1, *X* = 38} ≈ 0.7.
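The margin of error quoted for 1000 repetitions can be checked with a short calculation. The sketch below assumes the usual normal approximation to a binomial proportion; the function name `coverage_margin_of_error` is introduced here purely for illustration.

```python
import math


def coverage_margin_of_error(nominal=0.95, n_reps=1000, z=1.96):
    """Half-width of a normal-approximation interval for a Monte Carlo
    coverage estimate when the true coverage equals `nominal`."""
    standard_error = math.sqrt(nominal * (1.0 - nominal) / n_reps)
    return z * standard_error


moe = coverage_margin_of_error()
print(f"{100 * moe:.2f}%")  # prints 1.35%
```

With 1000 repetitions the half-width is about 1.35 percentage points, in line with the rough figure given in the text.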

For each *i* = 1,…,*n*, *Z*_{i}, *X*_{i} and *S*_{i}(0) were generated independently as *Z*_{i} ~ Bernoulli(0.5), *X*_{i} ~ *N*(38, 36) and *S*_{i}(0) ~ Bernoulli(0.25) (the distribution of *X* was chosen to resemble that of age in the study that is described in Section 6). For units with *S*_{i}(0) = 1, *S*_{i}(1) was generated from a Bernoulli distribution with success probability equal to

and, depending on the experiment, ( ) = (−2.2, 0.1), (−5.5, 1), (−9.9, 3), the second component being β. For reasons that are explained below, we also conducted an additional experiment with *n* = 2000 and ( ) = (−8.8, 2). The values of β were chosen to reflect little, moderate and serious differences in the distributions of the outcome *Y*(0) in the subpopulations _{E,Ē} and _{E,E}. We avoided the value β = 0 since, under this value, our proposal reduces to the standard analysis based on weighted least squares among subjects with *S* = 1. For units with *S*_{i}(0) = 0, *S*_{i}(1) was set to 0. Given *S*_{i}(0), *S*_{i}(1), *X*_{i} and *Z*_{i}, (*Y*_{i}(0), *Y*_{i}(1)) were generated as follows. First we generated

Then, we set *Y*_{i}(0) = *Y*_{i}(1) =

if *S*_{i}(0) = *S*_{i}(1) = 1, *Y*_{i}(0) =

and *Y*_{i}(1) = * if *S*_{i}(0) = 1 and *S*_{i}(1) = 0, and *Y*_{i}(0) = *Y*_{i}(1) = * if *S*_{i}(0) = 0. Finally, we set *S*_{i} = *S*_{i}(*Z*_{i}) and *Y*_{i} = *Y*_{i}(*Z*_{i}). It is easily seen that the data-generating process in our simulation satisfies the conditions that were described in the preceding paragraph.
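The data-generating steps described above can be sketched in code. This is an illustration under several loudly stated assumptions, not the set-up actually used in the study. The success probability for *S*(1) is taken to be logistic in *X* with a hypothetical slope `slope_x` and an intercept chosen so that it equals 0.7 at *X* = 38. The mean function `m` is a hypothetical linear function of *X* with no *Z* term, with made-up coefficients `gamma`. The outcomes are built from a single normal draw `t`, which is one construction consistent with the distributions stated in the text. The 36 in *N*(38, 36) is read as a variance.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def simulate_unit(rng, beta, gamma=(0.0, 0.1), slope_x=0.05):
    """One unit; `gamma` and `slope_x` are hypothetical illustration values."""
    z = 1 if rng.random() < 0.5 else 0       # Z ~ Bernoulli(0.5)
    x = rng.gauss(38.0, 6.0)                 # X ~ N(38, 36), 36 read as variance
    s0 = 1 if rng.random() < 0.25 else 0     # S(0) ~ Bernoulli(0.25)
    if s0 == 1:
        # Assumed logistic form, equal to 0.7 at X = 38.
        p = expit(math.log(0.7 / 0.3) + slope_x * (x - 38.0))
        s1 = 1 if rng.random() < p else 0
    else:
        s1 = 0                               # S(1) = 0 whenever S(0) = 0
    m = gamma[0] + gamma[1] * x              # assumed mean function, no Z effect
    t = rng.gauss(m, 1.0)                    # shared draw with unit variance
    if s0 == 1 and s1 == 1:
        y0 = y1 = t                          # same outcome under both arms
    elif s0 == 1:
        y0, y1 = t - beta, None              # mean shifted by -beta; Y(1) undefined
    else:
        y0 = y1 = None                       # both potential outcomes undefined
    s = s1 if z == 1 else s0                 # S = S(Z)
    y = y1 if z == 1 else y0                 # Y = Y(Z)
    return {"z": z, "x": x, "s0": s0, "s1": s1,
            "y0": y0, "y1": y1, "s": s, "y": y}


def simulate(n, beta, seed=0):
    rng = random.Random(seed)
    return [simulate_unit(rng, beta) for _ in range(n)]


sample = simulate(2000, beta=1.0)
```

In this sketch `None` plays the role of the undefined value written as * in the text, so the observed outcome `y` is defined exactly when `s` equals 1.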

The tables report the results for inference about the parameters γ = (γ_{0}, γ_{1}, γ_{2}, γ_{3}) and α = (α_{0}, α_{1}) respectively of the models (14) for *n* = 20000 and *n* = 2000. Each table reports the Monte Carlo mean (labelled ‘Mean’) and median (labelled ‘Median’) of the estimators, the Monte Carlo coverage probability of nominal 95% Wald confidence intervals (labelled ‘CP’) and their median length (labelled ‘Length’). The tables report results for the following estimators: the naïve ordinary least squares estimator from the regression of *Y* on *Z* and *X* based on observations with *S* = 1 (labelled OLS), the inefficient estimator of (α, γ) (labelled INE) solving equation (11) that uses

where *b*(*X*) was chosen so that

and the locally efficient estimator under (correctly specified) working models that assume a logistic model for Pr(*S* = 1 | *Z* = 0, *X*) and normal distributions with variance equal to 1 for *f* *(*Y* | *X*) and *f*(*Y* | *S* = 1, *Z* = 1) (labelled EFF). All estimators were computed under the true value of β. The starting values of γ and α for the algorithm solving the inefficient estimating equation were set at the OLS estimator of γ and at α = (0, 0). The resulting estimates were used as starting values for the algorithm solving the locally efficient estimating equation. When *n* = 2000, the inefficient estimation algorithm failed to find a root of the estimating equation, and hence did not converge, in 0.4–14.4% of runs depending on the value of β (more frequently for the larger values of β). These runs were discarded and replaced with new runs. In each of the faulty runs we examined whether the algorithm solving the locally efficient estimating equation converged when started at values of γ and α near the true values. In all the runs that were investigated, the algorithm converged. We therefore attribute the lack of convergence of the algorithm solving the inefficient equation to the poor choice of the function *d*(*X*). We do not regard this failure as serious since, faced with it, an investigator would try different *d*(*X*) and different starting values until the algorithm converged.
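The four summaries reported in each table (Monte Carlo mean, median, coverage probability of the nominal 95% Wald interval and median interval length) can be computed from the per-repetition estimates and standard errors. The following is a minimal sketch; the function name `summarize` and the toy inputs are made up for illustration and are not results from the study.

```python
import statistics


def summarize(estimates, std_errors, truth, z=1.96):
    """Monte Carlo summaries of one scalar estimator across repetitions."""
    covered = []
    lengths = []
    for est, se in zip(estimates, std_errors):
        lower, upper = est - z * se, est + z * se   # Wald interval
        covered.append(lower <= truth <= upper)
        lengths.append(upper - lower)
    return {
        "Mean": statistics.fmean(estimates),
        "Median": statistics.median(estimates),
        "CP": sum(covered) / len(covered),
        "Length": statistics.median(lengths),
    }


# Toy inputs (hypothetical): four repetitions with true value 1.0.
summary = summarize([0.9, 1.1, 1.0, 1.6], [0.1, 0.1, 0.1, 0.1], truth=1.0)
print(summary["CP"])  # prints 0.75
```

In the toy example three of the four intervals contain the true value, so the estimated coverage is 0.75. The median length is used rather than the mean because it is less sensitive to a few repetitions with very large standard errors.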

**Table 2.** Mean and median of the estimators of (α_{0}, α_{1}), coverage probability CP and median length of the associated 95% confidence interval in two 1000-run simulation studies with randomization probability *P*(*Z* = 1) = 0.5, probability of the event …

Our results for *n* = 20000 confirm that the properties that are established by theorem 3 hold. The naïve OLS estimator was biased, more so as the value of β departed from 0. When β = 3, this bias was sufficiently severe to reverse the sign of the estimate of γ_{0}. OLS estimators of the mean shift parameter γ_{2} were significantly far from 0 even when β = 1. Also, as predicted by the theory, both the efficient and the inefficient estimators were unbiased. Coverage probabilities of the 95% confidence intervals were close to the nominal value, and efficiency, as measured by the median interval length, was somewhat better when the locally efficient estimator was used to centre the intervals, the gains in efficiency being more pronounced for estimation of α. Curiously, intervals centred at the inefficient estimator had poor coverage when β = 3, whereas the coverage was substantially improved when the intervals were centred at the efficient estimator. No significant gains in efficiency were obtained from the locally efficient estimator when β was 0.1. This came as no surprise since this value, relative to the variability of *Y*, is very close to 0. At β = 0, it can be easily shown that *d*_{eff}(*X*) = *d*(*X*) given in equation (15) and that ω does not depend on *Y*, so that the locally efficient and inefficient estimators are algebraically identical. Results for *n* = 2000 were qualitatively similar to those for *n* = 20000, except that for β = 3 both the inefficient and the efficient estimators were biased and the coverage probability of the interval was poor. This demonstrates that the asymptotic distribution ( , ) is not a good approximation to the finite sample distribution of ( , ) when β is large, even if *n* is moderate. The extra experiment with β = 2 is meant to show that the asymptotic distribution is still a good approximation when *n* = 2000 at this value of β.