This analysis uses a dataset of HIV treatment costs collected from 54 HIV treatment sites across six countries. These data were collected as part of a multicountry costing study conducted in Botswana, Ethiopia, Nigeria, Uganda and Vietnam (9 sites per country), as well as a more recent study conducted in Mozambique (11 sites) that used the same costing methods. Two Ugandan sites were excluded from the analysis because of inadequate patient volume data, leaving the 54 sites analyzed here. All sites included in this dataset were out-patient clinics providing free treatment for HIV-infected individuals, and all relied on resource support from a mix of funders, including PEPFAR. Most sites were attached in some way to a larger health facility; only 8 of the 54 sites were stand-alone clinics. Data collection adopted a comprehensive provider perspective, including the costs of personnel, drugs and other clinical supplies, laboratory supplies, other supplies, travel, utilities and building costs, training and supervision, equipment, and renovation/construction. This perspective included the costs of any regular technical assistance, supervision, M&E and management support to the site, but excluded higher-level overhead costs incurred at the regional or central level that could not be attributed directly to the site. Data were collected with a modified macro-costing approach, whereby total site-level costs were estimated for each patient type. These totals were divided by patient volume (number of patient-years) for each patient type to estimate the average cost per patient. For each site, data were collected retrospectively to cover the full duration of site activities, from the time sites began scaling up HIV treatment to the time of data collection. Standardized data collection tools were used to extract information from accounting records, prescribing logs, equipment inventories and routine reports, and to conduct structured interviews with site personnel.
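The per-patient cost calculation described above (total cost divided by patient-years, by patient type) can be sketched as follows. This is a minimal illustration rather than the study's tooling: the record layout, the patient-type labels and all figures are hypothetical, and it assumes patient volume is recorded as the average number of patients on treatment during each six-month period, so that each such patient contributes 0.5 patient-years to that period.

```python
from collections import defaultdict

# Hypothetical site-level records:
# (patient_type, six_month_period, total_cost_usd, avg_patients_in_period)
RECORDS = [
    ("pre_art", 1, 12_000.0, 400),
    ("pre_art", 2, 13_000.0, 450),
    ("adult_established", 1, 30_000.0, 600),
    ("adult_established", 2, 32_000.0, 700),
]


def cost_per_patient_year(records):
    """Return {(patient_type, period): total cost / patient-years}."""
    total_cost = defaultdict(float)
    patient_years = defaultdict(float)
    for ptype, period, cost, avg_patients in records:
        key = (ptype, period)
        total_cost[key] += cost
        # Assumption: an average patient count over six months equals
        # 0.5 patient-years per patient.
        patient_years[key] += 0.5 * avg_patients
    return {key: total_cost[key] / patient_years[key] for key in total_cost}


if __name__ == "__main__":
    for key, value in sorted(cost_per_patient_year(RECORDS).items()):
        print(key, round(value, 2))
```

Dividing by patient-years rather than by a head count means the result is already an annualized cost, which matches the outcome used in the analysis.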
Individual cost items were coded to allow disaggregation by program activity, budget category and funder, and costs were broken into successive 6-month periods for analysis. Cost analyses included all patients receiving HIV treatment at study sites, including both ART and pre-ART patients, with data on patient volume drawn from routine program reporting records. By the end of the evaluation, a total of 76,416 ART patients and 95,538 pre-ART patients were receiving HIV care at study sites. The studies that collected these data received institutional review board clearance, and data collection was conducted with the approval of the Ministry of Health in each country. Further detail on data collection methods is reported in Menzies et al.
The subject of the analysis is the annualized per-patient cost of service delivery, excluding ARV expenditures. The costs of ARVs were excluded from the analysis because they are better explained by global drug commodity trends, regimen distributions and national price levels, and are not principally driven by site-level factors. Following standard practice, primary data on resource use were converted to economic costs, with investments annualized over their useful life at a 3% discount rate and donated items valued at market prices. Overheads and shared costs were allocated by direct allocation, and the opportunity costs of existing infrastructure were estimated as the equivalent rental cost. All costs were converted to U.S. dollars using prevailing inter-bank exchange rates and inflated to current prices. Results are reported in 2010 U.S. dollars.
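As a rough sketch of the annualization step, the snippet below converts a one-off investment into an equivalent annual cost using the standard annuity factor, EAC = C·r / (1 − (1 + r)^−n), at the 3% discount rate. The item price and useful life are hypothetical, and splitting the annual figure evenly across two six-month periods is a simplifying assumption rather than a detail reported here.

```python
def annuity_factor(rate: float, years: int) -> float:
    """Present value of 1 USD paid at the end of each year for `years` years."""
    return (1.0 - (1.0 + rate) ** -years) / rate


def equivalent_annual_cost(capital_cost: float, useful_life_years: int,
                           rate: float = 0.03) -> float:
    """Spread a one-off investment over its useful life as a constant annual cost."""
    return capital_cost / annuity_factor(rate, useful_life_years)


def cost_per_six_month_period(capital_cost: float, useful_life_years: int,
                              rate: float = 0.03) -> float:
    # Simplifying assumption: half of the equivalent annual cost
    # is charged to each six-month period.
    return equivalent_annual_cost(capital_cost, useful_life_years, rate) / 2.0


if __name__ == "__main__":
    # Hypothetical item: 10,000 USD of equipment with a 5-year useful life.
    print(round(equivalent_annual_cost(10_000.0, 5), 2))
```

Because of discounting, the annual charge is somewhat higher than straight-line depreciation (capital cost divided by years of life); discounting the stream of annual charges back to the purchase date recovers the original outlay exactly.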
The duration for which data were available varied between sites, from 6 to 36 months, providing between 1 and 6 six-month periods for analysis (mean 3·0 periods per site). In addition, cost data were available for five distinct patient types: pre-ART patients, newly initiated adult ART patients, established adult ART patients, newly initiated pediatric ART patients, and established pediatric ART patients. Pediatric and adult patients were those aged 0–15 and >15 years, respectively, and newly initiated and established patients were those who had received ART for 0–6 and >6 months, respectively.
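The five patient types can be expressed as a small classification rule. This is an illustrative sketch: the function name and labels are hypothetical. Treating exactly 15 years of age and exactly 6 months on ART as the upper bounds of the pediatric and newly initiated groups is an assumption about how the stated ranges (0–15 years, 0–6 months) were applied.

```python
def patient_type(on_art: bool, age_years: float, months_on_art: float = 0.0) -> str:
    """Classify a patient into one of the five cost categories (hypothetical labels)."""
    if not on_art:
        return "pre_art"
    # Assumption: ages up to and including 15 years count as pediatric.
    age_group = "pediatric" if age_years <= 15 else "adult"
    # Assumption: up to and including 6 months on ART counts as newly initiated.
    stage = "new" if months_on_art <= 6 else "established"
    return f"{stage}_{age_group}"


if __name__ == "__main__":
    print(patient_type(True, 34, 18))
```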
Possible explanatory variables were divided into distal determinants and proximal determinants. Distal determinants described general features of the site (i.e., location, health system level, type of administration) that might play a role in determining the operating characteristics of the site and thus influence costs. Proximal determinants described site operating characteristics (i.e., site maturity, patient volume, frequency of clinical and laboratory monitoring, comprehensiveness of care services provided, staffing structure, percentage of spending devoted to management and administration, and log per-capita GDP as an indicator of price levels). The first part of the analysis focuses on proximal determinants; the second part focuses on distal determinants. provides descriptions of the explanatory variables included in the analysis.
The dataset has a complex structure, and a generalized linear mixed model (GLMM) was adopted for the analysis, with a log link function and random effect terms to account for clustering at the country, site and time-period levels. Fixed effects were also included for each patient type. The dataset includes a total of 692 observations; however, the effective sample size is smaller than this suggests because of clustering at the site and time-period levels. The model was estimated using Markov chain Monte Carlo (MCMC) simulation implemented in the R statistical computing software.
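To make the clustering structure concrete, the sketch below generates expected per-patient costs from a log-link model with a patient-type fixed effect and nested random intercepts for country, site and six-month period. It only illustrates the model's structure; it does not fit the model, and all names and parameter values are hypothetical. Because every patient type observed at the same site in the same period shares the same country, site and period effects, the observations carry less independent information than their count suggests.

```python
import math
import random


def simulate_expected_costs(n_countries, sites_per_country, periods_per_site,
                            beta0, beta_type, sd_country, sd_site, sd_period,
                            seed=1):
    """Expected cost per patient-year under a log-link model with nested random intercepts.

    log(mu) = beta0 + beta_type[type] + u_country + v_site + w_period
    """
    rng = random.Random(seed)
    rows = []
    for c in range(n_countries):
        u_country = rng.gauss(0.0, sd_country)
        for s in range(sites_per_country):
            v_site = rng.gauss(0.0, sd_site)  # site nested within country
            for t in range(periods_per_site):
                w_period = rng.gauss(0.0, sd_period)  # period nested within site
                for ptype, b in beta_type.items():
                    # All patient types in this site-period share the same random effects.
                    eta = beta0 + b + u_country + v_site + w_period  # linear predictor
                    rows.append((c, s, t, ptype, math.exp(eta)))  # inverse of the log link
    return rows


if __name__ == "__main__":
    demo = simulate_expected_costs(2, 3, 2, math.log(150.0),
                                   {"pre_art": 0.0, "adult_established": 0.8},
                                   0.3, 0.4, 0.2)
    print(len(demo))
```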
The estimates produced by the GLMM regression relate to log-transformed costs, so care must be taken when interpreting coefficient values. Individual regression coefficients have a non-linear relationship with the raw per-patient cost: a unit increase in a particular predictor $x_i$ (with regression coefficient $\beta_i$) results in an average per-patient cost that is $\exp(\beta_i)$ times its original value, all other variables being held equal. For this reason, a series of first differences was calculated to investigate the implications of changes in site characteristics for the average per-patient cost, by simulating the absolute and percentage change in per-patient costs resulting from a change in one explanatory variable, with all other variables held at their mean values.
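A first difference of this kind can be sketched as below. This is an illustrative reconstruction, not the authors' code: posterior draws are taken to be plain lists of coefficients (in practice they would come from the MCMC output), the function names are hypothetical, and the usage example assumes the first covariate is an intercept fixed at 1.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize(values):
    """Posterior mean with 2.5th and 97.5th percentiles."""
    return sum(values) / len(values), percentile(values, 2.5), percentile(values, 97.5)


def first_difference(draws, x_mean, i, delta=1.0):
    """Absolute and percentage change in cost when x[i] moves by `delta` from its mean.

    draws  : posterior draws of the coefficient vector (one list per draw)
    x_mean : covariate vector with every variable at its mean value
    """
    abs_change, pct_change = [], []
    for beta in draws:
        eta0 = sum(b * x for b, x in zip(beta, x_mean))  # log cost at the means
        eta1 = eta0 + beta[i] * delta  # log cost after changing x[i] only
        base, new = math.exp(eta0), math.exp(eta1)
        abs_change.append(new - base)
        # Equals 100 * (exp(beta_i * delta) - 1), independent of the baseline.
        pct_change.append(100.0 * (new / base - 1.0))
    return summarize(abs_change), summarize(pct_change)


if __name__ == "__main__":
    # Hypothetical draws: [intercept, coefficient on one predictor].
    draws = [[math.log(100.0), math.log(0.5)], [math.log(100.0), math.log(0.8)]]
    print(first_difference(draws, [1.0, 3.0], 1))
```

The percentage change depends only on $\beta_i$ and the size of the change, whereas the absolute change also depends on the baseline cost at which the other covariates are held, which is one reason to report both.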
Direct retransformation of logged estimates can yield biased results, so estimates of the absolute per-patient cost were derived by sampling from the posterior distribution of the regression coefficients, computing the predicted log cost for each sampled set of coefficients, and taking the mean of the exponentiated predictions, with 95% confidence intervals given by the 2·5th and 97·5th percentiles of the exponentiated values.
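The retransformation step can be sketched as follows, under the assumption that posterior draws are available as lists of coefficients; the function names and the two-draw example are hypothetical. The sketch contrasts the mean of the exponentiated predictions with the exponential of the mean prediction, which illustrates one source of the bias that direct retransformation can introduce: because the exponential is convex, averaging on the log scale and then exponentiating understates the mean cost.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def posterior_cost(draws, x):
    """Mean and 95% interval of exp(x . beta) across posterior draws of beta."""
    costs = [math.exp(sum(b * xi for b, xi in zip(beta, x))) for beta in draws]
    return sum(costs) / len(costs), percentile(costs, 2.5), percentile(costs, 97.5)


def naive_cost(draws, x):
    """Direct retransformation: exponentiate the mean linear predictor."""
    mean_eta = sum(sum(b * xi for b, xi in zip(beta, x)) for beta in draws) / len(draws)
    return math.exp(mean_eta)


if __name__ == "__main__":
    # Hypothetical intercept-only draws spread around log(100).
    draws = [[math.log(100.0) + 0.5], [math.log(100.0) - 0.5]]
    print(posterior_cost(draws, [1.0]), naive_cost(draws, [1.0]))
```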
A similar approach was used to calculate estimates of the annual per-patient cost for each patient type. For a given patient type, we set the patient type dummy variables to the appropriate values for that patient type and set the clinic visit frequency and CD4 count frequency variables to their subgroup-specific means, as both of these variables differ by patient type. All other variables were set to their global mean (the mean across all observations), and the mean and confidence intervals for the annual per-patient cost were calculated by simulating from the posterior distribution of the regression coefficients, as described above. To calculate total per-patient costs (including ARVs), we used current drug prices and regimen distributions for each country derived from the WHO Global Price Reporting Mechanism, with an 8.3% mark-up for transportation and other supply-chain management costs.
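Adding ARV costs back in can be sketched as a regimen-weighted drug price inflated by the 8.3% supply-chain mark-up. The regimen names, shares and prices below are hypothetical and do not reproduce WHO Global Price Reporting Mechanism data; applying the mark-up multiplicatively to the weighted annual drug price is an assumption about how it was used.

```python
MARKUP = 0.083  # supply-chain mark-up stated in the text


def arv_cost_per_patient_year(regimen_shares, annual_prices, markup=MARKUP):
    """Regimen-weighted annual ARV price, inflated by the supply-chain mark-up."""
    if abs(sum(regimen_shares.values()) - 1.0) > 1e-9:
        raise ValueError("regimen shares must sum to 1")
    weighted_price = sum(share * annual_prices[r] for r, share in regimen_shares.items())
    return weighted_price * (1.0 + markup)


def total_cost_per_patient_year(non_arv_cost, regimen_shares, annual_prices):
    """Modelled non-ARV cost plus the marked-up ARV cost."""
    return non_arv_cost + arv_cost_per_patient_year(regimen_shares, annual_prices)


if __name__ == "__main__":
    # Hypothetical country: two regimens with made-up annual prices (USD).
    shares = {"regimen_1": 0.7, "regimen_2": 0.3}
    prices = {"regimen_1": 100.0, "regimen_2": 200.0}
    print(round(total_cost_per_patient_year(200.0, shares, prices), 2))
```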
Exploratory analyses revealed that the size of the treatment program (as measured by patient volume) was strongly related to per-patient costs, with larger sites exhibiting substantially lower costs than smaller sites when controlling for other covariates. As a consequence, the per-patient cost calculated as an average across sites will be larger than the same statistic calculated as an average across patients. For budgeting and resource planning, total funding requirements are naturally calculated by multiplying total patient volume by an average per-patient cost. For this purpose, the average cost across sites gives a biased (over)estimate of total costs, and only the ‘patient-average’ cost (the average across patients, i.e. weighted by patient volume) is appropriate. For this reason, all dollar-valued results were calculated using this patient-average approach. This approach differs from prior analyses, which gave equal weight to each site when calculating summary statistics.
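The difference between the two averages can be illustrated with a few hypothetical sites in which larger sites have lower per-patient costs. Only the patient-weighted average reproduces the true total when multiplied by total patient volume; the site-weighted average overstates it. The site figures and function names are made up for illustration.

```python
def site_average(sites):
    """Unweighted mean of site-level per-patient costs (each site counts once)."""
    return sum(cost for _, cost in sites) / len(sites)


def patient_average(sites):
    """Per-patient cost averaged over patients, i.e. weighted by patient volume."""
    total_patients = sum(n for n, _ in sites)
    return sum(n * cost for n, cost in sites) / total_patients


if __name__ == "__main__":
    # Hypothetical sites: (patients, cost per patient-year in USD).
    sites = [(100, 500.0), (1_000, 200.0), (2_000, 150.0)]
    patients = sum(n for n, _ in sites)
    true_total = sum(n * c for n, c in sites)
    print(true_total, site_average(sites) * patients, patient_average(sites) * patients)
```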
It was hypothesized that the effect of the distal determinants (location, health system level, type of administration) on per-patient costs would be mediated, in whole or in part, by their influence on the proximal determinants. For this reason three different model specifications were used to investigate the influence of the distal determinants. First, a parsimonious model was fit including only the distal determinants. A second model was then fit including these variables as well as variables relating to site maturity and patient volume. Finally, a full model was fit including the distal determinants as well as all proximal determinants. All regression models were implemented using the GLMM framework described above.
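The three nested specifications can be written down as covariate lists. The variable names below are hypothetical shorthand for the determinants named in the text. Patient-type fixed effects are assumed to appear in every specification, consistent with the GLMM described above. The helper only confirms that each model strictly extends the previous one.

```python
PATIENT_TYPE = ["patient_type"]  # fixed effects for the five patient types
DISTAL = ["location", "health_system_level", "administration_type"]
MATURITY_VOLUME = ["site_maturity", "patient_volume"]
OTHER_PROXIMAL = [
    "clinic_visit_frequency",
    "cd4_test_frequency",
    "service_comprehensiveness",
    "staffing_structure",
    "pct_management_spending",
    "log_gdp_per_capita",
]

SPECIFICATIONS = {
    "parsimonious": PATIENT_TYPE + DISTAL,
    "maturity_volume": PATIENT_TYPE + DISTAL + MATURITY_VOLUME,
    "full": PATIENT_TYPE + DISTAL + MATURITY_VOLUME + OTHER_PROXIMAL,
}


def is_nested(smaller, larger):
    """True if `larger` contains every covariate in `smaller` plus at least one more."""
    return set(smaller) < set(larger)


if __name__ == "__main__":
    for name, covariates in SPECIFICATIONS.items():
        print(name, len(covariates))
```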