To support decision making, many countries have introduced formal assessment processes to evaluate whether health technologies represent good ‘value for money’. These assessments often take the form of decision models, which can be used to explore elements of importance to the generalisability of study results across clinical settings and jurisdictions. The objectives of the present review were to assess: (i) whether the published studies clearly defined the decision-making audience for the model; (ii) the transparency of the reporting in terms of study question, structure and data inputs; (iii) the relevance of the data inputs used in the model to the stated decision-maker or jurisdiction; and (iv) how fully the robustness of the model's results to variation in data inputs between locations was assessed.
Articles reporting decision-analytic models in the area of osteoporosis were assessed to establish the extent to which the information provided enabled decision makers in different countries/jurisdictions to fully appreciate the variability of results according to location, and the relevance to their own.
Of the 18 articles included in the review, only three explicitly stated the decision-making audience. It was not possible to infer a decision-making audience in eight studies. Target population was well reported, as were resource and cost data, and the clinical data used for estimates of relative risk reduction. However, baseline risk was rarely adapted to the relevant jurisdiction, and when no decision-maker was explicit it was difficult to assess whether the reported cost and resource use data were in fact relevant. A few studies used sensitivity analysis to explore elements of generalisability, such as compliance rates and baseline fracture risk rates, although such analyses were generally restricted to evaluating parameter uncertainty.
This review found that variability in cost-effectiveness across locations is addressed to a varying extent in modelling studies in the field of osteoporosis, limiting their use for decision-makers across different locations. Transparency of reporting is expected to increase as methodology develops, and decision-makers publish “reference case” type guidance.
Healthcare systems are increasingly under financial pressure to optimise the use and allocation of available resources. To support decision making, many countries have introduced formal assessment processes to evaluate whether health technologies represent good ‘value for money’. Decision analytic modelling is widely used in health technology assessment to evaluate the effectiveness and cost-effectiveness of alternative options under conditions of uncertainty [2, 3]. This form of modelling is necessary in situations where a single primary source of data (e.g. a randomised trial) does not wholly satisfy the data needs of a decision problem, and additional data sources and assumptions are needed. These situations include synthesising information when a number of estimates of a particular parameter exist, extending the results of a short-term trial over a long-term time horizon, increasing the range of alternative treatment strategies being compared and adapting the results of a study undertaken in one location to be relevant to another jurisdiction. It is the latter situation with which this paper is concerned.
The use of decision models to explore issues of variability and generalisability in cost-effectiveness between locations is of interest for two reasons. Firstly, models are often used to make the results of a particular patient-level study, such as a randomised trial, undertaken in one location relevant to one or more alternative locations. Secondly, geographical variability cannot generally be dealt with fully in published modelling studies, since the general purpose of models is to identify optimal solutions to specific decision problems faced by particular decision makers or jurisdictions. For example, the National Institute for Health and Clinical Excellence's (NICE) technology appraisal process often includes decision models submitted by manufacturers and/or developed by the academic Technology Assessment Team (see www.nice.org.uk). In each case, the decision maker (the NICE Appraisal Committee) and the jurisdiction (the NHS) are explicit. An important feature of economic models is that they relate to the policy maker(s) and jurisdiction(s) whose decision the model is designed to inform. In published journal articles these are not always explicitly stated, in spite of such information being paramount to an assessment of the generalisability of study results to other decision makers. Therefore, it is of value to review published modelling studies to assess how clearly the authors have identified the decision-making audience of their work, the extent to which the data incorporated into these models are the most appropriate for that decision maker, and the degree to which they have assessed the importance of any variability between locations within a particular jurisdiction.
In recent years, significant contributions in the area of generalisability have been published, much of this work in the statistical area of cost-effectiveness analysis [5-13]. There has been little work in the area of conceptual analysis, and this article attempts to fill this gap. Rather than select a sample of model-based studies across a range of diseases and interventions, the particular clinical area of osteoporosis has been chosen for the review. This clinical area was selected for two reasons. Firstly, a large number of cost-effectiveness models are available, since the majority of cost-effectiveness studies in osteoporosis are model-based. In a review of economic evaluations in this field undertaken in 1998, a total of 16 studies were identified, all of which were based on decision analytic models. Secondly, there is likely to be pronounced variability between countries and other locations in many of the parameters going into these models. This variability potentially relates to baseline event rates, resource use, cost and utility.
Osteoporosis is characterised by low bone mineral density (BMD) and deterioration of bone tissue, leading to enhanced bone fragility and a consequent increase in fracture risk. The disease is manifested in terms of a high occurrence of hip, wrist and vertebral fractures, and is most prominent in post-menopausal women. Development of fractures is a complex function of osteoporosis, age and other risk factors which evolve over time. The majority of trials in this area have evaluated the impact of osteoporosis treatments on the intermediate endpoint of BMD. Historically, there has been a scarcity of literature evaluating final endpoints. The need to estimate links between intermediate endpoints and ultimate measures of health gain, together with the requirement to synthesise results from several studies and extrapolate over a long-term time horizon, are the main reasons for the preponderance of decision models in osteoporosis.
The aim of the review was to assess how published model-based economic evaluations in the field of osteoporosis deal with variability in results by location and decision-maker.
The review set out to address the following specific objectives: (i) whether the published studies clearly defined the decision-making audience for the model; (ii) the transparency of the reporting in terms of study question, structure and data inputs; (iii) the relevance of the data inputs used in the model to the stated decision-maker or jurisdiction; and (iv) how fully the robustness of the model's results to variation in data inputs between locations was assessed.
The inclusion criteria were full economic evaluation models evaluating therapeutic interventions in osteoporosis. Only studies reporting a summary measure of cost-effectiveness (e.g. cost-effectiveness ratio) were included in the review, as these studies combine an estimate of both costs and effectiveness and present an overall assessment of the value for money of the alternative interventions of interest to decision-makers. Since this was a review where methodology rather than results was of primary interest, only studies published in English were included. Economic evaluations which did not describe the structure and assumptions underlying a model were excluded from the assessment, as were simple cost analyses, secondary reviews of economic evaluation models and studies that did not present a summary measure of cost-effectiveness.
The identification of articles for this review was undertaken in 2001, as part of an NHS HTA R&D funded project on generalisability which commenced at that time. A search strategy was devised to retrieve published papers reporting economic evaluation models of interventions for osteoporosis. The searches were conducted using the following bibliographic databases: MEDLINE, EMBASE, EconLit, Health Economic Evaluation Database (HEED), the internal catalogue of the NHS CRD/CHE information service, the Health Technology Assessment (HTA) Database and the administration version of the NHS Economic Evaluation Database (NHS EED) held at the NHS CRD. The databases were searched from 1980 until 2001, or from the earliest publication in the relevant database after that date, and searches were restricted to English language documents. All the references from the database searches were imported into an Endnote library and de-duplicated. The search strategies are detailed elsewhere. Searches were extended to bibliographies of retrieved articles, and reference lists of key review articles in the area were scrutinised. Papers that appeared relevant to the review were retrieved and assessed according to the inclusion criteria.
A data extraction tool was developed specifically for the purpose of this review. For included studies, information relevant to the review was extracted into a data extraction form by one of the authors (HU). The information was summarised in data tables, which provided the basis for assessment of the studies. Key characteristics of the models presented in each publication were recorded, as was information on interventions studied and results of the evaluations. The four numbered sections of the data extraction form addressed the four research questions of the review, and further details are provided below.
Being aware of the target decision-making audience for a model is important to a judgement about the appropriateness of the model and its inputs, and the review attempted to elicit the target decision-making audience or jurisdiction. In some instances, where models did not explicitly state the target decision-maker, it was possible to infer a decision-maker from the perspective taken and the data incorporated. In addition, the studies were assessed according to whether the study question was clearly stated or could be easily inferred.
Transparent reporting of a model is a prerequisite to understanding the relevance of the model to the target decision-maker, as well as to assessing its generalisability to other decision-makers and jurisdictions. The specification of study setting (e.g. country, and primary or secondary care) and patient population was therefore extracted. In addition, the description and justification of alternative interventions was considered, and an assessment of transparent reporting of the model structure and key assumptions was made.
The ease with which model inputs can be traced, and the relevance of those inputs to the stated decision-maker, will influence the degree to which a model is considered applicable in the target setting. The reporting of sources and the relevance of key data inputs to the model was therefore assessed, ranging from clinical data and their valuation to resource use and unit costs. Models that reported and referenced both baseline risks and risk reductions were considered as having provided sources of clinical data, whereas models that only reported references for risk reduction were assessed as having partially provided sources of clinical data.
The use of sensitivity analysis to explore the robustness of model results to variation in data inputs that may exist within and between jurisdictions was assessed. It has been extensively argued that resource use estimates and their valuation, as well as health state estimates and their valuation, may vary across settings. This variation may exist between units within a given decision maker's jurisdiction and, to provide information to the decision maker, the implications of such variation should be assessed. The authors may also choose to assess the robustness of their model's results to the level of variation that might be expected between jurisdictions. For example, the average length of stay in hospital for patients with hip fracture has been reported to be 29.6 days in Aberdeen and 41.7 days in Peterborough, whereas the national average in Denmark has been reported to be 21 days.
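To make this concrete, the length-of-stay figures quoted above can be pushed through a simple one-way sensitivity analysis. The sketch below is illustrative only: the cost per bed day is an assumed figure, not one taken from any of the reviewed studies.

```python
# One-way sensitivity analysis of hospital cost per hip-fracture episode
# to length of stay, using the lengths of stay cited in the text.
# COST_PER_BED_DAY is a purely illustrative assumption.

COST_PER_BED_DAY = 300.0  # assumed unit cost per inpatient day (local currency)

lengths_of_stay_days = {
    "Aberdeen": 29.6,
    "Peterborough": 41.7,
    "Denmark (national average)": 21.0,
}

for location, days in lengths_of_stay_days.items():
    episode_cost = days * COST_PER_BED_DAY
    print(f"{location}: {days:.1f} days -> {episode_cost:.0f} per episode")
```

Even with an identical unit cost per bed day, the episode cost varies roughly twofold across these locations, which is the kind of within- and between-jurisdiction variation a sensitivity analysis could expose.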
Similarly, the estimate of clinical effect incorporated into a model may depend on the study population and target population of the model. The robustness of the model's results under a range of clinical effectiveness estimates from different studies could, therefore, be explored in sensitivity analysis. Models may also undertake adjustments to translate the results of explanatory trials, whose findings may not hold in routine clinical practice. For example, compliance is generally acknowledged to be higher within the context of randomised trials than in clinical practice, and this may contribute substantially to the reduction in efficacy when an intervention is used in a non-trial environment [20, 21]. Reduced compliance in a clinical practice setting may result in reduced effectiveness of the drug, so models that evaluate the population-based impact of a strategy in clinical practice may provide a more representative estimate by factoring the reduced compliance into the analysis.
Finally, the articles were reviewed with respect to whether the authors commented on the relevance of their results for addressing the same decision-problem within other jurisdictions.
A total of 18 publications reporting economic evaluation models satisfied the inclusion criteria (Table 1). These included four Markov state transition models [17, 23-25] and four simple decision trees [26-29]. Nine studies were cost-utility analyses. Of the studies published in the 1980s, six were from the United States. Eight studies were from European countries, all of which were published in the 1990s. The studies covered the following countries: Australia, Canada, Denmark, UK, US, and Italy. Apart from the Italian study, all results were presented in local currencies. Many models based the evaluation on a lifetime horizon from the onset of treatment at menopause, usually assumed to occur at 45 or 50 years of age [17, 19, 24, 25, 30-35]. Others evaluated a more limited time horizon, for example 2 years in Francis et al., 3 years in Rosner et al., 3-4 years in Torgerson et al. and 1 year in Visentin et al.
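As background to the model types tallied above, a Markov state transition model of the kind used in these studies can be sketched in a few lines. Everything in the sketch below - the states, transition probabilities, costs, utilities and discount rate - is a hypothetical illustration, not data from any reviewed study.

```python
import numpy as np

# Minimal Markov cohort model for an osteoporosis-like decision problem.
# All parameter values are illustrative assumptions.

states = ["well", "hip_fracture", "post_fracture", "dead"]

# Annual transition probability matrix (each row sums to 1).
P = np.array([
    [0.97, 0.01, 0.00, 0.02],  # well
    [0.00, 0.00, 0.90, 0.10],  # hip fracture (acute year)
    [0.00, 0.02, 0.93, 0.05],  # post-fracture
    [0.00, 0.00, 0.00, 1.00],  # dead (absorbing)
])

cost = np.array([0.0, 12000.0, 500.0, 0.0])   # assumed annual cost per state
utility = np.array([0.85, 0.55, 0.70, 0.0])   # assumed annual utility per state
discount = 0.035                              # assumed annual discount rate

cohort = np.array([1.0, 0.0, 0.0, 0.0])       # whole cohort starts "well"
total_cost = total_qalys = 0.0
for year in range(40):                        # ~lifetime horizon from menopause
    df = 1.0 / (1.0 + discount) ** year       # discount factor for this cycle
    total_cost += df * (cohort @ cost)
    total_qalys += df * (cohort @ utility)
    cohort = cohort @ P                       # advance the cohort one cycle

print(f"Discounted cost: {total_cost:.0f}, discounted QALYs: {total_qalys:.2f}")
```

Running the same structure with jurisdiction-specific fracture risks, costs and utilities is precisely what adapting such a model between locations would involve.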
Overall, the main base-case results of the studies did not reveal any systematic differences across the studies that might be explained by location (Table 1). Neither systematic variation in cost-effectiveness estimates within countries nor systematic changes over time were apparent from the review. Despite focusing on the osteoporosis area, the models compared a range of interventions and presented results using a variety of outcomes. For example, Ankjaer-Jensen et al. reported average cost per hip fracture avoided in screened and unscreened populations, comparing three different interventions. In contrast, Tosteson et al. compared costs and QALYs of two different interventions in patient populations with different life expectancy.
None of the studies evaluated cost-effectiveness from a broader perspective than that of the health service sector. The target decision-making audience was only explicitly stated in three (16%) of the included studies (Table 2). Specifically, Coyle et al. stated the decision maker to be “a Canadian provincial Ministry of Health”, OTA stated that the report was commissioned by the US Senate Special Committee on Ageing, and Visentin et al. commented that the evaluation was targeting the Italian Health Service.
It was, however, possible to infer a target decision-maker for many of the remaining studies. For example, Ankjaer-Jensen et al. commented that the analysis was “carried out in a Danish context”; and the study by Cheung et al. seemed to target a decision-making body in New South Wales, Australia. Similarly, Daly et al. and Torgerson et al. appeared to tailor their analyses to a British context without explicitly stating this [27, 31]. It was not possible to infer a decision-making audience in eight studies.
The research question was explicitly stated in seven studies (39%) (Table 2). For example, Coyle et al. expressed the economic study question as “…to assess the cost-effectiveness of nasal calcitonin compared with no therapy, alendronate or etidronate in the treatment of post-menopausal women with previous osteoporotic fracture” . For those studies that explicitly stated the strategies under comparison, and the outcome, at the outset, it was possible to infer the study question. For example, Torgerson et al. did not explicitly state a research question but compared the (average) cost-effectiveness of screening followed by HRT treatment versus no screening and universal treatment .
The vast majority of the studies specified that the target population was women living in the community, though one study specifically considered women in nursing homes, and the target country could be inferred for all studies. The target populations were also indicated for all models. Gender, post-menopausal status or age above 50 were used in most studies to identify the study population. Susceptibility to osteoporosis, either via previous fracture or low bone mass density, was also used as a descriptive factor. Some models evaluated hysterectomised women separately from those with uterus in situ [23, 27, 31, 32, 35]. Two studies restricted the analysis to Caucasian women only [23, 26] due to an underlying difference in baseline hip fracture risk between ethnic groups.
The model structure was described in most of the studies through an outline of all clinical outcomes incorporated into the structure or, where relevant, health states and transitions. One study did not present the structure adequately: it was, for example, unclear how this model estimated hip fracture risk reduction as a consequence of treatment.
The main assumptions were clearly stated and justified in most studies. For example, Coyle et al., OTA and Tosteson et al. clearly presented all assumptions in their models as well as omissions from them. The study by Visentin et al. was not entirely transparent - for example, it was not clear which trials were used to populate the model.
A common feature of the models in this review was the use of epidemiological studies to estimate the relative hip fracture risk reduction in the treated populations. Three models based the hip fracture risk reduction estimates on individual clinical trial data [26, 28, 36], whereas two models based the effect estimates for one of the therapies on meta-analyses of several trials [19, 37]. Of these studies, only two [26, 37] provided information about the clinical characteristics of the population of the trials.
The remaining studies based the effect estimate primarily on observational studies. One of these studies  provided details of patient characteristics in the studies on which the effectiveness estimate was based, and two further studies presented patient characteristics only in terms of age range [24, 36].
Since the majority of studies provided only limited patient information, the scope for assessing the applicability of the results to the target population - and indeed other populations - was limited. Only two studies failed to provide any basis for the assumption of clinical effect [27, 30]. Although limited information was provided on the population samples, the majority of the studies gave references to primary studies that are likely to have provided a more comprehensive description of the relevant patient sample. Only one study adjusted the risks measured in the trials to the target population for the modelling exercise. Nevertheless, most studies appeared to use the best available data relevant to the stated or inferred decision-maker (Table 3).
Of the nine cost-utility studies included in the review, only two used utilities derived from patients [32, 37]. Rosner et al. used a Canadian Delphi panel and the Health Utilities Index as a basis for the utilities used in their model. All the other authors based their utility weights, either implicitly or explicitly, on those assumed by Weinstein. With the exception of the three studies mentioned above, which used sample-based utility weights, it was difficult to assess the relevance of the utilities assumed by Weinstein to particular health care decision-makers or jurisdictions (Table 3).
The sources of resource use were explicit for the majority of the studies included in the review; only four studies omitted this information. For example, resource use estimates and assumptions were explicit in the study reported by Ankjaer-Jensen et al., and Rosner et al. used a Delphi panel to estimate resource use. Some studies included only drug costs in the estimate of resource use [26-29]. Sixteen studies in the review reported most sources of unit costs incorporated in the analyses, whereas the remaining two studies [27, 29] did not report any sources for unit cost data.
It was difficult, if not impossible, to judge whether the estimates of resource use and unit costs were relevant to the decision-maker for those studies that did not explicitly state the decision-making audience (Table 3). For those studies for which a decision-maker was explicit or could otherwise be inferred, the sources of resource use were largely judged to be relevant. The target decision-maker was not clear in the studies by Tosteson et al. [24, 25] and Weinstein; however, the resource use would have been relevant provided that the target decision-maker was Medicare. It was not clear whether resource use was relevant to the decision-maker in four studies [26, 27, 29, 33].
The studies were also reviewed to assess whether they had considered the implications of variation in input parameters using sensitivity analysis. A total of ten studies (55%) explored alternative assumptions of effect estimate in sensitivity analyses. Ankjaer-Jensen et al. used “best case” and “worst case” scenarios in their analysis of effectiveness, and Rosner et al. explored the 95% confidence interval boundaries for vertebral fracture rates in the model. Similarly, Cheung et al. varied the risk of death from myocardial infarction over a range, and the OTA varied the risk of all clinical parameters (bone loss, cancer and heart attack) in its model. Daly et al. [31, 32] varied both the magnitude and the duration of effect estimates in their sensitivity analysis, whereas Geelhoed et al. explored the impact of HRT on different body systems (e.g. breast cancer) in the sensitivity analysis. The study by Coyle et al. based its effect estimate on a meta-analysis of several trials, and found that the cost-effectiveness estimate was highly sensitive to the inclusion of one particular study.
These sensitivity analyses were largely undertaken to explore parameter uncertainty (e.g. due to sampling uncertainty) rather than explicitly to consider possible variability in clinical effects within or between jurisdictions. In part, this comment also applies to the two models which estimated hip fracture rates from BMD [24, 38]. However, varying the population baseline hip fracture risk in these studies (for example, baseline hip fracture risk was increased by 100% and decreased by 50% in the Tosteson et al. study) would have been of interest to decision makers, as adjustment of baseline risk is often used to adapt the results of models between geographical areas (for an example, see the adaptation of the WOSCOPS study results to Belgium by Caro et al. and the work by Palmer et al.).
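The baseline-risk adjustment referred to here - applying a relative treatment effect that is assumed to be geographically exchangeable to a jurisdiction-specific baseline risk - can be sketched as follows. The relative risk and the per-jurisdiction baseline risks are hypothetical numbers for illustration.

```python
# Adapting absolute treatment benefit between jurisdictions by re-basing the
# baseline risk while holding the relative treatment effect constant.
# All numbers are hypothetical.

relative_risk = 0.7  # assumed treatment effect, taken as exchangeable across locations

baseline_risks = {  # assumed 10-year hip fracture risk by jurisdiction
    "jurisdiction_A": 0.10,
    "jurisdiction_B": 0.05,
}

for name, p0 in baseline_risks.items():
    p1 = p0 * relative_risk                  # risk on treatment
    absolute_risk_reduction = p0 - p1
    nnt = 1.0 / absolute_risk_reduction      # number needed to treat
    print(f"{name}: ARR={absolute_risk_reduction:.3f}, NNT={nnt:.0f}")
```

With the same relative risk, the jurisdiction with half the baseline risk obtains half the absolute benefit (and roughly double the number needed to treat), which is why cost-effectiveness results can differ sharply between locations even when the trial evidence is shared.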
Assumptions of compliance and duration of treatment were made without adjustment to clinical practice circumstances in the majority of studies in the review. For example, the OTA  assumed 100% compliance over 10, 20, 30 and 40 years; and Daly et al. assumed 100% compliance over 5, 10, 15 and 20 years. The eight studies which took this into consideration differed in their definition of lack of compliance, but often it meant simply that patients ‘declined to accept’ therapy  or that patients ‘accepted but discontinued’ therapy  (Table 5). Generally, the cost-effectiveness estimates were found to be sensitive to the assumption of compliance, but the recommended policy-decision of the studies remained unchanged (Table 5). For example, Tosteson et al.  assumed 100% compliance over 15 years in the base case model, but varied compliance to 30% in the sensitivity analysis and found that cost-effectiveness estimates were sensitive to assumption of compliance.
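One simple way a model can factor compliance into effectiveness, consistent with the base-case versus sensitivity-analysis contrast described above, is a linear dilution of the trial relative risk reduction. The functional form and all numbers below are assumptions for illustration, not taken from the reviewed studies.

```python
# Illustrative compliance adjustment: dilute the trial relative risk
# reduction by the fraction of patients complying, assuming non-compliers
# receive no benefit. This linear form is an assumption for illustration.

def effective_rrr(trial_rrr: float, compliance: float) -> float:
    """Effective relative risk reduction in routine practice."""
    return trial_rrr * compliance

trial_rrr = 0.30  # assumed 30% relative risk reduction observed in the trial

for compliance in (1.0, 0.3):  # 100% base case vs 30% in sensitivity analysis
    rrr = effective_rrr(trial_rrr, compliance)
    print(f"compliance={compliance:.0%} -> effective RRR={rrr:.0%}")
```

Under these assumptions, dropping compliance from 100% to 30% cuts the effective relative risk reduction from 30% to 9%, illustrating why cost-effectiveness estimates in the reviewed studies were often sensitive to the compliance assumption.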
One study explored the sensitivity of results to health state valuation. In an alternative analysis, a 0.1 higher utility weight was assigned to the post-fracture health state for women receiving nasal calcitonin. The ICER estimates increased as a result of the alternative utilities. Again, however, this sensitivity analysis was associated more with parameter uncertainty than geographical variation.
Only five studies contrasted their findings with other economic evaluation studies in the area (Table 4). For example, the report by the OTA  provided a comprehensive discussion of methodology, costs, clinical assumptions and results in relation to other cost-effectiveness analyses. In principle, this would have allowed decision makers to assess whether other studies in the field had incorporated more appropriate data inputs for their jurisdiction.
Most studies applied unit cost data to the analysis relevant to the country for which the study was targeted (Table 8). Three studies used regional cost estimates [23, 30, 36]. None of the models accommodated differences in treatment patterns within regions or across countries. Furthermore, none of the investigators attempted to use cost estimates applicable to a broader audience of decision-makers within or between countries by using a range of costs or treatment patterns representing geographical differences. The sensitivity of the study results to national variation in cost per fracture was only explored in the study reported by Coyle et al.  who found that their cost-effectiveness estimate was markedly reduced when using cost calculations from an alternative cost-of-illness study.
Overall, four studies explicitly commented on the issue of the generalisability of their analysis to other settings in their presentation or discussion of results (Table 6) [19, 27, 35, 37]. For example, Coyle et al. commented that the results were sensitive to the baseline population hip fracture risk, and for that reason the generalisability of the results was unclear.
This paper has reviewed the use of decision analytic cost-effectiveness models, in a specific clinical area, to assess a range of factors associated with the potential relevance of an analysis to a particular decision maker, and the extent to which its results might be transferable to other jurisdictions. More general issues of good methods in decision analytic cost-effectiveness modelling have been considered elsewhere [41-43], and a general critical review of cost-effectiveness models in osteoporosis has also previously been published .
We found that only 3 (16%) of the studies in this review provided details of jurisdiction and target decision-maker explicitly, although it was possible to infer the decision context from other information provided in some studies. Similarly, authors frequently reported the methods and results of studies for which a firm research question had not been stated. To aid decision making, models also need to be clear about the decision problem(s) being addressed. The majority of reports in the review defined the study setting, patient population, model structure and key assumptions in a transparent manner.
Once the target decision-maker/jurisdiction and decision problem have been established, the former will need to decide whether the data inputs and assumptions in the model are the best available for their context. The majority of the reports provided sources for clinical and economic data, and their valuation. The data inputs to those models for which a decision-maker was stated or could be inferred appeared to be relevant and, as far as could be judged from only the published article, these were the ‘best available’ for the decision context. There is often an implicit assumption in models that estimates of clinical effectiveness are transferable between locations in a way that resource use and cost data are not. Perhaps reflecting this assumption, the papers in the review generally made more effort to ensure (and to be seen to ensure) that their cost inputs (at least unit costs) were specific to their target jurisdiction. Most studies were prepared to use clinical data from studies undertaken outside the context of the stated decision-maker. As in other clinical areas, an exception to this assumption of the transferability of clinical data is the adjustment of baseline risks to make them specific to a particular jurisdiction (usually country) whilst assuming that the relative treatment effect is exchangeable geographically. This adjustment of baseline risks was rarely undertaken in the sample of papers reviewed, although the effect of variation in these parameters for the generalisability of analyses was discussed in another paper.
Little attempt was made to justify the particular utilities used with respect to the target decision-maker. This may reflect the limited amount of utility data available relating to osteoporosis. In other words, the authors often used any utility data which were available - in the majority of studies this was the assumption (rather than empirical elicitation) made by Weinstein. Increasingly, decision makers will be specific about the type of health state utility data they wish to see in economic evaluations. For example, NICE has indicated that it wishes to see the use of public preferences relating to the British population in cost-utility analyses submitted to its technology assessment programme. There are two aspects to this decision-maker preference. The first is the position a given decision-maker takes on the most appropriate utility estimation methods (e.g. use of patient or public preferences). The second is the issue of whether the preferences of individuals outside the particular jurisdiction of interest are acceptable.
One aspect of the review was to assess the extent to which studies had assessed the impact of variability in parameter estimates associated with location using sensitivity analysis. In principle, this sort of analysis might be undertaken for two reasons. Firstly, there may be variability in clinical and/or cost parameters within a given jurisdiction. If there was good reason to think that this level of variability might impact on the conclusions of the analysis, sensitivity analysis would be imperative. The second way sensitivity analysis could be used is to assess whether the results of the model, as they apply to the target jurisdiction, would also apply to other locations by appropriate variation in parameter values. If it is accepted that the purpose of decision models is to address decision problems for particular jurisdiction/decision makers, then this process of generalisation should probably not be seen as an essential element of model-based economic analysis.
Although most of the studies in the review undertook extensive sensitivity analysis, few explicitly did this to explore variation between locations/jurisdictions. Only one of the models is probabilistic; that is, reflecting the uncertainty in parameters as random distributions and propagating that uncertainty through the model, to be jointly reflected in the results, using Monte Carlo simulation. Therefore, most of the standard one-way or two-way sensitivity analysis was undertaken to assess the importance of parameter uncertainty to the results, rather than variability between locations. The failure to assess variability within and between locations may have reflected the authors' view that parameter uncertainty dwarfed variability within the target jurisdiction, and that generalising their results across locations was not a primary concern.
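A minimal probabilistic sensitivity analysis of the kind described here draws parameters from distributions and propagates them through the model by Monte Carlo simulation. The sketch below uses a deliberately trivial stand-in model (incremental cost and QALY gain as direct parameters) and hypothetical distributions, purely to show the mechanics.

```python
import numpy as np

# Sketch of a probabilistic sensitivity analysis: parameters are drawn from
# distributions and uncertainty is propagated by Monte Carlo simulation.
# The "model" and all distributions here are illustrative assumptions.

rng = np.random.default_rng(42)
n = 10_000  # number of Monte Carlo draws

# Hypothetical parameter distributions
incr_cost = rng.normal(loc=2000.0, scale=400.0, size=n)   # incremental cost
incr_qalys = rng.gamma(shape=4.0, scale=0.05, size=n)     # incremental QALYs (mean 0.2)

threshold = 20_000.0  # assumed willingness-to-pay per QALY
net_benefit = incr_qalys * threshold - incr_cost
prob_cost_effective = (net_benefit > 0).mean()

print(f"Mean ICER: {incr_cost.mean() / incr_qalys.mean():.0f} per QALY")
print(f"P(cost-effective at {threshold:.0f}/QALY): {prob_cost_effective:.2f}")
```

The joint distribution of costs and effects, rather than a single point estimate, is what allows a decision maker to see the probability that an intervention is cost-effective at a given threshold.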
One area of interest in the review was how the models dealt with patients' compliance with therapy. Compliance is expected to vary between research (e.g. trial) settings and routine practice, and it is therefore important when assessing generalisability to clinical practice. Incorporating compliance rates into economic analyses, for example by adjusting clinical benefit parameters or cost inputs, has previously been demonstrated to affect the results of such analyses (Ref 14). Different definitions of compliance were used by the authors in this review but, for the most part, the assumption of compliance was based on the proportion of patients who initiate therapy. The overall results of the analyses were sometimes substantially influenced by the assumed level of patient compliance, and more research is needed to determine the optimum manner of generalising from study to clinical practice.
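One simple way compliance assumptions can shift results is sketched below: benefit accrues only to the compliant fraction of patients, while some costs (here, a monitoring cost) are incurred by everyone who initiates therapy. This is one of several possible adjustment approaches, and every parameter value is hypothetical, chosen only to show the direction of the effect.

```python
def icer_with_compliance(drug_cost_per_year, monitoring_cost, years,
                         risk_reduction, fracture_cost, qaly_loss,
                         baseline_risk, compliance):
    # Benefit (and drug cost) scale with the compliant fraction;
    # monitoring costs apply to all patients who initiate therapy.
    fractures_avoided = baseline_risk * risk_reduction * compliance
    delta_cost = (compliance * drug_cost_per_year * years
                  + monitoring_cost
                  - fractures_avoided * fracture_cost)
    delta_qaly = fractures_avoided * qaly_loss
    return delta_cost / delta_qaly

# Same hypothetical model, full compliance vs. 50% compliance:
full = icer_with_compliance(400, 150, 5, 0.4, 12000, 0.2, 0.1, 1.0)
half = icer_with_compliance(400, 150, 5, 0.4, 12000, 0.2, 0.1, 0.5)
print(f"ICER at full compliance: {full:.0f}; at 50% compliance: {half:.0f}")
```

Because the monitoring cost does not scale with compliance while the benefit does, the cost-effectiveness ratio worsens as compliance falls – the kind of sensitivity the reviewed studies only occasionally explored.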
Whilst the development of a set of recommendations for authors would be valuable, it was beyond the scope of this article to provide specific guidance over and above that provided by others. However, this review has highlighted important aspects of reporting the results of modelling studies that would allow decision-makers to determine the generalisability, and relevance, of the results to their own location. For example, resource use data and costs are key determinants of relevance. Authors could, in their reporting, distinguish those factors that are specific to the stated decision problem (and decision-making audience) from those that are generalisable to other locations.
This review was limited to articles published in the English language, and to articles published up to and including 2001. One may assume that single-country assessments published in a local language have a higher degree of generalisability, or relevance, to the local environment, and as such this review may have captured a biased sample of studies. Given that the purpose of this review was to provide a conceptual analysis of an under-investigated issue in the HTA literature – the extent to which the information produced in a cost-effectiveness analysis report allows jurisdiction-specific decision makers to establish the validity of the study results in their own setting – we do not believe that the exclusion of more recent studies reduces the importance of the findings. However, the review was undertaken before agencies such as NICE published their “Reference Case” for conducting and reporting economic models, and one would expect the standard of reporting to improve following such influential guidance. The review was also primarily limited to published journal articles, where limited word length naturally restricts the amount of information available to the reader.
The review found that most studies either stated their target decision-maker/jurisdiction or provided sufficient information from which this could be inferred. There was a greater tendency to ensure that cost (resource use and unit cost) inputs, rather than clinical parameters, were specific to the target jurisdiction. Although extensive sensitivity analysis was undertaken in most studies to assess parameter uncertainty, there was little use of these methods to explore the implications of variability in parameter inputs within or between locations. Only four studies explicitly commented on the generalisability of their analysis to other settings in their presentation or discussion of results. In spite of some limitations, this review found that variability in cost-effectiveness across locations is addressed to a varying extent in modelling studies in the field of osteoporosis, limiting their use for decision-makers. Further guidance is needed for those publishing the results of economic models, to support the reporting of the design, conduct and results of models; however, transparency of reporting is expected to increase as methodology develops and decision-makers publish “reference case” type guidance.
This work was developed as part of a project on generalisability in economic evaluation studies in health care (98/22/05) funded by the NHS Health Technology Assessment Programme.
Financial disclosure: HU is currently an employee of the pharmaceutical company Merck Sharp and Dohme, which has no financial interest in the outcome of this paper. HU was a PhD student at the University of York when most of the work included in this paper was undertaken. AM is currently the recipient of a Wellcome Trust Training Fellowship in Health Services Research. MS is a Public Health Career Scientist funded by the UK NHS R&D Programme.