Prognostic studies include clinical studies of variables predictive of future events as well as epidemiological studies of aetiological risk factors. As multiple similar studies accumulate it becomes increasingly important to identify and evaluate all of the relevant studies to develop a more reliable overall assessment. For prognostic studies this is not straightforward.
Box 1 summarises the clinical importance of information on prognostic factors. Many of the issues discussed are also relevant to aetiological studies, especially cohort ones. Some features of prognostic studies lead to particular difficulties for the systematic reviewer. Firstly, in most clinical prognostic studies the outcome of primary interest is the time to an event, often death. Meta-analysis of such studies is rather more difficult than that for binary data or continuous measurements. Secondly, the prognostic variable of interest is often one of several prognostic variables. When examining a variable of interest researchers should consider other prognostic variables with which it might be correlated. Thirdly, many prognostic factors are continuous variables, for which researchers use a wide variety of methods of analysis.
The emphasis in this paper is on clinical studies to examine the variation in prognosis in relation to a single putative prognostic variable of interest (also called a prognostic marker or factor). A more detailed discussion can be found elsewhere.2
It is probably more difficult to identify all prognostic studies by searching the literature than it is for randomised trials, which itself is problematic. As yet there is no widely acknowledged optimal strategy for searching the literature for prognostic studies, but search strategies have been developed with either good sensitivity or good specificity (box 2).
Epidemiological studies are more prone to publication bias than randomised trials.4 It is probable that studies showing a strong (often statistically significant) prognostic ability are more likely to be published. Publication bias has recently been shown in studies of Barrett's oesophagus as a risk factor for cancer.5
There are no widely agreed quality criteria for assessing prognostic studies. As yet there is little empirical evidence to support the importance of particular study features affecting the reliability of findings, including the avoidance of bias. As a consequence, systematic reviewers tend either to ignore the issue or to devise their own criteria. Unfortunately the number of different criteria and scales is likely to continue to increase and cause confusion, as has happened for randomised trials and systematic reviews.6–8 Nevertheless, theoretical considerations and common sense point to several methodological aspects that are likely to be important. The table shows a list of those relating to internal validity, which draws on previous suggestions.9–13
A reliable prognostic study requires a well defined cohort of patients at the same stage of their disease. Some authors suggest that the sample should be an “inception” cohort of patients early in the course of the disease (perhaps at diagnosis).9 Whereas homogeneity is often desirable, heterogeneous cohorts can be stratified in the analysis. Also, not all prognostic studies relate to patients with overt disease. An example is a study of prognostic factors in a cohort of asymptomatic patients infected with HIV.14
Both case-control and cross sectional studies may be used to examine risk factors, but these designs are much weaker. Case-control designs have been shown to yield optimistic results for evaluations of diagnostic tests, a result that is likely to be relevant to prognostic studies.15 In cross sectional studies it may be difficult to determine whether the exposure or outcome came first—for example, in studies examining the association between the use of oral contraceptives and HIV infection.
Windeler observed that summaries of prognosis are not meaningful unless associated with a particular strategy for treatment and suggested that the greatest importance of prognostic studies is to aid decisions about treatment.16 Most published checklists do not, however, consider the issue of subsequent treatment. If the treatment received varies in relation to prognostic variables then the study cannot deliver an unbiased and meaningful assessment of prognostic ability unless the different treatments are equally effective (in which case why vary the treatment?). Such variation in treatment may be quite common once there is evidence (usually non-systematic) that a variable is prognostic. Ideally, therefore, prognostic variables should be evaluated either in a cohort of patients treated the same way or in a randomised trial.12,17
The inclusion of context specific as well as generic aspects of methodological quality is sometimes sensible. For example, a review of prognosis of idiopathic membranous nephropathy included two questions on the nature of the end points, reflecting particular problems in a discipline where many studies used ill defined surrogate end points.13
In addition to internal validity some checklists consider aspects of external validity and clinical usefulness of studies. Notably, Laupacis et al included five questions relating to the clinical usefulness of a study.11 Furthermore, some checklists reasonably include items relating to the clinical area of the review. For example, in their review of the association between maternal HIV infection and perinatal outcome, Brocklehurst and French considered whether there was an adequate description of the maternal stage of disease.18
The table includes two items relating to difficult aspects of data analysis. It is important to adjust for other prognostic variables because patients with different values of the covariate of primary interest are likely to differ with respect to other prognostic variables. This procedure is often referred to as control of confounding. In contexts where much is known about prognosis, such as many cancers, it is important to know whether the variable of primary interest (such as a new tumour marker) offers prognostic value over and above that which can be achieved with previously identified prognostic variables. It follows that prognostic studies generally require some sort of multiple regression analysis. Comparison of models with and without the variable of interest provides an estimate of its independent effect and a test of whether it contains additional prognostic information.
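One common way to compare models with and without the variable of interest, when both are fitted by maximum likelihood, is a likelihood ratio test. The sketch below is illustrative only. The paper does not name a specific test, the function name `likelihood_ratio_test` is hypothetical, and the code assumes that the variable of interest adds exactly one parameter to the model, so that the statistic is referred to a chi-squared distribution with one degree of freedom.

```python
import math


def likelihood_ratio_test(loglik_without, loglik_with):
    """Compare two nested models that differ by a single parameter.

    Both arguments are maximised log likelihoods: one from the model with
    only the established prognostic variables, the other from the model
    that also includes the variable of interest (assumed: one extra term).
    Returns (statistic, p_value).
    """
    statistic = 2.0 * (loglik_with - loglik_without)
    # Guard against tiny negative values caused by numerical error.
    statistic = max(statistic, 0.0)
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small P value suggests that the variable carries prognostic information beyond the variables already in the model. It says nothing about whether the set of adjustment variables was itself adequate.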
Two problems for the systematic reviewer are that different researchers use different statistical approaches to adjustment and adjust for different selections of variables. One way around the second of these problems is to use unadjusted analyses. This approach is sensible in systematic reviews of randomised controlled trials, but in prognostic studies it replaces one problem with a worse one. Whereas the least adjusted estimate “provides the maximum opportunity for comparison of consistent estimates across studies,”19 unadjusted analyses will generally be biased.
Many prognostic variables are continuous measurements, including many biochemical and physiological measurements, tumour markers, and levels of environmental exposure. If such a variable were prognostic the risk of an event would usually be expected to increase or decrease systematically as the level increases. Keeping variables continuous can greatly simplify any subsequent meta-analysis, but most researchers prefer to categorise patients into high risk and low risk groups based on some cut-off point. This type of analysis discards potentially important quantitative information and considerably reduces the power to detect a real association with outcome.20,21 If a cut-off point is used it should not be determined by a data dependent process (such as exploring all cut-off points to find the one that minimises the P value).22
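The loss of information from dichotomising can be illustrated with a small simulation. This is a sketch under strong assumptions, none of which come from the paper: the marker is normally distributed, the outcome is a continuous measurement linearly related to the marker (not a survival time), and the cut-off is the sample median, which does not depend on the outcome. The function names, sample size, slope, and seed are arbitrary choices.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_continuous_and_dichotomised(n=20000, slope=0.5, seed=1):
    """Return (r_continuous, r_dichotomised) for one simulated data set."""
    rng = random.Random(seed)
    marker = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Outcome depends linearly on the marker, plus independent noise.
    outcome = [slope * m + rng.gauss(0.0, 1.0) for m in marker]
    # Split into "high" and "low" groups at the sample median.
    cutoff = statistics.median(marker)
    high = [1.0 if m > cutoff else 0.0 for m in marker]
    return pearson(marker, outcome), pearson(high, outcome)
```

Under these assumptions the correlation with outcome for the dichotomised marker is expected to be roughly four fifths of that for the continuous marker, which is the sense in which categorisation reduces power. The sketch does not address the separate bias introduced by choosing the cut-off to minimise the P value.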
The extraction of data is an additional problem. Some authors do not present a numerical summary of the prognostic strength of a variable, such as a hazard ratio, unless the analysis showed that the effect of that variable was significant. Also, when numerical results are given they may vary in format—for example, survival proportions may be given for different time points.
Box 3 summarises the particular difficulties for the systematic reviewer of prognostic studies. Two major concerns are the quality of the primary studies and the possibility of publication bias. Because of the likelihood of serious methodological difficulties, in general it is difficult to carry out a sensible meta-analysis without access to the data of individual patients.2,23 Many authors have concluded that a set of studies was too diverse or too poor (or both) to allow a meaningful meta-analysis. Box 4 summarises a systematic review of prognosis in elbow disorders, which reached such a conclusion. In a systematic review of studies of the possible relation between hormonal contraception and risk of transmission of HIV, Stephenson concluded that a meta-analysis was unwise.24 By contrast, Wang et al performed such a meta-analysis on a similar set of studies, arguing that this enabled the quantitative investigation of the impact of various features of the study.25
Even when a set of published studies is of high quality there are many potential barriers to a successful meta-analysis. In essence it is desirable to compare the outcome for groups with different values of the prognostic variable. In principle it should be relatively easy to combine data from studies that have produced compatible estimates of effect with standard errors. In practice, the lack of comparable information from all studies is likely. In particular, the prognostic variable is likely to have been handled in various ways. In the simplest case, researchers may all have dichotomised but used different cut-off points. A meta-analysis is possible comparing “high” and “low” values, using whatever definition was used in the primary studies. Interpretation is difficult, because patients with the same values would be high in some studies and low in others. (This analysis will be biased if any studies used a cut-off point derived by the minimum P value method.) Studies may use different numbers of categories,26 and some may have categorised whereas others did not. Estimates derived from categorised and ungrouped analyses are not comparable.
As noted, in general it is necessary to allow for other potentially confounding variables in a meta-analysis. When time to an event is not relevant, logistic regression for both binary and continuous prognostic variables is used to derive an odds ratio after adjustment for other prognostic or potentially confounding variables. The adjusted odds ratio and confidence interval can be obtained from an estimated log odds ratio with its standard error. For a binary prognostic variable this odds ratio gives the ratio of the odds of the event in those with and without that feature. For continuous predictors it relates to the increase in odds associated with an increase of one unit in the value of the variable. Estimated log odds ratios from several studies can be combined by using the inverse variance method.27
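The two calculations described above can be sketched as follows. This is a minimal fixed effect illustration, assuming each study supplies an adjusted log odds ratio and its standard error on a comparable scale. The function names and the default multiplier of 1.96 (for an approximate 95% confidence interval) are choices made for this example, and no allowance is made for heterogeneity between studies.

```python
import math


def odds_ratio_ci(log_or, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) from a log odds ratio
    and its standard error, using a normal approximation on the log scale."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


def pool_inverse_variance(estimates):
    """Fixed effect inverse variance pooling.

    `estimates` is a list of (log_odds_ratio, standard_error) pairs.
    Each study is weighted by 1 / se**2. Returns (pooled_log_or, pooled_se).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se
```

The pooled log odds ratio and its standard error can then be passed to `odds_ratio_ci` to give a pooled odds ratio with a confidence interval. The result is only interpretable if the studies handled the prognostic variable in the same way, for example the same cut-off point or the same unit of increase.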
When the time to event is explicitly considered for each individual in a study, the data are analysed with “survival analysis” methods—most often the log rank test for simple comparisons or Cox regression for analyses of multiple predictor variables or where one or more variables is continuous. By analogy with logistic regression discussed above, these analyses yield hazard ratios, which are similar to relative risks. Log rank statistics and log hazard ratios can be combined using the Peto method or the inverse variance method, respectively.27
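The Peto approach to combining log rank statistics can be sketched as follows. The sketch assumes that each study supplies, for the group with the "high" value of the prognostic variable, the log rank observed minus expected number of events (O - E) and its variance V. The function name and the input layout are hypothetical.

```python
import math


def pool_peto(studies):
    """Combine log rank statistics across studies (Peto method).

    `studies` is a list of (observed_minus_expected, variance) pairs, all
    defined for the same group (for example, "high" marker values).
    Returns (pooled_log_hazard_ratio, standard_error).
    """
    total_o_minus_e = sum(o_minus_e for o_minus_e, _ in studies)
    total_variance = sum(variance for _, variance in studies)
    # Pooled log hazard ratio is the sum of (O - E) over the sum of V.
    pooled_log_hr = total_o_minus_e / total_variance
    standard_error = 1.0 / math.sqrt(total_variance)
    return pooled_log_hr, standard_error
```

A positive pooled value corresponds to a hazard ratio above 1, that is, more events than expected in the "high" group. Log hazard ratios from Cox regression, each with a standard error, would instead be weighted by the reciprocal of their variances.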
Practical difficulties are likely to make meta-analysis more difficult than the preceding explanation suggests. Most obviously the hazard ratio is not always explicitly presented for each study. Parmar et al described several methods of deriving estimates of the necessary statistics in a variety of situations.28 For example, an estimate can be derived from the P value of the log rank test. They also explain how to estimate the standard errors of these estimates.
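As an illustration of deriving an estimate from a log rank P value, the sketch below uses one common approximation. It is not necessarily the exact formula given in reference 28. It assumes a two sided P value, two groups of roughly equal size (so that the log rank variance is about a quarter of the total number of events), and that the direction of the effect is known from the paper, because the P value carries no sign. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def log_hr_from_logrank_p(p_two_sided, total_events, direction=1):
    """Approximate a log hazard ratio and its standard error.

    p_two_sided  : two sided P value from the log rank test
    total_events : total number of events in both groups combined
    direction    : +1 if the group of interest fared worse, -1 if better
    Assumes equal group sizes, so the log rank variance V is taken as
    total_events / 4.
    """
    # Standard normal deviate corresponding to the two sided P value.
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    variance = total_events / 4.0
    o_minus_e = direction * z * math.sqrt(variance)
    log_hr = o_minus_e / variance
    standard_error = 1.0 / math.sqrt(variance)
    return log_hr, standard_error
```

P values reported only as inequalities (such as P<0.05) or as "not significant" cannot be converted reliably in this way, which is one reason why extraction from published reports is often incomplete.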
Several authors have proposed more complex methods for combining data from several studies of survival.29,30 All can be applied in this context if it is possible to extract suitable data, but some require even more data than the basic items just discussed. The use of sophisticated statistical techniques may be inappropriate when several more basic weaknesses exist in the data. Indeed some reviewers have had to summarise the findings of the primary studies as P values as it is difficult to extract useful and usable quantitative information from many papers.31
The principles of the systematic review should be extended to studies of prognosis, but doing so is by no means straightforward. The literature on prognosis features studies of poor quality and variable methodology, and the difficulties are exacerbated by inadequate reporting of methodology. The poor quality of the published literature is a strong argument in favour of systematic reviews but also an argument against formal meta-analysis. To this end it is valuable if a systematic review includes details of the methodology of each study and its principal numerical results.32
Although meta-analyses of published information may sometimes be useful, especially when the study characteristics do not vary too much and only the best studies are included, the findings are rarely convincing. The main outcome from such systematic reviews may be the realisation that there is little good quality information in the literature. Even an apparently clear result may best be seen as providing the justification for a well designed prospective study.33
By contrast, meta-analysis based on individual patient data is highly desirable. Among several advantages of such data it is possible to analyse all the data in a consistent manner. It may also be possible to include data from unpublished studies. Meta-analysis of the raw data from all (or almost all) relevant studies is a worthy goal, and there have been some notable examples, especially in an epidemiological setting.34 Apart from the considerable resources needed to carry out such a review, in most cases it is likely that many of the data sets are unobtainable. However, a careful collaborative reanalysis of the raw data from several good studies may be more valuable than a more superficial review that mixes good and poor studies. Two examples of such collaborative meta-analyses of raw data are a study of the relation between alcohol consumption and the development of breast cancer and a study of the relation between a vegetarian diet and mortality.35,36
Poor quality studies may distort the results of a subsequent meta-analysis. When examined critically a high proportion of prognostic studies are found to be methodologically poor.9,37 Prognostic studies are generally too small and too poorly designed and analysed to provide reliable evidence. Although some suggested guidelines have appeared,2,13 progress may depend on developing a consensus regarding main methodological requirements for reliable studies of prognostic factors, as has happened for randomised trials.38,39
As a consequence of the poor quality of research, prognostic markers may remain under investigation for many years after initial studies without any resolution of the uncertainty. Multiple separate and uncoordinated studies may actually delay the process of defining the role of prognostic markers. Systematic reviews can draw attention to the paucity of good quality evidence and, it is hoped, improve the quality of future research.
This is the last in a series of four articles