Objectives: To examine the generalisability of multivariate risk functions from diverse populations in three contexts: ordering risk, magnitude of relative risks, and estimation of absolute risk.
Design: Meta-analysis of prospective cohort studies.
Patients: Participants from various epidemiological studies.
Main outcome measure: Death from coronary heart disease (CHD).
Results: The analysis included 105 420 men and 56 535 women 35–74 years of age and free of CHD at baseline from 16 observational studies with a total of 27 analytical groups. The area under the receiver operating characteristic curve (AUC) was used to judge the ability of the multivariate risk function to order risk correctly. AUCs ranged from 0.60 to 0.80. The AUCs differed significantly between the studies (p < 0.01) but were very similar for different risk functions applied to the same population, indicating similar ability to rank risk for different models. The magnitudes of the relative risks associated with major risk factors (age, systolic blood pressure, serum total cholesterol, smoking, and diabetes) varied significantly across studies (p < 0.05 for homogeneity). The prediction of absolute risk was not very accurate in most of the cases when a model derived from one study was applied to a different study.
Conclusions: When considered qualitatively, the major risk factors are associated with CHD mortality in a diverse set of populations. However, when considered quantitatively, there was significant heterogeneity in all three aspects: ordering risk, magnitude of relative risks, and estimation of absolute risk.
Coronary heart disease (CHD) is a leading cause of death in many countries. Prospective studies around the world have identified major risk factors for developing CHD and, based on these risk factors, functions have been developed to predict the occurrence of CHD in individual patients. Although many researchers have examined whether a risk function based on a single population is valid when applied to other populations,1–12 most have involved a small number of studies and the various aspects of predictive accuracy have not been systematically examined. In this report we examine the predictive accuracy of risk functions using three successively stronger sets of criteria: ordering risk, estimating relative risk, and estimating absolute risk.
The lowest level of validity for a predictive function is its ability to rank individual patients within a population according to their risk, differentiating patients with higher risk from those at lower risk levels. The absolute level of risk is not a concern; only the ordering is important. Several reports have examined the ability of a single risk function to order risk across studies and judged the validity of the risk function by this criterion.1,2
Estimating relative risk is more difficult than ordering. A more stringent criterion would require that the model parameters relating risk factors to disease be the same in different populations. Comparisons in the literature based on this criterion have yielded conflicting results.3–12
In recent years, some treatment guidelines have endorsed a multifactorial strategy that attempts to quantify the contributions of various coronary risk factors along a continuum of risk.13–17 Aggressive preventive treatment and intervention are justified if a patient's absolute risk exceeds a certain cut off point. In this context, it is important that the prediction of individual absolute risk be valid. Studies that have compared the estimation of absolute risk across populations have reported conflicting results.3,10,18–24
We used person-level data from 16 observational studies to test the heterogeneity of the relations between CHD mortality and age and the four major risk factors (that is, blood pressure, serum cholesterol, cigarette smoking, and diabetes status). We applied the three criteria described above and examined whether a common risk function exists that is valid for predicting CHD death in these diverse samples.
The Diverse Populations Collaboration examines epidemiological results in samples from populations from many countries and cultures. The 16 studies reported here comprise national representative samples from the USA, participants from cohort studies in the USA, the Middle East, Europe, Asia, and the Caribbean, and participants from clinical trials.
In this analysis we report results using data from the NHANES I (first national health and nutrition examination survey) epidemiologic follow up study,25 NHANES II (second national health and nutrition examination survey) mortality follow up study,26 the Framingham study,27,28 the Tecumseh community health study,29 the Honolulu heart program,30 the LRC (lipid research clinics) prevalence study,31 the Tuzla cohort of the Yugoslavia cardiovascular disease study,32 the Scottish collaborative study,33 the Renfrew and Paisley study,34 the Glostrup population studies,35 the Norwegian counties study,36 the Reykjavik Iceland study,37 the Israeli ischemic heart disease study,38 the Puerto Rico heart health program,39 the control group from the HDFP (hypertension detection and follow up program),40 and the control group from MRFIT (multiple risk factor intervention trial).41 To be consistent with recent Framingham reports, we pooled data for participants who attended the 11th examination (1971–1974) of the original Framingham heart study27 and from the initial examination (1971–1975) of the Framingham offspring study28 to form a single Framingham cohort.
The underlying cause of death was determined according to the International classification of diseases, ninth revision in most of the studies (codes 410–414 and 429.2 as CHD), except for the Glostrup population studies, HDFP, and the LRC mortality follow up study, in which the International classification of diseases, eighth revision was used (codes 410–414 as CHD). Cause of death was assigned by a panel of physicians in the Framingham cohort and in the Yugoslavia cardiovascular disease study.
Requirements for inclusion in this analysis were as follows: (1) age 35–74 years at the time of the baseline examination; (2) complete data on systolic blood pressure, serum total cholesterol, cigarette smoking, and diabetes status; (3) known vital status and underlying cause of death for decedents; and (4) absence of CHD at baseline. Data for men and women were analysed separately. The studies often contained subgroups based on factors such as area of residence (urban or rural) and other characteristics of the study samples (for example, random sample or hyperlipidaemic patients). Each distinct subgroup was analysed as a separate cohort. For clinical trials, only participants in the control groups were included. Any analytical group with fewer than 60 CHD deaths was excluded to assure at least 10 end point cases for each independent variable. This criterion resulted in the exclusion of CORDIS (cardiovascular occupational risk factor determination in Israel study), the Guangzhou Chinese cohorts from the People's Republic of China, and female samples from the lipid research clinics study, the Scottish collaborative study, and the Glostrup population studies. The final analysis considered 27 analytical groups (19 for men and eight for women).
The multivariate proportional hazards model was used to examine the relation between CHD death and five risk factors: age, systolic blood pressure, serum total cholesterol, current cigarette smoking, and diabetes status. Because there were only six people with diabetes in the rural samples from the Yugoslavia cardiovascular disease study, diabetes status was not included in the model for this cohort. Smoking status was dichotomised as current smoker or non-smoker. The follow up period was limited to 20 years; if a person had been followed up for more than 20 years, he or she was censored at 20 years. For presentation purposes, the usual transformation of the model coefficients to relative risk was used (that is, exp(xb), where x is an increment in a characteristic and b is the corresponding model coefficient). The units of increment to calculate the relative risks were 10 years for age, 20 mm Hg for systolic blood pressure, and 1.04 mmol/l (~40 mg/dl) for serum cholesterol. These units were approximately equal to 1 SD of the measures.
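The two data handling steps described above, censoring follow up at 20 years and converting a coefficient to a relative risk for a chosen increment, can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the coefficient value are hypothetical.

```python
import math


def censor_follow_up(years, died_of_chd, limit=20.0):
    """Truncate follow up at `limit` years.

    A person followed for longer than the limit is treated as censored
    at the limit, so any death after that point is not counted.
    """
    if years > limit:
        return limit, False
    return years, died_of_chd


def relative_risk(beta, increment):
    """Relative risk exp(x * b) for an increment x and coefficient b."""
    return math.exp(beta * increment)


# Hypothetical coefficient per mm Hg of systolic blood pressure
# (not a value taken from the paper).
beta_sbp = 0.015
rr_per_20_mm_hg = relative_risk(beta_sbp, 20)
```

Reporting the relative risk per increment of roughly 1 SD, rather than per single unit, puts the risk factors on a comparable scale.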
Three approaches were used to examine the risk factor–CHD relations across studies. Firstly, a receiver operating characteristic (ROC) curve analysis was used to measure how well an equation ordered the risk of CHD death.42,43 The area under the ROC curve (AUC) was compared across studies.42,44 The AUC was calculated first using a model derived from the specific (internal) cohort and then calculated using a model estimated from an external cohort (for example, the Framingham cohort or the Renfrew and Paisley study). To calculate these different AUCs, the predicted survival probability of CHD death in eight years was used. Two examinations of these results were conducted: we first examined whether the results from the studies differed significantly using the χ2 statistic suggested by DerSimonian and Laird44; we also compared the AUCs within studies using different models to calculate them.
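To make the ordering criterion concrete, the sketch below computes the AUC as the proportion of case and non-case pairs in which the case has the higher predicted risk, with ties counted as one half. It is a simplified illustration that treats the outcome as binary and ignores censoring; it is not necessarily the procedure used in this analysis, and the function name and data are hypothetical.

```python
def auc(predicted_risks, events):
    """Area under the ROC curve from pairwise comparisons.

    predicted_risks: predicted probability of the event for each person.
    events: 1 if the person had the event, 0 otherwise.
    """
    cases = [r for r, e in zip(predicted_risks, events) if e == 1]
    non_cases = [r for r, e in zip(predicted_risks, events) if e == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for non_case_risk in non_cases:
            if case_risk > non_case_risk:
                concordant += 1.0
            elif case_risk == non_case_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical predicted risks and outcomes for four people.
example_auc = auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

Because only the ordering of the predicted risks enters the calculation, any two risk functions that rank people in the same order give the same AUC, whatever their absolute predictions.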
We next examined the relative size of the coefficients from the proportional hazards model for each risk factor. The coefficients were compared across studies using the χ2 test suggested by DerSimonian and Laird44 to determine whether variation between studies was significant.
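A test of this kind can be sketched as below, assuming it takes the familiar form of an inverse variance weighted Q statistic, which is compared with a χ2 distribution on k − 1 degrees of freedom for k studies. The sketch also returns the DerSimonian and Laird moment estimate of the between study variance. The function name and the input values are hypothetical, and the p value calculation is left out.

```python
def heterogeneity(coefficients, standard_errors):
    """Q statistic, degrees of freedom, and between study variance.

    coefficients: one model coefficient (log relative risk) per study.
    standard_errors: the standard error of each coefficient.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    # Inverse variance weighted (fixed effect) pooled coefficient.
    pooled = sum(w * b for w, b in zip(weights, coefficients)) / total_weight
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, coefficients))
    df = len(coefficients) - 1
    # Moment estimate of the between study variance, truncated at zero.
    scale = total_weight - sum(w * w for w in weights) / total_weight
    tau_squared = max(0.0, (q - df) / scale)
    return q, df, tau_squared


# Hypothetical coefficients and standard errors from three cohorts.
q, df, tau_squared = heterogeneity([0.25, 0.45, 0.60], [0.05, 0.06, 0.08])
```

A value of Q that is large relative to its degrees of freedom indicates that the coefficients differ between studies by more than sampling error alone would explain.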
Finally, the observed eight year CHD death rate of each cohort was compared with the predicted rate when a risk function based on another cohort (such as the Framingham study) was applied to the cohort.
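One way to summarise such a comparison is the percentage by which the predicted rate differs from the observed rate. The sketch below assumes that the predicted rate for a cohort is the mean of the individual predicted risks and that the error is expressed relative to the observed rate; both are assumptions made for illustration, and all names and numbers are hypothetical.

```python
def mean_predicted_rate(predicted_risks):
    """Cohort level predicted rate: mean of individual predicted risks."""
    return sum(predicted_risks) / len(predicted_risks)


def percent_prediction_error(observed_rate, predicted_rate):
    """Positive values mean overprediction, negative underprediction."""
    return 100.0 * (predicted_rate - observed_rate) / observed_rate


# Hypothetical cohort: observed eight year rate of 0.02, while an
# external model predicts individual risks that average 0.05.
predicted = mean_predicted_rate([0.02, 0.04, 0.06, 0.08])
error = percent_prediction_error(0.02, predicted)
```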
Tables 1 and 2 present the numbers of participants, baseline characteristics, and numbers of all cause and CHD deaths during follow up for men and women, respectively, in each cohort. Altogether, this analysis included 161 955 participants (105 420 men and 56 535 women) with 29 662 deaths (21 841 men and 7821 women), among which 8316 were deaths from CHD (6442 men and 1874 women). In men, the mean age was between 48 and 54 years in most of the cohorts. The Norwegian counties study recruited only people aged 35–49 years (table 1). Excluding the HDFP, in which all participants were hypertensive, men from the Renfrew and Paisley study had the highest average systolic blood pressure. Omitting clinical trials, samples from northern Europe in general had a higher mean serum total cholesterol concentration. Cigarette smoking was most prevalent in the Yugoslavia cardiovascular disease study (around 70%). Among women (table 2), mean ages were 52–54 years in most of the cohorts and the Norwegian women had a lower mean age (42 years). Women from the Renfrew and Paisley study also had a higher average systolic blood pressure than did women in the other cohorts. Three cohorts from northern Europe had the highest average concentrations of serum cholesterol. The highest prevalence of smoking was found in the Renfrew and Paisley study (47%) and the lowest in the NHANES I and II cohorts (26–27%). The mean age of all cohorts combined was 49 years. Standardising to this age, men and women from the Renfrew and Paisley study and from the Tecumseh community health study had the highest CHD mortality.
AUC derived internally at eight years in men ranged from 0.66 to 0.83 (table 3 and x axis in fig 1). In women, the ranges were 0.72 to 0.88 (table 4 and fig 1). The heterogeneity of the AUCs was significant across studies for both men and women (p < 0.001). The AUC of each cohort calculated from its own model is plotted against that when the Framingham risk function is applied to the cohort (y axis in fig 1). Except for two outliers (rural men from the Yugoslavia cardiovascular disease study and women from the Norwegian counties study), the plots are by and large distributed along the 45° line (the line of identity), indicating a similar ability of ordering risk by either function. When risk equations from other cohorts (such as the Renfrew and Paisley study) were applied to each cohort, instead of the Framingham study, the results were very similar (data not shown). The differences in AUCs between applying the Framingham model and the Renfrew and Paisley model were very small and ranged from −0.014 to 0.003 in men and from −0.018 to 0.029 in women. The Pearson correlation coefficients between AUCs derived from the cohorts' internally derived function, the Framingham model, and the Renfrew and Paisley model were 0.95–0.99, suggesting that the ability of these three different risk functions to order risk within a given cohort was nearly identical. Thus, the ability to rank risk in a population was determined by the intrinsic discriminatory power of the major risk factors measured in that population and not the multivariate predictive function chosen.
The risk of CHD death associated with each risk factor was examined with the use of the multivariate Cox proportional hazards model (table 3 for men and table 4 for women). With only a few exceptions, as expected, the major risk factors were significantly and independently related to CHD death for both sexes. However, the magnitude of the relative risks varied between the cohorts. For example, among men, with clinical trials omitted the relative risk associated with systolic blood pressure ranged from 1.28 in the NHANES I and II cohorts to 1.81 in the rural samples from the Yugoslavia cohort (table 3). For serum total cholesterol, the relative risk ranged from 0.86 (the rural samples from the Yugoslavia cohort) to 1.40 in the random sample of the lipid research clinic follow up study. The relative risk associated with smoking ranged from 1.33 in the NHANES I to 2.55 in the Norwegian counties study. Large relative risks and 95% confidence intervals were found in several studies in which the number of participants with diabetes was small. The DerSimonian and Laird χ2 tests suggest the existence of significant heterogeneity of the relative risks (p < 0.05) between studies for systolic blood pressure, serum cholesterol, smoking, and diabetes but not for age in both men and women. To examine a possible quadratic relation between age and risk of CHD death, both age and the squared term of age were included in the Cox proportional hazards model. Among 27 analytical groups, a significant quadratic relation was found in only three groups (men from the Scottish collaborative study, the Renfrew and Paisley study, and the Israeli ischemic heart disease study). Hence, the quadratic age term was not included in the model in the analyses.
Figure 2 plots the observed eight year CHD mortality of each cohort against mortality predicted from the Framingham model. Among men, the Framingham model tended to overpredict absolute risk in populations with a low observed CHD mortality (< 0.03/8 years) and to underpredict risk in populations with a high CHD mortality (≥ 0.03/8 years). This effect can be summarised by noting that the best fit regression (dotted) line through the data in fig 2 deviated from the line of identity with a slope of ~0.4. The three most severe instances of overpredictions were for the Honolulu cohort (by 225%) and the rural and urban samples from the Puerto Rico cohort (148% and 76%, respectively). The three most severe cases of underpredictions were for the Tecumseh cohort (by 42%), the Renfrew and Paisley study (40%), and the Scottish collaborative study (37%). Among women, the Framingham model underpredicted eight year CHD mortality for five of eight cohorts: the Tecumseh community health study (by 38%), the Renfrew and Paisley study (38%), the NHANES II mortality follow up study (28%), the HDFP (25%), and the NHANES I epidemiologic follow up program (20%).
The eight year CHD mortality in the Renfrew and Paisley study was twice as high as in the Framingham study. On the basis of this observation, we repeated our analyses using the Renfrew and Paisley model to predict CHD mortality in the other cohorts. The model from the Renfrew and Paisley study overpredicted CHD mortality for most of the cohorts (data not shown).
Using person-level data from diverse populations we examined variations of risk functions in three aspects: ordering risk, magnitude of relative risk, and estimation of absolute risk. Altman and Royston45 suggest that the validity of a model must be judged in the context in which it will be used. Historically the investigation of coronary risk was undertaken with the goal of identifying causal factors, which could then be avoided to prevent the disease. That task has been accomplished to a large degree and can be credited with a large proportion of the decline in CHD death rates over the past half century. To serve as the basis for a prevention policy the relative magnitude of harm associated with a lifestyle factor, such as smoking, need not be known with precision in all populations since uniform advice to abstain is effective. A need is now recognised, however, to characterise the risk of individual patients as a guide to treatment decisions. In this setting, the models are not being asked to identify risk factors but rather people at risk. In effect, the models are applied as diagnostic tests and are used to assign patients to treatment above a threshold. The requirements for a successful diagnostic procedure are different and quantitatively more stringent than those required to inform public health policy. In this analysis we have made an initial assessment of the robustness of these models for classifying patients based on risk. The quantitative results are mixed, with acceptable precision for some purposes but not others.
A commonly used method to compare the ability of different predictive functions to order CHD risk is to rank the participants by quantile (for example, quintile or decile) of estimated risk based on an internal or an external multivariate function. The distributions of the observed CHD cases and expected value are displayed by quantile categories. Similar distributions of CHD death cases by risk quintile derived from the internal predictive functions were reported for the four regions (north, west, east, and south) of Europe.5 External evaluations showed that different risk functions had similar ranking power for middle aged men from several US based epidemiological studies in the same era.1 The Framingham model was able to separate lower risk from higher risk participants in several US4,46 and European populations.2,9 In the seven countries study, the expected and observed CHD cases by deciles of risk were highly correlated (r = 0.93–0.98) using either internal predictive models or external predictive models.18 In this report we used ROC curve analysis, which takes into account sensitivity and specificity and avoids the need to separate the whole sample concurrently into many small groups (for example, quintile or decile). In the current analysis the AUCs of the studies varied from 0.60s to 0.80s. However, for a specific population, the AUCs derived internally and those derived from an external risk function were very similar. The performance of different external functions was also very similar when they were applied to the same population. These results imply that the ability to order depends primarily on the accuracy with which risk factors are characterised at the beginning of a study, not the relative size of the multivariate coefficient that is generated from that study.
Similar relative risks across studies require ordering not only a person's risk of disease similarly but also a similar estimate of the ratio of the probabilities that two different people will develop disease. Even though the percentage of the observed cases in the quintiles of estimated risk were comparable among the four regions in Europe,5 there were substantial variations in the multiple logistic coefficients of the major risk factors. Similar coefficients for the major CHD risk factors were reported between the Framingham Study and several US1,4,46 and non-US cohorts9; between northern Europe and southern Europe10; between Italian men from the RIFLE (risk factors and life expectancy) pooling project and American men among MRFIT primary screenees6; and between eight nations of the seven countries study.8 However, Chambless and colleagues,11 reviewing the results of 15 studies in the literature, found considerable variation in the magnitude of the odds ratios for fatal and non-fatal CHD events. It appears that significant differences were usually not found in the previous reports because the analysis consisted of pairwise comparisons involving two or a small number of populations. Using person-level data from 16 studies (27 analytical groups) and a global test for heterogeneity, we found significant variation in the coefficients across populations.
The absolute risk or absolute rate defines the probability that a person will develop an event over a defined period of time. For different predictive functions derived from different populations to estimate absolute risk correctly, not only is it necessary for patients to be correctly ordered and for the magnitude of relative risk to be the same, but also the “background” or average risk across groups with a comparable set of risk characteristics must be the same. With few exceptions,3 studies that showed a good comparability of predictive functions did not examine comparability of absolute risk.2,4–9,46 Those studies that did examine this question usually found non-comparability across populations.10,12,18,20–24 The multiple logistic solutions for the American railroad men in the seven countries study overestimated CHD risk of the European men but underpredicted it for several large cohorts in the USA combined.18 Among cohorts from Finland, the Netherlands, and Italy, the prediction of events within each country using the risk function of the others produced errors ranging from –19% to 51%.20 The absolute risk was overestimated when applying the northern European model to southern European populations and vice versa.10 The Framingham predictive function for white middle aged men overestimated absolute CHD risks in Japanese American and Hispanic men and Native American women,19 in France,22 in Sweden,47 and in Italy.24 In the UK, it reliably predicted the absolute risk of heart disease in white men and women when the annual risk is above 1.5% but underestimated the risk when the absolute risk is lower.23
Several explanations may be offered for the systematic bias observed in absolute risk prediction across populations. The tendency for external models to overpredict in low risk populations and underpredict in high risk populations is in part a consequence of the mathematical procedure being used. The models are constrained to predict the number of cases that are observed and therefore pull the external population towards the level of absolute risk of the population from which they were generated. Hence, D'Agostino and colleagues19 suggested that the model be recalibrated so that the mean values for risk factors, as well as CHD incidence rates of non-Framingham cohorts, are substituted into the Framingham model to improve prediction. In addition, the model can adjust for only the five major risk factors. Factors not accounted for in the model can alter the background risk in a population. Since CHD risk tends to cluster in populations, groups that are at high risk based on hypertension, hypercholesterolaemia, smoking, and diabetes also tend to have higher levels of factors outside the model (for example, sedentarism), increasing risk beyond what a five factor model would predict. It is also obvious that the lifetime “dose” from risk exposure, particularly smoking, may vary systematically across groups. Prediction based on one initial baseline measurement instead of characteristics of the lifetime exposure limits the precision because of “regression dilution bias”. Other factors influencing the heterogeneity found in our results are differences in the age distributions between studies and the possibility that some subgroups (such as occupational samples) are not representative of the general population in specific countries.
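The recalibration idea can be illustrated with the usual form of an absolute risk estimate from a proportional hazards model, 1 − S0^exp(Σ b(x − mean)), where S0 is the baseline survival at the mean risk factor values. In the sketch below, recalibration keeps the coefficients of the source model but substitutes the target cohort's own risk factor means and baseline survival. Using baseline survival as the stand-in for the cohort's CHD incidence rate is an assumption, and every name and number is hypothetical.

```python
import math


def cox_absolute_risk(values, coefficients, means, baseline_survival):
    """Absolute risk 1 - S0 ** exp(sum of b * (x - mean))."""
    linear_predictor = sum(
        b * (x - m) for b, x, m in zip(coefficients, values, means)
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients: per year of age and per mm Hg of
# systolic blood pressure (not values from any cohort in the paper).
coefficients = [0.05, 0.015]

# Source cohort (where the model was derived) and a higher risk target.
source_means, source_survival = [50.0, 130.0], 0.98
target_means, target_survival = [52.0, 140.0], 0.95

# A person with the target cohort's average characteristics.
person = [52.0, 140.0]
original = cox_absolute_risk(person, coefficients, source_means, source_survival)
recalibrated = cox_absolute_risk(person, coefficients, target_means, target_survival)
```

In this example the unrecalibrated model underpredicts for the higher risk target cohort, while the recalibrated version returns the target cohort's own average risk for a person with average characteristics.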
Predicting risk for CHD may be a valuable tool for clinicians, healthcare planners, and researchers, and a source of critical information for individual patients. Health professionals have been using equations, charts, or tables to estimate individual risk for many years.13–17,48–57 Most of these algorithms13–17,48–52,54,55 are based on equations derived from the Framingham heart study, most likely because of its familiarity and availability. The data presented here show that comparability and generalisation of the risk functions should be evaluated comprehensively based not just on one but on several criteria. The heterogeneity of the risk factor–CHD relation across population groups should not be overlooked. Expert panels must exercise caution in generalising from one population to another when making clinical guidelines. Using estimated risk that is too low or too high results in the proportion of treated patients being incorrect. For example, when a risk of CHD death of 10% over 10 years was used as the cut off for treatment decision, the proportion of men from the Honolulu study needing to be treated was 0.9% based on the cohort specific (internal) model, whereas it was 5.3% based on the Framingham model. Similarly, using a model specific to the Renfrew and Paisley study, we estimate that 29.4% of the male cohort would be treated while only 8.2% of the men would be treated if treatment were based on the Framingham model.
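The effect of miscalibration on treatment decisions can be sketched by counting the share of a cohort whose predicted risk reaches the cut off under each model. The predicted risks below are hypothetical and are not the Honolulu or Renfrew and Paisley data; treating a risk exactly equal to the cut off as eligible is also an assumption.

```python
def proportion_treated(predicted_risks, cutoff=0.10):
    """Share of people whose predicted risk is at or above the cut off."""
    eligible = sum(1 for risk in predicted_risks if risk >= cutoff)
    return eligible / len(predicted_risks)


# Hypothetical predicted risks for the same four people under a
# cohort specific (internal) model and an external model.
internal_risks = [0.02, 0.05, 0.08, 0.12]
external_risks = [0.04, 0.11, 0.15, 0.20]

treated_internal = proportion_treated(internal_risks)
treated_external = proportion_treated(external_risks)
```

Here the external model ranks the four people in the same order as the internal model, yet it would assign three times as many of them to treatment.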
We showed that there was significant variation between the studies in all aspects of multivariate risk. These findings have implications for inferences that can be drawn based on meta-analysis. While it is valid to report an “average” effect, the average may not be interpreted as appropriate to all studies. We emphasise, however, that these results show “quantitative” rather than “qualitative” variation among the studies. We find that the characteristics examined here are significantly related to CHD mortality universally. A future direction of our research is to examine study characteristics that may explain the significant variability in the results from the different studies.
While we found clear evidence of random and systematic error in the comparisons of prediction models between populations, a judgement about the importance of these findings must include a consideration of the purposes to which they are to be used. It must also be acknowledged that, given the limited information that is used to characterise individual participants, the lack of standardisation, and the wide range of unmeasured cultural influences in these populations, the models are remarkably robust compared with similar epidemiological tools. What remains to be evaluated, therefore, is whether they are accurate enough for specific applications—for example, as aids to clinical and policy decision making. That judgement would require further analysis of the magnitude of misclassification at the individual level.
The views expressed in this paper are those of the authors and do not necessarily reflect the views of their agencies. Data from the following studies were obtained from the National Heart, Lung, and Blood Institute (USA): the Framingham heart study, the Framingham offspring study, the Honolulu heart program, the lipid research clinics mortality follow-up study, the hypertension detection and follow-up program, and the multiple risk factor intervention trial. The public use data of the first national health and nutrition examination survey epidemiologic follow-up study and the second national health and nutrition examination survey mortality follow-up study were obtained from the National Center for Health Statistics (USA). Data from all other studies were obtained from the investigators of the studies. This work was supported by a grant HL 61769 from the National Institutes of Health, National Heart, Lung, and Blood Institute, USA.
The Tecumseh community health study: Dr Victor Hawthorne; The Yugoslavia cardiovascular disease study: Drs Djordje Kozarevic and Nikola Vojvodic; The Scottish collaborative study and The Renfrew and Paisley survey: Drs Charles Gillis, Victor Hawthorne, and David Hole and Ms Carole Hart; The Glostrup population studies: Drs Torben Jorgensen and Troels Thomsen; The Norwegian counties study: Dr Randi Selmer and Aage Tverdal; The Iceland Reykjavik study: Dr Emil Sigurdsson; The Israeli ischemic heart disease study: Dr Uri Goldbourt and Ms Shlomit Yaari; The Israeli CORDIS study: Dr Paul Froom; The Guangzhou Chinese cohorts: Drs Shuguang Lin, Yihe Li, and Xiaoqing Liu; Evans County: Drs Dan Lackland and Curtis Hames; Charleston heart study: Drs Peter Gazes, Julian Keil, Dan Lackland, and Susan Sutherland. Consultants: Dr Zhaohai Li, George Washington University; Dr Richard Cooper, Loyola University Stritch School of Medicine; Dr Ronan Conroy, Royal College of Surgeons, Dublin, Ireland; Dr Christopher Sempos, School of Medicine and Biomedical Sciences, SUNY, Buffalo, New York. Coordinating Center: Medical University of South Carolina: Janet Bean, Guichan Cao, Christopher Khedouri, Drs Ramon Durazo-Arvizu, Daniel Lackland, Youlian Liao, Stuart Lipsitz, Daniel McGee, Sundar Natarajan, Debjyoti Sinha, and Barbara Tilley.