Our analysis, based on studies of patients with any of four major diseases, found that evidence on the effect of treatments on comorbid patients is limited. The replicability of the inclusion and exclusion criteria was only moderate, making it hard both to judge whether a specific patient or patient population would have been eligible to participate and to replicate the clinical trial. Trials excluded patients with many common comorbidities. The reporting of comorbidities in trials of these four major chronic diseases was very limited, and even fewer studies assessed whether comorbidities were potential modifiers of treatment effects.
Prior work has posited that comorbidities and older age are frequent exclusions for clinical trials. Published reports of trials frequently modify eligibility criteria relative to the original trial protocols, and many of these modifications relate to comorbidity. Our results build on this prior work, which identifies a potential problem in the literature in general, by examining the specific information needed to inform clinical decision-making in people with comorbidities. They also highlight the challenges of using the current evidence base to determine how we should treat the rapidly increasing population of people with multiple chronic conditions. The replicability of inclusion and exclusion criteria was poor in many of these studies, and only moderate on average, making it difficult to judge whether a patient or a specific patient population would have been eligible for the trial. This also makes it more difficult to replicate the study and compare populations across studies. Comparing populations across studies is an essential step for systematic reviews that attempt to synthesize the evidence base for specific clinical questions.
Serious concomitant diseases were very common exclusion criteria for trials, and yet how to replicate these determinations was not clear. The existence of multiple common comorbidities was a reason for exclusion in studies examining each of the four chronic diseases. For example, while 55 percent of COPD patients in a population-based study, the National Health and Nutrition Examination Survey (NHANES), have arthritis, 35 percent of the COPD trials excluded people with musculoskeletal diseases. Sixteen percent of diabetic patients aged 45–64 years, and 30 percent of those aged 65 and older, have a low glomerular filtration rate. However, 44 percent of the diabetes trials excluded patients with renal insufficiency. More than 60 percent of the congestive heart failure that occurs in the United States general population might be attributable to coronary heart disease, yet more than 40 percent of heart failure trials excluded people with coronary heart disease. The mismatch between eligibility criteria and the characteristics of a patient or patient population with the disease affects our confidence in applying the results of the trial to these patients or patient populations. However, it is important to note that relying on inclusion and exclusion criteria is an incomplete approach for determining whether a trial's results apply to a patient or a patient population with comorbidities, because restricting a trial population does not necessarily mean that the results do not apply to those excluded from the trial.
It is also important to understand the characteristics of the people who were actually enrolled in a trial. Inclusion and exclusion criteria capture who was eligible, but may not reflect who was actually recruited for the trial. For example, even if older adults are not excluded from a trial, if the mean age is 50 years with an SD of 15, the results may not apply to an 80-year-old man with multiple chronic conditions. Fewer than half of the trials reported the prevalence of comorbidities. Among those studies that reported the prevalence of any comorbidities, the average number of comorbidities reported on was three. The trials also infrequently reported the definition of comorbidity and how the information needed for these definitions was obtained. Thus, determining the average comorbidity burden of people in these trials was next to impossible. Even if the trials had reported this information, we would need additional information to determine whether the results of the trials really apply to people with a specific comorbidity (or comorbidities) or risk profile. Heterogeneity of treatment effects may arise from differences in baseline risk of the primary outcome, risk of harm, competing risks, or relative risk reduction. Frequently, researchers examine subgroup effects to determine whether treatment effects vary across groups, but such analyses should be consistent with criteria for appropriate subgroup analyses. While the trials occasionally examined subgroups based on comorbidity status, these subgroups were never examined according to established criteria. The trials also rarely considered heterogeneity of baseline risk and competing risks. As a result, we can rarely draw conclusions about the presence, or absence, of different treatment effects in people with comorbidity.
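The age example above can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that participant ages are approximately normally distributed; real trial populations are often skewed or truncated by eligibility limits, so the figure is only indicative. The function name `share_at_or_above` is hypothetical and is not drawn from any of the trials.

```python
from statistics import NormalDist


def share_at_or_above(age, mean_age, sd_age):
    """Fraction of participants at or above `age`, ASSUMING normally distributed ages."""
    return 1.0 - NormalDist(mu=mean_age, sigma=sd_age).cdf(age)


# Numbers from the example in the text: mean age 50 years, SD 15, patient aged 80.
share = share_at_or_above(80, mean_age=50, sd_age=15)
print(f"{share:.1%}")  # about 2.3% under the normality assumption
```

Under this assumption an 80-year-old sits two standard deviations above the mean, so only about 2 in 100 participants would be that old or older, even though older adults were never formally excluded.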
While our results may paint a grim picture of the current ability to draw evidence-based conclusions in evidence syntheses about people with comorbidity, they highlight specific steps by which clinical trials, systematic reviews and the resulting clinical practice guidelines might better inform the treatment of patients with comorbidities. We could overcome some of the limitations of the current evidence base through improved reporting of the eligibility criteria of trials and of the comorbidities of patients included in trials. Greater specificity in the descriptions of inclusion and exclusion criteria would help maximize replicability. Tables of baseline characteristics could include more detailed information about single comorbidities, how frequently they occur in combination, and their treatments. Although it is often challenging to assess how treatment effects differ according to the extent of comorbidity (sample size requirements may exceed feasibility), trials need to address questions of effect modification for common and important comorbidities for which there are prior hypotheses about potential effect modification. One potential solution could be to use observational studies to investigate effect modification by comorbidity. Although observational studies are more prone to confounding, such analyses could still improve the evidence base overall. Defining multimorbidity clearly in such analyses of trials and observational studies will be essential. Another question is how best to convey the uncertainty arising from an incomplete evidence base to patients and health care providers. One solution would be to develop explicit guidance on how to rate the quality of evidence that is used for comorbid patients, which could then inform the strength of recommendations.
Limitations of our study include our focus on trials included in systematic reviews for four chronic diseases. Information about trials in additional chronic diseases may better guide treatment for people with comorbidity. Nevertheless, we chose Cochrane systematic reviews as the basis of our included trials in order to examine how evidence syntheses can better address the important challenge of developing evidence-based guidance for people with comorbidity. We chose this system for feasibility, and also to ensure replicable sampling of trials included in our study. While the Cochrane reviews are recent, some of the included studies are considerably older, and there may be differences over time in the extent to which comorbidities are considered. In addition, the Cochrane reviews made no exclusions for quality of evidence or risk of bias. We did not assess for changes in reporting over time. However, recent work on exclusion and inclusion criteria suggests that there has been no variation over the last 15 years among a more selected group of studies from high-impact journals. Our approach to classifying the exclusions was based on categories of exclusion, and may have grouped exclusions with varied definitions. For example, across trials, there was no standard definition or threshold of “renal insufficiency”, and so some trials may have excluded only more severe renal disease, whereas some may have had a more restrictive cutpoint. Often, these operational details of a definition are not reported by the trials, as shown in our results.