Blood-based proteins might be an attractive option for early detection of colorectal cancer (CRC), but individually they are unlikely to achieve the diagnostic performance required for population-based screening. We aimed to summarize current evidence on the diagnostic performance of signatures based on multiple proteins for early detection of CRC.
A systematic literature review adhering to the PRISMA (preferred reporting items for systematic reviews and meta-analysis) guidelines was performed. PubMed and Web of Science databases were searched for potentially relevant studies published until 28 August 2017. Relevant studies were identified by predefined eligibility criteria. Estimates of indicators of diagnostic performance such as sensitivity, specificity, and the area under the curve (AUC), along with information on validation and other key methodological procedures, were extracted. Study quality was assessed with the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) instrument.
Thirty-six eligible studies were identified, with numbers of CRC cases ranging from 23 to 512 and numbers of proteins included in signatures ranging from 3 to 13. Reported Youden’s Index and AUC ranged from 0.19 to 0.95 and from 0.62 to 0.996, respectively. However, most studies, especially those reporting better diagnostic performance, were conducted in clinical rather than screening settings, and many studies lacked any internal or external validation of the identified algorithm.
Blood-based tests using signatures of multiple proteins may be a promising approach for non-invasive CRC screening. However, promising signatures identified in clinical settings still require rigorous evaluation in large studies conducted in a true screening setting.
With over 1.4 million incident cases accounting for 10% of all cancers and ~700,000 deaths per year, colorectal cancer (CRC) is the third most common cancer globally.1, 2 Most CRCs slowly develop from adenomas over many years, and opportunities for screening and early detection are much better than for most other types of cancers. Randomized trials have shown effective reduction of both CRC incidence and mortality by screening with stool tests or endoscopic examinations,3, 4, 5, 6 and screening programs are currently being implemented in an increasing number of countries.7
Screening colonoscopy and sigmoidoscopy are the current gold standards for detection of CRC and its precursors in the total and distal colorectum, respectively. Although application of these screening procedures has been shown to effectively reduce CRC incidence and mortality,4 the implementation of and adherence to these invasive procedures in population-wide screening is limited by several disadvantages such as high costs, limited resources and low compliance.8, 9, 10 Compliance is also a major issue in CRC screening based on stool tests, such as the widely used guaiac-based or fecal immunochemical tests (FITs) for hemoglobin, or recently introduced DNA-based stool tests.11, 12 Blood-based tests that could be easily integrated into routine exams might therefore be a highly attractive, minimally invasive alternative for CRC screening. It would therefore be of major interest to compare blood-based tests and FITs with respect to their ability to detect early-stage CRC.
Over the years, several blood-based protein biomarkers such as CEA, CA19-9, or CA242 have been associated with diagnosis of CRC. However, none of these proteins individually is able to detect the majority of early-stage CRC manifestations.13, 14, 15, 16 A promising approach to increase early detection of CRC and its precursors could be to combine several different protein markers in a panel. The rationale behind this is that different protein markers may detect different morphological types of CRC and a combination might thus lead to increased overall accuracy. In this systematic review, we aim to provide a comprehensive overview of studies that have discovered, evaluated or validated blood-based protein signatures or panels for early detection of CRC, paying particular attention to the role of study design and study population characteristics. A particular focus of interest is diagnostic performance for detecting early-stage CRC or even adenomas, the precursors of most CRCs, which would be most relevant for potential application in CRC screening.
Observing the PRISMA (preferred reporting items for systematic reviews and meta-analysis) guidelines,17 the online scientific citation databases MEDLINE and Web of Science were searched from inception until 28 August 2017. The exact search terms, which included keywords like “Tumor Marker”, “Protein”, “Colorectal Cancer”, or “Signature”, are provided in Supplementary File 1.
Our search was restricted to human research studies in English language with blood collected prior to any treatment. Studies reporting diagnostic performance of individual protein markers only were not included. We excluded studies that included not only primary CRC cases but also post-operative patients or patients with recurrent disease, and studies with a total number of cases <10. In addition, signatures for which diagnostic performance was reported for panels of unidentified peaks were left out. Studies with insufficient information on diagnostic indicators were omitted. Studies assessing the potential for predicting metastasis or prognosis of CRC were not considered. Moreover, studies that assessed proteins in combination with other types of biomarkers, such as DNA- or RNA-based markers, were not considered. Furthermore, studies that exclusively reported TAA (tumor-associated autoantibody) signatures were excluded, as a systematic review of such TAA signatures for CRC early detection has recently been reported elsewhere.18 A protein marker set was defined as a panel or signature if it was based on information from more than two proteins.
Two authors (M.B. and A.G.) independently read and extracted data from the 36 studies. Information on key study characteristics like authors, year of publication, study population (country, numbers of cases and controls, age, and sex distribution), study design, type of protein signatures, type of protein detection technique, and diagnostic performance was retrieved. Data on the following diagnostic performance related indicators were extracted: overall and stage-specific (wherever reported) sensitivity and specificity, area under the receiver operating characteristic curve (AUC), and P value. In addition, Youden’s Index (J) was calculated for each signature, together with 95% confidence intervals (95% CI) of the given sensitivities and specificities by the Clopper–Pearson method using R 3.3.2; these are reported in Table 2.
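The two derived quantities can be illustrated with a short sketch. It is written in Python rather than the R 3.3.2 used for the review, so it is not the authors' code, and the function names are hypothetical. Youden’s Index is J = sensitivity + specificity − 1, and the Clopper–Pearson interval is the exact binomial interval, obtained here by bisection on the binomial distribution function. The counts in the last call (47 positives among 100 cases) are invented for illustration only.

```python
from math import comb


def youden_index(sensitivity, specificity):
    """Youden's Index J = sensitivity + specificity - 1 (proportions, not percent)."""
    return sensitivity + specificity - 1.0


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for x successes out of n."""

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower limit: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve(x - 1, 1.0 - alpha / 2.0)
    # Upper limit: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve(x, alpha / 2.0)
    return lower, upper


# J for a sensitivity of 47% at 90% specificity.
print(round(youden_index(0.47, 0.90), 2))  # 0.37

# Hypothetical counts (not from any included study): 47 of 100 cases positive.
print(clopper_pearson(47, 100))
```

The exact interval is conservative for small samples, which matters here because several included studies had few CRC cases.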
To assess the risk of bias of individual studies, the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) instrument19 was applied to all the studies. Careful assessment was independently performed by the two above-mentioned authors by inserting and deleting standard signaling questions and rating risk of bias and applicability concerns as ‘high’, ‘low’, or ‘unclear’.
The search in above-mentioned databases using search terms (reported in Supplementary File 1) yielded 3,808 records. Details of the study selection are reported in Figure 1 observing the PRISMA 2009 Flow Diagram.20 As shown in Figure 1, upon application of inclusion and exclusion criteria and screening for relevance to the topic, 91 articles were selected for full review. Sixty-one articles had to be excluded based on aforementioned predefined criteria. Six other studies were identified by cross referencing. In the end, 36 protein signature studies were included in this review. Information regarding study populations and design is summarized in Table 1. Furthermore, information concerning number of proteins in the signature, type of detection method, diagnostic performance and measures for validation is reported in Table 2. The specific proteins included in the signatures are listed in Table 3. Evidence extracted with respect to stage specific results, risk of bias of individual studies, brief description of protein detection techniques and internal validation methods and the PRISMA checklist20 are reported in Supplementary Files 2–6.
An overview of the characteristics of the studies and the participants is given in Table 1. The majority of studies were conducted in Europe (21/36). Of these, four studies were carried out in Denmark,21, 22, 23, 24 three each in Germany,25, 26, 27 Spain,28, 29, 30 and Russia,31, 32, 33 two each in Finland34, 35 and the United Kingdom,36, 37 one study each in Italy,38 France,39 Poland,40 and Czech Republic,26 and finally one study was multicenter.41 The eleven studies carried out in Asia included eight studies from China,42, 43, 44, 45, 46, 47, 48, 49 two studies from Japan,50, 51 and one study from Korea.52 Three studies were carried out in the United States of America53, 54, 55 and one in Australia.56 Only three studies collected blood prior to colonoscopy or any other clinical diagnostic procedure in a true screening setting from asymptomatic individuals.27, 38, 53 Numbers of CRC cases ranged from 23 (ref. 54) to 512 (ref. 24). Six studies21, 23, 30, 33, 53, 56 reported having matched cases and controls by age (n=6), gender (n=6), or ethnicity (n=1);53 two studies22, 36 used age- and sex-matched cases and controls in the pilot phase but did not match the population for the later validation phase. Only healthy controls were used in 18 studies, nine studies used only disease controls, whereas nine studies used a mix of both types of controls. One study each was carried out in male-only (n=20)38 and female-only (n=32)53 populations. In 23 studies, the number of male patients was higher than the number of females, whereas eight studies did not specify gender distribution of study participants. Six studies included adenoma cases as a separate case group.22, 23, 24, 27, 41, 55 Twenty-four studies reported stage-specific distribution of cases (Supplementary File 2).
Table 2 provides an overview of the protein detection and quantitation methods applied by the studies. Immunoassay-based platforms such as ELISA (enzyme-linked immunosorbent assay) and EIA (enzyme immunoassay) were used by 21 studies, and mass spectrometry-based platforms were used by nine studies. A brief description of all methods used is reported in Supplementary File 4. All in all, 93 different proteins were examined in 36 different protein signatures (Table 3) in this review. The number of proteins in the signatures ranged from 3 to 13, with almost half (15/36) of the signatures using three proteins. Several markers were found in more than one signature, including tumor markers like CEA (24 studies), CA19-9 (10 studies), CA242 (5 studies) and CA125, CA153, CA724, CYFRA21-1 (in 3 studies each). Other frequently used markers like Osteopontin and Separase were included in five and four studies, respectively, and AAG, A1AT, CO9, ferritin, and HSP60 were included in three different signatures each. Inflammation markers like TNF-alpha, interleukins, and CRP and tumor suppressor markers like anti-p53 were included in only two signatures each.
To avoid overoptimistic results, a signature derived in a specific study needs to be validated in independent samples. Internal evaluation of model performance was performed by 20 of the 36 studies, as shown in Table 2. The split-sampling method, the simplest form of internal validation, was utilized in seven studies.31, 38, 39, 45, 49, 52, 53 More advanced techniques, including variants of cross-validation21, 23, 24, 26, 33, 36, 37, 41, 44, 50 and the .632 bootstrap,25, 55, 56 were used by 10 and 3 studies, respectively. External validation of results was not performed by any study except Werner et al.,27 which validated results from a previous clinical setting study (i.e., Wild et al.41) in a true screening setting. Supplementary File 5 contains a brief description of each validation method.
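As a rough illustration of how the three internal validation schemes differ, the following Python sketch shows a single split-sample partition, a k-fold partition, and the standard .632 bootstrap combination of apparent and out-of-bag error. It is a generic sketch with hypothetical function names and default settings, not the procedure of any included study; the methods actually used are described in Supplementary File 5.

```python
import random


def split_sample(n, test_fraction=0.5, seed=0):
    """One random partition into a training and a test set (indices 0..n-1)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1.0 - test_fraction)))
    return idx[:cut], idx[cut:]


def k_fold(n, k=5, seed=0):
    """k (train, test) partitions; every sample is in a test fold exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def bootstrap_632(apparent_error, out_of_bag_error):
    """.632 estimator: weighted mix of the optimistic apparent error and the
    pessimistic out-of-bag error from bootstrap resamples."""
    return 0.368 * apparent_error + 0.632 * out_of_bag_error


train, test = split_sample(100)
print(len(train), len(test))

for train, test in k_fold(23, k=5):
    print(len(train), len(test))

# Hypothetical error rates, for illustration only.
print(bootstrap_632(apparent_error=0.05, out_of_bag_error=0.20))
```

A split-sample estimate depends on one random partition, whereas cross-validation and the bootstrap average over many partitions or resamples, which is why they are generally considered the more stable options.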
The diagnostic indicators of all the selected studies as shown in Table 2 varied greatly, with sensitivities and specificities ranging from 27–97% and 32–99%, respectively. The P value specifying the level of significance in test performance between cases and controls was stated in only 13 studies and ranged from 0.249 to <0.001. The area under the curve (AUC) was reported by 19 out of 23 studies published in the last five years and ranged from 0.62 (ref. 38) to 0.996 (ref. 43). Notably, AUCs for CRC were much lower for studies conducted in true screening settings (range 0.62–0.78) than in studies conducted in clinical settings (range 0.68–0.996), and the AUCs of studies conducted in true screening settings were much lower than the summary AUC for FITs derived from such studies (0.95, 95% CI 0.93–0.97) in a recent meta-analysis.57
Among 21 studies that performed some form of validation measure, best diagnostic performance (94% sensitivity, 98% specificity, and AUC of 0.988) was reported by Zhang et al.45 for a panel of tumor markers (CA199, CA242, CA125, CA153, and CEA). However, this study was conducted in a clinical setting and the time of recruitment and blood collection was unclear. Among 15 studies that did not employ any form of validation, Pengjun et al.43 reported best diagnostic performance (96% sensitivity, 99% specificity, and AUC of 0.996) for a combination of inflammatory markers (IL-8, MMP-2, and TNF-α). Again, this study was conducted in a clinical setting with recruitment of clinically detected rather than screening-detected cases.
All of the protein signatures were identified or evaluated in just one study, except one signature consisting of the three proteins HSP60, CA19-9, and CEA, which was identical in two studies.39, 42 Hamelin et al.,39 who conducted their study in a French population, reported 47% sensitivity at 90% specificity (J=0.37) and an AUC of 0.77 in a split-sample validation, whereas Hou et al.42 reported a sensitivity of 97% at 91% specificity (J=0.88) and an AUC of 0.906 in a study in a Chinese population without any internal or external validation. Both studies were conducted in clinical settings, and information on the time of recruitment and blood collection was unclear.
A notable difference was observed when a protein signature first evaluated in a clinical setting was later validated in a true screening setting. Wild et al.41 had reported a diagnostic sensitivity of 70% at 95% specificity for differentiating between CRC cases and controls for a signature with CYFRA21-1, ferritin, Osteopontin, anti-p53, seprase, and CEA. However, when these candidate proteins, combined into a five-biomarker blood test (excluding CYFRA21-1, as it was dispensable for the reoptimized algorithm), were later validated in a true screening setting, the diagnostic sensitivity decreased to 42% at 95% specificity.27
Stage distribution of cases reported by 24 studies and stage-specific diagnostic indicators of signatures reported by 11 studies are summarized in Supplementary File 2. Stage-specific numbers of CRC cases were often small, and several studies included a larger proportion of late-stage than early-stage cancers. Sensitivity tended to be higher for the more advanced cancers.
Supplementary File 3 summarizes the results of our assessment of risk of bias using the QUADAS-2 tool. Across the four QUADAS-2 domains, only Werner et al.27 out of 36 studies presented low risk of bias and low applicability concerns in all domains, as this was the only study that collected blood from asymptomatic participants in a true screening setting, analyzed the index test without knowledge of the results of colonoscopy and accounted for analytical, internal, and external validity. Risk of bias was highest for the domain ‘Patient selection’ given that most studies were conducted in a clinical rather than a screening setting. For more than two-thirds of the studies, risk of bias was also high or unclear for the domain ‘Reference standard’, because it was not clearly reported whether all CRC patients and controls received a reference standard colonoscopy. Applicability concerns were common for the ‘Patient selection’ domain because blood was collected in most of the studies either from symptomatic patients or after establishment of clinical diagnosis. Risk of bias was low for all ‘Index Test’ criteria, as none of the included studies conducted or interpreted the test in a way that would conflict with the review question.
This review provides an overview of protein signature studies associated with diagnosis of CRC. Overall, 36 studies published from 1985 to 2017 and investigating 93 different proteins in multimarker signatures were identified. Great discrepancies in estimates of diagnostic performance were observed, with sensitivities ranging from 27 to 97%. However, study setting and study quality were very heterogeneous and require careful consideration in the interpretation of the results.
In recent years, there have been studies from major prospective cohorts like the European Prospective Investigation into Cancer and Nutrition (EPIC)58 and Women’s Health Initiative (WHI).59, 60 In addition, studies were also carried out in screening cohorts like BLiTz (Begleitende Evaluierung innovativer Testverfahren zur Darmkrebs-Früherkennung)61, 62, 63 where both cases and controls were selected from participants of screening colonoscopy. Such studies have major advantages as they represent the target population for CRC screening, ensure fully comparable recruitment conditions for cases and controls and confirm absence of pre-clinical CRC in controls. However, they are costly and time consuming, and the typically low prevalence and number of CRC cases restrict the possibility of subgroup-specific analysis. Thus, almost all previous studies recruited study participants in clinical settings, which may give rise to various potential biases. For example, blood protein marker levels might be affected by diagnostic or early therapeutic interventions or lifestyle and diet modifications following diagnosis. Furthermore, patients recruited in clinical settings typically include higher proportions of advanced stage cases than screening-detected cases, which may often lead to overestimation of sensitivity13 and limit applicability to the target screening population due to spectrum bias.64, 65
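The arithmetic behind this spectrum effect can be sketched in a few lines of Python. Overall sensitivity is the average of the stage-specific sensitivities weighted by the stage distribution of the cases, so the same test appears more sensitive in a case series dominated by advanced cancers. All numbers below are hypothetical and chosen only to show the mechanism; they are not taken from any study in this review.

```python
def overall_sensitivity(stage_counts, stage_sensitivity):
    """Stage-mix-weighted average of stage-specific sensitivities."""
    total = sum(stage_counts.values())
    return sum(
        stage_counts[stage] / total * stage_sensitivity[stage]
        for stage in stage_counts
    )


# Hypothetical stage-specific sensitivities of one signature (stages I-IV).
sensitivity_by_stage = {"I": 0.30, "II": 0.50, "III": 0.60, "IV": 0.80}

# Hypothetical stage distributions of 100 cases in each setting.
clinical_cases = {"I": 15, "II": 25, "III": 30, "IV": 30}
screening_cases = {"I": 45, "II": 25, "III": 20, "IV": 10}

print(round(overall_sensitivity(clinical_cases, sensitivity_by_stage), 2))
print(round(overall_sensitivity(screening_cases, sensitivity_by_stage), 2))
```

With these assumed inputs the same signature yields a lower overall sensitivity in the screening-like case mix, without any change in how the test performs within each stage.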
For the analysis of multiple protein markers, it is essential to fit statistical models to the data. Because models are meant to offer best fit for the specific data analyzed, they will typically provide less-accurate prediction when applied to other study populations or data sets. Hence, validation is crucial to prevent overfitting and overoptimistic estimates of diagnostic performance.66 Internal validation of some form was performed for 20 out of the 36 studies included in this review, and seven of these used the split-sampling method, which may be subject to major variability of results owing to the specific split of the sample.67 Complementary to internal validation, external validation on participants that differ in place, setting, time, and so on from the subjects used to develop a model is crucial for comprehensive evaluation of diagnostic signatures. In the current review, signatures with AUCs as high as 0.996 (ref. 43), 0.988 (ref. 45), and 0.924 (ref. 50) have been identified, but as these signatures were not externally validated, it remains to be seen to what extent the results reported in one study would hold for a separate study population from a different point of time and setting.
Protein signatures and their performance in particular may vary between populations from different parts of the world. One protein signature identified by Hamelin et al.39 in a French study population was identical to that of Hou et al.42 evaluated in a Chinese population. Despite the use of identical panels including commonly assessed proteins such as HSP60, CA19-9, and CEA, reported diagnostic performances were far from identical. Sensitivity and specificity were 47% and 90% in the French study compared with 97% and 91% in the Chinese study. Given that the Chinese study, in contrast to the French study, did not employ any internal validation, the apparently much better performance might partly reflect overfitting and overoptimism, which underlines once more the importance of both internal and external validation in the evaluation of biomarker signatures. Apart from this, no signature was evaluated in both an Asian and a Western population. Also, as the proteins in all signatures in the review are functionally different, it is difficult to say whether any particular class of proteins, such as tumor markers, tumor suppressor markers or inflammatory markers, possesses a higher diagnostic potential. Nevertheless, validation of all the identified proteins in the same study population would yield interesting and generalizable results.
Out of the several methods used in the studies included in this review, immunoassay- and mass spectrometry-based methods were the most commonly used. Both mass spectrometry and immunoassay have a limited dynamic range, with limits of detection in the nanogram range, so several low-abundance proteins are often missed. Whereas mass spectrometry can identify >1,000 proteins per run, immunoassays work for only one target at a time. Nevertheless, as the sample handling and collection procedures were very heterogeneous and the analytical procedures also differed across studies, the influence of the method on the results cannot be clearly established. In addition, several studies in this review used a candidate approach when selecting which proteins to analyze, whereas others used approaches without a priori defined targets. Although it is therefore not possible from this review to derive firm conclusions on which signatures would work best in practice, the overview of identified signatures provided by this review should help in designing studies that directly compare promising signatures in the same study population and under comparable preanalytical conditions.
To our knowledge, this is the first systematic review summarizing the existing literature on protein panels or signatures rather than individual proteins. This is a rapidly emerging field, with 23 out of 36 identified articles published in the past five years only. Given the numerous differences between studies in terms of study populations, sample handling, number, composition, and overlap of proteins assessed, analytic strategies, and proneness to the various types of biases outlined above it is difficult if not impossible to identify one or more specific protein panels to be the most promising. Rather, our review may serve as a basis to identify potentially promising candidates of both individual proteins and protein signatures that ideally should be evaluated in parallel in the same study population in a study conducted in a true screening setting, avoiding or at least minimizing the various types of biases outlined above. Only this way will it be possible to disentangle to what extent the apparent large variations in diagnostic performance may result from differences in study populations, signature compositions or various degrees of bias control.
A number of limitations of our review should be kept in mind. Our search was restricted to English language articles, which could be a source of language bias. Even though a comprehensive literature search was performed in two databases and careful cross referencing was done, it cannot be excluded that relevant articles, especially those published in the “gray literature” were missed. Several studies had to be excluded in full text review because they did not report the diagnostic sensitivity/specificity of the protein signatures. This selective reporting could be a potential source of outcome reporting bias.68, 69 Furthermore, this systematic review only provides an overview and summary of individual studies, whereas a meta-analysis of results was not meaningful on account of multiple differences across the studies.
Although the number of studies assessing diagnostic performance of protein signatures for early detection of CRC has been rapidly increasing in recent years, evidence on diagnostic performance for detection of CRC precursors, such as advanced adenomas, is still very limited. The few studies that addressed this issue generally found diagnostic performance to be rather poor and much worse than diagnostic performance for CRC. This property is shared with other, established screening approaches, such as fecal occult blood testing. For the time being, thoroughly validated blood protein signatures cannot compete with FITs in terms of diagnostic performance in detecting CRC. There might be a large potential for protein signatures to enhance CRC screening in the future if sensitive analytical techniques could be developed that would more reliably allow detecting proneness to and presence of advanced adenomas in addition to CRC.
This review identified a large number of recent studies exploring protein panels or signatures (rather than single proteins) for diagnosis of CRC. Despite some apparently very promising results, blood-based protein signatures are not ready to replace established screening and surveillance methods for CRC, such as FITs, flexible sigmoidoscopy or colonoscopy. Even though a number of studies have reported promising diagnostic performance comparable to that of FITs, thorough validation in studies conducted in true screening settings, ideally in comparison with FITs and using screening colonoscopy as the gold standard reference test in all participants, remains essential. If the apparent good performance of promising signatures can be confirmed in such settings, they might become a promising alternative for CRC screening, given that higher adherence rates might be achieved in blood-based compared to stool-based screening. In addition, rapid progress in technologies for both targeted and untargeted proteomics screening may open new arenas for substantial future development of highly informative protein signatures.
Guarantor of the article: Megha Bhardwaj.
Specific author contributions: Hermann Brenner designed and supervised the study. Megha Bhardwaj carried out the literature search and drafted the manuscript. Megha Bhardwaj and Anton Gies extracted the data from the eligible studies. All authors critically reviewed, contributed to and approved the final manuscript.
Financial support: None.
Potential competing interests: The German Cancer Research Center has received industrial grants related to blood markers for early detection of colorectal cancer from Epigenomics, Applied Proteomics, Roche Diagnostics, and Volition. Megha Bhardwaj had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.