Correspondence to: Dr. Kerry Thoirs, International Centre of Allied Health Evidence, University of South Australia, City East Campus, North Terrace, Adelaide, 5000, South Australia, Australia. firstname.lastname@example.org
Telephone: +61-8-83022903 Fax: +61-8-83021818
AIM: To identify and assess studies reporting the diagnostic performance of ultrasound imaging for identifying chronic liver disease (CLD) in a high risk population.
METHODS: A search was performed to identify studies investigating the diagnostic accuracy of ultrasound imaging for CLD. Two authors independently used the quality assessment of diagnostic accuracy studies (QUADAS) checklist to assess the methodological quality of the selected studies. Inter-observer reliability of the QUADAS tool was assessed by measuring the degree of agreement (percent agreement, κ statistic) between the reviewers for each assessment prior to a consensus meeting. The characteristics of each study population, sensitivity and specificity results for the index tests, and results of any testing for observer agreement were extracted from the reports. Receiver Operating Characteristic (ROC) plots were generated using Microsoft Excel 2003 software and used to display the diagnostic performance data graphically and to explore the relationships of the reported ultrasound techniques with study characteristics and methodological quality.
RESULTS: Twenty-one studies published between 1991 and 2009 were retained for data extraction, analysis and assessment of methodological quality. Methodological quality was assessed on the 21 selected studies by two independent reviewers (RA & KT) using the QUADAS assessment tool. Across all studies, the mean number of responses per study within the QUADAS assessment tool was 10 (range 7-13) for “Yes”, 1 (range 0-3) for “No” and 3 (range 0-6) for “Unclear”. Inter-rater agreement for assessment of methodological quality was significantly greater than chance for the items assessing representative spectrum, clear selection criteria, appropriate delay between reference and index tests, adequate descriptions of the index and reference tests, reference and index test blinding, and availability of relevant clinical information. Seven studies reported moderate to high observer agreement for ultrasound techniques. Studies that clearly reported blinding reported higher diagnostic accuracy than the other studies, and diagnostic accuracy was lower in populations with a lower prevalence of disease. Assessment of the liver surface using ultrasound consistently had moderate diagnostic accuracy across studies with good research methodology. Other techniques demonstrated variable or poor to fair diagnostic accuracy.
CONCLUSION: Ultrasound of the liver surface is a useful diagnostic tool in patients at risk of CLD when assessing whether they should undergo a liver biopsy.
Chronic liver disease (CLD) is a significant cause of morbidity and mortality in developed nations. It is commonly caused by viral hepatitis and alcohol abuse with significant contributions from metabolic disorders. Accurate diagnostic testing for CLD to identify asymptomatic patients in a high risk population has become more important due to recent advances in management and treatment options that provide better patient outcomes if the diagnosis of fibrosis or cirrhosis can be made before cirrhosis becomes clinically apparent. In some cases, liver fibrosis has been demonstrated to be reversible, a phenomenon that was previously not considered possible.
The standard method for diagnosing, staging and grading CLD is liver biopsy. The invasiveness of this method and its associated morbidity and mortality have led to the emergence of less invasive methods, which include medical imaging techniques (computed tomography, magnetic resonance imaging and ultrasound), serum markers (both direct and indirect markers of fibrosis) and transient elastography. All of these techniques have the potential to reduce the number of biopsies performed in a high risk population.
Ultrasound can identify the manifestations of CLD such as liver fibrosis and cirrhosis which are characterized by the presence of vascularized fibrotic septa and regenerating nodules[1,5-7]. Ultrasound is an attractive diagnostic tool because it is readily available, inexpensive, well tolerated and is already extensively used in the diagnostic work-up of patients with CLD. The diagnostic accuracy of ultrasound needs to be established to inform clinicians of its role in patients at high risk of CLD.
The aim of the following systematic review was to identify and assess studies reporting the diagnostic performance of ultrasound imaging for identifying CLD in a high risk population.
A search of electronic databases was performed in November 2009 by one author (RA) to identify studies, reported in English, investigating the diagnostic accuracy of ultrasound imaging for CLD. MEDLINE, EMBASE, CINAHL and Science Citation Index databases were searched using the terms “chronic liver disease”, “cirrhosis”, “fibrosis” and “liver biopsy”. The truncated terms “sonograph*” and “ultraso*” were also used to capture alternative terms for ultrasound such as sonography, sonographic, ultrasonic, ultrasound and ultrasonography. A Boolean search strategy combining these terms was used in the following form: (sonograph* OR ultraso*) AND (chronic liver disease OR cirrhosis OR fibrosis) AND liver biopsy. No search filters were used. “Pearling” of the reference lists of all selected studies was also performed.
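To illustrate how truncation and the Boolean operators combine, the sketch below evaluates the query above against a single title or abstract. It is a simplified illustration only: the helper names (`truncation_to_regex`, `any_term`, `matches_review_query`) are hypothetical, and real bibliographic databases apply their own truncation, phrase-matching and field-searching rules, which this sketch does not reproduce.

```python
import re


def truncation_to_regex(term: str) -> re.Pattern:
    """Convert a search term into a case-insensitive regex.

    A trailing '*' (truncation) matches any word beginning with the stem,
    e.g. 'ultraso*' matches 'ultrasound' and 'ultrasonography'.
    Terms without '*' (including phrases) must match as whole words.
    """
    if term.endswith("*"):
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def any_term(text: str, terms: list[str]) -> bool:
    """OR: true if at least one term in the group occurs in the text."""
    return any(truncation_to_regex(t).search(text) for t in terms)


def matches_review_query(text: str) -> bool:
    """AND of the three OR-groups in the search string (simplified model)."""
    return (
        any_term(text, ["sonograph*", "ultraso*"])
        and any_term(text, ["chronic liver disease", "cirrhosis", "fibrosis"])
        and any_term(text, ["liver biopsy"])
    )


print(matches_review_query("Sonographic assessment of cirrhosis compared with liver biopsy"))  # True
print(matches_review_query("Ultrasonography of hepatic steatosis"))  # False
```

The second example fails because it matches only the first OR-group; every AND-joined group must be satisfied for a record to be retrieved.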
One author (RA) determined the eligibility of studies for inclusion in this review. Inclusion and exclusion criteria were designed to identify studies likely to meet the Level II criteria of the National Health and Medical Research Council (NHMRC) of the Australian Government, the highest level of evidence for primary studies of diagnostic tests.
The inclusion and exclusion criteria for the systematic review are described in Table 1. Initially, the abstracts of all identified studies were assessed against the inclusion and exclusion criteria. Studies were retained if they clearly met the inclusion criteria and did not meet the exclusion criteria, or if it was unclear from the abstract whether they met the criteria. The full text reports of all retained studies were then re-assessed for inclusion. Studies clearly meeting any exclusion criterion were excluded, and studies meeting all inclusion criteria were retained for assessment of methodological quality, data extraction and analysis.
Two authors (RA, KT) independently used the quality assessment of diagnostic accuracy studies (QUADAS) checklist to assess the methodological quality of the selected studies. The QUADAS checklist (Table 2) contains 14 assessment items, each assessing an aspect of the study that affects methodological quality. Each author rated each assessment item for each study as “yes”, “no” or “unclear”. The studies were not given an overall score, nor were they stratified into high or low quality groups. Inter-observer reliability of the QUADAS tool was assessed by measuring the degree of agreement (percent agreement, κ statistic) between the reviewers for each assessment prior to a consensus meeting. A consensus meeting was then held to resolve any discrepant scores between the two assessors, and a third independent assessor (MP) reviewed the discrepant scores and acted as final adjudicator if consensus could not be reached.
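As a sketch of the agreement statistics described above, the code below computes percent agreement and Cohen's κ for two reviewers' ratings of one QUADAS item across several studies, where κ = (p_o − p_e)/(1 − p_e), p_o is the observed agreement and p_e is the agreement expected by chance from each reviewer's marginal rating frequencies. The ratings are hypothetical, and the verbal bands follow the commonly used Landis and Koch scale, which is assumed here rather than taken from the review.

```python
import math
from collections import Counter


def percent_agreement(r1: list[str], r2: list[str]) -> float:
    """Percentage of studies given the same rating by both reviewers."""
    agree = sum(a == b for a, b in zip(r1, r2))
    return 100.0 * agree / len(r1)


def cohens_kappa(r1: list[str], r2: list[str]) -> float:
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each reviewer's marginal proportions.
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
    if p_e == 1.0:
        return math.nan  # undefined when both raters use a single category
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label for kappa (Landis and Koch bands, assumed here)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings of one QUADAS item for 10 studies.
rev_a = ["yes", "yes", "yes", "no", "unclear", "yes", "yes", "unclear", "yes", "no"]
rev_b = ["yes", "yes", "unclear", "no", "unclear", "yes", "yes", "yes", "yes", "no"]
k = cohens_kappa(rev_a, rev_b)
print(percent_agreement(rev_a, rev_b), round(k, 2), landis_koch(k))  # 80.0 0.64 substantial
```

The example shows why κ is reported alongside percent agreement: 80% raw agreement shrinks to κ ≈ 0.64 once the agreement expected from the reviewers' tendency to answer “yes” is removed.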
The characteristics of each study population were extracted from the reports and included country of origin, sample size, gender, aetiology, age (mean, range and SD), exclusion and inclusion criteria, severity of disease, prevalence, staging system of liver biopsy, and the ultrasound technique(s) used. Sensitivity and specificity results for the index tests were extracted from the reports or from constructed contingency tables. The results of any testing for observer agreement were also extracted.
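Where a report gave only sensitivity and specificity, a contingency table can be reconstructed if the numbers of biopsy-positive and biopsy-negative patients are known. The sketch below shows one common way to do this and to recompute the rates; the class and function names and the example study (60 biopsy-positive and 40 biopsy-negative patients) are hypothetical, and rounding to whole patients is an assumption that can introduce small discrepancies from published figures.

```python
from dataclasses import dataclass


@dataclass
class ContingencyTable:
    """2x2 table of the index test (ultrasound) against the reference test (biopsy)."""
    tp: int  # ultrasound positive, biopsy positive
    fp: int  # ultrasound positive, biopsy negative
    fn: int  # ultrasound negative, biopsy positive
    tn: int  # ultrasound negative, biopsy negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def table_from_rates(sens: float, spec: float, n_pos: int, n_neg: int) -> ContingencyTable:
    """Rebuild counts from reported rates and the numbers of biopsy-positive
    (n_pos) and biopsy-negative (n_neg) patients, rounding to whole patients."""
    tp = round(sens * n_pos)
    tn = round(spec * n_neg)
    return ContingencyTable(tp=tp, fp=n_neg - tn, fn=n_pos - tp, tn=tn)


table = table_from_rates(sens=0.70, spec=0.90, n_pos=60, n_neg=40)
print(table)  # ContingencyTable(tp=42, fp=4, fn=18, tn=36)
print(table.sensitivity, table.specificity)  # 0.7 0.9
```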
Receiver Operating Characteristic (ROC) plots were generated using Microsoft Excel 2003 software and used to display the diagnostic performance data graphically and to explore the relationships between the reported ultrasound techniques and study characteristics. To demonstrate any patterns and relationships between methodological quality and diagnostic accuracy, plots were also produced for items on the QUADAS checklist.
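Each reported sensitivity/specificity pair becomes one point in ROC space, with the false positive rate (1 − specificity) on the x-axis and sensitivity on the y-axis. The sketch below prepares such points grouped by technique category; it also computes Youden's index (sensitivity + specificity − 1), the vertical distance of a point above the chance diagonal, as a convenient summary that is not reported in the review. The report values are hypothetical, and the plotting itself is left to any charting tool (the review used Excel).

```python
from typing import NamedTuple


class Report(NamedTuple):
    """One reported sensitivity/specificity pair for an ultrasound technique."""
    group: str
    sensitivity: float
    specificity: float


def roc_point(r: Report) -> tuple[float, float]:
    """ROC space coordinates: x = false positive rate, y = sensitivity."""
    return (1.0 - r.specificity, r.sensitivity)


def youden_j(r: Report) -> float:
    """Height of the point above the chance diagonal y = x (0 = chance, 1 = perfect)."""
    return r.sensitivity + r.specificity - 1.0


def points_by_group(reports: list[Report]) -> dict[str, list[tuple[float, float]]]:
    """Collect ROC points per technique group, ready to plot as separate series."""
    groups: dict[str, list[tuple[float, float]]] = {}
    for r in reports:
        groups.setdefault(r.group, []).append(roc_point(r))
    return groups


# Hypothetical reports, for illustration only.
reports = [
    Report("scoring system", 1.00, 1.00),
    Report("high frequency grey scale", 0.50, 0.50),
    Report("high frequency grey scale", 0.65, 0.90),
]
print(points_by_group(reports))
print([round(youden_j(r), 2) for r in reports])  # [1.0, 0.0, 0.55]
```

A perfect test sits at the top-left corner (0, 1), while a point on the diagonal, such as the second report, performs no better than chance.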
No previous systematic reviews addressing the diagnostic accuracy of ultrasound in liver fibrosis or cirrhosis were identified. A total of 1355 separate studies were identified from the following databases: MEDLINE (n = 464), EMBASE (n = 1155), CINAHL (n = 18) and Science Citation Index (n = 639). After initial assessment of the abstracts against the inclusion and exclusion criteria, 38 studies remained [MEDLINE (n = 33), EMBASE (n = 3), Science Citation Index (n = 2)]. Pearling of the reference lists of these 38 studies revealed an additional 8 studies (n = 46). After assessment of the full text reports of these 46 studies against the selection criteria, a further 25 studies were excluded, leaving 21 studies for data extraction, analysis and assessment of methodological quality.
Methodological quality was assessed on the 21 selected studies by two independent reviewers (RA & KT) using the QUADAS assessment tool. Inter-rater agreement for each item, across all studies, was assessed by calculating the percentage agreement and kappa value (κ) (Table 3). Where the reviewers disagreed, consensus was reached without the need for an independent adjudicator.
Across all studies the mean number of responses within the QUADAS assessment tool was 10 (range 7-13) for “Yes”, 1 (range 0-3) for “No” and 3 (range 0-6) for “unclear”.
The studies included in this review were published between 1991 and 2009. The characteristics of the study populations are reported in Table 4.
The method for staging the histology obtained at liver biopsy was either not reported or unclear in 5 studies, all of which were published before 2000. Across the other 16 studies, seven staging systems were used: METAVIR (n = 7), Ishak (n = 3), Desmet (n = 2) and four other systems, each used once[14-17].
Seven studies reported observer agreement assessment of the ultrasound technique[18-24]. When reported, results for observer agreement were acceptable, with κ values ranging from 0.51-0.93, coefficient of variation values ranging from 2%-8%, and correlation coefficients ranging from 0.82-0.9.
Diagnostic accuracy was determined for a range of ultrasound techniques across all studies. There were 48 reports of diagnostic accuracy for specific ultrasound techniques within the 21 included studies. Thirty different ultrasound techniques were reported, of which 23 were reported once and seven were reported multiple times. The ultrasound techniques could be broadly grouped into four categories: (1) low frequency grey scale imaging, in which the liver parenchyma, liver shape and size, spleen size and hepatic vessel appearance or calibre were assessed using a low frequency (≤ 5 MHz) convex or sector transducer (n = 14 reports); (2) high frequency grey scale imaging, in which the liver surface was assessed using a high frequency (> 5 MHz) linear array transducer (n = 8 reports); (3) Doppler techniques, in which a Pulsed Wave (PW) Doppler study of the portal, hepatic and splenic veins and/or the hepatic artery was performed to obtain maximum or mean velocities, ratios and/or indices of resistance and/or pulsatility, and/or subjective assessments of haemodynamic waveforms (n = 19 reports); and (4) scoring systems, in which more than one technique and/or parameter from categories 1-3 was combined to provide a quantitative or qualitative assessment (n = 7 reports).
The diagnostic accuracy of the ultrasound techniques, by group, is shown in Table 5.
An ROC plot (Figure 1A) was generated for all 48 reports of diagnostic accuracy according to the predetermined broad group categories. One scoring system achieved perfect results, while one report of the high frequency liver surface technique indicated performance no better than chance.
An ROC plot (Figure 1B) was generated for the ultrasound techniques that were reported more than once. Results for liver echogenicity were consistent but showed poor diagnostic accuracy[27,28]; results for hepatic vein pulsatility were highly variable[18,29,30]; results for liver parenchyma[17,23,31], portal vein maximum velocity[23,30,32] and spleen size[16,23,30] were variable; results for the caudate to right lobe ratio were consistent but only fair in diagnostic accuracy; and results for the liver surface consistently showed moderate diagnostic accuracy[18,19,23,31,33,34], except for two outlying reports[26,35].
Reference test blinding (QUADAS item 11) was the only item of methodological quality that showed an obvious trend when diagnostic accuracy was plotted in ROC space; most studies that clearly reported blinding performed better than the other studies (Figure 1C).
ROC plots of diagnostic accuracy across disease characteristics (histology staging definition, prevalence, disease aetiology and severity of disease) showed no obvious patterns, except that diagnostic accuracy was generally lower in populations with a lower prevalence of disease (Figure 2).
The aim of this review was to assess the results and quality of studies reporting the diagnostic accuracy of ultrasound imaging techniques used to identify patients with CLD in a high risk population. The search was restricted to ultrasound imaging techniques. Transient elastography, which has demonstrated good diagnostic performance and is becoming more widely used in hepatology practice, was not included because it is a non-imaging technique and is not currently an option on standard ultrasound equipment. A review establishing the performance of stand-alone ultrasound is useful because ultrasound scans are often provided by medical imaging departments that do not have access to elastography.
As recommended by the Cochrane Collaboration, the search strategy was optimized for sensitivity rather than precision, and no filters that could restrict the search were used. Efforts to identify as many relevant studies as possible included extending the search to databases beyond MEDLINE and EMBASE, reading the abstracts of all identified studies and “pearling” of reference lists. Pearling was particularly valuable, identifying an additional eight studies; however, relevant studies may have been missed because the search strategy did not include the grey literature and was restricted to English. Across the studies in this review, the described ultrasound techniques varied widely in complexity and in clarity of description.
Methodological quality of the included studies was assessed with the QUADAS quality assessment tool, an independently validated method recommended by the Cochrane Collaboration. As recommended, the QUADAS tool was modified for the specific needs of the review. Inter-rater agreement testing showed good agreement for most QUADAS items, with nine of the 14 items showing substantial or almost perfect agreement. At the consensus meeting, differences in QUADAS ratings were found to stem mainly from differing interpretations of the item guidelines. Involving both reviewers in formulating the guidelines may have produced clearer guidelines and more consistent interpretations.
No group of studies was clearly superior to the rest, nor was any group markedly inferior; therefore, all studies in the review were assessed for diagnostic accuracy. Blinding was the only item of methodological quality that showed a relationship with diagnostic accuracy: studies reporting blinding for the reference test also reported higher diagnostic accuracy than studies that did not. This finding lends further support to the studies reporting higher diagnostic accuracy, because the risk of bias in these reports is reduced.
The only study characteristic that showed a relationship to diagnostic accuracy was prevalence, with studies reporting low prevalence also tending to have lower diagnostic accuracy. Whilst this may seem surprising, as sensitivity and specificity should be independent of prevalence, it has recently been shown that prevalence can affect diagnostic accuracy due to clinical or artefactual variability in studies.
Liver biopsy was chosen as the reference test in this review, although it is a less than ideal reference test because it has a significant false negative rate due to difficulties with the biopsy technique and sampling error. We chose it because it is the test used in clinical practice and is the only practical choice of reference test. Whilst laparoscopy may be more accurate, it is much more invasive, carries significantly more risk and is generally not used in routine clinical practice. Studies using laparoscopy as the reference test were excluded, because including more than one reference test could introduce differential verification bias.
Studies were included if the diagnostic accuracy results were given either as true positive (TP), false positive (FP), true negative (TN) and false negative (FN) counts or simply as sensitivity and specificity. Restricting inclusion to studies that reported full results (TP, FP, TN, FN) would have reduced the range of studies included. Although this would potentially have enabled the use of forest plots and meta-analysis to assess diagnostic accuracy, these were not performed because the number of studies of sufficiently similar techniques was too small to provide meaningful results. Instead, all studies included in this review were analysed visually using the ROC plot technique. This provided an effective method for comparing data and exploring the relationship between diagnostic accuracy and the quality and characteristics of the studies. The area under the ROC curve for the various ultrasound techniques was not calculated because the raw data needed to do so were not reported.
Across all studies there was wide variation both in the ultrasound techniques used and in the reported sensitivities and specificities for liver fibrosis and cirrhosis. For ultrasound to be clinically useful as a test that reduces the number of patients requiring liver biopsy, it needs to confirm chronic liver disease accurately. To be effective, it should have a low false positive rate, resulting in high specificity and a high positive predictive value. In this way, patients with positive ultrasound results may be able to avoid the risks of liver biopsy. Two studies[22,25] stand out as having very high specificity (100% and 97.6%, respectively) and very high sensitivity (100% and 87.8%, respectively). Both studies used scoring systems, which suggests that scoring systems may be the best method of identifying severe fibrosis and cirrhosis; however, these results need to be treated with caution. The scoring systems used in both studies were complex, subjective and relied on compounding several ultrasound techniques. The use of multiple techniques[20,22,25,31,39] raises concerns about reproducibility, as variation may occur with each method used and be magnified when methods are compounded. It is also a concern that in one of these studies it was unclear whether blinding had been used, whether there were any subject withdrawals, and how the selection criteria, the reference test and the scoring system were applied. In contrast, the other study scored very well for methodological quality, except that observer agreement was not reported.
The reporting of observer agreement was poor in many of the reviewed studies, despite it being an important consideration when assessing the usefulness of a diagnostic test. In the absence of agreement reporting, we assessed the consistency of results across studies reporting similar techniques as a proxy for the reproducibility of a technique. Confidence in a study’s results can be increased if the technique has been reported in multiple studies with consistent results. We could make this assessment for the following ultrasound techniques: liver echogenicity, caudate lobe to right lobe ratio, portal vein maximum velocity, hepatic vein pulsatility, liver parenchyma echo-pattern, spleen size and liver surface.
The results for portal vein maximum velocity, hepatic vein pulsatility, liver parenchyma echo-pattern and spleen size were inconsistent between studies.
The two studies that measured liver echogenicity[27,28] consistently reported poor diagnostic accuracy. Liver echogenicity is known to be associated with liver steatosis but not with fibrosis, so this result is not surprising. The caudate lobe to right lobe ratio showed consistent diagnostic accuracy across two studies[18,31], with high specificity (> 90%) and low sensitivity (41% and 32%, respectively). The liver surface technique was the most frequently reported technique (n = 8 reports). Diagnostic accuracy was consistent across six of these studies, with high specificities (78%-95%) and moderate sensitivities (51%-73%)[18,19,23,30,32,34], and these studies were of reasonable or good methodological quality. The two remaining studies reporting the liver surface technique[26,35] produced outlying results compared with the other six and contained methodological flaws serious enough that their findings were not accepted. In one study, the description of the patient spectrum or selection criteria was unclear, and the reported prevalence of CLD was low, so the sample did not represent the high risk population of interest in this review. The other study scored poorly for verification and differential verification bias and had a significant number of unexplained withdrawals.
Consistent results from methodologically sound diagnostic studies make assessment of the liver surface an appealing technique to apply in the clinical environment. The technique also appeared simple to implement, was clearly defined in the reports, and used a simple dichotomous classification of the liver surface as normal or abnormal. Three of these studies[18,19,23] also reported substantial inter- and/or intra-observer agreement. Although these studies did not demonstrate high sensitivities, the high specificity, and therefore high positive predictive value in a high risk population, indicates that this technique should accurately identify patients who have a high likelihood of severe fibrosis or cirrhosis and who may benefit from avoiding the risks associated with liver biopsy.
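Positive predictive value depends on prevalence as well as on sensitivity and specificity, which is why high specificity translates into a high PPV mainly in a high risk population. The sketch below applies Bayes' theorem using the upper end of the liver surface ranges reported above (sensitivity 73%, specificity 95%); the prevalence values of 50% and 10% are hypothetical, chosen only to show the effect.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Positive predictive value by Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value by Bayes' theorem."""
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


# Upper end of the reported liver surface ranges; prevalences are hypothetical.
SENS, SPEC = 0.73, 0.95
for prev in (0.5, 0.1):
    print(f"prevalence {prev:.0%}: PPV {ppv(SENS, SPEC, prev):.0%}, "
          f"NPV {npv(SENS, SPEC, prev):.0%}")
# prevalence 50%: PPV 94%, NPV 78%
# prevalence 10%: PPV 62%, NPV 97%
```

With the same test characteristics, PPV falls from about 94% to about 62% as prevalence drops, while the modest NPV at high prevalence reflects why a negative ultrasound does not rule out CLD.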
In conclusion, a wide range of ultrasound techniques has been reported in the literature and investigated for diagnostic accuracy in identifying CLD in a high risk population. The most robust ultrasound technique for assessment of CLD appears to be assessment of the liver surface. Studies investigating the liver surface technique consistently demonstrated good observer agreement and high specificity. This review has shown that assessment of the liver surface is a useful screen for patients at risk of CLD, to help determine who should undergo liver biopsy.
Chronic liver disease (CLD) is a significant cause of morbidity and mortality. Accurate diagnostic testing to identify early CLD in asymptomatic patients at high risk is advantageous due to recent management and treatment advances. Biopsy, which is the current method of choice, is invasive and carries a significant risk. Less invasive techniques have the potential to reduce biopsy numbers. Ultrasound is one such technique which is readily available, inexpensive and well-tolerated. However, there are several ultrasound techniques in current practice. For an ultrasound study to be clinically useful it has to demonstrate accuracy in confirming CLD. This systematic review informs clinicians of the usefulness of ultrasound in early diagnosis of CLD in high risk patients, in particular, which method is shown to be the most specific and sensitive.
No published systematic reviews addressing the diagnostic accuracy of ultrasound in CLD have been identified.
This rigorous systematic review identifies methodological and/or reporting flaws in several of the selected papers. It also highlights the variety and range of diagnostic ultrasound techniques for liver examination in CLD in current usage. This review demonstrates that the most robust ultrasound technique for assessment of CLD appears to be high frequency ultrasound assessment of the liver surface.
The high specificity of ultrasound of the liver surface gives clinicians confidence that, if signs of CLD are evident, the condition is present. Because sensitivity is only moderate, the absence of ultrasound signs of CLD does not exclude the disease, and a liver biopsy may still be needed to determine whether CLD is present. Performing high frequency ultrasound of the liver surface in high risk patients therefore has the potential to reduce the number of biopsies in this group.
Pulsed-wave Doppler: A technique by which the ultrasound machine can determine the velocity of blood flowing in vessels. It also allows evaluation of the direction and character of the blood flow. Pulsed-wave Doppler is displayed as a spectral waveform on the screen.
Maximum velocity: The velocity of blood cells flowing along a vessel varies according to their position within the vessel. The maximum velocity is the greatest velocity detected in a particular vessel in a selected area.
Pulsatility and resistance indices: The spectral waveform allows the pulsatility of blood flow to be quantified by calculations using the maximum, minimum and mean velocities displayed. The indices indicate resistance to blood flow in the vessel, and variation from normal may indicate disease, either in the vessel itself or in the organ it supplies.
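The indices described above can be computed from the maximum-velocity envelope of the spectral waveform. The sketch below assumes the conventional Gosling pulsatility index, PI = (Vmax − Vmin)/Vmean, and Pourcelot resistance index, RI = (Vmax − Vmin)/Vmax, with the envelope sampled at equal time intervals over one cardiac cycle. The included studies may have used other definitions (particularly for hepatic veins, whose waveform can include reversed flow and a mean velocity near zero), and the sample velocities are hypothetical.

```python
def doppler_indices(envelope: list[float]) -> dict[str, float]:
    """Pulsatility and resistance indices from a maximum-velocity envelope.

    `envelope` holds velocities (cm/s) sampled at equal time intervals over one
    cardiac cycle, so the arithmetic mean approximates the time-averaged velocity.
    """
    v_max = max(envelope)
    v_min = min(envelope)
    v_mean = sum(envelope) / len(envelope)
    return {
        "v_max": v_max,
        "v_min": v_min,
        "v_mean": v_mean,
        "pulsatility_index": (v_max - v_min) / v_mean,  # Gosling PI (assumed)
        "resistance_index": (v_max - v_min) / v_max,    # Pourcelot RI (assumed)
    }


# Hypothetical arterial envelope over one cardiac cycle (cm/s).
indices = doppler_indices([80.0, 60.0, 40.0, 30.0, 30.0, 40.0])
print(round(indices["pulsatility_index"], 2), indices["resistance_index"])  # 1.07 0.625
```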
This is a well written review on the quality and accuracy of ultrasound imaging techniques for identifying patients with chronic liver disease.
Supported by The Division of Medical Imaging at Flinders Medical Centre, Flinders Drive, Bedford Pk, 5042, South Australia, Australia
Peer reviewers: Dr. Markus Reiser, Professor, Gastroenterology-Hepatology, Ruhr-University Bochum, Bürkle-de-la-Camp-Platz 1, Bochum 44789, Germany; Mirko D’Onofrio, MD, Assistant Professor, Department of Radiology, GB Rossi University Hospital, University of Verona, Piazzale LA Scuro 10, Verona, 37134, Italy; Marko Duvnjak, MD, Department of Gastroenterology and Hepatology, Sestre milosrdnice University Hospital, Vinogradska cesta 29, 10 000 Zagreb, Croatia
S- Editor Tian L L- Editor Webster JR E- Editor Ma WH