Correspondence to: Rolf Teschke, MD, Professor of Medicine, Division of Gastroenterology and Hepatology, Department of Internal Medicine II, Klinikum Hanau, Leimenstrasse 20, D-63450 Hanau, Germany. firstname.lastname@example.org
Telephone: +49-6181-21859 Fax: +49-6181-2964211
The diagnosis of herbal hepatotoxicity or herb induced liver injury (HILI) represents a particular clinical and regulatory challenge with major pitfalls for the causality evaluation. On the day HILI is suspected in a patient, physicians should start assessing the quality of the used herbal product, optimizing the clinical data for completeness, and applying the Council for International Organizations of Medical Sciences (CIOMS) scale for initial causality assessment. This scale is structured, quantitative, liver specific, and validated for hepatotoxicity cases. Its items provide individual scores, which together yield causality levels of highly probable, probable, possible, unlikely, and excluded. Once completed with additional information including raw data, this scale with all items should be reported to regulatory agencies and manufacturers for further evaluation. The CIOMS scale is the preferred tool for assessing causality in hepatotoxicity cases, compared to numerous other causality assessment methods, which are inferior on various grounds. Among these disputed methods are the Maria and Victorino scale, an insufficiently qualified, shortened version of the CIOMS scale, as well as various liver unspecific methods such as the ad hoc causality approach, the Naranjo scale, the World Health Organization (WHO) method, and the Karch and Lasagna method. An expert panel is required for the Drug Induced Liver Injury Network method, the WHO method, and other approaches based on expert opinion, which provide retrospective analyses with a long delay and thereby prevent a timely assessment of the illness in question by the physician. In conclusion, HILI causality assessment is challenging and is best achieved by the liver specific CIOMS scale, avoiding pitfalls commonly observed with other approaches.
Core tip: This review focuses on diagnostic causality assessment algorithms that have been used so far in herb induced liver injury (HILI) cases. Detailed information on the various methods with their strengths and weaknesses is provided, including the challenges and pitfalls that emerged during assessment. For the physician caring for a patient with suspected HILI, the Council for International Organizations of Medical Sciences (CIOMS) scale is the preferred tool for assessing causality compared to numerous other causality assessment methods, which are inferior on various grounds. CIOMS based assessment should start on the day HILI is suspected to ensure completeness of clinical data.
A total of 60 herbs, herbal drugs, and herbal dietary supplements have been reported to cause herb induced liver injury (HILI), though convincing causality assessment rarely was provided. Presented as a tabular compilation, these 60 different herbal products were based on a recent analysis of 185 case reports, spontaneous reports, review articles, and comments. The consideration of possible hepatotoxicity in various reports has been discussed by the National Institutes of Health (NIH) in their recently released LiverTox database, covering a selected group of herbal and dietary supplement (HDS) products[2,3]. Among these are: Aloe vera, Black cohosh (BC), Cascara, Chaparral, Chinese and other Asian herbal medicines (Ba Jiao Lian, Chi R Yun, Ephedra, Jin Bu Huan, Sho Saiko To and Dai Saiko To, Shou Wu Pian), Comfrey, Fenugreek, Germander, Ginkgo, Ginseng, Glucosamine, Greater Celandine, Green Tea, Hoodia, Horse Chestnut, Hyssop, Kava, Margosa Oil, Milk Thistle, Noni, Pennyroyal, St John’s Wort, Saw Palmetto, Senna, Skullcap, Usnic acid, Valerian, and Yohimbine[2,3]. However, causality confirmation was surprisingly rare for individual cases of suspected herbal hepatotoxicity, which often were published as narrative and anecdotal reports without valid and transparent data collection[1-3] that require stringent efforts for causality attribution.
The focus of this review is on causality assessment methods for herbal hepatotoxicity with particular reference to liver specific evaluation methods. This approach gives insight into challenges and pitfalls of these methods with surprising clinical and regulatory issues. Valid causality assessment of assumed HILI cases is required for further case evaluations, otherwise speculations and fruitless discussions will emerge.
Herbal product quality aspects are of primary concern, and their evaluation should start on the day HILI is suspected. The products are destined for human use and must meet the highest possible quality based on specific standards (Table 1)[4-7]. Despite fulfilment of quality standards, batch and product variability is common[4,8-10]. Therefore, additional specific production quality standards have been described, for instance, as a proposal for a Kava Quality Standardization Code. It details standardization of overall herbal quality and specifically addresses chemical, agricultural, manufacturing, nutritional, regulatory, and legislation standardizations. In addition, the labelling and consumer leaflet of herbal drugs and herbal dietary supplements should mandatorily provide a clear definition and identification of the plant family, subfamily, species, subspecies, and variety as classical botanical description for any herb used as an ingredient of a herbal product (Table 1)[4,8].
As an example, several hundred kava varieties exist[8-11], but specific information on kava variety identification was missing in all spontaneous reports and case report publications of suspected hepatotoxicity. This leaves open which kava variety had to be incriminated[9-17]. On the other hand, the regulatory recommendation for kava drugs was to use its peeled rhizome[8,11,15]. In various HILI cases, it remained unclear, whether unpeeled rhizomes, peeled and unpeeled roots, and/or stem peelings were also used[8,11,16,17]. This again hampered any evaluation of the causative agent of kava hepatotoxicity[16,17]. For both the United States Food and Drug Administration and the Australian Therapeutic Goods Administration, peeled kava rhizomes were recommended for kava supplements[18,19].
Another point of interest focuses on solvents and solubilizers without regulatory advice[8,11,15,16], as well as on adulterants, impurities, contaminants, or misidentified herbs[4,7,8,11]. These key issues of herbal product quality are rarely addressed in publications related to herbal hepatotoxicity[1,4,8-17,20-33].
Other concerns focus on incomplete clinical evaluation. Beginning on the day HILI is suspected, the physician has to gather all necessary information for an accurate diagnosis and the exclusion of alternative causes under relevant clinical aspects (Tables 1 and 2)[1,4,13,14,17,20-26,34-59]. Hepatotoxicity requires strict criteria, best defined by alanine aminotransferase (ALT) and/or alkaline phosphatase (ALP) values. Their increases are expressed in multiples of the upper limit of the normal range, denoted N[60-62]. For ALT, hepatotoxicity has been defined as > 2N[60,62], > 3N, or > 5N, while ALP values of > 2N are commonly considered diagnostic[60,62]. Restricting ALT increases to > 5N will eliminate false positive cases and substantiate causality at a higher level of probability. Considering patients with ALT > 2N will include numerous cases with nonspecific increases, with higher requirements for thorough assessment and more stringent exclusion of causes unrelated to the herb(s) under discussion. With low N thresholds, a higher rate of alternative diagnoses must also be expected[13,14,24-26,35-39], and a missing hepatotoxicity definition results in falsely high case numbers due to overdiagnosing and overreporting[17,23-26,38,39]. Special care is required for reporting of confounding variables[4,13,14,18,24,39]. For clinicians, a checklist with all clinical details is available for most alternative diagnoses (Table 2).
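The effect of the chosen ALT threshold can be illustrated with a minimal sketch. The function names, the default cutoffs passed as parameters, and the laboratory values in the usage note are hypothetical and serve only to show how the same patient may or may not meet the definition depending on the cutoff.

```python
def multiple_of_n(value: float, upper_limit_normal: float) -> float:
    """Express an enzyme activity as a multiple of its upper limit of normal (N)."""
    if upper_limit_normal <= 0:
        raise ValueError("upper limit of normal must be positive")
    return value / upper_limit_normal


def meets_hepatotoxicity_definition(
    alt: float,
    alt_uln: float,
    alp: float,
    alp_uln: float,
    alt_cutoff: float = 5.0,  # assumption: strict > 5N cutoff chosen as default
    alp_cutoff: float = 2.0,  # > 2N, the commonly used ALP criterion
) -> bool:
    """Return True if ALT and/or ALP exceeds its cutoff, both given in multiples of N."""
    alt_n = multiple_of_n(alt, alt_uln)
    alp_n = multiple_of_n(alp, alp_uln)
    return alt_n > alt_cutoff or alp_n > alp_cutoff
```

With a hypothetical ALT of 180 U/L (upper limit 40 U/L, that is 4.5N) and a normal ALP, the case is not counted under the > 5N cutoff but is counted when `alt_cutoff=2.0` is passed, which is why a lower cutoff calls for more stringent exclusion of alternative causes.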
For a pragmatic approach to assess causality, special attention by the physician is of utmost importance. Only this physician can arrange collection and assessment of all data, thereby providing good data quality. To achieve this, a checklist with all important product and clinical items (Tables 1 and 2) and a valid causality assessment algorithm (Tables 3-6) should be applied early in the unfolding disease, beginning on the day HILI is suspected. Unless this is done in a stringent way, poor data quality will be provided to the scientific community, regulatory agencies, expert panels, and manufacturers, disabling reevaluation of the case. Initially poor data will produce poor results and are unacceptable. Complete and excellent case data including raw data provided by the physician are necessary to circumvent later investigative efforts, subsequent discussions, and speculative conclusions.
At each step of the evaluation, full transparency of all data is mandatory. This includes a complete narrative medical history, a causality assessment based on an established algorithm, and presentation of all data as item by item and raw data, ready for reevaluation by other scientists. This is also relevant for case publications and case series analyses, which is indeed feasible as shown in the past[13,14,25,35-39,58]. The same transparency is needed for statements and publications by regulatory agencies and expert panels. Neglecting full transparency will cause concern and uncertainty about the validity of the presented conclusions.
Some reservations exist about the best method for causality assessment in hepatotoxicity cases[1-4,13,14,17,21-26,34-39,59-64]. HILI case series reported in 23 publications with 573 HILI cases used various causality assessment methods[12-14,23,25,34-36,38,39,53,54,65-75]. These can be classified into prospective and retrospective analyses (Table 3).
The prospective evaluation focuses on the physician caring for a patient with suspected liver injury. This setting requires a readily available and time efficient method to evaluate causation that can adapt to further clinical and causality approach necessities. Candidates are the Council for International Organizations of Medical Sciences (CIOMS) scale, also called Roussel Uclaf Causality Assessment Method scale[60-62], the Maria and Victorino (MV) scale, the Naranjo scale, the Karch and Lasagna (KL) method, and the ad hoc approach.
Retrospective evaluations are based on an expert panel evaluating reported or published case data, sometimes going back for months or years. Examples are the Drug Induced Liver Injury Network (DILIN) method[73,80], the World Health Organization global introspection method (WHO method) as defined by the WHO Collaborating Centre for International Drug Monitoring, and the expert opinion[2,3]. Major differences exist (Table 3), especially when assessing items that require score attribution (Table 4).
Analyzing 23 publications of initially assumed causality but not necessarily confirmed later on[12-14,23,25,34-36,38,39,53,54,65-75] with HILI cases by BC, Greater Celandine, Green Tea extracts, some Herbalife products, Hydroxycut, kava, Pelargonium sidoides, and various herbs, the CIOMS scale was applied in 52.2%, the WHO method in 17.4%, the ad hoc approach in 13.1%, the Naranjo scale in 8.7%, and the KL and DILIN methods each in 4.3% of these publications. Similar results were obtained when analyzing the frequency for the 573 cases: the CIOMS scale was used in 275 cases (48.0%), the WHO method in 134 cases (23.4%), the Naranjo scale in 64 cases (11.2%), the ad hoc approach in 63 cases (11.0%), the KL method in 20 cases (3.5%), and the DILIN method in 17 cases (3.0%). For instance, the CIOMS scale was applied for Kava[13,14,67], BC[25,34,71,72], Greater Celandine[35,36], Pelargonium sidoides[38,39], and various herbs, the WHO method for Kava[65,68] and Herbalife products[53,54], the ad hoc approach for Kava[12,66] and Greater Celandine, the Naranjo scale for BC and Green Tea extracts, the KL method for Herbalife products, and the DILIN method for Hydroxycut®.
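The case-based frequencies can be recomputed with a short sketch. The dictionary name is illustrative; the DILIN count is taken as 17, the value that is consistent both with the reported 3.0% and with the stated total of 573 cases.

```python
# Number of HILI cases assessed by each method (total stated in the text: 573).
cases_by_method = {
    "CIOMS scale": 275,
    "WHO method": 134,
    "Naranjo scale": 64,
    "ad hoc approach": 63,
    "KL method": 20,
    "DILIN method": 17,  # assumption: count consistent with 3.0% of 573
}

total_cases = sum(cases_by_method.values())

# Share of each method as a percentage of all cases, rounded to one decimal.
share_percent = {
    method: round(100 * count / total_cases, 1)
    for method, count in cases_by_method.items()
}

for method, share in share_percent.items():
    print(f"{method}: {cases_by_method[method]} cases ({share}%)")
```

Checking that the counts add up to the stated total is a quick way to catch transcription errors in such compilations.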
A systematic analysis of causality methods is also available for DILI cases. In 2008, 61 DILI publications in the PubMed database over the last decade were reviewed. It revealed that in 38 publications (62.3%) no specific causality assessment method was mentioned; presumably, the evaluation was based on the ad hoc approach. The CIOMS scale, Naranjo scale, and WHO method were used in 10, 8, and 2 publications, respectively. Therefore, in HILI and DILI publications the CIOMS scale was the preferred specific causality assessment method if the unstructured ad hoc approach is excluded. Physicians are well advised to use the CIOMS scale for HILI causality evaluation, to err on the side of caution.
The NIH LiverTox specifically addressed the item of causality in hepatotoxicity cases[2,3]. It focuses primarily on using the CIOMS scale, which is discussed in detail. Moreover, the MV and Naranjo scales, the Bayesian, and expert opinion assessment are referred to; details of the DILIN causality assessment also are presented. Some strengths and weaknesses of these methods are compiled (Tables 3 and 4).
The method of choice for the causality assessment of suspected HILI is the CIOMS scale in its original form[60,61] or preferably its update (Tables 5 and 6), with the evaluation starting early, on the day the physician assumes this diagnosis. The CIOMS scale is intended for prospective use at the time of manifestation; it does not require expert knowledge, is structured, quantitative, liver specific, and validated for hepatotoxicity (Table 3). Its items provide individual scores, which estimate causality levels for the agent(s) under consideration as highly probable, probable, possible, unlikely, and excluded (Tables 5 and 6). The CIOMS scale takes into account all core elements of hepatotoxicity and thereby has advantages over other algorithms (Table 4). Compared to the ad hoc approach used by regulators, assessment of HILI cases with the CIOMS scale leads to lower causality grades for the incriminated herb and/or for concomitant medications and to more reproducible results due to greater transparency.
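How the individual item scores translate into a causality level can be sketched as follows. The score ranges used here (≤ 0 excluded, 1-2 unlikely, 3-5 possible, 6-8 probable, ≥ 9 highly probable) are those of the published CIOMS/RUCAM scale and are not stated in this section; the item labels and the example scores are hypothetical.

```python
def causality_level(total_score: int) -> str:
    """Map a summed CIOMS score to a causality level.

    Assumption: cutoffs follow the published CIOMS/RUCAM scale.
    """
    if total_score <= 0:
        return "excluded"
    if total_score <= 2:
        return "unlikely"
    if total_score <= 5:
        return "possible"
    if total_score <= 8:
        return "probable"
    return "highly probable"


# Hypothetical item scores for a single case (labels are illustrative only).
item_scores = {
    "time to onset": 2,
    "course after cessation": 2,
    "risk factors": 1,
    "concomitant drugs or herbs": 0,
    "exclusion of alternative causes": 2,
    "previous information on hepatotoxicity": 1,
    "unintentional reexposure": 0,
}

total = sum(item_scores.values())
print(total, causality_level(total))
```

Reporting the full item list rather than only the final level keeps the assessment transparent and open to reevaluation.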
CIOMS was developed by an international expert panel and validated by cases with positive reexposure tests serving as a gold standard[60,61]. CIOMS based assessment has shown good sensitivity (86%), specificity (89%), positive predictive value (93%), and negative predictive value (78%). The scales differ slightly for the hepatocellular and the cholestatic (± hepatocellular) type of injury (Tables 5 and 6). Differentiation between these types is feasible by comparing the ratio of the serum activities of ALT and ALP at diagnosis of suspected herbal hepatotoxicity[60,62]. Enzyme activity is expressed as a multiple of the upper limit of the normal range (N), and the ratio (R) of ALT/ALP is calculated. Liver injury is classified as: (1) hepatocellular, if ALT > 2N alone or R ≥ 5; (2) cholestatic, when there is an increase of ALP > 2N alone or when R ≤ 2; and (3) mixed cholestatic-hepatocellular, if ALT > 2N, ALP is increased, and R between 2 and 5.
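The classification by the ratio R can be written as a small sketch. Both inputs are enzyme activities already expressed as multiples of N. Two points are assumptions of this sketch rather than statements from the text: "alone" is read as the other enzyme being within its normal range (≤ 1N), and cases in which neither enzyme exceeds 2N are returned as "not classified".

```python
def classify_liver_injury(alt_n: float, alp_n: float) -> str:
    """Classify the injury type from ALT and ALP, both in multiples of N."""
    if alt_n <= 0 or alp_n <= 0:
        raise ValueError("enzyme activities must be positive multiples of N")

    # Assumption: without ALT > 2N or ALP > 2N no injury type is assigned.
    if alt_n <= 2 and alp_n <= 2:
        return "not classified"

    r = alt_n / alp_n  # R = ALT/ALP, each as a multiple of N

    # "Alone" is interpreted as the other enzyme not being increased (<= 1N).
    if alt_n > 2 and alp_n <= 1:
        return "hepatocellular"
    if alp_n > 2 and alt_n <= 1:
        return "cholestatic"

    if r >= 5:
        return "hepatocellular"
    if r <= 2:
        return "cholestatic"
    return "mixed"
```

For example, ALT at 12N with ALP at 2N gives R = 6 and a hepatocellular pattern, whereas ALT at 6N with ALP at 2N gives R = 3 and a mixed pattern.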
Strengths and weaknesses of the CIOMS scale have been discussed extensively[2,3,62,73,79,82,85-91]. This scale clearly compiles liver specific criteria for challenge, dechallenge, risk factors, exclusion of unrelated diseases, and comedication, but does not use liver histology data (Tables 5 and 6)[60,62], agreed upon as less helpful criteria in most cases[90,91]. It considers unintentional reexposure results according to criteria as established by previous expert consensus meetings[92,93]. For reexposure results of the hepatocellular type of liver injury, ALT levels are assessed before reexposure (designated as baseline ALT or ALTb), and at reexposure (designated as ALTr). The reexposure test is positive, if (1) ALTb is below 5N with N as the upper limit of the normal value, and (2) ALTr ≥ 2ALTb.
The test is negative, if only one or no criterion is fulfilled; it is uninterpretable, if ALT data are lacking for one or both times. For reexposure assessments of the cholestatic (± hepatocellular) type of liver injury, ALT has to be replaced by ALP. Criteria for positive reexposure tests are included in the updated CIOMS scale (Tables 5 and 6) and were not previously applied in cases with reported positive reexposure tests[40-57,59,91]. When these cases were submitted to retrospective analysis using the reexposure test criteria, a positive reexposure test could be confirmed in only 13/30 cases, the test was negative in 5/30 cases and uninterpretable in 12/30 cases. In 8 cases of initially assumed Herbalife hepatotoxicity with a previously reported positive reexposure test result, retrospective evaluation applying the test criteria revealed that criteria for a positive reexposure were fulfilled in only 1/8 cases, whereas the reexposure test was classified as negative in another case or the data were considered as uninterpretable due to missing information to comply adequately with the criteria in the remaining six cases.
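The reexposure criteria lend themselves to a compact sketch. The function name is illustrative; both values are given in multiples of N and refer to ALT for the hepatocellular type or to ALP for the cholestatic (± hepatocellular) type, with `None` standing for a missing measurement.

```python
from typing import Optional


def reexposure_test(baseline_n: Optional[float], reexposure_n: Optional[float]) -> str:
    """Evaluate an unintentional reexposure from baseline and reexposure values.

    Both values are multiples of N; None marks missing data.
    """
    # Missing data at one or both times: the test cannot be interpreted.
    if baseline_n is None or reexposure_n is None:
        return "uninterpretable"

    baseline_below_5n = baseline_n < 5                  # criterion (1)
    at_least_doubled = reexposure_n >= 2 * baseline_n   # criterion (2)

    # Positive only if both criteria are fulfilled, otherwise negative.
    if baseline_below_5n and at_least_doubled:
        return "positive"
    return "negative"
```

A hypothetical baseline of 3N followed by 7N at reexposure is positive, whereas a baseline of 6N followed by 13N is negative because the first criterion fails despite the doubling.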
The CIOMS scale was widely used for hepatotoxicity assessments in epidemiological studies, clinical trials, case reports, case series, regulatory analyses, and genotyping studies[13,14,24,25,35,36,38,39,58,59,61,64,72,79,84,86,87,90,95-98]. Proposals for refinement and strengthening of the CIOMS scale focused on the weight of individual parameters and risk factors such as alcohol and age, and other shortcomings were addressed[24,87,89,90,98]. However, there is lack of valid data to verify improvements based on reassessing and reevaluating of published approaches[87,89,90,98], calling for new approaches.
Assessment of suspected HILI cases may be problematic in spontaneous reports with insufficient data. Evaluating these cases requires a sophisticated approach, as undertaken by EMA for 31 EU cases of suspected HILI by BC, using the CIOMS scale. This series included 11/31 unassessable cases (35%) due to poor data quality, with causality assessment feasible in 20/31 cases (65%). Among these, EMA specified likely alternative causes in 8/20 cases with diagnoses such as autoimmune hepatitis, DILI, preexisting liver disease, alcoholic hepatitis, and preexisting liver cirrhosis with Stevens Johnson syndrome. Causality for BC was unlikely or excluded in another 6/20 cases and 5/20 cases, respectively. In 1/20 cases, causality was judged as possible by EMA, but upon further evaluation this particular case with insufficient data quality was attributed with an excluded causality. Consequently, in this EMA study group of 31 EU cases there was little evidence of liver injury by BC based on the use of the CIOMS scale, which was most helpful in this particular analysis and provided robust results. The approach of EMA to apply the CIOMS scale in hepatotoxicity cases should be highly appreciated and is in line with the corresponding recommendation by the NIH for their LiverTox database to prefer the CIOMS scale over other methods[2,3].
At present, we are far away from valid data and strict management in suspected HILI cases, which impedes description of classic HILI by the majority of herbs. Possible or likely alternative diagnoses were evident in 278/573 cases (48.5%) of suspected HILI cases; causality assessment was impeded in 165/573 patients (29.0%) due to missing case data or lack of a temporal association, resulting in diagnostic problems in 77.5% of all cases. Given these limitations, actual discussions of validity of reported HILI cases are understandable[82,90,91,94,98-100], and uncertainty also extends to the validity of the type of liver injury reported for some cases lacking a probable or highly probable causality. Considering these restrictions, the hepatocellular type of injury was described for Indian Ayurvedic herbs[72,98], Chaparral (Larrea tridentata)[40,98], Dai Saiko To[47,98], Germander, Green Tea extract, Greater Celandine, Hydroxycut®, Jin Bu Huan (Lycopodium serratum)[45,98], Kava; the cholestatic or mixed type for Chaparral, Germander, Green Tea extract, Greater Celandine, Hydroxycut®; and the veno-occlusive disease for plants containing pyrrolizidine alkaloids such as Senecio, Heliotropium, Crotalaria, and Symphytum species.
In clinical practice, the physician will start on the day HILI is suspected with the CIOMS scale to arrive at an initial estimation and to exclude the most frequent alternative causes, provided point by point in the CIOMS questionnaire (Tables 5 and 6). The practical application of the CIOMS scale was published in various case series[13,25,35,36,38,39,71,72,94] and is shown by two single cases as examples, one for a case of hepatotoxicity by Indian Ayurvedic herbs (Table 7), and another one for a case of liver injury by a dietary supplement. For further refinement, specific information usually is necessary to rule out rare alternative causes (Table 2). This initial approach using the CIOMS scale ensures prospectively the collection of highly qualified case data and enables a sophisticated case evaluation currently and in the future. Information on individual CIOMS items (Tables 5 and 6), the checklist for HILI diagnosis (Table 2), all raw data, and a narrative case report should be presented to regulatory agencies, the scientific community, manufacturers, and expert panels to allow refined use of the CIOMS scale and all other case data, provided causality for the incriminated herb reached a probable or highly probable level.
The MV scale was developed in an attempt to improve the CIOMS scale by adding other clinical elements and by simplifying and changing the relative weight of assessment parameters, in detail discussed by the NIH LiverTox[2,3] and others[62,87], or briefly referenced. As a shortened and modified version of the CIOMS scale, the MV scale has fewer specific criteria than the original CIOMS scale (Table 4); due to major differences in test cases, however, the equivalency to CIOMS has been debated[2,3,62,84,87,89,96].
Specifically, the MV scale evaluates dechallenge as the time necessary for ALT or ALP to fall below 2N, considers a shorter latency period, asks for less accurate exclusion criteria of drug-independent causes, ignores concomitant drug use, emphasizes drugs with more than 5 years marketing without published hepatotoxicity, and overestimates extrahepatic manifestations like hypersensitivity reactions. The validation used real and fictive cases and as gold standard the opinion of three external experts[76,87] and not cases with verified results of positive reexposure tests; for initial validation of the CIOMS scale, both a panel of experts and positive reexposure tests were used[60,61]. Compared to the CIOMS scale, the MV scale was equivalently accurate only in cases of hypersensitivity; otherwise, the CIOMS scale was superior to the MV scale[89,96]. A comparison of the two scales for hepatotoxicity cases demonstrated low consistency between the two systems, with agreement between the scales in only 18% of the cases; the CIOMS scale showed better discriminative power and produced assessments closer to those of specialists. These limitations restrict the general use of the MV scale in hepatotoxicity cases.
A recent HILI study confirmed poor concordance between the MV and CIOMS scales for both the herb and concomitant medication assessment. The CIOMS scale found higher causality levels for the herb and concomitant medications than the MV scale; this was associated with considerably lower causality levels provided by the MV scale compared to the ad hoc approach. The low MV scores were attributed to various parameters such as prolonged latency and dechallenge periods, the presence of several alternative herb independent causes for the observed liver disease, only partial exclusion of herb unrelated causes due to missing essential case data, and lacking consideration of extrahepatic manifestations like rash, fever, arthralgia, peripheral eosinophilia, and cytopenia. It therefore appeared that various confounders precluded a high level of causality for the herb in a setting of HILI cases assessed by the MV scale.
The MV scale may be useful in some selected hepatotoxicity cases. Nonetheless, little evidence is provided that this scale has advantages over the CIOMS scale and should be the preferred tool[2,3,62,87,89,95,96]. It has been criticized by the NIH LiverTox that the elements used in the MV scale and their relative weights were based upon the authors’ expert opinion and not by prospective evaluation of a variety of possible elements and different cutoff values and weights[2,3]. Additional concern was expressed that the MV scale focuses on hypersensitivity features that are comparatively infrequent in hepatotoxicity cases; it performs poorly in atypical cases, such as unusually long latency periods or residual chronic symptoms after cessation of the culprit. Another issue raised was the low numbers of experts and the low degree of validation[2,3] of the MV scale. Thus, the MV scale is not commonly recommended for assumed HILI cases and certainly is no substitute for the CIOMS scale[2,3,87,98].
The NIH LiverTox summarized the arguments for and against the Naranjo scale[2,3]. In detail, while this scale includes all general features important in assessing causality, most critical elements are not weighed in judging the likelihood of liver injury, for example specific time to onset, criteria for recovery time, and list of critical diagnoses to exclude, limiting the use of this scale for assessing hepatotoxicity. The Naranjo scale includes testing for drug levels, which is rarely helpful in idiosyncratic drug induced liver disease. Finally, the scale was designed for use in clinical trials, and points are subtracted if the reaction reappears with administration of placebo, which does not apply to the usual case of drug induced liver disease. Direct comparisons to the CIOMS scale have shown that the Naranjo scale is easier to apply, but has less sensitivity and specificity in assigning causality to cases of liver injury. These statements of the NIH LiverTox[2,3] supported other views, confirming low sensitivity, and a lower prediction rate of the Naranjo scale in a careful comparison with the CIOMS scale for suspected hepatotoxicity cases. These studies concluded that the Naranjo scale lacks validity and reproducibility when evaluating hepatotoxicity[86,93]; it was not recommended for hepatotoxicity assessment.
The Naranjo scale was designed to assess causality of any adverse drug reaction (ADR), independent from the affected organ. It substantially differs from other causality algorithms for hepatotoxicity (Tables 3 and 4)[2,3,24-26,63,79,87,88,101]. This scale relates toxic drug reactions to general pharmacological drug actions rather than possibly to idiosyncratic reactions like rare hepatotoxicity. Its items include drug concentrations and monitoring, dose relations such as decreasing dose, placebo response, cross-reactivity, and confirmation of ADRs using unidentified objective evidence, which is relevant only for toxic reactions[77,79,88]. The general use of the Naranjo scale in hepatotoxicity cases[23,79] created concern[2,3,24-26,63,70,87,88,101].
The use of the liver unspecific Naranjo scale is unacceptable in suspected HILI cases[23,79], and its results are heavily disputed[24-26,63,70,79,88]; this pertains especially to the shortened version used by the United States Pharmacopeia (USP)[23,79] with only 5 of the original 10 items. The lack of liver specificity of the Naranjo algorithm is evident from the missing definition of liver injury as ADR; an unclear time frame and latency period; undefined time frames for dechallenge; no definition of risk factors; insufficient evaluation of alternative diagnoses; inappropriate assessment of comedication; and the missing definition of a positive rechallenge test (Table 4)[77,88]. This scale also was considered too insensitive, allowing a possible causality even in the absence of essential data, by virtue of the patient simply having taken the suspected agent[63,70]. Most importantly, the modified Naranjo scale as used by USP[23,70] did not exclude relevant alternative causes such as idiopathic autoimmune hepatitis, alcoholic or cardiac hepatopathy, other preexisting liver diseases, DILI, and drug-induced rhabdomyolysis[24-26]. Use of this method has raised concern about judgement validity by the USP[63,88]. Considering all shortcomings along with the lack of liver specificity and validation for hepatotoxicity, the Naranjo scale should be excluded from use in hepatotoxicity cases. It certainly is no substitute for the CIOMS scale.
The KL method is neither liver specific nor validated for hepatotoxicity (Table 3); it also lacks important items for hepatotoxicity assessment (Table 4). It was recently applied for causality assessment of suspected hepatotoxicity for some Herbalife products. Subjective judgement is needed for many steps, making this method more prone to bias. Though commonly applied by the Spanish Pharmacovigilance Centres, the KL method is not used by the Spanish Group for the Study of Drug-induced Liver Disease[59,85,87,95], which applies the CIOMS scale as the preferred assessment tool. The KL method should not be used for assessment of hepatotoxicity cases.
Numerous published HILI reports lack any causality method description and presumably are based on the ad hoc assessment with its relevant shortcomings (Tables 3 and 4). When using this approach, the physician notes the coincidence of herbal product and chemical drug use, and will estimate the likelihood of a hepatotoxic reaction.
After ruling out alternative causes, the ad hoc approach is often used to distinguish a probable, possible, or unlikely causality. A probable causality is usually attributed when the manifestations of liver disease, temporal association, and dechallenge response seem to fit the typical signature pattern of the product in question. A possible attribution is assigned when one feature is not typical, the product is not known to cause the reaction or does so too rarely to be distinguished from background, or an alternative cause is less or equally plausible. An unlikely causality is assigned when most of the features are atypical or an alternative cause is more plausible.
Though relevant items such as signature of symptoms, latency period, dechallenge, definitive exclusion of alternative causes, risk factors, alcohol use, and track record of the product are used[79,89], no universally accepted description exists for either the method or its application. Due to missing specific criteria (Tables 3 and 4), the ad hoc approach is obsolete as a means of validly assessing causality in HILI or DILI cases.
With the ad hoc assessment applied prior to the liver specific CIOMS scale, the physician inevitably will postpone an assessment by such a procedure and thereby delay the diagnosis. Since the parameters of the ad hoc approach are liver unspecific and not validated (Tables 3 and 4), this method should be replaced by better alternatives. The NIH LiverTox does not even mention the ad hoc approach as a possible causality evaluation method for hepatotoxicity cases[2,3].
According to the NIH LiverTox, the DILIN method is based on a narrative summary and a compilation of clinical findings and sequential biochemical abnormalities[2,3]. These are extracted from clinical records and entered into a 65-page case report form, but a scoring system was lacking, as opposed to the CIOMS scale (Table 4). The DILIN causality adjudication process is outlined in a 12 step flow diagram, using three independently assessing experts in hepatotoxicity who grade the likelihood of a causal relationship between the drug and liver injury in one of five scores: (1) Definite (> 95% assurance): the evidence for the drug causing the injury is beyond a reasonable doubt; (2) Highly likely (75% to 95% assurance): the evidence for the drug causing the injury is clear and convincing but not definite; (3) Probable (50% to 74% assurance): the preponderance of the evidence supports the link between the drug and the liver injury; (4) Possible (25% to 49% assurance): the evidence for the drug causing the injury is equivocal but present; and (5) Unlikely (< 25% assurance): there is evidence that an etiological factor other than the drug caused the injury.
While these causality grades appear vague, attempts are made to provide an objective and critical evaluation of the likelihood that the liver injury is due to the suspected agent[2,3]. In particular, cases are not considered “probable” merely because there is no other explanation. Similarly, cases are not considered “definite” if another diagnosis is possible. If two or three drugs are implicated, only one can be considered probable, highly likely or definite; the others are assigned “possible” or “unlikely”, so that the total percent assurance does not exceed 100%[2,3]. The causality assessment is accepted as initially scored if the three expert reviewers completely agree; if there is disagreement, the reviewers meet to reconcile the differences and reach a final single score[2,3,102]. A complete summary of the definitions for each category is provided.
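The two procedural rules described above, acceptance of the initial score only on complete agreement of the three reviewers and at most one implicated drug graded probable or higher, can be sketched as simple checks. This is a hedged illustration only: the function names `initial_consensus` and `multi_drug_consistent` and the grade strings are hypothetical, and the reconciliation meeting that follows a disagreement is a matter of expert discussion that is not modelled here.

```python
from typing import Dict, List, Optional

# Grades of which only one implicated drug may hold any (per the text).
HIGH_GRADES = {"probable", "highly likely", "definite"}


def initial_consensus(reviewer_grades: List[str]) -> Optional[str]:
    """Return the shared grade if all three reviewers agree, else None.

    None signals that the reviewers would need to meet and reconcile.
    """
    if len(reviewer_grades) != 3:
        raise ValueError("exactly three reviewer grades are expected")
    first = reviewer_grades[0]
    return first if all(g == first for g in reviewer_grades) else None


def multi_drug_consistent(grade_by_drug: Dict[str, str]) -> bool:
    """Check that at most one drug is graded probable or higher."""
    high = [drug for drug, grade in grade_by_drug.items() if grade in HIGH_GRADES]
    return len(high) <= 1
```

For example, `initial_consensus(["probable", "probable", "possible"])` returns `None`, indicating that a reconciliation step is required before a final single score exists.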
The DILIN method requires experts and has shortcomings (Tables 3 and 4)[2,3,73,80,86,102]; it is therefore not suitable for the physician who needs assessment results early in the course of the disease. The DILIN method was used for retrospective assessments of case series where time to conclusion is not a crucial issue[73,86,102]. In combination with the CIOMS scale, this method is the basis for future DILIN group studies of clinical, genetic, environmental, and immunological risk factors. To exclude alternative causes in retrospective analyses by the DILIN method, screening was required for previous liver disease, alcohol use, hepatitis A, B, or C infection, autoantibodies, ceruloplasmin, α1-antitrypsin, ferritin, iron, and imaging data; specific details or appropriate scores for each item were not provided (Table 4). Other possible causes were not considered (Table 2), including specific liver infections such as hepatitis E or infections by cytomegalovirus (CMV), Epstein Barr virus (EBV), herpes simplex virus (HSV), and varicella zoster virus (VZV). At present, questions regarding the actual validity of the DILIN method remain, and transparent results of all diagnostic items from each individual patient would be preferred to a summarizing causality grade.
Another approach of the DILIN group targets a novel Causality Assessment Tool (CAT) specifically for HDS. CAT was designed to retrospectively adjudicate multiple products as a single entity using structured causality assessment and expert opinion. The elements of the CAT considered the multiplicity of products consumed, implicated drugs, alternative diagnoses, and published DILI literature on the product or an ingredient. In analogy to the DILIN method, which expresses causality levels as percentage assurance, CAT also grades the likelihood of a causal relationship between HDS and liver injury from definite to unlikely. In this preliminary study, CAT was applied in 16 DILI cases, which were initially evaluated by the DILIN method and in which HDS were implicated as a potential cause. Overall agreement and reliability in this retrospective analysis, which required an expert panel, were moderate; this method needs further investigation and validation.
In its recent statement, the NIH LiverTox does not mention the WHO method in connection with causality assessment methods for hepatotoxicity cases but rather discusses other methods[2,3]. Since the WHO method was not developed for hepatotoxicity cases and therefore does not consider hepatotoxicity characteristics[79,104], this omission appears warranted. The shortcomings of the unspecific features of the WHO method (Tables 3 and 4) have been a matter of major concern[38,39,104-106] and led to the conclusion that this scale is not appropriate for causality assessment in suspected HILI cases[79,104].
The WHO method consists of two parts, one being the WHO scale to assess causality levels, the other the global introspection by experts. Though not validated for any ADR, global introspection surprisingly represented a popular strategy in evaluating the likelihood of drug causality for general ADRs of all organs. As early as 1986, however, global introspection by experts was shown to be neither reproducible nor valid. In detail, the assessor considers factors that might support a causal link of one or more drugs to an observed ADR, lists all factors, weighs their importance, and estimates the probability of drug causation; no specific checklist or level of strength is given. It has been recognized that both the questions and the answers are ambiguous. Though these shortcomings are described for general ADRs, they apply even more to hepatic ADRs.
The WHO scale has not been based on a gold standard, is not quantitative, not liver specific, and has not been validated for hepatotoxicity (Tables 3 and 4)[4,38,39,79,104-106]. In particular, reliability, sensitivity, specificity, and positive and negative predictive values are unknown but are likely low[79,81,104-106]. Its scope is also limited since it cannot discriminate between a positive and a negative correlation, thereby resulting in overdiagnosing and overreporting.
The WHO method ignores relevant data like uncertainties in daily dose, temporal association, start, duration and end of herbal use, time to onset of ADR, and course of liver values after herbal discontinuation. Insufficiently considered or ignored are comedication, preexisting liver diseases, numerous alternative explanations, and exclusion of virus infections by hepatitis A, B, C and E, CMV, EBV, HSV, and VZV[38,39]. Since only a few raw data are evaluated, case duplications and retracted cases remain undetected by the WHO method to a higher degree than by other methods. Despite these flaws, the WHO method was used for causality assessment[17,38,39,53,54]. Reevaluation often could not confirm causality in cases of two assessed reports[38,39]; therefore, the use of the WHO method in HILI cases has major limitations.
Causality assessment by the WHO method requires a panel of experts, which is rarely available at a hospital or a family physician office. Consequently, analyses based on this method are retrospective; their results become available long after the patient presented with assumed HILI.
Expert opinion as an assessment tool is poorly defined (Tables 3 and 4), except that a panel of specialists with clinical expertise in hepatology is available for causality assessment in HILI. For DILI, groups of skilled hepatologists exist without any doubt in most countries including Japan[108,109] and in expert projects like the international DILI Expert Working Group, the United States DILIN group[73,80,86,102,103], the Spanish Group for the Study of Drug-Induced Liver Disease[59,85,87,95,101], and the Spanish-Latin American network on drug induced liver injury. For HILI, the Hong Kong Herb-Induced Liver Injury Network is of importance. However, the qualification of assessors is sometimes crucial and may be problematic as discussed in detail[88,105,106]. Even with specialists, individual opinion often results in judgement bias.
For HILI case assessment, strategies need to be developed that are clinically useful and applicable in daily practice. These must meet the expectations of the scientific community, regulatory agencies, and manufacturers, provided the case is going to be reported. On the day when HILI is suspected and criteria of hepatotoxicity are fulfilled, the physician should explore through the internet and regulatory databases how frequently the suspected herb has been associated with hepatotoxic adverse reactions both in the scientific literature and by regulatory notifications. Publication as an interesting case report should be encouraged if there are few or no hepatotoxicity reports of this particular herb. Consequently, the decision will depend on the physician’s own interest and clinical experience, resulting in three different levels of assessment intensity. These include first a wait and see approach after cessation of the herbal product, second a strategy aimed at exclusion of the most frequent differential diagnoses, or third an exclusion of even rare alternative causes.
The first approach of wait and see requires little attention and few elements and is cost effective, at least initially but not necessarily in the further course. If for some reason the correct diagnosis was missed, it will be costly and risky for the patient, the physician, or both. Submitting such an insufficiently documented case as a suspected HILI case to scientific journals, regulatory agencies or manufacturers would be difficult to justify and would lead to overreporting due to overdiagnosing[68,82,88,104,105,111]. In detail, diagnostic problems including alternative diagnoses as confounding variables were evident in 77.5% of 573 cases of initially suspected HILI, presented as spontaneous reports or as published case reports.
For the second strategy, the elements of the updated CIOMS scale are sufficient, starting with the evaluation of time to onset to verify at least a temporal association between the herbal use and the liver disease (Tables 5 and 6). For instance, if clinical assessment, hepatobiliary sonography, or serology of hepatitis A-C provides an alternative cause as the correct and final diagnosis, the costs will remain low since further diagnostic measures are not warranted. If diagnostic exclusion is unsuccessful so far, parameters of CMV, EBV, HEV, HSV, and VZV are needed (Tables 5 and 6), though in reality these elements are rarely reported in suspected HILI cases[13,14,17,23-26,38,39,94]. With complete or even partly missing CIOMS elements, the CIOMS scale provided causality for various herbs with levels of probable and highly probable[35-37].
For the third level of evaluation, the physician will have to decide which of the multiple other and rare differential diagnoses are worthy of consideration. The checklist should be regarded as a reminder of possible alternatives and as a suggestion for further approaches, depending on the clinical phenotype. Clearly, the full number of criteria set for ruling out alternative causes is not required for all cases; the checklist therefore asks selectively whether the information was completely, partially or not obtained (Table 2). A sophisticated strategy is needed, however, if the case is reported to regulatory agencies and the scientific community, which are flooded by poorly documented suspected and often misdiagnosed HILI cases[26,34-36,38,39,82]. For optimum case presentation, the individual items of the updated CIOMS scale should be provided for a single case (Table 7)[58,97] as well as for case series. This is feasible as shown in numerous publications[13,25,35,36,38,39,71,72,94] for 26 cases, 22 cases, 22 cases, 21 cases, 15 cases, 13 cases, and 4-9 cases[71,72,94]. The presentation of the CIOMS items for the single case should be combined with a detailed report of all relevant case data[58,97] and a list of differential diagnoses that were excluded completely or partially, or were not considered, similar to the checklist for HILI diagnosis (Table 2). For a case series, basic data for each individual case are to be provided in a single table, focusing on details required for causality assessment; examples are presented in various publications[14,25,35,36,38,39]. Presentation of excellent data will lead to valid causality results and appropriate conclusions. This is a prerequisite for well founded assessments of further HILI cases, with benefit for patients, physicians, the scientific community, regulatory agencies, and manufacturers.
Future considerations will have to focus on improvements of causality assessment methods[90,98] to obtain prospectively valid HILI diagnoses at the time the patient experiences liver injury. Corresponding efforts for retrospective causality assessments of HILI cases are promising and under way with preliminary data. Strategies are to be developed to characterize liver injury by various herbs with all facets. On the day HILI is suspected, causality assessment should be initiated in all cases using the CIOMS scale, preferentially in its updated form (Tables 5 and 6). Supported by the checklist for HILI diagnosis (Table 2), this could provide HILI cases with a probable or highly probable causality for a specific herb as basis for further evaluation. Overall, this will facilitate characterization of disease entities including phenotype standardization, retrospective reanalysis by expert panels, improvement of pharmacovigilance decisions, safety strategies of manufacturers, and studies directed to assess pathogenetic aspects of HILI.
Studies are needed in the future to assess factors leading to unpredictable HILI in the few patients who experience this disease with a probable or highly probable causality level. As for DILI, future issues for HILI cases with established causality are to define genetic, environmental, and immunological determinants of HILI susceptibility[80,90,112,113]. Overall, metabonomics, pharmacogenetics, proteomics, and transcriptomics are areas of potential interest in HILI, as detailed for DILI. Since HILI is commonly an unpredictable disease, experimental studies dealing with predictive cellular systems as used to identify potentially hepatotoxic synthetic drugs will be of limited if any relevance for herbs. Similarly, applying well-defined primary cultures of human hepatocytes and measuring a panel of signals directly linked to key mechanisms of liver injury, in order to predict which drugs can cause liver injury, will be restricted to drugs and not be applicable to herbs. Recent advances in the early pre-clinical assessment of the potential intrinsic hepatotoxicity of candidate drugs have been reviewed in detail, focusing on cell-based models such as cell cultures with outcome and detection methods, on profiling technologies, and on emerging technologies including stem cell technologies and 3D as compared to 2D culturing techniques. However, it is unlikely that the results of these in vitro studies of intrinsic and predictable hepatotoxicity induced by synthetic drugs are transferable to a clinical setting of HILI that commonly represents the idiosyncratic and unpredictable form of liver injury by one or more herbs, each with multiple chemical constituents. The search for biomarkers in HILI patients with clearly established causality seems more important.
The rare liver injury by herbs, herbal drugs, and herbal supplements may present itself with numerous facets, providing challenging issues for causality assessment. The physician is responsible for making available all data necessary for a high quality judgement; otherwise, causality evaluation will be problematic. Timely causality assessment is mandatory while the disease is unfolding, as a basis for prospective diagnostic and therapeutic decisions. The most appropriate causality assessment method is the liver specific CIOMS scale, which should prospectively be applied by the physician. If used, other methods have pitfalls and cause ambiguous results, debated on grounds of imprecision, liver unspecificity, and limitation to retrospective analyses, or they are unavailable due to requirements for expert panels.
P- Reviewers Devarbhavi H, Helling TS, Taye A S- Editor Wen LL L- Editor A E- Editor Xiong L