Conceived and designed the experiments: TP. Performed the experiments: TP. Analyzed the data: TP. Contributed reagents/materials/analysis tools: TP MM VR YB DT OD. Wrote the paper: TP.
Factors associated with the survival of truth of clinical conclusions in the medical literature are unknown. We hypothesized that publications with a first author having a higher Hirsch index value (h-I), which quantifies and predicts an individual's scientific research output, should have a longer half-life.
474 original articles concerning cirrhosis or hepatitis published from 1945 to 1999 were selected. The survival of the main conclusions was updated in 2009. Truth survival was assessed by time-dependent methods (Kaplan-Meier method and Cox regression). A conclusion was considered to be true, obsolete or false when three or more observers out of the six stated it to be so. 284 out of 474 conclusions (60%) were still considered true, 90 (19%) were considered obsolete and 100 (21%) false. The median h-I was 24 (range 1–85). Authors with true conclusions had significantly higher h-I (median=28) than those with obsolete (h-I=19; P=0.002) or false conclusions (h-I=19; P=0.01). The factors associated (P<0.0001) with h-I were: scientific life (h-I=33 for >30 years vs. 16 for <30 years), methodological quality score (h-I=36 for high vs. 20 for low scores), and positive predictive value combining power, ratio of true to not-true relationships and bias (h-I=33 for high vs. 20 for low values). In multivariate analysis, the risk ratio of h-I was 1.003 (95%CI, 0.994–1.011), and was not significant (P=0.56). In a subgroup restricted to 111 articles with a negative conclusion, we observed a significant independent prognostic value of h-I (risk ratio=1.033; 95%CI, 1.008–1.059; P=0.009). Using an extrapolation of h-I at the time of article publication there was a significant and independent prognostic value of baseline h-I (risk ratio=0.027; P=0.0001).
The present study failed to clearly demonstrate that the h-index of authors was a prognostic factor for truth survival. However the h-index was associated with true conclusions, methodological quality of trials and positive predictive values.
Science progresses via a series of paradigms that are held to be true until they are replaced by a better approximation of reality. In surgery and medicine two studies have estimated that the half-life of truth for clinical conclusions in the literature is 45 years. We previously tried to identify factors that were independently associated with this truth survival, and found only two, one expected (the negative conclusion of the publication) and one unexpected (the absence of meta-analysis in the methodology used). We therefore concluded that better prognostic factors should be found to better convince clinicians of the long term utility of evidence-based medicine.
In the previous study, we did not analyze any author-related factor. In the present study we hypothesized that publications with a first author having a higher h-I, which quantifies and predicts an individual's scientific research output, should have longer survival. An association between the h-I and truth survival could also be the proof of concept of using this type of method for validating such indexes. So far, the h-I has been validated using “scientific achievement”, as defined by criteria which are ultimately very redundant: the number of citations, peer review, grant proposals or quantitative performance measurements.
We used 474 previously assessed articles  with an identified first author, and in which the survival of the main conclusions were updated in 2009.
We identified original articles concerning cirrhosis or hepatitis in adults from 1945 to 1999 in 11 five year periods. The selection of articles was stratified into 3 categories: non-randomized studies, randomized trials and meta-analyses. In each five year period we selected 20 non-randomized articles from two journals, 10 published in Lancet and 10 in Gastroenterology. We chose these two journals because they have published clinical studies in hepatitis and cirrhosis since at least 1945, because they are peer-reviewed with a high level of selection, and because they have impact factors greater than 10. A hand search was used to select articles from 1945 to 1985. As a true randomization was very difficult to organize we used a selection by order of publication inside each 5 year period. The first article of the period concerning cirrhosis or hepatitis was chosen, then the last of the period, then the second, then the one before the last, and so on up to 20 articles. From 1985 to 1999 we used a PUBMED electronic search specifying the following “limits”: cirrhosis or hepatitis, human, Lancet or Gastroenterology. Abstracts were randomly downloaded using a similar selection method, stratified by five year periods. We selected the first abstract listed on the first electronic page, then the first on the last electronic page, then the last on the second electronic page, then the last on the page before the last, and so on up to 20 articles.
In each period we tried to select 20 randomized trials on cirrhosis or hepatitis, 10 from Lancet and 10 from Gastroenterology. This was possible from 1970 to 1999. In the periods from 1945 to 1969 we selected all identified randomized trials whatever the journal, with a range from four (1945-1950) to 20 trials (1965-1969). From 1945 to 1982 we used the hand searching method as previously described. From 1982 to 1985 we completed the random selection by hand searching and from 1985 to 1999 by PUBMED as described for non-randomized studies.
For the meta-analyses, we used a hand searching method as described in the systematic review of meta-analyses. To be included, a meta-analysis had to be based on trials in the field of hepatology and published as a full paper before 2000. The following operational definition of meta-analysis was adopted: a study in which a computation of an overall treatment effect, based on the estimation of treatment effect in each trial, was performed, and reported with its 95% confidence interval or with the corresponding statistical test. Meta-analyses on childhood diseases were not included.
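The hand-search order described above (first, last, second, one before the last, and so on) can be sketched as follows. This is only an illustration: the function name `alternate_ends` and the list-of-articles representation are assumptions, and the sketch does not cover the slightly different page-based order used for the PUBMED search.

```python
def alternate_ends(items, k=20):
    """Pick up to k items in the order first, last, second, second-to-last, ...

    `items` is assumed to be already sorted by publication order
    within one five year period (hypothetical representation).
    """
    picked = []
    lo, hi = 0, len(items) - 1
    take_front = True
    while lo <= hi and len(picked) < k:
        if take_front:
            picked.append(items[lo])  # next article from the start of the period
            lo += 1
        else:
            picked.append(items[hi])  # next article from the end of the period
            hi -= 1
        take_front = not take_front
    return picked
```

When a period holds fewer than `k` eligible articles, the function simply returns all of them in the alternating order.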
The one conclusion from each abstract that seemed to best summarize the findings was copied to a database. Editing of these sentences was restricted to the rephrasing of outdated terminology and the elimination of redundant words.
Six hepatologists, called the observers, assessed the form which contained the selected conclusions in a random order. The observers were fulltime hepatologists from different subspecialties but working in a hospital and aged from 31 to 65 years. Observers were blind to the period, the journal, the authors, the method (meta-analysis, randomized, non randomized), and the methodological quality from which each conclusion was derived. They classified each conclusion into one of three categories: 1) still true in 2000 (updated in 2009), 2) obsolete but not false, 3) false.
The following seven factors were analyzed: 1) the design (meta-analysis, randomized trial, non-randomized study); 2) the quality assessment of randomized trials and meta-analyses, which had been made independently of this study by one of us (TP) by means of scoring methods; articles were rated as high quality when the score was greater than or equal to the mean (12 for randomized trials, 27 for meta-analyses) and as low quality when lower than the mean. Non-randomized studies were classified as low methodological quality as there is no specific scoring method; 3) negative or positive conclusions; 4) the type of disease (hepatitis, portal hypertension, other); 5) the domain of clinical research (therapeutic, diagnostic or other study; other studies were defined as explanatory studies not assessing treatment or diagnostic tests); 6) the journal of publication (Lancet, Gastroenterology, other); and 7) the specialty (medicine or surgery).
A conclusion was considered to be true, obsolete or false when three or more observers out of the six stated it to be so. When there was a 3 to 3 split decision regarding conclusions being true or not true, the final conclusion was considered to be true; these splits concerned 9 out of 474 (1.9%) articles. When there was a 3 to 3 split decision regarding conclusions being obsolete or not obsolete, the final conclusion was obsolete; these splits concerned 26 articles out of 474 (5.5%). When the article was not classified as either true or obsolete it was considered as false. The half-life was calculated according to the Kaplan-Meier method using as the censored time the duration between the year of publication and the year 2000 (updated in 2009). The censored time is the time at risk of being refuted or found to be obsolete. We analyzed the truth survival: if the conclusion was assessed to be still true the case was censored at the end of follow-up. If the conclusion was assessed to be false or obsolete it was considered as a failure. The comparison between factors used the two-sided logrank test, and the multivariate analysis used proportional hazards regression.
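The voting rule above can be written as a small decision function. This is a minimal sketch: the label strings and the function name `classify` are illustrative, and checking "true" before "obsolete" is an assumption about how a hypothetical 3 true versus 3 obsolete split would be resolved, since the text only describes true/not-true and obsolete/not-obsolete splits.

```python
from collections import Counter

def classify(votes, threshold=3):
    """Classify one conclusion from six observer votes.

    Each vote is one of "true", "obsolete" or "false".
    A 3 to 3 split counts as reaching the threshold, as in the text.
    """
    counts = Counter(votes)
    if counts["true"] >= threshold:      # includes 3-3 true/not-true splits
        return "true"
    if counts["obsolete"] >= threshold:  # includes 3-3 obsolete/not-obsolete splits
        return "obsolete"
    return "false"                       # neither true nor obsolete
```

For example, two votes for each category reach no threshold, so the conclusion is classified as false.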
The h-I of first authors was the main prognostic factor assessed in the present study. The h-I values were assessed in the first 6 months of 2010. The h-I was originally computed using Google Scholar (“Google Scholar Universal Gadget”) for first authors. Because Google Scholar is not a perfect gold standard for estimating the h-I, other methods were also used. The commonness of last names can introduce a false estimate of the h-I, and therefore for the high risk names we used “liver” as a supplementary selection criterion in the Scholar search. As the Scholar search is expected to perform less well for the oldest publications, the h-I was also assessed using the Scopus database for first authors of articles that were published after 1995, and using the ISI database. For ISI, only the authors still publishing after 1980 were taken into account, as the applicability of the ISI search was very low in the older periods.
The date of the publication as well as the scientific age of the author (time between first and last publications) are mathematically associated with the h-I, which is cumulative and increases over time. Therefore analyses were stratified according to the publication date (1945-1964, 1965-1979, 1980-1999), the rate of the h-I (h-I/scientific life in years) was estimated, and the scientific life duration of the author was included in multivariate analyses.
The seven characteristics of studies and two author characteristics associated with the h-I in the literature (gender of author, and place of residence) were analyzed as possible confounding factors in the prognostic analyses. The gender could not be determined from the Scholar search or from the first name initials; we therefore used the personal knowledge of coauthors and the first name details given by Scopus.
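For readers unfamiliar with the index, the h-I is by definition the largest number h such that the author has h publications with at least h citations each. The sketch below computes it from a list of citation counts; it only illustrates the definition and is not the procedure used by Google Scholar, Scopus or ISI, whose coverage of citing sources differs.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have >= rank citations
        else:
            break      # counts only decrease from here, so h cannot grow
    return h
```

An author with citation counts 10, 8, 5, 4 and 3 has an h-I of 4: four papers have at least 4 citations, but not five papers with at least 5.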
No change was made for the selection of articles, and methodological quality assessment. Observer conclusions were updated in 2009, that is with 10 years more of follow-up. One previous observer had retired, two had moved and two new ones agreed to participate (MM, DT). The observers were asked to modify their previous conclusions if necessary. A conclusion was changed when at least three observers out of the five stated it to be so. Five changes occurred, one previously false conclusion and one previously obsolete became true, two previously true became false and one became obsolete.
The main a priori endpoint was the prognostic value of the h-I (quantitative value) in the multivariate analysis including previously identified prognostic factors. The other “significant” P values were detailed when <=0.10 and were described as NS if >0.10.
Statistical descriptions and analyses used non-parametric methods. Medians were expressed with a 95% confidence interval. Multiple comparisons used the Kruskal-Wallis variance analysis with Dunn's multiple comparison test. The same time-dependent analyses were used as in the previous analysis. A modification was made for the estimated time of censoring for obsolete or false conclusions, according to a pertinent critique. Very old publications that had been declared obsolete at the end of follow-up could cause the duration of survival to be overestimated if they had in fact been obsolete or false many years earlier. Therefore for each obsolete or false conclusion, we estimated the year in which it became obsolete or false. We added the duration of scientific life in the Cox proportional regression model as a covariate for adjusting the prognostic value of the h-I. The conclusions of the first analysis and the factors associated or not associated with truth survival did not change.
It was not possible to assess directly the h-I of the author at the time of publication (baseline h-I) for each article included in the present survey. However it was possible to estimate the baseline h-I by applying the progression rate of the measured h-I backwards. For example, for an author with a Scholar h-I of 81 in 2010 (h-I2010) and a mean speed (h-speed) of 2.53 per year, the baseline h-I for an article of the present database published in 1995 was extrapolated as: h-Ibaseline = h-I2010 - h-speed × (2010 - 1995) = 81 - 2.53 × 15 = 43. This baseline h-I was also assessed in the prognostic analysis.
It has been suggested that for a special “outstanding category” of top scientists, citation indexes can reflect scientific “quality”. Therefore we planned an analysis of “top-hepatologists” conclusions, using the cutoff which selects the 30 highest h-I. Using h-Scholar the cutoff was h-I=60; this resulted in 33 articles (6.1%), as there were 4 ties at h-I=60. Using h-Scopus the h-I cutoff was 33 and for ISI 38.
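To make the time-dependent analysis concrete, the sketch below implements a plain Kaplan-Meier product-limit estimator and reads the half-life as the first time at which estimated survival falls to 50% or below. It is an illustration only: the function names and the toy data are assumptions, and taking the duration of a failure as the estimated year of obsolescence minus the publication year is our reading of the modification described above, not a documented formula.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each distinct failure time.

    durations: years from publication to failure or to end of follow-up.
    events: 1 if the conclusion became obsolete or false, 0 if censored
            (still considered true at the end of follow-up).
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        removed = 0
        # Group all observations sharing the same time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            removed += 1
            i += 1
        if failures:
            surv *= 1 - failures / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # failures and censored cases leave the risk set
    return curve

def half_life(curve):
    """First time at which survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

With the toy durations 5, 10, 10, 20, 30, 40 and events 1, 1, 0, 1, 0, 0, survival drops below 50% at 20 years, so `half_life` returns 20.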
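The backward extrapolation can be sketched as a one-line linear formula. The function and parameter names are illustrative; flooring the result at zero is an assumption added here so that authors with long follow-up do not receive a negative baseline value, and it is not stated in the text.

```python
def baseline_h_index(h_current, h_speed, current_year, publication_year):
    """Linearly extrapolate the h-I back to the year of publication.

    h_current: h-I measured in `current_year` (2010 in the example).
    h_speed: mean increase of the h-I per year of scientific life.
    """
    years_back = current_year - publication_year
    estimate = h_current - h_speed * years_back
    return max(0.0, estimate)  # assumption: a baseline h-I cannot be negative
```

With the values of the worked example, `baseline_h_index(81, 2.53, 2010, 1995)` gives 43.05, which rounds to the 43 quoted in the text.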
We have not previously observed a prognostic value of studies according to criteria based on methodological quality scoring systems . Recently Ioannidis proposed a classification of research findings in 9 classes of positive predictive values according to various combinations of power, ratio of true to not-true relationships and bias . The details of this classification are available in Table S1. Therefore we planned an analysis using this classification in the multivariate prognostic analysis.
A total of 474 articles were included. The characteristics of included first authors are given in Table 1 and of the articles are given in Table 2, stratified by periods. There was a majority of articles published by residents of the US and UK before 1980, and by residents of continental Europe after 1980. A large majority of articles were published by male first authors, who were not surgeons, with a median scientific life of 30 years. The methodological quality, expressed according to scoring systems or predictive value, was much better since 1980.
In the year 2009, 284 out of 474 conclusions (60%) were still considered true, 90 were considered obsolete (19%) and 100 (21%) false. The half-life of truth was 45 years. The survival rate of conclusions was 85% (95%CI, 83-89%) at 20 years and 52% (95%CI, 47-57%) at 40 years.
The first author Scholar h-I (median; 95%CI) was 24 (20-27), with a range from 1 to 85, and an increase of 0.87 (0.82-0.93) h-I per year of scientific life. The distribution was skewed and not normal, with 33 articles published by 21 authors with h-I values of 60 or higher. For authors publishing after 1994, the h-I estimated using Scopus was 17 (15-20) with an increase of 1.13 (1.00-1.40) per year. For the period after 1980, the h-I estimated using ISI was 21 (17-25) with an increase of 0.72 (0.63-0.88) per year. The median baseline h-I was 0 (0-0) before 1965 and 6 (5-8) after 1980.
As expected the h-I was highly associated with duration of scientific life and recent publications (Table 3). Authors that had published after 1980 had a significantly higher h-I (30; 26-33); for those that had published earlier, the value was 23 (17-28) between 1965 and 1979, and 13 (9-15) between 1945 and 1964. There was no association between gender and the h-I.
Articles with true conclusions had significantly higher h-I (28; 24-31) than those with obsolete (19; 15-25; P=0.002 vs. true) or false conclusions (19; 16-25; P=0.01 vs. true) (Table 4). The same trends were observed for the h-I “rate” per year: 0.97 (95%CI 0.84-1.07) for true conclusions, vs. 0.76 (95%CI 0.62-0.86; P=0.07 vs. true) for obsolete and 0.90 (95%CI 0.71-1.07; NS vs. true) for false conclusions.
Using univariate and not time-dependent analysis, the h-I was also associated with methodological quality either using scores (Table 4) or positive predictive value categories (Figure 1), randomization design, and with authors with several articles included (Table 5).
There was no significant association between the h-I and truth survival using time-dependent analysis both in uni- and multivariate analyses (Table 5). Comparing the Scholar h-I there was no significant difference between 50 years survival (main end point), 50±5% (h-I above median) and 46±4% (under the median), respectively (P=0.63) (Figure 2). There was also no difference in truth survival for Scopus h-I (Figure 3).
For the main endpoint the risk ratio of the h-I was 1.003 (0.994-1.011) and was not significant (P=0.56). There was a significant difference of the 50 years survival of conclusions according to the negative or positive finding, 72±12% (negative finding) and 40±3% (positive finding), respectively (P<0.0001) (Figure 4).
In a subgroup analysis restricted to 111 articles with negative conclusions we observed a significant independent predictive value of the h-I in multivariate analysis (risk ratio=1.033; 95%CI, 1.008-1.059; P=0.009). Negative conclusions of authors with an h-I >24 had an 82%±6% 50 years survival vs. 65%±9% for those <=24 (NS). The observed difference was even greater among the Lancet's studies: 74%±16% vs. 47%±18% (NS).
The 50 year survival of the 30 higher h-I “outstanding category” conclusions was 48% (95% CI 29-73%) vs. 35% (28-42%, P=0.10) among the others. Using Scholar h-I and ISI h-I the 25 year survival of the 30 higher vs others were 67% (44-90%) vs 60% (50-71%; P=0.89) and 71% (51-90%) vs 59% (49-70%; P=0.73).
Concordance between the h-I estimated using Google Scholar over the whole scientific life of authors and the h-I estimated using Scopus and ISI for the scientific life after 1994 was assessed for the 217 authors of articles published after 1994 with an applicable ISI search (10 not applicable out of 227). There was a highly significant concordance between the 3 h-I estimates. The Spearman's rank correlation was 0.72 between Scholar and Scopus, 0.81 between Scholar and ISI and 0.82 between Scopus and ISI (all P<0.0001). The median h-I Scholar value was 31 (95%CI, 28-36) with a median of 29 years (95%CI, 28-32) of scientific life; the median h-I Scopus value was 17 (95%CI, 15-21) with a median of 15 years (95%CI, 15-15) of scientific life; the median h-I ISI value was 21 (95%CI, 17-25). The rate of h-I per year was 1.08 (95%CI, 0.96-1.13) according to Scholar, 1.13 (95%CI, 1.00-1.40) according to Scopus and 0.73 (95%CI, 0.63-0.88) according to ISI. The classification of authors ranked above/under the h-I median by Scholar (>31), by Scopus (>17) or by ISI (>21) had a high kappa concordance: Scholar/Scopus 0.61 (SE=0.07; P<0.001), Scholar/ISI 0.69 (SE=0.07; P<0.001) and Scopus/ISI 0.85 (SE=0.07; P<0.001). For h-I rate above/below 1 per year, the kappa values were Scholar/Scopus 0.45 (SE=0.06; P<0.001), Scholar/ISI 0.59 (SE=0.06; P<0.001) and Scopus/ISI 0.31 (SE=0.06; P=0.01). In comparison with the h-I estimated using Google Scholar, the h-I estimated using Scopus or ISI had similar variability according to characteristics of included first authors (Table 3) and original articles (Table 4), and were also not independently associated with truth survival (Table 5).
For baseline h-I the prognostic value differed in direction between univariate and multivariate analyses. In univariate comparison (Table 5), articles whose author had a baseline h-I greater than 3 (the median value) had a lower 50 year survival (18%) than articles with a lower baseline h-I (42%; P<0.0001), whereas in multivariate analysis the quantitative value was positively associated with survival (risk ratio=0.027; P=0.0001). This discrepancy was due to a very significant period effect. After 1980 the 25 year survival for authors with baseline h-I >3 was 66% (54-77%) versus 63% (50-76%; NS) for h-I≤3, with a significant positive prognostic value in multivariate analysis (risk ratio=0.027; P=0.0001). Before 1980 the 25 year survival for authors with baseline h-I >3 was 19% (5-34%) versus 63% (50-76%; NS) for h-I≤3 (negative prognostic value), with a significant positive prognostic value in multivariate analysis (risk ratio=1.052; P=0.04).
We observed that the h-I at the end of the study was associated with true conclusions, but its prognostic value did not survive time-dependent analysis, as previously observed for methodological quality. In contrast, baseline h-I (when the paper was written) was significantly and independently associated with truth survival when adjusted for other covariables. Negative conclusions remained a robust and independent predictor of truth survival.
We confirmed in the present study the intriguing prognostic value of negative conclusions (72% vs. 40% for 50 years survival for positive conclusions), which persisted after other factors had been taken into account. This prognostic value was not due to obsolete conclusions, as only 2% of negative conclusions had been rated as obsolete compared to 25% of positive conclusions. We found few negative studies which had been published in order to reveal previous false positive conclusions (Proteus phenomenon). An example is the article which concluded that hepatitis B virus was not responsible for primary biliary cirrhosis, which was published 18 months after another article had suggested this association. There was no significant difference in the h-I of authors with negative (h-I=21) or positive (h-I=25) conclusions. If we accept that most published research findings are false, the better survival of negative findings (“no relationships”) is a corollary of this statement. This is therefore the most plausible explanation of the better long term survival of negative findings.
Subgroup analyses are hazardous, but in a multivariate analysis restricted to 111 articles with negative conclusions we observed a significant independent predictive value of the h-I. This retrospective observation without a priori hypothesis must be confirmed by another study. We previously observed in the present cohort that the prognostic value of negative versus positive conclusions was mainly due to large differences among the randomized trials' conclusions: 68±13% for 52 negative conclusions compared with 14±4% (P<0.001) for 118 positive conclusions. One hypothesis is that authors with an elevated h-I are principal investigators of “better trials” whose findings survive better than those of authors with a lower h-I. From our analysis we cannot conclude whether this “author effect” is a cause or a consequence of scientific performance. Some authors may be supported more by industry for reasons other than their “intrinsic” quality. A means of verifying whether an “intrinsic” author effect exists would have been to assess the factors associated with survival among articles published at the beginning of the authors' scientific life.
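The kappa statistic used for the above/under-median classifications can be sketched as follows. This is a generic implementation of Cohen's kappa for two raters, written for illustration; the function name and the boolean encoding of "above the median" are assumptions, and it does not reproduce the standard errors or P values reported above.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    labels = set(a) | set(b)
    # Observed proportion of agreement.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)
```

In this setting, `a` and `b` would hold, for each author, whether the h-I is above the median of each source, for instance `[h > 31 for h in scholar]` and `[h > 17 for h in scopus]`, where `scholar` and `scopus` are hypothetical lists of per-author values.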
Our study has significant limitations. The study is retrospective between 1945 and 2000 and only prospective for the last 10 years of follow-up (updated in 2009). The inclusion criteria selected authors who may not have been representative of the overall biomedical community. They had published articles on liver diseases with high methodological levels (majority of randomized trials) in two competitive journals (mainly Lancet and Gastroenterology) with high impact factors in 2008, 28.4 and 12.6, respectively. We also used methods to assess methodological quality which are not the most recent and valid ones.
This selection should explain the high observed h-I (median of 24 for all periods and 30 for the period of 1980-1999). The h-I cannot be compared between different scientific fields or between different periods of publication. However, the observed median (h-I=30) is high compared with the h-I reported for other medical faculty members in the same period: mean h-I of 7.6 in 826 US oncologists, median of 10 for 29 Dutch professors in cardiology, and median of 23 for 45 editorial board members. Because of this rather high h-I level, it is possible that our study lacked the power to demonstrate a prognostic significance of the h-I in multivariate analysis. “Top scientists” according to the h-I were at the borderline of prognostic value (Table 3). Enlarging the spectrum of authors could test this risk of error.
There is no gold standard for the definition of scientific truth. We used a definition that was decided by the majority vote of a panel of 5 experts, 10 to 65 years after the publication of the findings. The main advantage was the duration of follow-up, with subsequent progress in the field of knowledge. The main weakness was the arbitrary choice of experts. To limit the risk of bias, the experts were chosen from different domains of hepatology and had different ages. We also adjusted the prognostic analysis using the classification of studies according to positive predictive values per Ioannidis. The results were similar to the previous adjustments using the validated quality scoring system of randomized trials and meta-analyses. However we think that the positive predictive value estimates could be improved for negative findings and for diagnostic studies, which are a growing part of clinical research.
The h-I estimates had limitations and we cannot rule out that these limitations might explain the absence of clear and independent prognostic values [25]. The first limitation is the reliability of a citation index in the oldest years (1945-1980), before the prospective existence of PubMed and Google Scholar. The second main limitation is the commonness of last names, which could introduce false estimates of the h-I. However, for the high risk names we used “liver” as a supplementary selection criterion in the Scholar search and checked the authorship twice using Scopus for authors still publishing after 1994. Moreover, the main results were similar using two other estimates, Scopus and ISI (Table S2), which were significantly concordant.
Finally, the extrapolation of baseline h-I to the year when the paper was written suggests a clear and independent prognostic value of the h-I. The main limitation of this index, in comparison with the 2010 h-I estimates, is its indirect assessment. This extrapolation relies on the normality and linearity of the h-I progression rate. We used the median to reduce the risk of variability, but a real prospective validation of the h-I prognostic value is needed.
The h-I is simple, probably more accurate than other citation indexes for estimating authors' scientific outputs, and it is accepted when its limitations are understood, with or without irony. We agree with Horne et al. that retaining a dignified aloofness to the h-I could be difficult for those with scores of less than 30.
For living hepatologists, at least, our conclusions were balanced. The present study failed to clearly demonstrate that the h-index of authors was a prognostic factor for truth survival. However the h-I was partly validated as associated with true conclusions, the methodological quality of trials and with positive predictive values combining power, ratio of true to not-true relationships and bias.
Furthermore an indirect (extrapolated) estimate of baseline h-I clearly observed a high and independent prognostic value for articles published after 1980. Prospective study in the next decades should be initiated to confirm this observation.
Competing Interests: There was no interest for Biopredictive in this study and the participation of the two researchers does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials, as detailed online in the guide for authors.
Funding: Funding source: Association de Recherche pour Maladies Virales et Hepatiques (ARMHV). MM and OD are full employees (Researchers) of Biopredictive. Biopredictive is a startup of Paris University Pierre et Marie Curie (UPMC), which is marketing biomarkers in liver diseases. Thierry Poynard is the founder of this company, the inventor of FibroTest, ActiTest, SteatoTest, AshTest and NashTest. The patents of these biomarkers belong to the French Public organization Assistance Publique Hôpitaux de Paris. MM and OD had no role in study design, analysis, decision to publish, or preparation of the manuscript. OD helped in the measurements of Hirsch Index, MM in the evaluation of scientific survival of articles conclusions. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.