Objective To understand belief in a specific scientific claim by studying the pattern of citations among papers stating it.
Design A complete citation network was constructed from all PubMed indexed English literature papers addressing the belief that β amyloid, a protein accumulated in the brain in Alzheimer’s disease, is produced by and injures skeletal muscle of patients with inclusion body myositis. Social network theory and graph theory were used to analyse this network.
Main outcome measures Citation bias, amplification, and invention, and their effects on determining authority.
Results The network contained 242 papers and 675 citations addressing the belief, with 220553 citation paths supporting it. Unfounded authority was established by citation bias against papers that refuted or weakened the belief; amplification, the marked expansion of the belief system by papers presenting no data addressing it; and forms of invention such as the conversion of hypothesis into fact through citation alone. Extension of this network into text within grants funded by the National Institutes of Health and obtained through the Freedom of Information Act showed the same phenomena present and sometimes used to justify requests for funding.
Conclusion Citation is both an impartial scholarly method and a powerful form of social communication. Through distortions in its social use that include bias, amplification, and invention, citation can be used to generate information cascades resulting in unfounded authority of claims. Construction and analysis of a claim specific citation network may clarify the nature of a published belief system and expose distorted methods of social citation.
Biomedical knowledge arises from scientific data. The means by which this occurs within individual scientific papers is a generally accepted process whereby papers report rationale, methods, results, and conclusions. How an entire belief system shared by a scientific community ultimately evolves from data across all papers within a specialty is less well understood. I describe and apply methods for the analysis of such belief systems using a specific example.
The belief system studied is that a protein, β amyloid, known for its role in injuring brain in Alzheimer’s disease, is also produced by and injures skeletal muscle fibres in the muscle disease sporadic inclusion body myositis. This belief system was chosen for analysis because of its importance to the care of patients with inclusion body myositis. The view seems to be accepted by many as likely or established fact (at least 200 different journal articles have stated such), β amyloid production is often reported to be a central element in the pathogenesis of the disease (see web extra note 1), and the belief directs research and treatment trials in the specialty. The approach taken here was simply to collect all statements in the medical literature on this belief system and to study the pattern of citation among them; that is, how each statement is supported by reference to other papers.
The methods are fully described in web extra note 2. Briefly, queries identified all English language PubMed indexed articles potentially containing statements pertaining to any of three related molecules (β amyloid precursor protein, its transcript, or one of its potential cleaved protein products, β amyloid) and muscle disease. These 766 papers (see web extra table 1) were searched for statements addressing the belief that these molecules are abnormally and specifically present in muscle fibres of patients with inclusion body myositis among many other muscle diseases, identifying 302 papers1-302 addressing the broad category of “amyloid” and inclusion body myositis of which 242 papers discussed these specific molecules (see web extra table 2). I collected all statements addressing the belief and citations supporting these statements.
Each paper was classified as primary data (containing experimental data addressing the specific and abnormal presence of these molecules in inclusion body myositis muscle), myositis review (review papers with the term myositis or the equivalent in their title), model (reporting cell culture or animal model experiments), or other (all other papers). I classified each citation as supportive, neutral, or critical according to how its underlying statement supported the belief. A network was then constructed representing papers as nodes and citations as links from one node to another. Another investigator (Anthony Amato) validated text and citation extraction for 17% of the papers, including all primary data papers.
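The network construction described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the custom MATLAB software used in the study: the paper names, the toy records, and the helper `build_network` are all hypothetical, and the only rules taken from the text are that papers are nodes, citations are directed links carrying a supportive, neutral, or critical label, and duplicate citations from one paper to another are counted once.

```python
from collections import defaultdict

# Hypothetical toy records: (citing paper, cited paper, stance of the citing statement).
citations = [
    ("review_A", "data_1", "supportive"),
    ("review_A", "data_1", "supportive"),  # duplicate link, counted once
    ("review_A", "data_2", "supportive"),
    ("model_B", "review_A", "supportive"),
    ("other_C", "data_3", "critical"),
]

def build_network(records):
    """Return {citing: {cited: stance}}, keeping one link per ordered pair of papers."""
    network = defaultdict(dict)
    for citing, cited, stance in records:
        # setdefault keeps the first stance seen and ignores repeated links.
        network[citing].setdefault(cited, stance)
    return dict(network)

network = build_network(citations)

# Citations received by each paper (its in-degree in the network).
in_degree = defaultdict(int)
for citing, targets in network.items():
    for cited in targets:
        in_degree[cited] += 1
```

Keeping the stance on the link rather than on the paper mirrors the classification above, in which each citation is labelled according to the statement it supports.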
This citation network was further extended into research proposals funded by the US National Institutes of Health, obtained through the Freedom of Information Act in accordance with National Institutes of Health policy.
This claim specific citation network was then analysed using graph theory303 (see methods in web extra). Briefly, custom MATLAB software (MathWorks; Natick, MA) and the MatlabBGL package (written by David Gleich) were used for the analysis of adjacency matrices representing these networks. A centrality measure304 on the papers was defined (called the citation path index; similar to other variants of centrality measure305). Authority was measured according to the method of Kleinberg.306 Visualisation of networks was carried out using Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/). The maximum likelihood estimate method307 was implemented in MATLAB, with code available from www.santafe.edu/~aaronc/powerlaws/.
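As a rough illustration of the maximum likelihood idea mentioned above, the sketch below estimates a power law exponent from a list of node degrees. It is not the MATLAB code referenced in the text. It uses the continuous approximation alpha = 1 + n / Σ ln(x_i / x_min), and the function name `powerlaw_alpha`, the cutoff parameter `xmin`, and the sample degrees are assumptions made for this example.

```python
import math

def powerlaw_alpha(values, xmin):
    """Continuous maximum likelihood estimate of a power law exponent.

    Only observations with value >= xmin enter the estimate; xmin is an
    assumed, user-supplied lower cutoff (it is not fitted here).
    """
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal xmin; exponent is undefined")
    return 1.0 + len(tail) / log_sum

# Hypothetical in-degrees (citations received) for a handful of papers.
degrees = [1, 1, 1, 2, 2, 3, 5, 8, 21, 60]
alpha = powerlaw_alpha(degrees, xmin=1)
```

A full analysis would also choose `xmin` from the data and test goodness of fit; this sketch covers only the exponent estimate.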
The claim that β amyloid and its precursors are abnormally and specifically present in inclusion body myositis muscle fibres among many other muscle diseases was studied. The 242 papers containing statements addressing it (all exact text provided in web extra table 3) and the 675 citations (not counting duplicates from one paper to another; see web extra table 4) supporting these statements were used to construct a claim specific citation network (fig 1). This network contained 220609 citation paths, with chains of citations flowing from one paper to the next representing the entire National Library of Medicine PubMed indexed discourse on the claim as of 26 October 2007. The historical growth and various mathematical properties308 of this network are discussed in web extra note 3.
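To make the notion of a citation path concrete, the following sketch counts paths in a small citation network. It assumes that a citation path is any directed chain of one or more citations and that the network has no cycles (a paper cites only earlier work); the exact definition used in the study is given in the web extra and may differ. The function name and the toy edges are hypothetical.

```python
from functools import lru_cache

def count_citation_paths(edges):
    """Count directed paths of length >= 1 in an acyclic citation network.

    edges is an iterable of (citing, cited) pairs.
    """
    successors = {}
    for citing, cited in edges:
        successors.setdefault(citing, set()).add(cited)
        successors.setdefault(cited, set())

    @lru_cache(maxsize=None)
    def paths_from(paper):
        # Each outgoing citation is a path by itself, and it also extends
        # every path that starts at the cited paper.
        return sum(1 + paths_from(cited) for cited in successors[paper])

    return sum(paths_from(paper) for paper in successors)

# Toy example: a review cites a model paper and a data paper,
# and the model paper cites the same data paper.
toy_edges = [("review", "model"), ("model", "data"), ("review", "data")]
total_paths = count_citation_paths(toy_edges)
```

In the toy example three citations yield four paths, because the chain review, model, data counts in addition to the three direct links. This is why the number of paths can grow much faster than the number of citations.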
Within networks certain nodes may be recognised as “authorities,”306 receiving large amounts of network traffic. Such authorities can be identified by computational methods alone through examining the patterns of connections among the nodes; this is how many internet search engines identify authoritative web pages. Because citation is in part an act of communication within a community of people, social network theory309 in particular can be used to analyse it. Under social network theory, authority of a claim indicates the community’s net belief about it. Using these computational methods,306 four primary data papers, five model papers, and one review paper constituted the 10 most authoritative papers. All these papers expressed the view that the claim was true.
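The authority computation can be illustrated with a minimal hub and authority iteration in the style of Kleinberg's method. This is a sketch under stated assumptions, not the MatlabBGL based analysis used in the study: the function name `hits_scores`, the fixed iteration count, and the toy citations are all hypothetical.

```python
def hits_scores(edges, iterations=50):
    """Hub and authority scores from (citing, cited) pairs by repeated updating."""
    nodes = sorted({paper for edge in edges for paper in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a paper: sum of the hub scores of the papers citing it.
        auth = {n: 0.0 for n in nodes}
        for citing, cited in edges:
            auth[cited] += hub[citing]
        # Hub score of a paper: sum of the authority scores of the papers it cites.
        hub = {n: 0.0 for n in nodes}
        for citing, cited in edges:
            hub[citing] += auth[cited]
        # Rescale both score vectors to unit length so values stay bounded.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return auth, hub

# Toy network: three papers cite data_A, only one cites data_B.
toy_edges = [
    ("review_1", "data_A"),
    ("review_2", "data_A"),
    ("model_1", "data_A"),
    ("review_1", "data_B"),
]
auth, hub = hits_scores(toy_edges)
top_authority = max(auth, key=auth.get)
```

In this toy network the heavily cited paper ends up with the highest authority score regardless of what its data show, which is the property the analysis above relies on: authority reflects the pattern of citation, not the validity of the content.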
Of the 10 most authoritative papers, four provided experimental data addressing the claim, reporting the presence of these molecules in inclusion body myositis muscle fibres.74 75 79 80 All four papers were from the same laboratory, two of which79 80 probably reported mostly the same data without citing each other, a practice currently viewed as one that distorts available evidence (see web extra note 4). Major technical weaknesses were present in these papers, most notably a lack of quantitative data as to how many affected muscle fibres were seen and a lack of specificity of reagents for distinguishing β amyloid protein from β amyloid precursor protein (see web extra note 5).
Inspection of the network disclosed six primary data papers that were relatively isolated, receiving no or few citations (fig 1). These papers contained data that refuted or weakened the claim. Three papers71 73 77 from independent laboratories reported that in a combined 35 patients with inclusion body myositis studied, 28 had no affected muscle fibres while the remaining seven had five or fewer affected muscle fibres (typical biopsy sections contain thousands of muscle fibres). Two papers70 72 by the laboratory that wrote the four authority papers reported that two of these molecules (β amyloid precursor protein transcript and protein) were not specific to inclusion body myositis but were present in muscle fibres during regeneration in all diseased controls (up to 43 patients in seven disease categories, including polymyositis, dermatomyositis, Duchenne muscular dystrophy, and amyotrophic lateral sclerosis). These findings weaken the view that abnormal amounts of these molecules have any specificity to inclusion body myositis and that they cause degeneration of myofibre in patients with inclusion body myositis. One of these papers reported that all three molecules, including β amyloid, were produced by muscle invading macrophages in inclusion body myositis and all other inflammatory myopathies,70 offering an alternative source than myofibre production for them and indicating that β amyloid was non-specifically present in other inflammatory myopathy muscle (see web extra note 6 for a detailed discussion of these papers).
To understand why supportive but not critical data achieved authority over the ensuing 12 years since publication of all of these data, the number of citations received by each paper was analysed (fig 2). The supportive papers received 94% of the 214 citations to these primary data, whereas the six papers containing data that weakened or refuted the claim received only 6% of these citations (differing citation frequency, P=0.01). Citation bias, here defined as statistically significant differences in the number of citations received among primary data papers, seemed to be specifically against critical data not the laboratory producing it, as two papers70 72 that were biased against were written by the same research group that wrote four of the highly cited supportive papers. For example, one of the papers70 addresses a crucial question in the specialty, the relation between inflammation and degeneration,1 2 3 9 but reported data that potentially conflicted with the belief that β amyloid is produced by inclusion body myositis myofibres or is uniquely present in inclusion body myositis muscle (reporting that β amyloid is produced by muscle invading macrophages in all inflammatory myopathies). These data have never been cited by their authors despite them having made 104 citations about β amyloid to other primary data papers.
Citation bias has also been used to claim that animal and cell culture experiments are valid models of inclusion body myositis, in 17 papers.81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 Of the 32 citations to primary data from these papers, 31 (97%) flowed to the four highly supportive papers,74 75 79 80 whereas only one citation (3%) was made to any of the six papers that presented data weakening or refuting these as valid models for inclusion body myositis (fig 3). For example, one paper83 cited another74 in support of “abnormal accumulation of Aβ-containing inclusions are present in skeletal muscle of IBM patients” but not papers that found no71 or little73 77 β amyloid protein. Similarly, the same paper83 cited a paper75 in support of “there is evidence that APP [amyloid precursor protein] mRNA levels are selectively enhanced in human IBM [inclusion body myositis] samples thereby providing physiological justification for the overexpression of this protein in transgenic mice,” but not the paper73 that found no β amyloid precursor protein mRNA or the paper,72 by the same authors as the paper,75 that found β amyloid precursor protein mRNA not “selectively enhanced” in inclusion body myositis but present in muscle fibres in all other muscle diseases examined. The uncited data72 suggest that the animal and cell culture experiments are no more models of inclusion body myositis than any other neuromuscular disease in which muscle regeneration occurs.
Some papers cited content but distorted it. This is not citation bias, as papers are cited, but rather a different process called here “citation diversion”—that is, the citing of content but the altering of its meaning in a manner that diverts its implications.
One primary data paper77 reported no β amyloid precursor protein or β amyloid in three of five patients with inclusion body myositis and its presence in only a “few fibres” in the remaining two patients. Three papers28 37 38 cited these data (fig 1) reporting that they “confirmed” the claim (for example, one paper38 said “βAPP76 77 in s-IBM fibers has been confirmed by others”). Whether such data confirm the claim is perhaps open to interpretation. At the least these data are exaggerated and generalised into a view that β amyloid precursor protein is “accumulated in vacuolated muscle fibers of s-IBM patients77 [other]” as stated by one paper,28 supported by an erroneous citation because three patients in the paper77 had 1.4% to 5% of their myofibres vacuolated but all lacked β amyloid precursor protein. Over the ensuing 10 years, these three supportive citations developed into 7848 supportive citation paths—chains of false claim in the network created by citation diversion.
In another example of citation diversion one paper81 stated “Thus, it has been widely accepted that intracellular accumulation of βAPP, Aβ [β amyloid] and other βAPP proteolytic fragments play an important role in the pathogenesis of IBM,86 89” although one of the papers89 had not widely accepted this claim, stating “Aβ-intracellular deposition may be an epiphenomenon unrelated to myofiber death.”
Between 1996 and 2007 support for the claim grew exponentially, with the number of supportive citations and citation paths increasing sevenfold and 777-fold, to 636 citations and 220553 citation paths. In contrast, the critical view grew to only 21 citations and 28 citation paths (fig 4). No papers refuted or critiqued the critical data, but instead the data were just ignored. Analysis of a claim specific citation network can identify exactly which papers and citations have been most influential in pushing forward belief304 305 (see web extra note 7). The increased support was facilitated by a small number of papers, not reporting any primary data, through which large amounts of traffic (citation paths) flow in the network. For example, 63% of all citation paths (n=139391) flow through one review paper21 (compared with 2% of citation paths flowing through randomly selected other papers); 95% of all citation paths flow through four review papers16 18 21 37 by the same research group (8% through four randomly selected other papers).
A lens effect was present in which a small number of these influential review papers and model papers containing no data on claim validity collected and focused citation (similar to a magnifying lens collecting light) on particular primary data papers supportive of the belief, while isolating others that weakened it (fig 4). Such papers have a network property known as high betweenness centrality.304
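The share of citation paths flowing through a single paper, the quantity behind the lens effect described above, can be sketched as follows. The sketch assumes that a path flows through a paper if the paper appears anywhere on it (as start, end, or intermediate step) and that the network is acyclic; the citation path index used in the study is defined in the web extra and may be computed differently. The function name `path_share` and the toy edges are hypothetical.

```python
from functools import lru_cache

def path_share(edges, paper):
    """Fraction of all citation paths (length >= 1) that contain the given paper."""
    successors, predecessors = {}, {}
    for citing, cited in edges:
        for table in (successors, predecessors):
            table.setdefault(citing, set())
            table.setdefault(cited, set())
        successors[citing].add(cited)
        predecessors[cited].add(citing)

    def make_counter(neighbours):
        @lru_cache(maxsize=None)
        def count(node):
            # Number of paths of length >= 1 leaving node along these neighbours.
            return sum(1 + count(nxt) for nxt in neighbours[node])
        return count

    paths_out = make_counter(successors)    # paths starting at a node
    paths_in = make_counter(predecessors)   # paths ending at a node
    total = sum(paths_out(node) for node in successors)
    if total == 0:
        return 0.0
    # Every path containing the paper is an incoming part (possibly empty)
    # joined to an outgoing part (possibly empty), excluding the empty pair.
    through = (paths_in(paper) + 1) * (paths_out(paper) + 1) - 1
    return through / total

toy_edges = [("other", "review"), ("review", "data"), ("other", "data")]
share = path_share(toy_edges, "review")
```

In the toy network three of the four paths contain the review paper, so a paper that reports no data can still carry most of the traffic simply by sitting between the citing literature and the primary data.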
The term amplification can be used to describe the expansion of a claim’s belief system by citation to papers lacking any data addressing it, the phenomenon observed here. Amplification is not inherent to published belief systems. Authors could choose to cite only primary data when making claims, resulting in amplification minimal networks. Amplification of a claim is instead introduced into belief systems through the citing of review papers and other papers that lack data addressing the claim. Certainly such papers may be cited for other reasons; amplification only arises when they are cited to support claims of experimental results reported elsewhere. (See web extra note 8 for further discussion of amplification and methods for quantifying it.)
Papers may be biased against for many potential reasons. To examine the role of bias exclusively against critical content in establishing authority, a simulated network was constructed in which all statements making a supportive claim were amended to recognise critical views of equivalent content and temporal availability. Removing bias against critical content was sufficient to result in authority status for five of the six infrequently cited primary data papers (fig 4), indicating that authority status of the claim emerges from the citation bias against critical content. The claim cannot be both true and false; the resulting balanced authority of supportive and refuting papers indicates that without citation bias there would be balanced belief in its truth and falseness (see web extra note 9).
Distinct from citation bias and amplification, certain types of fact developed and spread through the belief system. These particular facts were not those that arose from restatement of published claims, but rather involved different mechanisms either deliberate or through scholarly negligence, herein called invention. For example, a subclaim (that the accumulation of β amyloid occurs early and precedes other abnormalities) has variously been stated as hypothesis, likelihood, or fact in 27 papers supported by 37 citations (see web extra note 10). Nine of these citations (24%), used to support text making these claims, in fact flowed to papers that contained no statement on the temporal relation of β amyloid to other abnormalities in inclusion body myositis muscle (dead end citations). This subclaim had transformed from hypothesis to “fact” through citation alone, a process that might be called citation transmutation (fig 5). Thus one paper5 contained it as fact (“The appearance of Aβ-positive, noncongophilic deposits precedes vacuolization in IBM muscle fibers80”) supporting this statement by citing the paper80 where it had only been proposed as hypothesis (“may represent early changes of IBM”). Similarly, another paper134 reported this as fact (“our previous studies demonstrated that abnormalities of βAPP precede other changes including congophilia74 80 141”) even though the cited papers stated it only as hypothesis74 80 or made no statement at all141 about the accumulation of β amyloid precursor protein preceding other abnormalities.
In another form of invention, claims are introduced as fact through a “back door” that bypasses peer review and publication of methods and data. This is accomplished by repeated misrepresentation of abstracts as papers (seven different papers, 17 citations to 12 different misrepresented abstracts; for example, citation to Neurol 2003;60:333-334, an abstract with correct listing Neurol 2003;60(suppl 1):A333-4; see web extra note 11). The claim that “β-amyloid42 isoform [is] more common than β-amyloid40”4 is supported in this manner and accepted by peers as fact (paper 2 states this citing paper 4) (see web extra note 12 for another form of invention called title invention).
Through the publication of scientific papers and the demonstration of these publications as evidence of productivity, the elements of bias, amplification, and invention can be used indirectly to support requests for research funding. To determine if these mechanisms were used directly to support such requests, the claim specific citation network was extended from the PubMed indexed literature into the research sections and bibliographies of National Institutes of Health funded grant proposals containing text addressing the claim, obtained under the Freedom of Information Act according to National Institutes of Health policy.310 Of 27 grant proposals requested (identified through searches of the National Institutes of Health CRISP database as described in web extra note 13), nine were released by the National Institutes of Health. These seemed to be the proposals most pertinent to the belief system.
Citation bias or invention was present in eight of nine of these proposals (fig 6). Of 23 citations to primary data (not counting multiple citations from one proposal to a single paper) addressing the claim’s validity, 20 were made to supportive primary data (19 supportive citations and one neutral citation), two were instances of citation diversion (one paper77 again cited for supporting the claim when it weakens it), and one was made to critical content. Invention of fact supported through citation to hypothesis, dead end citation, and abstracts misrepresented as papers were similarly present in these funded proposals. These were sometimes used directly to justify requests for funding of the proposed studies (for example, “The accumulation of epitopes of βAPP is an early event in the disease relative to the other changes,37 96 justifying our focused investigation of Aβ”; one paper37 stated this only as hypothesis; the other paper96 stated this as likelihood not fact, supporting that view also through citation to the other paper,37 stated as hypothesis) (see web extra note 13 for further discussion).
Citation, the act of connecting text statements through reference to the broader literature, is not simply an impartial scholarly method for joining related published knowledge. Citation may be used for self serving purposes311 or as a tool for persuasion312 (see web extra note 14). These aspects of citation might be called social citation. I studied how distortions of the persuasive aspect of social citation may result in broad acceptance of unfounded claims as fact. These distortions can be detected and interpreted through social network theory309 because citation as persuasion is a social behaviour. Network theory applied to citation networks constructed from entire paper bibliographies, such as the science citation network,313 can disclose societal attitudes to journals and specific papers (for example, impact factors), but these networks are not suitable for understanding the foundation for belief in specific claims. When networks are instead confined to citation pertaining to one set of related claims (a claim specific citation network), they become sharply focused tools for understanding social communication pertaining to the claims—what is in effect the published record of a belief system shared by a community. These allow for study of not just what is said about a belief (the traditional scope of review papers), but also who hears it and how it is retold.
The general approach taken here (fig 7) addressed belief in claims; no experiments were done addressing their truth. The computational analysis of the claim specific citation network representing this belief system detected certain distortions in the patterns of citation that would not have been expected had only scholarly citation been used. Primary data that weakened or refuted claims on which the belief was based were ignored (citation bias) and a small number of influential papers and citations exponentially amplified supportive claim over time without presenting new primary data (amplification). Certain related claims were invented as fact. The combined effects of these citation distortions resulted in authority of the belief (acceptance of it) according to social network theory.
There are varied forms and consequences of distorted persuasive citation seen in this study (see box). Citation bias against critical content can be used for the systematic support of claim,314 results in the loss of implications of isolated data (see web extra note 15), and can be used to justify construction of animal models, which can then be circularly used to amplify claim (see web extra note 16). Such animal models have enormous appeal, and some publications describing them achieved authority status in this network (fig 1) despite reporting no data addressing the claim—that is, whether these β amyloid related molecules are present in human inclusion body myositis muscle. Amplification involves repetitive citation of review papers or other papers lacking data, often through self citation, features noted previously in a variation of a claim specific citation network.315 Invention has multiple variations.
Three factors may account for how citation distortions created authority in this belief system. Foremost is the power of citation through the choice of which papers to cite and which to ignore (citation bias), by citing but distorting content (citation diversion), and by using citation to invent fact (citation transmutation, dead end citation, and back door invention).
Second is an inherent property of negative results, which failed to spread through the network. These were not repeatedly cited by their authors in subsequent papers (only one instance was present274) as perhaps there was simply nothing further to say about them. Unlike “positive results” there is nothing exciting to be repeatedly written about how something was not found in an experiment. Thus the progression from data to accepted claim is different within a single paper compared with across many papers in a specialty. Within a single paper readers generally view new claims as false until proved true through convincing methods and results. Across a network of papers, however, the barrier to the propagation of negative results biases claims as being viewed as true until proved false.
Third, this belief system is possibly an information cascade (also called an informational cascade),316 317 an entity resulting when people perceive advantage in accepting the prevailing view over any private information they may have when making choices. Indeed certain mathematical properties of information cascades (preferential attachment) would be expected to produce a network with properties seen here (a biased network with a power law distribution of node degrees; see web extra note 3). Many authors may just not be aware of the critical data, as these data are effectively isolated from the discourse about this claim and not mentioned in any review articles. Although unsound information cascades are in theory fragile and fall apart quickly when exposed,316 this may not occur in biomedical belief systems, where contradicted claims may persist.318
Many published biomedical belief systems may be information cascades because repetition of claims is ubiquitous in the biomedical literature. Many are built on sound data, with authors repeating claims after trusting the published expert opinion of their colleagues. However, there are incentives for generating and joining information cascades regardless of their soundness. Joining an information cascade aids publication as articles have to say something and negative results are biased against.319 Generating and joining an information cascade may improve the likelihood of obtaining research funding because hypothesis driven research is an essential requirement320 at many research funding agencies such as the National Institutes of Health, and successful funding generally requires a “strong hypothesis . . . based on current scientific literature”320—that is, the published belief system of a claim. Chances for successful funding may therefore be increased through joining the cascade (repeating the claim and proposing experimental plans around it). In the extension of this citation network into text within grant proposals that have been funded by the National Institutes of Health, citation bias, diversion, or invention were often present. Once research funding has been used to join a cascade there are further incentives to interpret results through confirmation bias (“in a way that confirms one’s preconceptions and to avoid information and interpretations which contradict prior beliefs”321) to demonstrate success of the research for subsequent funding. Although joining an information cascade may be an optimal behaviour for some people, it reduces the likelihood that future investigators can discover whether it is sound.317
Methods for the construction and analysis of comprehensive claim specific citation networks present challenges and limitations. These include interpreting meaning of text, as people may reasonably interpret text differently, and understanding the distinct phenomena observed (see web extra note 17 for a discussion of these issues). In principle many biomedical claims have an associated citation network, the study of which provides a powerful approach to detecting citation bias, amplification, and invention, and understanding the nature of the authority of the claim.
I thank Daniel Rockmore (Department of Mathematics, Dartmouth College); Daniel Jonah Goldhagen; Peter Park (Harvard Medical School); and Einer Elhauge (Harvard Law School), for thoughtful discussions. Anthony Amato (Harvard Medical School) carried out validation of extracted text and citations.
Funding: SAG is in part supported by National Institutes of Health grants R01NS43471 and R21NS057225, and Muscular Dystrophy Association grant MDA4353. These grants did not contain specific aims directly encompassing this research.
Competing interests: SAG had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Ethical approval: Not required.
Cite this as: BMJ 2009;339:b2680