“What is truth? said jesting Pilate…” Of Truth, F Bacon 1561–1626
Truth in clinical medicine usually emerges slowly from research, kicking itself free from a variety of influences that frustrate its establishment. Clinical research progresses by diverse routes but often starts with descriptive studies that indicate associations between a measured parameter and a disease state; statistical tests are used to estimate the probability that such an association arose by chance. Proof of causality is then established slowly, through a combination of repeated observations, the elucidation of a plausible pathogenic mechanism and, ultimately, interventions that act directly on pathogenic processes or events. Such interventions are often called proof-of-concept experiments. Errors occur quite frequently along this journey, at a number of stages and for a variety of reasons. The diversions that result may be expensive, wasteful, and potentially dangerous to patients. The sources of erroneous conclusions are therefore of importance to patients, researchers, and medical practitioners, as well as to editors of medical journals.
Among the many sources of error is the improper use of statistics. A surprising and depressing truth for clinical investigators is the lack of robustness of the statistical tests used, which often manifests itself as bewilderment after consultation with a statistician. Central to the way statistics are used in biomedical research, however, is that they do not purport to establish truth; they merely give some insight into the likelihood that an observation arose by chance.
Genetic studies are a particularly egregious example of this problem. In single gene disorders, which show a pattern of disease approximating Mendelian inheritance and in which large families can be collected to provide many informative meioses across generations, a mutation that tracks with the disease gives very high levels of statistical significance, with a probability of chance association comparable to the odds of winning the National Lottery jackpot. Even so, such findings indicate only that the variant being studied lies physically close to the one that causes the disease; concluding that the variant is itself causal requires further biological evidence. Ultimate proof, for example, sometimes requires recreation of the phenotype in an animal model or reversal of the phenotype by restoring production of the wild-type protein.1
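To make concrete why many informative meioses produce such extreme significance, the sketch below computes a LOD score (the base-10 logarithm of the likelihood ratio for linkage versus no linkage). It rests on simplifying assumptions not taken from this editorial: phase-known, fully informative meioses and a single recombination fraction θ. The helper name `lod_score` is hypothetical. Under these assumptions, 24 informative meioses with no recombinants give a LOD of about 7.2, that is, odds of 2^24 (roughly 16.8 million) to 1 in favour of linkage, of the same order as the 1 in 13,983,816 chance of matching all six numbers in a 6-from-49 draw (a lottery format assumed here purely for illustration).

```python
import math


def lod_score(n_meioses: int, n_recombinants: int, theta: float) -> float:
    """LOD score for phase-known, fully informative meioses (hypothetical helper).

    Compares the likelihood of the observed recombinants at recombination
    fraction `theta` with the likelihood under no linkage (theta = 0.5).
    """
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    if not 0 <= n_recombinants <= n_meioses:
        raise ValueError("need 0 <= n_recombinants <= n_meioses")
    r, n = n_recombinants, n_meioses
    if theta == 0.0 and r > 0:
        return -math.inf  # a single recombinant excludes complete linkage
    # log10 likelihood if linked: theta^r * (1 - theta)^(n - r); skip 0 * log10(0)
    log_l_linked = (r * math.log10(theta) if r else 0.0) + (n - r) * math.log10(1 - theta)
    # log10 likelihood if unlinked: each meiosis is recombinant with probability 1/2
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


if __name__ == "__main__":
    lod = lod_score(n_meioses=24, n_recombinants=0, theta=0.0)
    print(f"LOD = {lod:.2f}; odds for linkage of about {10 ** lod:,.0f} to 1")
    print(f"6-from-49 jackpot: 1 chance in {math.comb(49, 6):,}")
```

Each additional non-recombinant meiosis doubles the odds, which is why large multigenerational pedigrees reach such extreme values while still saying nothing, on their own, about which nearby variant is causal.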
Genetic studies in multifactorial, and inevitably polygenic, diseases are much more difficult. In these conditions the genetic component may be quite modest (as it is in coronary artery disease) and may be spread across a large number of genes, further diluting the effect of variation in any single gene. Further, variation in these genes (polymorphisms) may be associated only with a particular phenotype contained within the main diagnosis (for example, presentation with acute coronary syndrome versus stable angina, or low renin versus high renin hypertension).
The approach to investigating polygenic disease has been broadly twofold: linkage studies, in which sets of affected family members (often sibling pairs) are collected and whole genome screens are conducted with microsatellite markers of genetic variation; and case–control association studies, in which polymorphic variation in plausible pathogenic genes is tested for association with disease states, or phenotypes, by comparing the frequencies of different alleles in cases and controls. It is the latter, as a scientific tool, that is much exercising the medical research community.
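A minimal sketch of the allele-frequency comparison at the heart of a case–control association study follows. It assumes a biallelic polymorphism, converts genotype counts to allele counts with two alleles per person (which presumes the alleles behave independently, as under Hardy–Weinberg equilibrium), and applies a Pearson chi-square test with 1 degree of freedom. The counts and the names `AlleleCounts`, `allele_counts`, and `allelic_test` are hypothetical, and this allelic test is only one common choice; genotype-based tests such as the Cochran–Armitage trend test are alternatives.

```python
import math
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    minor: int  # copies of the variant (minor) allele
    major: int  # copies of the other allele


def allele_counts(aa: int, ab: int, bb: int) -> AlleleCounts:
    """Genotype counts (A = minor allele) to allele counts, two alleles per person."""
    return AlleleCounts(minor=2 * aa + ab, major=ab + 2 * bb)


def allelic_test(cases: AlleleCounts, controls: AlleleCounts) -> tuple[float, float, float]:
    """Pearson chi-square (1 df, no continuity correction) on the 2x2 allele table.

    Returns (chi_square, p_value, allelic_odds_ratio). Assumes no cell is zero.
    """
    a, b = cases.minor, cases.major
    c, d = controls.minor, controls.major
    n = a + b + c + d
    # Closed form of the Pearson statistic for a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AB, BB) in 200 cases and 200 controls
    cases = allele_counts(20, 90, 90)
    controls = allele_counts(10, 70, 120)
    chi2, p, odds_ratio = allelic_test(cases, controls)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}, allelic OR = {odds_ratio:.2f}")
```

With these invented counts the test returns p of roughly 0.0015, a result that would look convincing in isolation but says nothing about how many other polymorphisms were tested in the same sample.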
Genetic case–control association studies, especially small, underpowered ones, are easy to do. If well conducted, with intensively characterised phenotypes and with appropriate sample size, study design, and use of statistics, they can be very powerful. However, their ease of execution is in some ways their downfall. Relatively small datasets of patients with imprecisely characterised phenotypes are repeatedly tested for association with ever increasing numbers of genes, sometimes with no clear hypothesis. Statistical testing is often conducted with little or no acknowledgement of the number of times the sample has been tested, and with no correction of the p value for these multiple comparisons. Estimates of study power are often based on total patient numbers rather than on the number of informative events. The situation is exacerbated by the potential for publication bias, whereby positive results are more frequently submitted and accepted for publication than negative ones. As a consequence, numerous small studies report associations that cannot be confirmed in large definitive studies. These studies have therefore become devalued currency, to the extent that some top-ranking genetics journals will not even consider them for publication. Clinical investigators have not, however, been deterred, and journals such as Heart now receive an ever increasing number of such studies.
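A short sketch of how uncorrected multiple testing inflates false positives. Assuming k independent tests, each at α = 0.05, and that every null hypothesis is true, the chance of at least one spurious "significant" result is 1 − (1 − α)^k, which for 20 polymorphisms is about 0.64. The Bonferroni correction is shown as one simple remedy, judging each test at α/k; it is conservative when tests are correlated, for example through linkage disequilibrium between polymorphisms. The function names and p values are hypothetical.

```python
def family_wise_error(alpha: float, k: int) -> float:
    """P(at least one false positive) over k independent tests of true nulls at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag p values that remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)  # each test judged at alpha / k
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(f"{k:>2} tests: P(>=1 false positive) = {family_wise_error(0.05, k):.2f}")
    # Hypothetical nominal p values from four polymorphisms tested in one sample
    print(bonferroni([0.001, 0.02, 0.04, 0.3]))
```

In the usage example, two nominally significant results (p = 0.02 and 0.04) fall away once the four comparisons are acknowledged.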
In an attempt neither to dismiss a technique that, when used appropriately, gives useful information, nor to mislead, Heart has now formulated instructions to authors that set out what the journal expects of genetic case–control association studies (http://heart.bmjjournals.com/misc/ifora.shtml). These guidelines accept that the demands for correction for multiple comparisons and for statistical certainty suggested by some (p values often as small as 10⁻⁵)2 are likely to kill off this type of research and, in any case, are not always statistically appropriate. They also recognise that repeated findings of the same association add to confidence that it is real, and note that initial studies which have since been replicated have had p values of less than 10⁻³.3 Thus initial studies that are presented as “hypothesis generating”, and that require subsequent replication, can be of value.
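To illustrate why very stringent thresholds such as 10⁻⁵ can make modest studies impractical, the sketch below estimates the number of subjects per group for an allelic comparison at several significance levels. It assumes a normal-approximation, two-sided test of two proportions on allele counts, equal numbers of cases and controls, two independent alleles per subject, and a hypothetical variant with control allele frequency 0.2 and allelic odds ratio 1.3; none of these values comes from the guidelines. Under these assumptions the requirement is about 660 subjects per group at α = 0.05 and about 2,300 at α = 10⁻⁵. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by the control frequency and allelic odds ratio."""
    case_odds = odds_ratio * control_freq / (1 - control_freq)
    return case_odds / (1 + case_odds)


def subjects_per_group(control_freq: float, odds_ratio: float,
                       alpha: float, power: float = 0.8) -> int:
    """Approximate cases (= controls) needed to detect an allelic odds ratio.

    Normal-approximation two-sided test of two proportions, with alleles as the
    unit of analysis and two independent alleles per subject (an assumption).
    """
    if odds_ratio == 1:
        raise ValueError("odds_ratio must differ from 1")
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)


if __name__ == "__main__":
    # Hypothetical variant: control allele frequency 0.2, allelic odds ratio 1.3
    for alpha in (0.05, 1e-3, 1e-5):
        n = subjects_per_group(0.2, 1.3, alpha)
        print(f"alpha = {alpha:g}: about {n} cases and {n} controls for 80% power")
```

The same function also shows that the required size depends on allele frequency and effect size, not on patient numbers alone.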
It is also clear that where genetic variation is associated with an intermediate marker (phenotype) that is closely or immediately related to the gene product, conventional levels of significance and conventional power calculations can more reliably be used to size the study. Absolute requirements on study size have not been set, because the power to detect, or exclude, a given genetic effect depends on many parameters other than the number of informative events or cases. For example, stringent selection of cases to enrich for genetic load, analysis of functional variants or of previously implicated haplotypes of variants, and analysis of heritable intermediate phenotypes all increase the prior likelihood that an observed association will be real. At the same time, it is important that this journal does not encourage the publication of articles reporting associations that cannot be made with confidence, or that lack transparency in the accurate assessment and description of the phenotype and an honest account of the number of comparisons made with a dataset.
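A sketch of how prior likelihood, power, and significance level together determine how often a "significant" association is a false positive. It uses a simple Bayesian calculation, sometimes called the false-positive report probability: FPRP = α(1 − π) / [α(1 − π) + (1 − β)π], where π is the assumed prior probability that the association is real and 1 − β is the power. The priors below are hypothetical; the point is that measures which raise π lower the chance that a result at a given p value is spurious. With α = 0.05 and 80% power, the probability is about 0.98 when π = 0.001 but 0.36 when π = 0.1.

```python
def false_positive_report_probability(alpha: float, power: float, prior: float) -> float:
    """Probability that a result significant at level alpha is a false positive.

    prior: assumed probability, before the study, that the association is real.
    """
    false_pos = alpha * (1 - prior)  # true null, yet declared significant
    true_pos = power * prior  # real association, detected
    return false_pos / (false_pos + true_pos)


if __name__ == "__main__":
    # Hypothetical priors, e.g. an arbitrary candidate polymorphism (low) versus a
    # functional variant studied in an enriched sample (higher)
    for prior in (0.001, 0.01, 0.1):
        fprp = false_positive_report_probability(alpha=0.05, power=0.8, prior=prior)
        print(f"prior = {prior:<5}: P(false positive | p < 0.05) = {fprp:.2f}")
```

Lowering α also reduces the false-positive probability, but, as the sample-size arithmetic for stringent thresholds suggests, at a steep cost in study size; raising the prior through study design is the complementary lever.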
It is recognised that this is an area in which opinions differ, and indeed the consultation exercise that was undertaken suggests that opinion is quite widely spread. The new guidelines to authors contain some flexibility, but set clear lines for what this journal feels may be acceptable for publication and the style in which it wishes to receive submissions. It is hoped that they will not only facilitate the review process and lead to more consistency in what is accepted for publication, but also help investigators design better studies from the outset.