Clin Transl Sci. Author manuscript; available in PMC 2012 December 1.
Published in final edited form as:
PMCID: PMC3252208

Evaluating Mastery of Biostatistics for Medical Researchers: Need for a new assessment tool

Felicity Enders, PhD, MPH


Research training has enabled academic clinicians to contribute significantly to the body of medical research literature. Biostatistics represents a critical methodological skill for such researchers, as statistical methods are increasingly a necessary part of medical research. However, there is no validated knowledge and skills assessment for graduate level biostatistics for academic medical researchers. In this paper I review graduate level statistical competencies and existing assessment instruments, both those intended to assess physicians’ ability to read the medical literature and those designed for undergraduate statistics, for their alignment with the core competencies necessary for successful use of statistics. This analysis shows a need for a new instrument to assess biostatistical competencies for medical researchers.

Keywords: core competency, biostatistics, knowledge and skills assessment


Research training has enabled academic clinicians to contribute significantly to the body of medical research literature. The National Institutes of Health (NIH) offers a series of grants to fund training for such researchers, both individually and in training programs. Biostatistics represents a critical methodological skill for such researchers, as statistical methods are increasingly a necessary part of medical research. Published original medical research articles now nearly always include statistics, and the complexity of the statistical methods has increased over time [1, 2]. The need for an assessment instrument is clearly indicated by Young et al (2002) [3], who found that physicians who claimed to understand statistical methods in a self-assessment were unable to adequately explain these methods during a structured interview. Nevertheless, statistical skill is very valuable. When graduates are competent to perform more basic statistical methods independently, their research can be completed more quickly and at lower cost. This also helps biostatistics departments in academic medical centers, which are typically overwhelmed with work. Transferring some projects to medical research staff is in the best interests of both groups.

There are two primary disciplines which train medical researchers: clinical and translational science (CTS) and public health (PH). Public health programs are well established compared to the clinical and translational science programs, which for the most part have been developed in response to NIH funding proposals in recent years. Both types of programs include biostatistics in required coursework. In the past few years, both disciplines have worked to develop core competencies to clarify their discipline [4, 5]. While the competencies differ in other areas, the goals for biostatistics are largely the same.

Students’ competency in biostatistics should mirror the statistical methods required to engage in research by the time they complete the degree. The most fundamental statistical methods for design, analysis and reporting of clinical research are described as part of the Consolidated Standards of Reporting Trials (CONSORT) document [6], which lists criteria for manuscripts from randomized clinical trials. Because randomized trials have fewer issues with confounding than observational studies, some of the more advanced statistical topics are de-emphasized in CONSORT. While randomized trials represent the pinnacle of medical research due to their experimental nature and greatly reduced chance of confounding, randomization is not always ethical or feasible. Therefore students also need a thorough grounding in observational methods; similar criteria have been developed for observational studies in the Transparent Reporting of Evaluations with Non-randomized Designs (TREND) and Strengthening the Reporting of OBservational studies in Epidemiology (STROBE) manuscript guidelines [7-9]. Of note, statistical errors in published manuscripts and those submitted for publication have continued even after the initial CONSORT document was published in 2001 [10-12]. In particular, Strasak et al (2007) found that 16% of original research articles in the New England Journal of Medicine used an incorrect statistical test; they found the same type of mistakes in 27% of original research articles in Nature Medicine [10].

There has not yet been an effort to develop a knowledge and skills assessment for graduate level biostatistics to match the disciplines’ core competencies or the manuscript guidelines. However, such an assessment will be needed to ensure that students are learning the appropriate skills during their coursework, that program graduates have attained the intended competencies, and that competency is retained and provides the necessary research skills once graduates become practicing medical researchers.

While no assessment has been developed specifically for this group, there are two related types of assessment instruments available: outcome measures for introductory undergraduate statistics [13] and assessments of the biostatistical methods practicing physicians need to read and comprehend the medical literature [14-20]. The goal of this paper is to assess what topics should be included in an assessment for graduate level medical researchers and to evaluate whether any of the existing instruments is sufficient for this population.


A comprehensive list of biostatistics and statistically-related competencies for academic clinicians was developed in stages. First, I combined the statistical competencies from the CTS and PH competency documents to obtain a full list of competencies previously identified for graduate level biostatistics. In the interests of completeness, this included those areas required for or dependent upon statistics. Competencies which were similar from a statistical viewpoint were grouped both within and between disciplines. I then compared this list with the set of statistical methods needed to write manuscripts with different experimental and observational study designs by using three guidelines: CONSORT, TREND, and STROBE [6-9]. Methods from the manuscript guidelines which were not included in the disciplines’ competency list were added to create a set of comprehensive competencies for statistics in medical research.

Assessment instruments were identified by searching PubMed and Google Scholar. Unfortunately, searches such as “statistics assessment survey validation” produce results from across the medical literature because these topics are so inextricably intertwined with research, so it is possible that some assessment tools were missed in this review. Only published instruments designed to specifically assess statistics or biostatistics and which included either the instrument itself or a detailed description of individual questions were included in this review. The instruments were described in terms of the intended population, goals, and validation.

The instrument validation chosen by the authors included a variety of methods. In content validation, experts with knowledge of what should be contained in such an instrument provide a review. Two types of criterion validity were also used; in criterion validity, the scale is compared to a gold standard, in this case one associated with a higher level of statistical expertise. One of these was concurrent validity, in which the scores from people with more statistical expertise were shown to be better than the scores from people with less statistical expertise. The other was responsiveness, in which the instrument score was shown to change following exposure to statistical training.
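The two criterion validity checks described above can be illustrated with a minimal sketch. All scores below are made-up illustrative values, not data from any of the reviewed instruments, and the function names are hypothetical. The sketch only computes mean differences; a formal validation study would typically also apply an appropriate statistical test.

```python
from statistics import mean

# Hypothetical instrument scores (number of items correct); illustrative only.
scores_more_expertise = [14, 16, 15, 17]
scores_less_expertise = [9, 11, 10, 12]

# Hypothetical paired scores for the same people before and after training.
pre_training = [8, 10, 9, 11]
post_training = [12, 13, 12, 15]


def concurrent_validity_gap(higher, lower):
    """Mean score difference between a more expert and a less expert group."""
    return mean(higher) - mean(lower)


def responsiveness_change(pre, post):
    """Mean within-person change in score following statistical training."""
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be paired")
    return mean(after - before for before, after in zip(pre, post))


gap = concurrent_validity_gap(scores_more_expertise, scores_less_expertise)
change = responsiveness_change(pre_training, post_training)
print(f"Concurrent validity: expert group scores {gap} points higher on average")
print(f"Responsiveness: scores change by {change} points on average after training")
```

A positive gap is consistent with concurrent validity, and a positive mean change is consistent with responsiveness, under the assumption that higher scores reflect greater statistical expertise.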

A list of statistical topic areas was developed iteratively using the assessment items in conjunction with the author’s experience as a faculty member teaching introductory biostatistics for eight years. Individual items from each instrument were then associated with statistical topic areas. More than one statistical topic area was assigned only when all areas were required to correctly respond to the question. Questions designed to collect self-efficacy data were excluded. Matching items were treated as individual questions for each matched pair. The statistical topic areas included in existing assessment instruments were cross-referenced with the statistical competencies for a gap analysis.
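The item-to-topic mapping and gap analysis described above can be sketched in code. This is a minimal illustration under stated assumptions: the instrument names, items, topic areas, and competencies below are hypothetical placeholders, not the actual content of the reviewed instruments or competency documents.

```python
from collections import Counter

# Hypothetical items: (instrument, item id, topic areas required to answer correctly).
items = [
    ("Instrument A", 1, {"confidence intervals"}),
    ("Instrument A", 2, {"p-values", "study design"}),
    ("Instrument B", 1, {"p-values"}),
    ("Instrument B", 2, {"diagnostic testing"}),
]

# Hypothetical competencies, each listing the topic areas it relies on.
competencies = {
    "Interpret results": {"confidence intervals", "p-values"},
    "Analyze dependent data": {"paired or longitudinal methods"},
}


def topic_counts(items):
    """Count items per topic within each instrument, plus total items per instrument."""
    counts = {}
    totals = Counter()
    for instrument, _item_id, topics in items:
        totals[instrument] += 1
        # An item needing several topic areas is counted once under each of them.
        counts.setdefault(instrument, Counter()).update(topics)
    return counts, totals


def percent_table(counts, totals):
    """Percent of each instrument's items associated with each topic."""
    return {
        instrument: {topic: round(100 * n / totals[instrument])
                     for topic, n in per_topic.items()}
        for instrument, per_topic in counts.items()
    }


def gap_analysis(items, competencies):
    """Topics a competency requires that no item in any instrument covers."""
    covered = set().union(*(topics for _, _, topics in items))
    return {name: sorted(needed - covered)
            for name, needed in competencies.items()
            if needed - covered}


counts, totals = topic_counts(items)
percents = percent_table(counts, totals)
gaps = gap_analysis(items, competencies)
print(percents)
print(gaps)
```

Because an item that requires several topic areas is counted under each one, the percentages within an instrument can sum to more than 100. The gap analysis here flags only topics that are entirely absent; it says nothing about the quality or quantity of the items that do exist.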


Table 1 shows a comprehensive set of statistical competencies together with the source of each. In addition to the 11 competencies described in the CTS and PH competency documents, there are six more competencies implied by the three sets of manuscript guidelines.

Table 1
Sources of statistical competencies, including a comparison of similar competencies from the CTS and PH disciplines.

Table 2 describes meta-data for the statistical assessment instruments including the target population and validation metrics. One of the instruments, by delMas et al [13], was designed to assess students learning undergraduate statistics. All of the remaining instruments were designed to assess whether various subsets of the medical community are able to use statistics effectively when reading the medical literature. None of the assessment instruments was designed specifically for a population of medical researchers.

Table 2
Overview of the target population and validation results for the statistical assessment instruments.

Table 3 shows the number and percent of items in each instrument pertaining to each topic. The numbers in the header row represent the total number of items in each instrument. Instruments are grouped by the intended population (undergraduate or medical). Readers should recall that some items are represented more than once when multiple topic areas are needed to correctly answer the question. Only three questions were associated with none of these statistical topics. One question in delMas et al assesses generalizability of results. One question in Berwick et al assesses regression to the mean; another requires students to identify the typical length of disease from prevalence and incidence information. One topic was missing from all these instruments: methods appropriate for clustered, matched, paired, or longitudinal studies.

Table 3
Number (percent) of items associated with statistical topic areas for undergraduate and medical assessment instruments

Within the instruments designed to assess numeracy in the medical community, there remains a wide variation in the statistical topics assessed. However, several topics are included in all but one or two instruments. These include the following statistical topic areas: assessing assumptions and selecting an appropriate method, unadjusted methods for independent continuous and binary data, significance testing and p-values, confidence intervals, and clinical relevance vs. statistical significance.

Table 4 cross-references the statistical methods used in Table 3 with the statistical competency areas. In addition, a gap analysis is provided to identify topics needed to address the competencies but not included in the set of statistical topics from Table 3. The gap analysis addresses only topic areas missing from the existing instruments, not the quality or quantity of existing questions.

Table 4
Comparison and gap analysis of statistical topics versus competency areas


This analysis shows the need for a new instrument to assess biostatistical competencies for medical researchers. Neither the existing instruments nor a set of questions taken across these instruments sufficiently addressed the competencies required for medical research using statistics. Further research will be required to identify and validate a brief but complete set of questions addressing the competency areas appropriately.

The Comprehensive Assessment of Outcomes in Statistics (CAOS) test by delMas et al [13] was specifically designed to assess students’ understanding of variability, and this is reflected in the topic areas of the questions. While the test is appropriate for undergraduates in introductory statistics, the difference in distribution of question topics corroborates prior work demonstrating quantifiable baseline differences among graduate students in biostatistics [21]. While variability remains a critical topic for the foundation of biostatistics, the competency expectations for this group are far more complex. However, the CAOS test is the only measure designed to assess students who are learning statistics in the classroom, rather than those who need only use statistics to read journal articles. As such, it provides an important indication of some topics which may be important for the learning process, such as understanding study designs and the central limit theorem.

The instruments designed to assess practicing physicians fit more closely with the goals of this paper. However, practicing physicians need only to be able to read and understand the research literature. They need not understand the intricacies of the statistical methods or perform the methods. The questions designed for this population reflect the need for physicians to understand pretest and posttest probability well in order to treat their patients appropriately [22]; there are numerous questions on diagnostic testing. Nevertheless, as a whole the instruments for this group seem somewhat inadequate even for their designated purpose, as there is little attention paid to assessing whether the appropriate method has been used or to interpreting statistical results. Both of these topics are required to robustly critique or defend a research manuscript. Similarly, very few questions pertain to confounding, yet a thorough grounding in this topic is essential for understanding and assessing observational studies. More research may be required to develop or refine an assessment instrument more appropriate for statistical methods in evidence-based medicine.

The statistical competencies for clinical and translational research seem more mature than those from public health. This likely reflects the timing of competency development; the clinical and translational research group benefitted from the prior work of their public health colleagues. Both disciplines’ documents appear to have some gaps (Table 1). These are presumably oversights; competencies such as proposing appropriate study designs and performing sample size and power calculations are routinely included in most graduate level curricula for both disciplines. However, as competency documents are increasingly used to guide program evaluation, it is to be hoped that these gaps will be considered in future revisions.

There are several statistical topics required by the competencies but not included in any of these assessment instruments. Perhaps the most egregious of these omissions is the lack of items to assess dependent data, such as paired, matched, or longitudinal data. Such data are frequently used in observational studies, as matching is one way to minimize the effect of anticipated confounders. Indeed, Horton and Switzer (2005) found that 12% of New England Journal of Medicine’s original articles in 2004-2005 used repeated measures methods, one of the more advanced techniques for coping with dependent data [23]. Together, these gaps may serve as a starting point for an effort to develop a new assessment instrument targeted to graduate students in CTS and PH.

In developing a new assessment instrument for researchers, Table 4 may help guide instrument development as it shows both a comprehensive list of statistical competencies and the corresponding statistical methods upon which specific questions can be based. Most importantly, Table 4 also identifies topics that are not included within this set of questions, but this list should not be considered exhaustive. A review of individual items for quality and comprehensiveness will be needed to determine whether further questions are required for topic areas with existing questions.

It is likely that two instruments will be required, one a full assessment of statistical competency and another an assessment designed for topics taught in introductory biostatistics courses. This distinction is aided by the wording of the competencies, which vary in their level of expectation. Students are expected to perform lower level statistical methods themselves but work with a statistician for more complex methods. Working with a statistician requires some knowledge of appropriate methods, sufficient common language for collaboration, and the ability to interpret statistical results. Few of the previously developed items assess this second level of understanding.

A new assessment instrument would ideally be developed and validated by a group of inter-institutional experts in biostatistics as used in clinical research. Fortunately, such a group has been formed by the National Institutes of Health through the CTSA award mechanism. Researchers from CTSA institutions collaborate in a variety of areas, including education and evaluation [24]. Indeed, the biostatistics courses taught within the CTSA come from both public health and clinical and translational research programs, so this collaboration could also provide an initial step toward unifying the statistical competencies from these disciplines. A new publicly available instrument would open the door for further intra- and inter-institutional research on knowledge and skill levels before and after biostatistics coursework, before and after research degree programs, and among practicing researchers. This research would hopefully aid both statistics coursework and program development to improve graduate outcomes.


Funding: Mayo Clinic CTSA (NCRR U54RR 24150-5)


1. Hellems MA, Gurka MJ, Hayden GF. Statistical literacy for readers of Pediatrics: a moving target. Pediatrics. 2007;119(6):1083–1088.
2. Reed JF, 3rd, Salen P, Bagher P. Methodological and statistical techniques: what do residents really need to know about statistics? Journal of medical systems. 2003;27(3):233–238.
3. Young JM, Glasziou P, Ward JE. General practitioners’ self ratings of skills in evidence based medicine: validation study. BMJ. 2002;324(7343):950–951.
4. Workgroup CECC Core Competencies in Clinical and Translational Science for Master’s Candidates.
5. Calhoun JG, Ramiah K, Weist EM, Shortell SM. Development of a Core Competency Model for the Master of Public Health Degree. American Journal of Public Health. 2008;98(9):10.
6. Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 Explanation and Elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869, 1–28.
7. Des Jarlais D, Lyles C, Crepaz N, Group T. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: the TREND statement. Am J Public Health. 2004;94(3):361–366.
8. von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Bulletin of the World Health Organization. 2007;85(11):867–872.
9. Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. PLoS medicine. 2007;4(10):e297.
10. Strasak AM, Zaman Q, Marinell G, Pfeiffer KP, Ulmer H. The Use of Statistics in Medical Research: A Comparison of The New England Journal of Medicine and Nature Medicine. American Statistical Association. 2007;61(1):47–56.
11. Olsen CH. Review of the use of statistics in infection and immunity. Infection and immunity. 2003;71(12):6689–6692.
12. Harris AH, Reeder R, Hyun JK. Common statistical and research design problems in manuscripts submitted to high-impact psychiatry journals: what editors and reviewers want authors to know. Journal of psychiatric research. 2009;43(15):1231–1234.
13. delMas R, Garfield J, Ooms A, Chance B. Assessing Students’ Conceptual Understanding After A First Course in Statistics. Statistics Education Research Journal. 2007;6(2):28–58.
14. Ferrill MJ, Norton LL, Blalock SJ. Determining the Statistical Knowledge of Pharmacy Practitioners: A Survey and Review of the Literature. American Journal of Pharmaceutical Education. 1999;63
15. Berwick DM, Fineberg HV, Weinstein MC. When doctors meet numbers. AM J Med. 1981;71(6):991–998.
16. Windish D, Huot S, Green M. Medicine Residents’ Understanding of the Biostatistics and Results in the Medical Literature. The Journal of the American Medical Association. 2007;298(9):1010–1022.
17. Wulff HR, Andersen B, Brandenhoff P, Guttler F. What do doctors know about statistics? Stat Med. 1987;1987(6):1.
18. Novack L, Jotkowitz A, Knyazer B, Novack V. Evidence-based medicine: assessment of knowledge of basic epidemiological and research methods among medical doctors. Postgrad Med J. 2006;82:817–822.
19. Ahmadi-Abhari S, Soltani A, Hosseinpanah F. Knowledge and attitudes of trainee physicians regarding evidence-based medicine: a questionnaire survey in Tehran, Iran. Journal of Evaluation in Clinical Practice. 2008;14:5.
20. Raju TN, Langenberg PW, Vidyasagar D, Sen AK. A biostatistical survey questionnaire. The Journal of pediatrics. 1988;112(6):859–863.
21. Enders F. Performance of Graduate Students on CAOS Test Items. Submitted to the Statistics Education Research Journal on March 17, 2011.
22. Rao G. Physician numeracy: essential skills for practicing evidence-based medicine. Fam Med. 2008;40(5):354–358.
23. Horton NJ, Switzer SS. Statistical Methods in the Journal. New England Journal of Medicine. 2005;353(18):1977–1979.
24. Kon AA. The Clinical and Translational Science Award (CTSA) Consortium and the translational research model. The American journal of bioethics : AJOB. 2008;8(3):58–60. discussion W51-53.