The increasing volume of research by the medical community often leads to increasing numbers of contradictory findings and conclusions. Although the differences observed may represent true differences, the results also may differ because of sampling variability, as all studies are performed on a limited number of specimens or patients. When planning a study reporting differences among groups of patients or describing some variable in a single group, sample size should be considered because it allows the researcher to control the risk of reporting a false-negative finding (Type II error) or to estimate the precision his or her experiment will yield. Equally important, readers of medical journals should understand sample size because such understanding is essential to interpret the relevance of a finding with regard to their own patients. At the time of planning, the investigator must establish (1) a justifiable level of statistical significance, (2) the chance of detecting a difference of a given magnitude between the groups compared, ie, the power, (3) this targeted difference (ie, effect size), and (4) the variability of the data (for quantitative data). We believe correct planning of experiments is an ethical issue of concern to the entire community.
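To make these four ingredients concrete, the following minimal sketch (our illustration, not from the article) computes a per-group sample size for a two-sample comparison of means; the significance level, power, targeted difference, and standard deviation are hypothetical placeholders.

```python
# Minimal sketch: per-group sample size for a two-sample t-test, driven by the
# four planning inputs listed above (all values are hypothetical placeholders).
from statsmodels.stats.power import TTestIndPower

alpha = 0.05         # (1) level of statistical significance
power = 0.80         # (2) chance of detecting the targeted difference
target_diff = 10.0   # (3) targeted difference between group means
sd = 25.0            # (4) variability of the quantitative outcome

cohen_d = target_diff / sd   # standardized effect size
n_per_group = TTestIndPower().solve_power(effect_size=cohen_d, alpha=alpha,
                                          power=power, alternative='two-sided')
print(f"Roughly {n_per_group:.0f} patients per group are needed.")
```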
The idea that perceptual and cognitive systems must incorporate knowledge about the structure of the environment has become a central dogma of cognitive theory. In a Bayesian context, this idea is often realized in terms of “tuning the prior”—widely assumed to mean adjusting prior probabilities so that they match the frequencies of events in the world. This kind of “ecological” tuning has often been held up as an ideal of inference, in fact defining an “ideal observer.” But widespread as this viewpoint is, it directly contradicts Bayesian philosophy of probability, which views probabilities as degrees of belief rather than relative frequencies, and explicitly denies that they are objective characteristics of the world. Moreover, tuning the prior to observed environmental frequencies is subject to overfitting, meaning in this context overtuning to the environment, which leads (ironically) to poor performance in future encounters with the same environment. Whenever there is uncertainty about the environment—which there almost always is—an agent's prior should be biased away from ecological relative frequencies and toward simpler and more entropic priors.
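As a small, purely illustrative sketch of the overfitting argument (not an analysis from the paper), consider estimating a prior over a handful of event categories from a limited sample of an environment: the frequency-matched prior can assign zero probability to events it happens not to have seen, whereas a prior smoothed toward the uniform (maximum-entropy) distribution cannot.

```python
# Illustrative sketch only (not from the paper): a prior tuned exactly to observed
# frequencies versus a more entropic prior smoothed toward uniform via pseudocounts.
import numpy as np

rng = np.random.default_rng(0)
true_freqs = np.array([0.6, 0.25, 0.1, 0.05])  # hypothetical environment
train = rng.multinomial(20, true_freqs)        # small sample used to "tune the prior"
test = rng.multinomial(10_000, true_freqs)     # future encounters with the same environment

def avg_log_loss(prior, counts):
    prior = np.clip(prior, 1e-12, None)        # guard against log(0)
    return -np.sum(counts * np.log(prior)) / counts.sum()

empirical = train / train.sum()                        # frequency-matched prior
entropic = (train + 1.0) / (train.sum() + len(train))  # pulled toward uniform

print("frequency-matched prior:", empirical, avg_log_loss(empirical, test))
print("smoothed (entropic) prior:", entropic, avg_log_loss(entropic, test))
```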
Bayes; Prior probability; Subjectivism; Frequentism
Scientists who use animals in research must justify the number of animals to be used, and committees that review proposals to use animals in research must review this justification to ensure the appropriateness of the number of animals to be used. This article discusses when the number of animals to be used can best be estimated from previous experience and when a simple power and sample size calculation should be performed. Even complicated experimental designs requiring sophisticated statistical models for analysis can usually be simplified to a single key or critical question so that simple formulae can be used to estimate the required sample size. Approaches to sample size estimation for various types of hypotheses are described, and equations are provided in the Appendix. Several web sites are cited for more information and for performing actual calculations.
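As one example of the kind of simple formula referred to here (a generic normal-approximation calculation, not necessarily the exact Appendix equation), the per-group sample size for comparing two means can be computed directly; the inputs below are hypothetical placeholders.

```python
# Sketch of a standard normal-approximation formula for the sample size per group
# needed to compare two means (hypothetical inputs, not the article's Appendix values).
from scipy.stats import norm

alpha, power = 0.05, 0.80
delta = 0.5   # smallest difference in means worth detecting
sigma = 1.0   # assumed common standard deviation

z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)
n_per_group = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
print(f"~{n_per_group:.0f} animals per group")   # about 63 with these inputs
```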
number of animals; power calculation; sample size; statistical analysis
The belief remains widespread that medical research studies must have statistical power of at least 80% in order to be scientifically sound, and peer reviewers often question whether power is high enough.
This requirement and the methods for meeting it have severe flaws. Notably, the true nature of how sample size influences a study's projected scientific or practical value precludes any meaningful blanket designation of <80% power as "inadequate". In addition, standard calculations are inherently unreliable, and focusing only on power neglects a completed study's most important results: estimates and confidence intervals. Current conventions harm the research process in many ways: promoting misinterpretation of completed studies, eroding scientific integrity, giving reviewers arbitrary power, inhibiting innovation, perverting ethical standards, wasting effort, and wasting money. Medical research would benefit from alternative approaches, including established value of information methods, simple choices based on cost or feasibility that have recently been justified, sensitivity analyses that examine a meaningful array of possible findings, and following previous analogous studies. To promote more rational approaches, research training should cover the issues presented here, peer reviewers should be extremely careful before raising issues of "inadequate" sample size, and reports of completed studies should not discuss power.
Common conventions and expectations concerning sample size are deeply flawed, cause serious harm to the research process, and should be replaced by more rational alternatives.
The conventional approach of choosing sample size to provide 80% or greater power ignores the cost implications of different sample size choices. Costs, however, are often impossible for investigators and funders to ignore in actual practice. Here, we propose and justify a new approach for choosing sample size based on cost efficiency, the ratio of a study’s projected scientific and/or practical value to its total cost. By showing that a study’s projected value exhibits diminishing marginal returns as a function of increasing sample size for a wide variety of definitions of study value, we are able to develop two simple choices that can be defended as more cost efficient than any larger sample size. The first is to choose the sample size that minimizes the average cost per subject. The second is to choose sample size to minimize total cost divided by the square root of sample size. This latter method is theoretically more justifiable for innovative studies, but also performs reasonably well and has some justification in other cases. For example, if projected study value is assumed to be proportional to power at a specific alternative and total cost is a linear function of sample size, then this approach is guaranteed either to produce more than 90% power or to be more cost efficient than any sample size that does. These methods are easy to implement, based on reliable inputs, and well justified, so they should be regarded as acceptable alternatives to current conventional approaches.
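A minimal sketch of the two rules, using a hypothetical cost function (a fixed cost plus per-subject costs that rise with n, so that both criteria have finite minima), is shown below; the actual numbers are placeholders.

```python
# Sketch of the two cost-efficiency rules described above, under an assumed
# (hypothetical) cost model: fixed cost plus per-subject costs that rise with n.
import numpy as np

def total_cost(n):
    fixed = 50_000.0
    per_subject = 500.0 + 2.0 * n   # recruitment assumed to get costlier as n grows
    return fixed + per_subject * n

n_grid = np.arange(5, 2001)
costs = total_cost(n_grid.astype(float))

rule1_n = n_grid[np.argmin(costs / n_grid)]           # minimize average cost per subject
rule2_n = n_grid[np.argmin(costs / np.sqrt(n_grid))]  # minimize total cost / sqrt(n)

print("Rule 1 (min cost per subject):", rule1_n)
print("Rule 2 (min cost / sqrt(n)):  ", rule2_n)
```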
Innovation; Peer review; Power; Research funding; Study design
Concerns have recently been raised about the negative effects of patents on innovation. In this study, the effects of patents on innovation among Korean biotech SMEs (small and medium-sized enterprises) were examined using survey data and statistical analysis.
The survey results of this study provided some evidence that restricted-access problems have occurred, even though their frequency was not high. Statistical analysis revealed that difficulties in accessing patented research tools were not negatively correlated with either the level of innovation performance or attitudes toward the patent system.
On the basis of the results of this investigation in combination with those of previous studies, we concluded that although restricted access problems have occurred, this has not yet deterred innovation in Korea. However, potential problems do exist, and the effects of restricted access should be constantly scrutinized.
Quantitative trait loci analysis assumes that the trait is normally distributed. In reality, this is often not the case, and one strategy is to transform the trait. However, it is not clear how much normality is required and which transformation works best in association studies.
We performed simulations on four types of common quantitative traits to evaluate the effects of normalization using the logarithm, Box-Cox, and rank-based transformations. The impact of sample size and genetic effects on normalization was also investigated. Our results show that the rank-based transformation generally gives the best and most consistent performance in identifying the causal polymorphism and ranking it highly in association tests, at the cost of a slight increase in the false positive rate.
For small sample sizes or genetic effects, the improvement in sensitivity from the rank transformation outweighs the slight increase in the false positive rate. However, for large sample sizes and genetic effects, normalization may not be necessary, since the increase in sensitivity is relatively modest.
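For readers unfamiliar with the rank-based transformation evaluated here, a minimal sketch of a rank-based inverse normal transform is given below; the offset constant is one of several standard conventions, and this is a generic implementation rather than the paper's exact procedure.

```python
# Sketch of a rank-based inverse normal transformation of a quantitative trait.
# The offset constant c = 0.5 (i.e. (rank - 0.5) / n) is one common convention;
# Blom's c = 3/8 is another frequently used choice.
import numpy as np
from scipy.stats import rankdata, norm

def rank_inverse_normal(trait, c=0.5):
    """Map trait values to normal quantiles based on their ranks."""
    ranks = rankdata(trait)   # average ranks for ties
    return norm.ppf((ranks - c) / (len(trait) - 2 * c + 1))

# Example: a skewed (log-normal) trait becomes approximately normal after transformation.
trait = np.random.default_rng(1).lognormal(size=200)
transformed = rank_inverse_normal(trait)
```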
“Pure basic” science can become detached from the natural world that it is supposed to explain. “Pure applied” work can become detached from the fundamental processes that shape the world it is supposed to improve. Neither demands the intellectual support of a broad scholarly community nor the material support of society. Translational research can do better by seeking innovation in theory or practice through the synthesis of basic and applied questions, literatures, and methods. Although translational thinking has always occurred in behavior analysis, progress often has been constrained by a functional separation of basic and applied communities. A review of translational traditions in behavior analysis suggests that innovation is most likely when individuals with basic and applied expertise collaborate. Such innovation may have to accelerate for behavior analysis to be taken seriously as a general-purpose science of behavior. We discuss the need for better coordination between the basic and applied sectors, and argue that such coordination compromises neither while benefiting both.
translational research; bridge research; the future of behavior analysis; coordinated bidirectional basic–applied research
Microarray experiments are often performed with a small number of biological replicates, resulting in low statistical power for detecting differentially expressed genes and concomitant high false positive rates. While increasing the sample size can increase statistical power and decrease error rates, too many samples waste valuable resources. The issue of how many replicates are required in a typical experimental system needs to be addressed. Of particular interest is the difference in required sample sizes for similar experiments in inbred vs. outbred populations (e.g. mouse and rat vs. human).
We hypothesize that if all other factors (assay protocol, microarray platform, data pre-processing) were equal, fewer individuals would be needed for the same statistical power using inbred animals as opposed to unrelated human subjects, as genetic effects on gene expression will be removed in the inbred populations. We apply the same normalization algorithm and estimate the variance of gene expression for a variety of cDNA data sets (humans, inbred mice and rats) comparing two conditions. Using one-sample, paired-sample or two-independent-sample t-tests, we calculate the sample sizes required to detect 1.5-, 2-, and 4-fold changes in expression level as a function of false positive rate, power and percentage of genes that have a standard deviation below a given percentile.
Factors that affect power and sample size calculations include variability of the population, the desired detectable differences, the power to detect the differences, and an acceptable error rate. In addition, experimental design, technical variability and data pre-processing play a role in the power of the statistical tests in microarrays. We show that the number of samples required for detecting a 2-fold change with 90% probability and a p-value of 0.01 in humans is much larger than the number of samples commonly used in present-day studies, and that far fewer individuals are needed for the same statistical power when using inbred animals rather than unrelated human subjects.
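To illustrate the type of calculation described above, the sketch below computes the per-group sample size needed to detect a 2-fold change (a difference of 1 on the log2 scale) at p = 0.01 with 90% power using a two-independent-sample t-test; the gene-level standard deviations are illustrative values, not estimates from the datasets analysed in the paper.

```python
# Sketch: per-group sample size to detect a 2-fold expression change at alpha = 0.01
# with 90% power, for several assumed log2-scale standard deviations (illustrative only).
import math
from statsmodels.stats.power import TTestIndPower

delta_log2 = math.log2(2.0)      # a 2-fold change equals 1.0 on the log2 scale

for sd in (0.3, 0.5, 0.8, 1.2):  # hypothetical gene-level standard deviations
    n = TTestIndPower().solve_power(effect_size=delta_log2 / sd, alpha=0.01,
                                    power=0.90, alternative='two-sided')
    print(f"SD = {sd:.1f}: about {n:.0f} samples per group")
```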
An innovative therapy is a newly introduced or modified therapy with unproven effects or side effects, undertaken in the best interest of the patient. The ethical use of innovative therapies has been controversial. In paediatrics, the conflict between withholding a potential rescue therapy and protecting a vulnerable population’s rights and welfare must be considered. Therefore, it is necessary to ensure that such innovation is conducted within an ethical framework that recognizes that the therapy is not standard. This framework should integrate the patient’s autonomy, the role of the institution, professional consensus and innovation evaluation. Innovative therapy represents a justifiable departure from inferior conventional therapy in the absence of an accepted standard therapy. Innovation shares with research its experimental nature, but differs from research in its goal and context, which exempts innovative therapy from direct governance by a research ethics board. Innovative therapy is part of the continuum of hypothesis generation in the advancement of medical knowledge, and its evaluation is a transforming point for clinical research.
Child health; Ethics; Innovation; Review; Therapy
Cox proportional hazards (PH) models are commonly used in medical research to investigate the associations between covariates and time-to-event outcomes. It is frequently noted that with fewer than ten events per covariate, these models produce spurious results and therefore should not be used. The statistical literature contains asymptotic power formulae for the Cox model that can be used to determine the number of events needed to detect an association. Here we investigate via simulations the performance of these formulae in small-sample settings for Cox models with one or two covariates. Our simulations indicate that, when the number of events is small, the power estimate based on the asymptotic formulae is often inflated. The discrepancy between the asymptotic and empirical power is larger for a dichotomous covariate, especially in cases where allocation of sample size to its levels is unequal. When more than one covariate is included in the same model, the discrepancy between the asymptotic power and the empirical power is even larger, especially when a high positive correlation exists between the two covariates.
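One widely cited asymptotic formula of the kind examined here is Schoenfeld's approximation for the number of events needed to detect a hazard ratio for a binary covariate; a minimal sketch follows (the hazard ratio, allocation fraction, alpha, and power are hypothetical inputs).

```python
# Sketch of Schoenfeld's asymptotic formula for the number of events required
# to detect a hazard ratio (HR) for a binary covariate in a Cox PH model.
# As noted above, this approximation can overstate power when the number of
# events is small or when allocation between covariate levels is unequal.
import math
from scipy.stats import norm

def required_events(hr, p_exposed=0.5, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 / (p_exposed * (1 - p_exposed) * math.log(hr) ** 2)

print(required_events(hr=2.0, p_exposed=0.5))   # about 65 events with equal allocation
print(required_events(hr=2.0, p_exposed=0.2))   # unequal allocation requires more events
```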
Time to event data; Cox Proportional Hazards Model; number of events per covariate; event size formula
Relating expression signatures from different sources such as cell lines, in vitro cultures from primary cells and biopsy material is an important task in drug development and translational medicine, as well as for tracking of cell fate and disease progression. The comparison of large-scale gene expression changes to tissue- or cell-type-specific signatures is of particular interest for tracking cell fate in (trans-)differentiation experiments and for cancer research, which increasingly focuses on shared processes and the involvement of the microenvironment. These signature relation approaches require robust statistical methods to account for the high biological heterogeneity in clinical data and must cope with small sample sizes in lab experiments and common patterns of co-expression in ubiquitous cellular processes. We describe a novel method, called PhysioSpace, to position dynamics of time series data derived from cellular differentiation and disease progression in a genome-wide expression space. The PhysioSpace is defined by a compendium of publicly available gene expression signatures representing a large set of biological phenotypes. The mapping of gene expression changes onto the PhysioSpace leads to a robust ranking of physiologically relevant signatures, as rigorously evaluated via sample-label permutations. A spherical transformation of the data improves the performance, leading to stable results even in the case of small sample sizes. Using PhysioSpace with clinical cancer datasets reveals that such data exhibit large heterogeneity in the number of significant signature associations. This behavior was closely associated with the classification endpoint and cancer type under consideration, indicating shared biological functionalities in disease-associated processes. Even though the time series data of cell line differentiation exhibited responses in larger clusters covering several biologically related patterns, top-scoring patterns were highly consistent with a priori known biological information and clearly separated from the rest of the response patterns.
Translational research has tremendous potential as a tool to reduce health disparities in the United States, but a lack of common understanding about the scope of this dynamic, multidisciplinary approach to research has limited its use. The term “translational research” is often associated with the phrase “bench to bedside,” but the expedited movement of biomedical advances from the laboratory to clinical trials is only the first phase of the translational process. The second phase of translation, wherein innovations are moved from the bedside to real-world practice, is equally important, but it receives far less attention. Due in part to this imbalance, tremendous amounts of money and effort are spent expanding the boundaries of understanding and investigating the molecular underpinnings of disease and illness, while far fewer resources are devoted to improving the mechanisms by which those advances will be used to actually improve health outcomes. To foster awareness of the complete translational process and understanding of its value, we have developed two complementary models that provide a unifying conceptual framework for translational research. Specifically, these models integrate many elements of the National Institutes of Health roadmap for the future of medical research and provide a salient conceptualization of how a wide range of research endeavors from different disciplines can be used harmoniously to make progress toward achieving two overarching goals of Healthy People 2010—increasing the quality and years of healthy life and eliminating health disparities.
Translational Research; Health Disparities
Increasing size of G3BP-induced stress granules is associated with a threshold or switch that must be triggered for eIF2α phosphorylation and subsequent translational repression to occur. Stress granules are active in signaling to the translational machinery and may be important regulators of the innate immune response.
Stress granules are large messenger ribonucleoprotein (mRNP) aggregates composed of translation initiation factors and mRNAs that appear when the cell encounters various stressors. Current dogma indicates that stress granules function as inert storage depots for translationally silenced mRNPs until the cell signals for renewed translation and stress granule disassembly. We used RasGAP SH3-binding protein (G3BP) overexpression to induce stress granules and study their assembly process and signaling to the translation apparatus. We found that assembly of large G3BP-induced stress granules, but not small granules, precedes phosphorylation of eIF2α. Using mouse embryonic fibroblasts depleted for individual eukaryotic initiation factor 2α (eIF2α) kinases, we identified protein kinase R as the principal kinase that mediates eIF2α phosphorylation by large G3BP-induced granules. These data indicate that increasing stress granule size is associated with a threshold or switch that must be triggered in order for eIF2α phosphorylation and subsequent translational repression to occur. Furthermore, these data suggest that stress granules are active in signaling to the translational machinery and may be important regulators of the innate immune response.
Current practice results in the publication of many research studies in medical and related disciplines which may be criticised on the grounds of inadequate sample size and statistical power. Small studies continue to be carried out with little more than a blind hope of showing the desired effect. Nevertheless, papers based on such work are submitted for publication, especially if the results turn out to be statistically significant. There is confusion about what makes a result suitable for publication. Often there is a preference for statistically significant results at the peer review stage. Consequently published reports of small studies tend to contain too many false positive results and to exaggerate the true effects. The use of a criterion of a posteriori power does not eliminate the bias; a priori power is the criterion of choice. This could be implemented by peer review of study protocols at the planning stage by funding bodies and journals.
Reconstructive microsurgery for oral and maxillofacial (OMF) defects is considered a niche specialty and is performed regularly only in a handful of centers. Until recently, the pectoralis major myocutaneous flap (PMMC) was considered the benchmark for OMF reconstruction. This philosophy is changing fast with rapid advances in reconstructive microsurgery. Owing to improvements in instrumentation and the development of finer techniques of flap harvesting, we can confidently state that microsurgery has come of age. Better techniques, microscopes and micro-instruments enable us to do things previously unimaginable; supramicrosurgery and ultrathin flaps are a testimony to this. Years of innovation in reconstructive microsurgery have produced a good number of excellent flaps. Considerable research has been published, sometimes with contradictory conclusions, creating a need for clarity in several areas of the field. This article reviews some controversies in reconstructive microsurgery and analyzes some of the most common microvascular free flaps (MFF) used in OMF reconstruction. It aims to buttress the view that three flaps, the radial forearm free flap (RFFF), the anterolateral thigh (ALT) flap and the fibula flap, are the most expedient in the surgeon's arsenal, since they can cater to almost all sizeable defects encountered after ablative surgery in the OMF region. They can thus aptly be called the workhorses of free-flap OMF reconstruction.
Microvascular free flaps; oral and maxillofacial surgery; reconstructive microsurgery
In General Practice, computers might assist clinical decision-making, perform business procedures, and support health care delivery research. Before being used, however, computers must first be economically justifiable. The cost of computer systems is known, and their potential dollar benefit in primary care can be estimated. Computer technology was therefore assessed for its potential to save money in a model General Practice. Information processing needs were noted, functional specifications were developed, and typical costs for systems appropriate to practices of varying size were calculated. Computers might improve primary care in many ways, but savings accrue only from support of billing and accounting. Savings might equal or exceed the cost of a computer system in group practices, optimally composed of between six and eight doctors. If computers could pay for themselves by performing essential business functions, they would then be readily available for other purposes in General Practice.
Sample size calculations are an important part of research, needed to balance the use of resources and to avoid undue harm to participants. Effect sizes are an integral part of these calculations, and meaningful values are often unknown to the researcher. General recommendations for effect sizes have been proposed for several commonly used statistical procedures. For the analysis of 2 × 2 tables, recommendations have been given for the phi correlation coefficient for binary data; however, it is well known that phi suffers from poor statistical properties. The odds ratio is not problematic, although recommendations based on objective reasoning do not exist. This paper proposes odds ratio recommendations that are anchored to phi for fixed marginal probabilities. It is further demonstrated that the marginal assumptions can be relaxed, yielding more general results.
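To illustrate the anchoring idea numerically (with arbitrary margins and odds ratios, not the paper's recommended values), the sketch below recovers the 2 × 2 cell probabilities implied by an odds ratio under fixed marginal probabilities and computes the corresponding phi coefficient.

```python
# Sketch: phi coefficient implied by an odds ratio for fixed marginal probabilities
# in a 2 x 2 table (illustrative margins, not the recommendations derived in the paper).
import math
from scipy.optimize import brentq

def phi_from_or(odds_ratio, p_row=0.5, p_col=0.5):
    """Solve for the joint cell probability p11 matching the odds ratio,
    then return the phi coefficient of the resulting 2 x 2 table."""
    def gap(p11):
        p10, p01 = p_row - p11, p_col - p11
        p00 = 1 - p_row - p_col + p11
        return p11 * p00 - odds_ratio * p10 * p01
    lo = max(0.0, p_row + p_col - 1.0) + 1e-9
    hi = min(p_row, p_col) - 1e-9
    p11 = brentq(gap, lo, hi)
    return (p11 - p_row * p_col) / math.sqrt(p_row * (1 - p_row) * p_col * (1 - p_col))

for odds_ratio in (1.5, 2.5, 4.0):
    print(odds_ratio, round(phi_from_or(odds_ratio), 3))
```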
Investigating differences between means of more than two groups or experimental conditions is a routine research question in biology. To assess such differences statistically, multiple comparison procedures are applied. The most prominent procedures of this type, the Dunnett and Tukey-Kramer tests, control the probability of reporting at least one false positive result when the data are normally distributed and when the sample sizes and variances do not differ between groups. All three assumptions are unrealistic in biological research, and any violation leads to an increased number of reported false positive results. Based on a general statistical framework for simultaneous inference and robust covariance estimators, we propose a new multiple comparison procedure for assessing multiple means. In contrast to the Dunnett or Tukey-Kramer tests, no assumptions regarding the distribution, sample sizes or variance homogeneity are necessary. The performance of the new procedure is assessed by means of its familywise error rate and power under different distributions. The practical merits are demonstrated by a reanalysis of fatty acid phenotypes of the bacterium Bacillus simplex from the “Evolution Canyons” I and II in Israel. The simulation results show that, even under severely varying variances, the procedure controls the number of false positive findings very well. Thus, the procedure presented here works well under biologically realistic scenarios of unbalanced group sizes, non-normality and heteroscedasticity.
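For contrast only, and not as a substitute for the simultaneous-inference procedure proposed in the paper, a simple way to drop the equal-variance assumption in pairwise mean comparisons is to combine Welch t-tests with a Holm adjustment:

```python
# Illustration only: pairwise Welch t-tests (no equal-variance assumption) with a
# Holm adjustment for multiplicity. This is NOT the procedure proposed in the paper,
# which uses a simultaneous-inference framework with robust covariance estimators.
from itertools import combinations
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def pairwise_welch(groups):
    """groups: dict mapping group name -> 1-D array of observations."""
    pairs = list(combinations(groups, 2))
    pvals = [ttest_ind(groups[a], groups[b], equal_var=False).pvalue
             for a, b in pairs]
    reject, p_adj, _, _ = multipletests(pvals, method="holm")
    return list(zip(pairs, p_adj, reject))
```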
We are interested in understanding the locational distribution of genes and their functions in genomes, as this distribution has both functional and evolutionary significance. Gene locational distribution is known to be affected by various evolutionary processes, with tandem duplication thought to be the main process producing clustering of homologous sequences. Recent research has found clustering of protein structural families in the human genome, even when genes identified as tandem duplicates have been removed from the data. However, this previous research was hindered by an inability to analyse small sample sizes. This is a challenge for bioinformatics, as more specific functional classes have fewer examples and conventional statistical analyses of these small data sets often produce unsatisfactory results.
We have developed a novel bioinformatics method based on Monte Carlo methods and Greenwood's spacing statistic for the computational analysis of the distribution of individual functional classes of genes (from GO). We used this to make the first comprehensive statistical analysis of the relationship between gene functional class and location on a genome. Analysis of the distribution of all genes except tandem duplicates on the five chromosomes of A. thaliana reveals that the distribution on chromosomes I, II, IV and V is clustered at P = 0.001. Many functional classes are clustered, with the degree of clustering within an individual class generally consistent across all five chromosomes. A novel and surprising result was that the locational distributions of some functional classes were significantly more evenly spaced than would be expected by chance.
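A minimal sketch of the core computation, Greenwood's spacing statistic combined with a Monte Carlo null distribution of uniformly placed genes, is shown below; the published method's handling of chromosome structure, tandem duplicates and GO classes is not reproduced here.

```python
# Sketch: Monte Carlo test of clustering for gene positions on a chromosome, using
# Greenwood's spacing statistic (sum of squared spacings of the scaled positions,
# including both end gaps). Large values suggest clustering; small values suggest
# spacing that is more even than expected by chance.
import numpy as np

def greenwood(positions, chrom_length):
    x = np.sort(np.asarray(positions, dtype=float)) / chrom_length
    spacings = np.diff(np.concatenate(([0.0], x, [1.0])))
    return np.sum(spacings ** 2)

def clustering_pvalue(positions, chrom_length, n_sim=10_000, seed=0):
    rng = np.random.default_rng(seed)
    observed = greenwood(positions, chrom_length)
    sims = np.array([greenwood(rng.uniform(0, chrom_length, len(positions)), chrom_length)
                     for _ in range(n_sim)])
    return (np.sum(sims >= observed) + 1) / (n_sim + 1)   # one-sided test for clustering
```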
Analysis of the A. thaliana genome reveals evidence of unexplained order in the locational distribution of genes. The same general analysis method can be applied to any genome, and indeed any sequential data involving classes.
We consider the design of dose-finding trials for patients with malignancies when only a limited sample size is available. The small sample size may be necessary because 1) the modality of treatment is very expensive, and/or 2) the disease under investigation is rare, requiring a lengthy period to enroll a target patient population. Both of these are common in the field of adoptive immunotherapy, in which T cells are infused to prevent and treat infections and malignancies. The clinical trial described in this paper investigates a novel therapy to adoptively transfer genetically modified T cells in small pilot protocols enrolling patients with B-lineage malignancies. Due to the constraints of cost and infrastructure, the maximum sample size for this trial is fixed at 12 patients distributed among four doses of T cells. Given these limitations, an innovative statistical design has been developed to efficiently evaluate the safety, feasibility, persistence, and toxicity profiles of the trial doses. The proposed statistical design is specifically tailored for trials with small sample sizes in that it uses the toxicity outcomes from patients treated at different doses to make dose-finding decisions. Supplementary materials, including an R function and a movie demo, can be downloaded from the websites listed in the first two sections of the paper.
Adaptive designs; phase I; Toxicity
Motivation: High-dimensional data such as microarrays have created new challenges for traditional statistical methods. One such example is class prediction with high-dimension, low-sample-size data. Owing to the small sample size, the sample mean estimates are usually unreliable. As a consequence, the performance of class prediction methods that use the sample mean may also be unsatisfactory. To obtain more accurate parameter estimates, statistical methods such as regularization through shrinkage are often desirable.
Results: In this article, we investigate the family of shrinkage estimators for the mean value under the quadratic loss function. The optimal shrinkage parameter is proposed under the scenario when the sample size is fixed and the dimension is large. We then construct a shrinkage-based diagonal discriminant rule by replacing the sample mean by the proposed shrinkage mean. Finally, we demonstrate via simulation studies and real data analysis that the proposed shrinkage-based rule outperforms its original competitor in a wide range of settings.
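A generic sketch of a shrinkage-based diagonal discriminant rule is given below; it shrinks each class mean toward the grand mean with a fixed, arbitrary weight and therefore does not reproduce the optimal shrinkage parameter derived in the article.

```python
# Generic illustration of a shrinkage-based diagonal discriminant rule: class means
# are shrunk toward the grand mean before being plugged into diagonal LDA. The fixed
# weight `lam` is an arbitrary placeholder, NOT the article's optimal shrinkage parameter.
import numpy as np

def fit_shrunken_dlda(X, y, lam=0.5):
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    means, resid = {}, []
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        means[c] = (1 - lam) * mu_c + lam * grand_mean   # shrink toward the grand mean
        resid.append(Xc - mu_c)
    pooled_var = np.vstack(resid).var(axis=0) + 1e-8     # feature-wise within-class variance
    return classes, means, pooled_var

def predict_dlda(model, X_new):
    classes, means, var = model
    scores = np.stack([-np.sum((X_new - means[c]) ** 2 / var, axis=1) for c in classes])
    return classes[np.argmax(scores, axis=0)]
```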
Choosing the appropriate sample size is an important step in the design of a microarray experiment, and recently methods have been proposed that estimate sample sizes for control of the False Discovery Rate (FDR). Many of these methods require knowledge of the distribution of effect sizes among the differentially expressed genes. If this distribution can be determined then accurate sample size requirements can be calculated.
We present a mixture model approach to estimating the distribution of effect sizes in data from two-sample comparative studies. Specifically, we present a novel, closed-form algorithm for estimating the noncentrality parameters in the test statistic distributions of differentially expressed genes. We then show how our model can be used to estimate sample sizes that control the FDR together with other statistical measures such as average power or the false nondiscovery rate. Method performance is evaluated through a comparison with existing methods for sample size estimation, and is found to be very good.
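A simplified sketch of how an effect-size distribution feeds into such a sample-size calculation is shown below; the standardized effect sizes and the proportion of non-differentially-expressed genes are assumed inputs here, whereas the paper estimates them with its mixture-model algorithm.

```python
# Simplified sketch: given a proportion of non-differentially-expressed genes (pi0)
# and hypothetical standardized effect sizes for the remaining genes, find the smallest
# per-group sample size whose expected FDR and average power meet the targets at some
# per-gene threshold alpha. (The paper estimates the effect-size distribution; here it
# is simply assumed.)
import numpy as np
from scipy.stats import t, nct

def average_power(n, effects, alpha):
    df = 2 * n - 2
    t_crit = t.ppf(1 - alpha / 2, df)
    nc = np.asarray(effects) * np.sqrt(n / 2.0)   # noncentrality for a two-sample t-test
    return nct.sf(t_crit, df, nc).mean()          # upper tail only (effects assumed > 0)

def smallest_n(effects, pi0=0.9, fdr_target=0.05, power_target=0.8,
               alphas=np.logspace(-5, -2, 40)):
    for n in range(3, 200):
        for alpha in alphas:
            pw = average_power(n, effects, alpha)
            fdr = pi0 * alpha / (pi0 * alpha + (1 - pi0) * pw)
            if fdr <= fdr_target and pw >= power_target:
                return n
    return None

print(smallest_n(effects=[0.8, 1.0, 1.5, 2.0]))   # hypothetical effect sizes in SD units
```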
A novel method for estimating the appropriate sample size for a two-sample comparative microarray study is presented. The method is shown to perform very well when compared to existing methods.
According to current evolutionary dogma, multiple infections generally increase a parasite's virulence (i.e. reduce the host's reproductive success). The basic idea is that the competitive interactions among strains of parasites developing within a single host select individual parasites to exploit their host more rapidly than their competitors (thereby causing an increase in virulence) to ensure their transmission. Although experimental evidence is scarce, it often contradicts the theoretical expectation by suggesting that multiple infections lead to decreased virulence. Here, we present a theoretical model to explain this contradiction and show that the evolutionary outcome of multiple infections depends on the characteristics of the interaction between the host and its parasite. If we assume, as current models do, that parasites have only lethal effects on their host, multiple infections indeed increase virulence. By contrast, if parasites have sub-lethal effects on their host (such as reduced growth) and, in particular, if these effects feed back onto the parasites to reduce their rate of development, then multiplicity of infection generally leads to lower virulence.
This past year, the National Institutes of Health announced an ambitious translational research initiative with the explicit goal of transforming biomedical research in the United States to emphasize research on human health-related issues (1,2). The research announcement stressed that animal models of human disease are often inadequate, and hence it was argued that there was a need to focus directly on the study of human populations and biologic samples. The research announcement evoked a mixed response from the scientific community (3), partly because existing experimental approaches and animal models have been extremely successful in defining relevant issues and therapeutic strategies in biomedical research.
Although it is imprudent to abandon effective experimental paradigms, the ultimate goal of the research initiative, “to translate the remarkable scientific innovations we are witnessing into health gains for the nation” (1), is incontrovertibly laudable. However, a myopic and primary focus on human disease and on human tissue introduces a plethora of research risks and concerns that could complicate data interpretation and retard scientific progress. While some of these issues are generic when one extrapolates from animal models to the human circumstance, others are more specific to the cardiovascular system in general and to the study of cardiocyte biology in particular. This brief review will highlight some of these issues.