The shift in age structure is having a profound impact, suggesting that the aged should be consulted as reporters on the quality of their own lives.
The aim of this research was to establish the possible impact of traditional Chinese medicine (TCM) techniques on the quality of life (QOL) of the elderly.
Two non-selected volunteer groups of inhabitants of the municipality of Rio de Janeiro were studied: a control group (36 individuals) not using TCM, and an experimental group (28 individuals) using TCM at the ABACO/Sohaku-in Institute, Brazil.
A questionnaire on elderly QOL devised by the World Health Organization, the WHOQOL-Old, was adopted, and descriptive statistics (mean and standard deviation) were computed. The Shapiro–Wilk test was used to check the normality of the distributions. For the intergroup comparisons, and based on the normality of the distributions, the Student t test was applied to facets 2, 4, 5 and 6 and to the total score, and the Mann–Whitney U rank test to facets 1 and 3, both tests comparing the experimental and control groups. The significance level adopted was 5% (P < 0.05).
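A minimal sketch of this decision rule on simulated data (the facet labels, group sizes and the 5% level follow the abstract; the scores themselves are invented, and the total score is omitted):

```python
# Shapiro-Wilk to check normality per facet, then Student's t test or the
# Mann-Whitney U test accordingly, for a control and an experimental group.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = {f"facet{i}": rng.normal(60, 10, 36) for i in range(1, 7)}
experimental = {f"facet{i}": rng.normal(65, 10, 28) for i in range(1, 7)}

for facet in control:
    x, y = control[facet], experimental[facet]
    # Shapiro-Wilk on each group; fall back to Mann-Whitney if either fails
    normal = min(stats.shapiro(x).pvalue, stats.shapiro(y).pvalue) > 0.05
    if normal:
        stat, p = stats.ttest_ind(x, y)
        test = "Student t"
    else:
        stat, p = stats.mannwhitneyu(x, y, alternative="two-sided")
        test = "Mann-Whitney U"
    print(f"{facet}: {test}, P = {p:.3f}")
```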
The experimental group reported higher QOL for every facet and for the total score.
The results suggest that TCM raises the level of QOL.
quality of life; traditional Chinese medicine; East–West medicine; WHOQOL-Old; elderly
The theory has been put forward that if a null hypothesis is true, P-values should follow a Uniform distribution. This can be used to check the validity of randomisation.
The theory was tested by simulation for two-sample t tests for data from a Normal distribution and a Lognormal distribution, for two-sample t tests that are not independent, and for chi-squared and Fisher's exact tests using small and large samples.
For the two-sample t test with Normal data, the distribution of P-values was very close to the Uniform. With Lognormal data this was no longer true, and the distribution had a pronounced mode. For correlated tests, even with data from a Normal distribution, the distribution of P-values varied from simulation run to simulation run and did not look close to the Uniform in any realisation. For binary data in a small sample, only a few probabilities were possible and the distribution was very uneven. With two groups of 1,000 observations, there was still great unevenness in the histogram and a poor fit to the Uniform.
The notion that P-values for comparisons of groups using baseline data in randomised clinical trials should follow a Uniform distribution if the randomisation is valid has been found to be true only in the context of independent variables which follow a Normal distribution, not for Lognormal data, correlated variables, or binary data using either chi-squared or Fisher’s exact tests. This should not be used as a check for valid randomisation.
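The reported simulation is straightforward to reproduce in miniature; the sketch below (sample size and replicate count are arbitrary choices) contrasts the P-value distributions for Normal and Lognormal data under the null:

```python
# Distribution of two-sample t-test P-values under the null hypothesis,
# for Normal versus Lognormal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 30, 5000
p_normal = np.empty(reps)
p_lognorm = np.empty(reps)
for i in range(reps):
    p_normal[i] = stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
    p_lognorm[i] = stats.ttest_ind(rng.lognormal(size=n), rng.lognormal(size=n)).pvalue

# Kolmogorov-Smirnov distance from the Uniform(0,1): small for Normal data,
# larger for Lognormal data, echoing the pronounced mode reported above.
print(stats.kstest(p_normal, "uniform"))
print(stats.kstest(p_lognorm, "uniform"))
```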
Testing the equality of two survival distributions can be difficult in a prevalent cohort study when non-random sampling of subjects is involved. Because of the biased sampling scheme, the independent censoring assumption is often violated. Although the problems of biased inference caused by length-biased sampling have been widely recognized in the statistical, epidemiological and economic literature, there is no satisfactory solution for efficient two-sample testing. We propose an asymptotically most efficient nonparametric test that properly adjusts for length-biased sampling. The test statistic is derived from a full likelihood function and can be generalized from the two-sample test to a k-sample test. The asymptotic properties of the test statistic under the null hypothesis are derived using its asymptotic independent and identically distributed representation. We conduct extensive Monte Carlo simulations to evaluate the performance of the proposed test statistics and compare them with the conditional test and the standard logrank test under different biased sampling schemes and right-censoring mechanisms. For length-biased data, the empirical studies demonstrate that the proposed test is substantially more powerful than the existing methods. For general left-truncated data, the proposed test is robust, maintains accurate control of the type I error rate, and is more powerful than the existing methods if the truncation and right-censoring patterns are the same between the groups. We illustrate the methods using two real data examples.
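As a small illustration of the sampling scheme at issue (not of the proposed test), the following sketch shows how length-biased sampling, in which a subject is observed with probability proportional to its survival time, inflates naive estimates:

```python
# Length-biased sampling demo: accept each survival time with probability
# proportional to its length, then compare the sampled mean to the true mean.
import numpy as np

rng = np.random.default_rng(2)
t = rng.exponential(scale=2.0, size=200_000)          # true times, mean 2
keep = rng.uniform(0, t.max(), size=t.size) < t       # accept w.p. prop. to t
print("true mean:", t.mean())                         # ~2.0
print("length-biased sample mean:", t[keep].mean())   # ~E[T^2]/E[T] = 4.0
```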
The concept of assumption adequacy averaging is introduced as a technique for developing more robust methods that incorporate assessments of assumption adequacy into the analysis. The concept is illustrated by using it to develop a method that averages results from the t-test and the nonparametric rank-sum test, with weights obtained from the Shapiro-Wilk test of the normality assumption. Through this averaging, the proposed method is able to rely more heavily on the statistical test that the data suggest is superior for each individual gene. The method developed by assumption adequacy averaging outperforms its two component methods (the t-test and rank-sum test) in a series of traditional and bootstrap-based simulation studies, and showed greater concordance in gene selection across two studies of gene expression in acute myeloid leukemia than did the t-test or rank-sum test. An R routine to implement the method is available upon request.
Gene expression data analysis; microarray data; assumption assessment; assumption adequacy averaging; empirical Bayes
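A minimal sketch of the averaging idea for a single gene follows; the weighting rule used here (weight equal to the Shapiro-Wilk P-value of the group-centered residuals) is an illustrative choice and not necessarily the authors' exact scheme:

```python
# Assumption adequacy averaging, simplified: weight the t-test and rank-sum
# P-values by an assessment of the normality assumption.
import numpy as np
from scipy import stats

def aaa_pvalue(x, y):
    """Shapiro-Wilk-weighted average of t-test and rank-sum P-values."""
    # assess normality on group-centered residuals (pooling raw groups with
    # different means would confound a mean shift with non-normality)
    resid = np.concatenate([x - x.mean(), y - y.mean()])
    w = stats.shapiro(resid).pvalue     # adequacy of the normality assumption
    p_t = stats.ttest_ind(x, y).pvalue
    p_rs = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    return w * p_t + (1 - w) * p_rs     # leans on the better-suited test

rng = np.random.default_rng(3)
print(aaa_pvalue(rng.normal(0, 1, 20), rng.normal(1, 1, 20)))        # ~ p_t
print(aaa_pvalue(rng.lognormal(0, 1, 20), rng.lognormal(1, 1, 20)))  # ~ p_rs
```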
Topical phenytoin is a powerful promoter of skin wound healing and may be useful in clinical practice. The purpose of this study was to evaluate the effect of topical phenytoin 0.5% by comparing it with a cream (control) in wounds resulting from the excision of two melanocytic nevi in the same patient. Our purpose was also to assess whether phenytoin had better therapeutic and cosmetic outcomes than the cream (control).
This study evaluated 100 patients with skin wounds from the excision of melanocytic nevi: 50 patients with lesions on the face and 50 with lesions on the back, totaling 200 lesions excised with a modified punch. The resulting superficial skin wounds had the same diameter and depth, and healing by second intention followed.
Patients were followed for 60 days. Student's t-test, the Mann–Whitney nonparametric test, analysis of variance, the LSD test, the Shapiro–Wilk test and the Fisher test were used to analyze the results, depending on the nature of the variables studied.
Phenytoin showed better therapeutic and cosmetic results, healing faster and with more intense epithelization than the cream (control). Phenytoin showed a statistically significant difference (p < 0.05) for wounded area and healing time: phenytoin application resulted in a smaller area and a shorter healing time. The intensity of exudate, bleeding and epithelization was also greater in phenytoin-treated wounds. Regarding the shape and thickness of the scar, wounds treated with phenytoin had round, flat scars in most cases. Considering patients' gender and phototype, female patients presented smaller wound and scar areas, and phototype I patients had the largest scar areas. Contact eczema occurred as an adverse reaction in 7 wounds located on the back, caused by the cream (control) and hypoallergenic tape.
Phenytoin showed better therapeutic and cosmetic results than the cream (control). Phenytoin is a low-cost drug that accelerates skin wound healing in human patients. Trial registration: ISRCTN96539803
Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The first test should assess whether a geneset, independent of its composition, is associated with prognosis (frequently done with a survival test). The second test then verifies whether the theme of the geneset is relevant (usually done with an empirical test that compares the geneset of interest with random genesets). Current methods do not test this second null hypothesis because it has been assumed that the distribution of p-values for random genesets (when tested against the first null hypothesis) is uniform. Here we demonstrate that such an assumption is generally incorrect and consequently, such methods may erroneously associate the biology of a particular geneset with cancer prognosis.
To assess the impact of non-uniform distributions for random genesets in such studies, an automated theme-driven method was developed. This method empirically approximates the p-value distribution of sets of unrelated genes based on a permutation approach, and tests whether predefined sets of biologically related genes are associated with survival. The results from a comparison with a published theme-driven approach revealed non-uniform distributions, suggesting a significant problem exists with false positive rates in the original study. When applied to two public cancer datasets, our technique revealed novel ontological categories with prognostic power, including significant associations of "fatty acid metabolism" with overall survival in breast cancer, and of "receptor mediated endocytosis", "brain development", "apical plasma membrane" and "MAPK signaling pathway" with overall survival in lung cancer.
Current methods of theme-driven survival studies assume uniformity of p-values for random genesets, which can lead to false conclusions. Our approach provides a method to correct for this pitfall, and provides a novel route to identifying higher-level biological themes and pathways with prognostic power in clinical microarray datasets.
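A sketch of the empirical calibration on toy data follows; `geneset_score` is a hypothetical stand-in for whatever survival association statistic a study uses:

```python
# Empirical P-value for an observed geneset against same-size random genesets,
# instead of assuming the random-geneset P-values are Uniform.
import numpy as np

rng = np.random.default_rng(4)
n_genes, set_size = 5000, 50
expr = rng.normal(size=(n_genes, 100))               # genes x patients (toy)
risk = expr[:3].sum(axis=0) + rng.normal(size=100)   # toy "survival" score

def geneset_score(genes):
    # correlation of the geneset's mean expression with the risk score
    sig = expr[genes].mean(axis=0)
    return abs(np.corrcoef(sig, risk)[0, 1])

observed = geneset_score(np.arange(set_size))
null = np.array([geneset_score(rng.choice(n_genes, set_size, replace=False))
                 for _ in range(2000)])
# empirical P-value against the (generally non-Uniform) random-geneset null
p_emp = (1 + (null >= observed).sum()) / (1 + null.size)
print(p_emp)
```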
We show how to test hypotheses for coefficient alpha in three different situations: (1) hypothesis tests of whether coefficient alpha equals a prespecified value, (2) hypothesis tests involving two statistically independent sample alphas as may arise when testing the equality of coefficient alpha across groups, and (3) hypothesis tests involving two statistically dependent sample alphas as may arise when testing the equality of alpha across time or when testing the equality of alpha for two test scores within the same sample. We illustrate how these hypotheses may be tested in a structural equation-modeling framework under the assumption of normally distributed responses and also under asymptotically distribution free assumptions. The formulas for the hypothesis tests and computer code are given for four different applied examples. Supplemental materials for this article may be downloaded from http://brm.psychonomic-journals.org/content/supplemental.
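As a generic (non-SEM) illustration of case (1), the sketch below computes coefficient alpha and a bootstrap interval for testing it against a prespecified value; the data and item structure are invented, and the article's SEM and ADF tests are not reproduced here:

```python
# Coefficient (Cronbach's) alpha with a percentile bootstrap interval, used to
# test H0: alpha = alpha0 for a prespecified alpha0.
import numpy as np

def cronbach_alpha(X):
    """X: subjects x items."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(5)
true_score = rng.normal(size=(200, 1))
X = true_score + rng.normal(scale=1.0, size=(200, 6))   # 6 parallel items

boot = np.array([cronbach_alpha(X[rng.integers(0, 200, 200)])
                 for _ in range(2000)])
lo, hi = np.quantile(boot, [0.025, 0.975])
alpha0 = 0.70
print(f"alpha = {cronbach_alpha(X):.3f}, 95% CI ({lo:.3f}, {hi:.3f}), "
      f"reject alpha0 = {alpha0}: {not (lo <= alpha0 <= hi)}")
```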
Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption in the null distribution such as normality may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.
We present a new approach to more efficiently assign resamples (such as bootstrap samples or permutations) within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem, and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running time overhead. In two experimental studies, a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson's disease, we demonstrated that our differential allocation procedure is substantially more accurate than the traditional uniform resample allocation.
Our experiments demonstrate that using a more sophisticated allocation strategy can improve our inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, the gain in efficiency grows when the number of tests is large. R code for our algorithm and the shortcut method is available at .
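The sketch below illustrates the general idea of differential allocation with a simplified rule (stop resampling hypotheses whose estimated P-values are already several standard errors from the significance threshold); it is not the authors' algorithm:

```python
# Adaptive allocation of permutation resamples across many one-sample tests:
# spend permutations in rounds, dropping hypotheses whose P-value estimates
# are clearly far from the threshold.
import numpy as np

rng = np.random.default_rng(6)
m, n = 200, 20
data = rng.normal(size=(m, n))
data[:10] += 1.0                      # 10 true signals among m hypotheses
obs = np.abs(data.mean(axis=1) / (data.std(axis=1, ddof=1) / np.sqrt(n)))

alpha, batch = 0.01, 200
hits = np.zeros(m); draws = np.zeros(m); active = np.ones(m, bool)
for round_ in range(50):
    for i in np.where(active)[0]:
        # sign-flip permutations for the one-sample problem
        flips = rng.choice([-1.0, 1.0], size=(batch, n)) * data[i]
        t = np.abs(flips.mean(1) / (flips.std(1, ddof=1) / np.sqrt(n)))
        hits[i] += (t >= obs[i]).sum(); draws[i] += batch
    p_hat = (hits + 1) / (draws + 1)
    se = np.sqrt(p_hat * (1 - p_hat) / draws)
    # keep resampling only where the estimate is within 3 SEs of alpha
    active &= np.abs(p_hat - alpha) < 3 * se
    if not active.any():
        break
print("total resamples:", int(draws.sum()), "vs uniform:", m * 50 * batch)
```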
The secretion of salivary alpha-amylase (sAA) is associated more with the psychoneuroendocrinological response to stress than with flow rate and age. The aim of this cross-sectional study was to build an explanatory model based on the patterns of relationship between age and sAA in resting and stimulated saliva, under non-stressful conditions, in healthy volunteers aged 20-39. Both resting and stimulated saliva were collected from 40 subjects. The sAA values were log-transformed, the normality assumption was verified with the Shapiro-Wilk test, and the reliability of the measurements was estimated by Pearson's r correlation coefficient. The estimated model was based on the theory of linear mixed models. Significant mean changes were observed in flow rate and sAA activity between resting and stimulated saliva. The final model consists of two components: the first revealed a positive correlation between age and sAA, while the second revealed a negative correlation between sAA and the interaction of age × flow rate by condition (resting or stimulated saliva). Both flow rate and age influence sAA activity.
Salivary flow rate; salivary alpha-amylase; age
A common problem in longitudinal studies is the dropout of subjects, or censoring, before the end of follow-up. Most existing methods assume that censoring is noninformative about the missed responses. This assumption is crucial to the validity of many statistical procedures. We develop nonparametric hypothesis testing procedures to test for independent censoring in the absence/presence of covariates. The test statistics are constructed by contrasting two estimators of the conditional mean of cumulative responses, for each stratum of the covariate space, from sample subsets with different patterns of censoring. Our method does not require modelling the longitudinal response processes and is therefore robust to model misspecification. A diagnostic plot procedure is also developed that can be used to attribute dependent censoring to particular covariate strata. The finite sample performance of the tests is investigated through extensive simulation studies. The potential of our methods is demonstrated through an application to a chronic granulomatous disease study.
CGD data; Gaussian multiplier method; informative censoring; integrated square test; marginal and conditional independent censoring; nonparametric tests; recurrent events; supremum test; weak convergence
The Benjamini–Hochberg procedure is widely used in multiple comparisons. Previous power results for this procedure have been based on simulations. This article derives theoretical expressions for the expected power. To do so, we make assumptions about the number of hypotheses being tested, which null hypotheses are true, which are false, and the distributions of the test statistics under each null and alternative. We use these assumptions to derive bounds for multidimensional rejection regions. With these bounds and a permanent-based representation of the joint density function of the largest p-values, we use the law of total probability to derive the distribution of the total number of rejections. We also derive the joint distribution of the total number of rejections and the number of rejections when the null hypothesis is true. Finally, we give an analytic expression for the expected power of a false discovery rate procedure under the assumption that the hypotheses are independent.
Benjamini–Hochberg procedure; Distribution of number of rejections; Multiple comparisons; Rejection region bounds
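For reference, the step-up procedure whose power is being characterized can be stated in a few lines:

```python
# Benjamini-Hochberg step-up procedure: reject the k smallest P-values, where
# k is the largest i with p_(i) <= (i/m) * q.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, bool)
    reject[order[:k]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
print(benjamini_hochberg(pvals, q=0.05))   # rejects the two smallest
```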
Efficient estimation of parameters is a major objective in analyzing longitudinal data. We propose two generalized empirical likelihood-based methods that take into consideration within-subject correlations. A nonparametric version of the Wilks theorem for the limiting distributions of the empirical likelihood ratios is derived. It is shown that one of the proposed methods is locally efficient among a class of within-subject variance-covariance matrices. A simulation study is conducted to investigate the finite sample properties of the proposed methods and compare them with the block empirical likelihood method by You et al. (2006) and the normal approximation with a correctly estimated variance-covariance matrix. The results suggest that the proposed methods are generally more efficient than existing methods that ignore the correlation structure, and are better in coverage than the normal approximation with correctly specified within-subject correlation. An application illustrating our methods and supporting the simulation study results is also presented.
Confidence region; Efficient estimation; Empirical likelihood; Longitudinal data; Maximum empirical likelihood estimator
In microarray data analysis, we are often required to combine several dependent partial test results. Many suggestions for this problem have been made in the literature; Tippett's test and Fisher's omnibus test are the most popular. Both tests have known null distributions when the partial tests are independent. For dependent tests, however, their null distributions (even asymptotically) are unknown and additional numerical procedures are required. In this paper, we revisit Stouffer's test based on z-scores and show its advantage over the two aforementioned methods in the analysis of large-scale microarray data. The combined statistic in Stouffer's test has a normal distribution with mean 0, by the normality of the z-scores, and its variance can be estimated from the scores of the genes in the experiment without an additional numerical procedure. We numerically compared the errors of Stouffer's test and the two p-value based methods, Tippett's test and Fisher's omnibus test. We also analyzed our microarray data to find genes differentially expressed by non-genotoxic and genotoxic carcinogen compounds. Both the numerical study and the real application showed that Stouffer's test performed better than Tippett's method and Fisher's omnibus method with additional permutation steps.
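A minimal sketch of this combination on simulated correlated z-scores (the dependence structure, and the assumption that most genes are null, are invented for illustration):

```python
# Stouffer-type combination under dependence: sum the per-test z-scores for
# each gene and estimate the variance of the combined score empirically from
# all genes, rather than assuming it equals the number of tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m, k = 10_000, 3                       # genes x partial tests
shared = rng.normal(size=(m, 1))       # induces dependence across the k tests
z = 0.6 * shared + 0.8 * rng.normal(size=(m, k))   # N(0,1) marginals

s = z.sum(axis=1)                      # combined Stouffer score per gene
# variance of s estimated from the bulk of genes (assumed mostly null);
# the naive independence value k would understate it here
s_std = s / s.std(ddof=1)
p = 2 * stats.norm.sf(np.abs(s_std))
print("smallest combined P-values:", np.sort(p)[:5])
```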
Plasma level of high-density lipoprotein-cholesterol (HDL-C), a heritable trait, is an important determinant of susceptibility to atherosclerosis. Non-synonymous and regulatory single nucleotide polymorphisms (SNPs) in genes implicated in HDL-C synthesis and metabolism are likely to influence plasma HDL-C, apolipoprotein A-I (apo A-I) levels and severity of coronary atherosclerosis.
We genotyped 784 unrelated Caucasian individuals from two populations (the Lipoprotein and Coronary Atherosclerosis Study [LCAS], N = 333, and TexGen, N = 451) for 94 SNPs in 42 candidate genes by 5' nuclease assays. We tested the distribution of the phenotypes with the Shapiro-Wilk normality test. We used Box-Cox regression to analyze associations of the non-normally distributed phenotypes (plasma HDL-C and apo A-I levels) with the genotypes, including sex, age, body mass index (BMI), diabetes mellitus (DM), and cigarette smoking as covariates. We calculated q values as indicators of the false discovery rate (FDR).
Plasma HDL-C levels were associated with sex (higher in females), BMI (inversely), smoking (lower in smokers), DM (lower in those with DM) and SNPs in APOA5, APOC2, CETP, LPL and LIPC (each q ≤ 0.01). Likewise, plasma apo A-I levels, available in the LCAS subset, were associated with SNPs in CETP, APOA5, and APOC2, as well as with BMI, sex and age (all q ≤ 0.03). The APOA5 variant S19W was also associated with the minimal lumen diameter (MLD) of coronary atherosclerotic lesions, a quantitative index of severity of coronary atherosclerosis (q = 0.018), with the mean number of coronary artery occlusions at baseline (p = 0.034), and with progression of coronary atherosclerosis, as indicated by loss of MLD.
Putatively functional variants of APOA2, APOA5, APOC2, CETP, LPL, LIPC and SOAT2 are independent genetic determinants of plasma HDL-C levels. The non-synonymous S19W SNP in APOA5 is also an independent determinant of plasma apo A-I level, severity of coronary atherosclerosis and its progression.
We consider 12 event-related potentials and one electroencephalogram measure as disease-related traits to compare alcohol-dependent individuals (cases) to unaffected individuals (controls). We use two approaches: 1) two-way analysis of variance (with sex and alcohol dependency as the factors), and 2) likelihood ratio tests comparing sex-adjusted values of cases to controls, assuming that within each group the trait has a 2- (or 3-) component normal mixture distribution. In the second approach, we test the null hypothesis that the parameters of the mixtures are equal for cases and controls. Based on the two-way analysis of variance, we find that: 1) males have significantly (p < 0.05) lower mean response values than females for 7 of these traits; and 2) alcohol-dependent cases have significantly lower mean responses than controls for 3 traits. The mixture analysis of sex-adjusted values of one of these traits, the event-related potential obtained at the parietal midline channel (ttth4), revealed a 3-component normal mixture in both cases and controls. The mixtures differed in that the cases had significantly lower mean values than controls and significantly different mixing proportions in 2 of the 3 components. The implications of this study are: 1) sex needs to be taken into account when studying risk factors for alcohol dependency, to prevent finding a spurious association between alcohol dependency and the risk factor; and 2) the mixture analysis indicates that, for the event-related potential ttth4, the observed difference reflects strong evidence of heterogeneity of response in both cases and controls.
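A sketch of the mixture comparison on invented data follows; note that mixture likelihood ratio statistics have non-standard asymptotics, so in practice the null distribution would be calibrated by simulation or permutation rather than a chi-squared reference:

```python
# Fit normal mixtures to cases and controls separately and pooled, and form a
# likelihood ratio statistic for equality of the two mixtures.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
controls = rng.normal(0.0, 1.0, 300)
cases = np.concatenate([rng.normal(-1.0, 1.0, 150), rng.normal(1.5, 1.0, 150)])

def fit_ll(x, k):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0)
    gm.fit(x.reshape(-1, 1))
    return gm.score(x.reshape(-1, 1)) * len(x)   # total log-likelihood

k = 2
lrt = 2 * (fit_ll(cases, k) + fit_ll(controls, k)
           - fit_ll(np.concatenate([cases, controls]), k))
print(f"LRT statistic for equal {k}-component mixtures: {lrt:.1f}")
```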
This article considers the problem of multiple hypothesis testing using t-tests. The observed data are assumed to be independently generated conditional on an underlying and unknown two-state hidden model. We propose an asymptotically valid, data-driven procedure to find critical values for rejection regions controlling the k-familywise error rate (k-FWER), the false discovery rate (FDR) and the tail probability of the false discovery proportion (FDTP), using one-sample and two-sample t-statistics. We require only a finite fourth moment plus some very general conditions on the mean and variance of the population, by virtue of the moderate deviation properties of t-statistics. A new consistent estimator for the proportion of alternative hypotheses is developed. Simulation studies support our theoretical results and demonstrate that the power of a multiple testing procedure can be substantially improved by using critical values directly rather than the conventional p-value approach. Our method is applied in an analysis of microarray data from a leukemia cancer study that involves testing a large number of hypotheses simultaneously.
empirical processes; FDR; high dimension; microarrays; multiple hypothesis testing; one-sample t-statistics; self-normalized moderate deviation; two-sample t-statistics
Quantitative trait loci (QTLs) may affect not only the mean of a trait but also its variability. A special aspect is the variability between multiple measured traits of genotyped animals, such as the within-litter variance of piglet birth weights. The sample variance of repeated measurements is assigned as an observation for every genotyped individual. It is shown that the conditional distribution of the non-normally distributed trait can be approximated by a gamma distribution. To detect QTL effects in the daughter design, a generalized linear model with the identity link function is applied. Suitable test statistics are constructed to test the null hypothesis H0 (no QTL with effect on the within-litter variance is segregating) versus the alternative HA (there is a QTL with effect on the variability of birth weight within litter). Furthermore, estimators of the QTL effect and the QTL position are introduced and discussed. The efficiency of the presented tests is compared with that of a test based on weighted regression. The type I error probability as well as the power of QTL detection are discussed and compared for the different tests.
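A minimal sketch of such a model fit, with a toy binary marker standing in for the QTL design (the shape, effect sizes and sample size are invented):

```python
# Within-litter sample variances as a gamma-distributed response, fitted with
# a GLM using the identity link; the Wald test on the marker coefficient plays
# the role of a QTL test.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 400
marker = rng.integers(0, 2, n)              # e.g. inherited paternal allele
mu = 1.0 + 0.5 * marker                     # QTL effect on the mean variance
y = rng.gamma(shape=3.0, scale=mu / 3.0)    # gamma response with mean mu

X = sm.add_constant(marker.astype(float))
model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Identity()))
res = model.fit()
print(res.summary().tables[1])              # Wald test of the QTL effect
```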
We consider the problem of estimating the correlation in bivariate normal data when the means and variances are assumed known, with emphasis on the small sample case. We consider eight different estimators, several of them considered here for the first time in the literature. In a simulation study, we found that Bayesian estimators using the uniform and arc-sine priors outperformed several empirical and exact or approximate maximum likelihood estimators in small samples. The arc-sine prior did better for large values of the correlation. For testing whether the correlation is zero, we found that Bayesian hypothesis tests outperformed significance tests based on the empirical and exact or approximate maximum likelihood estimators considered in small samples, but that all tests performed similarly for sample size 50. These results lead us to suggest using the posterior mean with the arc-sine prior to estimate the correlation in small samples when the variances are assumed known.
Arc-sine prior; Bayes factor; Bayesian test; Maximum likelihood estimator; Uniform prior; Jeffreys prior
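A grid-based sketch of the suggested estimator, assuming means 0 and variances 1 are known (the arc-sine prior density on (-1, 1) is proportional to 1/sqrt(1 - rho^2)):

```python
# Posterior mean of the correlation under the arc-sine prior, by numerical
# integration over a grid; likelihood is bivariate normal with known
# means (0) and variances (1).
import numpy as np

def posterior_mean_arcsine(x, y):
    n = len(x)
    sxx, syy, sxy = (x * x).sum(), (y * y).sum(), (x * y).sum()
    rho = np.linspace(-0.999, 0.999, 4001)
    loglik = (-n / 2 * np.log(1 - rho ** 2)
              - (sxx - 2 * rho * sxy + syy) / (2 * (1 - rho ** 2)))
    logprior = -0.5 * np.log(1 - rho ** 2)   # arc-sine prior on (-1, 1)
    w = np.exp(loglik + logprior - (loglik + logprior).max())
    return (rho * w).sum() / w.sum()

rng = np.random.default_rng(10)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=10)
print(posterior_mean_arcsine(z[:, 0], z[:, 1]))   # small-sample estimate
```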
For comparing the distributions of two samples with multiple endpoints, O'Brien (1984) proposed rank-sum-type test statistics. Huang et al. (2005) extended these statistics to the general nonparametric Behrens-Fisher hypothesis problem and obtained improved test statistics by replacing the ad hoc variance with the asymptotic variance of the rank-sum statistics. In this paper we generalize the work of O'Brien (1984) and Huang et al. (2005) and propose a weighted rank-sum statistic. We show that the weighted rank-sum statistic is asymptotically normally distributed, permitting the computation of power, p-values and confidence intervals. We further demonstrate via simulation that the weighted rank-sum statistic controls the type I error rate well and, under certain alternatives, is more powerful than the statistics of O'Brien (1984) and Huang et al. (2005).
Asymptotic normality; Behrens-Fisher problem; Case-Control; Clinical trials; Multiple endpoints; Rank-sum statistics; Weights
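For orientation, O'Brien's (1984) unweighted procedure, which the weighted statistic generalizes, can be sketched as follows (toy data; the weighted version and its asymptotic variance are not reproduced here):

```python
# O'Brien's rank-sum procedure for multiple endpoints: rank the pooled sample
# on each endpoint, sum each subject's ranks across endpoints, then compare
# the two groups' rank sums with a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n1, n2, endpoints = 30, 30, 4
g1 = rng.normal(0.0, 1.0, size=(n1, endpoints))
g2 = rng.normal(0.4, 1.0, size=(n2, endpoints))   # shifted on all endpoints

pooled = np.vstack([g1, g2])
ranks = np.apply_along_axis(stats.rankdata, 0, pooled)  # rank each endpoint
scores = ranks.sum(axis=1)                              # per-subject rank sum
print(stats.ttest_ind(scores[:n1], scores[n1:]))
```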
We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
Dirichlet process prior; Identifiability; Postprocessing; Random effects; Smoothing spline; Uniform shrinkage prior; Variance components
The power of a test, the probability of rejecting the null hypothesis in favor of an alternative, may be computed using estimates of one or more distributional parameters. Statisticians frequently fix mean values and calculate power or sample size using a variance estimate from an existing study. Hence computed power becomes a random variable for a fixed sample size. Likewise, the sample size necessary to achieve a fixed power varies randomly. Standard statistical practice requires reporting uncertainty associated with such point estimates. Previous authors studied an asymptotically unbiased method of obtaining confidence intervals for noncentrality and power of the general linear univariate model in this setting. We provide exact confidence intervals for noncentrality, power, and sample size. Such confidence intervals, particularly one-sided intervals, help in planning a future study and in evaluating existing studies.
Effect size; Meta-analysis; Noncentral F distribution
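A sketch of the interval construction by inverting the noncentral F distribution (the observed F statistic, degrees of freedom and alpha below are illustrative):

```python
# Exact confidence interval for the noncentrality of an observed F statistic,
# obtained by inverting the noncentral F CDF, and the induced interval for
# power at the usual F critical value.
from scipy import stats, optimize

f_obs, dfn, dfd, alpha = 4.0, 3, 36, 0.05

def nc_bound(prob):
    # find lambda with P(F <= f_obs | lambda) = prob; the CDF is decreasing
    # in lambda, so the bound is 0 when even lambda = 0 falls below prob
    g = lambda lam: stats.ncf.cdf(f_obs, dfn, dfd, lam) - prob
    return optimize.brentq(g, 1e-8, 200.0) if g(1e-8) > 0 else 0.0

lam_lo, lam_hi = nc_bound(1 - alpha / 2), nc_bound(alpha / 2)
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
power = lambda lam: stats.ncf.sf(f_crit, dfn, dfd, lam)
print(f"noncentrality CI: ({lam_lo:.2f}, {lam_hi:.2f})")
print(f"power CI: ({power(lam_lo):.3f}, {power(lam_hi):.3f})")
```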
To evaluate by simulation the statistical properties of normalized prediction distribution errors (NPDE), prediction discrepancies (pd), standardized prediction errors (SPE), numerical predictive check (NPC) and decorrelated NPC (NPCdec) for the external evaluation of a population pharmacokinetic analysis, and to illustrate the use of NPDE for the evaluation of covariate models.
We assume that a model MB has been built using a building dataset B, and that a separate validation dataset, V, is available. Our null hypothesis H0 is that the data in V can be described by MB. We use several metrics to test this hypothesis: NPDE, pd, SPE, NPC and NPCdec. First, we evaluated by simulation the type I error under H0 of different tests applied to these metrics. We also propose and evaluate a single global test combining normality, mean and variance tests applied to NPDE, pd and SPE. Tests on NPC are performed after a decorrelation step (NPCdec). MB was a one-compartment model with first-order absorption (without covariates), previously developed from two phase II studies and one phase III study of the antidiabetic drug gliclazide. We simulated 500 external datasets according to the design of a phase III study.
Second, we investigated the application of NPDE to covariate models. We propose two approaches: the first uses correlation tests or mean comparisons to test the relationship between NPDE and covariates; the second evaluates NPDE split by category for discrete covariates or by quantiles for continuous covariates. We generated several validation datasets under H0 and under alternative assumptions, with a model without covariates, with one continuous covariate (weight), or with one categorical covariate (sex). We calculated the power of the different tests using simulations in which the covariates of the phase III study were used.
The simulations under H0 show a high type I error for the different tests applied to SPE and an inflated type I error for pd. The different tests applied to NPDE, including the global test, present a type I error close to 5%. We found a type I error higher than 5% for the test applied to the classical NPC, but it becomes close to 5% for NPCdec.
For covariate models, when the model and the validation dataset are consistent, the type I errors of the tests are close to 5% for both covariate effects. When the validation datasets and models are not consistent, the tests detect the correlation between NPDE and the covariate.
We recommend using NPDE over SPE for external model evaluation, since NPDE do not depend on an approximation of the model and have good statistical properties. NPDE also represent a better approach than NPC, since a decorrelation step must first be applied before tests can be performed on NPC. In this illustration, NPDE also proved a good tool for evaluating models with or without covariates.
Clinical Trials as Topic; Computer Simulation; Gliclazide; pharmacokinetics; Humans; Models, Statistical; Research Design; model evaluation; population pharmacokinetics; predictive distribution; VPC; NPC; predictive check; prediction error
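As background, the prediction discrepancy (pd) on which NPDE builds is simple to compute; the sketch below uses a toy normal "model" and omits the decorrelation step that turns pd into NPDE:

```python
# Prediction discrepancy: each observation's percentile within its own
# simulated predictive distribution; under H0 the pd should be Uniform(0,1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
n_obs, n_sim = 100, 1000
# toy "model": predictive distribution of observation i is N(mu_i, 1)
mu = rng.normal(size=n_obs)
observed = mu + rng.normal(size=n_obs)              # data generated under H0
simulated = mu[:, None] + rng.normal(size=(n_obs, n_sim))

pd = (simulated < observed[:, None]).mean(axis=1)   # percentile per observation
print(stats.kstest(pd, "uniform"))                  # should not reject under H0
```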
When the one-sample or two-sample t-test is taught in the classroom or applied in practice to small samples, there is considerable divergence of opinion as to whether or not the inferences drawn are valid. Many point to the "robustness" of the t-test to violations of assumptions, while others use rank or other robust methods because they believe the t-test is not robust against such violations. Despite the apparent divergence of these two opinions, it is quite likely that both arguments have considerable merit. If we agree that this question cannot be resolved in general, the issue becomes one of determining, before any actual data have been collected, whether the t-test will or will not be robust in a specific application. This paper describes Statistical Analysis System (SAS) software, covering a large collection of potential input probability distributions, for investigating both the null and power properties of various one- and two-sample t-tests and their normal approximations, as well as the Wilcoxon two-sample and sign-rank one-sample tests, allowing potential practitioners to determine, at the study design stage, whether the t-test will be robust in their specific application. Sample size projections, based on these actual distributions, are also included. This paper is not intended as a tool to assess robustness after the data have been collected.
Robustness; Satterthwaite approximation; sign-rank test; T-test; Wilcoxon test
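The described software is SAS; a small Python analogue of the pre-study check (Lognormal inputs, with arbitrary sample size and replicate count) might look like:

```python
# Simulated null rejection rates of the two-sample t-test and the Wilcoxon
# (Mann-Whitney) test under a chosen input distribution, before any data
# collection.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
n, reps, alpha = 15, 10_000, 0.05
rej_t = rej_w = 0
for _ in range(reps):
    x, y = rng.lognormal(size=n), rng.lognormal(size=n)   # null: same dist.
    rej_t += stats.ttest_ind(x, y).pvalue < alpha
    rej_w += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha
print(f"type I error: t-test {rej_t / reps:.3f}, Wilcoxon {rej_w / reps:.3f}")
```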
Investigating differences between the means of more than two groups or experimental conditions is a routine research question in biology. To assess such differences statistically, multiple comparison procedures are applied. The most prominent procedures of this type, the Dunnett and Tukey-Kramer tests, control the probability of reporting at least one false positive result when the data are normally distributed and when the sample sizes and variances do not differ between groups. All three assumptions are unrealistic in biological research, and any violation leads to an increased number of reported false positive results. Based on a general statistical framework for simultaneous inference and robust covariance estimators, we propose a new multiple comparison procedure for assessing multiple means. In contrast to the Dunnett or Tukey-Kramer tests, no assumptions regarding the distribution, sample sizes or variance homogeneity are necessary. The performance of the new procedure is assessed by means of its familywise error rate and power under different distributions. The practical merits are demonstrated by a reanalysis of fatty acid phenotypes of the bacterium Bacillus simplex from the "Evolution Canyons" I and II in Israel. The simulation results show that the procedure controls the number of false positive findings very well, even under severely varying variances. Thus, the procedure presented here works well under biologically realistic scenarios of unbalanced group sizes, non-normality and heteroscedasticity.