This section examines statistical methods for the quantification of viral particles or viral DNA, where viral DNA copy numbers are taken as surrogate variables for viral copy numbers. Gene expression data, immunological function, and similar outcomes are also of great importance, but here we focus on statistical methods appropriate for viral particle counts and HSV-1 copy numbers, which are the experimental outcomes in many of the reviewed animal models of HSV-1 eye disease.
Statistical methods in the biomedical literature, when surveyed by statisticians, are often found to be inadequate. Such shortcomings of statistical technique have also been noted in the literature of virology [92]. More uniform statistical methods applied in animal models of HSV-1 eye disease will facilitate meta-analytic studies that summarize results from many studies conducted over years of research, funding, and publication. Eventually, such uniformity of practice will increase the value of the combined efforts of many research groups over years of work, perhaps even across different animal models of HSV-1 eye disease. Thus, while our comments on statistical methods in HSV-1 disease models are of necessity a brief section of a review, we hope they will lead investigators to consider these fundamental aspects of statistical design and analysis more closely and perhaps to seek statistical consultation on these often difficult methodological problems before experimental design or statistical analysis.
Perhaps most important among the statistical issues in the analysis of viral counts (obtained by whatever means) is that frequencies or count data must be analyzed using a body of statistical methods known as categorical models, or methods for the analysis of frequencies [93]. In some cases, count data may contain large count values and may be analyzed as if the count variable were continuous. In contrast, relatively small sample sizes and sparse data that may include zeros, as well as the truncated distributions often characteristic of count data, make this naïve assumption (that discrete variables act like continuous variables) a potential cause of problems for statistical estimation and hypothesis testing. Variables used as outcomes in mouse and rabbit models of HSV-1 eye disease may also represent ordinal or nominal scores as well as count data. One such example is an investigator-tallied score for corneal involvement with HSV-1 lesions, or a score for the severity of associated pathological characteristics such as the amount of pus or inflammation. Different statistical methods are appropriate for these different outcome variable types.
Often an additional level of complexity in HSV-1 animal models is observation over time. Experimenters may be concerned with the detailed dynamics of a disease process over time in addition to treatment averages at the end of a period of observation. This adds complexities related to correlations found among observations coming from one experimental subject over time (called within subject correlations). If these correlations within subjects are not dealt with correctly in the experimental design and analysis, they can inflate the test statistics and distort the conclusions of the statistical analysis. The dynamics of the infectious process may be the most important aspect of the HSV-1 animal model, and changes over time must be properly modeled and evaluated statistically. Time series or so-called “repeated measures models” represent an additional level of complexity where outcomes are frequencies.
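The cost of ignoring within-subject correlation can be illustrated with the standard design-effect formula for equally correlated repeated measures. The sketch below is illustrative only: it assumes a single common correlation among all observations from one animal (an assumption not made in the text), and the function and parameter names are hypothetical.

```python
def effective_sample_size(n_subjects, n_timepoints, within_corr):
    """Approximate number of independent observations in a repeated-measures design.

    Assumes every pair of observations from the same animal shares one
    correlation, within_corr (an illustrative simplification).
    """
    total = n_subjects * n_timepoints
    design_effect = 1 + (n_timepoints - 1) * within_corr
    return total / design_effect


# Hypothetical experiment: 10 animals, each scored at 6 time points.
naive_n = 10 * 6
adjusted_n = effective_sample_size(10, 6, within_corr=0.5)
print(naive_n, round(adjusted_n, 1))
```

Under these assumptions, treating all 60 observations as independent would overstate the available information more than threefold, which is one way test statistics become inflated.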
One assumption that underlies the correct application of parametric statistical methods such as the analysis of variance (ANOVA) concerns the normal distribution. This assumption is frequently a source of confusion for consumers of statistical analyses who are not trained in statistics. It does not address the distribution of the values of the variables under analysis; the frequently stated idea that parametric methods cannot be applied, and that discrete or nonparametric methods must be used, whenever the values of a variable are not normally distributed is therefore erroneous. Rather, it is the distribution of the means of samples of a variable that we expect to be normally distributed, and this is the assumption that the application of parametric methods requires. The normal distribution of the means of samples from a random variable is addressed in statistical theory by the central limit theorem [94]. This theorem states that the means of samples from any distribution with finite variance (normal or not) approach the normal distribution as the sample size increases. Parametric methods such as the t-test or ANOVA will therefore often perform reliably whether or not the values themselves are normally distributed. These tests exhibit a property of statistical methods called robustness.
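The behavior of sample means described by the central limit theorem can be checked with a small simulation. The following is a minimal sketch, assuming an exponential distribution as a stand-in for a strongly skewed outcome; the sample sizes, replicate count, and function names are illustrative choices, not taken from the reviewed studies.

```python
import random
import statistics


def skewness(values):
    """Population skewness; approximately 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def sample_means(sample_size, n_samples, rng):
    """Means of repeated samples drawn from a skewed (exponential) distribution."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
skew_small = skewness(sample_means(2, 5000, rng))   # means of samples of size 2
skew_large = skewness(sample_means(50, 5000, rng))  # means of samples of size 50
print(round(skew_small, 2), round(skew_large, 2))
```

The skewness of the sample means shrinks toward zero as the sample size grows, even though the underlying values remain strongly skewed. With very small samples the means are still visibly skewed, which is why sparse count data call for extra care.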
The assumption that the variances of the sampled populations are equal causes more substantial problems with the performance of statistical estimators. It is violation of this assumption, rather than any lack of normality in the sample means, that leads us to turn to categorical or nonparametric methods. Samples that include varying numbers of high counts or zeros are likely to be unequal in variance. This situation is typical in counting herpetic lesions or viral particles and is characteristic of processes that are Poisson distributed or that follow other discrete distributions.
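For a Poisson-distributed count the variance equals the mean, so treatment groups with different mean counts cannot have equal variances. The simulation below sketches this point; the two mean counts (2 and 20) are hypothetical, and the sampler is Knuth's simple multiplication method, chosen only to keep the example free of external libraries.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """One Poisson(lam) draw by Knuth's method (adequate for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(1)  # fixed seed for reproducibility
low = [poisson_draw(2, rng) for _ in range(2000)]    # hypothetical low-count group
high = [poisson_draw(20, rng) for _ in range(2000)]  # hypothetical high-count group

for name, counts in (("low", low), ("high", high)):
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)
    # For Poisson data the variance-to-mean ratio should be close to 1.
    print(name, round(mean, 2), round(var, 2), round(var / mean, 2))
```

A variance-to-mean ratio well above 1 in real data would suggest overdispersion relative to the Poisson model, which is a further reason to choose the analysis method deliberately.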
Simple chi-square tests of the basic kind taught in introductory statistics courses are useful when viral counts are compared between treatments without observations over time. A more conservative alternative is the use of exact versions of contingency table tests. These originated with Fisher's exact test, which was originally applied to 2 × 2 contingency tables [95], but they have been extended to n × n contingency tables in the modern era of cheap computational power [96]. Power analyses for such experiments, and considerations for subdividing the effects that account for overall significance, have been described in basic nontechnical terms [97, 98].
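For a 2 × 2 table, Fisher's exact test can be computed directly from the hypergeometric distribution. The sketch below is a minimal implementation under one common convention for the two-sided p-value (summing the probabilities of all tables no more likely than the one observed); the function name and the example table are illustrative, and established statistical software should be preferred for real analyses.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is as likely as, or less likely than, the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for floating-point ties
    )
    return min(1.0, total)


# Hypothetical data: rows are treated/control, columns are lesions present/absent.
p_value = fisher_exact_two_sided(2, 8, 7, 3)
print(p_value)
```

Because the calculation enumerates every table with the observed margins, it remains valid for the small, sparse samples where the chi-square approximation is least trustworthy.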
Perhaps more useful, but more challenging, are analytic methods that apply generalized linear models [99]. Here the analyst has much greater flexibility to model multiple simultaneous experimental effects, such as treatments and their effects over time, as well as to control for “nuisance variables” such as groups or litters of animals. Many different “link” functions can be applied in such models, allowing essentially nonlinear response functions to be treated within a linear additive model. Extensions of generalized linear models have been developed and are widely applied to observations of subjects over multiple time periods, such as those often obtained in animal experiments where individual animals are assessed at multiple time points [100].
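As one concrete instance of a generalized linear model, the sketch below fits a Poisson regression with a log link to counts from two groups by Newton-Raphson iteration. It is a deliberately small illustration with a single binary covariate; the counts are invented, the function name is hypothetical, and real analyses with several covariates, litter effects, or repeated measures would use established statistical software.

```python
import math


def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(mean of y) = b0 + b1 * x by Newton-Raphson (Poisson, log link).

    Requires that x takes at least two distinct values.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix entries.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


# Hypothetical lesion counts: x = 0 for control animals, x = 1 for treated animals.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [12, 15, 9, 14, 10, 4, 6, 3, 5, 2]
b0, b1 = fit_poisson_glm(x, y)
rate_ratio = math.exp(b1)  # treated mean count relative to control
print(round(math.exp(b0), 3), round(rate_ratio, 3))
```

The log link turns a multiplicative treatment effect on the counts into an additive term on the linear predictor, so `exp(b1)` can be read as a rate ratio. With a single binary covariate the fitted group means match the observed group means, which offers a simple check on the fit.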
Many responses are nonlinear, such as those often seen in pharmacological experiments, and survival data may include observations that are incomplete at the conclusion of the experiment (censored observations). In these cases, modern methods of analysis that deal specifically with nonlinear responses allow analysts the flexibility to handle random and fixed effects in the same models of animal testing [101]. A greater ability to deal with the complexities of experimental design increases the demands on the expertise of the analyst, as more parameters must be assessed and determined and more diagnostics conducted to assess model fit. As is said in statistical consulting, “what must be taught to clients is not how to do statistics, but when to ask for a statistical consultant.”
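To make the notion of censored observations concrete, the sketch below computes a Kaplan-Meier survival estimate. This is a standard estimator for censored survival data, used here only to illustrate how censoring enters an analysis; it is not the nonlinear mixed-effects approach cited above, and the function name and the times are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve as a list of (time, survival probability).

    events[i] is 1 if the endpoint was observed at times[i] and 0 if the
    animal was censored (e.g. still event-free when the experiment ended).
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        observed = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - observed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical days to endpoint for six animals; 0 marks a censored animal.
days = [3, 5, 5, 8, 10, 12]
observed = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(days, observed)
for day, surv in curve:
    print(day, round(surv, 3))
```

Censored animals contribute to the number at risk up to their last observation but never count as events, so the information they carry is used rather than discarded.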