Traditional Statistical Approach

A traditional statistical approach to a pre-post comparison in a controlled experiment consists of four steps. In Step 1, we examine the distributions of measures at both time points: pre- and post-event, and provide summary statistics, e.g. mean, median, standard deviation, and inter-quartile range separately for pre- and post-event measurements. In Step 2, we test for potential outliers, transform original values if necessary, and impute missing values if appropriate. In Step 3, we estimate the change for each subject, and again, provide summary statistics for the difference between pre- and post-event measurements. In Step 4, we test the significance of change using parametric or non-parametric tests.

Typically, the results of the analysis are summarized in a tabular form, similar to Table . Simple visualization tools can greatly enhance the understanding of results and help with interpretation. For example, Steps 1 and 3 of the analysis can be easily represented by using, for example, a dot-plot, box-plot, line-plot, histogram or scatterplot. The selection of a visual display for depicting a distribution of measured outcomes mainly depends on the sample size and on the intended purposes. In modern publications, for various reasons (primarily the commonly used software packages), a dot-plot is the representation of choice. However, a distribution of measured outcomes for a sample of moderate size can be clearly presented by a compact box-plot. We provided two box-plots for our sample in Figure . Box-plots clearly depict five summary statistics: 5^{th}, 25^{th}, 50^{th }(median), 75^{th}, and 95^{th }percentiles for both pre- and post- samples. The box-plots illustrate the first important observation: the distribution at a baseline (or at time *t*_{1}) is more compact and has a lower median value than the distribution for the second time point (*t*_{2}). Both distributions have a tendency to be skewed toward high EU values with a few distinct outliers, suggesting the potential need for some form of transformation, for example a log-transformation, which we employed in this tutorial.

| **Table 1**Summary statistics for the pre & post event immune response (IR) |

While compact and easy to interpret, box-plots do not reflect the individual changes. To correct for this, the box-plot can be supplemented with a line graph (Figure ), which illustrates individual trajectories and the degree of similarity among the responses. If the majority of subjects have higher (or lower) outcomes at the second time point compared to the first, we observe the overall similarity or "synchronization" in the responses. Typically, if the change is chaotic, e.g. little synchronization in the change is observed, and both distributions substantially overlap, then the conclusion of an unremarkable change is supported by the *t*-test results. However, if the distributions of outcomes for pre- and post- samples are compact and spread far apart, meaning a pronounced synchronization in the overall change is observed, then the comparison of the individual values at two time points produces a low p-value in the paired *t*-test.

A histogram of the observed absolute differences, Δ*Y*_{i }= Y_{t2 }- Y_{t1 }(Figure ), describes the pattern of change. The histogram provides a clear depiction of the predominant direction of change: the majority of children exhibited an increase in IR values (except for one case with a negative value); the difference between post- and pre- values ranges between (-435.0 and 459.4), 50% of children exhibited a change of 104.4 units; 25% of children exhibited a change less than 42.4, 75% of children had a change less than 161.6, yielding an IQR of 119.2 units of IR. On average, there was an IR increase of 116.2 ± 114.3. Testing if the average change differs from zero, yields a *t*-value of 5.095 that corresponds to a p-value of < 0.001.

However, the observed skew in the distribution warrants the use of appropriate transformation and non-parametric tests that alleviate the effect of skew toward high IR values and potential outliers. In our case, a non-parametric Wilcoxon signed-rank test also yields a very low p-value of < 0.001. Next, we transform the IR data using natural log and produce box-plots for pre-event and post-event IR data (Figure ). The summary statistics for transformed data are presented in Table . The transformed data has less skew as indicated by the box-plot and the coefficients of skewness and kurtosis in Table . The superimposed distribution line (in Figure ) offers an insight on how close the observed differences are to the normal distribution. On average, the absolute difference, D*Y*_{i }= lnY_{t2 }- lnY_{t1}, was 1.38 ± 0.97. Testing if the average change differs from zero, yields a *t*-value of 9.037 that corresponds to a p-value of < 0.001. A better way to present the expected difference is to express it as a percent increase together with the 95%confidence interval: so, in our example the average change is 38% and 95% CI is [7.5%; 69.5%].

This standard analysis provides a first glance at the nature of change in the outcome of interest. The succinct form of visuals and tables offers a comprehensive overview of results that are clearly presented to support the study conclusion of a significant increase in IR following a diarrheal episode. In this analysis we assume that the mean and variance of the change are reasonably constant throughout the range of pre-event values. We also assume that neither skew nor outliers affect the estimate of the change and the relationship between pre-and post- event values. However, to reach meaningful conclusions these assumptions should be further explored.

Further Explorations

There is more to the behavior of individual differences that deserves further attention. In general, we assume that high post-event values simply reflect the magnitude of the immune response in that particular child. However, it is plausible that the degree of change in the immune response might depend on the baseline value. We may suspect that children with high pre-event values tend to have high post-event IR values, but the relative increment would be small. Detection of such a relationship is an important finding because it highlights the heterogeneity in the observed change.

To explore the dependency of the change in immune response on the baseline values of IR, we use two scatter-plots. We plot the baseline IR values on the horizontal axis -- labeled as "Pre-event immune response" -- and the measurements for the second time point -- labeled as "Post-event immune response" -- on the vertical axis. Figure shows a scatter-plot of pre- and post- values. Next, we examine the correlation between two sets of measurements using Pearson and Spearman correlation coefficients. The Spearman coefficient is based on the order, *not *the absolute values, and therefore is less sensitive to unusually high IR measurements. In the presence of heavy skew and potential outliers, these coefficients are not expected to be identical. Both correlation coefficients are low (Pearson's r = 0.030, p-value = 0.854; Spearman's ρ = 0.267, p-value = 0.096), indicating no dependency of the post-event IR on IR at baseline.

For the mean and SD of the change to be meaningful estimate we must assume that they are reasonably constant throughout the range of pre-event values. As in Bland and Altman [

5] argument, the usual plot of the pre-event and the post-event IR values is inefficient, because children with high pre-event values might have high post-event IR values, but the relative increment can be small or vice versa, i.e. children with high increment will express very high post-event values. Figure demonstrate the relationship between the difference and average for the actual values of IR, whereas Figure uses the log-transformed IR values and reveals a lesser dependency. This suggests that the change in actual values linearly increases with an increased IR and is scale-sensitive. However, we need to be careful when interpreting the changes in log-transformed IR values as small absolute differences in IR can result in large differences in logged values, especially if the pre-event values are very small.

Cases with atypical behavior are often considered to be outliers and deserve special treatment. It is possible that their effect on the overall results is disproportionally strong and might bias our conclusions. Such bias has to be carefully examined. For example, Figure shows that the highest EU value at baseline is associated with the largest decline in EU values: one child had an unusually high IR value before the event of interest, which later declined to approximately average IR values at the second time point. We consider this point to be an outlier based on the fact that it is biologically implausible to have a decrease in immune response following an event, unless the child had a prior exposure to the pathogen that has gone undetected. This outlier is marked in Figure . Now we illustrate how the effect of an outlier on the relationships between change and baseline values can be examined.

Let us flag and remove the suspicious value first noted when we plotted the baseline values in Figure (marked as "outlier"). We intentionally did not emphasize the outliers for post measures in Figure because the log transformation eliminates their effect [

6]. Then, we recalculate the correlation between pre- and post-event values in the sample without the case exhibiting unusual behavior (Figure ). Once again, the overall correlation is weak and non-significant, although the estimates are getting closer (Pearson's r = 0.247, p-value = 0.129; Spearman's ρ = 0.287, p-value = 0.076). This observation supports the earlier statements that the pre- and post-event IR's are not correlated.

For Figure , the difference in correlations for actual values (Pearson's r = 0.279, p-value = 0.816; Spearman's ρ = 0.633, p-value < 0.001) is likely to be driven by the outlier. After removing the outlier the correlation coefficients become very high and similar (Pearson's r = 0.887, p-value < 0.001; Spearman's ρ = 0.761, p-value < 0.001). For the log-transformed data (Figure ) the correlations are low for a complete data set (Pearson's r = -0.278, p-value = 0.082; Spearman's ρ = 0.147, p-value = 0.366) as well as for the data with an outlier removed (Pearson's r = -0.125, p-value = 0.449; Spearman's ρ = -0.080, p-value = 0.630). This confirms the early statement that the change in actual values depends on IR and is scale-sensitive.

These findings suggest that the immune response might exhibit more complex behavior than we originally postulated. We can suspect that the study group is heterogeneous and it is quite likely that some important factors, or interactions, must be considered in order to better understand the nature of the observed relationship. The child's time of measurement relative to time of exposure and infection, and the infecting dose are potential candidates for such factors.

In this section, we illustrated how one can test the hypothesis that children with high post-event values tend to have high pre-event IR values. We also highlighted the importance of better understanding these associations in order to reach valid conclusions. In planning the pre-post experiments in human cohorts, it is essential to collect information on parameters that can identify potential interactions and/or effect modification that might affect the degree of change. There are also a number of techniques that can further improve a statistical analysis protocol, including statistical tests for the difference in correlation coefficients.

Flaws of a Traditional Scheme: Assumption of Time Irrelevance

The traditional scheme, although simple and familiar, might be inadequate for "real-life" settings in measuring human immune response. The major flaws are in the basic assumptions that might be valid in experimental conditions but not in "field" studies. Typically, in experimental settings or in clinical settings, the time of measurement, the dose of exposure, and the timing of exposure are well controlled. In "field" or observational studies, the time of measurement, even in a well-run cohort, is subject to fluctuations, which have to be carefully considered in the analysis. Moreover, the "time of event" - in our case the time of exposure to infection with a specific pathogen - could not be predicted (unless performed in volunteer studies). Such features do not make human studies any less precise or valuable, but they do require serious attention to details and the proper selection of statistical techniques.

One of the major problems of traditional analysis is the assumption of time irrelevance. Figure adequately summarizes the change in individual measures. However it implies that the measurements were taken at the same time for all pre- or post- event samples. It also may imply that the time of measurements is irrelevant for the purpose of an analysis. The use of an absolute difference of pre- and post-event measures, as a characteristic of change, implies that the observed change is solely due to the event of interest. The duration between pre- and post-event measures, even if specified, is often assumed to be irrelevant to the observed magnitude of change. In this section, we illustrate that the time of measurement is not a trivial matter and must be specifically considered.

Let us re-plot Figure to capture the time of measurements taken prior to the event and post event with respect to the time of event itself. Instead of anchoring the time of pre- and post-event measurements to two arbitrary points on horizontal axes (as in Figure ), we plot the values of pre- and post-event measurements with respect to time of episode for each child. To do this, we convert the chronological dates of diarrhea onset and collection of blood samples for serum IgG levels into time in weeks *before *and *after *an episode. In Figure , all 40 pairs are aligned with respect to episode timing, so that the measurements *before *an episode will be on one side relative to a vertical dotted line, and all the measurements *after *an episode will be on other side. Thus, the negative values for "time" represent time of a sample taken before an event and the positive values represent time of a sample taken after an event. On the horizontal axis, each line starts with a negative value for time and ends with a positive value for time. The vertical axis reflects the IR values for immune response at baseline and post event. Now, we examine the average time of sample collection. On average, samples were taken 7.7 ± 5.9 weeks prior a diarrheal episode and 8.3 ± 4.9 weeks after an episode (see Table ). No tendency for a shift toward longer or shorter time interval post event was observed (p-value = 0.660).

| **Table 2**Summary statistics for the timing of pre & post event serum sample collection (IR) |

In observational studies the consideration of timing can be quite important for reaching valid conclusions. In the simplest cases and when the mentioning of duration between pre- and post- measurements contributes to a better understanding of a process, the change in IR can be easily adjusted. This can be done by presenting the average change per unit of time, e.g. days, weeks, or months. In our example, the summary statistics for IR increase, presented in Figure , will change from its absolute value of 116.2 ± 114.3 to 8.3 ± 9.0, reflecting a weekly rate of change. This is a more accurate representation of overall change in IR, and is suitable if the assumption that the rate is constant over time for each child is correct. However, it is plausible that the magnitude of IR depends on when the sample was taken after an infection, or in other words, depends on the time elapsed since the time of exposure to a pathogen or of onset of symptoms. To explore this phenomenon, we shall perform a proper analysis.

Detection of Important Features in Immune Response Dynamics

It is well known that, in general, immune response to an antigen might exhibit a detectable increase after an antigenic challenge, reach its maximum and then decline to a baseline level or continue to be slightly elevated for some time. Such behavior is characteristic for non-linear dynamics and requires caution if traditional statistical techniques are attempted.

In Figure , the IR values for each pre-post pair are centered according to the timing of an episode. Thus, negative values on a horizontal axis indicate the time of a sample taken before a diarrheal episode and positive values are given to times of samples following an episode. This graph covers the whole range of times at which the measurements were taken and demonstrates that although the measurements taken after an episode were overall higher than the measurements taken before an episode, the degree of increase could be a complex function of time elapsed since a diarrheal episode. In order to reveal this non-linear feature we smoothed the data using statistical tools that help to describe the relationship between IR values and timing relative to diarrheal episode. For this step, we superimposed a plot with individual lines for each pair with a curve built from a model applied to small localized subsets of the data, so-called loess [

7] a locally weighed least squared regression (Figure ). The non-parametric smoother allows estimation of an average value for IR in sequences of small subsets, each covering a portion of data points. The smoothed curve reveals that the measurements obtained before the diarrheal episodes remain stable across the whole time period: from 30 weeks until at least 5 weeks prior to the diarrheal episode. Right before the episode, we start to observe a steady increase in IR responses until approximately 9 weeks after an event, when the increase reaches its maximum. Then, the curve starts to decline to the baseline level at about 20 weeks after the event. Similar results were shown in serological analysis of a cryptosporidial epidemic where the intensity of response was higher among specimens drawn 8 weeks after the first case report compared to those before or after that period [

8].

This analysis helps us to define a curve without forcing a specific shape. If the episode timing is accurate and unbiased, one can model this process accounting for a discrete step with a curve consisting of two segments: one for a pre-event period and another for a post-episode period, so a step will be allowed at the event. In our case, the timing of an episode is self-reported and might contain some delay, which is potentially reflected by a rise prior to the episode.

To provide estimates of the curve's features and measures of the modeling uncertainty, we fit a generalized additive model [

9,

10] with a form similar to what we found in the previous step (with the cubic-splines supported by 5 knots). Figure shows the values of IR change predicted by the model (shown with the solid black line) and the curve's confidence interval (CI) at 95

^{th }percentiles (the upper and lower boundaries are shown with black dotted lines). The model explained 32% of variability in the data. The average IR value was estimated to be 58.1 units. The estimated time to peak was 7.7 weeks post-diarrhea when the IR reached its maximum of 138.1 with a standard error of 18.2. To confirm the quality of estimates, we simulated the curve for the period of 4 - 20 weeks post-diarrhea, with a refined time increment of 0.01 weeks and obtained the estimates of peak timing and found its CI to be practically identical to the presented estimates.

This analysis helps to identify important features of immune response dynamics: general stability of immune response before an episode and a non-linear dynamic of response over time with a pronounced peak followed by a decline to a baseline level. These findings have important implications. For example, they might help to optimize future study designs by better targeting the time period that captures the best immune response.

Age as a Factor of Immune Response

Immune response can vary with age. The standard analysis can not explicitly account for potential differences in age. Even in the most closely observed study cohorts, the age at which a child contracts an infection cannot be controlled. To better understand how chronological age affects the immune response, we have to perform an analysis with an intention to detect the association between IR and the age at measurements.

Just as we aligned the time of measurements relative to the time of an event, we can similarly align measurements to the age of a child and explore how the magnitude of response is associated with the child's age. Figure shows the individual IR values in pre-post pairs for each child according to the age of measurement, reflected on the horizontal axis. For example, the youngest child in this study was 11 weeks old when his/her first measurement was taken; 15 weeks later the second measurement was taken when this child was 26 weeks old. Using another example, a child whom we consider to be an "outlier" (marked on Figures , and ) was 114 and 135 weeks old at the time of the first and second measurements, respectively.

It is evident that all children except one have relatively lower IR values in the pre-samples as compared to the post-samples. Younger children tend to have a relatively smaller increase in IR than the older children. This phenomenon can be thoroughly explored. Let us re-plot Figure but without the lines connecting each pair. Thus, Figure shows IR values for the first measurements (empty circles) and the second measurements (solid circles) for each child. Now, we fit the best curve to each set of points separately: one curve for all measurements taken before child's diarrheal episode (dashed line) and one curve for all measurements taken after an episode (solid line). These curves represent predicted values of IR as a function of age, and indicate that IR measured before an episode (as well as after) increases as children are getting older. These predicted values were obtained from a log-linear regression model, in which an increase in variability that occurs along with an increase in IR is accounted via the Poisson assumption for error structure [

11] as follows:

Using regression coefficients as applied to pre- and post- measurements we are able to estimate the overall magnitude of an increase in IR and the increase in IR associated with one unit of age. Regardless of age after an episode of diarrhea, the IR is 4.2 times higher compared to the IR before an episode. Both pre-event and post-event IR values depend on the child's age: on average, in one year (50 weeks) IR increases by 67.8% (95% CI: 62.3%, 73.2%) in baseline measurement and by 22.3% (95% CI: 19.2%, 25.5%) in post-episode measurements. We then confirm these results by removing an outlier and also by using a generalized linear model adapted for the log-transformed values and a Gaussian error structure (data not shown).

This analysis shows strong relationships associated with age at first encounter with an infectious agent, helps to determine how age contributes to overall increase in IR, and identifies critical features in IR related to time of measurements.

Environmental Exposure as a Factor of Immune Response

Diarrheal infections are known for their seasonal behaviors: some peak in hot summer months, some peak in cold dry winter months [

12-

14]. Environmental factors with seasonal effects drive the probability of exposure to pathogens, affect concentration and pathogenicity, modify susceptibility to infections; and have to be considered whenever it is possible [

15]. It is rare that, in a large scale population study, such as a birth cohort, or even in a well run clinical investigation, the enrollment of study subjects can be done instantaneously or during a very short time interval. This means that during the study, subjects can be exposed to different environmental conditions, which can affect the time of an episode and modify the IR values obtained before and after an event of interest.

In our example, the birth cohort was recruited over 18 months and the samples were collected from March of 2002 until August of 2006 [

16,

17]. To detect the seasonal change in IR, we re-plotted Figure with the calendar time instead of age. Figure shows individual pairs with the time of measurements for pre-event sample (empty dot), post-event measurements (solid dot) and the time of diarrheal episode (blue triangle) for each child. The first measurement in this study was taken at week 50 counting from the time of the first enrollment. We perform this analysis in a manner similar to the study of association with age. When we plot the predicted curve for pre- and post- event measurements over time (Figure ), the temporal patterns emerge: the baseline level remains relatively stable regardless of sampling date and the IR values are consistently above the baseline level.

A very peculiar feature emerges from Figure (A and B), which suggests that the timing of diarrheal samples occurred in two temporal clusters approximately one year apart. To explore this phenomenon, we re-plot Figure as a needle-plot (Figure ). We label each month in the horizontal axes, express the change in IR as log-fold increase: Δ

*Y*_{i }= ln(Y

_{t2}/Y

_{t1}), and place individual values of log-fold increase at the time of each episode (these time points are marked by blue triangles in Figure ). Based on the density of events across time, we can depict two clusters of episodes: one between June-August, 2003 and the other - in February-March, 2005. This graphical depiction complements the findings presented in Figure , suggesting an increased probability of the seasonal exposure to

*Cryptosporidium *during two time periods that are 3-6 months apart: from February to March and from June to September. The seasonal clustering can be further examined using regression models adapted to time series data [

9]; however, such analysis typically requires a longer time period of observations and/or more events. When we extended this analysis of seasonality by considering the essential information related to residential locations (latitude and longitude) of children with diarrheal episodes and performed an analysis of spatio-temporal clustering, these two significant clusters of cryptosporidial diarrhea emerged [

18].

While Figure shows log-transformed ratios with respect to chronological time of a diarrheal episode for each child, similar analysis can be performed for understanding IR with respect to child age (see Figure ). We divided the whole age range into seven equal age intervals of 6 months to closely mimic child development, which is important in acquiring infection. We then estimated the number of episodes for each interval (see Table ), and found that the episodes are more likely to occur when children are between 6 to 18 month old, although the log-fold change in immune responses is relatively stable across all age categories (p-value = 0.525). This analysis complements our findings presented in Figure and suggests an increased probability of developing the first infection due to *Cryptosporidium *between the ages of 6-18 months of age. Combined with the observations described above, we can assume that by the age of 24 months a child might be re-exposed multiple times.

| **Table 3**Summary statistics depicting the distribution of cases in different age groups and the change (log-fold) in their immune response (IR) |