PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of biostsLink to Publisher's site
 
Biostatistics. Jan 2014; 15(1): 129–139.
Published online Oct 4, 2013. doi:  10.1093/biostatistics/kxt039
PMCID: PMC3862212
A conditional predictive p-value to compare a multinomial with an overdispersed multinomial in the analysis of T-cell populations
Qinglin Pei
Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
Cindy L. Zuleger, Michael D. Macklin, and Mark R. Albertini
Department of Medicine and Carbone Cancer Center, University of Wisconsin, Madison, WI 53792, USA
Michael A. Newton*
Departments of Statistics and of Biostatistics and Medical Informatics, University of Wisconsin, Madison, WI 53706, USA
*To whom correspondence should be addressed. newton/at/stat.wisc.edu
Received January 31, 2013; Revised July 25, 2013; Accepted September 2, 2013.
Immunological experiments that record primary molecular sequences of T-cell receptors produce moderate to high-dimensional categorical data, some of which may be subject to extra-multinomial variation caused by technical constraints of cell-based assays. Motivated by such experiments in melanoma research, we develop a statistical procedure for testing the equality of two discrete populations, where one population delivers multinomial data and the other is subject to a specific form of overdispersion. The procedure computes a conditional-predictive p-value by splitting the data set into two, obtaining a predictive distribution for one piece given the other, and using the observed predictive ordinate to generate a p-value. The procedure has a simple interpretation, requires fewer modeling assumptions than would be required of a fully Bayesian analysis, and has reasonable operating characteristics as evidenced empirically and by asymptotic analysis.
Keywords: Bayesian p-value, Dirichlet multinomial, Double overdispersion, Fisher's exact test, HPRT assay, Mass culture experiments, Molecular sequence data, T-cell receptor
When testing the equality of two discrete populations, Fisher's exact test applies naturally to multinomial samples (e.g. Agresti, 1990, p. 62). It is widely used, easily developed, and readily interpreted, but it lacks robustness of validity when sources of variation create overdispersion relative to the multinomial. A more suitable model-based test is available in a case study from immunology that motivates the present work. One of the interesting features of the problem addressed concerns distributional shifts expected in the alternative hypothesis. If the biology is as suspected, then one population becomes a reduced-entropy version of the other. That is, probability masses in one population are concentrated on fewer support points when compared with the other population. But a similar concentration is a consequence of the overdispersion mechanisms governing data generation, even if the null hypothesis holds. Separating these effects to deliver refined statistical inference is the goal of the present work. Given the availability of powerful sequencing technology, we expect that data of this type will continue to be generated, and thus we are motivated to develop appropriate statistical methodology.
We address the testing problem by developing a conditional predictive p-value. Briefly, this involves splitting the data into two pieces. We condition on one piece to drive posterior inference for unknown parameters as well as predictive inference for the second piece of data; the predictive ordinate at the observed data acts as a test statistic. The approach is effective in the case study considered and exhibits good operating characteristics as determined through simulation and asymptotic analysis.
Research on cancer immunotherapy aims to understand and enhance those components of the adaptive immune system that recognize and attack tumor cells. Recognition is through T cells, whose cell-surface receptors enable them to bind antigen, in the context of the appropriate antigen-presenting molecule, and initiate an immune response. The task of identifying T cells that are reactive to tumor-cell antigens is complicated by the tremendous diversity of the T-cell repertoire within the body. It is estimated that 1012 T cells constitute a human, with possibly 107 distinct clonal types from among 1018 possibilities. This diversity is caused by combinatorial and stochastic processes by which T cells mature in the thymus, and it is manifested in a variation of DNA that encodes the α and β components of the T-cell receptor (e.g. Robins and others, 2009; Venturi and others, 2011). Of interest in the experiments summarized below are the J (joining) and V (variable) regions of the complementarity determining region 3 (CDR3) of the T-cell receptor β chain, which exhibits hypervariability and contributes to the antigen specificity of the T cell.
A T cell proliferates after it has been activated by recognition of its cognate antigen, and it produces a clone of descendants that share its specific cell-surface receptor. Thus, a promising approach to narrow the search for tumor-reactive T cells is to select from a patient's blood sample those T cells that have undergone post-activation cell proliferation. Assays have been developed based on somatically acquired mutations in the hypoxanthine-guanine phosphoribosyltransferase (HPRT) gene (Albertini and others, 2001, 2008). Briefly, cells that have incurred a loss-of-function mutation in HPRT are resistant to the effects of the nucleoside analog 6-thioguanine (6-TG), which otherwise is lethal to the cell. The frequency of HPRT mutant (MT) T cells is affected by various factors, is in the range 10−6 to 10−5 per mononuclear cell, and is readily estimated from a patient's sample via MT frequency analysis (Albertini, 2001). The rationale for HPRT-based selection is that, when compared with quiescent T cells, proliferating T cells are at an increased risk for having somatic gene mutations. The HPRT MT T cells therefore should be enriched for proliferating cells compared with wild-type (WT) T cells that have not been selected with 6-TG. In the analysis of data from these selection experiments, an important baseline statistical test is of the null hypothesis that MT and WT cell populations are not different with respect to frequencies over possible cell-surface receptors. It is prudent to perform the test as a check on the efficacy of the approach.
Recently, our group sequenced T-cell receptor CDR3's from cells in several compartments from six melanoma patients (Zuleger and others, 2011). We reconsider here data from the blood compartment in order to carefully develop a test of equality of discrete MT and WT populations. In the Zuleger and others (2011) data, counts of various types of CDR3's would have had multinomial distributions if the receptor sequences were obtained from random samples of MT and WT cells. However, the mass culture (MC) conditions used to propagate cells in vitro induced additional variability in the MT-type counts. The initial sample of cells from each donor was distorted by two related factors in culture. First, 6-TG treatment created a population bottleneck by killing all but the small fraction of input cells that carried an HPRT mutation. Second, cells were grown back to sufficient number to enable receptor sequencing. There was the possibility that different clones had different in vitro growth rates, and if so the type frequencies at the sampling/sequencing time would have differed from those of interest immediately after 6-TG treatment.
Receptor sequence data were obtained by two protocols; the MC method, indicated above, created extra-multinomial variation in MT counts. A second protocol was applied to a subset of sampled material; the single-cell (SC)-derived isolates produced less sequence data but were not subject to the overdispersion phenomenon. Whereas MC consists of culturing T cells from the blood as a bulk population, SC involves plating blood cells at limiting dilutions so that the derived monoclonal cultures can be sequenced and utilized in functional assays. Further details are provided in Section S2 of supplementary material available at Biostatistics online.
3.1. Data
Receptor sequencing was performed separately on material from different tissue samples. With i indexing the tissue sample and t indexing the type of the CDR3 receptor, the primary observable data are counts equation M1, which for sample i and CDR3 type t count the number of sequences of that type in that sample by one of two methods (SC and MC) and in one of two cell types (WT and MT).
equation M2
At the finest scale, t records the specific CDR3 amino-acid sequence; several thousand distinct t's were observed in Zuleger and others (2011). To increase expected type counts, we performed analysis at coarser scales, in which t records any one of many distinct CDR3 sequences sharing a specific structural property. The J-region of the CDR3 assumed one of 13 possible structures; the V-region assumed one of 48 possibilities. Owing to experimental exigencies, a possibly different amount of sequencing was performed in each sample i on study. With reference to the above notation, the total numbers sequenced were
equation M3
Tables Tables11 and and22 show MC and SC J-region counts from six melanoma patients. V-region data on these patients are in Tables S1 and S2 of supplementary material available at Biostatistics online.
Table 1.
Table 1.
Data: frequencies of J-region types, melanoma patients, blood, MC method
Table 2.
Table 2.
Data: frequencies of J-region types, melanoma patients, blood, SC-derived isolates method
3.2. Sampling model
The primary inference task was to compare MT and WT populations using the count data. To this end, let equation M4 denote a vector recording the underlying proportion of each CDR3 type in the population being sampled. We worked on the hypothesis that this population was the same for WT and MT cells, and we sought to quantify evidence against this null. The issue of whether or not different individuals could be assumed to share this common θ was important, and one certainly expects fluctuation at some level; however, the magnitude of such fluctuations was small. For example, a Fisher test of common θ among subjects based upon the WT-SC data, which should not be affected by multinomial overdispersion, showed no substantial violations of the common-θ assumption (see Section S3 of supplementary material available at Biostatistics online). Thus, we retained a single θ vector in subsequent computations.
Considering the sampling and measurement process, and assuming the null hypothesis that MT and WT populations are equivalent, we expected the following:
equation M5
We did not expect the counts equation M6 (those from MT cells grown in MC) to follow this sampling model, since in vitro effects induced extra-multinomial variation. The practical consequence was that equation M7 was very high for perhaps one or just a few sequence types t (e.g. J-region type 1-1 for patients A and C, Table Table1).1). Interestingly, the research hypothesis is that the MT population is enriched for expanding (thus large) clones, and so we expected such clustering of sequence data if the immune system was actively responding to the melanoma. A goal of the statistical inference was to assess evidence against the null hypothesis while accommodating experimental factors that induced non-null-like clustering even in the absence of differences between the MT and WT populations.
A bottleneck was created in vitro since, on the average, a small fraction equation M8 of cells in sample i carried the necessary HPRT mutation to withstand the 6-TG treatment. This mutation frequency equation M9 was estimated for each sample (see Table S3 of supplementary material available at Biostatistics online). The surviving population grew back in culture to yield sufficient numbers for receptor sequencing. Then a random sample of equation M10 sequences was obtained from the cell population after it had undergone this in vitro growth, giving observable type counts equation M11.
To analyze the bottleneck/growth effects, consider latent random counts equation M12, which record how many cells of each type survived 6-TG treatment prior to expansion in vitro. These components should be approximately independent and Poisson-distributed, with
equation M13
(3.1)
where equation M14 is the mutation frequency, equation M15 is the population fraction of cells of CDR3 type t, and equation M16 is the number of cells subject to 6-TG treatment in sample i (see Table S3 of supplementary material available at Biostatistics online). If all clones grew in vitro at the same rate, then the observable MT counts equation M17 would be multinomially distributed, with rates proportional to each equation M18. Experimentally, it is known that growth rates can vary; we allowed the variation in each rate to depend on the latent equation M19 and a single overdispersion parameter equation M20, through a Dirichlet-multinomial model. Specifically
equation M21
(3.2)
for equation M22 counts summing to equation M23 and for possible latent counts equation M24. The Dirichlet-multinomial model has been used extensively for categorical data (e.g. Petkau and Sitter, 1989; Johnson and others, 1997, p. 80; Dunson and Tindall, 2000). It is a convenient model to accommodate extra-multinomial variation. Letting equation M25, it follows from (3.2) that
equation M26
(3.3)
The Dirichlet-multinomial model was also supported by subject-matter considerations. Suppose that type t is present in proportion equation M27, say, in the population of cells from tissue sample i after several weeks expanding in vitro, where the latent vector of proportions equation M28 has a Dirichlet distribution with parameters equation M29. The random sampling of sequences after in vitro growth then gives a multinomial distribution for counts, conditionally upon equation M30; the marginal Dirichlet-multinomial follows by integration. Furthermore, the Dirichlet model for cell population at sampling time was supported by models of species abundance, since a Gammaequation M31-distributed population size for each clone would lead to the named Dirichlet distribution on vectors equation M32 (Dennis and Patil, 1984). In allowing variable growth rates by this mechanism, we have assumed that the contribution to the sampled population from each cell that survives 6-TG selection has a Gamma(ϕ, 1) distribution.
A useful aspect of the Dirichlet-multinomial model is that it carries over to any collapsing of the categories. Specifically, let equation M33 be a set of collapsed categories equation M34. Correspondingly, there are collapsed observed counts equation M35 and latent counts equation M36 from the experimental data. It is a property of the Dirichlet multinomial that the distribution of equation M37, given equation M38, has the same form as in (3.2), but with checks everywhere instead! We say the sampling model for Y is doubly overdispersed, since there is extra-multinomial variation (caused by variable in vitro growth rates) even conditionally on the Z counts, and these counts are also unknown. When the surviving cell count equation M39 is expected to be small, the variance of observed counts equation M40 is inflated both conditionally (through the variance inflation factor in (3.3)) and, marginally, through extra variation in equation M41.
3.3. Prior
The sampling model from Section 3.2 involves unknown proportions equation M42 and an in vitro-growth parameter equation M43. Numerical experiments (not shown) indicated that the posterior distribution of these parameters was not particularly sensitive to the prior setting. We report output in the case of a flat Dirichletequation M44 prior for θ and, independently, an improper flat prior for ϕ.
We combine WT counts into a vector equation M45 over types t by adding contributions from equation M46 and equation M47 across all tissue samples. That is, equation M48. Since we have combined multinomial counts governed by a common probability vector, the summary counts retain the multinomial form. Further, we let Y record the MC MT data equation M49 from all tissue samples (these are subject to overdispersion) as well as any available SC-derived counts equation M50. Here, we do not collapse by counting contributions over i, since such summarized counts would entail undue information loss; instead Y is a collection of vectors.
To emphasize notational distinctions, X and Y refer to random elements in our actual experiment, taking possible values x and y. As is sometimes done in p-value discussions, we introduce a separate notation for data obtained from a hypothetical repeat of the experiment: let equation M51 denote a hypothetical repeated draw of the random vector X. Having observed equation M52 and equation M53, the proposed p-value is
equation M54
(4.1)
where equation M55 is a posterior predictive distribution for WT counts given observed MT counts:
equation M56
(4.2)
Here, θ is the vector of unknown probabilities over types. Although x, the possible realization of WT counts, is used above, nowhere have we conditioned on the event equation M57. Had we conditioned in (4.1) instead on all the data equation M58, then the p-value would be a posterior predictive p-value (Gelman and others, 1996). This object can be unduly conservative, and so we decided to condition on part of the data only, as in the conditional predictive p-value approach (Bayarri and Berger, 1999, 2000; Evans, 2000; Robins and others, 2000). There, one conditions on part of the data, U (in our case, the MT counts Y), and generates a p-value from the conditional distribution of a statistic T that is as independent from U as possible. Our particular choice to split data by MT and WT counts and to construct a test statistic from the conditional density is most similar to Evans’ cross-validatory surprise (Evans, 2000), though examples and properties for count data seem not to have been developed previously.
A further justification of the proposed p-value (4.1) is its structural similarity to Fisher's exact test p-value, which would be suitable in the absence of multinomial overdispersion. Following the notational conventions above, Fisher's p-value is
equation M59
(4.3)
where equation M60 holds the sufficient statistic vector for the null frequencies θ, and where equation M61 is a generalized hypergeometric mass function. In the absence of overdispersion, this p-value is exact in the frequentist sense of being dominated by the uniform distribution, but unaccounted sources of variation tend to deflate and invalidate equation M62. Splitting the data into its natural components X and Y enables construction of a conditional p-value that is similarly reliant on a conditional mass function as a test statistic. Further, in conditioning on one of the data components, it is more sensible to condition on Y, as we propose, since Y contains information on the frequency parameters θ as well as the overdispersion parameter. The WT data X informs only θ, on the other hand conditioning on X instead of Y would make the computation more difficult and more sensitive to prior information.
The null sampling distribution of the conditional predictive p-value proposed in (4.1) is neither exactly uniformly distributed nor dominated by the uniform distribution. Since posterior simulation is used to average over unknown variables, this null distribution is difficult to determine. The proposed p-value is a valid frequentist p-value in the sense that it converges to a uniform distribution as sample sizes diverge (Section 6). A fully Bayesian test of homogeneity between MT and WT proportions provides an alternative approach to the inference problem. This could be pursued since Bayesian analysis is already used to generate conditional predictions. We take a different approach, partly because a full Bayesian development would require further non-null modeling assumptions. Gelman and Shalizi (2012) discuss this and related factors supporting the use of predictive p-values.
Calculating the conditional predictive p-value (4.1) involves sampling hypothetical WT data vectors equation M63 from their conditional distribution, given the observed MT data structure equation M64. It also requires repeated evaluation of the predictive ordinate equation M65 as well as the predictive ordinate on the observed WT data equation M66. We achieve the first task by Markov chain Monte Carlo (MCMC) coupled with predictive simulation. Specifically, equation M67 has a multinomial distribution conditional upon θ, which we readily sample after MCMC delivers a sample of θ vectors from the posterior equation M68. The marginal posterior of θ is naturally sampled by working with the higher-dimensional posterior of equation M69, where Z is the data structure recording all the latent MT counts immediately after 6-TG treatment and prior to in vitro growth (Section 3.2), and where ϕ is the overdispersion parameter related to variability in clonal growth rates. Briefly, each scan updated Z, θ, and ϕ in blocks according to a Metropolis–Hastings sampler (e.g., Robert and Casella, 1999, Chapter 6). The target posterior distribution was equation M70, that is, the distribution of unknowns given the MT count data. Details of the proposal distributions, algorithm structure, and output diagnostics, and code are presented in supplementary material available at Biostatistics online.
The second task in evaluating equation M71 is to compute the predictive ordinate equation M72, both for the realized WT data equation M73 and the many realized conditional predictions equation M74 (as in (4.2)). We eschew Monte Carlo approximations for this purpose and instead use a simple numerical approximation associated with treating equation M75 as a Dirichlet distribution. Realized θ vectors from the MCMC sampling are summarized to give a method-of-moments approximation to the parameters of this Dirichlet. Then the predictive ordinate is the ordinate of a Dirichlet-multinomial distribution, as the integration is achieved analytically. Details are provided in supplementary material available at Biostatistics online.
We applied the proposed computations to both J-region and V-region data from the six melanoma patients. Figure Figure11 summarizes features of equation M76 in comparison to empirical-type frequencies for the J-region data. (See Figure S1 of supplementary material available at Biostatistics online presents the V-region results.) Visually there is reasonably good agreement between MT and WT frequencies over different J-region types; substantial deviations (e.g. MT-MC compared with WT for types 1-1 and 2-1) reflect possible non-null behavior, though this must be gauged by intrinsic variations (e.g. see deviations between MT-MC and MT-SC). Balancing these factors, the Monte Carlo estimated conditional predictive p-value is equation M77, using 104 saved draws from a well-mixed Markov chain (see Figure S2 of supplementary material available at Biostatistics online). In contrast to this borderline significance value, the V-region data give equation M78 using the same simulation size. Evidently fluctuations as seen in the J-region data (Figure (Figure1)1) are not particularly unusual on the null hypothesis of equal MT and WT frequencies. Significant deviations are evident in V-region data from patients.
Fig. 1.
Fig. 1.
Summary statistics for J region frequencies from Tables Tables11 and and2.2. Colored bars show empirical frequencies of each J region type, from various data sources. Boxplots summarize posterior analysis of the underlying proportions (more ...)
The main point of the developed equation M79 was to enable basic judgements about the MT/WT comparison in light of extra-multinomial variation. We note, for comparison, that direct application of Fisher's exact test would yield equation M80 (J) and equation M81 (V), but this method is not reliable in the present context. For another comparison, we approximated the likelihood ratio statistic. Maximum likelihood estimation (MLE) is difficult in this hierarchical model. However, we approximated the MLEs using the posterior mean parameter settings from the developed MCMC sampler, separately for null and alternative hypotheses, and we used forward simulation to calculate likelihood components marginalizing the latent counts in Z (see Section S5 of supplementary material available at Biostatistics online). This gives equation M82 (J) and equation M83 (V), assuming equation M84 asymptotics.
To check the reliability of the proposed equation M85, we did a small simulation experiment in which system parameters were taken to be the posterior means from the J-region data and where sample sizes similarly matched the J-region data. MCMC sampling and predictive simulation were performed on each simulated data set. Figure S3 of supplementary material available at Biostatistics online illustrates the empirical distribution of 1000 null simulated equation M86's, and confirms that this estimated distribution is very close to uniform. To assess power, we also simulated the distribution of equation M87 under the alternative, taking separate estimates of θ for MT and WT cells in order to match as closely as possible the experimental setting (see Section S5 of supplementary material available at Biostatistics online). We found a 0.67 and 0.84 probability for equation M88 and equation M89, respectively, in this case.
Asymptotic uniformity of the proposed p-value can be proved using a curious fact about the large-sample distribution of the multinomial density of a multinomial sample. Fix equation M90 subject to equation M91 and equation M92, and let equation M93 denote a multinomially distributed random vector on equation M94 trials with probability vector θ. Recall the probability mass function
equation M95
for suitable count vectors x.
Lemma 6.1
For the non-random sequence equation M96,
equation M97
as equation M98, where equation M99 is a equation M100 distributed random variable on equation M101 degrees of freedom.
A proof is given in supplementary material available at Biostatistics online (Section S6). As MT sample sizes increase, the posterior equation M102 converges to a point mass at the true θ vector, by standard large-sample Bayesian theory (e.g. Schervish, 1995, p. 428). Thus, the predictive ordinate equation M103 is indistinguishable from the ordinate of the true multinomial distribution equation M104. Taking logs and centering as in Lemma 6.1, the predictive p-value equation M105, now considering inputs as random elements X and Y, is asymptotically equal in distribution to equation M106, where
equation M107
(6.1)
where U and V are independent and identically distributed equation M108 variables on equation M109 degrees of freedom, and where equation M110 is the density of this common distribution. The association is V is the limiting centered log mass equation M111 and U is the limiting centered log mass of equation M112. The uniformity of equation M113 is immediate from the probability integral transform. Note that MT data Y enter the game only to infer the parameter θ; otherwise, testing is left up to the WT data.
Our first attempt to obtain a useful test of the MT/WT difference used a different construction that exhibited unusual sampling properties. We imputed missing counts Z as in the proposed method, conditionally upon the MT data, but for each of these we calculated the Fisher test p-value that would be suitable in the absence of missing data. Then we averaged the resulting p-values. Computational experiments indicated that this p-value might be adjustable to provide a valid test; however, a detailed mathematical analysis uncovered a sampling defect associated with plugging in a consistent parameter estimate. The calculation is tangential to our main argument, but it was helpful in guiding us to a more useful construction, and so we include it in supplementary material available at Biostatistics online (Section S6).
The work was supported in part by the Office of Research and Development, Biomedical Laboratory Research and Development Service, Department of Veterans Affairs; grant P30 CA014520 from the National Cancer Institute; grant R21 HG006568 from the National Human Genome Research Institute; Anns Hope Foundation; the Gretchen and Andrew Dawes Melanoma Research Fund; the Jay Van Sloan Memorial from the Steve Leuthold Family; and the Tim Eagle Memorial. The contents do not represent the views of the Dept. of Veterans Affairs or the United States Government. C.L.Z. was supported by a postdoctoral fellowship by U.S. PHS Grant T32AR055893. Funding to pay the Open Access publication charges for this article was provided by University of Wisconsin fund which is called the Kellett Award.
Supplementary Material
Supplementary Data
Acknowledgements
We thank the melanoma patients who provided samples for analysis. We thank Christina Kendziorski for comments that helped to guide the development of this research, and also two reviewers for useful comments on the original submission. Conflict of Interest: None declared.
  • Agresti A. Categorical Data Analysis. New York: Wiley; 1990.
  • Albertini R. J. HPRT mutations in humans: biomarkers for mechanistic studies. Mutation Research. 2001;489:1–16. [PubMed]
  • Albertini M. R., King D. M., Newton M. A., Vacek P. M. In vivo mutant frequency of thioguanine-resistant T-cells in the peripheral blood and lymph nodes of melanoma patients. Mutation Research. 2001;476:83–97. [PubMed]
  • Albertini M. R., Macklin M. D., Zuleger C. L., Newton M. A., Judice S. A., Albertini R. J. Clonal expansions of 6-thioguanine resistant T lymphocytes in the blood and tumor of melanoma patients. Environmental and Molecular Mutagenesis. 2008;49:676–687. [PMC free article] [PubMed]
  • Bayarri M. J., Berger J. O. Quantifying surprise in the data and model verification. In: Bernardo J. M., Berger J. O., Dawid A. P., Smith A. F. M., editors. Bayesian Statistics 6. London: Oxford University Press; 1999. pp. 53–82.
  • Bayarri M. J., Berger J. O. P values for composite null models. Journal of the American Statistical Association. 2000;95:1127–1142.
  • Dennis B., Patil G. P. The gamma distribution and weighted multimodal gamma distributions as models of population abundance. Mathematical Biosciences. 1984;68:187–212.
  • Dunson D. B., Tindall K. R. Bayesian analysis of mutational sprectra. Genetics. 2000;156:1411–1418. [PubMed]
  • Evans M. Asymptotic distribution of P values in composite null models: comment. Journal of the American Statistical Association. 2000;95:1160–1163.
  • Gelman A., Meng X. L., Stern H. Posterior predictive assessment of model fitness via realized discrepancies (with discussion) Statistica Sinica. 1996;6:733–807.
  • Gelman A., Shalizi C. R. Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology. 2012:1–31. ( doi:10.1111/j.2044-8317.2011.02037.x) [PubMed]
  • Johnson N. L., Kotz S., Balakrishnan N. Discrete Multivariate Distributions. New York: Wiley; 1997.
  • Petkau A. J., Sitter R. R. Models for quantal response experiments over time. Biometrics. 1989;45:1299–1308.
  • Robert C. P., Casella G. Monte Carlo Statistical Methods. New York: Springer; 1999.
  • Robins H. S., Campregher P. V., Srivastava S. K., Wacher A., Turtle C. J., Kahsai O., Riddell S. R., Warren E. H., Carlson C. S. Blood. 2009;114:4099–4107. [PubMed]
  • Robins J. M., van der Vaart A., Ventura V. The asymptotic distribution of p values in composite null models. Journal of the American Statistical Association. 2000;95:1143–1156.
  • Schervish M. J. Theory of Statistics. New York: Springer; 1995.
  • Venturi V., Quigley M. F., Greenaway H. Y., Ng P. C., Ende Z. S., McIntosh T., Asher T. E., Almeida J. R., Levy S., Price D. A., Davenport M. P., Douek D. C. A mechanism for TCR sharing between T cell subsets and individuals revealed by pyrosequencing. The Journal of Immunology. 2011;186:4285–4294. [PubMed]
  • Zuleger C. L., Macklin M. D., Bostwick B. L., Pei Q., Newton M. A., Albertini M. R. In vivo 6-thioguanine-resistant T cells from melanoma patients have public TCR and share TCR beta amino acid sequences with melanoma-reactive T cells. Journal of Immunological Methods. 2011;365:76–86. [PMC free article] [PubMed]
Articles from Biostatistics (Oxford, England) are provided here courtesy of
Oxford University Press