PLoS One. 2012; 7(11): e42368.
Published online 2012 November 6. doi: 10.1371/journal.pone.0042368
PMCID: PMC3491072

Estimation of Distribution Overlap of Urn Models

Yu Zhang, Editor

Abstract

A classical problem in statistics is estimating the expected coverage of a sample, which has had applications in gene expression, microbial ecology, optimization, and even numismatics. Here we consider a related extension of this problem to random samples of two discrete distributions. Specifically, we estimate what we call the dissimilarity probability of a sample, i.e., the probability of a draw from one distribution not being observed in $k$ draws from another distribution. We show our estimator of dissimilarity to be a U-statistic and a uniformly minimum variance unbiased estimator of dissimilarity over the largest appropriate range of $k$. Furthermore, despite the non-Markovian nature of our estimator when applied sequentially over $k$, we show it converges uniformly in probability to the dissimilarity parameter, and we present criteria under which it is approximately normally distributed and admits a consistent jackknife estimator of its variance. As proof of concept, we analyze V35 16S rRNA data to discern between various microbial environments. Other potential applications concern any situation where the dissimilarity of two discrete distributions may be of interest. For instance, in SELEX experiments, each urn could represent a random RNA pool and each draw a possible solution to a particular binding site problem over that pool. The dissimilarity of these pools is then related to the probability of finding binding site solutions in one pool that are absent in the other.

Introduction

An inescapable problem in microbial ecology is that a sample from an environment typically does not observe all species present in that environment. In [14], this problem has recently been linked to the concepts of coverage probability (i.e., the probability that a member from the environment is represented in the sample) and the closely related discovery or unobserved probability (i.e., the probability that a previously unobserved species is seen with another random observation from that environment). The mathematical treatment of coverage is not limited, however, to microbial ecology and has found applications in varied contexts, including gene expression, optimization, and even numismatics.

The point estimation of coverage and discovery probability seems to have been first addressed by Turing and Good [2] to help decipher the Enigma Code, and subsequent work has provided point predictors and prediction intervals for these quantities under various assumptions [1], [3]–[5].

Following Robbins [6] and, in more generality, Starr [7], an unbiased estimator of the expected discovery probability of a sample of size $n$ is

$$\frac{1}{N\binom{N-1}{n}} \sum_{r=1}^{N-n} r\,\Phi_r \binom{N-r}{n},$$
(1)

where $\Phi_r$ is the number of species observed exactly $r$ times in a sample with replacement of size $N \ge n+1$. For $N = n+1$ this reduces to Robbins' estimator $\Phi_1/(n+1)$. Using the theory of U-statistics developed by Halmos [8], Clayton and Frees [9] show that the above estimator is the uniformly minimum variance unbiased estimator (UMVUE) of the expected discovery probability of a sample of size $n$ based on an enlarged sample of size $N$.
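The following Python sketch evaluates the discovery-probability estimator from a list of observed species labels. It assumes that (1) has the U-statistic form $\frac{1}{N\binom{N-1}{n}}\sum_{r} r\,\Phi_r\binom{N-r}{n}$, where $\Phi_r$ counts species seen exactly $r$ times in an enlarged sample of size $N \ge n+1$; this form is our own derivation from the averaging argument, and the function names `discovery_estimate` and `discovery_brute_force` are hypothetical. The brute-force version averages the indicator that a designated draw is absent from $n$ other draws, which the closed form should reproduce.

```python
from collections import Counter
from itertools import combinations
from math import comb


def discovery_estimate(sample, n):
    """Closed-form estimate of the expected discovery probability after n draws,
    computed from an enlarged sample of size N >= n + 1."""
    N = len(sample)
    if N < n + 1:
        raise ValueError("need an enlarged sample with N >= n + 1")
    # phi[r] = number of species observed exactly r times in the sample
    phi = Counter(Counter(sample).values())
    total = sum(r * phi_r * comb(N - r, n) for r, phi_r in phi.items())
    return total / (N * comb(N - 1, n))


def discovery_brute_force(sample, n):
    """Average of [draw j is absent from S] over every draw j and every
    n-subset S of the remaining N - 1 draws."""
    hits = trials = 0
    for j, x in enumerate(sample):
        others = sample[:j] + sample[j + 1:]
        for subset in combinations(others, n):
            hits += x not in subset
            trials += 1
    return hits / trials


sample = ["a", "a", "b", "c", "c", "c", "d"]  # toy data, N = 7
print(discovery_estimate(sample, 6))          # Robbins' case N = n + 1
print(discovery_estimate(sample, 2), discovery_brute_force(sample, 2))
```

With $N = n + 1$ the closed form reduces to Robbins' $\Phi_1/(n+1)$, here $2/7$ because species b and d are singletons.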

A quantity analogous to the discovery probability of a sample from a single environment, but in the context of two environments, is dissimilarity, which we broadly define as the probability that a draw in one environment is not represented in a random sample (of a given size) from a possibly different environment. Estimating the dissimilarity of two microbial environments is therefore closely related to the problem of assessing the species that are unique to each environment, and the concept of dissimilarity may find applications to measure sample quality and allocate additional sampling resources, for example, for a more robust and reliable estimation of the UniFrac distance [10], [11] between pairs of environments. Dissimilarity may also find applications in very different contexts. For instance, in SELEX experiments [12] (a laboratory technique in which an initial pool of synthesized random RNA sequences is repeatedly screened to yield a pool containing only sequences with given biological functions), the dissimilarity of two RNA pools corresponds to the probability of finding binding site solutions in one pool that are absent in the other.

In this manuscript, we study an estimator of dissimilarity probability similar to Robbins' and Starr's statistic for discovery probability. Our estimator is optimal within the appropriate class of unbiased statistics and is approximately normally distributed under general conditions. Its variance is estimated using a consistent jackknife. As proof of concept, we analyze samples of processed V35 16S rRNA data from the Human Microbiome Project [13].

Probabilistic Formulation and Inference Problem

To study dissimilarity probability, we use the mathematical model of a pair of urns, where each urn has an unknown composition of balls of different colors, and where there is no a priori knowledge of the contents of either urn. Information concerning the urn composition is inferred from repeated draws with replacement from that urn.

In what follows, $X = (X_1, X_2, \ldots)$ and $Y = (Y_1, Y_2, \ldots)$ are independent sequences of independent and identically distributed (i.i.d.) discrete random variables with probability mass functions $p$ and $q$, respectively. Without loss of generality we assume that $p$ and $q$ are supported over possibly infinite subsets of $\mathbb{N} = \{1, 2, \ldots\}$, and think of outcomes from these distributions as “colors”: i.e., we speak of color-$1$, color-$2$, etc. Let $\mathrm{supp}(p)$ denote the set of colors $i$ such that $p(i) > 0$, and similarly define $\mathrm{supp}(q)$. Under this perspective, $X_i$ denotes the color of the $i$-th ball drawn with replacement from urn-$p$. Similarly, $Y_j$ is the color of the $j$-th ball drawn with replacement from urn-$q$. Note that, based on our formulation, distinct draws are always independent.

The mathematical analysis that follows was motivated by the problem of estimating the fraction of balls in urn-$p$ with a color that is absent in urn-$q$. We can write this parameter as

$$\theta_{p,q} := \sum_{i \notin \mathrm{supp}(q)} p(i) = \lim_{k \to \infty} \theta_{p,q}(k),$$
(2)

where

$$\theta_{p,q}(k) := \mathbb{P}\big(X_1 \notin \{Y_1, \ldots, Y_k\}\big) = \sum_{i \ge 1} p(i)\,\big(1 - q(i)\big)^k.$$
(3)

The parameter $\theta_{p,q}$ measures the proportion of urn-$p$ that is unique relative to urn-$q$. On the other hand, $\theta_{p,q}(k)$ measures how effective samples of size $k$ from urn-$q$ are at determining uniqueness in urn-$p$. This motivates us to refer to the quantity in (2) as the dissimilarity of urn-$p$ from urn-$q$, and to the quantity in (3) as the average dissimilarity of urn-$p$ relative to $k$ draws from urn-$q$. Note that these parameters are in general asymmetric in the roles of the urns. In what follows, urns-$p$ and -$q$ are assumed fixed, which motivates us to remove subscripts and write $\theta$ and $\theta(k)$ instead of $\theta_{p,q}$ and $\theta_{p,q}(k)$.
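When the two compositions are known, the average dissimilarity can be computed directly: because the $Y$-draws are i.i.d. and independent of $X_1$, $\mathbb{P}(X_1 \notin \{Y_1,\ldots,Y_k\}) = \sum_i p(i)\,(1-q(i))^k$, and letting $k \to \infty$ leaves only the $p$-mass of colors outside $\mathrm{supp}(q)$. The Python sketch below, with hypothetical pmfs stored as dictionaries from color to probability, evaluates both quantities and checks the first by simulation; it illustrates the definitions only and is not an estimator, and all names in it are our own.

```python
import random


def avg_dissimilarity(p, q, k):
    """theta(k) = P(X_1 not in {Y_1, ..., Y_k}) = sum_i p(i) * (1 - q(i))**k."""
    return sum(pi * (1.0 - q.get(i, 0.0)) ** k for i, pi in p.items())


def dissimilarity(p, q):
    """theta = p-mass of colors that have probability zero in urn-q."""
    return sum(pi for i, pi in p.items() if q.get(i, 0.0) == 0.0)


def simulate(p, q, k, trials, seed=0):
    """Monte Carlo frequency of the event {X_1 not in {Y_1, ..., Y_k}}."""
    rng = random.Random(seed)
    p_colors, p_weights = zip(*p.items())
    q_colors, q_weights = zip(*q.items())
    misses = 0
    for _ in range(trials):
        x = rng.choices(p_colors, weights=p_weights)[0]
        ys = rng.choices(q_colors, weights=q_weights, k=k)
        misses += x not in ys
    return misses / trials


p = {1: 0.5, 2: 0.3, 3: 0.2}  # hypothetical urn-p
q = {1: 0.7, 2: 0.3}          # hypothetical urn-q; color 3 never appears
for k in (0, 1, 2, 5, 20):
    print(k, avg_dissimilarity(p, q, k), simulate(p, q, k, 20000, seed=k))
print("theta =", dissimilarity(p, q))
```

The printed values of $\theta(k)$ decrease in $k$ toward $\theta = 0.2$, the mass of color 3.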

Unfortunately, one cannot unbiasedly estimate the dissimilarity of one urn from another based on finite samples, as stated in the following result. (See the Materials and Methods section for the proofs of all of our results.)

Theorem 1 (No unbiased estimator of dissimilarity.) There is no unbiased estimator of $\theta$ based on finite samples from two arbitrary urns-$p$ and -$q$.

Furthermore, estimating $\theta$ accurately without further assumptions on the compositions of urns-$p$ and -$q$ seems a difficult, if not impossible, task. For instance, arbitrarily small perturbations of urn-$q$ are likely to go unnoticed in a sample of a given size from this urn but may drastically affect the dissimilarity of other urns from urn-$q$. To demonstrate this idea, consider a parameter $0 \le \varepsilon < 1$ and let $p(1) = 1$, $q(1) = \varepsilon$ and $q(2) = 1 - \varepsilon$. If $\varepsilon = 0$ then $\theta = 1$ while, for each $\varepsilon > 0$, $\theta = 0$.

In contrast with the above, for fixed $k$, $\theta(k)$ depends continuously on $(p,q)$, e.g., under the metric

equation image

where An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e071.jpg denotes the total variation of a signed measure An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e072.jpg over An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e073.jpg such that An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e074.jpg. This is the case because

equation image
equation image

The above implies that An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e077.jpg is continuous with respect to any metric equivalent to An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e078.jpg. Many such metrics can be conceived. For instance, if An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e079.jpg denotes the probability measure associated with An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e080.jpg samples with replacement from urn-An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e081.jpg that are independent of An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e082.jpg samples with replacement from urn-An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e083.jpg then An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e084.jpg is also continuous with respect to any of the metrics An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e085.jpg, with An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e086.jpg, because

An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e087.jpg

Because of the above considerations, we discourage the direct estimation of $\theta$ and focus on the problem of estimating $\theta(k)$ accurately.
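A small numerical illustration of the contrast between $\theta$ and $\theta(k)$, using a hypothetical pair of urns: urn-$p$ holds only color 1, and urn-$q$ gives probability $\varepsilon$ to color 1 and $1-\varepsilon$ to color 2. The dissimilarity jumps from 1 at $\varepsilon = 0$ to 0 for every $\varepsilon > 0$, whereas the average dissimilarity, here $(1-\varepsilon)^k$, moves continuously. The helper names are our own.

```python
def avg_dissimilarity(p, q, k):
    """theta(k) = sum_i p(i) * (1 - q(i))**k for finite pmfs given as dicts."""
    return sum(pi * (1.0 - q.get(i, 0.0)) ** k for i, pi in p.items())


def dissimilarity(p, q):
    """theta = p-mass of the colors that have probability zero in urn-q."""
    return sum(pi for i, pi in p.items() if q.get(i, 0.0) == 0.0)


p = {1: 1.0}  # urn-p contains only color 1
for eps in (0.0, 1e-3, 1e-6):
    q = {1: eps, 2: 1.0 - eps}  # urn-q: color 1 with probability eps
    print(eps, dissimilarity(p, q), avg_dissimilarity(p, q, 10))
```

A moderate sample from urn-$q$ with tiny $\varepsilon$ is very likely to contain no color-1 ball, so the data cannot tell the two cases apart even though $\theta$ differs by 1, while $\theta(10)$ barely changes.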

Results

Consider a finite number of draws with replacement $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ from urn-$p$ and urn-$q$, respectively, where the sample sizes $m$ and $n$ are assumed fixed. Using these data, we can estimate $\theta(k)$, for $0 \le k \le n$, via the estimator

$$\hat\theta(k) := \frac{1}{m} \sum_{i \ge 1} M_i \, \frac{\binom{n - N_i}{k}}{\binom{n}{k}},$$
(4)

where

$$M_i := \#\{1 \le j \le m : X_j = i\}, \qquad N_i := \#\{1 \le j \le n : Y_j = i\},$$
(5)

and $\binom{a}{k} := 0$ whenever $a < k$.

We refer to the counts $M_i$ and $N_i$, $i \ge 1$, as the statistics summarizing the data from both urns. Due to the well-known relation $\sum_{i \ge 1} M_i = m$, at most $m$ of the $M_i$ are non-zero, so the sum in (4) has at most $m$ non-zero terms. This sparsity may be exploited in the calculation of the right-hand side of (4) over a large range of $k$'s.
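A direct Python transcription, assuming the estimator has the form $\hat\theta(k) = \frac{1}{m}\sum_i M_i \binom{n-N_i}{k}/\binom{n}{k}$, with $M_i$ and $N_i$ the numbers of color-$i$ balls among the $X$- and $Y$-draws. Because only colors seen among the $X$-draws contribute, the loop touches at most $m$ terms. The brute-force function (a hypothetical name, as is `theta_hat`) averages the kernel $[x \notin S]$ over every $X$-draw $x$ and every $k$-subset $S$ of the $Y$-draws, which should agree with the closed form on small inputs.

```python
from collections import Counter
from itertools import combinations
from math import comb


def theta_hat(xs, ys, k):
    """Closed-form U-statistic estimate of theta(k) from X-draws xs and Y-draws ys."""
    m, n = len(xs), len(ys)
    if m < 1 or not 0 <= k <= n:
        raise ValueError("requires m >= 1 and 0 <= k <= n")
    M = Counter(xs)  # M_i: number of X-draws with color i
    N = Counter(ys)  # N_i: number of Y-draws with color i (0 if unseen)
    # math.comb(a, k) is 0 when k > a, so frequently seen colors drop out.
    # Only colors observed among the X-draws are visited: at most m terms.
    return sum(M_i * comb(n - N[i], k) for i, M_i in M.items()) / (m * comb(n, k))


def theta_hat_brute(xs, ys, k):
    """Average the kernel [x not in S] over every X-draw x and k-subset S of ys."""
    subsets = list(combinations(ys, k))
    hits = sum(x not in s for x in xs for s in subsets)
    return hits / (len(xs) * len(subsets))


xs = [1, 1, 2, 3, 4]     # toy X-draws from urn-p
ys = [1, 2, 2, 5, 1, 6]  # toy Y-draws from urn-q
print([round(theta_hat(xs, ys, k), 4) for k in range(len(ys) + 1)])
```

At $k = 0$ the estimate is always 1, and at $k = n$ it is the fraction of $X$-draws whose color never appears among the $Y$-draws (here $2/5$).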

Our statistic $\hat\theta(k)$ is the U-statistic associated with the kernel $[X_1 \notin \{Y_1, \ldots, Y_k\}]$, where $[\cdot]$ denotes the indicator function of the event within the brackets (Iverson's bracket notation). Following the approach by Halmos in [8], we can show that this U-statistic is optimal amongst the unbiased estimators of $\theta(k)$ for $0 \le k \le n$. We note that no additional samples from either urn are necessary to estimate $\theta(k)$ unbiasedly over this range when $m \ge 1$. This contrasts with the estimator in equation (1), which requires sample enlargement for unbiased estimation of the discovery probability of a sample of size $n$.

Theorem 2 (Minimum variance unbiased estimator.) If $m \ge 1$ and $0 \le k \le n$ then $\hat\theta(k)$ is the unique uniformly minimum variance unbiased estimator of $\theta(k)$. Further, no unbiased estimator of $\theta(k)$ exists for $m = 0$ or $k > n$.
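Unbiasedness (though not minimal variance) can be verified exactly for tiny urns by enumerating every possible outcome of the $m$ $X$-draws and $n$ $Y$-draws with rational arithmetic. The Python sketch below assumes the closed form $\hat\theta(k) = \frac{1}{m}\sum_i M_i\binom{n-N_i}{k}/\binom{n}{k}$ and uses hypothetical pmfs; for every $0 \le k \le n$ the expectation should equal $\sum_i p(i)(1-q(i))^k$ exactly. The function names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, prod


def theta_hat(xs, ys, k):
    """Exact (rational) value of (1/m) * sum_i M_i * C(n - N_i, k) / C(n, k)."""
    m, n = len(xs), len(ys)
    M, N = Counter(xs), Counter(ys)
    return Fraction(sum(M_i * comb(n - N[i], k) for i, M_i in M.items()),
                    m * comb(n, k))


def expected_theta_hat(p, q, m, n, k):
    """E[theta_hat(k)], summing over every color sequence of m X-draws and n Y-draws."""
    total = Fraction(0)
    for xs in product(p, repeat=m):
        px = prod(p[c] for c in xs)
        for ys in product(q, repeat=n):
            py = prod(q[c] for c in ys)
            total += px * py * theta_hat(xs, ys, k)
    return total


def theta(p, q, k):
    """theta(k) = sum_i p(i) * (1 - q(i))**k, computed exactly."""
    return sum(pi * (1 - q.get(i, 0)) ** k for i, pi in p.items())


# Hypothetical tiny urns with rational compositions; color 3 is absent from urn-q.
p = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
q = {1: Fraction(3, 4), 2: Fraction(1, 4)}
print([expected_theta_hat(p, q, 2, 3, k) == theta(p, q, k) for k in range(4)])
```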

Our next result shows that $\hat\theta(k)$ converges uniformly in probability to $\theta(k)$ over the largest possible range where unbiased estimation of the latter parameter is possible, despite the non-Markovian nature of $\hat\theta(k)$ when applied sequentially over $k$. The result asserts that $\hat\theta(k)$ is likely to be a good approximation of $\theta(k)$, uniformly for $0 \le k \le n$, when $m$ and $n$ are large. The method of proof uses an approach by Hoeffding [14] for the exact calculation of the variance of a U-statistic.

Theorem 3 (Uniform convergence in probability.) Independently of how $m$ and $n$ tend to infinity, it follows for each $\varepsilon > 0$ that

$$\lim_{m,n \to \infty} \mathbb{P}\Big(\max_{0 \le k \le n} \big|\hat\theta(k) - \theta(k)\big| > \varepsilon\Big) = 0.$$
(6)
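Theorem 3 can be illustrated, though of course not proved, by simulation. The Python sketch below draws samples from hypothetical urns, computes $\hat\theta(k)$ for every $0 \le k \le n$ in one pass using the identity $\binom{n-a}{k}/\binom{n}{k} = \prod_{t=0}^{a-1}\frac{n-k-t}{n-t}$ (which avoids huge binomial coefficients), and reports the largest deviation from $\theta(k) = \sum_i p(i)(1-q(i))^k$. The function names and urn compositions are our own assumptions.

```python
import random
from collections import Counter


def theta_hat_all(xs, ys):
    """Return [theta_hat(0), ..., theta_hat(n)] for draws xs (urn-p), ys (urn-q)."""
    m, n = len(xs), len(ys)
    M, N = Counter(xs), Counter(ys)
    estimates = []
    for k in range(n + 1):
        total = 0.0
        for color, M_i in M.items():
            ratio = 1.0  # binom(n - N_i, k) / binom(n, k) as a running product
            for t in range(N[color]):
                ratio *= (n - k - t) / (n - t)
            total += M_i * ratio
        estimates.append(total / m)
    return estimates


def max_error(p, q, m, n, seed):
    """max over 0 <= k <= n of |theta_hat(k) - theta(k)| for one simulated data set."""
    rng = random.Random(seed)
    xs = rng.choices(list(p), weights=list(p.values()), k=m)
    ys = rng.choices(list(q), weights=list(q.values()), k=n)
    errors = []
    for k, est in enumerate(theta_hat_all(xs, ys)):
        exact = sum(pi * (1.0 - q.get(i, 0.0)) ** k for i, pi in p.items())
        errors.append(abs(est - exact))
    return max(errors)


p = {1: 0.5, 2: 0.3, 3: 0.2}  # hypothetical urn-p
q = {1: 0.7, 2: 0.3}          # hypothetical urn-q (color 3 absent)
for size in (50, 200, 1000):
    print(size, max_error(p, q, size, size, seed=size))
```

The product form costs $O(n)$ per value of $k$ because the $N_i$ sum to $n$; a single run shows only one realization, so the errors need not shrink monotonically.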

We may estimate the variance of $\hat\theta(k)$ for An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e134.jpg via a leave-one-out (also called delete-1) jackknife estimator, using an approach studied by Efron and Stein [15] and Shao and Wu [16].

To account for variability in the $X$-data through a leave-one-out jackknife estimate, we require that $m \ge 2$ and let

equation image
(7)

On the other hand, to account for variability in the $Y$-data, consider for An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e140.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e141.jpg the statistics

equation image
(8)

Clearly, An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e143.jpg; in particular, the An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e144.jpg-statistics are a refinement of the An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e145.jpg-statistics. Define An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e146.jpg and, for An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e147.jpg, define

equation image
(9)

where

equation image
(10)
equation image
(11)

Our estimator of the variance of $\hat\theta(k)$ is obtained by summing the variance attributable to the $X$-data and the $Y$-data and is given by

equation image
(12)

for An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e155.jpg; in particular, An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e156.jpg is our jackknife estimate of the standard deviation of $\hat\theta(k)$.
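A Python sketch of the leave-one-out idea, under stated assumptions: each part is the textbook delete-1 jackknife variance $\frac{r-1}{r}\sum_{l}(\hat\theta_{(-l)} - \bar\theta)^2$, computed by deleting one $X$-draw at a time ($r = m$) or one $Y$-draw at a time ($r = n$), and the two parts are added as described for (12). We have not transcribed (7)–(11) themselves, so this is a generic implementation of the technique rather than a term-by-term copy; the function names are hypothetical, and setting the $Y$-part to zero when $k = n$ is our own convention.

```python
from collections import Counter
from math import comb, sqrt


def theta_hat(xs, ys, k):
    """(1/m) * sum_i M_i * C(n - N_i, k) / C(n, k)."""
    m, n = len(xs), len(ys)
    M, N = Counter(xs), Counter(ys)
    return sum(M_i * comb(n - N[i], k) for i, M_i in M.items()) / (m * comb(n, k))


def delete1_jackknife_var(replicates):
    """Delete-1 jackknife variance: (r - 1)/r * sum of squared deviations."""
    r = len(replicates)
    mean = sum(replicates) / r
    return (r - 1) / r * sum((v - mean) ** 2 for v in replicates)


def jackknife_sd(xs, ys, k):
    """Square root of (X-part + Y-part); requires m >= 2 and 0 <= k <= n."""
    m, n = len(xs), len(ys)
    if m < 2 or not 0 <= k <= n:
        raise ValueError("requires m >= 2 and 0 <= k <= n")
    var_x = delete1_jackknife_var(
        [theta_hat(xs[:j] + xs[j + 1:], ys, k) for j in range(m)])
    var_y = 0.0  # assumed convention: no Y-part when k == n
    if k < n:
        var_y = delete1_jackknife_var(
            [theta_hat(xs, ys[:j] + ys[j + 1:], k) for j in range(n)])
    return sqrt(var_x + var_y)


xs = [1, 1, 2, 3, 4]     # toy X-draws
ys = [1, 2, 2, 5, 1, 6]  # toy Y-draws
print(theta_hat(xs, ys, 6), jackknife_sd(xs, ys, 6))
```

For $k = n$ each $X$-replicate only counts draws whose color is absent from the $Y$-sample, and a short calculation shows that the $X$-part reduces to $\hat\theta(n)\,(1-\hat\theta(n))/(m-1)$; with the toy data, $\hat\theta(6) = 0.4$ and the $X$-part is $0.4 \cdot 0.6/4 = 0.06$.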

To assess the quality of An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e158.jpg as an estimate of the variance of $\hat\theta(k)$ and the asymptotic distribution of the latter statistic, we require a few assumptions that rule out degenerate cases. The following conditions are used in the remaining theorems in this section:

  (a) An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e160.jpg.
  (b) there are at least two colors in An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e161.jpg that occur in different proportions in urn-An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e162.jpg; in particular, the conditional probability An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e163.jpg is not a uniform distribution.
  (c) urn-An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e164.jpg contains at least one color that is absent in urn-An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e165.jpg; in particular, An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e166.jpg.
  (d) $m$ and $n$ grow to infinity at a comparable rate, i.e., An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e169.jpg, which means that there exist finite constants An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e170.jpg such that An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e171.jpg, as $m$ and $n$ tend to infinity.

Conditions (a)–(c) imply that An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e173.jpg has a strictly positive variance and that a projection random variable, intermediate between An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e174.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e175.jpg, also has a strictly positive variance. The idea of projection is motivated by the analysis of Grams and Serfling in [17].

Condition (d) is technical and is used only to show that the result in Theorem 5 holds for the largest possible range of values of $k$, namely, for An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e177.jpg. See [18] for results with uniformity related to Theorem 4, as well as uniformity results when condition (d) is not assumed.

Because the variance of $\hat\theta(k)$, from now on denoted An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e179.jpg, and its estimate An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e180.jpg tend to zero as $m$ and $n$ increase, the unnormalized consistency result is unsatisfactory. As an alternative, we can show that An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e183.jpg is a consistent estimator relative to An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e184.jpg, as stated next.

Theorem 4 (Asymptotic consistency of variance estimation.) If conditions (a)–(c) are satisfied then, for each An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e185.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e186.jpg, it holds that

equation image
(13)

Finally, under conditions (a)–(d), we show that $\hat\theta(k)$ is asymptotically normally distributed for all An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e189.jpg, as $m$ and $n$ increase at a comparable rate.

Theorem 5 (Asymptotic normality.) Let An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e192.jpg, i.e., An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e193.jpg has a standard normal distribution. If conditions (a)–(d) are satisfied then

equation image
(14)

for all real numbers An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e195.jpg.

The non-trivial aspect of the above result is the asymptotic normality of $\hat\theta(k)$ when An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e197.jpg, e.g. An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e198.jpg, as the results we have found in the literature [14], [19], [20] only guarantee the asymptotic normality of our estimator of $\theta(k)$ for fixed $k$. We note that, due to Slutsky's theorem [21], it follows from (13) and (14) that the ratio

equation image

has, for fixed $k$, approximately a standard normal distribution when $m$ and $n$ are large and of a comparable order of magnitude.
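The approximate normality of this ratio suggests the usual interval $\hat\theta(k) \pm z\,s$ for $\theta(k)$, where $s$ is the jackknife estimate of the standard deviation of $\hat\theta(k)$ and $z$ is a standard normal quantile. A minimal Python helper, relying only on the standard library's `statistics.NormalDist`; the name `normal_interval` and the clipping to $[0,1]$ (since $\theta(k)$ is a probability) are our own choices, and the inputs in the example are hypothetical.

```python
from statistics import NormalDist


def normal_interval(estimate, sd, level=0.95):
    """Approximate two-sided interval estimate +/- z * sd, clipped to [0, 1].

    z is the (1 + level) / 2 quantile of the standard normal distribution."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return max(0.0, estimate - z * sd), min(1.0, estimate + z * sd)


print(normal_interval(0.5, 0.01))  # hypothetical estimate and jackknife sd
```

The coverage of such an interval is only as good as the normal approximation and the jackknife variance estimate, both of which are asymptotic statements.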

Discussion

As proof of concept, we use our estimators to analyze data from the Human Microbiome Project (HMP) [13]. In particular, our samples are V35 16S rRNA data, processed by Qiime into an operational taxonomic unit (OTU) count table format (see File S1). Each of the An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e205.jpg samples analyzed has more than An external file that holds a picture, illustration, etc.
Object name is pone.0042368.e206.jpg successfully identified bacteria (see File S2). We sort these samples by the body location metadata describing the origin of the sample. This sorting yields the assignments displayed in Table 1.

Table 1
HMP data.

We present our estimates of [formula] for all [formula] possible sample comparisons in Figure 1, i.e., we estimate the average dissimilarity of sample-[formula] relative to the full sample-[formula]. Due to (4), observe that [formula].

At the given sample sizes, we can differentiate four broad groups of environments: stool, vagina, oral/throat and skin/nostril. Moreover, the proportion of oral/throat bacteria found in stool is larger than the proportion of stool bacteria found in oral/throat environments. We can also differentiate the throat, gingival and saliva samples. However, we cannot reliably differentiate between tongue and throat samples, or between subgingival and supragingival plaques. On the other hand, stool samples have larger proportions of unique bacteria relative to other stool samples of the same type, and vaginal samples share this property. In contrast, skin/nostril samples have relatively few bacteria that are not identified in other skin/nostril samples.
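Comparisons of this kind can be sketched directly from two lists of OTU labels.

The sketch assumes the $U$-statistic form of the estimator in Lemma 7. Under that form, a draw in sample-$A$ whose color occurs $k$ times in sample-$B$ contributes $\binom{n_B-k}{n}/\binom{n_B}{n}$. The function names `theta_hat` and `fraction_unseen` and the toy samples are hypothetical.

When $n=n_B$, only the full sample-$B$ remains as a sublist. The estimate then reduces to the fraction of sample-$A$ draws whose color is absent from sample-$B$.

```python
from collections import Counter
from math import comb


def theta_hat(sample_a, sample_b, n):
    """Hypothetical helper: assumed U-statistic estimate of theta_{A,B}(n)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 1 or not 0 <= n <= n_b:
        raise ValueError("requires n_A >= 1 and 0 <= n <= n_B")
    counts_b = Counter(sample_b)
    # Each urn-A draw whose color occurs k times in sample_b contributes
    # comb(n_b - k, n) / comb(n_b, n); math.comb returns 0 when k > n_b - n.
    return sum(comb(n_b - counts_b[a], n) for a in sample_a) / (n_a * comb(n_b, n))


def fraction_unseen(sample_a, sample_b):
    """Fraction of sample_a draws whose color never occurs in sample_b."""
    seen_b = set(sample_b)
    return sum(a not in seen_b for a in sample_a) / len(sample_a)


# Toy OTU labels (hypothetical), one label per identified read.
sample_a = ["x", "y", "y", "z"]
sample_b = ["x", "x", "y", "w"]
for n in range(len(sample_b) + 1):
    print(n, theta_hat(sample_a, sample_b, n))
```

The loop prints 1.0, 0.75, about 0.5417, 0.375 and 0.25 for $n=0,\dots,4$. The last value is the fraction of sample-$A$ draws (here the single `z`) not seen in sample-$B$.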

Figure 1
Dissimilarity estimates.

The above effects may be a property of the environments from which the samples are taken, or an effect of noise from inaccurate estimates due to sampling. To rule out the latter interpretation, Figure 2 shows estimates of the standard deviation of [formula] based on the jackknife estimator [formula] from (12). As [formula] is zero, the error estimate is given by [formula]. We see from (7), with [formula], that

equation image

Figure 2
Error estimates.

Assuming a normal distribution and an accurate jackknife estimate of variance, [formula] will be in the interval [formula] with at least approximately 95% confidence, for any choice of sample comparisons in our data. In particular, on a linear scale, we expect at least 95% of the estimates in Figure 1 to be accurate in at least the first two digits.

As we mentioned earlier, estimating $\theta_{A,B}(\infty)$ accurately is a difficult problem. We end this section with two heuristics to assess how representative $\hat{\theta}_{A,B}(n_B)$ is of $\theta_{A,B}(\infty)$ when urn-$B$ has at least two colors and at least one color in common with urn-$A$. First, observe that:

equation image
(15)

In particular, $\theta_{A,B}(n)$ is a strictly concave-up and monotonically decreasing function of the real variable $n$. Hence, if [formula] is close to the asymptotic value $\theta_{A,B}(\infty)$, then [formula] should be of small magnitude. We call the latter quantity the discrete derivative of [formula] at [formula].

Since we may estimate the discrete derivative from our data, the following heuristic arises: relatively large values of [formula] are evidence that $\hat{\theta}_{A,B}(n_B)$ is not a good approximation of $\theta_{A,B}(\infty)$.
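Under the same assumed $U$-statistic form of the estimator, the discrete derivative at $n_B$ can be estimated by the difference $\hat{\theta}_{A,B}(n_B-1)-\hat{\theta}_{A,B}(n_B)$. Taking the difference in this direction is an assumption of the sketch.

The ratio $\binom{n_B-k}{n_B-1}/\binom{n_B}{n_B-1}$ equals $1$, $1/n_B$ or $0$ according to whether $k=0$, $k=1$ or $k\ge 2$. The difference therefore reduces to the number of sample-$A$ draws whose color occurs exactly once in sample-$B$, divided by $n_An_B$.

The code checks this shortcut against the direct computation. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def theta_hat(sample_a, sample_b, n):
    """Hypothetical helper: assumed U-statistic form of the estimator."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 1 or not 0 <= n <= n_b:
        raise ValueError("requires n_A >= 1 and 0 <= n <= n_B")
    counts_b = Counter(sample_b)
    return sum(comb(n_b - counts_b[a], n) for a in sample_a) / (n_a * comb(n_b, n))


def discrete_derivative(sample_a, sample_b):
    """Estimated drop of the dissimilarity from n_B - 1 to n_B draws."""
    n_b = len(sample_b)
    return theta_hat(sample_a, sample_b, n_b - 1) - theta_hat(sample_a, sample_b, n_b)


def singleton_shortcut(sample_a, sample_b):
    """Same quantity via colors seen exactly once in sample_b."""
    counts_b = Counter(sample_b)
    once = sum(1 for a in sample_a if counts_b[a] == 1)
    return once / (len(sample_a) * len(sample_b))


sample_a = ["x", "y", "y", "z"]
sample_b = ["x", "x", "y", "w"]
print(discrete_derivative(sample_a, sample_b), singleton_shortcut(sample_a, sample_b))
```

Both calls return 0.125 here: the two `y` draws are the only sample-$A$ draws whose color is a singleton in sample-$B$, giving $2/(4\cdot 4)$.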

Figure 3 shows the heat map of [formula] for each pair of samples. These estimates are of order [formula] for the majority of the comparisons. They spike to [formula] for several sample-[formula] of varied environment types when sample-[formula] is associated with a skin or vaginal sample. In particular, further sampling effort in environments associated with certain vaginal, oral or stool samples is likely to reveal bacteria associated with broadly defined skin or vaginal environments.

Figure 3
Discrete derivative estimates.

Another heuristic may be more useful to assess how close $\hat{\theta}_{A,B}(n_B)$ is to $\theta_{A,B}(\infty)$, particularly when the previous heuristic is inconclusive. As motivation, observe that [formula], because of the identity in (15), where

equation image

Furthermore, [formula], where [formula] is a certain finite constant. We can justify this approximation only when [formula] is well approximated by a linear function of [formula]. In that case, we let [formula] denote the estimated value of [formula] obtained from the linear regression.

Since [formula], the following more precise heuristic comes to light: $\hat{\theta}_{A,B}(n_B)$ is a good approximation of $\theta_{A,B}(\infty)$ if the linear regression of [formula] for [formula] near [formula] gives a good fit, [formula] is small relative to [formula], and [formula] is also small.
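The sketch below operationalizes this heuristic under one assumption, not taken from the text: the successive differences $\hat{\theta}_{A,B}(n)-\hat{\theta}_{A,B}(n+1)$ decay geometrically near $n_B$. This is consistent with the exponential rate of decay mentioned in the discussion of Figure 4.

The sketch proceeds in three steps:

1. Regress the logarithms of the differences linearly on $n$.
2. Report the largest absolute residual as the regression error.
3. Subtract the fitted geometric tail from $\hat{\theta}_{A,B}(n_B)$.

The function name, the window size and the choice of regressing log-differences are all hypothetical. The sketch need not match the regression used for Table 2.

```python
from math import exp, log


def extrapolate_geometric_tail(thetas, window):
    """Hypothetical helper. thetas[n] holds an estimate of theta_{A,B}(n) for
    n = 0, ..., n_B. Fits log(thetas[n] - thetas[n + 1]) = alpha + n * log(lam)
    over the last `window` differences; returns (limit, lam, max |residual|)."""
    n_b = len(thetas) - 1
    if not 2 <= window <= n_b:
        raise ValueError("window must lie between 2 and n_B")
    ns = list(range(n_b - window, n_b))
    diffs = [thetas[n] - thetas[n + 1] for n in ns]
    if min(diffs) <= 0:
        raise ValueError("log-linear fit needs strictly positive differences")
    ys = [log(d) for d in diffs]
    # Ordinary least squares for the line y = intercept + slope * n.
    x_bar, y_bar = sum(ns) / window, sum(ys) / window
    sxx = sum((x - x_bar) ** 2 for x in ns)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ns, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    max_residual = max(abs(y - intercept - slope * x) for x, y in zip(ns, ys))
    lam = exp(slope)
    if lam >= 1:
        raise ValueError("differences are not decaying; no finite tail")
    # Sum of fitted differences for n >= n_B: exp(intercept) * lam**n_B / (1 - lam).
    tail = exp(intercept) * lam ** n_b / (1 - lam)
    return thetas[n_b] - tail, lam, max_residual


# Synthetic check: theta(n) = 0.2 + 0.5 * 0.9**n has limit 0.2.
thetas = [0.2 + 0.5 * 0.9 ** n for n in range(41)]
limit, lam, err = extrapolate_geometric_tail(thetas, window=10)
print(limit, lam, err)
```

On real estimates, a large `err` plays the role of the inconclusive case discussed for the third pair below.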

To fix ideas, we have applied the above heuristic to three pairs of samples: [formula], [formula] and [formula]. Each ordered pair denotes urn-$A$ and urn-$B$, respectively. As seen in Table 2 for these three cases, [formula] is at least 14 times larger than [formula]. In particular, due to the asymptotic normality of the latter statistic, an appropriate use of the heuristic reduces to checking for a good linear fit and a small [formula] value. In all three cases, [formula] was computed from the estimates [formula], with [formula].

Table 2
Sample comparisons.

For the [formula]-pair, [formula] and the regression error (the largest absolute residual associated with the best linear fit) are both zero to machine precision. This suggests that $\hat{\theta}_{A,B}(n_B)$ is a good approximation of $\theta_{A,B}(\infty)$, and the blue plot in Figure 4 reinforces this conclusion.

On the other hand, for the [formula]-pair, the regression error is small, suggesting that the linear approximation [formula] is good for [formula]. However, because [formula], we cannot guarantee that $\hat{\theta}_{A,B}(n_B)$ is a good approximation of $\theta_{A,B}(\infty)$. In fact, as seen in the red plot in Figure 4, [formula], with [formula], exhibits a steady and almost linear decay, which suggests that [formula] may be much smaller than [formula].

Finally, for the [formula]-pair, the regression error is large and the heuristic is therefore inconclusive. As the green plot in Figure 4 shows, the lack of fit indicates that the data from these urns have not yet captured the exponential rate of decay of [formula] to [formula]. Note that the heuristic based on the discrete derivative shows no evidence that [formula] is far from [formula].

Figure 4
Sequential estimation.

Materials and Methods

Here we prove the theorems given in the Results section. The key idea behind each proof may be summarized as follows.

To show Theorem 1, we identify pairs of urns for which unbiased estimation of $\theta_{A,B}(\infty)$ is impossible for any statistic.

To show Theorem 2, we exploit the diversity of possible urn distributions to show that there are relatively few unbiased estimators of $\theta_{A,B}(n)$. In fact, $\hat{\theta}_{A,B}(n)$ is the only unbiased estimator that is symmetric in the data. The uniqueness of the symmetric estimator is obtained via a completeness argument: a symmetric statistic with expected value zero is shown to correspond to a polynomial with identically zero coefficients, and these coefficients correspond to the values the statistic returns on specific data. The symmetric estimator is a $U$-statistic in that it is an average of unbiased estimates of $\theta_{A,B}(n)$. These estimates are based on all possible sub-samples of sizes $1$ and $n$ from the samples of urn-$A$ and urn-$B$, respectively. As any asymmetric estimator has higher variance than a corresponding symmetric estimator, the symmetric estimator must be the UMVUE.
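The unbiasedness part of Theorem 2 can be checked exactly on a small example by enumerating every possible pair of samples.

The sketch assumes two hypothetical urns and the $U$-statistic form of the estimator. It also uses the identity $\theta_{A,B}(n)=\sum_c p_A(c)\,(1-p_B(c))^n$, which follows from the definition of dissimilarity. Here $p_A$ and $p_B$ are notation introduced only for this sketch, denoting the color probabilities of urn-$A$ and urn-$B$. Exact rational arithmetic avoids rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, prod

# Hypothetical urns: color -> probability (exact fractions).
P_A = {"x": Fraction(1, 2), "y": Fraction(3, 10), "z": Fraction(1, 5)}
P_B = {"x": Fraction(3, 5), "y": Fraction(2, 5)}


def theta_hat(sample_a, sample_b, n):
    """Hypothetical helper: assumed U-statistic form, in exact arithmetic."""
    counts_b = Counter(sample_b)
    n_b = len(sample_b)
    hits = sum(comb(n_b - counts_b[a], n) for a in sample_a)
    return Fraction(hits, len(sample_a) * comb(n_b, n))


def theta(n):
    """Probability that one urn-A draw is absent from n urn-B draws."""
    return sum(p * (1 - P_B.get(c, 0)) ** n for c, p in P_A.items())


def exact_mean(n_a, n_b, n):
    """E[theta_hat] obtained by summing over every possible pair of samples."""
    total = Fraction(0)
    for draws_a in product(P_A, repeat=n_a):
        weight_a = prod(P_A[c] for c in draws_a)
        for draws_b in product(P_B, repeat=n_b):
            weight_b = prod(P_B[c] for c in draws_b)
            total += weight_a * weight_b * theta_hat(draws_a, draws_b, n)
    return total


for n in range(4):  # every n with 0 <= n <= n_B = 3
    print(n, exact_mean(2, 3, n), theta(n))
```

Each printed pair agrees exactly; for instance, both entries equal 97/250 at $n=2$.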

To show Theorem 3, we use bounds on the variance of the $U$-statistic and show that, uniformly for relatively small $n$, $\hat{\theta}_{A,B}(n)$ converges to $\theta_{A,B}(n)$ in the $L^2$-norm. In contrast, for relatively large values of $n$, we exploit the monotonicity of $\hat{\theta}_{A,B}(n)$ and $\theta_{A,B}(n)$ in $n$ to show uniform convergence.

Finally, Theorems 4 and 5 are shown using an approximation of $\hat{\theta}_{A,B}(n)$ by sums of i.i.d. random variables, together with results concerning the variance of both $\hat{\theta}_{A,B}(n)$ and its approximation. In particular, the approximation satisfies the hypotheses of the Central Limit Theorem and the Law of Large Numbers, which we use to transfer these results to $\hat{\theta}_{A,B}(n)$.

In what follows, [formula] denotes the set of all probability distributions that are finitely supported over [formula].

Proof of Theorem 1

Consider in [formula] probability distributions of the form [formula], [formula] and [formula], where [formula] is a given parameter. Take any statistic [formula] which takes as input $n_A$ draws from urn-$A$ and $n_B$ draws from urn-$B$. Then [formula] is a polynomial of degree at most [formula] in the variable [formula]; in particular, it is a continuous function of [formula] over the interval [formula]. Since $\theta_{A,B}(\infty)$ has a discontinuity at [formula] over this interval, there exists no estimator of $\theta_{A,B}(\infty)$ that is unbiased over pairs of distributions in [formula].
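The argument can be illustrated numerically with one hypothetical pair of urns, chosen for concreteness and not necessarily the pair used in the proof. Urn-$A$ always returns color `x`, while urn-$B$ returns `x` with probability $p$ and `y` otherwise.

For this pair, $\theta_{A,B}(n)=(1-p)^n$. Its limit therefore jumps from $1$ at $p=0$ to $0$ for every $p>0$. By contrast, the expected value of any statistic of the urn-$B$ sample is a polynomial in $p$, so it barely moves between $p=0$ and a tiny positive $p$. The two statistics below are arbitrary, hypothetical choices.

```python
from itertools import product


def theta_infinity(p):
    """Limit of theta(n) = (1 - p)**n as n grows, for the hypothetical urns."""
    return 1.0 if p == 0 else 0.0


def expected_value(statistic, n_b, p):
    """Exact expectation of statistic(sample_b) over all urn-B samples.

    Urn-A's sample is always all "x", so it carries no information and is
    omitted; the result is a polynomial of degree at most n_b in p.
    """
    total = 0.0
    for sample_b in product("xy", repeat=n_b):
        k = sample_b.count("x")
        total += p ** k * (1 - p) ** (n_b - k) * statistic(sample_b)
    return total


# Two arbitrary statistics of the urn-B sample (hypothetical choices).
statistics = {
    "x unseen in sample": lambda s: float("x" not in s),
    "share of y draws": lambda s: s.count("y") / len(s),
}
for name, stat in statistics.items():
    at_zero = expected_value(stat, 5, 0.0)
    near_zero = expected_value(stat, 5, 1e-9)
    print(f"{name}: E at p=0 -> {at_zero:.6f}, at p=1e-9 -> {near_zero:.6f}")
print("theta_infinity:", theta_infinity(0.0), theta_infinity(1e-9))
```

No choice of statistic can close this gap, because its expectation is continuous in $p$ while the target is not.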

We first use Lemmas 6–11 to show Theorem 2. The proof follows an approach similar to the one used by Halmos [8] for single distributions, which we extend naturally to the setting of two distributions.

Our next result implies that no uniformly unbiased estimator of $\theta_{A,B}(n)$ is possible when using fewer than one draw from urn-$A$ or fewer than $n$ draws from urn-$B$.

Lemma 6 If [formula] is unbiased for $\theta_{A,B}(n)$ for all [formula], then $n_A\ge 1$ and $n_B\ge n$.

Proof. Consider in [formula] probability distributions of the form [formula], [formula], [formula] and [formula], where [formula] are arbitrary real numbers. Clearly, [formula] is a linear combination of polynomials of degree [formula] in [formula] and [formula] in [formula]. As a result, it is a polynomial of degree at most [formula] in [formula] and [formula] in [formula]. We now compare degrees. Since $\theta_{A,B}(n)$ has degree [formula] in [formula] and [formula] in [formula], and [formula] is unbiased for $\theta_{A,B}(n)$, we conclude that $n_A\ge 1$ and $n_B\ge n$.

The form of $\hat{\theta}_{A,B}(n)$ given in equation (4) is convenient for computation but, for mathematical analysis, we prefer its $U$-statistic form associated with the kernel function [formula].

In what follows, [formula] denotes the set of all one-to-one functions [formula].

Lemma 7

equation image
(16)

where [formula].

Proof. Fix [formula] and suppose that color [formula] occurs [formula] times in [formula]. If [formula], then any sublist of size $n$ of [formula] contains [formula]; hence [formula] for all [formula]. On the other hand, if [formula], then [formula]. The rightmost sum depends only on the number of times that color [formula] was observed in [formula]. We may therefore use the [formula]-statistics defined in equation (5) to rewrite:

equation image

The right-hand side above now corresponds to the definition of $\hat{\theta}_{A,B}(n)$ given in equation (4).
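The counting step in this proof can be checked by brute force. Suppose a color occupies $k$ of the $n_B$ positions of sample-$B$. The one-to-one maps from $\{1,\dots,n\}$ into those positions that avoid the color number $n!\binom{n_B-k}{n}$, which is zero when $k>n_B-n$. Averaging the corresponding indicator over all $n!\binom{n_B}{n}$ maps therefore gives $\binom{n_B-k}{n}/\binom{n_B}{n}$.

The sketch enumerates injective maps with `itertools.permutations`. Without loss of generality, it places the color in the first $k$ positions. The function names and samples are hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def injective_maps_avoiding(n, n_b, k):
    """Count one-to-one maps {0..n-1} -> {0..n_b-1} whose image avoids the
    first k positions (where the color is placed, without loss of generality)."""
    return sum(1 for image in permutations(range(n_b), n) if all(j >= k for j in image))


def brute_force_average(sample_a, sample_b, n):
    """Average of [a not in chosen sublist] over all draws a and injective maps."""
    total = count = 0
    for a in sample_a:
        for image in permutations(range(len(sample_b)), n):
            total += all(sample_b[j] != a for j in image)
            count += 1
    return total / count


def closed_form(sample_a, sample_b, n):
    """Hypothetical helper: the binomial-ratio form obtained in the proof."""
    n_b = len(sample_b)
    hits = sum(comb(n_b - sample_b.count(a), n) for a in sample_a)
    return hits / (len(sample_a) * comb(n_b, n))


mismatches = [
    (n, n_b, k)
    for n_b in range(6)
    for n in range(n_b + 1)
    for k in range(n_b + 1)
    if injective_maps_avoiding(n, n_b, k) != factorial(n) * comb(n_b - k, n)
]
print("mismatches:", mismatches)  # prints: mismatches: []
```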

In what follows, we say that a function [formula] is [formula]-symmetric when

equation image

for all [formula] and permutations [formula] and [formula] of [formula] and [formula], respectively. Equivalently, [formula] is [formula]-symmetric if and only if it may be regarded as a function of [formula], where [formula] and [formula] correspond to the order statistics [formula] and [formula], respectively. Accordingly, a statistic of [formula] is called [formula]-symmetric when it may be represented in the form [formula] for some [formula]-symmetric function [formula]. It is immediate from Lemma 7 that $\hat{\theta}_{A,B}(n)$ is [formula]-symmetric.

The next result asserts that the variance of any non-symmetric unbiased estimator of $\theta_{A,B}(n)$ may be reduced by a corresponding symmetric unbiased estimator. The proof is based on the well-known fact that conditioning preserves the mean of a statistic and cannot increase its variance.

Lemma 8 An asymmetric unbiased estimator of $\theta_{A,B}(n)$ that is square-integrable has a strictly larger variance than a corresponding [formula]-symmetric unbiased estimator.

Proof. Let [formula] denote the sigma-field generated by the random vector [formula]. Suppose that the statistic [formula] is unbiased for $\theta_{A,B}(n)$ and square-integrable. Then [formula] is a well-defined statistic, and there is a [formula]-symmetric function [formula] such that [formula]. Clearly, [formula] is unbiased for $\theta_{A,B}(n)$ and [formula]-symmetric. Since [formula], Jensen's inequality for conditional expectations [22] implies that [formula], with equality if and only if [formula] is [formula]-symmetric.
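Lemma 8 can be seen at work on a small example.

Consider the indicator that the first urn-$A$ draw is absent from the first $n$ urn-$B$ draws. It is unbiased for $\theta_{A,B}(n)$ but not symmetric. Averaging it over all reorderings of both samples yields a symmetric estimator, assumed here to have the $U$-statistic form of Lemma 7.

The sketch enumerates every pair of samples from two hypothetical urns in exact arithmetic and compares the means and variances of the two estimators.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, prod

# Hypothetical urns: color -> probability.
P_A = {"x": Fraction(1, 2), "y": Fraction(3, 10), "z": Fraction(1, 5)}
P_B = {"x": Fraction(3, 5), "y": Fraction(2, 5)}


def asymmetric(sample_a, sample_b, n):
    """Unbiased but order-dependent: uses only the first draws of each sample."""
    return Fraction(int(sample_a[0] not in sample_b[:n]))


def symmetric(sample_a, sample_b, n):
    """Assumed U-statistic form: the average of `asymmetric` over reorderings."""
    counts_b, n_b = Counter(sample_b), len(sample_b)
    hits = sum(comb(n_b - counts_b[a], n) for a in sample_a)
    return Fraction(hits, len(sample_a) * comb(n_b, n))


def mean_and_variance(estimator, n_a, n_b, n):
    """Exact first two moments by enumerating every pair of samples."""
    mean = second = Fraction(0)
    for draws_a in product(P_A, repeat=n_a):
        for draws_b in product(P_B, repeat=n_b):
            weight = prod(P_A[c] for c in draws_a) * prod(P_B[c] for c in draws_b)
            value = estimator(draws_a, draws_b, n)
            mean += weight * value
            second += weight * value * value
    return mean, second - mean * mean


for name, est in [("asymmetric", asymmetric), ("symmetric", symmetric)]:
    mean, var = mean_and_variance(est, 2, 3, 2)
    print(f"{name}: mean = {mean}, variance = {float(var):.4f}")
```

Both estimators have mean 97/250, but the symmetric one has the smaller variance, as the lemma predicts.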

Since $\hat{\theta}_{A,B}(n)$ is [formula]-symmetric and bounded, the above lemma implies that if a UMVUE for $\theta_{A,B}(n)$ exists, then it must be [formula]-symmetric. Next, we show that there is a unique symmetric and unbiased estimator of $\theta_{A,B}(n)$. This immediately implies that $\hat{\theta}_{A,B}(n)$ is the UMVUE.

In what follows, [formula] denote integers. We say that a polynomial [formula] is [formula]-homogeneous when it is a linear combination of polynomials of the form [formula], with [formula] and [formula]. Furthermore, we say that [formula] satisfies the partial vanishing condition if [formula] whenever [formula], [formula] and [formula].

The next lemma is an intermediate step toward showing, in Lemma 10, that a [formula]-homogeneous polynomial which satisfies the partial vanishing condition is the zero polynomial.

Lemma 9 If [formula] is a [formula]-homogeneous polynomial in the real variables [formula], with [formula], that satisfies the partial vanishing condition, then [formula] whenever [formula], [formula] and [formula].

Proof. Fix [e468] such that [e469] and [e470], and observe that

[equation image]

because [e472] is a [e473]-homogeneous polynomial. The right-hand side above is zero because [e474] satisfies the partial vanishing condition.

Lemma 10 Let [e475] be a [e476]-homogeneous polynomial in the real variables [e477], with [e478]. If [e479] satisfies the partial vanishing condition, then [e480] identically.

Proof. We prove the lemma by structural induction on [e481] for all [e482].

If [e483], then a [e484]-homogeneous polynomial [e485] must be of the form [e486] for an appropriate constant [e487]. Since such a polynomial satisfies the partial vanishing condition only when [e488], the base case of the induction is established.

Next, consider a [e489]-homogeneous polynomial [e490], with [e491], that satisfies the partial vanishing condition, and let [e492] denote its degree with respect to the variable [e493]. In particular, there are polynomials [e494] in the variables [e495] such that

[equation image]

Now fix [e497] such that [e498] and [e499]. Because [e500] satisfies the partial vanishing condition, Lemma 9 implies that [e501] for all [e502]. In particular, for each [e503], [e504] whenever [e505], [e506] and [e507]. Thus each [e508] satisfies the partial vanishing condition. Since [e509] is a [e510]-homogeneous polynomial, the inductive hypothesis implies that [e511] identically, and hence [e512] identically. The same argument shows that if [e513], with [e514], is a [e515]-homogeneous polynomial that satisfies the partial vanishing condition, then [e516] identically, which completes the inductive proof of the lemma.

Our final result before proving Theorem 2 implies that [e517] cannot admit more than one symmetric and unbiased estimator. Its proof relies on the variety of distributions in [e518] and on the requirement that the estimator be unbiased for every pair of distributions chosen from [e519].

Lemma 11 If [e520] is an [e521]-symmetric function such that [e522] for all [e523], then [e524] identically.

Proof. Consider a point [e525] and define [e526] and [e527] as the cardinalities of the sets [e528] and [e529], respectively. Furthermore, let [e530] denote the distinct elements of the set [e531], and define [e532] to be the number of times that [e533] appears in this set. Next, let [e534] be a probability distribution such that [e535], and define [e536]. In a completely analogous manner, define [e537], [e538], [e539] and [e540].

Notice that [e541] is a polynomial in the real variables [e542] that satisfies the hypothesis of Lemma 10; in particular, this polynomial is identically zero. However, because [e543] is [e544]-symmetric, the coefficient of [e545] in [e546] is

[equation image]

implying that [e548].

Proof of Theorem 2

As already mentioned, Lemma 8 implies that if the UMVUE for [e549] exists, then it must be [e550]-symmetric. Suppose there are two [e551]-symmetric functions such that [e552] and [e553] are unbiased for [e554]. Applying Lemma 11 to [e555] shows that [e556]; therefore [e557] admits a unique symmetric and unbiased estimator. By Lemma 7, [e558] is [e559]-symmetric and unbiased for [e560]; hence it is the UMVUE for [e561]. By Lemma 6, no unbiased estimator of [e562] exists for [e563] or [e564].
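The unbiasedness used above can be checked on a toy example. The following Python sketch rests on our own assumptions, because the source's symbols did not survive extraction: we call the urn supplying the k draws urn-A (color probabilities `p`) and the urn supplying the single draw urn-B (probabilities `q`), take the dissimilarity to be theta(k) = sum over colors c of q_c (1 - p_c)^k, and take the estimator to be the average, over the urn-B draws y, of C(n - N_A(y), k) / C(n, k), where N_A(y) is the number of urn-A draws of color y. All helper names are hypothetical. Exact rational arithmetic over every possible pair of samples verifies E[U] = theta(k) for 0 <= k <= n with no simulation error; it says nothing about minimum variance.

```python
from fractions import Fraction
from itertools import product
from math import comb


def theta(p, q, k):
    """Assumed dissimilarity: P(an urn-B draw is absent from k urn-A draws)."""
    return sum(q[c] * (1 - p[c]) ** k for c in q)


def u_stat(sample_a, sample_b, k):
    """Assumed estimator: mean over urn-B draws y of C(n - N_A(y), k) / C(n, k)."""
    n = len(sample_a)
    terms = [Fraction(comb(n - sample_a.count(y), k), comb(n, k)) for y in sample_b]
    return sum(terms) / len(sample_b)


def sample_prob(draws, dist):
    """Probability of an ordered sequence of i.i.d. draws from dist."""
    prob = Fraction(1)
    for c in draws:
        prob *= dist[c]
    return prob


def exact_mean(p, q, n, m, k):
    """E[U], computed exactly by enumerating every pair of samples of sizes n and m."""
    colors = list(p)  # p and q are assumed to be defined on the same colors
    return sum(
        sample_prob(xs, p) * sample_prob(ys, q) * u_stat(list(xs), list(ys), k)
        for xs in product(colors, repeat=n)
        for ys in product(colors, repeat=m)
    )


if __name__ == "__main__":
    p = {"r": Fraction(1, 2), "g": Fraction(1, 3), "b": Fraction(1, 6)}
    q = {"r": Fraction(1, 4), "g": Fraction(1, 4), "b": Fraction(1, 2)}
    for k in range(4):  # 0 <= k <= n with n = 3
        print(k, exact_mean(p, q, 3, 2, k) == theta(p, q, k))
```

The enumeration grows like (number of colors)^(n + m), so it is only a sanity check for very small samples.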

Our next goal is to prove Theorem 3, for which we first prove Lemmas 12 and 13. We note that the latter lemma applies in a much more general context than our treatment of dissimilarity.

Lemma 12 If, for each [e565], [e566] is an integer such that [e567], then

[equation image] (17)
[equation image] (18)

uniformly for [e570] as [e571].

Proof. First observe that, for all sufficiently large [e572] and [e573], it holds that

[equation image]

Note that [e575] for all [e576]. As a result, we may bound the exponential factor on the right-hand side above as follows:

[equation image]

Since [e578] and [e579], uniformly for all [e580] as [e581], (17) follows.

To show (18), first note the combinatorial identity

[equation image] (19)

Proceeding as in the proof of (17), we see that the term associated with the index [e583] in the above summation satisfies

[equation image]

for all sufficiently large [e585] and [e586]. Since [e587] and [e588], these inequalities together with (17) and (19) establish (18).

Lemma 13 Define [e589], where [e590] is a bounded [e591]-symmetric function, and let

[equation image]

be the U-statistic of [e593] associated with [e594] draws from urn-[e595] and [e596] draws from urn-[e597]; in particular, [e598]. Furthermore, assume that

  (i) [e599],
  (ii) there is a function [e600] such that [e601],
  (iii) [e602]; in particular, [e603].

Under the above assumptions, it follows that

[equation image]

Proof. Define [e605]; in particular, [e606] and [e607], [e608] and [e609], for any [e610], as [e611]. The proof of the lemma thus reduces to showing that

[equation image] (20)
[equation image] (21)

Next, we compute the variance of [e614] following an approach similar to that of Hoeffding [14]. Because [e615] is [e616]-symmetric, a tedious yet standard calculation shows that

[equation image] (22)

where

[equation image] (23)
[equation image] (24)

Clearly, [e620]. On the other hand, if [e621] is any random variable with finite expectation and [e622] are sigma-fields, then [e623], by well-known properties of conditional expectations [22]. In particular, for each [e624], we have

[equation image] (25)

Consequently, (22) implies that

[equation image] (26)

We claim that

[equation image] (27)

Indeed, using an argument similar to the one above, we find that

[equation image]
[equation image]
[equation image]

By assumptions (i)–(ii) and the Bounded Convergence Theorem, the right-hand side above tends to [e631], and the claim follows.

It follows from (26) and (27) that

[equation image]

Finally, because of assumption (iii),

[equation image]

Since each term on the right-hand side above tends to zero as [e634], (20) follows.

We now show (21). As [e635] and [e636], it follows from (22) and Lemma 12 that

[equation image]

uniformly for [e638] as [e639]. In particular,

[equation image]

By the definition of the coefficients [e641], the right-hand side above tends to zero, and (21) follows.

Proof of Theorem 3

Note that [e642], with [e643]. We show that the kernel function [e644] and the U-statistics [e645] satisfy the hypotheses of Lemma 13. The theorem then follows immediately, because [e646]-convergence implies convergence in probability.

Clearly, [e647] is [e648]-symmetric and [e649], which establishes assumption (i) of Lemma 13. On the other hand, by the Law of Large Numbers, [e650] almost surely, from which assumption (ii) of the lemma also follows.

Finally, to show assumption (iii), recall that [e651] is the set of one-to-one functions from [e652] into [e653]; in particular, [e654]. Now note that for each indicator of the form [e655], with [e656], there are [e657] choices of [e658] outside the set [e659]. Because [e660], it follows that [e661] for all [e662]. This establishes condition (iii) of Lemma 13, and Theorem 3 follows.
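The counting step behind assumption (iii) can be illustrated directly. This Python sketch uses our own hypothetical labels: urn-A supplies the n draws that fill the k kernel slots, urn-B supplies the single draw, and the kernel is assumed to be 1 when the urn-B color is absent from the k urn-A draws. `u_brute` averages the kernel over all one-to-one maps from k positions into the n urn-A draws (ordered selections, mirroring the set of one-to-one functions above); `u_counts` uses the fact that the maps avoiding color y are counted by a falling factorial, so their proportion reduces to C(n - N_A(y), k) / C(n, k), where N_A(y) is the number of urn-A draws of color y.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def kernel(xs, y):
    """Assumed dissimilarity kernel: 1 if color y is absent from the draws xs, else 0."""
    return 0 if y in xs else 1


def u_brute(sample_a, sample_b, k):
    """Average the kernel over all one-to-one maps {1..k} -> {1..n} and all urn-B draws."""
    n = len(sample_a)
    values = [kernel([sample_a[i] for i in idx], y)
              for idx in permutations(range(n), k)
              for y in sample_b]
    return Fraction(sum(values), len(values))


def u_counts(sample_a, sample_b, k):
    """Same average by counting: the maps avoiding color y number the falling factorial
    (n - N_A(y)) (n - N_A(y) - 1) ... (n - N_A(y) - k + 1), out of n (n - 1) ... (n - k + 1),
    and this ratio equals C(n - N_A(y), k) / C(n, k)."""
    n = len(sample_a)
    return sum(Fraction(comb(n - sample_a.count(y), k), comb(n, k))
               for y in sample_b) / len(sample_b)


if __name__ == "__main__":
    a = ["r", "g", "r", "b", "g"]
    b = ["r", "b", "y", "g"]
    for k in range(len(a) + 1):
        print(k, u_brute(a, b, k), u_counts(a, b, k))
```

Because the assumed kernel is symmetric in its urn-A arguments, averaging over ordered or unordered selections gives the same value; the counting form avoids the factorial-size enumeration entirely.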

Proof of equation (7)

The jackknife estimate of the variance of [e663] obtained by removing a single [e664]-data point is, by definition, the quantity

[equation image] (28)

Note that removing from the [e666]-data a draw whose color would otherwise add to [e667] decrements this quantity by one. Let [e668] denote the U-statistic associated with the data when observation [e670] from urn-[e671] is removed from the sample. Since each draw from urn-[e672] contributes to exactly one [e673], [e674] holds for all [e675] except for some [e676], where [e677]. Therefore,

[equation image]
[equation image]
[equation image]

Since there are [e681] draws from urn-[e682] that contribute to [e683], the above sum may now be rewritten in the form given in (7).
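Deleting one urn-A draw changes only the count of that draw's color, which is why the jackknife sum can be regrouped by counts. The Python sketch below is hypothetical in its details: it assumes the estimator is the average, over urn-B draws y, of C(n - N_A(y), k) / C(n, k), where N_A(y) is the number of urn-A draws of color y (labels and names are ours), and it uses the textbook delete-one jackknife, (n - 1)/n times the sum of squared deviations of the leave-one-out values from their mean, because the exact normalization in (28) is not recoverable here. It compares brute-force recomputation with one recomputation per color and requires k <= n - 1.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def u_from_counts(count_a, n, sample_b, k):
    """Assumed estimator from urn-A color counts: mean of C(n - N_A(y), k) / C(n, k)."""
    return sum(Fraction(comb(n - count_a[y], k), comb(n, k)) for y in sample_b) / len(sample_b)


def jackknife_a_brute(sample_a, sample_b, k):
    """Delete-one jackknife over urn-A draws, recomputing U after every deletion."""
    n = len(sample_a)
    loo = [u_from_counts(Counter(sample_a[:i] + sample_a[i + 1:]), n - 1, sample_b, k)
           for i in range(n)]
    mean = sum(loo) / n
    return Fraction(n - 1, n) * sum((u - mean) ** 2 for u in loo)


def jackknife_a_by_color(sample_a, sample_b, k):
    """Same value with one recomputation per color: deleting any draw of color c
    only decrements N_A(c), so all N_A(c) such deletions give the same U."""
    n = len(sample_a)
    count_a = Counter(sample_a)
    loo = {}
    for c in count_a:
        reduced = count_a.copy()
        reduced[c] -= 1
        loo[c] = u_from_counts(reduced, n - 1, sample_b, k)
    mean = sum(count_a[c] * loo[c] for c in count_a) / n
    return Fraction(n - 1, n) * sum(count_a[c] * (loo[c] - mean) ** 2 for c in count_a)


if __name__ == "__main__":
    a = ["r", "g", "r", "b", "g", "r"]
    b = ["r", "b", "y", "g", "g"]
    for k in range(len(a)):
        print(k, jackknife_a_brute(a, b, k), jackknife_a_by_color(a, b, k))
```

The grouped version costs one U evaluation per distinct color rather than one per draw, which is the practical benefit of rewriting the jackknife in terms of counts.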

Proof of equation (9)

Similarly, [e684] corresponds to the jackknife summed over each possible deletion of a single [e685]-data point, which is given more precisely by

[equation image] (29)

where [e687] is the set of one-to-one functions from [e688] into [e689].

Recall that [e690] is the number of colors seen [e691] times in draws from urn-[e692] and [e693] times in draws from urn-[e694], so that [e695].

Fix [e696] and suppose that [e697] is of a color that contributes to [e698] for some [e699]. Removing [e700] from the data decrements [e701] and increments [e702] by one. Proceeding as in the case of [e703], if [e704] denotes the U-statistic when observation [e706] is removed from sample-[e707], then

[equation image]
[equation image]
[equation image]

where [e711] and [e712] are as defined in (10) and (11). Noting that for each [e713] there are [e714] draws from urn-[e715] that contribute to [e716], the form in (9) follows.
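Deleting an urn-B draw simply removes one term from the average, and that term depends only on how many times the deleted color appears among the urn-A draws; this is why a table of the number of colors seen i times in urn-A draws and j times in urn-B draws suffices. The Python sketch assumes the estimator is the average, over urn-B draws y, of C(n - N_A(y), k) / C(n, k), where N_A(y) is the number of urn-A draws of color y, and uses the textbook delete-one jackknife normalization (m - 1)/m; urn labels and function names are ours. It needs m >= 2 and k <= n.

```python
from collections import Counter
from fractions import Fraction
from math import comb


def u_stat(sample_a, sample_b, k):
    """Assumed estimator: mean over urn-B draws y of C(n - N_A(y), k) / C(n, k)."""
    n, count_a = len(sample_a), Counter(sample_a)
    return sum(Fraction(comb(n - count_a[y], k), comb(n, k)) for y in sample_b) / len(sample_b)


def count_table(sample_a, sample_b):
    """table[(i, j)] = number of colors seen i times in urn-A draws and j times in urn-B draws."""
    ca, cb = Counter(sample_a), Counter(sample_b)
    return Counter((ca[c], cb[c]) for c in set(ca) | set(cb))


def jackknife_b_brute(sample_a, sample_b, k):
    """Delete-one jackknife over urn-B draws, recomputing U after every deletion."""
    m = len(sample_b)
    loo = [u_stat(sample_a, sample_b[:j] + sample_b[j + 1:], k) for j in range(m)]
    mean = sum(loo) / m
    return Fraction(m - 1, m) * sum((u - mean) ** 2 for u in loo)


def jackknife_b_from_table(table, n, m, k):
    """Same value from the count table alone: deleting an urn-B draw whose color
    appears i times among the urn-A draws removes the term C(n - i, k) / C(n, k)."""
    weight = Counter()  # weight[i]: number of urn-B draws whose color appears i times in urn-A
    for (i, j), n_colors in table.items():
        weight[i] += j * n_colors
    term = {i: Fraction(comb(n - i, k), comb(n, k)) for i in weight}
    total = sum(weight[i] * term[i] for i in weight)
    loo = {i: (total - term[i]) / (m - 1) for i in weight}
    mean = sum(weight[i] * loo[i] for i in weight) / m
    return Fraction(m - 1, m) * sum(weight[i] * (loo[i] - mean) ** 2 for i in weight)


if __name__ == "__main__":
    a = ["r", "g", "r", "b", "g", "r"]
    b = ["r", "b", "y", "g", "g", "y", "r"]
    table = count_table(a, b)
    for k in range(len(a) + 1):
        print(k, jackknife_b_brute(a, b, k), jackknife_b_from_table(table, len(a), len(b), k))
```

Once the count table is built, the grouped form touches each occupied count class once, independently of the raw sample sizes.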

In what follows, we specialize the coefficients in (23) and (24) to the kernel function of dissimilarity, [e717]. From now on, for each [e718] and [e719], define

[equation image] (30)
[equation image] (31)

Above, it is understood that the sigma-field generated by [e722] when [e723] is [e724]; in particular, [e725] for all [e726].

The following asymptotic properties of [e727] are used in the remaining proofs.

Lemma 14 Assume that conditions (a)–(c) are satisfied and define [e728]. Then [e729] and [e730]. Furthermore,

[equation image] (32)
[equation image] (33)
[equation image] (34)

Proof. Observe that conditions (a)–(b) imply that [e734]. In addition, condition (b) implies that [e735], whereas condition (c) implies that [e736].

Next, consider the set

[equation image]

i.e., [e738] is the set of rarest colors in urn-[e739] that are also in urn-[e740]. Also note that

[equation image] (35)

As an intermediate step before showing (32), we prove that

[equation image] (36)

To this end, first observe that

[equation image]

Hence

[equation image]
[equation image]
[equation image]

from which (36) follows easily.

To show (32), note that (36) implies

[equation image]
[equation image]
[equation image]
[equation image]

which establishes (32).

Now note that

[equation image]
[equation image]
[equation image]

which establishes (33).

Next we show (34), which gives more precise information than (27). Consider the random variable [e754], defined as the smallest [e755] such that [e756]. We may bound the probability that [e757] is large by [e758], where [e759] is finite because of condition (a). On the other hand, note that

[equation image]

Define [eq. e761] and observe that, over the event [eq. e762], [eq. e763]. Since [eq. e764], we obtain that

equation image
equation image
equation image

The identity in equation (34) is now a direct consequence of (35).

Our next goal is to prove Theorems 4 and 5. To do so, we rely on the method of projection used by Grams and Serfling [17], which approximates [eq. e768] by the random variable

equation image

Among all statistics that are sums of functions of the individual data points, the projection is the best approximation to [eq. e770] in mean squared error.
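Below is a minimal Python sketch of such a projection for a two-sample U-statistic. It assumes the kernel is the indicator that one urn-A draw is absent from n urn-B draws. That is our reading of the dissimilarity described in the abstract; the form given in (16) may be written differently. With m_A draws from urn A and m_B ≥ n draws from urn B, the projection is θ + (1/m_A) Σ_i (h10(X_i) − θ) + (n/m_B) Σ_j (h01(Y_j) − θ). Here h10 and h01 are the kernel's conditional means given a single coordinate. The names h10, h01, m_a and m_b, and the toy urns, are hypothetical. Because the two sums are independent, the variance of the projection is Var h10(X)/m_A + n² Var h01(Y)/m_B.

```python
# Hypothetical toy urns: color -> probability.
P_A = {"red": 0.5, "green": 0.3, "blue": 0.2}
P_B = {"red": 0.6, "green": 0.3, "blue": 0.1}


def projection_parts(p_a, p_b, n):
    """theta and the first-order kernel means h10, h01 for the assumed indicator kernel."""
    miss = {x: 1 - p_b.get(x, 0.0) for x in p_a}  # P(a single urn-B draw misses color x)
    theta = sum(pa * miss[x] ** n for x, pa in p_a.items())
    # h10(x): expected kernel value when the urn-A draw is fixed to x.
    h10 = {x: miss[x] ** n for x in p_a}
    # h01(y): expected kernel value when one of the n urn-B draws is fixed to y.
    h01 = {
        y: sum(pa * miss[x] ** (n - 1) for x, pa in p_a.items() if x != y)
        for y in p_b
    }
    return theta, h10, h01


def projection_variance(p_a, p_b, n, m_a, m_b):
    """Variance of the projection: two independent sums of i.i.d. terms."""
    theta, h10, h01 = projection_parts(p_a, p_b, n)
    var10 = sum(pa * (h10[x] - theta) ** 2 for x, pa in p_a.items())
    var01 = sum(pb * (h01[y] - theta) ** 2 for y, pb in p_b.items())
    return var10 / m_a + n**2 * var01 / m_b


theta, h10, h01 = projection_parts(P_A, P_B, n=5)
print(theta, projection_variance(P_A, P_B, n=5, m_a=50, m_b=50))
```

Both h10 and h01 average to θ under their respective urns, which is what makes every summand of the projection centered.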

Under the stated conditions, [eq. e771] is the sum of two independent sums of non-degenerate i.i.d. random variables and therefore satisfies the hypotheses of the classical central limit theorem. The variance of the projection is also easier to analyze and estimate than that of the U-statistic itself, which matters when establishing consistency of the jackknife variance estimator.

Let

equation image

be the remainder of [eq. e774] that is not accounted for by its projection. When [eq. e775] is small relative to [eq. e776], [eq. e777] is, in relative terms, mostly explained by [eq. e778].
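The split into projection and remainder can be checked by simulation. The Python sketch below assumes the kernel is the indicator that one urn-A draw is absent from n urn-B draws, and it uses hypothetical toy urns. It evaluates the U-statistic in closed form: an urn-A color seen c times among m_B urn-B draws is missed by C(m_B − c, n) of the C(m_B, n) subsets of size n. It then builds the projection from the exact kernel means. Finally, it reports the sample correlation between projection and remainder, together with the sample variances.

```python
import math
import random
from collections import Counter

# Hypothetical toy urns: color -> probability.
P_A = {"red": 0.5, "green": 0.3, "blue": 0.2}
P_B = {"red": 0.6, "green": 0.3, "blue": 0.1}


def u_statistic(xs, ys, n):
    """Average of 1{x not in S} over urn-A draws x and n-subsets S of the urn-B draws."""
    counts = Counter(ys)
    subsets = math.comb(len(ys), n)
    return sum(math.comb(len(ys) - counts[x], n) for x in xs) / (len(xs) * subsets)


def projection(xs, ys, n, p_a, p_b):
    """Projection built from the exact kernel means of the assumed kernel."""
    miss = {x: 1 - p_b.get(x, 0.0) for x in p_a}
    theta = sum(pa * miss[x] ** n for x, pa in p_a.items())
    h01 = {y: sum(pa * miss[x] ** (n - 1) for x, pa in p_a.items() if x != y) for y in p_b}
    return (theta
            + sum(miss[x] ** n - theta for x in xs) / len(xs)
            + n * sum(h01[y] - theta for y in ys) / len(ys))


def covariance(a, b):
    """Unbiased sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def simulate(reps=2000, m_a=50, m_b=50, n=5, seed=1):
    """Draw independent samples and record U, its projection, and the remainder."""
    rng = random.Random(seed)
    us, projs = [], []
    for _ in range(reps):
        xs = rng.choices(list(P_A), weights=list(P_A.values()), k=m_a)
        ys = rng.choices(list(P_B), weights=list(P_B.values()), k=m_b)
        us.append(u_statistic(xs, ys, n))
        projs.append(projection(xs, ys, n, P_A, P_B))
    rems = [u - p for u, p in zip(us, projs)]
    return us, projs, rems


us, projs, rems = simulate()
corr = covariance(projs, rems) / math.sqrt(covariance(projs, projs) * covariance(rems, rems))
print("corr(projection, remainder):", round(corr, 3))
print("Var(U), Var(proj), Var(rem):", covariance(us, us), covariance(projs, projs), covariance(rems, rems))
```

The population correlation between projection and remainder is zero, so the sample value should be near zero, and Var(U) should be close to the sum of the other two variances.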

The next lemma summarizes the asymptotic properties of [eq. e779], particularly in relation to the scale of [eq. e780] as measured by its variance.

Lemma 15 We have that

equation image
(37)
equation image
(38)

Under assumptions (a)–(c), for a fixed [eq. e783], we have that

equation image
(39)

Furthermore, under assumptions (a)–(d) we have that

equation image
(40)
equation image
(41)

for all [eq. e787].

Proof. A direct calculation from the form given in (16) gives that

equation image
(42)
equation image
equation image
(43)
equation image

As [eq. e792], (37) follows.

To show (38), first observe that

equation image
(44)

Next, using the definition of the projection, we obtain that

equation image
equation image
equation image
equation image

from which (38) follows, due to the identity in (44). Note that the last identity implies that [eq. e798] and [eq. e799] are uncorrelated.

Before continuing, we note that (41) is a direct consequence of (38), (40) and Chebyshev's inequality [22]. To complete the proof of the lemma, it therefore remains to show (39) under conditions (a)–(c) and (40) under conditions (a)–(d). For (39), if [eq. e800] and we let [eq. e801], then due to the identities in (22) and (37) and Lemma 12, we obtain under (a)–(c) that

equation image

uniformly for all [eq. e803], as [eq. e804]. Since [eq. e805] for all [eq. e806], we have thus shown (39). Furthermore, note that [eq. e807] for all [eq. e808]; in particular, due to (33) and conditions (a)–(d), we can assert that [eq. e809]. Since [eq. e810], this identity together with (37) lets us conclude that

equation image

as [eq. e812]. Because of condition (d), the big-O term above tends to zero. As a result:

equation image
(45)

On the other hand, (38) implies that [eq. e815]. Hence, using (19) and (25) to bound the variance of the U-statistic from above, we obtain:

equation image

as [eq. e817], where for the last identity we have used (32) and (34). Since [eq. e818], it follows from the above identity that

equation image

In particular, if the base [eq. e820] of the logarithm is chosen so that [eq. e821], then

equation image
(46)

The identities (45) and (46) together show (40), which completes the proof of the lemma.
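For reference, the Chebyshev step can be written out in generic notation. U is the U-statistic, Û its projection, R = U − Û the remainder, and ε > 0 is arbitrary; these symbols are ours and may differ from the paper's. The sketch assumes E[R] = 0, since the U-statistic and its projection share the same mean. It also uses the fact, noted in the proof of (38), that Û and R are uncorrelated:

```latex
\operatorname{Var}(U) = \operatorname{Var}(\hat U) + \operatorname{Var}(R),
\qquad
\Pr\!\left( \frac{|R|}{\sqrt{\operatorname{Var}(U)}} > \varepsilon \right)
\le \frac{\operatorname{E}[R^2]}{\varepsilon^2 \operatorname{Var}(U)}
= \frac{1}{\varepsilon^2}\left( 1 - \frac{\operatorname{Var}(\hat U)}{\operatorname{Var}(U)} \right).
```

Thus, whenever Var(Û)/Var(U) tends to 1 uniformly over the relevant range, the probability on the left tends to zero uniformly over the same range. This is how a statement about a variance ratio becomes an in-probability statement about the remainder.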

Proof of Theorem 5

For a fixed [eq. e823], note that [eq. e824] is the sum of two independent sums of non-degenerate i.i.d. random variables and thus,

equation image

is asymptotically a standard normal random variable as [eq. e826], by the classical central limit theorem. We would like to show, however, that this convergence still holds when [eq. e827] is allowed to vary with [eq. e828] and [eq. e829]. We do so using the Berry-Esseen inequality [23]; to this end, we define the random variables

equation image
equation image

Note that [eq. e832], and that

equation image

We need to show that

equation image
(47)

uniformly for [eq. e835], as [eq. e836].

Note that from (42) and (43),

equation image
equation image

Let

equation image
equation image

It follows from (37) that

equation image

But note that [eq. e842]. Since, according to Lemma 14, [eq. e843] decreases exponentially fast, we obtain

equation image

uniformly for all [eq. e845], as [eq. e846]. On the other hand, [eq. e847]. Furthermore, (33) implies that [eq. e848]. Since [eq. e849], for some finite constant [eq. e850] we find that

equation image

which shows (47).
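The Python sketch below shows the Berry-Esseen bound at work. It computes C0 · Σ E|ξ|³ / (Σ Var ξ)^{3/2} for the projection, viewed as a sum of m_A terms (h10(X_i) − θ)/m_A and m_B terms n(h01(Y_j) − θ)/m_B. It assumes the kernel is the indicator that one urn-A draw is absent from n urn-B draws, and it uses hypothetical toy urns. The constant C0 = 0.56 is taken as the bound for independent, non-identically distributed summands. We believe this matches [23], but the value should be treated as an assumption. The sketch covers one fixed n; the proof needs the bound to vanish uniformly as n varies.

```python
# Hypothetical toy urns: color -> probability.
P_A = {"red": 0.5, "green": 0.3, "blue": 0.2}
P_B = {"red": 0.6, "green": 0.3, "blue": 0.1}
C0 = 0.56  # assumed Berry-Esseen constant for non-identically distributed summands


def berry_esseen_bound(p_a, p_b, n, m_a, m_b):
    """C0 * (sum of third absolute moments) / (sum of variances)^(3/2) for the projection."""
    miss = {x: 1 - p_b.get(x, 0.0) for x in p_a}
    theta = sum(pa * miss[x] ** n for x, pa in p_a.items())
    h01 = {y: sum(pa * miss[x] ** (n - 1) for x, pa in p_a.items() if x != y) for y in p_b}
    # Centered, scaled summands of the projection as (value, probability) pairs.
    terms_a = [((miss[x] ** n - theta) / m_a, pa) for x, pa in p_a.items()]
    terms_b = [(n * (h01[y] - theta) / m_b, pb) for y, pb in p_b.items()]
    var_sum = (m_a * sum(p * v**2 for v, p in terms_a)
               + m_b * sum(p * v**2 for v, p in terms_b))
    abs3_sum = (m_a * sum(p * abs(v) ** 3 for v, p in terms_a)
                + m_b * sum(p * abs(v) ** 3 for v, p in terms_b))
    return C0 * abs3_sum / var_sum**1.5


for m in (50, 200, 800):
    print(m, berry_esseen_bound(P_A, P_B, n=5, m_a=m, m_b=m))
```

With m_A = m_B = m, the sum of variances scales like 1/m and the sum of third moments like 1/m². The bound therefore decays like m^{-1/2}.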

The above establishes convergence in distribution of [eq. e852] to a standard normal random variable, uniformly for [eq. e853], as [eq. e854]. The rest of the proof adapts the proof of Slutsky's theorem [21]. Indeed, note that

equation image
(48)

From this identity, it follows for any fixed [eq. e856] that

equation image

The first term on the right-hand side of the above inequality can be made as close to [eq. e858] as desired, uniformly for [eq. e859], as [eq. e860], because of (40). On the other hand, the second term tends to zero uniformly for [eq. e862] because of (41). Letting the fixed tolerance above tend to zero shows that

equation image

Similarly, using (48), we have:

equation image

and the same argument as before now shows that

equation image

which completes the proof of the theorem.
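The adaptation of Slutsky's theorem rests on a two-sided sandwich. In generic notation (ours, not necessarily the paper's), write σ² = Var(U), let θ be the common mean of U and its projection Û, and let R = U − Û. Since U − θ = (Û − θ) + R, the following holds for every real x and every fixed ε > 0:

```latex
\Pr\!\left(\frac{\hat U-\theta}{\sigma}\le x-\varepsilon\right)-\Pr\!\left(\frac{|R|}{\sigma}>\varepsilon\right)
\;\le\;
\Pr\!\left(\frac{U-\theta}{\sigma}\le x\right)
\;\le\;
\Pr\!\left(\frac{\hat U-\theta}{\sigma}\le x+\varepsilon\right)+\Pr\!\left(\frac{|R|}{\sigma}>\varepsilon\right).
```

The outer probabilities involving Û approach Φ(x ∓ ε) once the standardized projection is asymptotically normal and Var(Û)/σ² tends to 1. The remainder probabilities vanish. Letting ε tend to 0 then uses only the continuity of the standard normal distribution function Φ.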

Finally, we prove Theorem 4, for which we first establish the following result.

Lemma 16 Let [eq. e866] be the set of one-to-one functions from [eq. e867] into [eq. e868]. Consider the kernel [eq. e869], and define

equation image
(49)
equation image
(50)
equation image
(51)
equation image
(52)

Then, for each [eq. e874] and [eq. e875],

equation image
(53)
equation image
(54)

Proof. Fix [eq. e878]. We first use a result of Sen [24] to show that, for each [eq. e879]:

equation image
(55)
equation image
(56)
equation image
(57)
equation image
(58)

almost surely. Indeed, assume without loss of generality that [eq. e884]. As the kernel functions in (49) and (51) are bounded, the hypotheses of Theorem 1 in [24] are satisfied, from which (55) and (57) are immediate. Similarly, because [eq. e885] and [eq. e886] are discrete random variables, (56) and (58) also follow from [24].

Define

equation image
(59)
equation image
(60)

and observe that

equation image
equation image

Furthermore, due to (55)–(58), we have that

equation image
(61)
equation image
(62)

But note that, for [eq. e893], [eq. e894] and [eq. e895] are independent and hence uncorrelated. Similarly, the random variables [eq. e896] and [eq. e897] are independent. Since [eq. e898] and [eq. e899], it follows from (61), (62) and the Bounded Convergence Theorem [22] that

equation image
(63)
equation image
(64)

as [eq. e902].

Finally, by (30) and (31) it follows that

equation image
equation image

In particular, again by the Bounded Convergence Theorem, we have that [eq. e905] and [eq. e906]. Since

equation image
equation image

the lemma is now a direct consequence of (63) and (64), and Theorem 1.5.4 of Durrett [22].

Proof of Theorem 4

Fix [eq. e909]. Using (16), we have that

equation image
equation image
equation image
equation image

It follows by (28) and (29) that

equation image
(65)
equation image
(66)

where [eq. e916] and [eq. e917] are as in (59) and (60), respectively. Furthermore, observe that

equation image

In particular, due to (37), we obtain that

equation image

By Lemma 16, [eq. e920] converges in probability to [eq. e921], while similarly [eq. e922] converges in probability to [eq. e923]; in particular, the first two terms on the right-hand side of the inequality converge to zero in probability. Since [eq. e925] and [eq. e926], the same can be said about the last two terms of the inequality. Consequently, [eq. e927] converges to [eq. e928] in probability, as [eq. e929]. Finally, as stated in (39), conditions (a)–(c) imply that [eq. e930] and [eq. e931] are asymptotically equivalent as [eq. e932], from which the theorem follows.
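The abstract states that the estimator admits a consistent jackknife estimator of its variance, and Lemma 16 and Theorem 4 appear to address this. The Python sketch below uses one common two-sample delete-one form: within each sample, it sums (k − 1)/k Σ (T₍₋ᵢ₎ − T̄)² over leave-one-out recomputations. This form, the hypothetical toy urns and the indicator kernel (one urn-A draw absent from n urn-B draws) are all assumptions made for illustration. The estimator in the paper may be normalized differently. For a plain sample mean the formula reduces to s²/k, which serves as a sanity check.

```python
import math
import random
from collections import Counter

# Hypothetical toy urns: color -> probability.
P_A = {"red": 0.5, "green": 0.3, "blue": 0.2}
P_B = {"red": 0.6, "green": 0.3, "blue": 0.1}


def u_statistic(xs, ys, n):
    """Average of 1{x not in S} over urn-A draws x and n-subsets S of the urn-B draws."""
    counts = Counter(ys)
    return sum(math.comb(len(ys) - counts[x], n) for x in xs) / (len(xs) * math.comb(len(ys), n))


def jackknife_variance(stat, xs, ys):
    """Two-sample delete-one jackknife: leave out one draw at a time from each sample."""
    def spread(values):
        mean = sum(values) / len(values)
        return (len(values) - 1) / len(values) * sum((v - mean) ** 2 for v in values)

    loo_a = [stat(xs[:i] + xs[i + 1:], ys) for i in range(len(xs))]
    loo_b = [stat(xs, ys[:j] + ys[j + 1:]) for j in range(len(ys))]
    return spread(loo_a) + spread(loo_b)


rng = random.Random(7)
n = 5
xs = rng.choices(list(P_A), weights=list(P_A.values()), k=200)
ys = rng.choices(list(P_B), weights=list(P_B.values()), k=200)
estimate = u_statistic(xs, ys, n)
v_jack = jackknife_variance(lambda a, b: u_statistic(a, b, n), xs, ys)
print(estimate, v_jack, math.sqrt(v_jack))
```

Each leave-one-out recomputation reuses the closed-form count formula, so the whole estimate costs on the order of (m_A + m_B)² elementary operations.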

Supporting Information

File S1

Summary metadata related to Table 1 (tab-delimited text file).

(TXT)

File S2

OTU table related to Table 2 and Figures 1, 2, 3, and 4 (tab-delimited text file).

(TXT)

Acknowledgments

We thank Rob Knight for insightful discussions and comments about this manuscript, and Antonio Gonzalez for providing processed OTU tables from the Human Microbiome Project.

Funding Statement

This work was partially supported by the National Science Foundation grant DMS #0805950, the National Institutes of Health (HG4872), and the Crohn's and Colitis Foundation of America. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Lladser ME, Gouet R, Reeder J (2011) Extrapolation of urn models via poissonization: Accurate measurements of the microbial unknown. PLoS ONE 6: e21105.
2. Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40: 237–264.
3. Esty WW (1983) A Normal limit law for a nonparametric estimator of the coverage of a sample. Ann Stat 11: 905–912.
4. Mao CX (2004) Predicting the conditional probability of finding a new class. J Am Stat Assoc 99: 1108–1118.
5. Lijoi A, Mena RH, Prünster I (2007) Bayesian nonparametric estimation of the probability of discovering new species. Biometrika 94: 769–786.
6. Robbins HE (1968) Estimating the total probability of the unobserved outcomes of an experiment. Ann Math Statist 39: 256–257.
7. Starr N (1979) Linear estimation of the probability of discovering a new species. Ann Stat 7: 644–652.
8. Halmos PR (1946) The theory of unbiased estimation. Ann Math Statist 17: 34–43.
9. Clayton MK, Frees EW (1987) Nonparametric estimation of the probability of discovering a new species. J Am Stat Assoc 82: 305–311.
10. Lozupone C, Knight R (2005) UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71: 8228–8235.
11. Lozupone C, Lladser ME, Knights D, Stombaugh J, Knight R (2011) UniFrac: An effective distance metric for microbial community comparison. ISME J 5: 169–172.
12. Tuerk C, Gold L (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249: 505–510.
13. Huttenhower C, Gevers D, Sathirapongsasuti JF, Segata N, Earl AM, et al. (2012) Structure, function and diversity of the healthy human microbiome. Nature 486: 207–214.
14. Hoeffding W (1948) A class of statistics with asymptotically normal distribution. Ann Math Statist 19: 293–325.
15. Efron B, Stein C (1981) The jackknife estimate of variance. Ann Stat 9: 586–596.
16. Shao J, Wu C (1989) A general theory for jackknife variance estimation. Ann Stat 17: 1176–1197.
17. Grams WF, Serfling RJ (1973) Convergence rate for U-statistics and related statistics. Ann Stat 1: 153–160.
18. Hampton JD (2012) Dissimilarity and Optimal Sampling in Urn Ensemble Model. Ph.D. thesis, University of Colorado, Boulder, Colorado.
19. Ahmad AA (1980) On the Berry-Esseen theorem for random U-statistics. Ann Stat 8: 1395–1398.
20. Callaert H, Janssen P (1978) The Berry-Esseen theorem for U-statistics. Ann Stat 6: 417–421.
21. Slutsky E (1925) Über stochastische Asymptoten und Grenzwerte. Metron 5: 3–89.
22. Durrett R (2010) Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. URL http://books.google.com/books?id=evbGTPhuvSoC.
23. Shevtsova IG (2010) An improvement of convergence rate estimates in the Lyapunov theorem. Doklady Mathematics 82: 862–864.
24. Sen PK (1977) Almost sure convergence of generalized U-statistics. Ann Probab 5: 287–290.

Articles from PLoS ONE are provided here courtesy of Public Library of Science