(a) Multiple-site similarity

All similarity indices represent variations over three parameters: species composition in each of two sites and the species shared between the two sites (Novotny & Weiblen 2005). The widely used Sørensen similarity index (Magurran 2004) measures similarity in species composition for two sites, A and B, by the equation

$$C_{S} = \frac{2ab}{a + b}, \tag{2.1}$$

where *a* is the number of species found in site A; *b* is the number of species in site B and *ab* is the number of species shared by the two sites.

For studies where more than two sites are evaluated, the overall similarity is calculated as the average of the pairwise similarities. As an illustration of the shortcomings of such an approach, we can look at two hypothetical cases. Let case 1 have three sites with four species in each: [(*s*_{1}*,s*_{2}*,s*_{3}*,s*_{4}), (*s*_{1}*,s*_{2}*,s*_{5}*,s*_{6}), (*s*_{1}*,s*_{2}*,s*_{7}*,s*_{8})], where *s*_{i} is species number *i*. The similarity is the same for all pairs of sites, *C*_{S}=4/8=1/2, with average similarity also equal to 1/2. Case 2 also has three sites with four species in each, but with a different distribution: [(*s*_{1}*,s*_{2}*,s*_{3}*,s*_{4}), (*s*_{1}*,s*_{2}*,s*_{5}*,s*_{6}), (*s*_{3}*,s*_{4}*,s*_{5}*,s*_{6})]. The similarity is still *C*_{S}=1/2 for all pairs, so the Sørensen similarity index does not ‘see’ the difference in species composition between the two cases. Using traditional similarity measures on assemblages with more than two sites, we will never do more than compare two sites at a time and thereby ignore ‘higher order similarities’.
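The two hypothetical cases can be checked with a few lines of Python. This is only a minimal sketch: species are coded as integers, and the helper names `sorensen` and `mean_pairwise` are illustrative choices, not part of the original study.

```python
from fractions import Fraction
from itertools import combinations


def sorensen(site_a, site_b):
    """Pairwise Sørensen similarity 2ab / (a + b) for two species sets."""
    shared = len(site_a & site_b)
    return Fraction(2 * shared, len(site_a) + len(site_b))


def mean_pairwise(sites):
    """Average of the Sørensen similarity over all pairs of sites."""
    values = [sorensen(x, y) for x, y in combinations(sites, 2)]
    return sum(values, Fraction(0)) / len(values)


# Species s_1 ... s_8 are represented by the integers 1 ... 8.
case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

print(mean_pairwise(case1), mean_pairwise(case2))  # prints: 1/2 1/2
```

Both cases return the same average, although case 1 has two species common to all three sites and case 2 has none.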

We will now suggest a multiple-site similarity measure and start with the situation where we have three sites in a study. We follow the notation from equation (2.1), with *a*, *b* and *c* the numbers of species found in sites A, B and C, respectively, and *ab* the number of species shared by sites A and B, etc., until *abc*, which is the number of species found in all three sites. Extending the approach of the Sørensen similarity index, a foundation for a three-site similarity measure can be

$$\frac{ab + ac + bc - abc}{a + b + c}.$$

The numerator counts, for each species, the number of sites in which it occurs beyond the first; and the denominator gives the sum of species counts over all the sites. This expression will equal 2/3 if all species are shared by all sites, since a species can contribute at most two times in the numerator and three times in the denominator. The three-site similarity measure should therefore be

$$C_{S}^{3} = \frac{3}{2}\,\frac{ab + ac + bc - abc}{a + b + c}$$

in order to be in the range 0–1, with 1 indicating complete similarity. The general multiple-site similarity measure for *T* sites can be formulated in the same manner

$$C_{S}^{T} = \frac{T}{T-1}\,\frac{\sum_{i<j} a_{ij} - \sum_{i<j<k} a_{ijk} + \sum_{i<j<k<l} a_{ijkl} - \cdots}{\sum_{i} a_{i}},$$

where *a*_{i} is the number of species in site A_{i}, *i*=1, …, *T*; *a*_{ij} is the number of species shared by sites A_{i} and A_{j}; and *a*_{ijk} is the number of species shared by sites A_{i}, A_{j} and A_{k}, etc. With *T*=2, we are back at the definition of the Sørensen similarity index (equation (2.1)). The total number of species in the *T* sites can, by the inclusion–exclusion principle, be written as

$$S_{T} = \sum_{i} a_{i} - \sum_{i<j} a_{ij} + \sum_{i<j<k} a_{ijk} - \cdots,$$

simplifying the notation of our multiple-site similarity measure to

$$C_{S}^{T} = \frac{T}{T-1}\left(1 - \frac{S_{T}}{\sum_{i} a_{i}}\right).$$

For the two hypothetical cases discussed earlier, we get *C*_{S}^{3}=(3/2)(1−8/12)=1/2 for case 1 and *C*_{S}^{3}=(3/2)(1−6/12)=3/4 for case 2. Our multiple-site similarity measure evaluates the sites in case 2 as more similar than the sites in case 1, which is in agreement with the assumption that evenness in the number of site observations for the species should be valued more, i.e. the similarity measure increases with a more even distribution of site observations. For case 2, we also obtain a lower total number of species (*γ*-diversity), indicating a lower species turnover, hence a higher similarity.
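As a hedged illustration, the multiple-site measure can be computed both term by term (pairwise overlaps minus three-way overlaps, and so on) and from the total species count via inclusion–exclusion. The function names and the integer coding of species below are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import combinations


def multi_site_term_by_term(sites):
    """T/(T-1) * (sum a_ij - sum a_ijk + ...) / sum a_i."""
    T = len(sites)
    numerator = 0
    for k in range(2, T + 1):
        sign = 1 if k % 2 == 0 else -1  # + for pairs, - for triples, ...
        for group in combinations(sites, k):
            numerator += sign * len(set.intersection(*group))
    total_counts = sum(len(s) for s in sites)
    return Fraction(T, T - 1) * Fraction(numerator, total_counts)


def multi_site_simplified(sites):
    """T/(T-1) * (1 - S_T / sum a_i), with S_T the total species count."""
    T = len(sites)
    s_total = len(set.union(*sites))
    total_counts = sum(len(s) for s in sites)
    return Fraction(T, T - 1) * (1 - Fraction(s_total, total_counts))


case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

for case in (case1, case2):
    # The two formulations should agree for any collection of sites.
    print(multi_site_term_by_term(case), multi_site_simplified(case))
```

The simplified form needs only the union of the species sets, which avoids enumerating every subset of sites when *T* is large.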

Both cases 1 and 2 have covariance 0 between pairwise similarities, since all similarities are equal to 1/2. With *T*=3, all pairs of similarities must necessarily be dependent, since any two of them share one site. The effect of covariance between pairwise similarities on average similarity will depend on the sign and magnitude of the covariance, as well as the proportion of independent pairwise similarities (Ødegaard *et al*. 2005). To illustrate one possible effect of covariance, let case 3 also have three sites with four species in each: [(*s*_{1}*,s*_{2}*,s*_{3}*,s*_{4}), (*s*_{1}*,s*_{2}*,s*_{3}*,s*_{4}), (*s*_{4}*,s*_{5}*,s*_{6}*,s*_{7})]. Here, the covariance between pairwise similarities is negative. The average similarity is still 1/2, but now *C*_{S}^{3}=(3/2)(1−7/12)=5/8.
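Case 3 can be checked the same way. The sketch below, with illustrative helper names, lists the three unequal pairwise similarities next to their average and the multiple-site value; it does not attempt to compute the covariance itself.

```python
from fractions import Fraction
from itertools import combinations


def sorensen(x, y):
    """Pairwise Sørensen similarity 2ab / (a + b)."""
    return Fraction(2 * len(x & y), len(x) + len(y))


def multi_site(sites):
    """Multiple-site similarity T/(T-1) * (1 - S_T / sum a_i)."""
    T = len(sites)
    s_total = len(set.union(*sites))
    counts = sum(len(s) for s in sites)
    return Fraction(T, T - 1) * (1 - Fraction(s_total, counts))


case3 = [{1, 2, 3, 4}, {1, 2, 3, 4}, {4, 5, 6, 7}]

pairwise = [sorensen(x, y) for x, y in combinations(case3, 2)]
average = sum(pairwise, Fraction(0)) / len(pairwise)

print([str(p) for p in pairwise])   # the three pairwise similarities
print(average, multi_site(case3))   # average versus multiple-site value
```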

(b) Multiple-site similarity versus β-diversity and host specificity

*β*-diversity is essentially also a measure of how similar sites are in terms of the variety of species found in them. A high similarity indicates that there are few species differences between sites, yielding low *β*-diversity values. One of the most straightforward measures of *β*-diversity is Whittaker's (1972) measure, *β*_{W}=*S*_{T}/*ā*, where *S*_{T} is the total number of species; and *ā* is the average species richness for the *T* sites. The link between Sørensen's similarity measure for two sites and *β*-diversity measures is well known (Koleff *et al*. 2003). The relation between our multiple-site similarity and Whittaker's *β*_{W} is simply

$$C_{S}^{T} = \frac{T}{T-1}\left(1 - \frac{\beta_{W}}{T}\right) = \frac{T - \beta_{W}}{T-1}.$$

If all sites contain the same species, both *C*_{S}^{T} and *β*_{W} will equal 1. If no sites share species, *C*_{S}^{T}=0 and *β*_{W}=*T*, indicating that the total number of species *S*_{T} is just the product *Tā*.
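The relation to Whittaker's measure, including the two limiting situations, can be illustrated as follows. This is a sketch under the assumption that *β*_{W} is the total species count divided by the mean site richness; the function names and example sites are hypothetical.

```python
from fractions import Fraction


def whittaker_beta(sites):
    """beta_W = S_T / mean richness, for a list of species sets."""
    s_total = len(set.union(*sites))
    mean_richness = Fraction(sum(len(s) for s in sites), len(sites))
    return s_total / mean_richness


def similarity_from_beta(beta_w, T):
    """Multiple-site similarity written as (T - beta_W) / (T - 1)."""
    return (T - beta_w) / Fraction(T - 1)


identical = [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}]   # all sites hold the same species
disjoint = [{1, 2}, {3, 4}, {5, 6}]             # no species shared
case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

for sites in (identical, disjoint, case1, case2):
    beta = whittaker_beta(sites)
    print(beta, similarity_from_beta(beta, len(sites)))
```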

If, instead of species-sites data, we are studying host observations of, for example, phytophagous insect species on host plant species, the comparison of species compositions on different host plants can be performed by both similarity and host-specificity measures. The host specificity calculated from trophic interactions is defined as *F*_{T}=*S*_{T}/(*Tā*) (Ødegaard *et al*. 2000; Novotny *et al*. 2002), where *S*_{T} is the total number of insect species found on *T* host plant species; *ā* is the average number of insect species associated with each host plant species; and *T* is the number of host plant species in the study. The product *Tā* is thereby the total number of host observations. Host specificity views all host plant species simultaneously and can be considered a ‘multiple host dissimilarity measure’. The link between our multiple-site similarity measure and host specificity is

$$C_{S}^{T} = \frac{T}{T-1}\left(1 - F_{T}\right).$$

Note also that *F*_{T}=*β*_{W}/*T*. If all species are shared by all hosts, the host specificity is 1/*T* and the multiple-site similarity equals 1. With no species overlap, host specificity equals 1 and similarity becomes 0. If we regard our first two hypothetical cases as host observations of insect species on three different host species, we get host specificities 2/3 and 1/2, respectively. Case 1 has more monophagous species; therefore, it should also have higher host specificity.
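Reading the two cases as insect species recorded on three host plants, the host specificities and the number of monophagous species can be tallied as below. The sketch assumes host specificity is the total species count divided by the total number of host observations; `host_specificity` and `monophagous_count` are illustrative names.

```python
from collections import Counter
from fractions import Fraction


def host_specificity(hosts):
    """F_T = S_T / (total host observations), hosts as sets of insect species."""
    s_total = len(set.union(*hosts))
    observations = sum(len(h) for h in hosts)  # equals T times mean richness
    return Fraction(s_total, observations)


def similarity_from_specificity(f_t, T):
    """Multiple-site similarity written as T/(T-1) * (1 - F_T)."""
    return Fraction(T, T - 1) * (1 - f_t)


def monophagous_count(hosts):
    """Number of insect species recorded on exactly one host."""
    hosts_per_species = Counter(sp for h in hosts for sp in h)
    return sum(1 for n in hosts_per_species.values() if n == 1)


case1 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {1, 2, 7, 8}]
case2 = [{1, 2, 3, 4}, {1, 2, 5, 6}, {3, 4, 5, 6}]

for hosts in (case1, case2):
    f_t = host_specificity(hosts)
    print(f_t, similarity_from_specificity(f_t, len(hosts)), monophagous_count(hosts))
```

In this coding, case 1 has six species restricted to a single host and case 2 has none, which matches the ordering of the two host specificities.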