Mathematically, a ranked list of *N* candidates is a permutation of the numbers *{1, 2, 3,…, N}*, and comparing rankings from different algorithms corresponds to measuring the degree of agreement between a pair of permutations. We adopt Kendall's tau and Spearman's footrule coefficients (Kendall, 1938, 1948), two well-known approaches for comparing ranked lists. Their formal definitions are given in Appendix B, which also shows that these two metrics have the following desirable features:

- When the lists are identical or nearly so, the coefficients are close to zero.
- When the lists have few elements in common, or are disjoint, the coefficients take higher values.
- When the elements are ordered differently, the coefficients take higher values.
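For two complete rankings of the same items, the classical measures can be sketched in a few lines of Python. This is a minimal illustration, not the formal definitions of Appendix B: we assume the distance forms (number of discordant pairs, sum of rank displacements), each scaled by its maximum, which is reached when one list is the reverse of the other, so that identical lists score 0 and reversed lists score 1. The function names are hypothetical.

```python
from itertools import combinations


def positions(ranking):
    """Map each item to its 1-based rank in the list."""
    return {item: r for r, item in enumerate(ranking, start=1)}


def kendall_distance(a, b):
    """Fraction of item pairs that the two full rankings order differently."""
    pa, pb = positions(a), positions(b)
    n = len(a)
    discordant = sum(
        1
        for i, j in combinations(a, 2)
        if (pa[i] - pa[j]) * (pb[i] - pb[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)


def footrule_distance(a, b):
    """Sum of absolute rank displacements, scaled by its maximum floor(n^2 / 2)."""
    pa, pb = positions(a), positions(b)
    n = len(a)
    total = sum(abs(pa[x] - pb[x]) for x in a)
    return total / (n * n // 2)


# Usage: swapping the top two of five items.
print(kendall_distance([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))   # 0.1
print(footrule_distance([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))  # 2 / 12, about 0.1667
```

Both functions assume the two lists contain exactly the same items; the top-k case, where the lists may differ in membership, needs the modified measures discussed next.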

Intuitively, this is similar to measuring the distance between two lists based on the number of common members and their relative order in the respective lists.^{2}

Several variations of these two formulae have been proposed in the literature, and we follow the general approach detailed in Fagin et al. (2003). These methods have been used in information retrieval applications, such as comparing the results returned by web search engines and comparing data lists that are modified frequently; details can be found in Dwork et al. (2001) and Fagin et al. (2003). Our intent here is to adapt Kendall's and Spearman's coefficients to genome-wide studies.

The classical Kendall's tau and Spearman's footrule coefficients are defined over two permutations of the same list. In the context of GWA studies on high-density genotyping microarrays, this amounts to processing lists containing >500,000 SNPs. However, GWA studies are often designed to identify the “top k” best SNPs. The literature (Pearson et al., 2007) and our prior experience in this field indicate that typical measurement errors in the estimation of allelic frequencies are around 1–2% and that allelic frequency differences are negligible for SNPs beyond the top few thousand. Thus, the reliability of the rankings diminishes as we move down the list. Since only the “top k” candidates are retained, we employ a more practical approach that compares two ranked lists based on their “top k” elements. In other words, instead of comparing complete lists, we compare partial lists, which is a more appropriate and more general approach.^{3}

To that end, we follow the *modified* Kendall's tau and Spearman's footrule proposed in Fagin et al. (2003). Given two partial ranked lists, these measures take into account the number of elements common to both lists, the elements that appear in only one of the lists, and the order in which the elements appear. For completeness, we also include in our analysis the intersection coefficient, which is simply the number of SNPs common to both lists divided by the list size *k* (see Appendix B). We thus have three methods for comparing lists. In general, we recommend Kendall's and Spearman's coefficients over the intersection coefficient, because the latter does not take ordering into account. We leave the comparison between Kendall's and Spearman's methods as an open question outside the scope of this study.
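The three top-k measures can be sketched as follows. The code reflects our reading of Fagin et al. (2003): K^(p) assigns a penalty to each pair of items in the union of the two lists according to four cases, with an assumed penalty p = 0.5 when the relative order of a pair cannot be inferred, and the footrule places items absent from a list at location ℓ = k + 1. Scaling each distance by its value for two disjoint lists, so that disjoint lists score 1, is our assumption and may differ from the normalization in Appendix B; all function names are hypothetical.

```python
from itertools import combinations


def _ranks(top):
    """Map each item of a top-k list to its 1-based rank."""
    return {item: r for r, item in enumerate(top, start=1)}


def kendall_topk(a, b, p=0.5):
    """K^(p) distance between two top-k lists of equal length k, scaled to [0, 1]."""
    k = len(a)
    ra, rb = _ranks(a), _ranks(b)
    penalty = 0.0
    for i, j in combinations(list(ra.keys() | rb.keys()), 2):
        in_a, in_b = (i in ra, j in ra), (i in rb, j in rb)
        if all(in_a) and all(in_b):
            # Both items in both lists: penalize opposite relative order.
            penalty += int((ra[i] - ra[j]) * (rb[i] - rb[j]) < 0)
        elif all(in_a) and any(in_b):
            # Both in a, only one in b: that one is implicitly ahead in b.
            present, absent = (i, j) if in_b[0] else (j, i)
            penalty += int(ra[present] > ra[absent])
        elif all(in_b) and any(in_a):
            present, absent = (i, j) if in_a[0] else (j, i)
            penalty += int(rb[present] > rb[absent])
        elif all(in_a) or all(in_b):
            # Both in one list, neither in the other: order unknown, penalty p.
            penalty += p
        else:
            # One item only in a, the other only in b.
            penalty += 1
    return penalty / (k * k + p * k * (k - 1))  # value for disjoint lists


def footrule_topk(a, b):
    """Footrule with absent items placed at location k + 1, scaled to [0, 1]."""
    k = len(a)
    ra, rb = _ranks(a), _ranks(b)
    loc = k + 1
    total = sum(abs(ra.get(x, loc) - rb.get(x, loc)) for x in ra.keys() | rb.keys())
    return total / (k * (k + 1))  # value for disjoint lists


def intersection_coefficient(a, b):
    """Share of the k SNPs common to both lists (1 = same members, 0 = disjoint)."""
    return len(set(a) & set(b)) / len(a)
```

Note the opposite orientation: the two distances are near 0 for similar lists, whereas the intersection coefficient is near 1. With ℓ = k + 1, the triangle inequality bounds every footrule term by the two displacements to location k + 1, so k(k + 1) is also the largest value the footrule can take.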

3.1. Comparing a method with the true results

For six representative algorithms available in the GenePool software for prioritizing SNPs (described in Appendix A), the Kendall's, Spearman's, and intersection coefficients with respect to the true rankings are plotted for the “top k” SNPs. The true ranks were computed using the test of two proportions on the known genotyping data from HapMap.
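A minimal vectorized sketch of such a ranking is shown below. It assumes that, for each SNP, the count of one allele and the total allele count are known in each of the two groups being compared (for example, cases and controls); it uses the pooled two-proportion z statistic and ranks SNPs by decreasing |z|, which for a two-sided test is equivalent to ranking by increasing p-value. The input layout and function names are hypothetical and are not GenePool's interface.

```python
import numpy as np


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic, vectorized over SNPs.

    x1, x2: counts of the tested allele in group 1 and group 2.
    n1, n2: total allele counts (2 x number of individuals for diploid genotypes).
    """
    x1, n1, x2, n2 = (np.asarray(v, dtype=float) for v in (x1, n1, x2, n2))
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = np.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    # Monomorphic SNPs (pooled frequency 0 or 1) have se == 0; report z = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(se > 0, (p1 - p2) / se, 0.0)


def rank_snps(snp_ids, z):
    """Return SNP ids ordered from the largest to the smallest |z|."""
    order = np.argsort(-np.abs(z), kind="stable")
    return [snp_ids[i] for i in order]


# Usage with three toy SNPs and 200 alleles per group.
snps = ["rs1", "rs2", "rs3"]
z = two_proportion_z(x1=[50, 80, 60], n1=[200] * 3, x2=[50, 40, 70], n2=[200] * 3)
print(rank_snps(snps, z))  # ['rs2', 'rs3', 'rs1']
```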

We vary *k* from 1000 to 25,000 in steps of 1000; for each plot, the Y coordinate shows the value of the coefficient for the corresponding “top k” SNPs on the X axis. We restrict the analysis to the top 25,000 SNPs because this represents the top 5% of the SNPs, a typical fraction selected in GWA studies. (In fact, due to practical limitations and cost, many researchers can examine only the top few hundred SNPs.) As explained in Appendix B, Kendall's and Spearman's coefficients near one indicate that the lists are highly different, while values near zero indicate that the lists are similar. In these plots, the Kendall's and Spearman's coefficients decrease monotonically with increasing *k*. Increasing similarity with higher values of *k* implies that there are more common SNPs and that the relative ordering of the SNPs in the two lists agrees to a higher degree.
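Each curve amounts to evaluating a coefficient on successively longer prefixes of the two ranked lists. A minimal sketch, assuming the full ranked lists are available as sequences of SNP identifiers and the coefficient is any function of two top-k lists; the helper name and toy data are hypothetical.

```python
def coefficient_curve(list_a, list_b, coefficient, k_values=range(1000, 25001, 1000)):
    """Return (k, coefficient(top-k of list_a, top-k of list_b)) for each k."""
    return [(k, coefficient(list_a[:k], list_b[:k])) for k in k_values]


# Usage with the intersection coefficient on two toy rankings of six SNPs.
def intersection(a, b):
    return len(set(a) & set(b)) / len(a)


a = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6"]
b = ["rs2", "rs7", "rs1", "rs5", "rs4", "rs3"]
print(coefficient_curve(a, b, intersection, k_values=[2, 4, 6]))
# [(2, 0.5), (4, 0.5), (6, 0.8333333333333334)]
```

The default `k_values` reproduces the 1000 to 25,000 grid in steps of 1000 used for the plots.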

As an example, consider comparing the methods based on Kendall's coefficient, with the constraint of selecting only the top 5,000 candidate SNPs. Under this scenario, a method whose plot shows a lower Kendall's coefficient at *k* = 5,000 should be preferred over one with a higher coefficient, because it has higher similarity with the true results. For *k* > 20,000, all six plots show similar Kendall's coefficients, indicating that all methods are about equally accurate, in contrast to the case *k* = 5,000, where significant differences exist.

The plots alone do not indicate how statistically significant these results are. To assign statistical significance to the coefficients, we carried out Monte Carlo simulations, because generating the exact distribution from every possible permutation of 500,000 numbers is not practicable. We generated a very large number (>10^{7}) of pairs of random permutations of the set *{1, 2,…, N}*, where *N* ≈ 500,000 is the number of SNPs. For each pair of permutations, the Kendall's, Spearman's, and intersection coefficients were computed for the top *k* SNPs, with *k* from 1000 to 25,000 in steps of 1000, to be consistent with the top *k* SNPs on the X axis. The *p*-values were extremely small (<10^{−5}), practically zero, for *k* values greater than 5,000, which implies a statistically significant degree of agreement between the true ranks and those from the methods used in GenePool. Note that no individual genotyping information was available to GenePool; it analyzes SNPs using pooled DNA only. We therefore conclude that the rankings provided by GenePool closely reflect our prior knowledge based on allelic frequency differences of known genotypes, and that the methods could be applied generally to pooled DNA when no individual genotyping information is available.
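A Monte Carlo null distribution of this kind can be sketched as follows, here for the top-k footrule only. Because these distances do not change when items are relabeled, a pair of independent random permutations yields the same distance distribution as the identity ranking paired with a single random permutation, so the sketch draws one permutation per trial. The empirical *p*-value counts random pairs that are at least as close as the observed pair (distance ≤ observed), with a +1 correction so the estimate is never exactly zero; this estimator, the k(k + 1) scaling, and the function names are our assumptions. Resolving *p* < 10^{−5} requires more than 10^{5} trials, and each trial here costs O(N).

```python
import numpy as np


def topk_footrule_vs_identity(perm, k):
    """Scaled top-k footrule (absent items at location k + 1) between the identity
    ranking 0, 1, ..., N - 1 and perm, where perm[r] is the item at rank r + 1."""
    n, loc = perm.size, k + 1
    rank_a = np.full(n, loc)
    rank_a[:k] = np.arange(1, k + 1)        # identity: item x has rank x + 1
    rank_b = np.full(n, loc)
    rank_b[perm[:k]] = np.arange(1, k + 1)  # top-k items of perm get ranks 1..k
    # Items outside both top-k lists contribute |loc - loc| = 0.
    return np.abs(rank_a - rank_b).sum() / (k * (k + 1))


def monte_carlo_p_value(observed, n_items, k, n_trials, seed=0):
    """Share of random permutation pairs with distance <= observed (+1 corrected)."""
    rng = np.random.default_rng(seed)
    null = np.array([
        topk_footrule_vs_identity(rng.permutation(n_items), k)
        for _ in range(n_trials)
    ])
    return (1 + np.count_nonzero(null <= observed)) / (1 + n_trials)


# Usage on a small scale (the study used N of about 500,000 and >10^7 pairs).
print(monte_carlo_p_value(observed=0.5, n_items=10_000, k=100, n_trials=1_000))
```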

3.2. Comparing two different methods

The procedure for comparing candidate ranked lists from two different methods is similar to the comparison between a true list and a candidate list generated by a single method: the Kendall's, Spearman's, and intersection coefficients are computed analytically, and Monte Carlo simulations are carried out. Pair-wise comparisons of five methods (hence (5 choose 2) = 10 plots in each sub-figure) are shown for each of the coefficients with increasing values of *k*. For small values of *k* (<1000), the intersection coefficient is very small and the other two coefficients are very high, indicating that the methods did not agree much for very small *k*. With increasing *k*, however, the degree of agreement increased significantly. The Monte Carlo simulations showed the results to be statistically significant, with *p*-values almost zero for *k* ≈ 20,000.
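The pair-wise comparison amounts to applying a coefficient to every unordered pair of methods; for five methods, `itertools.combinations` yields the (5 choose 2) = 10 pairs. A minimal sketch, assuming each method's output is a ranked list of SNP identifiers keyed by method name; the dictionary layout, the function name, and the toy rankings are hypothetical, and the coefficient is passed in (for example, one of the top-k distances).

```python
from itertools import combinations


def pairwise_coefficients(ranked_lists, k, coefficient):
    """Apply coefficient to the top-k lists of every unordered pair of methods.

    ranked_lists maps a method name to its ranked list of SNP ids; m methods
    yield m * (m - 1) / 2 pairs, keyed by (method_1, method_2) in sorted order.
    """
    return {
        (m1, m2): coefficient(ranked_lists[m1][:k], ranked_lists[m2][:k])
        for m1, m2 in combinations(sorted(ranked_lists), 2)
    }


# Usage with toy rankings (not real GenePool output) and the intersection coefficient.
toy = {
    "Ttest": ["rs1", "rs2", "rs3", "rs4"],
    "Silho": ["rs2", "rs1", "rs3", "rs5"],
    "Dunn": ["rs1", "rs3", "rs2", "rs6"],
    "CoDir": ["rs7", "rs8", "rs1", "rs9"],
    "MdfdT": ["rs4", "rs6", "rs5", "rs8"],
}
overlap = pairwise_coefficients(toy, k=3, coefficient=lambda a, b: len(set(a) & set(b)) / len(a))
print(len(overlap))  # 10
```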

These plots can be used to draw various inferences. For example, the pair-wise comparisons among three methods (Ttest-Silho, Silho-Dunn, Dunn-Ttest) show that the intersection coefficient is quite high and the Kendall's and Spearman's coefficients are quite low. The very low *p*-values indicate that this is statistically significant and not just a random event. Thus, these three methods generate similar results, which is also apparent from the high similarity between their individual plots. To avoid redundancy in the analysis, just one of these three methods could be employed. In contrast, CoDir and MdfdT generate significantly different lists. Thus, if the same data are to be analyzed using independent methods, CoDir and MdfdT should be included. If the same set of SNPs is observed in the “top k” of these independent methods, we have much more confidence in the results than when the SNPs are selected by similar methods, such as Silhouette, Dunn, and Ttest.
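The consensus check described in the last sentence can be sketched as a set intersection of the top-k lists from methods that disagree with each other, such as CoDir and MdfdT. This is only an illustration, assuming each method's output is a ranked list of SNP identifiers; the function name and toy data are hypothetical.

```python
def consensus_top_k(ranked_lists, k):
    """Return the SNPs that appear in the top-k list of every supplied method."""
    top_sets = [set(ranking[:k]) for ranking in ranked_lists]
    return set.intersection(*top_sets) if top_sets else set()


# Usage with toy rankings from two dissimilar methods.
codir = ["rs7", "rs1", "rs8", "rs2"]
mdfdt = ["rs4", "rs1", "rs6", "rs7"]
print(sorted(consensus_top_k([codir, mdfdt], k=4)))  # ['rs1', 'rs7']
```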