3.1 Dataset and SNP calling methods
We consider an Illumina Omni 1M dataset that consists of 3258 samples, and 141 out of 3258 samples were from 38 distinct HapMap samples with some individuals genotyped multiple times. The overall performance is evaluated by comparing the SNP calls for these HapMap samples by different methods to those available from the International HapMap Project database (The International HapMap Consortium, 2007
). In this article, we focus on the 942 313 SNPs in the whole genome with the exception of the sex chromosome. The X chromosome SNPs (24 717) are analyzed alone. The null genotypes in the HapMap project are ignored. The performance of our proposed method, M3
, and other existing calling algorithms is evaluated based on this Illumina dataset.
It has been demonstrated that the call rate and call accuracy of both GenoSNP and CRLMM are better than those of Illuminus when a small number of samples are collected (Ritchie et al., 2011
). As for CRLMM, its implementation depends on the reference population and its calculation strongly hinges on computer configuration (Ritchie et al., 2011
; Zhang et al., 2010
). Based on these considerations, we compare M3
with GenCall as a representative of the population-based method and GenoSNP as the SNP-based method. For the Illumina GenCall approach, genotypes with good GenCall scores (GC score ≥ 0.15) are used as the inferred genotypes. As discussed earlier, GenoSNP is built on a SNP-based mixture model without a reference population to genotype every individual in turn, and this model may be good at genotyping rare variants. It calculates the posterior probability of each sample at a specific SNP, and we use a cut-off value of 85% to select SNPs and samples with good quality. Samples and SNPs having poor clustering properties (low posterior probability) are treated as missing data. As discussed in the model section, the two-stage M3
approach aims to take the advantage of GenCall and GenoSNP. We use the PR of 0.85 as a cut-off to filter samples at a particular SNP in our analysis, and a union set of SNPs with MAF < 0.05 and APR < 0.9 are selected to be re-called in the second stage of our proposed method.
We first evaluate the SNP calling results by evaluating the call rate of each method and the concordance among them. The call rate is defined as the ratio of genotypes passing the calling threshold to the total number of genotypes that need to be inferred. The concordance rate between two algorithms refers to the percentage of agreement of inferred genotypes between two algorithms. The relevant results are summarized in . In brief, there is high consistency among these three methods (M3
, GenCall and GenoSNP) overall. But, there are some discrepancies among these three algorithms. We note that the major homozygote calls by GenCall are more frequently called heterozygote by M3
more likely genotypes null components by GenCall and GenoSNP (Supplementary Materials
). This is partially due to the fact that M3
gives the largest call rate (99.64%), followed by GenoSNP (99.22%) and GenCall (98.16%) ().
The comparisons of call rate and concordance rate among GenCall, GenoSNP and M3
Since the true genotypes are unknown, the above concordance comparisons do not reveal which method performs better. In the following, we focus on the genotype calls of 141 out of 3258 samples from 38 distinct HapMap samples using the genotypes of these individuals obtained from the HapMap project as a gold standard in our comparisons. One issue in using the SNP calls from the HapMap data is differentiating the major allele between two alleles at an SNP. In our comparisons, we count the discrepancy in homozygote calls, denoted by E (Error), between each of the three algorithms and HapMap data, and vary the cut-off level at 1, 10 and 50 to remove SNPs with different major allele assignments between each of the three algorithms and HapMap data. With a more stringent threshold, e.g. E < 1, fewer SNPs with inconsistent major allele assignment are compared. The results are summarized in . We note that M3 has the best average call accuracy and average call rate by three criterions (GenCall, GenoSNP and M3) under all three E cut-offs, followed by GenoSNP and GenCall. For example, when the threshold level for E is set at 1, the highest call rate and the best SNP calling accuracy are achieved by M3 (99.77%, 99.23%), followed by GenoSNP (99.21%, 98.74%) and GenCall (98.03%, 97.87%). With a less stringent cutoff, fewer accurate genotypes by M3 are obtained with average call accuracy 99.20% at E < 10 and 99.11% at E < 50. This is likely due to more consistent major allele assignments with a more stringent cut-off. A similar trend also occurs for GenoSNP and GenCall when the E cut-off becomes increasingly stringent.
The comparisons of call rates and concordance on HapMap samples for overall SNPs
When more SNPs are added to the genotyping arrays, a larger proportion of the newly identified SNPs will have lower MAFs than those already on the arrays. It has been demonstrated that GenoSNP performs better than other algorithms in calling rare variants (Giannoulatou et al., 2008
; Ritchie et al., 2011
). Since M3
tries to take advantage of both the population-based strategy and the SNP-based approach, we expect it to perform well for rare SNPs. summarizes the comparison results for SNPs with MAF < 0.1, 0.05 and 0.01. We use HapMap data from four populations to estimate the MAF for each SNP, because the 38 HapMap samples in our Illumina data come from JPT (116), CHB (139), YRI (209) and CEU (174). We calculate the MAF of each SNP by the weighted value of the MAF of each population. The weight of each population is determined by its population proportion in the 38 HapMap samples. In general, M3
yields the best SNP calling accuracy and highest call rate under three E
cut-offs for SNPs with MAF < 0.1, 0.05 and 0.01. For SNPs with very small MAF (<0.01), M3
(98.47 ~98.77%) provides large average call accuracy by three criterions, followed by GenoSNP (97.80% ~98.10%), GenCall (96.11% ~96.43%). Overall, when considering both common and rare variants, M3
yields the best SNP calling among the three methods.
Comparisons of call rates and concordance on HapMap samples for rare variants
The HWE test (P<0.0001) is a commonly used criterion to examine the quality of genotype calling of each SNP. We compare different SNP calling algorithms based on the proportion of SNPs failing the HWE test. As mentioned above, our Illumina dataset includes 3258 individuals, consisting most of African-Americans (AA) and European-Americans (EA), who are either Hispanic or non-Hispanic. We perform the HWE test on these four populations separately. The results are summarized in . Since GenoSNP is a SNP-based calling algorithm that does not consider HWE in SNP calling, a large number of SNPs (50860, 14432, 63170 and 20342 corresponding to the four populations, respectively) failed the HWE test. In contrast, for GenCall, the population-based strategy, fewer SNPs (14447, 1450, 32209 and 2801 for the four populations, respectively) failed the HWE test. Since M3 is largely a population-based approach, it performs relatively well on the HWE test with quite fewer SNPs (27288, 5155, 44123 and 7631 for the four populations, respectively) failing. Therefore, M3 performs well at both calling rare variants and generating calls that are more likely to pass the HWE test.
Comparisons of HWE test among GenCall, GenoSNP and M3
The higher call rate and more accurate genotype calls of M3 may help to increase statistical power to detect disease-associated variants, especially those with low MAFs. To quantify the power gain, we compare the average power to detect disease-associated variants based on the three calling algorithms using the HapMap sample data [ASW (87) and CEU (174)]. We consider a case–control scenario for AA and EA separately and assume a similar genotyping characteristic for the collected data. We assume a disease prevalence of 0.05, and the relative risk λ varies from 1.5 to 4 under the multiplicative model. The allele frequencies are based on empirical data from the HapMap project. We consider the statistical significance level by Bonferroni correction (5×10−8), and fix the number of cases and controls for both populations (AA: case:control=1250:758; EA: case:control=749:239). The power to detect the risk alleles based on the correct calls for all individuals and all SNPs is measured as a standard. For the three calling algorithms, we assume that the observed allele frequencies at each SNP and the number of cases or controls are similar to those observed from the HapMap samples collected in our study. The power comparisons for the overall SNPs and rare SNPs are summarized in and , respectively. In brief, compared with GenCall and GenoSNP, M3 has the largest power for both common variants and rare variants in both populations. When the relative risk increases, we observe a larger improvement of M3 versus GenCall or GenoSNP. The ratio of M3 versus GenCall or M3 versus GenoSNP measures this improvement, and it has been found that the power of M3 increases 1.46% and 1.03% compared with GenCall for AA and EA, respectively. Similarly, the increase in power of M3 versus GenoSNP is ~ 0.12% and 0.16% for both populations. In particular, the improvement of M3 versus GenCall or GenoSNP for rare variants is more noticeable than that for common variants. (The increase in power of M3 versus GenCall: 1.84% and 1.12% for both populations; the increase in power of M3 versus GenoSNP: 0.43% and 0.34% for both populations.) In general, the power achieved by M3 is closer to the ideal power when the genotypes of all study subjects are correctly inferred.
Fig. 3. The average power to detect risk alleles for overall SNPs by HapMap, GenCall, GenoSNP and M3. Red solid line: the ideal power measured by the HapMap project; blue dash line: the average power measured by the GenCall; purple dot line: the average power (more ...)
The average power to detect risk alleles for less common SNPs by HapMap data, GenCall, GenoSNP and M3. Less common SNPs: the SNPs with MAF < 0.05; cutoff: 5×10−8.
We also evaluate the performance of three algorithms on the X chromosome SNPs. The average call accuracy is compared with the HapMap project calls under three E
(Error) cutoffs, E
< 50, 10 or 1. summarizes the overall concordance result on the X chromosome among three algorithms. Again, M3
provides the best call rate and accuracy on the sex chromosome SNPs, compared with GenCall and GenoSNP. It has been demonstrated that the model incorporating the gender information will perform better than methods (GenCall and GenoSNP) ignoring this gender information (Ritchie et al., 2011
). We further exam the gender-dependent model in M3
, denoted as Mdep3
, on the X chromosome. Compared with M3
without incorporating the gender information, a higher call accuracy is achieved by the model Mdep3
involving the gender information (). The higher call accuracy on the male subjects results in the great improvement of calls. We suggest that the X chromosome SNPs should be called separately using a gender-dependent model (Mdep3
) in practice.
The comparisons of call rates and concordance on HapMap samples for X chromosome SNPs