Robustness of genotyping
The robustness of genotyping was assessed using Affy6. DNA samples of three HapMap subjects and three NCTR subjects were genotyped, each with four replicates. Genotype calls were made using Birdseed-v1 in APT (1.10.0), and CNV were determined using the apt-canary program in APT. To assess reproducibility across laboratories, the raw data of the three HapMap subjects from Affymetrix were included in our comparisons. The results are depicted in Figure 1 (data in Supplementary Table 1).
Figure 1 Genotyping robustness based on raw intensity, genotype, and CNV. The average Pearson's correlation coefficients of log2-scaled raw intensity are color coded in red, the average concordance of genotypes in blue, and the average concordance of CNV in green.
The QC scores of the 24 CEL files were in the range of 88.6–98.3% (Supplementary Figure 1), similar to the HapMap data from Affymetrix (88.2–99.1%, Supplementary Figure 5c) and compatible with Affymetrix guidelines, indicating that data were of acceptable quality for the comparative study.
The consistency of log2-scaled intensity data was examined using Pearson's correlation. Each pair-wise comparison is summarized in Supplementary Figure 2. The average correlation (Figure 1) between technical replicates (BTR) for five subjects was 0.9514. One subject (N13) had a noticeably lower average correlation (0.9231), with one of its replicates determined to be an outlier (lower quality). For the HapMap samples, the average correlation between these experiments and Affymetrix data (BEH) was 0.9403, slightly lower than the corresponding BTR value (0.9515). The average correlation between unrelated samples (BNS) was much lower (0.7576). The average correlation between parent and son (BPS) was 0.8456.
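The intensity comparison described above can be illustrated with a minimal Python sketch. It assumes each array is available as a list of positive raw probe intensities in the same probe order; the function names and the example values are hypothetical and are not taken from APT or from the study data.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def log2_intensity_correlation(raw_a, raw_b):
    """Correlate two arrays' probe intensities after log2 scaling."""
    log_a = [math.log2(v) for v in raw_a]
    log_b = [math.log2(v) for v in raw_b]
    return pearson(log_a, log_b)

# Hypothetical probe intensities for two technical replicates.
rep1 = [512.0, 1024.0, 2048.0, 4096.0]
rep2 = [500.0, 1100.0, 1900.0, 4300.0]
print(round(log2_intensity_correlation(rep1, rep2), 4))
```

Averaging this value over all replicate pairs of a subject would give a per-subject figure comparable in spirit to the BTR averages reported here.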
Genotype concordances were calculated for all pair-wise comparisons (Supplementary Figure 2) and averaged for BTR, BEH, BPS, and BNS (Figure 1). The average concordance for BTR was 0.9886, excluding N13 (0.9799), indicating high repeatability. The average concordance for BEH was 0.9883, showing high reproducibility across laboratories. As expected, the average concordance for BNS (0.6177) was low, and for BPS (0.7290) moderate.
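A pair-wise genotype concordance of this kind can be sketched as follows. The text does not state how missing calls were treated, so this sketch assumes that a SNP is counted only when it was called in both sets; the missing-call code "NN" and the function names are hypothetical.

```python
from itertools import combinations

def genotype_concordance(calls_a, calls_b, missing="NN"):
    """Fraction of identical calls among SNPs called in both sets (assumed rule)."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call vectors must cover the same SNPs")
    both = [(a, b) for a, b in zip(calls_a, calls_b)
            if a != missing and b != missing]
    if not both:
        raise ValueError("no SNP was called in both sets")
    return sum(a == b for a, b in both) / len(both)

def mean_pairwise_concordance(replicates):
    """Average concordance over all unordered pairs of replicates."""
    pairs = list(combinations(replicates, 2))
    return sum(genotype_concordance(a, b) for a, b in pairs) / len(pairs)

# Hypothetical calls for three replicates of one subject at four SNPs.
reps = [
    ["AA", "AB", "BB", "AB"],
    ["AA", "AB", "BB", "AA"],
    ["AA", "NN", "BB", "AB"],
]
print(mean_pairwise_concordance(reps))
```

Applying the same function to pairs drawn from different subjects, rather than replicates of one subject, would correspond to the BPS and BNS comparisons.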
Except for one replicate of N13 with a significantly lower heterozygous rate, the call rates and heterozygous rates were very similar for comparisons between replicates and between these experiments and Affymetrix data (Supplementary Figure 3). The lower heterozygous rate for the replicate of N13 is consistent with its lower average intensity correlation and genotype concordance. After removal of this replicate, the average intensity correlation (0.9568) and genotype concordance (0.9899) for N13 were similar to the other subjects.
Technical robustness was also evaluated by calculating CNV concordances for all pair-wise comparisons (Supplementary Figure 4), and averaging them for BTR, BEH, BPS, and BNS (Figure 1). The average concordance for BTR was 0.9804, excluding N13 (0.9414), indicating reasonable CNV repeatability. For the HapMap samples, the average concordance for BEH was 0.9605, similar to the corresponding BTR (0.9784), showing reasonable robustness across laboratories. As expected, the average concordance for BNS (0.8662) was low, and for BPS (0.8978) moderate.
In spite of the apparent overall reproducibility, an outlier was detected only after replicate measurements were completed. The outlier would not otherwise have been detected, as the array met the guidelines for Affymetrix genotyping array quality.
Inconsistencies between SNP arrays
To examine whether genotype calls from different SNP arrays are consistent, genotypes of the SNPs interrogated by both Affy500K and Affy6 were compared using the 270 HapMap samples.46
The QC scores for Affy500K (Supplementary Figures 5a and b) and Affy6 (Supplementary Figure 5c) data met Affymetrix guidelines. Therefore, all CEL files were used.
After quantile normalization, genotypes were called using the same calling algorithm, Birdseed, with the same parameter settings. Thereafter, the 482 215 common SNPs were used for the comparisons (Figure 2a).
Figure 2 Overview of the procedures for evaluating consistency between SNP arrays (a). Both data sets of the 270 HapMap samples from Affy500K and Affy6 were genotype called using the Birdseed algorithm. The 482 251 SNPs interrogated in both arrays were used …
The missing call rates per SNP (Figure 3a) and per sample (Figure 3b) were compared between Affy500K (x axis) and Affy6 (y axis). Many SNPs and samples have inconsistent missing call rates, some of which show large differences between the two arrays. Moreover, the missing call rates from Affy6 are slightly lower than those from Affy500K. The P-values (Supplementary Table 2) of paired two-sample t-tests comparing the missing call rates per SNP and per sample were <0.05, indicating that the difference in missing call rates is statistically significant.
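The quantities compared here can be sketched in Python. The sketch assumes genotype calls are stored as a matrix with one row per SNP and one column per sample, with "NN" as a hypothetical code for a missing call; it computes only the paired t statistic and leaves the P-value to a statistics package. All names and values are illustrative.

```python
import math

def missing_rates(calls, missing="NN"):
    """calls[i][j] is the call for SNP i in sample j.

    Returns (per-SNP missing rates, per-sample missing rates).
    """
    n_snps, n_samples = len(calls), len(calls[0])
    per_snp = [sum(c == missing for c in row) / n_samples for row in calls]
    per_sample = [sum(row[j] == missing for row in calls) / n_snps
                  for j in range(n_samples)]
    return per_snp, per_sample

def paired_t_statistic(x, y):
    """t statistic of a paired two-sample t-test (n - 1 degrees of freedom)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical calls for the same 3 SNPs and 3 samples on two arrays.
calls_500k = [["AA", "NN", "AB"], ["NN", "NN", "BB"], ["AB", "AB", "AA"]]
calls_affy6 = [["AA", "AA", "AB"], ["NN", "BB", "BB"], ["AB", "AB", "AA"]]
snp_500k, sample_500k = missing_rates(calls_500k)
snp_affy6, sample_affy6 = missing_rates(calls_affy6)
print(paired_t_statistic(snp_500k, snp_affy6))
```

The pairing matters: each SNP (or sample) contributes one difference between the two arrays, so the test asks whether the mean of those differences departs from zero.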
Figure 3 Comparison of genotype calls between SNP arrays. The missing call rates per SNP (a) and per sample (b) between arrays Affy500K and Affy6 were plotted. The red diagonal lines indicate the locations of SNPs (a) and samples (b) when their missing call rates …
Three possible genotypes (homozygote: AA; heterozygote: AB; and variant homozygote: BB) are provided for each call. The concordance of each pair of calls between Affy500K and Affy6 was analyzed (Supplementary Table 3). The analysis revealed 267 608 (0.21%) genotype differences between the two arrays. Further comparison regarding the nature of the differences shows that concordance of homozygous calls (AA and BB) was higher than the concordance of heterozygous calls (AB). Moreover, discordant genotypes between heterozygote and homozygote were more prevalent than those between two homozygous types.
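The breakdown of discordant calls by genotype can be illustrated with a small cross-tabulation. This is a sketch under the assumption that only successful calls (AA, AB, or BB on both arrays) enter the table; the function names and example calls are hypothetical.

```python
from collections import Counter

GENOTYPES = ("AA", "AB", "BB")

def cross_tabulate(calls_a, calls_b):
    """Count (call on array A, call on array B) pairs over successful calls."""
    return Counter((a, b) for a, b in zip(calls_a, calls_b)
                   if a in GENOTYPES and b in GENOTYPES)

def discordance_summary(table):
    """Split discordant pairs into het/hom and hom/hom disagreements."""
    total = sum(table.values())
    het_hom = sum(n for (a, b), n in table.items()
                  if a != b and "AB" in (a, b))
    hom_hom = sum(n for (a, b), n in table.items()
                  if a != b and "AB" not in (a, b))
    return {"total": total, "discordant": het_hom + hom_hom,
            "het_vs_hom": het_hom, "hom_vs_hom": hom_hom}

# Hypothetical calls for six SNPs in one sample on the two arrays.
affy500k = ["AA", "AB", "BB", "AB", "AA", "NN"]
affy6 = ["AA", "AA", "BB", "AB", "BB", "AB"]
summary = discordance_summary(cross_tabulate(affy500k, affy6))
print(summary)
```

A heterozygote/homozygote disagreement differs by one allele, whereas a disagreement between the two homozygous types differs by both, which is one reason to report the two kinds separately.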
Inconsistencies between calling algorithms
Genotype concordances were determined between three algorithms (DM, BRLMM, and Birdseed) that were released along with three recent generations of Affymetrix arrays. Affy500K raw data for the 270 HapMap samples were called using the three algorithms. Thereafter, the calls were compared to determine consistency between algorithms.
The missing call rates per SNP (Figure 4a) and per sample (Figure 4b) were compared. Many SNPs and samples had different missing call rates between the three algorithms. Furthermore, the missing call rates of the single-chip-based algorithm DM were higher than those of the multiple-chip-based algorithms BRLMM and Birdseed (caused by the default cutoff used in this study, see Discussion), whereas differences between BRLMM and Birdseed were much smaller. The P-values (Supplementary Table 2) of paired two-sample t-tests comparing missing call rates per SNP and per sample were <0.05, indicating that the algorithms have significantly different missing call rates.
Figure 4 Comparison of genotype calls between calling algorithms. The missing call rates per SNP (a) and per sample (b) between algorithms Birdseed, BRLMM, and DM were plotted. The red diagonal lines indicate the locations of SNPs (a) and samples (b) when their …
The consistencies of successful calls between the three algorithms were calculated as concordances, given in Supplementary Table 3. A total of 538 774 genotypes (0.41%) differed between DM and Birdseed; 200 592 genotypes (0.15%) between DM and BRLMM; and 285 788 genotypes (0.21%) between Birdseed and BRLMM. The concordance of the successful calls between BRLMM and Birdseed was stratified by the three genotypes. The concordance for homozygous calls was higher than for heterozygous calls for both BRLMM and Birdseed. Moreover, discordance between heterozygote and homozygote was higher than between the two homozygous types. Comparisons between DM and Birdseed and between DM and BRLMM showed trends similar to those of the comparison between BRLMM and Birdseed, such as homozygous calls being more concordant than heterozygous calls.
Propagation of array inconsistency to associated SNPs
The objective of a GWAS is to identify genetic markers associated with a phenotype. It is critical to assess how inconsistencies between different SNP arrays propagate to the associated SNPs identified in the downstream association analysis. To mimic case–control GWAS, three association analyses were conducted for genotypes obtained from Affy6 and Affy500K data for the 270 HapMap samples. Each of the three population groups (EU: European; AS: Asian; and AF: African) was set in turn as the cases, whereas the other two groups were set as the controls. Associations were analyzed to identify SNPs that can differentiate cases from controls. The significantly associated SNPs were compared using Venn diagrams.
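The one-group-versus-the-rest design can be sketched for a single SNP as follows. The text does not give the test statistic, so this sketch assumes the common allelic test: a Pearson chi-square with one degree of freedom on the 2x2 table of allele counts, without continuity correction. The function names, group labels used as dictionary keys, and genotype values are hypothetical.

```python
import math

VALID = ("AA", "AB", "BB")

def allele_counts(genotypes):
    """Count A and B alleles in AA/AB/BB calls; other calls are skipped."""
    a = sum(g.count("A") for g in genotypes if g in VALID)
    b = sum(g.count("B") for g in genotypes if g in VALID)
    return a, b

def allelic_chi_square(case_genotypes, control_genotypes):
    """Chi-square statistic and P-value (1 df) for the 2x2 allele table."""
    a1, b1 = allele_counts(case_genotypes)
    a2, b2 = allele_counts(control_genotypes)
    n = a1 + b1 + a2 + b2
    denom = (a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2)
    if denom == 0:
        raise ValueError("a row or column of the allele table is empty")
    chi2 = n * (a1 * b2 - a2 * b1) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p

def one_vs_rest(groups, case_label):
    """Treat one population group as cases and pool the others as controls."""
    cases = groups[case_label]
    controls = [g for label, calls in groups.items()
                if label != case_label for g in calls]
    return allelic_chi_square(cases, controls)

# Hypothetical genotypes at one SNP for the three population groups.
groups = {
    "EU": ["AA", "AA", "AB", "AA"],
    "AS": ["BB", "AB", "BB", "BB"],
    "AF": ["AB", "BB", "AB", "NN"],
}
for label in groups:
    print(label, one_vs_rest(groups, label))
```

Running the same test on calls from two arrays, or from two calling algorithms, and thresholding the P-values yields the SNP lists whose overlap is then compared.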
Significantly associated SNPs obtained from allelic and genotypic tests (on 482 251 common SNPs) were compared between the two arrays (Figure 5). For all case–control frameworks and both allelic and genotypic tests, the inconsistency in genotypes between arrays influenced the downstream association analyses, resulting in different sets of associated SNPs. For example, using allelic testing with Europeans as cases, 4926 SNPs were significant only for Affy500K. It is unclear whether these differences are due to Type I errors using Affy500K or Type II errors using Affy6. Alternatively, the variation in associated SNPs could be due to the exclusion of SNPs during QC steps.
Figure 5 Comparisons of the lists of associated SNPs from Affy500K and Affy6 for assessing propagations of the inconsistency in genotypes between the two arrays to associated SNPs. The significantly associated SNPs identified using allelic association test (a …
For associated SNPs not common to both arrays, observed differences in downstream association analysis were examined to see whether they were due to failing to pass QC or to conflicting results for statistical testing. The results show that most associated SNPs missed with Affy500K were excluded at the QC step. Differences in statistical testing were the major cause for the associated SNPs missed in Affy6.
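The attribution of missed SNPs to QC exclusion or to statistical testing can be expressed with set operations. This is a minimal sketch assuming that, for each array, the list of significant SNPs and the list of SNPs that passed QC are available as collections of SNP identifiers; the function name and the identifiers are hypothetical.

```python
def explain_missed(significant_a, significant_b, passed_qc_b):
    """For SNPs significant on platform A only, say why platform B missed them.

    Returns (SNPs excluded by QC on B, SNPs that passed QC on B but
    were not significant there).
    """
    only_a = set(significant_a) - set(significant_b)
    failed_qc = {snp for snp in only_a if snp not in passed_qc_b}
    failed_test = only_a - failed_qc
    return failed_qc, failed_test

# Hypothetical SNP identifiers.
sig_affy6 = {"rs1", "rs2", "rs3", "rs4"}
sig_500k = {"rs1"}
qc_pass_500k = {"rs1", "rs2", "rs5"}
failed_qc, failed_test = explain_missed(sig_affy6, sig_500k, qc_pass_500k)
print(sorted(failed_qc), sorted(failed_test))
```

Swapping the roles of the two arrays gives the complementary breakdown for SNPs missed by the other platform.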
Propagation of calling algorithm inconsistency to associated SNPs
To assess propagation of inconsistencies in genotypes between calling algorithms to the associated SNPs, associations were analyzed using genotypes obtained from algorithms DM, BRLMM, and Birdseed for Affy500K data for the 270 HapMap samples. The associated SNPs were compared using Venn diagrams (Figure 6) for the allelic and genotypic tests. The inconsistencies in genotypes between the three algorithms propagated into the downstream association analyses. For example, only 1593, 1349, and 1873 SNPs were significantly associated (genotypic test, European as case) using DM, BRLMM, and Birdseed algorithms, respectively. Again, possible Type I or Type II errors as well as QC exclusion differences contribute to the variability in the associated SNPs.
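The region counts of a three-way Venn diagram such as the one used here can be computed directly from the three SNP lists. The sketch below assumes each algorithm's significant SNPs are given as a collection of identifiers; the function name and the identifiers are hypothetical.

```python
def venn3_counts(dm, brlmm, birdseed):
    """Sizes of the seven regions of a three-set Venn diagram."""
    dm, brlmm, birdseed = set(dm), set(brlmm), set(birdseed)
    return {
        "DM only": len(dm - brlmm - birdseed),
        "BRLMM only": len(brlmm - dm - birdseed),
        "Birdseed only": len(birdseed - dm - brlmm),
        "DM and BRLMM only": len((dm & brlmm) - birdseed),
        "DM and Birdseed only": len((dm & birdseed) - brlmm),
        "BRLMM and Birdseed only": len((brlmm & birdseed) - dm),
        "all three": len(dm & brlmm & birdseed),
    }

# Hypothetical significant SNPs from each calling algorithm.
sig_dm = {"rs1", "rs2", "rs3", "rs4"}
sig_brlmm = {"rs2", "rs3", "rs5"}
sig_birdseed = {"rs3", "rs4", "rs6", "rs7"}
counts = venn3_counts(sig_dm, sig_brlmm, sig_birdseed)
print(counts)
```

The seven counts always sum to the size of the union of the three lists, which is a quick consistency check on any such diagram.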
Figure 6 Comparisons of the lists of associated SNPs between calling algorithms DM, BRLMM, and Birdseed for assessing propagations of the inconsistency in genotypes to associated SNPs. The significantly associated SNPs identified using allelic association test …
SNPs found to be significant from only one algorithm were examined for failure in QC and in statistical tests in the other algorithms. Missed SNPs from DM were mainly caused by QC exclusion, whereas missed SNPs from BRLMM and Birdseed were mainly caused by association testing.
SNPs identified as significant from two algorithms, but not the third, were examined in the same way. QC caused more missed SNPs from DM and Birdseed, whereas association testing caused more missed SNPs from BRLMM.