Sample Quality
The missing call rate per sample is an informative indicator of sample quality. The Genotyping Centers generally fail samples with missing call rates >5%. During QA, we look for high outliers in the distribution of missing call rates and for low outliers in the distribution of the mean confidence score (over all non-missing genotypes) for each sample. None were found for the projects considered here.
Three additional quality measures are used in the QA process to detect mixed (contaminated) DNA samples. First, we identify outliers in autosomal heterozygosity within each ethnic group (for example, as points more than 1.5 inter-quartile ranges from the upper/lower quartile value). Second, we screen for samples with a high variance of BAF for non-homozygous SNPs, as described in
Supporting Information. Third, we look for unusual patterns of relatedness, such as samples that appear to be related to many other samples. Samples with one or more of these characteristics may be mixtures of multiple DNA samples, which can be identified by examination of BAF/LRR plots.
Figure S1 in Supporting Information shows examples of normal and low quality samples. The “Relatedness” section below describes an example of mixed sample detection.
Genotyping Batch Quality
All four studies have highly significant batch effects on the logarithm of missing call rate (ANOVA p-values < 10−100). In most cases, despite the high level of significance, the distribution of the mean missing call rate per batch is continuous with no obvious outliers. Therefore no batches were excluded by this criterion, except for one batch in the Addiction study in which only three of 24 samples passed QC at the genotyping center.
Another way to detect batch effects is to assess differences in allelic frequency between each batch and a pool of all other batches in a study, using a homogeneity test (see
Supporting Information). This allelic frequency test can be affected not only by laboratory processing, but also by the biological characteristics of the samples in a batch, such as continental ancestry, other ethnic variation and relatedness. After taking continental ancestry into consideration, no batch outliers were found in any of the four projects, except for the batch with only three passing samples noted above (see
Figure S2 and text in
Supporting Information).
Gender Checks and Sex Chromosome Aneuploidy
Gender identity is usually inferred from X chromosome heterozygosity, but we find that this variable alone gives ambiguous results, because of sex chromosome aneuploidies and genotyping artifact. Therefore, we also use plots of the mean intensities of the X and Y chromosomes, as shown in for the Addiction and T2D NHS projects.
In the Addiction project, the majority of males and females fall into two very distinct clusters based on X and Y chromosome intensities. All samples annotated as males have a Y intensity greater than all samples annotated as females. Therefore, there is no evidence of gender misidentification. However, several samples (delineated by the dashed lines in ) are distinct from the majority of males and females. Two males with DNA samples from blood have an X chromosome intensity typical of females and a Y intensity typical of males. They appear to be XXY. One of these males has high X heterozygosity and might be mistaken for a female if X heterozygosity alone was used for checking gender. Four male samples (3 cell line and one blood) have unusually high intensities of the Y chromosome and may be either XYY or perhaps XY/XYY mosaics. Similarly, two males have a low Y intensity and may be XY/XO mosaics (one cell line, one blood). Several females have low X intensities and low X heterozygosity, indicating that they are XO or, perhaps more likely, XX/XO mosaics, since they are all cell line samples. As expected, many of the putative XX/XO mosaics show allelic imbalance in BAF plots of the X chromosome, which will be discussed in the next section. Data for samples with a sex (or other) chromosome abnormality are posted on dbGaP, but the affected chromosome is flagged.
In the T2D HPFS project, all subjects are male, but several HapMap females were genotyped as controls for the gender identity check. shows one HapMap female (a cell line sample) that appears to be XX/XO. The plot also shows five unusual males with low Y intensity and substantial levels of X heterozygosity, while all other males have zero heterozygosity. This situation is an artifact of the Birdseed method for calling X chromosome genotypes. In this algorithm, samples inferred to be males by annotation and/or Y chromosome intensity are analyzed with a prior assumption of two genotype clusters, while those inferred to be females have a prior assumption of three clusters. Based on their low Y intensities, these five males were mis-assigned as females during automated calling of the X chromosome genotypes. Consequently, the X chromosome SNPs of these five samples are flagged for omission from association analyses.
Chromosomal Aberrations
Previous studies have documented the use of measures of allelic imbalance (BAF) and relative intensity (LRR) for detecting chromosomal aberrations with SNP array data [
Conlin, et al. 2010;
Peiffer, et al. 2006]. Aneuploidy and large (multi-megabase) duplications and deletions have been detected in tumor cells and in lymphoblastoid cell lines [
Simon-Sanchez, et al. 2007]. We also find such aberrations in blood and buccal cell samples. The frequencies and types of aberrations will be reported elsewhere, but some examples are given here from the Lung Cancer project. The upper panel of shows ‘sample 1’, a female with low X chromosome intensity and heterozygosity. Chromosome 8 in this sample has a normal pattern of three BAF bands, but on chromosome X the intermediate band (corresponding to heterozygous SNPs) is split into two, widely separated bands. These characteristics are expected for a mosaic population of disomic and monosomic cells in which monosomic cells predominate. The lower panel shows ‘sample 2’, which has a high intensity of chromosome 8 relative to its other chromosomes. In this case, the separation of the two intermediate bands is smaller. The positions of these bands (at about 0.4 and 0.6) and the high intensity indicate a mosaic population of trisomic and disomic cells. (A purely trisomic cell population would have the intermediate bands at 0.33 and 0.66.) Samples with a chromosome aberration are included in the posting on dbGaP, but we flag the affected chromosome to be filtered out during association analysis, since it is likely to have a high rate of genotyping errors. In addition, we suggest filtering out any sample-chromosome combination with a missing call rate greater than 5%, since such chromosomes may contain undetected aberrations.
Relatedness
We estimate the degree of relatedness between every pair of individuals in a study to identify unexpected relatedness. Three identity-by-descent (IBD) coefficients (Z
0, Z
1 and Z
2), the probabilities of sharing 0, 1 or 2 alleles that are identical by descent, are estimated using a method of moments procedure implemented with PLINK software [
Purcell, et al. 2007] and compared with their expectations and evolutionary variance (see
Supporting Information). Some pedigree errors can be corrected by consulting original records, while others are corrected based on the inferred genetic relationships.
shows a plot of estimates of Z1 versus Z0 for all pairs of Lung Cancer study subjects with a kinship coefficient estimate greater than 0.025. All study subjects were expected to be unrelated, but this is clearly not the case. Based on expected values (+/− 2 SD), we inferred 14 pairs of full sibs and 5 pairs of half sibs. In addition, there is one parent-offspring pair and 36 pairs of duplicates, including a pair of identical twins. Two of the duplicate pairs could not be documented as coming from the same subject and were removed from the data set.
In the T2D NHS project, the relatedness analysis revealed two samples that appeared to be related to nearly every other sample in the study with a kinship coefficient between the expected values of half sibs and first cousins (
Figures S3 –
S6). These samples have relatively high heterozygosity and their BAF plots indicate that they are mixed samples (
Figure S1 d and e). Several other samples that appeared to have relatedness to a large number of other samples are not clearly mixed, but appear to be of low quality since they all have more than five chromosomes flagged for high variance of BAF (e.g.
Figure S1 f). Similar samples were found in the T2D HPFS project and both projects are discussed further in
Supporting Information.
Population Structure
To investigate population structure, we use principal components analysis (PCA), essentially as described by
Patterson et al. [2006]. The choice of which SNPs to use for principal components analysis is not obvious. Using all SNPs on a whole-genome array is computationally demanding, but feasible, and would seem to be the best approach in terms of utilizing all available information about genetic relationships. However, whole-genome arrays contain clusters of highly correlated SNPs and a single cluster may have a very strong influence on certain PCs, as noted previously [
Novembre, et al. 2008;
Tian, et al. 2008]. For example, in the Lung Cancer project (which consists entirely of European-ancestry subjects), when using all autosomal SNPs with missing call rate less than 5% (~545k SNPs), the first two PCs separate U.S. and Italian subjects, while the third PC separates both U.S. and Italian subjects into three distinct groups. These three groups correspond to the genotypes of a cluster of highly correlated SNPs in 8p23, a region that contains a polymorphic inversion. The same result was found previously in PCA of other European-ancestry populations [
Novembre, et al. 2008]. The highly localized features underlying some principal components may limit their usefulness in detecting and controlling for population structure. Moreover, they may even be counterproductive when used as covariates in association testing for traits affected by SNPs in those chromosomal regions. Therefore, when adjusting for potential population structure, we recommend against the use of PCs that are highly correlated with localized SNP clusters.
One approach to avoiding the strong influence of SNP clusters is to prune the full genome-wide SNP set before PCA to obtain a subset of SNPs in which all pairs have low correlations. In the T2D project, we compared two such SNP sets, one reported by
Yu et al. [2008] selected to have pair-wise linkage disequilibrium (LD) of r
2<0.004 and minor allele frequencies (MAF)>0.05 in a European-ancestry population. The other SNP set we selected, from among the 870,000 autosomal SNPs assayed on the NHS subjects, to have LD r
2<0.04 and MAF>0.05. Both sets contained about 12,000 SNPs and the overlap is 445 SNPs. The first two eigenvectors obtained from the two SNP sets are very similar, whereas the third, fourth and fifth have much lower correlations (
Figures S7 and
S8). Similar sensitivity to SNP selection has been observed in other projects. These results suggest that, beyond the first one or two components, eigenvectors ordered by sample eigenvalues may not be robust indicators of population structure. However, we note that this does not rule out the eigensystem as a whole being similar across SNP sets. When subsets of SNPs are used for PCA, we recommend exploring the variability in PCA-based representations of the data under different SNP set selections.
Case-control associations with population structure and experimental factors
To check for association between case-control status and population structure, we test for an association between disease status and the first two eigenvectors from the PCA of each population subgroup of interest. No significant associations have been found so far, which may be a reflection of study designs that carefully match the geographic origins and other characteristics of cases and controls. We also test for a difference in missing call rate per sample between cases and controls, as a way of detecting association with experimental factors. No significant differences have been found so far, except for a special case in the Addiction project described below. In addition to case-control status, we test for correlation between missing call rate and quantitative traits of interest. For example, in the T2D NHS project, the correlation between body mass index and missing call rate is small and not significant (r=−0.01, p-value=0.28).
The Addiction project has three categories of case status: (a) alcohol and possibly other illicit drug dependence (‘case’), (b) controls exposed to alcohol, but never addicted to alcohol or illicit drugs (‘control’) and (c) addicted to illicit drug(s) but not to alcohol (‘other’). There is a significant association between case status and genotyping batch, which could lead to bias in case-control allelic frequencies, although the occurrence of 213 batches with median size of 21 samples reduces the magnitude of any potential problem. Using analysis of variance, there is a significant effect of case status on the logarithm of missing call rate (p<0.01), which is due to the ‘other’ category having a higher rate than the other two (0.12% versus 0.10%,
Figure S9). This effect appears to be due to confounding with the DNA source, which was either blood or cell line. Among the DNA samples in the ‘other’ category, 77% are from cell lines, whereas the values for alcohol cases and controls are 34% and 25%, respectively. The missing call rate for cell lines is very significantly higher than for blood (p<9 × 10
−14), as shown in
Figure S9 (0.12% versus 0.10%). Therefore, allelic frequency differences between ‘other’ versus ‘case’ and ‘control’ categories are potentially biased by nonrandom missingness. In a situation like this, it may be useful to adjust for tissue type in the association analysis, by including it as a regression covariate.
We have found significant effects on the missing call rate of several experimental factors in multiple studies, including tissue type, tissue collection date, DNA extraction method and date, study site, plate, well and genotyping batch. Although confounding makes it difficult to distinguish causative factors, it is prudent to balance these factors with respect to phenotypic traits as much as possible in the design of GWAS experiments.
Genotyping Completeness and Accuracy
Current genotyping technology is very reliable and typically produces data with both high call rates and high accuracy. However, both types of measures should be evaluated for each project because genotyping processes, reagents and instrumentation may vary. The missing call rate is a measure of data completeness, but is also a measure of genotype quality because missingness is often nonrandom. Two methods can be used to assess genotyping accuracy, duplicate sample concordance and consistency with Mendelian transmission.
Genotyping error rates can be estimated from duplicate discordance rates. Each of the three genotypes may be miscalled as either of the other two genotypes, resulting in six potentially different error rates. For a given true genotype, we consider two error rates, α and β. The probability that duplicate genotyping instances of the same subject give a discordant genotype is 2[(1− α − β)(α + β) + αβ]. When α and β are very small, this is approximately 2(α + β) or twice the total error rate. In high-density genotyping, the number of SNPs per sample is so high that duplicating a single sample would give a good estimate of overall error rate, assuming that the rate was similar for every sample. However, DNA sample quality may vary considerably so that error rates can vary among samples. Therefore, we recommend using at least five study samples for estimating error rates.
For duplicate sample pairs, the median discordance rates (discordant calls per SNP) are 7 × 10−5 for Lung Cancer (33 pairs, Illumina HumanHap550 array) and 2 × 10−4 for Addiction (60 pairs, Illumina Human1M array), so the genotyping error rates are on the order of 10−4. The corresponding mean completion (call) rates are very high: 99.8% for Lung Cancer and 99.7% for Addiction samples. For the T2D projects run on the Affymetrix 6.0 array, study samples were not duplicated, but multiple replicates of a single HapMap control sample (NA12144) provide discordance rate estimates of 4 × 10−3 for NHS and 1 × 10−3 for HPFS. The corresponding mean completion rates are 99.6% for NHS and 99.7% for HPFS. It appears that the error rate is about an order of magnitude larger for the Affymetrix 6.0 than for the Illumina 1M arrays, although different sample sets and other factors could affect these results.
Duplicate discordance estimates for individual SNPs also can be used as a SNP quality filter. The problem here is to find a level of discordance that would eliminate a large fraction of SNPs with high error rates, while retaining a large fraction with low error rates. For example, if the mean error rate is 10
−4, we may wish to retain greater than 99% of SNPs with error rates less than 10
−3, while eliminating as many as possible of SNPs with error rates greater than 10
−2. For the Addiction project, with 60 duplicates, a threshold of >1 discordant call seems appropriate, since it would eliminate 99.9% of SNPs with an error rate of 10
−1, 33.5% with a rate of 10
−2, 0.65% with a rate of 10
−3 and <0.1% with an error rate of 10
−4.
Figure S10 shows the relationship between the probability of observing greater than 0, 1, 2, or 3 discordant calls and the number of duplicates for different genotyping error rates. These binomial calculations can be used to select the optimum threshold and number of duplicates to achieve various levels of distinction among different error rates. At least 30 pairs are indicated for most situations.
Mendelian Errors
Mendelian errors can be detected in parent-offspring pairs or trios. In principle, this method of error detection is less efficient than evaluating concordance of duplicate samples because some genotyping errors are consistent with Mendelian inheritance (e.g. offspring of AB and BB parents, with a true BB genotype called as AB). However, Mendelian errors can be used to detect clustering problems that are not detectable with duplicate concordance. For example, consider a SNP assay in which the AA and AB clusters merge together and are both called as AA, while the actual BB cluster is called as AB. In this case, two AB parents (both called as AA) with a BB offspring (called as AB) would generate a Mendelian error. Similarly, Mendelian errors can detect SNPs with null alleles (N) segregating. For example, one parent as AN, the other as BB and the offspring as BN would give apparently inconsistent genotypes of AA, BB and BB, respectively. In both cases, a duplicate sample would give concordant results
The GENEVA studies analyzed to date are not family-based, but the Addiction and Lung Cancer projects included a small number of HapMap trios as genotyping controls. For each SNP in the Addiction project, the Mendelian error rate was calculated as the number of errors detected divided by the number of families in which the offspring and at least one parent have non-missing genotypes. Among the 1,040,106 SNPs with a possibility for error detection, 99.1% have no errors and the mean error rate is 0.04%.
Hardy-Weinberg Equilibrium Testing
We use an exact test for Hardy-Weinberg equilibrium [
Haldane 1954] (HWE) performed on unrelated control subjects with relatively homogenous ancestry. In quantile-quantile (QQ) plots, all four projects show deviations of observed from expected p-values at about 0.01 for autosomal SNPs (e.g.
Figure S11). It is not clear how many of these deviations are due to genotyping artifacts and how many are due to true genotypic frequency deviations from HWE, but examination of cluster plots indicates that most of the extreme deviations are due to poorly performing SNP assays.
The HWE test appears to detect different types of genotyping artifact on the two genotyping platforms. shows the HWE p-value versus minor allele frequency for the Addiction and T2D NHS projects. The pattern of extreme HWE deviations is strikingly different. The Illumina data (Addiction) show a curve of low p-values that corresponds to SNPs for which one homozygous class is missing, or nearly so (as indicated by the theoretical plot in
Figure S12 and the color-coding in ). This feature is not observed in the Affymetrix data (T2D NHS) and is likely due to the Illumina calling algorithm setting a limit on the distance between adjacent clusters (which may cause merging of adjacent clusters). The extreme HWE deviations in the Affymetrix data show a different pattern: SNPs with relatively low minor allele frequencies tend to have more very significant deviations than those with high frequency, and there are many SNPs in which the heterozygous class is deficient. This feature may be due to the Birdseed algorithm calling by 96-sample plate, which may make calling genotypes of SNPs with rare alleles more difficult. We should emphasize, however, that extreme deviations are rare with both platforms (<0.3% of SNPs having p-value< 10
−10).
One of the SNP filters that we recommend is based on HWE test p-value. Interpretation of these p-values is difficult because the choice of significance level depends on sample size [
Wakefield 2009]. However, the purpose of the recommended filter is to flag poorly performing assays rather than detecting real deviations in the population, so we examine genotype cluster plots to set a threshold for filtering. In all four studies described here, these plots show that many assays with p-values between 10
−3 and 10
−4 have good clustering and genotype calling whereas many of those with p-values less than 10
−4 are of poor quality. (For example, in the Lung Cancer project, among 48 plots in the range of p=10
−6 to 10
−4, 12 of 48 plots showed good clustering, whereas in the range of p=10
−4 to 10
−2, 42 of 48 showed good clustering.) Therefore, we recommend filtering at p=10
−4 for these four studies. Other studies may require a different threshold to account for variations in sample size and genotyping technology.
Sample Exclusion and Filtering
Samples are designated for exclusion if they are of questionable identity (e.g. unresolved gender mismatch) or of unacceptable quality (e.g. appear to be contaminated). All remaining samples are posted on dbGaP, but we recommend that filters are applied prior to association testing. In some cases, the filters apply only to certain chromosomes of a sample (e.g. chromosome aberrations). We recommend filtering out samples with an overall missing call rate greater than 2% and those that are PCA outliers from all major ethnic groups in the study. The percentages of data lost by application of the sample filters is 1.6% for Addiction, 0.5% for Lung Cancer, 1.6% for T2D NHS and 0.5% for T2D HPFS.
The presence of low quality samples during genotype calling may affect the cluster definitions and, therefore, the accuracy of genotype calls for high quality samples. This effect was demonstrated by
Pluzhnikov et al. [2008] in a project that used the Affymetrix 5.0 array with Birdseed v2 calling by plate. Eight low quality samples were detected with high heterozygosity and “unusual patterns of relatedness”. These samples were all on one plate, which had a disproportionately high number of cases. False positive associations were found and these remained after simply removing the eight low quality samples. Re-calling of the remaining samples on the affected plate was necessary to remove the false positives.
The GENEVA T2D projects were genotyped on the Affymetrix 6.0 array with Birdseed v1.33 calling by plate. A total of 28 (NHS) and 26 (HPFS) low quality samples were found with high heterozygosity, a high level of relatedness to other samples, and poor BAF plots (see
Supporting Information for more details). However, important differences from the situation described by
Pluzhnikov et.al [2008] are that the samples are distributed across many plates and we did not find evidence of spurious associations. The QQ plots show low genomic inflation (
Figure S14) and genotypic cluster plots for the top hits (after SNP filtering described below) are generally of good quality. Nevertheless, the possible effect of low quality samples was investigated in two ways.
First, for the HPFS project, we estimated the concordance between a HapMap control (NA12144) run on each plate and the consensus genotype calls of 139 replicate genotyping instances of this subject from an independent study using the same array and calling algorithm (and using SNPs that pass the quality filters described below). The mean discordance is very low for plates with and without low quality samples and the difference is not significant (15.8 × 10−5 for 15 plates with low quality samples and 9.8 × 10−5 for 14 plates without such samples; p-value=0.72). Second, we re-called a sample of 8 plates from each study, four plates with and four plates without low quality samples. The maximum number of low quality samples per plate was five. The discordance between the original and recalled genotypes is significantly higher for plates from which low quality samples were removed (p-value=9 × 10−4) and it varies significantly with the number of samples removed (p-value=2 × 10−4). However, the discordance is very low for both types of plates. The highest median discordance between original and recalled genotypes is 6 × 10−5 for an HPFS plate with five low quality samples. For comparison, the median discordance between 15 HapMap control samples from HPFS (all from plates with no low quality samples) and the external consensus reference is 1.4 × 10−4. Therefore, the effect of recalling after low quality sample removal (measured as discordance between original and recalled genotypes) is less than independent genotyping of the same sample.
We concluded that the substantial effort required to recall and reanalyze all of the affected plates in the T2D studies (14/29 in HPFS and 15/41 in NHS) was very unlikely to make a significant improvement in the quality of the data, so these data sets were released to dbGaP without doing so. However, we advise GWAS analysts to consider re-calling a plate whenever one or more of the following occur: (a) a significant portion of the plate consists of low quality samples; (b) the plate is an outlier with respect to median missing call rate and/or the allelic frequency-plate association test; (c) the study has high genomic inflation and low quality cluster plots for association test hits; (d) case-control association with affected plates.
In the two Illumina projects, we did not detect unusual patterns of relatedness or evidence of mixed samples. No low quality samples were detected in the QA process for the Lung Cancer project. In the Addiction project, one problematic genotyping batch of 3 samples was detected and removed from the data set (of which only one sample was used in cluster definition). For Illumina genotype calling, all samples (except those with call rate < 98%) are used for cluster definition, so this system is much less susceptible to the influence of a few low quality samples than the by-plate Birdseed calling for Affymetrix projects. In another GENEVA Illumina project (performed by the Broad genotyping center and not described in this article), we evaluated the effect of recalling after removing 846 low quality scans (out of 2970) and replacing them with high quality scans from the same subjects. The discordance between genotype calls in the original and recalled genotypes was extremely low: 1594 of 2124 samples had no discordance and the highest discordance rate was 9 × 10−6.
SNP Filtering
Data for all SNPs released by the Genotyping Centers are posted on dbGaP, but we recommend filtering association test results based on missing call rate, duplicate discordance, Mendelian errors, sex differences in allelic frequency and heterozygosity, and MAF. The thresholds vary among studies according to quality metric distributions and genotyping platform. For both the Illumina Human1M and Affymetrix 6.0 arrays, the fraction of SNPs that were either failed by the Genotyping Center QC or flagged for filtering during QA is about 7% without the MAF > 0.01 filter (), while the corresponding figure for the Illumina HumanHap550 array is 2%. The recommended MAF filter level is based on power to detect associations. However, for comparison among studies, the filters in are all set to MAF>0.01 in subjects from the United States with primarily European ancestry. For this ethnic group and MAF criterion, the percent of SNPs lost from the Affymetrix 6.0 and Illumina Human1M arrays after both quality and MAF criteria are applied is about 20%, while that for the Illumina HumanHap550 array is 6%. The table also shows that genome coverage (estimated for HapMap II CEU subjects) is decreased by only 1–2% due to the recommended filters.
Supporting information provides data about overlap of SNP filters within and among studies and the physical distribution of SNPs that fail to pass these filters.
| Table ISNP failure and recommended filter criteria with results from 4 GENEVA projects. |
Preliminary Association Tests
Our final data cleaning step is to perform preliminary association tests and then examine QQ, Manhattan signal, regional association and genotype cluster plots. We use logistic regression and likelihood ratio tests for case-control studies, using samples filtered by quality criteria and retaining unrelated subjects. Initially, we select which of the following covariates to include in the model: age, sex, recruitment center and the first several eigenvectors from the PCA. These potential covariates are analyzed in models that exclude genotype and those with significant effects are included in the final model. We then include the genotype (coded for an additive model) for each SNP in turn and test for SNP effects with a likelihood ratio test. We recommend examination of cluster plots for the ‘top hits’ (most significant SNPs) in an association study and flag results for any SNPs that show poor clustering. Examples of QQ and cluster plots are illustrated for the Addiction study in
Figure S15. Another check on the quality of top hits is to examine Manhattan signal and regional association plots of association test p-value versus chromosomal position. As noted previously [
Wellcome Trust Case Control Consortium 2007], valid associations are likely to appear as a small cluster of SNPs with low p-values, unless the sentinel polymorphism is in a SNP-poor region.
The benefits of attention to QC/QA of genotypic data are difficult to quantify, but some examples have been reported.
Pluzhnikov et al. [2008] describe a genotyping plate effect (due to a small number of low quality samples) that resulted in spurious associations. The
Wellcome Trust Case Control Consortium [2007] (supplement) reported a decrease in the genomic inflation factor with the application of a series of quality filters and we observed a similar effect in the GENEVA Addiction study, where the genomic inflation factor changes from 1.08 to 1.04 after filtering.