We have considered six methods of increasing sophistication for testing for CNV-disease association. The first is a nonparametric test (the Mann-Whitney U test) for a location shift between cases and controls in the distribution of quantitative CNV measurements (‘method 1’). The remaining five methods attempt to classify individuals into copy number classes and to test for differences in the frequencies of these classes between cases and controls (Figure 2 shows these five methods schematically). The simplest of these classification methods bins individuals into copy number classes on the basis of predefined thresholds and then applies an association test to the resulting contingency table (‘method 2’).
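A minimal sketch of the binning step in method 2: measurements are assigned to copy number classes by predefined cut-points and tabulated by case/control status. The thresholds and toy data below are purely illustrative, not values from this study.

```python
# Sketch of 'method 2': bin quantitative CNV measurements into copy number
# classes using predefined thresholds, then build a case/control contingency
# table for association testing. Thresholds and data are illustrative only.
from bisect import bisect_right

def bin_copy_number(measurements, thresholds):
    """Assign each measurement to a copy number class (0, 1, 2, ...)."""
    return [bisect_right(thresholds, x) for x in measurements]

def contingency_table(case_classes, control_classes, n_classes):
    table = {"case": [0] * n_classes, "control": [0] * n_classes}
    for c in case_classes:
        table["case"][c] += 1
    for c in control_classes:
        table["control"][c] += 1
    return table

thresholds = [0.5, 1.5]          # hypothetical cut-points between classes
cases    = [0.3, 1.1, 1.2, 2.0]  # toy quantitative measurements
controls = [0.4, 0.6, 1.0, 1.9]

table = contingency_table(bin_copy_number(cases, thresholds),
                          bin_copy_number(controls, thresholds), 3)
print(table)
```

An association test (for example, a trend test) would then be applied to this table; the weakness the text goes on to describe is that hard thresholds ignore any shift in the measurement distributions between groups.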
Figure 2 Methods for performing CNV-association testing. (a) In association studies, inference of genotypes from data and association testing of genotypic data are generally treated as separate statistical problems; however, the two underlying models can be combined.
Our next three methods closely mirror conventional analyses for SNP genotyping: the distribution of quantitative CNV measurements is modeled using a Gaussian mixture model, and individuals are then assigned to copy number classes on the basis of their maximum a posteriori probability (see Methods for details). Again, an association test is then applied to the resulting contingency table. In the first of these, the Gaussian mixture model was fitted to the cases and controls combined, and each subject was assigned a copy number irrespective of the confidence of the assignment (‘method 3’). This strategy was then adapted to address the problem of differential misclassification by fitting the mixture model independently in cases and controls, thus allowing for shifts in the underlying measurement distributions between groups (‘method 4’). The intuition that differential misclassification bias is removed simply by scoring cases and controls independently (‘method 4’) carries a subtle flaw: in fitting the mixture model, differences in copy number frequency between groups are tacitly assumed, and this can lead to spurious inflation of any differences unless the uncertainty implicit in the mixture modelling is correctly propagated into the subsequent association test (discussed in detail in ref. 12). The next strategy was to assign copy numbers to subjects only if the posterior probability of the assignment exceeded a threshold (0.95; ‘method 5’). We explored this last strategy because it is widely used, probably on the intuition that removing the most uncertain data minimizes the bias caused by measurement error. However, experience with SNP genotyping calls this intuition into question, as the application of stringent call quality filters can generate a different sort of bias as a result of nonrandom missingness.
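The assignment step shared by methods 3-5 can be sketched as follows. Component weights, means and s.d. values are assumed to have been estimated already (for example, by EM); the numbers here are illustrative, and the optional threshold reproduces the ‘no call’ behaviour of method 5.

```python
# Sketch of copy number assignment from a fitted Gaussian mixture
# (methods 3-5). Mixture parameters are assumed pre-fitted; values are toy.
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posteriors(x, weights, means, sds):
    """Posterior probability of each copy number class for measurement x."""
    lik = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(lik)
    return [l / total for l in lik]

def assign(x, weights, means, sds, threshold=None):
    """MAP class; None if the top posterior falls below a call threshold
    (method 5) -- the source of nonrandom missingness noted in the text."""
    post = posteriors(x, weights, means, sds)
    best = max(range(len(post)), key=lambda k: post[k])
    if threshold is not None and post[best] < threshold:
        return None
    return best

weights, means, sds = [0.2, 0.5, 0.3], [0.0, 1.0, 2.0], [0.15, 0.15, 0.15]
print(assign(1.05, weights, means, sds))        # confident MAP call
print(assign(0.5,  weights, means, sds, 0.95))  # ambiguous measurement -> no call
```

Fitting this mixture once to the combined sample corresponds to method 3; fitting it separately within cases and within controls corresponds to method 4.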
To address the problems associated with fitting mixture models to cases and controls either jointly or separately, we developed a method that integrates CNV scoring (the data model) and association testing (the genetic model) into a single statistical model, testing for association with a likelihood ratio test (‘method 6’). Figure 3 illustrates the elements of this integrated model. It allows for direct influences of case or control status (phenotype) on the CNV measurement distribution, as well as the indirect association via copy number frequency. The likelihood ratio test compares maximized likelihoods for the model with and without an association between copy number and phenotype (shown as a broken arrow in the figure).
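The structure of such an integrated likelihood can be sketched as below: each subject's measurement is a mixture over latent copy number classes, with class frequencies tilted by phenotype through a log-linear trend in copy number. This is a simplified illustration with fixed, made-up parameters, not the authors' implementation; in practice both the null (trend coefficient zero) and alternative models are maximized over all parameters before the 1-d.f. likelihood ratio comparison.

```python
# Sketch of the log-likelihood underlying an integrated test like 'method 6'.
# Parameter values are illustrative; real use maximizes each model first.
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def class_freqs(base_freqs, beta, phenotype):
    """Tilt class frequencies by a log-linear trend in copy number c."""
    w = [f * math.exp(beta * c * phenotype) for c, f in enumerate(base_freqs)]
    total = sum(w)
    return [v / total for v in w]

def log_likelihood(data, base_freqs, beta, means, sds):
    """data: list of (measurement, phenotype) pairs, phenotype coded 0/1."""
    ll = 0.0
    for x, y in data:
        freqs = class_freqs(base_freqs, beta, y)
        ll += math.log(sum(f * normal_pdf(x, m, s)
                           for f, m, s in zip(freqs, means, sds)))
    return ll

data = [(0.1, 0), (1.0, 0), (1.9, 1), (2.1, 1)]   # toy: cases carry more copies
means, sds, base = [0.0, 1.0, 2.0], [0.2, 0.2, 0.2], [1/3, 1/3, 1/3]
ll_null = log_likelihood(data, base, 0.0, means, sds)
ll_alt  = log_likelihood(data, base, 1.0, means, sds)  # fixed beta, illustration only
lrt = 2 * (ll_alt - ll_null)   # compared with a chi-squared(1) after maximization
print(lrt > 0)
```

Because the same likelihood carries both the measurement model and the genetic model, assignment uncertainty is propagated into the test rather than discarded by a hard call.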
Figure 3 Modelling the dependency between copy number and disease. (a) Naïve model in which any dependency between disease phenotype and quantitative measurements of copy number is assumed to be due to differences in the distribution of copy number between groups.
Method 1 is a test for a simple monotone relationship between disease risk and diploid copy number, and, for comparability, the remaining five methods have been implemented to be maximally sensitive to this type of relationship by computing trend tests (with 1 degree of freedom (d.f.)). Thus, in methods 2 through 5, we use the Cochran-Armitage test for trend in the contingency table obtained by assignment of subjects to diploid copy number classes. Similarly, method 6 yields a 1-d.f. test when the relationship between copy number and phenotype specified in the model has a simple linear-logistic form. These methods could also be adapted to test for nonlinear effects, analogous to ‘dominance’ terms in the classical biometric model. However, as when testing conventional markers, power is lost when such terms are small and the relationship is monotonic. Here we concentrate on 1-d.f. tests, although our integrated model approach can be generalized to other genetic models, and these methods are implemented in our software package.
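The 1-d.f. trend test used in methods 2-5 can be sketched in a few lines. The scores are the copy numbers themselves, and the toy counts below are illustrative only.

```python
# Sketch of the 1-d.f. Cochran-Armitage trend test on a table of case and
# control counts per copy number class. Counts below are toy values.
import math

def cochran_armitage_z(case_counts, control_counts, scores=None):
    k = len(case_counts)
    scores = scores or list(range(k))          # default scores: copy numbers
    n = [a + b for a, b in zip(case_counts, control_counts)]
    N, R = sum(n), sum(case_counts)
    p = R / N                                  # overall case fraction
    t = sum(s * (r - ni * p) for s, r, ni in zip(scores, case_counts, n))
    var = p * (1 - p) * (sum(ni * s * s for ni, s in zip(n, scores))
                         - sum(ni * s for ni, s in zip(n, scores)) ** 2 / N)
    return t / math.sqrt(var)                  # z**2 ~ chi-squared, 1 d.f.

z = cochran_armitage_z([10, 30, 60], [40, 30, 30])
print(round(z, 2))
```

A monotone excess of higher copy numbers among cases yields a large |z|, matching the test's sensitivity to the simple trend relationship described above.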
We also note that systematic errors due to differing measurement distributions are not only related to case or control group membership. Experimental batch effects are often evident. Such effects can also lead to spurious associations, and should also be taken into account in the analysis. Our likelihood ratio testing framework can explicitly model these batch effects when the batches are large enough that the parameters of the mixture model within each batch can be robustly estimated.
False-positive rates of different CNV-association procedures
We carried out simulations to explore how differential errors and clustering quality influence the type 1 error of the six association testing methods outlined above (Supplementary Methods online).
We generated datasets that varied in signal-to-noise ratio (clustering quality, denoted Q), measured as the ratio of the separation of cluster means to the cluster s.d. (Supplementary Methods). We also explored the sensitivity of the type 1 error rate to small differences in cluster means and variances. Figure 4 shows the degree to which the test statistics for all six methods are inflated relative to their expected distributions. Even small location differences in the CNV measurement distribution between cases and controls can lead to massively inflated type 1 error if Mann-Whitney testing or a priori binning is used. Copy number assignment using mixture models performs better, particularly when cases and controls are scored independently (method 4), but appreciable overdispersion remains. This inflation of the test statistic in method 4 has two sources: the confidence of copy number assignment is overestimated when a contingency table is constructed, and the differences in copy number frequencies between cases and controls are overestimated when the mixture model is fitted to cases and controls separately, as this allows the nuisance parameters to vary between the two models. The latter is effectively equivalent to fitting the mixture models under the alternative hypothesis that copy number frequencies do indeed differ between cases and controls; as a result, the true variance of the score test statistic is greater than the naïve estimate. By contrast, the likelihood ratio test (method 6) estimates all parameters under both the null (no copy number differences) and alternative (copy number differences exist) hypotheses and thus provides the most robust test. As expected, imposing stringent call thresholds (method 5) does not remove the overdispersion, but rather exacerbates it.
Figure 4 Sensitivity of 1-d.f. association testing methods to clustering quality and differential errors between cases and controls in simulated data. Six alternative association methods are considered, including Mann-Whitney testing for a difference in location of the CNV measurement distribution.
We also investigated the performance of these methods in empirical CNV data in which we expected no true associations to exist. For this purpose, we analyzed 95 known CNVs (Supplementary Table 1 online) from Affymetrix 500K SNP genotyping data collected on two UK control populations, each of ~1,500 individuals, as part of the Wellcome Trust Case Control Consortium (WTCCC)14. In one group (the 1958 Birth Cohort sample), DNA was obtained from EBV-transformed cell lines, whereas in the other (UK National Blood Service controls), DNA was obtained from fresh blood. Association tests on these 95 CNVs, which differ in numbers of alleles, clustering quality and allele frequencies, show substantially less overdispersion of χ² statistics using the likelihood ratio trend test (λ = 1.1) than with Cochran-Armitage testing of separate mixture model assignment of the same CNVs either with (λ = 1.74) or without (λ = 1.58) allowance for differential errors (Supplementary Fig. 1 online). This overdispersion is significantly lower for the likelihood ratio trend test (P < 0.05, Wilcoxon signed rank test comparing the test statistics produced by the likelihood ratio test and either mixture model assignment method), but does not differ significantly between the two mixture model assignment methods (P > 0.05). The small degree of overdispersion observed in the likelihood ratio test statistics was not statistically significant.
Within the integrated model framework it is also possible to carry out a likelihood ratio test for differences in cluster means and variances between the two groups (Supplementary Methods). We first showed, in simulations in which there were no differences in the CNV measurement distributions, that this likelihood ratio test statistic had the expected χ² distribution. In contrast, in the WTCCC data these statistics are considerably overdispersed, demonstrating that differential errors are pervasive in this empirical example (Supplementary Fig. 2 online). The same test also shows highly significant differential bias in the array-CGH and PRT examples (P and 1.1 × 10⁻¹² for panels b and c, respectively). These observations confirm that the features of copy number data modeled in the simulations are indeed present in empirical data from different platforms, including SNP genotyping, array-CGH and PRT datasets, and that accounting for differential errors is essential for robust CNV-association testing.
Maximizing information from probes in the same CNV
All the association methods described above require a single measure to discriminate between different copy numbers. However, many CNV assay methods use multiple probes in each CNV (for example, each CNV in the Affymetrix 500K SNP genotyping data is identified by multiple SNP probe sets), so some means of summarizing these measurements is necessary. The obvious approaches are to use the mean or the median, but these are not optimal, as probes differ in their informativeness, not least because the boundaries of copy number regions can be uncertain. We developed two improved procedures to weight the information from each probe within each CNV region (Supplementary Methods). The first uses the first principal component of the intensity data from the different probes; by downweighting probe intensities that are uncorrelated with the remainder, this generally gives better separation of the different copy numbers than the mean or median of all the probe intensities. However, we suspected that these weights would still not be optimal, and so developed a one-step refinement of the scores: we fitted the Gaussian mixture model to the principal-component scores and then used the estimated CNV assignments to compute an optimal linear discriminant function of the probe intensities. We demonstrated that these procedures improved clustering quality across the 95 CNVs in the Affymetrix 500K data from the WTCCC (Fig. 5a and Supplementary Fig. 3 online). We suggest that such procedures will have general utility in the many applications (for example, array CGH) in which multiple probes identify the same variant. As expected, the improved summary methods provide considerable protection against overestimation of CNV boundaries (Supplementary Fig. 4 online). Given that the vast majority of known CNVs do not currently have precisely mapped breakpoints, being able to overestimate the extent of a CNV without seriously downgrading measurement quality is a significant advantage.
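The first of these summary procedures can be sketched with a small power-iteration implementation of the leading principal component. The data are toy values: two concordant probes plus one pure-noise probe, which the first principal component should effectively downweight.

```python
# Sketch of summarizing multiple probes per CNV by their first principal
# component (power iteration on the probe covariance matrix). Pure Python;
# data are toy, and a real analysis would use a linear algebra library.
def first_pc_scores(rows, iters=200):
    """rows: list of per-individual probe intensity vectors."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(a[i] * a[j] for a in centred) / (n - 1)
            for j in range(p)] for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):             # power iteration -> leading eigenvector
        v = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return [sum(a[j] * v[j] for j in range(p)) for a in centred]

# Probes 1 and 2 track copy number; probe 3 is uncorrelated noise:
rows = [[0.0, 0.1, 0.5], [1.0, 0.9, 0.4], [2.0, 2.1, 0.6], [1.0, 1.1, 0.5]]
scores = first_pc_scores(rows)
print([round(s, 2) for s in scores])
```

The one-step refinement described above would then fit the mixture model to these scores and recompute probe weights as a linear discriminant function of the resulting class assignments.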
Figure 5 Statistical power of the likelihood ratio trend test. (a) Clustering quality resulting from alternative probe summary methods for 95 CNVs: linear discriminant function (LDF), principal components analysis (PCA) and arithmetic mean (mean). (b) Statistical power.
We then assessed the statistical power of the likelihood ratio trend test using simulated data across a range of signal-to-noise ratios; the power can be estimated by a quadratic approximation of the profile likelihood (see Methods). When copy number clusters are discrete, the likelihood ratio trend test achieves the maximum theoretical power, but power falls off rapidly with decreasing clustering quality (Fig. 5b). The loss of power is much more pronounced when the model allows for different measurement properties between cases and controls, reflecting the increasing difficulty of distinguishing between association and differences in measurement properties as clustering quality declines. This result is replicated in the empirical data on the 95 CNVs described above (Fig. 5). We note that this marked fall-off in power is even more pronounced when copy number frequencies are low, owing to the increased difficulty of accurately modelling the measurement distributions.
Although it could be argued that the serious loss of power incurred by the need to model differential errors points to a need for better study design rather than for additional statistical sophistication, such effects are very difficult (and often impossible) to exclude. Rarely can cases and controls be collected in strictly comparable circumstances that ensure identical DNA handling, and prospective cohort studies will rarely yield sufficient numbers of cases to detect modest effect sizes. Family-based association studies will, perhaps, face fewer difficulties in this respect.
We generalized the likelihood ratio trend test for use in quantitative trait association by replacing the logistic regression for the dependency between copy number and phenotype in our model with a simple linear regression (LR-QT test). Although studies of quantitative traits are often carried out in a manner that effectively excludes the differential errors that largely concern us here, this may not always be the case; notably, differential errors can be introduced by experimental batches, which may be confounded with the trait (for example, when the extremes of the trait distribution are targeted). Although careful study design may control type 1 errors, we have also shown by simulation (Supplementary Fig. 5 online) that our approach, without allowance for measurement distribution differences, is more powerful than simple tests on the CNV measurements based on linear regression4. This advantage is maintained over a wide range of signal-to-noise ratios (clustering qualities).
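The quantitative-trait variant can be sketched by swapping the logistic link for ordinary least squares. This simplified illustration regresses a toy trait on the posterior-mean ('dosage') copy number so that assignment uncertainty is retained; the mixture parameters are assumed known and all values are made up.

```python
# Sketch of an LR-QT-style analysis: linear regression of a quantitative
# trait on posterior-mean copy number. Parameters and data are illustrative.
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def dosage(x, weights, means, sds):
    """Posterior-mean copy number, retaining assignment uncertainty."""
    lik = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(lik)
    return sum(c * l / total for c, l in enumerate(lik))

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

weights, means, sds = [1/3, 1/3, 1/3], [0.0, 1.0, 2.0], [0.2, 0.2, 0.2]
measurements = [0.05, 0.9, 1.1, 1.95, 2.05]
trait        = [0.1,  1.2, 0.9, 2.1,  1.9]   # toy expression values
doses = [dosage(x, weights, means, sds) for x in measurements]
slope = ols_slope(doses, trait)
print(round(slope, 2))
```

The full LR-QT test additionally models the measurement distributions within the same likelihood, rather than treating the dosages as fixed covariates as this sketch does.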
Empirical examples of positive associations
We explored the performance of the likelihood ratio trend test on known CNV associations for both binary disease traits and quantitative traits in empirical data. Type 1 diabetes (T1D) is known to be strongly associated with the MHC class I region, which contains several CNVs and across which there is long-range linkage disequilibrium; we should therefore expect to see indirect association between T1D and these CNVs. We confirmed that the likelihood ratio test does indeed identify a highly significant association (P = 0.001) with a CNV in the class I MHC region in the WTCCC data (Fig. 6a).
Figure 6 Examples of empirical CNV associations. (a) Association with a binary disease trait, type 1 diabetes (T1D). The red shaded area represents a density plot of copy number measurement in each group. The two WTCCC control groups come from the 1958 Birth Cohort.
We also applied our likelihood ratio method to published case-control data used to support an association between the copy number of β-defensin genes and psoriasis11. These PRT data comprise cases and controls from two populations, Dutch and German. We observed clear evidence of different measurement properties (a batch effect) in the German controls relative to the Dutch controls (Fig. 6). Moreover, the German, but not the Dutch, data show significant P values for differential bias between cases and controls (3.6 × 10⁻⁶ and 0.38, respectively). Nevertheless, both the Dutch and the German populations show significant evidence of association (P = 0.002 and 0.02, respectively). On jointly modelling all four collections, allowing the locations and variances of the copy number components to vary between the four groups while assuming that the magnitude of effect is the same in both populations, we obtain a P value of 6.5 × 10⁻⁵. Thus, our analysis suggests that, although differential errors between groups are clearly present in these data, the published association is not attributable to them; it also illustrates the potential for increasing power by applying our method to combine case-control data across datasets.
Finally, we applied the likelihood ratio test for quantitative trait association to a previously published association4 between a multiallelic CNV detected by array CGH and gene expression in the HapMap lymphoblastoid cell lines. The likelihood ratio test gives increased power over and above the nominal P values obtained from linear regression of normalized gene expression against either the intensity data or the mixture model assignment (Fig. 6), thus corroborating our simulation results.