The central goal of our work was to estimate the functional impact of germ line copy number variation in vivo. To achieve this goal, we first identified CNVRs in twenty inbred strains at the highest resolution reported to date. We discovered 1,333 CNVRs spanning approximately 3% of the mouse genome. On average, there are over 300 CNVs per strain. As predicted, we found that the frequency of CNVRs increased with decreasing CNVR length, but that short CNVs account for only a small fraction of the total copy number variable sequence content of the mouse genome. We speculate that this trend will hold as higher resolution technologies are developed. Unexpectedly, we found that small CNVs (<10 kb) lack the enrichment of highly homologous sequences that frequently flank, and are presumed to contribute to the formation of, medium (10–100 kb) and large (>100 kb) CNVs. Determining the mechanisms that generated these CNVs would facilitate the design of targeted assays to detect new CNVs and provide a better understanding of the forces that shaped the mouse genome. We are aware of only one report documenting similar short deletions in a small number of human genomes, and therefore a mouse-to-human CNVR comparison will be informative as high-resolution human data become available [41]. A caveat of our CNVR map is that, as is true for all comparative genomic hybridization experiments, we were limited to finding variants in comparison to a reference sequence; sequences that do not exist in the C57BL/6J genome but vary in copy number among other strains were not detected. Therefore, the total extent of copy number variation relative to the union of all inbred mouse genomes must await comprehensive sequencing of other strains. However, a reasonable estimate of the amount of mouse genomic sequence lost in the C57BL/6J strain is the amount of genomic material lost per strain relative to C57BL/6J, which ranged from 16.8 to 33.8 Mb (mean = 25.5 Mb).
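The relationship between CNVR frequency and sequence content across the size classes named above (small <10 kb, medium 10–100 kb, large >100 kb) can be illustrated with a minimal sketch. The function names and the example lengths are hypothetical and chosen only for illustration; they are not the study's data or analysis code, and the treatment of the exact 10 kb and 100 kb boundaries is an assumption.

```python
from collections import defaultdict

def size_class(length_bp):
    """Assign a CNVR length (bp) to one of the three size classes.

    Assumption: exactly 10 kb and exactly 100 kb both count as "medium".
    """
    if length_bp < 10_000:
        return "small"
    if length_bp <= 100_000:
        return "medium"
    return "large"

def summarize(lengths_bp):
    """Return {class: (count, total_bp, fraction_of_all_bp)}."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for length in lengths_bp:
        cls = size_class(length)
        counts[cls] += 1
        totals[cls] += length
    grand_total = sum(totals.values())
    return {
        cls: (counts[cls], totals[cls], totals[cls] / grand_total)
        for cls in counts
    }

# Hypothetical CNVR lengths in bp (illustration only, not real data).
example_lengths = [2_000, 3_500, 5_000, 8_000, 40_000, 75_000, 600_000]
summary = summarize(example_lengths)

for cls, (n, bp, frac) in summary.items():
    print(f"{cls}: n={n}, total={bp} bp, share of CNV sequence={frac:.1%}")
```

In this toy input the small class is the most numerous yet contributes only a few percent of the copy number variable sequence, which is the qualitative pattern described in the text.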
Using a relatively small number of inbred mouse strains, we found that all classes of CNVs were associated with gene expression changes in a variety of tissues. We found that 28% of strain-specific expression traits were associated with copy number variation in the hematopoietic progenitor/stem compartment, consistent with the 18% previously reported in human lymphoblastoid cell lines [42]. To validate these eQTLs, we inferred the CNVR genotypes of the BXD RI panel and analyzed publicly available KLS expression data. Over 29% of the testable KLS eQTLs were supported in the BXD data set, a striking concordance given the substantial experimental and biological differences between the studies. We also detected many CNVR eQTLs in adipose tissue and hypothalamus, even though these data were produced with different mice and different expression platforms, and the eQTL analysis was performed with 25% fewer strains. Much of the recent speculation on the potential impact of CNVs on phenotypic variation has centered on gene-dosage effects [43]. However, we found that in only 7.3% of CNVR eQTLs does the CNVR contain the associated expression probe, so only this fraction can be attributed to gene-dosage effects. Presumably, the remaining CNVR eQTLs reflect expression variation mediated by alteration of regulatory material or local chromatin structure. This would be consistent with a model in which subtle alterations in expression patterns are better tolerated than complete or partial gene gains or losses.
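The gene-dosage criterion described above, in which a CNVR eQTL counts as a dosage effect when the CNVR contains the associated expression probe, amounts to an interval-containment test. The following is a minimal sketch under stated assumptions: the `Interval` type, the function names, the inclusive coordinate convention, and the example coordinates are all hypothetical and are not taken from the study.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive, bp (assumed convention)
    end: int    # inclusive, bp (assumed convention)

def contains(cnvr, probe):
    """True if the probe lies entirely within the CNVR on the same chromosome."""
    return (
        cnvr.chrom == probe.chrom
        and cnvr.start <= probe.start
        and probe.end <= cnvr.end
    )

def dosage_fraction(pairs):
    """Fraction of (CNVR, probe) eQTL pairs in which the CNVR contains the probe."""
    if not pairs:
        return 0.0
    hits = sum(1 for cnvr, probe in pairs if contains(cnvr, probe))
    return hits / len(pairs)

# Hypothetical CNVR/probe pairs (coordinates invented for illustration).
example_pairs = [
    (Interval("chr1", 1_000, 50_000), Interval("chr1", 10_000, 10_060)),    # probe inside CNVR
    (Interval("chr1", 1_000, 50_000), Interval("chr1", 80_000, 80_060)),    # probe nearby, outside
    (Interval("chr2", 5_000, 9_000), Interval("chr2", 8_990, 9_050)),       # probe only partly overlaps
    (Interval("chr3", 100_000, 300_000), Interval("chr4", 150_000, 150_060)),  # different chromosome
]

print(f"gene-dosage fraction: {dosage_fraction(example_pairs):.2f}")
```

Requiring full containment is a deliberate simplification; a real analysis would need to decide how to treat probes that only partially overlap a CNVR boundary.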
Some of the CNVR eQTLs reported here may be in linkage disequilibrium with another allele causing the associated expression change, underscoring the need to characterize the relationship between CNVs and other genetic variants. It is likely that there are additional eQTLs not detected here: CNVRs that alter expression in only one or two strains, trans eQTLs, eQTLs that associate with genes expressed in tissues not sampled here, and eQTLs with weak effects. Increasing the number of strains and the tissues sampled would address some of these limitations. However, extending this work to a much larger population with greater genetic diversity (e.g., the Collaborative Cross [44]) would increase the power to detect trans and weaker effects and therefore enable a clearer understanding of the overall impact of CNVRs on expression variability. Future work must reach beyond identifying statistical associations to better characterize the mechanisms by which a CNVR affects phenotypic (including expression) variation. Beyond their use in estimating the impact of CNVRs on expression variation, the CNVR eQTLs reported here may be of practical value in identifying the causal variants in traditional QTLs because they present plausible hypotheses linking genetic differences between inbred strains to complex traits.