In an analysis of 31,717 cancer cases and 26,136 cancer-free controls drawn from 13 genome-wide association studies (GWAS), we observed large chromosomal abnormalities in a subset of clones from DNA obtained from blood or buccal samples. Mosaic chromosomal abnormalities, either aneuploidy or copy-neutral loss of heterozygosity, of size >2 Mb were observed in autosomes of 517 individuals (0.89%) with abnormal cell proportions between 7% and 95%. In cancer-free individuals, the frequency increased with age; 0.23% under 50 and 1.91% between 75 and 79 (p=4.8×10−8). Mosaic abnormalities were more frequent in individuals with solid-tumors (0.97% versus 0.74% in cancer-free individuals, OR=1.25, p=0.016), with a stronger association for cases who had DNA collected prior to diagnosis or treatment (OR=1.45, p=0.0005). Detectable clonal mosaicism was common in individuals for whom DNA was collected at least one year prior to diagnosis of leukemia compared to cancer-free individuals (OR=35.4, p=3.8×10−11). These findings underscore the importance of the role and time-dependent nature of somatic events in the etiology of cancer and other late-onset diseases.
Classically, genetic mosaicism is defined as the co-existence of cells with two or more distinct karyotypes within an individual that results from a post-zygotic event during development and can occur in both somatic and germline cells1,2. Errors in chromosomal duplication and subsequent transmission to daughter cells may lead to aneuploidy, the gain or loss of chromosomes or segments of chromosomes, and reciprocal gain and loss events manifesting in copy-neutral loss of heterozygosity (cnloh) or acquired uniparental disomy. Somatic mosaicism has been established as a cause of miscarriage, birth defects, developmental delay, and cancer3-9. Because mosaicism can be benign or may manifest with diverse clinical phenotypes, there are no accurate estimates of its frequency in the general population3,6. On rare occasions the propensity to develop chromosomal abnormalities is inherited and leads to multiple phenotypic abnormalities including cancer predisposition as reported in families with mutations in BUB1B and CEP57 10,11. Recently, two groups have identified somatic mosaic mutations in IDH1 and IDH2 in tumors of individuals with Ollier disease and Maffucci syndrome12,13 while another group has characterized somatic mosaicism of a HRAS mutation in an individual with urothelial cancer and epidermal nevus14. Recent work in a population of twins has suggested that the detection of somatic structural variants in blood increases with aging and may be related to reduction in blood cell clonality15. In this report, we define mosaic chromosomal abnormalities broadly: the presence of both normal karyotypes and those with large structural genomic events resulting in alteration of copy number or loss of heterozygosity in distinct and detectable subpopulations of cells regardless of the clonal or developmental origin of the subpopulations.
Recently, we reported on 1,991 individuals from the Spanish Bladder Cancer/EPICURO population-based case-control study in which we had performed a GWAS of adult-onset bladder cancer using DNA obtained from blood or buccal samples16. The SNP array data generated for the GWAS was subsequently used to detect clonal mosaic abnormalities in the autosomes of 1.7% of study subjects, suggesting a higher frequency in adults than previously suspected. Even though somatic mosaicism has been implicated in several cancers, this study did not reveal a significant difference in frequency between cases and controls. A computational algorithm was used to detect 42 large mosaic events involving two or more distinct clones in DNA extracted from blood or buccal samples and we experimentally validated the findings using multiplex ligation-dependent probe amplification (MLPA) and microsatellite analysis (as well as fluorescent in situ hybridization in a subset), establishing the robustness of the software detection method. A similar proportion of cells carrying each event was found in 5 of 6 events (in four individuals with bladder cancer in whom three had one event and one individual with three separate events) in which it was possible to examine more than one tissue (whole blood and bladder mucosa), suggesting an early embryonic origin of the somatic mutation leading to the observed mosaic chromosomal abnormalities16.
In this report, we extend our analysis of clonal mosaic abnormalities in the autosomes to 57,853 individuals (including those previously published16). We tested 31,717 cancer cases and 26,136 cancer-free controls for evidence of mosaic abnormalities using genome-wide SNP array data generated as part of 13 distinct cancer GWAS drawn from 48 epidemiological case-control and case-cohort studies (Supplementary Table 1). DNA samples were extracted from blood or buccal samples using a variety of collection and extraction techniques and genotyped using one or more Infinium Human SNP arrays from Illumina Inc. (including versions of Hap300, Hap240, Hap550, Hap610, Hap660, Hap1, Omni Express, and Omni1). Genotype clusters were empirically estimated in 45 batches to optimize accuracy while minimizing potential batch effects (Online Methods).
Detection of clonal mosaic events was based on assessment of allelic imbalance and copy number changes. We used the B-allele frequency (BAF) measurement, derived from the ratio of probe values relative to the locations of the estimated genotype-specific clusters, for initial segmentation using the Mosaic Alteration Detection (MAD) algorithm implemented in GADA-R with modifications17,18. The BAF and log2 relative probe intensity ratio (LRR), which provides data on copy number, were used to classify each event as copy-altering (gain or loss) or neutral (reciprocal gain and loss resulting in loss of heterozygosity, LOH) and to assign the proportions of abnormal (p) and normal (1-p) cells. Mosaic proportions were required to deviate from levels expected from constitutional (non-mosaic) changes in order to exclude homozygous chromosomal segments inherited identical by descent and non-mosaic instances of trisomy, monosomy and uniparental disomy. A minimum event size threshold was set to detect only clonal mosaic events greater than 2 Mb to minimize the false discovery of constitutional copy number variants. Copy-neutral LOH and copy-loss events could be detected for mosaic proportions between 7% and 95% (Figure 1) with sensitivity that was affected by the signal-to-noise ratio characteristic of each microarray assay and sample quality. There was reduced sensitivity to distinguish between copy-neutral LOH and copy-loss events for mosaic proportions less than 15% across the autosomes. The magnitude of BAF differences for single-copy gain events was 1/3 of the magnitude of copy-neutral LOH or copy-loss events, reducing the sensitivity for calling copy-gain events. As a result, single-copy gain events could only be reliably detected for mosaic proportions between 22% and 88%, with ambiguity in distinguishing copy-gain from copy-neutral LOH for mosaic proportions of less than 20%.
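To illustrate how BAF and LRR jointly separate the three event types, the following minimal sketch computes the signals expected at a heterozygous SNP under a simple mixture of normal and abnormal cells. It is not the MAD/GADA-R implementation: the function name `expected_signals`, the noise-free array, and the convention that the single-copy change affects the B allele are all assumptions made for illustration.

```python
import math


def expected_signals(event, p):
    """Return (|BAF - 0.5|, LRR) at a heterozygous (AB) SNP.

    Hypothetical helper: assumes an ideal, noise-free array and a mixture of
    a fraction p of abnormal cells with a fraction (1 - p) of normal cells.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if event == "cnloh":
        # Abnormal cells are AA: B copies drop to 0, total copies stay at 2.
        b_copies, total = 1.0 - p, 2.0
    elif event == "loss":
        # Abnormal cells lose the B allele: one copy remains.
        b_copies, total = 1.0 - p, 2.0 - p
    elif event == "gain":
        # Abnormal cells duplicate the B allele: three copies in total.
        b_copies, total = 1.0 + p, 2.0 + p
    else:
        raise ValueError("event must be 'cnloh', 'loss' or 'gain'")
    baf = b_copies / total
    lrr = math.log2(total / 2.0)
    return abs(baf - 0.5), lrr


for event in ("cnloh", "loss", "gain"):
    deviation, lrr = expected_signals(event, 1.0)
    print(f"{event}: BAF deviation={deviation:.3f}, LRR={lrr:.3f}")
```

Under these assumptions, copy-neutral LOH shifts the BAF without changing the LRR, whereas losses and gains move the LRR in opposite directions. When every cell is abnormal (p = 1), the BAF deviation for a single-copy gain is one third of that for copy-neutral LOH or loss, in line with the reduced sensitivity for gains described above.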
Since DNA was obtained for the purpose of performing a GWAS, it was not possible to further explore the developmental and clonal characteristics of mosaic events detected in these individuals (e.g. by studying DNA from fractionated blood and other tissue types, determining cell composition of buccal samples, or effect of DNA collection and extraction methods on detection and accuracy of the estimation of mosaic proportions). We report only autosomal chromosomal abnormalities, as the analysis of the sex chromosomes presents distinct technical and interpretative challenges.
We observed 681 mosaic segments of size greater than 2 Mb on 641 autosomal chromosomes in 517 individuals for an overall frequency of individuals with mosaicism of 0.89% (Tables 1 and 2). The most frequent type of event observed was copy-neutral LOH (48.2%), while copy-gains and copy-losses were observed for 15.1% and 34.8% of mosaic events, respectively (Table 1). A small proportion (1.9%) of mosaic chromosomes were complex, harboring more than one type of event. Of the mosaic chromosomal events, 18.7% spanned the entire chromosome, including 62 complete trisomies, predominantly in chromosomes 8, 12 and 15, and 47.9% began at a telomere and extended across some portion of the chromosomal arm (Table 1 and Figure 2). The majority of telomeric events were mosaic copy-neutral LOH (85.7%), most frequently on 9p (Table 3). The remaining mosaic chromosomal events were interstitial (31.5%), spanning neither telomere nor centromere, while an additional small proportion (1.8%) spanned the centromere or had more complex structure (e.g. distinct events involving both telomeres, but not the whole chromosome). The majority of interstitial events were mosaic copy-loss (91.6%), which was most frequently observed within specific regions of chromosomes 13q and 20q (Figure 2). We observed 69 individuals (46 cancer cases and 23 cancer-free individuals) with clonal mosaic events on multiple chromosomes. The distribution of the number of clonal mosaic chromosomal events per individual is shown in Supplementary Table 3. Among cancer-free individuals, the greatest number observed was 5 mosaic chromosomal events, whereas six individuals with cancer had greater than 5 events, including two individuals with gastric cancer who each had 20. A list of mosaic events with phenotype data is available as Supplementary Data.
The strongest predictor of mosaic autosomal abnormalities was age at DNA collection. We examined the effect of aging on the frequency of mosaicism across all studies, which were predominantly individuals over the age of 50. The frequency of cancer-free individuals with detectable clonal mosaic events increased with age, from 0.23% for those under 50 to 1.91% (p=4.8×10−8) for those between the ages of 75 and 79, and with slightly higher frequencies for individuals with cancer (Figure 3). In the early onset cancers (under age 40), which constituted less than 5% of analyzed cases (e.g., testicular cancer and osteogenic sarcoma), we did not observe an increase in mosaic abnormalities. Further studies are needed to investigate the relationship between mosaic abnormalities and cancer in children and young adults, particularly because of the strong association between mosaicism and many developmental disorders. There was no apparent relationship between age at DNA collection and the number, size of mosaic events, or the proportion of abnormal cells (Supplementary Figures 1 and 2).
We regressed the presence of detectable clonal mosaicism in 26,136 cancer-free individuals on age at DNA collection (in 5 year intervals), sex (male versus female), DNA source (buccal cells versus blood), smoking (ever versus never) and admixture coefficients for African and East Asian ancestry in a logistic model to determine the additional factors that influenced frequency of detectable clonal mosaicism. The source of DNA was known for 87% of individuals; for 19% of these, DNA was derived from buccal cells and for the remainder from blood. DNA source was not significantly associated with mosaicism (OR=0.83, 0.55-1.26 95% confidence interval (CI), p=0.39). By admixture analysis, 75% of subjects were determined to be of European ancestry, 9% of African ancestry and 16% of East Asian ancestry. Although power was limited, we observed that cancer-free individuals with African admixture were at a lower risk of being mosaic (OR=0.43, 0.20-0.92 95% CI, p=0.03), whereas the association was not significant for those with East Asian admixture (OR=0.60, 0.32-1.15 95% CI, p=0.12). We did not observe an association between smoking (ever/never) and frequency of mosaic abnormalities (OR=1.04, 0.75-1.44 95% CI, p=0.81).
In 26,136 cancer-free controls and 23,093 cancer cases drawn from non-sex specific and non-hematological cancer sites (i.e. excluding 8,470 individuals with leukemia, lymphoma, multiple myeloma and cancers of the breast, endometrium, ovary, testis, and prostate), we observed a higher frequency of males with mosaic abnormalities than females. In cancer-free individuals, we observed mosaic events in 0.56% of females and 0.87% of males (OR=1.35, 0.98-1.88 95% CI, p=0.07); for individuals with cancer we observed mosaic events in 0.79% of females and 1.21% of males (OR=1.48, 1.08-2.03 95% CI, p=0.015); and overall, 0.65% of females and 1.04% of males (OR=1.42, 1.14-1.80 95% CI, p=0.002) in logistic models adjusted for cancer diagnosis (if applicable), age at DNA collection, ancestry, DNA source and smoking. These differences could be due to a true sex-specific effect akin to sex-differential mutation and recombination rates19; however, the complex and heterogeneous nature of the inclusion of individual studies and the differences in their entry and selection criteria could result in spurious associations. Although this observation was consistent across cancer types, it should be confirmed in additional studies better designed to address this question.
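For readers who wish to relate the reported frequencies to the odds ratios, the sketch below computes a crude (unadjusted) odds ratio directly from two frequencies. The function name `crude_odds_ratio` is hypothetical, and the calculation is only an approximation to the analysis above, which used logistic models adjusted for cancer diagnosis, age at DNA collection, ancestry, DNA source and smoking.

```python
def crude_odds_ratio(freq_group, freq_reference):
    """Unadjusted odds ratio comparing two event frequencies (as fractions).

    Hypothetical helper for illustration; it ignores all covariates.
    """
    for freq in (freq_group, freq_reference):
        if not 0.0 < freq < 1.0:
            raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_group = freq_group / (1.0 - freq_group)
    odds_reference = freq_reference / (1.0 - freq_reference)
    return odds_group / odds_reference


# Frequencies of mosaic events in males versus females, as reported above.
cancer_free = crude_odds_ratio(0.0087, 0.0056)
with_cancer = crude_odds_ratio(0.0121, 0.0079)
overall = crude_odds_ratio(0.0104, 0.0065)

print(f"cancer-free: {cancer_free:.2f}")
print(f"with cancer: {with_cancer:.2f}")
print(f"overall:     {overall:.2f}")
```

The crude values (about 1.56 for cancer-free individuals) are somewhat larger than the adjusted estimates reported above, which is expected when covariates such as age at DNA collection differ between the groups being compared.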
To determine the relationship between detectable mosaic autosomal abnormalities and non-hematological cancers, we regressed the presence of detectable clonal mosaicism on cancer diagnosis, age, sex, DNA source, smoking and ancestry in a logistic model. We observed a modest increase in cancer risk for mosaic individuals (OR=1.27, 1.05-1.52 95% CI, p=0.012) (Table 2 and Supplementary Table 2). Notable associations were observed in stratified analyses of lung (OR=1.56, 1.18-2.08 95% CI, p=0.002) and kidney (OR=1.98, 1.27-3.06 95% CI, p=0.002) cancers, both tobacco-associated malignancies. However, no cancer site-specific associations were observed for bladder, esophagus, stomach and pancreas cancers, which are also typically associated with tobacco use. There was no significant association in non-hematological cancer cases overall between smoking (ever/never) and frequency of mosaicism (OR=1.19, 0.92-1.54 95% CI, p=0.19) or when stratified by cancer site (results not shown).
In an analysis of the subset of 14,050 individuals with cancer for whom it was possible to determine that DNA was likely obtained before or at the time of diagnosis and prior to treatment with radiation or chemotherapy for a primary tumor (designated as “likely untreated”), we observed a stronger association between mosaic abnormalities and non-hematological cancer diagnosis (OR=1.45, 1.18-1.80 95% CI, p=0.0005). The associations for lung and kidney also increased in significance (Table 3). It is notable that the evidence for association with non-hematological cancer diminished in individuals who were potentially treated (OR=1.03, 0.81-1.30 95% CI, p=0.80). We had approached this analysis with the hypothesis that there could be an increased frequency in detectable clonal mosaicism in non-hematological cancers induced by chemotherapy or radiotherapy but were surprised to observe the frequency was reduced to virtually the same as in the cancer-free population. Although this attenuated effect could have many explanations (e.g., related to the diagnosis and treatment of a solid tumor leading to a decrease in populations of cells with mosaic alteration), we had a limited capacity to model and control for treatment-effects since many of the studies did not provide any treatment information or only provided incomplete, retrospective ascertainment of the specifics. Although many of the participating studies were prospectively ascertained cohorts, DNA collection often occurred after cancer diagnosis. Additional studies are needed in prospectively ascertained cohorts and longitudinal studies in which multiple DNA samples were collected prior to and after diagnosis in order to explore treatment and disease effects.
For the 43 individuals with hematological cancers for whom DNA was obtained at least a year prior to diagnosis, the frequency of detectable clonal mosaicism was 20% for myeloid leukemia and 22% for lymphocytic leukemia (predominantly chronic lymphocytic leukemia, Table 2) compared to 0.74% in 26,136 cancer-free controls (overall OR=35.4, 14.7-76.6 95% CI, Fisher’s exact p=3.8×10−11). Of the 8 mosaic individuals with leukemia for whom DNA samples were collected at least a year prior to diagnosis, 4 were diagnosed with chronic lymphocytic leukemia (CLL), of whom 2 had a mosaic deletion in a region of chromosome 13q14 previously described to be deleted in CLL20. DNA was obtained more than 5 years prior to diagnosis for 6 mosaic individuals, with the longest interval being 14 years, suggesting that detectable clonal mosaicism could be a marker of hematological cancer or its precursors, i.e., monoclonal B cell lymphocytosis (MBL) for CLL and myelodysplastic syndrome for acute myelogenous leukemia. Recent work shows that the majority of MBL have mono- or biallelic 13q14 abnormalities21. However, further studies will be needed, preferably with serial pre- and post-diagnosis sampling, to investigate the predictive nature of detectable clonal mosaicism, especially involving regions of chromosomes 13 and 20, with respect to leukemia risk20.
We further explored the 4 most recurrent altered regions (>20), which also harbor well known cancer genes (as noted in the COSMIC22 and Mitelman databases: http://cgap.nci.nih.gov/Chromosomes/Mitelman); these were on chromosomes 9p (cnLOH), 13q (del), 14 (cnLOH) and 20q (del) (Table 4). Notably, the most recurrent mosaic events were observed in cancer-free individuals as well as across multiple solid tumors. We observed a comparable frequency in non-hematological cancer cases and cancer-free controls for three of the regions, whereas the chromosome 14 cnLOH abnormalities were more frequent in non-hematological cancer cases (OR=3.32, 1.42-9.00 95% CI, Fisher’s exact p=0.003), particularly in individuals with bladder or kidney cancer. Copy-neutral LOH in this region of chromosome 14 has been associated with increased susceptibility to sporadic cancers, and the region harbors imprinted genes, such as the tumor-suppressing non-coding RNA, Maternally expressed gene 3 (MEG3)8,23. The recurrent segmental deletion of 13q14 was observed in 5 leukemia cases, but also in 18 individuals with solid tumors (9 with lung cancer and 4 with prostate cancer), and in 10 cancer-free individuals. This region includes the tumor suppressor gene DLEU7 (Deleted In Leukemia 7) and related genes, DLEU1 and DLEU2, the latter harboring two microRNAs within one of its introns (miR-15a and miR-16-1)24-26. The retinoblastoma gene, RB1, was also included within a subset of the mosaic deletions of 13q14. It cannot be ruled out that these individuals have either undiagnosed CLL or MBL. The 20q- was seen in two individuals with myeloid leukemia, as has been described previously27, but also in cancer-free individuals and individuals with solid tumors.
The accuracy of our software methods to detect clonal mosaic abnormalities was previously addressed and we were able to validate 100% of 42 events in 34 individuals from the Spanish Bladder Cancer Study using confirmatory cytogenetic assays16 (Supplementary Figure 3). We have also performed a comparison of mosaic events in samples from the EAGLE and PLCO lung cancer studies which were independently analyzed as part of the Gene, Environment Association studies consortium (GENEVA) report on mosaic events28. A total of 83 mosaic events in individuals from the EAGLE and PLCO lung cancer studies were detected in common, 20 additional events of size less than 2 Mb and 8 events greater than 2 Mb were detected by GENEVA and not by our study, while we detected 20 additional events (size > 2 Mb) that were not detected by GENEVA. Although additional cytogenetic or molecular validation was not performed, neither method detected notable false-positive events based on manual review of the data. The concordance rate is 75% if considering events > 2 Mb (the cut-off for this analysis) or 63% if considering all events, both of which are considerably better than the 25-50% concordance rates observed across CNV detection methods29-31. Our method is more conservative in the size of events detected, while the GENEVA method is more conservative with respect to sample quality, but provides calls for smaller events when assay quality is sufficient. Better approaches are needed to characterize smaller size events accurately as either mosaic or constitutional and to estimate their frequency. Further improvements to data normalization, segmentation and event classification methods will also likely reduce false-negative rates.
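The concordance rates quoted above can be reproduced from the event counts in this paragraph. The sketch below assumes that concordance is defined as the number of events detected by both methods divided by the number detected by either method, and that all 83 shared events exceed 2 Mb (since our method reports only events above that size); the function and variable names are illustrative.

```python
def concordance(shared, only_first, only_second):
    """Fraction of events found by both methods among events found by either."""
    return shared / (shared + only_first + only_second)


# Event counts taken from the EAGLE and PLCO comparison described above.
SHARED = 83             # detected by both methods
GENEVA_ONLY_SMALL = 20  # GENEVA only, size < 2 Mb
GENEVA_ONLY_LARGE = 8   # GENEVA only, size > 2 Mb
OURS_ONLY_LARGE = 20    # this study only, size > 2 Mb

large_events = concordance(SHARED, GENEVA_ONLY_LARGE, OURS_ONLY_LARGE)
all_events = concordance(
    SHARED, GENEVA_ONLY_LARGE + GENEVA_ONLY_SMALL, OURS_ONLY_LARGE
)

print(f"events > 2 Mb: {large_events:.1%}")
print(f"all events:    {all_events:.1%}")
```

With this definition the two rates are 83/111 and 83/131, which round to the 75% and 63% reported above.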
Our study has important implications for the design and analysis of molecular epidemiology studies in cancer as well as the somatic characterization of cancer genomes, as in The Cancer Genome Atlas32 and the International Cancer Genome Consortium33. Investigators will need to carefully analyze samples used as exemplars of germline DNA for somatic alterations, such as detectable clonal mosaicism. Otherwise, comparisons between “germline” and tumor DNA may result in implausible somatic changes (e.g. large gains of heterozygosity) and it may be impossible to determine whether somatic events pre-date changes secondary to driver mutations. Because the detection of mosaic events with next-generation sequencing technologies is neither routine nor well understood, it may be prudent for the near future to continue to use SNP microarrays for such analyses. Due to the increased frequency of detectable clonal mosaicism with age, this will be particularly important for the analysis of epithelial cancers, which characteristically occur in the older population. For future large-scale GWAS in prospective studies, it may be wise to consider analyzing the earliest, pre-diagnosis DNA samples and to consider time from collection to diagnosis in the analysis of longitudinally collected biospecimens.
We have extended our initial observation that detectable clonal mosaicism of the autosomes is present in the population with surprising frequency and particularly in the aging genome. A recent study of detectable clonal mosaicism in twins reported an increase in frequency with age and suggested that this reduction could lead to a less diverse blood cell population and immune system15. These emerging data raise a number of critical issues in mechanisms underlying the possible shift in the repertoire of clones with large structural abnormalities. Thus cells with abnormal karyotypes could have an early developmental origin in which a somatic event in a single stem cell progenitor during embryogenesis could become apparent when cellular diversity decreases with age and cell populations become increasingly oligoclonal. Higher rates of detectable clonal mosaicism in older cancer-free individuals could also be due to increased rates of somatic mutation or diminished capacity for genomic maintenance, such as with telomere attrition34 leading to proliferation of somatically altered cell populations. A survival bottleneck of cellular progenitors could also lead to observable mosaic alterations that were previously below the threshold of detection but subsequently expanded due to positive selection. Further work is required to begin to unravel the underlying mechanisms that result in mosaic abnormalities, particularly as it relates to how and when altered clones are created, tissue-specificity, and the timing and expansion of distinct populations of cells with age. Finally, these findings underscore the importance of considering the role and time-dependent nature of somatic events in the etiology of cancer as well as other late-onset diseases.
This research was supported by the Intramural Research Program and by contract number HHSN261200800001E of the USA National Institutes of Health, National Cancer Institute. Support for each contributing study is listed in the Supplementary Acknowledgement Section. We thank Cathy Laurie and Bruce Weir for constructive discussion and a comparison of methodology and results for the GENEVA study. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the National Cancer Institute, the National Institute for Occupational Safety and Health, or the Maryland Cancer Registry.
K.B.J., M.Y., WY.Z., Z.W., X.D., C.L., S.W., N.E.C., M.T., N.R., and S.J.C. designed the study.
K.B.J., M.Y., L.P.-J., WY.Z., Z.W., S.W., N.E.C., N.R., M.G.C., M.C.D., D.A., B.I.G., R.N.H., F.X.R., and S.J.C. interpreted the primary results.
K.B.J., M.Y., L.P.-J., B.R.-S., and J.R.G. developed the study methods.
K.B.J., M.Y., L.P.-J., WY.Z., Z.W., X.D., C.L., M.G.C., C.G.E., M.C.D., N.C., J.S., and C.C.C. analyzed the data.
K.B.J., M.Y., WY.Z., Z.W., X.D., C.L., A.H., L.B., and J.K. were responsible for production and analysis of the genotype data.
K.B.J., M.Y., and S.W. performed statistical analysis.
K.B.J., M.Y., S.W., M.-J.H. and S.J.C. drafted the manuscript.
M.T., R.N.H., S.J.C. and J.F.F. provided vital programmatic and institutional support.
J.R.G., N.E.C., M.T., N.R., S.J.C., S.M.G.,V.L.S., L.T.T, M.M.G., D.A., S.J.W. J.V., P.R.T., N.D.F., C.C.A., A.M.G., N.H.,K.Y., J-M.Y., L.L., T.D., Y-L.Q., Y-T.G.,W-P.K., Y-B.X., Z-Z.T., J-H.F., M.C.A., C.A., W.J.B., C.H.B., E.M.G., C.C.H., C.A.H., B.E.H., L.N.K., L.L.M., L.H.M., B.A.R., A.G.S., L.B.S., M.R.S., J.K.W., M.W., X.W., K.A.Z., R.G.Z., J.D.F., M.G-C., N.M., G.M., L.P-O., D.B., M.S., A.J., M.T.L., L.G., D.C., P.A.B., M.R., P.R., U.A., L.E.B.F.,C.D.B., J.E.B., M.A.B., T.C., M.F., A.A., J.M.G., G.G.G., G.H., S.E.H., P.H., R.H., P.D.I., C.J., A.L., R.M-C., D.S.M., B.S.M., U.P., A.M.R., H.D.S., G.S., X-O.S., K.V., E.W., A.W., A.Z-J., W.Z., D.T.S., M.K., O.V., D.L., E.J.D., H.A.R., S.H.O., C.K., B.M.W., L.J., M.H., W.W., A.A.A., H.B.B-d-M., C.S.F., S.G., M.D.G., E.A.H., A.P.K., A.LC., M.T.M., G.P., M-C.B-R., P.M.B., F.C., K.C., M.C., E.L.G., M.G., J.A.H.B., M.J., K-T.K.,V.K.,R.C.K., R.R.M., J.B.M., K.G.R., E.R., A.T., G.S.T., D.T., J.W.E., H.Y., L.A., R.Z.S-S., P.K., F.S., D.S., S.A.S., L.M., I.L.A., J.S.W., A.P.G., L.S., D.A.B., R.G.G., M.P., WH.C., L.E.M., K.L.S., F.G.D., A.W.H., S.I.B., A.B., N.W., L.A.B., J.L., B.P., K.A.M., M.B.C., B.I.G., C.P.K., M.H.G., R.L.E., D.J.H., G.T., R.N.H., F.X.R., and J.F.F. contributed data or samples.
All authors contributed critical feedback, review, and approval of the manuscript.
Disclosures: BRS and OV are currently employees of the qGenomics company while LAPJ is a member of its scientific advisory board.