We have documented the presence of 13 regions of about 0.5 Mb each in the human genome where deletions occur at up to 100-fold higher frequency than in the other 99.8% of the genome. We also present suggestive evidence for the existence of as many as 30 or more such hotspots throughout the genome (Poisson distribution, Supporting Information S1). With the exception of 2p16.3 (the NRXN1 gene), we found approximately equal numbers of hotspot deletions in all samples, and we present (below) indications from the published literature that the hotspots are also found in populations other than the Quebec Founder Population. Their existence is therefore probably a universal phenomenon.
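The Poisson analysis itself is given in Supporting Information S1 and is not reproduced here. As a rough illustration of the underlying idea only, the following Python sketch assumes the genome is tiled into equal-size windows (for example 0.5 Mb), each with a count of independent deletions. If deletions fell uniformly at random, per-window counts would follow a Poisson distribution with the genome-wide mean, so an excess of heavily hit windows points to hotspots. The helper names (`poisson_sf`, `excess_windows`) and the toy counts are hypothetical and are not the study's method or data.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    # Sum the pmf for 0..k-1 and take the complement.
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


def excess_windows(counts: list[int], k: int) -> tuple[int, float]:
    """Observed vs. expected number of windows carrying >= k deletions.

    Hypothetical helper: `counts` holds independent-deletion counts per
    equal-size genomic window. The expectation assumes deletions fall
    uniformly at random, so each window's count is Poisson with mean
    equal to the genome-wide average per window.
    """
    n_windows = len(counts)
    lam = sum(counts) / n_windows            # mean deletions per window
    expected = n_windows * poisson_sf(k, lam)
    observed = sum(1 for c in counts if c >= k)
    return observed, expected


# Toy data (not from this study): 100 windows, 10 deletions, 6 of them in one window.
toy = [0] * 95 + [1] * 4 + [6]
obs, exp_ = excess_windows(toy, k=3)
print(obs, round(exp_, 4))
```

With real data, comparing observed and expected counts at successive thresholds k could give a crude sense of how many windows exceed random expectation.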
By filtering out frequently observed deletions, we eliminated from consideration common deletions existing in the population (by analogy with SNPs, probably those that arose a relatively long time ago and against which there is no negative selection), as well as rare but recurring deletions arising between repeated elements such as segmental duplications, like those in 16p11.2 associated with autism. In so doing, we maximized the chance of finding regions where deletions frequently occur for reasons other than the presence of repeated elements.
The hotspot in 20p12.1 is particularly noteworthy. A total of 27 independent deletions were documented in the 4 population samples, 25 of which were seen only once. The precision of the deletion borders is not in question, as the log R ratio graphs show how cleanly the first and last SNP of each deletion could be called (illustrated in Supporting Information S1). The instability of the region is underlined by the detection of 3 de novo deletions among the 440 ADHD children, one of which is adjacent to another deletion inherited from the father.
Figure: Illumina Genome Viewer display of log R ratios in the 20p12 hotspot region in one trio.
Although this is, to our knowledge, the first genome-wide description of genomic instability at this level, our conclusions are supported by published work in a variety of ways. First, deletions in two of the genes residing in hotspots have been intensively studied by many groups because of their roles in disease. Deletions in the PARKIN gene (Chr6q26) are involved in about one-half of familial early-onset Parkinson's disease (PD) cases; as reviewed by Hedrich et al., exons 3 and/or 4 (corresponding to the hotspot in 6q26) are deleted in 50% of cases, whereas exons 1 and 10, farthest from the hotspot we identify, contribute less than 2% of all exon deletion events. Similarly, more than 90% of the 66 deletions found in the 1.1 Mb NRXN1 region of Chromosome 2 by Rujescu et al. cluster in the same 0.5 Mb where we have identified deletions.
Second, in a genome-wide study similar to ours, several hundred heterozygous deletions were detected in 810 individuals, at about the same frequency as in the present study. When recurring deletions were removed and the remainder analyzed for clustering (using our data handling; not shown), patterns very similar to ours were obtained in almost exactly the same regions, especially in 20p12.1, 9p23 and 10q21.3 (6 or 7 independent deletions each). Third, the DGV database (http://projects.tcag.ca/variation/; UCSC Genome Browser) typically documents many deletions at the hotspot sites we have identified. The inevitable imprecision in mapping deletion endpoints with the available CNV-calling algorithms makes it difficult to assess whether these represent independent events rather than the same common deletion detected with variable apparent endpoints. Nevertheless, the high frequency with which these events are detected probably reflects the hotspot nature of the genomic domain in at least some instances. Finally, as discussed below, deletions are found in primary tumors as well as tumor-derived cell lines in several of the hotspots we define here, notably 20p12 and 6q26.
It is of interest to place these findings in the context of what is known about hotspots of chromatin instability, and deletions in particular. One of the few such regions characterized well enough to allow estimates of deletion frequency is in the DMD gene, which at about 2.4 Mb is the longest known gene (it was not formally included in our study because it is X-linked rather than autosomal). The incidence of Duchenne muscular dystrophy is 1 per 3500 males, and it is known that 1/3 of cases are attributable to de novo mutations, of which 60% are deletions. A further 6% are duplications, which we do not consider in this discussion. Deletions arising in the major hotspot, involving introns 40–54 and covering 0.7 Mb (about the same as the length of our hotspots), comprise about 2/3 of all those known in DMD. This suggests a frequency of ascertainable de novo events of approximately 4×10⁻⁵ per 0.7 Mb. A number of deletions presumably occur in this region that do not affect exons and have therefore never been ascertained. Given that the median deletion length and median intron length in this region are about 60 kb and 36 kb respectively, we project that deletions lying entirely within introns occur at a frequency no greater than, and probably somewhat below, that of ascertainable deletions which knock out an exon. Thus we conservatively project a de novo deletion frequency of 8×10⁻⁵ per generation per chromosome per 0.7 Mb in the hotspot.
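The DMD projection above is a chain of multiplications, laid out here as a short Python sketch using only the figures quoted in the text. It follows the text in treating each affected male as one transmitted X chromosome, and in taking intron-only deletions as at most equal in frequency to exon-disrupting ones (hence the factor of 2). The variable names are illustrative.

```python
# Figures quoted in the text for Duchenne muscular dystrophy (DMD)
incidence = 1 / 3500         # affected males per male birth
frac_de_novo = 1 / 3         # share of cases due to new mutations
frac_deletion = 0.60         # share of new mutations that are deletions
frac_major_hotspot = 2 / 3   # share of DMD deletions in introns 40-54 (~0.7 Mb)

# Ascertainable de novo deletions per transmitted X chromosome per 0.7 Mb
ascertained = incidence * frac_de_novo * frac_deletion * frac_major_hotspot

# Intron-only deletions are taken as at most as frequent as exon-disrupting
# ones, so doubling gives a conservative (upper) total.
projected_total = 2 * ascertained

print(f"{ascertained:.1e}", f"{projected_total:.1e}")  # the text rounds to 4e-5 and 8e-5
```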
In our study, we observed 5 de novo deletions in a total of 13 hotspots covering about 7 Mb in 440 individuals. If representative, this number reflects a rate of about 4×10⁻⁴ per generation per chromosome per 0.6 Mb (two parental chromosomes can contribute to a deletion at autosomal loci). This is at least five times the rate in the major DMD hotspot, and 2 or 3 orders of magnitude higher than in the remaining 99.7% of the genome, where we detected only 2 de novo events. (We project this number to represent 4 events in total because of the false-negative call rate; see SI.) In the most active hotspot (Chr20p12.1, where we found 3 de novo events), deletions occur at about 10 times the rate seen even in intron 49 of the DMD gene, where deletion clustering is densest. We therefore consider that at least some of the hotspots we describe are considerably more unstable than any that have been quantitatively defined to date. Consistent with this, only one deletion in the DMD gene hotspot was detected among our 2540 individuals (exon 48 in an individual in the EN sample; results not presented) and the 810 studied by Blauw et al. combined.
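The comparison of rates can be reproduced approximately from the counts given above. The Python sketch below assumes about 2900 Mb of autosomal sequence (an assumption, not a figure from the text) and, like the text, compares rates per average hotspot width (about 7/13 ≈ 0.54 Mb, rounded to 0.6 Mb in the text) rather than per Mb.

```python
import math

# Counts quoted in the text
children = 440
chromosomes = 2 * children      # both parental copies can carry a new deletion
hotspots = 13                   # ~7 Mb in total, i.e. ~0.54 Mb per hotspot
hotspot_de_novo = 5
background_de_novo = 4          # 2 observed, projected to 4 after false negatives

# Rate per generation per chromosome per (average) hotspot
per_hotspot = hotspot_de_novo / chromosomes / hotspots

# DMD major hotspot rate projected in the text (per 0.7 Mb). The text compares
# the two directly; normalizing per Mb would make the contrast larger.
dmd_rate = 8e-5

# Assumption: ~2900 Mb of autosomal sequence, of which ~7 Mb lies in hotspots
background_mb = 2900 - 7
window_mb = 7 / hotspots        # compare like with like: one average hotspot width
background = background_de_novo / chromosomes * window_mb / background_mb

fold_vs_dmd = per_hotspot / dmd_rate
fold_vs_background = per_hotspot / background
print(f"{per_hotspot:.1e}", round(fold_vs_dmd, 1), round(math.log10(fold_vs_background), 1))
```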
This work raises a number of intriguing questions at the fundamental level, one being: why do these hotspots exist? One possibility is that, as a consequence of some form of stress, a chromatin loop escapes its natural confines within the highly organized and compact nuclear structure, and that this event simply happens much more often at these sites. Alternatively, these high-frequency deletions may reflect some protective element for which positive selection has occurred. These scenarios are not mutually exclusive: there may be situations of stress in which a chromatin domain may (or must) undergo deletion, and it would be to the organism's advantage if the deletion occurred in a DNA domain of low coding-sequence density. In this way, the hotspots we have characterized could be considered hypothetical safety valves.
A second question concerns the molecular mechanism underlying the high frequency of deletions. Many chromosomal elements, such as low copy repeats (LCRs) and segmental duplications (SDs), which have been associated with structural alterations in diseases such as autism, neurofibromatosis and Sotos syndrome (OMIM), have been ruled out in the case of the DMD hotspot and, more recently, the chr6q26 hotspot. Likewise, on initial analysis we found no particular clustering of any of these elements with breakpoint hotspots in our collection. Recombination hotspots, invoked to explain some deletion patterns, are spread across the genome at intervals of tens to hundreds of kb (as visualized in the UCSC Genome Browser). Although they may be the preferred sites of breakpoints when structural alterations in specific genes lead to an identifiable phenotype, it is difficult to see how tens of thousands of such sites could explain the handful of deletion hotspots we have identified. Similarly, fragile sites appear not to be associated in a significant way: only two hotspot regions, 6q26 and 8p22, lie in bands with fragile sites (FRA6E and FRA8B, respectively), based on the summary of 113 sites in about 310 bands by Calin et al., and the latter probably does not overlap the hotspot. The only finding that may be pertinent to this discussion is the report of an increased incidence of double-strand breaks in intron 49 of the DMD hotspot when this region was transfected into yeast. This may reflect, for example, increased topoisomerase II (TOPO2) activity, but at present we have no evidence implicating such activity in any of the regions in question.
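As a rough check that fragile sites are not enriched in the hotspots, a binomial calculation can be sketched in Python under strong simplifying assumptions. Each of the 13 hotspots is treated as lying in a single band, bands are treated as independent, and the chance that a band carries a fragile site is taken as 113/310, which holds only if every one of the 113 sites lies in a different band (otherwise it is an upper bound, and the result depends on that assumption). Under these assumptions roughly 4 to 5 hotspots would be expected in fragile-site bands by chance, compared with the 2 observed.

```python
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_hotspots = 13
# Assumption: 113/310 is the fraction of bands with a fragile site if every
# site lies in a different band, and an upper bound on it otherwise.
p_band = 113 / 310
observed = 2              # 6q26 (FRA6E) and 8p22 (FRA8B)

expected = n_hotspots * p_band
# Probability of seeing at least the observed number by chance alone
p_enrichment = binom_sf(observed, n_hotspots, p_band)
print(round(expected, 2), round(p_enrichment, 3))
```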
At a more applied level, these data also have implications for gene-disease associations. The finding of rare deletions in or near coding sequences, especially if they arise de novo in probands, has often been accepted as de facto evidence that the affected gene may be involved in the condition in question, simply because of the expected low frequency of these events in the genome (examples below). Our findings indicate that this argument does not hold for deletions occurring in the hotspots we have documented, and since the Poisson distribution analysis (Supporting Information S1) indicates that other hotspots exist, this is probably also true for a number of other regions.
Nine of the documented hotspots carry genes, and every one of these genes has been implicated in disease. We propose that careful delineation of precise deletion (or amplification) boundaries in and around these genes will be useful, since at least some of the deletions may be present simply because of the unstable nature of the chromosomal domain rather than because they contribute to the phenotype by affecting gene function. In our samples, exons were unaffected in three of the nine genes, perhaps reflecting important roles for these genes in human health; however, because of the small numbers involved, we cannot draw conclusions from this observation. Nevertheless, the patterns of exon disruption in the other genes are somewhat informative, and the following paragraphs present some examples.
NRXN1 and Autism
Deletions in this gene have been implicated in neurological disorders, including autism and mental retardation, in anecdotal reports. A major family study of autism, on the other hand, found deletions that did not segregate with the condition, and the authors concluded that there was no association. We suggest that a close assessment of exon dosage in these families may reveal either an association, or the lack of one, between deleterious deletions and autism. One may postulate that most of the deletions in this region segregating in the families do not affect coding sequence and merely reflect the hotspot nature of the domain, whereas the few that actually disrupt exons may be shown to segregate with the condition. Our results are consistent with this scenario: 2 of 7 deletions in the SZ sample affected coding regions of the gene, and one de novo deletion removed an exon in an ADHD proband. None of the five deletions found in the other samples (EN, LG) affected exons. Similarly, a recent large study assessing CNVs in NRXN1 in more than 35,000 individuals found CNVs in the SZ group at a frequency 3 times higher than among controls, but in both patients and controls most CNVs did not affect exons.
MACROD2 and Kabuki Syndrome
One report suggests this gene as a candidate for Kabuki syndrome, since a de novo deletion involving exon 5 of this gene was found in a proband. Our finding of three individuals in the EN cohort with deletions of exons 5 and/or 6 reduces the likelihood that this proposed association is real, as a review of the files of each of the 3 individuals showed no Kabuki-like symptoms at all. The location of the gene in a deletion hotspot greatly increases the chance of sporadic exon deletions and may explain the chance finding of the deletion in the Kabuki proband. On the other hand, the incidence of exon-deleting mutations in the EN cohort (3 out of 9) compared with 0 of about 16 in the other cohorts suggests a possible involvement of this gene in EN.
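The EN contrast can be put in numbers with a one-sided Fisher exact test, sketched below in plain Python. This assumes the figures mean 3 exon-deleting deletions out of 9 deletions in EN versus 0 out of about 16 in the other cohorts (16 is taken literally here, although the text gives it as approximate). The test is an illustration, not an analysis reported in the text; `scipy.stats.fisher_exact(table, alternative='greater')` should give the same one-sided value.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric null with
    fixed row and column totals (i.e. enrichment in the first row).
    """
    row1 = a + b              # first-row total (here, EN deletions)
    col1 = a + c              # first-column total (exon-deleting, all cohorts)
    n = a + b + c + d
    total = comb(n, col1)
    return sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, min(row1, col1) + 1)
    ) / total


# Rows: EN, other cohorts; columns: exon-deleting, exon-sparing
p_en = fisher_one_sided(3, 6, 0, 16)
print(round(p_en, 4))
```

With these inputs the one-sided p-value is 84/2300, about 0.037: suggestive rather than conclusive, given the small numbers and the approximate denominator.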
This region of the genome has also been implicated in colorectal cancer, with the report that 23% of primary tumors and 55% of cell lines had undergone deletion events, with a consensus minimum region of loss at 14.85–15.05 Mb, coinciding with the hotspot we define here. This group provided evidence that RNA molecules encoded in the region may have tumor-suppressor activity, but it is also likely that the high frequency of deletions is in part attributable to the instability of the region.
CTNNA3 and Alzheimer's Disease
The hotspot on chromosome 10 falls in the 3′ half of this gene. Exons were affected in a substantial proportion of the deletions, including four deletions in the LG cohort that would produce a frameshift (all 4 subjects were mentally alert). Genetic studies have associated this gene with late-onset Alzheimer's disease in women. The results reported here suggest that if CTNNA3 is involved in Alzheimer's disease, it is not through a loss-of-function mechanism.
PARK2/PARKIN and PD
The association of this gene with familial early-onset PD is well established, since it is homozygously mutated in about 50% of such cases. The gene is also mutated in a proportion of later-onset PD, but it is currently uncertain whether single-copy deletions in fact predispose to this condition. Our results may be pertinent to this debate. Of the 2540 unrelated individuals in our studies, 22 (0.8%) carry deletions in this gene; exons are affected by 10 of the 15 deletions in the ADHD, EN and SZ samples (virtually all of whom are under the usual age of PD onset), but by only one of the 6 deletions in the LG sample, none of whom had PD despite advanced age (p<0.05). These results are not inconsistent with a role for PARKIN deletions in late-onset PD, and follow-up of patients carrying deleterious deletions like those we have found may help resolve this issue.
One of the hallmarks of a tumor-suppressor gene (TSG) is the presence of deletions in tumors that affect coding sequence, as was seen with the prototypical TSG, RB1 [reviewed in 20]. If the deletion is inherited, the classic pattern observed is the formation of multiple tumors in the susceptible tissue, since each cell is in principle predisposed to cancer. The search for deletions in tumors has produced many TSG candidates, and some of the sites frequently reported coincide with the hotspots described here. In particular, a paradox has arisen in the case of the PARKIN gene, which our data may help resolve. This gene is described by some groups as a TSG, mainly on the strength of the frequency of exon-disrupting deletions in cancer, but patients with PD appear to be, if anything, protected from most cancers. In the extreme situation of individuals inheriting two mutated alleles (causing early-onset Parkinson's disease), one would expect the appearance of multiple tumors, an observation that has not been reported. However, if deletions occur at high frequency at certain chromosomal sites in cancer merely as a consequence of the unstable nature of the chromatin domain, their appearance would not justify attributing tumor-suppressive function to the gene product. Similar arguments apply to the very high frequency of deletions in Chr20p12.1 in colorectal cancer, which overlap at 14.85–15.05 Mb, within the most active hotspot we have found. CTNNA3 and TUSC3, which have also been cited as candidate TSGs for the same reason, could likewise have their status as TSGs re-evaluated in light of our results, given that for each of these genes we have identified a number of cancer-free adult individuals with exon-disrupting deletions.
In general, therefore, the existence of hotspots with the properties we present here should be incorporated into any interpretation of deletion data concerning the genes associated with these hotspots. In some instances, it may become appropriate to incorporate exon-dosage assays in evaluating individuals' risk and potential treatment scenarios.