Acentric inverted duplication (inv dup) markers, the largest group of chromosomal abnormalities with neocentromere formation, are found in patients both with idiopathic mental retardation and with cancer. The mechanism of their formation has been investigated by analyzing the breakpoints and the genotypes of 12 inv dup marker cases (three trisomic, six tetrasomic, two polysomic and one X chromosome derived marker) using a combination of fluorescence in situ hybridization, quantitative SNP array and microsatellite analysis. Inv dup markers were found to form either symmetrically with one breakpoint or asymmetrically with two distinct breakpoints. Genotype analyses revealed that all inv dup markers formed from one single chromatid end. This observation is incompatible with the previously suggested model by which the acentric inv dup markers form through inter-chromosomal U-type exchange. On the basis of the identification of DNA sequence motifs with inverted homologies within all observed breakpoint regions, a new general mechanism is proposed for the acentric inv dup marker formation: following a double-strand break an acentric fragment forms, during either meiosis or mitosis. The open DNA end of the acentric fragment is stabilized by the formation of an intra-chromosomal loop promoted by the presence of sequences with inverted homologies. Likely coinciding with the neocentromere formation, this stabilized fragment is duplicated during an early mitotic event, ensuring the marker’s survival during cell division and its presence in all cells.
The evolution of human chromosomes has involved chromosomal rearrangements, losses and gains of genomic material, repositioning of centromeres and the formation of new centromeres at de novo sites (neocentromeres) stabilizing chromosomal fragments. The mechanisms that shaped modern genomes over millions of years are still operational today. One group of chromosomal rearrangements resulting in the addition of genomic material is accessory acentric chromosomes, which are chromosomal fragments that have lost the normal centromere and survive cell division only after the formation of a neocentromere.
Initially, acentric markers were identified in patients with idiopathic mental retardation, but more recently they have also been found in cancer cells (1–4). So far, ~90 neocentric acentric marker chromosomes have been described, and can be separated into two general groups (5,6). The first group results in an unbalanced karyotype due to an accessory inverted duplication (inv dup) of the distal part of a chromosome arm (class I marker chromosomes). In the second group, there is a balanced karyotype and the marker is either a linear or ring chromosome resulting from an interstitial deletion (class II marker chromosomes). Class I markers represent approximately three quarters of all neocentric cases. In 80% of the class I marker cases, an inv dup marker is present in addition to two normal chromosomes, resulting in tetrasomy for the terminal chromosomal region present on the marker (tetrasomic cases) (5). In the remaining 20% of the class I marker cases, the inv dup marker occurs with one normal chromosome and one deleted derivative chromosome with the deleted portion assumed to be complementary to the region that is present on the inv dup marker; thus leading to three copies of that region in the genome (trisomic cases). It is currently unclear how class I markers of trisomic and tetrasomic cases are formed.
Neocentromere formation is assumed to be facilitated by epigenetic processes. It has been suggested that it does not occur synchronously with the formation of the supernumerary chromosome fragment at meiosis, but is instead assumed to take place early in the zygote (7). Some neocentromeric markers have been inherited (8); it is therefore believed that once a neocentromere is formed, it is stable during mitosis and meiosis (9–11).
The purpose of this study was to identify the mechanism that underlies the formation of acentric inv dup (class I) markers. Using fluorescence in situ hybridization (FISH), SNP arrays and genotyping methodologies, we demonstrate that inv dup markers, from both trisomic and tetrasomic cases, are formed from two copies derived from the same genotype. Sequences with inv dup homologies or palindromic sequences were found in all the breakpoints, which allow us to propose a new mechanism: after a double-strand break (DSB), a single chromatid end occurs during meiosis or mitosis. In an intermediate stage, the acentric fragment forms a hairpin loop, and following neocentromere formation the fragment is able to survive cell division, resulting in the generation of the acentric inv dup marker after replication.
To obtain insights into the formation of type I marker chromosomes, 12 acentric inv dups representing different type I marker subgroups were analyzed: six autosomal tetrasomic (markers -1, -4, -5, -9, -10, -11), three autosomal trisomic (markers -6, -7, -8), two autosomal polysomic cancer cases (markers -2 and -3) and one case with a chromosome X derived marker (marker-12). Information about the different markers, copy number, cell type, observed incidence and the chromosomal origin of the markers are listed in Supplementary Material, Table S1.
To characterize the breakpoints of the inv dup marker chromosomes and to be able to identify sequence motifs at the breakpoints that potentially contribute to their formation, FISH and quantitative SNP array analyses were performed. The analysis of the breakpoint region of marker-5 is shown in Fig. 1. The breakpoints suggested by FISH (Fig. 1A–C) and quantitative SNP array (Fig. 1D) overlapped, suggesting that the quantitative SNP array analysis allows accurate breakpoint determination on inv dup markers. Breakpoint regions of the other markers were determined in a similar way. Overall, 14 breakpoints were identified on 12 marker chromosomes. Descriptions of these breakpoints are listed in Table 1.
The breakpoints could be narrowed to a median size of ~67 kb (range 30–169 kb) and in all but one case (marker-11) the breakpoints determined by SNP array and FISH were overlapping. For the only exception (marker-11), the results of both methods indicated one breakpoint, but the location of the breakpoint determined with FISH was ~140 kb centromeric to the breakpoint determined by the array. This could be due to the fact that the cell line containing marker-11 was mosaic (25–30%). Breakpoints determined by SNP arrays for mosaic cases could be less accurate because the threshold was lowered to better detect the less prominent signal difference present at the breakpoint of the accessory inv dup marker.
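The effect of mosaicism on the array signal can be made concrete with a short calculation. The sketch below is an illustration only, not the analysis pipeline used in this study: it assumes a simple mixture of cells with and without the marker, and the function names are hypothetical. It shows why the copy number step at the breakpoint becomes shallow when the marker is present in only 25–30% of cells.

```python
import math


def mean_copy_number(marker_fraction: float, copies_on_marker: int = 2,
                     normal_copies: int = 2) -> float:
    """Mean copy number of the marker region across a mixed cell population.

    marker_fraction  -- fraction of cells carrying the marker (0.0 to 1.0)
    copies_on_marker -- an inv dup marker carries the region twice
    normal_copies    -- copies on the normal chromosomes (2 in a tetrasomic case)
    """
    if not 0.0 <= marker_fraction <= 1.0:
        raise ValueError("marker_fraction must lie between 0 and 1")
    return normal_copies + copies_on_marker * marker_fraction


def log2_ratio(copy_number: float, reference_copies: int = 2) -> float:
    """Log2 ratio of the observed copy number against a disomic reference."""
    return math.log2(copy_number / reference_copies)


if __name__ == "__main__":
    # Non-mosaic case versus the 25-30% mosaicism reported for marker-11.
    for fraction in (1.0, 0.30, 0.25):
        cn = mean_copy_number(fraction)
        print(f"marker in {fraction:.0%} of cells: "
              f"mean copy number {cn:.2f}, log2 ratio {log2_ratio(cn):.2f}")
```

Under these assumptions a non-mosaic tetrasomic region averages four copies (log2 ratio 1.0), whereas at 25% mosaicism the average is 2.5 copies (log2 ratio of about 0.32), so a lower detection threshold is needed and the breakpoint position becomes more sensitive to noise.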
For 10 of the 12 markers, one single breakpoint region was identified, suggesting that most inv dup markers have two symmetrical arms with identical length. In two other cases (markers -4 and -12), the supernumerary markers were more complex with two breakpoints in each (for marker-12, see Fig. 2). These two breakpoints were separated by a single copy stretch of several million base pairs (5.66 Mb on marker-4 and 31.27 Mb on marker-12) resulting in regions with both tetrasomy and trisomy for marker-4 and regions with trisomy and disomy for marker-12, respectively (Table 1). The observation of symmetric and asymmetric inv dup markers suggests either that these two different inv dup groups are formed by different sequence motifs present at the breakpoint or that both groups are formed by similar sequence motifs, which just happen to be separated by sequences of different length.
In order to determine whether different markers form through similar mechanisms, the degree of mosaicism was assessed in cases containing trisomic and tetrasomic class I markers. The three trisomic cases (markers -6, -7, -8) showed a marker incidence of 100% in the analyzed tissues (no mosaicism) (Supplementary Material, Table S2), whereas the six tetrasomic cases (markers -1, -4, -5, -9, -10, -11) were present in 2–100% of the cells examined. The data suggest that mosaicism is not found in cases with trisomic class I marker chromosomes.
It is currently assumed that cases with trisomic inv dup markers contain the missing portion from the deleted chromosome (5). This was true for the markers of two of the trisomic cases (markers -7 and -8), which contained a chromosome with a deletion complementary for the region of duplication on the marker, leading to trisomy for the entire length of the marker. However, the third trisomic case (marker-6) (Fig. 3A) had a deleted chromosome 9 along with a supernumerary marker that did not contain the entire deleted region. Array (Fig. 3B) and FISH analyses (Fig. 3C and D) detected a region of 266 kb that was not present on the deleted derivative or on the inv dup marker (Fig. 3E) and was therefore monosomic in the patient. The loss of this 266 kb piece of DNA likely occurred during the formation of the deletion or the inv dup marker. A search of web-based variation databases and the literature did not reveal any copy number variations in this region, supporting our assumption.
Acentric inv dup markers could be formed in different ways: from a single chromatid end or two chromatid ends through inter- or intra-chromosomal U-type exchange. In order to distinguish between these models, the ratios for the A and B alleles for all the SNPs present on both copies of the inv dup markers were calculated and compared with theoretically possible ratios given the two different models of acentric inv dup marker formation (Supplementary Material, Fig. S1). A combination of SNP array (Fig. 4) and microsatellite analyses was performed. The genotype analysis allowed us to distinguish the possible haplotype combinations (Supplementary Material, Fig. S1). The results suggest that both arms of acentric inv dup markers originated from the same haplotype, as described below.
Table 2 shows all the genotype ratios determined experimentally for the A and B alleles. In all of the six tetrasomic cases (markers -1, -4, -5, -9, -10, -11), only two ratios (0:4 and 1:3) were observed (shown for marker-10 in Fig. 4A and B; Table 2), a distribution pattern predicted by a model in which the marker contains the same haplotype as one of the normal chromosomes in the same cell. The 0:4 and 1:3 ratios suggest that the SNPs on both arms of the marker chromosome were identical, originating from the haplotype of one of the normal chromosomes present in the genome (Supplementary Material, Fig. S1). Thus, these markers formed from one chromosome end and not through an inter-chromosomal U-type exchange. An exception was marker-4, in which a 2:2 ratio was observed in addition to 3:1 and 4:0 ratios (Fig. 4C and D, Table 2) suggesting that a third haplotype must be present in the genome (Supplementary Material, Fig. S1B). Figure 4E shows that the 2:2 ratio was distributed throughout the analyzed tetrasomic region excluding the possibility that this ratio could have resulted after a meiotic cross over between two different haplotypes within that region. Because a 2:2 ratio could have formed in different ways (Supplementary Material, Fig. S1B), and to clarify if a third or even a fourth haplotype was present, we genotyped eight different microsatellites using genomic DNA from the individual with marker-4. Alleles of three different sizes were detected for one microsatellite, D8S264, located on both arms of the marker (data not shown). This suggested that three haplotypes are present in the genome, leading to the assumption that two different alleles were located on the normal chromosomes and the third on the marker, and that the marker was formed by two copies from the same haplotype. Therefore, marker-4 could have been inherited from a previous generation, or more likely it derived from the second chromosome of one parent. 
This second chromosome, carrying the third haplotype, was not inherited by the patient, but a fragment of it was, possibly originating from a meiosis I error, and this fragment may subsequently have formed the acentric marker.
For the three trisomic cases (markers -6, -7 and -8), ratios of 0:3 and 1:2 were observed, making it impossible to distinguish whether the marker was formed from the missing part of the deleted derivative or from two different haplotypes after an inter-chromosomal U-type exchange. To determine if an additional third haplotype was present, which one would expect after a U-type exchange between two chromatids on two different chromosomes, microsatellites were genotyped along the length of these markers. All eight tested microsatellites on markers -6, -7 and -8 revealed only two different allele sizes in the genome (data not shown), indicating that only two haplotypes are present and therefore making an inter-chromosomal U-type exchange mechanism unlikely. Furthermore, the data suggest that inv dup markers from trisomic cases derive from the chromosome end that was lost on the deleted chromosome.
Both markers -2 and -3 were observed in a cancer cell line (1) suggesting a mitotic origin of these markers. Marker-2 originated from chromosome 3 and marker-3 from chromosome 9. There are two copies of marker-2 and three copies of marker-3 per cell. The plots of the A to B ratios for both markers (Fig. 4F and G) show that only one haplotype is present in the genome, suggesting that the other haplotype was lost somatically (loss of heterozygosity). In addition, it also suggests that both arms of these two inv dup markers again are identical.
The case with marker-12 was diagnosed with Turner syndrome, and the karyotype presented as 45,X with an Xq-derived acentric inv dup (12). The trisomic portion on the asymmetric marker-12 showed only a 3:0 ratio (Fig. 4H and I), suggesting that the X chromosome and both copies on the inv dup marker share an identical haplotype. Both copies on this inv dup marker were identical and therefore must have been derived from the X chromosome present in the genome. This observation strongly suggests that the marker originated from one single chromosome end and, again, not through an inter-chromosomal U-type exchange. The latter mechanism would result in the formation of an inv dup marker in addition to either a dicentric or deleted derivative, neither of which was observed in any of these cases.
The combined data from all inv dup markers suggest that acentric inv dup markers form through a mechanism that uses one single chromatid end, and not through U-type exchanges, in which two chromosome ends interact to give rise to the marker.
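The logic of the genotype comparison can be illustrated by enumerating the A:B ratios that each haplotype configuration permits at a biallelic SNP. The following sketch is a simplified reconstruction under stated assumptions, not the procedure behind Supplementary Material, Fig. S1: haplotype labels such as `h1` are hypothetical, recombination within the region is ignored, and ratios are reported unordered (1:3 and 3:1 are treated as the same class).

```python
from itertools import product


def allele_ratios(normal_haplotypes, marker_arm_haplotypes):
    """Return the set of unordered A:B ratios possible at a biallelic SNP.

    Each argument lists the haplotype label carried by every copy of the
    region: one entry per normal chromosome and one per marker arm.
    """
    copies = list(normal_haplotypes) + list(marker_arm_haplotypes)
    haplotypes = sorted(set(copies))
    ratios = set()
    # Try every assignment of allele A or B to each distinct haplotype.
    for alleles in product("AB", repeat=len(haplotypes)):
        assignment = dict(zip(haplotypes, alleles))
        a_count = sum(assignment[h] == "A" for h in copies)
        b_count = len(copies) - a_count
        ratios.add(f"{min(a_count, b_count)}:{max(a_count, b_count)}")
    return ratios


if __name__ == "__main__":
    scenarios = {
        "tetrasomic, single chromatid end": (("h1", "h2"), ("h1", "h1")),
        "tetrasomic, inter-chromosomal U-type": (("h1", "h2"), ("h1", "h2")),
        "tetrasomic, marker with third haplotype": (("h1", "h2"), ("h3", "h3")),
        "trisomic, single chromatid end": (("h1",), ("h2", "h2")),
        "trisomic, inter-chromosomal U-type": (("h1",), ("h2", "h3")),
        "X-derived, single chromatid end": (("h1",), ("h1", "h1")),
    }
    for label, (normals, arms) in scenarios.items():
        print(f"{label}: {sorted(allele_ratios(normals, arms))}")
```

Under these assumptions, a tetrasomic marker built from one chromatid end of a normal chromosome yields only 0:4 and 1:3, whereas a marker with one arm from each homolog yields 0:4 and 2:2. A marker carrying a third haplotype on both arms adds 2:2 to the 0:4 and 1:3 classes, as seen for marker-4. For trisomic cases both models give 0:3 and 1:2, which is why microsatellite genotyping was needed to count haplotypes.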
In order to explain the formation of inv dups, we assessed the general sequence architecture (Supplementary Material, Table S3) of the 14 breakpoint regions by sequence alignments to identify sequence pairs or single palindromic sequences with inverted homologies (data not shown). For markers with two breakpoints, both breakpoint regions were aligned. When only one breakpoint region was observed, the DNA sequence was aligned with its own reversed sequence.
Sequence pairs with inverted homologies could lead to an inv dup marker with two arms of unequal length, whereas inverted repeat sequences could lead to inv dup markers with two arms of identical length. We initially examined long pairs of sequences in opposite orientation that showed a ≥70% inverted homology to each other. A summary of this analysis is given in Table 3.
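The notion of inverted homology used here can be sketched in a few lines. The example below is a deliberately crude, ungapped stand-in for the sequence alignments performed in this study (which are not reproduced here); the function names are hypothetical. It scores two equal-length sequences by comparing one with the reverse complement of the other, and scores a single sequence against its own reverse complement to estimate how palindromic it is.

```python
# Translation table mapping each base to its complement (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def inverted_homology(seq_a: str, seq_b: str) -> float:
    """Percent identity of seq_a with the reverse complement of seq_b."""
    return percent_identity(seq_a, reverse_complement(seq_b))


def palindromic_identity(seq: str) -> float:
    """Percent identity of a sequence with its own reverse complement."""
    return inverted_homology(seq, seq)


if __name__ == "__main__":
    print(palindromic_identity("GAATTC"))       # perfect palindrome
    print(inverted_homology("AAGG", "CCTT"))    # perfect inverted pair
    print(inverted_homology("AAGG", "CCTT") >= 70.0)  # threshold used in the text
```

A real analysis of kilobase-scale repeats needs gapped local alignment, because insertions and deletions shift the register and an ungapped comparison then underestimates the homology.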
Low copy repeats (LCRs) often coincide with breakpoints found in chromosomal abnormalities (13,14) and were present at the two breakpoints of the asymmetric marker-4. One 9.5 kb long LCR (no. 23429) was present in both breakpoints in opposite orientation [chr8:6909899–6919457 (+) and chr8:12608622–12617968 (−)]. The inverted homology of these two LCRs was ~92% (for the alignment of these two sequences, see Fig. 5A). The presence of these two sequences in both breakpoints suggests that they contributed to the inv dup marker formation. Marker-12, the other asymmetric marker in this study with two distinct breakpoints, did not show any LCRs at either breakpoint. The closest LCR outside of the upper breakpoint region was 11 kb centromeric to the breakpoint; the closest LCR to the lower breakpoint region was 90 kb telomeric to the breakpoint. Alignments of those distant LCRs revealed that they did not share any sequence homology to one another (data not shown). None of the remaining breakpoints contained LCRs within the breakpoint regions or within 500 kb of the breakpoint (data not shown). The combined data suggest that sequence motifs other than LCRs contribute to inv dup formation. For the symmetric marker-7, various LCRs spanned the breakpoint region. However, homology searches of these sequences did not reveal inverted homologies within the breakpoint or in close proximity (data not shown), suggesting that sequences other than LCRs may also have participated in the formation of this marker.
In order to identify other sequences that are not LCRs, the two breakpoint regions of marker-12 were aligned with each other and two sequences with high inverted homologies were identified that were 785 bp in length and 73% identical (Fig. 5B). These sequences were part of two long interspersed nuclear element (LINE) repeats (Table 3). Interestingly, similar pairs of LINE sequences with inverted homologies were found in the breakpoint of seven markers (markers -2, -5, -7, -8, -9, -11, -12) (Table 3). The alignment of a LINE sequence pair in close proximity to each other within the breakpoint of marker-5 is shown in Fig. 5C. In general, the LINE sequence pairs at breakpoints had between 71 and 89% homology to each other and sizes ranging between 0.8 and 5.7 kb. In these symmetrical breakpoints, the sequence pairs were separated by 3–148 kb. These results suggest that most inv dup markers could form due to the presence of sequences with inverted homology that are in close proximity to the DSB.
Because sequence pairs with high homology could be regarded as palindromic sequences with a spacer, the presence of palindromes without spacer in the breakpoints was investigated. Supporting the hypothesis that such sequences could participate in the local formation of truly symmetrical inv dup markers, palindromic sequences were identified in the breakpoint regions of most of the symmetrical markers. These palindromes could be categorized into three different groups: palindromic Alu sequences (Fig. 5D and E), palindromic human self chained alignments (HSCAs) (Fig. 5F) and AT rich regions with high palindromic potential (Fig. 5G). Single HSCAs with inverted homology to themselves (thus being palindromic) were part of the breakpoint regions of three markers -3, -8 and -10. The sequence sizes ranged from 273 to 6566 bp. Typically, sequences with the highest homology (72–95%) were located at the ends of the HSCA connected by a less homologous spacer (marker-3 in Fig. 5F) or, in two cases, by AT rich sequences. The breakpoint of marker-1 contained a near palindromic sequence that was derived from two Alu repeats, 562 bp in length, with 79% inverted homology (Fig. 5E). Similar palindromic Alu regions were present in the breakpoint regions of markers -3, -9, -10 and -11 (shown is analysis of marker-10 in Fig. 5D). AT rich regions in general were identified in six of the symmetrical breakpoint regions (markers -1, -2, -5, -6, -8 and -10) (Table 3). Sizes of these AT rich regions ranged between 48 and 493 bp; inverted homologies of these sequences were above 74% up to 100%. In summary, the presence of inverted homology sequences in breakpoint regions of all symmetric inv dups suggests that palindromic sequences are involved in their formation.
With the present work, we investigated how acentric class I markers (equal to inv dup markers) are formed. We present an analysis of the breakpoints of 12 acentric class I markers and identify a common mechanism that underlies their formation. We included trisomic (markers -6, -7, -8) and tetrasomic (markers -1, -4, -5, -9, -10, -11) cases derived from patients with mental retardation, one cancer case with two polysomic markers (markers -2 and -3) and one case with a chromosome X derived marker (marker-12). Breakpoint regions were identified using a combination of different technologies (Affymetrix SNP arrays, FISH and microsatellite analysis), allowing us to draw conclusions about the formation of the inv dup markers. Previously, only limited numbers of tetrasomic inv dup markers (three cases) and no trisomic markers were investigated (15,16). These three inv dup markers were asymmetrical with two breakpoints. It was therefore unexpected that nearly all of the markers investigated in our study (10 out of 12) were symmetrical with one breakpoint, suggesting that the majority of inv dup markers represent ‘true mirror image duplications’.
Mosaicism reflects the absence of the marker chromosome in some fraction of cells. How mosaicism occurs in cases containing marker chromosomes is not clear, although neocentromere instability has been suggested as one mechanism to explain the loss of the marker in a subset of the cells (7,17). For example, Reddy et al. (18) observed a decline of marker presence in cells with increasing age of patients (from 100% at birth to 75% at the age of 16). They and others attributed this observation to accumulation of mitotic error and possibly mitotic instability with age (17,18). Alternatively, the marker could confer a ‘genomic burden’ that is deleterious for some cell types, so that it is subsequently lost. The frequency of mosaicism might also depend on when the marker is formed (during a meiotic event or a mitotic event later during development) or when the neocentromere becomes activated. We assessed the degree of mosaicism in new and previously published cases in order to determine whether trisomic and tetrasomic markers form through similar mechanisms. Fifty-two tetrasomic and thirteen trisomic cases described here or published elsewhere (5) were analyzed. The frequency of markers in tetrasomic cases varied between 2 and 100% of cells (5); however, only 12 of 52 tetrasomic cases (23.1%) showed a marker incidence of 100% in one or more analyzed tissues. This was not the case for the thirteen trisomic cases, in which the markers were found in 100% of all analyzed cells of all analyzed tissues (Supplementary Material, Table S2). It had previously been suggested that trisomic cases can be mosaic (5); however, this was based on a single case. Close inspection of these data (19) revealed that this case was in fact more likely to be tetrasomic, consistent with our finding that mosaicism only occurs in tetrasomic but not in trisomic cases.
We propose the following model for our observations: each tetrasomic case has four copies of a chromosomal end; two on the normal chromosomes and two on the inv dup marker. When the marker is lost, the genome simply reverts to a normal diploid state. Gradual loss of tetrasomic markers could lead to varying degrees of mosaicism. In contrast, trisomic cases have three copies of a chromosomal end: one on one normal chromosome and two on the inv dup marker. Loss of the marker in a trisomic case would result in a monosomy for the chromosome end, which could be disadvantageous for cell survival. Consistent with our model is the fact that there are four times more tetrasomic cases reported in the literature compared with trisomic cases (5). We propose that both groups of tetrasomic and trisomic markers initially occur with similar frequency but because of the different effect on cell survival, mosaicism is never observed in trisomic cases.
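The copy number bookkeeping behind this model is simple enough to write out. The sketch below only restates the counts given above; the function name is hypothetical, and it assumes that the inv dup marker always carries the region twice.

```python
def region_copies(normal_copies: int, marker_present: bool,
                  copies_on_marker: int = 2) -> int:
    """Copies of the chromosome end in one cell, with or without the marker."""
    return normal_copies + (copies_on_marker if marker_present else 0)


if __name__ == "__main__":
    # Intact copies of the chromosome end outside the marker:
    # two normal chromosomes (tetrasomic) or one normal plus one deleted (trisomic).
    cases = {"tetrasomic": 2, "trisomic": 1}
    for label, normal in cases.items():
        with_marker = region_copies(normal, marker_present=True)
        without_marker = region_copies(normal, marker_present=False)
        print(f"{label}: {with_marker} copies with marker, "
              f"{without_marker} after marker loss")
```

Marker loss returns a tetrasomic cell to two copies (disomy) but leaves a trisomic cell with a single copy (monosomy), which is the asymmetry the model relies on.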
For two of the three trisomic cases in this study, the material present on the marker seemed to reflect the portion lost on the deleted chromosome, but in the case of marker-6 an additional 266 kb was lost on the corresponding deleted derivative 9. We suggest that the 266 kb of DNA was lost on the unprotected end of del(9) and a de novo telomere was formed, a mechanism of DSB repair that has been previously observed in humans (20). This additional loss on the deleted chromosome suggests that newly reported trisomic marker chromosome cases should be analyzed carefully for possible concomitant deletions.
Voullaire et al. (7) postulated two different mechanisms for the formation of class I neocentromeric markers: one involving formation during mitosis and the other during meiosis. Chromatid breakage followed by segregation of one centric and one acentric fragment to different daughter cells was proposed for the mitotic mechanism. After replication, the broken sister chromatid ends rejoin, thus forming an acentric inverted duplicated fragment, which upon neocentromerization can survive cell division, whereas the sister cell with the dicentric derivative is not viable and is lost. To explain the formation of the acentric inv dup marker during meiosis, it was proposed that anomalous crossing over occurs during meiosis I, resulting in an acentric inv dup abnormality (7). These two mechanisms, however, only explained the formation of tetrasomic cases.
In addition, it has been suggested that the acentric fragment could also be created during a U-type exchange (18,21). Meiotic U-type exchange requires interaction between the two chromatids of two homologous chromosomes (inter-chromosomal U-type exchange) (Fig. 6A, B) or the two chromatids of one chromosome (intra-chromosomal U-type exchange) (Fig. 6C, D). The model further assumes that an acentric inv dup marker is formed at the same time as a dicentric chromosome, which could further lead to an inv dup del derivative. However, dicentric chromosomes or inv dup del abnormalities have never been observed in combination with an acentric inv dup marker. Therefore, we assume that this is an unlikely mechanism for acentric inv dup formation. We suggest a different mechanism for the formation of inv dup markers. On the basis of the results of the genotype studies, we propose that both arms of all inv dup markers are identical and therefore originated from a single chromosomal fragment. The strongest support for this interpretation comes from the data observed for the X chromosome derived marker-12, for which one normal chromosome X and the X-derived inv dup marker were present. Surprisingly, only a single haplotype could be observed. Not only does that indicate that both arms of the inv dup marker are identical, but also that these two copies were derived from the only chromosome X.
LCRs are involved in many chromosomal abnormalities as they are believed to provide the substrate for homologous recombination with their high sequence identity, which predisposes specific regions to rearrange (22). We, therefore, expected to find LCRs at the breakpoints of inv dups (14). However, the only case in which two individual LCR sequences with inverted homologies were found was within the two breakpoints of marker-4 (on 8p). These individual LCRs were embedded within two LCR-clusters termed repeat-proximal (REPP) (1.33 Mb long) and repeat-distal (REPD) (0.378 Mb long) (23), which have been suggested to be involved in other 8p chromosomal abnormalities such as deletions, inversions, inv dup dels and duplications (15,24). Giglio and coworkers (2001) described an asymmetric inv dup marker derived from chromosome 8p similar to marker-4, but were not able to identify a specific pair of LCR sequences within the two LCR clusters. Our analysis identified the LCR no. 23429 at the breakpoints of marker-4 and therefore allows for the first time to link breakpoints within these LCR clusters to a specific sequence pair with inverted homologies. Because the breakpoints of the 8p-derived markers and all the breakpoints of the previously published asymmetric cases contain LCRs (16,24), it was unexpected that no other breakpoints showed the presence of (or close proximity to) LCR pairs in our study, and suggested to us that alternative sequences with inverted homologies contribute to the formation of the markers. We identified such other sequence pairs with inverted homologies within the breakpoints, the longest of which were LINE repeats with sizes up to 5.5 kb, a length well above the defined size limit for LCRs. In addition, we identified palindromic sequences in the breakpoints of eight of the ten symmetric cases. These included palindromic HSCAs, AT rich regions with high palindromic nature and palindromic Alu-repeats. The median size of these sequences was 489 bp. 
Similar inverted repeated sequences have been assumed to be involved in the formation of inv dup in cancer cells (25). Because various types of DNA repeats are frequently present in the genome, the observed breakpoint position could be the result of a random DSB, where local sequences with high inverted sequence homology lead to hairpin formations. A random occurrence of breakpoint locations was reported for a number of chromosome 13-derived markers (26). In contrast, similar breakpoints have been reported for 8p-derived inv dup markers (8,15,27,28), which may involve the LCRs containing transposable elements. In summary, sequences with inverted homologies can be regarded as sites with increased potential for genome rearrangement; however, owing to the complexity and hence rarity of neocentromere formation, acentric inv dup markers are not observed more frequently.
The only model that has been proposed for acentric inv dup formation that does not require a chromosomal exchange event between two chromatids is one in which an acentric fragment arises after a DSB during a mitotic event (7). That particular model assumes that after replication two acentric fragments join to form the inv dup marker and a sister cell with a broken chromosome is lost. The mechanism by which the acentric fragments survive cell division and subsequently fuse remains unexplained. Our study suggests an alternative model.
On the basis of our findings that inv dup markers originated from one chromatid end and that inverted sequence homologies are located close to or at the breakpoints, we propose a model in which an acentric inv dup marker is formed after an intermediate hairpin molecule (Fig. 7). Following a DSB, one DNA strand is exposed by 5′ to 3′ degradation. Intra-strand pairing follows, promoted by the presence of an inverted repeat (such as a single palindromic sequence or a set of two sequences with inverted homologies to each other) to generate a hairpin molecule. After bi-directional replication, an acentric marker with an inv dup is formed and ultimately stabilized by the development of a neocentromere (Fig. 7A). Even though a mitotic inter-chromosomal U-type exchange can explain the formation of trisomic marker cases, our new model for the first time explains all investigated cases of acentric inv dup markers reported in the literature on idiopathic mental retardation. In these cases, the DSB may occur during the first (Fig. 7B) or the second meiotic division (Fig. 7C). This model is also consistent with the observation that the marker chromosome is present in 100% of all cells in all trisomic cases, whereas mosaicism is observed in many tetrasomic cases due to marker loss, most likely by mitotic error. In cancer cells, mosaicism for the marker chromosome is found because the marker originated during a mitotic event (Fig. 7D).
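The structural consequence of the hairpin model can be sketched with a toy representation in which a chromosome end is an ordered list of named segments. This is an illustration under strong simplifying assumptions and not a simulation of the molecular steps: the segment names are hypothetical, the fold-back is assumed to occur exactly at the inverted repeat, and any sequence lying between two paired repeats is treated as a single copy loop.

```python
from collections import Counter


def inv_dup_from_hairpin(distal_segments, spacer_segments=()):
    """Segment order of the marker after replication of a hairpin molecule.

    distal_segments -- segments from the telomere up to the inverted repeat
    spacer_segments -- single copy segments between two paired repeats
                       (empty for a single palindromic sequence)
    Returns (name, orientation) tuples: the forward arm, the single copy
    loop, then the same arm in inverted orientation.
    """
    forward = [(name, "+") for name in distal_segments]
    loop = [(name, "+") for name in spacer_segments]
    inverted = [(name, "-") for name in reversed(distal_segments)]
    return forward + loop + inverted


def total_copy_number(marker, normal_chromosomes):
    """Copies of each marker segment in a cell with intact normal copies."""
    on_marker = Counter(name for name, _ in marker)
    return {name: count + normal_chromosomes for name, count in on_marker.items()}


if __name__ == "__main__":
    symmetric = inv_dup_from_hairpin(["tel", "a", "b"])
    asymmetric = inv_dup_from_hairpin(["tel", "a", "b"], spacer_segments=["s"])
    print(symmetric)
    print(total_copy_number(asymmetric, normal_chromosomes=2))
    print(total_copy_number(asymmetric, normal_chromosomes=1))
```

In this toy model a single palindromic sequence gives a true mirror image marker, whereas a pair of separated inverted sequences leaves the intervening stretch present once on the marker. With two normal chromosomes this produces adjacent tetrasomic and trisomic regions, and with one normal chromosome it produces trisomic and disomic regions, matching the patterns described for the asymmetric markers -4 and -12.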
It is still unknown through what mechanism or process neocentromeres are established. According to our model that proposes an intermediate hairpin molecule, we suggest that the neocentromere function needs to be established before or while the hairpin molecule is replicated. Both events (neocentromerization and replication) might occur during the first initial mitotic divisions of the zygote for inv dup cases in idiopathic mental retardation. In cancer cells, these events occur during a later mitotic event. According to our studies, all observed inv dup marker groups seem to have originated through the same mechanism.
An emerging view of tumor progression is that cancer cells de-differentiate and acquire embryonic or stem cell properties. Therefore, mechanisms that are found during early embryonic development that result in neocentromerization and replication of inv dup markers could be re-activated during oncogenesis resulting in the formation of the same type of inv dup markers in cancer cells.
Three patients with de novo supernumerary marker chromosomes (markers -4, -6, -10) (Supplementary Material, Table S1) were ascertained because of mental retardation and/or developmental delay along with multiple congenital anomalies and are reported in this study for the first time. Cell lines of these patients were established by Epstein-Barr virus (EBV) transformation of lymphocytes using standard protocols (29). The remaining nine markers used in this study have been previously described: a fibroblast cell line (marker-9) (17), a cancer cell line (markers -2 and -3) (1) and six cases with available EBV-transformed lymphoblastoid cell lines (markers -1, -5, -7, -9, -11 and -12) (12,17,18,30) (Supplementary Material, Table S1); however, a much more detailed characterization was performed in this study. All of the cell lines were grown under standard growth conditions, and the genomic material (DNA, unfixed chromosomes and fixed chromosomes) was isolated from them to allow for detailed analysis of the marker chromosomes and their breakpoints.
Metaphase chromosomes were prepared from lymphoblast or fibroblast cultures using standard cytogenetic techniques (31,32). To attain high-resolution chromosomes, the cultures were synchronized by the addition of thymidine for 16.5 h of culture and harvested after the addition of ethidium bromide for 2 h and colcemid for 45 min. The cells were treated for 12 min with 0.075 M KCl and fixed in 3:1 methanol:acetic acid prior to staining. To obtain better resolution of the neocentromeric constriction, a different hypotonic solution (1:1 mix of 0.075 M KCl and 8% sodium citrate hypotonic solution) was applied for 15 min at room temperature. GTG- and C-banding were performed to characterize markers -4, -6 and -10, and at least 20 chromosomal spreads were examined from two cultures (33).
Polyclonal antibodies to CENP-B, CENP-C and CENP-E (34) were utilized to stain for centromere specific proteins on the three new acentric markers -4, -6 and -10. Cell lines were pretreated with colcemid for 2 h and cytospun onto clean slides. The detection of the centromeric proteins was performed on unfixed chromosomes (17,35), and after fixing, standard FISH protocols were used to verify the identity of chromosomal origin for these markers.
Detailed FISH analysis was initially performed to verify that the markers were acentric and had no detectable alpha-satellite DNA present. Additionally, the breakpoints established by quantitative SNP array were further verified and delineated using FISH. Clones containing alpha-satellite DNA [for chromosomes 8, 9 and 13 (36) (Vysis/Abbott, Inc., Downers Grove, IL, USA)] and locus-specific BACs and Fosmids [from the RP11 (www.sanger.ac.uk) and WIBR2 (http://www.broad.mit.edu/) libraries] were utilized in these studies. Genomic BAC and Fosmid clones were isolated according to a standard (alkaline lysis) Mini-prep protocol. Slide preparation, probe preparation and hybridization were completed using previously described methods (37), with minor modifications (38). Briefly, probes were labeled directly with SpectrumGreen, SpectrumOrange (Vysis/Abbott, Inc.) or DEAC-nucleotides (Perkin Elmer) using a standard nick translation kit (Vysis/Abbott, Inc.) according to the manufacturer's description with some modification. Typically 1 µg DNA was labeled overnight with 5 µl of the recommended enzyme concentration at 15°C. Usually 100 ng of SpectrumOrange labeled probes and 200 µl of SpectrumGreen and DEAC labeled probes were used in one hybridization. For some Fosmid clones, the amount of labeled DNA was doubled; for labeled repetitive DNA, usually less labeled DNA (50 ng) was used. Between 10 and 20 metaphases were analyzed for the presence of signal on both the normal and the marker chromosome using direct microscopic visualization. Computer visualization of probe hybridization was achieved using a Zeiss Axiophot fluorescence microscope and images were captured using a cooled charge-coupled device camera.
Genomic DNA from cell lines was isolated by automated DNA extraction (MagnaPure Compact from Roche, Version 1.1.1) or by manual Qiagen extraction (Qiagen, CA, USA). For fibroblasts, the pellet of one flask with 80% confluent cells was used for DNA isolation; for lymphoblastoid cell lines, 5 ml of logarithmically growing culture was used. DNA concentrations were determined using a NanoDrop® ND1000 Spectrophotometer with the Software ND1000 V3.1.0. The isolated genomic DNA was stored at 4°C.
For the SNP array, genomic DNA was diluted with water to 50 ng/μl (±5 ng/μl). The diluted DNA was processed immediately or stored at −20°C until use. For each case, 250 ng of diluted genomic DNA was used in one SNP experiment. Processing of the genomic DNA was performed according to the GeneChip® Mapping 500K Assay Manual (Affymetrix), and the samples were hybridized either to the early access version of Nsp-1 or to the commercially available Nsp-1 250K and Sty-1 250K chips. In selected cases, to investigate certain regions with higher coverage, the Affymetrix® Genome-Wide Human SNP arrays 6.0 were used following the corresponding Affymetrix® Genome-Wide Human SNP Nsp/Sty 6.0 User Guide. The type of array used in each case is listed in Table 1. All reagents and hardware recommended by the manual were used and procedures were followed in detail. Briefly, the genomic DNA was digested with either Nsp-1 or Sty-1 restriction enzymes. Digested samples were ligated using either the Nsp-1 or Sty-1 adaptor, followed by a PCR amplification step (four reactions for Nsp-1 and three reactions for Sty-1). Two microliters of each PCR reaction were tested for successful processing of the DNA on a 2% agarose 1×TBE gel (run for 1 h, 120 V). All three steps were performed on an MJ Tetrad PTC-225 Thermo Cycler (Bio-Rad). The pooled PCR products were purified with a Clontech Clean-Up Plate and eluted with RB Buffer for the 250K arrays; magnetic bead separation was used for the 6.0 arrays. The DNA concentration of the eluted PCR products was normalized to 2 µg/μl in RB Buffer for the 250K arrays. The PCR products were fragmented using the Fragmentation Reagent. An aliquot of 4 µl of the fragmented product was checked on a second quality control gel (4% NuSieve GTG agarose 1×TBE gel, run for 1 h, 120 V). The fragmented samples were then labeled with the GeneChip® DNA Labeling Reagent. 
The Labeling and Fragmentation steps were performed on a GeneAmp® PCR System 9700 Thermo Cycler with gold plated block (Applied Biosystems). After mixing the labeled product with a hybridization buffer, the samples were denatured, loaded into the appropriate array and hybridized in a GeneChip® Hybridization Oven for a minimum of 16 h. The probe arrays were washed and stained on the Fluidics Station 450, and finally scanned using the GeneChip® Scanner 3000 7G. The scanner was controlled by the software GeneChip® Operating Software 2.0. The automatically generated CEL-files were used for further data analyses with two different software programs as described below.
Inv dup markers are accessory chromosomes and contribute additional copies of a specific chromosomal region to the normal genome. These additional copies were identified by analyzing the data with two different software packages. The GTG 2.0 Software (Affymetrix) used a data set of 119 normal female patients for the commercial 250K array data, and 270 normal patients for the 6.0 array data, to normalize the obtained data. Data from the 250K and the 6.0 chips were analyzed with this program. A second program, written in the R programming language (DFC, available upon request), utilized a set of phenotypically normal patients from a Hutterite population as a reference (39). The data obtained with the early access chips and the 250K arrays were analyzed with this second program, which we refer to as P2. The normalization procedure implemented in P2 largely followed the procedures described by Huang and coworkers (40). The resulting normalized intensity ratios were then split into segments of similar mean and variance using the circular binary segmentation algorithm (41). The information gained from this procedure, as well as a priori knowledge of the breakpoint vicinity from previous cytogenetic analysis, was used by P2 to construct maximum-likelihood estimates and confidence intervals on the location of each breakpoint of the anomaly of interest. This was implemented as follows. A breakpoint was defined as the physical position at which there is a transition in copy number in the case sample. A candidate region for the breakpoint was constructed in such a way that one edge certainly falls in a region of normal copy number, the other edge certainly falls in the region of abnormal copy number, and there is certainly only one transition in the region. We tried to include as much of the aneuploid segment as possible, while matching the amount of aneuploid and diploid data in the region. 
This candidate region was then analyzed using a likelihood-based approach to predict the most likely location of the breakpoint. Briefly, the intensity ratios were modeled as a mixture of two normal densities, corresponding to the aneuploid and diploid segments. We found the maximum likelihood location of the breakpoint using a grid search across the candidate region. To provide a practical measure of the uncertainty in the breakpoint localization, we rescaled the breakpoint likelihoods for the entire region into posterior probabilities (using Bayes' rule). This enabled us to construct 95% confidence intervals on the location of the breakpoint. We find these intervals helpful when prioritizing regions for follow-up experimentation, and they were used later in the breakpoint motif analyses.
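The grid search and the rescaling of likelihoods into posterior probabilities can be illustrated with a minimal sketch. It is not the P2 program: the function names are hypothetical, and it assumes that the two segment means and a shared standard deviation are already known, that every probe boundary is an equally likely breakpoint a priori, and that the interval is taken as the highest-posterior set of boundaries.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and s.d. sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def breakpoint_posterior(ratios, mu_left, mu_right, sigma):
    """Posterior probability of each candidate breakpoint index k (1..n-1).

    Probes [0, k) are assigned to the left segment and probes [k, n) to the
    right segment. A flat prior over k is assumed, so the posterior is just
    the likelihood rescaled to sum to one (Bayes' rule).
    Element i of the returned list corresponds to k = i + 1.
    """
    n = len(ratios)
    loglik = []
    for k in range(1, n):  # grid search over all probe boundaries
        ll = sum(normal_logpdf(x, mu_left, sigma) for x in ratios[:k])
        ll += sum(normal_logpdf(x, mu_right, sigma) for x in ratios[k:])
        loglik.append(ll)
    top = max(loglik)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - top) for ll in loglik]
    total = sum(weights)
    return [w / total for w in weights]

def ml_breakpoint(posterior):
    """Index k with the highest posterior (equal to the ML estimate under a flat prior)."""
    return max(range(len(posterior)), key=posterior.__getitem__) + 1

def interval(posterior, level=0.95):
    """Range of k spanned by the most probable boundaries covering `level` of the mass."""
    order = sorted(range(len(posterior)), key=posterior.__getitem__, reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += posterior[i]
        if mass >= level:
            break
    return min(chosen) + 1, max(chosen) + 1

# Hypothetical intensity ratios: five diploid probes followed by five gained probes.
example = [1.00, 1.02, 0.98, 1.01, 0.99, 1.50, 1.48, 1.52, 1.49, 1.51]
post = breakpoint_posterior(example, mu_left=1.0, mu_right=1.5, sigma=0.05)
print(ml_breakpoint(post), interval(post))
```

The indices returned here refer to probe order; mapping them to physical positions would require the SNP coordinates of the array, which are not part of this sketch.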
FISH was used to confirm the array-determined breakpoints. FISH signals were scored to be either absent or present and as one or two signals on the marker chromosome. At times, a larger signal or weaker signal, compared with the signal on a normal chromosome, was present in the same karyotype and was noted. When a break was determined to be present between a pair of adjacent clones, the middle position of each clone was used to delineate the breakpoint as determined by FISH. Utilizing this protocol, breakpoints previously determined by array technology were either verified or further delineated.
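The midpoint rule for FISH-based delineation can be written as a short sketch. The function names and the clone coordinates are hypothetical and serve only to illustrate the arithmetic; real coordinates would come from the genome assembly used for clone mapping.

```python
def clone_midpoint(start, end):
    """Middle position of a clone given its start and end coordinates (bp)."""
    return (start + end) // 2

def fish_breakpoint_interval(clone_a, clone_b):
    """Interval bounded by the midpoints of two adjacent clones flanking a break.

    Each clone is a (start, end) tuple; the order of the two clones does not matter.
    """
    mids = sorted((clone_midpoint(*clone_a), clone_midpoint(*clone_b)))
    return mids[0], mids[1]

# Hypothetical adjacent clones: one without and one with signal on the marker.
print(fish_breakpoint_interval((1_000, 3_000), (5_000, 9_000)))
```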
To determine whether an acentric marker was derived from one normal chromosome present in the cell, or from two different chromosomes, array data were used. The signal intensities for both A and B alleles of each SNP that was present on both the marker chromosome and on the chromosomes from which it was derived were evaluated. The ratios that can be observed depended, first, on how many different haplotypes and, second, on how many copies were present in the genome. The copy number in each case was determined after karyotype analysis. In Supplementary Material, Fig. S1, all theoretically possible A:B ratios for different copy numbers and haplotypes are listed. If two copies were present in the genome, the possible A:B ratios should be 2:0, 0:2 and 1:1. If three copies were present, the possible ratios should be 3:0, 0:3, 2:1 and 1:2. If four copies were present, the possible ratios should be 4:0, 0:4, 3:1, 1:3 and 2:2. The copy number values for the A allele (a) and the B allele (b) were extracted from the raw data to calculate the ratios. For a better assessment of the A:B ratios, the data were plotted using the formula (a−b)/(a+b). A significant peak present at the expected theoretical value for a particular A:B ratio is noted in Table 3 with a ‘+’ sign. If a theoretical value was not observed, a ‘−’ sign was noted. The theoretically possible ratios were compared with the experimentally observed ratios for each case. In this way, certain haplotype combinations could be ruled out, indicating whether or not the two copies present on the inv dup marker were derived from one haplotype already present in the genome.
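The expected A:B ratios and their positions on the (a−b)/(a+b) axis can be enumerated with a short sketch. This is an illustration, not the analysis code used in the study: the function names are hypothetical, and it assumes that each distinct haplotype carries either the A or the B allele at a given SNP and contributes a whole number of copies.

```python
from fractions import Fraction
from itertools import product

def possible_ratios(haplotype_copies):
    """All A:B ratios that can occur for a given haplotype configuration.

    haplotype_copies lists how many copies each distinct haplotype contributes,
    e.g. [1, 1] for a normal diploid region, or [3, 1] for a tetrasomy in which
    both marker copies stem from one haplotype already present in the genome.
    """
    total = sum(haplotype_copies)
    ratios = set()
    for alleles in product("AB", repeat=len(haplotype_copies)):
        a = sum(c for c, allele in zip(haplotype_copies, alleles) if allele == "A")
        ratios.add((a, total - a))
    return sorted(ratios, reverse=True)

def allele_contrast(a, b):
    """Position of an A:B ratio on the (a - b)/(a + b) axis."""
    return Fraction(a - b, a + b)

# Two, three and four copies, each on a different haplotype.
for config in ([1, 1], [1, 1, 1], [1, 1, 1, 1]):
    print(config, [(r, str(allele_contrast(*r))) for r in possible_ratios(config)])

# Tetrasomy with only two haplotypes: a 3+1 split excludes 2:2,
# whereas a 2+2 split excludes 3:1 and 1:3.
print(possible_ratios([3, 1]))
print(possible_ratios([2, 2]))
```

Comparing such lists with the peaks actually observed is the kind of reasoning by which a haplotype configuration can be excluded: under these assumptions, a peak at 0 (ratio 2:2) in a tetrasomic case is not compatible with a 3+1 configuration.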
After delineation of the breakpoint, the DNA sequence within the breakpoint region was analyzed for the presence of specific sequence motifs with inverted homologies that might facilitate the formation of an inv dup. Two sequence types were screened for: (i) sequence pairs with inverted homologies to each other, and (ii) single sequences that showed inverted homology to their own reverse complement, thus being palindromic. All sequence types were required to show inverted homologies of at least 70% and to be located within or adjacent to the determined breakpoint regions. Specific attention was given to, but not limited to, LCRs (42), HSCA (43), LINEs, short interspersed nuclear elements (SINEs) and AT-rich microsatellites. The presence of specific repeat sequences was assessed by RepeatMasker Open-3.0 (http://www.repeatmasker.org), and palindrome presence (44) with the help of a web-based database (Human DNA palindrome database (HPALDB), http://vhp.ntu.edu.sg/hpaldb/). Sequence pairs within the breakpoint regions that showed inverted homology of above 70% were identified utilizing two programs: Blat (http://genome.ucsc.edu/cgi-bin/hgBlat?command=start) and Blast2 (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi) (45). The BLAT program allowed the identification of regions with inverted homologies in comparison to the entire genome, and the BLAST2 program allowed direct comparison of two different sequences and thus direct determination of the degree of inverted homology. All identified sequence motifs with their degree of inverted homologies are listed in Table 3, and genes present at the breakpoints are listed in Table S4.
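The notion of inverted homology can be made concrete with a small sketch. It is a deliberate simplification and not a substitute for BLAT or BLAST2: the function names are hypothetical, and the comparison is ungapped and restricted to sequences of equal length, whereas the alignment programs perform gapped local alignment.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def percent_identity(seq_1, seq_2):
    """Ungapped percent identity of two equal-length sequences."""
    if len(seq_1) != len(seq_2) or not seq_1:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(seq_1.upper(), seq_2.upper()))
    return 100.0 * matches / len(seq_1)

def inverted_homology(seq_1, seq_2):
    """Percent identity between seq_1 and the reverse complement of seq_2."""
    return percent_identity(seq_1, reverse_complement(seq_2))

def is_palindromic(seq, threshold=70.0):
    """True if a sequence matches its own reverse complement at or above threshold."""
    return inverted_homology(seq, seq) >= threshold

# Illustrative sequences only, not taken from the breakpoint regions.
print(inverted_homology("AAAACCCC", "GGGGTTTT"))  # pair with full inverted homology
print(is_palindromic("GAATTC"))                   # perfect palindrome (EcoRI site)
print(is_palindromic("AAAAAA"))                   # no inverted homology to itself
```

The 70% default mirrors the threshold stated in the text; any other cutoff can be passed as `threshold`.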
To determine how many haplotypes were present in selected cases and subsequently to distinguish whether the two copies on an inv dup marker were formed from one chromosome (or haplotype) or from copies derived from two different chromosomes (haplotypes), various microsatellites (preferentially with high heterozygosity values) that were present on the markers were tested. D8S264, D8S277, D8S503, D8S520, D8S550, D8S258 and D8S1734 were used for marker-4; D9S288, D9S1810, D9S286, D9S269, D9S157, D9S169, D9S161 and D9S1817 for markers -6 and -7; and D11S898, D11S908, D11S925, D11S4094, D11S1320 and D11S968 for marker-8. The primers were from IDTechnologies and were chosen utilizing the UCSC server 2006 (see Supplementary Material, Table S5 for primer sequence and position). When possible, the forward primer in each set was labeled with FAM fluorescent dye. The assay was run on a 3130xl Genetic Analyzer (Applied Biosystems). Amplification reactions were carried out in a total volume of 12 µl, containing 1–10 ng template DNA, 1× PCR buffer, 0.2 mM of each dNTP, 1 U Taq DNA polymerase, 8 pmol of each reverse and fluorescently labeled primer and 2 pmol of the forward primer. Conditions of the PCR amplification were as follows: 10 min at 95°C, followed by 35 cycles at 94°C (30 s), 55°C (30 s), 72°C (30 s), and a final extension at 72°C for 5 min. Microsatellite alleles were resolved on a 3130xl Genetic Analyzer using the GENESCAN 3.7 software and sized using Gene-Scan 500 ROX (6-carbon-X-rhodamine) molecular size standards (35–500 bp) with GENOTYPER 3.7 software (Applied Biosystems).
Supported by grant number T32CA009594 from the National Cancer Institute (A.E.M.) and the Packard Foundation (D.F.C. through Jonathan K. Pritchard).
The BACs and Fosmids were obtained from both the Wellcome Trust Sanger Institute, UK and Dr Evan Eichler, Seattle, WA, USA. Affymetrix provided several SNP arrays and labeling kits.
Conflict of Interest statement. None declared.