High-throughput sequencing approaches, including whole genome sequencing and targeted sequencing of samples enriched for specific regions of interest, have recently proven useful for identifying chromosomal breakpoints at base pair resolution in genomes harboring chromosomal rearrangements (Talkowski et al. 2011). To determine the optimum approach (whole genome or targeted sequencing), the final coverage requirements were estimated. Based on previous sequence capture data, a minimum read depth of 5 reads was established to ensure sufficient sampling of each haplotype in the region of interest. Because Ts65Dn mice are trisomic for the regions of interest, a minimum read depth of 15 reads (3 haplotypes × 5 reads) was therefore required. Data from our previous targeted re-sequencing efforts indicated that 150X oversampling was sufficient to ensure a minimum coverage of 15 reads within the targeted region (D’Ascenzo et al. 2009). Taking into account that some reads spanning the 17^16 junction might fail to align to the reference genome, an oversampling goal of 200X was established. Considering a target region of ~13 Mb (based on the combined mapping data for the Chr 16 and Chr 17 breakpoints), we estimated that 2.6 Gb (13 Mb × 200) of sequence would be needed to cover the targeted regions at a minimum depth of 15 reads. To maximize oversampling of the target region, two 1M feature arrays with overlapping DNA probes (3 bp offset, 60 bp probes) were used to enrich for Chr 16 sequence between Ncam2, as well as Chr 17 sequence between D17Mit19 and D17Mit58.
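The coverage arithmetic in this paragraph can be restated as a short calculation. The sketch below only reproduces the numbers given in the text (5 reads per haplotype, three haplotypes, a ~13 Mb target, 200X oversampling); the function and variable names are illustrative and do not come from the original analysis.

```python
# Values taken from the text; names are illustrative only.
READS_PER_HAPLOTYPE = 5    # minimum reads needed to sample one haplotype
HAPLOTYPES = 3             # Ts65Dn mice are trisomic for the target regions
TARGET_BP = 13_000_000     # ~13 Mb combined Chr 16 + Chr 17 target region
OVERSAMPLING = 200         # oversampling goal (raised from 150X)


def min_read_depth(reads_per_haplotype: int, haplotypes: int) -> int:
    """Minimum depth so that each haplotype is sampled sufficiently."""
    return reads_per_haplotype * haplotypes


def required_sequence_bp(target_bp: int, oversampling: int) -> int:
    """Total sequence needed to oversample the target by the given factor."""
    return target_bp * oversampling


min_depth = min_read_depth(READS_PER_HAPLOTYPE, HAPLOTYPES)
required_bp = required_sequence_bp(TARGET_BP, OVERSAMPLING)

print(f"minimum read depth: {min_depth} reads")
print(f"sequence required:  {required_bp / 1e9:.1f} Gb")
```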
A total of 37 million 76 bp paired-end reads were generated on an Illumina GAIIx platform, and nearly half of the reads mapped to the desired regions on Chr 16 and Chr 17, resulting in ~2.8 Gb of sequence data representing the target region and an average read depth of 94 reads. Further analysis revealed that 99.9% of the targeted bases were covered by at least one read and 90.5% were covered by at least 15 reads. These coverage statistics indicated that 37 million 76 bp paired-end reads provided sufficient coverage of the target region.
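Coverage statistics of this kind (mean depth, and the fraction of targeted bases covered by at least N reads) can be derived from a per-base depth profile. The following is a minimal sketch under the assumption that depths are available as a simple list of integers, one per targeted base; the depth values are made up for illustration and are not the study's data, and the text does not state which tool was used to compute the reported figures.

```python
from typing import Sequence


def mean_depth(depths: Sequence[int]) -> float:
    """Average number of reads covering each targeted base."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


def fraction_covered(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of targeted bases covered by at least `min_depth` reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(d >= min_depth for d in depths) / len(depths)


# Hypothetical per-base depths for ten targeted positions (not real data).
toy_depths = [0, 3, 15, 20, 94, 120, 16, 14, 40, 60]

print(f"mean depth:       {mean_depth(toy_depths):.1f}")
print(f"covered >= 1 read:  {fraction_covered(toy_depths, 1):.1%}")
print(f"covered >= 15 reads: {fraction_covered(toy_depths, 15):.1%}")
```

Applied to a real depth profile, the two thresholds used here (1 and 15) correspond to the 99.9% and 90.5% figures quoted above.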
The sequence data were mapped to the C57BL/6 reference genome (NCBI m37/mm9), and analysis of the resulting alignments revealed fourteen paired-end reads flanking the 17^16 junction, with one mate mapping properly to Chr 16 and the other mapping to Chr 17 (Figure 2). Furthermore, among the reads flagged as ‘not properly mapped’ were a total of 55 individual 76 bp reads spanning the junction, each consisting of both Chr 16 and Chr 17 sequence. De novo assembly of these reads revealed the precise locations of the Chr 16 and Chr 17 breakpoints to be 84,351,351 bp and 9,426,822 bp, respectively. In-depth analysis of the reads aligned at the 17^16 junction revealed that approximately 12.4% (17/137) of properly mapped reads were from the 17^16 chromosome. This was significantly lower than expected assuming equal representation (33%) of each allele in the sequencing data; however, since many of the reads from the translocation chromosome failed to map properly to the reference, it was not surprising to find bias in the mapped data.
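The comparison between the observed fraction (17/137) and the expected one-third can be illustrated with an exact one-sided binomial test. The text does not state which statistical test, if any, was applied, so the choice of test here is an assumption made for illustration only.

```python
from math import comb


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


observed = 17        # properly mapped reads from the 17^16 chromosome
total = 137          # all properly mapped reads at the junction
expected_p = 1 / 3   # equal representation of three alleles

fraction = observed / total
p_value = binomial_lower_tail(observed, total, expected_p)

print(f"observed fraction: {fraction:.1%}")
print(f"P(X <= {observed} | n={total}, p=1/3) = {p_value:.2e}")
```

A very small lower-tail probability indicates that so few junction-derived reads would be unlikely under equal allele representation, which is consistent with the mapping bias described above.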
Figure 2 Visualization of reads aligning to the Chr 17 (A) and Chr 16 (B) breakpoint regions using the Integrative Genomics Viewer (IGV). Paired-end reads mapping to Chr 17 with mates mapping to Chr 16 are shown in orange (A). Likewise, paired-end reads mapping …
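Read pairs of the kind highlighted in this figure, with one mate aligned to Chr 16 and the other to Chr 17, can be recognized from the standard fields of a SAM alignment record. The sketch below is an assumption about how such a filter might look, not the pipeline used in the study: the chromosome names (`chr16`, `chr17`), read names, and coordinates are invented for the example, and only the FLAG, RNAME, and RNEXT columns defined by the SAM format are used.

```python
PAIRED = 0x1          # read is part of a pair
UNMAPPED = 0x4        # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate is unmapped


def is_interchromosomal_pair(sam_line: str,
                             chrom_a: str = "chr16",
                             chrom_b: str = "chr17") -> bool:
    """True if the read and its mate map to chrom_a and chrom_b (either order)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, rnext = fields[2], fields[6]
    if not flag & PAIRED:
        return False
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return False
    if rnext == "=":          # "=" means the mate is on the same reference
        rnext = rname
    return {rname, rnext} == {chrom_a, chrom_b}


# Invented records: columns are QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL.
toy_records = [
    "readA\t65\tchr17\t1000\t60\t76M\tchr16\t2000\t0\t*\t*",  # mates on different chromosomes
    "readB\t99\tchr16\t3000\t60\t76M\t=\t3200\t276\t*\t*",    # proper pair on Chr 16
    "readC\t73\tchr17\t4000\t60\t76M\t=\t4000\t0\t*\t*",      # mate unmapped
]

junction_candidates = [r.split("\t")[0] for r in toy_records
                       if is_interchromosomal_pair(r)]
print(junction_candidates)
```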
To confirm that this breakpoint sequence is present in animals that carry this marker chromosome, more than 300 genomic DNA samples from animals previously genotyped by standard methods (qPCR and FISH) were subjected to PCR amplification of the junction fragment (Figure 3). The breakpoint sequence was detected in all trisomic animals. Additionally, fluorescence in situ hybridization using BACs containing genomic DNA from flanking regions within 100 kb of the breakpoints confirmed the location of the breakpoint.
Figure 3 Amplification and sequencing of the 17^16 junction. PCR and Sanger sequencing of the junction confirmed the precise sequence of the breakpoint as predicted by de novo assembly of reads spanning the junction (A). PCR amplification of the junction in 209 …
Multiple sequence alignment of the junction sequence with the homologous regions on Chr 16 and Chr 17 revealed no evidence of gain or loss of sequence from either chromosome. Moreover, there was very little evidence of sequence homology between the breakpoint regions on Chr 16 and Chr 17, with the exception of a short, 7 nt homology region 15 bp distal to the Chr 16 and Chr 17 breakpoints (Figure 5). The Chr 16 breakpoint confirms the previously characterized gene dosage data as illustrated in (Akeson et al. 2001; Kahlem et al. 2004). In addition, the position of the Chr 16 breakpoint confirms the presence of the microRNA mir155 on the 17^16 chromosome. The position of the Chr 17 breakpoint indicates that the 17^16 chromosome carries ~10 Mb of Chr 17; within this region there are 60 Ensembl and/or NCBI annotations for protein coding genes, pseudogenes and non-coding RNA genes. The region shares conserved synteny with human Chr 6 q25.3–q27, and ~70% (42/60) of the genes annotated in the Chr 17 region have a predicted ortholog (Ensembl predictions) in the human regions of conserved synteny (Supplemental Table 1).
Figure 5 Multiple sequence alignment of breakpoint regions on Chr 17 (red), Chr 16 (black) and the junction site sequence on Mmu17^16. Multiple sequence alignment of the breakpoint regions revealed a 7 bp homology region (underlined), 15 bp distal to the breakpoint.