Chromosomal rearrangements in genomic disorders and cancer are strongly associated with repetitive sequences. Up to a quarter of entries in the Gross Rearrangement Breakpoint Database show the presence of repetitive elements [49]. Repetitive elements vary in size, reaching up to 6 Kb in the case of Long Interspersed Nuclear Elements (LINEs), and may cluster to create long stretches of non-unique sequence. Breakpoints that overlap repetitive sequence elements may not be detected by 5 Kb (or shorter-range) mate pair libraries. Even if the breakpoint is detected, the non-unique sequence surrounding the rearrangement may make validation by PCR challenging. With their large, clone-sized inserts, fosmid diTags overcome this problem by spanning repetitive sequences and correctly identifying aberrant fusions. For example, in our previous study of MCF7 cells, we identified the expressed DEPDC1B-ELOVL2 chimeric mRNA transcript, which is formed by a 5q12.1 intra-chromosomal inversion [28]. This breakpoint is detected using fosmid diTags, but not by 5 Kb mate pairs, due to the presence of LINEs, SINEs, and microsatellites surrounding the site of rearrangement.
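The spanning argument above can be illustrated with a minimal sketch. It assumes that an end-sequence pair is informative only when both end reads fall entirely in unique sequence flanking a repeat block, and that insert size is measured as the outer distance between the two reads. The function name `spans_repeat`, the 36 bp read length, and the 40 Kb fosmid insert size are illustrative assumptions, not values taken from this study.

```python
def spans_repeat(insert_size, repeat_start, repeat_end, read_len=36):
    """Return True if a pair with this insert size can place both end
    reads in unique sequence on either side of the repeat block.

    insert_size is the outer distance between the two end reads (bp).
    read_len is an assumed end-read length (hypothetical default).
    """
    repeat_len = repeat_end - repeat_start
    # Both reads must lie fully outside the repeat, so the insert has to
    # cover the repeat plus one read length on each side.
    return insert_size >= repeat_len + 2 * read_len


# Hypothetical 6 Kb LINE occupying positions 1,000 to 7,000.
line_start, line_end = 1_000, 7_000

mate_pair_ok = spans_repeat(5_000, line_start, line_end)   # 5 Kb mate pair
fosmid_ok = spans_repeat(40_000, line_start, line_end)     # assumed 40 Kb fosmid insert
```

Under these assumptions a 5 Kb insert cannot bridge a 6 Kb repeat, whereas a fosmid-sized insert can; clustered repeats simply enlarge the block that must be bridged.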
In many cases, optimal PCR primer design is hindered by repetitive sequence surrounding the join. This is common when rearrangements are facilitated by homologous recombination [50, 51]. Short repetitive elements or longer segmental duplications (also referred to as low copy repeats) at sites of rearrangement severely limit the number of unique priming positions. Fosmid-sized inserts are able to span such repetitive regions, thus providing a means of validating breakpoints even when PCR assays fail. For example, two previously published gene truncations identified by both our fosmid diTag and 5 Kb mate pair libraries fail confirmation by breakpoint-spanning PCR. The first is the t(5;8)(q35.3;q24.21) translocation in HCC1954 involving the truncation of NSD1, which is also found as a fusion protein in myeloid leukemia [33]. The second is the t(3;15)(p14.1;q23.2) translocation in MCF7 involving the truncation of BRIP1, a BRCA1-interacting protein that contributes to DNA repair [28]. Although these two gene truncations are cross-validated by fosmid and 5 Kb inserts, PCR across the breakpoint results in amplification failure. In these cases, breakpoint-spanning primer design was severely hindered by the presence of interspersed nuclear elements and long terminal repeats across the aberrant joins.
We show that fosmid-sized inserts are well suited to spanning repetitive sequences known to exist at sites of gross rearrangement and low copy repeats associated with homologous recombination. By combining fosmid diTag and 5 Kb Illumina mate pair libraries, we were able to detect and validate aberrant fusions involving repetitive genomic sequence where detection by shorter end sequence profiles alone, or validation by breakpoint-spanning PCR assays, failed. In addition, we observe that rearrangements detected by both insert size ranges exhibit a 3-fold enrichment for cancer-specific somatic mutations and a 2-fold reduction in false positives when compared to the 5 Kb mate pairs alone.
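Cross-validation between the two insert size ranges can be sketched as a simple positional match between breakpoint calls. This is an illustration only, not the procedure used in this study: the `Call` record, the functions `matches` and `cross_validated`, and the 40 Kb tolerance are hypothetical, and the sketch assumes both call sets list the two ends of each rearrangement in the same order.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """A rearrangement call joining two genomic positions (hypothetical record)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def matches(x, y, tol):
    """True if both ends of x lie within tol bp of the corresponding ends of y."""
    return (x.chrom_a == y.chrom_a and x.chrom_b == y.chrom_b
            and abs(x.pos_a - y.pos_a) <= tol
            and abs(x.pos_b - y.pos_b) <= tol)


def cross_validated(mate_calls, fosmid_calls, tol=40_000):
    """Return the mate pair calls supported by at least one fosmid call.

    tol is an assumed tolerance on the order of a fosmid insert size.
    """
    return [m for m in mate_calls
            if any(matches(m, f, tol) for f in fosmid_calls)]


# Made-up coordinates for illustration only.
mate_calls = [Call("chr5", 1_000, "chr8", 5_000),
              Call("chr3", 200_000, "chr15", 900_000)]
fosmid_calls = [Call("chr5", 21_000, "chr8", 1_000)]

supported = cross_validated(mate_calls, fosmid_calls)
```

The tolerance has to be generous because the longer inserts localize a breakpoint less precisely than the 5 Kb mate pairs; a stricter tolerance trades sensitivity for fewer coincidental matches.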
For those breast cancer-specific somatic mutations involving genes, we queried the transcriptome fusion and truncation literature to corroborate our findings and to assess the extent to which our combined fosmid diTag and 5 Kb mate pair libraries rediscovered known chimeric transcripts reported in MCF7 and HCC1954. We identified genomic alterations corresponding to approximately half of the published MCF7 and HCC1954 chimeric mRNA transcripts, but it is difficult to assess the lower bound of our sensitivity because it is unclear whether the undetected transcript mutations are due to trans-splicing or similar post-transcriptional modifications.
We integrated read density and breakpoint information from mapped fosmid diTags and 5 Kb mate pairs to accurately identify distinct copy number variation in MCF7 and HCC1954. We discovered distinct driver oncogenes associated with high-copy number amplifications in the two cell lines. The distinct structural mutability profiles of MCF7 and HCC1954 correlate with their phenotypic differences. Amplified chromosomal segments, breakpoint clusters, and affected genes are located at different positions across the MCF7 and HCC1954 genomes, and correspond to overexpression of different oncogenes, silencing of diverse tumor suppressors, and distinct defects in the DNA repair machinery responsible for homology-driven repair of double-stranded DNA breaks. It is intriguing that, in conjunction with mutations in the same DNA repair pathway, we also find similar patterns of structural mutability in the two cell lines. Both have clustered and dispersed breakpoints: clustered breakpoints occur in regions of high copy number amplification, and dispersed breakpoints are enriched for the presence of low copy repeats.
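The read density component of such an analysis can be sketched as follows. This is a simplified illustration that covers read density only and does not integrate breakpoint information: it assumes mapped reads have already been counted in fixed-width genomic bins, and the function names, the pseudocount, and the log2 threshold of 1.0 are hypothetical choices rather than parameters from this study.

```python
import math
from statistics import median


def log2_ratios(bin_counts, pseudocount=1.0):
    """Log2 ratio of each bin's read count to the median bin count.

    The pseudocount (assumed value) avoids taking the log of zero.
    """
    baseline = median(bin_counts) + pseudocount
    return [math.log2((c + pseudocount) / baseline) for c in bin_counts]


def amplified_bins(bin_counts, threshold=1.0):
    """Indices of bins whose log2 ratio meets an assumed amplification threshold."""
    return [i for i, r in enumerate(log2_ratios(bin_counts)) if r >= threshold]


# Made-up read counts per bin; the fourth bin mimics a high-copy amplification.
counts = [100, 100, 100, 400, 100]
candidate_bins = amplified_bins(counts)
```

Runs of adjacent candidate bins could then be compared with nearby breakpoint calls to refine the boundaries of an amplified segment.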