Our findings are important in two ways. First, we show that an integrative approach, in which experimental analyses are used to train computational SV calling, is essential for the accurate characterization of SV architecture. Second, we find considerable complexity in SV formation: about a quarter of SVs in the mouse are composed of a complex mixture of deletion, insertion, inversion and copy number gain.
In contrast to studies that start by identifying SVs using automated genome-wide methods, followed by experimental validation, we started by experimentally determining a set of SVs and then applied this information to interpret whole-genome automated SV detection [32]. Laboratory-based efforts proved essential for two main reasons. First, they allowed the correct interpretation of the PEM patterns. Without knowing how to interpret the underlying molecular structure of each PEM, some patterns would be missed or classified incorrectly by computational methods alone. Second, our laboratory efforts allowed the recognition of a diversity of PEM patterns. Otherwise we would not be able to distinguish between simple and complex SVs.
Finer-scale breakpoint sequence analysis reveals that 24% of simple SVs have smaller rearrangements at the nucleotide level (micro-insertions or micro-deletions at the breakpoint of a larger SV). This raises questions about the likely mechanisms of SV formation.
We know that retrotransposition is the most common mechanism of SV formation in the mouse [32]. We also know that retrotransposons (LINEs, SINEs and long terminal repeats) are typically characterized by flanking target site duplications and a poly(A) tail or poly(T) head. However, we observed that 15% of retrotransposon SVs lack target site duplications and have truncated or absent poly(A) tails or poly(T) heads (Additional file 9). Moran and colleagues [36] observed a similar phenomenon in the human genome and suggested that retrotransposons, such as LINE-1 elements, integrate into DNA lesions, resulting in retrotransposon-mediated DNA repair. We suggest that about 15% of retrotransposon SVs in the mouse genome formed through a similar mechanism involving DNA repair.
It is reasonable to assume that the complexities (micro-insertions and micro-deletions) we see at the breakpoints of ancestral deletions, inversions and gains (we call these 'complex' non-retrotransposon SVs) (Table ) will correlate with a complex mechanism of formation. A DNA replication fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR) mechanism has been proposed to generate such complex SVs in the human genome [37]. In addition, about half of our complex non-retrotransposon SVs have microhomology (a short stretch of identical bases at both breakpoints) ranging from 3 to 25 bp (Additional file 9), compatible with a microhomology-mediated break-induced replication process. It could be that the complex non-retrotransposon SVs are also the product of mutational processes during DNA replication.
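To illustrate what microhomology at a deletion breakpoint means in practice, the following minimal Python sketch counts the bases shared between the two junctions of a deletion. It is an illustration under stated assumptions, not the procedure used in this study: the function name `microhomology_length`, the 0-based half-open coordinates and the toy reference sequence are all hypothetical, and only simple deletions are handled.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Length of microhomology for a deletion of ref[start:end].

    Coordinates are 0-based and half-open (an assumption of this sketch).
    Microhomology is counted as the number of bases by which the deletion
    could be shifted right or left without changing the resulting sequence.
    """
    if not (0 <= start < end <= len(ref)):
        raise ValueError("deletion must lie within the reference")
    del_len = end - start

    # Bases at the start of the deleted segment that match the bases
    # immediately downstream of the deletion.
    forward = 0
    while (forward < del_len and end + forward < len(ref)
           and ref[start + forward] == ref[end + forward]):
        forward += 1

    # Bases immediately upstream of the deletion that match the bases
    # at the end of the deleted segment.
    backward = 0
    while (backward < del_len and start - 1 - backward >= 0
           and ref[start - 1 - backward] == ref[end - 1 - backward]):
        backward += 1

    return min(forward + backward, del_len)


# Toy reference: the 3 bp motif "ACT" flanks the deleted segment.
toy_ref = "GGG" + "ACT" + "CCCCC" + "ACT" + "TTT"
toy_len = microhomology_length(toy_ref, 3, 11)
```

In this toy case the deletion can be placed at several equivalent positions, and the function reports the same microhomology length for each of them. A length in the 3 to 25 bp range would fall within the interval reported above.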
Interestingly, our estimate that 24% of SVs have micro-structures at their breakpoint is the same as that reported by Eichler and colleagues [30] in a study of human structural variation. Another sequencing-based study of SVs in two mouse strains (DBA/2J and C57BL/6J) examined 3,316 breakpoints and reported that 16% of non-transposon structural variants are complex, as defined by multiple breakpoints mapped to within 1 kbp of each other [29]. However, we were not able to compare these results directly with ours because the classification criteria differ: our classification was based on SVs being immediately adjacent to each other, whereas that of Hall and colleagues [29] was based on SVs being in close proximity.
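The effect of the two classification criteria can be sketched with a small Python example. This is a simplified illustration, not either study's actual pipeline: it assumes SVs are given as hypothetical `(chromosome, start, end)` intervals with half-open coordinates, models the adjacency criterion as a maximum gap of 0 bp, and models the proximity criterion as a maximum gap of 1,000 bp.

```python
from typing import List, Tuple

SV = Tuple[str, int, int]  # (chromosome, start, end), half-open; hypothetical format


def group_svs(svs: List[SV], max_gap: int) -> List[List[SV]]:
    """Group SVs on the same chromosome whose gap to the group is <= max_gap bp."""
    groups: List[List[SV]] = []
    current: List[SV] = []
    current_chrom, current_end = None, None
    for chrom, start, end in sorted(svs):
        if current and chrom == current_chrom and start - current_end <= max_gap:
            # Close enough to the running group: extend it.
            current.append((chrom, start, end))
            current_end = max(current_end, end)
        else:
            # Too far away, or a new chromosome: start a new group.
            if current:
                groups.append(current)
            current = [(chrom, start, end)]
            current_chrom, current_end = chrom, end
    if current:
        groups.append(current)
    return groups


def complex_groups(svs: List[SV], max_gap: int) -> List[List[SV]]:
    """Groups containing more than one SV, treated here as 'complex'."""
    return [g for g in group_svs(svs, max_gap) if len(g) > 1]


# Made-up example calls, for illustration only.
example_svs = [
    ("chr1", 100, 200), ("chr1", 200, 250), ("chr1", 900, 1000),
    ("chr1", 5000, 5100), ("chr2", 100, 200),
]
adjacent = complex_groups(example_svs, max_gap=0)      # adjacency-style criterion
proximal = complex_groups(example_svs, max_gap=1000)   # 1 kbp proximity-style criterion
```

With these made-up intervals, the adjacency-style threshold joins only the two abutting SVs, while the 1 kbp threshold also pulls in the third SV on chr1. The same call set can therefore yield different proportions of complex SVs under the two definitions, which is why the percentages are not directly comparable.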
Ideally, longer sequencing reads would be used to resolve the complex architecture of the structural variants we report in this study, but this is beyond the capability of current-generation sequencing platforms. Our findings offer an intermediate solution between next-generation sequencing analysis and complete de novo assembly of genomes.