Many of the issues confronting the field of structural variation will be resolved as advances in technology allow robust and economical analysis of structural variants at the nucleotide level in multiple genomes. Such techniques will include ‘tiling path’-coverage oligonucleotide arrays, paired-end sequence relationship comparisons, and comparisons of partial or complete sequence assemblies. The ultimate standard will be sequence-level resolution of all structural variation in a defined set of reference individuals, establishing a benchmark for genotyping platforms. We do not foresee that any one approach will capture all genetic variation reliably, nor that, for at least a few more years, any single strategy will predominate over microarray-based approaches. Therefore, the main challenges from this point onward will include managing a huge volume of data, integrating information from different discovery platforms and discerning phenotypic implications. New issues will arise, such as how best to annotate structural variation data in individual diploid genome assemblies (arising from personalized sequencing projects) and how to relate haplotypes of structural variants (with or without SNPs) to the latest human reference sequence. Structural variation data should also aid SNP, linkage disequilibrium and gene expression analyses, but new database tools will be required to interpret the data fully.
Structural variation discoveries offer the potential to bridge a long-standing gap between cytogenetic and sequence-based investigations and to unify our understanding of genetic variation. Interestingly, at the outset of writing, we tried to sidestep the topic of terminology (and nomenclature), but kept returning to it in one way or another as we worked to define and distill the breadth of issues before us. In fact, it was terminology that highlighted the extreme heterogeneity in the data being published; the strengths, caveats and differences among studies are attributable in part to the different backgrounds of the researchers involved.
An equally intricate issue for future data integration will be categorizing structural variants as ‘normal’, ‘disease-causing’ or ‘phenotype-associated’, as these designations can be part of a continuous range (refs. 1,24,55,56). In the accompanying table, we put forward ideas for annotation modifiers that will help maximize the utility of structural variation information. Molecular cytogeneticists have long faced this dilemma, with particular implications in prenatal and diagnostic settings. Now that submicroscopic and sequence-level variation can be readily recognized, the question of how to distinguish benign from disease-associated structural changes will become increasingly important. There are already well-defined examples in which the presence of a structural variant correlates directly with a syndrome or phenotype, such as the many dosage-related microdeletions and duplications that cause genomic disorders (refs. 57–63; see also the DECIPHER database). Family-based studies can demonstrate whether a change is de novo or inherited and, in the latter case, whether there are likely to be associated phenotypic consequences (noting that there are numerous examples of variable expression of phenotype and disease in inherited chromosomal rearrangements; refs. 1,21,55). Otherwise, large population studies and reference databases of control and disease samples will provide the best source of information about a structural variant’s frequency and its likelihood of causing a phenotypic outcome.
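To make the family-based reasoning concrete, the following is a minimal sketch, assuming integer copy-number calls for a single autosomal locus (reference copy number of two) in a child and both parents. The function names and labels are hypothetical illustrations, not part of any published tool or of the annotation modifiers proposed here. It assigns only an inheritance label: an inherited variant can still have variable phenotypic consequences, and an apparently de novo call would need laboratory confirmation (for example, to rule out calling errors or mosaicism).

```python
from typing import Optional

# Assumption: a diploid autosomal locus, so the reference copy number is 2.
REFERENCE_CN = 2


def variant_type(copy_number: int, reference_cn: int = REFERENCE_CN) -> Optional[str]:
    """Label a copy-number call as a deletion or duplication relative to the reference."""
    if copy_number < reference_cn:
        return "deletion"
    if copy_number > reference_cn:
        return "duplication"
    return None  # no copy-number change at this locus


def classify_inheritance(child_cn: int, mother_cn: int, father_cn: int) -> str:
    """Hypothetical trio check: is the child's copy-number change de novo or inherited?

    Simplification: a parent counts as a carrier if their copy number changes in
    the same direction as the child's; breakpoints and extent are not compared.
    """
    child_type = variant_type(child_cn)
    if child_type is None:
        return "no variant"
    carriers = [
        parent
        for parent, cn in (("mother", mother_cn), ("father", father_cn))
        if variant_type(cn) == child_type
    ]
    if not carriers:
        return "de novo (candidate)"
    return "inherited from " + " and ".join(carriers)
```

For example, `classify_inheritance(1, 2, 2)` returns `"de novo (candidate)"` and `classify_inheritance(3, 2, 3)` returns `"inherited from father"`. A real analysis would also compare breakpoints and variant extent across the trio rather than only the direction of copy-number change; that step is deliberately omitted here.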
Classification of modifiers used for the description of structural variation
Notwithstanding the challenges, we believe that the recommendations presented here are necessary first steps toward standardizing many of the variables that, if ignored, will impede progress. At the same time, we recognize that consensus is important and that standards require time to mature before they are adopted and implemented (ref. 48). With some ground rules now set, we intend to continue discussions with the genomic structural variation research community at the most relevant meetings.