There has been an explosion of data describing newly recognized structural variants in the human genome. In the flurry of reporting, there has been no standard approach to collecting the data, assessing its quality or describing identified features. This risks becoming a rampant problem, in particular with respect to surveys of copy number variation and their application to disease studies. Here, we consider the challenges in characterizing and documenting genomic structural variants. From this, we derive recommendations for standards to be adopted, with the aim of ensuring the accurate presentation of this form of genetic variation to facilitate ongoing research.
The association of DNA copy-number variation (CNV) with specific gene function and human disease has been long known, but the wide scope and prevalence of this form of variation has only recently been fully appreciated. The latest studies using microarray technology have demonstrated that as much as 12% of the human genome and thousands of genes are variable in copy number, and this diversity is likely to be responsible for a significant proportion of normal phenotypic variation. Current challenges involve developing methods not only for detecting and cataloging CNVs in human populations at increasingly higher resolution but also for determining the association of CNVs with biological function, recent human evolution, and common and complex human disease.
The analytical resolution of individual chromosome peaks in the flow karyotype of cell lines is dependent on sample preparation and the detection sensitivity of the flow cytometer. We have investigated the effect of laser power on the resolution of chromosome peaks in cell lines with complex karyotypes. Chromosomes were prepared from a human gastric cancer cell line and a cell line from a patient with an abnormal phenotype using a modified polyamine isolation buffer. The stained chromosome suspensions were analyzed on a MoFlo sorter (Beckman Coulter) equipped with two water-cooled lasers (Coherent). A bivariate flow karyotype was obtained from each of the cell lines at various laser power settings and compared to a karyotype generated using laser power settings of 300 mW. The best separation of chromosome peaks was obtained with laser powers of 300 mW. This study demonstrates the requirement for high laser powers for the accurate detection and purification of chromosomes, particularly from complex karyotypes, using a conventional flow cytometer.
flow karyotype; resolution; chromosomes; laser power; photo-saturation
Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The trials performed worldwide towards Non-Invasive Prenatal Diagnosis (NIPD) of Down syndrome (or Trisomy 21) have demonstrated the great commercial and medical potential of NIPD compared to the currently used invasive prenatal diagnostic procedures. Extensive investigation of methylation differences between the mother and the fetus has led to the identification of Differentially Methylated Regions (DMRs). In this study, we present a strategy using the Methylated DNA immunoprecipitation (MeDiP) methodology in combination with real-time qPCR to achieve fetal chromosome dosage assessment which can be performed non-invasively through the analysis of fetal-specific DMRs. We achieved non-invasive prenatal detection of trisomy 21 by determining the methylation ratio of normal and trisomy 21 cases for each tested fetal-specific DMR present in maternal peripheral blood, followed by further statistical analysis. The application of the above fetal-specific methylation ratio approach provided correct diagnosis of 14 trisomy 21 and 26 normal cases.
Patients with developmental disorders often harbour sub-microscopic deletions or duplications that lead to a disruption of normal gene expression or perturbation in the copy number of dosage-sensitive genes. Clinical interpretation for such patients in isolation is hindered by the rarity and novelty of such disorders. The DECIPHER project (https://decipher.sanger.ac.uk) was established in 2004 as an accessible online repository of genomic and associated phenotypic data with the primary goal of aiding the clinical interpretation of rare copy-number variants (CNVs). DECIPHER integrates information from a variety of bioinformatics resources and uses visualization tools to identify potential disease genes within a CNV. A two-tier access system permits clinicians and clinical scientists to maintain confidential linked anonymous records of phenotypes and CNVs for their patients that, with informed consent, can subsequently be shared with the wider clinical genetics and research communities. Advances in next-generation sequencing technologies are making it practical and affordable to sequence the whole exome/genome of patients who display features suggestive of a genetic disorder. This approach enables the identification of smaller intragenic mutations including single-nucleotide variants that are not accessible even with high-resolution genomic array analysis. This article briefly summarizes the current status and achievements of the DECIPHER project and looks ahead to the opportunities and challenges of jointly analysing structural and sequence variation in the human genome.
Down syndrome (DS) is caused by trisomy of chromosome 21 (Hsa21) and presents a complex phenotype that arises from abnormal dosage of genes on this chromosome. However, the individual dosage-sensitive genes underlying each phenotype remain largely unknown. To help dissect genotype–phenotype correlations in this complex syndrome, the first fully transchromosomic mouse model, the Tc1 mouse, which carries a copy of human chromosome 21, was produced in 2005. The Tc1 strain is trisomic for the majority of genes that cause phenotypes associated with DS, and this freely available mouse strain has become widely used to study DS, the effects of gene dosage abnormalities, and the effect on the basic biology of cells when a mouse carries a freely segregating human chromosome. Tc1 mice were created by a process that included irradiation microcell-mediated chromosome transfer of Hsa21 into recipient mouse embryonic stem cells. Here, the combination of next generation sequencing, array-CGH and fluorescence in situ hybridization technologies has enabled us to identify unsuspected rearrangements of Hsa21 in this mouse model, revealing one deletion, six duplications and more than 25 de novo structural rearrangements. Our study is not only essential for informing functional studies of the Tc1 mouse but also (1) presents for the first time a detailed sequence analysis of the effects of gamma radiation on an entire human chromosome, which gives some mechanistic insight into the effects of radiation damage on DNA, and (2) overcomes specific technical difficulties of assaying a human chromosome on a mouse background where highly conserved sequences may confound the analysis. Sequence data generated in this study is deposited in the ENA database, Study Accession number: ERP000439.
We report on a 17-year-old patient with midline defects, ocular hypertelorism, neuropsychomotor development delay, neonatal macrosomy, and dental anomalies. DNA copy number investigations using a Whole Genome TilePath array consisting of 30K BAC/PAC clones showed a 6.36 Mb deletion in the 9p24.1–p24.3 region and a 14.83 Mb duplication in the 20p12.1–p13 region, which derived from a maternal balanced t(9;20)(p24.1;p12.1) as shown by FISH studies. Monosomy 9p is a well-delineated chromosomal syndrome with characteristic clinical features, while chromosome 20p duplication is a rare genetic condition. Only a handful of cases of monosomy 9/trisomy 20 have been previously described. In this report, we compare the phenotype of our patient with those already reported in the literature, and discuss the role of DMRT, DOCK8, FOXD4, VLDLR, RSPO4, AVP, RASSF2, PROKR2, BMP2, MKKS, and JAG1, all genes mapping to the deleted and duplicated regions.
multiple congenital anomalies; midline; chromosomal aberration; array-CGH; monosomy 9p; trisomy 20p
The recently described DNA replication-based mechanisms of fork stalling and template switching (FoSTeS) and microhomology-mediated break-induced replication (MMBIR) were previously shown to catalyze complex exonic, genic and genomic rearrangements. By analyzing a large number of isochromosomes of the long arm of chromosome X (i(Xq)), using whole-genome tiling path array comparative genomic hybridization (aCGH), ultra-high resolution targeted aCGH and sequencing, we provide evidence that the FoSTeS and MMBIR mechanisms can generate large-scale gross chromosomal rearrangements leading to the deletion and duplication of entire chromosome arms, thus suggesting an important role for DNA replication-based mechanisms in both the development of genomic disorders and cancer. Furthermore, we elucidate the mechanisms of dicentric i(Xq) (idic(Xq)) formation and show that most idic(Xq) chromosomes result from non-allelic homologous recombination between palindromic low copy repeats and highly homologous palindromic LINE elements. We also show that non-recurrent-breakpoint idic(Xq) chromosomes have microhomology-associated breakpoint junctions and are likely catalyzed by microhomology-mediated replication-dependent recombination mechanisms such as FoSTeS and MMBIR. Finally, we stress the role of the proximal Xp region as a chromosomal rearrangement hotspot.
Motivation: The careful normalization of array-based comparative genomic hybridization (aCGH) data is of critical importance for the accurate detection of copy number changes. The difference in labelling affinity between the two fluorophores used in aCGH—usually Cy5 and Cy3—can be observed as a bias within the intensity distributions. If left unchecked, this bias is likely to skew data interpretation during downstream analysis and lead to an increased number of false discoveries.
Results: In this study, we have developed aCGH.Spline, a natural cubic spline interpolation method followed by linear interpolation of outlier values, which is able to remove a large portion of the dye bias from large aCGH datasets in a quick and efficient manner.
Conclusions: We have shown that removing this bias and reducing the experimental noise has a strong positive impact on the ability to detect accurately both copy number variation (CNV) and copy number alterations (CNA).
Supplementary information: Supplementary data are available at Bioinformatics online.
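The general idea of spline-based dye-bias removal can be illustrated with a minimal sketch. This is not the aCGH.Spline implementation: it assumes (beyond what the abstract states) that the bias is modeled as a smooth function of mean log intensity, that knots are taken as per-bin medians, and that probes outside the knot range are corrected with the value at the nearest knot. All function names are hypothetical.

```python
from bisect import bisect_right
from statistics import median


def fit_natural_cubic_spline(xs, ys):
    """Return second derivatives at the knots of a natural cubic spline.

    xs must be strictly increasing and contain at least two knots.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # natural boundary: zero curvature at both ends
    if n < 2:
        return m
    # Tridiagonal system for the interior second derivatives m[1..n-1].
    sub = [h[i - 1] for i in range(1, n)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    sup = [h[i] for i in range(1, n)]
    rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, n - 1):
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    sol = [0.0] * (n - 1)
    sol[-1] = rhs[-1] / diag[-1]
    for k in range(n - 3, -1, -1):
        sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
    m[1:n] = sol
    return m


def eval_spline(xs, ys, m, t):
    """Evaluate the spline at t, clamping t to the knot range (an assumption)."""
    t = min(max(t, xs[0]), xs[-1])
    i = min(bisect_right(xs, t) - 1, len(xs) - 2)
    h = xs[i + 1] - xs[i]
    left, right = xs[i + 1] - t, t - xs[i]
    return (m[i] * left ** 3 / (6 * h) + m[i + 1] * right ** 3 / (6 * h)
            + (ys[i] / h - m[i] * h / 6) * left
            + (ys[i + 1] / h - m[i + 1] * h / 6) * right)


def remove_dye_bias(a_vals, m_vals, n_bins=4):
    """Subtract a spline-estimated, intensity-dependent bias from log ratios.

    a_vals: mean log intensity per probe; m_vals: log2(Cy5/Cy3) per probe.
    Knots are the per-bin medians; bin medians of a_vals must be distinct.
    """
    order = sorted(range(len(a_vals)), key=lambda i: a_vals[i])
    size = len(order) // n_bins
    knots_x, knots_y = [], []
    for b in range(n_bins):
        idx = order[b * size:] if b == n_bins - 1 else order[b * size:(b + 1) * size]
        knots_x.append(median(a_vals[i] for i in idx))
        knots_y.append(median(m_vals[i] for i in idx))
    curv = fit_natural_cubic_spline(knots_x, knots_y)
    return [mv - eval_spline(knots_x, knots_y, curv, av)
            for av, mv in zip(a_vals, m_vals)]
```

Using bin medians as knots keeps the fitted curve robust to probes in genuinely aberrant regions, which is one plausible reason to fit a smooth curve rather than correct each probe independently.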
Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
Array painting is a technique that uses microarray technology to rapidly map chromosome translocation breakpoints. Previous methods to map translocation breakpoints have used fluorescence in situ hybridization (FISH) and have consequently been labor-intensive, time-consuming and restricted to the low breakpoint resolution imposed by the use of metaphase chromosomes. Array painting combines the isolation of derivative chromosomes (chromosomes with translocations) and high-resolution microarray analysis to refine the genomic location of translocation breakpoints in a single experiment. In this protocol, we describe array painting by isolation of derivative chromosomes using a MoFlo flow sorter, amplification of these derivatives using whole-genome amplification and hybridization onto commercially available oligonucleotide microarrays. Although the sorting of derivative chromosomes is a specialized procedure requiring sophisticated equipment, the amplification, labeling and hybridization of DNA is straightforward, robust and can be completed within 1 week. The protocol described produces good quality data; however, array painting is equally achievable using any combination of the available alternative methodologies for chromosome isolation, amplification and hybridization.
Copy number variants (CNVs) account for the majority of human genomic diversity in terms of base coverage. Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals. The genomes of 30 individuals from three Asian populations (Korean, Chinese and Japanese) were interrogated with an ultra-high-resolution array CGH platform containing 24 million probes. Whole-genome sequencing data from a reference genome (NA10851, with 28.3× coverage) and two Asian genomes (AK1, with 27.8× coverage and AK2, with 32.0× coverage) were used to transform the relative copy number information obtained from array CGH experiments into absolute copy number values. We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations, and the new method of calling absolute CNVs will be essential for applying CNV data to personalized medicine.
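The conversion from relative to absolute copy number can be sketched as follows. This is an illustration under stated assumptions, not the method of the study: it assumes the array CGH signal is a log2 ratio of test to reference, and that the reference genome's absolute copy number at the locus is already known (for example, from sequencing read depth). The function names are hypothetical.

```python
import math


def absolute_copy_number(log2_ratio, reference_cn):
    """Convert a test-vs-reference log2 ratio into an absolute copy number estimate.

    Assumes log2_ratio = log2(test_cn / reference_cn).
    """
    return reference_cn * 2.0 ** log2_ratio


def call_integer_copy_number(log2_ratio, reference_cn):
    """Round the estimate to the nearest non-negative integer copy number."""
    return max(0, round(absolute_copy_number(log2_ratio, reference_cn)))


# Usage: a log2 ratio of log2(1.5) against a two-copy reference locus
# corresponds to three copies in the test genome.
example_cn = call_integer_copy_number(math.log2(1.5), reference_cn=2)
```

The same log2 ratio implies different absolute states depending on the reference: a ratio of 0 means two copies if the reference carries two, but one copy if the reference itself carries a heterozygous deletion, which is why the reference genome's absolute copy number is needed.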
Congenital malformations involving the Müllerian ducts are observed in around 5% of infertile women. Complete aplasia of the uterus, cervix, and upper vagina, also termed Müllerian aplasia or Mayer–Rokitansky–Küster–Hauser (MRKH) syndrome, occurs with an incidence of around 1 in 4500 female births, and occurs in both isolated and syndromic forms. Previous reports have suggested that a proportion of cases, especially syndromic cases, are caused by variation in copy number at different genomic loci.
In order to obtain an overview of the contribution of copy number variation to both isolated and syndromic forms of Müllerian aplasia, copy number assays were performed in a series of 63 cases, of which 25 were syndromic and 38 isolated.
A high incidence (9/63, 14%) of recurrent copy number variants in this cohort is reported here. These comprised four cases of microdeletion at 16p11.2, an autism susceptibility locus not previously associated with Müllerian aplasia, four cases of microdeletion at 17q12, and one case of a distal 22q11.2 microdeletion. Microdeletions at 16p11.2 and 17q12 were found in 4/38 (10.5%) cases with isolated Müllerian aplasia, and at 16p11.2, 17q12 and 22q11.2 (distal) in 5/25 cases (20%) with syndromic Müllerian aplasia.
The finding of microdeletion at 16p11.2 in 2/38 (5%) of isolated and 2/25 (8%) of syndromic cases suggests a significant contribution of this copy number variant alone to the pathogenesis of Müllerian aplasia. Overall, the high incidence of recurrent copy number variants in all forms of Müllerian aplasia has implications for the understanding of the aetiopathogenesis of the condition, and for genetic counselling in families affected by it.
The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
► Whole-genome sequences of the Tasmanian devil and two distant cancer subclones ► The Tasmanian devil cancer lineage originated recently in a female devil ► The devil cancer genome is relatively stable despite ongoing evolution ► Clonal divergence and geographic spread elucidated through patterns of mutation
Whole-genome sequences of the Tasmanian devil and two devil cancer subclones suggest that the cancer first arose from a female devil and that the clone has subsequently genetically diverged during its spread across Tasmania.
We have systematically compared copy number variant (CNV) detection on eleven microarrays to evaluate data quality and CNV calling, reproducibility, concordance across array platforms and laboratory sites, breakpoint accuracy and analysis tool variability. Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. Nevertheless, these findings should not preclude detection of large CNVs for clinical diagnostic purposes because large CNVs with poor reproducibility are found primarily in complex genomic regions and would typically be removed by standard clinical data curation. The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics. The CNV resource presented here allows independent data evaluation and provides a means to benchmark new algorithms.
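One common way to quantify concordance between two CNV call sets is reciprocal overlap. The sketch below is illustrative only: the 50% reciprocal-overlap threshold, the half-open `(chrom, start, end)` tuple representation, and the function names are assumptions, and the study may define concordance differently.

```python
def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions between calls a and b.

    Each call is a (chrom, start, end) tuple with half-open coordinates.
    """
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    if chrom_a != chrom_b:
        return 0.0
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    return min(shared / (end_a - start_a), shared / (end_b - start_b))


def concordance(calls_a, calls_b, threshold=0.5):
    """Fraction of calls in calls_a matched by at least one call in calls_b."""
    if not calls_a:
        return 0.0
    matched = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b) >= threshold for b in calls_b)
    )
    return matched / len(calls_a)


# Usage: one of the two calls in set A has a partner in set B.
set_a = [("chr1", 100, 200), ("chr1", 500, 600)]
set_b = [("chr1", 120, 210), ("chr2", 500, 600)]
fraction_matched = concordance(set_a, set_b)
```

Note that this measure is asymmetric: `concordance(set_a, set_b)` and `concordance(set_b, set_a)` can differ when the two sets contain different numbers of calls.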
Mutations in the transcription factor encoding TFAP2A gene underlie branchio-oculo-facial syndrome (BOFS), a rare dominant disorder characterized by distinctive craniofacial, ocular, ectodermal and renal anomalies. To elucidate the range of ocular phenotypes caused by mutations in TFAP2A, we took three approaches. First, we screened a cohort of 37 highly selected individuals with severe ocular anomalies plus variable defects associated with BOFS for mutations or deletions in TFAP2A. We identified one individual with a de novo TFAP2A four amino acid deletion, a second individual with two non-synonymous variations in an alternative splice isoform TFAP2A2, and a sibling-pair with a paternally inherited whole gene deletion with variable phenotypic expression. Second, we determined that TFAP2A is expressed in the lens, neural retina, nasal process, and epithelial lining of the oral cavity and palatal shelves of human and mouse embryos—sites consistent with the phenotype observed in patients with BOFS. Third, we used zebrafish to examine how partial abrogation of the fish ortholog of TFAP2A affects the penetrance and expressivity of ocular phenotypes due to mutations in genes encoding bmp4 or tcf7l1a. In both cases, we observed synthetic, enhanced ocular phenotypes including coloboma and anophthalmia when tfap2a is knocked down in embryos with bmp4 or tcf7l1a mutations. These results reveal that mutations in TFAP2A are associated with a wide range of eye phenotypes and that hypomorphic tfap2a mutations can increase the risk of developmental defects arising from mutations at other loci.
Cancer is driven by somatically acquired point mutations and chromosomal rearrangements, conventionally thought to accumulate gradually over time. Using next-generation sequencing, we characterize a phenomenon, which we term chromothripsis, whereby tens to hundreds of genomic rearrangements occur in a one-off cellular crisis. Rearrangements involving one or a few chromosomes crisscross back and forth across involved regions, generating frequent oscillations between two copy number states. These genomic hallmarks are highly improbable if rearrangements accumulate over time and instead imply that nearly all occur during a single cellular catastrophe. The stamp of chromothripsis can be seen in at least 2%–3% of all cancers, across many subtypes, and is present in ∼25% of bone cancers. We find that one, or indeed more than one, cancer-causing lesion can emerge out of the genomic crisis. This phenomenon has important implications for the origins of genomic remodeling and temporal emergence of cancer.
► 2%–3% of cancers show tens to hundreds of rearrangements localized to specific genomic regions ► Genomic features imply chromosome breaks occur in one-off crisis (“chromothripsis”) ► Found across all tumor types, especially common in bone cancers (up to 25%) ► Can generate several genomic lesions with potential to drive cancer in single event
Microarray-based Comparative Genomic Hybridization (array-CGH) has been applied for a decade to screen for submicroscopic DNA gains and losses in tumor and constitutional DNA samples. This method has become increasingly flexible with the integration of new biological resources generated by genome sequencing projects. In this chapter, we describe alternative strategies for whole genome screening and high resolution breakpoint mapping of copy number changes by array-CGH, as well as tools available for accurate analysis of array-CGH experiments. Although most methods listed here have been designed for microarrays composed of large-insert clones, they can be adapted easily to other types of microarray platforms, such as those constructed from printed or synthesized oligonucleotides.
probe design; clone selection; normalization; outlier detection; CNV calling; Comparative Genomic Hybridization; array-CGH
The spatial resolution of microarray-based comparative genomic hybridization (array-CGH) is dependent on the length and density of target DNA sequences covering the chromosomal region of interest. Here we describe the methods developed at the Wellcome Trust Sanger Institute (Cambridge, UK) to construct microarrays composed of large-insert clones available through genome sequencing projects. These methods are applicable to Bacterial and Phage Artificial Chromosomes (BAC and PAC) as well as fosmid and cosmid clones. The protocols are scalable for the construction of microarrays composed of several hundred up to several tens of thousands of clones.
microarray fabrication; large-insert clones; BAC; PAC; fosmid; cosmid; DOP-PCR; Comparative Genomic Hybridization; array-CGH
Array-CGH involves the comparison of a test to a reference genome using a microarray composed of target sequences with known chromosomal coordinates. The test and reference DNA samples are used as templates to generate two probe DNAs labeled with distinct fluorescent dyes. The two probe DNAs are co-hybridized on a microarray in the presence of Cot-1 DNA to suppress unspecific hybridization of repeat sequences. After slide washes and drying, microarray images are acquired on a laser scanner and fluorescent intensities from every target sequence spot on the array are extracted using dedicated computer programs. Intensity ratios are calculated and normalized to enable data interpretation. Although the protocols explained in this chapter correspond primarily to the use of large-insert clone microarrays in either manual or automated fashion, necessary adaptations for hybridization on microarrays composed of shorter target DNA sequences are also briefly described.
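The ratio calculation and normalization step can be illustrated with a minimal sketch. It assumes a simple global median centering of log2 ratios, which is one common choice and not necessarily the normalization used in the chapter; the function name and the example intensities are hypothetical.

```python
import math
from statistics import median


def normalized_log2_ratios(test_intensities, reference_intensities):
    """Compute per-spot log2(test/reference) and center them on the median.

    Median centering assumes most spots are not copy-number altered, so the
    median log2 ratio should correspond to the unchanged (balanced) state.
    """
    raw = [math.log2(t / r)
           for t, r in zip(test_intensities, reference_intensities)]
    center = median(raw)
    return [value - center for value in raw]


# Usage with made-up background-subtracted intensities for three spots.
ratios = normalized_log2_ratios([200.0, 400.0, 800.0], [100.0, 100.0, 100.0])
```

After centering, a spot with a normalized value near 0 is interpreted as balanced, positive values suggest gain in the test sample, and negative values suggest loss.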
probe labeling; random priming; hybridization; detection; Comparative Genomic Hybridization; array-CGH
Hypoplastic left heart (HLH) occurs in at least 1 in 10 000 live births but may be more common in utero. Its causes are poorly understood but a number of affected cases are associated with chromosomal abnormalities. We set out to localize the breakpoints in a patient with sporadic HLH and a de novo translocation. Initial studies showed that the apparently simple 1q41;3q27.1 translocation was actually combined with a 4-Mb inversion, also de novo, of material within 1q41. We therefore localized all four breakpoints and found that no known transcription units were disrupted. However, we present a case, based on functional considerations, synteny and position of highly conserved non-coding sequence elements, and the heterozygous Prox1+/− mouse phenotype (ventricular hypoplasia), for the involvement of dysregulation of the PROX1 gene in the aetiology of HLH in this patient. Accordingly, we show that the spatial expression pattern of PROX1 in the developing human heart is consistent with a role in cardiac development. We suggest that dysregulation of PROX1 gene expression due to separation from its conserved upstream elements is likely to have caused the heart defects observed in this patient, and that PROX1 should be considered as a potential candidate gene for other cases of HLH. The relevance of another breakpoint separating the cardiac gene ESRRG from a conserved downstream element is also discussed.
chromosome inversion; chromosome translocation; PROX1; hypoplastic left heart; position effect
Alveolar capillary dysplasia with misalignment of pulmonary veins (ACD/MPV) is a rare, neonatally lethal developmental disorder of the lung with defining histologic abnormalities typically associated with multiple congenital anomalies (MCA). Using array CGH analysis, we have identified six overlapping microdeletions encompassing the FOX transcription factor gene cluster in chromosome 16q24.1q24.2 in patients with ACD/MPV and MCA. Subsequently, we have identified four different heterozygous mutations (frameshift, nonsense, and no-stop) in the candidate FOXF1 gene in unrelated patients with sporadic ACD/MPV and MCA. Custom-designed, high-resolution microarray analysis of additional ACD/MPV samples revealed one microdeletion harboring FOXF1 and two distinct microdeletions upstream of FOXF1, implicating a position effect. DNA sequence analysis revealed that in six of nine deletions, both breakpoints occurred in the portions of Alu elements showing eight to 43 base pairs of perfect microhomology, suggesting a replication-error mechanism, Microhomology-Mediated Break-Induced Replication (MMBIR)/Fork Stalling and Template Switching (FoSTeS), for their formation. In contrast to the association of point mutations in FOXF1 with bowel malrotation, microdeletions of FOXF1 were associated with hypoplastic left heart syndrome and gastrointestinal atresias, probably due to haploinsufficiency for the neighboring FOXC2 and FOXL1 genes. These differences reveal the phenotypic consequences of gene alterations in cis.
The factors that mediate chromosomal rearrangement remain incompletely defined. Among regions prone to structural variant formation, chromosome 6p25 is one of the few in which disease-associated segmental duplications and segmental deletions have been identified, primarily through ocular phenotypes attributable to gene dosage. Using array comparative genomic hybridization, we studied ten 6p25 duplication and deletion pedigrees and amplified junction fragments from each. Analysis of the breakpoint architecture revealed that all the rearrangements were non-recurrent, and in contrast to most previous examples the majority of the segmental duplications and deletions utilized coupled homologous and non-homologous recombination mechanisms. One junction fragment exhibited an unprecedented 367 bp insert derived from tandemly arranged breakpoint elements. While this accorded with a recently described replication-based mechanism, it differed from the previous example in being unassociated with template switching, and occurring in a segmental deletion. These results extend the mechanisms involved in structural variant formation, provide strong evidence that a spectrum of recombination, DNA repair and replication underlie 6p25 rearrangements, and have implications for genesis of copy number variations in other genomic regions. These findings highlight the benefits of undertaking the extensive studies necessary to characterize structural variants at the base pair level.