1.  A Genome-Wide Assessment of the Role of Untagged Copy Number Variants in Type 1 Diabetes 
PLoS Genetics  2014;10(5):e1004367.
Genome-wide association studies (GWAS) for type 1 diabetes (T1D) have successfully identified more than 40 independent T1D associated tagging single nucleotide polymorphisms (SNPs). However, owing to technical limitations of copy number variant (CNV) genotyping assays, the assessment of the role of CNVs has been limited to the subset of these in high linkage disequilibrium with tag SNPs. The contribution of untagged CNVs, often multi-allelic and difficult to genotype using existing assays, to the heritability of T1D remains an open question. To investigate this issue, we designed a custom comparative genomic hybridization array (aCGH) to assay untagged CNV loci identified from a variety of sources. To overcome the technical limitations of the case control design for this class of CNVs, we genotyped the Type 1 Diabetes Genetics Consortium (T1DGC) family resource (representing 3,903 transmissions from parents to affected offspring) and used an association testing strategy that does not necessitate obtaining discrete genotypes. Our design targeted 4,309 CNVs, of which 3,410 passed stringent quality control filters. As a positive control, the scan confirmed the known T1D association at the INS locus by direct typing of the 5′ variable number of tandem repeat (VNTR) locus. Our results clarify the fact that the disease association is indistinguishable from the two main polymorphic allele classes of the INS VNTR, class I and class III. We also identified novel technical artifacts resulting in spurious associations at the somatically rearranging loci, T cell receptor, TCRA/TCRD and TCRB, and Immunoglobulin heavy chain, IGH, loci on chromosomes 14q11.2, 7q34 and 14q32.33, respectively. However, our data did not identify novel T1D loci. Our results do not support a major role of untagged CNVs in T1D heritability.
Author Summary
For many complex traits, and in particular type 1 diabetes (T1D), the genome-wide association study (GWAS) design has been successful at detecting a large number of loci that contribute to disease risk. However, in the case of T1D as well as almost all other traits, the sum of these loci does not fully explain the heritability estimated from familial studies. This observation raises the possibility that additional variants exist but have not yet been found because they have not effectively been targeted by the GWAS design. Here, we focus on a specific class of large deletions/duplications called copy number variants (CNVs), and more precisely on the subset of these loci that mutate rapidly and are therefore highly polymorphic. A consequence of this high level of polymorphism is that these variants have typically not been captured by previous GWAS. We use a family based design that is optimized to capture these previously untested variants. We then perform a genome-wide scan to assess their contribution to T1D. Our scan was technically successful but did not identify novel associations. This suggests that little was missed by the GWAS strategy, and that the remaining heritability of T1D is most likely driven by a large number of variants, either rare or common, but with a small individual contribution to disease risk.
PMCID: PMC4038470  PMID: 24875393
2.  DeNovoGear: de novo indel and point mutation discovery and phasing 
Nature methods  2013;10(10):985-987.
We present the DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis, and fragment information to identify the parental origin of germline mutations. We used our program to create a whole-genome de novo indel callset with a 95% validation rate, producing a direct estimate of the human germline indel mutation rate.
PMCID: PMC4003501  PMID: 23975140
3.  Cerebral organoids model human brain development and microcephaly 
Nature  2013;501(7467):10.1038/nature12517.
The complexity of the human brain has made it difficult to study many brain disorders in model organisms, and highlights the need for an in vitro model of human brain development. We have developed a human pluripotent stem cell-derived 3D organoid culture system, termed cerebral organoid, which develops various discrete though interdependent brain regions. These include cerebral cortex containing progenitor populations that organize and produce mature cortical neuron subtypes. Furthermore, cerebral organoids recapitulate features of human cortical development, namely characteristic progenitor zone organization with abundant outer radial glial stem cells. Finally, we use RNAi and patient-specific iPS cells to model microcephaly, a disorder that has been difficult to recapitulate in mice. We demonstrate premature neuronal differentiation in patient organoids, a defect that could explain the disease phenotype. Our data demonstrate that 3D organoids can recapitulate development and disease of even this most complex human tissue.
PMCID: PMC3817409  PMID: 23995685
4.  The Rate of Nonallelic Homologous Recombination in Males Is Highly Variable, Correlated between Monozygotic Twins and Independent of Age 
PLoS Genetics  2014;10(3):e1004195.
Nonallelic homologous recombination (NAHR) between highly similar duplicated sequences generates chromosomal deletions, duplications and inversions, which can cause diverse genetic disorders. Little is known about interindividual variation in NAHR rates and the factors that influence this. We estimated the rate of deletion at the CMT1A-REP NAHR hotspot in sperm DNA from 34 male donors, including 16 monozygotic (MZ) co-twins (8 twin pairs) aged 24 to 67 years old. The average NAHR rate was 3.5×10⁻⁵ with a seven-fold variation across individuals. Despite good statistical power to detect even a subtle correlation, we observed no relationship between age of unrelated individuals and the rate of NAHR in their sperm, likely reflecting the meiotic-specific origin of these events. We then estimated the heritability of deletion rate by calculating the intraclass correlation (ICC) within MZ co-twins, revealing a significant correlation between MZ co-twins (ICC = 0.784, p = 0.0039), with MZ co-twins being significantly more correlated than unrelated pairs. We showed that this heritability cannot be explained by variation in PRDM9, a known regulator of NAHR, or variation within the NAHR hotspot itself. We also did not detect any correlation between Body Mass Index (BMI), smoking status or alcohol intake and rate of NAHR. Our results suggest that other, as yet unidentified, genetic or environmental factors play a significant role in the regulation of NAHR and are responsible for the extensive variation in the population for the probability of fathering a child with a genomic disorder resulting from a pathogenic deletion.
Author Summary
Many genetic disorders are caused by deletions of specific regions of DNA in sperm or egg cells that go on to produce a child. This can occur through ectopic homologous recombination between highly similar segments of DNA at different positions within the genome. Little is known about the differences in rates of deletion between individuals or the factors that influence this. We analysed the rate of deletion at one such section of DNA in sperm DNA from 34 male donors, including 16 monozygotic co-twins. We observed a seven-fold variation in deletion rate across individuals. Deletion rate is significantly correlated between monozygotic co-twins, indicating that deletion rate is heritable. This heritability cannot be explained by age, any known genetic regulator of deletion rate, Body Mass Index, smoking status or alcohol intake. Our results suggest that other, as yet unidentified, genetic or environmental factors play a significant role in the regulation of deletion. These factors are responsible for the extensive variation in the population for the probability of fathering a child with a genomic disorder resulting from a pathogenic deletion.
PMCID: PMC3945173  PMID: 24603440
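The intraclass correlation reported in entry 4 can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC computed over pairs; the abstract does not say which estimator the authors used, and the function name `icc_oneway` and the input values are hypothetical, not data from the study.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC for a list of (twin_a, twin_b) measurements.

    Assumption: this is the textbook ICC(1,1) with k = 2 members per group.
    It is only an illustration, not the estimator used in the paper.
    """
    n = len(pairs)
    k = 2  # two co-twins per pair
    grand_mean = sum(a + b for a, b in pairs) / (n * k)

    # Between-pair mean square: spread of the pair means around the grand mean.
    ms_between = k * sum(((a + b) / 2 - grand_mean) ** 2 for a, b in pairs) / (n - 1)

    # Within-pair mean square: spread of each twin around their own pair mean.
    ms_within = sum(
        (a - (a + b) / 2) ** 2 + (b - (a + b) / 2) ** 2 for a, b in pairs
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical deletion rates (arbitrary units) for four twin pairs.
    example_pairs = [(1.0, 1.2), (3.1, 2.8), (5.0, 5.4), (2.0, 2.1)]
    print(icc_oneway(example_pairs))
```

An ICC close to 1 means that co-twins resemble each other far more than they resemble members of other pairs, which is the pattern the abstract describes for MZ co-twins.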
5.  Exome sequencing improves genetic diagnosis of structural fetal abnormalities revealed by ultrasound 
Human Molecular Genetics  2014;23(12):3269-3277.
The genetic etiology of non-aneuploid fetal structural abnormalities is typically investigated by karyotyping and array-based detection of microscopically detectable rearrangements, and submicroscopic copy-number variants (CNVs), which collectively yield a pathogenic finding in up to 10% of cases. We propose that exome sequencing may substantially increase the identification of underlying etiologies. We performed exome sequencing on a cohort of 30 non-aneuploid fetuses and neonates (along with their parents) with diverse structural abnormalities first identified by prenatal ultrasound. We identified candidate pathogenic variants with a range of inheritance models, and evaluated these in the context of detailed phenotypic information. We identified 35 de novo single-nucleotide variants (SNVs), small indels, deletions or duplications, of which three (accounting for 10% of the cohort) are highly likely to be causative. These are de novo missense variants in FGFR3 and COL2A1, and a de novo 16.8 kb deletion that includes most of OFD1. In five further cases (17%) we identified de novo or inherited recessive or X-linked variants in plausible candidate genes, which require additional validation to determine pathogenicity. Our diagnostic yield of 10% is comparable to, and supplementary to, the diagnostic yield of existing microarray testing for large chromosomal rearrangements and targeted CNV detection. The de novo nature of these events could enable couples to be counseled as to their low recurrence risk. This study outlines the way for a substantial improvement in the diagnostic yield of prenatal genetic abnormalities through the application of next-generation sequencing.
PMCID: PMC4030780  PMID: 24476948
6.  Empirical research on the ethics of genomic research 
There is no universally accepted definition of what an incidental finding is [Wolf et al., 2008] and broadly speaking this could include variants of known and unknown clinical significance, variants linked to highly penetrant, serious, life-threatening conditions, non-paternity or ancestry data. For the purposes of our study, we have adopted a pragmatic distinction between ‘pertinent’ and ‘incidental’ findings as set out in this text. Whilst in the US definitions of incidental findings are becoming accepted in practice [Green et al., 2013] it is still not known how and whether these also apply elsewhere around the world.
PMCID: PMC3884757  PMID: 23813698
7.  DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation 
Nucleic Acids Research  2013;42(D1):D993-D1000.
The DECIPHER database is an accessible online repository of genetic variation with associated phenotypes that facilitates the identification and interpretation of pathogenic genetic variation in patients with rare disorders. Contributing to DECIPHER is an international consortium of >200 academic clinical centres of genetic medicine and ≥1600 clinical geneticists and diagnostic laboratory scientists. Information integrated from a variety of bioinformatics resources, coupled with visualization tools, provides a comprehensive set of tools to identify other patients with similar genotype–phenotype characteristics and highlights potentially pathogenic genes. In a significant development, we have extended DECIPHER from a database of just copy-number variants to allow upload, annotation and analysis of sequence variants such as single nucleotide variants (SNVs) and InDels. Other notable developments in DECIPHER include a purpose-built, customizable and interactive genome browser to aid combined visualization and interpretation of sequence and copy-number variation against informative datasets of pathogenic and population variation. We have also introduced several new features to our deposition and analysis interface. This article provides an update to the DECIPHER database, an earlier instance of which has been described elsewhere [Swaminathan et al. (2012) DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum. Mol. Genet., 21, R37–R44].
PMCID: PMC3965078  PMID: 24150940
8.  DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders 
Human Molecular Genetics  2012;21(R1):R37-R44.
Patients with developmental disorders often harbour sub-microscopic deletions or duplications that lead to a disruption of normal gene expression or perturbation in the copy number of dosage-sensitive genes. Clinical interpretation for such patients in isolation is hindered by the rarity and novelty of such disorders. The DECIPHER project was established in 2004 as an accessible online repository of genomic and associated phenotypic data with the primary goal of aiding the clinical interpretation of rare copy-number variants (CNVs). DECIPHER integrates information from a variety of bioinformatics resources and uses visualization tools to identify potential disease genes within a CNV. A two-tier access system permits clinicians and clinical scientists to maintain confidential linked anonymous records of phenotypes and CNVs for their patients that, with informed consent, can subsequently be shared with the wider clinical genetics and research communities. Advances in next-generation sequencing technologies are making it practical and affordable to sequence the whole exome/genome of patients who display features suggestive of a genetic disorder. This approach enables the identification of smaller intragenic mutations including single-nucleotide variants that are not accessible even with high-resolution genomic array analysis. This article briefly summarizes the current status and achievements of the DECIPHER project and looks ahead to the opportunities and challenges of jointly analysing structural and sequence variation in the human genome.
PMCID: PMC3459644  PMID: 22962312
10.  NDUFA4 Mutations Underlie Dysfunction of a Cytochrome c Oxidase Subunit Linked to Human Neurological Disease 
Cell Reports  2013;3(6):1795-1805.
The molecular basis of cytochrome c oxidase (COX, complex IV) deficiency remains genetically undetermined in many cases. Homozygosity mapping and whole-exome sequencing were performed in a consanguineous pedigree with isolated COX deficiency linked to a Leigh syndrome neurological phenotype. Unexpectedly, affected individuals harbored homozygous splice donor site mutations in NDUFA4, a gene previously assigned to encode a mitochondrial respiratory chain complex I (NADH:ubiquinone oxidoreductase) subunit. Western blot analysis of denaturing gels and immunocytochemistry revealed undetectable steady-state NDUFA4 protein levels, indicating that the mutation causes a loss-of-function effect in the homozygous state. Analysis of one- and two-dimensional blue-native polyacrylamide gels confirmed an interaction between NDUFA4 and the COX enzyme complex in control muscle, whereas the COX enzyme complex without NDUFA4 was detectable with no abnormal subassemblies in patient muscle. These observations support recent work in cell lines suggesting that NDUFA4 is an additional COX subunit and demonstrate that NDUFA4 mutations cause human disease. Our findings support reassignment of the NDUFA4 protein to complex IV and suggest that patients with unexplained COX deficiency should be screened for NDUFA4 mutations.
Graphical Abstract
• Mutations in NDUFA4, assigned to encode a complex I subunit, cause human COX deficiency
• Confirmed interaction between NDUFA4 and the COX holoenzyme in control muscle
• The COX holoenzyme without NDUFA4 is detectable with no abnormal subassemblies in patient muscle
• NDUFA4 is essential for complex IV activity, but is not required for assembly of the COX holoenzyme
Isolated cytochrome c oxidase (COX) deficiency is a frequent finding in human mitochondrial disease. Mutations in nuclear-encoded structural subunits are extremely rare, and, in many cases, the molecular basis remains undetermined. Recent evidence in cell lines has suggested that NDUFA4, previously assigned to encode a complex I subunit, actually encodes a structural component of COX. Hanna and colleagues now demonstrate that NDUFA4 mutations cause human COX deficiency, thus confirming NDUFA4 as a COX subunit that is essential for the enzyme’s activity.
PMCID: PMC3701321  PMID: 23746447
11.  Quantifying single nucleotide variant detection sensitivity in exome sequencing 
BMC Bioinformatics  2013;14:195.
The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage and this is expected to systematically impact our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study it is often essential to both detect polymorphisms and to understand where and with what probability real polymorphisms may have been missed.
Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single nucleotide resolution SNV sensitivity estimates which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give “power estimates” for an entire study. We estimate a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict 5–15% of heterozygous and 1–4% of homozygous SNVs in the targeted regions will be missed.
Non-reference alleles in the heterozygote state have a high chance of being missed when commonly applied read coverage thresholds are used despite the widely held assumption that there is good polymorphism detection at these coverage levels. Such alleles are likely to be of functional importance in population based studies of rare diseases, somatic mutations in cancer and explaining the “missing heritability” of quantitative traits.
PMCID: PMC3695811  PMID: 23773188
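The "naive expectation of sampling from a binomial" mentioned in entry 11 can be sketched as follows. This is an illustration under stated assumptions, not the authors' calibrated model: each read at a heterozygous site is assumed to show the non-reference allele with probability 0.5, and a variant is assumed to be detected once at least `min_alt` reads support it. The function names and the `min_alt=2` threshold are hypothetical.

```python
from math import comb


def naive_het_sensitivity(depth, min_alt=2, alt_fraction=0.5):
    """Binomial probability of seeing at least `min_alt` alt-allele reads.

    Assumptions (not from the paper): reads are independent, error-free,
    and sample the alt allele with probability `alt_fraction`.
    """
    if depth < min_alt:
        return 0.0
    # Probability of observing fewer than `min_alt` alt reads (a miss).
    p_miss = sum(
        comb(depth, k) * alt_fraction ** k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt)
    )
    return 1.0 - p_miss


def min_depth_for(target, min_alt=2, max_depth=200):
    """Smallest read depth whose naive sensitivity reaches `target`."""
    for depth in range(1, max_depth + 1):
        if naive_het_sensitivity(depth, min_alt) >= target:
            return depth
    return None


if __name__ == "__main__":
    print(naive_het_sensitivity(13))  # naive sensitivity at 13X
    print(min_depth_for(0.95))        # naive depth needed for 95% sensitivity
```

Under these idealised assumptions the naive model reaches 95% sensitivity at a much lower depth than the 13X the abstract reports from measured data, which is consistent with the statement that measured sensitivity is substantially worse than the binomial expectation.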
12.  Harnessing genomics to identify environmental determinants of heritable disease 
Mutation research  2012;752(1):6-9.
Next-generation sequencing technologies can now be used to directly measure heritable de novo DNA sequence mutations in humans. However, these techniques have not been used to examine environmental factors that induce such mutations and their associated diseases. To address this issue, a working group on environmentally induced germline mutation analysis (ENIGMA) met in October 2011 to propose the necessary foundational studies, which include sequencing of parent–offspring trios from highly exposed human populations, and controlled dose–response experiments in animals. These studies will establish background levels of variability in germline mutation rates and identify environmental agents that influence these rates and heritable disease. Guidance for the types of exposures to examine come from rodent studies that have identified agents such as cancer chemotherapeutic drugs, ionizing radiation, cigarette smoke, and air pollution as germ-cell mutagens. Research is urgently needed to establish the health consequences of parental exposures on subsequent generations.
PMCID: PMC3556182  PMID: 22935230
Germ cell; Heritable mutation; Next generation sequencing; Copy number variants
14.  Mutation spectrum revealed by breakpoint sequencing of human germline CNVs 
Nature genetics  2010;42(5):385-391.
Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315 deletions. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1–30 bp of microhomology, whereas 33% of deletion breakpoints contain 1–367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.
PMCID: PMC3428939  PMID: 20364136
15.  Variation in genome-wide mutation rates within and between human families 
Nature genetics  2011;43(7):712-714.
J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female [1]. Diverse studies have supported Haldane’s contention of a higher average mutation rate in the male germline in a variety of mammals, including humans (e.g. [2,3]). Here we present the first direct comparative analysis of male and female germline mutation rates from complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell-lines from which DNA was derived. Most strikingly, in one family we observed that 92% of germline DNMs were from the paternal germline, while, in complete contrast, in the other family 64% of DNMs were from the maternal germline. These observations reveal considerable variation in mutation rates within and between families.
PMCID: PMC3322360  PMID: 21666693
16.  Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants 
Nature Biotechnology  2011;29(6):512-520.
We have systematically compared copy number variant (CNV) detection on eleven microarrays to evaluate data quality and CNV calling, reproducibility, concordance across array platforms and laboratory sites, breakpoint accuracy and analysis tool variability. Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. Nevertheless, these findings should not preclude detection of large CNVs for clinical diagnostic purposes because large CNVs with poor reproducibility are found primarily in complex genomic regions and would typically be removed by standard clinical data curation. The striking differences between CNV calls from different platforms and analytic tools highlight the importance of careful assessment of experimental design in discovery and association studies and of strict data curation and filtering in diagnostics. The CNV resource presented here allows independent data evaluation and provides a means to benchmark new algorithms.
PMCID: PMC3270583  PMID: 21552272
17.  The Y-chromosome landscape of the Philippines: extensive heterogeneity and varying genetic affinities of Negrito and non-Negrito groups 
The Philippines exhibits a rich diversity of people, languages, and culture, including so-called ‘Negrito’ groups that have long fascinated anthropologists, yet little is known about their genetic diversity. We report here a survey of Y-chromosome variation in 390 individuals from 16 Filipino ethnolinguistic groups, including six Negrito groups, from across the archipelago. We find extreme diversity in the Y-chromosome lineages of Filipino groups with heterogeneity seen in both Negrito and non-Negrito groups, which does not support a simple dichotomy of Filipino groups as Negrito vs non-Negrito. Filipino lineages of the non-recombining region of the human Y chromosome reflect a chronology that extends from after the initial colonization of the Asia-Pacific region to the time frame of the Austronesian expansion. Filipino groups appear to have diverse genetic affinities with different populations in the Asia-Pacific region. In particular, some Negrito groups are associated with indigenous Australians, with a potential time for the association ranging from the initial colonization of the region to more recent (after colonization) times. Overall, our results indicate extensive heterogeneity contributing to a complex genetic history for Filipino groups, with varying roles for migrations from outside the Philippines, genetic drift, and admixture among neighboring groups.
PMCID: PMC3025791  PMID: 20877414
Y-chromosome; Filipino; Negrito; heterogeneity; genetic affinity
18.  Large, rare chromosomal deletions associated with severe early-onset obesity 
Nature  2009;463(7281):666-670.
Obesity is a highly heritable and genetically heterogeneous disorder [1]. Here we investigated the contribution of copy number variation to obesity in 300 Caucasian patients with severe early-onset obesity, 143 of whom also had developmental delay. Large (>500 kilobases), rare (<1%) deletions were significantly enriched in patients compared to 7,366 controls (P < 0.001). We identified several rare copy number variants that were recurrent in patients but absent or at much lower prevalence in controls. We identified five patients with overlapping deletions on chromosome 16p11.2 that were found in 2 out of 7,366 controls (P < 5 × 10⁻⁵). In three patients the deletion co-segregated with severe obesity. Two patients harboured a larger de novo 16p11.2 deletion, extending through a 593-kilobase region previously associated with autism [2-4] and mental retardation [5]; both of these patients had mild developmental delay in addition to severe obesity. In an independent sample of 1,062 patients with severe obesity alone, the smaller 16p11.2 deletion was found in an additional two patients. All 16p11.2 deletions encompass several genes but include SH2B1, which is known to be involved in leptin and insulin signalling [6]. Deletion carriers exhibited hyperphagia and severe insulin resistance disproportionate for the degree of obesity. We show that copy number variation contributes significantly to the genetic architecture of human obesity.
PMCID: PMC3108883  PMID: 19966786
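The enrichment of 16p11.2 deletions in entry 18 (5 of 300 patients versus 2 of 7,366 controls) can be checked with a short sketch. The abstract does not state which test produced its P value; a one-sided Fisher's exact test is assumed here purely for illustration, and the function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing `a` or more
    carriers in the first row, given fixed row and column totals.
    Assumption: this test is chosen for illustration; the paper's own
    test is not specified in the abstract.
    """
    row1 = a + b             # size of the first group (e.g. patients)
    col1 = a + c             # total number of carriers
    total = a + b + c + d
    upper = min(row1, col1)  # largest possible carrier count in row 1
    favourable = sum(
        comb(col1, k) * comb(total - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favourable / comb(total, row1)


if __name__ == "__main__":
    # Counts taken from the abstract: carriers / non-carriers per group.
    p_value = fisher_one_sided(5, 300 - 5, 2, 7366 - 2)
    print(p_value)
```

Under this assumed test the result falls below the 5 × 10⁻⁵ bound quoted in the abstract, so the reported significance is at least plausible for these counts.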
19.  Characterising and Predicting Haploinsufficiency in the Human Genome 
PLoS Genetics  2010;6(10):e1001154.
Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
Author Summary
Humans, like most complex organisms, have two copies of most genes in their genome, one from the mother and one from the father. This redundancy provides a back-up copy for most genes, should one copy be lost through mutation. For a minority of genes, one functional copy is not enough to sustain normal human function, and mutations causing the loss of function of one of the copies of such genes are a major cause of childhood developmental diseases. Over the past 20 years medical geneticists have identified over 300 such genes, but it is not known how many of the 22,000 genes in our genome may also be sensitive to gene loss. By comparing these ∼300 genes known to be sensitive to gene loss with over 1,000 genes where loss of a single copy does not result in disease, we have identified some key evolutionary and functional similarities between genes sensitive to loss of a single copy. We have used these similarities to predict for most genes in the genome, whether loss of a single copy is likely to result in disease. These predictions will help in the interpretation of mutations seen in patients.
PMCID: PMC2954820  PMID: 20976243
20.  Towards a comprehensive structural variation map of an individual human genome 
Genome Biology  2010;11(5):R52.
A comprehensive map of structural variation in the human genome provides a reference dataset for analyses of future personal genomes.
Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertion/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.
We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association.
Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. This significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.
PMCID: PMC2898065  PMID: 20482838
21.  High-throughput haplotype determination over long distances by Haplotype Fusion PCR and Ligation Haplotyping 
Nature protocols  2009;4(12):1771-1783.
When combined with Haplotype Fusion PCR (HF-PCR), Ligation Haplotyping is a robust, high-throughput method for empirical determination of haplotypes, which can be applied to assaying both sequence and structural variation over long distances. Unlike alternative approaches to haplotype determination, such as allele-specific PCR and long PCR, HF-PCR and Ligation Haplotyping do not suffer from mispriming or template switching errors. In this method, HF-PCR is used to juxtapose DNA sequences from single-molecule templates that contain single nucleotide polymorphisms (SNPs) or paralogous sequence variants (PSVs) separated by several kilobases. HF-PCR employs an emulsion-based fusion PCR reaction, which can be performed rapidly and in a 96-well format. Subsequently, a ligation-based assay is performed on the HF-PCR products to determine haplotypes. Products are resolved by capillary electrophoresis. Once optimized, the method is rapid to perform, taking a day and a half to generate phased haplotypes from genomic DNA.
PMCID: PMC2871309  PMID: 20010928
22.  The functional impact of structural variation in humans 
Trends in genetics : TIG  2008;24(5):238-245.
Structural variation includes many different types of chromosomal rearrangement and encompasses millions of bases in every human genome. Over the past three years the extent and complexity of structural variation have become better appreciated. Diverse approaches have been adopted to explore the functional impact of this class of variation. As disparate indications of the important biological consequences of genome dynamism are accumulating rapidly, we review the evidence that structural variation has an appreciable impact on cellular phenotypes, disease and human evolution.
PMCID: PMC2869026  PMID: 18378036
23.  A robust statistical method for case-control association testing with copy number variation 
Nature genetics  2008;40(10):1245-1252.
Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.
PMCID: PMC2784596  PMID: 18776912
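The idea of applying a likelihood ratio test directly to quantitative CNV measurements, rather than to discrete genotype calls, can be illustrated with a much simpler model than the one the abstract describes. The sketch below is not the CNVtools method. It assumes a single Gaussian intensity distribution per group with a shared variance, and it tests only whether the mean signal differs between cases and controls. The function name `gaussian_lrt` and the example values are hypothetical, and the p-value relies on the asymptotic chi-square approximation with one degree of freedom, which is rough for small samples.

```python
import math


def gaussian_lrt(cases, controls):
    """Likelihood ratio test for a mean shift between two groups.

    Illustrative model only (an assumption, not taken from the abstract):
    measurements are Gaussian with a variance shared by both groups.
    Null: one common mean. Alternative: a separate mean per group.
    """
    pooled = list(cases) + list(controls)
    n = len(pooled)
    mean_all = sum(pooled) / n
    mean_cases = sum(cases) / len(cases)
    mean_controls = sum(controls) / len(controls)

    # Residual sums of squares under each hypothesis (n times the variance MLE).
    rss_null = sum((x - mean_all) ** 2 for x in pooled)
    rss_alt = (sum((x - mean_cases) ** 2 for x in cases)
               + sum((x - mean_controls) ** 2 for x in controls))
    if rss_alt == 0:
        raise ValueError("no within-group variation; statistic is undefined")

    # 2 * (loglik_alt - loglik_null) simplifies to n * log(rss_null / rss_alt).
    # Clamp at zero to absorb floating-point rounding when the means coincide.
    stat = max(0.0, n * math.log(rss_null / rss_alt))

    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical normalised intensity values, for illustration only.
    cases = [3.0, 3.1, 2.9, 3.0]
    controls = [2.0, 2.1, 1.9, 2.0]
    stat, p = gaussian_lrt(cases, controls)
    print(f"LRT statistic = {stat:.2f}, p = {p:.2e}")
```

A toy model like this does nothing to address differential measurement error between cases and controls, which is the problem the abstract raises, so it should be read only as an illustration of the test statistic.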
24.  Deep short-read sequencing of chromosome 17 from the mouse strains A/J and CAST/Ei identifies significant germline variation and candidate genes that regulate liver triglyceride levels 
Genome Biology  2009;10(10):R112.
Methods for accurate identification of nucleotide and structural variation using de novo short read sequencing of mouse chromosomes are described.
Genome sequences are essential tools for comparative and mutational analyses. Here we present the short read sequence of mouse chromosome 17 from the Mus musculus domesticus derived strain A/J, and the Mus musculus castaneus derived strain CAST/Ei. We describe approaches for the accurate identification of nucleotide and structural variation in the genomes of vertebrate experimental organisms, and show how these techniques can be applied to help prioritize candidate genes within quantitative trait loci.
PMCID: PMC2784327  PMID: 19825173
25.  The population genetics of structural variation 
Nature genetics  2007;39(7 Suppl):S30-S36.
Population genetics is central to our understanding of human variation, and by linking medical and evolutionary themes, it enables us to understand the origins and impacts of our genomic differences. Despite current limitations in our knowledge of the locations, sizes and mutational origins of structural variants, our characterization of their population genetics is developing apace, bringing new insights into recent human adaptation, genome biology and disease. We summarize recent dramatic advances, describe the diverse mutational origins of chromosomal rearrangements and argue that their complexity necessitates a re-evaluation of existing population genetic methods.
PMCID: PMC2716079  PMID: 17597779
