
Results 1-22 (22)

1.  Clonal Architecture of Secondary Acute Myeloid Leukemia Defined by Single-Cell Sequencing 
PLoS Genetics  2014;10(7):e1004462.
Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions—the population frequency of individual clones, their genetic composition, and their evolutionary relationships—which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Author Summary
Human cancers are genetically diverse populations of cells that evolve over the course of their natural history or in response to the selective pressure of therapy. In theory, it is possible to infer how this variation is structured into related populations of cells based on the frequency of individual mutations in bulk samples, but the accuracy of these models has not been evaluated across a large number of variants in individual cells. Here, we report a strategy for analyzing hundreds of variants within a single cell, and we apply this method to assess models of tumor clonality derived from bulk samples in three cases of leukemia. The data largely support the predicted population structure, though they suggest specific refinements. This type of approach not only illustrates the biological complexity of human cancer, but it also has the potential to inform patient management. That is, precise knowledge of which variants are present in which populations of cells may allow physicians to more effectively target combinations of mutations and predict how patients will respond to therapy.
PMCID: PMC4091781  PMID: 25010716
2.  Cis and Trans Effects of Human Genomic Variants on Gene Expression 
PLoS Genetics  2014;10(7):e1004461.
Gene expression is a heritable cellular phenotype that defines the function of a cell and can lead to disease when misregulated. To detect genetic variation affecting gene expression, we performed association analysis, in cis and in trans, of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) with gene expression measured in 869 lymphoblastoid cell lines of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. We discovered that 3,534 genes (false discovery rate (FDR) = 5%) are affected by an expression quantitative trait locus (eQTL) in cis and 48 genes are affected in trans. We observed that CNVs are more likely to be eQTLs than SNPs. In addition, we found that variants associated with complex traits and diseases are enriched for trans-eQTLs and that trans-eQTLs are enriched for cis-eQTLs. As a variant affecting both a gene in cis and a gene in trans suggests that the cis gene is functionally linked to the trans gene expression, we looked specifically for trans effects of cis-eQTLs. We discovered that 26 cis-eQTLs are associated with 92 genes in trans, with the cis-eQTLs of the transcription factors BATF3 and HMX2 affecting the most genes. We then explored whether variation in the expression level of the cis genes causally affected the expression level of the trans genes, and discovered several such causal relationships. This analysis shows that a large sample size allows the discovery of secondary effects of human variations on gene expression that can be used to construct short directed gene regulatory networks.
Author Summary
Humans differ in their genetic sequences at millions of positions, but only a subset of these differences has a functional effect. To detect functional genetic differences, we assessed the impact of common genetic variants on gene expression in 869 individuals and discovered that the expression of many genes is affected by common variants in cis or in trans. We show that the effect of some variants on gene expression cannot be detected in other tissues, highlighting the tissue specificity of gene regulation. In addition, we show that variants associated with common diseases are more likely to affect gene expression in cis and in trans. Finally, we show that variants affecting gene expression in cis often affect gene expression in trans, which suggests that the trans effects are due to the expression of the cis genes. We tested this hypothesis and discovered several cases of genes regulated in trans by a cis-regulated gene in a causal manner. This shows that a population-based strategy with a large number of individuals has the potential to detect secondary effects of common variants that can be used to construct short directed regulatory networks.
PMCID: PMC4091791  PMID: 25010687
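The abstract of entry 2 reports eQTL genes at a false discovery rate of 5% but does not say which procedure was used. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure, the thresholding could look like the following. The function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up procedure; the abstract does
    not state which FDR method the authors applied.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if pvalues[idx] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Illustrative (made-up) per-gene p-values.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.06, 0.074, 0.205, 0.212, 0.216]
example_calls = benjamini_hochberg(example_pvalues, fdr=0.05)
```

With these made-up values only the two smallest p-values pass, because each later p-value exceeds its rank-scaled threshold.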
3.  DeNovoGear: de novo indel and point mutation discovery and phasing 
Nature methods  2013;10(10):985-987.
We present the DeNovoGear software for analyzing de novo mutations from familial and somatic tissue sequencing data. DeNovoGear uses likelihood-based error modeling to reduce the false positive rate of mutation discovery in exome analysis, and fragment information to identify the parental origin of germline mutations. We used our program to create a whole-genome de novo indel callset with a 95% validation rate, producing a direct estimate of the human germline indel mutation rate.
PMCID: PMC4003501  PMID: 23975140
4.  Human Spermatogenic Failure Purges Deleterious Mutation Load from the Autosomes and Both Sex Chromosomes, including the Gene DMRT1 
PLoS Genetics  2013;9(3):e1003349.
Gonadal failure, along with early pregnancy loss and perinatal death, may be an important filter that limits the propagation of harmful mutations in the human population. We hypothesized that men with spermatogenic impairment, a disease with unknown genetic architecture and a common cause of male infertility, are enriched for rare deleterious mutations compared to men with normal spermatogenesis. After assaying genomewide SNPs and CNVs in 323 Caucasian men with idiopathic spermatogenic impairment and more than 1,100 controls, we estimate that each rare autosomal deletion detected in our study multiplicatively changes a man's risk of disease by 10% (OR 1.10 [1.04–1.16], p<2×10^−3), rare X-linked CNVs by 29% (OR 1.29 [1.11–1.50], p<1×10^−3), and rare Y-linked duplications by 88% (OR 1.88 [1.13–3.13], p<0.03). By contrasting the properties of our case-specific CNVs with those of CNV callsets from cases of autism, schizophrenia, bipolar disorder, and intellectual disability, we propose that the CNV burden in spermatogenic impairment is distinct from the burden of large, dominant mutations described for neurodevelopmental disorders. We identified two patients with deletions of DMRT1, a gene on chromosome 9p24.3 orthologous to the putative sex determination locus of the avian ZW chromosome system. In an independent sample of Han Chinese men, we identified 3 more DMRT1 deletions in 979 cases of idiopathic azoospermia and none in 1,734 controls, and found none in an additional 4,519 controls from public databases. The combined results indicate that DMRT1 loss-of-function mutations are a risk factor and potential genetic cause of human spermatogenic failure (frequency of 0.38% in 1,306 cases and 0% in 7,754 controls, p = 6.2×10^−5). Our study identifies other recurrent CNVs as potential causes of idiopathic azoospermia and generates hypotheses for directing future studies on the genetic basis of male infertility and IVF outcomes.
Author Summary
Infertility is a disease that prevents the transmission of DNA from one generation to the next, and consequently it has been difficult to study the genetics of infertility using classical human genetics methods. Now, new technologies for screening entire genomes for rare and patient-specific mutations are revolutionizing our understanding of reproductively lethal diseases. Here, we apply techniques for variation discovery to study a condition called azoospermia, the failure to produce sperm. Large deletions of the Y chromosome are the primary known genetic risk factor for azoospermia, and genetic testing for these deletions is part of the standard treatment for this condition. We have screened over 300 men with azoospermia for rare deletions and duplications, and find an enrichment of these mutations throughout the genome compared to unaffected men. Our results indicate that sperm production is affected by mutations beyond the Y chromosome and will motivate whole-genome analyses of larger numbers of men with impaired spermatogenesis. Our finding of an enrichment of rare deleterious mutations in men with poor sperm production also raises the possibility that the slightly increased rate of birth defects reported in children conceived by in vitro fertilization may have a genetic basis.
PMCID: PMC3605256  PMID: 23555275
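The DMRT1 figures in the abstract of entry 4 can be checked with a short calculation. The sketch below assumes a one-sided Fisher exact test, which the abstract does not name, and takes the carrier count of 5 from the 2 + 3 deletions the abstract mentions. It also shows how a per-event odds ratio compounds under the multiplicative model the abstract describes. All function and variable names are illustrative.

```python
from math import comb

# Counts read from the abstract: 2 + 3 DMRT1 deletions among 1,306 cases,
# none among 7,754 controls.
cases, controls = 1306, 7754
deletions_in_cases = 5


def prob_all_in_cases(n_cases, n_controls, k):
    """Hypergeometric probability that all k carriers fall among the cases.

    With zero carriers in controls this is the most extreme table, so it
    equals the one-sided Fisher exact p-value (an assumed choice of test).
    """
    return comb(n_cases, k) / comb(n_cases + n_controls, k)


def combined_odds_ratio(per_event_or, n_events):
    """Odds ratio for n_events under a multiplicative per-event model."""
    return per_event_or ** n_events


p_one_sided = prob_all_in_cases(cases, controls, deletions_in_cases)
frequency_in_cases = deletions_in_cases / cases

# Example: three rare autosomal deletions at OR 1.10 each.
or_three_deletions = combined_odds_ratio(1.10, 3)
```

Under these assumptions the probability comes out at about 6.2×10^−5 and the case frequency at about 0.38%, in line with the values quoted in the abstract.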
5.  Mutation spectrum revealed by breakpoint sequencing of human germline CNVs 
Nature genetics  2010;42(5):385-391.
Precisely characterizing the breakpoints of copy number variants (CNVs) is crucial for assessing their functional impact. However, fewer than 10% of known germline CNVs have been mapped to the single-nucleotide level. We characterized the sequence breakpoints from a dataset of all CNVs detected in three unrelated individuals in previous array-based CNV discovery experiments. We used targeted hybridization-based DNA capture and 454 sequencing to sequence 324 CNV breakpoints, including 315 deletions. We observed two major breakpoint signatures: 70% of the deletion breakpoints have 1–30 bp of microhomology, whereas 33% of deletion breakpoints contain 1–367 bp of inserted sequence. The co-occurrence of microhomology and inserted sequence is low (10%), suggesting that there are at least two different mutational mechanisms. Approximately 5% of the breakpoints represent more complex rearrangements, including local microinversions, suggesting a replication-based strand switching mechanism. Despite a rich literature on DNA repair processes, reconstruction of the molecular events generating each of these mutations is not yet possible.
PMCID: PMC3428939  PMID: 20364136
6.  A systematic survey of loss-of-function variants in human protein-coding genes 
Science (New York, N.Y.)  2012;335(6070):823-828.
Genome sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2,951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in non-essential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes, and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
PMCID: PMC3299548  PMID: 22344438
7.  Origins and functional impact of copy number variation in the human genome 
Nature  2009;464(7289):704-712.
Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
PMCID: PMC3330748  PMID: 19812545
8.  Variation in genome-wide mutation rates within and between human families 
Nature genetics  2011;43(7):712-714.
J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female 1. Diverse studies have supported Haldane’s contention of a higher average mutation rate in the male germline in a variety of mammals, including humans (e.g. 2,3). Here we present the first direct comparative analysis of male and female germline mutation rates from complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell-lines from which DNA was derived. Most strikingly, in one family we observed that 92% of germline DNMs were from the paternal germline, while, in complete contrast, in the other family 64% of DNMs were from the maternal germline. These observations reveal considerable variation in mutation rates within and between families.
PMCID: PMC3322360  PMID: 21666693
9.  Independent estimation of the frequency of rare CNVs in the UK population confirms their role in schizophrenia 
Schizophrenia Research  2012;135(1-3):1-7.
Several large, rare chromosomal copy number variants (CNVs) have recently been shown to increase risk for schizophrenia and other neuropsychiatric disorders including autism, ADHD, learning difficulties and epilepsy.
We wanted to examine the frequencies of these schizophrenia-associated variants in a large sample of individuals with non-psychiatric illnesses to better understand the robustness and specificity of the association with schizophrenia.
We used Affymetrix 500K microarray data from 10,259 individuals from the UK Wellcome Trust Case Control Consortium (WTCCC) who are affected with six non-psychiatric disorders (coronary artery disease, Crohn's disease, hypertension, rheumatoid arthritis, types 1 and 2 diabetes) to establish the frequencies of nine CNV loci strongly implicated in schizophrenia, and compared them with the previous findings.
Deletions at 1q21.1, 3q29, 15q11.2, 15q13.1 and 22q11.2 (VCFS region), and duplications at 16p11.2 were found significantly more often in schizophrenia cases than in the WTCCC reference set. Deletions at 17p12 and 17q12 were also more common in schizophrenia cases but not significantly so, while duplications at 16p13.1 were found at nearly the same rate as in previous schizophrenia samples. The frequencies of CNVs in the WTCCC non-psychiatric controls at three of the loci (15q11.2, 16p13.1 and 17p12) were significantly higher than those reported in previous control populations.
The evidence for association with schizophrenia is compelling for six rare CNV loci, while the remaining three require further replication in large studies. Risk at these loci extends to other neurodevelopmental disorders but their involvement in common non-psychiatric disorders should also be investigated.
PMCID: PMC3315675  PMID: 22130109
CNV; Schizophrenia; WTCCC
10.  Meeting on big mutations addresses big questions in human genetics 
Genome Medicine  2011;3(2):12.
A report on the Keystone Symposium 'Functional Consequences of Structural Variation in the Genome', Steamboat Springs, Colorado, USA, 8-13 January 2011.
PMCID: PMC3092097  PMID: 21345244
11.  Exploration of signals of positive selection derived from genotype-based human genome scans using re-sequencing data 
Human Genetics  2011;131(5):665-674.
We have investigated whether regions of the genome showing signs of positive selection in scans based on haplotype structure also show evidence of positive selection when sequence-based tests are applied, whether the target of selection can be localized more precisely, and whether such extra evidence can lead to increased biological insights. We used two tools: simulations under neutrality or selection, and experimental investigation of two regions identified by the HapMap2 project as putatively selected in human populations. Simulations suggested that neutral and selected regions should be readily distinguished and that it should be possible to localize the selected variant to within 40 kb at least half of the time. Re-sequencing of two ~300 kb regions (chr4:158Mb and chr10:22Mb) lacking known targets of selection in HapMap CHB individuals provided strong evidence for positive selection within each and suggested the micro-RNA gene hsa-miR-548c as the best candidate target in one region, and changes in regulation of the sperm protein gene SPAG6 in the other.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-011-1111-9) contains supplementary material, which is available to authorized users.
PMCID: PMC3325425  PMID: 22057783
12.  Mapping copy number variation by population scale genome sequencing 
Nature  2011;470(7332):59-65.
Genomic structural variants (SVs) are abundant in humans, differing from other variation classes in extent, origin, and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (i.e., copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analyzing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
PMCID: PMC3077050  PMID: 21293372
13.  Inverted duplications on acentric markers: mechanism of formation 
Human Molecular Genetics  2009;18(12):2241-2256.
Acentric inverted duplication (inv dup) markers, the largest group of chromosomal abnormalities with neocentromere formation, are found in patients both with idiopathic mental retardation and with cancer. The mechanism of their formation has been investigated by analyzing the breakpoints and the genotypes of 12 inv dup marker cases (three trisomic, six tetrasomic, two polysomic and one X chromosome derived marker) using a combination of fluorescence in situ hybridization, quantitative SNP array and microsatellite analysis. Inv dup markers were found to form either symmetrically with one breakpoint or asymmetrically with two distinct breakpoints. Genotype analyses revealed that all inv dup markers formed from one single chromatid end. This observation is incompatible with the previously suggested model by which the acentric inv dup markers form through inter-chromosomal U-type exchange. On the basis of the identification of DNA sequence motifs with inverted homologies within all observed breakpoint regions, a new general mechanism is proposed for the acentric inv dup marker formation: following a double-strand break an acentric fragment forms, during either meiosis or mitosis. The open DNA end of the acentric fragment is stabilized by the formation of an intra-chromosomal loop promoted by the presence of sequences with inverted homologies. Likely coinciding with the neocentromere formation, this stabilized fragment is duplicated during an early mitotic event, ensuring the marker’s survival during cell division and its presence in all cells.
PMCID: PMC2685760  PMID: 19336476
14.  Towards a comprehensive structural variation map of an individual human genome 
Genome Biology  2010;11(5):R52.
A comprehensive map of structural variation in the human genome provides a reference dataset for analyses of future personal genomes.
Several genomes have now been sequenced, with millions of genetic variants annotated. While significant progress has been made in mapping single nucleotide polymorphisms (SNPs) and small (<10 bp) insertion/deletions (indels), the annotation of larger structural variants has been less comprehensive. It is still unclear to what extent a typical genome differs from the reference assembly, and analyses of the genomes sequenced to date have shown varying results for copy number variation (CNV) and inversions.
We have combined computational re-analysis of existing whole genome sequence data with novel microarray-based analysis, and detect 12,178 structural variants covering 40.6 Mb that were not reported in the initial sequencing of the first published personal genome. We estimate a total non-SNP variation content of 48.8 Mb in a single genome. Our results indicate that this genome differs from the consensus reference sequence by approximately 1.2% when considering indels/CNVs, 0.1% by SNPs and approximately 0.3% by inversions. The structural variants impact 4,867 genes, and >24% of structural variants would not be imputed by SNP-association.
Our results indicate that a large number of structural variants have been unreported in the individual genomes published to date. This significant extent and complexity of structural variants, as well as the growing recognition of their medical relevance, necessitate they be actively studied in health-related analyses of personal genomes. The new catalogue of structural variants generated for this genome provides a crucial resource for future comparison studies.
PMCID: PMC2898065  PMID: 20482838
15.  Haplotypic Background of a Private Allele at High Frequency in the Americas 
Molecular Biology and Evolution  2009;26(5):995-1016.
Recently, the observation of a high-frequency private allele, the 9-repeat allele at microsatellite D9S1120, in all sampled Native American and Western Beringian populations has been interpreted as evidence that all modern Native Americans descend primarily from a single founding population. However, this inference assumed that all copies of the 9-repeat allele were identical by descent and that the geographic distribution of this allele had not been influenced by natural selection. To investigate whether these assumptions are satisfied, we genotyped 34 single nucleotide polymorphisms across ∼500 kilobases (kb) around D9S1120 in 21 Native American and Western Beringian populations and 54 other worldwide populations. All chromosomes with the 9-repeat allele share the same haplotypic background in the vicinity of D9S1120, suggesting that all sampled copies of the 9-repeat allele are identical by descent. Ninety-one percent of these chromosomes share the same 76.26 kb haplotype, which we call the “American Modal Haplotype” (AMH). Three observations lead us to conclude that the high frequency and widespread distribution of the 9-repeat allele are unlikely to be the result of positive selection: 1) aside from its association with the 9-repeat allele, the AMH does not have a high frequency in the Americas, 2) the AMH is not unusually long for its frequency compared with other haplotypes in the Americas, and 3) in Latin American mestizo populations, the proportion of Native American ancestry at D9S1120 is not unusual compared with that observed at other genomewide microsatellites. Using a new method for estimating the time to the most recent common ancestor (MRCA) of all sampled copies of an allele on the basis of an estimate of the length of the genealogy descended from the MRCA, we calculate the mean time to the MRCA of the 9-repeat allele to be between 7,325 and 39,900 years, depending on the demographic model used. 
The results support the hypothesis that all modern Native Americans and Western Beringians trace a large portion of their ancestry to a single founding population that may have been isolated from other Asian populations prior to expanding into the Americas.
PMCID: PMC2734135  PMID: 19221006
private allele; D9S1120; Homo sapiens; native American; migration
16.  A Protein Allergen Microarray Detects Specific IgE to Pollen Surface, Cytoplasmic, and Commercial Allergen Extracts 
PLoS ONE  2010;5(4):e10174.
Current diagnostics for allergies, such as skin prick and radioallergosorbent tests, do not allow for inexpensive, high-throughput screening of patients. Additionally, extracts used in these methods are made from washed pollen that lacks pollen surface materials that may contain allergens.
Methodology/Principal Findings
We sought to develop a high-throughput assay to rapidly measure allergen-specific IgE in sera and to explore the relative allergenicity of different pollen fractions (i.e. surface, cytoplasmic, commercial extracts). To do this, we generated a protein microarray containing surface, cytoplasmic, and commercial extracts from 22 pollen species, commercial extracts from nine non-pollen allergens, and five recombinant allergenic proteins. Pollen surface and cytoplasmic fractions were prepared by extraction into organic solvents and aqueous buffers, respectively. Arrays were incubated with <25 µL of serum from 176 individuals and bound IgE was detected by indirect immunofluorescence, providing a high-throughput measurement of IgE. We demonstrated that the allergen microarray is a reproducible method to measure allergen-specific IgE in small amounts of sera. Using this tool, we demonstrated that specific IgE clusters according to the phylogeny of the allergen source. We also showed that the pollen surface, which has been largely overlooked in the past, contained potent allergens. Although, as a class, cytoplasmic fractions obtained by our pulverization/precipitation method were comparable to commercial extracts, many individual allergens showed significant differences.
These results support the hypothesis that protein microarray technology is a useful tool for both research and in the clinic. It could provide a more efficient and less painful alternative to traditionally used skin prick tests, making it economically feasible to compare allergen sensitivity of different populations, monitor individual responses over time, and facilitate genetic studies on pollen allergy.
PMCID: PMC2856625  PMID: 20419087
17.  Dosage Sensitivity Shapes the Evolution of Copy-Number Varied Regions 
PLoS ONE  2010;5(3):e9474.
Dosage sensitivity is an important evolutionary force which impacts on gene dispensability and duplicability. The newly available data on human copy-number variation (CNV) allow an analysis of the most recent and ongoing evolution. Provided that heterozygous gene deletions and duplications actually change gene dosage, we expect to observe negative selection against CNVs encompassing dosage sensitive genes. In this study, we make use of several sources of population genetic data to identify selection on structural variations of dosage sensitive genes. We show that CNVs can directly affect expression levels of contained genes. We find that genes encoding members of protein complexes exhibit limited expression variation and overlap significantly with a manually derived set of dosage sensitive genes. We show that complexes and other dosage sensitive genes are underrepresented in CNV regions, with a particular bias against frequent variations and duplications. These results suggest that dosage sensitivity is a significant force of negative selection on regions of copy-number variation.
PMCID: PMC2835737  PMID: 20224824
18.  The population genetics of structural variation 
Nature genetics  2007;39(7 Suppl):S30-S36.
Population genetics is central to our understanding of human variation, and by linking medical and evolutionary themes, it enables us to understand the origins and impacts of our genomic differences. Despite current limitations in our knowledge of the locations, sizes and mutational origins of structural variants, our characterization of their population genetics is developing apace, bringing new insights into recent human adaptation, genome biology and disease. We summarize recent dramatic advances, describe the diverse mutational origins of chromosomal rearrangements and argue that their complexity necessitates a re-evaluation of existing population genetic methods.
PMCID: PMC2716079  PMID: 17597779
19.  Using population mixtures to optimize the utility of genomic databases: linkage disequilibrium and association study design in India 
Annals of human genetics  2007;72(Pt 4):535-546.
When performing association studies in populations that have not been the focus of large-scale investigations of haplotype variation, it is often helpful to rely on genomic databases in other populations for study design and analysis – such as in the selection of tag SNPs and in the imputation of missing genotypes. One way of improving the use of these databases is to rely on a mixture of database samples that is similar to the population of interest, rather than using the single most similar database sample. We demonstrate the effectiveness of the mixture approach in the application of African, European, and East Asian HapMap samples for tag SNP selection in populations from India, a genetically intermediate region underrepresented in genomic studies of haplotype variation.
PMCID: PMC2495051  PMID: 18513279
20.  Global variation in copy number in the human genome 
Nature  2006;444(7118):444-454.
Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. 1,447 copy number variable regions covering 360 megabases (12% of the genome) were identified in these populations; these CNV regions contained hundreds of genes, disease loci, functional elements and segmental duplications. Strikingly, these CNVs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal dramatic variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
PMCID: PMC2669898  PMID: 17122850
21.  Mismatch induced speciation in Salmonella: model and data 
In bacteria, DNA sequence mismatches act as a barrier to recombination between distantly related organisms and can potentially promote the cohesion of species. We have performed computer simulations which show that the homology dependence of recombination can cause de novo speciation in a neutrally evolving population once a critical population size has been exceeded. Our model can explain the patterns of divergence and genetic exchange observed in the genus Salmonella, without invoking either natural selection or geographical population subdivision. If this model was validated, based on extensive sequence data, it would imply that the named subspecies of Salmonella enterica correspond to good biological species, making species boundaries objective. However, multilocus sequence typing data, analysed using several conventional tools, provide a misleading impression of relationships within S. enterica subspecies enterica and do not provide the resolution to establish whether new species are presently being formed.
PMCID: PMC1764929  PMID: 17062419
rational systematics; homology-dependent recombination; mismatch repair; genomics; recombination
