Related Articles

1.  Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection 
Background
Next generation sequencing provides clinical research scientists with direct readout of innumerable variants, including personal, pathological and common benign variants. The aim of resequencing studies is to determine the candidate pathogenic variants from individual genomes, or from family-based or tumor/normal genome comparisons. Whilst the use of appropriate controls within the experimental design will minimize the number of false positive variations selected, this number can be reduced further with the use of high quality whole genome reference data to minimize false positive variants prior to candidate gene selection. In addition, the use of platform-related sequencing error models can help in the recovery of ambiguous genotypes from lower coverage data.
Description
We have developed a whole genome database of human genetic variations, Huvariome, determined by whole genome deep sequencing data with high coverage and low error rates. The database was designed to be sequencing technology independent but is currently populated with 165 individual whole genomes consisting of small pedigrees and matched tumor/normal samples sequenced with the Complete Genomics sequencing platform. Common variants have been determined for a Benelux population cohort and represented as genotypes alongside the results of two sets of control data (73 of the 165 genomes), Huvariome Core which comprises 31 healthy individuals from the Benelux region, and Diversity Panel consisting of 46 healthy individuals representing 10 different populations and 21 samples in three pedigrees. Users can query the database by gene or position via a web interface and the results are displayed as the frequency of the variations as detected in the datasets. We demonstrate that Huvariome can provide accurate reference allele frequencies to disambiguate sequencing inconsistencies produced in resequencing experiments. Huvariome has been used to support the selection of candidate cardiomyopathy-related genes which have a homozygous genotype in the reference cohorts. This database allows the users to see which selected variants are common variants (> 5% minor allele frequency) in the Huvariome core samples, thus aiding in the selection of potentially pathogenic variants by filtering out common variants that are not listed in one of the other public genomic variation databases. The no-call rate and the accuracy of allele calling in Huvariome provide the user with the possibility of identifying platform-dependent errors associated with specific regions of the human genome.
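The common-variant filter described above (> 5% minor allele frequency in a reference cohort) can be illustrated with a minimal Python sketch. It is not Huvariome's implementation: the function names, the two-character genotype strings, the use of "N" for no-calls, and the example genotype counts are all assumptions made for illustration. Only the 5% threshold and the cohort size of 31 come from the abstract.

```python
from collections import Counter

# Threshold taken from the abstract: "> 5% minor allele frequency".
COMMON_MAF_THRESHOLD = 0.05


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one biallelic site.

    `genotypes` is a list of two-character diploid calls such as "AG".
    Calls containing "N" are treated as no-calls and skipped; this
    encoding is an assumption, not the Huvariome data format.
    """
    counts = Counter()
    for genotype in genotypes:
        if "N" in genotype:
            continue
        counts.update(genotype)  # counts each of the two alleles
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # uncalled or monomorphic in this cohort
    return min(counts.values()) / total


def is_common(genotypes, threshold=COMMON_MAF_THRESHOLD):
    """True if the site would be filtered out as a common variant."""
    return minor_allele_frequency(genotypes) > threshold


# Hypothetical calls for a cohort of 31 individuals (62 alleles).
site_a = ["AA"] * 27 + ["AG"] * 3 + ["GG"]  # G seen 5 times -> 5/62
site_b = ["CC"] * 30 + ["CT"]               # T seen once    -> 1/62

print(is_common(site_a))  # True: 5/62 is above 0.05
print(is_common(site_b))  # False: 1/62 is below 0.05
```

With only 62 alleles in a cohort of this size, a single heterozygous carrier already corresponds to a frequency of about 1.6%, so the 5% cut-off is crossed at four observed minor alleles.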
Conclusion
Huvariome is a simple-to-use resource for validation of resequencing results obtained by NGS experiments. The high sequence coverage and low error rates provide scientists with the ability to remove false positive results from pedigree studies. Results are returned via a web interface that displays location-based genetic variation frequency, impact on protein function, association with known genetic variations and a quality score of the variation base derived from Huvariome Core and the Diversity Panel data. These results may be used to identify and prioritize rare variants that, for example, might be disease relevant. In testing the accuracy of the Huvariome database, alleles of a selection of ambiguously called coding single nucleotide variants were successfully predicted in all cases. Data protection of individuals is ensured by restricted access to patient-derived genomes from the host institution, which is relevant for future molecular diagnostics.
doi:10.1186/2043-9113-2-19
PMCID: PMC3549785  PMID: 23164068
Medical genetics; Medical genomics; Whole genome sequencing; Allele frequency; Cardiomyopathy
2.  Respiratory chain complex I deficiency caused by mitochondrial DNA mutations 
Defects of the mitochondrial respiratory chain are associated with a diverse spectrum of clinical phenotypes, and may be caused by mutations in either the nuclear or the mitochondrial genome (mitochondrial DNA (mtDNA)). Isolated complex I deficiency is the most common enzyme defect in mitochondrial disorders, particularly in children in whom family history is often consistent with sporadic or autosomal recessive inheritance, implicating a nuclear genetic cause. In contrast, although a number of recurrent, pathogenic mtDNA mutations have been described, historically, these have been perceived as rare causes of paediatric complex I deficiency. We reviewed the clinical and genetic findings in a large cohort of 109 paediatric patients with isolated complex I deficiency from 101 families. Pathogenic mtDNA mutations were found in 29 of 101 probands (29%), 21 in MTND subunit genes and 8 in mtDNA tRNA genes. Nuclear gene defects were inferred in 38 of 101 (38%) probands based on cell hybrid studies, mtDNA sequencing or mutation analysis (nuclear gene mutations were identified in 22 probands). Leigh or Leigh-like disease was the most common clinical presentation in both mtDNA and nuclear genetic defects. The median age at onset was higher in mtDNA patients (12 months) than in patients with a nuclear gene defect (3 months). However, considerable overlap existed, with onset varying from 0 to >60 months in both groups. Our findings confirm that pathogenic mtDNA mutations are a significant cause of complex I deficiency in children. In the absence of parental consanguinity, we recommend whole mitochondrial genome sequencing as a key approach to elucidate the underlying molecular genetic abnormality.
doi:10.1038/ejhg.2011.18
PMCID: PMC3137493  PMID: 21364701
respiratory chain; complex I; mitochondrial DNA; mutation; genetic counselling
3.  Mutations in VRK1 Associated With Complex Motor and Sensory Axonal Neuropathy Plus Microcephaly 
JAMA neurology  2013;70(12):1491-1498.
IMPORTANCE
Patients with rare diseases and complex clinical presentations represent a challenge for clinical diagnostics. Genomic approaches are allowing the identification of novel variants in genes for very rare disorders, enabling a molecular diagnosis. Genomics is also revealing a phenotypic expansion whereby the full spectrum of clinical expression conveyed by mutant alleles at a locus can be better appreciated.
OBJECTIVE
To elucidate the molecular cause of a complex neuropathy phenotype in 3 patients by applying genomic sequencing strategies.
DESIGN, SETTING, AND PARTICIPANTS
Three affected individuals from 2 unrelated families presented with a complex neuropathy phenotype characterized by axonal sensorimotor neuropathy and microcephaly. They were recruited into the Centers for Mendelian Genomics research program to identify the molecular cause of their phenotype. Whole-genome, targeted whole-exome sequencing, and high-resolution single-nucleotide polymorphism arrays were performed in genetics clinics of tertiary care pediatric hospitals and biomedical research institutions.
MAIN OUTCOMES AND MEASURES
Whole-genome and whole-exome sequencing identified the variants responsible for the patients’ clinical phenotype.
RESULTS
We identified compound heterozygous alleles in 2 affected siblings from 1 family and a homozygous nonsense variant in the third unrelated patient in the vaccinia-related kinase 1 gene (VRK1). In the latter subject, we found a common haplotype on which the nonsense mutation occurred and that segregates in the Ashkenazi Jewish population.
CONCLUSIONS AND RELEVANCE
We report the identification of disease-causing alleles in 3 children from 2 unrelated families with a previously uncharacterized complex axonal motor and sensory neuropathy accompanied by severe nonprogressive microcephaly and cerebral dysgenesis. Our data raise the question of whether VRK1 mutations disturb cell cycle progression and may result in apoptosis of cells in the nervous system. The application of unbiased genomic approaches allows the identification of potentially pathogenic mutations in unsuspected genes in highly genetically heterogeneous and uncharacterized neurological diseases.
doi:10.1001/jamaneurol.2013.4598
PMCID: PMC4039291  PMID: 24126608
4.  Whole Genome Sequencing versus Traditional Genotyping for Investigation of a Mycobacterium tuberculosis Outbreak: A Longitudinal Molecular Epidemiological Study 
PLoS Medicine  2013;10(2):e1001387.
In an outbreak investigation of Mycobacterium tuberculosis comparing whole genome sequencing (WGS) with traditional genotyping, Stefan Niemann and colleagues found that classical genotyping falsely clustered some strains, and WGS better reflected contact tracing.
Background
Understanding Mycobacterium tuberculosis (Mtb) transmission is essential to guide efficient tuberculosis control strategies. Traditional strain typing lacks sufficient discriminatory power to resolve large outbreaks. Here, we tested the potential of using next generation genome sequencing for identification of outbreak-related transmission chains.
Methods and Findings
During long-term (1997 to 2010) prospective population-based molecular epidemiological surveillance comprising a total of 2,301 patients, we identified a large outbreak caused by an Mtb strain of the Haarlem lineage. The main performance outcome measure of whole genome sequencing (WGS) analyses was the degree of correlation of the WGS analyses with contact tracing data and the spatio-temporal distribution of the outbreak cases. WGS analyses of the 86 isolates revealed 85 single nucleotide polymorphisms (SNPs), subdividing the outbreak into seven genome clusters (two to 24 isolates each), plus 36 unique SNP profiles. WGS results showed that the first outbreak isolates detected in 1997 were falsely clustered by classical genotyping. In 1998, one clone (termed “Hamburg clone”) started expanding, apparently independently from differences in the social environment of early cases. Genome-based clustering patterns were in better accordance with contact tracing data and the geographical distribution of the cases than clustering patterns based on classical genotyping. A maximum of three SNPs were identified in eight confirmed human-to-human transmission chains, involving 31 patients. We estimated the Mtb genome evolutionary rate at 0.4 mutations per genome per year. This rate suggests that Mtb grows in its natural host with a doubling time of approximately 22 h (400 generations per year). Based on the genome variation discovered, emergence of the Hamburg clone was dated back to a period between 1993 and 1997, hence shortly before the discovery of the outbreak through epidemiological surveillance.
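The relationship between the quoted doubling time and the number of generations per year can be checked with a few lines of Python. This is only an arithmetic sketch: the per-generation mutation figure it prints is derived here from the two numbers in the abstract and is not a value the authors report, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def generations_per_year(doubling_time_hours):
    """Number of doublings that fit into one year."""
    return HOURS_PER_YEAR / doubling_time_hours


def mutations_per_generation(mutations_per_year, doubling_time_hours):
    """Per-genome mutations per generation implied by a yearly rate."""
    return mutations_per_year / generations_per_year(doubling_time_hours)


# Values from the abstract: 22 h doubling time, 0.4 mutations/genome/year.
gens = generations_per_year(22)
rate = mutations_per_generation(0.4, 22)

print(round(gens))     # close to the "400 generations per year" quoted
print(round(rate, 4))  # roughly one mutation per thousand generations
```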
Conclusions
Our findings suggest that WGS is superior to conventional genotyping for Mtb pathogen tracing and investigating micro-epidemics. WGS provides a measure of Mtb genome evolution over time in its natural host context.
Editors' Summary
Background
Tuberculosis—a contagious bacterial disease that usually infects the lungs—is a major public health problem, particularly in low- and middle-income countries. In 2011, an estimated 8.7 million people developed tuberculosis globally, and 1.4 million people died from the disease. Tuberculosis is second only to HIV/AIDS in terms of global deaths from a single infectious agent. Mycobacterium tuberculosis, the bacterium that causes tuberculosis, is readily spread in airborne droplets when people with active disease cough or sneeze. The characteristic symptoms of tuberculosis include persistent cough, weight loss, fever, and night sweats. Diagnostic tests for the disease include sputum smear analysis (examination of mucus coughed up from the lungs for the presence of M. tuberculosis), mycobacterial culture (growth of M. tuberculosis from sputum), and chest X-rays. Tuberculosis can be cured by taking several antibiotics daily for at least six months, although the recent emergence of multidrug-resistant M. tuberculosis is making tuberculosis harder to treat.
Why Was This Study Done?
Although efforts to reduce the global burden of tuberculosis are showing some improvements, the annual decline in the number of people developing tuberculosis continues to be slow. To develop optimized control strategies, experts need to be able to accurately track M. tuberculosis transmission within human populations. Because M. tuberculosis, like all bacteria, accumulates genetic changes over time, there are many different strains (genetic variants) of M. tuberculosis. Genotyping methods have been developed that identify different bacterial strains by examining specific regions of the bacterial genome (blueprint), but because these methods examine only a small part of the genome, they may not distinguish between related transmission chains. That is, traditional strain genotyping methods may not be able to determine accurately where a tuberculosis outbreak started or how it spread through a population. In this longitudinal cohort study, the researchers compare the ability of whole genome sequencing (WGS), which is rapidly becoming widely available, and traditional genotyping to provide information about a recent German tuberculosis outbreak. In a longitudinal cohort study, a population is followed over time to analyze the occurrence of a specific disease.
What Did the Researchers Do and Find?
During long-term (1997–2010) population-based molecular epidemiological surveillance (disease surveillance that uses molecular techniques rather than reports of illness) in Hamburg and Schleswig-Holstein, the researchers identified a large tuberculosis outbreak caused by M. tuberculosis isolates of the Haarlem lineage using classical strain typing. The researchers examined each of the 86 isolates from this outbreak using WGS and classical genotyping and asked whether the results of these two approaches correlated with contact tracing data (information is routinely collected about the people a patient with tuberculosis has recently met so that these contacts can be tested for tuberculosis and treated if necessary) and with the spatio-temporal distribution of outbreak cases. WGS of the isolates identified 85 single nucleotide polymorphisms (SNPs; genomic sequence variants in which single building blocks, or nucleotides, are altered) that subdivided the outbreak into seven clusters of isolates and 36 unique isolates. The WGS results showed that the first isolates of the outbreak were incorrectly clustered by classical genotyping and that one strain—the “Hamburg clone”—started expanding in 1998. Notably, the genome-based clustering patterns were in better accordance with contact tracing data and with the geographical distribution of cases than clustering patterns based on classical genotyping, and they identified eight confirmed human-to-human transmission chains that involved 31 patients and a maximum of three SNPs. Finally, the researchers used their WGS results to estimate that the Hamburg clone emerged between 1993 and 1997, shortly before the discovery of the tuberculosis outbreak through epidemiological surveillance.
What Do These Findings Mean?
These findings show that WGS can be used to identify specific strains within large tuberculosis outbreaks more accurately than classical genotyping. They also provide new information about the evolution of M. tuberculosis during outbreaks and indicate how WGS data should be interpreted in future genome-based molecular epidemiology studies. WGS has the potential to improve the molecular epidemiological surveillance and control of tuberculosis and of other infectious diseases. Importantly, note the researchers, ongoing reductions in the cost of WGS, the increased availability of “bench top” genome sequencers, and bioinformatics developments should all accelerate the implementation of WGS as a standard method for the identification of transmission chains in infectious disease outbreaks.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001387.
The World Health Organization provides information (in several languages) on all aspects of tuberculosis, including the Global Tuberculosis Report 2012
The Stop TB Partnership is working towards tuberculosis elimination; patient stories about tuberculosis are available (in English and Spanish)
The US Centers for Disease Control and Prevention has information about tuberculosis, including information on tuberculosis genotyping (some information in English and Spanish)
The US National Institute of Allergy and Infectious Diseases also has detailed information on all aspects of tuberculosis
The Tuberculosis Survival Project, which aims to raise awareness of tuberculosis and provide support for people with tuberculosis, provides personal stories about treatment for tuberculosis; the Tuberculosis Vaccine Initiative also provides personal stories about dealing with tuberculosis
MedlinePlus has links to further information about tuberculosis (in English and Spanish)
Wikipedia has a page on whole-genome sequencing (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001387
PMCID: PMC3570532  PMID: 23424287
5.  Characterization of mtDNA variation in a cohort of South African paediatric patients with mitochondrial disease 
Mitochondrial disease can be attributed to both mitochondrial and nuclear gene mutations. It has a heterogeneous clinical and biochemical profile, which is compounded by the diversity of the genetic background. Disease-based epidemiological information has expanded significantly in recent decades, but little information is known that clarifies the aetiology in African patients. The aim of this study was to investigate mitochondrial DNA variation and pathogenic mutations in the muscle of diagnosed paediatric patients from South Africa. A cohort of 71 South African paediatric patients was included and a high-throughput nucleotide sequencing approach was used to sequence full-length muscle mtDNA. The average coverage of the mtDNA genome was 81±26 per position. After assigning haplogroups, it was determined that although the nature of non-haplogroup-defining variants was similar in African and non-African haplogroup patients, the number of substitutions was significantly higher in African patients. We describe previously reported disease-associated and novel variants in this cohort. We observed a general lack of commonly reported syndrome-associated mutations, which supports clinical observations and confirms general observations in African patients when using single mutation screening strategies based on (predominantly non-African) mtDNA disease-based information. It is finally concluded that this first extensive report on muscle mtDNA sequences in African paediatric patients highlights the need for a full-length mtDNA sequencing strategy, which applies to all populations where specific mutations are not present. This, in addition to nuclear DNA gene mutation and pathogenicity evaluations, will be required to better unravel the aetiology of these disorders in African patients.
doi:10.1038/ejhg.2011.262
PMCID: PMC3355259  PMID: 22258525
mitochondrial DNA; mitochondrial diseases; paediatrics; Africa; high-throughput nucleotide sequencing
6.  Rare chromosomal deletions and duplications in attention-deficit hyperactivity disorder: a genome-wide analysis 
Lancet  2010;376(9750):1401-1408.
Summary
Background
Large, rare chromosomal deletions and duplications known as copy number variants (CNVs) have been implicated in neurodevelopmental disorders similar to attention-deficit hyperactivity disorder (ADHD). We aimed to establish whether burden of CNVs was increased in ADHD, and to investigate whether identified CNVs were enriched for loci previously identified in autism and schizophrenia.
Methods
We undertook a genome-wide analysis of CNVs in 410 children with ADHD and 1156 unrelated ethnically matched controls from the 1958 British Birth Cohort. Children of white UK origin, aged 5–17 years, who met diagnostic criteria for ADHD or hyperkinetic disorder, but not schizophrenia and autism, were recruited from community child psychiatry and paediatric outpatient clinics. Single nucleotide polymorphisms (SNPs) were genotyped in the ADHD and control groups with two arrays; CNV analysis was limited to SNPs common to both arrays and included only samples with high-quality data. CNVs in the ADHD group were validated with comparative genomic hybridisation. We assessed the genome-wide burden of large (>500 kb), rare (<1% population frequency) CNVs according to the average number of CNVs per sample, with significance assessed via permutation. Locus-specific tests of association were undertaken for test regions defined for all identified CNVs and for 20 loci implicated in autism or schizophrenia. Findings were replicated in 825 Icelandic patients with ADHD and 35 243 Icelandic controls.
Findings
Data for full analyses were available for 366 children with ADHD and 1047 controls. 57 large, rare CNVs were identified in children with ADHD and 78 in controls, showing a significantly increased rate of CNVs in ADHD (0·156 vs 0·075; p=8·9×10⁻⁵). This increased rate of CNVs was particularly high in those with intellectual disability (0·424; p=2·0×10⁻⁶), although there was also a significant excess in cases with no such disability (0·125, p=0·0077). An excess of chromosome 16p13.11 duplications was noted in the ADHD group (p=0·0008 after correction for multiple testing), a finding that was replicated in the Icelandic sample (p=0·031). CNVs identified in our ADHD cohort were significantly enriched for loci previously reported in both autism (p=0·0095) and schizophrenia (p=0·010).
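The reported burden figures are the average number of large, rare CNVs per sample, and they can be recomputed from the counts given above. The short Python sketch below does only that arithmetic; it does not reproduce the permutation test used for the p-values, which would need per-sample data that the abstract does not provide. The function name is hypothetical.

```python
def cnv_rate(cnv_count, sample_count):
    """Average number of CNVs per sample."""
    return cnv_count / sample_count


# Counts from the Findings paragraph.
rate_adhd = cnv_rate(57, 366)       # children with ADHD
rate_controls = cnv_rate(78, 1047)  # controls

print(f"ADHD:     {rate_adhd:.4f}")
print(f"Controls: {rate_controls:.4f}")
print(f"Ratio:    {rate_adhd / rate_controls:.2f}")
```

The recomputed values agree with the reported 0·156 and 0·075 to within rounding, and the ratio shows roughly twice as many large, rare CNVs per sample in the ADHD group as in controls.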
Interpretation
Our findings provide genetic evidence of an increased rate of large CNVs in individuals with ADHD and suggest that ADHD is not purely a social construct.
Funding
Action Research; Baily Thomas Charitable Trust; Wellcome Trust; UK Medical Research Council; European Union.
doi:10.1016/S0140-6736(10)61109-9
PMCID: PMC2965350  PMID: 20888040
7.  Integrating precision medicine in the study and clinical treatment of a severely mentally ill person 
PeerJ  2013;1:e177.
Background. In recent years, there has been an explosion in the number of technical and medical diagnostic platforms being developed. This has greatly improved our ability to more accurately, and more comprehensively, explore and characterize human biological systems on the individual level. Large quantities of biomedical data are now being generated and archived in many separate research and clinical activities, but there exists a paucity of studies that integrate the areas of clinical neuropsychiatry, personal genomics and brain-machine interfaces.
Methods. A single person with severe mental illness was implanted with the Medtronic Reclaim® Deep Brain Stimulation (DBS) Therapy device for Obsessive Compulsive Disorder (OCD), targeting his nucleus accumbens/anterior limb of the internal capsule. Programming of the device and psychiatric assessments occurred in an outpatient setting for over two years. His genome was sequenced and variants were detected in the Illumina Whole Genome Sequencing Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory.
Results. We report here the detailed phenotypic characterization, clinical-grade whole genome sequencing (WGS), and two-year outcome of a man with severe OCD treated with DBS. Since implantation, this man has reported steady improvement, highlighted by a steady decline in his Yale-Brown Obsessive Compulsive Scale (YBOCS) score from ∼38 to a score of ∼25. A rechargeable Activa RC neurostimulator battery has been of major benefit in terms of facilitating a degree of stability and control over the stimulation. His psychiatric symptoms reliably worsen within hours of the battery becoming depleted, thus providing confirmatory evidence for the efficacy of DBS for OCD in this person. WGS revealed that he is a heterozygote for the p.Val66Met variant in BDNF, encoding a member of the nerve growth factor family, and which has been found to predispose carriers to various psychiatric illnesses. He carries the p.Glu429Ala allele in methylenetetrahydrofolate reductase (MTHFR) and the p.Asp7Asn allele in ChAT, encoding choline O-acetyltransferase, with both alleles having been shown to confer an elevated susceptibility to psychoses. We have found thousands of other variants in his genome, including pharmacogenetic and copy number variants. This information has been archived and offered to this person alongside the clinical sequencing data, so that he and others can re-analyze his genome for years to come.
Conclusions. To our knowledge, this is the first study in the clinical neurosciences that integrates detailed neuropsychiatric phenotyping, deep brain stimulation for OCD and clinical-grade WGS with management of genetic results in the medical treatment of one person with severe mental illness. We offer this as an example of precision medicine in neuropsychiatry including brain-implantable devices and genomics-guided preventive health care.
doi:10.7717/peerj.177
PMCID: PMC3792182  PMID: 24109560
Genomics; Deep brain stimulation; Whole genome sequencing; Ethics; Neurosurgery; Obsessive compulsive disorder
8.  Molecular Diagnosis of Usher Syndrome: Application of Two Different Next Generation Sequencing-Based Procedures 
PLoS ONE  2012;7(8):e43799.
Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of whom were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need for high coverage and to the correct interpretation of sequence variations with a non-obvious pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH, which may also be caused by the presence of additional causative genes yet to be identified.
doi:10.1371/journal.pone.0043799
PMCID: PMC3430670  PMID: 22952768
9.  Molecular Genetic Testing for Mitochondrial Disease: From One Generation to the Next 
Neurotherapeutics  2012;10(2):251-261.
Molecular genetic diagnostic testing for mitochondrial disease has evolved continually since the first genetic basis for a clinical mitochondrial disease syndrome was identified in the late 1980s. Owing to global limitations in both knowledge and technology, few individuals, even among those with strong clinical or biochemical evidence of mitochondrial respiratory chain dysfunction, ever received a definitive molecular diagnosis prior to 2005. Clinically available genetic diagnostic testing options improved by 2006 to include sequencing and deletion analysis of an increasing number of individual nuclear genes linked to mitochondrial disease, genome-wide microarray analysis for chromosomal copy number abnormalities, and mitochondrial DNA whole genome sequence analysis. To assess the collective effect of these tests on the genetic diagnosis of suspected mitochondrial disease, we report here results from a retrospective review of the diagnostic yield in patients evaluated from 2008 to 2011 in the Mitochondrial-Genetics Diagnostic Clinic at The Children’s Hospital of Philadelphia. Among 152 patients aged 6 weeks to 81 years referred for clinical evaluation of multisystem presentations concerning for suspected mitochondrial disease, a genetic etiology was established that confirmed definite mitochondrial disease in 16.4% and excluded primary mitochondrial disease in 9.2%. Substantial diagnostic challenges remain owing to the clinical difficulty and frank low yield of a priori selecting individual nuclear genes to sequence based on particular symptomatic or biochemical manifestations of suspected mitochondrial disease. These findings highlight the particular utility of massively parallel nuclear exome sequencing technologies, whose benefits and limitations are explored relative to the clinical genetic diagnostic evaluation of mitochondrial disease.
doi:10.1007/s13311-012-0174-1
PMCID: PMC3625386  PMID: 23269497
Next generation sequencing; massively parallel sequencing; whole exome sequencing; retrospective study; mitochondrial disease diagnosis
10.  Germline Variation in Cancer-Susceptibility Genes in a Healthy, Ancestrally Diverse Cohort: Implications for Individual Genome Sequencing 
PLoS ONE  2014;9(4):e94554.
Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.
doi:10.1371/journal.pone.0094554
PMCID: PMC3984285  PMID: 24728327
11.  Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence 
Background
The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis-causing strains of Treponema pallidum ssp. pallidum (TPA). Yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents.
Methodology/Principal Findings
To precisely define genetic differences between TPA and TPE, high-quality whole genome sequences of three TPE strains (Samoa D, CDC-2, Gauthier) were determined using next-generation sequencing techniques. TPE genome sequences were compared to four genomes of TPA strains (Nichols, DAL-1, SS14, Chicago). The genome structure was identical in all three TPE strains with similar length ranging between 1,139,330 bp and 1,139,744 bp. No major genome rearrangements were found when compared to the four TPA genomes. The whole genome nucleotide divergence (dA) between TPA and TPE subspecies was 4.7 and 4.8 times higher than the observed nucleotide diversity (π) among TPA and TPE strains, respectively, corresponding to 99.8% identity between TPA and TPE genomes. A set of 97 (9.9%) TPE genes encoded proteins containing two or more amino acid replacements or other major sequence changes. The TPE divergent genes were mostly from the group encoding potential virulence factors and genes encoding proteins with unknown function.
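As a rough illustration of what a figure such as 99.8% identity summarises, the sketch below computes percent identity over the ungapped columns of two pre-aligned sequences. It is a simplified stand-in and not the divergence (dA) or diversity (π) calculation used in the study. The function name and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, ungapped columns at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps rather than counting them as mismatches
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences, not genome data: 9 of 10 columns match.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))  # 90.0
```

Skipping gapped columns means insertions and deletions do not lower the score; counting them as differences would be an equally defensible convention.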
Conclusions/Significance
Hypothetical genes, with genetic differences, consistently found between TPE and TPA strains are candidates for syphilitic treponemes virulence factors. Seventeen TPE genes were predicted under positive selection, and eleven of them coded either for predicted exported proteins or membrane proteins suggesting their possible association with the cell surface. Sequence changes between TPE and TPA strains and changes specific to individual strains represent suitable targets for subspecies- and strain-specific molecular diagnostics.
Author Summary
Spirochete Treponema pallidum ssp. pertenue (TPE) is the causative agent of yaws while strains of Treponema pallidum ssp. pallidum (TPA) cause syphilis. Both yaws and syphilis are distinguished on the basis of epidemiological characteristics and clinical symptoms. Neither treponeme can reproduce outside the host organism, which precludes the use of standard molecular biology techniques used to study cultivable pathogens. In this study, we determined high quality whole genome sequences of TPE strains and compared them to known genetic information for T. pallidum ssp. pallidum strains. The genome structure was identical in all three TPE strains and also between TPA and TPE strains. The TPE genome length ranged between 1,139,330 bp and 1,139,744 bp. The overall sequence identity between TPA and TPE genomes was 99.8%, indicating that the two pathogens are extremely closely related. A set of 34 TPE genes (3.5%) encoded proteins containing six or more amino acid replacements or other major sequence changes. These genes more often belonged to the group of genes with predicted virulence and unknown functions suggesting their involvement in infection differences between yaws and syphilis.
doi:10.1371/journal.pntd.0001471
PMCID: PMC3265458  PMID: 22292095
12.  Successful transitioning is a matter of the Heart: Integrated Care for Grown-Up Congenital Heart Disease 
Purpose
This study offers a comprehensive overview of the existing guidelines for GUCH/ACHD care and synthesises the recommendations made over the past decade, developing them into an integrated care concept for GUCH/ACHD patients. Its aim is to emphasise the need for more coordinated action by paediatric and adult specialists and by professional and patient organisations to lobby for a concerted implementation of the guidelines for GUCH/ACHD management and an organised transitioning process.
Context
More than a decade ago, discussions picked up on the adequate management of a challenging new patient group: persons with ‘Grown-Up Congenital Heart Disease’ (GUCH), also known as ‘Adult Congenital Heart Disease’ (ACHD) in North America. The various authors acknowledged the demand for highly specialised and trained professionals who could provide the wide array of services needed for this patient group, with a systematic and multi-disciplinary approach. First experiences have already been gathered throughout the 1990s in Canada and the UK. Since then, the technological and medical advances in paediatric cardiology, cardiac surgery and related medical fields have improved the health outcomes even further, to the extent that 85%–95% of children with congenital heart disease (CHD) survive into adulthood. However, the efforts to implement the necessary managerial, transitioning and vocational training requirements have not been afforded equal focus.
Data sources
A literature review of the existing guidelines for the management of GUCH patients from national and international cardiology associations, expert interviews.
Case description
The key problem in the management of GUCH patients is a lack of understanding of the importance of a coordinated transitioning process from paediatric to adult care services. Neither the paediatric nor the adult specialists have the proper training for the care of these patients, the former lacking experience with adult patients, the latter not knowing the complex indication of congenital heart disease. In the different guidelines (e.g. from the American Heart Association or the European Society of Cardiology), it is acknowledged that cooperation and communication between specialists and settings and a managed transitioning process are paramount. In this case, the focus is laid on developing an integrated care model based on the existing medical guidelines and the requirements a transition process demands. Adolescents with CHD and their parents need to be prepared to adapt to the demands of an adult life. They need information on working and educational options, family planning, and what complications may be expected. Also, the adolescents need to learn to take over the responsibility for their own life and health, independent of their parents. These are just the most pressing of the challenges GUCH patients face.
(Preliminary) conclusions
Even though specialised GUCH/ACHD centres exist in many countries, they are too few in number to effectively and adequately service the whole population and provide high quality training. A lack of coordination and communication between paediatric and adult health care service providers results in patients being lost in transition from paediatric to adult care settings. This counteracts the excellent services children with congenital heart disease receive nowadays, which have led to the need for specialised adult services in the first place. It is a waste of time and resources if the efforts made in the paediatric care setting are not followed up adequately once the patients are grown up. This is a classic setting for implementing integrated care, and this study offers a model, based on the available medical guidelines, to do so.
PMCID: PMC3617751
transition from paediatric to adult services; GUCH; implementation of guidelines; integrated care centres
13.  SOLiD™ Sequencing of Genomes of Clinical Isolates of Leishmania donovani from India Confirm Leptomonas Co-Infection and Raise Some Key Questions 
PLoS ONE  2013;8(2):e55738.
Background
Known as a ‘neglected disease’ because relatively little effort has been applied to finding cures, leishmaniasis kills more than 150,000 people every year and debilitates millions more. Visceral leishmaniasis (VL), also called Kala Azar (KA) or black fever in India, claims around 20,000 lives every year. Whole genome analysis presents an excellent means to identify new targets for drug, vaccine and diagnostics development, and also provides an avenue into the biological basis of parasite virulence in the L. donovani complex prevalent in India.
Methodology/Principal Findings
In the present study, the next generation SOLiD™ platform was successfully utilized for the first time to carry out whole genome sequencing of L. donovani clinical isolates from India. We report the exceptional occurrence of insect trypanosomatids in clinical cases of visceral leishmaniasis (Kala Azar) in India. We confirm with whole genome sequencing analysis data that isolates which were sequenced from Kala Azar (visceral leishmaniasis) cases were genetically related to Leptomonas. The co-infection of the splenic aspirate of these patients with a species of Leptomonas, and how likely it is that the infection might be pathogenic, are key questions which need to be investigated. We discuss our results in the context of some important probable hypotheses in this article.
Conclusions/Significance
Our intriguing results of unusual cases of Kala Azar found to be most similar to Leptomonas species have important clinical implications for the treatment of Kala Azar in India. Leptomonas have been shown to be highly susceptible to several standard leishmaniacides in vitro. There is very little divergence between these two species, viz. Leishmania sp. and L. seymouri, in terms of genomic sequence and organization. The phenomenon of co-infection needs to be understood more fully from the molecular pathogenesis and eco-epidemiological standpoints.
doi:10.1371/journal.pone.0055738
PMCID: PMC3572117  PMID: 23418454
14.  Genetic and Genomic Architecture of the Evolution of Resistance to Antifungal Drug Combinations 
PLoS Genetics  2013;9(4):e1003390.
The evolution of drug resistance in fungal pathogens compromises the efficacy of the limited number of antifungal drugs. Drug combinations have emerged as a powerful strategy to enhance antifungal efficacy and abrogate drug resistance, but the impact on the evolution of drug resistance remains largely unexplored. Targeting the molecular chaperone Hsp90 or its downstream effector, the protein phosphatase calcineurin, abrogates resistance to the most widely deployed antifungals, the azoles, which inhibit ergosterol biosynthesis. Here, we evolved experimental populations of the model yeast Saccharomyces cerevisiae and the leading human fungal pathogen Candida albicans with azole and an inhibitor of Hsp90, geldanamycin, or calcineurin, FK506. To recapitulate a clinical context where Hsp90 or calcineurin inhibitors could be utilized in combination with azoles to render resistant pathogens responsive to treatment, the evolution experiment was initiated with strains that are resistant to azoles in a manner that depends on Hsp90 and calcineurin. Of the 290 lineages initiated, most went extinct, yet 14 evolved resistance to the drug combination. Drug target mutations that conferred resistance to geldanamycin or FK506 were identified and validated in five evolved lineages. Whole-genome sequencing identified mutations in a gene encoding a transcriptional activator of drug efflux pumps, PDR1, and a gene encoding a transcriptional repressor of ergosterol biosynthesis genes, MOT3, that transformed azole resistance of two lineages from dependent on calcineurin to independent of this regulator. Resistance also arose by mutation that truncated the catalytic subunit of calcineurin, and by mutation in LCB1, encoding a sphingolipid biosynthetic enzyme. Genome analysis revealed extensive aneuploidy in four of the C. albicans lineages. 
Thus, we identify molecular determinants of the transition of azole resistance from calcineurin dependence to independence and establish multiple mechanisms by which resistance to drug combinations evolves, providing a foundation for predicting and preventing the evolution of drug resistance.
Author Summary
Fungal infections are a leading cause of mortality worldwide and are difficult to treat due to the limited number of antifungal drugs, whose effectiveness is compromised by the emergence of drug resistance. A powerful strategy to combat drug resistance is combination therapy. Inhibiting the molecular chaperone Hsp90 or its downstream effector calcineurin cripples fungal stress responses and abrogates drug resistance. Here we provide the first analysis of the genetic and genomic changes that underpin the evolution of resistance to antifungal drug combinations in the leading human fungal pathogen, Candida albicans, and model yeast, Saccharomyces cerevisiae. We evolved experimental populations with combinations of inhibitors of Hsp90 or calcineurin and the most widely used antifungal in the clinic, the azoles, which inhibit ergosterol biosynthesis. We harnessed whole-genome sequencing to identify diverse resistance mutations among the 14 lineages that evolved resistance to the drug combination. These included mutations in genes encoding the drug targets, a transcriptional regulator of multidrug transporters, a transcriptional repressor of ergosterol biosynthesis enzymes, and a regulator of sphingolipid biosynthesis. We also identified extensive aneuploidies in several C. albicans lineages. Our study reveals multiple mechanisms by which resistance to drug combination can evolve, suggesting new strategies to combat drug resistance.
doi:10.1371/journal.pgen.1003390
PMCID: PMC3617151  PMID: 23593013
15.  Joint genotype inference with germline and somatic mutations 
BMC Bioinformatics  2013;14(Suppl 5):S3.
The joint sequencing of related genomes has become an important means to discover rare variants. Normal-tumor genome pairs are routinely sequenced together to find somatic mutations and their associations with different cancers. Parental and sibling genomes reveal de novo germline mutations and inheritance patterns related to Mendelian diseases.
Acute lymphoblastic leukemia (ALL) is the most common paediatric cancer and the leading cause of cancer-related death among children. With the aim of uncovering the full spectrum of germline and somatic genetic alterations in childhood ALL genomes, we conducted whole-exome re-sequencing on a unique cohort of over 120 exomes of childhood ALL quartets, each comprising a patient's tumor and matched-normal material, and DNA from both parents. We developed a general probabilistic model for such quartet sequencing reads mapped to the reference human genome. The model is used to infer joint genotypes at homologous loci across a normal-tumor genome pair and two parental genomes.
We describe the algorithms and data structures for genotype inference and model parameter training. We implemented the methods in an open-source software package (QUADGT) that uses the standard file formats of the 1000 Genomes Project. Our method's utility is illustrated on quartets from the ALL cohort.
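The kind of question a quartet design can answer is sketched below with a deterministic check on already-called genotypes. This is not the probabilistic model implemented in QUADGT, which works on mapped reads. The function names, labels and example genotypes are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]  # two alleles at one locus, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele could come from each parent."""
    a, b = child
    return (a in mother and b in father) or (a in father and b in mother)


def classify_site(normal: Genotype, tumor: Genotype,
                  mother: Genotype, father: Genotype) -> List[str]:
    """Label one locus of a quartet from hard genotype calls."""
    labels = []
    if not is_mendelian_consistent(normal, mother, father):
        labels.append("candidate de novo germline")
    if sorted(tumor) != sorted(normal):
        labels.append("candidate somatic")
    return labels or ["inherited, unchanged in tumor"]


# Illustrative genotypes only.
print(classify_site(("A", "G"), ("A", "G"), ("A", "A"), ("A", "A")))
# ['candidate de novo germline']
print(classify_site(("A", "A"), ("A", "T"), ("A", "A"), ("A", "A")))
# ['candidate somatic']
```

Hard calls ignore sequencing error and uneven coverage, which is the motivation for inferring the four genotypes jointly from the reads instead.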
doi:10.1186/1471-2105-14-S5-S3
PMCID: PMC3622648  PMID: 23734724
16.  Clinical Implications of Human Population Differences in Genome-Wide Rates of Functional Genotypes 
Frontiers in Genetics  2012;3:211.
There have been a number of recent successes in the use of whole genome sequencing and sophisticated bioinformatics techniques to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry or genetic background of a patient with an idiopathic condition. This is so because potential pathogenic variants in a patient’s genome must be contrasted with variants in a reference set of genomes made up of other individuals’ genomes of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both an individual patient and the individuals used to construct reference genomes. We pursued this exploration in two major steps. We first considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, based on the chimp genome) single nucleotide variants and small indels in 52 individual whole human genomes sampled from 10 different global populations. We took advantage of a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5–6.1 million total derived variants, of which ∼12,000 are likely to have a functional effect (∼5000 coding and ∼7000 non-coding). We also found that the rates of functional genotypes per the total number of genotypes in individual whole genomes differ dramatically between human populations. We then created tables showing how the use of comparator or reference genome panels comprised of genomes from individuals that do not have the same ancestral background as a patient can negatively impact pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.
doi:10.3389/fgene.2012.00211
PMCID: PMC3485509  PMID: 23125845
clinical sequencing; congenital disease; whole genome sequencing; population genetics
17.  Relationship Estimation from Whole-Genome Sequence Data 
PLoS Genetics  2014;10(1):e1004144.
The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. 
Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
Author Summary
The determination of the relationship between a pair of individuals is a fundamental application of genetics. The most accurate methods for relationship estimation rely on precise, localized estimates of genetic sharing between individuals. Earlier methods have generated these estimates from high-density genetic marker data. We performed relationship estimation using whole-genome sequence data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. Our results demonstrate that complexities specific to whole-genome sequencing result in regions of the genome that are prone to false-positive estimates of genetic sharing. We provide a map of these spurious IBD regions and introduce new methods, implemented in the software package ERSA 2.0, to control for spurious IBD. We show that ERSA 2.0 provides a 5% to 15% increase in relationship detection power for distant relationships with whole-genome sequence data relative to high-density genetic marker data.
doi:10.1371/journal.pgen.1004144
PMCID: PMC3907355  PMID: 24497848
18.  Adaptive Gene Expression Divergence Inferred from Population Genomics 
PLoS Genetics  2007;3(10):e187.
Detailed studies of individual genes have shown that gene expression divergence often results from adaptive evolution of regulatory sequence. Genome-wide analyses, however, have yet to unite patterns of gene expression with polymorphism and divergence to infer population genetic mechanisms underlying expression evolution. Here, we combined genomic expression data—analyzed in a phylogenetic context—with whole genome light-shotgun sequence data from six Drosophila simulans lines and reference sequences from D. melanogaster and D. yakuba. These data allowed us to use molecular population genetics to test for neutral versus adaptive gene expression divergence on a genomic scale. We identified recent and recurrent adaptive evolution along the D. simulans lineage by contrasting sequence polymorphism within D. simulans to divergence from D. melanogaster and D. yakuba. Genes that evolved higher levels of expression in D. simulans have experienced adaptive evolution of the associated 3′ flanking and amino acid sequence. Concomitantly, these genes are also decelerating in their rates of protein evolution, which is in agreement with the finding that highly expressed genes evolve slowly. Interestingly, adaptive evolution in 5′ cis-regulatory regions did not correspond strongly with expression evolution. Our results provide a genomic view of the intimate link between selection acting on a phenotype and associated genic evolution.
Author Summary
Changes in patterns of gene expression likely contribute greatly to phenotypic differences among closely related organisms. However, the evolutionary mechanisms, such as Darwinian selection and random genetic drift, which are underlying differences in patterns of expression, are only now being understood on a genomic level. We combine measurements of gene expression and whole-genome sequence data to investigate the relationship between the forces driving sequence evolution and expression divergence among closely related fruit flies. We find that Darwinian selection acting on regions that may control gene expression is associated with increases in gene expression levels. Investigation of the functional consequences of adaptive evolution on regulating gene expression is clearly warranted. The genetic tools available in Drosophila make functional experiments possible and will shed light on how closely related species have responded to reproductive, pathogenic, and environmental pressures.
doi:10.1371/journal.pgen.0030187
PMCID: PMC2042001  PMID: 17967066
19.  The Role of the Toxicologic Pathologist in the Post-Genomic Era
Journal of Toxicologic Pathology  2013;26(2):105-110.
An era can be defined as a period in time identified by distinctive character, events, or practices. We are now in the genomic era. The pre-genomic era: There was a pre-genomic era. It started many years ago with novel and seminal animal experiments, primarily directed at studying cancer. It is marked by the development of the two-year rodent cancer bioassay and the ultimate realization that alternative approaches and short-term animal models were needed to replace this resource-intensive and time-consuming method for predicting human health risk. Many alternative approaches and short-term animal models were proposed and tried but, to date, none have completely replaced our dependence upon the two-year rodent bioassay. However, the alternative approaches and models themselves have made tangible contributions to basic research, clinical medicine and to our understanding of cancer and they remain useful tools to address hypothesis-driven research questions. The pre-genomic era was a time when toxicologic pathologists played a major role in drug development, evaluating the cancer bioassay and the associated dose-setting toxicity studies, and exploring the utility of proposed alternative animal models. It was a time when there was a shortage of qualified toxicologic pathologists. The genomic era: We are in the genomic era. It is a time when the genetic underpinnings of normal biological and pathologic processes are being discovered and documented. It is a time for sequencing entire genomes and deliberately silencing relevant segments of the mouse genome to see what each segment controls and if that silencing leads to increased susceptibility to disease. What remains to be charted in this genomic era is the complex interaction of genes, gene segments, post-translational modifications of encoded proteins, and environmental factors that affect genomic expression. 
In this current genomic era, the toxicologic pathologist has had to make room for a growing population of molecular biologists. In this present era newly emerging DVM and MD scientists enter the work arena with a PhD in pathology often based on some aspect of molecular biology or molecular pathology research. In molecular biology, the almost daily technological advances require one’s complete dedication to remain at the cutting edge of the science. Similarly, the practice of toxicologic pathology, like other morphological disciplines, is based largely on experience and requires dedicated daily examination of pathology material to maintain a well-trained eye capable of distilling specific information from stained tissue slides - a dedicated effort that cannot be well done as an intermezzo between other tasks. It is a rare individual that has true expertise in both molecular biology and pathology. In this genomic era, the newly emerging DVM-PhD or MD-PhD pathologist enters a marketplace without many job opportunities in contrast to the pre-genomic era. Many face an identity crisis needing to decide to become a competent pathologist or, alternatively, to become a competent molecular biologist. At the same time, more PhD molecular biologists without training in pathology are members of the research teams working in drug development and toxicology. How best can the toxicologic pathologist interact in the contemporary team approach in drug development, toxicology research and safety testing? Based on their biomedical training, toxicologic pathologists are in an ideal position to link data from the emerging technologies with their knowledge of pathobiology and toxicology. To enable this linkage and obtain the synergy it provides, the bench-level, slide-reading expert pathologist will need to have some basic understanding and appreciation of molecular biology methods and tools. 
On the other hand, it is not likely that the typical molecular biologist could competently evaluate and diagnose stained tissue slides from a toxicology study or a cancer bioassay. The post-genomic era: The post-genomic era will likely arrive around 2050, at which time entire genomes from multiple species will exist in massive databases, data from thousands of robotic high throughput chemical screenings will exist in other databases, and genetic toxicity and chemical structure-activity-relationships will reside in yet other databases. All databases will be linked and relevant information will be extracted and analyzed by appropriate algorithms following input of the latest molecular, submolecular, genetic, experimental, pathology and clinical data. Knowledge gained will permit the genetic components of many diseases to be amenable to therapeutic prevention and/or intervention. Much like computerized algorithms are currently used to forecast weather or to predict political elections, computerized sophisticated algorithms based largely on scientific data mining will categorize new drugs and chemicals relative to their health benefits versus their health risks for defined human populations and subpopulations. However, this form of a virtual toxicity study or cancer bioassay will only identify probabilities of adverse consequences from interaction of particular environmental and/or chemical/drug exposure(s) with specific genomic variables. Proof in many situations will require confirmation in intact in vivo mammalian animal models. The toxicologic pathologist in the post-genomic era will be the best suited scientist to confirm the data mining and its probability predictions for safety or adverse consequences with the actual tissue morphological features in test species that define specific test agent pathobiology and human health risk.
doi:10.1293/tox.26.105
PMCID: PMC3695332  PMID: 23914052
genomic era; history of toxicologic pathology; molecular biology
20.  Whole Genome Sequencing of Field Isolates Provides Robust Characterization of Genetic Diversity in Plasmodium vivax 
Background
An estimated 2.85 billion people live at risk of Plasmodium vivax transmission. In endemic countries vivax malaria causes significant morbidity and its mortality is becoming more widely appreciated, drug-resistant strains are increasing in prevalence, and an increasing number of reports indicate that P. vivax is capable of breaking through the Duffy-negative barrier long considered to confer resistance to blood stage infection. Absence of robust in vitro propagation limits our understanding of fundamental aspects of the parasite's biology, including the determinants of its dormant hypnozoite phase, its virulence and drug susceptibility, and the molecular mechanisms underlying red blood cell invasion.
Methodology/Principal Findings
Here, we report results from whole genome sequencing of five P. vivax isolates obtained from Malagasy and Cambodian patients, and of the monkey-adapted Belem strain. We obtained an average 70–400 X coverage of each genome, resulting in more than 93% of the Sal I reference sequence covered by 20 reads or more. Our study identifies more than 80,000 SNPs distributed throughout the genome which will allow designing association studies and population surveys. Analysis of the genome-wide genetic diversity in P. vivax also reveals considerable allele sharing among isolates from different continents. This observation could be consistent with a high level of gene flow among parasite strains distributed throughout the world.
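The two coverage figures quoted above, average depth and the share of the reference covered by 20 reads or more, can be computed from per-position read depths. The sketch below assumes such depths have already been extracted into a list. The function names and the toy depth values are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import Sequence


def mean_depth(depths: Sequence[int]) -> float:
    """Average number of reads covering each reference position."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 20) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


# Toy depths for eight reference positions (not data from the study).
depths = [0, 5, 20, 35, 400, 19, 70, 22]
print(mean_depth(depths))           # 71.375
print(breadth_of_coverage(depths))  # 0.625
```

The toy example also shows why both numbers are reported: a single very deep position raises the mean without improving the breadth.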
Conclusions
Our study shows that it is feasible to perform whole genome sequencing of P. vivax field isolates and rigorously characterize the genetic diversity of this parasite. The catalogue of polymorphisms generated here will enable large-scale genotyping studies and contribute to a better understanding of P. vivax traits such as drug resistance or erythrocyte invasion, partially circumventing the lack of laboratory culture that has hampered vivax research for years.
Author Summary
Plasmodium vivax is the most frequently transmitted and widely distributed cause of malaria in the world. Each year P. vivax is responsible for approximately 250 million clinical cases of malaria and its global economic burden, placed largely on the poor, has been estimated to exceed US$1.4 billion. In contrast to P. falciparum, P. vivax cannot be propagated in continuous in vitro culture and this limits our understanding of the parasite’s biology. In this study, we sequenced the entire genome of five P. vivax isolates directly from blood samples of infected patients. Our data indicated that each patient was infected with multiple P. vivax strains. We also identified more than 80,000 DNA polymorphisms distributed throughout the genome that will enable future studies of the P. vivax population and association mapping studies. Our study illustrates the potential of genomic studies for better understanding P. vivax biology and how the parasite successfully evades malaria elimination efforts worldwide.
doi:10.1371/journal.pntd.0001811
PMCID: PMC3435244  PMID: 22970335
21.  Whole-Exome Sequencing Reveals a Rapid Change in the Frequency of Rare Functional Variants in a Founding Population of Humans 
PLoS Genetics  2013;9(9):e1003815.
Whole-exome or gene targeted resequencing in hundreds to thousands of individuals has shown that the majority of genetic variants are at low frequency in human populations. Rare variants are enriched for functional mutations and are expected to explain an important fraction of the genetic etiology of human disease, therefore having a potential medical interest. In this work, we analyze the whole-exome sequences of French-Canadian individuals, a founder population with a unique demographic history that includes an original population bottleneck less than 20 generations ago, followed by a demographic explosion, and the whole exomes of French individuals sampled from France. We show that in less than 20 generations of genetic isolation from the French population, the genetic pool of French-Canadians shows reduced levels of diversity, higher homozygosity, and an excess of rare variants with low variant sharing with Europeans. Furthermore, the French-Canadian population contains a larger proportion of putatively damaging functional variants, which could partially explain the increased incidence of genetic disease in the province. Our results highlight the impact of population demography on genetic fitness and the contribution of rare variants to the human genetic variation landscape, emphasizing the need for deep cataloguing of genetic variants by resequencing worldwide human populations in order to truly assess disease risk.
Author Summary
Recent resequencing of the whole genome or the coding part of the genome (the exome) in thousands of individuals has described a large excess of low frequency variants in humans, probably arising as a consequence of recent rapid growth in human population sizes. Most rare variants are private to specific populations and are enriched for functional mutations, thus potentially having some medical relevance. In this study, we analyze whole-exome sequences from over a hundred individuals from the French-Canadian population, which was founded less than 400 years ago by about 8,500 French settlers who colonized the province between the 17th and 18th centuries. We show that in a remarkably short period of time this population has accumulated substantial differences, including an excess of rare, functional and potentially damaging variants, when compared to the original European population. Our results show the effects of population history on genetic variation that may have an impact on genetic fitness and disease, and have implications in the design of genetic studies, highlighting the importance of extending deep resequencing to worldwide human populations.
doi:10.1371/journal.pgen.1003815
PMCID: PMC3784517  PMID: 24086152
22.  Challenges in diagnosing paediatric malaria in Dar es Salaam, Tanzania 
Malaria Journal  2013;12:228.
Background
Malaria is a major cause of paediatric morbidity and mortality. As no clinical features clearly differentiate malaria from other febrile illnesses, and malaria diagnosis is hampered by a frequent lack of laboratory equipment and expertise, overdiagnosis and overtreatment are common.
Methods
Children admitted with fever at the general paediatric wards at Muhimbili National Hospital (MNH), Dar es Salaam, Tanzania from January to June 2009 were recruited consecutively and prospectively. Demographic and clinical features were registered. Routine thick blood smear microscopy at MNH was compared to results of subsequent thin blood smear microscopy and rapid diagnostic tests (RDTs). Genus-specific PCR of Plasmodium mitochondrial DNA was performed on DNA extracted from whole blood and species-specific PCR was done on positive samples.
Results
Among 304 included children, 62.6% had received anti-malarials during the last four weeks prior to admission and 65.1% during the hospital stay. Routine thick blood smears, research blood smears, PCR and RDT detected malaria in 13.2%, 6.6%, 25.0% and 13.5%, respectively. Positive routine microscopy was confirmed in only 43% (17/40), 45% (18/40) and 53% (21/40), by research microscopy, RDTs and PCR, respectively. Eighteen percent (56/304) had positive PCR but negative research microscopy. Reported low parasitaemia on routine microscopy was associated with negative research blood slide and PCR. RDT-positive cases were associated with signs of severe malaria. Palmar pallor, low haemoglobin and low platelet count were significantly associated with positive PCR, research microscopy and RDT.
Conclusions
The true morbidity attributable to malaria in the study population remains uncertain due to the discrepancies in results among the diagnostic methods. The current routine microscopy appears to result in overdiagnosis of malaria and, consequently, overuse of anti-malarials. Moreover, children with a false positive malaria diagnosis may die because they do not receive treatment for the true cause of their illness. RDTs appear to have the potential to improve routine diagnostics, but the clinical implication of the many RDT-negative, PCR-positive samples needs to be elucidated.
doi:10.1186/1475-2875-12-228
PMCID: PMC3703277  PMID: 23822515
Malaria; Diagnostics; Polymerase chain reaction; Blood microscopy; Rapid diagnostic test; Tanzania; Paediatrics; Fever
23.  Population Genomics: Whole-Genome Analysis of Polymorphism and Divergence in Drosophila simulans 
PLoS Biology  2007;5(11):e310.
The population genetic perspective is that the processes shaping genomic variation can be revealed only through simultaneous investigation of sequence polymorphism and divergence within and between closely related species. Here we present a population genetic analysis of Drosophila simulans based on whole-genome shotgun sequencing of multiple inbred lines and comparison of the resulting data to genome assemblies of the closely related species, D. melanogaster and D. yakuba. We discovered previously unknown, large-scale fluctuations of polymorphism and divergence along chromosome arms, and significantly less polymorphism and faster divergence on the X chromosome. We generated a comprehensive list of functional elements in the D. simulans genome influenced by adaptive evolution. Finally, we characterized genomic patterns of base composition for coding and noncoding sequence. These results suggest several new hypotheses regarding the genetic and biological mechanisms controlling polymorphism and divergence across the Drosophila genome, and provide a rich resource for the investigation of adaptive evolution and functional variation in D. simulans.
Author Summary
Population genomics, the study of genome-wide patterns of sequence variation within and between closely related species, can provide a comprehensive view of the relative importance of mutation, recombination, natural selection, and genetic drift in evolution. It can also provide fundamental insights into the biological attributes of organisms that are specifically shaped by adaptive evolution. One approach for generating population genomic datasets is to align DNA sequences from whole-genome shotgun projects to a standard reference sequence. We used this approach to carry out whole-genome analysis of polymorphism and divergence in Drosophila simulans, a close relative of the model system, D. melanogaster. We find that polymorphism and divergence fluctuate on a large scale across the genome and that these fluctuations are probably explained by natural selection rather than by variation in mutation rates. Our analysis suggests that adaptive protein evolution is common and is often related to biological processes that may be associated with gene expression, chromosome biology, and reproduction. The approaches presented here will have broad applicability to future analysis of population genomic variation in other systems, including humans.
Low-coverage genome sequences from multiple Drosophila simulans strains provide the first comprehensive view of polymorphism and divergence in the fruit fly.
doi:10.1371/journal.pbio.0050310
PMCID: PMC2062478  PMID: 17988176
24.  Rapid Identification of Genetic Modifications in Bacillus anthracis Using Whole Genome Draft Sequences Generated by 454 Pyrosequencing 
PLoS ONE  2010;5(8):e12397.
Background
The anthrax letter attacks of 2001 highlighted the need for rapid identification of biothreat agents not only for epidemiological surveillance of the intentional outbreak but also for implementing appropriate countermeasures, such as antibiotic treatment, in a timely manner to prevent further casualties. It is clear from the 2001 cases that survival may be markedly improved by administration of antimicrobial therapy during the early symptomatic phase of the illness, i.e., within 3 days of appearance of symptoms. Microbiological detection methods are feasible only for organisms that can be cultured in vitro and, with the exception of antibiotic resistance, cannot detect genetic modifications. Currently available immuno- or nucleic acid-based rapid detection assays utilize known, organism-specific proteins or genomic DNA signatures, respectively. Hence, these assays lack the ability to detect novel natural variations or intentional genetic modifications that circumvent the targets of the detection assays or, in the case of a biological attack using an antibiotic-resistant or virulence-enhanced Bacillus anthracis, to advise on therapeutic treatments.
Methodology/Principal Findings
We show here that Roche 454-based pyrosequencing can generate whole genome draft sequences with sufficiently deep and broad coverage of a bacterial genome in less than 24 hours. Furthermore, using the unfinished draft sequences, we demonstrate that unbiased identification of known as well as heretofore-unreported genetic modifications, including indels and single nucleotide polymorphisms conferring antibiotic and phage resistances, is feasible within the following 12 hours.
Conclusions/Significance
Second generation sequencing technologies have paved the way for sequence-based rapid identification of both known and previously undocumented genetic modifications in cultured, conventional and newly emerging biothreat agents. Our findings have significant implications in the context of whole genome sequencing-based routine clinical diagnostics as well as epidemiological surveillance of natural disease outbreaks caused by bacterial and viral agents.
doi:10.1371/journal.pone.0012397
PMCID: PMC2928293  PMID: 20811637
25.  Paediatric UK demyelinating disease longitudinal study (PUDDLS) 
BMC Pediatrics  2011;11:68.
Background
There is evidence that at least 5% of multiple sclerosis (MS) cases manifest in childhood. Children with MS present with a demyelinating episode involving single or multiple symptoms prior to developing a second event (usually within two years) to then meet criteria for diagnosis. There is evidence from adult cohorts that the incidence and sex ratios of MS are changing and that children of immigrants have a higher risk for developing MS. A paediatric population should reflect the vanguard of such changes and may reflect trends yet to be observed in adult cohorts. Studying a paediatric population from the first demyelinating event will allow us to test these hypotheses, and may offer further valuable insights into the genetic and environmental interactions in the pathogenesis of MS.
Methods/Design
The Paediatric UK Demyelinating Disease Longitudinal Study (PUDDLS) is a prospective longitudinal observational study which aims to determine the natural history, predictors and outcomes of childhood CNS inflammatory demyelinating diseases. PUDDLS will involve centres in the UK, and will establish a cohort of children affected with a first CNS inflammatory demyelinating event for long-term follow up by recruiting for approximately 5 years. PUDDLS will also establish a biological sample archive (CSF, serum, and DNA), allowing future hypothesis-driven research. For example, a biomarker discovered in the future could be validated within this dataset. Patients will also be requested to consent to be contacted in the future. A secondary aim is to collaborate internationally with the International Paediatric Multiple Sclerosis Study Group when future collaborative studies are proposed, whilst sharing a minimal anonymised dataset. PUDDLS is the second of two jointly funded studies. The first (UCID-SS) is an epidemiological surveillance study that has already received ethical approvals and started on 1 September 2009. There is no direct patient involvement, and UCID-SS aims to determine the UK and Ireland incidence of CNS inflammatory demyelinating disorders in children under 16 years.
Discussion
A paediatric population should reflect the vanguard of MS epidemiological changes and may reflect trends yet to be observed in adult MS cohorts. The restricted window between clinical expression of disease and exposure to environmental factors in children offers a unique research opportunity. Studying a paediatric population from the first demyelinating event will allow us to investigate the changing epidemiology of MS, and may offer further valuable insights into the genetic and environmental interactions in the pathogenesis of MS.
doi:10.1186/1471-2431-11-68
PMCID: PMC3163536  PMID: 21798048
