Human genome sequencing has transformed our understanding of genomic variation and its relevance to health and disease, and is now starting to enter clinical practice for the diagnosis of rare diseases. The question of whether and how some categories of genomic findings should be shared with individual research participants is currently a topic of international debate, and development of robust analytical workflows to identify and communicate clinically relevant variants is paramount.
The Deciphering Developmental Disorders (DDD) study has developed a UK-wide patient recruitment network involving over 180 clinicians across all 24 regional genetics services, and has performed genome-wide microarray and whole exome sequencing on children with undiagnosed developmental disorders and their parents. After data analysis, pertinent genomic variants were returned to individual research participants via their local clinical genetics team.
Around 80 000 genomic variants were identified from exome sequencing and microarray analysis in each individual, of which on average 400 were rare and predicted to be protein altering. By focusing only on de novo and segregating variants in known developmental disorder genes, we achieved a diagnostic yield of 27% among 1133 previously investigated yet undiagnosed children with developmental disorders, whilst minimising incidental findings. In families with developmentally normal parents, whole exome sequencing of the child and both parents resulted in a 10-fold reduction in the number of potential causal variants that needed clinical evaluation compared to sequencing only the child. Most diagnostic variants identified in known genes were novel and not present in current databases of known disease variation.
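The effect of parental data on the number of candidate variants can be illustrated with a minimal sketch. This is not the DDD study's pipeline: the `Variant` record, its field names, and the `KNOWN_DD_GENES` set are hypothetical, and only the de novo case with developmentally normal parents is covered (segregating variants and other inheritance models are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical record; field names are illustrative only.
    gene: str
    rare_protein_altering: bool
    in_child: bool
    in_mother: bool
    in_father: bool


# Hypothetical stand-in for a curated list of known developmental disorder genes.
KNOWN_DD_GENES = {"GENE_A", "GENE_B"}


def child_only_candidates(variants):
    """Child sequenced alone: rare, protein-altering variants in known genes."""
    return [
        v for v in variants
        if v.in_child and v.rare_protein_altering and v.gene in KNOWN_DD_GENES
    ]


def de_novo_candidates(variants):
    """Trio sequenced: additionally require absence from both parents."""
    return [
        v for v in child_only_candidates(variants)
        if not v.in_mother and not v.in_father
    ]


variants = [
    Variant("GENE_A", True, True, False, False),   # de novo, known gene
    Variant("GENE_A", True, True, True, False),    # inherited from mother
    Variant("GENE_B", True, True, False, True),    # inherited from father
    Variant("GENE_C", True, True, False, False),   # de novo, gene not in list
    Variant("GENE_B", False, True, False, False),  # not rare/protein altering
]

print(len(child_only_candidates(variants)), len(de_novo_candidates(variants)))
```

In this toy input the parental genotypes cut the candidate list from three variants to one; the size of the reduction in real data depends on the cohort and the filters used.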
Implementation of a robust translational genomics workflow is achievable within a large-scale rare disease research study to allow feedback of potentially diagnostic findings to clinicians and research participants. Systematic recording of relevant clinical data, curation of a gene–phenotype knowledge base, and development of clinical decision support software are needed in addition to automated exclusion of almost all variants, which is crucial for scalable prioritisation and review of possible diagnostic variants. However, the resource requirements of development and maintenance of a clinical reporting system within a research setting are substantial.
Health Innovation Challenge Fund, a parallel funding partnership between the Wellcome Trust and the UK Department of Health.
Health-related results that are discovered in the process of genomic research should only be returned to research participants after being clinically validated and then delivered and followed up within a health service. Returning such results may be difficult for genomic researchers who are limited by resources or unable to access appropriate clinicians. Raw sequence data could, in theory, be returned instead. This might appear nonsensical as, on its own, it is a meaningless code with no clinical value. Yet, as and when direct to consumer genomics services become more widely available (and can be endorsed by independent health professionals and genomic researchers alike), the return of such data could become a realistic proposition. We explore attitudes from <7000 members of the public, genomic researchers, genetic health professionals and non-genetic health professionals and ask participants to suggest what they would do with a raw sequence, if offered it. Results show that 62% of participants were interested in using it to seek out their own clinical interpretation. Whilst we do not propose that raw sequence data should be returned at the moment, we suggest that, should this become feasible in the future, participants of sequencing studies may support this.
Diagnosis; Genetics; Genome-wide; Getting Research into Practice; Ethics
RNA viruses have high mutation rates and exist within their hosts as large, complex and heterogeneous populations, comprising a spectrum of related but non-identical genome sequences. Next generation sequencing is revolutionising the study of viral populations by enabling the ultra deep sequencing of their genomes, and the subsequent identification of the full spectrum of variants within the population. Identification of low frequency variants is important for our understanding of mutational dynamics, disease progression, immune pressure, and for the detection of drug resistant or pathogenic mutations. However, the current challenge is to accurately model the errors in the sequence data and distinguish real viral variants, particularly those that exist at low frequency, from errors introduced during sequencing and sample processing, which can both be substantial.
We have created a novel set of laboratory control samples that are derived from a plasmid containing a full-length viral genome with extremely limited diversity in the starting population. One sample was sequenced without PCR amplification whilst the other samples were subjected to increasing amounts of RT and PCR amplification prior to ultra-deep sequencing. This enabled the level of error introduced by the RT and PCR processes to be assessed and minimum frequency thresholds to be set for true viral variant identification. We developed a genome-scale computational model of the sample processing and NGS calling process to gain a detailed understanding of the errors at each step, which predicted that RT and PCR errors are more likely to occur at some genomic sites than others. The model can also be used to investigate whether the number of observed mutations at a given site of interest is greater than would be expected from processing errors alone in any NGS data set. After providing basic sample processing information and the site’s coverage and quality scores, the model utilises the fitted RT-PCR error distributions to simulate the number of mutations that would be observed from processing errors alone.
These data sets and models provide an effective means of separating true viral mutations from those erroneously introduced during sample processing and sequencing.
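The idea of comparing an observed mutation count with the count expected from processing errors alone can be sketched as a simple Monte Carlo test. This is not the authors' model: it assumes a single hypothetical per-read error probability (`error_rate`) instead of fitted RT-PCR error distributions and quality scores, and the function names are illustrative.

```python
import random


def simulate_error_counts(coverage, error_rate, n_sims, rng):
    """Simulate mismatching-read counts at one site from errors alone.

    Each of the `coverage` reads is treated as an independent trial with
    probability `error_rate` of carrying a processing error (an assumption).
    """
    return [
        sum(1 for _ in range(coverage) if rng.random() < error_rate)
        for _ in range(n_sims)
    ]


def empirical_p_value(observed, simulated):
    """Fraction of simulations with at least as many mismatches as observed.

    The +1 terms keep the estimate away from an exact zero.
    """
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
simulated = simulate_error_counts(coverage=1000, error_rate=0.001,
                                  n_sims=200, rng=rng)
p_many = empirical_p_value(50, simulated)  # far more mismatches than expected
p_none = empirical_p_value(0, simulated)   # no mismatches observed
print(p_many, p_none)
```

A small empirical p-value suggests the site carries more mismatches than processing errors alone would explain under the assumed error rate; any threshold applied to it is a separate choice.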
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1456-x) contains supplementary material, which is available to authorized users.
Quasispecies; Viral population; Viral evolution; High-throughput sequencing; Next generation sequencing; Variant Calling; Sequencing error correction; RT-PCR Errors; Rare mutations
There is no universally accepted definition of what an incidental finding is [Wolf et al., 2008] and, broadly speaking, this could include variants of known and unknown clinical significance, variants linked to highly penetrant, serious, life-threatening conditions, non-paternity or ancestry data. For the purposes of our study, we have adopted a pragmatic distinction between ‘pertinent’ and ‘incidental’ findings as set out in this text. Whilst definitions of incidental findings are becoming accepted in practice in the US [Green et al., 2013], it is still not known whether and how these also apply elsewhere around the world.
• Study characterised genomic sequences from archived FMDV samples collected during the 1960s.
• The epidemic in 1967–68 occurred as a result of an independent introduction of FMDV into the UK.
• Molecular clock used to identify viral sequences that accrued fewer than expected substitutions.
• Study supports the use of sequences to trace RNA viruses during outbreaks and epidemics.
A large epidemic of foot-and-mouth disease (FMD) occurred in the United Kingdom (UK) over a seven month period in Northwest England from late 1967 to the summer of 1968. This was preceded by a number of smaller FMD outbreaks in the country, two in 1967, in Hampshire and Warwickshire and one in Northumberland during 1966. The causative agent of all four events was identified as FMD virus (FMDV) serotype O and the source of the large epidemic was attributed to infected bone marrow in lamb products imported from Argentina. However, the diagnostic tools available at the time were unable to entirely rule out connections with the earlier UK FMD outbreaks, as well as other potential sources from Europe. The aim of this study was to apply molecular sequencing to investigate the likely source of this epidemic using VP1 region and full genome (FG) sequences determined directly from clinical epithelium samples (n = 13) or cell culture isolates (n = 6), from this and contemporary outbreaks in the UK, Europe and South America. Analysis of the VP1 sequences provided evidence for at least three separate incursions of FMDV into the UK including one independent introduction that was responsible for the main 1967/68 epidemic. Analysis of FG sequences from the main 1967/68 outbreak (n = 10) revealed nucleotide substitutions at 94 genomic sites providing evidence for the linear accumulation of nucleotide substitutions (rate = 2.42 × 10⁻⁵ nt substitutions/site/day). However, there were five samples where this linear relationship was absent, indicating evolutional dormancy of the virus, presumably outside a host. These results help define the evolutionary dynamics of FMDV during an epidemic and contribute to the knowledge and understanding from which to base future outbreak control strategies.
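As a worked illustration of the linear clock, the reported rate can be turned into an expected number of substitutions after a given number of days. This is a sketch, not the study's analysis: the genome length of 8,200 nt is an approximate figure assumed here, and the Poisson tail used to flag samples with fewer substitutions than expected is an assumption of this example.

```python
import math

RATE_PER_SITE_PER_DAY = 2.42e-5  # rate reported in the text
GENOME_LENGTH_NT = 8200          # assumption: approximate FMDV genome length


def expected_substitutions(days, rate=RATE_PER_SITE_PER_DAY,
                           length=GENOME_LENGTH_NT):
    """Expected substitutions under a strictly linear clock."""
    return rate * length * days


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), with k >= 0.

    Terms are built iteratively to avoid overflow in large powers.
    """
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


days = 100
expected = expected_substitutions(days)
# Probability of seeing no substitutions at all after `days` days,
# if substitutions accrued at the clock rate (Poisson assumption).
p_zero = poisson_cdf(0, expected)
print(round(expected, 3), p_zero)
```

Under these assumptions a genome sampled 100 days into the epidemic would be expected to carry roughly 20 substitutions, so a sample with far fewer stands out from the linear trend.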
Foot-and-mouth disease; Epidemic; Full-genome sequencing; Phylogenetics; United Kingdom
The Human Phenotype Ontology (HPO) project, available at http://www.human-phenotype-ontology.org, provides a structured, comprehensive and well-defined set of 10,088 classes (terms) describing human phenotypic abnormalities and 13,326 subclass relations between the HPO classes. In addition we have developed logical definitions for 46% of all HPO classes using terms from ontologies for anatomy, cell types, function, embryology, pathology and other domains. This allows interoperability with several resources, especially those containing phenotype information on model organisms such as mouse and zebrafish. Here we describe the updated HPO database, which provides annotations of 7,278 human hereditary syndromes listed in OMIM, Orphanet and DECIPHER to classes of the HPO. Various meta-attributes such as frequency, references and negations are associated with each annotation. Several large-scale projects worldwide utilize the HPO for describing phenotype information in their datasets. We have therefore generated equivalence mappings to other phenotype vocabularies such as LDDB, Orphanet, MedDRA, UMLS and phenoDB, allowing integration of existing datasets and interoperability with multiple biomedical resources. We have created various ways to access the HPO database content using flat files, a MySQL database, and Web-based tools. All data and documentation on the HPO project can be found online.
The DECIPHER database (https://decipher.sanger.ac.uk/) is an accessible online repository of genetic variation with associated phenotypes that facilitates the identification and interpretation of pathogenic genetic variation in patients with rare disorders. Contributing to DECIPHER is an international consortium of >200 academic clinical centres of genetic medicine and ≥1600 clinical geneticists and diagnostic laboratory scientists. Information integrated from a variety of bioinformatics resources, coupled with visualization tools, provides a comprehensive set of tools to identify other patients with similar genotype–phenotype characteristics and highlights potentially pathogenic genes. In a significant development, we have extended DECIPHER from a database of just copy-number variants to allow upload, annotation and analysis of sequence variants such as single nucleotide variants (SNVs) and InDels. Other notable developments in DECIPHER include a purpose-built, customizable and interactive genome browser to aid combined visualization and interpretation of sequence and copy-number variation against informative datasets of pathogenic and population variation. We have also introduced several new features to our deposition and analysis interface. This article provides an update to the DECIPHER database, an earlier instance of which has been described elsewhere [Swaminathan et al. (2012) DECIPHER: web-based, community resource for clinical interpretation of rare variants in developmental disorders. Hum. Mol. Genet., 21, R37–R44].
Patients with developmental disorders often harbour sub-microscopic deletions or duplications that lead to a disruption of normal gene expression or perturbation in the copy number of dosage-sensitive genes. Clinical interpretation for such patients in isolation is hindered by the rarity and novelty of such disorders. The DECIPHER project (https://decipher.sanger.ac.uk) was established in 2004 as an accessible online repository of genomic and associated phenotypic data with the primary goal of aiding the clinical interpretation of rare copy-number variants (CNVs). DECIPHER integrates information from a variety of bioinformatics resources and uses visualization tools to identify potential disease genes within a CNV. A two-tier access system permits clinicians and clinical scientists to maintain confidential linked anonymous records of phenotypes and CNVs for their patients that, with informed consent, can subsequently be shared with the wider clinical genetics and research communities. Advances in next-generation sequencing technologies are making it practical and affordable to sequence the whole exome/genome of patients who display features suggestive of a genetic disorder. This approach enables the identification of smaller intragenic mutations including single-nucleotide variants that are not accessible even with high-resolution genomic array analysis. This article briefly summarizes the current status and achievements of the DECIPHER project and looks ahead to the opportunities and challenges of jointly analysing structural and sequence variation in the human genome.
Advances in sequencing technology coupled with new integrative approaches to data analysis provide a potentially transformative opportunity to use pathogen genome data to advance our understanding of transmission. However, to maximize the insights such genetic data can provide, we need to understand more about how the microevolution of pathogens is observed at different scales of biological organization. Here, we examine the evolutionary processes in foot-and-mouth disease virus observed at different scales, ranging from tissue and animal to herd and region. At each scale, we observe analogous processes of population expansion, mutation and selection resulting in the accumulation of mutations over increasing time scales. While the current data are limited, rates of nucleotide substitution appear to be faster over individual-to-individual transmission events compared with those observed at a within-individual scale, suggesting that viral population bottlenecks between individuals facilitate the fixation of polymorphisms. Longer-term rates of nucleotide substitution were found to be equivalent in individual-to-individual transmission compared with herd-to-herd transmission, indicating that viral diversification at the herd level is not retained at a regional scale.
virus; evolution; scales; foot-and-mouth disease; transmission; bottlenecks
Analysis of full-genome sequences was previously used to trace the origin and transmission pathways of foot-and-mouth disease virus (FMDV) outbreaks in the UK in 2001 and 2007. Interpretation of these data was sometimes at variance with conventional epidemiological tracing, and was also used to predict the presence of undisclosed infected premises that were later discovered during serological surveillance. Here we report the genome changes associated with sequential passage of a highly BHK-21-cell-adapted (heparan sulphate-binding) strain of FMDV arising from two independent transmission chains in cattle. In vivo virus replication rapidly selected for a wild-type variant with an amino acid substitution at residue 56 of VP3. Full-genome sequence analysis clearly demonstrated sequence divergence during parallel passage. The genetic diversity generated over the course of infection and the rate at which these changes became fixed and were transmitted between cattle occurred at a rate sufficient to enable reliable tracing of transmission pathways at the level of the individual animal. However, tracing of transmission pathways was only clear when sequences from epithelial lesions were compared. Sequences derived from oesophageal–pharyngeal scrapings were problematic to interpret, with a varying number of ambiguities suggestive of a more diverse virus population. These findings will help to correctly interpret full-genome sequence analyses to resolve transmission pathways within future FMDV epidemics.
RNA virus populations within samples are highly heterogeneous, containing a large number of minority sequence variants which can potentially be transmitted to other susceptible hosts. Consequently, consensus genome sequences provide an incomplete picture of the within- and between-host viral evolutionary dynamics during transmission. Foot-and-mouth disease virus (FMDV) is an RNA virus that can spread from primary sites of replication, via the systemic circulation, to found distinct sites of local infection at epithelial surfaces. Viral evolution in these different tissues occurs independently, each of them potentially providing a source of virus to seed subsequent transmission events. This study employed the Illumina Genome Analyzer platform to sequence 18 FMDV samples collected from a chain of sequentially infected cattle. These data generated snap-shots of the evolving viral population structures within different animals and tissues. Analyses of the mutation spectra revealed polymorphisms at frequencies >0.5% at between 21 and 146 sites across the genome for these samples, while 13 sites acquired mutations in excess of consensus frequency (50%). Analysis of polymorphism frequency revealed that a number of minority variants were transmitted during host-to-host infection events, while the size of the intra-host founder populations appeared to be smaller. These data indicate that viral population complexity is influenced by small intra-host bottlenecks and relatively large inter-host bottlenecks. The dynamics of minority variants are consistent with the actions of genetic drift rather than strong selection. These results provide novel insights into the evolution of FMDV that can be applied to reconstruct both intra- and inter-host transmission routes.
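The two frequency thresholds mentioned above (0.5% for a polymorphic site and 50% for a change in consensus) can be illustrated with a short sketch. The per-site base counts, the choice of reference base, and the function names are hypothetical; the study's own variant calling, including its error filtering, is not reproduced here.

```python
def variant_frequency(counts, reference_base):
    """Fraction of reads at a site that differ from the reference base."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - counts.get(reference_base, 0) / total


def classify_site(freq, minor_threshold=0.005, consensus_threshold=0.5):
    """Label a site using the 0.5% and 50% thresholds from the text."""
    if freq > consensus_threshold:
        return "consensus_change"
    if freq > minor_threshold:
        return "polymorphic"
    return "below_threshold"


# Hypothetical read counts per base at three sites; reference base is "A".
sites = [
    {"A": 990, "G": 10},  # 1% non-reference
    {"A": 400, "G": 600},  # 60% non-reference
    {"A": 999, "G": 1},   # 0.1% non-reference
]
labels = [classify_site(variant_frequency(c, "A")) for c in sites]
print(labels)
```

Comparing such labels between a donor and a recipient animal is one simple way to see which minority variants appear to pass through a transmission event.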
Since the development of technologies that can determine the base-pair sequence of DNA, the ability to sequence genes has contributed much to science and medicine. However, it has remained a relatively costly and laborious process, hindering its use as a routine biomedical tool. Recent times are seeing rapid developments in this field, both in the availability of novel sequencing platforms, as well as supporting technologies involved in processes such as targeting and data analysis. This is leading to significant reductions in the cost of sequencing a human genome and the potential for its use as a routine biomedical tool. This review is a snapshot of this rapidly moving field examining the current state of the art, forthcoming developments and some of the issues still to be resolved prior to the use of new sequencing technologies in routine clinical diagnosis.
Next generation sequencing; Targeting; Massively parallel
Cell-free fetal DNA (cffDNA) can be detected in maternal blood during pregnancy, opening the possibility of early non-invasive prenatal diagnosis for a variety of genetic conditions. Since 1997, many studies have examined the accuracy of prenatal fetal sex determination using cffDNA, particularly for pregnancies at risk of an X-linked condition. Here we report a review and meta-analysis of the published literature to evaluate the use of cffDNA for prenatal determination (diagnosis) of fetal sex. We applied a sensitive search of multiple bibliographic databases including PubMed (MEDLINE), EMBASE, the Cochrane library and Web of Science.
Ninety studies, incorporating 9,965 pregnancies and 10,587 fetal sex results met our inclusion criteria. Overall mean sensitivity was 96.6% (95% credible interval 95.2% to 97.7%) and mean specificity was 98.9% (95% CI = 98.1% to 99.4%). These results vary very little with trimester or week of testing, indicating that the performance of the test is reliably high.
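For readers less familiar with the two accuracy measures, the sketch below computes sensitivity and specificity for a single study from a two-by-two table. The counts are hypothetical and are not taken from the review, and the pooled estimates and credible intervals reported above come from a meta-analytic model that this example does not attempt to reproduce.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of truly positive cases that the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of truly negative cases that the test calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical single-study counts (illustration only).
true_pos, false_neg = 95, 5
true_neg, false_pos = 99, 1

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```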
Based on this review and meta-analysis we conclude that fetal sex can be determined with a high level of accuracy by analyzing cffDNA. Using cffDNA in prenatal diagnosis to replace or complement existing invasive methods can remove or reduce the risk of miscarriage. Future work should concentrate on the economic and ethical considerations of implementing an early non-invasive test for fetal sex.
Cell-free fetal DNA; Meta-analysis; Non-invasive prenatal diagnosis
The rapid and continuing progress in gene discovery for complex diseases is fuelling interest in the potential application of genetic risk models for clinical and public health practice. The number of studies assessing the predictive ability is steadily increasing, but they vary widely in completeness of reporting and apparent quality. Transparent reporting of the strengths and weaknesses of these studies is important to facilitate the accumulation of evidence on genetic risk prediction. A multidisciplinary workshop sponsored by the Human Genome Epidemiology Network developed a checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS), building on the principles established by prior reporting guidelines. These recommendations aim to enhance the transparency, quality and completeness of study reporting, and thereby to improve the synthesis and application of information from multiple studies that might differ in design, conduct or analysis.
The diverse sequences of viral populations within individual hosts are the starting material for selection and subsequent evolution of RNA viruses such as foot-and-mouth disease virus (FMDV). Using next-generation sequencing (NGS) performed on a Genome Analyzer platform (Illumina), this study compared the viral populations within two bovine epithelial samples (foot lesions) from a single animal with the inoculum used to initiate experimental infection. Genomic sequences were determined in duplicate sequencing runs, and the consensus sequence of the inoculum determined by NGS was identical to that previously determined using the Sanger method. However, NGS revealed the fine polymorphic substructure of the viral population, from nucleotide variants present at just below 50% frequency to those present at fractions of 1%. Some of the higher-frequency polymorphisms identified encoded changes within codons associated with heparan sulfate binding and were present in both foot lesions, revealing intermediate stages in the evolution of a tissue culture-adapted virus replicating within a mammalian host. We identified 2,622, 1,434, and 1,703 polymorphisms in the inoculum and in the two foot lesions, respectively: most of the substitutions occurred in only a small fraction of the population and represented the progeny from recent cellular replication prior to onset of any selective pressures. We estimated the upper limit for the genome-wide mutation rate of the virus within a cell to be 7.8 × 10⁻⁴ per nucleotide. The greater depth of detection achieved by NGS demonstrates that this method is a powerful and valuable tool for the dissection of FMDV populations within hosts.
The rapid and continuing progress in gene discovery for complex diseases is fuelling interest in the potential application of genetic risk models for clinical and public health practice. The number of studies assessing the predictive ability is steadily increasing, but they vary widely in completeness of reporting and apparent quality. Transparent reporting of the strengths and weaknesses of these studies is important to facilitate the accumulation of evidence on genetic risk prediction. A multidisciplinary workshop sponsored by the Human Genome Epidemiology Network developed a checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS), building on the principles established by prior reporting guidelines. These recommendations aim to enhance the transparency, quality and completeness of study reporting, and thereby to improve the synthesis and application of information from multiple studies that might differ in design, conduct or analysis.
Genetic; Risk prediction; Methodology; Guidelines; Reporting
In June 2009, the Science and Technology Committee of the UK House of Lords published a report on genomic medicine, based on expert evidence collected over an 18-month period. Crucially, the report signaled that the use of genomic medicine was at a crossroads, due to the rapid development of new technologies, and opened up opportunities across the whole of medicine and healthcare. This commentary responds to the report's call for a new health service strategy, including a new genetics White Paper from the Government, and suggests some of the important elements that need further consideration.