With the advent of high-throughput sequencing technology, researchers face a bottleneck: the time required to analyse how the many genetic variants routinely detected might affect disease aetiology. Computational algorithms can in principle help researchers to prioritize and direct future experiments by narrowing down the numerous genetic alterations identified in sequencing studies. In practice, however, it can be challenging to run these algorithms in a researcher’s own laboratory, because they depend on third-party software and databases and require substantial hard disk space and RAM. We have developed the Cancer-Related Analysis of VAriants Toolkit (CRAVAT), a web-based application that provides a simple interface to prioritize genes and variants important for tumorigenesis, allowing users to assess millions of variants in a single upload step.
CRAVAT interface and workflow. (1) Input coordinates. (2) Select ‘Cancer driver analysis’, ‘Functional effect analysis’ and/or ‘Gene annotation’. (3) Results are delivered to the provided email address.
Numerous web implementations already exist for variant classifiers [reviewed in Karchin (2009)]. CRAVAT handles both germline and somatic variation but is dedicated to cancer genome analysis. It accepts variant calls from sequencing studies in either genomic coordinates (hg18 or hg19) or transcript coordinates from NCBI RefSeq, CCDS and Ensembl (Pruitt et al., 2007; Flicek et al., 2012). Variants are mapped onto the best available transcript, using a greedy algorithm (see Supplementary Methods), and those variants that cause missense changes are identified. These variants can be scored in terms of their predicted impact on tumorigenesis, using the Cancer-Specific High-throughput Annotation of Somatic Mutations (CHASM) method (Carter et al., 2009). They can also be scored by their predicted impact on protein function, with the Variant Effect Scoring Tool (VEST) (Carter et al., 2013). Genes are ranked by their most significantly scored variant or mutation. Results are linked with published information from the 1000 Genomes Project (Clarke et al., 2012), the Exome Sequencing Project, the Catalogue of Somatic Mutations in Cancer (COSMIC) (Forbes et al., 2008), GeneCards (Harel et al., 2009) and PubMed, enabling users to compare predictions with known gene function, cancer associations and clinical/experimental studies. CRAVAT returns results via email as Excel and/or tab-separated text files. It can also provide a formatted submission file for mutation Position Imaging Toolbox (muPIT) interactive (N. Niknafs et al., submitted for publication), allowing users to visualize variants interactively in 3D, together with position-specific annotations.
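
The gene-ranking step described above, in which each gene is ranked by its most significantly scored variant, can be illustrated with a minimal Python sketch. This is not CRAVAT's implementation. It assumes that each variant's significance is summarized by a single P-value, and the record type `ScoredVariant`, its field names, the function `rank_genes` and the gene and variant labels are all hypothetical placeholders introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredVariant:
    """Hypothetical record for one scored missense variant (not a CRAVAT type)."""
    gene: str       # gene label (placeholder values below)
    variant: str    # variant label (placeholder values below)
    p_value: float  # assumed significance measure; smaller means more significant


def rank_genes(variants):
    """Return (gene, best_variant) pairs, most significant gene first.

    Each gene is represented by the variant with the smallest P-value,
    and genes are then sorted by that representative P-value.
    """
    best = {}
    for v in variants:
        current = best.get(v.gene)
        if current is None or v.p_value < current.p_value:
            best[v.gene] = v
    return sorted(best.items(), key=lambda item: item[1].p_value)


# Illustrative input only: the labels and P-values are made up.
example = [
    ScoredVariant("GENE_A", "variant_1", 0.20),
    ScoredVariant("GENE_B", "variant_2", 0.003),
    ScoredVariant("GENE_A", "variant_3", 0.01),
    ScoredVariant("GENE_C", "variant_4", 0.50),
]

ranking = rank_genes(example)
for gene, top in ranking:
    print(gene, top.variant, top.p_value)
```

With this illustrative input, GENE_A is represented by variant_3 rather than variant_1, because only the most significant variant per gene determines the gene's position in the ranking.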