The main COSMIC Web site is accessible via the Internet (http://www.sanger.ac.uk/cosmic). Its front page contains all the options for searching the data, together with an overview of the site's current contents and additional descriptions. This protocol shows how to retrieve data from the system and how to examine it using the graphical and tabular views to extract the most useful information. The Web pages interconnect as much as possible, allowing continuous rounds of query specialization and leading to a Web-like workflow rather than a linear one (Fig. 10.11.12). An example walkthrough shows some of COSMIC's main capabilities.
Figure 10.11.12 The main COSMIC workflow. After initial selection of a gene or phenotype to examine, the detail pages link together in a web, allowing navigation through sample, gene, and mutation details, together with redefinition, specialization, or generalization ...
NOTE: All examples, names, and numbers refer to release v33 (September, 2007) and will change over time.
Any Internet-connected computer
Web-browsing software such as Internet Explorer, Firefox, Safari, Netscape
No input files required
The COSMIC home page: Getting started
1. Access the main COSMIC home page at http://www.sanger.ac.uk/cosmic. This should look similar to Figure 10.11.1 (the per-release news items and current content statistics will change frequently).
Figure 10.11.1 The main COSMIC home page detailing current content statistics and top-level search options. The statistics are regenerated every release; in this case, the numbers relate to the October 2007 release. For color version of this figure see http://www.currentprotocols.com ...
COSMIC can be searched in a number of different ways, each requiring different parameters. The easiest method is to type a gene name, tissue type, or cancer morphology into the Text Search box. Alternatively, the Detailed Search allows navigation to a gene name or precise cancer phenotype, offering lists of options from which to choose. The Quick Search gives an overview of the genotype information for tumors from a chosen tissue site (e.g., “lung”) in one step. Finally, Genes from Literature Curation lists those genes that have received complete literature curation, as distinct from those that come from CGP sources only. Also on this page are links to recent announcements from COSMIC and a running total of the system's current contents.
Simple COSMIC searching
2. To perform a simple search, enter the search term of choice in the Text Search box. In this example, enter KRAS into the search box to perform a search by gene (searching is not case-sensitive). The search results window is displayed, with each option providing a small description.
Most searches in COSMIC are performed simply through the Text Search box. It is possible to search COSMIC for gene names and their HUGO synonyms. Simple searches can also be performed for tissue types (primary locations such as “pancreas,” or subspecializations such as “uterus,” classified in COSMIC as Soft Tissue: Striated Muscle: Uterus), tumor morphology (e.g., “glioma”), sample name (e.g., “HCC38”), or a mutation description (e.g., the common KRAS mutation “c.35G>A”; “p.G12D”). If a COSMIC ID number (the numeric internal database identifier) is known for any of these, it can also be used in a simple search to retrieve very specific datapoints (the KRAS p.G12D mutation has ID number 521). Search terms can be combined: 521 mutation will return only mutations, excluding, for instance, all samples with “521” in their name.
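When preparing a batch of queries, it can help to sort terms by the kinds of input the Text Search box accepts. The sketch below is a hypothetical helper, not COSMIC's own search logic; the function name and the category labels are assumptions made for illustration.

```python
import re


def classify_search_term(term):
    """Guess which kind of Text Search input a term represents.

    Hypothetical helper: the categories mirror the input types described
    in the text, but the rules are illustrative assumptions only.
    """
    term = term.strip()
    if re.fullmatch(r"\d+", term):
        # Purely numeric terms may be COSMIC ID numbers (e.g., 521).
        return "COSMIC ID"
    if term.startswith("c."):
        # cDNA-level mutation description (e.g., c.35G>A).
        return "cDNA mutation description"
    if term.startswith("p."):
        # Protein-level mutation description (e.g., p.G12D).
        return "protein mutation description"
    # Gene names, tissues, morphologies, and sample names are free text.
    return "free text"


for term in ["KRAS", "521", "c.35G>A", "p.G12D", "glioma", "HCC38"]:
    print(term, "->", classify_search_term(term))
```

A purely numeric sample name would be labeled "COSMIC ID" by this sketch, which is why the text's advice to combine terms (e.g., 521 mutation) is useful for disambiguation.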
Often a number of options are returned; KRAS returns 82. The first is usually the most useful: in this case, a link to the gene summary page for KRAS, followed by the 81 mutations identified so far in this gene.
Selecting the link most relevant to the query displays the appropriate summary page. COSMIC can be navigated gene-centrically or tissue-centrically, and summary pages describe all key data, including genes, tissues (cancer phenotypes), papers, CGP studies, mutations, and samples; the search results link to these summary pages.
Examining the Gene Summary page
3. Click on the top link, “gene: KRAS,” which will show the summary information for the KRAS gene. A page similar to Figure 10.11.2A will appear, summarizing all the information stored in COSMIC about the gene.
Figure 10.11.2 (A) The Gene overview page for KRAS, providing summary statistics of the mutation data, links to the data source overview pages (paper or study) and a series of links external to COSMIC. (B) For a gene with fusion data (e.g., TMPRSS2), an extra element ...
In the Mutation Summary, the spread of mutations across the gene is shown graphically (the scale is the peptide sequence of the gene's product). The mutations combined are drawn in green, with the most frequently mutated position highlighted in red, and the subsequent breakdown by mutation type drawn in black. The mutations graphic is clickable, producing a small menu offering further details and navigation options. The Histogram button links to the core Histogram page detailed in the next section.
Additional Info comprises a list of links to views of the gene's sequence in three forms and to external database links, including the option to view all COSMIC's KRAS data integrated into Ensembl's ContigView via their DAS technology. To use this for the first time, click the lower DAS link to turn on COSMIC's data sources in Ensembl and show the mutation data aligned to the human genome sequence, complete with Ensembl genome browser annotations.
The References section provides a summary of the publications used to compile these data, with links to the papers cited.
The Studies section details the CGP studies' contribution, with links to the study summary pages.
Finally, the Samples section shows the total number of samples investigated in this gene, together with the number which were mutated.
Additionally, if a gene is involved in a fusion event with another gene, a section is inserted under the mutation summary (Fig. 10.11.2B), detailing the gene's fusion partners together with summary totals. Links are provided to the fusion summary page for each gene pair.
The Histogram page: Tissue spread, mutation frequencies, and spectra
4. Click the red Histogram button under the mutation graphic. A page will appear describing the gene's mutation spectrum in much more detail, shown at the amino acid level by default (Fig. 10.11.3A). Click the Sequence Type “cDNA” button under the graphic and then press Display to obtain the nucleotide view of the graphic (Fig. 10.11.3B). In the table below the graphic, the Details tab is selected by default (Fig. 10.11.4A). Click the Mutations button to view tabulated details of the mutations on the gene.
Figure 10.11.3 Graphical representation of the mutation spectrum across the KRAS gene on the amino acid scale (A) and on the nucleotide scale (B). Note the novel introduction of complex mutations. Frequently two or three of these equal nucleotide substitutions, these ...
Figure 10.11.4 (A) Details table from the histogram page, detailing the per-tissue and total sample counts and mutation rates. Only a small portion is shown; 13 tissue types are above “haematopoietic and lymphoid tissue” and 18 below “pancreas.” ...
The default peptide-view graphic (Fig. 10.11.3A) shows a histogram of the single-base substitutions identified in the gene, color coded by residue according to the color scheme used in Ensembl (http://www.ensembl.org). Underneath this are insertions (red triangles) and deletions (blue triangles). Below this are indications of protein structure together with links to their source (e.g., Pfam, InterPro). Below this are options for zooming into the gene sequence and for selecting amino acid versus nucleotide views. Under the graphic, a statement indicates how many mutations were reported in the gene but were not detailed enough to place on the histogram. At the top of the graphic, buttons are offered to export the data represented by the graphic, together with gene/reference summary pages.
Two tables are available beneath the graphic. The default Details table (Fig. 10.11.4A) contains a breakdown of mutation frequencies per primary tissue type, with totals at the bottom. The Mutations table describes all the mutations observed on the gene, together with counts of times observed (in brackets) and links to mutation summary pages. Similar mutations may arise a number of times with different counts (e.g., mutation p.G12V was counted 2468 times in one instance and 1 time in another). This is usually because the underlying nucleotide change is different but results in the same effect in the expressed product. The p.G12V mutation counted 2468 times resulted from the simple c.35G>T substitution, while the p.G12V counted only once was caused by the compound c.35_36GT>TC substitution (classified ‘complex’). The mutation table is broken down by mutation type (most KRAS mutations are single-base substitutions). All mutation descriptions in COSMIC, as seen here, are compliant with Human Genome Variation Society (HGVS) nomenclature (den Dunnen and Antonarakis, 2000), which recommends specific syntaxes for the precise reporting of sequence changes.
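The p.G12V example can be checked by hand. The sketch below is illustrative and not part of COSMIC: it assumes the wild-type KRAS codon 12 is GGT (CDS positions 34 to 36, which is consistent with the substitutions quoted above), and the helper name `apply_substitution` and the partial codon table are hypothetical.

```python
# Partial codon table: only the codons needed for this example.
CODON_TABLE = {"GGT": "G", "GTT": "V", "GTC": "V", "GAT": "D"}

WILD_TYPE_CODON_12 = "GGT"  # assumed wild-type codon, CDS positions 34-36
CODON_START = 34            # CDS position of the first base of codon 12


def apply_substitution(codon, codon_start, first_pos, ref, alt):
    """Replace `ref` with `alt` starting at CDS position `first_pos`."""
    offset = first_pos - codon_start
    if codon[offset:offset + len(ref)] != ref:
        raise ValueError("reference bases do not match the codon")
    return codon[:offset] + alt + codon[offset + len(ref):]


# Simple substitution c.35G>T
simple = apply_substitution(WILD_TYPE_CODON_12, CODON_START, 35, "G", "T")
# Compound ('complex') substitution c.35_36GT>TC
compound = apply_substitution(WILD_TYPE_CODON_12, CODON_START, 35, "GT", "TC")

for name, codon in [("c.35G>T", simple), ("c.35_36GT>TC", compound)]:
    print(name, codon, "p.G12" + CODON_TABLE[codon])
```

Both nucleotide changes yield a valine codon (GTT and GTC, respectively), which is why the two entries share the same protein-level description but are counted separately.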
Changing the histogram view from amino acid to cDNA provides a nucleotide-centric picture (Fig. 10.11.3B). The scale on the graphic changes, as do the pictured results. Complex compound substitutions can now be seen as small vertical bars. Primarily 2-bp compound equal substitutions, these usually result in simple missense mutations, only existing as complex compound substitutions at the nucleotide level. Notice that the Mutations table also changes when the graphic is changed to the cDNA view.
Tabulating selected data
5. Click on the More Details link on the right side of the Details table. In this example, the link has been clicked for “pancreas,” resulting in a popup box. The three top links offer methods to refine the query used to produce the histogram page. Under Sample Data, export functions are available to tabulate the data summarized in the table. Click the Positive and Negative link.
The three export options allow the viewing of all the data for that tissue for that gene (i.e., KRAS/pancreas), just the mutant samples, or just those without mutations. The tabulated data include sample name and phenotype details, together with mutation details and a link to PubMed for the originating publication. CGP data, released prepublication, have no PubMed link.
The Histogram page: Zooming
6. Click the browser's “back” button to return to the histogram page in cDNA view. Click the peak mutant in the graphic, and a small popup menu will offer zooming options and further details links. Click the “Zoom in +−5bp” option, and the view will zoom in on the 10 bp surrounding the selected nucleotide.
The zoomed view shows much more clearly the details of the region, including color-coded annotated nucleotides. (If the zoom window expands beyond 50 bp, the annotations no longer fit and are removed.) The wild-type cDNA and amino acid sequences are labeled on the x axis. Above is the substitution histogram, and below are details of complex compound substitutions followed by insertions and deletions. Note that the tables below the graphic also change with zooming. The Mutations table will only show the sequence variants within the graphic window [in this case, nucleotides 30 to 40 of the KRAS coding sequence (CDS)]. The Details table will only count Mutated Samples if they have a mutation between the boundaries, thus reducing the calculated mutation frequency.
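The effect of zooming on the tables can be sketched as a window filter. The example below is an assumption-laden illustration, not COSMIC's implementation: the sample records and the total sample count are made up, and a variant is treated as "within" the window only if its whole span lies inside it.

```python
def zoom_window(center, flank=5):
    """Return the inclusive (start, end) CDS window around `center`."""
    return max(1, center - flank), center + flank


def in_window(mutation, window):
    """True if the mutation's span lies entirely within the window."""
    start, end = window
    return start <= mutation["start"] and mutation["end"] <= end


# Made-up records for illustration only.
mutations = [
    {"cds": "c.35G>T", "start": 35, "end": 35, "sample": "S1"},
    {"cds": "c.35_36GT>TC", "start": 35, "end": 36, "sample": "S2"},
    {"cds": "c.183A>C", "start": 183, "end": 183, "sample": "S3"},
]
total_samples = 10  # hypothetical number of samples examined

window = zoom_window(35)  # nucleotides 30 to 40
visible = [m for m in mutations if in_window(m, window)]
mutated_samples = {m["sample"] for m in visible}

full_frequency = len({m["sample"] for m in mutations}) / total_samples
zoomed_frequency = len(mutated_samples) / total_samples
print(window, [m["cds"] for m in visible], full_frequency, zoomed_frequency)
```

Because the sample with a mutation outside the window is no longer counted as mutated, the zoomed frequency is lower than the full-gene frequency, as described for the Details table.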
7. Click on Zoom Out at the top to return to the full gene view.
Searching COSMIC: Defining a detailed phenotype
8. Clicking on a primary tissue link from the Details table on the Histogram page (left-hand column of Fig. 10.11.4A) allows further specialization of the phenotype being investigated. This detailed phenotype search is also available via the Browse by Tissue option on the main COSMIC home page. Options are offered from the COSMIC database, which can be selected singly or in multiples. In this example, to specify a tumor site (tissue), select “pancreas,” click Next, then “ampulla of Vater,” and click Next. To further specialize by tumor morphology (histology), select “carcinoma” and click Next, then select “ductal carcinoma” and click Next. The Tissue Summary page will present a list of genes with statistics to show how many tumor samples have been examined in each gene, and the tumor type's mutation frequency in that gene (Fig. 10.11.6). Click on the KRAS gene in the table to go to the Histogram page with the new specialized query, which will now display only mutations for the new phenotype selection (Fig. 10.11.7).
Figure 10.11.6 The five most mutated genes in the specialized phenotype, ductal carcinoma of the pancreatic ampulla of Vater (Pancreas: Ampulla of Vater; Carcinoma: Ductal Carcinoma). The small popup menu summarizes the tabulated data for the selected gene, and a link ...
Figure 10.11.7 Starting from the Tissue Overview in Figure 10.11.6, the histogram and tables reflect specialized phenotypes, showing only the data from samples with this specific cancer type. For color version of this figure see http://www.currentprotocols.com.
These selection pages behave differently depending on the number of selections made. At the beginning, selecting five or fewer tissues allows further specialization of the tumor site; selecting more simply skips to the histology selection. The histology section only offers further specialization if a single choice is made. Once a specific selection is made (and a selection does not have to be this specific), the resulting page shows which genes have been analyzed in the selected phenotype and which were mutated in any of the samples. The five genes with the highest mutation frequency are described in more detail, both graphically and in a table (Fig. 10.11.6). The ordering of these five genes is a statistical evaluation of their impact in cancer, combining the mutation frequency and the number of samples examined. Clicking on a gene name in the table or a gene's bar in the chart produces the histogram page (as described above). However, the graphic and tables now reflect only the mutations found in the specified phenotype (Fig. 10.11.7), so the numbers are much reduced in both.
To view the full details of the gene again (i.e., remove the specialized phenotype), click on the Switch View button above the histogram graphic. While navigating specialized phenotypes, the current tissue/histology selection is shown in the sidebar on the left-hand side; clicking on these links allows you to respecialize.
Examining a mutation in more detail
9. Click on the top portion of the mutant peak in the KRAS tissue histogram picture (green, Fig. 10.11.7), and then click the Mutation Details link in the popup menu. (Alternatively, click on the Mutations button under the histogram to reveal the Mutations table, then click on “p.G12V.”) The Mutation Summary page is presented (Fig. 10.11.8).
Figure 10.11.8 The Mutation Summary page for the highly oncogenic c.35G>T KRAS mutation. All details of this mutation are presented here, including the Associated Samples list, which is a potentially very long list of samples in which this mutation has been ...
This page presents all the information available for the selected sequence change. The COSMIC mutation ID at the top can be used in a search from the front page, as can the mutation descriptions. The ‘p.’ and ‘c.’ descriptions are the precise HGVS-compliant nomenclatures at the protein and cDNA levels, respectively. The position of the mutated residue is indicated graphically on the CDS scale of the gene. Underneath this are the coordinates of the mutation on the current genome golden path sequence (NCBI36 at the time of writing), together with a link to view the mutation in a genomic context in the Genome Browser at Ensembl via its DAS technology. The lower histogram graphic indicates the five tissues in which this mutant was most frequently found. Beneath this is a potentially very long list of samples in which this mutation has been observed. Links are available to go back to the Tissue Summary and Gene Summary pages.
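The ‘c.’ and ‘p.’ descriptions can be cross-checked against each other, because a CDS position n falls in codon (n − 1) // 3 + 1. The following minimal sketch handles only simple single-base and single-residue substitutions such as those quoted in this protocol; the function names are hypothetical, and full HGVS syntax is considerably richer than these two patterns.

```python
import re


def parse_cdna_substitution(description):
    """Parse a simple cDNA substitution such as 'c.35G>T'."""
    match = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", description)
    if match is None:
        raise ValueError("not a simple cDNA substitution: " + description)
    return int(match.group(1)), match.group(2), match.group(3)


def parse_protein_substitution(description):
    """Parse a simple one-letter protein substitution such as 'p.G12V'."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", description)
    if match is None:
        raise ValueError("not a simple protein substitution: " + description)
    return match.group(1), int(match.group(2)), match.group(3)


def codon_number(cds_position):
    """Return the 1-based codon that contains a 1-based CDS position."""
    return (cds_position - 1) // 3 + 1


position, ref_base, alt_base = parse_cdna_substitution("c.35G>T")
ref_aa, residue, alt_aa = parse_protein_substitution("p.G12V")
print(codon_number(position) == residue)
```

For c.35G>T and p.G12V, the CDS position 35 maps to codon 12, matching the residue number in the protein-level description.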
10. In the box labeled Contig View, click on the link labeled “Click here to switch on the tracks if you have not previously used COSMIC DAS” to be directed to the Ensembl view of this COSMIC data. In Ensembl, zoom out a few times to view the genomic context (Fig. 10.11.9).
Figure 10.11.9 COSMIC data viewed in the Ensembl genome browser, starting from the Mutation Summary in Figure 10.11.8. By default initially zoomed into the coordinates immediately around the mutation specified (KRAS c.35G>T), zooming out gives a view of the ...
After clicking the Contig View link, an Ensembl DAS view is presented, zoomed in tightly on the mutation position. After zooming out, a view similar to Figure 10.11.9 shows the KRAS gene structure and the position of mutations, together with surrounding genes and other genomic information. Further DAS tracks can be selected in Ensembl to show significantly more data.
Examining a sample in more detail
11. Click the browser's “back” button to return to the Mutation Summary page (Fig. 10.11.8), then click on a sample name from the long list on that page (e.g., “1040576,” the first pancreas sample part way down the page). The Sample Summary page will be shown (Fig. 10.11.10), detailing all available information about the sample.
Figure 10.11.10 Selected information boxes from the Sample Overview page, detailing a choice from the c.35G>T Mutation Summary page. This page can be very long, as it brings together all the information about the sample, which can be extensive. This figure only ...
This page can become very long, as it shows all the data available for the selected sample. The page begins with the sample name; this is accurate as far as possible, but where a publication uses a non-unique or ambiguous name, it is replaced by a COSMIC database reference (this can be a number with an E or S prefix, but is more typically a single 7-digit ID). Other details include information on the sample itself and the individual it came from (where available; e.g., age, ethnicity) and the exact phenotype of the sample. Under these details, a list of genes is shown in which the sample has mutations, followed by a listing of the individual mutations with their zygosity and somatic status. Further details include the publications describing the sample and a list of genes examined in this sample that were not mutant.
CGP samples (as opposed to those derived from the literature) are often examined more thoroughly, usually offering a more extensive list of investigated genes together with microsatellite LOH data and intensive microarray CGH analysis. CGP studies often offer prepublication data; in these cases, the information is grouped into “studies” rather than publications, which are navigated similarly.
Viewing the contents of a paper
12. Click on the More Details link in the reference box on the Sample Summary page (Bergmann et al., 2006).
A paper often describes the analysis of many samples through many genes. In this example, seven samples have been examined in both the KRAS and BRAF genes and three mutations were found, all in KRAS. All the details are shown here, per paper, with links to PubMed and the originating journal article (in many cases, the DOI link requires a journal subscription). A similar summary view is available for CGP studies, which are groups of functionally related genes for CGP prepublication data.
Further navigation in the histogram: Exporting
13. The paper's list of genes is separated into alphabetical brackets (not shown). For the Bergmann paper, click on the J-L tab, then click KRAS, and then click the Histogram button to return to the histogram as seen before. Click on the scale bar just above 100 to zoom in on this position, then click the Export button.
The Navigation box below the graphic allows the histogram to be changed in many ways, including zooming options whereby nucleotide or amino acid coordinates can be input for a very specific view, changing the default amino acid view to a nucleotide view (and vice versa), and selecting a completely new gene to examine. Above the histogram, more buttons provide further navigation. The Summary button returns to the Gene Summary page, the References button provides a complete listing of all publications used to generate the data onscreen, Zoom Out returns the histogram to the full gene view, and the Export button allows the data summarized in the graphic to be exported in tabular format. A number of export formats are provided; the HTML and the two Text links simply render the data onscreen, so it can be viewed or cut-and-pasted into another application. The MS Excel option saves the data in a file on the client computer in Microsoft Excel spreadsheet format.
Examining gene fusions
14. Press the browser's back button to return to the KRAS histogram. In the navigation box under the histogram, a pull-down menu offers a similar view of all the genes in COSMIC. Select TMPRSS2 and press the Display button. Click ETV1 in the information box under the TMPRSS2 histogram (Fig. 10.11.11A), and the Fusion Summary page will be displayed (Fig. 10.11.11B), detailing the different fused structures observed. Click mutation “115” in the Inferred Breakpoints table to retrieve details for this mutation.
Figure 10.11.11 (A) The information box displayed under the histogram when COSMIC has information on fusion events involving the selected gene (in this example, TMPRSS2). (B) The Fusion Summary page for the TMPRSS2/ETV1 gene pair. Inferred breakpoints are displayed in ...
In this case, TMPRSS2 has no classic small mutations, but an information box indicates that COSMIC has data on fusion events involving this gene, mostly with ERG, but also with ETV1 (Fig. 10.11.11A). These are specified in the Mutations table (press the red Mutations button), which also links to the fusion variant of the Mutation Summary page. However, the first step in this example is to view the summary of a fused gene pair.
Figure 10.11.11B shows the Fusion Summary page for the selected gene pair. To describe the published data accurately while ensuring navigability, the fusion data are described in two ways: Inferred Breakpoints and Observed mRNAs. This is because many papers use expression technologies such as RT-PCR to determine fusions between genes. A number of these studies identify more than one transcript per sample, some finding more than four different products between the same gene pair in one tumor. This implies significant alternative splicing of the mRNAs expressed from the fused gene pair. To simplify these data for display and navigation, the position of the genomic breakpoint has been inferred from the experimental data while maintaining the original results.
To do this, it has been assumed that each sample's breakpoint lies between the most 3′ expressed exon of the 5′ gene partner and the most 5′ exon of the 3′ gene partner, from the mRNAs reported in that sample. For instance, in sample “MET26-LN,” two TMPRSS2/ETV1 fusion mRNAs were identified, both containing the downstream sequence from exon 4 of ETV1; one fusion (ID 14) contained only the first exon of TMPRSS2, while the other (ID 15) contained the first two exons. Since both were observed in the same sample, the default assumption is that these are splice variants of a fusion between somewhere downstream of TMPRSS2 exon 2 and somewhere upstream of ETV1 exon 4, and the inferred breakpoint for this sample (ID 115) reflects this.
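The inference rule can be written down compactly. The sketch below is an illustration of the stated assumption, not COSMIC's code: each observed mRNA is reduced to a hypothetical pair of exon numbers (the last exon retained from the 5′ partner and the first exon retained from the 3′ partner), and the function name is invented for this example.

```python
def infer_breakpoint(transcripts):
    """Infer one breakpoint for a sample from its observed fusion mRNAs.

    Each transcript is a pair:
        (last exon of the 5' gene present, first exon of the 3' gene present)
    The breakpoint is taken to lie downstream of the most 3' exon seen
    from the 5' partner and upstream of the most 5' exon seen from the
    3' partner.
    """
    if not transcripts:
        raise ValueError("at least one observed mRNA is required")
    most_3prime_exon_of_5prime_gene = max(t[0] for t in transcripts)
    most_5prime_exon_of_3prime_gene = min(t[1] for t in transcripts)
    return most_3prime_exon_of_5prime_gene, most_5prime_exon_of_3prime_gene


# Sample MET26-LN: two TMPRSS2/ETV1 mRNAs (IDs 14 and 15 in the text).
met26_ln = [
    (1, 4),  # TMPRSS2 exon 1 joined to ETV1 exon 4 onward
    (2, 4),  # TMPRSS2 exons 1-2 joined to ETV1 exon 4 onward
]
tmprss2_exon, etv1_exon = infer_breakpoint(met26_ln)
print("downstream of TMPRSS2 exon", tmprss2_exon,
      "and upstream of ETV1 exon", etv1_exon)
```

For MET26-LN this gives a breakpoint downstream of TMPRSS2 exon 2 and upstream of ETV1 exon 4, in line with the inferred breakpoint described above.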
The Mutation Details page (not shown, but accessed by clicking “115” from the Fusion Summary page in Fig. 10.11.11B) shows the mutation ID, whether it is an inferred or observed fusion mRNA, and the HGVS-compliant syntax describing the exact details of the mutation, followed by two graphical representations of the mutation. The first graphic describes the mutant mRNA in relation to its wild-type parent genes; the second shows the related mutations. Lastly, a listing of samples containing this fusion mutation is presented.
The Sample Summary page also describes fusion mutations if present, replacing the standard mutations table with a tabulated inferred breakpoint and a graphical list of observed mRNAs; this can be seen by clicking on “MET26-LN” in the Mutation Details page.
Core COSMIC workflow
15. Before beginning a new search, it is useful to review the main pathways through the data in COSMIC. The workflow and interrelationships of the core pages are summarized in Figure 10.11.12.