The molecular diversity of viruses complicates the interpretation of viral genomic and proteomic data. To make sense of viral gene functions, investigators must be familiar with the virus host range, replication cycle and virion structure. Our aim is to provide a comprehensive resource bridging textbook knowledge with genomic and proteomic sequences. The ViralZone web resource (www.expasy.org/viralzone/) provides fact sheets on all known virus families and genera with easy access to sequence data. A selection of reference strains (RefStrain) provides annotated standards to cope with the exponential increase in virus sequences. Moreover, ViralZone offers a complete set of detailed and accurate virion pictures.
Viruses are presumably the most abundant biological entities on the planet, with the total number of virus particles exceeding the total number of cells by about 10-fold (25). Many viruses have a relatively small genome encoding only a few proteins; circoviruses are among the smallest, with a 1.7-kb genome encoding only two proteins (11). Despite this apparent simplicity, viral biochemistry and replication mechanisms are more varied than those found across the entire bacterial, plant and animal kingdoms (15,19). Viruses exploit nearly every possible way of encoding information in nucleic acid, from single-stranded DNA to double-stranded RNA. Each of the 83 virus families has a distinct replication strategy that requires unique proteins and enzymes (19). For example, the replication cycles of Human herpesvirus 1 (HHV-1) and Ebolavirus (EBOV) have nothing in common (Figure 1). The dsDNA genome of HHV-1 encodes 73 proteins and replicates in the host nucleus, where new viral genomes are encapsidated before budding through the endoplasmic reticulum and then into vesicles that release the virion from the cell (6,26). The ssRNA genome of EBOV encodes eight proteins, replicates in the host cell cytoplasm using its own RNA-dependent RNA polymerase complex and buds directly at the plasma membrane (2). These two disparate replication cycles merely hint at the tremendous variety of viral molecular biology. As a result, a clear view of a specific virus's biology is crucial for understanding its genome and protein functions. Yet this information is rarely available outside academic textbooks.
To help solve this problem, the Swiss-Prot virus annotation team has developed a website dedicated to viruses: ViralZone (www.expasy.org/viralzone). The site links specific knowledge about each virus family to viral protein and genome sequences. All available information is presented in concise, accessible virus fact sheets, which contain condensed information on genome, replication cycle, taxonomy and epidemiology, together with graphics describing virion organization and genome transcription and translation strategies. The site comprises 426 fact sheets covering the entire known virosphere: 83 families, 334 genera and nine additional pages dedicated to important species such as influenza H1N1 or HIV-1.
Unlike cellular organisms, which are thought to descend from a last universal common ancestor (LUCA) (14), viruses have no presumed common ancestor (12). Current virus classification therefore comprises seven independent classes according to the Baltimore system (4). This classification is based on the nature of the nucleic acid in the virion: dsDNA, ssDNA, dsRNA, ss(+)RNA, ss(−)RNA, ssRNA(RT) or dsDNA(RT).
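The Baltimore scheme amounts to a fixed mapping from the nucleic acid found in the virion to one of seven classes. The Python sketch below expresses that mapping as an enumeration; the class and function names are illustrative and not part of ViralZone, and the ASCII hyphen in "ss(-)RNA" is a simplification of the minus sign used in the text. The example genera are standard textbook assignments.

```python
from enum import Enum


class Baltimore(Enum):
    """The seven Baltimore classes, keyed by virion nucleic acid type."""
    I = "dsDNA"
    II = "ssDNA"
    III = "dsRNA"
    IV = "ss(+)RNA"
    V = "ss(-)RNA"      # ASCII hyphen stands in for the minus sign
    VI = "ssRNA(RT)"
    VII = "dsDNA(RT)"


def baltimore_class(genome_type: str) -> Baltimore:
    """Return the Baltimore class for a genome-type label such as 'dsRNA'."""
    try:
        return Baltimore(genome_type)  # Enum lookup by value
    except ValueError:
        raise ValueError(f"unknown genome type: {genome_type!r}") from None


# Example genera with their virion nucleic acid type.
examples = {
    "Simplexvirus": "dsDNA",           # Human herpesvirus 1
    "Circovirus": "ssDNA",
    "Ebolavirus": "ss(-)RNA",
    "Lentivirus": "ssRNA(RT)",         # HIV-1
    "Orthohepadnavirus": "dsDNA(RT)",  # hepatitis B virus
}
for genus, gtype in examples.items():
    print(genus, baltimore_class(gtype).name)
```

Using an enumeration keeps the seven classes closed: any label outside the scheme raises an error instead of silently creating an eighth class.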
Virus abundance on Earth is higher than initially expected: recent studies have revealed millions of viruses per millilitre of seawater and billions per cubic centimetre in nearshore surface sediments (16,24), most of them unidentified. As virus discovery accelerates, virus taxonomy must be revised and extended every year (Table 1). In ViralZone, the entry point to the virus fact sheets is the set of seven Baltimore taxonomy pages (4), which list all known virus families and genera (Figure 2A). This list is reviewed every year as new viruses are constantly being described (8). A website has the advantage that it can be updated incrementally, whereas new reference books can take years to publish. For example, the International Committee on Taxonomy of Viruses (ICTV) published major taxonomic changes on its website in August 2009, and the ViralZone taxonomy was updated accordingly only one month later.
From a public health point of view, comprehensive knowledge of all known virus genera proves extremely useful when a new pathogen emerges from a neglected virus family. A recent example is the Xenotropic murine leukaemia virus-related virus (XMRV), which has attracted the interest of the scientific community for its potential involvement in prostate cancer (22) and/or chronic fatigue syndrome (18). Because gammaretrovirus-specific resources on the web were scarce, visits to the corresponding ViralZone page (www.expasy.org/viralzone/all_by_species/67.html) increased dramatically, reaching close to 2000 visitors in November–December 2009 (source: Google Analytics).
Virus host ranges can be quite narrow, e.g. human hepatitis B virus, which is strictly restricted to humans, or very broad, e.g. rabies virus, which seems able to successfully infect any mammal. Knowing the host tropism is essential to understanding viral molecular biology: for example, a dsDNA viral genome is transcribed differently in a bacterium than in a eukaryote. Host range is also of major importance for public health, as illustrated by zoonoses such as SARS, Ebola or influenza, which are caused by viruses able to mutate and cross host barriers, thus threatening the human population. For all these reasons, virus host tropism is prominently displayed in ViralZone: hosts are indicated by a colour code for each virus genus on the taxonomy pages (Figure 2B). ViralZone displays only natural reservoir hosts; vector hosts, dead-end hosts and laboratory hosts are not described unless a human host or cell line is involved.
Virus families can also be browsed ‘by host’, allowing users to easily identify which viral families infect humans, non-human vertebrates, plants, eukaryotic microorganisms, archaea or bacteria. A complete list of all major virus species able to infect humans is accessible from the ViralZone home page (www.expasy.org/viralzone/all_by_species/678.html).
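Browsing ‘by host’ can be thought of as inverting a genus-to-host table. The sketch below assumes a hypothetical in-memory mapping of a few genera to the host categories named above; it is an illustrative toy table, not ViralZone's actual data or implementation.

```python
from collections import defaultdict

# Illustrative toy table (NOT ViralZone data): genus -> natural host categories.
GENUS_HOSTS = {
    "Orthohepadnavirus": {"humans", "non-human vertebrates"},
    "Lyssavirus": {"humans", "non-human vertebrates"},
    "Ebolavirus": {"humans", "non-human vertebrates"},
    "Inovirus": {"bacteria"},
    "Tobamovirus": {"plants"},
}


def browse_by_host(genus_hosts: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert a genus -> hosts table into host -> alphabetically sorted genera."""
    by_host = defaultdict(list)
    for genus, hosts in genus_hosts.items():
        for host in hosts:
            by_host[host].append(genus)
    return {host: sorted(genera) for host, genera in by_host.items()}


index = browse_by_host(GENUS_HOSTS)
print(index["humans"])           # ['Ebolavirus', 'Lyssavirus', 'Orthohepadnavirus']
print(index.get("archaea", []))  # [] : no archaeal genus in this toy table
```

Storing hosts as a set per genus lets one genus appear under several host categories, as zoonotic genera do.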
Virus fact sheets provide concise and specific information on molecular biology, taxonomy, hosts and epidemiology (Figure 2C). The first tab, ‘General’, describes molecular biology and virion and genome organization, followed by a step-by-step description of the viral infection cycle in the cell. The database section provides easy access to NCBI nucleotide and UniProtKB protein entries, as well as to virology-specific databases such as VIPR (http://www.viprbrc.org), VIPERdb (7,23), Descriptions of Plant Viruses (1) and VBRC (www.vbrc.org, virology.ca). Host and cell tropism are generally indicated, although cell tropism may be missing because such data are sometimes unknown or hard to find in the literature. Cell receptors used for virus entry are also listed, with links to the relevant publications. Finally, the epidemiology section briefly describes associated diseases and virus transmission, as well as available vaccines and antiviral drugs effective against the virus.
Under the ‘Proteins by Strain’ tab, strains and/or isolates are listed together with the proteins they encode. This list displays all related UniProtKB/Swiss-Prot entries (Figure 2D), which are manually curated with data extracted from publications, so that all proteins annotated for a given strain or isolate can be accessed at once. Alternatively, the ‘Proteins by Name’ tab displays clusters of proteins sharing the same name and function (Figure 2E). This grouping is possible because the protein entries have been manually curated with a consistent naming system. From this page, users can run the ClustalW alignment software (Figure 2F) on a selected set of these proteins to quickly generate an alignment of any viral protein family. Reference strain entries are clearly flagged, giving users a landmark when looking for the best data on a given viral protein family.
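The ‘Proteins by Name’ view and the alignment step can be sketched as grouping curated entries by protein name and writing each group as multi-FASTA, a format ClustalW accepts as input. The record type below is an assumption, not the UniProtKB schema, and the accessions, strain names and sequences are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinEntry:
    accession: str  # hypothetical identifiers, not real UniProtKB accessions
    strain: str
    name: str       # curated protein name shared across strains
    sequence: str


def cluster_by_name(entries: list[ProteinEntry]) -> dict[str, list[ProteinEntry]]:
    """Group entries that carry the same curated protein name."""
    clusters = defaultdict(list)
    for entry in entries:
        clusters[entry.name].append(entry)
    return dict(clusters)


def to_fasta(entries: list[ProteinEntry], width: int = 60) -> str:
    """Render entries as multi-FASTA text, wrapping sequences at `width` residues."""
    lines = []
    for e in entries:
        lines.append(f">{e.accession} {e.name} [{e.strain}]")
        for i in range(0, len(e.sequence), width):
            lines.append(e.sequence[i:i + width])
    return "\n".join(lines) + "\n"


# Toy entries: sequences are placeholders, not real viral proteins.
entries = [
    ProteinEntry("HYP001", "strain A", "Capsid protein", "MSTKLAVQ"),
    ProteinEntry("HYP002", "strain B", "Capsid protein", "MSTRLAVQ"),
    ProteinEntry("HYP003", "strain A", "RNA polymerase", "MDLKGEWS"),
]
clusters = cluster_by_name(entries)
print(to_fasta(clusters["Capsid protein"]), end="")
```

The grouping only works because names are curated consistently; with free-text names, "capsid protein" and "Capsid protein VP1" would fall into separate clusters. The FASTA text could then be saved to a file and given to ClustalW or another multiple-sequence aligner; no particular command line is assumed here.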
Virions are very diverse in shape and structure. They can be naked or enveloped by one or several lipid bilayers, and the genetic material can be protected by one, two or even three capsids with helical or icosahedral symmetry, whose size often correlates with genome length: from 17 nm in diameter (Porcine circovirus, 1.7 kb) to 400 nm (Mimivirus, 1200 kb). Virion pictures and diagrams can be found in virology books or publications, but they are often heterogeneous in quality, colours and resolution. We created 160 original virion diagrams for ViralZone covering all viral families and genera described to date. All figures share the same design and resolution, with defined colours for each part of the viral particle. Virions with icosahedral capsid symmetry are shown first as a cross-section, then as a 3D-like picture showing the precise capsid architecture (Figure 3). Structural proteins are coloured identically in virion and genome pictures.
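For the icosahedral capsids drawn in the 3D-like pictures, the standard Caspar-Klug quasi-equivalence rule relates capsid architecture to a triangulation number T = h^2 + hk + k^2 (h and k non-negative integers, not both zero), with 60T subunits arranged as 12 pentamers and 10(T - 1) hexamers. The sketch below computes these idealized counts; it is textbook background rather than a ViralZone feature, and real capsids (e.g. pseudo-T or non-quasi-equivalent lattices) may deviate from it.

```python
def t_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + hk + k^2."""
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def capsid_composition(h: int, k: int) -> dict[str, int]:
    """Idealized subunit and capsomer counts for an icosahedral capsid."""
    t = t_number(h, k)
    return {
        "T": t,
        "subunits": 60 * t,        # 60T protein subunits in total
        "pentamers": 12,           # always 12, one at each 5-fold vertex
        "hexamers": 10 * (t - 1),  # remaining subunits grouped in sixes
    }


for h, k in [(1, 0), (1, 1), (2, 1)]:
    print(capsid_composition(h, k))
# {'T': 1, 'subunits': 60, 'pentamers': 12, 'hexamers': 0}
# {'T': 3, 'subunits': 180, 'pentamers': 12, 'hexamers': 20}
# {'T': 7, 'subunits': 420, 'pentamers': 12, 'hexamers': 60}
```

The counts are self-consistent: 5 × 12 + 6 × 10(T - 1) = 60T for every T.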
All these pictures are freely available to the scientific community on the ViralZone website. Permission is granted to download and use them for any academic purpose (theses, presentations or publications), provided the source is acknowledged (source: ViralZone www.expasy.ch/viralzone, Swiss Institute of Bioinformatics).
Virus genomes are relatively small, mostly <50 kb, and therefore easy and relatively inexpensive to sequence. This has resulted in an exponential increase in the number of new virus isolates deposited in sequence databases. Of the 851503 viral protein entries in UniProtKB (release 15.12), HIV-1 is the species with the greatest number of deposited open reading frames (ORFs), with 313532 different ORFs, whereas the entire human proteome accounts for only 77225 entries (Table 2).
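To put these UniProtKB figures in proportion, a quick calculation using only the numbers quoted above (release 15.12) shows that HIV-1 alone accounts for more than a third of all viral protein entries, and for roughly four times as many entries as the human proteome.

```python
# Entry counts quoted in the text for UniProtKB release 15.12.
viral_entries = 851_503
hiv1_orfs = 313_532
human_entries = 77_225

hiv1_share = hiv1_orfs / viral_entries     # fraction of viral entries from HIV-1
hiv1_vs_human = hiv1_orfs / human_entries  # HIV-1 entries per human entry

print(f"HIV-1 share of viral entries: {hiv1_share:.1%}")  # 36.8%
print(f"HIV-1 / human entry ratio: {hiv1_vs_human:.1f}")  # 4.1
```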
Because manual annotation of all UniProtKB viral proteins is not feasible, we selected about one reference strain (RefStrain) per genus for full curation. RefStrains were preferentially chosen from the type species of each genus and belong to the NCBI Reference Sequence database (RefSeq), in which viral genomes have been manually reviewed (5). The 355 selected RefStrains account for 12576 proteins; they represent the diversity of all virus genera and are few enough to be kept annotated and updated as research progresses. RefStrains are accessible from each ViralZone fact sheet, which links to the corresponding RefSeq genome and UniProtKB virus proteome. RefStrains tell users which sequences to consult for the best and most up-to-date information on any given virus, and can serve as templates to correctly annotate all similar viruses, an area of high interest to the bioinformatics community.
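The RefStrain criteria described above (genome in RefSeq, preference for the genus type species) can be expressed as a per-genus selection rule. The sketch below is a hypothetical formalization, not the curators' actual procedure, which relied on manual curation: the Strain record, the fallback to non-type-species strains and the alphabetical tie-break are assumptions added for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Strain:
    name: str
    genus: str
    is_type_species: bool  # strain belongs to the type species of its genus
    in_refseq: bool        # genome is present in NCBI RefSeq


def pick_refstrain(candidates: list[Strain]) -> Optional[Strain]:
    """Choose one strain: RefSeq membership required, type species preferred."""
    eligible = [s for s in candidates if s.in_refseq]
    if not eligible:
        return None
    # False sorts before True, so type-species strains come first;
    # the name is an assumed, arbitrary tie-break for determinism.
    return min(eligible, key=lambda s: (not s.is_type_species, s.name))


def pick_per_genus(strains: list[Strain]) -> dict[str, Optional[Strain]]:
    """Apply pick_refstrain separately to the candidates of each genus."""
    by_genus = defaultdict(list)
    for s in strains:
        by_genus[s.genus].append(s)
    return {genus: pick_refstrain(group) for genus, group in by_genus.items()}


# Hypothetical candidates for two hypothetical genera.
strains = [
    Strain("strain-X", "GenusA", is_type_species=False, in_refseq=True),
    Strain("strain-Y", "GenusA", is_type_species=True, in_refseq=False),
    Strain("strain-Z", "GenusA", is_type_species=True, in_refseq=True),
    Strain("strain-W", "GenusB", is_type_species=False, in_refseq=True),
]
for genus, ref in pick_per_genus(strains).items():
    print(genus, ref.name if ref else None)
```

In this toy run GenusA gets strain-Z (type species with a RefSeq genome), while GenusB, lacking a type-species candidate, falls back to strain-W.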
ViralZone is regularly updated with new information extracted from publications and scientific meeting abstracts. Users also contribute actively by sending feedback, minor corrections and ideas to email@example.com. Future work will further develop the viral molecular biology section, which will in turn be linked from the fact sheets. Replication cycles are currently described as text in all virus fact sheets, but diagrams would be better suited; an example is already available for the Inoviridae family (www.expasy.org/viralzone/all_by_species/675.html). A virus-specific controlled vocabulary for the Gene Ontology (GO), which will both facilitate gene analysis and improve data exchange between viral sequence databases, is being designed in collaboration with the GO Consortium (13,27).
ViralZone is a freely accessible web resource offering accurate and concise information on all known viruses. It provides high-quality virion pictures available to the entire scientific community. The site also serves as a hub for all scientists interested in virus knowledge by bringing together virus metadata with genomic and protein sequence databases. Indeed, ViralZone has already been cited as a data source in several publications (3,17,21,28), in dozens of theses and on many scientific websites, and its virion figures are widely used to support communication and teaching in virology.
Supplementary Data are available at NAR Online.
Swiss Federal Government through the Federal Office of Education and Science; grant to the Swiss Institute of Bioinformatics (SIB) (www.isb-sib.ch). Funding for open access charge: Swiss Institute of Bioinformatics.
Conflict of interest statement. None declared.