Background. Infection with hepatitis C virus (HCV) is a burgeoning worldwide public health problem, with 170 million infected individuals and an estimated 20 million deaths in the coming decades. While 6 main genotypes generally distinguish the global geographic diversity of HCV, a multitude of closely related subtypes within these genotypes are poorly defined and may influence clinical outcome and treatment options. Unfortunately, the paucity of genetic data from many of these subtypes makes time-consuming primer walking the limiting step for sequencing understudied subtypes.
Methods. Here we combined long-range polymerase chain reaction amplification with pyrosequencing for a rapid approach to generate the complete viral coding region of 31 samples representing poorly defined HCV subtypes.
Results. Phylogenetic classification based on full genome sequences validated previously identified HCV subtypes, identified a recombinant sequence, and identified a new distinct subtype of genotype 4. Unlike conventional sequencing methods, use of deep sequencing also facilitated characterization of minor drug resistance variants within these uncommon or, in some cases, previously uncharacterized HCV subtypes.
Conclusions. These data aid in the classification of uncommon HCV subtypes while also providing a high-resolution view of viral diversity within infected patients, which may be relevant to the development of therapeutic regimens to minimize drug resistance.
Hepatitis C virus; pyrosequencing; subtype classification; drug resistance mutations; viral diversity
We report the rational design and in vivo testing of mosaic proteins for a polyvalent pan-filoviral vaccine using a computational strategy designed for the Human Immunodeficiency Virus type 1 (HIV-1) but also appropriate for Hepatitis C virus (HCV) and potentially other diverse viruses. Mosaics are sets of artificial recombinant proteins that are based on natural proteins. The recombinants are computationally selected using a genetic algorithm to optimize the coverage of potential cytotoxic T lymphocyte (CTL) epitopes. Because evolutionary history differs markedly between HIV-1 and filoviruses, we devised an adapted computational technique that is effective for sparsely sampled taxa; our first significant result is that the mosaic technique is effective in creating high-quality mosaic filovirus proteins. The resulting coverage of potential epitopes across filovirus species is superior to coverage by any natural variants, including current vaccine strains with demonstrated cross-reactivity. The mosaic cocktails are also robust: mosaics substantially outperformed natural strains when computationally tested against poorly sampled species and more variable genes. Furthermore, in a computational comparison of cross-reactive potential, a design constructed prior to the Bundibugyo outbreak performed nearly as well against all species as an updated design that included Bundibugyo. These points suggest that the mosaic designs would be more resilient than natural-variant vaccines against future Ebola outbreaks dominated by novel viral variants. We demonstrate in vivo immunogenicity and protection against a heterologous challenge in a mouse model. This design work delineates the likely requirements and limitations on broadly protective filoviral CTL vaccines.
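The coverage criterion that the genetic algorithm optimizes can be illustrated with a minimal sketch. It assumes that potential CTL epitopes are approximated by 9-residue peptides (the abstract does not state a length) and scores a cocktail by the mean fraction of each natural sequence's distinct 9-mers that appear somewhere in the cocktail. The function names `kmers` and `coverage` are hypothetical, and the scoring is a simplification, not the authors' implementation.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int = 9) -> Set[str]:
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage(cocktail: Iterable[str], naturals: Iterable[str], k: int = 9) -> float:
    """Mean fraction of each natural sequence's distinct k-mers found in the cocktail.

    Assumption: k = 9 stands in for a potential CTL epitope.
    """
    covered: Set[str] = set()
    for protein in cocktail:
        covered |= kmers(protein, k)

    fractions = []
    for seq in naturals:
        target = kmers(seq, k)
        if target:  # skip sequences shorter than k
            fractions.append(len(target & covered) / len(target))
    return sum(fractions) / len(fractions) if fractions else 0.0


# Toy example: one "vaccine" protein scored against two natural variants.
naturals = ["ABCDEFGHIJ", "ABCDEFGHIK"]
print(coverage(["ABCDEFGHIJ"], naturals))  # 0.75
```

A search procedure such as a genetic algorithm would recombine natural sequences and keep the candidate sets with the highest score; comparing the score of a mosaic cocktail with that of any natural strain mirrors the comparison described in the abstract.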
Hemorrhagic fever viruses (HFVs) are a diverse set of over 80 viral species, found in 10 different genera comprising five different families: arena-, bunya-, flavi-, filo- and togaviridae. All these viruses are highly variable and evolve rapidly, making them elusive targets for the immune system and for vaccine and drug design. About 55 000 HFV sequences exist in the public domain today. A central website that provides annotated sequences and analysis tools will be helpful to HFV researchers worldwide. The HFV sequence database collects and stores sequence data and provides a user-friendly search interface and a large number of sequence analysis tools, following the model of the highly regarded and widely used Los Alamos HIV database [Kuiken, C., B. Korber, and R.W. Shafer, HIV sequence databases. AIDS Rev, 2003. 5: p. 52–61]. The database uses an algorithm that aligns each sequence to a species-wide reference sequence. The NCBI RefSeq database [Sayers et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.] is used for this; if a reference sequence is not available, a Blast search finds the best candidate. Using this method, sequences in each genus can be retrieved pre-aligned. The HFV website can be accessed via http://hfv.lanl.gov.
Large sequence datasets provide an opportunity to investigate the dynamics of pathogen epidemics. Thus, a fast method to estimate the evolutionary rate from large and numerous phylogenetic trees becomes necessary. Based on minimizing tip height variances, we optimize the root in a given phylogenetic tree, to estimate the most homogeneous evolutionary rate between samples from at least two different time points. Simulations showed that the method had no bias in the estimation of evolutionary rates, and that it was robust to tree rooting and topological errors. We show that the evolutionary rates of HIV-1 subtype B and C epidemics have changed over time, with the rate of evolution inversely correlated to the rate of virus spread. For subtype B the evolutionary rate slowed down and tracked the start of the HAART era in 1996. Subtype C in Ethiopia showed an increase in the evolutionary rate when the prevalence increase markedly slowed down in 1995. Thus, we show that the evolutionary rate of HIV-1 on the population level dynamically tracks epidemic events.
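The idea of choosing a root by minimizing tip height variance and then reading a rate off two time points can be sketched as follows. This is an illustration under stated assumptions, not the TreeRate implementation: tip heights (root-to-tip distances) are assumed to be precomputed for each candidate rooting and grouped by sampling time, and the names `pooled_tip_variance`, `best_root`, and `evolutionary_rate` are hypothetical.

```python
from statistics import mean, pvariance


def pooled_tip_variance(heights_by_time):
    """Sum of within-time-point variances of tip heights.

    Assumed objective: a good root makes tips sampled at the same time
    sit at similar heights.
    """
    return sum(pvariance(h) for h in heights_by_time.values() if len(h) > 1)


def best_root(candidates):
    """Pick the candidate rooting (by name) with the smallest pooled variance."""
    return min(candidates, key=lambda name: pooled_tip_variance(candidates[name]))


def evolutionary_rate(heights_by_time):
    """Change in mean tip height per unit time between earliest and latest samples."""
    times = sorted(heights_by_time)
    if len(times) < 2:
        raise ValueError("need samples from at least two time points")
    t0, t1 = times[0], times[-1]
    return (mean(heights_by_time[t1]) - mean(heights_by_time[t0])) / (t1 - t0)


# Toy data: substitutions/site for tips sampled in 1990 and 2000.
candidates = {
    "root_a": {1990.0: [0.10, 0.12], 2000.0: [0.15, 0.17]},
    "root_b": {1990.0: [0.05, 0.20], 2000.0: [0.10, 0.30]},
}
chosen = best_root(candidates)
print(chosen, evolutionary_rate(candidates[chosen]))
```

With heights in substitutions per site and times in years, the returned value is in substitutions per site per year. Enumerating candidate root positions on a real tree is left out of this sketch.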
Viral evolution; Molecular epidemiology; Phylogeny; TreeRate
Improvements in DNA sequencing technologies portend a new era in virology and could possibly lead to a giant leap in our understanding of viral evolution and ecology. Yet, as viral genome sequences begin to fill the world’s biological databases, it is critically important to recognize that the scientific promise of this era is dependent on consistent and comprehensive genome annotation. With this in mind, the NCBI Genome Annotation Workshop recently hosted a study group tasked with developing sequence, function, and metadata annotation standards for viral genomes. This report describes the issues involved in viral genome annotation and reviews policy recommendations presented at the NCBI Annotation Workshop.
virus; genome; annotation
To investigate the viral features of long-term nonprogressive HIV-1 infection and the selection of viral genomes, we studied serial complete HIV-1 sequences obtained from a mother–child pair, both long-term nonprogressors. Analysis of four genomic sequences demonstrated that all viral genes were intact, lacking major deletions or premature stop codons to easily explain the slow disease progression. These data suggest that viral attenuation, if present, was caused by subtle sequence variations or virus–host interactions. Serial sequences from an HIV-1-infected mother–child pair afforded us the opportunity to examine the immune selection of HIV-1 sequences years after transmission between individuals. We demonstrated that the daughter's strains were most likely subjected to immunoselection or immunoediting according to the presence of novel MHC class I alleles that differed between mother and daughter. An analysis of nef-specific cytotoxic T-lymphocyte responses in the child, whose HIV-1 nef sequence differed from the maternal nef, supported this interpretation. This study highlights the potential of full genome analysis in the investigation of pathogenesis and immune selection during HIV-1 evolution.
Infection with genotype 4 of the Hepatitis C virus is common in Africa and the Mediterranean area, but has also been found at increasing frequencies in injection drug users in Europe and North America. Full-length viral sequences to characterize viral diversity and structure have recently become available mostly for subtype 4a, and studies in Egypt and Saudi Arabia, where high proportions of subtype 4a infected patients exist, have begun to establish optimized treatment regimens. However, knowledge about other subtype variants of genotype 4 present in less developed African states is lacking. In this study the full coding region from so far poorly characterized variants of HCV genotype 4 was amplified and sequenced using a long-range PCR technique. Sequences were analyzed with respect to phylogenetic relationship, possible recombination and prominent sequence characteristics compared to other known HCV strains. We present for the first time two full-length sequences from the HCV genotype 4k, in addition to five strains from HCV genotypes 4d and 4f. Reference sequences for accurate HCV genotyping are required for optimized treatment, and a better knowledge of the global viral sequence diversity is needed to guide vaccines or new drugs effective in the worldwide epidemic.
Hepatitis C Virus; subtype; long template PCR
Classification of viral sequences should be fast, objective, accurate, and reproducible. Most methods that classify sequences use either pairwise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the branching index for subtype classification in HCV and HIV-1. Pairs of BI values with known positive and negative test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signal that groups reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false positive and false negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not uniquely represent any known subtype. Web-based services for viral subtype classification with the branching index are available online.
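The thresholding step can be illustrated with a short sketch. It takes branching index values as given (computing the BI from a tree is not shown), uses the thresholds reported above (0.71 for HCV, 0.66 for HIV-1), and shows one way a threshold with equal false positive and false negative rates might be located by a grid search. The function names and the toy BI values are hypothetical and are not taken from the study.

```python
def classify(bi, clade, threshold):
    """Return the clade if the BI is above the threshold, else None (unclassified)."""
    return clade if bi > threshold else None


def error_rates(positives, negatives, threshold):
    """False positive and false negative rates at a given threshold.

    positives: BI values for queries that truly belong to the clade.
    negatives: BI values for queries that do not.
    """
    fn = sum(b <= threshold for b in positives) / len(positives)
    fp = sum(b > threshold for b in negatives) / len(negatives)
    return fp, fn


def balanced_threshold(positives, negatives, steps=100):
    """Grid-search the threshold in [0, 1] where |FP rate - FN rate| is smallest."""
    def imbalance(t):
        fp, fn = error_rates(positives, negatives, t)
        return abs(fp - fn)
    return min((i / steps for i in range(steps + 1)), key=imbalance)


# Toy BI values (hypothetical).
positives = [0.90, 0.80, 0.60, 0.95]
negatives = [0.20, 0.30, 0.75, 0.10]

print(classify(0.80, "1a", 0.71))   # 1a
print(classify(0.50, "1a", 0.71))   # None
print(error_rates(positives, negatives, 0.71))
```

Raising the threshold above the balanced point trades more unclassified sequences for fewer false positives, which matches the remark that higher thresholds suit applications needing low false positive rates.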
Two important databases are often used in HIV genetic research, the HIV Sequence Database in Los Alamos, which collects all sequences and focuses on annotation and data analysis, and the HIV RT/Protease Sequence Database in Stanford, which collects sequences associated with the development of viral resistance against anti-retroviral drugs and focuses on analysis of those sequences. The types of data and services these two databases offer, the tools they provide, and the way they are set up and operated are described in detail.
HIV; Database; Analysis; Resistance; Genetic sequences; Evolution
The hepatitis C virus (HCV) is a significant public health threat worldwide. The virus is highly variable and evolves rapidly, making it an elusive target for the immune system and for vaccine and drug design. Presently, ∼50 000 HCV sequences have been published. A central website that provides annotated sequences and analysis tools will be helpful to HCV scientists worldwide. The HCV sequence database collects and annotates sequence data, and provides them to the public via a website that contains a user-friendly search interface and a large number of sequence analysis tools, following the model of the highly regarded and widely used Los Alamos HIV database. The HCV website can be accessed via http://hcv.lanl.gov and http://hcv-db.org.
The duration of treatment for HCV infection is partly indicated by the genotype of the virus. For studies of disease transmission, vaccine design, and surveillance for novel variants, subtype-level classification is also needed. This study used the Shimodaira-Hasegawa test and related statistical techniques to compare phylogenetic trees obtained from coding and non-coding regions of a whole-genome alignment for the reliability of subtyping in different regions.
Different regions of the HCV genome yield inconsistent phylogenies, which can lead to erroneous conclusions about classification of a given infection. In particular, the highly conserved 5' untranslated region (UTR) yields phylogenetic trees with topologies that differ from the HCV polyprotein and complete genome phylogenies. Phylogenetic trees from the NS5B gene reliably cluster related subtypes, and yield topologies consistent with those of the whole genome and polyprotein.
These results extend those from previous studies and indicate that, unlike the NS5B gene, the 5' UTR contains insufficient variation to resolve HCV classifications to the level of viral subtype, and fails to distinguish genotypes reliably. Use of the 5' UTR for clinical tests to characterize HCV infection should be replaced by a subtype-informative test.
Detecting recombinations in the genome sequence of human immunodeficiency virus type 1 (HIV-1) is crucial for epidemiological studies and for vaccine development. Herein, we present a web server for subtyping and localization of phylogenetic breakpoints in HIV-1. Our software is based on a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach proposed by Spang et al. The input data for our server is a partial or complete genome sequence from HIV-1; our tool assigns regions of the input sequence to known subtypes of HIV-1 and predicts phylogenetic breakpoints. jpHMM is available online at .
Jumping alignments have recently been proposed as a strategy to search a given multiple sequence alignment A against a database. Instead of comparing a database sequence S to the multiple alignment or profile as a whole, S is compared and aligned to individual sequences from A. Within this alignment, S can jump between different sequences from A, so different parts of S can be aligned to different sequences from the input multiple alignment. This approach is particularly useful for dealing with recombination events.
We developed a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach. Given a partition of the aligned input sequence family into known sequence subtypes, our model can jump between states corresponding to these different subtypes, depending on which subtype is locally most similar to a database sequence. Jumps between different subtypes are indicative of intersubtype recombinations. We applied our method to a large set of genome sequences from human immunodeficiency virus (HIV) and hepatitis C virus (HCV) as well as to simulated recombined genome sequences.
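The notion of jumping between subtype states can be illustrated with a heavily simplified dynamic program. This is not jpHMM itself, which is a profile HMM over a multiple alignment: the sketch assumes the query is already aligned column-for-column to the reference sequences, ignores gaps, scores each column by a pseudocount-smoothed log-frequency, and charges a fixed penalty for each jump. The names `emission` and `assign_subtypes`, the penalty value, and the 4-letter alphabet size are all illustrative assumptions.

```python
import math


def emission(query_char, column_chars, pseudo=1.0, alphabet=4):
    """Log-probability of the query character given one subtype's alignment column."""
    count = column_chars.count(query_char)
    return math.log((count + pseudo) / (len(column_chars) + pseudo * alphabet))


def assign_subtypes(query, subtypes, jump_penalty=5.0):
    """Viterbi-style assignment of each query position to a subtype.

    subtypes maps a subtype name to a list of reference sequences, all
    pre-aligned to the query (same length, no gaps; an assumption).
    Returns (path, breakpoints): the subtype per position and the
    positions where the assigned subtype changes.
    """
    names = list(subtypes)
    n = len(query)
    score = {s: emission(query[0], [seq[0] for seq in subtypes[s]]) for s in names}
    back = []
    for i in range(1, n):
        best_prev = max(names, key=lambda s: score[s])
        new_score, pointers = {}, {}
        for s in names:
            stay = score[s]
            jump = score[best_prev] - jump_penalty
            if stay >= jump:
                prev, base = s, stay
            else:
                prev, base = best_prev, jump
            column = [seq[i] for seq in subtypes[s]]
            new_score[s] = base + emission(query[i], column)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(names, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    breakpoints = [i for i in range(1, n) if path[i] != path[i - 1]]
    return path, breakpoints


# Toy example: a query that matches subtype "A" and then subtype "B".
refs = {"A": ["A" * 20] * 3, "B": ["C" * 20] * 3}
path, breaks = assign_subtypes("A" * 10 + "C" * 10, refs)
print(breaks)  # [10]
```

A larger jump penalty makes the path less willing to switch subtypes, so short stretches of similarity to another subtype are absorbed rather than reported as recombination breakpoints.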
Our results demonstrate that jumps in our jumping profile HMM often correspond to recombination breakpoints; our approach can therefore be used to detect recombinations in genomic sequences. The recombination breakpoints identified by jpHMM were found to be significantly more accurate than breakpoints defined by traditional methods based on comparing single representative sequences.
A chemokine receptor from the seven-transmembrane-domain G-protein-coupled receptor superfamily is an essential coreceptor for the cellular entry of human immunodeficiency virus type 1 (HIV-1) and simian immunodeficiency virus (SIV) strains. To investigate nonhuman primate CC-chemokine receptor 5 (CCR5) homologue structure and function, we amplified CCR5 DNA sequences from peripheral blood cells obtained from 24 representative species and subspecies of the primate suborders Prosimii (family Lemuridae) and Anthropoidea (families Cebidae, Callitrichidae, Cercopithecidae, Hylobatidae, and Pongidae) by PCR with primers flanking the coding region of the gene. Full-length CCR5 was inserted into pCDNA3.1, and multiple clones were sequenced to permit discrimination of both alleles. Compared to the human CCR5 sequence, the CCR5 sequences of the Lemuridae, Cebidae, and Cercopithecidae shared 87, 91 to 92, and 96 to 99% amino acid sequence homology, respectively. Amino acid substitutions tended to cluster in the amino and carboxy termini, the first transmembrane domain, and the second extracellular loop, with a pattern of species-specific changes that characterized CCR5 homologues from primates within a given family. At variance with humans, all primate species examined from the suborder Anthropoidea had amino acid substitutions at positions 13 (N to D) and 129 (V to I); the former change is critical for CD4-independent binding of SIV to CCR5. Within the Cebidae, Cercopithecidae, and Pongidae (including humans), CCR5 nucleotide similarities were 95.2 to 97.4, 98.0 to 99.5, and 98.3 to 99.3%, respectively. Despite this low genetic diversity, the phylogeny of the selected primate CCR5 homologue sequences agrees with present primate systematics, apart from some intermingling of species of the Cebidae and Cercopithecidae. 
Constructed HOS.CD4 cell lines expressing the entire CCR5 homologue protein from each of the Anthropoidea species and subspecies were tested for their ability to support HIV-1 and SIV entry and membrane fusion. Other than that of Cercopithecus pygerythrus, all CCR5 homologues tested were able to support both SIV and HIV-1 entry. Our results suggest that the shared structure and function of primate CCR5 homologue proteins would not impede the movement of primate immunodeficiency viruses between species.
Two novel simian immunodeficiency virus (SIV) strains from wild-caught red-capped mangabeys (Cercocebus torquatus torquatus) from Nigeria were characterized. Sequence analysis of the fully sequenced SIV strain rcmNG411 (SIVrcmNG411) and gag and pol sequence of SIVrcmNG409 revealed that they were genetically most closely related to the recently characterized SIVrcm from Gabon (SIVrcmGB1). Thus, red-capped mangabeys from distant geographic locations harbor a common lineage of SIV. SIVrcmNG411 carried a vpx gene in addition to vpr, suggesting a common evolutionary ancestor with SIVsm (from sooty mangabeys). However, SIVrcm was only marginally closer to SIVsm in that region than to any of the other lentiviruses. SIVrcm showed the highest similarity in pol with SIVdrl, isolated from a drill, a primate that is phylogenetically distinct from mangabey monkeys, and clustered with other primate lentiviruses (primarily SIVcpz [from chimpanzees] and SIVagmSab [from African green monkeys]) discordantly in different regions of the genome, suggesting a history of recombination. Despite the genetic relationship to SIVcpz in the pol gene, SIVrcmNG411 did not replicate in chimpanzee peripheral blood mononuclear cells (PBMC), although two other viruses unrelated to SIVcpz, SIVmndGB1 (from mandrills) and SIVlhoest (from L'Hoest monkeys), were able to grow in chimpanzee PBMC. The CCR5 24-bp deletion previously described in red-capped mangabeys from Gabon was also observed in Nigerian red-capped mangabeys, and SIVrcmNG411, like SIVrcmGB1, used CCR2B and STRL33 as coreceptors for virus entry. SIVrcm, SIVsm, SIVmndGB1, and all four SIVlhoest isolates but not SIVsun (from sun-tailed monkeys) replicated efficiently in human PBMC, suggesting that the ability to infect the human host can vary within one lineage.
To investigate the temporal relationship between human immunodeficiency virus type 1 (HIV-1) replicative capacity and syncytium-inducing (SI) phenotype, biological and genetic characteristics of longitudinally obtained virus clones from two HIV-1-infected individuals who developed SI variants were studied. In one individual, the emergence of rapidly replicating SI and non-syncytium-inducing (NSI) variants was accompanied by a loss of the slowly replicating NSI variants. In the other subject, NSI variants were always slowly replicating, while the coexisting SI variants showed an increase in the rate of replication. Irrespective of their replicative capacity, the NSI variants remained present throughout the infection in both individuals. Phylogenetic analysis of the V3 region showed early branching of the SI variants from the NSI tree. Successful SI conversion seemed a unique event since no SI variants were found among later-stage NSI variants. This was also confirmed by the increasing evolutionary distance between the two subpopulations. At any time point during the course of the infection, the variation within the coexisting SI and NSI populations did not exceed 2%, indicating continuous competition within each viral subpopulation.
We studied the temporal relationship between human immunodeficiency virus type 1 (HIV-1) quasispecies in tissues and in peripheral blood mononuclear cells (PBMC) of infected individuals. Sequential PBMC and tissue samples from various organs obtained at autopsy from three patients who died of AIDS-related complications were available for analysis. Biological HIV-1 clones were isolated from PBMC samples, and cellular tropism and syncytium-inducing (SI) capacity were determined. Genomic DNA was isolated from 1 cm³ of organ tissue, and proviral DNA was amplified by means of PCR and cloned with the PGEM-T vector system. A 185-bp region encompassing the third variable domain of the virus envelope, known to influence HIV-1 biological properties, was sequenced. HIV-1 could be amplified from all PBMC and organ samples, except from liver tissue for two patients. Both SI and non-syncytium-inducing (NSI) genotypes could be detected in the different tissues. Tissue-specific quasispecies were observed in brain, lung, and testis. Lymphoid tissues, such as bone marrow, lymph node, and spleen, harbored several different variants similar to those detected in blood in the last PBMC samples. In general, only tissues in which macrophages are likely to be the main target cell for HIV-1 harbored NSI HIV-1 sequences that clustered separately. Both SI and NSI sequences that clustered with sequences from late-stage PBMC were present in other tissues, which may indicate that the presence of HIV-1 in those tissues is secondary to lymphocyte infiltration rather than to tissue tropism of HIV-1 itself. These data suggest that the viral reservoir may be limited, which will have important implications for the success of HIV-1 eradication.
The 2005 consensus proposal for the classification of hepatitis C virus (HCV) presented an agreed and uniform nomenclature for HCV variants and the criteria for their assignment into genotypes and subtypes. Since its publication, the available dataset of HCV sequences has vastly expanded through advancement in nucleotide sequencing technologies and an increasing focus on the role of HCV genetic variation in disease and treatment outcomes. The current study represents a major update to the previous consensus HCV classification, incorporating additional sequence information derived from over 1,300 (near-)complete genome sequences of HCV available on public databases in May 2013. Analysis resolved several nomenclature conflicts between genotype designations and, using consensus criteria, created a classification of HCV into seven confirmed genotypes and 67 subtypes. There are 21 additional complete coding region sequences of unassigned subtype. The study additionally describes the development of a Web resource hosted by the International Committee for Taxonomy of Viruses (ICTV) that maintains and regularly updates tables of reference isolates, accession numbers, and annotated alignments (http://talk.ictvonline.org/links/hcv/hcv-classification.htm). The Flaviviridae Study Group urges those who need to check or propose new genotypes or subtypes of HCV to contact the Study Group in advance of publication to avoid nomenclature conflicts appearing in the literature. While the criteria for assigning genotypes and subtypes remain unchanged from previous consensus proposals, changes are proposed in the assignment of provisional subtypes, subtype numbering beyond “w,” and the nomenclature of intergenotypic recombinants. Conclusion: This study represents an important reference point for the consensus classification of HCV variants that will be of value to researchers working in clinical and basic science fields. (Hepatology 2014;59:318-327)