Hepatitis D virus (HDV) is a satellite of hepatitis B virus (HBV) for transmission and propagation and infects nearly 20 million people worldwide. The HDV genome is a compact circular single-stranded RNA genome with extensive intramolecular complementarity. Despite its different epidemiological and pathological patterns, the variability and geographical distribution of HDV are limited to three genotypes and two subtypes that have been characterized to date. Phylogenetic reconstructions based on the delta antigen gene and full-length genome sequence data show an extensive and probably ancient radiation of African lineages, suggesting that the genetic variability of HDV is much more complex than was previously thought, with evidence of additional clades. These results relate the geographic distribution of HDV more closely to the genetic variability of its helper HBV.
Hepatitis D virus (HDV) is a transmissible agent discovered 26 years ago (30) that requires helper functions from the hepatitis B virus (HBV) for virion assembly and propagation (37). Thus, HDV infection is necessarily associated with HBV infection because HDV ribonucleoprotein buds through the hepatitis B surface antigen (HBsAg) excretory pathway. The HDV genome is a circular single-stranded RNA genome of approximately 1,680 bases with extensive intramolecular complementarity (41). Part of the HDV genome might have historical homology to viroids or plant virus satellite RNA sequences (10, 15), and a rolling-circle model has been developed for viral RNA replication (reviewed in reference 39). However, in contrast to viroids, which do not code for any protein, the HDV antigenome contains an open reading frame that was probably acquired by HDV from a cellular ancestor transcript, leading to the expression of the delta protein (1, 20). Indeed, HDV mRNA is translated to sHD and LHD proteins, corresponding respectively to the “small-p24” and the “large-p27” hepatitis Delta proteins. The LHD amino acid sequence is identical to sHD with the addition of a carboxy-terminal extension of 19 to 20 amino acids following the editing of the sHD stop codon during the viral RNA replication cycle (23, 43). sHD is required for viral replication and might promote RNA polymerase II elongation of nascent HDV RNA (45), while LHD is essential for HDV particle assembly (5).
HDV-HBV coinfection and HDV superinfection of a patient chronically infected by HBV both lead to a liver disease more severe than that induced by HBV alone (2, 31). HDV is highly endemic in Mediterranean countries, the Middle East, Central Africa, and northern parts of South America. In contrast, in industrialized countries, its prevalence is low and its transmission is often associated with intravenous drug use. Intrafamilial transmission has been described in southern Italy, which has been considered an area of high endemicity (12, 28).
Although HDV was expected, like many RNA viruses, to exhibit considerable genetic variability, only three HDV genotypes have been characterized to date on the basis of a small number of complete genome sequences (3, 13, 41). Historically, the definition of “genotype” is based on the comparison of nucleotide similarity between pairs of sequences that have been discovered and characterized at the time. For HDV genomes that are known at present, the divergence in nucleotide sequence of the studied region is less than 14 to 15.7% among different isolates of the same genotype and ranges from 19 to 38% between sequences from different genotypes (3, 13, 34, 44). Genotype I includes the European, North American, African, and some Asian HDV isolates (6, 34, 41). Genotype II has been found in Japan, Taiwan, and Yakutia (Russia) (13, 14, 17, 44); some sequences from Taiwan and the Okinawa islands were tentatively assigned to a subtype of genotype II (i.e., genotype IIB) (33, 44). Genotype III has been found exclusively in South America (Peru, Colombia, and Venezuela) (3, 27).
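The similarity-based definition above can be illustrated with a minimal sketch. It assumes two sequences that are already aligned to the same length and computes the uncorrected percentage of differing sites, then compares that value with the ranges quoted in the text (below about 15.7% within a genotype, 19% or more between genotypes). The function names and the "indeterminate" label for values between the two ranges are illustrative assumptions, not part of the original study.

```python
def percent_divergence(seq_a, seq_b):
    """Percentage of differing sites between two pre-aligned sequences.

    Columns where either sequence carries a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * differing / compared


def compare_to_genotype_ranges(divergence, intra_max=15.7, inter_min=19.0):
    """Place a divergence value relative to the ranges quoted in the text.

    The cutoffs are the published ranges; treating them as hard
    thresholds is an assumption made only for this illustration.
    """
    if divergence < intra_max:
        return "within-genotype range"
    if divergence >= inter_min:
        return "between-genotype range"
    return "indeterminate"
```

A divergence of roughly 25%, for example, would fall in the between-genotype range under these assumed cutoffs.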
So far, studies of HDV genome variability have been performed in non-African countries except for the description of two sequences (assigned to genotype I) from Ethiopia and Somalia (46). By studying samples that have been prospectively tested for HDV replication, we present an extensive analysis of African HDV sequences. Besides genotype I-like sequences, our phylogenetic analyses indicate that approximately 70% of the characterized African isolates (mostly from West and Central Africa) form highly divergent groups, suggesting an ancient African radiation and extending the known HDV genetic variability to at least seven clades, thus bringing the variability of this satellite closer to that of human HBV.
From a cohort of 227 individuals whose serum samples were prospectively collected for HDV genome replication analysis between 1999 and 2002, we selected 25 patients whose preliminary examination suggested that the HDV strains varied from previously described HDV genotypes. Twenty-two samples were obtained from patients born in Africa or who had traveled to Africa. Table 1 indicates the gender, age, place of birth, and biological and liver histological results of the infected patients. The male-to-female ratio was 0.8, and the mean age was 35 years (range, 15 to 53 years). Most patients had chronic active hepatitis or cirrhosis, and only one patient, aged 33 years, had acute HDV superinfection. One patient was treated with alpha interferon at the time of sampling.
HDV RNA was extracted from 250 μl of serum, and region 6A-6S (237 nucleotides [nt]) was amplified as described in reference 7; primers 900s and 1280as encompass the R0 region (400 nt) covering the 3′ end of the HD gene (14). For all samples, two short reverse transcription-PCR-generated DNA fragments were directly sequenced.
Based on R0 sequence data, complete HDV cDNA was amplified using synthetic primers (Table 2). For four samples (dFr-45, dFr-47, dFr-48, and dFr-73), two overlapping cDNA fragments of 850 and 1,050 nt were used to obtain the entire HDV genome. Specific primers were also determined to amplify cDNA from samples dFr-644 and dFr-910. The PCR-amplified fragments were cloned into the TA Cloning pCRII vector (Invitrogen), and at least two clones were sequenced bidirectionally using the BigDye Terminator technology (ABI Prism 377; Perkin-Elmer Applied Biosystems).
Secondary-structure prediction was performed using complete antigenome sequences of HDV genotypes I (Italy-A20), IIA (Japan-S), IIB (Taiwan-TW-2b), and III (Peru-1) and African isolates (dFr-45, dFr-47, dFr-48, dFr-73, dFr-644, and dFr-910). The mfold program version 3.1, predicting possible secondary structures for RNA sequences, is available at http://bioinfo.math.rpi.edu/~mfold/rna. HD protein secondary-structure prediction was performed using the secondary-structure consensus from the network protein sequence interface available at the Pole Bio-Informatique Lyonnais (http://npsa-pbil.ibcp.fr).
Alignment of HDV R0 sequences was generated with ClustalW 1.8 using a gap-opening penalty (GOP) of 15 and a gap extension penalty (GEP) of 6.66 with minimum manual corrections. Because different values of GOP and GEP gave different results, full-genome alignments were performed using the SOAP program (22). This program generates and compares alignments corresponding to 30 different sets of alignment parameters (GEP from 12 to 17 in steps of 1; GOP from 6 to 8 in steps of 0.5). Positions with different alignments can be excluded, or a proportion of GOP and GEP combinations yielding the same alignment can be studied. For example, with full-length genomes (26 taxa, approximately 1,800 characters), we compared the results of three phylogenetic analyses corresponding to three different consensus alignments: strict (734 characters excluded), 80% (690 characters excluded), and 50% (559 characters excluded) consensus among the 30 different SOAP-generated alignments. We also used ProAlign (21), a program that provides a statistical approach to multiple-sequence alignment, such that a posterior probability is assigned to each aligned position. Positions with a posterior probability below a user-defined threshold can be excluded before phylogeny inference is made.
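The idea of keeping only positions that are aligned identically across parameter sets can be sketched as follows. This is a simplified illustration and not SOAP's actual implementation: each alignment column is reduced to a signature (the index of the residue contributed by each sequence, or None for a gap), and a column of a reference alignment is kept when that signature appears in at least a chosen fraction of the alternative alignments. The parameter grid reproduces the two ranges quoted in the text; all function and variable names are hypothetical.

```python
from itertools import product

# Parameter ranges as quoted in the text (6 x 5 = 30 combinations).
GEP_VALUES = [12, 13, 14, 15, 16, 17]
GOP_VALUES = [6 + 0.5 * k for k in range(5)]
PARAMETER_GRID = list(product(GOP_VALUES, GEP_VALUES))


def column_signatures(alignment):
    """Describe each column by the residue index each sequence contributes.

    `alignment` is a list of equal-length gapped strings. A gap ('-')
    contributes None to the signature of its column.
    """
    counters = [0] * len(alignment)
    signatures = []
    for column in zip(*alignment):
        signature = []
        for i, char in enumerate(column):
            if char == "-":
                signature.append(None)
            else:
                signature.append(counters[i])
                counters[i] += 1
        signatures.append(tuple(signature))
    return signatures


def stable_columns(reference, alignments, min_support=1.0):
    """Indices of reference columns found in >= min_support of alignments.

    min_support=1.0 mimics a strict consensus; 0.8 and 0.5 mimic the
    80% and 50% consensus levels mentioned in the text.
    """
    signature_sets = [set(column_signatures(a)) for a in alignments]
    kept = []
    for index, signature in enumerate(column_signatures(reference)):
        hits = sum(signature in s for s in signature_sets)
        if hits / len(signature_sets) >= min_support:
            kept.append(index)
    return kept
```

In practice the `alignments` list would hold the 30 alignments produced with the parameter combinations in `PARAMETER_GRID`; the alignment step itself is left to the external aligner and is not reproduced here.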
Maximum-parsimony (MP; using heuristic for the R0 region or branch-and-bound for the full genome and sHD searches) and neighbor-joining (NJ; maximum-likelihood [ML] distances) phylogenetic analyses were performed with PAUP*4.0b6 (38). ML analyses were carried out both with PAUP*4.0b6 (full-length and sHD data sets) and the metapopulation genetic algorithm (metaGA) (18) (all data sets) implemented in the program MetaPIGA (http://www.ulb.ac.be/sciences/ueg). MetaGA ML analyses were performed with the following settings: Hasegawa-Kishino-Yano (HKY) model, taking into account the proportion of invariable sites and rate heterogeneity (four categories), four populations of four individuals each, and probability consensus pruning. Bootstrap analyses (1,000 and 10,000 replicates for MP and NJ, respectively) and 10,000 metaGA samples (2,500 replicates with four populations, ML analyses) were used to assess the robustness and posterior probabilities of nodes, respectively. Due to computational limitations, ML inferences using PAUP*4.0b6 were performed only on taxon subsets (24 full-genome sequences and 33 sHD sequences). We used the HKY model and estimated the transition/transversion ratio, the proportion of invariable sites, and rate heterogeneity (gamma distribution with four categories) parameters from the data. Bootstrap analyses were limited to 100 replicates with the values of ML parameters constrained to those estimated from the original ML search.
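The bootstrap step can be illustrated with a short sketch of nonparametric column resampling. It is not the PAUP* or MetaPIGA procedure: tree inference is replaced by a hypothetical callback, `supports_group`, that would in a real analysis rebuild a tree from each pseudo-replicate and report whether a clade of interest is recovered. Here a toy callback based on Hamming distances stands in for that step.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (same length as input)."""
    n_columns = len(alignment[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_support(alignment, supports_group, replicates=1000, seed=0):
    """Fraction of pseudo-replicates in which `supports_group` returns True."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        if supports_group(bootstrap_replicate(alignment, rng)):
            hits += 1
    return hits / replicates


def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def first_two_group_together(alignment):
    """Toy stand-in for tree inference: is sequence 0 closer to 1 than to 2?"""
    return hamming(alignment[0], alignment[1]) < hamming(alignment[0], alignment[2])
```

With `replicates=1000` the returned fraction plays the role of a bootstrap proportion for the toy grouping; the replicate counts used in the study depended on the method, as stated above.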
The accession numbers of the sequences used were as follows: Afghanistan, AF008373; Albania-02, AF008309; Archangelos, AF008333; Cagliari, X85253; China, X77627; Colombia, L22061; Egypt, AF008375; Ethiopia, U81989; French-02, AF008372; Greek-19, AF008320; Italy-A20, X04451; Italy-35, AF008420; Japan-S, X60193; Lebanon, M84917; Miyako, AF309420; Nauru, M58629; Okinawa (Ok) 1-18, AB015442; Ok2-05, AB015443; Ok3-25, AB015444; Ok4-15, AB015445; Ok5-01, AB015446; Ok6-21, AB015447; Peru-1, L22063; Peru-2, L22064; Romania, AF008319; Russia, AF008374; Somalia, U81988; Taiwan, M92448; Taiwan-3, U19598; Taiwan-TW-2b, AF018077; TW2476, AF104264; Turkish-01, AF008347; US-1, D01075; US-23, AF008371; VnzD8349, AB037948; VnzD8375, AB037947; VnzD8624, AB037949; Woodchuck 5 (W5), AJ307077; Woodchuck 15 (W15), M21012; Yakut (Ya)-8, AJ309871; Ya-12, AJ309872; Ya-13, AJ309868; Yakut-26, AJ309879; Ya-29, AJ309869; Ya-30, AJ309873; Ya-51, AJ309876; Yakut-62, AJ30880; Ya-63, AJ309875; Ya-245, AJ309874; Ya-704, AJ309877; Ya-724, AJ309878; dFr-1594 (Angola), AJ583884; dFr-71 (Central African Republic), AJ583879; dFr-45 (Cameroon), AJ583868; dFr-48 (Cameroon), AJ583871; dFr-55 (Cameroon), AJ583872; dFr-56 (Cameroon), AJ583873; dFr-1953 (Cameroon), AJ583886; dFr-2066 (Cameroon), AJ583888; dFr-644 (Congo), AJ583882; dFr-74 (Democratic Republic of Congo), AJ583881; dFr-59 (Egypt), AJ583874; dFr-70 (Egypt), AJ583878; dFr-46 (France), AJ583869; dFr-1843 (Gabon), AJ583885; dFr-69 (Gambia), AJ583877; dFr-4 (Ghana), AJ583867; dFr-47 (Guinea), AJ583870; dFr-2317 (Guinea), AJ583891; dFr-73 (Ivory Coast), AJ583880; dFr-2020 (Ivory Coast), AJ583887; dFr-2204 (Ivory Coast), AJ583889; dFr-2301 (Ivory Coast), AJ583890; dFr-910 (Mali), AJ583883; dFr-60 (Romania), AJ583875; dFr-65 (Romania), AJ583876.
We used a 336-nt HDV cDNA fragment (here called R0), encompassing the 3′ end of the HD gene, to characterize a large number of clinical samples. From 227 patients monitored for HDV replication analyses between 1999 and 2001, we selected 25 samples (i) whose HDV cDNA could not be amplified by using previously described 6A and 6S primers although their HDV serology was positive (Table 1) or (ii) whose R0 DNA amplicon restriction pattern was atypical (data not shown). Of the 25 HDV-infected patients, 22 were later found to originate from African countries (Table 1). The 25 new R0 sequences were aligned with 16 sequences from Russia (13) and 36 R0 sequences from public databases. Analyses using MP, NJ, and ML methods yielded results compatible with the phylogenetic tree shown in Fig. 1a. Of the 22 newly characterized African sequences, 15 form at least three lineages (red box in Fig. 1) spanning a range of variability much larger than for type I or type III. The other seven African sequences are scattered within the type I clade.
To test the validity of our results, we sequenced the full genome of isolates (Fig. 1b) spanning the range of the newly characterized clades (accession numbers AJ584844 to AJ584849). Alignments among these sequences showed approximately 25% divergence (Table 3), similar to that observed between type I and type II sequences, suggesting the existence of additional types. To avoid artifacts due to alignment ambiguities, we used the SOAP and ProAlign programs, the former to produce and compare 30 alignments (each corresponding to 1 of 30 different sets of alignment parameters) and the latter to provide statistical alignments so that each column has a posterior probability of being correctly aligned. The phylogenetic analyses were performed again after excluding positions (i) that are different among the 30 SOAP alignments or (ii) that have a posterior probability of <90% in the ProAlign alignments. These analyses yielded results (Fig. 1b) similar to those described above (Fig. 1a): the genetic variability of the HDV genus has more major monophyletic groups (i.e., clades) than previously thought.
Finally, because there is functional evidence that the sHD protein trans-complements the corresponding HDV type more efficiently (4, 19), we phylogenetically analyzed the sHD coding sequences from 33 isolates including newly characterized strains from the island of Okinawa (24) and Taiwan (42) that had been tentatively identified as a genotype IIB and eight sHD gene sequences from African strains. The phylogenetic results obtained again confirmed that the African HDV clades account for a large proportion of HDV worldwide variability (Fig. 2). These analyses, including the African HDV sequences, also confirm that sequences from the “genotype IIB” complex form a distinct clade (proposed as HDV-4 in Fig. 2) which is not directly related to the genotype IIA sequences (cf. HDV-2 in Fig. 2). In all, at least seven major clades, including the well-defined genotype I (HDV-1 in Fig. 2) and genotype III (HDV-3 in Fig. 2), were identified; three of them (HDV-5, HDV-6, and HDV-7) correspond exclusively to African HDV sequences (Fig. 2).
All the newly characterized complete HDV sequences exhibited the two expected overlapping open reading frames (sHD and LHD), and most of the conserved motifs were located in the central and carboxy-terminal regions of the sHD (Fig. 3). Since most patients studied had a clear antibody response to type I HD recombinant antigen (Table 2), we might expect immunogenic epitopes to correspond to these conserved regions. Analysis of the carboxy-terminal part of the LHD proteins reveals a highly variable proline-rich domain except for three conserved residues (Fig. 3).
Predictions of HDV-RNA antigenomic secondary structure indicate that the characterized African isolates exhibit slightly different patterns in the vicinity of the RNA-editing site (at the amber/tryptophan codon) (Fig. 4). Each prototype sequence from each clade had a specific antigenomic RNA pseudo-double-strand secondary structure. Alternative branched secondary structures were also observed for some sequences (data not shown).
Due to historical geopolitical factors, and similar to hepatitis C virus (HCV) (26) and HBV, the delta viruses characterized in Paris (France) showed a wide African distribution (Table 1). For example, the six full-length viral RNA sequences were obtained from five patients from Western or Central sub-Saharan African countries (Cameroon, Guinea, Ivory Coast, Mali, and Republic of Congo) and from an adult Polish woman who had lived in Cameroon for 3 years. To determine whether the geographic distribution of the HDV isolates was correlated with their levels of sequence divergence, we compared the Kimura two-parameter pairwise distances between full-genome sequences with the relative geographic distance matrix of the capitals of the countries where the patients had been infected. Because both variables are quantitative and continuous, we calculated the correlation coefficient to evaluate the degree of proportionality between the two data sets. When the ubiquitous type I sequences were removed, a significant statistical correlation was observed between the two matrices (Pearson correlation coefficient r = 0.791, P < 0.0001). Given r = 0.791, we can estimate that in this data set the Kimura two-parameter genetic distances and the corresponding relative geographic distances were directly proportional, with a minimal risk of error (P < 0.0001).
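The comparison of the two distance matrices can be sketched as follows, under stated assumptions. The Kimura two-parameter distance is computed from the proportions of transitions (P) and transversions (Q) between two aligned sequences as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Pearson coefficient is then taken over the upper triangles of the genetic and geographic matrices. The function names are illustrative, the geographic matrix is assumed to be supplied by the user, and the significance test reported in the text is not reproduced here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two pre-aligned DNA sequences.

    Columns containing anything other than A, C, G, or T are skipped.
    math.log raises ValueError if the sequences are too divergent.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson_r(upper_triangle(genetic), upper_triangle(geographic))` on two symmetric matrices built over the same isolates, in the same order, gives the kind of coefficient reported above.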
The extensive genetic variability of HBV-related viruses suggests that hepadnaviruses have been infecting humans and other primates for a long time (11, 16, 25). HBV is endemic in sub-Saharan Africa, and chronically infected mothers transmit the virus to their children at birth, although intrafamilial spreading can occur later in childhood (9). The age of the HDV-HBV association needs to be clarified. Clearly, the classification of HDV into only three “genotypes” does not reflect the actual range of variability of the Deltavirus genus. Indeed, using the delta antigen gene and full genome sequence data, we identified a wide and probably ancient radiation of African lineages (as suggested by several clades branched inside the deepest part of the unrooted tree [Fig. 2]), making the genetic variability of HDV much wider than previously thought, although the South American sequences (HDV clade 3) remain the most divergent group. Furthermore, strains Taiwan-TW-2b, Miyako, L215, and AF209859 should be considered a specific clade, distinct from those including genotypes I, II, and III. We suggest that the sequence TW-2b (Fig. 1 and 2) be considered the clade 4 prototype due to the early date of the study by Wu et al. (44). Finally, the additional African lineages described here suggest that there are at least three new clades. In our geographic area (near Paris, France), HDV African clades 5, 6, and 7 represent 10.2, 2, and 2% of the HDV sequences characterized in 2002, respectively (E. Gordien, unpublished data). This would mean that the Deltavirus genus includes at least seven clades, which, interestingly, is very similar to the human HBV genetic variability (which includes six distinct genotypes [A to F] [reviewed in reference 32] plus the recently characterized HBV genotype G).
The non-type-I HDVs from patients infected in Cameroon (dFr-45, dFr-48, and dFr-1953 [Fig. 1]) are good representatives of the newly characterized African clades. Furthermore, HDV type 1 sequences have also been found in patients originating from Cameroon (dFr-55 and dFr-56 [Fig. 1]). A similar genetic diversity has been observed in other RNA viruses in this restricted area, e.g., the highly divergent human immunodeficiency virus type 1 group O, initially described in Cameroon (40). The general phylogeographic pattern revealed by our analyses suggests that HDV sequences constitute, besides some DNA virus genomes (36), efficient markers of human migrations. However, horizontal acquisitions of strains, rather than strict vertical familial transmission, can also occur. This is shown by the observation that the HDV type I sequences (dFr-55 and dFr-56) were obtained from two girls who are the daughters of a woman from whom sequence dFr-45 was characterized (arrows in Fig. 1). Since a wide range of HDV genetic variability is suspected, specific diagnostic tools (conserved versus clade-specific PCR primers) should be used.
Finally, the wide radiation reported here might explain the spectrum of disorders associated with HDV. For example, specific liver lesions, including morula cells, have been observed in African and Amazonian patients with severe hepatitis (3, 29). All African samples studied here came from screening of HDV replication markers in patients with liver disease who were immigrating to France. Most patients suffered from active chronic hepatitis or cirrhosis, and two of them underwent liver transplantation. Although the deltaviruses corresponding to the African radiation in this study are associated with severe liver-specific HDV histological lesions, it should be emphasized that these virus lineages might not necessarily be as pathogenic in the general population. Obviously, other factors such as the time and duration of infection, the genetic background of the patient (8), and the HBV helper strain may contribute to the pathogenicity.
We are particularly grateful for the help given by all physicians in collecting clinical data from patients. We thank Camille Sureau and Patrick Mardulyn for helpful comments on an earlier version of the manuscript.
N.R. was partly supported by the “Association Nationale de Recherches sur le Sida (ANRS)”, V.I. was supported by a “bonus qualité recherche” from “Université Paris 13”, and M.C.M. is supported by the National Fund for Scientific Research Belgium (FNRS) and the “Communauté Française de Belgique.” This work is part of the program of the “Laboratoire Associé au Centre National de Référence des Hépatites B et C pour le virus delta” (E.G. and P.D.) supported by the French Ministry of Health.