J Virol. 2012 February; 86(4): 2212–2220.
PMCID: PMC3302385

Origin and Evolution of the Unique Hepatitis C Virus Circulating Recombinant Form 2k/1b


Since its initial identification in St. Petersburg, Russia, the recombinant hepatitis C virus (HCV) 2k/1b has been isolated from several countries throughout Eurasia. The 2k/1b strain is the only recombinant HCV to have spread widely, raising questions about the epidemiological background in which it first appeared. In order to further understand the circumstances by which HCV recombinants might be formed and spread, we estimated the date of the recombination event that generated the 2k/1b strain using a Bayesian phylogenetic approach. Our study incorporates newly isolated 2k/1b strains from Amsterdam, The Netherlands, and has employed a hierarchical Bayesian framework to combine information from different genomic regions. We estimate that 2k/1b originated sometime between 1923 and 1956, substantially before the first detection of the strain in 1999. The timescale and the geographic spread of 2k/1b suggest that it originated in the former Soviet Union at about the time that the world's first centralized national blood transfusion and storage service was being established. We also reconstructed the epidemic history of 2k/1b using coalescent theory-based methods, matching patterns previously reported for other epidemic HCV subtypes. This study demonstrates the practicality of jointly estimating dates of recombination from flanking regions of the breakpoint and further illustrates that rare genetic-exchange events can be particularly informative about the underlying epidemiological processes.


Hepatitis C virus (HCV) infection presents a major global health burden, with the WHO estimating that 170 million chronic carriers are at risk of developing severe clinical outcomes such as cirrhosis and hepatocellular carcinoma (56, 71). The virus belongs to the single-stranded positive-sense RNA virus family Flaviviridae and is characterized by considerable genetic diversity. HCV diversity is classified into six main genotypes (genotypes 1 to 6), each of which is further divided into numerous subtypes, and the virus exhibits nucleotide sequence divergences of 30 and 20% at the genotype and subtype levels, respectively (58). The high genomic heterogeneity of HCV is a result of both its high rate of evolution and its long-term association with human populations (60). Although there is no indication of a zoonotic virus reservoir, a related virus has recently been discovered in dogs (22).

The greatest diversity of HCV is found in West and Central Africa and in Southeast Asia, where the virus appears to have persisted endemically for at least several centuries (49, 60). The current distribution of HCV genotypes and subtypes is geographically structured, reflecting differences in the rates and routes of transmission of the various subtypes and genotypes. Epidemic strains, exemplified by subtypes 1a, 1b, and 3a, are characterized by high prevalence, low genetic diversity, and a global distribution and are typically associated with transmission via infected blood products and injecting drug use (IDU) during the 20th century (13, 44–46, 54, 57). In contrast, endemic strains are more spatially restricted but harbor greater genetic diversity than epidemic strains, and it is currently thought that this endemic diversity provided the source of the epidemic strains that constitute the majority of HCV infections worldwide (47, 60).

Recombination is thought to play a comparatively minor role in shaping the genetic diversity of HCV; however, an increasing number of reports suggests that it is not entirely insignificant in HCV evolution. Most notable of these was the initial discovery of a natural recombinant form of HCV circulating in injecting drug users resident in St. Petersburg, Russia (20). This recombinant, labeled 2k/1b, has a 5′ genome region that is most closely related to subtype 2k and a 3′ genome region that is most closely related to the global epidemic subtype 1b, with a single recombination breakpoint located at genomic position 3175 or 3176 in the NS2 gene (20). Since the discovery of 2k/1b, several other studies have reported both inter- and intragenotypic HCV recombinants in natural populations, although the evidence presented for recombination varies in strength; the weakest studies report only discordant genotyping results between genome regions (which could also result from coinfection), whereas the most convincing studies repeatedly sequence the same recombination breakpoint from independent extractions (thereby excluding the possibility of in vitro genetic exchange). Thus far there have been nine descriptions of HCV recombinant forms, although only in six cases have the breakpoints been sequenced (6–8, 19, 28, 29, 42).

Inspection of the recombination breakpoint positions within the HCV genome reveals a difference between inter- and intragenotypic recombinants. Breakpoints in the intragenotypic recombinants (1a/1c and 1b/1a) are located in the E1/E2 region, while in the intergenotypic recombinants (including 2k/1b), the breakpoints are consistently found in the NS2-NS3 region (8, 19, 28, 29, 39, 42). Interestingly, naturally occurring intergenotypic HCV recombinants have more often than not involved genotype 2 in the 5′ genome region (19, 20, 28, 29, 42). This may reflect some inherent yet unknown biological or ecological properties of this genotype to produce viable recombinant viruses.

Nevertheless, the low rate of discovery of novel recombinant forms suggests that although recombination does occur in HCV, it is an uncommon event, at least in comparison to HIV-1. Indeed, the 2k/1b strain discovered in St. Petersburg is the only known circulating recombinant form (CRF) of HCV and has therefore been designated CRF01_1b2k, following a naming scheme similar to that developed for HIV-1 recombinants (23). A CRF is a recombinant form that is found repeatedly in different patients. Since its discovery, CRF01_1b2k has been isolated from patients in many countries, including Ireland, France, Cyprus, Azerbaijan, Uzbekistan, and Russia (20, 25, 26, 29, 37–39). CRF01_1b2k is the only recombinant strain of HCV to have transmitted widely; therefore, it is important to investigate its genesis and dissemination in order to understand why it might be unique and to evaluate the likelihood that other HCV recombinant forms could increase in prevalence in the future.

Evolutionary analysis of viral genomes using methods based on molecular clocks and coalescent theory has previously proved useful in reconstructing the epidemic history of various HCV strains, including subtypes 1a and 1b worldwide (33, 46) and subtype 1b in Japan (66). Similar analyses of HCV genotype 4 in Egypt have estimated the timescale of the large HCV epidemic in that country and have confirmed its iatrogenic cause (48, 65). Very little is known about the evolutionary history of HCV subtype 2k, most likely because of the lack of sequence data for the strain.

In order to address the lack of information about the epidemiological and transmission history of HCV CRF01_1b2k, we have conducted a comprehensive evolutionary analysis of all available viral genome sequences using a well-established Bayesian framework (11). Since previously published sequence data on CRF01_1b2k are limited, we sought to increase the sample size by isolating and sequencing a panel of new recombinant isolates from patients of Russian origin resident in Amsterdam, The Netherlands.

By combining information from different genomic regions in a single analysis, we provide the first estimate of the date of the recombination event that generated CRF01_1b2k. The date that we obtained considerably predates the discovery of the strain and requires a reevaluation of the circumstances surrounding its origin. This is the first time that a recombination event has been dated for any virus other than HIV-1 (32, 51, 67) or influenza A virus (27). Further, we estimate the CRF's past rate of transmission and its pattern of global geographic spread. To obtain more precise parameter estimates when dating recombination events, we employed a joint phylogenetic approach that improves on methods previously applied to HIV-1 CRFs (51, 67). The methods introduced here should serve as a model for future phylogenetic investigations of genetic-exchange events in RNA virus populations.

MATERIALS AND METHODS


Identification and sequencing of new HCV 2k/1b isolates from Amsterdam.

In the course of a study of HCV-infected patients resident in Amsterdam (unpublished data), it was found that HCV genotyping results from the 5′ untranslated and NS5B regions were discordant for 6 (out of 200) patients. Of these, five were male and one was female, and their mean age was 34 years (Table 1). For this study, the 5′-end sequences were not used.

Table 1
Epidemiological and sequence information of the CRF01_1b2k isolates used in this study

HCV RNA was isolated from 200 μl plasma using the purification method described by Boom et al. (4). cDNA was generated using random hexamer primers as described before (4). The amplification was performed using a conventional PCR with the following cycling conditions: 2 min at 50°C and 10 min at 95°C, followed by 45 cycles, each consisting of 30 s at 95°C, 30 s at 55°C, and 1 min at 72°C. Amplicons were purified from a 1% agarose gel as described by Boom et al. (4). Amplification of a 724-nucleotide (nt) fragment (unpublished data) of the NS5B region was performed as previously described (40). Amplification of core/E1 was performed in a 25-μl volume using HCV1b/2 F (GCGTGAGRGTCCTGGAG) as the forward primer and HCV1b/2 R (TGCCARCARTANGGCYTCAT) (positions 292 to 312; all primer locations are indicated relative to the H77 reference genome) as the reverse primer, using the same amplification conditions mentioned above. Amplicons were also purified from a 1% agarose gel as described by Boom et al. (4). Sequencing was performed using HCV1b/2F seq (CTTCYTACTAGCTCTYTTGTCTT; positions 128 to 112) as the forward sequencing primer and HCV1b/2R seq (TGCCAACTGCCRTTGGTGT; positions 2386 to 2410) as the reverse sequencing primer.
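Several of the primers listed above contain IUPAC ambiguity codes (R, Y, N), so each is in practice a pool of distinct oligonucleotides. The following Python sketch is not part of the original protocol; it only illustrates how the size of such a pool can be counted and how its members can be enumerated. The function names are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every unambiguous sequence represented by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Core/E1 amplification primers as given in the text.
forward = "GCGTGAGRGTCCTGGAG"
reverse = "TGCCARCARTANGGCYTCAT"

print(forward, degeneracy(forward))
print(reverse, degeneracy(reverse))
```

The reverse primer carries two R positions, one N, and one Y, which is why its pool is considerably larger than that of the forward primer.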

To confirm that the viral variants were recombinants, a 234-nt fragment harboring the known breakpoint in NS2 was also amplified and sequenced. Amplification and sequencing of the NS2 breakpoint were performed using HCV2K_F (GCACGCCATACTTCGTCAGAG) as the forward primer and HCV1B_R (CAGGTAATGATCTTGGTCTCCATGT) as the reverse primer, also using the same cycling conditions mentioned above.

In addition to new CRF01_1b2k sequences from Amsterdam, 6 recombinant isolates from an unpublished study were also included. These sequences were sampled from IDUs in Azerbaijan (Table 1). Most of the remaining CRF01_1b2k isolates came from a study of IDUs in Uzbekistan (25) and from a cohort study of HCV-positive patients from seven countries (26); detailed demographic information on isolates from these two studies is provided here (Table 1). In the latter study, all individuals infected with CRF01_1b2k came from either Russia or Uzbekistan, and 6 of these patients were linked to high-risk groups, namely, blood transfusion recipients or intravenous drug users.

Collation of HCV sequence alignments.

To investigate the evolutionary origin and spread of HCV CRF01_1b2k, a data set comprising all subtype 2k/1b (n = 27), 2k (n = 15), and 1b (n = 71) isolates for which both core/E1 and NS5B sequences were available was collated from GenBank and the HCV Sequence Database (24) (see Table S1 in the supplemental material). This collection included the 6 newly sequenced isolates obtained from HCV patients in Amsterdam (see above). Alignments for both genome regions were constructed manually, and each alignment contained exactly the same set of taxa. The date and sampling location of each sequence were obtained from the literature or via personal communication (Table 1).

Estimation of genome region-specific rates of evolution.

The evolutionary rates of the core/E1 and NS5B regions used in this study could not be estimated directly from our data, because the sample size was too small and the range of sample dates too narrow. In line with previous studies of HCV epidemic history (e.g., see reference 48), we employed an independent data set with significant temporal information to provide the substitution rates of the subgenomic regions of interest. Specifically, we used the data and analysis strategy of a recent study that reported rates of evolution for all HCV genome regions for subtypes 1a and 1b (14). That study utilized an alignment-partition approach, which was implemented in the BEAST program, to estimate region-specific rates (11). We applied a codon-structured nucleotide substitution model (55), an uncorrelated relaxed lognormal molecular clock (10), and a Bayesian skyline coalescent model (12) to both subtype 1a and 1b whole-genome alignments, from which we obtained rate estimates for the precise subgenomic regions (core/E1 and NS5B) used in our analysis of CRF01_1b2k. The Markov chain Monte Carlo (MCMC) chains were run for 200 million generations and sampled regularly to yield a posterior tree distribution based upon 10,000 estimates. For further analysis details, see reference 14.
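The chain length and sample count quoted above imply a thinning interval. The sketch below works through that arithmetic; the text does not state whether a burn-in was discarded, so the `burn_in_steps` argument and the 10% figure in the second call are assumptions used only for illustration.

```python
def sampling_interval(chain_length, n_samples, burn_in_steps=0):
    """Steps between retained samples so that n_samples remain after burn-in."""
    return (chain_length - burn_in_steps) // n_samples


# 200 million generations thinned to 10,000 samples, no burn-in removed.
print(sampling_interval(200_000_000, 10_000))

# Same chain if the first 10% were discarded as burn-in (hypothetical).
print(sampling_interval(200_000_000, 10_000, burn_in_steps=20_000_000))
```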

Phylogenetic analysis.

Preliminary phylogenetic analyses were undertaken to confirm that CRF01_1b2k originated from a single recombination event. Neighbor-joining (NJ) trees of the core/E1 and NS5B data sets were estimated using the PAUP* program (64) with an HKY85 nucleotide substitution model and gamma-distributed among-site rate heterogeneity (data not shown).

Next, in order to directly test the hypothesis of a single recombinant origin, we performed Bayesian MCMC analysis of the core/E1 and NS5B data sets in two ways: (i) with the CRF01_1b2k isolates constrained to form a monophyletic clade and (ii) with no phylogenetic constraints imposed. The hypothesis of a single origin was then tested by performing a Bayes factor (BF) comparison of the marginal likelihoods (41, 63) obtained from these two analyses. The comparison revealed no significant difference between the two competing hypotheses; a single recombination event was therefore assumed in all subsequent analyses.
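A Bayes factor comparison of this kind reduces to a difference of log marginal likelihoods. The sketch below assumes natural-log marginal likelihoods and uses the verbal scale of Kass and Raftery (1995), which is a widely used convention and not necessarily the one applied by the authors. The marginal likelihood values are hypothetical, chosen only so that the difference resembles the magnitudes reported for this study.

```python
def log_bayes_factor(log_ml_model_1, log_ml_model_0):
    """ln BF of model 1 over model 0 from their log marginal likelihoods."""
    return log_ml_model_1 - log_ml_model_0


def kass_raftery_category(ln_bf):
    """Verbal evidence category based on 2 * |ln BF| (Kass and Raftery, 1995)."""
    strength = 2.0 * abs(ln_bf)
    if strength < 2.0:
        return "not worth more than a bare mention"
    if strength < 6.0:
        return "positive"
    if strength < 10.0:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods, for illustration only.
log_ml_monophyly = -5000.00      # CRF isolates constrained to be monophyletic
log_ml_unconstrained = -5000.31  # no topological constraint

ln_bf = log_bayes_factor(log_ml_monophyly, log_ml_unconstrained)
print(round(ln_bf, 2), kass_raftery_category(ln_bf))
```

On this scale, a log Bayes factor well below 1 in absolute value indicates that neither model is meaningfully preferred, which is the situation described above.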

Molecular clock analysis.

In order to estimate the date of the recombination event that formed CRF01_1b2k, we created separate data sets for the core/E1 and NS5B regions that contained the CRF isolates, plus all available closely related parental subtype reference sequences (belonging to subtype 2k for the core/E1 region and to subtype 1b for the NS5B region). A hierarchical phylogenetic model (62) was used to combine both data sets and thereby provide a joint estimate of the time to the most recent common ancestor (TMRCA) of the CRF clade, while accounting for uncertainty in both genome regions.

As the CRF clade in the different genome regions is known to represent a common evolutionary history, jointly estimating the age of this clade will maximize the explanatory power of the data (62) and is thus more powerful than analyzing the regions independently, as has been done previously (e.g., see reference 67). However, because the two genome regions derive from different parental subtypes, we cannot simply assume that a single tree represents the entire genome. Instead, we allow an independent tree for each genomic region but constrain the TMRCAs of the CRF clade in the two regions to lie within a short time of each other, and we estimate their mean as the parameter of interest.
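One way to write down the constraint described above is given here as a sketch only. The text does not state the exact functional form, so the normal distribution and the symbols $\mu$ and $\sigma$ are assumptions made for illustration.

```latex
t_{\mathrm{core/E1}} \sim \mathcal{N}(\mu, \sigma^{2}), \qquad
t_{\mathrm{NS5B}} \sim \mathcal{N}(\mu, \sigma^{2}), \qquad
\sigma \ \text{small}
```

Here $t_{\mathrm{core/E1}}$ and $t_{\mathrm{NS5B}}$ are the TMRCAs of the CRF clade in the two region-specific trees, and $\mu$ is the shared (joint) TMRCA that serves as the parameter of interest. A small $\sigma$ keeps the two region-specific dates close together without forcing the two trees to be identical.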

Specifically, separate phylogenies, molecular clock models, and substitution models were estimated for the core/E1 and NS5B regions. The genome region-specific rates (estimated as described above) were used as prior distributions for the evolutionary rates for the core/E1 and NS5B regions. For the NS5B region, the rates estimated from subtype 1b were applied, while for the core/E1 region, the average of the 1a and 1b rates was used (because a subtype 2k-specific rate was not available).

For each pair of sampled phylogenies (core/E1 and NS5B) in the posterior distribution of the MCMC, three node dates were obtained (as labeled in Fig. 2): A, the joint TMRCA of the CRF clade; B, the date of the parental node of the CRF clade in the core/E1 subtype 2k phylogeny; and C, the date of the parental node of the CRF clade in the NS5B subtype 1b phylogeny. Date A, together with the more recent of dates B and C, therefore defines a time range during which the recombination event must have occurred (see Fig. 2). The posterior distribution of this time range was then compiled by repeating this procedure for each pair of phylogenies in the MCMC output. The BEAST analysis model settings were the same as those outlined in the section on genome region-specific rates above.
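The per-sample procedure described above can be sketched in a few lines of Python. This is an illustrative reading of the text and not the authors' script: node dates are taken to be calendar years, the sample values are hypothetical, and the final summary (a low quantile of the window starts and a high quantile of the window ends) is only one of several plausible ways to compile the posterior time range.

```python
def recombination_window(node_a, node_b, node_c):
    """Return (earliest, latest) calendar-year bounds for one posterior sample.

    node_a: joint TMRCA of the CRF clade.
    node_b, node_c: parental node dates in the core/E1 and NS5B trees.
    """
    parental = max(node_b, node_c)  # larger calendar year = more recent node
    return parental, node_a


def quantile(sorted_values, q):
    """Simple rounded-index quantile of an already sorted list (0 <= q <= 1)."""
    index = round(q * (len(sorted_values) - 1))
    index = min(len(sorted_values) - 1, max(0, index))
    return sorted_values[index]


def summarise(samples, lower_q=0.025, upper_q=0.975):
    """Summarise per-sample windows by quantiles of their start and end dates."""
    windows = [recombination_window(a, b, c) for a, b, c in samples]
    starts = sorted(w[0] for w in windows)
    ends = sorted(w[1] for w in windows)
    return quantile(starts, lower_q), quantile(ends, upper_q)


# Hypothetical (A, B, C) triples standing in for paired posterior samples.
samples = [(1940, 1920, 1925), (1946, 1930, 1933), (1950, 1935, 1931)]
print(summarise(samples))
```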

Fig 2
An illustration to explain the structure of the joint phylogenetic method that we employed to estimate the date of the recombination event that generated CRF01_1b2k. The core/E1 tree is shown on the left and the NS5B tree on the right; in both, the shaded ...

CRF01_1b2k transmission history.

Further Bayesian MCMC phylogenetic analyses were performed solely on the CRF01_1b2k isolates in order to estimate the epidemic history and basic reproductive number, R0, of the strain since its emergence. BEAST model settings were the same as those outlined above, except that different coalescent models were employed to reconstruct the transmission history of the CRF. Both the GMRF skyride (36) and exponential-growth coalescent models were used.

RESULTS


Estimation of genome region-specific rates of evolution.

The rates of evolution of the core/E1 and NS5B genomic regions used in this study were estimated from subtype 1a and 1b whole-genome data sets (see Materials and Methods) and are given in Table 2. The rates for the NS5B region were consistent between subtypes, with similar 95% highest posterior density (HPD) intervals, while the rate estimates for the core/E1 region did show some variation among subtypes (Table 2).
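A 95% HPD interval is the narrowest interval that contains 95% of the posterior mass. The sketch below shows the usual sample-based calculation for a unimodal posterior. It is a generic illustration and not the code used by BEAST or its companion tools, and the function name is hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must hold
    # Slide a window of k consecutive samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Toy example: the outlier at 100 is excluded from the 80% interval.
print(hpd_interval([1, 2, 3, 4, 100], mass=0.8))
```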

Table 2
Evolutionary rate estimates of the genomic regions used in this study

Phylogenetic analysis.

To establish the evolutionary origins of the HCV 2k/1b strain, we analyzed the core/E1 (644 nt) and NS5B (741 nt) regions of 27 CRF01_1b2k, 15 subtype 2k, and 71 subtype 1b isolates. The 2k/1b isolates were sampled between 1999 and 2007 and were from the following locations: Ireland, Uzbekistan, Azerbaijan, Cyprus, Amsterdam, France, and Russia (Table 1). Since the BF test supported the hypothesis that the 2k/1b isolates were monophyletic, a single origin of the CRF was inferred (BFs for the comparisons of monophyly and nonmonophyly models were 0.31 for the core/E1 data set and −0.68 for the NS5B data set). Furthermore, the monophyletic origin of the CRF clade was supported in both genome regions when no phylogenetic constraints were imposed. However, monophyly of the CRF01_1b2k isolates was not supported by a high posterior probability (0.67) in the NS5B maximum clade credibility (MCC) tree, most likely reflecting the uncertainty associated with the star-like phylogeny and relatively short sequence length. Nevertheless, the recombinant nature of many of the isolates in this study was confirmed by direct observation of the breakpoint in the NS2 gene region (confirmed isolates are represented by filled circles in Fig. 1).

Fig 1
Molecular clock phylogenies of CRF01_1b2k and its parental subtypes, estimated from the core/E1 alignment and the NS5B alignment. The horizontal bars in each phylogeny contain the dating estimates for two nodes: the common ancestor of the CRF clade (1931 ...

Molecular clock analysis.

Figure 1 and Table 3 provide the estimated TMRCAs of the CRF clade obtained using the joint and independent molecular clock analyses. The estimates obtained separately from the core/E1 and NS5B data sets were in close agreement and exhibit overlapping 95% HPD intervals. These estimates also agree with the joint estimate of TMRCA of the CRF (node A), which was 1946 (1932 to 1959; Table 3). We also estimated the date of the most recent common ancestor of the CRF clade and its most closely related parental isolate (Fig. 2). These estimates were again similar for both genome regions, at about 1933 (Table 3). Lastly, by comparing the HPDs of the node A date with the dates of the more recent of the two parental nodes (either B or C), we were able to generate bounds for the time of the recombination event that generated the CRF, which was between 1923 and 1956.

Table 3
Estimates of TMRCA of the CRF clade obtained from separate genome regions and from joint (hierarchical) phylogenetic analysis

CRF01_1b2k transmission history.

Figure 3 shows the epidemic history of CRF01_1b2k, estimated using the Bayesian skyride plot method, which depicts the effective population size of the CRF epidemic over time. The plot indicates approximately constant exponential growth from the emergence of the CRF lineage until the mid-1990s (Fig. 3), after which the effective population size declines or stabilizes (either is plausible, given the size of the credible region of the estimate). This decrease/stabilization coincides with the advent of screening for HCV in blood donors, which greatly reduced the risk of HCV infection via blood transfusion (31, 52, 70).

Fig 3
Estimated Bayesian skyride plot of the CRF01_1b2k clade. The vertical axis represents the product of viral generation time and the effective number of infections (Ne). The solid line shows the best estimate, and the shaded area shows the 95% credible ...

To ascertain the CRF's exponential growth rate (r), the CRF data set was also analyzed using an exponential-growth coalescent model. The estimated growth rate was 0.116 year⁻¹ (95% HPD interval, 0.079 to 0.159). This estimate was subsequently used to calculate R0 values for the CRF01_1b2k strain under a plausible range of average durations of infectiousness (D), using the equation R0 = rD + 1 (46). Both the estimated growth rate (r = 0.1 year⁻¹) and the estimated R0 values (R0, ~2 to 4) are compatible with a number of equivalent estimates for other HCV subtypes, including those from IDU risk groups (46–48, 66).
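The relationship R0 = rD + 1 is simple enough to work through directly. In the sketch below, the growth rate and its HPD bounds are taken from the text, whereas the durations of infectiousness (10, 20, and 25 years) are hypothetical values chosen for illustration, since the range actually used is not listed here.

```python
def basic_reproductive_number(growth_rate, duration):
    """R0 = r * D + 1, with r per year and D in years."""
    return growth_rate * duration + 1.0


def duration_for_r0(growth_rate, r0):
    """Invert R0 = r * D + 1 to find the duration D implied by a given R0."""
    return (r0 - 1.0) / growth_rate


r_mean, r_low, r_high = 0.116, 0.079, 0.159  # estimates quoted in the text

for duration in (10, 20, 25):  # hypothetical durations of infectiousness
    values = [basic_reproductive_number(r, duration) for r in (r_low, r_mean, r_high)]
    print(duration, [round(v, 2) for v in values])

# Durations implied by the reported R0 range of roughly 2 to 4.
print(round(duration_for_r0(r_mean, 2.0), 1), round(duration_for_r0(r_mean, 4.0), 1))
```

With the mean growth rate, an R0 between about 2 and 4 corresponds to average infectious periods of roughly 9 to 26 years under this formula.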

The MCC tree of the CRF01_1b2k clade (Fig. 4), when combined with all available epidemiological information (Table 1), indicates a clear pattern of phylogenetic clustering according to the geographic location and risk factor of each patient. For example, where these details were available, 14 out of 15 isolates were associated with IDU, while only 1 was isolated from a patient with a history of blood transfusion (Table 1). These observations further support the view that CRF01_1b2k transmission is strongly linked with the IDU transmission route. Although it is not possible to reconstruct the location of the common ancestor of the CRF lineage with any certainty (because our sample size is small and the basal branches of the phylogeny are poorly supported), all of the isolates included in this study are from or have an epidemiological link to the former Soviet Union, and the oldest strain was sampled in Russia in 1999. A well-supported cluster of strains from Azerbaijan originated in about 1970, and therefore, the CRF has likely circulated there since that time. Hence, it appears that CRF01_1b2k disseminated throughout the Soviet Union before its dissolution in 1991 and for some time prior to the discovery of the CRF in St. Petersburg in 1999.

Fig 4
Molecular clock phylogeny of the CRF01_1b2k clade, estimated using a relaxed uncorrelated lognormal clock model and the SRD06 nucleotide substitution model (see Materials and Methods). Sequences are color labeled according to the country of origin (diamonds) ...

DISCUSSION


As the only known HCV recombinant in widespread circulation, the existence and emergence of CRF01_1b2k present an interesting question in HCV epidemiology and evolution. Investigating its evolutionary origins and transmission history helps to understand the circumstances that led to its unique properties. In contrast to HIV, which has 49 known CRFs and a much greater number of unique recombinant forms (30), recombination typically contributes little to the generation and maintenance of HCV genetic diversity. Given that HCV has a higher global prevalence than HIV and, thus, all else being equal, there is a high likelihood of dual infections with divergent HCV strains, it is unlikely that epidemiological factors are restricting the opportunities for HCV to generate CRFs. Mixed infections with divergent HCV strains have been reported for many different populations and are noted to be prevalent among high-risk groups, particularly IDUs and some hemophiliacs (3, 5, 16, 50, 53).

Since the opportunities for HCV recombination are not limited, it is more likely that fundamental molecular and evolutionary differences between HIV and HCV explain why HIV has many CRFs and HCV has few. These could include differences in the rate of template switching or differences in genomic or immunological constraints, such that HCV recombinants have, on average, lower fitness than HIV recombinants and therefore are rarely transmitted (72). Although both viruses are associated with chronic infections, unlike HIV, HCV can be spontaneously cleared by the host. This may explain in part the differences in the number of recombinants between HIV and HCV, where partial protective immunity against the latter reduces the chance of in vivo recombination of HCV strains (1, 43, 69). However, the high rate of mixed infections observed suggests that this is likely at best to play a minor role in HCV recombination. The low frequency of HCV recombinants is more likely to reflect mechanistic constraints on viral replication. There is evidence that template switching in HCV is especially rare and that the replication complex is typically encoded on the same genomic strand that it will replicate and transcribe (2). It is also interesting that when replication complexes are exchanged between different genotypes, the replication efficiency is substantially reduced (15). The pseudodiploidy of the HIV genome certainly increases the likelihood of recombination occurring due to the ability of the virus to package two RNA templates (17), while the secondary RNA structure in the HCV genome may limit the production of viable hybrid HCVs (59, 68).

Our study of previously reported and newly obtained HCV isolates provides the first estimates of the date of the recombination event that generated CRF01_1b2k. We estimated the time of origin of CRF01_1b2k to be between 1923 and 1956, which is not much later than the origin and global spread of the parental subtype 1b (33). This date is substantially earlier than we expected; we had anticipated that the CRF's creation might be linked to the dramatic increase in IDU behavior following the breakup of the former Soviet Union. This result is robust to the manner of isolate sampling: if, because of nonrandom sampling, our isolates are more closely related to each other than they would be under random sampling, then the TMRCA of CRF01_1b2k would be biased toward more recent dates, so correcting for such a bias could only push the estimated origin further back in time. Furthermore, despite its small size, our data set provides a relatively short time window during which the recombinant must have arisen (Fig. 2). The involvement of subtype 1b in the recombinant is not surprising, as it is one of the most prevalent subtypes worldwide. However, to fully appreciate the origin of CRF01_1b2k, we need to consider the evolutionary history of both parental subtypes and of the recombinant lineage itself. Genotype 2 harbors considerable genetic diversity, especially in West Africa, which is where the genotype is thought to have originated (34). Although the small number of subtype 2k isolates sampled to date likely underestimates the true extent of subtype 2k distribution, such viruses have been isolated from Martinique and Madagascar, implicating a role for the historical trans-Atlantic slave trade in the dissemination of the virus from West Africa (34).

The current distribution of subtype 2k is associated with francophone regions and former Soviet Union countries. In contrast, CRF01_1b2k is more spatially limited, with all isolates being directly or indirectly linked to the former Soviet Union. As the nonrecombinant subtype 2k isolates that are most closely related to CRF01_1b2k are from Moldova and Azerbaijan (see Fig. S1 in the supplemental material), it seems most likely that CRF01_1b2k was generated in the former Soviet Union. An equivalent analysis of subtype 1b viruses provides no reliable phylogeographic linkage, due to the low phylogenetic resolution of the NS5B data set.

Our estimated date of CRF origin coincides with an interesting period of history in the former Soviet Union, which was an early leader in transfusion technology. Under the leadership of Alexander Bogdanov in the 1920s, a nationwide network of blood transfusion centers and research institutes was established throughout the Soviet republics, together with the Central Institute of Hematology in Moscow, Russia, in 1926 (61). This expanded into a network of ~1,500 blood donating centers across the republics (18). The Soviets also adopted blood storage and preservation techniques at an early stage. They established more than 60 primary and 500 subsidiary blood storage centers by the mid-1930s, which shipped blood across the entire Soviet Union (61). During the Second World War, these networks were swiftly readapted to support the front line; in Moscow alone, about 2,000 blood donations were given per day (18, 61). The impressive scale of the blood service in the former Soviet Union is likely to have favored HCV transmission by increasing the efficiency and geographic range of the virus's dissemination. Whether specific medical practices at this time increased the probability of mixed viral infections remains unknown. It is interesting to note that Bogdanov himself was fascinated by the ideological interpretation of blood sharing and frequently practiced what he called “physiological collectivism”: the exchange of blood with others through mutual transfusions (61).

Although unscreened blood transfusions can provide a credible hypothesis for the origin of CRF01_1b2k in the Soviet Union some time from 1923 to 1956, we must also attempt to explain how subtype 2k or the CRF itself arrived in the Soviet Union from West Africa or the Caribbean. Migration from Africa to the former Soviet Union did occur during the late 1950s and 1970s as a result of alliances forged by the Soviet government with newly independent African states such as Ghana and Angola (35). However, these connections are too late to have contributed to the emergence of CRF01_1b2k, according to our dating estimates. Although we cannot reject the hypothesis that the CRF was formed in West Africa and subsequently moved to the Soviet Union, our results are more consistent with the recombination event occurring in the latter. This uncertainty is likely to be reduced with further samples, especially subtype 2k viruses, from African and former Soviet Union locations.

The epidemic history of CRF01_1b2k (Fig. 3) since its emergence is similar to that estimated for other epidemic subtypes of HCV (e.g., see reference 33). The growth in the CRF01_1b2k effective population size coincides with a substantial increase in blood transfusion, including during the Second World War, and with the subsequent rise in intravenous drug usage. CRF01_1b2k transmission seems to have slowed or stabilized since the early 1990s, coinciding with the onset of the anti-HCV screening of donors. In the absence of any data to the contrary, the transmission of this recombinant and its spread from the former Soviet Union reflect the peculiar epidemiological properties of the risk groups that it has been associated with rather than any intrinsic properties of the virus.

We demonstrate the practicality and benefits of using a hierarchical phylogenetic model to jointly estimate parameters of interest when analyzing multipartite sequence data that result from genetic exchange. This method yields more accurate parameter estimates than previous methods (e.g., see references 27 and 67) by incorporating the phylogenetic information and uncertainty in different genomic regions. We recommend that this improved statistical framework be used in future investigations of recombination in fast-evolving RNA viruses.
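The benefit of combining genomic regions can be illustrated with a deliberately simplified sketch. It is not the hierarchical Bayesian model used in this study; it only assumes that, in each region flanking the breakpoint, the recombination event must postdate the split of the recombinant lineage from its closest sampled parental relative and predate the most recent common ancestor of the sampled recombinants. The class name, function name, and all calendar years below are hypothetical and chosen purely for illustration; they are not estimates from this study.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DateBounds:
    """Bounds (calendar years) on the recombination date implied by one region."""
    earliest: float  # split from the closest sampled parental lineage
    latest: float    # most recent common ancestor of the sampled recombinants


def joint_bounds(regions: Iterable[DateBounds]) -> DateBounds:
    """Intersect per-region bounds: a single recombination event
    has to satisfy the constraints from every region at once."""
    regions = list(regions)
    if not regions:
        raise ValueError("at least one region is required")
    earliest = max(r.earliest for r in regions)
    latest = min(r.latest for r in regions)
    if earliest > latest:
        raise ValueError("regions imply incompatible bounds")
    return DateBounds(earliest, latest)


# Hypothetical bounds for the two regions on either side of a breakpoint.
five_prime = DateBounds(earliest=1900.0, latest=1950.0)
three_prime = DateBounds(earliest=1920.0, latest=1965.0)

combined = joint_bounds([five_prime, three_prime])
print(combined)
```

In this toy case the combined interval is narrower than either single-region interval, which is the intuition behind estimating the date jointly. A full hierarchical model goes further by propagating phylogenetic uncertainty from each region rather than intersecting fixed bounds.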

This study has made significant steps toward understanding the epidemic history and spread of the unique circulating HCV recombinant 2k/1b. Most significantly, we show that this strain originated many decades before the post-Soviet rise in injecting drug use with which it is currently associated. On the basis of the date of its origin and its molecular epidemiology, there are reasonable grounds to suppose that the Soviet Union's revolutionary blood service was instrumental in the CRF's early generation and continental-scale spread. This infrastructure may also have facilitated the pan-Eurasian spread of other parenterally transmitted blood-borne infections, which is an interesting question for future research.

Supplementary Material

Published ahead of print 23 November 2011

Supplemental material for this article may be found at


1. Aitken CK, et al. 2008. High incidence of hepatitis C virus reinfection in a cohort of injecting drug users. Hepatology 48:1746–1752 [PubMed]
2. Appel N, Herian U, Bartenschlager R. 2005. Efficient rescue of hepatitis C virus RNA replication by trans-complementation with nonstructural protein 5A. J. Virol. 79:896–909 [PMC free article] [PubMed]
3. Blackard JT, Sherman KE. 2007. Hepatitis C virus coinfection and superinfection. J. Infect. Dis. 195:519–524 [PubMed]
4. Boom R, et al. 1990. Rapid and simple method for purification of nucleic-acids. J. Clin. Microbiol. 28:495–503 [PMC free article] [PubMed]
5. Bowden S, McCaw R, White PA, Crofts N, Aitken CK. 2005. Detection of multiple hepatitis C virus genotypes in a cohort of injecting drug users. J. Viral Hepat. 12:322–324 [PubMed]
6. Calado RA, et al. 2011. Hepatitis C virus subtypes circulating among intravenous drug users in Lisbon, Portugal. J. Med. Virol. 83:608–615 [PubMed]
7. Colina R, et al. 2004. Evidence of intratypic recombination in natural populations of hepatitis C virus. J. Gen. Virol. 85:31–37 [PubMed]
8. Cristina J, Colina R. 2006. Evidence of structural genomic region recombination in hepatitis C virus. Virol. J. 3:53. [PMC free article] [PubMed]
9. Demetriou VL, de Vijver DAMCV, Kostrikis LG, Network CHCV. 2009. Molecular epidemiology of hepatitis C infection in Cyprus: evidence of polyphyletic infection. J. Med. Virol. 81:238–248 [PubMed]
10. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:699–710 [PMC free article] [PubMed]
11. Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214. [PMC free article] [PubMed]
12. Drummond AJ, Rambaut A, Shapiro B, Pybus OG. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol. Biol. Evol. 22:1185–1192 [PubMed]
13. Dubois F, et al. 1997. Hepatitis C in a French population-based survey, 1994: seroprevalence, frequency of viremia, genotype distribution, and risk factors. Hepatology 25:1490–1496 [PubMed]
14. Gray RR, et al. 2011. The mode and tempo of hepatitis C virus evolution within and among hosts. BMC Evol. Biol. 11:131. [PMC free article] [PubMed]
15. Herlihy KJ, et al. 2008. Development of intergenotypic chimeric replicons to determine the broad-spectrum antiviral activities of hepatitis C virus polymerase inhibitors. Antimicrob. Agents Chemother. 52:3523–3531 [PMC free article] [PubMed]
16. Herring BL, Page-Shafer K, Tobler LH, Delwart EL. 2004. Frequent hepatitis C virus superinfection in injection drug users. J. Infect. Dis. 190:1396–1403 [PubMed]
17. Hu WS, Temin HM. 1990. Genetic consequences of packaging two RNA genomes in one retroviral particle: pseudodiploidy and high rate of genetic recombination. Proc. Natl. Acad. Sci. U. S. A. 87:1556–1560 [PubMed]
18. Huestis DW. 2002. Russia's National Research Center for Hematology: its role in the development of blood banking. Transfusion 42:490–494 [PubMed]
19. Kageyama S, et al. 2006. A natural inter-genotypic (2b/1b) recombinant of hepatitis C virus in the Philippines. J. Med. Virol. 78:1423–1428 [PubMed]
20. Kalinina O, Norder H, Mukomolov S, Magnius LO. 2002. A natural intergenotypic recombinant of hepatitis C virus identified in St. Petersburg. J. Virol. 76:4034–4043 [PMC free article] [PubMed]
21. Kalinina O, et al. 2001. Shift in predominating subtype of HCV from 1b to 3a in St. Petersburg mediated by increase in injecting drug use. J. Med. Virol. 65:517–524 [PubMed]
22. Kapoor A, et al. 2011. Characterization of a canine homolog of hepatitis C virus. Proc. Natl. Acad. Sci. U. S. A. 108:11608–11613 [PubMed]
23. Kuiken C, Simmonds P. 2009. Nomenclature and numbering of the hepatitis C virus. Methods Mol. Biol. 510:33–53 [PubMed]
24. Kuiken C, Yusim K, Boykin L, Richardson R. 2005. The Los Alamos hepatitis C sequence database. Bioinformatics 21:379–384 [PubMed]
25. Kurbanov F, et al. 2008. Detection of hepatitis C virus natural recombinant RF1_2k/1b strain among intravenous drug users in Uzbekistan. Hepatol. Res. 38:457–464 [PubMed]
26. Kurbanov F, et al. 2008. Molecular epidemiology and interferon susceptibility of the natural recombinant hepatitis C virus strain RF1_2k/1b. J. Infect. Dis. 198:1448–1456 [PubMed]
27. Lam TY, et al. 2008. Evolutionary analyses of European H1N2 swine influenza A virus by placing timestamps on the multiple reassortment events. Virus Res. 131:271–278 [PubMed]
28. Lee YM, et al. 2010. Molecular epidemiology of HCV genotypes among injection drug users in Taiwan: full-length sequences of two new subtype 6w strains and a recombinant form_2b6w. J. Med. Virol. 82:57–68 [PubMed]
29. Legrand-Abravanel F, et al. 2007. New natural intergenotypic (2/5) recombinant of hepatitis C virus. J. Virol. 81:4357–4362 [PMC free article] [PubMed]
30. Leitner T, Korber B, Daniels M, Calef C, Foley B. 2005. HIV-1 subtype and circulating recombinant form (CRF) reference sequences, 2005, p 41–48. In Leitner T, et al. (ed), HIV sequence compendium 2005. Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM
31. Lemon SM, Brown EA. 1995. Hepatitis C virus, p 1474–1486. In Mandell GL, Bennett JE, Dolin R (ed), Principles and practice of infectious diseases, 4th ed. Churchill Livingstone, New York, NY
32. Li Y, et al. 2010. Identification of a novel second-generation circulating recombinant form (CRF48_01B) in Malaysia: a descendant of the previously identified CRF33_01B. J. Acquir. Immune Defic. Syndr. 54:129–136 [PubMed]
33. Magiorkinis G, et al. 2009. The global spread of hepatitis C virus 1a and 1b: a phylodynamic and phylogeographic analysis. PLoS Med. 6:e1000198. [PMC free article] [PubMed]
34. Markov PV, et al. 2009. Phylogeography and molecular epidemiology of hepatitis C virus genotype 2 in Africa. J. Gen. Virol. 90:2086–2096 [PubMed]
35. Matusevich M. 2009. Black in the U.S.S.R.: Africans, African Americans, and the Soviet society. Transition 100:56–75
36. Minin VN, Bloomquist EW, Suchard MA. 2008. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol. Biol. Evol. 25:1459–1471 [PMC free article] [PubMed]
37. Moreau I, et al. 2006. Serendipitous identification of natural intergenotypic recombinants of hepatitis C in Ireland. Virol. J. 3:95–102 [PMC free article] [PubMed]
38. Morel V, et al. 2010. Emergence of a genomic variant of the recombinant 2k/1b strain during a mixed hepatitis C infection: a case report. J. Clin. Virol. 47:382–386 [PubMed]
39. Moreno P, et al. 2009. Evidence of recombination in hepatitis C virus populations infecting a hemophiliac patient. Virol. J. 6:203. [PMC free article] [PubMed]
40. Murphy DG, et al. 2007. Use of sequence analysis of the NS5B region for routine genotyping of hepatitis C virus with reference to C/E1 and 5′ untranslated region sequences. J. Clin. Microbiol. 45:1102–1112 [PMC free article] [PubMed]
41. Newton MA, Raftery AE. 1994. Approximate Bayesian-inference with the weighted likelihood bootstrap. J. R. Stat. Soc. Ser. B Methodol. 56:3–48
42. Noppornpanth S, et al. 2006. Identification of a naturally occurring recombinant genotype 2/6 hepatitis C virus. J. Virol. 80:7569–7577 [PMC free article] [PubMed]
43. Osburn WO, et al. 2010. Spontaneous control of primary hepatitis C virus infection and immunity against persistent reinfection. Gastroenterology 138:315–324 [PMC free article] [PubMed]
44. Pawlotsky JM, et al. 1995. Relationship between hepatitis-C virus genotypes and sources of infection in patients with chronic hepatitis-C. J. Infect. Dis. 171:1607–1610 [PubMed]
45. Pol S, et al. 1995. The changing relative prevalence of hepatitis-C virus genotypes—evidence in hemodialyzed patients and kidney recipients. Gastroenterology 108:581–583 [PubMed]
46. Pybus OG, et al. 2001. The epidemic behavior of the hepatitis C virus. Science 292:2323–2325 [PubMed]
47. Pybus OG, Cochrane A, Holmes EC, Simmonds P. 2005. The hepatitis C virus epidemic among injecting drug users. Infect. Genet. Evol. 5:131–139 [PubMed]
48. Pybus OG, Drummond AJ, Nakano T, Robertson BH, Rambaut A. 2003. The epidemiology and iatrogenic transmission of hepatitis C virus in Egypt: a Bayesian coalescent approach. Mol. Biol. Evol. 20:381–387 [PubMed]
49. Pybus OG, Markov PV, Wu A, Tatem AJ. 2007. Investigating the endemic transmission of the hepatitis C virus. Int. J. Parasitol. 37:839–849 [PubMed]
50. Qian KP, Natov SN, Pereira BJ, Lau JY. 2000. Hepatitis C virus mixed genotype infection in patients on haemodialysis. J. Viral Hepat. 7:153–160 [PubMed]
51. Ristic N, et al. 2011. Analysis of the origin and evolutionary history of HIV-1 CRF28_BF and CRF29_BF reveals a decreasing prevalence in the AIDS epidemic of Brazil. PLoS One 6:e17485. [PMC free article] [PubMed]
52. Schreiber GB, Busch MP, Kleinman SH, Korelitz JJ. 1996. The risk of transfusion-transmitted viral infections. The Retrovirus Epidemiology Donor Study. N. Engl. J. Med. 334:1685–1690 [PubMed]
53. Schroter M, Feucht HH, Zollner B, Schafer P, Laufs R. 2003. Multiple infections with different HCV genotypes: prevalence and clinical impact. J. Clin. Virol. 27:200–204 [PubMed]
54. Seeff LB, et al. 2000. 45-year follow-up of hepatitis C virus infection in healthy young adults. Ann. Intern. Med. 132:105–111 [PubMed]
55. Shapiro B, Rambaut A, Drummond AJ. 2006. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol. Biol. Evol. 23:7–9 [PubMed]
56. Shepard CW, Finelli L, Alter MJ. 2005. Global epidemiology of hepatitis C virus infection. Lancet Infect. Dis. 5:558–567 [PubMed]
57. Silini E, et al. 1995. Molecular epidemiology of hepatitis-C virus-infection among intravenous-drug-users. J. Hepatol. 22:691–695 [PubMed]
58. Simmonds P. 2004. Genetic diversity and evolution of hepatitis C virus—15 years on. J. Gen. Virol. 85:3173–3188 [PubMed]
59. Simmonds P, Smith DB. 1999. Structural constraints on RNA virus evolution. J. Virol. 73:5787–5794 [PMC free article] [PubMed]
60. Smith DB, et al. 1997. The origin of hepatitis C virus genotypes. J. Gen. Virol. 78(Pt 2):321–328 [PubMed]
61. Starr D. 1999. Blood: an epic history of medicine and commerce. Little, Brown & Company, London, United Kingdom
62. Suchard MA, Kitchen CMR, Sinsheimer JS, Weiss RE. 2003. Hierarchical phylogenetic models for analyzing multipartite sequence data. Syst. Biol. 52:649–664 [PubMed]
63. Suchard MA, Weiss RE, Sinsheimer JS. 2001. Bayesian selection of continuous-time Markov chain evolutionary models. Mol. Biol. Evol. 18:1001–1013 [PubMed]
64. Swofford D. 2003. PAUP*. Phylogenetic analysis using parsimony (*and other methods), version 4. Sinauer Associates, Sunderland, MA
65. Tanaka Y, et al. 2004. Exponential spread of hepatitis C virus genotype 4a in Egypt. J. Mol. Evol. 58:191–195 [PubMed]
66. Tanaka Y, et al. 2005. Molecular evolutionary analyses implicate injection treatment for schistosomiasis in the initial hepatitis C epidemics in Japan. J. Hepatol. 42:47–53 [PubMed]
67. Tee KK, et al. 2009. Estimating the date of origin of an HIV-1 circulating recombinant form. Virology 387:229–234 [PubMed]
68. Tuplin A, Wood J, Evans DJ, Patel AH, Simmonds P. 2002. Thermodynamic and phylogenetic prediction of RNA secondary structures in the coding region of hepatitis C virus. RNA 8:824–841 [PubMed]
69. van de Laar TJ, et al. 2009. Frequent HCV reinfection and superinfection in a cohort of injecting drug users in Amsterdam. J. Hepatol. 51:667–674 [PubMed]
70. van der Poel CL. 1999. Hepatitis C virus and blood transfusion: past and present risks. J. Hepatol. 31(Suppl 1):101–106 [PubMed]
71. WHO. 2003, posting date. Global alert and response (GAR): hepatitis C. WHO, Geneva, Switzerland
72. Worobey M, Holmes EC. 1999. Evolutionary aspects of recombination in RNA viruses. J. Gen. Virol. 80:2535–2543 [PubMed]

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)