Conceived and designed the experiments: MEQ-M LM. Performed the experiments: JW KH DW RG LL. Analyzed the data: JA DLR MEQ-M. Wrote the paper: MEQ-M. Provided access to material, technology and critical advice: EP EJA LM.
HIV-1 coreceptor tropism assays are required to rule out the presence of CXCR4-tropic (non-R5) viruses prior to treatment with CCR5 antagonists. Phenotypic (e.g., Trofile™, Monogram Biosciences) and genotypic (e.g., population sequencing linked to bioinformatic algorithms) assays are the most widely used. Although several next-generation sequencing (NGS) platforms are available, to date all published deep sequencing HIV-1 tropism studies have used the 454™ Life Sciences/Roche platform. In this study, HIV-1 co-receptor usage was predicted for twelve patients scheduled to start a maraviroc-based antiretroviral regimen. The V3 region of the HIV-1 env gene was sequenced using four NGS platforms: 454™, PacBio® RS (Pacific Biosciences), Illumina®, and Ion Torrent™ (Life Technologies). Cross-platform variation was evaluated, including number of reads, read length and error rates. HIV-1 tropism was inferred using Geno2Pheno, Web PSSM, and the 11/24/25 rule and compared with Trofile™ and virologic response to antiretroviral therapy. Error rates related to insertions/deletions (indels) and nucleotide substitutions introduced by the four NGS platforms were low compared to the actual HIV-1 sequence variation. Each platform detected all major virus variants within the HIV-1 population with similar frequencies. Identification of non-R5 viruses was comparable among the four platforms, with minor differences attributable to the algorithms used to infer HIV-1 tropism. All NGS platforms showed similar concordance with virologic response to the maraviroc-based regimen (75% to 80% range depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). In conclusion, all four NGS platforms were able to detect minority non-R5 variants at comparable levels suggesting that any NGS-based method can be used to predict HIV-1 coreceptor usage.
The discovery that human immunodeficiency virus type 1 (HIV-1) requires a co-receptor to enter target cells, mainly the chemokine receptors CCR5 or CXCR4, was not only crucial to better understand HIV-1 transmission and pathogenesis but also opened the door to the design of novel antiretroviral drugs targeting host cell entry. Multiple strategies to block the replication of CCR5- or CXCR4-tropic (R5 or X4, respectively) viruses have been studied, leading to the approval for clinical use of the first CCR5-receptor antagonist (maraviroc, Selzentry/Celsentri, Pfizer, NY) in 2007. Like other co-receptor antagonists in development, maraviroc's activity is very specific, showing no direct activity against viruses able to use CXCR4 to enter the target cell. Thus, an HIV-1 tropism test should be performed prior to initiation of maraviroc-containing regimens to rule out the presence of detectable non-CCR5-tropic (non-R5) virus.
Several phenotypic and genotypic assays have been developed to assess HIV-1 co-receptor usage or tropism. Most phenotypic assays involve the generation of patient-derived env-recombinant viruses to determine their ability to infect reporter cell lines expressing HIV-1 receptors and co-receptors. The new version of Trofile™ (Monogram Biosciences, South San Francisco, CA), i.e., the Enhanced Sensitivity Trofile Assay (ESTA), is currently the most widely used HIV-1 coreceptor tropism assay. Nonetheless, phenotypic assays share a few practical limitations such as high cost and long turnaround time, which restrict their use and consequently hinder access to future CCR5 antagonists/agonists.
Genotypic tests are a faster, less expensive alternative for inferring HIV-1 coreceptor tropism from env sequences. Considerable effort has been made to develop genotypic assays able to predict HIV-1 co-receptor usage based on just the V3 region of the env gene, which seems to be the principal determinant of HIV-1 tropism. However, genotypic tests based on bulk capillary electrophoresis (Sanger) sequencing of a population of V3 sequences lack the sensitivity to detect minority variants present below 20% of the viral population. For that reason, several studies have evaluated the use of next-generation sequencing (NGS), or deep sequencing, to detect minority non-R5 HIV-1 variants or low frequency drug-resistant variants that could lead to treatment failure. Prediction of HIV-1 coreceptor usage by deep sequencing is highly concordant with phenotypic assays (82% to 87%), has improved sensitivity for detecting non-R5 variants over population sequencing, and predicts the success of maraviroc-based antiretroviral regimens.
To date all published HIV-1 deep sequencing studies have used the 454™ Life Sciences platform (454 Life Sciences/Roche, Branford, CT), some of which focused on HIV-1 tropism prediction. The advent of novel NGS technologies offering different chemistries, simplified sample preparation, faster turnaround times, and reduced cost per bp sequenced prompted us to compare the ability of four NGS platforms, i.e., 454™ Life Sciences/Roche, Illumina® (Illumina, Inc. San Diego, CA), PacBio® RS (Pacific Biosciences, Menlo Park, CA), and Ion Torrent™ (Ion Torrent/Life Technologies, South San Francisco, CA) to determine HIV-1 coreceptor tropism.
Twelve RNA specimens, derived from plasma samples collected from HIV-infected individuals prior to enrollment in the (i) maraviroc expanded access program (EAP) at multiple centers in Europe or (ii) ALLEGRO trial, a multicenter study to assess the prevalence of R5 HIV-1 variants in Spain, were obtained from the Hospital Carlos III (Madrid, Spain). Phenotypic HIV-1 coreceptor tropism was determined at baseline using the original version of the Trofile™ assay (Monogram Biosciences), which had a reported non-R5 variant detection limit of 5 to 10%. Written informed consent was obtained from the patients before participation in the study as previously described.
Viral RNA was reverse-transcribed using AccuScript High Fidelity Reverse Transcriptase (Stratagene Agilent; Santa Clara, CA) and the corresponding antisense external primer in 20-µl reaction mixture containing 1 mM dNTPs, 10 mM DTT and 10 units of RNase inhibitor. Viral cDNA was then PCR amplified using a series of external and nested primers with defined cycling conditions. Using the same external (first-round) PCR reactions that covered the entire HIV-1 envelope (env) gene (2,830 nt), two different PCR fragments were amplified due to intrinsic requirements of each NGS platform (Fig. 1). These twelve samples are part of a larger cohort analyzed in a separate study (Weber and Quinones-Mateu, submitted for publication). In that study 105 patient-derived 337 bp amplicons corresponding to a short region around the HIV-1 V3 loop were analyzed with the 454™ system. Here, we were able to sequence the same amplicons using the PacBio® RS platform; however, the Illumina® and Ion Torrent™ systems used in this study are better suited to sequence longer DNA regions, shearing them into small fragments (150 to 200 bp). Processing and sequencing the small 337 bp PCR products may have been difficult with these two platforms. Thus, 337-nucleotide (nt) fragments encompassing the V3 region were generated in single nested (second-round) PCR reactions. These small amplicons were sequenced using 454™ and PacBio® RS sequencing systems. A larger fragment corresponding to the env gene was amplified as a 2,302 nt fragment, that is, all the surface glycoprotein (gp120) and most of the transmembrane glycoprotein (gp41), missing only 321 nt of the gp41 cytoplasmic domain. These amplicons were sequenced using Illumina® and Ion Torrent™ platforms. External PCR reactions were carried out in a 50-µl mixture containing 0.2 mM dNTPs, 3 mM MgCl2 and 2.5 units of Pfu Turbo DNA Polymerase (Stratagene). 
Nested PCR reactions for population sequencing analysis were carried out in 50-µl mixture containing 0.2 mM dNTPs, 0.3 units of Pfu Turbo DNA Polymerase and 1.9 units of Taq Polymerase (Denville Scientific; Metuchen, NJ), then purified with the QIAquick PCR Purification kit (Qiagen) and quantified with Quant-iT PicoGreen dsDNA kit (Invitrogen). Nested PCR reactions for deep sequencing analysis were customized for each NGS system as described below.
PCR products corresponding to the gp120/gp41-coding regions of HIV-1 were purified with the QIAquick PCR Purification kit (Qiagen) and the V3 region sequenced (population or global sequence) using AP Biotech DYEnamic ET Terminator cycle with Thermosequenase II (Davis Sequencing LCC, Davis, CA) (Fig. 1). Nucleotide sequences were analyzed using DNASTAR Lasergene Software Suite v.7.1.0 (Madison, WI).
Second-round PCR amplification and deep sequencing analysis was customized for each NGS platform as follows:
A 2,302 nt fragment of the env gene was amplified from an external PCR product, then purified (QIAquick PCR Purification, Qiagen) and quantified (Quant-iT PicoGreen dsDNA, Invitrogen) as described above for Illumina. The Ion Xpress™ Fragment Library Kit (Life Technologies, Carlsbad CA) was used to construct a library for shotgun sequencing on the Ion Personal Genome Machine (PGM, Ion Torrent/Life Technologies). Briefly, amplicon DNA was randomly fragmented using the Ion Shear™ Plus Reagent (Life Technologies). The P1 adapter (5′-CCA CTA CGC CTC CGC TTT CCT CTC TAT GGG CAG TCG GTG AT; 5′-ATC ACC GAC TGC CCA TAG AGA GGA AAG CGG AGG CGT AGT GG*T*T) or one of 12 A_BC adapters were then ligated to the repaired fragment ends. Following ligation and size selection (i.e., 150+/−20 nucleotides; Pippin Prep™, Life Technologies) the library was PCR amplified using forward (5′-CCA TCT CAT CCC TGC GTG TC) and reverse (5′-CCA CTA CGC CTC CGC TTT CCT CTC TAT G) primers. The quality and quantity of each of the 12 libraries was assessed with the 2100 Bioanalyzer (DNA High Sensitivity Chip, Agilent Technologies, Sunnyvale CA). Templates were then prepared and enriched for sequencing on the Ion Sphere Particles™ (ISPs) using the Ion Xpress™ Template Kit (Life Technologies) prior to sequencing on the Ion PGM with the Ion Sequencing Kit (Life Technologies). Fifteen million templated ISPs were primed with the Ion Sequencing primer (5′-CCA TCT CAT CCC TGC GTG TCT CCG AC) and then mixed with the Ion Sequencing Polymerase. The primer-activated polymerase-bound ISPs were loaded into the Ion 314™ Chip (Life Technologies) and subjected to 65 cycles of sequencing with the standard nucleotide flow order. Signal processing and base calling was performed with Torrent Analysis Suite version 1.5.
Finally, it is important to note that the 454™ sequencing was performed in the laboratory of Dr. Hendrik Poinar (McMaster University, Hamilton, Canada), while for Illumina®, PacBio®, and Ion Torrent™ the PCR products were sent for sequencing at the respective company. V3 nucleotide sequences obtained by deep sequencing using any of the NGS platforms (as described below) have been submitted to the Los Alamos National Laboratory HIV-DB Next Generation Sequence Archive (http://www.hiv.lanl.gov/content/sequence/HIV/NextGenArchive/Archer2012).
To minimize the amount of data loss due to high sequence variability and to allow for interpatient indel variation across the V3 region, sample-specific reference sequences were constructed as previously described. First, sequences corresponding to the HIV genomic region spanning positions 6,900 to 7,400 on the HXB2 reference strain (accession no: K03455) were extracted, followed by replacement with the V3 population sequence derived from each sample. For each sample, reads derived from the four NGS platforms were then independently mapped to the respective sample-specific reference sequence (Fig. 1), and all indel and substitution information in relation to the reference sequence was stored as described. For each dataset, reads spanning the V3 region (coordinates 210 to 315 within the reference templates) were extracted, truncated and translated for genotyping. Within each dataset only one representative of any identical variant was maintained, but its overall frequency was stored. Finally, all variants with a frequency >5 within the population, calculated by each platform, were combined and clustered using a neighbor-joining algorithm implemented within Segminator II.
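The translate-and-collapse step described above can be sketched in a few lines of Python. This is only an illustration, not the Segminator II implementation: it assumes each read has already been mapped and trimmed to the in-frame V3 region, and the function names (`translate`, `collapse_variants`, `variant_frequencies`) are hypothetical. Reads that contain a stop codon, an ambiguous base, or a length that is not a multiple of three are discarded here, which is one plausible reading of "successfully translated".

```python
from collections import Counter

# Standard genetic code, indexed by the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate an in-frame nucleotide string; return None if it is not cleanly translatable."""
    if len(nt) == 0 or len(nt) % 3 != 0:
        return None
    residues = []
    for i in range(0, len(nt), 3):
        residue = CODON_TABLE.get(nt[i:i + 3].upper())
        if residue is None or residue == "*":  # ambiguous base or stop codon
            return None
        residues.append(residue)
    return "".join(residues)


def collapse_variants(region_reads):
    """Keep one representative per identical translated variant, together with its read count."""
    counts = Counter()
    for nt in region_reads:
        aa = translate(nt)
        if aa is not None:
            counts[aa] += 1
    return counts


def variant_frequencies(counts):
    """Convert read counts into fractions of all successfully translated reads."""
    total = sum(counts.values())
    return {variant: n / total for variant, n in counts.items()} if total else {}
```

Storing counts rather than expanded read lists keeps the per-sample data small while preserving the frequencies needed for the downstream tropism calls.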
HIV-1 co-receptor tropism was predicted from population and extracted V3 read data as described above using several bioinformatics tools (Fig. 1). In the case of global sequences, nucleotide mixtures were considered when the second highest peak in the electropherogram was above 25%, and then these nucleotide mixtures were translated into all possible permutations. The algorithms used to infer HIV-1 tropism from V3 amino acid sequences were: (i) Geno2Pheno, with false positive rates (FPR, i.e., predicted frequency of classifying an R5 sequence as non-R5 virus) based on optimized cutoffs for determining HIV-1 coreceptor usage (3.5%) as previously described or the recommendation from the European Consensus Group on clinical management of HIV-1 tropism testing (10%) as described in the Geno2Pheno website (http://coreceptor.bioinf.mpi-inf.mpg.de/index.php), (ii) Web PSSM using the subtype B x4r5 matrix, and (iii) the 11/24/25 charge rule implemented within our analysis pipeline. Finally, plasma samples were classified as containing non-R5 viruses if at least 2% of the individual sequences, as determined by deep sequencing, were predicted to be non-R5.
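As an illustration of the simplest of these algorithms, the sketch below applies the 11/24/25 charge rule as it is commonly stated (a lysine or arginine at V3 position 11, 24, or 25 predicts a non-R5 virus) and then the 2% sample-level cutoff described above. The exact rule variant and alignment handling in the authors' pipeline are not given, so this assumes ungapped 35-residue V3 loops; the function names are hypothetical.

```python
BASIC_RESIDUES = {"K", "R"}   # positively charged residues considered by the rule (assumption)
RULE_POSITIONS = (11, 24, 25)  # 1-based positions within a 35-residue V3 loop


def is_non_r5_11_24_25(v3_aa):
    """Return True if the V3 loop carries K or R at position 11, 24, or 25."""
    if len(v3_aa) != 35:
        # Length variants need an alignment step that is outside this sketch.
        raise ValueError("expected an ungapped 35-residue V3 loop")
    return any(v3_aa[pos - 1] in BASIC_RESIDUES for pos in RULE_POSITIONS)


def classify_sample(variant_counts, predictor=is_non_r5_11_24_25, threshold=0.02):
    """Call a sample non-R5 if at least `threshold` of its reads are predicted non-R5.

    variant_counts maps each V3 amino acid variant to its read count.
    Returns the call and the non-R5 fraction.
    """
    total = sum(variant_counts.values())
    if total == 0:
        raise ValueError("no reads to classify")
    non_r5_reads = sum(n for variant, n in variant_counts.items() if predictor(variant))
    fraction = non_r5_reads / total
    return ("non-R5" if fraction >= threshold else "R5"), fraction
```

Passing the predictor as an argument means the same thresholding step could wrap a Geno2Pheno or PSSM call exported from those tools, which is how the per-algorithm differences reported later would arise from identical read data.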
Descriptive results are expressed as median values, interquartile ranges, and standard deviations. Pearson correlation coefficient was used to determine the strength of association between categorical variables. All differences with a P value of <0.05 were considered statistically significant. The kappa coefficient, a chance-adjusted measure of agreement between any number of categories, was calculated using ComKappa2 v.2.0.4 to quantify the concordance among the different HIV-1 tropism determinations and the patient's virologic response at week 12. Values of kappa can range from −1.0 to 1.0, with −1.0 indicating perfect disagreement below chance, 0.0 indicating agreement equal to chance, and 1.0 indicating perfect agreement above chance. A rule of thumb is that kappa values <0.40 indicate poor agreement, values ≥0.40 and <0.75 indicate good agreement, and values ≥0.75 and <1.0 indicate excellent agreement. All statistical analyses were performed using GraphPad Prism v.5.01 (GraphPad Software, La Jolla, CA) unless otherwise specified.
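For readers unfamiliar with the kappa statistic, the following sketch computes Cohen's kappa for two sets of categorical calls, e.g., a tropism prediction versus the observed virologic response. It is a plain reimplementation of the textbook formula, kappa = (p_o − p_e) / (1 − p_e), and not the ComKappa2 program used in the study; the function name is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-adjusted agreement between two equal-length lists of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both methods.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each method's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both methods use a single identical label")
    return (observed - expected) / (1.0 - expected)
```

With balanced categories, 80% raw concordance corresponds to a kappa of 0.6, because half of the agreement would be expected by chance alone; this is why raw concordance and kappa are reported side by side.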
Twelve representative specimens were selected from 167 patients analyzed in a study comparing phenotypic and genotypic HIV-1 coreceptor tropism assays based on the level of concordance among the different tests. Eleven samples corresponded to patients participating in the maraviroc EAP, with three failing to enter the study following the detection of non-R5 (D/M) variants at baseline and five out of the remaining eight patients responding to the maraviroc-based regimen at week 12 (Table 1). Population sequencing was performed not only to infer HIV-1 tropism but also to verify sample identity, showing a 99.9% homology with the V3 nucleotide sequences published for these specimens (data not shown).
For each platform, the number of reads containing successfully mapped and complete V3 sequences, as well as those that were subsequently translated, were calculated. All insertions, deletions, and substitutions relative to the sample-specific reference sequences were tabulated during read mapping, and prior to any filtering based on correctly translating entire V3 sequences (Table 2). In general, all platforms showed low substitution (1.8, 1.7, 1.3, and 1.7 mean #/read), deletion (0.2, 0.01, 0.4, and 0.5 mean #/read), and insertion (2.1, 1.7, 1.9, and 2.4 mean #/read) rates across all samples for 454™, Illumina®, PacBio®, and Ion Torrent™, respectively (Fig. 2 and Table 2). As expected, there was interpatient variability; for example, sample 10–172 showed slightly higher substitution (5.1, 5.1, 2.5, and 5.1 mean #/read) and insertion (4.5, 4.0, 2.9, and 5.5 mean #/read) rates across all platforms compared with the other samples. Nevertheless, the overall low rates of indels and substitutions resulted in a high number of successfully translated V3 sequences from the original V3 spanning reads, suggesting that each one of the NGS platforms could be used for genotyping of complex HIV-1 populations.
All major virus variants were detected by each platform with similar frequency, with the exception of sample 10–137 where Ion Torrent detected a different dominant variant than the other three platforms or in sample 10–172 where an insertion of three nucleotides was observed in most variants sequenced with 454, Illumina, and PacBio but not with Ion Torrent (Figs. 3, 4, and 5). All the amplicons, either the 337-nt fragments encompassing the V3 region or the 2,302 nt fragments covering most of the env gene, were obtained from the same external PCR products; however, the amplicons sequenced by Ion Torrent were generated seven months later than the products analyzed by the other NGS platforms. Thus, it is possible that in some cases (such as with sample 10–137) a different majority variant within the quasispecies population may have been selected during PCR amplification. Moreover, in some cases and at lower frequencies, unique variants were platform-dependent, probably related to platform-dependent error rates and/or stochastic PCR errors (Fig. 3). Nevertheless, and in general, all platforms were able to identify the same major variants within the population and similar proportion of low frequency variants (i.e., at a frequency <0.5%) (Fig. 3, inserts). For example, mixtures of at least two predominant populations were accurately identified by all four NGS platforms in samples 10–80, 10–133, and 10–180 (Figs. 3, 4, 5, and 6). Phylogenetic trees constructed by combining V3 sequences obtained with each platform confirmed these findings (Figs. 4, 5, 6, and 7).
The main goal of this study was to evaluate the ability of the four NGS platforms to determine HIV-1 coreceptor tropism. For that, the ten samples with known virologic response at week 12 (Table 1) were selected to compare the results from phenotypic and genotypic (population and deep sequencing) HIV-1 tropism assays, the latter using four different algorithms to predict HIV-1 coreceptor usage. Minority non-R5 variants were detected at comparable levels with a few exceptions mainly linked to the algorithm used to infer HIV-1 tropism rather than the NGS platform. For example, PacBio® (samples 10–91, 10–176, and 10–180; 11/24/25 rule) and Ion Torrent™ (samples 10–180 and 10–69; 11/24/25 rule and Geno2Pheno, respectively) detected a higher frequency of non-R5 variants than the other NGS systems (Fig. 8).
Interestingly, only the two samples carrying almost exclusively X4 viruses (10–80 and 10–172) were classified as non-R5 by all four algorithms (i.e., 11/24/25, Geno2Pheno 3.5% FPR, Geno2Pheno 10% FPR, and PSSM) based on a frequency of non-R5 variants ≥2% within the viral population. The most stringent Geno2Pheno 3.5% FPR failed to call one sample determined as D/M by Trofile (10–176) and specimens from two patients with virologic failure at week 12 (10–65 and 10–73). Not surprisingly, Geno2Pheno 10% FPR was able to call two of these samples as non-R5 (10–65 and 10–176) but also classified a patient with virologic success (10–180) as carrying non-R5 variants (Fig. 8). Finally, despite the limited sample number, prediction of HIV-1 coreceptor usage by all four NGS platforms showed similar concordance with the virologic response at week 12, ranging from 75% to 80% (kappa coefficients of 0.5 to 0.6, P<0.001) depending on the algorithm used, compared to Trofile (80%, kappa coefficient of 0.6) and population sequencing (mean 70%, kappa coefficient 0.4).
The use of CCR5 antagonists to block HIV-1 replication has accelerated the development of HIV-1 coreceptor tropism assays and stressed the need for novel, sensitive, and more affordable tests to increase treatment with this drug class. Although Trofile is the most commonly used phenotypic assay for HIV-1 coreceptor tropism, less sensitive genotypic tests based on HIV-1 population (Sanger) sequencing are frequently used in Europe, leading to the rapid adoption of deep sequencing technologies in genotypic HIV-1 coreceptor tropism protocols. Based on the need for these NGS-based genotypic assays, we have compared the ability of four NGS platforms (454™, Illumina®, PacBio®, and Ion Torrent™) to detect minority variants, and to infer the presence of non-R5 viruses within the HIV-1 population.
Next generation sequencing has been used in a multitude of biological fields, from the sequencing of whole genomes of animals, plants, and microbes, to targeted studies on polymorphisms related to various genetic disorders and cancer, most of them based on Illumina® and 454™ platforms and more recently using PacBio® and Ion Torrent™ systems. To date all published HIV-related studies have used the 454™ platform, due in part to being one of the first NGS systems to provide longer read lengths. As expected, each deep sequencing platform differs in terms of the chemistry, read length, yield, error rate, turn-around time, and overall cost. Here, we sequenced the HIV-1 V3 region from the same RNA aliquots obtained from 12 patients and showed that all four NGS platforms had similar substitution and insertion rates (ranging from 1.3 to 1.8 and 1.7 to 2.4 mean #/read, respectively), while Illumina® had the fewest deletions per read (0.01 versus a range of 0.2–0.5 mean #/read for the other three platforms). This is consistent with the reduced number of indels reported for Illumina® when compared with 454™ during the genome sequencing of Gallus gallus or influenza virus and the sequencing of a strain of Escherichia coli using 454™ and Ion Torrent™.
We observed differences in the number of V3 sequences and mean read length among the NGS platforms, which were due to both the size of the PCR product selected for sequencing and intrinsic characteristics of the sequencing method. Short amplicons (337 nt) containing the V3 region were sequenced using 454™ and PacBio®; however, library preparation and shotgun sequencing were performed on the entire PCR-amplified env gene (2,302 nt) using Illumina® and Ion Torrent™. In addition, the 454™ sequencing was performed in-house using barcoded sequencing primers while for Illumina®, PacBio®, and Ion Torrent™ the amplicons were sent for sequencing at the respective company. Despite these differences, all NGS platforms were able to detect the same higher frequency variants but showed slight variations in the detection of low frequency variants (<0.5%), which had limited implications for HIV-1 tropism. It is important to note that based on HIV-1 clonal sequencing the error rate for in-house 454™ sequencing assays has been calculated to be between 0.1% and 0.5%; therefore, we only used variants present at ≥1% of the viral population for diversity and HIV-1 tropism analyses.
Multiple studies have compared the efficacy of phenotypic and genotypic HIV-1 tropism assays to detect non-R5 variants. In general, population-based sequencing tests are less sensitive and less specific than phenotypic assays, although a few studies have shown significant concordance and predictive values. More sensitive deep sequencing methods for HIV-1 coreceptor tropism assays resulted in the detection of minor variants, which correlated well with both phenotypic assays and virological response to maraviroc. Here we have shown that all four NGS platforms provide equal and sensitive detection of minority non-R5 viruses in 12 patients, with minor differences depending on the bioinformatic algorithm used to infer HIV-1 tropism. However, it is important to stress that maraviroc was combined with at least two other antiretroviral drugs and increased viral loads could be due to several factors including poor or selective drug adherence and resistance to other drugs while maintaining partial maraviroc suppression. Nevertheless, all four NGS platforms showed similar concordance with virologic response at week 12 (ranging from 75% to 80% depending on the algorithm used), compared to Trofile (80%) and population sequencing (70%). Despite the limited number of samples, these results are comparable to previous studies where deep sequencing had a good concordance with phenotypic HIV-1 tropism tests (82% to 87%) and matched Trofile™ in predicting the success of maraviroc-based antiretroviral regimens.
In conclusion, this is the first study comparing the ability of the four current leaders in deep sequencing (454™, Illumina®, PacBio®, and Ion Torrent™) to detect minority variants, and to infer the presence of non-R5 viruses, within the HIV-1 population. Despite minor differences in error rates and profiles (types of errors), all four NGS platforms successfully detected the same unique viral variants present at high frequencies, which are the sequences relevant for the clinical determination of HIV-1 coreceptor tropism. Further studies with a larger number of patients and the latest chemistry and software for each NGS system will be needed to corroborate our findings; however, despite parameters intrinsic to each NGS platform (e.g., read length, error rates, cost per run, and turn-around time), we suspect that any of the current NGS platforms will be effective in a genotypic test to predict HIV-1 coreceptor usage.
We thank Dr. Vicente Soriano and Dr. Eva Poveda (Hospital Carlos III, Madrid, Spain) for providing the clinical samples. We also thank Dr. James F. Demarest (ViiV Healthcare, Research Triangle Park, NC), Dr. Robert Burnside (Pfizer, New London/Norwich, CT), and Laura Napolitano (Monogram Biosciences, South San Francisco, CA) for providing clinical data and statistical support. We are grateful to Dr. Hendrik Poinar (McMaster University, Hamilton, Canada) for his support and access to his laboratory to perform the 454 sequencing runs. We thank Christine L. Schirmer (Genetics Core, Arizona Research Laboratories, University of Arizona, Tucson, AZ) for preparing the DNA libraries used with the Ion Torrent NGS platform.
JA is funded by a project grant BBSRC (BB/H012419/1) to DLR. JW is funded by a project grant from the Ministry of Education, Youth and Sports of the Czech Republic (LK11207). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.