Human genetic variation is a determinant of recovery from an acute hepatitis C virus (HCV) infection, but, to date, single nucleotide polymorphisms (SNPs) in a limited number of genes have been studied with respect to HCV clearance. We determined whether SNPs in 112 selected immune-response genes are important for HCV clearance by genotyping 1536 SNPs in a cohort of 343 persons with natural HCV clearance and 547 persons with HCV persistence. PLINK and Haploview software packages were used to perform association, permutation, and haplotype analyses stratified by African-American (AA) and European-American (EA) race. Of the 1536 SNPs tested, 1426 were successfully genotyped (92.8%). In AAs, we identified 18 SNPs located in 11 gene regions that were associated with HCV outcome (empirical p-value < 0.01). In EAs, there were 20 SNPs located in eight gene regions associated with HCV outcome. Four of the gene regions studied (TNFSF18, TANK, HAVCR1 and IL18BP) contained SNPs with empirical p-values < 0.01 in both of the race groups.
In this large-scale analysis of 1426 genotyped SNPs in 112 candidate genes, we identified four gene regions that are likely candidates for a role in HCV clearance or persistence in both AAs and EAs.
Hepatitis C virus (HCV) infection results in viral persistence in the majority of cases, but approximately 30% of individuals mount an immune response that successfully eliminates infection (1, 2). The factors required for generating this effective immune response are largely unknown but such information could lead to the development of more effective therapeutics and vaccinations. Since host genetic variation is a determinant of the immune response, comparing host genetic differences in immune response genes between those with HCV clearance and persistence could provide clues to these factors. Several studies have demonstrated genetic differences in immune response genes in those who have mounted a successful HCV-specific immune response compared to those who did not (3–7). These studies have used a candidate gene approach in which polymorphisms in one or a few genes are studied for an association with outcome. Since HCV clearance is likely to be polygenic, i.e. involving many genes, studying a few genes at a time limits the number of discoveries. Alternative approaches are now available to screen a large number of genes rapidly using small quantities of DNA.
One approach is to perform a genome-wide scan where single nucleotide polymorphisms (SNPs) at regular intervals through the entire genome are examined. The advantages of this approach are that a priori knowledge about pathogenesis is not needed since candidate genes do not need to be specified and that selection bias in gene selection does not occur. The disadvantage is that it is expensive and that SNP coverage within each gene is more limited, so associations within important candidates may be missed. The costs of such an approach can be decreased by studying pools of people, but then the ability to construct haplotypes, which are groups of SNPs that are in the same chromosomal region and inherited together, is lost. The second approach is to test many candidate genes simultaneously incorporating large-scale genotyping technologies to examine SNPs. The hits from this large-scale candidate gene approach can focus the gene or gene region for further exploration.
In order to determine whether selected immune-response genes are important for HCV clearance, we used the candidate gene approach and large-scale genotyping methods to test 1536 variants in 112 candidate immune-response genes throughout the genome. These genes were tested in a consortial cohort with well-defined HCV clearance and persistence.
The subjects in the cohort were participants in one of the following studies: (i) AIDS Link to Intravenous Experience (ALIVE) study (300 subjects), which is an ongoing study of 2,921 injection drug users enrolled in Baltimore, MD, from February 1988 to March 1989, as previously described (8); (ii) Multicenter Hemophilia Cohort Study (MHCS) (320 subjects), which is a prospectively followed cohort of patients with hemophilia, von Willebrand disease, or a related coagulation disorder from 16 comprehensive hemophilia treatment centers enrolled between 1982 and 1986, as previously described (9); (iii) a cohort of blood donors throughout the United States (85 subjects); (iv) an HCV clinic cohort in Portland, Oregon (11 subjects); (v) Women’s Interagency Health Study (WIHS) (60 subjects), which is a cohort of female injection drug users, described previously (10); and (vi) Hemophilia Growth and Development Study (HGDS) (115 subjects), which is a continuing study of 333 children and adolescents with hemophilia, von Willebrand disease, or other coagulation disorder enrolled between March 1989 and May 1990, as previously described (11).
Prior HCV infection was established by detection of HCV antibody (anti-HCV) by enzyme immunoassay (EIA) and recombinant immunoblot assay (RIBA). Individuals with HCV recovery had anti-HCV (confirmed by RIBA) and undetectable HCV RNA in serum or plasma without any HCV therapy. Persistently infected individuals had anti-HCV and HCV RNA in serum or plasma prior to any HCV therapy. For each individual with HCV clearance, we matched one or two individuals with HCV persistence from the same cohort based on ethnicity and gender, with the exception of the HCV clinic cohort from Portland, in which all subjects had HCV clearance.
Informed consent for genetic testing was obtained from all participants and the study was approved by the institutional review boards at all participating institutions.
All serum or plasma specimens were stored at −70 °C until testing. HIV type 1 (HIV-1) antibody testing was done by EIA with reactive results confirmed as positive by Western blotting as previously reported (8, 9, 12, 13). Anti-HCV testing was done by Ortho HCV 2.0 or 3.0 EIA (Ortho Diagnostic Systems, Raritan, N.J.). HCV RNA was assessed by a branched DNA (bDNA) assay (Quantiplex HCV RNA 2.0 assay; Chiron Corporation, Emeryville, CA), qualitative HCV COBAS AMPLICOR system (COBAS AMPLICOR HCV; Roche Diagnostics, Branchburg, N.J.) or by transcription-mediated amplification (Novartis, Emeryville and Gen Probe, San Diego, CA). Those subjects with a sample below the limit of detection by the bDNA assay (potential subjects with HCV recovery) had another sample tested with the qualitative COBAS, and their antibody status was confirmed by RIBA (RIBA 3.0) (Novartis). All assays were performed according to the manufacturer’s specifications.
The 112 candidate genes selected for this study were either known or hypothesized to be involved in the immune response to HCV (Table 1). A Scientific Advisory Board selected candidate genes based on their known or suspected role in the pathogenesis of HCV clearance (see acknowledgement). A gene region, which was defined as the candidate gene plus 20 kb of flanking sequence both 5’ and 3’ to encompass regulatory regions, was determined for each candidate gene. If the flanking sequence from one candidate gene contained another candidate gene, the two gene regions were combined into one large gene region. Since the flanking sequences were large, a gene region often contained genes not selected specifically for this study. We initially selected all Phase I HapMap (www.hapmap.org) SNPs in each of these gene regions. The goal of the first phase of the International HapMap Consortium was to genotype a common SNP every 5 kb (14). Since our study population was comprised of both European-American (EA) and African-American (AA) individuals, allele frequencies from HapMap for both the Yoruba in Ibadan, Nigeria (YRI) and the Utah residents with ancestry from northern and western Europe (CEU) datasets were considered during SNP selection. If a SNP was in complete linkage disequilibrium (D’ = 1) with another SNP being tested in both the CEU and YRI datasets, only one was included. Only SNPs with minor allele frequencies (MAF) ≥ 5% in both the CEU and YRI datasets were considered.
Variants in each selected SNP were determined using the Illumina platform at the Johns Hopkins Genetic Resources Core Facility (JHGRCF). As a quality control measure, 20 known duplicates provided by JHGRCF and 20 blind duplicate samples from our test set were included. The JHGRCF did not report any data from samples that failed genotyping or SNPs that had: 1) poorly defined clusters, 2) excessive replicate errors, 3) more than 50% missing data, or 4) all heterozygote genotype calls. For each SNP, the JHGRCF reported individual raw data as well as allele frequencies.
Genotype quality analysis and filtering were done using PLINK. SNPs were excluded from statistical analysis if they had 1) a minor allele frequency (MAF) < 5%, 2) > 10% missing genotypes, or 3) deviation from Hardy-Weinberg equilibrium (HWE) (P-value < 0.001).
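The three exclusion criteria can be illustrated with a minimal sketch. This is not the PLINK implementation: the function names and the genotype coding (0, 1 or 2 copies of one allele, `None` for a missing call) are assumptions made for illustration, and the Hardy-Weinberg check uses a simple 1-df chi-squared test, which may differ from the test PLINK applies.

```python
from math import erfc, sqrt

def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-squared test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p                              # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))

def passes_qc(genotypes, min_maf=0.05, max_missing=0.10, min_hwe_p=0.001):
    """Apply the three filters to one SNP.

    genotypes: list of 0, 1 or 2 (copies of allele B), or None if missing.
    Thresholds default to the values quoted in the text.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False
    missing_rate = 1 - len(called) / len(genotypes)
    if missing_rate > max_missing:
        return False
    freq_b = sum(called) / (2 * len(called))
    if min(freq_b, 1 - freq_b) < min_maf:
        return False
    counts = (called.count(0), called.count(1), called.count(2))
    return hwe_p_value(*counts) >= min_hwe_p
```

For example, a SNP with genotype counts 25/50/25 passes all three filters, whereas a SNP called heterozygous in every sample fails the Hardy-Weinberg check.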
Population stratification is the presence of multiple ancestry groups in a study population that can lead to type I or type II errors due to inherent allele frequency differences. We used the genomic control (GC) (15) procedure to correct for population stratification. This method is based on the idea that under population stratification, the distribution of observed chi-squared test statistics from association tests will be inflated by a genomic inflation factor (λ) compared to the true chi-squared distribution under the null hypothesis. λ is calculated by dividing the median of the observed chi-squared distribution by the median of the expected chi-squared distribution (0.456). A λ of 1.0 or less suggests there is no inflation due to stratification, so no correction is made. Values greater than 1.0 suggest stratification is present and GC is applied by dividing the observed test statistics by λ. In candidate gene studies, λ can be calculated using a set of unlinked SNPs that are not related to the disease of interest. However, since we did not choose SNPs specific for GC analysis, we used an alternative method to calculate λ. In this method, we started with the set of all successfully genotyped SNPs. We dropped SNPs if they were in linkage disequilibrium with another SNP in the set (r² > 0.5) or if they had an association test P-value < 0.05. These two groups of SNPs were not included because they would artificially inflate λ. This procedure was done for both the EA and AA data, resulting in a λ for each group. Individuals self-identified their ethnicity. This self-identification was accurate as determined by a multidimensional scaling (MDS) analysis using the software program PLINK version 1.05 (http://pngu.mgh.harvard.edu/~purcell/plink/) (16) (Figure 1). Thus, samples self-identified as EA or AA were analyzed separately. Samples that did not self-identify as one of these two race groups were not included in this analysis (n=91).
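The genomic control calculation described above reduces to a few lines. The sketch below is illustrative only; the function names are hypothetical, and the input is assumed to be the 1-df chi-squared statistics of the SNPs retained after dropping linked and nominally associated SNPs.

```python
from statistics import median

# Median of the expected 1-df chi-squared distribution, as quoted in the text.
EXPECTED_MEDIAN_CHI2 = 0.456

def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median over expected median."""
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2

def apply_genomic_control(chi2_stats, lam):
    """Divide test statistics by lambda, but only when lambda exceeds 1.0."""
    if lam <= 1.0:
        return list(chi2_stats)
    return [x / lam for x in chi2_stats]
```

With a λ of 0.931, as reported for the EA group, `apply_genomic_control` returns the statistics unchanged.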
Statistical analysis was performed in the software package PLINK. The allele frequencies of HCV clearance and persistence subjects were compared and the odds of clearing HCV infection were calculated for each SNP. Linkage disequilibrium (LD) in a genetic region was calculated for SNPs with allelic association test P-values < 0.01. SNPs in LD (r² > 0.5) were grouped together as one signal.
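As an illustration of the allelic comparison, the following sketch computes the odds ratio and a 1-df chi-squared test from a 2 × 2 table of allele counts. It is a generic textbook calculation under assumed inputs, not PLINK's code; the function and argument names are hypothetical, and the allele counts in the usage note are invented for the example.

```python
from math import erfc, sqrt

def allelic_test(clear_minor, clear_major, persist_minor, persist_major):
    """Allelic association from allele counts in the two outcome groups.

    Returns (odds_ratio, chi2, p_value). The odds ratio is the odds of
    carrying the minor allele among clearance subjects relative to
    persistence subjects. Assumes all four counts are non-zero.
    """
    a, b, c, d = clear_minor, clear_major, persist_minor, persist_major
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))   # chi-squared survival function, 1 df
    return odds_ratio, chi2, p_value
```

For invented counts of 60 minor and 40 major alleles in the clearance group and 40 minor and 60 major alleles in the persistence group, the odds ratio is 2.25 and the chi-squared statistic is 8.0.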
Haplotype blocks were defined for all gene regions containing at least one SNP with an allelic association test p-value < 0.05. Haploview (http://www.broad.mit.edu/haploview/haploview) (17) was used to define the haplotype blocks using the “Solid spine of LD” algorithm. This definition requires all SNPs within the haplotype block to be in strong LD (D’>0.8) with the first and last SNP in the block. Haplotypes were inferred in Haploview using an E-M algorithm.
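The two LD measures used in this study (D’ and r²) and the block criterion as stated above can be sketched as follows. This is a simplified illustration, not Haploview's implementation: the function names are hypothetical, haplotype frequencies are assumed to be known rather than inferred by E-M, and the spine check encodes only the rule described in the text.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2

def is_solid_spine(d_prime, first, last, threshold=0.8):
    """Check that every SNP between `first` and `last` is in strong LD with
    both end SNPs, and that the two ends are in strong LD with each other.

    d_prime: symmetric matrix (list of lists) of pairwise D' values.
    """
    ends_linked = d_prime[first][last] > threshold
    interior_linked = all(
        d_prime[first][k] > threshold and d_prime[k][last] > threshold
        for k in range(first + 1, last)
    )
    return ends_linked and interior_linked
```

The two measures can diverge: with a haplotype frequency of 0.2 and allele frequencies of 0.2 and 0.5, D’ is 1 while r² is only 0.25, which is why a pair of SNPs can be in complete LD by D’ yet fall below an r² threshold of 0.5.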
To control the Type 1 error, we performed 25,000 permutations for every SNP. Permutation testing calculates an empirical p-value by determining how frequently the association identified would occur by chance. For each permutation, the case-control labels (phenotype) are shuffled and the maximum chi-squared test statistic observed is compared to the experimental test-statistics for each SNP. An empirical p-value is calculated that provides a pointwise estimate for the significance of each SNP. Empirical p-values were calculated for every allelic association and p-values <0.01 were considered statistically significant. In addition, for the gene regions with at least one SNP meeting the pointwise p-value threshold of 0.01 for statistical significance in both populations, we calculated a gene region specific empirical p-value that accounted for the intra-correlation of SNPs and LD structure within the candidate gene and potential Type 1 error (18). We performed these permutations using the max(T) permutation procedure in PLINK.
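A label-shuffling permutation test of this kind can be sketched in plain Python. The code below is an illustrative approximation and not PLINK's max(T) implementation: the function names, the genotype coding (0, 1 or 2 copies of one allele, no missing calls), and the (hits + 1) / (permutations + 1) estimator are assumptions. It returns both a pointwise empirical p-value for each SNP and a p-value corrected against the maximum statistic across SNPs in each permutation.

```python
import random

def allelic_chi2(genotypes, phenotypes):
    """1-df allelic chi-squared; phenotypes are 1 (case) or 0 (control)."""
    case_b = sum(g for g, y in zip(genotypes, phenotypes) if y == 1)
    ctrl_b = sum(g for g, y in zip(genotypes, phenotypes) if y == 0)
    n_case = sum(phenotypes)
    n_ctrl = len(phenotypes) - n_case
    case_a = 2 * n_case - case_b
    ctrl_a = 2 * n_ctrl - ctrl_b
    denom = ((case_a + case_b) * (ctrl_a + ctrl_b)
             * (case_a + ctrl_a) * (case_b + ctrl_b))
    if denom == 0:
        return 0.0
    n = 2 * len(phenotypes)
    return n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom

def permutation_p_values(snps, phenotypes, n_perm=1000, seed=0):
    """Return (pointwise, max_corrected) empirical p-values, one per SNP.

    snps: list of genotype lists, one list per SNP, aligned with phenotypes.
    """
    rng = random.Random(seed)
    observed = [allelic_chi2(g, phenotypes) for g in snps]
    pointwise_hits = [0] * len(snps)
    max_hits = [0] * len(snps)
    labels = list(phenotypes)
    for _ in range(n_perm):
        rng.shuffle(labels)                      # break genotype-outcome link
        perm = [allelic_chi2(g, labels) for g in snps]
        perm_max = max(perm)
        for i, obs in enumerate(observed):
            pointwise_hits[i] += perm[i] >= obs  # same SNP, shuffled labels
            max_hits[i] += perm_max >= obs       # best SNP, shuffled labels
    pointwise = [(h + 1) / (n_perm + 1) for h in pointwise_hits]
    corrected = [(h + 1) / (n_perm + 1) for h in max_hits]
    return pointwise, corrected
```

The study used 25,000 permutations; the default of 1,000 here only keeps the example quick to run. Because the maximum statistic in a permutation is never smaller than the statistic for any single SNP, the corrected p-value is always at least as large as the pointwise one.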
Power calculations were performed using the statistical program Quanto (http://hydra.usc.edu/gxe/). Assuming a population-wide HCV clearance rate of 30%, this study had 80% power to detect an associated SNP with a frequency of 20% and a relative risk of >1.5 when the analysis was not stratified by race. For a race-stratified analysis, there was 80% power to detect an associated SNP with a frequency of 20% and a relative risk of >1.7.
There were 352 individuals included in the AA group, of whom 133 (37.8%) had HCV clearance and 219 (62.2%) had HCV persistence. There were 441 individuals included in the EA group, of whom 169 (38.3%) had HCV clearance and 272 (61.7%) had HCV persistence (Table 2). The 91 individuals in the ‘Other’ group were not used in this analysis because the racial background of this group was very diverse, including Hispanic, Asian and Native American individuals, leaving limited power for evaluation of these ethnic groups.
Of the 1536 SNPs tested, 110 were excluded because they did not reach the standards of JHGRCF quality control, leaving 1426 SNPs that were successfully genotyped (92.8%). Of the 890 individuals, 6 (0.7%) individuals had repeat genotyping failures and were removed from the dataset by the JHGRCF for poor DNA quality. The final data set generated by JHGRCF had a missing data rate of 0.074%. The error rate for the known and blinded duplicates was 0%.
Prior to analysis, 98 (7%) SNPs from the AA group and 258 (18%) from the EA group were removed because the minor allele frequency was < 5%. An additional 5 SNPs in the AA group and 2 SNPs in the EA group were eliminated because they deviated from Hardy-Weinberg equilibrium.
Population stratification was assessed using the genomic control method in the AA (n=818 SNPs) and EA (n=526 SNPs) groups separately. The λ values were 1.01 and 0.931 for AA and EA, respectively, suggesting no population stratification in either group.
Allelic association in the AA population identified 18 SNPs in 11 gene regions with empirical p-values < 0.01 (Supplementary Table 1). The same test in the EA samples resulted in 20 SNPs in eight gene regions with empirical p-values < 0.01 (Supplementary Table 2). Since many of these SNPs were in linkage disequilibrium within the same gene, we reduced the number of unique signals to 10 SNPs in AAs and 8 SNPs in EAs (Table 3). Interestingly, there were 4 gene regions, TNFSF18, TANK, HAVCR1 and IL18BP, which contained SNPs significantly associated with HCV outcome (empirical p-value < 0.01) in both the AAs and EAs. However, the specific SNPs within these genes that were associated differed between the racial groups (Table 4).
Haplotypes were generated for AAs and EAs for each gene region with at least one SNP with p-value < 0.05. None of these haplotypes were more significantly associated with outcome than the individual SNPs tested (data not shown).
In this large-scale study, over 1400 SNPs in 112 gene regions that are candidates for playing a role in HCV clearance were genotyped. We found 18 SNPs in African-Americans and 20 SNPs in European-Americans with an empirical p < 0.01. Interestingly, we identified 4 gene regions that had an empirical p-value < 0.01 in both of the major races studied: TNFSF18, TANK, HAVCR1 and IL18BP. Two of the SNPs tested, HAVCR1 (rs1553316) and IL18BP (rs5743673), are coding SNPs; however, neither is known to have functional consequences. The replication of four gene regions in two independent populations is encouraging and suggests that these gene regions should be considered leading candidates for a role in HCV clearance. Although the exact SNPs were not necessarily replicated in each population, this may be due to differences in allele frequencies, LD structure, or true allelic heterogeneity.
The TNFSF18 (Tumor Necrosis Factor (ligand) Superfamily, member 18; also known as GITRL) gene region is found on chromosome 1. TNFSF18 is expressed on CD4+CD28+ Regulatory T-cells (TRegs). TRegs can suppress other immune responses, providing negative feedback on the immune system and preventing autoimmune responses. The binding of TNFSF18 to its receptor results in a down-regulation of TReg regulatory function and thus can lead to an increase in immune response, which would be favorable for HCV clearance (19).
The TANK (TRAF family member-associated NFKB activator) region is located on chromosome 2. TANK has been found to be important in type 1 interferon production through its interaction with both the RIG-I and toll-like receptor dependent (TLR) pathways (20), both of which are important in the innate immune response to HCV. TANK also plays a role in inducing a cellular response to tumor necrosis factor-alpha (21), and it has been described as an adaptor protein that is required for IRF3 activation (22). Thus, if a SNP alters the function of TANK, then either the innate or adaptive immune response to HCV could be affected.
The HAVCR1 (Hepatitis A Virus Cellular Receptor 1; also known as TIM1) gene region is found on chromosome 5. It belongs to a family of cell surface glycoproteins and appears to act as a costimulatory molecule in vitro, leading to enhancement of T cell proliferation as well as Th1 and Th2 cytokine production. Interestingly, polymorphisms in HAVCR1, including a six-amino-acid insertion at residue 157 (157insMTTTVP), are linked to asthma and autoimmune diseases, suggesting that these variants may affect HAVCR1 function (23). Thus, it is also possible that such functional variants could alter the immune response to HCV.
The IL18BP (Interleukin-18 Binding Protein) gene region is found on chromosome 11. IL18BP is a secreted protein that can bind to and neutralize IL18, which prevents IL18-induced IFN-gamma production (24). Polymorphisms in both IFN-gamma and IL18 have been implicated in HCV infection outcome (25, 26), and IL18 is upregulated in persons with chronic HCV infection (27). It is possible that variants in IL18BP could affect the activity or production of IL18 and IFN-gamma, altering HCV outcome.
In addition to these four gene regions, one of the top-scoring SNPs in the EA group, rs1804027, was also significant in a study (listed as IMS-JST013416) investigating natural clearance of HCV in a Japanese population (28). This SNP results in a non-synonymous mutation in nuclear body protein SP110. The function of SP110 has not been well described, but it has been shown that HCV core protein can bind an isoform of SP110, SP110b, which results in the activation of Retinoic Acid Receptor (RARα)-mediated transcription (29).
It is important to consider the limitations of this study when interpreting the results. First, the size of the study makes it difficult to detect weak associations in frequent polymorphisms and any associations in rare variants. Second, deletion or insertion polymorphisms that may alter function are unlikely to be discovered unless they are tightly linked to one of the tested SNPs. Third, SNPs were selected for coverage of genes and not for specific function; therefore, this study was not designed to identify causal alleles, but rather genes that may influence HCV clearance. Fourth, this study included many of the leading candidate gene regions potentially associated with HCV clearance at the time it was designed. However, since it was not intended to be an exhaustive survey of all interesting gene regions, additional studies based on these data should also consider the most recent data in HCV pathogenesis and include other relevant gene regions. For example, recently we and others reported a polymorphism in IL28B associated with HCV clearance and treatment response (30, 31). Lastly, epistatic interactions between variants in different genes (such as ligand-receptor pairs) were not considered, because this study did not have enough power for such a large number of comparisons. Such interactions can be important in HCV pathogenesis, as has been demonstrated for HLA and KIR genes (3).
By providing data on over 1400 SNPs in 112 candidate gene regions for HCV clearance or persistence, this study is an important first step since it reveals SNPs in four gene regions that warrant further investigation as a possible genetic basis for the natural clearance of HCV in multiple populations. Furthermore, this study provides the stimulus for confirmatory studies of our top scoring SNPs in other large, independent cohorts in order to determine the causal gene regions involved in the outcome of an acute HCV infection. These gene regions then need to be further dissected in order to determine the specific polymorphisms involved in HCV clearance.
We would like to thank the Scientific Advisory Board including: Steven O’Brien, Robert Lanford, Stanley Lemon, Hugo Rosen, Christopher Karp, Michael Gale and Mary Carrington.
Funding: NIH-R01-DA13324, NHLBI-R01HL076902, NIH, National Institute of Child Health and Human Development grant 1 R01-HD41224, NIH-R01-DA0334, NIH-R01-DA12568, NIAID (UO1-AI-35004, UO1-AI-31834, UO1-AI-34994, UO1-AI-34989, UO1-AI-34993, and UO1-AI-42590), NICHHD (UO1-HD-32632), and National Cancer Institute contract N02-CP-91027 with RTI International. Also supported in part by the Intramural Research Program, National Institutes of Health.
Conflict of interest statement:
None of the authors have a commercial or other association that would pose a conflict of interest with this study.