The genomic region surrounding the TNF locus on human chromosome 6 has previously been associated with typhoid fever in Vietnam. We used a haplotypic approach to understand this association further. Eighty single nucleotide polymorphisms (SNPs) spanning a 150 kb region were genotyped in 95 Vietnamese individuals (typhoid case/mother/father trios). A subset of data from 33 SNPs with a minor allele frequency of >4.3% was used to construct haplotypes. Fifteen SNPs, which tagged the 42 constructed haplotypes, were selected. The haplotype tagging SNPs (T1-T15) were genotyped in 380 confirmed typhoid cases and 380 ethnically matched Vietnamese controls. Allelic frequencies of seven SNPs (T1, T2, T3, T5, T6, T7, T8) were significantly different between typhoid cases and controls. Logistic regression results support the hypothesis that there is just one signal associated with disease at this locus. Haplotype-based analysis of the tag SNPs provided positive evidence of association with typhoid (posterior probability 0.821). The analysis highlighted a low-risk cluster of haplotypes that each carry the minor allele of T1 or T7, but not both, and otherwise carry the combination of alleles *12122*1111 at T1-T11, further supporting the one associated signal hypothesis. Finally, individuals who carry the typhoid fever protective haplotype *12122*1111 also produce a relatively low TNF-α response to LPS.
Typhoid fever is a human-specific systemic disease caused by infection with Salmonella enterica serotype Typhi (Parry et al. 2002). It is estimated that 22 million cases of typhoid fever occur worldwide per year, resulting in 200,000 deaths (Crump et al. 2004). There is a significant burden of disease in developing countries where sanitary conditions can be inadequate. In southern Vietnam, typhoid fever is the major cause of community-acquired septicemia (Hoa et al. 1998). Recent community-based surveillance of disease prevalence reported incidence rates of 198 per 100,000 in the Mekong Delta, Vietnam (Lin et al. 2000). The 1990s saw the development and spread of multidrug-resistant strains of S. Typhi in southern Vietnam. With approximately 90% of S. Typhi isolates now multidrug resistant, the potential exists for a return to the pre-antibiotic era and untreatable typhoid fever. It is possible that the future control of typhoid fever may lie in alternative treatments or preventative measures to augment or replace existing therapies. Identification of typhoid fever susceptibility or resistance genes provides insight into the host-pathogen interaction and disease mechanisms, which may ultimately contribute to the development of new therapies.
The genomic region surrounding the TNF locus on human chromosome 6 has previously been associated with typhoid fever. We identified haplotypes that were either protective (TNFA*1.DRB1*04) or predisposed individuals to typhoid fever (TNFA*2.DRB1*0301) (Dunstan et al. 2001). In addition, a study in Indonesia suggested a protective role of DRB1*12021 for complicated typhoid fever (Dharmana et al. 2002). This genomic region encoding the major histocompatibility complex (MHC) is gene rich with a large number of genes related to immunity and inflammation (The MHC sequencing Consortium 1999). Therefore it is difficult to pinpoint the causal basis of a single SNP association with disease as a number of genes within this region could individually or collectively be responsible. Of these, the TNF gene, which encodes the pro-inflammatory cytokine TNF-α, is a strong candidate. Keuter et al. (1994) measured TNF-α levels in typhoid fever patients and found that the production of this cytokine was lower in the acute phase of the disease than in convalescence. Bhutta et al. (1997) have reported an association between circulating TNF-α levels and typhoid fever severity and more recently House et al. (2002) showed that low ex vivo production of TNF-α was associated with a delayed recovery. However, the typhoid associated TNFA -308 polymorphism may be behaving as a marker for the true causal polymorphism, which could be found within TNF or other genes in close physical or genetic proximity. To understand how an association between a TNF promoter polymorphism and typhoid arose it is necessary to first understand the haplotypic structure of the TNF region in the Vietnamese.
Investigating the genetic susceptibility to disease using a haplotypic approach is more powerful than genotyping individual genetic markers (Daly et al. 2001). The human genome can be divided into haplotype blocks, defined as sizeable regions of the genome with little evidence of historical recombination (Gabriel et al. 2002). Within these blocks only a small number of common haplotypes are observed (Gabriel et al. 2002). The potential of haplotype blocks to map human complex trait loci is being vigorously investigated and large-scale haplotype mapping projects in specific regions of the genome (Allcock et al. 2002), and throughout the genome, are underway (The International HapMap Project 2003). Once haplotype blocks for a genomic region are identified, the minimum number of SNPs that captures the most frequently occurring haplotypes can be determined (Johnson et al. 2001). Identification of these haplotype tagging SNPs (htSNPs) not only enables a significant reduction in genotyping but also allows a comprehensive and sensitive scan of the common variation within a genomic region.
Initial studies investigating haplotypic variation of the MHC region (Walsh et al. 2003) and more specifically in the MHC Class III region (Ackerman et al. 2003a, b) have been reported. Ackerman et al. (2003b) investigated the haplotypic structure of the TNF region, within MHC Class III, in a population of West Africans. Genotyping a small number of SNP markers (N = 25) over an 80 kb region, they found that linkage disequilibrium (LD) was remarkably heterogeneous and concluded that more detailed marker maps of the TNF region were needed when attempting to identify the causal basis of a genetic association with disease (Ackerman et al. 2003b).
In this study we aimed to define the haplotypic structure of the TNF region in a Vietnamese population, to identify the haplotype tagging SNPs of this region and to investigate how these individual SNPs and haplotypes may be associated with typhoid fever.
Genomic DNA from patients with typhoid fever was collected as part of larger epidemiologic or treatment studies. These studies were performed at the Hospital for Tropical Diseases in Ho Chi Minh City, Dong Thap Provincial Hospital in Dong Thap Province or Dong Nai Paediatric Center in Dong Nai Province. Venous blood (2 ml) was collected from 380 patients with blood culture positive typhoid fever admitted to one of the three hospitals. The samples and studies have been described previously (Chinh et al. 2000; Dunstan et al. 2001; Luxemburger et al. 2001; Phuong et al. 1999; Vinh et al. 2004). In addition, umbilical cord blood samples from babies born at Hung Vuong Hospital in Ho Chi Minh City were collected for use as controls.
To enable accurate construction of haplotypes we collected simplex families (case/parent trios). Venous blood was collected from patients with blood culture positive typhoid fever who were admitted to Dong Thap Provincial Hospital. Health care workers from Dong Thap Provincial Hospital then collected blood samples from both parents either in the hospital or during a home visit. In this study 31 case/parent trios (93 individuals) were analysed.
All case patients and control subjects were unrelated and were of the Vietnamese Kinh ethnicity. Informed consent was obtained from the individuals admitted into the study. Ethical approval was obtained from the ethical and scientific committee of the Hospital for Tropical Diseases, the Dong Thap Hospital and the Health services of Dong Thap Province and the institutional review board of Dong Nai Paediatric Center. Ethical approval was also granted by the Oxford Tropical Research Ethics Committee (OXTREC) of Oxford University, UK.
Genomic DNA from typhoid patients and their parents was extracted from approximately 2 ml of venous blood using either the blood midi kit from Qiagen (Qiagen, Lewes, UK) or the Nucleon BACC1 extraction kit (Nucleon Biosciences UK). For controls, genomic DNA was extracted from 10 ml of cord blood using the blood maxi kit from Qiagen (Qiagen, Lewes, UK). DNA concentration was determined by picogreen (Molecular Probes Invitrogen, Paisley, UK) using a Tecan fluorescent plate reader. Genomic DNA was amplified using primer extension pre-amplification (PEP) (Zhang et al. 1992).
High throughput genotyping was performed by allele-specific MALDI-TOF mass spectrometry using the Sequenom MassArray system. Briefly, a fragment of approximately 100 bp containing the SNP site was first amplified by PCR (Tetrad thermal cycler, MJ Research, Waltham, MA, USA). Multiplex PCR reactions (5 μl) were performed in a 384-well PCR plate by mixing 2 μl of PEP DNA (1:20 dilution) with 800 μM of dNTP, 1 × NH4 buffer, 2 mM MgCl2, 0.025 units of BioTaq (Bioline), and 0.2 μM of each primer. The cycling parameters were 96°C for 1 min, then 5 cycles of 94°C for 45 s, 56°C for 45 s and 72°C for 30 s, then 29 cycles of 94°C for 45 s, 65°C for 45 s and 72°C for 30 s, followed by a single final step of 72°C for 10 min. Following PCR, the unincorporated dNTPs were removed by treatment with shrimp alkaline phosphatase, the extension reaction was performed and the resulting products were desalted. Fifteen nanolitres of the reaction mixture were then “spotted” onto a SpectroCHIP. The CHIP was read in the Bruker Biflex III mass spectrometer system, and the data were analysed by SpectroTYPER. The data obtained for all typed SNPs were tested for Hardy-Weinberg equilibrium (HWE). All SNPs were in HWE (P > 0.05) when using Yates correction.
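As an illustration of the HWE check described above, the following minimal Python sketch computes a Yates-corrected χ2 goodness-of-fit statistic from the three genotype counts at one SNP. It is not the software used in the study; the function name hwe_chi2_yates and the example counts are hypothetical. The P value assumes a χ2 distribution with 1 degree of freedom, evaluated through the complementary error function.

```python
import math

def hwe_chi2_yates(n_aa, n_ab, n_bb):
    """Yates-corrected chi-square test for Hardy-Weinberg equilibrium.

    Takes the counts of the three genotypes (AA, AB, BB) and returns
    (chi2, p_value) for 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = 0.0
    for o, e in zip(observed, expected):
        if e > 0:
            # Yates correction: shrink each |O - E| by 0.5, never below zero
            chi2 += max(abs(o - e) - 0.5, 0.0) ** 2 / e
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical genotype counts, not taken from the study.
chi2, p_value = hwe_chi2_yates(25, 50, 25)
in_hwe = p_value > 0.05
```

Under the threshold used in the text, a SNP would be treated as consistent with HWE when the returned P value exceeds 0.05.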
Construction of haplotypes was performed using PHASE (Stephens et al. 2001) and PHAMILY (Ackerman et al. 2003a) software accessed via http://www.gmap.net/analysis.htm. The program PHAMILY reconstructs parental haplotypes where phase is unambiguous, and these data are entered into PHASE to increase the accuracy of haplotype reconstruction by using all available information. PHASE is an implementation of the Stephens-Donnelly method of haplotype construction, which uses a Bayesian approach to assign the remaining phase-unknown sites among the unrelated parents. The program HaploXT (Abecasis and Cookson 2000) was used to define haplotype structure by measuring LD between the SNPs. The standardized disequilibrium coefficient (D′) values generated by HaploXT can be visualized using the graphical display program Marker beta (http://www.gmap.net/marker/). Haplotype tagging SNPs were selected by the Entropy program using the big_haplotype algorithm with a moving window size of 33 SNPs (Ackerman et al. 2003a) (http://www.well.ox.ac.uk/~rmott/SNPS/). This program chooses a subset of markers that best approximates the haplotypic diversity in the population by identifying the marker subset with maximum entropy, that is, the entropy that is achieved when the complete set of markers is genotyped.
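The entropy criterion can be illustrated with a small Python sketch. This is not the Entropy program or its big_haplotype algorithm: it substitutes a simple greedy forward selection, assumes Shannon entropy measured in bits, and uses a hypothetical toy sample of phased haplotypes. The names entropy_bits and greedy_tags are illustrative only.

```python
import math
from collections import Counter

def entropy_bits(haplotypes, markers):
    """Shannon entropy (bits) of the haplotypes restricted to the given marker indices."""
    counts = Counter(tuple(h[m] for m in markers) for h in haplotypes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def greedy_tags(haplotypes):
    """Greedily add markers until the subset reaches the entropy of the full marker set.

    This is a stand-in heuristic, not the published selection algorithm.
    """
    n_markers = len(haplotypes[0])
    target = entropy_bits(haplotypes, range(n_markers))
    chosen = []
    while entropy_bits(haplotypes, chosen) < target - 1e-12:
        # Pick the unused marker that raises the entropy of the subset the most
        best = max(
            (m for m in range(n_markers) if m not in chosen),
            key=lambda m: entropy_bits(haplotypes, chosen + [m]),
        )
        chosen.append(best)
    return sorted(chosen), target

# Hypothetical sample of 8 chromosomes typed at 4 SNPs (alleles coded 1/2).
sample = ["1111", "1111", "1122", "1122", "2211", "2211", "2222", "2222"]
tags, full_entropy = greedy_tags(sample)
```

In this toy sample the first two SNPs carry identical information, as do the last two, so one marker from each pair is enough to distinguish all four haplotypes and recover the full-set entropy.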
Pearson’s χ2 test was used to test associations between disease phenotypes and allele or genotype frequencies. Yates correction for 1 degree of freedom was applied. Fisher’s exact test was used when an expected value in the contingency table was <5. P < 0.05 was considered significant. Step-wise logistic regression analysis was performed using SPSS for Windows 10.0.5 (SPSS Inc, Chicago, IL, USA). Stata/SE 8.0 (Stata Corporation, Texas, USA) was used in conjunction with a genetics-specific statistical package, genassoc, from http://www-gene.cimr.cam.ac.uk/clayton/software/stata.
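The allelic comparison described above can be sketched in Python as follows. The function name allelic_chi2_yates and the allele counts are hypothetical and are not study data; the sketch covers only the Yates-corrected χ2 test for a 2 × 2 table and does not implement Fisher's exact test for tables with small expected counts.

```python
import math

def allelic_chi2_yates(case_minor, case_major, ctrl_minor, ctrl_major):
    """Yates-corrected Pearson chi-square (1 df) and odds ratio for a 2x2 allele table."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    # Shortcut formula for a 2x2 table with continuity correction
    numerator = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Survival function of chi-square with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts for 380 cases and 380 controls (760 alleles each).
chi2, p_value, odds_ratio = allelic_chi2_yates(60, 700, 100, 660)
significant = p_value < 0.05
```

An odds ratio below 1 in this layout would indicate that the minor allele is less frequent in cases than in controls.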
The posterior probability of haplotype association with disease phenotypes was assessed using the GENEBPM algorithm (Morris 2005). The disease phenotype of each individual was modeled in a logistic regression framework, parameterized in terms of the odds of disease for each possible pair of haplotypes consistent with the observed SNP genotype, weighted by the corresponding phase assignment probabilities. A Bayesian partition model is utilized to cluster haplotypes according to their similarity, with each haplotype in the same cluster assigned the same odds of disease.
Ex vivo whole blood stimulation with Escherichia coli LPS was performed, and TNF-α cytokine levels were measured, according to the methods of House et al. (2002).
Through genomic sequencing and public database interrogation, approximately 200 SNPs were identified in a 150 kb segment of the MHC Class III region encompassing TNFA on chromosome 6 (Kwiatkowski et al., personal communication). Twelve genes span this region: MICB, BAT1 (UAP56), ATP6V1G2, NFKBIL1, LTA, TNF, LTB, LST1 (1C7), NCR3, AIF-1, BAT2, and BAT8 (Fig. 1). Genotyping these SNPs in individuals of Gambian and Caucasian ethnicity identified 80 SNPs with a minor allele frequency of >0.05 and an 80% genotyping success rate using the Sequenom MassArray (Kwiatkowski et al., personal communication). Figure 1 shows the location of the 80 SNPs in relation to the 12 genes spanning this 150 kb region.
The 80 SNPs were genotyped in 95 Vietnamese individuals: 31 case/mother/father trios (93) plus one additional mother/father pair (2). Of these 80 SNPs genotyped, 7 SNPs completely failed, 8 SNPs had a failure rate >20%, 21 SNPs were monomorphic and 9 SNPs had a minor allele frequency of <4%. Table 1 shows the SNP name, position and minor allele frequency of the 80 SNPs in 64 unrelated Vietnamese individuals (32 mother/father pairs). In total, 35 SNPs had a genotyping failure rate of <20% and a minor allele frequency >4.3%; however, data from 2 of these SNPs, UAP56*7126 and NFKBIL1*15811, were not analysed further as their chromosomal location was not confirmed. The genotypes of the remaining 33 SNPs in control individuals all displayed Hardy-Weinberg equilibrium (HWE; P > 0.05).
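The SNP counts reported above can be tallied with a short Python check. It assumes that the four exclusion categories are mutually exclusive, which is what the reported total of 35 implies; the variable names are illustrative.

```python
genotyped = 80
excluded = {
    "failed completely": 7,
    "failure rate > 20%": 8,
    "monomorphic": 21,
    "minor allele frequency < 4%": 9,
}
# SNPs with a failure rate < 20% and a minor allele frequency > 4.3%
passing_qc = genotyped - sum(excluded.values())
# UAP56*7126 and NFKBIL1*15811: chromosomal location not confirmed
unconfirmed_location = 2
used_for_haplotypes = passing_qc - unconfirmed_location

# 31 trios (3 individuals each) plus one additional mother/father pair
individuals = 31 * 3 + 2
unrelated_parents = 32 * 2
```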
Haplotype construction from data of 124 unrelated parental chromosomes (31 mother/father pairs) genotyped for 33 SNPs was performed using the PHASE (Stephens et al. 2001) and PHAMILY (Ackerman et al. 2003a) programs. Forty-two haplotypes with a frequency of 1% or greater were reconstructed from 33 SNPs (Table 2). Haplotypes 2, 5, 8 and 10 are considered common with frequencies greater than 5%, with haplotype 5 being the most common at a frequency of 17%. Twenty haplotypes were found with a frequency of 1%, 13 haplotypes with 2%, 5 haplotypes with 3% and 3 haplotypes with 6-10%. Analysis of this data using HaploXT (Abecasis and Cookson 2000) and the graphical display program Marker beta showed that the level of LD within this genomic region is high and this is particularly evident in the block from BAT1*11796 to 1C7*2708 (Fig. 2).
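For readers unfamiliar with the LD measure used here, the following Python sketch shows how the standardized disequilibrium coefficient D′ is computed for one pair of biallelic SNPs from phased haplotype counts. It is not HaploXT; the function name d_prime and the counts are hypothetical, and the value is reported as an absolute value between 0 and 1.

```python
def d_prime(haplotype_counts):
    """Lewontin's |D'| for two biallelic SNPs from phased haplotype counts.

    haplotype_counts maps two-character haplotypes ("11", "12", "21", "22")
    to counts, where the first character is the allele at SNP A and the
    second the allele at SNP B.
    """
    n = sum(haplotype_counts.values())
    n11 = haplotype_counts.get("11", 0)
    p11 = n11 / n
    p_a = (n11 + haplotype_counts.get("12", 0)) / n   # allele 1 frequency at SNP A
    p_b = (n11 + haplotype_counts.get("21", 0)) / n   # allele 1 frequency at SNP B
    d = p11 - p_a * p_b                               # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0

# Hypothetical counts among 100 chromosomes.
complete_ld = d_prime({"11": 50, "22": 50})
partial_ld = d_prime({"11": 40, "12": 10, "21": 10, "22": 40})
no_ld = d_prime({"11": 25, "12": 25, "21": 25, "22": 25})
```

A value of 1 indicates that at most three of the four possible haplotypes are observed, a pattern expected where there is little evidence of historical recombination between the two sites.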
The Entropy program was used to select haplotype tagging SNPs for the 42 haplotypes reconstructed from genotyping 33 SNPs in 124 chromosomes (Ackerman et al. 2003a). A minimum subset of 15 tagging SNPs that captured the 42 haplotypes, giving a maximum entropy of 4.72238, was identified (Fig. 1). The 33 SNPs used for haplotype construction and the 15 tagging SNPs are shown in Table 1.
Fifteen tagging SNPs (labelled T1-T15) plus one additional SNP were genotyped in 380 typhoid cases and 380 cord blood controls. A sample size of 380 cases and 380 controls is sufficient to detect relative risks of 2 or greater for all allele frequencies from 0.1 to 0.5 with at least 94% power at a significance threshold of P = 0.05. The genotypes of the 16 SNPs in control individuals all displayed HWE (P > 0.05). The allele frequencies, genotypes, allelic comparisons and genotypic comparisons are shown in Table 3. Allelic frequencies of seven SNPs (T1, T2, T3, T5, T6, T7, T8) and genotypic frequencies of six SNPs (T1, T3, T5, T6, T7, T8) were significantly different between typhoid cases and controls.
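A rough sense of how such a power figure arises can be obtained from the sketch below. It is not the calculation used in the study and is not expected to reproduce the 94% figure: it assumes an allelic odds ratio of 2 as a stand-in for the stated relative risk, a two-sided comparison of allele frequencies with a normal approximation, and two alleles per individual. The function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def allelic_power(n_cases, n_controls, control_maf, odds_ratio, z_alpha=1.959964):
    """Approximate power of a two-sided allelic test (normal approximation).

    z_alpha is the two-sided critical value for alpha = 0.05.
    """
    p0 = control_maf
    # Case allele frequency implied by the assumed allelic odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    m1, m0 = 2 * n_cases, 2 * n_controls          # alleles per group
    p_bar = (m1 * p1 + m0 * p0) / (m1 + m0)       # pooled allele frequency
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / m1 + 1 / m0))
    se_alt = math.sqrt(p1 * (1 - p1) / m1 + p0 * (1 - p0) / m0)
    # Probability of exceeding the critical value in the expected direction
    return normal_cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)

powers = {maf: allelic_power(380, 380, maf, 2.0) for maf in (0.1, 0.2, 0.3, 0.4, 0.5)}
```

A genotype-based or relative-risk-based calculation would give different values, so the dictionary above should be read only as an order-of-magnitude check under the stated assumptions.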
An additive model, as opposed to a dominant or recessive model, best represented the data as evidenced by the strong odds ratio (OR) generated when comparing allelic frequencies (data not shown). Therefore all data were recoded to represent an additive model for logistic regression. To establish whether association effects seen at different loci are independent we used multiple logistic regression analysis (Cordell and Clayton 2002). This approach can be used to test the null hypothesis of no association for each SNP, adjusted for the additive effects of all other SNPs (Table 4). For the model containing all 16 SNPs, only SNPs T1 and T7 were significant (P = 0.019 and P = 0.003, respectively). When SNP T1 was dropped from the model, T7 remained highly significant (P = 0.006). Similarly, SNP T1 remained significant (P = 0.029) when T7 was dropped from the model. However, when both of these SNPs were dropped from the model, T6 demonstrated significant evidence of association (P = 0.036). These results suggest that the effects of T1 and T7 are independent in terms of their association with disease, but that the effect of T6 is partially correlated with that of both T1 and T7.
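The recoding step can be illustrated with a minimal Python sketch showing how a single genotype becomes a numeric predictor under each genetic model. The function name encode_genotype and the allele labels are hypothetical; under the additive coding each copy of the minor allele contributes the same increment to the log odds in a logistic regression.

```python
def encode_genotype(genotype, minor_allele, model="additive"):
    """Code a genotype such as "12" as a numeric predictor under a genetic model."""
    copies = genotype.count(minor_allele)   # 0, 1 or 2 copies of the minor allele
    if model == "additive":
        return copies
    if model == "dominant":
        return int(copies >= 1)
    if model == "recessive":
        return int(copies == 2)
    raise ValueError(f"unknown model: {model}")

# Hypothetical genotypes at one SNP, with allele "2" taken as the minor allele.
genotypes = ["11", "12", "22"]
additive = [encode_genotype(g, "2") for g in genotypes]
dominant = [encode_genotype(g, "2", "dominant") for g in genotypes]
recessive = [encode_genotype(g, "2", "recessive") for g in genotypes]
```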
To take into account the high level of missing data when using all 16 SNPs (24%), we decided to put only the highly significant SNPs (T1, T6, T7) into the regression model, which reduced the missing data to 11.9%. This logistic regression analysis confirmed the initial findings: when T1, T6 and T7 were put into the regression, markers T1 and T7 were both significant (P = 0.003 and P = 0.003, respectively). These combined logistic regression results support the hypothesis that there is just one signal associated with disease at this locus. A model of this hypothesis is graphically represented in Fig. 3.
We utilized the GENEBPM algorithm (Morris 2005) to approximate the posterior probability of association of tag SNP haplotypes with disease to further investigate the pattern of results obtained from the single-point and multi-locus analyses. The posterior probability was estimated to be 0.821, compared to a prior probability of 0.5, representing positive evidence of an association of tag SNP haplotypes with disease.
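The strength of this evidence can be expressed as odds with a short calculation. The sketch below is plain arithmetic on the two probabilities reported above and does not involve the GENEBPM algorithm; the term Bayes factor is introduced here for illustration and is not used in the original analysis.

```python
def odds(probability):
    """Convert a probability into odds in favour of the event."""
    return probability / (1 - probability)

prior, posterior = 0.5, 0.821
posterior_odds = odds(posterior)              # about 4.6 to 1 in favour of association
bayes_factor = posterior_odds / odds(prior)   # equals the posterior odds when the prior is 0.5
```

With a prior probability of 0.5 the prior odds are 1, so the posterior odds and the ratio of posterior to prior odds coincide.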
Figure 4 presents a cladogram of common haplotypes (frequency greater than 1%), constructed from output of the GENEBPM algorithm. The most common haplotype is labeled ‘1’, the second most common is labeled ‘2’, and so on. The cladogram can be used to represent the similarity of haplotypes in terms of the tag SNPs they carry and their disease risk. Haplotypes that cluster closely are likely to share recent common ancestry, and thus have similar risk of disease. Figure 4 highlights a specific low-risk clade of haplotypes, listed in Table 5 with their estimated frequency and posterior mean odds ratio, relative to the most common haplotype.
Haplotypes in the low-risk cluster all carry the combination of alleles *12122*1111 at SNPs T1-T11, where * represents either allele. The minor alleles at SNPs T1 and T7 occur only in this cluster, and define two independent segments of this clade, confirming the conclusions of the multi-locus analysis. Furthermore, the minor allele at SNP T6 occurs only in this clade and in one other rare haplotype. Thus, SNP T6 best isolates the low-risk clade when SNPs T1 and T7 are excluded, which explains why it becomes significant in the multi-locus analysis once T1 and T7 are dropped.
Blood samples taken from typhoid patients on days 1, 4 and 7 of treatment were stimulated with 1 μg/ml LPS for 24 h and the level of TNF-α production in the supernatant was measured. Figure 5 shows the levels of TNF-α release in typhoid patients who either have or do not have the *12122*1111 haplotype. The amount of TNF-α produced by the patients who have the *12122*1111 haplotype is significantly less than that produced by those who do not have the haplotype on day 4 of treatment (P = 0.023). This trend is also observed on day 7 of treatment (P = 0.057).
Investigating the role of host genes in susceptibility and protection from disease historically involved genotyping individual genetic markers in candidate genes and looking for disease associations. However, this approach investigates only single gene loci and may identify only indirect markers of disease. In recent years development of a haplotypic approach to study disease susceptibility has progressed rapidly (Daly et al. 2001). Data from the recently released HapMap project offers a powerful tool to potentially identify the multiple genetic loci, and their interactions, that are responsible for protection or susceptibility to complex diseases (Altshuler et al. 2005). In this study we have investigated a genetic association between typhoid fever and the MHC class III region using a haplotypic approach.
With the aim of identifying the SNPs that are most informative for common haplotypes in the TNF region in the Vietnamese, we used an unstructured method to economically select htSNPs (Ackerman et al. 2003a). This method was appropriate for this data set because (1) this genomic region has a high level of LD, (2) the percentage of missing data for the htSNPs was low (on average 3.3%) and (3) we used family trios for haplotype inference (Forton et al. 2005). To improve htSNP selection, a small number of additional SNPs could be identified in sites of increased inference error, which is determined by identifying loci that are most prone to haplotype reconstruction error (http://www.gmap.net/marker). In the htSNP set used here, SNPs rs2259571-rs10885 show high error profiles of 1-3% at each haplotype locus (data not shown). Although additional SNPs in this high error profile region may have increased the accuracy of haplotype reconstruction, the set of htSNPs that we found individually associated with typhoid fever lie within a region of lower error. This suggests that the htSNPs selected in the region of disease association were adequate.
Cladistic analysis of SNPs is a novel approach to disease-gene mapping and provides considerably more power than single-locus methods (Durrant et al. 2004; Morris 2005). Cladistic methods are based on the expectation that chromosomes with recent shared ancestry are similar in the vicinity of a disease gene. We used the GENEBPM algorithm to identify a cluster of low-risk haplotypes (*12122*1111) and identify groups of cases that harbour these haplotypes (Morris 2005). This algorithm gave a posterior probability of 0.821 (0.75 corresponds to 3:1 odds against the hypothesis that the haplotypes and disease are not associated), which represents positive evidence of an association between tag SNP haplotypes and typhoid fever. Both the combined logistic regression results and the cladistic analysis support the hypothesis that there is just one signal associated with disease at this locus, and this signal is marked by the *12122*1111 haplotype.
Haplotype-based analysis revealed that the frequency of *12122*1111 was higher in the control population than in typhoid fever patients. This strong association is with hospitalized typhoid, as all cases genotyped in this study were inpatients. Although they did not show disease complications, they represent individuals with more severe infections than typhoid sufferers in the community, who are more likely to have less symptomatic disease.
Although we have identified a haplotype in the TNF region that affords protection from typhoid fever, we are yet to determine the causative disease loci. The seven associated htSNPs span a region of 44.7 kb and are found within the genes BAT1, LTA and TNF. BAT1, which is a member of the DEAD-box protein family encoding an ATP-dependent RNA helicase, has been shown to be a negative regulator of inflammation (Allcock et al. 2001). LTA, encoding lymphotoxin-α, and TNF-α are members of the TNF super-family, mediating a large variety of inflammatory and immunostimulatory responses. All three genes, or haplotypes spanning these genes, have been associated with a variety of infectious and inflammatory diseases (Cabrera et al. 1995; Knight et al. 1999; Migita et al. 2005; Moffatt and Cookson 1997; Zeggini et al. 2002), and functional variation of these proteins could potentially affect susceptibility to typhoid fever. In this report we have shown that patients who carry the protective haplotype *12122*1111 produce less ex vivo TNF-α than patients without the haplotype on day 4 of treatment, and this trend is also seen on days 1 and 7 of treatment. Our future work will involve investigating TNF-α expression in healthy individuals with or without the protective haplotype to clearly examine this relationship. Future work to pinpoint the causative mutation responsible for the protective effect of the *12122*1111 haplotype will involve either a very high resolution association study of the region from TNF to BAT1 or re-sequencing of this region in Vietnamese individuals carrying this haplotype.
We acknowledge with thanks the directors and the staff of the Hospital for Tropical Diseases, Dong Thap Provincial Hospital, Dong Nai Pediatric Hospital and Hung Vuong Hospital, Viet Nam for the clinical and microbiological work associated with this study. We thank the Vietnamese individuals who took part in this study. The Wellcome Trust funded this work.
Sarah J. Dunstan, Oxford University Clinical Research Unit, Hospital for Tropical Diseases, 190 Ben Ham Tu, Quan 5, District 5, Ho Chi Minh City, Vietnam & Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, Oxford University, OX3 7LJ Oxford, United Kingdom.
Nguyen Thi Hue, Oxford University Clinical Research Unit, Hospital for Tropical Diseases, 190 Ben Ham Tu, Quan 5, District 5, Ho Chi Minh City, Vietnam & Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.
Kirk Rockett, Wellcome Trust Centre for Human Genetics, Oxford University, OX3 7BN Oxford, UK.
Julian Forton, Wellcome Trust Centre for Human Genetics, Oxford University, OX3 7BN Oxford, UK.
Andrew P. Morris, Wellcome Trust Centre for Human Genetics, Oxford University, OX3 7BN Oxford, UK.
Mahamadou Diakite, Wellcome Trust Centre for Human Genetics, Oxford University, OX3 7BN Oxford, UK.
Mai Ngoc Lanh, Dong Thap Provincial Hospital, Dong Thap, Vietnam.
Le Thi Phuong, Dong Thap Provincial Hospital, Dong Thap, Vietnam.
Deborah House, Wellcome Trust Sanger Centre Hinxton, Cambridge, UK.
Christopher M. Parry, Oxford University Clinical Research Unit, Hospital for Tropical Diseases, 190 Ben Ham Tu, Quan 5, District 5, Ho Chi Minh City, Vietnam & Department of Medical Microbiology and Genitourinary Medicine, University of Liverpool, Liverpool, UK.
Ha Vinh, Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.
Nguyen T. Hieu, Hung Vuong Hospital, Ho Chi Minh City, Vietnam.
Gordon Dougan, Wellcome Trust Sanger Centre Hinxton, Cambridge, UK.
Tran Tinh Hien, Hospital for Tropical Diseases, Ho Chi Minh City, Vietnam.
Dominic Kwiatkowski, Wellcome Trust Centre for Human Genetics, Oxford University, OX3 7BN Oxford, UK.
Jeremy J. Farrar, Oxford University Clinical Research Unit, Hospital for Tropical Diseases, 190 Ben Ham Tu, Quan 5, District 5, Ho Chi Minh City, Vietnam & Centre for Tropical Medicine, Nuffield Department of Clinical Medicine, Oxford University, OX3 7LJ Oxford, United Kingdom.