In this summary paper, we describe the contributions of the haplotype-based analysis group (Group 4) at Genetic Analysis Workshop 16, held September 17-20, 2008. Our group applied a large number of haplotype-based methods in the context of genome-wide association studies. Two general approaches were used. The first was a two-stage approach that selected significant single-nucleotide polymorphisms (SNPs) and then constructed haplotypes from them. The second was a genome-wide analysis of smaller sets of SNPs, selected either by sliding windows or by estimating haplotype blocks. Genome-wide haplotype analyses performed in these ways proved feasible. Every method detected the very strong chromosome 6 association in the North American Rheumatoid Arthritis Consortium data, and additional analyses attempted to control for this strong signal so that additional haplotype associations could be detected.
population stratification; multiple comparisons
Genome-wide association studies (GWAS) conducted using commercial single nucleotide polymorphism (SNP) arrays have proven to be a powerful tool for the detection of common disease susceptibility variants. However, their utility for the detection of lower-frequency variants has yet to be investigated in practice. Here we describe the application of a rare-variant collapsing method to a large genome-wide SNP dataset, the Wellcome Trust Case Control Consortium rheumatoid arthritis (RA) GWAS. We partitioned the data into gene-centric bins and collapsed the genotypes of low-frequency variants (defined here as MAF ≤ 0.05) into a single count, which was then tested by univariate analysis. We then prioritised gene regions for further investigation in an independent cohort of 3,355 cases and 2,427 controls based on the rare-variant signal p value and on prior evidence supporting involvement in RA. A total of 14,536 gene bins were investigated in the primary analysis, and signals mapping to the TNFAIP3 and chr17q24 loci were selected for further investigation. We detected a replicating association with low-frequency variants in the TNFAIP3 gene (combined p = 6.6 × 10⁻⁶). Even though rare variants are not well represented and can be difficult to genotype in GWAS, our study supports the application of low-frequency variant collapsing methods to genome-wide SNP datasets as a means of exploiting data that are routinely ignored.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-010-0889-1) contains supplementary material, which is available to authorized users.
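The collapsing step described for the TNFAIP3 analysis can be illustrated with a short sketch. It assumes genotypes are coded 0/1/2 as minor-allele counts and computes MAF from the pooled sample. It then substitutes a simple 1-df carrier-status chi-square test for the univariate test, which the abstract does not specify. The function names `collapse_bin` and `carrier_chi2` are hypothetical.

```python
import math

def collapse_bin(genotypes, maf_threshold=0.05):
    """Collapse low-frequency variants in one gene bin into a per-person count.

    Assumption: genotypes is a list of per-individual lists of minor-allele
    counts (0/1/2), one column per SNP in the bin.
    """
    n = len(genotypes)
    n_snps = len(genotypes[0])
    # Minor-allele frequency of each SNP in the pooled sample.
    mafs = [sum(g[j] for g in genotypes) / (2 * n) for j in range(n_snps)]
    rare = [j for j, f in enumerate(mafs) if 0 < f <= maf_threshold]
    # One burden score per individual: total minor alleles at the rare SNPs.
    scores = [sum(g[j] for j in rare) for g in genotypes]
    return scores, rare

def carrier_chi2(scores, is_case):
    """Assumed univariate test: 1-df Pearson chi-square of carrier status vs. case status."""
    table = [[0, 0], [0, 0]]  # rows: case/control; columns: carrier/non-carrier
    for s, c in zip(scores, is_case):
        table[0 if c else 1][0 if s > 0 else 1] += 1
    total = sum(map(sum, table))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p
```

Collapsing pools many individually uninformative rare alleles into one variable, so a single test per gene bin replaces one test per variant.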
We analyzed a case-control data set for chromosome 18q from Genetic Analysis Workshop 15 to detect susceptibility loci for rheumatoid arthritis (RA). A total of 460 cases and 460 unaffected controls were genotyped for 2300 single-nucleotide polymorphisms (SNPs) by the North American Rheumatoid Arthritis Consortium. Using a multimarker approach to association mapping under the framework of the Malecot model and composite likelihood, we identified a region showing significant association with RA (p < 0.002), and the predicted disease locus was at a genomic location of 53,306 kb with a 95% confidence interval (CI) of 53,295–53,331 kb. A common haplotype in this region was protective against RA (p = 0.002). In another region showing nominally significant association (51,585 kb, 95% CI: 51,541–51,628 kb, p = 0.037), a haplotype was also protective (p = 0.002). We further demonstrated that reducing SNP density decreased the power and accuracy of association mapping. SNP selection based on equal linkage disequilibrium (LD) distance generally produced higher accuracy than selection based on equal kilobase distance or on tagging.
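Localizing a disease locus with the Malecot model can be sketched in a simplified form. The sketch assumes the commonly used form ρ = (1 − L) M exp(−ε|x − S|) + L, where ρ is marker-disease association at position x and S is the disease location. It also replaces the authors' composite likelihood with ordinary least squares over a grid of candidate S values with ε held fixed. Both choices, and the name `fit_malecot_location`, are assumptions made for illustration.

```python
import math

def fit_malecot_location(positions_kb, rho, epsilon, grid):
    """Least-squares fit of rho_i ~ A * exp(-epsilon * |x_i - S|) + L over candidate S.

    Assumption: epsilon is fixed and least squares stands in for composite likelihood.
    For fixed S the model is linear in A = (1 - L) * M and L, so each grid point
    needs only a simple linear regression. Returns (best_S, A, L, sse).
    """
    best = None
    n = len(positions_kb)
    mean_r = sum(rho) / n
    for s in grid:
        e = [math.exp(-epsilon * abs(x - s)) for x in positions_kb]
        mean_e = sum(e) / n
        sxx = sum((ei - mean_e) ** 2 for ei in e)
        if sxx == 0:
            continue  # predictor constant: slope not identifiable at this S
        sxy = sum((ei - mean_e) * (ri - mean_r) for ei, ri in zip(e, rho))
        a = sxy / sxx
        l = mean_r - a * mean_e
        sse = sum((ri - (a * ei + l)) ** 2 for ei, ri in zip(e, rho))
        if best is None or sse < best[3]:
            best = (s, a, l, sse)
    return best
```

Profiling over S in this way shows why marker density matters. With fewer markers flanking the locus, the error surface around the true S flattens, so the location estimate becomes less precise.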
After more than 200 genome-wide association studies, single-locus analyses have successfully identified some novel loci. The identification of single-nucleotide polymorphisms (SNPs) with interaction effects is therefore of interest. Using the Genetic Analysis Workshop 16 data from the North American Rheumatoid Arthritis Consortium, we propose an approach that screens for SNP-SNP interactions using a two-stage method and an approach that detects gene-gene interactions using principal components. We selected a set of 17 rheumatoid arthritis candidate genes to assess both approaches. The principal components approach holds promise for detecting gene-gene interactions. However, further study is needed to evaluate its power and its feasibility for a whole genome-wide association analysis.
Using the North American Rheumatoid Arthritis Consortium (NARAC) candidate gene and genome-wide single-nucleotide polymorphism (SNP) data sets, we applied regression methods and tree-based random forests to identify genetic associations with rheumatoid arthritis (RA) and to predict RA disease status. Several genes were consistently identified as weakly associated with RA, with no significant interaction or combinatorial effect with other candidate genes. Using random forests, the candidate gene SNPs tested were not sufficient to distinguish RA patients from normal subjects with high accuracy. By contrast, using the top 500 SNPs from the 5742-SNP genome-wide linkage panel, ranked by importance score, we were able to distinguish RA patients from normal subjects with a sensitivity of approximately 90% and a specificity of approximately 80%, as confirmed by five-fold cross-validation. In a complete training-testing framework, however, replication of the genetic predictors was less satisfactory. Further evaluation of existing methodology and development of new methods are therefore warranted.
Genome-wide association studies (GWAS) are identifying an increasing number of single nucleotide polymorphisms (SNPs) associated with diseases. Our aim is to exploit closely spaced SNPs in candidate regions for a deeper analysis of association that goes beyond single-SNP analysis. To do so, we combine the classical stepwise regression approach with haplotype analysis to identify risk haplotypes for complex diseases.
Our proposed multi-locus stepwise regression starts by evaluating all pair-wise SNP combinations. It then extends each SNP combination stepwise by one SNP from the region, carrying out haplotype regression at each step. The best-associated haplotype patterns are carried forward to the next step, and the final results must be corrected for multiple testing. These haplotypes should also be replicated in an independent data set. We applied the method to a region of 259 SNPs from the epidermal differentiation complex (EDC) on chromosome 1q21 of a German GWAS. We used a case-control set (1,914 individuals) for discovery and 268 families with at least two affected children for replication.
We identified a 4-SNP haplotype pattern with high statistical significance in the case-control set (p = 4.13 × 10⁻⁷ after Bonferroni correction), and it remained significant in the family set after Bonferroni correction (p = 0.0398). Further analysis revealed that this pattern mainly reflects the effect of the well-known FLG gene. However, we also found an FLG-independent haplotype, both in the case-control set (OR = 1.71, 95% CI: 1.32-2.23, p = 5.6 × 10⁻⁵) and in the family set (OR = 1.68, 95% CI: 1.18-2.38, p = 2.19 × 10⁻³).
Our approach is a useful tool for finding allele combinations associated with diseases in chromosomal candidate regions, beyond what single-SNP analysis can detect.
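The stepwise search described for the EDC region can be outlined as follows, assuming a caller-supplied `score` function that returns the p-value of a haplotype regression for a given SNP tuple. The abstract does not say how many patterns are kept per step, so `keep` and `max_size` are hypothetical parameters. The final multiple-testing correction is not shown.

```python
from itertools import combinations

def stepwise_haplotype_search(n_snps, score, keep=5, max_size=4):
    """Multi-locus search: all SNP pairs first, then extend the best patterns by one SNP per step.

    Assumption: score(snp_tuple) returns a haplotype-regression p-value (supplied by the caller);
    `keep` best patterns survive each step. Returns {pattern size: [(p, pattern), ...]}.
    """
    # Step 1: evaluate every pair-wise SNP combination and keep the best ones.
    current = sorted((score(pair), pair) for pair in combinations(range(n_snps), 2))[:keep]
    results = {2: current}
    for size in range(3, max_size + 1):
        candidates = {}
        for _, pattern in current:
            for snp in range(n_snps):
                if snp in pattern:
                    continue
                extended = tuple(sorted(pattern + (snp,)))
                if extended not in candidates:  # avoid scoring the same pattern twice
                    candidates[extended] = score(extended)
        current = sorted((p, pat) for pat, p in candidates.items())[:keep]
        results[size] = current
    return results
```

Restricting each step to extensions of the surviving patterns keeps the search far smaller than evaluating all 4-SNP subsets of 259 SNPs. The cost is that combinations whose pairs score poorly are never explored.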
Genome-wide association studies (GWAS) that simultaneously investigate hundreds of thousands of single nucleotide polymorphisms (SNPs) have become a powerful tool for discovering new disease susceptibility loci. Haplotypes are sometimes considered superior to SNPs and are promising for genetic association analyses. The application of genome-wide haplotype analysis, however, is hindered by the complexity of haplotypes themselves and by the computational burden. We systematically analyzed haplotype effects on breast cancer risk among 5,761 African American women (3,016 cases and 2,745 controls) using a sliding window approach on a genome-wide scale. Three regions on chromosomes 1, 4 and 18 exhibited moderate haplotype effects. Furthermore, among 21 breast cancer susceptibility loci previously established in European populations, 10p15 and 14q24 are likely to harbor novel haplotype effects. We also proposed a heuristic for determining the significance level and the effective number of independent tests by permutation analysis of chromosome 22 data. The results suggest that the effective number was approximately half of the total (7,794 out of 15,645). Half the total number could thus serve as a quick reference for evaluating genome-wide significance when a similar sliding window approach to haplotype analysis is adopted in similar populations with similar genotype density.
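One common way to turn permutation results into an effective number of tests is sketched below. It assumes that, for each permuted data set, the smallest p-value across all sliding windows is recorded. The α-quantile of these minima is then the per-test significance level, and M_eff = α / (that level). This is an assumed reconstruction of the heuristic, and `effective_number_of_tests` is a hypothetical name.

```python
def effective_number_of_tests(min_p_per_permutation, alpha=0.05):
    """Estimate a per-test significance level and M_eff from permutation minima.

    Assumption: min_p_per_permutation holds, for each permuted data set
    (case/control labels shuffled), the smallest p-value over all windows.
    """
    ordered = sorted(min_p_per_permutation)
    # The empirical alpha-quantile of the minimum p-value controls the family-wise error rate.
    k = max(int(alpha * len(ordered)) - 1, 0)
    per_test_alpha = ordered[k]
    return per_test_alpha, alpha / per_test_alpha
```

For the chromosome 22 figures reported above, an effective number of 7,794 corresponds to a per-test threshold of 0.05 / 7,794 ≈ 6.4 × 10⁻⁶. That is about twice as permissive as a Bonferroni threshold based on all 15,645 windows.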
The availability of a very large number of markers through modern technology has made genome-wide association studies very popular. The usual approach is to test single-nucleotide polymorphisms (SNPs) one at a time for association with disease status. However, single-SNP analysis may fail to detect marginally significant effects. Simultaneous analysis of SNPs can detect even SNPs with small effects by evaluating the collective impact of several neighboring SNPs. In addition, false-positive signals may be weakened by the presence of other neighboring SNPs in the analysis. We analyzed the North American Rheumatoid Arthritis Consortium data of Genetic Analysis Workshop 16 using HLasso, a new method for the simultaneous analysis of SNPs. The simultaneous analysis approach showed excellent control of type I error, and it confirmed many of the previously reported results of single-SNP analyses.
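The idea of penalized simultaneous SNP analysis can be illustrated with the ordinary lasso fitted by coordinate descent. This sketch does not reproduce HLasso, which uses a different, hierarchical formulation. It also uses a squared-error loss on a centred phenotype, which is a simplification for case-control data. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 when |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - X b||^2 + lam * ||b||_1.

    Assumption: X is an (n, p) matrix of centred genotype codes for neighbouring SNPs
    and y is a centred phenotype (ordinary lasso, not HLasso).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # monomorphic SNP: nothing to estimate
            # Correlation of SNP j with the partial residual (its own effect added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta
```

Because every SNP competes for the same residual, a SNP that is only correlated with a causal neighbour tends to be shrunk to zero. This is one way joint analysis can weaken false-positive signals.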
Using single-nucleotide polymorphisms (SNPs), we sought to predict classical class I and class II human leukocyte antigen (HLA) alleles and to test for their association with rheumatoid arthritis (RA) in the North American Rheumatoid Arthritis Consortium sample of cases and controls, genotyped on the Illumina HumanHap550 BeadChip. We used publicly available databases of SNP and HLA data to find SNPs or SNP haplotypes that could serve as surrogates for each HLA allele. To reduce the confounding effects of linkage disequilibrium with the HLA-DRB1 locus, we tested for association conditional on the presence or absence of a shared epitope allele on the same haplotype as the target HLA allele. Using SNP surrogates, we found that components of the DQ8 serotype (DQA1*0301:DQB1*0302) are associated with RA, irrespective of the presence or absence of a shared epitope allele on their respective haplotypes. Knowledge of the haplotype structure in the HLA region is still necessary for better interpretation of the results.
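Surrogate selection can be sketched as choosing, from a reference panel of phased haplotypes, the SNP whose alleles best tag the target HLA allele. The sketch assumes the squared correlation r² as the tagging criterion and considers single SNPs only, not SNP haplotypes. The SNP identifiers in the example are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def best_surrogate(hla_allele, snp_haplotypes):
    """Pick the SNP whose allele best tags an HLA allele across reference haplotypes.

    Assumption: hla_allele is 0/1 per reference haplotype (1 = carries the target allele);
    snp_haplotypes maps a SNP id to its 0/1 alleles on the same haplotypes.
    """
    scored = {snp: r_squared(alleles, hla_allele) for snp, alleles in snp_haplotypes.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]

# Hypothetical reference panel of six haplotypes.
print(best_surrogate([1, 1, 0, 0, 0, 0], {"rsA": [1, 1, 0, 0, 0, 0], "rsB": [1, 0, 0, 1, 0, 0]}))
```

A surrogate with r² close to 1 in the reference panel can stand in for the HLA allele in the case-control sample. Weaker tagging dilutes any true association.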
Genome-wide association studies (GWAS) may benefit from using haplotype information to make marker-phenotype associations. Several rationales exist for grouping single nucleotide polymorphisms (SNPs) into haplotype blocks. However, any advantage may depend on such factors as the genetic architecture of traits, patterns of linkage disequilibrium in the study population, and marker density. The objective of this study was to explore the utility of haplotypes for GWAS in barley (Hordeum vulgare) and to offer a first detailed look at this approach for identifying agronomically important genes in crops. To accomplish this, we used genotype and phenotype data from the Barley Coordinated Agricultural Project and constructed haplotypes using three different methods. Marker-trait associations were tested with the efficient mixed-model association algorithm (EMMA). When QTL were simulated using single SNPs dropped from the marker dataset, a simple sliding window performed as well as or better than single SNPs or the more sophisticated methods of grouping SNPs into haplotype blocks. Moreover, the haplotype analyses performed better 1) when QTL were simulated as polymorphisms that arose after the marker variants, and 2) in the analysis of empirical heading date data. These results demonstrate that the information content of haplotypes depends on the particular mutational and recombinational history of the QTL and nearby markers. Analysis of the empirical data also confirmed our intuition that the distribution of QTL alleles in nature is often unlike the distribution of marker variants. Haplotype information could therefore capture associations that would elude single SNPs. We recommend routine use of both single-SNP and haplotype markers for GWAS to take advantage of the full information content of the genotype data.
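A simple sliding window recodes consecutive SNPs as one multi-allelic marker. The sketch below assumes phased or homozygous (inbred) lines, each represented as a string of 0/1 alleles. Window size and step are hypothetical choices, and the resulting codes would then be tested with a mixed model such as EMMA, which is not shown.

```python
def sliding_window_haplotypes(haplotypes, window, step=1):
    """Recode SNP data as multi-allelic haplotype markers in sliding windows.

    Assumption: haplotypes is a list of equal-length '0'/'1' strings, one per
    phased chromosome or inbred line. Returns a list of (start_index, codes),
    where codes gives each line an integer label for its allele string in the window.
    """
    n_snps = len(haplotypes[0])
    markers = []
    for start in range(0, n_snps - window + 1, step):
        labels = {}
        codes = []
        for h in haplotypes:
            segment = h[start:start + window]
            # Lines with identical allele strings in this window share one haplotype label.
            codes.append(labels.setdefault(segment, len(labels)))
        markers.append((start, codes))
    return markers
```

A window can separate lines that no single SNP distinguishes, for example a QTL allele that arose on one specific background haplotype. This matches the case where haplotypes outperformed single SNPs.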
The power of genome-wide association studies can be improved by incorporating information from previous study findings, for example, the results of genome-wide linkage analyses. Weighted false-discovery rate (FDR) control can incorporate genome-wide linkage scan results into the analysis of genome-wide association data by assigning single-nucleotide polymorphism (SNP) specific weights. Stratified FDR control can also be applied by stratifying the SNPs into high and low linkage strata. We applied these two FDR control methods to data from the North American Rheumatoid Arthritis Consortium (NARAC) study and the Framingham Heart Study (FHS), combining association and linkage analysis results. For the NARAC study, we used linkage results from a previous genome scan of the rheumatoid arthritis (RA) phenotype. For the FHS, we obtained genome-wide linkage scores from the same 550k SNP data used for the association analyses of three lipid phenotypes (HDL, LDL, TG). We confirmed some genes previously reported to be associated with RA and with lipid phenotypes. For the RA data, the stratified and weighted FDR methods appear to improve the ranks of some of the replicated SNPs, suggesting that linkage scan results could provide useful information for improving genome-wide association studies.
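Both FDR variants can be sketched on top of the Benjamini-Hochberg (BH) step-up procedure. For weighted FDR, nonnegative weights (for example derived from linkage scores) are rescaled to mean 1 and BH is applied to p/w. For stratified FDR, BH is applied separately within each stratum. How the weights are computed from linkage scores is left open here, and the function names are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return the set of indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            cutoff = rank  # step-up: keep the largest rank that passes
    return set(order[:cutoff])

def weighted_bh(pvalues, weights, q=0.05):
    """Weighted FDR: rescale weights to mean 1, then run BH on p / w (w = 0 never rejects)."""
    mean_w = sum(weights) / len(weights)
    adjusted = [p / (w / mean_w) if w > 0 else float("inf") for p, w in zip(pvalues, weights)]
    return benjamini_hochberg(adjusted, q)

def stratified_bh(pvalues, strata, q=0.05):
    """Stratified FDR: run BH separately within each stratum (e.g. high vs. low linkage)."""
    rejected = set()
    for s in set(strata):
        idx = [i for i, t in enumerate(strata) if t == s]
        sub = benjamini_hochberg([pvalues[i] for i in idx], q)
        rejected.update(idx[k] for k in sub)
    return rejected
```

Up-weighting SNPs under linkage peaks raises their effective threshold at the expense of SNPs elsewhere. The approach therefore gains power only if the linkage information is informative.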
We performed a case-control association analysis of rheumatoid arthritis (RA) for several candidate genes using the North American Rheumatoid Arthritis Consortium (NARAC) data provided in Genetic Analysis Workshop 15. We conducted the analysis using all related cases and unrelated controls. We then compared the results with those from an analysis that used only one randomly selected case from each family together with all unrelated controls. For both analyses we used a weighted composite likelihood ratio test, based on single-nucleotide polymorphism (SNP) markers or haplotypes, that accounts for the correlation among samples within a family. Several SNPs, including R620W in the candidate gene PTPN22, showed an association with RA status, confirming previously reported results. Several other SNPs in candidate genes such as CTLA4, HAVCR1, and SUMO4 also had rather small p-values (<0.05), suggesting associations between these genes and RA. The p-values obtained from the analysis including all related cases were generally smaller than those from the analysis including only one randomly selected case per family. These results, together with results based on simulated data, showed that higher power could be achieved by using all related cases.
Recent genome-wide association studies of several complex diseases have focused on individual single-nucleotide polymorphism (SNP) analysis. However, few studies have reported interactions among genes, perhaps because gene-gene and gene-environment interaction analyses can be infeasible given their heavy computing requirements. In this study we propose a new strategy for exploring interactions among haplotypes. The proposed method consists of two steps. Step 1 tests genome-wide single-SNP associations with correction for multiple testing and identifies the haplotype blocks containing the significant SNPs. Step 2 performs interaction analysis of the haplotypes within these blocks. We applied the proposed method to the rheumatoid arthritis data of Genetic Analysis Workshop 16.
Because up to a million or more single-nucleotide polymorphisms (SNPs) are analyzed in a genome-wide association study (GWAS), multiple comparisons pose a serious problem. To cope with this problem, haplotype-based algorithms were developed that correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also address the problems inherent in multiple testing. However, both calculating exact probabilities and running permutation tests are time-consuming, so faster methods for both are required.
We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm with that of the regular permutation test on the JPT and CHB samples of HapMap.
ParaHaplo can detect smaller differences between two populations than SNP-based GWAS can. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program.
ParaHaplo is a useful tool for conducting haplotype-based GWAS. Because the data sizes of such projects continue to increase, fast computation through parallel computing, such as that used in ParaHaplo, will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address:
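The reason permutation testing parallelizes well can be shown with a sequential sketch. Permutations are split into independently seeded chunks, and each chunk returns a count that can be summed across workers (processes or cluster nodes). This does not reproduce ParaHaplo's MPI implementation or its haplotype-based statistics. The sketch assumes a simple maximum absolute allele-frequency difference across markers as the test statistic, and all function names are hypothetical.

```python
import random

def max_stat(rows, labels):
    """Assumed statistic: largest absolute case-control allele-frequency difference over markers."""
    cases = [r for r, l in zip(rows, labels) if l]
    controls = [r for r, l in zip(rows, labels) if not l]
    best = 0.0
    for j in range(len(rows[0])):
        f1 = sum(r[j] for r in cases) / len(cases)
        f0 = sum(r[j] for r in controls) / len(controls)
        best = max(best, abs(f1 - f0))
    return best

def permutation_chunk(rows, labels, observed, n_perm, seed):
    """Count permutations whose max statistic reaches the observed one.

    Each chunk has its own seed, so chunks can run on separate workers
    and their counts can simply be added together.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_stat(rows, shuffled) >= observed:
            hits += 1
    return hits

def adjusted_p(rows, labels, n_chunks=4, perms_per_chunk=250):
    """Family-wise adjusted p-value for the best marker (run sequentially here)."""
    observed = max_stat(rows, labels)
    hits = sum(permutation_chunk(rows, labels, observed, perms_per_chunk, seed)
               for seed in range(n_chunks))
    total = n_chunks * perms_per_chunk
    return (hits + 1) / (total + 1)
```

Comparing each permutation's maximum over all markers with the observed maximum accounts for correlation among markers automatically. The work is embarrassingly parallel, which is why distributing chunks across nodes can give near-linear speed-ups.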
Most genetic association studies genotype only a small proportion of the cataloged single-nucleotide polymorphisms (SNPs) in regions of interest. With catalogs of high-density SNP data (e.g., HapMap) now available to researchers, it has become possible to impute genotypes at untyped SNPs. This in turn allows those untyped SNPs to be tested, with the aim of increasing power in association studies. Several imputation methods and corresponding software packages have been developed for this purpose. The objectives of our study were to apply three widely used imputation methods and their software packages to data from a genome-wide association study of rheumatoid arthritis from the North American Rheumatoid Arthritis Consortium in Genetic Analysis Workshop 16, to compare the performance of the three methods, to evaluate their strengths and weaknesses, and to identify additional susceptibility loci underlying rheumatoid arthritis. The software packages used in this paper were a program for Bayesian imputation-based association mapping (BIMBAM), a program for imputing unobserved genotypes in case-control association studies (IMPUTE), and a program for testing untyped alleles (TUNA). We found some untyped SNPs that showed significant association with rheumatoid arthritis. A few of these were not located near any typed SNP found to be significant and thus may be worth further investigation.
We performed a genome-wide association scan on the North American Rheumatoid Arthritis Consortium (NARAC) data using Hotelling's T² tests, i.e., T_H based on allele coding and T_G based on genotype coding. The objective was to identify associations between single-nucleotide polymorphisms (SNPs) or markers and rheumatoid arthritis. In specific candidate gene regions, we evaluated the performance of Hotelling's T² tests. Hotelling's T² tests were then used to identify new regions containing SNPs with strong associations with disease. As expected, the strongest association evidence was found in the region of the HLA-DRB1 locus on chromosome 6. In the region of the TRAF1-C5 genes, we identified two SNPs, rs2900180 and rs3761847, with the largest and the second largest T_H and T_G scores among all SNPs on chromosome 9. We also identified one SNP, rs2476601, in the region of the PTPN22 gene that had the largest T_H score and the second largest T_G score among all SNPs on chromosome 1. In addition, the SNPs with the largest T_H score on each chromosome were identified. These SNPs may be located in the regions of genes that have modest effects on rheumatoid arthritis. These regions deserve further investigation.
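The two-sample Hotelling's T² statistic underlying these tests can be sketched as T² = (n₁n₂/(n₁+n₂)) d̄ᵀ S⁻¹ d̄. Here d̄ is the difference in mean marker vectors between cases and controls, and S is the pooled covariance. The function accepts whatever numeric coding is supplied. How the allele-based T_H and genotype-based T_G codings are built is not specified here, and the name `hotelling_t2` is hypothetical.

```python
import numpy as np

def hotelling_t2(cases, controls):
    """Two-sample Hotelling's T^2 for multi-marker vectors.

    cases, controls: arrays of shape (n1, p) and (n2, p), one row per individual,
    one column per coded marker (the coding scheme is left to the caller).
    Returns (T2, F, df1, df2); F ~ F(df1, df2) under the null for multivariate normal data.
    """
    x1 = np.asarray(cases, dtype=float)
    x2 = np.asarray(controls, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled within-group covariance; atleast_2d keeps the single-marker case a 1x1 matrix.
    s1 = np.atleast_2d(np.cov(x1, rowvar=False))
    s2 = np.atleast_2d(np.cov(x2, rowvar=False))
    s = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    # Pseudo-inverse guards against a singular covariance when markers are in perfect LD.
    t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.pinv(s) @ diff
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, p, n1 + n2 - p - 1
```

With a single marker, T² reduces to the square of the pooled two-sample t statistic. For several markers it tests the joint mean difference while accounting for their correlation.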
Genome-wide association studies often involve testing hundreds of thousands of single-nucleotide polymorphisms (SNPs). These tests may be highly correlated because of linkage disequilibrium among SNPs. Multiple testing correction that ignores the correlation among markers, as the Bonferroni procedure does, can cause a loss of power. Several multiple testing adjustment methods that account for correlations among tests have been developed and have shown improved power compared with the Bonferroni procedure. These methods include a Monte Carlo (MC) method and a method for computing p-values adjusted for correlated tests. The objectives of this study were to apply these two methods to the genome-wide association study of the Genetic Analysis Workshop 16 rheumatoid arthritis data from the North American Rheumatoid Arthritis Consortium, to compare their performance with that of the Bonferroni procedure in identifying susceptibility loci underlying rheumatoid arthritis, and to discuss their strengths and weaknesses. Both the MC method and the method of p-values adjusted for correlated tests identified more significant SNPs than the corresponding Bonferroni procedures using the same test statistics. Both methods therefore potentially have higher power. Simulation studies demonstrate that the MC method may have slightly higher power than the method of p-values adjusted for correlated tests.
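A Monte Carlo adjustment of this kind can be sketched as follows. The sketch draws null test statistics from a multivariate normal distribution whose correlation matrix reflects LD among the SNPs. It then compares each observed |z| with the simulated distribution of the maximum |Z|. Treating the correlation of test statistics as known (for example, estimated from genotype correlations) is an assumption, and `mc_adjusted_pvalues` is a hypothetical name.

```python
import numpy as np

def mc_adjusted_pvalues(z_obs, corr, n_sim=100_000, seed=0):
    """Monte Carlo multiplicity adjustment for correlated test statistics.

    Assumption: corr is the null correlation matrix of the SNP test statistics
    (e.g. approximated from LD). Returns, for each SNP, P(max_j |Z_j| >= |z_obs_i|)
    with Z ~ N(0, corr).
    """
    rng = np.random.default_rng(seed)
    z_null = rng.multivariate_normal(np.zeros(len(z_obs)), corr, size=n_sim)
    max_abs = np.abs(z_null).max(axis=1)
    return np.array([(max_abs >= abs(z)).mean() for z in z_obs])
```

With independent tests this reproduces the Šidák correction. With strongly correlated tests the maximum |Z| is smaller, so the adjusted p-values are less severe than Bonferroni's, which is the source of the power gain.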
Recently, several genome-wide association studies (GWASs) have led to the discovery of nine new genetic susceptibility loci for Alzheimer's disease (AD). However, the picture of AD genetic susceptibility is far from complete. In addition to the single-SNP (single-nucleotide polymorphism) analyses performed in conventional GWAS, complementary strategies are needed to overcome the limitations inherent in this type of approach. We performed a genome-wide haplotype association (GWHA) study in the EADI1 study (n = 2025 AD cases and 5328 controls) using a sliding-window approach. After exclusion of loci already known to be involved in AD (APOE, BIN1 and CR1), 91 regions with suggestive haplotype effects were identified. In a second step, we attempted to replicate the best suggestive haplotype associations in the GERAD1 consortium (2820 AD cases and 6356 controls) and observed that 9 of them showed nominal association. In a third step, we tested relevant haplotype associations in a combined analysis of five additional case–control studies (5093 AD cases and 4061 controls). We consistently replicated the association of a haplotype within FRMD4A on Chr.10p13 in all the data sets analyzed (OR: 1.68; 95% CI: (1.43–1.96); P = 1.1 × 10⁻¹⁰). Finally, we searched for associations between SNPs within the FRMD4A locus and plasma Aβ concentrations in three independent non-demented populations (n = 2579). Polymorphisms in this locus were associated with the plasma Aβ42/Aβ40 ratio (best signal, P = 5.4 × 10⁻⁷). In conclusion, by combining a GWHA study with a conservative three-stage replication approach, we characterised FRMD4A as a new genetic risk factor for AD.
Alzheimer; amyloid; FRMD4A; GWAS; plasma
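A reported odds ratio and its 95% CI can be checked for internal consistency. This is done by recovering the standard error of log(OR) and the implied Wald z statistic, assuming the interval was computed symmetrically on the log scale. The check is only a consistency check and not the authors' combined-analysis method. `wald_from_ci` is a hypothetical helper.

```python
import math

def wald_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover SE(log OR), the Wald z statistic and a two-sided p-value from a 95% CI.

    Assumption: the CI was computed as exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail probability
    return se, z, p

# FRMD4A haplotype: OR 1.68, 95% CI 1.43-1.96.
se, z, p = wald_from_ci(1.68, 1.43, 1.96)
```

For the FRMD4A figures this gives SE ≈ 0.080, z ≈ 6.45 and p ≈ 1.1 × 10⁻¹⁰, in line with the reported P value.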
We present computationally simple association tests based on haplotype sharing that can easily be applied to genome-wide association studies. The tests allow the use of fast (but not likelihood-based) haplotyping algorithms while properly accounting for the uncertainty introduced by using inferred haplotypes. We also present haplotype sharing analyses that adjust for population stratification. We applied our methods to a genome-wide association study of rheumatoid arthritis available as Problem 1 of Genetic Analysis Workshop 16. In addition to the HLA region on chromosome 6, we found genome-wide significant signals at 7q33 and 13q31.3. These regions contain genes with interesting potential connections to rheumatoid arthritis, and they are not identified by single-marker (single-nucleotide polymorphism) methods.
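The core idea of haplotype sharing can be sketched as follows. At a test position, count how far two haplotypes remain identical on either side, then compare average pairwise sharing among cases with that among controls. This simplified statistic is an assumption for illustration. It does not account for phasing uncertainty or stratification as the authors' tests do, and significance would still have to be assessed, for example by permutation. Function names are hypothetical.

```python
from itertools import combinations

def shared_length(h1, h2, t):
    """Length of the run of identical alleles around marker t (0 if the haplotypes differ at t)."""
    if h1[t] != h2[t]:
        return 0
    left = t
    while left > 0 and h1[left - 1] == h2[left - 1]:
        left -= 1
    right = t
    while right < len(h1) - 1 and h1[right + 1] == h2[right + 1]:
        right += 1
    return right - left + 1

def mean_sharing(haplotypes, t):
    """Average shared length over all pairs of haplotypes in one group."""
    pairs = list(combinations(haplotypes, 2))
    return sum(shared_length(a, b, t) for a, b in pairs) / len(pairs)

def sharing_contrast(case_haps, control_haps, t):
    """Assumed statistic: excess mean haplotype sharing at marker t in cases over controls."""
    return mean_sharing(case_haps, t) - mean_sharing(control_haps, t)
```

Cases that inherited a risk variant from a common ancestor tend to share longer stretches of identical haplotype around it. This lets sharing-based tests pick up signals that no individual SNP shows clearly.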
Rheumatoid arthritis has a complex mode of inheritance. Although HLA-DRB1 and PTPN22 are well-established susceptibility loci, other genes that confer a modest level of risk have been identified recently. We carried out a genome-wide association analysis to identify additional genetic loci associated with an increased risk of rheumatoid arthritis.
We genotyped 317,503 single-nucleotide polymorphisms (SNPs) in a combined case-control study of 1522 case subjects with rheumatoid arthritis and 1850 matched control subjects. The patients were seropositive for autoantibodies against cyclic citrullinated peptide (CCP). We obtained samples from two data sets, the North American Rheumatoid Arthritis Consortium (NARAC) and the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA). Results from NARAC and EIRA for 297,086 SNPs that passed quality-control filters were combined with the use of Cochran-Mantel-Haenszel stratified analysis. SNPs showing a significant association with disease (P<1×10⁻⁸) were genotyped in an independent set of case subjects with anti-CCP-positive rheumatoid arthritis (485 from NARAC and 512 from EIRA) and in control subjects (1282 from NARAC and 495 from EIRA).
We observed associations between disease and variants in the major-histocompatibility-complex locus, in PTPN22, and in a SNP (rs3761847) on chromosome 9 for all samples tested, the latter with an odds ratio of 1.32 (95% confidence interval, 1.23 to 1.42; P = 4×10⁻¹⁴). The SNP is in linkage disequilibrium with two genes relevant to chronic inflammation: TRAF1 (encoding tumor necrosis factor receptor-associated factor 1) and C5 (encoding complement component 5).
A common genetic variant at the TRAF1-C5 locus on chromosome 9 is associated with an increased risk of anti-CCP-positive rheumatoid arthritis.
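The Cochran-Mantel-Haenszel combination of the NARAC and EIRA strata can be sketched per SNP. The sketch assumes one allelic 2×2 table per cohort and omits the continuity correction, since the abstract does not specify either choice. It also returns the Mantel-Haenszel common odds ratio. `cmh_test` is a hypothetical name.

```python
import math

def cmh_test(tables):
    """Cochran-Mantel-Haenszel test over K 2x2 tables (no continuity correction).

    Assumption: each table is (a, b, c, d) = case allele-1, case allele-2,
    control allele-1, control allele-2 counts for one stratum (e.g. NARAC, EIRA).
    Returns (chi2, p, mantel_haenszel_or).
    """
    num = var = or_num = or_den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected = (a + b) * (a + c) / n  # E[a] given the table margins
        num += a - expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    chi2 = num * num / var
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square with 1 df
    return chi2, p, or_num / or_den
```

Stratifying by cohort prevents differences in allele frequency or case fraction between NARAC and EIRA from producing a spurious pooled association.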
Random forest (RF) analysis of genetic data does not require specification of the mode of inheritance, and it provides measures of variable importance that incorporate interaction effects. In this paper we describe RF-based approaches for assessing gene and haplotype importance and apply them to a subset of the North American Rheumatoid Arthritis Consortium case-control data provided by Genetic Analysis Workshop 16. The RF analyses of 37 genes identified many of the same genes as logistic regression, but also suggested the importance of certain single-nucleotide polymorphisms and genes that were not ranked highly by logistic regression. A new permutation method did not reveal strong evidence of gene-gene interaction effects in these data. Although RFs are a promising approach for genetic data analysis, extensions beyond simple single-nucleotide polymorphism analyses and modifications to improve computational feasibility are needed.
Using the North American Rheumatoid Arthritis Consortium genome-wide association dataset, we applied ridge regression (penalized multiple least-squares regression). Our aim was to identify genetic variants with apparent unique contributions to the variation of anti-cyclic citrullinated peptide (anti-CCP), a newly identified clinical risk factor for the development of rheumatoid arthritis. Within a 2.7-Mbp region on chromosome 6 around the well-studied HLA-DRB1 locus, ridge regression identified one single-nucleotide polymorphism that showed only a weak direct association with anti-CCP. That SNP was nevertheless associated with anti-CCP variation when the additive effects of the other single-nucleotide polymorphisms were included in a multivariable analysis. This suggests that multivariable methods can identify potentially relevant genetic variants in regions of interest that would be difficult to detect from direct associations.
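Ridge regression estimates can be sketched as b = (XᵀX + λI)⁻¹Xᵀy after centring, so that the intercept is not penalized. Choosing λ (for example, by cross-validation) is not shown. `ridge_coefficients` is a hypothetical name, and y is assumed to hold anti-CCP values.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Ridge (L2-penalised) multiple least-squares regression.

    Solves (X'X + lam * I) b = X'y after centring X and y, leaving the intercept
    unpenalised. Assumption: X is (n, p) SNP codes and y is the anti-CCP measure.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
```

The penalty keeps the system solvable even when SNPs in strong LD make XᵀX singular, as happens in the HLA region. Ordinary least squares would fail in that situation, whereas ridge splits the shared effect among the correlated SNPs.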
We propose using rough set theory to identify genes affecting rheumatoid arthritis risk from the data collected by the North American Rheumatoid Arthritis Consortium. For each gene, we use generalized dynamic reducts from rough set theory to select a subset of single-nucleotide polymorphisms (SNPs) that represents the genetic information from that gene. We then group the study subjects into clusters based on their genotype similarity at the selected markers. The statistical association between disease status and cluster membership is then examined to identify genes associated with rheumatoid arthritis. Using the proposed approach, we identified a number of statistically significant genes associated with rheumatoid arthritis. Aside from genes on chromosome 6, the identified genes include known disease-associated genes such as PTPN22 and TRAF1. In addition, our list contains other biologically plausible genes, such as ADAM15 and AGPAT2. Our findings suggest that ADAM15 and AGPAT2 may contribute to genetic predisposition through abnormalities in angiogenesis and in adipose tissue.
With the advent of large-scale genotyping technologies, the genome-wide association study (GWAS) has become a routine approach for mapping disease risk loci. Multi-allelic haplotype markers can provide greater power than single-SNP markers for mapping disease loci. However, the application of haplotype-based analysis to GWAS is usually bottlenecked by the prohibitive computational time required for haplotype inference, also known as phasing. In this study, we developed an efficient approach to haplotype-based analysis in GWAS. By using a reference panel, our method accelerated the phasing process and reduced the potential bias introduced by unrealistic assumptions in the phasing process. The haplotype-based approach delivers high power for association studies without inflating the type I error. With only a medium-size reference panel, the phasing error of our method is comparable to the genotyping error of commercial genotyping solutions.
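The basic idea of panel-based phasing can be illustrated with a toy sketch. It resolves an unphased genotype into the compatible pair of reference-panel haplotypes that has the highest frequency product. This is not the authors' algorithm. It assumes error-free genotypes, no missing data, and that the true haplotypes appear in the panel, and `phase_with_panel` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations_with_replacement

def phase_with_panel(genotype, panel):
    """Resolve a genotype into two haplotypes drawn from a reference panel (toy version).

    Assumption: genotype is a tuple of 0/1/2 counts of allele 1 per SNP and
    panel is a list of 0/1 haplotype tuples. Returns the compatible pair with
    the highest frequency product, or None if no panel pair fits.
    """
    freq = Counter(panel)
    total = len(panel)
    best, best_score = None, 0.0
    for h1, h2 in combinations_with_replacement(sorted(freq), 2):
        if all(a + b == g for a, b, g in zip(h1, h2, genotype)):
            score = (freq[h1] / total) * (freq[h2] / total)
            if h1 != h2:
                score *= 2  # two orderings for a pair of distinct haplotypes
            if score > best_score:
                best, best_score = (h1, h2), score
    return best
```

Restricting candidates to haplotypes already observed in a panel shrinks the search space drastically, which is the source of the speed-up. The trade-off is that haplotypes absent from the panel cannot be reconstructed.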
Reconstruction of haplotypes, or the allelic phase, of single nucleotide polymorphisms (SNPs) is a key component of studies aimed at identifying and dissecting the genetic factors involved in complex traits. In humans, this often involves investigating SNPs in case/control or other cohorts in which haplotypes can only be partially inferred from genotypes by statistical approaches, with a resulting loss of power. Moreover, alternative statistical methodologies can lead to different assessments of the most probable haplotypes present, and to different haplotype frequency estimates, when the data are ambiguous. Given the cost and complexity of SNP studies, a robust and easy-to-use molecular technique that allows haplotypes to be determined directly from individual DNA samples would have wide applicability. Here, we present a reliable, automated and high-throughput method for molecular haplotyping in sequence segments of 2 kb and potentially longer. The method physically determines the phase of SNP alleles on each of the individual's two parental chromosomes. We demonstrate that, when implemented by matrix-assisted laser desorption/ionisation mass spectrometry, molecular haplotyping with this technique is no more complicated than SNP genotyping. We also show that the method can be applied using other DNA variation detection platforms. Molecular haplotyping is illustrated using the well-described β2-adrenergic receptor gene.
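The ambiguity that statistical phasing has to resolve can be made concrete by enumerating every haplotype pair consistent with an unphased genotype. With k heterozygous SNPs there are 2^(k−1) distinct pairs (for k ≥ 1), which a direct molecular method avoids by measuring phase. The function name is hypothetical.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List all unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of 0/1/2 counts of allele 1 per SNP.
    With k heterozygous SNPs there are 2 ** (k - 1) distinct pairs (for k >= 1).
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for phase in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        for site, allele in zip(het, phase):
            h1[site] = allele
        h2 = [g - a for g, a in zip(genotype, h1)]  # complementary haplotype
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)
```

For example, a genotype heterozygous at three SNPs and homozygous at a fourth, (1, 1, 1, 2), is compatible with four different haplotype pairs. The number doubles with every additional heterozygous site.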