The haplotype map constructed by the HapMap Project is a valuable resource for genetic studies of disease genes, population structure, and evolution. In the Project, Caucasian and African haplotypes are inferred fairly accurately, mainly by applying the rules of Mendelian inheritance to the genotypes of trios. The Asian haplotypes, however, are inferred from the genotypes of unrelated individuals using population-genetic methods and are less accurate, so the effects of this inaccuracy on downstream analyses need to be assessed. We determined true Japanese haplotypes by genotyping 100 complete hydatidiform moles (CHM), each carrying a genome derived from a single sperm, using Affymetrix 500K Arrays. We then assessed how inferred haplotypes can differ from true haplotypes by combining the true haplotypes into pseudo-individuals and phasing them with the programs PHASE, fastPHASE, and Beagle. We found that at various genomic regions, especially the MHC locus, the expansion of extended haplotype homozygosity (EHH), a measure of positive selection, is obscured when inferred Asian haplotype data are used to detect it. We then mapped the genome using a new statistic, XDiHH, which directly measures the difference between true and inferred haplotypes in the determination of EHH expansion. We also show that the true haplotype data presented here are useful for assessing and improving the accuracy of phasing Asian genotypes.
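To make the EHH measure concrete, here is a minimal sketch of the standard definition: for a core allele at a core SNP, EHH at a marker x is the probability that two chromosomes drawn at random from those carrying the core allele share the same haplotype over the whole interval from the core to x, i.e. EHH = Σᵢ C(nᵢ, 2) / C(n, 2), where n is the number of carriers and nᵢ the count of each distinct extended haplotype. It assumes haplotypes are given as equal-length strings with one allele character per SNP; the function name `ehh` and the toy data are hypothetical, and this is not the code used in the study. XDiHH is not reproduced here.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(haplotypes: Sequence[str], core: int, allele: str, end: int) -> float:
    """EHH of `allele` at SNP index `core`, extended to SNP index `end`.

    Assumption: each haplotype is a string with one allele code per SNP,
    and `end` may lie on either side of the core SNP.
    """
    # Keep only chromosomes that carry the core allele.
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = min(core, end), max(core, end)
    # Group carriers by the extended haplotype spanning the core..end interval.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two carriers drawn without replacement are identical
    # over the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy data (hypothetical): EHH decays as the interval grows away from SNP 0.
haps = ["0000", "0001", "0011", "1011", "0000"]
print([round(ehh(haps, 0, "0", e), 3) for e in range(4)])
```

Because a switch error in an inferred haplotype splits one extended haplotype into two, it lowers the counts nᵢ and hence EHH; this is one way phasing errors can obscure an EHH expansion that true haplotypes would reveal.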
Precise haplotype maps benefit a variety of genetic studies, including the identification of disease-associated loci and the dissection of evolutionary mechanisms such as selection and recombination. For diploid organisms, widely used high-throughput techniques yield haplotype information only in the form of genotypes. The process of extracting haplotypes from genotypes is called phasing, and it can be done accurately when the genotypes come from related individuals, such as parent–child trios, by applying the constraints imposed by the rules of Mendelian inheritance. For genotype data without family information, phasing relies on methods based on haplotype clustering, and the inferred haplotypes are known to be less accurate. Here, we experimentally determined genome-wide definitive haplotypes using a collection of Japanese complete hydatidiform moles (CHM), each of which carries a genome derived from a single sperm. Using these resources, we asked whether the definitive haplotype data can detect long-distance information that is obscured when we rely solely on haplotypes inferred by clustering. We also show that using definitive haplotypes as references significantly improves the inference of haplotypes of unrelated individuals.
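The Mendelian constraints that make trio phasing accurate can be illustrated at a single SNP. The following is a minimal sketch, assuming biallelic SNPs coded as alternate-allele counts (0, 1, 2) and error-free genotypes; the function names are hypothetical, and this is not the procedure used by the HapMap Project or by any particular phasing program.

```python
from typing import List, Optional, Sequence, Tuple


def can_transmit(parent: int, allele: int) -> bool:
    """True if a parent with genotype `parent` (alt-allele count) can pass on `allele`."""
    return parent > 0 if allele == 1 else parent < 2


def phase_trio_site(child: int, father: int, mother: int) -> Optional[Tuple[int, int]]:
    """Return the child's (paternal, maternal) alleles at one biallelic SNP,
    or None when Mendelian rules alone cannot resolve the phase.

    Assumption: genotypes are alt-allele counts and contain no errors.
    """
    for g in (child, father, mother):
        if g not in (0, 1, 2):
            raise ValueError("genotypes must be 0, 1 or 2")
    if child != 1:
        # Homozygous child: both transmitted alleles are the same.
        pat = mat = child // 2
    elif father != 1:
        # Homozygous father fixes the paternal allele; the maternal one is the other.
        pat = father // 2
        mat = 1 - pat
    elif mother != 1:
        # Homozygous mother fixes the maternal allele.
        mat = mother // 2
        pat = 1 - mat
    else:
        # Child and both parents heterozygous: phase is ambiguous at this site.
        return None
    if not (can_transmit(father, pat) and can_transmit(mother, mat)):
        raise ValueError("Mendelian inconsistency")
    return (pat, mat)


def phase_trio(child: Sequence[int], father: Sequence[int],
               mother: Sequence[int]) -> List[Optional[Tuple[int, int]]]:
    """Phase the child SNP by SNP; unresolved sites are left as None."""
    return [phase_trio_site(c, f, m) for c, f, m in zip(child, father, mother)]


# Toy trio (hypothetical genotypes at four SNPs).
print(phase_trio([1, 2, 1, 1], [0, 1, 1, 1], [1, 1, 2, 1]))
```

Sites where the child and both parents are heterozygous stay unresolved under Mendelian rules alone, which is why even trio-based phasing still needs additional information at some SNPs, and why unrelated individuals, who lack these constraints entirely, must rely on clustering-based inference.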