Our study represents the first genome-wide homozygosity association scans for RA; we pinpointed important RA-associated ROHs in the MHC region and confirmed this region to be associated with RA
[64],
[65]. For the two ROHs, the window with the best prediction accuracy 62% is anchored by the SNP rs2027852. We validated the results derived from the WTCCC data by using the independently acquired NARAC data (
Figure S2). Homozygosity disequilibrium was consistently found in the MHC region, for which the respective maximum values of −log
10(p) for NARAC_100 (
W
=

100) and NARAC_200 (
W
=

200) are −log
10(p)

=

7.6973 and −log
10(p)

=

7.1334, respectively, which are highly significant values.
The SNP rs2027852 is flanked by
HLA-DRB6 and
HLA-DRB1. The
HLA-DRB1-shared epitope is an important determinant of RA susceptibility
[10]. Associations between
HLA-DRB1 and RA susceptibility
[10],
[11],
[12],
[13],
[66],
[67] and between
HLA-DRB1 and the severity of RA
[68],
[69] have been made. In addition to
HLA-DRB1, a second relevant ROH includes
HLA-DPA1 and
HLA-DPB1. Previous studies produced inconclusive results concerning the relationship between RA and
HLA-DPA1 and
HLA-DPB1
[70],
[71],
[72]. Despite the evidence of statistical significance supported by this study, more functional studies are necessary to re-confirm the genetic associations with RA.
We found that the observed homozygosity disequilibrium in the MHC region is not explained by mechanisms associated with hemizygous deletion because our copy number analysis found only a very small proportion of the samples had acquired DNA deletions in the MHC region (). The RA-related ROHs probably were not generated from copy-neutral chromosomal aberrations, e.g., uniparental disomy and loss of heterozygosity, because such chromosomal abnormalities often result in severe inherited disorders and cancers, which the patients of the study did not have. Inbreeding, as the cause of the homozygosity disequilibrium, also seems unlikely as the patients were not an inbred population(s).
Selective sweep, a type of natural selection, seems to be a plausible mechanism for the appearance of homozygosity disequilibrium in general population
[40]. Homozygosity disequilibrium in the MHC region, which has been shown to contain the important functional genes related to RA and other autoimmune diseases
[64],
[65],
[73],
[74], results in a loss of genetic diversities and thereby influences quantitative and/or qualitative alternations of expression profiles. Some studies have found that autoimmunity susceptibility genes are positively selected in RA
[75],
[76],
[77],
[78]. Selected alleles accumulate in the gene pool over time and consequently increase the probability of generating an ROH. Genomic regions with a small recombination fraction and a large LD tend to contain even more ROHs than do regions with large recombination fractions or a small LD; for example, the time necessary for a region to be affected by selective pressure is so short that a limited number of recombinations prevents a rapid decay of LD and thereby promotes the occurrence of ROHs
[39]. For type-1 diabetes, a relevant study has also pointed out significant SNP identity and conserved extended haplotypes in the MHC region
[44]. That and our study reinforce the idea that natural selection may be critical to maintaining functionally important genes
[79] and susceptibility to complex diseases
[80].
Our study attempted to tackle several difficulties associated with homozygosity association mapping, which is defined as the identification of ROHs associated with a given disease. However, the observed, extended homozygosity may contain a run of homozygotes, hemizygotes, or a combination of both, and the different types of runs may reflect different genetic mechanisms associated with a disease. For genotype-based homozygosity association mapping, it is difficult to distinguish the differences between true homozygosity (a homozygous run) and spurious homozygosity (a hemizygous run)
[81],
[82]. Therefore, we employed genotype-based homozygosity association scans and intensity-based copy-number characterization to discriminate between copy-neutral homozygosity and deletion-induced hemizygosity for the RA-associated ROHs. Additionally, missing genotypes or heterozygous calls that arise from genotyping errors or recent mutations may interrupt a homozygous run (imperfect ROH). The genome-wide homozygosity association mapping used in this study overcame these obstacles by imputing missing genotypes and correcting for the modest heterozygous interference with the use of a local polynomial fit
[53].
The required minimum power value and sample size for genome-wide homozygosity association mapping for complex diseases have not been explicitly determined in previous studies
[81]. Our simulations provided an objective assessment of how the values for the power and the number of samples affect the results, and the results for the simulations suggest that we used sufficient sample numbers to attain reasonable statistical power to detect RA-associated ROHs in this study. In contrast to a single-SNP recessive model, the homozygosity association tests provided by LOHAS and ROH programs are multilocus analysis methods. The two multilocus methods make use of genetic information from extent of homozygosity, which is a function of LD, recombination fraction, and population history
[40]. Recessive-acting disease alleles in an ROH predisposing to a disease are accumulated and made use to elevate the low power of a single-SNP analysis due to rare disease alleles at single SNPs.
Population substructure/admixture is an important confounding factor in genome-wide case-control association studies. Ignoring the difference of genetic substructure/admixture in case and control groups may lead to false-positive findings. We thus also performed genome-wide homozygosity association test with an adjustment for population substructure/admixture using principal components. We regressed the homozygosity intensity estimates from LOHAS software
[53] on case/control disease status and the first 10 principal components from EIGENSTRAT software
[83] to validate genetic association we identified in the MHC region. We found that genetic association between the identified ROHs in the MHC region and RA disease status remained very significant after taking population substructure/admixture into account (
Figure S3). The maximum −log
10(p) values for the scans were 28.4155 for WTCCC_100, 23.1904 for WTCCC_150, and 14.6061 for WTCCC_200 in the first peak region and 8.6160 for WTCCC_100, 7.5250 for WTCCC_150, and 7.4240 for WTCCC_200 in the second peak region. The results explain that our findings in the MHC region are valid and robust to population substructure/admixture.
RA-associated ROHs identified by LOHAS software was also evaluated by a second homozygosity association method. ROH program
[40], which has been integrated into HelixTree software (HelixTree, Inc.), was run to examine homozygosity association in the MHC region. Several parameter combinations for defining an ROH were considered in the analysis using ROH program. At the Bonferroni significance level, two significant RA-associated ROHs identified by LOHAS software were validated by ROH program (
Figure S4).
In conclusion, our genome-wide homozygosity association study used high-density SNP array data to provide an alternative method to an allelic association study for mapping RA-susceptibility genes. Excess ROHs were found in the MHC regions of RA patients compared with those of controls, which uncovered a recessive component and missing heritability for RA and possibly other autoimmune diseases.