Susceptibility to primary biliary cirrhosis (PBC) is strongly associated with HLA region polymorphisms. To determine whether these associations can be explained by classical HLA determinants, we studied 676 Italian cases and 1440 controls genotyped with dense single nucleotide polymorphisms (SNPs), for which classical HLA alleles and amino acids were imputed. Although previous genome-wide association studies and our results show stronger SNP associations near DQB1, we demonstrate that the HLA signals can be attributed to classical DRB1 and DPB1 genes. Strong support for the predominant role of DRB1 is provided by our conditional analyses. We also demonstrate an independent association of DPB1. Specific HLA-DRB1 genes (*08, *11 and *14) account for most of the DRB1 association signal. Consistent with previous studies, DRB1*08 (p = 1.59 × 10−11) was the strongest predisposing allele whereas DRB1*11 (p = 1.42 × 10−10) was protective. Additionally, DRB1*14 and the DPB1 association (DPB1*03:01) (p = 9.18 × 10−7) were predisposing risk alleles. No signal was observed in the HLA class 1 or class 3 regions. These findings better define the association of PBC with HLA and specifically support the role of classical HLA-DRB1 and DPB1 genes and alleles in susceptibility to PBC.
genetic risk; risk allele; imputation; antigen binding pocket; autoimmune disease
Admixture mapping is a rapidly developing method to map susceptibility alleles in complex genetic disease associated with continental ancestry. Theoretically, when admixture between continental populations has occurred relatively recently, the chromosomal segments derived from the parental populations can be deduced from the differences in genotype allele frequencies. Progress in computational algorithms, in identification of ancestry informative single nucleotide polymorphisms, and in recent studies applying these tools suggests that this approach will complement other strategies for identifying the variation that underlies many complex diseases.
Studies in European and East Asian populations have identified lung cancer susceptibility loci in nicotinic acetylcholine receptor (nAChR) genes on chromosome 15q25.1 which also appear to influence smoking behaviors. We sought to determine if genetic variation in nAChR genes influences lung cancer susceptibility in African-Americans, and evaluated the association of these cancer susceptibility loci with smoking behavior. A total of 1308 African-Americans with lung cancer and 1241 African-American controls from three centers were genotyped for 378 single nucleotide polymorphisms (SNPs) spanning the sixteen human nAChR genes. Associations between SNPs and the risk of lung cancer were estimated using logistic regression, adjusted for relevant covariates. Seven SNPs in three nAChR genes were significantly associated with lung cancer at a strict Bonferroni-corrected level, including a novel association on chromosome 2 near the promoter of CHRNA1 (rs3755486: OR = 1.40, 95% CI = 1.18-1.67, P = 1.0 × 10−4). Association analysis of an additional 305 imputed SNPs on 2q31.1 supported this association. Publicly available expression data demonstrated that the rs3755486 risk allele correlates with increased CHRNA1 gene expression. Additional SNP associations were observed on 15q25.1 in genes previously associated with lung cancer, including a missense variant in CHRNA5 (rs16969968: OR = 1.60, 95% CI = 1.27-2.01, P = 5.9 × 10−5). Risk alleles on 15q25.1 also correlated with an increased number of cigarettes smoked per day among the controls. These findings identify a novel lung cancer risk locus on 2q31.1 which correlates with CHRNA1 expression and replicate previous associations on 15q25.1 in African-Americans.
Lung cancer; nicotine dependence; African-Americans; genetic association; smoking
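The Bonferroni-corrected level mentioned in the preceding abstract can be illustrated with a short calculation. This is a minimal sketch under two assumptions that the abstract does not state: a family-wise alpha of 0.05, and one test per genotyped SNP (378 tests). The function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumptions (not stated in the abstract): alpha = 0.05, 378 tests.
threshold = bonferroni_threshold(0.05, 378)

# P values as reported in the abstract for the two highlighted SNPs.
reported_p = {"rs3755486": 1.0e-4, "rs16969968": 5.9e-5}

# A SNP passes if its P value is below the per-test threshold.
passes = {snp: p < threshold for snp, p in reported_p.items()}

print(f"threshold = {threshold:.2e}")
print(passes)
```

Under these assumptions both reported P values fall below the per-test threshold, which is consistent with the abstract's description of the associations as significant after correction.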
The major histocompatibility complex (MHC) class II transactivator gene (CIITA) encodes an important transcription factor regulating genes required for human leukocyte antigen (HLA) class II MHC-restricted antigen presentation. Major histocompatibility complex (MHC) genes, particularly HLA class II, are strongly associated with risk of developing rheumatoid arthritis (RA). Given the strong biological relationship between CIITA and HLA class II genes, a comprehensive investigation of CIITA variation in RA was conducted. This study tested 31 CIITA SNPs in 2542 RA cases and 3690 controls (N = 6232). All individuals were of European ancestry, as determined by ancestry informative genetic markers. No evidence for association between CIITA variation and RA was observed after a correction for multiple testing was applied. This is the largest study to fully characterize common genetic variation in CIITA, including an assessment of haplotypes. Results exclude even a modest role for common CIITA polymorphisms in susceptibility to RA.
rheumatoid arthritis; autoimmunity; CIITA; MHC2TA
The major histocompatibility complex (MHC) class II transactivator gene (CIITA) encodes an important transcription factor required for HLA class II MHC-restricted antigen presentation. MHC genes, including the HLA class II DRB1*03:01 allele, are strongly associated with systemic lupus erythematosus (SLE). Recently the rs4774 CIITA missense variant (+1632G/C) was reported to be associated with susceptibility to multiple sclerosis. In the current study, we investigated CIITA, DRB1*03:01 and risk of SLE using a multi-stage analysis. In stage 1, 9 CIITA variants were tested in 658 cases and 1,363 controls (N = 2,021). In stage 2, rs4774 was tested in 684 cases and 2,938 controls (N = 3,622). We also performed a meta-analysis of the pooled 1,342 cases and 4,301 controls (N = 5,643). In stage 1, rs4774*C was associated with SLE (odds ratio [OR] = 1.24, 95% confidence interval [95% CI] = 1.07–1.44, P = 4.2 × 10−3). Similar results were observed in stage 2 (OR = 1.16, 95% CI = 1.02–1.33, P = 8.5×10−3) and the meta-analysis of the combined dataset (OR = 1.20, 95% CI = 1.09–1.33, Pmeta = 2.5×10−4). In all three analyses, the strongest evidence for association between rs4774*C and SLE was present in individuals who carried at least one copy of DRB1*03:01 (Pmeta= 1.9×10−3). Results support a role for CIITA in SLE, which appears to be stronger in the presence of DRB1*03:01.
systemic lupus erythematosus; autoimmunity; major histocompatibility complex; HLA; CIITA; MHC2TA
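As a rough consistency check on the preceding abstract, the stage 1 and stage 2 odds ratios can be combined with a fixed-effect inverse-variance method. This is a sketch of a standard summary-statistic technique and not necessarily the authors' procedure, since the abstract describes an analysis of the pooled dataset. Standard errors are recovered from the reported 95% confidence intervals, which assumes the intervals are symmetric on the log scale; all function names are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Return (log OR, SE), with the SE recovered from the 95% CI width."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se


def fixed_effect_pool(estimates):
    """Inverse-variance weighted mean of (log OR, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Odds ratios and confidence intervals as reported in the abstract.
stage1 = log_or_and_se(1.24, 1.07, 1.44)
stage2 = log_or_and_se(1.16, 1.02, 1.33)

beta, se = fixed_effect_pool([stage1, stage2])
pooled_or = math.exp(beta)
ci = (math.exp(beta - Z95 * se), math.exp(beta + Z95 * se))

print(f"pooled OR = {pooled_or:.3f}, 95% CI = {ci[0]:.3f} to {ci[1]:.3f}")
```

The pooled estimate from this sketch lands close to the reported combined odds ratio of 1.20; small differences are expected because the inputs are rounded and the method differs from a pooled-genotype analysis.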
Admixed populations such as African Americans and Hispanic Americans are often medically underserved and bear a disproportionately high burden of disease. Owing to the diversity of their genomes, these populations have both advantages and disadvantages for genetic studies of complex phenotypes. Advances in statistical methodologies that can infer genetic contributions from ancestral populations may yield new insights into the aetiology of disease and may contribute to the applicability of genomic medicine to these admixed population groups.
Several genome-wide association studies identified the chr15q25.1 region, which includes three nicotinic cholinergic receptor genes (CHRNA5-B4) and the cell proliferation gene (PSMA4), for its association with lung cancer risk in Caucasians. A haplotype and its tagging single nucleotide polymorphisms (SNPs) encompassing six genes from IREB2 to CHRNB4 were most strongly associated with lung cancer risk (OR = 1.3; P < 10−20). In order to narrow the region of association and identify potential causal variations, we performed a fine-mapping study using 77 SNPs in a 194 kb segment of the 15q25.1 region in a sample of 448 African-American lung cancer cases and 611 controls. Four regions, two SNPs and two distinct haplotypes from sliding window analyses, were associated with lung cancer. CHRNA5 rs17486278 G had OR = 1.28, 95% CI 1.07–1.54 and P = 0.008, whereas CHRNB4 rs7178270 G had OR = 0.78, 95% CI 0.66–0.94 and P = 0.008 for lung cancer risk. Lung cancer associations remained significant after pack-year adjustment. Rs7178270 decreased lung cancer risk in women but not in men; gender interaction P = 0.009. For two SNPs (rs7168796 A/G and rs7164594 A/G) upstream of PSMA4, lung cancer risks for people with haplotypes GG and AA were reduced compared with those with AG (OR = 0.56, 95% CI 0.38–0.82; P = 0.003 and OR = 0.73, 95% CI 0.59–0.90, P = 0.004, respectively). A four-SNP haplotype spanning CHRNA5 (rs11637635 C, rs17408276 T, rs16969968 G) and CHRNA3 (rs578776 G) was associated with increased lung cancer risk (P = 0.002). The identified regions contain SNPs predicted to affect gene regulation. There are multiple lung cancer risk loci in the 15q25.1 region in African-Americans.
Argentine population genetic structure was examined using a set of 78 ancestry informative markers (AIMs) to assess the contributions of European, Amerindian, and African ancestry in 94 individual members of this population. Using the Bayesian clustering algorithm STRUCTURE, the mean European contribution was 78%, the Amerindian contribution was 19.4%, and the African contribution was 2.5%. Similar results were found using a weighted least mean square method: European, 80.2%; Amerindian, 18.1%; and African, 1.7%. Consistent with previous studies, the current results showed very few individuals (four of 94) with greater than 10% African admixture. Notably, when individual admixture was examined, the Amerindian and European admixture showed a very large variance and individual Amerindian contribution ranged from 1.5 to 84.5% in the 94 individual Argentine subjects. These results indicate that admixture must be considered when clinical epidemiology or case control genetic analyses are studied in this population. Moreover, the current study provides a set of informative SNPs that can be used to ascertain or control for this potentially hidden stratification. In addition, the large variance in admixture proportions in individual Argentine subjects shown by this study suggests that this population is appropriate for future admixture mapping studies.
ancestry informative markers; admixture; population stratification
Reports show higher prevalence of albuminuria among Hispanics compared to whites. Differences by country of origin or genetic background are unknown.
Methods and Results
In MESA, we studied the associations of both genetic ancestry and country of origin with albumin to creatinine ratio among 1,417 Hispanic vs. White participants using multivariable linear regression and back transforming beta-coefficients into relative difference (%RD, 95%CI). Percentage European, Native American and African ancestry components for Hispanics were estimated using genetic admixture analysis.
The proportions of European, Native American and African genetic ancestry differed significantly by country of origin (p-value<0.0001); Mexican/Central Americans had the highest Native American (41±13%), Puerto Ricans had the highest European (61±15%), and Dominicans had the highest African (39±21%) ancestry. Hispanic ethnicity was associated with higher albumin/creatinine ratio compared to whites, but the association varied by country of origin (adjusted p interaction=0.04). Mexican/Central Americans and Dominicans had higher albumin/creatinine ratio compared to whites after adjustment (RD 19%, 2-40% and RD 27%, 1-61%, respectively), but not Puerto Ricans (RD 8%, −12-34%). Higher Native American ancestry was associated with higher albuminuria after age and sex adjustment among all Hispanics (RD 11%, 1-21%), but was attenuated after further adjustment. Higher European ancestry was independently associated with lower albumin/creatinine ratio among Puerto Ricans (−21%, −34 to −6), but not among Mexican/Central Americans and Dominicans.
Hispanics are a heterogeneous group with varying genetic ancestry. Risks of albuminuria differ across country of origin groups. These differences may be due, in part, to differences in genetic ancestral components.
genetics; kidney; albuminuria; ancestry
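The back-transformation of beta-coefficients into relative differences described in the preceding abstract can be sketched as follows. The sketch assumes the albumin to creatinine ratio was natural-log transformed before regression, which the abstract does not state explicitly; the function names and the example coefficient are hypothetical.

```python
import math


def relative_difference_pct(beta: float) -> float:
    """Percent relative difference implied by a coefficient on a ln-transformed outcome."""
    return (math.exp(beta) - 1.0) * 100.0


def relative_difference_ci(beta: float, se: float, z: float = 1.959964):
    """95% CI for the relative difference, back-transformed from the log scale."""
    return (relative_difference_pct(beta - z * se),
            relative_difference_pct(beta + z * se))


# Hypothetical coefficient chosen so the outcome is 1.19 times higher.
example_beta = math.log(1.19)
print(f"RD = {relative_difference_pct(example_beta):.1f}%")
```

Because the interval is transformed endpoint by endpoint, it is asymmetric around the point estimate on the percent scale, as in the intervals reported in the abstract.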
To provide a resource for assessing continental ancestry in a wide variety of genetic studies we identified, validated and characterized a set of 128 ancestry informative markers (AIMs). The markers were chosen for informativeness, genome-wide distribution, and genotype reproducibility on two platforms (TaqMan® assays and Illumina arrays). We analyzed genotyping data from 825 subjects with diverse ancestry, including European, East Asian, Amerindian, African, South Asian, Mexican, and Puerto Rican. A comprehensive set of 128 AIMs and subsets as small as 24 AIMs are shown to be useful tools for ascertaining the origin of subjects from particular continents, and to correct for population stratification in admixed population sample sets. Our findings provide general guidelines for the application of specific AIM subsets as a resource for wide application. We conclude that investigators can use TaqMan assays for the selected AIMs as a simple and cost efficient tool to control for differences in continental ancestry when conducting association studies in ethnically diverse populations.
population structure; continental ancestry; population stratification; ancestry informative markers
Prior studies of lung cancer and CYP1A1/2 in African American and Latino populations have shown inconsistent results and have not yet investigated the haplotype block structure of CYP1A1/2 or addressed potential population stratification. To investigate haplotypes in the CYP1A1/2 region and lung cancer in African Americans and Latinos, we conducted a case-control study (1998-2003). African Americans (N = 535) and Latinos (N = 412) were frequency-matched on age, sex, and self-reported race/ethnicity. We used a custom genotyping panel containing 50 single nucleotide polymorphisms in the CYP1A1/2 region and 184 ancestry informative markers selected to have large allele frequency differences between Africans, Europeans, and Amerindians. Latinos exhibited significant haplotype main effects in two blocks, even after adjusting for admixture (odds ratio (OR) = 2.02, 95% confidence interval (CI): 1.28 – 3.19 and OR = 0.55, 95% CI: 0.36 – 0.83) but no main effects were found among African Americans. Adjustment for admixture revealed substantial confounding by population stratification among Latinos but not African Americans. Among Latinos and African Americans, interactions between smoking level and haplotypes were not statistically significant. Evidence of population stratification among Latinos underscores the importance of adjusting for admixture in lung cancer association studies, particularly in Latino populations. These results suggest a variant occurring within the CYP1A2 region may be conferring an increased risk of lung cancer in Latinos.
lung cancer; haplotype; CYP1A1/2; population stratification; admixed
Differences in cardiovascular disease (CVD) burden exist among racial/ethnic groups in the United States, with African Americans having the highest prevalence. Subclinical CVD measures have also been shown to differ by race/ethnicity. In the United States, there has been significant intermixing among racial/ethnic groups creating admixed populations. Very little research exists on the relationship of genetic ancestry and subclinical CVD measures.
Methods and Results
These associations were investigated in 712 African-American and 705 Hispanic participants from the MESA candidate gene sub-study. Individual ancestry was estimated from 199 genetic markers using STRUCTURE. Associations of ancestry and coronary artery calcium (CAC) and common and internal carotid intima media thickness (cIMT) were evaluated using log-binomial and linear regression models. Splines indicated linear associations of ancestry with subclinical CVD measures in African-Americans, but presence of threshold effects in Hispanics. Among African Americans, each standard deviation (SD) increase in European ancestry was associated with an 8% (95% CI (1.02, 1.15), p=0.01) greater CAC prevalence. Each SD increase in European ancestry was also associated with a 2% (95% CI (−3.4%, −0.5%), p=0.008) lower common cIMT in African Americans. Among Hispanics, the highest tertile of European ancestry was associated with a 34% greater CAC prevalence, p=0.02 as compared to lowest tertile.
The linear association of ancestry and subclinical CVD suggests that genetic effects may be important in determining CAC and cIMT among African-Americans. Our results also suggest that CAC and common cIMT may be important phenotypes for further study with admixture mapping.
atherosclerosis; calcium; ancestry; epidemiology; genetics
Genetic susceptibility to systemic lupus erythematosus (SLE) is well established, with the HLA class II DRB1 and DQB1 loci demonstrating the strongest association. However, HLA may also influence SLE through novel biologic mechanisms in addition to genetic transmission of risk alleles. Evidence for increased maternal–offspring HLA class II compatibility in SLE and differences in maternal versus paternal transmission rates (parent-of-origin effects) and nontransmission rates (noninherited maternal antigen [NIMA] effects) in other autoimmune diseases have been reported. Thus, we investigated maternal–offspring HLA compatibility, parent-of-origin effects, and NIMA effects at DRB1 in SLE.
The cohort comprised 707 SLE families and 188 independent healthy maternal–offspring pairs (total of 2,497 individuals). Family-based association tests were conducted to compare transmitted versus nontransmitted alleles (transmission disequilibrium test) and both maternally versus paternally transmitted (parent-of-origin) and nontransmitted alleles (using the chi-square test of heterogeneity). Analyses were stratified according to the sex of the offspring. Maternal–affected offspring DRB1 compatibility in SLE families was compared with paternal–affected offspring compatibility and with independent control maternal–offspring pairs (using Fisher’s test) and was restricted to male and nulligravid female offspring with SLE.
As expected, DRB1 was associated with SLE (P < 1 × 10−4). However, mothers of children with SLE had similar transmission and nontransmission frequencies for DRB1 alleles when compared with fathers, including those for the known SLE risk alleles HLA–DRB1*0301, *1501, and *0801. No association between maternal–offspring compatibility and SLE was observed.
Maternal–offspring HLA compatibility, parent-of-origin effects, and NIMA effects at DRB1 are unlikely to play a role in SLE.
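The transmission disequilibrium test named in the preceding abstract compares how often heterozygous parents transmit versus do not transmit an allele to affected offspring. A minimal sketch of the basic biallelic statistic follows; the counts are hypothetical illustration values and are not taken from the study, and the function name is an assumption.

```python
import math


def tdt(transmitted: int, not_transmitted: int):
    """Basic TDT: chi-square (1 df) and P value from heterozygous-parent counts."""
    total = transmitted + not_transmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - not_transmitted) ** 2 / total
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
chi2, p_value = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

Parent-of-origin comparisons of the kind described in the abstract would apply the same counting separately to maternal and paternal transmissions before testing for heterogeneity.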
Base excision repair (BER) is the primary DNA damage repair mechanism for repairing small base lesions resulting from oxidation and alkylation damage. This study examines the association between 24 single-nucleotide polymorphisms (SNPs) belonging to five BER genes (XRCC1, APEX1, PARP1, MUTYH and OGG1) and lung cancer among Latinos (113 cases and 299 controls) and African-Americans (255 cases and 280 controls). The goal was to evaluate the differences in genetic contribution to lung cancer risk by ethnic groups. Analyses of individual SNPs and haplotypes were performed using unconditional logistic regressions adjusted for age, sex and genetic ancestry. Four SNPs among Latinos and one SNP among African-Americans were significantly (P < 0.05) associated with either risk of all lung cancer or non-small cell lung cancer (NSCLC). However, only the association between XRCC1 Arg399Gln (rs25487) and NSCLC among Latinos (odds ratio associated with every copy of Gln = 1.52; 95% confidence interval: 1.01–2.28) had a false-positive report probability of <0.5. Arg399Gln is a SNP with some functional evidence and has been shown previously to be an important SNP associated with lung cancer, mostly for Asians. Since the analyses were adjusted for genetic ancestry, the observed association between Arg399Gln and NSCLC among Latinos is unlikely to be confounded by population stratification; however, this result needs to be confirmed by additional studies among the Latino population. This study suggests that there are genetic differences in the association between BER pathway and lung cancer between Latinos and African-Americans.
The single nucleotide polymorphism (SNP) rs11761231 on chromosome 7q has been reported as a sexually dimorphic marker for rheumatoid arthritis susceptibility in a British population. We sought to replicate this finding and better characterize susceptibility alleles in the region in a North American population.
DNA from two North American collections of RA patients and controls (1605 cases and 2640 controls) was genotyped for rs11761231 and 16 additional chromosome 7q tag SNPs using Sequenom iPlex assays. Association tests were performed for each collection and also separately contrasting male cases versus male controls and female cases versus female controls. Principal components analysis (EIGENSTRAT) was used to determine association with RA before and after adjusting for population stratification in the subset of the samples (772 cases and 1213 controls) with whole genome SNP data.
We failed to replicate association of the 7q region with rheumatoid arthritis. Initially, rs11761231 showed evidence for association with RA in the NARAC collection (p=0.0076) and rs11765576 showed association with RA in both the NARAC (p = 0.019) and RA replication (p = 0.0013) collections. These markers also exhibited sexual differentiation. However, in the whole genome subset, neither SNP showed significant association with RA after correction for population stratification.
While two SNPs on chromosome 7q appeared to be associated with RA in a North American cohort, the significance of this finding did not withstand correction for population substructure. Our results emphasize the need to carefully account for population structure to avoid false positive disease associations.
For Genetic Analysis Workshop 16 Problem 1, we provided data for genome-wide association analysis of rheumatoid arthritis. Single-nucleotide polymorphism (SNP) genotype data were provided for 868 cases and 1194 controls that had been assayed using an Illumina 550 k platform. In addition, phenotypic data were provided from genotyping DRB1 alleles, which were classified according to the rheumatoid arthritis shared epitope, levels of anti-cyclic citrullinated peptide, and levels of rheumatoid factor IgM. Several questions could be addressed using the data, including analysis of genetic associations using single SNPs or haplotypes, as well as gene-gene and genetic analysis of SNPs for qualitative and quantitative factors.
Genetic factors are critical in determining susceptibility to primary biliary cirrhosis (PBC), but there has not been a clear association with human leukocyte antigen (HLA) genes. We performed a multi-center case-control study and analyzed HLA class II DRB1 associations using a large cohort of 664 well-defined cases of PBC and 1,992 controls of Italian ancestry. Importantly, healthy controls were rigorously matched not only by age and gender, but also for the geographical origin of the proband's four grandparents (Northern, Central, and Southern Italy). Following correction for multiple testing, DRB1*08 (Odds Ratio–OR, 3.3; 95% Confidence Interval–CI, 2.4−4.5) and DRB1*02 (OR 0.9; 95% CI 0.8−1.2) were significantly associated with PBC while alleles DRB1*11 (OR 0.4; 95% CI 0.3−0.4) and DRB1*13 (OR 0.7; 95% CI 0.6−0.9) were protective. When subjects were stratified according to their grandparental geographical origin, only the associations with DRB1*08 and DRB1*11 were common to all three areas. Associated DRB1 alleles were found only in a minority of patients while an additive genetic model is supported by the gene dosage effect for DRB1*11 allele and the interaction of DRB1*11,*13, and *08. Lastly, no significant associations were detected between specific DRB1 alleles and relevant clinical features represented by the presence of cirrhosis or serum autoantibodies. In conclusion, we confirm the role of HLA in determining PBC susceptibility and suggest that the effect of HLA is limited to patient subgroups. We suggest that a large whole-genome approach is required to identify further genetic elements contributing to the loss of tolerance in this disease.
autoimmune cholangitis; Major Histocompatibility Complex; genetic factors; etiopathogenesis; environmental factors
A variety of methods are available for estimating genetic admixture proportions in populations; however, few investigators have conducted detailed comparisons using empirical data. The authors characterized admixture proportions among self-identified African Americans (n = 535) and Latinos (n = 412) living in the San Francisco Bay Area who participated in a lung cancer case-control study (1998–2003). Individual estimates of genetic ancestry based on 184 informative markers were obtained from a Bayesian approach and 2 maximum likelihood approaches and were compared using descriptive statistics, Pearson correlation coefficients, and Bland-Altman plots. Case-control differences in individual admixture proportions were assessed using 2-sample t tests and logistic regression analysis. Results indicated that Bayesian and frequentist approaches to estimating admixture provide similar estimates and inferences. No difference was observed in admixture proportions between African-American cases and controls, but Latino cases and controls significantly differed according to Amerindian and European genetic ancestry. Differences in admixture proportions between Latino cases and controls were not unexpected, since cases were more likely to have been born in the United States. Genetic admixture proportions provide a quantitative measure of ancestry differences among Latinos that can be used in analyses of genetic risk factors.
African Americans; case-control studies; epidemiologic methods; genetics, population; Hispanic Americans; linkage disequilibrium; lung neoplasms; statistics
The definition of European population genetic substructure and its application to understanding complex phenotypes is becoming increasingly important. In the current study using over 4,000 subjects genotyped for 300,000 single-nucleotide polymorphisms (SNPs), we provide further insight into relationships among European population groups and identify sets of SNP ancestry informative markers (AIMs) for application in genetic studies. In general, the graphical description of these principal components analyses (PCA) of diverse European subjects showed a strong correspondence to the geographical relationships of specific countries or regions of origin. Clearer separation of different ethnic and regional populations was observed when northern and southern European groups were considered separately and the PCA results were influenced by the inclusion or exclusion of different self-identified population groups including Ashkenazi Jewish, Sardinian, and Orcadian ethnic groups. SNP AIM sets were identified that could distinguish the regional and ethnic population groups. Moreover, the studies demonstrated that most allele frequency differences between different European groups could be controlled effectively in analyses using these AIM sets. The European substructure AIMs should be widely applicable to ongoing studies to confirm and delineate specific disease susceptibility candidate regions without the necessity of performing additional genome-wide SNP studies in additional subject sets.
Few studies on the association between nucleotide excision repair (NER) variants and lung cancer risk have included Latinos and African Americans. We examine variants in six NER genes (ERCC2, ERCC4, ERCC5, LIG1, RAD23B and XPC) in association with primary lung cancer risk among 113 Latino and 255 African American subjects newly diagnosed with primary lung cancer from 1998 to 2003 in the San Francisco Bay Area, and 579 healthy controls (299 Latinos and 280 African Americans). Individual single nucleotide polymorphism and haplotype analyses, multifactor dimensionality reduction, and principal components analysis were performed to assess the association between six genes in the NER pathway and lung cancer risk. Among Latinos, ERCC2 haplotype CGA (rs238406, rs11878644, rs6966) was associated with reduced lung cancer risk [odds ratio (OR) of 0.65 and 95% confidence interval (CI): 0.44-0.97], especially among non-smokers (OR=0.29; 95% CI: 0.12-0.67). From multifactor dimensionality reduction analysis, in Latinos, smoking and three SNPs (ERCC2 rs171140, ERCC5 rs17655, and LIG1 rs20581) together had a prediction accuracy of 67.4% (p=0.001) for lung cancer. Among African Americans, His/His genotype of ERCC5 His1104Asp (rs17655) was associated with increased lung cancer risk (OR=1.78; 95% CI: 1.09-2.91), and LIG1 haplotype GGGAA (rs20581, rs156641, rs3730931, rs20579, and rs439132) was associated with reduced lung cancer risk (OR=0.61; 95% CI: 0.42-0.88). Our study suggests different elements of the NER pathway may be important in the different ethnic groups, resulting from different linkage relationships, genetic backgrounds, and/or exposure histories.
nucleotide excision repair; DNA repair; lung cancer; African Americans; Latinos
Accounting for the genetic substructure of human populations has become a major practical issue for studying complex genetic disorders. Allele frequency differences among ethnic groups and subgroups and admixture between different ethnic groups can result in frequent false-positive results or reduced power in genetic studies. Here, we review the problems and progress in defining population differences and the application of statistical methods to improve association studies. It is now possible to take into account the confounding effects of population stratification using thousands of unselected genome-wide single-nucleotide polymorphisms or, alternatively, selected panels of ancestry informative markers. These methods do not require any demographic information and therefore can be widely applied to genotypes available from multiple sources. We further suggest that it will be important to explore results in homogeneous population subsets as we seek to define the extent to which genomic variation influences complex phenotypes.
Accounting for population genetic substructure is important in reducing type 1 errors in genetic studies of complex disease. As efforts to understand complex genetic disease are expanded to different continental populations, the understanding of genetic substructure within these continents will be useful in the design and execution of association tests. In this study, population differentiation (Fst) and Principal Components Analyses (PCA) are examined using >200 K genotypes from multiple populations of East Asian ancestry. The population groups included those from the Human Genome Diversity Panel [Cambodian, Yi, Daur, Mongolian, Lahu, Dai, Hezhen, Miaozu, Naxi, Oroqen, She, Tu, Tujia, Naxi, Xibo, and Yakut], HapMap [Han Chinese (CHB) and Japanese (JPT)], and East Asian or East Asian American subjects of Vietnamese, Korean, Filipino and Chinese ancestry. Paired Fst (Weir and Cockerham) showed close relationships between CHB and several large East Asian population groups (CHB/Korean, 0.0019; CHB/JPT, 0.00651; CHB/Vietnamese, 0.0065) with larger separation with Filipino (CHB/Filipino, 0.014). Low levels of differentiation were also observed between Dai and Vietnamese (0.0045) and between Vietnamese and Cambodian (0.0062). Similarly, small Fst's were observed among different presumed Han Chinese populations originating in different regions of mainland China and Taiwan (Fst's <0.0025 with CHB). For PCA, the first two PC's showed a pattern of relationships that closely followed the geographic distribution of the different East Asian populations. PCA showed substructure both between different East Asian groups and within the Han Chinese population. These studies have also identified a subset of East Asian substructure ancestry informative markers (EASTASAIMS) that may be useful for future complex genetic disease association studies in reducing type 1 errors and in identifying homogeneous groups that may increase the power of such studies.
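To give a sense of what a pairwise Fst value measures, the following sketch computes a simple heterozygosity-based Fst for one biallelic SNP in two populations. It is deliberately not the Weir and Cockerham estimator cited in the abstract, which additionally corrects for sample size and is typically combined across many SNPs; the equal population weights, the function name, and the allele frequencies are all hypothetical.

```python
def wright_fst(p1: float, p2: float) -> float:
    """Heterozygosity-based Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)                       # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population
    if h_total == 0.0:
        return 0.0  # SNP is monomorphic in both populations
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies for one SNP in two populations.
print(f"Fst = {wright_fst(0.40, 0.60):.3f}")
```

Values near zero, like those reported between CHB and the Korean, Japanese, and Vietnamese groups, indicate that allele frequencies differ only slightly between the two populations at most SNPs.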
Recent evidence suggests that additional risk loci for RA are present in the major histocompatibility complex (MHC), independent of the class II HLA-DRB1 locus. We have now tested a total of 1,769 SNPs across 7.5 Mb of the MHC located from 6p22.2 (26.03 Mb) to 6p21.32 (33.59 Mb) derived from the Illumina 550K Beadchip (Illumina, San Diego, CA, USA). For an initial analysis in the whole dataset (869 RA CCP+ cases, 1,193 controls), the strongest association signal was observed in markers near the HLA-DRB1 locus, with additional evidence for association extending out into the Class I HLA region. To avoid confounding that may arise due to linkage disequilibrium with DRB1 alleles, we analyzed a subset of the data by matching cases and controls by DRB1 genotype (both alleles matched 1:1), yielding a set of 372 cases with 372 controls. This analysis revealed the presence of at least two regions of association with RA in the Class I region, independent of DRB1 genotype. SNP alleles found on the conserved A1-B8-DR3 (8.1) haplotype show the strongest evidence of positive association (P ~ 0.00005) clustered in the region around the HLA-C locus. In addition, we identified risk alleles that are not present on the 8.1 haplotype, with maximal association signals (P ~ 0.001–0.0027) located near the ZNF311 locus. This latter association is enriched in DRB1*0404 individuals. Finally, several additional association signals were found in the extreme centromeric portion of the MHC, in regions containing the DOB1, TAP2, DPB1, and COL11A2 genes. These data emphasize that further analysis of the MHC is likely to reveal genetic risk factors for rheumatoid arthritis that are independent of the DRB1 shared epitope alleles.