Infectious and inflammatory diseases have repeatedly shown strong genetic associations within the major histocompatibility complex (MHC); however, the basis for these associations remains elusive. To define host genetic effects on the outcome of a chronic viral infection, we performed genome-wide association analysis in a multiethnic cohort of HIV-1 controllers and progressors, and we analyzed the effects of individual amino acids within the classical human leukocyte antigen (HLA) proteins. We identified >300 genome-wide significant single-nucleotide polymorphisms (SNPs) within the MHC and none elsewhere. Specific amino acids in the HLA-B peptide binding groove, as well as an independent HLA-C effect, explain the SNP associations and reconcile both protective and risk HLA alleles. These results implicate the nature of the HLA–viral peptide interaction as the major factor modulating durable control of HIV infection.
Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputed genotypes with ‘true’ genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05–0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r2, increased from 0.61 to 0.71. We also saw improved imputation accuracy for other European populations (in the British samples, r2 improved from 0.58 to 0.65, and in the Italians from 0.43 to 0.47). A combined reference set comprising 1000G and GoNL improved the imputation of rare variants even further. The Italian samples benefitted the most from this combined reference (the mean r2 increased from 0.47 to 0.50). We conclude that the creation of a large population-specific reference is advantageous for imputing rare variants and that a combined reference panel across multiple populations yields the best imputation results.
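The accuracy metric used in this benchmark can be illustrated with a minimal sketch, assuming r2 is computed per variant as the squared Pearson correlation between array-typed genotypes (coded 0/1/2) and imputed allele dosages (0 to 2). The function name and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Assumption: genotypes are coded 0/1/2 and dosages lie in [0, 2].
    Returns NaN when either vector is constant (correlation undefined).
    """
    g = np.asarray(true_genotypes, dtype=float)
    d = np.asarray(imputed_dosages, dtype=float)
    if g.std() == 0 or d.std() == 0:
        return float("nan")
    r = np.corrcoef(g, d)[0, 1]
    return float(r ** 2)

# Invented toy data for one variant in eight samples (illustration only).
true_g = [0, 0, 1, 0, 2, 0, 1, 0]
dosage = [0.1, 0.0, 0.8, 0.2, 1.7, 0.0, 0.6, 0.1]
r2 = imputation_r2(true_g, dosage)
print(f"r2 = {r2:.3f}")
```

Averaging this per-variant value over all variants in a frequency bin would give a mean r2 of the kind reported above, under the stated assumption about how the metric is defined.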
genotype imputation; GWAS; GoNL; rare variants; reference sets; reference panel
Alopecia areata (AA) is a prevalent autoimmune disease with ten known susceptibility loci. Here we perform the first meta-analysis in AA by combining data from two genome-wide association studies (GWAS), and replication with supplemented ImmunoChip data for a total of 3,253 cases and 7,543 controls. The strongest region of association is the MHC, where we fine-map four independent effects, all implicating HLA-DR as a key etiologic driver. Outside the MHC, we identify two novel loci that exceed statistical significance, containing ACOXL/BCL2L11(BIM) (2q13) and GARP (LRRC32) (11q13.5), as well as a third nominally significant region, SH2B3(LNK)/ATXN2 (12q24.12). Candidate susceptibility gene expression analysis in these regions demonstrates expression in relevant immune cells and the hair follicle. We integrate our results with data from seven other autoimmune diseases and provide insight into the alignment of AA within these disorders. Our findings uncover new molecular pathways disrupted in AA, including autophagy/apoptosis, TGFβ/Tregs and JAK kinase signaling, and support the causal role of aberrant immune processes in AA.
Marginal zone lymphoma (MZL) is the third most common subtype of B-cell non-Hodgkin lymphoma. Here we perform a two-stage GWAS of 1,281 MZL cases and 7,127 controls of European ancestry and identify two independent loci near BTNL2 (rs9461741, P=3.95×10−15) and HLA-B (rs2922994, P=2.43×10−9) in the HLA region significantly associated with MZL risk. This is the first evidence that genetic variation in the major histocompatibility complex influences MZL susceptibility.
Substantial clinical, pathological and genetic overlap exists between amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). TDP-43 inclusions have been found in both ALS and FTD cases (FTD-TDP). Recently, a repeat expansion in C9orf72 was identified as the causal variant in a proportion of ALS and FTD cases. We sought to identify additional evidence for a common genetic basis for the spectrum of ALS-FTD.
We used published GWAS data of 4,377 ALS patients and 13,017 controls and 435 pathology-proven FTD-TDP cases and 1,414 controls for genotype imputation. Data were analyzed in a joint meta-analysis, by replicating topmost associated hits of one disease in the other, and by using a conservative rank products analysis, allocating equal weight to ALS and FTD-TDP sample sizes.
Meta-analysis identified 19 genome-wide significant single nucleotide polymorphisms (SNPs) at C9orf72 on chromosome 9p21.2 (lowest p=2.6×10−12) and one SNP in UNC13A on chromosome 19p13.11 (p=1.0×10−11) as shared susceptibility loci for ALS and FTD-TDP. Conditioning on the 9p21.2 genotype increased statistical significance at UNC13A. A third signal, on chromosome 8q24.13 at the SPG8 locus coding for strumpellin (p=3.91×10−7), was replicated in an independent cohort of 4,056 ALS patients and 3,958 controls (p=0.026; combined analysis p=1.01×10−7).
We identified common genetic variants at C9orf72, and additionally in UNC13A, that are shared between ALS and FTD. UNC13A provides a novel link between ALS and FTD-TDP, and identifies changes in neurotransmitter release and synaptic function as a converging mechanism in the pathogenesis of ALS and FTD-TDP.
To identify genetic determinants of granulomatosis with polyangiitis (Wegener’s) (GPA).
We carried out a genome-wide association study (GWAS) of 492 GPA cases and 1,506 healthy controls (white subjects of European descent), followed by replication analysis of the most strongly associated signals in an independent cohort of 528 GPA cases and 1,228 controls.
Genome-wide significant associations were identified in 32 single-nucleotide polymorphic (SNP) markers across the HLA region, the majority of which were located in the HLA–DPB1 and HLA–DPA1 genes encoding the class II major histocompatibility complex (MHC) DPβ chain 1 and DPα chain 1 proteins, respectively. Peak association signals in these 2 genes, emanating from SNPs rs9277554 (for DPβ chain 1) and rs9277341 (DPα chain 1) were strongly replicated in an independent cohort (in the combined analysis of the initial cohort and the replication cohort, P = 1.92 × 10−50 and 2.18 × 10−39, respectively). Imputation of classic HLA alleles and conditional analyses revealed that the SNP association signal was fully accounted for by the classic HLA–DPB1*04 allele. An independent single SNP, rs26595, near SEMA6A (the gene for semaphorin 6A) on chromosome 5, was also associated with GPA, reaching genome-wide significance in a combined analysis of the GWAS and replication cohorts (P = 2.09 × 10−8).
We identified the SEMA6A and HLA–DP loci as significant contributors to risk for GPA, with the HLA–DPB1*04 allele almost completely accounting for the MHC association. These two associations confirm the critical role of immunogenetic factors in the development of GPA.
Ankylosing spondylitis (AS) is a common, highly heritable, inflammatory arthritis for which HLA-B*27 is the major genetic risk factor, although its role in the aetiology of AS remains elusive. To better understand the genetic basis of the MHC susceptibility loci, we genotyped 7,264 MHC SNPs in 22,647 AS cases and controls of European descent. We imputed SNPs, classical HLA alleles and amino-acid residues within HLA proteins, and tested these for association with AS status. Here we show that in addition to effects due to HLA-B*27 alleles, several other HLA-B alleles also affect susceptibility. After controlling for the associated haplotypes in HLA-B, we observe independent associations with variants in the HLA-A, HLA-DPB1 and HLA-DRB1 loci. We also demonstrate that the ERAP1 SNP rs30187 association is not restricted to carriers of HLA-B*27 but is also found in HLA-B*40:01 carriers independently of HLA-B*27 genotype.
Ankylosing spondylitis is a common, highly heritable inflammatory arthritis with poorly understood biology. Here Brown, Cortes and colleagues use fine mapping of the major histocompatibility complex to identify novel associations, including other HLA alleles that, like HLA-B27, interact with ERAP1 variants to influence disease risk.
Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here we present the largest trans-ethnic genome-wide meta-analysis (GWMA) of psoriasis in 15,369 cases and 19,517 controls of Caucasian and Chinese ancestries. We identify four novel associations at LOC144817, COG6, RUNX1 and TP63, as well as three novel secondary associations within IFIH1 and IL12B. Fine-mapping analysis of MHC region demonstrates an important role for all three HLA class I genes and a complex and heterogeneous pattern of HLA associations between Caucasian and Chinese populations. Further, trans-ethnic comparison suggests population-specific effect or allelic heterogeneity for 11 loci. These population-specific effects contribute significantly to the ethnic diversity of psoriasis prevalence. This study not only provides novel biological insights into the involvement of immune and keratinocyte development mechanism, but also demonstrates a complex and heterogeneous genetic architecture of psoriasis susceptibility across ethnic populations.
Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here Yin et al. conduct a large trans-ethnic genome-wide meta-analysis and identify novel loci that contribute to population-specific susceptibility.
Patients who have suffered from cerebral ischemia have a high risk of recurrent vascular events. Predictive models based on classical risk factors typically have limited prognostic value. Given that cerebral ischemia has a heritable component, genetic information might improve performance of these risk models. Our aim was to develop and compare two models: one containing traditional vascular risk factors, the other also including genetic information.
Methods and Results
We studied 1020 patients with cerebral ischemia and genotyped them with the Illumina Immunochip. Median follow-up time was 6.5 years; the annual incidence of new ischemic events (primary outcome, n=198) was 3.0%. The prognostic model based on classical vascular risk factors had an area under the receiver operating characteristics curve (AUC-ROC) of 0.65 (95% confidence interval 0.61-0.69). When we added a genetic risk score based on prioritized SNPs from a genome-wide association study of ischemic stroke (using summary statistics from the METASTROKE study which included 12389 cases and 62004 controls), the AUC-ROC remained the same. Similar results were found for the secondary outcome ischemic stroke.
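The comparison described above can be sketched in simplified form. The following is a minimal illustration, assuming the genetic risk score is a weighted sum of risk-allele dosages and that discrimination is summarized by a rank-based AUC on a binary outcome. It ignores follow-up time and censoring, which a prognostic analysis of this kind would need to handle, and all names, weights and data are hypothetical.

```python
import numpy as np

def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (rows: patients, columns: SNPs).

    Assumption: weights are per-SNP effect sizes (e.g. log odds ratios)
    taken from external summary statistics.
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def auc_roc(scores, events):
    """Probability that a random patient with an event scores higher than
    a random patient without one; ties count as one half."""
    scores = np.asarray(scores, dtype=float)
    events = np.asarray(events, dtype=bool)
    pos, neg = scores[events], scores[~events]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

# Invented toy data: two patients, three SNPs.
grs = genetic_risk_score([[0, 1, 2], [2, 0, 1]], [0.1, 0.2, 0.3])

# Invented toy scores and outcomes for four patients.
auc = auc_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(grs, auc)
```

Under these assumptions, adding the score to a clinical model and finding an unchanged AUC would correspond to the result reported here: the score does not improve the ranking of patients who go on to have an event.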
We found no additional value of genetic information in a prognostic model for the risk of ischemic events in patients with cerebral ischemia of arterial origin. This is consistent with a complex, polygenic architecture, where many genes of weak effect likely act in concert to influence the heritable risk of an individual to develop (recurrent) vascular events. At present, genetic information cannot help clinicians to distinguish patients at high risk for recurrent vascular events.
Efavirenz and abacavir are components of recommended first-line regimens for human immunodeficiency virus (HIV)-1 infection. We used genome-wide genotyping and clinical data to explore genetic associations with virologic failure among subjects randomized to efavirenz- or abacavir-containing regimens in AIDS Clinical Trials Group (ACTG) protocols.
Virologic response and genome-wide genotype data were available from treatment-naive subjects randomized to efavirenz-containing (n=1,596) or abacavir-containing (n=786) regimens in ACTG protocols 384, A5142, A5095, and A5202.
Meta-analysis of association results across race/ethnic groups showed no genome-wide significant associations (p<5×10−8) with virologic response for either efavirenz or abacavir. Our sample size provided 80% power to detect a genotype relative risk of 1.8 for efavirenz, and 2.4 for abacavir. Analyses focused on CYP2B genotypes that define the lowest plasma efavirenz exposure stratum did not reveal associations, nor did analysis limited to gene sets predicted to be relevant to efavirenz and abacavir disposition.
No single polymorphism is strongly associated with virologic failure with efavirenz- or abacavir-containing regimens. Analyses to better consider context, and that minimize confounding by non-genetic factors, may reveal associations not apparent herein.
HIV-1; efavirenz; abacavir; pharmacogenomics; virologic failure
Genome-wide association studies have been successful in identifying common variants that influence the susceptibility to complex diseases. From these studies, it has emerged that there is substantial overlap in susceptibility loci between diseases. In line with those findings, we hypothesized that shared genetic pathways may exist between multiple sclerosis (MS) and amyotrophic lateral sclerosis (ALS). While both diseases may have inflammatory and neurodegenerative features, epidemiological studies have indicated an increased co-occurrence within individuals and families. To this end, we combined genome-wide data from 4088 MS patients, 3762 ALS patients and 12 030 healthy control individuals in whom 5 440 446 single-nucleotide polymorphisms (SNPs) were successfully genotyped or imputed. We tested these SNPs for excess association shared between MS and ALS and also explored whether polygenic models of SNPs below genome-wide significance could explain some of the observed trait variance between diseases. Genome-wide association meta-analysis of SNPs and polygenic analyses both fail to provide evidence in favor of an overlap in genetic susceptibility between MS and ALS. Hence, our findings do not support a shared genetic background of common risk variants in MS and ALS.
Variants associated with blood lipid levels may be population-specific. To identify low-frequency variants associated with this phenotype, population-specific reference panels may be used. Here we impute nine large Dutch biobanks (~35,000 samples) with the population-specific reference panel created by the Genome of the Netherlands Project and perform association testing with blood lipid levels. We report the discovery of five novel associations at four loci (P value <6.61 × 10−4), including a rare missense variant in ABCA6 (rs77542162, p.Cys1359Arg, frequency 0.034), which is predicted to be deleterious. The frequency of this ABCA6 variant is 3.65-fold increased in the Dutch and its effect (βTC=0.140) is estimated to be very similar to those observed for single variants in well-known lipid genes, such as LDLR.
Frequencies of rare variants fluctuate over populations, hampering gene discovery. Here the authors use a population-specific reference panel, the Genome of the Netherlands, to discover four novel loci involved in lipid metabolism, including an exonic variant in ABCA6.
Genome-wide association studies (GWAS) have identified thousands of loci associated with complex traits, but it is challenging to pinpoint causal genes in these loci and to exploit subtle association signals. We used tissue-specific quantitative interaction proteomics to map a network of five genes involved in the Mendelian disorder long QT syndrome (LQTS). We integrated the LQTS network with GWAS loci from the corresponding common complex trait, QT interval variation, to identify candidate genes that were subsequently confirmed in Xenopus laevis oocytes and zebrafish. We used the LQTS protein network to filter weak GWAS signals by identifying single nucleotide polymorphisms (SNPs) in proximity to genes in the network supported by strong proteomic evidence. Three SNPs passing this filter reached genome-wide significance after replication genotyping. Overall, we present a general strategy to propose candidates in GWAS loci for functional studies and to systematically filter subtle association signals using tissue-specific quantitative interaction proteomics.
The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal Mendelian Long QT Syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals we identified 35 common variant QT interval loci that collectively explain ∼8-10% of QT variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 novel QT loci in 298 unrelated LQTS probands identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies novel candidate genes for ventricular arrhythmias, LQTS, and SCD.
genome-wide association study; QT interval; Long QT Syndrome; sudden cardiac death; myocardial repolarization; arrhythmias
Marginal zone lymphoma (MZL) is a common subtype of B-cell non-Hodgkin lymphoma. Here the authors carry out a two-stage genome-wide association study in over 8,000 Europeans and identify two new MZL risk loci at chromosome 6p, implicating the major histocompatibility complex in the disease for the first time.
Motivation: Recently, investigators have proposed state-of-the-art Identity-by-descent (IBD) mapping methods to detect IBD segments between purportedly unrelated individuals. The IBD information can then be used for association testing in genetic association studies. One approach for this IBD association testing strategy is to test for excessive IBD between pairs of cases (‘pairwise method’). However, this approach is inefficient because it requires a large number of permutations. Moreover, a limited number of permutations define a lower bound for P-values, which makes fine-mapping of associated regions difficult because, in practice, a much larger genomic region is implicated than the region that is actually associated.
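The lower bound on permutation P-values mentioned above can be made concrete with a short sketch. It assumes the common add-one convention for empirical P-values, (exceedances + 1) / (permutations + 1); without that correction the smallest nonzero value is 1 / permutations. The function names are hypothetical and this is plain permutation testing, not the importance-sampling approach of the paper.

```python
import numpy as np

def permutation_p_value(observed, permuted_stats):
    """Empirical P-value with the add-one correction (an assumed convention):
    the fraction of permuted statistics at least as extreme as the observed one."""
    permuted = np.asarray(permuted_stats, dtype=float)
    exceed = int((permuted >= observed).sum())
    return (exceed + 1) / (permuted.size + 1)

def min_permutation_p(n_permutations):
    """Smallest P-value obtainable with n_permutations under the same convention."""
    return 1 / (n_permutations + 1)

# The observed statistic beats every permutation, yet P cannot drop below the bound.
p = permutation_p_value(5.0, [1.0, 2.0, 3.0])
print(p, min_permutation_p(3))

# Even a million permutations cannot resolve P-values near 5e-8.
print(min_permutation_p(10**6))
```

Because every strongly associated position hits the same floor, neighbouring markers become indistinguishable, which is why a limited permutation budget implicates a wider region than the one actually associated.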
Results: In this article, we introduce a new pairwise method, ‘Fast-Pairwise’. Fast-Pairwise uses importance sampling to improve efficiency and enable approximation of extremely small P-values. The Fast-Pairwise method takes only days to complete a genome-wide scan. In the application to the WTCCC type 1 diabetes data, Fast-Pairwise successfully fine-maps a human leukocyte antigen gene that is known to cause the disease.
Availability: Fast-Pairwise is publicly available at: http://genetics.cs.ucla.edu/graphibd.
Supplementary data are available at Bioinformatics online.
To identify novel genetic risk factors for rheumatoid arthritis (RA), we conducted a genome-wide association study (GWAS) meta-analysis of 5,539 autoantibody positive RA cases and 20,169 controls of European descent, followed by replication in an independent set of 6,768 RA cases and 8,806 controls. Of 34 SNPs selected for replication, 7 novel RA risk alleles were identified at genome-wide significance (P<5×10−8) in analysis of all 41,282 samples. The associated SNPs are near genes of known immune function, including IL6ST, SPRED2, RBPJ, CCR6, IRF5, and PXK. We also refined the risk alleles at two established RA risk loci (IL2RA and CCL21) and confirmed the association at AFF3. These new associations bring the total number of confirmed RA risk loci to 31 among individuals of European ancestry. An additional 11 SNPs replicated at P<0.05, many of which are validated autoimmune risk alleles, suggesting that most represent bona fide RA risk alleles.
Background and Purpose
Meta-analyses of extant genome-wide data illustrate the need to focus on subtypes of ischemic stroke for gene discovery. The NINDS Stroke Genetics Network (SiGN) contributes substantially to meta-analyses that focus on specific subtypes of stroke.
The NINDS Stroke Genetics Network (SiGN) includes ischemic stroke cases from 24 Genetic Research Centers (GRCs), 13 from the US and 11 from Europe. Investigators harmonize ischemic stroke phenotyping using the web-based Causative Classification of Stroke (CCS) system, with data entered by trained and certified adjudicators at participating GRCs. Through the Center for Inherited Diseases Research (CIDR), SiGN plans to genotype 10,296 carefully phenotyped stroke cases using genome-wide SNP arrays, and add to these another 4,253 previously genotyped cases for a total of 14,549 cases. To maximize power for subtype analyses, the study allocates genotyping resources almost exclusively to cases. Publicly available studies provide most of the control genotypes. CIDR-generated genotypes and corresponding phenotypic data will be shared with the scientific community through dbGaP, and brain MRI studies will be centrally archived.
The SiGN consortium, with its emphasis on careful and standardized phenotyping of ischemic stroke and stroke subtypes, provides an unprecedented opportunity to uncover genetic determinants of ischemic stroke.
ischemic stroke; genetics; genomics
Genome-wide association studies (GWAS) have begun to identify the common genetic component to ischaemic stroke (IS). However, IS has considerable phenotypic heterogeneity. Where clinical covariates explain a large fraction of disease risk, covariate informed designs can increase power to detect associations. As prevalence rates in IS are markedly affected by age, and younger onset cases may have higher genetic predisposition, we investigated whether an age-at-onset informed approach could detect novel associations with IS and its subtypes; cardioembolic (CE), large artery atherosclerosis (LAA) and small vessel disease (SVD) in 6,778 cases of European ancestry and 12,095 ancestry-matched controls. Regression analysis to identify SNP associations was performed on posterior liabilities after conditioning on age-at-onset and affection status. We sought further evidence of an association with LAA in 1,881 cases and 50,817 controls, and examined mRNA expression levels of the nearby genes in atherosclerotic carotid artery plaques. Secondly, we performed permutation analyses to evaluate the extent to which age-at-onset informed analysis improves significance for novel loci. We identified a novel association with an MMP12 locus in LAA (rs660599; p = 2.5×10−7), with independent replication in a second population (p = 0.0048, OR(95% CI) = 1.18(1.05–1.32); meta-analysis p = 2.6×10−8). The nearby gene, MMP12, was significantly overexpressed in carotid plaques compared to atherosclerosis-free control arteries (p = 1.2×10−15; fold change = 335.6). Permutation analyses demonstrated improved significance for associations when accounting for age-at-onset in all four stroke phenotypes (p<0.001). Our results show that a covariate-informed design, by adjusting for age-at-onset of stroke, can detect variants not identified by conventional GWAS.
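As a worked check on the replication statistics quoted above, an approximate P-value can be recovered from an odds ratio and its 95% confidence interval. This is a minimal sketch assuming a Wald test on the log odds ratio with a symmetric interval on the log scale; the function name is hypothetical, and rounding of the published OR and interval means the result will only be close to the reported value.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald P-value from an OR and its 95% CI.

    Assumptions: the CI is symmetric on the log scale and was built
    with the normal quantile z_crit.
    """
    beta = math.log(odds_ratio)                              # log odds ratio
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))                     # two-sided normal tail
    return beta, se, p

# Replication estimate for rs660599 quoted in the text: OR 1.18 (1.05-1.32).
beta, se, p = p_from_or_ci(1.18, 1.05, 1.32)
print(f"beta={beta:.4f}  se={se:.4f}  p={p:.4f}")
```

Under these assumptions the recovered value should land in the same range as the reported replication p = 0.0048, with small differences expected from rounding.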
Ischaemic stroke places an enormous burden on global healthcare. However, the disease processes that lead to stroke are not fully understood. Genome-wide association studies have recently established that common genetic variants can increase risk of ischaemic stroke and its subtypes. In this study, we aimed to identify novel genetic associations with ischaemic stroke and its subtypes by addressing the fact that younger onset cases may have a stronger genetic component, and using this information in our analyses. We identify a novel genetic variant on chromosome 11 (rs660599), which is associated with increased risk of large artery stroke. We also show that mRNA expression of the nearest gene (MMP12) is higher in arteries with the disease process underlying large artery stroke (atherosclerosis). Finally, we evaluate our novel analysis approach, and show that our method is likely to identify further associations with ischaemic stroke.
Single nucleotide polymorphisms in the APOA5-A4-C3-A1 gene complex are associated with elevated plasma triglycerides and elevated vascular risk in healthy populations. In patients with clinically manifest vascular disease, hypertriglyceridemia and metabolic syndrome are frequently present, but the contribution of these single nucleotide polymorphisms to plasma triglycerides, effect modification by obesity and risk of recurrent vascular events is unknown in these patients.
Prospective cohort study of 5547 patients with vascular disease. Rs964184 (APOA5-A4-C3-A1 gene complex) was genotyped, and we evaluated the relation with plasma lipid levels, presence of metabolic syndrome and the risk for new vascular events.
The minor allele of rs964184 was strongly associated with log plasma triglycerides (β 0.12; 95%CI 0.10–0.15, p = 1.1×10−19), and was also associated with 0.03 mmol/L lower high-density lipoprotein-cholesterol (95%CI 0.01–0.04), and 0.14 mmol/L higher non-high-density lipoprotein-cholesterol (95%CI 0.09–0.20). The minor allele frequency increased from 10.9% in patients with plasma triglycerides <1 mmol/L to 24.6% in patients with plasma triglycerides between 4 and 10 mmol/L. The relation between rs964184 and plasma triglycerides was modified by body mass index in patients with one minor allele (β 0.02 (95%CI −0.04–0.09) if body mass index <24 kg/m2, β 0.17 (95%CI 0.12–0.22) if body mass index >27 kg/m2, p for interaction = 0.02). The prevalence of the metabolic syndrome increased from 52% for patients with two copies of the major allele to 62% for patients with two copies of the minor allele (p = 0.01). Rs964184 was not related to recurrent vascular events (HR 0.99; 95%CI 0.86–1.13).
The single nucleotide polymorphism rs964184 (APOA5-A4-C3-A1) is associated with elevated plasma triglycerides concentrations in patients with clinically manifest vascular disease. In carriers of one minor allele, the effect on plasma triglycerides was modified by body mass index. There is no relation between rs964184 and recurrent vascular events in these patients.
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate–increasing and heart rate–decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
Phenotypic determination of HIV-1 coreceptor usage was performed on 593 pre-treatment plasma HIV-1 samples from treatment-naive participants in ACTG A5095. No human genetic variants were significantly associated with virus able to use CXCR4 for entry at the genome-wide level.
We conducted a genome-wide association study to explore whether common host genetic variants (>5% frequency) were associated with presence of virus able to use CXCR4 for entry.
Phenotypic determination of human immunodeficiency virus (HIV)-1 coreceptor usage was performed on pretreatment plasma HIV-1 samples from treatment-naive participants in AIDS Clinical Trials Group A5095, a study of initial antiretroviral regimens. Associations between genome-wide single-nucleotide polymorphisms (SNPs), CCR5 Δ32 genotype, and human leukocyte antigen (HLA) class I alleles and viral coreceptor usage were explored.
Viral phenotypes were obtained from 593 patients with available genome-wide SNP data. Forty-four percent of subjects had virus capable of using CXCR4 for entry as determined by phenotyping. Overall, no associations, including those between polymorphisms in genes encoding viral coreceptors and their promoter regions or in HLA genes previously associated with HIV-1 disease progression, passed the statistical threshold for genome-wide significance (P < 5.0 × 10−8) in any comparison. However, the presence of viruses able to use CXCR4 for entry was marginally associated with the CCR5 Δ32 genotype in the nongenome-wide analysis.
No human genetic variants were significantly associated with virus able to use CXCR4 for entry at the genome-wide level. Although the sample size had limited power to definitively exclude genetic associations, these results suggest that host genetic factors, including those that influence coreceptor expression or the immune pressures leading to viral envelope diversity, are either rare or have only modest effects in determining HIV-1 coreceptor usage.
CCR5 Δ32 mutation; genome-wide association study; HIV-1; viral coreceptor usage; viral tropism
Genome-wide association studies (GWAS) are widely applied to identify susceptibility loci for a variety of diseases using genotyping arrays that interrogate known polymorphisms throughout the genome. A particular strength of GWAS is that it is unbiased with respect to specific genomic elements (e.g., coding or regulatory regions of genes), and it has revealed important associations that would never have been suspected based on prior knowledge or assumptions. To date, the discovered SNPs associated with complex human traits tend to have small effect sizes, requiring very large sample sizes to achieve robust statistical power. To address these issues, a number of efficient strategies have emerged for conducting GWAS, including combining study results across multiple studies using meta-analysis, collecting cases through electronic health records, and using samples collected from other studies as controls that have already been genotyped and made publicly available (e.g., through deposition of de-identified data into dbGaP or EGA). In certain scenarios, it may be attractive to use already genotyped controls and divert resources to standardized collection, phenotyping, and genotyping of cases only. This strategy, however, requires that careful attention be paid to the choice of “public controls” and to the comparability of genetic data between cases and the public controls to ensure that any allele frequency differences observed between groups are attributable to locus-specific effects rather than to a systematic bias due to poor matching (population stratification) or differential genotype calling (batch effects). The goal of this paper is to describe some of the potential pitfalls in using previously genotyped control data. We focus on considerations related to the choice of control groups, the use of different genotyping platforms, and approaches to deal with population stratification when cases and controls are genotyped across different platforms.
genome-wide association study; case-control study; genetic association study; population stratification; power