To investigate the underlying mechanisms of T2D pathogenesis, we looked for diabetes susceptibility genes that increase the risk of type 2 diabetes (T2D) in a Han Chinese population. A two-stage genome-wide association (GWA) study was conducted, in which 995 patients and 894 controls were genotyped using the Illumina HumanHap550-Duo BeadChip for the first genome scan stage. This was further replicated in 1,803 patients and 1,473 controls in stage 2. We found two loci not previously associated with diabetes susceptibility in and around the genes protein tyrosine phosphatase receptor type D (PTPRD) (P = 8.54×10−10; odds ratio [OR] = 1.57; 95% confidence interval [CI] = 1.36–1.82), and serine racemase (SRR) (P = 3.06×10−9; OR = 1.28; 95% CI = 1.18–1.39). We also confirmed that variants in KCNQ1 were associated with T2D risk, with the strongest signal at rs2237895 (P = 9.65×10−10; OR = 1.29, 95% CI = 1.19–1.40). By identifying two novel genetic susceptibility loci in a Han Chinese population and confirming the involvement of KCNQ1, which was previously reported to be associated with T2D in Japanese and European descent populations, our results may lead to a better understanding of differences in the molecular pathogenesis of T2D among various populations.
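The odds ratios and 95% confidence intervals reported above can be illustrated with a minimal sketch. It assumes a simple 2×2 allele-count table and the Woolf (log-scale) interval; the abstract does not state which test or model produced the published estimates, and the counts used here are hypothetical, not study data.

```python
import math


def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    Arguments are allele counts from a 2x2 table and must all be positive.
    z = 1.96 gives an approximate 95% interval.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not taken from the study).
or_, lower, upper = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, as for the three loci reported above, indicates an association at roughly the 5% level under these assumptions.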
Type 2 diabetes (T2D) is a complex disease that involves many genes and environmental factors. Genome-wide and candidate-gene association studies have thus far identified at least 19 regions containing genes that may confer a risk for T2D. However, most of these studies were conducted with patients of European descent. We studied Chinese patients with T2D and identified two genes, PTPRD and SRR, that were not previously known to be involved in diabetes and are involved in biological pathways different from those implicated in T2D by previous association reports. PTPRD is a protein tyrosine phosphatase and may affect insulin signaling on its target cells. SRR encodes a serine racemase that synthesizes D-serine from L-serine. Both D-serine (coagonist) and the neurotransmitter glutamate bind to NMDA receptors and trigger excitatory neurotransmission in the brain. Glutamate signaling also regulates insulin and glucagon secretion in pancreatic islets. Thus, SRR and D-serine, in addition to regulating insulin and glucagon secretion, may play a role in the etiology of T2D. Our study suggests that, in different patient populations, different genes may confer risks for diabetes. Our findings may lead to a better understanding of the molecular pathogenesis of T2D.
Many candidate genes have been studied for asthma, but replication has varied. Novel candidate genes have been identified for various complex diseases using genome-wide association studies (GWASs). We conducted a GWAS in 492 Mexican children with asthma, predominantly atopic by skin prick test, and their parents using the Illumina HumanHap 550 K BeadChip to identify novel genetic variation for childhood asthma. The 520,767 autosomal single nucleotide polymorphisms (SNPs) passing quality control were tested for association with childhood asthma using log-linear regression with a log-additive risk model. Eleven of the most significantly associated GWAS SNPs were tested for replication in an independent study of 177 Mexican case–parent trios with childhood-onset asthma and atopy using log-linear analysis. The chromosome 9q21.31 SNP rs2378383 (p = 7.10×10−6 in the GWAS), located upstream of transducin-like enhancer of split 4 (TLE4), gave a p-value of 0.03 and the same direction and magnitude of association in the replication study (combined p = 6.79×10−7). Ancestry analysis on chromosome 9q supported an inverse association between the rs2378383 minor allele (G) and childhood asthma. This work identifies chromosome 9q21.31 as a novel susceptibility locus for childhood asthma in Mexicans. Further, analysis of genome-wide expression data in 51 human tissues from the Novartis Research Foundation showed that median GWAS significance levels for SNPs in genes expressed in the lung differed most significantly from genes not expressed in the lung when compared to 50 other tissues, supporting the biological plausibility of our overall GWAS findings and the multigenic etiology of childhood asthma.
Asthma is a leading chronic childhood disease with a presumed strong genetic component, but no genes have been definitely shown to influence asthma development. Few genetic studies of asthma have included Hispanic populations. Here, we conducted a genome-wide association study of asthma in 492 Mexican children with asthma, predominantly atopic by skin prick test, and their parents to identify novel genetic variation for childhood asthma. We implicated several polymorphisms in or near TLE4 on chromosome 9q21.31 (a novel candidate region for childhood asthma) and replicated one polymorphism in an independent study of childhood-onset asthmatics with atopy and their parents of Mexican ethnicity. Hispanics have differing proportions of Native American, European, and African ancestries, and we found less Native American ancestry than expected at chromosome 9q21.31. This suggests that chromosome 9q21.31 may underlie ethnic differences in childhood asthma and that future replication would be most effective in populations with Native American ancestry. Analysis of publicly available genome-wide expression data revealed that association signals in genes expressed in the lung differed most significantly from genes not expressed in the lung when compared to 50 other tissues, supporting the biological plausibility of the overall GWAS findings and the multigenic etiology of asthma.
African Americans have increased susceptibility to non-diabetic (non-DM) forms of end-stage renal disease (ESRD) and extensive evidence supports a genetic contribution. A genome-wide association study (GWAS) using pooled DNA was performed in 1,000 African Americans to detect associated genes. DNA from 500 non-DM ESRD cases and 500 non-nephropathy controls was quantified using gel electrophoresis and spectrophotometric analysis and pools of 50 case and 50 control DNA samples were created. DNA pools were genotyped in duplicate on the Illumina HumanHap550-Duo BeadChip. Normalization methods were developed and applied to array intensity values to reduce inter-array variance. Allele frequencies were calculated from normalized channel intensities and compared between case and control pools. Three SNPs had p values of <1.0E–6: rs4462445 (ch 13), rs4821469 (ch 22) and rs8077346 (ch 17). After normalization, top scoring SNPs (n = 65) were genotyped individually in 464 of the original cases and 478 of the controls, with replication in 336 non-DM ESRD cases and 363 non-nephropathy controls. Sixteen SNPs were associated with non-DM ESRD (p < 7.7E–4, Bonferroni corrected). Twelve of these SNPs are in or near the MYH9 gene. The four non-MYH9 SNPs that were associated with non-DM ESRD in the pooled samples were not associated in the replication set. Five SNPs that were modestly associated in the pooled samples were more strongly associated in the replication and/or combined samples. This is the first GWAS for non-DM ESRD in African Americans using pooled DNA. We demonstrate strong association between non-DM ESRD in African Americans with MYH9, and have identified additional candidate loci.
The first genome wide association study (GWAS) for childhood asthma identified a novel major susceptibility locus on chromosome 17q21 harboring the ORMDL3 gene, but the role of previous asthma candidate genes was not specifically analyzed in this GWAS. We systematically identified 89 SNPs in 14 candidate genes previously associated with asthma in >3 independent study populations. We re-genotyped 39 SNPs in these genes not covered by GWAS performed in 703 asthmatics and 658 reference children. Genotyping data were compared to imputation data derived from Illumina HumanHap300 chip genotyping. Results were combined to analyze 566 SNPs covering all 14 candidate gene loci. Genotyped polymorphisms in ADAM33, GSTP1 and VDR showed effects with p-values <0.0035 (corrected for multiple testing). Combining genotyping and imputation, polymorphisms in DPP10, EDN1, IL12B, IL13, IL4, IL4R and TNF showed associations at a significance level between p = 0.05 and p = 0.0035. These data indicate that (a) GWAS coverage is insufficient for many asthma candidate genes, (b) imputation based on these data is reliable but incomplete, and (c) SNPs in three previously identified asthma candidate genes replicate in our GWAS population with significance after correction for multiple testing in 14 genes.
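The corrected significance level of 0.0035 is in line with a Bonferroni adjustment of a nominal 0.05 level across the 14 candidate genes. The sketch below shows that arithmetic; that the threshold was derived exactly this way is an assumption, and the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Nominal alpha of 0.05 spread over 14 candidate genes (assumed derivation).
threshold = bonferroni_threshold(0.05, 14)
print(f"per-gene threshold: {threshold:.5f}")
```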
Recent genome-wide association (GWA) studies of lipids have been conducted in samples ascertained for other phenotypes, particularly diabetes. Here we report the first GWA analysis of loci affecting total cholesterol (TC), low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol and triglycerides sampled randomly from 16 population-based cohorts and genotyped using mainly the Illumina HumanHap300-Duo platform. Our study included a total of 17,797–22,562 persons, aged 18–104 years and from geographic regions spanning from the Nordic countries to Southern Europe. We established 22 loci associated with serum lipid levels at a genome-wide significance level (P < 5 × 10−8), including 16 loci that were identified by previous GWA studies. The six newly identified loci in our cohort samples are ABCG5 (TC, P = 1.5 × 10−11; LDL, P = 2.6 × 10−10), TMEM57 (TC, P = 5.4 × 10−10), CTCF-PRMT8 region (HDL, P = 8.3 × 10−16), DNAH11 (LDL, P = 6.1 × 10−9), FADS3-FADS2 (TC, P = 1.5 × 10−10; LDL, P = 4.4 × 10−13) and MADD-FOLH1 region (HDL, P = 6 × 10−11). For three loci, effect sizes differed significantly by sex. Genetic risk scores based on lipid loci explain up to 4.8% of variation in lipids and were also associated with increased intima media thickness (P = 0.001) and coronary heart disease incidence (P = 0.04). The genetic risk score improves the screening of high-risk groups of dyslipidemia over classical risk factors.
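A genetic risk score of the kind mentioned above is often built as a weighted sum of risk-allele dosages. The sketch below assumes that form; the abstract does not describe how the score was constructed, and the dosages and per-allele weights shown are hypothetical.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per locus).

    `weights` are per-allele effect sizes, one per locus.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual genotyped at three loci with hypothetical weights.
score = genetic_risk_score([0, 1, 2], [0.10, 0.05, 0.20])
print(f"risk score: {score:.2f}")
```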
The plasma adiponectin level, a potential upstream and internal facet of metabolic and cardiovascular diseases, has a reasonably high heritability. Whether other novel genes influence the variation in adiponectin level and the roles of these genetic variants on subsequent clinical outcomes has not been thoroughly investigated. Therefore, we aimed not only to identify genetic variants modulating plasma adiponectin levels but also to investigate whether these variants are associated with adiponectin-related metabolic traits and cardiovascular diseases.
RESEARCH DESIGN AND METHODS
We conducted a genome-wide association study (GWAS) to identify quantitative trait loci (QTL) associated with levels of high molecular weight forms of adiponectin by genotyping 382 young-onset hypertensive (YOH) subjects with Illumina HumanHap550 SNP chips. The single nucleotide polymorphism (SNP) variants associated with lowered adiponectin were then confirmed in another 559 YOH subjects, and the association of these SNP variants with the risk of metabolic syndrome (MS), type 2 diabetes mellitus (T2DM), and ischemic stroke was examined in an independent community-based prospective cohort, the CardioVascular Disease risk FACtors Two-township Study (CVDFACTS, n = 3,350).
The SNP (rs4783244) most significantly associated with adiponectin levels was located in intron 1 of the T-cadherin (CDH13) gene in the first stage (P = 7.57 × 10−9). We replicated and confirmed the association between rs4783244 and plasma adiponectin levels in an additional 559 YOH subjects (P = 5.70 × 10−17). This SNP was further associated with the risk of MS (odds ratio [OR] = 1.42, P = 0.027), T2DM in men (OR = 3.25, P = 0.026), and ischemic stroke (OR = 2.13, P = 0.002) in the CVDFACTS.
These findings indicated the role of T-cadherin in modulating adiponectin levels and the involvement of CDH13 or adiponectin in the development of cardiometabolic diseases.
Genome-wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA-pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used but the efficiency of these arrays has not been quantified. We compared and contrasted Affymetrix Genechip HindIII and Illumina HumanHap300 arrays on the same DNA pools and showed that the HumanHap300 arrays are substantially more efficient. In terms of effective sample size, HumanHap300-based pooling extracts >80% of the information available with individual genotyping (IG). In contrast, Genechip HindIII-based pooling only extracts ∼30% of the available information. With HumanHap300 arrays concordance with IG data is excellent. Guidance is given on best study design and it is shown that even after taking into account pooling error, one stage scans can be performed for >100-fold reduced cost compared with IG. With appropriately designed two stage studies, IG can provide confirmation of pooling results whilst still providing ∼20-fold reduction in total cost compared with IG-based alternatives. The large cost savings with Illumina HumanHap300-based pooling imply that future studies need only be limited by the availability of samples and not cost.
Unlike in Caucasian populations, the genetic factors contributing to the risk of type 2 diabetes mellitus (T2DM) are not well studied in Asian populations. In light of this, and given that copy number variation (CNV) is emerging as a new way to understand human genomic variation, the objective of this study was to identify type 2 diabetes–associated CNV in a Korean cohort.
Using the Illumina HumanHap300 BeadChip (317,503 markers), genome-wide genotyping was performed to obtain signal and allelic intensities from 275 patients with type 2 diabetes mellitus (T2DM) and 496 nondiabetic subjects (Total n = 771). To increase the sensitivity of CNV identification, we incorporated multiple factors using PennCNV, a program that is based on the hidden Markov model (HMM). To assess the genetic effect of CNV on T2DM, a multivariate logistic regression model controlling for age and gender was used. We identified a total of 7,478 CNVs (average of 9.7 CNVs per individual) and 2,554 CNV regions (CNVRs; 164 common CNVRs for frequency>1%) in this study. Although we failed to demonstrate robust associations between CNVs and the risk of T2DM, our results revealed a putative association between several CNVRs including chr15:45994758–45999227 (P = 8.6E-04, Pcorr = 0.01) and the risk of T2DM. The identified CNVs in this study were validated using overlapping analysis with the Database of Genomic Variants (DGV; 71.7% overlap), and quantitative PCR (qPCR). The identified variations, which encompassed functional genes, were significantly enriched in the cellular part, in the membrane-bound organelle, in the development process, in cell communication, in signal transduction, and in biological regulation.
We expect that the methods and findings in this study will contribute in particular to genome studies of Asian populations.
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation involved in susceptibility to etiologically complex disease. We conducted a GWAS to identify common genetic variation involved in susceptibility to upper aero-digestive tract (UADT) cancers. Genome-wide genotyping was carried out using the Illumina HumanHap300 beadchips in 2,091 UADT cancer cases and 3,513 controls from two large European multi-centre UADT cancer studies, as well as 4,821 generic controls. The 19 top-ranked variants were investigated further in an additional 6,514 UADT cancer cases and 7,892 controls of European descent from an additional 13 UADT cancer studies participating in the INHANCE consortium. Five common variants presented evidence for significant association in the combined analysis (p≤5×10−7). Two novel variants were identified, a 4q21 variant (rs1494961, p = 1×10−8) located near DNA repair related genes HEL308 and FAM175A (or Abraxas) and a 12q24 variant (rs4767364, p = 2×10−8) located in an extended linkage disequilibrium region that contains multiple genes including the aldehyde dehydrogenase 2 (ALDH2) gene. The three remaining variants are located in the ADH gene cluster and were identified previously in a candidate gene study involving some of these samples. The association between these three variants and UADT cancers was independently replicated in the 5,092 UADT cancer cases and 6,794 controls from the non-overlapping samples presented here (rs1573496-ADH7, p = 5×10−8; rs1229984-ADH1B, p = 7×10−9; and rs698-ADH1C, p = 0.02). These results implicate two variants at 4q21 and 12q24 and further highlight three ADH variants in UADT cancer susceptibility.
We have used a two-phased study approach to identify common genetic variation involved in susceptibility to upper aero-digestive tract cancer. Using Illumina HumanHap300 beadchips, 2,091 UADT cancer cases and 3,513 controls from two large European multi-centre UADT cancer studies, as well as 4,821 generic controls, were genotyped for a panel of 317,000 genetic variants that represent the majority of common genetic variation in the human genome. The 19 top-ranked variants were then studied in an additional series of 6,514 UADT cancer cases and 7,892 controls of European descent from an additional 13 UADT cancer studies. Five variants were significantly associated with UADT cancer risk after the completion of both stages, including three residing within the alcohol dehydrogenase genes (ADH1B, ADH1C, ADH7) that have been previously described. Two additional variants were found, one near the ALDH2 gene and a second variant located in HEL308, a DNA repair gene. These results implicate two variants at 4q21 and 12q24 and further highlight three ADH variants in UADT cancer susceptibility.
Familial aggregation of ischemic stroke derives from shared genetic and environmental factors. We present a meta-analysis of genome-wide association scans (GWAS) from 3 cohorts to identify the contribution of common variants to ischemic stroke risk.
This study involved 1464 ischemic stroke cases and 1932 controls. Cases were genotyped using the Illumina 610 or 660 genotyping arrays; controls, with Illumina HumanHap 550Kv1 or 550Kv3 genotyping arrays. Imputation was performed with the 1000 Genomes European ancestry haplotypes (August 2010 release) as a reference. A total of 5,156,597 single-nucleotide polymorphisms (SNPs) were incorporated into the fixed effects meta-analysis. All SNPs associated with ischemic stroke (P<1×10−5) were incorporated into a multivariate risk profile model.
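The fixed-effects meta-analysis described above can be sketched for a single SNP as follows. The sketch assumes inverse-variance weighting, a common choice for fixed-effects models that the abstract does not confirm; the per-cohort effect estimates and standard errors are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effects estimate for one SNP.

    Returns the pooled effect, its standard error, the z statistic and a
    two-sided p-value under a normal approximation.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # erfc(|z| / sqrt(2)) is the two-sided tail area of the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical log odds ratios and standard errors from three cohorts.
beta, se, z, p = fixed_effects_meta([0.12, 0.20, 0.08], [0.05, 0.08, 0.06])
print(f"pooled beta = {beta:.3f}, SE = {se:.3f}, P = {p:.2e}")
```

Cohorts with smaller standard errors receive more weight, so the pooled estimate is pulled toward the most precisely measured effects.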
No SNP reached genome-wide significance for ischemic stroke (P<5×10−8). Secondary analysis identified a significant cumulative effect for age at onset of stroke (first versus fifth quintile of cumulative profiles based on SNPs associated with late onset, ß = 14.77 [10.85,18.68], P = 5.5×10−12), as well as a strong effect showing increased risk across samples with a high propensity for stroke among samples with enriched counts of suggestive risk alleles (P<5×10−6). Risk profile scores based only on genomic information offered little incremental prediction.
There is little evidence of a common genetic variant contributing to moderate risk of ischemic stroke. Quintiles based on genetic loading of alleles associated with a younger age at onset of ischemic stroke revealed a significant difference in age at onset between those in the upper and lower quintiles. Using common variants from GWAS and imputation, genomic profiling remains inferior to family history of stroke for defining risk. Inclusion of genomic (rare variant) information may be required to improve clinical risk profiling.
Genome-wide association studies (GWAS) have identified more than 30 loci associated with type 2 diabetes (T2D) in Caucasians. However, genomic understanding of T2D in Asians, especially Han Chinese, is still limited.
Methods and Principal Findings
A two-stage GWAS was performed in Han Chinese from Mainland China. The discovery stage included 793 T2D cases and 806 healthy controls genotyped using Illumina Human 660- and 610-Quad BeadChips; and the replication stage included two independent case-control populations (a total of 4445 T2D cases and 4458 controls) genotyped using TaqMan assay. We validated the associations of KCNQ1 (rs163182, p = 2.085×10−17, OR 1.28) and C2CD4A/B (rs1370176, p = 3.677×10−4, OR 1.124; rs1436953, p = 7.753×10−6, OR 1.141; rs7172432, p = 4.001×10−5, OR 1.134) in Han Chinese.
Conclusions and Significance
Our study represents the first GWAS of T2D with both discovery and replication sample sets recruited from Han Chinese men and women residing in Mainland China. We confirmed the associations of KCNQ1 and C2CD4A/B with T2D, with the latter for the first time being examined in Han Chinese. Arguably, eight more independent loci were replicated in our GWAS.
We investigated candidate genomic regions associated with computed tomography (CT)-derived measures of adiposity in Hispanics from the IRAS Family Study. In 1190 Hispanic individuals from 92 families from the San Luis Valley, CO and San Antonio, TX, we measured CT-derived visceral adipose tissue (VAT), subcutaneous adipose tissue (SAT), and visceral:subcutaneous ratio (VSR). A genome-wide association study (GWAS) was completed using the Illumina HumanHap 300 BeadChip (~317K single nucleotide polymorphisms (SNPs)) in 229 individuals from the San Antonio site (Stage 1). Two hundred ninety-seven SNPs with evidence for association with VAT, SAT, or VSR, adjusting for age and sex (p<0.001), were genotyped in the remaining 961 Hispanic samples. The entire Hispanic cohort (n = 1190) was then tested for association, adjusting for age, sex, site of recruitment and admixture estimates (Stage 2). In Stage 3, additional SNPs were genotyped in four genic regions showing evidence of association in Stage 2.
Several SNPs were associated in the GWAS (p<1×10−5) and were confirmed to be significantly associated in the entire Hispanic cohort (p<0.01), including rs7543757 for VAT; rs4754373 and rs11212913 for SAT; and rs4541696 and rs4134351 for VSR. Numerous SNPs were associated with multiple adiposity phenotypes. Targeted analysis of four genes whose SNPs were significant in Stage 2 suggests candidate genes influencing the distribution (RGS6) and amount (NGEF) of adiposity.
Several candidate loci, including RGS6 and NGEF, are associated with CT-derived adipose fat measures in Hispanic Americans in a three-stage genetic association study.
genetic association; visceral fat; subcutaneous fat; obesity; body mass index
Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina’s HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%–93%), but IMPUTE2 had the highest IQS (81%–83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels.
While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
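The difference between concordance and the imputation quality score (IQS) can be made concrete with a small sketch. It assumes a kappa-style formulation, IQS = (Po − Pc) / (1 − Pc), where Po is the observed concordance and Pc the agreement expected by chance from the marginal genotype counts; this matches the description "concordance adjusted for chance agreement" but the exact formula used in the study is an assumption. The genotypes are hypothetical.

```python
from collections import Counter


def concordance_and_iqs(true_genotypes, imputed_genotypes):
    """Return (concordance, IQS) for one SNP.

    Genotypes are coded as minor-allele counts (0, 1 or 2). IQS is undefined
    (NaN) when chance agreement equals 1, e.g. both calls are monomorphic.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_genotypes):
        raise ValueError("need two non-empty genotype lists of equal length")
    pairs = zip(true_genotypes, imputed_genotypes)
    observed = sum(t == i for t, i in pairs) / n
    true_counts = Counter(true_genotypes)
    imputed_counts = Counter(imputed_genotypes)
    # Chance agreement from the product of marginal genotype frequencies.
    chance = sum(true_counts[g] * imputed_counts[g] for g in true_counts) / (n * n)
    if chance == 1.0:
        return observed, float("nan")
    return observed, (observed - chance) / (1.0 - chance)


# Hypothetical rare SNP: one heterozygote among ten samples, imputed as all 0.
concordance, iqs = concordance_and_iqs([0] * 9 + [1], [0] * 10)
print(f"concordance = {concordance:.2f}, IQS = {iqs:.2f}")
```

In this hypothetical case the concordance looks high even though the only informative genotype was missed, while the chance-adjusted score does not reward calling every sample as the major homozygote. This is why IQS is described as more informative for low-MAF SNPs.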
Genome-wide association studies (GWAS) have been fruitful in identifying disease susceptibility loci for common and complex diseases. A remaining question is whether we can quantify individual disease risk based on genotype data, in order to facilitate personalized prevention and treatment for complex diseases. Previous studies have typically failed to achieve satisfactory performance, primarily due to the use of only a limited number of confirmed susceptibility loci. Here we propose that sophisticated machine-learning approaches with a large ensemble of markers may improve the performance of disease risk assessment. We applied a Support Vector Machine (SVM) algorithm on a GWAS dataset generated on the Affymetrix genotyping platform for type 1 diabetes (T1D) and optimized a risk assessment model with hundreds of markers. We subsequently tested this model on an independent Illumina-genotyped dataset with imputed genotypes (1,008 cases and 1,000 controls), as well as a separate Affymetrix-genotyped dataset (1,529 cases and 1,458 controls), resulting in area under ROC curve (AUC) of ∼0.84 in both datasets. In contrast, poor performance was achieved when limited to dozens of known susceptibility loci in the SVM model or logistic regression model. Our study suggests that improved disease risk assessment can be achieved by using algorithms that take into account interactions between a large ensemble of markers. We are optimistic that genotype-based disease risk assessment may be feasible for diseases where a notable proportion of the risk has already been captured by SNP arrays.
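The area under the ROC curve (AUC) used above to summarize model performance equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes it directly from that definition; the scores and labels are hypothetical, and the pairwise loop is meant for illustration rather than for datasets of the size reported here.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly.

    `labels` uses 1 for cases and 0 for controls; ties count as half.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores for two cases (label 1) and two controls (label 0).
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.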
An often touted utility of genome-wide association studies (GWAS) is that the resulting discoveries can facilitate implementation of personalized medicine, in which preventive and therapeutic interventions for complex diseases can be tailored to individual genetic profiles. However, recent studies using whole-genome SNP genotype data for disease risk assessment have generally failed to achieve satisfactory results, leading to a pessimistic view of the utility of genotype data for such purposes. Here we propose that sophisticated machine-learning approaches on a large ensemble of markers, which contain both confirmed and as yet unconfirmed disease susceptibility variants, may improve the performance of disease risk assessment. We tested an algorithm called Support Vector Machine (SVM) on three large-scale datasets for type 1 diabetes and demonstrated that risk assessment can be highly accurate for the disease. Our results suggest that individualized disease risk assessment using whole-genome data may be more successful for some diseases (such as T1D) than other diseases. However, the predictive accuracy will depend on the heritability of the disease under study, the proportion of the genetic risk that is known, and whether the right set of markers and the right algorithms are used.
Genome-wide association studies (GWAS) led to the identification of numerous novel loci for a number of complex diseases. Pathway-based approaches using genotypic data provide tangible leads which cannot be identified by single marker approaches as implemented in GWAS. The available pathway analysis approaches mainly differ in the employed databases and in the applied statistics for determining the significance of the associated disease markers.
So far, pathway-based approaches using GWAS data have failed to consider the overlap of genes among different pathways or the influence of protein interactions. We performed a multistage integrative pathway (MIP) analysis on three common diseases, Crohn's disease (CD), rheumatoid arthritis (RA) and type 1 diabetes (T1D), incorporating genotypic, pathway, protein- and domain-interaction data to identify novel associations between these diseases and pathways. Additionally, we assessed the sensitivity of our method to the most significant SNPs by removing them and comparing the corresponding pathway analysis results. Apart from confirming many previously published associations between pathways and RA, CD and T1D, our MIP approach was able to identify three new associations between disease phenotypes and pathways. These include a relation between the influenza-A pathway and RA, as well as a relation between T1D and the phagosome and toxoplasmosis pathways. These results provide new leads for understanding the molecular underpinnings of these diseases.
The software developed and used in this study is available at http://www.cogsys.cs.uni-tuebingen.de/software/GWASPathwayIdentifier/index.htm.
Obesity is an increasingly common disorder that predisposes to several medical conditions, including type 2 diabetes. We investigated whether large and rare copy-number variations (CNVs) differentiate moderate to extreme obesity from never-overweight control subjects.
RESEARCH DESIGN AND METHODS
Using single nucleotide polymorphism (SNP) arrays, we performed a genome-wide CNV survey on 430 obese case subjects (BMI >35 kg/m2) and 379 never-overweight control subjects (BMI <25 kg/m2). All subjects were of European ancestry and were genotyped on the Illumina HumanHap550 arrays with ∼550,000 SNP markers. The CNV calls were generated by PennCNV software.
CNVs >1 Mb were found to be overrepresented in case versus control subjects (odds ratio [OR] = 1.5 [95% CI 0.5–5]), and CNVs >2 Mb were present in 1.3% of the case subjects but were absent in control subjects (OR = infinity [95% CI 1.2–infinity]). When focusing on rare deletions that disrupt genes, even more pronounced effect sizes are observed (OR = 2.7 [95% CI 0.5–27.1] for CNVs >1 Mb). Interestingly, obese case subjects who carry these large CNVs have moderately high BMI and do not appear to be extreme cases. Several CNVs disrupt known candidate genes for obesity, such as a 3.3-Mb deletion disrupting NAP1L5 and a 2.1-Mb deletion disrupting UCP1 and IL15.
Our results suggest that large CNVs, especially rare deletions, confer risk of obesity in patients with moderate obesity and that genes impacted by large CNVs represent intriguing candidates for obesity that warrant further study.
Bipolar disorder (BPD) is a common psychiatric illness with a complex mode of inheritance. Besides traditional linkage and association studies, which require large sample sizes, analysis of common and rare chromosomal copy number variants (CNVs) in extended families may provide novel insights into the genetic susceptibility of complex disorders. Using the Illumina HumanHap550 BeadChip with over 550,000 SNP markers, we genotyped 46 individuals in a three-generation Old Order Amish pedigree with 19 affected (16 BPD and three major depression) and 27 unaffected subjects. Using the PennCNV algorithm, we identified 50 CNV regions that ranged in size from 12 to 885 kb and encompassed at least 10 single nucleotide polymorphisms (SNPs). Of 19 well characterized CNV regions that were available for combined genotype-expression analysis 11 (58%) were associated with expression changes of genes within, partially within or near these CNV regions in fibroblasts or lymphoblastoid cell lines at a nominal P value <0.05. To further investigate the mode of inheritance of CNVs in the large pedigree, we analyzed a set of four CNVs, located at 6q27, 9q21.11, 12p13.31 and 15q11, all of which were enriched in subjects with affective disorders. We additionally show that these variants affect the expression of neuronal genes within or near the rearrangement. Our analysis suggests that family based studies of the combined effect of common and rare CNVs at many loci may represent a useful approach in the genetic analysis of disease susceptibility of mental disorders.
Inflammatory bowel disease, including Crohn's disease (CD) and ulcerative colitis (UC), and type 1 diabetes (T1D) are autoimmune diseases that may share common susceptibility pathways. We examined known susceptibility loci for these diseases in a cohort of 1689 CD cases, 777 UC cases, 989 T1D cases and 6197 shared control subjects of European ancestry, who were genotyped by the Illumina HumanHap550 SNP arrays. We identified multiple previously unreported or unconfirmed disease associations, including known CD loci (ICOSLG and TNFSF15) and T1D loci (TNFAIP3) that confer UC risk, known UC loci (HERC2 and IL26) that confer T1D risk and known UC loci (IL10 and CCNY) that confer CD risk. Additionally, we show that T1D risk alleles residing at the PTPN22, IL27, IL18RAP and IL10 loci protect against CD. Furthermore, the strongest risk alleles for T1D within the major histocompatibility complex (MHC) confer strong protection against CD and UC; however, given the multi-allelic nature of the MHC haplotypes, sequencing of the MHC locus will be required to interpret this observation. These results extend our current knowledge on genetic variants that predispose to autoimmunity, and suggest that many loci involved in autoimmunity may be under a balancing selection due to antagonistic pleiotropic effect. Our analysis implies that variants with opposite effects on different diseases may facilitate the maintenance of common susceptibility alleles in human populations, making autoimmune diseases especially amenable to genetic dissection by genome-wide association studies.
Neuroblastoma is a malignancy of the developing sympathetic nervous system that most commonly affects young children and is often lethal. The etiology of this embryonal cancer is not known.
We performed a genome-wide association study by first genotyping 1,032 neuroblastoma patients and 2,043 controls of European descent using the Illumina HumanHap550 BeadChip. Three independent groups of neuroblastoma cases (N=720) and controls (N=2128) were then genotyped to replicate significant associations.
We observed a highly significant association between neuroblastoma and the common minor alleles of three single nucleotide polymorphisms (SNPs) within a 94.2-kilobase (kb) linkage disequilibrium block at chromosome band 6p22 containing the predicted genes FLJ22536 and FLJ44180 (P-value range = 1.71×10−9 to 7.01×10−10; allelic odds ratio range 1.39–1.40). Homozygosity for the at-risk G allele of the most significantly associated SNP, rs6939340, resulted in an increased likelihood of developing neuroblastoma of 1.97 (95% CI 1.58–2.44). Subsequent genotyping of these 6p22 SNPs in the three independent case series confirmed our observation of association (P = 9.33×10−15 at rs6939340 for joint analysis). Furthermore, neuroblastoma patients homozygous for the risk alleles at 6p22 were more likely to develop metastatic (Stage 4) disease (P = 0.02), to show amplification of the MYCN oncogene in the tumor cells (P = 0.006), and to have disease relapse (P = 0.01).
Common genetic variation at chromosome band 6p22 is associated with susceptibility to neuroblastoma.
Hypertension is a complex disorder with high prevalence rates all over the world. We conducted the first genome-wide gene-based association scan for hypertension in a Han Chinese population. By analyzing genome-wide single-nucleotide-polymorphism data of 400 matched pairs of young-onset hypertensive patients and normotensive controls genotyped with the Illumina HumanHap550-Duo BeadChip, 100 susceptibility genes for hypertension were identified and validated with permutation tests. Seventeen of the 100 genes exhibited differential allelic and expression distributions between patient and control groups. These genes provided a good molecular signature for classifying hypertensive patients and normotensive controls. Among the 17 genes, IGF1, SLC4A4, WWOX, and SFMBT1 were not only identified by our gene-based association scan and gene expression analysis but were also replicated by a gene-based association analysis of the Hong Kong Hypertension Study. Moreover, cis-acting expression quantitative trait loci associated with the differentially expressed genes were found and linked to hypertension. IGF1, which encodes insulin-like growth factor 1, is associated with cardiovascular disorders, metabolic syndrome, decreased body weight/size, and changes of insulin levels in mice. SLC4A4, which encodes the electrogenic sodium bicarbonate cotransporter 1, is associated with decreased body weight/size and abnormal ion homeostasis in mice. WWOX, which encodes the WW domain-containing protein, is related to hypoglycemia and hyperphosphatemia. SFMBT1, which encodes the scm-like with four MBT domains protein 1, is a novel hypertension gene. GRB14, TMEM56 and KIAA1797 exhibited highly significant differential allelic and expression distributions between hypertensive patients and normotensive controls. GRB14 was also found relevant to blood pressure in a previous genetic association study in East Asian populations.
TMEM56 and KIAA1797 may be specific to Taiwanese populations, because they were not validated by the two replication studies. Identification of these genes enriches the collection of hypertension susceptibility genes, thereby shedding light on the etiology of hypertension in Han Chinese populations.
There is great interindividual variability in HIV-1 viral setpoint after seroconversion, some of which is known to be due to genetic differences among infected individuals. Here, our focus is on determining, genome-wide, the contribution of variable gene expression to viral control, and to relate it to genomic DNA polymorphism. RNA was extracted from purified CD4+ T-cells from 137 HIV-1 seroconverters, 16 elite controllers, and 3 healthy blood donors. Expression levels of more than 48,000 mRNA transcripts were assessed by the Human-6 v3 Expression BeadChips (Illumina). Genome-wide SNP data was generated from genomic DNA using the HumanHap550 Genotyping BeadChip (Illumina). We observed two distinct profiles with 260 genes differentially expressed depending on HIV-1 viral load. There was significant upregulation of expression of interferon stimulated genes with increasing viral load, including genes of the intrinsic antiretroviral defense. Upon successful antiretroviral treatment, the transcriptome profile of previously viremic individuals reverted to a pattern comparable to that of elite controllers and of uninfected individuals. Genome-wide evaluation of cis-acting SNPs identified genetic variants modulating expression of 190 genes. Those were compared to the genes whose expression was found associated with viral load: expression of one interferon stimulated gene, OAS1, was found to be regulated by a SNP (rs3177979, p = 4.9E-12); however, we could not detect an independent association of the SNP with viral setpoint. Thus, this study represents an attempt to integrate genome-wide SNP signals with genome-wide expression profiles in the search for biological correlates of HIV-1 control. It underscores the paradox of the association between increasing levels of viral load and greater expression of antiviral defense pathways. It also shows that elite controllers do not have a fully distinctive mRNA expression pattern in CD4+ T cells. 
Overall, changes in global RNA expression reflect responses to viral replication rather than a mechanism that might explain viral control.
There has been recent progress in understanding the genetic factors that modulate susceptibility to HIV-1 infection. Genetic variation explains to a certain extent differences in disease progression among individuals. Less is known regarding the contribution of differences in gene expression to viral control. The present study evaluated, genome-wide, gene expression levels in CD4+ T cells, the main target of HIV-1. Thereafter, it searched for genetic variants that would modify gene expression. Specific expression profiles were associated with high levels of viremia, in particular the upregulation of genes of the antiviral defense. In contrast, no expression profile was associated with effective viral control. Multiple genetic variants modulated gene expression in CD4+ T cells; however, none had a strong influence on viral control. This integrated genome-wide assessment suggests that viral replication drives gene expression rather than expression pointing to mechanisms of viral control.
Coronary artery disease (CAD) is a multifactorial disease with environmental and genetic determinants. The genetic determinants of CAD have previously been explored by the candidate gene approach. Recently, the data from the International HapMap Project and the development of dense genotyping chips have enabled us to perform genome-wide association studies (GWAS) on a large number of subjects without bias towards any particular candidate genes. In 2007, three chip-based GWAS simultaneously revealed a significant association between common variants on chromosome 9p21 and CAD. This association was replicated among other ethnic groups and also in a meta-analysis. Further investigations have detected several other candidate loci associated with CAD. The chip-based GWAS approach has identified novel and unbiased genetic determinants of CAD, and these insights provide important direction for better understanding the pathogenesis of CAD and for developing new and improved preventive measures and treatments.
Although autism is one of the most heritable neuropsychiatric disorders, its underlying genetic architecture has largely eluded description. To comprehensively examine the hypothesis that common variation is important in autism, we performed a genome-wide association study (GWAS) using a discovery dataset of 438 autistic Caucasian families and the Illumina Human 1M beadchip. Ninety-six single nucleotide polymorphisms (SNPs) demonstrated strong association with autism risk (p-value < 0.0001). The validation of the top 96 SNPs was performed using an independent dataset of 487 Caucasian autism families genotyped on the 550K Illumina BeadChip. A novel region on chromosome 5p14.1 showed significance in both the discovery and validation datasets. Joint analysis of all SNPs in this region identified 8 SNPs with lower p-values (3.24E-04 to 3.40E-06) than in either dataset alone. Our findings demonstrate that in addition to multiple rare variations, part of the complex genetic architecture of autism involves common variation.
DNA repair genes are important for maintaining genomic stability and limiting carcinogenesis. We analyzed all single nucleotide polymorphisms (SNPs) of 125 DNA repair genes covered by the Illumina HumanHap300 (v1.1) BeadChips in a previously conducted genome-wide association study (GWAS) of 1,154 lung cancer cases and 1,137 controls and replicated the top hits among XRCC4 SNPs in an independent set of 597 cases and 611 controls in Texas populations. We found that six of 20 XRCC4 SNPs were associated with a decreased risk of lung cancer with a P value of 0.01 or lower in the discovery dataset, of which the most significant SNP was rs10040363 (P for allelic test = 4.89×10−4). Moreover, the data in this region allowed us to impute a potentially functional SNP rs2075685 (imputed P for allelic test = 1.3×10−3). A luciferase reporter assay demonstrated that the rs2075685G>T change in the XRCC4 promoter increased expression of the gene. In the replication study of rs10040363, rs1478486, rs9293329, and rs2075685, however, only rs10040363 achieved a borderline association with a decreased risk of lung cancer in a dominant model (adjusted OR = 0.80, 95% CI = 0.62–1.03, P = 0.079). In the final combined analysis of both the Texas GWAS discovery and replication datasets, the strength of the association was increased for rs10040363 (adjusted OR = 0.77, 95% CI = 0.66–0.89, Pdominant = 5×10−4 and P for trend = 5×10−4) and rs1478486 (adjusted OR = 0.82, 95% CI = 0.71–0.94, Pdominant = 6×10−3 and P for trend = 3.5×10−3). Finally, we conducted a meta-analysis of these XRCC4 SNPs with available data from published GWA studies of lung cancer with a total of 12,312 cases and 47,921 controls, in which none of these XRCC4 SNPs was associated with lung cancer risk. It appeared that rs2075685, although associated with increased expression of a reporter gene and lung cancer risk in the Texas populations, did not have an effect on lung cancer risk in other populations.
This study underscores the importance of replication using published data in larger populations.
XRCC4; variant; Genetic susceptibility; genome-wide association study; replication study
To determine susceptibility genes for high myopia in Singaporean Chinese.
A meta-analysis of two genome wide association (GWA) datasets in Chinese and a follow-up replication cohort in Japanese.
Participants and Controls
Two independent datasets of Singaporean Chinese individuals, one aged 10–12 years (SCORM, Singapore Cohort Study of the Risk factors for Myopia: cases=65, controls=238) and one aged >21 years (SP2, Singapore Prospective Study Program: cases=222, controls=435), for GWA studies, and a Japanese dataset aged >20 years (cases=959, controls=2128) for replication.
Genomic DNA samples from SCORM and SP2 were genotyped using various Illumina Beadarray platforms (> HumanHap 500). Single-locus association tests were conducted for each dataset, and the results were combined by meta-analysis using pooled z-scores. The top-ranked genetic markers were examined for replication in the Japanese dataset. Fisher’s P was calculated for the combined analysis of all three cohorts.
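The two combination steps named above, pooled z-scores across datasets and Fisher's combined P across cohorts, can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the abstract does not state how the z-scores were weighted, so weighting by the square root of each dataset's sample size is an assumption, and the z-scores and P values in the usage lines are hypothetical. Only the sample sizes (303 for SCORM, 657 for SP2) come from the cohort description.

```python
import math

def pooled_z(z_scores, sample_sizes):
    """Combine per-dataset z-scores into one pooled z-score.

    Assumption: weights are sqrt(sample size); the source does not
    specify its weighting scheme.
    """
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))

def two_sided_p(z):
    """Two-sided P value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the chi-square upper tail has a closed
    form, so no external statistics library is required.
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

# Hypothetical z-scores for one SNP in SCORM (n=303) and SP2 (n=657).
z_meta = pooled_z([3.1, 2.9], [303, 657])
print(z_meta, two_sided_p(z_meta))
# Hypothetical per-cohort P values combined across three cohorts.
print(fisher_combined_p([2e-4, 9e-3, 0.035]))
```

Fisher's method ignores the direction of effect in each cohort, whereas pooled z-scores keep the sign, so the two approaches can disagree when cohorts show opposite effects.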
Main outcome measures
High myopia, defined by spherical equivalent (SE) ≤ −6.00 diopters (D); controls defined by SE between −0.50D and +1.00D.
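The case and control definitions above amount to a simple threshold rule on spherical equivalent, sketched below. The function name `classify_refraction` is hypothetical, and two details are assumptions because the abstract does not state them: the control bounds are treated as inclusive, and a single SE value per participant is used without specifying which eye it refers to.

```python
def classify_refraction(se_diopters):
    """Assign a study group from spherical equivalent (SE) in diopters.

    Assumptions: control bounds are inclusive, and one SE value per
    participant is supplied.
    """
    if se_diopters <= -6.00:
        return "case"        # high myopia
    if -0.50 <= se_diopters <= 1.00:
        return "control"
    return "excluded"        # meets neither definition

for se in (-7.25, -6.00, -3.00, 0.25, 1.50):
    print(se, classify_refraction(se))
```

Participants with low or moderate myopia (for example SE = −3.00 D) fall into neither group under this rule, which is what separates the cases and controls so sharply.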
Two SNPs (rs12716080 and rs6885224) in the gene CTNND2 on chromosome 5p15 ranked top in the meta-analysis of our Chinese datasets (meta-P = 1.14×10−5 and meta-P = 1.51×10−5, respectively) with strong supporting evidence in each individual dataset analysis (Max P = 1.85×10−4 in SCORM; Max P = 8.8×10−3 in SP2). Evidence of replication was observed in the Japanese dataset for rs6885224 (P = 0.035; meta-P of three datasets = 7.84×10−6).
This study identified a strong association of CTNND2 with high myopia in Asian datasets. The CTNND2 gene maps to a known high myopia linkage region on chromosome 5p15.
myopia; genome wide association; CTNND2; single nucleotide polymorphism; genetics