PMCC

Related Articles

1.  Power to Detect Risk Alleles Using Genome-Wide Tag SNP Panels 
PLoS Genetics  2007;3(10):e170.
Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We have constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in diseases, such as nonsynonymous sites, the MHC region, copy number variant regions and mitochondrial DNA. We estimate that the tag SNP loci in these panels cover the majority of all common variation in the genome, as measured by coverage of both all common HapMap SNPs and an independent set of SNPs derived from complete resequencing of genes obtained from SeattleSNPs. We also estimate that, given a sample size of 1,000 cases and 1,000 controls, these panels have the power to detect single disease loci of moderate risk (λ ∼ 1.8–2.0). Relative risks as low as λ ∼ 1.1–1.3 can be detected using 10,000 cases and 10,000 controls, depending on the sample population and disease model. If multiple loci are involved, the power to detect at least one of them increases substantially: with two to four independent loci, relative risks 20%–35% lower can be detected with 80% power. Although our SNP selection was based on HapMap data, which represent a subset of all common SNPs, these panels effectively capture the majority of all common variation and provide high power to detect risk alleles that are not represented in the HapMap data.
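The dependence of power on sample size and relative risk can be illustrated with a textbook approximation for a single-SNP allelic test. This is a minimal sketch, not the authors' method (which is based on HapMap data and their own disease models): it assumes a multiplicative per-allele risk model, a rare disease (control frequency equals population frequency), a two-sided z-test at an assumed genome-wide α of 5×10⁻⁸, and the usual r² shrinkage for detection through a tag SNP.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def case_allele_freq(p: float, grr: float) -> float:
    """Risk-allele frequency among cases under a multiplicative (per-allele) model.

    p   -- risk-allele frequency in the population, also used for controls
           (rare-disease approximation)
    grr -- per-allele genotype relative risk (the lambda of the text)
    """
    return grr * p / (1.0 + p * (grr - 1.0))


def allelic_test_power(p: float, grr: float, n_cases: int, n_controls: int,
                       alpha: float = 5e-8, r2: float = 1.0) -> float:
    """Approximate power of a two-sided allelic z-test on 2N alleles per group.

    r2 < 1 models indirect detection through a tag SNP: the non-centrality
    shrinks by a factor r2, equivalent to an effective sample size of r2 * N.
    """
    p_case = case_allele_freq(p, grr)
    p_ctrl = p
    var = (p_case * (1 - p_case) / (2 * n_cases)
           + p_ctrl * (1 - p_ctrl) / (2 * n_controls)) / r2
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Power from the correct-direction tail; the opposite tail is negligible.
    return STD_NORMAL.cdf(abs(p_case - p_ctrl) / var ** 0.5 - z_crit)


# Usage: the same modest effect (lambda = 1.3, p = 0.2) at two study sizes.
for n in (1_000, 10_000):
    print(n, round(allelic_test_power(p=0.2, grr=1.3, n_cases=n, n_controls=n), 3))
```

The sketch reproduces the qualitative pattern in the abstract (small relative risks need roughly ten times larger samples), but its numbers will not match the paper's estimates, which also account for real LD patterns.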
Author Summary
Advances in high-throughput genotyping technology and the International HapMap Project have enabled genetic association studies at the whole-genome level. Our paper describes two genome-wide SNP panels that contain tag SNPs derived from the International HapMap Project. Tag SNPs are proxies for groups of highly correlated SNPs: information for the entire group of correlated SNPs can be captured by genotyping only one representative SNP, the tag SNP. These whole-genome SNP panels also contain additional content thought to be overrepresented in disease, such as amino acid–changing nonsynonymous SNPs and mitochondrial SNPs. We show that these panels cover the genome with very high efficiency, as measured by coverage of all HapMap SNPs and of a set of SNPs derived from completely resequenced genes in the SeattleSNPs database. We also show that these panels have high power to detect disease risk alleles for both HapMap and non-HapMap SNPs. In complex diseases, where multiple risk alleles are believed to be involved, we show that the tag SNP panels also have high power to detect at least one risk allele.
doi:10.1371/journal.pgen.0030170
PMCID: PMC2000969  PMID: 17922574
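The tag SNP idea in the author summary above can be illustrated with a simple greedy r²-based selection, in the spirit of tools such as ldSelect. This is a sketch under stated assumptions, not the panels' actual selection procedure: the r² threshold of 0.8 and the toy LD matrix are illustrative only.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy tag-SNP selection from a symmetric pairwise r^2 matrix.

    r2[i][j] is the LD r^2 between SNPs i and j, with r2[i][i] == 1.
    Returns the chosen tag indices; afterwards every SNP has
    r^2 >= threshold with at least one tag.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Pick the SNP capturing the most still-untagged SNPs (ties -> lowest index).
        best = max(range(n),
                   key=lambda i: (sum(1 for j in untagged if r2[i][j] >= threshold), -i))
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


# Toy example: SNPs 0-2 form one LD group, SNP 3 is independent.
toy_r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.95, 0.05],
    [0.85, 0.95, 1.00, 0.00],
    [0.10, 0.05, 0.00, 1.00],
]
print(greedy_tag_snps(toy_r2))  # two tags suffice for four SNPs
```

Because each untagged SNP always tags at least itself, the loop is guaranteed to terminate.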
2.  Accuracy of genome-wide imputation of untyped markers and impacts on statistical power for association studies 
BMC Genetics  2009;10:27.
Background
Although high-throughput genotyping arrays have made whole-genome association studies (WGAS) feasible, only a small proportion of SNPs in the human genome are actually surveyed in such studies. In addition, various SNP arrays assay different sets of SNPs, which leads to challenges in comparing results and merging data for meta-analyses. Genome-wide imputation of untyped markers allows us to address these issues in a direct fashion.
Methods
384 Caucasian American liver donors were genotyped using Illumina 650Y (Ilmn650Y) arrays, from which we also derived genotypes for the SNPs on the Ilmn317K array. On these data, we compared two imputation methods: MACH and BEAGLE. We imputed 2.5 million HapMap Release 22 SNPs and conducted GWAS on ~40,000 liver mRNA expression traits (eQTL analysis). In addition, 200 Caucasian American and 200 African American subjects were genotyped using the Affymetrix 500K array plus a custom 164K fill-in chip. We then imputed the HapMap SNPs and quantified the accuracy by randomly masking observed SNPs.
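The masking procedure used to quantify accuracy can be sketched as follows: hide a random subset of observed genotype calls, impute them, and report the fraction recovered exactly. This is a minimal illustration; `mode_imputer` is a hypothetical placeholder (fill with the most common observed genotype), not MACH or BEAGLE, and the 10% mask fraction is an assumption.

```python
import random


def masked_concordance(genotypes, impute, mask_fraction=0.1, seed=0):
    """Estimate imputation accuracy by hiding a random subset of observed calls.

    genotypes : list of lists, genotypes[sample][snp] in {0, 1, 2}
    impute    : callable taking a matrix with None for masked calls and
                returning a fully filled matrix
    Returns the fraction of masked calls that the imputer recovers exactly.
    """
    rng = random.Random(seed)
    cells = [(s, m) for s in range(len(genotypes)) for m in range(len(genotypes[0]))]
    masked = rng.sample(cells, max(1, int(mask_fraction * len(cells))))
    hidden = [row[:] for row in genotypes]
    for s, m in masked:
        hidden[s][m] = None
    filled = impute(hidden)
    hits = sum(filled[s][m] == genotypes[s][m] for s, m in masked)
    return hits / len(masked)


def mode_imputer(matrix):
    """Hypothetical placeholder imputer: most common observed genotype per SNP."""
    out = [row[:] for row in matrix]
    for m in range(len(matrix[0])):
        observed = [row[m] for row in matrix if row[m] is not None]
        fill = max(set(observed), key=observed.count) if observed else 0
        for row in out:
            if row[m] is None:
                row[m] = fill
    return out
```

A real evaluation would swap in an LD-aware imputer, which is what lets accuracy depend on how well untyped SNPs are tagged.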
Results
MACH and BEAGLE perform similarly with respect to imputation accuracy. The Ilmn650Y array yields excellent imputation performance and outperforms the Affx500K and Ilmn317K sets. For Caucasian Americans, 90% of the HapMap SNPs were imputed at 98% accuracy. As expected, imputation of poorly tagged SNPs (untyped SNPs in weak LD with typed markers) was less successful. Imputing genotypes in the African American population was more challenging, given (1) shorter LD blocks and (2) admixture with Caucasian populations. To address issue (2), we pooled HapMap CEU and YRI data as an imputation reference set, which greatly improved overall performance. The approximately 40,000 phenotypes scored in these populations provide a way to determine empirically how the power to detect associations is affected by the imputation procedures: at a fixed false discovery rate, the number of cis-eQTL discoveries detected by each method can be interpreted as its relative statistical power in the GWAS. In this study, we find that imputation offers modest additional power (4%) on top of either Ilmn317K or Ilmn650Y, much less than the power gain from Ilmn317K to Ilmn650Y (13%).
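Counting discoveries at a fixed false discovery rate, as described above, can be sketched with the Benjamini-Hochberg step-up procedure. The choice of Benjamini-Hochberg is an assumption made for illustration; the abstract does not name the FDR method used.

```python
def bh_discoveries(p_values, fdr=0.05):
    """Number of tests declared significant by the Benjamini-Hochberg step-up rule.

    Sort the m p-values; the largest rank i with p_(i) <= fdr * i / m determines
    how many of the smallest p-values are called discoveries.
    """
    m = len(p_values)
    k = 0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= fdr * i / m:
            k = i  # keep the largest rank that meets its threshold
    return k


# Relative power of two marker sets = ratio of their discovery counts
# on the same traits at the same FDR.
```

Note that the step-up rule can accept a p-value that fails its own rank threshold earlier in the list, as long as a later rank passes.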
Conclusion
Current algorithms can accurately impute genotypes for untyped markers, which enables researchers to pool data between studies conducted using different SNP sets. While genotyping itself has a small error rate (e.g. 0.5%), imputed genotypes are surprisingly accurate. We found that dense marker sets (e.g. Ilmn650Y) outperform sparser ones (e.g. Ilmn317K) in terms of imputation yield and accuracy. We also noticed it was harder to impute genotypes for African American samples, partially due to population admixture, although using a pooled reference boosts performance. Interestingly, GWAS carried out using imputed genotypes only slightly increased power over the assayed SNPs alone. This is likely because adding markers via imputation yields only a modest gain in genetic coverage while increasing the multiple-testing penalty. Furthermore, cis-eQTL mapping using the dense SNP set derived from imputation achieves finer resolution and locates association peaks closer to causal variants than the conventional approach.
doi:10.1186/1471-2156-10-27
PMCID: PMC2709633  PMID: 19531258
3.  Global similarity with local differences in linkage disequilibrium between the Dutch and HapMap–CEU populations 
The HapMap project has facilitated the selection of tagging single nucleotide polymorphisms (tagSNPs) for genome-wide association studies (GWAS) under the assumption that linkage disequilibrium (LD) in the HapMap populations is similar to the populations under investigation. Earlier reports support this assumption, although in most of these studies only a few loci were evaluated. We compared pair-wise LD and LD block structure across autosomes between the Dutch population and the CEU–HapMap reference panel. The impact of sampling distribution on the estimation of LD blocks was studied by bootstrapping. A high Pearson correlation (genome-wide; 0.93) between pair-wise r2 for the Dutch and the CEU populations was found, indicating that tagSNPs from the CEU–HapMap panel capture common variation in the Dutch population. However, some genomic regions exhibited significantly lower correlation than the genome-wide estimate. This might decrease the validity of HapMap tagSNPs in these regions and the power of GWAS. The LD block structure differed considerably between the Dutch and CEU–HapMap populations. This was not explained by demographic differences between the CEU and Dutch samples, as testing for population stratification was not significant. We also found that sampling variation had a large effect on the estimation of LD blocks, as shown by the bootstrapping analysis. Thus, in small samples, most of the observed differences in LD blocks between populations are most likely the result of sampling variation. This poor concordance in LD block structure suggests that large samples are required for robust estimations of local LD block structure in populations.
doi:10.1038/ejhg.2008.248
PMCID: PMC2947108  PMID: 19127282
Dutch population; HapMap–CEU; pair-wise LD; LD blocks; bootstrapping
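The two analyses in this abstract, correlating pair-wise r² between populations and bootstrapping individuals to gauge sampling variation, can be sketched with NumPy. Assumptions: r² is computed as the squared correlation of genotype dosages (a common proxy when haplotypes are not phased), both populations are genotyped at the same SNPs in the same order, and every SNP stays polymorphic in each bootstrap resample (a monomorphic column would yield NaN).

```python
import numpy as np


def pairwise_r2(genotypes):
    """Upper-triangle pair-wise r^2 between SNP columns of a (samples x SNPs) dosage matrix."""
    r = np.corrcoef(genotypes, rowvar=False)
    iu = np.triu_indices_from(r, k=1)
    return r[iu] ** 2


def ld_concordance(pop_a, pop_b):
    """Pearson correlation between two populations' pair-wise r^2 values."""
    return float(np.corrcoef(pairwise_r2(pop_a), pairwise_r2(pop_b))[0, 1])


def bootstrap_r2_sd(genotypes, n_boot=200, seed=0):
    """Standard deviation of each pair-wise r^2 across bootstrap resamples of individuals."""
    rng = np.random.default_rng(seed)
    n = genotypes.shape[0]
    draws = [pairwise_r2(genotypes[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(draws, axis=0)
```

Large bootstrap standard deviations in a small sample are exactly the situation in which block boundaries, which depend on r² crossing thresholds, become unstable.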
4.  Insight in Genome-Wide Association of Metabolite Quantitative Traits by Exome Sequence Analyses 
PLoS Genetics  2015;11(1):e1004835.
Metabolite quantitative traits carry great promise for epidemiological studies, and their genetic background has been addressed using Genome-Wide Association Studies (GWAS). Thus far, the role of less common variants has not been exhaustively studied. Here, we performed a GWAS for metabolite quantitative traits in serum, followed by exome sequence analysis to zoom in on putative causal variants in the associated genes. ¹H Nuclear Magnetic Resonance (¹H-NMR) spectroscopy experiments yielded successful quantification of 42 unique metabolites in 2,482 individuals from The Erasmus Rucphen Family (ERF) study. Heritability of the metabolites was estimated with SOLAR. GWAS was performed with linear mixed models, using HapMap imputations. Based on physical vicinity and pathway analyses, candidate genes were screened for coding region variation using exome sequence data. Heritability estimates for metabolites ranged between 10% and 52%. GWAS replicated three known loci at metabolome-wide significance: CPS1 with glycine (P-value = 1.27×10−32), PRODH with proline (P-value = 1.11×10−19) and SLC16A9 with carnitine level (P-value = 4.81×10−14), and uncovered a novel association between DMGDH and dimethyl-glycine level (P-value = 1.65×10−19). In addition, we found three novel, suggestively significant loci: TNP1 with pyruvate (P-value = 1.26×10−8), KCNJ16 with 3-hydroxybutyrate (P-value = 1.65×10−8) and the 2p12 locus with valine (P-value = 3.49×10−8). Exome sequence analysis identified potentially causal coding and regulatory variants located in the genes CPS1, KCNJ2 and PRODH, and revealed allelic heterogeneity for CPS1 and PRODH. Combined GWAS and exome analysis of metabolites detected by high-resolution ¹H-NMR is a robust approach to uncover metabolite quantitative trait loci (mQTL) and the likely causative variants in these loci. It is anticipated that insight into the genetics of intermediate phenotypes will provide additional insight into the genetics of complex traits.
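The two significance tiers reported above can be made concrete with a Bonferroni-style threshold. The abstract does not state the metabolome-wide threshold; this sketch assumes it is the conventional genome-wide level (5×10⁻⁸) divided by the 42 metabolites tested, about 1.19×10⁻⁹. Under that assumption, the reported P-values fall into the tiers as described.

```python
GENOME_WIDE = 5e-8
N_METABOLITES = 42  # number of quantified metabolites, from the abstract

# Assumption: metabolome-wide threshold = genome-wide threshold / metabolites tested.
METABOLOME_WIDE = GENOME_WIDE / N_METABOLITES


def classify(p):
    """Assign a P-value to a significance tier under the assumed thresholds."""
    if p < METABOLOME_WIDE:
        return "metabolome-wide significant"
    if p < GENOME_WIDE:
        return "suggestive"
    return "not significant"


reported = {  # P-values quoted in the abstract
    "CPS1 / glycine": 1.27e-32,
    "PRODH / proline": 1.11e-19,
    "SLC16A9 / carnitine": 4.81e-14,
    "DMGDH / dimethyl-glycine": 1.65e-19,
    "TNP1 / pyruvate": 1.26e-8,
    "KCNJ16 / 3-hydroxybutyrate": 1.65e-8,
    "2p12 / valine": 3.49e-8,
}
for locus, p in reported.items():
    print(f"{locus:28s} {p:.2e}  {classify(p)}")
```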
Author Summary
Human metabolic individuality is under strict control of genetic and environmental factors. In our study, we aimed to find the genetic determinants of circulating molecules in the sera of a large set of individuals representing the general population. First, we performed a hypothesis-free genome-wide screen in this population to identify genetic regions of interest. Our study confirmed four known gene-metabolite connections and pointed to four novel ones. Genome-wide screens enriched for common intergenic variants may miss causal genetic variants that directly change the protein sequence. To investigate this further, we zoomed in on regions of interest and tested whether the association signals obtained in the first stage were direct or instead reflected causal variants that were not captured in the initial panel. These subsequent tests showed that protein-coding and regulatory variants are involved in metabolite levels. For two genomic regions, we also found that the genes harbour more than one causal variant, each influencing metabolite levels independently of the others. We also observed a strong connection between markers of cardio-metabolic health and metabolites. Taken together, our novel loci are of interest for further research into their causal relation to, for instance, type 2 diabetes and cardiovascular disease.
doi:10.1371/journal.pgen.1004835
PMCID: PMC4287344  PMID: 25569235
5.  Extensive Natural Variation for Cellular Hydrogen Peroxide Release Is Genetically Controlled 
PLoS ONE  2012;7(8):e43566.
Natural variation in DNA sequence contributes to individual differences in quantitative traits. While multiple studies have shown genetic control over gene expression variation, few additional cellular traits have been investigated. Here, we investigated the natural variation of NADPH oxidase-dependent hydrogen peroxide (H2O2) release, which reflects the joint effect of reactive oxygen species (ROS) production, superoxide metabolism and degradation, and is related to a number of human disorders. We assessed the normal variation of H2O2 release in lymphoblastoid cell lines (LCL) in a family-based 3-generation cohort (CEPH-HapMap) and in 3 population-based cohorts (KORA, GenCord, HapMap). Substantial individual variation was observed, 45% of which was attributable to heritability in the CEPH-HapMap cohort. We identified 2 genome-wide significant loci on Hsa12 and Hsa15 in genome-wide linkage analysis. Next, we performed a genome-wide association study (GWAS) for the combined KORA-GenCord cohorts (n = 279) using marker resolution enhanced by imputation (>1.4 million SNPs). We found 5 significant associations (p<5.00×10−8) and 54 suggestive associations (p<1.00×10−5), one of which confirmed the linked region on Hsa15. To replicate our findings, we performed GWAS using 58 HapMap individuals and ∼2.1 million SNPs. We identified 40 genome-wide significant and 302 suggestive SNPs, and confirmed genomic signals on Hsa1, Hsa12, and Hsa15. Genetic loci within 900 kb of the known candidate gene p67phox on Hsa1 were identified by GWAS in both cohorts. We did not find replication of individual SNPs across all cohorts, but we did find replication within the same genomic regions. Finally, a highly significant decrease in H2O2 release was observed in Down Syndrome (DS) individuals (p<2.88×10−12).
Taken together, our results show strong evidence of genetic control of H2O2 in LCL of healthy and DS cohorts and suggest that cellular phenotypes, which themselves are also complex, may be used as proxies for dissection of complex disorders.
doi:10.1371/journal.pone.0043566
PMCID: PMC3430705  PMID: 22952707
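The distinction drawn in the H2O2 abstract between SNP-level and region-level replication can be sketched as a window search over hits from two cohorts. The SNP identifiers, positions and the 900 kb window below are hypothetical; the window merely borrows the distance quoted in the abstract and is not the authors' replication criterion.

```python
def replicated_regions(hits_a, hits_b, window_bp=900_000):
    """Pairs of hits from two cohorts lying on the same chromosome within window_bp.

    hits_a, hits_b : lists of (snp_id, chromosome, position_bp)
    A pair can replicate a region even when the two SNPs differ.
    """
    pairs = []
    for id_a, chr_a, pos_a in hits_a:
        for id_b, chr_b, pos_b in hits_b:
            if chr_a == chr_b and abs(pos_a - pos_b) <= window_bp:
                pairs.append((id_a, id_b))
    return pairs


# Hypothetical hits: different SNPs on chromosome 1 replicate a region.
cohort_1 = [("snpA", "1", 1_000_000), ("snpB", "12", 5_000_000)]
cohort_2 = [("snpC", "1", 1_500_000), ("snpD", "15", 100)]
print(replicated_regions(cohort_1, cohort_2))
```

The quadratic double loop is fine for a few hundred hits; sorting by position per chromosome would be the natural next step for larger lists.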
6.  Limits on the reproducibility of marker associations with southern leaf blight resistance in the maize nested association mapping population 
BMC Genomics  2014;15(1):1068.
Background
A previous study reported a comprehensive quantitative trait locus (QTL) and genome-wide association study (GWAS) of southern leaf blight (SLB) resistance in the maize Nested Association Mapping (NAM) panel. Since that time, the genomic resources available for such analyses have improved substantially. An updated NAM genetic linkage map has nearly six-fold greater marker density than the previous map, and the combined SNPs and read-depth variants (RDVs) from maize HapMaps 1 and 2 provided 28.5 M genomic variants for association analysis, 17-fold more than HapMap 1. In addition, phenotypic values of the NAM RILs were re-estimated to account for environment-specific flowering time covariates, and a small proportion of lines were dropped due to genotypic data quality problems. Comparisons of original and updated QTL and GWAS results confound the effects of linkage map density, GWAS marker density, population sample size, and phenotype estimates. Therefore, we evaluated the effects of changing each of these parameters individually and in combination to determine their relative impact on marker-trait associations in original and updated analyses.
Results
Of the four parameters varied, map density caused the largest changes in QTL and GWAS results. The updated QTL model had better cross-validation prediction accuracy than the previous model. Whereas joint linkage QTL positions were relatively stable to input changes, the residual values derived from those QTL models (used as inputs to GWAS) were more sensitive, resulting in substantial differences between GWAS results. The updated NAM GWAS identified several candidate genes consistent with previous QTL fine-mapping results.
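The two-step design noted in the Results above, in which residuals from a joint linkage QTL model become the input to GWAS, can be sketched with NumPy. This is a simplified illustration, not the NAM pipeline: an ordinary least-squares fit on hypothetical QTL genotypes stands in for the joint linkage model, and each SNP is then tested alone with a correlation t-statistic (no covariates, no forward selection).

```python
import numpy as np


def ols_residuals(y, X):
    """Residuals of y after a least-squares fit on X (an intercept column is added)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def residual_scan(y, qtl_genotypes, snps):
    """Single-marker scan of phenotype residuals after removing fitted QTL effects.

    Returns, per SNP column, the t-statistic of the Pearson correlation
    between the SNP and the residuals: t = r * sqrt((n - 2) / (1 - r^2)).
    """
    resid = ols_residuals(y, qtl_genotypes)
    n = len(resid)
    scores = []
    for j in range(snps.shape[1]):
        r = np.corrcoef(snps[:, j], resid)[0, 1]
        scores.append(r * np.sqrt((n - 2) / (1 - r ** 2)))
    return np.array(scores)


# Simulated example: two QTL plus one additional SNP effect (column 3).
rng = np.random.default_rng(1)
n = 500
qtl = rng.integers(0, 2, size=(n, 2)).astype(float)
snps = rng.integers(0, 2, size=(n, 5)).astype(float)
y = qtl @ np.array([2.0, -1.0]) + 1.5 * snps[:, 3] + rng.normal(0, 1, n)
print(np.round(residual_scan(y, qtl, snps), 1))
```

Because the GWAS sees only the residuals, any change in the QTL model (map density, phenotype estimates) propagates directly into the scan, which is consistent with the sensitivity the authors report.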
Conclusions
The highly polygenic nature of resistance to SLB complicates the identification of causal genes. Joint linkage QTL are relatively stable to perturbations of data inputs, but their resolution is generally on the order of tens or more Mbp. GWAS associations have higher resolution, but lower power due to stringent thresholds designed to minimize false positive associations, resulting in variability of detection across studies. The updated higher density linkage map improves QTL estimation and, along with a much denser SNP HapMap, greatly increases the likelihood of detecting SNPs in linkage with causal variants. We recommend use of the updated genetic resources and results but emphasize the limited repeatability of small-effect associations.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1068) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-1068
PMCID: PMC4300987  PMID: 25475173
Quantitative trait loci; Nested association mapping; Disease resistance; Genome wide association study; Zea mays
7.  A genome-wide association study of serum uric acid in African Americans 
BMC Medical Genomics  2011;4:17.
Background
Uric acid is the primary byproduct of purine metabolism. Hyperuricemia is associated with body mass index (BMI), sex, and multiple complex diseases including gout, hypertension (HTN), renal disease, and type 2 diabetes (T2D). Multiple genome-wide association studies (GWAS) in individuals of European ancestry (EA) have reported associations between serum uric acid levels (SUAL) and specific genomic loci. The purposes of this study were: 1) to replicate major signals reported in EA populations; 2) to use the weak LD pattern in African ancestry populations to better localize (fine-map) reported loci; and 3) to explore the identification of novel findings, cognizant of the moderate sample size.
Methods
African American (AA) participants (n = 1,017) from the Howard University Family Study were included in this study. Genotyping was performed using the Affymetrix® Genome-wide Human SNP Array 6.0. Imputation was performed using MACH and the HapMap reference panels for CEU and YRI. A total of 2,400,542 single nucleotide polymorphisms (SNPs) were assessed for association with serum uric acid under the additive genetic model with adjustment for age, sex, BMI, glomerular filtration rate, HTN, T2D, and the top two principal components identified in the assessment of admixture and population stratification.
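The association model described in these Methods (additive coding of the SNP plus covariate adjustment) amounts to a linear regression per SNP. Below is a minimal NumPy sketch on simulated data; the p-value uses a normal approximation to the t distribution, which is adequate at n ≈ 1,000, and the eight simulated covariates merely stand in for age, sex, BMI, glomerular filtration rate, HTN, T2D and two principal components. It ignores family structure, which a real analysis of this family study would need to handle.

```python
from math import erfc, sqrt

import numpy as np


def additive_test(dosage, y, covariates):
    """Regress a trait on SNP dosage (0-2 copies, or imputed expected counts) plus covariates.

    Returns (beta, standard error, two-sided p-value) for the dosage term.
    """
    X = np.column_stack([np.ones(len(y)), dosage, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = sqrt(cov[1, 1])
    z = beta[1] / se
    return float(beta[1]), se, erfc(abs(z) / sqrt(2))  # 2 * (1 - Phi(|z|))


# Simulated example: true per-allele effect of 0.5 on the trait.
rng = np.random.default_rng(42)
n = 1000
dosage = rng.binomial(2, 0.3, size=n).astype(float)
covs = rng.normal(size=(n, 8))
trait = 0.5 * dosage + covs @ np.linspace(0.1, 0.8, 8) + rng.normal(size=n)
print(additive_test(dosage, trait, covs))
```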
Results
Four variants in the gene SLC2A9 achieved genome-wide significance for association with SUAL (p-values ranging from 8.88 × 10−9 to 1.38 × 10−9). Fine-mapping of the SLC2A9 signals identified a 263 kb interval of linkage disequilibrium in the HapMap CEU sample. This interval was reduced to 37 kb in our AA and the HapMap YRI samples.
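Why weaker LD narrows the fine-mapping interval can be shown by measuring the span of SNPs in strong LD with the index SNP. This is a sketch with hypothetical positions and r² values (not the study's data), and the r² ≥ 0.8 cut-off is an assumption; the abstract does not state how the intervals were defined.

```python
def ld_interval(index_pos, positions, r2_with_index, threshold=0.8):
    """Span of SNPs in LD (r^2 >= threshold) with the index SNP.

    Returns (start_bp, end_bp, length_bp); the index SNP is always included.
    """
    linked = [pos for pos, r2 in zip(positions, r2_with_index) if r2 >= threshold]
    linked.append(index_pos)
    start, end = min(linked), max(linked)
    return start, end, end - start


# Hypothetical region: same SNPs, strong long-range LD vs. weaker LD.
positions = [50_000, 90_000, 120_000, 300_000]
print(ld_interval(100_000, positions, [0.90, 0.95, 0.85, 0.82]))  # wide interval
print(ld_interval(100_000, positions, [0.10, 0.90, 0.85, 0.20]))  # narrow interval
```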
Conclusions
The most strongly associated locus for SUAL in EA populations was also the most strongly associated locus in this AA sample. This finding provides evidence for the role of SLC2A9 in uric acid metabolism across human populations. Additionally, our findings demonstrate the utility of following up GWAS signals from EA populations in African-ancestry populations, which have weaker linkage disequilibrium.
doi:10.1186/1755-8794-4-17
PMCID: PMC3045279  PMID: 21294900
8.  Imputation of Variants from the 1000 Genomes Project Modestly Improves Known Associations and Can Identify Low-frequency Variant - Phenotype Associations Undetected by HapMap Based Imputation 
PLoS ONE  2013;8(5):e64343.
Genome-wide association (GWA) studies have been limited by the reliance on common variants present on microarrays or imputable from the HapMap Project data. More recently, the completion of the 1000 Genomes Project has provided variant and haplotype information for several million variants derived from sequencing over 1,000 individuals. To help understand the extent to which more variants (including low-frequency variants (1% ≤ MAF < 5%) and rare variants (MAF < 1%)) can enhance previously identified associations and identify novel loci, we selected 93 quantitative circulating factors for which data were available from the InCHIANTI population study. These phenotypes included cytokines, binding proteins, hormones, vitamins and ions. We selected these phenotypes because many have known strong genetic associations and are potentially important to help understand disease processes. We performed a genome-wide scan for these 93 phenotypes in InCHIANTI. We identified 21 and 33 signals that reached P<5×10−8 based on HapMap and 1000 Genomes imputation, respectively, and 9 and 11 that reached a stricter, likely conservative, threshold of P<5×10−11, respectively. Imputation of 1000 Genomes genotype data modestly improved the strength of known associations. Of 20 associations detected at P<5×10−8 in both analyses (17 of which represent well replicated signals in the NHGRI catalogue), six were captured by the same index SNP, five were nominally more strongly associated in 1000 Genomes imputed data and one was nominally more strongly associated in HapMap imputed data. We also detected an association between a low-frequency variant and a phenotype that was previously missed by HapMap-based imputation approaches. An association between rs112635299 and alpha-1 globulin near the SERPINA gene represented the known association between rs28929474 (MAF = 0.007) and alpha1-antitrypsin that predisposes to emphysema (P = 2.5×10−12).
Our data provide important proof of principle that 1000 Genomes imputation will detect novel, low frequency-large effect associations.
doi:10.1371/journal.pone.0064343
PMCID: PMC3655956  PMID: 23696881
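The allele-frequency classes defined in the 1000 Genomes abstract can be encoded directly. One detail the sketch makes explicit: the minor allele frequency is the frequency of the less common allele, so an alternate-allele frequency above 0.5 must be folded before the cut-offs (1% and 5%, taken from the abstract) are applied.

```python
def maf_class(alt_allele_freq):
    """Frequency class using the cut-offs quoted in the abstract."""
    maf = min(alt_allele_freq, 1.0 - alt_allele_freq)  # fold to the minor allele
    if maf < 0.01:
        return "rare"           # MAF < 1%
    if maf < 0.05:
        return "low frequency"  # 1% <= MAF < 5%
    return "common"             # MAF >= 5%
```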
9.  SCAN: SNP and copy number annotation 
Bioinformatics  2009;26(2):259-262.
Motivation: Genome-wide association studies (GWAS) generate relationships between hundreds of thousands of single nucleotide polymorphisms (SNPs) and complex phenotypes. The contribution of the traditionally overlooked copy number variations (CNVs) to complex traits is also being actively studied. To facilitate the interpretation of the data and the designing of follow-up experimental validations, we have developed a database that enables the sensible prioritization of these variants by combining several approaches, involving not only publicly available physical and functional annotations but also multilocus linkage disequilibrium (LD) annotations as well as annotations of expression quantitative trait loci (eQTLs).
Results: For each SNP, the SCAN database provides: (i) summary information from eQTL mapping of HapMap SNPs to gene expression (evaluated by the Affymetrix exon array) in the full set of HapMap CEU (Caucasians from UT, USA) and YRI (Yoruba people from Ibadan, Nigeria) samples; (ii) LD information, in the case of a HapMap SNP, including what genes have variation in strong LD (pairwise or multilocus LD) with the variant and how well the SNP is covered by different high-throughput platforms; (iii) summary information available from public databases (e.g. physical and functional annotations); and (iv) summary information from other GWAS. For each gene, SCAN provides annotations on: (i) eQTLs for the gene (both local and distant SNPs) and (ii) the coverage of all variants in the HapMap at that gene on each high-throughput platform. For each genomic region, SCAN provides annotations on: (i) physical and functional annotations of all SNPs, genes and known CNVs within the region and (ii) all genes regulated by the eQTLs within the region.
Availability: http://www.scandb.org
Contact: ncox@medicine.bsd.uchicago.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp644
PMCID: PMC2852202  PMID: 19933162
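SCAN's per-gene view ("eQTLs for the gene, both local and distant SNPs") is essentially an inversion of per-SNP eQTL records. The sketch below illustrates that data-structure idea only; the record layout, identifiers and the 1 Mb local window are hypothetical, and it does not reflect SCAN's actual schema or web interface.

```python
from collections import defaultdict


def index_by_gene(eqtls, local_window=1_000_000):
    """Invert SNP-level eQTL records into per-gene lists of local and distant SNPs.

    eqtls : iterable of hypothetical records
            (snp_id, snp_chrom, snp_pos, gene, gene_chrom, gene_pos, p_value)
    A SNP counts as local when it lies on the gene's chromosome within
    local_window base pairs of the gene position (an assumed definition).
    """
    by_gene = defaultdict(lambda: {"local": [], "distant": []})
    for snp, s_chr, s_pos, gene, g_chr, g_pos, p in eqtls:
        is_local = s_chr == g_chr and abs(s_pos - g_pos) <= local_window
        by_gene[gene]["local" if is_local else "distant"].append((snp, p))
    return by_gene


records = [
    ("snpA", "1", 1_000_000, "GENE1", "1", 1_200_000, 1e-6),
    ("snpB", "2", 5_000_000, "GENE1", "1", 1_200_000, 1e-5),
]
print(dict(index_by_gene(records)))
```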
10.  Mapping Genetic Variants Associated with Beta-Adrenergic Responses in Inbred Mice 
PLoS ONE  2012;7(7):e41032.
β-blockers and β-agonists are primarily used to treat cardiovascular diseases. Inter-individual variability in response to both drug classes is well recognized, yet the identity and relative contribution of the genetic players involved are poorly understood. This work is the first genome-wide association study (GWAS) addressing the values and susceptibility of cardiovascular-related traits to a selective β1-blocker, Atenolol (ate), and a β-agonist, Isoproterenol (iso). The phenotypic dataset consisted of 27 highly heritable traits, each measured across 22 inbred mouse strains and four pharmacological conditions. The genotypic panel comprised 79,922 informative SNPs of the mouse HapMap resource. Associations were mapped by Efficient Mixed Model Association (EMMA), a method that corrects for the population structure and genetic relatedness of the various strains. A total of 205 separate genome-wide scans were analyzed. The most significant hits include three candidate loci related to cardiac and body weight, three loci for electrocardiographic (ECG) values, two loci for the susceptibility of atrial weight index to iso, four loci for the susceptibility of systolic blood pressure (SBP) to perturbations of the β-adrenergic system, and one locus for the responsiveness of QTc (p<10−8). An additional 60 loci were suggestive for one or the other of the 27 traits, while 46 others were suggestive for one or the other drug effects (p<10−6). Most hits tagged unexpected regions, yet at least two loci for the susceptibility of SBP to β-adrenergic drugs pointed at members of the hypothalamic-pituitary-thyroid axis. Loci for cardiac-related traits were preferentially enriched in genes expressed in the heart, while 23% of the testable loci were replicated with datasets of the Mouse Phenome Database (MPD). Altogether these data and validation tests indicate that the mapped loci are relevant to the traits and responses studied.
doi:10.1371/journal.pone.0041032
PMCID: PMC3409184  PMID: 22859963
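Mixed-model methods such as EMMA correct for strain relatedness through a kinship matrix. The sketch below computes an identity-by-state (IBS) kinship of the kind commonly supplied to such models; it illustrates only that input and does not implement EMMA's variance-component estimation. It assumes inbred strains are homozygous, so each SNP is coded as a single 0/1 allele per strain; the toy genotypes are hypothetical.

```python
import numpy as np


def ibs_kinship(alleles):
    """Identity-by-state kinship among inbred strains.

    alleles : (strains x SNPs) array of 0/1 allele codes.
    Entry (i, j) is the fraction of SNPs at which strains i and j carry the same allele.
    """
    a = np.asarray(alleles)
    same = a[:, None, :] == a[None, :, :]  # (strains x strains x SNPs) matches
    return same.mean(axis=2)


toy = [[0, 0, 1, 1],
       [0, 0, 1, 0],
       [1, 1, 0, 0]]
print(ibs_kinship(toy))
```

For 22 strains and about 80,000 SNPs the broadcast comparison holds roughly 39 million booleans, which fits comfortably in memory.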
11.  Genome-Wide Association for Abdominal Subcutaneous and Visceral Adipose Reveals a Novel Locus for Visceral Fat in Women 
PLoS Genetics  2012;8(5):e1002695.
Body fat distribution, particularly centralized obesity, is associated with metabolic risk above and beyond total adiposity. We performed genome-wide association of abdominal adipose depots quantified using computed tomography (CT) to uncover novel loci for body fat distribution among participants of European ancestry. Subcutaneous and visceral fat were quantified in 5,560 women and 4,997 men from 4 population-based studies. Genome-wide genotyping was performed using standard arrays and imputed to ∼2.5 million HapMap SNPs. Each study performed a genome-wide association analysis of subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), VAT adjusted for body mass index, and VAT/SAT ratio (a metric of the propensity to store fat viscerally as compared to subcutaneously) in the overall sample and in women and men separately. A weighted z-score meta-analysis was conducted. For the VAT/SAT ratio, our most significant SNP was rs11118316 at the LYPLAL1 gene (p = 3.1×10−9), previously identified in association with waist–hip ratio. For SAT, the most significant SNP was in the FTO gene (p = 5.9×10−8). Given the known gender differences in body fat distribution, we performed sex-specific analyses. Our most significant finding was for VAT in women, rs1659258 near THNSL2 (p = 1.6×10−8), which was not associated in men (p = 0.75). Validation of this SNP in the GIANT consortium data demonstrated a similar sex-specific pattern, with significance observed in women but not men for both BMI (p = 0.006 [women], p = 0.24 [men]) and waist circumference (p = 0.04 [women], p = 0.49 [men]). Finally, we interrogated our data for the 14 recently published loci for body fat distribution (measured by waist–hip ratio adjusted for BMI); associations were observed at 7 of these loci. In contrast, we observed associations at only 7/32 loci previously identified in association with BMI; the majority of overlap was observed with SAT.
Genome-wide association for visceral and subcutaneous fat revealed a SNP for VAT in women. More refined phenotypes for body composition and fat distribution can detect new loci not previously uncovered in large-scale GWAS of anthropometric traits.
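The weighted z-score meta-analysis mentioned above combines per-study signed z-scores into one statistic. This sketch assumes weights proportional to the square root of each study's sample size, a common choice (used, for example, by METAL's sample-size scheme); the abstract does not specify the weights.

```python
from math import erfc, sqrt


def weighted_z_meta(z_scores, sample_sizes):
    """Combine signed per-study z-scores with sqrt(N) weights.

    Z = sum(w_i * z_i) / sqrt(sum(w_i^2)), with w_i = sqrt(N_i).
    Returns (combined z, two-sided p-value).
    """
    weights = [sqrt(n) for n in sample_sizes]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    return z, erfc(abs(z) / sqrt(2))  # erfc form stays accurate for very small p
```

With four equally sized studies each reporting z = 2, the combined z is 2 × 4 / √4 = 4, showing how consistent moderate signals reinforce each other.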
Author Summary
Body fat distribution, particularly centralized obesity, is associated with metabolic risk above and beyond total adiposity. We performed genome-wide association of abdominal adipose depots quantified using computed tomography (CT) to uncover novel loci for body fat distribution among participants of European ancestry. We quantified subcutaneous and visceral fat in more than 10,000 women and men who also had genome-wide association data available. Given the known gender differences in body fat distribution, we performed sex-specific analyses. Our most significant finding was for visceral adipose tissue (VAT) in women, near the THNSL2 gene. This finding was not observed in men. We also interrogated our data for the 14 recently published loci for body fat distribution (measured by waist–hip ratio adjusted for body mass index (BMI)); associations were observed for 7 of these loci, most notably for the ratio of visceral to subcutaneous adipose tissue (VAT/SAT ratio). We conclude that genome-wide association for visceral and subcutaneous fat revealed a SNP for VAT in women. More refined phenotypes for body composition and fat distribution can detect new loci not uncovered in large-scale GWAS of anthropometric traits.
doi:10.1371/journal.pgen.1002695
PMCID: PMC3349734  PMID: 22589738
12.  An Investigation of Genome-Wide Studies Reported Susceptibility Loci for Ulcerative Colitis Shows Limited Replication in North Indians 
PLoS ONE  2011;6(1):e16565.
Genome-Wide Association studies (GWAS) of both Crohn's Disease (CD) and Ulcerative Colitis (UC) have unearthed over 40 risk-conferring variants. Recently, a meta-analysis of UC revealed several loci, most of which had previously been associated with UC or CD susceptibility in populations of European origin. In this study, we attempted to replicate these findings in an ethnically distinct north Indian UC cohort. 648 UC cases and 850 controls were genotyped using the Infinium Human 660W-quad. Of the 59 meta-analysis index SNPs, six were not on the SNP array used in the study. Of the remaining 53 SNPs, four were monomorphic. Association (p<0.05) was observed at 25 SNPs, of which 15 were CD specific. Only five SNPs, namely rs2395185 (HLA-DRA), rs3024505 (IL10), rs6426833 (RNF186), rs3763313 (BTNL2) and rs2066843 (NOD2), retained significance after Bonferroni correction. These results (i) reveal limited replication of Caucasian-based meta-analysis results; (ii) reiterate overlapping molecular mechanism(s) in UC and CD; (iii) indicate differences in genetic architecture between populations; and (iv) suggest that resources such as HapMap need to be extended to cover diverse ethnic populations. They also suggest that a systematic GWAS in this region may be insightful for identifying population-specific IBD risk-conferring loci and thus enable cross-ethnicity fine mapping of disease loci.
doi:10.1371/journal.pone.0016565
PMCID: PMC3031575  PMID: 21304977
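The SNP counts in the ulcerative colitis abstract determine the Bonferroni threshold. Assuming the correction was applied over the polymorphic index SNPs actually tested (59 − 6 not on the array − 4 monomorphic = 49), the per-SNP significance level at a family-wise α of 0.05 is 0.05/49 ≈ 1.02 × 10⁻³. The abstract does not state the number of tests used, so treat this as an illustration.

```python
def bonferroni_alpha(n_index_snps=59, missing_on_array=6, monomorphic=4, alpha=0.05):
    """Per-test level after Bonferroni correction over the SNPs actually tested.

    Defaults are the counts given in the abstract; the assumption is that
    only polymorphic SNPs present on the array enter the correction.
    """
    n_tested = n_index_snps - missing_on_array - monomorphic
    return n_tested, alpha / n_tested


n_tested, alpha_bonf = bonferroni_alpha()
print(n_tested, alpha_bonf)
```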
13.  Genetic Association for Renal Traits among Participants of African Ancestry Reveals New Loci for Renal Function 
PLoS Genetics  2011;7(9):e1002264.
Chronic kidney disease (CKD) is an increasing global public health concern, particularly among populations of African ancestry. We performed an interrogation of known renal loci, genome-wide association (GWA), and IBC candidate-gene SNP association analyses in African Americans from the CARe Renal Consortium. In up to 8,110 participants, we performed meta-analyses of GWA and IBC array data for estimated glomerular filtration rate (eGFR), CKD (eGFR <60 mL/min/1.73 m2), urinary albumin-to-creatinine ratio (UACR), and microalbuminuria (UACR >30 mg/g) and interrogated the 250 kb flanking region around 24 SNPs previously identified in European Ancestry renal GWAS analyses. Findings were replicated in up to 4,358 African Americans. To assess function, individually identified genes were knocked down in zebrafish embryos by morpholino antisense oligonucleotides. Expression of kidney-specific genes was assessed by in situ hybridization, and glomerular filtration was evaluated by dextran clearance. Overall, 23 of 24 previously identified SNPs had direction-consistent associations with eGFR in African Americans, 2 of which achieved nominal significance (UMOD, PIP5K1B). Interrogation of the flanking regions uncovered 24 new index SNPs in African Americans, 12 of which were replicated (UMOD, ANXA9, GCKR, TFDP2, DAB2, VEGFA, ATXN2, GATM, SLC22A2, TMEM60, SLC6A13, and BCAS3). In addition, we identified 3 suggestive loci at DOK6 (p-value = 5.3×10−7) and FNDC1 (p-value = 3.0×10−7) for UACR, and KCNQ1 with eGFR (p = 3.6×10−6). Morpholino knockdown of kcnq1 in the zebrafish resulted in abnormal kidney development and filtration capacity. We identified several SNPs in association with eGFR in African Ancestry individuals, as well as 3 suggestive loci for UACR and eGFR. Functional genetic studies support a role for kcnq1 in glomerular development in zebrafish.
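The observation that 23 of 24 European-ancestry SNPs had direction-consistent associations can be put on a probabilistic footing with a sign test. This is an illustration, not an analysis the authors report: it assumes the 24 loci are independent and that, under the null, each effect direction matches by chance with probability 0.5. Under those assumptions the one-sided probability is (C(24,23) + C(24,24)) / 2²⁴ = 25 / 16,777,216 ≈ 1.5 × 10⁻⁶.

```python
from math import comb


def sign_test_p(k, n):
    """One-sided P(X >= k) for X ~ Binomial(n, 0.5).

    Probability that at least k of n independent SNPs agree in effect
    direction if each direction were a fair coin flip.
    """
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


print(sign_test_p(23, 24))
```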
Author Summary
Chronic kidney disease (CKD) is an increasing global public health problem that disproportionately affects populations of African ancestry. Many studies have shown that genetic variants are associated with the development of CKD; however, similar studies are lacking in populations of African ancestry. We conducted genome-wide association analyses of renal-related phenotypes in more than 8,000 individuals of African ancestry from the CARe consortium. In cross-ethnicity analyses, we found that 23 of 24 SNPs previously identified in European ancestry populations have the same direction of effect in our samples of African ancestry. We also identified 3 suggestive genetic variants associated with measures of kidney function. We then tested these genes in zebrafish knockdown models and demonstrated that kcnq1 is involved in kidney development in zebrafish. These results highlight the similarity of genetic variants across ethnicities and show that cross-species modeling in zebrafish is feasible for genes associated with chronic human disease.
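The direction-consistency result (23 of 24 SNPs) is reported without a formal test. As an illustration only, a one-sided sign test gauges how surprising that count would be if each SNP's direction in African Americans were a fair coin flip. This is a minimal sketch under two assumptions that the source does not state: the 24 SNPs are independent, and the null probability of a consistent direction is 0.5. The function name `sign_test_upper_tail` is hypothetical.

```python
from math import comb


def sign_test_upper_tail(k: int, n: int) -> float:
    """One-sided sign-test p-value: P(X >= k) for X ~ Binomial(n, 0.5).

    Assumes each SNP is an independent fair coin flip that is either
    direction-consistent (success) or not under the null of no shared effect.
    """
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# 23 of 24 European-ancestry renal SNPs were direction-consistent in African Americans.
p = sign_test_upper_tail(23, 24)
print(f"{p:.2e}")  # equals 25 / 2**24
```

Under these assumptions the probability is 25/2^24, about 1.5×10⁻⁶; if some of the 24 SNPs are correlated with each other, the effective number of independent tests is smaller and the evidence is weaker than this figure suggests.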
doi:10.1371/journal.pgen.1002264
PMCID: PMC3169523  PMID: 21931561
14.  Association of genetic variation with systolic and diastolic blood pressure among African Americans: the Candidate Gene Association Resource study 
Fox, Ervin R. | Young, J. Hunter | Li, Yali | Dreisbach, Albert W. | Keating, Brendan J. | Musani, Solomon K. | Liu, Kiang | Morrison, Alanna C. | Ganesh, Santhi | Kutlar, Abdullah | Ramachandran, Vasan S. | Polak, Josef F. | Fabsitz, Richard R. | Dries, Daniel L. | Farlow, Deborah N. | Redline, Susan | Adeyemo, Adebowale | Hirschhorn, Joel N. | Sun, Yan V. | Wyatt, Sharon B. | Penman, Alan D. | Palmas, Walter | Rotter, Jerome I. | Townsend, Raymond R. | Doumatey, Ayo P. | Tayo, Bamidele O. | Mosley, Thomas H. | Lyon, Helen N. | Kang, Sun J. | Rotimi, Charles N. | Cooper, Richard S. | Franceschini, Nora | Curb, J. David | Martin, Lisa W. | Eaton, Charles B. | Kardia, Sharon L.R. | Taylor, Herman A. | Caulfield, Mark J. | Ehret, Georg B. | Johnson, Toby | Chakravarti, Aravinda | Zhu, Xiaofeng | Levy, Daniel | Munroe, Patricia B. | Rice, Kenneth M. | Bochud, Murielle | Johnson, Andrew D. | Chasman, Daniel I. | Smith, Albert V. | Tobin, Martin D. | Verwoert, Germaine C. | Hwang, Shih-Jen | Pihur, Vasyl | Vollenweider, Peter | O'Reilly, Paul F. | Amin, Najaf | Bragg-Gresham, Jennifer L. | Teumer, Alexander | Glazer, Nicole L. | Launer, Lenore | Zhao, Jing Hua | Aulchenko, Yurii | Heath, Simon | Sõber, Siim | Parsa, Afshin | Luan, Jian'an | Arora, Pankaj | Dehghan, Abbas | Zhang, Feng | Lucas, Gavin | Hicks, Andrew A. | Jackson, Anne U. | Peden, John F. | Tanaka, Toshiko | Wild, Sarah H. | Rudan, Igor | Igl, Wilmar | Milaneschi, Yuri | Parker, Alex N. | Fava, Cristiano | Chambers, John C. | Kumari, Meena | Go, Min Jin | van der Harst, Pim | Kao, Wen Hong Linda | Sjögren, Marketa | Vinay, D.G. | Alexander, Myriam | Tabara, Yasuharu | Shaw-Hawkins, Sue | Whincup, Peter H. | Liu, Yongmei | Shi, Gang | Kuusisto, Johanna | Seielstad, Mark | Sim, Xueling | Nguyen, Khanh-Dung Hoang | Lehtimäki, Terho | Matullo, Giuseppe | Wu, Ying | Gaunt, Tom R. | Onland-Moret, N. Charlotte | Cooper, Matthew N. | Platou, Carl G.P. | Org, Elin | Hardy, Rebecca | Dahgam, Santosh | Palmen, Jutta | Vitart, Veronique | Braund, Peter S. | Kuznetsova, Tatiana | Uiterwaal, Cuno S.P.M. | Campbell, Harry | Ludwig, Barbara | Tomaszewski, Maciej | Tzoulaki, Ioanna | Palmer, Nicholette D. | Aspelund, Thor | Garcia, Melissa | Chang, Yen-Pei C. | O'Connell, Jeffrey R. | Steinle, Nanette I. | Grobbee, Diederick E. | Arking, Dan E. | Hernandez, Dena | Najjar, Samer | McArdle, Wendy L. | Hadley, David | Brown, Morris J. | Connell, John M. | Hingorani, Aroon D. | Day, Ian N.M. | Lawlor, Debbie A. | Beilby, John P. | Lawrence, Robert W. | Clarke, Robert | Collins, Rory | Hopewell, Jemma C. | Ongen, Halit | Bis, Joshua C. | Kähönen, Mika | Viikari, Jorma | Adair, Linda S. | Lee, Nanette R. | Chen, Ming-Huei | Olden, Matthias | Pattaro, Cristian | Hoffman Bolton, Judith A. | Köttgen, Anna | Bergmann, Sven | Mooser, Vincent | Chaturvedi, Nish | Frayling, Timothy M. | Islam, Muhammad | Jafar, Tazeen H. | Erdmann, Jeanette | Kulkarni, Smita R. | Bornstein, Stefan R. | Grässler, Jürgen | Groop, Leif | Voight, Benjamin F. | Kettunen, Johannes | Howard, Philip | Taylor, Andrew | Guarrera, Simonetta | Ricceri, Fulvio | Emilsson, Valur | Plump, Andrew | Barroso, Inês | Khaw, Kay-Tee | Weder, Alan B. | Hunt, Steven C. | Bergman, Richard N. | Collins, Francis S. | Bonnycastle, Lori L. | Scott, Laura J. | Stringham, Heather M. | Peltonen, Leena | Perola, Markus | Vartiainen, Erkki | Brand, Stefan-Martin | Staessen, Jan A. | Wang, Thomas J. | Burton, Paul R. | Soler Artigas, Maria | Dong, Yanbin | Snieder, Harold | Wang, Xiaoling | Zhu, Haidong | Lohman, Kurt K. | Rudock, Megan E. | Heckbert, Susan R. | Smith, Nicholas L. | Wiggins, Kerri L. | Shriner, Daniel | Veldre, Gudrun | Viigimaa, Margus | Kinra, Sanjay | Prabhakaran, Dorairajan | Tripathy, Vikal | Langefeld, Carl D. | Rosengren, Annika | Thelle, Dag S. | Corsi, Anna Maria | Singleton, Andrew | Forrester, Terrence | Hilton, Gina | McKenzie, Colin A. | Salako, Tunde | Iwai, Naoharu | Kita, Yoshikuni | Ogihara, Toshio | Ohkubo, Takayoshi | Okamura, Tomonori | Ueshima, Hirotsugu | Umemura, Satoshi | Eyheramendy, Susana | Meitinger, Thomas | Wichmann, H.-Erich | Cho, Yoon Shin | Kim, Hyung-Lae | Lee, Jong-Young | Scott, James | Sehmi, Joban S. | Zhang, Weihua | Hedblad, Bo | Nilsson, Peter | Davey Smith, George | Wong, Andrew | Narisu, Narisu | Stančáková, Alena | Raffel, Leslie J. | Yao, Jie | Kathiresan, Sekar | O'Donnell, Chris | Schwartz, Steven M. | Ikram, M. Arfan | Longstreth, Will T. | Seshadri, Sudha | Shrine, Nick R.G. | Wain, Louise V. | Morken, Mario A. | Swift, Amy J. | Laitinen, Jaana | Prokopenko, Inga | Zitting, Paavo | Cooper, Jackie A. | Humphries, Steve E. | Danesh, John | Rasheed, Asif | Goel, Anuj | Hamsten, Anders | Watkins, Hugh | Bakker, Stephan J.L. | van Gilst, Wiek H. | Janipalli, Charles S. | Radha Mani, K. | Yajnik, Chittaranjan S. | Hofman, Albert | Mattace-Raso, Francesco U.S. | Oostra, Ben A. | Demirkan, Ayse | Isaacs, Aaron | Rivadeneira, Fernando | Lakatta, Edward G. | Orru, Marco | Scuteri, Angelo | Ala-Korpela, Mika | Kangas, Antti J. | Lyytikäinen, Leo-Pekka | Soininen, Pasi | Tukiainen, Taru | Würtz, Peter | Ong, Rick Twee-Hee | Dörr, Marcus | Kroemer, Heyo K. | Völker, Uwe | Völzke, Henry | Galan, Pilar | Hercberg, Serge | Lathrop, Mark | Zelenika, Diana | Deloukas, Panos | Mangino, Massimo | Spector, Tim D. | Zhai, Guangju | Meschia, James F. | Nalls, Michael A. | Sharma, Pankaj | Terzic, Janos | Kranthi Kumar, M.J. | Denniff, Matthew | Zukowska-Szczechowska, Ewa | Wagenknecht, Lynne E. | Fowkes, Gerald R. | Charchar, Fadi J. | Schwarz, Peter E.H. | Hayward, Caroline | Guo, Xiuqing | Bots, Michiel L. | Brand, Eva | Samani, Nilesh J. | Polasek, Ozren | Talmud, Philippa J. | Nyberg, Fredrik | Kuh, Diana | Laan, Maris | Hveem, Kristian | Palmer, Lyle J. | van der Schouw, Yvonne T. | Casas, Juan P. | Mohlke, Karen L. | Vineis, Paolo | Raitakari, Olli | Wong, Tien Y. | Tai, E. Shyong | Laakso, Markku | Rao, Dabeeru C. | Harris, Tamara B. | Morris, Richard W. | Dominiczak, Anna F. | Kivimaki, Mika | Marmot, Michael G. | Miki, Tetsuro | Saleheen, Danish | Chandak, Giriraj R. | Coresh, Josef | Navis, Gerjan | Salomaa, Veikko | Han, Bok-Ghee | Kooner, Jaspal S. | Melander, Olle | Ridker, Paul M. | Bandinelli, Stefania | Gyllensten, Ulf B. | Wright, Alan F. | Wilson, James F. | Ferrucci, Luigi | Farrall, Martin | Tuomilehto, Jaakko | Pramstaller, Peter P. | Elosua, Roberto | Soranzo, Nicole | Sijbrands, Eric J.G. | Altshuler, David | Loos, Ruth J.F. | Shuldiner, Alan R. | Gieger, Christian | Meneton, Pierre | Uitterlinden, Andre G. | Wareham, Nicholas J. | Gudnason, Vilmundur | Rettig, Rainer | Uda, Manuela | Strachan, David P. | Witteman, Jacqueline C.M. | Hartikainen, Anna-Liisa | Beckmann, Jacques S. | Boerwinkle, Eric | Boehnke, Michael | Larson, Martin G. | Järvelin, Marjo-Riitta | Psaty, Bruce M. | Abecasis, Gonçalo R. | Elliott, Paul | van Duijn, Cornelia M. | Newton-Cheh, Christopher
Human Molecular Genetics  2011;20(11):2273-2284.
The prevalence of hypertension in African Americans (AAs) is higher than in other US groups, yet few genome-wide association studies (GWASs) have been performed in AAs. Among people of European descent, GWASs have identified genetic variants at 13 loci that are associated with blood pressure. It is unknown whether these variants confer susceptibility in people of African ancestry. Here, we examined genome-wide and candidate gene associations with systolic blood pressure (SBP) and diastolic blood pressure (DBP) in the Candidate Gene Association Resource (CARe) consortium, comprising 8591 AAs. Genotypes included genome-wide single-nucleotide polymorphism (SNP) data from the Affymetrix 6.0 array, with imputation to 2.5 million HapMap SNPs, and candidate gene SNP data from a 50K cardiovascular gene-centric array (ITMAT-Broad-CARe [IBC] array). For the Affymetrix data, the strongest signal for DBP was rs10474346 (P = 3.6 × 10⁻⁸), located near GPR98 and ARRDC3. For SBP, the strongest signal was rs2258119 in C21orf91 (P = 4.7 × 10⁻⁸). The top IBC association for SBP was rs2012318 (P = 6.4 × 10⁻⁶) near SLC25A42; for DBP it was rs2523586 (P = 1.3 × 10⁻⁶) near HLA-B. None of the top variants replicated in additional AA (n = 11 882) or European-American (n = 69 899) cohorts. We replicated previously reported European-American blood pressure SNPs in our AA samples (SH2B3, P = 0.009; TBX3-TBX5, P = 0.03; and CSK-ULK3, P = 0.0004). These genetic loci represent the best evidence of genetic influences on SBP and DBP in AAs to date. More broadly, this work supports the notion that blood pressure among AAs is a trait with genetic underpinnings but also with significant complexity.
doi:10.1093/hmg/ddr092
PMCID: PMC3090190  PMID: 21378095
15.  Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples 
BMC Bioinformatics  2008;9(Suppl 9):S17.
Background
Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount, since errors introduced by calling algorithms can lead to inflated rates of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS call genotypes jointly across multiple arrays. Because a GWAS generates hundreds of gigabytes (GB) of raw data, the samples are typically partitioned into batches, each containing a subset of the entire dataset, for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., the number of chips analyzed together) and batch composition (i.e., the choice of chips in a batch) on call rate and accuracy, and the propagation of these effects to the significantly associated SNPs identified downstream, have not been investigated. In this paper, we analyzed the effects of both batch size and batch composition on the genotype calling algorithm BRLMM using raw data from 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
Results
Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, we called genotypes with the BRLMM algorithm under three different batch sizes and three different batch compositions. Comparative analysis of the calling results and of the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and composition effects were more severe for samples and SNPs with lower call rates than for those with higher call rates, and for heterozygous genotype calls than for homozygous genotype calls.
Conclusion
Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed in the same batch.
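To make the heterozygous-versus-homozygous comparison concrete, one can compare two call sets for the same SNP on the same samples (for example, BRLMM run under two different batch configurations) and report concordance separately by genotype class. This sketch assumes genotypes coded 0/1/2 with -1 for a no-call; that coding, the toy calls, and the function `concordance_by_class` are hypothetical illustrations, not BRLMM's output format or the paper's analysis code.

```python
from collections import defaultdict

NO_CALL = -1  # assumed coding for a missing genotype call


def concordance_by_class(calls_a, calls_b, no_call=NO_CALL):
    """Per-class concordance between two genotype call sets for the same samples.

    Genotypes are coded 0/1/2 (copies of one allele). Each position is classed
    by the call in set A: 'het' if it is 1, otherwise 'hom'. Positions that are
    a no-call in either set are skipped rather than counted as discordant.
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for a, b in zip(calls_a, calls_b):
        if a == no_call or b == no_call:
            continue
        cls = "het" if a == 1 else "hom"
        total[cls] += 1
        agree[cls] += int(a == b)
    return {cls: agree[cls] / total[cls] for cls in total}


# Hypothetical calls for one SNP across eight samples under two batch designs.
batch_small = [0, 1, 1, 2, 1, -1, 0, 2]
batch_large = [0, 1, 2, 2, 0, 1, 0, 2]
result = concordance_by_class(batch_small, batch_large)
print(result)  # hom: 4/4 concordant, het: 1/3 concordant
```

Skipping no-calls keeps call-rate differences and call discordance as separate quantities, which matters here because the abstract reports effects on both.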
doi:10.1186/1471-2105-9-S9-S17
PMCID: PMC2537568  PMID: 18793462
16.  Trait-Associated SNPs Are More Likely to Be eQTLs: Annotation to Enhance Discovery from GWAS 
PLoS Genetics  2010;6(4):e1000888.
Although genome-wide association studies (GWAS) of complex traits have yielded more reproducible associations than any other approach, the loci characterized to date do not account for much of the heritability of such traits and, in general, have not led to an improved understanding of the biology underlying complex phenotypes. Using a web site we developed to serve results of expression quantitative trait locus (eQTL) studies in lymphoblastoid cell lines from HapMap samples (http://www.scandb.org), we show that single nucleotide polymorphisms (SNPs) associated with complex traits (from http://www.genome.gov/gwastudies/) are significantly more likely to be eQTLs than minor-allele-frequency–matched SNPs chosen from high-throughput GWAS platforms. These findings are robust across a range of thresholds for declaring eQTLs (p-values from 10⁻⁴ to 10⁻⁸) and across a broad spectrum of human complex traits. Analyses of GWAS data from the Wellcome Trust studies confirm that annotating SNPs with a score reflecting the strength of the evidence that the SNP is an eQTL can improve the ability to discover true associations and clarify the mechanism driving the associations. Our results, showing that trait-associated SNPs are more likely to be eQTLs and that this information can enhance discovery of trait-associated SNPs for complex phenotypes, raise the possibility that we can use this information both to increase the heritability explained by identifiable genetic factors and to better understand the biology underlying complex traits.
Author Summary
We show here that single nucleotide polymorphisms (SNPs) associated with complex traits (as identified in the catalog of results from genome-wide association studies http://www.genome.gov/gwastudies/) are more likely than other SNPs chosen from high-throughput genotyping platforms to predict expression levels of genes. These observations confirm that genetic risk factors for complex traits will often affect phenotype by altering the amount or timing of protein production, rather than by changing the type of protein produced. This knowledge can be used to improve our ability to discover genetic risk factors for complex traits and to improve our understanding of their underlying biology.
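The comparison with minor-allele-frequency–matched SNPs can be sketched as a resampling test: for each trait-associated SNP, draw a control SNP from the same MAF bin of the platform background, count how many controls are eQTLs, and repeat. This is a minimal sketch assuming 0.05-wide MAF bins, sampling with replacement, and a fixed eQTL call per SNP (for example, at one of the p-value thresholds mentioned above); the helper names are hypothetical, and the paper's exact matching scheme is not described in this abstract.

```python
import random
from collections import defaultdict

N_BINS = 10  # assumed: 0.05-wide MAF bins covering [0, 0.5]


def maf_bin(maf):
    """Map a minor allele frequency in [0, 0.5] to one of N_BINS equal-width bins."""
    return min(int(maf * 2 * N_BINS), N_BINS - 1)


def maf_matched_enrichment(trait_snps, background, n_draws=1000, seed=0):
    """Empirical test that trait SNPs are eQTLs more often than MAF-matched controls.

    trait_snps, background: lists of (maf, is_eqtl) tuples.
    Each draw samples one background SNP (with replacement) from the MAF bin of
    every trait SNP; rng.choice raises IndexError if a needed bin is empty.
    Returns (observed eQTL count, one-sided empirical p-value).
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for maf, is_eqtl in background:
        pools[maf_bin(maf)].append(is_eqtl)
    observed = sum(is_eqtl for _, is_eqtl in trait_snps)
    at_least_as_many = 0
    for _ in range(n_draws):
        null_count = sum(rng.choice(pools[maf_bin(maf)]) for maf, _ in trait_snps)
        if null_count >= observed:
            at_least_as_many += 1
    # add-one correction keeps the empirical p-value strictly above zero
    return observed, (at_least_as_many + 1) / (n_draws + 1)
```

With `n_draws` draws the smallest attainable p-value is 1/(n_draws + 1), so detecting strong enrichment requires enough draws; matching on MAF matters because eQTL detection power itself depends on allele frequency.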
doi:10.1371/journal.pgen.1000888
PMCID: PMC2848547  PMID: 20369019
17.  Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples 
PLoS Genetics  2010;6(3):e1000866.
As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportion and identify ancestry informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportion that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs. Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).
Author Summary
Many association studies have been published looking for genetic variants contributing to a variety of human traits such as obesity, diabetes, and height. Because the frequency of genetic variants can differ across populations, it is important to have estimates of genetic ancestry in the individuals being studied. In this study, we were able to measure genetic ancestry in populations of mixed ancestry by genotyping pooled, rather than individual, DNA samples. This represents a rapid and inexpensive means for modeling genetic ancestry and thus could facilitate future association or population-genetic studies in populations of unknown ancestry for which whole-genome data do not already exist.
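The regression-based admixture estimate described in the abstract can be sketched, for the simplest two-way case, as a regression through the origin: if a pool draws a proportion alpha of its ancestry from population A and 1 - alpha from population B, then at every SNP p_pool - p_B = alpha (p_A - p_B). The sketch assumes exactly two reference panels (for instance, HapMap YRI and CEU as proxies for an African American pool) and noise-free pooled frequencies. `estimate_admixture` is a hypothetical name; the study's own method, which may use more panels or constraints, is not reproduced here.

```python
def estimate_admixture(pool_freqs, ref_a, ref_b):
    """Least-squares admixture proportion for a two-way admixed DNA pool.

    Models each SNP's pooled allele frequency as
        p_pool = alpha * p_A + (1 - alpha) * p_B,
    i.e. (p_pool - p_B) = alpha * (p_A - p_B), a regression through the origin.
    Returns alpha clipped to [0, 1].
    """
    num = den = 0.0
    for p, a, b in zip(pool_freqs, ref_a, ref_b):
        x = a - b  # frequency difference between the reference panels
        num += x * (p - b)
        den += x * x
    if den == 0:
        raise ValueError("reference panels are uninformative (identical frequencies)")
    return min(1.0, max(0.0, num / den))


# Hypothetical reference frequencies and a pool that is 80% A, 20% B.
ref_a = [0.9, 0.1, 0.8, 0.2]
ref_b = [0.1, 0.9, 0.3, 0.6]
pool = [0.8 * a + 0.2 * b for a, b in zip(ref_a, ref_b)]
print(round(estimate_admixture(pool, ref_a, ref_b), 6))
```

SNPs with a large |p_A - p_B| dominate the denominator, which is why ancestry informative markers carry most of the information about alpha.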
doi:10.1371/journal.pgen.1000866
PMCID: PMC2832667  PMID: 20221249
18.  Genome-wide eQTLs and heritability for gene expression traits in unrelated individuals 
BMC Genomics  2014;15:13.
Background
While the possible sources of the so-called ‘missing heritability’ evident in current genome-wide association studies (GWAS) of complex traits have been actively pursued in recent years, resolving this mystery remains a challenging task. Studying the heritability of genome-wide gene expression traits can shed light on the relationship between phenotype and genotype. Here we used microarray gene expression measurements of lymphoblastoid cell lines and genome-wide SNP genotype data from 210 HapMap individuals to examine the heritability of gene expression traits.
Results
Heritability levels for the expression of 10,720 genes were estimated using variance component model analyses, and 1,043 expression quantitative trait loci (eQTLs) were detected. Our results indicate that gene expression traits display a bimodal distribution of heritability, with one peak near 0% and the other near 100%. This pattern of within-population variability in gene expression heritability is common among different HapMap populations of unrelated individuals but differs from that obtained in the CEU and YRI trio samples. Housekeeping genes and genes associated with cis eQTLs show higher heritability levels. Cis and trans eQTLs make comparable cumulative contributions to heritability. Finally, we modelled gene-gene interactions (epistasis) for genes with multiple eQTLs and found that epistasis was not prevalent across all genes but made a substantial contribution to total heritability for some of the genes analysed.
Conclusions
We used mixed-effects model analysis to estimate genetic components from population-based samples. On the basis of analyses of genome-wide gene expression from four HapMap populations, we characterized in detail the distribution of genetic heritability for expression traits in different populations, and highlighted interaction at the gene expression level as an important source of variation underlying missing heritability.
doi:10.1186/1471-2164-15-13
PMCID: PMC4028055  PMID: 24405759
Microarray gene expression; eQTLs; Heritability; Mixed model; HapMap populations; Epistasis
19.  MaCH: Using Sequence and Genotype Data to Estimate Haplotypes and Unobserved Genotypes 
Genetic epidemiology  2010;34(8):816-834.
Genome-wide association studies (GWAS) can identify common alleles that contribute to complex disease susceptibility. Despite the large number of SNPs assessed in each study, the effects of most common SNPs must be evaluated indirectly, using either genotyped markers or haplotypes thereof as proxies. We have previously implemented a computationally efficient Markov chain framework for genotype imputation and haplotyping in the freely available MaCH software package. The approach describes sampled chromosomes as mosaics of each other and uses available genotype and shotgun sequence data to estimate unobserved genotypes and haplotypes, together with useful measures of the quality of these estimates. Our approach is already widely used to facilitate comparison of results across studies as well as meta-analyses of GWAS. Here, we use simulations and experimental genotypes to evaluate its accuracy and utility, considering choices of genotyping panels, reference panel configurations, and designs where genotyping is replaced with shotgun sequencing. Importantly, we show that genotype imputation not only facilitates cross-study analyses but also increases the power of genetic association studies. We show that genotype imputation of common variants using HapMap haplotypes as a reference is very accurate using either genome-wide SNP data or the smaller amounts of data typical of fine-mapping studies. Furthermore, we show that the approach is applicable in a variety of populations. Finally, we illustrate how association analyses of unobserved variants will benefit from ongoing advances such as larger HapMap reference panels and whole genome shotgun sequencing technologies.
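The idea that sampled chromosomes are "mosaics of each other" can be illustrated with a Li–Stephens-style haplotype-copying hidden Markov model. The hidden state is which reference haplotype is being copied, the template switches with probability theta between adjacent sites, and an observed allele mismatches the template with probability eps. This is a simplified haploid sketch with fixed, made-up parameters, not MaCH's implementation (MaCH works with unphased diploid genotypes and estimates its error and crossover parameters from the data); `impute_haplotype` and its arguments are hypothetical.

```python
def impute_haplotype(ref_haps, observed, theta=0.01, eps=0.01):
    """Posterior P(allele = 1) at each site for one haplotype modeled as a
    mosaic of reference haplotypes (a Li-Stephens-style copying HMM).

    ref_haps: K reference haplotypes, each a list of 0/1 alleles.
    observed: list of 0, 1, or None (untyped site) for the target haplotype.
    theta: per-interval probability of jumping to a uniformly chosen template.
    eps: probability that an observed allele differs from the copied template.
    """
    K, M = len(ref_haps), len(observed)

    def emit(k, m):
        if observed[m] is None:
            return 1.0  # untyped sites carry no information
        return 1.0 - eps if ref_haps[k][m] == observed[m] else eps

    def step(vec):
        # stay on the same template with prob (1 - theta), otherwise jump uniformly
        total = sum(vec)
        return [(1.0 - theta) * v + theta * total / K for v in vec]

    # forward pass, rescaled at every site to avoid underflow
    fwd = []
    f = [emit(k, 0) / K for k in range(K)]
    for m in range(M):
        if m > 0:
            f = [s * emit(k, m) for k, s in enumerate(step(f))]
        z = sum(f)
        f = [x / z for x in f]
        fwd.append(f)

    # backward pass (the transition matrix is symmetric, so step() is reused)
    bwd = [None] * M
    b = [1.0] * K
    bwd[M - 1] = b
    for m in range(M - 2, -1, -1):
        b = step([emit(k, m + 1) * b[k] for k in range(K)])
        z = sum(b)
        b = [x / z for x in b]
        bwd[m] = b

    # posterior template weights -> expected dosage of allele 1 at each site
    dosages = []
    for m in range(M):
        post = [fwd[m][k] * bwd[m][k] for k in range(K)]
        z = sum(post)
        dosages.append(sum(post[k] for k in range(K) if ref_haps[k][m] == 1) / z)
    return dosages


refs = [[0, 0, 0, 0], [1, 1, 1, 1]]  # hypothetical reference haplotypes
print(impute_haplotype(refs, [0, None, 0, 0]))
```

Forward and backward vectors are rescaled at every site; the scaling constants cancel when the posterior is normalized, so they do not change the imputed dosages. In the toy call the untyped second site receives a dosage close to 0, because the flanking typed sites all point to the first template.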
doi:10.1002/gepi.20533
PMCID: PMC3175618  PMID: 21058334
imputation; haplotyping; sequencing
20.  Collaborative Meta-analysis: Associations of 150 Candidate Genes With Osteoporosis and Osteoporotic Fracture 
Annals of internal medicine  2009;151(8):528-537.
Background
Osteoporosis is a highly heritable trait. Many candidate genes have been proposed as being involved in regulating bone mineral density (BMD). Few of these findings have been replicated in independent studies.
Objective
To assess the relationship between BMD and fracture and all common single-nucleotide polymorphisms (SNPs) in previously proposed osteoporosis candidate genes.
Design
Large-scale meta-analysis of genome-wide association data.
Setting
5 international, multicenter, population-based studies.
Participants
Data on BMD were obtained from 19 195 participants (14 277 women) from 5 populations of European origin. Data on fracture were obtained from a prospective cohort (n = 5974) from the Netherlands.
Measurements
Systematic literature review using the Human Genome Epidemiology Navigator identified autosomal genes previously evaluated for association with osteoporosis. We explored the common SNPs arising from the haplotype map of the human genome (HapMap) across all these genes. BMD at the femoral neck and lumbar spine was measured by dual-energy x-ray absorptiometry. Fractures were defined as clinically apparent, site-specific, validated nonvertebral and vertebral low-energy fractures.
Results
150 candidate genes were identified, and 36 016 SNPs in these loci were assessed. SNPs from 9 gene loci (ESR1, LRP4, ITGA1, LRP5, SOST, SPP1, TNFRSF11A, TNFRSF11B, and TNFSF11) were associated with BMD at either site. For most genes, no SNP was statistically significant. For statistically significant SNPs (n = 241), effect sizes ranged from 0.04 to 0.18 SD per allele. SNPs from the LRP5, SOST, SPP1, and TNFRSF11A loci were significantly associated with fracture risk; odds ratios ranged from 1.13 to 1.43 per allele. These effects on fracture were partially independent of BMD at SPP1 and SOST.
Limitation
Only common polymorphisms in linkage disequilibrium with SNPs in HapMap could be assessed, and previously reported associations for SNPs in some candidate genes could not be excluded.
Conclusion
In this large-scale collaborative genome-wide meta-analysis, 9 of 150 candidate genes were associated with regulation of BMD, 4 of which also significantly affected risk for fracture. However, most candidate genes had no consistent association with BMD.
Primary Funding Source
European Union, Netherlands Organisation for Scientific Research, Research Institute for Diseases in the Elderly, Netherlands Genomics Initiative, Wellcome Trust, National Institutes of Health, deCODE Genetics, and Canadian Institutes of Health Research.
PMCID: PMC2842981  PMID: 19841454
21.  A Genome-Wide Association Study Identifies Novel and Functionally Related Susceptibility Loci for Kawasaki Disease 
PLoS Genetics  2009;5(1):e1000319.
Kawasaki disease (KD) is a pediatric vasculitis that damages the coronary arteries in 25% of untreated and approximately 5% of treated children. Epidemiologic data suggest that KD is triggered by unidentified infection(s) in genetically susceptible children. To investigate genetic determinants of KD susceptibility, we performed a genome-wide association study (GWAS) in 119 Caucasian KD cases and 135 matched controls with stringent correction for possible admixture, followed by replication in an independent cohort and subsequent fine-mapping, for a total of 893 KD cases plus population and family controls. Significant associations of 40 SNPs and six haplotypes, identifying 31 genes, were replicated in an independent cohort of 583 predominantly Caucasian KD families, with NAALADL2 (rs17531088, combined p = 1.13×10⁻⁶) and ZFHX3 (rs7199343, combined p = 2.37×10⁻⁶) most significantly associated. Sixteen associated variants with a minor allele frequency of >0.05 that lay within or close to known genes were fine-mapped with HapMap tagging SNPs in 781 KD cases, including 590 from the discovery and replication stages. Original or tagging SNPs in eight of these genes replicated the original findings, with seven genes having further significant markers in adjacent regions. In four genes (ZFHX3, NAALADL2, PPP1R14C, and TCP1), the neighboring markers were more significantly associated than the originally associated variants. Investigation of functional relationships between the eight fine-mapped genes using Ingenuity Pathway Analysis identified a single functional network (p = 10⁻¹³) containing five fine-mapped genes (LNX1, CAMK2D, ZFHX3, CSMD1, and TCP1) with functional relationships potentially related to inflammation, apoptosis, and cardiovascular pathology. Paired blood transcript levels were measured during acute and convalescent KD for all fine-mapped genes, revealing a consistent trend of significantly reduced transcript levels prior to treatment.
This is one of the first GWAS in an infectious disease. We have identified novel, plausible, and functionally related variants associated with KD susceptibility that may also be relevant to other cardiovascular diseases.
Author Summary
Kawasaki disease is an inflammatory pediatric condition that damages the coronary arteries in a quarter of untreated patients and is the commonest cause of childhood acquired heart disease in developed countries. While the infectious trigger(s) remain unknown, epidemiologic evidence suggests that human genetic variation underlies the susceptibility. In order to identify novel mechanisms that may predispose to this disease, we undertook a genome-wide association study, which investigates genetic determinants without prior supposition regarding the loci of interest. This was amongst the first complex infectious diseases to be studied in this way and one of the largest genetic studies of Kawasaki disease with 893 cases. We identified and confirmed 40 SNPs and six haplotypes, identifying 31 genes, in an international cohort of Caucasian patients. We followed up 16 SNPs where the associated genetic variant was more common and was situated within a gene, confirming eight SNPs by fine-mapping across the entire gene. Of these eight genes, seven were expressed in blood and five showed significantly different gene expression in paired patient samples taken during acute and convalescent Kawasaki disease. Five of the eight genes also appear to be involved in a single putative functional network of interacting genes. These novel genes and pathways may ultimately lead to novel diagnostics and treatment for Kawasaki disease.
doi:10.1371/journal.pgen.1000319
PMCID: PMC2607021  PMID: 19132087
22.  Genome-Wide Association Studies of the PR Interval in African Americans 
PLoS Genetics  2011;7(2):e1001304.
The PR interval on the electrocardiogram reflects atrial and atrioventricular nodal conduction time. The PR interval is heritable, provides important information about arrhythmia risk, and has been suggested to differ among human races. Genome-wide association (GWA) studies have identified common genetic determinants of the PR interval in individuals of European and Asian ancestry, but there is a general paucity of GWA studies in individuals of African ancestry. We performed GWA studies in African American individuals from four cohorts (n = 6,247) to identify genetic variants associated with PR interval duration. Genotyping was performed using the Affymetrix 6.0 microarray. Imputation was performed for 2.8 million single nucleotide polymorphisms (SNPs) using combined YRI and CEU HapMap phase II panels. We observed a strong signal (rs3922844) within the gene encoding the cardiac sodium channel (SCN5A) with genome-wide significant association (p < 2.5×10⁻⁸) in two of the four cohorts and in the meta-analysis. The signal explained 2% of PR interval variability in African Americans (beta = 5.1 msec per minor allele, 95% CI = 4.1–6.1, p = 3×10⁻²³). This SNP was also associated with PR interval (beta = 2.4 msec per minor allele, 95% CI = 1.8–3.0, p = 3×10⁻¹⁶) in individuals of European ancestry (n = 14,042), but with a smaller effect size (p for heterogeneity <0.001) and a smaller proportion of variability explained (0.5%). Further meta-analysis of the four cohorts identified genome-wide significant associations with SNPs in SCN10A (rs6798015), MEIS1 (rs10865355), and TBX5 (rs7312625) that were highly correlated with SNPs identified in European and Asian GWA studies. African ancestry was associated with increased PR duration (13.3 msec, p = 0.009) in one of the four cohorts but not in the other three.
Our findings demonstrate the relevance of common variants to African Americans at four loci previously associated with PR interval in European and Asian samples and identify an association signal at one of these loci that is more strongly associated with PR interval in African Americans than in Europeans.
Author Summary
We performed genome-wide association studies in African American participants from four population-based cohorts to identify genetic variation that correlates with variation in PR interval duration, an electrocardiographic measure of conduction through the atria and atrioventricular node. We observed a strong signal within the gene encoding the cardiac sodium channel, SCN5A, with genome-wide significant association (p < 2.5×10⁻⁸) in two cohorts and in a meta-analysis of four cohorts of African Americans. We replicated this association in two additional cohorts of African Americans and in Europeans (p = 3×10⁻¹⁶). The signal explains 2% of PR duration variability in African Americans and 0.5% in Europeans. In further meta-analysis, we observed genome-wide significant associations for single nucleotide polymorphisms in SCN10A, MEIS1, and TBX5, corresponding to signals observed in people of European and Asian descent. We found an association between genetic ancestry and PR interval in one of the four cohorts but not in the other three. Our findings provide the first demonstration of the relevance of these loci to individuals of African ancestry and identify an association signal from SCN5A that is more strongly associated with PR interval in African Americans.
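The SCN5A abstract reports a larger per-allele effect in African Americans (5.1 msec, 95% CI 4.1–6.1) than in Europeans (2.4 msec, 95% CI 1.8–3.0), with p for heterogeneity <0.001. Assuming both intervals are symmetric normal-theory 95% CIs from independent samples, the standard errors and a two-sample z-test can be recovered from the reported numbers alone. This is a back-of-the-envelope check with hypothetical helper names, not the authors' exact heterogeneity test.

```python
from math import erfc, sqrt

Z_95 = 1.959964  # two-sided 95% normal quantile


def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a symmetric normal-theory 95% confidence interval."""
    return (upper - lower) / (2 * z)


def heterogeneity_test(b1, se1, b2, se2):
    """Two-sided z-test for a difference between two independent effect estimates.

    For two estimates this is equivalent to Cochran's Q (Q = z**2 on 1 df).
    Returns (z, p).
    """
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    p = erfc(abs(z) / sqrt(2))  # two-sided standard normal tail probability
    return z, p


# rs3922844 (SCN5A): African Americans vs. Europeans, msec per minor allele
z, p = heterogeneity_test(5.1, se_from_ci(4.1, 6.1), 2.4, se_from_ci(1.8, 3.0))
print(f"z = {z:.2f}, p = {p:.1e}")
```

With these inputs z is about 4.5, so the recovered p-value falls well below 0.001, consistent with the reported heterogeneity. Because the estimates come from different samples, their variances simply add in the denominator.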
doi:10.1371/journal.pgen.1001304
PMCID: PMC3037415  PMID: 21347284
23.  Beyond the HapMap Genotypic Data: Prospects of Deep Resequencing Projects 
Current bioinformatics  2008;3(3):178.
The International HapMap Project provides a key resource of genotypic data on human samples including lymphoblastoid cell lines derived from individuals of four major world populations of African, European, Japanese and Chinese ancestry. Researchers have utilized this resource to identify genetic elements that correlate with various phenotypes such as risks of common diseases, individual drug response and gene expression variation. However, recent comparative studies have suggested that the currently available HapMap genotypic data may not capture a substantial proportion of rare or untyped SNPs in these populations, implying that the HapMap SNPs may not be sufficient for comprehensive association studies. In this paper, three large-scale deep resequencing projects covering the HapMap samples: ENCODE (Encyclopedia of DNA Elements), SeattleSNPs and NIEHS (National Institute of Environmental Health Sciences) Environmental Genome Project are discussed. Prospectively, once integrated with the HapMap resource, these efforts will greatly benefit the next wave of association studies and data mining using these cell lines.
doi:10.2174/157489308785909232
PMCID: PMC2819736  PMID: 20151045
HapMap; lymphoblastoid cell lines; genotype; single nucleotide polymorphism; resequencing
24.  Digital Genotyping of Macrosatellites and Multicopy Genes Reveals Novel Biological Functions Associated with Copy Number Variation of Large Tandem Repeats 
PLoS Genetics  2014;10(6):e1004418.
Tandem repeats are common in eukaryotic genomes, but because they are difficult to assay, they remain poorly studied. Here, we demonstrate the utility of Nanostring technology as a targeted approach for accurately measuring tandem repeats even at extremely high copy number, and apply this technology to genotype 165 HapMap samples from three different populations as well as five species of non-human primates. We observed extreme variability in copy number of tandemly repeated genes, with many loci showing 5–10-fold variation in copy number among humans. Many of these loci show hallmarks of genome assembly errors, and the true copy number of many large tandem repeats is significantly under-represented even in the high-quality ‘finished’ human reference assembly. Importantly, we demonstrate that most large tandem repeat variations are not tagged by nearby SNPs and are therefore essentially invisible to SNP-based GWAS approaches. Using association analysis, we identify many cis correlations of large tandem repeat variants with nearby gene expression and DNA methylation levels, indicating that variation in tandem repeat length is associated with functional effects on the local genomic environment. This includes an example where expansion of a macrosatellite repeat is associated with increased DNA methylation and suppression of nearby gene expression, suggesting a mechanism termed “repeat induced gene silencing”, which has previously been observed only in transgenic organisms. We also observed multiple signatures consistent with altered selective pressures at tandemly repeated loci, suggesting important biological functions. Our studies show that tandemly repeated loci represent a highly variable fraction of the genome that has been systematically ignored by most previous studies, and whose copy number variation can exert functionally significant effects.
We suggest that future studies of tandem repeat loci will lead to many novel insights into their role in modulating both genomic and phenotypic diversity.
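Whether a tandem repeat is "tagged" by nearby SNPs can be judged by the squared correlation (r²) between SNP allele dosage and the repeat's copy number. The sketch below assumes the common r² ≥ 0.8 tagging cutoff and treats copy number as a continuous trait; the abstract does not state the exact criterion the authors used, and the function names and data are hypothetical.

```python
from typing import Dict, Sequence


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP (or invariant copy number) carries no tagging information.
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_tag(copy_number: Sequence[float],
             snp_dosages: Dict[str, Sequence[float]],
             threshold: float = 0.8):
    """Return (snp_id, r2, is_tagged) for the nearby SNP best correlated with copy number."""
    best_id, best_r2 = None, 0.0
    for snp_id, dosage in snp_dosages.items():
        r2 = pearson_r2(dosage, copy_number)
        if r2 > best_r2:
            best_id, best_r2 = snp_id, r2
    return best_id, best_r2, best_r2 >= threshold


# Hypothetical data: copy number of one repeat and 0/1/2 allele dosages of two
# nearby SNPs, measured in the same eight individuals.
copy_number = [12.0, 30.0, 18.0, 45.0, 22.0, 9.0, 37.0, 26.0]
snps = {
    "snp_1": [0, 1, 0, 2, 1, 0, 1, 1],
    "snp_2": [1, 0, 2, 1, 0, 1, 2, 0],
}
snp_id, r2, is_tagged = best_tag(copy_number, snps)
print(snp_id, round(r2, 3), is_tagged)
```

A repeat is reported as untagged when even its best-correlated nearby SNP falls below the cutoff.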
Author Summary
Here we use Nanostring digital assays and show that they can estimate the copy number of 186 multicopy genes and tandem repeats. By analyzing patterns of single nucleotide variation around these variants, we show that copy number variation at the vast majority of tandem repeats is not effectively tagged by nearby SNPs, and thus standard genome-wide association studies that focus on SNPs provide little or no information about such variants. By comparing patterns of tandem repeat copy number with variation in local gene expression and DNA methylation, we also identify extensive effects on local genome function. This includes an example of a non-coding macrosatellite repeat whose expansion exerts a repressive effect on a nearby gene, accompanied by local accumulation of DNA methylation. Finally, comparison of diverse human populations with a number of primate genomes shows that many of these sequences have undergone extreme changes in copy number during recent human and primate evolution and bear signatures that suggest possible selective effects. Overall, we conclude that multicopy genes and macrosatellites represent a highly variable fraction of the genome with important functional effects that has been systematically ignored by previous studies.
doi:10.1371/journal.pgen.1004418
PMCID: PMC4063668  PMID: 24945355
25.  Genomewide meta-analysis identifies novel multiple sclerosis susceptibility loci 
Annals of neurology  2011;70(6):897-912.
Objective
To perform a one-stage meta-analysis of genome-wide association studies (GWAS) of multiple sclerosis (MS) susceptibility and explore functional consequences of new susceptibility loci.
Methods
We synthesized 7 MS GWAS. Each dataset was imputed using HapMap phase II and a per-SNP meta-analysis was performed across the 7 datasets. We explored RNA expression data using a quantitative trait analysis in peripheral blood mononuclear cells (PBMCs) of 228 subjects with demyelinating disease.
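The abstract does not specify the meta-analysis model. A common choice for a per-SNP meta-analysis is an inverse-variance-weighted fixed-effect model on the log odds ratio, sketched below under that assumption; the function name and the seven per-dataset estimates are hypothetical, not values from the study.

```python
import math
from statistics import NormalDist


def ivw_fixed_effect(log_ors, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis of per-study log odds ratios.

    Returns (pooled odds ratio, standard error of the pooled log OR, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in ses]  # each study weighted by 1/variance
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    p = 2 * NormalDist().cdf(-abs(z))  # two-sided Wald P-value
    return math.exp(pooled), pooled_se, p


# Hypothetical per-dataset estimates for a single SNP across seven GWAS.
ors = [1.12, 1.20, 1.15, 1.25, 1.10, 1.18, 1.16]
ses = [0.07, 0.08, 0.06, 0.10, 0.09, 0.07, 0.08]
pooled_or, pooled_se, p = ivw_fixed_effect([math.log(o) for o in ors], ses)
print(f"OR = {pooled_or:.3f}, SE(log OR) = {pooled_se:.4f}, P = {p:.1e}")
```

Because the weights favor precise studies, the pooled standard error is smaller than that of any single dataset, which is how combining cohorts yields genome-wide significance that no single study reaches.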
Results
We meta-analyzed 2,529,394 unique SNPs in 5,545 cases and 12,153 controls. We identified three novel susceptibility alleles: rs170934T at 3p24.1 (OR = 1.17, P = 1.6 × 10⁻⁸) near EOMES, rs2150702G in the second intron of MLANA on chromosome 9p24.1 (OR = 1.16, P = 3.3 × 10⁻⁸), and rs6718520A in an intergenic region on chromosome 2p21, with THADA as the nearest flanking gene (OR = 1.17, P = 3.4 × 10⁻⁸). The three new loci do not have a strong “cis” effect on RNA expression in PBMCs. Ten other susceptibility loci reached a suggestive P < 1 × 10⁻⁶, some of which (IL12B, TAGAP, PLEK, and ZMIZ1) have evidence of association in other inflammatory diseases.
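A reported OR and P-value can be converted into an approximate standard error and 95% CI for log(OR), assuming the P-value comes from a two-sided Wald z-test on the log odds ratio. The helper names are hypothetical, and the result is a back-of-the-envelope approximation rather than a figure reported by the study.

```python
import math
from statistics import NormalDist


def se_from_or_and_p(odds_ratio: float, p_two_sided: float) -> float:
    """Approximate the standard error of log(OR) from an OR and its two-sided P-value."""
    z = abs(NormalDist().inv_cdf(p_two_sided / 2))  # |z| implied by the P-value
    return abs(math.log(odds_ratio)) / z


def ci95(odds_ratio: float, se: float):
    """Approximate 95% confidence interval for an OR, computed on the log scale."""
    z95 = NormalDist().inv_cdf(0.975)
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z95 * se), math.exp(log_or + z95 * se)


# rs170934T near EOMES, as reported above: OR = 1.17, P = 1.6e-8.
se = se_from_or_and_p(1.17, 1.6e-8)
low, high = ci95(1.17, se)
print(f"SE(log OR) = {se:.4f}, 95% CI = {low:.2f} to {high:.2f}")
```

For rs170934T this gives a standard error of roughly 0.028 on the log scale and an approximate 95% CI of about 1.11 to 1.24.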
Interpretation
We have performed a meta-analysis of GWAS in MS that more than doubles the size of previous gene discovery efforts and highlights three novel MS susceptibility loci. These and additional loci with suggestive evidence of association are excellent candidates for further investigations to refine and validate their role in the genetic architecture of MS.
doi:10.1002/ana.22609
PMCID: PMC3247076  PMID: 22190364