1.  Genome-wide Association Study of N370S Homozygous Gaucher Disease Reveals the Candidacy of CLN8 gene as a Genetic Modifier Contributing to Extreme Phenotypic Variation 
American journal of hematology  2012;87(4):377-383.
Mutations in the GBA1 gene result in defective acid β-glucosidase and the complex phenotype of Gaucher disease (GD), which is related to the accumulation of glucosylceramide-laden macrophages. The phenotype is highly variable, even among patients harboring identical GBA1 mutations. We hypothesized that modifier gene(s) underlie phenotypic diversity in GD and performed a GWAS in Ashkenazi Jewish patients with type 1 GD (GD1) who were homozygous for the N370S mutation. Patients were assigned to mild, moderate or severe disease categories using composite disease severity scoring systems. Whole-genome genotyping of >500,000 SNPs was performed in 139 eligible patients, and associations were sought using the OQLS algorithm. Several SNPs in linkage disequilibrium within the CLN8 gene locus were associated with GD1 severity; SNP rs11986414 was associated with GD1 severity at p = 1.26 × 10^−6. Compared with mild disease, the risk allele A at rs11986414 conferred an odds ratio of 3.72 for moderate/severe disease. Loss-of-function mutations in CLN8 cause neuronal ceroid-lipofuscinosis, but our results indicate that increased CLN8 expression may protect against severe GD1. In cultured skin fibroblasts, relative CLN8 expression was higher in patients with mild GD than in severely affected patients, in whom CLN8 risk alleles were over-represented. In an in vitro cell model of GD, CLN8 expression was increased, and this increase was further enhanced in the presence of the bioactive substrate glucosylsphingosine. Taken together, these results suggest that CLN8 is a candidate modifier gene for GD1 that may function as a protective sphingolipid sensor and/or in glycosphingolipid trafficking. Future studies should explore the role of CLN8 in the pathophysiology of GD.
PMCID: PMC3684025  PMID: 22388998
Gaucher disease; GWAS; genotype/phenotype correlations; phenotypic diversity; modifier genes; CLN8; N370S; GBA mutations
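As a rough illustration of how an allelic odds ratio like the 3.72 reported in entry 1 is defined, the sketch below computes an odds ratio and a Wald 95% confidence interval from a 2×2 table of allele counts. The counts are hypothetical and the function name `allelic_odds_ratio` is ours; the study compared mild with moderate/severe disease and may have estimated its odds ratio with a different model.

```python
import math

def allelic_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other):
    """Odds ratio for a risk allele from a 2x2 table of allele counts,
    with a Wald 95% confidence interval computed on the log scale."""
    cells = (case_risk, case_other, ctrl_risk, ctrl_other)
    if min(cells) <= 0:
        raise ValueError("all cells must be positive; apply a continuity correction first")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of log(OR) is sqrt of the summed reciprocal cell counts
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (lower, upper)

# Hypothetical allele counts (NOT the study's data): "case" = moderate/severe,
# "ctrl" = mild; "risk" = copies of the risk allele, "other" = the other allele.
or_, ci = allelic_odds_ratio(case_risk=40, case_other=60, ctrl_risk=20, ctrl_other=80)
print(f"OR = {or_:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

An odds ratio above 1 means the risk allele is relatively more common among the more severely affected group; the interval widens quickly when any cell count is small.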
2.  A Variant in the Glucokinase Regulatory Protein (GCKR) Gene is Associated with Fatty Liver in Obese Children and Adolescents 
Hepatology (Baltimore, Md.)  2011;55(3):781-789.
Recently, the SNP rs1260326 in the glucokinase regulatory protein (GCKR) gene was associated with hypertriglyceridemia in adults. Because accumulation of triglycerides in hepatocytes is the hallmark of steatosis, we aimed to investigate whether this variant is associated with fatty liver (hepatic fat content, HFF%). Moreover, because rs738409 in PNPLA3 and rs2854116 in APOC3 have recently been associated with fatty liver, we explored how the GCKR SNP and these two variants jointly influence hepatic steatosis.
Methods and Results
We studied 455 obese children and adolescents (181 Caucasians, 139 African Americans and 135 Hispanics). All underwent an oral glucose tolerance test (OGTT) and measurement of fasting lipoprotein subclasses by proton NMR. A subset of 142 children underwent fast gradient MRI to measure HFF%.
The rs1260326 SNP was associated with elevated triglycerides (Caucasians p = 0.00014; African Americans p = 0.00417), large VLDL (Caucasians p = 0.001; African Americans p = 0.03) and fatty liver (Caucasians p = 0.034; African Americans p = 0.00002; Hispanics p = 0.016). The PNPLA3 SNP was associated with fatty liver but not with triglyceride levels, whereas the APOC3 rs2854116 SNP was not associated with fatty liver. The PNPLA3 and GCKR SNPs had a joint effect, explaining 32% of HFF% variance in Caucasians (p = 0.00161), 39.0% in African Americans (p = 0.00000496) and 15% in Hispanics (p = 0.00342).
The rs1260326 SNP in GCKR is associated with hepatic fat accumulation, as well as with large VLDL and triglyceride levels. GCKR and PNPLA3 act together to confer susceptibility to fatty liver in obese youths.
PMCID: PMC3288435  PMID: 22105854
GCKR; PNPLA3; SNPs; obesity; youths
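Entry 2 reports the share of HFF% variance explained jointly by the PNPLA3 and GCKR SNPs. A minimal sketch of one way to obtain such a figure is shown below, assuming an additive genotype coding (0, 1 or 2 risk alleles) and an ordinary least-squares model without interaction terms; the simulated cohort, effect sizes and the helper `variance_explained` are invented for illustration, and the authors' exact model may differ.

```python
import numpy as np

def variance_explained(y, *dosages):
    """R^2 from an OLS fit of y on an intercept plus additive allele dosages."""
    X = np.column_stack([np.ones(len(y)), *dosages])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # With an intercept the residuals have mean zero, so this equals 1 - SS_res / SS_tot
    return 1.0 - residuals.var() / y.var()

# Hypothetical simulated cohort (NOT the study's data)
rng = np.random.default_rng(0)
n = 500
gckr = rng.binomial(2, 0.4, n)     # hypothetical risk-allele dosage per child
pnpla3 = rng.binomial(2, 0.3, n)
hff = 2.0 + 1.5 * gckr + 3.0 * pnpla3 + rng.normal(0.0, 3.0, n)

r2_gckr = variance_explained(hff, gckr)
r2_both = variance_explained(hff, gckr, pnpla3)
print(f"GCKR alone: {r2_gckr:.1%}, GCKR + PNPLA3: {r2_both:.1%}")
```

Because the two-SNP model nests the one-SNP model, its R² can never be lower; whether the gain is meaningful is a separate statistical question.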
3.  A Comparison of Methods for Clustering 16S rRNA Sequences into OTUs 
PLoS ONE  2013;8(8):e70837.
Recent studies of 16S rRNA sequences using next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to exploring the genetic diversity of a microbial community with these data is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species, diversity, composition and richness. Although a number of methods have been developed and are commonly used to cluster sequences into OTUs, little guidance is available on their relative performance or on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on sample complexity. For low-complexity data sets, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of threshold than others, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs, because the commonly used threshold of 3% often leads to underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results also show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing both the dissimilarity threshold and the abundance cutoff for OTU inference.
PMCID: PMC3742672  PMID: 23967117
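Entry 3 turns on how the dissimilarity threshold (for example, the common 3%) and an abundance cutoff shape the number of inferred OTUs. The sketch below is a deliberately simplified abundance-sorted greedy centroid clustering, assuming reads are already aligned and of equal length so that dissimilarity is a plain mismatch fraction. It is not any of the ten methods evaluated in the study, and the names `greedy_otus`, `mismatch_fraction` and `min_abundance` are hypothetical.

```python
from collections import Counter

def mismatch_fraction(a: str, b: str) -> float:
    """Dissimilarity of two pre-aligned, equal-length reads (fraction of differing sites)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned reads of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, threshold=0.03, min_abundance=1):
    """Abundance-sorted greedy centroid clustering (simplified, hypothetical helper).

    Unique sequences observed fewer than `min_abundance` times are discarded;
    the rest are visited from most to least abundant, and each joins the first
    existing centroid within `threshold` dissimilarity or else seeds a new OTU.
    Returns a dict mapping each centroid to the unique sequences in its OTU.
    """
    counts = Counter(reads)
    uniques = [seq for seq, n in counts.most_common() if n >= min_abundance]
    otus = {}  # centroid -> member sequences; dicts keep centroid insertion order
    for seq in uniques:
        for centroid in otus:
            if mismatch_fraction(seq, centroid) <= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]  # no centroid close enough: start a new OTU
    return otus

# Toy 10-bp reads (hypothetical)
reads = (["ACGTACGTAC"] * 5 + ["ACGTACGTAA"] * 3
         + ["TTTTACGTAC"] * 2 + ["GGGGGGGGGG"])
for t, m in [(0.03, 1), (0.10, 1), (0.10, 2)]:
    print(f"threshold={t}, min_abundance={m}: {len(greedy_otus(reads, t, m))} OTUs")
```

Loosening the threshold merges near-identical reads into fewer OTUs, and raising the abundance cutoff drops singletons before clustering, which are the two levers the entry says must be chosen with care.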
4.  Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease 
Jostins, Luke | Ripke, Stephan | Weersma, Rinse K | Duerr, Richard H | McGovern, Dermot P | Hui, Ken Y | Lee, James C | Schumm, L Philip | Sharma, Yashoda | Anderson, Carl A | Essers, Jonah | Mitrovic, Mitja | Ning, Kaida | Cleynen, Isabelle | Theatre, Emilie | Spain, Sarah L | Raychaudhuri, Soumya | Goyette, Philippe | Wei, Zhi | Abraham, Clara | Achkar, Jean-Paul | Ahmad, Tariq | Amininejad, Leila | Ananthakrishnan, Ashwin N | Andersen, Vibeke | Andrews, Jane M | Baidoo, Leonard | Balschun, Tobias | Bampton, Peter A | Bitton, Alain | Boucher, Gabrielle | Brand, Stephan | Büning, Carsten | Cohain, Ariella | Cichon, Sven | D’Amato, Mauro | De Jong, Dirk | Devaney, Kathy L | Dubinsky, Marla | Edwards, Cathryn | Ellinghaus, David | Ferguson, Lynnette R | Franchimont, Denis | Fransen, Karin | Gearry, Richard | Georges, Michel | Gieger, Christian | Glas, Jürgen | Haritunians, Talin | Hart, Ailsa | Hawkey, Chris | Hedl, Matija | Hu, Xinli | Karlsen, Tom H | Kupcinskas, Limas | Kugathasan, Subra | Latiano, Anna | Laukens, Debby | Lawrance, Ian C | Lees, Charlie W | Louis, Edouard | Mahy, Gillian | Mansfield, John | Morgan, Angharad R | Mowat, Craig | Newman, William | Palmieri, Orazio | Ponsioen, Cyriel Y | Potocnik, Uros | Prescott, Natalie J | Regueiro, Miguel | Rotter, Jerome I | Russell, Richard K | Sanderson, Jeremy D | Sans, Miquel | Satsangi, Jack | Schreiber, Stefan | Simms, Lisa A | Sventoraityte, Jurgita | Targan, Stephan R | Taylor, Kent D | Tremelling, Mark | Verspaget, Hein W | De Vos, Martine | Wijmenga, Cisca | Wilson, David C | Winkelmann, Juliane | Xavier, Ramnik J | Zeissig, Sebastian | Zhang, Bin | Zhang, Clarence K | Zhao, Hongyu | Silverberg, Mark S | Annese, Vito | Hakonarson, Hakon | Brant, Steven R | Radford-Smith, Graham | Mathew, Christopher G | Rioux, John D | Schadt, Eric E | Daly, Mark J | Franke, Andre | Parkes, Miles | Vermeire, Severine | Barrett, Jeffrey C | Cho, Judy H
Nature  2012;491(7422):119-124.
Crohn’s disease (CD) and ulcerative colitis (UC), the two common forms of inflammatory bowel disease (IBD), affect over 2.5 million people of European ancestry, with rising prevalence in other populations[1]. Genome-wide association studies (GWAS) and subsequent meta-analyses of CD and UC[2,3] as separate phenotypes implicated previously unsuspected mechanisms, such as autophagy[4], in pathogenesis and showed that some IBD loci are shared with other inflammatory diseases[5]. Here we expand knowledge of relevant pathways by undertaking a meta-analysis of CD and UC genome-wide association scans, with validation of significant findings in more than 75,000 cases and controls. We identify 71 new associations, for a total of 163 IBD loci that meet genome-wide significance thresholds. Most loci contribute to both phenotypes, and both directional and balancing selection effects are evident. Many IBD loci are also implicated in other immune-mediated disorders, most notably ankylosing spondylitis and psoriasis. We also observe striking overlap between susceptibility loci for IBD and mycobacterial infection. Gene co-expression network analysis emphasizes this relationship, with pathways shared between host responses to mycobacteria and those predisposing to IBD.
PMCID: PMC3491803  PMID: 23128233
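Entry 4 pools CD and UC genome-wide scans by meta-analysis and reports loci that meet genome-wide significance. One standard way to pool per-study effects is sketched below, assuming a fixed-effect, inverse-variance weighted model on log odds ratios; this is not necessarily the procedure the authors used, the input values are hypothetical, and the 5 × 10^−8 cutoff is the conventional genome-wide threshold rather than a figure taken from the entry.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios) for one SNP
    ses:   their standard errors
    Returns the pooled effect, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, p_two_sided

GENOME_WIDE_ALPHA = 5e-8  # conventional threshold, assumed here

# Hypothetical per-scan log odds ratios and SEs for one SNP (NOT study data)
beta, se, p = fixed_effect_meta([0.10, 0.12, 0.08], [0.02, 0.03, 0.025])
print(f"beta={beta:.3f} se={se:.4f} p={p:.2e} significant={p < GENOME_WIDE_ALPHA}")
```

Precise studies (small SE) dominate the pooled estimate, and the pooled SE is always smaller than any single study's SE, which is how combining scans raises power.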
5.  Genome-Wide Association Study of Alcohol Dependence Implicates KIAA0040 on Chromosome 1q 
Neuropsychopharmacology  2011;37(2):557-566.
Previous studies using the SAGE (Study of Addiction: Genetics and Environment) and COGA (Collaborative Study on the Genetics of Alcoholism) genome-wide association study (GWAS) data sets reported several risk loci for alcohol dependence (AD), which have not yet been well replicated independently or confirmed by functional studies. We combined these two data sets, now publicly available, to increase statistical power and identify replicable, functional and significant risk regions for AD. A total of 4116 subjects (1409 European-American (EA) cases with AD, 1518 EA controls, 681 African-American (AA) cases and 508 AA controls) underwent association analysis. An additional 443 subjects underwent expression quantitative trait locus (eQTL) analysis. Genome-wide association analysis was performed in EAs to identify significant risk genes. All available markers in the genome-wide significant risk genes were then tested in AAs for association with AD, and in six HapMap populations and two European samples for association with gene expression levels. We identified a single genome-wide significant gene, KIAA0040, that was enriched with many replicable risk SNPs for AD, all of which had significant cis-acting regulatory effects. The distributions of −log(p) values for SNP-disease and SNP-expression associations for all markers in the TNN–KIAA0040 region were consistent across EAs, AAs and five HapMap populations (0.369 ≤ r ≤ 0.824; 2.8 × 10^−9 ≤ p ≤ 0.032). The most significant SNPs in these populations were in high LD and concentrated in KIAA0040. Finally, expression of KIAA0040 was significantly associated (1.2 × 10^−11 ≤ p ≤ 1.5 × 10^−6) with the expression of numerous genes in neurotransmitter systems or metabolic pathways previously associated with AD. We conclude that KIAA0040 might harbor a causal variant for AD and thus might contribute directly to risk for this disorder.
KIAA0040 might also contribute to AD risk via neurotransmitter systems or metabolic pathways previously implicated in the pathophysiology of AD. Alternatively, KIAA0040 might influence risk through interactions with the flanking genes TNN and TNR. TNN is involved in neurite outgrowth and cell migration in hippocampal explants, and TNR is an extracellular matrix protein expressed primarily in the central nervous system.
PMCID: PMC3242317  PMID: 21956439
risk region; alcohol dependence; cis-eQTL; GWAS; alcohol & alcoholism; neurogenetics; addiction & substance abuse; biological psychiatry
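Entry 5 summarizes cross-population consistency as correlations (0.369 ≤ r ≤ 0.824) between distributions of −log(p) values. A minimal sketch of such a comparison is shown below, assuming r is a Pearson correlation computed over the same markers, in the same order, for two analyses; the p-values and the helper name `neglog_p_correlation` are hypothetical. The log base does not matter, since changing it only rescales the values and leaves r unchanged.

```python
import numpy as np

def neglog_p_correlation(p_a, p_b):
    """Pearson r between -log10(p) values of the same markers (same order)
    from two analyses, e.g. SNP-disease in one population and
    SNP-expression in another."""
    a = -np.log10(np.asarray(p_a, dtype=float))
    b = -np.log10(np.asarray(p_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("both inputs must cover the same markers")
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical p-values for five markers in two populations (NOT study data)
p_pop1 = [1e-8, 3e-5, 0.002, 0.04, 0.6]
p_pop2 = [2e-4, 1e-3, 0.01, 0.2, 0.5]
print(round(neglog_p_correlation(p_pop1, p_pop2), 3))
```

A high r means the association peaks fall on the same markers in both analyses, even if the absolute strength of the signals differs.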
7.  Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease 
Nature Genetics  2011;43(11):1066-1073.
More than a thousand disease susceptibility loci have been identified via genome-wide association studies (GWAS) of common variants; however, the specific genes and the full allelic spectrum of causal variants underlying these findings generally remain to be defined. We use pooled next-generation sequencing to study 56 genes in regions associated with Crohn’s disease (CD) in 350 cases and 350 controls. Follow-up genotyping of 70 rare and low-frequency protein-altering variants (MAF ~0.001–0.05) in nine independent case-control series (16,054 CD patients, 12,153 ulcerative colitis (UC) patients and 17,575 healthy controls) identifies four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a novel protective splice variant in CARD9 (p < 10^−16, OR ~0.29), and additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by providing novel, rare and likely functional variants that will empower functional experiments and predictive models.
PMCID: PMC3378381  PMID: 21983784
8.  Correction: A Genome-Wide Association Study on Obesity and Obesity-Related Traits 
PLoS ONE  2012;7(2):10.1371/annotation/a34ee94e-3e6a-48bd-a19e-398a4bb88580.
PMCID: PMC3293772
9.  A Common Variant in the Patatin-Like Phospholipase 3 Gene (PNPLA3) Is Associated with Fatty Liver Disease in Obese Children and Adolescents 
Hepatology (Baltimore, Md.)  2010;52(4):1281-1290.
The genetic factors associated with susceptibility to nonalcoholic fatty liver disease (NAFLD) in pediatric obesity remain largely unknown. Recently, a nonsynonymous single-nucleotide polymorphism (rs738409) in the patatin-like phospholipase 3 gene (PNPLA3) has been associated with hepatic steatosis in adults. In a multiethnic group of 85 obese youths, we genotyped the PNPLA3 single-nucleotide polymorphism and measured hepatic fat content by magnetic resonance imaging and insulin sensitivity by the insulin clamp. Because PNPLA3 might affect adipogenesis/lipogenesis, we explored its putative association with the distribution of adipose cell size and with the expression of selected adipogenic/lipogenic genes in a subset of subjects who underwent a subcutaneous fat biopsy. Steatosis was present in 41% of Caucasians, 23% of African Americans and 66% of Hispanics. The frequency of the PNPLA3 (rs738409) G allele was 0.324 in Caucasians, 0.183 in African Americans and 0.483 in Hispanics. The G allele was more prevalent in subjects with hepatic steatosis. Surprisingly, carriers of the G allele showed hepatic glucose production rates, peripheral glucose disposal rates and glycerol turnover comparable to those of CC homozygotes. Carriers of the G allele had smaller adipocytes than those with the CC genotype (P = 0.005). Although the expression of PNPLA3, PNPLA2, PPARγ2 (peroxisome proliferator-activated receptor gamma 2), SREBP1c (sterol regulatory element binding protein 1c) and ACACA (acetyl coenzyme A carboxylase) did not differ between genotypes, carriers of the G allele showed lower expression of leptin (LEP) (P = 0.03) and sirtuin 1 (SIRT1) (P = 0.04).
A common variant of the PNPLA3 gene confers susceptibility to hepatic steatosis in obese youths without increasing hepatic or peripheral insulin resistance. The rs738409 PNPLA3 G allele is associated with morphological changes in adipocyte size.
PMCID: PMC3221304  PMID: 20803499
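Entry 9 reports G-allele frequencies (for example, 0.324 in Caucasians) alongside comparisons of G-allele carriers with CC homozygotes. The sketch below shows how an allele frequency follows from genotype counts and, assuming Hardy-Weinberg equilibrium (an assumption the entry does not state), what fraction of individuals would be expected to carry at least one G allele. The genotype counts and function names are hypothetical.

```python
def allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """Frequency of the alternative (here: G) allele from genotype counts.
    Each alternative homozygote (GG) carries two copies, each heterozygote (CG) one."""
    n_individuals = n_ref_hom + n_het + n_alt_hom
    if n_individuals == 0:
        raise ValueError("no genotyped individuals")
    return (2 * n_alt_hom + n_het) / (2 * n_individuals)

def expected_carrier_fraction(q: float) -> float:
    """Expected fraction of carriers (CG or GG) under Hardy-Weinberg equilibrium: 1 - (1 - q)^2."""
    return 1.0 - (1.0 - q) ** 2

# Hypothetical genotype counts (NOT the study's data): (2*2 + 8) / (2*20)
q = allele_frequency(n_ref_hom=10, n_het=8, n_alt_hom=2)
print(q)

# Using the G-allele frequency reported for Caucasians in entry 9
print(round(expected_carrier_fraction(0.324), 3))
```

Because carriers include heterozygotes, the expected carrier fraction is well above the allele frequency itself, which is why carrier-versus-CC comparisons can split a cohort roughly in half even at an allele frequency near 0.3.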
10.  A Novel, Functional and Replicable Risk Gene Region for Alcohol Dependence Identified by Genome-Wide Association Study 
PLoS ONE  2011;6(11):e26726.
Several genome-wide association studies (GWASs) have reported dozens of risk genes for alcohol dependence, but most of them have not been replicated or confirmed by functional studies. The present study used a GWAS to search for novel, functional and replicable risk gene regions for alcohol dependence. Associations of all top-ranked SNPs identified in a discovery sample of 681 African-American (AA) cases with alcohol dependence and 508 AA controls were retested in a primary replication sample of 1,409 European-American (EA) cases and 1,518 EA controls. The replicable associations were then subjected to secondary replication in a sample of 6,438 Australian family subjects. A functional expression quantitative trait locus (eQTL) analysis of these replicable risk SNPs followed, to explore their cis-acting regulatory effects on gene expression. We found that, within a 90 Mb region around the PHF3-PTP4A1 locus in AAs, a linkage disequilibrium (LD) block in PHF3-PTP4A1 formed the only peak associated with alcohol dependence at p < 10^−4. Within this block, 30 SNPs associated with alcohol dependence in AAs (1.6 × 10^−5 ≤ p ≤ 0.050) were replicated in EAs (1.3 × 10^−3 ≤ p ≤ 0.038), and 18 of them were also replicated in Australians (1.8 × 10^−3 ≤ p ≤ 0.048). Most of these risk SNPs had strong cis-acting regulatory effects on PHF3-PTP4A1 mRNA expression across three HapMap samples. The distributions of −log(p) values for association and functional signals throughout this LD block were highly consistent across AAs, EAs, Australians and three HapMap samples. We conclude that the PHF3-PTP4A1 region appears to harbor a causal locus for alcohol dependence, and that proteins encoded by PHF3 and/or PTP4A1 might play a functional role in the disorder.
PMCID: PMC3210123  PMID: 22096494
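Entry 10 describes a staged design in which SNPs from a discovery sample are retested in a primary and then a secondary replication sample. A minimal sketch of that filtering logic is shown below, assuming each sample is a dictionary from SNP ID to (p-value, signed effect for the same coded allele), a nominal p ≤ 0.05 in every stage, and a same-direction requirement that the entry does not state. The helper `replicated`, the SNP IDs and all statistics are hypothetical.

```python
def replicated(discovery, replications, alpha=0.05):
    """Return SNP IDs that are nominally significant (p <= alpha) in the
    discovery sample and in every replication sample, with the effect
    pointing in the same direction each time.

    Each sample maps SNP ID -> (p_value, effect), where effect is a signed
    log odds ratio for the same coded allele (an assumption of this sketch).
    """
    hits = []
    for snp, (p, effect) in discovery.items():
        if p > alpha:
            continue
        if all(
            snp in sample
            and sample[snp][0] <= alpha
            and (sample[snp][1] > 0) == (effect > 0)
            for sample in replications
        ):
            hits.append(snp)
    return hits

# Hypothetical SNPs and statistics (NOT the study's data)
aa = {"snp1": (1.6e-5, 0.40), "snp2": (0.03, 0.20), "snp3": (0.20, 0.10), "snp4": (0.004, -0.30)}
ea = {"snp1": (1.3e-3, 0.25), "snp2": (0.01, -0.15), "snp3": (0.02, 0.10), "snp4": (0.02, -0.20)}
au = {"snp1": (1.8e-3, 0.18)}
print(replicated(aa, [ea]))       # primary replication
print(replicated(aa, [ea, au]))   # primary and secondary replication
```

Each added replication stage can only shrink the list, mirroring how 30 SNPs replicated in EAs narrowed to 18 that also replicated in Australians.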
11.  A Genome-Wide Association Study on Obesity and Obesity-Related Traits 
PLoS ONE  2011;6(4):e18939.
Large-scale genome-wide association studies (GWAS) have identified many loci associated with body mass index (BMI), but few studies have focused on obesity as a binary trait. Here we report the results of a GWAS and candidate SNP genotyping study of obesity, including extremely obese cases and never-overweight controls as well as families segregating extreme obesity and thinness. We first performed a GWAS of obesity and obesity-related traits in 520 cases (BMI > 35 kg/m²) and 540 control subjects (BMI < 25 kg/m²). We then followed up obesity-associated signals by genotyping the top ∼500 SNPs from the GWAS in the combined sample of cases, controls and family members, totaling 2,256 individuals. For the binary trait of obesity, we found 16 genome-wide significant signals within the FTO gene (strongest signal at rs17817449, P = 2.5 × 10^−12). We next examined obesity-related quantitative traits (such as total body weight, waist circumference and waist-to-hip ratio) and detected genome-wide significant signals for waist-to-hip ratio at NRXN3 (rs11624704, P = 2.67 × 10^−9), a gene previously associated with body weight and fat distribution. Our study demonstrates how a relatively small sample ascertained through extreme phenotypes can detect genuine associations in a GWAS.
PMCID: PMC3084240  PMID: 21552555
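Entry 11 calls its FTO and NRXN3 signals "genome-wide significant" without stating the cutoff. The widely used 5 × 10^−8 threshold can be read as a Bonferroni correction of α = 0.05 for roughly one million independent tests; the sketch below applies that conventional threshold, which is an assumption rather than a value given in the entry. The two rs IDs and P values come from the entry, the third SNP is hypothetical, and `genome_wide_hits` is an illustrative helper.

```python
def genome_wide_hits(pvalues, alpha=0.05, n_tests=1_000_000):
    """Keep SNPs whose p-value passes a Bonferroni-corrected threshold.
    With alpha = 0.05 and ~1 million independent tests (a conventional
    assumption for common variants), the threshold is 0.05 / 1e6 = 5e-8."""
    threshold = alpha / n_tests
    return threshold, {snp: p for snp, p in pvalues.items() if p < threshold}

# P values reported in entry 11, plus one hypothetical sub-threshold SNP
reported = {"rs17817449": 2.5e-12, "rs11624704": 2.67e-9, "rs_hypothetical": 1e-6}
threshold, hits = genome_wide_hits(reported)
print(f"threshold = {threshold:.0e}; hits: {sorted(hits)}")
```

A SNP at P = 10^−6 would look striking in a single-gene study but does not survive correction for a genome-wide scan.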

