Keloids are benign dermal tumors that occur about 20 times more often in individuals of African descent than in those of European descent. While most keloids occur sporadically, a genetic predisposition is supported both by the familial aggregation of some keloids and by the large differences in risk among populations. Yet no well-established genetic risk factors for keloids have been identified. In this study we conducted admixture mapping and whole-exome association analyses in 478 African American (AA) samples (122 cases, 356 controls) with exome genotyping data to identify regions where local ancestry was associated with keloid risk. Logistic regression was used to evaluate associations under admixture peaks. A significant mapping peak was observed on chr15q21.2-22.3. This peak included NEDD4, a gene previously implicated in a keloid genome-wide association study (GWAS) of Japanese individuals and later validated in a Chinese cohort. While we observed modest evidence for association with NEDD4, a more significant association was observed at MYO1E (myosin 1E). A genome scan excluding the 15q21-22 region also identified an association at MYO7A on 11q13.5 (rs35641839; odds ratio [OR] = 4.71, 95% confidence interval [CI] 2.38–9.32, p = 8.34 × 10−6). The identification of SNPs in two myosin genes strongly associated with keloid formation suggests that an altered cytoskeleton contributes to the enhanced migratory and invasive properties of keloid fibroblasts. Our findings support the admixture mapping approach for the study of keloid risk and point to potentially shared genetic elements on chr15q21.2-22.3 in the causation of keloids in AA, Japanese, and Chinese populations.
keloids; admixture mapping; fibrosis; genetic epidemiology; ancestry
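The MYO7A result is reported as an odds ratio with a 95% CI. Assuming the interval is a Wald interval from the logistic regression (the abstract does not state this), the standard error of the log-OR and a two-sided Wald p-value can be recovered from the published numbers as a quick consistency check. `wald_from_or_ci` is a hypothetical helper written for this illustration.

```python
import math

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate SE(log OR), the Wald z, and a two-sided p-value
    from a reported odds ratio and its 95% confidence interval.
    Assumes a symmetric Wald interval on the log-odds scale."""
    log_or = math.log(odds_ratio)
    # A 95% Wald CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided normal tail: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Reported MYO7A rs35641839 estimate: OR = 4.71, 95% CI 2.38-9.32.
se, z, p = wald_from_or_ci(4.71, 2.38, 9.32)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.2e}")
```

The recovered p-value lands close to the reported 8.34 × 10−6; the small gap is what rounding of the published OR and CI would produce.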
Genetic susceptibility to common conditions, such as essential hypertension and cardiac hypertrophy, is probably determined by various combinations of small quantitative changes in the expression of many genes. NPR1, coding for natriuretic peptide receptor A (NPRA), is a potential candidate, because NPRA mediates the natriuretic, diuretic, and vasorelaxing actions of the natriuretic peptides, and because genetically determined quantitative changes in the expression of this gene affect blood pressure and heart weight in a dose-dependent manner in mice. To determine whether there are common quantitative variants in human NPR1, we sequenced the entire human NPR1 gene in DNA from 34 unrelated individuals and identified 10 polymorphic sites in its non-coding sequence. Five of the sites are single nucleotide polymorphisms; the remaining five are length polymorphisms, including a highly variable complex dinucleotide repeat in intron 19. There are three common haplotypes 5’ to this dinucleotide repeat and three 3’ to it, but the 5’ and 3’ haplotypes appear to be randomly associated (i.e., in linkage equilibrium). Transient expression analysis in cultured cells of reporter plasmids carrying the proximal promoter sequences of NPR1 and its 3’ untranslated regions showed that these polymorphisms have functional effects. We conclude that common NPR1 alleles can alter expression of the gene as much as two-fold and could therefore significantly affect genetic risks for essential hypertension and cardiac hypertrophy in humans.
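"Randomly associated" means the 5’ and 3’ haplotypes co-occur on chromosomes at the frequencies expected from their marginal frequencies. One way to check this, assuming phase-known haplotype pairs for each chromosome (an assumption; the abstract does not describe the test used), is a Pearson chi-square test of independence on the 3 × 3 table of 5’-by-3’ haplotype counts. The counts below are hypothetical, and with only 68 chromosomes an exact or permutation test may be preferable to the chi-square approximation.

```python
import math

def chi2_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if 5' and 3' haplotypes combine at random.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_sf_even_df(x, df):
    """Chi-square upper-tail probability; this closed form holds only for even df."""
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))

# Hypothetical counts: rows = 5' haplotypes, columns = 3' haplotypes
# (68 chromosomes from 34 individuals).
counts = [[10, 8, 6],
          [9, 7, 5],
          [8, 9, 6]]
stat, df = chi2_independence(counts)
p = chi2_sf_even_df(stat, df)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```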
Chronic alcohol consumption may induce gene expression alterations in brain reward regions such as the prefrontal cortex (PFC), modulating the risk of alcohol use disorders (AUDs). Transcriptome profiles of 23 AUD cases and 23 matched controls (16 pairs of males and 7 pairs of females) in postmortem PFC were generated using Illumina’s HumanHT-12 v4 Expression BeadChip. Probe-level differentially expressed genes and gene modules in AUD subjects were identified using multiple linear regression and weighted gene co-expression network analyses. The enrichment of differentially co-expressed genes in alcohol dependence-associated genes identified by genome-wide association studies (GWAS) was examined using gene set enrichment analysis. Biological pathways overrepresented by differentially co-expressed genes were uncovered using DAVID bioinformatics resources. Three AUD-associated gene modules in males [Module 1 (561 probes mapping to 505 genes): r=0.42, Pcorrelation=0.020; Module 2 (815 probes mapping to 713 genes): r=0.41, Pcorrelation=0.020; Module 3 (1,446 probes mapping to 1,305 genes): r=−0.38, Pcorrelation=0.030] and one AUD-associated gene module in females [Module 4 (683 probes mapping to 652 genes): r=0.64, Pcorrelation=0.010] were identified. Differentially expressed genes mapped by significant expression probes (Pnominal≤0.05) clustered in Modules 1 and 2 were enriched in GWAS-identified alcohol dependence-associated genes [Module 1 (134 genes): P=0.028; Module 2 (243 genes): P=0.004]. These differentially expressed genes, including ALDH2, ALDH7A1, and ALDH9A1, are involved in cellular functions such as aldehyde detoxification, mitochondrial function, and fatty acid metabolism. Our study revealed differentially co-expressed genes in postmortem PFC of AUD subjects and demonstrated that some of these differentially co-expressed genes participate in alcohol metabolism.
Alcohol use disorders; postmortem prefrontal cortex; genome-wide gene expression; co-expression; gene set enrichment analysis; biological pathways
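Weighted gene co-expression network analysis builds a soft-thresholded adjacency from pairwise gene correlations, groups genes into modules, and summarizes each module by its eigengene (the first principal component of the module's standardized expression), which is then correlated with the trait. The numpy sketch below shows those two ingredients on simulated data; it is not the authors' pipeline, and the soft-threshold power `beta=6`, the simulated module signal, and the helper names are assumptions made for illustration.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)|^beta.
    expr is a samples x genes matrix."""
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def module_eigengene(expr_module):
    """First principal component of the standardized module expression."""
    z = (expr_module - expr_module.mean(axis=0)) / expr_module.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # Orient the eigengene to correlate positively with mean module expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

rng = np.random.default_rng(0)
status = np.repeat([1, 0], 23)  # hypothetical: 23 cases, 23 controls
# Hypothetical shared module signal, deliberately exaggerated in cases.
latent = rng.normal(size=46) + 2.0 * status
expr = latent[:, None] + rng.normal(scale=0.5, size=(46, 20))

adj = soft_threshold_adjacency(expr)
me = module_eigengene(expr)
r = np.corrcoef(me, status)[0, 1]
print(f"module-trait correlation r = {r:.2f}")
```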
Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable, and identifying genetic variants associated with platelet traits and assessing their pleiotropic effects may help clarify the role of the underlying biological pathways. We conducted an electronic medical record (EMR)-based genome-wide association study (GWAS) to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV). We characterized the associated variants using functional, pathway, and disease enrichment analyses, and assessed their pleiotropic effects by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomics (eMERGE) network had data for PLT and 6,291 participants had data for MPV. We identified 5 chromosomal regions associated with PLT and 8 associated with MPV at genome-wide significance (P < 5E-8). In addition, we replicated 20 of 56 SNPs influencing PLT (α = 0.05/56 ≈ 9E-4) and 22 of 29 SNPs influencing MPV (α = 0.05/29 ≈ 1.7E-3) that were reported in a meta-analysis of GWAS of PLT and MPV. While our GWAS did not reveal any novel associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, signal transduction proteins, and proteins involved in megakaryocyte development, platelet production, and hemostasis. PheWAS using a single-SNP Bonferroni correction for 1,368 diagnoses (0.05/1368 ≈ 3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction and with autoimmune and hematologic disorders.
We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
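The replication and PheWAS thresholds above are Bonferroni corrections: the family-wise α of 0.05 divided by the number of tests. The short sketch below reproduces them; `bonferroni_alpha` and `passes` are hypothetical helper names, and the reported thresholds are these values rounded.

```python
def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test threshold that controls the family-wise error rate at family_alpha."""
    return family_alpha / n_tests

def passes(pvalues, n_tests, family_alpha=0.05):
    """Flag which p-values survive the Bonferroni threshold."""
    alpha = bonferroni_alpha(n_tests, family_alpha)
    return [p < alpha for p in pvalues]

thresholds = {
    label: bonferroni_alpha(n)
    for label, n in [("PLT replication", 56), ("MPV replication", 29), ("PheWAS diagnoses", 1368)]
}
for label, alpha in thresholds.items():
    print(f"{label}: {alpha:.2e}")
# PLT replication: 8.93e-04
# MPV replication: 1.72e-03
# PheWAS diagnoses: 3.65e-05
```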
Congenital heart disease (CHD) is the most common congenital malformation, with evidence of a strong genetic component. We analyzed data from 223 consecutively ascertained families, each consisting of at least one child affected by a conotruncal defect (CNT) or hypoplastic left heart syndrome (HLHS) and both parents. The NimbleGen HD2-2.1 comparative genomic hybridization platform was used to identify de novo and rare inherited copy number variants (CNVs). Excluding 10 cases with 22q11.2 DiGeorge deletions, we validated de novo CNVs in 8 % of 148 probands with CNTs, 12.7 % of 71 probands with HLHS, and none of the 4 probands with both. Only 2 % of control families showed a de novo CNV. We also identified a group of ultra-rare inherited CNVs that also occurred de novo in our sample, contained a candidate gene for CHD, recurred in our sample, or were present in an affected sibling. We confirmed the contribution to CHD of copy number changes in genes such as GATA4 and NODAL and identified several genes in novel recurrent CNVs that may point to novel CHD candidate loci. We also found CNVs previously associated with highly variable phenotypes and reduced penetrance, such as dup 1q21.1, dup 16p13.11, dup 15q11.2-13, dup 22q11.2, and del 2q23.1. The presence of extra-cardiac anomalies was not related to the frequency of CNVs, and there was no significant difference in CNV frequency or specificity between the probands with CNT and those with HLHS. In agreement with other series, we identified likely causal CNVs in 5.6 % of our total sample, half of which were de novo.
Elevated intraocular pressure (IOP) is a major risk factor for glaucoma and is influenced by genetic and environmental factors. Recent genome-wide association studies (GWAS) reported associations with IOP at TMCO1 and GAS7, and with primary open-angle glaucoma (POAG) at CDKN2B-AS1, CAV1/CAV2, and SIX1/SIX6. To identify novel genetic variants and replicate the published findings, we performed GWAS and meta-analysis of IOP in >6,000 subjects of European ancestry collected in three datasets: the NEI Glaucoma Human genetics collaBORation, GLAUcoma Genes and ENvironment study, and a subset of the Age-related Macular Degeneration-Michigan, Mayo, AREDS and Pennsylvania study. While no signal achieved genome-wide significance in individual datasets, a meta-analysis identified significant associations with IOP at TMCO1 (rs7518099-G, p = 8.0 × 10−8). Focused analyses of five loci previously reported for IOP and/or POAG, i.e., TMCO1, CDKN2B-AS1, GAS7, CAV1/CAV2, and SIX1/SIX6, revealed associations with IOP that were largely consistent across our three datasets, and replicated the previously reported associations in both effect size and direction. These results confirm the involvement of common variants in multiple genomic regions in regulating IOP and/or glaucoma risk.
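The abstract does not say which meta-analysis model combined the three datasets. A common choice for GWAS of a quantitative trait such as IOP is the inverse-variance weighted fixed-effect model, sketched below under that assumption. The per-dataset effect sizes and standard errors are hypothetical, as is the helper name.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.
    Returns the pooled effect, its standard error, and a two-sided p-value."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p

# Hypothetical per-allele IOP effects (mmHg) and SEs from three datasets.
beta, se, p = fixed_effect_meta([0.30, 0.25, 0.35], [0.10, 0.12, 0.15])
print(f"pooled beta = {beta:.3f}, SE = {se:.3f}, p = {p:.2e}")
```

Pooling shrinks the standard error below that of any single dataset, which is how a signal that misses genome-wide significance in each cohort can strengthen in the combined analysis.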
CDH13 encodes T-cadherin, a receptor for high-molecular-weight (HMW) adiponectin and low-density lipoprotein that promotes the proliferation and migration of endothelial cells. Genome-wide association studies have mapped multiple variants in CDH13 associated with cardiometabolic traits (CMTs), with variable effects across studies. We hypothesized that this heterogeneity might reflect interplay with DNA methylation within the region. Resequencing and the EpiTYPER™ assay were applied to the HYPertension in ESTonia/Coronary Artery Disease in Czech (HYPEST/CADCZ; n = 358) samples to identify CDH13 promoter SNPs acting as methylation quantitative trait loci (meQTLs) and to investigate their associations with CMTs. In silico data were extracted from genome-wide DNA methylation and genotype datasets of the population-based sample of the Estonian Genome Center of the University of Tartu (EGCUT; n = 165). The HYPEST–CADCZ meta-analysis identified a rare variant, rs113460564, as a highly significant meQTL for a CpG site 134 bp away (P = 5.90 × 10−6; β = 3.19 %). Four common SNPs (rs12443878, rs12444338, rs62040565, rs8060301) affected the methylation level of up to 3 neighboring CpG sites in both datasets. The strongest association was detected in EGCUT between rs8060301 and cg09415485 (false discovery rate-corrected P = 1.89 × 10−30). In addition, rs8060301 showed association with diastolic blood pressure, serum high-density lipoprotein, and HMW adiponectin (P < 0.005). Novel strong associations were identified between rare CDH13 promoter meQTLs (minor allele frequency <5 %) and HMW adiponectin: rs2239857 (P = 5.50 × 10−5, β = −1,841.9 ng/mL) and rs77068073 (P = 2.67 × 10−4, β = −2,484.4 ng/mL). Our study shows that the CDH13 promoter harbors meQTLs associated with CMTs, paving the way to a deeper understanding of the interplay between DNA variation and methylation in susceptibility to common diseases.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-014-1521-6) contains supplementary material, which is available to authorized users.
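Testing every promoter SNP against every nearby CpG site produces many p-values, so the EGCUT result is reported as a false discovery rate-corrected P value. The abstract does not name the procedure; the sketch below assumes the widely used Benjamini–Hochberg method and returns adjusted p-values in the input order. `benjamini_hochberg` is a hypothetical helper, and the example p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical SNP-CpG test p-values.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(q)
```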
Diseases related to smoking are the second leading cause of death in the world. Cigarette smoking is a risk factor for several diseases, such as cancer and cardiovascular and respiratory disorders. Despite increasing evidence of genetic determination, the susceptibility genes and loci underlying various aspects of smoking behavior are largely unknown. Moreover, almost all reported genome-wide association studies (GWASs) have been performed on samples of European origin, limiting the applicability of the results to other ethnic populations. In this first GWAS on smoking behavior in an Asian population, after analyzing 8,842 DNA samples from the Korea Association Resource project with 352,228 single nucleotide polymorphisms (SNPs) genotyped for each sample, we identified 8 SNPs significantly associated with smoking initiation (SI) and 4 with nicotine dependence (ND). Because no independent Asian smoking sample is currently available, we replicated the discoveries in independent samples of European-American and African-American origin. Of the 12 SNPs examined in the replication samples, two SNPs in the regulator of G-protein signaling 17 gene (RGS17; rs7747583, meta-analysis p = 6.40 × 10−6; rs2349433, meta-analysis p = 5.57 × 10−6) were associated with SI. We also found two SNPs significantly associated with ND: one in FERM domain containing 4A (FRMD4A; rs4424567, meta-analysis p = 2.30 × 10−6) and the other at 7q31.1 (rs848353, meta-analysis p = 9.16 × 10−8). These SNPs represent novel targets for examination of smoking behavior and warrant further investigation using independent samples.
Uterine fibroids (UFs) affect 77% of women by menopause and account for $9.4 billion in yearly healthcare costs. We recently replicated findings from the first UF genome-wide association study (GWAS), conducted in a Japanese population. Here we tested these GWAS-discovered SNPs for association with UF characteristics to assess whether risk varies by UF sub-phenotype. Women were enrolled in Right from the Start (RFTS) and the BioVU DNA Repository (BioVU). UF status was determined by pelvic imaging. We tested the top GWAS-associated SNPs for association with UF characteristics (RFTS: type, number, volume; BioVU: type) using covariate-adjusted logistic and linear regression, and combined the association results for UF type using meta-analysis. In total, 456 European American (EA) cases and 1,549 controls were examined. Trinucleotide repeat containing 6B (TNRC6B) rs12484776 was associated with volume in RFTS (Beta = 0.40, 95% CI 0.05 to 0.75, p = 0.024). RFTS analyses stratified by volume quartile showed the strongest OR at rs12484776 for the largest volume (16.6 to 179.1 cc; odds ratio [OR] = 2.19, 95% confidence interval [CI] 1.07 to 4.46, p = 0.031). Meta-analysis showed a strong association at blocked early in transport 1 homolog (BET1L) rs2280543 for intramural UFs (meta-OR = 0.51, standard error [SE] = 0.14, Q = 0.590, I² = 0, p = 2.48×10−6), which is stronger than the overall association with UF risk. This study is the first to evaluate these SNPs for association with UF characteristics and suggests that these genes are associated with increasing UF volume and with protection from intramural UFs in EAs.
Uterine leiomyoma; fibroids; genetic epidemiology; polymorphism; women's health
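The BET1L meta-analysis reports heterogeneity statistics (Q and I²). For an inverse-variance fixed-effect meta-analysis, Cochran's Q is the weighted sum of squared deviations of study estimates from the pooled estimate, and I² = max(0, (Q − df)/Q), so I² is 0 whenever Q does not exceed its degrees of freedom. The sketch below computes both on the log-OR scale, with a Q p-value for the two-study (df = 1) case. The helper names and the example log-ORs and SEs are hypothetical, not the RFTS or BioVU estimates.

```python
import math

def heterogeneity(betas, ses):
    """Cochran's Q, its degrees of freedom, and I^2 (truncated at zero)."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * b for wi, b in zip(w, betas)) / sum(w)
    q = sum(wi * (b - pooled) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2

def q_pvalue_df1(q):
    """Upper tail of a chi-square with 1 df: P(Z^2 > q) = erfc(sqrt(q / 2))."""
    return math.erfc(math.sqrt(q / 2))

# Hypothetical log-ORs and SEs from two cohorts.
q, df, i2 = heterogeneity([math.log(0.50), math.log(0.55)], [0.20, 0.25])
print(f"Q = {q:.3f}, df = {df}, p(Q) = {q_pvalue_df1(q):.3f}, I^2 = {i2:.2f}")
```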
Branchio-oto-renal (BOR) syndrome is an autosomal dominant disorder characterized by branchial arch anomalies, hearing loss, and renal dysmorphology. Although haploinsufficiency of EYA1 and SIX1 is known to cause BOR, copy number variation analysis has been performed on only a limited number of BOR patients. In this study, we used high-resolution array-based comparative genomic hybridization (aCGH) on 32 BOR probands negative for coding-sequence and splice-site mutations in known BOR-causing genes to identify potential disease-causing genomic rearrangements. Of the >1,000 rare and novel copy number variants (CNVs) we identified, four were heterozygous deletions of EYA1 and several downstream genes that had nearly identical breakpoints associated with retroviral sequence blocks, suggesting that non-allelic homologous recombination seeded by this recombination hotspot is important in the pathogenesis of BOR. A different heterozygous deletion removing the last exon of EYA1 was identified in an additional proband. Thus, in total, 5 probands (14%) had deletions of all or part of EYA1. A novel disease-gene prioritization strategy that includes network analysis of genes affected by the other deletions suggests that SHARPIN (Sipl1), FGF3, and the HOXA gene cluster may contribute to the pathogenesis of BOR.
array CGH; EYA1; birth defects; Branchio-oto-renal syndrome; copy number variation
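On an aCGH array, a heterozygous deletion lowers the test/reference signal to about half, a theoretical log2 ratio of log2(1/2) = −1, although observed ratios are usually attenuated. Production CNV callers use segmentation algorithms; the sketch below is a deliberately simplified stand-in that flags runs of consecutive probes below a log2-ratio threshold. The function name, the −0.5 threshold, the three-probe minimum, and the probe data are all hypothetical.

```python
def call_deletions(positions, log2_ratios, threshold=-0.5, min_probes=3):
    """Return (start, end, n_probes) for runs of consecutive probes whose
    log2 ratio is below threshold. Threshold and min_probes are illustrative."""
    calls = []
    run_start = None
    for idx, ratio in enumerate(log2_ratios):
        if ratio < threshold:
            if run_start is None:
                run_start = idx  # open a new candidate run
        else:
            if run_start is not None and idx - run_start >= min_probes:
                calls.append((positions[run_start], positions[idx - 1], idx - run_start))
            run_start = None
    # Close a run that extends through the last probe.
    if run_start is not None and len(log2_ratios) - run_start >= min_probes:
        calls.append((positions[run_start], positions[-1], len(log2_ratios) - run_start))
    return calls

# Hypothetical probe positions (bp) and log2 ratios along one region.
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
ratios = [0.05, -0.1, -0.9, -1.1, -0.8, -1.0, 0.1, 0.0, -0.7, 0.02]
print(call_deletions(positions, ratios))  # [(300, 600, 4)]
```

The isolated low probe at 900 bp is not called because it falls short of the minimum run length, which is the usual trade-off between sensitivity and false positives from single noisy probes.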
We performed a gene–smoking interaction analysis using families from an early-onset coronary artery disease cohort (GENECARD). The analysis focused on validating and expanding results from previous studies implicating single nucleotide polymorphisms (SNPs) on chromosome 3 in smoking-mediated coronary artery disease. We analyzed 430 SNPs on chromosome 3 and identified 16 SNPs that showed a gene–smoking interaction at P < 0.05 using association in the presence of linkage with ordered subset analysis (APL-OSA), a method that uses permutations of the data to empirically estimate the strength of the association signal. Seven of the 16 SNPs were in the Rho-GTPase pathway, a 1.87-fold enrichment for this pathway. A meta-analysis of gene–smoking interactions in three independent studies showed that rs9289231 in KALRN had a Fisher's combined P value of 0.0017 for the interaction with smoking. In a gene-based meta-analysis, KALRN had a P value of 0.026. Finally, a pathway-based analysis of the association results using WebGestalt revealed several enriched pathways, including the regulation of actin cytoskeleton pathway as defined by the Kyoto Encyclopedia of Genes and Genomes (KEGG).
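Fisher's method combines k independent p-values through X = −2 Σ ln pᵢ, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. Because the degrees of freedom are always even, the upper tail has a closed form and needs only the standard library. The per-study p-values for rs9289231 are not given in the abstract, so the example inputs below are hypothetical, as is the helper name.

```python
import math

def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square(2k) under H0.
    Assumes the k studies are independent."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    # Closed-form chi-square survival function for even df = 2k.
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

# Hypothetical interaction p-values from three independent studies.
print(f"{fisher_combined_p([0.02, 0.04, 0.10]):.4f}")
```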
It is commonly acknowledged that estimates of heritability from classical twin studies have many potential shortcomings. Despite this, in the post-GWAS era, these heritability estimates have come to be a continual source of interest and controversy. While the heritability estimates of a quantitative trait are subject to a number of biases, in this article we will argue that the standard statistical approach to estimating the heritability of a binary trait relies on some additional untestable assumptions which, if violated, can lead to badly biased estimates. The ACE liability threshold model assumes at its heart that each individual has an underlying liability or propensity to acquire the binary trait (e.g., disease), and that this unobservable liability is multivariate normally distributed. We investigated a number of different scenarios violating this assumption such as the existence of a single causal diallelic gene and the existence of a dichotomous exposure. For each scenario, we found that substantial asymptotic biases can occur, which no increase in sample size can remove. Asymptotic biases as much as four times larger than the true value were observed, and numerous cases also showed large negative biases. Additionally, regions of low bias occurred for specific parameter combinations. Using simulations, we also investigated the situation where all of the assumptions of the ACE liability model are met. We found that commonly used sample sizes can lead to biased heritability estimates. Thus, even if we are willing to accept the meaningfulness of the liability construct, heritability estimates under the ACE liability threshold model may not accurately reflect the heritability of this construct. The points made in this paper should be kept in mind when considering the meaningfulness of a reported heritability estimate for any specific disease.
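Under the ACE model, the twin correlations in liability satisfy r_MZ = a² + c² and r_DZ = a²/2 + c², with e² = 1 − r_MZ. For a binary trait these are the tetrachoric correlations implied by the assumed bivariate normal liability, which is exactly the assumption questioned above. The sketch solves the equations from given correlations (the Falconer-style moment solution); it does not estimate the tetrachoric correlations or perform the maximum-likelihood fitting used in practice. The helper name and input correlations are hypothetical.

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Solve r_MZ = a2 + c2 and r_DZ = a2/2 + c2 for the ACE variance components."""
    a2 = 2.0 * (r_mz - r_dz)  # additive genetic variance (heritability)
    c2 = 2.0 * r_dz - r_mz    # shared environment
    e2 = 1.0 - r_mz           # unique environment
    return a2, c2, e2

# Hypothetical liability correlations for MZ and DZ twins.
a2, c2, e2 = ace_from_twin_correlations(0.8, 0.5)
print(f"a2 = {a2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

Any bias in the tetrachoric correlations propagates directly, and doubled, into a², which is one reason violations of the normality assumption can distort heritability estimates so strongly.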
Genome-wide association studies (GWAS) have identified many variants that influence high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, and/or triglycerides. However, environmental modifiers of these known genotype–phenotype associations, such as smoking, have only recently begun to emerge in the literature. We tested for interactions between smoking and 49 GWAS-identified variants in over 41,000 racially/ethnically diverse samples with lipid measurements from the Population Architecture Using Genomics and Epidemiology (PAGE) study. Despite the biological plausibility of such interactions, we were unable to detect significant SNP × smoking interactions.
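A SNP × smoking interaction is typically tested by adding a product term to the regression of lipid level on genotype, as in lipid ~ dosage + smoker + dosage × smoker, and testing the product coefficient. The abstract does not give the PAGE model or its covariates, so the numpy sketch below shows only the basic ordinary least squares version on simulated data with no true interaction; the helper name, effect sizes, and allele and smoking frequencies are hypothetical.

```python
import numpy as np

def interaction_test(lipid, dosage, smoker):
    """OLS fit of lipid ~ 1 + dosage + smoker + dosage*smoker.
    Returns the interaction coefficient, its SE, and its t statistic."""
    n = len(lipid)
    X = np.column_stack([np.ones(n), dosage, smoker, dosage * smoker])
    coef, _, _, _ = np.linalg.lstsq(X, lipid, rcond=None)
    resid = lipid - X @ coef
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the OLS estimates
    se = np.sqrt(np.diag(cov))
    return coef[3], se[3], coef[3] / se[3]

rng = np.random.default_rng(1)
n = 2000
dosage = rng.binomial(2, 0.3, size=n).astype(float)  # additive allele count
smoker = rng.binomial(1, 0.25, size=n).astype(float)
# Hypothetical HDL-C model: SNP and smoking main effects, no true interaction.
hdl = 50 + 1.5 * dosage - 4.0 * smoker + rng.normal(scale=12, size=n)
beta_int, se_int, t_int = interaction_test(hdl, dosage, smoker)
print(f"interaction beta = {beta_int:.2f}, SE = {se_int:.2f}, t = {t_int:.2f}")
```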
Two syndromic cognitive impairment disorders have very similar craniofacial dysmorphisms. One is caused by mutations in SATB2, a transcription regulator, and the other by heterozygous mutations leading to premature stop codons in UPF3B, which encodes a member of the nonsense-mediated mRNA decay complex. Here we demonstrate that the products of these two causative genes function in the same pathway. We show that the SATB2 nonsense mutation in our patient leads to a truncated protein that localizes to the nucleus, forms a dimer with wild-type SATB2, and interferes with its normal activity. This suggests that the SATB2 nonsense mutation has a dominant negative effect. The patient’s leukocytes had significantly decreased UPF3B mRNA compared to controls. This effect was replicated both in vitro, where siRNA knockdown of SATB2 in HEK293 cells resulted in decreased UPF3B expression, and in vivo, where embryonic tissue of Satb2 knock-out mice showed significantly decreased Upf3b expression. Furthermore, chromatin immunoprecipitation demonstrated that SATB2 binds to the UPF3B promoter, and a luciferase reporter assay confirmed that SATB2 expression significantly activates transcription from the UPF3B promoter. These findings indicate that SATB2 acts as an activator of UPF3B expression through binding to its promoter. This study emphasizes the value of recognizing disorders with similar clinical phenotypes to explore underlying mechanisms of genetic interaction.
Oral-facial-digital type VI syndrome (OFDVI) is a rare phenotype of Joubert syndrome (JS). Recently, C5orf42 was suggested as the major OFDVI gene, being mutated in 9 of 11 families (82 %). We sequenced C5orf42 in 313 JS probands and identified mutations in 28 (8.9 %), most with a phenotype of pure JS. Only 2 of 17 OFDVI patients (11.8 %) carried mutations. A comparison of mutated vs. non-mutated OFDVI patients showed that preaxial and mesoaxial polydactyly, hypothalamic hamartoma, and other congenital defects may predict C5orf42 mutations, while tongue hamartomas are more common in mutation-negative patients.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-014-1508-3) contains supplementary material, which is available to authorized users.
Mutation Position Imaging Toolbox (MuPIT) Interactive is a browser-based application for single nucleotide variants (SNVs), which automatically maps the genomic coordinates of SNVs onto the coordinates of available three-dimensional protein structures. The application is designed for interactive browser-based visualization of the putative functional relevance of SNVs by biologists who are not necessarily experts either in bioinformatics or protein structure. Users may submit batches of several thousand SNVs and review all protein structures that cover the SNVs, including available functional annotations such as binding sites, mutagenesis experiments, and common polymorphisms. Multiple SNVs may be mapped onto each structure, enabling 3D visualization of SNV clusters and their relationship to functionally annotated positions. We illustrate the utility of MuPIT Interactive in rationalizing the impact of selected polymorphisms in the PharmGKB database, somatic mutations identified in the Cancer Genome Atlas study of invasive breast carcinomas, and rare variants identified in the Exome Sequencing Project. MuPIT Interactive is freely available for non-profit use at http://mupit.icm.jhu.edu.
mutation; protein structure; single nucleotide variant; visualization
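The abstract does not describe how MuPIT maps genomic coordinates onto structures. As a hedged illustration of the first step any such mapping needs, the sketch converts a 1-based genomic position into a residue number and codon position using the coding-exon intervals of a transcript, handling both strands. The function name and exon coordinates are hypothetical, and a real pipeline must still align these residue numbers to each structure's own numbering, which is not shown.

```python
def genomic_to_residue(pos, exons, strand="+"):
    """Map a 1-based genomic position to (residue number, codon position).
    exons: 1-based inclusive (start, end) coding intervals of one transcript.
    Returns None when pos lies outside the coding sequence."""
    # Walk exons in transcript order: ascending on +, descending on -.
    ordered = sorted(exons) if strand == "+" else sorted(exons, reverse=True)
    offset = 0  # coding bases in exons preceding the current one
    for start, end in ordered:
        if start <= pos <= end:
            within = pos - start if strand == "+" else end - pos
            cds_index = offset + within  # 0-based position in the CDS
            return cds_index // 3 + 1, cds_index % 3 + 1
        offset += end - start + 1
    return None

# Hypothetical two-exon coding sequence.
exons = [(100, 109), (200, 220)]
print(genomic_to_residue(200, exons))       # (4, 2) on the plus strand
print(genomic_to_residue(200, exons, "-"))  # (7, 3) on the minus strand
```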
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses and would likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango’s statistic, to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios.
genetic association; kernel distance; SKAT statistic
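Tango's index measures clustering as C = (r − p)ᵀ A (r − p), where r and p are the observed and expected distributions over locations and A is a distance-decay matrix. The sketch below is a simplified illustration applied to rare-allele positions, not the authors' Kernel Distance implementation: r is the distribution of case rare alleles, p is the control distribution, and A_ij = exp(−|x_i − x_j|/λ). It replaces the scaled chi-square approximation with an allele-level label permutation. The bandwidth λ, the permutation scheme, the helper names, and the counts are all assumptions.

```python
import numpy as np

def tango_index(positions, case_counts, control_counts, lam=50.0):
    """Simplified Tango-style index C = (r - p)^T A (r - p) over variant positions."""
    x = np.asarray(positions, dtype=float)
    r = np.asarray(case_counts, dtype=float)
    p = np.asarray(control_counts, dtype=float)
    r, p = r / r.sum(), p / p.sum()  # case and control allele distributions
    A = np.exp(-np.abs(x[:, None] - x[None, :]) / lam)  # distance-decay kernel
    d = r - p
    return float(d @ A @ d)

def permutation_pvalue(positions, case_counts, control_counts, lam=50.0, n_perm=999, seed=0):
    """Shuffle case/control labels across pooled rare alleles (an allele-level
    simplification) and compare permuted indices with the observed one."""
    rng = np.random.default_rng(seed)
    case_counts = np.asarray(case_counts, dtype=int)
    control_counts = np.asarray(control_counts, dtype=int)
    k = len(positions)
    observed = tango_index(positions, case_counts, control_counts, lam)
    site = np.repeat(np.arange(k), case_counts + control_counts)  # one entry per allele
    n_case = case_counts.sum()
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(site)
        perm_case = np.bincount(perm[:n_case], minlength=k)
        perm_ctrl = np.bincount(perm[n_case:], minlength=k)
        if tango_index(positions, perm_case, perm_ctrl, lam) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: case rare alleles cluster in the first three sites.
positions = list(range(0, 1000, 100))
cases = [8, 7, 9, 0, 0, 0, 0, 0, 0, 1]
controls = [2] * 10
observed = tango_index(positions, cases, controls)
p_value = permutation_pvalue(positions, cases, controls)
print(f"C = {observed:.4f}, permutation p = {p_value:.3f}")
```

Because the exponential kernel is positive definite in one dimension, C is never negative and equals zero only when cases and controls have identical allele distributions.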