genetic association study; disease genetics; immunogenetics; liver
Analysis of the ImmunoChip single nucleotide polymorphism (SNP) array in 2816 individuals with the most common subtypes (oligoarticular and RF-negative polyarticular) of juvenile idiopathic arthritis (JIA) and in 13056 controls strengthens the evidence for association with three known JIA-risk loci (HLA, PTPN22 and PTPN2) and has identified fourteen risk loci reaching genome-wide significance (p < 5 × 10−8) for the first time. Eleven additional novel regions showed suggestive evidence for association with JIA (p < 1 × 10−6). Dense-mapping of loci along with bioinformatic analysis has refined the association to one gene for eight regions, highlighting crucial pathways, including the IL-2 pathway, in JIA disease pathogenesis. The entire ImmunoChip loci, HLA region and the top 27 loci (p < 1 × 10−6) explain an estimated 18%, 13% and 6% of the risk of JIA, respectively. Analysis of the ImmunoChip dataset, the largest cohort of JIA cases investigated to date, provides new insight into the genetic basis of this childhood autoimmune disease.
Prior studies have identified recurrent oncogenic mutations in colorectal adenocarcinoma1 and have surveyed exons of protein-coding genes for mutations in 11 affected individuals2,3. Here we report whole-genome sequencing from nine individuals with colorectal cancer, including primary colorectal tumors and matched adjacent non-tumor tissues, at an average of 30.7× and 31.9× coverage, respectively. We identify an average of 75 somatic rearrangements per tumor, including complex networks of translocations between pairs of chromosomes. Eleven rearrangements encode predicted in-frame fusion proteins, including a fusion of VTI1A and TCF7L2 found in 3 out of 97 colorectal cancers. Although TCF7L2 encodes TCF4, which cooperates with β-catenin4 in colorectal carcinogenesis5,6, the fusion lacks the TCF4 β-catenin–binding domain. We found a colorectal carcinoma cell line harboring the fusion gene to be dependent on VTI1A-TCF7L2 for anchorage-independent growth using RNA interference-mediated knockdown. This study shows previously unidentified levels of genomic rearrangements in colorectal carcinoma that can lead to essential gene fusions and other oncogenic events.
We demonstrate that the binding sites for highly conserved transcription factors vary extensively between human and mouse. We mapped the binding of four tissue-specific transcription factors (FOXA2, HNF1A, HNF4A, HNF6) to 4,000 orthologous gene pairs in hepatocytes purified from human and mouse livers. Despite the conserved function of these factors, from 41% to 89% of their binding events appear to be species-specific. When the same protein binds the promoters of orthologous genes, approximately two-thirds of the binding sites do not align.
We performed a genome-wide association study in non-Hispanic white subjects with fibrotic idiopathic interstitial pneumonias (N=1616) and controls (N=4683); replication was assessed in 876 cases and 1890 controls. We confirmed association with TERT and MUC5B on chromosomes 5p15 and 11p15, respectively, the chromosome 3q26 region near TERC, and identified 7 novel loci (PMeta = 2.4×10−8 to PMeta = 1.1×10−19). The novel loci include FAM13A (4q22), DSP (6p24), OBFC1 (10q24), ATP11A (13q34), DPP9 (19p13), and chromosomal regions 7q22 and 15q14-15. Our results demonstrate that genes involved in host defense, cell-cell adhesion, and DNA repair contribute to the risk of fibrotic IIP.
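A combined (meta-analysis) P value of the kind reported above can be illustrated with a fixed-effect, inverse-variance-weighted combination of per-stage effect estimates. This is a minimal sketch under assumptions: the abstract does not state which meta-analysis method was used, and the log odds ratios and standard errors below are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios).
    ses:   matching standard errors.
    Returns (combined_beta, combined_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value from the standard normal distribution.
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, p

# Hypothetical discovery and replication estimates for one SNP.
meta_beta, meta_se, meta_p = fixed_effect_meta([0.2, 0.2], [0.05, 0.05])
```

With two equally precise studies, the combined standard error shrinks by a factor of √2, which is why a signal that is only suggestive in each stage can reach genome-wide significance after combination.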
The commonest pediatric brain tumors are low-grade gliomas (LGGs). We utilized whole genome sequencing to discover multiple novel genetic alterations involving BRAF, RAF1, FGFR1, MYB, MYBL1 and genes with histone-related functions, including H3F3A and ATRX, in 39 LGGs and low-grade glioneuronal tumors (LGGNTs). Only a single non-silent somatic alteration was detected in 24/39 (62%) tumors. Intragenic duplications of the FGFR1 tyrosine kinase domain (TKD) and rearrangements of MYB were recurrent and mutually exclusive in 53% of grade II diffuse LGGs. Transplantation of Trp53-null neonatal astrocytes containing TKD-duplicated FGFR1 into brains of nude mice generated high-grade astrocytomas with short latency and 100% penetrance. TKD-duplicated FGFR1 induced FGFR1 autophosphorylation and upregulation of the MAPK/ERK and PI3K pathways, which could be blocked by specific inhibitors. Focusing on the therapeutically challenging diffuse LGGs, our study of 151 tumors has discovered genetic alterations and potential therapeutic targets across the entire range of pediatric LGGs/LGGNTs.
Chronic infection with hepatitis C virus (HCV) is a common cause of liver cirrhosis and cancer. We performed RNA-sequencing in primary human hepatocytes activated with synthetic dsRNA to mimic HCV infection. Upstream of IFNL3 (IL28B) on chromosome 19q13.13, we discovered a novel, transiently induced region that harbors dinucleotide variant ss469415590 (TT/ΔG), which is in high linkage disequilibrium with rs12979860, a genetic marker strongly associated with HCV clearance. ss469415590-ΔG is a frame-shift variant that creates a novel primate-specific gene, designated interferon lambda 4 (IFNL4), which encodes a protein of moderate similarity with IFNL3. Compared to rs12979860, ss469415590 is more strongly associated with HCV clearance in individuals of African ancestry, whereas it provides comparable information in Europeans and Asians. Transient over-expression of IFNL4 in a hepatoma cell line induced STAT1/STAT2 phosphorylation and expression of interferon-stimulated genes. Our findings provide new insights into the genetic regulation of HCV clearance and its clinical management.
To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves’ disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves’ disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies.
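The credible-set construction described above can be sketched as follows. This is a minimal illustration, assuming exactly one causal SNP per region, an equal prior probability for every SNP, and hypothetical per-SNP log Bayes factors; the SNP names and values are invented for the example and do not come from the study.

```python
import math

def credible_set(log_bfs, coverage=0.95):
    """Smallest set of SNPs whose posterior probabilities sum to >= coverage.

    log_bfs: dict mapping SNP id -> natural-log Bayes factor for association.
    Assumes a single causal SNP in the region and a flat prior, so each
    posterior is that SNP's Bayes factor divided by the sum over all SNPs.
    Returns (list_of_snps_in_set, dict_of_posteriors).
    """
    # Subtract the maximum before exponentiating to avoid overflow.
    max_log = max(log_bfs.values())
    weights = {snp: math.exp(lbf - max_log) for snp, lbf in log_bfs.items()}
    total = sum(weights.values())
    posteriors = {snp: w / total for snp, w in weights.items()}

    # Add SNPs in order of decreasing posterior until coverage is reached.
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    chosen, cumulative = [], 0.0
    for snp in ranked:
        chosen.append(snp)
        cumulative += posteriors[snp]
        if cumulative >= coverage:
            break
    return chosen, posteriors

# Hypothetical log Bayes factors for four SNPs in one region.
example_log_bfs = {"snp_a": 12.0, "snp_b": 9.0, "snp_c": 8.5, "snp_d": 2.0}
snps_in_set, posteriors = credible_set(example_log_bfs)
```

In this toy region the top SNP carries about 93% of the posterior probability, so a second SNP is needed to reach 95%; the remaining SNPs are excluded as unlikely to be causal under the stated assumptions.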
Uveal melanoma is the most common primary cancer of the eye and often results in fatal metastasis. Here, we describe mutations occurring exclusively at arginine-625 in splicing factor 3B subunit 1 (SF3B1) in low-grade uveal melanomas with good prognosis. Thus, uveal melanoma is among a small group of cancers associated with SF3B1 mutation, and these mutations denote a distinct molecular subset of uveal melanomas.
The molecular pathogenesis of renal cell carcinoma (RCC) is poorly understood. Whole-genome and exome sequencing followed by innovative tumorgraft analyses (to accurately determine mutant allele ratios) identified several putative two-hit tumor suppressor genes including BAP1. BAP1, a nuclear deubiquitinase, is inactivated in 15% of clear-cell RCCs. BAP1 cofractionates with and binds to HCF-1 in tumorgrafts. Mutations disrupting the HCF-1 binding motif impair BAP1-mediated suppression of cell proliferation, but not H2AK119ub1 deubiquitination. BAP1 loss sensitizes RCC cells in vitro to genotoxic stress. Interestingly, BAP1 and PBRM1 mutations anticorrelate in tumors (P=3×10−5), and combined loss of BAP1 and PBRM1 in a few RCCs was associated with rhabdoid features (q=0.0007). BAP1 and PBRM1 regulate seemingly different gene expression programs, and BAP1 loss was associated with high tumor grade (q=0.0005). Our results establish the foundation for an integrated pathological and molecular genetic classification of RCC, paving the way for subtype-specific treatments exploiting genetic vulnerabilities.
Recent studies indicate that a subclass of APOBEC cytidine deaminases, which convert cytosine to uracil during RNA editing and retrovirus or retrotransposon restriction, may induce mutation clusters in human tumors. We show here that throughout cancer genomes APOBEC mutagenesis is pervasive and correlates with APOBEC mRNA levels. Mutation clusters in whole-genome and exome datasets conformed to stringent criteria indicative of an APOBEC mutation pattern. Applying these criteria to 954,247 mutations in 2,680 exomes of 14 cancer types, mostly from TCGA, revealed significant presence of the APOBEC mutation pattern in bladder, cervical, breast, head and neck and lung cancers, reaching 68% of all mutations in some samples. Within breast cancer, the HER2E subtype was clearly enriched with tumors displaying the APOBEC mutation pattern, suggesting this type of mutagenesis is functionally linked with cancer development. The APOBEC mutation pattern also extended to cancer-associated genes, implying that ubiquitous APOBEC mutagenesis is carcinogenic.
The mammalian placenta is remarkably distinct between species, suggesting a history of rapid evolutionary diversification1. To gain insight into the molecular drivers of placental evolution, we compared biochemically predicted enhancers between mouse and rat trophoblast stem cells (TSCs) and find that species-specific enhancers are highly enriched for endogenous retroviruses (ERVs) on a genome-wide level. One of these ERV families, RLTR13D5, contributes hundreds of mouse-specific H3K4me1/H3K27ac-defined enhancers that functionally bind Cdx2, Eomes, and Elf5 - core factors that define the TSC regulatory network. Furthermore, we demonstrate that RLTR13D5 is capable of driving gene expression in rat placental cells. Comparison with other tissues revealed that species-specific ERV enhancer activity is generally restricted to hypomethylated tissues, suggesting that tissues permissive to ERV activity gain access to an otherwise silenced source of regulatory variation. Overall, our results implicate ERV enhancer cooption as a mechanism underlying the striking evolutionary diversification of placental development.
We conducted a genome-wide association study of gastric cancer (GC) and esophageal squamous cell carcinoma (ESCC) in ethnic Chinese subjects in which we genotyped 551,152 single nucleotide polymorphisms (SNPs). We report a combined analysis of 2,240 GC cases, 2,115 ESCC cases, and 3,302 controls drawn from five studies. In logistic regression models adjusted for age, sex, and study, multiple variants at 10q23 had genome-wide significance for GC and ESCC independently. A notable signal was rs2274223, a nonsynonymous SNP located in PLCE1, for GC (P=8.40×10−10; per-allele odds ratio (OR) = 1.31) and ESCC (P=3.85×10−9; OR = 1.34). The association with GC differed by anatomic subsite. For tumors located in the cardia the association was stronger (P=4.19×10−15; OR = 1.57) and for those located in the noncardia stomach it was absent (P=0.44; OR = 1.05). Our findings at 10q23 could provide insight into the high incidence rates of both cancers in China.
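To make the per-allele odds ratio concrete, the sketch below computes an unadjusted allelic OR and a Woolf 95% confidence interval from a 2×2 table of allele counts. It is a simplification under stated assumptions: the study itself used logistic regression adjusted for age, sex, and study, and the allele counts here are hypothetical, not taken from the paper.

```python
import math

def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Unadjusted allelic odds ratio with a Woolf 95% confidence interval.

    Arguments are allele counts (two alleles per person), not genotype counts.
    Returns (odds_ratio, lower_95, upper_95).
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts: 300/700 in cases, 250/750 in controls.
or_estimate, ci_low, ci_high = allelic_odds_ratio(300, 700, 250, 750)
```

An OR above 1 means the counted allele is more frequent among cases; a covariate-adjusted logistic regression would generally give a somewhat different estimate from this crude table.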
We report the results of an association study of melanoma based on the genome-wide imputation of the genotypes of 1,353 cases and 3,566 controls of European origin conducted by the GenoMEL consortium. This revealed a novel association between several single nucleotide polymorphisms (SNPs) in intron 8 of the FTO gene, including rs16953002, which replicated using 12,313 cases and 55,667 controls of European ancestry from Europe, the USA and Australia (combined p=3.6×10−12, per-allele OR for A=1.16). As well as identifying a novel melanoma susceptibility locus, this is the first study to identify and replicate an association with SNPs in FTO not related to body mass index (BMI). These SNPs are not in intron 1 (the BMI-related region) and show no association with BMI. This suggests FTO’s function may be broader than the existing paradigm that FTO variants influence multiple traits only through their associations with BMI and obesity.
TERT-locus single nucleotide polymorphisms (SNPs) and leucocyte telomere measures are reportedly associated with risks of multiple cancers. Using the iCOGs chip, we analysed ~480 TERT-locus SNPs in breast (n=103,991), ovarian (n=39,774) and BRCA1 mutation carrier (11,705) cancer cases and controls. 53,724 participants have leucocyte telomere measures. Most associations cluster into three independent peaks. Peak 1 SNP rs2736108 minor allele associates with longer telomeres (P=5.8×10−7), reduced estrogen receptor negative (ER-negative) (P=1.0×10−8) and BRCA1 mutation carrier (P=1.1×10−5) breast cancer risks, and altered promoter-assay signal. Peak 2 SNP rs7705526 minor allele associates with longer telomeres (P=2.3×10−14), increased low malignant potential ovarian cancer risk (P=1.3×10−15) and increased promoter activity. Peak 3 SNPs rs10069690 and rs2242652 minor alleles increase ER-negative (P=1.2×10−12) and BRCA1 mutation carrier (P=1.6×10−14) breast and invasive ovarian (P=1.3×10−11) cancer risks, but not via altered telomere length. The cancer-risk alleles of rs2242652 and rs10069690 respectively increase silencing and generate a truncated TERT splice-variant.
Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces by utilizing the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning four million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified eight large novel inter-chromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed in RNA and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.
Genome-wide association studies (GWAS) have identified four susceptibility loci for epithelial ovarian cancer (EOC) with another two loci being close to genome-wide significance. We pooled data from a GWAS conducted in North America with another GWAS from the United Kingdom. We selected the top 24,551 SNPs for inclusion on the iCOGS custom genotyping array. Follow-up genotyping was carried out in 18,174 cases and 26,134 controls from 43 studies from the Ovarian Cancer Association Consortium. We validated the two loci at 3q25 and 17q21 previously near genome-wide significance and identified three novel loci associated with risk; two loci associated with all EOC subtypes, at 8q21 (rs11782652; P=5.5×10−9) and 10p12 (rs1243180; P=1.8×10−8), and another locus specific to the serous subtype at 17q12 (rs757210; P=8.1×10−10). An integrated molecular analysis of genes and regulatory regions at these loci provided evidence for functional mechanisms underlying susceptibility that implicates CHMP4C in the pathogenesis of ovarian cancer.
Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms.
We report a new model to project the predictive performance of polygenic models based on the number and distribution of effect sizes for the underlying susceptibility alleles and the size of the training dataset. Using estimates of effect-size distribution and heritability derived from current studies, we project that while 45% of the variance of height has been attributed to common tagging Single Nucleotide Polymorphisms (SNP), a model trained on one million people may only explain 33.4% of variance of the trait. Current studies can identify 3.0%, 1.1%, and 7.0%, of the populations who are at two-fold or higher than average risk for Type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate the percentages to 18.8%, 6.1%, and 12.2%, respectively. The utility of future polygenic models will depend on achievable sample sizes, underlying genetic architecture and information on other risk-factors, including family history.
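The link between a polygenic model's spread of risk and the fraction of the population at two-fold or higher risk can be illustrated with a textbook log-normal approximation. This sketch is not the projection model reported above. It assumes that log relative risk is normally distributed with standard deviation `sigma` and that the mean relative risk is 1; the `sigma` values are hypothetical.

```python
import math
from statistics import NormalDist

def fraction_at_or_above(relative_risk, sigma):
    """Fraction of the population whose relative risk is >= relative_risk.

    Assumes log(relative risk) ~ Normal(mu, sigma^2), with
    mu = -sigma^2 / 2 so that the population-average relative risk is 1.
    """
    mu = -sigma ** 2 / 2
    z = (math.log(relative_risk) - mu) / sigma
    return 1.0 - NormalDist().cdf(z)

# Hypothetical spreads of log relative risk for a weaker and a stronger model.
weaker_model = fraction_at_or_above(2.0, sigma=0.5)
stronger_model = fraction_at_or_above(2.0, sigma=0.8)
```

Under these assumptions, a model that captures more of the variation in risk (larger `sigma`) places a larger share of the population above the two-fold threshold, which is the qualitative effect of increasing training sample size described in the abstract.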
Age-related macular degeneration (AMD) is a common cause of blindness in older individuals. To accelerate understanding of AMD biology and help design new therapies, we executed a collaborative genome-wide association study, examining >17,100 advanced AMD cases and >60,000 controls of European and Asian ancestry. We identified 19 genomic loci associated with AMD with p<5×10−8 and enriched for genes involved in regulation of complement activity, lipid metabolism, extracellular matrix remodeling and angiogenesis. Our results include 7 loci reaching p<5×10−8 for the first time, near the genes COL8A1/FILIP1L, IER3/DDR1, SLC16A8, TGFBR1, RAD51B, ADAMTS9/MIR548A2, and B3GALTL. A genetic risk score combining SNPs from all loci displayed similarly good ability to distinguish cases and controls in all samples examined. Our findings provide new directions for biological, genetic and therapeutic studies of AMD.
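A genetic risk score of the kind mentioned above is commonly computed as a weighted sum of risk-allele counts, with each SNP weighted by the log of its odds ratio. The sketch below shows that common construction under assumptions: the abstract does not specify the weighting used, and the SNP identifiers, odds ratios, and genotypes are hypothetical.

```python
import math

def genetic_risk_score(genotypes, odds_ratios):
    """Weighted genetic risk score for one individual.

    genotypes:   dict SNP id -> number of risk alleles carried (0, 1 or 2).
    odds_ratios: dict SNP id -> per-allele odds ratio for the risk allele.
    The score is the sum of risk-allele counts weighted by log(OR).
    """
    return sum(genotypes[snp] * math.log(odds_ratios[snp]) for snp in odds_ratios)

# Hypothetical per-allele odds ratios and one individual's genotypes.
example_odds_ratios = {"snp1": 1.5, "snp2": 1.2, "snp3": 2.0}
example_genotypes = {"snp1": 2, "snp2": 0, "snp3": 1}
score = genetic_risk_score(example_genotypes, example_odds_ratios)
```

Exponentiating the score gives the individual's combined odds ratio relative to someone carrying no risk alleles, assuming the SNPs act independently and multiplicatively on the odds scale.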
A genome-wide association scan of ~6.6 million genotyped or imputed variants in 882 Sardinian Multiple Sclerosis (MS) cases and 872 controls suggested association of CBLB gene variants with disease, which was confirmed in 1,775 cases and 2,005 controls (overall P = 1.60 × 10−10). CBLB encodes a negative regulator of adaptive immune responses and mice lacking the orthologue are prone to experimental autoimmune encephalomyelitis, the animal model of MS.
Scapuloperoneal spinal muscular atrophy (SPSMA) and hereditary motor and sensory neuropathy type IIC (HMSN IIC, also known as HMSN2C or Charcot-Marie-Tooth disease type 2C (CMT2C)) are phenotypically heterogeneous disorders involving topographically distinct nerves and muscles. We originally described a large New England family of French-Canadian origin with SPSMA and an American family of English and Scottish descent with CMT2C1,2. We mapped SPSMA and CMT2C risk loci to 12q24.1–q24.31 with an overlapping region between the two diseases3,4. Further analysis reduced the CMT2C risk locus to a 4-Mb region5. Here we report that SPSMA and CMT2C are allelic disorders caused by mutations in the gene encoding the transient receptor potential cation channel, subfamily V, member 4 (TRPV4). Functional analysis revealed that increased calcium channel activity is a distinct property of both SPSMA- and CMT2C-causing mutant proteins. Our findings link mutations in TRPV4 to altered calcium homeostasis and peripheral neuropathies, implying a pathogenic mechanism and possible options for therapy for these disorders.
Many individuals with multiple or large colorectal adenomas, or early-onset colorectal cancer (CRC), have no detectable germline mutations in the known cancer predisposition genes. Using whole-genome sequencing, supplemented by linkage and association analysis, we identified specific heterozygous POLE or POLD1 germline variants in several multiple adenoma and/or CRC cases, but in no controls. The susceptibility variants appear to have high penetrance. POLD1 is also associated with endometrial cancer predisposition. The mutations map to equivalent sites in the proof-reading (exonuclease) domain of DNA polymerases ε and δ, and are predicted to impair correction of mispaired bases inserted during DNA replication. In agreement with this prediction, mutation carriers’ tumours were microsatellite-stable, but tended to acquire base substitution mutations, as confirmed by yeast functional assays. Further analysis of published data showed that the recently-described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE exonuclease domain mutations.
Sequence-based variation in gene expression is a key driver of disease risk. Common variants regulating expression in cis have been mapped in many eQTL studies typically in single tissues from unrelated individuals. Here, we present a comprehensive analysis of gene expression across multiple tissues conducted in a large set of mono- and dizygotic twins that allows systematic dissection of genetic (cis and trans) and non-genetic effects on gene expression. Using identity-by-descent estimates, we show that at least 40% of the total heritable cis-effect on expression cannot be accounted for by common cis-variants, a finding which exposes the contribution of low frequency and rare regulatory variants with respect to both transcriptional regulation and complex trait susceptibility. We show that a substantial proportion of gene expression heritability is trans to the structural gene and identify several replicating trans-variants which act predominantly in a tissue-restricted manner and may regulate the transcription of many genes.
Despite continual progress in the cataloging of vertebrate regulatory elements, little is known about their organization and regulatory architecture. Here we describe a massively parallel experiment to systematically test the impact of copy number, spacing, combination and order of transcription factor binding sites on gene expression. A complex library of ~5,000 synthetic regulatory elements containing patterns from 12 liver-specific transcription factor binding sites was assayed in mice and in HepG2 cells. We find that certain transcription factors act as direct drivers of gene expression in homotypic clusters of binding sites, independent of spacing between sites, whereas others function only synergistically. Heterotypic enhancers are stronger than their homotypic analogs and favor specific transcription factor binding site combinations, mimicking putative native enhancers. Exhaustive testing of binding site permutations suggests that there is flexibility in binding site order. Our findings provide quantitative support for a flexible model of regulatory element activity and suggest a framework for the design of synthetic tissue-specific enhancers.