Castration-resistant prostate cancer (CRPC) is associated with wide variations in survival. Recent studies of whole blood mRNA expression-based biomarkers strongly predicted survival but the genes used in these biomarker models were non-overlapping and their relationship was unknown. We developed a biomarker model for CRPC that is robust, but also captures underlying biological processes that drive prostate cancer lethality.
Using three independent cohorts of CRPC patients, we developed an integrative genomic approach for understanding the biological processes underlying genes associated with cancer progression, constructed a novel four-gene model that captured these changes, and compared the performance of the new model with existing gene models and other clinical parameters.
Our analysis revealed striking patterns of myeloid- and lymphoid-specific distribution of genes that were differentially expressed in whole blood mRNA profiles: up-regulated genes in patients with worse survival were overexpressed in myeloid cells, whereas down-regulated genes were noted in lymphocytes. A resulting novel four-gene model showed significant prognostic power independent of known clinical predictors in two independent datasets totaling 90 patients with CRPC, and was superior to the two existing gene models.
Whole blood mRNA profiling provides clinically relevant information in patients with CRPC. Integrative genomic analysis revealed patterns of differential mRNA expression with changes in gene expression in immune cell components which robustly predicted the survival of CRPC patients. The next step would be validation in a cohort of suitable size to quantify the prognostic improvement by the gene score upon the standard set of clinical parameters.
Electronic supplementary material
The online version of this article (doi:10.1186/s12916-015-0442-0) contains supplementary material, which is available to authorized users.
The growing gap between the demand for genome sequencing and the supply of trained genomics professionals is creating an acute need to develop more effective genomics education. In response we developed “Practical Analysis of Your Personal Genome”, a novel laboratory-style medical genomics course in which students have the opportunity to obtain and analyze their own whole genome. This report describes our motivations for and the content of a “practical” genomics course that incorporates personal genome sequencing and the lessons we learned during the first three iterations of this course.
Electronic supplementary material
The online version of this article (doi:10.1186/s12920-015-0124-y) contains supplementary material, which is available to authorized users.
Genomics; Education; Whole genome sequencing
Beyond its role in host defense, bacterial DNA methylation also plays important roles in the regulation of gene expression, virulence and antibiotic resistance. Bacterial cells in a clonal population can generate epigenetic heterogeneity to increase population-level phenotypic plasticity. Single molecule, real-time (SMRT) sequencing enables the detection of N6-methyladenine and N4-methylcytosine, two major types of DNA modifications comprising the bacterial methylome. However, existing SMRT sequencing-based methods for studying bacterial methylomes rely on a population-level consensus that lacks the single-cell resolution required to observe epigenetic heterogeneity. Here, we present SMALR (single-molecule modification analysis of long reads), a novel framework for single molecule-level detection and phasing of DNA methylation. Using seven bacterial strains, we show that SMALR yields significantly improved resolution and reveals distinct types of epigenetic heterogeneity. SMALR is a powerful new tool that enables de novo detection of epigenetic heterogeneity and empowers investigation of its functions in bacterial populations.
Bacterial DNA methylation is involved in many processes, from host defense to antibiotic resistance; however, current methods for examining methylated genomes lack single-cell resolution. Here Beaulaurier et al. present Single Molecule Modification Analysis of Long Reads, a new tool for de novo detection of epigenetic heterogeneity.
Background & Aims
Very early onset inflammatory bowel diseases (VEOIBD), including infant disorders, are a diverse group of diseases found in children less than 6 years of age. They have been associated with several gene variants. We aimed to identify genes that cause VEOIBD.
We performed whole-exome sequencing of DNA from 1 infant with severe enterocolitis and her parents. Candidate gene mutations were validated in 40 pediatric patients and functional studies were carried out using intestinal samples and human intestinal cell lines.
We identified compound heterozygote mutations in the tetratricopeptide repeat domain 7 (TTC7A) gene in an infant from non-consanguineous parents with severe exfoliative apoptotic enterocolitis; we also detected the mutations in 2 unrelated families, each with 2 affected siblings. TTC7A interacts with EFR3 homolog B (EFR3B) to regulate phosphatidylinositol 4-kinase (PI4KA) at the plasma membrane. Functional studies demonstrated that TTC7A is expressed in human enterocytes. The mutations we identified in TTC7A result in either mislocalization or reduced expression of TTC7A. PI4KA was found to co-immunoprecipitate with TTC7A; the identified TTC7A mutations reduced this binding. Knockdown of TTC7A in human intestinal-like cell lines reduced their adhesion, increased apoptosis, and decreased production of phosphatidylinositol 4-phosphate.
In a genetic analysis, we identified loss-of-function mutations in TTC7A in 5 infants with VEOIBD. Functional studies demonstrated that the mutations cause defects in enterocytes and T cells that lead to severe apoptotic enterocolitis. Defects in the PI4KA–TTC7A–EFR3B pathway are involved in the pathogenesis of VEOIBD.
IBD; intestinal atresia; autoimmunity; intestine
Decreased insulin sensitivity, also referred to as insulin resistance (IR), is a fundamental abnormality in patients with type 2 diabetes and a risk factor for cardiovascular disease. While IR predisposition is heritable, the genetic basis remains largely unknown. The GENEticS of Insulin Sensitivity consortium conducted a genome-wide association study (GWAS) for direct measures of insulin sensitivity, such as euglycemic clamp or insulin suppression test, in 2,764 European individuals, with replication in an additional 2,860 individuals. The presence of a nonsynonymous variant of N-acetyltransferase 2 (NAT2) [rs1208 (803A>G, K268R)] was strongly associated with decreased insulin sensitivity that was independent of BMI. The rs1208 “A” allele was nominally associated with IR-related traits, including increased fasting glucose, hemoglobin A1C, total and LDL cholesterol, triglycerides, and coronary artery disease. NAT2 acetylates arylamine and hydrazine drugs and carcinogens, but predicted acetylator NAT2 phenotypes were not associated with insulin sensitivity. In a murine adipocyte cell line, silencing of NAT2 ortholog Nat1 decreased insulin-mediated glucose uptake, increased basal and isoproterenol-stimulated lipolysis, and decreased adipocyte differentiation, while Nat1 overexpression produced opposite effects. Nat1-deficient mice had elevations in fasting blood glucose, insulin, and triglycerides and decreased insulin sensitivity, as measured by glucose and insulin tolerance tests, with intermediate effects in Nat1 heterozygote mice. Our results support a role for NAT2 in insulin sensitivity.
Cardiology; Endocrinology; Genetics; Metabolism
The immune system is a highly complex and dynamic system. Historically, the most common scientific and clinical practice has been to evaluate its individual components. This kind of approach cannot always expose the interconnecting pathways that control immune-system responses and does not reveal how the immune system works across multiple biological systems and scales. High-throughput technologies can be used to measure thousands of parameters of the immune system at a genome-wide scale. These system-wide surveys yield massive amounts of quantitative data that provide a means to monitor and probe immune-system function. New integrative analyses can help synthesize and transform these data into valuable biological insight. Here we review some of the computational analysis tools for high-dimensional data and how they can be applied to immunology.
Chronic Obstructive Pulmonary Disease (COPD) is a complex disease. Genetic, epigenetic, and environmental factors are known to contribute to COPD risk and disease progression. Therefore we developed a systematic approach to identify key regulators of COPD that integrates genome-wide DNA methylation, gene expression, and phenotype data in lung tissue from COPD and control samples. Our integrative analysis identified 126 key regulators of COPD. We identified EPAS1 as the only key regulator whose downstream genes significantly overlapped with multiple gene sets associated with COPD disease severity. EPAS1 is distinct in comparison with other key regulators in terms of methylation profile and downstream target genes. Genes predicted to be regulated by EPAS1 were enriched for biological processes including signaling, cell communication, and system development. We confirmed that EPAS1 protein levels are lower in human COPD lung tissue compared to non-disease controls and that Epas1 gene expression is reduced in mice chronically exposed to cigarette smoke. As EPAS1 downstream genes were significantly enriched for hypoxia responsive genes in endothelial cells, we tested EPAS1 function in human endothelial cells. EPAS1 knockdown by siRNA in endothelial cells impacted genes that significantly overlapped with EPAS1 downstream genes in lung tissue including hypoxia responsive genes, and genes associated with emphysema severity. Our first integrative analysis of genome-wide DNA methylation and gene expression profiles illustrates that not only does DNA methylation play a ‘causal’ role in the molecular pathophysiology of COPD, but it can be leveraged to directly identify novel key mediators of this pathophysiology.
Chronic Obstructive Pulmonary Disease (COPD) is a common lung disease. It is the fourth leading cause of death in the world and is expected to be the third by 2020. COPD is a heterogeneous and complex disease consisting of obstruction in the small airways, emphysema, and chronic bronchitis. COPD is generally caused by exposure to noxious particles or gases, most commonly from cigarette smoking. However, only 20–25% of smokers develop clinically significant airflow obstruction. Smoking is known to cause epigenetic changes in lung tissues. Thus, genetic and epigenetic factors, and their interaction with environmental factors, play an important role in COPD pathogenesis and progression. Currently, there are no therapeutics that can reverse COPD progression. In order to identify new targets that may lead to the development of therapeutics for curing COPD, we developed a systematic approach to identify key regulators of COPD that integrates genome-wide DNA methylation, gene expression, and phenotype data in lung tissue from COPD and control samples. Our integrative analysis identified 126 key regulators of COPD. We identified EPAS1 as the only key regulator whose downstream genes significantly overlapped with multiple gene sets associated with COPD disease severity.
A disruptive approach to therapeutic discovery and development is required in order to significantly improve the success rate of drug discovery for central nervous system (CNS) disorders. In this review, we first assess the key factors contributing to the frequent clinical failures for novel drugs. Second, we discuss cancer translational research paradigms that addressed key issues in drug discovery and development and have resulted in delivering drugs with significantly improved outcomes for patients. Finally, we discuss two emerging technologies that could improve the success rate of CNS therapies: human induced pluripotent stem cell (hiPSC)-based studies and multiscale biology models. Coincident with advances in cellular technologies that enable the generation of hiPSCs directly from patient blood or skin cells, together with methods to differentiate these hiPSC lines into specific neural cell types relevant to neurological disease, it is also now possible to combine data from large-scale forward genetics and post-mortem global epigenetic and expression studies in order to generate novel predictive models. The application of systems biology approaches to account for the multiscale nature of different data types, from genetic to molecular and cellular to clinical, can lead to new insights into human diseases that are emergent properties of biological networks, not the result of changes to single genes. Such studies have demonstrated the heterogeneity in etiological pathways and the need for studies on model systems that are patient-derived and thereby recapitulate neurological disease pathways with higher fidelity. 
In the context of two common and presumably representative neurological diseases, the neurodegenerative disease Alzheimer’s Disease, and the psychiatric disorder schizophrenia, we propose the need for, and exemplify the impact of, a multiscale biology approach that can integrate panomic, clinical, imaging, and literature data in order to construct predictive disease network models that can (i) elucidate subtypes of syndromic diseases, (ii) provide insights into disease networks and targets and (iii) facilitate a novel drug screening strategy using patient-derived hiPSCs to discover novel therapeutics for CNS disorders.
stem cell-based screening; systems biology and network biology; drug discovery screening; complex disease mechanism; high throughput biology
Errors in sample annotation or labeling often occur in large-scale genetic or genomic studies and are difficult to avoid completely during data generation and management. For integrative genomic studies, it is critical to identify and correct these errors. Different types of genetic and genomic data are inter-connected by cis-regulations. On that basis, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors in multiple types of molecular data, which can be used in further integrative analysis. Our results indicate that inspection of sample annotation and labeling error is an indispensable data quality assurance step. Applied to a large lung genomic study, MODMatcher increased statistically significant genetic associations and genomic correlations by more than two-fold. In a simulation study, MODMatcher provided more robust results by using three types of omics data than two types of omics data. We further demonstrate that MODMatcher can be broadly applied to large genomic data sets containing multiple types of omics data, such as The Cancer Genome Atlas (TCGA) data sets.
Many human diseases are complex with multiple genetic and environmental causal factors interacting together to give rise to disease phenotypes. Such factors affect biological systems through many layers of regulations, including transcriptional and epigenetic regulation, and protein changes. To fully understand their molecular mechanisms, complex diseases are often studied in diverse dimensions including genetics (genotype variations by single nucleotide polymorphism (SNP) arrays or whole exome sequencing), transcriptomics, epigenetics, and proteomics. However, errors in sample annotation or labeling often occur in large-scale genetic and genomic studies and are difficult to avoid completely during data generation and management. Identifying and correcting these errors are critical for integrative genomic studies. In this study, we developed a computational approach, Multi-Omics Data Matcher (MODMatcher), to identify and correct sample labeling errors based on multiple types of molecular data before further integrative analysis. Our results indicate that signals increased more than 100% after correction of sample labeling errors in a large lung genomic study. Our method can be broadly applied to large genomic data sets with multiple types of omics data, such as TCGA (The Cancer Genome Atlas) data sets.
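The idea of matching samples across data types can be illustrated with a minimal sketch. It is not the MODMatcher implementation: it assumes each sample already has two comparable numeric profiles (for example, measured expression of cis-regulated genes and expression predicted from genotype), and it simply flags samples whose best-correlated partner in the other data type carries a different label. The function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def find_mislabeled(profiles_a, profiles_b):
    """Return {label_in_a: best_label_in_b} for every sample whose
    best-correlated profile in data type B has a different label."""
    suspects = {}
    for label, profile in profiles_a.items():
        best = max(profiles_b, key=lambda other: pearson(profile, profiles_b[other]))
        if best != label:
            suspects[label] = best
    return suspects


# Hypothetical toy data: labels S1 and S2 are swapped in the second data type.
measured = {
    "S1": [1.0, 2.0, 3.0, 4.0],
    "S2": [4.0, 3.0, 2.0, 1.0],
    "S3": [1.0, 3.0, 2.0, 4.0],
}
predicted = {
    "S1": [4.1, 2.9, 2.2, 0.8],
    "S2": [0.9, 2.1, 3.1, 3.9],
    "S3": [1.2, 2.8, 2.1, 4.2],
}
suspects = find_mislabeled(measured, predicted)
print(suspects)
```

A reciprocal pair such as S1 pointing to S2 and S2 pointing to S1 suggests a label swap rather than a low-quality sample; a real analysis would also need a significance threshold for calling a match.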
Posttraumatic stress disorder (PTSD) and other deployment-related outcomes originate from a complex interplay between constellations of changes in DNA, environmental traumatic exposures, and other biological risk factors. These factors affect not only individual genes or bio-molecules but also the entire biological networks that in turn increase or decrease the risk of illness or affect illness severity. This review focuses on recent developments in the field of systems biology which use multidimensional data to discover biological networks affected by combat exposure and post-deployment disease states. By integrating large-scale, high-dimensional molecular, physiological, clinical, and behavioral data, the molecular networks that directly respond to perturbations that can lead to PTSD can be identified and causally associated with PTSD, providing a path to identify key drivers. Reprogrammed neural progenitor cells from fibroblasts from PTSD patients could be established as an in vitro assay for high throughput screening of approved drugs to determine which drugs reverse the abnormal expression of the pathogenic biomarkers or neuronal properties.
PTSD; genomics; gene expression; proteomics; Computational Biology; risk factors
Allergic rhinitis is a common disease whose genetic basis is incompletely explained. We report an integrated genomic analysis of allergic rhinitis.
We performed genome-wide association studies (GWAS) of allergic rhinitis in 5633 ethnically diverse North American subjects. Next, we profiled gene expression in disease-relevant tissue (peripheral blood CD4+ lymphocytes) collected from subjects who had been genotyped. We then integrated the GWAS and gene expression data using expression single-nucleotide polymorphism (eSNP), coexpression network, and pathway approaches to identify the biologic relevance of our GWAS.
GWAS revealed ethnicity-specific findings, with 4 genome-wide significant loci among Latinos and 1 genome-wide significant locus in the GWAS meta-analysis across ethnic groups. To identify biologic context for these results, we constructed a coexpression network to define modules of genes with similar patterns of CD4+ gene expression (coexpression modules) that could serve as constructs of broader gene expression. Six of the 22 GWAS loci with P-value ≤ 1 × 10−6 tagged one particular coexpression module (4.0-fold enrichment, P-value 0.0029), and this module also had the greatest enrichment (3.4-fold enrichment, P-value 2.6 × 10−24) for allergic rhinitis-associated eSNPs (genetic variants associated with both gene expression and allergic rhinitis). The integrated GWAS, coexpression network, and eSNP results therefore supported this coexpression module as an allergic rhinitis module. Pathway analysis revealed that the module was enriched for mitochondrial pathways (8.6-fold enrichment, P-value 4.5 × 10−72).
Our results highlight mitochondrial pathways as a target for further investigation of allergic rhinitis mechanism and treatment. Our integrated approach can be applied to provide biologic context for GWAS of other diseases.
Genome-wide association study; Allergic rhinitis; Coexpression network; Expression single-nucleotide polymorphism; Coexpression module; Pathway; Mitochondria; Hay fever; Allergy
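Fold enrichment and its P-value, as reported for the coexpression module, can be illustrated with a short sketch. The abstract does not state which test was used; the hypergeometric upper tail below is an assumption, as are the function names and the small example numbers, which are not the study's counts.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Observed hit rate (k of n) divided by the background rate (K of N)."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which belong to the category of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


# Hypothetical example: 4 of 4 selected loci fall in a module that covers
# 5 of 10 background loci.
fold = fold_enrichment(4, 4, 5, 10)
p_value = hypergeom_upper_tail(4, 4, 5, 10)
print(fold, p_value)
```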
Using expression profiles from postmortem prefrontal cortex samples of 624 dementia patients and non-demented controls, we investigated global disruptions in the co-regulation of genes in two neurodegenerative diseases, late-onset Alzheimer's disease (AD) and Huntington's disease (HD). We identified networks of differentially co-expressed (DC) gene pairs that either gained or lost correlation in disease cases relative to the control group, with the former dominant for both AD and HD and both patterns replicating in independent human cohorts of AD and aging. When aligning networks of DC patterns and physical interactions, we identified a 242-gene subnetwork enriched for independent AD/HD signatures. This subnetwork revealed a surprising dichotomy of gained/lost correlations among two inter-connected processes, chromatin organization and neural differentiation, and included DNA methyltransferases, DNMT1 and DNMT3A, of which we predicted the former but not latter as a key regulator. To validate the inter-connection of these two processes and our key regulator prediction, we generated two brain-specific knockout (KO) mice and show that Dnmt1 KO signature significantly overlaps with the subnetwork (P = 3.1 × 10−12), while Dnmt3a KO signature does not (P = 0.017).
differential co-expression; dysregulatory gene networks; epigenetic regulation of neural differentiation; network alignment; neurodegenerative diseases
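A gene pair that gains or loses correlation between cases and controls can be tested in several ways. The sketch below uses the Fisher z-transformation to compare two Pearson correlations; this is a standard textbook test chosen for illustration, not necessarily the procedure used in the study, and the function names and toy values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_correlation(case_x, case_y, ctrl_x, ctrl_y):
    """Return (z, two-sided p) for the difference between the case and
    control correlations of one gene pair; z > 0 means gained correlation."""
    def fisher_z(r):
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        return math.atanh(r)

    n_case, n_ctrl = len(case_x), len(ctrl_x)
    diff = fisher_z(pearson(case_x, case_y)) - fisher_z(pearson(ctrl_x, ctrl_y))
    se = math.sqrt(1.0 / (n_case - 3) + 1.0 / (n_ctrl - 3))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p


# Hypothetical expression values for two genes in 8 cases and 8 controls.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
case_b = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]
ctrl_b = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
z, p = differential_correlation(gene_a, case_b, gene_a, ctrl_b)
print(z, p)
```

Applied genome-wide, a test like this is run over every gene pair, so the resulting P-values would need correction for multiple testing.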
The outbreak of diarrhea and hemolytic uremic syndrome that occurred in Germany in 2011 was caused by a Shiga toxin-producing enteroaggregative Escherichia coli (EAEC) strain. The strain was classified as EAEC due to the presence of a plasmid (pAA) that mediates a characteristic pattern of aggregative adherence on cultured cells, the defining feature of EAEC that has classically been associated with virulence. Here, we describe an infant rabbit-based model of intestinal colonization and diarrhea caused by the outbreak strain, which we use to decipher the factors that mediate the pathogen’s virulence. Shiga toxin is the key factor required for diarrhea. Unexpectedly, we observe that pAA is dispensable for intestinal colonization and development of intestinal pathology. Instead, chromosome-encoded autotransporters are critical for robust colonization and diarrheal disease in this model. Our findings suggest that conventional wisdom linking aggregative adherence to EAEC intestinal colonization is false for at least a subset of strains.
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). 
Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
Genome-wide association studies (GWAS) have found a large number of genetic regions (“loci”) affecting clinical end-points and phenotypes, many outside coding intervals. One approach to understanding the biological basis of these associations has been to explore whether GWAS signals from intermediate cellular phenotypes, in particular gene expression, are located in the same loci (“colocalise”) and are potentially mediating the disease signals. However, it is not clear how to assess whether the same variants are responsible for the two GWAS signals or whether distinct causal variants close to each other are responsible. In this paper, we describe a statistical method that can test for colocalisation of GWAS signals using only single-variant summary statistics. We describe one application of our method to a meta-analysis of blood lipids and liver expression, although any two datasets resulting from association studies can be used. Our method is able to detect the subset of GWAS signals explained by regulatory effects and identify candidate genes affected by the same GWAS variants. As summary GWAS data are increasingly available, applications of colocalisation methods to integrate the findings will be essential for functional follow-up, and will also be particularly useful to identify tissue specific signals in eQTL datasets.
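One way to compare two association signals from single-SNP summary statistics is sketched below. It is a simplified illustration, not the published implementation. It assumes at most one causal variant per trait in the region, converts each SNP's effect estimate and standard error into a Wakefield approximate Bayes factor, and combines these into posterior probabilities for five hypotheses: no association (H0), association with trait 1 only (H1), with trait 2 only (H2), with both traits through distinct variants (H3), and with both through a shared variant (H4). The prior probabilities, the prior effect standard deviation, and all names and toy values are assumptions.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Log Wakefield approximate Bayes factor for one SNP.
    prior_sd is an assumed prior standard deviation of the effect."""
    v, w = se ** 2, prior_sd ** 2
    z = beta / se
    r = w / (w + v)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))); -inf for an empty input."""
    values = list(values)
    if not values or max(values) == -math.inf:
        return -math.inf
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs of two traits
    measured on the same SNPs. p1, p2, p12 are assumed per-SNP priors."""
    n = len(l1)
    shared = [a + b for a, b in zip(l1, l2)]
    distinct = [l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]
    log_h = [
        0.0,
        math.log(p1) + logsumexp(l1),
        math.log(p2) + logsumexp(l2),
        math.log(p1) + math.log(p2) + logsumexp(distinct),
        math.log(p12) + logsumexp(shared),
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical region of 5 SNPs; both traits peak at the third SNP.
se = 0.05
trait1 = [log_abf(b, se) for b in [0.0, 0.0, 0.5, 0.0, 0.0]]
trait2 = [log_abf(b, se) for b in [0.0, 0.0, 0.5, 0.0, 0.0]]
same_snp = coloc_posteriors(trait1, trait2)

# Same region, but trait 2 peaks at a different SNP.
trait2_shifted = [log_abf(b, se) for b in [0.0, 0.0, 0.0, 0.0, 0.5]]
different_snp = coloc_posteriors(trait1, trait2_shifted)
print(same_snp, different_snp)
```

In the first toy case nearly all posterior mass falls on H4, and in the second on H3. The pairwise enumeration for H3 is quadratic in the number of SNPs, which is acceptable for a sketch but would be replaced by an algebraic shortcut for large regions.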
The genetics of complex disease produce alterations in the molecular interactions of cellular pathways whose collective effect may become clear through the organized structure of molecular networks. To characterize molecular systems associated with late-onset Alzheimer’s disease (LOAD), we constructed gene regulatory networks in 1647 post-mortem brain tissues from LOAD patients and non-demented subjects, and demonstrate that LOAD reconfigures specific portions of the molecular interaction structure. Through an integrative network-based approach, we rank-ordered these network structures for relevance to LOAD pathology, highlighting an immune and microglia-specific module dominated by genes involved in pathogen phagocytosis, containing TYROBP as a key regulator and up-regulated in LOAD. Mouse microglia cells over-expressing intact or truncated TYROBP revealed expression changes that significantly overlapped the human brain TYROBP network. Thus the causal network structure is a useful predictor of response to gene perturbations and presents a novel framework to test models of disease mechanisms underlying LOAD.
Whole exome and genome sequencing (WES/WGS) is now routinely offered as a clinical test by a growing number of laboratories. As part of the test design process each laboratory must determine the performance characteristics of the platform, test and informatics pipeline. This report documents one such characterization of WES/WGS.
Whole exome and whole genome sequencing was performed on multiple technical replicates of five reference samples using the Illumina HiSeq 2000/2500. The sequencing data was processed with a GATK-based genome analysis pipeline to evaluate: intra-run, inter-run, inter-mode, inter-machine and inter-library consistency, concordance with orthogonal technologies (microarray, Sanger) and sensitivity and accuracy relative to known variant sets.
Concordance to high-density microarrays consistently exceeds 97% (and typically exceeds 99%) and concordance between sequencing replicates also exceeds 97%, with no observable differences between different flow cells, runs, machines or modes. Sensitivity relative to high-density microarray variants exceeds 95%. In a detailed study of a 129 kb region, sensitivity was lower, with some validated single-base insertions and deletions “not called”. Different variants are “not called” in each replicate: of all variants identified in WES data from the NA12878 reference sample, 74% of indels and 89% of SNVs were called in all seven replicates; in NA12878 WGS, 52% of indels and 88% of SNVs were called in all six replicates. Key sources of non-uniformity are variance in depth of coverage, artifactual variants resulting from repetitive regions and larger structural variants.
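The replicate statistic quoted above (the share of all identified variants that is called in every replicate) can be computed as in the following minimal sketch. It assumes each replicate's calls are available as a set of (chromosome, position, ref, alt) tuples; the function name and the toy calls are hypothetical and do not come from the NA12878 data.

```python
def fraction_called_in_all(replicates):
    """Fraction of the union of variant calls that appears in every replicate."""
    call_sets = [set(r) for r in replicates]
    union = set().union(*call_sets)
    shared = set.intersection(*call_sets)
    return len(shared) / len(union) if union else 0.0


# Hypothetical calls from three technical replicates.
rep1 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")}
rep2 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")}
rep3 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr3", 7, "T", "C")}
shared_fraction = fraction_called_in_all([rep1, rep2, rep3])
print(shared_fraction)
```

Computing the fraction separately for SNVs and indels, as the report does, only requires splitting each call set by variant type before calling the function.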
Blood pressure (BP) is a heritable determinant of risk for cardiovascular disease (CVD). To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP) and pulse pressure (PP), we genotyped ∼50 000 single-nucleotide polymorphisms (SNPs) that capture variation in ∼2100 candidate genes for cardiovascular phenotypes in 61 619 individuals of European ancestry from cohort studies in the USA and Europe. We identified novel associations between rs347591 and SBP (chromosome 3p25.3, in an intron of HRH1), between rs2169137 and DBP (chromosome 1q32.1, in an intron of MDM4), and between rs2014408 and SBP (chromosome 11p15, in an intron of SOX6), previously reported to be associated with MAP. We also confirmed 10 previously known loci associated with SBP, DBP, MAP or PP (ADRB1, ATP2B1, SH2B3/ATXN2, CSK, CYP17A1, FURIN, HFE, LSP1, MTHFR, SOX6) at array-wide significance (P < 2.4 × 10−6). We then replicated these associations in an independent set of 65 886 individuals of European ancestry. The findings from expression QTL (eQTL) analysis showed associations of SNPs in the MDM4 region with MDM4 expression. We did not find any evidence of association of the two novel SNPs in MDM4 and HRH1 with sequelae of high BP including coronary artery disease (CAD), left ventricular hypertrophy (LVH) or stroke. In summary, we identified two novel loci associated with BP and confirmed multiple previously reported associations. Our findings extend our understanding of genes involved in BP regulation, some of which may eventually provide new targets for therapeutic intervention.
Approaches exploiting extremes of the trait distribution may reveal novel loci for common traits, but it is unknown whether such loci are generalizable to the general population. In a genome-wide search for loci associated with upper vs. lower 5th percentiles of body mass index, height and waist-hip ratio, as well as clinical classes of obesity including up to 263,407 European individuals, we identified four new loci (IGFBP4, H6PD, RSRC1, PPP2R2A) influencing height detected in the tails and seven new loci (HNF4G, RPTOR, GNAT2, MRPS33P4, ADCY9, HS6ST3, ZZZ3) for clinical classes of obesity. Further, we show that there is large overlap in terms of genetic structure and distribution of variants between traits based on extremes and the general population and little etiologic heterogeneity between obesity subgroups.
Genetic variation at the chromosome 9p21 risk locus promotes cardiovascular disease; however, it is unclear how or which proteins encoded at this locus contribute to disease. We have previously demonstrated that loss of one candidate gene at this locus, cyclin-dependent kinase inhibitor 2B (Cdkn2b), in mice promotes vascular SMC apoptosis and aneurysm progression. Here, we investigated the role of Cdkn2b in atherogenesis and found that in a mouse model of atherosclerosis, deletion of Cdkn2b promoted advanced development of atherosclerotic plaques composed of large necrotic cores. Furthermore, human carriers of the 9p21 risk allele had reduced expression of CDKN2B in atherosclerotic plaques, which was associated with impaired expression of calreticulin, a ligand required for activation of engulfment receptors on phagocytic cells. As a result of decreased calreticulin, CDKN2B-deficient apoptotic bodies were resistant to efferocytosis and not efficiently cleared by neighboring macrophages. These uncleared SMCs elicited a series of proatherogenic juxtacrine responses associated with increased foam cell formation and inflammatory cytokine elaboration. The addition of exogenous calreticulin reversed defects associated with loss of Cdkn2b and normalized engulfment of Cdkn2b-deficient cells. Together, these data suggest that loss of CDKN2B promotes atherosclerosis by increasing the size and complexity of the lipid-laden necrotic core through impaired efferocytosis.
Single-molecule real-time (SMRT) DNA sequencing allows the systematic detection of chemical modifications such as methylation but has not previously been applied on a genome-wide scale. We used this approach to detect 49,311 putative 6-methyladenine (m6A) residues and 1,407 putative 5-methylcytosine (m5C) residues in the genome of a pathogenic Escherichia coli strain. We obtained strand-specific information for methylation sites and a quantitative assessment of the frequency of methylation at each modified position. We deduced the sequence motifs recognized by the methyltransferase enzymes present in this strain without prior knowledge of their specificity. Furthermore, we found that deletion of a phage-encoded methyltransferase-endonuclease (restriction-modification; RM) system induced global transcriptional changes and led to gene amplification, suggesting that the role of RM systems extends beyond protecting host genomes from foreign DNA.
Genome wide association studies have implicated allelic variation at 9p21.3 in multiple forms of vascular disease, including atherosclerotic coronary heart disease and abdominal aortic aneurysm. As for other genes at 9p21.3, human eQTL studies have associated expression of the tumor suppressor gene CDKN2B with the risk haplotype, but its potential role in vascular pathobiology remains unclear.
Methods and Results
Here we employed vascular injury models and found that Cdkn2b knockout mice displayed the expected increase in proliferation after injury, but developed reduced neointimal lesions and larger aortic aneurysms. In situ and in vitro studies suggested that these effects were due to increased smooth muscle cell apoptosis. Adoptive bone marrow transplant studies confirmed that the observed effects of Cdkn2b were mediated through intrinsic vascular cells and were not dependent on bone marrow-derived inflammatory cells. Mechanistic studies suggested that the observed increase in apoptosis was due to a reduction in MDM2 and an increase in p53 signaling, possibly due in part to compensation by other genes at the 9p21.3 locus. Dual inhibition of Cdkn2b and p53 led to a reversal of the vascular phenotype in each model.
These results suggest that reduced CDKN2B expression and increased SMC apoptosis may be one mechanism underlying the 9p21.3 association with aneurysmal disease.
CDKN2B; apoptosis; smooth muscle; remodeling; abdominal aortic aneurysm; genome wide association studies; p53
Retrospective studies have demonstrated that nearly 50% of patients with ovarian cancer with normal cancer antigen 125 (CA125) levels have persistent disease; however, prospectively distinguishing between patients is currently impossible. Here, we demonstrate that for one patient, with the first reported fibroblast growth factor receptor 2 (FGFR2) fusion transcript in ovarian cancer, circulating tumor DNA (ctDNA) is a more sensitive and specific biomarker than CA125, and it can also inform on a candidate therapeutic. For a 4-year period, during which the patient underwent primary debulking surgery and chemotherapy, tumor recurrences, and multiple chemotherapeutic regimens, blood samples were longitudinally collected and stored. Whereas postsurgical CA125 levels were elevated in only 3 of 28 measurements, the FGFR2 fusion ctDNA biomarker was readily detectable by quantitative real-time reverse transcription-polymerase chain reaction (PCR) in all of these same blood samples and in the tumor recurrences. Given the persistence of the FGFR2 fusion, we treated tumor cells derived from this patient and others with the FGFR2 inhibitor BGJ398. Only tumor cells derived from this patient were sensitive to FGFR2 inhibitor treatment. Using the same methodologic approach, we demonstrate in a second patient with a different fusion that PCR and agarose gel electrophoresis can also be used to identify tumor-specific DNA in the circulation. Taken together, we demonstrate that a relatively inexpensive, PCR-based ctDNA surveillance assay can outperform CA125 in identifying occult disease.
Multiple laboratories now offer clinical whole genome sequencing (WGS). We anticipate WGS becoming routinely used in research and clinical practice. Many institutions are exploring how best to educate geneticists and other professionals about WGS. Providing students in WGS courses with the option to analyze their own genome sequence is one strategy that might enhance students’ engagement and motivation to learn about personal genomics. However, if this option is presented to students, it is vital they make informed decisions, do not feel pressured into analyzing their own genomes by their course directors or peers, and feel free to analyze a third-party genome if they prefer. We therefore developed a 26-hour introductory genomics course in part to help students make informed decisions about whether to receive personal WGS data in a subsequent advanced genomics course. In the advanced course, they had the option to receive their own personal genome data, or an anonymous genome, at no financial cost to them. Our primary aims were to examine whether students made informed decisions regarding analyzing their personal genomes, and whether there was evidence that the introductory course enabled the students to make a more informed decision.
This was a longitudinal cohort study in which students (N = 19) completed questionnaires assessing their intentions, informed decision-making, attitudes and knowledge before (T1) and after (T2) the introductory course, and before the advanced course (T3). Informed decision-making was assessed using the Decisional Conflict Scale.
At the start of the introductory course (T1), most (17/19) students intended to receive their personal WGS data in the subsequent course, but many expressed conflict around this decision. Decisional conflict decreased after the introductory course (T2) indicating there was an increase in informed decision-making, and did not change before the advanced course (T3). This suggests that it was the introductory course content rather than simply time passing that had the effect. In the advanced course, all (19/19) students opted to receive their personal WGS data. No changes in technical knowledge of genomics were observed. Overall attitudes towards WGS were broadly positive.
Providing students with intensive introductory education about WGS may help them make informed decisions about whether or not to work with their personal WGS data in an educational setting.
Genome-wide association studies (GWAS) have identified 36 loci associated with body mass index (BMI), predominantly in populations of European ancestry. We conducted a meta-analysis to examine the association of >3.2 million SNPs with BMI in 39,144 men and women of African ancestry, and followed up the most significant associations in an additional 32,268 individuals of African ancestry. We identified one novel locus at 5q33 (GALNT10, rs7708584, p = 3.4 × 10⁻¹¹) and another at 7p15 when combined with data from the GIANT consortium (MIR148A/NFE2L3, rs10261878, p = 1.2 × 10⁻¹⁰). We also found suggestive evidence of an association at a third locus at 6q16 in the African ancestry sample (KLHL32, rs974417, p = 6.9 × 10⁻⁸). Thirty-two of the 36 previously established BMI variants displayed directionally consistent effect estimates in our GWAS (binomial p = 9.7 × 10⁻⁷), of which five reached genome-wide significance. These findings provide strong support for shared BMI loci across populations as well as for the utility of studying ancestrally diverse populations.
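The binomial p-value reported for directional consistency can be reproduced with a short calculation. The sketch below assumes a one-sided sign test in which each of the 36 established variants has a 0.5 chance of showing a directionally consistent effect under the null. The abstract does not specify the test's sidedness, so this is an illustrative reading rather than the authors' stated procedure, and the function name is hypothetical.

```python
from math import comb


def one_sided_sign_test(consistent: int, total: int) -> float:
    """Return P(X >= consistent) for X ~ Binomial(total, 0.5).

    Assumption: under the null, each variant is equally likely to show
    an effect in either direction, independently of the others.
    """
    # Count the outcomes with at least `consistent` concordant variants.
    tail_count = sum(comb(total, k) for k in range(consistent, total + 1))
    # Each of the 2**total direction patterns is equally likely under the null.
    return tail_count / 2 ** total


# 32 of the 36 previously established BMI variants were directionally consistent.
p_value = one_sided_sign_test(32, 36)
print(f"one-sided binomial p = {p_value:.2e}")
```

Under these assumptions the result is approximately 9.7 × 10⁻⁷, in agreement with the reported value. A two-sided test would double this figure.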