Geisinger Health System (GHS) provides an ideal platform for Precision Medicine. Key elements are the integrated health system, stable patient population, and electronic health record (EHR) infrastructure. In 2007 Geisinger launched MyCode®, a system-wide biobanking program to link samples and EHR data for broad research use.
Patient-centered input into MyCode® was obtained using participant focus groups. Participation in MyCode® is based on opt-in informed consent and allows recontact, which facilitates collection of data not in the EHR, and, since 2013, the return of clinically actionable results to participants. MyCode® leverages Geisinger’s technology and clinical infrastructure for participant tracking and sample collection.
MyCode® has a consent rate of >85% and currently includes more than 90,000 participants, with ongoing enrollment of ~4,000 per month. MyCode® samples have been used to generate molecular data, including high-density genotype and exome sequence data. Genotype and EHR-derived phenotype data replicate previously reported genetic associations.
The MyCode® project has created resources that enable a new model for translational research that is faster, more flexible, and more cost effective than traditional clinical research approaches. The new model is scalable, and will increase in value as these resources grow and are adopted across multiple research platforms.
biobank; genomics; electronic health records; genetic association
Alzheimer’s disease (AD) represents the most common form of dementia in elderly populations, with approximately 30 million cases worldwide. Genome-wide genotyping and sequencing studies have identified many genetic variants associated with late-onset Alzheimer’s disease (LOAD). While most of these variants are associated with increased risk of developing LOAD, only a limited number of reports have focused on variants that are protective against the disease.
Here we applied a novel approach to uncover protective alleles against AD by analyzing genetic and phenotypic data in Mount Sinai Biobank and Electronic Medical Record (EMR) databases.
We discovered a likely loss-of-function small deletion variant in the caspase 7 (CASP7) gene associated with significantly reduced incidence of LOAD in carriers of the high-risk APOE ε4 allele. Further investigation of four independent cohorts of European ancestry revealed that the protective effect of the CASP7 variant against AD is most significant in homozygous APOE ε4 allele carriers. Meta-analysis of multiple datasets shows an overall odds ratio of 0.45 (p = 0.004). Analysis of RNA sequencing-derived gene expression data indicated that the variant correlates with reduced caspase 7 expression in the multiple brain tissues we examined.
Taken together, these results are consistent with the notion that caspase 7 plays a key role in microglial activation driving neurodegeneration during AD pathogenesis, and may help explain, at the genetic level, why anti-inflammatory interventions in AD show greater benefit in APOE ε4 carriers than in non-carriers. Our findings inform potential novel therapeutic opportunities for AD and warrant further investigation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-2725-z) contains supplementary material, which is available to authorized users.
Alzheimer’s disease; Genetic variants; Protective alleles; CASP7; Resilience; Loss of function
The most common side effect of angiotensin converting enzyme inhibitor drugs (ACEi) is a cough. We conducted a genome-wide association study (GWAS) of ACEi-induced cough among 7,080 subjects of diverse ancestries in the eMERGE network. Cases were subjects diagnosed with ACEi-induced cough. Controls were subjects with at least 6 months of ACEi use and no cough. A GWAS (1,595 cases and 5,485 controls) identified associations on chromosome 4 in an intron of KCNIP4. The strongest association was at rs145489027 (MAF = 0.33, OR = 1.3 [95% CI: 1.2–1.4], p = 1.0 × 10^-8). Replication for six SNPs in KCNIP4 was tested in a second eMERGE population (n = 926) and in the GoDARTS cohort (n = 4,309). Replication was observed at rs7675300 (OR = 1.32 [1.01–1.70], p = 0.04) in eMERGE and rs16870989 and rs1495509 (OR = 1.15 [1.01–1.30], p = 0.03 for both) in GoDARTS. The combined association at rs1495509 was significant (OR = 1.23 [1.15–1.32], p = 1.9 × 10^-9). These results indicate that SNPs in KCNIP4 may modulate ACEi-induced cough risk.
ACE inhibitor; angiotensin converting enzyme inhibitor; GWAS; KCNIP4; Drug Related Side Effects and Adverse Reactions; pharmacogenetics
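The replication p-values quoted in the ACEi-cough abstract can be approximately recovered from the reported odds ratios and 95% confidence intervals. The sketch below assumes a Wald test on the log-odds scale with a symmetric normal interval; the abstract does not state which test was used, and the function name `wald_p_from_or_ci` is introduced here only for illustration.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate z statistic and two-sided Wald p-value from an OR and its 95% CI."""
    log_or = math.log(odds_ratio)
    # On the log scale the interval is log(OR) +/- Z_975 * SE, so SE is
    # the log-width of the interval divided by 2 * Z_975.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = log_or / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Replication estimates as quoted in the abstract above.
z_emerge, p_emerge = wald_p_from_or_ci(1.32, 1.01, 1.70)
z_godarts, p_godarts = wald_p_from_or_ci(1.15, 1.01, 1.30)
print(f"eMERGE  replication: z = {z_emerge:.2f}, p = {p_emerge:.3f}")
print(f"GoDARTS replication: z = {z_godarts:.2f}, p = {p_godarts:.3f}")
```

Because the published intervals are rounded to two decimals, the recovered p-values are only approximate and should be read as a consistency check, not as a re-analysis.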
Background and objective
We designed an algorithm to identify abdominal aortic aneurysm cases and controls from electronic health records to be shared and executed within the “electronic Medical Records and Genomics” (eMERGE) Network.
Materials and methods
Structured Query Language was used to script the algorithm utilizing “Current Procedural Terminology” and “International Classification of Diseases” codes, with demographic and encounter data to classify individuals as case, control, or excluded. The algorithm was validated using blinded manual chart review at three eMERGE Network sites and one non-eMERGE Network site. Validation comprised evaluation of an equal number of predicted cases and controls selected at random from the algorithm predictions. After validation at the three eMERGE Network sites, the remaining eMERGE Network sites performed verification only. Finally, the algorithm was implemented as a workflow in the Konstanz Information Miner, which represented the logic graphically while retaining intermediate data for inspection at each node. The algorithm was configured to be independent of specific access to data and was exportable (without data) to other sites.
The algorithm demonstrated positive predictive values (PPV) of 92.8% (CI: 86.8-96.7) and 100% (CI: 97.0-100) for cases and controls, respectively. It also performed well outside the eMERGE Network. Implementation of the transportable executable algorithm as a Konstanz Information Miner workflow required much less effort than implementation from pseudo code, and ensured that the logic was as intended.
Discussion and conclusion
This ePhenotyping algorithm identifies abdominal aortic aneurysm cases and controls from the electronic health record with high case and control PPV necessary for research purposes, can be disseminated easily, and applied to high-throughput genetic and other studies.
Electronic health records; Electronic medical record; Case-Control study; ICD-9; Computing methodologies; KNIME; Aortic aneurysm
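A positive predictive value from chart review, such as those reported for the AAA algorithm above, is a binomial proportion and is usually reported with a confidence interval. The sketch below uses the Wilson score interval as one common choice; the abstract does not say which interval method the authors used, and the counts are hypothetical, chosen only so that the point estimate is near the reported 92.8%.

```python
import math


def ppv_with_wilson_ci(true_positives, predicted_positives, z=1.959964):
    """Return (PPV, lower, upper) using the Wilson score interval."""
    n = predicted_positives
    p = true_positives / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return p, max(0.0, center - half), min(1.0, center + half)


# Hypothetical chart-review counts; the real counts are not given in the abstract.
ppv, low, high = ppv_with_wilson_ci(true_positives=116, predicted_positives=125)
print(f"PPV = {ppv:.1%} (95% CI {low:.1%} to {high:.1%})")
```

An exact (Clopper-Pearson) interval would give slightly different bounds, so this output should not be expected to match the published interval digit for digit.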
Electronic health records (EHR) provide a comprehensive resource for discovery, allowing unprecedented exploration of the impact of genetic architecture on health and disease. The data of EHRs also allow for exploration of the complex interactions between health measures across health and disease. The discoveries arising from EHR based research provide important information for the identification of genetic variation for clinical decision-making. Due to the breadth of information collected within the EHR, a challenge for discovery using EHR based data is the development of high-throughput tools that expose important areas of further research, from genetic variants to phenotypes. Phenome-Wide Association studies (PheWAS) provide a way to explore the association between genetic variants and comprehensive phenotypic measurements, generating new hypotheses and also exposing the complex relationships between genetic architecture and outcomes, including pleiotropy. EHR based PheWAS have mainly evaluated associations with case/control status from International Classification of Disease, Ninth Edition (ICD-9) codes. While these studies have highlighted discovery through PheWAS, the rich resource of clinical lab measures collected within the EHR can be better utilized for high-throughput PheWAS analyses and discovery. To better use these resources and enrich PheWAS association results we have developed a sound methodology for extracting a wide range of clinical lab measures from EHR data. We have extracted a first set of 21 clinical lab measures from the de-identified EHR of participants of the Geisinger MyCode™ biorepository, and calculated the median of these lab measures for 12,039 subjects. Next we evaluated the association between these 21 clinical lab median values and 635,525 genetic variants, performing a genome-wide association study (GWAS) for each of 21 clinical lab measures. 
We then calculated the association between SNPs from these GWAS passing our Bonferroni defined p-value cutoff and 165 ICD-9 codes. Through the GWAS we found a series of results replicating known associations, and also some potentially novel associations with less studied clinical lab measures. We found the majority of the PheWAS ICD-9 diagnoses highly related to the clinical lab measures associated with same SNPs. Moving forward, we will be evaluating further phenotypes and expanding the methodology for successful extraction of clinical lab measurements for research and PheWAS use. These developments are important for expanding the PheWAS approach for improved EHR based discovery.
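Two steps in the clinical lab PheWAS abstract above lend themselves to a short illustration: collapsing repeated lab results to one median per subject, and deriving a Bonferroni cutoff. This is a minimal sketch under stated assumptions: the record layout, the helper names, and the alpha of 0.05 are all hypothetical, and the abstract does not say whether the correction was applied per lab measure or across all 21 GWAS.

```python
from collections import defaultdict
from statistics import median


def median_per_subject(records):
    """Collapse repeated lab results to one median value per (subject, lab)."""
    grouped = defaultdict(list)
    for subject_id, lab_name, value in records:
        grouped[(subject_id, lab_name)].append(value)
    return {key: median(values) for key, values in grouped.items()}


def bonferroni_cutoff(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Hypothetical repeated measurements for two subjects.
records = [
    ("S1", "HDL", 41.0), ("S1", "HDL", 45.0), ("S1", "HDL", 60.0),
    ("S2", "HDL", 52.0), ("S2", "HDL", 58.0),
]
medians = median_per_subject(records)

# 635,525 is the variant count from the text; alpha = 0.05 is an assumption.
cutoff = bonferroni_cutoff(0.05, 635_525)
print(medians)
print(f"Bonferroni cutoff: {cutoff:.2e}")
```

Using the median rather than the mean makes the per-subject summary less sensitive to isolated extreme lab values, which is one plausible reason for the choice described in the abstract.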
Patients, clinicians, researchers and payers are seeking to understand the value of using genomic information (as reflected by genotyping, sequencing, family history or other data) to inform clinical decision-making. However, challenges exist to widespread clinical implementation of genomic medicine, a prerequisite for developing evidence of its real-world utility.
To address these challenges, the National Institutes of Health-funded IGNITE (Implementing GeNomics In pracTicE; www.ignite-genomics.org) Network, comprised of six projects and a coordinating center, was established in 2013 to support the development, investigation and dissemination of genomic medicine practice models that seamlessly integrate genomic data into the electronic health record and that deploy tools for point of care decision making. IGNITE site projects are aligned in their purpose of testing these models, but individual projects vary in scope and design, including exploring genetic markers for disease risk prediction and prevention, developing tools for using family history data, incorporating pharmacogenomic data into clinical care, refining disease diagnosis using sequence-based mutation discovery, and creating novel educational approaches.
This paper describes the IGNITE Network and member projects, including network structure, collaborative initiatives, clinical decision support strategies, methods for return of genomic test results, and educational initiatives for patients and providers. Clinical and outcomes data from individual sites and network-wide projects are expected to be published over the next few years.
The IGNITE Network is an innovative series of projects and pilot demonstrations aiming to enhance translation of validated actionable genomic information into clinical settings and develop and use measures of outcome in response to genome-based clinical interventions using a pragmatic framework to provide early data and proofs of concept on the utility of these interventions. Through these efforts and collaboration with other stakeholders, IGNITE is poised to have a significant impact on the acceleration of genomic information into medical practice.
Precision medicine; Pharmacogenomics; Genomics; Personalized medicine; Clinical decision support; Electronic health record; Implementation
Dopamine β-hydroxylase (DBH) catalyzes the conversion of dopamine to norepinephrine in the CNS and peripherally. DBH variants are associated with large changes in circulating DBH and implicated in multiple disorders; yet causal relationships and tissue-specific effects remain unresolved.
To characterize regulatory variants in DBH, their effect on mRNA expression, and their role in modulating sympathetic tone and disease risk.
Methods and Results
Analysis of DBH mRNA in human tissues confirmed high expression in the locus coeruleus (LC) and adrenal gland, but also in sympathetically innervated organs (liver>lung>heart). Allele-specific mRNA assays revealed pronounced allelic expression differences in the liver (2–11-fold) attributable to promoter rs1611115 and exon 2 rs1108580, but only small differences in LC and adrenals. These alleles were also associated with significantly reduced mRNA expression in liver and lung. Although DBH protein is expressed in other sympathetically innervated organs, mRNA levels were too low for analysis. In mice, hepatic Dbh mRNA levels correlated with cardiovascular risk phenotypes. The minor alleles of rs1611115 and rs1108580 were associated with sympathetic phenotypes including angina pectoris. Testing combined effects of these variants suggested protection against myocardial infarction in three separate clinical cohorts.
We demonstrate profound effects of DBH variants on expression in two sympathetically innervated organs, liver and lung, but not in adrenals and brain. Preliminary results demonstrate an association of these variants with clinical phenotypes responsive to peripheral sympathetic tone. We hypothesize that in addition to endocrine effects via circulating DBH and norepinephrine, the variants act in sympathetically innervated target organs.
Dopamine beta-hydroxylase; regulatory genetic variants; genetic association; sympathetic tone; myocardial infarction; human; gene expression/regulation; genetic polymorphism
Phenome-Wide Association Studies (PheWAS) comprehensively investigate the association between genetic variation and a wide array of outcome traits. Electronic health record (EHR) based PheWAS uses various abstractions of International Classification of Diseases, Ninth Revision (ICD-9) codes to identify case/control status for diagnoses that are used as the phenotypic variables. However, there have not been comparisons within a PheWAS between results from high quality derived phenotypes and high-throughput but potentially inaccurate use of ICD-9 codes for case/control definition. For this study we first developed a group of high quality algorithms for five phenotypes. Next we evaluated the association of these “gold standard” phenotypes and 4,636,178 genetic variants with minor allele frequency > 0.01 and compared the results from high-throughput associations at the 3 digit, 5 digit, and PheWAS codes for defining case/control status. We found that certain diseases contained similar patient populations across phenotyping methods but had differences in PheWAS.
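The difference between 3-digit and 5-digit ICD-9 case definitions described above can be illustrated with a small roll-up function. This is a simplified sketch, not the study's method: the single-code case rule, the function names, and the patient data are hypothetical, and PheWAS-code grouping (which needs an external mapping table) is not shown.

```python
def rollup_icd9(code, digits):
    """Truncate an ICD-9 code to its 3-digit category or keep the full code."""
    code = code.strip().upper()
    if digits == 3:
        # The category is the part before the decimal point (250.00 -> 250).
        return code.split(".")[0]
    if digits == 5:
        return code
    raise ValueError("digits must be 3 or 5")


def case_control_status(patient_codes, target, digits):
    """Label a patient a case if any rolled-up code equals the target."""
    status = {}
    for patient_id, codes in patient_codes.items():
        rolled = {rollup_icd9(c, digits) for c in codes}
        status[patient_id] = "case" if target in rolled else "control"
    return status


# Hypothetical patients with diabetes (250.xx) and AAA (441.4) codes.
patient_codes = {
    "P1": ["250.00"],
    "P2": ["250.02", "441.4"],
    "P3": ["441.4"],
}
broad = case_control_status(patient_codes, target="250", digits=3)
narrow = case_control_status(patient_codes, target="250.00", digits=5)
print(broad)
print(narrow)
```

In this toy data P2 is a case under the 3-digit definition but a control under the 5-digit one, which is the kind of shift in patient populations the abstract compares across phenotyping methods.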
Using abdominal aortic aneurysm (AAA) as a model, this case–control study used electronic medical record (EMR) data to assess known risk factors and identify new associations.
The study population consisted of cases with AAA (n = 888) and controls (n = 10,523) from the Geisinger Health System EMR in Central and Northeastern Pennsylvania. We extracted all clinical and diagnostic data for these patients from January 2004 to December 2009 from the EMR. From this sample set, bootstrap replication procedures were used to randomly generate 2,500 iterations of data sets, each with 500 cases and 2,000 controls. Estimates of risk factor effect sizes were obtained by stepwise logistic regression followed by bootstrap aggregation. Variables were ranked using the number of inclusions in iterations and P values.
The benign neoplasm diagnosis was negatively associated with AAA, a novel finding. Similarly, type 2 diabetes, diastolic blood pressure, weight and myelogenous neoplasms were negatively associated with AAA. Peripheral artery disease, smoking, age, coronary stenosis, systolic blood pressure, height, male sex, pulmonary disease and hypertension were associated with an increased risk for AAA.
This study utilized EMR data, retrospectively, for risk factor assessment of a complex disease. Known risk factors for AAA were replicated in magnitude and direction. A novel negative association of benign neoplasms was identified. EMRs allow researchers to rapidly and inexpensively use clinical data to expand cohort size and derive better risk estimates for AAA as well as other complex diseases.
Aortic Aneurysm; Abdominal; Electronic medical record; Neoplasms; Benign; Risk factors; Blood pressure; Diabetes mellitus; Type 2; Case–control studies
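The resampling scheme in the AAA risk-factor abstract above (2,500 bootstrap data sets of 500 cases and 2,000 controls, with variables ranked by how often they are included) can be sketched as follows. This is an illustration under stated assumptions: the study used stepwise logistic regression for variable selection, whereas the selector below is a deliberately simple stand-in based on prevalence differences, and all names and data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_inclusion_counts(cases, controls, select_variables,
                               n_iter=2500, n_cases=500, n_controls=2000,
                               seed=0):
    """Count how often each variable is selected across bootstrap data sets."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iter):
        # Sample with replacement to a fixed case/control composition.
        sample_cases = rng.choices(cases, k=n_cases)
        sample_controls = rng.choices(controls, k=n_controls)
        counts.update(select_variables(sample_cases, sample_controls))
    return counts


def prevalence_gap_selector(sample_cases, sample_controls, min_gap=0.10):
    """Stand-in selector (NOT stepwise regression): keep binary variables
    whose prevalence differs between cases and controls by at least min_gap."""
    selected = []
    for name in sample_cases[0]:
        p_case = sum(r[name] for r in sample_cases) / len(sample_cases)
        p_ctrl = sum(r[name] for r in sample_controls) / len(sample_controls)
        if abs(p_case - p_ctrl) >= min_gap:
            selected.append(name)
    return selected


# Hypothetical toy data: "smoking" separates the groups, "noise" does not.
cases = [{"smoking": 1, "noise": 0}] * 30
controls = [{"smoking": 0, "noise": 0}] * 120
counts = bootstrap_inclusion_counts(cases, controls, prevalence_gap_selector,
                                    n_iter=20, n_cases=10, n_controls=40)
print(counts.most_common())
```

Passing the selector in as a function keeps the resampling loop independent of the modeling step, so a real stepwise regression routine could be substituted without changing the counting logic.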
We performed a genome-wide association study on 1,292 individuals with abdominal aortic aneurysms (AAAs) and 30,503 controls from Iceland and The Netherlands, with a follow-up of top markers in up to 3,267 individuals with AAAs and 7,451 controls. The A allele of rs7025486 on 9q33 was found to associate with AAA, with an odds ratio (OR) of 1.21 and P = 4.6 × 10^-10. In tests for association with other vascular diseases, we found that rs7025486[A] is associated with early onset myocardial infarction (OR = 1.18, P = 3.1 × 10^-5), peripheral arterial disease (OR = 1.14, P = 3.9 × 10^-5) and pulmonary embolism (OR = 1.20, P = 0.00030), but not with intracranial aneurysm or ischemic stroke. No association was observed between rs7025486[A] and common risk factors for arterial and venous diseases, namely smoking, lipid levels, obesity, type 2 diabetes and hypertension. Rs7025486 is located within DAB2IP, which encodes an inhibitor of cell growth and survival.
The goal of the present study was to identify differences in gene expression between SAT, VAT, and EAT depots in Class III severely obese individuals.
Human subcutaneous (SAT) and visceral (VAT) adipose tissues exhibit differential gene expression profiles. There is little information, however, about the function and metabolic contribution of the other proximal white adipose tissue, epigastric (EAT).
Subjects and Methods
Using RNA from adipose biospecimens obtained from Class III severely obese patients undergoing open Roux-en-Y gastric bypass surgery, we compared gene expression profiles between SAT, VAT, and EAT, using microarrays validated by real time quantitative PCR.
The three depots were found to share 1,907 genes. VAT had the greatest number of genes expressed exclusively in this depot, followed by SAT and then EAT. Moreover, VAT shared more genes with EAT than with SAT. Further analyses using ratios of SAT/EAT, VAT/EAT, and SAT/VAT identified specific as well as overlapping networks and pathways of genes representing dermatological diseases, inflammation, cell cycle and growth, cancer, and development. Targeted analysis of genes playing a role in adipose tissue development and function revealed that peroxisome proliferator-activated receptor gamma coactivator 1-alpha (PGC1-α), which regulates the precursor of the hormone irisin (FNDC5), was abundantly expressed in all three fat depots, along with fibroblast growth factors (FGF) FGF1, FGF7, and FGF10, whereas FGF19 and FGF21 were undetectable.
These data indicate that EAT has more in common with VAT suggesting similar metabolic potential. The human epigastric adipose depot could play a significant functional role in metabolic diseases and should be further investigated.
epigastric; visceral; subcutaneous; adipose; FGF19; microarrays
Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% confidence interval [CI] = 1.11–1.24, p = 2.10 × 10^-9) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08–1.21, p = 2.34 × 10^-6). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07–1.22, p = 3.33 × 10^-5); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74–0.91, p = 5.41 × 10^-5) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
PheWAS; genetic association; pleiotropy; Exome chip; FTO; BMI
Emerging biomarkers for acute myocardial infarction (AMI) may enhance conventional risk prediction algorithms if they are informative and associated with risk independently of established predictors. In this study we constructed a cohort for testing emerging biomarkers for AMI in managed care populations using existing biospecimen repositories linked to EHR.
EHR-based biorepositories collected by healthcare systems can be federated to provide large, methodologically-sound testing sets for biomarker validation.
Subjects aged 40 to 80 were selected from two existing population-based biospecimen repositories. Incident AMI status and covariates were ascertained from EHR. An ad-hoc model for AMI risk was parameterized and validated. Simulation was used to test incremental gains in performance due to the inclusion of biomarkers in this model. Gains in performance were assessed in terms of area under the ROC curve and case reclassification.
A total of 18,329 individuals (57% female) contributed 108,400 person-years of EHR follow-up. The crude AMI incidence was 10.8 and 5.0 per 1,000 person-years among males and females, respectively. Compared to the model with risk factors alone, inclusion of a simulated biomarker yielded substantial gains in sensitivity without loss of specificity. Furthermore, a net ROC-AUC gain of 13.3% was observed as well as correct reclassification of 9.8% of incident cases (79 of 806) that were otherwise not considered statin-indicated at baseline under ATPIII criteria.
More research is needed to assess incremental contribution of emerging biomarkers for AMI prediction in managed care populations.
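The two summary quantities in the AMI results above, crude incidence per 1,000 person-years and the fraction of incident cases reclassified, follow from simple ratios. The sketch below shows the arithmetic; the function names are introduced here for illustration, the 79-of-806 figure comes from the abstract, and the event count used for the incidence example is hypothetical.

```python
def incidence_per_1000(events, person_years):
    """Crude incidence rate expressed per 1,000 person-years."""
    return 1000 * events / person_years


def reclassified_fraction(reclassified, total_cases):
    """Share of incident cases moved into the treatment-indicated group."""
    return reclassified / total_cases


# 79 of 806 incident cases, as reported in the abstract above.
fraction = reclassified_fraction(79, 806)
print(f"Reclassified: {fraction:.1%}")

# Hypothetical event count and follow-up time, for illustration only.
rate = incidence_per_1000(events=50, person_years=10_000)
print(f"Crude incidence: {rate:.1f} per 1,000 person-years")
```

The sex-specific rates in the abstract would be obtained the same way, using events and person-years restricted to each sex; those stratified counts are not given in the text.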
Abdominal aortic aneurysm (AAA) is a common human disease with a high estimated heritability (0.7); however, only a small number of associated genetic loci have been reported to date. In contrast, over 100 loci have now been reproducibly associated with blood lipid profile and/or coronary artery disease (CAD) (both risk factors for AAA) in large-scale meta-analyses. This study employed a staged design to investigate whether the loci for these two phenotypes are also associated with AAA. Validated CAD and dyslipidaemia loci underwent screening using the Otago AAA genome-wide association data set. Putative associations underwent staged secondary validation in 10 additional cohorts. A novel association between the SORT1 (1p13.3) locus and AAA was identified. The rs599839 G allele, which has been previously associated with both dyslipidaemia and CAD, reached genome-wide significance in 11 combined independent cohorts (meta-analysis with 7,048 AAA cases and 75,976 controls: G allele OR 0.81, 95% CI 0.76–0.85, P = 7.2 × 10^-14). Modelling for confounding interactions of concurrent dyslipidaemia, heart disease and other risk factors suggested that this marker is an independent predictor of AAA susceptibility. In conclusion, a genetic marker associated with cardiovascular risk factors, and in particular concurrent vascular disease, appeared to independently contribute to susceptibility for AAA. Given the potential genetic overlap between risk factor and disease phenotypes, the use of well-characterized case–control cohorts allowing for modelling of cardiovascular disease risk confounders will be an important component in the future discovery of genetic markers for conditions such as AAA.
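Combining per-cohort odds ratios into a single estimate, as in the 11-cohort SORT1 analysis above, is commonly done by inverse-variance weighting on the log-odds scale. The sketch below assumes a fixed-effect model, which the abstract does not confirm, and the per-cohort inputs are hypothetical values chosen for illustration, not data from the study.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval


def fixed_effect_meta(estimates):
    """Pool (odds_ratio, se_of_log_or) pairs by inverse-variance weighting."""
    weights = [1 / se ** 2 for _, se in estimates]
    log_ors = [math.log(odds_ratio) for odds_ratio, _ in estimates]
    pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    z = pooled_log / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(pooled_log - Z_975 * pooled_se),
          math.exp(pooled_log + Z_975 * pooled_se))
    return math.exp(pooled_log), ci, p


# Hypothetical per-cohort results: (odds ratio, standard error of log OR).
cohorts = [(0.78, 0.06), (0.84, 0.05), (0.80, 0.08)]
pooled_or, (ci_low, ci_high), p_value = fixed_effect_meta(cohorts)
print(f"Pooled OR = {pooled_or:.2f} "
      f"(95% CI {ci_low:.2f} to {ci_high:.2f}), p = {p_value:.1e}")
```

Cohorts with smaller standard errors receive more weight, so large cohorts dominate the pooled estimate; a random-effects model would widen the interval if the cohorts disagreed more than sampling error allows.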
Yes-associated protein (YAP) is a transcriptional co-activator and regulates cell proliferation and apoptosis. We investigated the clinical and biological significance of YAP in endometrial cancer (EMCA).
YAP expression in 150 primary tumor tissues from patients with EMCA was evaluated by immunohistochemistry and its association with clinicopathological data was assessed. The biological functions of YAP were determined in EMCA cell lines through knockdown/overexpression of YAP. The role of YAP in modulating radiation sensitivity was also investigated in EMCA cells.
Increased nuclear YAP expression was significantly associated with higher grade, stage, lympho-vascular space invasion, postoperative recurrence/metastasis and overall survival in estrogen-mediated EMCA, called type 1 cancer (p = 0.019, 0.028, 0.0008, 0.046 and 0.015, respectively). In multivariate analysis, nuclear YAP expression was confirmed as an independent prognostic factor for overall survival in type 1 EMCA. YAP knockdown by siRNA resulted in a significant decrease in cell proliferation (p < 0.05), anchorage-dependent growth (p = 0.015) and migration/invasion (p < 0.05), and a significant increase in the number of cells in G0/G1 phase (p = 0.002). Conversely, YAP overexpression promoted cell proliferation. Clonogenic assay demonstrated enhanced radiosensitivity by approximately 36% in YAP-inhibited cells.
Since YAP functions as a transcriptional co-activator, its differential localization in the nucleus of cancer cells and subsequent impact on cell proliferation could have important consequences with respect to its role as an oncogene in EMCA. Nuclear YAP expression could be useful as a prognostic indicator or therapeutic target and predict radiation sensitivity in patients with EMCA.
The melanocortin 4 receptor (MC4R) critically regulates feeding and satiety. Rare variants in MC4R are predominantly found in obese individuals. Though some rare variants in MC4R discovered in patients have defects in localization, ligand binding and signaling to cAMP, many have no recognized defects.
In our cohort of 1433 obese subjects that underwent Roux-en-Y Gastric Bypass (RYGB) surgery, we found fifteen variants of MC4R. We matched rare variant carriers to patients with the MC4R reference alleles for gender, age, starting BMI and T2D to determine the variant effect on weight-loss post-RYGB. In vitro, we determined expression of mutant receptors by ELISA and western blot, and cAMP production by microscopy.
While carrying a rare MC4R allele is associated with obesity, carriers of rare variants exhibited comparable weight-loss after RYGB to non-carriers. However, subjects carrying three of these variants, V95I, I137T or L250Q, lost less weight after surgery. In vitro, the R305Q mutation caused a defect in cell surface expression while only the I137T and C326R mutations showed impaired cAMP signaling. Despite these apparent differences, there was no correlation between in vitro signaling and pre- or post-surgery clinical phenotype.
These data suggest that subtle differences in receptor signaling conferred by rare MC4R variants combined with additional factors predispose carriers to obesity. In the absence of complete MC4R deficiency, these differences can be overcome by the powerful weight-reducing effects of bariatric surgery. In a complex disorder such as obesity, genetic variants that cause subtle defects that have cumulative effects can be overcome after appropriate clinical intervention.
Abdominal aortic aneurysm (AAA), a dilatation of the infrarenal aorta, typically affects males > 65 years. The pathobiological mechanisms of human AAA are poorly understood. The goal of this study was to identify novel pathways involved in the development of AAAs.
A custom-designed “AAA-chip” was used to assay 43 of the differentially expressed genes identified in a previously published microarray study between AAA (n = 15) and control (n = 15) infrarenal abdominal aorta. Protein analyses were performed on selected genes.
Altogether 38 of the 43 genes on the “AAA-chip” showed significantly different expression. Novel validated genes in AAA pathobiology included ADCY7, ARL4C, BLNK, FOSB, GATM, LYZ, MFGE8, PRUNE2, PTPRC, SMTN, TMOD1 and TPM2. These genes represent a wide range of biological functions, such as calcium signaling, development and differentiation, as well as cell adhesion, not previously implicated in AAA pathobiology. Protein analyses for GATM, CD4, CXCR4, BLNK, PLEK, LYZ, FOSB, DUSP6, ITGA5 and PTPRC confirmed the mRNA findings.
The results provide new directions for future research into AAA pathogenesis to study the role of novel genes confirmed here. New treatments and diagnostic tools for AAA could potentially be identified by studying these novel pathways.
gene expression; vascular biology; aorta; abdominal aortic aneurysm
A single mutation can alter cellular and global homeostatic mechanisms and give rise to multiple clinical diseases. We hypothesized that these disease mechanisms could be identified using low minor allele frequency (MAF < 0.1) non-synonymous SNPs (nsSNPs) associated with “mechanistic phenotypes”, comprised of collections of related diagnoses. We studied two mechanistic phenotypes: (1) thrombosis, evaluated in a population of 1,655 African Americans; and (2) four groupings of cancer diagnoses, evaluated in 3,009 white European Americans. We tested associations between nsSNPs represented on GWAS platforms and mechanistic phenotypes ascertained from electronic medical records (EMRs), and sought enrichment in functional ontologies across the top-ranked associations. We used a two-step analytic approach whereby nsSNPs were first sorted by the strength of their association with a phenotype. We tested associations using two reverse genetic models and standard additive and recessive models. In the second step, we employed a hypothesis-free ontological enrichment analysis using the sorted nsSNPs to identify functional mechanisms underlying the diagnoses comprising the mechanistic phenotypes. The thrombosis phenotype was solely associated with ontologies related to blood coagulation (Fisher's p = 0.0001, FDR p = 0.03), driven by the F5, P2RY12 and F2RL2 genes. For the cancer phenotypes, the reverse genetics models were enriched in DNA repair functions (p = 2 × 10^-5, FDR p = 0.03) (POLG/FANCI, SLX4/FANCP, XRCC1, BRCA1, FANCA, CHD1L) while the additive model showed enrichment related to chromatid segregation (p = 4 × 10^-6, FDR p = 0.005) (KIF25, PINX1). We were able to replicate nsSNP associations for POLG/FANCI, BRCA1, FANCA and CHD1L in independent data sets. Mechanism-oriented phenotyping using collections of EMR-derived diagnoses can elucidate fundamental disease mechanisms.
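The enrichment results above are reported with both a raw Fisher's p-value and an FDR-adjusted p-value. The abstract does not name the FDR procedure; the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment, and the input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is the
    # minimum of p * m / rank over all equal or larger ranks (monotonicity).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical enrichment p-values for four ontology terms.
raw = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(raw)
for p, q in zip(raw, q_values):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

Because many ontology terms are tested at once in a hypothesis-free enrichment scan, the adjusted values are the ones to compare against a significance threshold, which is why the abstract's FDR p-values are much larger than the raw ones.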
The YAP1 gene encodes a potent new oncogene and stem cell factor. However, in some cancers, the YAP1 gene plays the role of a tumor suppressor. At present, the gene and its products are intensely studied and its cDNAs are used as transgenes in cellular and animal models. Here, we report 4 new potential mRNA splicing isoforms of the YAP1 gene, bringing the total number of isoforms to 8. We detected all 8 YAP1 isoforms in a panel of human tissues and evaluated the expression of the longest isoform of YAP1 (YAP1-2δ) using Real Time PCR. All YAP1 isoforms are barely detectable in human leukocytes compared to fair levels of expression found in other human tissues. We analyzed the structure of the genomic region that gave rise to alternatively spliced YAP1 transcripts in different metazoans. We found that YAP1 isoforms that utilize exon 6 emerged in evolution with the appearance of amniotes. Interestingly, the 6 YAP1 isoforms that contain the exon 5 extension, exon 6 or both would have their leucine zipper region disrupted in the predicted protein product, compared to the intact leucine zipper found in the two YAP1 (α) isoforms. This observation has direct functional ramifications for YAP1 signaling. We also propose a normalized nomenclature for the mRNA splice variants of the YAP1 gene, which should aid in the characterization of signaling differences among the potential protein products of the YAP1 gene.
Alternative splicing; WW domains; Leucine Zipper; Quantitative RT-PCR
The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)-funded consortium engaged in the development of methods and best practices for utilizing the Electronic Medical Record (EMR) as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from EMRs can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and healthcare informatics, particularly electronic phenotyping, genome-wide association studies, genomic medicine implementation and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here we describe the evolution, accomplishments, opportunities and challenges of the network, from its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting towards implementation of genomic medicine.
electronic medical records; personalized medicine; genome-wide association studies; genetics and genomics; collaborative research
The goal of this study was to investigate the role of complement cascade genes in the pathobiology of human abdominal aortic aneurysms (AAAs).
Methods and Results
Genome-wide microarray expression profiling revealed 3,274 genes differentially expressed between aneurysmal and control aortic tissue. Interestingly, 13 genes in the complement cascade were significantly differentially expressed between AAA and control tissue. In silico analysis of the promoters of the 13 complement cascade genes showed enrichment for transcription factor binding sites for STAT5A. Chromatin-immunoprecipitation experiments demonstrated binding of transcription factor STAT5A to the promoters of the majority of the complement cascade genes. Immunohistochemical analysis showed strong staining for C2 in AAA tissues.
These results provide strong evidence that the complement cascade plays a role in human AAA. Based on our microarray studies, the pathway is activated in AAA, particularly via the lectin and classical pathways. The overrepresented binding sites of transcription factor STAT5A in the complement cascade gene promoters suggest a role for STAT5A in the coordinated regulation of complement cascade gene expression.
Abdominal aortic aneurysm; complement cascade; genetic association study; STAT5; chromatin immunoprecipitation
Abdominal aortic aneurysm (AAA) is a dilatation of the aorta that most frequently affects elderly men. Histologically, AAAs are characterized by inflammation, vascular smooth muscle cell apoptosis, and extracellular matrix degradation. The mechanisms of AAA formation, progression, and rupture are currently poorly understood. A previous mRNA expression study revealed a large number of differentially expressed genes between AAA and non-aneurysmal control aortas. MicroRNAs (miRNAs), small non-coding RNAs that are post-transcriptional regulators of gene expression, could provide a mechanism for the differential expression of genes in AAA.
To determine differences in miRNA levels between AAA (n = 5) and control (n = 5) infrarenal aortic tissues, a microarray study was carried out. Results were adjusted using Benjamini-Hochberg correction (adjusted p < 0.05). Real-time quantitative RT-PCR (qRT-PCR) assays with an independent set of 36 AAA and seven control tissues were used for validation. Potential gene targets were retrieved from miRNA target prediction databases Pictar, TargetScan, and MiRTarget2. Networks from the target gene set were generated and examined using the network analysis programs, CytoScape® and Ingenuity Pathway Core Analysis®.
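For readers unfamiliar with the Benjamini-Hochberg correction mentioned above, the following is a minimal sketch of how adjusted p-values are computed. It is a generic illustration, not the software used in the study; the function name and the example p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest p), then
    a running minimum is taken from the largest rank downward so the
    adjusted values never decrease as the raw p-values increase.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four hypothetical probes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]   # threshold used in the text above
print(adj, significant)
```

A probe is called significant when its adjusted value falls below the chosen threshold, which controls the expected proportion of false discoveries among the calls rather than the chance of any single false positive.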
A microarray study identified eight miRNAs with significantly different expression levels between AAA and controls (adjusted p < 0.05). Real-time qRT-PCR assays validated the findings for five of the eight miRNAs. A total of 222 predicted miRNA target genes known to be differentially expressed in AAA based on a prior mRNA microarray study were identified. Bioinformatic analyses revealed that several target genes are involved in apoptosis and activation of T cells.
Our genome-wide approach revealed several differentially expressed miRNAs in human AAA tissue, suggesting that miRNAs play a role in AAA pathogenesis.
Apoptosis; Microarray analysis; Vascular biology; miRNA-mRNA analysis; Network analysis
The extracellular matrix of peripheral nerve is formed from a diverse set of macromolecules, including glycoproteins, collagens and proteoglycans. Recent studies using knockout animal models have demonstrated that individual components of the extracellular matrix play a vital role in peripheral nerve development and regeneration. In this study we identified fibrillin-1 and fibrillin-2, large modular structural glycoproteins, as components of the extracellular matrix of peripheral nerve. Previously it was found that fibrillin-2 null mice display joint contractures, suggesting a possible defect of the peripheral nervous system in these animals. Close examination of the peripheral nerves of fibrillin-2 deficient animals described here revealed some structural abnormalities in the perineurium, while general structure of the nerve and molecular composition of nerve extracellular matrix remained unchanged. We also found that in spite of the obvious motor function impairment, fibrillin-2 null mice failed to display changes of nerve conduction properties or nerve regeneration capacity. Based on the data obtained we can conclude that peripheral neuropathy should be excluded as the cause of the impairment of locomotory function and joint contractures observed in fibrillin-2 deficient animals.
fibrillin; extracellular matrix; peripheral nerve
The infrarenal abdominal aorta exhibits increased disease susceptibility relative to other aortic regions. Allograft studies exchanging thoracic and abdominal segments showed that regional susceptibility is maintained regardless of location, suggesting substantial roles for embryological origin, tissue composition and site-specific gene expression.
We analyzed gene expression with microarrays in baboon aortas, and found that members of the HOX gene family exhibited spatial expression differences. HOXA4 was chosen for further study, since it had decreased expression in the abdominal compared to the thoracic aorta. Western blot analysis from 24 human aortas demonstrated significantly higher HOXA4 protein levels in thoracic compared to abdominal tissues (P < 0.001). Immunohistochemical staining for HOXA4 showed nuclear and perinuclear staining in endothelial and smooth muscle cells in aorta. The HOXA4 transcript levels were significantly decreased in human abdominal aortic aneurysms (AAAs) compared to age-matched non-aneurysmal controls (P < 0.00004). Cultured human aortic endothelial and smooth muscle cells stimulated with IFN-γ (an important inflammatory cytokine in AAA pathogenesis) showed decreased levels of HOXA4 protein (P < 0.0007).
Our results demonstrated spatial variation in expression of HOXA4 in human aortas that persisted into adulthood and that downregulation of HOXA4 expression was associated with AAAs, an important aortic disease of the ageing population.
ACK (activated Cdc42-associated tyrosine kinase; also known as Tnk2) is a ubiquitin-binding protein that plays an important role in ligand-induced and ubiquitination-mediated degradation of epidermal growth factor receptor (EGFR). Here we report that ACK is ubiquitinated by HECT E3 ubiquitin ligase Nedd4-1 and degraded along with EGFR in response to EGF stimulation. ACK interacts with Nedd4-1 through a conserved PPXY WW-binding motif. The WW3 domain in Nedd4-1 is critical for binding to ACK. Although ACK binds to both Nedd4-1 and Nedd4-2 (also known as Nedd4L), Nedd4-1 is the E3 ubiquitin ligase for ubiquitination of ACK in cells. Interestingly, deletion of the sterile alpha motif (SAM) domain at the N terminus dramatically reduced the ubiquitination of ACK by Nedd4-1, while deletion of the Uba domain dramatically enhanced the ubiquitination. Use of proteasomal and lysosomal inhibitors demonstrated that EGF-induced ACK degradation is processed by lysosomes, not proteasomes. RNA interference (RNAi) knockdown of Nedd4-1, but not Nedd4-2, inhibited degradation of both EGFR and ACK, and overexpression of ACK mutants that are deficient in either binding to or ubiquitination by Nedd4-1 blocked EGF-induced degradation of EGFR. Our findings suggest an essential role of Nedd4-1 in regulation of EGFR degradation through interaction with and ubiquitination of ACK.