Survival after cardiac surgery in infancy requires adaptive responses from oxidative stress management and vascular regulation pathways. We tested the hypothesis that genetic variation in these pathways influences post-operative survival in children with non-syndromic congenital heart disease (CHD).
This is an analysis of a cohort of non-syndromic CHD patients who underwent cardiac surgery with cardiopulmonary bypass before 6 months of age (n=422). Six single nucleotide polymorphisms (SNPs) in 6 genes involved in oxidative stress and vascular response pathways, identified through an a priori literature search, were tested for effects on transplant-free survival. Survival curves, adjusted for confounding covariates, were calculated using Cox proportional hazards models.
Long-term survival was strongly associated with VEGFA SNP rs833069 (p=7.03×10−4) and SOD2 SNP rs2758331 (p=0.019). To test for joint effects of the 2 SNPs on transplant-free survival, the genotypes were grouped to form a risk score reflecting the cumulative number of risk alleles (0–4 alleles/patient). A higher risk score based on the VEGFA and SOD2 SNP genotypes was associated with worse transplant-free survival (p=3.02×10−4) after confounder adjustment. The total burden of risk alleles was additive; individuals with the highest risk score of 4 (n=59 subjects, 14.2% of the cohort) had a total covariate-adjusted HR=15.64 for worse transplant-free survival.
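As a minimal sketch of how such a cumulative score could be tallied, the following assumes each genotype has already been recoded as the number of risk alleles (0, 1, or 2) carried at each SNP. The dictionary keys and the function name are hypothetical, and the abstract does not state which allele at each SNP is the risk allele.

```python
from typing import Dict

# Hypothetical labels for the two SNPs named in the abstract.
SNPS = ("VEGFA_rs833069", "SOD2_rs2758331")


def risk_score(risk_allele_counts: Dict[str, int]) -> int:
    """Sum risk-allele counts over the two SNPs, giving a score from 0 to 4."""
    total = 0
    for snp in SNPS:
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: expected 0, 1, or 2 risk alleles, got {count}")
        total += count
    return total


# A patient homozygous for the risk allele at both SNPs has the maximum score.
print(risk_score({"VEGFA_rs833069": 2, "SOD2_rs2758331": 2}))
```

In a survival analysis the resulting score would then enter the Cox model as a covariate alongside the confounders; that modeling step is not shown here.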
After cardiac surgery, infants who are homozygous for the high-risk alleles for both the VEGFA and SOD2 SNPs have an approximately 16-fold increased risk of death or heart transplant, suggesting that genetic variants are important modifiers of survival after surgery for CHD.
Congenital heart disease, CHD; Ischemia/reperfusion injury (myocardial); Genetics, genomics; Genes/polymorphisms/microarrays; Myocardial remodeling (reshaping, constraining, ventriculectomy); Outcomes (including mortality, morbidity, survival, etc.); Statistics, survival analysis
By 6 October 2014, many laboratories in the United States must begin honoring new individual data access rights created by recent changes to federal privacy and laboratory regulations. These access rights are more expansive than has been widely understood and pose complex challenges for genomic testing laboratories. This article analyzes regulatory texts and guidances to explore which laboratories are affected. It offers the first published analysis of which parts of the vast trove of data generated during next-generation sequencing will be accessible to patients and research subjects. Persons tested at affected laboratories seemingly will have access, upon request, to uninterpreted gene variant information contained in their stored variant call format, binary alignment/map, and FASTQ files. A defect in the regulations will subject some non-CLIA-regulated research laboratories to these new access requirements unless the Department of Health and Human Services takes swift action to avert this apparently unintended consequence. More broadly, all affected laboratories face a long list of daunting operational, business, compliance, and bioethical issues as they adapt to this change and to the Food and Drug Administration’s recently announced plan to publish draft guidance outlining a new oversight framework for lab-developed tests.
access rights; CLIA; FDA; HIPAA; return of results
Recent data suggest that high‐density lipoprotein cholesterol (HDL‐C) levels are likely not in the causative pathway of atheroprotection, shifting focus from HDL‐C to its subfractions and associated proteins. This study's goal was to determine which HDL phenotype was the better predictor of carotid artery disease (CAAD).
Methods and Results
HDL‐2 and HDL‐3 were measured in 1725 participants of European ancestry in a prevalent case‐control cohort study of CAAD. Stratified analyses were conducted for men (n=1201) and women (n=524). Stepwise linear regression was used to determine whether HDL‐C, HDL‐2, HDL‐3, or apolipoprotein A1 was the best predictor of CAAD, while adjusting for the confounders of censored age, diabetes, and current smoking status. In both men and women, HDL‐3 was negatively associated with CAAD (P=0.0011 and 0.033 for men and women, respectively); once HDL‐3 was included in the model, no other HDL phenotype was significantly associated with CAAD. Addition of paraoxonase 1 activity to the aforementioned regression model showed a significant and independent (of HDL‐3) association with CAAD in men (P=0.001) but not in the smaller female subgroup.
This study is the first to contrast the associations of HDL‐2 and HDL‐3 with CAAD. We found that HDL‐3 levels were more predictive of CAAD status than HDL‐2, HDL‐C, or apolipoprotein A1. In addition, for men, paraoxonase 1 activity improved the overall model prediction for CAAD independently and additively with HDL‐3 levels. Further investigation into the molecular mechanisms through which HDL‐3 is associated with protection from CAAD is warranted.
atherosclerosis; carotid arteries; high‐density lipoprotein; lipids; lipoproteins
The electronic Medical Records and Genomics (eMERGE) (Phase I) network was established in 2007 to further genomic discovery using biorepositories linked to the electronic health record (EHR). In Phase II, which began in 2011, genomic discovery efforts continue and in addition the network is investigating best practices for implementing genomic medicine, in particular, the return of genomic results in the EHR for use by physicians at point-of-care. To develop strategies for addressing the challenges of implementing genomic medicine in the clinical setting, the eMERGE network is conducting studies that return clinically-relevant genomic results to research participants and their health care providers. These genomic medicine pilot studies include returning individual genetic variants associated with disease susceptibility or drug response, as well as genetic risk scores for common “complex” disorders. Additionally, as part of a network-wide pharmacogenomics-related project, targeted resequencing of 84 pharmacogenes is being performed and select genotypes of pharmacogenetic relevance are being placed in the EHR to guide individualized drug therapy. Individual sites within the eMERGE network are exploring mechanisms to address incidental findings generated by resequencing of the 84 pharmacogenes. In this paper, we describe studies being conducted within the eMERGE network to develop best practices for integrating genomic findings into the EHR, and the challenges associated with such work.
genomics; electronic health records; incidental findings; implementation; genetic counseling; next generation sequencing; pharmacogenetics
As genomic and exomic testing expands in both the research and clinical arenas, determining whether, how, and which incidental findings to return to the ordering clinician and patient becomes increasingly important. Although opinion is varied on what should be returned to consenting patients or research participants, most experts agree that return of medically actionable results should be considered. There is insufficient evidence to fully inform evidence-based clinical practice guidelines regarding return of results from genome-scale sequencing, and thus generation of such evidence is imperative, given the rapidity with which genome-scale diagnostic tests are being incorporated into clinical care. We present an overview of the approaches to incidental findings by members of the Clinical Sequencing Exploratory Research network, funded by the National Human Genome Research Institute, to generate discussion of these approaches by the clinical genomics community. We also report specific lists of “medically actionable” genes that have been generated by a subset of investigators in order to explore what types of findings have been included or excluded in various contexts. A discussion of the general principles regarding reporting of novel variants, challenging cases (genes for which consensus was difficult to achieve across Clinical Sequencing Exploratory Research network sites), solicitation of preferences from participants regarding return of incidental findings, and the timing and context of return of incidental findings is provided.
actionability; actionable genes; clinical sequencing; genomic medicine; incidental findings
Background. Paraoxonase 1 (PON1) enzymatic activity has been consistently predictive of cardiovascular disease, while the genotypes at the four functional polymorphisms at PON1 have not. The goal of this study was to identify additional variation at the PON gene cluster that improved prediction of PON1 activity and determine if these variants predict carotid artery disease (CAAD). Methods. We considered 1,328 males in a CAAD cohort. Fifty-one tagging single-nucleotide polymorphisms (tag SNPs) across the PON cluster were evaluated to determine their effects on PON1 activity and CAAD status. Results. Six SNPs (four in PON1 and one each in PON2/3) predicted PON1 arylesterase (AREase) activity, in addition to the four previously known functional SNPs. In total, the 10 SNPs explained 30.1% of AREase activity, 5% of which was attributable to the six identified predictive SNPs. We replicate rs854567 prediction of 2.3% of AREase variance, the effects of rs3917510, and a PON3 haplotype that includes rs2375005. While AREase activity strongly predicted CAAD, none of the 10 SNPs predicting AREase predicted CAAD. Conclusions. This study identifies new genetic variants that predict additional PON1 AREase activity. Identification of SNPs associated with PON1 activity is required when evaluating the many phenotypes associated with genetic variation near PON1.
Background and Purpose
Lipoprotein(a) level (Lp(a)) is an established risk factor for coronary artery disease and has been implicated in carotid artery disease (CAAD). The relationship between genetic variation in the LPA gene region and CAAD risk remains unknown.
We genotyped single nucleotide polymorphisms (SNPs) in the LPAL2, LPA, and PLG region in 530 individuals with severe CAAD and 770 controls and kringle IV type 2 (KIV2) repeat length in a subset of 90 individuals.
Nine SNPs collectively accounted for 30% of the variance in Lp(a) level. Six SNPs were associated with Lp(a) level after accounting for KIV2 copy number, and the dominant KIV2 allele combined with these markers explained 60% of the variance in Lp(a) level. Five SNPs, including rs10455872, which had an odds ratio of 2.1 per minor allele, and haplotypes formed by rs10455872, rs6919346, and rs3123629 were significant predictors of CAAD. After accounting for Lp(a) level, all evidence of CAAD-genotype association in the LPA region was eliminated.
LPA region SNPs capture some but not all of the effect of KIV2 repeat length on Lp(a) level. There are associations between LPA region SNPs and CAAD which appear to be due to effects on Lp(a) level.
Carotid stenosis; atherosclerosis; lipoprotein(a); genomics; risk factors
There are numerous obstacles to genomic medicine. These include the large number of rare and novel genomic variants per individual. The American College of Medical Genetics and Genomics (ACMG) has recommended that all pathogenic variants in 56 gene-disease pairs that are identified incidentally in a genomic test be offered to the patient (Green et al., 2013, PMID: 23788249). We considered an expanded list of 112 actionable gene-disease pairs, ones where medical intervention is possible to prevent or detect disease early. We estimate the rate of these incidental findings (IFs) in European and African Ancestry groups. However, we found high discordance between classifications of expert reviewers. We have reported both inconsistency across labs in variant classification and a bias towards overcalling pathogenicity (Amendola et al., 2015, PMID: 25637381). Thus, there is a need to standardize classification of genomic variants in medical sequencing. To date genomics laboratories have used non-standard classification systems. The ACMG published guidelines for variant classification for Mendelian disorders designed to increase consistency among labs (Richards et al., 2015, PMID: 25741868). The Clinical Sequencing Exploratory Research (CSER) Consortium evaluated the use of these rules by nine of the CLIA laboratories supporting CSER projects, considering 99 germline variants. The results were examined to evaluate intra-laboratory differences between variant classifications using the labs' own criteria vs. adopting ACMG criteria and inter-laboratory differences using either the lab’s own system or the ACMG guidelines. Agreement among labs did not differ whether using the laboratory-specific vs. ACMG criteria (P=0.9); i.e., the ACMG criteria did not yield more consistent variant classification in this exercise. We further analyzed sources of disagreement in the use of the ACMG criteria and identified causes of variance in classifications.
In addition to providing useful analyses of how variant classification approaches vary among laboratories, these data should allow clarification and refinement of the ACMG criteria that may increase consistency in variant classification.
Genomic medicine; genomic variants; variant classifications
Pharmacogenetic testing, a form of precision medicine, has the potential to optimize medication choice and dosing. Yet, relatively little is known about the views of patients—particularly those with chronic psychiatric conditions—with respect to such testing.
To explore patients’ beliefs and attitudes regarding pharmacogenetic testing, with the goal of informing policy development and implementation.
Qualitative study design using semistructured focus groups with adults enrolled in Group Health Cooperative, a large health maintenance organization in the Pacific Northwest. We conducted focus groups with patients prescribed antidepressants (pilot session plus 2 focus groups, n = 27); patients prescribed carbamazepine (2 focus groups, n = 17); and healthy patients (2 focus groups, n = 17).
Although participants understood the potential advantages of pharmacogenetic testing, many felt that the risks (discrimination, stigmatization, physician overreliance on genomic results, and denial of certain medications) may outweigh the benefits. These concerns were shared across groups but were more strongly expressed among participants with chronic mental health diagnoses.
Clinical implementation of pharmacogenetic testing must address patient concerns about privacy, discrimination, quality of care, and erosion of the physician-patient relationship.
The American College of Medical Genetics and Genomics (ACMG) recommended that clinical laboratories performing next-generation sequencing analyze and return pathogenic variants for 56 specific genes it considered medically actionable. Our objective was to evaluate the clinical and economic impact of returning these results.
We developed a decision-analytic policy model to project the quality-adjusted life-years and lifetime costs associated with returning ACMG-recommended incidental findings in three hypothetical cohorts of 10,000 patients.
Returning incidental findings to cardiomyopathy patients, colorectal cancer patients, or healthy individuals would increase costs by $896,000, $2.9 million, and $3.9 million, respectively, and would increase quality-adjusted life-years by 20, 25.4, and 67 years, respectively, for incremental cost-effectiveness ratios of $44,800, $115,020, and $58,600, respectively. In probabilistic analyses, returning incidental findings cost less than $100,000/quality-adjusted life-year gained in 85, 28, and 91%, respectively, of simulations. Assuming next-generation sequencing costs $500, the incremental cost-effectiveness ratio for primary screening of healthy individuals was $133,400 (<$100,000/quality-adjusted life-year gained in 10% of simulations). Results were sensitive to the cohort age and assumptions about gene penetrance.
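The incremental cost-effectiveness ratios reported above follow the usual definition: incremental cost divided by incremental quality-adjusted life-years (QALYs). The sketch below is an illustration only; the function name is hypothetical and it is not the authors' decision-analytic model. It reproduces the cardiomyopathy figure from the rounded cohort totals.

```python
def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio in dollars per QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("ICER is only meaningful here for a positive QALY gain")
    return delta_cost / delta_qaly


# Cardiomyopathy cohort: $896,000 in added cost for 20 QALYs gained.
print(icer(896_000, 20))  # 44800.0
```

Recomputing the other two ratios from the rounded cohort totals gives values close to, but not identical with, the published ones, presumably because the published ratios were derived from unrounded inputs.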
Returning incidental findings is likely cost-effective for certain patient populations. Screening of generally healthy individuals is likely not cost-effective based on current data, unless next-generation sequencing costs less than $500.
cost-effectiveness; genome sequencing; incidental findings; next-generation sequencing
In an effort to return actionable results from variant data to electronic health records (EHRs), participants in the Electronic Medical Records and Genomics (eMERGE) Network are being sequenced with the targeted Pharmacogenomics Research Network sequence platform (PGRNseq). This cost-effective, highly-scalable, and highly-accurate platform was created to explore rare variation in 84 key pharmacogenetic genes with strong drug phenotype associations.
To return Clinical Laboratory Improvement Amendments (CLIA) results to our participants at the Group Health Cooperative, we sequenced the DNA of 900 participants (61 % female) with non-CLIA biobanked samples. We then selected 450 of those to be re-consented, to redraw blood, and ultimately to validate CLIA variants in anticipation of returning the results to the participant and EHR. These 450 were selected using an algorithm we designed to harness data from self-reported race, diagnosis and procedure codes, medical notes, laboratory results, and variant-level bioinformatics to ensure selection of an informative sample. We annotated the multi-sample variant call format by a combination of SeattleSeq and SnpEff tools, with additional custom variables including evidence from ClinVar, OMIM, HGMD, and prior clinical associations.
We focused our analyses on 27 actionable genes, largely driven by the Clinical Pharmacogenetics Implementation Consortium. We derived a ranking system based on the total number of coding variants per participant (75.2±14.7), and the number of coding variants with high or moderate impact (11.5±3.9). Notably, we identified 11 stop-gained (1 %) and 519 missense (20 %) variants out of a total of 1785 in these 27 genes. Finally, we prioritized variants to be returned to the EHR with prior clinical evidence of pathogenicity or annotated as stop-gain for the following genes: CACNA1S and RYR1 (malignant hyperthermia); SCN5A, KCNH2, and RYR2 (arrhythmia); and LDLR (high cholesterol).
The incorporation of genetics into the EHR for clinical decision support is a complex undertaking for many reasons including lack of prior consent for return of results, lack of biospecimens collected in a CLIA environment, and EHR integration. Our study design accounts for these hurdles and is an example of a pilot system that can be utilized before expanding to an entire health system.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0181-z) contains supplementary material, which is available to authorized users.
Severe congenital neutropenia (SCN) is a rare hematopoietic disorder, with estimated incidence of 1 in 200,000 individuals of European descent, many cases of which are inherited in an autosomal dominant pattern. Despite the fact that several causal genes have been identified, the genetic basis for >30% of cases remains unknown. We report a five generation family segregating a novel single nucleotide variant (SNV) in TCIRG1. There is perfect co-segregation of the SNV with congenital neutropenia in this family; all 11 affected, but none of the unaffected, individuals carry this novel SNV. Western blot analysis shows reduced levels of TCIRG1 protein in affected individuals, compared to healthy controls. Two unrelated patients with SCN, identified by independent investigators, are heterozygous for different, rare, highly conserved, coding variants in TCIRG1.
TCIRG1; Congenital neutropenia; SCN; V-ATPase
An important challenge with the application of next-generation sequencing technology is the possibility of uncovering incidental genomic findings. A paucity of evidence on personal utility for incidental findings has hindered clinical guidelines. Our objective was to estimate personal utility for complex information derived from incidental genomic findings.
We used a discrete-choice experiment to evaluate participants’ personal utility for the following attributes: disease penetrance, disease treatability, disease severity, carrier status and cost. Study participants were drawn from the Canadian public. We analyzed the data with a mixed logit model.
In total, 1200 participants completed our questionnaire (available in English and French). Participants valued receiving information about high-penetrance disorders but expressed disutility for receiving information on low-penetrance disorders. The average willingness to pay was $445 (95% confidence interval [CI] $322–$567) to receive incidental findings in a scenario where clinicians returned information about high-penetrance, medically treatable disorders, but only 66% of participants (95% CI 63%–71%) indicated that they would choose to receive information in that scenario. On average, participants placed an important value ($725, 95% CI $600–$850) on having a choice about what type of findings they would receive, including receipt of information about high-penetrance, treatable disorders or receipt of information about high-penetrance disorders with or without available treatment. The predicted uptake of that scenario was 76% (95% CI 72%–79%).
Most participants valued receiving incidental findings, but personal utility depended on the type of finding, and not all participants wanted to receive incidental results, regardless of the potential health implications. These results indicate that to maximize benefit, participant-level preferences should inform the decision about whether to return incidental findings.
Apolipoprotein E (APOE) genotype is a determinant of neurologic recovery after brain ischemia and traumatic brain injury. The APOE ε2 allele has been associated with worse neurodevelopmental (ND) outcome after repair of congenital heart defects (CHD) in infancy. Replication of this finding in an independent cohort is essential to validate the observed genotype-phenotype association.
The association of APOE genotype with ND outcomes was assessed in a combined cohort of patients with single-ventricle CHD enrolled in the Single Ventricle Reconstruction and Infant Single Ventricle trials. ND outcome was assessed at 14 months using the Psychomotor Development Index (PDI) and Mental Development Index (MDI) of the Bayley Scales of Infant Development-II. Stepwise multivariable regression was performed to develop predictive models for PDI and MDI scores.
Complete data were available for 298 of 435 patients. After adjustment for preoperative and postoperative covariates, the APOE ε2 allele was associated with a lower PDI score (P = .038). Patients with the ε2 allele had a PDI score approximately 6 points lower than those without the risk allele, explaining 1.04% of overall PDI variance, because the ε2 allele was present in only 11% of the patients. There was a marginal effect of the ε2 allele on MDI scores (P = .058).
These data validate the association of the APOE ε2 allele with adverse early ND outcomes after cardiac surgery in infants, independent of patient and operative factors. Genetic variants that decrease neuroresilience and impair neuronal repair after brain injury are important risk factors for ND dysfunction after surgery for CHD. (J Thorac Cardiovasc Surg 2014;148:2560-8)
To effectively articulate the results of exome and genome sequencing we refined the structure and content of molecular test reports. To communicate results of a randomized control trial aimed at the evaluation of exome sequencing for clinical medicine, we developed a structured narrative report. With feedback from genetics and non-genetics professionals, we developed separate indication-specific and incidental findings reports. Standard test report elements were supplemented with research study-specific language, which highlighted the limitations of exome sequencing and provided detailed, structured results, and interpretations. The report format we developed to communicate research results can easily be transformed for clinical use by removal of research-specific statements and disclaimers. The development of clinical reports for exome sequencing has shown that accurate and open communication between the clinician and laboratory is ideally an ongoing process to address the increasing complexity of molecular genetic testing.
exome sequencing; Clinical Laboratory Improvement Amendments (CLIA); College of American Pathologists (CAP); incidental findings; laboratory report
It is critical to develop new metrics to determine whether high density lipoprotein (HDL) is cardioprotective in humans. One promising approach is HDL particle concentration (HDL-P) – the size and concentration of HDL in plasma or serum. However, the two methods currently used to determine HDL-P yield concentrations that differ more than 5-fold. We therefore developed and validated an improved approach to quantify HDL-P, termed calibrated ion mobility analysis (calibrated IMA).
HDL was isolated from plasma by ultracentrifugation, introduced into the gas phase with electrospray ionization, separated by size, and quantified by particle counting. A calibration curve constructed with purified proteins was used to correct for the ionization efficiency of HDL particles.
The concentrations of gold nanoparticles and reconstituted HDLs measured by calibrated IMA were indistinguishable from concentrations determined by orthogonal methods. In plasma of control (n=40) and cerebrovascular disease (n=40) subjects, three subspecies of HDL were reproducibly measured, with an estimated total HDL-P of 13.4±2.4 µM (mean±SD). HDL-C accounted for 48% of the variance in HDL-P. HDL-P was significantly lower in subjects with cerebrovascular disease, and this difference remained significant after adjustment for HDL cholesterol levels.
Calibrated IMA accurately and reproducibly determined the concentration of gold nanoparticles and synthetic HDL, strongly suggesting the method could accurately quantify HDL particle concentration. Importantly, the estimated stoichiometry of apoA-I determined by calibrated IMA was 3–4 per HDL particle, in excellent agreement with current structural models. Furthermore, HDL-P associated with cardiovascular disease status in a clinical population independently of HDL cholesterol.
cardiovascular disease; carotid cerebrovascular disease; native electrospray ionization; HDL
As APOE locus variants contribute to both risk of late-onset Alzheimer disease and differences in age-at-onset, it is important to know if other established late-onset Alzheimer disease risk loci also affect age-at-onset in cases.
To investigate the effects of known Alzheimer disease risk loci in modifying age-at-onset, and to estimate their cumulative effect on age-at-onset variation, using data from genome-wide association studies in the Alzheimer’s Disease Genetics Consortium (ADGC).
Design, Setting and Participants
The ADGC comprises 14 case-control, prospective, and family-based datasets with data on 9,162 Caucasian participants with Alzheimer’s occurring after age 60 who also had complete age-at-onset information, gathered between 1989 and 2011 at multiple sites by participating studies. Data on genotyped or imputed single nucleotide polymorphisms (SNPs) most significantly associated with risk at ten confirmed LOAD loci were examined in linear modeling of AAO, and individual dataset results were combined using a random effects, inverse variance-weighted meta-analysis approach to determine if they contribute to variation in age-at-onset. Aggregate effects of all risk loci on AAO were examined in a burden analysis using genotype scores weighted by risk effect sizes.
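For readers unfamiliar with the pooling step, the sketch below shows one common way to combine per-dataset effect estimates with inverse-variance weights under a random-effects model. It uses the DerSimonian-Laird estimator of between-study variance, which is an assumption made for illustration; the abstract does not say which estimator was used. The function name and input values are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def random_effects_meta(
    betas: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float]:
    """Pool per-dataset effects; return (pooled beta, pooled SE, tau^2)."""
    k = len(betas)
    if k == 0 or k != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")

    # Fixed-effect inverse-variance weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * bi for wi, bi in zip(w, betas)) / sw

    # Cochran's Q and the DerSimonian-Laird between-study variance (tau^2).
    q = sum(wi * (bi - beta_fixed) ** 2 for wi, bi in zip(w, betas))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add tau^2 to each dataset's sampling variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    sws = sum(w_star)
    beta_re = sum(wi * bi for wi, bi in zip(w_star, betas)) / sws
    return beta_re, sqrt(1.0 / sws), tau2


# Hypothetical per-dataset effects on age-at-onset (years per risk allele).
beta, se, tau2 = random_effects_meta([-0.4, -0.2, -0.5], [0.15, 0.20, 0.25])
print(f"pooled beta={beta:.3f}, SE={se:.3f}, tau^2={tau2:.4f}")
```

When the datasets agree closely, tau^2 is estimated as zero and the result reduces to the fixed-effect inverse-variance estimate.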
Main Outcomes and Measures
Age at disease onset abstracted from medical records among participants with late-onset Alzheimer disease diagnosed per standard criteria.
Analysis confirmed association of APOE with age-at-onset (rs6857, P=3.30×10−96), with associations in CR1 (rs6701713, P=7.17×10−4), BIN1 (rs7561528, P=4.78×10−4), and PICALM (rs561655, P=2.23×10−3) reaching statistical significance (P<0.005). Risk alleles individually reduced age-at-onset by 3-6 months. Burden analyses demonstrated that APOE contributes to 3.9% of variation in age-at-onset (R2=0.220) over baseline (R2=0.189) whereas the other nine loci together contribute to 1.1% of variation (R2=0.198).
Conclusions and Relevance
We confirmed association of APOE variants with age-at-onset among late-onset Alzheimer disease cases and observed novel associations with age-at-onset in CR1, BIN1, and PICALM. In contrast to earlier hypothetical modeling, we show that the combined effects of Alzheimer disease risk variants on age-at-onset are on the scale of, but do not exceed, the APOE effect. While the aggregate effects of risk loci on age-at-onset may be significant, additional genetic contributions to age-at-onset are individually likely to be small.
Alzheimer Disease; Alzheimer Disease Genetics; Alzheimer’s Disease - Pathophysiology; Genetics of Alzheimer Disease; Aging
Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable, and identifying genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized the association of variants influencing MPV and PLT using functional, pathway, and disease enrichment analyses and assessed pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomics (eMERGE) network had data for PLT and 6,291 participants had data for MPV. We identified 5 chromosomal regions associated with PLT and 8 associated with MPV at genome-wide significance (P<5E-8). In addition, we replicated 20 SNPs (out of 56 SNPs (α: 0.05/56=9E-4)) influencing PLT and 22 SNPs (out of 29 SNPs (α: 0.05/29=2E-3)) influencing MPV in a meta-analysis of GWAS of PLT and MPV. While our GWAS did not reveal any novel associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development and platelet production and hemostasis. PheWAS using a single-SNP Bonferroni correction for 1368 diagnoses (0.05/1368=3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune and hematologic disorders.
We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
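The significance thresholds quoted in this abstract are Bonferroni corrections: the family-wise alpha of 0.05 divided by the number of tests. A minimal sketch of that arithmetic follows; the function name and labels are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold for a family-wise error rate of alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Test counts taken from the abstract.
for label, n_tests in (
    ("PLT replication SNPs", 56),
    ("MPV replication SNPs", 29),
    ("PheWAS diagnoses", 1368),
):
    print(f"{label}: {bonferroni_threshold(0.05, n_tests):.2e}")
```

The abstract reports these thresholds to one or two significant figures, so the printed values agree with it only up to rounding.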
Up to half of unique genetic variants in genomic evaluations of familial cancer risk will be rare variants of uncertain significance. Classification of rare variants will be an ongoing issue as genomic testing becomes more common.
We modified standard power calculations to explore the sample sizes necessary to classify rare variants and estimate their relative disease risk across a range of variant frequencies (0.001 to 0.00001) and relative risks (20 to 1.5), using population-based and family-based designs and focusing on breast and colon cancer. We required 80% power and tolerated a 10% false positive rate, since variants tested will be in known genes with high pretest probability.
Using population-based strategies, hundreds to millions of cases are necessary to classify rare cancer variants. Larger samples are necessary for less frequent and less penetrant variants. Family-based strategies are robust to changes in variant frequency and require between 8 and 1175 individuals, depending on risk.
It is unlikely that most rare missense variants will be classifiable in the near future and accurate relative risk estimates may never be available for very rare variants. This knowledge may alter strategies for communicating information about variants of uncertain significance to patients.
SAMPLE SIZE CALCULATION; POWER; VARIANT OF UNCERTAIN SIGNIFICANCE; STUDY DESIGN; ODDS RATIO; RELATIVE RISK; VUS; CANCER RISK
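To give a sense of how a population-based sample size of this kind can be computed, the following Python sketch applies the textbook two-proportion formula for a case-control comparison of carrier frequencies. It is not the authors' modified calculation. It assumes equal numbers of cases and controls, a one-sided test at the stated 10% false positive rate, 80% power, and a rare disease so that the relative risk can stand in for the odds ratio; the function name `cases_needed` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(carrier_freq: float, relative_risk: float,
                 alpha: float = 0.10, power: float = 0.80) -> int:
    """Cases required (with an equal number of controls) to detect a
    difference in carrier frequency between cases and controls.

    Assumptions (illustrative, not from the source): one-sided test,
    1:1 case-control ratio, relative risk treated as an odds ratio.
    """
    if not 0 < carrier_freq < 1:
        raise ValueError("carrier_freq must lie strictly between 0 and 1")
    if relative_risk <= 0 or relative_risk == 1:
        raise ValueError("relative_risk must be positive and differ from 1")

    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = z.inv_cdf(power)

    p0 = carrier_freq  # carrier frequency among controls
    # Carrier frequency among cases implied by the odds ratio.
    p1 = relative_risk * p0 / (1 - p0 + relative_risk * p0)
    p_bar = (p0 + p1) / 2

    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Sweep the frequency and relative-risk ranges named in the abstract.
for freq in (0.001, 0.0001, 0.00001):
    for rr in (20, 5, 1.5):
        print(f"freq={freq:g}, RR={rr:g}: {cases_needed(freq, rr)} cases")
```

Under these assumptions the required number of cases grows as the variant becomes rarer or less penetrant, which is the qualitative pattern the abstract reports for population-based designs.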
The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
imputation; genome-wide association; eMERGE; electronic health records
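The allelic R2 metric mentioned above is described as the correlation between imputed and true genotypes. As a rough illustration, the Python sketch below computes the squared Pearson correlation between imputed allele dosages and known genotypes. It assumes the true genotypes are available, for example from genotypes masked before imputation, and it does not reproduce the estimator used by any particular imputation package. The function name and the example values are made up for illustration.

```python
def squared_correlation(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages (0.0 to 2.0)
    and true genotypes coded as allele counts (0, 1, 2)."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")

    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))
    var_x = sum((x - mean_x) ** 2 for x in imputed_dosages)
    var_y = sum((y - mean_y) ** 2 for y in true_genotypes)

    # A monomorphic variant has no variance, so correlation is undefined;
    # returning 0.0 here is a convention chosen for this sketch.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


# Made-up example values for a single variant across five samples.
dosages = [0.1, 0.9, 1.8, 0.0, 1.1]
truth = [0, 1, 2, 0, 1]
r2 = squared_correlation(dosages, truth)
print(f"allelic r2 = {r2:.3f}")
```

Because rare variants have little genotype variance, this quantity tends to be less stable at low minor allele frequency, which is one reason to examine it against minor allele frequency as the abstract describes.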
Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85×10−17, β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08×10−6, β = −0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = −0.09), VEGFA (rs11755845 p = 0.01, β = −0.13), and NFIA (rs334699 p = 1.50×10−3, β = −0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study. These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.
The Electronic Medical Records and Genomics (eMERGE) Network is a national consortium that is developing methods and best practices for using the electronic health record (EHR) for genomic medicine and research. We conducted a multi-site survey of information resources to support integration of pharmacogenomics into clinical care. This work aimed to: (a) characterize the diversity of information resource implementation strategies among eMERGE institutions; (b) develop a master template containing content topics of importance for genomic medicine (as identified by the DISCERN-Genetics tool); and (c) assess the coverage of content topics among information resources developed by eMERGE institutions. Given that a standard implementation does not exist and sites relied on a diversity of information resources, we identified a need for a national effort to efficiently produce sharable genomic medicine resources capable of being accessed from the EHR. We discuss future areas of work to prepare institutions to use infobuttons for distributing standardized genomic content.
Combining samples across multiple cohorts in large-scale scientific research programs is often required to achieve the necessary power for genome-wide association studies. Controlling for genomic ancestry through principal component analysis (PCA) to address the effect of population stratification is a common practice. In addition to local genomic variation, such as copy number variation and inversions, other factors directly related to combining multiple studies, such as platform and site recruitment bias, can drive the correlation patterns in PCA. In this report, we describe the combination and analysis of a multi-ethnic cohort with biobanks linked to electronic health records for large-scale genomic association discovery analyses. First, we outline the observed site and platform bias, in addition to ancestry differences. Second, we outline a general protocol for selecting variants for input into the subject variance-covariance matrix, the conventional PCA approach. Finally, we introduce an alternative approach to PCA by deriving components from subject loadings calculated from a reference sample. This alternative approach generates principal components that are controlled for site and platform bias, in addition to ancestry differences, and has the advantage of requiring fewer covariates and degrees of freedom.
principal component analysis; ancestry; biobank; loadings; genetic association study
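The reference-based alternative described above can be illustrated roughly as follows. This is a minimal sketch assuming NumPy, and assuming the approach amounts to computing variant loadings from a reference sample and then projecting study subjects onto them. The abstract does not give the details, so the function names, the standardization step, and the synthetic genotypes are all illustrative assumptions.

```python
import numpy as np


def fit_reference_loadings(ref_genotypes, n_components):
    """Derive variant loadings from a reference panel.

    ref_genotypes: (n_ref_samples, n_variants) array of 0/1/2 allele counts.
    Returns per-variant mean, per-variant scale, and a loading matrix of
    shape (n_variants, n_components).
    """
    ref = np.asarray(ref_genotypes, dtype=float)
    mean = ref.mean(axis=0)
    scale = ref.std(axis=0)
    scale[scale == 0] = 1.0  # avoid dividing by zero for monomorphic variants
    standardized = (ref - mean) / scale
    # Rows of vt are the principal axes (variant loadings) of the reference.
    _, _, vt = np.linalg.svd(standardized, full_matrices=False)
    return mean, scale, vt[:n_components].T


def project(genotypes, mean, scale, loadings):
    """Project samples onto reference-derived loadings."""
    g = np.asarray(genotypes, dtype=float)
    return ((g - mean) / scale) @ loadings


# Synthetic genotypes, used only to show the shapes involved.
rng = np.random.default_rng(0)
reference = rng.integers(0, 3, size=(50, 200))
study = rng.integers(0, 3, size=(10, 200))

mean, scale, loadings = fit_reference_loadings(reference, n_components=2)
study_pcs = project(study, mean, scale, loadings)
print(study_pcs.shape)
```

Because the loadings are fixed by the reference sample, the study subjects' site and genotyping platform cannot shape the axes themselves, which is the intuition behind using such components as ancestry covariates.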
About half of malignant hyperthermia (MH) cases are associated with skeletal muscle ryanodine receptor 1 (RYR1) and calcium channel, voltage-dependent, L type, α1S subunit (CACNA1S) gene mutations, leaving many with an unknown cause. We chose to apply a sequencing approach to uncover causal variants in unknown cases. Sequencing the exome, the protein-coding region of the genome, has power at low sample sizes and has identified the cause of over a dozen Mendelian disorders.
We considered four families with multiple MH cases but in whom no mutations in RYR1 and CACNA1S had been identified by Sanger sequencing of complementary DNA. Exome sequences of two affected individuals per family, chosen for maximum genetic distance, were compared. Variants were ranked by allele frequency, protein change, and measures of conservation among mammals to assess likelihood of causation. Finally, putative pathogenic mutations were genotyped in other family members to verify cosegregation with MH.
Exome sequencing revealed 1 rare RYR1 nonsynonymous variant in each of 3 families (Asp1056His, Val2627Met, Val4234Leu), and 1 CACNA1S variant (Thr1009Lys) in a 4th family. These were not seen in variant databases or in our control population sample of 5379 exomes. Follow-up sequencing in other family members verified cosegregation of alleles with MH.
Using both exome sequencing and allele frequency data from large sequencing efforts may aid genetic diagnosis of MH. In our sample, this approach was more sensitive for variant detection in known genes than Sanger sequencing of complementary DNA, and it allows for the possibility of novel gene discovery.