By 6 October 2014, many laboratories in the United States must begin honoring new individual data access rights created by recent changes to federal privacy and laboratory regulations. These access rights are more expansive than has been widely understood and pose complex challenges for genomic testing laboratories. This article analyzes regulatory texts and guidances to explore which laboratories are affected. It offers the first published analysis of which parts of the vast trove of data generated during next-generation sequencing will be accessible to patients and research subjects. Persons tested at affected laboratories seemingly will have access, upon request, to uninterpreted gene variant information contained in their stored variant call format, binary alignment/map, and FASTQ files. A defect in the regulations will subject some non-CLIA-regulated research laboratories to these new access requirements unless the Department of Health and Human Services takes swift action to avert this apparently unintended consequence. More broadly, all affected laboratories face a long list of daunting operational, business, compliance, and bioethical issues as they adapt to this change and to the Food and Drug Administration’s recently announced plan to publish draft guidance outlining a new oversight framework for lab-developed tests.
access rights; CLIA; FDA; HIPAA; return of results
Recent data suggest that high‐density lipoprotein cholesterol (HDL‐C) levels are likely not in the causative pathway of atheroprotection, shifting focus from HDL‐C to its subfractions and associated proteins. This study's goal was to determine which HDL phenotype was the better predictor of carotid artery disease (CAAD).
Methods and Results
HDL‐2 and HDL‐3 were measured in 1725 participants of European ancestry in a prevalent case‐control cohort study of CAAD. Stratified analyses were conducted for men (n=1201) and women (n=524). Stepwise linear regression was used to determine whether HDL‐C, HDL‐2, HDL‐3, or apolipoprotein A1 was the best predictor of CAAD, while adjusting for the confounders of censored age, diabetes, and current smoking status. In both men and women, HDL‐3 was negatively associated with CAAD (P=0.0011 and 0.033 for men and women, respectively); once HDL‐3 was included in the model, no other HDL phenotype was significantly associated with CAAD. Addition of paraoxonase 1 activity to the aforementioned regression model showed a significant and independent (of HDL‐3) association with CAAD in men (P=0.001) but not in the smaller female subgroup.
This study is the first to contrast the associations of HDL‐2 and HDL‐3 with CAAD. We found that HDL‐3 levels were more predictive of CAAD status than HDL‐2, HDL‐C, or apolipoprotein A1. In addition, for men, paraoxonase 1 activity improved the overall model prediction for CAAD independently and additively with HDL‐3 levels. Further investigation into the molecular mechanisms through which HDL‐3 is associated with protection from CAAD is warranted.
atherosclerosis; carotid arteries; high‐density lipoprotein; lipids; lipoproteins
The electronic Medical Records and Genomics (eMERGE) (Phase I) network was established in 2007 to further genomic discovery using biorepositories linked to the electronic health record (EHR). In Phase II, which began in 2011, genomic discovery efforts continue and in addition the network is investigating best practices for implementing genomic medicine, in particular, the return of genomic results in the EHR for use by physicians at point-of-care. To develop strategies for addressing the challenges of implementing genomic medicine in the clinical setting, the eMERGE network is conducting studies that return clinically-relevant genomic results to research participants and their health care providers. These genomic medicine pilot studies include returning individual genetic variants associated with disease susceptibility or drug response, as well as genetic risk scores for common “complex” disorders. Additionally, as part of a network-wide pharmacogenomics-related project, targeted resequencing of 84 pharmacogenes is being performed and select genotypes of pharmacogenetic relevance are being placed in the EHR to guide individualized drug therapy. Individual sites within the eMERGE network are exploring mechanisms to address incidental findings generated by resequencing of the 84 pharmacogenes. In this paper, we describe studies being conducted within the eMERGE network to develop best practices for integrating genomic findings into the EHR, and the challenges associated with such work.
genomics; electronic health records; incidental findings; implementation; genetic counseling; next generation sequencing; pharmacogenetics
As genomic and exomic testing expands in both the research and clinical arenas, determining whether, how, and which incidental findings to return to the ordering clinician and patient becomes increasingly important. Although opinion is varied on what should be returned to consenting patients or research participants, most experts agree that return of medically actionable results should be considered. There is insufficient evidence to fully inform evidence-based clinical practice guidelines regarding return of results from genome-scale sequencing, and thus generation of such evidence is imperative, given the rapidity with which genome-scale diagnostic tests are being incorporated into clinical care. We present an overview of the approaches to incidental findings by members of the Clinical Sequencing Exploratory Research network, funded by the National Human Genome Research Institute, to generate discussion of these approaches by the clinical genomics community. We also report specific lists of “medically actionable” genes that have been generated by a subset of investigators in order to explore what types of findings have been included or excluded in various contexts. A discussion of the general principles regarding reporting of novel variants, challenging cases (genes for which consensus was difficult to achieve across Clinical Sequencing Exploratory Research network sites), solicitation of preferences from participants regarding return of incidental findings, and the timing and context of return of incidental findings is provided.
actionability; actionable genes; clinical sequencing; genomic medicine; incidental findings
It is critical to develop new metrics to determine whether high density lipoprotein (HDL) is cardioprotective in humans. One promising approach is HDL particle concentration (HDL-P) – the size and concentration of HDL in plasma or serum. However, the two methods currently used to determine HDL-P yield concentrations that differ more than 5-fold. We therefore developed and validated an improved approach to quantify HDL-P, termed calibrated ion mobility analysis (calibrated IMA).
HDL was isolated from plasma by ultracentrifugation, introduced into the gas phase with electrospray ionization, separated by size, and quantified by particle counting. A calibration curve constructed with purified proteins was used to correct for the ionization efficiency of HDL particles.
The concentrations of gold nanoparticles and reconstituted HDLs measured by calibrated IMA were indistinguishable from concentrations determined by orthogonal methods. In plasma of control (n=40) and cerebrovascular disease (n=40) subjects, three subspecies of HDL were reproducibly measured, with an estimated total HDL-P of 13.4±2.4 µM (mean±SD). HDL-C accounted for 48% of the variance in HDL-P. HDL-P was significantly lower in subjects with cerebrovascular disease, and this difference remained significant after adjustment for HDL cholesterol levels.
Calibrated IMA accurately and reproducibly determined the concentration of gold nanoparticles and synthetic HDL, strongly suggesting the method could accurately quantify HDL particle concentration. Importantly, the estimated stoichiometry of apoA-I determined by calibrated IMA was 3–4 per HDL particle, in excellent agreement with current structural models. Furthermore, HDL-P associated with cardiovascular disease status in a clinical population independently of HDL cholesterol.
cardiovascular disease; carotid cerebrovascular disease; native electrospray ionization; HDL
As APOE locus variants contribute to both risk of late-onset Alzheimer disease and differences in age-at-onset, it is important to know if other established late-onset Alzheimer disease risk loci also affect age-at-onset in cases.
To investigate the effects of known Alzheimer disease risk loci in modifying age-at-onset, and to estimate their cumulative effect on age-at-onset variation, using data from genome-wide association studies in the Alzheimer’s Disease Genetics Consortium (ADGC).
Design, Setting and Participants
The ADGC comprises 14 case-control, prospective, and family-based datasets with data on 9,162 Caucasian participants with Alzheimer’s occurring after age 60 who also had complete age-at-onset information, gathered between 1989 and 2011 at multiple sites by participating studies. Data on genotyped or imputed single nucleotide polymorphisms (SNPs) most significantly associated with risk at ten confirmed late-onset Alzheimer disease (LOAD) loci were examined in linear modeling of age-at-onset (AAO), and individual dataset results were combined using a random effects, inverse variance-weighted meta-analysis approach to determine if they contribute to variation in age-at-onset. Aggregate effects of all risk loci on AAO were examined in a burden analysis using genotype scores weighted by risk effect sizes.
Main Outcomes and Measures
Age at disease onset abstracted from medical records among participants with late-onset Alzheimer disease diagnosed per standard criteria.
Analysis confirmed association of APOE with age-at-onset (rs6857, P=3.30×10⁻⁹⁶), with associations in CR1 (rs6701713, P=7.17×10⁻⁴), BIN1 (rs7561528, P=4.78×10⁻⁴), and PICALM (rs561655, P=2.23×10⁻³) reaching statistical significance (P<0.005). Risk alleles individually reduced age-at-onset by 3-6 months. Burden analyses demonstrated that APOE contributes to 3.9% of variation in age-at-onset (R²=0.220) over baseline (R²=0.189) whereas the other nine loci together contribute to 1.1% of variation (R²=0.198).
Conclusions and Relevance
We confirmed association of APOE variants with age-at-onset among late-onset Alzheimer disease cases and observed novel associations with age-at-onset in CR1, BIN1, and PICALM. In contrast to earlier hypothetical modeling, we show that the combined effects of Alzheimer disease risk variants on age-at-onset are on the scale of, but do not exceed, the APOE effect. While the aggregate effects of risk loci on age-at-onset may be significant, additional genetic contributions to age-at-onset are individually likely to be small.
Alzheimer Disease; Alzheimer Disease Genetics; Alzheimer’s Disease - Pathophysiology; Genetics of Alzheimer Disease; Aging
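The random-effects, inverse variance-weighted meta-analysis named in the design above can be illustrated with a short sketch. It uses the DerSimonian-Laird estimator of between-dataset variance, which is an assumption here: the abstract does not say which random-effects estimator was used. The function name `random_effects_meta` and the example inputs are hypothetical, not taken from the study.

```python
import math


def random_effects_meta(betas, ses):
    """Pool per-dataset effect estimates (betas) and standard errors (ses).

    Assumption: DerSimonian-Laird estimate of the between-dataset variance.
    Returns (pooled_beta, pooled_se, tau2).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching lists with at least two datasets")

    # Fixed-effect inverse-variance weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    beta_fixed = sum(wi * bi for wi, bi in zip(w, betas)) / sum_w

    # Cochran's Q summarizes heterogeneity between datasets.
    q = sum(wi * (bi - beta_fixed) ** 2 for wi, bi in zip(w, betas))
    df = len(betas) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each dataset's sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    beta_re = sum(wi * bi for wi, bi in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return beta_re, se_re, tau2


if __name__ == "__main__":
    # Hypothetical per-dataset effects on age-at-onset (years per risk allele).
    beta, se, tau2 = random_effects_meta([-0.45, -0.30, -0.52], [0.12, 0.20, 0.15])
    print(f"pooled beta={beta:.3f}, se={se:.3f}, tau2={tau2:.4f}")
```

When the datasets agree closely, the heterogeneity term shrinks to zero and the result coincides with an ordinary fixed-effect inverse-variance estimate.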
Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable and identification of genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized variants influencing MPV and PLT using functional, pathway, and disease enrichment analyses, and assessed the pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomics (eMERGE) network had data for PLT and 6,291 participants had data for MPV. We identified 5 chromosomal regions associated with PLT and 8 associated with MPV at genome-wide significance (P<5E-8). In addition, we replicated 20 SNPs (out of 56 SNPs (α: 0.05/56=9E-4)) influencing PLT and 22 SNPs (out of 29 SNPs (α: 0.05/29=2E-3)) influencing MPV in a meta-analysis of GWAS of PLT and MPV. While our GWAS did not reveal any novel associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development and platelet production and hemostasis. PheWAS using a single-SNP Bonferroni correction for 1368 diagnoses (0.05/1368=3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune and hematologic disorders.
We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
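The Bonferroni thresholds quoted in the abstract above (0.05 divided by 56 SNPs, 29 SNPs, and 1368 diagnoses) follow from a single division. The sketch below simply reproduces that arithmetic; the helper name `bonferroni_threshold` is illustrative and not from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


if __name__ == "__main__":
    # Test counts are the ones stated in the abstract.
    settings = {
        "PLT replication (56 SNPs)": 56,
        "MPV replication (29 SNPs)": 29,
        "PheWAS (1368 diagnoses)": 1368,
    }
    for label, n_tests in settings.items():
        print(f"{label}: {bonferroni_threshold(0.05, n_tests):.2e}")
```

The unrounded values are close to the 9E-4, 2E-3, and 3.6E-5 reported, which are given to one or two significant figures.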
Up to half of unique genetic variants in genomic evaluations of familial cancer risk will be rare variants of uncertain significance. Classification of rare variants will be an ongoing issue as genomic testing becomes more common.
We modified standard power calculations to explore sample sizes necessary to classify and estimate relative disease risk for rare variant frequencies (0.001 to 0.00001) and varying relative risk (20 to 1.5) and using population-based and family-based designs focusing on breast and colon cancer. We required 80% power and tolerated a 10% false positive rate, since variants tested will be in known genes with high pretest probability.
Using population-based strategies, hundreds to millions of cases are necessary to classify rare cancer variants. Larger samples are necessary for less frequent and less penetrant variants. Family-based strategies are robust to changes in variant frequency and require between 8 and 1175 individuals, depending on risk.
It is unlikely that most rare missense variants will be classifiable in the near future and accurate relative risk estimates may never be available for very rare variants. This knowledge may alter strategies for communicating information about variants of uncertain significance to patients.
SAMPLE SIZE CALCULATION; POWER; VARIANT OF UNCERTAIN SIGNIFICANCE; STUDY DESIGN; ODDS RATIO; RELATIVE RISK; VUS; CANCER RISK
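To make the population-based design above concrete, here is a rough sketch of a standard (unmodified) two-proportion sample size calculation for a case-control comparison of carrier frequencies. It is not the authors' modified calculation. The assumptions are mine: Hardy-Weinberg proportions, a rare disease so that controls approximate the population, equal numbers of cases and controls, a two-sided test, and the 10% false positive rate and 80% power stated in the abstract. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def carrier_freq_in_cases(p0, rr):
    """Carrier frequency among cases, given population carrier frequency p0
    and relative risk rr for carriers (Bayes' rule)."""
    return rr * p0 / (rr * p0 + (1.0 - p0))


def cases_needed(allele_freq, rr, alpha=0.10, power=0.80):
    """Approximate number of cases (with an equal number of controls)
    needed to detect the carrier-frequency difference."""
    if rr == 1.0:
        raise ValueError("relative risk of 1 implies no difference to detect")
    # Assumption: Hardy-Weinberg proportions give the carrier frequency.
    p0 = 1.0 - (1.0 - allele_freq) ** 2
    p1 = carrier_freq_in_cases(p0, rr)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test (assumed)
    z_power = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2.0
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_power * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    # Frequencies and relative risks span the ranges named in the abstract.
    for freq in (0.001, 0.0001, 0.00001):
        for rr in (20, 5, 1.5):
            print(f"allele freq {freq}, RR {rr}: {cases_needed(freq, rr)} cases")
```

Under these assumptions the required sample grows sharply as variants become rarer or less penetrant, which is the qualitative pattern the abstract describes.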
The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
imputation; genome-wide association; eMERGE; electronic health records
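The allelic R2 metric mentioned above is described as an estimated correlation between imputed and true genotypes. When true genotypes are available for a variant (for example, from masked array genotypes, which is an assumption of this sketch), the squared Pearson correlation between imputed dosages and true allele counts can be computed directly. Imputation software typically estimates this quantity without access to the truth, so the function below, with the hypothetical name `allelic_r2`, illustrates the concept rather than any package's internal estimator.

```python
def allelic_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed allele dosages (0.0-2.0)
    and true allele counts (0, 1, 2) for one variant."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least two samples")
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(imputed_dosages, true_genotypes)
    )
    if sxx == 0 or syy == 0:
        # A monomorphic variant has no variance, so the correlation is undefined.
        raise ValueError("correlation undefined for a variant with no variation")
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical dosages for four samples at a single variant.
    print(round(allelic_r2([0.1, 0.9, 1.8, 1.2], [0, 1, 2, 1]), 3))
```

Computing this per variant and grouping by minor allele frequency would give the kind of R2-versus-frequency relationship the abstract mentions.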
Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85×10⁻¹⁷, β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08×10⁻⁶, β = −0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = −0.09), VEGFA (rs11755845 p = 0.01, β = −0.13), and NFIA (rs334699 p = 1.50×10⁻³, β = −0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study. These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.
Background. Paraoxonase 1 (PON1) enzymatic activity has been consistently predictive of cardiovascular disease, while the genotypes at the four functional polymorphisms at PON1 have not. The goal of this study was to identify additional variation at the PON gene cluster that improved prediction of PON1 activity and determine if these variants predict carotid artery disease (CAAD). Methods. We considered 1,328 males in a CAAD cohort. 51 tagging single-nucleotide polymorphisms (tag SNPs) across the PON cluster were evaluated to determine their effects on PON1 activity and CAAD status. Results. Six SNPs (four in PON1 and one each in PON2/3) predicted PON1 arylesterase (AREase) activity, in addition to the four previously known functional SNPs. In total, the 10 SNPs explained 30.1% of AREase activity, 5% of which was attributable to the six identified predictive SNPs. We replicate rs854567 prediction of 2.3% of AREase variance, the effects of rs3917510, and a PON3 haplotype that includes rs2375005. While AREase activity strongly predicted CAAD, none of the 10 SNPs predicting AREase predicted CAAD. Conclusions. This study identifies new genetic variants that predict additional PON1 AREase activity. Identification of SNPs associated with PON1 activity is required when evaluating the many phenotypes associated with genetic variation near PON1.
Combining samples across multiple cohorts in large-scale scientific research programs is often required to achieve the necessary power for genome-wide association studies. Controlling for genomic ancestry through principal component analysis (PCA) to address the effect of population stratification is a common practice. In addition to local genomic variation, such as copy number variation and inversions, other factors directly related to combining multiple studies, such as platform and site recruitment bias, can drive the correlation patterns in PCA. In this report, we describe the combination and analysis of a multi-ethnic cohort with biobanks linked to electronic health records for large-scale genomic association discovery analyses. First, we outline the observed site and platform bias, in addition to ancestry differences. Second, we outline a general protocol for selecting variants for input into the subject variance-covariance matrix, the conventional PCA approach. Finally, we introduce an alternative approach to PCA by deriving components from subject loadings calculated from a reference sample. This alternative approach of generating principal components controlled for site and platform bias, in addition to ancestry differences, has the advantage of fewer covariates and degrees of freedom.
principal component analysis; ancestry; biobank; loadings; genetic association study
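One possible reading of the reference-based approach described above is to estimate loadings on a reference sample and then project study subjects onto them, so that study-specific site or platform effects do not shape the axes. The sketch below follows that reading and is not the authors' protocol: it finds only the first component by power iteration, centers study genotypes on reference means, and uses a tiny invented genotype matrix. All names are hypothetical.

```python
import math
import random


def first_loading(reference, iterations=200, seed=0):
    """Return (snp_means, loading): reference SNP means and the first
    principal-axis loading vector, found by power iteration on X^T X."""
    n_snps = len(reference[0])
    means = [sum(row[j] for row in reference) / len(reference) for j in range(n_snps)]
    x = [[g - m for g, m in zip(row, means)] for row in reference]

    # Seeded random start; a start orthogonal to the top axis would not converge.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    for _ in range(iterations):
        scores = [sum(xi * vi for xi, vi in zip(row, v)) for row in x]  # X v
        v = [sum(scores[i] * x[i][j] for i in range(len(x))) for j in range(n_snps)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0:
            raise ValueError("reference genotypes have no variation")
        v = [c / norm for c in v]
    return means, v


def project(samples, means, loading):
    """Score study samples on the reference-derived axis."""
    return [
        sum((g - m) * l for g, m, l in zip(row, means, loading))
        for row in samples
    ]


if __name__ == "__main__":
    # Invented reference genotypes (rows: subjects, columns: SNPs coded 0/1/2)
    # drawn from two clearly separated groups.
    reference = [[0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0], [2, 2, 0, 1]]
    means, loading = first_loading(reference)
    print(project([[0, 0, 2, 1], [2, 1, 0, 0]], means, loading))
```

The sign of a principal component is arbitrary, so only the relative positions of the projected scores are meaningful.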
About half of malignant hyperthermia (MH) cases are associated with skeletal muscle ryanodine receptor 1 (RYR1) and calcium channel, voltage-dependent, L type, α1S subunit (CACNA1S) gene mutations, leaving many with an unknown cause. We chose to apply a sequencing approach to uncover causal variants in unknown cases. Sequencing the exome, the protein-coding region of the genome, has power at low sample sizes and has identified the cause of over a dozen Mendelian disorders.
We considered four families with multiple MH cases but in whom no mutations in RYR1 and CACNA1S had been identified by Sanger sequencing of complementary DNA. Exome sequences of two affected individuals per family, chosen for maximum genetic distance, were compared. Variants were ranked by allele frequency, protein change, and measures of conservation among mammals to assess likelihood of causation. Finally, putative pathogenic mutations were genotyped in other family members to verify cosegregation with MH.
Exome sequencing revealed 1 rare RYR1 nonsynonymous variant in each of 3 families (Asp1056His, Val2627Met, Val4234Leu), and 1 CACNA1S variant (Thr1009Lys) in a 4th family. These were not seen in variant databases or in our control population sample of 5379 exomes. Follow-up sequencing in other family members verified cosegregation of alleles with MH.
Using both exome sequencing and allele frequency data from large sequencing efforts may aid genetic diagnosis of MH. In our sample, exome sequencing was more sensitive for variant detection in known genes than Sanger sequencing of complementary DNA, and it allows for the possibility of novel gene discovery.
Background and Purpose
Lipoprotein(a) level (Lp(a)) is an established risk factor for coronary artery disease and has been implicated in carotid artery disease (CAAD). The relationship between genetic variation in the LPA gene region and CAAD risk remains unknown.
We genotyped single nucleotide polymorphisms (SNPs) in the LPAL2, LPA, and PLG region in 530 individuals with severe CAAD and 770 controls and kringle IV type 2 (KIV2) repeat length in a subset of 90 individuals.
Nine SNPs collectively accounted for 30% of the variance in Lp(a) level. Six SNPs were associated with Lp(a) level after accounting for KIV2 copy number, and the dominant KIV2 allele combined with these markers explained 60% of the variance in Lp(a) level. Five SNPs, including rs10455872, which had an odds ratio of 2.1 per minor allele, and haplotypes formed by rs10455872, rs6919346, and rs3123629 were significant predictors of CAAD. After accounting for Lp(a) level, all evidence of CAAD-genotype association in the LPA region was eliminated.
LPA region SNPs capture some but not all of the effect of KIV2 repeat length on Lp(a) level. There are associations between LPA region SNPs and CAAD which appear to be due to effects on Lp(a) level.
Carotid stenosis; atherosclerosis; lipoprotein(a); genomics; risk factors
Whole exome and whole genome sequencing are applications of next generation sequencing transforming clinical care, but there is little evidence whether these tests improve patient outcomes or if they are cost effective compared to current standard of care. These gaps in knowledge can be addressed by comparative effectiveness and patient-centered outcomes research. We designed a randomized controlled trial that incorporates these research methods to evaluate whole exome sequencing compared to usual care in patients being evaluated for hereditary colorectal cancer and polyposis syndromes. Approximately 220 patients will be randomized and followed for 12 months after return of genomic findings. Patients will receive findings associated with colorectal cancer in a first return of result visit, and findings not associated with colorectal cancer (incidental findings) during a second return of result visit. The primary outcome is efficacy to detect mutations associated with these syndromes; secondary outcomes include psychosocial impact, cost-effectiveness and comparative costs. The secondary outcomes will be obtained via surveys before and after each return visit. The expected challenges in conducting this randomized controlled trial include the relatively low prevalence of genetic disease, difficult interpretation of some genetic variants, and uncertainty about which incidental findings should be returned to patients. The approaches utilized in this study may help guide other investigators in clinical genomics to identify useful outcome measures and strategies to address comparative effectiveness questions about the clinical implementation of genomic sequencing in clinical care.
Comparative effectiveness research; Genomics; Next generation sequencing; Randomized clinical trial; Outcomes research; Whole exome sequencing
Recent data suggest that an increased level of high-density lipoprotein cholesterol (HDL-C) is not causally protective against heart disease, shifting focus to other sub-phenotypes of HDL. Prior work on the effects of dietary intakes has focused largely on HDL-C. The goal of this study was to identify the dietary intakes that affect HDL-related measures: HDL-C, HDL-2, HDL-3, and apoA1 using data from a carotid artery disease case–control cohort.
A subset of 1,566 participants with extensive lipid phenotype data completed the Harvard Standardized Food Frequency Questionnaire to determine their daily micronutrient intake over the past year. Stepwise linear regression was used to separately evaluate the effects of dietary covariates on adjusted levels of HDL-C, HDL-2, HDL-3, and apoA1.
Dietary folate intake was positively associated with HDL-C (p = 0.007), HDL-2 (p = 0.0011), HDL-3 (p = 0.0022), and apoA1 (p = 0.001). Alcohol intake and myristic acid (14:0), a saturated fat, were each significantly associated with increased levels of all HDL-related measures studied. Dietary carbohydrate and iron intake were significantly associated with decreased levels of all HDL-related measures. Magnesium intake was positively associated with HDL-C, HDL-2, and HDL-3 levels, but not apoA1 levels, while vitamin C was only associated with apoA1 levels. Dietary fiber and protein intake were both associated with HDL-3 levels alone.
This study is the first to report that dietary folate intake is associated with HDL-C, HDL-2, HDL-3, and apoA1 levels in humans. We further identify numerous dietary intake associations with apoA1, HDL-2, and HDL-3 levels. Given the shifting focus away from HDL-C, these data will prove valuable for future epidemiologic investigation of the role of diet and multiple HDL phenotypes in heart disease.
HDL; HDL-C; HDL-2; HDL-3; Apolipoprotein A1; HDL subfractions; Folate; Alcohol; Fatty acids; Magnesium; Food frequency questionnaire; Cardiovascular disease
Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few common variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: 1) identify clinically valid genetic variants; 2) decide whether they are actionable and what the action should be; and 3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.
genomic medicine; clinical actionability; database; electronic health records (EHR); pharmacogenomics; DNA sequencing
Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% confidence interval [CI] = 1.11–1.24, p = 2.10 × 10⁻⁹) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08–1.21, p = 2.34 × 10⁻⁶). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07–1.22, p = 3.33 × 10⁻⁵); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74–0.91, p = 5.41 × 10⁻⁵) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
PheWAS; genetic association; pleiotropy; Exome chip; FTO; BMI
NIH has required that genome-wide association study data be deposited in the Database of Genotypes and Phenotypes (dbGaP) since 2003. In 2013, a proposed updated policy extended this requirement to next-generation sequencing data. The underlying ethos of dbGaP is that access to these data by secondary data analysts facilitates advancement of science. However, recent literature and anecdotal reports suggest lingering logistical and ethical concerns about subject identifiability, informed consent, publication embargo enforcement, and difficulty in accessing dbGaP data. We surveyed the International Genetic Epidemiology Society (IGES) membership about their experiences. One hundred seventy-five (175) individuals completed the survey, a response rate of 27%. Of respondents who received data from dbGaP (43%), only 32% perceived the application process as easy but most (75%) received data within five months. Remaining challenges include difficulty in identifying an institutional signing official and an overlong application process. Only 24% of respondents had contributed data to dbGaP. Of these, 31% reported local IRB restrictions on data release; an additional 15% had to reconsent study participants before depositing data. The majority of respondents (56%) disagreed that the publication embargo period was sufficient. In response, we recommend longer embargo periods and use of varied data-sharing models rather than a one-size-fits-all approach.
data sharing; identifiability; GWAS; ELSI; ethics; publication embargo; collaboration
The return of individual results to research participants has been vigorously debated. Consensus statements indicate that researchers and bioethicists consider the return of research results most appropriate when the findings are clinically relevant. Even when clinical utility is the motivator, however, the return of individual research results is not equivalent to clinical care. There are important differences in the domains of research and medical care, both from a legal standpoint and in terms of the ethical responsibilities of clinicians and researchers. As a corollary, researchers risk promoting a therapeutic misconception if they create quasi-clinical settings for return of clinically relevant research results. Rather, efforts should be focused on clarity in the provision of research results, appropriate caveats and, most important, appropriate referrals when the results may be helpful to consider in medical care.
Genetic research results; medical practice; CLIA; HIPAA
PON1 is a key component of high-density lipoproteins (HDLs) and is at least partially responsible for HDL's antioxidant/atheroprotective properties. PON1 is also associated with numerous human diseases, including cardiovascular disease, Parkinson's disease and cancer. In addition, PON1 metabolizes a broad variety of substrates, including toxic organophosphorus compounds, statin adducts, glucocorticoids, the likely atherogenic L-homocysteine thiolactone and the quorum-sensing factor of Pseudomonas aeruginosa. Numerous cardiovascular and antidiabetic pharmacologic agents, dietary macronutrients, lifestyle factors and antioxidant supplements affect PON1 expression and enzyme activity levels. Owing to the importance of PON1 to HDL function and its individual association with diverse human diseases, pharmacogenomic interactions between PON1 and the various factors that alter its expression and activity may represent an important therapeutic target for future investigation.
antioxidants; cardiovascular disease; drug interactions; gene-by-environment interactions; oxidative stress; paraoxonase; pharmacogenetics; pharmacogenomics; PON1; statins
The electronic MEdical Records & GEnomics (eMERGE) network was established in 2007 by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) in part to explore the utility of electronic medical records (EMRs) in genome science. The initial focus was on discovery, primarily using the genome-wide association paradigm, but more recently the network has begun evaluating mechanisms to implement new genomic information coupled to clinical decision support into EMRs. Herein, we describe this evolution, including the development of the individual and merged eMERGE genomic datasets, the contribution the network has made toward genomic discovery and human health, and the steps taken toward next-generation genotype-phenotype association studies and clinical implementation.
biobanks; genome-wide association studies; pharmacogenomics; electronic medical records
With white blood cell count emerging as an important risk factor for chronic inflammatory diseases, genetic associations of differential leukocyte types, specifically monocyte count, are providing novel candidate genes and pathways to further investigate. Circulating monocytes play a critical role in vascular diseases such as in the formation of atherosclerotic plaque. We performed joint and ancestry-stratified genome-wide association analyses to identify variants specifically associated with monocyte count in 11,014 subjects in the electronic Medical Records and Genomics Network. In the joint and European ancestry samples, we identified novel associations in the chromosome 16 interferon regulatory factor 8 (IRF8) gene (P-value = 2.78 × 10⁻¹⁶, β = −0.22). Other monocyte associations include novel missense variants in the chemokine-binding protein 2 (CCBP2) gene (P-value = 1.88 × 10⁻⁷, β = 0.30) and a region of replication found in ribophorin I (RPN1) (P-value = 2.63 × 10⁻¹⁶, β = −0.23) on chromosome 3. The CCBP2 and RPN1 region is located near the GATA binding protein 2 gene, which has been previously shown to be associated with coronary heart disease. On chromosome 9, we found a novel association in the prostaglandin reductase 1 gene (P-value = 2.29 × 10⁻⁷, β = 0.16), which is downstream from lysophosphatidic acid receptor 1. This region has previously been shown to be associated with monocyte count. We also replicated monocyte associations of genome-wide significance (P-value = 5.68 × 10⁻¹⁷, β = −0.23) at the integrin, alpha 4 gene on chromosome 2. The novel IRF8 results and further replications provide supporting evidence of genetic regions associated with monocyte count.