The best-documented example of transmission distortion (TD) to normal offspring is provided by the t haplotypes on mouse chromosome 17. In healthy humans, TD has been described for whole chromosomes and for particular loci, but multiple comparisons have presented a statistical obstacle in wide-ranging analyses. Here we provide six high-resolution TD maps of the short arm of human chromosome 6 (Hsa6p), based on single-nucleotide polymorphism (SNP) data from 60 trio families belonging to two ethnicities that are available through the International HapMap Project. We tested all approximately 70 000 previously genotyped SNPs within Hsa6p by the transmission disequilibrium test. TagSNP selection followed by permutation testing was performed to adjust for multiple testing. Statistically significant evidence for TD was observed among male parents of European ancestry, due to strong and wide-ranging skewed segregation in a 730-kb region containing the transcription factor-encoding genes SUPT3H and RUNX2, as well as the microRNA locus MIRN586. We also observed that this chromosomal segment coincides with pronounced linkage disequilibrium (LD), suggesting a relationship between TD and LD. The fact that TD may be taking place in samples not selected for a genetic disease implies that linkage studies must be assessed with particular caution in chromosomal segments with evidence of TD.
transmission distortion; linkage disequilibrium; human chromosome 6p; SUPT3H; MIRN586; RUNX2
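The transmission disequilibrium test named in the abstract above can be sketched for a single SNP as follows. This is a minimal illustration of the standard McNemar-type statistic, not the authors' pipeline: the function names and the counts are hypothetical, and the p-value is nominal, that is, it does not include the tagSNP selection and permutation-based multiple-testing adjustment the study describes.

```python
import math


def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """McNemar-type TDT chi-square statistic (1 degree of freedom).

    Both counts refer to one allele of a SNP, tallied over heterozygous
    parents only: how often it was passed on versus withheld.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no heterozygous (informative) parents")
    return (transmitted - untransmitted) ** 2 / informative


def chi2_pvalue_1df(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical counts for illustration; not taken from the HapMap data.
stat = tdt_statistic(transmitted=60, untransmitted=40)
p_value = chi2_pvalue_1df(stat)
print(f"TDT chi-square = {stat:.2f}, nominal p = {p_value:.4f}")
```

Restricting the tally to transmissions from fathers or from mothers would give parent-specific statistics of the kind the abstract reports for male parents.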
The analysis of genome-wide genetic association studies generally starts with univariate statistical tests of each single-nucleotide polymorphism. The standard approach is the Cochran-Armitage trend test or its logistic regression equivalent, although this approach can lose considerable power if the underlying genetic model is not additive. An alternative is the MAX test, which is robust against the three basic modes of inheritance. Here, the asymptotic distribution of the MAX test is derived using the generalized linear model together with the Delta method and multiple contrasts. The approach is applicable to binary, quantitative, and survival traits. It may be used for unrelated individuals, family-based studies, and matched pairs. The approach provides point and interval effect estimates and allows selecting the most plausible genetic model using the minimum P-value. R code is provided. A Monte-Carlo simulation study shows that the asymptotic MAX test framework meets type I error levels well, has good power, and good model selection properties for minor allele frequencies ≥0.3. Pearson's χ²-test is superior for lower minor allele frequencies with low frequencies for the rare homozygous genotype. In these cases, the model selection procedure should be used with caution. The use of the MAX test is illustrated by reanalyzing findings from seven genome-wide association studies including case–control, matched pairs, and quantitative trait data.
family-based association; genetic association; genome-wide association; indirect mapping; MAX test
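As a rough illustration of the idea behind the MAX test, the sketch below computes the classic Cochran-Armitage trend statistic for a case-control genotype table under recessive, additive, and dominant scores and keeps the largest absolute value. It is an assumption-laden simplification: it does not reproduce the generalized linear model and multiple-contrast derivation described in the abstract, all names and counts are hypothetical, and the resulting maximum must not be compared with a standard normal distribution, because the three statistics are correlated.

```python
import math

# Genotype scores for (AA, Aa, aa), where a is the putative risk allele.
MODELS = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 1.0, 2.0),
    "dominant": (0.0, 1.0, 1.0),
}


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend Z statistic for a 2 x 3 genotype table."""
    totals = [r + s for r, s in zip(cases, controls)]
    n, n_cases = sum(totals), sum(cases)
    n_controls = n - n_cases
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * t for x, t in zip(scores, totals))
    sum_x2n = sum(x * x * t for x, t in zip(scores, totals))
    numerator = n * sum_xr - n_cases * sum_xn
    variance = n_cases * n_controls / n * (n * sum_x2n - sum_xn ** 2)
    if variance <= 0:
        # Degenerate table, e.g. no rare homozygotes under the recessive model.
        return 0.0
    return numerator / math.sqrt(variance)


def max_statistic(cases, controls):
    """Return (model with the largest |Z|, that |Z|) over the three models."""
    z_by_model = {m: trend_z(cases, controls, s) for m, s in MODELS.items()}
    best = max(z_by_model, key=lambda m: abs(z_by_model[m]))
    return best, abs(z_by_model[best])


# Hypothetical genotype counts (AA, Aa, aa) for cases and controls.
model, z_max = max_statistic(cases=(30, 50, 20), controls=(50, 40, 10))
print(f"selected model: {model}, MAX statistic: {z_max:.3f}")
```

Because each Z statistic is asymptotically standard normal under the null hypothesis, choosing the largest absolute Z is equivalent to choosing the model with the minimum single-model p-value.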
Biomarkers are considered as tools to enhance cardiovascular risk estimation. However, the value of biomarkers for risk estimation beyond European risk scores, their comparative impact among different European regions and their role towards personalised medicine remain uncertain. Biomarker for Cardiovascular Risk Assessment in Europe (BiomarCaRE) is a European collaborative research project with the primary objective to assess the value of established and emerging biomarkers for cardiovascular risk prediction. BiomarCaRE integrates clinical and epidemiological biomarker research and commercial enterprises throughout Europe to combine innovation in biomarker discovery for cardiovascular disease prediction with consecutive validation of biomarker effectiveness in large, well-defined primary and secondary prevention cohorts including over 300,000 participants from 13 European countries. Results from this study will contribute to improved cardiovascular risk prediction across different European populations. The present publication describes the rationale and design of the BiomarCaRE project.
Electronic supplementary material
The online version of this article (doi:10.1007/s10654-014-9952-x) contains supplementary material, which is available to authorized users.
BiomarCaRE; Biomarker; Cardiovascular Risk Assessment; MORGAM; EU
The advent of next generation sequencing (NGS) technologies enabled the investigation of the rare variant-common disease hypothesis in unrelated individuals, even on the genome-wide level. Analysis of this hypothesis requires tailored statistical methods as single marker tests fail on rare variants. An entire class of statistical methods collapses rare variants from a genomic region of interest (ROI), thereby aggregating rare variants. In an extensive simulation study using data from the Genetic Analysis Workshop 17 we compared the performance of 15 collapsing methods by means of a variety of pre-defined ROIs regarding minor allele frequency thresholds and functionality. Findings of the simulation study were additionally confirmed by a real data set investigating the association between methotrexate clearance and the SLCO1B1 gene in patients with acute lymphoblastic leukemia. Our analyses showed substantially inflated type I error levels for many of the proposed collapsing methods. Only four approaches yielded valid type I errors in all considered scenarios. None of the statistical tests was able to detect true associations over a substantial proportion of replicates in the simulated data. Detailed annotation of functionality of variants is crucial to detect true associations. These findings were confirmed in the analysis of the real data. Recent theoretical work showed that large power is achieved in gene-based analyses only if large sample sizes are available and a substantial proportion of causal rare variants is present in the gene-based analysis. Many of the investigated statistical approaches rely on permutation, which incurs high computational cost. There is a clear need for valid, powerful and fast-to-calculate test statistics for studies investigating rare variants.
collapsing; rare variants; simulation study; comparison; burden test; SLCO1B1
The mitochondrial m.1555A>G mutation is associated with a high rate of permanent hearing loss, if aminoglycosides are given. Preterm infants have an increased risk of permanent hearing loss and are frequently treated with aminoglycoside antibiotics.
We genotyped preterm infants with a birth weight below 1500 grams who were prospectively enrolled in a large cohort study for the m.1555A>G mutation. Treatment with aminoglycoside antibiotics in combination with mitochondrial m.1555A>G mutation was tested as a predictor for failed hearing screening at discharge in a multivariable logistic regression analysis.
7056 infants were genotyped and analysed. Low birth weight was the most significant predictor of failed hearing screening (p = 7.3 × 10⁻¹⁰). 12 infants (0.2%) had the m.1555A>G-mutation. In a multivariable logistic regression analysis, the combination of aminoglycoside treatment with m.1555A>G-carrier status was associated with failed hearing screening (p = 0.0058). However, only 3 out of 10 preterm m.1555A>G-carriers who were exposed to aminoglycosides failed hearing screening. The m.1555A>G-mutation was detected in all mothers of m.1555A>G-positive children, but in none of 2993 maternal DNA-samples of m.1555A>G-negative infants.
Antenatal screening for the m.1555A>G mutation by maternal genotyping of pregnant women with preterm labour might be a reasonable approach to identify infants who are at increased risk for permanent hearing loss. Additional studies are needed to estimate the relevance of cofactors like aminoglycoside plasma levels and birth weight and the number of preterm m.1555A>G-carriers with permanent hearing loss.
Newborn; Screening; Hearing loss; Mitochondrial; Mutation
Sequencing technologies have enabled the investigation of whole genomes of many individuals in parallel. Studies have shown that the joint consideration of multiple rare variants may explain a relevant proportion of the genetic basis for disease so that grouping of rare variants, termed collapsing, can enrich the association signal.
Following this assumption, we investigate the type I error and the power of two proposed collapsing methods (combined multivariate and collapsing method and the functional principal component analysis [FPCA]-based statistic) using the case-control data provided for the Genetic Analysis Workshop 18 with knowledge of the true model. Variants with a minor allele frequency (MAF) of 0.05 or less were collapsed per gene for combined multivariate and collapsing. Neither of the methods detected any of the truly associated genes reliably. Although combined multivariate and collapsing identified one gene with a power of 0.66, it had an unacceptably high false-positive rate of 75%. In contrast, FPCA covered the type I error level well but at the cost of low power. A strict filtering of variants by small MAF might lead to a better performance of the collapsing methods. Furthermore, the inclusion of information on functionality of the variants could be helpful.
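The collapsing step described above, pooling variants with a MAF of 0.05 or less within a gene, can be sketched as follows. This is a simplified illustration under stated assumptions: genotypes are coded as counts (0, 1, 2) of the minor allele, the function names are hypothetical, and only the collapsing indicator is shown. The full combined multivariate and collapsing method goes on to test this indicator jointly with the common variants, which is not reproduced here.

```python
def minor_allele_frequency(genotypes):
    """MAF of one variant from per-individual minor-allele counts (0/1/2)."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_rare_variants(variants, maf_threshold=0.05):
    """Collapse the rare variants of one gene into a carrier indicator.

    variants: list of variants, each a list of per-individual allele counts.
    Returns one 0/1 value per individual: 1 if the individual carries at
    least one minor allele at any variant with 0 < MAF <= maf_threshold.
    """
    rare = [
        v for v in variants
        if 0 < minor_allele_frequency(v) <= maf_threshold
    ]
    n_individuals = len(variants[0])
    return [
        int(any(v[i] > 0 for v in rare))
        for i in range(n_individuals)
    ]


# Hypothetical gene with three variants typed in 20 individuals.
rare_a = [1] + [0] * 19            # MAF = 1/40 = 0.025, collapsed
common_b = [1, 0] * 10             # MAF = 10/40 = 0.25, left out
rare_c = [0, 0, 0, 1] + [0] * 16   # MAF = 0.025, collapsed
carriers = collapse_rare_variants([rare_a, common_b, rare_c])
print("rare-variant carriers:", sum(carriers), "of", len(carriers))
```

Lowering `maf_threshold` corresponds to the stricter MAF filtering that the abstract suggests might improve performance.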
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
Hypertension is a risk factor for coronary artery disease. Recent genome-wide association studies have identified 30 genetic variants associated with higher blood pressure at genome-wide significance (p < 5 × 10⁻⁸). If elevated blood pressure is a causative factor for coronary artery disease, these variants should also increase coronary artery disease risk. Analyzing genome-wide association data from 22,233 coronary artery disease cases and 64,762 controls, we observed in the Coronary artery disease Genome-Wide Replication And Meta-Analysis (CARDIoGRAM) consortium that 88% of these blood pressure-associated polymorphisms were likewise positively associated with coronary artery disease, i.e. they had an odds ratio >1 for coronary artery disease, a proportion much higher than expected by chance (p = 4 × 10⁻⁵). The average relative coronary artery disease risk increase for each of the multiple blood pressure-raising alleles observed in the consortium was 3.0% for systolic blood pressure-associated polymorphisms (95% confidence interval, 1.8 to 4.3%) and 2.9% for diastolic blood pressure-associated polymorphisms (95% confidence interval, 1.7 to 4.1%). In sub-studies, individuals carrying most systolic blood pressure- and diastolic blood pressure-related risk alleles (top quintile of a genetic risk score distribution) had 70% (95% confidence interval, 50-94%) and 59% (95% confidence interval, 40-81%) higher odds of having coronary artery disease, respectively, as compared to individuals in the bottom quintile. In conclusion, most blood pressure-associated polymorphisms also confer an increased risk for coronary artery disease. These findings are consistent with a causal relationship of increasing blood pressure to coronary artery disease. Genetic variants primarily affecting blood pressure contribute to the genetic basis of coronary artery disease.
Blood pressure; polymorphism; genetics; coronary artery disease
Expression quantitative trait loci (eQTL) studies are performed to identify single-nucleotide polymorphisms that modify average expression values of genes, proteins, or metabolites, depending on the genotype. As expression values are often not normally distributed, statistical methods for eQTL studies should be valid and powerful in these situations. Adaptive tests are promising alternatives to standard approaches, such as the analysis of variance or the Kruskal–Wallis test. In a two-stage procedure, skewness and tail length of the distributions are estimated and used to select one of several linear rank tests. In this study, we compare two adaptive tests that were proposed in the literature using extensive Monte Carlo simulations of a wide range of different symmetric and skewed distributions. We derive a new adaptive test that combines the advantages of both literature-based approaches. The new test does not require the user to specify a distribution. It is slightly less powerful than the locally most powerful rank test for the correct distribution and at least as powerful as the maximin efficiency robust rank test. We illustrate the application of all tests using two examples from different eQTL studies.
adaptive test; eQTL study; gene expression; linear rank test; single-nucleotide polymorphism
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex linkage disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n>100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.
Nowadays, the availability of cheaper and accurate assays to quantify multiple (endo)phenotypes in large population cohorts allows multi-trait studies. However, these studies are limited by the lack of flexible models integrated with efficient computational tools for genome-wide multi SNPs-traits analyses. To overcome this problem, we propose a novel Bayesian analysis strategy and a new algorithmic implementation which exploits parallel processing architecture for fully multivariate modeling of groups of correlated phenotypes at the genome-wide scale. In addition to increased power of our algorithm over alternative Bayesian and well-established non-Bayesian multi-phenotype methods, we provide an application to a real case study of several blood lipid traits, and show how our method recovered most of the major associations and is better at refining multi-trait polygenic associations than alternative methods. We reveal and replicate in independent cohorts new associations with two phenotypic groups that were not detected by competing multivariate approaches and not noticed by a large meta-GWAS. We also discuss the applicability of the proposed method to large meta-analyses involving hundreds of thousands of individuals and to diverse genomic datasets where complex dependencies in the predictor space are present.
This systematic review determines the best known form of biofeedback (BF) and/or electrical stimulation (ES) for the treatment of fecal incontinence in adults and rates the quality of evidence using the Grades of Recommendation, Assessment, Development, and Evaluation. Attention is given to type, strength, and application mode of the current for ES and to safety.
Methods followed the Cochrane Handbook. Randomized controlled trials were included. Studies were searched in The Cochrane Library, MEDLINE, and EMBASE (registration number (PROSPERO): CRD42011001334).
BF and/or ES were studied in 13 randomized parallel-group trials. In 12 trials, at least one therapy group received BF alone and/or in combination with ES, while ES alone was evaluated in seven trials. Three trials were rated as high quality and four as moderate quality. Average current strength was reported in three of seven studies investigating ES; only two studies reached the therapeutic window. No trial showed superiority of control, or of BF alone or of ES alone when compared with BF + ES. Superiority of BF + ES over any monotherapy was demonstrated in several trials. Amplitude-modulated medium-frequency (AM-MF) stimulation, also termed pre-modulated interferential stimulation, combined with BF was superior to both low-frequency ES and BF alone, and 50% of the patients were continent after 6 months of treatment. Effects increased with treatment duration. Safety reporting was poor, and there are safety issues with some forms of low-frequency ES.
There is sufficient evidence for the efficacy of BF plus ES combined in treating fecal incontinence. AM-MF plus BF seems to be the most effective and safe treatment.
• The higher the quality of the randomized trial the more likely was a significant difference between treatment groups.
• Two times more patients became continent when biofeedback was used instead of a control, such as pelvic floor exercises.
• Two times more patients became continent when biofeedback plus electrical stimulation was used instead of biofeedback only.
• Low-frequency electrical stimulation can have adverse device effects, and this is in contrast to amplitude-modulated medium-frequency electrical stimulation.
• There is high quality evidence that amplitude-modulated medium-frequency electrical stimulation plus electromyography biofeedback is the best second-line treatment for fecal incontinence.
Electronic supplementary material
The online version of this article (doi:10.1007/s00384-013-1739-0) contains supplementary material, which is available to authorized users.
Conservative treatment; Biofeedback; Cleveland Clinic score; Electrical stimulation; Fecal incontinence; Meta-analysis
Combined analyses of gene networks and DNA sequence variation can provide new insights into the aetiology of common diseases. Here, we used integrated genome-wide approaches across seven rat tissues to identify gene networks and the loci underlying their regulation. We defined an interferon regulatory factor 7 (IRF7)-driven [1] inflammatory network (iDIN) enriched for viral response genes, which represents a molecular biomarker for macrophages and was regulated in multiple tissues by a locus on rat chromosome 15q25. At this locus, Epstein-Barr virus induced gene 2 (Ebi2 or Gpr183), which we localised to macrophages and is known to control B lymphocyte migration [2,3], regulated the iDIN. The human chromosome 13q32 locus, orthologous to rat 15q25, controlled the human equivalent of iDIN, which was conserved in monocytes. For the macrophage-associated autoimmune disease type 1 diabetes (T1D), iDIN genes were more likely to associate with T1D susceptibility than randomly selected immune response genes (P = 8.85 × 10⁻⁶). The human locus controlling the iDIN was associated with the risk of T1D at SNP rs9585056 (P = 7.0 × 10⁻¹⁰, odds ratio = 1.15), which was one of five SNPs in this region associated with EBI2 expression. These data implicate IRF7 network genes and their regulatory locus in the pathogenesis of T1D.
There is evidence across several species for genetic control of phenotypic variation of complex traits [1–4], such that the variance among phenotypes is genotype dependent. Understanding genetic control of variability is important in evolutionary biology, agricultural selection programmes and human medicine, yet for complex traits, no individual genetic variants associated with variance, as opposed to the mean, have been identified. Here we perform a meta-analysis of genome-wide association studies of phenotypic variation using 170,000 samples on height and body mass index (BMI) in human populations. We report evidence that the single nucleotide polymorphism (SNP) rs7202116 at the FTO gene locus, which is known to be associated with obesity (as measured by mean BMI for each rs7202116 genotype) [5–7], is also associated with phenotypic variability. We show that the results are not due to scale effects or other artefacts, and find no other experiment-wise significant evidence for effects on variability, either at loci other than FTO for BMI or at any locus for height. The difference in variance for BMI among individuals with opposite homozygous genotypes at the FTO locus is approximately 7%, corresponding to a difference of 0.5 kilograms in the standard deviation of weight. Our results indicate that genetic variants can be discovered that are associated with variability, and that between-person variability in obesity can partly be explained by the genotype at the FTO locus. The results are consistent with reported FTO by environment interactions for BMI [8], possibly mediated by DNA methylation [9,10]. Our BMI results for other SNPs and our height results for all SNPs suggest that most genetic variants, including those that influence mean height or mean BMI, are not associated with phenotypic variance, or that their effects on variability are too small to detect even with sample sizes greater than 100,000.
Genetic determinants of peripheral arterial disease (PAD) remain largely unknown. To identify genetic variants associated with the ankle-brachial index (ABI), a noninvasive measure of PAD, we conducted a meta-analysis of genome-wide association study data from 21 population-based cohorts.
Methods and Results
Continuous ABI and PAD (ABI≤0.9) phenotypes adjusted for age and sex were examined. Each study conducted genotyping and imputed data to the ~2.5 million SNPs in HapMap. Linear and logistic regression models were used to test each SNP for association with ABI and PAD using additive genetic models. Study-specific data were combined using fixed-effects inverse variance weighted meta-analyses. There were a total of 41,692 participants of European ancestry (~60% women, mean ABI 1.02 to 1.19), including 3,409 participants with PAD and with GWAS data available. In the discovery meta-analysis, rs10757269 on chromosome 9 near CDKN2B had the strongest association with ABI (β = −0.006, p = 2.46 × 10⁻⁸). We sought replication of the 6 strongest SNP associations in 5 population-based studies and 3 clinical samples (n=16,717). The association for rs10757269 strengthened in the combined discovery and replication analysis (p = 2.65 × 10⁻⁹). No other SNP associations for ABI or PAD achieved genome-wide significance. However, two previously reported candidate genes for PAD and one SNP associated with coronary artery disease (CAD) were associated with ABI: DAB2IP (rs13290547, p = 3.6 × 10⁻⁵); CYBA (rs3794624, p = 6.3 × 10⁻⁵); and rs1122608 (LDLR, p = 0.0026).
GWAS in more than 40,000 individuals identified one genome-wide significant association on chromosome 9p21 with ABI. Two candidate genes for PAD and 1 SNP for CAD are associated with ABI.
cohort study; genetic association; genome-wide association study; meta-analysis; peripheral vascular disease
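The fixed-effects inverse variance weighted meta-analysis used to combine study-specific results in the ABI abstract above can be sketched for one SNP as follows. This is a generic textbook version with a hypothetical function name and made-up per-study estimates; it is not the consortium's software and ignores steps such as genomic control.

```python
import math


def fixed_effects_ivw(betas, standard_errors):
    """Combine per-study effect estimates with inverse-variance weights.

    Returns the pooled beta, its standard error, the Z score and a
    two-sided p-value from the normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-study estimates for one SNP (not data from the study).
beta, se, z, p = fixed_effects_ivw(
    betas=[-0.006, -0.004],
    standard_errors=[0.002, 0.002],
)
print(f"pooled beta = {beta:.4f}, SE = {se:.5f}, Z = {z:.2f}, p = {p:.2e}")
```

Studies with smaller standard errors, typically the larger cohorts, receive proportionally more weight, which is why the pooled standard error shrinks as studies are added.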
In order to assess whether gene expression variability could be influenced by several SNPs acting in cis, either through additive or more complex haplotype effects, a systematic genome-wide search for cis haplotype expression quantitative trait loci (eQTL) was conducted in a sample of 758 individuals, part of the Cardiogenics Transcriptomic Study, for which genome-wide monocyte expression and GWAS data were available. 19,805 RNA probes were assessed for cis haplotypic regulation through investigation of ∼2.1 × 10⁹ haplotypic combinations. 2,650 probes demonstrated haplotypic p-values >10⁴-fold smaller than the best single SNP p-value. Replication of significant haplotype effects was tested for 412 probes for which SNPs (or proxies) that defined the detected haplotypes were available in the Gutenberg Health Study composed of 1,374 individuals. At the Bonferroni correction level of 1.2 × 10⁻⁴ (∼0.05/412), 193 haplotypic signals replicated. 1000G imputation was then conducted, and 105 haplotypic signals still remained more informative than imputed SNPs. In-depth analysis of these 105 cis eQTL revealed that at 76 loci genetic associations were compatible with additive effects of several SNPs, while for the 29 remaining regions data could be compatible with a more complex haplotypic pattern. As 24 of the 105 cis eQTL have previously been reported to be disease-associated loci, this work highlights the need for conducting haplotype-based and 1000G imputed cis eQTL analysis before commencing functional studies at disease-associated loci.
In order to assess whether gene expression variability could be influenced by the presence of more than one cis-acting SNP, we have conducted a systematic genome-wide search for haplotypic cis eQTL effects in a sample of 758 individuals and replicated the findings in an independent sample of 1,374 subjects. In both studies, genome-wide monocyte expression and genotype data were available. We identified 105 genes whose monocyte expression was under the influence of multiple cis-acting SNPs. About 75% of the detected genetic effects were related to independent additive SNP effects and the remaining quarter to more complex haplotype effects. Of note, 24 of the genes identified to be affected by multiple cis eSNPs have been previously reported to reside at disease-associated loci. This could suggest that such multiple locus-specific genetic effects could contribute to the susceptibility to human diseases.
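The arithmetic behind the replication threshold and the reported split of the 105 cis eQTL can be checked with a few lines. This is only a worked calculation using the numbers stated in the two paragraphs above; the helper name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise level at alpha."""
    return alpha / n_tests


# 412 probes were carried forward to replication at a family-wise level of 0.05.
threshold = bonferroni_threshold(alpha=0.05, n_tests=412)
print(f"replication threshold: {threshold:.1e}")

# Split of the 105 loci into additive and more complex haplotypic patterns.
additive, complex_pattern = 76, 29
total = additive + complex_pattern
print(f"additive: {additive / total:.1%}, complex: {complex_pattern / total:.1%}")
```

The split works out to roughly 72% and 28%, in line with the approximate three-quarters and one-quarter quoted in the summary.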
Like human infants, songbirds learn their species-specific vocalizations through imitation learning. The birdsong system has emerged as a widely used experimental animal model for understanding the underlying neural mechanisms responsible for vocal production learning. However, how neural impulses are translated into the precise motor behavior of the complex vocal organ (syrinx) to create song is poorly understood. First and foremost, we lack a detailed understanding of syringeal morphology.
To fill this gap we combined non-invasive (high-field magnetic resonance imaging and micro-computed tomography) and invasive techniques (histology and micro-dissection) to construct the annotated high-resolution three-dimensional dataset, or morphome, of the zebra finch (Taeniopygia guttata) syrinx. We identified and annotated syringeal cartilage, bone and musculature in situ in unprecedented detail. We provide interactive three-dimensional models that greatly improve the communication of complex morphological data and our understanding of syringeal function in general.
Our results show that the syringeal skeleton is optimized for low weight driven by physiological constraints on song production. The present refinement of muscle organization and identity elucidates how apposed muscles actuate different syringeal elements. Our dataset allows for more precise predictions about muscle co-activation and synergies and has important implications for muscle activity and stimulation experiments. We also demonstrate how the syrinx can be stabilized during song to reduce mechanical noise and, as such, enhance repetitive execution of stereotypic motor patterns. In addition, we identify a cartilaginous structure suited to play a crucial role in the uncoupling of sound frequency and amplitude control, which permits a novel explanation of the evolutionary success of songbirds.
Coronary artery calcification (CAC) detected by computed tomography is a non-invasive measure of coronary atherosclerosis, which underlies most cases of myocardial infarction (MI). We aimed to identify common genetic variants associated with CAC and further investigate their associations with MI.
Methods and Results
Computed tomography was used to assess quantity of CAC. A meta-analysis of genome-wide association studies for CAC was carried out in 9,961 men and women from five independent community-based cohorts, with replication in three additional independent cohorts (n=6,032). We examined the top single nucleotide polymorphisms (SNPs) associated with CAC quantity for association with MI in multiple large genome-wide association studies of MI. Genome-wide significant associations with CAC for SNPs on chromosome 9p21 near CDKN2A and CDKN2B (top SNP: rs1333049, P = 7.58 × 10⁻¹⁹) and 6p24 (top SNP: rs9349379, within the PHACTR1 gene, P = 2.65 × 10⁻¹¹) replicated for CAC and for MI. Additionally, there is evidence for concordance of SNP associations with both CAC and with MI at a number of other loci, including 3q22 (MRAS gene), 13q34 (COL4A1/COL4A2 genes), and 1p13 (SORT1 gene).
SNPs in the 9p21 and PHACTR1 gene loci were strongly associated with CAC and MI, and there are suggestive associations with both CAC and MI of SNPs in additional loci. Multiple genetic loci are associated with development of both underlying coronary atherosclerosis and clinical events.
cardiac computed tomography; coronary artery calcification; coronary atherosclerosis; genome-wide association studies; myocardial infarction
Microarray profiling of gene expression is widely applied in molecular biology and functional genomics. Experimental and technical variations make meta-analysis of different studies challenging. In a total of 3358 samples, all from German population-based cohorts, we investigated the effect of data preprocessing and the variability due to sample processing in whole blood cell and blood monocyte gene expression data, measured on the Illumina HumanHT-12 v3 BeadChip array.
Gene expression signal intensities were similar after applying the log2 or the variance-stabilizing transformation. In all cohorts, the first principal component (PC) explained more than 95% of the total variation. Technical factors substantially influenced signal intensity values, especially the Illumina chip assignment (33–48% of the variance), the RNA amplification batch (12–24%), the RNA isolation batch (16%), and the sample storage time, in particular the time between blood donation and RNA isolation for the whole blood cell samples (2–3%), and the time between RNA isolation and amplification for the monocyte samples (2%). White blood cell composition parameters were the strongest biological factors influencing the expression signal intensities in the whole blood cell samples (3%), followed by sex (1–2%) in both sample types. Known single nucleotide polymorphisms (SNPs) were located in 38% of the analyzed probe sequences and 4% of them included common SNPs (minor allele frequency >5%). Out of the tested SNPs, 1.4% significantly modified the probe-specific expression signals (Bonferroni corrected p-value<0.05), but in almost half of these events the signal intensities were even increased despite the occurrence of the mismatch. Thus, the vast majority of SNPs within probes had no significant effect on hybridization efficiency.
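The Bonferroni correction mentioned above can be sketched as follows. This is a minimal illustration assuming the standard definition (each p-value multiplied by the number of tests and capped at 1); the function names are hypothetical and the abstract does not describe the authors' implementation.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: p * number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def significant_fraction(p_values, alpha=0.05):
    """Fraction of tests whose adjusted p-value falls below alpha."""
    adjusted = bonferroni_adjust(p_values)
    return sum(1 for p in adjusted if p < alpha) / len(adjusted)
```

Applied to the raw p-values of all tested SNP-probe pairs, `significant_fraction` would give the kind of proportion reported in the text (1.4% at an adjusted threshold of 0.05).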
In summary, adjustment for a few selected technical factors greatly improved reliability of gene expression analyses. Such adjustments are particularly required for meta-analyses.
Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research.
To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics.
On the occasion of the 50th year of Methods of Information in Medicine, a symposium was organized that reflected on opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field.
The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics and bioinformatics aspects of genetic epidemiology.
Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers.
Biomedical informatics; data mining; data analysis; data-driven methods; translational bioinformatics
We aimed to assess whether pri-miRNA SNPs (miSNPs) could influence monocyte gene expression, either through marginal association or by interacting with polymorphisms located in 3'UTR regions (3utrSNPs). We then conducted a genome-wide search for marginal miSNP effects and pairwise miSNP × 3utrSNP interactions in a sample of 1,467 individuals for which genome-wide monocyte expression and genotype data were available. Statistical associations that survived multiple testing correction were tested for replication in an independent sample of 758 individuals with both monocyte gene expression and genotype data. In both studies, the hsa-mir-1279 rs1463335 was found to modulate in cis the expression of LYZ and in trans the expression of CNTN6, CTRC, COPZ2, KRT9, LRRFIP1, NOD1, PCDHA6, ST5 and TRAF3IP2 genes, supporting the role of hsa-mir-1279 as a regulator of several genes in monocytes. In addition, we identified two robust miSNP × 3utrSNP interactions, one involving HLA-DPB1 rs1042448 and hsa-mir-219-1 rs107822, the other involving H1F0 rs1894644 and hsa-mir-659 rs5750504, modulating the expression of the associated genes.
As some of the aforementioned genes have previously been reported to reside at disease-associated loci, our findings provide novel arguments supporting the hypothesis that the genetic variability of miRNAs could also contribute to the susceptibility to human diseases.
High plasma HDL cholesterol is associated with reduced risk of myocardial infarction, but whether this association is causal is unclear. Exploiting the fact that genotypes are randomly assigned at meiosis, are independent of non-genetic confounding, and are unmodified by disease processes, mendelian randomisation can be used to test the hypothesis that the association of a plasma biomarker with disease is causal.
We performed two mendelian randomisation analyses. First, we used as an instrument a single nucleotide polymorphism (SNP) in the endothelial lipase gene (LIPG Asn396Ser) and tested this SNP in 20 studies (20 913 myocardial infarction cases, 95 407 controls). Second, we used as an instrument a genetic score consisting of 14 common SNPs that exclusively associate with HDL cholesterol and tested this score in up to 12 482 cases of myocardial infarction and 41 331 controls. As a positive control, we also tested a genetic score of 13 common SNPs exclusively associated with LDL cholesterol.
Carriers of the LIPG 396Ser allele (2·6% frequency) had higher HDL cholesterol (0·14 mmol/L higher, p=8×10^−13) but similar levels of other lipid and non-lipid risk factors for myocardial infarction compared with non-carriers. This difference in HDL cholesterol is expected to decrease risk of myocardial infarction by 13% (odds ratio [OR] 0·87, 95% CI 0·84–0·91). However, we noted that the 396Ser allele was not associated with risk of myocardial infarction (OR 0·99, 95% CI 0·88–1·11, p=0·85). From observational epidemiology, an increase of 1 SD in HDL cholesterol was associated with reduced risk of myocardial infarction (OR 0·62, 95% CI 0·58–0·66). However, a 1 SD increase in HDL cholesterol due to genetic score was not associated with risk of myocardial infarction (OR 0·93, 95% CI 0·68–1·26, p=0·63). For LDL cholesterol, the estimate from observational epidemiology (a 1 SD increase in LDL cholesterol associated with OR 1·54, 95% CI 1·45–1·63) was concordant with that from genetic score (OR 2·13, 95% CI 1·69–2·69, p=2×10^−10).
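The comparison between expected and observed odds ratios can be illustrated with a short sketch. It assumes that the observational association is log-linear in HDL cholesterol, so that a shift of d SD corresponds to an odds ratio of OR_per_SD raised to the power d. The helper names are hypothetical, and the back-calculated SD shift is an inference from the quoted figures, not a number stated in the text.

```python
import math

def expected_or(or_per_sd, delta_sd):
    """Expected OR for a biomarker shift of delta_sd SDs, assuming log-linearity."""
    return math.exp(math.log(or_per_sd) * delta_sd)

def implied_delta_sd(or_per_sd, target_or):
    """SD shift that would produce target_or under the same log-linear assumption."""
    return math.log(target_or) / math.log(or_per_sd)

def ci_contains(lower, upper, value):
    """True if value lies inside the closed interval [lower, upper]."""
    return lower <= value <= upper

# Figures quoted in the text above.
obs_or_per_sd = 0.62   # observational OR per 1 SD higher HDL cholesterol
expected_lipg = 0.87   # expected OR for LIPG 396Ser carriers

# Back-calculated HDL shift (in SD units) implied by the expected OR.
delta = implied_delta_sd(obs_or_per_sd, expected_lipg)

# Does the observed genetic estimate's 95% CI include the expected value?
lipg_consistent = ci_contains(0.88, 1.11, expected_lipg)
score_consistent = ci_contains(0.68, 1.26, expected_or(obs_or_per_sd, 1.0))
```

Under these assumptions, neither observed confidence interval includes the odds ratio predicted from observational epidemiology, which is the discrepancy the abstract highlights.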
Some genetic mechanisms that raise plasma HDL cholesterol do not seem to lower risk of myocardial infarction. These data challenge the concept that raising of plasma HDL cholesterol will uniformly translate into reductions in risk of myocardial infarction.
US National Institutes of Health, The Wellcome Trust, European Union, British Heart Foundation, and the German Federal Ministry of Education and Research.
eQTL analyses are important to improve the understanding of genetic association results. Here, we performed a genome-wide association and global gene expression study to identify functionally relevant variants affecting the risk of coronary artery disease (CAD).
Methods and Results
In a genome-wide association analysis of 2,078 CAD cases and 2,953 controls, we identified 950 single nucleotide polymorphisms (SNPs) that were associated with CAD at P<10^-3. Subsequent in silico and wet-lab replication stages and a final meta-analysis of 21,428 CAD cases and 38,361 controls revealed a novel association signal at chromosome 10q23.31 within the LIPA (Lysosomal Acid Lipase A) gene (P=3.7×10^-8; OR 1.1; 95% CI: 1.07-1.14). The association of this locus with global gene expression was assessed by genome-wide expression analyses in the monocyte transcriptome of 1,494 individuals. The results showed a strong association of this locus with expression of the LIPA transcript (P=1.3×10^-96). An assessment of LIPA SNPs and transcript with cardiovascular phenotypes revealed an association of LIPA transcript levels with impaired endothelial function (P=4.4×10^-3).
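The abstract does not specify the statistical model behind the SNP-expression association. A minimal sketch, assuming a simple additive model in which transcript level is regressed on allele dosage (0, 1 or 2 copies) by ordinary least squares, is given below. The function name and the toy values in the usage note are hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares fit of expression on allele dosage (assumed additive model).

    dosages:    allele counts per individual (0, 1 or 2)
    expression: matching transcript levels
    Returns (slope, intercept); the slope is the per-allele change in expression.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sum of squares of dosage and cross-product with expression.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    return slope, intercept
```

With toy data such as `eqtl_slope([0, 1, 2, 0, 1, 2], [1.0, 1.5, 2.0, 1.0, 1.5, 2.0])`, the slope estimates how much expression changes per additional copy of the allele. A real analysis would also adjust for covariates and test the slope for significance.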
The use of data on genetic variants and the addition of data on global monocytic gene expression led to the identification of the novel functional CAD susceptibility locus LIPA, located on chromosome 10q23.31. The respective eSNPs associated with CAD strongly affect LIPA gene expression level, which itself was related to endothelial dysfunction, a precursor of CAD.
coronary artery disease; genome-wide association studies; gene expression; genetic variation; genomics; eQTL; eSNP; LIPA