Dates are tropical fruits with appreciable nutritional value. Previous attempts at global metabolic characterization of the date metabolome were constrained by small sample size and limited geographical sampling. In this study, two independent large cohorts of mature dates exhibiting substantial diversity in origin, varieties and fruit processing conditions were measured by metabolomics techniques in order to identify major determinants of the fruit metabolome.
Multivariate analysis revealed a first principal component (PC1) significantly associated with the dates’ countries of production. The availability of a smaller dataset featuring immature dates from different development stages served to build a model of the ripening process in dates, which helped reveal a strong ripening signature in PC1. Analysis revealed enrichment in the dry type of dates amongst fruits with early ripening profiles at one end of PC1, as opposed to an overrepresentation of the soft type of dates with late ripening profiles at the other end of PC1. Dry dates are typical of the North African region whilst soft dates are more popular in the Gulf region, which partly explains the observed association between PC1 and geography. Analysis of the loading values, expressing metabolite correlation levels with PC1, revealed enrichment patterns of a comprehensive range of metabolite classes along PC1. Three distinct metabolic phases corresponding to known stages of date ripening were observed: an early phase enriched in regulatory hormones, amines and polyamines, energy production, tannins, sucrose and anti-oxidant activity; a second phase with on-going phenylpropanoid secondary metabolism, gene expression and phospholipid metabolism; and a late phase with marked sugar dehydration activity and degradation reactions leading to increased volatile synthesis.
These data indicate the importance of date ripening as a main driver of variation in the date metabolome responsible for their diverse nutritional and economic values. The biochemistry of the ripening process in dates is consistent with other fruits but natural dryness may prevent degenerative senescence in dates following ripening. Based on the finding that mature dates present varying extents of ripening, our survey of the date metabolome essentially revealed snapshots of interchanging metabolic states during ripening, enabling an in-depth characterization of the underlying biology.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-015-0672-5) contains supplementary material, which is available to authorized users.
Date fruit; Ripening; Metabolomics; Date palm; Soft dates varieties; Dry dates varieties; SIMCA; OPLS; PCA; Multivariate
The availability of standardized metabolite panels and genome-wide single-nucleotide polymorphism data enables the comprehensive analysis of gene–metabolite associations. Currently, many studies use genome-wide association analysis to investigate the genetic effects on single metabolites (mGWAS) separately. Such studies have identified several loci that are associated not only with one but with multiple metabolites, facilitated by the fact that metabolite panels often include metabolites of the same or related pathways. Strategies that analyse several phenotypes in a combined way were shown to be able to detect additional genetic loci. One of those methods is phenotype set enrichment analysis (PSEA), which tests sets of metabolites for enrichment at genes. Here we applied PSEA to two different panels of serum metabolites together with genome-wide data. All analyses were performed as a two-step identification–validation approach, using data from the population-based KORA cohort and the TwinsUK study. In addition to confirming genes that were already known from mGWAS, we were able to identify and validate 12 new genes. Knowledge about gene function was supported by the enriched metabolite sets. For loci with unknown gene functions, the results suggest a function that is interrelated with the metabolites, and hint at the underlying pathways.
The populations of the Arabian Peninsula remain the least represented in public genetic databases, both in terms of single nucleotide variants and of larger genomic mutations. We present the first high-resolution copy number variation (CNV) map for a Gulf Arab population, using a hybrid approach that integrates array genotyping intensity data and next-generation sequencing reads to call CNVs in the Qatari population.
CNVs were detected in 97 unrelated Qatari individuals by running two calling algorithms on each of two primary datasets: high-resolution genotyping (Illumina Omni 2.5M) and high depth whole-genome sequencing (Illumina PE 100bp). The four call-sets were integrated to identify high confidence CNV regions, which were subsequently annotated for putative functional effect and compared to public databases of CNVs in other populations. The availability of genome sequence was leveraged to identify tagging SNPs in high LD with common deletions in this population, enabling their imputation from genotyping experiments in the future.
Genotyping intensities and genome sequencing data from 97 Qataris were analyzed with four different algorithms and integrated to discover 16,660 high confidence CNV regions (CNVRs) in the total population, affecting ~28 Mb in the median Qatari genome. Up to 40 % of all CNVs affected genes, including novel CNVs affecting Mendelian disease genes, segregating at different frequencies in the 3 major Qatari subpopulations, including those with Bedouin, Persian/South Asian, and African ancestry. Consistent with high consanguinity levels in the Bedouin subpopulation, we found an increased burden for homozygous deletions in this group. In comparison to known CNVs in the comprehensive Database of Genomic Variants, we found that 5 % of all CNVRs in Qataris were completely novel, with an enrichment of CNVs affecting several known chromosomal disorder loci and genes known to regulate sugar metabolism and type 2 diabetes in the Qatari cohort. Finally, we leveraged the availability of genome sequence to find suitable tagging SNPs for common deletions in this population.
We combine four independently generated datasets from 97 individuals to study CNVs for the first time at high-resolution in a Gulf Arab population.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1991-5) contains supplementary material, which is available to authorized users.
Copy number variation; Next-generation sequencing; Genotyping; Genomics; Mendelian disease; Qatar
Non-cellular blood circulating microRNAs (plasma miRNAs) represent a promising source for the development of prognostic and diagnostic tools owing to their minimally invasive sampling, high stability, and simple quantification by standard techniques such as RT-qPCR. So far, the majority of association studies involving plasma miRNAs were disease-specific case-control analyses. In contrast, in the present study, plasma miRNAs were analysed in a sample of 372 individuals from a population-based cohort study, the Study of Health in Pomerania (SHIP).
Quantification of miRNA levels was performed by RT-qPCR using the Exiqon Serum/Plasma Focus microRNA PCR Panel V3.M covering 179 different miRNAs. Of these, 155 were included in our analyses after quality control. Associations between plasma miRNAs and the phenotypes age, body mass index (BMI), and sex were assessed via a two-step linear regression approach per miRNA. The first step regressed out the technical parameters and the second step determined the remaining associations between the respective plasma miRNA and the phenotypes of interest.
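The two-step regression idea described above can be illustrated with a minimal sketch. It assumes a single technical parameter and a single phenotype (age), uses synthetic toy values rather than study data, and the helper names `simple_ols` and `residuals` are hypothetical. The actual analysis additionally adjusted for the other phenotypes and is not reproduced here.

```python
from statistics import mean


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Return what is left of y after removing the linear effect of x."""
    intercept, slope = simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Synthetic toy data (assumption): one technical covariate and age,
# arranged in a balanced design so the two are uncorrelated.
technical = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
age = [30.0, 30.0, 50.0, 50.0, 70.0, 70.0]
# Simulated miRNA level: technical effect of 2.0 plus age effect of 0.1 per year.
mirna = [2.0 * t + 0.1 * a for t, a in zip(technical, age)]

# Step 1: regress out the technical parameter.
step1_residuals = residuals(technical, mirna)

# Step 2: test the remaining association with the phenotype of interest.
_, age_slope = simple_ols(age, step1_residuals)
print(f"age effect after technical adjustment: {age_slope:.3f}")
```

Because the toy covariates are uncorrelated, the second step recovers the simulated age effect. With correlated covariates, residualizing only the outcome can bias the estimate, which is one reason a joint multiple regression is often preferred in practice.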
After regressing out technical parameters and adjusting for the respective other two phenotypes, 7, 15, and 35 plasma miRNAs were significantly (q < 0.05) associated with age, BMI, and sex, respectively. Additional adjustment for the blood cell parameters identified 12 and 19 miRNAs to be significantly associated with age and BMI, respectively. Most of the BMI-associated miRNAs likely originate from liver. Sex-associated differences in miRNA levels were largely determined by differences in blood cell parameters. Thus, only 7 as compared to originally 35 sex-associated miRNAs displayed sex-specific differences after adjustment for blood cell parameters.
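The q < 0.05 criterion above can be sketched as follows, under the assumption that q denotes Benjamini-Hochberg adjusted p-values (the text does not state which procedure was used). The function name `benjamini_hochberg` and the p-values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-miRNA p-values, not taken from the study.
p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(p)
significant = [i for i, qi in enumerate(q) if qi < 0.05]
print(q, significant)
```

In this toy example only the first test passes q < 0.05, although three of the four raw p-values are below 0.05.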
These findings emphasize that circulating miRNAs are strongly impacted by age, BMI, and sex. Hence, these parameters should be considered as covariates in association studies based on plasma miRNA levels. The established experimental and computational workflow can now be used in future screening studies to determine associations of plasma miRNAs with defined disease phenotypes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12920-015-0136-7) contains supplementary material, which is available to authorized users.
BMI; Age; Sex; Circulating microRNA; miRNA; Association studies; Plasma; Blood
The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe “MetaRNA-Seq,” a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.
The susceptibility for various diseases as well as the response to treatments differ considerably between men and women. As a basis for a gender-specific personalized healthcare, an extensive characterization of the molecular differences between the two genders is required. In the present study, we conducted a large-scale metabolomics analysis of 507 metabolic markers measured in serum of 1756 participants from the German KORA F4 study (903 females and 853 males). One-third of the metabolites show significant differences between males and females. A pathway analysis revealed strong differences in steroid metabolism, fatty acids and further lipids, a large fraction of amino acids, oxidative phosphorylation, purine metabolism and gamma-glutamyl dipeptides. We then extended this analysis by a network-based clustering approach. Metabolite interactions were estimated using Gaussian graphical models to get an unbiased, fully data-driven metabolic network representation. This approach is not limited to possibly arbitrary pathway boundaries and can even include poorly or uncharacterized metabolites. The network analysis revealed several strongly gender-regulated submodules across different pathways. Finally, a gender-stratified genome-wide association study was performed to determine whether the observed gender differences are caused by dimorphisms in the effects of genetic polymorphisms on the metabolome. With only a single genome-wide significant hit, our results suggest that this scenario is not the case. In summary, we report an extensive characterization and interpretation of gender-specific differences of the human serum metabolome, providing a broad basis for future analyses.
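The core quantity behind a Gaussian graphical model is the partial correlation: the correlation between two variables after conditioning on the others. The following is a minimal sketch for three variables, using synthetic values constructed as a chain a → b → c. The function names and data are assumptions for illustration and do not reflect the study's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Three mutually orthogonal, zero-mean noise patterns (synthetic).
e1 = [1, 1, 1, 1, -1, -1, -1, -1]
e2 = [1, 1, -1, -1, 1, 1, -1, -1]
e3 = [1, -1, 1, -1, 1, -1, 1, -1]

# Chain a -> b -> c: a and c are linked only through b.
a = e1
b = [ai + n for ai, n in zip(a, e2)]
c = [bi + n for bi, n in zip(b, e3)]

r_ac = pearson(a, c)            # marginal correlation, clearly non-zero
pc_ac_b = partial_corr(a, c, b)  # vanishes once b is conditioned on
pc_ab_c = partial_corr(a, b, c)  # direct edge a-b remains
print(f"r(a,c)={r_ac:.3f}  pcor(a,c|b)={pc_ac_b:.3f}  pcor(a,b|c)={pc_ab_c:.3f}")
```

The ordinary correlation between a and c is substantial, but their partial correlation given b is zero, so a network built from partial correlations would keep the a-b and b-c edges and drop the indirect a-c edge.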
Electronic supplementary material
The online version of this article (doi:10.1007/s11306-015-0829-0) contains supplementary material, which is available to authorized users.
Epidemiology; Metabolic networks; Metabolomics; Gender differences; Systems biology
In this era of precision medicine, the deep and comprehensive characterization of tumor phenotypes will lead to therapeutic strategies beyond classical factors such as primary sites or anatomical staging. Recently, “-omics” approaches have advanced our knowledge of tumor biology. Such approaches have been extensively implemented in order to provide biomarkers for monitoring of the disease as well as to improve readouts of therapeutic impact. The application of metabolomics to the study of cancer is especially beneficial, since it reflects the biochemical consequences of many cancer type-specific pathophysiological processes. Here, we characterize metabolic profiles of colon and ovarian cancer cell lines to provide broader insight into differentiating metabolic processes for prospective drug development and clinical screening.
We applied non-targeted metabolomics-based mass spectrometry combined with ultrahigh-performance liquid chromatography and gas chromatography for the metabolic phenotyping of four cancer cell lines: two from colon cancer (HCT15, HCT116) and two from ovarian cancer (OVCAR3, SKOV3). We used the MetaP server for statistical data analysis.
A total of 225 metabolites were detected in all four cell lines; 67 of these molecules significantly discriminated colon cancer from ovarian cancer cells. Metabolic signatures revealed in our study suggest elevated tricarboxylic acid cycle and lipid metabolism in ovarian cancer cell lines, as well as increased β-oxidation and urea cycle metabolism in colon cancer cell lines.
Our study provides a panel of distinct metabolic fingerprints between colon and ovarian cancer cell lines. These may serve as potential drug targets, and now can be evaluated further in primary cells, biofluids, and tissue samples for biomarker purposes.
Electronic supplementary material
The online version of this article (doi:10.1186/s12967-015-0576-z) contains supplementary material, which is available to authorized users.
Genome-wide association studies with metabolomics (mGWAS) identify genetically influenced metabotypes (GIMs), their ensemble defining the heritable part of every human's metabolic individuality. Knowledge of genetic variation in metabolism has many applications of biomedical and pharmaceutical interests, including the functional understanding of genetic associations with clinical end points, design of strategies to correct dysregulations in metabolic disorders and the identification of genetic effect modifiers of metabolic disease biomarkers. Furthermore, it has been shown that GIMs provide testable hypotheses for functional genomics and metabolomics and for the identification of novel gene functions and metabolite identities. mGWAS with growing sample sizes and increasingly complex metabolic trait panels are being conducted, allowing for more comprehensive and systems-based downstream analyses. The generated large datasets of genetic associations can now be mined by the biomedical research community and provide valuable resources for hypothesis-driven studies. In this review, we provide a brief summary of the key aspects of mGWAS, followed by an update of recently published mGWAS. We then discuss new approaches of integrating and exploring mGWAS results and finish by presenting selected applications of GIMs in recent studies.
Supplemental Digital Content is available in the text.
High blood pressure is a major contributor to the global burden of disease and discovering novel causal pathways of blood pressure regulation has been challenging. We tested blood pressure associations with 280 fasting blood metabolites in 3980 TwinsUK females. Survival analysis for all-cause mortality was performed on significant independent metabolites (P<8.9×10−5). Replication was conducted in 2 independent cohorts, KORA (n=1494) and Hertfordshire (n=1515). Three independent animal experiments were performed to establish causality: (1) blood pressure change after increasing circulating metabolite levels in Wistar–Kyoto rats; (2) circulating metabolite change after salt-induced blood pressure elevation in spontaneously hypertensive stroke-prone rats; and (3) mesenteric artery response to noradrenaline and carbachol in metabolite-treated and control rats. Of the 15 metabolites that showed an independent significant association with blood pressure, only hexadecanedioate, a dicarboxylic acid, showed concordant association with blood pressure (systolic BP: β [95% confidence interval], 1.31 [0.83–1.78], P=6.81×10−8; diastolic BP: 0.81 [0.5–1.11], P=2.96×10−7) and mortality (hazard ratio [95% confidence interval], 1.49 [1.08–2.05]; P=0.02) in TwinsUK. The blood pressure association was replicated in KORA and Hertfordshire. In the animal experiments, we showed that oral hexadecanedioate increased both circulating hexadecanedioate and blood pressure in Wistar–Kyoto rats, whereas blood pressure elevation with oral sodium chloride in hypertensive rats did not affect hexadecanedioate levels. Vascular reactivity to noradrenaline was significantly increased in mesenteric resistance arteries from hexadecanedioate-treated rats compared with controls, indicated by the shift to the left of the concentration–response curve (P=0.013). Relaxation to carbachol did not show any difference.
Our findings indicate that hexadecanedioate is causally associated with blood pressure regulation through a novel pathway that merits further investigation.
blood pressure; fatty acid synthases; hypertension; metabolomics; mortality
Biological systems consist of multiple organizational levels all densely interacting with each other to ensure function and flexibility of the system. Simultaneous analysis of cross-sectional multi-omics data from large population studies is a powerful tool to comprehensively characterize the underlying molecular mechanisms on a physiological scale. In this study, we systematically analyzed the relationship between fasting serum metabolomics and whole blood transcriptomics data from 712 individuals of the German KORA F4 cohort. Correlation-based analysis identified 1,109 significant associations between 522 transcripts and 114 metabolites summarized in an integrated network, the ‘human blood metabolome-transcriptome interface’ (BMTI). Bidirectional causality analysis using Mendelian randomization did not yield any statistically significant causal associations between transcripts and metabolites. A knowledge-based interpretation and integration with a genome-scale human metabolic reconstruction revealed systematic signatures of signaling, transport and metabolic processes, i.e. metabolic reactions mainly belonging to lipid, energy and amino acid metabolism. Moreover, the construction of a network based on functional categories illustrated the cross-talk between the biological layers at a pathway level. Using a transcription factor binding site enrichment analysis, this pathway cross-talk was further confirmed at a regulatory level. Finally, we demonstrated how the constructed networks can be used to gain novel insights into molecular mechanisms associated with intermediate clinical traits. Overall, our results demonstrate the utility of a multi-omics integrative approach to understand the molecular mechanisms underlying both normal physiology and disease.
Biological systems operate on multiple, intertwined organizational layers that can nowadays be accessed by high-throughput measurement methods, the so-called ‘omics’ technologies. A major aim in the field of systems biology is to understand the flow of biological information between the different layers at a systems level in both health and disease. To unravel the complex mechanisms underlying those molecular processes and to understand how the different functional levels interact with each other, an integrated analysis of multiple layers, i.e. a ‘multi-omics’ approach, is required. In our present study, we investigate the relationship between circulating metabolites in serum and whole-blood gene expression measured in the blood of individuals from a population-based cohort. To this end, we constructed a correlation network that displays which transcripts and metabolites show the same trend of up- and down-regulation. We derived a functional characterization of the network by developing a novel computational analysis. The analysis revealed systematic signatures of signaling, transport and metabolic processes on both a regulatory and a pathway level. Moreover, integrating the network with associations to clinical markers such as HDL-cholesterol, LDL-cholesterol and TG identified coordinately activated pathways or modules which might help to assess the molecular machinery behind such an intermediate phenotype.
Metabolomics has opened new avenues for studying metabolic alterations in type 2 diabetes. While many urine and blood metabolites have been associated individually with diabetes, a complete systems view analysis of metabolic dysregulations across multiple biofluids and over varying timescales of glycaemic control is still lacking.
Here we report a broad metabolomics study in a clinical setting, covering 2,178 metabolite measures in saliva, blood plasma and urine from 188 individuals with diabetes and 181 controls of Arab and Asian descent. Using multivariate linear regression we identified metabolites associated with diabetes and markers of acute, short-term and long-term glycaemic control.
Ninety-four metabolite associations with diabetes were identified at a Bonferroni level of significance (p < 2.3 × 10−5), 16 of which have never been reported. Sixty-five of these diabetes-associated metabolites were associated with at least one marker of glycaemic control in the diabetes group. Using Gaussian graphical modelling, we constructed a metabolic network that links diabetes-associated metabolites from three biofluids across three different timescales of glycaemic control.
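The Bonferroni threshold quoted above is consistent with dividing a family-wise error rate of 0.05 by the 2,178 metabolite measures. Both the 0.05 level and the choice of 2,178 as the number of tests are assumptions inferred from the reported numbers, not stated explicitly in the text. A minimal sketch of the arithmetic:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


# Assumed inputs: alpha = 0.05 and one test per metabolite measure.
threshold = bonferroni_threshold(0.05, 2178)
print(f"{threshold:.2e}")  # prints 2.30e-05
```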
Our study reveals a complex network of biochemical dysregulation involving metabolites from different pathways of diabetes pathology, and provides a reference framework for future diabetes studies with metabolic endpoints.
Electronic supplementary material
The online version of this article (doi:10.1007/s00125-015-3636-2) contains peer-reviewed but unedited supplementary material, which is available to authorised users.
Arab population; Asian population; Blood metabolomics; Gaussian graphical modelling; Glycaemic control; Metabolic dysregulation; Partial correlation; Saliva metabolomics; Systems biology; Type 2 diabetes; Urine metabolomics
The date palm (Phoenix dactylifera L.) is one of the oldest cultivated trees and is intimately tied to the history of human civilization. There are hundreds of commercial cultivars with distinct fruit shapes, colors, and sizes growing mainly in arid lands from the west of North Africa to India. The origin of date palm domestication is still uncertain, and few studies have attempted to document genetic diversity across multiple regions. We conducted genotyping-by-sequencing on 70 female cultivar samples from across the date palm–growing regions, including four Phoenix species as the outgroup. Here, for the first time, we generate genome-wide genotyping data for 13,000–65,000 SNPs in a diverse set of date palm fruit and leaf samples. Our analysis provides the first genome-wide evidence confirming recent findings that the date palm cultivars segregate into two main regions of shared genetic background from North Africa and the Arabian Gulf. We identify genomic regions with high densities of geographically segregating SNPs and also observe higher levels of allele fixation on the recently described X-chromosome than on the autosomes. Our results fit a model with two centers of earliest cultivation including date palms autochthonous to North Africa. These results adjust our understanding of human agriculture history and will provide the foundation for more directed functional studies and a better understanding of genetic diversity in date palm.
date palm; domestication; genotyping-by-sequencing; population genetics; plant sex chromosomes
Feed efficiency is a paramount factor for livestock economy. Previous studies have indicated a substantial heritability of several feed efficiency traits. In our study, we investigated the genetic background of residual feed intake, a commonly used parameter of feed efficiency, in a cattle resource population generated from crossing dairy and beef cattle. Starting from a whole genome association analysis, we subsequently performed combined phenotype-metabolome-genome analysis taking a systems biology approach by inferring gene networks based on partial correlation and information theory approaches. Our data about biological processes enriched with genes from the feed efficiency network suggest that genetic variation in feed efficiency is driven by genetic modulation of basic processes relevant to general cellular functions. When looking at the predicted upstream regulators from the feed efficiency network, the Tumor Protein P53 (TP53) and Transforming Growth Factor beta 1 (TGFB1) genes stood out regarding significance of overlap and number of target molecules in the data set. These results further support the hypothesis that TP53 is a major upstream regulator for genetic variation of feed efficiency. Furthermore, our data revealed a significant effect of both the Non-SMC Condensin I Complex, Subunit G (NCAPG) I442M (rs109570900) and the Growth/differentiation factor 8 (GDF8) Q204X (rs110344317) loci on residual feed intake and feed conversion. For both loci, the growth-promoting allele at the onset of puberty was associated with a negative but favorable effect on residual feed intake. The elevated energy demand for increased growth triggered by the NCAPG 442M allele is obviously not fully compensated for by an increased efficiency in converting feed into body tissue.
As a consequence, the individuals carrying the NCAPG 442M allele had an additional demand for energy uptake that is reflected by the association of the allele with increased daily energy intake as observed in our study.
Excess body weight is a major risk factor for cardiometabolic diseases. The complex molecular mechanisms of body weight change-induced metabolic perturbations are not fully understood. Specifically, in-depth molecular characterization of long-term body weight change in the general population is lacking. Here, we pursued a multi-omic approach to comprehensively study metabolic consequences of body weight change during a seven-year follow-up in a large prospective study.
We used data from the population-based Cooperative Health Research in the Region of Augsburg (KORA) S4/F4 cohort. At follow-up (F4), two-platform serum metabolomics and whole blood gene expression measurements were obtained for 1,631 and 689 participants, respectively. Using weighted correlation network analysis, omics data were clustered into modules of closely connected molecules, followed by the formation of a partial correlation network from the modules. Association of the omics modules with previous annual percentage weight change was then determined using linear models. In addition, we performed pathway enrichment analyses, stability analyses, and assessed the relation of the omics modules with clinical traits.
Four metabolite and two gene expression modules were significantly and stably associated with body weight change (P-values ranging from 1.9 × 10−4 to 1.2 × 10−24). The four metabolite modules covered major branches of metabolism, with VLDL, LDL and large HDL subclasses, triglycerides, branched-chain amino acids and markers of energy metabolism among the main representative molecules. One gene expression module suggests a role of weight change in red blood cell development. The other gene expression module largely overlaps with the lipid-leukocyte (LL) module previously reported to interact with serum metabolites, for which we identify additional co-expressed genes. The omics modules were interrelated and showed cross-sectional associations with clinical traits. Moreover, weight gain and weight loss showed largely opposing associations with the omics modules.
Long-term weight change in the general population globally associates with serum metabolite concentrations. An integrated metabolomics and transcriptomics approach improved the understanding of molecular mechanisms underlying the association of weight gain with changes in lipid and amino acid metabolism, insulin sensitivity, mitochondrial function as well as blood cell development and function.
Electronic supplementary material
The online version of this article (doi:10.1186/s12916-015-0282-y) contains supplementary material, which is available to authorized users.
Metabolomics; Transcriptomics; Weight change; Obesity; Molecular epidemiology; Bioinformatics
Modification of DNA by methylation of cytosines at CpG dinucleotides is a widespread phenomenon that leads to changes in gene expression, thereby influencing and regulating many biological processes. Recent technical advances in the genome-wide determination of single-base DNA methylation enabled epigenome-wide association studies (EWASs). Early EWASs established robust associations of age and gender with the degree of CpG methylation at specific sites. Other studies uncovered associations with cigarette smoking. However, so far these studies were mainly conducted in Caucasians, raising the question of whether these findings can also be extrapolated to other populations.
Here, we present an EWAS with age, gender, and smoking status in a family study of 123 individuals of Arab descent. We determined DNA methylation at over 450,000 CpG sites using the Illumina Infinium HumanMethylation450 BeadChip, applied state-of-the-art data processing protocols, including correction for blood cell type heterogeneity and hidden confounders, and eliminated probes containing SNPs at the targeted CpG site using 40× whole-genome sequencing data. Using this approach, we could replicate the leading published EWAS associations with age, gender and smoking, and recovered hallmarks of gender-specific epigenetic changes. Interestingly, we could even replicate the recently reported precise prediction of chronological age based on the methylation of only a few selected CpG sites.
Our study supports the view that when applied with state-of-the-art protocols to account for all potential confounders, DNA methylation arrays represent powerful tools for EWAS with more complex phenotypes that can also be successfully applied to non-Caucasian populations.
Electronic supplementary material
The online version of this article (doi:10.1186/s13148-014-0040-6) contains supplementary material, which is available to authorized users.
DNA methylation; Age; Gender; Smoking; Association study; Epigenetics
Background: The prevalence of type 2 diabetes (T2D) in Qatar and the Middle East is one of the highest in the world. It is estimated that about one quarter of the individuals with T2D are undiagnosed. Elevated HbA1c levels are an indicator of T2D or a pre-diabetic state. In this study we set out to examine which factors, such as anthropometric and socio-demographic risk factors, are associated with elevated HbA1c levels in a population without T2D. Methods: We examined 191 subjects with no record of T2D. Anthropometrics and HbA1c were measured. Socio-demographic (age, gender, ethnicity and educational level) and health information were assessed through questionnaires. Elevated HbA1c levels were defined as >6.0% (>42 mmol/mol). Individual risk factors were examined in relationship to having elevated HbA1c levels using logistic regression. Results: Thirty-eight (20%) study participants had elevated HbA1c levels. Participants from South Asian and Filipino descent were more likely to present with elevated HbA1c levels than Arab participants (adjusted odds ratios (OR): 13.30 (95% confidence interval (CI): 4.24, 41.79), p < 0.001 for South Asian and 4.54 (95% CI: 1.04, 19.83), p = 0.04 for Filipinos). A body mass index of above 30 kg/m2 was associated with elevated HbA1c levels (adjusted OR: 2.90 (95% CI: 1.29, 6.51), p = 0.01). Neither gender nor educational level was associated with elevated HbA1c levels. Conclusions: Elevated HbA1c levels in individuals not diagnosed with diabetes were most frequently found in the South Asian and Filipino immigrant population. Special attention should therefore be given to the early identification of T2D in these subjects.
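The two equivalent HbA1c cut-offs quoted in the Methods (>6.0% and >42 mmol/mol) and the reported proportion of participants with elevated values can be checked with a short calculation. The sketch below assumes the widely used IFCC-NGSP master equation, NGSP (%) = 0.09148 × IFCC (mmol/mol) + 2.152, which is not cited in the text; the function name `ngsp_to_ifcc` is hypothetical.

```python
def ngsp_to_ifcc(hba1c_percent):
    """Convert HbA1c from NGSP units (%) to IFCC units (mmol/mol).

    Inverts the assumed master equation NGSP = 0.09148 * IFCC + 2.152.
    """
    return (hba1c_percent - 2.152) / 0.09148


# Cut-off used to define elevated HbA1c in the Methods.
threshold_mmol_mol = ngsp_to_ifcc(6.0)

# Share of participants with elevated HbA1c: 38 of 191 subjects.
share_elevated = 38 / 191

print(round(threshold_mmol_mol), round(100 * share_elevated))  # prints 42 20
```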
HbA1c; undiagnosed type 2 diabetes; public health; pre-diabetes; ethnic differences
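The adjusted odds ratios above come from logistic regression, where an odds ratio is the exponentiated coefficient and a 95% confidence interval is usually a Wald interval on the log-odds scale. The following is a minimal sketch of that relationship, not the authors' analysis code. It assumes a Wald interval with z = 1.96 (the abstract does not state how the interval was computed), and the function names are hypothetical. It recovers the standard error from the reported interval for body mass index and checks that the numbers are mutually consistent.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of the log-odds implied by a Wald CI (assumption: Wald interval)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient beta."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def wald_p_value(beta, se):
    """Two-sided p-value of the Wald test under a standard normal reference."""
    return math.erfc(abs(beta / se) / math.sqrt(2))

# Reported for BMI above 30 kg/m²: adjusted OR 2.90 (95% CI: 1.29, 6.51), p = 0.01
beta = math.log(2.90)
se = se_from_ci(1.29, 6.51)
or_, lo, hi = odds_ratio_ci(beta, se)
p = wald_p_value(beta, se)
print(f"OR={or_:.2f}, CI=({lo:.2f}, {hi:.2f}), p={p:.3f}")
```

Under these assumptions the reconstructed interval and p-value agree with the reported values up to rounding, which is a quick plausibility check that can be applied to any reported odds ratio.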
Using a nontargeted metabolomics approach of 447 fasting plasma metabolites, we searched for novel molecular markers that arise before and after hyperglycemia in a large population-based cohort of 2,204 females (115 type 2 diabetic [T2D] case subjects, 192 individuals with impaired fasting glucose [IFG], and 1,897 control subjects) from TwinsUK. Forty-two metabolites from three major fuel sources (carbohydrates, lipids, and proteins) were found to significantly correlate with T2D after adjusting for multiple testing; of these, 22 were previously reported as associated with T2D or insulin resistance. Fourteen metabolites were found to be associated with IFG. Among the metabolites identified, the branched-chain keto-acid metabolite 3-methyl-2-oxovalerate was the strongest predictive biomarker for IFG after glucose (odds ratio [OR] 1.65 [95% CI 1.39–1.95], P = 8.46 × 10⁻⁹) and was moderately heritable (h² = 0.20). The association was replicated in an independent population (n = 720, OR 1.68 [1.34–2.11], P = 6.52 × 10⁻⁶) and validated in 189 twins with urine metabolomics taken at the same time as plasma (OR 1.87 [1.27–2.75], P = 1 × 10⁻³). Results confirm an important role for catabolism of branched-chain amino acids in T2D and IFG. In conclusion, this T2D-IFG biomarker study has surveyed the broadest panel of nontargeted metabolites to date, revealing both novel and known associated metabolites and providing potential novel targets for clinical prediction and a deeper understanding of causal mechanisms.
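The abstract above reports significance "after adjusting for multiple testing" without naming the method. As an illustration only, the sketch below applies a Bonferroni correction, which is one common choice and an assumption here, as is the family-wise error rate of 0.05. The number of tests (447 metabolites) and the discovery P value are taken from the abstract; the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests

def passes_bonferroni(p_value, alpha, n_tests):
    """True if a single P value survives the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)

N_METABOLITES = 447   # metabolites tested, from the abstract
ALPHA = 0.05          # assumed family-wise error rate, not stated in the abstract

threshold = bonferroni_threshold(ALPHA, N_METABOLITES)
discovery_p = 8.46e-9  # 3-methyl-2-oxovalerate vs. IFG, from the abstract

print(f"threshold = {threshold:.2e}")
print("discovery association passes:", passes_bonferroni(discovery_p, ALPHA, N_METABOLITES))
```

Replication and validation in a second cohort typically test a single pre-specified metabolite, so the discovery-stage threshold would not normally be applied to those P values.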
Genome-wide association scans with high-throughput metabolic profiling provide unprecedented insights into how genetic variation influences metabolism and complex disease. Here we report the most comprehensive exploration of genetic loci influencing human metabolism to date, including 7,824 adult individuals from two European population studies. We report genome-wide significant associations at 145 metabolic loci and their biochemical connectivity regarding more than 400 metabolites in human blood. We extensively characterize the resulting in vivo blueprint of metabolism in human blood by integrating it with information regarding gene expression, heritability, overlap with known drug targets, previous association with complex disorders and inborn errors of metabolism. We further developed a database and web-based resources for data mining and results visualization. Our findings contribute to a greater understanding of the role of inherited variation in blood metabolic diversity, and identify potential new opportunities for pharmacologic development and disease understanding.
Motivation: Linking genes and functional information to genetic variants identified by association studies remains difficult. Resources containing extensive genomic annotations are available but often not fully utilized due to heterogeneous data formats. To enhance their accessibility, we integrated many annotation datasets into a user-friendly webserver.
Availability and implementation:
Supplementary data are available at Bioinformatics online.
With diminishing costs of next generation sequencing (NGS), whole genome analysis becomes a standard tool for identifying genetic causes of inherited diseases. Commercial NGS service providers in general not only provide raw genomic reads, but further deliver SNP calls to their clients. However, the question for the user arises whether to use the SNP data as is, or process the raw sequencing data further through more sophisticated SNP calling pipelines with more advanced algorithms.
Here we report a detailed comparison of SNPs called using the popular GATK multiple-sample calling protocol with the variants provided by the Illumina CASAVA pipeline, which were delivered as part of a 40x whole genome sequencing project by Illumina Inc of 171 human genomes of Arab descent (108 unrelated Qatari genomes, 19 trios, and 2 families with rare diseases). GATK multi-sample calling identifies more variants than the CASAVA pipeline. The additional variants from GATK are robust in terms of Mendelian consistency but weak in terms of statistical parameters such as the transition-to-transversion (TsTv) ratio. However, these additional variants do not make a difference in detecting the causative variants in the studied phenotype.
Both pipelines, GATK multi-sample calling and Illumina CASAVA single sample calling, have highly similar performance in SNP calling at the level of putatively causative variants.
Electronic supplementary material
The online version of this article (doi:10.1186/1756-0500-7-747) contains supplementary material, which is available to authorized users.
NGS; GATK; CASAVA; WGS pipeline; Mendelian inheritance; Qatari population; Multi-sample calling; Genotype calling; Variant; Trios; Illumina
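The two quality criteria named in the pipeline comparison above, the TsTv ratio and Mendelian consistency in trios, can be illustrated with a short sketch. This is a simplified illustration, not the authors' pipeline: it assumes biallelic autosomal SNPs, unphased diploid genotypes given as allele pairs, and no handling of missing calls. All function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    if ref == alt:
        raise ValueError("ref and alt alleles must differ for a SNP")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES

def ts_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) allele pairs."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")

def mendelian_consistent(child, father, mother):
    """True if the child's two alleles can be split as one from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)

def mendelian_error_rate(trio_genotypes):
    """Fraction of sites where a trio violates Mendelian inheritance."""
    errors = sum(1 for c, f, m in trio_genotypes if not mendelian_consistent(c, f, m))
    return errors / len(trio_genotypes)

snps = [("A", "G"), ("C", "T"), ("A", "C")]           # two transitions, one transversion
trio_sites = [
    (("A", "G"), ("A", "A"), ("G", "G")),             # consistent
    (("G", "G"), ("A", "A"), ("G", "G")),             # inconsistent: father has no G
]
print(ts_tv_ratio(snps), mendelian_error_rate(trio_sites))
```

A call set enriched for false positives tends to drift toward a lower TsTv ratio, because random errors produce transversions twice as often as transitions, which is why the ratio serves as a summary quality statistic.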
With the help of epigenome-wide association studies (EWAS), increasing knowledge on the role of epigenetic mechanisms such as DNA methylation in disease processes is obtained. In addition, EWAS aid the understanding of behavioral and environmental effects on DNA methylation. In terms of statistical analysis, specific challenges arise from the characteristics of methylation data. First, methylation β-values represent proportions with skewed and heteroscedastic distributions. Thus, traditional modeling strategies assuming a normally distributed response might not be appropriate. Second, recent evidence suggests that not only mean differences but also variability in site-specific DNA methylation associates with diseases, including cancer. The purpose of this study was to compare different modeling strategies for methylation data in terms of model performance and performance of downstream hypothesis tests. Specifically, we used the generalized additive models for location, scale and shape (GAMLSS) framework to compare beta regression with Gaussian regression on raw, binary logit and arcsine square root transformed methylation data, with and without modeling a covariate effect on the scale parameter.
Using simulated and real data from a large population-based study and an independent sample of cancer patients and healthy controls, we show that beta regression does not outperform competing strategies in terms of model performance. In addition, Gaussian models for location and scale showed an improved performance as compared to models for location only. The best performance was observed for the Gaussian model on binary logit transformed β-values, referred to as M-values. Our results further suggest that models for location and scale are specifically sensitive towards violations of the distribution assumption and towards outliers in the methylation data. Therefore, a resampling procedure is proposed as a mode of inference and shown to diminish type I error rate in practically relevant settings. We apply the proposed method in an EWAS of BMI and age and reveal strong associations of age with methylation variability that are validated in an independent sample.
Models for location and scale are promising tools for EWAS that may help to understand the influence of environmental factors and disease-related phenotypes on methylation variability and its role during disease development.
DNA methylation; Beta regression; GAMLSS; Infinium HumanMethylation450k BeadChip; EWAS; Modeling variability; Resampling; Model performance; Model comparison; Models for location and scale
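The transformations compared in the abstract above can be written down directly. The sketch below shows the binary logit transform that maps β-values to M-values, its inverse, and the arcsine square root transform. The clipping constant `eps` is an assumption added here to keep the logarithm finite at β = 0 or β = 1; it is not taken from the study, and the function names are hypothetical.

```python
import math

def m_value(beta, eps=1e-6):
    """Binary logit of a methylation beta-value: M = log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1 - eps)  # clip to avoid log of zero or division by zero
    return math.log2(b / (1 - b))

def beta_from_m(m):
    """Inverse transform: beta = 2^M / (1 + 2^M)."""
    return 2 ** m / (1 + 2 ** m)

def arcsine_sqrt(beta):
    """Arcsine square root transform, a variance-stabilizing option for proportions."""
    return math.asin(math.sqrt(beta))

for beta in (0.1, 0.5, 0.8):
    print(beta, round(m_value(beta), 3), round(arcsine_sqrt(beta), 3))
```

The logit transform stretches values near 0 and 1, which reduces the heteroscedasticity of β-values and makes a Gaussian model on the transformed scale more plausible.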
High-throughput screening techniques that analyze the metabolic endpoints of biological processes can identify the contributions of genetic predisposition and environmental factors to the development of common diseases. Studies applying controlled physiological challenges can reveal dysregulation in metabolic responses that may be predictive of or associated with these diseases. However, large-scale epidemiological studies with well-controlled physiological challenge conditions, such as extended fasting periods and defined food intake, pose logistical challenges. Culturally and religiously motivated behavioral patterns of lifestyle changes provide a natural setting that can be used to enroll a large number of study volunteers. Here we report a proof-of-principle study conducted within a Muslim community, showing that a metabolomics study during the Holy Month of Ramadan can provide a unique opportunity to explore the pre-prandial and postprandial response of human metabolism to nutritional challenges. Up to five blood samples were obtained from eleven healthy male volunteers, taken directly before and two hours after consumption of a controlled meal in the evening on days 7 and 26 of Ramadan, and after an overnight fast several weeks after Ramadan. The observed increases in glucose, insulin and lactate levels at the postprandial time point confirm the expected physiological response to food intake. Targeted metabolomics further revealed significant and physiologically plausible responses to food intake by an increase in bile acid and amino acid levels and a decrease in long-chain acyl-carnitine and polyamine levels. A decrease in the concentrations of a number of phospholipids between samples taken on days 7 and 26 of Ramadan shows that the long-term response to extended fasting may differ from the response to short-term fasting.
The present study design is scalable to larger populations and may be extended to the study of the metabolic response in defined patient groups such as individuals with type 2 diabetes.
Metabolomics; Nutritional challenging; Ramadan fasting; Study design; Clinical research
Individualized Medicine aims at providing optimal treatment for an individual patient at a given time based on his specific genetic and molecular characteristics. This requires excellent clinical stratification of patients as well as the availability of genomic data and biomarkers as prerequisites for the development of novel diagnostic tools and therapeutic strategies. The University Medicine Greifswald, Germany, has launched the “Greifswald Approach to Individualized Medicine” (GANI_MED) project to address major challenges of Individualized Medicine. Herein, we describe the implementation of the scientific and clinical infrastructure that allows future translation of findings relevant to Individualized Medicine into clinical practice.
Clinical patient cohorts (N > 5,000) with an emphasis on metabolic and cardiovascular diseases are being established following a standardized protocol for the assessment of medical history, laboratory biomarkers, and the collection of various biosamples for bio-banking purposes. A multi-omics based biomarker assessment including genome-wide genotyping, transcriptome, metabolome, and proteome analyses complements the multi-level approach of GANI_MED. Comparisons with the general background population as characterized by our Study of Health in Pomerania (SHIP) are performed. A central data management structure has been implemented to capture and integrate all relevant clinical data for research purposes. Ethical research projects on informed consent procedures, reporting of incidental findings, and economic evaluations were launched in parallel.
Personalized Medicine; Individualized Medicine; Epidemiology
The mechanism of action of antihypertensive and lipid-lowering drugs in the human organism is still not fully understood. New insights into the drugs’ action can be provided by a metabolomics-driven approach, which offers a detailed view of the physiological state of an organism. Here, we report a metabolome-wide association study with 295 metabolites in human serum from 1,762 participants of the KORA F4 (Cooperative Health Research in the Region of Augsburg) study population. Our intent was to find variations of metabolite concentrations related to the intake of various drug classes and, based on the associations found, to generate new hypotheses about on-target as well as off-target effects of these drugs. In total, we found 41 significant associations for the drug classes investigated: For beta-blockers (11 associations), angiotensin-converting enzyme (ACE) inhibitors (four assoc.), diuretics (seven assoc.), statins (ten assoc.), and fibrates (nine assoc.) the top hits were pyroglutamine, phenylalanylphenylalanine, pseudouridine, 1-arachidonoylglycerophosphocholine, and 2-hydroxyisobutyrate, respectively. For beta-blockers we observed significant associations with metabolite concentrations that are indicative of drug side-effects, such as increased serotonin and decreased free fatty acid levels. Intake of ACE inhibitors and statins was associated with metabolites that provide insight into the action of the drug itself on its target, such as an association of ACE inhibitors with des-Arg(9)-bradykinin and aspartylphenylalanine, a substrate and a product of the drug-inhibited ACE. The intake of statins, which reduce blood cholesterol levels, resulted in changes in the concentration of metabolites of the biosynthesis as well as of the degradation of cholesterol. Fibrates showed the strongest association with 2-hydroxyisobutyrate, which might be a breakdown product of fenofibrate and, thus, a possible marker for the degradation of this drug in the human organism.
The analysis of diuretics showed a heterogeneous picture that is difficult to interpret. Taken together, our results provide a basis for a deeper functional understanding of the action and side-effects of antihypertensive and lipid-lowering drugs in the general population.
Electronic supplementary material
The online version of this article (doi:10.1007/s10654-014-9910-7) contains supplementary material, which is available to authorized users.
Beta-blockers; Angiotensin-converting enzyme inhibitors; Diuretics; Statins; Fibrates; Metabolomics