PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-14 (14)
 

Clipboard (0)
None

Select a Filter Below

Journals
Year of Publication
more »
Document Types
author:("sartans, Ugis")
1.  Updates to BioSamples database at European Bioinformatics Institute 
Nucleic Acids Research  2013;42(D1):D50-D52.
The BioSamples database at the EBI (http://www.ebi.ac.uk/biosamples) provides an integration point for BioSamples information between technology specific databases at the EBI, projects such as ENCODE and reference collections such as cell lines. The database delivers a unified query interface and API to query sample information across EBI’s databases and provides links back to assay databases. Sample groups are used to manage related samples, e.g. those from an experimental submission, or a single reference collection. Infrastructural improvements include a new user interface with ontological and key word queries, a new query API, a new data submission API, complete RDF data download and a supporting SPARQL endpoint, accessioning at the point of submission to the European Nucleotide Archive and European Genotype Phenotype Archives and improved query response times.
doi:10.1093/nar/gkt1081
PMCID: PMC3965081  PMID: 24265224
2.  ArrayExpress update—trends in database growth and links to data analysis tools 
Nucleic Acids Research  2012;41(D1):D987-D990.
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
doi:10.1093/nar/gks1174
PMCID: PMC3531147  PMID: 23193272
3.  graph2tab, a library to convert experimental workflow graphs into tabular formats 
Bioinformatics  2012;28(12):1665-1667.
Motivations: Spreadsheet-like tabular formats are ever more popular in the biomedical field as a mean for experimental reporting. The problem of converting the graph of an experimental workflow into a table-based representation occurs in many such formats and is not easy to solve.
Results: We describe graph2tab, a library that implements methods to realise such a conversion in a size-optimised way. Our solution is generic and can be adapted to specific cases of data exporters or data converters that need to be implemented.
Availability and Implementation: The library source code and documentation are available at http://github.com/ISA-tools/graph2tab.
Contact: brandizi@ebi.ac.uk.
Supplementary Information: A supplementary document describes the theoretical and technical details about the library implementation.
doi:10.1093/bioinformatics/bts258
PMCID: PMC3371871  PMID: 22556367
4.  The BioSample Database (BioSD) at the European Bioinformatics Institute 
Nucleic Acids Research  2011;40(D1):D64-D70.
The BioSample Database (http://www.ebi.ac.uk/biosamples) is a new database at EBI that stores information about biological samples used in molecular experiments, such as sequencing, gene expression or proteomics. The goals of the BioSample Database include: (i) recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; (ii) minimizing data entry efforts for EBI database submitters by enabling submitting sample descriptions once and referencing them later in data submissions to assay databases and (iii) supporting cross database queries by sample characteristics. Each sample in the database is assigned an accession number. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples@ebi.ac.uk
doi:10.1093/nar/gkr937
PMCID: PMC3245134  PMID: 22096232
5.  A Genome-Wide Metabolic QTL Analysis in Europeans Implicates Two Loci Shaped by Recent Positive Selection 
PLoS Genetics  2011;7(9):e1002270.
We have performed a metabolite quantitative trait locus (mQTL) study of the 1H nuclear magnetic resonance spectroscopy (1H NMR) metabolome in humans, building on recent targeted knowledge of genetic drivers of metabolic regulation. Urine and plasma samples were collected from two cohorts of individuals of European descent, with one cohort comprised of female twins donating samples longitudinally. Sample metabolite concentrations were quantified by 1H NMR and tested for association with genome-wide single-nucleotide polymorphisms (SNPs). Four metabolites' concentrations exhibited significant, replicable association with SNP variation (8.6×10−11
Author Summary
Physiological concentrations of metabolites—small molecules involved in biochemical processes in living systems—can be measured and used to diagnose and predict disease states. A common goal is to detect and clinically exploit statistical differences in metabolite concentrations between diseased and healthy individuals. As a basis for the design and interpretation of case-control studies, it is useful to have a characterization of metabolic diversity amongst healthy individuals, some of which stems from inter-individual genetic variation. When a single genetic locus has a sufficiently strong effect on metabolism, its genomic position can be determined by collecting metabolite concentration data and genome-wide genotype data on a set of individuals and searching for associations between the two data sets—a so-called metabolite quantitative trait locus (mQTL) study. By so tracing mQTLs, we can identify the genetic drivers of metabolism, characterize how the nature or quantity of the corresponding expressed protein(s) feeds forward to influence metabolite levels, and specify disease-predictive models that incorporate mutual dependence amongst genetics, environment, and metabolism.
doi:10.1371/journal.pgen.1002270
PMCID: PMC3169529  PMID: 21931564
A comprehensive variation map of the human metabolome identifies genetic and stable-environmental sources as major drivers of metabolite concentrations. The data suggest that sample sizes of a few thousand are sufficient to detect metabolite biomarkers predictive of disease.
We designed a longitudinal twin study to characterize the genetic, stable-environmental, and longitudinally fluctuating influences on metabolite concentrations in two human biofluids—urine and plasma—focusing specifically on the representative subset of metabolites detectable by 1H nuclear magnetic resonance (1H NMR) spectroscopy.We identified widespread genetic and stable-environmental influences on the (urine and plasma) metabolomes, with (30 and 42%) attributable on average to familial sources, and (47 and 60%) attributable to longitudinally stable sources.Ten of the metabolites annotated in the study are estimated to have >60% familial contribution to their variation in concentration.Our findings have implications for the design and interpretation of 1H NMR-based molecular epidemiology studies. On the basis of the stable component of variation quantified in the current paper, we specified a model of disease association under which we inferred that sample sizes of a few thousand should be sufficient to detect disease-predictive metabolite biomarkers.
Metabolites are small molecules involved in biochemical processes in living systems. Their concentration in biofluids, such as urine and plasma, can offer insights into the functional status of biological pathways within an organism, and reflect input from multiple levels of biological organization—genetic, epigenetic, transcriptomic, and proteomic—as well as from environmental and lifestyle factors. Metabolite levels have the potential to indicate a broad variety of deviations from the ‘normal' physiological state, such as those that accompany a disease, or an increased susceptibility to disease. A number of recent studies have demonstrated that metabolite concentrations can be used to diagnose disease states accurately. A more ambitious goal is to identify metabolite biomarkers that are predictive of future disease onset, providing the possibility of intervention in susceptible individuals.
If an extreme concentration of a metabolite is to serve as an indicator of disease status, it is usually important to know the distribution of metabolite levels among healthy individuals. It is also useful to characterize the sources of that observed variation in the healthy population. A proportion of that variation—the heritable component—is attributable to genetic differences between individuals, potentially at many genetic loci. An effective, molecular indicator of a heritable, complex disease is likely to have a substantive heritable component. Non-heritable biological variation in metabolite concentrations can arise from a variety of environmental influences, such as dietary intake, lifestyle choices, general physical condition, composition of gut microflora, and use of medication. Variation across a population in stable-environmental influences leads to long-term differences between individuals in their baseline metabolite levels. Dynamic environmental pressures lead to short-term fluctuations within an individual about their baseline level. A metabolite whose concentration changes substantially in response to short-term pressures is relatively unlikely to offer long-term prediction of disease. In summary, the potential suitability of a metabolite to predict disease is reflected by the relative contributions of heritable and stable/unstable-environmental factors to its variation in concentration across the healthy population.
Studies involving twins are an established technique for quantifying the heritable component of phenotypes in human populations. Monozygotic (MZ) twins share the same DNA genome-wide, while dizygotic (DZ) twins share approximately half their inherited DNA, as do ordinary siblings. By comparing the average extent of phenotypic concordance within MZ pairs to that within DZ pairs, it is possible to quantify the heritability of a trait, and also to quantify the familiality, which refers to the combination of heritable and common-environmental effects (i.e., environmental influences shared by twins in a pair). In addition to incorporating twins into the study design, it is useful to quantify the phenotype in some individuals at multiple time points. The longitudinal aspect of such a study allows environmental effects to be decomposed into those that affect the phenotype over the short term and those that exert stable influence.
For the current study, urine and blood samples were collected from a cohort of MZ and DZ twins, with some twins donating samples on two occasions several months apart. Samples were analysed by 1H nuclear magnetic resonance (1H NMR) spectroscopy—an untargeted, discovery-driven technique for quantifying metabolite concentrations in biological samples. The application of 1H NMR to a biological sample creates a spectrum, made up of multiple peaks, with each peak's size quantitatively representing the concentration of its corresponding hydrogen-containing metabolite.
In each biological sample in our study, we extracted a full set of peaks, and thereby quantified the concentrations of all common plasma and urine metabolites detectable by 1H NMR. We developed bespoke statistical methods to decompose the observed concentration variation at each metabolite peak into that originating from familial, individual-environmental, and unstable-environmental sources.
We quantified the variability landscape across all common metabolite peaks in the urine and plasma 1H NMR metabolomes. We annotated a subset of peaks with a total of 65 metabolites; the variance decompositions for these are shown in Figure 1. Ten metabolites' concentrations were estimated to have familial contributions in excess of 60%. The average proportion of stable variation across all extracted metabolite peaks was estimated to be 47% in the urine samples and 60% in the plasma samples; the average estimated familiality was 30% for urine and 42% for plasma. These results comprise the first quantitative variation map of the 1H NMR metabolome. The identification and quantification of substantive widespread stability provides support for the use of these biofluids in molecular epidemiology studies. On the basis of our findings, we performed power calculations for a hypothetical study searching for predictive disease biomarkers among 1H NMR-detectable urine and plasma metabolites. Our calculations suggest that sample sizes of 2000–5000 should allow reliable identification of disease-predictive metabolite concentrations explaining 5–10% of disease risk, while greater sample sizes of 5000–20 000 would be required to identify metabolite concentrations explaining 1–2% of disease risk.
1H Nuclear Magnetic Resonance spectroscopy (1H NMR) is increasingly used to measure metabolite concentrations in sets of biological samples for top-down systems biology and molecular epidemiology. For such purposes, knowledge of the sources of human variation in metabolite concentrations is valuable, but currently sparse. We conducted and analysed a study to create such a resource. In our unique design, identical and non-identical twin pairs donated plasma and urine samples longitudinally. We acquired 1H NMR spectra on the samples, and statistically decomposed variation in metabolite concentration into familial (genetic and common-environmental), individual-environmental, and longitudinally unstable components. We estimate that stable variation, comprising familial and individual-environmental factors, accounts on average for 60% (plasma) and 47% (urine) of biological variation in 1H NMR-detectable metabolite concentrations. Clinically predictive metabolic variation is likely nested within this stable component, so our results have implications for the effective design of biomarker-discovery studies. We provide a power-calculation method which reveals that sample sizes of a few thousand should offer sufficient statistical precision to detect 1H NMR-based biomarkers quantifying predisposition to disease.
doi:10.1038/msb.2011.57
PMCID: PMC3202796  PMID: 21878913
biomarker; 1H nuclear magnetic resonance spectroscopy; metabolome-wide association study; top-down systems biology; variance decomposition
Nucleic Acids Research  2010;39(Database issue):D1002-D1004.
The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.
doi:10.1093/nar/gkq1040
PMCID: PMC3013660  PMID: 21071405
Bioinformatics  2009;25(20):2768-2769.
Summary: SIMBioMS is a web-based open source software system for managing data and information in biomedical studies. It provides a solution for the collection, storage, management and retrieval of information about research subjects and biomedical samples, as well as experimental data obtained using a range of high-throughput technologies, including gene expression, genotyping, proteomics and metabonomics. The system can easily be customized and has proven to be successful in several large-scale multi-site collaborative projects. It is compatible with emerging functional genomics data standards and provides data import and export in accepted standard formats. Protocols for transferring data to durable archives at the European Bioinformatics Institute have been implemented.
Availability: The source code, documentation and initialization scripts are available at http://simbioms.org.
Contact: support@simbioms.org; mariak@ebi.ac.uk
doi:10.1093/bioinformatics/btp420
PMCID: PMC2759553  PMID: 19633095
Nucleic Acids Research  2008;37(Database issue):D868-D872.
ArrayExpress http://www.ebi.ac.uk/arrayexpress consists of three components: the ArrayExpress Repository—a public archive of functional genomics experiments and supporting data, the ArrayExpress Warehouse—a database of gene expression profiles and other bio-measurements and the ArrayExpress Atlas—a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200 000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently—ultra high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.
doi:10.1093/nar/gkn889
PMCID: PMC2686529  PMID: 19015125
BMC Bioinformatics  2007;8:52.
Background
One of the crucial aspects of day-to-day laboratory information management is collection, storage and retrieval of information about research subjects and biomedical samples. An efficient link between sample data and experiment results is absolutely imperative for a successful outcome of a biomedical study. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such LIMS indeed can bring laboratory information management to a higher level, but often implies sufficient investment of time, effort and funds, which are not always available. There is a clear need for lightweight open source systems for patient and sample information management.
Results
We present a web-based tool for submission, management and retrieval of sample and research subject data. The system secures confidentiality by separating anonymized sample information from individuals' records. It is simple and generic, and can be customised for various biomedical studies. Information can be both entered and accessed using the same web interface. User groups and their privileges can be defined. The system is open-source and is supplied with an on-line tutorial and necessary documentation. It has proven to be successful in a large international collaborative project.
Conclusion
The presented system closes the gap between the need and the availability of lightweight software solutions for managing information in biomedical studies involving human research subjects.
doi:10.1186/1471-2105-8-52
PMCID: PMC1803798  PMID: 17291344
BMC Bioinformatics  2006;7:489.
Background
Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support.
Results
We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion.
Conclusion
MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML.
doi:10.1186/1471-2105-7-489
PMCID: PMC1687205  PMID: 17087822
Nucleic Acids Research  2004;32(Web Server issue):W465-W470.
Expression Profiler (EP, http://www.ebi.ac.uk/expressionprofiler) is a web-based platform for microarray gene expression and other functional genomics-related data analysis. The new architecture, Expression Profiler: next generation (EP:NG), modularizes the original design and allows individual analysis-task-related components to be developed by different groups and yet still seamlessly to work together and share the same user interface look and feel. Data analysis components for gene expression data preprocessing, missing value imputation, filtering, clustering methods, visualization, significant gene finding, between group analysis and other statistical components are available from the EBI (European Bioinformatics Institute) web site. The web-based design of Expression Profiler supports data sharing and collaborative analysis in a secure environment. Developed tools are integrated with the microarray gene expression database ArrayExpress and form the exploratory analytical front-end to those data. EP:NG is an open-source project, encouraging broad distribution and further extensions from the scientific community.
doi:10.1093/nar/gkh470
PMCID: PMC441608  PMID: 15215431
Nucleic Acids Research  2003;31(1):68-71.
ArrayExpress is a new public database of microarray gene expression data at the EBI, which is a generic gene expression database designed to hold data from all microarray platforms. ArrayExpress uses the annotation standard Minimum Information About a Microarray Experiment (MIAME) and the associated XML data exchange format Microarray Gene Expression Markup Language (MAGE-ML) and it is designed to store well annotated data in a structured way. The ArrayExpress infrastructure consists of the database itself, data submissions in MAGE-ML format or via an online submission tool MIAMExpress, online database query interface, and the Expression Profiler online analysis tool. ArrayExpress accepts three types of submission, arrays, experiments and protocols, each of these is assigned an accession number. Help on data submission and annotation is provided by the curation team. The database can be queried on parameters such as author, laboratory, organism, experiment or array types. With an increasing number of organisations adopting MAGE-ML standard, the volume of submissions to ArrayExpress is increasing rapidly. The database can be accessed at http://www.ebi.ac.uk/arrayexpress.
PMCID: PMC165538  PMID: 12519949
Genome Biology  2002;3(9):research0046.1-research0046.9.
Meaningful exchange of microarray data is currently difficult because it is rare that published data provide sufficient information depth or are even in the same format from one publication to another. MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.
Background
Meaningful exchange of microarray data is currently difficult because it is rare that published data provide sufficient information depth or are even in the same format from one publication to another. Only when data can be easily exchanged will the entire biological community be able to derive the full benefit from such microarray studies.
Results
To this end we have developed three key ingredients towards standardizing the storage and exchange of microarray data. First, we have created a minimal information for the annotation of a microarray experiment (MIAME)-compliant conceptualization of microarray experiments modeled using the unified modeling language (UML) named MAGE-OM (microarray gene expression object model). Second, we have translated MAGE-OM into an XML-based data format, MAGE-ML, to facilitate the exchange of data. Third, some of us are now using MAGE (or its progenitors) in data production settings. Finally, we have developed a freely available software tool kit (MAGE-STK) that eases the integration of MAGE-ML into end users' systems.
Conclusions
MAGE will help microarray data producers and users to exchange information by providing a common platform for data exchange, and MAGE-STK will make the adoption of MAGE easier.
PMCID: PMC126871  PMID: 12225585

Results 1-14 (14)