Results 1-6 (6)

1.  GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies 
BMC Bioinformatics  2013;14:166.
Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many Gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code.
We improve speed: computation time is reduced from 6 hours to 10-15 minutes. Our approach can handle an essentially unlimited number of covariates efficiently, using projections. Data files in GWAS are vast, and reading them into computer memory becomes an important issue. However, much improvement can be made if the data are structured beforehand in a way that allows easy access to blocks of SNPs. We propose several solutions based on the R packages ff and ncdf.
We adapted the semi-parallel computations for logistic regression. We show that in a typical GWAS setting, where SNP effects are very small, we do not lose any precision and our computations are a few hundred times faster than standard procedures.
We provide very fast algorithms for GWAS written in pure R code. We also show how to rearrange SNP data for fast access.
PMCID: PMC3695771  PMID: 23711206
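The projection idea mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names (`orthonormal_basis`, `residualize`, `semi_parallel_linear`) and the toy data are assumptions made for this example. The sketch projects the covariates (plus an intercept) out of the phenotype once, projects them out of each SNP, and then obtains each SNP effect from simple dot products. It loops over SNPs for readability, whereas a semi-parallel implementation would handle a whole block of SNPs with matrix operations.

```python
import math


def orthonormal_basis(columns):
    """Gram-Schmidt: turn covariate columns into orthonormal vectors."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip columns that are linearly dependent
            basis.append([a / norm for a in v])
    return basis


def residualize(vec, basis):
    """Remove from vec its projection onto the span of the basis."""
    r = list(vec)
    for q in basis:
        dot = sum(a * b for a, b in zip(r, q))
        r = [a - dot * b for a, b in zip(r, q)]
    return r


def semi_parallel_linear(y, covariates, snps):
    """Return (beta, se) per SNP for the model y ~ 1 + covariates + SNP."""
    n = len(y)
    basis = orthonormal_basis([[1.0] * n] + covariates)
    y_r = residualize(y, basis)          # done once for all SNPs
    yy = sum(a * a for a in y_r)
    df = n - len(basis) - 1              # residual degrees of freedom
    results = []
    for g in snps:
        g_r = residualize(g, basis)
        gg = sum(a * a for a in g_r)
        if gg < 1e-12:                   # SNP carries no information
            results.append((float("nan"), float("nan")))
            continue
        gy = sum(a * b for a, b in zip(g_r, y_r))
        beta = gy / gg
        rss = max(yy - beta * gy, 0.0)   # residual sum of squares
        se = math.sqrt(rss / df / gg)
        results.append((beta, se))
    return results


# Toy data (hypothetical): y depends on one covariate and on the first SNP.
c = [0.5, 1.5, -1.0, 2.0, 0.0, -0.5]
g1 = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0]
g2 = [1.0, 0.0, 0.0, 2.0, 1.0, 1.0]
y = [1.0 + 3.0 * ci + 2.0 * gi for ci, gi in zip(c, g1)]
estimates = semi_parallel_linear(y, [c], [g1, g2])
```

Because the covariates are projected out only once for the phenotype, adding more covariates increases the cost of the setup step but not the per-SNP arithmetic, which is the property the abstract refers to.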
2.  Quantile regression for the statistical analysis of immunological data with many non-detects 
BMC Immunology  2012;13:37.
Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects.
Methods and results
Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an application to real data from a clinical trial. We show that by using quantile regression, groups can be compared and meaningful linear trends can be computed, even if more than half of the data consist of non-detects.
Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
PMCID: PMC3447667  PMID: 22769433
Non-detects; Outliers; Robustness; Data analysis; Statistical; Quantile regression; Soluble biological markers; Immunological data
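The reason percentiles tolerate non-detects can be shown with a small Python sketch. It is only an illustration of the underlying idea, not the quantile regression procedure used in the paper: it computes a single sample quantile (nearest-rank definition) for one group, with non-detects encoded as `None`. The function name `quantile_with_nondetects` and the example values are assumptions made for this example.

```python
import math


def quantile_with_nondetects(values, q):
    """Nearest-rank sample quantile when some values are non-detects.

    Non-detects are passed as None. They are known only to lie below the
    detection limit, so they sort before every detected value. The result
    is the q-th quantile if it falls among the detected values, or None
    if the quantile itself lies below the detection limit.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    n_nondetect = sum(1 for v in values if v is None)
    detected = sorted(v for v in values if v is not None)
    rank = math.ceil(q * len(values))    # 1-based nearest rank
    index = rank - 1 - n_nondetect       # position among detected values
    if index < 0:
        return None                      # quantile is below the limit
    return detected[index]


# Hypothetical group in which 4 of 7 measurements are non-detects.
group = [None, None, None, None, 5.0, 7.0, 9.0]
median = quantile_with_nondetects(group, 0.5)            # below the limit
upper_quartile = quantile_with_nondetects(group, 0.75)   # still identified
```

In this toy group the median cannot be determined because more than half of the values are non-detects, but the 75th percentile is unaffected by them, which is why modelling higher percentiles remains possible.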
3.  An R package "VariABEL" for genome-wide searching of potentially interacting loci by testing genotypic variance heterogeneity 
BMC Genetics  2012;13:4.
Hundreds of new loci have been discovered by genome-wide association studies of human traits. These studies mostly focused on associations between a single locus and a trait. Interactions between genes and between genes and environmental factors are of interest as they can improve our understanding of the genetic background underlying complex traits. Genome-wide testing of complex genetic models is a computationally demanding task. Moreover, testing of such models leads to multiple comparison problems that reduce the probability of new findings. Assuming that the genetic model underlying a complex trait can include hundreds of genes and environmental factors, testing of these models in genome-wide association studies represents a substantial difficulty.
We and Pare with colleagues (2010) developed a method that overcomes such difficulties. The method is based on the fact that loci which are involved in interactions can show genotypic variance heterogeneity of a trait. Genome-wide testing of such heterogeneity can be a fast scanning approach which can point to the interacting genetic variants.
In this work we present a new method, SVLM, allowing for variance heterogeneity analysis of imputed genetic variation. Type I error and power of this test are investigated and contrasted with those of Levene's test. We also present an R package, VariABEL, implementing existing and newly developed tests.
Variance heterogeneity analysis is a promising method for detection of potentially interacting loci. The new method and software package developed in this work will facilitate such analysis in a genome-wide context.
PMCID: PMC3398297  PMID: 22272569
single-nucleotide polymorphisms (SNPs); genome-wide association (GWA); gene-environment interactions (GxE); gene-gene interactions (GxG); variance heterogeneity; environmental sensitivity; VariABEL; the GenABEL project
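As an illustration of variance heterogeneity testing, the following Python sketch computes the classical Levene statistic (absolute deviations from group means) for a trait split by genotype group. It is not the SVLM method and not code from the VariABEL package; the function name `levene_statistic` and the toy trait values are assumptions made for this example. Under the null hypothesis of equal variances the statistic is approximately F-distributed with (k - 1, N - k) degrees of freedom; the p-value lookup is left out to keep the sketch dependency-free.

```python
def levene_statistic(groups):
    """Levene's W for equality of variances across groups.

    groups: list of lists of trait values, one list per genotype group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and N > k")

    # Absolute deviations of each observation from its group mean.
    z = []
    for g in groups:
        mean = sum(g) / len(g)
        z.append([abs(x - mean) for x in g])

    z_group_means = [sum(zi) / len(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total

    between = sum(
        len(zi) * (m - z_grand_mean) ** 2
        for zi, m in zip(z, z_group_means)
    )
    within = sum(
        (v - m) ** 2
        for zi, m in zip(z, z_group_means)
        for v in zi
    )
    if within == 0.0:
        raise ValueError("no within-group variation in deviations")
    return (n_total - k) / (k - 1) * between / within


# Hypothetical trait values: same spread in every genotype group.
w_equal = levene_statistic([[1.0, 2.0, 3.0],
                            [11.0, 12.0, 13.0],
                            [21.0, 22.0, 23.0]])

# Hypothetical trait values: much larger spread in the second group.
w_unequal = levene_statistic([[1.0, 2.0, 3.0],
                              [0.0, 10.0, 20.0]])
```

Groups that differ only in their mean give a statistic of zero, while a difference in spread between genotype groups gives a positive value, which is the signal a variance heterogeneity scan looks for.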
4.  MLPAinter for MLPA interpretation: an integrated approach for the analysis, visualisation and data management of Multiplex Ligation-dependent Probe Amplification 
BMC Bioinformatics  2010;11:67.
Multiplex Ligation-Dependent Probe Amplification (MLPA) is an application that can be used for the detection of multiple chromosomal aberrations in a single experiment. In one reaction, up to 50 different genomic sequences can be analysed. For a reliable work-flow, tools are needed for administrative support, data management, normalisation, visualisation, reporting and interpretation.
Here, we developed a data management system, MLPAinter for MLPA interpretation, that is a Windows executable and has a stand-alone database for monitoring and interpreting the MLPA data stream that is generated from the experimental setup to analysis, quality control and visualisation. A statistical approach is applied for the normalisation and analysis of large series of MLPA traces, making use of multiple control samples and internal controls.
MLPAinter visualises MLPA data in plots with information about sample replicates, normalisation settings, and sample characteristics. This integrated approach helps in the automated handling of large series of MLPA data and guarantees a quick and streamlined dataflow from the beginning of an experiment to an authorised report.
PMCID: PMC3098110  PMID: 20113482
5.  Integrating chromosomal aberrations and gene expression profiles to dissect rectal tumorigenesis 
BMC Cancer  2008;8:314.
Accurate staging of rectal tumors is essential for making the correct treatment choice. In a previous study, we found that loss of 17p, 18q and gain of 8q, 13q and 20q could distinguish adenoma from carcinoma tissue and that gain of 1q was related to lymph node metastasis. In order to find markers for tumor staging, we searched for candidate genes on these specific chromosomes.
We performed gene expression microarray analysis on 79 rectal tumors and integrated these data with genomic data from the same sample series. We performed supervised analysis to find candidate genes on affected chromosomes and validated the results with qRT-PCR and immunohistochemistry.
Integration of gene expression and chromosomal instability data revealed similarity between these two data types. Supervised analysis identified up-regulation of EFNA1 in cases with 1q gain, and EFNA1 expression was correlated with the expression of a target gene (VEGF). The BOP1 gene, involved in ribosome biogenesis and related to chromosomal instability, was over-expressed in cases with 8q gain. SMAD2 was the most down-regulated gene on 18q, and on 20q, STMN3 and TGIF2 were highly up-regulated. Immunohistochemistry for SMAD4 correlated with SMAD2 gene expression and 18q loss.
On the basis of integrative analysis, this study identified one well-known CRC gene (SMAD2) and several other genes (EFNA1, BOP1, TGIF2 and STMN3) that could possibly be used for rectal cancer characterization.
PMCID: PMC2584339  PMID: 18959792
6.  Macrodissection versus microdissection of rectal carcinoma: minor influence of stroma cells to tumor cell gene expression profiles 
BMC Genomics  2005;6:142.
The molecular determinants of carcinogenesis, tumor progression and patient prognosis can be deduced from simultaneous comparison of thousands of genes by microarray analysis. However, the presence of stroma cells in surgically excised carcinoma tissues might obscure the tumor cell-specific gene expression profiles of these samples. To circumvent this complication, laser microdissection can be performed to separate tumor epithelium from the surrounding stroma and healthy tissue. In this report, we used microarray analysis to compare RNAs isolated from macrodissected rectal carcinoma samples, from which only the surrounding healthy tissue had been removed, with RNAs from microdissected samples, in order to determine the most reliable approach for detecting the expression of tumor cell-derived genes.
As microdissection yielded low tissue and RNA quantities, extra rounds of mRNA amplification were necessary to obtain sufficient RNA for microarray experiments. These second rounds of amplification influenced the gene expression profiles. Moreover, the presence of stroma cells in macrodissected samples made only a minor contribution to the tumor cell gene expression profiles, which can be explained by the observation that more RNA is extracted from tumor epithelial cells than from stroma.
These data demonstrate that the more convenient procedure of macrodissection can be adequately used and yields reliable data regarding the identification of tumor cell-specific gene expression profiles.
PMCID: PMC1283972  PMID: 16225673
