Results 1-11 (11)
 

1.  DataSHIELD: taking the analysis to the data, not the data to the analysis 
Background: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK’s proposed ‘care.data’ initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data.
Methods: Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously but in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC.
Results: Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this illustrates the opportunities and challenges presented by the DataSHIELD approach.
Conclusions: DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data, and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property—the researchers want to fully share the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis but the physical size of the data precludes direct transfer to a new site for analysis.
doi:10.1093/ije/dyu188
PMCID: PMC4276062  PMID: 25261970
DataSHIELD; pooled analysis; ELSI; privacy; confidentiality; disclosure; distributed computing; intellectual property; bioinformatics
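The Methods of entry 1 describe separate analyses at each DC that are linked only by non-disclosive summary statistics sent to the AC. The following Python sketch illustrates that idea for a pooled mean and variance. It is not DataSHIELD's actual R/Opal implementation: the names `Summary`, `dc_summarise`, `ac_pool` and the threshold `MIN_RECORDS` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical disclosure control: refuse to summarise very small groups.
MIN_RECORDS = 5


@dataclass
class Summary:
    """Aggregate statistics that leave a data computer (DC)."""
    n: int
    total: float
    total_sq: float


def dc_summarise(values: Sequence[float]) -> Summary:
    """Runs at a DC: individual-level values never leave this function."""
    if len(values) < MIN_RECORDS:
        raise ValueError("too few records to release a summary")
    return Summary(
        n=len(values),
        total=float(sum(values)),
        total_sq=float(sum(v * v for v in values)),
    )


def ac_pool(summaries: List[Summary]) -> Tuple[float, float]:
    """Runs at the analysis computer (AC): combines per-DC aggregates
    into the pooled mean and the pooled sample variance."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance
```

The pooled result equals what a central analysis of the concatenated data would give, yet each site transmits only three numbers. Iterative model fitting would need repeated rounds of such exchanges, which this sketch does not attempt.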
2.  PSEUDOMARKER 2.0: efficient computation of likelihoods using NOMAD 
BMC Bioinformatics  2014;15:47.
Background
PSEUDOMARKER is a software package that performs joint linkage and linkage disequilibrium analysis between a marker and a putative disease locus. A key feature of PSEUDOMARKER is that it can combine case-controls and pedigrees of varying structure into a single unified analysis. Thus it maximizes the full likelihood of the data over marker allele frequencies or conditional allele frequencies on disease and recombination fraction.
Results
The new version 2.0 uses the software package NOMAD to maximize likelihoods, resulting in generally comparable or better optima with many fewer evaluations of the likelihood functions.
Conclusions
After being modified substantially to use modern optimization methods, PSEUDOMARKER version 2.0 is more robust and substantially faster than version 1.0. NOMAD may be useful in other bioinformatics problems where complex likelihood functions are optimized.
doi:10.1186/1471-2105-15-47
PMCID: PMC3932042  PMID: 24533837
3.  The prevalence of metabolic syndrome and metabolically healthy obesity in Europe: a collaborative analysis of ten large cohort studies 
Background
Not all obese subjects have an adverse metabolic profile predisposing them to developing type 2 diabetes or cardiovascular disease. The BioSHaRE-EU Healthy Obese Project aims to gain insights into the consequences of (healthy) obesity using data on risk factors and phenotypes across several large-scale cohort studies. The aim of this study was to describe the prevalence of obesity, metabolic syndrome (MetS) and metabolically healthy obesity (MHO) in ten participating studies.
Methods
Ten different cohorts in seven countries were combined, using data transformed into a harmonized format. All participants were of European origin, with age 18–80 years. They had participated in a clinical examination for anthropometric and blood pressure measurements. Blood samples had been drawn for analysis of lipids and glucose. Presence of MetS was assessed in those with obesity (BMI ≥ 30 kg/m²) based on the 2001 NCEP ATP III criteria, as well as an adapted set of less strict criteria. MHO was defined as obesity, having none of the MetS components, and no previous diagnosis of cardiovascular disease.
Results
Data for 163,517 individuals were available; 17% were obese (11,465 men and 16,612 women). The prevalence of obesity varied from 11.6% in the Italian CHRIS cohort to 26.3% in the German KORA cohort. The age-standardized percentage of obese subjects with MetS ranged in women from 24% in CHRIS to 65% in the Finnish Health2000 cohort, and in men from 43% in CHRIS to 78% in the Finnish DILGOM cohort, with elevated blood pressure the most frequently occurring factor contributing to the prevalence of the metabolic syndrome. The age-standardized prevalence of MHO varied in women from 7% in Health2000 to 28% in NCDS, and in men from 2% in DILGOM to 19% in CHRIS. MHO was more prevalent in women than in men, and decreased with age in both sexes.
Conclusions
Through a rigorous harmonization process, the BioSHaRE-EU consortium was able to compare key characteristics defining the metabolically healthy obese phenotype across ten cohort studies. There is considerable variability in the prevalence of healthy obesity across the different European populations studied, even when unified criteria were used to classify this phenotype.
doi:10.1186/1472-6823-14-9
PMCID: PMC3923238  PMID: 24484869
Harmonization; Obesity; Metabolic syndrome; Cardiovascular disease; Metabolically healthy
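The MHO definition given in the Methods of entry 3 (obesity, none of the MetS components, no previous diagnosis of cardiovascular disease) can be written as a small classifier. This Python sketch assumes the MetS component flags have already been derived elsewhere; the NCEP ATP III thresholds are deliberately not reproduced here, and the `Participant` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

OBESITY_BMI = 30.0  # kg/m^2, the obesity cut-off stated in the abstract


@dataclass
class Participant:
    bmi: float
    # One flag per MetS component, True if the component is present.
    # How each flag is derived is outside the scope of this sketch.
    mets_components: List[bool]
    prior_cvd: bool


def is_obese(p: Participant) -> bool:
    return p.bmi >= OBESITY_BMI


def is_mho(p: Participant) -> bool:
    """Obese, no MetS component present, no previous CVD diagnosis."""
    return is_obese(p) and not any(p.mets_components) and not p.prior_cvd


def mho_prevalence_among_obese(cohort: List[Participant]) -> Optional[float]:
    """Crude (not age-standardized) fraction of obese participants
    who are MHO; None if the cohort has no obese participants."""
    obese = [p for p in cohort if is_obese(p)]
    if not obese:
        return None
    return sum(is_mho(p) for p in obese) / len(obese)
```

The abstract reports age-standardized prevalences, so a crude fraction like this one would not be directly comparable with the published figures.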
4.  On the statistical properties of family-based association tests in datasets containing both pedigrees and unrelated case–control samples 
A common approach to genetic mapping of loci for complex diseases is to perform a genome-wide association study (GWAS) by analyzing a vast number of SNP markers in cohorts of unrelated cases and controls. A direct motivation for the case–control design is that unrelated, affected individuals can be easier to collect than large families with multiple affected persons in the Western world. Despite its higher potential power, investigators have not actively pursued family ascertainment in part because of a dearth of methods for analyzing such correlated data on a large scale. We examine the statistical properties of several commonly used family-based association tests, as to their performance using real-life mixtures of families and singletons taken from our own migraine and schizophrenia studies, as well as population-based data for a complex trait simulated with the evolutionary phenogenetic simulator, ForSim. In virtually every situation, the full likelihood-based methods in the PSEUDOMARKER program outperformed those implemented in FBAT, GENEHUNTER TDT, PLINK (family-based options), HRR/HHRR, QTDT, TRANSMIT, UNPHASED, MENDEL, and LAMP. We further show that GWAS is much more powerful when family samples are used rather than unrelateds, on a genotype-by-genotype basis.
doi:10.1038/ejhg.2011.173
PMCID: PMC3260916  PMID: 21934707
power; type-I error; genetic linkage analysis; linkage disequilibrium; family-based association; genome-wide association studies
5.  On the validity of the likelihood ratio test and consistency of resulting parameter estimates in joint linkage and linkage disequilibrium analysis under improperly specified parametric models 
Annals of human genetics  2011;76(1):63-73.
Summary
It has been shown that parametric analysis of linkage disequilibrium conditional on linkage using an overly deterministic model can be optimal for family-based association analysis. However, if one applies this strategy carelessly, there is a risk of false inference. We analyze properties of such likelihood ratio tests (LRTs) when the assumed disease mode-of-inheritance is inaccurate. Under some conditions, problems result if one is not careful to consider what null hypothesis is being tested. We show that: (a) tests for which the null hypothesis assumes absence of both linkage and association are independent of the true mode-of-inheritance; (b) LRTs assuming either linkage or association under the null hypothesis may depend on the true mode-of-inheritance and lead to inconsistent parameter estimates, in particular under extremely deterministic models; (c) this problem cannot be eliminated by increasing sample size or adding population controls: as sample size increases, the chance of false positive inference goes to 100%; (d) this issue can lead to systematic false positive inference of association in regions of linkage. This is important because highly deterministic models are often used intentionally in model-based analyses because they can have more power than the true model, and are implicit in many model-free analysis methods.
doi:10.1111/j.1469-1809.2011.00683.x
PMCID: PMC3442930  PMID: 22082140
Likelihood methods; Family-based association; Linkage disequilibrium; Type I error; Bias
6.  PSEUDOMARKER: A Powerful Program for Joint Linkage and/or Linkage Disequilibrium Analysis on Mixtures of Singletons and Related Individuals 
Human Heredity  2011;71(4):256-266.
A decade ago, there was widespread enthusiasm for the prospects of genome-wide association studies to identify common variants related to common chronic diseases using samples of unrelated individuals from populations. Although technological advancements allow us to query more than a million SNPs across the genome at low cost, a disappointingly small fraction of the genetic portion of common disease etiology has been uncovered. This has led to the hypothesis that less frequent variants might be involved, stimulating a renaissance of the traditional approach of seeking genes using multiplex families from less diverse populations. However, by using the modern genotyping and sequencing technology, we can now look not just at linkage, but jointly at linkage and linkage disequilibrium (LD) in such samples. Software methods that can look simultaneously at linkage and LD in a powerful and robust manner have been lacking. Most algorithms cannot jointly analyze datasets involving families of varying structures in a statistically or computationally efficient manner. We have implemented previously proposed statistical algorithms in a user-friendly software package, PSEUDOMARKER. This paper is an announcement of this software package. We describe the motivation behind the approach, the statistical methods, and software, and we briefly demonstrate PSEUDOMARKER's advantages over other packages by example.
doi:10.1159/000329467
PMCID: PMC3190175  PMID: 21811076
Computer software; Family-based association; Genome-wide association; Likelihood methods; Linkage analysis; Linkage disequilibrium; Study design
7.  Novel Susceptibility Locus at 22q11 for Diabetic Nephropathy in Type 1 Diabetes 
PLoS ONE  2011;6(9):e24053.
Background
Diabetic nephropathy (DN) affects about 30% of patients with type 1 diabetes (T1D) and contributes to serious morbidity and mortality. So far only the 3q21–q25 region has repeatedly been indicated as a susceptibility region for DN. The aim of this study was to search for new DN susceptibility loci in Finnish, Danish and French T1D families.
Methods and Results
We performed a genome-wide linkage study using 384 microsatellite markers. A total of 175 T1D families were studied, of which 94 originated from Finland, 46 from Denmark and 35 from France. The whole sample set consisted of 556 individuals including 42 sib-pairs concordant and 84 sib-pairs discordant for DN. Two-point and multi-point non-parametric linkage analyses were performed using the Analyze package and the MERLIN software. A novel DN locus on 22q11 was identified in the joint analysis of the Finnish, Danish and French families by genome-wide multipoint non-parametric linkage analysis using the Kong and Cox linear model (NPLpairs LOD score 3.58). Nominal or suggestive evidence of linkage to this locus was also detected when the three populations were analyzed separately. Suggestive evidence of linkage was found to six additional loci in the Finnish and French sample sets.
Conclusions
This study identified a novel DN locus at chromosome 22q11 with significant evidence of linkage to DN. Our results suggest that this locus may be of importance in European populations. In addition, this study supports previously indicated DN loci on 3q21–q25 and 19q13.
doi:10.1371/journal.pone.0024053
PMCID: PMC3164698  PMID: 21909410
8.  Linkage Analysis of Schizophrenia Controlling for Population Substructure 
Etiological heterogeneity and complexity have hampered attempts to identify predisposing genes for schizophrenia. We sought to minimize the number of segregating genes involved by focusing on a population isolate with elevated disease prevalence. We exploited the well-established population history, and searched for disease susceptibility loci in families from two alternative founder lineages. We studied 28 schizophrenia pedigrees (123 nuclear families) from an outlying municipality on the eastern border of Finland. We divided the families based on their genealogy and defined two routes of immigration: southern and northern. We examined the kinship coefficients and allele frequency distributions within each group, and performed a linkage analysis based on 497 microsatellite markers across the genome. A high degree of historical relatedness was demonstrated by higher sharing of alleles than predicted by the relationships we identified within the previous four generations alone, as would be expected. Between the two subpopulations, allele frequencies were significantly different, consistent with their isolated genealogies. The southern families showed some evidence of linkage to a schizophrenia locus at 4q23 (Z=3.3) near our previous finding with quantitative variation in verbal learning and memory [Paunio et al. (2004); Hum Mol Genet 13: 1693–1702], while the northern pedigrees gave the most significant evidence on 10q21 (Z=2.53). Joint analysis of families from both lineages suggested evidence of linkage only at 3p14 (Z=3.18). Thus the detailed genealogical information led us to identify distinct linkage signals for schizophrenia susceptibility loci in each of the three analyses we performed.
doi:10.1002/ajmg.b.30905
PMCID: PMC2861849  PMID: 19086037
population isolate; founder population; complex disease mapping
9.  Genome-wide Linkage Screen for Stature and Body-mass Index in 3.032 Families - Evidence for Sex- and Population-specific Genetic Effects 
Stature (adult body height) and body mass index (BMI) have a strong genetic component explaining observed variation in human populations; however, identifying those genetic components has been extremely challenging. It seems obvious that sample size is a critical determinant for successful identification of quantitative trait loci (QTL) that underlie the genetic architecture of these polygenic traits. The inherent shared environment and known genetic relationships in family studies provide clear advantages for gene mapping over studies utilizing unrelated individuals. To these ends, we combined the genotype and phenotype data from four previously performed family-based genome-wide screens, resulting in a sample of 9,371 individuals from 3,032 African-American and European-American families, and performed variance-components linkage analyses for stature and BMI. To our knowledge, this study represents the single largest family-based genome-wide linkage scan published for stature and BMI to date. This large study sample allowed us to pursue population- and sex-specific analyses as well. For stature, we found evidence for linkage in previously reported loci on 11q23, 12q12, 15q25 and 18q23, as well as 15q26 and 19q13, which have not been linked to stature previously. For BMI, we found evidence for two loci: one on 7q35 and another on 11q22, both of which have been previously linked to BMI in multiple populations. Our results show both the benefit of (1) combining data to maximize the sample size and (2) minimizing heterogeneity by analyzing subgroups where within-group variation can be reduced, and suggest that the latter may be a more successful approach in genetic mapping.
doi:10.1038/ejhg.2008.152
PMCID: PMC2628452  PMID: 18781184
Body Height; Body Mass Index; Linkage mapping; Quantitative Trait Loci
10.  Genome-wide linkage screen for stature and body mass index in 3.032 families: evidence for sex- and population-specific genetic effects 
Stature (adult body height) and body mass index (BMI) have a strong genetic component explaining observed variation in human populations; however, identifying those genetic components has been extremely challenging. It seems obvious that sample size is a critical determinant for successful identification of quantitative trait loci (QTL) that underlie the genetic architecture of these polygenic traits. The inherent shared environment and known genetic relationships in family studies provide clear advantages for gene mapping over studies utilizing unrelated individuals. To these ends, we combined the genotype and phenotype data from four previously performed family-based genome-wide screens, resulting in a sample of 9,371 individuals from 3,032 African-American and European-American families, and performed variance-components linkage analyses for stature and BMI. To our knowledge, this study represents the single largest family-based genome-wide linkage scan published for stature and BMI to date. This large study sample allowed us to pursue population- and sex-specific analyses as well. For stature, we found evidence for linkage in previously reported loci on 11q23, 12q12, 15q25 and 18q23, as well as 15q26 and 19q13, which have not been linked to stature previously. For BMI, we found evidence for two loci: one on 7q35 and another on 11q22, both of which have been previously linked to BMI in multiple populations. Our results show both the benefit of (1) combining data to maximize the sample size and (2) minimizing heterogeneity by analyzing subgroups where within-group variation can be reduced, and suggest that the latter may be a more successful approach in genetic mapping.
doi:10.1038/ejhg.2008.152
PMCID: PMC2628452  PMID: 18781184
body height; body mass index; linkage mapping; quantitative trait loci
11.  Combined Genome Scans for Body Stature in 6,602 European Twins: Evidence for Common Caucasian Loci 
PLoS Genetics  2007;3(6):e97.
Twin cohorts provide a unique advantage for investigations of the role of genetics and environment in the etiology of variation in common complex traits by reducing the variance due to environment, age, and cohort differences. The GenomEUtwin (http://www.genomeutwin.org) consortium consists of eight twin cohorts (Australian, Danish, Dutch, Finnish, Italian, Norwegian, Swedish, and United Kingdom) with a total resource of hundreds of thousands of twin pairs. We performed quantitative trait locus (QTL) analysis of one of the most heritable human complex traits, adult stature (body height), using genome-wide scans performed for 3,817 families (8,450 individuals) derived from twin cohorts from Australia, Denmark, Finland, Netherlands, Sweden, and United Kingdom with an approximate ten-centimorgan microsatellite marker map. The marker maps differed between studies; they were combined and related to the sequence positions using software developed by us, which is publicly available (https://apps.bioinfo.helsinki.fi/software/cartographer.aspx). Variance component linkage analysis was performed with age, sex, and country of origin as covariates. The covariate-adjusted heritability was 81% for stature in the pooled dataset. We found evidence for a major QTL for human stature on 8q21.3 (multipoint logarithm of the odds 3.28), and suggestive evidence for loci on Chromosomes X, 7, and 20. Some evidence of sex heterogeneity was found; however, no obvious female-specific QTLs emerged. Several cohorts contributed to the identified loci, suggesting an evolutionarily old genetic variant having effects on stature in European-based populations. To facilitate the genetic studies of stature we have also set up a website that lists all stature genome scans published and their most significant loci (http://www.genomeutwin.org/stature_gene_map.htm).
Author Summary
Twin cohorts provide a unique advantage for research of the role of genetics and environment behind common complex traits by reducing the variance due to environment, age, and cohort differences. The GenomEUtwin consortium consists of eight twin cohorts with the total resource of hundreds of thousands of twin pairs (http://www.genomeutwin.org). We performed quantitative family-based genetic linkage analysis for one of the most heritable human complex traits, adult stature (body height), using genome-wide scans derived from twin cohorts from Australia, Denmark, Finland, Netherlands, Sweden, and United Kingdom. Age, sex, and country were adjusted for in the data analyses. Human stature was found to be very heritable across all the cohorts and in the combined dataset. We found evidence for a shared genetic locus accounting for human stature on Chromosome 8, and suggestive evidence for loci on Chromosomes X, 7, and 20. Since twins from several countries contributed to the identified loci, an evolutionarily old genetic variant must influence stature in European-based populations. To facilitate the research in the field we have also set up a website that lists all stature genome scans published and their most significant loci (http://www.genomeutwin.org/stature_gene_map.htm).
doi:10.1371/journal.pgen.0030097
PMCID: PMC1892350  PMID: 17559308
