1.  Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research 
A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access, in an integrated manner, phenotypic, clinical and other information about samples that are currently stored at biobanks. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: we first created harmonised variables, and then annotated and made searchable the information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data we sidestep many obstacles related to privacy that arise when handling real values. We show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in pre-analysis exchange of information between biobanks, that is, during the project planning phase.
PMCID: PMC4929882  PMID: 26306643
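The availability-based linking described in this abstract can be illustrated with a minimal sketch. It is not the SAIL implementation: the harmonisation map, the biobank names, the local variable names and the record layout below are all hypothetical, chosen only to show how counts of available samples per harmonised variable can be searched without exposing the measured values.

```python
from collections import Counter

# Hypothetical harmonisation map (an assumption for illustration):
# (biobank, local variable name) -> shared, harmonised variable name.
HARMONISED = {
    ("biobank_a", "bmi_kg_m2"): "BMI",
    ("biobank_b", "body_mass_index"): "BMI",
    ("biobank_a", "t2d_status"): "type 2 diabetes",
}


def availability_counts(records):
    """Count, per biobank and harmonised variable, the samples with a value.

    Only the presence of a value is inspected; the value itself is never
    stored in the result.
    """
    counts = Counter()
    for biobank, _sample_id, local_values in records:
        for local_name, value in local_values.items():
            harmonised = HARMONISED.get((biobank, local_name))
            if harmonised is not None and value is not None:
                counts[(biobank, harmonised)] += 1
    return counts


def search(counts, variable):
    """Return {biobank: number of samples} for one harmonised variable."""
    return {b: n for (b, v), n in counts.items() if v == variable}


# Hypothetical sample records: (biobank, sample id, local variables).
records = [
    ("biobank_a", "a1", {"bmi_kg_m2": 24.1, "t2d_status": None}),
    ("biobank_a", "a2", {"bmi_kg_m2": 31.0, "t2d_status": "yes"}),
    ("biobank_b", "b1", {"body_mass_index": 27.5}),
    ("biobank_b", "b2", {"body_mass_index": None}),
]

counts = availability_counts(records)
bmi_availability = search(counts, "BMI")
```

A search over `counts` then answers planning questions such as how many samples with a BMI record each biobank could contribute, while the individual-level values stay at the source.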
2.  Data sharing in large research consortia: experiences and recommendations from ENGAGE 
Data sharing is essential for the conduct of cutting-edge research and is increasingly required by funders concerned with maximising the scientific yield from research data collections. International research consortia are encouraged to share data intra-consortia, inter-consortia and with the wider scientific community. Little is reported regarding the factors that hinder or facilitate data sharing in these different situations. This paper provides results from a survey conducted in the European Network for Genetic and Genomic Epidemiology (ENGAGE) that collected information from its participating institutions about their data-sharing experiences. The questionnaire asked about potential hurdles to data sharing, concerns about data sharing, lessons learned and recommendations for future collaborations. Overall, the survey results reveal that data sharing functioned well in ENGAGE and highlight the areas that most frequently posed hurdles for data sharing. Further challenges arise for international data sharing beyond the consortium. These challenges are described and steps to help address them are outlined.
PMCID: PMC3925260  PMID: 23778872
biobanks; data sharing; consortia; genetic research
3.  Genomic inflation factors under polygenic inheritance 
Population structure, including population stratification and cryptic relatedness, can cause spurious associations in genome-wide association studies (GWAS). Usually, the scaled median or mean test statistic for association calculated from multiple single-nucleotide polymorphisms across the genome is used to assess such effects, and 'genomic control' can be applied subsequently to adjust test statistics at individual loci by a genomic inflation factor. Published GWAS have clearly shown that there are many loci underlying genetic variation for a wide range of complex diseases and traits, implying that a substantial proportion of the genome should show inflation of the test statistic. Here, we show by theory, simulation and analysis of data that in the absence of population structure and other technical artefacts, but in the presence of polygenic inheritance, substantial genomic inflation is expected. Its magnitude depends on sample size, heritability, linkage disequilibrium structure and the number of causal variants. Our predictions are consistent with empirical observations on height in independent samples of ∼4000 and ∼133 000 individuals.
PMCID: PMC3137506  PMID: 21407268
genome-wide association study; genomic inflation factor; polygenic inheritance
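The median-based genomic inflation factor and the genomic control adjustment mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes 1-degree-of-freedom chi-square association statistics, uses the known median of that distribution (about 0.4549), and follows the common convention of not adjusting when the estimated factor is below 1. The function names and the simulated null data are hypothetical.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Observed median test statistic divided by its expectation under the null."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats, lam=None):
    """Divide each statistic by the inflation factor.

    Assumption: a factor below 1 is treated as 1, so statistics are
    never scaled upwards.
    """
    if lam is None:
        lam = genomic_inflation_factor(chi2_stats)
    lam = max(lam, 1.0)
    return [x / lam for x in chi2_stats]


def simulate_null_chi2(n_snps, seed=0):
    """Simulated null statistics: squared standard-normal z-scores."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_snps)]


null_stats = simulate_null_chi2(20000)
lambda_null = genomic_inflation_factor(null_stats)

# Uniformly inflate every statistic by 50% to mimic an inflated scan.
inflated_stats = [1.5 * x for x in null_stats]
lambda_inflated = genomic_inflation_factor(inflated_stats)
corrected_stats = genomic_control(inflated_stats)
```

Under the simulated null the factor should be close to 1, and scaling every statistic by a constant scales the factor by the same constant. The sketch only shows how the factor is computed and applied. It cannot distinguish inflation caused by population structure from the polygenic inflation that the abstract describes.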
5.  Association of FTO variants with BMI and fat mass in the self-contained population of Sorbs in Germany 
The association of common variants in the FTO gene with weight, adiposity and body mass index (BMI) has now been widely replicated. Although the causal variant has yet to be identified, it most likely maps within a 47 kb region of intron 1 of FTO. We performed a genome-wide association study in the Sorbian population and evaluated the relationships between FTO variants and BMI and fat mass in this isolate of Slavonic origin resident in Germany. In a sample of 948 Sorbs, we could replicate the earlier reported associations of intron 1 SNPs with BMI (eg, P-value=0.003, β=0.02 for rs8050136). However, using genome-wide association data, we also detected a second independent signal mapping to a region in intron 2/3 about 40–60 kb away from the originally reported SNPs (eg, for rs17818902 association with BMI P-value=0.0006, β=−0.03 and with fat mass P-value=0.0018, β=−0.079). Both signals remain independently associated in the conditional analyses. In conclusion, we extend the evidence that FTO variants are associated with BMI by putatively identifying a second susceptibility allele independent of that described earlier. Although further statistical analysis of these findings is hampered by the finite size of the Sorbian isolate, these findings should encourage other groups to seek alternative susceptibility variants within FTO (and other established susceptibility loci) using the opportunities afforded by analyses in populations with divergent mutational and/or demographic histories.
PMCID: PMC2987177  PMID: 19584900
FTO; BMI; Sorbs

