Results 1-13 (13)
 

1.  Ancient human genome sequence of an extinct Palaeo-Eskimo 
Nature  2010;463(7282):757-762.
We report here the genome sequence of an ancient human. Obtained from ∼4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.
doi:10.1038/nature08835
PMCID: PMC3951495  PMID: 20148029
2.  Genome-Wide Association Study of Genetic Variants in LPS-Stimulated IL-6, IL-8, IL-10, IL-1ra and TNF-α Cytokine Response in a Danish Cohort 
PLoS ONE  2013;8(6):e66262.
Background
Cytokine response plays a vital role in various human infectious and inflammatory diseases involving lipopolysaccharide (LPS). This study aimed to find genetic variants that might affect the levels of LPS-induced interleukin (IL)-6, IL-8, IL-10, IL-1ra and tumor necrosis factor (TNF)-α cytokine production.
Methods
We performed an initial genome-wide association study using Affymetrix Human Mapping 500 K GeneChip® to screen 130 healthy individuals of Danish descent. The levels of IL-6, IL-8, IL-10, IL-1ra and TNF-α in 24-hour LPS-stimulated whole blood samples were compared within different genotypes. The 152 most significant SNPs were replicated using Illumina Golden Gate® GeneChip in an independent cohort of 186 Danish individuals. Next, 9 of the most statistically significant SNPs were replicated using PCR-based genotyping in an independent cohort of 400 Danish individuals. All results were analyzed in a combined study among the 716 Danish individuals.
Results
Only one marker of the 500 K Gene Chip in the discovery study showed a significant association with LPS-induced IL-1ra cytokine levels after Bonferroni correction (P < 10⁻⁷). However, this SNP was not associated with the IL-1ra cytokine levels in the replication dataset. No SNPs reached genome-wide significance for the five cytokine levels in the combined analysis of all three stages.
Conclusions
The associations between the genetic variants and the LPS-induced IL-6, IL-8, IL-10, IL-1ra and TNF-α cytokine levels were not significant in the meta-analysis. The present study does not support a strong genetic effect on LPS-stimulated cytokine production; however, the potential for type II errors should be considered.
doi:10.1371/journal.pone.0066262
PMCID: PMC3688878  PMID: 23823136
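The Bonferroni threshold quoted in the abstract above can be illustrated with a short calculation. This is a minimal sketch, not the authors' analysis code: the family-wise level of 0.05 and the nominal count of 500,000 markers are assumptions chosen to match the "500 K" chip name, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float, n_tests: int) -> bool:
    """True when a single test survives Bonferroni correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Assumed values: alpha = 0.05 and a nominal 500,000 markers on the chip.
threshold = bonferroni_threshold(0.05, 500_000)
print(f"per-marker threshold: {threshold:.1e}")
```

Under these assumptions the per-marker threshold is on the order of 10⁻⁷, which is consistent with the cutoff reported in the Results.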
3.  Genetic Architecture of Vitamin B12 and Folate Levels Uncovered Applying Deeply Sequenced Large Datasets 
PLoS Genetics  2013;9(6):e1003530.
Genome-wide association studies have mainly relied on common HapMap sequence variations. Recently, sequencing approaches have allowed analysis of low frequency and rare variants in conjunction with common variants, thereby improving the search for functional variants and thus the understanding of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined samples of 45,576 and 37,341 individuals with serum B12 and folate measurements, respectively. We found six novel loci associating with serum B12 (CD320, TCN2, ABCD4, MMAA, MMACHC) or folate levels (FOLR3) and confirmed seven loci for these traits (TCN1, FUT6, FUT2, CUBN, CLYBL, MUT, MTHFR). Conditional analyses established that four loci contain additional independent signals. Interestingly, 13 of the 18 identified variants were coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. Contrary to epidemiological studies we did not find consistent association of the variants with cardiovascular diseases, cancers or Alzheimer's disease although some variants demonstrated pleiotropic effects. Although to some degree impeded by low statistical power for some of these conditions, these data suggest that sequence variants that contribute to the population diversity in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations.
Author Summary
Genome-wide association studies have in recent years revealed a wealth of common variants associated with common diseases and phenotypes. We took advantage of the advances in sequencing technologies to study the association of low frequency and rare variants in conjunction with common variants with serum levels of vitamin B12 (B12) and folate in Icelanders and Danes. We found 18 independent signals in 13 loci associated with serum B12 or folate levels. Interestingly, 13 of the 18 identified variants are coding and 11 of the 13 target genes have known functions related to B12 and folate pathways. These data indicate that the target genes at all of the loci have been identified. Epidemiological studies have shown a relationship between serum B12 and folate levels and the risk of cardiovascular diseases, cancers, and Alzheimer's disease. We investigated association between the identified variants and these diseases but did not find consistent association.
doi:10.1371/journal.pgen.1003530
PMCID: PMC3674994  PMID: 23754956
4.  Genotype and SNP calling from next-generation sequencing data 
Nature reviews. Genetics  2011;12(6):443-451.
Meaningful analysis of next-generation sequencing (NGS) data, which are produced extensively by genetics and genomics studies, relies crucially on the accurate calling of SNPs and genotypes. Recently developed statistical methods both improve and quantify the considerable uncertainty associated with genotype calling, and will especially benefit the growing number of studies using low- to medium-coverage data. We review these methods and provide a guide for their use in NGS studies.
doi:10.1038/nrg2986
PMCID: PMC3593722  PMID: 21587300
5.  SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data 
PLoS ONE  2012;7(7):e37558.
We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
doi:10.1371/journal.pone.0037558
PMCID: PMC3404070  PMID: 22911679
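The idea of Bayesian genotype calling mentioned in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's method: it assumes a Hardy-Weinberg prior built from a single known allele frequency rather than a prior derived from the estimated allele frequency spectrum, and all function names and numbers are hypothetical.

```python
def genotype_posterior(likelihoods, freq):
    """Posterior over genotypes 0, 1, 2 (copies of the alternate allele).

    likelihoods: (L0, L1, L2), the probability of the reads given each genotype.
    freq: assumed alternate-allele frequency used for a Hardy-Weinberg prior.
    """
    prior = ((1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2)
    weights = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(weights)
    if total == 0:
        raise ValueError("likelihoods and prior have no overlapping support")
    return [w / total for w in weights]


def call_genotype(likelihoods, freq, min_posterior=0.0):
    """Return the most probable genotype, or None if it is below min_posterior."""
    post = genotype_posterior(likelihoods, freq)
    best = max(range(3), key=lambda g: post[g])
    return best if post[best] >= min_posterior else None


# Hypothetical low-coverage site: the reads weakly favour a heterozygote.
reads = (0.2, 0.5, 0.3)
print(call_genotype(reads, freq=0.5))   # common allele
print(call_genotype(reads, freq=0.01))  # rare allele
```

With the same read evidence, the call depends on the prior: a rare alternate allele pulls the call toward the homozygous reference genotype, which is why the allele frequency estimate matters for low-coverage data.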
6.  Ascertainment Biases in SNP Chips Affect Measures of Population Divergence 
Molecular Biology and Evolution  2010;27(11):2534-2547.
Chip-based high-throughput genotyping has facilitated genome-wide studies of genetic diversity. Many studies have utilized these large data sets to make inferences about the demographic history of human populations using measures of genetic differentiation such as FST or principal component analyses. However, the single nucleotide polymorphism (SNP) chip data suffer from ascertainment biases caused by the SNP discovery process in which a small number of individuals from selected populations are used as discovery panels. In this study, we investigate the effect of the ascertainment bias on inferences regarding genetic differentiation among populations in one of the common genome-wide genotyping platforms. We generate SNP genotyping data for individuals that previously have been subject to partial genome-wide Sanger sequencing and compare inferences based on genotyping data to inferences based on direct sequencing. In addition, we also analyze publicly available genome-wide data. We demonstrate that the ascertainment biases will distort measures of human diversity and possibly change conclusions drawn from these measures in sometimes unexpected ways. We also show that details of the genotyping calling algorithms can have a surprisingly large effect on population genetic inferences. We not only present a correction of the spectrum for the widely used Affymetrix SNP chips but also show that such corrections are difficult to generalize among studies.
doi:10.1093/molbev/msq148
PMCID: PMC3107607  PMID: 20558595
ascertainment bias; demography; single nucleotide polymorphisms; SNP chip data; population genetics
7.  Natural Selection Affects Multiple Aspects of Genetic Variation at Putatively Neutral Sites across the Human Genome 
PLoS Genetics  2011;7(10):e1002326.
A major question in evolutionary biology is how natural selection has shaped patterns of genetic variation across the human genome. Previous work has documented a reduction in genetic diversity in regions of the genome with low recombination rates. However, it is unclear whether other summaries of genetic variation, like allele frequencies, are also correlated with recombination rate and whether these correlations can be explained solely by negative selection against deleterious mutations or whether positive selection acting on favorable alleles is also required. Here we attempt to address these questions by analyzing three different genome-wide resequencing datasets from European individuals. We document several significant correlations between different genomic features. In particular, we find that average minor allele frequency and diversity are reduced in regions of low recombination and that human diversity, human-chimp divergence, and average minor allele frequency are reduced near genes. Population genetic simulations show that either positive natural selection acting on favorable mutations or negative natural selection acting against deleterious mutations can explain these correlations. However, models with strong positive selection on nonsynonymous mutations and little negative selection predict a stronger negative correlation between neutral diversity and nonsynonymous divergence than observed in the actual data, supporting the importance of negative, rather than positive, selection throughout the genome. Further, we show that the widespread presence of weakly deleterious alleles, rather than a small number of strongly positively selected mutations, is responsible for the correlation between neutral genetic diversity and recombination rate. This work suggests that natural selection has affected multiple aspects of linked neutral variation throughout the human genome and that positive selection is not required to explain these observations.
Author Summary
While researchers have identified candidate genes that have evolved under positive Darwinian natural selection, less is known about how much of the human genome has been affected by natural selection or whether positive selection has had a greater role at shaping patterns of variation across the human genome than negative selection acting against deleterious mutations. To address these questions, we have combined patterns of genetic variation in three genome-wide resequencing datasets with population genetic models of natural selection. We find that genetic diversity and average minor allele frequency are reduced in regions of the genome with low recombination rate. Additionally, genetic diversity, human-chimp divergence, and average minor allele frequency have been reduced near genes. Overall, while we cannot exclude positive selection at a fraction of mutations, models that include many weakly deleterious mutations throughout the human genome better explain multiple aspects of the genome-wide resequencing data. This work points to negative selection as an important force for shaping patterns of variation and suggests that there are many weakly deleterious mutations at both coding and noncoding sites throughout the human genome. Understanding such mutations will be important for learning about human evolution and the genetic basis of common disease.
doi:10.1371/journal.pgen.1002326
PMCID: PMC3192825  PMID: 22022285
8.  Combined Analyses of 20 Common Obesity Susceptibility Variants 
Diabetes  2010;59(7):1667-1673.
OBJECTIVE
Genome-wide association studies and linkage studies have identified 20 validated genetic variants associated with obesity and/or related phenotypes. The variants are common, and they individually exhibit small-to-modest effect sizes.
RESEARCH DESIGN AND METHODS
In this study we investigate the combined effect of these variants and their ability to discriminate between normal weight and overweight/obese individuals. We applied receiver operating characteristics (ROC) curves, and estimated the area under the ROC curve (AUC) as a measure of the discriminatory ability. The analyses were performed cross-sectionally in the population-based Inter99 cohort where 1,725 normal weight, 1,519 overweight, and 681 obese individuals were successfully genotyped for all 20 variants.
RESULTS
When combining all variants, the 10% of the study participants who carried more than 22 risk alleles showed a significantly increased probability of being overweight, with an odds ratio (OR) of 2.00 (1.47–2.72), P = 4.0 × 10⁻⁵, and of being obese, with an OR of 2.62 (1.76–3.92), P = 6.4 × 10⁻⁷, compared with the 10% of the study participants who carried fewer than 14 risk alleles. Using the 20 single nucleotide polymorphisms (SNPs), the discrimination ability for overweight and obesity corresponded to AUCs of 0.53 and 0.58, respectively. When combining SNP data with conventional nongenetic risk factors of obesity, the discrimination ability increased to 0.64 for overweight and 0.69 for obesity. The latter is significantly higher (P < 0.001) than for the nongenetic factors alone (AUC = 0.67).
CONCLUSIONS
The discriminative value of the 20 validated common obesity variants is at present sparse and too weak for clinical utility; however, the variants add to the discrimination ability of conventional nongenetic risk factors.
doi:10.2337/db09-1042
PMCID: PMC2889766  PMID: 20110568
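The AUC used as the discrimination measure in the abstract above can be computed directly from risk-allele counts. The sketch below is illustrative only: it uses the rank-based (Mann-Whitney) definition of the AUC, the genotype data are invented, and the function names are hypothetical rather than taken from the study.

```python
def risk_allele_count(genotypes):
    """Sum of risk alleles (0, 1 or 2 per variant) across all genotyped variants."""
    return sum(genotypes)


def auc(case_scores, control_scores):
    """Probability that a random case scores higher than a random control.

    Ties count as one half, which matches the area under the empirical ROC curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented risk-allele counts for three obese and three normal-weight individuals.
obese = [20, 18, 22]
normal_weight = [17, 18, 15]
print(round(auc(obese, normal_weight), 3))
```

An AUC of 0.5 means the score discriminates no better than chance, which puts the reported values of 0.53 and 0.58 for the SNP-only score in context.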
9.  Estimation of allele frequency and association mapping using next-generation sequencing data 
BMC Bioinformatics  2011;12:231.
Background
Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates.
Results
We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.
Conclusions
Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
doi:10.1186/1471-2105-12-231
PMCID: PMC3212839  PMID: 21663684
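The approach described in the abstract above, estimating allele frequency by integrating over each individual's genotype uncertainty instead of calling genotypes first, can be sketched with an expectation-maximization loop. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Hardy-Weinberg proportions, takes per-individual genotype likelihoods as given, and uses hypothetical names and toy data.

```python
def em_allele_frequency(genotype_likelihoods, f_init=0.2, tol=1e-8, max_iter=500):
    """Maximum likelihood allele frequency from genotype likelihoods via EM.

    genotype_likelihoods: list of (L0, L1, L2), the probability of each
    individual's reads given 0, 1 or 2 copies of the alternate allele.
    Assumes Hardy-Weinberg genotype proportions.
    """
    eps = 1e-12
    f = f_init
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        f = min(max(f, eps), 1 - eps)  # keep the prior strictly inside (0, 1)
        expected_alt = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # E-step: posterior genotype weights under the current frequency.
            w0 = l0 * (1 - f) ** 2
            w1 = l1 * 2 * f * (1 - f)
            w2 = l2 * f ** 2
            expected_alt += (w1 + 2 * w2) / (w0 + w1 + w2)
        # M-step: expected alternate alleles divided by total chromosomes.
        f_new = expected_alt / (2 * n)
        if abs(f_new - f) < tol:
            return f_new
        f = f_new
    return f


# Toy data: two well-covered individuals and two with ambiguous reads.
likelihoods = [(0.98, 0.02, 0.0), (0.01, 0.98, 0.01), (0.4, 0.5, 0.1), (0.6, 0.35, 0.05)]
print(em_allele_frequency(likelihoods))
```

Individuals with ambiguous reads contribute fractional allele counts here, whereas a genotype-calling pipeline would force each of them to a single genotype before counting.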
10.  Family and Population-Based Studies of Variation within the Ghrelin Receptor Locus in Relation to Measures of Obesity 
PLoS ONE  2010;5(4):e10084.
Background
The growth hormone secretagogue receptor (GHSR) mediates hunger sensation when stimulated by its natural ligand ghrelin. In the present study, we tested the hypothesis that common and rare variation in the GHSR locus are related to increased prevalence of obesity and overweight among Whites.
Methodology/Principal Findings
In a population-based study sample of 15,854 unrelated, middle-aged Danes, seven variants were genotyped to capture common variation in an 11 kbp region including GHSR. These were investigated for their individual and haplotypic association with obesity. None of these analyses revealed consistent association with measures of obesity. A -151C/T promoter mutation in the GHSR was found in two unrelated obese patients. One family presented with complete co-segregation, but the other with incomplete co-segregation. The mutation resulted in an increased transcriptional activity (p < 0.02) and introduction of a specific binding for Sp-1-like nuclear extracts relative to the wild type. The -151C/T mutation was genotyped in the 15,854 Danes and had a minor allele frequency of 0.01%. No association with obesity in carriers (mean BMI: 27±4 kg/m²) versus non-carriers (mean BMI: 28±5 kg/m²) (p > 0.05) could be shown.
Conclusions/Significance
In a population-based study sample of 15,854 Danes no association between GHSR genotypes and measures of obesity and overweight was found. Also, analyses of GHSR haplotypes lack consistent associations with obesity-related traits. A rare functional GHSR promoter mutation variant was identified, yet there was no consistent relationship with obesity in either family- or population-based studies.
doi:10.1371/journal.pone.0010084
PMCID: PMC2852411  PMID: 20404923
11.  Association Testing of Novel Type 2 Diabetes Risk Alleles in the JAZF1, CDC123/CAMK1D, TSPAN8, THADA, ADAMTS9, and NOTCH2 Loci With Insulin Release, Insulin Sensitivity, and Obesity in a Population-Based Sample of 4,516 Glucose-Tolerant Middle-Aged Danes 
Diabetes  2008;57(9):2534-2540.
OBJECTIVE— We evaluated the impact on diabetes-related intermediary traits of common novel type 2 diabetes–associated variants in the JAZF1 (rs864745), CDC123/CAMK1D (rs12779790), TSPAN8 (rs7961581), THADA (rs7578597), ADAMTS9 (rs4607103), and NOTCH2 (rs10923931) loci, which were recently identified by meta-analysis of genome-wide association data.
RESEARCH DESIGN AND METHODS— We genotyped the six variants in 4,516 middle-aged glucose-tolerant individuals of the population-based Inter99 cohort who were all characterized by an oral glucose tolerance test (OGTT).
RESULTS— Homozygous carriers of the minor diabetes risk G-allele of the CDC123/CAMK1D rs12779790 showed an 18% decrease in insulinogenic index (95% CI 10–27%; P = 4 × 10⁻⁵), an 18% decrease in corrected insulin response (CIR) (8.1–29%; P = 4 × 10⁻⁴), and a 13% decrease in the ratio of area under the serum-insulin and plasma-glucose curves during an OGTT (AUC-insulin/AUC-glucose) (5.8–20%; P = 4 × 10⁻⁴). Carriers of the diabetes-associated T-allele of JAZF1 rs864745 had an allele-dependent 3% decrease in BIGTT-AIR (0.9–4.3%; P = 0.003). Furthermore, the diabetes-associated C-allele of TSPAN8 rs7961581 associated with decreased levels of CIR (4.5% [0.5–8.4]; P = 0.03), of AUC-insulin/AUC-glucose ratio (3.9% [1.2–6.7]; P = 0.005), and of the insulinogenic index (5.2% [1.9–8.6]; P = 0.002). No association with traits of insulin release or insulin action was observed for the THADA, ADAMTS9, or NOTCH2 variants.
CONCLUSIONS— If replicated, our data suggest that type 2 diabetes at-risk alleles in the JAZF1, CDC123/CAMK1D, and TSPAN8 loci associate with various OGTT-based surrogate measures of insulin release, emphasizing the contribution of abnormal pancreatic β-cell function in the pathogenesis of type 2 diabetes.
doi:10.2337/db08-0436
PMCID: PMC2518507  PMID: 18567820
12.  The Validation and Assessment of Machine Learning: A Game of Prediction from High-Dimensional Data 
PLoS ONE  2009;4(8):e6287.
In applied statistics, tools from machine learning are popular for analyzing complex and high-dimensional data. However, few theoretical results are available that could guide the choice of an appropriate machine learning tool for a new application. Initial development of an overall strategy thus often implies that multiple methods are tested and compared on the same set of data. This is particularly difficult in situations that are prone to over-fitting where the number of subjects is low compared to the number of potential predictors. The article presents a game which provides some grounds for conducting a fair model comparison. Each player selects a modeling strategy for predicting individual response from potential predictors. A strictly proper scoring rule, bootstrap cross-validation, and a set of rules are used to make the results obtained with different strategies comparable. To illustrate the ideas, the game is applied to data from the Nugenob Study where the aim is to predict the fat oxidation capacity based on conventional factors and high-dimensional metabolomics data. Three players have chosen to use support vector machines, LASSO, and random forests, respectively.
doi:10.1371/journal.pone.0006287
PMCID: PMC2716515  PMID: 19652722
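The comparison scheme described in the abstract above, scoring competing modeling strategies with a strictly proper scoring rule under bootstrap cross-validation, can be sketched as follows. This is an illustration under assumptions rather than the article's procedure: it assumes a binary outcome scored with the Brier score, uses two toy strategies in place of support vector machines, LASSO and random forests, and all names and data are hypothetical.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def bootstrap_cv(data, fit, n_boot=200, seed=0):
    """Average out-of-bag Brier score of a modeling strategy.

    data: list of (x, y) pairs; fit: function mapping training data to a predictor.
    Each round trains on a bootstrap sample and tests on the subjects left out.
    """
    rng = random.Random(seed)
    n = len(data)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        test = [data[i] for i in range(n) if i not in in_bag]
        if not test:
            continue
        model = fit([data[i] for i in idx])
        scores.append(brier_score([model(x) for x, _ in test], [y for _, y in test]))
    return sum(scores) / len(scores)


def fit_prevalence(train):
    """Strategy A: ignore the predictor and always predict the training prevalence."""
    p = sum(y for _, y in train) / len(train)
    return lambda x: p


def fit_split(train):
    """Strategy B: predict the outcome rate on each side of the training median of x."""
    cut = sorted(x for x, _ in train)[len(train) // 2]
    overall = sum(y for _, y in train) / len(train)
    low = [y for x, y in train if x < cut]
    high = [y for x, y in train if x >= cut]
    p_low = sum(low) / len(low) if low else overall
    p_high = sum(high) / len(high) if high else overall
    return lambda x: p_high if x >= cut else p_low


# Synthetic data in which the outcome depends on the predictor.
data = [(x, 1 if x >= 20 else 0) for x in range(40)]
print("prevalence:", bootstrap_cv(data, fit_prevalence))
print("split:", bootstrap_cv(data, fit_split))
```

Because both strategies are scored on the same bootstrap splits (same seed) with the same rule, the lower average score identifies the better strategy without rewarding over-fitting to the training sample.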
13.  Novel de novo BRCA2 mutation in a patient with a family history of breast cancer 
BMC Medical Genetics  2008;9:58.
Background
BRCA2 germ-line mutations predispose to breast and ovarian cancer. Mutations are widespread and unclassified splice variants are frequently encountered. We describe the parental origin and functional characterization of a novel de novo BRCA2 splice site mutation found in a patient exhibiting a ductal carcinoma at the age of 40.
Methods
Variations were identified by denaturing high performance liquid chromatography (dHPLC) and sequencing of the BRCA1 and BRCA2 genes. The effect of the mutation on splicing was examined by exon trapping in COS-7 cells and by RT-PCR on RNA isolated from whole blood. The paternity was determined by single nucleotide polymorphism (SNP) microarray analysis. Parental origin of the de novo mutation was determined by establishing mutation-SNP haplotypes by variant specific PCR, while de novo and mosaic status was investigated by sequencing of DNA from leucocytes and carcinoma tissue.
Results
A novel BRCA2 variant in the splice donor site of exon 21 (nucleotide 8982+1 G→A/c.8754+1 G→A) was identified. Exon trapping showed that the mutation activates a cryptic splice site 46 base pairs 3' of exon 21, resulting in the inclusion of a premature stop codon and synthesis of a truncated BRCA2 protein. The aberrant splicing was verified by RT-PCR analysis on RNA isolated from whole blood of the affected patient. The mutation was not found in either of the patient's parents or in the mother's carcinoma, showing it is a de novo mutation. Variant specific PCR indicates that the mutation arose in the male germ-line.
Conclusion
We conclude that the novel BRCA2 splice variant is a de novo mutation that arose in the male germ-line and can be classified as a disease-causing mutation.
doi:10.1186/1471-2350-9-58
PMCID: PMC2478678  PMID: 18597679
