Results 1-15 (15)

1.  Whole genome sequencing data from pedigrees suggests linkage disequilibrium among rare variants created by population admixture 
BMC Proceedings  2014;8(Suppl 1):S44.
Next-generation sequencing technologies have been designed to discover rare and de novo variants and are an important tool for identifying rare disease variants. Many statistical methods have been developed to test, using next-generation sequencing data, for rare variants that are associated with a trait. However, many of these methods assume that rare variants within a gene are in linkage equilibrium. In this report, we studied whether transmitted or untransmitted haplotypes carry an excess of rare variants using the whole genome sequencing data of 15 large Mexican American pedigrees provided by the Genetic Analysis Workshop 18. We observed that an excess of rare variants is carried on either transmitted or untransmitted haplotypes from parents to offspring. Further analyses suggest that such nonrandom associations among rare variants can be attributed to population admixture and single-nucleotide variant calling errors. Our results have significant implications for rare variant association studies, especially those conducted in admixed populations.
PMCID: PMC4143626  PMID: 25519326
2.  De novo mutations discovered in 8 Mexican American families through whole genome sequencing 
BMC Proceedings  2014;8(Suppl 1):S24.
De novo mutations enrich sequence diversity and carry clues about evolutionary selection. Recent studies suggest that de novo mutations could be one of the risk factors for complex diseases. We conducted a survey of de novo mutations using the whole genome sequence data, available only for the odd-numbered autosomes, of Mexican American families provided by Genetic Analysis Workshop 18. From 20 large pedigrees, we extracted 8 three-generation families with sequencing data available. By comparing the known single nucleotide variants (SNVs) in dbSNP129 and the de novo variants transmitted in the Mexican American families, we were able to estimate a de novo mutation rate of 1.64 (±0.42) × 10⁻⁸ per position per haploid genome. This result is consistent with estimates in the literature that required extensive validation efforts, such as genotyping and further resequencing. Our analysis suggests the importance of using family samples for studying rare variants.
PMCID: PMC4143763  PMID: 25519376
3.  Identifying rare variants from exome scans: the GAW17 experience 
BMC Proceedings  2011;5(Suppl 9):S1.
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
PMCID: PMC3287821  PMID: 22373325
4.  Evaluation of a LASSO regression approach on the unrelated samples of Genetic Analysis Workshop 17 
BMC Proceedings  2011;5(Suppl 9):S12.
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. For rare SNP detection, in this study we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches in this mini-exome data and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
PMCID: PMC3287844  PMID: 22373385
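The collapsing method mentioned in this abstract can be illustrated with a minimal sketch. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and uses a hypothetical frequency threshold of 0.05; the function names are illustrative and are not taken from the paper, which also modifies the method in ways not reproduced here.

```python
import numpy as np

def collapse_rare_variants(genotypes, maf_threshold=0.05):
    """Sum minor-allele counts over the rare SNPs of one gene.

    genotypes: (individuals x SNPs) array of minor-allele counts (assumed coding).
    Returns one burden score per individual and the mask of SNPs treated as rare.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    maf = genotypes.mean(axis=0) / 2.0           # minor allele frequency per SNP
    rare = (maf > 0) & (maf < maf_threshold)     # polymorphic and below threshold
    return genotypes[:, rare].sum(axis=1), rare

def burden_regression(burden, trait):
    """Ordinary least-squares fit of a quantitative trait on the burden score."""
    X = np.column_stack([np.ones_like(burden), burden])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return coef                                  # [intercept, slope]
```

Changing `maf_threshold` changes which SNPs enter the sum, which is one way to see why the choice of threshold affects the test.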
5.  Interrogating population structure and its impact on association tests 
BMC Proceedings  2011;5(Suppl 9):S25.
We found from our analysis of the Genetic Analysis Workshop 17 data that the population structure of the 697 unrelated individuals was an important confounding factor for association studies, even if it was not explicitly considered when simulating the phenotypes. We uncovered structures beyond the reported ethnicities and found ample evidence of phenotype–population structure associations. The first 10 principal components of the genotype data of the 697 individuals demonstrated much stronger associations with Q1, Q2, and the disease than did the individuals’ ethnicities. In addition, we observed that population structure was a confounding factor for the Q1-gene association when identifying the significant genes both with and without adjusting for the causal single-nucleotide polymorphisms, the ethnicities, and the principal components. Many false discoveries remained after adjusting for the causal single-nucleotide polymorphisms. Adjusting for the principal components appeared more effective than did adjusting for ethnicity in terms of preventing false discoveries. This analysis was performed with knowledge of the causal loci.
PMCID: PMC3287860  PMID: 22373290
6.  Testing gene-environment interactions in gene-based association studies 
BMC Proceedings  2011;5(Suppl 9):S26.
Gene-based and single-nucleotide polymorphism (SNP) set association studies provide an important complement to SNP analysis. Kernel-based nonparametric regression has recently emerged as a powerful and flexible tool for this purpose. Our goal is to explore whether this approach can be extended to incorporate and test for interaction effects, especially for genes containing rare variant SNPs. Here, we construct nonparametric regression models that can be used to include a gene-environment interaction effect under the framework of the least-squares kernel machine and examine the performance of the proposed method on the Genetic Analysis Workshop 17 unrelated individuals data set. Two hundred simulated replicates were used to explore the power for detecting interaction. We demonstrate through a genome scan of the quantitative phenotype Q1 that the simulated gene-environment interaction effect in the data can be detected with reasonable power by using the least-squares kernel machine method.
PMCID: PMC3287861  PMID: 22373316
7.  Estimating heritability using family and unrelated individuals data 
BMC Proceedings  2011;5(Suppl 9):S34.
For the family data from Genetic Analysis Workshop 17, we obtained heritability estimates of quantitative traits Q1 and Q4 using the ASSOC program in the S.A.G.E. software package. ASSOC is a family-based method that estimates heritability through the estimation of variance components. The covariate-adjusted mean heritability was 0.650 for Q1 and 0.745 for Q4. For the unrelated individuals data, we estimated the heritability of Q1 as the proportion of total variance that can be accounted for by all single-nucleotide polymorphisms under an additive model. We examined a novel ordinary least-squares method, a naïve restricted maximum-likelihood method, and a calibrated restricted maximum-likelihood method. We applied the different methods to all 200 replicates for Q1. We observed that the ordinary least-squares method yielded many estimates outside the interval [0, 1]. The restricted maximum-likelihood estimates were more stable than the ordinary least-squares estimates. The naïve restricted maximum-likelihood method yielded an average estimate of 0.462 ± 0.1, and the calibrated restricted maximum-likelihood method yielded an average of 0.535 ± 0.121. Our results demonstrate discrepancies in heritability estimates using the family data and the unrelated individuals data.
PMCID: PMC3287870  PMID: 22373039
8.  Rare variant density across the genome and across populations 
BMC Proceedings  2011;5(Suppl 9):S39.
Next-generation sequencing allows for a new focus on rare variant density for conducting analyses of association to disease and for narrowing down the genomic regions that show evidence of functionality. In this study we use the 1000 Genomes Project pilot data as distributed by Genetic Analysis Workshop 17 to compare rare variant densities across seven populations. We made the comparisons using regressions of rare variants on total variant counts per gene for each population and Tajima’s D values calculated for each gene in each population, using data on 3,205 genes. We found that the populations clustered by continent for both the regression slopes and Tajima’s D values, with the African populations (Yoruba and Luhya) showing the highest density of rare variants, followed by the Asian populations (Han and Denver Chinese followed by the Japanese) and the European populations (CEPH [European-descent] and Tuscan) with the lowest densities. These significant differences in rare variant densities across populations seem to translate to measures of the rare variant density more commonly used in rare variant association analyses, suggesting the need to adjust for ancestry in such analyses. The selection signal was high for AHNAK, HLA-A, RANBP2, and RGPD4, among others. RANBP2 and RGPD4 showed a marked difference in rare variant density and potential selection between the Luhya and the other populations. This may suggest that differences between populations should be considered when delimiting genomic regions according to functionality and that these differences can create potential for disease heterogeneity.
PMCID: PMC3287875  PMID: 22373165
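For readers unfamiliar with Tajima's D, the following sketch computes the standard statistic for one gene from a 0/1 haplotype matrix. The input layout (rows are chromosomes, columns are sites) and the function name are assumptions made for illustration; this is not the authors' pipeline.

```python
import numpy as np

def tajimas_d(haplotypes):
    """Tajima's D from a 0/1 haplotype matrix (rows = chromosomes, columns = sites).

    Needs at least 3 chromosomes; returns NaN when no site is segregating.
    """
    hap = np.asarray(haplotypes)
    n = hap.shape[0]
    counts = hap.sum(axis=0)
    seg = (counts > 0) & (counts < n)            # segregating sites only
    S = int(seg.sum())
    if S == 0:
        return float("nan")
    k = counts[seg].astype(float)
    # Mean number of pairwise differences, summed over sites.
    pi = (k * (n - k)).sum() / (n * (n - 1) / 2.0)
    i = np.arange(1, n)
    a1 = (1.0 / i).sum()
    a2 = (1.0 / i**2).sum()
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1))
```

A gene whose variants are mostly singletons yields a negative value, while a gene dominated by intermediate-frequency variants yields a positive one.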
9.  A novel method to detect rare variants using both family and unrelated case-control data 
BMC Proceedings  2011;5(Suppl 9):S80.
To detect rare variants associated with a phenotype, we develop a novel statistical method that can use both family and unrelated case-control data. Unlike the currently existing methods, we first use family data to calculate weights to be given to rare variants, differentiating between concordantly affected and discordant sib pairs. These weights are then used in an association test applied to the unrelated case-control data. We applied the proposed method to the simulated sequencing data in Genetic Analysis Workshop 17 and identified two genes associated with the disease.
PMCID: PMC3287921  PMID: 22373319
10.  Capability of common SNPs to tag rare variants 
BMC Proceedings  2011;5(Suppl 9):S88.
Genome-wide association studies are based on the linkage disequilibrium pattern between common tagging single-nucleotide polymorphisms (SNPs) (i.e., SNPs having only common alleles) and true causal variants, and association studies with rare SNP alleles aim to detect rare causal variants. To better understand and explain the findings from both types of studies and to provide clues to improve the power of an association study with only common SNPs genotyped, we study the correlation between common SNPs and the presence of rare alleles within a region in the genome and look at the capability of common SNPs in strong linkage disequilibrium with each other to capture single rare alleles. Our results indicate that common SNPs can, to some extent, tag the presence of rare alleles and that including SNPs in strong linkage disequilibrium with each other among the tagging SNPs helps to detect rare alleles.
PMCID: PMC3287929  PMID: 22373521
11.  Comparison of a unified analysis approach for family and unrelated samples with the transmission-disequilibrium test to study associations of hypertension in the Framingham Heart Study 
BMC Proceedings  2009;3(Suppl 7):S22.
Population stratification is one of the major causes of spurious associations in association studies. A unified association approach based on principal-component analysis can overcome the effect of population stratification, as well as make use of both family and unrelated samples combined to increase power (family-case-control, or FamCC). In this study, we compared FamCC and the transmission-disequilibrium test (TDT) using data on hypertension, systolic blood pressure, and diastolic blood pressure in the Framingham Heart Study. Our study indicated FamCC has reasonable type I error for both the unrelated sample and the family sample for all three traits. For these three traits, we found results from FamCC were inconsistent with those from the TDT. We discuss the reasons for this inconsistency. After correcting for multiple tests, we did not detect any significant single-nucleotide polymorphisms by either FamCC or the TDT.
PMCID: PMC2795919  PMID: 20018012
12.  Assessing the impact of global versus local ancestry in association studies 
BMC Proceedings  2009;3(Suppl 7):S107.
To account for population stratification in association studies, principal-components analysis is often performed on single-nucleotide polymorphisms (SNPs) across the genome. Here, we use Framingham Heart Study (FHS) Genetic Analysis Workshop 16 data to compare the performance of local ancestry adjustment for population stratification based on principal components (PCs) estimated from SNPs in a local chromosomal region with global ancestry adjustment based on PCs estimated from genome-wide SNPs.
Standardized height residuals from unrelated adults from the FHS Offspring Cohort were averaged from longitudinal data. PCs of SNP genotype data were calculated to represent each individual's ancestry either 1) globally using all SNPs across the genome or 2) locally using SNPs in adjacent 20-Mbp regions within each chromosome. We assessed the extent to which there were differences in association studies of height depending on whether PCs for global, local, or both global and local ancestry were included as covariates.
The correlations between local and global PCs were low (r < 0.12), suggesting variability between local and global ancestry estimates. Genome-wide association tests without any ancestry adjustment demonstrated an inflated type I error rate that decreased with adjustment for local ancestry, global ancestry, or both. A known spurious association was replicated for SNPs within the lactase gene, and this false-positive association was abolished by adjustment with local or global ancestry PCs.
Population stratification is a potential source of bias in this seemingly homogeneous FHS population. However, local and global PCs derived from SNPs appear to provide adequate information about ancestry.
PMCID: PMC2795878  PMID: 20017971
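The ancestry adjustment described in this abstract, with principal components of the genotype matrix included as covariates, can be sketched as follows. This is a minimal illustration under assumed inputs (an individuals x SNPs genotype array and a quantitative trait); the function names are hypothetical and the authors' actual software is not stated here. Passing genome-wide SNPs gives global PCs; passing only the SNPs from one chromosomal window would give local PCs.

```python
import numpy as np

def top_principal_components(genotypes, n_components=10):
    """Scores of the leading principal components of a genotype matrix."""
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)                       # center each SNP
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def snp_effect_adjusted(snp, trait, pcs):
    """Least-squares effect of one SNP on a trait with PCs as covariates."""
    X = np.column_stack([np.ones(len(trait)), snp, pcs])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return coef[1]                               # coefficient of the tested SNP
```

In a stratified sample, a SNP with no direct effect can show a large unadjusted slope that shrinks toward zero once the PCs are in the model.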
13.  A method to correct for population structure using a segregation model 
BMC Proceedings  2009;3(Suppl 7):S104.
To overcome the "spurious" association caused by population stratification in population-based association studies, we propose a principal-component-based method that can use both family and unrelated samples at the same time. More specifically, we adapt the multivariate logistic model, which is often used in segregation analysis and can allow for the family correlation structure, for association analysis. To correct for the effect of hidden population structure, the first ten principal components calculated from the matrix of marker genotype data are incorporated as covariates in the model. To test for association, the marker of interest is also incorporated as a covariate in the model. We applied the proposed method to the second generation (i.e., the Offspring Cohort) in the Genetic Analysis Workshop 16 Framingham Heart Study 50k data set to evaluate the performance of the method. Although there may have been difficulty in convergence while maximizing the likelihood function, as indicated by a flat likelihood, the distribution of the empirical p-values for the test statistic does show that the method has a correct type I error rate whenever the variance-covariance matrix of the estimates can be computed.
PMCID: PMC2795875  PMID: 20017968
14.  An integrated genome-wide association analysis on rheumatoid arthritis data 
BMC Proceedings  2007;1(Suppl 1):S35.
We propose a nonparametric association analysis combining both family and unrelated case-control genotype data. Under the assumption of Hardy-Weinberg equilibrium, we formed an affected group to compare with a group of unaffected individuals.
Comparison with the traditional case-control chi-square test and the transmission-disequilibrium test shows that this new approach has noticeably improved power. All analysis was based on the simulated rheumatoid arthritis data provided by Genetic Analysis Workshop 15. For situations involving population stratification, we also suggest an approach to update the genotype data using principal components. However, the Genetic Analysis Workshop 15 simulation data do not simulate population stratification. All analysis was done without knowledge of the answers.
PMCID: PMC2367523  PMID: 18466533
15.  A genome-wide linkage study of GAW15 gene expression data 
BMC Proceedings  2007;1(Suppl 1):S87.
Recently, gene expression levels have been shown to demonstrate familial aggregation, suggesting a direct role of heritable DNA variation. We studied the gene expression levels in lymphoblastoid cells of the Centre d'Etude du Polymorphisme Humain Utah families made available to Genetic Analysis Workshop 15 (GAW15), using genome-wide linkage analyses.
Heritability was estimated for the expression levels of each individual phenotype. Genome-wide linkage analysis was then performed using the 2819 SNPs for the expression levels of all the genes.
Heritability exceeded 0.21 for 50% of the expressed phenotypes. Genome-wide linkage analysis demonstrated that 19 of them reached significance after correcting for multiple comparisons, only 4 of which were reported previously. We did not identify any hot spots of transcriptional regulation when assuming LOD score > 5.3 for significant linkage evidence.
Our analysis suggests that inconsistent results in comparison to the previous report may be due to the different approaches, phenotype transformation, and different pedigree data used in the analyses.
PMCID: PMC2367464  PMID: 18466590
