Related Articles

1.  Robust methods for population stratification in genome wide association studies 
BMC Bioinformatics  2013;14:132.
Genome-wide association studies can provide novel insights into diseases of interest, as well as into the responsiveness of an individual to specific treatments. In such studies, it is very important to correct for population stratification, which refers to allele frequency differences between cases and controls due to systematic ancestry differences. Population stratification can cause spurious associations if not adjusted for properly. The principal component analysis (PCA) method has been relied upon as a highly useful methodology to adjust for population stratification in these types of large-scale studies. Recently, the linear mixed model (LMM) has also been proposed to account for family structure or cryptic relatedness. However, neither of these approaches may be optimal in properly correcting for sample structures in the presence of subject outliers.
We propose to use robust PCA combined with k-medoids clustering to deal with population stratification. This approach can adjust for population stratification for both continuous and discrete populations with subject outliers, and it can be considered as an extension of the PCA method and the multidimensional scaling (MDS) method. Through simulation studies, we compare the performance of our proposed methods with several widely used stratification methods, including PCA and MDS. We show that subject outliers can greatly influence the analysis results from several existing methods, while our proposed robust population stratification methods perform very well for both discrete and admixed populations with subject outliers. We illustrate the new method using data from a rheumatoid arthritis study.
We demonstrate that subject outliers can greatly influence the analysis result in GWA studies, and propose robust methods for dealing with population stratification that outperform existing population stratification methods in the presence of subject outliers.
PMCID: PMC3637636  PMID: 23601181
Population structure; Population stratification; Robust principal component analysis; Resampling by half means; Outlier detection; GWA studies
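The standard PCA step that this abstract takes as its starting point can be illustrated with a minimal sketch. It uses plain (non-robust) PCA on a standardized genotype matrix, not the robust PCA or k-medoids clustering the authors propose; the function names and the simulated two-population data are assumptions made purely for illustration.

```python
import numpy as np

def top_principal_components(genotypes, n_components=2):
    """Return PC scores per subject from an (n_subjects, n_snps) matrix
    of minor-allele counts (0, 1, 2). Plain PCA, not a robust variant."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)
    sd = centered.std(axis=0)
    keep = sd > 0  # drop monomorphic SNPs, which carry no information
    z = centered[:, keep] / sd[keep]
    # SVD of the standardized matrix: u * s gives the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def simulate_two_populations(n_per_pop=50, n_snps=200, seed=0):
    """Hypothetical toy data: two populations with drifted allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack([pop_a, pop_b]), labels

genotypes, labels = simulate_two_populations()
pcs = top_principal_components(genotypes)
# The first PC is expected to track population membership closely.
print(abs(np.corrcoef(pcs[:, 0], labels)[0, 1]))
```

In a real analysis the leading PCs would then be used as covariates or as input to a clustering step; a few outlying subjects can dominate these PCs, which is the weakness the abstract addresses.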
2.  Association analyses of the MAS-QTL data set using grammar, principal components and Bayesian network methodologies 
BMC Proceedings  2011;5(Suppl 3):S8.
It has been shown that if genetic relationships among individuals are not taken into account for genome wide association studies, this may lead to false positives. To address this problem, we used Genome-wide Rapid Association using Mixed Model and Regression and principal component stratification analyses. To account for linkage disequilibrium among the significant markers, principal components loadings obtained from top markers can be included as covariates. Estimation of Bayesian networks may also be useful to investigate linkage disequilibrium among SNPs and their relation with environmental variables.
For the quantitative trait we first estimated residuals while taking polygenic effects into account. We then used a single SNP approach to detect the most significant SNPs based on the residuals and applied principal component regression to take linkage disequilibrium among these SNPs into account. For the categorical trait we used principal component stratification methodology to account for background effects. For correction of linkage disequilibrium we used principal component logit regression. Bayesian networks were estimated to investigate relationship among SNPs.
Using the Genome-wide Rapid Association using Mixed Model and Regression and principal component stratification approach we detected around 100 significant SNPs for the quantitative trait (p<0.05 with 1000 permutations) and 109 significant (p<0.0006 with local FDR correction) SNPs for the categorical trait. With additional principal component regression we reduced the list to 16 and 50 SNPs for the quantitative and categorical trait, respectively.
GRAMMAR could efficiently incorporate the information regarding random genetic effects. Principal component stratification should be cautiously used with stringent multiple hypothesis testing correction to correct for ancestral stratification and association analyses for binary traits when there are systematic genetic effects such as half sib family structures. Bayesian networks are useful to investigate relationships among SNPs and environmental variables.
PMCID: PMC3103207  PMID: 21624178
3.  CERAMIC: Case-Control Association Testing in Samples with Related Individuals, Based on Retrospective Mixed Model Analysis with Adjustment for Covariates 
PLoS Genetics  2016;12(10):e1006329.
We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to allow samples with related individuals and to incorporate partially missing data. In simulations, we show that CERAMIC outperforms existing LMM and generalized LMM approaches, maintaining high power and correct type 1 error across a wider range of scenarios. CERAMIC results in a particularly large power increase over existing methods when the sample includes related individuals with some missing data (e.g., when some individuals with phenotype and covariate information have missing genotype), because CERAMIC is able to make use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Because CERAMIC is based on a retrospective analysis, it is robust to misspecification of the phenotype model, resulting in better control of type 1 error and higher power than that of prospective methods, such as GMMAT, when the phenotype model is misspecified. CERAMIC is computationally efficient for genomewide analysis in samples of related individuals of almost any configuration, including small families, unrelated individuals and even large, complex pedigrees. We apply CERAMIC to data on type 2 diabetes (T2D) from the Framingham Heart Study. In a genome scan, 9 of the 10 smallest CERAMIC p-values occur in or near either known T2D susceptibility loci or plausible candidates, verifying that CERAMIC is able to home in on the important loci in a genome scan.
Author Summary
Case-control association testing has proven to be useful for identification of genetic variants that affect susceptibility to disease. One can expect to gain power for detecting such variants by including relevant covariates in the analysis, by accounting for any relatedness of sampled individuals, and by making use of partial information in the data. For analysis of continuously-varying traits, variations on linear mixed-model (LMM) approaches have proven effective at achieving some of these goals. However, for case-control or binary trait mapping, there remain significant challenges. Direct application of LMM approaches to binary traits suffers from power loss when covariate effects are strong, and existing generalized LMM approaches can perform poorly in the presence of trait model misspecification and partially missing data. We propose CERAMIC, a method for binary trait mapping, which is computationally feasible for large genome-wide studies, and which gains power over previous approaches by improved trait modeling, retrospective assessment of significance, accounting for sample structure, and making use of partially missing data. We illustrate this approach in genome-wide association mapping of type 2 diabetes in data from the Framingham Heart Study.
PMCID: PMC5047592  PMID: 27695091
4.  Tracing Sub-Structure in the European American Population with PCA-Informative Markers 
PLoS Genetics  2008;4(7):e1000114.
Genetic structure in the European American population reflects waves of migration and recent gene flow among different populations. This complex structure can introduce bias in genetic association studies. Using Principal Components Analysis (PCA), we analyze the structure of two independent European American datasets (1,521 individuals–307,315 autosomal SNPs). Individual variation lies across a continuum with some individuals showing high degrees of admixture with non-European populations, as demonstrated through joint analysis with HapMap data. The CEPH Europeans only represent a small fraction of the variation encountered in the larger European American datasets we studied. We interpret the first eigenvector of this data as correlated with ancestry, and we apply an algorithm that we have previously described to select PCA-informative markers (PCAIMs) that can reproduce this structure. Importantly, we develop a novel method that can remove redundancy from the selected SNP panels and show that we can effectively remove correlated markers, thus increasing genotyping savings. Only 150–200 PCAIMs suffice to accurately predict fine structure in European American datasets, as identified by PCA. Simulating association studies, we couple our method with a PCA-based stratification correction tool and demonstrate that a small number of PCAIMs can efficiently remove false correlations with almost no loss in power. The structure informative SNPs that we propose are an important resource for genetic association studies of European Americans. Furthermore, our redundancy removal algorithm can be applied on sets of ancestry informative markers selected with any method in order to select the most uncorrelated SNPs, and significantly decreases genotyping costs.
Author Summary
Genetic association studies seek to identify disease susceptibility genes through the analysis of genetic markers such as single nucleotide polymorphisms (SNPs) in large numbers of cases and controls. In such settings, the existence of sub-structure in the population under study (i.e. differences in ancestry among cases and controls) may lead to spurious results. It is therefore imperative to control for this possible bias. Such biases may arise, for example, when studying the European American population, which consists of individuals of diverse ancestry proportions from different European countries and to some degree also from African and Native American populations. Here, we study the genetic sub-structure of the European American population, analyzing 1,521 individuals for over 300,000 SNPs across the entire genome. Applying a powerful method that is based on dimensionality reduction (Principal Components Analysis), we are able to identify 200 SNPs that successfully represent the complete dataset. Importantly, we introduce a novel method that effectively removes redundancy from any set of genetic markers, and may prove extremely useful in a variety of different research scenarios, in order to significantly reduce the cost of a study.
PMCID: PMC2537989  PMID: 18797516
5.  Graphic analysis of population structure on genome-wide rheumatoid arthritis data 
BMC Proceedings  2009;3(Suppl 7):S110.
Principal-component analysis (PCA) has been used for decades to summarize the human genetic variation across geographic regions and to infer population migration history. Reduction of spurious associations due to population structure is crucial for the success of disease association studies. Recently, PCA has also become a popular method for detecting population structure and correction of population stratification in disease association studies. Inspired by manifold learning, we propose a novel method based on spectral graph theory. Regarding each study subject as a node with suitably defined weights for its edges to close neighbors, one can form a weighted graph. We suggest using the spectrum of the associated graph Laplacian operator, namely, Laplacian eigenfunctions, to infer population structures instead of principal components (PCs). For the whole genome-wide association data for the North American Rheumatoid Arthritis Consortium (NARAC) provided by Genetic Workshop Analysis 16, Laplacian eigenfunctions revealed more meaningful structures of the underlying population than PCA. The proposed method has connection to PCA, and it naturally includes PCA as a special case. Our simple method is computationally fast and is suitable for disease studies at the genome-wide scale.
PMCID: PMC2795882  PMID: 20017975
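The graph construction described in this abstract (subjects as nodes, weighted edges to close neighbors, eigenvectors of the graph Laplacian) can be sketched roughly as follows. The k-nearest-neighbor rule, the Gaussian edge weights, the unnormalized Laplacian, and all names are assumptions chosen for illustration; the abstract does not specify the authors' exact weighting scheme.

```python
import numpy as np

def laplacian_eigenvectors(data, n_neighbors=5, n_vectors=2):
    """Eigen-decompose the graph Laplacian of a k-nearest-neighbor graph.

    data: (n_subjects, n_features) array, e.g. standardized genotypes.
    Returns all eigenvalues (ascending) and the eigenvectors that follow
    the first (trivial) one.
    """
    x = np.asarray(data, dtype=float)
    n = x.shape[0]
    sq = np.sum(x ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0.0)
    np.fill_diagonal(dist2, np.inf)  # a subject is not its own neighbor
    neighbors = np.argsort(dist2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    scale = np.median(dist2[rows, cols])  # assumed kernel bandwidth
    w = np.zeros((n, n))
    w[rows, cols] = np.exp(-dist2[rows, cols] / scale)
    w = np.maximum(w, w.T)  # symmetrize the neighbor graph
    lap = np.diag(w.sum(axis=1)) - w  # unnormalized Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(lap)
    return eigvals, eigvecs[:, 1:1 + n_vectors]
```

The returned eigenvectors play the role that principal components play in PCA-based correction: each subject receives a few coordinates that summarize its position in the neighbor graph.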
6.  Genetic Structures of Population Cohorts Change with Increasing Age: Implications for Genetic Analyses of Human Aging and Life Span 
Correcting for the potential effects of population stratification is an important issue in genome wide association studies (GWAS) of complex traits. Principal component analysis (PCA) of the genetic structure of the population under study with subsequent incorporation of the first several principal components (PCs) in the GWAS regression model is often used for this purpose.
For longevity related traits such a correction may negatively affect the accuracy of genetic analyses. This is because PCs may capture genetic structure induced by mortality selection processes in genetically heterogeneous populations.
Data and Methods
We used the Framingham Heart Study data on life span and on individual genetic background to construct two sets of PCs. One was constructed to separate population stratification due to differences in ancestry from that induced by mortality selection. The other was constructed using genetic data on individuals of different ages without attempting to separate the ancestry effects from the mortality selection effects. The GWASs of human life span were performed using the first 20 PCs from each of the selected sets to control for possible population stratification.
The results indicated that the GWAS that used the PC set separating population stratification induced by mortality selection from differences in ancestry produced stronger genetic signals than the GWAS that used PCs without such separation.
The quality of genetic estimates in GWAS can be improved when changes in genetic structure caused by mortality selection are taken into account in controlling for possible effects of population stratification.
PMCID: PMC4398390  PMID: 25893220
Genetics of aging; Longevity; Genetic associations; Principal component analysis; Genetic structure; Mortality selection; Heterogeneous population
7.  Pedigree-based random effect tests to screen gene pathways 
BMC Proceedings  2014;8(Suppl 1):S100.
The new generation of sequencing platforms opens new horizons in the genetics field. It is possible to exhaustively assay all genetic variants in an individual and search for phenotypic associations. The whole genome sequencing approach, when applied to a large human sample like the San Antonio Family Study, detects a very large number (>25 million) of single nucleotide variants along with other more complex variants. The analytical challenges imposed by this number of variants are formidable, suggesting that methods are needed to reduce the overall number of statistical tests. In this study, we develop a single degree-of-freedom test of variants in a gene pathway employing a random effect model that uses an empirical pathway-specific genetic relationship matrix as the focal covariance kernel. The empirical pathway-specific genetic relationship uses all variants (or a chosen subset) from gene members of a given biological pathway. Using SOLAR's pedigree-based variance components modeling, which also allows for arbitrary fixed effects, such as principal components, to deal with latent population structure, we employ a likelihood ratio test of the pathway-specific genetic relationship matrix model. We examine all gene pathways in the KEGG database using our method in the first replicate of the Genetic Analysis Workshop 18 simulation of systolic blood pressure. Our random effect approach was able to detect true association signals in causal gene pathways. Those pathways could easily be further dissected by the independent analysis of all markers.
PMCID: PMC4143680  PMID: 25519354
8.  Assessing the impact of global versus local ancestry in association studies 
BMC Proceedings  2009;3(Suppl 7):S107.
To account for population stratification in association studies, principal-components analysis is often performed on single-nucleotide polymorphisms (SNPs) across the genome. Here, we use Framingham Heart Study (FHS) Genetic Analysis Workshop 16 data to compare the performance of local ancestry adjustment for population stratification based on principal components (PCs) estimated from SNPs in a local chromosomal region with global ancestry adjustment based on PCs estimated from genome-wide SNPs.
Standardized height residuals from unrelated adults from the FHS Offspring Cohort were averaged from longitudinal data. PCs of SNP genotype data were calculated to represent individual's ancestry either 1) globally using all SNPs across the genome or 2) locally using SNPs in adjacent 20-Mbp regions within each chromosome. We assessed the extent to which there were differences in association studies of height depending on whether PCs for global, local, or both global and local ancestry were included as covariates.
The correlations between local and global PCs were low (r < 0.12), suggesting variability between local and global ancestry estimates. Genome-wide association tests without any ancestry adjustment demonstrated an inflated type I error rate that decreased with adjustment for local ancestry, global ancestry, or both. A known spurious association was replicated for SNPs within the lactase gene, and this false-positive association was abolished by adjustment with local or global ancestry PCs.
Population stratification is a potential source of bias in this seemingly homogeneous FHS population. However, local and global PCs derived from SNPs appear to provide adequate information about ancestry.
PMCID: PMC2795878  PMID: 20017971
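The adjustment compared in this abstract, including ancestry PCs as covariates in the association model, can be illustrated with a small sketch. It fits ordinary least squares and reports the t statistic for the SNP with and without an ancestry covariate. The simulated data, the use of a known population indicator as a stand-in for an estimated PC, and the function names are all assumptions for illustration, not the authors' analysis.

```python
import numpy as np

def snp_t_statistic(trait, snp, covariates=None):
    """t statistic for the SNP effect in an OLS fit of trait on SNP
    (plus intercept and optional covariates such as ancestry PCs)."""
    y = np.asarray(trait, dtype=float)
    n = y.shape[0]
    columns = [np.ones(n), np.asarray(snp, dtype=float)]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float).reshape(n, -1)
        columns.extend(cov.T)  # one design column per covariate
    x = np.column_stack(columns)
    beta, _, _, _ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    sigma2 = resid @ resid / (n - x.shape[1])
    cov_beta = sigma2 * np.linalg.inv(x.T @ x)
    return float(beta[1] / np.sqrt(cov_beta[1, 1]))

def simulate_stratified_trait(n_per_pop=1000, seed=1):
    """Hypothetical data: the SNP has no causal effect, but both allele
    frequency and trait mean differ between two populations."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    snp = rng.binomial(2, np.where(pop == 0, 0.2, 0.8))
    trait = 1.0 * pop + rng.normal(size=pop.size)
    return trait, snp, pop

trait, snp, pop = simulate_stratified_trait()
print(snp_t_statistic(trait, snp))        # inflated by stratification
print(snp_t_statistic(trait, snp, pop))   # adjusted for ancestry
```

In practice the population indicator is unknown, which is why estimated PCs (global or local) are used in its place.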
9.  ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction 
BMC Bioinformatics  2013;14:61.
Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case–control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification.
We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual’s continental and sub-continental ancestry. To predict an individual’s continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control’s λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5% ± 2.4%, 95.6% ± 3.9%, 95.6% ± 2.1%, 98.3% ± 2.0%, and 95.9% ± 1.5%. However, ETHNOPRED was unable to produce a classifier that can accurately distinguish Chinese in Beijing vs. Chinese in Denver.
ETHNOPRED is a novel technique for producing classifiers that can identify an individual’s continental and sub-continental heritage, based on a small number of SNPs. We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large scale GWASs, and robust to missing values.
PMCID: PMC3618021  PMID: 23432980
10.  Double genomic control is not effective to correct for population stratification in meta-analysis for genome-wide association studies 
Frontiers in Genetics  2012;3:300.
Meta-analysis of genome-wide association studies (GWAS) has become a useful tool to identify genetic variants that are associated with complex human diseases. To control spurious associations between genetic variants and disease that are caused by population stratification, double genomic control (GC) correction for population stratification in meta-analysis for GWAS has been implemented in the software METAL and GWAMA and is widely used by investigators. In this research, we conducted extensive simulation studies to evaluate the double GC correction method in meta-analysis and compared the performance of the double GC correction with that of a principal components analysis (PCA) correction method in meta-analysis. Results show that when the data contain population stratification, the double GC correction method can have inflated type I error rates at a marker with significant allele frequency differentiation in the subpopulations (such as caused by recent strong selection). On the other hand, the PCA correction method can control type I error rates well and has much higher power in meta-analysis compared to the double GC correction method, even when the causal marker does not have a significant allele frequency difference between the subpopulations. We applied the double GC correction and PCA correction to meta-analysis of GWAS for two real datasets from the Atherosclerosis Risk in Communities (ARIC) project and the Multi-Ethnic Study of Atherosclerosis (MESA) project. The results also suggest that PCA correction is more effective than the double GC correction in meta-analysis.
PMCID: PMC3529452  PMID: 23269928
genome-wide association studies; meta-analysis; double genomic control correction; principal components analysis; population stratification
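For readers unfamiliar with genomic control, the following minimal sketch shows the basic single-step correction: the inflation factor λ is the median of the observed 1-df chi-square statistics divided by the theoretical median, and every statistic is divided by λ. This is not the METAL or GWAMA implementation; the function names are hypothetical, and double GC would apply the same step once within each study and again to the combined meta-analysis statistics.

```python
import numpy as np

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control_lambda(chi2_stats):
    """Inflation factor: observed median statistic over the null median."""
    return float(np.median(chi2_stats) / CHI2_1DF_MEDIAN)

def gc_correct(chi2_stats):
    """Divide all statistics by lambda; lambda below 1 is treated as 1
    so that deflated statistics are never scaled upward."""
    lam = max(genomic_control_lambda(chi2_stats), 1.0)
    return np.asarray(chi2_stats, dtype=float) / lam

# Toy example: null statistics uniformly inflated by a factor of 1.2.
rng = np.random.default_rng(0)
inflated = 1.2 * rng.normal(size=100_000) ** 2
print(genomic_control_lambda(inflated))
print(genomic_control_lambda(gc_correct(inflated)))
```

Because a single λ rescales every marker by the same amount, markers with unusually strong allele frequency differentiation remain under-corrected, which is consistent with the inflation the abstract reports.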
11.  Novel genetic matching methods for handling population stratification in genome-wide association studies 
BMC Bioinformatics  2015;16(1):84.
A commonly encountered problem in association studies is population stratification. In this work, we propose a novel framework to consider population matchings in the contexts of genome-wide and sequencing association studies. We employ pairwise and groupwise optimal case-control matchings and present an agglomerative hierarchical clustering, both based on a genetic similarity score matrix. To ensure that the matches obtained from the matching algorithm correctly capture the population structure, we propose and discuss two stratum validation methods. We also introduce an extension to the Cochran-Armitage trend test that explicitly takes the particular population structure into account.
We assess our framework by simulating genotype data under the null hypothesis, to confirm that it correctly controls the type 1 error rate. In a power study, we show that structured association testing using our framework displays reasonable power. We compare our results with those obtained from a logistic regression model with principal component covariates. Using the principal components approach, we also find a possible false-positive association with Alzheimer’s disease, which is supported neither by our new methods nor by the results of a recent large meta-analysis or a mixed model approach.
Matching methods provide an alternative way to handle confounding due to population stratification for statistical tests for which covariates are hard to model. As a benchmark, we show that our matching framework performs as well as state-of-the-art models on common variants.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0521-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4367953  PMID: 25880419
Genome-wide association studies; population stratification; genetic matching; structured association
12.  Application of a single-objective, hybrid genetic algorithm approach to pharmacokinetic model building 
A limitation in traditional stepwise population pharmacokinetic model building is the difficulty in handling interactions between model components. To address this issue, a method was previously introduced which couples NONMEM parameter estimation and model fitness evaluation to a single-objective, hybrid genetic algorithm for global optimization of the model structure. In this study, the generalizability of this approach for pharmacokinetic model building is evaluated by comparing (1) correct and spurious covariate relationships in a simulated dataset resulting from automated stepwise covariate modeling, Lasso methods, and single-objective hybrid genetic algorithm approaches to covariate identification and (2) information criteria values, model structures, convergence, and model parameter values resulting from manual stepwise versus single-objective, hybrid genetic algorithm approaches to model building for seven compounds. Both manual stepwise and single-objective, hybrid genetic algorithm approaches to model building were applied, blinded to the results of the other approach, for selection of the compartment structure as well as inclusion and model form of inter-individual and inter-occasion variability, residual error, and covariates from a common set of model options. For the simulated dataset, stepwise covariate modeling identified three of four true covariates and two spurious covariates; Lasso identified two of four true and 0 spurious covariates; and the single-objective, hybrid genetic algorithm identified three of four true covariates and one spurious covariate. 
For the clinical datasets, the Akaike information criterion was a median of 22.3 points lower (range of 470.5 point decrease to 0.1 point decrease) for the best single-objective hybrid genetic-algorithm candidate model versus the final manual stepwise model: the Akaike information criterion was lower by greater than 10 points for four compounds and differed by less than 10 points for three compounds. The root mean squared error and absolute mean prediction error of the best single-objective hybrid genetic algorithm candidates were a median of 0.2 points higher (range of 38.9 point decrease to 27.3 point increase) and 0.02 points lower (range of 0.98 point decrease to 0.74 point increase), respectively, than that of the final stepwise models. In addition, the best single-objective, hybrid genetic algorithm candidate models had successful convergence and covariance steps for each compound, used the same compartment structure as the manual stepwise approach for 6 of 7 (86 %) compounds, and identified 54 % (7 of 13) of covariates included by the manual stepwise approach and 16 covariate relationships not included by manual stepwise models. The model parameter values between the final manual stepwise and best single-objective, hybrid genetic algorithm models differed by a median of 26.7 % (q1 = 4.9 % and q3 = 57.1 %). Finally, the single-objective, hybrid genetic algorithm approach was able to identify models capable of estimating absorption rate parameters for four compounds that the manual stepwise approach did not identify. The single-objective, hybrid genetic algorithm represents a general pharmacokinetic model building methodology whose ability to rapidly search the feasible solution space leads to nearly equivalent or superior model fits to pharmacokinetic data.
Electronic supplementary material
The online version of this article (doi:10.1007/s10928-012-9258-0) contains supplementary material, which is available to authorized users.
PMCID: PMC3400037  PMID: 22767341
Pharmacokinetics; Model building; Genetic algorithms
13.  A Comparison of Association Methods Correcting for Population Stratification in Case–Control Studies 
Annals of human genetics  2011;75(3):418-427.
Population stratification is an important issue in case–control studies of disease-marker association. Failure to properly account for population structure can lead to spurious association or reduced power. In this article, we compare the performance of six methods correcting for population stratification in case–control association studies. These methods include genomic control (GC), EIGENSTRAT, principal component-based logistic regression (PCA-L), LAPSTRUCT, ROADTRIPS, and EMMAX. We also include the uncorrected Armitage test for comparison. In the simulation studies, we consider a wide range of population structure models for unrelated samples, including admixture. Our simulation results suggest that PCA-L and LAPSTRUCT perform well over all the scenarios studied, whereas GC, ROADTRIPS, and EMMAX fail to correct for population structure at single nucleotide polymorphisms (SNPs) that show strong differentiation across ancestral populations. The Armitage test does not adjust for confounding due to stratification and thus has inflated type I error. Among all correction methods, EMMAX has the greatest power, based on the population structure settings considered for samples with unrelated individuals. The three methods, EIGENSTRAT, PCA-L, and LAPSTRUCT, are comparable, and outperform both GC and ROADTRIPS in almost all situations.
PMCID: PMC3215268  PMID: 21281271
Population structure; association testing; type I error; power
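The uncorrected Armitage trend test used as the baseline in this comparison can be sketched in a few lines. With additive genotype scores (0, 1, 2), the trend chi-square equals N times the squared Pearson correlation between genotype score and case status. The function name and the toy genotype counts are assumptions for illustration.

```python
import numpy as np

def armitage_trend_chi2(genotypes, is_case):
    """Cochran-Armitage trend statistic (1 df) with additive scores,
    computed as N * r^2 between genotype score and case status."""
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(is_case, dtype=float)
    r = np.corrcoef(g, y)[0, 1]
    return float(g.size * r ** 2)

# Toy data: genotype counts (0, 1, 2 copies) of 10/20/30 in cases
# and 30/20/10 in controls.
case_genotypes = np.repeat([0, 1, 2], [10, 20, 30])
control_genotypes = np.repeat([0, 1, 2], [30, 20, 10])
genotypes = np.concatenate([case_genotypes, control_genotypes])
is_case = np.concatenate([np.ones(60), np.zeros(60)])
print(armitage_trend_chi2(genotypes, is_case))
```

Nothing in this statistic accounts for ancestry, so any allele frequency difference between populations that also differ in disease prevalence contributes to the correlation, which is the source of the inflated type I error noted in the abstract.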
14.  Semiparametric Bayesian analysis of gene-environment interactions with error in measurement of environmental covariates and missing genetic data 
Statistics and its interface  2011;4(3):305-316.
Case-control studies are widely used to detect gene-environment interactions in the etiology of complex diseases. Many variables that are of interest to biomedical researchers are difficult to measure on an individual level, e.g. nutrient intake, cigarette smoking exposure, long-term toxic exposure. Measurement error causes bias in parameter estimates, thus masking key features of data and leading to loss of power and spurious/masked associations. We develop a Bayesian methodology for analysis of case-control studies for the case when measurement error is present in an environmental covariate and the genetic variable has missing data. This approach offers several advantages. It allows prior information to enter the model to make estimation and inference more precise. The environmental covariates measured exactly are modeled completely nonparametrically. Further, information about the probability of disease can be incorporated in the estimation procedure to improve the quality of parameter estimates, which cannot be done in conventional case-control studies. A unique feature of the procedure under investigation is that the analysis is based on a pseudo-likelihood function; therefore, conventional Bayesian techniques may not be technically correct. We propose an approach using Markov Chain Monte Carlo sampling as well as a computationally simple method based on an asymptotic posterior distribution. Simulation experiments demonstrated that our method produced parameter estimates that are nearly unbiased even for small sample sizes. An application of our method is illustrated using a population-based case-control study of the association between calcium intake and the risk of colorectal adenoma development.
PMCID: PMC3178196  PMID: 21949562
Bayesian inference; Errors in variables; Gene-environment interactions; Markov Chain Monte Carlo sampling; Missing data; Pseudo-likelihood; Semiparametric methods
15.  Longitudinal Association Analysis of Quantitative Traits 
Genetic epidemiology  2012;36(8):856-869.
Longitudinal genetic studies provide a valuable resource for exploring key genetic and environmental factors that affect complex traits over time. Genetic analysis of longitudinal data that incorporate temporal variations is important for understanding genetic architecture and biological variations of common complex diseases. Although they are important, there is a paucity of statistical methods to analyze longitudinal human genetic data. In this article, longitudinal methods are developed for temporal association mapping to analyze population longitudinal data. Both parametric and nonparametric models are proposed. The models can be applied to multiple diallelic genetic markers such as single-nucleotide polymorphisms and multiallelic markers such as microsatellites. By analytical formulae, we show that the models take both the linkage disequilibrium and temporal trends into account simultaneously. Variance-covariance structure is constructed to model the single measurement variation and multiple measurement correlations of an individual based on the theory of stochastic processes. Novel penalized spline models are used to estimate the time-dependent mean functions and regression coefficients. The methods were applied to analyze Framingham Heart Study data of Genetic Analysis Workshop (GAW) 13 and GAW 16. The temporal trends and genetic effects of the systolic blood pressure are successfully detected by the proposed approaches. Simulation studies showed that the nonparametric penalized linear model is the best choice for fitting real data. The research sheds light on the important area of longitudinal genetic analysis, and it provides a basis for future methodological investigations and practical applications.
PMCID: PMC4444073  PMID: 22965819
association mapping; quantitative trait loci; longitudinal analysis
16.  A Robust Statistical Method for Association-Based eQTL Analysis 
PLoS ONE  2011;6(8):e23192.
It is well established that the theoretical kernel of the recently surging genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Whilst many methods have been proposed to correct for its influence, either by predicting the structure parameters or by correcting inflation in the test statistic due to the stratification, these may not be feasible or may impose further statistical problems in practical implementation.
We propose here a novel statistical method to control spurious LD in GWAS arising from population structure by incorporating a control marker into testing for significance of genetic association of a polymorphic marker with phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and accounts properly for a varying effect of population stratification on different regions of the genome under study. Utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations.
The analyses show that the new method confers improved statistical power for detecting genuine genetic association in subpopulations and effective control of spurious associations stemming from population structure when compared with two other widely implemented methods in the GWAS literature.
PMCID: PMC3153488  PMID: 21858027
17.  Conditional Random Fields for Fast, Large-Scale Genome-Wide Association Studies 
PLoS ONE  2011;6(7):e21591.
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs to phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically in the number of individuals being studied, with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at
PMCID: PMC3134455  PMID: 21765897
18.  Effect of sample stratification on dairy GWAS results 
BMC Genomics  2012;13:536.
Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach.
Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. The EMMAX method produced the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows.
Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection.
PMCID: PMC3496570  PMID: 23039970
19.  Human genome meeting 2016 
Srivastava, A. K. | Wang, Y. | Huang, R. | Skinner, C. | Thompson, T. | Pollard, L. | Wood, T. | Luo, F. | Stevenson, R. | Polimanti, R. | Gelernter, J. | Lin, X. | Lim, I. Y. | Wu, Y. | Teh, A. L. | Chen, L. | Aris, I. M. | Soh, S. E. | Tint, M. T. | MacIsaac, J. L. | Yap, F. | Kwek, K. | Saw, S. M. | Kobor, M. S. | Meaney, M. J. | Godfrey, K. M. | Chong, Y. S. | Holbrook, J. D. | Lee, Y. S. | Gluckman, P. D. | Karnani, N. | Kapoor, A. | Lee, D. | Chakravarti, A. | Maercker, C. | Graf, F. | Boutros, M. | Stamoulis, G. | Santoni, F. | Makrythanasis, P. | Letourneau, A. | Guipponi, M. | Panousis, N. | Garieri, M. | Ribaux, P. | Falconnet, E. | Borel, C. | Antonarakis, S. E. | Kumar, S. | Curran, J. | Blangero, J. | Chatterjee, S. | Kapoor, A. | Akiyama, J. | Auer, D. | Berrios, C. | Pennacchio, L. | Chakravarti, A. | Donti, T. R. | Cappuccio, G. | Miller, M. | Atwal, P. | Kennedy, A. | Cardon, A. | Bacino, C. | Emrick, L. | Hertecant, J. | Baumer, F. | Porter, B. | Bainbridge, M. | Bonnen, P. | Graham, B. | Sutton, R. | Sun, Q. | Elsea, S. | Hu, Z. | Wang, P. | Zhu, Y. | Zhao, J. | Xiong, M. | Bennett, David A. | Hidalgo-Miranda, A. | Romero-Cordoba, S. | Rodriguez-Cuevas, S. | Rebollar-Vega, R. | Tagliabue, E. | Iorio, M. | D’Ippolito, E. | Baroni, S. | Kaczkowski, B. | Tanaka, Y. | Kawaji, H. | Sandelin, A. | Andersson, R. | Itoh, M. | Lassmann, T. | Hayashizaki, Y. | Carninci, P. | Forrest, A. R. R. | Semple, C. A. | Rosenthal, E. A. | Shirts, B. | Amendola, L. | Gallego, C. | Horike-Pyne, M. | Burt, A. | Robertson, P. | Beyers, P. | Nefcy, C. | Veenstra, D. | Hisama, F. | Bennett, R. | Dorschner, M. | Nickerson, D. | Smith, J. | Patterson, K. | Crosslin, D. | Nassir, R. | Zubair, N. | Harrison, T. | Peters, U. | Jarvik, G. | Menghi, F. | Inaki, K. | Woo, X. | Kumar, P. | Grzeda, K. | Malhotra, A. | Kim, H. | Ucar, D. | Shreckengast, P. | Karuturi, K. | Keck, J. | Chuang, J. | Liu, E. T. | Ji, B. | Tyler, A. | Ananda, G. | Carter, G. | Nikbakht, H. | Montagne, M. 
| Zeinieh, M. | Harutyunyan, A. | Mcconechy, M. | Jabado, N. | Lavigne, P. | Majewski, J. | Goldstein, J. B. | Overman, M. | Varadhachary, G. | Shroff, R. | Wolff, R. | Javle, M. | Futreal, A. | Fogelman, D. | Bravo, L. | Fajardo, W. | Gomez, H. | Castaneda, C. | Rolfo, C. | Pinto, J. A. | Akdemir, K. C. | Chin, L. | Futreal, A. | Patterson, S. | Statz, C. | Mockus, S. | Nikolaev, S. N. | Bonilla, X. I. | Parmentier, L. | King, B. | Bezrukov, F. | Kaya, G. | Zoete, V. | Seplyarskiy, V. | Sharpe, H. | McKee, T. | Letourneau, A. | Ribaux, P. | Popadin, K. | Basset-Seguin, N. | Chaabene, R. Ben | Santoni, F. | Andrianova, M. | Guipponi, M. | Garieri, M. | Verdan, C. | Grosdemange, K. | Sumara, O. | Eilers, M. | Aifantis, I. | Michielin, O. | de Sauvage, F. | Antonarakis, S. | Likhitrattanapisal, S. | Lincoln, S. | Kurian, A. | Desmond, A. | Yang, S. | Kobayashi, Y. | Ford, J. | Ellisen, L. | Peters, T. L. | Alvarez, K. R. | Hollingsworth, E. F. | Lopez-Terrada, D. H. | Hastie, A. | Dzakula, Z. | Pang, A. W. | Lam, E. T. | Anantharaman, T. | Saghbini, M. | Cao, H. | Gonzaga-Jauregui, C. | Ma, L. | King, A. | Rosenzweig, E. Berman | Krishnan, U. | Reid, J. G. | Overton, J. D. | Dewey, F. | Chung, W. K. | Small, K. | DeLuca, A. | Cremers, F. | Lewis, R. A. | Puech, V. | Bakall, B. | Silva-Garcia, R. | Rohrschneider, K. | Leys, M. | Shaya, F. S. | Stone, E. | Sobreira, N. L. | Schiettecatte, F. | Ling, H. | Pugh, E. | Witmer, D. | Hetrick, K. | Zhang, P. | Doheny, K. | Valle, D. | Hamosh, A. | Jhangiani, S. N. | Akdemir, Z. Coban | Bainbridge, M. N. | Charng, W. | Wiszniewski, W. | Gambin, T. | Karaca, E. | Bayram, Y. | Eldomery, M. K. | Posey, J. | Doddapaneni, H. | Hu, J. | Sutton, V. R. | Muzny, D. M. | Boerwinkle, E. A. | Valle, D. | Lupski, J. R. | Gibbs, R. A. | Shekar, S. | Salerno, W. | English, A. | Mangubat, A. | Bruestle, J. | Thorogood, A. | Knoppers, B. M. | Takahashi, H. | Nitta, K. R. | Kozhuharova, A. | Suzuki, A. M. | Sharma, H. | Cotella, D. 
| Santoro, C. | Zucchelli, S. | Gustincich, S. | Carninci, P. | Mulvihill, J. J. | Baynam, G. | Gahl, W. | Groft, S. C. | Kosaki, K. | Lasko, P. | Melegh, B. | Taruscio, D. | Ghosh, R. | Plon, S. | Scherer, S. | Qin, X. | Sanghvi, R. | Walker, K. | Chiang, T. | Muzny, D. | Wang, L. | Black, J. | Boerwinkle, E. | Weinshilboum, R. | Gibbs, R. | Karpinets, T. | Calderone, T. | Wani, K. | Yu, X. | Creasy, C. | Haymaker, C. | Forget, M. | Nanda, V. | Roszik, J. | Wargo, J. | Haydu, L. | Song, X. | Lazar, A. | Gershenwald, J. | Davies, M. | Bernatchez, C. | Zhang, J. | Futreal, A. | Woodman, S. | Chesler, E. J. | Reynolds, T. | Bubier, J. A. | Phillips, C. | Langston, M. A. | Baker, E. J. | Xiong, M. | Ma, L. | Lin, N. | Amos, C. | Lin, N. | Wang, P. | Zhu, Y. | Zhao, J. | Calhoun, V. | Xiong, M. | Dobretsberger, O. | Egger, M. | Leimgruber, F. | Sadedin, S. | Oshlack, A. | Antonio, V. A. A. | Ono, N. | Ahmed, Z. | Bolisetty, M. | Zeeshan, S. | Anguiano, E. | Ucar, D. | Sarkar, A. | Nandineni, M. R. | Zeng, C. | Shao, J. | Cao, H. | Hastie, A. | Pang, A. W. | Lam, E. T. | Liang, T. | Pham, K. | Saghbini, M. | Dzakula, Z. | Chee-Wei, Y. | Dongsheng, L. | Lai-Ping, W. | Lian, D. | Hee, R. O. Twee | Yunus, Y. | Aghakhanian, F. | Mokhtar, S. S. | Lok-Yung, C. V. | Bhak, J. | Phipps, M. | Shuhua, X. | Yik-Ying, T. | Kumar, V. | Boon-Peng, H. | Campbell, I. | Young, M. -A. | James, P. | Rain, M. | Mohammad, G. | Kukreti, R. | Pasha, Q. | Akilzhanova, A. R. | Guelly, C. | Abilova, Z. | Rakhimova, S. | Akhmetova, A. | Kairov, U. | Trajanoski, S. | Zhumadilov, Z. | Bekbossynova, M. | Schumacher, C. | Sandhu, S. | Harkins, T. | Makarov, V. | Doddapaneni, H. | Glenn, R. | Momin, Z. | Dilrukshi, B. | Chao, H. | Meng, Q. | Gudenkauf, B. | Kshitij, R. | Jayaseelan, J. | Nessner, C. | Lee, S. | Blankenberg, K. | Lewis, L. | Hu, J. | Han, Y. | Dinh, H. | Jireh, S. | Walker, K. | Boerwinkle, E. | Muzny, D. | Gibbs, R. | Hu, J. | Walker, K. | Buhay, C. | Liu, X. | Wang, Q. | Sanghvi, R. 
| Doddapaneni, H. | Ding, Y. | Veeraraghavan, N. | Yang, Y. | Boerwinkle, E. | Beaudet, A. L. | Eng, C. M. | Muzny, D. M. | Gibbs, R. A. | Worley, K. C. C. | Liu, Y. | Hughes, D. S. T. | Murali, S. C. | Harris, R. A. | English, A. C. | Qin, X. | Hampton, O. A. | Larsen, P. | Beck, C. | Han, Y. | Wang, M. | Doddapaneni, H. | Kovar, C. L. | Salerno, W. J. | Yoder, A. | Richards, S. | Rogers, J. | Lupski, J. R. | Muzny, D. M. | Gibbs, R. A. | Meng, Q. | Bainbridge, M. | Wang, M. | Doddapaneni, H. | Han, Y. | Muzny, D. | Gibbs, R. | Harris, R. A. | Raveenedran, M. | Xue, C. | Dahdouli, M. | Cox, L. | Fan, G. | Ferguson, B. | Hovarth, J. | Johnson, Z. | Kanthaswamy, S. | Kubisch, M. | Platt, M. | Smith, D. | Vallender, E. | Wiseman, R. | Liu, X. | Below, J. | Muzny, D. | Gibbs, R. | Yu, F. | Rogers, J. | Lin, J. | Zhang, Y. | Ouyang, Z. | Moore, A. | Wang, Z. | Hofmann, J. | Purdue, M. | Stolzenberg-Solomon, R. | Weinstein, S. | Albanes, D. | Liu, C. S. | Cheng, W. L. | Lin, T. T. | Lan, Q. | Rothman, N. | Berndt, S. | Chen, E. S. | Bahrami, H. | Khoshzaban, A. | Keshal, S. Heidari | Bahrami, H. | Khoshzaban, A. | Keshal, S. Heidari | Alharbi, K. K. R. | Zhalbinova, M. | Akilzhanova, A. | Rakhimova, S. | Bekbosynova, M. | Myrzakhmetova, S. | Matar, M. | Mili, N. | Molinari, R. | Ma, Y. | Guerrier, S. | Elhawary, N. | Tayeb, M. | Bogari, N. | Qotb, N. | McClymont, S. A. | Hook, P. W. | Goff, L. A. | McCallion, A. | Kong, Y. | Charette, J. R. | Hicks, W. L. | Naggert, J. K. | Zhao, L. | Nishina, P. M. | Edrees, B. M. | Athar, M. | Al-Allaf, F. A. | Taher, M. M. | Khan, W. | Bouazzaoui, A. | Harbi, N. A. | Safar, R. | Al-Edressi, H. | Anazi, A. | Altayeb, N. | Ahmed, M. A. | Alansary, K. | Abduljaleel, Z. | Kratz, A. | Beguin, P. | Poulain, S. | Kaneko, M. | Takahiko, C. | Matsunaga, A. | Kato, S. | Suzuki, A. M. | Bertin, N. | Lassmann, T. | Vigot, R. | Carninci, P. | Plessy, C. | Launey, T. | Graur, D. | Lee, D. | Kapoor, A. | Chakravarti, A. | Friis-Nielsen, J. 
| Izarzugaza, J. M. | Brunak, S. | Chakraborty, A. | Basak, J. | Mukhopadhyay, A. | Soibam, B. S. | Das, D. | Biswas, N. | Das, S. | Sarkar, S. | Maitra, A. | Panda, C. | Majumder, P. | Morsy, H. | Gaballah, A. | Samir, M. | Shamseya, M. | Mahrous, H. | Ghazal, A. | Arafat, W. | Hashish, M. | Gruber, J. J. | Jaeger, N. | Snyder, M. | Patel, K. | Bowman, S. | Davis, T. | Kraushaar, D. | Emerman, A. | Russello, S. | Henig, N. | Hendrickson, C. | Zhang, K. | Rodriguez-Dorantes, M. | Cruz-Hernandez, C. D. | Garcia-Tobilla, C. D. P. | Solorzano-Rosales, S. | Jäger, N. | Chen, J. | Haile, R. | Hitchins, M. | Brooks, J. D. | Snyder, M. | Jiménez-Morales, S. | Ramírez, M. | Nuñez, J. | Bekker, V. | Leal, Y. | Jiménez, E. | Medina, A. | Hidalgo, A. | Mejía, J. | Halytskiy, V. | Naggert, J. | Collin, G. B. | DeMauro, K. | Hanusek, R. | Nishina, P. M. | Belhassa, K. | Belhassan, K. | Bouguenouch, L. | Samri, I. | Sayel, H. | moufid, FZ. | El Bouchikhi, I. | Trhanint, S. | Hamdaoui, H. | Elotmani, I. | Khtiri, I. | Kettani, O. | Quibibo, L. | Ahagoud, M. | Abbassi, M. | Ouldim, K. | Marusin, A. V. | Kornetov, A. N. | Swarovskaya, M. | Vagaiceva, K. | Stepanov, V. | De La Paz, E. M. Cutiongco | Sy, R. | Nevado, J. | Reganit, P. | Santos, L. | Magno, J. D. | Punzalan, F. E. | Ona, D. | Llanes, E. | Santos-Cortes, R. L. | Tiongco, R. | Aherrera, J. | Abrahan, L. | Pagauitan-Alan, P. | Morelli, K. H. | Domire, J. S. | Pyne, N. | Harper, S. | Burgess, R. | Zhalbinova, M. | Akilzhanova, A. | Rakhimova, S. | Bekbosynova, M. | Myrzakhmetova, S. | Gari, M. A. | Dallol, A. | Alsehli, H. | Gari, A. | Gari, M. | Abuzenadah, A. | Thomas, M. | Sukhai, M. | Garg, S. | Misyura, M. | Zhang, T. | Schuh, A. | Stockley, T. | Kamel-Reid, S. | Sherry, S. | Xiao, C. | Slotta, D. | Rodarmer, K. | Feolo, M. | Kimelman, M. | Godynskiy, G. | O’Sullivan, C. | Yaschenko, E. | Xiao, C. | Yaschenko, E. | Sherry, S. | Rangel-Escareño, C. | Rueda-Zarate, H. | Tayubi, I. A. | Mohammed, R. | Ahmed, I. 
| Ahmed, T. | Seth, S. | Amin, S. | Song, X. | Mao, X. | Sun, H. | Verhaak, R. G. | Futreal, A. | Zhang, J. | Whiite, S. J. | Chiang, T. | English, A. | Farek, J. | Kahn, Z. | Salerno, W. | Veeraraghavan, N. | Boerwinkle, E. | Gibbs, R. | Kasukawa, T. | Lizio, M. | Harshbarger, J. | Hisashi, S. | Severin, J. | Imad, A. | Sahin, S. | Freeman, T. C. | Baillie, K. | Sandelin, A. | Carninci, P. | Forrest, A. R. R. | Kawaji, H. | Salerno, W. | English, A. | Shekar, S. N. | Mangubat, A. | Bruestle, J. | Boerwinkle, E. | Gibbs, R. A. | Salem, A. H. | Ali, M. | Ibrahim, A. | Ibrahim, M. | Barrera, H. A. | Garza, L. | Torres, J. A. | Barajas, V. | Ulloa-Aguirre, A. | Kershenobich, D. | Mortaji, Shahroj | Guizar, Pedro | Loera, Eliezer | Moreno, Karen | De León, Adriana | Monsiváis, Daniela | Gómez, Jackeline | Cardiel, Raquel | Fernandez-Lopez, J. C. | Bonifaz-Peña, V. | Rangel-Escareño, C. | Hidalgo-Miranda, A. | Contreras, A. V. | Polfus, L. | Wang, X. | Philip, V. | Carter, G. | Abuzenadah, A. A. | Gari, M. | Turki, R. | Dallol, A. | Uyar, A. | Kaygun, A. | Zaman, S. | Marquez, E. | George, J. | Ucar, D. | Hendrickson, C. L. | Emerman, A. | Kraushaar, D. | Bowman, S. | Henig, N. | Davis, T. | Russello, S. | Patel, K. | Starr, D. B. | Baird, M. | Kirkpatrick, B. | Sheets, K. | Nitsche, R. | Prieto-Lafuente, L. | Landrum, M. | Lee, J. | Rubinstein, W. | Maglott, D. | Thavanati, P. K. R. | de Dios, A. Escoto | Hernandez, R. E. Navarro | Aldrate, M. E. Aguilar | Mejia, M. R. Ruiz | Kanala, K. R. R. | Abduljaleel, Z. | Khan, W. | Al-Allaf, F. A. | Athar, M. | Taher, M. M. | Shahzad, N. | Bouazzaoui, A. | Huber, E. | Dan, A. | Al-Allaf, F. A. | Herr, W. | Sprotte, G. | Köstler, J. | Hiergeist, A. | Gessner, A. | Andreesen, R. | Holler, E. | Al-Allaf, F. | Alashwal, A. | Abduljaleel, Z. | Taher, M. | Bouazzaoui, A. | Abalkhail, H. | Al-Allaf, A. | Bamardadh, R. | Athar, M. | Filiptsova, O. | Kobets, M. | Kobets, Y. | Burlaka, I. | Timoshyna, I. | Filiptsova, O. | Kobets, M. N. 
| Kobets, Y. | Burlaka, I. | Timoshyna, I. | Filiptsova, O. | Kobets, M. N. | Kobets, Y. | Burlaka, I. | Timoshyna, I. | Al-allaf, F. A. | Mohiuddin, M. T. | Zainularifeen, A. | Mohammed, A. | Abalkhail, H. | Owaidah, T. | Bouazzaoui, A.
Human Genomics  2016;10(Suppl 1):12.
Table of contents
O1 The metabolomics approach to autism: identification of biomarkers for early detection of autism spectrum disorder
A. K. Srivastava, Y. Wang, R. Huang, C. Skinner, T. Thompson, L. Pollard, T. Wood, F. Luo, R. Stevenson
O2 Phenome-wide association study for smoking- and drinking-associated genes in 26,394 American women with African, Asian, European, and Hispanic descents
R. Polimanti, J. Gelernter
O3 Effects of prenatal environment, genotype and DNA methylation on birth weight and subsequent postnatal outcomes: findings from GUSTO, an Asian birth cohort
X. Lin, I. Y. Lim, Y. Wu, A. L. Teh, L. Chen, I. M. Aris, S. E. Soh, M. T. Tint, J. L. MacIsaac, F. Yap, K. Kwek, S. M. Saw, M. S. Kobor, M. J. Meaney, K. M. Godfrey, Y. S. Chong, J. D. Holbrook, Y. S. Lee, P. D. Gluckman, N. Karnani, GUSTO study group
O4 High-throughput identification of specific QT interval modulating enhancers at the SCN5A locus
A. Kapoor, D. Lee, A. Chakravarti
O5 Identification of extracellular matrix components inducing cancer cell migration in the supernatant of cultivated mesenchymal stem cells
C. Maercker, F. Graf, M. Boutros
O6 Single cell allele specific expression (ASE) in T21 and common trisomies: a novel approach to understand Down syndrome and other aneuploidies
G. Stamoulis, F. Santoni, P. Makrythanasis, A. Letourneau, M. Guipponi, N. Panousis, M. Garieri, P. Ribaux, E. Falconnet, C. Borel, S. E. Antonarakis
O7 Role of microRNA in LCL to iPSC reprogramming
S. Kumar, J. Curran, J. Blangero
O8 Multiple enhancer variants disrupt gene regulatory network in Hirschsprung disease
S. Chatterjee, A. Kapoor, J. Akiyama, D. Auer, C. Berrios, L. Pennacchio, A. Chakravarti
O9 Metabolomic profiling for the diagnosis of neurometabolic disorders
T. R. Donti, G. Cappuccio, M. Miller, P. Atwal, A. Kennedy, A. Cardon, C. Bacino, L. Emrick, J. Hertecant, F. Baumer, B. Porter, M. Bainbridge, P. Bonnen, B. Graham, R. Sutton, Q. Sun, S. Elsea
O10 A novel causal methylation network approach to Alzheimer’s disease
Z. Hu, P. Wang, Y. Zhu, J. Zhao, M. Xiong, David A Bennett
O11 A microRNA signature identifies subtypes of triple-negative breast cancer and reveals MIR-342-3P as regulator of a lactate metabolic pathway
A. Hidalgo-Miranda, S. Romero-Cordoba, S. Rodriguez-Cuevas, R. Rebollar-Vega, E. Tagliabue, M. Iorio, E. D’Ippolito, S. Baroni
O12 Transcriptome analysis identifies genes, enhancer RNAs and repetitive elements that are recurrently deregulated across multiple cancer types
B. Kaczkowski, Y. Tanaka, H. Kawaji, A. Sandelin, R. Andersson, M. Itoh, T. Lassmann, the FANTOM5 consortium, Y. Hayashizaki, P. Carninci, A. R. R. Forrest
O13 Elevated mutation and widespread loss of constraint at regulatory and architectural binding sites across 11 tumour types
C. A. Semple
O14 Exome sequencing provides evidence of pathogenicity for genes implicated in colorectal cancer
E. A. Rosenthal, B. Shirts, L. Amendola, C. Gallego, M. Horike-Pyne, A. Burt, P. Robertson, P. Beyers, C. Nefcy, D. Veenstra, F. Hisama, R. Bennett, M. Dorschner, D. Nickerson, J. Smith, K. Patterson, D. Crosslin, R. Nassir, N. Zubair, T. Harrison, U. Peters, G. Jarvik, NHLBI GO Exome Sequencing Project
O15 The tandem duplicator phenotype as a distinct genomic configuration in cancer
F. Menghi, K. Inaki, X. Woo, P. Kumar, K. Grzeda, A. Malhotra, H. Kim, D. Ucar, P. Shreckengast, K. Karuturi, J. Keck, J. Chuang, E. T. Liu
O16 Modeling genetic interactions associated with molecular subtypes of breast cancer
B. Ji, A. Tyler, G. Ananda, G. Carter
O17 Recurrent somatic mutation in the MYC associated factor X in brain tumors
H. Nikbakht, M. Montagne, M. Zeinieh, A. Harutyunyan, M. Mcconechy, N. Jabado, P. Lavigne, J. Majewski
O18 Predictive biomarkers to metastatic pancreatic cancer treatment
J. B. Goldstein, M. Overman, G. Varadhachary, R. Shroff, R. Wolff, M. Javle, A. Futreal, D. Fogelman
O19 DDIT4 gene expression as a prognostic marker in several malignant tumors
L. Bravo, W. Fajardo, H. Gomez, C. Castaneda, C. Rolfo, J. A. Pinto
O20 Spatial organization of the genome and genomic alterations in human cancers
K. C. Akdemir, L. Chin, A. Futreal, ICGC PCAWG Structural Alterations Group
O21 Landscape of targeted therapies in solid tumors
S. Patterson, C. Statz, S. Mockus
O22 Genomic analysis reveals novel drivers and progression pathways in skin basal cell carcinoma
S. N. Nikolaev, X. I. Bonilla, L. Parmentier, B. King, F. Bezrukov, G. Kaya, V. Zoete, V. Seplyarskiy, H. Sharpe, T. McKee, A. Letourneau, P. Ribaux, K. Popadin, N. Basset-Seguin, R. Ben Chaabene, F. Santoni, M. Andrianova, M. Guipponi, M. Garieri, C. Verdan, K. Grosdemange, O. Sumara, M. Eilers, I. Aifantis, O. Michielin, F. de Sauvage, S. Antonarakis
O23 Identification of differential biomarkers of hepatocellular carcinoma and cholangiocarcinoma via transcriptome microarray meta-analysis
S. Likhitrattanapisal
O24 Clinical validity and actionability of multigene tests for hereditary cancers in a large multi-center study
S. Lincoln, A. Kurian, A. Desmond, S. Yang, Y. Kobayashi, J. Ford, L. Ellisen
O25 Correlation with tumor ploidy status is essential for correct determination of genome-wide copy number changes by SNP array
T. L. Peters, K. R. Alvarez, E. F. Hollingsworth, D. H. Lopez-Terrada
O26 Nanochannel based next-generation mapping for interrogation of clinically relevant structural variation
A. Hastie, Z. Dzakula, A. W. Pang, E. T. Lam, T. Anantharaman, M. Saghbini, H. Cao, BioNano Genomics
O27 Mutation spectrum in a pulmonary arterial hypertension (PAH) cohort and identification of associated truncating mutations in TBX4
C. Gonzaga-Jauregui, L. Ma, A. King, E. Berman Rosenzweig, U. Krishnan, J. G. Reid, J. D. Overton, F. Dewey, W. K. Chung
O28 North Carolina macular dystrophy (MCDR1): mutations found affecting PRDM13
K. Small, A. DeLuca, F. Cremers, R. A. Lewis, V. Puech, B. Bakall, R. Silva-Garcia, K. Rohrschneider, M. Leys, F. S. Shaya, E. Stone
O29 PhenoDB and GeneMatcher, solving unsolved whole exome sequencing data
N. L. Sobreira, F. Schiettecatte, H. Ling, E. Pugh, D. Witmer, K. Hetrick, P. Zhang, K. Doheny, D. Valle, A. Hamosh
O30 Baylor-Johns Hopkins Center for Mendelian genomics: a four year review
S. N. Jhangiani, Z. Coban Akdemir, M. N. Bainbridge, W. Charng, W. Wiszniewski, T. Gambin, E. Karaca, Y. Bayram, M. K. Eldomery, J. Posey, H. Doddapaneni, J. Hu, V. R. Sutton, D. M. Muzny, E. A. Boerwinkle, D. Valle, J. R. Lupski, R. A. Gibbs
O31 Using read overlap assembly to accurately identify structural genetic differences in an Ashkenazi Jewish trio
S. Shekar, W. Salerno, A. English, A. Mangubat, J. Bruestle
O32 Legal interoperability: a sine qua non for international data sharing
A. Thorogood, B. M. Knoppers, Global Alliance for Genomics and Health - Regulatory and Ethics Working Group
O33 High throughput screening platform of competent SINEUPs that can enhance translation activities of therapeutic target
H. Takahashi, K. R. Nitta, A. Kozhuharova, A. M. Suzuki, H. Sharma, D. Cotella, C. Santoro, S. Zucchelli, S. Gustincich, P. Carninci
O34 The undiagnosed diseases network international (UDNI): clinical and laboratory research to meet patient needs
J. J. Mulvihill, G. Baynam, W. Gahl, S. C. Groft, K. Kosaki, P. Lasko, B. Melegh, D. Taruscio
O36 Performance of computational algorithms in pathogenicity predictions for activating variants in oncogenes versus loss of function mutations in tumor suppressor genes
R. Ghosh, S. Plon
O37 Identification and electronic health record incorporation of clinically actionable pharmacogenomic variants using prospective targeted sequencing
S. Scherer, X. Qin, R. Sanghvi, K. Walker, T. Chiang, D. Muzny, L. Wang, J. Black, E. Boerwinkle, R. Weinshilboum, R. Gibbs
O38 Melanoma reprogramming state correlates with response to CTLA-4 blockade in metastatic melanoma
T. Karpinets, T. Calderone, K. Wani, X. Yu, C. Creasy, C. Haymaker, M. Forget, V. Nanda, J. Roszik, J. Wargo, L. Haydu, X. Song, A. Lazar, J. Gershenwald, M. Davies, C. Bernatchez, J. Zhang, A. Futreal, S. Woodman
O39 Data-driven refinement of complex disease classification from integration of heterogeneous functional genomics data in GeneWeaver
E. J. Chesler, T. Reynolds, J. A. Bubier, C. Phillips, M. A. Langston, E. J. Baker
O40 A general statistic framework for genome-based disease risk prediction
M. Xiong, L. Ma, N. Lin, C. Amos
O41 Integrative large-scale causal network analysis of imaging and genomic data and its application in schizophrenia studies
N. Lin, P. Wang, Y. Zhu, J. Zhao, V. Calhoun, M. Xiong
O42 Big data and NGS data analysis: the cloud to the rescue
O. Dobretsberger, M. Egger, F. Leimgruber
O43 Cpipe: a convergent clinical exome pipeline specialised for targeted sequencing
S. Sadedin, A. Oshlack, Melbourne Genomics Health Alliance
O44 A Bayesian classification of biomedical images using feature extraction from deep neural networks implemented on lung cancer data
V. A. A. Antonio, N. Ono, Clark Kendrick C. Go
O45 MAV-SEQ: an interactive platform for the Management, Analysis, and Visualization of sequence data
Z. Ahmed, M. Bolisetty, S. Zeeshan, E. Anguiano, D. Ucar
O47 Allele specific enhancer in EPAS1 intronic regions may contribute to high altitude adaptation of Tibetans
C. Zeng, J. Shao
O48 Nanochannel based next-generation mapping for structural variation detection and comparison in trios and populations
H. Cao, A. Hastie, A. W. Pang, E. T. Lam, T. Liang, K. Pham, M. Saghbini, Z. Dzakula
O49 Archaic introgression in indigenous populations of Malaysia revealed by whole genome sequencing
Y. Chee-Wei, L. Dongsheng, W. Lai-Ping, D. Lian, R. O. Twee Hee, Y. Yunus, F. Aghakhanian, S. S. Mokhtar, C. V. Lok-Yung, J. Bhak, M. Phipps, X. Shuhua, T. Yik-Ying, V. Kumar, H. Boon-Peng
O50 Breast and ovarian cancer prevention: is it time for population-based mutation screening of high risk genes?
I. Campbell, M.-A. Young, P. James, Lifepool
O53 Comprehensive coverage from low DNA input using novel NGS library preparation methods for WGS and WGBS
C. Schumacher, S. Sandhu, T. Harkins, V. Makarov
O54 Methods for large scale construction of robust PCR-free libraries for sequencing on Illumina HiSeqX platform
H. Doddapaneni, R. Glenn, Z. Momin, B. Dilrukshi, H. Chao, Q. Meng, B. Gudenkauf, R. Kshitij, J. Jayaseelan, C. Nessner, S. Lee, K. Blankenberg, L. Lewis, J. Hu, Y. Han, H. Dinh, S. Jireh, K. Walker, E. Boerwinkle, D. Muzny, R. Gibbs
O55 Rapid capture methods for clinical sequencing
J. Hu, K. Walker, C. Buhay, X. Liu, Q. Wang, R. Sanghvi, H. Doddapaneni, Y. Ding, N. Veeraraghavan, Y. Yang, E. Boerwinkle, A. L. Beaudet, C. M. Eng, D. M. Muzny, R. A. Gibbs
O56 A diploid personal human genome model for better genomes from diverse sequence data
K. C. C. Worley, Y. Liu, D. S. T. Hughes, S. C. Murali, R. A. Harris, A. C. English, X. Qin, O. A. Hampton, P. Larsen, C. Beck, Y. Han, M. Wang, H. Doddapaneni, C. L. Kovar, W. J. Salerno, A. Yoder, S. Richards, J. Rogers, J. R. Lupski, D. M. Muzny, R. A. Gibbs
O57 Development of PacBio long range capture for detection of pathogenic structural variants
Q. Meng, M. Bainbridge, M. Wang, H. Doddapaneni, Y. Han, D. Muzny, R. Gibbs
O58 Rhesus macaques exhibit more non-synonymous variation but greater impact of purifying selection than humans
R. A. Harris, M. Raveenedran, C. Xue, M. Dahdouli, L. Cox, G. Fan, B. Ferguson, J. Hovarth, Z. Johnson, S. Kanthaswamy, M. Kubisch, M. Platt, D. Smith, E. Vallender, R. Wiseman, X. Liu, J. Below, D. Muzny, R. Gibbs, F. Yu, J. Rogers
O59 Assessing RNA structure disruption induced by single-nucleotide variation
J. Lin, Y. Zhang, Z. Ouyang
P1 A meta-analysis of genome-wide association studies of mitochondrial DNA copy number
A. Moore, Z. Wang, J. Hofmann, M. Purdue, R. Stolzenberg-Solomon, S. Weinstein, D. Albanes, C.-S. Liu, W.-L. Cheng, T.-T. Lin, Q. Lan, N. Rothman, S. Berndt
P2 Missense polymorphic genetic combinations underlying Down syndrome susceptibility
E. S. Chen
P4 The evaluation of alteration of ELAM-1 expression in the endometriosis patients
H. Bahrami, A. Khoshzaban, S. Heidari Keshal
P5 Obesity and the incidence of apolipoprotein E polymorphisms in an assorted population from Saudi Arabia population
K. K. R. Alharbi
P6 Genome-associated personalized antithrombotical therapy for patients with high risk of thrombosis and bleeding
M. Zhalbinova, A. Akilzhanova, S. Rakhimova, M. Bekbosynova, S. Myrzakhmetova
P7 Frequency of Xmn1 polymorphism among sickle cell carrier cases in UAE population
M. Matar
P8 Differentiating inflammatory bowel diseases by using genomic data: dimension of the problem and network organization
N. Mili, R. Molinari, Y. Ma, S. Guerrier
P9 Vulnerability of genetic variants to the risk of autism among Saudi children
N. Elhawary, M. Tayeb, N. Bogari, N. Qotb
P10 Chromatin profiles from ex vivo purified dopaminergic neurons establish a promising model to support studies of neurological function and dysfunction
S. A. McClymont, P. W. Hook, L. A. Goff, A. McCallion
P11 Utilization of a sensitized chemical mutagenesis screen to identify genetic modifiers of retinal dysplasia in homozygous Nr2e3rd7 mice
Y. Kong, J. R. Charette, W. L. Hicks, J. K. Naggert, L. Zhao, P. M. Nishina
P12 Ion torrent next generation sequencing of recessive polycystic kidney disease in Saudi patients
B. M. Edrees, M. Athar, F. A. Al-Allaf, M. M. Taher, W. Khan, A. Bouazzaoui, N. A. Harbi, R. Safar, H. Al-Edressi, A. Anazi, N. Altayeb, M. A. Ahmed, K. Alansary, Z. Abduljaleel
P13 Digital expression profiling of Purkinje neurons and dendrites in different subcellular compartments
A. Kratz, P. Beguin, S. Poulain, M. Kaneko, C. Takahiko, A. Matsunaga, S. Kato, A. M. Suzuki, N. Bertin, T. Lassmann, R. Vigot, P. Carninci, C. Plessy, T. Launey
P14 The evolution of imperfection and imperfection of evolution: the functional and functionless fractions of the human genome
D. Graur
P16 Species-independent identification of known and novel recurrent genomic entities in multiple cancer patients
J. Friis-Nielsen, J. M. Izarzugaza, S. Brunak
P18 Discovery of active gene modules which are densely conserved across multiple cancer types reveal their prognostic power and mutually exclusive mutation patterns
B. S. Soibam
P19 Whole exome sequencing of dysplastic leukoplakia tissue indicates sequential accumulation of somatic mutations from oral precancer to cancer
D. Das, N. Biswas, S. Das, S. Sarkar, A. Maitra, C. Panda, P. Majumder
P21 Epigenetic mechanisms of carcinogenesis by hereditary breast cancer genes
J. J. Gruber, N. Jaeger, M. Snyder
P22 RNA direct: a novel RNA enrichment strategy applied to transcripts associated with solid tumors
K. Patel, S. Bowman, T. Davis, D. Kraushaar, A. Emerman, S. Russello, N. Henig, C. Hendrickson
P23 RNA sequencing identifies gene mutations for neuroblastoma
K. Zhang
P24 Participation of SFRP1 in the modulation of TMPRSS2-ERG fusion gene in prostate cancer cell lines
M. Rodriguez-Dorantes, C. D. Cruz-Hernandez, C. D. P. Garcia-Tobilla, S. Solorzano-Rosales
P25 Targeted Methylation Sequencing of Prostate Cancer
N. Jäger, J. Chen, R. Haile, M. Hitchins, J. D. Brooks, M. Snyder
P26 Mutant TPMT alleles in children with acute lymphoblastic leukemia from México City and Yucatán, Mexico
S. Jiménez-Morales, M. Ramírez, J. Nuñez, V. Bekker, Y. Leal, E. Jiménez, A. Medina, A. Hidalgo, J. Mejía
P28 Genetic modifiers of Alström syndrome
J. Naggert, G. B. Collin, K. DeMauro, R. Hanusek, P. M. Nishina
P31 Association of genomic variants with the occurrence of angiotensin-converting-enzyme inhibitor (ACEI)-induced coughing among Filipinos
E. M. Cutiongco De La Paz, R. Sy, J. Nevado, P. Reganit, L. Santos, J. D. Magno, F. E. Punzalan, D. Ona, E. Llanes, R. L. Santos-Cortes, R. Tiongco, J. Aherrera, L. Abrahan, P. Pagauitan-Alan; Philippine Cardiogenomics Study Group
P32 The use of “humanized” mouse models to validate disease association of a de novo GARS variant and to test a novel gene therapy strategy for Charcot-Marie-Tooth disease type 2D
K. H. Morelli, J. S. Domire, N. Pyne, S. Harper, R. Burgess
P34 Molecular regulation of chondrogenic human induced pluripotent stem cells
M. A. Gari, A. Dallol, H. Alsehli, A. Gari, M. Gari, A. Abuzenadah
P35 Molecular profiling of hematologic malignancies: implementation of a variant assessment algorithm for next generation sequencing data analysis and clinical reporting
M. Thomas, M. Sukhai, S. Garg, M. Misyura, T. Zhang, A. Schuh, T. Stockley, S. Kamel-Reid
P36 Accessing genomic evidence for clinical variants at NCBI
S. Sherry, C. Xiao, D. Slotta, K. Rodarmer, M. Feolo, M. Kimelman, G. Godynskiy, C. O’Sullivan, E. Yaschenko
P37 NGS-SWIFT: a cloud-based variant analysis framework using control-accessed sequencing data from DBGAP/SRA
C. Xiao, E. Yaschenko, S. Sherry
P38 Computational assessment of drug induced hepatotoxicity through gene expression profiling
C. Rangel-Escareño, H. Rueda-Zarate
P40 Flowr: robust and efficient pipelines using a simple language-agnostic approach; ultraseq: fast modular pipeline for somatic variation calling using flowr
S. Seth, S. Amin, X. Song, X. Mao, H. Sun, R. G. Verhaak, A. Futreal, J. Zhang
P41 Applying “Big data” technologies to the rapid analysis of heterogenous large cohort data
S. J. White, T. Chiang, A. English, J. Farek, Z. Kahn, W. Salerno, N. Veeraraghavan, E. Boerwinkle, R. Gibbs
P42 FANTOM5 web resource for the large-scale genome-wide transcription start site activity profiles of wide-range of mammalian cells
T. Kasukawa, M. Lizio, J. Harshbarger, S. Hisashi, J. Severin, A. Imad, S. Sahin, T. C. Freeman, K. Baillie, A. Sandelin, P. Carninci, A. R. R. Forrest, H. Kawaji, The FANTOM Consortium
P43 Rapid and scalable typing of structural variants for disease cohorts
W. Salerno, A. English, S. N. Shekar, A. Mangubat, J. Bruestle, E. Boerwinkle, R. A. Gibbs
P44 Polymorphism of glutathione S-transferases and sulphotransferases genes in an Arab population
A. H. Salem, M. Ali, A. Ibrahim, M. Ibrahim
P46 Genetic divergence of CYP3A5*3 pharmacogenomic marker for native and admixed Mexican populations
J. C. Fernandez-Lopez, V. Bonifaz-Peña, C. Rangel-Escareño, A. Hidalgo-Miranda, A. V. Contreras
P47 Whole exome sequence meta-analysis of 13 white blood cell, red blood cell, and platelet traits
L. Polfus, CHARGE and NHLBI Exome Sequence Project Working Groups
P48 Association of ADIPOQ gene with type 2 diabetes and related phenotypes in African American men and women: the Jackson Heart Study
S. Davis, R. Xu, S. Gebeab, P. Riestra, A. Gaye, R. Khan, J. Wilson, A. Bidulescu
P49 Common variants in CASR gene are associated with serum calcium levels in Koreans
S. H. Jung, N. Vinayagamoorthy, S. H. Yim, Y. J. Chung
P50 Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with multiple exponential functions
Y. Zhou, S. Xu
P51 A Bayesian framework for generalized linear mixed models in genome-wide association studies
X. Wang, V. Philip, G. Carter
P52 Targeted sequencing approach for the identification of the genetic causes of hereditary hearing impairment
A. A. Abuzenadah, M. Gari, R. Turki, A. Dallol
P53 Identification of enhancer sequences by ATAC-seq open chromatin profiling
A. Uyar, A. Kaygun, S. Zaman, E. Marquez, J. George, D. Ucar
P54 Direct enrichment for the rapid preparation of targeted NGS libraries
C. L. Hendrickson, A. Emerman, D. Kraushaar, S. Bowman, N. Henig, T. Davis, S. Russello, K. Patel
P56 Performance of the Agilent D5000 and High Sensitivity D5000 ScreenTape assays for the Agilent 4200 Tapestation System
R. Nitsche, L. Prieto-Lafuente
P57 ClinVar: a multi-source archive for variant interpretation
M. Landrum, J. Lee, W. Rubinstein, D. Maglott
P59 Association of functional variants and protein physical interactions of human MUTY homolog linked with familial adenomatous polyposis and colorectal cancer syndrome
Z. Abduljaleel, W. Khan, F. A. Al-Allaf, M. Athar, M. M. Taher, N. Shahzad
P60 Modification of the microbiome constitution in the gut using chicken IgY antibodies resulted in a reduction of acute graft-versus-host disease after experimental bone marrow transplantation
A. Bouazzaoui, E. Huber, A. Dan, F. A. Al-Allaf, W. Herr, G. Sprotte, J. Köstler, A. Hiergeist, A. Gessner, R. Andreesen, E. Holler
P61 Compound heterozygous mutation in the LDLR gene in Saudi patients suffering severe hypercholesterolemia
F. Al-Allaf, A. Alashwal, Z. Abduljaleel, M. Taher, A. Bouazzaoui, H. Abalkhail, A. Al-Allaf, R. Bamardadh, M. Athar
PMCID: PMC4896275  PMID: 27294413
20.  Tests of Association for Quantitative Traits in Nuclear Families Using Principal Components to Correct for Population Stratification 
Annals of human genetics  2009;73(Pt 6):601-613.
Traditional transmission disequilibrium test (TDT) based methods for genetic association analyses are robust to population stratification at the cost of a substantial loss of power. We here describe a novel method for family-based association studies that corrects for population stratification with the use of an extension of principal component analysis (PCA). Specifically, we adopt PCA on unrelated parents in each family. We then infer principal components for children from those for their parents through a TDT-like strategy. Two test statistics within a variance-components model are proposed for association tests. Simulation results show that the proposed tests have correct type I error rates regardless of population stratification, and have greatly improved power over two popular TDT-based methods: QTDT and FBAT. The application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility of the proposed method.
PMCID: PMC2764806  PMID: 19702646
Family Based Association Tests (FBATs); Transmission Disequilibrium Test (TDT); Principal Component Analysis (PCA); Variance-Components
21.  Theoretical Formulation of Principal Components Analysis to Detect and Correct for Population Stratification 
PLoS ONE  2010;5(9):e12510.
The Eigenstrat method, based on principal components analysis (PCA), is commonly used both to quantify population relationships in population genetics and to correct for population stratification in genome-wide association studies. However, it can be difficult to make appropriate inference about population relationships from the principal component (PC) scatter plot. Here, to better understand the working mechanism of the Eigenstrat method, we consider its theoretical or “population” formulation. The eigen-equation for samples from an arbitrary number of populations is reduced to that of a matrix whose elements are determined by the variance-covariance matrix for the random vector of the allele frequencies. Solving the reduced eigen-equation is numerically trivial and yields eigenvectors that are the axes of variation required for differentiating the populations. Using the reduced eigen-equation, we investigate the within-population fluctuations around the axes of variation on the PC scatter plot for simulated datasets. Specifically, we show that there exists an asymptotically stable pattern of the PC plot for large sample sizes. Our results provide theoretical guidance for interpreting the pattern of the PC plot in terms of population relationships. For applications in genetic association tests, we demonstrate that, as a method of correcting for population stratification, regressing out the theoretical PCs corresponding to the axes of variation is equivalent to simply removing the population mean of allele counts and works as well as or better than the Eigenstrat method.
PMCID: PMC2941459  PMID: 20862251
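As a rough illustration of the idea in the abstract above, the following sketch computes Eigenstrat-style principal components from a genotype matrix and then shows that subtracting each population's mean allele count removes the between-population separation on the first PC. It is not the authors' code: the function names, the two-population simulation, and all parameter values are assumptions made for this example, and the population labels are taken as known, which is only true in a simulation.

```python
import numpy as np

def standardise(genotypes):
    """Centre and scale each SNP column of a 0/1/2 allele-count matrix."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # estimated allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    return (g - 2 * p) / np.sqrt(2 * p * (1 - p))

def principal_components(z, n_pcs=2):
    """Return PC scores (individuals x n_pcs) from a standardised matrix."""
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]

def remove_population_means(z, labels):
    """Subtract the per-population mean of every SNP column (labels assumed known)."""
    out = z.copy()
    for k in np.unique(labels):
        out[labels == k] -= out[labels == k].mean(axis=0)
    return out

def simulate(n_per_pop=50, n_snps=500, seed=0):
    """Hypothetical toy data: two populations with shifted allele frequencies."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, n_snps)
    shift = rng.normal(0, 0.1, n_snps)
    freqs = [np.clip(base + shift, 0.05, 0.95), np.clip(base - shift, 0.05, 0.95)]
    g = np.vstack([rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs])
    return g, np.repeat([0, 1], n_per_pop)

if __name__ == "__main__":
    g, labels = simulate()
    z = standardise(g)
    pc1 = principal_components(z)[:, 0]
    print("PC1 group means before:", pc1[labels == 0].mean(), pc1[labels == 1].mean())
    pc1_adj = principal_components(remove_population_means(z, labels))[:, 0]
    print("PC1 group means after: ", pc1_adj[labels == 0].mean(), pc1_adj[labels == 1].mean())
```

In real data the population labels are not observed, which is why PCs are used as a proxy. The sketch only illustrates why the two adjustments target the same between-population signal.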
22.  Study of large and highly stratified population datasets by combining iterative pruning principal component analysis and structure 
BMC Bioinformatics  2011;12:255.
The ever increasing sizes of population genetic datasets pose great challenges for population structure analysis. The Tracy-Widom (TW) statistical test is widely used for detecting structure. However, it has not been adequately investigated whether the TW statistic is susceptible to type I error, especially in large, complex datasets. Non-parametric, Principal Component Analysis (PCA) based methods for resolving structure have been developed which rely on the TW test. Although PCA-based methods can resolve structure, they cannot infer ancestry. Model-based methods are still needed for ancestry analysis, but they are not suitable for large datasets. We propose a new structure analysis framework for large datasets. This includes a new heuristic for detecting structure and incorporation of the structure patterns inferred by a PCA method to complement STRUCTURE analysis.
A new heuristic called EigenDev for detecting population structure is presented. When tested on simulated data, this heuristic is robust to sample size. In contrast, the TW statistic was found to be susceptible to type I error, especially for large population samples. EigenDev is thus better-suited for analysis of large datasets containing many individuals, in which spurious patterns are likely to exist and could be incorrectly interpreted as population stratification. EigenDev was applied to the iterative pruning PCA (ipPCA) method, which resolves the underlying subpopulations. This subpopulation information was used to supervise STRUCTURE analysis to infer patterns of ancestry at an unprecedented level of resolution. To validate the new approach, a bovine and a large human genetic dataset (3945 individuals) were analyzed. We found new ancestry patterns consistent with the subpopulations resolved by ipPCA.
The EigenDev heuristic is robust to sampling and is thus superior for detecting structure in large datasets. The application of EigenDev to the ipPCA algorithm improves the estimation of the number of subpopulations and the individual assignment accuracy, especially for very large and complex datasets. Furthermore, we have demonstrated that the structure resolved by this approach complements parametric analysis, allowing a much more comprehensive account of population structure. The new version of the ipPCA software with EigenDev incorporated can be downloaded from
PMCID: PMC3148578  PMID: 21699684
23.  Comparison of Population-Based Association Study Methods Correcting for Population Stratification 
PLoS ONE  2008;3(10):e3392.
Population stratification can cause spurious associations in population-based association studies. Several statistical methods have been proposed to reduce the impact of population stratification on population-based association studies. We simulated a set of stratified populations based on the real haplotype data from the HapMap ENCODE project, and compared the relative power, type I error rates, accuracy and positive predictive value of four prevailing population-based association study methods: traditional case-control tests, structured association (SA), genomic control (GC) and principal components analysis (PCA) under various population stratification levels. Additionally, we evaluated the effects of sample size and disease susceptibility allele frequency on the performance of the four analytical methods in the presence of population stratification. We found that the performance of PCA was very stable under various scenarios. Our comparison results suggest that SA and PCA have comparable performance if sufficient ancestry informative markers are used in SA analysis. GC appeared to be strongly conservative in significantly stratified populations and may be better applied to populations with a low level of stratification. Our study intends to provide a practical guideline for researchers to select proper study methods and make appropriate inference of the results in population-based association studies.
PMCID: PMC2562035  PMID: 18852890
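For readers unfamiliar with genomic control, one of the four methods compared above, a minimal sketch of the usual median-based correction follows. It assumes 1-degree-of-freedom chi-square association statistics; the function name is hypothetical, and flooring the inflation factor at 1 is a common convention rather than something stated in the abstract.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom.
MEDIAN_CHI2_1DF = 0.4549364

def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for 1-df chi-square test statistics."""
    stats = np.asarray(chi2_stats, dtype=float)
    # Inflation factor: observed median relative to the null median, floored at 1
    # so that well-calibrated statistics are never inflated by the correction.
    lam = max(1.0, float(np.median(stats)) / MEDIAN_CHI2_1DF)
    return lam, stats / lam

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    inflated = 1.5 * rng.chisquare(df=1, size=20000)   # toy statistics inflated 1.5x
    lam, corrected = genomic_control(inflated)
    print("estimated lambda:", lam)
    print("median after correction:", np.median(corrected))
```

Because every statistic is divided by the same factor, the correction can be over-conservative when stratification affects only some markers, which is consistent with the behaviour the abstract reports for strongly stratified populations.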
24.  Testing Genetic Association with Rare and Common Variants in Family Data 
Genetic epidemiology  2014;38(0 1):S37-S43.
With the advance of next-generation sequencing technologies in recent years, rare genetic variant data have now become available for genetic epidemiology studies. For family samples, however, only a few statistical methods for association analysis of rare genetic variants have been developed. Rare variant approaches are of great interest particularly for family data because samples enriched for trait-relevant variants can be ascertained and rare variants are putatively enriched through segregation. To facilitate the evaluation of existing and new rare variant testing approaches for analyzing family data, Genetic Analysis Workshop 18 (GAW18) provided genotype and next-generation sequencing data and longitudinal blood pressure traits from extended pedigrees of Mexican-American families from the San Antonio Family Study. Our GAW18 group members analyzed real and simulated phenotype data from GAW18 by using generalized linear mixed-effects models or principal components to adjust for familial correlation or by testing binary traits using a correction factor for familial effects. With one exception, approaches dealt with the extended pedigrees in their original state using information based on the kinship matrix or alternative genetic similarity measures. For simulated data, our group demonstrated that the family-based kernel machine score test is superior in power to family-based single-marker or burden tests, except in a few specific scenarios. For real data, three contributions identified significant associations. They substantially reduced the number of tests before performing the association analysis. We conclude from our real data analyses that further development of strategies for targeted testing or more focused screening of genetic variants is strongly desirable.
PMCID: PMC4324976  PMID: 25112186
extended pedigrees; rare variant analysis; family-based association test; linear mixed effects model; kernel machine score test; principal components
25.  A variable age of onset segregation model for linkage analysis, with correction for ascertainment, applied to glioma 
We propose a two-step model-based approach, with correction for ascertainment, to linkage analysis of a binary trait with variable age of onset and apply it to a set of multiplex pedigrees segregating for adult glioma.
First, we fit segregation models by formulating the likelihood for a person to have a bivariate phenotype, affection status and age of onset, along with other covariates, and from these we estimate population trait allele frequencies and penetrance parameters as a function of age (N=281 multiplex glioma pedigrees). Second, the best fitting models are used as trait models in multipoint linkage analysis (N=74 informative multiplex glioma pedigrees). To correct for ascertainment, a prevalence constraint is used in the likelihood of the segregation models for all 281 pedigrees. Then the trait allele frequencies are re-estimated for the pedigree founders of the subset of 74 pedigrees chosen for linkage analysis.
Using the best fitting segregation models in model-based multipoint linkage analysis, we identified two separate peaks on chromosome 17. The first agreed with a region identified by Shete et al., who used model-free affected-only linkage analysis, but with a narrowed peak; the second agreed with a second region they found but had a larger maximum log of the odds (LOD).
Our approach has the advantage of not requiring markers to be in linkage equilibrium unless the minor allele frequency is small (markers which tend to be uninformative for linkage), and of using more of the available information for LOD-based linkage analysis.
PMCID: PMC3518573  PMID: 22962404
Glioma; model-based linkage; segregation; age of onset; prevalence constraint
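To make the LOD statistic mentioned above concrete, here is a deliberately simplified sketch. It computes a two-point LOD score from counts of recombinant and non-recombinant phase-known meioses and maximises it over a grid of recombination fractions. This is an assumption-laden textbook simplification, not the model-based multipoint analysis with age-dependent penetrance described in the abstract; the function names and the example counts are hypothetical.

```python
import math

def lod(theta, recombinants, meioses):
    """Two-point LOD score for phase-known meioses at recombination fraction theta."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    non_recombinants = meioses - recombinants
    # log10 likelihood under linkage at theta
    log_l_theta = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    # log10 likelihood under no linkage (theta = 0.5)
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null

def max_lod(recombinants, meioses, steps=1000):
    """Grid search over theta in (0, 0.5]; returns (best theta, maximum LOD)."""
    thetas = [i / steps for i in range(1, steps // 2 + 1)]
    best = max(thetas, key=lambda t: lod(t, recombinants, meioses))
    return best, lod(best, recombinants, meioses)

if __name__ == "__main__":
    theta_hat, score = max_lod(recombinants=2, meioses=10)
    print("theta maximising LOD:", theta_hat, "LOD:", round(score, 3))
```

A LOD of 0 at theta = 0.5 corresponds to no evidence for linkage; larger positive values indicate stronger support for linkage at the estimated recombination fraction.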
