Related Articles

1.  Powerful Haplotype-Based Hardy-Weinberg Equilibrium Tests for Tightly Linked Loci 
PLoS ONE  2013;8(10):e77399.
Recently, there have been many case-control studies proposed to test for association between haplotypes and disease, which require the Hardy-Weinberg equilibrium (HWE) assumption of haplotype frequencies. As such, haplotype inference of unphased genotypes and development of haplotype-based HWE tests are crucial prior to fine mapping. The goodness-of-fit test is a frequently used method to test for HWE for multiple tightly-linked loci. However, its degrees of freedom increase dramatically with the number of loci, which can reduce the power of the test. Therefore, in this paper, to improve the test power for haplotype-based HWE, we first write out two likelihood functions of the observed data based on Niu's model (NM) and the inbreeding model (IM), respectively, each of which can cause departure from HWE. Then, we use two expectation-maximization algorithms and one expectation-conditional-maximization algorithm to estimate the model parameters under the HWE, IM and NM models, respectively. Finally, we propose the likelihood ratio tests LRT and LRT for haplotype-based HWE under the NM and IM models, respectively. We simulate the HWE, Niu's, inbreeding and population stratification models to assess the validity and compare the performance of these two LRT tests. The simulation results show that both of the tests control the type I error rates well in testing for haplotype-based HWE. If the NM model is true, then LRT is more powerful. If instead the true model is the IM model, then LRT has better power. Under the population stratification model, LRT is still more powerful. To this end, LRT is generally recommended. Application of the proposed methods to a rheumatoid arthritis data set further illustrates their utility for real data analysis.
PMCID: PMC3805574  PMID: 24167573
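For orientation, the goodness-of-fit test mentioned in the abstract above can be sketched for the simplest case, a single biallelic locus, where it has one degree of freedom. This is only an illustrative sketch and not the haplotype-based likelihood ratio tests the paper proposes; the function name `hwe_chi_square` and the genotype-count arguments are assumptions made for the example.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson goodness-of-fit test for HWE at one biallelic locus (1 df).

    Hypothetical helper for illustration; assumes both alleles are observed,
    so that no expected genotype count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # sample frequency of allele A
    q = 1.0 - p
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With genotype counts (25, 50, 25) the observed counts equal the HWE expectations, so the statistic is 0. For haplotypes spanning several loci the number of categories, and hence the degrees of freedom, grows quickly, which is the power problem the abstract describes.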
2.  Assessing population genetic structure via the maximisation of genetic distance 
The inference of the hidden structure of a population is an essential issue in population genetics. Recently, several methods have been proposed to infer population structure in population genetics.
In this study, a new method to infer the number of clusters and to assign individuals to the inferred populations is proposed. This approach does not make any assumption on Hardy-Weinberg and linkage equilibrium. The implemented criterion is the maximisation (via a simulated annealing algorithm) of the averaged genetic distance between a predefined number of clusters. The performance of this method is compared with two Bayesian approaches: STRUCTURE and BAPS, using simulated data and also a real human data set.
The simulations show that with a reduced number of markers, BAPS overestimates the number of clusters and presents a reduced proportion of correct groupings. The accuracy of the new method is approximately the same as for STRUCTURE. Also, in Hardy-Weinberg and linkage disequilibrium cases, BAPS performs incorrectly. In these situations, STRUCTURE and the new method show an equivalent behaviour with respect to the number of inferred clusters, although the proportion of correct groupings is slightly better with the new method. Re-establishing equilibrium with the randomisation procedures improves the precision of the Bayesian approaches. All methods have a good precision for FST ≥ 0.03, but only STRUCTURE estimates the correct number of clusters for FST as low as 0.01. In situations with a high number of clusters or a more complex population structure, MGD performs better than STRUCTURE and BAPS. The results for a human data set analysed with the new method are congruent with the geographical regions previously found.
This new method used to infer the hidden structure in a population, based on the maximisation of the genetic distance and not taking into consideration any assumption about Hardy-Weinberg and linkage equilibrium, performs well under different simulated scenarios and with real data. Therefore, it could be a useful tool to determine genetically homogeneous groups, especially in those situations where the number of clusters is high, with complex population structure and where Hardy-Weinberg and/or linkage disequilibrium are present.
PMCID: PMC2776585  PMID: 19900278
3.  Handling linkage disequilibrium in linkage analysis using dense single-nucleotide polymorphisms 
BMC Proceedings  2007;1(Suppl 1):S161.
The presence of linkage disequilibrium violates the underlying assumption of linkage equilibrium in most traditional multipoint linkage approaches. Studies have shown that such violation leads to bias in qualitative trait linkage analysis when parental genotypes are unavailable. Appropriate handling of marker linkage disequilibrium can avoid such false positive evidence. Using the rheumatoid arthritis simulated data from Genetic Analysis Workshop 15, we examined and compared the following three approaches to handle linkage disequilibrium among dense markers in both qualitative and quantitative trait linkage analyses: a simple algorithm; SNPLINK, methods for marker selection; and MERLIN-LD, a method for modeling linkage disequilibrium by creating marker clusters. In analysis ignoring linkage disequilibrium between markers, we observed LOD score inflation only in the affected sib-pair linkage analysis without parental genotypes; no such inflation was present in the quantitative trait locus linkage analysis with severity as our phenotype with or without parental genotypes. Using methods to model or adjust for linkage disequilibrium, we found a substantial reduction of inflation of LOD score in affected sib-pair linkage analysis. Greater LOD score reduction was observed by decreasing the amount of tolerable linkage disequilibrium among markers selected or marker clusters using MERLIN-LD; the latter approach showed most reduction. SNPLINK performed better with selected markers based on the D' measure of linkage disequilibrium as opposed to the r² measure and outperformed the simple algorithm. Our findings reiterate the necessity of properly handling dense markers in linkage analysis, especially when parental genotypes are unavailable.
PMCID: PMC2367569  PMID: 18466507
4.  A novel similarity-measure for the analysis of genetic data in complex phenotypes 
BMC Bioinformatics  2009;10(Suppl 6):S24.
Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for effective processing of these data has not been equally as fast. In particular, Machine Learning literature is limited to relatively few papers which are focused on the development and application of data mining methods for the analysis of genetic variability. On the other hand, these papers apply to genetic data procedures which had been developed for a different kind of analysis and do not take into account the peculiarities of population genetics. The aim of our study was to define a new similarity measure, specifically conceived for measuring the similarity between the genetic profiles of two groups of subjects (i.e., cases and controls) taking into account that genetic profiles are usually distributed in a population group according to the Hardy Weinberg equilibrium.
We set up a new kernel function consisting of a similarity measure between groups of subjects genotyped for numerous genetic loci. This measure weighs different genetic profiles according to the estimates of gene frequencies at Hardy-Weinberg equilibrium in the population. We named this function the "Hardy-Weinberg kernel".
The effectiveness of the Hardy-Weinberg kernel was compared to the performance of the well established linear kernel. We found that the Hardy-Weinberg kernel significantly outperformed the linear kernel in a number of experiments where we used either simulated data or real data.
The "Hardy-Weinberg kernel" reported here represents one of the first attempts at incorporating genetic knowledge into the definition of a kernel function designed for the analysis of genetic data. We show that the best performance of the "Hardy-Weinberg kernel" is observed when rare genotypes have different frequencies in cases and controls. The ability to capture the effect of rare genotypes on phenotypic traits might be a very important and useful feature, as most of the current statistical tools lose most of their statistical power when rare genotypes are involved in the susceptibility to the trait under study.
PMCID: PMC2697648  PMID: 19534750
5.  Analysis of Case-Control Studies of Genetic and Environmental Factors With Missing Genetic Information and Haplotype-phase Ambiguity 
Genetic epidemiology  2005;29(2):108-127.
Case-control studies of unrelated subjects are now widely used to study the role of genetic susceptibility and gene-environment interactions in the etiology of complex diseases. Exploiting an assumption of gene-environment independence, and treating the distribution of the environmental exposures as completely nonparametric, Chatterjee and Carroll (2005) recently developed an efficient retrospective maximum-likelihood method for analysis of case-control studies. In this article, we develop an extension of the retrospective maximum-likelihood approach to studies where genetic information may be missing on some study subjects. In particular, special emphasis is given to haplotype-based studies where missing data arise due to linkage-phase ambiguity of genotype data. We use a profile likelihood technique and an appropriate EM algorithm to derive a relatively simple procedure for parameter estimation, with or without a rare disease assumption, and possibly incorporating information on the marginal probability of the disease for the underlying population. We also describe two alternative robust approaches that are less sensitive to the underlying gene-environment independence and Hardy-Weinberg-Equilibrium assumptions. The performance of the proposed methods is studied using simulation studies in the context of haplotype-based studies of gene-environment interaction. An application of the proposed method is illustrated using a case-control study of ovarian cancer designed to study the interaction between BRCA1/2 mutations and reproductive risk factors in the etiology of ovarian cancer.
PMCID: PMC2585318  PMID: 16080203
Case-control studies; Gene-environment interactions; EM-algorithm; Haplotype; Semiparametric methods
6.  Cubic exact solutions for the estimation of pairwise haplotype frequencies: implications for linkage disequilibrium analyses and a web tool 'CubeX' 
BMC Bioinformatics  2007;8:428.
The frequency of a haplotype comprising one allele at each of two loci can be expressed as a cubic equation (the 'Hill equation'), the solution of which gives that frequency. Most haplotype and linkage disequilibrium analysis programs use iteration-based algorithms which substitute an estimate of haplotype frequency into the equation, producing a new estimate which is repeatedly fed back into the equation until the values converge to a maximum likelihood estimate (expectation-maximisation).
We present a program, "CubeX", which calculates the biologically possible exact solution(s) and provides estimated haplotype frequencies, D', r² and χ² values for each. CubeX provides a "complete" analysis of haplotype frequencies and linkage disequilibrium for a pair of biallelic markers under situations where sampling variation and genotyping errors distort sample Hardy-Weinberg equilibrium, potentially causing more than one biologically possible solution. We also present an analysis of simulations and real data using the algebraically exact solution, which indicates that under perfect sample Hardy-Weinberg equilibrium there is only one biologically possible solution, but that under other conditions there may be more.
Our analyses demonstrate that lower allele frequencies, lower sample numbers, population stratification and a possible |D'| value of 1 are particularly susceptible to distortion of sample Hardy-Weinberg equilibrium, which has significant implications for calculation of linkage disequilibrium in small sample sizes (eg HapMap) and rarer alleles (eg paucimorphisms, q < 0.05) that may have particular disease relevance and require improved approaches for meaningful evaluation.
PMCID: PMC2180187  PMID: 17980034
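The iteration-based approach that this abstract contrasts with CubeX can be sketched as follows: a minimal expectation-maximisation loop for two biallelic loci, followed by D' and r² computed from the estimated haplotype frequencies. It is not CubeX's exact cubic solution. The function names, the 3×3 `counts` layout and the starting values are all assumptions made for the example.

```python
def em_haplotype_freqs(counts, tol=1e-10, max_iter=1000):
    """Estimate AB, Ab, aB, ab frequencies by EM (illustrative sketch).

    counts[i][j] = number of individuals carrying i copies of allele A
    at locus 1 and j copies of allele B at locus 2 (hypothetical layout).
    """
    n = sum(sum(row) for row in counts)
    # Haplotype counts from genotypes whose phase is unambiguous.
    base = {
        "AB": 2 * counts[2][2] + counts[2][1] + counts[1][2],
        "Ab": 2 * counts[2][0] + counts[2][1] + counts[1][0],
        "aB": 2 * counts[0][2] + counts[0][1] + counts[1][2],
        "ab": 2 * counts[0][0] + counts[0][1] + counts[1][0],
    }
    dh = counts[1][1]  # double heterozygotes: AB/ab or Ab/aB
    f = {h: 0.25 for h in base}  # assumed starting point
    for _ in range(max_iter):
        cis, trans = f["AB"] * f["ab"], f["Ab"] * f["aB"]
        # E-step: expected share of double heterozygotes that are AB/ab.
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from completed haplotype counts.
        new = {
            "AB": (base["AB"] + w * dh) / (2 * n),
            "ab": (base["ab"] + w * dh) / (2 * n),
            "Ab": (base["Ab"] + (1 - w) * dh) / (2 * n),
            "aB": (base["aB"] + (1 - w) * dh) / (2 * n),
        }
        converged = max(abs(new[h] - f[h]) for h in f) < tol
        f = new
        if converged:
            break
    return f


def ld_measures(f):
    """Return (D', r^2); assumes both loci are polymorphic."""
    p_a = f["AB"] + f["Ab"]
    p_b = f["AB"] + f["aB"]
    d = f["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

An EM loop of this kind converges to a single stationary point that depends on the starting values, whereas the abstract notes that the cubic can have more than one biologically possible root when the sample departs from Hardy-Weinberg equilibrium.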
7.  Genetic Analysis Workshop 15: simulation of a complex genetic model for rheumatoid arthritis in nuclear families including a dense SNP map with linkage disequilibrium between marker loci and trait loci 
BMC Proceedings  2007;1(Suppl 1):S4.
Data for Problem 3 of the Genetic Analysis Workshop 15 were generated by computer simulation in an attempt to mimic some of the genetic and epidemiological features of rheumatoid arthritis (RA) such as its population prevalence, sex ratio, risk to siblings of affected individuals, association with cigarette smoking, the strong effect of genotype in the HLA region and other genetic effects. A complex genetic model including epistasis and genotype-by-environment interaction was applied to a population of 1.9 million nuclear families of size four from which we selected 1500 families with both offspring affected and 2000 unrelated, unaffected individuals all of whose first-degree relatives were unaffected. This process was repeated to produce 100 replicate data sets. In addition, we generated marker data for 22 autosomes consisting of a genome-wide set of 730 simulated STRP markers, 9187 SNP markers and an additional 17,820 SNP markers on chromosome 6. Appropriate linkage disequilibrium between markers and between trait loci and markers was modelled using HapMap Phase 1 data. The code base for this project was written primarily in the Octave programming language, but it is being ported to the R language and developed into a larger project for general genetic simulation called GenetSim. All of the source code that was used to generate the GAW 15 Problem 3 data is freely available for download at .
PMCID: PMC2367506  PMID: 18466538
8.  Which strategy is better for linkage analysis: single-nucleotide polymorphisms or microsatellites? Evaluation by identity-by-state – identity-by-descent transformation affected sib-pair method on GAW14 data 
BMC Genetics  2005;6(Suppl 1):S16.
The central issue for Genetic Analysis Workshop 14 (GAW14) is the question of which is the better strategy for linkage analysis: the use of single-nucleotide polymorphisms (SNPs) or microsatellite markers. To answer this question we analyzed the simulated data using Duffy's SIB-PAIR program, which can incorporate parental genotypes, and our identity-by-state – identity-by-descent (IBS-IBD) transformation method of affected sib-pair linkage analysis, which uses the matrix transformation between IBS and IBD. The advantages of our method are as follows: the assumption of Hardy-Weinberg equilibrium is not necessary; the parental genotype information may all be unknown; both IBS and its related IBD transformation can be used in the linkage analysis; the determinant of the IBS-IBD transformation matrix provides a quantitative measure of the quality of the marker in linkage analysis. With the originally distributed simulated data, we found that 1) for microsatellite markers there are virtually no differences in type I and type II error rates when parental genotypes were or were not used; 2) on average, a microsatellite marker has more power than a SNP marker does in linkage detection; 3) if parental genotype information is used, SNP markers show lower type I error rates than microsatellite markers; and 4) if parental genotypes are not available, SNP markers show considerable variation in type I error rates for different methods.
PMCID: PMC1866774  PMID: 16451621
9.  Testing Haplotype-Environment Interactions Using Case-Parent Triads 
Human Heredity  2010;70(1):23-33.
Joint analysis of multiple SNP markers can be informative, but studying joint effects of haplotypes and environmental exposures is challenging. Population structure can involve both genes and exposures and a case-control study is susceptible to bias from either source of stratification. We propose a procedure that uses case-parent triad data and, though not fully robust, resists bias from population structure.
Our procedure assumes that haplotypes under study have no influence on propensity to exposure. Then, under a no-interaction null hypothesis (multiplicative scale), transmission of a causative haplotype from parents to affected offspring might show distortion from Mendelian proportions but should be independent of exposure. We used this insight to develop a permutation test of no haplotype-by-exposure interaction.
Simulations showed that our proposed test respects the nominal Type I error rate and provides good power under a variety of scenarios. We illustrate by examining whether SNP variants in GSTP1 modify the association between maternal smoking and oral clefting.
Our procedure offers desirable features: no need for haplotype estimation, validity under unspecified genetic main effects, tolerance to Hardy-Weinberg disequilibrium, ability to handle missing genotypes and a relatively large number of SNPs. Simulations suggest resistance to bias due to exposure-related population stratification.
PMCID: PMC2912643  PMID: 20413979
Haplotype-environment interaction; Gene-environment interaction; Case-parent triad; Permutation test; Non-parametric test; Population stratification
10.  Accommodating population stratification in case-control association analysis: a new test and its application to genome-wide study on rheumatoid arthritis 
BMC Proceedings  2009;3(Suppl 7):S111.
It is well known that conventional association tests can lead to excessive false positives when there is population stratification. We propose a new test for detecting genetic association with a case-control study design. Unlike some other methods for handling population stratification, we treat the cases as a population and the controls as another one even though each of them may be a mixture of several sub-populations. A likelihood-ratio test is used to test whether the allele frequency of a testing single-nucleotide polymorphism in the case population is the same as that in the control population. This new test is applied to the Genetic Analysis Workshop 16 Problem 1 data on rheumatoid arthritis. Compared with the Pearson chi-square genotype test, the association strength of many single-nucleotide polymorphisms is decreased while the signal at the HLA region on 6p21 is maintained.
PMCID: PMC2795883  PMID: 20017976
11.  Conditional genotype analysis: detecting secondary disease loci in linkage disequilibrium with a primary disease locus 
BMC Proceedings  2007;1(Suppl 1):S163.
A number of autoimmune and other diseases have well established HLA associations; in many cases there is strong evidence for the direct involvement of the HLA class II peptide-presenting antigens, e.g., HLA DR-DQ for type 1 diabetes (T1D) and HLA-DR for rheumatoid arthritis (RA). The involvement of additional HLA region genes in the disease process is implicated in these diseases. We have developed a model-free approach to detect these additional disease genes using genotype data; the conditional genotype method (CGM) and overall conditional genotype method (OCGM) use all patient and control data and do not require haplotype estimation. Genotypes at marker genes in the HLA region are stratified and their expected values are determined in a way that removes the effects of linkage disequilibrium (LD) with the peptide-presenting HLA genes directly involved in the disease. A statistic has been developed under the null hypothesis of no additional disease genes in the HLA region for the OCGM method and was applied to the Genetic Analysis Workshop 15 simulated data set of Problem 3, which mimics RA (answers were known). In addition to the primary effect of the HLA DR locus, the effects of the other two HLA region simulated genes involved in disease were detected (gene C, 0 cM from DR, increases RA risk only in women; and gene D, 5.12 cM from DR, rare allele increases RA risk five-fold). No false negatives were found. Power calculations were performed.
PMCID: PMC2367484  PMID: 18466509
12.  SYMPHONY, an information-theoretic method for gene–gene and gene–environment interaction analysis of disease syndromes 
Heredity  2013;110(6):548-559.
We develop an information-theoretic method for gene–gene (GGI) and gene–environmental interactions (GEI) analysis of syndromes, defined as a phenotype vector comprising multiple quantitative traits (QTs). The K-way interaction information (KWII), an information-theoretic metric, was derived for multivariate normal distributed phenotype vectors. The utility of the method was challenged with three simulated data sets, the Genetic Association Workshop-15 (GAW15) rheumatoid arthritis data set, a high-density lipoprotein (HDL) and atherosclerosis data set from a mouse QT locus study, and the 1000 Genomes data. The dependence of the KWII on effect size, minor allele frequency, linkage disequilibrium, population stratification/admixture, as well as the power and computational time requirements of the novel method was systematically assessed in simulation studies. In these studies, phenotype vectors containing two and three constituent multivariate normally distributed QTs were used and the KWII was found to be effective at detecting GEI associated with the phenotype. High KWII values were observed for variables and variable combinations associated with the syndrome phenotype compared with uninformative variables not associated with the phenotype. The KWII values for the phenotype-associated combinations increased monotonically with increasing effect size values. The KWII also exhibited utility in simulations with non-linear dependence between the constituent QTs. Analysis of the HDL and atherosclerosis data set indicated that the simultaneous analysis of both phenotypes identified interactions not detected in the analysis of the individual traits. The information-theoretic approach may be useful for non-parametric analysis of GGI and GEI of complex syndromes.
PMCID: PMC3656633  PMID: 23423149
gene–environment interactions; gene–gene interactions; K-way interaction information; syndromes; complex diseases
13.  Case-Control Genome-wide Joint Association Study Using Semiparametric Empirical Model and Approximate Bayes Factor 
We propose a semiparametric approach for the analysis of case-control genome-wide association studies. Parametric components are used to model both the conditional distribution of the case status given the covariates and the distribution of genotype counts, whereas the distribution of the covariates is modeled nonparametrically. This yields a direct and joint modeling of the case status, covariates and genotype counts, gives a better understanding of the disease mechanism and results in more reliable conclusions. Side information, such as the disease prevalence, can be conveniently incorporated into the model by an empirical likelihood approach, leading to more efficient estimates and more powerful tests in the detection of disease-associated SNPs. Profiling is used to eliminate a nuisance nonparametric component, and the resulting profile empirical likelihood estimates are shown to be consistent and asymptotically normal. For the hypothesis test on disease association, we apply the approximate Bayes factor (ABF), which is computationally simple and most desirable in genome-wide association studies where hundreds of thousands to a million genetic markers are tested. We treat the approximate Bayes factor as a hybrid Bayes factor which replaces the full data by the maximum likelihood estimates of the parameters of interest in the full model and derive it under a general setting. The deviation from Hardy-Weinberg Equilibrium (HWE) is also taken into account and the ABF for HWE using cases is shown to provide evidence of association between a disease and a genetic marker. Simulation studies and an application are further provided to illustrate the utility of the proposed methodology.
PMCID: PMC3921884  PMID: 24532860
Approximate Bayes factor; association study; empirical likelihood; genetic model; Hardy-Weinberg Equilibrium; profile likelihood; robustness; side information
14.  TNFAIP3 rs2230926 polymorphisms in rheumatoid arthritis of southern Chinese Han population: a case-control study 
Polymorphism of tumor necrosis factor alpha-induced protein 3 (TNFAIP3) has been related to various autoimmune diseases. Based on previous studies reporting that the single nucleotide polymorphism (SNP) rs2230926 is associated with rheumatoid arthritis (RA) in Japanese, Caucasian and northern Chinese Han populations, we tested the allele and genotype frequencies of rs2230926 in TNFAIP3 to investigate whether rs2230926 is associated with susceptibility to RA in the southern Chinese Han population. In our case-control association study, 207 RA patients fulfilling the American College of Rheumatology (ACR) 1987 criteria were compared with 199 unrelated healthy subjects. After testing the allele and genotype frequencies of rs2230926, the pairwise linkage disequilibrium (LD) was computed, and odds ratios (OR) with 95% confidence intervals (95% CI) were used to evaluate susceptibility to RA. The genotype distributions of rs2230926 in the cases and control subjects conformed to Hardy-Weinberg equilibrium (P = 0.02257). Statistically significant differences in the frequencies of the T and G alleles were found between cases and controls (P = 0.0027, OR 0.417, 95% CI 0.232-0.749); the genotypes of rs2230926 were associated with susceptibility to RA, with OR 0.375 (95% CI 0.198-0.707, P = 0.0018). In the present study, our results indicate that the rs2230926 polymorphism in TNFAIP3 may be a susceptibility factor for RA in the southern Chinese Han population.
PMCID: PMC4314011
Rheumatoid arthritis; TNFAIP3; single nucleotide polymorphism
15.  A comparative study of three methods for detecting association of quantitative traits in samples of related subjects 
BMC Proceedings  2009;3(Suppl 7):S122.
We used Genetic Analysis Workshop 16 Problem 3 Framingham Heart Study simulated data set to compare methods for association analysis of quantitative traits in related individuals. More specifically, we investigated type I error and relative power of three approaches: the measured genotype, the quantitative transmission-disequilibrium test (QTDT), and the quantitative trait linkage-disequilibrium (QTLD) tests. We studied high-density lipoprotein and triglyceride (TG) lipid variables, as measured at Visit 1. Knowing the answers, we selected three true major genes for high-density lipoprotein and/or TG. Empirical distributions of the three association models were derived from the first 100 replicates. In these data, all three models were similar in error rates. Across the three association models, the power was the lowest for the functional SNP with smallest size effects (i.e., α2), and for the less heritable trait (i.e., TG). Our results showed that measured genotype outperformed the two orthogonal-based association models (QTLD, QTDT), even after accounting for population stratification. QTDT had the lowest power rates. This is consistent with the amount of marker and trait data used by each association model. While the effective sample sizes varied little across our tested variants, we observed some large power drops and marked differences in performances of the models. We found that the performances contrasted the most for the tightly linked, but not associated, functional variants.
PMCID: PMC2795895  PMID: 20017988
16.  Fine-mapping using the weighted average method for a case-control study 
BMC Genetics  2005;6(Suppl 1):S67.
We present a new method for fine-mapping a disease susceptibility locus using a case-control design. The new method, termed the weighted average (WA) statistic, averages the Cochran-Armitage (CA) trend test statistic and the difference between the Hardy-Weinberg disequilibrium test statistic for cases and controls (the HWD trend). The main characteristics of the WA statistic are that it improves on the weaknesses, and maintains the strengths, of both the CA trend test and the HWD trend test. Data from three different populations in the Genetic Analysis Workshop 14 (GAW14) simulated dataset (Aipotu, Karangar, and Danacaa) were first subjected to model-free linkage analysis to find regions exhibiting linkage. Then, for fine-scale mapping, 140 SNPs within the significant linkage regions were analyzed with the WA test statistic on replicates of the three populations, both separately and combined. The regions that were significant in the multipoint linkage analysis were also significant in this fine-scale mapping. The most significant regions that were obtained using the WA statistic were regions in chromosome 3 (B03T3056–B03T3058, p-value < 1 × 10⁻¹⁰) and chromosome 9 (B09T8332–B09T8334, p-value 1 × 10⁻⁶). Based on the results of the simulated GAW14 data, the WA test statistic showed good performance and could narrow down the region containing the susceptibility locus. However, the strength of the signal depends on both the strength of the linkage disequilibrium and the heterozygosity of the linked marker.
PMCID: PMC1866715  PMID: 16451680
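One ingredient of the WA statistic described above, the Cochran-Armitage trend test, can be sketched in a few lines. The sketch uses the common large-sample form with additive genotype scores 0, 1, 2. It does not implement the HWD trend or the weighting that defines the WA statistic, which the abstract does not specify; the function name and argument layout are assumptions made for the example.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table (1 df).

    cases, controls: genotype counts ordered by number of risk alleles.
    Hypothetical helper; assumes both groups are non-empty and that the
    genotypes are not all in a single column.
    """
    r_total = sum(cases)
    s_total = sum(controls)
    n_total = r_total + s_total
    col = [r + s for r, s in zip(cases, controls)]  # column totals
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * n for x, n in zip(scores, col))
    sum_x2n = sum(x * x * n for x, n in zip(scores, col))
    # chi2 = N (N*sum(x r) - R*sum(x n))^2 / (R S (N*sum(x^2 n) - sum(x n)^2))
    numerator = n_total * (n_total * sum_xr - r_total * sum_xn) ** 2
    denominator = r_total * s_total * (n_total * sum_x2n - sum_xn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, cases (10, 20, 30) against controls (30, 20, 10) give a trend statistic of 20, while identical case and control distributions give 0.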
17.  Comparison of the power of haplotype-based versus single- and multilocus association methods for gene × environment (gene × sex) interactions and application to gene × smoking and gene × sex interactions in rheumatoid arthritis 
BMC Proceedings  2007;1(Suppl 1):S73.
Accounting for interactions with environmental factors in association studies may improve the power to detect genetic effects and may help identify important environmental effect modifiers. The power of unphased genotype- versus haplotype-based methods in regions with high linkage disequilibrium (LD), as measured by D', for analyzing gene × environment (gene × sex) interactions was compared using the Genetic Analysis Workshop 15 (GAW15) simulated data on rheumatoid arthritis with prior knowledge of the answers. Stepwise and regular conditional logistic regression (CLR) was performed using a matched case-control sample for an HLA region interacting with sex. Haplotype-based analyses were performed using a haplotype-sharing-based Mantel statistic and a test for haplotype-trait association in a general linear model framework. A step-down minP algorithm was applied to derive adjusted p-values and to allow for power comparisons. These methods were also applied to the GAW15 real data set for PTPN22.
For markers in strong LD, stepwise CLR performed poorly because of the correlation/collinearity between the predictors in the model. The power was high for detecting genetic main effects using simple CLR models and haplotype-based methods and for detecting joint effects using CLR and Mantel statistics. Only the haplotype-trait association test had high power to detect the gene × sex interaction.
In the PTPN22 region with markers characterized by strong LD, all methods indicated a significant genotype × sex interaction in a sample of about 1000 subjects. The previously reported R620W single-nucleotide polymorphism was identified using logistic regression, but the haplotype-based methods did not provide any precise location information.
PMCID: PMC2367597  PMID: 18466575
18.  Investigation of Maternal Effects, Maternal-Fetal Interactions and Parent-of-Origin Effects (Imprinting), Using Mothers and Their Offspring 
Genetic Epidemiology  2011;35(1):19-45.
Many complex genetic effects, including epigenetic effects, may be expected to operate via mechanisms in the intra-uterine environment. A popular design for the investigation of such effects, including effects of parent-of-origin (imprinting), maternal genotype, and maternal-fetal genotype interactions, is to collect DNA from affected offspring and their mothers (case/mother duos) and to compare with an appropriate control sample. An alternative design uses data from cases and both parents (case/parent trios) but does not require controls. In this study, we describe a novel implementation of a multinomial modeling approach that allows the estimation of such genetic effects using either case/mother duos or case/parent trios. We investigate the performance of our approach using computer simulations and explore the sample sizes and data structures required to provide high power for detection of effects and accurate estimation of the relative risks conferred. Through the incorporation of additional assumptions (such as Hardy-Weinberg equilibrium, random mating and known allele frequencies) and/or the incorporation of additional types of control sample (such as unrelated controls, controls and their mothers, or both parents of controls), we show that the (relative risk) parameters of interest are identifiable and well estimated. Nevertheless, parameter interpretation can be complex, as we illustrate by demonstrating the mathematical equivalence between various different parameterizations. Our approach scales up easily to allow the analysis of large-scale genome-wide association data, provided both mothers and affected offspring have been genotyped at all variants of interest. Genet. Epidemiol. 35:19–45, 2011. © 2010 Wiley-Liss, Inc.
PMCID: PMC3025173  PMID: 21181895
epigenetic; log-linear model; case/parent trio
19.  Testing allele homogeneity: the problem of nested hypotheses 
BMC Genetics  2012;13:103.
The evaluation of associations between genotypes and diseases in a case-control framework plays an important role in genetic epidemiology. This paper focuses on the evaluation of the homogeneity of both genotypic and allelic frequencies. The traditional test that is used to check allelic homogeneity is known to be valid only under Hardy-Weinberg equilibrium, a property that may not hold in practice.
We first describe the flaws of the traditional (chi-squared) tests for both allelic and genotypic homogeneity. Besides the known problem of the allelic procedure, we show that whenever these tests are used, an incoherence may arise: sometimes the genotypic homogeneity hypothesis is not rejected, but the allelic hypothesis is. As we argue, this outcome is logically incoherent, because genotypic homogeneity implies allelic homogeneity. Some recently proposed methods implicitly rely on the idea that this does not happen. In an attempt to correct this incoherence, we describe an alternative frequentist approach that is appropriate even when Hardy-Weinberg equilibrium does not hold. It is then shown that the problem remains and is intrinsic to frequentist procedures. Finally, we introduce the Full Bayesian Significance Test to test both hypotheses and prove that the incoherence cannot happen with these new tests. To illustrate this, all five tests are applied to real and simulated datasets. Using a power analysis, we show that the Bayesian method is comparable to the frequentist one and has the advantage of being coherent.
Contrary to more traditional approaches, the Full Bayesian Significance Test for association studies provides a simple, coherent and powerful tool for detecting associations.
PMCID: PMC3770452  PMID: 23176636
Allelic homogeneity test; Bayesian methods; Chi-squared test; Hardy-Weinberg equilibrium; FBST; Monotonicity
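The incoherence described in this abstract can be reproduced with a small worked example. The following is a minimal sketch, not the authors' procedure: it runs the traditional Pearson chi-squared tests for genotypic homogeneity (2 degrees of freedom) and allelic homogeneity (1 degree of freedom) on a hypothetical 2 × 3 case-control genotype table. The function names and the counts are assumptions made for illustration only.

```python
import math

def chi2_stat(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n  # expected count under homogeneity
            stat += (obs - exp) ** 2 / exp
    return stat

def p_value(stat, df):
    """Upper-tail chi-squared probability, using the closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")

def genotypic_test(cases, controls):
    """Compare genotype counts (AA, AB, BB) between cases and controls (2 df)."""
    stat = chi2_stat([cases, controls])
    return stat, p_value(stat, 2)

def allelic_test(cases, controls):
    """Compare allele counts (A, B) between cases and controls (1 df)."""
    def alleles(g):
        aa, ab, bb = g
        return [2 * aa + ab, 2 * bb + ab]
    stat = chi2_stat([alleles(cases), alleles(controls)])
    return stat, p_value(stat, 1)

# Hypothetical genotype counts (AA, AB, BB); not taken from any study.
cases = [30, 50, 20]
controls = [20, 50, 30]
g_stat, g_p = genotypic_test(cases, controls)
a_stat, a_p = allelic_test(cases, controls)
```

With these hypothetical counts both statistics equal 4, but the genotypic p-value is about 0.135 and the allelic p-value is about 0.046. At the 5% level the allelic hypothesis is rejected while the genotypic hypothesis is not, which is the pattern the abstract calls incoherent.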
20.  Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes 
Although prospective logistic regression is the standard method of analysis for case-control data, it has recently been noted that in genetic epidemiologic studies one can use the “retrospective” likelihood to gain major power by incorporating various population genetics model assumptions such as Hardy-Weinberg equilibrium (HWE), gene-gene independence and gene-environment independence. In this article, we review these modern methods and contrast them with the more classical approaches through two types of applications: (i) association tests for typed and untyped single nucleotide polymorphisms (SNPs) and (ii) estimation of haplotype effects and haplotype-environment interactions in the presence of haplotype-phase ambiguity. We provide novel insights into existing methods by constructing various score-tests and pseudo-likelihoods. In addition, we describe a novel two-stage method for analysis of untyped SNPs that can use any flexible external algorithm for genotype imputation followed by a powerful association test based on the retrospective likelihood. We illustrate applications of the methods using simulated and real data.
PMCID: PMC2883271  PMID: 20543902
Case-control studies; Empirical-Bayes; Genetic epidemiology; Haplotypes; Model averaging; Model robustness; Model selection; Retrospective studies; Shrinkage
21.  Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates 
PLoS ONE  2009;4(1):e4269.
Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely-used methods to infer population structure are model-based, Bayesian MCMC procedures that minimize Hardy-Weinberg and linkage disequilibrium within subpopulations. These methods are useful, but suffer from large computational requirements and a dependence on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data.
Methodology/Principal Findings
PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional “density landscape”, from which the number of subpopulations, and the membership within subpopulations, is determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g. 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g. 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population Fst values as low as 0.03 (G'st>0.2), whereas the limit of resolution of the Bayesian approach was Fst = 0.05 (G'st>0.35).
We draw a distinction between population structure inference for describing biodiversity as opposed to Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies.
PMCID: PMC2625398  PMID: 19172174
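The Fst thresholds quoted in this abstract are easier to interpret with a worked calculation. The sketch below computes Wright's Fst for a single biallelic locus from the allele frequencies in two subpopulations, assuming equal subpopulation sizes and known (not estimated) frequencies. It is an illustration of the quantity only, not the PCO-MC method or the estimator used in the paper, and the function name is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst for one biallelic locus in two equally sized subpopulations.

    p1, p2: frequency of the same allele in subpopulations 1 and 2.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        raise ValueError("locus is monomorphic; Fst is undefined")
    # Mean expected heterozygosity within subpopulations.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies of 0.4 and 0.6.
example_fst = fst_two_pops(0.4, 0.6)
```

Under these assumptions, allele frequencies of 0.4 and 0.6 give Fst = 0.04, which lies between the two resolution limits (0.03 and 0.05) reported in the abstract.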
22.  Genotyping Error Detection in Samples of Unrelated Individuals without Replicate Genotyping 
Human Heredity  2008;67(3):154-162.
Identifying genotyping errors is an important issue in genetic research, yet it has been relatively less studied in samples consisting of unrelated individuals. In this article, we consider several models of genotyping errors, which were originally proposed for pedigree data, for unrelated population samples with single nucleotide polymorphism (SNP) genotype data. The mathematical constraints are investigated for detecting genotyping errors without resampling replicates or genotyping relatives.
For the various proposed genotyping error models, we unveil the conditions under which the parameters are identifiable. These results are verified through applications to simulated and real SNP data.
We show that, with constraints, two particular models provide both identifiable error rate and allele frequencies of an SNP for unrelated population data. The simulation study shows that these two models present unbiased estimates for the allele frequencies. One of the models also gives an unbiased estimate for the genotyping error rate.
While the Hardy-Weinberg equilibrium test can be used to detect genotyping errors, a key advantage of these models is the explicit estimates of genotyping error rates and allele frequencies. This work may help researchers to estimate error rates and to use the estimates in their analysis to increase power and decrease bias, without the extra work of genotyping family members or replicates.
PMCID: PMC2782542  PMID: 19077433
Genotyping error; single nucleotide polymorphisms (SNPs); identifiability
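The abstract mentions the Hardy-Weinberg equilibrium test as a screen for genotyping errors. As a minimal sketch of that baseline, and not of the error models proposed in the paper, the code below runs the standard 1-degree-of-freedom chi-squared goodness-of-fit test on the genotype counts of one SNP. The function name and the example counts are hypothetical.

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit test for Hardy-Weinberg equilibrium at one SNP.

    Returns (statistic, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # estimated frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-squared variable with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts with a strong heterozygote deficit.
stat, p_val = hwe_chi2(40, 20, 40)
```

A small p-value flags a departure from HWE that may be caused by genotyping error, but, as the abstract notes, the test does not by itself estimate the error rate.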
24.  A Candidate Gene Approach Identifies the TRAF1/C5 Region as a Risk Factor for Rheumatoid Arthritis 
PLoS Medicine  2007;4(9):e278.
Rheumatoid arthritis (RA) is a chronic autoimmune disorder affecting ∼1% of the population. The disease results from the interplay between an individual's genetic background and unknown environmental triggers. Although human leukocyte antigens (HLAs) account for ∼30% of the heritable risk, the identities of non-HLA genes explaining the remainder of the genetic component are largely unknown. Based on functional data in mice, we hypothesized that the immune-related genes complement component 5 (C5) and/or TNF receptor-associated factor 1 (TRAF1), located on Chromosome 9q33–34, would represent relevant candidate genes for RA. We therefore aimed to investigate whether this locus would play a role in RA.
Methods and Findings
We performed a multitiered case-control study using 40 single-nucleotide polymorphisms (SNPs) from the TRAF1 and C5 (TRAF1/C5) region in a set of 290 RA patients and 254 unaffected participants (controls) of Dutch origin. Stepwise replication of significant SNPs was performed in three independent sample sets from the Netherlands (n cases/controls = 454/270), Sweden (n cases/controls = 1,500/1,000) and the US (n cases/controls = 475/475). We observed a significant association (p < 0.05) of SNPs located in a haplotype block that encompasses a 65 kb region including the 3′ end of C5 as well as TRAF1. A sliding window analysis revealed an association peak at an intergenic region located ∼10 kb from both C5 and TRAF1. This peak, defined by SNP14/rs10818488, was confirmed in a total of 2,719 RA patients and 1,999 controls (common odds ratio = 1.28, 95% confidence interval 1.17–1.39, combined p = 1.40 × 10^−8) with a population-attributable risk of 6.1%. The A (minor susceptibility) allele of this SNP also significantly correlates with increased disease progression as determined by radiographic damage over time in RA patients (p = 0.008).
Using a candidate-gene approach we have identified a novel genetic risk factor for RA. Our findings indicate that a polymorphism in the TRAF1/C5 region increases the susceptibility to and severity of RA, possibly by influencing the structure, function, and/or expression levels of TRAF1 and/or C5.
Using a candidate-gene approach, Rene Toes and colleagues identified a novel genetic risk factor for rheumatoid arthritis in the TRAF1/C5 region.
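The abstract reports an odds ratio with a 95% confidence interval and a population-attributable risk. The sketch below shows how such quantities are commonly computed from a single 2 × 2 table: the odds ratio with a Woolf (log-scale Wald) interval, and Levin's formula for attributable risk. This is an assumption about standard practice, not the authors' analysis: the abstract does not state which estimators were used, its common odds ratio pools four sample sets, and the function names and counts here are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2 x 2 table.

    a: exposed cases, b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    z = 1.96 gives an approximate 95% interval.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

def levin_par(exposure_freq, relative_risk):
    """Levin's population-attributable risk.

    exposure_freq: frequency of the risk factor in the population.
    relative_risk: relative risk (the odds ratio is often used as an
    approximation when the disease is rare).
    """
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical counts; not taken from the study.
or_hat, lower, upper = odds_ratio_ci(20, 10, 10, 20)
```

Because the interval is built on the log scale, it is asymmetric around the point estimate, as is the interval 1.17–1.39 around 1.28 reported in the abstract.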
Editors' Summary
Rheumatoid arthritis is a very common chronic illness that affects around 1% of people in developed countries. It is caused by an abnormal immune reaction to various tissues within the body; as well as affecting joints and causing an inflammatory arthritis, it can also affect many other organs of the body. Severe rheumatoid arthritis can be life-threatening, but even mild forms of the disease cause substantial illness and disability. Current treatments aim to give symptomatic relief with the use of simple analgesics, or anti-inflammatory drugs. In addition, most patients are also treated with what are known as disease-modifying agents, which aim to prevent joint damage. Rheumatoid arthritis is known to have a genetic component. For example, an association has been shown with the part of the genome that contains the human leukocyte antigens (HLAs), which are involved in the immune response. Information on other genes involved would be helpful both for understanding the underlying cause of the disease and possibly for the discovery of new treatments.
Why Was This Study Done?
Previous work in mice that have a disease similar to human rheumatoid arthritis has identified a number of possible candidate genes. One of these genes, complement component 5 (C5) is involved in the complement system—a primitive system within the body that is involved in the defense against foreign molecules. In humans the gene for C5 is located on Chromosome 9 close to another gene involved in the inflammatory response, TNF receptor-associated factor 1 (TRAF1). A preliminary study in humans of this region had shown some evidence, albeit weak, to suggest that this region might be associated with rheumatoid arthritis. The authors set out to look in more detail, and in a larger group of individuals, to see if they could prove this association.
What Did the Researchers Do and Find?
The researchers took 40 genetic markers, known as single-nucleotide polymorphisms (SNPs), from across the region that included the C5 and TRAF1 genes. SNPs have each been assigned a unique reference number that specifies a point in the human genome, and each is present in alternate forms so can be differentiated. They compared which of the alternate forms were present in 290 patients with rheumatoid arthritis and 254 unaffected participants of Dutch origin. They then repeated the study in three other groups of patients and controls of Dutch, Swedish, and US origin. They found a consistent association with rheumatoid arthritis of one region of 65 kilobases (a small distance in genetic terms) that included one end of the C5 gene as well as the TRAF1 gene. They could refine the area of interest to a piece marked by one particular SNP that lay between the genes. They went on to show that the genetic region in which these genes are located may be involved in the binding of a protein that modifies the transcription of genes, thus providing a possible explanation for the association. Furthermore, they showed that one of the alternate versions of the marker in this region was associated with more aggressive disease.
What Do These Findings Mean?
The finding of a genetic association is the first step in identifying a genetic component of a disease. The strength of this study is that a novel genetic susceptibility factor for RA has been identified and that the overall result is consistent in four different populations as well as being associated with disease severity. Further work will need to be done to confirm the association in other populations and then to identify the precise genetic change involved. Hopefully this work will lead to new avenues of investigation for therapy.
PMCID: PMC1976626  PMID: 17880261
25.  Univariate/Multivariate Genome-Wide Association Scans Using Data from Families and Unrelated Samples 
PLoS ONE  2009;4(8):e6502.
As genome-wide association studies (GWAS) are becoming more popular, two approaches, among others, could be considered in order to improve statistical power for identifying genes contributing subtle to moderate effects to human diseases. The first approach is to increase sample size, which could be achieved by combining both unrelated and familial subjects together. The second approach is to jointly analyze multiple correlated traits. In this study, by extending generalized estimating equations (GEEs), we propose a simple approach for performing univariate or multivariate association tests for the combined data of unrelated subjects and nuclear families. In particular, we correct for population stratification by integrating principal component analysis and transmission disequilibrium test strategies. The proposed method allows for multiple siblings as well as missing parental information. Simulation studies show that the proposed test has improved power compared to two popular methods, EIGENSTRAT and FBAT, by analyzing the combined data, while correcting for population stratification. In addition, joint analysis of bivariate traits has improved power over univariate analysis when pleiotropic effects are present. Application to the Genetic Analysis Workshop 16 (GAW16) data sets attests to the feasibility and applicability of the proposed method.
PMCID: PMC2715864  PMID: 19652719
