Protein N-glycosylation patterns are known to show vast genetic as well as physiological and pathological variation and represent a large pool of potential biomarkers. Large-scale studies are needed for the identification and validation of biomarkers, and the analytical techniques required have recently been developed. Such methods have up to now mainly been applied to complex mixtures of glycoproteins in biofluids (e.g. plasma). Here, we analyzed N-glycosylation profiles of alpha1-antitrypsin (AAT) and immunoglobulin A (IgA) enriched fractions by 96-well microtitration plate based high-throughput immuno-affinity capturing and N-glycan analysis using multiplexed capillary gel electrophoresis with laser-induced fluorescence detection (CGE-LIF). Human plasma samples were from the Leiden Longevity Study comprising 2415 participants of different chronological and biological ages. Glycosylation patterns of AAT enriched fractions were found to be associated with chronological (calendar) age and they differed between females and males. Moreover, several glycans in the AAT enriched fraction were associated with physiological parameters marking cardiovascular and metabolic diseases. Pronounced differences were found between males and females in the glycosylation profiles of IgA enriched fractions. Our results demonstrate that large-scale immuno-affinity capturing of proteins from human plasma using a bead-based method combined with high-throughput N-glycan analysis is a powerful tool for the discovery of glycosylation-based biomarker candidates.
In order to study family based association in the presence of linkage we extend a generalized linear mixed model proposed for genetic linkage analysis (Lebrec and van Houwelingen, 2007) by adding a genotypic effect to the mean. The corresponding score test is a weighted FBAT statistic, where the weight depends on the linkage effect and on other genetic and shared environmental effects. For testing of genetic association in the presence of gene covariate interaction, we propose a linear regression method where the family-specific score statistic is regressed on family-specific covariates. Both statistics are straightforward to compute. Simulation results show that adjusting the weight for the within-family variance structure may be a powerful approach in the presence of environmental effects. The test statistic for genetic association in the presence of gene-covariate interaction improved the power for detecting association. For illustration we analyze the Rheumatoid Arthritis data from GAW15. Adjusting for smoking and anti-CCP increased the significance of the association with the DR locus.
family-based studies; generalized linear mixed model; FBAT; Linkage; Linkage disequilibrium; Score test
Mechanisms underlying the variation in human life expectancy are largely unknown, but lipid metabolism, and especially lipoprotein size, has been suggested to play an important role in longevity. We have performed comprehensive lipid phenotyping in the Leiden Longevity Study (LLS). By applying multiple logistic regression analysis we tested for the first time the effects of parameters in lipid metabolism (i.e., classical serum lipids, lipoprotein particle sizes, and apolipoprotein E levels) on longevity independent of each other. Parameters in lipid metabolism were measured in offspring of nonagenarian siblings from 421 families of the LLS (n = 1,664; mean age, 59 years) and in the partners of the offspring as population controls (n = 711; mean age, 60 years). In the initial model, where lipoprotein particle sizes, classical serum lipids and apolipoprotein E were included, offspring had larger low-density lipoprotein (LDL) particle sizes (p = 0.017), and lower triglyceride levels (p = 0.026), indicating that they displayed a more beneficial lipid profile. After backwards regression only LDL size (p = 0.014) and triglyceride levels (p = 0.05) were associated with offspring from long-lived families. Sex-specific backwards regression analysis revealed that LDL particle sizes were associated with male longevity (increase in log odds ratio (OR) per unit = 0.21; p = 0.023). Triglyceride levels (decrease OR per unit = 0.22; p = 0.01), but not LDL particle size, were associated with female longevity. By analyzing a comprehensive lipid profile, we confirmed an important role of lipid metabolism in human longevity, with LDL size and triglyceride levels as major predicting factors.
Human longevity; Triglycerides; HDL cholesterol; LDL cholesterol; Lipoprotein particle size; Apolipoprotein E
Genotype imputation has become an essential tool in the analysis of genome-wide association scans. This technique allows investigators to test association at ungenotyped genetic markers, and to combine results across studies that rely on different genotyping platforms. In addition, imputation is used within long-running studies to reuse genotypes produced across generations of platforms. Typically, genotypes of controls are reused and cases are genotyped on more novel platforms yielding a case–control study that is not matched for genotyping platforms. In this study, we scrutinize such a situation and validate GWAS results by actually retyping top-ranking SNPs with the Sequenom MassArray platform. We discuss the needed quality controls (QCs). In doing so, we report a considerable discrepancy between the results from imputed and retyped data when applying recommended QCs from the literature. These discrepancies appear to be caused by extrapolating differences between arrays by the process of imputation. To avoid false positive results, we recommend that more stringent QCs should be applied. We also advocate reporting the imputation quality measure (R_T^2) for the post-imputation QCs in publications.
GWAS; imputation; quality control
The MUTYH gene is involved in base excision repair. MUTYH mutations predispose to recessively inherited colorectal polyposis and cancer. Here, we evaluate an association with breast cancer (BC), following up our previous finding of an elevated BC frequency among Dutch bi-allelic MUTYH mutation carriers. A case–control study was performed comparing 1,469 incident BC patients (ORIGO cohort), 471 individuals displaying features suggesting a genetic predisposition for BC, but without a detectable BRCA1 or BRCA2 mutation (BRCAx cohort), and 1,666 controls. First, for 303 consecutive patients diagnosed before age 55 years and/or with multiple primary breast tumors, the MUTYH coding region and flanking introns were sequenced. The remaining subjects were genotyped for five coding variants, p.Tyr179Cys, p.Arg309Cys, p.Gly396Asp, p.Pro405Leu, and p.Ser515Phe, and four tagging SNPs, c.37-2487G>T, p.Val22Met, c.504+35G>A, and p.Gln338His. No bi-allelic pathogenic MUTYH mutations were identified. The pathogenic variant p.Gly396Asp and the variant of uncertain significance p.Arg309Cys occurred twice as frequently in BRCAx subjects as compared to incident BC patients and controls (p = 0.13 and p = 0.15, respectively). The likely benign variant p.Val22Met occurred less frequently in patients from the incident BC (p = 0.03) and BRCAx groups (p = 0.11), respectively, as compared to the controls. Minor allele genotypes of several MUTYH variants showed trends towards association with lobular BC histology. This extensive case–control study could not confirm previously reported associations of MUTYH variants with BC, although it was too small to exclude subtle effects on BC susceptibility.
Electronic supplementary material
The online version of this article (doi:10.1007/s10549-012-1965-0) contains supplementary material, which is available to authorized users.
MUTYH; Breast cancer; BRCAx; Case–control study; Genotyping
Recently we proposed a novel two-step approach to test for pathway effects in disease progression. The goal of this approach is to study the joint effect of multiple single-nucleotide polymorphisms that belong to certain genes. By using random effects, our approach acknowledges the correlations within and between genes when testing for pathway effects. Gene-gene and gene-environment interactions can be included in the model. The method can be implemented with standard software, and the distribution of the test statistics under the null hypothesis can be approximated by using standard chi-square distributions. Hence no extensive permutations are needed for computations of the p-value. In this paper we adapt and apply the method to family data, and we study its performance for sequence data from Genetic Analysis Workshop 17. For the set of unrelated subjects, the performance of the new test was disappointing. We found a power of 6% for the binary outcome and of 18% for the quantitative trait Q1. For family data the new approach appears to perform well, especially for the quantitative outcome. We found a power of 39% for the binary outcome and a power of 89% for the quantitative trait Q1.
Analyzing sequencing data is difficult because of the low frequency of rare variants, which may result in low power to detect associations. We consider pathway analysis to detect multiple common and rare variants jointly and to investigate whether analysis at the pathway level provides an alternative strategy for identifying susceptibility genes. Available pathway analysis methods for data from genome-wide association studies might not be efficient because these methods are designed to detect common variants. Here, we investigate the performance of several existing pathway analysis methods for sequencing data. In particular, we consider the global test, which does not consider linkage disequilibrium between the variants in a gene. We improve the performance of the global test by assigning larger weights to rare variants, as proposed in the weighted-sum approach. Our conclusion is that straightforward application of pathway analysis is not satisfactory; hence, when common and rare variants are jointly analyzed, larger weights should be assigned to rare variants.
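The idea of giving rare variants larger weights can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight 1/sqrt(q(1 - q)) is an assumed, simplified form in the spirit of weighted-sum approaches, and the function names and example frequencies are hypothetical.

```python
import math
from typing import Sequence

def rare_variant_weight(maf: float) -> float:
    """Assumed weight: inverse binomial standard deviation, so rarer variants count more."""
    if not 0.0 < maf < 1.0:
        raise ValueError("minor allele frequency must lie strictly between 0 and 1")
    return 1.0 / math.sqrt(maf * (1.0 - maf))

def weighted_burden(genotype: Sequence[int], mafs: Sequence[float]) -> float:
    """Weighted count of minor alleles (0, 1, or 2 per variant) for one subject."""
    if len(genotype) != len(mafs):
        raise ValueError("genotype and mafs must have the same length")
    return sum(g * rare_variant_weight(q) for g, q in zip(genotype, mafs))

# Hypothetical subject carrying one minor allele at a common variant (MAF 0.5)
# and none at a rarer variant (MAF 0.1).
print(weighted_burden([1, 0], [0.5, 0.1]))
```

With this choice a variant at frequency 0.01 receives roughly five times the weight of a variant at frequency 0.5, which is the qualitative behavior the text calls for.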
In genome-wide association studies (GWAS) of complex traits, single SNP analysis is still the most applied approach. However, the identified SNPs have small effects and provide limited biological insight. A more appropriate approach to interpret GWAS data of complex traits is to analyze the combined effect of a SNP set grouped per pathway or gene region. We used this approach to study the joint effect on human longevity of genetic variation in two candidate pathways, the insulin/insulin-like growth factor (IGF-1) signaling (IIS) pathway and the telomere maintenance (TM) pathway. For the analyses, we used genotyped GWAS data of 403 unrelated nonagenarians from long-lived sibships collected in the Leiden Longevity Study and 1,670 younger population controls. We analyzed 1,021 SNPs in 68 IIS pathway genes and 88 SNPs in 13 TM pathway genes using four self-contained pathway tests (PLINK set-based test, Global test, GRASS and SNP ratio test). Although we observed small differences between the results of the different pathway tests, they showed consistent significant association of the IIS and TM pathway SNP sets with longevity. Analysis of gene SNP sets from these pathways indicates that the association of the IIS pathway is scattered over several genes (AKT1, AKT3, FOXO4, IGF2, INS, PIK3CA, SGK, SGK2, and YWHAG), while the association of the TM pathway seems to be mainly determined by one gene (POT1). In conclusion, this study shows that genetic variation in genes involved in the IIS and TM pathways is associated with human longevity.
Electronic supplementary material
The online version of this article (doi:10.1007/s11357-011-9340-3) contains supplementary material, which is available to authorized users.
Genetics; Aging; Longevity; Gene set analysis; Insulin/IGF-1 signaling; Telomere maintenance
Cytokines are major immune system regulators. Previously, innate cytokine profiles determined by lipopolysaccharide stimulation were shown to be highly heritable. To identify regulating genes in innate immunity, we analyzed data from a genome-wide linkage scan using microsatellites in osteoarthritis (OA) patients (The GARP study) and their innate cytokine data on interleukin (IL)-1β, IL-1Ra, IL-10 and tumor necrosis factor (TNF)α. A confirmation cohort consisted of the Leiden 85-Plus study. In this study, a linkage analysis was followed by manual selection of candidate genes in linkage regions showing LOD scores over 2.5. A single-nucleotide polymorphism (SNP) gene tagging method was applied to select SNPs on the basis of the highest level of gene tagging and possible functional effects. QTDT was used to identify the SNPs associated with innate cytokine production. Initial association signals were modeled by a linear mixed model. Through these analyses, we identified 10 putative genes involved in the regulation of TNFα. SNP rs6679497 in gene CD53 showed significant association with TNFα levels (P=0.001). No association of this SNP was observed with OA. A novel gene involved in the innate immune response of TNFα is identified. Genetic variation in this gene may have a role in diseases and disorders in which TNFα is closely involved.
linkage; osteoarthritis; immunity; TNF; GARP; CD53
Various cytokines and inflammatory mediators are known to be involved in the pathogenesis of rheumatoid arthritis (RA). We hypothesized that polymorphisms in selected inflammatory response and tissue repair genes contribute to the susceptibility to and severity of RA.
Polymorphisms in TNFA, IL1B, IL4, IL6, IL8, IL10, PAI1, NOS2a, C1INH, PARP, TLR2 and TLR4 were genotyped in 376 Caucasian RA patients and 463 healthy Caucasian controls using single base extension. Genotype distributions in patients were compared with those in controls. In addition, the association of polymorphisms with the need for anti-TNF-α treatment as a marker of RA severity was assessed.
The IL8 781 CC genotype was associated with early onset of disease. The TNFA -238 G/A polymorphism was differentially distributed between RA patients and controls, but only when not corrected for age and gender. None of the polymorphisms was associated with disease severity.
We here report an association between the IL8 781 C/T polymorphism and age of onset of RA. Our findings indicate that there might be a role for variations in genes involved in the immune response and in tissue repair in RA pathogenesis. Nevertheless, additional larger genomic and functional studies are required to further define their role in RA.
At least 20 type 2 diabetes loci have now been identified, and several of these are associated with altered β-cell function. In this study, we have investigated the combined effects of eight known β-cell loci on insulin secretion stimulated by three different secretagogues during hyperglycemic clamps.
RESEARCH DESIGN AND METHODS
A total of 447 subjects originating from four independent studies in the Netherlands and Germany (256 with normal glucose tolerance [NGT]/191 with impaired glucose tolerance [IGT]) underwent a hyperglycemic clamp. A subset had an extended clamp with additional glucagon-like peptide (GLP)-1 and arginine (n = 224). We next genotyped single nucleotide polymorphisms in TCF7L2, KCNJ11, CDKAL1, IGF2BP2, HHEX/IDE, CDKN2A/B, SLC30A8, and MTNR1B and calculated a risk allele score by risk allele counting.
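Risk allele counting, as described above, amounts to summing the number of risk alleles (0, 1, or 2) a subject carries across the genotyped loci. The sketch below assumes one SNP per locus and uses made-up genotype counts; the function name is hypothetical.

```python
from typing import Mapping

def risk_allele_score(genotypes: Mapping[str, int]) -> int:
    """Sum the risk allele counts (0, 1, or 2 per locus) for one subject."""
    for locus, count in genotypes.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{locus}: risk allele count must be 0, 1, or 2")
    return sum(genotypes.values())

# Hypothetical subject; the counts are invented for illustration only.
subject = {
    "TCF7L2": 1, "KCNJ11": 0, "CDKAL1": 2, "IGF2BP2": 1,
    "HHEX/IDE": 2, "CDKN2A/B": 2, "SLC30A8": 1, "MTNR1B": 0,
}
print(risk_allele_score(subject))  # 9 risk alleles out of a possible 16
```

An unweighted count treats every locus as contributing equally; weighting by per-locus effect size is a common alternative but is not what the text describes.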
The risk allele score was associated with lower first-phase glucose-stimulated insulin secretion (GSIS) (P = 7.1 × 10^-6). The effect size was equal in subjects with NGT and IGT. We also noted an inverse correlation with the disposition index (P = 1.6 × 10^-3). When we stratified the study population according to the number of risk alleles into three groups, those with a medium- or high-risk allele score had 9 and 23% lower first-phase GSIS, respectively. Second-phase GSIS, insulin sensitivity index, and GLP-1- or arginine-stimulated insulin release were not significantly different.
A combined risk allele score for eight known β-cell genes is associated with the rapid first-phase GSIS and the disposition index. The slower second-phase GSIS, GLP-1, and arginine-stimulated insulin secretion are not associated, suggesting that especially processes involved in rapid granule recruitment and exocytosis are affected in the majority of risk loci.
Recently, results from a meta-analysis of genome-wide association studies have yielded a number of novel type 2 diabetes loci. However, conflicting results have been published regarding their effects on insulin secretion and insulin sensitivity. In this study we used hyperglycemic clamps with three different stimuli to test associations between these novel loci and various measures of β-cell function.
RESEARCH DESIGN AND METHODS
For this study, 336 participants, 180 normal glucose tolerant and 156 impaired glucose tolerant, underwent a 2-h hyperglycemic clamp. In a subset we also assessed the response to glucagon-like peptide (GLP)-1 and arginine during an extended clamp (n = 123). All subjects were genotyped for gene variants in JAZF1, CDC123/CAMK1D, TSPAN8/LGR5, THADA, ADAMTS9, NOTCH2/ADAMS30, DCD, VEGFA, BCL11A, HNF1B, WFS1, and MTNR1B.
Gene variants in CDC123/CAMK1D, ADAMTS9, BCL11A, and MTNR1B affected various aspects of the insulin response to glucose (all P < 6.9 × 10^-3). The THADA gene variant was associated with lower β-cell response to GLP-1 and arginine (both P < 1.6 × 10^-3), suggesting lower β-cell mass as a possible pathogenic mechanism. Remarkably, we also noted a trend toward an increased insulin response to GLP-1 in carriers of MTNR1B (P = 0.03), which may offer new therapeutic possibilities. The other seven loci were not detectably associated with β-cell function.
Diabetes risk alleles in CDC123/CAMK1D, THADA, ADAMTS9, BCL11A, and MTNR1B are associated with various specific aspects of β-cell function. These findings point to a clear diversity in the impact that these various gene variants may have on (dys)function of pancreatic β-cells.
Fc gamma receptors (FcγRs) play a crucial role in immunity by linking IgG antibody-mediated responses with cellular effector and regulatory functions. Genetic variants in these receptors have been previously identified as risk factors for several chronic inflammatory conditions. The present study aimed to investigate the presence of copy number variations (CNVs) in the FCGR3B gene and its potential association with the autoimmune disease rheumatoid arthritis (RA).
CNV of the FCGR3B gene was studied using Multiplex Ligation Dependent Probe Amplification (MLPA) in 518 Dutch RA patients and 304 healthy controls. Surprisingly, three independent MLPA probes targeting the FCGR3B promoter measured different CNV frequencies, with probe#1 and #2 measuring 0 to 5 gene copies and probe#3 showing little evidence of CNV. Quantitative-PCR correlated with the copy number results from MLPA probe#2, which detected low copy number (1 copy) in 6.7% and high copy number (≥3 copies) in 9.4% of the control population. No significant difference was observed between RA patients and the healthy controls, neither in the low copy nor the high copy number groups (p-values = 0.36 and 0.71, respectively). Sequencing of the FCGR3B promoter region revealed an insertion/deletion (indel) that explained the disparate CNV results of MLPA probe#1. Finally, a non-significant trend was found between the novel -256A>TG indel and RA (40.7% in healthy controls versus 35.9% in RA patients; P = 0.08).
The current study highlights the complexity and poor characterization of the FCGR3B gene sequence, indicating that the design and interpretation of genotyping assays based on specific probe sequences must be performed with caution. Nonetheless, we confirmed the presence of CNV and identified novel polymorphisms in the FCGR3B gene in the Dutch population. Although no association was found between RA and FCGR3B CNV, the possible protective effect of the -256A>TG indel polymorphism must be addressed in larger studies.
Markers for longevity that reflect the health condition and predict healthy aging are extremely scarce. Such markers are, however, valuable in aging research. It has been shown previously that the N-glycosylation pattern of human immunoglobulin G (IgG) is age-dependent. Here we investigate whether N-linked glycans reflect early features of human longevity.
The Leiden Longevity Study (LLS) consists of nonagenarian sibling pairs, their offspring, and partners of the offspring serving as controls. IgG subclass specific glycosylation patterns were obtained from 1967 participants in the LLS by MALDI-TOF-MS analysis of tryptic IgG Fc glycopeptides. Several regression strategies were applied to evaluate the association of IgG glycosylation with age, sex, and longevity. The degree of galactosylation of IgG decreased with increasing age. For the galactosylated glycoforms the incidence of bisecting GlcNAc increased as a function of age. Sex-related differences were observed at ages below 60 years. Compared to males, younger females had higher galactosylation, which decreased more strongly with increasing age, resulting in similar galactosylation for both sexes from 60 onwards. In younger participants (<60 years of age), but not in the older age group (>60 years), decreased levels of non-galactosylated glycoforms containing a bisecting GlcNAc reflected early features of longevity.
We here describe IgG glycoforms associated with calendar age at all ages and the propensity for longevity before middle age. As modulation of IgG effector functions has been described for various IgG glycosylation features, a modulatory effect may be expected for the longevity marker described in this study.
Our aim is to develop methods for mapping genes related to age at onset in general pedigrees. We propose two score tests, one derived from a gamma frailty model with pairwise likelihood and one derived from a log-normal frailty model with approximated likelihood around the null random effect. The score statistics are weighted nonparametric linkage statistics, with weights depending on the age at onset. These tests are correct under the null hypothesis irrespective of the weight used. They are simple, robust, computationally fast, and can be applied to large, complex pedigrees. We apply these methods to simulated data and to the Genetic Analysis Workshop 16 Framingham Heart Study data set. We investigate the time to the first of three events: hard coronary heart disease, diabetes, or death from any cause. We use a two-step procedure. In the first step, we estimate the population parameters under the null hypothesis of no linkage. In the second step, we apply the score tests, using the population parameters estimated in the first step.
We describe an empirical Bayesian linear model for integration of functional gene annotation data with genome-wide association data. Using case-control study data from the North American Rheumatoid Arthritis Consortium and gene annotation data from the Gene Ontology, we illustrate how the method can be used to prioritize candidate genes for further investigation.
We investigated efficient case-control association analysis using family data. The outcome of interest was coronary heart disease. We employed existing and new methods that take into account the correlations among related individuals to obtain the proper type I error rates. The methods considered for autosomal single-nucleotide polymorphisms were: 1) generalized estimating equations-based methods, 2) variance-modified Cochran-Armitage (MCA) trend test incorporating kinship coefficients, and 3) genotypic modified quasi-likelihood score test. Additionally, for X-linked single-nucleotide polymorphisms we proposed a two-degrees-of-freedom test. Performance of these methods was tested using Framingham Heart Study 500 k array data.
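For orientation, the standard Cochran-Armitage trend test for unrelated individuals can be sketched as below. This is the unmodified textbook statistic, not the variance-modified (MCA) version with kinship coefficients that the text refers to; the additive scores (0, 1, 2) and the example counts are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

def cochran_armitage(cases: Sequence[int], controls: Sequence[int],
                     scores: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Return (chi-square statistic with 1 df, p-value) for a 2 x 3 genotype table."""
    totals = [r + s for r, s in zip(cases, controls)]
    n, n_cases = sum(totals), sum(cases)
    n_controls = n - n_cases
    # Numerator: N * sum(t_i * r_i) - R * sum(t_i * n_i)
    t = n * sum(w * r for w, r in zip(scores, cases)) \
        - n_cases * sum(w * m for w, m in zip(scores, totals))
    # Variance term assuming independent (unrelated) subjects
    spread = n * sum(w * w * m for w, m in zip(scores, totals)) \
        - sum(w * m for w, m in zip(scores, totals)) ** 2
    stat = n * t * t / (n_cases * n_controls * spread)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value

# Hypothetical genotype counts (0, 1, 2 copies of the tested allele).
stat, p = cochran_armitage(cases=[10, 20, 30], controls=[30, 20, 10])
print(round(stat, 3), p)
```

In family data the independence assumption behind the variance term fails, which is why the variance has to be modified to account for relatedness.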
In haplotype-based candidate gene studies a problem is that the genotype data are unphased, which results in haplotype ambiguity. The measure  quantifies haplotype predictability from genotype data. It is computed for each individual haplotype, and for a measure of global relative efficiency a minimum value is suggested. Alternatively, we developed methods directly based on the information content of haplotype frequency estimates to obtain global relative efficiency measures: and based on A- and D-optimality, respectively. All three methods are designed for single populations; they can be applied in cases only, controls only or the whole data. Therefore they are not necessarily optimal for haplotype testing in case-control studies.
A new global relative efficiency measure was derived to maximize power of a simple test statistic that compares haplotype frequencies in cases and controls. Application to real data showed that our proposed method gave a clear and summarizing measure for the case-control study conducted. Additionally, this measure might be used to select the individuals who have the highest potential for improving power by resolving phase ambiguity.
Instead of using a relative efficiency measure for cases only, controls only, or their combined data, we link the uncertainty measure directly to case-control studies. Hence, our global efficiency measure might be useful to assess whether data are informative or have enough power for estimation of a specific haplotype risk.
Elevated circulating levels of C-reactive protein (CRP), interleukin (IL)-6 and fibrinogen (FG) have been repeatedly associated with many adverse outcomes in patients with chronic obstructive pulmonary disease (COPD). To date, it remains unclear whether and to what extent systemic inflammation is primary or secondary in the pathogenesis of COPD.
The aim of this study was to examine the association between haplotypes of CRP, IL6 and FGB genes, systemic inflammation, COPD risk and COPD-related phenotypes (respiratory impairment, exercise capacity and body composition).
Eighteen SNPs in three genes, representing optimal haplotype-tagging sets, were genotyped in 355 COPD patients and 195 healthy smokers. Plasma levels of CRP, IL-6 and FG were measured in the total study group. Differences in haplotype distributions were tested using the global and haplotype-specific statistics.
Raised plasma levels of CRP, IL-6 and fibrinogen were demonstrated in COPD patients. However, the COPD population was very heterogeneous: about 40% of patients had no evidence of systemic inflammation (CRP < 3 mg/uL or no inflammatory markers in their top quartile). A global test for haplotype effect indicated association of the CRP gene with CRP plasma levels (P = 0.0004) and of the IL6 gene with COPD (P = 0.003). Subsequent analysis showed that IL6 haplotype H2, associated with an increased COPD risk (p = 0.004, OR = 4.82; 1.64 to 4.18), was also associated with very low CRP levels (p = 0.0005). None of the genes were associated with COPD-related phenotypes.
Our findings suggest that common genetic variation in the CRP and IL6 genes may contribute to the heterogeneity of the COPD population associated with systemic inflammation.
The statistical analysis of immunological data may be complicated because precise quantitative levels cannot always be determined. Values below a given detection limit may not be observed (nondetects), and data with nondetects are called left-censored. Since nondetects cannot be considered as missing at random, a statistician faced with data containing these nondetects must decide how to combine nondetects with detects. Until now, the common practice has been to impute each nondetect with a single value, such as half of the detection limit, and to conduct ordinary regression analysis. The first aim of this paper is to give an overview of methods for analyzing censored data, and to provide new methods for handling such data, other than (ordinary) linear regression. The second aim is to compare these methods by simulation studies based on real data.
We compared six new and existing methods: deletion of nondetects, single substitution, extrapolation by regression on order statistics, multiple imputation using maximum likelihood estimation, tobit regression, and logistic regression. The deletion and extrapolation by regression on order statistics methods gave biased parameter estimates. The single substitution method underestimated variances, and logistic regression suffered loss of power. Based on simulation studies, we found that tobit regression performed well when the proportion of nondetects was less than 30%, and that taken together the multiple imputation method performed best.
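Two of the approaches compared above can be contrasted in a few lines. The sketch below shows single substitution and the log-likelihood of a left-censored normal model, which is the intercept-only special case of tobit regression. It is an illustration under stated assumptions, not the authors' code: nondetects are encoded as None, the outcome is assumed normal, and all names and values are hypothetical.

```python
import math
from typing import List, Optional, Sequence

def substitute_half_dl(values: Sequence[Optional[float]], dl: float) -> List[float]:
    """Single substitution: replace each nondetect (None) with half the detection limit."""
    return [dl / 2.0 if v is None else v for v in values]

def censored_normal_loglik(values: Sequence[Optional[float]], dl: float,
                           mu: float, sigma: float) -> float:
    """Log-likelihood of N(mu, sigma^2) data that are left-censored at dl."""
    loglik = 0.0
    for v in values:
        if v is None:
            # A nondetect contributes P(Y < dl) = Phi((dl - mu) / sigma).
            z = (dl - mu) / sigma
            loglik += math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))
        else:
            # A detect contributes the usual normal density.
            z = (v - mu) / sigma
            loglik += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return loglik

# Hypothetical measurements with a detection limit of 1.0; None marks a nondetect.
data = [None, 1.4, 2.1, None, 3.0]
print(substitute_half_dl(data, dl=1.0))
print(censored_normal_loglik(data, dl=1.0, mu=1.5, sigma=1.0))
```

Substitution treats every nondetect as the same known value, which is one way to see why it underestimates variances, whereas the censored likelihood only uses the information that the value lies below the limit.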
Based on simulation studies, the newly developed multiple imputation method performed consistently well under different scenarios with varying proportions of nondetects and sample sizes, and even in the presence of heteroscedastic errors.
Despite the current trend towards large epidemiological studies of unrelated individuals, linkage studies in families are still widely used as tools for disease gene mapping. The use of single-nucleotide polymorphism (SNP) array technology in genotyping of family data has the potential to provide more informative linkage data. Nevertheless, SNP array data are not immune to genotyping error which, as has been suggested in the past, could dramatically affect the evidence for linkage, especially in selective designs such as affected sib pair (ASP) designs. The influence of genotyping error on selective designs for continuous traits has not been assessed yet.
We use the identity-by-descent (IBD) regression-based paradigm for linkage testing to analytically quantify the effect of simple genotyping error models under specific selection schemes for sibling pairs. We show, for example, that in extremely concordant (EC) designs, genotyping error leads to decreased power whereas it leads to increased type I error in extremely discordant (ED) designs. Perhaps surprisingly, the effect of genotyping error on inference is most severe in designs where selection is least extreme. We suggest a genomic control for genotyping errors via a simple modification of the intercept in the regression for linkage.
This study extends earlier findings: genotyping error can substantially affect type I error and power in selective designs for continuous traits. Designs involving both EC and ED sib pairs are fairly immune to genotyping error. When those designs are not feasible the simple genomic control strategy that we suggest offers the potential to deliver more robust inference, especially if genotyping is carried out by SNP array technology.
To detect association of the DR1 allele with rheumatoid arthritis (RA) given linkage in the affected sibling pairs of the replicates of Problem 3 of Genetic Analysis Workshop 15 (GAW15), we propose a new score statistic that takes into account the linkage information. We knew the answers. Linkage studies are often followed by case-control association studies of candidate genes located under the peak to identify the causes of a linkage peak. One strategy is to type the affected sibling pairs from the original linkage study and a set of unrelated controls for single-nucleotide polymorphisms describing the genetic variation of these genes. For this affected sibling pair-control design, we propose a relative-risk model for the relationship between the disease outcomes of sibling pairs and their genotypes and identity-by-descent status at the locus of interest. From this model, we derive a score statistic to analyze genetic association given linkage. We compare the performance of the new statistic to the method of Li et al. and to a standard association analysis that neglects the information on the identity-by-descent status of the sibling pair. We conclude that for the GAW15 data the new method performs well and that methods that use the linkage information may be more efficient than standard comparisons of genotypes in cases and controls.
Our aim is to develop methods for identifying a (causal) variant or variants from a dense panel of single-nucleotide polymorphisms (SNPs) that are genotyped on the evidence of previous studies. Because a large number of SNPs are in close proximity to each other, the magnitude of linkage disequilibrium (LD) plays an important role. Namely, highly correlated SNPs may hamper standard methods such as multivariate logistic regression due to multicollinearity between the covariates. Sequences of models with high dimension naturally raise questions about model selection strategies. We investigate three variable selection methods based on logistic regression. The penalties on stepwise selection were imposed using Akaike's Information Criterion (AIC), and using the lasso penalty. Finally, a Bayesian variable-selection logistic regression model was implemented. The methods are illustrated using the simulated dense SNPs including the causal DR/C locus on chromosome 6. We also evaluate model selection in terms of average prediction error across nine replicates. We conclude that for the Genetic Analysis Workshop 15 (GAW15) data, the newly developed Bayesian selection method performs well.
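As a reminder of how the AIC penalty trades fit against model size, the criterion is AIC = 2k - 2 log L, where k is the number of estimated parameters and L is the maximized likelihood; lower values are preferred. The sketch below compares candidate models by this criterion. The model names and log-likelihood values are hypothetical and are not taken from the GAW15 analysis.

```python
from typing import Mapping, Tuple

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: 2k - 2 log L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def best_by_aic(models: Mapping[str, Tuple[float, int]]) -> str:
    """Return the name of the model with the smallest AIC.

    models maps a model name to (maximized log-likelihood, number of parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))

# Hypothetical logistic regression fits: adding a second, highly correlated SNP
# improves the log-likelihood by only 0.5, which does not offset the penalty.
candidates = {
    "one_snp": (-100.0, 2),
    "two_snps": (-99.5, 3),
}
print(best_by_aic(candidates))  # one_snp
```

In a stepwise procedure this comparison is repeated after each candidate addition or removal of a SNP, stopping when no step lowers the AIC.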
In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known.
By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study.
We conclude that the statistic follows a chi-square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.
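The Pearson chi-square statistic used as a comparator above can be computed from a 2 x k table of haplotype counts in cases and controls, giving a test with k - 1 degrees of freedom. The following is a minimal sketch of that standard statistic only, not of the proposed one-degree-of-freedom score test; the haplotype counts are invented for illustration.

```python
from typing import Sequence, Tuple

def pearson_chi_square(cases: Sequence[int], controls: Sequence[int]) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) for a 2 x k table of haplotype counts."""
    if len(cases) != len(controls):
        raise ValueError("cases and controls must list the same haplotypes")
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    stat, used_columns = 0.0, 0
    for a, b in zip(cases, controls):
        column_total = a + b
        if column_total == 0:
            continue  # a haplotype that was never observed carries no information
        used_columns += 1
        for observed, row_total in ((a, n_cases), (b, n_controls)):
            expected = row_total * column_total / n
            stat += (observed - expected) ** 2 / expected
    return stat, used_columns - 1

# Hypothetical counts for two haplotypes in 100 case and 100 control chromosomes.
stat, df = pearson_chi_square(cases=[30, 70], controls=[50, 50])
print(round(stat, 4), df)
```

Because the degrees of freedom grow with the number of haplotypes, this test spreads its power over all of them, which is consistent with the text's finding that a one-degree-of-freedom statistic does better when a single haplotype is associated and the number of variants is large.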