To examine the associations of stressful experiences and social support with cognitive function in a sample of middle-aged adults with a family history of Alzheimer’s disease (AD).
Using data from the Wisconsin Registry for Alzheimer’s Prevention (WRAP; N=623), we evaluated relationships between stressful events experienced in the past year, as well as social support, and cognitive performance in four domains: speed and flexibility, immediate memory, verbal learning and memory, and working memory. We assessed interactions between psychosocial predictors, and with APOE ε4 status.
Greater number of stressful events was associated with poorer performance on tests of speed and flexibility. Greater social support was associated with better performance in the same domain; this relationship was diminished by presence of the ε4 allele. No associations were seen in the remaining three domains.
Psychosocial factors may influence cognition in at-risk individuals; influence varies by cognitive domain and ε4 status.
Cognitive function; geriatrics; social factors; stressful events; gene-environment interaction
The strongest genetic factor for late-onset Alzheimer’s disease (AD) is APOE; nine additional susceptibility genes have recently been identified. The effect of these genes is often assumed to be additive and polygenic scores are formed as a summary measure of risk. However, interactions between these genes are likely to be important. We sought to examine the role of interactions between the nine recently identified AD susceptibility genes and APOE in cognitive function and decline in 1,153 participants from the Wisconsin Registry for Alzheimer’s Prevention, a longitudinal study of middle-aged adults enriched for a parental history of AD. Participants underwent extensive cognitive testing at baseline and up to two additional visits approximately 4 and 6 years later. The influence of the interaction between APOE and each of 14 single nucleotide polymorphisms (SNPs) in the nine recently identified genes on three cognitive factor scores (Verbal Learning and Memory, Working Memory, and Immediate Memory) was examined using linear mixed models adjusting for age, gender and ancestry. Interactions between the APOE ε4 allele and both of the genotyped ABCA7 SNPs, rs3764650 and rs3752246, were associated with all three cognitive factor scores (P-values ≤0.01). Both of these genes are in the cholesterol metabolism pathway leading to AD. This research supports the importance of considering non-additive effects of AD susceptibility genes.
gene-gene interaction; memory; cognition; Alzheimer’s disease; cholesterol
Both shorter and longer telomeres in peripheral blood leukocyte (PBL)
DNA have been associated with cancer risk. However, associations remain
inconsistent across studies of the same cancer type. This study compares DNA
preparation methods to determine telomere length from colorectal cancer
We examined PBL relative telomere length (RTL) measured by
quantitative PCR (qPCR) in 1,033 colorectal cancer patients and 2,952
healthy controls. DNA was extracted with Phenol/Chloroform, PureGene or
QIAamp.
We observed differences in RTL depending on DNA extraction method
(p<0.001). Phenol/Chloroform extracted DNA had a mean RTL (T/S
ratio) of 0.78 (range 0.01-6.54) compared to PureGene extracted DNA (mean
RTL of 0.75; range 0.00-12.33). DNA extracted by QIAamp yielded a mean RTL
of 0.38 (range 0.02-3.69). We subsequently compared RTL measured by qPCR
from an independent set of 20 colorectal cancer cases and 24 normal controls
in PBL DNA extracted by each of the three extraction methods. The range of
RTL measured by qPCR from QIAamp-extracted DNA (0.17-0.58) was smaller than
from either PureGene or Phenol/Chloroform (ranges: 0.04-2.67 and 0.32-2.81,
respectively).
RTL measured by qPCR from QIAamp-extracted DNA was smaller than from
either PureGene or Phenol/Chloroform (p<0.001).
Differences in DNA extraction method may contribute to the
discrepancies between studies seeking to find an association between the
risk of cancer or other diseases and RTL.
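The T/S ratio reported above is conventionally derived from qPCR cycle thresholds (Ct) for a telomere amplicon (T) and a single-copy gene (S), normalized to a reference DNA. The sketch below illustrates that arithmetic under two assumptions not stated in the abstract: the comparative-Ct (2^-ΔΔCt) formulation and 100% amplification efficiency. The function name and all Ct values are hypothetical.

```python
def relative_telomere_length(ct_telo_sample: float, ct_scg_sample: float,
                             ct_telo_ref: float, ct_scg_ref: float) -> float:
    """Return a T/S ratio relative to a reference DNA.

    Assumes the comparative-Ct method with perfect doubling per cycle;
    this is an illustration, not the study's documented pipeline.
    """
    # Delta Ct: telomere signal relative to the single-copy gene.
    delta_sample = ct_telo_sample - ct_scg_sample
    delta_ref = ct_telo_ref - ct_scg_ref
    # A larger delta Ct means less telomere template, hence a smaller ratio.
    return 2.0 ** -(delta_sample - delta_ref)


# Hypothetical Ct values for one sample and the reference DNA.
rtl = relative_telomere_length(15.0, 20.0, 14.0, 20.0)
print(rtl)
```

Because the ratio depends on how much amplifiable telomere template each preparation yields, a systematic shift in Ct between extraction methods would shift RTL in the way the abstract describes.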
Telomere length; extraction methods; colorectal cancer
Recent advances in sequencing technology have presented both opportunities and challenges, with limited statistical power to detect a single causal rare variant with practical sample sizes. To overcome this, the contributors to Group 1 of Genetic Analysis Workshop 17 sought to develop methods to detect the combined signal of multiple causal rare variants in a biologically meaningful way. The contributors used genes, genome location proximity, or genetic pathways as the basic unit in combining the information from multiple variants. Weaknesses of the exome sequence data and the relative strengths and weaknesses of the five approaches are discussed.
Bayesian; pathways; simulated
Vitamin D deficiency is associated with many adverse health outcomes. There are several well established environmental predictors of vitamin D concentrations, yet studies of the genetic determinants of vitamin D concentrations are in their infancy. Our objective was to conduct a pilot genome-wide association (GWA) study of 25-hydroxyvitamin D (25[OH]D) and 1,25-dihydroxyvitamin D (1,25[OH]2D) concentrations in a subset of 229 Hispanic subjects, followed by replication genotyping of 50 single nucleotide polymorphisms (SNPs) in the entire sample of 1,190 Hispanics from San Antonio, Texas and San Luis Valley, Colorado. Of the 309,200 SNPs that met all quality control criteria, three SNPs in high linkage disequilibrium (LD) with each other were significantly associated with 1,25[OH]2D (rs6680429, rs9970802, and rs10889028) at a Bonferroni corrected P-value threshold of 1.62 × 10−7; however, none met the threshold for 25[OH]D. Of the 50 SNPs selected for replication genotyping, five for 25[OH]D (rs2806508, rs10141935, rs4778359, rs1507023, and rs9937918) and eight for 1,25[OH]2D (rs6680429, rs1348864, rs4559029, rs12667374, rs7781309, rs10505337, rs2486443, and rs2154175) were replicated in the entire sample of Hispanics (P < 0.01). In conclusion, we identified several SNPs that were associated with vitamin D metabolite concentrations in Hispanics. These candidate polymorphisms merit further investigation in independent populations and other ethnicities.
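The Bonferroni threshold quoted above is consistent with dividing a family-wise error rate by the number of SNPs tested. The sketch below reproduces that arithmetic; the family-wise alpha of 0.05 is an assumption (the abstract does not state it), and the function name is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: family-wise alpha of 0.05; 309,200 SNPs passed quality control.
threshold = bonferroni_threshold(0.05, 309_200)
print(f"{threshold:.2e}")  # prints 1.62e-07
```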
Vitamin D; 25-hydroxyvitamin D; 1,25-dihydroxyvitamin D; genome-wide association study; Hispanic
Despite the importance of gene-environment (G×E) interactions in the etiology of common diseases, little work has been done to develop methods for detecting these types of interactions in genome-wide association study data. This was the focus of Genetic Analysis Workshop 16 Group 10 contributions, which introduced a variety of new methods for the detection of G×E interactions in both case-control and family-based data using both cross-sectional and longitudinal study designs. Many of these contributions detected significant G×E interactions. Although these interactions have not yet been confirmed, the results suggest the importance of testing for interactions. Issues of sample size, quantifying the environmental exposure, longitudinal data analysis, family-based analysis, selection of the most powerful analysis method, population stratification, and computational expense with respect to testing G×E interactions are discussed.
GAW; case-control; family-based; cross-sectional; longitudinal; rheumatoid arthritis; Framingham Heart Study
Several observational studies have recently suggested an inverse association of circulating levels of vitamin D with blood pressure. These findings have been based mainly on Caucasian populations; whether this association also exists among Hispanic and African Americans has yet to be definitively determined. This study investigates the association of 25-hydroxyvitamin D (25[OH]D) with blood pressure in Hispanic and African Americans.
The data source for this study is the Insulin Resistance Atherosclerosis Family Study (IRASFS), which consists of Hispanic- and African-American families from three U.S. recruitment centers (n=1334). A variance components model was used to analyze the association of plasma 25[OH]D levels with blood pressure.
An inverse association was found between 25[OH]D and both systolic (β for 10 ng/mL difference= −2.05; p<0.01) and diastolic (β for 10 ng/mL difference= −1.35; p<0.001) blood pressure in all populations combined, after adjusting for age, sex, ethnicity and season of blood draw. Further adjustment for body mass index (BMI) weakened this association (β for 10 ng/mL difference= −0.94; p=0.14 and β for 10 ng/mL difference = −0.64; p=0.09, respectively).
25[OH]D levels are significantly inversely associated with blood pressure in Hispanic and African Americans from the IRASFS. However, this association was not significant after adjustment for BMI. Further research is needed to determine the role of BMI in this association. Large, well-designed prospective studies of the effect of vitamin D supplementation on blood pressure may be warranted.
Vitamin D; 25-hydroxyvitamin D; blood pressure; hypertension; race; ethnic groups; Hispanic; African American
We evaluated whether 13 single nucleotide polymorphisms (SNPs) identified in genome-wide association studies interact with one another and with reproductive and menstrual risk factors in association with breast cancer risk. DNA samples and information on parity, breastfeeding, age at menarche, age at first birth, and age at menopause were collected through structured interviews from 1484 breast cancer cases and 1307 controls who participated in a population-based case-control study conducted in three U.S. states. A polygenic score was created as the sum of risk allele copies multiplied by the corresponding log odds estimate. Logistic regression was used to test associations between SNPs, the score, reproductive and menstrual factors and breast cancer risk. Nonlinearity of the score was assessed by the inclusion of a quadratic term for polygenic score. Interactions between the aforementioned variables were tested by including a cross-product term in models. We confirmed associations between rs13387042 (2q35), rs4973768 (SLC4A7), rs10941679 (5p12), rs2981582 (FGFR2), rs3817198 (LSP1), rs3803662 (TOX3) and rs6504950 (STXBP4) with breast cancer. Women in the score’s highest quintile had a 2.2-fold increased risk when compared to women in the lowest quintile (95% confidence interval: 1.67–2.88). The quadratic polygenic score term was not significant in the model (p=0.85), suggesting established breast cancer loci are not associated with increased risk more than the sum of risk alleles. Modifications of menstrual and reproductive risk factor associations with breast cancer risk by polygenic score were not observed. Our results suggest interactions between breast cancer susceptibility loci and reproductive factors are not strong contributors to breast cancer risk.
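The polygenic score described above (risk allele copies multiplied by the corresponding log odds estimate, then summed) can be sketched as follows. The allele counts and odds ratios are hypothetical placeholders, not estimates from the study, and the function name is illustrative.

```python
import math
from typing import Sequence


def polygenic_score(risk_allele_counts: Sequence[int],
                    odds_ratios: Sequence[float]) -> float:
    """Sum of risk-allele copies (0, 1, or 2) weighted by log odds ratios."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio is required per SNP")
    score = 0.0
    for count, odds_ratio in zip(risk_allele_counts, odds_ratios):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1, or 2")
        score += count * math.log(odds_ratio)
    return score


# Hypothetical genotype at three SNPs with made-up per-allele odds ratios.
counts = [2, 1, 0]
ors = [1.20, 1.10, 1.30]
print(polygenic_score(counts, ors))
```

Testing nonlinearity, as the abstract describes, would then amount to entering both this score and its square as covariates in the logistic model.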
Epidemiology; reproductive and menstrual factors; breast cancer; breast cancer susceptibility loci
We tested variants in genes related to lutein and zeaxanthin status for association with age-related macular degeneration (AMD) in the Carotenoids in Age-Related Eye Disease Study (CAREDS).
Of 2005 CAREDS participants, 1663 were graded for AMD from fundus photography and genotyped for 424 single nucleotide polymorphisms (SNPs) from 24 candidate genes for carotenoid status. Of 337 AMD cases 91% had early or intermediate AMD. The SNPs were tested individually for association with AMD using logistic regression. A carotenoid-related genetic risk model was built using backward selection and compared to existing AMD risk factors using the area under the receiver operating characteristic curve (AUC).
A total of 24 variants from five genes (BCMO1, BCO2, NPC1L1, ABCG8, and FADS2) not previously related to AMD and four genes related to AMD in previous studies (SCARB1, ABCA1, APOE, and ALDH3A2) were associated independently with AMD, after adjusting for age and ancestry. Variants in all genes (not always the identical SNPs) were associated with lutein and zeaxanthin in serum and/or macula, in this or other samples, except for BCO2 and FADS2. A genetic risk score including nine variants significantly (P = 0.002) discriminated between AMD cases and controls beyond age, smoking, CFH Y402H, and ARMS2 A69S. The odds ratio (95% confidence interval) for AMD among women in the highest versus lowest quintile for the risk score was 3.1 (2.0–4.9).
Variants in genes related to lutein and zeaxanthin status were associated with AMD in CAREDS, adding to the body of evidence supporting a protective role of lutein and zeaxanthin in risk of AMD.
In this study of over 1600 postmenopausal women of the CAREDS, we describe the first evidence that variation in multiple genes related to carotenoid status in the blood and macula is associated with age-related macular degeneration (AMD).
macular degeneration; carotenoids; genes
Although markers identified by genome-wide association studies have individually
strong statistical significance, their performance in prediction remains limited. Our
goal was to use animal breeding genomic prediction models to predict additive genetic
contributions for systolic blood pressure (SBP) using whole genome sequencing data
with different validation designs.
The additive genetic contributions of SBP were estimated via linear mixed model. Rare
variants (MAF<0.05) were collapsed through the k-means method to create
"collapsed single-nucleotide polymorphisms." Prediction of the additive genomic
contributions of SBP was conducted using genomic Best Linear Unbiased Predictor
(GBLUP) and BayesCπ. Estimates of predictive accuracy were compared
using common single-nucleotide polymorphisms (SNPs) versus common and collapsed SNPs,
and for prediction within and across families.
The additive genetic variance of SBP accounted for 18% of the phenotypic variance
(h² = 0.18). BayesCπ had slightly better
prediction accuracies than GBLUP. In both models, within-family predictions had
higher accuracies in both the training and testing sets than did across-family predictions.
Collapsing rare variants via the k-means method and adding to the common SNPs did not
improve prediction accuracies. The prediction model, including both pedigree and
genomic information, achieved a slightly higher accuracy than using either source of
Prediction of genetic contributions to complex traits is feasible using whole genome
sequencing and statistical methods borrowed from animal breeding. The relatedness of
individuals between the training and testing set strongly affected the performance of
prediction models. Methods for inclusion of rare variants in these models need more
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
Telomeres are nucleoprotein structures that cap the end of chromosomes and shorten with sequential cell divisions in normal aging. Short telomeres are also implicated in the incidence of many cancers, but the evidence is not conclusive for colorectal cancer (CRC). Therefore, the aim of this study was to assess the association of CRC and telomere length.
In this case–control study, we measured relative telomere length from peripheral blood leukocytes (PBLs) DNA with quantitative PCR in 598 CRC patients and 2,212 healthy controls.
Multivariate analysis indicated that telomere length was associated with risk for CRC, and this association varied in an age-related manner; younger individuals (≤50 years of age) with longer telomeres (80–99 percentiles) had a 2–6 times higher risk of CRC, while older individuals (>50 years of age) with shortened telomeres (1–10 percentiles) had 2–12 times the risk for CRC. The risk for CRC varies with extremes in telomere length in an age-associated manner.
Younger individuals with longer telomeres or older individuals with shorter telomeres are at higher risk for CRC. These findings indicate that the association of PBL telomere length varies according to the age of cancer onset and that CRC is likely associated with at least two different mechanisms of telomere dynamics.
Why does living in a disadvantaged neighborhood predict poorer mental and physical health? Recent research focusing on the Southwestern United States suggests that disadvantaged neighborhoods favor poor health, in part, because they undermine sleep quality. Building on previous research, we test whether this process extends to the Midwestern United States. Specifically, we use cross-sectional data from the Survey of the Health of Wisconsin (SHOW), a statewide probability sample of Wisconsin adults, to examine whether associations among perceived neighborhood quality (e.g., perceptions of crime, litter, and pleasantness in the neighborhood) and health status (overall self-rated health and depression) are mediated by overall sleep quality (measured as self-rated sleep quality and physician diagnosis of sleep apnea). We find that perceptions of low neighborhood quality are associated with poorer self-rated sleep quality, poorer self-rated health, and more depressive symptoms. We also observe that poorer self-rated sleep quality is associated with poorer self-rated health and more depressive symptoms. Our mediation analyses indicate that self-rated sleep quality partially mediates the link between perceived neighborhood quality and health status. Specifically, self-rated sleep quality explains approximately 20% of the association between neighborhood quality and self-rated health and nearly 19% of the association between neighborhood quality and depression. Taken together, these results confirm previous research and extend the generalizability of the indirect effect of perceived neighborhood context on health status through sleep quality.
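The percentages of the association "explained" by sleep quality can be read as a proportion mediated. The abstract does not say how this was computed; the sketch below uses the common difference-in-coefficients form, (total effect − direct effect) / total effect, as an assumption. The coefficients are hypothetical and are not taken from the study.

```python
def proportion_mediated(total_effect: float, direct_effect: float) -> float:
    """Share of the total association carried by the mediator.

    total_effect: exposure coefficient without the mediator in the model.
    direct_effect: exposure coefficient after adjusting for the mediator.
    """
    if total_effect == 0:
        raise ValueError("total effect must be non-zero")
    return (total_effect - direct_effect) / total_effect


# Hypothetical regression coefficients for neighborhood quality.
total = 0.50   # outcome regressed on neighborhood quality
direct = 0.40  # same model, additionally adjusted for sleep quality
share = proportion_mediated(total, direct)
print(f"{share:.0%}")
```

This form is only interpretable when the total and direct effects share the same sign; in cross-sectional data it describes statistical attenuation rather than a demonstrated causal pathway.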
Sleep; Sleep quality; Neighborhood context; Neighborhood quality; Self-rated health; Depression; Wisconsin; USA
Colorectal cancer (CRC) tumor DNA is characterized by chromosomal damage termed chromosomal instability (CIN) and excessively shortened telomeres. Up to 80% of CRC is microsatellite stable (MSS) and is historically considered to be chromosomally unstable (CIN+). However, tumor phenotyping depicts some MSS CRC with little or no genetic changes, thus being chromosomally stable (CIN-). MSS CIN- tumors have not been assessed for telomere attrition.
MSS rectal cancers from patients ≤50 years old with Stage II (B2 or higher) or Stage III disease were assessed for CIN, telomere length and telomere maintenance mechanism (telomerase activation [TA]; alternative lengthening of telomeres [ALT]). Relative telomere length was measured by qPCR in somatic epithelial and cancer DNA. TA was measured with the TRAPeze assay, and tumors were evaluated for the presence of C-circles indicative of ALT. p53 mutation status was assessed in all available samples. DNA copy number changes were evaluated with Spectral Genomics aCGH.
Tumors were classified as chromosomally stable (CIN-) and chromosomally unstable (CIN+) by degree of DNA copy number changes. CIN- tumors (35%; n=6) had fewer copy number changes (<17% of their clones with DNA copy number changes) than CIN+ tumors (65%; n=13), which had high levels of copy number changes in 20% to 49% of clones. Telomere lengths were longer in CIN- compared to CIN+ tumors (p=0.0066) and in those in which telomerase was not activated (p=0.004). Tumors exhibiting activation of telomerase had shorter tumor telomeres (p=0.0040) and tended to be CIN+ (p=0.0949).
MSS rectal cancer appears to represent a heterogeneous group of tumors that may be categorized both on the basis of CIN status and telomere maintenance mechanism. MSS CIN- rectal cancers appear to have longer telomeres than those of MSS CIN+ rectal cancers and to utilize ALT rather than activation of telomerase.
To investigate genetic determinants of macular pigment optical density in women from the Carotenoids in Age-Related Eye Disease Study (CAREDS), an ancillary study of the Women's Health Initiative Observational Study.
1585 of 2005 CAREDS participants had macular pigment optical density (MPOD) measured noninvasively using customized heterochromatic flicker photometry and blood samples genotyped for 440 single nucleotide polymorphisms (SNPs) in 26 candidate genes related to absorption, transport, binding, and cleavage of carotenoids directly, or via lipid transport. SNPs were individually tested for associations with MPOD using least-squares linear regression.
Twenty-one SNPs from 11 genes were associated with MPOD (P ≤ 0.05) after adjusting for dietary intake of lutein and zeaxanthin. This includes variants in or near genes related to zeaxanthin binding in the macula (GSTP1), carotenoid cleavage (BCMO1), cholesterol transport or uptake (SCARB1, ABCA1, ABCG5, and LIPC), long-chain omega-3 fatty acid status (ELOVL2, FADS1, and FADS2), and various maculopathies (ALDH3A2 and RPE65). The strongest association was for rs11645428 near BCMO1 (βA = 0.029, P = 2.2 × 10−4). Conditional modeling within genes and further adjustment for other predictors of MPOD, including waist circumference, diabetes, and dietary intake of fiber, resulted in 13 SNPs from 10 genes maintaining independent association with MPOD. Variation in these single gene polymorphisms accounted for 5% of the variability in MPOD (P = 3.5 × 10−11).
Our results support that MPOD is a multi-factorial phenotype associated with variation in genes related to carotenoid transport, uptake, and metabolism, independent of known dietary and health influences on MPOD.
In 1585 postmenopausal women of the Carotenoids in Age-Related Eye Disease Study sample, common genetic variants in or near genes involved in carotenoid transport, uptake, and metabolism were associated with density of lutein and zeaxanthin in the macula, independent of other known predictors, including dietary intake of carotenoids.
Interest is increasing in epistasis as a possible source of the unexplained variance missed by genome-wide association studies. The Genetic Analysis Workshop 16 Group 9 participants evaluated a wide variety of classical and novel analytical methods for detecting epistasis, in both the statistical and machine learning paradigms, applied to both real and simulated data. Because the magnitude of epistasis is clearly relative to scale of penetrance, and therefore to some extent, to the choice of model framework, it is not surprising that strong interactions under one model might be minimized or even disappear entirely under a different modeling framework.
generalized linear model; machine learning methods
To investigate whether the association between physical activity and serum 25-hydroxyvitamin D (25(OH)D) concentrations is independent of sun exposure, body size, and other potential explanatory variables.
Using data from a sample of 1,343 postmenopausal women, from the Women’s Health Initiative, linear regression was used to examine the associations of duration (minutes/week) of recreational activity and of yard work with 25(OH)D concentrations (nmol/L).
In age-adjusted analyses, positive associations were observed between 25(OH)D concentrations and both duration of recreational physical activity (β=0.71, SE(0.09), P<0.001) and yard work (β=0.36, SE(0.10), P=0.004). After further adjustment for vitamin D intake, self-reported sunlight exposure, waist circumference, and season of blood draw, 25(OH)D was significantly associated with recreational activity (β=0.21, SE(0.09), P=0.014) but not with yard work (β=0.18, SE(0.09), P=0.061). Interactions were observed between season and both recreational activity (Pinteraction=0.082) and yard work (Pinteraction=0.038) such that these activity-25(OH)D associations were greater during summer/fall compared to winter/spring. Self-reported sunlight exposure and measures of body size did not modify the associations.
The observed age-adjusted activity-25(OH)D associations were attenuated after adjusting for explanatory variables and were modified by season of blood draw. Adopting a lifestyle that incorporates outdoor physical activity during summer/fall, consuming recommended amounts of vitamin D, and maintaining a healthy weight may improve or maintain vitamin D status in postmenopausal women.
25-hydroxyvitamin D; vitamin D; serum; sunlight exposure; physical activity; epidemiology; women
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
Next-generation sequencing technologies are rapidly changing the field of genetic epidemiology and enabling exploration of the full allele frequency spectrum underlying complex diseases. Although sequencing technologies have shifted our focus toward rare genetic variants, statistical methods traditionally used in genetic association studies are inadequate for estimating effects of low minor allele frequency variants. For our study we use the Genetic Analysis Workshop 17 data from 697 unrelated individuals (genotypes for 24,487 autosomal variants from 3,205 genes). We apply a Bayesian hierarchical mixture model to identify genes associated with a simulated binary phenotype using a transformed genotype design matrix weighted by allele frequencies. A Metropolis-Hastings algorithm is used to jointly sample each indicator variable and additive genetic effect pair from its conditional posterior distribution, and remaining parameters are sampled by Gibbs sampling. This method identified 58 genes with a posterior probability greater than 0.8 for being associated with the phenotype. One of these 58 genes, PIK3C2B, was correctly identified as being associated with affected status based on the simulation process. This project demonstrates the utility of Bayesian hierarchical mixture models using a transformed genotype matrix to detect genes containing rare and common variants associated with a binary phenotype.
Evidence-based public health requires the existence of reliable information systems for priority setting and evaluation of interventions. Existing data systems in the United States are either too crude (e.g., vital statistics), rely on administrative data (e.g., Medicare) or, because of their national scope (e.g., NHANES), lack the discriminatory power to assess specific needs and to evaluate community health activities at the state and local level. This manuscript describes the rationale and methods of the Survey of the Health of Wisconsin (SHOW), a novel infrastructure for population health research.
The program consists of a series of independent annual surveys gathering health-related data on representative samples of state residents and communities. Two-stage cluster sampling is used to select households and recruit approximately 800-1,000 adult participants (21-74 years old) each year. Recruitment and initial interviews are done at the household; additional interviews and physical exams are conducted at permanent or mobile examination centers. Individual survey data include physical, mental, and oral health history, health literacy, demographics, behavioral, lifestyle, occupational, and household characteristics as well as health care access and utilization. The physical exam includes blood pressure, anthropometry, bioimpedance, spirometry, urine collection and blood draws. Serum, plasma, and buffy coats (for DNA extraction) are stored in a biorepository for future studies. Every household is geocoded for linkage with existing contextual data including community level measures of the social and physical environment; local neighborhood characteristics are also recorded using an audit tool. Participants are re-contacted bi-annually by phone for health history updates.
SHOW generates data to assess health disparities across state communities as well as trends on prevalence of health outcomes and determinants. SHOW also serves as a platform for ancillary epidemiologic studies and for studies to evaluate the effect of community-specific interventions. It addresses key gaps in our current data resources and increases capacity for etiologic, applied and translational population health research. It is hoped that this program will serve as a model to better support evidence-based public health, facilitate intervention evaluation research, and ultimately help improve health throughout the state and nation.
Genome-wide association studies are often limited in their ability to attain their full potential due to the sheer volume of information created. We sought to use the random forest algorithm to identify single-nucleotide polymorphisms (SNPs) that may be involved in gene-by-smoking interactions related to the early onset of coronary heart disease.
Using data from the Framingham Heart Study, our analysis used a case-only design in which the outcome of interest was age of onset of early coronary heart disease.
Smoking status was dichotomized as ever versus never. The single SNP with the highest importance score assigned by random forests was rs2011345. This SNP was not associated with age alone in the control subjects. Using generalized estimating equations to adjust for sex and account for familial correlation, there was evidence of an interaction between rs2011345 and smoking status.
The results of this analysis suggest that random forests may be a useful tool for identifying SNPs taking part in gene-by-environment interactions in genome-wide association studies.
The aim of this study was to detect the effect of interactions between single-nucleotide polymorphisms (SNPs) on incidence of heart diseases. For this purpose, 2912 subjects with 350,160 SNPs from the Framingham Heart Study (FHS) were analyzed. PLINK was used to control quality and to select the 10,000 most significant SNPs. A classification tree algorithm, Generalized, Unbiased, Interaction Detection and Estimation (GUIDE), was employed to build a classification tree to detect SNP-by-SNP interactions for the selected 10 k SNPs. The classes generated by GUIDE were reexamined by a generalized estimating equations (GEE) model with the empirical variance after accounting for potential familial correlation. Overall, 17 classes were generated based on the splitting criteria in GUIDE. The prevalence of coronary heart disease (CHD) in class 16 (determined by SNPs rs1894035, rs7955732, rs2212596, and rs1417507) was the lowest (0.23%). Compared to class 16, all other classes except for class 288 (prevalence of 1.2%) had a significantly greater risk when analyzed using GEE model. This suggests the interactions of SNPs on these node paths are significant.
The objective of this study was to detect interactions between relevant single-nucleotide polymorphisms (SNPs) associated with rheumatoid arthritis (RA). Data from Problem 1 of the Genetic Analysis Workshop 16 were used. These data consisted of 868 cases and 1,194 controls genotyped with the 500 k Illumina chip. First, machine learning methods were applied for preselecting SNPs. One hundred SNPs outside the HLA region and 1,500 SNPs in the HLA region were preselected using information-gain theory. The software Weka was used to reduce collinearity and redundancy in the HLA region, resulting in a subset of 6 SNPs out of 1,500. In a second step, a parametric approach to account for interactions between SNPs in the HLA region, as well as HLA-nonHLA interactions, was conducted using a Bayesian threshold least absolute shrinkage and selection operator (LASSO) model incorporating 2,560 covariates. This approach detected some main and interaction effects for SNPs in genes that have previously been associated with RA (e.g., rs2395175, rs660895, rs10484560, and rs2476601). Further, some other SNPs detected in this study may be considered in candidate gene studies.
The high genomic density of the single-nucleotide polymorphism (SNP) sets that are typically surveyed in genome-wide association studies (GWAS) now allows the application of haplotype-based methods. Although the choice of haplotype-based vs. individual-SNP approaches is expected to affect the results of association studies, few empirical comparisons of method performance have been reported on the genome-wide scale in the same set of individuals. To measure the relative ability of the two strategies to detect associations, we used a large dataset from the North American Rheumatoid Arthritis Consortium to: 1) partition the genome into haplotype blocks, 2) associate haplotypes with disease, and 3) compare the results with individual-SNP association mapping. Although some associations were shared across methods, each approach uniquely identified several strong candidate regions. Our results suggest that the application of both haplotype-based and individual-SNP testing to GWAS should be adopted as a routine procedure.
Genes have been found to influence the age of onset of several diseases and traits. The occurrence of many chronic diseases, obesity included, appears to be strongly age-dependent. However, an analysis of potential age of onset genes for obesity has yet to be reported. There are at least two analytic methods for determining an age of onset gene. The first is to consider a person affected if they possess the trait before a certain age (an early age of onset phenotype). The second is to define the phenotype based on the residual from a survival analysis.
No regions provided evidence for linkage at the more stringent level of p < 0.001. However, five regions showed consistent suggestive evidence for linkage (one marker with p < 0.01 and a second contiguous marker at p < 0.05). These regions were chromosome 1 (280–294 cM) and chromosome 16 (56–64 cM) for overweight using the survival analysis residual method and chromosome 13 (102–122 cM), chromosome 17 (127–138 cM), and chromosome 19 (23–47 cM) for obese before age 35.
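The "suggestive evidence" rule stated above (one marker with p < 0.01 and a contiguous marker with p < 0.05) can be expressed as a scan over adjacent markers. The sketch below is an illustration of that rule only; the function name and the p-values are hypothetical, and markers are assumed to be listed in map order along one chromosome.

```python
from typing import List, Sequence, Tuple


def suggestive_pairs(p_values: Sequence[float],
                     strict: float = 0.01,
                     lenient: float = 0.05) -> List[Tuple[int, int]]:
    """Return index pairs of adjacent markers meeting the suggestive rule.

    A pair qualifies when one marker has p < strict and its neighbor
    has p < lenient, in either order.
    """
    hits = []
    for i in range(len(p_values) - 1):
        a, b = p_values[i], p_values[i + 1]
        if (a < strict and b < lenient) or (b < strict and a < lenient):
            hits.append((i, i + 1))
    return hits


# Hypothetical p-values for six consecutive markers.
example = [0.20, 0.008, 0.03, 0.50, 0.04, 0.04]
print(suggestive_pairs(example))  # prints [(1, 2)]
```

Note that two adjacent markers at p = 0.04 do not qualify, because neither reaches the stricter 0.01 level.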
Only one region (chromosome 19 at 23–47 cM) showed somewhat consistent results between the two analytic methods. Potential reasons for inconsistent results between the two methods, as well as their strengths and weaknesses, are discussed. The use of both methods together to explore the genetics of the age of onset of a trait may prove to be beneficial in determining a gene that is linked only to an early age of onset phenotype versus one that determines age of onset through all age groups.