With its potential to discover a much greater amount of genetic variation, next-generation sequencing is fast becoming an emergent tool for genetic association studies. However, the cost of sequencing all individuals in a large-scale population study is still high in comparison to most alternative genotyping options. While the ability to identify individual-level data is lost (without bar-coding), sequencing pooled samples can substantially lower costs without compromising the power to detect significant associations. We propose a hierarchical Bayesian model that estimates the association of each variant using pools of cases and controls, accounting for the variation in read depth across pools and sequencing error. To investigate the performance of our method across a range of numbers of pools, numbers of individuals within each pool, and average coverage, we undertook extensive simulations varying effect sizes, minor allele frequencies, and sequencing error rates. In general, the number of pools and pool size have dramatic effects on power while the total depth of coverage per pool has only a moderate impact. This information can guide the selection of a study design that maximizes power subject to cost, sample size, or other laboratory constraints. We provide an R package (hiPOD: hierarchical Pooled Optimal Design) to find the optimal design, allowing the user to specify a cost function, cost, and sample size limitations, and distributions of effect size, minor allele frequency, and sequencing error rate.
genetic association studies; sequencing; rare variants
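As a minimal sketch of how read depth and sequencing error can enter a pooled-sequencing likelihood, the Python below computes the expected fraction of alternate-allele reads in a pool and a binomial log-likelihood for the observed read counts. The symmetric error model, the binomial sampling of reads, and the function names are illustrative assumptions; this is not the hierarchical model implemented in hiPOD.

```python
import math


def observed_alt_fraction(pool_freq, error_rate):
    """Expected fraction of reads showing the alternate allele.

    Assumption: a symmetric error rate, so a true alternate read is
    misread as reference with the same probability as the reverse.
    """
    return pool_freq * (1.0 - error_rate) + (1.0 - pool_freq) * error_rate


def read_log_likelihood(alt_reads, depth, pool_freq, error_rate):
    """Binomial log-likelihood of alt_reads out of depth reads in one pool.

    Assumption: reads are independent draws from the pool.
    Requires 0 < observed_alt_fraction < 1 so both logarithms are defined.
    """
    q = observed_alt_fraction(pool_freq, error_rate)
    log_choose = (
        math.lgamma(depth + 1)
        - math.lgamma(alt_reads + 1)
        - math.lgamma(depth - alt_reads + 1)
    )
    return log_choose + alt_reads * math.log(q) + (depth - alt_reads) * math.log(1.0 - q)
```

Under these assumptions, a larger depth sharpens the likelihood around the pool allele frequency, while the error rate sets a floor on the observed alternate fraction even when the variant is absent from the pool.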
Recent genome-wide association studies (GWASs) have identified common variants at 16 autosomal regions influencing the risk of developing colorectal cancer (CRC). To decipher the genetic basis of the association signals at these loci, we performed a meta-analysis of data from five GWASs, totalling 5626 cases and 7817 controls, using imputation to recover un-typed genotypes. To enhance our ability to discover low-frequency risk variants, in addition to using 1000 Genomes Project data as a reference panel, we made use of high-coverage sequencing data on 253 individuals, 199 with early-onset familial CRC. For 13 of the regions, it was possible to refine the association signal, identifying a smaller region of interest likely to harbour the functional variant. Our analysis did not provide evidence that any of the associations at the 16 loci is a consequence of synthetic associations rather than linkage disequilibrium with a common risk variant.
Association studies among admixed populations pose many challenges including confounding of genetic effects due to population substructure and heterogeneity due to different patterns of linkage disequilibrium (LD). We use simulations to investigate controlling for confounding by indicators of global ancestry and the impact of including a covariate for local ancestry. In addition, we investigate the use of an interaction term between a single-nucleotide polymorphism (SNP) and local ancestry to capture heterogeneity in SNP effects. Although adjustment for global ancestry can control for confounding, additional adjustment for local ancestry may increase power when the induced admixture LD is in the opposite direction as the LD in the ancestral population. However, if the induced LD is in the same direction, there is the potential for reduced power because of overadjustment. Furthermore, the inclusion of a SNP by local ancestry interaction term can increase power when there is substantial differential LD between ancestry populations. We examine these approaches in genome-wide data using the University of Southern California's Children's Health Study investigating asthma risk. The analysis highlights rs10519951 (P = 8.5 × 10−7), a SNP lacking any evidence of association from a conventional analysis (P = 0.5).
confounding; genetic association studies; genome-wide association studies; heterogeneity; linkage disequilibrium; population stratification
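The covariate structure described in this abstract can be sketched as one row of a logistic-regression design matrix: an intercept, the SNP, a global ancestry covariate, a local ancestry covariate, and an optional SNP by local ancestry product term. The coding of each variable and the function names are hypothetical; the abstract does not specify them.

```python
import math


def design_row(snp_dosage, global_ancestry, local_ancestry, include_interaction=True):
    """Build one design-matrix row: intercept, SNP, global and local ancestry.

    The last column (SNP x local ancestry) is added only when requested.
    Variable codings are illustrative assumptions.
    """
    row = [1.0, float(snp_dosage), float(global_ancestry), float(local_ancestry)]
    if include_interaction:
        row.append(float(snp_dosage) * float(local_ancestry))
    return row


def case_probability(row, coefficients):
    """Logistic model: probability of being a case given one design row."""
    if len(row) != len(coefficients):
        raise ValueError("row and coefficients must have the same length")
    linear_predictor = sum(x * b for x, b in zip(row, coefficients))
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

Dropping the interaction column gives the model adjusted for global and local ancestry only, which is the comparison the abstract draws.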
To evaluate associations of treatment and an ‘additive genetic efficacy score’ (AGES) based on dopamine functional polymorphisms with time to first smoking lapse and point prevalence abstinence at end of treatment among participants enrolled in two randomized clinical trials of smoking cessation therapies.
Double-blind pharmacogenetic efficacy trials randomizing participants to active or placebo bupropion. Study 1 also randomized participants to cognitive-behavioral smoking cessation treatment (CBT) or this treatment with CBT for depression. Study 2 provided standardized behavioural support.
Two hospital-affiliated clinics (Study 1) and two university-affiliated clinics (Study 2).
N=792 self-identified white treatment-seeking smokers aged ≥18 years smoking ≥10 cigarettes per day over the last year.
Age, gender, Fagerström Test for Nicotine Dependence, dopamine pathway genotypes (rs1800497 [ANKK1 E713K], rs4680 [COMT V158M], DRD4 exon 3 Variable Number of Tandem Repeats polymorphism [DRD4 VNTR], SLC6A3 3' VNTR) analyzed both separately and as part of an AGES, time to first lapse, and point prevalence abstinence at end of treatment.
Significant associations of the AGES (hazard ratio [HR] = 1.10, 95% confidence interval [CI] = 1.06–1.14, p=0.0099) and of the DRD4 VNTR (HR = 1.29, 95% CI 1.17–1.41, p=0.0073) were observed with time to first lapse. A significant AGES by pharmacotherapy interaction was observed (β [SE]=−0.18 [0.07], p=0.016), such that AGES predicted risk for time to first lapse only for individuals randomized to placebo.
A score based on functional polymorphisms relating to dopamine pathways appears to predict lapse to smoking following a quit attempt, and the association is mitigated in smokers using bupropion.
Bupropion; genetic; pharmacogenetic analysis; randomized clinical trial; first lapse
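An additive score of this kind can be sketched as a sum of per-locus allele counts, with the per-unit hazard ratio compounding multiplicatively under a proportional hazards model. The locus labels, the 0/1/2 coding, and the choice of which allele is counted are hypothetical here; only the per-unit hazard ratio of 1.10 comes from the abstract.

```python
def additive_score(allele_counts):
    """Sum scored-allele counts (0, 1 or 2 per locus) across loci.

    allele_counts maps a locus label to a count; which allele is scored
    at each locus is an assumption made by the caller.
    """
    for locus, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"count for {locus} must be 0, 1 or 2")
    return sum(allele_counts.values())


def relative_hazard(score, hazard_ratio_per_unit=1.10):
    """Hazard relative to a score of 0, assuming a log-linear score effect."""
    return hazard_ratio_per_unit ** score
```

For example, a participant with a score of 2 would have about 1.21 times the hazard of first lapse of a participant with a score of 0, if the log-linear assumption holds.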
Obsessive-compulsive disorder (OCD) is a common, debilitating neuropsychiatric illness with complex genetic etiology. The International OCD Foundation Genetics Collaborative (IOCDF-GC) is a multi-national collaboration established to discover the genetic variation predisposing to OCD. A set of individuals affected with DSM-IV OCD, a subset of their parents, and unselected controls were genotyped with several different Illumina SNP microarrays. After extensive data cleaning, 1,465 cases, 5,557 ancestry-matched controls and 400 complete trios remained, with a common set of 469,410 autosomal and 9,657 X-chromosome SNPs. Ancestry-stratified case-control association analyses were conducted for three genetically-defined subpopulations and combined in two meta-analyses, with and without the trio-based analysis. In the case-control analysis, the lowest two p-values were located within DLGAP1 (p=2.49 × 10−6 and p=3.44 × 10−6), a member of the neuronal postsynaptic density complex. In the trio analysis, rs6131295, near BTBD3, exceeded the genome-wide significance threshold with a p-value=3.84 × 10−8. However, when trios were meta-analyzed with the combined case-control samples, the p-value for this variant was 3.62 × 10−5, losing genome-wide significance. Although no SNPs were identified to be associated with OCD at a genome-wide significant level in the combined trio-case-control sample, a significant enrichment of methylation-QTLs (p<0.001) and frontal lobe eQTLs (p=0.001) was observed within the top-ranked SNPs (p<0.01) from the trio-case-control analysis, suggesting these top signals may have a broad role in gene expression in the brain, and possibly in the etiology of OCD.
Obsessive-compulsive disorder; GWAS; Genetic; Genomic; Neurodevelopmental disorder; DLGAP
Experimental evidence has demonstrated an anti-neoplastic role for vitamin D in the colon, and higher circulating 25-hydroxyvitamin D (25[OH]D) levels are consistently associated with a lower risk of colorectal cancer (CRC). Genome-wide association studies have identified loci associated with levels of circulating 25(OH)D. The identified SNPs from four gene regions collectively explain approximately 5% of the variance in circulating 25(OH)D.
We investigated whether six polymorphisms in GC, CYP2R1, CYP24A1 and DHCR7/NADSYN1, genes previously shown to be associated with circulating 25(OH)D levels, were associated with CRC risk in 10,061 cases and 12,768 controls drawn from 13 studies included in the Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) and Colon Cancer Family Registry (CCFR). We performed a meta-analysis of crude and multivariate-adjusted logistic regression models to calculate odds ratios and associated confidence intervals for SNPs individually, SNPs simultaneously, and for a vitamin D additive genetic risk score (GRS).
We did not observe a statistically significant association between the 25(OH)D associated SNPs and CRC marginally, conditionally, or as a GRS, or for colon or rectal cancer separately or combined.
Our findings do not support an association between SNPs associated with circulating 25(OH)D and risk of CRC. Additional work is warranted to investigate the complex relationship between 25(OH)D and CRC risk.
There was no association observed between genetic markers of circulating 25(OH)D and CRC. These genetic markers account for a small proportion of the variance in 25(OH)D.
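One common way to combine study-specific logistic regression estimates is an inverse-variance fixed-effect meta-analysis of the log odds ratios. The sketch below assumes that approach; the abstract does not state whether a fixed-effect or random-effects model was used, and the function names are illustrative.

```python
import math


def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Inverse-variance weighted pooled log odds ratio and its standard error.

    Assumption: a fixed-effect model, i.e. one common effect across studies.
    """
    if len(log_odds_ratios) != len(standard_errors) or not log_odds_ratios:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


def odds_ratio_ci(log_or, se, z=1.96):
    """Odds ratio with a Wald confidence interval (95% when z = 1.96)."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

A pooled confidence interval that includes 1 corresponds to the null finding reported here.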
Genetic epidemiology is increasingly focused on complex diseases involving multiple genes and environmental factors, often interacting in complex ways. Although standard frequentist methods still have a role in hypothesis generation and testing for discovery of novel main effects and interactions, Bayesian methods are particularly well suited to modeling the relationships in an integrated “systems biology” manner. In this chapter, we provide an overview of the principles of Bayesian analysis and their advantages in this context and describe various approaches to applying them for both model building and discovery in a genome-wide setting. In particular, we highlight the ability of Bayesian methods to construct complex probability models via a hierarchical structure and to account for uncertainty in model specification by averaging over large spaces of alternative models.
Nicotine metabolism and genetic variation have an impact on nicotine addiction and smoking abstinence, but further research is required. The nicotine metabolite ratio (NMR) is a robust biomarker of nicotine metabolism used to categorize slow and normal nicotine metabolizers (lowest-quartile cutoff, i.e. the 25th percentile). In two randomized clinical trials of smoking abstinence treatments, we conducted NMR-stratified analyses on smoking abstinence across 13 regions coding for nicotinic acetylcholine receptors and proteins involved in the dopamine reward system. Gene × NMR interaction P-values were adjusted for multiple correlated tests, and we used a Bonferroni-corrected α-level of 0.004 to determine system-wide significance. Three SNPs in DRD1 (rs11746641, rs2168631, rs11749035) had significant interactions (0.001 ≤ adjusted P-values ≤ 0.004), with increased odds of abstinence within slow metabolizers (ORs=3.1–3.5, 95% CI 1.7–6.7). Our findings support the role of DRD1 in nicotine dependence, and identify genetic and nicotine metabolism profiles that may interact to impact nicotine dependence.
Genetic association studies; heterogeneity; smoking abstinence; nicotine metabolism; nicotine metabolite ratio; DRD1
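The reported α-level of 0.004 is consistent with dividing a family-wise error rate of 0.05 by the 13 regions tested. The 0.05 starting level is an assumption, since the abstract gives only the corrected value; the sketch below shows the arithmetic.

```python
def bonferroni_alpha(family_wise_alpha, number_of_tests):
    """Per-test significance level under a Bonferroni correction."""
    if number_of_tests < 1:
        raise ValueError("number_of_tests must be at least 1")
    return family_wise_alpha / number_of_tests


# Assumed family-wise level of 0.05 across the 13 regions in the abstract.
region_alpha = bonferroni_alpha(0.05, 13)
print(round(region_alpha, 3))
```

Rounded to three decimal places, 0.05 / 13 gives the 0.004 threshold quoted above.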
We conducted gender-stratified analyses on a systems-based candidate gene study of 53 regions involved in nicotinic response and the brain-reward pathway in two randomized clinical trials of smoking cessation treatments (placebo, bupropion, transdermal and nasal spray nicotine replacement therapy). We adjusted P-values for multiple correlated tests, and used a Bonferroni corrected α-level of 5 × 10−4 to determine system-wide significance. Four SNPs (rs12021667, rs12027267, rs6702335, rs12039988; r2>0.98) in erythrocyte membrane protein band 4.1 (EPB41) had a significant male-specific marginal association with smoking abstinence (OR=0.5; 95% CI 0.3–0.6) at end of treatment (adjusted P<6 × 10−5). rs806365 in cannabinoid receptor 1 (CNR1) had a significant male-specific gene-treatment interaction at 6-month follow-up (adjusted P=3.9 × 10−5); within males using nasal spray, rs806365 was associated with a decrease in odds of abstinence (OR=0.04; 95% CI 0.01–0.2). While the role of CNR1 in substance abuse has been well studied, we report EPB41 for the first time in the nicotine literature.
Genetic association studies; heterogeneity; smoking cessation
A chronic disease such as asthma is the result of a complex sequence of biological interactions involving multiple genes and pathways in response to a multitude of environmental exposures. However, methods to model jointly all factors are still evolving. Some of the current challenges include how to integrate knowledge from different data types and different disciplines, as well as how to utilize relevant external information such as gene annotation to identify novel disease genes and gene-environment interactions.
Using a Bayesian hierarchical modeling framework, we developed two alternative methods for joint analysis of an epidemiologic study of a disease endpoint and an experimental study of intermediate phenotypes, while incorporating external information.
Our simulation studies demonstrated superior performance of the proposed hierarchical models compared to separate analysis with the standard single-level regression modeling approach. The combined analyses of the Southern California Children's Health Study and challenge study data suggest that these joint analytical methods detected more significant genetic main and gene-environment interaction effects than the conventional analysis.
The proposed prior framework is very flexible and can be generalized for an integrative analysis of diverse sources of relevant biological data.
Bayesian hierarchical modeling; Biological related studies; Data integration; Gene-environment interaction; Joint analysis; Markov-chain Monte Carlo (MCMC) methods; Prior knowledge
DRD4 Exon III Variable Number of Tandem Repeat (VNTR) variation was found to interact with bupropion to influence prospective smoking abstinence, in a recently published longitudinal analysis of N = 331 individuals from a randomized double-blind placebo-controlled trial of bupropion and intensive cognitive–behavioral mood management therapy.
We used univariate, multivariate, and longitudinal logistic regression to evaluate gene, treatment, time, and interaction effects on point prevalence and continuous abstinence at end of treatment, 6 months, and 12 months, respectively, in N = 416 European ancestry participants in a double-blind pharmacogenetic efficacy trial randomizing participants to active or placebo bupropion. Participants received 10 weeks of pharmacotherapy and 7 sessions of behavioral therapy, with a target quit date 2 weeks after initiating both therapies. VNTR genotypes were coded with the long allele dominant resulting in 4 analysis categories. Covariates included demographics, dependence measures, depressive symptoms, and genetic ancestry. We also performed genotype-stratified secondary analyses.
We observed significant effects of time in longitudinal analyses of both abstinence outcomes, of treatment in individuals with VNTR long allele genotypes for both abstinence outcomes, and of covariates in some analyses. We observed non-significantly larger differences in active versus placebo effect sizes in individuals with VNTR long allele genotypes than in individuals without the VNTR long allele, in the directions previously reported.
VNTR by treatment interaction differences between these and previous analyses may be attributable to insufficient size of the replication sample. Analyses of multiple randomized clinical trials will enable identification and validation of factors mediating treatment response.
4-Aminobiphenyl (ABP) is an established human bladder carcinogen, with tobacco smoke being a major source of human exposure. Other arylamine compounds, including 2,6-dimethylaniline (2,6-DMA), have been implicated as possible human bladder carcinogens. Hemoglobin adducts of 4-ABP and 2,6-DMA are validated biomarkers of exposure to those compounds in humans.
The Shanghai Bladder Cancer Study enrolled 581 incident bladder cancer cases and 604 population controls. Each participant was solicited for his/her history of tobacco use and other lifestyle factors, and donation of blood and urine specimens. Red blood cell lysates were used to quantify both hemoglobin adducts of 4-ABP and 2,6-DMA. Urine samples were used to quantify total cotinine. Odds ratios (ORs) and 95% confidence intervals (CIs) for bladder cancer were estimated using unconditional logistic regression methods.
Among lifelong nonsmokers, ORs (95% CIs) of bladder cancer for low (below median of positive values) and high versus undetectable levels of 2,6-DMA hemoglobin adducts were 3.87 (1.39-10.75) and 6.90 (3.17-15.02), respectively (Ptrend<0.001). Similarly, among lifelong nonsmokers, ORs (95% CIs) of bladder cancer for 3rd and 4th versus 1st/2nd quartiles of 4-ABP hemoglobin adducts were 1.30 (0.76-2.22) and 2.29 (1.23-4.24), respectively (Ptrend=0.00). The two associations were independent of each other.
Hemoglobin adducts of 4-ABP and 2,6-DMA were significantly and independently associated with increased bladder cancer risk among lifelong nonsmokers in Shanghai, China.
The findings of the present study in China, together with previous data from Los Angeles, California, strongly implicate arylamines as potential causal agents of human bladder cancer.
Given the increasing scale of rare variant association studies, we introduce a method for high-dimensional studies that integrates multiple sources of data as well as allows for multiple region-specific risk indices.
Our method builds upon the previous Bayesian risk index (BRI) by integrating external biological variant-specific covariates to help guide the selection of associated variants and regions. Our extension also incorporates a second-level of uncertainty as to which regions are associated with the outcome of interest.
Using a set of study-based simulations, we show that our approach leads to an increase in power to detect true associations in comparison to several commonly used alternatives. Additionally, the method provides multi-level inference at the pathway, region and variant levels.
To demonstrate the flexibility of the method to incorporate various types of information and its applicability to high-dimensional data, we apply our method to a single region within a candidate gene study of second primary breast cancer and to multiple regions within a candidate pathway study of colon cancer.
genetic association studies; Bayesian model uncertainty; Bayes factors; sequence analysis; rare variant analysis
BACKGROUND & AIMS
Heritable factors contribute to the development of colorectal cancer. Identifying the genetic loci associated with colorectal tumor formation could elucidate the mechanisms of pathogenesis.
We conducted a genome-wide association study that included 14 studies, 12,696 cases of colorectal tumors (11,870 cancer, 826 adenoma), and 15,113 controls of European descent. The 10 most statistically significant, previously unreported findings were followed up in 6 studies; these included 3056 colorectal tumor cases (2098 cancer, 958 adenoma) and 6658 controls of European and Asian descent.
Based on the combined analysis, we identified a locus that reached the conventional genome-wide significance level at less than 5.0 × 10−8: an intergenic region on chromosome 2q32.3, close to nucleic acid binding protein 1 (most significant single nucleotide polymorphism: rs11903757; odds ratio [OR], 1.15 per risk allele; P = 3.7 × 10−8). We also found evidence for 3 additional loci with P values less than 5.0 × 10−7: a locus within the laminin gamma 1 gene on chromosome 1q25.3 (rs10911251; OR, 1.10 per risk allele; P = 9.5 × 10−8), a locus within the cyclin D2 gene on chromosome 12p13.32 (rs3217810; OR, 0.84 per risk allele; P = 5.9 × 10−8), and a locus in the T-box 3 gene on chromosome 12q24.21 (rs59336; OR, 0.91 per risk allele; P = 3.7 × 10−7).
In a large genome-wide association study, we associated polymorphisms close to nucleic acid binding protein 1 (which encodes a DNA-binding protein involved in DNA repair) with colorectal tumor risk. We also provided evidence for an association between colorectal tumor risk and polymorphisms in laminin gamma 1 (this is the second gene in the laminin family to be associated with colorectal cancers), cyclin D2 (which encodes cyclin D2), and T-box 3 (which encodes a T-box transcription factor and is a target of Wnt signaling to β-catenin). The roles of these genes and their products in cancer pathogenesis warrant further investigation.
Colon Cancer; Genetics; Risk Factors; SNP
Although previous investigations have indicated a role for genetic factors in smoking initiation, the underlying genetic mechanisms are still unknown. In 2,339 adolescents from a Chinese Han population in the Wuhan Smoking Prevention Trial (Wuhan, China, 1998–1999), the authors explored the association of 57 genes in the dopamine pathway with smoking initiation. Using a conservative approach for declaring significance, positive findings were further examined in an independent sample of 603 Caucasian adolescents followed for up to 10 years as part of the Children's Health Study (Southern California, 1993–2009). The authors identified 1 single nucleotide polymorphism (rs2298122) in the calcyon neuron-specific vesicular protein gene (CALY) that was positively associated with smoking initiation in females (odds ratio = 2.21, 95% confidence interval: 1.49, 3.27; P = 8.4 × 10−5) in the Wuhan Smoking Prevention Trial cohort, and they replicated the association in females from the Children's Health Study cohort (hazard rate ratio = 2.05, 95% confidence interval: 1.27, 3.31; P = 0.003). These results suggest that the CALY gene may influence smoking initiation in adolescents, although the potential roles of underlying psychological characteristics that may be components of the smoking-initiation phenotype, such as impulsivity or novelty-seeking, remain to be explored.
adolescent; dopamine; genetic association studies; smoking
CYP2B6 variation predicts pharmacokinetic characteristics of its substrates. Consideration for underlying genetic structure is critical to protect against spurious associations with the highly polymorphic CYP2B6 gene.
The effect of CYP2B6 variation on response to its substrates, nonnucleoside reverse transcriptase inhibitors (NNRTIs), was explored in the Women's Interagency HIV Study.
Five putative functional polymorphisms were tested for associations with virologic suppression within one year after NNRTI initiation in women naïve to antiretroviral agents (n=91). Principal components (PCs) were generated to control for population substructure. Logistic regression was used to test the joint effect of rs3745274 and rs28399499, which together indicate slow, intermediate, and extensive metabolizers.
Rs3745274 was significantly associated with virologic suppression (OR=3.61, 95% CI 1.16-11.22, p trend=0.03); the remaining polymorphisms tested were not significantly associated with response. Women classified as intermediate and slow metabolizers were 2.90 (95% CI 0.79-12.28) and 13.44 (95% CI 1.66-infinity) times as likely to achieve virologic suppression compared to extensive metabolizers after adjustment for PCs (p trend=0.005). Failure to control for genetic ancestry resulted in substantial confounding of the relationship between the metabolizer phenotype and treatment response.
The CYP2B6 metabolizer phenotype was significantly associated with virologic response to NNRTIs; this relationship would have been masked by simple adjustment for self-reported ethnicity. Given the appreciable genetic heterogeneity that exists within self-reported ethnicity, these results exemplify the importance of characterizing underlying genetic structure in pharmacogenetic studies. Further follow-up of the CYP2B6 metabolizer phenotype is warranted given the potential clinical importance of this finding.
CYP2B6; population substructure; women; NNRTIs; confounding
We are interested in investigating the involvement of multiple rare variants within a given region by conducting analyses of individual regions with two goals: (1) to determine if regional rare variation in aggregate is associated with risk; and (2) conditional upon the region being associated, to identify specific genetic variants within the region that are driving the association. In particular, we seek a formal integrated analysis that achieves both of our goals. For rare variants with low minor allele frequencies, there is very little power to statistically test the null hypothesis of equal allele or genotype counts for each variant. Thus, genetic association studies are often limited to detecting association within a subset of the common genetic markers. However, it is very likely that associations exist for the rare variants that may not be captured by the set of common markers. Our framework aims at constructing a risk index based on multiple rare variants within a region. Our analytical strategy is novel in that we use a Bayesian approach to incorporate model uncertainty in the selection of variants to include in the index as well as the direction of the associated effects. Additionally, the approach allows for inference at both the group and variant-specific levels. Using a set of simulations, we show that our methodology has added power over other popular rare variant methods to detect global associations. In addition, we apply the approach to sequence data from the WECARE Study of second primary breast cancers.
genetic association studies; Bayesian model uncertainty; Bayes factors; multiplicity correction; sequence analysis; WECARE
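For a single candidate model, a risk index of the kind described can be sketched as a signed sum of rare-variant counts, where each variant is either excluded or included with a risk or protective direction. The coding of the indicators as -1, 0 and 1 and the function name are assumptions made for illustration; the Bayesian averaging over models is not shown.

```python
def risk_index(genotype_counts, inclusion_directions):
    """Signed sum of one individual's rare-variant counts for one model.

    inclusion_directions holds one entry per variant (assumed coding):
      1  = included as risk-increasing
      -1 = included as protective
      0  = excluded from the index
    """
    if len(genotype_counts) != len(inclusion_directions):
        raise ValueError("need one direction per variant")
    if any(d not in (-1, 0, 1) for d in inclusion_directions):
        raise ValueError("directions must be -1, 0 or 1")
    return sum(d * g for d, g in zip(inclusion_directions, genotype_counts))
```

Model uncertainty then corresponds to treating the vector of directions as unknown and weighting each configuration by its posterior probability, which gives inference for the region as a whole and for each variant.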
Evaluate nicotinic acetylcholine receptor (nAChR) single nucleotide polymorphism (SNP) association with seven-day point prevalence abstinence (abstinence) in randomized clinical trials of smoking cessation therapies (RCTs) in individuals grouped by pharmacotherapy randomization to inform the development of personalized smoking cessation therapy.
We quantified association of four SNPs at three nAChRs with abstinence in eight RCTs. Participants were 2,633 outpatient treatment-seeking, self-identified European ancestry individuals smoking ≥10 cigarettes per day, recruited via advertisement, prescribed pharmacotherapy, and provided with behavioral therapy. Interventions included nicotine replacement therapy (NRT), bupropion, varenicline, placebo or combined NRT and bupropion, and five modes of group and individual behavioral therapy. Outcome measures tested in multivariate logistic regression were end of treatment (EOT) and six month (6MO) abstinence, with demographic, behavioral and genetic covariates.
“Risk” alleles previously associated with smoking heaviness were significantly (P<0.05) associated with reduced abstinence in the placebo pharmacotherapy group (PG) at 6MO [for rs588765 OR (95%CI) 0.41 (0.17–0.99)], and at EOT and at 6MO [for rs1051730, 0.42 (0.19–0.93) and 0.31 (0.12–0.80)], and with increased abstinence in the NRT PG at 6MO [for rs588765 2.07 (1.11–3.87) and for rs1051730 2.54 (1.29–4.99)]. We observed significant heterogeneity in rs1051730 effects (F=2.48, P=0.021) between PGs.
chr15q25.1 nAChR SNP risk alleles for smoking heaviness significantly increase relapse with placebo treatment and significantly increase abstinence with NRT. These SNP-PG associations require replication in independent samples for validation, and testing in larger sample sizes to evaluate whether similar effects occur in other PGs.
logistic regression; mediation analysis; nAChR variation; nicotine dependence; pharmacotherapy; randomized clinical trials
Tourette Syndrome (TS) is a developmental disorder that has one of the highest familial recurrence rates among neuropsychiatric diseases with complex inheritance. However, the identification of definitive TS susceptibility genes remains elusive. Here, we report the first genome-wide association study (GWAS) of TS in 1285 cases and 4964 ancestry-matched controls of European ancestry, including two European-derived population isolates, Ashkenazi Jews from North America and Israel, and French Canadians from Quebec, Canada. In a primary meta-analysis of GWAS data from these European ancestry samples, no markers achieved a genome-wide threshold of significance (p<5 × 10−8); the top signal was found in rs7868992 on chromosome 9q32 within COL27A1 (p=1.85 × 10−6). A secondary analysis including an additional 211 cases and 285 controls from two closely-related Latin-American population isolates from the Central Valley of Costa Rica and Antioquia, Colombia also identified rs7868992 as the top signal (p=3.6 × 10−7 for the combined sample of 1496 cases and 5249 controls following imputation with 1000 Genomes data). This study lays the groundwork for the eventual identification of common TS susceptibility variants in larger cohorts and helps to provide a more complete understanding of the full genetic architecture of this disorder.
Tourette Syndrome; tics; genetics; GWAS; neurodevelopmental disorder
In a variety of taxa, males deploy alternative reproductive tactics to secure fertilizations. In many species, small “sneaker” males attempt to steal fertilizations while avoiding encounters with larger, more aggressive, dominant males. Sneaker males usually face a number of disadvantages, including reduced access to females and the higher likelihood that upon ejaculation, their sperm face competition from other males. Nevertheless, sneaker males represent an evolutionarily stable strategy under a wide range of conditions. Game theory suggests that sneaker males compensate for these disadvantages by investing disproportionately in spermatogenesis, by producing more sperm per unit body mass (the “fair raffle”) and/or by producing higher quality sperm (the “loaded raffle”). Here, we test these models by competing sperm from sneaker “jack” males against sperm from dominant “hooknose” males in Chinook salmon. Using two complementary approaches, we reject the fair raffle in favor of the loaded raffle and estimate that jack males were ∼1.35 times as likely as hooknose males to fertilize eggs under controlled competitive conditions. Interestingly, the direction and magnitude of this skew in paternity shifted according to individual female egg donors, suggesting cryptic female choice could moderate the outcomes of sperm competition in this externally fertilizing species.
Hooknose; jack; salmon; sexual selection; sneaker male; sperm competition
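The two raffle models can be sketched with a single function in which each male's chance of fertilization is proportional to his sperm number, and a loading factor scales the competitive weight of each jack sperm. Treating the reported 1.35 as such a per-sperm loading factor under equal sperm numbers is an assumption made for illustration, as are the function and parameter names.

```python
def jack_paternity_probability(jack_sperm, hooknose_sperm, loading=1.0):
    """Probability that the jack male sires a given egg.

    loading = 1.0 gives the fair raffle (success proportional to sperm number).
    loading != 1.0 gives a loaded raffle in which each jack sperm counts as
    `loading` hooknose sperm.
    """
    if jack_sperm < 0 or hooknose_sperm < 0 or jack_sperm + hooknose_sperm == 0:
        raise ValueError("sperm numbers must be non-negative and not both zero")
    weighted_jack = loading * jack_sperm
    return weighted_jack / (weighted_jack + hooknose_sperm)
```

With equal sperm numbers, the fair raffle predicts a probability of 0.5 for each male, whereas a loading factor of 1.35 predicts 1.35 / 2.35 for the jack male.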
The direct estimation of heritability from genome-wide common variant data as implemented in the program Genome-wide Complex Trait Analysis (GCTA) has provided a means to quantify heritability attributable to all interrogated variants. We have quantified the variance in liability to disease explained by all SNPs for two phenotypically-related neurobehavioral disorders, obsessive-compulsive disorder (OCD) and Tourette Syndrome (TS), using GCTA. Our analysis yielded a heritability point estimate of 0.58 (se = 0.09, p = 5.64e-12) for TS, and 0.37 (se = 0.07, p = 1.5e-07) for OCD. In addition, we conducted multiple genomic partitioning analyses to identify genomic elements that concentrate this heritability. We examined genomic architectures of TS and OCD by chromosome, MAF bin, and functional annotations. In addition, we assessed heritability for early onset and adult onset OCD. Among other notable results, we found that SNPs with a minor allele frequency of less than 5% accounted for 21% of the TS heritability and 0% of the OCD heritability. Additionally, we identified a significant contribution to TS and OCD heritability by variants significantly associated with gene expression in two regions of the brain (parietal cortex and cerebellum) for which we had available expression quantitative trait loci (eQTLs). Finally, we analyzed the genetic correlation between TS and OCD, revealing a genetic correlation of 0.41 (se = 0.15, p = 0.002). These results are very close to previous heritability estimates for TS and OCD based on twin and family studies, suggesting that very little, if any, heritability is truly missing (i.e., unassayed) from TS and OCD GWASs of common variation. The results also indicate that there is some genetic overlap between these two phenotypically-related neuropsychiatric disorders, but suggest that the two disorders have distinct genetic architectures.
Family and twin studies have shown that genetic risk factors are important in the development of Tourette Syndrome (TS) and obsessive compulsive disorder (OCD). However, efforts to identify the individual genetic risk factors involved in these two neuropsychiatric disorders have been largely unsuccessful. One possible explanation for this is that many genetic variations scattered throughout the genome each contribute a small amount to the overall risk. For TS and OCD, the genetic architecture (characterized by the number, frequency, and distribution of genetic risk factors) is presently unknown. This study examined the genetic architecture of TS and OCD in a variety of ways. We found that rare genetic changes account for more genetic risk in TS than in OCD; certain chromosomes contribute to OCD risk more than others; and variants that influence the level of genes expressed in two regions of the brain can account for a significant amount of risk for both TS and OCD. Results from this study might help in determining where, and what kind of variants are individual risk factors for TS and OCD and where they might be located in the human genome.
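Heritability of liability for a case-control trait is usually obtained by rescaling an estimate made on the observed 0/1 scale. The sketch below shows one commonly used transformation for ascertained case-control samples. The function names, the prevalence, the case fraction, and the observed-scale value are all hypothetical, and whether the estimates quoted above were produced in exactly this way is an assumption.

```python
from statistics import NormalDist


def liability_scale_factor(prevalence: float, case_fraction: float) -> float:
    """Multiplier converting observed-scale h2 to the liability scale.

    prevalence:    population prevalence K of the disorder (0 < K < 1)
    case_fraction: proportion P of cases in the study sample (0 < P < 1)
    """
    std_normal = NormalDist()
    # Liability threshold above which an individual is affected.
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


def to_liability_scale(h2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Rescale an observed-scale heritability estimate to the liability scale."""
    return h2_observed * liability_scale_factor(prevalence, case_fraction)


# Hypothetical inputs, not values from the study.
print(to_liability_scale(h2_observed=0.20, prevalence=0.01, case_fraction=0.30))
```

The rescaling depends strongly on the assumed prevalence, so liability-scale estimates are only comparable across studies when the prevalence assumptions are stated.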
Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available.
By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma.
We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions.
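For readers unfamiliar with MCMC over a model space, the following is a generic single-chain Metropolis-Hastings sketch for variable selection. It is not the PEAK algorithm: it has no graph partitioning and no parallelism, and the per-variable `log_scores` are a hypothetical stand-in for a real model likelihood and prior.

```python
import math
import random


def sample_inclusion(log_scores, n_iter=20000, seed=0):
    """Estimate posterior inclusion probabilities by Metropolis-Hastings.

    log_scores[j] is a hypothetical log posterior-odds contribution of
    variable j; a model's unnormalized log posterior is the sum of the
    scores of the variables it includes.
    """
    rng = random.Random(seed)
    n_vars = len(log_scores)
    included = [False] * n_vars
    counts = [0] * n_vars
    for _ in range(n_iter):
        # Symmetric proposal: pick one variable and flip its indicator.
        j = rng.randrange(n_vars)
        # Adding j raises the log posterior by its score; removing lowers it.
        delta = -log_scores[j] if included[j] else log_scores[j]
        if delta >= 0 or rng.random() < math.exp(delta):
            included[j] = not included[j]
        # Record the current model after every iteration.
        for i, flag in enumerate(included):
            counts[i] += flag
    return [c / n_iter for c in counts]


# Three toy variables: strongly supported, strongly unsupported, neutral.
print(sample_inclusion([4.0, -4.0, 0.0]))
```

With hundreds of thousands of candidate variables, a single chain of this kind proposes each variable only rarely, which is the inefficiency that motivates dividing the model space into groups.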
The conventional method of detecting gene-environment interactions, the case-control analysis, suffers from low statistical power. In contrast, the case-only analysis/design can be powerful in certain scenarios, although violation of the assumption of independence between the genetic and environmental factors can greatly bias the results. As an alternative, Bayes model averaging may be used to combine the case-control and case-only analyses. This approach first frames the case-control and case-only analyses as variations of a log-linear model. The weighting between these 2 models is then a function of the data and prior beliefs on the independence of the 2 potentially interacting factors. In this paper, the authors demonstrate via simulations that when there is no prior information on the independence of the genetic and environmental factors, this approach tends to be more powerful than the case-control analysis. Additionally, when the genetic and environmental factors are not independent in the population, bias is substantially reduced, with a corresponding reduction in type I error in comparison with the case-only analysis. Increased power or increased robustness to violations of the independence assumption may be obtained with more appropriate prior specification. The authors use an example data analysis to demonstrate the advantages of this approach.
Bayesian estimation; Bayesian model; case-control studies; epidemiologic methods; interaction
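The weighting step described above can be sketched with the standard Bayes model averaging formulas. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the log Bayes factor and the two interaction estimates are assumed to come from separately fitted log-linear models.

```python
import math


def posterior_weight_case_only(prior_independence: float, log_bayes_factor: float) -> float:
    """Posterior probability of the case-only (independence) model.

    prior_independence: prior belief, strictly between 0 and 1, that the
                        genetic and environmental factors are independent.
    log_bayes_factor:   log marginal likelihood of the independence model
                        minus that of the case-control model (hypothetical input).
    """
    prior_odds = prior_independence / (1.0 - prior_independence)
    posterior_odds = prior_odds * math.exp(log_bayes_factor)
    return posterior_odds / (1.0 + posterior_odds)


def model_average(est_cc, var_cc, est_co, var_co, weight_co):
    """Combine case-control (cc) and case-only (co) interaction estimates."""
    weight_cc = 1.0 - weight_co
    mean = weight_co * est_co + weight_cc * est_cc
    # Within-model variance plus between-model disagreement.
    var = (weight_co * (var_co + (est_co - mean) ** 2)
           + weight_cc * (var_cc + (est_cc - mean) ** 2))
    return mean, var


# Hypothetical numbers: no prior preference, data mildly favor independence.
w = posterior_weight_case_only(prior_independence=0.5, log_bayes_factor=1.0)
print(model_average(est_cc=0.40, var_cc=0.04, est_co=0.55, var_co=0.01, weight_co=w))
```

When the data contradict independence, the Bayes factor pulls the weight toward the case-control estimate, which is how the averaged estimator limits the bias of the case-only analysis.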
Recent genome-wide studies identified a risk locus for colorectal cancer at 18q21, which maps to the SMAD7 gene. Our objective was to confirm the association between SMAD7 SNPs and colorectal cancer risk in the multi-center Colon Cancer Family Registry.
Materials and Methods
Twenty-three tagging SNPs in the SMAD7 gene were genotyped in 1,592 population-based and 253 clinic-based families. The SNP-colorectal cancer associations were assessed using multivariable conditional logistic regression.
Among the population-based families, SNPs rs12953717 (odds ratio, 1.29; 95% confidence interval, 1.12–1.49) and rs11874392 (odds ratio, 0.80; 95% confidence interval, 0.70–0.92) were both associated with risk of colorectal cancer. These associations were similar among the population- and the clinic-based families, though they were significant only among the former. Marginally significant differences in the SNP-colorectal cancer associations were observed by use of nonsteroidal anti-inflammatory drugs, cigarette smoking, body mass index, and history of polyps.
SMAD7 SNPs were associated with colorectal cancer risk in the Colon Cancer Family Registry. There was evidence suggesting that the association between rs12953717 and colorectal cancer risk may be modified by factors such as smoking and use of nonsteroidal anti-inflammatory drugs.
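The reported odds ratios and confidence intervals can be converted back to approximate log odds ratios, standard errors, and z statistics. The sketch below assumes the intervals are Wald intervals that are symmetric on the log scale with a critical value of 1.96. That assumption and the function name are not taken from the study.

```python
import math


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float, z_crit: float = 1.96):
    """Recover the log odds ratio and its approximate standard error
    from a reported odds ratio and 95% confidence interval."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    return log_or, se


# Odds ratios and intervals as reported for the population-based families.
reported = {
    "rs12953717": (1.29, 1.12, 1.49),
    "rs11874392": (0.80, 0.70, 0.92),
}
for snp, (or_, low, high) in reported.items():
    log_or, se = log_or_and_se(or_, low, high)
    print(f"{snp}: log OR = {log_or:.3f}, SE = {se:.3f}, z = {log_or / se:.2f}")
```

Because the published values are rounded to two decimals, the recovered standard errors and z statistics are approximate.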
There is little information regarding associations between suspected bladder cancer risk factors and tumor subtypes at diagnosis. Some, but not all, studies have found that bladder cancer among smokers is often more invasive than it is among nonsmokers. This population-based case-control study was conducted in Los Angeles, California, involving 1,586 bladder cancer patients and their individually matched controls. Logistic regression was used to conduct separate analyses according to tumor subtypes defined by stage and grade. Cigarette smoking increased the risk of both superficial and invasive bladder cancer, but the more advanced the stage, the stronger the effect. The odds ratios for regular smokers were 2.2 (95% confidence interval, 1.8-2.8), 2.7 (2.1-3.6), and 3.7 (2.5-5.5) for low-grade superficial, high-grade superficial, and invasive tumors, respectively. This pattern was consistently observed regardless of the smoking exposure index under examination. Women had a higher risk of invasive bladder cancer than men even when they smoked a comparable number of cigarettes. There was no gender difference in the association between smoking and risk of low-grade superficial bladder cancer. The heterogeneous effect of cigarette smoking was attenuated among heavy users of NSAIDs. Our results indicate that cigarette smoking was more strongly associated with increased risk of invasive bladder cancer than with low-grade superficial bladder cancer.
cigarette smoking; bladder cancer; tumor subtypes; non-steroidal anti-inflammatory drugs; Los Angeles