Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single-marker-based and test one SNP at a time. In this paper, we consider testing the effect of a SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a Generalized Estimating Equations (GEE)-based kernel association test, a variance component-based testing method, to test for the association between a phenotype and multiple variants in a SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the p-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls for type-I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single-marker-based minimum p-value GEE test for a SNP set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
Family-based association; Generalized estimating equations; Kernel machine regression; Marginal models; Score test; Variance component
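As a rough illustration of the variance-component idea, the sketch below computes a kernel score statistic Q = r' K r with a linear kernel K = G G', where r holds residuals from a null model. The function name, the toy genotypes, and the intercept-only null model are assumptions made for illustration; the empirical covariance estimator, the null distribution of the statistic, and the resampling correction described in the abstract are not reproduced here.

```python
def linear_kernel_score(genotypes, residuals):
    """Q = r' G G' r, computed as the sum over SNPs of (G_j' r)^2."""
    n_snps = len(genotypes[0])
    q = 0.0
    for j in range(n_snps):
        # Score contribution of SNP j: inner product of its genotype column with r
        s_j = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += s_j ** 2
    return q


# Hypothetical toy data: 4 individuals, 3 SNPs coded as minor-allele counts
G = [[0, 1, 2],
     [1, 0, 0],
     [2, 1, 0],
     [0, 0, 1]]
y = [1.0, 2.0, 4.0, 1.0]

# Residuals from an intercept-only null model (an assumption for this sketch)
mean_y = sum(y) / len(y)
r = [v - mean_y for v in y]

Q = linear_kernel_score(G, r)
print(Q)
```

A large Q relative to its null distribution would suggest a joint SNP-set effect; obtaining that distribution for correlated family members is the part this sketch leaves out.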
High blood pressure (BP) is the most common cardiovascular risk factor worldwide and a major contributor to heart disease and stroke. We previously discovered a BP-associated missense SNP (single nucleotide polymorphism)–rs2272996–in the gene encoding vanin-1, a glycosylphosphatidylinositol (GPI)-anchored membrane pantetheinase. In the present study, we first replicated the association of rs2272996 and BP traits with a total sample size of nearly 30,000 individuals from the Continental Origins and Genetic Epidemiology Network (COGENT) of African Americans (P = 0.01). This association was further validated using patient plasma samples; we observed that the N131S mutation is associated with significantly lower plasma vanin-1 protein levels. We observed that the N131S vanin-1 is subjected to rapid endoplasmic reticulum-associated degradation (ERAD) as the underlying mechanism for its reduction. Using HEK293 cells stably expressing vanin-1 variants, we showed that N131S vanin-1 was degraded significantly faster than wild type (WT) vanin-1. Consequently, there were only minimal quantities of variant vanin-1 present on the plasma membrane and greatly reduced pantetheinase activity. Application of MG-132, a proteasome inhibitor, resulted in accumulation of ubiquitinated variant protein. A further experiment demonstrated that atenolol and diltiazem, two current drugs for treating hypertension, reduce the vanin-1 protein level. Our study provides strong biological evidence for the association of the identified SNP with BP and suggests that vanin-1 misfolding and degradation are the underlying molecular mechanism.
Hypertension (HTN) or high blood pressure (BP) is common worldwide and a major risk factor for cardiovascular disease and all-cause mortality. Identification of genetic variants of consequence for HTN serves as the molecular basis for its treatment. Using admixture mapping analysis of the Family Blood Pressure Program data, we recently identified that the VNN1 gene (encoding the protein vanin-1), in particular SNP rs2272996 (N131S), was associated with BP in both African Americans and Mexican Americans. Vanin-1 was reported to act as an oxidative stress sensor using its pantetheinase enzyme activity. Because a linkage between oxidative stress and HTN has been hypothesized for many years, vanin-1's pantetheinase activity offers a physiologic rationale for BP regulation. Here, we first replicated the association of rs2272996 with BP in the Continental Origins and Genetic Epidemiology Network (COGENT), which included nearly 30,000 African Americans. We further demonstrated that the N131S mutation in vanin-1 leads to its rapid degradation in cells, resulting in loss of function on the plasma membrane. The loss of function of vanin-1 is associated with reduced BP. Therefore, our results indicate that vanin-1 is a new candidate to be manipulated to ameliorate HTN.
Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in the reference genome. However, no study has investigated the distribution, evolution and functionality of those sequences in human populations.
To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from a large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human diversity.
76% of micSeqs were confirmed by a comparative genomics approach. Fourteen micSeqs are expressed in human brain or contain TF binding regions. Some micSeqs are primate-specific, conserved and may play a role in the evolution of primates.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-685) contains supplementary material, which is available to authorized users.
Missing common sequence; De novo assembling; Next generation sequencing; Expression in brain; Transcription factor binding; Genome evolution
Genome-wide association studies (GWAS) have identified at least 71 Crohn's disease (CD) genetic risk loci but the role of gene-gene interactions is unclear. The value of genetic variants in clinical practice is not defined due to limited explained heritability.
Materials and Methods
We examined the predictive performance of a model combining the 71 CD risk alleles and genetic interactions in an ongoing inflammatory bowel disease (IBD) GWAS. The Wellcome Trust Case-Control Consortium (WTCCC) IBD GWAS was used as a replication cohort. We used logic regression (LR), an adaptive regression methodology, to search for high-order interactions among binary predictors (e.g., single nucleotide polymorphisms [SNPs]).
The combined 71 CD SNPs had good CD risk predictability (area under the curve, AUCs of 0.75 and 0.73 in the 2 cohorts). Higher cumulative allele score predicted higher CD risk, but a relatively small difference in cumulative allele scores was observed between CD and controls (49 vs. 47, P<0.001). Through LR, we identified high order genetic interactions and significantly improved the model predictability (AUC, 0.75 to 0.77, P<0.0001). A genetic interaction model including NOD2, ATG16L1, IL10/IL19, C13orf31 and chr21q loci was discovered and successfully replicated in the independent WTCCC cohort. The explained heritability of the 71 CD SNPs alone was 24% and increased to 27% after adding the genetic interactions.
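The cumulative allele score and AUC mentioned above can be illustrated with a minimal sketch. It sums risk-allele counts per individual and estimates the AUC as the probability that a randomly chosen case outscores a randomly chosen control. The genotypes, the function names, and the use of an unweighted score as the sole predictor are assumptions for illustration, not the study's fitted model.

```python
def cumulative_allele_score(risk_allele_counts):
    """Sum of risk-allele counts (0, 1, or 2 per SNP) for one individual."""
    return sum(risk_allele_counts)


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical genotypes at 4 risk SNPs (not data from the study)
cases = [[2, 1, 1, 2], [1, 1, 2, 1], [2, 2, 1, 0]]
controls = [[1, 0, 1, 1], [2, 1, 1, 1], [0, 1, 1, 0]]

case_scores = [cumulative_allele_score(g) for g in cases]
control_scores = [cumulative_allele_score(g) for g in controls]
area = auc(case_scores, control_scores)
print(case_scores, control_scores, round(area, 3))
```

The pairwise definition makes it easy to see why a small difference in mean scores between cases and controls can still yield a moderate AUC: only the ordering of scores matters, not their absolute gap.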
A novel approach allowed the identification and replication of genetic interactions among NOD2, ATG16L1, IL10/IL19, C13orf31 and chr21q loci. CD risk can be predicted by a model of 71 CD loci and improved by adding genetic interactions.
genetics; genetic interaction; cumulative genetic effect; Crohn's disease
Next-generation sequencing technologies have been designed to discover rare and de novo variants and are an important tool for identifying rare disease variants. Many statistical methods have been developed to test, using next-generation sequencing data, for rare variants that are associated with a trait. However, many of these methods make assumptions that rare variants are in linkage equilibrium in a gene. In this report, we studied whether transmitted or untransmitted haplotypes carry an excess of rare variants using the whole genome sequencing data of 15 large Mexican American pedigrees provided by the Genetic Analysis Workshop 18. We observed that an excess of rare variants are carried on either transmitted or nontransmitted haplotypes from parents to offspring. Further analyses suggest that such nonrandom associations among rare variants can be attributed to population admixture and single-nucleotide variant calling errors. Our results have significant implications for rare variant association studies, especially those conducted in admixed populations.
De novo mutations enrich sequence diversity and carry clues about evolutionary selection. Recent studies suggest that de novo mutations could be one of the risk factors for complex diseases. We conducted a survey of de novo mutations using the whole genome sequence data, available only for the odd-numbered autosomes, of Mexican American families provided by Genetic Analysis Workshop 18. We extracted 8 three-generation families with sequencing data available from 20 large pedigrees. By comparing the known single nucleotide variants (SNVs) in dbSNP129 and the de novo variants transmitted in the Mexican American families, we were able to estimate a de novo mutation rate of 1.64(±0.42) × 10⁻⁸ per position per haploid genome. This result is consistent with the estimates in the literature that required extensive validation efforts, such as genotyping and further resequencing. Our analysis suggests the importance of using family samples for studying rare variants.
Blood pressure (BP) is a heritable determinant of risk for cardiovascular disease (CVD). To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP) and pulse pressure (PP), we genotyped ∼50 000 single-nucleotide polymorphisms (SNPs) that capture variation in ∼2100 candidate genes for cardiovascular phenotypes in 61 619 individuals of European ancestry from cohort studies in the USA and Europe. We identified novel associations between rs347591 and SBP (chromosome 3p25.3, in an intron of HRH1) and between rs2169137 and DBP (chromosome 1q32.1, in an intron of MDM4), and between rs2014408 and SBP (chromosome 11p15, in an intron of SOX6), previously reported to be associated with MAP. We also confirmed 10 previously known loci associated with SBP, DBP, MAP or PP (ADRB1, ATP2B1, SH2B3/ATXN2, CSK, CYP17A1, FURIN, HFE, LSP1, MTHFR, SOX6) at array-wide significance (P < 2.4 × 10⁻⁶). We then replicated these associations in an independent set of 65 886 individuals of European ancestry. The findings from expression QTL (eQTL) analysis showed associations of SNPs in the MDM4 region with MDM4 expression. We did not find any evidence of association of the two novel SNPs in MDM4 and HRH1 with sequelae of high BP including coronary artery disease (CAD), left ventricular hypertrophy (LVH) or stroke. In summary, we identified two novel loci associated with BP and confirmed multiple previously reported associations. Our findings extend our understanding of genes involved in BP regulation, some of which may eventually provide new targets for therapeutic intervention.
Hepatocellular carcinoma (HCC) is a hypervascular tumor and accumulating evidence suggests that angiogenesis plays an important role in HCC development. Cordycepin, also known as 3′-deoxyadenosine, is a derivative of adenosine, and numerous cellular enzymes cannot differentiate the two. The aim of the present study was to determine whether cordycepin regulates proliferation, migration and angiogenesis in a human umbilical vein endothelial cell line (EA.hy926) and in a hepatocellular carcinoma cell line (HepG2). MTT was used to assess cell proliferation. Apoptosis was analyzed by flow cytometry (propidium iodide staining). Transwell and wound healing assays were used to analyze the migration and invasion of HepG2 and EA.hy926 cells. Angiogenesis in EA.hy926 cells was assessed using a tube formation assay. Cordycepin strongly suppressed HepG2 and EA.hy926 cell proliferation in a dose- and time-dependent manner. Cordycepin induced EA.hy926 cell apoptosis in a dose-dependent manner (2,000 μg/ml: 50.20±1.55% vs. 0 μg/ml: 2.62±0.19%; P<0.01). Cordycepin inhibited EA.hy926 cell migration (percentage of wound healing area, 2,000 μg/ml: 3.45±0.29% vs. 0 μg/ml: 85.48±0.84%; P<0.05), as well as tube formation (total length of tubular structure, 1,000 μg/ml: 107±39 μm vs. 0 μg/ml: 936±56 μm; P<0.05). Cordycepin also efficiently inhibited HepG2 cell invasion and migration. High-performance liquid chromatography analysis of the cytosol from EA.hy926 cells showed that cordycepin was stable for 3 h. In conclusion, cordycepin not only inhibited human HepG2 cell proliferation and invasion, but also induced apoptosis and inhibited migration and angiogenesis in vascular endothelial cells, suggesting that cordycepin may be used as a novel anti-angiogenic therapy in HCC.
cordycepin; angiogenesis; invasion; hepatocellular carcinoma; apoptosis; vascular endothelial cells
Periodontitis and other bone loss diseases, which decrease bone volume and strength, have a significant impact on millions of people through the risk of tooth loss and bone fracture. The integrity and strength of bone are maintained through the balance between bone resorption by osteoclasts and bone formation by osteoblasts, so bone loss results from the disruption of this balance due to increased resorption and/or decreased formation of bone. The goal of therapies for bone loss diseases is to reduce bone loss, improve bone formation, and then maintain healthy bone density. Current therapies have mostly relied on long-term medication, exercise, anti-inflammatory therapies, and lifestyle changes. However, because of the complexity of bone loss, these treatments are of limited effectiveness for some patients. Interleukin-10 (IL-10) is a potent anti-inflammatory cytokine, and recent studies have indicated that IL-10 can contribute to the maintenance of bone mass through inhibition of osteoclastic bone resorption and regulation of osteoblastic bone formation. This paper provides a brief overview of the role of IL-10 in bone loss diseases and discusses the possibility of adopting IL-10 in their therapy.
Genome-wide association studies (GWAS) have identified 36 loci associated with body mass index (BMI), predominantly in populations of European ancestry. We conducted a meta-analysis to examine the association of >3.2 million SNPs with BMI in 39,144 men and women of African ancestry, and followed up the most significant associations in an additional 32,268 individuals of African ancestry. We identified one novel locus at 5q33 (GALNT10, rs7708584, p=3.4×10⁻¹¹) and another at 7p15 when combined with data from the GIANT consortium (MIR148A/NFE2L3, rs10261878, p=1.2×10⁻¹⁰). We also found suggestive evidence of an association at a third locus at 6q16 in the African ancestry sample (KLHL32, rs974417, p=6.9×10⁻⁸). Thirty-two of the 36 previously established BMI variants displayed directionally consistent effect estimates in our GWAS (binomial p=9.7×10⁻⁷), of which five reached genome-wide significance. These findings provide strong support for shared BMI loci across populations as well as for the utility of studying ancestrally diverse populations.
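The binomial p-value quoted for directional consistency can be checked with a short calculation. The sketch below assumes a one-sided sign test in which each of the 36 variants has probability 0.5 of matching direction by chance; the abstract does not state the exact test used, so this setup is an assumption, although it reproduces the reported value to two significant figures.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 32 of 36 established variants had directionally consistent effects
p_value = binomial_upper_tail(32, 36)
print(f"{p_value:.2e}")  # about 9.7e-07
```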
Genetic variants in 296 genes in regions identified through admixture mapping of hypertension, BMI, and lipids were assessed for association with hypertension, blood pressure, BMI, and HDL-C.
This study examined coding SNPs identified from HapMap2 data that were located in genes on chromosomes 5, 6, 8, and 21, where ancestry association evidence for hypertension, BMI or HDL-C was identified in previous admixture mapping studies. Genotyping was performed in 1,733 unrelated African-Americans from the National Heart, Lung and Blood Institute’s (NHLBI) Family Blood Pressure Program, and gene-based association analyses were conducted for hypertension, systolic blood pressure (SBP), diastolic blood pressure (DBP), BMI, and HDL-C. A gene score based on the number of minor alleles of each SNP in a gene was created and used for gene-based regression analyses, adjusting for age, age², sex, local marker ancestry, and BMI, as applicable. An individual’s African ancestry estimated from 2,507 ancestry-informative markers was also adjusted for to eliminate any confounding due to population stratification.
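A minimal sketch of the gene score idea follows: minor-allele counts are summed across the SNPs of one gene, and the score is then related to a trait. The genotypes, trait values, and function names are hypothetical, and the regression shown is an unadjusted simple slope; the covariate adjustments described above (age, age squared, sex, ancestry, BMI) are deliberately left out.

```python
def gene_score(minor_allele_counts):
    """Number of minor alleles summed over the SNPs of one gene."""
    return sum(minor_allele_counts)


def simple_slope(x, y):
    """Least-squares slope of y on x (no covariates; illustration only)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: 4 individuals, 3 coding SNPs in one gene, and SBP in mmHg
gene_genotypes = [[0, 1, 0], [1, 1, 0], [2, 1, 1], [0, 0, 0]]
sbp = [120.0, 126.0, 135.0, 118.0]

scores = [gene_score(g) for g in gene_genotypes]
slope = simple_slope(scores, sbp)
print(scores, round(slope, 3))
```

Summing allele counts treats every minor allele as having the same direction and size of effect, which keeps the test to one degree of freedom per gene.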
CXADR (rs437470) on chromosome 21 was associated with SBP and DBP with or without adjusting for local ancestry (p < 0.0006). F2RL1 (rs631465) on chromosome 5 was associated with BMI (p = 0.0005). Local ancestry in these regions was associated with the respective traits as well.
This study suggests that CXADR and F2RL1 likely play important roles in blood pressure and obesity variation, respectively. These findings are consistent with other studies, but replication and functional analyses are still necessary.
Blood pressure; Obesity; African Americans; Genetic Association Studies
When dense markers are available, one can interrogate almost every common variant across the genome via imputation and single nucleotide polymorphism (SNP) tests, which has become routine in current genome-wide association studies (GWASs). As a complement, admixture mapping exploits the long-range linkage disequilibrium (LD) generated by admixture between genetically distinct ancestral populations. It is then questionable whether admixture mapping analysis is still necessary in detecting the disease-associated variants in admixed populations. We argue that admixture mapping is able to reduce the burden of massive comparisons in GWASs; it therefore can be a powerful tool to locate the disease variants with substantial allele frequency differences between ancestral populations. In this report we studied a two-stage approach, where candidate regions are defined by conducting admixture mapping at stage 1, and single SNP association tests follow at stage 2 within the candidate regions defined at stage 1. We first established, by simulation, the genome-wide significance levels corresponding to the criteria used to define the candidate regions at stage 1. We next compared the power of the two-stage approach with direct association analysis. Our simulations suggest that the two-stage approach can be more powerful than the standard genome-wide association analysis when the allele frequency difference of a causal variant between ancestral populations is larger than 0.4. Our conclusion is consistent with a theoretical prediction by Risch and Tang (Am J Hum Genet 79:S254). Surprisingly, our study also suggests that power can be improved when we use less strict criteria to define the candidate regions at stage 1.
genome-wide association studies; admixture mapping; permutation based significance thresholds; two-stage approach
In response to the increased organ shortage, organs derived from donation after cardiac death (DCD) donors are becoming an acceptable option once again for clinical use in transplantation. However, transplant outcomes in cases where DCD organs are used are not as favorable as those from donation after brain death or living donors. Different methods of organ preservation are a key factor that may influence the outcomes of DCD kidney transplantation.
We compared the transplant outcomes in patients receiving DCD kidneys preserved by machine perfusion (MP) or by static cold storage (CS) preservation by conducting a meta-analysis. The MEDLINE, EMBASE and Cochrane Library databases were searched. All studies reporting outcomes for MP versus CS preserved DCD kidneys were further considered for inclusion in this meta-analysis. Odds ratios and 95% confidence intervals (CI) were calculated to compare the pooled data between groups that were transplanted with kidneys that were preserved by MP or CS.
Four prospective, randomized, controlled trials, involving 175 MP and 176 CS preserved DCD kidney transplant recipients, were included. MP preserved DCD kidney transplant recipients had a decreased incidence of delayed graft function (DGF) with an odds ratio of 0.56 (95% CI = 0.36–0.86, P = 0.008) compared to CS. However, no significant differences were seen between the two technologies in incidence of primary non-function, one year graft survival, or one year patient survival.
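For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below works through a single 2×2 table using the log-odds (Woolf) method. The counts are hypothetical and are not taken from the included trials, and pooling across trials (for example by a Mantel-Haenszel or inverse-variance method) is not shown; the function name is likewise an assumption.

```python
from math import exp, log, sqrt


def odds_ratio_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio of group A vs. group B with a Wald interval on the log scale."""
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: DGF in 20 of 50 MP kidneys vs. 30 of 50 CS kidneys
or_mp_vs_cs, ci_low, ci_high = odds_ratio_ci(20, 50, 30, 50)
print(round(or_mp_vs_cs, 3), round(ci_low, 3), round(ci_high, 3))
```

An interval that excludes 1 corresponds to a difference that is significant at the 5% level under this approximation.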
MP preservation of DCD kidneys is superior to CS in terms of reducing DGF rate post-transplant. However, primary non-function, one year graft survival, and one year patient survival were not affected by the use of MP or CS for preservation.
In genetic association studies, it is necessary to correct for population structure to avoid inference bias. During the past decade, prevailing corrections often only involved adjustments of global ancestry differences between sampled individuals. Nevertheless, population structure may vary across local genomic regions due to the variability of local ancestries associated with natural selection, migration, or random genetic drift. Adjusting for global ancestry alone may be inadequate when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can more effectively prevent false-positives due to local population structure. To more accurately locate disease genes, we recommend adjusting for local ancestries by interrogating local structure. In practice, locus-specific ancestries are usually unknown and cannot be accurately inferred when ancestral population information is not available. For such scenarios, we propose employing local principal components (PC) to represent local ancestries and adjusting for local PCs when testing for genotype–phenotype association. With an acceptable computation burden, the proposed algorithm successfully eliminates the known spurious association between SNPs in the LCT gene and height due to the population structure in European Americans.
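To make the notion of a local principal component concrete, the following sketch extracts the first PC from the genotypes in one genomic window using power iteration on the sample-by-sample Gram matrix. The window contents, the function name, and the choice of power iteration are assumptions for illustration and are not the published algorithm; in an association test the resulting scores would enter the model as a covariate.

```python
def first_local_pc(window, n_iter=200):
    """Leading PC scores for individuals from genotypes in one local window."""
    n, m = len(window), len(window[0])
    # Center each SNP column
    col_means = [sum(row[j] for row in window) / n for j in range(m)]
    x = [[row[j] - col_means[j] for j in range(m)] for row in window]
    # Sample-by-sample Gram matrix X X'
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    # Power iteration from a fixed, non-constant starting vector
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    return v


# Hypothetical window: 6 individuals, 4 SNPs, two ancestry-like clusters
window = [[2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1],
          [0, 0, 2, 2], [0, 0, 2, 1], [0, 1, 2, 2]]
pc1 = first_local_pc(window)
print([round(t, 3) for t in pc1])
```

In this toy window the first three individuals receive scores of one sign and the last three the opposite sign, which is the kind of local structure the adjustment is meant to absorb.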
Genome-wide association studies; Local ancestries; Local principal components; Migration; Random genetic drift; Natural selection; Genomic inflation factor; Genomic control; Local ancestry principal components correction; Fine mapping
Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples.
The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates.
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Association studies; Family data; Score test; Multi-marker test
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS) which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder, and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker, statistical methods can be of use in finding variants in GWAS. We describe many new, associated variants for both CAD and HTN and describe their known genetic mechanisms.
Populations of ethnic mixtures can be useful in genetic studies. Admixture mapping, or mapping by admixture linkage disequilibrium (MALD), is specially developed for admixed populations and can supplement traditional genome-wide association analyses in the search for genetic variants underlying complex traits. Admixture mapping tests the association between a trait and locus-specific ancestries. The locus-specific ancestries are in linkage disequilibrium (LD) which is generated by the admixture process between genetically distinct ancestral populations. Because of highly correlated locus-specific ancestries, admixture mapping performs many fewer independent tests across the genome than current genome-wide association analysis. Therefore, admixture mapping can be more powerful because of the smaller penalty due to multiple tests. In this chapter, I introduce the theory behind admixture mapping and how we conduct the analysis in practice.
Admixture mapping; Population admixture; Ancestry information marker; Hidden Markov model
Interactions among genomic loci (also known as epistasis) have been suggested as one of the potential sources of missing heritability in single locus analysis of genome-wide association studies (GWAS). The computational burden of searching for interactions is compounded by the extremely low threshold for identifying significant p-values due to multiple hypothesis testing corrections. Utilizing prior biological knowledge to restrict the set of candidate SNP pairs to be tested can alleviate this problem, but systematic studies that investigate the relative merits of integrating different biological frameworks and GWAS data have not been conducted.
We developed four biologically based frameworks to identify pairwise interactions among candidate SNP pairs as follows: (1) for each human protein-coding gene, a set of SNPs associated with that gene was constructed providing a gene-based interaction model, (2) for each known biological pathway, a set of SNPs associated with the genes in the pathway was constructed providing a pathway-based interaction model, (3) a set of SNPs associated with genes in a disease-related subnetwork provides a network-based interaction model, and (4) a framework is based on the function of SNPs. The last approach uses expression SNPs (eSNPs or eQTLs), which are SNPs or loci that have defined effects on the abundance of transcripts of other genes. We constructed pairs of eSNPs and SNPs located in the target genes whose expression is regulated by eSNPs. For all four frameworks the SNP sets were exhaustively tested for pairwise interactions within the sets using a traditional logistic regression model after excluding genes that were previously identified to associate with the trait. Using previously published GWAS data for type 2 diabetes (T2D) and the biologically based pair-wise interaction modeling, we identify twelve genes not seen in the previous single locus analysis.
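The benefit of restricting interaction tests to biologically defined SNP sets can be sketched as follows: pairs are enumerated only within each set, and the number of tests (and hence the multiple-testing penalty) shrinks accordingly. The set names, SNP labels, and the use of a Bonferroni threshold are assumptions for illustration; the logistic regression fitted for each pair is not shown.

```python
from itertools import combinations


def candidate_pairs(snp_sets):
    """Unique unordered SNP pairs drawn within each biologically defined set."""
    pairs = set()
    for snps in snp_sets.values():
        for a, b in combinations(sorted(set(snps)), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical sets: one gene-based and one pathway-based
snp_sets = {
    "geneA": ["snp1", "snp2", "snp3"],
    "pathwayX": ["snp2", "snp3", "snp4"],
}

pairs = candidate_pairs(snp_sets)
n_snps = len({s for snps in snp_sets.values() for s in snps})
n_exhaustive = n_snps * (n_snps - 1) // 2  # all pairs without restriction

# Bonferroni threshold over the restricted pairs (an assumed correction)
threshold = 0.05 / len(pairs)
print(len(pairs), n_exhaustive, threshold)
```

With genome-wide SNP counts the gap between restricted and exhaustive pair counts is far larger than in this toy case, which is what relaxes the significance threshold.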
We present four approaches to detect interactions associated with complex diseases. The results show our approaches outperform the traditional single locus approaches in detecting genes that previously did not reach significance; the results also provide novel drug targets and biomarkers relevant to the underlying mechanisms of disease.
Although obstructive sleep apnea (OSA) is known to have a strong familial basis, no genetic polymorphisms influencing apnea risk have been identified in cross-cohort analyses. We utilized the National Heart, Lung, and Blood Institute (NHLBI) Candidate Gene Association Resource (CARe) to identify sleep apnea susceptibility loci. Using a panel of 46,449 polymorphisms from roughly 2,100 candidate genes on a customized Illumina iSelect chip, we tested for association with the apnea hypopnea index (AHI) as well as moderate to severe OSA (AHI≥15) in 3,551 participants of the Cleveland Family Study and two cohorts participating in the Sleep Heart Health Study.
Among 647 African-Americans, rs11126184 in the pleckstrin (PLEK) gene was associated with OSA while rs7030789 in the lysophosphatidic acid receptor 1 (LPAR1) gene was associated with AHI using a chip-wide significance threshold of p-value<2×10⁻⁶. Among 2,904 individuals of European ancestry, rs1409986 in the prostaglandin E2 receptor (PTGER3) gene was significantly associated with OSA. Consistent effects of rs7030789 (LPAR1) and rs1409986 (PTGER3) on apnea phenotypes were observed in independent clinic-based cohorts.
Novel genetic loci for apnea phenotypes were identified through the use of customized gene chips and meta-analyses of cohort data with replication in clinic-based samples. The identified SNPs all lie in genes associated with inflammation suggesting inflammation may play a role in OSA pathogenesis.
Genome-wide genotyping of a cohort using pools rather than individual samples has long been proposed as a cost-saving alternative for performing genome-wide association (GWA) studies. However, successful disease gene mapping using pooled genotyping has thus far been limited to detecting common variants with large effect sizes, which tend not to exist for many complex common diseases or traits. Therefore, for DNA pooling to be a viable strategy for conducting GWA studies, it is important to determine whether commonly used genome-wide SNP array platforms such as the Affymetrix 6.0 array can reliably detect common variants of small effect sizes using pooled DNA. Taking obesity and age at menarche as examples of human complex traits, we assessed the feasibility of genome-wide genotyping of pooled DNA as a single-stage design for phenotype association. By individually genotyping the top associations identified by pooling, we obtained a 14- to 16-fold enrichment of SNPs nominally associated with the phenotype, but we likely missed the top true associations. In addition, we assessed whether genotyping pooled DNA can serve as an inexpensive screen as the second stage of a multi-stage design with a large number of samples by comparing the most cost-effective 3-stage designs with 80% power to detect common variants with genotypic relative risk of 1.1, with and without pooling. Given the current state of the specific technology we employed and the associated genotyping costs, we showed through simulation that a design involving pooling would be 1.07 times more expensive than a design without pooling. Thus, while a significant amount of information exists within the data from pooled DNA, our analysis does not support genotyping pooled DNA as a means to efficiently identify common variants contributing small effects to phenotypes of interest. 
While our conclusions were based on the specific technology and study design we employed, the approach presented here will be useful for evaluating the utility of other or future genome-wide genotyping platforms in pooled DNA studies.
Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome-wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway-based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model provides the ability to borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also carries out additional within-category selection to identify the most important SNPs within each gene set. The proposed model operates in a well-established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures and we illustrate the SPCA method using the Wellcome Trust Case-Control Consortium Crohn Disease (CD) dataset.
SNPs; genome-wide association; pathway analysis; principal component analysis
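The two steps named in the SPCA abstract above (select the SNPs most associated with outcome, then summarize them with a principal component) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `spca_pathway_score`, the use of absolute Pearson correlation as the per-SNP association score, and the fixed `n_keep` cutoff are all assumptions, and the sketch omits covariate adjustment, matching, and the significance test for the pathway.

```python
import numpy as np

def spca_pathway_score(genotypes, outcome, n_keep):
    """Hypothetical sketch: first principal component of the n_keep SNPs
    in a pathway that are most correlated with the outcome."""
    G = np.asarray(genotypes, dtype=float)   # samples x SNPs, coded 0/1/2
    y = np.asarray(outcome, dtype=float)     # one outcome value per sample
    Gc = G - G.mean(axis=0)                  # center each SNP
    yc = y - y.mean()

    # Assumed association score: absolute Pearson correlation with outcome.
    denom = np.sqrt((Gc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf               # monomorphic SNPs score 0
    scores = np.abs(Gc.T @ yc) / denom

    # Supervised step: keep only the top-scoring SNPs.
    selected = np.argsort(scores)[::-1][:n_keep]

    # Unsupervised step: first principal component of the selected SNPs.
    U, S, _ = np.linalg.svd(Gc[:, selected], full_matrices=False)
    latent = U[:, 0] * S[0]
    return np.sort(selected), latent
```

The returned latent variable would then be tested for association with the outcome in a regression model. Because the SNPs were chosen using the outcome, that test cannot be referred to its usual null distribution without a correction.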
Hyper-phosphorylation at the Y705 residue of signal transducer and activator of transcription 3 (STAT3) is implicated in tumorigenesis of leukemia and some solid tumors. However, its role in the development of colorectal cancer (CRC) is not well defined. To rigorously test the impact of this phosphorylation on colorectal tumorigenesis, we engineered a STAT3 Y705F knock-in to interrupt STAT3 activity in HCT116 and RKO CRC cells. These STAT3 Y705F mutant cells fail to respond to cytokine stimulation and grow more slowly than parental cells. These mutant cells are also greatly diminished in their abilities to form colonies in culture, to exhibit anchorage-independent growth in soft agar, and to grow as xenografts in nude mice. These observations strongly support the premise that STAT3 Y705 phosphorylation is crucial in colorectal tumorigenesis. Although it is generally believed that STAT3 functions as a transcription factor, recent studies indicate that transcription-independent functions of STAT3 also play an important role in tumorigenesis. We show here that wild-type STAT3, but not STAT3 Y705F mutant protein, associates with PLCγ1. PLCγ1 is a central signal transducer of growth factor and cytokine signaling pathways that are involved in tumorigenesis. In STAT3 Y705F mutant CRC cells, PLCγ1 activity is reduced. Moreover, over-expression of a constitutively active form of PLCγ1 rescues the transformation defect of STAT3 Y705F mutant cells. In aggregate, our study identifies previously unknown cross-talk between STAT3 and the PLCγ signaling pathways that may play a critical role in colorectal tumorigenesis.
STAT3; PLC; colorectal cancer; phosphorylation; PTPRT
It is generally known that risk variants segregate together with a disease within families, but this information has not been used in the existing statistical methods for detecting rare variants. Here we introduce two weighted sum statistics that can apply to either genome-wide association data or resequencing data for identifying rare disease variants: weights calculated based on sibpairs and odds ratios, respectively. We evaluated the two methods via extensive simulations under different disease models. We compared the proposed methods with the weighted sum statistic (WSS) proposed by Madsen and Browning, keeping the same genotyping or resequencing cost. Our methods clearly demonstrate more statistical power than the WSS. In addition, we found that using sibpair information can increase power over using only unrelated samples by more than 40%. We applied our methods to the Framingham Heart Study (FHS) and Wellcome Trust Case Control Consortium (WTCCC) hypertension datasets. Although we did not identify any genes as reaching a genome-wide significance level, we found variants in the candidate gene angiotensinogen (AGT) significantly associated with hypertension at P = 6.9 × 10−4, whereas the most significant single SNP association evidence is P = 0.063. We further applied the odds ratio weighted method to the IFIH1 gene for type 1 diabetes in the WTCCC data. Our method yielded a P value of 4.82 × 10−4, much more significant than that obtained by haplotype-based methods. We demonstrated that family data are extremely informative in searching for rare variants underlying complex traits, and the odds ratio weighted sum statistic is more efficient than currently existing methods.
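As a rough illustration of an odds ratio weighted sum, the sketch below weights each variant by its log odds ratio and sums the weighted minor-allele counts per individual. The abstract does not give the weighting formula, so everything here is an assumption: the function names, the allele-count 2×2 table, the 0.5 continuity correction, and the use of the log odds ratio as the weight. It also leaves out the sibpair-based weights and the significance test.

```python
import numpy as np

def log_odds_ratio_weights(genotypes, is_case):
    """Hypothetical per-variant weights: log odds ratio of the minor allele,
    cases versus controls, from allele counts with a 0.5 correction."""
    G = np.asarray(genotypes, dtype=float)   # samples x variants, coded 0/1/2
    case = np.asarray(is_case, dtype=bool)
    a = G[case].sum(axis=0)                  # minor alleles in cases
    b = 2 * case.sum() - a                   # major alleles in cases
    c = G[~case].sum(axis=0)                 # minor alleles in controls
    d = 2 * (~case).sum() - c                # major alleles in controls
    # Adding 0.5 to each cell keeps the ratio finite for unobserved alleles.
    return np.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))

def weighted_sum_scores(genotypes, weights):
    """Per-individual burden score: weighted sum of minor-allele counts."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)
```

Weights estimated on the same samples that are then tested would bias the comparison toward significance, so in practice the weights would come from separate data or the statistic would be assessed by permutation.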
Admixture mapping based on recently admixed populations is a powerful method to detect disease variants with substantial allele frequency differences in ancestral populations. We performed admixture mapping analysis for systolic blood pressure (SBP) and diastolic blood pressure (DBP), followed by trait-marker association analysis, in 6303 unrelated African-American participants of the Candidate Gene Association Resource (CARe) consortium. We identified five genomic regions (P < 0.001) harboring genetic variants contributing to inter-individual BP variation. In follow-up association analyses, correcting for all tests performed in this study, three loci were significantly associated with SBP and one significantly associated with DBP (P < 10−5). Further analyses suggested that six independent single-nucleotide polymorphisms (SNPs) contributed to the phenotypic variation observed in the admixture mapping analysis. These six SNPs were examined for replication in multiple, large, independent studies of African-Americans [Women's Health Initiative (WHI), Maywood, Genetic Epidemiology Network of Arteriopathy (GENOA) and Howard University Family Study (HUFS)] as well as one native African sample (Nigerian study), with a total replication sample size of 11 882. Meta-analysis of the replication set identified a novel variant (rs7726475) on chromosome 5 between the SUB1 and NPR3 genes as being associated with SBP and DBP (P < 0.0015 for both); in meta-analyses combining the CARe samples with the replication data, we observed P-values of 4.45 × 10−7 for SBP and 7.52 × 10−7 for DBP for rs7726475 that were significant after accounting for all the tests performed. Our study highlights that admixture mapping analysis can help identify genetic variants missed by genome-wide association studies because of the drastically reduced number of tests in the whole genome.