It is commonly observed that genes show strong statistical interactions associated with diseases of public health importance. Gene interactions can potentially improve disease classification accuracy. They become especially important for classification when differences in gene expression across classes are small. However, most gene selection algorithms in classification analyses focus only on genes whose expression levels differ across classes and ignore the discriminatory information from gene interactions. In this study, we develop a two-stage algorithm that takes gene interactions into account during gene selection. Its main advantage is that, by using the “Bayes error” as the gene selection criterion, it exploits discriminatory information from gene interactions as well as from gene expression differences. Using simulated and real microarray data sets, we demonstrate that gene interactions can improve classification accuracy, and show that the proposed algorithm can yield small informative sets of genes while leading to highly accurate classification results. Thus our study may offer novel insight for future gene selection algorithms for human disease discrimination.
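To illustrate why a Bayes error criterion can reward gene interactions, the following minimal sketch computes the Bayes error in a deliberately simplified setting. It assumes two equally likely classes, two genes with unit variance, a common correlation `rho` in both classes, and Gaussian expression values. These modelling choices and the function names are assumptions made for illustration and are not taken from the study.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def bayes_error_two_genes(d1, d2, rho):
    """Bayes error for two equally likely Gaussian classes (illustrative model).

    d1, d2: mean expression differences of gene 1 and gene 2 between classes.
    rho:    correlation between the two genes (same in both classes, |rho| < 1).
    """
    # Squared Mahalanobis distance for the covariance [[1, rho], [rho, 1]].
    delta_sq = (d1 ** 2 - 2 * rho * d1 * d2 + d2 ** 2) / (1 - rho ** 2)
    # With equal priors and a shared covariance, the error is Phi(-delta / 2).
    return normal_cdf(-math.sqrt(delta_sq) / 2)


# Gene 2 has no mean difference at all, yet its correlation with gene 1 helps.
error_independent = bayes_error_two_genes(0.5, 0.0, 0.0)
error_correlated = bayes_error_two_genes(0.5, 0.0, 0.9)
print(round(error_independent, 3), round(error_correlated, 3))
```

In this toy setting, a gene that shows no expression difference on its own still lowers the Bayes error when it is strongly correlated with an informative gene, which is the kind of information a selection rule based only on marginal differences would miss.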
The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNP), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influences on complex diseases. It is challenging to explore the relationship between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. The conventional CCA method does not work effectively if the number of data samples is significantly smaller than the number of biomarkers, which is typical for genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalization with the l-1 norm (CCA-l1) or a combination of the l-1 and l-2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group).
We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulation and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristic of the selected features. Pathway analysis is further performed for biological interpretation of those features.
The CCA-sparse group method incorporates group effects of features into the correlation analysis while simultaneously performing individual feature selection. On the simulated data, it outperforms the two sCCA methods (CCA-l1 and CCA-group) by identifying correlated features with more true positives while keeping total discordance at a lower level, even if the group effect does not exist or irrelevant features are grouped with truly correlated features. Compared with our proposed CCA-sparse group model, CCA-l1 tends to select fewer truly correlated features, while CCA-group tends to select more redundant features.
Group sparse CCA; Genomic data integration; Feature selection; SNP
The BMP and Wnt/β-catenin signaling pathways cooperatively regulate osteoblast differentiation and bone formation. Although BMP signaling regulates gene expression of the Wnt pathway, much less is known about whether Wnt signaling modulates BMP expression in osteoblasts. Given the presence of putative Tcf/Lef response elements that bind β-catenin/TCF transcription complex in the BMP2 promoter, we hypothesized that the Wnt/β-catenin pathway stimulates BMP2 expression in osteogenic cells. In this study, we showed that Wnt/β-catenin signaling is active in various osteoblast or osteoblast precursor cell lines, including MC3T3-E1, 2T3, C2C12, and C3H10T1/2 cells. Furthermore, crosstalk between the BMP and Wnt pathways affected BMP signaling activity, osteoblast differentiation, and bone formation, suggesting Wnt signaling is an upstream regulator of BMP signaling. Activation of Wnt signaling by Wnt3a or overexpression of β-catenin/TCF4 both stimulated BMP2 transcription at promoter and mRNA levels. In contrast, transcription of BMP2 in osteogenic cells was decreased by either blocking the Wnt pathway with DKK1 and sFRP4, or inhibiting β-catenin/TCF4 activity with FWD1/β-TrCP, ICAT, or ΔTCF4. Using a site-directed mutagenesis approach, we confirmed that Wnt/β-catenin transactivation of BMP2 transcription is directly mediated through the Tcf/Lef response elements in the BMP2 promoter. These results, which demonstrate that the Wnt/β-catenin signaling pathway is an upstream activator of BMP2 expression in osteoblasts, provide novel insights into the nature of functional cross talk integrating the BMP and Wnt/β-catenin pathways in osteoblastic differentiation and maintenance of skeletal homeostasis.
BMP; Wnt/β-catenin; Gene expression; Osteogenesis
There is growing evidence for a link between energy and bone metabolism. The nuclear receptor subfamily 5 member A2 (NR5A2) is involved in lipid metabolism and modulates the expression of estrogen-related genes in some tissues. The objective of this study was to explore the influence of NR5A2 on bone cells and to determine whether its allelic variations are associated with bone mineral density (BMD).
Analyses of gene expression by quantitative PCR and inhibition of NR5A2 expression by siRNAs were used to explore the effects of NR5A2 in osteoblasts. Femoral neck BMD and 30 single nucleotide polymorphisms (SNPs) were first analyzed in 935 postmenopausal women, and the association of NR5A2 genetic variants with BMD was then explored in another 1284 women in replication cohorts.
NR5A2 was highly expressed in bone. The inhibition of NR5A2 confirmed that it modulates the expression of osteocalcin, osteoprotegerin, and podoplanin in osteoblasts. Two SNPs were associated with BMD in the Spanish discovery cohort (rs6663479, P=0.0014, and rs2816948, P=0.0012). A similar trend was observed in another Spanish cohort, with statistically significant differences across genotypes in the combined analysis (P=0.03). However, the association in a cohort from the United States was rather weak. Electrophoretic mobility assays and studies with luciferase reporter vectors confirmed the existence of differences in the binding of nuclear proteins and the transcriptional activity of rs2816948 alleles.
NR5A2 modulates gene expression in osteoblasts and some allelic variants are associated with bone mass in Spanish postmenopausal women.
Deng and Lynch (1, 2) proposed to characterize deleterious genomic mutations from changes in the mean and genetic variance of fitness traits upon selfing in outcrossing populations. Such observations can be readily acquired in cyclical parthenogens. Selfing and life-table experiments were performed for two such Daphnia populations. A significant inbreeding depression and an increase of genetic variance for all traits analyzed were observed. Deng and Lynch's (2) procedures were employed to estimate the genomic mutation rate (U), mean dominance coefficient, selection coefficient, and scaled genomic mutational variance. The estimates of these four parameters are 0.84, 0.30, 0.14 and 4.6E-4, respectively. For the true values, the ... are lower bounds, and ...
The present study searched for replicable risk genomic regions for alcohol and nicotine co-dependence using a genome-wide association strategy. The data contained a total of 3,143 subjects including 818 European-American (EA) cases with alcohol and nicotine co-dependence, 1,396 EA controls, 449 African-American (AA) cases and 480 AA controls. We performed separate genome-wide association analyses in EAs and AAs and a meta-analysis to derive combined p values, and calculated the genome-wide false discovery rate (FDR) for each SNP. Regions with p<5×10−7 together with FDR<0.05 in the meta-analysis were examined to detect all replicable risk SNPs across EAs, AAs and meta-analysis. These SNPs were followed with a series of functional expression quantitative trait locus (eQTL) analyses. We found a unique genome-wide significant gene region – SH3BP5-NR2C2 – that was enriched with 11 replicable risk SNPs for alcohol and nicotine co-dependence. The distributions of −log(p) values for all SNP-disease associations within this region were consistent across EAs, AAs, and meta-analysis (0.315≤r≤0.868; 8.1×10−52≤p≤3.6×10−5). In the meta-analysis, this region was the only association peak throughout chromosome 3 at p<0.0001. All replicable risk markers available for eQTL analysis had nominal cis- and trans-acting regulatory effects on gene expression. The transcript expression of the genes in this region was regulated partly by several nicotine dependence-related genes and significantly correlated with transcript expression of many alcohol and nicotine dependence-related genes. We concluded that the SH3BP5-NR2C2 region on chromosome 3 might harbor causal loci for alcohol and nicotine co-dependence.
GWAS; alcohol and nicotine co-dependence
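The abstract above does not state how the genome-wide FDR was computed for each SNP. As a minimal sketch, assuming the Benjamini-Hochberg procedure (an assumption, not confirmed by the source), per-SNP adjusted values could be obtained as follows. The function name `bh_fdr` and the example p values are hypothetical.

```python
def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p values (assumed FDR method, illustrative)."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four SNPs.
example = [0.01, 0.04, 0.03, 0.005]
print(bh_fdr(example))
```

A SNP would then be retained when both its raw p value and its adjusted value fall below the chosen thresholds.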
Copy number variation (CNV) is an important type of structural variation (SV) in the human genome. Various studies have shown that CNVs are associated with complex diseases. Traditional CNV detection methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution. Next generation sequencing (NGS) promises higher-resolution detection of CNVs, and several methods were recently proposed to realize this promise. However, the performance of these methods is not robust under some conditions; e.g., some of them may fail to detect short CNVs. There has been a strong demand for reliable detection of CNVs from high-resolution NGS data.
A novel and robust method to detect CNVs from short sequencing reads is proposed in this study. The detection of CNVs is modeled as change-point detection in the read depth (RD) signal derived from the NGS data, which is fitted with a total variation (TV) penalized least squares model. The performance (e.g., sensitivity and specificity) of the proposed approach is evaluated by comparison with several recently published methods on both simulated and real data from the 1000 Genomes Project.
The experimental results showed that, in contrast to several existing methods, both the true positive rate and the false positive rate of the proposed detection method do not change significantly for CNVs with different copy numbers and lengths. Therefore, our proposed approach results in more reliable detection of CNVs than the existing methods.
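A total variation penalized least squares model of the kind mentioned above is commonly written in the following form. This is a sketch of the standard formulation and not necessarily the exact objective used in the study. The notation is assumed here: y_i is the read depth in bin i, x_i is the fitted piecewise-constant signal, n is the number of bins, and λ is the penalty weight.

```latex
\min_{x \in \mathbb{R}^{n}} \; \frac{1}{2} \sum_{i=1}^{n} (y_i - x_i)^2
  + \lambda \sum_{i=1}^{n-1} \lvert x_{i+1} - x_i \rvert
```

Under this formulation, positions where x_{i+1} differs from x_i are candidate change points, that is, candidate CNV boundaries, and a larger λ yields fewer of them.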
Therapeutic interventions in prediabetes are important in the primary prevention of type 2 diabetes (T2D) and its chronic complications. However, little is known about the pharmacogenetic effect of traditional herbs on prediabetes treatment. A total of 194 impaired glucose tolerance (IGT) subjects were treated with traditional hypoglycemic herbs (Tianqi Jiangtang) for 12 months in this study. DNA samples were genotyped for 184 mutations in 34 genes involved in drug metabolism or transportation. Multinomial logistic regression analysis indicated that rs1142345 (A > G) in the thiopurine S-methyltransferase (TPMT) gene was significantly associated with the hypoglycemic effect of the drug (P = 0.001, FDR P = 0.043). The “G” allele frequencies of rs1142345 in the healthy (subjects reverted from IGT to normal glucose tolerance), maintenance (subjects still had IGT), and deterioration (subjects progressed from IGT to T2D) groups were 0.094, 0.214, and 0.542, respectively. Binary logistic regression analysis indicated that rs1142345 was also significantly associated with the hypoglycemic effect of the drug between the healthy and maintenance groups (P = 0.027, OR = 4.828) and between the healthy and deterioration groups (P = 0.001, OR = 7.811). Therefore, rs1142345 was associated with the clinical effect of traditional hypoglycemic herbs. Results also suggested that TPMT was probably involved in the pharmacological mechanisms of T2D.
Copy number variation (CNV) has played an important role in studies of susceptibility or resistance to complex diseases. Traditional methods such as fluorescence in situ hybridization (FISH) and array comparative genomic hybridization (aCGH) suffer from low resolution of genomic regions. Following the emergence of next generation sequencing (NGS) technologies, CNV detection methods based on the short read data have recently been developed. However, due to the relatively young age of the procedures, their performance is not fully understood. To help investigators choose suitable methods to detect CNVs, comparative studies are needed. We compared six publicly available CNV detection methods: CNV-seq, FREEC, readDepth, CNVnator, SegSeq and event-wise testing (EWT). They are evaluated both on simulated and real data with different experiment settings. The receiver operating characteristic (ROC) curve is employed to demonstrate the detection performance in terms of sensitivity and specificity, box plot is employed to compare their performances in terms of breakpoint and copy number estimation, Venn diagram is employed to show the consistency among these methods, and F-score is employed to show the overlapping quality of detected CNVs. The computational demands are also studied. The results of our work provide a comprehensive evaluation on the performances of the selected CNV detection methods, which will help biological investigators choose the best possible method.
Genotype imputation is an important tool in human genetics studies, which uses reference sets with known genotypes and prior knowledge on linkage disequilibrium and recombination rates to infer un-typed alleles for human genetic variations at a low cost. The reference sets used by current imputation approaches are based on HapMap data, and/or based on recently available next-generation sequencing (NGS) data such as data generated by the 1000 Genomes Project. However, because coverage and call rates differ across NGS data sets, integrating NGS data sets of different accuracy with previously available reference data as imputation references is not an easy task and has not been systematically investigated. In this study, we performed a comprehensive assessment of three strategies for using NGS data and previously available reference data in genotype imputation, for both simulated data and empirical data, in order to obtain guidelines for optimal reference set construction. Briefly, we considered three strategies: strategy 1 uses one NGS data set as a reference; strategy 2 imputes samples by using multiple individual data sets of different accuracy as independent references and then combines the imputed samples, selecting samples based on the high accuracy reference when overlapping occurs; and strategy 3 combines multiple available data sets as a single reference after imputing each other. We used three software packages (MACH, IMPUTE2 and BEAGLE) to assess the performance of these three strategies. Our results show that strategy 2 and strategy 3 have higher imputation accuracy than strategy 1. In particular, strategy 2 is the best strategy across all the conditions that we investigated, producing the best imputation accuracy for rare variants. Our study is helpful in guiding the application of imputation methods in next generation association analyses.
Previous studies using SAGE (the Study of Addiction: Genetics and Environment) and COGA (the Collaborative Study on the Genetics of Alcoholism) genome-wide association study (GWAS) data sets reported several risk loci for alcohol dependence (AD), which have not yet been well replicated independently or confirmed by functional studies. We combined these two data sets, now publicly available, to increase the study power, in order to identify replicable, functional, and significant risk regions for AD. A total of 4116 subjects (1409 European-American (EA) cases with AD, 1518 EA controls, 681 African-American (AA) cases, and 508 AA controls) underwent association analysis. An additional 443 subjects underwent expression quantitative trait locus (eQTL) analysis. Genome-wide association analysis was performed in EAs to identify significant risk genes. All available markers in the genome-wide significant risk genes were tested in AAs for associations with AD, and in six HapMap populations and two European samples for associations with gene expression levels. We identified a unique genome-wide significant gene—KIAA0040—that was enriched with many replicable risk SNPs for AD, all of which had significant cis-acting regulatory effects. The distributions of −log(p) values for SNP-disease and SNP-expression associations for all markers in the TNN–KIAA0040 region were consistent across EAs, AAs, and five HapMap populations (0.369⩽r⩽0.824; 2.8 × 10−9⩽p⩽0.032). The most significant SNPs in these populations were in high LD, concentrating in KIAA0040. Finally, expression of KIAA0040 was significantly (1.2 × 10−11⩽p⩽1.5 × 10−6) associated with the expression of numerous genes in the neurotransmitter systems or metabolic pathways previously associated with AD. We concluded that KIAA0040 might harbor a causal variant for AD and thus might directly contribute to risk for this disorder. 
KIAA0040 might also contribute to the risk of AD via neurotransmitter systems or metabolic pathways that have previously been implicated in the pathophysiology of AD. Alternatively, KIAA0040 might regulate the risk via some interactions with flanking genes TNN and TNR. TNN is involved in neurite outgrowth and cell migration in hippocampal explants, and TNR is an extracellular matrix protein expressed primarily in the central nervous system.
risk region; alcohol dependence; cis-eQTL; GWAS; alcohol & alcoholism; neurogenetics; addiction & substance abuse; biological psychiatry
Uncovering the genetic determination of complex diseases caused by rare variants has been a research focus. As the vast majority of genomic variants represent background variation, highlighting potentially causal mutations through a weighting scheme is critical to the success of association studies aimed at rare variants. In this study, we propose a novel Bayesian marker selection approach to perform a weighting-based association test. In this approach, the individual association signal and its direction are used to weight variants. In addition, the predicted biological function of variants is taken as prior information to direct the selection of likely causal variants. Simulation studies show that the proposed method has improved power over several existing methods under certain conditions. Analyses of two empirical datasets demonstrate its applicability.
weighting; Bayesian marker selection; rare variants; association
Osteoporosis (OP) is characterized by low bone mineral density (BMD) and has strong genetic determination. However, specific genetic variants influencing BMD and contributing to the pathogenesis of osteoporosis are largely uncharacterized. Current genetic studies in the bone field aimed at identifying OP risk genes mostly focus on the DNA, RNA, or protein level individually, and lack integrative evidence from the three levels of genetic information flow to confidently ascertain the significance of genes for osteoporosis. Our previous proteomics study discovered that superoxide dismutase 2 (SOD2) in circulating monocytes (CMCs, i.e., potential osteoclast precursors) was significantly up-regulated at the protein level in vivo in Chinese with low vs. high hip BMD. Herein, at the mRNA level, we found that SOD2 gene expression was also up-regulated in CMCs (p < 0.05) in Chinese with low vs. high hip BMD. At the DNA level, in 1,627 unrelated Chinese subjects, we identified eight SNPs at the SOD2 gene locus that were suggestively associated with hip BMD (peak signal at rs11968525, p = 0.048). Among the eight SNPs, three (rs7754103, rs7754295, and rs2053949) were associated with SOD2 mRNA expression level (p < 0.05), suggesting that they are expression quantitative trait loci (eQTLs) regulating SOD2 gene expression. In conclusion, the present integrative evidence from the DNA, RNA, and protein levels supports SOD2 as a susceptibility gene for osteoporosis.
Osteoporosis; SOD2; eQTL; BMD
Various types of genomic data (e.g., SNPs and mRNA transcripts) have been employed to identify risk genes for complex diseases. However, the analysis of these data has largely been performed in isolation. Combining these multiple data types for integrative analysis can take advantage of complementary information and thus can have higher power to identify genes (and/or their functions) that would otherwise be impossible to identify with individual data analysis. Due to the different nature, structure, and format of diverse sets of genomic data, multiple genomic data integration is challenging. Here we address the problem by developing a sparse representation based clustering (SRC) method for integrative data analysis. As an example, we applied the SRC method to the integrative analysis of 376821 SNPs in 200 subjects (100 cases and 100 controls) and expression data for 22283 genes in 80 subjects (40 cases and 40 controls) to identify significant genes for osteoporosis (OP). Comparing our results with previous studies, we identified some genes known to be related to OP risk (e.g., ‘THSD4’, ‘CRHR1’, ‘HSD11B1’, ‘THSD7A’, ‘BMPR1B’, ‘ADCY10’, ‘PRL’, ‘CA8’, ‘ESRRA’, ‘CALM1’, ‘CALM1’, ‘SPARC’, and ‘LRP1’). Moreover, we uncovered novel osteoporosis susceptibility genes (‘DICER1’, ‘PTMA’, etc.) that were not found previously but, according to existing studies, play functionally important roles in osteoporosis etiology. In addition, the genes identified by the SRC method can lead to higher accuracy for the diagnosis/classification of osteoporosis subjects when compared with the traditional T-test and Fisher-exact test, which further validates the proposed SRC approach for integrative analysis.
Many lines of evidence suggest that mitochondrial DNA (mtDNA) variants are involved in the pathogenesis of human complex diseases, especially age-related disorders. Osteoporosis is a typical age-related complex disease. However, the role of mtDNA variants in susceptibility to osteoporosis is largely unknown. In this study, we performed a mitochondria-wide association study for osteoporosis in Caucasians. A total of 445 mitochondrial single nucleotide polymorphisms (mtSNPs) were genotyped in a large sample of 2,286 unrelated Caucasian subjects by using the Affymetrix Genome-Wide SNP Array 6.0, and 72 mtSNPs survived the quality control. We first tested for association between single mtSNPs and bone mineral density (BMD), and found that an mtSNP within the NADH dehydrogenase 2 gene (ND2), the mt4823 C/A polymorphism, was strongly associated with hip BMD (P = 2.05 × 10−4), even after conservative Bonferroni correction. The C allele of mt4823 was associated with reduced hip BMD and the effect size (β) was estimated to be ~0.044. Another SNP, mt15885, within the Cytochrome b gene (Cytb) was found to be associated with both spine (P = 1.66×10−3) and hip BMD (P = 0.023). The T allele of mt15885 had a protective effect on spine (β = 0.064) and hip BMD (β = 0.038). Next, we classified subjects into the nine common European haplogroups and conducted association analyses. Subjects classified as haplogroup X had significantly lower mean hip BMD values than others (P = 0.040). Our results highlight the importance of mtDNA variants in influencing BMD variation and risk of osteoporosis.
mtSNP; haplogroup; osteoporosis; BMD; association
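The Bonferroni statement in the abstract above can be checked with simple arithmetic. The sketch below assumes a family-wise error rate of 0.05 and a correction over the 72 mtSNPs that passed quality control. Both the error rate and the number of tests actually used in the study are assumptions here, and `bonferroni_threshold` is a hypothetical helper name.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


n_tests = 72        # mtSNPs that survived quality control (from the abstract)
alpha = 0.05        # assumed family-wise error rate
p_mt4823 = 2.05e-4  # reported P value for mt4823 and hip BMD

threshold = bonferroni_threshold(alpha, n_tests)
print(f"Bonferroni threshold: {threshold:.2e}")
print("mt4823 passes:", p_mt4823 < threshold)
```

Under these assumptions the threshold is about 6.9 × 10−4, so the reported P value for mt4823 remains below it.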
Motivation: Several new de novo assembly tools have been developed recently to assemble short sequencing reads generated by next-generation sequencing platforms. However, the performance of these tools under various conditions has not been fully investigated, and sufficient information is not currently available for informed decisions to be made regarding the tool that would be most likely to produce the best performance under a specific set of conditions.
Results: We studied and compared the performance of commonly used de novo assembly tools specifically designed for next-generation sequencing data, including SSAKE, VCAKE, Euler-sr, Edena, Velvet, ABySS and SOAPdenovo. Tools were compared using several performance criteria, including N50 length, sequence coverage and assembly accuracy. Various properties of read data, including single-end/paired-end, sequence GC content, depth of coverage and base calling error rates, were investigated for their effects on the performance of different assembly tools. We also compared the computation time and memory usage of these seven tools. Based on the results of our comparison, the relative performance of individual tools is summarized and tentative guidelines for optimal selection of different assembly tools, under different conditions, are provided.
Supplementary information: Supplementary data are available at Bioinformatics online.
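For readers unfamiliar with the N50 length used as a performance criterion above, the following minimal sketch computes it under the usual definition: the contig length L such that contigs of length at least L cover at least half of the total assembled bases. The function name `n50` and the example lengths are illustrative and not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 length of an assembly (0 for an empty assembly)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    # Accumulate contigs from longest to shortest until half the bases are covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths: total is 54, and 10 + 9 + 8 = 27 reaches half.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A larger N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why accuracy and coverage are assessed separately.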
Genotype imputation is often used in the meta-analysis of genome-wide association studies (GWAS), for combining data from different studies and/or genotyping platforms, in order to improve the ability to detect disease variants with small to moderate effects. However, how genotype imputation affects the performance of the meta-analysis of GWAS is largely unknown. In this study, we investigated the effects of genotype imputation on the performance of meta-analysis through simulations based on empirical data from the Framingham Heart Study. We found that when fixed-effects models were used, considerable between-study heterogeneity was detected when causal variants were typed in only some but not all individual studies, resulting in up to ∼25% reduction of detection power. In certain situations, the power of the meta-analysis can be even lower than that of individual studies. Additional analyses showed that the detection power was slightly improved when between-study heterogeneity was partially controlled through the random-effects model, relative to that of the fixed-effects model. Our study may aid in the planning, data analysis, and interpretation of GWAS meta-analysis results when genotype imputation is necessary.
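The contrast between fixed-effects and random-effects models can be made concrete with a small sketch. It assumes inverse-variance weighting for the fixed-effects model and the DerSimonian-Laird estimator of between-study variance for the random-effects model. The abstract does not say which estimators were used, so both choices, the function names, and the example numbers are assumptions.

```python
import math


def fixed_effects(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def random_effects(betas, ses):
    """DerSimonian-Laird random-effects pooled effect, standard error, and tau^2."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta_fixed, _ = fixed_effects(betas, ses)
    # Cochran's Q measures between-study heterogeneity.
    q = sum(w * (b - beta_fixed) ** 2 for w, b in zip(weights, betas))
    c = total - sum(w ** 2 for w in weights) / total
    tau_sq = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    # Re-weight each study with the between-study variance added.
    star = [1.0 / (se ** 2 + tau_sq) for se in ses]
    beta = sum(w * b for w, b in zip(star, betas)) / sum(star)
    return beta, math.sqrt(1.0 / sum(star)), tau_sq


# Two hypothetical studies with discordant effect estimates.
print(fixed_effects([0.1, 0.3], [0.1, 0.1]))
print(random_effects([0.1, 0.3], [0.1, 0.1]))
```

When the study estimates disagree, the random-effects standard error is larger than the fixed-effects one because the between-study variance is added to each study's own variance.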
MicroRNAs (miRNAs) regulate posttranscriptional gene expression usually by binding to 3'-untranslated regions (3'-UTRs) of target messenger RNAs (mRNAs). Hence genetic polymorphisms in 3'-UTRs of mRNAs may alter the binding affinity between miRNAs and target 3'-UTRs, thereby altering translational regulation of target mRNAs and/or degradation of mRNAs, leading to differential protein expression of target genes. Based on a database that catalogues predicted polymorphisms in miRNA target sites (poly-miRTSs), we selected 568 polymorphisms within 3'-UTRs of target mRNAs and performed association analyses between these selected poly-miRTSs and osteoporosis in 997 white subjects who were genotyped by Affymetrix Human Mapping 500K arrays. Initial discovery (in the 997 subjects) and replication (in 1728 white subjects) association analyses identified three poly-miRTSs (rs6854081, rs1048201, and rs7683093) in the fibroblast growth factor 2 (FGF2) gene that were significantly associated with femoral neck bone mineral density (BMD). These three poly-miRTSs serve as potential binding sites for 9 miRNAs (e.g., miR-146a and miR-146b). Further gene expression analyses demonstrated that the FGF2 gene was differentially expressed between subjects with high versus low BMD in three independent sample sets. Our initial and replicate association studies and subsequent gene expression analyses support the conclusion that these three polymorphisms of the FGF2 gene may contribute to susceptibility to osteoporosis, most likely through their effects on altered binding affinity for specific miRNAs. © 2011 American Society for Bone and Mineral Research.
MICRORNA; OSTEOPOROSIS; ASSOCIATION; POLYMORPHISM
DNA microarray gene expression and microarray based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we propose a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and to project the data onto statistically independent biological processes. Next we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.
Clustering Technique; Comparative Genomic Hybridization (CGH); Copy Number Variation (CNV); Generalized Singular Value Decomposition (GSVD); Gene Expression; Gene Shaving; Independent Component Analysis (ICA)
Menarche and menopause mark the lower and upper limits of the female reproductive period. The timing of these events influences women's health in later life. The onsets of menarche and menopause have a strong genetic basis. We tested two genes, TNFRSF11A (RANK) and TNFSF11 (RANKL), for their association with age at menarche (AM) and age at natural menopause (ANM).
Nineteen SNPs of TNFRSF11A and 12 SNPs of TNFSF11 were genotyped in a random sample of 306 unrelated white women. This sample was analyzed for association of the SNPs and common haplotypes with AM. Then a subsample of 211 females with natural menopause was analyzed for association of both genes with ANM. Smoking, alcohol intake and duration of lactation were applied as covariates in the association analyses.
Three polymorphisms of TNFSF11 were associated with AM: rs2200287 (P = 0.005), rs9525641 (P = 0.039), and rs1054016 (P = 0.047). Two SNPs of this gene, rs346578 and rs9525641, showed association with ANM (P = 0.007 and P = 0.011, respectively). Two SNPs of TNFRSF11A were associated with AM (rs3826620, P = 0.022) and ANM (rs8086340, P = 0.015). Multiple SNP/SNP and SNP/environment interaction effects on AM and ANM were detected for both genes. One polymorphism of TNFRSF11A, rs4436867, was not directly associated with either trait, but showed significant interactions with four TNFSF11 polymorphisms on ANM. Two other TNFRSF11A polymorphisms, rs4941125 and rs7235803, showed interaction effects with several TNFSF11 polymorphisms on AM. Both genes manifested significant interaction with the duration of breastfeeding in their effect on ANM.
The TNFRSF11A and TNFSF11 genes are associated with the onset of AM and ANM in white women.
age at natural menopause; age at menarche; association; TNFRSF11A; TNFSF11; polymorphisms; haplotypes
Bone mineral density (BMD) measured at the femoral neck (FN) is the most important risk phenotype for osteoporosis and has been used as a reference standard for describing osteoporosis. The specific genes influencing FN BMD remain largely unknown. To identify such genes, we first performed a genome-wide association (GWA) analysis for FN BMD in a discovery sample consisting of 983 unrelated white subjects. We then tested the top significant single-nucleotide polymorphisms (SNPs; 175 SNPs with p < 5 × 10^−4) for replication in a family-based sample of 2557 white subjects. Combining results from these two samples, we found that two genes, parathyroid hormone (PTH) and interleukin 21 receptor (IL21R), achieved consistent association results in both the discovery and replication samples. The PTH gene SNPs, rs9630182, rs2036417, and rs7125774, achieved p values of 1.10 × 10^−4, 3.24 × 10^−4, and 3.06 × 10^−4, respectively, in the discovery sample; p values of 6.50 × 10^−4, 5.08 × 10^−3, and 5.68 × 10^−3, respectively, in the replication sample; and combined p values of 3.98 × 10^−7, 9.52 × 10^−6, and 1.05 × 10^−5, respectively, in the total sample. The IL21R gene SNPs, rs8057551, rs8061992, and rs7199138, achieved p values of 1.51 × 10^−4, 1.53 × 10^−4, and 3.88 × 10^−4, respectively, in the discovery sample; p values of 2.36 × 10^−3, 6.74 × 10^−3, and 6.41 × 10^−3, respectively, in the replication sample; and combined p values of 2.31 × 10^−6, 8.62 × 10^−6, and 1.41 × 10^−5, respectively, in the total sample. The effect size of each SNP, estimated in the discovery sample, was approximately 0.11 SD. PTH and IL21R both have potential biologic functions important to bone metabolism. Overall, our findings provide some new clues to the understanding of the genetic architecture of osteoporosis. © 2010 American Society for Bone and Mineral Research.
genome-wide association; BMD; PTH; IL21R; osteoporosis
GPRC6A is a widely expressed orphan G protein–coupled receptor that senses extracellular amino acids, osteocalcin, and divalent cations in vitro. GPRC6A null (GPRC6A−/−) mice exhibit multiple metabolic abnormalities including osteopenia. To investigate whether the osseous abnormalities are a direct function of GPRC6A in osteoblasts, we examined the function of primary osteoblasts and bone marrow stromal cell cultures (BMSCs) in GPRC6A−/− mice. We confirmed that GPRC6A−/− mice exhibited a decrease in bone mineral density (BMD) associated with reduced expression of osteocalcin, ALP, osteoprotegerin, and Runx2-II transcripts in bone. Osteoblasts and BMSCs derived from GPRC6A−/− mice exhibited an attenuated response to extracellular calcium-stimulated extracellular signal-related kinase (ERK) activation, diminished alkaline phosphatase (ALP) expression, and impaired mineralization ex vivo. In addition, siRNA-mediated knockdown of GPRC6A in MC3T3 osteoblasts also resulted in a reduction in extracellular calcium-stimulated ERK activity. To explore the potential relevance of GPRC6A function in humans, we looked for an association between GPRC6A gene polymorphisms and BMD in a sample of 1000 unrelated American Caucasians. We found that GPRC6A gene polymorphisms were significantly associated with human spine BMD. These data indicate that GPRC6A directly participates in the regulation of osteoblast-mediated bone mineralization and may mediate the anabolic effects of extracellular amino acids, osteocalcin, and divalent cations in bone. © 2010 American Society for Bone and Mineral Research.
GPRC6A; G protein–coupled receptor (GPCR); osteoblast; bone mineral density; gene polymorphisms
Because they combine the genetic information of multiple loci, multilocus association studies (MLAS) are expected to be more powerful than single-locus association studies (SLAS) in disease gene mapping. However, some researchers have found that MLAS had similar or reduced power relative to SLAS, which was partly attributed to the increased degrees of freedom (dfs) in MLAS. Based on partial least-squares (PLS) analysis, we develop an MLAS approach that avoids large dfs. In this approach, genotypes are first decomposed into PLS components that not only capture the majority of the genetic information of multiple loci but also are relevant for target traits. The extracted PLS components are then regressed on target traits to detect association under multilinear regression. Simulation studies based on real data from the HapMap project were used to assess the performance of our PLS-based MLAS as well as other popular multilinear regression-based MLAS approaches under various scenarios, considering genetic effects and the linkage disequilibrium structure of candidate genetic regions. Using the PLS-based MLAS approach, we conducted a genome-wide MLAS of lean body mass and compared it with our previous genome-wide SLAS of lean body mass. Results from the simulations and real data analyses support the improved power of our PLS-based MLAS in disease gene mapping relative to the three other MLAS approaches investigated in this study. We aim to provide an effective and powerful MLAS approach, which may help to overcome the limitations of SLAS in disease gene mapping.
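As a rough illustration of the idea in the abstract above, the sketch below extracts a small number of PLS components from a genotype matrix and then fits a linear regression between the components and the trait, so the test uses only as many degrees of freedom as there are components. Several things here are assumptions rather than the authors' method: the NIPALS form of single-trait PLS, the choice to regress the trait on the components, the F statistic as the association measure, and the hypothetical helper names `pls1_scores`, `association_f`, and `simulate`.

```python
import random

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_scores(genotypes, trait, n_comp):
    """NIPALS PLS for one trait: return up to n_comp orthogonal score vectors."""
    n_loci = len(genotypes[0])
    cols = [center([row[j] for row in genotypes]) for j in range(n_loci)]
    resid = center(trait)
    scores = []
    for _ in range(n_comp):
        weights = [dot(col, resid) for col in cols]       # covariance direction
        norm = dot(weights, weights) ** 0.5
        if norm < 1e-12:
            break
        weights = [w / norm for w in weights]
        t = [sum(w * col[i] for w, col in zip(weights, cols))
             for i in range(len(resid))]                  # component scores
        tt = dot(t, t)
        deflated = []
        for col in cols:                                  # remove t from each locus
            load = dot(col, t) / tt
            deflated.append([c - load * s for c, s in zip(col, t)])
        cols = deflated
        q = dot(resid, t) / tt
        resid = [r - q * s for r, s in zip(resid, t)]     # remove t from the trait
        scores.append(t)
    return scores

def association_f(scores, trait):
    """F statistic and R^2 for regressing the trait on the PLS scores.

    The scores are mutually orthogonal and centered, so the regression
    sum of squares is a sum of per-component terms.
    """
    n, k = len(trait), len(scores)
    y = center(trait)
    sst = dot(y, y)
    ssr = sum(dot(y, t) ** 2 / dot(t, t) for t in scores)
    f_stat = (ssr / k) / ((sst - ssr) / (n - k - 1))
    return f_stat, ssr / sst

def simulate(n, n_loci, causal, seed):
    """Toy data (assumption): independent loci, one additive causal locus."""
    rng = random.Random(seed)
    geno = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_loci)]
            for _ in range(n)]
    trait = [2.0 * row[causal] + rng.gauss(0.0, 1.0) for row in geno]
    return geno, trait
```

For example, `association_f(pls1_scores(X, y, 2), y)` tests ten loci with two numerator degrees of freedom instead of ten. Converting the F statistic to a p value would need an F-distribution routine, which is omitted here.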
Osteoporosis is characterized mainly by low bone mineral density (BMD). Many cytokines and chemokines have been linked to bone metabolism. Monocytes in the immune system are important sources of cytokines and chemokines for bone metabolism. However, no study has simultaneously investigated the in vivo expression of a large number of these factors in human monocytes in relation to osteoporosis. This study explored the in vivo expression pattern of general cytokines, chemokines, and their receptor genes in human monocytes and validated the significant genes by qRT-PCR and genetic association analyses. Expression profiling was performed in monocyte samples from 26 Chinese and 20 Caucasian premenopausal women with discordant BMD. Genome-wide association analysis with BMD variation was conducted in 1000 unrelated Caucasians. We selected 168 cytokines, chemokines, osteoclast-related factors, and their receptor genes for analyses. Notably, the signal transducer and activator of transcription 1 (STAT1) gene was upregulated in the low versus the high BMD groups in both Chinese and Caucasians. We also found a significant association of the STAT1 gene with BMD variation in the 1000 Caucasians. Thus we conclude that the STAT1 gene in human circulating monocytes is important in the etiology of osteoporosis. © 2010 American Society for Bone and Mineral Research.
STAT1; BMD; monocytes; osteoporosis; microarray; SNP