DNA methylation is a widely studied epigenetic phenomenon; alterations in methylation patterns influence human phenotypes and risk of disease. As part of the Atherosclerosis Risk in Communities (ARIC) study, the Illumina Infinium HumanMethylation450 (HM450) BeadChip was used to measure DNA methylation in peripheral blood obtained from ~3000 African American study participants. Over 480,000 cytosine-guanine (CpG) dinucleotide sites were surveyed on the HM450 BeadChip. To evaluate the impact of technical variation, 265 technical replicates from 130 participants were included in the study.
For each CpG site, we calculated the intraclass correlation coefficient (ICC), which ranges between 0 and 1, to compare the variation of methylation levels within and between replicate pairs. We modeled the distribution of ICC as a mixture of censored or truncated normal and normal distributions using an EM algorithm. The CpG sites were clustered into low- and high-reliability groups according to the calculated posterior probabilities. We also demonstrated the performance of this clustering when applied to a study of association between methylation levels and smoking status of individuals. Of the CpG sites showing genome-wide significant association with smoking status, most (~96%) were in the high-reliability cluster.
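The per-site ICC described above can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC on a balanced design (every participant has the same number of replicates) and clips negative estimates to 0 so the result stays in [0, 1]. The function name `icc_oneway`, the choice of estimator, and the clipping are assumptions made for illustration, not the study's documented procedure.

```python
from statistics import mean


def icc_oneway(replicates):
    """One-way random-effects ICC for one CpG site (balanced design).

    replicates: list of per-participant lists, each holding the same
    number k >= 2 of replicate methylation measurements.
    Assumption: negative estimates are clipped to 0.
    """
    n = len(replicates)
    k = len(replicates[0]) if n else 0
    if n < 2 or k < 2 or any(len(r) != k for r in replicates):
        raise ValueError("need >= 2 participants, each with the same k >= 2 replicates")

    grand = mean(x for r in replicates for x in r)
    means = [mean(r) for r in replicates]

    # Between-participant and within-participant mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for r, m in zip(replicates, means) for x in r) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    if denom == 0:
        return 0.0  # no variation at all: treat as uninformative
    return min(1.0, max(0.0, (msb - msw) / denom))
```

Under these assumptions, a site whose replicates agree closely relative to the spread between participants scores near 1, while a site dominated by technical noise scores near 0.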
We suggest that CpG sites with low ICC be excluded from subsequent association analyses, or that associations at such sites be interpreted with extra caution.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-312) contains supplementary material, which is available to authorized users.
DNA methylation; Infinium 450 K chip; Technical error; Intraclass correlation; Normal mixture models
Overall prognosis for osteosarcoma (OS) is poor despite aggressive treatment options. Limited access to primary tumors, technical challenges in processing OS tissues, and the lack of well-characterized primary cell cultures have hindered our ability to fully understand the properties of OS tumor initiation and progression. In this study, we have isolated and characterized cell cultures derived from four central high-grade human OS samples. Furthermore, we used the cell cultures to study the role of CD49f in OS progression. Recent studies have implicated CD49f in stemness and multipotency of both cancer stem cells and mesenchymal stem cells. Therefore, we investigated the role of CD49f in osteosarcomagenesis. First, single cell suspensions of tumor biopsies were subcultured and characterized for cell surface marker expression. Next, we characterized the growth and differentiation properties, sensitivity to chemotherapy drugs, and anchorage-independent growth. Xenograft assays showed that cell populations expressing the CD49fhi/CD90lo cell phenotype produced an aggressive tumor. Multiple lines of evidence demonstrated that inhibiting CD49f decreased the tumor-forming ability. Furthermore, the CD49fhi/CD90lo cell population generated more aggressive OS tumor growth, indicating that this cell surface marker could be a potential candidate for the isolation of an aggressive cell type in OSs.
Cancer stem cell; CD49f; integrin; osteosarcoma; tumor-initiating cell; tumor progression
Genome-wide association studies (GWAS) have achieved great success in identifying single nucleotide polymorphisms (SNPs, herein called genetic variants) and genes associated with risk of developing prostate cancer. However, GWAS do not typically link the genetic variants to the disease state or inform the broader context in which the genetic variants operate. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to infer the causal association between gene expression and the disease and to identify the network states and biological pathways enriched for genetic variants. We identified gene regulatory networks and biological pathways enriched for genetic variants, including the prostate cancer, IGF-1, JAK2, androgen, and prolactin signaling pathways. The integration of GWAS information with gene expression data provides insights about the broader context in which genetic variants associated with an increased risk of developing prostate cancer operate.
GWAS; genetic variants; gene expression; prostate cancer
Recent advances in high-throughput genotyping have made it possible to identify genetic variants associated with increased risk of developing prostate cancer using genome-wide association studies (GWAS). However, the broader context in which the identified genetic variants operate is poorly understood. Here we present a comprehensive assessment, network, and pathway analysis of the emerging genetic susceptibility landscape of prostate cancer.
We created a comprehensive catalog of genetic variants and associated genes by mining published reports and accompanying websites hosting supplementary data on GWAS. We then performed network and pathway analysis using single nucleotide polymorphism (SNP)-containing genes to identify gene regulatory networks and pathways enriched for genetic variants.
We identified multiple gene networks and pathways enriched for genetic variants including IGF-1, androgen biosynthesis and androgen signaling pathways, and the molecular mechanisms of cancer. The results provide putative functional bridges between GWAS findings and gene regulatory networks and biological pathways.
prostate cancer GWAS network pathway analysis
B-Precursor acute lymphoblastic leukemia (B-ALL) is the most common childhood cancer. Although 80% of B-ALL patients can be cured, significant challenges persist. Significant disparities in clinical outcomes and mortality rates exist between racial/ethnic populations. The objective of this study was to determine whether gene expression levels significantly differ between ethnic populations. We compared gene expression levels between four ethnic populations (Whites, Blacks, Hispanics, and Asians) in the United States. Additionally, we performed network and pathway analysis to identify gene networks and pathways. Gene expression data involved 198 samples distributed as follows: 126 Whites, 51 Hispanics, 13 Blacks, and 8 Asians. We identified 300 highly significantly (P < 0.001) differentially expressed genes between the four ethnic populations. The identified genes included PHF6, BRD3, CRLF2, and RNF135, which have been implicated in pediatric B-ALL. We identified key pathways implicated in B-ALL including the PDGF, PI3/AKT, ERBB2-ERBB3, and IL-15 signaling pathways.
leukemia gene expression variation pediatric B-ALL
Genome-wide association studies (GWAS) have achieved great success in identifying common variants associated with increased risk of developing breast cancer. However, GWAS do not typically provide information about the broader context in which genetic variants operate in different subtypes of breast cancer. The objective of this study was to determine whether genes containing single nucleotide polymorphisms (SNPs, herein called genetic variants) are associated with different subtypes of breast cancer. Additionally, we sought to identify gene regulatory networks and biological pathways enriched for these genetic variants. Using supervised analysis, we identified 201 genes that were significantly associated with the six intrinsic subtypes of breast cancer. The results demonstrate that integrative genomics analysis is a powerful approach for linking GWAS information to distinct disease states and provide insights about the broader context in which genetic variants operate in different subtypes of breast cancer.
GWAS; subtypes; breast cancer
Genome-wide association studies (GWAS) have identified genetic variants associated with an increased risk of developing breast cancer. However, the association of genetic variants and their associated genes with the most aggressive subset of breast cancer, triple-negative breast cancer (TNBC), remains a central puzzle in molecular epidemiology. The objective of this study was to determine whether genes containing single nucleotide polymorphisms (SNPs) associated with an increased risk of developing breast cancer are connected to and could stratify different subtypes of TNBC. Additionally, we sought to identify molecular pathways and networks involved in TNBC. We performed integrative genomics analysis, combining information from GWAS studies involving over 400,000 cases and over 400,000 controls, with gene expression data derived from 124 breast cancer patients classified as TNBC (at the time of diagnosis) and 142 cancer-free controls. Analysis of GWAS reports produced 500 SNPs mapped to 188 genes. We identified a signature of 159 functionally related SNP-containing genes which were significantly (P < 10^-5) associated with and stratified TNBC. Additionally, we identified 97 genes which were functionally related to, and had expression patterns similar to, SNP-containing genes. Network modeling and pathway prediction revealed multi-gene pathways including p53, NFkB, BRCA, apoptosis, DNA repair, DNA mismatch, and excision repair pathways enriched for SNPs mapped to genes significantly associated with TNBC. The results provide convincing evidence that integrating GWAS information with gene expression data provides a unified and powerful approach for biomarker discovery in TNBC.
triple negative breast cancer; GWAS; gene expression
Variable response and resistance to tamoxifen treatment in breast cancer patients remain a major clinical problem. To determine whether genes and biological pathways containing SNPs associated with risk for breast cancer are dysregulated in response to tamoxifen treatment, we performed analysis combining information from 43 genome-wide association studies with gene expression data from 298 ER+ breast cancer patients treated with tamoxifen and 125 ER+ controls. We identified 95 genes which distinguished tamoxifen-treated patients from controls. Additionally, we identified 54 genes which stratified tamoxifen-treated patients into two distinct groups. We identified biological pathways containing SNPs associated with risk for breast cancer, which were dysregulated in response to tamoxifen treatment. Key pathways identified included the apoptosis, P53, NFkB, DNA repair and cell cycle pathways. Combining GWAS with transcription profiling provides a unified approach for associating GWAS findings with response to drug treatment and identification of potential drug targets.
tamoxifen; genome-wide association studies; gene expression
Genome-wide association studies (GWAS) have successfully identified genetic variants associated with risk for breast cancer. However, the molecular mechanisms through which the identified variants confer risk or influence phenotypic expression remain poorly understood. Here, we present a novel integrative genomics approach that combines GWAS information with gene expression data to assess the combined contribution of multiple genetic variants acting within genes and putative biological pathways, and to identify novel genes and biological pathways that could not be identified using traditional GWAS. The results show that genes containing SNPs associated with risk for breast cancer are functionally related and interact with each other in biological pathways relevant to breast cancer. Additionally, we identified novel genes that are co-expressed and interact with genes containing SNPs associated with breast cancer. Integrative analysis combining GWAS information with gene expression data provides functional bridges between GWAS findings and biological pathways involved in breast cancer.
genome-wide association studies; gene expression; pathway
Genome-wide association studies (GWAS) have identified SNPs associated with breast cancer. However, they offer limited insights about the biological mechanisms by which SNPs confer risk. We investigated the association of GWAS information with a major oncogenic pathway in breast cancer, the Notch signaling pathway. We first identified 385 SNPs and 150 genes associated with risk for breast cancer by mining data from 41 GWAS. We then investigated their expression, along with 32 genes involved in the Notch signaling pathway using two publicly available gene expression data sets from the Caucasian (42 cases and 143 controls) and Asian (43 cases and 43 controls) populations. Pathway prediction and network modeling confirmed that Notch receptors and genes involved in the Notch signaling pathway interact with genes containing SNPs associated with risk for breast cancer. Additionally, we identified other SNP-associated biological pathways relevant to breast cancer, including the P53, apoptosis and MAP kinase pathways.
GWAS; gene expression; Notch signaling pathway
Polymorphic variants and mutations disrupting canonical splicing isoforms are among the leading causes of human hereditary disorders. While there is substantial evidence of aberrant splicing causing Mendelian diseases, the implication of such events in multi-genic disorders is yet to be well understood. We have developed a new tool (SpliceScan II) for predicting the effects of genetic variants on splicing and cis-regulatory elements. The novel Bayesian non-canonical 5'GC splice site (SS) sensor used in our tool allows inference on non-canonical exons.
Our tool performed favorably when compared with the existing methods in the context of genes linked to the Autism Spectrum Disorder (ASD). SpliceScan II was able to predict more aberrant splicing isoforms triggered by the mutations, as documented in DBASS5 and DBASS3 aberrant splicing databases, than other existing methods. Detrimental effects behind some of the polymorphic variations previously associated with Alzheimer's and breast cancer could be explained by changes in predicted splicing patterns.
We have developed SpliceScan II, an effective and sensitive tool for predicting the detrimental effects of genomic variants on splicing leading to Mendelian and complex hereditary disorders. The method could potentially be used to screen resequenced patient DNA to identify de novo mutations and polymorphic variants that could contribute to a genetic disorder.
Auxiliary splicing sequences play an important role in ensuring accurate and efficient splicing by promoting or repressing recognition of authentic splice sites. These cis-acting motifs have been termed splicing enhancers and silencers and are located both in introns and exons. They co-evolved into an intricate splicing code together with additional functional constraints, such as tissue-specific and alternative splicing patterns. We used orthologous exons extracted from the University of California Santa Cruz multiple genome alignments of human and 22 Tetrapoda organisms to predict candidate enhancers and silencers that have reproducible and statistically significant bias towards annotated exonic boundaries.
A total of 2,546 Tetrapoda enhancers and silencers were clustered into 15 putative core motifs based on their Markov properties. Most of these elements have been identified previously, but 118 putative silencers and 260 enhancers (~15%) were novel. Examination of previously published experimental data for the presence of predicted elements showed that their mutations in 21/23 (91.3%) cases altered the splicing pattern as expected. Predicted intronic motifs flanking 3' and 5' splice sites had higher evolutionary conservation than other sequences within intronic flanks, and the intronic enhancers differed markedly between 3' and 5' intronic flanks.
The difference in intronic enhancers supporting 5' and 3' splice sites suggests an independent splicing commitment for neighboring exons. Increased evolutionary conservation of ISEs/ISSs within intronic flanks and the effect of modulating predicted elements on splicing suggest functional significance of the identified elements in splicing regulation. Most of the elements identified were shown to have direct implications in human splicing and therefore could be useful for building computational splicing models in biomedical research.
Structural and functional abnormalities have been found in language-related brain regions in patients with schizophrenia. We previously reported findings pointing to differences in word processing between people with schizophrenia and individuals who are at high-risk for schizophrenia using a voxel-based (whole brain) fMRI approach. We now extend this finding to specifically examine functional activity in three language related cortical regions using a larger cohort of individuals.
A visual lexical discrimination task was performed by 36 controls, 21 subjects at high genetic-risk for schizophrenia, and 20 patients with schizophrenia during blood oxygenation level dependent (BOLD) fMRI scanning. Activation in bilateral inferior frontal gyri (Brodmann's area 44-45), bilateral inferior parietal lobe (Brodmann's area 39-40), and bilateral superior temporal gyri (Brodmann's area 22) was investigated. For all subjects, two-tailed Pearson correlations were calculated between the computed laterality index and a series of cognitive test scores determining language functioning.
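The abstract does not state how the laterality index was computed. As a hedged illustration only, the sketch below uses a commonly cited definition, LI = (L - R) / (L + R), where L and R are non-negative activation measures (for example, suprathreshold voxel counts) in the left and right regions of interest. The function name `laterality_index` and the choice of inputs are assumptions, not the study's documented method.

```python
def laterality_index(left, right):
    """Laterality index LI = (L - R) / (L + R).

    left, right: non-negative activation measures for the left and
    right hemisphere regions of interest (assumed inputs).
    Returns a value in [-1, 1]; positive values indicate
    left-lateralized activation, values near 0 bilateral activation.
    """
    if left < 0 or right < 0:
        raise ValueError("activation measures must be non-negative")
    total = left + right
    if total == 0:
        raise ValueError("no activation in either hemisphere")
    return (left - right) / total
```

With this definition, a region with three times as much left as right activation gives `laterality_index(30, 10) == 0.5`, and equal activation on both sides gives 0.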
Regional activation in Brodmann's area 44-45 was left lateralized in normal controls, while high-risk subjects and patients with schizophrenia or schizoaffective disorder showed more bilateral activation. No significant differences among the three diagnostic groups were found in the other two regions of interest (Brodmann's area 22 or areas 39-40). Furthermore, the apparent reasons for loss of leftward language lateralization differed between groups. In high-risk subjects, the loss of lateralization was based on reduced left hemisphere activation, while in the patient group, it was due to increased right side activation. Language ability related cognitive scores were positively correlated with the laterality indices obtained from Brodmann's areas 44-45 in the high-risk group, and with the laterality indices from Brodmann's areas 22 and 44-45 in the patient group.
This study reinforces previous language related imaging studies in high-risk subjects and patients with schizophrenia suggesting that reduced functional lateralization in language related frontal cortex may be a vulnerability marker for schizophrenia. Future studies will determine whether it is predictive of who develops illness.
fMRI; schizophrenia; High risk, genetic; Language lateralization; ROI based study