A number of genetic studies have suggested numerous susceptibility genes for dental caries over the past decade, with few definite conclusions. The rapid accumulation of relevant information, along with the complex architecture of the disease, provides a challenging but also unique opportunity to review and integrate the heterogeneous data for follow-up validation and exploration. In this study, we collected and curated candidate genes from four major categories: association studies, linkage scans, gene expression analyses, and literature mining. Candidate genes were prioritized according to the magnitude of evidence related to dental caries. We then searched for dense modules enriched with the prioritized candidate genes through their protein-protein interactions (PPIs). We identified 23 modules comprising 53 genes. Functional analyses of these 53 genes revealed three major clusters: cytokine network-relevant genes, the matrix metalloproteinase (MMP) family, and the transforming growth factor-beta (TGF-β) family, all of which have previously been implicated in tooth development and carious lesions. Through our extensive data collection and an integrative application of gene prioritization and PPI network analyses, we built a dental caries-specific sub-network for the first time. Our study provided insights into the molecular mechanisms underlying dental caries. The framework we proposed in this work can be applied to other complex diseases.
Genetic factors underlying trait neuroticism, reflecting a tendency towards negative affective states, may overlap with genetic susceptibility for anxiety disorders and help explain the extensive comorbidity amongst internalizing disorders. Genome-wide linkage (GWL) data from several studies of neuroticism and anxiety disorders have been published, providing an opportunity to test such hypotheses and identify genomic regions that harbor genes common to these phenotypes. In all, 11 independent GWL studies of either neuroticism (n=8) or anxiety disorders (n=3) were collected, which comprised 5341 families with 15 529 individuals. The rank-based genome scan meta-analysis (GSMA) approach was used to analyze each trait separately and combined, and global correlations between results were examined. False discovery rate (FDR) analysis was performed to test for enrichment of significant effects. Using 10 cM intervals, bins nominally significant for both GSMA statistics, PSR and POR, were found on chromosomes 9, 11, 12, and 14 for neuroticism and on chromosomes 1, 5, 15, and 16 for anxiety disorders. Genome-wide, the results for the two phenotypes were significantly correlated, and a combined analysis identified additional nominally significant bins. Although none reached genome-wide significance, an excess of significant PSR P-values was observed, with 12 bins falling under an FDR threshold of 0.50. As demonstrated by our identification of multiple, consistent signals across the genome, meta-analytically combining existing GWL data is a valuable approach to narrowing down regions relevant for anxiety-related phenotypes. This may prove useful for prioritizing emerging genome-wide association data for anxiety disorders.
anxiety; neuroticism; panic disorder; linkage; meta-analysis
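The rank-based idea behind GSMA can be illustrated with a minimal sketch. It assumes each study has already been reduced to one maximum linkage score per bin, that studies are unweighted, and that there are no tied scores; the bin labels, scores, and function names are hypothetical and are not taken from the studies analyzed above. A summed-rank P-value is estimated by drawing each study's rank for a bin uniformly at random.

```python
import random

def rank_bins(scores):
    """Rank bins within one study: the highest score gets the highest rank."""
    order = sorted(scores, key=scores.get)
    return {bin_name: rank for rank, bin_name in enumerate(order, start=1)}

def summed_ranks(studies):
    """Sum each bin's within-study rank across all studies (unweighted)."""
    ranked = [rank_bins(scores) for scores in studies]
    return {b: sum(r[b] for r in ranked) for b in ranked[0]}

def summed_rank_pvalue(studies, bin_name, n_perm=2000, seed=0):
    """Empirical P-value: how often a random summed rank is >= the observed one."""
    rng = random.Random(seed)
    observed = summed_ranks(studies)[bin_name]
    n_bins = len(studies[0])
    hits = 0
    for _ in range(n_perm):
        # Under the null, a bin's rank in each study is uniform on 1..n_bins.
        total = sum(rng.randint(1, n_bins) for _ in studies)
        if total >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy input: two studies, three bins each.
toy_studies = [
    {"bin_a": 1.0, "bin_b": 2.0, "bin_c": 3.0},
    {"bin_a": 0.5, "bin_b": 3.0, "bin_c": 2.0},
]
print(summed_ranks(toy_studies))
```

Real analyses typically handle ties, optional study weights, and the ordered-rank statistic as well; those are omitted here to keep the sketch short.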
Antipsychotic drugs are medications commonly used for schizophrenia (SCZ) treatment and fall into two groups: typical and atypical. SCZ patients have multiple comorbidities, and the coadministration of drugs is quite common. This may result in adverse drug-drug interactions, which are events that occur when the effect of a drug is altered by the coadministration of another drug. Therefore, it is important to provide a comprehensive view of these interactions for further coadministration improvement. Here, we extracted SCZ drugs and their adverse drug interactions from DrugBank and compiled an SCZ-specific adverse drug interaction network. This network included 28 SCZ drugs, 241 non-SCZ drugs, and 991 interactions. By integrating the Anatomical Therapeutic Chemical (ATC) classification with the network analysis, we characterized those interactions. Our results indicated that SCZ drugs tended to have more adverse drug interactions than other drugs. Furthermore, typical SCZ drugs had significant interactions with drugs of the “alimentary tract and metabolism” category, while atypical SCZ drugs had significant interactions with drugs of the categories “nervous system” and “antiinfectives for systemic use.” This study is the first to characterize the adverse drug interactions in the course of SCZ treatment and might provide useful information for future SCZ treatment.
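A comparison of interaction counts between drug groups can be sketched as a degree calculation on an undirected network. This is a minimal illustration under stated assumptions: the drug names and interaction pairs are placeholders, not DrugBank records, and the function names are hypothetical.

```python
from collections import defaultdict

def degree_by_drug(interactions):
    """Count distinct interaction partners for each drug in an undirected network."""
    neighbors = defaultdict(set)
    for drug_a, drug_b in interactions:
        neighbors[drug_a].add(drug_b)
        neighbors[drug_b].add(drug_a)
    return {drug: len(partners) for drug, partners in neighbors.items()}

def mean_degree(degree, drugs):
    """Average number of partners over a group; drugs absent from the network count as 0."""
    values = [degree.get(drug, 0) for drug in drugs]
    return sum(values) / len(values)

# Placeholder interaction pairs (not real data).
toy_interactions = [
    ("scz_a", "other_x"),
    ("scz_a", "other_y"),
    ("scz_b", "other_x"),
    ("other_x", "other_y"),
]
toy_degree = degree_by_drug(toy_interactions)
print(mean_degree(toy_degree, ["scz_a", "scz_b"]))
```

A statistical comparison between groups (for example, a rank-based test on the two degree distributions) would follow this step; the abstract does not state which test was used.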
Kinase inhibitors are an accepted treatment for metastatic melanomas that harbor specific driver mutations in BRAF or KIT, but only 40–50% of cases are positive. To uncover other potential targetable mutations, we performed whole-genome sequencing of a highly aggressive BRAF (V600) and KIT (W557, V559, L576, K642, D816) wildtype melanoma. Surprisingly, we found a somatic BRAF L597R mutation in exon 15. Analysis of BRAF exon 15 in 49 tumors negative for BRAF V600 mutations as well as driver mutations in KIT, NRAS, GNAQ, and GNA11 showed that 2 (4%) harbored L597 mutations and another 2 involved BRAF D594 and K601 mutations. In vitro signaling induced by L597R/S/Q mutants was suppressed by MEK inhibition. A patient with BRAF L597S mutant metastatic melanoma responded significantly to treatment with the MEK inhibitor TAK-733. Collectively, these data demonstrate the clinical significance of BRAF L597 mutations in melanoma.
melanoma; BRAF L597; whole genome sequencing; BRAF inhibitor; MEK inhibitor; TAK-733
Gene set-based analysis of genome-wide association study (GWAS) data has recently emerged as a useful approach to examine the joint effects of multiple risk loci in complex human diseases or phenotypes. Dental caries is a common, chronic, and complex disease leading to a decrease in quality of life worldwide. In this study, we applied the approaches of gene set enrichment analysis to a major dental caries GWAS dataset, which consists of 537 cases and 605 controls. Using four complementary gene set analysis methods, we analyzed 1331 Gene Ontology (GO) terms collected from the Molecular Signatures Database (MSigDB). Setting false discovery rate (FDR) threshold as 0.05, we identified 13 significantly associated GO terms. Additionally, 17 terms were further included as marginally associated because they were top ranked by each method, although their FDR is higher than 0.05. In total, we identified 30 promising GO terms, including ‘Sphingoid metabolic process,’ ‘Ubiquitin protein ligase activity,’ ‘Regulation of cytokine secretion,’ and ‘Ceramide metabolic process.’ These GO terms encompass broad functions that potentially interact and contribute to the oral immune response related to caries development, which have not been reported in the standard single marker based analysis. Collectively, our gene set enrichment analysis provided complementary insights into the molecular mechanisms and polygenic interactions in dental caries, revealing promising association signals that could not be detected through single marker analysis of GWAS data.
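Thresholding gene sets at an FDR of 0.05 can be illustrated with the Benjamini-Hochberg procedure. The abstract does not say which FDR procedure the four methods used, so Benjamini-Hochberg is an assumption here, and the P values in the example are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Made-up P values for four hypothetical gene sets.
toy_pvalues = [0.01, 0.04, 0.03, 0.005]
toy_qvalues = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(toy_qvalues) if q < 0.05]
print(significant)
```

Gene sets whose q-value falls below the chosen threshold are reported as significant; the remaining top-ranked sets can still be examined as marginal signals, as done in the study.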
High-throughput genomic technologies are increasingly being used to identify therapeutic targets and risk factors for specific diseases. Using 116 independent liver samples, we identified 793 probe sets that demonstrated a significant association in the frequency of absent calls as tissues progressed from normal to pre-neoplastic to neoplastic, followed by a bioinformatic approach which identified that 78.9% of the significant probe sets contained at least one CpG island in the gene promoter region compared with 58.9% of the remaining genes examined. Our results indicate that further high-throughput methylation studies to more fully characterize molecular events involved in hepatocarcinogenesis are warranted.
HCC; hepatocellular carcinoma; HCV; hepatitis C virus; methylation; microarray; hepatocarcinogenesis
Integrating evidence from multiple domains is useful in prioritizing disease candidate genes for subsequent testing. We ranked all known human genes (n = 3819) under linkage peaks in the Irish Study of High-Density Schizophrenia Families using three different evidence domains: 1) a meta-analysis of microarray gene expression results using the Stanley Brain collection, 2) a schizophrenia protein-protein interaction network, and 3) a systematic literature search. Each gene was assigned a domain-specific p-value and ranked after evaluating the evidence within each domain. For comparison to this ranking process, a large-scale candidate gene hypothesis was also tested by including genes with Gene Ontology terms related to neurodevelopment. Subsequently, genotypes of 3725 SNPs in 167 genes from a custom Illumina iSelect array were used to evaluate the top ranked vs. hypothesis selected genes. Seventy-three genes were both highly ranked and involved in neurodevelopment (category 1) while 42 and 52 genes were exclusive to neurodevelopment (category 2) or highly ranked (category 3), respectively. The most significant associations were observed in genes PRKG1, PRKCE, and CNTN4 but no individual SNPs were significant after correction for multiple testing. Comparison of the approaches showed an excess of significant tests using the hypothesis-driven neurodevelopment category. Random selection of similar sized genes from two independent genome-wide association studies (GWAS) of schizophrenia showed the excess was unlikely by chance. In a further meta-analysis of three GWAS datasets, four candidate SNPs reached nominal significance. Although gene ranking using integrated sources of prior information did not enrich for significant results in the current experiment, gene selection using an a priori hypothesis (neurodevelopment) was superior to random selection. As such, further development of gene ranking strategies using more carefully selected sources of information is warranted.
The molecular mechanisms underlying the reduced penetrance seen in the nonsense-mediated decay–positive (NMD+) BMPR2 mutation–associated hereditary pulmonary arterial hypertension (HPAH) remain unknown. We reasoned that the cellular and genetic mechanisms behind this phenomenon could be uncovered by combining expression profiling with Connectivity Map (cMap) analysis. Cultured lymphocytes from 10 patients with HPAH and 10 matched familial control subjects, all with NMD+ BMPR2 mutations, were subjected to expression analysis. For each group, the expression data were combined before analysis. This generated a signature of 23 up-regulated and 12 down-regulated genes in patients with HPAH compared with control subjects (the “PAH penetrance signature”). Although gene set enrichment analysis of this signature was not uniquely informative, cMap analysis identified drugs with expression signatures similar to the PAH penetrance signature. Several of these drugs were predicted to influence reactive oxygen species (ROS) formation. This hypothesis was tested and confirmed in the same cells initially subjected to the expression analysis using quantitative biochemical detection of ROS concentration. We conclude that expression of the PAH penetrance signature represents an increased risk of developing clinical HPAH and that ROS formation may play a role in pathogenesis of HPAH. These results provide the first molecular insights into NMD+ BMPR2 related HPAH penetrance and highlight the potential utility of cMap analyses in pulmonary research.
hereditary pulmonary arterial hypertension; ROS; connectivity map; expression signature; BMPR2
Along with computational approaches, NGS-led technologies have had a major impact on discoveries in miRNA biology, including the identification of novel miRNAs. However, to date all microRNA discovery tools depend on the availability of reference or genomic sequences. Here, for the first time, a novel approach, miReader, is introduced that can discover novel miRNAs without any dependence on genomic or reference sequences. The approach uses NGS read data to build highly accurate miRNA models, trained with a Multi-boosting algorithm with Best-First Tree as its base classifier. It was comprehensively tested on a large amount of experimental data from a wide range of species, including human, plants, nematode, zebrafish, and fruit fly, performing consistently with >90% accuracy. Applying the same tool to Illumina read data for Miscanthus, a plant whose genome has not been sequenced, the study reported 21 novel mature miRNA duplex candidates. Because miRNA discovery requires handling high-throughput data, the entire approach has been implemented in a standalone parallel architecture. This work is expected to have a positive impact on miRNA discovery in the majority of species, for which genomic sequence availability would no longer be a requirement.
Background and Methods
Esophageal adenocarcinoma (EAC) is characterized by a steep rise in incidence rates in the Western population. The unique miRNA signature that distinguishes EAC from other upper gastrointestinal cancers remains unclear. Herein, we performed a comprehensive microarray profiling for the specific miRNA signature associated with EAC. We validated this signature by qRT-PCR.
Microarray analysis showed that 21 miRNAs were consistently deregulated in EAC. miR-194, miR-192, miR-200a, miR-21, miR-203, miR-205, miR-133b, and miR-31 were selected for validation using 46 normal squamous (NS), 23 Barrett’s esophagus (BE), 17 Barrett’s high grade dysplasia (HGD), 34 EAC, 33 gastric adenocarcinoma (GC), and 45 normal gastric (NG) tissues. The qRT-PCR analysis indicated that 2 miRNAs (miR-21 and miR-133b) were deregulated in both EAC and GC, and 6 miRNAs (up-regulated: miR-194, miR-31, miR-192, and miR-200a; down-regulated: miR-203 and miR-205) in EAC, as compared to BE but not in GC, indicating their potential unique role in EAC. Our data showed that miR-194, miR-192, miR-21, and miR-31 were up-regulated in BE adjacent to HGD lesions relative to isolated BE samples. Analysis of clinicopathological features indicated that down-regulation of miR-203 is significantly associated with progression and tumor stages in EAC. Interestingly, the overexpression levels of miR-194, miR-200a, and miR-192 were significantly higher in early EAC stages, suggesting that these miRNAs may be involved in EAC tumor development rather than progression.
Our findings demonstrate the presence of a unique miRNA signature for EAC. This may provide some clues for the distinct molecular features of EAC to be considered in future studies of the role of miRNAs in EAC and their utility as disease biomarkers.
Next generation sequencing (NGS) technologies allow us to explore virus interactions with host genomes that lead to carcinogenesis or other diseases; however, this effort is largely hindered by the dearth of efficient computational tools. Here, we present a new tool, VirusFinder, for the identification of viruses and their integration sites in host genomes using NGS data, including whole transcriptome sequencing (RNA-Seq), whole genome sequencing (WGS), and targeted sequencing data. VirusFinder’s unique features include the characterization of insertion loci of virus of arbitrary type in the host genome and high accuracy and computational efficiency as a result of its well-designed pipeline. The source code as well as additional data of VirusFinder is publicly available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Breast cancer (BCa) genome-wide association studies revealed allelic frequency differences between cases and controls at index single nucleotide polymorphisms (SNPs). To date, 71 loci have thus been identified and replicated. More than 320,000 SNPs at these loci define BCa risk due to linkage disequilibrium (LD). We propose that BCa risk resides in a subgroup of SNPs that functionally affects breast biology. Such a shortlist will aid in framing hypotheses to prioritize a manageable number of likely disease-causing SNPs. To find correlated SNPs, we extracted from the 1000 Genomes Project all SNPs residing in 1 Mb windows around each breast cancer risk index SNP. We used FunciSNP, an R/Bioconductor package developed in-house, to identify potentially functional SNPs at 71 risk loci by coinciding them with chromatin biofeatures. We identified 1,005 SNPs in LD with the index SNPs (r² ≥ 0.5) in three categories: 21 in exons of 18 genes, 76 in transcription start site (TSS) regions of 25 genes, and 921 in enhancers. Thirteen SNPs were found in more than one category. We found two correlated and predicted non-benign coding variants (rs8100241 in exon 2 and rs8108174 in exon 3) of the gene ANKLE1. Most putative functional LD SNPs, however, were found in either epigenetically defined enhancers or in gene TSS regions. Fifty-five percent of these non-coding SNPs are likely functional, since they affect response element (RE) sequences of transcription factors. Functionality of these SNPs was assessed by expression quantitative trait loci (eQTL) analysis and allele-specific enhancer assays. Unbiased analyses of SNPs at BCa risk loci revealed new and overlooked mechanisms that may affect risk of the disease, thereby providing a valuable resource for follow-up studies.
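The category counts can be checked against the reported total. Assuming each of the thirteen multi-category SNPs falls in exactly two categories (an assumption; the abstract does not say so), removing the double counts gives:

```latex
\[
21 + 76 + 921 - 13 = 1018 - 13 = 1005
\]
```

This matches the 1,005 SNPs reported as being in LD with the index SNPs.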
Methylation change plays an important role in many cellular systems, including cancer development. During recent years, genome-wide or large scale methylation data has become available thanks to rapid advances in high-throughput biotechnologies. So far, researchers have always used gene expression profiling to study disease subtypes and related therapies. In this study, we investigated methylation profiles in 30 breast cancer cell lines using methylation data generated by microarray technologies. Strong variation of the number of methylation peaks was found among these 30 cell lines; however, more peaks were found in the upstream regions than downstream regions of genes. We further grouped the methylation profiles of these cell lines into three consensus clusters. Finally, we performed an integrative analysis of breast cancer cell lines using both methylation and gene expression profiling data. There was no significant correlation between methylation profiling subtypes and gene expression profiling subtypes, suggesting the complex nature of methylation in the regulation of gene expression. However, we found basal B cell lines appeared exclusively in two methylation clusters. Although these results are preliminary, this study suggests that methylation profiling might be promising in disease subtype classification and the development of therapeutic strategies.
The development of drug-resistant phenotypes has been a major obstacle to Cisplatin (CDDP) use in non-small cell lung cancer (NSCLC). We aimed to identify some of the molecular mechanisms that underlie CDDP resistance by using microarray expression analysis.
Methods and Materials
H460 cells were treated with cisplatin. Differences between cisplatin-resistant lung cancer cells (CDDP-R) and parental H460 cells were studied using Western blot, MTS and clonogenic assays, in vivo tumor implantation, and microarray analysis. CDDP-R cells were treated with human recombinant insulin-like growth factor binding protein-3 (IGFBP-3) and siRNA targeting insulin-like growth factor-1 receptor (IGF-1R).
CDDP-R cells exhibited greater expression of the markers CD133 and ALDH, more rapid in vivo tumor growth, more resistance to cisplatin- and etoposide-induced apoptosis, and greater survival after treatment with cisplatin or radiation than the parental H460 cells. In addition, CDDP-R cells demonstrated decreased expression of IGFBP-3 and increased activation of IGF-1R signaling compared with parental H460 cells in the presence of IGF-1. Human recombinant IGFBP-3 reversed cisplatin resistance in CDDP-R cells, and targeting of IGF-1R using siRNA sensitized CDDP-R cells to cisplatin and radiation.
The IGF-1 signaling pathway contributes to CDDP-R resistance to cisplatin and radiation. Thus, this pathway represents a potential target for improved lung cancer response to treatment.
cisplatin resistance; IGFBP-3; lung cancer; radiotherapy
We introduce GenRev, a network-based software package developed to explore the functional relevance of genes generated as an intermediate result from numerous high-throughput technologies. GenRev searches for optimal intermediate nodes (genes) for the connection of input nodes via several algorithms, including the Klein-Ravi algorithm, the limited kWalks algorithm and a heuristic local search algorithm. Gene ranking and graph clustering analyses are integrated into the package. GenRev has the following features. (1) It provides users with great flexibility to define their own networks. (2) Users are allowed to define each gene’s importance in a subnetwork search by setting its score. (3) It is standalone and platform independent. (4) It provides an optimization in subnetwork search, which dramatically reduces the running time. GenRev is particularly designed for general use so that users have the flexibility to choose a reference network and define the score of genes. GenRev is freely available at http://bioinfo.mc.vanderbilt.edu/GenRev.html.
Gene ranking; Network; Subnetwork; Klein-Ravi algorithm; limited kWalks algorithm; Disease genes
Recombinant Neuregulin (NRG)-1β has multiple beneficial effects on cardiac myocytes in culture, and has potential as a clinical therapy for heart failure (HF). A number of factors may influence the effect of NRG-1β on cardiac function via ErbB receptor coupling and expression. We examined the effect of the NRG-1β isoform, glial growth factor 2 (GGF2), in rats with myocardial infarction (MI) and determined the impact of high-fat diet as well as chronicity of disease on GGF2 induced improvement in left ventricular systolic function. Potential mechanisms for GGF2 effects on the remote myocardium were explored using microarray and proteomic analysis.
Methods and Results
Rats with MI were randomized to receive vehicle, 0.625 mg/kg, or 3.25 mg/kg GGF2 in the presence and absence of high-fat feeding beginning at day 7 post-MI and continuing for 4 weeks. Residual left ventricular (LV) function was improved in both of the GGF2 treatment groups compared with the vehicle treated MI group at 4 weeks of treatment as assessed by echocardiography. High-fat diet did not prevent the effects of high dose GGF2. In experiments where treatment was delayed until 8 weeks after MI, high but not low dose GGF2 treatment was associated with improved systolic function. mRNA and protein expression analysis of remote left ventricular tissue revealed a number of changes in myocardial gene and protein expression altered by MI that were normalized by GGF2 treatment, many of which are involved in energy production.
This study demonstrates that in rats with MI induced systolic dysfunction, GGF2 treatment improves cardiac function. There are differences in sensitivity of the myocardium to GGF2 effects when administered early vs. late post-MI that may be important to consider in the development of GGF2 in humans.
Salmonella bacteria cause millions of infections and thousands of deaths every year. This pathogen has an unusually broad host range including humans, animals, and even plants. During infection, Salmonella expresses a variety of virulence factors and effectors that are delivered into the host cell triggering cellular responses through protein–protein interactions (PPIs) with host cell proteins which make the pathogen’s invasion and replication possible. To speed up proteomic efforts in elucidating Salmonella–host interactomes, we carried out a survey of the currently published Salmonella–host PPI. Such a list can serve as the gold standard for computational models aimed at predicting Salmonella–host interactomes through integration of large-scale biological data sources. Manual literature and database search of >2200 journal articles and >100 databases resulted in a gold standard list of currently 62 PPI, including primarily interactions of Salmonella proteins with human and mouse proteins. Only six of these interactions were directly retrievable from PPI databases and 16 were highlighted in databases featuring literature extracts. Thus, the literature survey resulted in the most complete interactome available to date for Salmonella. Pathway analysis using Ingenuity and Broad Gene Set Enrichment Analysis (GSEA) software revealed among general pathways such as MAPK signaling in particular those related to cell death as well as cell morphology, turnover, and interactions, in addition to response to not only Salmonella but also other pathogenic – viral and bacterial – infections. The list of interactions is available at http://www.shiprec.org/indicationslist.htm
Interactome; Pathway analysis; Protein–protein interaction; Salmonella; Subnetwork analysis
Deregulation of gene expression, a hallmark of cancer, is caused by both genetic and epigenetic mechanisms. The rapid accumulation of epigenome maps of various cancers suggests a new avenue of research, namely integrating epigenomic data with other types of omic data for cancer diagnosis, prognosis, and biomarker discovery. We introduce the MAPIT algorithm (Multi Analyte Pathway Inference Tool) to enable principled integration of epigenomic, transcriptomic, and protein interactome data. As a proof-of-principle, we apply MAPIT to glioblastoma multiforme (GBM), the most common and aggressive form of brain tumor. Few predictive markers have been reported for the prognosis of GBM patients. By integrating mRNA transcriptome, promoter DNA methylome, and protein-protein physical interactome data, we find ten expression- and three methylation-based network markers, involving 118 genes. When tested on additional GBM patient samples, the prognostic accuracy of the multi-analyte network markers (73.5%) is 9.7% and 8.6% higher than previous prognostic signatures built on gene expression or DNA methylation alone. Our results highlight the critical role of two novel pathways in the prognosis of GBM patients: small GTPase-mediated protein trafficking and ubiquitination-dependent protein degradation. A better understanding of these two pathways could lead to personalized therapies for subgroups of GBM patients. Our study demonstrates that integrating epigenomic, transcriptomic, and interactomic data can improve the accuracy of network-based prognosis markers and lead to novel mechanistic understanding of cancer.
The 2012 International Conference on Intelligent Biology and Medicine (ICIBM 2012) was held on April 22-24, 2012 in Nashville, Tennessee, USA. The conference featured six technical sessions, one tutorial session, one workshop, and 3 keynote presentations that covered state-of-the-art research activities in genomics, systems biology, and intelligent computing. In addition to a major emphasis on the next generation sequencing (NGS)-driven informatics, ICIBM 2012 aligned significant interests in systems biology and its applications in medicine. We highlight in this editorial the selected papers from the meeting that address the developments of novel algorithms and applications in systems biology.
Pathway analysis of large-scale omics data assists us with the examination of the cumulative effects of multiple functionally related genes, which are difficult to detect using the traditional single gene/marker analysis. So far, most of the genomic studies have been conducted in a single domain, e.g., by genome-wide association studies (GWAS) or microarray gene expression investigation. A combined analysis of disease susceptibility genes across multiple platforms at the pathway level is an urgent need because it can reveal more reliable and more biologically important information.
We performed an integrative pathway analysis of a GWAS dataset and a microarray gene expression dataset in prostate cancer. We obtained a comprehensive pathway annotation set from knowledge-based public resources, including KEGG pathways and the prostate cancer candidate gene set, and gene sets specifically defined based on cross-platform information. By leveraging this pathway collection, we first searched for significant pathways in the GWAS dataset using four methods, which represent two broad groups of pathway analysis approaches. The significant pathways identified by each method varied greatly, but the results were more consistent within each method group than between groups. Next, we conducted a gene set enrichment analysis of the microarray gene expression data and found 13 pathways with cross-platform evidence, including "Fc gamma R-mediated phagocytosis" (PGWAS = 0.003, Pexpr < 0.001, and Pcombined = 6.18 × 10⁻⁸), "regulation of actin cytoskeleton" (PGWAS = 0.003, Pexpr = 0.009, and Pcombined = 3.34 × 10⁻⁴), and "Jak-STAT signaling pathway" (PGWAS = 0.001, Pexpr = 0.084, and Pcombined = 8.79 × 10⁻⁴).
Our results provide evidence at both the genetic variation and expression levels that several key pathways might have been involved in the pathological development of prostate cancer. Our framework that employs gene expression data to facilitate pathway analysis of GWAS data is not only feasible but also much needed in studying complex disease.
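Combining a GWAS P value with an expression P value for the same pathway can be sketched with Fisher's method. The abstract does not name the combination method, so Fisher's method is an assumption here. With the rounded P values quoted in the abstract it gives results close to, but not identical to, the reported combined values, which is expected if the originals were computed from unrounded inputs or a different procedure.

```python
import math

def fisher_combined_two(p1, p2):
    """Combine two independent P values with Fisher's method.

    The statistic -2 * (ln p1 + ln p2) follows a chi-square distribution with
    4 degrees of freedom under the null hypothesis. For 4 degrees of freedom
    the upper tail has the closed form k * (1 - ln k), where k = p1 * p2.
    """
    k = p1 * p2
    return k * (1.0 - math.log(k))

# Rounded P values quoted in the abstract for two of the pathways.
print(fisher_combined_two(0.003, 0.009))  # regulation of actin cytoskeleton
print(fisher_combined_two(0.001, 0.084))  # Jak-STAT signaling pathway
```

The first pathway in the abstract cannot be reproduced this way because its expression P value is given only as an upper bound (< 0.001).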
DNA methylation, which mainly occurs at CpG dinucleotides, is a dynamic epigenetic regulation mechanism in most eukaryotic genomes. It is already known that methylated CpG dinucleotides can lead to a high rate of C to T mutation at these sites. However, less is known about whether and how the methylation level causes a different mutation rate, especially at the single-base resolution.
In this study, we used genome-wide single-base resolution methylation data to perform a comprehensive analysis of the mutation rate of methylated cytosines from human embryonic stem cells. Through the analysis of the density of single nucleotide polymorphisms, we first confirmed that the mutation rate in methylated CpG sites is greater than that in unmethylated CpG sites. Then, we showed that among methylated CpG sites, the mutation rate is markedly increased in low-intermediately (20-40% methylation level) to intermediately methylated CpG sites (40-60% methylation level) of the human genome. This mutation pattern was observed regardless of DNA strand direction and the sequence coverage over the site on which the methylation level was calculated. Moreover, this highly non-random mutation pattern was more apparent in intergenic and intronic regions than in promoter regions and CpG islands. Our investigation suggested that this pattern appears primarily in autosomes rather than sex chromosomes. Further analysis based on human-chimpanzee divergence confirmed these observations. Finally, we observed a significant correlation between the methylation level and cytosine allele frequency.
Our results showed a high mutation rate in low-intermediately to intermediately methylated CpG sites at different scales, from the categorized genomic region and whole chromosome to the whole genome level, thereby providing the first supporting evidence of mutation rate variation at human methylated CpG sites using genome-wide single-base resolution methylation data.
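Relating methylation level to SNP density can be sketched by grouping CpG sites into methylation bins and computing the fraction of sites in each bin that overlap a SNP. The 20-percentage-point bin width follows the 20-40% and 40-60% categories named above; the input format, function name, and toy values are assumptions made for illustration.

```python
def snp_density_by_methylation(sites, bin_width=20):
    """Fraction of CpG sites carrying a SNP within each methylation-level bin.

    sites: iterable of (methylation_percent, has_snp) pairs, with
    methylation_percent between 0 and 100. Returns one value per bin,
    or None for bins that contain no sites.
    """
    n_bins = 100 // bin_width
    counts = [[0, 0] for _ in range(n_bins)]  # [total sites, sites with a SNP]
    for level_pct, has_snp in sites:
        # A level of exactly 100% is placed in the top bin.
        idx = min(int(level_pct // bin_width), n_bins - 1)
        counts[idx][0] += 1
        counts[idx][1] += int(has_snp)
    return [snp / total if total else None for total, snp in counts]

# Toy input (not real data): methylation percent and SNP overlap per site.
toy_sites = [(10, False), (25, True), (30, False), (50, True), (55, True), (100, False)]
print(snp_density_by_methylation(toy_sites))
```

In practice the same summary would be computed separately per genomic region and per chromosome to reproduce the comparisons described above.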
We present a report of the 2012 International Conference on Intelligent Biology and Medicine (ICIBM 2012) and the editorial report of the supplement to BMC Genomics that includes 22 research papers selected from ICIBM 2012, which was held on April 22-24, 2012 in Nashville, Tennessee, USA. The conference covered a variety of research areas, including bioinformatics, systems biology, and intelligent computing. It included six sessions, a tutorial - Introduction to Proteome Informatics, a workshop - Next Generation Sequencing, and a poster session. The selected papers in this Supplement issue represent the genomic focus in ICIBM 2012.
A variety of species and experimental designs have been used to study genetic influences on alcohol dependence, ethanol response, and related traits. Integration of these heterogeneous data can be used to produce a ranked target gene list for additional investigation.
In this study, we performed a unique multi-species evidence-based data integration using three microarray experiments in mice or humans that generated an initial alcohol dependence (AD) related gene list, human linkage and association results, and gene sets implicated in C. elegans and Drosophila. We then used permutation and false discovery rate (FDR) analyses on the genome-wide association study (GWAS) dataset from the Collaborative Study on the Genetics of Alcoholism (COGA) to evaluate the ranking results and weighting matrices. We found one weighting score matrix could increase FDR-based q-values for a list of 47 genes with a score greater than 2. Our follow-up functional enrichment tests revealed these genes were primarily involved in brain responses to ethanol and neural adaptations occurring with alcoholism.
These results, along with our experimental validation of specific genes in mice, C. elegans and Drosophila, suggest that a cross-species evidence-based approach is useful to identify candidate genes contributing to alcoholism.
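The idea of ranking genes by a weighted sum of evidence from heterogeneous sources can be sketched as follows. The source names, the weights, and the function `rank_genes` are hypothetical and do not reproduce the weighting matrices evaluated in the study; only the cutoff of a score greater than 2 is taken from the text.

```python
def rank_genes(evidence, weights, threshold=2.0):
    """Rank genes by the summed weight of the evidence sources supporting them.

    `evidence` maps a gene to the set of sources that implicate it, and
    `weights` maps a source name to its weight. Both are illustrative inputs,
    not the study's data. Sources missing from `weights` contribute nothing.
    Only genes scoring strictly above `threshold` are returned.
    """
    scores = {
        gene: sum(weights.get(source, 0.0) for source in sources)
        for gene, sources in evidence.items()
    }
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, score) for gene, score in ranked if score > threshold]


# Hypothetical weights and placeholder gene names, for illustration only.
example_weights = {
    "mouse_expression": 1.0,
    "human_association": 1.5,
    "human_linkage": 1.0,
    "invertebrate": 0.5,
}
example_evidence = {
    "GENE_A": {"mouse_expression", "human_association"},
    "GENE_B": {"invertebrate"},
    "GENE_C": {"human_association", "human_linkage", "invertebrate"},
}
top_genes = rank_genes(example_evidence, example_weights)
```

Alternative weight sets can be compared by re-running the ranking with each one and checking the resulting gene lists against an independent dataset, which is the role the GWAS data play in the evaluation described above.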
While genome-wide association studies have identified some promising candidates for schizophrenia, the majority of risk genes remain unknown. We were interested in testing whether integrating gene expression and other functional information could facilitate the identification of susceptibility genes and related biological pathways.
We conducted high-throughput sequencing analyses to evaluate mRNA expression in blood samples isolated from 3 schizophrenia patients and 3 healthy controls. We also conducted pooled sequencing of 10 schizophrenic patients and matched controls. Differentially expressed genes were identified by t-test. In the individually sequenced dataset, we identified 198 genes differentially expressed between cases and controls, of which 19 were verified by the pooled sequencing dataset and 21 reached nominal significance in gene-based association analyses of a genome-wide association dataset. Pathway analysis of these differentially expressed genes revealed that they were highly enriched in immune-related pathways. Two genes, S100A8 and TYROBP, had consistent changes in expression in both the individual and pooled sequencing datasets and were nominally significant in gene-based association analysis.
Integration of gene expression and pathway analyses with genome-wide association may be an efficient approach to identify risk genes for schizophrenia.
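A per-gene two-sample t-test of the kind used to flag differentially expressed genes can be sketched as below. The abstract does not state which variant of the t-test was used, so the equal-variance (Student's) form here is an assumption, and the function name `student_t` and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(cases, controls):
    """Return (t statistic, degrees of freedom) for a two-sample t-test.

    Uses the pooled-variance (equal variance) form, which is an assumption
    rather than the documented choice of the study. Each group needs at
    least two values.
    """
    n1, n2 = len(cases), len(controls)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance across both groups.
    pooled = ((n1 - 1) * variance(cases) + (n2 - 1) * variance(controls)) / df
    if pooled == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(cases) - mean(controls)) / standard_error, df


# Hypothetical expression values for one gene in 3 cases and 3 controls.
t_value, dof = student_t([5.0, 6.0, 7.0], [2.0, 3.0, 4.0])
```

The p-value follows from the t distribution with the returned degrees of freedom. With only three samples per group the test has little power, which is one reason to cross-check candidate genes against independent evidence such as the pooled sequencing and association data.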
Semi-invariant natural killer T (NKT) cells are thymus-derived innate lymphocytes that modulate microbial and tumour immunity as well as autoimmune diseases. These immunoregulatory properties of NKT cells are acquired during their development. Much has been learnt regarding the molecular and cellular cues that promote NKT cell development, yet how these cells are maintained in the thymus and the periphery and how they acquire functional competence are incompletely understood. We found that IL-15 induced several Bcl-2 family survival factors in thymic and splenic NKT cells in vitro. Yet, IL-15-mediated thymic and peripheral NKT cell survival critically depended on Bcl-xL expression. Additionally, IL-15 regulated thymic developmental stage 2 (ST2) to ST3 lineage progression and terminal NKT cell differentiation. Global gene expression analyses and validation revealed that IL-15 regulated Tbx21 (T-bet) expression in thymic NKT cells. The loss of IL-15 also resulted in poor expression of key effector molecules such as IFN-γ, granzyme A and C as well as several NK cell receptors in NKT cells. Taken together, our findings reveal a critical role for IL-15 in NKT cell survival, which is mediated by Bcl-xL, and effector differentiation, which is consistent with a role of T-bet in regulating terminal maturation.