Background and Aim
Recent studies suggest that oxytocin may be involved in the pathophysiology of anorexia nervosa. We examined whether variation in the methylation status of the oxytocin receptor (OXTR) gene in patients with anorexia nervosa might account for these findings.
We analyzed the methylation status of the CpG sites in a region from the exon 1 to the MT2 regions of the OXTR gene in buccal cells from 15 patients and 36 healthy women using bisulfite sequencing. We further examined whether methylation status was associated with markers of illness severity or form.
We identified six CpG sites with significant differences in average methylation levels between the patient and control groups. Of these six differentially methylated CpG sites, five showed higher average methylation levels in patients than in controls (64.9–88.8% vs. 6.6–45.0%). The methylation levels of these five CpG sites were negatively associated with body mass index (BMI). In a regression analysis, BMI, eating disorder psychopathology, and anxiety were identified as factors affecting the methylation levels of these CpG sites, with BMI accounting for more of the variation.
Epigenetic misregulation of the OXTR gene may be implicated in anorexia nervosa, which may either be a mechanism linking environmental adversity to risk or may be a secondary consequence of the illness.
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on the mutation frequencies of single genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to more than 300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
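To illustrate the Random Walk with Restart step named above, the following is a minimal sketch on a toy undirected network. It is not VarWalker's implementation: the function name, the restart probability of 0.5, the convergence tolerance, and the node labels are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbouring nodes (undirected).
    seeds: set of nodes the walker restarts from (e.g. mutated genes in a sample).
    restart: probability of jumping back to a seed at each step (assumed value).
    Seeds are assumed to have at least one neighbour; an isolated seed
    would leak probability mass in this simple sketch.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # Restart component: jump back to the seed distribution.
        nxt = {n: restart * start[n] for n in nodes}
        # Walk component: spread remaining mass evenly over neighbours.
        for n in nodes:
            degree = len(adjacency[n])
            if degree == 0:
                continue
            share = (1.0 - restart) * prob[n] / degree
            for m in adjacency[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


# Hypothetical four-node path network: A - B - C - D, seeded at A.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_network, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed receive higher scores, which is the property that lets a network method rank the close interactors of mutated genes.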
A cancer genome typically harbors both driver mutations, which contribute to tumorigenesis, and passenger mutations, which tend to be neutral and occur randomly. Cancer genomes differ dramatically due to genetic and environmental factors. A major challenge in interpreting the large volume of mutation data identified in cancer genomes using next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. Applying our approach in a large cohort of lung adenocarcinoma samples and melanoma samples, we derived a consensus mutation subnetwork for each cancer containing significantly enriched cancer genes and cancer-related functional pathways. Our results indicated that driver genes occur within a broad spectrum of frequency, interact with each other, and converge in several key pathways that play critical roles in tumorigenesis.
Recent studies have demonstrated that gene set analysis, which tests disease association with genetic variants in a group of functionally related genes, is a promising approach for analyzing and interpreting genome-wide association studies (GWAS) data. These approaches aim to increase power by combining association signals from multiple genes in the same gene set. In addition, gene set analysis can also shed more light on the biological processes underlying complex diseases. However, current approaches for gene set analysis are still in an early stage of development in that analysis results are often prone to sources of bias, including gene set size and gene length, linkage disequilibrium patterns and the presence of overlapping genes. In this paper, we provide an in-depth review of the gene set analysis procedures, along with parameter choices and the particular methodology challenges at each stage. In addition to providing a survey of recently developed tools, we also classify the analysis methods into larger categories and discuss their strengths and limitations. In the last section, we outline several important areas for improving the analytical strategies in gene set analysis.
Genome-wide association study; Gene set; Pathway; Gene-set enrichment analysis; Statistical significance; Complex disease
Differential co-expression analysis (DCEA) has emerged in recent years as a novel, systematic investigation into gene expression data. While most DCEA studies or tools focus on the co-expression relationships among genes, some are developing a potentially more promising research domain, differential regulation analysis (DRA). In our previously proposed R package DCGL v1.0, we provided functions to facilitate basic differential co-expression analyses; however, the output from DCGL v1.0 could not be translated into differential regulation mechanisms in a straightforward manner.
To advance from DCEA to DRA, we upgraded the DCGL package from v1.0 to v2.0. A new module named “Differential Regulation Analysis” (DRA) was designed, which consists of three major functions: DRsort, DRplot, and DRrank. DRsort selects differentially regulated genes (DRGs) and differentially regulated links (DRLs) according to the transcription factor (TF)-to-target information. DRrank prioritizes the TFs in terms of their potential relevance to the phenotype of interest. DRplot graphically visualizes differentially co-expressed links (DCLs) and/or TF-to-target links in a network context. In addition to this new module, we streamlined the code from v1.0. The evaluation results showed that our differential regulation analysis is able to capture the regulators relevant to the biological subject.
With ample functions to facilitate differential regulation analysis, DCGL v2.0 was upgraded from a DCEA tool to a DRA tool, which may unveil the underlying differential regulation from the observed differential co-expression. DCGL v2.0 can be applied to a wide range of gene expression data in order to systematically identify novel regulators that have not yet been documented as critical.
DCGL v2.0 package is available at http://cran.r-project.org/web/packages/DCGL/index.html or at our project home page http://lifecenter.sgst.cn/main/en/dcgl.jsp.
Copy number variation (CNV) is one of the most prevalent genetic variations in the genome, leading to an abnormal number of copies of moderate to large genomic regions. High-throughput technologies such as next-generation sequencing often identify thousands of CNVs involved in biological or pathological processes. Despite the growing demand to filter and classify CNVs by factors such as frequency in the population, biological features, and function, no online web server for CNV annotation has been made available to the research community. Here, we present CNVannotator, a web server that accepts an input set of human genomic positions in a user-friendly tabular format. CNVannotator can perform genomic overlaps of the input coordinates using various functional features, including a list of the reported 356,817 common CNVs, 181,261 disease CNVs, as well as 140,342 SNPs from genome-wide association studies. In addition, CNVannotator incorporates 2,211,468 genomic features, including ENCODE regulatory elements, cytoband, segmental duplication, genome fragile site, pseudogene, promoter, enhancer, CpG island, and methylation site. For users in the cancer research community, CNVannotator can apply various filters to retrieve a subgroup of CNVs pinpointed in hundreds of tumor suppressor genes and oncogenes. In total, 5,277,234 unique genomic coordinates with functional features are available to generate an output in a plain text format that is free to download. In summary, we provide a comprehensive web resource for human CNVs. The annotated results along with the server can be accessed at http://bioinfo.mc.vanderbilt.edu/CNVannotator/.
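The genomic overlap operation described above can be sketched as follows. This is only an illustration of interval overlap under stated assumptions, not CNVannotator's code: the function names, the 1-based inclusive coordinate convention, and the example coordinates and feature labels are all hypothetical.

```python
def overlaps(query, feature):
    """True if two (chrom, start, end) intervals share at least one base.

    Coordinates are assumed to be 1-based and inclusive.
    """
    return (
        query[0] == feature[0]
        and query[1] <= feature[2]
        and feature[1] <= query[2]
    )


def annotate(cnvs, features):
    """Map each input CNV to the names of the features it overlaps.

    cnvs: list of (chrom, start, end) tuples.
    features: dict mapping a feature name to a (chrom, start, end) tuple.
    """
    return {
        cnv: [name for name, interval in features.items() if overlaps(cnv, interval)]
        for cnv in cnvs
    }


# Hypothetical inputs; none of these coordinates come from a real database.
example_features = {
    "promoter_1": ("chr1", 1000, 1500),
    "cpg_island_1": ("chr1", 1400, 1800),
    "enhancer_1": ("chr2", 1000, 2000),
}
example_cnvs = [("chr1", 1450, 3000), ("chr2", 2500, 4000)]
annotations = annotate(example_cnvs, example_features)
```

A linear scan like this is adequate for small inputs; for millions of features an interval tree or sorted sweep would be the usual choice.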
Currently there are definitions from many agencies and research societies defining “bioinformatics” as deriving knowledge from computational analysis of large volumes of biological and biomedical data. Should this be the bioinformatics research focus? We will discuss this issue in this review article. We would like to promote the idea of supporting human-infrastructure (HI) with no-boundary thinking (NT) in bioinformatics (HINT).
No-boundary thinking; Human infrastructure
A number of genetic studies have suggested numerous susceptibility genes for dental caries over the past decade with few definite conclusions. The rapid accumulation of relevant information, along with the complex architecture of the disease, provides a challenging but also unique opportunity to review and integrate the heterogeneous data for follow-up validation and exploration. In this study, we collected and curated candidate genes from four major categories: association studies, linkage scans, gene expression analyses, and literature mining. Candidate genes were prioritized according to the magnitude of evidence related to dental caries. We then searched for dense modules enriched with the prioritized candidate genes through their protein-protein interactions (PPIs). We identified 23 modules comprising 53 genes. Functional analyses of these 53 genes revealed three major clusters: cytokine network relevant genes, the matrix metalloproteinase (MMP) family, and the transforming growth factor-beta (TGF-β) family, all of which have previously been implicated in tooth development and carious lesions. Through our extensive data collection and an integrative application of gene prioritization and PPI network analyses, we built a dental caries-specific sub-network for the first time. Our study provided insights into the molecular mechanisms underlying dental caries. The framework we proposed in this work can be applied to other complex diseases.
Genetic factors underlying trait neuroticism, reflecting a tendency towards negative affective states, may overlap with genetic susceptibility for anxiety disorders and help explain the extensive comorbidity amongst internalizing disorders. Genome-wide linkage (GWL) data from several studies of neuroticism and anxiety disorders have been published, providing an opportunity to test such hypotheses and identify genomic regions that harbor genes common to these phenotypes. In all, 11 independent GWL studies of either neuroticism (n=8) or anxiety disorders (n=3) were collected, comprising 5341 families with 15 529 individuals. The rank-based genome scan meta-analysis (GSMA) approach was used to analyze each trait separately and combined, and global correlations between results were examined. False discovery rate (FDR) analysis was performed to test for enrichment of significant effects. Using 10 cM intervals, bins nominally significant for both GSMA statistics, PSR and POR, were found on chromosomes 9, 11, 12, and 14 for neuroticism and on chromosomes 1, 5, 15, and 16 for anxiety disorders. Genome-wide, the results for the two phenotypes were significantly correlated, and a combined analysis identified additional nominally significant bins. Although none reached genome-wide significance, an excess of significant PSR P-values was observed, with 12 bins falling under an FDR threshold of 0.50. As demonstrated by our identification of multiple, consistent signals across the genome, meta-analytically combining existing GWL data is a valuable approach to narrowing down regions relevant for anxiety-related phenotypes. This may prove useful for prioritizing emerging genome-wide association data for anxiety disorders.
anxiety; neuroticism; panic disorder; linkage; meta-analysis
Drug addiction is a complex and chronic mental disease, which places a large burden on the American healthcare system due to its negative effects on patients and their families. Recently, network pharmacology has emerged as a promising approach to drug discovery by integrating network biology and polypharmacology, allowing for a deeper understanding of the molecular mechanisms of drug actions at the systems level. This study applies this approach to illicit drugs and their targets in order to elucidate their interaction patterns and identify potential secondary drugs that can aid future research and clinical care.
In this study, we extracted 188 illicit substances and their related information from the DrugBank database. Data processing revealed 86 illicit drugs targeting a total of 73 unique human genes, which together form an illicit drug-target network. Compared to the full drug-target network from DrugBank, illicit drugs and their target genes tend to cluster together and form four subnetworks, corresponding to four major medication categories: depressants, stimulants, analgesics, and steroids. External analysis of Anatomical Therapeutic Chemical (ATC) second sublevel classifications confirmed that the illicit drugs have neurological functions or act via mechanisms of stimulants, opioids, and steroids. To further explore other drugs potentially associated with illicit drugs, we constructed an illicit-extended drug-target network by adding the drugs that have the same target(s) as illicit drugs to the illicit drug-target network. After analyzing the degree and betweenness of the network, we identified hubs and bridge nodes, which might play important roles in the development and treatment of drug addiction. Among them, 49 non-illicit drugs might have the potential to be used to treat addiction or to have addictive effects, including some results that are supported by previous studies.
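The construction of the illicit-extended drug-target network and the degree calculation can be sketched as below. This is a minimal illustration, not the study's pipeline: the drug and gene labels are placeholders rather than DrugBank records, the function names are hypothetical, and keeping all targets of an added drug (rather than only the shared ones) is an assumption. Betweenness is omitted for brevity.

```python
from collections import defaultdict


def extend_by_shared_targets(illicit, all_drugs):
    """Add every non-illicit drug that shares at least one target with an illicit drug.

    illicit, all_drugs: dicts mapping a drug name to a set of target genes.
    Assumption: an added drug keeps all of its targets, not only the shared ones.
    """
    illicit_targets = set().union(*illicit.values())
    extended = {drug: set(targets) for drug, targets in illicit.items()}
    for drug, targets in all_drugs.items():
        if drug not in illicit and illicit_targets & set(targets):
            extended[drug] = set(targets)
    return extended


def build_network(drug_targets):
    """Build an undirected bipartite adjacency map of drug and gene nodes."""
    adjacency = defaultdict(set)
    for drug, targets in drug_targets.items():
        for gene in targets:
            adjacency[("drug", drug)].add(("gene", gene))
            adjacency[("gene", gene)].add(("drug", drug))
    return adjacency


def degrees(adjacency):
    """Degree of a node is the number of distinct neighbours."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


# Placeholder data for illustration only.
illicit_drugs = {"drug_A": {"GENE1", "GENE2"}, "drug_B": {"GENE2"}}
all_drugs = {
    "drug_A": {"GENE1", "GENE2"},
    "drug_B": {"GENE2"},
    "drug_C": {"GENE2", "GENE3"},  # shares GENE2, so it is added
    "drug_D": {"GENE4"},           # shares nothing, so it is left out
}
extended = extend_by_shared_targets(illicit_drugs, all_drugs)
node_degrees = degrees(build_network(extended))
```

In this toy network GENE2 has the highest degree, which is the kind of node the text refers to as a hub.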
This study presents the first systematic review of the network characteristics of illicit drugs, their targets, and other drugs that share the targets of these illicit drugs. The results, though preliminary, provide some novel insights into the molecular mechanisms of drug addiction. The observation of illicit-related drugs, with partial verification from previous studies, demonstrated that the network-assisted approach is promising for drug repositioning.
Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until recently, when next-generation sequencing (NGS) made high-resolution sequence data available for analysis. During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled the development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.
Antipsychotic drugs are medications commonly used for schizophrenia (SCZ) treatment and fall into two groups: typical and atypical. SCZ patients have multiple comorbidities, and the coadministration of drugs is quite common. This may result in adverse drug-drug interactions, which are events that occur when the effect of a drug is altered by the coadministration of another drug. Therefore, it is important to provide a comprehensive view of these interactions to improve coadministration. Here, we extracted SCZ drugs and their adverse drug interactions from DrugBank and compiled an SCZ-specific adverse drug interaction network. This network included 28 SCZ drugs, 241 non-SCZ drugs, and 991 interactions. By integrating the Anatomical Therapeutic Chemical (ATC) classification with the network analysis, we characterized those interactions. Our results indicated that SCZ drugs tended to have more adverse drug interactions than other drugs. Furthermore, typical SCZ drugs had significant interactions with drugs of the “alimentary tract and metabolism” category, while atypical SCZ drugs had significant interactions with drugs of the categories “nervous system” and “antiinfectives for systemic uses.” This study is the first to characterize the adverse drug interactions in the course of SCZ treatment and might provide useful information for future SCZ treatment.
Kinase inhibitors are an accepted treatment for metastatic melanomas that harbor specific driver mutations in BRAF or KIT, but only 40–50% of cases are positive. To uncover other potentially targetable mutations, we performed whole-genome sequencing of a highly aggressive BRAF (V600) and KIT (W557, V559, L576, K642, D816) wildtype melanoma. Surprisingly, we found a somatic BRAF L597R mutation in exon 15. Analysis of BRAF exon 15 in 49 tumors negative for BRAF V600 mutations as well as driver mutations in KIT, NRAS, GNAQ, and GNA11 showed that 2 (4%) harbored L597 mutations and another 2 involved BRAF D594 and K601 mutations. In vitro signaling induced by L597R/S/Q mutants was suppressed by MEK inhibition. A patient with BRAF L597S mutant metastatic melanoma responded significantly to treatment with the MEK inhibitor TAK-733. Collectively, these data demonstrate the clinical significance of BRAF L597 mutations in melanoma.
melanoma; BRAF L597; whole genome sequencing; BRAF inhibitor; MEK inhibitor; TAK-733
Gene set-based analysis of genome-wide association study (GWAS) data has recently emerged as a useful approach to examine the joint effects of multiple risk loci in complex human diseases or phenotypes. Dental caries is a common, chronic, and complex disease leading to a decrease in quality of life worldwide. In this study, we applied the approaches of gene set enrichment analysis to a major dental caries GWAS dataset, which consists of 537 cases and 605 controls. Using four complementary gene set analysis methods, we analyzed 1331 Gene Ontology (GO) terms collected from the Molecular Signatures Database (MSigDB). Setting false discovery rate (FDR) threshold as 0.05, we identified 13 significantly associated GO terms. Additionally, 17 terms were further included as marginally associated because they were top ranked by each method, although their FDR is higher than 0.05. In total, we identified 30 promising GO terms, including ‘Sphingoid metabolic process,’ ‘Ubiquitin protein ligase activity,’ ‘Regulation of cytokine secretion,’ and ‘Ceramide metabolic process.’ These GO terms encompass broad functions that potentially interact and contribute to the oral immune response related to caries development, which have not been reported in the standard single marker based analysis. Collectively, our gene set enrichment analysis provided complementary insights into the molecular mechanisms and polygenic interactions in dental caries, revealing promising association signals that could not be detected through single marker analysis of GWAS data.
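Applying an FDR threshold of 0.05 to a list of gene set p-values can be sketched as follows. The abstract does not state which FDR procedure was used, so the Benjamini-Hochberg step-up adjustment shown here is an assumption, and the p-values and term labels are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four placeholder gene sets.
term_p_values = {"term_1": 0.01, "term_2": 0.04, "term_3": 0.03, "term_4": 0.20}
q_values = dict(zip(term_p_values, benjamini_hochberg(list(term_p_values.values()))))
significant_terms = [term for term, q in q_values.items() if q <= 0.05]
```

Terms that miss the threshold but rank near the top, such as the marginally associated terms mentioned above, would have to be selected by a separate rule.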
High-throughput genomic technologies are increasingly being used to identify therapeutic targets and risk factors for specific diseases. Using 116 independent liver samples, we identified 793 probe sets that demonstrated a significant association in the frequency of absent calls as tissues progressed from normal to pre-neoplastic to neoplastic. A subsequent bioinformatic analysis showed that 78.9% of the significant probe sets contained at least one CpG island in the gene promoter region, compared with 58.9% of the remaining genes examined. Our results indicate that further high-throughput methylation studies are warranted to more fully characterize the molecular events involved in hepatocarcinogenesis.
HCC; hepatocellular carcinoma; HCV; hepatitis C virus; methylation; microarray; hepatocarcinogenesis
Integrating evidence from multiple domains is useful in prioritizing disease candidate genes for subsequent testing. We ranked all known human genes (n = 3819) under linkage peaks in the Irish Study of High-Density Schizophrenia Families using three different evidence domains: 1) a meta-analysis of microarray gene expression results using the Stanley Brain collection, 2) a schizophrenia protein-protein interaction network, and 3) a systematic literature search. Each gene was assigned a domain-specific p-value and ranked after evaluating the evidence within each domain. For comparison to this ranking process, a large-scale candidate gene hypothesis was also tested by including genes with Gene Ontology terms related to neurodevelopment. Subsequently, genotypes of 3725 SNPs in 167 genes from a custom Illumina iSelect array were used to evaluate the top ranked vs. hypothesis selected genes. Seventy-three genes were both highly ranked and involved in neurodevelopment (category 1) while 42 and 52 genes were exclusive to neurodevelopment (category 2) or highly ranked (category 3), respectively. The most significant associations were observed in genes PRKG1, PRKCE, and CNTN4 but no individual SNPs were significant after correction for multiple testing. Comparison of the approaches showed an excess of significant tests using the hypothesis-driven neurodevelopment category. Random selection of similar sized genes from two independent genome-wide association studies (GWAS) of schizophrenia showed the excess was unlikely by chance. In a further meta-analysis of three GWAS datasets, four candidate SNPs reached nominal significance. Although gene ranking using integrated sources of prior information did not enrich for significant results in the current experiment, gene selection using an a priori hypothesis (neurodevelopment) was superior to random selection. As such, further development of gene ranking strategies using more carefully selected sources of information is warranted.
The molecular mechanisms underlying the reduced penetrance seen in the nonsense-mediated decay–positive (NMD+) BMPR2 mutation–associated hereditary pulmonary arterial hypertension (HPAH) remain unknown. We reasoned that the cellular and genetic mechanisms behind this phenomenon could be uncovered by combining expression profiling with Connectivity Map (cMap) analysis. Cultured lymphocytes from 10 patients with HPAH and 10 matched familial control subjects, all with NMD+ BMPR2 mutations, were subjected to expression analysis. For each group, the expression data were combined before analysis. This generated a signature of 23 up-regulated and 12 down-regulated genes in patients with HPAH compared with control subjects (the “PAH penetrance signature”). Although gene set enrichment analysis of this signature was not uniquely informative, cMap analysis identified drugs with expression signatures similar to the PAH penetrance signature. Several of these drugs were predicted to influence reactive oxygen species (ROS) formation. This hypothesis was tested and confirmed in the same cells initially subjected to the expression analysis using quantitative biochemical detection of ROS concentration. We conclude that expression of the PAH penetrance signature represents an increased risk of developing clinical HPAH and that ROS formation may play a role in pathogenesis of HPAH. These results provide the first molecular insights into NMD+ BMPR2 related HPAH penetrance and highlight the potential utility of cMap analyses in pulmonary research.
hereditary pulmonary arterial hypertension; ROS; connectivity map; expression signature; BMPR2
Along with computational approaches, NGS-led technologies have had a major impact on discoveries in miRNA biology, including the identification of novel miRNAs. However, to date all miRNA discovery tools depend on the availability of reference or genomic sequences. Here we introduce, for the first time, an approach, miReader, that can discover novel miRNAs without any dependence on genomic or reference sequences. The approach uses NGS read data to build highly accurate miRNA models, trained with a Multi-boosting algorithm that uses a Best-First Tree as its base classifier. It was comprehensively tested on a large amount of experimental data from a wide range of species, including human, plants, nematode, zebrafish, and fruit fly, and performed consistently with >90% accuracy. Applying the same tool to Illumina read data for Miscanthus, a plant whose genome has not been sequenced, the study reported 21 novel mature miRNA duplex candidates. Because miRNA discovery requires handling high-throughput data, the entire approach has been implemented in a standalone parallel architecture. This work is expected to benefit miRNA discovery in the majority of species, for which the availability of a genomic sequence would no longer be a requirement.
Background and Methods
Esophageal adenocarcinoma (EAC) is characterized by a steep rise in incidence rates in the Western population. The unique miRNA signature that distinguishes EAC from other upper gastrointestinal cancers remains unclear. Herein, we performed a comprehensive microarray profiling for the specific miRNA signature associated with EAC. We validated this signature by qRT-PCR.
Microarray analysis showed that 21 miRNAs were consistently deregulated in EAC. miR-194, miR-192, miR-200a, miR-21, miR-203, miR-205, miR-133b, and miR-31 were selected for validation using 46 normal squamous (NS), 23 Barrett’s esophagus (BE), 17 Barrett’s high grade dysplasia (HGD), 34 EAC, 33 gastric adenocarcinoma (GC), and 45 normal gastric (NG) tissues. The qRT-PCR analysis indicated that 2 miRNAs (miR-21 and miR-133b) were deregulated in both EAC and GC, and 6 miRNAs (up-regulated: miR-194, miR-31, miR-192, and miR-200a; down-regulated: miR-203 and miR-205) in EAC, as compared to BE but not in GC, indicating their potential unique role in EAC. Our data showed that miR-194, miR-192, miR-21, and miR-31 were up-regulated in BE adjacent to HGD lesions relative to isolated BE samples. Analysis of clinicopathological features indicated that down-regulation of miR-203 is significantly associated with progression and tumor stages in EAC. Interestingly, the overexpression levels of miR-194, miR-200a, and miR-192 were significantly higher in early EAC stages, suggesting that these miRNAs may be involved in EAC tumor development rather than progression.
Our findings demonstrate the presence of a unique miRNA signature for EAC. This may provide some clues for the distinct molecular features of EAC to be considered in future studies of the role of miRNAs in EAC and their utility as disease biomarkers.
Next generation sequencing (NGS) technologies allow us to explore virus interactions with host genomes that lead to carcinogenesis or other diseases; however, this effort is largely hindered by the dearth of efficient computational tools. Here, we present a new tool, VirusFinder, for the identification of viruses and their integration sites in host genomes using NGS data, including whole transcriptome sequencing (RNA-Seq), whole genome sequencing (WGS), and targeted sequencing data. VirusFinder’s unique features include the characterization of insertion loci of viruses of arbitrary type in the host genome, as well as high accuracy and computational efficiency as a result of its well-designed pipeline. The source code and additional data for VirusFinder are publicly available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Breast cancer (BCa) genome-wide association studies revealed allelic frequency differences between cases and controls at index single nucleotide polymorphisms (SNPs). To date, 71 loci have thus been identified and replicated. More than 320,000 SNPs at these loci define BCa risk due to linkage disequilibrium (LD). We propose that BCa risk resides in a subgroup of SNPs that functionally affects breast biology. Such a shortlist will aid in framing hypotheses to prioritize a manageable number of likely disease-causing SNPs. To find correlated SNPs, we extracted from the 1000 Genomes Project all SNPs residing in 1 Mb windows around each breast cancer risk index SNP. We used FunciSNP, an R/Bioconductor package developed in-house, to identify potentially functional SNPs at the 71 risk loci by coinciding them with chromatin biofeatures. We identified 1,005 SNPs in LD with the index SNPs (r² ≥ 0.5) in three categories: 21 in exons of 18 genes, 76 in transcription start site (TSS) regions of 25 genes, and 921 in enhancers. Thirteen SNPs were found in more than one category. We found two correlated and predicted non-benign coding variants (rs8100241 in exon 2 and rs8108174 in exon 3) of the gene ANKLE1. Most putative functional LD SNPs, however, were found in either epigenetically defined enhancers or in gene TSS regions. Fifty-five percent of these non-coding SNPs are likely functional, since they affect response element (RE) sequences of transcription factors. Functionality of these SNPs was assessed by expression quantitative trait loci (eQTL) analysis and allele-specific enhancer assays. Unbiased analyses of SNPs at BCa risk loci revealed new and overlooked mechanisms that may affect risk of the disease, thereby providing a valuable resource for follow-up studies.
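The LD filter described above (r² ≥ 0.5 within a 1 Mb window around an index SNP) can be sketched as below. This is not FunciSNP's code: it assumes phased haplotypes coded as 0/1 alleles, interprets the window as 500 kb on either side of the index SNP, and uses invented SNP labels, positions, and haplotypes.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per phased haplotype.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    return d * d / denom


def snps_in_ld(index_snp, candidates, window=1_000_000, threshold=0.5):
    """Names of candidate SNPs within the window and with r^2 >= threshold.

    Each SNP is a dict with hypothetical keys 'name', 'pos', and 'haps'.
    """
    half = window // 2
    return [
        snp["name"]
        for snp in candidates
        if abs(snp["pos"] - index_snp["pos"]) <= half
        and ld_r2(index_snp["haps"], snp["haps"]) >= threshold
    ]


# Invented example data for illustration only.
index = {"name": "index_snp", "pos": 1_000_000, "haps": [0, 1, 1, 0]}
candidates = [
    {"name": "snp_linked", "pos": 1_200_000, "haps": [0, 1, 1, 0]},
    {"name": "snp_unlinked", "pos": 1_100_000, "haps": [0, 0, 1, 1]},
    {"name": "snp_far", "pos": 2_000_000, "haps": [0, 1, 1, 0]},
]
ld_snps = snps_in_ld(index, candidates)
```

Only the candidate that is both inside the window and strongly correlated with the index SNP passes the filter.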