semantic biology; biological ontologies; semantic web; data representation; data analysis
During embryonic development a large number of widely differing and specialized cell types with identical genomes are generated from a single totipotent zygote. Tissue specific transcription factors cooperate with epigenetic modifiers to establish cellular identity in differentiated cells and epigenetic regulatory mechanisms contribute to the maintenance of distinct chromatin states and cell-type specific gene expression patterns, a phenomenon referred to as epigenetic memory. This is accomplished via the stable maintenance of various epigenetic marks through successive rounds of cell division. Preservation of DNA methylation patterns is a well-established mechanism of epigenetic memory, but more recently it has become clear that many other epigenetic modifications can also be maintained following DNA replication and cell division. In this review, we present an overview of the current knowledge regarding the role of histone lysine methylation in the establishment and maintenance of stable epigenetic states.
epigenetic memory; epigenetics; cell identity; inheritance; cell fate; histone modification; ES cells
The understanding of networks is a common goal of an unprecedented array of traditional disciplines. One of the protein network properties most influenced by the structural contents of its nodes is the inter-connectivity. Recent studies in which structural information was included in the topological analysis of protein networks revealed that the content of intrinsic disorder in the nodes could modulate the network topology, rewire networks, and change their inter-connectivity, which is defined by the clustering coefficient. Here, we review the role of intrinsic disorder present in the partners of the highly conserved 14-3-3 protein family on its interaction networks. The 14-3-3s are phospho-serine/threonine binding proteins that have a strong influence on the regulation of metabolism and signal transduction networks. Intrinsic disorder increases the clustering coefficients, namely the inter-connectivity of the nodes, within each 14-3-3 paralog network. We also review two new ideas to measure intrinsic disorder independently of the primary sequence of proteins: a thermodynamic model and a method that uses protein structures and their solvent environment. These new methods could be useful to explain unsolved questions about the versatility and fixation of intrinsic disorder through evolution. The relation between intrinsic disorder and network topologies could be an interesting model to investigate new implications of graph theory for biology.
protein intrinsic disorder; protein interaction networks; 14-3-3 protein family; protein phosphorylation; post-translational modifications
Analysis of the biological gene networks involved in a disease may lead to the identification of therapeutic targets. Such analysis requires exploring network properties, in particular the importance of individual network nodes (i.e., genes). There are many measures that consider the importance of nodes in a network and some may shed light on the biological significance and potential optimality of a gene or set of genes as therapeutic targets. This has been shown to be the case in cancer therapy. A dilemma exists, however, in finding the best therapeutic targets based on network analysis since the optimal targets should be nodes that are highly influential in, but not toxic to, the functioning of the entire network. In addition, cancer therapeutics targeting a single gene often result in relapse since compensatory, feedback and redundancy loops in the network may offset the activity associated with the targeted gene. Thus, multiple genes reflecting parallel functional cascades in a network should be targeted simultaneously, which in turn requires the identification of such targets. We propose a methodology that exploits centrality statistics characterizing the importance of nodes within a gene network that is constructed from the gene expression patterns in that network. We consider centrality measures based on both graph theory and spectral graph theory. We also consider the origins of a network topology, and show how different available representations yield different node importance results. We apply our techniques to tumor gene expression data and suggest that the identification of optimal therapeutic targets involving particular genes, pathways and sub-networks based on an analysis of the nodes in that network is possible and can facilitate individualized cancer treatments. 
The proposed methods also have the potential to identify candidate cancer therapeutic targets that are not thought to be oncogenes but nonetheless play important roles in the functioning of a cancer-related network or pathway.
network analysis; centrality; cancer; pathway; drug targets; personalized treatment; gene expression
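As a rough illustration of the two families of centrality measures mentioned in the network-analysis abstract above, the following minimal sketch computes degree centrality (graph-theoretic) and eigenvector centrality (spectral) on a toy undirected network. The node names and edges are hypothetical, and the abstract does not state which specific measures or software the authors used, so this should be read as one plausible instance of the idea rather than their method.

```python
# Toy undirected network; node names and edges are made up for illustration.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def adjacency(edge_list):
    """Build a dict mapping each node to the set of its neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Leading eigenvector of the adjacency matrix via power iteration.

    Iterating with (A + I) instead of A keeps the same eigenvectors but
    avoids oscillation on bipartite graphs.
    """
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {node: score[node] + sum(score[nbr] for nbr in adj[node])
               for node in adj}
        norm = sum(value * value for value in new.values()) ** 0.5
        new = {node: value / norm for node, value in new.items()}
        if max(abs(new[node] - score[node]) for node in adj) < tol:
            return new
        score = new
    return score


adj = adjacency(edges)
deg = degree_centrality(adj)
eig = eigenvector_centrality(adj)

# Rank nodes from most to least central under each measure.
print(sorted(deg, key=deg.get, reverse=True))
print(sorted(eig, key=eig.get, reverse=True))
```

The two rankings need not agree, which is one simple way to see why different measures and network representations can yield different node importance results.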
In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained by applying the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. To elucidate the functional relevance of the inferred network, we perform a Gene Ontology (GO) analysis of its structural components. Our analysis reveals that the most significant GO terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information about interacting genes in the breast cancer network. We find that chromosome 21 is most coactive with other chromosomes. To our knowledge, this is the first study investigating the genome-scale breast cancer network.
breast cancer; gene regulatory network; BC3Net; GPEA; statistical inference; computational genomics
Glioblastomas show heterogeneous histological features. These distinct phenotypic states are thought to be associated with the presence of glioma stem cells (GSCs), a highly tumorigenic and self-renewing sub-population of tumor cells that have different functional characteristics. Differentiation of GSCs may be regulated by multi-tiered epigenetic mechanisms that orchestrate the expression of thousands of genes. One such regulatory mechanism involves functional non-coding RNAs (ncRNAs), such as microRNAs (miRNAs); a large number of ncRNAs have been identified and shown to regulate the expression of genes associated with cell differentiation programs. Given the roles of miRNAs in cell differentiation, it is possible they are involved in the regulation of gene expression networks in GSCs that are important for the maintenance of the pluripotent state and for directing differentiation. Here, we review recent findings on ncRNAs associated with GSC differentiation and discuss how these ncRNAs contribute to the establishment of tissue heterogeneity during glioblastoma tumor formation.
epigenetics; glioma; cancer stem cells; long non-coding RNA; micro RNA
population genetics; population growth; distribution of fitness effects; disease-causing mutations; coalescent theory
The Illumina NexteraXT transposon protocol is a cost-effective way to generate paired-end libraries. However, the resulting insert size is highly sensitive to the concentration of DNA used, and the variation of insert sizes is often large. One consequence is that some fragments may have an insert shorter than the length of a single read, particularly where the library is designed to produce overlapping paired-end reads in order to produce longer continuous sequences. Such small insert sizes mean fewer long reads, and also result in the presence of adapter at the end of the read. Here we present a protocol that uses publicly available tools to identify read pairs that have small insert sizes and are therefore likely to contain adapter, to check the sequence of the adapter, and to remove adapter sequence from the reads. This protocol does not require a reference genome or prior knowledge of the sequence to be trimmed. Whilst the presence of fragments with small insert sizes may be a particular problem for NexteraXT libraries, the principle can be applied to any Illumina dataset in which the presence of such small inserts is suspected.
Nextera; fastq; insert; adapter; Illumina
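The principle behind the adapter-trimming abstract above can be sketched without a reference genome: when the insert is shorter than the read, the first k bases of read 1 are the reverse complement of the first k bases of read 2, and whatever follows position k is adapter. The sketch below is a simplified illustration only. It uses exact matching (real reads contain sequencing errors), the function names are hypothetical, the adapter string is a made-up placeholder rather than a real Illumina sequence, and it is not the set of publicly available tools the protocol refers to.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_short_insert(read1, read2, min_insert=10):
    """Return the insert length k if it is shorter than the reads, else None.

    Tries the longest candidate first and requires an exact match between
    read1[:k] and the reverse complement of read2[:k].
    """
    limit = min(len(read1), len(read2))
    for k in range(limit - 1, min_insert - 1, -1):
        if read1[:k] == revcomp(read2[:k]):
            return k
    return None


def trim_pair(read1, read2, min_insert=10):
    """Trim read-through adapter; also return the removed tail of read 1."""
    k = find_short_insert(read1, read2, min_insert)
    if k is None:
        return read1, read2, ""
    return read1[:k], read2[:k], read1[k:]


# Simulated pair: a 16-base insert sequenced with 24-base reads.
insert = "ACGTTGCAAGGCTTAC"
fake_adapter = "GGGGTTTTCC"  # placeholder, NOT a real adapter sequence
read1 = insert + fake_adapter[:8]
read2 = revcomp(insert) + fake_adapter[:8]

trimmed1, trimmed2, removed_tail = trim_pair(read1, read2)
print(trimmed1, trimmed2, removed_tail)
```

Collecting the removed tails across many read pairs gives a way to inspect the adapter sequence itself, which corresponds to the "check the sequence of the adapter" step.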
The discovery of microRNAs (miRNAs) has led to a paradigm shift in our basic understanding of gene regulation. Competing endogenous RNAs (ceRNAs) are the recent entrants adding to the complexities of miRNA mediated gene regulation. ceRNAs are RNAs that share miRNA recognition elements (MREs) thereby regulating each other. It is apparent that miRNAs act as rheostats that fine-tune gene expression and maintain the functional balance of various gene networks. Thus MREs in coding and non-coding transcripts have evolved to become the crosstalk hubs of gene interactions, affecting the expression levels and activities of different ceRNAs. Decoding the crosstalk between MREs mediated by ceRNAs is critical to delineate the intricacies in gene regulation, and we have just begun to unravel this complexity.
competing endogenous RNAs; miRNAs ceRNAs; microRNAs; MREs; sponge effect; RNA-RNA crosstalk
Gene–environment interaction (GEI) analysis can potentially enhance gene discovery for common complex traits. However, genome-wide interaction analysis is computationally intensive. Moreover, analysis of longitudinal data in families is much more challenging because of the two sources of correlation arising from longitudinal measurements and family relationships. A genome-wide interaction study (GWIS) of longitudinal family data can therefore be a computational bottleneck. We compared two methods for analysis of longitudinal family data: a methodologically sound but computationally demanding method using the Kronecker model (KRC) and a computationally more forgiving method using the hierarchical linear model (HLM). The KRC model uses a Kronecker product of an unstructured matrix for correlations among repeated (longitudinal) measures and a compound symmetry matrix for correlations within families at a given visit. The HLM uses an autoregressive covariance matrix for correlations among repeated measures and a random intercept for familial correlations. We compared the two methods using the longitudinal Framingham heart study (FHS) SHARe data. Specifically, we evaluated SNP–alcohol (amount of alcohol consumption) interaction effects on high density lipoprotein cholesterol (HDLC). Keeping the prohibitive computational burden of KRC in mind, we limited the analysis to chromosome 16, where preliminary cross-sectional analysis yielded some interesting results. Our first important finding was that the HLM provided very comparable results but was remarkably faster than the KRC, making HLM the method of choice. Our second finding was that longitudinal analysis provided smaller P-values, thus leading to more significant results, than cross-sectional analysis. This was particularly pronounced in identifying GEIs. We conclude that longitudinal analysis of GEIs is more powerful and that the HLM is the preferable method compared with the computationally prohibitive KRC method.
gene–environment interactions; longitudinal family data; Framingham heart study; interactions in family data; HLM; SNP–alcohol interactions
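To make the Kronecker (KRC) correlation structure described in the abstract above more concrete, the sketch below builds a joint correlation matrix as the Kronecker product of an unstructured visit-by-visit matrix and a compound symmetry within-family matrix. All numeric values are hypothetical, the ordering of the two factors is an assumption, and the helper names are illustrative only; the abstract does not give the fitted matrices.

```python
def compound_symmetry(size, rho):
    """Correlation matrix with 1 on the diagonal and rho everywhere else."""
    return [[1.0 if i == j else rho for j in range(size)] for i in range(size)]


def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


# Hypothetical unstructured correlations among three visits.
visits = [
    [1.0, 0.7, 0.5],
    [0.7, 1.0, 0.6],
    [0.5, 0.6, 1.0],
]

# Hypothetical compound symmetry among two family members at one visit.
family = compound_symmetry(2, 0.3)

# Rows and columns are ordered visit-major: (visit 1, member 1),
# (visit 1, member 2), (visit 2, member 1), and so on.
joint = kronecker(visits, family)

for row in joint:
    print(["%.2f" % value for value in row])
```

The joint matrix grows as (visits x family size) squared, which gives some intuition for why the KRC approach becomes computationally demanding for large pedigrees with many visits.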
When analyzing the data that arise from exome or whole-genome sequencing studies, window-based tests (i.e., tests that jointly analyze all genetic data in a small genomic region) are very popular. However, power is known to be quite low for finding associations with phenotypes using these tests, and therefore a variety of analytic strategies may be employed to potentially improve power. Using sequencing data of all of chromosome 3 from an interim release of data on 2432 individuals from the UK10K project, we simulated phenotypes associated with rare genetic variation, and used the results to explore the power of window-based tests. We asked two specific questions: firstly, whether there could be substantial benefits associated with incorporating information from external annotation on the genetic variants, and secondly, whether false discovery rates (FDRs) would be a useful metric for assessing significance. Although, as expected, there are benefits to using additional information (such as annotation) when it is associated with causality, we confirmed the general pattern of low sensitivity and power for window-based tests. For our chosen example, even when power is high to detect some of the associations, many of the regions containing causal variants are not detectable, despite using lax significance thresholds and optimal analytic methods. Furthermore, our estimated FDR values tended to be much smaller than the true FDRs. Long-range correlations between variants, due to linkage disequilibrium, likely explain some of this bias. A more sophisticated approach to using the annotation information may improve power; however, many causal variants of realistic effect sizes may simply be undetectable, at least with this sample size. Perhaps annotation information could assist in distinguishing windows containing causal variants from windows that are merely correlated with causal variants.
rare genetic variants; SNV; false discovery rate; multiple testing; genomic annotation; whole genome sequencing; window-based tests; stratified false discovery rate
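For readers unfamiliar with how FDR values are typically estimated from a set of window-level P-values, the sketch below implements the widely used Benjamini-Hochberg adjustment. This is an assumption for illustration: the abstract above does not state which FDR estimator was used (the keywords also mention a stratified variant), and the P-values here are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical P-values for four genomic windows.
window_pvalues = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(window_pvalues)
print(q_values)
```

The procedure is usually justified under independence or certain forms of positive dependence between tests, so long-range correlation between windows is one plausible reason why estimated and true FDRs can diverge, as the abstract reports.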
Limited understanding of the Rb1 locus hinders genetic and epigenetic analyses of retinoblastoma, a childhood cancer of the nervous system. In this study, we used in silico tools to investigate and review putative genetic and epigenetic elements of the Rb1 gene. We report transcription start sites, CpG islands, and regulatory moieties that are likely to influence transcriptional states of this gene. These might contribute genetic and epigenetic information modulating tissue-specific transcripts and expression levels of Rb1. The elements we identified include tandem repeats that reside within or next to CpG islands near Rb1's transcriptional start site, and that are likely to be polymorphic among individuals. Our analyses highlight the complexity of this gene and suggest opportunities and limitations for future studies of retinoblastoma, genetic counseling, and the accurate identification of patients at greater risk of developing the malignancy.
retinoblastoma; epigenetics; CpG islands; in silico analysis
There are twenty-five known inherited cardiac arrhythmia susceptibility genes, all of which encode either ion channel pore-forming subunits or proteins that regulate aspects of ion channel biology such as function, trafficking, and localization. The human KCNE gene family comprises five potassium channel regulatory subunits, sequence variants in each of which are associated with cardiac arrhythmias. KCNE gene products exhibit promiscuous partnering and in some cases ubiquitous expression, hampering efforts to unequivocally correlate each gene to specific native potassium currents. Likewise, deducing the molecular etiology of cardiac arrhythmias in individuals harboring rare KCNE gene variants, or more common KCNE polymorphisms, can be challenging. In this review we provide an update on putative arrhythmia-causing KCNE gene variants, and discuss current thinking and future challenges in the study of molecular mechanisms of KCNE-associated cardiac rhythm disturbances.
MinK-related peptide; MiRP; Long QT Syndrome; atrial fibrillation; Brugada Syndrome
Twin and family studies have shown that most traits are at least moderately heritable. But what are the implications of finding genetic influence for the design of intervention and prevention programs? For complex traits, heritability does not mean immutability, and research has shown that genetic influences can change with age, context, and in response to behavioral and drug interventions. The most significant implications for intervention will come when we move from observational genetics to investigating dynamic genetics, including genetically sensitive interventions. Future interventions should be designed to overcome genetic risk and draw upon genetic strengths by changing the environment.
intervention; gene-environment interaction; dynamic genetics; Twins; heritability
Leukocyte telomere length is believed to measure cellular aging in humans, and short leukocyte telomere length is associated with increased risks of late-onset diseases such as cardiovascular disease and dementia. Many studies have shown that leukocyte telomere length is a heritable trait, and several candidate genes have been identified, including TERT, TERC, OBFC1, and CTC1. Unlike most studies that have focused on genetic causes of chronic diseases such as heart disease and diabetes in relation to leukocyte telomere length, the present study examined the genome to identify variants that may contribute to variation in leukocyte telomere length among families with exceptional longevity. From the genome-wide association analysis in 4,289 LLFS participants, we identified a novel intergenic SNP rs7680468 located near PAPSS1 and DKK2 on 4q25 (p = 4.7E-8). From our linkage analysis, we identified two additional novel loci with HLOD scores exceeding three, including 4.77 for 17q23.2, and 4.36 for 10q11.21. These two loci harbor a number of novel candidate genes with SNPs, and our gene-wise association analysis identified multiple genes, including DCAF7, POLG2, CEP95, and SMURF2 at 17q23.2; and RASGEF1A, HNRNPF, ANF487, CSTF2T, and PRKG1 at 10q11.21. Among these genes, multiple SNPs were associated with leukocyte telomere length, but the strongest association was observed with one contiguous haplotype in CEP95 and SMURF2. We also show that three previously reported genes (TERC, MYNN, and OBFC1) were significantly associated with leukocyte telomere length at empirical p < 0.05.
telomere length; aging; familial longevity; genome wide association and linkage; family-based study; novel genes
RNA-seq; asymmetry; neurons; proteomics; genomics; heterogeneity; single-neuron diversity
Frequent and devastating epidemics of parasites are one of the major issues encountered by modern agriculture. To manage the impact of pathogens, resistant plant varieties have been selected. However, resistances are overcome by parasites, requiring the use of pesticides and causing new economic and food safety issues. A promising strategy to maintain the epidemic at a low level and hamper the pathogen's adaptation to varietal resistance is the use of mixtures of varieties such that the mix will form a heterogeneous environment for the parasite. One way to find a good combination of varieties that will actually constitute a heterogeneous environment for pathogens is to look for genotype × genotype (G × G) interactions between pathogens and plant varieties. A pattern in which pathogens have a high fitness on one variety and a poor fitness on other varieties guarantees the efficiency of the mixture strategy. In the present article, we inoculated 18 different genotypes of the fungus Magnaporthe oryzae on three rice plant varieties showing different levels of partial resistance in order to find a variety combination compatible with the requirements of the variety mixture strategy, i.e., showing appropriate G × G interactions. We estimated the success of each plant-fungus interaction by measuring fungal fitness and three fungal life history traits: infection success, within-host growth, and sporulation capacity. Our results show the existence of G × G interactions between the two varieties Ariete and CO39 on all measured traits and fungal fitness. We also observed that these varieties have different resistance mechanisms; Ariete is good at controlling infection success of the parasite but is not able to control its growth once inside the leaf, while CO39 shows the opposite pattern. We also found that Maratelli's resistance has been eroded. Finally, correlation analyses demonstrated that not all infectious traits are positively correlated.
G × G interactions; variety mixture; partial resistance; rice; Magnaporthe oryzae; rice blast disease
The diagnosis of a suspected tumor lesion faces two basic problems: detection and identification of the specific type of tumor. Radiological techniques are commonly used for the detection and localization of solid tumors. A prerequisite is a high intrinsic or enhanced contrast between normal and neoplastic tissue. Identification of the tumor type is still based on histological analysis. The result depends critically on the sampling sites, which, given the inherent heterogeneity of tumors, constitutes a major limitation. Non-invasive in vivo imaging might overcome this limitation by providing comprehensive three-dimensional morphological, physiological, and metabolic information as well as the possibility for longitudinal studies. In this context, magnetic resonance based techniques are quite attractive since they offer at the same time high spatial resolution, unique soft tissue contrast, good temporal resolution to study dynamic processes, and high chemical specificity. The goal of this paper is to review the role of magnetic resonance techniques in characterizing tumor tissue in vivo at both morphological and physiological levels. The first part of this review covers methods that provide information on specific aspects of tumor phenotypes considered as indicators of malignancy. These comprise measurements of the inflammatory status, neo-vascular physiology, acidosis, tumor oxygenation, and metabolism together with tissue morphology. Even if the spatial resolution is not sufficient to characterize the tumor phenotype at a cellular level, this multiparametric information might potentially be used for classification of tumors. The second part discusses mathematical tools that allow tissue to be characterized based on the acquired three-dimensional data set. In particular, methods addressing tumor heterogeneity will be highlighted. Finally, we address the potential and limitations of using MRI as a tool to provide in vivo tissue characterization.
in vivo; histology; MRI; tumor; classification; physiology; metabolism; tissue
Asthma is characterized by lung inflammation caused by complex interaction between the immune system and environmental factors such as allergens and inorganic pollutants. Recent research in this field is focused on discovering new biomarkers associated with asthma pathogenesis. This review illustrates updated research associating biomarkers of allergic asthma and their potential use in systems biology of the disease. We focus on biomolecules with altered expression, which may serve as inflammatory, diagnostic and therapeutic biomarkers of asthma discovered in human or experimental asthma model using genomic, proteomic and epigenomic approaches for gene and protein expression profiling. These include high-throughput technologies such as state of the art microarray and proteomics Mass Spectrometry (MS) platforms. Emerging concepts of molecular interactions and pathways may provide new insights in searching potential clinical biomarkers. We summarized certain pathways with significant linkage to asthma pathophysiology by analyzing the compiled biomarkers. Systems approaches with this data can identify the regulating networks, which will eventually identify the key biomarkers to be used for diagnostics and drug discovery.
allergic asthma; biomarker; DAAB; TH-2 cytokines and ROS pathway
To better understand dynamic disease processes, integrated multi-omic methods are needed, yet comparing different types of omic data remains difficult. Integrative solutions benefit experimenters by eliminating potential biases that come with single omic analysis. We have developed the methods needed to explore whether a relationship exists between co-expression network models built from transcriptomic and proteomic data types, and whether this relationship can be used to improve the disease signature discovery process. A naïve, correlation based method is utilized for comparison. Using publicly available infectious disease time series data, we analyzed the related co-expression structure of the transcriptome and proteome in response to SARS-CoV infection in mice. Transcript and peptide expression data were filtered using quality scores and subset by taking the intersection on mapped Entrez IDs. Using this data set, independent co-expression networks were built. The networks were integrated by constructing a bipartite module graph based on module member overlap, module summary correlation, and correlation to phenotypes of interest. Compared to the module level results, the naïve approach is hindered by a lack of correlation across data types, less significant enrichment results, and little functional overlap across data types. Our module graph approach avoids these problems, resulting in an integrated omic signature of disease progression, which allows prioritization across data types for down-stream experiment planning. Integrated modules exhibited related functional enrichments and could suggest novel interactions in response to infection. These disease and platform-independent methods can be used to realize the full potential of multi-omic network signatures. The data (experiment SM001) are publicly available through the NIAID Systems Virology (https://www.systemsvirology.org) and PNNL (http://omics.pnl.gov) web portals. 
Phenotype data is found in the supplementary information. The ProCoNA package is available as part of Bioconductor 2.13.
omics; networks; data integration; proteomics; transcriptomics; virology; biomarkers; SARS
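One ingredient of the bipartite module graph described in the multi-omic abstract above is module member overlap. The sketch below illustrates that single ingredient by linking transcript modules to peptide modules whenever the Jaccard overlap of their member IDs reaches a threshold. The module names, member IDs, threshold, and the choice of Jaccard index are all assumptions for illustration; this is not the ProCoNA implementation, and it ignores the module summary correlation and phenotype correlation criteria.

```python
def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def module_graph(transcript_modules, peptide_modules, min_overlap=0.25):
    """Return bipartite edges (transcript module, peptide module, weight)."""
    edges = []
    for t_name, t_members in transcript_modules.items():
        for p_name, p_members in peptide_modules.items():
            weight = jaccard(t_members, p_members)
            if weight >= min_overlap:
                edges.append((t_name, p_name, weight))
    return edges


# Hypothetical modules; the integers stand in for mapped gene identifiers.
transcript_modules = {"T1": {1, 2, 3, 4}, "T2": {5, 6, 7}}
peptide_modules = {"P1": {2, 3, 4, 8}, "P2": {7, 9}}

bipartite_edges = module_graph(transcript_modules, peptide_modules)
print(bipartite_edges)
```

Because both data types are first restricted to a shared identifier space, set overlap between modules is well defined; edge weights could then be combined with correlation-based scores to rank module pairs.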
Dysfunction in the dopaminergic and serotonergic neurotransmitter systems has been demonstrated to be important in the etiology of borderline personality disorder (BPD). We investigated the relationship of two BPD risk factors, the HTR1A promoter polymorphism -1019C > G (rs6295) and the dopamine transporter (DAT1) repeat allele, with BPD in a major depressive disorder cohort of 367 patients. Out-patients with major depressive disorder were recruited for two treatment trials and assessed for personality disorders, including BPD. DNA samples were collected and the rs6295 polymorphism was detected with a TaqMan® assay. The DAT1 repeat allele was genotyped using a modified PCR method. The impact of polymorphisms on BPD was statistically analyzed using uncontrolled logistic and multiple logistic regression models. BPD patients had higher frequencies of the DAT1 9,9 (OR = 2.67) and 9,10 (OR = 3.67) genotypes and of homozygosity for the HTR1A G allele (OR = 2.03). No significant interactions between HTR1A and DAT1 genotypes were observed; however, an increased risk of BPD was observed for those patients who were either 9,10; G,G (OR = 6.64) or 9,9; C,G (OR = 5.42). Furthermore, the odds of BPD in patients exhibiting high-risk variants of these two genes differed from those of patients in low-risk groups by up to a factor of 9. Our study provides evidence implicating the serotonergic and dopaminergic systems in BPD and suggests that the interaction between genes from different neurotransmitter systems may play a role in the susceptibility to BPD.
personality disorder; depression; 5-HT; dopamine; genetics
Cancer stem cells (CSCs) have been reported in many human tumors and are proposed to drive tumor initiation and progression. CSCs share a variety of biological properties with normal somatic stem cells such as the capacity for self-renewal, the propagation of differentiated progeny, and the expression of specific cell surface markers and stem cell genes. However, CSCs differ from normal stem cells in their chemoresistance and tumorigenic and metastatic activities. Despite their potential clinical importance, the regulation of CSCs at the molecular level is not well-understood. MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs that play an important role in the regulation of several cellular, physiological, and developmental processes. Aberrant miRNA expression is associated with many human diseases including cancer. miRNAs have been implicated in the regulation of CSC properties; therefore, a better understanding of the modulation of CSC gene expression by miRNAs could aid the identification of promising biomarkers and therapeutic targets. In the present review, we summarize the major findings on the regulation of CSCs by miRNAs and discuss recent advances that have improved our understanding of the regulation of CSCs by miRNA networks and may lead to the development of miRNA therapeutics specifically targeting CSCs.
microRNA; cancer stem cells (CSCs); tumor initiation; therapy resistance; metastasis
In African trypanosomes, there is no control of transcription initiation by RNA polymerase II at the level of individual protein-coding genes. Transcription is polycistronic, and individual mRNAs are excised by trans-splicing and polyadenylation. As a consequence, trypanosomes are uniquely reliant on post-transcriptional mechanisms for control of gene expression. Rates of mRNA decay vary over up to two orders of magnitude, making these organisms an excellent model system for the study of mRNA degradation processes. The trypanosome CAF1-NOT complex is simpler than that of other organisms, with no CCR4 or NOT4 homolog: it consists of CAF1, NOT1, NOT2, NOT5, NOT9, NOT10, and NOT11. It is important for the initiation of degradation of most, although not all, mRNAs. There is no homolog of NOT4, and Tho and TREX complexes are absent. Functions of the trypanosome NOT complex are therefore likely to be restricted mainly to deadenylation. Mechanisms that cause the NOT complex to deadenylate some mRNAs faster than others must exist, but have not yet been described.
Trypanosoma; deadenylation; mRNA decay; mRNA degradation; NOT complex; CAF1
Colon cancer has the third highest incidence and mortality among cancers in the United States. MicroRNA-21 (miR21) has been described as an oncomir that is highly overexpressed in tumor tissue from colorectal cancer. Recent studies showed that silencing of miR21 through use of a miR21 inhibitor (anti-miR21) affected viability, apoptosis and the cell cycle in colon cancer cells. We identified an anti-miR21 that targets miR21 to inhibit genes by both post-transcriptional gene silencing and transcriptional gene silencing in the cytoplasm and nucleus, respectively. Overexpression of anti-miR21 in colon cancer cells caused changes in miRNA expression levels. We found that treatment with anti-miR21 down-regulated expression of miR30, which is involved in angiogenesis. In an in vitro angiogenesis assay, network formation induced by an angiogenesis activator was reduced upon treatment with anti-miR21. Sequence analysis of anti-miR21 and pri-miR30 revealed homology between anti-miR21 and the 3′ end of pri-miR30, suggesting that anti-miR21 may bind to pri-miR30 and block processing of the miRNA. These results suggest anti-miR21 has a role not only in tumor growth but also in angiogenesis. Therefore, treatment with the anti-miR21 antagomir may have a synergistic effect mediated through suppression of miR30.
microRNA; anti-miR21; perturbation; siRNA; colon cancer; angiogenesis
micro RNA; miRNA-7; circular RNAs; evolution; gene regulation; Alzheimer's disease; transcriptome; hippocampal CA1