Cadherins are cell–cell adhesion proteins essential for the maintenance of tissue architecture and integrity, and their impairment is often associated with human cancer. Knowledge of the regulatory mechanisms associated with cadherin misexpression in cancer is scarce. Specific features of the intronic structure and of intron-based regulatory mechanisms in the cadherin superfamily have not been identified. This study aims to systematically characterize the intronic portion of cadherin superfamily members and to identify intronic regions constituting putative targets/triggers of regulation, using a bioinformatic approach and biological data mining. Our study demonstrates that cadherin superfamily genes harbour specific characteristics in comparison to all non-cadherin genes, from both the genomic and transcriptional standpoints. Cadherin superfamily genes display a higher average total intron number and significantly longer introns than other genes, and this holds across the entire vertebrate lineage. Moreover, in the human genome, we observed an uncommonly high frequency of MIR (mammalian-wide interspersed repeats) and MaLR (mammalian apparent LTR retrotransposons) regulatory-associated repetitive elements at 5′-located introns, concomitantly with increased de novo intronic transcription. Using this approach, we identified cadherin intronic-specific sites that may constitute novel targets/triggers of cadherin superfamily expression regulation. These findings pinpoint the need to identify mechanisms affecting particularly MIR and MaLR elements located in introns 2 and 3 of human cadherin genes, possibly important in the expression modulation of this superfamily in homeostasis and cancer.
cadherin; cancer; intronic-based regulatory elements; MIR; MaLR; transcription
Control of systemic iron homeostasis is interconnected with the inflammatory response through the key iron regulator, the antimicrobial peptide hepcidin. We have previously shown that mice with iron deficiency anemia (IDA) and low hepcidin show a pro-inflammatory response that is blunted in iron-deficient, high-hepcidin Tmprss6 KO mice. The transcriptional response associated with chronic hepcidin overexpression due to genetic inactivation of Tmprss6 is unknown. By using whole genome transcription profiling of the liver and analysis of spleen immune-related genes, we identified several functional pathways differentially expressed in Tmprss6 KO mice compared to IDA animals, and thus irrespective of the iron status. To define genes that are potential targets of Tmprss6, we analyzed liver gene expression changes according to the genotype and independently of treatment. Tmprss6 inactivation causes down-regulation of liver pathways connected to immune and inflammatory response as well as spleen genes related to macrophage activation and inflammatory cytokine production. The anti-inflammatory status of Tmprss6 KO animals was confirmed by the down-regulation of pathways related to immunity, stress response and intracellular signaling in both liver and spleen after LPS treatment. In contrast to Tmprss6 KO mice, Hfe−/− mice are characterized by iron overload with inappropriately low hepcidin levels. Liver expression profiling of Hfe−/− versus iron-loaded mice shows the opposite expression of some of the genes modulated by the loss of Tmprss6. Altogether, our results confirm the anti-inflammatory status of Tmprss6 KO mice and identify new potential target pathways/genes of Tmprss6.
Recent epidemiological reports of associations between socioeconomic status and epigenetic markers that predict vulnerability to diseases are bringing to light substantial biological effects of social inequalities. Here, we start the discussion of the moral consequences of these findings. We firstly highlight their explanatory importance in the context of the research program on the Developmental Origins of Health and Disease (DOHaD) and the social determinants of health. In the second section, we review some theories of the moral status of health inequalities. Rather than a complete outline of the debate, we single out those theories that rest on the principle of equality of opportunity and analyze the consequences of DOHaD and epigenetics for these particular conceptions of justice. We argue that DOHaD and epigenetics reshape the conceptual distinction between natural and acquired traits on which these theories rely and might provide important policy tools to tackle unjust distributions of health.
Autoinflammatory diseases are rare illnesses characterized by apparently unprovoked inflammation without high-titer auto-antibodies or antigen-specific T cells. They may cause neurological manifestations, such as meningitis and hearing loss, but they are also characterized by non-neurological manifestations. In this work we studied a 30-year-old man who had a chronic disease characterized by meningitis, progressive hearing loss, persistently raised inflammatory markers and diffuse leukoencephalopathy on brain MRI. He also suffered from chronic recurrent osteomyelitis of the mandible. The hypothesis of an autoinflammatory disease prompted us to test for the presence of mutations in interleukin-1−pathway genes and to investigate the function of this pathway in the mononuclear cells obtained from the patient. Search for mutations in genes associated with interleukin-1−pathway demonstrated a novel NLRP3 (CIAS1) mutation (p.I288M) and a previously described MEFV mutation (p.R761H), but their combination was found to be non-pathogenic. On the other hand, we uncovered a selective interleukin-6 hypersecretion within the central nervous system as the likely pathogenic mechanism. This is also supported by the response to the anti-interleukin-6−receptor monoclonal antibody tocilizumab, but not to the recombinant interleukin-1−receptor antagonist anakinra. Exome sequencing failed to identify mutations in other genes known to be involved in autoinflammatory diseases. We propose that the disease described in this patient might be a prototype of a novel category of autoinflammatory diseases characterized by prominent neurological involvement.
Anakinra; Aseptic meningitis; NLRP3 (CIAS1); Hearing loss; Interleukin-1; Interleukin-6; Leukoencephalopathy; MEFV; Tocilizumab
Co-option of cis-regulatory modules has been suggested as a mechanism for the evolution of expression sites during development. However, the extent and mechanisms involved in mobilization of cis-regulatory modules remain elusive. To trace the history of non-coding elements, which may represent candidate ancestral cis-regulatory modules affirmed during chordate evolution, we have searched for conserved elements in tunicate and vertebrate (Olfactores) genomes. We identified, for the first time, 183 non-coding sequences that are highly conserved between the two groups. Our results show that all but one of the elements are conserved in non-syntenic regions between vertebrate and tunicate genomes, while being syntenic among vertebrates. Nevertheless, in all the groups, they are significantly associated with transcription factors showing specific functions fundamental to animal development, such as multicellular organism development and sequence-specific DNA binding. The majority of these regions map onto ultraconserved elements and we demonstrate that they can act as functional enhancers within the organism of origin, as well as in cross-transgenesis experiments, and that they are transcribed in extant species of Olfactores. We refer to the elements as ‘Olfactores conserved non-coding elements’.
Recent advances in genomics technologies have spurred unprecedented efforts in genome and exome re-sequencing aiming to unravel the genetic component of rare and complex disorders. While in rare disorders this has allowed the identification of novel causal genes, the missing heritability paradox in complex diseases remains unresolved. Despite rapid advances in next-generation sequencing, both the technology and the analysis of the data it produces are in their infancy. At present there is abundant knowledge pertaining to the role of rare single nucleotide variants (SNVs) in rare disorders and of common SNVs in common disorders. Although the 1000 Genomes Project has clearly highlighted the prevalence of rare variants and more complex variants (e.g. insertions, deletions), their role in disease is as yet far from elucidated.
We set out to analyse the properties of sequence variants identified in a comprehensive collection of exome re-sequencing studies performed on samples from patients affected by a broad range of complex and rare diseases (N = 173). Given the known potential for Loss of Function (LoF) variants to be false positives, we performed an extensive validation of the common, rare and private LoF variants identified, which indicated that most of the private and rare variants identified were indeed true, while common novel variants had a significantly higher false positive rate. Our results indicated a strong enrichment of very low-frequency insertion/deletion variants, so far under-investigated, which might be difficult to capture with low coverage and imputation approaches and for which most study designs would be under-powered. These insertions and deletions might play a significant role in disease genetics, contributing specifically to the underlying rare and private variation predicted to be discovered through next-generation sequencing.
Parkinson’s disease (PD) is a common, adult-onset, neurodegenerative disorder characterized by cardinal motor signs that are mainly due to the loss of dopaminergic neurons in the substantia nigra. To date, researchers still have limited understanding of the key molecular events that provoke neurodegeneration in this disease. Here, we present ParkDB, the first queryable database dedicated to gene expression in PD. ParkDB contains a complete set of re-analyzed, curated and annotated microarray datasets. This resource enables scientists to identify and compare expression signatures involved in PD and dopaminergic neuron differentiation under different biological conditions and across species.
Database URL: http://www2.cancer.ucl.ac.uk/Parkinson_Db2/
Recent multi-dimensional approaches to the study of complex disease have revealed powerful insights into how genetic and epigenetic factors may underlie their aetiopathogenesis. We examined genotype-epigenotype interactions in the context of Type 2 Diabetes (T2D), focussing on known regions of genomic susceptibility. We assayed DNA methylation in 60 females, stratified according to disease susceptibility haplotype using previously identified association loci. CpG methylation was assessed using methylated DNA immunoprecipitation on a targeted array (MeDIP-chip) and absolute methylation values were estimated using a Bayesian algorithm (BATMAN). Absolute methylation levels were quantified across LD blocks, and we identified increased DNA methylation on the FTO obesity susceptibility haplotype, tagged by the rs8050136 risk allele A (p = 9.40×10^−4, permutation p = 1.0×10^−3). Further analysis across the 46 kb LD block using sliding windows localised the most significant difference to a 7.7 kb region (p = 1.13×10^−7). Sequence level analysis, followed by pyrosequencing validation, revealed that the methylation difference was driven by the co-ordinated phase of CpG-creating SNPs across the risk haplotype. This 7.7 kb region of haplotype-specific methylation (HSM) encapsulates a Highly Conserved Non-Coding Element (HCNE) that has previously been validated as a long-range enhancer, supported by the histone H3K4me1 enhancer signature. This study demonstrates that integration of Genome-Wide Association (GWA) SNP and epigenomic DNA methylation data can identify potential novel genotype-epigenotype interactions within disease-associated loci, thus providing a novel route to aid unravelling common complex diseases.
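The sliding-window comparison and permutation testing mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names, the data layout (one list of per-probe methylation values per sample) and the window parameters are all assumptions made for illustration, and the test statistic is a plain difference of group means.

```python
import random
from statistics import mean


def sliding_windows(start, end, size, step):
    """Return (window_start, window_end) pairs that fit inside [start, end]."""
    windows = []
    pos = start
    while pos + size <= end:
        windows.append((pos, pos + size))
        pos += step
    return windows


def window_difference(positions, risk, other, window):
    """Mean methylation difference (risk minus other) over probes in a window.

    positions: probe coordinates; risk/other: lists of samples, each sample
    being a list with one methylation value per probe (hypothetical layout).
    Returns None when no probe falls inside the window.
    """
    lo, hi = window
    idx = [i for i, p in enumerate(positions) if lo <= p < hi]
    if not idx:
        return None
    risk_mean = mean(mean(sample[i] for i in idx) for sample in risk)
    other_mean = mean(mean(sample[i] for i in idx) for sample in other)
    return risk_mean - other_mean


def permutation_p(positions, risk, other, window, n_perm=1000, seed=0):
    """Two-sided permutation p-value obtained by shuffling group labels."""
    observed = window_difference(positions, risk, other, window)
    if observed is None:
        raise ValueError("window contains no probes")
    rng = random.Random(seed)
    pooled = risk + other  # new list, so the inputs are not reordered
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm = window_difference(
            positions, pooled[:len(risk)], pooled[len(risk):], window
        )
        if abs(perm) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With the add-one correction, the smallest p-value this sketch can report is 1/(n_perm + 1), so the number of permutations bounds the achievable resolution.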
The International Knockout Mouse Consortium (IKMC) aims to mutate all protein-coding genes in the mouse using a combination of gene targeting and gene trapping in mouse embryonic stem (ES) cells and to make the generated resources readily available to the research community. The IKMC database and web portal (www.knockoutmouse.org) serve as the central public web site for IKMC data and facilitate the coordination and prioritization of work within the consortium. Researchers can access up-to-date information on IKMC knockout vectors, ES cells and mice for specific genes, and follow links to the respective repositories from which corresponding IKMC products can be ordered. Researchers can also use the web site to nominate genes for targeting, or to indicate that targeting of a gene should receive high priority. The IKMC database provides data to, and features extensive interconnections with, other community databases.
Ultraconserved elements (UCEs) are highly constrained elements of mammalian genomes, whose functional role has not been completely elucidated yet. Previous studies have shown that some of them act as enhancers in mouse, while some others are expressed in both normal and cancer-derived human tissues. Only one UCE element so far was shown to present these two functions concomitantly, as had been observed in other isolated instances of single, non ultraconserved enhancer elements.
We used a custom microarray to assess the levels of UCE transcription during mouse development and integrated these data with published microarray and next-generation sequencing datasets as well as with newly produced PCR validation experiments. We show that a large fraction of non-exonic UCEs is transcribed across all developmental stages examined from only one DNA strand. Although the nature of these transcripts remains a mystery, our meta-analysis of RNA-Seq datasets indicates that they are unlikely to be short RNAs and that some of them might encode nuclear transcripts. In the majority of cases this function overlaps with the already established enhancer function of these elements during mouse development. Utilizing several next-generation sequencing datasets, we were further able to show that the level of expression observed in non-exonic UCEs is significantly higher than in random regions of the genome and that this is also seen in other regions which act as enhancers.
Our data show that the concurrent presence of enhancer and transcript function in non-exonic UCE elements is more widespread than previously shown. Moreover, through our own experiments as well as the use of next-generation sequencing datasets, we were able to show that the RNAs encoded by non-exonic UCEs are likely to be long RNAs transcribed from only one DNA strand.
Mixed lineage kinase 3 (MLK3) is a serine/threonine kinase, regulating MAP kinase signalling, in which cancer-associated mutations have never been reported. In this study, 174 primary gastrointestinal cancers (48 hereditary and 126 sporadic forms) and 7 colorectal cancer cell lines were screened for MLK3 mutations. MLK3 mutations were significantly associated with MSI phenotype in primary tumours (P = 0.0005), occurring in 21% of the MSI carcinomas. Most MLK3 somatic mutations identified were of the missense type (62.5%) and more than 80% of them affected evolutionarily conserved residues. A predictive 3D model points to the functional relevance of MLK3 missense mutations, which cluster in the kinase domain. Further, the model shows that most of the altered residues in the kinase domain probably affect MLK3 scaffold properties, instead of its kinase activity. MLK3 missense mutations showed transforming capacity in vitro and cells expressing the mutant gene were able to develop locally invasive tumours when subcutaneously injected in nude mice. Interestingly, in primary tumours, MLK3 mutations occurred in KRAS and/or BRAF wild-type carcinomas, although these were not mutually exclusive genetic events. In conclusion, we have demonstrated for the first time the presence of MLK3 mutations in cancer and their association with mismatch repair deficiency. Further, we demonstrated that MLK3 missense mutations found in MSI gastrointestinal carcinomas are functionally relevant.
PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations.
Germline CDH1 point or small frameshift mutations can be identified in 30–50% of hereditary diffuse gastric cancer (HDGC) families. We hypothesized that CDH1 genomic rearrangements would be found in HDGC and identified 160 families with either two gastric cancers in first-degree relatives, with at least one diffuse gastric cancer (DGC) diagnosed before age 50, or three or more DGC in close relatives diagnosed at any age. Sixty-seven carried germline CDH1 point or small frameshift mutations. We screened germline DNA from the 93 mutation-negative probands for large genomic rearrangements by Multiplex Ligation-Dependent Probe Amplification. Potential deletions were validated by RT–PCR and breakpoints cloned using a combination of oligo-CGH-arrays and long-range-PCR. In-silico analysis of the CDH1 locus was used to determine a potential mechanism for these rearrangements. Six of 93 (6.5%) previously described mutation-negative HDGC probands, from low GC incidence populations (UK and North America), carried genomic deletions. Two families carried an identical deletion spanning 193 593 bp, encompassing the full CDH3 sequence and CDH1 exons 1 and 2. Other deletions affecting exons 1, 2, 15 and/or 16 were identified. The statistically significant over-representation of Alus around breakpoints points to Alu-mediated recombination as a likely mechanism for these deletions. When all mutations and deletions are considered, the overall frequency of CDH1 alterations in HDGC is ∼46% (73/160). CDH1 large deletions occur in 4% of HDGC families by mechanisms involving mainly non-allelic homologous recombination in Alu repeat sequences. As the finding of pathogenic CDH1 mutations is useful for management of HDGC families, screening for deletions should be offered to at-risk families.
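The frequencies quoted above follow directly from the family counts. The short Python sketch below recomputes them as a consistency check; the variable names are illustrative only, and the counts are taken from the text.

```python
# Counts as reported in the abstract.
families = 160                 # HDGC families meeting the inclusion criteria
point_mutation_carriers = 67   # germline point or small frameshift mutations
deletion_carriers = 6          # large genomic deletions

# Probands screened for large rearrangements.
mutation_negative = families - point_mutation_carriers


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


# Deletions among mutation-negative probands (reported as 6.5%).
deletion_rate_in_negatives = percent(deletion_carriers, mutation_negative)

# Deletions among all families (reported as 4%).
deletion_rate_overall = percent(deletion_carriers, families)

# Any CDH1 alteration among all families (reported as about 46%, 73/160).
altered = point_mutation_carriers + deletion_carriers
alteration_rate_overall = percent(altered, families)
```

The two deletion percentages differ only in their denominators: 93 mutation-negative probands for the first, all 160 families for the second.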
We have developed a comprehensive resource devoted to biologists wanting to optimize the use of gene trap clones in their experiments. We have processed 300 602 such clones from both public and private projects to generate 28 199 ‘UniTraps’, i.e. distinct collections of unambiguous insertions at the same subgenic region of annotated genes. The UniTrap resource contains data relative to 9583 trapped genes, which represent 42.3% of the mouse gene content. Among the trapped genes, 7 728 have a counterpart in humans, and 677 are known to be involved in the pathogenesis of human diseases. The aim of this analysis is to provide the wet lab researchers with a comprehensive database and curated tools for (i) identifying and comparing the clones carrying a trap into the genes of interest, (ii) evaluating the severity of the mutation to the protein function in each independent trapping event and (iii) supplying complete information to perform PCR, RT-PCR and restriction experiments to verify the clone and identify the exact point of vector insertion. To share this unique resource with the scientific community, we have designed and implemented a web interface that is freely accessible at http://unitrap.cbm.fvg.it/.
Early steps of embryo development are directed by maternal gene products and trace levels of zygotic gene activity in vertebrates. A major activation of zygotic transcription occurs together with degradation of maternal mRNAs during the midblastula transition in several vertebrate systems. How these processes are regulated in preparation for the onset of differentiation in the vertebrate embryo is mostly unknown. Here, we studied the function of TATA-binding protein (TBP) by knock down and DNA microarray analysis of gene expression in early embryo development. We show that a subset of polymerase II-transcribed genes with ontogenic stage-dependent regulation requires TBP for their zygotic activation. TBP is also required for limiting the activation of genes during development. We reveal that TBP plays an important role in the degradation of a specific subset of maternal mRNAs during late blastulation/early gastrulation, which involves targets of the miR-430 pathway. Hence, TBP acts as a specific regulator of the key processes underlying the transition from maternal to zygotic regulation of embryogenesis. These results implicate core promoter recognition as an additional level of differential gene regulation during development.
maternal mRNA; MBT; TBP; transcription; zebrafish
The BASC system provides tools for the integrated mining and browsing of genetic, genomic and phenotypic data. This public resource hosts information on Brassica species supporting the Multinational Brassica Genome Sequencing Project, and is based upon five distinct modules, ESTDB, Microarray, MarkerQTL, CMap and EnsEMBL. ESTDB hosts expressed gene sequences and related annotation derived from comparison with GenBank, UniRef and the genome sequence of Arabidopsis. The Microarray module hosts gene expression information related to genes annotated within ESTDB. MarkerQTL is the most complex module and integrates information on genetic markers, maps, individuals, genotypes and traits. Two further modules include an Arabidopsis EnsEMBL genome viewer and the CMap comparative genetic map viewer for the visualization and integration of genetic and genomic data. The database is accessible at .
Alignment of orthologous vertebrate loci reveals that a significant proportion of conserved cis-regulatory elements have undergone shuffling during evolution.
All vertebrates share a remarkable degree of similarity in their development as well as in the basic functions of their cells. Despite this, attempts at unearthing genome-wide regulatory elements conserved throughout the vertebrate lineage using BLAST-like approaches have thus far detected noncoding conservation in only a few hundred genes, mostly associated with regulation of transcription and development.
We used a unique combination of tools to obtain regional global-local alignments of orthologous loci. This approach takes into account shuffling of regulatory regions that are likely to occur over evolutionary distances greater than those separating mammalian genomes. This approach revealed one order of magnitude more vertebrate conserved elements than was previously reported in over 2,000 genes, including a high number of genes found in the membrane and extracellular regions. Our analysis revealed that 72% of the elements identified have undergone shuffling. We tested the ability of the elements identified to enhance transcription in zebrafish embryos and compared their activity with a set of control fragments. We found that more than 80% of the elements tested were able to enhance transcription significantly, prevalently in a tissue-restricted manner corresponding to the expression domain of the neighboring gene.
Our work elucidates the importance of shuffling in the detection of cis-regulatory elements. It also elucidates how similarities across the vertebrate lineage, which go well beyond development, can be explained not only within the realm of coding genes but also in that of the sequences that ultimately govern their expression.
Studies on the zebrafish model have contributed to our understanding of several important developmental processes, especially those that can be easily studied in the embryo. However, our knowledge of late events such as gonad differentiation in the zebrafish is still limited. Here we provide an analysis of the gene sets expressed in the adult zebrafish testis and ovary in an attempt to identify genes with a potential role in (zebra)fish gonad development and function. We produced 10 533 expressed sequence tags (ESTs) from zebrafish testis or ovary and downloaded an additional 23 642 gonad-derived sequences from the zebrafish EST database. We clustered these sequences together with over 13 000 kidney-derived zebrafish ESTs to study partial transcriptomes for these three organs. We searched for genes with gonad-specific expression by screening macroarrays containing at least 2600 unique cDNA inserts with testis-, ovary- and kidney-derived cDNA probes. Clones hybridizing to only one of the two gonad probes were selected, and subsequently screened with computational tools to identify 72 genes with potentially testis-specific and 97 genes with potentially ovary-specific expression. PCR amplification confirmed gonad specificity for 21 of the 45 clones tested (all without known function). Our study, which involves over 47 000 EST sequences and specialized cDNA arrays, is the first analysis of adult organ transcriptomes of zebrafish at such a scale. The study of genes expressed in adult zebrafish testis and ovary will provide useful information on the regulation of gene expression in teleost gonads and might also contribute to our understanding of the development and differentiation of reproductive organs in vertebrates.
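The selection of clones that hybridize to only one of the two gonad probes can be sketched as a simple filter. The Python example below is an illustration under stated assumptions, not the procedure used in the study: the input layout, the function name `select_gonad_specific` and the intensity threshold are all hypothetical.

```python
def select_gonad_specific(signals, threshold):
    """Split clones into testis-only and ovary-only candidates.

    signals: dict mapping a clone identifier to a dict with hybridization
    intensities under the keys "testis" and "ovary" (hypothetical layout).
    threshold: intensity at or above which a clone counts as hybridizing
    (an assumed cutoff; the source does not state one).
    """
    testis_only = []
    ovary_only = []
    for clone_id, intensity in signals.items():
        testis_positive = intensity["testis"] >= threshold
        ovary_positive = intensity["ovary"] >= threshold
        if testis_positive and not ovary_positive:
            testis_only.append(clone_id)
        elif ovary_positive and not testis_positive:
            ovary_only.append(clone_id)
        # Clones positive for both probes, or for neither, are not selected.
    return testis_only, ovary_only


# Toy input with made-up clone names and intensities.
example = {
    "cloneA": {"testis": 950, "ovary": 40},
    "cloneB": {"testis": 30, "ovary": 800},
    "cloneC": {"testis": 700, "ovary": 650},
    "cloneD": {"testis": 10, "ovary": 20},
}
testis_candidates, ovary_candidates = select_gonad_specific(example, threshold=500)
```

A fixed cutoff is the simplest possible rule; candidates selected this way would still need the computational screening and PCR confirmation described in the text.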