Triple-negative breast cancers (TNBCs) are a heterogeneous set of tumors defined by an absence of actionable therapeutic targets (ER−, PR−, HER2−). Microdissected normal ductal epithelium from healthy volunteers represents a novel comparator to reveal insights into TNBC heterogeneity and to inform drug development. Using RNA-sequencing data from our institution and The Cancer Genome Atlas (TCGA), we compared the transcriptomes of 94 TNBCs, 20 microdissected normal breast tissues from healthy volunteers from the Susan G. Komen for the Cure Tissue Bank, and 10 histologically normal tissues adjacent to tumor. Pathway analysis comparing TNBCs to optimized normal controls of microdissected normal epithelium versus classic controls composed of adjacent normal tissue revealed distinct molecular signatures. Differential gene expression of TNBC compared with normal comparators yielded findings relevant to TNBC-specific clinical trials of targeted agents: the drug targets lacked over-expression in negative studies and were over-expressed in studies showing drug activity. Next, by comparing each individual TNBC to the set of microdissected normals, we demonstrate that TNBC heterogeneity is attributable to transcriptional chaos, is associated with non-silent DNA mutational load, and explains transcriptional heterogeneity in addition to known molecular subtypes. Finally, chaos analysis identified 146 core genes dysregulated in >90% of TNBCs, revealing an over-expressed central network. In conclusion, use of microdissected normal ductal epithelium from healthy volunteers enables an optimized approach for studying TNBC and uncovers biological heterogeneity mediated by transcriptional chaos.
triple-negative breast cancer; RNA-seq; TCGA; normal breast; adjacent normal; ductal epithelium
Advances in high-throughput technologies have rapidly produced ever more data, from DNA and RNA to proteins, especially large volumes of genome-scale data. However, connecting genomic information to cellular functions and biological behaviours relies on the development of effective approaches at a higher, systems level. In particular, advances in RNA-Seq technology have helped the study of the transcriptome, the RNA expressed from the genome, while systems biology provides a more comprehensive picture of how genes and proteins actively interact to produce cellular behaviours and physiological phenotypes. As biological interactions mediate many processes that are essential for cellular function or disease development, it is important to systematically identify genomic information, including genetic mutations from genome-wide association studies (GWAS), differentially expressed genes, bidirectional promoters, intrinsically disordered proteins (IDPs) and protein interactions, to gain deep insights into the underlying mechanisms of gene regulation and networks. Furthermore, bidirectional promoters can co-regulate many biological pathways, and their roles can be studied systematically to identify co-regulated genes at the interactive network level. Combining information from different but related studies can ultimately help reveal the landscape of molecular mechanisms underlying complex diseases such as cancer.
Kidney Renal Clear Cell Carcinoma (KIRC) is a fatal genitourinary disease and accounts for most malignant kidney tumours. KIRC has been shown to resist radiotherapy and chemotherapy. As with many types of cancer, there is no curative treatment for metastatic KIRC. Using advanced sequencing technologies, The Cancer Genome Atlas (TCGA) project of NIH/NCI-NHGRI has produced large-scale sequencing data, which provide unprecedented opportunities to reveal new molecular mechanisms of cancer. We combined differentially expressed genes, pathways and network analyses to gain new insights into the molecular mechanisms underlying disease development.
Following an experimental design for obtaining significant genes and pathways, we performed a comprehensive analysis of sequencing data from 537 KIRC patients provided by TCGA. Differentially expressed genes were obtained from the RNA-Seq data, and pathway and network analyses were performed. We identified 186 differentially expressed genes with significant p-values and large fold changes (P < 0.01, |log(FC)| > 5). The study not only confirmed a number of differentially expressed genes already reported in the literature, but also provided new findings. We performed hierarchical clustering analysis using both genome-wide gene expression and the differentially expressed genes identified in this study. We revealed distinct groups of differentially expressed genes that can aid in the identification of subtypes of the cancer. The hierarchical clustering analysis based on gene expression profiles and differentially expressed genes suggested four subtypes of the cancer. We found distinct enriched Gene Ontology (GO) terms associated with these groups of genes. Based on these findings, we built a support vector machine-based supervised-learning classifier to predict unknown samples, and the classifier achieved high accuracy and robust classification results. In addition, we identified a number of pathways (P < 0.04) that were significantly influenced by the disease. Some of the identified pathways have been implicated in cancers in the literature, while others have not previously been reported in this cancer. The network analysis led to the identification of significantly disrupted pathways and associated genes involved in disease development. Furthermore, this study can provide a viable alternative for identifying effective drug targets.
Our study identified a set of differentially expressed genes and pathways in kidney renal clear cell carcinoma, and represents a comprehensive computational approach to analyzing large-scale next-generation sequencing data. The pathway and network analyses suggested that information from distinctly expressed genes can be utilized in the identification of aberrant upstream regulators. Identification of distinctly expressed genes and altered pathways is important for effective biomarker identification in early cancer diagnosis and treatment planning. Combining differentially expressed genes with pathway and network analyses using intelligent computational approaches provides an unprecedented opportunity to identify upstream disease-causal genes and effective drug targets.
Kidney Renal Clear Cell Carcinoma; TCGA; RNA-Seq; Differentially Expressed Genes; Pathways; Gene Network Analysis; Machine Learning Classifier
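The two-threshold gene selection described in the KIRC abstract above (P < 0.01, |log(FC)| > 5) can be illustrated with a minimal sketch. The abstract does not state the logarithm base or how fold change and p-values were computed, so the base-2 logarithm, the function name `select_degs`, the input layout and the toy gene names below are all assumptions made for illustration only.

```python
import math


def select_degs(rows, p_max=0.01, min_abs_log_fc=5.0):
    """Keep genes that pass both the p-value and the fold-change threshold.

    rows: iterable of (gene, mean_tumor, mean_normal, p_value) tuples.
    The log base (2) is an assumption; the abstract only says log(FC).
    """
    selected = []
    for gene, mean_tumor, mean_normal, p_value in rows:
        log_fc = math.log2(mean_tumor / mean_normal)
        if p_value < p_max and abs(log_fc) > min_abs_log_fc:
            selected.append((gene, log_fc))
    return selected


# Hypothetical toy data, not taken from TCGA.
toy = [
    ("GENE_A", 640.0, 10.0, 0.001),   # log2 FC = 6, passes both filters
    ("GENE_B", 20.0, 10.0, 0.0005),   # log2 FC = 1, fold change too small
    ("GENE_C", 1.0, 128.0, 0.004),    # log2 FC = -7, passes both filters
    ("GENE_D", 900.0, 10.0, 0.2),     # large fold change, p-value too high
]

for gene, log_fc in select_degs(toy):
    print(gene, log_fc)
```

Taking the absolute value of the log fold change keeps both strongly up-regulated and strongly down-regulated genes, which is why GENE_C is retained alongside GENE_A.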
In the cell nucleus, each chromosome is confined to a chromosome territory. This spatial organization of chromosomes plays a crucial role in gene regulation and genome stability. An additional level of organization has been discovered at the chromosome scale: the spatial segregation of open and closed chromatin into two genome-wide compartments. Although considerable progress has been made in our knowledge of chromatin organization, understanding its dynamics, especially in cancer, remains a fundamental issue. To address this issue, we performed genome-wide mapping of chromatin interactions (Hi-C) over time after estrogen stimulation of breast cancer cells. To interpret these interactions biologically, we integrated them with estrogen receptor (ERα) binding events, gene expression and epigenetic marks. We show that gene-rich chromosomes as well as areas of open and highly transcribed chromatin are rearranged into greater spatial proximity, thus enabling genes to share transcriptional machinery and regulatory elements. At a smaller scale, differentially interacting loci are enriched for cancer proliferation and estrogen-related genes. Moreover, these loci are correlated with higher ERα binding events and gene expression. Taken together, these results reveal the role of a hormone - estrogen - in genome organization, and its effect on gene regulation in cancer.
Inter-individual variation in gene regulatory elements is hypothesized to play a causative role in adverse drug reactions and reduced drug activity. However, relatively little is known about the location and function of drug-dependent elements. To uncover drug-associated elements in a genome-wide manner, we performed RNA-seq and ChIP-seq using antibodies against the pregnane X receptor (PXR) and three active regulatory marks (p300, H3K4me1, H3K27ac) on primary human hepatocytes treated with rifampin or vehicle control. Rifampin and PXR were chosen since they are part of the CYP3A4 pathway, which is known to account for the metabolism of more than 50% of all prescribed drugs. We selected 227 proximal promoters for genes with rifampin-dependent expression or nearby PXR/p300 occupancy sites and assayed their ability to induce luciferase in rifampin-treated HepG2 cells, finding only 10 (4.4%) that exhibited drug-dependent activity. As this result suggested a role for distal enhancer modules, we searched more broadly to identify 1,297 genomic regions bearing a conditional PXR occupancy as well as all three active regulatory marks. These regions are enriched near genes that function in the metabolism of xenobiotics, specifically members of the cytochrome P450 family. We performed enhancer assays in rifampin-treated HepG2 cells for 42 of these sequences as well as 7 sequences that overlap linkage-disequilibrium blocks defined by lead SNPs from pharmacogenomic GWAS studies, revealing 15/42 and 4/7 to be functional enhancers, respectively. A common African haplotype in one of these enhancers in the GSTA locus was found to exhibit potential rifampin hypersensitivity. Combined, our results further suggest that enhancers are the predominant targets of rifampin-induced PXR activation, provide a genome-wide catalog of PXR targets and serve as a model for the identification of drug-responsive regulatory elements.
Drug response varies between individuals and can be caused by genetic factors. Nucleotide variation in gene regulatory elements can have a significant effect on drug response, but due to the difficulty in identifying these elements, they remain understudied. Here, we used various genomic assays to analyze human liver cells treated with or without the antibiotic rifampin and identified drug-induced regulatory elements genome-wide. The testing of numerous active promoters in human liver cells showed only a few to be induced by rifampin treatment. A similar analysis of enhancers found several of them to be induced by the drug. Nucleotide variants in one of these enhancers were found to alter its activity. Combined, this work identifies numerous novel gene regulatory elements that can be activated due to drug response and thus provides candidate sequences in the human genome where nucleotide variation can lead to differences in drug response. It also provides a universally applicable method to detect these elements for other drugs.
This report describes an improved protocol to generate stranded, barcoded RNA-seq libraries to capture the whole transcriptome. By optimizing the use of duplex specific nuclease (DSN) to remove ribosomal RNA reads from stranded barcoded libraries, we demonstrate improved efficiency of multiplexed next generation sequencing (NGS). This approach detects expression profiles of all RNA types, including miRNA (microRNA), piRNA (Piwi-interacting RNA), snoRNA (small nucleolar RNA), lincRNA (long non-coding RNA), mtRNA (mitochondrial RNA) and mRNA (messenger RNA) without the use of gel electrophoresis. The improved protocol generates high quality data that can be used to identify differential expression in known and novel coding and non-coding transcripts, splice variants, mitochondrial genes and SNPs (single nucleotide polymorphisms).
RNA-seq; transcriptome; duplex-specific nuclease; gene expression
Nontypeable Haemophilus influenzae (NTHi) are Gram-negative commensal bacteria that reside in the nasopharynx. NTHi can also cause multiple upper and lower respiratory tract diseases that include sinusitis, conjunctivitis, bronchitis, and otitis media. In numerous bacterial species the ferric uptake regulator (Fur) acts as a global regulator of iron homeostasis by negatively regulating the expression of iron uptake systems. However in NTHi strain 86-028NP and numerous other bacterial species there are multiple instances where Fur positively affects gene expression. It is known that many instances of positive regulation by Fur occur indirectly through a small RNA intermediate. However, no examples of small RNAs have been described in NTHi. Therefore we used RNA-Seq analysis to analyze the transcriptome of NTHi strain 86-028NPrpsL and an isogenic 86-028NPrpsLΔfur strain to identify Fur-regulated intergenic transcripts. From this analysis we identified HrrF, the first small RNA described in any Haemophilus species. Orthologues of this small RNA exist only among other Pasteurellaceae. Our analysis showed that HrrF is maximally expressed when iron levels are low. Additionally, Fur was shown to bind upstream of the hrrF promoter. RNA-Seq analysis was used to identify targets of HrrF which include genes whose products are involved in molybdate uptake, deoxyribonucleotide synthesis, and amino acid biosynthesis. The stability of HrrF is not dependent on the RNA chaperone Hfq. This study is the first step in an effort to investigate the role small RNAs play in altering gene expression in response to iron limitation in NTHi.
A causal role of gene amplification in tumorigenesis is well-known, while amplification of DNA regulatory elements as an oncogenic driver remains unclear. In this study, we integrated next-generation sequencing approaches to map distant estrogen response elements (DEREs) that remotely control transcription of target genes through chromatin proximity. Two densely mapped DERE regions located on chromosomes 17q23 and 20q13 were frequently amplified in ERα-positive luminal breast cancer. These aberrantly amplified DEREs deregulated target gene expression potentially linked to cancer development and tamoxifen resistance. Progressive accumulation of DERE copies was observed in normal breast progenitor cells chronically exposed to estrogenic chemicals. These findings may extend to other DNA regulatory elements, the amplification of which can profoundly alter target transcriptome during tumorigenesis.
Maternal alcohol consumption inflicts a multitude of phenotypic consequences that range from undetectable changes to severe dysmorphology. Using tightly controlled murine studies that deliver precise amounts of alcohol at discrete developmental stages, our group and other labs demonstrated in prior studies that the C57BL/6 and DBA/2 inbred mouse strains display differential susceptibility to the teratogenic effects of alcohol. Since the phenotypic diversity extends beyond the amount, dosage and timing of alcohol exposure, it is likely that an individual's genetic background contributes to the phenotypic spectrum. To identify the genomic signatures associated with these observed differences in alcohol-induced dysmorphology, we conducted a microarray-based transcriptome study that also interrogated the genomic signatures between these two lines based on genetic background and alcohol exposure. This approach is called a gene x environment (GxE) analysis; one example of a GxE interaction would be a gene whose expression level increases in C57BL/6, but decreases in DBA/2 embryos, following alcohol exposure. We identified 35 candidate genes exhibiting GxE interactions. To identify cis-acting factors that mediated these interactions, we interrogated the proximal promoters of these 35 candidates and found 241 single nucleotide variants (SNVs) in 16 promoters. Further investigation indicated that 186 SNVs (15 promoters) are predicted to alter transcription factor binding. In addition, 62 SNVs created, removed or altered the placement of a CpG dinucleotide in 13 of the proximal promoters, 53 of which overlapped putative transcription factor binding sites. These 53 SNVs are also our top candidates for future studies aimed at examining the effects of alcohol on epigenetic gene regulation.
fetal alcohol syndrome; gene x environment interactions; genomics; gene expression; next generation sequencing; genetic association; epigenetics
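The GxE example given in the abstract above (expression rising in C57BL/6 but falling in DBA/2 embryos after alcohol exposure) corresponds to a non-zero interaction contrast, that is, a difference between the two strain-specific alcohol effects. The sketch below only computes that contrast from group means; the function name, the data layout and the toy values are hypothetical, and a real analysis would test the interaction term in a statistical model rather than inspect raw differences.

```python
from statistics import mean


def interaction_contrast(expr):
    """Return (effect in C57BL/6, effect in DBA/2, interaction contrast).

    expr maps (strain, treatment) -> list of log-scale expression values.
    Each effect is mean(alcohol) - mean(control) within one strain; the
    contrast is the difference between the two strain-specific effects.
    """
    effect_b6 = (mean(expr[("C57BL/6", "alcohol")])
                 - mean(expr[("C57BL/6", "control")]))
    effect_dba = (mean(expr[("DBA/2", "alcohol")])
                  - mean(expr[("DBA/2", "control")]))
    return effect_b6, effect_dba, effect_b6 - effect_dba


# Hypothetical log-expression values for one gene, not measured data.
toy_gene = {
    ("C57BL/6", "control"): [8.0, 8.0],
    ("C57BL/6", "alcohol"): [9.0, 9.0],
    ("DBA/2", "control"): [8.0, 8.0],
    ("DBA/2", "alcohol"): [7.0, 7.0],
}

effect_b6, effect_dba, contrast = interaction_contrast(toy_gene)
print(effect_b6, effect_dba, contrast)
```

Opposite signs for the two strain-specific effects, as in this toy gene, give the crossover pattern the abstract uses as its example; a gene that responds identically in both strains would have a contrast of zero.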
Our efforts to prevent and treat breast cancer are significantly impeded by a lack of knowledge of the biology and developmental genetics of the normal mammary gland. In order to provide the specimens that will facilitate such an understanding, The Susan G. Komen for the Cure Tissue Bank at the IU Simon Cancer Center (KTB) was established. The KTB is, to our knowledge, the only biorepository in the world prospectively established to collect normal, healthy breast tissue from volunteer donors. As a first initiative toward a molecular understanding of the biology and developmental genetics of the normal mammary gland, the effect of the menstrual cycle and hormonal contraceptives on gene expression in the normal breast epithelium was examined.
Using normal breast tissue from 20 premenopausal donors to KTB, the changes in the mRNA of the normal breast epithelium as a function of phase of the menstrual cycle and hormonal contraception were assayed using next-generation whole transcriptome sequencing (RNA-Seq).
In total, 255 genes representing 1.4% of all genes were deemed to have statistically significant differential expression between the two phases of the menstrual cycle. The overwhelming majority (221; 87%) of the genes have higher expression during the luteal phase. These data provide important insights into the processes occurring during each phase of the menstrual cycle. There was only a single gene significantly differentially expressed when comparing the epithelium of women using hormonal contraception to those in the luteal phase.
We have taken advantage of a unique research resource, the KTB, to complete the first-ever next-generation transcriptome sequencing of the epithelial compartment of 20 normal human breast specimens. This work has produced a comprehensive catalog of the differences in the expression of protein-coding genes as a function of the phase of the menstrual cycle. These data constitute the beginning of a reference data set of the normal mammary gland, which can be consulted for comparison with data developed from malignant specimens, or to mine the effects of the hormonal flux that occurs during the menstrual cycle.
To determine whether a fusion protein of cholera toxin B subunit and active peptide from shark liver (CTB-APSL) plays a role in the treatment of type 2 diabetic mice, the CTB-APSL gene was cloned and expressed in the silkworm (Bombyx mori) baculovirus expression vector system (BEVS), and the fusion protein was then orally administered at a dose of 100 mg/kg for five weeks to diabetic mice. The results demonstrated that oral administration of the CTB-APSL fusion protein can effectively reduce the levels of both fasting blood glucose (FBG) and glycosylated hemoglobin (GHb), promote insulin secretion and improve insulin resistance, significantly improve lipid metabolism, reduce triglyceride (TG), total cholesterol (TC) and low density lipoprotein (LDL) levels and increase high density lipoprotein (HDL) levels, as well as effectively improve the inflammatory response of type 2 diabetic mice by reducing the levels of the inflammatory cytokines tumor necrosis factor-α (TNF-α) and interleukin-6 (IL-6). Histopathology showed that the fusion protein can significantly repair damaged pancreatic tissue in type 2 diabetic mice, significantly improve hepatic steatosis and cloudy swelling of hepatic cells, reduce the content of lipid droplets, effectively inhibit invasion of the renal interstitium by inflammatory cells and improve nuclear pyknosis in renal tubular epithelial cells, thus providing an experimental basis for the development of a new type of oral therapy for type 2 diabetes.
active peptide from shark liver; cholera toxin B subunit; Bombyx mori pupae; type 2 diabetes mellitus; oral administration
Summary: NGSUtils is a suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. These tools provide a stable and modular platform for data management and analysis.
Availability and implementation: NGSUtils is available under a BSD license and works on Mac OS X and Linux systems. Python 2.6+ and virtualenv are required. More information and source code may be obtained from the website: http://ngsutils.org.
Supplementary data are available at Bioinformatics online.
Our fundamental understanding of how several thousand diverse RNAs are recognized in the soma, sorted, packaged, transported and localized within the cell is fragmentary. The COPa and COPb proteins of the coatomer protein I (COPI) vesicle complex were reported to interact with specific RNAs and represent a candidate RNA sorting and transport system. To determine the RNA-binding profile of Golgi-derived COPI in neuronal cells, we performed formaldehyde-linked RNA immunoprecipitation, followed by high-throughput sequencing, a process we term FLRIP-Seq (FLRIP, formaldehyde-cross-linked immunoprecipitation). We demonstrate that COPa co-immunoprecipitates a specific set of RNAs that are enriched in G-quadruplex motifs and fragile X mental retardation protein-associated RNAs and that encode factors that predominantly localize to the plasma membrane and cytoskeleton and function within signaling pathways. These data support the novel function of COPI in inter-compartmental trafficking of RNA.
To adapt to stresses encountered in stationary phase, Gram-negative bacteria utilize the alternative sigma factor RpoS. However, some species lack RpoS; thus, it is unclear how stationary-phase adaptation is regulated in these organisms. Here we defined the growth-phase-dependent transcriptomes of Haemophilus ducreyi, which lacks an RpoS homolog. Compared to mid-log-phase organisms, cells harvested from the stationary phase upregulated genes encoding several virulence determinants and a homolog of hfq. Insertional inactivation of hfq altered the expression of ~16% of the H. ducreyi genes. Importantly, there was a significant overlap and an inverse correlation in the transcript levels of genes differentially expressed in the hfq inactivation mutant relative to its parent and the genes differentially expressed in stationary phase relative to mid-log phase in the parent. Inactivation of hfq downregulated genes in the flp-tad and lspB-lspA2 operons, which encode several virulence determinants. To comply with FDA guidelines for human inoculation experiments, an unmarked hfq deletion mutant was constructed and was fully attenuated for virulence in humans. Inactivation or deletion of hfq downregulated Flp1 and impaired the ability of H. ducreyi to form microcolonies, downregulated DsrA and rendered H. ducreyi serum susceptible, and downregulated LspB and LspA2, which allow H. ducreyi to resist phagocytosis. We propose that, in the absence of an RpoS homolog, Hfq serves as a major contributor to H. ducreyi stationary-phase and virulence gene regulation. The contribution of Hfq to stationary-phase gene regulation may have broad implications for other organisms that lack an RpoS homolog.
Pathogenic bacteria encounter a wide range of stresses in their hosts, including nutrient limitation; the ability to sense and respond to such stresses is crucial for bacterial pathogens to successfully establish an infection. Gram-negative bacteria frequently utilize the alternative sigma factor RpoS to adapt to stresses and stationary phase. However, homologs of RpoS are absent in some bacterial pathogens, including Haemophilus ducreyi, which causes chancroid and facilitates the acquisition and transmission of HIV-1. Here, we provide evidence that, in the absence of an RpoS homolog, Hfq serves as a major contributor of stationary-phase gene regulation and that Hfq is required for H. ducreyi to infect humans. To our knowledge, this is the first study describing Hfq as a major contributor of stationary-phase gene regulation in bacteria and the requirement of Hfq for the virulence of a bacterial pathogen in humans.
Haemophilus ducreyi causes chancroid, a genital ulcer disease that facilitates the transmission of human immunodeficiency virus type 1. In humans, H. ducreyi is surrounded by phagocytes and must adapt to a hostile environment to survive. To sense and respond to environmental cues, bacteria frequently use two-component signal transduction (2CST) systems. The only obvious 2CST system in H. ducreyi is CpxRA; CpxR is a response regulator, and CpxA is a sensor kinase. Previous studies by Hansen and coworkers showed that CpxR directly represses the expression of dsrA, the lspB-lspA2 operon, and the flp operon, which are required for virulence in humans. They further showed that CpxA functions predominantly as a phosphatase in vitro to maintain the expression of virulence determinants. Since a cpxA mutant is avirulent while a cpxR mutant is fully virulent in humans, CpxA also likely functions predominantly as a phosphatase in vivo. To better understand the role of H. ducreyi CpxRA in controlling virulence determinants, here we defined genes potentially regulated by CpxRA by using RNA-Seq. Activation of CpxR by deletion of cpxA repressed nearly 70% of its targets, including seven established virulence determinants. Inactivation of CpxR by deletion of cpxR differentially regulated few genes and increased the expression of one virulence determinant. We identified a CpxR binding motif that was enriched in downregulated but not upregulated targets. These data reinforce the hypothesis that CpxA phosphatase activity plays a critical role in controlling H. ducreyi virulence in vivo. Characterization of the downregulated genes may offer new insights into pathogenesis.
Bone marrow-derived mesenchymal stem cells (MSCs) are dominant seed cell sources for bone regeneration. Bone morphogenetic proteins (BMPs) initiate cartilage and bone formation in a sequential cascade. Vascular endothelial growth factor (VEGF) is an essential coordinator of extracellular matrix remodeling, angiogenesis and bone formation. In the present study, the effects of the vascular endothelial growth factor 165 (VEGF165) and bone morphogenetic protein 2 (BMP2) genes on bone regeneration were investigated by the lentivirus-mediated cotransfection of the two genes into rat bone marrow-derived MSCs. The successful co-expression of the two genes in the MSCs was confirmed using quantitative polymerase chain reaction (qPCR) and western blot analysis. The results of alizarin red and alkaline phosphatase (ALP) staining at 14 days subsequent to transfection showed that the area of staining in cells transfected with BMP2 alone was higher than that in cells transfected with BMP2 and VEGF165 or untransfected control cells, while the BMP2 + VEGF165 group showed significantly more staining than the untransfected control. This indicated that BMP2 alone exhibited a stronger effect in bone regeneration than BMP2 in combination with VEGF165. Similarly, in inducing culture medium, the ALP activity of the BMP2 + VEGF165 group was notably suppressed compared with that of the BMP2 group. The overexpression of VEGF165 inhibited BMP2-induced MSC differentiation and osteogenesis in vitro. Whether or not local VEGF gene therapy is likely to affect bone regeneration in vivo requires further investigation.
bone marrow-derived mesenchymal stem cells; bone morphogenetic protein 2; vascular endothelial growth factor; bone regeneration; co-transfection
Genome-wide association studies (GWAS) have identified hundreds of genetic variants associated with complex human diseases, clinical conditions and traits. Genetic mapping of expression quantitative trait loci (eQTLs) is revealing novel functional effects of thousands of single nucleotide polymorphisms (SNPs). In a classical quantitative trait loci (QTL) mapping problem, multiple tests are done to assess whether one trait is associated with a number of loci. In contrast to QTL studies, thousands of traits, in the form of gene expression levels, are measured along with thousands of loci in an eQTL study. For such a study, a huge number of tests have to be performed (~10^6). This extreme multiplicity gives rise to many computational and statistical problems. In this paper we address these issues using two closely related inferential approaches: an empirical Bayes method that retains a Bayesian flavor without requiring much a priori knowledge, and the frequentist method of false discovery rates. A three-component t-mixture model was used for the parametric empirical Bayes (PEB) method. Inferences were obtained using the Expectation/Conditional Maximization Either (ECME) algorithm. A simulation study was also performed, and the method was compared with a nonparametric empirical Bayes (NPEB) alternative.
The results show that PEB has an edge over NPEB. The proposed methodology has been applied to human liver cohort (LHC) data. Our method enables the discovery of more significant SNPs with FDR<10% compared to the previous study by Yang et al. (Genome Research, 2010).
In contrast to previously available methods based on p-values, the empirical Bayes method uses the local false discovery rate (lfdr) as the threshold. This method controls the false positive rate.
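The abstract above names "the frequentist method of false discovery rates" without giving details. As a point of reference, the sketch below shows the standard Benjamini-Hochberg step-up procedure, which is one common frequentist FDR method; whether the authors used exactly this procedure is an assumption, and the sketch does not implement their t-mixture empirical Bayes model or the local false discovery rate. The function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.10):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five SNP-transcript tests.
toy_p = [0.001, 0.8, 0.02, 0.04, 0.5]
print(benjamini_hochberg(toy_p, alpha=0.10))
```

With roughly 10^6 tests, a per-test cutoff that scales with rank in this way is far less conservative than a Bonferroni correction, which is the usual motivation for FDR-based thresholds in eQTL studies.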
Background. Genome-wide association studies (GWAS) have been successful during the last few years. A key challenge is that the interpretation of the results is not straightforward, especially for trans-acting SNPs. Integration of transcriptome data into GWAS may provide clues to the mechanisms by which a genetic variant leads to a disease. Methods. Here, we developed a novel mediation analysis approach to identify new expression quantitative trait loci (eQTL) driving CYP2D6 activity by combining genotype, gene expression, and enzyme activity data. Results. 389,573 and 1,214,416 SNP-transcript-CYP2D6 activity trios were found to be strongly associated (P < 10^-5, FDR = 16.6% and 11.7%) for two different genotype platforms, namely, Affymetrix and Illumina, respectively. The majority of eQTLs are trans-SNPs. A single polymorphism can lead to widespread downstream changes in the expression of distant genes by affecting major regulators or transcription factors (TFs), which would be visible as an eQTL hotspot and can lead to large and consistent biological effects. Overlapping the eQTL hotspots with the mediators led to the discovery of 64 TFs.
Conclusions. Our mediation analysis is a powerful approach for identifying trans-QTL-phenotype associations. It improves our understanding of the functional genetic variation underlying liver metabolism mechanisms.
BACKGROUND & AIMS
Early embryogenesis involves cell fate decisions that define the body axes and establish pools of progenitor cells. Development does not stop once lineages are specified; cells continue to undergo specific maturation events, and changes in gene expression patterns lead to their unique physiological functions. Secretory pancreatic acinar cells mature postnatally to synthesize large amounts of protein, polarize, and communicate with other cells. The transcription factor MIST1 is expressed by only secretory cells and regulates maturation events. MIST1-deficient acinar cells in mice do not establish apical-basal polarity, properly position zymogen granules, or communicate with adjacent cells, disrupting pancreatic function. We investigated whether MIST1 directly induces and maintains the mature phenotype of acinar cells.
We analyzed the effects of Cre-mediated expression of Mist1 in adult Mist1-deficient (Mist1KO) mice. Pancreatic tissues were collected and analyzed by light and electron microscopy, immunohistochemistry, real-time polymerase chain reaction analysis, and chromatin immunoprecipitation. Primary acini were isolated from mice and analyzed in amylase secretion assays.
Induced expression of Mist1 in adult Mist1KO mice restored wild-type gene expression patterns in acinar cells. The acinar cells changed phenotypes, establishing apical-basal polarity, increasing the size of zymogen granules, reorganizing the cytoskeletal network, communicating intercellularly (by synthesizing gap junctions), and undergoing exocytosis.
The exocrine pancreas of adult mice can be remodeled by re-expression of the transcription factor MIST1. MIST1 regulates acinar cell maturation and might be used to repair damaged pancreata in patients with pancreatic disorders.
DIMM; Exocrine Pancreas Disease; Secretion; Transcription
Active peptide from shark liver (APSL) is a cytokine from Chiloscyllium plagiosum that can stimulate liver regeneration and protects the pancreas. To study the effect of orally administered recombinant APSL (rAPSL) on an animal model of type 2 diabetes mellitus, the APSL gene was cloned, and APSL was expressed in Bombyx mori N cells (BmN cells), silkworm larvae and silkworm pupae using the silkworm baculovirus expression vector system (BEVS). It was demonstrated that rAPSL was able to significantly reduce the blood glucose level in mice with type 2 diabetes induced by streptozotocin. The analysis of paraffin sections of mouse pancreatic tissues revealed that rAPSL could effectively protect mouse islets from streptozotocin-induced lesions. Compared with the powder prepared from normal silkworm pupae, the powder prepared from pupae expressing rAPSL exhibited greater protective effects, and these results suggest that rAPSL has potential uses as an oral drug for the treatment of diabetes mellitus in the future.
active peptide from shark liver; Bombyx mori pupae; BmNPV/Bac-to-Bac baculovirus expression system; type 2 diabetes mellitus; oral administration
Micro-indels (insertions or deletions shorter than 21 bp) constitute the second most frequent class of human gene mutation after single nucleotide variants. Despite the relative abundance of non-frameshifting indels, their damaging effect on protein structure and function has gone largely unstudied. We have developed a support vector machine-based method named DDIG-in (Detecting disease-causing genetic variations due to indels) to prioritize non-frameshifting indels by comparing disease-associated mutations with putatively neutral mutations from the 1000 Genomes Project. The final model gives good discrimination for indels and is robust against annotation errors. A webserver implementing DDIG-in is available at http://sparks-lab.org/ddig.
Bidirectional promoters are promoter sequences shared by a divergent gene pair (genes proximal to each other on opposite strands) and can regulate the genes in both directions. In the human genome, > 10% of protein-coding genes are arranged head-to-head on opposite strands, with transcription start sites that are separated by < 1,000 base pairs. Many transcription factor binding sites that influence the expression of the 2 opposite genes occur in bidirectional promoters. Recently, RNA polymerase II (RPol II) ChIP-seq data have been used to identify the promoters of coding genes and non-coding RNAs. However, bidirectional promoters have not yet been identified using RPol II ChIP-seq data.
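The head-to-head definition above (opposite strands, transcription start sites less than 1,000 base pairs apart) can be sketched as a simple search over an annotation. The `Gene` record, the function name `divergent_pairs`, and the example genes are hypothetical; the sketch assumes one TSS per gene and uses a quadratic scan that a real pipeline would replace with a sorted sweep.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate


def divergent_pairs(genes, max_gap=1000):
    """Return (minus_gene, plus_gene, gap) for head-to-head gene pairs.

    A pair is head-to-head when the minus-strand TSS lies upstream (at a
    lower coordinate) of the plus-strand TSS, so that the two genes are
    transcribed away from each other, and the TSSs are < max_gap bp apart.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        minus = [g for g in chrom_genes if g.strand == "-"]
        plus = [g for g in chrom_genes if g.strand == "+"]
        for m in minus:
            for p in plus:
                gap = p.tss - m.tss
                if 0 < gap < max_gap:
                    pairs.append((m.name, p.name, gap))
    return sorted(pairs)


# Hypothetical annotation: only geneA/geneB satisfy the definition.
example = [
    Gene("geneA", "chr1", "-", 5000),
    Gene("geneB", "chr1", "+", 5400),
    Gene("geneC", "chr1", "+", 9000),
    Gene("geneD", "chr2", "-", 100),
    Gene("geneE", "chr2", "+", 50),
]
print(divergent_pairs(example))
```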
In some bidirectional promoter regions, RPol II binding forms a bi-peak shape, which indicates that 2 promoters are located in the bidirectional region. We have developed a computational approach to identify the regulatory regions of all divergent gene pairs using genome-wide RPol II binding patterns derived from ChIP-seq data, based upon the assumption that the distribution of RPol II binding around a bidirectional promoter is the accumulation of RPol II binding at 2 promoters. In HeLa S3 cells, 249 promoter pairs and 1094 single promoters were identified, of which 76 promoters cover only positive genes, 86 promoters cover only negative genes, and 932 promoters cover 2 genes. Gene expression levels and STAT1 binding sites for the different promoter categories were then examined.
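To make the bi-peak notion concrete, the sketch below smooths a binding profile and asks whether its two highest local maxima are separated by a clear valley. This is only a heuristic stand-in, not the approach developed in the study: the function names, the moving-average smoothing, and the thresholds `min_height` and `valley_ratio` are all assumptions.

```python
import math


def smooth(signal, window=5):
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def find_peaks(signal, min_height):
    """Indices of local maxima at or above min_height (left edge of plateaus)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] >= min_height
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]


def is_bi_peak(signal, min_height=5.0, valley_ratio=0.7, window=5):
    """True if the two highest peaks are separated by a sufficiently deep valley."""
    s = smooth(signal, window)
    peaks = find_peaks(s, min_height)
    if len(peaks) < 2:
        return False
    left, right = sorted(sorted(peaks, key=lambda i: s[i], reverse=True)[:2])
    valley = min(s[left:right + 1])
    return valley < valley_ratio * min(s[left], s[right])


def gaussian(i, center, height, width=8.0):
    return height * math.exp(-((i - center) / width) ** 2 / 2)


# Synthetic coverage profiles (hypothetical): two promoters versus one.
two_promoters = [gaussian(i, 30, 20) + gaussian(i, 70, 15) for i in range(100)]
one_promoter = [gaussian(i, 50, 20) for i in range(100)]
print(is_bi_peak(two_promoters), is_bi_peak(one_promoter))
```

A model-based alternative, closer in spirit to the accumulation assumption stated above, would fit the profile as a sum of two single-promoter shapes and compare it with a one-promoter fit.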
Identifying the regulatory regions of bidirectional promoters from RPol II binding patterns provides important temporal and spatial measurements regarding the initiation of transcription. Gene expression and transcription factor binding site analysis suggests that the promoters in bidirectional regions may regulate the closest gene, and that STAT1 is involved in the primary promoter.
Over 10,000 long intergenic non-coding RNAs (lincRNAs) have been identified in the human genome. Some have been well characterized and are known to participate in various stages of gene regulation. In the post-transcriptional process, another class of well-known small non-coding RNA, microRNA (miRNA), is very active in inhibiting mRNA. Similar features between mRNA and lincRNA have been revealed in several recent studies, and a few isolated miRNA-lincRNA relationships have been observed. Despite these advances, the comprehensive miRNA regulation pattern of lincRNA has not been clarified.
In this study, we investigated the possible interaction between the two classes of non-coding RNAs. Instead of using the existing long non-coding database, we employed an ab initio method to annotate lincRNAs expressed in a group of normal breast tissues and breast tumors.
Approximately 90 lincRNAs show strong reverse expression correlation with miRNAs and carry at least one predicted target site for those miRNAs. These target sites are statistically more conserved than their neighboring genetic regions and other predicted target sites. Several miRNAs that target these lincRNAs are known to play an essential role in breast cancer.
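The selection described above, lincRNA-miRNA pairs that have a predicted target site and are inversely correlated in expression, can be sketched as follows. The function names, the Pearson statistic, the cutoff of -0.8, and the toy expression values are assumptions for illustration; the study's actual correlation measure and threshold are not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation; assumes both vectors have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def anticorrelated_pairs(linc_expr, mirna_expr, predicted_sites, r_cutoff=-0.8):
    """Keep (lincRNA, miRNA, r) for predicted pairs with r <= r_cutoff.

    linc_expr / mirna_expr: dict name -> expression across the same samples.
    predicted_sites: iterable of (lincRNA, miRNA) pairs with a predicted site.
    """
    hits = []
    for linc, mirna in sorted(predicted_sites):
        r = pearson(linc_expr[linc], mirna_expr[mirna])
        if r <= r_cutoff:
            hits.append((linc, mirna, round(r, 3)))
    return hits


# Toy data (hypothetical names and values), five samples each.
linc_expr = {"linc1": [10, 8, 6, 4, 2], "linc2": [1, 2, 3, 4, 5]}
mirna_expr = {"miR-a": [1, 2, 3, 4, 5]}
predicted = {("linc1", "miR-a"), ("linc2", "miR-a")}
print(anticorrelated_pairs(linc_expr, mirna_expr, predicted))
```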
As with their inhibition of mRNAs, miRNAs show potential in promoting the degradation of lincRNAs. Breast-cancer-related miRNAs may influence their target lincRNAs, resulting in differential expression in normal and malignant breast tissues. This implies that miRNA regulation of lincRNAs may be involved in the regulatory process in tumor cells.
Typical analysis of time-series gene expression data such as clustering or graphical models cannot distinguish between early and later drug responsive gene targets in cancer cells. However, these genes would represent good candidate biomarkers.
We propose a new model - the dynamic time order network - to distinguish and connect early and later drug responsive gene targets. This network is constructed based on an integrated differential equation. Spline regression is applied to model accurately the time variation of gene expression. A likelihood ratio test is then implemented to infer the time order of any pair of genes. One application of the model is the discovery of estrogen response biomarkers. For this purpose, we focused on genes whose responses are late when breast cancer cells are treated with estradiol (E2).
Our approach has been validated by successfully finding time order relations between genes of the cell cycle system. More notably, we found late response genes potentially interesting as biomarkers of E2 treatment.
Alternative splicing increases proteome diversity by expressing multiple gene isoforms that often differ in function. Identifying alternative splicing events from RNA-seq experiments is important for understanding the diversity of transcripts and for investigating the regulation of splicing.
We developed Alt Event Finder, a tool for identifying novel splicing events by using transcript annotation derived from genome-guided construction tools, such as Cufflinks and Scripture. With a proper combination of alignment and transcript reconstruction tools, Alt Event Finder is capable of identifying novel splicing events in the human genome. We further applied Alt Event Finder to a set of RNA-seq data from rat liver tissues, and identified dozens of novel cassette exon events whose splicing patterns changed after extensive alcohol exposure.
Alt Event Finder is capable of identifying de novo splicing events from data-driven transcript annotation, and is a useful tool for studying splicing regulation.
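For readers unfamiliar with cassette exon events, the following minimal sketch shows how such an event can be recognized from reconstructed transcripts: an internal exon is a cassette exon if another transcript contains a junction joining its two flanking exons directly. This is not Alt Event Finder's implementation; the function names and the transcript representation (sorted lists of exon coordinates) are assumptions.

```python
def junctions(exons):
    """Splice junctions (donor end, acceptor start) between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def cassette_exons(transcripts):
    """Return exons included in one transcript and skipped in another.

    transcripts: dict name -> list of (start, end) exons sorted by start,
    all on the same gene and strand.
    """
    all_junctions = set()
    for exons in transcripts.values():
        all_junctions.update(junctions(exons))

    found = set()
    for exons in transcripts.values():
        # Only internal exons can be cassette exons.
        for i in range(1, len(exons) - 1):
            skipping = (exons[i - 1][1], exons[i + 1][0])
            if skipping in all_junctions:
                found.add(exons[i])
    return sorted(found)


# Hypothetical gene with an inclusion isoform and a skipping isoform.
isoforms = {
    "inclusion": [(100, 200), (300, 400), (500, 600)],
    "skipping": [(100, 200), (500, 600)],
}
print(cassette_exons(isoforms))
```

Working from junctions rather than whole exon lists means the flanking exons only need to share the splice sites at the skipped junction, so isoforms that differ elsewhere can still support the same event.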