|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Hepatotoxicity is a form of liver injury caused by exposure to stressors. Genomic-based approaches have been used to detect changes in transcription in response to hepatotoxicants. However, there are no straightforward ways of using co-expressed genes anchored to a phenotype or constrained by the experimental design for discerning mechanisms of a biological response.
Through the analysis of a gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants and leveraging alanine aminotransferase (ALT), a serum enzyme indicative of liver injury as the phenotypic marker, we identified biological processes and molecular pathways that may be associated with mechanisms of hepatotoxicity. Our analysis used an approach called Coherent Co-expression Biclustering (cc-Biclustering) for clustering a subset of genes through a coherent (consistency) measure within each group of samples representing a subset of experimental conditions. Supervised biclustering identified 87 genes co-expressed and correlated with ALT in all the samples exposed to the chemicals. None of the over-represented pathways related to liver injury. However, biclusters with subsets of samples exposed to one of the 7 hepatotoxicants, but not to a non-toxic isomer, contained co-expressed genes that represented pathways related to a stress response. Unsupervised biclustering of the data resulted in 1) four to five times more genes within the bicluster containing all the samples exposed to the chemicals, 2) biclusters with co-expression of genes that discerned 1,4 dichlorobenzene (a non-toxic isomer at low and mid doses) from the other chemicals, pathways and biological processes that underlie liver injury and 3) a bicluster with genes up-regulated in an early response to toxic exposure.
We obtained clusters of co-expressed genes that over-represented biological processes and molecular pathways related to hepatotoxicity in the rat. The mechanisms involved in the response of the liver to the exposure to 1,4-dichlorobenzene suggest non-genotoxicity whereas the exposure to the hepatotoxicants could be DNA damaging leading to overall genomic instability and activation of cell cycle check point signaling. In addition, key pathways and biological processes representative of an inflammatory response, energy production and apoptosis were impacted by the hepatotoxicant exposures that manifested liver injury in the rat.
The liver is considered one of the vital organs in the body. Its has several major functions including the production of bile to break down fat, glycogen storage, decomposition of red blood cells, production of cholesterol, plasma protein synthesis and drug metabolism just to name a few. The latter takes place by a host of specialized detoxification enzymes and pathways that biochemically modify or metabolize xenobiotics to harmless metabolites and other byproducts for clearance from the body . However, the metabolism of some drugs and compounds leads to toxic intermediates that can harm the liver and severely disrupt its function . Drug-induced liver injury (DILI) is the leading cause of liver failure in the United States (US) and is quickly becoming a major concern world-wide . In fact, DILI accounts for more than 50 percent of the cases of acute liver failure in the US and more than 75 percent of cases of adverse drug reactions result in liver transplantation or death . Drug-induced hepatotoxicity is the most frequent cause for a drug to be withdrawn from the market, restricted in its use or have a warning associated with it due to adverse drug reactions . A better understanding of the pathophysiology of DILI and the mechanisms involved in the manifestation of hepatotoxicity are critical to improving human health and public awareness of potentially harmful toxicants .
Microarray gene expression analysis has been used to study the effects of toxicants and other environmental stressors on biological systems [7-11]. Recently, Lobenhofer et al.  used gene expression data from rats exposed to a compendium of hepatotoxicants to show that blood gene expression patterns could be used not only to classify animals based on the compound they were exposed to, but also to provide a reasonable indication of the severity level of liver injury. In addition, Bushel et al.  demonstrated that gene expression profiles from rat blood samples could accurately predict exposure levels of acetaminophen to the rat liver better than traditional clinical panels. Furthermore, human subjects treated with acetaminophen could be classified based on blood gene expression levels. Although these efforts led to informative conclusions about the genes expression changes related to drug-induced hepatotoxicity, they did not capture the breath of the biological mechanisms altered by the toxic insult.
While hierarchical clustering groups genes and samples by the similarity of expression across all the elements in a data matrix, biclustering [14-17] aims to identify sub-matrices (subsets of rows [genes] and subsets of columns [samples] from an original data matrix) of gene expression that are coherent and possess homogeneity of co-expression. Hence, the biclusters contain co-expressed genes that represent distinct biological responses related to mechanistic changes. In this paper we used a biclustering method to analyze a compendium gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants (in a dose response and time series manner) and leveraged alanine aminotransferase (ALT), a serum enzyme indicative of liver injury, as the phenotypic marker in order to identify biclusters of co-expressed genes that over-represent biological processes and pathways related to hepatotoxicity.
Male 12 week old F344 Fischer rats were exposed to one of the following chemicals: 1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine or diquat dibromide . For each chemical, four to six rats (two in one case), were exposed to three or four different dose levels (from subtoxic to toxic) of the compound and sacrificed at 6, 24 and 48 hrs later to extract liver RNA for microarray hybridization. The 318 samples and exposure conditions are listed in Table Table1.1. The number of samples in each group varied from 32 to 72. Liver necrosis was observed in all the rats exposed to one of the eight chemicals at high doses (Table (Table2).2). However, 1,4-dichlorobenzene is an isomer of 1,2-dichlorobenzene and is non-toxic at low and mid doses. Since each compound has its unique chemical structure and properties, the activated biological processes, molecular pathways, extent of toxicity in the liver and injury could be highly similar or dissimilar. Considering these variables, we partitioned the columns of the gene expression data matrix into eight groups (one for each chemical) for analysis using Coherent Co-expression (cc)-Biclustering.
Each group of chemicals included a set of samples with different exposure conditions (different doses and time points). We used ALT as a profile to supervise the extraction of the genes that are expressed similarly and correlated with the phenotypic marker. ALT values in each of the chemical exposures were generally elevated from 6, 24, to 48 hrs and from low doses to high doses. For a probability threshold (pt) value of 0.001, supervised cc-Biclustering extracted 84 biclusters in which the genes were correlated (r-value >= 0) to ALT and 76 biclusters in which the genes were anti-correlated (r-value < 0) to ALT. The number of genes in each bicluster varied. The largest number of genes in a bicluster was 713. The number of groups (chemicals) in the biclusters had a range from a minimum of 1 to a maximum of 8. The biclusters which contained only one chemical could represent the uniqueness in gene expression for the agent and the biclusters which contained all eight chemicals could provide some clues to the common responses to the exposures across the agents.
Figure Figure11 displays heat maps of two biclusters which included all eight groups of chemicals. There are 87 genes in Figure Figure1A1A (the top half) and 86 genes in Figure Figure1B1B (the bottom half) significantly correlated and anti-correlated to the ALT profile, respectively. From Gene Ontology analysis of the genes in the biclusters, it was revealed that the significantly over-represented KEGG pathways and biological processes that are up-regulated included general mechanisms such as translation and biosynthetic processes from the genes correlated with ALT and down-regulated categories such as response to external stimuli, negative regulation of coagulation and macromolecule metabolism/biosynthesis from the anti-correlated genes (Table (Table3).3). It is interesting that none of the over-represented categories appeared to be related to liver injury. Figure Figure22 shows heat maps of two biclusters which consisted of seven groups of chemicals not including 1,4-dichlorobenzene. There are 182 genes correlated to the ALT profile in Figure Figure2A2A (the top half) and 76 anti-correlated genes in Figure Figure2B2B (the bottom half). Gene Ontology analysis revealed that over-represented KEGG pathways and biological processes included a defense response, wound healing, cell proliferation and hematopoietic migration from the genes correlated with ALT and categories related to lipid and steroid metabolism as well as the alkylation of amino acids, tryptophan and sulfur metabolism from the anti-correlated genes (Table (Table44).
The Gene Ontology analysis of three biclusters from two chemicals, galactosamine and thioacetamide, revealed interesting categorization of the genes (see Additional file 1 Figure S1 and Table S1). The 401 genes in a bicluster subset by galactosamine over-represented a KEGG pathway related to extracellular matrix (ECM) receptor interaction (Figure S1A). Another bicluster of 364 genes subset by thioacetamide had a different set of significantly over-represented categories representing the Wnt signalling pathway and glycerophospholipid metabolism (Figure S1B). The third bicluster of 289 genes subset by both chemicals had biological processes and KEGG pathways related to neuron development and cell adhesion as over-represented categories (Figure S1C).
Although the blood serum level of ALT is a good indicator of liver injury, it is not considered a prognosticator of toxic insult as the true nature and extent of the liver damage is not proportional to the elevation in the serum enzyme activity . For instance, a normal ALT level does not necessarily mean that the liver is definitely normal and high levels of ALT in the blood doesn't necessarily indicate the extent to which the liver is inflamed or damaged. In addition, it is likely that ALT levels elevate well after the mechanistic changes have occurred that led to the liver injury from toxic exposure. Therefore, we set out to use cc-Biclustering in an unsupervised fashion (without ALT) so we could find biclusters of genes that may be unrelated to ALT or respond before ALT elevation and yet are very informative in terms of the manifestation of hepatotoxicity. We obtained more biclusters (~ three times more) using unsupervised cc-Biclustering when the same threshold pt was used as in the supervised case.
Similar to the biclusters in Figure Figure1,1, the genes in the heat maps shown in Figure Figure33 are subset by all eight chemicals. There are 330 genes up-regulated as shown in Figure Figure3A3A (the top half) and 409 down-regulated genes as shown in Figure Figure3B3B (the bottom half). Compared to the number of genes in the bicluster shown in Figure Figure1,1, the bicluster in Figure Figure33 contained about four to five times more genes and more significant categories (Table (Table5).5). The set of genes that are up-regulated were found to over-represent biological processes and KEGG pathways representative of a more toxic response (i.e. less similar to the general responses over-represented by the genes in the bicluster from the supervised clustering [Figure [Figure1A]).1A]). Apoptosis, an inflammatory response and glycolysis/gluconeogenesis were key mechanisms impacted. The set of down-regulated genes contained over-represented categories related to cholesterol biosynthesis, fatty acid metabolism, alkaloid biosynthesis and the peroxisome proliferator-activated receptors (PPARs) signaling pathway. PPAR-α is mainly expressed in the liver and activation of the receptor has been associated with suppression of apoptosis and induction of cell proliferation .
Heat maps of the gene expression from three biclusters subset by seven chemicals (excluding 1,4-dichlorobenzene), are shown in Figure Figure4.4. Figure Figure4A4A (the top) shows 175 co-expressed genes up-regulated at a later time and Figure Figure4B4B (the middle) 114 co-expressed genes down-regulated. The set of up-regulated genes significantly over-represented categories that are suggestive of liver regeneration (Table (Table6)6) [20,21]. Angiogenesis, the regulation of actin cytoskeleton, regulation of adherens (adhesion) junctions and the Toll-like receptor (TLR) signaling pathway were impacted. The down-regulated genes over-represented mechanisms related to energy producing pathways (glucose homeostasis, gluconeogenesis and the pentose phosphate pathways). Comparisons of the over-represented categories between the genes in Figure Figure33 and and44 clearly differentiate 1,4-dichlorobenzene from the other chemicals since the biclusters in Figure Figure44 do not contain genes that are co-expressed in the samples exposed to 1,4-dichlorobenzene. Figure Figure4C4C (the bottom) had 47 genes up-regulated early and significantly over-represented categories pointing to a negative regulation of protein kinase activity, the mitogen-activated protein kinase (MAPK) signaling pathway, and apoptosis (Table (Table66).
Over the recent years several ways of investigating drug-induced hepatotoxicity has been explored. Using gene expression analysis of samples exposed to toxicants offers a genome-wide assessment of the transcriptional changes that occur from the insult. However, current methodologies for analyzing the data are limited in that they typically do not "anchor" the changes in gene expression to the phenotype of toxicity nor do they constrain them by the experimental design. Case in point is hierarchical clustering of genes and samples based on gene expression data. Although the method provides an overall view of the clusters of genes that are co-expressed within a group of highly similar samples, it does not extract subclusters of co-expressed genes that are related to a given phenotype, end-point measure of the samples or subset of experiments. Yoon et al.  proposed a method for discovering coherent biclusters from gene expression data using decision diagrams constituted from binary representations of a set of samples in which the expression of a subset of genes are highly similar (coherent). However, the method does not integrate phenotypic data nor is it guided by the experimental design (i.e. constraints imposed by a time series or dose response study). Linking co-expressed genes to a phenotype of interest or set of experimental conditions can potentially enhance the interpretation of the biological systems that are impacted from the manifestation of an outcome .
We analyzed a compendium gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants and leveraged alanine aminotransferase (ALT), a serum enzyme indicative of liver injury as the phenotypic marker, to identify several biological processes and molecular pathways that may be associated with mechanisms of hepatotoxicity. Our analysis used an approach to biclustering called Coherent Co-expression Biclustering (cc-Biclustering) for clustering of a subset of genes through a coherent (consistency) measure within each group of samples representing a subset of experimental conditions. Existing biclustering methods use some measure of merit to determine whether a row (gene) or column (sample) should be included or excluded from a bicluster . cc-Biclustering uses a given coherent measure (CM) to determine whether a gene or a group of samples is included or excluded from a bicluster. The CM used between pairs of gene vectors (or gene vectors with a phenotypic profile) is flexible. Depending on a research interest, CM can be chosen to be Pearson correlation, Euclidean distance or some other measure of (dis)similarity. Unsupervised cc-Biclustering uses a pairwise comparison of the gene expression profiles to extract biclusters. In the case of supervised cc-Biclustering, we correlated the co-expression of the subset of genes within a bicluster with ALT. The overlap between the up-regulated genes from the supervised and unsupervised cc-Biclustering methods is 61 genes (Additional file 1; Figure S2) where as the overlap between the down-regulated genes from the two methods is 78 genes (Additional file 1: Figure S3). The sharp contrast between biclusters in Figure Figure11 and Figure Figure22 with respect to the over-represented biological categories exhibited by the genes positively correlated with ALT clearly differentiated 1,4-dichlorobenzene from the other seven chemicals. This finding is consistent with a study to predict the levels of necrosis in the rat liver using this same compendium of hepatotoxicants . It was shown that 1,4-dichlorobenzene was the only chemical with an observed necrosis contained in less than 25% of the hepatocytes (one sample was < 25%, 4 samples were < 5%, and 31 samples had no sign of necrosis). All the other chemicals caused observed necrosis in > 50% of the hepatocytes in one or more of the exposed samples (Table (Table2).2). Moreover, a bicluster with samples exposed to bromobenzene, diquat dibromide, galactosamine, monocrotaline or N-nitrosomorpholine and 12 genes (including Cd40, Casp8 and Nr4a1) correlated with ALT and contained over-represented categories related to an inflammatory response, glycolysis/gluconeogenesis and apoptosis (Data not shown). These are biological processes known to be involved in hepatotoxicity [24,25].
Comparison of the over-represented categories between the genes in the biclusters shown in Figures Figures33 and and44 (both from the unsupervised biclustering) also differentiate 1,4-dichlorobenzene from the other chemicals quite well. In addition, 47 genes in a bicluster were found to be up-regulated early in response to the hepatotoxicant exposure (Figure (Figure4C4C bottom) and significantly over-represented categories related to negative regulation of protein kinase activity, the MAPK signaling pathway, and apoptosis (Table (Table6).6). Many genes in this set (Atf4, Trib3, Jun, Btg2, Gadd45a, Gadd45b, Ddit3, Cdkn1a, Hmox1, Cdk9, Bag5 and Mcl1) have been extensively studied in terms of their response to toxicants or other external stimuli [26-31]. Most of these genes are targets of p53 and cause DNA damage. Our finding suggests that the liver injury caused by 1,4-dichlorobenzene could be non-genotoxic while the other toxicants could be genotoxic. This result is consistent with the current finding that early responses of some genes at 6 hr differentiate 1,4-dichlorobenzene from the other seven more toxic chemicals where it is evident that transcriptional regulation by Jun and TP53 leads to necrosis .
Although the data from each array is independently collected, in a vast number of cases, many of the related biological samples are highly correlated. For example, some samples are from the same tissue, or treated by the same chemical, or a time series. Same samples are merely biological replicates. In our study we analyzed rat liver samples exposed to one of eight chemicals. Each of the hepatotoxicants has its unique chemical structure and causes different levels and extent of liver injury (Table (Table2).2). The activated biological processes and molecular pathways can be highly similar or dissimilar. The gene sets involved in responding to each of the hepatotoxicants can overlap to a large extent or be somewhat unique. Using our biclustering approach we were able to capitalize on the experimental design and/or phenotypic measure to discern possible mechanisms of hepatotoxicity through the over-representation of pathways and biological processes determined by co-expressed genes in cc-Biclusters. More work is underway to efficiently process the wealth of biclusters extracted for a more informed interpretation of the mechanistic changes that take place during the formation of hepatotoxicity.
Male 12 week old F344 Fischer rats were individually exposed to one of the following chemicals: 1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine or diquat dibromide. All eight chemicals were studied using standardized procedures, i.e. a common array platform, experimental procedures and data retrieving and analysis processes. For details of the experimental design see Lobenhofer et al. Briefly, for each chemical, four to six male rats were exposed to a low dose, mid dose(s) or a high dose of the chemical and sacrificed at 6, 24 or 48 hrs later. At necropsy, the liver of the rats were harvested for RNA extraction. A time-matched vehicle control pool was made for each chemical by pooling equal amounts of RNA from each of the control animals. Each treated animal was hybridized against a time-matched control pool to the Agilent Rat Oligonucleotide Microarray with a dye-swap technical replicate. Fluorescence intensities were measured with an Agilent DNA Microarray Scanner and processed with the Agilent Feature Extraction software version A.7.5.1 for bromobenzene and 1,2-dichlorobenzene arrays and version 188.8.131.52 for the others. See the header of the Agilent data files for more details of the scanning and data acquisition hardware and software parameters. The 318 samples and exposure conditions are listed in Table Table1.1. Liver necrosis was observed in all the rats exposed to one of the eight chemicals at high doses. However, 1,4-dichlorobenzene is an isomer of 1,2-dichlorobenzene and is non-toxic at low and mid doses. Since each compound has its unique chemical structure and properties, the activated biological processes, molecular pathways, type of toxicity in the liver and injury could be highly similar or dissimilar. Considering these variables, we partitioned the columns of the gene expression data matrix (Additional file 2) into eight groups (one for each chemical) for analysis using cc-Biclustering (Additional file 3). The number of samples in each group varied from 32 to 72. The data is publicly available at the Gene Expression Omnibus (GEO) database http://www.ncbi.nlm.nih.gov/geo/ under series GSE15785 and at the Chemical Effects in Biological Systems (CEBS) database http://cebs.niehs.nih.gov under accession number 001-00001-0020-000-4.
At sacrifice, blood was collected into serum separation tubes (BD Microtainer® Tubes, BD, Franklin Lakes, NJ) and serum was separated. Clinical chemistry analyses of alanine aminotransferase (ALT) was performed on all rats at study termination. Serum levels of the established liver injury marker ALT (Additional file 4) increase when the liver shows inflammation, injury or hepatotoxicity.
Rather than obtaining coherent measures across the whole vector of gene expression, cc-Biclustering uses the sample information in the study design of the experiment to partition the M columns into J groups according to a given phenotypic response or biological study of interest. For instance, as shown in the next section where an expression matrix has eight different chemicals, each group corresponds to a chemical with which samples (biological replicates included) are treated at different doses and time points. A gene expression value of an element in matrix A now has four indices, i.e.
where row index i is from 1 to N number of genes; the column index breaks up into three sub-indices, j, k, and l. The index j is from 1 to J number of groups (i.e., each of the groups corresponds to a chemical). The index k is from 1 to K number of treatments within a group (i.e., corresponding to a given chemical, a treatment is exposure to a given dose at a given time point). The index l is from 1 to L, where L is the number of biological replicates. The general idea of cc-Biclustering is to map matrix A to a binary coherent matrix H(hi, j) according to an inclusion\exclusion criterion function. The H matrix has N rows and J columns.
The advantages of bracketing samples or conditions into a set of groups are four-fold. (1) A coherent measure of expression is computed within a group to determine if a gene aij is included in a bicluster, where aij is a vector of expression values of ith row and jth group. (2) Gene expression aij can be time and/or dose series derived. If there were several replicates at a given time and dose, aij is evaluated to determine if it is differentially expressed and should be included in a bicluster. (3) The coherent measure (CM) used between pairs of gene vectors ai1,j and ai2,j (or phenotypic profile Sj) is flexible. Depending on a research interest, CM can be chosen to be Pearson correlation, Euclidean distance, or some other measure of biologically relevant similarity. (4) The cc-Biclustering algorithm is simple. Like many coherent value based models (such as an additive model, multiplicative model or based on a given CM) cc-Biclustering converts a gene expression matrix A to a binary matrix H with a given p-value threshold. The extracted biclusters then have constant rows and columns . We used the DAVID database (April 2008 version) for Gene Ontology (January 2008 download) biological processes and KEGG pathway (January 2008 download) analyses of the genes within the biclusters and used p-values for over-representation of these biological categories based on a one-tailed Fisher exact probability or EASE score . The supervised and unsupervised cc-Biclustering algorithms are described in more detail and depicted in pseudo code (Additional file 1). A comparison of the analysis of a microarray gene expression data set from Arabidopsis thaliana samples treated with various conditions and times of exposure using unsupervised cc-Biclustering and the Cheng and Church biclustering algorithm  with parameters δ = 0.5, α = 1.2 and output = 10 biclusters revealed that our method produced the top 3 biclusters of genes (by size) which are highly correlated within the subset of samples that shared similar exposure conditions with a Pearson correlation in the range of +0.73 to +0.84 whereas the other method produced the top 3 biclusters of genes with correlation in the range of +0.01 to +0.55.
JWC and PRB designed the strategy of cc-Biclustering. JWC implemented the cc-Biclustering algorithm, analyzed the data and wrote part of the manuscript. PRB provided suggestions, advice and guidance for the concept of the research, interpreted the results and also wrote part of the manuscript. The authors have read and approved the final manuscript.
Supervised and unsupervised cc-Biclustering details, pseudo code and supplemental figures and table.
Two-dimensional matrix of the gene expression ratio data (data preprocessed and fluor-flips per biological sample merged [averaged]).
Sample information denoting biological replicates for cc-Biclustering.
ALT data for the biological samples.
We thank the National Center for Toxicogenomics at the National Institute of Environmental Health Sciences (NIEHS) for the hepatotoxicant compendium data. We also thank Jennifer Fostel and David Fargo for their critical review of the manuscript. This research was supported, in part by, the Intramural Research Program of the NIH and NIEHS.