The pathological mechanism of Barrett’s esophagus (BE) is still unclear. In the present study, pathway cross-talks were analyzed to identify hub pathways for BE, with the purpose of finding an efficient and cost-effective detection method to discover BE at its early stage and take steps to prevent its progression.
We collected and preprocessed gene expression profile data, original pathway data, and protein-protein interaction (PPI) data. Then, we constructed a background pathway cross-talk network (BPCN) based on the original pathway data and PPI data, and a disease pathway cross-talk network (DPCN) based on the PPI data and the pathways that differed between BE and normal controls. Finally, a comprehensive analysis was conducted on these 2 networks to identify hub pathway cross-talks for BE, so as to better understand the pathological mechanism of BE at the pathway level.
A total of 12 411 genes, 300 pathways (6919 genes), and 787 896 PPI interactions (16 730 genes) were separately obtained from their own databases. Then, we constructed a BPCN with 300 nodes (42 293 interactions) and a DPCN with 296 nodes (15 073 interactions). We identified 4 hub pathways: cAMP signaling pathway, cGMP-PKG signaling pathway, natural killer cell-mediated cytotoxicity, and osteoclast differentiation. We found that these pathways might play important roles during the occurrence and development of BE.
We predicted that these pathways (such as the cAMP signaling pathway and the cGMP-PKG signaling pathway) could be used as potential biomarkers for early diagnosis and therapy of BE.
Barrett’s esophagus (BE, also known as columnar-lined esophagus) is a complication of gastroesophageal reflux disease and a precursor lesion in most cases of esophageal adenocarcinoma (EA). Although less than 5% of patients with BE will go on to develop EA, it is generally accepted that most persons with BE are undiagnosed and that the vast majority of EA occurs in patients with undiagnosed BE. EA usually carries a poor prognosis, with a 5-year survival rate of less than 15%. Endoscopic examination is now the most commonly used means of detecting early EA, but it is neither feasible nor cost-effective; therefore, there is a great need for an efficient and cost-effective method to detect BE at an early stage and prevent its progression.
Recent efforts have been made to better understand the occurrence and development of BE. It has been reported that increasing age, cigarette smoking, obesity, lack of Helicobacter pylori (H. pylori) infection, and gastroesophageal reflux disease are the leading risk factors for BE. In addition, the intestinal epithelial-associated caudal-type homeobox (CDX) transcription factors CDX1 and CDX2 have been implicated in the pathogenesis of BE. By using next-generation sequencing in endoscopic biopsies, ARID1A has been identified as a tumor-suppressor gene in BE. Furthermore, the genomic sequences have been discovered. However, the exact pathological mechanism remains unclear.
At present, pathway analysis has become the first choice for extracting and explaining the underlying biology of high-throughput molecular measurements. One effective biological approach to identifying pathway interaction is genetic screening, in which synthetic lethality of 2 mutations often indicates an interaction between the 2 pathways in which those mutations separately reside. Given the complex nature of biological systems, pathways often need to function in a coordinated fashion to produce appropriate physiological responses to internal and external stimuli. A background pathway cross-talk network (BPCN) provides a quantifiable description of the molecular networks that characterize the complex interactions and interwoven relationships governing cellular functions among tissues and disease-related genes, and thereby helps to explain the molecular processes during disease development and progression. In such networks, 2 pathways are likely to interact with or influence each other (cross-talk) if significantly more protein interactions are detected between them than expected by chance. Therefore, in the present study, pathway cross-talk analysis was conducted based on the BPCN and the disease pathway cross-talk network (DPCN) to identify the key pathways for BE, so as to better understand the exact pathogenesis of BE.
We collected and preprocessed gene expression profile data, pathway data, and protein-protein interaction (PPI) data. Next, we separately constructed a BPCN and a DPCN. Finally, a comprehensive analysis was conducted on these 2 networks to identify key pathway cross-talks for BE. The results are potential biomarkers for early diagnosis and therapy of BE, which could offer insight into the pathological mechanism underlying this disease and contribute to future study of related diseases.
The gene expression profile of BE, with accession number GSE39491 (8), was obtained from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The GSE39491 data, generated on the A-AFFY-37 – Affymetrix GeneChip Human Genome U133A 2.0 platform, comprised 40 BE samples and 80 controls from matched normal mucosa. The microarray data and annotation files were downloaded. Then, the probe-level gene expression profile was converted to the gene symbol level, and duplicated symbols were deleted. Finally, a total of 12 411 gene symbols were obtained for further analysis.
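The probe-to-symbol conversion described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the probe identifiers and expression values are hypothetical, and keeping the first row seen for each symbol is an assumption, since the text only states that duplicated symbols were deleted.

```python
def collapse_to_symbols(probe_rows, probe_to_symbol):
    """Map probe-level rows to gene symbols, keeping one row per symbol.

    Assumption: the first probe encountered for a symbol is kept and
    later duplicates are dropped; unannotated probes are skipped.
    """
    gene_rows = {}
    for probe_id, values in probe_rows:
        symbol = probe_to_symbol.get(probe_id)
        if symbol is None or symbol in gene_rows:
            continue
        gene_rows[symbol] = values
    return gene_rows


# Hypothetical probe IDs and expression values, for illustration only.
probe_rows = [
    ("p1", [7.1, 6.8]),
    ("p2", [5.0, 5.2]),
    ("p3", [7.3, 6.9]),  # second probe for the same symbol as p1
    ("p4", [4.4, 4.1]),  # no annotation available
]
probe_to_symbol = {"p1": "CDX2", "p2": "ARID1A", "p3": "CDX2"}

genes = collapse_to_symbols(probe_rows, probe_to_symbol)
print(sorted(genes))
```

Other collapsing rules (for example, averaging probes per symbol) would be equally plausible readings of the description.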
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher-order functional information. The KEGG pathway database (http://www.genome.jp/kegg) is a collection of graphical diagrams (pathway maps) for the biochemical pathways. In this study, all human pathway data were downloaded from the KEGG pathway database, and a total of 300 pathways and 6919 genes were obtained.
There are several PPI databases that researchers commonly use, such as the Biomolecular Interaction Network Database (BIND), BioGRID, Reactome, and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). In the present study, the global human PPIs were obtained from the STRING database (http://string-db.org/), which included a total of 1 048 576 interactions. The protein IDs were converted to the gene symbol level, and duplicated symbols were deleted. Finally, a PPI network including a total of 787 896 interactions (16 730 genes) was obtained for further analysis.
To evaluate interactions among pathways, we investigated the PPI relationships between pathways, that is, pathway cross-talk. The pathways that had cross-talk with each other were selected to construct the network, which was defined as the BPCN. First, for each pathway pair obtained from the KEGG pathway database, PPI analysis was conducted on the genes enriched in the 2 pathways. After counting the interactions between the 2 pathways of every pathway pair, we used these counts as weight values for the pathway pairs. Then, we used the Fisher exact test to evaluate gene overlap between any given pair of pathways, and P-values (denoted as PB) were adjusted by false discovery rate (FDR). Finally, the BPCN was visualized in Cytoscape using the pathway pairs whose adjusted PB was <0.05.
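The three steps above (counting PPIs between 2 pathways, testing gene overlap, and adjusting the P-values) can be sketched in plain Python. This is an illustrative sketch under stated assumptions rather than the authors' pipeline: the one-sided form of the Fisher exact test, the size of the gene universe, and the use of the Benjamini-Hochberg procedure for the FDR adjustment are all assumptions, and the function names are hypothetical.

```python
from math import comb


def count_crosstalk(pathway_a, pathway_b, ppi_edges):
    """Count PPI edges that have one endpoint in each pathway."""
    a, b = set(pathway_a), set(pathway_b)
    return sum(
        1 for u, v in ppi_edges
        if (u in a and v in b) or (u in b and v in a)
    )


def fisher_overlap_p(pathway_a, pathway_b, universe_size):
    """One-sided Fisher exact P value for the gene overlap of 2 pathways.

    Assumption: the test asks whether the overlap is larger than expected
    when the pathways are drawn from a universe of `universe_size` genes.
    """
    a, b = set(pathway_a), set(pathway_b)
    overlap = len(a & b)
    total = comb(universe_size, len(b))
    tail = sum(
        comb(len(a), i) * comb(universe_size - len(a), len(b) - i)
        for i in range(overlap, min(len(a), len(b)) + 1)
    )
    return tail / total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data, for illustration only.
edges = [("A", "C"), ("B", "D"), ("A", "B"), ("C", "E")]
n_links = count_crosstalk({"A", "B"}, {"C", "D"}, edges)
p_overlap = fisher_overlap_p({1, 2, 3}, {2, 3, 4}, universe_size=10)
adj = benjamini_hochberg([0.01, 0.04, 0.03])
```

Pathway pairs whose adjusted value falls below 0.05 would then be passed to a network tool such as Cytoscape.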
To further explore the relationships among the pathways in BE, a pathway cross-talk network was constructed based on the gene expression profile and the PPIs, and we denoted this network as the DPCN. In the present study, there were 2 steps for constructing the DPCN: pathway analysis for BE and DPCN construction.
In the present study, to gain further insight into the functional enrichment of the genes in BE, pathway analysis was performed on the gene expression profile. There were 2 steps in this analysis. First, pathway enrichment analysis was conducted based on the KEGG pathway database. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to perform the KEGG pathway enrichment analysis and find the biochemical pathways that might be involved in the occurrence and development of BE. The pathways containing more than 5 genes and fewer than 100 genes were selected for further analysis. Then, GSEA-ANOVA of the attract method was used to test pathway-level data and obtain the F-statistic values, and the t test with Welch modification was used to adjust the P value. In this way, each pathway was assigned a P value, which we denoted as PA, and these pathways were ranked in descending order according to their PA.
In the present study, the DPCN was constructed based on the differential pathways. To further define the relationships among the pathways identified above, the PPI relationships within every pathway cross-talk were measured. For any pathway cross-talk, we went through all genes in a given pathway, and if a gene did not have any interaction, we skipped it. If a gene had an interaction, the Spearman correlation coefficient (SCC) (23) was used to weight pairwise interactions in BE and normal controls. The SCC of a pair of interactions (x and y) was defined as:

$$\mathrm{SCC}(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{g(x, i) - \bar{g}(x)}{\sigma(x)} \right) \cdot \left( \frac{g(y, i) - \bar{g}(y)}{\sigma(y)} \right)$$

Where n was the number of interactions; g(x, i) or g(y, i) was the expression level of interaction x or y in the pathway i under a specific condition (BE or normal); \bar{g}(x) or \bar{g}(y) represented the mean expression level of interaction x or y; and σ(x) or σ(y) represented the standard deviation of the expression level of interaction x or y.
For any pathway cross-talk, supposing that there were A and B genes in the 2 pathways, respectively, we defined the weight of the pathway pair as the sum of the absolute differences in SCC between normal controls and BE, divided by (A × B).
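The weighting scheme can be illustrated with a short sketch. It is not the authors' implementation: Spearman correlation is computed here in the standard way, as a Pearson correlation on average ranks, and the function names, gene labels, and expression values are hypothetical. The sketch also assumes that no expression vector is constant, since the correlation is undefined in that case.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def crosstalk_weight(interactions, expr_be, expr_normal, n_genes_a, n_genes_b):
    """Sum of |SCC_BE - SCC_normal| over interactions, divided by A x B."""
    total = 0.0
    for gx, gy in interactions:
        scc_be = spearman(expr_be[gx], expr_be[gy])
        scc_normal = spearman(expr_normal[gx], expr_normal[gy])
        total += abs(scc_be - scc_normal)
    return total / (n_genes_a * n_genes_b)


# Hypothetical expression values for one interacting gene pair.
expr_be = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0]}
expr_normal = {"g1": [1.0, 2.0, 3.0], "g2": [6.0, 4.0, 2.0]}
weight = crosstalk_weight([("g1", "g2")], expr_be, expr_normal, 2, 1)
```

In this toy case the pair is perfectly correlated in BE and perfectly anti-correlated in controls, so the absolute difference is 2 and the weight is 2 / (2 × 1) = 1.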
Next, we used the Fisher exact test to evaluate gene overlap between any given pathway cross-talk, and P-values, which we denoted as PD, were adjusted by FDR. Finally, the pathway pairs of BE and normal controls whose adjusted PD was <0.05 were considered differential pathways and were selected to construct a DPCN in Cytoscape.
To identify hub pathways for BE, a general analysis was conducted on the BPCN and DPCN. Centrality analysis was employed to investigate the biological functions and significance of hub cross-talks in the BPCN and DPCN. Centrality measures mainly include degree, closeness, betweenness, and transitivity, of which degree is the simplest topological index. In the present study, the pathways of the BPCN and DPCN were ranked in descending order according to their degree centralities.
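Degree centrality is simply the number of distinct neighbors of a node, so ranking pathways by degree takes only a few lines. The sketch below is illustrative: the edge list is a toy example with placeholder pathway labels, not data from the BPCN or DPCN, and breaking ties alphabetically is an assumption.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Return (node, degree) pairs sorted by descending degree.

    Duplicate edges and self-loops are ignored; ties are broken by name.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        neighbors[u].add(v)
        neighbors[v].add(u)
    return sorted(
        ((node, len(nbrs)) for node, nbrs in neighbors.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy cross-talk network with placeholder pathway labels.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P3", "P4"), ("P4", "P3")]
ranking = degree_ranking(toy_edges)
print(ranking)
```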
Then, the rank product (RP) algorithm, a simple but powerful meta-analysis tool for detecting differentially expressed genes between 2 experimental conditions, was used to analyze these 2 networks. U and V stand for the 2 conditions (BE vs. controls), and there were nU and nV replicates in the BPCN, and mU and mV in the DPCN. The RP for each cross-talk was determined according to the following formula:

$$RP_s = \left( \prod_{i=1}^{T} r_{si} \right)^{1/T}$$

Where r_{si} stood for the rank of the sth gene under the ith comparison, i=1, …, T. The pathways with RP value <0.05 were considered to be very important and were selected for further analysis.
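As usually defined, the rank product of an item is the geometric mean of its ranks across the T comparisons. A minimal sketch of that computation follows; the function name and rank lists are hypothetical, and the sketch stops at the raw rank product without attempting the significance estimate that a cutoff such as 0.05 would require.

```python
from math import prod


def rank_product(ranks_per_comparison):
    """Geometric mean of each item's ranks across T comparisons.

    `ranks_per_comparison[i][s]` is the rank of item s in comparison i.
    """
    n_comparisons = len(ranks_per_comparison)
    n_items = len(ranks_per_comparison[0])
    return [
        prod(ranks[s] for ranks in ranks_per_comparison) ** (1.0 / n_comparisons)
        for s in range(n_items)
    ]


# Hypothetical ranks of 3 items in T = 2 comparisons.
rp = rank_product([[1, 2, 3], [2, 1, 3]])
print(rp)
```

Items that rank consistently near the top in every comparison receive the smallest rank products.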
The impact factor (IF) was used to determine the hub pathways. For an arbitrary pathway x, PD represented its degree value in the DPCN (not the adjusted P value denoted PD above), and PA represented its P value according to the attract method. The IF of pathway x was calculated according to the following formula:
Finally, based on comprehensive analysis, the pathways with PA <0.05 and RP value <0.05, as well as the top 2% pathways according to the IF values, were considered as hub pathways. The cross-talks among hub pathways were hub cross-talks.
In the present study, pathway cross-talk analysis was conducted to detect significant biomarkers for BE. Prior to analysis, gene data, original pathway data, and PPI data were all collected from their own databases. Comprehensive analysis was then performed on the BPCN and DPCN to identify hub cross-talks. The results were as follows.
Having obtained the pathway data and the PPI data from their own databases, we analyzed the PPI relationships between any 2 pathways. By setting the threshold value of PB <0.05, a BPCN with 300 nodes (42 293 interactions) was constructed (Figure 1). Degree centrality analysis was conducted on the BPCN (Figure 2), showing that the degree of most pathways was concentrated between 250 and 300. In other words, most pathways were connected with each other. Edges between 2 pathways with significant gene overlap were considered not informative, and thus were removed from the network. Because our intent was to discover cross-talk among different biological activities in BE, we next constructed a DPCN.
As indicated in the Methods section, to construct the DPCN, we first conducted KEGG enrichment analysis of the gene expression profile of BE. Then, each pathway was assigned a P value via the attract method. There were 16 pathways with PA <0.05: Chemical carcinogenesis (PA=6.49E-06), Metabolism of xenobiotics by cytochrome (PA=2.09E-05), Neuroactive ligand-receptor interaction (PA=7.97E-05), Ribosome (PA=7.97E-05), Retinol metabolism (PA=7.97E-05), Drug metabolism – cytochrome (PA=1.30E-03), Natural killer cell-mediated cytotoxicity (PA=1.71E-03), RNA transport (PA=1.71E-03), ABC transporters (PA=4.79E-03), Osteoclast differentiation (PA=1.20E-02), Nicotine addiction (PA=1.20E-02), Antigen processing and presentation (PA=1.90E-02), cAMP signaling pathway (PA=2.40E-02), cGMP – PKG signaling pathway (PA=2.40E-02), Valine (PA=2.40E-02), and Spliceosome (PA=2.40E-02).
To further define the biological activities of the pathways in BE, a DPCN was constructed based on the differential pathways. SCC was used to weight the pairwise interactions of BE and normal controls in pathways, the Fisher exact test was used to evaluate gene overlap between any given pathway cross-talk, and FDR was used to adjust the P value. By setting the cutoff value of PD <0.05, 296 differential pathways were identified. A DPCN with 296 nodes (15 073 interactions), in which each node represented a pathway, was then built (Figure 3). Degree centrality analysis was conducted on the DPCN (Figure 4), showing that the degree values were widely scattered from 0 to 200, which was lower than in the BPCN. This might be useful in exploring different cross-talks between BE and normal controls.
To detect differentially expressed genes between BE and the normal control condition, an RP algorithm was applied to these 2 networks. Under the threshold value of RP <0.05, we obtained a total of 55 pathways. The IF values of the pathways were calculated and ranked in descending order, and we obtained 6 pathways: Amyotrophic lateral sclerosis (ALS) (IF=186), Osteoclast differentiation (IF=157), cAMP signaling pathway (IF=156), Natural killer cell-mediated cytotoxicity (IF=147), cGMP-PKG signaling pathway (IF=137), and Epstein-Barr virus infection (IF=135). Finally, 4 hub pathways (cAMP signaling pathway, cGMP-PKG signaling pathway, Natural killer cell-mediated cytotoxicity, and Osteoclast differentiation) were identified under the threshold values of PA <0.05 and RP value <0.05, as well as the top 2% of pathways according to the IF values. The details are listed in Table 1, and these 4 hub pathways were considered to play key roles in BE. The hub cross-talks are shown in Figure 5.
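The final filtering step can be reproduced from the values reported in the text. In the sketch below, the IF and PA values are those stated above, and pathways absent from the list of 16 pathways with PA <0.05 are treated as not significant. The text does not list which 55 pathways passed the RP threshold, so the sketch assumes (and this is only an assumption) that all 6 top-IF pathways did. The function name is hypothetical.

```python
# IF values of the 6 top-ranked pathways, as reported in the text.
top_if = {
    "Amyotrophic lateral sclerosis (ALS)": 186,
    "Osteoclast differentiation": 157,
    "cAMP signaling pathway": 156,
    "Natural killer cell-mediated cytotoxicity": 147,
    "cGMP-PKG signaling pathway": 137,
    "Epstein-Barr virus infection": 135,
}

# Attract P values (PA) reported for those pathways that had PA < 0.05.
pa_values = {
    "Osteoclast differentiation": 1.20e-02,
    "cAMP signaling pathway": 2.40e-02,
    "Natural killer cell-mediated cytotoxicity": 1.71e-03,
    "cGMP-PKG signaling pathway": 2.40e-02,
}

# ASSUMPTION: all 6 pathways are among the 55 with RP < 0.05.
rp_significant = set(top_if)


def select_hubs(if_values, pa, rp_pass, alpha=0.05):
    """Keep top-IF pathways that also pass the PA and RP thresholds."""
    ranked = sorted(if_values.items(), key=lambda kv: -kv[1])
    return [
        name for name, _ in ranked
        if pa.get(name, 1.0) < alpha and name in rp_pass
    ]


hubs = select_hubs(top_if, pa_values, rp_significant)
print(hubs)
```

Under these assumptions the filter drops ALS and Epstein-Barr virus infection, which do not appear among the pathways with PA <0.05, leaving the 4 hub pathways named in the text.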
BE is an acquired condition in which the normal stratified squamous epithelium in the distal esophagus is replaced by metaplastic columnar epithelium in response to chronic gastroesophageal reflux, with a predisposition to EA. Better understanding of the molecular alterations during its development might improve prevention and tumor control and ultimately lead to better disease management. High-throughput biological experiments that interrogate many genes simultaneously have generated unprecedented amounts of data. Bioinformatics methods have been accepted as quick and efficient means of analyzing these huge amounts of data, providing a preliminary understanding of the disease. Pathway analysis has become the first choice for gaining insight into the underlying biology of genes and proteins, as it reduces complexity and increases explanatory power. Traditional methods often focus on diagnostic or prognostic markers, usually obtained by identifying the most significant differentially expressed genes (DEGs) between disease and control samples; pathway analysis is then conducted on the DEGs to reveal the significant differential pathways between the disease and normal control conditions. However, studies have shown that the most significant DEGs obtained from different studies of a particular disease are typically inconsistent. Cross-validation of datasets, for example with network-based methods, significantly reduces such false findings and increases sensitivity. Moreover, by utilizing pathway-related networks, one can gain insight into the mechanisms by which biological systems operate.
Therefore, in this research, we analyzed BE by integrating biological pathways and protein interaction data. We found that the cAMP signaling pathway, cGMP-PKG signaling pathway, Natural killer cell-mediated cytotoxicity, and Osteoclast differentiation showed significant differences between the BE condition and the normal control condition. To further define the relationship between the altered pathways and BE, we conducted an in-depth analysis of the altered pathways, using the cAMP signaling pathway as an example.
Pathway analysis has been conducted to disclose the molecular mechanisms underlying BE [35–37]. It has been reported that a brief exposure to acid induces MAPK activation in vitro in human Barrett’s-associated esophageal adenocarcinoma cells and in vivo in the metaplastic esophageal mucosa of patients with BE. Cyclic adenosine monophosphate (cAMP) has tissue-specific effects on growth, differentiation, and gene expression. cAMP has been found to activate MAPK and Elk-1 through a B-Raf- and Rap1-dependent pathway. Furthermore, it has been reported that there is significant cross-talk between cAMP and MAPK signaling in the regulation of cell proliferation. In the present study, the cAMP signaling pathway was considered to be significant for BE. Therefore, we predict that there might be a relationship between the cAMP signaling pathway and BE. Further experimental work is needed to verify this relationship.
We identified several hub pathways (cAMP signaling pathway, cGMP – PKG signaling pathway, Natural killer cell-mediated cytotoxicity, and Osteoclast differentiation) for BE via integrating biological pathways and protein interaction data. We predict that these pathways might play key roles during the occurrence and development of BE, and are potentially novel predictive and prognostic markers for BE.
This research is the result of the mutual cooperation of the authors. We thank all members of the research group. We are grateful to Beijing Springer Medical Research Institute for professional translation and text polishing.
We declare that we have no conflicts of interest.
Source of support: Departmental sources