PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of bioinfoLink to Publisher's site
 
Bioinformatics. 2010 October 15; 26(20): 2637–2638.
Published online 2010 August 26. doi:  10.1093/bioinformatics/btq471
PMCID: PMC2951087

DCGL: an R package for identifying differentially coexpressed genes and links from gene expression microarray data

Bao-Hong Liu,1,2,3, Hui Yu,2,3, Kang Tu,4 Chun Li,5 Yi-Xue Li,1,2,3,* and Yuan-Yuan Li2,3,*

Abstract

Summary: Gene coexpression analysis was developed to explore gene interconnection at the expression level from a systems perspective, and differential coexpression analysis (DCEA), which examines the change in gene expression correlation between two conditions, was accordingly designed as a complementary technique to traditional differential expression analysis (DEA). Since there is a shortage of DCEA tools, we implemented in an R package ‘DCGL’ five DCEA methods for identification of differentially coexpressed genes and differentially coexpressed links, including three currently popular methods and two novel algorithms described in a companion paper. DCGL can serve as an easy-to-use tool to facilitate differential coexpression analyses.

Contact: gro.tibcs@ilyy and gro.tibcs@ilxy

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

From the perspective of systems biology, gene coexpression analysis is useful for investigating gene interconnection at the expression level. Differential coexpression analysis (DCEA), which examines the change in expression correlation of gene pairs between two conditions, helps to explore the global transcriptional mechanisms underlying phenotypic changes. Compared with traditional differential expression analysis (DEA), the development of DCEA tools is lagged. In this work, we developed an R package, DCGL, implementing three previously proposed DCEA methods and two new algorithms reported in a companion paper (Yu,H. et al., submitted for publication).

Log Ratio of Connections (LRC) calculates the logarithm of the ratio of the connectivities of a gene between two conditions (Reverter et al., 2006). Average Specific Connection (ASC) counts the ‘specific connections’ that exist in only one coexpression network (Choi et al., 2005). The weighted gene coexpression network analysis (WGCNA) weights links with correlation coefficients and compares the sums of the correlation coefficients of a gene (Mason et al., 2009; van Nas et al., 2009). In contrast, our two methods, differential coexpression profile (DCp) and differential coexpression enrichment (DCe), are designed based on the exact coexpression changes of gene pairs, and thus can differentiate significant coexpression changes from relatively trivial ones, and identify coexpression reversal between positive and negative (Yu,H. et al., submitted for publication). All the five methods are able to identify differentially coexpressed genes (DCGs) from microarray datasets, and DCe is also able to pick out differentially coexpressed gene pairs or links (DCLs).

2 DESIGN

A typical DCEA workflow involves three successive procedures: gene filtration, link filtration, DCG and DCL identification. Correspondingly, DCGL consists of three parts of functions (Fig. 1). For gene filtration, one choice is based on the expression level (expressionBasedfilter) and the other based on its variability (varianceBasedfilter). For link filtration, we provide three functions for cutting off coexpression values (systematicLinkfilter, percentLinkfilter and qLinkfilter). A gene pair (link) is filtered out if both of its coexpression values for two conditions are lower than the cutoff.

Fig. 1.
DCGL design. Function names are shown in italic texts.

The third part, also the core of the package, includes five methods for identifying DCGs and DCLs, which mainly differ in the measure of differential coexpression (dC) of a gene. After the steps of gene filtration and link filtration, suppose gene i is associated with ni links whose coexpression values are projected to X = {xi1, xi2,…, xini}and Y = {yi1, yi2,…, yini} for two conditions. The dC measures of different methods are given in the following equations.

equation image
(1)

equation image
(2)

where N and K indicate the numbers of total links and total DCL links in the coexpression network, respectively, and ni and ki indicate the links and DCLs connected to gene i (see Yu,H. et al., submitted for publication).

equation image
(3)

with x′ and y′ are transformed from original x and y values with a ‘soft-thresholding’ strategy (Mason et al., 2009; van Nas et al., 2009).

equation image
(4)

Link sets C1(i) and C2(i) for two conditions are determined by screening the coexpression values according to a certain threshold.

equation image
(5)

3 IMPLEMENTATION

DCGL is released as an R package including two gene filtering functions, three link filtering functions and five DCEA functions (Fig. 1). These functions generally expect gene expression matrices (with genes in rows and samples in columns) as a major input, and the ultimate output are genes ranked by dC measure or P-value, from which one can obtain a DCG list. DCe has an additional output of classified DCLs. DCGL can be obtained from the supplementary data to this manuscript, or at http://cran.r-project.org/web/packages/DCGL/index.html.

We tested the five DCEA methods using dataset GSE3068 obtained from GEO (Table 1). Note that this test was carried out with the most time-efficient option of link filtration (setting thresholds on coexpression value directly). For the memory analysis, we tested DCp and DCe with the most memory-intensive filtration option ‘qLinkfilter’. We approached a memory limit of around 5.7 GB at a gene total of 7000. So it is anticipated that, if qLinkfilter is evoked, a gene expression matrix generally should undergo a gene filtration step beforehand so that the gene total is cut down to a few thousands or less.

Table 1.
Execution time (in seconds) of five DCEA methods in handling different subsets of GSE3068

4 EXAMPLE

Three simulated datasets are included in the package for exploring the functions. For example, ‘dataC’ gives expression values of 1000 genes in 20 samples divided equally into two groups corresponding to two conditions. Since this dataset contains a moderate number of genes, the gene filtration step can be skipped. The link filtration procedure is wrapped as a sub-function in the DCEA functions, so one can specify the link filtration choice in the arguments of DCEA functions.

If the DCEA function DCe is called, one can get a resulted variable with four components. The gene names ranked by the dC measure (P-value) make up the first ‘$DCGs’ component, while DCLs of different types are given in other three components.

Funding: Shanghai Institutes for Biological Sciences; Chinese Academy of Sciences (2008KIP207); the National ‘973’ Basic Research Program (2006CB0D1203, 2006CB0D1205); the National Natural Science Foundation of China (30770497, 31000380); National Key Technologies R&D Program (2007AA02Z331 and 2009ZX10603).

Conflict of Interest: none declared.

Supplementary Material

Supplementary Data:

REFERENCES

  • Choi JK, et al. Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics. 2005;21:4348–4355. [PubMed]
  • Mason MJ, et al. Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells. BMC Genomics. 2009;10:327. [PMC free article] [PubMed]
  • Reverter A, et al. Simultaneous identification of differential gene expression and connectivity in inflammation, adipogenesis and cancer. Bioinformatics. 2006;22:2396–2404. [PubMed]
  • van Nas A, et al. Elucidating the role of gonadal hormones in sexually dimorphic gene coexpression networks. Endocrinology. 2009;150:1235–1249. [PubMed]

Articles from Bioinformatics are provided here courtesy of Oxford University Press