BMC Syst Biol. 2017; 11: 73.
Published online 2017 August 11. doi:  10.1186/s12918-017-0444-y
PMCID: PMC5553769

Quantitative reproducibility analysis for identifying reproducible targets from high-throughput experiments



High-throughput assays are widely used in biological research to select potential targets. A single high-throughput experiment can efficiently study a large number of candidates simultaneously, but is subject to substantial variability. It is therefore scientifically important to perform quantitative reproducibility analysis to identify reproducible targets with consistent and significant signals across replicate experiments. A few methods exist, but all have limitations.


In this paper, we propose a new method for identifying reproducible targets. Considering a Bayesian hierarchical model, we show that the test statistics from replicate experiments follow a mixture of multivariate Gaussian distributions, with the zero-mean component representing the irreproducible targets.


A target is thus classified as reproducible or irreproducible based on its posterior probability of belonging to the reproducible components. We study the performance of our proposed method using simulations and a real data example.


The proposed method is shown to have favorable performance in identifying reproducible targets compared to other methods.

Keywords: Reproducibility, High-throughput experiment, Bayesian classification, Empirical Bayes, Gaussian mixture, EM algorithm


In biological research, high-throughput assays, such as microarrays, are widely used to select potential targets efficiently by studying a large number of candidates in a single experiment. However, a high-throughput assay is often subject to substantial variability. Reproducibility of high-throughput assays, such as the level of agreement across replicate samples, test sites or data analytical platforms, is a topic of concern in scientific applications, and has been discussed in [1] for microarrays and in [2] for ChIP-seq technology. Quantitative analysis of the reproducibility of high-throughput assays is therefore an important exercise for evaluating the reliability and robustness of scientific discoveries across studies.

The meaning of reproducibility is not standardized and remains unsettled across the sciences. Goodman et al. [3] survey papers that include the word reproducibility in their titles, abstracts or keywords, and conclude that its interpretation varies among papers. Goodman et al. [3] further analyze the use of the word in these papers and classify it into three terms: methods reproducibility, results reproducibility and inferential reproducibility. In [3], methods reproducibility refers to the provision of enough detail about study procedures and data so that the same procedures could, in theory or in actuality, be exactly repeated, such as [1] and [2]; results reproducibility refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible, such as [4] and [5]; inferential reproducibility refers to the drawing of qualitatively similar conclusions from either an independent replication of a study or a reanalysis of the original study, such as [1] and [2].

In this paper, our reproducibility analysis aims to identify reproducible targets with consistent and significant signals across replicate studies, which belongs to the category of inferential reproducibility as defined in [3]. Our reproducibility analysis is different from meta-analysis, such as [6] and [7]. Meta-analysis combines the data from multiple studies to gain extra power for identifying targets with signals. The identified targets may not necessarily be significant across all studies.

A few methods have been developed for this type of reproducibility analysis. Hong et al. [8] proposed a permutation-based method that estimates the empirical distribution of the rank product. Benjamini & Heller [9] developed a framework for testing the partial conjunction hypothesis that the discovery is true in at least u studies out of a total of n studies. Most recently, Li et al. [10] proposed a copula mixture model for estimating the irreproducible discovery rate across studies.

However, all existing methods have potential limitations. The permutation-based method [8] can be computationally expensive when dealing with a large number of candidates. The Benjamini & Heller method [9] aims at identifying candidates with reproduced signals in a few but not all of the studies, which is a related but generally weaker goal than ours. The special case of the Benjamini & Heller method that tests whether signals are reproduced in all studies is identical to using the largest p-value. The copula mixture method [10] builds the copula mixture using the rank transformation of the original data, which might be less powerful than modeling the original data with a proper probabilistic model as in our proposed method. A major drawback of both the Benjamini & Heller method [9] and the copula mixture method [10] is that they use the significance score of signals, such as the p-value, without taking into account the directionality of signals, and hence are prone to selecting candidates with significant scores but different directions across studies. For example, in the context of two replicate microarray studies with a treatment and a control group, consider genes that have significant p-values in both experiments, but are up-regulated in one study and down-regulated in the other. Although those genes have inconsistent signals across studies, both methods will likely classify them as reproducible based on p-values alone. In contrast, our proposed method models the test statistics directly and is expected to correctly classify those genes as irreproducible most of the time.
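To make the directionality issue concrete, the following minimal Python sketch applies the largest-p-value rule to a single hypothetical gene with strong but opposite signals in two studies. The z-values and the helper names `two_sided_p`, `max_p_reproducible` and `same_direction` are illustrative assumptions and are not taken from the software of any cited method.

```python
import math

def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def max_p_reproducible(z1, z2, alpha=0.05):
    """Largest-p-value rule: reproducible if both p-values fall below alpha."""
    return max(two_sided_p(z1), two_sided_p(z2)) < alpha

def same_direction(z1, z2):
    """True when the two test statistics share the same sign."""
    return z1 * z2 > 0

# Hypothetical gene: up-regulated in study 1, down-regulated in study 2.
z1, z2 = 3.0, -3.0
flagged = max_p_reproducible(z1, z2)   # decision based on p-values alone
consistent = same_direction(z1, z2)    # direction check on the statistics
print(flagged, consistent)
```

Because the p-values discard the sign, the rule flags this gene even though its two signals disagree in direction; a model of the signed test statistics has the information needed to separate the two cases.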

In this paper, we propose a Bayesian hierarchical model and show the test statistics from replicate studies can be approximated by a mixture of multivariate Gaussian distributions. The proposed Gaussian mixture model classifies the signals into three components: one irreproducible component and two reproducible components for consistent up-regulated and down-regulated signals respectively. The posterior probability of belonging to the reproducible components is used as a measure for reproducibility.


For simplicity, we introduce our method in the context of microarray studies, but it can be generalized to studies of other high-throughput assays. We consider $I$ replicate microarray studies for $p$ genes. In this paper, we focus on the situation of two replicate studies, $I=2$, although our method can be readily extended to the case with more than two studies. We assume each study includes two groups, e.g., the treatment and control groups, with sample size equal to $n_{ik}$ for group $k$, $k=1,2$, in the $i$-th study. Let $x_{gijk}$ be the normalized and transformed measurement of gene expression of the $j$-th sample from group $k$ for gene $g$ in the $i$-th study. The test statistic of the two-sample unpaired t-test for gene $g$ in the $i$-th study is


We present an empirical Bayesian hierarchical model to account for various sources of variability. When the sample size $n_{ik}$ is reasonably large, say $n_{i1}+n_{i2}\geq 30$, the test statistic $d_{gi}$ is well approximated by a normal distribution:

$$d_{gi} \mid \mu_{gi} \sim \mathcal{N}(\delta_{Si}\,\mu_{gi},\ 1)$$

where $\mu_{gi}$ is the expected group mean difference for gene $g$ in the $i$-th study, and $\delta_{Si} = \tilde{\sigma}_i^{-1}(1/n_{i1}+1/n_{i2})^{-1/2}$, with $\tilde{\sigma}_i$ being the common standard deviation for $\{x_{gij1}\}$, $j=1,2,\ldots,n_{i1}$, and $\{x_{gij2}\}$, $j=1,2,\ldots,n_{i2}$. When the sample size is small, the same procedure as in [11] can be applied to construct z-tests based on two-sample t-tests. For simplicity, we assume the within-group between-sample standard deviation is the same for all genes. The general case can be derived in a similar fashion, although the derivation is more tedious.
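As an illustration of the normal approximation, the sketch below computes a per-gene pooled-variance two-sample t-statistic and compares its average with the scaling factor times the true mean difference. It is a sketch under stated assumptions: the function names `two_sample_t` and `delta_s`, the sign convention (group 2 minus group 1), the use of a per-gene pooled variance, and all numerical settings are illustrative choices, not the authors' code.

```python
import numpy as np

def two_sample_t(x1, x2):
    """Pooled-variance two-sample t-statistic for each gene (row).

    x1, x2: arrays of shape (genes, n1) and (genes, n2).
    Assumed sign convention: group 2 mean minus group 1 mean.
    """
    n1, n2 = x1.shape[1], x2.shape[1]
    diff = x2.mean(axis=1) - x1.mean(axis=1)
    pooled_var = ((n1 - 1) * x1.var(axis=1, ddof=1)
                  + (n2 - 1) * x2.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return diff / np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))

def delta_s(sigma, n1, n2):
    """Scaling factor: sigma^{-1} * (1/n1 + 1/n2)^{-1/2}."""
    return 1.0 / (sigma * np.sqrt(1.0 / n1 + 1.0 / n2))

# Hypothetical setting: 2000 genes sharing one true mean difference.
rng = np.random.default_rng(0)
n1 = n2 = 20
sigma, mu = 0.5, 0.4
x1 = rng.normal(0.0, sigma, size=(2000, n1))
x2 = rng.normal(mu, sigma, size=(2000, n2))
d = two_sample_t(x1, x2)
expected_mean = delta_s(sigma, n1, n2) * mu
print(d.mean(), expected_mean, d.std())
```

With 40 samples per gene in total, the statistics should centre close to `expected_mean` with a spread close to one, in line with the approximation above.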

For the expected group mean difference $\mu_{gi}$, we assume it follows


where $\mu_g$ is the “true” group mean difference for gene $g$ across all studies and $\sigma_g^2$ models the between-study variability due to varying experimental conditions.

Furthermore, we assume $\mu_g$ is from a mixture distribution


where $\pi_i \geq 0$, $i=0,1,2$, with $\pi_0+\pi_1+\pi_2=1$, $\mu_{G1}>0$ and $\mu_{G2}<0$. The distribution has three components: the null case, where the gene is not differentially expressed; the “up-regulated” case, where the treatment stimulates the gene expression; and the “down-regulated” case, where the treatment suppresses the gene expression. Generally, for microarray studies, $\pi_0 \simeq 1$. Similar mixture models have been considered in [11–16]. In particular, we choose to model the cluster of up-regulated (or down-regulated) genes with a Gaussian distribution for computational convenience, as in [12]. Alternative choices include the semiparametric mixture model in [11, 14], a mixture of Gaussian distributions in [13, 15] and a mixture of t-distributions in [16].

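We can show that the test statistics $(d_{g1}, d_{g2})$ follow a Gaussian mixture model. The derivations are standard, repeatedly applying the law of total expectation and the law of total variance, and are thus omitted. The mixture model is

$$(d_{g1}, d_{g2})^T \sim \pi_0\,\mathcal{N}(\boldsymbol{\mu}_0, \Sigma_0) + \pi_1\,\mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1) + \pi_2\,\mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2) \qquad (4)$$

where $\mathcal{N}(\boldsymbol{\mu}_l, \Sigma_l)$ ($l=0,1,2$) is the bivariate normal distribution with mean vector $\boldsymbol{\mu}_l$ and covariance matrix $\Sigma_l$. Let $I_2$ and $J_2$ be the identity matrix and the square matrix of ones, respectively, both of order 2. This mixture model classifies the candidates into three components: $\mathcal{N}(\boldsymbol{\mu}_0, \Sigma_0)$ is the irreproducible component with zero mean $\boldsymbol{\mu}_0 = (0, 0)^T$ and covariance structure $\Sigma_0 = (\sigma_g^2+1) I_2$; $\mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1)$ and $\mathcal{N}(\boldsymbol{\mu}_2, \Sigma_2)$ are two reproducible components, with $\boldsymbol{\mu}_1 = (\delta_{S1}\mu_{G1}, \delta_{S2}\mu_{G1}) > 0$ and $\Sigma_1 = (\sigma_g^2+1) I_2 + \sigma_{G1}^2 J_2$ representing the up-regulated genes, and $\boldsymbol{\mu}_2 = (\delta_{S1}\mu_{G2}, \delta_{S2}\mu_{G2}) < 0$ and $\Sigma_2 = (\sigma_g^2+1) I_2 + \sigma_{G2}^2 J_2$ representing the down-regulated genes, where the inequalities are to be interpreted component-wise.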

Note that with increased sample sizes or decreased within-group between-sample variability, the means $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ of the reproducible components move further away from the origin, making the three components more separable. Also note that the test statistics from replicate studies have zero correlation in the irreproducible component; in the reproducible components, the correlations become larger when the between-study variability becomes smaller; and for all components, the variance is smaller with less between-study variability, resulting in more separable components.
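The structure of the three components can be sketched numerically as follows. The function names, argument names and parameter values are hypothetical, and the covariance matrices are read as a diagonal term (between-study variance plus one) times the identity, plus the component variance times the matrix of ones for the reproducible components.

```python
import numpy as np

def mixture_components(delta_s1, delta_s2, mu_up, mu_down,
                       between_study_var, up_var, down_var):
    """Mean vectors and covariance matrices of the three components."""
    identity, ones = np.eye(2), np.ones((2, 2))
    scale = np.array([delta_s1, delta_s2])
    means = [np.zeros(2), scale * mu_up, scale * mu_down]
    diag = (between_study_var + 1.0) * identity
    covs = [diag, diag + up_var * ones, diag + down_var * ones]
    return means, covs

def correlation(cov):
    """Correlation implied by a 2x2 covariance matrix."""
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

# Hypothetical parameter values, for illustration only.
means, covs = mixture_components(2.0, 2.5, 1.0, -1.0, 1.0, 1.0, 1.0)
_, covs_low = mixture_components(2.0, 2.5, 1.0, -1.0, 0.25, 1.0, 1.0)
print(correlation(covs[0]))      # irreproducible component: zero correlation
print(correlation(covs[1]))      # reproducible, larger between-study variance
print(correlation(covs_low[1]))  # reproducible, smaller between-study variance
```

Under this reading, lowering the between-study variance from 1 to 0.25 raises the correlation in the up-regulated component from 1/3 to about 0.44, while the irreproducible component stays uncorrelated.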

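Under the Gaussian mixture model, the posterior probability of $(d_{g1}, d_{g2})$ belonging to component $l$ is

$$p_{gl} = \frac{\pi_l\,\phi(d_{g1}, d_{g2} \mid \boldsymbol{\mu}_l, \Sigma_l)}{\sum_{k=0}^{2} \pi_k\,\phi(d_{g1}, d_{g2} \mid \boldsymbol{\mu}_k, \Sigma_k)}, \quad l=0,1,2, \qquad (5)$$

where $\phi(\cdot \mid \cdot)$ is the density function of the bivariate normal distribution. According to [10], the posterior probability of being in the irreproducible (null) component, $p_{g0}$, can be used as an individual significance score, namely the local false discovery rate. When $p_{g0}$ is less than a significance level $\alpha$, gene $g$ is classified as reproducible.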
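A minimal sketch of this classification rule is given below, assuming the mixture parameters are already known. The helper names `bvn_logpdf` and `posterior`, the parameter values and the three example genes are hypothetical; the densities are evaluated on the log scale for numerical stability.

```python
import numpy as np

def bvn_logpdf(d, mean, cov):
    """Log density of a bivariate normal at each row of d (shape (G, 2))."""
    diff = d - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + 2.0 * np.log(2.0 * np.pi))

def posterior(d, weights, means, covs):
    """Posterior probability of each component for each row of d."""
    logp = np.stack([np.log(w) + bvn_logpdf(d, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical parameters: null, up-regulated and down-regulated components.
weights = [0.9, 0.05, 0.05]
means = [np.zeros(2), np.array([3.0, 3.0]), np.array([-3.0, -3.0])]
null_cov = 1.5 * np.eye(2)
covs = [null_cov, null_cov + np.ones((2, 2)), null_cov + np.ones((2, 2))]

# Three example genes: no signal, consistent signal, opposite signals.
d = np.array([[0.0, 0.0], [3.5, 3.0], [3.0, -3.0]])
post = posterior(d, weights, means, covs)
alpha = 0.05
reproducible = post[:, 0] < alpha
print(post[:, 0], reproducible)
```

In this example only the gene with consistent signals has a null posterior probability below 0.05. The gene with large statistics of opposite signs stays in the irreproducible component, because both reproducible components place their mass along the diagonal.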

Next, we consider estimation of the unknown parameters

$$\theta = (\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \Sigma_0, \Sigma_1, \Sigma_2, \pi_0, \pi_1, \pi_2) \qquad (6)$$

in the mixture model (4) to obtain the estimate of $p_{g0}$ for individual genes. It is natural to use the expectation-maximization (EM) algorithm to estimate $\theta$ by maximizing the log-likelihood of the data [17], i.e.,

$$\ell(\theta) = \sum_{g=1}^{p} \log\left\{\sum_{l=0}^{2} \pi_l\,\phi(d_{g1}, d_{g2} \mid \boldsymbol{\mu}_l, \Sigma_l)\right\} \qquad (7)$$

In our algorithm, we start with some initial values $\theta^0$ for the parameters, then iterate between two steps: (1) evaluate the current posterior probabilities $p_{gl}$ using the current parameters; (2) update the parameters by maximizing the likelihood given the current posterior probabilities. The details of the EM procedure are provided in the Appendix. Multiple random initial values are used to avoid being trapped at a local maximum.

Simulation studies

In this section, we present numerical simulations to illustrate the performance of our proposed method compared to three existing methods, the copula mixture model [10], Benjamini & Heller method [9], and the rank product method [8]. We use the following model to simulate data


From this model, the mean expression level of gene $g$ for group 1 of study $i$ is modeled as $\mu_{gi1} = \mu + \alpha_g + \beta_i + (\alpha\beta)_{gi}$, where $\mu$ is the overall mean; $\alpha_g$ is the main effect of gene $g$; $\beta_i$ is the main effect of study $i$; and $(\alpha\beta)_{gi}$ is the gene-study interaction. We set $\mu=0$, $\alpha_g \sim \mathcal{N}(0, 1)$, $\beta_i=0.1$, and $(\alpha\beta)_{gi} \sim \mathcal{N}(0, 0.5^2)$. For non-differentially expressed genes, the mean expression levels of the two groups are the same, i.e., $\mu_{gi1} = \mu_{gi2}$. For differentially expressed genes, (8) models the difference between the two comparison groups as $\mu_{gi2} - \mu_{gi1} = \delta + \gamma_g + (\gamma\beta)_{gi}$, where $\delta$ is the fixed effect of the group difference; $\gamma_g$ is the effect of gene $g$ on the group difference; and $(\gamma\beta)_{gi}$ is the gene-study interaction of the group difference. We set $\delta=0$, generate $\gamma_g$ from $\mathcal{N}(2, 0.5^2)$ or $\mathcal{N}(-2, 0.5^2)$ to mimic the two possible directions of signals, and set $(\gamma\beta)_{gi} \sim \mathcal{N}(0, 0.5^2)$. Finally, $\varepsilon_{gijk}$ is the random error term, which follows the distribution $\mathcal{N}(0, 0.5^2)$.
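The data-generating process described above can be sketched as follows. The function `simulate` and its arguments are hypothetical, and two details not stated in the text are assumptions: the direction of each differentially expressed gene is drawn with equal probability, and the first genes in the array are taken to be the differentially expressed ones.

```python
import numpy as np

def simulate(n_genes=5000, n_per_group=10, prop=0.1, n_studies=2, seed=0):
    """Simulate expression data; returns x[study, group, gene, sample]."""
    rng = np.random.default_rng(seed)
    is_de = np.zeros(n_genes, dtype=bool)
    is_de[:int(round(prop * n_genes))] = True       # assumed ordering of genes
    mu, beta, delta = 0.0, 0.1, 0.0                 # overall mean, study effect
    alpha = rng.normal(0.0, 1.0, size=n_genes)      # main effect of each gene
    sign = rng.choice([-1.0, 1.0], size=n_genes)    # assumed 50/50 directions
    gamma = rng.normal(2.0 * sign, 0.5)             # gene effect on difference
    x = np.empty((n_studies, 2, n_genes, n_per_group))
    for i in range(n_studies):
        gene_study = rng.normal(0.0, 0.5, size=n_genes)   # (alpha beta)_gi
        diff_study = rng.normal(0.0, 0.5, size=n_genes)   # (gamma beta)_gi
        base = mu + alpha + beta + gene_study
        diff = np.where(is_de, delta + gamma + diff_study, 0.0)
        for k, mean in enumerate((base, base + diff)):
            noise = rng.normal(0.0, 0.5, size=(n_genes, n_per_group))
            x[i, k] = mean[:, None] + noise
    return x, is_de

x, is_de = simulate(n_genes=1000, prop=0.1, seed=1)
observed_diff = x[0, 1].mean(axis=1) - x[0, 0].mean(axis=1)
print(np.abs(observed_diff[is_de]).mean(), np.abs(observed_diff[~is_de]).mean())
```

The gene effect on the group difference is shared across studies, while the interaction terms are redrawn per study, so a differentially expressed gene keeps its direction across studies up to between-study noise.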

For each simulation run, we generate 2 studies. Each study has two groups with 10 samples per group. We generate G=5000 genes per sample and choose the proportion of reproducible genes (γ) from (80%, 60%, 40%, 20%, 10%, 5%, 1%). We apply the proposed method and the three existing methods to the simulated data, and classify the genes as reproducible based on two commonly used significance levels (α), 0.05 and 0.1. The performance of the four compared methods is evaluated by three criteria: sensitivity, specificity and misclassification rate. Results from 50 simulations are summarized in Tables 1, 2 and 3. The results show that our proposed method performs the best among the four methods, with the smallest misclassification rates (Table 1), the highest sensitivities (Table 2) and the highest specificities (Table 3).
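For reference, the three evaluation criteria can be computed from the true labels and the calls of a method as in the sketch below. The function name `classification_metrics` and the toy labels are illustrative assumptions.

```python
import numpy as np

def classification_metrics(truth, called):
    """Sensitivity, specificity and misclassification rate.

    truth, called: boolean sequences, True meaning "reproducible".
    """
    truth = np.asarray(truth, dtype=bool)
    called = np.asarray(called, dtype=bool)
    true_pos = np.sum(truth & called)
    true_neg = np.sum(~truth & ~called)
    sensitivity = true_pos / truth.sum()        # share of true signals found
    specificity = true_neg / (~truth).sum()     # share of nulls left uncalled
    misclassification = np.mean(truth != called)
    return float(sensitivity), float(specificity), float(misclassification)

# Toy example with five genes.
truth = [True, True, False, False, False]
called = [True, False, False, False, True]
sens, spec, miss = classification_metrics(truth, called)
print(sens, spec, miss)
```

In the toy example one of the two reproducible genes is found, two of the three irreproducible genes are left uncalled, and two of the five genes are misclassified.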

Table 1
The summary of misclassification rates for the four compared methods under different significant levels (α) and proportions of reproducible genes (γ)
Table 2
The summary of sensitivities for the four compared methods under different significant levels (α) and proportions of reproducible genes (γ)
Table 3
The summary of specificities for the four compared methods under different significant levels (α) and proportions of reproducible genes (γ)


In this section, we illustrate our proposed method using a real example. This example includes two microarray studies, [18] and [19], comparing idiopathic pulmonary fibrosis (IPF) samples with healthy control samples. Data from both studies are obtained from the Gene Expression Omnibus [20]. GSE 28042 [18] measures profiles of peripheral blood mononuclear cells (PBMC) for 75 IPF samples and 16 control samples using GeneChip Human 1.0 exon ST arrays, and GSE 33566 [19] measures profiles of peripheral blood RNA for 93 IPF patients and 30 control samples using Agilent Whole Human Genome Oligonucleotide Microarrays. We only consider the 17,708 genes common to both studies for the reproducibility analysis.

We apply our proposed method, the copula mixture model [10] and the Benjamini & Heller method [9]. The rank product method [8] is too computationally intensive to be applied to this example and is thus excluded. Figures 1, 2 and 3 show the reproducible genes selected by the three compared methods, respectively. In all three figures, the x axis represents the test statistics from GSE 28042 [18], and the y axis represents the test statistics from GSE 33566 [19]. The top 500 reproducible genes selected by each method are highlighted in green. As shown in Fig. 1, our proposed method only selects genes with consistently significant signals in both studies. The Benjamini & Heller method [9] incorrectly identifies 23 genes (the upper left and bottom right corners of Fig. 2) as reproducible, although they have opposite directions in the two studies. The complete list of these 23 genes is provided in Table 4. The copula mixture model [10] selects 7 genes (Table 5) with opposite directions of signals. The copula mixture model [10] also appears to be less powerful in separating the irreproducible and reproducible genes and incorrectly selects some insignificant genes (see the center of Fig. 3), likely as a result of the rank transformation. Overall, our method performs favorably in identifying reproducible genes.

Fig. 1
Bivariate plot of test statistics from two studies. The x axis represents the test statistics from GSE 28042 study [18], and the y axis represents the test statistics from GSE 33566 [19]. The green points are the top 500 reproducible genes selected by ...
Fig. 2
Bivariate plot of test statistics from two studies. The x axis represents the test statistics from GSE 28042 study [18], and the y axis represents the test statistics from GSE 33566 [19]. The green points are the top 500 reproducible genes selected by ...
Fig. 3
Bivariate plot of test statistics from two studies. The x axis represents the t-statistics from GSE 28042 study [18], and the y axis represents t-statistics from GSE 33566 [19]. The green points are the top 500 reproducible genes selected by Benjamini ...
Table 4
The list of 23 selected genes, which are in the list of the top 500 reproducible genes selected by Benjamini & Heller method [9], but have opposite signs of signals in two studies
Table 5
The list of 7 selected genes, which are in the list of the top 500 reproducible genes selected by the copula mixture model [10], but have opposite signs of signals in two studies

Conclusion and discussion

This paper proposes a new method for identifying consistent and significant signals across replicate high-throughput experiments. Existing methods ignore the directionality of signals, and can incorrectly identify signals with opposite directions as reproducible. Our proposed method considers both the significance scores and the directions of signals by modeling the test statistics directly, leading to improved performance in selecting reproducible candidates. When the proposed method is applied to a real data example for identifying reproducible genes in studies of idiopathic pulmonary fibrosis samples, it shows better performance in detecting significant and reproducible genes than the other methods. Simulations also demonstrate that our method compares favorably to the existing methods.


Expectation-maximization (EM) algorithm to estimate model parameters

The algorithm for estimating $\theta$ in (6) iterates between an expectation step and a maximization step. We use $\hat{\theta}^{v}$ to denote the estimate at the $v$-th iteration. The algorithm includes the following steps:

  • Step 1: Initial Values. Generate the initial values for $\theta$ and denote them as $\hat{\theta}^{0}$.
  • Step 2: Expectation-Step. Continue from the $v$-th iteration with the estimate $\hat{\theta}^{v}$. We can obtain the estimated posterior probability $\hat{p}_{gl}^{v}$ of $(d_{g1}, d_{g2})$ from (5) by
    $$\hat{p}_{gl}^{v} = \frac{\hat{\pi}_l^{v}\,\phi(d_{g1}, d_{g2} \mid \hat{\boldsymbol{\mu}}_l^{v}, \hat{\Sigma}_l^{v})}{\sum_{k=0}^{2} \hat{\pi}_k^{v}\,\phi(d_{g1}, d_{g2} \mid \hat{\boldsymbol{\mu}}_k^{v}, \hat{\Sigma}_k^{v})}$$
  • Step 3: Maximization-Step. Update the parameter $\hat{\theta}^{v+1}$ by maximizing the log-likelihood function $\ell(\theta)$ in (7) given the current estimated posterior probability $\hat{p}_{gl}^{v}$. The estimated parameters from the maximization are
  • Step 4: Solution. The algorithm alternates between the Expectation-Step and the Maximization-Step until the following two conditions are satisfied.
    1. The difference between $\hat{\theta}^{v}$ and $\hat{\theta}^{v+1}$ is less than a small value $\delta_1$ for all their elements;
    2. The change in the log-likelihood function $\ell(\theta)$ between two consecutive iterations does not exceed a small value $\delta_2$.
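The steps above can be sketched in Python as follows. This is an illustrative sketch and not the authors' implementation: the zero mean of the irreproducible component is kept fixed, but the covariance matrices are updated without the structural constraints of the model, a single set of initial values is used instead of multiple random starts, and all function names, tolerances and test data are hypothetical.

```python
import numpy as np

def log_bvn(d, mean, cov):
    """Log density of a bivariate normal at each row of d."""
    diff = d - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (maha + np.linalg.slogdet(cov)[1] + 2.0 * np.log(2.0 * np.pi))

def e_step(d, pi, means, covs):
    """Posterior probabilities of the components and the log-likelihood."""
    logp = np.stack([np.log(pi[l]) + log_bvn(d, means[l], covs[l])
                     for l in range(3)], axis=1)
    top = logp.max(axis=1, keepdims=True)
    w = np.exp(logp - top)
    loglik = float(np.sum(top[:, 0] + np.log(w.sum(axis=1))))
    return w / w.sum(axis=1, keepdims=True), loglik

def m_step(d, post):
    """Weighted updates; component 0 keeps a zero mean (assumed constraint)."""
    pi = post.mean(axis=0)
    means, covs = [], []
    for l in range(3):
        w = post[:, l]
        mu = np.zeros(2) if l == 0 else (w[:, None] * d).sum(axis=0) / w.sum()
        diff = d - mu
        covs.append((w[:, None] * diff).T @ diff / w.sum() + 1e-6 * np.eye(2))
        means.append(mu)
    return pi, means, covs

def fit_em(d, pi, means, covs, delta1=1e-6, delta2=1e-8, max_iter=500):
    """Iterate E- and M-steps until both stopping conditions hold."""
    pi = np.asarray(pi, dtype=float)
    means = [np.asarray(m, dtype=float) for m in means]
    covs = [np.asarray(c, dtype=float) for c in covs]
    old_loglik = -np.inf
    for _ in range(max_iter):
        post, loglik = e_step(d, pi, means, covs)
        new_pi, new_means, new_covs = m_step(d, post)
        change = max(np.max(np.abs(new_pi - pi)),
                     max(np.max(np.abs(a - b)) for a, b in zip(new_means, means)),
                     max(np.max(np.abs(a - b)) for a, b in zip(new_covs, covs)))
        pi, means, covs = new_pi, new_means, new_covs
        if change < delta1 and abs(loglik - old_loglik) < delta2:
            break
        old_loglik = loglik
    post, loglik = e_step(d, pi, means, covs)
    return pi, means, covs, post, loglik

# Hypothetical test data: 80% null, 10% up-regulated, 10% down-regulated.
rng = np.random.default_rng(1)
rep_cov = np.eye(2) + 0.5 * np.ones((2, 2))
d = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), size=2400),
               rng.multivariate_normal([4, 4], rep_cov, size=300),
               rng.multivariate_normal([-4, -4], rep_cov, size=300)])
pi_hat, means_hat, covs_hat, post, loglik = fit_em(
    d, [1 / 3, 1 / 3, 1 / 3], [(0, 0), (2, 2), (-2, -2)], [np.eye(2)] * 3)
print(pi_hat, means_hat[1], means_hat[2])
```

The stopping rule mirrors the two conditions in Step 4. In practice the fit would be repeated from several random initial values, keeping the solution with the largest log-likelihood.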


We would like to thank the referees for their time reviewing this manuscript.

Availability of data and materials

All data are from Gene Expression Omnibus [20].

Authors’ contributions

All authors contributed equally. All authors read and approved the final manuscript.


Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


1. Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, De Longueville F, Kawasaki ES, Lee KY, et al. The microarray quality control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol. 2006;24(9):1151–61. doi: 10.1038/nbt1239.
2. Park PJ. ChIP–seq: advantages and challenges of a maturing technology. Nat Rev Genet. 2009;10(10):669–80. doi: 10.1038/nrg2641.
3. Goodman SN, Fanelli D, Ioannidis JP. What does research reproducibility mean? Sci Transl Med. 2016;8(341):341ps12. doi: 10.1126/scitranslmed.aaf5027.
4. Darbani B, Stewart CN. Reproducibility and reliability assays of the gene expression-measurements. J Biol Res (Thessaloniki). 2014;21(1):3. doi: 10.1186/2241-5793-21-3.
5. Parmigiani G, Garrett-Mayer ES, Anbazhagan R, Gabrielson E. A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clin Cancer Res. 2004;10(9):2922–7. doi: 10.1158/1078-0432.CCR-03-0490.
6. Choi H, Shen R, Chinnaiyan AM, Ghosh D. A latent variable approach for meta-analysis of gene expression data from multiple microarray experiments. BMC Bioinforma. 2007;8(1):364. doi: 10.1186/1471-2105-8-364.
7. Parmigiani G, Garrett ES, Anbazhagan R, Gabrielson E. A statistical framework for expression-based molecular classification in cancer. J R Stat Soc Ser B Stat Methodol. 2002;64(4):717–36. doi: 10.1111/1467-9868.00358.
8. Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, Chory J. RankProd: a Bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics. 2006;22(22):2825–7. doi: 10.1093/bioinformatics/btl476.
9. Benjamini Y, Heller R, Yekutieli D. Selective inference in complex research. Philos Trans R Soc Lond A Math Phys Eng Sci. 2009;367(1906):4255–71. doi: 10.1098/rsta.2009.0127.
10. Li Q, Brown JB, Huang H, Bickel PJ. Measuring reproducibility of high-throughput experiments. Ann Appl Stat. 2011;5:1752–79. doi: 10.1214/11-AOAS466.
11. Efron B. Microarrays, empirical Bayes and the two-groups model. Stat Sci. 2008;23(1):1–22. doi: 10.1214/07-STS236.
12. Chen MH, Ibrahim JG, Chi YY. A new class of mixture models for differential gene expression in DNA microarray data. J Stat Plan Infer. 2008;138(2):387–404. doi: 10.1016/j.jspi.2007.06.007.
13. Najarian K, Zaheri M, Rad AA, Najarian S, Dargahi J. A novel mixture model method for identification of differentially expressed genes from DNA microarray data. BMC Bioinforma. 2004;5(201):201–10. doi: 10.1186/1471-2105-5-201.
14. Newton MA. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics. 2004;5(2):155–76. doi: 10.1093/biostatistics/5.2.155.
15. Pan W, Lin J, Le CT. A mixture model approach to detecting differentially expressed genes with microarray data. Funct Integr Genomics. 2003;3:117–24. doi: 10.1007/s10142-003-0085-7.
16. McLachlan GJ, Bean RW, Peel D. A mixture model-based approach to the clustering of microarray expression data. Bioinformatics. 2002;18(3):413–22. doi: 10.1093/bioinformatics/18.3.413.
17. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B Methodol. 1977;39(1):1–38.
18. Herazo-Maya JD, Noth I, Duncan SR, Kim S, Ma SF, Tseng GC, Feingold E, Juan-Guardela BM, Richards TJ, Lussier Y, et al. Peripheral blood mononuclear cell gene expression profiles predict poor outcome in idiopathic pulmonary fibrosis. Sci Transl Med. 2013;5(205):205ra136. doi: 10.1126/scitranslmed.3005964.
19. Yang IV, Luna LG, Cotter J, Talbert J, Leach SM, Kidd R, Turner J, Kummer N, Kervitsky D, Brown KK, et al. The peripheral blood transcriptome identifies the presence and extent of disease in idiopathic pulmonary fibrosis. PLoS One. 2012;7(6):e37708. doi: 10.1371/journal.pone.0037708.
20. Gene Expression Omnibus.
