This section describes the inputs expected by EPEPT and the logs and outputs it generates. It also explains how jobs can be submitted and how the results can be obtained, using either the web service or the website.
• Setting of EPEPT [required, default = PV]
EPEPT can be run in three different settings. In the first setting ('PV'), EPEPT expects the user to upload permutation values. In the second and third settings ('SAM' and 'GSEA'), EPEPT expects a gene expression dataset to be uploaded.
• File with test statistics and permutation values or a labeled gene expression dataset [required]
The file should be a tab delimited text file, a comma separated text file or an Excel file. EPEPT checks the extension of the file to decide upon its format: Excel files should have the .xls or .xlsx extension and the data should be on the first sheet. Comma separated files should have the extension .csv. All files with other extensions are assumed to be tab delimited text files.
In the 'PV' setting, each column in the file should contain one test statistic and its corresponding permutation values. Because multiple columns are allowed, different events (e.g. different genes or gene sets) can be tested simultaneously, yet independently. The file may have one header row. If a header row is present, the test statistics should be on the second row; otherwise, they should be on the first row. All numerical values in the rows below the test statistic are assumed to be permutation values. Non-numerical values, NaNs (not a number) and Infs (infinite values) are ignored. At least 1,000 permutation values per column are required for the tail estimation procedure to be used.
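The extension rule and the 'PV' column layout can be illustrated with a short Python sketch. It is not EPEPT's own code: the function names are hypothetical, the case-insensitive extension check is an assumption, and whether a header row is present is passed in explicitly because the text does not say how EPEPT detects one. Reading the Excel or CSV contents is left to the caller, which supplies the table as rows of strings.

```python
import math
from pathlib import Path
from typing import List, Optional


def detect_format(filename: str) -> str:
    """Classify an input file by its extension, following the rule described above."""
    ext = Path(filename).suffix.lower()  # case-insensitivity is an assumption
    if ext in (".xls", ".xlsx"):
        return "excel"
    if ext == ".csv":
        return "csv"
    return "tsv"  # every other extension is treated as tab-delimited


def to_finite_number(cell: str) -> Optional[float]:
    """Return the cell as a float, or None for non-numerical, NaN and Inf cells."""
    try:
        value = float(cell)
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def parse_pv_columns(rows: List[List[str]], has_header: bool, min_perms: int = 1000):
    """Split a 'PV' table into one record per column (hypothetical helper)."""
    names = rows[0] if has_header else None
    body = rows[1:] if has_header else rows
    stat_row, perm_rows = body[0], body[1:]
    columns = []
    for j, cell in enumerate(stat_row):
        # every usable numerical value below the test statistic is a permutation value
        perms = []
        for row in perm_rows:
            if j < len(row):
                value = to_finite_number(row[j])
                if value is not None:
                    perms.append(value)
        columns.append({
            "name": names[j] if names is not None and j < len(names) else f"column{j + 1}",
            "statistic": to_finite_number(cell),
            "permutations": perms,
            # tail estimation needs at least min_perms permutation values
            "tail_estimation_allowed": len(perms) >= min_perms,
        })
    return columns
```

Each returned record then corresponds to one independently tested event.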
In the 'SAM' and 'GSEA' settings, each column should contain the expression levels of all genes in the dataset for one sample. The first row should contain the class labels or other response values assigned to the columns. The possible configurations of the first row should match the 'resp.type' options of the samr package (http://cran.r-project.org/web/packages/samr/index.html; see also the Response Type parameter below). The first column can be used as a header column for the gene names.
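A corresponding sketch for the 'SAM' and 'GSEA' settings splits a table (rows of strings) into the response row, the gene names and the expression matrix. The function name is hypothetical, and we assume that when a gene-name column is present its cell in the first row is a placeholder. The response values are kept as strings because their meaning depends on the chosen response type (for example, samr codes two-class data as 1 and 2).

```python
from typing import List, Tuple


def parse_expression_table(rows: List[List[str]], has_gene_column: bool = True
                           ) -> Tuple[List[str], List[str], List[List[float]]]:
    """Split a labeled expression table into responses, gene names and values."""
    offset = 1 if has_gene_column else 0
    responses = rows[0][offset:]  # one response value (e.g. class label) per sample column
    genes, matrix = [], []
    for i, row in enumerate(rows[1:], start=1):
        genes.append(row[0] if has_gene_column else f"gene{i}")
        values = [float(x) for x in row[offset:]]
        if len(values) != len(responses):
            raise ValueError(f"row {i} has {len(values)} values, expected {len(responses)}")
        matrix.append(values)
    return responses, genes, matrix
```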
• Estimation method [optional, default = PWM]
Three methods are available to estimate the parameters of the generalized Pareto distribution (GPD), which models the tail of the distribution of the permutation values: probability weighted moments (PWM), maximum likelihood (ML) and the method of moments (MOM). In our tests on theoretical distributions and practical applications, all three methods performed comparably. Several studies have compared these estimators, often favoring ML [7
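For readers unfamiliar with these estimators, the sketch below implements the PWM and MOM estimators of the generalized Pareto distribution as given by Hosking and Wallis (1987), using the parameterization F(y) = 1 − (1 − ky/σ)^(1/k) for the excesses y over a threshold. This is an illustration, not EPEPT's implementation: EPEPT's sign convention for the shape parameter may differ (several packages use ξ = −k), and ML is omitted because it requires a numerical optimizer. The sampler is included only to check the estimators on synthetic data.

```python
import random
from typing import List, Tuple


def gpd_pwm(excesses: List[float]) -> Tuple[float, float]:
    """PWM estimates (k, sigma) for F(y) = 1 - (1 - k*y/sigma)**(1/k)."""
    y = sorted(excesses)
    n = len(y)
    a0 = sum(y) / n
    # unbiased estimate of E[Y * (1 - F(Y))] from the ascending order statistics
    a1 = sum((n - j) / (n - 1) * y[j - 1] for j in range(1, n + 1)) / n
    k = a0 / (a0 - 2 * a1) - 2
    sigma = 2 * a0 * a1 / (a0 - 2 * a1)
    return k, sigma


def gpd_mom(excesses: List[float]) -> Tuple[float, float]:
    """Method-of-moments estimates (k, sigma); requires finite variance (k > -1/2)."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((y - mean) ** 2 for y in excesses) / (n - 1)
    ratio = mean * mean / var  # equals 1 + 2k for the GPD
    k = (ratio - 1) / 2
    sigma = mean * (ratio + 1) / 2
    return k, sigma


def sample_gpd(k: float, sigma: float, n: int, seed: int) -> List[float]:
    """Draw n GPD excesses by inverse transform (k must be non-zero)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so U**k is always defined
    return [sigma * (1 - (1 - rng.random()) ** k) / k for _ in range(n)]
```

For example, `gpd_pwm(sample_gpd(0.2, 1.0, 50000, seed=1))` should return estimates close to (0.2, 1.0). Both estimators are closed-form, which is one reason moment-based methods are attractive when many columns must be fitted.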
• Confidence interval [optional, default = 95]
The confidence interval of the estimated P-value indicates the reliability of the estimate. The interval depends on the chosen confidence level (default 95%). Loosely speaking, the confidence level indicates how sure (e.g. 95% sure) we can be that the actual P-value lies within the confidence interval. The level can be set between 10 and 99.
• Confidence interval flag [optional, default = true]
A flag determining whether the confidence interval should be computed.
• Optimal order preserving transformation flag [optional, default = false]
A flag determining whether the optimal order preserving transformation should be applied.
• Convergence criteria flag [optional, default = false]
A flag determining whether the convergence criteria should be applied.
• Random seed [optional, default = 0]
If a numerical value between 1 and 1,000,000 is given, it is used as the random seed, allowing the user to reproduce EPEPT runs. When the default value 0 is selected, the random seed is chosen arbitrarily.
• Email [optional, default = empty]
If an email address is given, an email is sent to it when the EPEPT run has completed. This email contains links to the results and logs.
• Response Type [optional, default = Two class unpaired]
When EPEPT is used to generate permutation values in the 'SAM' or 'GSEA' setting, the user can choose the response type.
• Number of permutations [optional, default = 1000]
When EPEPT is used to generate permutation values in the 'SAM' or 'GSEA' setting, the user can choose the number of permutations to be performed. In the 'SAM' setting the maximum is 1,000. (SAM evaluates the P-value of one gene using the permutation values of all genes, effectively multiplying the number of permutations used by the number of genes.) In the 'GSEA' setting the maximum is 10,000.
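The ranges stated for the confidence level, the random seed and the number of permutations can be collected into a small validation sketch. The option names mirror the descriptions in this section but are hypothetical (the actual variable names are documented on the EPEPT website), and the lower bound of one permutation is our assumption. Seed 0 is mapped to an arbitrarily seeded generator, as described for the random seed option.

```python
import random
from typing import Dict

SETTINGS = ("PV", "SAM", "GSEA")
MAX_PERMUTATIONS = {"SAM": 1000, "GSEA": 10000}


def validate_options(setting: str = "PV", confidence_level: float = 95,
                     seed: int = 0, n_permutations: int = 1000) -> Dict[str, object]:
    """Check option values against the documented ranges (hypothetical names)."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting {setting!r}")
    if not 10 <= confidence_level <= 99:
        raise ValueError("confidence level must lie between 10 and 99")
    if not (seed == 0 or 1 <= seed <= 1_000_000):
        raise ValueError("seed must be 0 (arbitrary) or between 1 and 1,000,000")
    if setting in MAX_PERMUTATIONS:
        limit = MAX_PERMUTATIONS[setting]
        if not 1 <= n_permutations <= limit:  # lower bound of 1 is an assumption
            raise ValueError(f"{setting} allows at most {limit} permutations")
    return {"setting": setting, "confidence_level": confidence_level,
            "seed": seed, "n_permutations": n_permutations}


def make_rng(seed: int) -> random.Random:
    """Seed 0 means 'choose arbitrarily'; any other seed gives reproducible draws."""
    return random.Random(None if seed == 0 else seed)
```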
• Gene set file [required in 'GSEA' setting]
When EPEPT is used to generate permutation values in the 'GSEA' setting, a file with gene set annotations in gene matrix transposed (.gmt) format has to be given. Such a tab delimited text file contains one gene set per row. The first two columns contain the gene set ID and description. The following columns contain the genes for that particular gene set. The annotation of these genes should match the gene annotation in the header column of the gene expression data file.
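A .gmt file of this form can be read with a few lines of Python. This is a sketch with hypothetical function names; it skips blank lines and empty trailing fields. The second function illustrates the requirement that gene names match the header column of the expression file by keeping only genes present in the dataset; how EPEPT itself handles unmatched genes is not stated here.

```python
from typing import Dict, Iterable, List


def parse_gmt(lines: Iterable[str]) -> Dict[str, dict]:
    """Read gene sets from .gmt lines: ID, description, then one gene per field."""
    gene_sets = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {number}: expected an ID and a description")
        genes = [g for g in fields[2:] if g]  # drop empty trailing fields
        gene_sets[fields[0]] = {"description": fields[1], "genes": genes}
    return gene_sets


def restrict_to_dataset(gene_sets: Dict[str, dict],
                        dataset_genes: Iterable[str]) -> Dict[str, List[str]]:
    """Keep only the genes whose names appear in the expression file's gene column."""
    known = set(dataset_genes)
    return {sid: [g for g in gs["genes"] if g in known] for sid, gs in gene_sets.items()}
```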
• GSEA statistic [optional, default = maxmean]
The main output of EPEPT is the set of estimated P-values, which are reported in a tab delimited text file. If headers were provided in the original file, the output file contains the same headers. If confidence intervals were requested, the two rows below the row with the P-value estimates contain the lower and upper bounds of the confidence intervals. Finally, if the convergence criteria were applied, another row is added with binary values indicating whether the estimate converged (1) or not (0).
Besides this text file, two picture files (a .png and an .eps file) are generated that visually depict the estimated P-values and their confidence bounds.
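The row layout of the output file can be parsed as in the following sketch. Because the file does not announce which optional rows it contains, the caller passes the same flags that were used for the run. The function name is hypothetical, and we assume the values are plain tab-separated numbers without a leading label column.

```python
from typing import List


def parse_epept_output(text: str, has_header: bool, has_ci: bool,
                       has_convergence: bool) -> List[dict]:
    """Turn the rows of an output file into one record per tested column."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    if has_header:
        names = rows.pop(0)
    else:
        names = [f"column{j + 1}" for j in range(len(rows[0]))]
    # the first remaining row holds the P-value estimates
    records = [{"name": n, "p": float(v)} for n, v in zip(names, rows.pop(0))]
    if has_ci:
        # the next two rows hold the lower and upper confidence bounds
        lower, upper = rows.pop(0), rows.pop(0)
        for rec, lo, hi in zip(records, lower, upper):
            rec["lower"], rec["upper"] = float(lo), float(hi)
    if has_convergence:
        # a final row of 1/0 flags marks whether each estimate converged
        for rec, flag in zip(records, rows.pop(0)):
            rec["converged"] = float(flag) == 1.0
    return records
```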
EPEPT is web service enabled, which means that it can be accessed programmatically from any programming language with HTTP support, such as C, Java, MATLAB, Perl, Ruby and R. The programmatic flow for making a request to the EPEPT processing host is as follows:
1. The user (i.e. web service client) initializes the set of input parameters and sets them to the user-defined values.
2. The client makes a POST request with the input parameters to the EPEPT host. A unique URI is returned.
3. The client checks the status of the submitted request using the unique URI. The status can be: RUNNING, COMPLETED or ERROR. The client program will loop until the status is COMPLETED or ERROR.
4. The client retrieves the output and/or log files from the host and stores these locally.
In summary, once the request has been made, everything on the client side revolves around the assigned URI. The inputs, logs and outputs are accessible via uri/inputs/, uri/logs/ and uri/outputs/, respectively, where uri is the URI assigned to the user by EPEPT.
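Steps 3 and 4 of this flow can be sketched in Python as follows. The sketch assumes that fetching the assigned URI returns the status word as plain text; the actual response format, and the parameter names needed for the POST in step 2, are documented on the EPEPT website and are not reproduced here, so submission is omitted. The status fetcher is passed in as a function so that the polling loop can be exercised without a network connection.

```python
import time
import urllib.request
from typing import Callable, Dict

TERMINAL_STATES = {"COMPLETED", "ERROR"}


def result_urls(uri: str) -> Dict[str, str]:
    """Build the inputs/logs/outputs locations under the URI assigned by EPEPT."""
    base = uri.rstrip("/")
    return {part: f"{base}/{part}/" for part in ("inputs", "logs", "outputs")}


def http_get_text(url: str) -> str:
    """Fetch a URL and decode the body as UTF-8 (assumed encoding)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def wait_for_job(uri: str, fetch_status: Callable[[str], str] = http_get_text,
                 interval: float = 5.0, timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the job URI until it reports COMPLETED or ERROR (step 3)."""
    waited = 0.0
    while True:
        status = fetch_status(uri).strip().upper()
        if status in TERMINAL_STATES:
            return status
        if status != "RUNNING":
            raise RuntimeError(f"unexpected status {status!r}")
        if waited >= timeout:
            raise TimeoutError(f"job still RUNNING after {waited:.0f} s")
        sleep(interval)
        waited += interval
```

Once `wait_for_job(uri)` returns "COMPLETED", the files under `result_urls(uri)["outputs"]` can be retrieved (step 4); if it returns "ERROR", the logs location is the natural place to look.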
Figure 4 presents a small example in which R is used to run EPEPT. The EPEPT website (http://informatics.systemsbiology.net/EPEPT/) provides examples for four programming languages (R, Perl, MATLAB and Ruby) and offers downloads of the libraries needed to run these examples. Test datasets and documentation of the exact input requirements (i.e. the variable names to be used) are also available.
Figure 4 Example R code to run EPEPT. The inputs (including the tab delimited text file 'mytestdata.tsv' that contains the permutation values) are submitted by making a request to EPEPT after which a unique URI is returned to the client. Using this URI the status …
The website is a simple HTML input form in which the file with the permutation values or the labeled dataset can be uploaded and all options can be set using sliders, drop-down menus, check boxes and text fields. The results are presented to the user in the results pane, which shows the estimated P-values both as text and graphically and provides download links for the output files. See Figure .
A manual for the HTML input form is hosted on the EPEPT Google Code project (http://code.google.com/p/epept/). In addition to the manual, this site hosts the source code, example datasets and web service client examples. Links to the EPEPT Google Code website can be found on the EPEPT website.
The latest version of the source code for the complete EPEPT package is available for download from the EPEPT Google Code project (http://code.google.com/p/epept/). A standalone version of EPEPT for MATLAB is also included.