BRM v2.3 efficiently processes HTP datasets (e.g. microarray, RNAseq, proteomic) with the added capability to interact with public data sources and visual analytic tools. The addition of miRNA data sources and cross-species identifier retrieval to BRM creates a unique platform in which miRNA target prediction, merging with experimental mRNA data and downstream functional analysis and interpretation are possible within the same environment. This is particularly novel for non-human vertebrate species, such as zebrafish, which are not supported in other programs. We show here a typical workflow in BRM for integration of experimental zebrafish miRNA and mRNA microarray datasets with example retrievals for zebrafish, including pathway annotation and mapping to human orthologs.
BRM overview
As summarized in Table 1, BRM allows biologists to manage, process, analyze and visualize HTP data and to perform retrievals of batch annotations, cross-reference identifiers and miRNA data necessary for systems biology research. BRM uses a familiar, easy-to-use spreadsheet format with three primary interfaces: the Project file browser, the Dataset table browser and the Get Started menu (Figure , respectively). The Get Started menu provides access to most data import and retrieval options. Data can be imported in delimited or Excel file format or pasted from the clipboard. The cross-species identifier query for ortholog mapping and the miRNA retrieval options, including miRNA targets, miRNA metadata and miRNA IDs, are accessible through the Get Started menu or from the File menu of the dataset browser (Figure ). Each of these retrievals, described in more detail below, is available for Homo sapiens (human), Mus musculus (mouse), Rattus norvegicus (rat), Danio rerio (zebrafish), and Macaca mulatta (macaque).
Table 1. Bioinformatics Resource Manager (BRM) v2.3 capabilities
miRNA Targets
Predicted gene targets can be retrieved from the miRNA Target interface for any list of miRNAs through the TargetScan [2], microCosm/miRBase [26] and miRNA.org [7] databases, which utilize the TargetScanS or miRanda algorithms. Because individual prediction databases may provide false positive targets [27], we made it possible to identify high-confidence targets in BRM that are conserved across any two databases or in all three databases. Alternatively, it is also possible to retrieve all targets that appear in any database for use in downstream analysis.
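The consensus options described above (any database, any two databases, or all three) amount to counting how many sources predict each gene. The following is a minimal sketch of that idea, not BRM's implementation; the function name `consensus_targets` and the gene symbols are hypothetical placeholders, not real database output.

```python
from collections import Counter

def consensus_targets(predictions, min_sources=2):
    """Return the genes predicted by at least `min_sources` databases.

    `predictions` maps a database name to the set of gene symbols
    that database predicts for the input miRNA list.
    """
    counts = Counter(
        gene for genes in predictions.values() for gene in set(genes)
    )
    return {gene for gene, n in counts.items() if n >= min_sources}

# Hypothetical toy predictions (placeholder symbols, not real targets).
predictions = {
    "TargetScan": {"GENE_A", "GENE_B", "GENE_C"},
    "microCosm/miRBase": {"GENE_B", "GENE_C", "GENE_D"},
    "miRNA.org": {"GENE_C", "GENE_E"},
}

any_source = consensus_targets(predictions, min_sources=1)    # union of all sources
two_of_three = consensus_targets(predictions, min_sources=2)  # high-confidence set
all_three = consensus_targets(predictions, min_sources=3)     # strictest set
```

Raising `min_sources` trades sensitivity for confidence: the union keeps every candidate, while the stricter settings keep only targets on which independent sources agree.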
miRNA Metadata
The miRNA Metadata retrieval provides mature accession and sequence information for input miRNAs, which is useful for querying external miRNA databases or when comparing miRNA homology for different species.
miRNA IDs
It is also possible to identify all miRNAs associated with a gene of interest or a list of genes through the miRNA ID interface. This retrieval window accepts genes in several formats, including gene symbol, GenBank accession, Ensembl transcript or Entrez Gene, and outputs the associated miRNAs from microCosm/miRBase, TargetScan and microRNA.org.
XSpecies identifier
The cross-species identifier retrieval uses Ensembl Gene ID for species mapping, making it easy to identify orthologous genes among human, mouse, rat, macaque and zebrafish. The retrieval also provides the percent identity match for both the source and destination species. This cross-species functionality expands the ability of BRM to provide batch annotations (e.g. GO or KEGG), particularly for species such as zebrafish that may have limited annotation available. For such species, human orthologs can be used in place of the native identifiers to improve annotation retrieval from these databases. In addition, queries from both the original and mapped species can be merged in BRM to provide more complete annotation. The mapped ortholog data can also be exported from BRM for use in other software programs that support only a limited number of species, such as human or mouse.
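Conceptually, the cross-species retrieval is a lookup against an ortholog table keyed on Ensembl Gene ID that returns the mapped gene together with the two percent-identity values. The sketch below illustrates that lookup under stated assumptions: the table layout, the function name `map_orthologs` and all identifiers and percentages are hypothetical placeholders, not real Ensembl records or BRM internals.

```python
# Hypothetical ortholog table. Each row holds:
# (source Ensembl gene ID, destination Ensembl gene ID,
#  % identity of source to destination, % identity of destination to source).
ORTHOLOGS = [
    ("ENSDARG_EXAMPLE_1", "ENSG_EXAMPLE_1", 71.0, 68.5),
    ("ENSDARG_EXAMPLE_2", "ENSG_EXAMPLE_2", 55.2, 60.1),
    ("ENSDARG_EXAMPLE_2", "ENSG_EXAMPLE_3", 48.9, 52.3),  # one-to-many ortholog
]

def map_orthologs(source_ids, table=ORTHOLOGS):
    """Map each source gene ID to a list of its orthologs (possibly empty)."""
    index = {}
    for src, dst, src_pct, dst_pct in table:
        index.setdefault(src, []).append(
            {"ortholog": dst, "source_identity": src_pct, "dest_identity": dst_pct}
        )
    # Genes without a known ortholog map to an empty list rather than being dropped.
    return {gene_id: index.get(gene_id, []) for gene_id in source_ids}

mapped = map_orthologs(["ENSDARG_EXAMPLE_1", "ENSDARG_EXAMPLE_2", "ENSDARG_UNMAPPED"])
```

Keeping unmapped genes in the result (with an empty list) makes it easy to see how much of an input list lacks an ortholog before the mapped IDs are passed on to annotation queries.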
Example annotation, cross-reference and cross-species retrievals for zebrafish
One of the primary conveniences of BRM is the ability to map identifiers across databases (NCBI, UniProt, Ensembl, GO, KEGG) in the same software program used to merge and analyze datasets. Researchers are commonly limited in their ability to compare or merge data by the lack of a common identifier, whether this is due to differences in platform (e.g. Agilent v. Affymetrix), data type (e.g. transcriptomic v. proteomic) or species. In BRM v2.3, we have expanded the number of species that can be used for retrieval and added retrieval options. Many queries, including those from UniProt, NCBI and GO, are available for all species, while CMR queries are available for microbial species. KEGG pathway and gene retrievals can now be made for approximately 50 of the most common eukaryotic and prokaryotic organisms. All cross-species and miRNA retrievals are currently available for human, mouse, rat, macaque and zebrafish.
Here we present example batch retrievals in zebrafish (Danio rerio) for annotation, cross-reference and cross-species queries (Figure ). By starting with only a single column of zebrafish identifiers (e.g. Entrez Gene ID) as input, we show retrieval of (1) gene symbol and Ensembl gene ID from the Gene (NCBI) Data query; (2) biological process GO term from the GO Data query; (3) pathway names from the KEGG Data query; and (4) UniProt/SwissProt accession from the Protein (UniParc) Data query. Further, by using the zebrafish Ensembl gene as input for the XSpecies Identifier query, we can map the zebrafish genes to human Ensembl Gene ID for additional data mining. We can also use either the Gene or Protein Data queries to cross-reference identifiers and retrieve Ensembl transcript ID for use in the MicroRNA ID query. All query interfaces allow the user to choose the appropriate species, input ID/column and output ID for retrieval. The queries occur in the background through internal conversion tables and the output can be added as columns to the current spreadsheet or as a separate tab. These features of BRM v2.3 allow for seamless processing of HTP data for zebrafish and many other species.
Example workflow for integration and analysis of miRNA and mRNA microarray data
To understand the biological consequences of miRNA expression changes, it is necessary to know which miRNA target genes might be post-transcriptionally repressed within a given biological system. Because target prediction algorithms identify hundreds of possible targets, identification of putative mRNA targets is made more accurate by parallel experimental measurements of miRNAs and mRNAs (e.g. microarray or RNAseq) followed by computational analysis involving (1) miRNA target prediction, (2) integration of predicted targets with mRNA transcripts, and (3) functional and pathway analysis of the resulting experimental miRNA gene targets. BRM v2.3 provides a convenient platform for performing these steps within a single software environment (Figure ) instead of through multiple manual steps in separate tools.
In this example, we wanted to determine whether transient developmental exposure to the neurotoxicant nicotine misregulates expression of miRNAs that control neurobehavioral development and function. Therefore, zebrafish embryos were exposed to 30 μM nicotine from 6–48 hours post fertilization (hpf), a window that encompasses early neurogenesis, and samples were collected at multiple developmental stages for parallel miRNA and mRNA microarray analysis (Additional file 1). Importantly, transient developmental exposure to nicotine resulted in behavioral hyperactivity in larval zebrafish in the absence of overt morphological defects (Additional file 2). To identify miRNAs and putative target transcripts that may drive the observed behavioral phenotype, significant gene lists for each dataset were uploaded to BRM for integration and analysis of miRNA target genes.
Step 1. miRNA target prediction
Developmental exposure of zebrafish to nicotine resulted in alteration of 42 significant (p<0.05) miRNAs across all developmental stages compared to control animals (Additional file 3). Because zebrafish miRNA target prediction databases are limited, we uploaded the zebrafish miRNA list with human homologs identified from miRBase to BRM in order to retrieve predicted targets from all three prediction datasources. The orthologous human and zebrafish miRNAs were identified based on coding sequence. In BRM, we used the miRNA Metadata query to retrieve mature sequences for both species and then filtered our list to the 33 miRNAs with ≤1 mismatch (but with perfect complementarity in the seed region) using the integrated OpenOffice Calc feature. These highly conserved human miRNAs were submitted to the miRNA Target query for retrieval of human predicted gene targets that were conserved across any 2 of the 3 datasources (TargetScan, microCosm/miRBase, and miRNA.org) using an approach described previously [28]. The target algorithms consider the presence of the 3’ binding site for prediction of target genes [2, 7, 26]. As depicted in the Venn diagram (Figure , Step 1), 28,205 targets with unique gene symbols were predicted from all three sources, including 14,577 from TargetScan, 12,601 from microCosm/miRBase and 16,931 from miRNA.org. A total of 13,213 targets matched at least 2 of the 3 databases and were used for integration with the mRNA transcripts in Step 2. Alternatively, it is possible to directly query the zebrafish miRNAs for predicted targets in BRM using the target query from microCosm/miRBase, which is currently the only miRNA target prediction tool to support this species. This retrieval results in 4,192 zebrafish targets with unique gene symbols, which could either replace or be used in combination with the human predicted target list.
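The sequence-conservation filter in Step 1 (at most one mismatch between the zebrafish and human mature sequences, with no mismatch in the seed) was applied in the spreadsheet, but the rule itself can be sketched in a few lines. This is an illustrative sketch only: it assumes the seed is nucleotides 2–8 of the mature miRNA (a common convention, not stated in the text), compares sequences position by position without alignment, and uses arbitrary made-up sequences rather than real miRNAs.

```python
def count_mismatches(seq_a, seq_b):
    """Count position-wise differences; extra trailing bases count as mismatches."""
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs + abs(len(seq_a) - len(seq_b))

def is_conserved(seq_a, seq_b, max_mismatches=1, seed=slice(1, 8)):
    """True if the seed regions are identical and total mismatches are within the limit.

    `seed=slice(1, 8)` selects nucleotides 2-8 (1-based); this choice is an assumption.
    """
    if seq_a[seed] != seq_b[seed]:
        return False
    return count_mismatches(seq_a, seq_b) <= max_mismatches

# Arbitrary 22-nt sequences for illustration (not real miRNA sequences).
reference = "UACGGAUCCGAUUAGCCAUGGA"
end_mismatch = "UACGGAUCCGAUUAGCCAUGGU"   # one mismatch, outside the seed
seed_mismatch = "UAGGGAUCCGAUUAGCCAUGGA"  # one mismatch, inside the seed
two_mismatches = "UACGGAUCCGAUUAGCCAUCGU" # two mismatches, outside the seed

results = {
    "end_mismatch": is_conserved(reference, end_mismatch),
    "seed_mismatch": is_conserved(reference, seed_mismatch),
    "two_mismatches": is_conserved(reference, two_mismatches),
}
```

Checking the seed separately from the total count reflects the reasoning in the text: a single difference elsewhere in the mature sequence is tolerated, but any difference in the seed is not, because the seed largely determines target recognition.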
Step 2. Integration of miRNA target and mRNA microarray gene lists
The next step is to integrate the miRNA target list with the mRNA microarray gene list (Figure , Step 2). Developmental exposure of zebrafish to nicotine resulted in 496 genes significantly altered (p<0.05) compared to controls as measured by Nimblegen microarray. For data integration, the Merge Datasets function can be accessed from the Edit menu within the open miRNA target prediction dataset from Step 1. The mRNA transcript file is then selected for merging through a browse feature, which allows selection of any file in the BRM project menu. Next, the columns to merge on must be chosen. BRM allows multi-level merges so that it is possible to achieve the best overlap between datasets through both direct relationships (i.e. common identifiers between datasets) and indirect relationships that involve translation of one identifier to another (e.g. Entrez Gene → gene symbol). BRM merges in the order of the relations shown, so in this case rows are first merged on gene symbol, then on Entrez Gene ID, and finally any remaining rows are merged by mapping the Entrez Gene ID to previously unmapped gene symbols. The user can then select which columns to show in the output, whether to show only matching rows (intersection) or all rows from both datasets (union), and the location in which to save the merged dataset. After clicking the Merge button, a pop-up window shows the merge statistics for each relationship.
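The ordered, multi-level merge described above can be sketched as a cascade of lookups in which each mRNA row is matched by the first relation that succeeds and a counter records how many rows each relation contributed. This is a simplified illustration under stated assumptions, not BRM's merge code: it implements only the intersection mode, and the function name `multi_level_merge`, the column names and all identifiers are hypothetical.

```python
def multi_level_merge(targets, transcripts, entrez_to_symbol):
    """Merge transcript rows onto target rows using ordered relations.

    Relations are tried in order: gene symbol, then Entrez ID, then
    Entrez ID translated to a gene symbol. Returns (merged rows, per-relation counts).
    """
    by_symbol = {t["symbol"]: t for t in targets if t.get("symbol")}
    by_entrez = {t["entrez"]: t for t in targets if t.get("entrez")}
    relations = [
        ("symbol = symbol", lambda r: by_symbol.get(r.get("symbol"))),
        ("entrez = entrez", lambda r: by_entrez.get(r.get("entrez"))),
        ("entrez -> symbol",
         lambda r: by_symbol.get(entrez_to_symbol.get(r.get("entrez")))),
    ]
    merged = []
    stats = {name: 0 for name, _ in relations}
    for row in transcripts:
        for name, lookup in relations:
            hit = lookup(row)
            if hit is not None:
                merged.append({**row, "target_symbol": hit["symbol"], "matched_by": name})
                stats[name] += 1
                break  # a row is matched by the first relation that succeeds
    return merged, stats

# Hypothetical toy data (placeholder symbols and IDs).
targets = [
    {"symbol": "gene_a", "entrez": 101},
    {"symbol": "gene_b", "entrez": None},
    {"symbol": "gene_c", "entrez": 103},
]
transcripts = [
    {"symbol": "gene_a", "entrez": 101, "log2fc": 1.2},       # direct symbol match
    {"symbol": None, "entrez": 103, "log2fc": -0.8},          # direct Entrez match
    {"symbol": "old_name_b", "entrez": 102, "log2fc": 0.5},   # matched via translation
    {"symbol": "gene_z", "entrez": 999, "log2fc": 2.0},       # no match
]
entrez_to_symbol = {102: "gene_b"}

merged, stats = multi_level_merge(targets, transcripts, entrez_to_symbol)
```

The `stats` dictionary plays the role of the merge statistics window: it shows how much of the overlap came from direct identifiers and how much was recovered only through identifier translation.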
Overall, 199 mRNA transcripts (out of 496) matched predicted targets of altered miRNAs. These data suggest that transient developmental exposure to nicotine results in differential expression of gene transcripts putatively targeted by miRNAs significantly misregulated upon nicotine exposure. In this example, we merged all mRNA transcripts with conserved miRNA gene targets; however, it is also common to merge only anti-correlated mRNAs with miRNA targets [28] since a primary mechanism for miRNAs to direct post-transcriptional regulation of proteins is through repression. Anti-correlated lists would first need to be separated into up and down datasets and could then be merged as described above.
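The anti-correlation filter mentioned above can be expressed as keeping only miRNA-target pairs whose expression changes point in opposite directions. The sketch below uses the sign of the product of the two fold changes, which is equivalent to splitting each list into up and down sets and merging up-regulated miRNAs with down-regulated targets and vice versa. The function name `anticorrelated_pairs`, the miRNA and gene names, and the fold-change values are hypothetical.

```python
def anticorrelated_pairs(mirna_fc, mrna_fc, targets):
    """Return (miRNA, gene) pairs whose log fold changes have opposite signs.

    `mirna_fc` and `mrna_fc` map names to log fold changes;
    `targets` maps each miRNA to the set of its predicted target genes.
    """
    pairs = []
    for mirna, genes in targets.items():
        m = mirna_fc.get(mirna)
        if m is None:
            continue  # miRNA was not measured or not significant
        for gene in sorted(genes):
            g = mrna_fc.get(gene)
            # A negative product means one went up while the other went down.
            if g is not None and m * g < 0:
                pairs.append((mirna, gene))
    return pairs

# Hypothetical toy values (not experimental data).
mirna_fc = {"miR-x": 1.5, "miR-y": -2.0}
mrna_fc = {"gene_a": -1.1, "gene_b": 0.9, "gene_c": 1.3}
targets = {"miR-x": {"gene_a", "gene_b"}, "miR-y": {"gene_c", "gene_d"}}

pairs = anticorrelated_pairs(mirna_fc, mrna_fc, targets)
```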
Step 3. Functional enrichment and hierarchical clustering using GAGGLE framework in BRM
In Step 3, we predict the functional consequences of nicotine-mediated disruption of miRNA signaling pathways in developing zebrafish (Figure ). Gaggle is used to broadcast data from BRM to other analysis programs, including DAVID (via Firegoose) for functional enrichment and MEV for clustering analysis. First, to visualize the changes associated with putative miRNA target transcripts after exposure to nicotine, we clustered the data using MEV. To do this, we highlighted the identifier column along with all of the expression data columns in BRM and broadcast them to MEV. Unsupervised hierarchical clustering was then performed in MEV using the Euclidean distance metric and centroid linkage to group patterns of gene expression across the timecourse.
To identify biological processes enriched in predicted miRNA targets, we performed functional enrichment of the data in DAVID. Once the Firegoose option is chosen from the dataset browser in BRM, a Firefox window opens and connects the Firegoose toolbar to the Gaggle boss. It is then possible to broadcast a column of gene identifiers to DAVID via Firegoose. In this example, the gene symbol column from the integrated mRNA/miRNA dataset was broadcast to DAVID for functional enrichment using the Nimblegen array platform as the background. We used the DAVID clustering annotation tool to identify significantly enriched (p≤0.05) biological process GO terms for the 199 miRNA gene targets. Figure shows the resulting functional enrichment and clustering analysis of the integrated miRNA predicted target and mRNA transcript dataset. Enrichment of biological processes related to immune function (GO:0002684, GO:0002443), blood coagulation (GO:0007596), metabolic processes (GO:0006096, GO:0006006) and cytoskeleton organization (GO:0030036) was observed. In addition, several processes related to nervous system development and function were altered, including fear response (GO:0042596), synaptic vesicle transport (GO:0048489) and calcium ion transport (GO:0051928), indicating that nicotine exposure disrupts the expression of genes involved in neurogenesis, possibly through post-transcriptional regulation by differentially expressed miRNAs. These data identify a suite of misexpressed miRNAs and putative target transcripts that may choreograph behavioral hyperactivity in zebrafish developmentally exposed to nicotine. More broadly, the findings open the door for targeted studies to identify the mechanism by which developmental exposure to nicotine produces behavioral abnormalities in larval zebrafish.
As this analysis was only performed for miRNAs that are highly conserved across species, the data may reveal insights into the role of miRNA signaling during development and the functional consequences of developmental nicotine exposure in higher vertebrate organisms. Collectively, these data support the concept that miRNA signaling pathways are targets of developmental neurotoxicants and can be altered by developmental nicotine exposure.