Four hundred and eighty-one ultraconserved sequences (UCRs) longer than 200 bases were discovered in the genomes of human, mouse and rat. These are DNA sequences showing 100% identity among the three species. UCRs are frequently located in genomic regions involved in cancer, are differentially expressed in human leukemias and carcinomas and are in some instances regulated by microRNAs (miRNAs). Here we present UCbase & miRfunc, the first database that provides ultraconserved sequence data and shows miRNA function. It also links UCRs and miRNAs with the related human disorders and genomic properties. The current release contains over 2000 sequences from three species (human, mouse and rat). As a web application, UCbase & miRfunc is platform independent and is accessible at http://microrna.osu.edu/.UCbase4.
UCbase & miRfunc is a database of (i) human, mouse and rat ultraconserved elements and (ii) microRNAs (miRNAs).
The database has three main functions:
Ultraconserved elements (UCRs) were discovered in 2004 by Bejerano and colleagues through bioinformatic comparison of the mouse, rat and human genomes (1). The 481 UCR sequences are 200–779 bp in length and show 100% identity among the three species. Some of them contain protein-coding sequences, but over half are not predicted to encode any protein (1).
Previous studies have suggested an important role for these noncoding sequences both in promoting the expression of several genes and in regulating alternative splicing (5). Many of the UCRs probably date from a very early period in vertebrate evolution, as they have no orthologous counterparts in sea squirts, flies or worms. At least one group of these UCRs, however, evolved from a novel retrotransposon family that was active in lobe-finned fishes and is still active today in the ‘living fossil’ coelacanth (Latimeria chalumnae), the ancient link between marine and land vertebrates (5).
Recently, Calin et al. (6) identified a functional role for miRNAs in the transcriptional regulation of cancer-associated UCRs. They showed in tumors that differentially expressed UCRs could alter the functional characteristics of malignant cells. By combining these data with the much more elaborate model involving miRNAs in human tumorigenesis, they proposed that alterations in both coding and noncoding RNAs cooperate in the initiation and/or progression of malignancy.
miRNAs were first described in 1993 by Lee and colleagues (7), yet the term ‘microRNA’ was only introduced years later, in 2001, in an article published in Science (8). Findings over the past 5 years have supported a role for miRNAs in the regulation of crucial processes such as cell proliferation (9), apoptosis (10), development (11), differentiation (12) and metabolism (13). Each miRNA is thought to target several hundred transcripts (14), making miRNAs one of the main classes of genome regulators.
miRNAs regulate their targets either by direct cleavage of the mRNA or by inhibition of protein synthesis, depending on the degree of complementarity with the 3′-UTR of the target (15). miRNAs are processed from primary transcripts known as pri-miRNAs (16), which are not translated into protein. A portion of the primary transcript is recognized and cleaved by the enzyme Drosha into a miRNA precursor (pre-miRNA) (17).
These pre-miRNAs are then processed to mature miRNAs in the cytoplasm by interaction with the endonuclease Dicer, which also initiates the formation of the RNA-induced silencing complex (RISC) (18). This complex is responsible for the gene silencing observed in miRNA expression and RNA interference. The pathway in plants varies slightly because plants lack Drosha homologs (19).
At the time of this writing, the miRBase server version 11 contains 678 human pre-miRNAs (48–150 nt long) and 847 mature miRNAs (17–28 nt long).
Pre-miRNAs do not have a perfect double-stranded RNA structure and are topped by a terminal loop (hairpin shape). Most of them are conserved between classes, the free energy is often less than −25 kcal/mol, the GC content is usually 30–70% and the entropy is between 0 and 2 (20).
Recently, miRNA expression has been linked to cancer. The first evidence came from the finding that miR-15a and miR-16-1 are downregulated or deleted in most patients with chronic lymphocytic leukemia (CLL) (21). Several other groups have since studied miRNA expression in cancer patients and found that miRNAs are differentially expressed in normal and tumor tissues (22–26) and, in some cases, are associated with prognosis (27).
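The compositional properties listed above can be checked for a candidate sequence with a few lines of code. The following is a minimal sketch in Python (not the Perl used for the database itself). It assumes that the entropy in question is the Shannon entropy, in bits, of the single-nucleotide composition, which for four bases lies between 0 and 2. The function names and the length bounds used as defaults (taken from the 48–150 nt range of human pre-miRNAs quoted earlier) are illustrative choices. The free energy criterion is not evaluated here because it requires an RNA folding program.

```python
import math
from collections import Counter


def gc_ratio(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def shannon_entropy(seq: str) -> float:
    """Return the Shannon entropy (bits) of the single-base composition.

    Assumption: this is the entropy measure meant in the text. With a
    four-letter alphabet the result lies between 0 and 2.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def passes_composition_filter(seq: str, min_len: int = 48, max_len: int = 150) -> bool:
    """Hypothetical pre-filter: length, GC content and entropy only."""
    return (
        min_len <= len(seq) <= max_len
        and 0.30 <= gc_ratio(seq) <= 0.70
        and 0.0 <= shannon_entropy(seq) <= 2.0
    )
```

A sequence that passes this filter is only compositionally compatible with a pre-miRNA; hairpin structure and folding energy still have to be assessed separately.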
The UCbase & miRfunc database contains three principal types of search:
One of the most useful features of UCbase & miRfunc is that it provides tables showing the properties of ultraconserved elements (enhancer activity, alternative splicing, splicing NMD regulation and transcription evidence) and experimentally proven miRNA function, addressing a number of common questions pertaining to these noncoding RNA classes (Figure 3). We decided to show the experiments that provide molecular biology evidence of a particular miRNA function. Paper references supplying only miRNA data were not included in the table.
All of this information is retrieved periodically (every 2 months) by Perl scripts built on the WWW::Search::Pubmed (http://search.cpan.org/~gwilliams/WWW-Search-PubMed-1.004/) and WWW::Mechanize (http://search.cpan.org/dist/WWWMechanize/) modules, which provide a search API linked to the PubMed database (30).
The sequence comparison page allows researchers to match selected sequences against miRNAs and ultraconserved elements (exact/500/1000/2000/5000/10 000 bp up/downstream) (Figure 4) using the Basic Local Alignment Search Tool (BLAST), which finds regions of local similarity between sequences (31).
Researchers can also align miRNAs and ultraconserved elements against the TRANSFAC (32) and CpG Island Searcher (33) databases. When the parsing option is checked, the output is reduced to a single-line summary. This function is invoked internally by a Perl script that uses the Bio::SearchIO module (http://search.cpan.org/~birney/bioperl-1.2.3/Bio/SearchIO.pm).
Moreover, it is possible to download a RepeatMasker (34) miRNAs/UCRs table showing the repetitive miRNAs and ultraconserved sequences in the human genome.
The UCbase & miRfunc database is linked to microrna.org (35) to retrieve miRNA gene expression data sets, and it collects microarray expression data for ultraconserved sequences from ArrayExpress (36), GEO (37) and the microrna.osu.edu server.
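The flanking options (exact or 500 to 10 000 bp up/downstream) amount to widening a genomic interval before the comparison. The sketch below illustrates this in Python rather than in the Perl used by the database. It assumes 1-based inclusive coordinates and treats "exact" as a flank of 0; the function name and the optional `chrom_length` argument are hypothetical.

```python
# Flank sizes offered by the sequence comparison page; 0 stands for "exact".
FLANK_SIZES = (0, 500, 1000, 2000, 5000, 10000)


def flanked_interval(start, end, flank, chrom_length=None):
    """Extend a 1-based inclusive interval by `flank` bases on both sides.

    The result is clipped at position 1 and, if `chrom_length` is given,
    at the end of the chromosome.
    """
    if flank not in FLANK_SIZES:
        raise ValueError(f"unsupported flank size: {flank}")
    if start < 1 or end < start:
        raise ValueError("invalid interval")
    new_start = max(1, start - flank)
    new_end = end + flank
    if chrom_length is not None:
        new_end = min(chrom_length, new_end)
    return new_start, new_end
```

Because the window is symmetric, strand does not change the coordinates; it matters only when deciding which side is called upstream.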
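The idea of reducing an alignment report to one line per hit can be illustrated as follows. This is a sketch in Python, not the authors' Bio::SearchIO script, and the layout of their summary line is not described in the text. It assumes the input is a line of standard 12-column tabular BLAST output; the wording of the summary and the function name are illustrative.

```python
# The 12 standard columns of tabular BLAST output.
BLAST_TAB_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def summarize_hit(line: str) -> str:
    """Turn one tab-separated BLAST hit into a single-line summary."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(BLAST_TAB_FIELDS):
        raise ValueError("expected 12 tab-separated fields")
    hit = dict(zip(BLAST_TAB_FIELDS, values))
    return (
        f"{hit['qseqid']} vs {hit['sseqid']}: "
        f"{float(hit['pident']):.1f}% identity over {hit['length']} bp, "
        f"E={hit['evalue']}"
    )
```

For a made-up hit line such as `query1<TAB>subjectA<TAB>100.00<TAB>201<TAB>...<TAB>1e-105<TAB>372`, the function returns `query1 vs subjectA: 100.0% identity over 201 bp, E=1e-105`.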
Cascading Style Sheets (CSS) were used for UCbase & miRfunc web development to style web pages written in HTML. The CSS specifications are maintained by the World Wide Web Consortium (W3C). Internet media type (MIME type) text/css is registered for use with CSS by RFC 2318 (March 1998).
UCbase & miRfunc was created using a MySQL database under Debian Etch Linux, installed on a quad-core machine with 32 GB RAM.
The UCbase & miRfunc database is accessed by Perl programs running on the web server. Access from Perl to MySQL is provided mainly through CGI (Common Gateway Interface), a standard that lets external gateway programs interface with information servers such as HTTP servers (Apache2 in this particular case). A CGI object (CGI.pm) was used to handle the POST and GET methods correctly and to distinguish between scripts called from ISINDEX documents and those called from form-based documents.
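What handling GET and POST "correctly" involves can be sketched as follows. The example is in Python rather than Perl and does not reproduce CGI.pm. It relies only on the standard CGI environment variables `REQUEST_METHOD` and `QUERY_STRING`; the function name and the parameter names in the usage note are hypothetical. For a GET request the parameters arrive in the query string, whereas for a POST request with a URL-encoded form they arrive in the request body.

```python
from urllib.parse import parse_qs


def read_params(environ, body=""):
    """Return form parameters as a dict, whatever the request method.

    `environ` is a mapping of CGI environment variables; `body` is the
    already-read request body (used only for POST).
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        raw = body
    else:
        raw = environ.get("QUERY_STRING", "")
    parsed = parse_qs(raw, keep_blank_values=True)
    # Keep the first value of each parameter for simplicity.
    return {key: values[0] for key, values in parsed.items()}
```

In a real CGI script, `environ` would be `os.environ` and `body` would be read from standard input using the `CONTENT_LENGTH` variable. For example, a GET request with the query string `mirna=hsa-mir-16-1&flank=500` yields `{"mirna": "hsa-mir-16-1", "flank": "500"}`.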
Almost all the code used to develop the database and obtain miRNAs and ultraconserved elements information was written in Perl.
Perl (www.perl.org) is a programming language originally created for text manipulation and now used for a wide range of tasks including system administration, web development, network programming and GUI development. It is one of the most common programming languages in bioinformatics, where it is valued for rapid application development and deployment and for its ability to handle large data sets. Perl is also used for web automation.
Web automation covers web processes ranging from simple form filling to more complicated tasks such as data transfer, web data extraction, image recognition and tasks based on it, task scheduling, batch processing and file management. A Perl module called WWW::Mechanize was used to extract the data from NCBI (search_pubmed_nuovo.pl) and miRBase (www_mechanize_sanger_total.pl) pages, and a script written in Pascal (http://www.newbie.com) was used to extract miRNAs and ultraconserved sequences (500/1000/2000/5000/10 000 bp up/downstream) from the UCSC server (ucsc.nbl).
In particular, the utilities that come with WWW::Mechanize print the names and elements of every form, providing the information needed to locate a form with regular expressions (http://www.opengroup.org/onlinepubs/007908799/xbd/re.html).
However, using that information in a script means cutting and pasting entire blocks of code. For this reason, it is useful to install HTTP::Recorder and set up an HTTP proxy. HTTP::Recorder saves each action as WWW::Mechanize code. Before running the recording script (recorder.pl), the browser proxy has to be configured properly. Running recorder.pl starts an HTTP proxy daemon that the browser uses to make requests. The proxy uses an HTTP::Recorder agent to log these requests. It saves the log as a .t file, whose name is specified when creating the HTTP::Recorder object.
Moreover, a window appears displaying the content of the .t file, which should be a series of statements involving a hypothetical WWW::Mechanize object. These statements were added to the final Perl code file, www_mechanize_sanger_total.pl.
WWW::Search::PubMed version 1.004, in turn, provides a backend to the WWW::Search module that allows searches of the National Library of Medicine's PubMed biomedical citation database (30) (pubmed.pl).
All the miRNA genomic information is obtained using an automatic script linked to the miRBase ftp server (ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/database_files), which runs periodically using CRON, a time-based scheduling service in Unix-like operating systems (http://packages.debian.org/etch/cron).
UCbase & miRfunc is the first database containing the 481 long ultraconserved sequences discovered in the genomes of human, mouse and rat by Bejerano et al. (1) and identified as dysregulated in cancer by Calin et al. (6). In addition, UCbase & miRfunc is the first database that provides miRNA function, a particularly important feature given the increasing output of miRNA data. The goal of this study is to offer researchers an advanced set of tools that supply information about the correlation between miRNAs, UCRs and the disorders related to their aberrant expression.
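Listing the forms of a page and their named elements, as described above, can be illustrated with a short sketch. It is written in Python with the standard `html.parser` module and is not the WWW::Mechanize utility itself; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect the name, action and named fields of every form on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "name": attrs.get("name"),
                "action": attrs.get("action"),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            # Only named elements are submitted with the form.
            if attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html: str):
    """Return a list of dicts describing the forms found in `html`."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.forms
```

With this listing, a script can select the right form by matching its name or action against a regular expression before filling in the fields.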
The first version of UCbase & miRfunc includes new query methods such as search by band, disorder and gene. It can run multiple queries and combine the results from several miRNA/UCR inputs. In addition, the system has a new tool for RNA secondary structure prediction (up to 500 bp up/downstream), which allows innovative visualization of miRNAs and ultraconserved elements. This feature improves the presentation of the output, which is especially useful when several structures are obtained from multiple queries.
The sequence comparison tool, which is also available, allows researchers to match selected sequences that are 500/1000/2000/5000/10 000 bp upstream/downstream of all defined miRNAs and UCRs in the database.
Additionally, UCbase & miRfunc has adopted several strategies to integrate search tools. It provides updated data by using automated web scripts.
The system will be regularly upgraded with new human, mouse and rat miRNAs/UCRs by providing the corresponding reference sequences and annotations, thus allowing the data to be refined continuously with every new miRBase and UCSC database version. It also provides the complete human, mouse and rat miRBase table as a single file (available from the resource web page), a feature that is particularly useful if one wants to use regular expressions to identify the segment of the input file associated with a sub-field of interest.
Although other alternatives are available for retrieval of miRNA nomenclature, sequence data and annotation, UCbase & miRfunc offers a unique combination of features that gives biologists many options for data analysis and for the discovery of relationships between miRNAs and ultraconserved sequences.
Improvements to the OMIM (28) and HGMD (38) databases, which are needed for an overall collection plan (39), should enable UCbase & miRfunc users within the next year to fully exploit the relation between miRNAs/UCRs and the mutations or chromosomal aberrations related to their genomic location.
Supplementary data and programming codes are available at http://web.unife.it/utenti/cristian.taccioli/software/. For security reasons the code files were uploaded as .txt files, so the user has to rename them with the .pl extension.
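As an illustration of using regular expressions on such a single-file table, the sketch below groups miRNA identifiers by species. It is written in Python, and the layout of the downloadable file is not described here, so the code assumes only that identifiers carrying the miRBase species prefixes hsa, mmu and rno appear somewhere in the text. The pattern covers common precursor-style names and is not a complete grammar of miRBase nomenclature.

```python
import re

# miRBase species prefixes for the three species in the database.
SPECIES = {"hsa": "human", "mmu": "mouse", "rno": "rat"}

# Illustrative pattern: prefix, then mir/miR/let, a number, an optional
# letter and an optional numeric suffix (for example hsa-mir-16-1).
MIRNA_ID = re.compile(r"\b(hsa|mmu|rno)-(?:mir|miR|let)-\d+[a-z]?(?:-\d+)?\b")


def ids_by_species(text: str):
    """Return a dict mapping species name to the miRNA identifiers found."""
    found = {name: [] for name in SPECIES.values()}
    for match in MIRNA_ID.finditer(text):
        found[SPECIES[match.group(1)]].append(match.group(0))
    return found
```

The same approach extends to other sub-fields by replacing the pattern, provided the column or token of interest has a recognizable form.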
Free web tools written for miRNA and UCRs microarray analysis can also be downloaded at http://web.unife.it/utenti/cristian.taccioli/mix/examples.html.
Funding for open access charge: OSU Human Cancer Genetics Program.
Conflict of interest statement. None declared.
We gratefully acknowledge the support of all members of the Department of Molecular Virology, Immunology and Medical Genetics of The Ohio State University. We also appreciate the technical support of Melissa Dickman, Daniela Taccioli, Ivan Tassani, Stefan Costinean, Nicola Zanesi, Pierluigi Gasparini and Gianpiero Di Leva.