Results 1-7 (7)
 

1.  InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data 
Bioinformatics  2012;28(23):3163-3165.
Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of ‘widgets’ performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages.
Availability: Freely available from http://www.intermine.org under the LGPL license.
Contact: g.micklem@gen.cam.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts577
PMCID: PMC3516146  PMID: 23023984
2.  modMine: flexible access to modENCODE data 
Nucleic Acids Research  2011;40(D1):D1082-D1088.
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database (http://intermine.modencode.org) described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
doi:10.1093/nar/gkr921
PMCID: PMC3245176  PMID: 22080565
3.  PomBase: a comprehensive online resource for fission yeast 
Nucleic Acids Research  2011;40(D1):D695-D699.
PomBase (www.pombase.org) is a new model organism database established to provide access to comprehensive, accurate, and up-to-date molecular data and biological information for the fission yeast Schizosaccharomyces pombe to effectively support both exploratory and hypothesis-driven research. PomBase encompasses annotation of genomic sequence and features, comprehensive manual literature curation and genome-wide data sets, and supports sophisticated user-defined queries. The implementation of PomBase integrates a Chado relational database that houses manually curated data with Ensembl software that supports sequence-based annotation and web access. PomBase will provide user-friendly tools to promote curation by experts within the fission yeast community. This will make a key contribution to shaping its content and ensuring its comprehensiveness and long-term relevance.
doi:10.1093/nar/gkr853
PMCID: PMC3245111  PMID: 22039153
4.  FlyMine: an integrated database for Drosophila and Anopheles genomics 
Genome Biology  2007;8(7):R129.
This novel web-based database provides unique accessibility and querying of integrated genomic and proteomic data for Drosophila and Anopheles.
FlyMine is a data warehouse that addresses one of the important challenges of modern biology: how to integrate and make use of the diversity and volume of current biological data. Its main focus is genomic and proteomic data for Drosophila and other insects. It provides web access to integrated data at a number of different levels, from simple browsing to construction of complex queries, which can be executed on either single items or lists.
doi:10.1186/gb-2007-8-7-r129
PMCID: PMC2323218  PMID: 17615057
5.  A Human-Curated Annotation of the Candida albicans Genome 
PLoS Genetics  2005;1(1):e1.
Recent sequencing and assembly of the genome for the fungal pathogen Candida albicans used simple automated procedures for the identification of putative genes. We have reviewed the entire assembly, both by hand and with additional bioinformatic resources, to accurately map and describe 6,354 genes and to identify 246 genes whose original database entries contained sequencing errors (or possibly mutations) that affect their reading frame. Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that might be targeted for antifungal therapy. We also observed that, compared to other fungi, the protein-coding sequences in the C. albicans genome are especially rich in short sequence repeats. Finally, our improved annotation permitted a detailed analysis of several multigene families, and comparative genomic studies showed that C. albicans has a far greater catabolic range, encoding respiratory Complex 1, several novel oxidoreductases and ketone body degrading enzymes, malonyl-CoA and enoyl-CoA carriers, several novel amino acid degrading enzymes, a variety of secreted catabolic lipases and proteases, and numerous transporters to assimilate the resulting nutrients. The results of these efforts will ensure that the Candida research community has uniform and comprehensive genomic information for medical research as well as for future diagnostic and therapeutic applications.
Synopsis
Candida albicans is a commonly encountered fungal pathogen usually responsible for superficial infections (thrush and vaginitis). However, an estimated 30% of severe fungal infections, most due to Candida, result in death. Those who are most at risk include individuals taking immune-suppressive drugs following organ transplantation, people with HIV infection, premature infants, and cancer patients undergoing chemotherapy. Current therapies for this pathogen are made more difficult by the significant secondary effects of anti-fungal drugs that target proteins that are also found in the human host.
Recent sequencing and assembly of the genome for the fungal pathogen C. albicans used simple automated procedures for the identification of putative genes. Here, we report a detailed annotation of the 6,354 genes that are present in the genome sequence of this organism, essentially writing the dictionary of the C. albicans genome.
Comparison with other fungal genomes permitted the identification of numerous fungus-specific genes that are absent from the human genome and whose products might be targeted for antifungal therapy. The results of these efforts will thus ensure that the Candida research community has uniform and comprehensive genomic information for medical research, for the development of functional genomic tools as well as for future diagnostic and therapeutic applications.
doi:10.1371/journal.pgen.0010001
PMCID: PMC1183520  PMID: 16103911
6.  GeneDB: a resource for prokaryotic and eukaryotic organisms 
Nucleic Acids Research  2004;32(Database issue):D339-D343.
GeneDB (http://www.genedb.org/) is a genome database for prokaryotic and eukaryotic organisms. The resource provides a portal through which data generated by the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be made publicly available. It combines data from finished and ongoing genome and expressed sequence tag (EST) projects with curated annotation that can be searched, sorted and downloaded using a single web-based resource. The current release stores 11 datasets, of which six are curated and maintained by biologists who review and incorporate information from the scientific literature, public databases and the respective research communities.
doi:10.1093/nar/gkh007
PMCID: PMC308742  PMID: 14681429
7.  The DNA sequence of chromosome I of an African trypanosome: gene content, chromosome organisation, recombination and polymorphism 
Nucleic Acids Research  2003;31(16):4864-4873.
The African trypanosome, Trypanosoma brucei, causes sleeping sickness in humans in sub-Saharan Africa. Here we report the sequence and analysis of the 1.1 Mb chromosome I, which encodes approximately 400 predicted genes organised into directional clusters, of which more than 100 are located in the largest cluster of 250 kb. A 160-kb region consists primarily of three gene families of unknown function, one of which contains a hotspot for retroelement insertion. We also identify five novel gene families. Indeed, almost 20% of predicted genes are members of families. In some cases, tandemly arrayed genes are 99–100% identical, suggesting an active process of amplification and gene conversion. One end of the chromosome consists of a putative bloodstream-form variant surface glycoprotein (VSG) gene expression site that appears truncated and degenerate. The other chromosome end carries VSG and expression site-associated genes and pseudogenes over 50 kb of subtelomeric sequence where, unusually, the telomere-proximal VSG gene is oriented away from the telomere. Our analysis includes the cataloguing of minor genetic variations between the chromosome I homologues and an estimate of crossing-over frequency during genetic exchange. Genetic polymorphisms are exceptionally rare in sequences located within and around the strand-switches between several gene clusters.
PMCID: PMC169939  PMID: 12907729
