Disease-specific genetic information has been increasing at rapid rates as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting sea of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone.
We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of coverage of Genotator in three separate diseases for which there exist specialty curated databases, Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at http://genotator.hms.harvard.edu.
Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that the integration of these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top ranked genes for all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease.
As a meta-query engine, Genotator provides high coverage of both historical genetic research as well as recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm appropriately transforms query terms to match the input requirements of each targeted databases and accurately resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an excel-style output that is consistent across disease queries and readily importable to other applications.