The annotation of the An. gambiae genome is being manually appraised using the GMOD annotation tool Apollo (4). Over 50% of the genome has been completed so far, including the entirety of chromosome arms 2L, 2R and X. Many loci have been updated to correct systematic errors in the computational annotation, particularly tandem arrays of multi-gene families, genes that had been split across multiple partial predictions and needed merging, and suspect predictions likely to be based on transposable element sequences, which were removed. Manual annotations are stored in a separate CHADO database (5), displayed as a track in the genome browser via DAS (6) and integrated into the main gene build during the next round of re-annotation.
Small-scale manual appraisal of gene predictions has been undertaken for Ae. aegypti and C. quinquefasciatus as part of the quality control for the gene builds. In the case of C. quinquefasciatus, this identified at least 1500 predictions, which were removed from the CpipJ1.2 dataset. Among the deprecated gene predictions was a large set of single-exon predictions that had no supporting transcript evidence and no similarity to other mosquito proteomes or to any other sequences in the public databases. Expert opinion was that these were erroneous over-predictions by the computational algorithms rather than a large Culex-specific gene family. Efforts such as these reflect our commitment to improving gene prediction accuracy by integrating new data sets and re-appraising the existing prediction set.
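The criteria used to characterize the deprecated C. quinquefasciatus predictions (a single exon, no transcript support, no sequence similarity elsewhere) can be expressed as a simple filter. The sketch below is illustrative only: the `GenePrediction` record, its field names and the gene identifiers are hypothetical, and the two evidence flags are assumed to have been computed beforehand (for example, from transcript alignments and similarity searches). It flags candidates for expert review rather than deleting them, since the final decision described above rested on expert opinion.

```python
from dataclasses import dataclass


@dataclass
class GenePrediction:
    # Hypothetical record; field names are illustrative, not a VectorBase schema.
    gene_id: str
    exon_count: int
    has_transcript_support: bool  # assumed precomputed from transcript alignments
    has_homology_hit: bool        # assumed precomputed from similarity searches


def is_suspect(pred: GenePrediction) -> bool:
    """True for single-exon models lacking both transcript and homology support."""
    return (
        pred.exon_count == 1
        and not pred.has_transcript_support
        and not pred.has_homology_hit
    )


def flag_for_review(preds):
    """Return the IDs of predictions that meet all three suspect criteria."""
    return [p.gene_id for p in preds if is_suspect(p)]


preds = [
    GenePrediction("pred_1", 1, False, False),  # single exon, no evidence: flagged
    GenePrediction("pred_2", 1, True, False),   # has transcript support: kept
    GenePrediction("pred_3", 4, False, False),  # multi-exon: kept
]
print(flag_for_review(preds))  # ['pred_1']
```

A prediction is flagged only when all three conditions hold, so a single line of supporting evidence is enough to keep a model out of the review list.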