The silkworm, Bombyx mori, has been domesticated for silk production for about 5,000 years and is derived from the wild silkworm, Bombyx mandarina. As the only truly domesticated insect, it is completely dependent on humans for survival and reproduction. Currently, it is one of the most important economic insects in many developing countries owing to its large-scale propagation and use in silk production. Comparison with its wild ancestor B. mandarina at the genome level provides an opportunity to examine the effects of the artificial selection that led to domestication. In addition, B. mori is the model organism for Lepidoptera, the second largest order of insects, which includes the most destructive agricultural pests. With the development of biotechnology, B. mori has also come to be used as an important bioreactor for the production of recombinant proteins [1, 2]. Silkworm genome information therefore not only has a strong impact on improving sericulture, but also facilitates the development of new methods for pest control.
Genome analyses of insects have advanced rapidly in recent years, because insects are the most diverse group of organisms on earth and their characteristic biological phenomena are important resources for basic science and industry. Among model insects, complete genome sequences have been published for Drosophila melanogaster [3], Anopheles gambiae [4], Apis mellifera [5] and Tribolium castaneum [6]. In 2004, draft whole genome shotgun (WGS) sequences of the silkworm were reported independently in Japan [7] and China [8], but these provided insufficient genome sequence information because their genome coverage was shallow compared with that of the other species. Subsequently, the two independent WGS data sets were merged and assembled together with newly obtained fosmid- and BAC-end sequences. Although the two data sets were derived from two different strains of silkworm, sequence comparison revealed a difference of only 0.2% at the nucleotide level. In addition, the p50T inbred strain used for WGS by the Japanese group was derived from the same origin as the Dazao strain used by the Chinese group. The RAMEN assembler was used for the assembly; it features lookup-table generation of seed strings for highly sensitive regions and rapid detection of overlapping reads, as well as precise alignment by efficient banded dynamic programming. Additionally, RAMEN includes a repeat untangling method that transforms a repeat subcontig flanked by two unique subcontigs into one unique contig, thereby circumventing problems associated with the high density of transposable elements in the silkworm genome [7]. Among all the sequenced genomes of agriculturally important insects, the silkworm genome assembly (432 Mb) has the best continuity (N50 scaffold size of ~3.7 Mb) and provides extensive nucleotide coverage (88%) of all 28 chromosomes. This was made possible by a high-density SNP linkage map that was constructed by analysis of BAC-end sequences and integrated with a physical map of contigs established by BAC fingerprinting using the FPC program [9]. In a related project, EST data derived from various tissues and developmental stages were compiled in SilkBase [10], and proteomic data for distinct tissues at different stages were obtained by two-dimensional polyacrylamide gel electrophoresis and mass spectrometry [11]. Finally, a Bombyx trap database has been established to provide reporter expression patterns and insertion positions of mutators for enhancer-trap [12] and gene-trap lines.
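As an illustration of the N50 scaffold size used as the continuity measure for the assembly, the following minimal Python sketch computes N50 from a list of scaffold lengths. The function name `n50` and the toy lengths are assumptions made for illustration only; they are not taken from the silkworm assembly or from any tool used in this work. N50 is defined here in the usual way: the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration; not part of the KAIKObase pipeline.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest scaffold to the shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that brings the cumulative length to at least
        # half of the total defines the N50.
        if running >= half_total:
            return length


# Toy example (made-up lengths, arbitrary units): total = 100, half = 50.
# Cumulative sums are 40, 70, ...; the threshold is reached at the 30-unit scaffold.
toy_scaffolds = [10, 40, 20, 30]
toy_n50 = n50(toy_scaffolds)
```

A larger N50 means that half of the assembled sequence lies in scaffolds of at least that size, which is why it serves as a summary of assembly continuity.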
With the tremendous accumulation of genomic information for various organisms, extensive tools have been developed for visualizing data and results. AceDB [13], a database originally built for C. elegans [14], was one of the pioneering databases, in which the genetic map, physical map, genes, clones, markers, and so on were integrated and displayed in an easily readable format. The technique of placing linkage and physical maps side by side, first employed in AceDB, was widely adopted in INE [15], the NCBI Map Viewer [16], CMap [17, 18], and other systems. Subsequently, genome browsers that present extensive genomic information on the Mb scale, such as chromosomes or scaffolds, were developed and made available, as follows. 1) Ensembl is a portal system developed to handle very large genomes and the associated requirements of the human genome, from sequence analysis to data storage and visualization [19]. 2) GBrowse [20], which was developed for Drosophila genome sequences, is browser software characterized by readily available open-source components and flexible configuration. 3) UTGB (University of Tokyo Genome Browser) was developed for the Japanese medaka (Oryzias latipes) [21]. Each track that displays information in UTGB is an independent web application, which enhances the expressiveness and extensibility of a conventional genome browser. To construct a user-friendly and efficient genomic database for the silkworm, we combined all of the genomic sequences, map information, ESTs, proteomic data and information on enhancer-trap strains into an integrated database called KAIKObase, in which all analyzed results and data on nucleotide sequences, scaffolds and chromosomes are displayed by GBrowse and UTGB.