The initial publication of two draft versions of the human genome led to intense debate over the exact number of genes in the human genome [1]. Current estimates suggest that the human genome encodes approximately 35,000 to 38,000 genes, although the final number must await the complete annotation of each genome sequence. The search for additional genes not discovered during early annotation attempts has involved several different approaches. These have included the sequencing of randomly selected cDNAs from various tissue sources, the development of computer-based gene prediction programs of ever-increasing accuracy, and the direct comparison of the human genome with the genome sequences of other vertebrates and invertebrates [3]. Using these approaches, fully annotated genomes of numerous species should become available within a relatively short time.
We approached the problem of gene identification by using a combination of experimental and in silico techniques. Specifically, we initiated a project to sequence expressed sequence tags (ESTs) from the hamster testis and used these sequences to identify unannotated, or incompletely annotated, genes in the human and other vertebrate genomes. Although the hamster has not been used extensively in genomics research, it has been widely used in other areas of investigation, including circadian rhythm research [10] and several areas of reproductive biology. For example, the study of hamster gametes has revealed significant information concerning the mechanisms underlying species-specific sperm-egg interactions [11] and the deleterious effects of endocrine disruptors on male and female reproductive development [14]. The hamster, mouse and rat are all members of the family Muridae; however, mice and rats belong to the subfamily Murinae, whereas hamsters belong to the subfamily Cricetinae. Three hamster species commonly used in research are Mesocricetus auratus (Syrian golden hamster), Cricetulus griseus (Chinese hamster) and Phodopus sungorus (Siberian hamster). Given these relationships, sequence information from any hamster species should complement information gained from other closely related species.
The testis was chosen for these studies because it is a rich source for the identification of novel genes. The adult testis is a complex organ consisting of numerous somatic cell types as well as germ cells at all stages of spermatogenesis, from the gonocyte stem cells to the mature sperm cells [18]. Consequently, several unique gene populations, including those involved in the regulation of meiosis and those specific to the various testicular cell types, are expressed in the testis. A recent gene discovery study performed in the testis of Drosophila melanogaster found that 47% of more than 1,500 sequenced cDNAs did not match ESTs previously identified in this organism [19]. Likewise, the testis of the cynomolgus monkey has yielded several novel gene sequences [8]. We therefore reasoned that sequencing ESTs from the hamster testis might reveal novel genes that are conserved in other species and that may contribute to the control of testicular development and/or function. In this report, we describe our initial results from the sequencing of randomly selected cDNAs from the testes of male Syrian golden hamsters. In particular, we identified eight cDNAs that appear to be derived from genes not previously annotated in the human genome. We describe the detailed analysis of two of these genes, which encode a new member of the kinesin superfamily of microtubule-based molecular motors and a protein likely to be involved in chromatin remodeling.