Draft sequences of the human genome were recently reported (1,2). Much work remains to be done to produce a complete finished sequence, and progress is best assured by a diversity of approaches (1). At present, one of the principal goals of genome research is a careful and systematic validation of the assembled sequence (1–4). It is therefore critically important to develop and apply validation strategies that are independent of the approaches used to generate the original genome sequence. Short sequences flanking rare restriction sites, for instance NotI, might serve as a tool for validating human genome structure. NotI linking clones contain pairs of sequences flanking a single NotI recognition site, while NotI jumping clones contain DNA sequences spanning between neighboring NotI restriction sites. Such clones were shown to be tightly associated with CpG islands and genes (5,6). The use of NotI linking and jumping clones as framework markers was proposed to define the structure of large regions of human chromosomes (7–13). To achieve this goal, simplified procedures for the construction of NotI jumping and NotI linking libraries were developed, and a number of chromosome 3-specific, other chromosome-specific and total human NotI linking libraries were prepared (7–15).
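To make the notion of NotI flanking sequences concrete, the following minimal sketch locates NotI recognition sites (GCGGCCGC) in a DNA string and returns the short sequences on either side of each site, which is the kind of sequence pair a linking clone carries. It is an illustration only and not the procedure used to build or sequence the libraries; the function name `noti_flanks`, the `flank_len` parameter and the toy input sequence are assumptions introduced here.

```python
# Illustrative sketch only: not the cloning or sequencing procedure from the text.
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence (palindromic)


def noti_flanks(seq, flank_len=20):
    """Return (position, left_flank, right_flank) for every NotI site in seq.

    `flank_len` is a hypothetical parameter: the number of bases to report
    on each side of the site. Flanks are truncated at the sequence ends.
    """
    seq = seq.upper()
    flanks = []
    start = seq.find(NOTI_SITE)
    while start != -1:
        end = start + len(NOTI_SITE)
        left = seq[max(0, start - flank_len):start]
        right = seq[end:end + flank_len]
        flanks.append((start, left, right))
        # Advance by one base so that overlapping sites are also reported.
        start = seq.find(NOTI_SITE, start + 1)
    return flanks


if __name__ == "__main__":
    toy = "ttacgatcGCGGCCGCatgcctga"  # hypothetical toy sequence
    for pos, left, right in noti_flanks(toy, flank_len=8):
        print(pos, left, right)
```

In practice the flanking sequences come from sequencing the cloned inserts rather than from scanning a known genome, so the sketch corresponds more closely to the reverse operation, that is, looking up where a sequenced flank should sit in an assembled sequence.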
One thousand human chromosome 3-specific NotI linking clones were partially sequenced (6). Among these, 249 unique clones were identified and 152 were carefully analyzed. To localize these clones, PCR, Southern hybridization, pulsed field gel electrophoresis (PFGE) and two- or three-color fluorescent in situ hybridization (FISH) were applied. In many cases, chromosome jumping was successfully used to resolve ambiguous mapping (6,13). This NotI map was compared to the chromosome 3 map based on yeast artificial chromosome clones and radiation hybrids (14), and significant differences were noted in several chromosome 3 regions. Importantly, these differences included a 3p14–p22 region that carries homozygous deletions and most likely contains tumor suppressor genes (6). These data supported earlier suggestions (13,15) that a NotI physical map can be more informative than genetic or radiation hybrid maps.
To enable a direct assessment of the value of NotI clones in genome research, high-density grids with 50 000 NotI linking clones derived from six representative NotI linking and three NotI jumping libraries were constructed. Altogether, these libraries contained nearly 100 times the total estimated number of NotI sites in the human genome. Sequencing of 20 000 NotI clones was projected to provide information linked to 10–20% of all human genes (9) and to help in the identification of new genes. Before a large-scale project was started, a pilot study was performed to validate the proposed strategy (16). In that work, 3265 unique NotI flanking sequences were generated. Analysis of the sequences demonstrated that ~50% of these clones displayed significant similarity to protein and cDNA sequences. Among these unique sequences, 1868 (57.2%) were novel, that is, not present in the EMBL or expressed sequence tag (EST) databases (similarity ≤90% over 50 bp). The work also showed a tight, specific association of NotI sites with the first exons of genes. From that NotI resource several new genes have been identified, isolated and mapped (17–22).
As the pilot experiments confirmed expectations, the sequencing of NotI clones was continued and ~22 500 unique NotI sequences were generated. This work provides the initial analysis of these data.