Pathogens are involved in a variety of human diseases. Of particular interest are emerging infections and idiopathic chronic diseases, including cancer [1]. Nevertheless, clinically important microbial pathogens in human disease are likely to be under-recognized [2]. Infectious agents have been implicated in several human tumor types and are estimated to account for approximately 20% of cancers worldwide [4]. However, this estimate reflects only a few known viral and bacterial agents, and emerging data suggest that additional, as yet unidentified microbes contribute to tumorigenesis [2]. Detection of these infectious agents could provide novel approaches to the prevention, diagnosis and therapy of human cancer.
Two kinds of approaches, candidate-based and subtractive, have previously been used to search for pathogen DNA in diseased human tissues. Consensus polymerase chain reaction (PCR) [5] and, more recently, DNA microarray-based screening [8] have been used for candidate-based identification of foreign sequences. PCR-based candidate screens have been used successfully to identify and type human papillomavirus (HPV) in cervical cancer [12], and DNA microarrays composed of oligonucleotides corresponding to conserved sequences of multiple viruses were used to identify a new xenotropic murine leukemia virus-related virus in human prostate tumor cells [8]. Subtractive methods, including representational difference analysis (RDA) [13], computational subtraction [14], and digital transcriptome subtraction [17], filter sequence data to identify non-human sequences. RDA was used to identify a herpesvirus in Kaposi's sarcoma [18], and digital transcriptome subtraction (DTS), a method based on long serial analysis of gene expression (LongSAGE) [19], was used to identify a new polyomavirus in Merkel cell carcinoma [21].
Although these methods have successfully identified microbes, they have important limitations. Candidate-based approaches can detect only known viruses or bacteria [6]. Transcript-based approaches face an additional obstacle: because mechanisms of infection vary, an agent may persist in a latent state in the host cell for years with little or no gene expression, and so escape detection by methods that rely on expressed sequences.
To overcome these limitations, we developed a pathogen discovery approach that applies computational subtraction to Digital Karyotyping (DK). DK identifies and enumerates short sequence tags to provide a comprehensive view of the genomic content of any DNA sample [22]. Because DK surveys genomic DNA rather than expressed transcripts, it does not depend on an agent being transcriptionally active. In a typical DK analysis, experimental tags obtained from a DK library are matched to a virtual tag library derived from the human genome. Although the vast majority of experimental tags are identical to predicted virtual tags, some inevitably do not match any human sequence. These unmatched tags can originate from several sources, including human sequences not yet in public databases, polymorphisms at tag sites, sequencing errors, and foreign DNA sequences that are not present in the normal human genome.
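The matching-and-subtraction step described above can be sketched in a few lines. This is a simplified illustration, not the DK-MICROBE implementation: it assumes that a tag is the fixed-length stretch of sequence immediately downstream of an anchoring site, and the anchor `CATG` and tag length of 21 bases are illustrative choices. The function names `extract_tags` and `subtract_reference` are hypothetical; only the forward strand is scanned, and sequencing errors and polymorphisms are ignored.

```python
from collections import Counter

ANCHOR = "CATG"  # assumed anchoring site (illustrative only)
TAG_LEN = 21     # assumed tag length in bases (illustrative only)


def extract_tags(seq, anchor=ANCHOR, tag_len=TAG_LEN):
    """Return the tag_len bases following each anchor occurrence (forward strand only)."""
    seq = seq.upper()
    tags = []
    pos = seq.find(anchor)
    while pos != -1:
        start = pos + len(anchor)
        tag = seq[start:start + tag_len]
        if len(tag) == tag_len:  # skip truncated tags at the end of the sequence
            tags.append(tag)
        pos = seq.find(anchor, pos + 1)
    return tags


def subtract_reference(experimental_tags, reference_seqs):
    """Count experimental tags and keep only those absent from the virtual reference library."""
    virtual = set()
    for ref in reference_seqs:
        virtual.update(extract_tags(ref))
    counts = Counter(experimental_tags)
    return {tag: n for tag, n in counts.items() if tag not in virtual}


if __name__ == "__main__":
    human_ref = "AAACATG" + "A" * 21 + "TTT"  # toy stand-in for the human genome
    foreign = "GGCATG" + "C" * 21 + "GG"      # toy stand-in for microbial DNA
    experimental = extract_tags(human_ref) + extract_tags(foreign) * 3
    print(subtract_reference(experimental, [human_ref]))
    # expected: a single unmatched tag ("C" * 21) with a count of 3
```

Any tag that survives subtraction is only a candidate: it may still reflect an unannotated human sequence, a polymorphism, or a sequencing error rather than foreign DNA.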
We developed DK-MICROBE to quantitatively evaluate DK sequences that do not match the human genome and may be derived from microbial genomic DNA. We verified the sensitivity and specificity of this approach by studying Epstein-Barr virus (EBV)-infected human lymphoblastoid cells and murine retrovirus-infected tumor xenografts. We then applied this technique to analyze brain, colorectal and ovarian tumors for viral and bacterial genomic sequences.
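One way to make the evaluation of unmatched tags quantitative is to compare them with virtual tag libraries built from candidate microbial genomes. The sketch below is a hypothetical scoring scheme, not the method used by DK-MICROBE: for each genome it reports the total number of unmatched experimental tags that match the genome's virtual tags and the fraction of those virtual tags that were observed. It assumes the same simplified tag definition as a plain DK tag sketch would (anchor `CATG`, 21-base tags, forward strand only), and the names `virtual_tags` and `score_microbes` are illustrative.

```python
def virtual_tags(seq, anchor="CATG", tag_len=21):
    """Set of tags (tag_len bases after each anchor) in seq; forward strand only.

    The anchor and tag length are illustrative assumptions.
    """
    seq = seq.upper()
    tags = set()
    pos = seq.find(anchor)
    while pos != -1:
        start = pos + len(anchor)
        tag = seq[start:start + tag_len]
        if len(tag) == tag_len:  # ignore truncated tags at the sequence end
            tags.add(tag)
        pos = seq.find(anchor, pos + 1)
    return tags


def score_microbes(unmatched_counts, microbe_genomes):
    """Score candidate genomes against tags that did not match the human library.

    unmatched_counts: dict mapping tag -> observed count.
    microbe_genomes: dict mapping genome name -> sequence.
    Returns dict name -> (total matching tag count, fraction of the genome's
    virtual tags observed). Genomes with no virtual tags are skipped.
    """
    results = {}
    for name, genome in microbe_genomes.items():
        vt = virtual_tags(genome)
        if not vt:
            continue
        hits = {t: n for t, n in unmatched_counts.items() if t in vt}
        results[name] = (sum(hits.values()), len(hits) / len(vt))
    return results


if __name__ == "__main__":
    genome_a = "CATG" + "C" * 21 + "T" + "CATG" + "G" * 21  # two virtual tags
    genome_b = "CATG" + "A" * 21                           # one virtual tag
    unmatched = {"C" * 21: 3}
    print(score_microbes(unmatched, {"genome_a": genome_a, "genome_b": genome_b}))
    # expected: {'genome_a': (3, 0.5), 'genome_b': (0, 0.0)}
```

A tag shared by several genomes is counted toward each of them here; a fuller treatment would need to resolve such ambiguous tags and normalize counts for genome size and sequencing depth.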