|Home | About | Journals | Submit | Contact Us | Français|
The chromosomal features that influence retroviral integration site selection are not well understood. Here, we report the mapping of 226 avian sarcoma virus (ASV) integration sites in the human genome. The results show that the sites are distributed over all chromosomes, and no global bias for integration site selection was detected. However, RNA polymerase II transcription units (protein-encoding genes) appear to be favored targets of ASV integration. The integration frequency within genes is similar to that previously described for murine leukemia virus but distinct from the higher frequency observed with human immunodeficiency virus type 1. We found no evidence for preferred ASV integration sites over the length of genes and immediate flanking regions. Microarray analysis of uninfected HeLa cells revealed that the expression levels of ASV target genes were similar to the median level for all genes represented in the array. Although expressed genes were targets for integration, we found no preference for integration into highly expressed genes. Our results provide a more detailed description of the chromosomal features that may influence ASV integration and support the idea that distinct, virus-specific mechanisms mediate integration site selection. Such differences may be relevant to viral pathogenesis and provide utility in retroviral vector design.
Retroviral DNA integration is catalyzed by a viral enzyme, integrase (IN), which nicks the two ends of linear viral DNA and splices them into a site in the host DNA (9, 10). This highly orchestrated reaction produces DNA sequence signatures at the virus-host junctions: the loss of usually 2 bp at the ends of the linear viral DNA and duplication of several base pairs of host DNA at the integration site. However, no gross rearrangements or deletions of either the viral or host DNAs are incurred. The integration reaction can be reproduced in vitro using purified IN and viral and target DNA model substrates. Despite the precision of the integration reaction, there are no strict host DNA sequence requirements. Nevertheless, numerous studies indicate that integration site selection is not likely to be entirely random. For example, various features of host chromosomes have been implicated in influencing integration site selection, including primary DNA sequence (7, 14, 18), DNA structure (16, 20, 24), nucleosome structure (27-31), chromatin structure (32, 36, 37, 40), and transcriptional activity (23, 33, 34, 41, 43). In addition to the passive influences of chromosomal structure, it has been suggested that retroviral integration could be actively targeted by tethering to specific, chromatin-bound host factors (5). Lastly, the IN proteins from different retroviruses produce unique in vitro integration patterns in naked DNA targets (18). These intriguing but disparate observations have not yet led to a unifying model, and the mechanisms that govern integration site selection in vivo remain obscure.
In an infected cell, retroviral DNA is organized in a preintegration complex that includes IN, as well as other viral and host components (2-4, 22). Such complexes can be isolated from infected cells, and their integration activity can be measured on naked DNA targets in vitro. However, in vivo, interactions between host chromatin and the preintegration complex likely influence site selection. The basic unit of chromatin, the nucleosome, is further assembled into chromatin fibers and more highly organized regions (“open” euchromatin and “closed” constitutive or facultative heterochromatin). Such regions are subject to dynamic structural changes during chromosome condensation, transcription, and DNA replication.
A series of studies have addressed the fundamental question of how nucleosomes might influence the retroviral integration reaction. Rather than blocking this process, wrapping of the host DNA into nucleosomes appears to distort the DNA in a way that promotes periodic integration events within the nucleosome, both in vitro and in vivo (27-32). Recognition of distorted DNA may be a fundamental feature of the IN catalytic mechanism (15, 35). Although wrapping of DNA around individual nucleosomes may enhance integration, it is unknown if the preintegration complex has free access to host DNA within higher-order chromatin or if such access depends on other factors.
Although the precise mechanisms of site selection remain elusive, recent discoveries in genomics and transcriptional profiling have provided a wealth of both technical and informational resources for studying integration site selection on a genomic scale. Use of such resources is beginning to reveal how the diverse physical and functional features of host cell chromosomes may influence integration site selection (34, 43). The human genome includes approximately 25,000 genes, comprising about 33% of all DNA sequences, and is organized into gene-dense and gene-sparse regions. A large subset of human genes (>20,000) have been more precisely defined by comparing the genomic DNA sequence with expressed RNAs (the RefSeq genes) (25, 26). Further analysis of the human transcriptome has revealed highly and weakly expressed clusters of genes, so-called RIDGEs and anti-RIDGEs, respectively (6, 39). Thus, identification of nucleotide positions of integration sites in human DNA allows their immediate characterization with respect to gene organization and provides information about possible transcriptional activity of the target site at the time of integration. In some cases, the results of such analyses can also provide information concerning the likely biochemical status of the integration site in terms of euchromatic versus heterochromatic structure (7).
Knowledge of the determinants of retroviral integration site selection is critical to understanding the early steps in retroviral replication and viral pathogenesis and may also be relevant to the design of safer retroviral vectors. The advantage of this class of vectors is that integration into the host cell genome is a normal step in the replication cycle. However, a major limitation is the possibility that integration may disrupt normal cellular functions. For example, a well-recognized mechanism for retroviral oncogenesis is the activation of a cellular proto-oncogene by a nearby integrated retroviral long terminal repeat (LTR) (promoter insertion oncogenesis) (13). This limitation was made apparent recently when 2 of 11 patients being treated for X-linked severe combined immunodeficiency disease with murine leukemia virus (MLV)-based gene therapy vectors developed leukemia induced by promoter insertion near the growth-promoting gene LMO2 (8, 11). Although these adverse events may be driven mainly by accumulation of sufficient integrations for subsequent selection, an understanding of the mechanism of integration site selection might prove useful for future vector design.
Results of recent studies have suggested that there are differences in site selection by MLV and human immunodeficiency virus type 1 (HIV-1). These findings may have relevance to both virus biology and vector design (5, 34, 43). We recently observed that, in vitro, the efficiency of avian sarcoma virus (ASV) IN-mediated integration is increased when a nucleosome-packaged DNA substrate is compacted. HIV-1 IN-mediated integration is inhibited under the same conditions (37). We also have found that an ASV vector is susceptible to chromatin-mediated silencing in HeLa cells, while a matched HIV-1 vector is highly resistant (R. A. Katz et al., unpublished data). These results support the idea that there are fundamental differences in retrovirus-host genome interactions and have led us to investigate ASV integration site selection in vivo. Here we report results from mapping and analyzing over 200 ASV integration sites in the human genome.
Human cells (HeLa) were infected with the amphotropic ASV vector (RCASACMVEGFP) (17), which was produced in DF-1 chicken cells. Supernatant from virus-producing cells was passed through a 0.45-μm-pore-size filter and added to HeLa cells with fresh growth medium and DEAE-dextran (final concentration, 10 μg/ml). The avian retroviral DNA can be integrated in mammalian cells, but they cannot support productive replication, thus limiting the infection to one round. Infected-cell DNA was harvested 48 h postinfection. The PCR-based strategy for cloning virus-host junctions was similar to that described previously (43). The concentration of isolated DNA was estimated by UV absorbance, and 2.5 μg was digested with AluI, BstEII, and EcoRI endonucleases. AluI cleaves the human genomic DNA frequently and was used to generate host-virus junction fragments (downstream LTR-host). BstEII cleaves the viral DNA just 3′ of the upstream LTR, thus suppressing internal viral amplifications. EcoRI cleavage suppressed amplification from circular viral DNA forms. Resulting DNA fragments were ligated to the AluI adaptor provided with the Universal Genome Walker kit (lontech) according to the manufacturer's recommendations. PCR was performed with one primer specific to the LTR and another to the adaptor.
The reaction mix (50 μl) contained 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 250 μM deoxynucleoside triphosphates, and 0.3 μl of AmpliTaq DNA polymerase (Applied Biosystems). Reactions were cycled as follows: 30 cycles of 94°C for 40 s, 68°C for 30 s, and 72°C for 45 s in a MultiCycler PTC 225 Tetrad (MJ Research). PCR products were diluted 50-fold and then subjected to nested PCR performed under the same conditions. Amplification products were isolated after agarose gel electrophoresis and cloned in the pCR4-TOPO plasmid, which was used to transform Escherichia coli TOP10 competent cells with the use of the TOPO TA cloning kit for sequencing (Invitrogen). Inserts from isolated clones were amplified by PCR with M13 primers complementary to the pCR4-TOPO plasmid, and the resultant products were purified and sequenced. Primers used for this study were: ASV 3′ LTR, 5′-ACC TGG GTT GAT GGC CGG ACC GTT GAT T-3′; adaptor primer, 5′-GTA ATA CGA CTC ACT ATA GGG C-3′; ASV 3′ LTR nested primer, 5′-CCT GAC GAC TAC GAG CAC CTG CAT GAA G-3′; adaptor nested primer, 5′-ACT ATA GGG CAC GCG TGG T-3′; M13 forward primer, 5′-AAA CGA CGG CCA G-3′; M13 reverse primer, 5′-CAG GAA ACA GCT ATG AC-3′.
To map integration site sequences to the human genome, we used the BLAT program (University of California, Santa Cruz, Human Genome Project Working Draft July 2003 Freeze; http://www.genome.ucsc.edu/cgi-bin/hgBlat).
An integration site was considered to be authentic only if it contained both the downstream LTR and adaptor sequence, matched the genomic location after the end of the downstream LTR (…CA), represented ≥95% identity with the genomic sequence, and matched no more than one genomic region with ≥95% identity. Integration was judged to have occurred in a gene only if it was located within the boundaries of one of the RefSeq genes (National Center for Biotechnology Information Reference Sequence Project). These sequences were designated as genes on the basis of human mRNAs and their translation products, rather than gene prediction programs. We will refer to genes that have incurred an integration event as “target genes.”
Total HeLa cell RNA was isolated with the Trizol reagent (Invitrogen) and purified with the RNeasy kit (Qiagen). For the preparation of cDNA probes, 25 μg of total RNA was reverse transcribed in the presence of oligo(dT)12-18 and aminoallyl-dUTP. The cDNAs were labeled with N-hydroxysuccinimide ester linked to either Cy3 or Cy5 dye, respectively (Amersham Biosciences). The labeled probes were purified using the QIAquick PCR purification kit (Qiagen).
The DNA microarray chips were prepared at the Fox Chase Cancer Center Microarray Facility with a set of 15,552 human oligodeoxynucleotides (MWG Biotech). The DNA was spotted onto polylysine-coated glass slides by using the Omnigrid arrayer (Analytical Instruments). The processing of the slides and the hybridization reaction were performed essentially as described elsewhere (http://cmgm.stanford.edu/pbrown/mguide/).
To examine the level of relative expression of ASV target genes, we used two data sets obtained with HeLa cells (H55 and H56). Genes with intensities greater than 2 standard deviations above the background were considered expressed. This assignment of threshold values for expressed versus nonexpressed genes is arbitrary. For analyses of these data sets, we set this threshold value low, resulting in scoring a majority of genes as expressed. This approach appeared to be most appropriate for examining the relationship between the level of gene expression and integration site selection. The H55 data set contained 13,750 expressed genes out of the total of 15,552, with a range of expression levels from 81 to 61,000 arbitrary units. The H56 data set contained 11,251 expressed genes, with a range of expression values from 83 to 65,000 arbitrary units. Of the 50 ASV target genes represented in the library, 2 and 11 fell in the nonexpressed category for the H55 and H56 data sets, respectively.
We also analyzed two public (GEO) HeLa cDNA array data sets (GSM2145 and GSM2177; http://www.ncbi.nlm.nih.gov/geo/). In these data sets, the background and background standard deviation levels were not available and thus we were unable to analyze these in a way similar to our own data sets. The platform for the GSM2145 and GSM2177 data sets used 9,985 genes, and 34 ASV target genes were represented. The ranges of expression levels for the GSM2145 and GSM2177 data sets were 123 to 50,000 and 104 to 34,000 arbitrary units, respectively.
Median expression values were obtained for target genes versus all genes. For this analysis, all four data sets were used (GSM2145, GSM2177, H55, and H56). The median expression levels of target genes and all genes were expressed as a ratio of target genes to all genes for each pair of data sets.
The GenBank accession numbers for 226 integration sites are AY653309 through AY653534.
We used an ASV vector pseudotyped with the murine amphotropic envelope protein to allow one-step infection of HeLa cells (1, 17). Cells were isolated 48 h postinfection to minimize selective forces that might bias the analysis of integration site selection. Cellular genomic DNA was isolated, and virus-host DNA junctions were amplified by a linker-mediated PCR method (43). The PCR products were cloned and sequenced, and the locations of the integration sites were determined using the human DNA sequence (19, 38). A total of 226 integration sites were analyzed. Mapping of these sites revealed that integration events were distributed over all 23 human chromosomes (Fig. (Fig.1).1). The number of integration events per chromosome was generally proportional to chromosome size (Fig. (Fig.11 legend), validating our methods and suggesting that there were no global restrictions for integration site selection. Furthermore, we found no evidence for integration hotspots as observed for HIV-1, by using the criteria described previously (34).
We next determined the frequency with which integrations occurred into RNA polymerase II genes (protein-encoding genes), by using human RefSeq genes (25, 26) as a criterion. (We refer to genes that have incurred an integration event as target genes.) The results showed that 95 of 226 (42%) integration events occurred in the defined RefSeq gene set (n = 21,404) (Table (Table1).1). The use of the RefSeq gene set allowed us to compare our results with those reported by Wu et al. (43), who also used RefSeq genes to map MLV (and HIV-1) integration sites in HeLa cells. The percentage that we observed (42%) was somewhat higher than the published value for MLV (34.2%) (43). However, the RefSeq gene assembly that we used (July 2003) is slightly larger than the assembly used in the study by Wu et al. (43) (November 2002). We therefore calculated extrapolated values that would normalize the two studies, so that a more accurate comparison could be made (Table (Table1).1). With this correction, the percentages were quite similar (42 versus 40.2%), and both values were significantly higher than the value determined for random integration (26.3%) (Table (Table1).1). We also recalculated our values using the November 2002 RefSeq gene assembly, yielding a value of 39.8% integrations into genes for ASV. This value can be directly compared to the 34.2% value reported for MLV (P = 0.134) and the 22.4% value for random integration (P = 3.1 × 10−9) (Table (Table1).1). By these criteria, we conclude that the observed 42% integration into RefSeq genes with the ASV vector is greater than that predicted for random integration (Table (Table1)1) and that the value is similar to that observed with MLV.
A concern in determining the absolute frequencies of integration events in genes is that the percentage of genes in HeLa cells may not correspond to the percentage represented in the human genome sequence, due to gene amplifications or deletions. To address this issue, we have calculated the number of genes and total size of the HeLa cell genome size by using the detailed karyotype (21) and the NCBI Map Viewer. We found that the ratio between the genome size and the gene number in HeLa cells is equivalent to that of the human genome.
Our conclusion that ASV integration favors genes is also highly dependent on the frequency measured for random integration into RefSeq genes. As mentioned previously, this frequency was experimentally determined using computer-simulated integration events (42). In an attempt to confirm this value, we used a second computational method for determining the frequency of random integration. We calculated the percentage of RefSeq gene sequences in the total human genome sequence; however, this calculation yielded a higher value for random RefSeq integration than was reported. We note that our calculation is subject to biases including, but not limited to, an underestimate of the total human genome size (e.g., due to estimates of unsequenced gap sizes) and the relatively high percentage of overlapping gene sequences that were not taken into account. We therefore rejected this calculation, but note that our results regarding the apparent preference for ASV integration into genes should be viewed with caution, as we have not calculated a random integration value independently. Despite these uncertainties with respect to random integration, we can clearly identify statistically distinct similarities and differences with MLV integration site selection (Table (Table1)1) and HIV-1 selection (as described below).
Two previous studies have indicated that genes are highly favored for HIV-1 integration. In the first high-throughput study of HIV-1 integration site selection, Schröder et al. (34) reported that 69% of HIV-1 integration events occurred within genes (in SupT1 cells), compared to an experimental in vitro control for random integration (35%). In this study, the “gene” designation was not limited to RefSeq sequences, and therefore we cannot directly compare these findings with our ASV results. However, Wu et al. (43) reevaluated the results of Schröder et al. (34) and determined that 61% of the HIV-1 integrations were located in RefSeq genes (Table (Table1).1). These authors also surveyed integration sites in HeLa cells, independently, by using an HIV-1 vector and found that 50% of integrations were in genes (43) (Table (Table1).1). Taken together, these earlier studies and our own suggest that, although all three viruses favor integration into genes, HIV-1 integration displays a much stronger preference for such sequences than does either ASV or MLV.
Previous cell sorting experiments indicated that expression of the human cytomegalovirus immediate-early promoter-driven green fluorescent protein (GFP) reporter in our ASV vector was subject to rapid chromatin-based, epigenetic silencing in a large subset of infected HeLa cells (Katz et al., unpublished). One interpretation of these observations is that integration within genes might promote GFP expression, as such regions (e.g., euchromatin) are accessible to transcription factors. Correspondingly, integration into heterochromatic, nongenic regions might promote silencing. We therefore prepared a second integration site library from cells that had been sorted for high expression of GFP (data not shown). Analysis of a small data set of 22 integration events showed that 36% were within genes in this population; this value is comparable to the 42% observed with unsorted cells. Although limited, this analysis suggests that high ASV reporter gene expression is not correlated with integration into gene sequences.
The results reported previously with MLV indicated that transcriptional start sites were favored for integration by this virus, with 20.2% occurring within a window of ±5 kb. With ASV, we also found a statistically significant bias for integrations in this window (8.4%) compared to the calculated random events (2.1%) (P = 0.0044) (Table (Table2).2). A similar value (10.8%) was also reported for HIV-1 (Table (Table2)2) (43). However, when we compared the distribution of ASV integration sites along the length of genes, including the 5-kb upstream and downstream windows, we observed no bias for transcriptional start sites (Fig. (Fig.2).2). As ASV does not appear to favor transcriptional start regions, we conclude therefore that ASV and MLV integration site selection preferences are distinct in HeLa cells.
As CpG islands are frequently associated with transcription start sites, we determined the percentage of integrations within 1 kb of these elements. In contrast to the 16.8% observed with MLV, only 3.1% of the integration events occurred in this window, and this value is close to that predicted for random integration (Table (Table2)2) (43). However, our sample size may be too small to distinguish the difference from a random value (P = 0.869). We also determined the frequency of ASV integrations into various features of human chromosomes, including satellite repeat elements, retrotransposons, and endogenous retroviruses (Table (Table3).3). The results showed that the relative number of integrations generally paralleled the target size of each element. We found that satellite DNA was a less favored target, as was observed for HIV-1 (7, 34).
Because the open chromatin state of active genes (during access by transcription factors or engagement with actively transcribing RNA polymerases) may also allow access by the viral preintegration complexes, a simple hypothesis is that transcriptionally active genes may be preferred target sites for integration. Having observed that ASV DNA integration occurred with a higher-than-expected frequency within genes, we asked if integration could be correlated with transcriptional activity at the time of infection. The ASV target genes that we identified included housekeeping-type genes, as well as tissue-specific genes not expected to be expressed preferentially in HeLa cells.
To examine the expression of target genes in detail, we analyzed the transcriptional activity of uninfected HeLa cells by microarray analysis. Of the 95 target genes, 50 were represented in the test array. HeLa gene expression levels were collected into groups based on increments of 500 arbitrary units, and the 50 target genes fell into six bins (closed triangles, Fig. Fig.3).3). As shown in Fig. Fig.3,3, analysis of these data showed that most target genes were expressed at levels above background, but no bias for integration into highly expressed genes was observed. The ratio of the median expression values of ASV target genes to all genes was 0.99 (an average of duplicate data sets). This comparison indicates that target genes are expressed at average, rather than high, levels. In an independent test of these results we also analyzed two HeLa cell microarray data sets from a public database that was also used for the previous MLV study (43). Of the 95 genes that were targets for integration, 34 were available in this database (some common to our platform). In this analysis, the ratio of median expression values of target genes to total genes was ca. 0.97, confirming that there was no preference for integration into highly expressed genes. We applied the Wilcoxon test (testing the same hypothesis as a t test) to all four data sets and found that we could not distinguish the median expression levels of target genes and all genes (P ≥ 0.497).
To further assess the relationship between transcriptional activity and integration site selection, we reanalyzed two data sets by sorting the genes into bins based on their expression levels and then determined the number of integration events in each bin. As shown in Fig. Fig.4,4, the number of integrations was generally proportional to the number of genes per bin. Based on our sampled genes, we conclude that there is no strong bias for ASV integration into highly active genes. Instead, integration appears to be random with respect to transcriptional activity of genes.
We also compared the number of integration events that occurred in chromosomes that contain highly expressed gene clusters (6, 39) (RIDGEs on chromosomes 1, 11, 16, and 19) with the number that occurred in those with poorly expressed gene clusters (anti-RIDGEs on chromosomes 4, 8, 13, and 18) (39). The collective target sizes for these two chromosome sets were similar, and we found that the total number of integration events in each chromosome set was nearly equivalent (37 versus 36 events, respectively). Strikingly, only 4 of 226 (1.8%) integration events occurred into chromosome 19 (Fig. (Fig.1),1), which is rich in highly expressed genes; this is in contrast to the much greater percentage observed for HIV-1 (34).
In this study, we investigated integration site selection after infection with an ASV-based retroviral vector. We used a vector system that was engineered for infection of human cells (1, 17), which allowed identification of target sites in the human genome and determination of the transcriptional states of the target genes prior to integration. The general experimental design and the mapping methods were as described previously and shown to be free from significant biases (34, 43). For ASV, we found that, in general, integration sites were not restricted to any particular regions or known chromosomal features, as measured by several parameters. One limitation of our study is the use of heterologous human host cells, which was dictated by the availability of human genome sequence and the ability to perform transcriptional profiling. If site selection is influenced by specific interactions between the avian retroviral preintegration complex and chicken-specific host factors, such targeting would be missed in our system. However, the fact that authentic and apparently efficient ASV integration occurs in mammalian cells argues against the existence of critical species-specific interactions. We note that our conclusions regarding global accessibility of integration sites are generally consistent with an earlier study that used a primer extension method to survey ASV integration sites in turkey cells (42). This study detected local integration hotspots for ASV, likely due to the ability to measure a larger number of events. With the noted caveats, our study provides new information regarding how broad chromosomal features and transcriptional activity affect ASV integration site selection and provides a useful comparison with MLV and HIV-1.
Our results suggest that ASV integration favors genes (e.g., 39.8 versus 22.4% expected for random integration) (Table (Table1).1). Schröder et al. (34) and Wu et al. (43) reported that genes are favored targets for both HIV-1 and MLV integration, with HIV-1 showing a stronger preference than both ASV and MLV (Table (Table1).1). The bias for HIV-1 integration into genes was especially pronounced in genes activated in response to HIV-1 infection (34). The preference for ASV (and MLV) integration into genes is modest and is dependent on the measurement of random integration, as discussed above. Another method for determining the random integration frequency is to perform in vitro integration reactions on naked cellular target DNA (34); however, this method could produce other biases. The measured preferences for integration into genes by ASV and MLV might decrease or increase with future analyses, based on more accurate values for random integration. However, in a comparative sense, we can more definitively conclude that ASV and MLV show similar frequencies of integration into genes, as the same cell type, gene definition (RefSeq), and value for random integration were used in the two studies.
The previous MLV studies revealed a unique bias compared with HIV-1; gene promoter regions were five times more likely than random to be targets for MLV integration, and it is possible that this tendency may be relevant to activation of the LMO2 gene in human patients during retroviral gene therapy (43). With respect to targeting over the length of genes, we found that, in contrast to MLV, ASV does not show a bias for transcriptional start sites (Fig. (Fig.22).
Transcriptional analysis of ASV target genes revealed that their expression values fell very close to the median for all analyzed genes. Further analyses showed that integration site selection is essentially random with respect to the intensity of transcriptional activity of target genes (Fig. (Fig.33 and and4).4). It is possible that the preference for ASV integration into genes may reflect increased accessibility associated with some baseline transcriptional activity. Our results show that, although there is a bias for ASV integration into genes, these genes are expressed at average levels; highly active genes (greater than twofold above the median expression level) are not favored (Fig. (Fig.4).4). Recently, it was reported that highly active transcription may inhibit ASV integration (41), and our results are not inconsistent with this proposal. In contrast to our findings with ASV, a positive correlation between high transcriptional activity and integration site selection was seen with HIV-1 (34).
Results from our study together with previous reports (34, 43) suggest that the measurable differences in integration preferences of HIV, MLV, and ASV vectors may result from differences in the integration site selection mechanisms of these retroviruses. Determinants of selection might include virus-specific properties of IN proteins (18, 37), preintegration complexes, or virus-specific interactions with cellular cofactors (4, 14). The yeast retrotransposons Ty3 and Ty5 have evolved mechanisms of site selection that depend on physical interaction between components of the integration complex and different cellular factors bound to DNA, resulting in different targeting specificities (reviewed in reference 5). Although retroviral integration appears to lack specificity with respect to target DNA sequence, similar host factor-based targeting mechanisms may play a role in site selection (5). If such targeting mechanisms are indeed operative, they may be tissue or cell type specific. For example, putative chromatin-bound targeting proteins might be expected to be expressed differentially in various cell types. Furthermore, it is possible that the lack of targeting to transcription start sites that we observed could be due to interspecies infection.
Elaboration of the determinants of site selection will be necessary to completely describe the early events in retroviral replication; with such knowledge, virus-specific differences may be incorporated into retroviral vector design. The predominant retroviral vectors currently in use are based on MLV and HIV-1. It is possible that virus-specific features of integration site selection (such as an apparent lack of preference for transcriptional start sites, highlighted here for ASV) may provide practical advantages (43). We (17) and others (12) recently demonstrated that transduction by ASV vectors is not limited to dividing cells, further highlighting the prospect of exploiting a broader array of retroviral vectors for gene therapy in differentiated cells. Thus, the choice of retroviral vectors might be tailored for specific needs. It has also been suggested that preferences for gene targeting might be exploited for insertional mutagenesis (34).
Beyond providing a more detailed description of the early events of replication, and the possible implications for vector design, our results contribute to an understanding of how genomes are shaped in evolution, as a large percentage of the human genome comprises integrated retroviral sequences (19, 38). Technical refinements, which should eventually allow higher-throughput analysis of integration sites, may allow use of retroviral integration as an in vivo probe of chromatin structure and genome organization, providing an experimental model for genome shaping.
After the acceptance of our manuscript, similar findings on ASV integration site selection were reported by Mitchell et al. (R. S. Mitchell, B. F. Beitzel, A. R. Schröder, P. Shinn, H. Chen, C. C. Berry, J. R. Ecker, and F. D. Bushman. PLoS Biol. 2:E234, 2004).
We are grateful to Peter Adams, Glenn Rall, and Ken Zaret for critical comments on the manuscript. We acknowledge the following Fox Chase Cancer Center Core Facilities and individuals for excellent technical assistance: Automated DNA Sequencing Facility (Anita Cywinski), Biostatistics Facility, DNA Microarray Facility (Ketaki Datta), and The Fannie E. Rippel Biochemistry and Biotechnology Facility. We also thank Marie Estes for excellent assistance in preparing the manuscript.
This work was supported by National Institutes of Health grants AI40385, CA71515, CA06927, AI30544, and AI48046 and also by an appropriation from the Commonwealth of Pennsylvania. Partial support for R.A.K. was provided by a grant from the American Cancer Society (IRG-92-027-05).
The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute or any other sponsoring organization.