Viral and transposon vectors have been employed in gene therapy as well as in functional genomics studies. However, the goals of gene therapy and functional genomics are entirely different: gene therapists hope to avoid altering endogenous gene expression (especially the activation of oncogenes), whereas geneticists want to alter the expression of chromosomal genes. The odds of either outcome depend on a vector's preference for integrating into genes or control regions, and these preferences vary between vectors. Here we discuss the relative strengths of DNA vectors compared with viral vectors, and review methods to overcome the barriers to delivery inherent to DNA vectors. We also review the tendencies of several classes of retroviral and transposon vectors to target DNA sequences, genes, and genetic elements, with respect to the balance between insertion preferences and oncogenic selection. In principle, knowing the variables that affect integration for various vectors will allow researchers to choose the vector with the most utility for their specific purposes. Elucidating the factors that affect integration preferences offers three principal benefits: in gene therapy, it allows assessment of the overall risk of activating an oncogene or inactivating a tumor suppressor gene, events that could lead to severe adverse effects years after treatment; in genomic studies, it allows one to discern random from selected integration events; and in both gene therapy and functional genomics, it facilitates the design of vectors that are better targeted to specific sequences, which would be a significant advance in the art of transgenesis.
Elements such as viruses and transposons, through evolution with their host organisms, have acquired the ability to integrate into host genomes and ultimately shuffle genetic material between organisms. These elements have an established history in molecular biology and genetics research because of their ability to deliver specific genetic cargo, randomly disrupt host genomes for genetic screens, and serve as vectors for delivery of therapeutic expression cassettes to treat human disease. Viral vectors have been the predominant tools for these applications for three reasons: the ease and efficiency with which specific viral genetic cassettes can be introduced into cells; the vast accumulated knowledge of viruses and their mechanisms of gene transfer into chromosomes; and the large number of sites in genomes into which they can integrate. Retroviruses in particular have been used both for random insertion into chromatin to interrupt host genes (insertional mutagenesis), and thereby identify their functions [1-3], and for delivery of therapeutic genes [4-6]. Moreover, viral activation of oncogenes and, more recently, inactivation of tumor suppressors have been used to discover several novel genes that are involved in cancer progression [7-12]. Insertional activation of host cell oncogenes by viral vectors, however, has emerged as a major risk in gene therapy, with a few cases of leukemia arising from oncogene activation by therapeutic vectors [13,14]. The potential genetic consequences of insertions of integrating vectors are summarized in Figure 1.
Activation of oncogenes in mice by insertionally mutagenic retroviruses suggested that inadvertent oncogene activation by relatively benign therapeutic vectors is a potential risk of gene therapy. Gene therapy vectors are extensively minimized to eliminate their replicative potential and reduce their collateral effects on the target genome. However, extensive testing in animals demonstrated that the risk of oncogenic activation was real, although variable and dependent on the viral vector used, the genetic cargo, and the genetic background of the model system [16-22]. Given what was judged to be an acceptable risk, retroviral gene therapy trials have been conducted in human patients. Nearly 1,000 clinical gene therapy trials have been initiated, more than half with retroviral vectors, but as yet no vectors have been approved in the USA for clinical gene therapy outside the clinical trial setting. (Gendicine, an adenovirus designed to restore p53 function in cancerous cells, has been approved for commercial human gene therapy in China, although this vector is essentially nonintegrating and thus carries a lower risk of oncogene activation via vector insertion.)
Oncogene activation, the worst fear of the gene therapy field, was realized when three of more than 20 patients treated for X-linked severe combined immunodeficiency disease (X-SCID) developed leukemia. These adverse events, including one death, occurred 3 years or more after administration of therapeutic murine leukemia virus (MLV)-derived retroviral vectors [25,26]. The link between treatment and leukemia could be inferred because the expanded, transformed cell populations harbored clonal integrations of the therapeutic vector, which suggested biologic selection for the retrovirus-induced mutation [27-30]. However, these studies also indicated that clonal expansions in some cases appeared to be temporary and did not always lead to adverse effects; such expansions could actually improve the likelihood of successful gene therapy. The cause of at least two of the leukemias appears to be insertion of the MLV vector close to the LMO2 oncogene, leading to LMO2 activation by enhancers in the long terminal repeat (LTR) sequences of the vector [31-33]. Retrospective examination of the role of LMO2 during development supported this conclusion [34,35]. Subsequent studies in which the cargo gene IL2γc was over-expressed in mice (albeit at levels higher than in the X-SCID leukemia patients) suggested that this gene could itself act as an oncogene in T cells. Also, simultaneous activation of IL2γc and LMO2 by oncogenic retroviruses had been observed in one mouse, suggesting a possible genetic interaction between the cargo IL2γc gene and LMO2. The relevance of these observations to the clinical cases, however, is highly debatable [37,38].
In contrast, other gene therapy trials that employed retroviral vectors to treat adenosine deaminase deficiency [39-41] and chronic granulomatous disease (CGD) have not yet reported any equivalent adverse events. In the CGD study, there appeared to be powerful selection for integration events of the spleen focus-forming virus vector, which also was used as a vector for X-SCID, into the neighborhoods of three previously identified genes, namely MDS-EVI1, PRDM16, and SETBP1, which have been associated with enhanced proliferation following integration of retroviruses with activating LTRs [44-46]. As noted previously, findings of preferential integration around certain genes are not necessarily due to a vector preference for these genes; they may instead be a consequence of clonal expansion, which can be transient and thereby beneficial in terms of enhancing the number of therapeutic cells. A similar effect has also been observed in nonhuman primate studies, indicating that this result may not be unique. Despite the striking incidence of common integration sites that are often associated with tumor or leukemia formation [8,47,48], there has been no report of adverse events in the CGD patients and no indication that the corrective gene, gp91phox, synergizes with any of the three common integration site genes to promote growth. Likewise, a murine stem cell retrovirus has been used to deliver the α and β chains of the anti-MART-1 T-cell receptor ex vivo into peripheral blood lymphocytes to treat melanoma without any apparent adverse effects, although integration sites were not examined and the patient population had low odds of survival even with treatment (two of 15 patients survived for more than 1 year).
Taken together, the results of the CGD, X-SCID, and adenosine deaminase SCID trials demonstrate that oncogenesis is not necessarily an inherent, inevitable side effect of gene therapy. Of the more than 20 patients treated, more than 80% have had their genetic deficiencies fully corrected, allowing them to lead normal lives. However, tumors and leukemias can take years to manifest, and these trials are in their early years. A clearer understanding of the variables that underlie oncogenesis is needed to increase the safety of these trials. These variables include the insertion site preferences of therapeutic vectors, their ability to activate nearby genes, and interactions between specific genetic cargos and activated host genes. Although cargo-host interactions will be specific to each gene therapy approach, the vectors themselves govern the other parameters, namely insertion preference and activation of neighboring genes. Analyses of insertion preferences, in particular, have received much recent attention and have sparked interest in the use of transposons as alternatives to viruses as gene therapy vectors.
Transposable elements have also been used for insertional mutagenesis and genetic studies in model organisms, and are being developed as gene therapy agents in humans [50-53]. The best characterized DNA transposon vector used in mammals is the synthetic Sleeping Beauty (SB) transposon system, which over the past decade has become a powerful tool in functional genomics to identify genes in vertebrates, including fish and mammals [55-61]. Transposon-mediated gene transfer has been explored for gene therapy because it avoids several disadvantages of viral delivery systems: (1) the preference of viruses for integrating into genes [62-65]; (2) the difficulty of purifying them to eliminate toxic or infectious agents; (3) their potential to elicit unwanted immune or inflammatory responses [67,68]; (4) the constraint on therapeutic cargo size; and (5) the difficulty and expense of producing them in large quantities [69,70]. In contrast to viral vectors, preparations of nonviral plasmid-based transposon vectors are relatively inexpensive to purify, are largely nonimmunogenic, and have no hard constraints on the genetic sequences that can be delivered.
A drawback of DNA vectors is that they are more difficult to deliver. Delivery of nonviral DNA into mammalian genomes involves avoiding or traversing numerous barriers, including enzymes in the blood and cellular environments, the endothelial lining of vessel walls, cellular plasma membranes, endosomal membranes, nuclear membranes, and chromosomal integrity.
Delivery approaches work at three scales: the nanoscale, the microscale, and the macroscale. Nanoscale delivery involves particles or complexes that are most often designed to be about 100 nm or less in diameter, although sizes up to 1 μm fit into this category. The nanoscale approach comprises delivery of single or small numbers of DNA molecules, which are most often condensed by polycationic polymers (for example, polylysine and other modified amino acids, and various linear and branched forms of polyethylenimine, among others) or lipids, with or without various ligands (for review, see the report by Wagner and coworkers). Some polycationic complexes are cytotoxic or unstable in the blood, problems that can be circumvented by encasing the complexes in polyethylene glycol. The alternative delivery routes operate at the microscale and macroscale: DNA in packages up to 10 μm is phagocytized (microscale), whereas DNA can also enter cells via fusion with other cells or entities larger than 10 μm (macroscale).
The most effective method demonstrated for in vivo gene transfer and expression in mouse hepatocytes has been simple infusion of naked plasmid DNA under increased pressure. This can be accomplished by hydrodynamic delivery of DNA using high-pressure/high-volume injection [74,75]. In mice, the procedure involves injecting a large volume (10% volume/weight) of DNA/saline solution through the tail vein in less than 10 seconds. The injection expands and ruptures the liver endothelium, which in mice heals within 24 to 48 hours, and results in uptake of infused DNA into as many as 10% of hepatocytes in test animals [74,75]. Achieving a clinically feasible method of local delivery to the liver in large animals, including humans, is a challenge that is being addressed by more localized hydrodynamic delivery using specialized catheters or pressure cuffs [77,78]. On the microscale, condensing DNA with polyamines such as polyethylenimine into complexes small enough to be taken up by cells into endosomes has been studied intensively [79,80]. Our findings (Hackett PB, Podetz-Pedersen K, Bell JB, McIvor RS, unpublished data) suggest that gene expression following hydrodynamic delivery is about 100-fold higher than that following delivery with polyethylenimine [81,82] and only about 10-fold to 100-fold lower than that following viral delivery to the liver. Ex vivo delivery by electroporation is an alternative under development and has been achieved in hematopoietic stem cells.
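The 10% volume/weight rule translates directly into injection parameters. The short calculation below is a sketch, not a protocol: it assumes that 1 g of body weight corresponds to 1 mL of solution and uses a hypothetical 20 g mouse, and the function name `hydrodynamic_injection` is our own.

```python
def hydrodynamic_injection(body_weight_g, volume_fraction=0.10, max_duration_s=10.0):
    """Return (injection volume in mL, minimum mean flow rate in mL/s).

    Assumption: 1 g of body weight ~ 1 mL, so a 10% volume/weight dose
    is 0.10 mL of DNA/saline solution per gram of body weight.
    """
    volume_ml = body_weight_g * volume_fraction
    # Delivering the whole volume within max_duration_s needs at least this mean rate.
    min_rate_ml_per_s = volume_ml / max_duration_s
    return volume_ml, min_rate_ml_per_s


volume, rate = hydrodynamic_injection(20.0)  # hypothetical 20 g mouse
print(f"{volume:.1f} mL in under 10 s (at least {rate:.2f} mL/s)")
```

For a 20 g mouse this gives about 2 mL delivered at a mean rate of at least about 0.2 mL/s, which is why the method is described as high volume and high pressure.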
Since the development of the SB system, nonviral integrating DNAs have established themselves as potential vectors for gene therapy. Following hydrodynamic delivery, transposons have been used in mice to cure hemophilias A and B [84-87] and tyrosinemia type I [88,89]. Other somatic delivery methods have been used to ameliorate a blistering skin disease (junctional epidermolysis bullosa), to retard glioma xenografts [91,92], to produce Huntingtin protein in a model of Huntington disease, and to provide a preventive treatment for lung allograft fibrosis. Based on the findings summarized above, we estimate that only about one in 10,000 SB transposons delivered to liver or lung actually transposes into chromatin (Hackett PB, unpublished data). Although this is a small fraction, it is possible to deliver more than 10⁸ therapeutic cassettes to an animal and thereby treat as many as 10% to 20% of liver cells with a single injection of plasmids [84,88,95]. This procedure is sufficient to cure diseases such as hemophilia and tyrosinemia type I, and to ameliorate others such as mucopolysaccharidoses types I and VII. Although the number of transposon insertions per cell has not been quantified, because of the difficulty of cloning insertion sites from the mostly nondividing cells in most organs of animals, the expression data are consistent with a single integration in most, if not all, transgene-expressing cells.
In addition to SB, several other transposon vectors and phage integrase-based vectors have been tested for their potential to deliver therapeutic genes, including Frog Prince, Tol2, and piggyBac, as well as other well characterized transposons such as the Drosophila P-elements, which are not mobilized very efficiently in mammalian cells. These vectors differ in their efficiency of gene insertion, genetic cargo capacity, integration site preferences, and effects on chromosomal stability. Among their other advantages over retroviruses as gene therapy vectors, transposons present a wide variety of insertion site preferences that differ from those of retroviruses, with possible consequences for oncogene activation. The characteristics of these vectors are summarized in Table 1. The remainder of this review discusses these differences as they relate to gene therapy and functional genomics.
Although most vectors integrate into a vast number of sites scattered throughout the genome, numerous studies have shown that these integrations are not random with respect to several variables. Global preferences for vector integration can be governed by large-scale genomic context, such as the coding and regulatory regions of genes and their transcriptional status, as compared with intergenic regions. The fine tuning that determines specific sites of integration is governed by smaller-scale physical features, such as the specific nucleotide sequences surrounding insertion sites and the DNA structural characteristics derived from these sequences. Figure 2 illustrates some of the physical features of DNA that are influenced by local sequence.
Viruses and transposons vary widely in their preference for genes and transcriptional units. Several studies have mapped hundreds to thousands of insertions into human or mouse genomes and correlated insertion positions with known genes. Many retroviruses exhibit a nonrandom preference for genes. This could be due to greater accessibility of the DNA in 'open' chromatin or to interaction of integrase enzymes with cellular factors bound to transcriptional regulatory elements. In the case of HIV, the transcriptional coactivator LEDGF/p75 may act as a tether between the integrase and transcriptionally active chromatin [100-102], an idea similar to one proposed previously for designer targeting of integrating vectors [103-105]. Using a similar mapping approach with the SB transposon, Yant and coworkers found that SB exhibited a much lower (although nonrandom) preference for genes. Although a preference for transcriptional units might seem beneficial for functional genomics studies, the many recently identified noncoding RNA genes (as well as other genes with RNA products, such as those encoding rRNAs and tRNAs) involved in gene regulation may not be targeted by viral vectors that preferentially integrate into or near protein-coding genes. Targeting of these noncoding RNA genes by various vectors in gene therapy, and any resulting deleterious effects, have not been extensively examined.
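Correlating insertion positions with genes comes down to asking whether more insertions fall inside genes than random integration would predict. The sketch below assumes a single chromosome, hypothetical half-open gene intervals, and a simple null model in which the expected fraction equals the share of sequence covered by genes; published analyses generally use matched random control sites instead, and the function names are illustrative.

```python
from bisect import bisect_right


def fraction_in_genes(insertions, genes):
    """Fraction of insertion coordinates that fall inside any gene interval.

    `genes` must be sorted, non-overlapping, half-open (start, end) intervals
    on a single chromosome.
    """
    starts = [start for start, _ in genes]
    hits = 0
    for pos in insertions:
        i = bisect_right(starts, pos) - 1  # last gene starting at or before pos
        if i >= 0 and pos < genes[i][1]:
            hits += 1
    return hits / len(insertions)


def expected_fraction(genes, chrom_length):
    """Fraction expected under uniform random integration (gene coverage)."""
    return sum(end - start for start, end in genes) / chrom_length


genes = [(1_000, 3_000), (6_000, 7_000)]   # hypothetical annotation
insertions = [1_500, 2_500, 6_500, 8_000]  # hypothetical mapped sites
observed = fraction_in_genes(insertions, genes)
expected = expected_fraction(genes, 10_000)
print(observed, expected, observed / expected)
```

In this toy example 75% of insertions fall in genes that cover 30% of the sequence, a 2.5-fold enrichment; a vector with only a weak preference for genes would give a ratio much closer to 1.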
Many vectors also appear to prefer specific genes. In insertional mutagenesis studies, recurrent viral insertions into a specific group of genes were taken to mean that viral activation of these putative oncogenes in individual cells led to clonal expansion within a pool of cells in which every host gene had been an equal target for integration (as discussed above for LMO2). However, when MLV insertions were mapped in HeLa cells that had not undergone any type of selection, oncogenic or otherwise, many of these same genes harbored recurrent integrations, suggesting that vectors may inherently target specific genes. The basis of this targeting is not understood, but it may be similar to the tethering mechanism discussed above for HIV.
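Whether recurrent hits in an unselected population imply inherent targeting can be framed with a simple null model. Assuming uniform random integration, the number of insertions in a gene of length L after N insertions into a genome of size G is approximately Poisson with mean λ = N·L/G. The numbers below (1,000 insertions, a 50-kb gene, a genome of about 3 × 10⁹ bp) are hypothetical, and the function name is illustrative.

```python
import math


def prob_at_least_k_hits(n_insertions, gene_length, genome_size, k=2):
    """P(a gene receives >= k insertions) if integration were uniformly random.

    Poisson approximation with mean lambda = n_insertions * gene_length / genome_size.
    """
    lam = n_insertions * gene_length / genome_size
    p_fewer = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - p_fewer


p = prob_at_least_k_hits(1_000, 50_000, 3e9)
print(f"P(>=2 hits in this gene by chance) = {p:.2e}")
```

Summing this probability over all genes gives the number of recurrently hit genes expected by chance; recurrent hits well in excess of that number, in cells that have not undergone selection, point to inherent targeting rather than clonal expansion.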
In addition to general preferences for genes, many viral vectors, including retroviruses, lentiviruses, and adeno-associated virus, preferentially target transcriptional units or their promoters. MLV retroviruses prefer to integrate close to transcriptional start sites [64,65,108-111], a problematic trait considering that MLV-based vectors are the most commonly used vectors in human gene therapy. HIV and adeno-associated viruses prefer entire transcriptional units [100,108,111-113] (see Note added in proof, below), in contrast to MLV, which targets only the region proximal to promoters. Additionally, expression array studies have shown that HIV prefers transcriptionally active genes and avoids chromatin regions in which transcription is repressed.
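A preference for promoter-proximal integration is usually summarized as the fraction of insertions within some distance of a transcription start site (TSS). The sketch below assumes a single chromosome, hypothetical coordinates, and an arbitrary ±5-kb window; published studies use a range of windows and compare against random control sites.

```python
from bisect import bisect_left


def fraction_near_tss(insertions, tss_positions, window=5_000):
    """Fraction of insertions within +/- `window` bp of any TSS on the same chromosome."""
    tss = sorted(tss_positions)
    near = 0
    for pos in insertions:
        i = bisect_left(tss, pos)
        # The closest TSS is tss[i - 1] or tss[i], whichever exist.
        neighbours = tss[max(i - 1, 0):i + 1]
        if neighbours and min(abs(pos - t) for t in neighbours) <= window:
            near += 1
    return near / len(insertions)


tss_sites = [10_000, 50_000]              # hypothetical TSS coordinates
mapped = [9_000, 12_000, 30_000, 56_000]  # hypothetical insertion sites
print(fraction_near_tss(mapped, tss_sites))
```

An MLV-like profile would show a much higher fraction than random sites, whereas an HIV-like profile, spread across entire transcriptional units, is better captured by a gene-body measure.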
In contrast to these viral vectors, SB transposons and avian leukosis virus (a retrovirus) apparently have only a slight preference for either transcriptional units or their regulatory elements [106,115], with little or no preference for transcriptionally active genes. In one survey, SB exhibited an overall preference for microsatellite repeats, which are found primarily in noncoding regions, possibly because TA repeats contain preferred target sites. A study that correlated insertion sites with hundreds of genome annotations illustrated the degree to which genomic features and primary sequence influence the integration preferences of several vectors (for example, L1 and SB transposon insertions were much more influenced by primary sequence than were retroviral vectors). This study also found that preferences for elements such as CpG islands, DNase I sensitive sites, and transcription factor binding sites varied between vectors. The recent identification of a periodic sequence encoding nucleosome positioning may also correlate with vector integration patterns, because nucleosomes have been shown to affect patterns of retroviral integration. Similar studies of genome-wide integration preferences for piggyBac and Tol2 will be valuable in assessing the relative safety of these vectors for gene therapy.
Although many vectors exhibit a preference for genes, and even for specific genes, few vectors repeatedly integrate into the same precise position with any significant frequency. Rather, most genes harboring frequent insertions show a distribution of insertions at several positions within the same gene. Some integrating systems, such as the integrases of phages φC31 [119-121] and φBT1, as well as the Escherichia coli Tn7 transposon, recognize specific or degenerate DNA sequences that exist in mammalian genomes. SB integrates specifically at TA dinucleotides, and the piggyBac transposon integrates into the sequence TTAA. Because the oncogenic potential of a vector is related to its propensity to integrate in or near a select few genes, understanding the local parameters that affect integration may help us assess the risk associated with these vectors in gene therapy.
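Because SB requires a TA dinucleotide and piggyBac requires TTAA, listing the candidate target sites in a sequence is a simple string scan. Both motifs are their own reverse complements, so scanning one strand suffices. The sequence and the helper name `find_sites` are illustrative.

```python
def find_sites(seq, motif):
    """0-based start positions of every (possibly overlapping) occurrence of `motif`.

    TA and TTAA are their own reverse complements, so scanning one strand
    finds every candidate site on both strands.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


example = "GATTAACGTATA"  # hypothetical sequence
print("SB (TA):", find_sites(example, "TA"))              # SB (TA): [3, 8, 10]
print("piggyBac (TTAA):", find_sites(example, "TTAA"))    # piggyBac (TTAA): [2]
```

Every position returned is only a candidate: SB uses a small subset of the available TA sites far more often than the rest.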
For retroviruses and the SB transposon, consensus sequences surrounding the sites of integration have been described [111,124-127]. Retroviruses do not exhibit a strong consensus sequence, and the nonrandom pattern of integrations, together with the observation that frequently hit sites did not match the consensus sequences, led investigators to examine other properties of the DNA surrounding target sites, including structural characteristics of the DNA itself. DNA structural characteristics arise from interactions other than Watson-Crick base pairing and encompass deformations of the regular double helix caused by interactions between adjacent, planar bases (Figure 2). Originally characterized from analysis of crystal structures of DNA bound to histones and other proteins, these characteristics include 'protein-induced DNA deformability', 'A-philicity', and trinucleotide 'bendability'. These properties underlie local variations in DNA structure that are probably relevant to recognition of DNA by transposases and integrases. Early investigations of insertion preferences showed that viruses preferred 'bent' DNA [118,128,129], and several groups have investigated secondary DNA structural patterns in sequences flanking mapped insertion sites for both transposons [115,124,130,131] and retroviruses [111,126] to determine the general characteristics of sequences flanking 'preferred' integration sites. Similarly, the RAG1/2 protein complex, which has properties akin to those of cut-and-paste transposases, recognizes a specific sequence/structure for recombination of antigen receptor genes.
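A consensus sequence of the kind described above is obtained by aligning the sequences flanking mapped insertion sites and taking the most common base at each position. The minimal sketch below uses hypothetical flanks and a simple majority rule; it is not the procedure of any specific study cited here.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Majority-base consensus of equal-length sequences aligned on the insertion site.

    Returns the consensus string and, for each position, the fraction of
    sequences carrying the consensus base.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    bases, freqs = [], []
    for column in zip(*(s.upper() for s in aligned_seqs)):
        base, count = Counter(column).most_common(1)[0]
        bases.append(base)
        freqs.append(count / len(aligned_seqs))
    return "".join(bases), freqs


flanks = ["ATATAT", "ACATAT", "GTATAC"]  # hypothetical flanks centred on a TA
print(consensus(flanks))
```

Low per-position frequencies signal a weak consensus, the situation for retroviruses that motivated the structural analyses.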
Different DNA sequences may produce highly similar patterns of DNA secondary structure, so common structural patterns that are preferred for integration may be obscured by approaches that analyze sequence alone. Analysis of secondary structure for a DNA sequence is based on translating a sliding window of two or three bases into a structural value for each 'step'. For example, the tendency of a B-form helix to adopt the A-form (A-philicity; Figure 2) can be predicted by translating each consecutive (overlapping) dinucleotide into one of 10 A-philicity values; there are 16 possible dinucleotides but only 10 distinct base-pair steps, because a dinucleotide and its reverse complement (for example, AG and CT) describe the same step [133-135]. Similarly, protein-induced deformability encompasses several changes in base pair orientation, relative to a 'perfect B-form double helix', in the transition between two consecutive base pairs (Figure 2c). All of these changes can be expressed as a single composite parameter of protein-induced DNA deformability known as Vstep [136-138]. Vstep represents the physical relationship of any two planar base pairs in terms of their relative shifts and angular orientation. In contrast to A-philicity and protein-induced deformability, DNA bendability is best modeled using a sliding window of three bases, with 64 possible trinucleotide bendability values.
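The sliding-window translation described above can be sketched as follows. The code also checks the counting argument: merging each k-mer with its reverse complement leaves 10 distinct dinucleotide steps and 32 distinct trinucleotide steps. The numeric values are placeholders only, not the published A-philicity, Vstep, or bendability tables, and all function names are illustrative.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical_steps(k):
    """Distinct k-mer steps once each k-mer is merged with its reverse complement."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, revcomp(km)) for km in kmers})


def expand_table(canonical_values):
    """Build a lookup for all 4**k k-mers from values given for canonical steps."""
    table = {}
    for kmer, value in canonical_values.items():
        table[kmer] = value
        table[revcomp(kmer)] = value
    return table


def step_profile(seq, table, k):
    """Translate each overlapping k-mer of `seq` into its structural value."""
    seq = seq.upper()
    return [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]


print(len(canonical_steps(2)), len(canonical_steps(3)))  # 10 32

# PLACEHOLDER values only (0.0, 0.1, ..., 0.9), NOT published A-philicity values.
toy_dinuc = expand_table({step: round(0.1 * i, 1)
                          for i, step in enumerate(canonical_steps(2))})
print(step_profile("ATTAG", toy_dinuc, 2))
```

A real analysis would substitute a published dinucleotide table (A-philicity or Vstep) or trinucleotide table (bendability, with `k=3`) for the placeholder values.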
An example of DNA structural analysis for the Tol2 transposon is shown in Figure 3, in which average structural values for each position flanking an insertion site are plotted and compared with a plot for random sequences. In the case of Tol2, weak preferences in Vstep and A-philicity values at specific coordinates are apparent as peaks in the heavy black lines in Figure 3a,b (left sides), in contrast to the same averages derived from random sequences (right sides). Overall, the bendability around Tol2 insertion sites deviates little from that of random sequence (Figure 3c), unlike that of sites preferred by SB transposase (Figure 3d). Analysis of hundreds of integration sites for potential gene therapy vectors, including viruses as well as transposons, shows that many have subtle preferences for these variables (Figure 4). For example, the piggyBac transposon may favor sites with slightly higher A-philicity, lower bendability, and lower Vstep values than random sequences. In contrast, 'preferred' SB insertion sites (see below) clearly display a jagged Vstep pattern and higher bendability. Interestingly, although retroviruses (avian sarcoma virus [ASV], HIV, MLV, and simian immunodeficiency virus) integrate into bent DNA, such as that bound to nucleosomes, our analyses of sequences around viral insertion sites do not indicate a particular preference for bendable DNA (Figure 4). A similar, more rigorous approach has been used to characterize Drosophila P-elements and non-LTR retrotransposons in Entamoeba histolytica, demonstrating that DNA structural characteristics at insertion sites for both elements differ significantly from those of collections of random sequences.
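The position-by-position averaging behind plots like Figure 3 can be sketched as follows: structural profiles centred on each insertion site are averaged column by column and compared with the average for random background sequences. The profiles below are hypothetical numbers, and `random_sequences` simply generates uniform random DNA that would be translated with the same step table as the real flanks; both function names are illustrative.

```python
import random
from statistics import mean


def average_profile(profiles):
    """Mean structural value at each position across equal-length profiles,
    all centred on their insertion sites."""
    return [mean(column) for column in zip(*profiles)]


def random_sequences(n, length, seed=0):
    """Uniform random DNA sequences to serve as a background control."""
    rng = random.Random(seed)
    return ["".join(rng.choices("ACGT", k=length)) for _ in range(n)]


# Hypothetical per-step values for three insertion sites and two background sequences.
site_profiles = [[1, 5, 1, 5], [2, 6, 1, 4], [0, 4, 1, 6]]
background_profiles = [[3, 3, 3, 3], [3, 3, 3, 3]]
site_avg = average_profile(site_profiles)
background_avg = average_profile(background_profiles)
print([s - b for s, b in zip(site_avg, background_avg)])
```

Positions where the site average departs consistently from the background average correspond to the peaks described for Tol2; a profile indistinguishable from background corresponds to the flat bendability plot.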
For SB, the observation of general structural trends surrounding insertion sites eventually led to the identification of a specific DNA structural pattern governing insertion preference. Vigdal and coworkers observed that increased DNA deformability and A-philicity were features of a consensus sequence flanking SB TA insertion sites. Subsequently, Liu and colleagues mapped about 200 integrations into a relatively small 7-kilobase plasmid sequence and observed that some common integration sites did not share the consensus sequence. These results identified several 'preferred' TA dinucleotides that harbored recurrent integrations. These preferred sites exhibited a striking, specific pattern of alternating high and low deformability (Vstep) values that was absent from TA sites that were rarely, if ever, used. This led to the conclusion that SB transposase prefers a 'zigzag' Vstep pattern of DNA deformability, which was later confirmed on a larger, genomic scale. It remains unknown whether these patterns influence recognition and binding by the SB transposase, catalysis of transposon integration, or some other mechanistic step.
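The studies cited above define preferred SB sites by their Vstep patterns, but the precise scoring rule is not reproduced here. The function below is therefore a hypothetical criterion that only captures the qualitative idea of a 'zigzag': successive Vstep values alternately rise and fall, each by more than a chosen threshold. The Vstep values are made up.

```python
def is_zigzag(vstep_values, min_step=0.0):
    """Hypothetical 'zigzag' test: successive Vstep differences must strictly
    alternate in sign and each exceed `min_step` in magnitude."""
    diffs = [b - a for a, b in zip(vstep_values, vstep_values[1:])]
    if len(diffs) < 2 or any(abs(d) <= min_step for d in diffs):
        return False
    return all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))


print(is_zigzag([1.0, 3.0, 1.2, 2.9, 1.1]))  # True: up, down, up, down
print(is_zigzag([1.0, 1.5, 2.0, 2.5, 3.0]))  # False: monotonic
```

The threshold `min_step` is an arbitrary parameter that would need calibration against sites known to be preferred or nonpreferred.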
This analysis was repeated for other vectors, including piggyBac, P-elements, and several retroviruses. However, only weak structural signatures were detected, and these were no more informative than the weak consensus sequences previously identified. A key difference in the SB screen was the saturation of a small target, which allowed highly preferred sites to be distinguished from nonpreferred TA dinucleotides. In contrast, the datasets for the other vectors were derived from relatively small numbers of insertions into mammalian genomes, which were insufficient to obtain an initial set of preferred sequences. Because nonpreferred sites are likely to vastly outnumber preferred sites in the genome for most vectors, any genome-wide screen will produce a mix of indistinguishable preferred and nonpreferred sites. For example, we have estimated that of the approximately 200,000,000 TA sites in a human genome, only about 10% fall into the preferred category, although in the screen conducted by Yant and coworkers 189 out of 573 (33%) genomic SB insertions were classified as preferred sites. Analysis of the bendability of all SB sites mapped in the screen reported by Yant and coworkers shows a peak at the center of the insertion site, defined by the central TA dinucleotide. However, when only the preferred sites are analyzed, the surrounding nucleotides exhibit a much greater level of bendability (Figure 3d). This effect occurs even though the preferred sites were identified on the basis of protein-induced deformability (Vstep), which is distinct from DNA bendability. The lesson from these studies is that most genome-wide datasets (particularly those from experiments involving some form of genetic selection) will probably show a similar dilution of preferred sites by greater numbers of nonpreferred sites.
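The dilution argument can be made quantitative with the numbers given above. Assuming that the genome-wide estimate of about 10% preferred TA sites also applies to the sites available in the Yant screen (an assumption, not a reported result), 189 of 573 insertions at preferred sites implies that each preferred site is hit about 4.4 times as often as each nonpreferred site. The function names are illustrative.

```python
def per_site_enrichment(frac_insertions_preferred, frac_sites_preferred):
    """Insertions per preferred site divided by insertions per nonpreferred site."""
    per_preferred = frac_insertions_preferred / frac_sites_preferred
    per_nonpreferred = (1 - frac_insertions_preferred) / (1 - frac_sites_preferred)
    return per_preferred / per_nonpreferred


def diluted_profile(preferred, nonpreferred, frac_preferred):
    """Average profile of a dataset that mixes preferred and nonpreferred sites."""
    return [frac_preferred * p + (1 - frac_preferred) * q
            for p, q in zip(preferred, nonpreferred)]


# 189 of 573 insertions at preferred sites; ASSUME ~10% of TA sites are preferred.
print(f"{per_site_enrichment(189 / 573, 0.10):.1f}")  # 4.4
```

Even with this enrichment, two thirds of the insertions sit at nonpreferred sites, so an average profile computed over all sites (`diluted_profile` with `frac_preferred` of about 0.33) is dominated by the nonpreferred class.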
There is a caveat to the analyses discussed up to this point: they all assume that the structures around integration sites have an absolute center of reference, defined by the site into which the vector integrated. Such analyses could miss structural patterns that are not strictly position specific. For instance, an integrase may prefer a local region that is highly bendable or deformable without requiring a particular pattern (or sequence). To account for this, we have examined a parameter called 'jaggedness', which we define as the degree to which Vstep values alternate from high to low, as in the preferred 'zigzag' sites for SB. We calculated jaggedness as the sum of the absolute differences between adjacent Vstep values across a sequence, so that a jagged (zigzag) site has a higher total value than a flat, basal site, whose jaggedness should be close to 0. Jaggedness values for several vectors are shown in Figure 4. Although jaggedness values at insertion sites are similar to Vstep values for most vectors (with the possible exception of Tol2), jaggedness patterns vary considerably across genomic sequences and are somewhat independent of Vstep patterns (for instance, in the c-myc gene; Figure 5).
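Jaggedness as defined above is J = Σ|V(i+1) − V(i)| over adjacent Vstep values. The sketch below computes it for a whole sequence and, because the point of the measure is to avoid a fixed center of reference, for every window along a sequence. The Vstep values are hypothetical and the window size is a free parameter.

```python
def jaggedness(vstep_values):
    """Sum of absolute differences between adjacent Vstep values."""
    return sum(abs(b - a) for a, b in zip(vstep_values, vstep_values[1:]))


def sliding_jaggedness(vstep_values, window):
    """Jaggedness of every run of `window` consecutive Vstep values."""
    return [jaggedness(vstep_values[i:i + window])
            for i in range(len(vstep_values) - window + 1)]


print(jaggedness([1, 3, 1, 3]))                  # 6 (zigzag)
print(jaggedness([2, 2, 2, 2]))                  # 0 (flat, basal)
print(sliding_jaggedness([2, 2, 2, 1, 3, 1], 3))  # [0, 1, 3, 4]
```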
We see two uses for profiling the insertion site preferences of integrating vectors. First, in functional genomics screens, the insertion profiles that emerge can be compared with expected profiles that are based only on structure rather than on genetic selection. A striking example is evident in the oncogene screens conducted with the SB transposon [58,59], illustrated in Figure 6 for the Braf gene. Integration sites that emerged from the screen are shown across the entire locus (Figure 6b) and in a selected region comprising exons 10-13/introns 10-12 (Figure 6d), where most of the integrations were selected because they induced expression of a truncated, gain-of-function kinase polypeptide. Panels a and c show insertion site preference scores across the region obtained using an automated script (ProTIS) that counts and scores preferred TA dinucleotide insertion sites based on Vstep values. The results shown in Figure 6 make two strong points. The first is that the frequency of oncogenic insertions in a select region corresponds to that predicted on the basis of preference profiling (Figure 6c,d); that is, local DNA structure can be a good predictor of integration site preference. The second is that many predicted hotspots (Figure 6a,b) were not sites that led to oncogenesis. Together, these two observations underscore the biologic importance of the integrations into introns 11 and 12.
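In Figure 6 the agreement between predicted and observed insertions is shown graphically. One simple way to put a number on it, sketched here with hypothetical per-region values, is a Pearson correlation between predicted preference scores and observed insertion counts (Python 3.10 and later also provide `statistics.correlation`). This is our illustration, not part of ProTIS. Predicted hotspots that never appear among oncogenic insertions lower the correlation, which mirrors the second point made above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-region values: predicted preference score vs. observed insertions.
predicted = [3.0, 8.0, 2.0, 9.0, 1.0]
observed = [1, 6, 0, 7, 0]
print(round(pearson(predicted, observed), 3))
```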
The second application of predicting vector insertion profiles may be as part of a risk assessment program. Although current understanding of integration site preferences for most vectors is still inadequate to predict the probability of integration into specific genes, genome-wide integration datasets may indicate how likely a vector is to integrate within the general vicinity of a specific gene. Similarly, analysis of DNA structural characteristics may be used to assess the likelihood that each vector will integrate within specific regions of genes. For example, although Braf can act as a potent oncogene, the pattern of SB integrations into Braf suggests that integrations into a relatively small region of the gene (introns 11 and 12) are the most highly selected for oncogenesis, despite the presence of hotspots across the entire gene. Thus, the chance of insertional activation will be dictated by the range of insertion positions capable of generating an oncogenic transcript, combined with the relative 'attractiveness' of the sequence across those positions.
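The closing argument above can be made concrete with a simple, admittedly idealized model: if each candidate insertion site along a gene carries a relative preference weight, the share of insertions expected within the region capable of producing an oncogenic transcript is that region's share of the total weight. The Python sketch below assumes such weights are already available (for example, from a preference profile like those in Figure 6) and ignores selection entirely; the function name and the example coordinates are hypothetical.

```python
from typing import Iterable, Tuple


def fraction_in_window(sites: Iterable[Tuple[int, float]],
                       start: int, end: int) -> float:
    """Share of total insertion preference that falls in [start, end).

    `sites` holds (position, weight) pairs for candidate insertion sites;
    weights are relative preferences, not probabilities.
    """
    total = 0.0
    in_window = 0.0
    for position, weight in sites:
        total += weight
        if start <= position < end:
            in_window += weight
    return in_window / total if total > 0 else 0.0


# Hypothetical gene: strong hotspots outside a small activation-competent region.
sites = [(100, 4.0), (450, 1.0), (520, 1.0), (900, 4.0)]
print(fraction_in_window(sites, 400, 600))  # 0.2
```

In this toy case most of the preference lies in hotspots outside the competent window, so only a fifth of unselected insertions would be expected to land where activation is possible, mirroring the Braf situation in which hotspots span the gene but oncogenic insertions cluster in introns 11 and 12.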
An analysis of several structural characteristics is presented for the mouse c-myc gene (Figure 5), the human ortholog of which is activated in many cancers. The figure highlights the 3-kilobase (kb) region encompassing the promoter, which harbors the bulk of the oncogenic retroviral integrations at this locus deposited in the Retroviral-Tagged Cancer Gene Database (RTCGD). The sequence was divided into 50 base pair (bp) bins, and the values for Vstep, A-philicity, jaggedness, and bendability were summed within each bin. At this 50 bp resolution, these structural parameters are highly variable across the sequence and vary independently of each other. For comparison, actual oncogenic retroviral insertions observed in insertional mutagenesis screens and deposited into the RTCGD are shown in Figure 5a. The profiles reveal two features of transposons under consideration for gene therapy. First, the most likely sites for SB transposon integration (Figure 5g) are shifted away from the most commonly found activation sites, as revealed by retroviral integrations (Figure 5a). Second, the profile of TTAA sites, which are required by the piggyBac transposon (Figure 5f), is similar to that of preferred SB sites, and further shows that some regions harboring retroviral integrations contain no TTAA sequences, making piggyBac insertions into these regions impossible. Thus, to a first approximation, transposons appear less likely than retroviral vectors to insert close to the c-myc promoter. In support of this, c-myc is infrequently hit in SB-based insertional mutagenesis screens; to date, only one SB integration into c-myc has been deposited into the RTCGD. In contrast, many retroviral insertions into c-myc have been mapped, although the total number of deposited retroviral insertions is much higher than the number of deposited transposon insertions.
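The binning used for Figure 5 can be sketched as follows. The Python functions below sum any positional profile over consecutive 50 bp bins and count TTAA motifs (the obligatory piggyBac target) per bin, assigning each motif to the bin in which it starts. The function names and the boundary convention are assumptions for illustration; the original analysis may have handled bin edges differently.

```python
from typing import List, Sequence


def bin_sums(values: Sequence[float], bin_size: int = 50) -> List[float]:
    """Sum a positional profile over consecutive, non-overlapping bins.

    The last bin may be shorter than bin_size.
    """
    return [sum(values[i:i + bin_size]) for i in range(0, len(values), bin_size)]


def count_motif_per_bin(seq: str, motif: str = "TTAA",
                        bin_size: int = 50) -> List[int]:
    """Count motif occurrences per bin, keyed on the motif's start position.

    TTAA is its own reverse complement, so scanning one strand suffices
    for this motif.
    """
    seq, motif = seq.upper(), motif.upper()
    n_bins = (len(seq) + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for i in range(len(seq) - len(motif) + 1):
        if seq[i:i + len(motif)] == motif:
            counts[i // bin_size] += 1
    return counts


# Synthetic 150 bp sequence with one TTAA in the first and third bins.
seq = "TTAA" + "G" * 46 + "C" * 50 + "GTTAAC" + "A" * 44
print(len(seq), count_motif_per_bin(seq))  # 150 [1, 0, 1]
```

A bin count of zero marks a stretch into which piggyBac cannot insert, which is the situation described above for some retrovirally targeted regions of c-myc.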
The relative lack of SB insertions into c-myc may be due either to a paucity of favorable SB insertion sites in regions of the gene competent for oncogenic activation, or to an overall lack of oncogenic selection for insertions into this gene. In support of the former, transposon-free amplification of c-myc was one of the few genomic aberrations observed in tumors harboring mobile transposons (Largaespada DA, Collier LC, Hackett CS, unpublished observations), suggesting that c-myc activation plays a role in the biology of these tumors (that is, there was probably oncogenic selection for the genomic amplicon). Similar ProTIS analysis of the LMO2 locus revealed that the most preferred SB integration sites were considerably farther from the LMO2 promoter than the mapped integrations of activating retroviruses. That said, prediction of vector integration is clearly imprecise, and even rare integrations into unfavorable sites have the potential to promote oncogenic expansion, as indicated in Figure 6.
Regardless of the inherent behavior of each integrating vector, existing evidence suggests that the oncogenic potential of any given vector can be attenuated depending on how it is used. As with retroviruses, the SB transposon has been used for functional genomics as well as for delivery of therapeutic genes in mouse models of inherited disease. The mutagenesis studies were motivated by two limitations of retroviruses for insertional mutagenesis: the restriction of viruses to infecting specific cell types, and the tendency of many viral vectors to insert near, and activate, a possibly limited number of genes. In two recent SB mutagenesis screens, a transgenic concatemer of T2/Onc transposons carried in the mouse germline was remobilized in somatic cells by a trans-acting transgenic SB transposase. The two screens differed in the expression level, expression domains, and activity of the SB transposase, as well as in the copy number of the transposon concatemers [58,59]. An important finding from the two studies was that the oncogenic potential of the same T2/Onc transposon vector, which was engineered specifically to activate oncogenes and cause cancer in mice, ranged from no observable phenotype at one extreme to rapid development of severe cancer at birth at the other. The oncogenic effect was directly related to the number and types of cells at risk for transposon-induced mutations, and perhaps to the remobilization rates. The same properties may be relevant for a wide range of other gene therapy vectors.
Because SB also lacks a preference for integrating near genes, the chance that an SB insertion carrying a therapeutic gene (in contrast to a genetic cassette designed to wreak havoc on transcriptional units) will activate a neighboring host gene would seem to be lower than for vectors with an affinity for integrating into genes [65,97]. This feature may be a disadvantage for SB-based functional genomics studies aimed at mutating genes, but it may be an advantage for gene therapy.
As an alternative to finding vectors that do not target genes, several groups are attempting to target vector integration to a specific region of the genome by fusing integrases and SB transposase to DNA-binding domains that recognize specific DNA sequences [143,144]. Such targeting appears to reduce activity without much increasing the specificity of integration into particular sites in a mammalian genome [144,145]. This is not surprising if the ability of SB transposase to integrate promiscuously into TA sites remains intact. There are about 2 × 10⁸ potential TA-dinucleotide integration sites for SB transposons, of which an estimated 2 × 10⁷ are preferred integration sites. Consequently, the chance that a sequence-specific targeting motif added to SB transposase actually guides transposition to a specific, low-copy target sequence is expected to be extremely low compared with the chance of integration into any of the millions of other available TA sites. Similarly, to reduce the risk of activating neighboring genes following vector integration, self-inactivating vectors are being engineered to have a diminished ability to activate genes over long distances [146,147], although it is not clear whether these vectors will be safer. The ΦC31 phage integrase system targets relatively few sites in mammalian genomes [119,149], but it appears to induce a relatively high level of chromosomal recombination [149-151]. Thus, the development of safer vectors remains an open area of investigation.
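The competition argument above can be checked with simple arithmetic. Suppose, as a deliberate simplification, that the roughly 2 × 10⁷ preferred TA sites are equally attractive and that a targeting domain makes one intended site k times more attractive than any other. The probability that a given integration lands at the intended site is then k / (k + N - 1), where N is the number of preferred sites. The enrichment factor k and the equal-weight assumption are hypothetical; the Python sketch below only evaluates this ratio.

```python
def target_probability(enrichment: float, n_sites: int) -> float:
    """Chance that one integration hits the intended site.

    Assumes n_sites equally attractive sites, one of which (the target)
    is made `enrichment`-fold more attractive by a targeting domain.
    """
    return enrichment / (enrichment + n_sites - 1)


N_PREFERRED = 20_000_000  # ~2 x 10^7 preferred TA sites, as estimated in the text

for k in (1, 1_000, 100_000):
    print(f"k = {k:>7}: P(target) = {target_probability(k, N_PREFERRED):.1e}")
```

Under these assumptions, even a 100,000-fold enrichment leaves the intended site receiving only about 0.5% of integrations, and a targeting domain would need an enrichment on the order of N itself (about 2 × 10⁷) just to reach even odds.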
Ultimately, functional genomics and gene therapy ask the same question of any given vector (while hoping for opposite outcomes): what are the chances of activating genes? Four major factors influence the answer, and each retroviral and transposon vector has different characteristics for each factor. First, what is the overall tendency of the vector to integrate into genes or promoters? Second, are there adequate local target sites around genes of interest to attract the vector? Third, over what distance can the vector activate a gene? Fourth, to what extent can integration activity be modulated to control the overall likelihood of hitting specific insertion sites close enough to activate specific genes? Theoretically, knowing each of these variables for every vector would allow researchers to choose the vector with the most utility and lowest risk for the intended purpose. In gene therapy, these parameters translate into the risk of hitting a specific oncogene or tumor suppressor gene in a way that could lead to a severe adverse effect. If, in the future, hotspots for integration of SB and other potential gene therapy vectors can be predicted, then we should be able to assess the various risks of adverse effects from therapeutic vectors more accurately and to modify them. This goal should be within reach in the coming years.
Since submission of this manuscript, adeno-associated viral (AAV) vectors have been implicated in the induction of hepatocellular carcinomas in mice and in the death of a patient in a clinical trial for the treatment of rheumatoid arthritis.
PBH owns stock in Discovery Genomics, which is conducting research on the SB transposon system. The other authors declare that they have no competing interests.
We thank the Arnold and Mabel Beckman Foundation for support of our work and all members of the Beckman Center for Transposon Research for a long history of contributions of ideas and results. We appreciate the help of Drs Nik Somia and Marina O'Reilly in determining the number of gene therapy trials reviewed by the RAC. We are especially grateful to Dr Darius Balciunas and Kirk Wangensteen for sharing their Tol2 dataset, and to Drs David Largaespada and Lara Collier, as well as two reviewers, for discussions about the manuscript. The authors were supported by DOD fellowship BC050930 (CSH), and NIH grants T32 HD007480 (AMG) and 1P01 HD32652-07 and R43 HL076908-01 (PBH).
This article has been published as part of Genome Biology Volume 8, Supplement 1, 2007: Transposons in vertebrate functional genomics. The full contents of the supplement are available online at http://genomebiology.com/supplements/8/S1.