We review different uses of retroviral mutagenesis technology as a tool to manipulate the zebrafish genome. In addition to serving as a mutagen in phenotype-driven forward mutagenesis screens, the purpose for which it was originally adapted, retroviral insertional mutagenesis can also be exploited in reverse genetic approaches: delivering enhancer- and gene-trap vectors to examine gene expression patterns and to mutagenize genes, generating sensitized mutants amenable to chemical and genetic modifier screens, and producing gain-of-function mutations by epigenetically overexpressing the retrovirus-inserted genes. On the technical side, we also summarize recent advances in the high-throughput cloning of retroviral integration sites, a pivotal step toward identifying mutations. Finally, we point to some potential directions that retroviral mutagenesis might take, based on lessons learned from other model organisms.
One of the most powerful techniques used to define gene pathways is random mutagenesis followed by classical genetic analysis. Using different model organisms, geneticists have long sought mutagenesis strategies that generate mutations of a defined molecular nature with high efficiency. In the late 1920s, H.J. Muller showed that X-ray irradiation produces abundant genetic mutations and chromosomal rearrangements in Drosophila. X-ray mutagenesis was subsequently used to mutate the mouse genome in the 1930s. X-ray irradiation typically causes chromosomal rearrangements, which can act as molecular landmarks for identifying the affected gene(s). However, such rearrangements often affect many genes, so the nature of the resulting mutagenic lesions is complicated and unpredictable. Later, the chemical chlorambucil was found to be a more efficient mutagen than X-ray irradiation in the mouse genome. However, multi-gene deletions and chromosomal rearrangements are also the primary consequence of chlorambucil mutagenesis. Thus, although these two approaches could generate mutations for genetic screens and for mapping the affected genes, they did not allow straightforward assessment and study of the causative mutations. Chemical mutagenesis using ethylnitrosourea (ENU), which primarily produces random single base-pair changes, became the preferred approach, as ENU-induced mutations usually affect individual genes. However, ENU mutagenesis does not produce molecular landmarks with which to identify mutated genes, making laborious positional cloning the primary strategy for identifying ENU-mutated genes.
In 1976, the introduction of retroviral DNA into the mouse germline was first reported. Since then, retroviral insertional mutagenesis has been used in mouse genetics to knock out thousands of genes in an efficient large-scale effort [6, 7]. The major advantage of this approach is that each integration event inserts a single copy of the vector at the recipient locus, and the proviral genome serves as a molecular landmark for rapidly cloning the affected gene.
The idea of using zebrafish (Danio rerio) as a vertebrate model organism originated in the late 1960s, and the last 15 years have seen a rapid uptake in its use as a model. One of the initial appeals of zebrafish was its potential for large-scale mutagenesis screens. Owing to its fecundity, small size, transparent embryos, and fast embryonic development, zebrafish offers many of the advantages of popular invertebrate models, and many mutagenesis approaches applicable to invertebrates and to mouse embryonic stem (ES) cells can now be performed in a whole vertebrate animal. The potential of zebrafish as a model for large-scale vertebrate genetic screens has certainly been fulfilled: two large-scale ENU mutagenesis screens in zebrafish produced an impressive repertoire of mutant phenotypes, identifying more than 6600 mutations [9, 10]. However, because of the daunting positional cloning required to identify the genes mutated in ENU screens, only about 160 genes responsible for the corresponding phenotypes have been identified to date. To circumvent this laborious step, retroviral insertional mutagenesis was introduced in the early 1990s as an alternative to chemical mutagenesis in zebrafish. Although the initial mutagenic rate was too low for this approach to compete with ENU mutagenesis, the technology was quickly improved so that viruses with relatively high titers could be produced to infect zebrafish cells efficiently [12, 13]. To date, the only efficient retrovirus system for mutagenesis in zebrafish is based on Moloney murine leukemia virus (MLV) pseudotyped with the envelope glycoprotein of vesicular stomatitis virus (VSV-G).
Pseudotyping gives MLV the ability to infect zebrafish cells and also increases the stability of viral particles, allowing the virus titer to be increased 1000-fold by ultracentrifugation [14, 15]. A large-scale forward insertional mutagenesis screen based on this pseudotyped MLV system has been performed successfully, identifying ~500 observable embryonic recessive mutations. These mutants represent about 385 different genes, 335 of which have been identified [16–20]. This collection is estimated to include approximately 25% of the genes essential for embryonic development in zebrafish. Thus, although retroviral mutagenesis is less efficient than chemical mutagenesis in terms of mutagenic events per screened family, mutants generated by retroviral mutagenesis can be characterized molecularly faster than those generated by chemical mutagenesis.
Although forward genetics has been a powerful tool for studying gene function, its effectiveness is fundamentally limited by issues such as overlapping gene function and the need for an observable or measurable phenotype. It is therefore also desirable to go in the ‘reverse’ direction: to systematically disrupt genes in the zebrafish genome and test the effects of gene inactivation, either one at a time or on a larger scale. This targeted mutagenesis approach has proven a powerful means of elucidating gene function in the mouse. However, because long-term zebrafish ES cell cultures are currently lacking, targeted mutagenesis mediated by homologous recombination is still not feasible in zebrafish. Recently, a target-selected reverse genetic approach, Targeting Induced Local Lesions IN Genomes (TILLING), has been developed to identify a desired mutation in a population of ENU-mutagenized fish through either DNA resequencing or CEL1 nuclease assays [21–23]. However, this technique requires a significant investment of time and money for each mutation identified [24, 25]. Performing saturation mutagenesis with retroviruses as the mutagen, then mapping the retroviral integrations and cryopreserving the corresponding sperm samples, could be an alternative to targeted gene knockouts. Once saturation is achieved, any desired mutant can be readily recovered by in vitro fertilization (IVF) from the archived sperm sample containing the desired mutation. Furthermore, this approach takes a middle road between random and targeted mutagenesis, as phenotypes resulting from the disruption of uncharacterized genes can also be assessed by systematically testing their mutant alleles. Two large-scale efforts use similar approaches: one is a collaboration among three groups (our laboratory, the University of California, Los Angeles, and Peking University), and the other is run by a private company, Znomics Inc.
Both groups use the VSV-G-pseudotyped MLV-based retroviruses originally developed for the large-scale retroviral mutagenesis screen at the Massachusetts Institute of Technology [16–20]. The two independent projects are compared below.
The mutagenesis pipeline in our laboratories (Figure 1) begins with the production of high-titer pseudotyped MLV, followed by injection of the retroviruses into blastula-stage embryos. We have improved the original protocols for producing pseudotyped MLV and for microinjection so that the injected embryos (i.e. founders) carry on average ~45 proviral copies per cell, approximately 3-fold more than in previous retroviral mutagenesis screens [13, 20]. We raise the founders, outcross them with wild-type fish, raise a small number of male F1 fish, and then sacrifice them to cryopreserve their testes. A small DNA sample from each F1 fish is used to clone the genomic sequences adjacent to the retroviral integrations; the cloned sequences are mapped back to the zebrafish genome to locate the integrations and are simultaneously indexed to the corresponding sperm samples. We expect to generate enough retroviral integrations to ‘hit’ every zebrafish gene at least once in 3–5 years. This insertional mutant library will be a public-domain resource, freely available to the entire zebrafish research community.
The mutagenesis project by Znomics, Inc. takes a similar approach. The major difference is that Znomics, Inc. cryopreserves sperm from the original injected founder fish and maps the integrations in the founder sperm samples rather than in F1 male fish. Because the germline of injected founders is highly mosaic, any given retroviral integration found in the founder germline is estimated to be present in 3–20% of the F1 progeny produced by IVF. This mosaicism also means that more unique retroviral integrations can be recovered from founder sperm than from F1 fish (as in our approach), which reduces the initial work and yields a very high recovery of unique integrations from a relatively small number of fish. However, a large number of F1 progeny often must be screened to find any given integration, and the number of carrier fish recovered from the IVF can be relatively small. Our approach bypasses this drawback because the integrations are recovered and mapped in the F1 progeny; any given integration is thus represented in 50% of the F2 progeny produced by IVF with the F1 sperm sample. At the time of this writing, Znomics Inc. has integrations in more than 10 000 genes, with the associated sequence tags deposited in the company's publicly accessible database. Fish carrying specific integrations can be purchased directly from the company.
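The practical cost of the two strategies can be compared by asking how many IVF progeny must be genotyped to find at least one carrier of a given integration. The sketch below is a minimal binomial model, assuming each embryo independently inherits the integration with a fixed probability: 3% or 20% for founder sperm (the range estimated above) and 50% for F2 progeny of a heterozygous F1 male. The helper name `progeny_to_screen` and the 95% confidence target are illustrative choices, not part of either project's pipeline.

```python
import math

def progeny_to_screen(carrier_fraction: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one carrier among n embryos) >= confidence.

    Simple binomial model (an assumption): every embryo independently carries
    the integration with probability `carrier_fraction`.
    """
    if not 0 < carrier_fraction <= 1:
        raise ValueError("carrier_fraction must be in (0, 1]")
    if carrier_fraction == 1:
        return 1
    # P(no carrier among n) = (1 - p)^n; solve (1 - p)^n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - carrier_fraction))

for label, p in [("founder sperm, 3% carriers", 0.03),
                 ("founder sperm, 20% carriers", 0.20),
                 ("F1 sperm, 50% carriers", 0.50)]:
    print(f"{label}: screen {progeny_to_screen(p)} embryos")
```

Under these assumptions the loop reports 99, 14, and 5 embryos, respectively, which illustrates why indexing integrations in F1 fish makes recovery of a given allele more predictable.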
To estimate how many integrations are needed to ‘hit’ every zebrafish gene, it is critical to know what percentage of integrations land in genes. Based on our pilot data from ~600 unique mapped integrations in the zebrafish genome, ~65% of MLV integrations landed either in genes or within 3 kb on either side of them. Furthermore, ~65% of those in or near genes landed in the first intron or <3 kb upstream of the gene. This apparent preference of MLV for the 5′ end of genes has also been observed in the mouse [27, 28] and in human tissue culture cells. We further demonstrated that integrations are mutagenic not only when they directly hit exons but also, frequently, when they land in the first intron: in 80% of first-intron hits, the mRNA level of the affected gene was reduced to <30% of wild-type, usually by >90%. Integrations landing in the putative promoter region may also be mutagenic. Overall, roughly one in five retroviral integrations results in a gene disruption, reducing the mRNA level to less than 30% of the wild-type level.
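The pilot numbers above can be turned into a rough saturation estimate. This is a minimal sketch assuming (i) a hypothetical placeholder of 25 000 genes, (ii) the ‘roughly one in five’ disruption rate quoted above, and (iii) that disruptive hits fall on genes uniformly at random, so hits per gene are approximately Poisson-distributed. Because MLV favors the 5′ ends of genes and some loci over others, real integrations are not uniform, so this model likely underestimates the number required.

```python
import math

def fraction_genes_disrupted(integrations: int, n_genes: int, mutagenic_fraction: float) -> float:
    """Expected fraction of genes disrupted at least once (Poisson approximation).

    Assumes every gene is an equally likely target, so each gene receives on
    average integrations * mutagenic_fraction / n_genes disrupting hits.
    """
    mean_hits_per_gene = integrations * mutagenic_fraction / n_genes
    return 1 - math.exp(-mean_hits_per_gene)

def integrations_needed(target_fraction: float, n_genes: int, mutagenic_fraction: float) -> int:
    """Invert the model: integrations needed to disrupt `target_fraction` of genes."""
    return math.ceil(-n_genes * math.log(1 - target_fraction) / mutagenic_fraction)

N_GENES = 25_000   # hypothetical gene count, for illustration only
MUTAGENIC = 0.2    # "roughly one in five" integrations disrupts a gene

for target in (0.5, 0.9, 0.95):
    print(f"{target:.0%} of genes: ~{integrations_needed(target, N_GENES, MUTAGENIC):,} integrations")
```

Under these assumptions, about 87 000 integrations would cover half of the genes but roughly 374 000 would be needed for 95%: the last few percent of genes are the most expensive to reach.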
Apart from direct exon disruptions, the mechanisms by which retroviral integrations abrogate gene expression are currently unclear, especially when integrations land in the intronic or promoter region of a gene. One possible mechanism for intronic integrations is premature truncation of the gene's transcript, using either the polyadenylation signals of the LTRs or a cryptic polyadenylation signal present in the antisense orientation of the virus. Premature termination of transcription may destabilize the truncated transcript. A similar mechanism has been suggested for MLV-mediated abrogation of gene expression in the mouse. Both the 5′ and 3′ LTRs of the MLV used in current zebrafish retroviral mutagenesis contain the canonical polyadenylation signal, AATAAA, raising the possibility that this mechanism accounts for at least a portion of the gene abrogation events. In some cases, integrations landing in or close to an exon–intron junction may induce aberrant splicing (e.g. exon skipping), producing truncated proteins and thereby abrogating gene function (Behra M and Burgess SM, unpublished data and ). Truncated proteins can also be produced intentionally: the retrovirus used in our mutagenesis scheme contains a splice-in, splice-out, frameshift-producing gene-trap cassette (Figure 2C and ). When the retrovirus lands in an intron in the correct orientation, the splice donor of the preceding endogenous exon splices into the splice acceptor of the gene-trap cassette, and the splice donor of the cassette splices out into the next endogenous exon. The cassette is thus inserted between the two adjacent exons, creating a frameshift or truncated fusion mutant.
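To make the splice-in, splice-out outcome concrete, here is a toy sketch. It assumes the upstream exon ends at a codon boundary, concatenates upstream exon, cassette ‘exon’, and downstream exon, and then reports the frame shift (cassette length modulo 3) and the first in-frame stop codon. All sequences are invented; the real cassette sequence is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_in_frame_stop(cds: str):
    """Index of the first stop codon read in frame 0, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None

def splice_in_splice_out(upstream_exon: str, cassette: str, downstream_exon: str) -> dict:
    """Model the cassette being spliced in as an extra exon between two endogenous exons."""
    fused = upstream_exon + cassette + downstream_exon
    return {
        "frame_shift": len(cassette) % 3,            # nonzero: downstream exon read out of frame
        "premature_stop": first_in_frame_stop(fused),
    }

# Invented sequences: wild type ATG GCC | CTG AAA GGG contains no stop codon.
wild_type = "ATGGCC" + "CTGAAAGGG"
print(first_in_frame_stop(wild_type))                     # None
print(splice_in_splice_out("ATGGCC", "GG", "CTGAAAGGG"))  # {'frame_shift': 2, 'premature_stop': 9}
```

A 2-nt toy cassette shifts the downstream exon out of frame and exposes a TGA codon, giving a truncated product; a cassette whose length is a multiple of 3 would instead yield an in-frame insertion.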
However, this mutagenic mechanism does not appear to be particularly efficient, as the endogenous splicing machinery often skips the gene-trap ‘exon’. To date, we have observed only 1 gene-trap event among 11 correctly oriented intronic integrations (Jao L and Burgess SM, unpublished data). Another possible mechanism of retrovirus-induced gene inactivation in zebrafish is de novo methylation of host sequences flanking the integration sites: insertion of a provirus has been shown to change the methylation pattern of host DNA in the mouse, and retrovirus-induced methylation has been correlated with gene inactivation. This has not been demonstrated in zebrafish, but it remains a distinct possibility.
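Returning to the premature-polyadenylation mechanism proposed above: a first step in assessing it for a given integration is to locate canonical AATAAA hexamers on both strands, because a signal in the antisense orientation appears as its reverse complement (TTTATT) on the sense strand. The sketch below is a plain motif scan on an invented sequence, not the actual MLV LTR; real polyadenylation also depends on downstream elements that this scan ignores.

```python
from __future__ import annotations

def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def find_motif(seq: str, motif: str) -> list[int]:
    """All (possibly overlapping) 0-based start positions of motif in seq."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits

def polya_signals(seq: str, signal: str = "AATAAA") -> list[tuple[int, str]]:
    """Canonical polyA hexamers on either strand, as (plus-strand position, strand)."""
    seq = seq.upper()
    plus = [(i, "+") for i in find_motif(seq, signal)]
    # A signal on the minus strand shows up as its reverse complement on the plus strand.
    minus = [(i, "-") for i in find_motif(seq, reverse_complement(signal))]
    return sorted(plus + minus)

# Invented sequence with one sense and one antisense signal.
print(polya_signals("GGAATAAAGGTTTATTCC"))  # [(2, '+'), (10, '-')]
```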
Retroviral integration technologies are not limited to generating mutant phenotypes. They can also be used to examine gene expression patterns, which can provide clues to the function of developmentally regulated genes. For this purpose, reporter systems were developed to detect cis-acting enhancer elements in the genome. This concept was first introduced in bacteria and later applied to the multicellular organism Drosophila [34–37]. In this strategy, an enhancer detection (“trapping”) vector containing a detectable reporter [e.g. lacZ or the green fluorescent protein (GFP) gene] under the control of a basal promoter is introduced into the organism (e.g. via transposable P elements in the fly) (Figure 2A). The reporter is expressed only when the vector inserts near a cis-acting enhancer element that activates the basal promoter. The expression pattern of the reporter gene is assumed to recapitulate that of the nearby endogenous gene, which is (presumably) under the influence of the same cis-acting element. A major application of enhancer detection is to identify cell-type- and tissue-specific markers from which the origins and lineages of many cells and tissues can be determined. Indeed, various enhancer-trap vectors have been developed and extensively studied in Drosophila, creating a rich library of fly strains that express lacZ in various cells or tissues in spatially and temporally restricted patterns. Drosophila enhancer-trap studies have thereby contributed greatly to our knowledge of embryology and organogenesis (reviewed in ). However, applying this enhancer-trap strategy to the mouse is unlikely to be as successful, since generating large numbers of transgenic mice is not practical. Unlike the mouse, zebrafish shares many advantages with invertebrate model systems.
Because of these favorable attributes, zebrafish became a vertebrate model amenable to enhancer-trap screens on a whole-organism level.
The transposable elements Sleeping Beauty and Tol2 have been used to deliver enhancer-trap vectors into zebrafish. Both systems used GFP as the reporter to screen for tissue-restricted fluorescent expression patterns in the F1 progeny of injected founders [39–41]. To date, about 170 enhancer-trap lines have been characterized from these three major studies combined.
The pseudotyped MLV system has the advantage of being the most efficient insertional agent in vertebrates to date. With high-quality virus preparations, nearly all injected founders transmit integrations, with an average of 10 copies per cell in the F1 progeny [13, 26]. In the best published results for transposon systems, about 70% of injected founders transmit the transgene, and transgene-positive F1 progeny harbor from one to more than ten transposon insertions. Thus, the pseudotyped MLV system has a significantly higher capacity for large-scale studies.
A large-scale enhancer-trap screen has been performed using the pseudotyped MLV system to deliver into zebrafish a yellow fluorescent protein (YFP) reporter gene under the control of the zebrafish gata2 promoter. In the initial analysis of the first 95 unique expression patterns, the authors found that on average one in three founders transmitted an activated insertion, with most insertions mapping inside or close to (within 15 kb) genes. Most of the expression patterns of these activated insertions resembled those of endogenous genes close to the insertion sites. However, as the numbers of characterized insertions, transgenic expression lines, and mapped insertions increased, this large-scale screen revealed a picture substantially different from that seen with the smaller data set: the gene whose expression pattern is recapitulated is not necessarily the one nearest to the insertion. Specifically, after screening >15 000 insertions and >1000 transgenic lines and mapping 340 of these integrations to the zebrafish genome, Kikuta et al. found that at least 20% of the expressing reporter insertions were more than 15 kb away from the nearest transcriptional unit and that the expression pattern of the transgene did not recapitulate that of the nearest gene [42, 43]. Furthermore, several loci around well-known developmental regulatory genes were found to contain multiple enhancer insertions. These areas coincide with large vertebrate chromosomal segments with identical gene order (a phenomenon known as conserved synteny) and also contain highly conserved noncoding elements (HCNEs). Enhancer detection insertions in such segments exhibit expression patterns resembling those of the developmental regulatory genes (i.e. ‘target’ genes) but differing from those of nearby phylogenetically unrelated genes (i.e. ‘bystander’ genes).
This appears to hold even when the enhancer detection insertions landed closer to the bystander genes than to the developmental regulatory target genes [42–44]. The large-scale enhancer detection screen thus revealed long-range cis-regulatory elements distributed over large areas, termed genomic regulatory blocks (GRBs) by Kikuta et al., in and around their target genes (i.e. developmental regulatory genes) and surrounding bystander genes whose expression is not under GRB control. Therefore, a large-scale enhancer detection screen in zebrafish can serve not only as a tool to mark discrete tissues or cells for tracking a developmental process but also as a means to accurately annotate vertebrate chromosomal architecture. Interestingly, the GRB concept has also been shown to apply to insect genomes, when enhancer-trap data from five Drosophila genomes were recently examined. In the same report, the authors further showed that differences in core promoters between target and bystander genes account for the differences in their responsiveness to GRBs.
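Mapped insertions are commonly annotated by their nearest gene, and the findings above show why that annotation should be treated with caution. The sketch below, using hypothetical gene names and coordinates together with the 15 kb threshold from the text, reports the nearest gene, the distance to it, and whether the insertion lies beyond the threshold. A GRB-aware analysis would additionally ask whether a developmental regulatory ‘target’ gene lies farther away; that step is not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # genomic coordinates, start <= end
    end: int

def distance_to(pos: int, gene: Gene) -> int:
    """0 if the insertion lies inside the gene, otherwise bp to the nearer gene end."""
    if gene.start <= pos <= gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - gene.end

def annotate_insertion(pos: int, genes: list, window: int = 15_000):
    """Return (nearest gene name, distance, beyond_window) for one insertion."""
    nearest = min(genes, key=lambda g: distance_to(pos, g))
    d = distance_to(pos, nearest)
    return nearest.name, d, d > window

# Hypothetical locus: a bystander gene and a more distant regulatory target gene.
genes = [Gene("bystander", 100_000, 110_000), Gene("target", 200_000, 260_000)]
print(annotate_insertion(130_000, genes))  # ('bystander', 20000, True)
print(annotate_insertion(105_000, genes))  # ('bystander', 0, False)
```

In the GRB cases described above, even an unflagged insertion may report the target gene rather than its nearest neighbor, so the nearest-gene call is best treated as a starting hypothesis.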
This retrovirus-based enhancer detection screen is the only enhancer-trap system that has efficiently generated enhancer-trap lines in a vertebrate on a scale comparable to that achieved in Drosophila. The question remains whether current trap vectors can report insertions into all classes of genes and their products. It is unlikely that any given enhancer element will act equally on all basal promoters. In addition, different insertional agents have distinct integration preferences in the genome. Thus, to annotate the enhancer elements in the genome comprehensively, it will be necessary to develop a variety of basal promoters to drive the reporter gene and to use different insertional agents for vector delivery, so as to cover the genome as extensively as possible. At least four different basal promoters have so far been used in zebrafish enhancer-trap screens: partial/proximal promoters from the ef1α, keratin8, gata2, and hsp70 genes. Their responsiveness to genomic enhancers appears to differ, as the frequency of generating unique expression patterns per insertion with these four basal promoters ranges from 4% to 58% [41, 47, 48]. However, it is currently unclear whether the basal promoters used in different enhancer-trap screens have actually ‘trapped’ distinct sets of enhancers. In the most recent Tol2-based enhancer-trap screen, the authors increased the proportion of Tol2 transgene-transmitting founders to 70%, a significant advance from 16% in the first Tol2-based enhancer-trap screen. Therefore, comprehensive genome-wide enhancer-trap screens in a vertebrate model, using different insertional agents (e.g. pseudotyped MLV and the Tol2 transposon) coupled with various basal promoter-reporter constructs, now appear feasible.
The data generated from such genome-wide enhancer detection screens will not only provide a rich resource of trap lines marking various tissues and cells for studies of cell lineage and developmental biology, but also provide a unique tool to probe regulatory elements and vertebrate genome architecture, as has been exploited in ref. . Interestingly, enhancer-trap integrations delivered by retroviruses are surprisingly mutagenic (5–10% of trap integrations), which may reflect the inherent mutagenicity of retroviral sequences landing in introns of regulated genes rather than a direct consequence of the enhancer-trap function. However, Nagayoshi et al. have also demonstrated that the Tol2 transposon-based enhancer-trap approach can create insertional mutations in developmental genes, isolating two phenotypic mutants out of 54 enhancer-trap insertions. This finding opens the possibility that insertional mutagenesis may also be performed using a transposon-based enhancer-trap strategy.
Unlike enhancer detection vectors, gene-trap vectors require the insertion to land within the transcription unit of the gene. Most commonly, a gene-trap vector contains a promoterless reporter gene with a splice acceptor at its 5′ end and a polyadenylation (polyA) signal at its 3′ end (Figure 2B and ). If the vector lands in an intron of a gene in the correct orientation, a fusion transcript is generated from the upstream coding sequence and the reporter gene, simultaneously mutating the trapped gene and reporting its expression pattern. This gene-trapping strategy has been very successful for ES-cell mutagenesis in the mouse, and various gene-trap and related vectors have been developed. These vectors generally contain a selectable marker, such as the neomycin resistance gene, so that only ES-cell clones carrying vector insertions are selected.
To date, only one gene-trap construct delivered into zebrafish by retroviral infection has been reported. As discussed in the previous section and shown in Figure 2C, this gene-trap cassette is not a ‘classical’ gene-trap construct, because it contains no reporter gene and has a splice donor, rather than a polyA signal, at its 3′ end. With an intronic insertion in the correct orientation, this cassette is expected to act like an ‘exon’ and trap genes via a ‘splice-in, splice-out’ mechanism. However, as discussed in the previous section, this trapping mechanism appears to have limited efficiency (<10% of correctly oriented intronic proviral integrations triggered the expected trapping event). In mouse ES cells, a selectable marker is usually co-expressed with the reporter gene to select gene-trap-positive clones; in contrast, introducing the gene-trap cassette into whole zebrafish embryos by retroviral infection involves no pre-selection. The low trapping efficiency in non-selected infected fish is therefore not entirely surprising. Gene trapping in zebrafish has also been attempted using the Tol2 transposon to deliver a ‘classical’ gene-trap cassette containing a splice acceptor, a promoterless GFP, and a polyA signal [50–52]. Approximately 1 in every 12 insertions resulted in GFP expression in a specific pattern. To date, 74 Tol2 gene-trap insertions have been characterized, but only one of them displayed an observable recessive mutant phenotype. Because the sample size is still relatively small, the mutagenicity rate of this Tol2 gene-trapping approach remains unclear.
By their nature, gene-trap events are expected to be less frequent than enhancer-trap events, because gene trapping works only if the insertion lands in an intron of a gene in the correct orientation. Enhancer-trap insertions, on the other hand, can be activated by enhancers in an orientation-independent fashion and over long genomic distances. With random transgenesis and no pre-selection in the whole organism, a large-scale enhancer-trap screen is therefore expected to produce a higher proportion of trap events (i.e. % of insertions that give a pattern) than a gene-trap screen.
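The expectation above can be illustrated with a toy Monte Carlo simulation. It assumes a single hypothetical gene on the + strand with two introns, insertions placed uniformly at random on either strand, and enhancers that act in either orientation anywhere within 15 kb of the gene (an arbitrary reach chosen for illustration; real enhancers can act over much longer ranges). An insertion counts as a potential gene-trap event only if it lands in an intron in the gene's orientation, and as a potential enhancer-trap event if it lies anywhere within the enhancer reach.

```python
import random

# Hypothetical toy locus (bp): one gene on the + strand with two introns.
GENE_START, GENE_END = 50_000, 80_000
INTRONS = [(50_500, 60_000), (60_300, 79_000)]
ENHANCER_REACH = 15_000  # assumed orientation-independent enhancer range

def gene_trap_event(pos: int, strand: str) -> bool:
    """Gene trap: must land in an intron AND in the gene's orientation."""
    in_intron = any(start <= pos <= end for start, end in INTRONS)
    return in_intron and strand == "+"

def enhancer_trap_event(pos: int) -> bool:
    """Enhancer trap: any orientation, anywhere within reach of the gene's enhancers."""
    return GENE_START - ENHANCER_REACH <= pos <= GENE_END + ENHANCER_REACH

def simulate(n: int, region=(0, 200_000), seed: int = 1):
    """Fractions of n random insertions that qualify as gene-trap / enhancer-trap events."""
    rng = random.Random(seed)
    gene_traps = enhancer_traps = 0
    for _ in range(n):
        pos, strand = rng.randint(*region), rng.choice("+-")
        gene_traps += gene_trap_event(pos, strand)
        enhancer_traps += enhancer_trap_event(pos)
    return gene_traps / n, enhancer_traps / n

print(simulate(20_000))
```

With these toy parameters, the expected fractions are about 7% for gene-trap events and about 30% for enhancer-trap candidates; the gap comes from the intron and orientation requirements, not from the specific numbers chosen.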
As discussed previously, the only large-scale forward retroviral mutagenesis screen has isolated about 500 visible embryonic recessive mutants, representing about 385 mutated genes, 335 of which have been identified [16–20]. This set of embryonic essential genes covers a broad range of biochemical and cellular functions, prompting a number of ‘shelf screens’ to identify subsets of mutants affecting specific developmental pathways beyond simple embryonic lethality. The power of these shelf screens lies in the fact that the mutated gene is already known, eliminating the need for subsequent gene identification. Examples include the identification of mutations affecting cartilage differentiation [18, 20, 53] and eye development, and of mutations leading to cystic kidneys and enlarged livers. Some of the affected genes in these developmental pathways were also found to be homologues of human disease genes [55, 56]. Thus, for certain phenotypes, zebrafish embryonic lethal mutants may serve as models of human disease.
However, many human diseases are age-related, so it is also desirable to use adult animals for disease modeling. Analyzing permanent mutations in adult fish requires either that the homozygous mutations be nonessential for early development or that embryonic lethal mutations be maintained in the heterozygous state. In either case, the mutant adult fish can potentially serve as ‘sensitized’ mutant lines, providing a platform for chemical or genetic modifier screens aimed at identifying small molecules or mutations that suppress or enhance the disease phenotype.
Retroviral insertional mutagenesis is particularly suitable for establishing such sensitized mutant lines as the starting generation for modifier screens, because the genetic background of insertional mutant lines can be thoroughly characterized in terms of the number and locations of proviral integrations in the genome. In contrast, the mutagenic lesions generated by ENU mutagenesis, mainly point mutations, cannot all be identified easily, let alone thoroughly profiled. The complex genetic background of an ENU-induced sensitized mutant line may complicate subsequent modifier screens, since co-segregating, undesired lesions elsewhere in the genome may also contribute to altered mutant phenotypes, making genetic interactions difficult to interpret without extensive backcrossing to clear the large number of background mutations.
One example of sensitized mutant lines generated by retroviral insertional mutagenesis was reported by Amsterdam et al., who found that a number of heterozygotes for mutations in ribosomal protein (RP) genes exhibit a strong genetic predisposition to cancer, predominantly malignant peripheral nerve sheath tumors (MPNSTs). MPNST is otherwise a rare neoplasm in wild-type zebrafish populations. Interestingly, zebrafish homozygous for mutations in tp53 (the zebrafish homologue of human p53) are also predisposed to this rare tumor type, suggesting possible interactions between these two pathways. Modifier screens based on these sensitized mutant lines could uncover novel genetic interactions in downstream and parallel pathways, or even small molecules that rescue the cancer-predisposition phenotypes.
To complement loss-of-function screens and expand the application of insertional mutagenesis in zebrafish, we have designed a virus that allows gain-of-function screens, inspired by P-element-based gain-of-function screens in flies. In these fly studies, the P element was used to deliver an upstream activating sequence (UAS) DNA response element into the genome and, in combination with Gal4 driver lines, the phenotypic consequence of overexpressing the downstream gene was analyzed [59, 60]. In these Drosophila screens, overexpression from nearly 2% of the insertions led to defects in processes such as wing development, germ cell development and survival, and muscle formation. Like P-element insertions, MLV integrations occur primarily at the 5′ end of genes, making MLV an excellent vector for genome-wide gain-of-function screens. A gain-of-function screen should markedly increase the efficiency of phenotype-based screens in at least two ways. First, the screen can be performed in the F1 generation, saving the resources required to raise F2 generations. Second, the screen examines the effects of all germline insertions in a mosaic founder, instead of propagating only a fraction of the insertions in a few selected F1 progeny. These two features make it possible for a small group to screen tens of thousands of insertions.
To this end, we (Maddison LA and Chen W) modified the viral construct to include a polyA signal to terminate transcription from any upstream promoter, followed by a UAS-based promoter to allow Gal4-mediated gene activation, and a downstream synthetic element that includes a splice donor to tag activated transcripts (Figure 3). When the virus inserts in the correct orientation, Gal4-dependent transcription is initiated from the UAS promoter within the provirus, and splicing occurs from the splice donor of the synthetic element to the nearest downstream exon. If the viral insertion lies within an intron, the preceding exons are not included in the resulting RNA; if it lies upstream of the first coding exon, a full-length protein should be produced. Because only one allele needs to be targeted to activate the virus-inserted gene, the genetic screen can be performed in the first generation. Furthermore, depending on the promoters used to control expression of the Gal4 transcription factor, overexpression of the virus-inserted gene can be restricted spatially and temporally.
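The exon-inclusion rules described above can be written down directly. This is a minimal sketch assuming a hypothetical gene on the + strand with invented exon coordinates, an insertion that lies in an intron or upstream of the gene (not inside an exon), and no alternative splicing. Exons downstream of the provirus are included in the Gal4/UAS-driven transcript, and the protein is predicted to be full length only if the insertion precedes the first coding exon.

```python
def activated_transcript(exons, insertion_pos, first_coding_exon, same_orientation=True):
    """Predict the Gal4/UAS-driven transcript from a provirus in a + strand gene.

    exons: sorted list of (start, end) tuples; the insertion is assumed to lie outside exons.
    first_coding_exon: index into exons of the exon containing the start codon.
    Returns (exons spliced after the provirus's splice donor, full_length_protein).
    """
    if not same_orientation:
        return [], False  # the UAS promoter points away from the gene
    downstream = [ex for ex in exons if ex[0] > insertion_pos]
    full_length = insertion_pos < exons[first_coding_exon][0]
    return downstream, full_length

# Hypothetical gene: exon 0 is 5' UTR only; exon 1 contains the start codon.
exons = [(1_000, 1_100), (2_000, 2_200), (3_000, 3_300), (4_000, 4_500)]
print(activated_transcript(exons, 1_500, 1))  # ([(2000, 2200), (3000, 3300), (4000, 4500)], True)
print(activated_transcript(exons, 2_500, 1))  # ([(3000, 3300), (4000, 4500)], False)
```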
We generated this viral construct and found that the modification had little impact on viral titers compared with previously used vectors. We routinely generated fish carrying more than 10 proviral inserts per cell. When F0 fish carrying viral insertions are mated to a transgenic line carrying Gal4:VP16, we can readily detect overexpression of the genes downstream of the viral inserts in F1 embryos. This insertional mutagenesis strategy is therefore an effective way to overexpress genes at random. A full genetic screen is under way to determine the functional consequences of overexpression.
The major advantage of retroviral insertional screens over ENU-based screens is the substantially increased speed with which mutated genes can be identified. Locating proviral integration sites in the genome relies on acquiring the genomic sequences adjacent to the integration sites and then mapping the cloned sequences onto the whole-genome assembly. This is usually done by a combination of inverse PCR or linker-mediated PCR (LM-PCR), shotgun cloning, and DNA sequencing, followed by a similarity search of the cloned flanking sequences against the whole-genome assembly (e.g. [26, 29]). With current protocols, only a fraction of the integrations is thought to be recovered, because quantitative PCR often estimates more integrations in a sample than are actually recovered (Jao L and Burgess SM, unpublished data). This limited recovery was not a problem for phenotype-driven insertional mutagenesis screens, because only the small fraction of integrations linked to the phenotypes needed to be identified. However, to use the same retroviral mutagenesis strategy as a reverse genetics tool for building a ‘knock-out’ library, it becomes critical to identify as many integrations as possible at the lowest possible cost. Several limitations in the current mapping technology can account for the incomplete recovery of integrations. The first is the choice of restriction enzyme used to fragment the genomic DNA. Usually only a single frequently cutting enzyme (e.g. MseI) is used. Integrations that lie either too close to or too far from the nearest site for this enzyme yield LM-PCR amplicons that are too small to be resolved or too large to be amplified, limiting recovery to a subset of the integrations in the mix.
One solution to this problem is to digest separate aliquots of the genomic DNA with different frequently cutting enzymes, increasing the chance that each integration lies on an appropriately sized fragment before LM-PCR. A second limitation is amplification bias: some integrations may be underrepresented in the mix of LM-PCR products and thus fail to be recovered with a small number of sequencing reactions. Exhaustive sampling of the LM-PCR products for DNA sequencing should help recover these underrepresented clones. A third limitation is that the method is labor-intensive and expensive. Performing a large number of restriction digestions, LM-PCRs, subcloning, colony picking and growing, DNA isolation, and DNA sequencing requires substantial manpower and investment, especially when multiple restriction digestions and exhaustive sampling are used to maximize the recovery of integrations.
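The fragment-size limitation, and why digesting separate aliquots with a second enzyme helps, can be illustrated with a toy simulation. The Python sketch below assumes a uniformly random genome (real genomes, including the AT-rich zebrafish genome, are not uniform), random integration sites, and a hypothetical amplifiable window of 50–500 bp measured from the integration site to the nearest downstream cut site. MseI (TTAA) and MaeI (CTAG) are used as example enzymes because they appear later in this review; both sites are palindromic, so scanning one strand finds every cut site. The printed fractions are illustrative only, not measurements.

```python
import bisect
import random
from typing import List, Sequence

MIN_LEN, MAX_LEN = 50, 500  # hypothetical amplifiable window, in bp


def cut_sites(genome: str, site: str) -> List[int]:
    """Sorted start positions of a recognition site in the genome."""
    positions, i = [], genome.find(site)
    while i != -1:
        positions.append(i)
        i = genome.find(site, i + 1)
    return positions


def recoverable(integration: int, digests: Sequence[List[int]]) -> bool:
    """True if any separate digest puts the nearest downstream cut inside the window."""
    for cuts in digests:
        k = bisect.bisect_left(cuts, integration)
        if k < len(cuts) and MIN_LEN <= cuts[k] - integration <= MAX_LEN:
            return True
    return False


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200_000))
integrations = rng.sample(range(len(genome) - MAX_LEN), 1_000)
mse, mae = cut_sites(genome, "TTAA"), cut_sites(genome, "CTAG")

one = sum(recoverable(x, [mse]) for x in integrations) / len(integrations)
two = sum(recoverable(x, [mse, mae]) for x in integrations) / len(integrations)
print(f"MseI only: {one:.2f}; MseI or MaeI (separate digests): {two:.2f}")
```

Adding a second, independent digest can only add recoverable integrations, never remove them, which is the rationale for digesting separate aliquots.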
To circumvent these limitations and improve global detection capacity, our laboratory has recently developed a new strategy to recover integration sites (Figure 4; Jao L and Burgess SM, unpublished data). The strategy starts with two restriction enzymes, MseI and MaeI, each used separately to fragment every integration-containing genomic DNA sample. These enzymes recognize the 4-base sites TTAA and CTAG, respectively, and both leave the same 5′-TA overhang, so fragments from either digest can be ligated to the same set of linkers while doubling the complexity of the digested genomic DNA. We have further embedded a 6-base ‘barcode’ sequence in each linker. Because there are 4^6 = 4096 possible 6-base sequences, up to 4096 different genomic DNA samples (e.g. those isolated from different F1 male fish) can be individually ‘coded’ by ligation to the barcode linkers. This barcoding strategy minimizes sample handling and reagent consumption during LM-PCR, because the barcoded DNA samples can be pooled and amplified as a mix in a much smaller volume. However, the most labor-intensive and expensive part of the method is the cloning and sequencing that follow LM-PCR. To make these post-LM-PCR steps cost-efficient and extremely high-throughput, we incorporated a massively parallel sequencing platform, the Illumina 1G system, into our workflow for clonal amplification and sequencing of the LM-PCR products, replacing conventional shotgun cloning and Sanger sequencing. The Illumina 1G sequencer generates up to 1.3 Gbp in 25–36-bp reads in a single 80 h run, so it can process thousands of DNA samples containing multiple retroviral integrations in a single run. Most importantly, the complexity of the samples should be far below the capacity of the platform (approximately 10 000–20 000 unique integrations are expected in 1000 F1 progeny of well-infected founders).
This means that even underrepresented clones should be sequenced. When each LM-PCR product is sequenced from both ends, most of the resulting paired-end 25–36-bp reads, which also carry the barcode information, should be sufficient for alignment to the genome to identify the integration sites. By implementing this new technology, we can substantially increase the robustness and coverage of integration-site recovery while reducing the overall cost by more than 100-fold.
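Two quantitative points in this workflow can be checked with a short Python sketch: the 4096 barcodes follow from 4^6, and dividing the stated run output (1.3 Gbp) by read length and by the expected 10 000–20 000 integrations gives the average number of reads available per integration if a whole run were devoted to one pool (ignoring lane allocation, read pairing, and quality filtering). The `demultiplex` function assumes, hypothetically, that each read begins with its 6-base barcode followed by genomic flank; the real read layout depends on linker and primer design, which is not specified here.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Set

BARCODE_LEN = 6  # 6-base barcode embedded in each linker

# Every possible 6-base barcode: 4**6 = 4096 distinguishable samples.
ALL_BARCODES = ["".join(p) for p in product("ACGT", repeat=BARCODE_LEN)]


def reads_per_integration(run_bp: float, read_len: int, n_integrations: int) -> float:
    """Average reads per unique integration if the whole run went to one pool."""
    return run_bp / read_len / n_integrations


def demultiplex(reads: Iterable[str], valid: Set[str]) -> Dict[str, List[str]]:
    """Group reads by a leading barcode (hypothetical layout: barcode + flank)."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, flank = read[:BARCODE_LEN], read[BARCODE_LEN:]
        if barcode in valid:  # reads with unrecognized barcodes are discarded
            bins[barcode].append(flank)
    return dict(bins)


print(len(ALL_BARCODES))                                # 4096
print(round(reads_per_integration(1.3e9, 36, 20_000)))  # less favorable case: 1806
print(round(reads_per_integration(1.3e9, 25, 10_000)))  # more favorable case: 5200
print(demultiplex(["AAAAAAGATTACA", "CCCCCCTTAGG", "NNNNNNACGT"], {"AAAAAA", "CCCCCC"}))
```

Even in the less favorable case the average depth is in the thousands of reads per integration, which is the arithmetic behind expecting clones amplified well below average to still be read.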
Retroviral vectors have been extensively exploited in the mouse as a versatile transgenesis tool. In addition to serving as powerful tools for permanent loss-of-function (e.g. via gene-trap mutagenesis) or gain-of-function strategies (e.g. via overexpression of a transgene), a new generation of retroviral vectors has been developed to achieve conditional control of gene expression in the mouse, allowing genes to be activated or silenced where and when the investigator chooses (reviewed in ). This temporal and spatial control of gene expression has been accomplished with binary transgenic systems, in which gene expression is controlled by the interaction of two components: an ‘effector’ transgene whose product acts on a ‘target’ transgene, resulting in either transcriptional transactivation or DNA recombination. The tetracycline-controlled transactivator (tTA or reverse tTA, rtTA)/tet operator (tetO) system and the yeast Gal4/UAS system are two widely used examples for controlling temporal and spatial gene expression, while the Cre and Flp recombinase systems are used to induce site-specific DNA recombination, activating or silencing gene expression by deleting DNA fragments flanked by directly repeated loxP or FRT sites (so-called ‘floxed’ or ‘flrted’ alleles for Cre or Flp, respectively). However, the application of such binary systems to achieve more sophisticated control of gene expression in zebrafish is still in its infancy. Only a few such effector and target transgenic fish lines exist (for Cre/loxP lines, e.g. [67, 68]; for Gal4/UAS lines, e.g. [69–73]). These transgenic lines were established either by microinjection of linear DNA [67–69] or by using the Tol2 [70, 72, 73] or Sleeping Beauty transposon systems.
Unlike in mouse genetics, to date no such zebrafish line has been generated using retroviral vectors as the vehicle for transgenesis, most likely because generating high-titer virus is not always straightforward. However, because the retroviral insertional system remains the most efficient insertional agent in vertebrates, it is worth exploiting for this purpose.
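The Cre/loxP logic of the binary systems discussed above, excision of DNA flanked by directly repeated loxP sites, can be modeled at the string level. The Python sketch below uses the standard 34-bp loxP sequence; the function name `cre_excise` and the placeholder ‘PROMOTER’, ‘STOP’, and ‘GENE’ segments are hypothetical. It handles only two sites in the same orientation (an inverted pair, which Cre would invert rather than excise, is left unchanged) and does not model recombination efficiency or mosaicism.

```python
# 34-bp loxP: 13-bp inverted repeat, 8-bp asymmetric spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(seq: str, site: str = LOXP) -> str:
    """Excise DNA between the first two directly repeated loxP sites.

    Recombination between direct repeats removes the intervening segment and
    leaves a single loxP site behind. Sequences with fewer than two direct
    sites are returned unchanged (inverted pairs are not modeled).
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[: first + len(site)] + seq[second + len(site):]


# Hypothetical "lox-stop-lox" target: Cre removes the STOP cassette, activating GENE.
floxed = "PROMOTER" + LOXP + "STOP" + LOXP + "GENE"
activated = cre_excise(floxed)
print(activated == "PROMOTER" + LOXP + "GENE")  # True
print(activated.count(LOXP))                     # 1
```

Flanking an essential exon instead of a STOP cassette would model the silencing use of the same excision reaction.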
As saturation mutagenesis projects using retroviral insertional technologies are being developed in parallel by our laboratories and Znomics, Inc., aided by new integration recovery technologies, the number of insertional mutants with defined mutagenic lesions is increasing exponentially. Every important class of proteins should soon be represented among the identified insertional mutant genes. It will then become possible to screen systematically for embryonic and late-onset phenotypes by breeding the individual mutations to homozygosity. Prioritizing classes of proteins involved in human disease, especially those without existing mouse mutations, is one promising avenue to pursue. Because zebrafish are pharmacologically and chemically tractable, homozygous mutant lines showing prominent disease phenotypes can serve as excellent starting points for small-molecule modifier screens to identify compounds that rescue their phenotypes. Compounds identified in this way would be lead candidates for drug discovery to treat the equivalent human disease. At least two chemical modifier screens have been successfully performed in whole ENU-induced mutant zebrafish embryos, identifying compounds that suppress aortic coarctation and embryonic cell cycle defects, respectively. In our preliminary screen for late-onset phenotypes, we found that fish homozygous for an insertion in the putative thyroglobulin (tg) gene showed no observable embryonic phenotype. However, as they entered adulthood, they developed a goiter-like, enlarged red bulb under the jaw, a phenotype reminiscent of that observed in human hypothyroidism (Figure 5). Because this phenotype is late-onset, it would have been missed in classical embryo-based, phenotype-driven screens.
Chemical modifier screens should also be applicable to adult mutant fish, such as the tg mutants described above, to discover small molecules that suppress, in this case, the goiter-like phenotype, even without directly targeting the affected gene.
Retroviral insertional mutagenesis in zebrafish has evolved from a straightforward, easily tractable mutagenesis strategy used in classical phenotype-driven screens [17–20] into a powerful reverse genetic tool for constructing mutant libraries that will eventually contain mutations in nearly every zebrafish gene. Unlike in the mouse, where insertional mutations are archived as frozen ES cell lines, insertional mutations in zebrafish are stored as frozen sperm samples. Through simple in vitro fertilization (IVF), fish heterozygous for the desired mutation can be generated quickly and cost-efficiently. This feature makes zebrafish an excellent alternative vertebrate system for human disease modeling and whole-animal drug discovery, especially when combined with reverse genetic approaches that prioritize genes implicated in human disease pathways.
This retroviral insertional technology has also been developed as a transgenesis tool, enabling the only genome-wide enhancer-trap screen performed in a vertebrate. This screen not only generated hundreds of unique enhancer-trap lines with distinct expression patterns in various cell types, but also helped probe vertebrate genome architecture and shed light on the mechanism that maintains conserved synteny in vertebrates over the course of evolution. More thorough probing of vertebrate genomic regulatory elements can be achieved as more diverse enhancer-trap vectors are designed and newer transgenesis methods are developed. We have also used retroviral technologies to generate gain-of-function mutations in which genes can be misexpressed in a tissue-specific manner.
To ask increasingly sophisticated biological questions, investigators need new tools that allow them to turn genes on and off at will. It is desirable to have systems that can modify transgenes after integration. Various binary transgenic systems have been developed in the mouse for this purpose, but few equivalent systems exist in zebrafish. Each functional genomics approach has its strengths and weaknesses, and only by taking advantage of each will the functions of the genome be understood. In this respect, retroviral insertional technology in zebrafish has plenty of room to improve and expand, as the numerous successful examples in the mouse demonstrate.
Li-En Jao is from Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
Lisette Maddison is from Znomics Inc., Portland, OR, USA.
Wenbiao Chen is from Vanderbilt University School of Medicine, Nashville, TN, USA.
Shawn M. Burgess is from Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.