Complete collections of well-defined mutants have helped shed light on the biology of model organisms such as flies and bacteria. Likewise, a complete collection of mouse mutants would enhance our ability to understand mammalian biology. Libraries of mutant mouse embryonic stem cells (ESCs) are particularly valuable because they can be readily cryopreserved and used to generate mutant mice. Gene trapping in ESCs is an effective, high-throughput technique for generating insertional mutations in the mouse genome. Ultimately, however, non-targeted trapping becomes inefficient: some genes are trapped repeatedly, whereas others are trapped rarely, if at all. A better understanding of the characteristics that determine susceptibility (or resistance) to trapping would therefore be useful, both to further our understanding of how vectors insert into the genome and to help guide large-scale mouse mutagenesis efforts.
The factors that determine the “trappability” of individual genes (i.e., their likelihood of being inactivated by gene trapping) are unclear. The integration of gene-trapping vectors into chromosomal DNA is potentially influenced by a number of factors, including the intrinsic properties of the vector, the expression level of the gene in mouse ESCs, chromatin structure, DNA substrate recognition, and gene size. In addition, the existence of highly favored integration sites (hotspots) complicates efforts to understand the factors that control trappability.
Gene expression levels in ESCs are believed to correlate positively with trapping efficiency for expression-dependent vectors, but the size of the expression effect has not been systematically quantified or compared across gene-trap vectors. Splice-acceptor (SA) gene-trap vectors depend on the integration of a new SA sequence to interrupt the trapped gene. When successful, SA-trap vectors inactivate the trapped gene and produce an antibiotic-resistance gene product that allows selection of the mutant cell lines. Because these vectors lack a promoter, endogenous expression of the trapped gene is required to drive transcription of the vector product. However, the effect of gene expression has not been tested on a large scale while controlling for gene length, which is also thought to affect trappability.
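As a toy illustration of why the expression effect should be estimated while controlling for gene length, the Python sketch below uses synthetic numbers, not data from this study. It builds hypothetical genes whose log expression is correlated with log length and whose "trap score" depends on length alone. A single-predictor least-squares fit on expression then reports a spurious positive slope, whereas a two-predictor fit attributes the whole effect to length. All names and values are illustrative assumptions.

```python
"""Toy illustration (synthetic data, not from the study): confounding of an
expression effect on trapping by gene length."""


def centered(values):
    """Return the values minus their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def marginal_slope(x, y):
    """Least-squares slope of y on a single predictor x (with intercept)."""
    cx, cy = centered(x), centered(y)
    return dot(cx, cy) / dot(cx, cx)


def partial_slopes(x1, x2, y):
    """Least-squares slopes (b1, b2) of y on x1 and x2 jointly (with intercept),
    solved from the 2x2 normal equations on centered data."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 * s12  # nonzero unless x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical genes: log length and log expression are correlated,
# but the synthetic trap score depends on length only.
log_length = [1, 2, 3, 4, 5, 6]
log_expression = [1, 1, 3, 3, 5, 5]
trap_score = [2 * x for x in log_length]

naive = marginal_slope(log_expression, trap_score)
b_length, b_expression = partial_slopes(log_length, log_expression, trap_score)
print(f"expression slope, length ignored:    {naive:.2f}")
print(f"expression slope, length controlled: {b_expression:.2f}")
print(f"length slope, expression controlled: {b_length:.2f}")
```

With these inputs the naive expression slope is 2.00, while the length-adjusted expression slope is 0.00 and the length slope is 2.00.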
In polyadenylation (poly-A) gene-trap vectors, by contrast, the antibiotic-resistance gene is driven by a strong promoter within the vector, and the stability of its transcript depends on the poly-A signal of the trapped gene. Because transcription of the antibiotic-resistance gene does not depend on endogenous expression of the trapped gene, poly-A trap vectors are predicted to trap genes regardless of whether they are expressed in ESCs.
The method of vector delivery (retroviral vector versus plasmid DNA) may also influence which genes are inactivated by gene trapping. Retroviruses are predicted to insert at the 5′ ends of transcriptionally active genes and may recognize specific substrates in genomic DNA, whereas little is known about the insertion preferences of plasmid vectors. Both plasmid and retroviral delivery have been used for SA gene trapping, but poly-A trapping has relied exclusively on retroviral delivery.
The recent release of a near-complete mouse genome sequence, advances in techniques for estimating gene-expression levels, and the availability of a public gene-trapping database (www.genetrap.org) make it possible to assess globally the likelihood that a gene will be inactivated by gene trapping. In this study, we used regression techniques to model the effects of gene length and gene-expression level on gene trapping with different gene-trap vectors. We also sought to define hotspots for gene-trapping events by using the regression models to identify genes that were trapped more frequently than the models predicted. Our findings provide an improved understanding of the factors that control vector insertion into the genome.
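The regression models are not specified in this section, so the following is only a sketch under stated assumptions. It fits a Poisson regression (log link) of per-gene trap counts on log gene length and log expression by Newton's method, then flags as a candidate hotspot any gene whose observed count is improbably high under its fitted Poisson mean. The synthetic genes, the cutoff `ALPHA`, and all function names are hypothetical; the Poisson model is a swapped-in choice, not necessarily the one used in this study.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def poisson_loglik(beta, X, y):
    """Poisson log-likelihood with log link, up to the constant -sum(log(y!))."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        total += yi * eta - math.exp(eta)
    return total


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood Poisson regression via Newton's method with step halving.
    Assumes column 0 of X is the intercept (all ones)."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, xi))) for xi in X]
        grad = [sum((yi - mi) * xi[j] for xi, yi, mi in zip(X, y, mu)) for j in range(p)]
        info = [[sum(mi * xi[j] * xi[k] for xi, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(info, grad)
        t, old = 1.0, poisson_loglik(beta, X, y)
        while True:  # halve the step until the log-likelihood does not decrease
            trial = [b + t * s for b, s in zip(beta, step)]
            if poisson_loglik(trial, X, y) >= old or t < 1e-8:
                break
            t /= 2
        beta = trial
        if max(abs(t * s) for s in step) < tol:
            break
    return beta


def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), computed as 1 - P(X <= k - 1)."""
    term, cdf = math.exp(-mu), 0.0
    for j in range(k):
        cdf += term
        term *= mu / (j + 1)
    return max(0.0, 1.0 - cdf)


# Synthetic genes (hypothetical): (log gene length, log expression, times trapped).
genes = [
    (0, 0, 3), (1, 0, 4), (2, 0, 7),
    (0, 1, 4), (1, 1, 40), (2, 1, 12),
    (0, 2, 7), (1, 2, 12), (2, 2, 20),
]
X = [[1.0, float(l), float(e)] for l, e, _ in genes]
y = [n for _, _, n in genes]
beta = fit_poisson(X, y)

ALPHA = 1e-3  # hypothetical per-gene cutoff
hotspots = []
for i, (xi, yi) in enumerate(zip(X, y)):
    expected = math.exp(sum(b * v for b, v in zip(beta, xi)))
    p = poisson_upper_tail(yi, expected)
    if p < ALPHA:
        hotspots.append(i)
    print(f"gene {i}: observed={yi:3d} expected={expected:6.2f} P(X>=obs)={p:.2e}")
print("hotspots:", hotspots)
```

Here only gene 4 is flagged. In a genome-wide analysis one would also need to correct for multiple testing and consider overdispersion, and hotspots themselves inflate the fitted means, so refitting without the flagged genes is one possible refinement.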