Motivation: Discovering variation among high-throughput sequenced genomes relies on efficient and effective mapping of sequence reads. The speed, sensitivity and accuracy of read mapping are crucial to determining the full spectrum of single nucleotide variants (SNVs) as well as structural variants (SVs) in the donor genomes analyzed.
Results: We present drFAST, a read mapper designed for di-base encoded ‘color-space’ sequences generated with the AB SOLiD platform. drFAST is specially designed for better delineation of structural variants, including segmental duplications, and is able to return all possible map locations and underlying sequence variation of short reads within a user-specified distance threshold. We show that drFAST is more sensitive in comparison to all commonly used aligners such as Bowtie, BFAST and SHRiMP. drFAST is also faster than both BFAST and SHRiMP and achieves a mapping speed comparable to Bowtie.
Availability: The source code for drFAST is available at http://drfast.sourceforge.net
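As a rough illustration of the di-base "color-space" encoding that drFAST targets, the sketch below assumes the commonly documented SOLiD scheme, in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The function names are hypothetical and this is not drFAST code.

```python
# Minimal sketch of di-base (color-space) encoding and decoding.
# Assumption: color = XOR of 2-bit base codes with A=0, C=1, G=2, T=3.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq):
    """Return the first base plus one color (0-3) per adjacent base pair."""
    codes = [BASE_TO_BITS[b] for b in seq]
    colors = [a ^ b for a, b in zip(codes, codes[1:])]
    return seq[0], colors


def decode_colors(first_base, colors):
    """Rebuild the base sequence from the first base and the color list."""
    current = BASE_TO_BITS[first_base]
    bases = [first_base]
    for color in colors:
        current ^= color  # each color tells how to step from one base to the next
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


first, colors = encode_colors("ACGTT")
print(first, colors)
print(decode_colors(first, colors))
```

Because every decoded base depends on all preceding colors, a single miscalled color changes every base after it, which is one reason color-space reads are usually aligned in color space rather than decoded first.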
With the introduction of next-generation sequencing (NGS) technologies, we are facing an exponential increase in the amount of genomic sequence data. The success of all medical and genetic applications of next-generation sequencing critically depends on the existence of computational techniques that can process and analyze the enormous amount of sequence data quickly and accurately. Unfortunately, the current read mapping algorithms have difficulties in coping with the massive amounts of data generated by NGS.
We propose a new algorithm, FastHASH, which drastically improves the performance of the seed-and-extend type hash table based read mapping algorithms, while maintaining the high sensitivity and comprehensiveness of such methods. FastHASH is a generic algorithm compatible with all seed-and-extend class read mapping algorithms. It introduces two main techniques, namely Adjacency Filtering, and Cheap K-mer Selection.
We implemented FastHASH and merged it into the codebase of the popular read mapping program, mrFAST. Depending on the edit distance cutoffs, we observed up to 19-fold speedup while still maintaining 100% sensitivity and high comprehensiveness.
The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and the ability to handle indels, to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm that uses the whole read information for mapping; high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of highly possible matches and implements a modified Smith–Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs, which is important for quantitative processing of RNA-Seq datasets.
Motivation: With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this.
Results: We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints.
Availability: YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA.
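The idea of choosing an optimal set of alignments that cover a query can be sketched as a longest-path computation over a directed acyclic graph. The code below is only an illustration under simplifying assumptions: alignments are (query_start, query_end, score) tuples with an exclusive end, alignments in a chain may not overlap, and every junction costs one constant penalty. The function name and scoring are hypothetical and do not reproduce YAHA's actual model.

```python
def best_alignment_set(alignments, breakpoint_penalty):
    """Pick the chain of non-overlapping alignments with the highest total score.

    alignments: list of (query_start, query_end, score), end exclusive.
    Each junction between consecutive alignments costs breakpoint_penalty.
    Returns (total_score, chain) with the chain ordered along the query.
    """
    # Visiting alignments by query end guarantees predecessors are scored first.
    order = sorted(range(len(alignments)), key=lambda i: alignments[i][1])
    best = {}  # index -> (best chain score ending here, predecessor index)
    for pos, i in enumerate(order):
        start, _end, score = alignments[i]
        best[i] = (score, None)
        for j in order[:pos]:
            if alignments[j][1] <= start:  # edge j -> i exists in the DAG
                candidate = best[j][0] + score - breakpoint_penalty
                if candidate > best[i][0]:
                    best[i] = (candidate, j)
    if not best:
        return 0, []
    tail = max(best, key=lambda i: best[i][0])
    total = best[tail][0]
    chain = []
    while tail is not None:  # walk predecessors back to the chain start
        chain.append(alignments[tail])
        tail = best[tail][1]
    return total, chain[::-1]


hits = [(0, 50, 50), (50, 100, 45), (0, 100, 60)]
print(best_alignment_set(hits, breakpoint_penalty=20))
print(best_alignment_set(hits, breakpoint_penalty=40))
```

With a small penalty the two partial alignments are preferred, which corresponds to calling a breakpoint; with a larger penalty the single lower-quality full-length alignment wins.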
The development of Next Generation Sequencing technologies, capable of sequencing hundreds of millions of short reads (25–70 bp each) in a single run, is opening the door to population genomic studies of non-model species. In this paper we present SHRiMP - the SHort Read Mapping Package: a set of algorithms and methods to map short reads to a genome, even in the presence of a large amount of polymorphism. Our method is based upon a fast read mapping technique, separate thorough alignment methods for regular letter-space as well as AB SOLiD (color-space) reads, and a statistical model for false positive hits. We use SHRiMP to map reads from a newly sequenced Ciona savignyi individual to the reference genome. We demonstrate that SHRiMP can accurately map reads to this highly polymorphic genome, while confirming high heterozygosity of C. savignyi in this second individual. SHRiMP is freely available at http://compbio.cs.toronto.edu/shrimp.
Next Generation Sequencing (NGS) technologies are revolutionizing the way biologists acquire and analyze genomic data. NGS machines, such as Illumina/Solexa and AB SOLiD, are able to sequence genomes 200-fold more cheaply than previous methods. One of the main application areas of NGS technologies is the discovery of genomic variation within a given species. The first step in discovering this variation is the mapping of reads sequenced from a donor individual to a known (“reference”) genome. Differences between the reference and the reads are indicative either of polymorphisms, or of sequencing errors. Since the introduction of NGS technologies, many methods have been devised for mapping reads to reference genomes. However, these algorithms often sacrifice sensitivity for fast running time. While they are successful at mapping reads from organisms that exhibit low polymorphism rates, they do not perform well at mapping reads from highly polymorphic organisms. We present a novel read mapping method, SHRiMP, that can handle much greater amounts of polymorphism. Using Ciona savignyi as our target organism, we demonstrate that our method discovers significantly more variation than other methods. Additionally, we develop color-space extensions to classical alignment algorithms, allowing us to map color-space, or “dibase”, reads generated by AB SOLiD sequencers.
Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
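The voting step described above can be sketched as follows. This is a minimal illustration, assuming a plain dictionary index of exact k-mers and equally spaced subreads; the function names, the subread length `k` and the spacing `step` are hypothetical choices, and the detailed alignment that fills in mismatches and indels afterwards is omitted.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, step):
    """Let each subread vote for the read start position that its hit implies.

    Returns (start_position, votes) for the winning location, or None if no
    subread matched anywhere.
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        subread = read[offset:offset + k]
        for pos in index.get(subread, ()):
            votes[pos - offset] += 1  # a hit at pos implies the read starts at pos - offset
    if not votes:
        return None
    return votes.most_common(1)[0]


reference = "TTGACCATGCAGGTCAAGTCCGATTACGGATCCA"
index = build_index(reference, k=4)
read = "GCAGGACAAGTCCGAT"  # reference[8:24] with one substituted base
print(seed_and_vote(read, index, k=4, step=2))
```

The substituted base breaks two of the seven subreads, but the remaining ones still agree on the same start position, which is why no individual subread needs to map exactly.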
With the advent of next-generation sequencers, the growing demands to map short DNA sequences to a genome have promoted the development of fast algorithms and tools. The tools commonly used today are based on either a hash table or the suffix array/Burrows–Wheeler transform. These algorithms are best suited to finding the genome position of exactly matching short reads. However, they have limited capacity to handle mismatches. To find n-mismatches, they require O(2n) times the computation time of exact matches. Therefore, acceleration techniques are required.
We propose a hash-based method for genome mapping that reduces the number of hash references for finding mismatches without increasing the size of the hash table. The method regards DNA subsequences as words on the Galois extension field GF(2^2), and each word is encoded to a code word of a perfect Hamming code. The perfect Hamming code defines equivalence classes of DNA subsequences. Each equivalence class includes the subsequences whose corresponding words on GF(2^2) are encoded to the same code word. The code word is used as a hash key to store these subsequences in a hash table. Specifically, it reduces by about 70% the number of hash keys necessary for searching the genome positions of all 2-mismatches of a 21-base-long DNA subsequence.
The paper shows that a perfect Hamming code can reduce the number of hash references for hash-based genome mapping. As the computation time to calculate code words is far shorter than a hash reference, our method is effective in reducing the computation time to map short DNA sequences to a genome. The amount of data that DNA sequencers generate continues to increase, and more accurate genome mappings are required. Thus our method will be a key technology for developing faster genome mapping software.
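To give a sense of scale for the 2-mismatch search over 21-base subsequences mentioned above, the sketch below counts how many distinct sequences lie within a given number of substitutions of one DNA word, which is the number of keys a naive enumerate-and-look-up strategy would probe. Whether this is the exact baseline behind the reported 70% reduction is an assumption; the function name is hypothetical and the code does not implement the Hamming-code method itself.

```python
from math import comb


def neighborhood_size(length, max_mismatches, alphabet_size=4):
    """Count sequences within max_mismatches substitutions of a fixed word.

    For each distance d, choose d positions and one of (alphabet_size - 1)
    replacement letters at each of them.
    """
    return sum(
        comb(length, d) * (alphabet_size - 1) ** d
        for d in range(max_mismatches + 1)
    )


print(neighborhood_size(21, 2))
```

For a 21-base word this is 1 exact match, 63 single-substitution variants and 1,890 double-substitution variants, 1,954 keys in total.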
Read alignment is a computational bottleneck in some sequencing projects. Most of the existing software packages for read alignment are based on two algorithmic approaches: prefix-trees and hash-tables. We propose a new approach to read alignment using random permutations of strings.
We present a prototype implementation and experiments performed with simulated and real reads of human DNA. Our experiments indicate that this permutations-based prototype is several times faster than comparable programs for fast read alignment and that it aligns more reads correctly.
This approach may lead to improved speed, sensitivity, and accuracy in read alignment. The algorithm can also be used for specialized alignment applications and it can be extended to other related problems, such as assembly.
More information: http://alignment.commons.yale.edu
A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (∼14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows–Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego.
Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the increased searching space, reduced complexity of bisulfite sequence, asymmetric cytosine to thymine alignments, and multiple CpG heterogeneous methylation.
We developed an efficient bisulfite reads mapping algorithm BSMAP to address the above issues. BSMAP combines genome hashing and bitwise masking to achieve fast and accurate bisulfite mapping. Compared with existing bisulfite mapping approaches, BSMAP is faster, more sensitive and more flexible.
BSMAP is the first general-purpose bisulfite mapping software. It is able to map high-throughput bisulfite reads at whole genome level with feasible memory and CPU usage. It is freely available under the GPL v3 license.
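The asymmetric cytosine-to-thymine matching described above can be illustrated with a simple mismatch counter. This is a sketch under stated assumptions: it handles only reads from the forward strand, compares a read against a reference window of equal length without gaps, and uses a plain character loop rather than the bitwise masking that BSMAP employs. The function name is hypothetical.

```python
def bisulfite_mismatches(read, ref_window):
    """Count mismatches while tolerating bisulfite conversion on the forward strand.

    A T in the read may align to a C in the reference (an unmethylated C that
    was converted), but a C in the read must align to a C in the reference.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have equal length")
    mismatches = 0
    for read_base, ref_base in zip(read, ref_window):
        if read_base == ref_base:
            continue
        if read_base == "T" and ref_base == "C":
            continue  # consistent with C-to-T conversion, not an error
        mismatches += 1
    return mismatches


print(bisulfite_mismatches("TTGA", "CTGA"))  # read T over reference C is tolerated
print(bisulfite_mismatches("CTGA", "TTGA"))  # read C over reference T is a mismatch
```

The two calls differ only in which sequence carries the C, which is the asymmetry that makes bisulfite mapping harder than ordinary mismatch-tolerant mapping.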
Recent methods have been developed to perform high-throughput sequencing of DNA by Single Molecule Sequencing (SMS). While Next-Generation sequencing methods may produce reads up to several hundred bases long, SMS sequencing produces reads up to tens of kilobases long. Existing alignment methods are either too inefficient for high-throughput datasets, or not sensitive enough to align SMS reads, which have a higher error rate than Next-Generation sequencing.
We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the read and genome dominated by insertion and deletion error. The method is benchmarked using both simulated reads and reads from a bacterial sequencing project. We also present a combinatorial model of sequencing error that motivates why our approach is effective.
The results indicate that it is possible to map SMS reads with high accuracy and speed. Furthermore, the inferences made on the mappability of SMS reads using our combinatorial model of sequencing error are in agreement with the mapping accuracy demonstrated on simulated reads.
The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed. However, designs based on hardware are relatively rare. Field programmable gate arrays (FPGAs) have been successfully used in a number of specific application areas, such as the DSP and communications domains due to their outstanding parallel data processing capabilities, making them a competitive platform to solve problems that are “inherently parallel”.
We present a hybrid system for short read mapping utilizing both FPGA-based hardware and CPU-based software. The computation-intensive alignment and the seed generation operations are mapped onto an FPGA. We present a computationally efficient, parallel block-wise alignment structure (Align Core) to approximate the conventional dynamic programming algorithm. The performance is compared to the multi-threaded CPU-based GASSST and BWA software implementations. For single-end alignment, our hybrid system achieves faster processing speed than GASSST (with a similar sensitivity) and BWA (with a higher sensitivity); for paired-end alignment, our design achieves a slightly worse sensitivity than that of BWA but has a higher processing speed.
This paper shows that our hybrid system can effectively accelerate the mapping of short reads to a reference genome based on the seed-and-extend approach. The performance comparison to the GASSST and BWA software implementations under different conditions shows that our hybrid design achieves a high degree of sensitivity and requires less overall execution time with only modest FPGA resource utilization. Our hybrid system design also shows that the performance bottleneck for the short read mapping problem can be changed from the alignment stage to the seed generation stage, which provides an additional requirement for the future development of short read aligners.
With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
Methyltransferases (MTases) of procaryotes affect general cellular processes such as mismatch repair, regulation of transcription, replication, and transposition, and in some cases may be essential for viability. As components of restriction-modification systems, they contribute to bacterial genetic diversity. The genome of Helicobacter pylori strain 26695 contains 25 open reading frames encoding putative DNA MTases. To assess which MTase genes are active, strain 26695 genomic DNA was tested for cleavage by 147 restriction endonucleases; 24 were found that did not cleave this DNA. The specificities of 11 expressed MTases and the genes encoding them were identified from this restriction data, combined with the known sensitivities of restriction endonucleases to specific DNA modification, homology searches, gene cloning and genomic mapping of the methylated bases m4C, m5C, and m6A.
Saccharomyces cerevisiae strains carrying vps18 mutations are defective in the sorting and transport of vacuolar enzymes. The precursor forms of these proteins are missorted and secreted from the mutant cells. Most vps18 mutants are temperature sensitive for growth and are defective in vacuole biogenesis; no structure resembling a normal vacuole is seen. A plasmid complementing the temperature-sensitive growth defect of strains carrying the vps18-4 allele was isolated from a centromere-based yeast genomic library. Integrative mapping experiments indicated that the 26-kb insert in this plasmid was derived from the VPS18 locus. A 4-kb minimal complementing fragment contains a single long open reading frame predicted to encode a 918-amino-acid hydrophilic protein. Comparison of the VPS18 sequence with the PEP3 sequence reported in the accompanying paper (R. A. Preston, H. F. Manolson, K. Becherer, E. Weidenhammer, D. Kirkpatrick, R. Wright, and E. W. Jones, Mol. Cell. Biol. 11:5801-5812, 1991) shows that the two genes are identical. Disruption of the VPS18/PEP3 gene (vps18 delta 1::TRP1) is not lethal but results in the same vacuolar protein sorting and growth defects exhibited by the original temperature-sensitive vps18 alleles. In addition, vps18 delta 1::TRP1 MAT alpha strains exhibit a defect in the Kex2p-dependent processing of the secreted pheromone alpha-factor. This finding suggests that vps18 mutations alter the function of a late Golgi compartment which contains Kex2p and in which vacuolar proteins are thought to be sorted from proteins destined for the cell surface. The Vps18p sequence contains a cysteine-rich, zinc finger-like motif at the COOH terminus. A mutant in which the first cysteine of this motif was changed to serine results in a temperature-conditional carboxypeptidase Y sorting defect shortly after a shift to nonpermissive conditions. 
We identified a similar cysteine-rich motif near the COOH terminus of another Vps protein, the Vps11/Pep5/End1 protein. Preston et al. (Mol. Cell. Biol. 11:5801-5812, 1991) present evidence that the Vps18/Pep3 protein colocalizes with the Vps11/Pep5 protein to the cytosolic face of the vacuolar membrane. Together with the similar phenotypes exhibited by both vps11 and vps18 mutants, this finding suggests that they may function at a common step during vacuolar protein sorting and that the integrity of their zinc finger motifs may be required for this function.
The gene encoding the major capsid protein of the baculovirus Autographa californica nuclear polyhedrosis virus (AcMNPV) was identified, sequenced, and transcriptionally mapped. The location of the gene was determined by immunological screening of an expression library of AcMNPV open reading frame-beta-galactosidase fusions with an antibody raised to virus structural proteins. The DNA sequence of the corresponding region, which mapped within 56.6 and 58.0 map units on the AcMNPV genome, revealed a 1,040-base-pair open reading frame capable of encoding a 39-kilodalton polypeptide. The identity of the polypeptide was determined by Western blot (immunoblot) analysis of purified empty capsids with an antibody raised to the capsid-beta-galactosidase fusion protein. The identity of the peptide encoded by the gene was confirmed by immunoprecipitation of an in vitro translation product with RNA selected by hybridization to DNA sequences from the coding region of the gene. Transcripts of the capsid gene were analyzed by Northern (RNA) blots and mapped by nuclease protection and primer extension analysis. The capsid gene is transcribed maximally at 12 and 24 h postinfection but not in the presence of cycloheximide, a protein synthesis inhibitor, or aphidicolin, a viral DNA synthesis inhibitor, and is therefore classified as a late gene. The gene is transcribed in a counterclockwise direction with respect to the circular map. There are three transcriptional start sites, all containing the AGTAAG consensus sequence found at the start site of all late AcMNPV genes.
Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties mostly due to the huge volume of data produced, but also because of some of their specific characteristics such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects.
We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach explores the characteristics of the data in these re-sequencing applications and uses state of the art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm which needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on the execution time.
The proposed methodology was implemented in a software tool called TAPyR--Tool for the Alignment of Pyrosequencing Reads--which is publicly available from http://www.tapyr.net.
Summary: We introduce BRAT-BW, a fast, accurate and memory-efficient tool that maps bisulfite-treated short reads (BS-seq) to a reference genome using the FM-index (Burrows–Wheeler transform). BRAT-BW is significantly more memory efficient and faster on longer reads than current state-of-the-art tools for BS-seq data, without compromising on accuracy. BRAT-BW is a part of a software suite for genome-wide single base-resolution methylation data analysis that supports single and paired-end reads and includes a tool for estimation of methylation level at each cytosine.
Availability: The software is available in the public domain at http://compbio.cs.ucr.edu/brat/.
Supplementary information: Supplementary data are available at Bioinformatics online.
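For readers unfamiliar with the FM-index mentioned above, the sketch below builds a Burrows–Wheeler transform with a naive suffix sort and uses backward search to locate exact occurrences of a pattern. It is a toy illustration only: it stores full occurrence tables, is unsuitable for genome-sized input, performs no bisulfite-aware matching, and the function names are hypothetical rather than taken from BRAT-BW.

```python
def build_fm_index(text):
    """Build a suffix array, first-column offsets, and occurrence tables for text."""
    text += "$"  # sentinel that sorts before A, C, G and T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)  # i == 0 wraps to the sentinel
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    first = {}  # first[c]: number of characters in text smaller than c
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}  # occ[c][i]: count of c in bwt[:i]
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return suffix_array, first, occ


def locate(pattern, suffix_array, first, occ):
    """Return sorted start positions of exact pattern matches via backward search."""
    lo, hi = 0, len(suffix_array)
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])


fm = build_fm_index("ACGTACGA")
print(locate("ACG", *fm))
print(locate("GG", *fm))
```

Each backward-search step needs only two table lookups regardless of the text length, which is the property that makes FM-index-based mappers fast and compact once the tables are sampled and compressed.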
Several bioinformatics methods have been proposed for the detection and characterization of genomic structural variation (SV) from ultra high-throughput genome resequencing data. Recent surveys show that comprehensive detection of SV events of different types between an individual resequenced genome and a reference sequence is best achieved through the combination of methods based on different principles (split mapping, reassembly, read depth, insert size, etc.). The improvement of individual predictors is thus an important objective. In this study, we propose a new method that combines deviations from expected library insert sizes and additional information from local patterns of read mapping and uses supervised learning to predict the position and nature of structural variants. We show that our approach provides greatly increased sensitivity with respect to other tools based on paired end read mapping at no cost in specificity, and it makes reliable predictions of very short insertions and deletions in repetitive and low-complexity genomic contexts that can confound tools based on split mapping of reads.
MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/.
Summary: Sequencing reads generated by RNA-sequencing (RNA-seq) must first be mapped back to the genome through alignment before they can be further analyzed. Current fast and memory-saving short-read mappers could give us a quick view of the transcriptome. However, they are designed neither for reads that span splice junctions nor for repetitive reads, which can be mapped to multiple locations in the genome (multi-reads). Here, we describe a new software package, ABMapper, which is specifically designed for exploring all putative locations of reads that map to splice junctions or are repetitive in nature.
Availability and Implementation: The software is freely available at: http://abmapper.sourceforge.net/. The software is written in C++ and PERL. It runs on all major platforms and operating systems including Windows, Mac OS X and LINUX.
Supplementary information: Supplementary data are available at Bioinformatics online.
A vaccinia virus (VV) gene required for DNA replication has been mapped to the left side of the 16-kilobase (kb) VV HindIII D DNA fragment by marker rescue of a DNA- temperature-sensitive mutant, ts17, using cloned fragments of the viral genome. The region of VV DNA containing the ts17 locus (3.6 kb) was sequenced. This nucleotide sequence contains one complete open reading frame (ORF) and two incomplete ORFs reading from left to right. Analysis of this region at early times revealed that transcription from the incomplete upstream ORF terminates coincidentally with the complete ORF encoding the ts17 gene product, which is directly downstream. The predicted proteins encoded by this region correlate well with polypeptides mapped by in vitro translation of hybrid-selected early mRNA. The nucleotide sequences of a 1.3-kb BglII fragment derived from ts17 and from two ts17 revertants were also determined, and the nature of the ts17 mutation was identified. S1 nuclease protection studies were carried out to determine the 5' and 3' ends of the transcripts and to examine the kinetics of expression of the ts17 gene during viral infection. The ts17 transcript is present at both early and late times postinfection, indicating that this gene is constitutively expressed. Surprisingly, the transcriptional start throughout infection occurs at the proposed late regulatory element TAA, which immediately precedes the putative initiation codon ATG. Although the biological activity of the ts17-encoded polypeptide was not identified, it was noted that in ts17-infected cells, expression of a nonlinked VV immediate-early gene (thymidine kinase) was deregulated at the nonpermissive temperature. This result may indicate that the ts17 gene product is functionally required at an early step of the VV replicative cycle.
Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing.
Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms.
We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
A Tn5-based mutagenesis strategy was used to generate a collection of trichloroethylene (TCE)-sensitive (TCS) mutants in order to identify repair systems or protective mechanisms that shield Burkholderia cepacia G4 from the toxic effects associated with TCE oxidation. Single Tn5 insertion sites were mapped within open reading frames putatively encoding enzymes involved in DNA repair (UvrB, RuvB, RecA, and RecG) in 7 of the 11 TCS strains obtained (4 of the TCS strains had a single Tn5 insertion within a uvrB homolog). The data revealed that the uvrB-disrupted strains were exceptionally susceptible to killing by TCE oxidation, followed by the recA strain, while the ruvB and recG strains were just slightly more sensitive to TCE than the wild type. The uvrB and recA strains were also extremely sensitive to UV light and, to a lesser extent, to exposure to mitomycin C and H2O2. The data from this study establish that there is a link between DNA repair and the ability of B. cepacia G4 cells to survive following TCE transformation. A possible role for nucleotide excision repair and recombination repair activities in TCE-damaged cells is discussed.
cDNA encoding Ca2+-ATPase was cloned from a chicken skeletal muscle library. The cDNA (termed FCa) comprised 3,239 base pairs, including an open reading frame encoding 994 amino acids which showed the highest degree of homology with the adult rabbit fast-twitch Ca2+-ATPase isoform (C. J. Brandl, S. de Leon, D. R. Martin, and D. H. MacLennan, J. Biol. Chem. 262:3768-3774, 1987). Radiolabeled FCa hybridized to a 3.2-kilobase transcript in chicken skeletal muscle RNA but not to cardiac muscle RNA, which confirmed its identity as encoding the fast Ca2+-ATPase isoenzyme. FCa was transfected into the mouse myogenic line C2C12, from which a protein of 100 kilodaltons was immunopurified by using a monoclonal antibody specific for the avian fast Ca2+-ATPase. Immunofluorescence microscopy of a line (designated C2FCa2) stably expressing the avian Ca2+-ATPase localized the protein to the nuclear envelope and a population of cytoplasmic vesicles. A similar pattern was observed when C2FCa2 cells were stained with DiOC6(3), a cyanine dye that labels endoplasmic reticulum and mitochondria (M. Terasaki, J. Song, J. R. Wong, M. J. Weiss, and L. B. Chen, Cell 38:101-108, 1984). We conclude that the avian Ca2+-ATPase fast isoform is expressed and correctly targeted to the endoplasmic reticulum in mouse C2C12 cells.