Quantitative analysis by CAFA of CE separations of the products of chemical and enzymatic mapping (footprinting) yields more results, more quickly and with better precision, than GE-based methods. The quality and speed of data analysis enabled by CAFA reduce the problem of characterizing RNA solution structure to common practice. We release CAFA as free open-source software in the hope that it will stimulate quantitative study of nucleic acid structure and function, as has our GE-based SAFA software (23, 25, 28, 55). Furthermore, because this analysis can be run on a standard instrument, core facilities could offer CAFA analysis in addition to sequencing.
The experimental versatility and simplicity of quantitative solution mapping enabled by CAFA, combined with indirect labeling, allow very long nucleic acids to be efficiently interrogated at single-nucleotide resolution. Combined with in vivo chemical mapping protocols (8), CAFA provides an excellent platform for structural characterization of nucleic acids as well as nucleic acid-protein interactions inside the cell. The combination of automated analysis and the high throughput achieved with multicapillary machines now enables genomic-scale studies to be undertaken. Throughput could be increased further by simultaneously running samples labeled with different colored dyes within a single capillary. The fundamental fitting algorithms implemented in CAFA are compatible with this approach, although the cost of synthesizing additional colored primers may not always justify extending the approach in this way.
CAFA accommodates the biggest technical hurdle to indirect labeling by RT primer extension: sequence-specific pauses and stops. While our experimental protocols minimize the frequency of these stops, their predictability from a ‘background trace’ makes it possible to flag these sites of low-reliability data for exclusion from subsequent analysis (B). The ability to accurately exclude erroneous data early in the analysis procedure is critical to high-throughput data analysis. Our approach is conservative in that we choose to exclude more data (16% false positives, B) to ensure that the remaining data are accurate.
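The flagging step described above can be illustrated with a minimal Python sketch. This is not CAFA's actual criterion, which is not specified here. It assumes that a site is flagged when its peak area in the background trace exceeds a fixed multiple of the median background area. The function names, the `factor` parameter and the example values are all hypothetical.

```python
from statistics import median


def flag_unreliable_sites(background_areas, factor=3.0):
    """Return indices whose background peak area exceeds factor * median.

    Assumption: a strong peak in the untreated 'background' trace marks a
    sequence-specific RT pause or stop. The cutoff rule is illustrative only.
    """
    cutoff = factor * median(background_areas)
    return [i for i, area in enumerate(background_areas) if area > cutoff]


def mask_sites(sample_areas, flagged):
    """Replace flagged positions with None so later steps can skip them."""
    flagged = set(flagged)
    return [None if i in flagged else a for i, a in enumerate(sample_areas)]


# Hypothetical peak areas, one value per nucleotide position.
background = [1.0, 1.2, 0.9, 8.5, 1.1, 1.0, 6.0, 1.3]
sample = [4.0, 3.5, 5.1, 9.0, 2.2, 4.8, 7.5, 3.9]

flagged = flag_unreliable_sites(background)
cleaned = mask_sites(sample, flagged)
```

Lowering `factor` excludes more sites. That matches the conservative choice described above, in which some reliable data are discarded so that the retained data can be trusted.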
CAFA is a standalone application with a graphical user interface (Supplementary Figure 4) that accommodates a variety of experimental protocols. The software takes a raw CE trace, fits a peak model to it and thus quantitates the relative amount of each mapping reaction product. The output peak areas are associated with nucleotide numbers corresponding to the DNA reference peaks of the size standard ladder; these numbers are then related either to the source from which the cDNA was transcribed (indirect labeling) or to the directly labeled sample. CAFA and its documentation show how data can be associated with the sample sequence based on concomitant analysis of the appropriate sequence reference ladders. Postprocessing tools are provided to facilitate this task. We observed excellent agreement between the Beckman size ladder and a T1 digest and are confident in the accurate assignment of sequence to peaks (Supplementary Figure 1). However, it is possible that systematic shifts in sequence assignment could occur for molecules with extreme GC content. For this reason, we recommend calibrating the ladder against the RNA at the start of a new study to identify systematic bias and allow for its correction.
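As a rough illustration of relating peaks to the size standard and checking for a systematic shift, the Python sketch below interpolates fragment size from migration time. It then compares the assigned sizes with positions known from a reference digest. It assumes piecewise-linear interpolation between ladder peaks and strictly increasing ladder times. CAFA's own procedure may differ, and all names and values are hypothetical.

```python
from bisect import bisect_right


def assign_sizes(ladder_times, ladder_sizes, peak_times):
    """Piecewise-linear interpolation from migration time to size (nt).

    Assumes ladder_times is strictly increasing. Peaks outside the ladder
    range get None rather than an extrapolated size.
    """
    sizes = []
    for t in peak_times:
        if not ladder_times[0] <= t <= ladder_times[-1]:
            sizes.append(None)
            continue
        # Index of the ladder peak at or just above t, clamped to the last one.
        j = min(bisect_right(ladder_times, t), len(ladder_times) - 1)
        t0, t1 = ladder_times[j - 1], ladder_times[j]
        s0, s1 = ladder_sizes[j - 1], ladder_sizes[j]
        sizes.append(s0 + (s1 - s0) * (t - t0) / (t1 - t0))
    return sizes


def mean_offset(assigned, expected):
    """Mean (assigned - expected) over peaks that received a size."""
    diffs = [a - e for a, e in zip(assigned, expected) if a is not None]
    return sum(diffs) / len(diffs)


# Hypothetical ladder (migration time, size in nt) and reference-digest peaks.
ladder_times = [10.0, 20.0, 30.0, 40.0]
ladder_sizes = [20, 40, 60, 80]
digest_times = [15.0, 27.5, 40.0]
digest_known = [29, 54, 79]  # positions expected from the sequence

assigned = assign_sizes(ladder_times, ladder_sizes, digest_times)
offset = mean_offset(assigned, digest_known)
```

A mean offset that is clearly non-zero suggests a systematic bias, which could be subtracted from later assignments in the same study.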
The three applications of CAFA we demonstrated are: determination of nucleotide solvent accessibility with ·OH footprinting, secondary structure mapping using DMS, and protein-binding site identification on DNA. Common to each problem is the need to accurately determine the peak area corresponding to each nucleotide, which is at the heart of the CAFA algorithm. Given the experimental versatility of RT-based indirect labeling and the availability of the CAFA software, CE appears poised to replace GE as the method of choice for the high-throughput analysis of nucleic acid structure, as it already has for DNA sequencing.