Improving the in vitro activity of nucleic acid-modifying enzymes has been a vital driver for molecular biology research, enabling technological advances in cloning, sequencing, forensic science, diagnostics and drug development. Much effort has therefore gone into understanding their function. In many cases these enzymes have evolved to recognise specific features to attain specificity, but a method to comprehensively describe these specificity determinants is lacking.
The characterisation of these determinants is important both to understand biological processes and to modify features for purposes of molecular manipulation. For example, DNA polymerases have been modified to improve fidelity and inhibitor resistance [1]. RNA ligases have also been studied in detail: thermophilic forms have been identified [3], and the enzymes have been modified to accept only adenylated RNAs [4]. These new forms of RNA ligase were instrumental in the development of new protocols for the small RNA cloning required for next-generation sequencing (NGS). To date, however, the functional determinants of their substrates have been identified using low-throughput experiments.
Several innovative approaches using NGS to test millions of molecules in parallel have been developed to study protein function [7]. Most notably, high-throughput sequencing-fluorescent ligand interaction profiling (HiTS-FLIP) is a technique for quantitatively measuring protein-DNA binding [8]. NGS has also been combined with SELEX, which uses randomised oligonucleotides to identify ligands for proteins [9] or transcription factor binding sites [10]. NGS has also been used to establish the fitness landscape of a catalytic RNA [11] and to compare the bias of different approaches for sequencing mRNA fragments [12].
We have developed a method to carry out functional analysis of nucleic acid-modifying enzymes using NGS. This method employs completely randomised oligonucleotide substrates, which we call degenerate libraries, in which all possible sequences are presumed to be present at similar concentrations. We add the enzyme of interest to a degenerate library containing millions of different sequences and subject the resulting sample to NGS (Figure 1a). Because all sequences are presumed to start at similar concentrations, sequences that are over- or under-represented in the NGS results reveal the enzyme's preferences. We used this approach to characterise RNA ligase sequence preferences in order to investigate the potential for biases in small RNA (sRNA) NGS data sets.
Figure 1. Scheme depicting the experimental approach and HD adapters. (a) Data were generated to analyse the sequence preferences of T4 Rnl1 and T4 Rnl2 using a degenerate RNA library (N21 RNA). (b) HD adapters include degenerate tags at the end of the adapters that ...
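The degenerate-library idea described above can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names are hypothetical, the simulated reads are drawn uniformly at random (the "similar concentrations" assumption), and per-position nucleotide frequency is only one simple way to look for a sequence preference. In an unbiased sample each base should appear at roughly 25% at every position; an enzyme with sequence preferences would shift these frequencies in the ligated, sequenced pool.

```python
import random
from collections import Counter

BASES = "ACGU"  # RNA alphabet


def library_size(length):
    """Number of distinct sequences in a fully degenerate library."""
    return len(BASES) ** length


def sample_degenerate_library(n_reads, length, seed=0):
    """Simulate reads from an idealised, unbiased degenerate library.

    Assumption: every base is equally likely at every position.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length))
            for _ in range(n_reads)]


def positional_frequencies(reads):
    """Return, for each position, the fraction of reads carrying each base."""
    length = len(reads[0])
    freqs = []
    for pos in range(length):
        counts = Counter(read[pos] for read in reads)
        total = sum(counts.values())
        freqs.append({base: counts.get(base, 0) / total for base in BASES})
    return freqs


if __name__ == "__main__":
    # An N21 library has 4**21 possible sequences.
    print(library_size(21))
    reads = sample_degenerate_library(n_reads=10000, length=21)
    for pos, freq in enumerate(positional_frequencies(reads), start=1):
        print(pos, {base: round(f, 3) for base, f in freq.items()})
```

Comparing such a table for the input library with the same table for reads recovered after ligation would highlight positions where the observed composition departs from the uniform expectation.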
sRNAs are a major group of gene regulators between 20 and 32 nucleotides in length (reviewed in [13]). There are several classes of sRNA that play important roles in gene regulation, with the Dicer-generated microRNAs (miRNAs) being the most extensively studied [14]. Their expression levels can be measured by array hybridisation, quantitative PCR (qPCR) or NGS of cDNA libraries (reviewed in [15]). Arrays and qPCR methods are limited to characterising known miRNAs, and recent reports have suggested significant differences between technologies for quantifying miRNAs [17]. Indeed, significant sequencing biases for NGS of miRNAs have been reported [19]. The latest protocol for small RNA library generation requires ligation of an adenylated 3' adapter using a truncated form of T4 RNA ligase 2 (Rnl2), followed by ligation of a 5' adapter using T4 Rnl1, although other protocols that use T4 Rnl1 for both ligations are also commonly used. The ligated product is reverse transcribed and then amplified by PCR [22].
Rnl1 and Rnl2 are two different families of RNA end-joining enzymes with distinct in vivo functions. Rnl1 repairs the virus-induced cleavage of the single-stranded (ss) anticodon loop in tRNA-Lys in Escherichia coli. A SELEX-type approach was used to show that Rnl1 prefers ss substrates [27]. Rnl2 is involved in RNA editing in eukaryotic trypanosomes and Leishmania [28]. The current thinking is that Rnl2 seals nicks in double-stranded (ds) RNA, in keeping with its function in RNA editing of mRNA [30]. The phage T4 Rnl2 is commonly used in molecular biology. Although it can ligate both ds and ss RNA [32], it is not clear which structure is preferred, and its in vivo function is not currently known. A comprehensive understanding of RNA ligase substrate preferences would help in developing a method to reduce sequencing bias.
We surveyed the sequence preference landscape of Rnl1 and Rnl2 using cDNA libraries generated by ligation of degenerate RNA libraries. This analysis revealed important sequence preferences of these enzymes and allowed us to develop a novel type of high definition adapter (HD adapter) (Figure 1b) that significantly reduces sequencing bias in biological samples. We demonstrate that the use of HD adapters increased the representation of low-abundance small RNAs and allowed new miRNAs to be identified. In addition, we use available data in miRBase [33], the global repository for miRNA sequences, to demonstrate that the dominant use of one NGS platform has biased miRNA research.