HTS technology has revolutionized miRNA discovery and expression analysis. Compared to traditional gene expression profiling methods such as hybridization-based methods, microarrays and quantitative PCR, HTS offers high sensitivity, the ability to identify novel miRNAs, and simultaneous information about miRNA editing and 3′-end modification (21). Despite these advantages, recent studies have revealed bias in HTS when miRNA expression levels are quantified directly from sequence reads (24). A recent study using a pool of synthetic miRNAs concluded that inconsistencies in miRNA quantitation in HTS experiments are primarily due to biases in the adapter ligation steps and not to downstream steps such as reverse transcription, PCR, or the sequencing reaction itself (25). Previous studies examined the bias after complex sample preparation protocols, so the bias they observed reflects the combined effect of two ligation steps using two ligases (24–26). In this study, we used a random mixture of RNA substrates to examine the bias of T4 RNA ligases during 3′-adapter ligation in isolation. Our selection strategy enabled us to include 9 randomized RNA sequences in one ligation reaction, which provides complete sequence coverage for all possible RNAs 21 nt in length, in contrast to previous studies (24–26).
To study bias in 3′-adapter ligations, it was critical to accurately determine the content of the random input sequences in our ligation reaction. To do so, we employed a homopolymer tailing approach. As an alternative, we attempted to assess the nucleotide content of the random pools using a direct reverse transcription method with two different stem–loop RT primers carrying degenerate 3′-overhangs (44). Hairpin RT primers with either 6 or 10 degenerate 3′-overhanging nucleotides were designed to hybridize to the 3′-ends of unknown RNAs and serve as reverse transcription primers. Libraries prepared with the degenerate stem–loop RT primers showed bias for G and C nucleotides in the degenerate priming region (Supplementary Figure S1). We interpreted this as a bias that results from primer annealing, where more stable G·C base pairs were favored over A·T pairs. Besides being a poor option for assessing the content of a random oligo pool, a stem–loop primer with a randomized 3′-end for annealing will introduce additional bias when used to quantify miRNAs. In contrast, the poly(A) polymerase tailing method showed no detectable bias (Supplementary Figure S1). For this reason, we used the random library sequences obtained by the poly(A) polymerase tailing method for all subsequent analyses.
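The kind of composition check described above can be illustrated with a short sketch. It is not the analysis pipeline used in this study; it only assumes that the sequenced random region has already been extracted from each read as a fixed-length DNA string, and the function names are hypothetical. An unbiased random pool should give roughly 0.25 per base and a G+C fraction near 0.5 at every position.

```python
from collections import Counter


def positional_base_frequencies(reads, length):
    """Return one dict per position with the fraction of A, C, G and T.

    Reads whose length differs from `length` are skipped. Bases other
    than A/C/G/T (for example N) are counted in the denominator but not
    reported, so the four fractions may sum to less than 1.
    """
    counts = [Counter() for _ in range(length)]
    used = 0
    for read in reads:
        if len(read) != length:
            continue
        used += 1
        for i, base in enumerate(read.upper()):
            counts[i][base] += 1
    if used == 0:
        raise ValueError("no reads of the requested length")
    return [{b: counts[i][b] / used for b in "ACGT"} for i in range(length)]


def gc_fraction_by_position(freqs):
    """Sum the G and C fractions at each position."""
    return [f["G"] + f["C"] for f in freqs]


if __name__ == "__main__":
    # Toy input for illustration only; real input would be the random
    # regions extracted from the sequenced library.
    toy_reads = ["ACGT", "AGGT", "ACCA", "TT"]
    freqs = positional_base_frequencies(toy_reads, 4)
    print(gc_fraction_by_position(freqs))
```

A G+C fraction that rises well above 0.5 at particular positions, such as those in a degenerate priming region, would be consistent with the annealing bias discussed above.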
Strikingly, we found that T4 RNA ligases show no significant preference for RNA primary sequence, contradicting a previous report (26). Instead, we provide experimental evidence for the important role of RNA and adapter cofold structures, which were suggested to be influential in an article published while this manuscript was in preparation (25). What distinguishes our work from these recent studies is that we isolated the 3′-ligation step so that we could study its inherent bias in the absence of potentially confounding effects from the 5′-ligation used by Hafner et al. (25) and Jayaprakash et al. (26). In addition, our expanded analysis of a larger group of possible ligation substrates allowed us to accurately assess primary sequence preference. We were then able to predict, test, and confirm that particular cofold structural classes are disfavored for ligation with T4 RNA ligases, while others are neutral or slightly favored. Together, these factors explain why we arrived at different conclusions than Jayaprakash et al. (26). We performed folding analysis based on their results, but are unable to comment on whether their results can be explained by our finding of structural bias, because of what we believe to be confounding effects of 5′-ligation and different ligation conditions. Defining the bias in ligation of adapters to the 5′-ends of RNAs, and interpreting how that bias may have affected the conclusions of both Hafner et al. (25) and Jayaprakash et al. (26), will require further study. The cumulative results of our experiments demonstrate that, for T4 RNA ligases, the adapter and its ability to interact with an RNA substrate have a major influence on ligation efficiency. The concept of redesigning adapters can be practically applied to improve the ligation efficiency of a specific miRNA with known sequence.
In many experimental situations, the starting material to be ligated is a pool of unknown RNAs, or a mixture of RNAs that is so complex that designing an adapter for each RNA is impractical. Our 5′-randomized adapter approach increases the chance that an appropriate adapter for ligation is present for each miRNA. While largely effective, the ligation of some individual miRNAs to 5′-randomized adapters was not improved in the context of excess small RNA as we predicted. A possible explanation is that there may be interference from other small RNAs in the pool that interact with the miRNA and inhibit productive cofolding with adapters. Overall, however, we observed that ligation bias was reduced with randomized adapters.
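To make the idea of a 5′-randomized adapter pool concrete, the sketch below enumerates every member of such a pool. The constant region and the number of randomized positions are hypothetical placeholders, not the adapter used in this study; the point is only that n randomized positions yield 4^n distinct adapters, so each RNA in a complex sample meets many different adapter sequences.

```python
from itertools import product


def randomized_adapter_pool(constant_region, n_random, alphabet="ACGT"):
    """Yield every adapter formed by placing n_random randomized bases
    at the 5'-end of a fixed constant region.

    The pool has len(alphabet) ** n_random members.
    """
    if n_random < 0:
        raise ValueError("n_random must be non-negative")
    for prefix in product(alphabet, repeat=n_random):
        yield "".join(prefix) + constant_region


if __name__ == "__main__":
    # "TGGAATTC" is a made-up constant region used purely for illustration.
    pool = list(randomized_adapter_pool("TGGAATTC", 2))
    print(len(pool))   # 4 ** 2 adapters
    print(pool[:4])
```

In practice such a pool is obtained by synthesizing the adapter with degenerate positions instead of enumerating sequences, but the enumeration is convenient when each candidate adapter is to be evaluated against a given RNA, for example with a cofolding prediction tool.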
Recently, methods that include barcoding when preparing samples for HTS have been shown to be efficient and affordable for sequencing multiple samples simultaneously (45). A very recent report showed that barcodes introduced at the ligation step resulted in significant bias in miRNA expression profiles in high-throughput multiplex sequencing (46). The effect of cofold structures on ligation explains this observed bias. For that reason, introducing barcodes in the 3′-adapter for HTS warrants careful consideration, especially when comparing relative miRNA expression levels among samples prepared with different barcoded adapters. We therefore suggest introducing barcodes in the reverse transcription or PCR step to avoid introducing ligation bias among samples.
The procedures described here for studying T4 RNA ligase bias should be applicable to other ligases and other ligation conditions, for instance, ligation of adapters to unknown RNA 5′-ends. These procedures represent important early steps toward resolving the issue of ligation bias by seeking alternative or modified ligases.
In summary, our findings show that the bias introduced by T4 RNA ligases in HTS experiments is due to structural properties within and between RNA substrates and the adapters used in ligation. Our model of what constitutes a compatible RNA-adapter pair was successfully used to design adapters that improve the ligation of RNAs with a known sequence. The randomized adapter that we designed showed promise for improving ligation efficiency and reducing bias when ligating a pool of RNAs. This approach may be extended by producing minimized sets of adapters for the study of specific pools of RNAs. For instance, a set of adapters could be designed so that each member of the miRNA repertoire of an organism would have a corresponding high-efficiency adapter included in the mixture. Our approaches should also be applicable to RNAs other than miRNA, including mRNAs fragmented for strand-specific RNA sequencing library preparation.