Protein nanopores are under investigation as key components of rapid, low-cost platforms to sequence DNA molecules. Previously, it has been shown that the α-hemolysin (αHL) nanopore contains three recognition sites, capable of discriminating between individual DNA bases when oligonucleotides are immobilized within the nanopore. However, the direct sequencing of RNA is also of critical importance. Here, we achieve sharply defined current distributions that enable clear discrimination of the four nucleobases, guanine, cytosine, adenine and uracil, in RNA. Further, the modified bases, inosine, N6-methyladenosine and 5-methylcytosine, can be distinguished.
Single-molecule nanopore technology is under development for ultra-rapid, low-cost sequencing of DNA and RNA molecules. Two types of nanopore are being investigated: solid state pores1 and protein pores, such as the heptameric α-hemolysin (αHL) pore. Protein pores have spearheaded the approach as they can be precisely manipulated by chemical and genetic engineering2, which facilitates the determination of sequence in individual DNA strands through base-dependent transitions in ionic current flow3. By immobilizing and stretching DNA strands within protein nanopores, the four canonical DNA bases and epigenetically modified bases have been individually identified by ionic current recording4–10. Individual oxidized bases and abasic sites can also be identified after chemical modification11, 12. However, ssDNA moves through protein nanopores at remarkably high velocities (e.g. ~1–3 µs per nucleobase)13, which provides insufficient signal-to-noise for individual bases to be identified. Therefore, enzymes have been used to ratchet DNA through pores14–17. By using approaches related to these published methods, Oxford Nanopore Technologies have demonstrated nanopore sequencing, achieving kilobase reads18. Additional studies of nanopore sequencing are appearing in the open literature19, 20.
Nanopore sequencing of RNA has received less attention. The direct high-throughput sequencing of RNAs (mRNA, miRNA, etc.) will allow the rapid identification and quantitation of functional elements of the genome and reveal important splicing patterns and post-transcriptional modifications21–23. RNA sequencing will be valuable in medical diagnosis, the selection of therapies and prognosis. For example, transcriptome sequencing has already been used to detect gene fusions in cancer24, 25. The ability to sequence extracellular RNA from plasma enhances the power of such approaches22, 26, 27. The genomes of RNA viruses and viral RNA transcripts are also accessible. RNA sequencing might also be used to monitor the levels and turnover of therapeutic RNAs28–30. Further, nanopores can directly identify modified bases, which are prevalent in RNA9, 31.
Short ssRNA homopolymer molecules have been distinguished on the basis of differences in residual currents (IRES) recorded while the RNAs are translocating through the wild-type (WT) αHL pore in an applied potential32–35. The transition between two homopolymer runs oligo(rA) and oligo(rC) within a single translocating RNA strand has also been observed, and may be accentuated by differences in the helical structures of the two regions32, 33. However, as in the case of DNA, individual RNA bases have not been identified in moving strands. Therefore, we have examined nucleobase identification in RNA strands by capturing and immobilizing them within the αHL pore with the biotin-streptavidin approach used previously for DNA base identification5, 6. The 5 nm-long β barrel of the αHL nanopore contains three recognition sites, R1, R2 and R3, capable of recognizing individual bases in ssDNA (Figure 1a). We investigated RNA base substitutions at position 9 of synthetic oligonucleotides (bases numbered from the 3' end) to probe the most promising recognition site R1 (which is located at the central constriction of the pore, comprising residues Lys-147, Glu-111 and Met-113, Figure 1b). Modification of the charge distribution within R1 has a powerful effect on IRES for DNA oligonucleotides5, 8. Therefore, we employed homoheptameric mutant pores made from αHL E111N/K147N (NN) and αHL E111N/K147N/M113Y (NNY) subunits.
RNA base discrimination was first tested in homopolymer oligonucleotides, consisting of 30 nucleotides (the sequences of the oligonucleotides used in this paper are in Table S1). ssRNA oligonucleotides with biotin tags at the 3' end (Figure S1) were allowed to form complexes with streptavidin (SI Methods). In this state, the strands were captured and immobilized by αHL pores in an applied potential, but they were not translocated into the trans compartment (Figure 1a, Figure S2 automated voltage protocol). The one-second capture sequence was repeated for at least 400 cycles for each ssRNA added, with >90% of the cycles giving current blockades. The extended residence times of the oligonucleotides within the pore allowed reduction of the current noise by stringent filtration, thereby improving the signal-to-noise ratio and the precision of the measurements.
Once captured, the immobilized RNA molecules caused a sequence-dependent decrease in the ionic current through the pore. At +200 mV in 1 M KCl, 25 mM Tris-HCl, pH 7.5, containing 100 µM EDTA, WT αHL pores have a mean open pore current level (IO WT) of 199 ± 6 pA (n = 12), while the pores formed from NN and NNY, the mutants used in this work, gave IONN = 214 ± 7 pA (n = 8) and IONNY = 210 ± 8 pA (n = 8), respectively. Although the open pore currents carried by the WT pore, NN and NNY are similar (Figure S3), the residual currents in the presence of ssDNA5, 36, 37 and ssRNA (this work) are ~50% higher in the mutant pores (Figure 1c–e) owing to the increased cross section of the lumen after mutation. Immobilized oligo(rA)30 blocked NNY pores to a greater extent (IRES%oligo(rA) = 32.6 ± 0.2%) than oligo(rC)30 (IRES%oligo(rC) = 33.5 ± 0.2%) and oligo(rU)30 (IRES%oligo(rU) = 34.2 ± 0.2%) (Figure 1e, IRES% = IRES/IO X 100). The residual current difference between the oligo(rC)30 and the oligo(rA)30 oligonucleotide blockades (ΔIRES% = IRES%oligo(rC) - IRES%oligo(rA)) is +0.9 ± 0.2% and the ΔIRES% between oligo(rU)30 and the oligo(rA)30 oligonucleotide (ΔIRES% = IRES%oligo(rU) - IRES%oligo(rA)) is +1.6 ± 0.2%. Furthermore, the difference in residual current between the two most widely dispersed current peaks, (ΔIRES%OVERALL) between the three homopolymers also increases in the mutant pores compared to WT, providing improved levels of discrimination: αHL WT, ΔIRES%OVERALL = 1.5 ± 0.4%; αHL NN, ΔIRES%OVERALL = 2.4 ± 0.2% and αHL NNY, ΔIRES%OVERALL = 2.8 ± 0.4% (n = 3 for each pore). There is also a change in the pattern of the current blocks with oligo(rC) producing higher IRES% values than oligo(rA) in the NN and NNY pores, compared to WT (Figure 1c–e).
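The residual-current quantities defined above reduce to simple arithmetic. The following is a minimal sketch in Python, using the IRES% values quoted in the text for the NNY pore. The function names are illustrative and do not come from the original work, and the 68.5 pA residual current is a hypothetical value back-calculated from 32.6% of the 210 pA open-pore current.

```python
def ires_percent(i_res_pa, i_open_pa):
    """Residual current as a percentage of the open-pore current:
    IRES% = IRES / IO x 100."""
    if i_open_pa <= 0:
        raise ValueError("open-pore current must be positive")
    return i_res_pa / i_open_pa * 100.0


def delta_ires_percent(ires_pct, ires_pct_reference):
    """Difference in IRES% between a blockade and a reference blockade."""
    return ires_pct - ires_pct_reference


# Hypothetical residual current (pA) against the reported NNY open-pore current.
print(f"{ires_percent(68.5, 210.0):.1f}%")  # prints 32.6%

# IRES% values reported in the text for homopolymers in the NNY pore.
nny = {"rA": 32.6, "rC": 33.5, "rU": 34.2}

d_rc_ra = delta_ires_percent(nny["rC"], nny["rA"])
d_ru_ra = delta_ires_percent(nny["rU"], nny["rA"])
print(f"{d_rc_ra:+.1f}%  {d_ru_ra:+.1f}%")  # prints +0.9%  +1.6%
```

The sketch ignores the reported uncertainties (± 0.2%), which would need to be propagated in any real comparison.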
The IRES% values of the immobilized RNA homopolymers increased with applied positive potential for WT, NN and NNY pores (Figure S4, Table S2). A likely explanation is stretching of the RNA molecules inside the pore. Single-stranded homopolymers are highly flexible with a persistence length of ~1 nm38, 39. Therefore, the molecules will be elongated by confinement within the β barrel and further elongated by the applied potential, which generates a force of ~10 pN on the molecule (SI Methods)40–42. At higher potentials, the RNAs will become more fully extended within the pore, and blockade of the ionic current will be correspondingly reduced, resulting in higher IRES% values. The overall dispersion, ΔIRES%OVERALL, is also affected by increased potentials, which result in a tighter distribution of blockades by the three homopolymers, e.g. for αHL NNY we observed a ~36% decrease in dispersion, from ΔIRES%OVERALL = 4.4 ± 0.4% at +100 mV to 2.8 ± 0.4% at +200 mV (Table S2). The product of the sequential differences (δ) between each of the three residual current levels in the histograms can also be determined to gauge the ability of the pore to discriminate between the three homopolymers (Table S2, SI Methods). All three pores are able to discriminate between oligo(rA), oligo(rC) and oligo(rU) within the potential range (+100 to +200 mV, δ > 0), with αHL NNY displaying the best discrimination at the highest applied potential (+200 mV), δNNY = 1.8 ± 0.2%. These investigations of voltage dependence aid in choosing conditions that minimize the overlap between IRES distributions.
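The two discrimination metrics used here can be sketched as follows. This is a minimal illustration that assumes δ is the plain product of the differences between adjacent, sorted IRES% levels; the exact procedure and any normalization are given in the SI Methods and are not reproduced here. The function names and the example levels are hypothetical, chosen only to show the behaviour of the metrics.

```python
from math import prod


def overall_dispersion(levels):
    """Span between the two most widely separated IRES% levels."""
    return max(levels) - min(levels)


def sequential_difference_product(levels):
    """Product of the differences between adjacent levels after sorting.

    The product is zero whenever two levels coincide, so a value
    greater than zero means every level is separated from its neighbours.
    """
    ordered = sorted(levels)
    return prod(b - a for a, b in zip(ordered, ordered[1:]))


# Hypothetical IRES% levels for three well-separated blockades.
separated = [30.0, 31.0, 32.5]
print(overall_dispersion(separated))             # prints 2.5
print(sequential_difference_product(separated))  # prints 1.5

# Hypothetical case in which two of the three levels overlap.
overlapping = [30.0, 30.0, 32.5]
print(sequential_difference_product(overlapping))  # prints 0.0
```

The second case shows why δ is a stricter measure than the overall dispersion: the span is unchanged at 2.5, yet δ drops to zero because two of the levels cannot be told apart.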
We examined the ability of WT, NN and NNY αHL pores to distinguish between rG, rA, rU, and rC within individual nucleic acid strands. A first set of four oligonucleotides comprised oligo(dC)30 with a ribonucleotide at position-9 relative to the biotinylated 3' end. A second set of four oligonucleotides comprised oligo(rA)30 with a ribonucleotide at position-9. The oligo(dC) and oligo(rA) backgrounds were used, as both sequences are thought to have minimal secondary structure32, 43. A lack of secondary structure in our model system is advantageous, because any current differences observed between oligonucleotides can be attributed to the nucleobase sequences, rather than structural differences.
For the oligo(dC)30 oligos, WT αHL pores showed weak discrimination between rC, rA, rU and rG (in order of increasing IRES%, Figure 2c). The αHL NN pores displayed improved discrimination, clearly separating the bases in the order rC, rA, rG and rU (in order of increasing IRES%, Figure 2d). The overall dispersion of current levels was far greater for NN pores (NN-ΔIRES%OVERALL = 2.2 ± 0.4% and δNN = 0.3 ± 0.02%) than it was for WT pores (WT-ΔIRES%OVERALL = 1.2 ± 0.2% and δWT = 0.1 ± 0.02%). The αHL NNY pores displayed a twofold improvement in the dispersion of IRES% for the four nucleobases, compared to NN: NNY, ΔIRES%OVERALL = 4.5 ± 0.4%, δNNY = 2.4 ± 0.2%, along with a different order of increasing IRES% (rG<rA<rC<rU, Figures 2a and 2e). In the NNY pore, the tyrosines at position 113 may provide enhanced hydrogen bonding and aromatic stacking interactions with the immobilized bases7, 44, 45.
Similar results were obtained for the second set of oligos in the oligo(rA)30 background. Again, the WT αHL pore showed the weakest discrimination between rC, rA, rU and rG (in order of increasing IRES%, Figure 2f), with a narrow dispersion between the residual current levels: ΔIRES%OVERALL = 1.9 ± 0.1% and δWT = 0.2 ± 0.01%. The αHL NN pore displayed discrimination that differed from that of the WT, with the increasing order of IRES% being rG, rU, rA and rC (Figure 2g), and the dispersion was again improved, ΔIRES%OVERALL = 2.7 ± 0.2% and δNN = 0.5 ± 0.01%. The αHL NNY pores gave the best dispersion in IRES% with a similar pattern between the four nucleobases as seen for the oligo(dC) background (rG, rA, rC and rU, in order of increasing IRES%, Figures 2b and 2h).
The voltage dependencies of the current blocks caused by the four RNA bases within the oligo(dC)30 and oligo(rA)30 chains were also examined (Figures S5, S6 and S7; Tables S3 and S4). We observed a similar trend to that seen in the homopolymer data, in which an increased applied positive potential gave higher IRES% levels for the DNA and RNA oligos across the three pores (Figure S5). For the NN and NNY pores, the four bases within the RNA oligo(rA) gave a higher ΔIRES%OVERALL (between +100 and +140 mV) than that observed for the DNA oligo(dC): e.g. at +120 mV, within the oligo(rA) chain, NNY-ΔIRES%OVERALL = 4.9 ± 0.3%, δNNY = 4.2 ± 0.2% (n = 3); at +120 mV, within the oligo(dC) chain, NNY-ΔIRES%OVERALL = 2.8 ± 0.3%, δNNY = 0.7 ± 0.02% (n = 3) (Table S4). In addition, at each applied potential, we consistently observed lower IRES% levels for the four bases within the RNA background oligo(rA)30 in comparison to the DNA background oligo(dC)30.
Cellular RNAs contain more than a hundred different base modifications at thousands of sites. These RNA modifications are dynamic and have critical regulatory roles31. The internal N6-methyladenosine (m6A) modification in messenger RNA (mRNA) is one of the most abundant modifications in higher eukaryotes, and is present at 3 to 5 sites on average per mRNA. The inability to carry out this modification leads to apoptosis. 5-Methylcytosine (m5C) is also widespread in cellular RNAs. While epigenetic DNA methylation has been extensively studied, the precise location of m5C in RNA remains to be elucidated. The deamination of adenosine (rA) to inosine (rI) is another important modification, which occurs in the editing of mRNA46, 47. The rI base is read as rG by the ribosome, leading to amino acid substitutions with functional implications48.
We therefore attempted to distinguish the riboforms of I, m6A and m5C from rG, rA, rC and rU at position-9 in ssDNA and ssRNA oligonucleotides by using the αHL NNY pore. The NNY pore had given superior discrimination between the standard bases (rG, rA, rC and rU) at recognition site R1 based on the increased ΔIRES%OVERALL values observed in the homopolymeric strands oligo(dC) and oligo(rA). The addition of the rI oligonucleotide to the standard mixture of four gave: ΔIRES%OVERALL = 5.8 ± 0.4%, δ = 2.3 ± 0.2% in the oligo(dC) background; and ΔIRES%OVERALL = 2.6 ± 0.2% and δ = 0.2 ± 0.04% in the oligo(rA) background. The difference in residual current levels between the rI oligo and oligo(rA), ΔIRES%rI-rA, was +0.9 ± 0.1% (n = 3) (Figure 3a–c, Tables S5 and S6). These results were striking, given the small chemical difference between the two bases.
m6A oligonucleotides also produced distinct current blocks (Figure 3a,d). The ΔIRES%m6A-rA between m6A and rA was −0.5 ± 0.1% (n = 3) in oligo(rA) with a similar dispersion of ΔIRES%OVERALL to that observed with rI: ΔIRES%OVERALL = 2.5 ± 0.1% and δ = 0.1 ± 0.04%. m5C could also be identified in the same manner (Figure 3a,e, Table S6): in oligo(rA), ΔIRES%m5C-rA = +1.4 ± 0.4% (n = 3), ΔIRES%OVERALL = 3.2 ± 0.2% and δ = 0.3 ± 0.02%. While the molecular bases for the small differences in IRES% levels for the modified bases are unclear, the observed current levels are distinct from all other bases and each other, demonstrating the possibility of using the αHL nanopore for mapping modified bases in the transcriptome. The ΔIRES% patterns for the WT, NN and NNY αHL pores (Figures 4a and 4b; Tables S5 and S6) demonstrate the potential to identify all seven bases within the oligo(dC) and oligo(rA) backgrounds. The αHL NNY pore displays a large dispersion of current levels for the seven bases in oligo(dC) and a more modest dispersion in oligo(rA), showing that base identification is modulated by the background7.
The ability to sequence ssRNA would require the recognition of individual nucleotides in heteropolymeric backgrounds. To examine discrimination within a heteropolymer, we tested the R1 site in the NNY pore. The sequence we used (Figure 5) did not contain secondary structures, such as hairpins, as predicted by the mfold algorithm49, 50. At +200 mV, all four bases at position 9 were recognized with the same order of IRES% (rG, rA, rC and rU, Figure 5a) as seen in the homopolymeric backgrounds (Figures 2 and 3). The dispersion of the current levels in the histogram (ΔIRES%OVERALL = 1.7 ± 0.3%, δ = 0.2 ± 0.06%) was also similar to that seen in the homopolymeric backgrounds, suggesting that the different current levels were indeed directly due to the nucleobase changes, rather than changes in secondary structure. The voltage dependence of the currents arising from the four bases again displayed similar characteristics to the results obtained in the homopolymer backgrounds, with higher positive potentials resulting in higher IRES% levels (Figure 5b, Table S7). The span between rC and rA (Figure 5c) in the residual current histogram (ΔIRES%rC-rA = +0.7 ± 0.1%) was similar to that seen in the oligo(rA) homopolymeric background, ΔIRES%rC-rA = +0.8 ± 0.2% (Figure 2h, Table S4).
We have shown that individual RNA bases can be identified in immobilized DNA and RNA strands. By using the αHL NNY mutant pore, which has superior nucleobase discrimination properties, we were able to distinguish between the standard bases (rG, rA, rC, and rU) and the modified bases rI, m6A and m5C. The ultra-rapid sequencing of RNA will ultimately require an active process to control movement of the nucleic acid through the pore14, 15, 17, 19, 20, 51. Hence, it will be necessary to combine αHL or an alternative protein nanopore with a processive RNA translocating enzyme, such as an exoribonuclease52 or a reverse transcriptase53, to ratchet RNA through the pore at a speed at which base identification is feasible. For a functional device, a ~10 ms measurement per base is feasible, and would reduce the sequencing time for a human genome to less than a day with a 10⁴-pore array device. The data acquisition rate and signal filtering required for a 10 ms measurement time would allow the separation of the current levels seen with NN and NNY in the present work. Additionally, integration of the protein nanopore with a processing enzyme will support the unfolding of translocating RNA molecules, thereby overcoming a major concern in the analysis of native RNA, which can form complex structures mediated by base-pairing54, 55. For high-throughput sequencing, thousands of pores must be incorporated into arrays, and viable approaches towards this end are under development1, 56.
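The throughput estimate above can be checked with a back-of-the-envelope calculation. The sketch below assumes a genome of about 3.2 × 10⁹ bases, an array of 10,000 (10⁴) pores that all read continuously and in parallel, and no overhead for strand capture or read overlap. None of these assumptions is taken from the original work beyond the ~10 ms per base and the array size, and the function name is illustrative.

```python
def sequencing_time_hours(n_bases, seconds_per_base, n_pores, coverage=1.0):
    """Idealized wall-clock time to read n_bases at the given coverage,
    with the work split evenly across n_pores parallel pores."""
    if n_pores <= 0:
        raise ValueError("n_pores must be positive")
    total_seconds = n_bases * coverage * seconds_per_base / n_pores
    return total_seconds / 3600.0


GENOME_BASES = 3.2e9      # assumption: approximate human genome size
SECONDS_PER_BASE = 10e-3  # ~10 ms measurement per base, as stated in the text
N_PORES = 10_000          # 10^4-pore array

hours = sequencing_time_hours(GENOME_BASES, SECONDS_PER_BASE, N_PORES)
print(f"{hours:.2f} h")  # prints 0.89 h

# Hypothetical tenfold coverage, still well under one day.
hours_10x = sequencing_time_hours(GENOME_BASES, SECONDS_PER_BASE, N_PORES, coverage=10)
print(f"{hours_10x:.1f} h")  # prints 8.9 h
```

Under these idealized assumptions a single pass takes under an hour, so the "less than a day" figure leaves room for redundant coverage and for pores that are inactive part of the time.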
This work was supported by grants from the National Institutes of Health and Oxford Nanopore Technologies. The authors thank E. Mikhailova for the preparation of proteins, and D. Stoddart and A. Heron for useful discussions.
Supporting Information. Details of experimental procedures, oligonucleotide sequences, and the data displayed in Figures 1–5. This material is available free of charge via the Internet at http://pubs.acs.org.
Hagan Bayley is the Founder, a Director and a share-holder of Oxford Nanopore Technologies, a company engaged in the development of nanopore sequencing technology.