Given the multitude of targets and functions of RBPs, various methods have been developed over the years to investigate their binding specificity. The electrophoretic mobility shift assay is currently used to detect interactions between proteins and nucleic acids in vitro
]. It is based on the principle that an oligonucleotide migrates more slowly in an electric field when bound by the protein, and it is therefore suited only to documenting that a given RBP binds a given oligonucleotide sequence. Another in vitro
procedure called Systematic Evolution of Ligands by Exponential Enrichment has been widely used to identify aptamers (ssDNA or RNA) that specifically bind to a protein of interest starting from a large library of randomly generated oligonucleotides [82
]. Through multiple rounds of selection, purification under stringent conditions and amplification, high-affinity interactors can be selectively enriched. It has been found, however, that the high-affinity targets obtained in vitro
are frequently not the most specific. The recently developed RNAcompete method uses a single binding reaction to determine in vitro
the affinities of an RBP for a complete set of k
-mers (oligonucleotides) that are presented in structured and unstructured RNA contexts. This method has been applied to various RBPs and the results have been shown to be consistent with previous in vivo
]. One drawback of these in vitro
studies is that they employ recombinant proteins, which limits their use when the proteins of interest are unstable or difficult to express or purify to homogeneity.
Genome-wide microarray analyses upon RBP knockdown or overexpression may identify in vivo
targets of RBPs. However, such analyses profile global steady-state mRNA levels, which correlate poorly with cellular protein levels [84
], and it is therefore unclear how informative such analyses are. RNA immunoprecipitation followed by microarray-based identification of protein-bound RNAs (RIP-Chip) [85–89
] has been widely used to identify in vivo
targets of RBPs. Although this method has the advantage of identifying interactions occurring in vivo
, the RNA–protein complex cannot be washed stringently, so many false-positive targets may be isolated. Also, because this approach isolates large RNA molecules, uncovering the sequence or structure specificity of the protein from RIP-Chip data requires complex computational analyses. To avoid artifacts arising from interactions that occur after cell lysis, methods that rely on cross-linking RNA–protein complexes in vivo
by ultraviolet (UV) irradiation (at 254
nm) have been proposed [90
]. Although such methods have been in use for decades, advances such as high-throughput sequencing and greater computational capacity have dramatically increased their power.
The cross-linking and immunoprecipitation assay (CLIP) exploits covalent protein–nucleic acid cross-linking to stringently purify complexes of RNAs with a specific RBP. Moreover, nuclease digestion of RNA sequence stretches that are not protected by the RBP enables the isolation of binding sites at very high resolution [91
]. The protected RNA fragments are then ligated to adapters at both the 5′- and 3′-ends, converted into cDNA and PCR-amplified. Initial CLIP experiments sequenced PCR-amplified fragments that had been cloned into a suitable vector. Based on the location of YCAY sequence motifs with respect to intron and exon boundaries, this method generated a genome-wide RNA map predicting the pattern of NOVA-dependent splicing in the brain [91
]. A more recent version of the protocol called HITS-CLIP (high-throughput sequencing of RNA isolated by crosslinking and immunoprecipitation) has provided a more comprehensive picture of the function of this nervous system-specific RBP, revealing the role of NOVA in alternative polyadenylation of transcripts in the brain [93
]. HITS-CLIP has further been used to identify functional miRNA–mRNA interaction sites [94
] and, in combination with custom microarray-based analyses, to identify specific pathways targeted by RBPs [95
]. Though highly successful, this method has a few limitations. UV irradiation at shorter wavelengths (254
nm) may introduce breaks in the nucleic acids as well as nucleotide modifications that may hamper accurate mapping of the cDNAs to the corresponding genomic loci. Given that the binding sites of RBPs are usually very short, <10
], the nuclease digestion has to be very precisely monitored and optimized. Overdigestion will lead to RNA fragments that are too short to be unambiguously mapped to the genome, while underdigestion will lead to fragments that are too long to be sequenced in their entirety, yielding reads that do not contain interaction sites.
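The YCAY motif mentioned above for NOVA can be located in a sequence with a simple pattern scan. The following is a minimal sketch, not the procedure used in the cited studies; the function name `find_ycay` is hypothetical, and Y is taken to be the IUPAC code for a pyrimidine (C or U, with T also accepted so that cDNA-derived reads can be scanned).

```python
import re

# Y = pyrimidine (C or U in RNA; T is accepted for cDNA-derived sequences).
# The lookahead makes the scan report overlapping motif occurrences as well.
YCAY = re.compile(r"(?=([CUT]CA[CUT]))")

def find_ycay(seq):
    """Return 0-based start positions of all (possibly overlapping) YCAY motifs."""
    return [m.start() for m in YCAY.finditer(seq.upper())]

if __name__ == "__main__":
    # Hypothetical example sequence, for illustration only.
    print(find_ycay("GGUCAUCAUAA"))
```

Relating such positions to intron and exon boundaries would additionally require a transcript annotation, which is outside the scope of this sketch.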
Recently, another promising variant of this method called photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) has been proposed. PAR-CLIP uses photoreactive analogs of ribonucleosides to crosslink RNA–protein complexes in vivo
]. When 4-thiouridine is used as a uridine analog, crosslinked 4-thiouridines are subsequently reverse transcribed and PCR amplified as cytosines. Loci to which many reads with thymidine-to-cytosine mutations are mapped represent, in all likelihood, binding sites to which the protein has been crosslinked. The crosslink-diagnostic mutations enable one to pinpoint RBP-binding sites at site resolution. An even more specific mutational pattern has been observed when PAR-CLIP was applied to Argonaute proteins: the thymidine-to-cytosine mutation was very frequently found immediately upstream of the region of complementarity between the mRNA and the 5′-end (known as the seed region) of an abundantly expressed miRNA [96
]. Using this information, one can identify with accuracy the miRNA that is involved in regulating a particular transcript at a specific site. Another advantage of PAR-CLIP is that the photoreactive analogs can be crosslinked at higher wavelengths (365
nm), which are less damaging to the nucleic acids. While 4-thiouridine is suitable for RBPs that have uridines in their binding motif or very close to it, it is not suitable for RBPs that bind to uridine-depleted stretches. For such cases, 6-thioguanosine has been proposed as an alternative [96
]. Preliminary studies suggest that 6-thioguanosine crosslinking also yields diagnostic guanosine-to-adenosine mutations that could be used to accurately pinpoint the location of binding sites. However, this change appears to occur less frequently than the thymidine-to-cytosine transition that is observed when 4-thiouridine is used. In addition, earlier reports indicated that 6-thioguanosine is toxic to cells and prevents RNA and protein synthesis beyond certain concentrations [97
], hence the incorporation frequency of 6-thioguanosine in the RNA cannot be as high as that of 4-thiouridine. Also, the crosslinking efficiency of 6-thioguanosine is several-fold lower than that of 4-thiouridine [96
], making its use more limited than that of 4-thiouridine. Finally, incorporation of photoreactive analogs in animal tissues remains a daunting task. Therefore, this technique has so far been used in analyses of cell lines.
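The idea of using thymidine-to-cytosine mutations to locate crosslink sites can be illustrated with a small tally over aligned reads. This is a sketch under simplifying assumptions that are not part of the original protocol description: reads are aligned without gaps to the same strand as the reference, and the function names and the `min_reads` threshold are hypothetical.

```python
from collections import Counter

def t_to_c_counts(reference, alignments):
    """Count T-to-C mismatches per reference position.

    reference  : sequence of the locus in the DNA alphabet
    alignments : iterable of (start, read) pairs; each read is assumed to be
                 aligned without gaps to reference[start:start + len(read)]
    """
    counts = Counter()
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos < len(reference) and reference[pos] == "T" and base == "C":
                counts[pos] += 1
    return counts

def candidate_crosslink_sites(reference, alignments, min_reads=2):
    """Positions supported by at least `min_reads` T-to-C reads (assumed cutoff)."""
    counts = t_to_c_counts(reference, alignments)
    return sorted(pos for pos, n in counts.items() if n >= min_reads)

if __name__ == "__main__":
    # Hypothetical toy locus and reads, for illustration only.
    ref = "ACGTTGCATG"
    reads = [(2, "GTCGC"), (3, "TCGCA"), (0, "ACGCT")]
    print(candidate_crosslink_sites(ref, reads))
```

A real analysis would also need to separate crosslink-induced mutations from sequencing errors and polymorphisms, which this tally does not attempt.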
In yet another promising development in this field, a CLIP variant known as iCLIP exploits the propensity of reverse transcriptase to terminate at a nucleotide that remains modified after the protein that was crosslinked to it has been digested away [98
]. Sequence reads obtained from iCLIP are thus expected to start exactly at the crosslink site, enabling identification of RBP-binding sites at nucleotide resolution. It has been argued that with previous methods, many bona fide binding sites would have caused the reverse transcriptase to stop before reaching the 5′-adaptor, yielding aborted cDNAs that would be selectively lost during the PCR cycles due to the lack of the annealing site for the sequencing primer. Further studies will be necessary to establish to what extent this occurs, because at least in PAR-CLIP, large numbers of sequence reads with crosslink-diagnostic mutations inside the reads have been obtained.
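Because iCLIP reads are expected to begin at the crosslink site, candidate sites can be summarized by tallying read start coordinates. The sketch below assumes plus-strand reads given as 0-based 5′ start coordinates; the function names and the `offset` parameter are hypothetical, the latter included only because an analysis may choose to assign the crosslink to a neighboring position rather than to the read start itself.

```python
from collections import Counter

def crosslink_positions(read_starts, offset=0):
    """Tally putative crosslink positions from 5' read start coordinates.

    offset shifts every start by a fixed amount (0 follows the description
    in the text; other values are an analysis choice, not part of it).
    """
    return Counter(start + offset for start in read_starts)

def top_sites(read_starts, n=3, offset=0):
    """Return the n most frequently observed positions with their read counts."""
    return crosslink_positions(read_starts, offset).most_common(n)

if __name__ == "__main__":
    # Hypothetical read starts, for illustration only.
    print(top_sites([10, 10, 11, 10, 25], n=1))
```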
Taken together, all the CLIP variants have been employed with demonstrable success in the transcriptome-wide identification of binding sites for numerous RBPs. The improvements in the technique are enabling an increasing number of groups to apply it to a variety of proteins [91
], and we expect that in the not-too-distant future, catalogs of RBP-binding sites will be available, similar to resources that have been constructed for TFs. However, additional work will be needed to establish protocols that would allow one to obtain quantitative data on RBP–RNA interactions in vivo
. It is clear that the choices made in the many steps of the CLIP protocol can lead to the identification of specific subsets of binding sites. For example, in order to identify the position of the crosslink with methods other than iCLIP, the sequence read has to cover the RBP-binding site. This is only ensured when the fragments that are subjected to deep sequencing are relatively short, and thus the nuclease digestion has to be carried out to the extent that little beyond the RBP-protected fragment is left intact. The complete digestion products will likely be too short to be unambiguously mapped, and almost complete digestion is very difficult to control. As alternatives, one may use incomplete digestion followed by computational analyses to identify the short binding sites presumably located at some distance from the sequenced reads, or digestion by a nuclease that does not cleave between all dinucleotide pairs. In fact, of the nucleases that have been employed in CLIP, only RNase I does not have a preferred nucleotide for cleavage, while the others have specific preferences. RNase T1 cleaves 3′ of guanosine [101
] and RNase A cleaves only after cytosine and uridine [102
]. Another commonly used nuclease, micrococcal nuclease, cleaves preferentially 5′ of an adenosine or thymidine [103
]. Thus, the choice of nuclease may influence the subset of sites that is isolated through CLIP, enriching for those that do not contain optimal nuclease cleavage sites in the immediate vicinity of the RBP binding site. It is also known that 254
nm UV irradiation preferentially crosslinks pyrimidines [104
], with crosslinks normally not occurring between base-paired regions [105
]. Thus, one may expect that binding sites that are embedded in pyrimidine-rich regions will be preferentially enriched in the 254
nm CLIP data. Overall, a large number of techniques to decipher RNA–protein interactions are available today, each with its own advantages and limitations, which need to be weighed against the problem at hand.
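The effect of nuclease specificity on the fragments that survive digestion can be illustrated with an in silico digestion. This is a minimal sketch that models only the two rules stated above (RNase T1 cutting 3′ of guanosine, RNase A cutting after cytosine and uridine) and assumes complete digestion of naked RNA with no protection by a bound protein; the names `CLEAVE_AFTER` and `digest` are hypothetical.

```python
# Cleavage rules as stated in the text; complete digestion is an assumption.
CLEAVE_AFTER = {"RNase T1": "G", "RNase A": "CU"}

def digest(rna, nuclease):
    """Split `rna` after every nucleotide that the nuclease cleaves 3' of."""
    cut_after = CLEAVE_AFTER[nuclease]
    fragments, current = [], ""
    for base in rna.upper():
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # trailing fragment without a cleavage site at its end
        fragments.append(current)
    return fragments

if __name__ == "__main__":
    # Hypothetical sequence, for illustration only.
    for name in CLEAVE_AFTER:
        print(name, digest("AAGUCAGGA", name))
```

In this simplified model, a stretch without guanosines is left intact by RNase T1, which is consistent with the enrichment of sites lacking optimal cleavage sites in their immediate vicinity.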
Rapid advancements in experimental methods have facilitated the identification of RBP-binding sites at a transcriptome-wide level and led to fascinating insights into the intricate regulatory networks involving RBPs. Although additional work will be needed to increase the accuracy of these methods, it should soon be possible to obtain comprehensive maps of the RNA–protein interactome in a variety of cell types and conditions.