Principles and pitfalls of high-throughput analysis of microRNA-binding thermodynamics and kinetics by RNA Bind-n-Seq

Summary RNA Bind-n-Seq (RBNS) is a cost-effective, high-throughput method capable of identifying the sequence preferences of RNA-binding proteins and of qualitatively defining relative dissociation constants. Although RBNS is often described as an unbiased method, several factors may influence the outcome of the analysis. Here, we discuss these biases and present an analytical strategy to estimate absolute binding affinities from RBNS data, extend RBNS to kinetic studies, and develop a framework to compute relative association and dissociation rate constants. As proof of principle, we measured the equilibrium binding properties of mammalian Argonaute2 (AGO2) guided by eight microRNAs (miRNAs) and kinetic parameters for let-7a. The miRNA-binding site repertoires, dissociation constants, and kinetic parameters calculated from RBNS data using our methods correlate well with values measured by traditional ensemble and single-molecule approaches. Our data provide additional quantitative measurements for Argonaute-bound miRNA binding that should facilitate development of quantitative targeting rules for individual miRNAs.

In brief Jouravleva et al. show that, when designed and analyzed with potential biases in mind, RNA Bind-n-Seq provides a high-throughput route to discovering the sequence preferences of RNAbinding proteins, defining their absolute equilibrium binding dissociation constants, and delineating their relative association and dissociation rate constants for sites of disparate sequence and affinity.

INTRODUCTION
Predicting the regulatory consequences of molecular interactions based on measured biochemical properties is a longstanding goal in biology. The strength of an interaction is quantitatively described by its equilibrium dissociation constant (K D ), the substrate concentration required for halfmaximal complex formation. By definition, K D is the ratio of the rate of complex dissolution, described by the dissociation rate constant, k off , and the rate of complex formation, described by the association rate constant, k on . Interactions with the same affinity may arise from different kinetic behaviors that may vary by orders of magnitude: one set of interactions may be driven by rapid recognition and binding (large k on ), while another may be driven by slower k on but increased complex stability (small k off ). Knowledge of k on limits the possible mechanisms for target finding, e.g., whether a binding event is diffusion-limited or whether additional proteins can speed the search for high-affinity binding sites. k off determines the lifespan of the binding interaction and thus provides insight into whether a process is likely to be driven by a hit-and-run mechanism or require continued site occupancy to exert a regulatory effect. Thus, the affinities and the dynamics of molecular interaction provide critical information for developing quantitative models of a regulatory network.
Interactions between RNA-binding proteins (RBPs) and mRNAs are dynamic (M€ uller-McNicoll and Neugebauer, 2013;Licatalosi et al., 2020) and at the core of gene regulation (Moore, 2005). Current approaches to determine thermodynamic and MOTIVATION The speed at which regulatory proteins find their RNA targets and the time they spend bound to specific regulatory sequence motifs govern the specificity and extent of target repression or activation. RBNS was developed to qualitatively describe the specificity of RNA-binding proteins. Here, we present an analytical strategy to measure absolute affinity values and relative association and dissociation rate constants from RBNS data. The resulting analysis provides a resource for replacing >500 biochemical binding assays in examining physicochemical properties and kinetic behavior of miRNA-guided Argonaute (AGO) proteins. Our quantitative measurements should allow prediction of complex and dynamic miRNA-mRNA regulatory networks.
RNA Bind-n-Seq (RBNS) determines the specificity of proteins for nucleic acids in vitro using a single-step binding assay and a high-throughput sequencing readout, making the method widely accessible and cost-effective. Originally developed to identify the repertoire of DNA sequence motifs by zinc-finger proteins and calculate their relative binding affinities (Zykovich et al., 2009), it was subsequently applied to RBPs (Lambert et al., 2014;Taliaferro et al., 2016;Dominguez et al., 2018 Experimental procedures for RBNS and the computational strategy to identify RBP binding sites and their relative K D values have been presented previously (Lambert et al., 2015;McGeary et al., 2019).
Here, we revisit RBNS, providing practical guidelines for performing the assay and highlighting potential biases and pitfalls influencing the outcome of RBNS analysis. We report a novel computational approach that extracts absolute K D values from RBNS data, providing a high-throughput route to quantitatively describe intermolecular interactions. Finally, we extend RBNS to measure relative k on and k off values, enabling a deeper understanding of reversible bimolecular interactions. To test our method, we use published (McGeary et al., 2019) and new data for target binding by mammalian AGO2 loaded with different miRNAs. In vivo, miRNAs guide AGO2 to repress gene expression (Liu et al., 2004;Meister et al., 2004;Baek et al., 2008;Selbach et al., 2008;Guo et al., 2010;Doench and Sharp, 2004;Hendrickson et al., 2009;Huntzinger and Izaurralde, 2011;Bazzini et al., 2012). Unlike most RBPs (69 of 78 assayed), which have detectable binding affinity for just one or two sequence motifs (Van Nostrand et al., 2020), AGO proteins take their binding specificity from their 21-nt-long miRNA guides. Consequently, AGO proteins can bind a variety of sites with affinities ranging from $1 pM to $10 nM. AGO proteins are ideal for illustrating the biases inherent in RBNS, discussing experimental considerations, and benchmarking our strategies for measuring absolute K D values and relative k on and k off values.

RESULTS
Overview of RNA Bind-n-Seq RBNS begins with incubation of a purified RBP with a large pool of RNA molecules, each containing a region of random sequence ( Figure 1). After reaching equilibrium, protein-bound RNA is separated from unbound RNA and then extracted from the nitrocellulose membrane, reverse transcribed, amplified, and sequenced. Analysis of sequencing data readily identifies a list of preferentially bound sequence motifs. Performing multiple individual binding reactions across a broad range of RBP concentration allows measurement of relative K D (Lambert et al., 2015).

Experimental design considerations RISC purification and choice of concentration range
Assembly and purification of miRISC (RNA-induced silencing complex) comprising purified human or mouse AGO2 and a synthetic miRNA guide has been described elsewhere (Figure 2A) (Flores-Jasso et al., 2013). Because different miRISC preparations can have different percent activities for the same amount of protein, a titration experiment to quantify the concentration of binding-competent miRISC is a key step for determining the protein concentrations to use in RBNS reactions. We estimate total miRISC concentration by northern blot and use double-filter binding assays to measure the equilibrium binding of active miRISC with a high-affinity RNA target (Figures 2B and 2C). Concentration of RNA target is chosen to be much greater than its measured K D , and miRISC concentration is varied by an order of magnitude above and below the target concentration. The titration data are then fit to a quadratic equation ( Figure 2C). The stoichiometry of the bound complex is 1:1 (Wee et al., 2012); therefore, the breakpoint in fraction bound versus the ratio of protein to ligand indicates the amount of active miRISC.
To examine binding to both high-and low-affinity sites, RBP concentrations in RBNS reactions should span and exceed K D values of those sites. miRISC has subnanomolar affinity for its canonical sites-mouse AGO2 loaded with let-7a has K D $10 pM for an 8mer site but $200 pM for a 6mer (Becker et al., 2019;Flores-Jasso et al., 2013;Salomon et al., 2015). Therefore, we surveyed logarithmically spaced miRISC concentrations from $5 pM to $1 nM. In addition, we performed a no-RISC binding experiment to detect potential method-specific biases in background binding ( Figure 2D).

RNA design
The length of the random sequence region of RNA ligands is an important aspect of RBNS design (Lambert et al., 2015). miRNA canonical binding sites are 6-8-nt long, whereas noncanonical sites can be 11-12 nt long (Shin et al., 2010). To capture binding at both, we randomized 20 nt, an RNA pool of 4 20 = 1.1 3 10 12 distinct RNA sequences. Each binding reaction was carried out with 2 pmol of RNA-1.2 3 10 12 molecules; thus, the pool theoretically contained >10 4 copies of each of the 4 12 = 1.7 3 10 7 possible 12mers and >10 8 copies of each of the 4 6 = 4.1 3 10 3 possible 6mers. High-throughput sequencing of the RNA pool generally confirmed these expectations and showed an overall balanced base composition, albeit with a slightly reduced frequency of guanine and slightly higher than expected frequency of cytosine ( Figure S1A). In addition, some 10mers were depleted 10-fold, and others were over-represented by 5-fold compared with expected frequency ( Figure S1B). Such departures from randomness in the RNA pool is corrected for by calculating the ratio of the frequency of each 10mer proteinbound RNA to that in the starting pool.
In typical RBNS experiments, the random sequence RNA region is flanked by constant primer-binding regions used for sequencing ( Figure 1). This design simplifies library preparation, avoids biases that can result from RNA ligation, and ensures that any RNA carried over from protein purification will not contaminate the sequenced library. However, fixed primer-binding sequences can affect the secondary structure ensemble of the RNA pool (Cook et al., 2017) and bias interpretation of RBNS assays with RBPs that recognize structured elements. miRISC binds single-stranded sequence motifs; therefore, secondary structures will have little effect on miRISC binding unless they occlude a site. We observed a $1.5-fold higher enrichment of canonical binding sites at the 5 0 end of the random sequence region for miRISC bearing let-7a, miR-34b, or miR-449a (Figures S1C and S1D). Nevertheless, constant primer-binding sequences may bias the RBNS assay if they contain a motif for miRISC binding. In this case, miRISC may bind to virtually any RNA molecule in the pool, underestimating the enrichment of authentic binding motifs within the randomized sequence region, reducing RBNS sensitivity. miRNAs guide miRISC to their targets primary through Watson-Crick pairing with their ''seed'' sequence, miRNA nucleotide positions g2-g8 (Lewis et al., 2003(Lewis et al., , 2005Rajewsky and Socci, 2004;Krek et al., 2005), which is displayed by AGO2 in a helical geometry ready to base pair (Parker et al., 2005;Wang et al., 2008;Elkayam et al., 2012;Nakanishi et al., 2012). Figure S1C presents the nomenclature for canonical site type classification. Given the high diversity and short length of miRNA seed sequences, the potential for miRISC binding to the constant regions is high. For example, constant regions in the RNA pool used by McGeary et al. (2019) contained at least one canonical site for 170 human miRNAs, including miR-7.
Even noncanonical, seed-matched sites with undetectable affinity may impact binding by interacting with higher affinity sites, e.g., those with slow k on . miRISC binds rapidly to a short, seed-matched 4mer-m3.6 site, but this transient binding does not produce a measurable interaction. In contrast, 3 0 -only sites typically have slower association rate constants ($10 7 M À1 s À1 ) than seed-matched sites (Salomon et al., 2015), because miRISC does not pre-organize the 3 0 region of its miRNA guide. Yet a 3 0 -only 10mer-m10.19 site placed adjacent to a 4mer-m3.6 site present in the constant region of the RNA target displayed a diffusion-limited apparent k on ( Figure S1E). Constant regions cannot be readily modified to avoid biases, as they must remain compatible with Illumina sequencing. Inspired by Becker et al. (2019), we use cDNA oligonucleotides to block the common sequences present in each RNA molecule. Making these regions double-stranded prevents miRISC binding and disfavors intramolecular secondary structures, leaving only the randomized sequence and four 5 0 and 3 0 flanking nucleotides accessible (Figure 1). We benchmarked the blocker strategy using miR-449a miRISC, which has a low-affinity seed-matched binding site (5mer-A1) in the 3 0 constant region of the RNA pool. DNA blocking oligonucleotides increased the enrichment of canonical sites 2.5-to 5-fold by decreasing the fraction of RNA with no binding site in the random sequence region ( Figure S1F). Time to reach equilibrium K D , the equilibrium dissociation constant, must be measured after binding reactions have reached equilibration. As reviewed by Jarmoskaite et al. (2020), binding follows an exponential curve characterized by its half-life, t 1/2 . A conservative standard for equilibration is five half-lives, corresponding to 96.6% completion. Half-life is also linked to the equilibration rate constant k eq : Thus, the time for a binding reaction to reach 96.6% of completion is given by For the binding equilibrium where miRISC interacts with an RNA site i , the equilibration rate constant is described by The longer-lived the complex, the longer the incubation time required to reach equilibrium. For the let-7a 8mer site, k on = 2.4 ± 0.1 3 10 8 M À1 s À1 , k off = 0.0036 ± 0.0003 s À1 (Salomon et al., 2015) and the concentration of the 8mer site in the RNA pool used for RBNS is 40 pM (Table S1). Under these conditions, the binding reaction with the lowest miRISC concentration (8 pM) should reach equilibrium in 16 ± 2 min. To provide sufficient time for equilibration, we incubated binding reactions for 2 h. RNA pool concentration In RBNS, the RNA pool concentration is the same for all binding reactions, while the RBP concentration varies. To measure K D , the ''titration'' regime-in which the concentration of a binding site is much greater than K D -must be avoided (reviewed by Jarmoskaite et al., 2020). We measured the frequency of known canonical and noncanonical binding sites in the RNA pool by high-throughput sequencing. Under our experimental conditions, concentrations of known binding sites are of the same order of magnitude as their K D values. This is an acceptable regime, and K D can be determined by an appropriate binding equation that explicitly accounts for bound protein and does not rely on the common assumption that [Protein] free $ [Protein] total (see STAR Methods). Separation of bound and free RNA molecules After reaching equilibrium, bound and free RNA are separated, e.g., by electrophoretic mobility shift assay, capture on beads coupled to specific antibodies that recognize the RBP or a ligand attached to the synthetic miRNA, and a double-filter binding assay.
The double-filter method has been used extensively to investigate the equilibrium binding and kinetic properties of protein-nucleic acid interactions (Wong andLohman, 1992, 1993;Riggs et al., 1970aRiggs et al., , 1970bWinter and von Hippel, 1981;Clore et al., 1982;Wee et al., 2012;Xiao and MacRae, 2020). The nitrocellulose filter preferentially retains protein and protein-bound nucleic acids. The positively charged nylon filter placed directly beneath the nitrocellulose membrane traps protein-free nucleic acids not retained by the nitrocellulose.
RBNS using miRISC immobilized on paramagnetic beads provides an obvious route to high-throughput automation of the method, a prerequisite for defining the site types and their K D values for all known mammalian miRNAs. We compared the double-filter binding assay with miRISC immobilized on paramagnetic beads via an antibody-FLAG epitope tag interaction (Figures 1 and S2A). As the readout, we calculated the enrichment of canonical binding sites: the frequency of a binding site in miRISC-bound reads divided by its frequency in the starting pool ( Figures 3B and S2B). Enrichment of canonical sites was 5-to 10-fold lower when bound RNA was recovered using bead-immobilized miRISC. Concomitantly, RNA non-specifically associated with miRISC was more abundant for the beads than in the double-filter binding assay, diluting the specifically recovered RNA and reducing RBNS sensitivity. We developed an alternative method for enrichment analysis to account for this higher background. Using this approach (described below) for the two methods yielded highly similar lists of binding sites ( Figure S2C) with comparable affinities (%2-fold difference for 12 of 18 binding sites) ( Figure S2D). While magnetic bead assays present a clear advantage for automation, the greater sensitivity and reduced handling of the double-filter binding assay is generally simpler, especially for kinetic assays.

De novo identification of binding sites Principle
The number of occurrences of all k-mers of a specific length is counted over all the reads in a binding reaction. These counts are then divided by the total count of all k-mers to yield the kmer frequency. The same procedure is repeated over the reads from the RNA input pool to account for any sequence biases present in the random region of RNA molecules. Finally, enrichment of a k-mer is defined as the frequency of the k-mer in proteinbound reads divided by its frequency in the RNA input pool (Figure 3C). A motif is identified as a binding site if (1) its enrichment is either above an arbitrarily defined value (McGeary et al., 2019) or is greater than a chosen Z score threshold (Lambert et al., 2014;Dominguez et al., 2018;Van Nostrand et al., 2020) and (2) it is not enriched in the no-RISC control reaction. Identified binding sites were also not enriched in a binding reaction containing the miRNA but not AGO2, indicating miRISC-specific binding ( Figure S3A). RBNS sensitivity depends on RBP concentration Enrichment values are dependent on the concentration of RBP and produce a characteristic unimodal curve (Lambert et al., 2014). At low protein concentrations, enrichment of a motif increases with increasing protein concentration, as increasing amounts of bound RNA improve signal over a constant background of RNA molecules recovered even in the absence of protein. At high RBP concentrations, high-affinity motifs are saturated, and binding is driven toward lower affinity sites, resulting in a lower fraction of high-affinity motifs ( Figure 3B). In our experiments using AGO2 miRISC, the relative rankings of binding motifs obtained at different protein concentrations were always preserved (Figures 3B, S1F, and S2B) and highly correlated (Spearman's r > 0.83 at adjacent protein concentrations), the binding reaction with the highest protein concentration offered the greatest sensitivity for low-affinity sites ( Figure S3B). To limit sensitivity bias when comparing binding sites within various miR-NAs, it is preferable to measure the active concentration of AGO2 and perform site discovery at similar active miRISC concentrations.
Sensitivity depends on the criteria chosen for significance Motifs are typically considered enriched if their enrichment scores (McGeary et al., 2019) or Z scores (Lambert et al., 2014;Dominguez et al., 2018;Van Nostrand et al., 2020) exceed an arbitrarily defined threshold. Because binding site enrichment scores depend on the amount of RNA non-specifically recovered in the assay-increased background dilutes the specific signal, lowering enrichment scores-using a fixed threshold to compare different datasets may bias motif discovery. For example, a threshold of 10 ( Figure S3C) (McGeary et al., 2019) applied to our AGO2 let-7a datasets generated by the double-filter binding method would yield a list of 17 enriched sites, but only 4 binding sites when using bead-immobilized miRISC ( Figure S3D). This bias is prevented by using Z scores, which indicate the number of standard deviations by which enrichment values differ from the mean enrichment ( Figure S2C). Previous reports used various values of Z score for different proteins and motif lengths (Lambert et al., 2014;Dominguez et al., 2018;Van Nostrand et al., 2020). To standardize the procedure of choosing a threshold, we consider a motif significantly enriched if its Z score R 99.9 percentile ( Figure S3E).
Finally, site discovery may be biased by ''shadow motifs''-kmers occurring in reads with high-affinity binding motifs but not conferring binding by themselves. An iterative procedure allows detection of ''true binders'' by masking reads harboring the most enriched binding site in the protein sample and in the RNA pool, then repeating the analysis iteratively ( Figure 3C) (McGeary et al., 2019;Dominguez et al., 2018;Van Nostrand et al., 2020). All enrichments are recalculated on the masked reads to obtain the resulting most enriched motif, with this process continuing until the enrichment Z score (calculated from the original enrichment values) no longer meets the criterion for significance. Motif size Known miRISC binding sites are 5-12 nt long. As enrichment is calculated as the ratio of motif frequency in protein sample over RNA input pool, and division by zero is not supported, kmers must be present in sequenced reads of the RNA pool. With current sequencing costs, one can readily sequence $20 million reads for each condition. At this depth, $200 reads of every possible 10mer will be present within the 20-nt-long randomized central region, but not all 11mers will be sampled. As sequencing costs fall, deeper sequencing should allow longer motif discovery.
Estimation of absolute K D values using synthetic datasets k-mer enrichment approximates relative binding affinities, because high-affinity binding sites are more enriched than lower affinity sites (Zykovich et al., 2009). However, enrichment depends on RBP concentration, saturation of high-affinity sites, and background binding (Figures 3B, S1F, and S2B). Consequently, enrichment (1) has an upper limit that depends on the length of both k-mers and the RNA molecules and (2) is not directly proportional to binding affinities (Lambert et al., 2015). Recently, a strategy was proposed to estimate relative K D values solely from sequenced data without prior information on miRISC concentration, fraction bound of RNA molecules, and non- We modified this procedure to simultaneously determine absolute K D values using maximum likelihood estimation (MLE) ( Figure 4A). In this statistical model, the parameter values-the K D values of various sites, miRISC concentration, and the background of RNA interacting non-specifically with the beads, membrane, or tubes-of a mathematical model predicting expected sequencing counts are fit to maximize the likelihood of observing the predicted sequencing counts in experimental data. We tested our approach using simulated RBNS datasets containing four binding sites and corresponding to our typical experimental conditions. Figure S4A shows the convergence of a typical MLE fit. Our method accurately estimated the ground truth affinity values of the four binding sites ( Figures 4B and S4B).
Estimation of absolute K D values using experimental datasets Next, we applied our MLE procedure to experimental let-7a datasets. To test whether the method's K D estimates were robust, we removed one sample from the dataset and re-fitted the parameters using the remaining binding reactions. Estimates obtained by this leave-one-out procedure strongly correlated in pairwise comparisons (Pearson's r R 0.969 for each of the 15 pairwise comparisons), even when we removed the highest miRISC concentration or the no-RISC binding reaction ( Figures 4C and S4C). Estimates were also highly reproducible in two independent experiments, each comprising five protein concentrations and no-RISC control (Pearson's r R 0.974; Figure 4D).
To estimate K D values and their 95% confidence intervals (CI), we first performed fitting optimization 2,000 times. Each binding experiment used two independent trials to account for errors caused by sample-to-sample variation. Second, we bootstrapped sequencing data 10 times to account for the error caused by the multinomial down-sampling of RNA molecules during sequencing. Third, each fitting optimization was performed 100 times using independent and partially randomized starting points to account for the error of ending at a local minimum. Importantly, while values for the initial guess spanned a >20-fold (C) Illustration of de novo site discovery strategy. Occurrences of all k-mers are calculated in all the sequencing reads (black lines). Each l-nt-long read contains l -k + 1 motifs. For example, the magenta 10mer appears four times in the miRISC-bound sequencing data (five reads total), therefore its frequency in this sample is 4 O (5 3 55). The most enriched motif is selected, and its Z score is compared with the Z score threshold. The magenta motif is the most enriched, and its Z score is above the Z score threshold; therefore, this motif is retained. Next, all reads containing the identified motif are masked in the miRISC-bound and RNA pool sequencing data (gray lines). All enrichment values are recalculated on masked reads to eliminate ''shadow'' motifs (blue). The orange motif is the most enriched at this iteration and is retained. Nevertheless, its Z score (calculated from the original enrichment values) is below the Z score threshold; therefore, the orange k-mer is not considered as enriched. This iterative procedure continues until the Z score of the most enriched binding site is below the Z score threshold. Article ll OPEN ACCESS range, fitted parameters were robustly estimated, yielding narrow 95% CI (Figures 5, 6, and S2D). We note that our procedure for estimating K D values uses only reads containing one miRNA-binding site. Reads containing multiple instances of binding sites represented %1% of sequence reads in the starting library. Omission of these reads did not affect estimation of K D values: fractionally assigning such multi-site reads to appropriate binding sites yielded the same results as excluding them ( Figure 4E).
Fitted parameters were also insensitive to the number and identity of binding sites. For example, when optimization was performed with 15 statistically enriched site types (

High-throughput absolute K D estimation provides insights into miRNA targeting
Our analytical approach can estimate the affinity of any binding site of interest-enriched or not-provided the motif is represented in the sequenced data. We used our strategy to compute the K D values for de novo identified binding sites from previously published RBNS datasets (McGeary et al., 2019) and from additional datasets generated in this study (Figures 5B, S5A, and S6). We found that canonical binding sites display miRNA-specific differences in their affinities. For example, the C. elegans miRNA, lsy-6, loaded into human AGO2 had a 5-fold weaker affinity for an 8mer than AGO2:miR-1 miRISC ( Figure S5A). Notably, the seed of miR-1 is predicted by nearest neighbor methods (Turner and Mathews, 2010) to pair more strongly with its target than the seed of the lsy-6 miRNA ( Figure S5B), consistent with the known relationship between the predicted strength of seed-pairing and the efficacy of target mRNA repression (Garcia et al., 2011). While affinity increased with increased predicted pairing stability, the correlation between measured affinity and affinity predicted by nearest neighbor free energy was significant only for 7mer-m8 binding sites ( Figure S5B). Affinity of seed-matched sites was also increased by A at position t1 ( Figure 6A), which preferentially interact with a t1-nucleotide-binding pocket in AGO2 (Schirle et al., 2015).
Importantly, miRNAs bound some noncanonical sites with greater affinity than a canonical 6mer site (Figures 5B, S5A, and S6). These included 3 0 -only sites that extensively pair to the 3 0 region of the miRNA without pairing to the seed. These sites were identified in the let-7a, lsy-6, miR-155, and miR-124 datasets using the Z score approach and display as much as 4-fold greater binding affinity than that of the canonical 6mer site (Table S3). By contrast, miR-1, miR-7, miR-449a, and miR-34b bound 3 0 -only sites poorly (K D > 1.2 nM). We note that the seed regions of these miRNAs are predicted by nearest neighbor analysis to pair more strongly and their 3 0 regions more weakly than the equivalent sites for let-7a, lsy-6, miR-155, and miR-124, likely explaining the pronounced miRNA-specific differences in the repertoire of noncanonical sites.
Single target mismatches, insertions, and deletions disrupting the seed detectably increased K D ( . Surprisingly, some sites bearing a central 1-nt bulge in the seed were detected among enriched motifs (Figures 3A, 5B, S5A, and S6), suggesting that these bulges are better tolerated. For example, insertion of guanosine between positions 5 and 6 (b5.6G) in miR-124 seed-matched sites decreased affinity by only 21-fold ( Figure 6D). This binding site corresponds to a nucleation-bulge site and was previously identified by highthroughput crosslinking and immunoprecipitation (Chi et al., 2012). Nevertheless, 67% of enriched sites (8 of 12) with one target nucleotide inserted in the center of the seed do not use this mode of binding, suggesting that bulged sites may be more common than previously appreciated, but are not used by all miRNAs.

Determining relative kinetic parameters by RBNS using synthetic datasets
Kinetic studies of molecular interactions provide powerful insights into the underlying microscopic mechanisms and functional outcome. We reasoned that RBNS might enable high-throughput measurement of the kinetic behavior of an RBP if the bound RNA was recovered at different times after the binding reaction was started ( Figure S7A). Traditional association kinetics experiments monitor the time-dependent progress of RNA binding at different protein concentrations. To simplify the experiment, one usually uses pseudo first-order conditions, with a low concentration of one reactant (e.g., RNA) and R10-fold higher concentration of the second reactant (e.g., RBP). Under these conditions, the observed rate constant k obs is related to RBP concentration and the association and dissociation rate constants as k obs = k on 3[RBP]+k off . A plot of k obs versus [RBP] is linear, with slope k on and y intercept k off . Therefore, association kinetics requires performing a time series with several protein concentrations. An alternative approach (B) Testing K D estimation with simulated data. RBNS data were modeled by simulating miRISC binding to RNA pool containing four binding sites and no-site molecules. Stock concentration of miRISC was equal to 2.1 nM (dataset 1) or 8.1 nM (datasets 2 and 3). Background was set to 0.1 nM and K Dnosite was set to 5 nM. Error bars indicate 95% CI on the median. Article ll measures binding at different time points under pseudo first-order kinetics, while blocking reassociation by dilution or adding a high-affinity RNA competitor target at high concentration. In this experimental setup, a single protein concentration is used; k off is first determined from the dissociation and is then used to infer k on from k obs .
We developed a computational strategy that does not assume pseudo first-order kinetics and simultaneously fits association and dissociation rate constants using the same RBNS dataset ( Figure 7A). Our mathematical model predicts the read counts for each site type across the time series as a function of k on and k off values for each miRNA site type (including ''no-site''). We also estimated a constant amount of RNA molecules recovered as background in all samples.
To test our approach, we used simulated kinetic RBNS datasets, in which the observations were modeled to match our typical experimental conditions ( Figure S7B). Our analytical method did not recover the absolute ground truth parameters ( Figure S7B). The cost function f cost , which we minimize during the optimization procedure, is a linear combination of exponential functions. This function has multiple minima, and f cost was routinely trapped at a local minimum yielding k on and k off values different from the ground truth ( Figure 7B). Nevertheless, our method was able to discriminate between fast and slow binders-but maybe was not able to distinguish rank order within these broad classes of sites-and preserved the relative differences in kinetic parameters among binding sites ( Figure 7C).

Estimation of relative kinetic parameters using experimental datasets
To benchmark our approach, we used AGO2 let-7a miRISC, whose kinetics have been measured previously (Figures S5D and S5E) (Becker et al., 2019;Salomon et al., 2015). Leaveone-out validation showed that the fitted k on and k off parameters were robustly estimated (Pearson's r R 0.995 for each of the 55 pairwise comparisons; Table S4). We note that estimation of k off for long-lived binding events was sensitive to removing the first and the equilibrium time points from the kinetic series ( Figure 7D). Estimates differed by %2-fold between two independent experiments using two different miRISC preparations (Pearson's r = 0.86) with two noteworthy exceptions ( Figure 7E). The k on of 5mer-m8 and the k off of 11mer-m11.21 differed between experiments by 5-and 3.5-fold, respectively. These observations highlight two pitfalls of RBNS that we also observed measuring binding affinities: estimation is less accurate for (1) Article ll OPEN ACCESS longer and therefore more sparse motifs, and (2) for no-site motifs. (The 5mer-m8 site type was not enriched in our RBNS equilibrium data but was included in the kinetic analysis for comparison with single-molecule results.) Importantly, our mathematical model and fitting algorithm successfully discriminated between fast-the canonical seed-matched 8mer, 7mer-m8, 7mer-A1, and 6mer sites-and slow bindersthe 3 0 -only site 11mer-m11.21 and the 5mer-m8 ( Figure 7F). The same hierarchy of fast and slow binding sites was observed by single-molecule assays, albeit the dynamic range of k on was compressed relative to RBNS. This variance reflects technical differences between the two methods. First, single-molecule approaches use one unique RNA target >20 nt long, whereas RBNS measures binding for sites %12 nt, because longer motifs are too infrequent in the sequencing data. Second, the identity of nucleotides flanking binding sites impacts k on in vitro, with different flanking contexts spanning as much as a 100-fold range ( Figure S7C) (Becker et al., 2019). RBNS averages these differences while single-molecule assays are typically performed using the most favorable context. Our approach readily distinguished between long-lived and short-lived interactions: canonical seed-matched sites had the slowest departure rates (small k off ), whereas the 11mer-m11.21 and 5mer-m8 sites displayed the fastest ( Figure 7F). Although consistent with single-molecule measurements, the dynamic range of k off values estimated by RBNS was narrower than that offered by single-molecule assays. We do not currently understand the source of the compression of dynamic range in k off , but it is unlikely to arise from technical issues in the MLE procedure, because our method preserved the relative relationship among k off values with simulated data ( Figure 7C). If sufficient protein sample is available, we suggest performing both equilibrium and kinetic series to obtain K D and k on and use these values to infer k off .

DISCUSSION
RBNS is a straightforward and cost-effective strategy to interrogate the sequence specificities of RBPs. RBNS is often characterized as an unbiased method for site discovery in vitro. The method does enable simultaneous binding of RBPs to an exhaustive list of motifs. But as discussed above, several factors in the experimental design and computational analysis can alter the outcome of the assay. Nevertheless, when designed and analyzed with potential biases in mind, RBNS provides a high-throughput route to assessing the biochemical and kinetic properties of RBPs.
Despite its greater utility, RBNS has mainly been used to obtain a list of binding motifs (Taliaferro et al., 2016;Dominguez et al., 2018;Hale et al., 2018;Van Nostrand et al., 2020). Notably, some studies have employed RBNS to obtain qualitative, relative binding affinities (Lambert et al., 2014;McGeary et al., 2019). Here, we present a computational method that estimates the absolute values of dissociation constants for an RBP of interest without prior knowledge of miRISC concentration, fraction of RNA molecules bound, or non-specific RNA recovered as background. Our procedure uses sequencing reads containing single sites or a no-site. Because the proportion of reads with multiple sites is low (%1%), we did not consider a more complex model considering all states in which the AGO-miRNA complex is bound to target sites. Importantly, estimating absolute-rather than relative-affinities does not require considering non-specific binding to no-site RNAs, which display more variance in affinity estimates than sequence-specific binding sites.
Applying our approach to simulated RBNS datasets showed that the method accurately estimated the ground truth K D values of the binding sites. Our ability to measure absolute K D values enables direct comparison of binding affinities for the same regulatory site type among different miRNAs. For example, our analysis revealed a 5-fold range in binding affinities for 8mers of human miR-1 and worm lsy-6. If miR-1 and lsy-6 were present in the same concentrations in a cell and the distribution of their 8mer sites was the same for the two, one would observe a smaller perturbation of gene expression for a loss of lsy-6 function than that of miR-1.
We benchmarked our strategy using the well-studied AGO2:let-7a miRISC and obtained robust results in good agreement with previous biochemical and single-molecule measurements (Becker et al., 2019;Salomon et al., 2015;Sheu-Gruttadauria et al., 2019;Wee et al., 2012;Schirle et al., 2015). As affinities for all motifs are assessed simultaneously, RBNS combined with our computational methods may become an alternative to traditional low-throughput quantitative methods. For example, our analysis of published and newly generated AGO2 datasets provides a resource for replacing >450 binding assays. Our estimation of dissociation constants for canonical and noncanonical sites for eight different miRNAs gives further support to binding models established using one or two miRNA sequences (Becker et al., 2019;Salomon et al., 2015;Sheu-Gruttadauria et al., 2019;Wee et al., 2012;Schirle et al., 2015).
We envision that absolute K D values measured using RBNS may enable prediction of changes in the occupancy of miRNAguided AGO2 or other RBPs at sites across the transcriptome  (Salomon et al., 2015). Error bars indicate 95% CI on the median (RBNS) or propagated error on the mean (CoSMoS). RNA molecules used to measure binding of 5mer-A1 by CoSMoS unintentionally contained an additional 5mer-m2.6-w2 binding site. The solid orange and red lines indicate 2-and 10fold difference, respectively. Dashed diagonal lines show y = x.
12 Cell Reports Methods 2, 100185, March 28, 2022 Article ll OPEN ACCESS in response to developmental and external stimuli, thereby allowing the modeling of the resulting changes in regulatory activity. For miRNAs, RBNS using miRISC immobilized on paramagnetic beads promises to allow determination of the site types and equilibrium binding affinities for all mammalian miRNAs, a prerequisite for predicting the occupancy and identity of miRNA target sites in vivo. Finally, k on and k off measured by RBNS may prove useful for understanding the mechanisms of molecular interactions, providing additional information for developing quantitative models of biology.

Limitations of the study
The method presented here has broad utility in quantitatively assessing the specificity of RNA-or DNA-binding proteins for nucleic acids. We consider miRISC in solution with the RNA pool, and we assume that miRISC may bind one of the k-mers within the RNA molecule with 1:1 stoichiometry and a Hill coefficient of 1 and recovery of bound RNA is complete. For K D estimation, we also assume that the reaction has reached equilibrium, and concentrations of binding sites are below or of the same order of magnitude as their K D values. De novo identification of binding sites, and reliable measurement of K D , k on , and k off values of motifs of interest are achieved if these sites are represented in the sequenced reads. For RNA molecules with 20-nt random region and datasets consisting of 10-20 million sequenced reads after filtering steps, the maximum effective motif size for de novo site discovery is 10 nt. Interrogation of longer motifs could be achieved by including a constant region, such as an imperfect seed sequence, in addition to a random region (McGeary et al., 2022). For fitting K D , k on , and k off values, longer motifs can be interrogated, as the sequenced data are not required to contain all k-mers of a certain length. While our approach correctly ranked long-lived and short-lived interactions, fitted k off values may underestimate differences among the binding motifs. Therefore, we suggest fitting K D and k on and using these values to infer k off .

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

RESOURCE AVAILABILITY
Lead contact Further information and requests for resources and reagents should be directed to, and will be fulfilled by, the lead contact, Phillip D. Zamore (phillip.zamore@umassmed.edu), or by completing the request form at https://www.zamorelab.umassmed.edu/reagents.

Materials availability
Plasmids generated in this study are available for non-commercial use upon request without restriction.
Data and code availability RBNS sequencing data have been deposited at National Center for Biotechnology Information Sequence Read Archive and are publicly available as of the date of publication using accession number PRJNA807105. All published software required to reanalyze the data reported in this paper is described in the quantification and statistical analysis section below. All original code has been deposited at https://figshare.com/articles/software/MicroRNA-binding_ thermodynamics_and_kinetics_by_RNA_Bind-n-Seq/19180952 and is publicly available as of the date of publication.
Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

Cell lines
Studies used HEK293T cells stably overexpressing FLAG-AGO2. AGO2 cDNA was amplified by RT-PCR from mouse testis total RNA. Restriction cloning was used to add the AGO2 coding sequence to pScalps Puro EGFP, fusing the sequence in-frame with N-terminal 3XFLAG tag. Lentivirus transfer vectors were packaged by co-transfection with psPAX2 and pMD2.G (4:3:1) using TransIT-2020 (Mirus Bio) in HEK293T cells. Supernatant containing lentivirus was used to transduce HEK293T cells in the presence of 16 mg/ml polybrene (Sigma) to obtain stable FLAG-AGO2-expressing cell lines. Three sequential transductions were performed to maximize recombinant protein production. The transduced cells were selected in the presence of 2 mg•ml À1 puromycin for 2 weeks, then the cells expressing the 5% highest EGFP fluorescence were selected by FACS (UMASS Medical School Flow Cytometry Core) and expanded. Cells were cultured at 37 C, 5% CO 2 in DMEM (Gibco, Life Technologies) supplemented with 10% heat-inactivated fetal bovine serum (Sigma).

RISC purification
Cells stably expressing recombinant FLAG-AGO2 protein were expanded. Cells were washed once with ice-cold PBS, collected by scraping and centrifuged at 500 3 g for 5 min. Cell pellets were flash-frozen in liquid nitrogen and stored at À80 C. Cell extract was essentially prepared as described (Dignam et al., 1983). Briefly, the cell pellet was washed three times in ice-cold PBS and once in Buffer A (10 mM HEPES-KOH pH 7.9, 10 mM potassium acetate, 1.5 mM magnesium acetate, 0.01% (w/v) CHAPS, 0.5 mM DTT, 1 mM AEBSF, hydrochloride, 0.3 mM Aprotinin, 40 mM Bestatin, hydrochloride, 10 mM E-64, 10 mM Leupeptin hemisulfate). Next, the pellet was resuspended in twice its volume with buffer A and incubated on ice for 20 min to allow the cells to swell. The cells were subsequently lysed on ice with a Dounce homogenizer using 40 strokes of a tight pestle (B type). The homogenate was centrifuged at 2,000 3 g to remove nuclei and cell membranes. Next, 0.11 volumes (relative to the volume of the clarified supernatant from the low speed centrifugation) of Buffer B (300 mM HEPES-KOH, pH 7.9, 1.4 M potassium acetate, 30 mM magnesium acetate, 0.01% (w/v) CHAPS, 0.5 mM DTT, 1 mM AEBSF, hydrochloride, 0.3 mM Aprotinin, 40 mM Bestatin, hydrochloride, 10 mM E-64, 10 mM Leupeptin, hemisulfate) was added, followed by centrifugation at 100,000 3 g for 20 min at 4 C; the supernatant corresponds to the S100 extract. Ice-cold 80% (w/v) glycerol was then added to achieve a 20% (w/v) final glycerol concentration and mixed by gentle inversion. S100 was aliquoted, frozen in liquid nitrogen, and stored at À80 C. To capture AGO2 protein, clarified lysate was incubated for 2 h at 4 C rotating with 20 ml anti-FLAG M2 paramagnetic beads (Sigma) per ml lysate. Beads were washed three times with wash buffer (30 mM HEPES-KOH, pH 7.9, 120 mM potassium acetate, 3.5 mM magnesium acetate, 2 mM DTT, 0.01% (w/v) CHAPS). Immobilized AGO2 was loaded by incubating with 1 mM single-stranded miRNA guide (Table S2) in wash buffer for 1 h at 37 C. Unbound miRNA guide was removed by washing the beads three times with wash buffer. AGO2 and AGO2:miRNA were eluted for 1 h at room temperature with 100 ng•ml À1 3XFLAG peptide in wash buffer. miRISC was purified as described (Flores-Jasso et al., 2013). Briefly, the assembled miRISC was incubated overnight at 4 C with a biotinylated, 2 0 -O-methyl capture oligonucleotide (Table S2) linked to streptavidin paramagnetic beads (Dynabeads MyOne Streptavidin T1, Life Technologies). miRISC was eluted with a competitor oligonucleotide (Table S2) for 2 h at room temperature. Excess competitor oligonucleotide was removed by incubating the eluate with streptavidin paramagnetic beads (Dynabeads MyOne Streptavidin T1, Life Technologies) for 15 min at room temperature. Finally, miRISC was dialyzed at 4 C against three changes (3 h each) of a 3,000-fold excess of wash buffer supplemented with 20% (w/v) glycerol. miRISC was aliquoted, frozen in liquid Cell Reports Methods 2, 100185, March 28, 2022 e2 Article ll OPEN ACCESS nitrogen, and stored at À80 C. For single-molecule analysis, guide strands were labeled with 3 0 Alexa Fluor 555 (Life Technologies).

Northern blotting
Northern blotting was essentially performed as described (Pall and Hamilton, 2008). Briefly, miRNA guide standards and miRISC were first resolved on a denaturing 15% polyacrylamide gel, transferred to Hybond-XL (Cytiva) by semi-dry transfer at 20 V for 1 h, and the RNA crosslinked to the membrane with 0.16 M EDC in 0.13 M 1-methylimidazole, pH 8.0, at 60 C for 1 h. The crosslinked membrane was pre-hybridized in Church's buffer (1% w/v BSA, 1 mM EDTA, 0.5 M phosphate buffer, and 7% w/v SDS) at 37 C for 1 h. Radiolabeled, 25 pmol 5 0 32 P-DNA probe (Table S2) in Church buffer was added to the membrane and allowed to hybridize overnight at 37 C, followed by two washes with 23 SSC containing 0.1% w/v SDS and two washes with 13 SSC containing 0.1% w/v SDS at 37 C for 15 min. The membrane was air dried and exposed to a storage phosphor screen.

Quantification of active miRISC by double-filter binding assay
Binding assays were essentially performed as described (Wee et al., 2012). Double-filter binding assays measured the equilibrium binding of active, binding-competent miRISC with 28-nt long RNA target fully complementary to the miRNA guide. To block cleavage, the target RNA contained a phosphorothioate linkage flanked by 2 0 -O-methyl ribose at positions t10 and t11 (Schwarz et al., 2004;Rivas et al., 2005;Wee et al., 2012). Binding reactions were performed in 5 ml in the presence of 30 mM HEPES-KOH, pH 7.9, 120 mM potassium acetate, 3.5 mM magnesium acetate, 2 mM DTT, 0.01% (w/v) CHAPS. A 5 0 32 P-RNA target (0.5 nM) complementary to the seed region of the miRNA guide (Table S2) was incubated with a range of miRISC concentrations from 0.05 nM to 5 nM. The assay also included a no-miRISC binding reaction using miRISC storage buffer. Binding reactions were incubated at 37 C for 1 hr. RNA binding was measured by capturing protein-RNA complexes on Protran nitrocellulose (Whatman, GE Healthcare Bioscience, Pittsburgh, PA) and unbound RNA on a Hybond-XL (Cytiva) in a Bio-Dot apparatus (Bio-Rad, Hercules, CA). After applying the sample under vacuum, membranes were washed with 10 ml equilibration buffer (30 mM HEPES-KOH, pH 7.9, 120 mM potassium acetate, 3.5 mM magnesium acetate, 2 mM DTT). Membranes were air-dried and signals detected by phosphorimaging.

Co-localization single-molecule spectroscopy
Single-stranded RNA targets were generated as described (Salomon et al., 2015). Typically, 100 pmol RNA target (Table S2) was mixed with a 1.5-fold molar excess of Klenow template oligonucleotide (Table S2) in 7.5 ml of 10 mM HEPES-KOH, pH 7.4, 20 mM sodium chloride, and 0.1 mM EDTA. Samples were incubated at 90 C for 5 min in a heat block. The heat block was then switched off and allowed to cool to room temperature. Afterwards, the annealed strands (30% of final reaction volume) were added without further purification to a 3 0 extension reaction, comprising 13 NEB Buffer 2 (New England Biolabs, Ipswich, MA), 1 mM dATP, 1 mM dCTP, 0.12 mM Alexa Fluor 647-aminohexylacrylamido-dUTP (Life Technologies), and 0.2 U/ml Klenow fragment (3 0 -to-5 0 exominus, New England Biolabs) and incubated at 37 C for 1 h. The reaction was quenched with 500 mM (f.c.) ammonium acetate and 20 mM (f.c.) EDTA. A 15-fold molar excess of 'trap' oligonucleotide (Table S2) was added to the Klenow template oligonucleotide. The entire reaction was precipitated overnight at À20 C with three volumes of ethanol. The labeled target was recovered by centrifugation, dried, dissolved in loading buffer (7 M Urea, 25 mM EDTA), incubated at 95 C for 5 min, and resolved on a 6% polyacrylamide gel and isolated by electroelution.
Single-molecule experiments were performed and analyzed as described (Smith et al., 2019). Fresh cover glasses were prepared for each day of imaging. Cover glasses (Gold Seal 24 Å $ 60 mM, No. 1.5, Cat. #3423), and glass coverslips (Gold Seal 25 Å $ 25 mM, No. 1, Cat. #3307) were cleaned by sonicating for 30 min in NanoStrip (KMG Chemicals, Houston, TX), washed with 10 changes of deionized water, and dried with a stream of nitrogen. Two $2 mm diameter lines of high vacuum grease (Dow Corning, Midland, MI) were applied to the cover glass to create a flow cell. Three layers of self-sticking labeling tape (Fisher, Cat. No. 159015R) were applied outside of the flow cell. The coverslip was placed on top of the cover glass with a $0.5 mm gap between the cover glass and coverslip. To minimize non-specific binding of protein and RNA to the glass surface, microfluidic chambers were incubated with 2 mg•ml À1 poly-L-lysine-graft-PEG-biotin in 10 mM HEPES-KOH, pH 7.5, at room temperature for 30 min and washed extensively with imaging buffer (30 mM HEPES-KOH, pH 7.9, 120 mM potassium acetate, 3.5 mM magnesium acetate, 20% (w/v) glycerol) immediately before use. To immobilize biotinylated RNA targets, streptavidin (0.01 mg•ml À1 , Sigma) was incubated for 5 min in each microfluidic chamber. Unbound streptavidin was washed away with imaging buffer.
Immediately before each experiment, a flow cell was incubated for 2 min with imaging buffer supplemented with 75 mg•ml À1 heparin (Sigma H4784), oxygen scavenging system (Crawford et al., 2008;Aitken et al., 2008) ), and 1 mM 4-nitrobenzyl alcohol [Aldrich N12821]). The chamber was then filled with $100 pM target in imaging buffer supplemented with 75 mg•ml À1 heparin, oxygen scavenging system, and triplet quenchers. Target deposition was monitored by taking a series of images; once the desired density was achieved, the flow cell was washed three times with imaging buffer supplemented with oxygen scavenging system and triplet quenchers. A syringe pump (KD Scientific, Holliston, MA) running in withdrawal mode at 0.15 ml•min À1 was applied to the flow cell outlet to introduce AGO2:miRNA complex (pre-heated to 37 C) supplemented with an oxygen scavenging system and triplet quenchers. Typically, 3,000 frames were collected at 5 frames per s. A digitally-controlled heater (TP-LH, Tokai Hit) maintained objective temperature at 42 C. A custom fabricated heating stage (Smith et al., 2019) was heated to 40 C to achieve a sample temperature of 37 C. The temperature on Imaging was performed on an IX81-ZDC2 zero-drift inverted microscope equipped with a cell^TIRF motorized multicolor TIRF illuminator with 561 and 640 nm 100 mW lasers and a 1003, oil immersion, 1.49 numerical aperture UAPO N TIRF objective with FN=22 (Olympus, Tokyo, Japan). Fluorescence signals were split with a main dichroic mirror (Olympus OSF-LFQUAD) and triple emission filter (Olympus U-CZ491561639M). The primary image was relayed to two ImagEM X2 EM-CCD cameras (C9100-23B, Hamamatsu Photonics, Hamamatsu, Japan) using a Cairn three-way splitter equipped with a longpass dichroic mirror (T635lpxr-UF2, Chroma). Illumination and acquisition parameters were controlled with cell^TIRF and MetaMorph software (Molecular Devices, Sunnyvale, CA), respectively.
Images were recorded as uncompressed TIFF files and merged into stacked TIFF files. Images were processed using the pipeline (Smith et al., 2019) as described in the manual. Co-localization events required that (1) the intensity of AGO2:miRNA complex >150 photons, (2) ratio intensity of the AGO2:miRNA complex to the local background >1, (3) the distance between the target and guide was < 1.2 pixel, and (4) sigma < 4.6. To exclude short, non-specific events, the minimal event duration was set to 2 frames. To overcome short temporary loss of miRISC fluorescent signal due to fluorescent dye blinking, the gap parameter was set to 2 frames. Only the first binding event at each target location was used to estimate arrival time and dwell time, to minimize errors caused by occupation of sites by photobleached molecules. The same analysis was automatically performed on 'dark' locations, i.e., regions that contained no target molecules; these served as a control for non-specific binding of AGO2:miRNA complex to the surface of the cover glass.
RNA bind-n-Seq for de novo site discovery and K D measurements Two libraries of RNA oligonucleotides, each containing a central region of 20 random-sequence positions (Table S2), were synthesized with equal ratio of bases (25:25:25:25) (IDT), 5 0 32 P-radiolabeled, and gel-purified. After phenol-chloroform extraction and ethanol precipitation, RNA was denatured at 90 C for 1 min, annealed to BRTP primer (Table S2) and reverse transcribed using SuperScript III. RNA was degraded by alkaline hydrolysis using 0.4 M sodium hydroxide for 1 h at 55 C, and cDNA was recovered by ethanol precipitation. The sample was then amplified with AccuPrime Pfx DNA Polymerase (Invitrogen). The reactions were run on a 2% agarose gel, amplicons were purified then sequenced using a NextSeq 500 (Illumina) to obtain 75-nt, single-end reads.
Because of the randomness of the central region, sequence composition differed between the two libraries. For example, frequencies of the four nucleotides at each position within the randomized region of RNA molecules varied between the two libraries ( Figure S1A). RNA pool 1 was used in RBNS of AGO2:miR-34b trials 1 and 2, AGO2:let-7a trial1, and let-7a trials 1 and 2; RNA pool 2 was used in RBNS of AGO2:let-7a trial 2, AGO2:let-7a trials 1 and 2 (recovery of bound RNA using M2 FLAG beads), AGO2:miR-449a -blockers trials 1 and 2, AGO2:miR-449a + blockers trials 1 and 2, miR-34b trials 1 and 2, miR-449a trials 1 and 2, and association experiments. DNA blocking oligonucleotides were synthesized (IDT) and annealed to RNA library in 30 mM HEPES-KOH, pH 7.5, 120 mM potassium acetate, 3.5 mM magnesium acetate using a 1:1.2 molar ratio of RNA pool to DNA blockers by first incubating at 95 C for 1 min, then at 65 C for 10 min, and finally cooled to room temperature.
Each experiment included five or six binding reactions. The highest concentration of miRISC used corresponded to 40% (v/v) of the stock solution and equaled 0.8-5 nM (f.c.) active protein. For additional reactions, the stock was serially diluted 3.2-fold in storage buffer. Each experiment also included a mock binding reaction (no-RISC control) using protein storage buffer without miRISC. For each miRNA, we performed an additional binding reaction using protein storage buffer with miRNA guide at the highest miRISC concentration assayed, but lacking AGO2 protein. All binding reactions (20 ml) were performed in 25 mM HEPES-KOH, pH 7.9, 110 mM potassium acetate, 3.5 mM magnesium acetate, 0.01% (w/v) CHAPS, 2 mM DTT, 8% (w/v) glycerol and contained 100 nM (f.c.) RNA library. To reduce non-specific binding, each reaction also included 2.5 mg•ml À1 BSA and 0.5 mg•ml À1 yeast tRNA. Reactions were incubated for 2 h at 37 C and then filtered through a Protran nitrocellulose membrane (Whatman, GE Healthcare Bioscience, Pittsburgh, PA) on top of a Hybond-XL (Cytiva) nylon membrane in a Bio-Dot apparatus (Bio-Rad, Hercules, CA). To reduce retention of free single-stranded RNA, we pre-conditioned nitrocellulose and nylon membranes prior to use as described (Smolarsky and Tal, 1970;Wong and Lohman, 1993). Nitrocellulose filters were pre-soaked in 0.4 M potassium hydroxide for 10 min. Nylon filters were incubated in 0.1 M EDTA, pH 8.2 for 10 min, washed three times in 1 M sodium chloride for 10 min each followed by a quick rinse ($15 s) in 0.5 M sodium hydroxide. Nitrocellulose and nylon filters were then rinsed in water until the pH returned to neutral and equilibrated in wash buffer (20 mM HEPES-KOH, pH 7.9, 100 mM potassium acetate, 3.5 mM magnesium acetate, 1 mM DTT) for at least 1 h at 37 C. After applying the sample under vacuum, membranes were washed with 100 ml wash buffer for 3 s. Membranes were air-dried and signals detected by phosphorimaging to monitor binding. The nitrocellulose membranes containing miRISC-bound RNA were excised and incubated with 1 mg•ml À1 Proteinase K (Thermo Fischer) in 100 mM Tris-HCl, pH 7.5, 10 mM EDTA, 150 mM sodium chloride, 1% (w/v) SDS for 1 h at 45 C shaking at 300 rpm. After phenol-chloroform extraction and ethanol precipitation, RNA was reverse transcribed, amplified, and sequenced using the procedure described above for the RNA pool.
RNA bind-n-Seq using paramagnetic beads Binding reactions were assembled as described above. After incubation for 1 h at 37 C, reactions were transferred to tubes containing 10 ml anti-FLAG M2 paramagnetic beads (Sigma). Prior to use, beads were washed three times with wash buffer Cell Reports Methods 2, 100185, March 28, 2022 e4 Article ll OPEN ACCESS (20 mM HEPES-KOH, pH 7.9, 100 mM potassium acetate, 3.5 mM magnesium acetate, 1 mM DTT) and incubated for 1 h at 37 C with wash buffer containing 2.5 mg•ml À1 BSA and 0.5 mg•ml À1 yeast tRNA. After adding binding reactions to beads, samples were incubated at 37 C for 1 h. Beads were captured in a magnetic stand, the supernatant containing unbound RNA removed, and the beads washed with 100 ml wash buffer for 3 s. miRISC and bound RNA were eluted twice for 45 min at room temperature with 100 ng•ml À1 3XFLAG peptide in wash buffer. RNA libraries were prepared as described above.
RNA bind-n-Seq for kinetic measurements The nitrocellulose and nylon membranes were prepared as described above. Binding reactions were performed in 25 mM HEPES-KOH, pH 7.9, 110 mM potassium acetate, 3.5 mM magnesium acetate, 0.01% (w/v) CHAPS, 2 mM DTT, 8% (w/v) glycerol, and contained 100 nM (f.c.) RNA library (RNA pool 2) and 136 pM (trial 1) or 170 pM (trial 2) AGO2:let-7a. To reduce non-specific binding, each reaction also included 2.5 mg•ml À1 BSA and 0.5 mg•ml À1 yeast tRNA. All components except the RNA pool were combined to generate a master mix, which was aliquoted and placed at 37 C. The RNA pool was added to the first aliquot, and the timer started. After 1 h, the RNA pool was added to the second aliquot. After 1.5 h, the RNA pool was added to the third aliquot. This procedure was repeated until all aliquots received RNA pool at the desired time points. After 2 h from initiating the first binding reaction, all the samples were applied under vacuum to the membranes, which were washed with 100 ml wash buffer. The nitrocellulose membranes containing miRISC-bound RNA were excised, RNA extracted, reverse-transcribed, amplified, and sequenced as described above.

Quantification of active miRISC by double-filter binding assay
To measure concentration of active, binding-competent miRISC, titration data were fit to where KD is the apparent dissociation constant, r is the molar ratio of [RISC] to [RNA], n is the stochiometric equivalence point, f is the fraction bound, fmax is the maximum fraction bound.

Co-localization single-molecule spectroscopy
The individual datasets were saved and combined. The binding rate (k on ) was determined by fitting the cumulative fractions of miRISC arrivals to: fðtÞ = 1 À ð1 À hÞ 3 e Àkon 3 t À h 3 e Àðkon + k NS Þ 3 t and reported per time unit and concentration of introduced miRISC. A dwell time distribution was fitted as fðtÞ = N 3 e Àk off 3 t + A 3 e Àk NS 1 3 t + B 3 e Àk NS 2 3 t : Parameters relative to non-specific association of miRISC with the glass surface (k NS , on-rate for non-specific arrivals; h, fraction of control locations having received non-specific arrivals; k NS1 and k NS2 , off-rate for non-specific binding events and rate of photobleaching; A and B, their respective amplitudes) were determined from the fitting of data for control locations. Values of k on and k off were derived from data collected from >900 individual RNA target molecules. Error was evaluated by 1,000-cycle bootstrapping of 90% of the data.
Quality control of high-throughput sequencing data Only Illumina reads containing TGG (the first nucleotides of the 3 0 adapter) at positions 21-23 were analyzed. Sequences were filtered (Phred quality score R20 for all nucleotides, and ''N'' base calls disallowed), and the 3 0 adapter sequence (5 0 -TGG AAT TCT CGG GTG CCA AGG-3 0 ) removed.

Enrichment values
Occurrences of all 10-nt long motifs (10-mers) were counted in all the reads of each RBNS sample. These counts were then divided by the total count of all 10-mers to give motif frequencies. Enrichment of a motif was computed as the ratio of the motif frequency in the protein-bound samples over the frequency in the RNA pool. Z-score of a motif was computed as Z = RÀR S where R is enrichment of the motif, R is the mean of enrichment values of all 10-mers, and S is the sample standard deviation of enrichment values of all 10-mers. A motif was considered significant if its Z-score was R99.9 percentile.
De novo site discovery Enrichments in the library from the binding reaction with the greatest miRISC concentration were used for the following iterative procedure: (1) enrichment values of all 10-mers were calculated; (2) the hundred most enriched 10-mers were interrogated for (3) the most enriched site type was identified; (4) Z-scores of motifs belonging to the site type were compared to the Z-score threshold; (5) all reads containing the binding site were masked in the miRISC-bound library and the RNA pool so that stepwise enrichments of subsequent 10-mers could be used to eliminate subsequent 'shadow' motifs; (6) all enrichment values were then recalculated on the masked read sets to obtain the resulting most enriched 10-mers. This process continued until the Z-score of the most enriched binding site (calculated from the original enrichment values) was < 99.9 percentile.
To identify a binding site at each iteration, the one-hundred most enriched 10-mers were tested for base-pairing with the guide miRNAs. If perfect complementarity was not observed, the 10-mer was tested for any of the following in this order: (1) complementarity to nine contiguous miRNA positions, allowing a single bulged target nucleotide; (2) complementarity to ten contiguous miRNA positions while allowing for wobble pairing; (3) complementarity to ten contiguous miRNA positions while allowing a non-wobble mismatch. If none of these configurations allowed assigning the motif to a binding site, the procedure was repeated with two 9-mers within the 10-mer, the three 8-mers within the 10-mer, etc., until a configuration of base-pairing was identified.

Read assignments
Each sequencing read in RNA pool and miRISC-bound libraries was interrogated for presence of all binding sites of interest. The entire single-stranded sequence was interrogated: the 20-nt random-sequence region flanked by constant primer-binding sequences in the case when blockers were not used and the 20-nt random-sequence region flanked by 4 nucleotides of constant primer-binding sequence on either side in the case when blockers were annealed to the RNA pool. For analysis of published datasets (McGeary et al., 2019), the entire 87-nt sequenced encompassing the 37-nt random sequence region and constant primer-binding regions was searched. A read was assigned to a site category if it contained one single binding motif. Reads containing multiple instances of binding sites (from the same or a different site category) and reads containing partially overlapping sites were not included in the analysis and represented %1% of libraries. Reads that did not have any of binding motifs of interest were classified as reads with a no-site.
Modeling of RBNS experiments RBNS data was modeled by simulating the equilibrium binding of an RNA Binding Protein (RBP) with an RNA input pool. The stock solution of the RBP was set to 2.1 nM (dataset 1) and 8.1 nM (datasets 2 and 3). Each in silico experiment included five binding reactions. The highest concentration of the RBP used corresponded to 40% (v/v) of the stock solution. For additional reactions, the stock was serially diluted 3.2-fold. Each experiment also included a mock binding reaction (no-RISC control). We considered the RBP binding to an RNA pool (100 nM f.c.) containing four binding sites with affinities equal to 5, 20, 100 and 500 pM (datasets 1 and 2) or 10, 50, 250 and 1,000 pM (dataset 3). The concentration of specific motifs was set to 15 pM. We also included nonspecific binding sites (K D, nosite = 5 nM). We constructed a system of equations relating the concentrations of the free and bound states of RPB and binding sites to the K D values for each binding site and the total concentrations of each species. This system was solved numerically for each input value of RBP in MATLAB using function fsolve. We allowed recovery of 0.1 nM background RNA.
Kinetic RBNS data was modeled by simulating the association of an RBP with an RNA input pool. Concentration of the RBP was set to 150 pM. Each in silico experiment included fourteen time points, ranging from 0 to 7,200 s. We considered the RBP binding to an RNA pool (100 nM f.c.) containing four binding sites with k on 1 =0.1 nM -1 s -1 , k on 2 =0.01 nM -1 s -1 , k on 3 =0.001 nM -1 s -1 , k on 4 =0.0001 nM -1 s -1 , k on no-site =0.0001 nM -1 s -1 and k off 1 =0.0001 s -1 , k off 2 =k off 3 =k off 4 =0.01 s -1 , k off no-site =10 s -1 (dataset 1) and k on 1 =k on 2 =k on 3 =k on 4 =0.1 nM -1 s -1 , k on no-site =0.001 nM -1 s -1 and k off 1 =0.00001 s -1 , k off 2 =0.001 s -1 , k off 3 =0.01 s -1 , k off 4 =0.1 s -1 , k off no-site =10 s -1 (dataset 2). We constructed a system of equations relating the concentrations of the free and bound states of RPB and binding sites to the K D values for each binding site and the total concentrations of each species. This system was solved numerically for each input value of RBP in MATLAB using function solve. We allowed recovery of 0.1 nM background RNA.
Concentrations of each site recovered in equilibrium or kinetic RBNS in silico experiments were converted to number of molecules. The input RNA pool and bound RNA were subsampled with no replacement to yield 20310 6 molecules-our typical sequencing depth.

Maximum likelihood estimation (MLE) of parameters from RBNS data
Let M be a mathematical model that predicts read counts of each binding site type given information about the RNA pool and a set of K D values for each site type. Let D be observed data from RNA sequencing of n binding reactions. Maximum Likelihood Estimation (MLE) is used to estimate K D values, so that D is the most probable given M. We note the likelihood function LðDjMÞ: LðDjMÞ = L À ðReads 1 ; .; Reads j ; .; Reads n Þ À K D 1 ; .; K D i ; .; K Dm ÁÁ (Equation 1) where Reads j is sequencing data from the binding reaction j and K Di is dissociation constant for binding site i. Binding in reaction j is independent from binding in reaction j+1, so the joint probability is a product of individual probabilities: Each probability is a value comprised between 0 and 1. Multiplying small values is prone to numerical underflow and introduces errors because the computer can only store a certain number of digits. Therefore, we describe L in terms of log conditional probabilities: In the binding reaction j, observations from RNA sequencing follow Negative Multinomial Distribution with possible outcomes {site 1 , site 2 , ., site mÀ1 , noiste}. Therefore, the probability mass function is: where pi,j is the exected frequency of a binding site type i in a binding reaction j and is given by the mathematical model:

Cost function for parameter fitting
We define a cost function; f cost ; as f cost = À LðDjMÞ: (Equation 7) A cost function maps values of variables into a real number, intuitively representing some ''cost'' associated with the event. An optimization procedure seeks to maximize the likelihood function, so that the model looks the most like the data. Because we define the cost function as the opposite of the likelihood function, f cost will be minimized upon fitting.
Using product, quotient and power rules of natural logarithm, Equation (8) can be simplified to yield: We note that some terms in the expression of f cost do not depend on parameters from the mathematical model. Therefore, f cost can be written as  (9) is equivalent to minimizing . The BFGS method does not require the gradient of the cost function, as it can be estimated using fine differences. Nevertheless, the analytical gradient can be supplied and enhances the efficiency of the optimization process. Therefore, derivation of the cost function for MLE of K D values and MLE of kinetic parameters is developed below.

Mathematical model for MLE of K D values
We considered the simplified reaction: RISC + RNA!RISC:RNA For a given binding reaction j and a miRNA binding site i, the recovered concentration of site i , xi,j, is a sum of si,j and nsi,j, which correspond to concentrations of miRISC-bound and non-specifically recovered site i , respectively.
Concentrations of miRISC-bound site i , depends on dissociation constant K D of site i: where ½AGO:miRNA free j is concentration of unbound miRISC in the reaction j and ½site i;j free is the concentration of unbound site i in the binding reaction j.
Let ½site i pool be the concentration of a binding site i in the RNA pool used in RBNS. After substituting ½site i;j free by ½site i pool À s i;j , Equation (12) becomes: We describe concentration of non-specifically recovered site i by: where c j is a sample-specific proportionality constant and ½site i;j free is the concentration of unbound site i in the binding reaction j.
We consider background to be the total concentration of all non-specifically recovered RNA, which is assumed to be the same in all binding reactions: Equation (16) predicts concentrations of each binding site type, given information on RNA pool and a set of K D values for each site type. We note that this equation also contains ½AGO:miRNA free j , concentration of unbound miRISC in a reaction j. ½AGO:miRNA free j cannot be calculated explicitly, as it depends on concentration of miRISC bound to all site types. Therefore, ½AGO:miRNA free j is approximated at each iteration of the optimization routine by solving the equation: To find root of Equation (17), minimize_scalar from a Python-based library SciPy is used within the interval (0, ago).

Parameters fitted during MLE of K D values
The parameters to optimize includes K D values of site 1 , ., site m-1 , and no-site, as well as background, the total concentration of all non-specifically recovered RNA, and ago, the stock concentration of active AGO:miRNA complex. Upon optimization, some parameters may receive negative values, which is meaningless for affinities and concentrations. Therefore, we perform exponential transformation and define q 1 , ., q k , ., q m , q m+1 , and q m+2 as )

Derivation of f cost for MLE of K D values
The function f grad (q) returns the derivative of f cost with respect to each q : f grad ðq 1 ; .; q k ; .; q m + 2 Þ = df cost dq 1 ; .; df cost dq k ; .; df cost dq m + 2 (Equation 19) We derive dfcost dqk using the chain rule: First, to derive vfcost vxi;j , substitute f cost by Equation (10): After using natural logarithm properties, the expression is simplified to: Next, to derive dfcost dqk , we note that xi,j contains si,j and background, which depend on q k . Therefore, we use the chain rule to write: To calculate the partial derivative of xi,j with respect to background, we substitute xi,j by Equation (11), and combining it with Equations (14) and (15)  To calculate the partial derivative of xi,j with respect to si,j, we substitute x i,j by Equation (11), and combining it with Equations (14) and (15)  After using the composite function rule and rearranging, the partial derivative of x i,j with respect to si,j becomes: By combining expressions of partial derivatives of x i,j with respect to background and si,j, expression Equation (20.2) becomes: which can be further simplified to yield: To calculate the partial derivative of si,j with respect to K D of site i, we substitute s i,j by Equation (13): We use the composite function rule to write: To calculate the partial derivative of K Di with respect to q k , we note that K Di = e qi for i in [1, m]: To calculate the partial derivative of s i,j with respect to ½AGO:miRNA free j , we substitute s i,j by Equation (13) To calculate the partial derivative of ½AGO:miRNA free j with respect to q k , we note that ½AGO:miRNA free j = DF j 3 ago À P m i = 1 s i;j , where DF j is the dilution factor of miRISC in the binding reaction j. Therefore, we can write:  The gradient of f cost can now be computed and is given by Equation (20):  Initial guess for MLE of K D values and calculation of 95% cIs Bootstrapping of 95% of the data was performed ten times on sequencing reads from each binding reaction and the RNA pool. MLE of K D values was performed on each bootstrapped sample by using 100 different combinations of 10 initial guesses of miRISC concentration (in the range 0.5-25 nM) and 10 initial guesses of K D for RNA with no enriched site (in the range 0.5-10 nM). K D values were initialized as the inverse of the average enrichment values. The background was initialized at 0.1 nM. All the initial guesses were partially randomized by adding a value drawn from a normal distribution with mean 0 and standard deviation 0.1. The cost function was evaluated in the presence of physically meaningful constraints on the parameters: 0.1 pM % K D site % 100 nM, 100 pM % K D no-site % 10,000 nM, 100 pM % ago % 100 nM, and 5 pM % background % 5 nM. Any of the fitted parameters were at the boundaries at the end of the optimization routine. K D estimates, the background, and the stock concentration of miRISC provided by MLE were used to predict counts of each binding site type in sequencing data. These counts were compared with observed sequencing data, and MLE results were retained if Pearson correlation coefficient was >0.90. Results from independent starting points satisfying this criterion were combined. All bootstrapped samples were combined. Finally, estimates from two independent RBNS assays were merged. Median and 95% confidence intervals on medians were reported.
Mathematical model for MLE of k on and k off values We considered the simplified reaction: RISC + RNA ! kon k off RISC:RNA For a binding reaction at time t and miRNA binding site i, the recovered concentration of site i x i , t is a sum of miRISC-bound site i , si,t, and non-specifically recovered site i . Similar to how Equation (11) was obtained above to estimate K D , xi,t is described by where si,t is given by ds i dt = k on i 3 ½site i free 3 ½AGO:miRNA free À k off i 3 s i (Equation 22) As ½site i free = ½site i pool À s i , Equation (22) becomes: ds i dt + s i 3 À k on i 3 ½AGO:miRNA free + k off i Á = k on i 3 ½site i pool 3 ½AGO:miRNA free (Equation 23) meaningless for kinetic rate constants and concentrations. Therefore, we perform exponential transformation and define q 1 , ., q m , ., q 2m , and q 2m+1 as k onk = e qk for k in [1, m] k offk = e qm + k for k in [1, m] background = e q 2m + 1 (Equation 27) Derivation of f cost for MLE of k on and k off values The function f grad q returns the derivative of f cost with respect to each q : vsz;j vqk whose expressions differ from RBNS experiments at equilibrium, and therefore should be calculated. si,t contains k oni , k offi , and ½AGO:miRNA free t that depend on q k , so partial derivatives are required. We use the chain rule to write: To calculate the partial derivative of si,j with respect to k oni , we use the expression of si,j from Equation (24). After applying the quotient rule and some rearrangements, we obtain: vs i;t vk on i = ½site i pool 3 ½AGO:miRNA free t 3 k off i À k on i 3 ½AGO:miRNA free t + k off i Á 2 À ½site i pool 3 ½AGO:miRNA total k on i 3 ½AGO:miRNA total + k off i 3 e Àðkon i 3 ½AGO:miRNA free t + k off i Þ t 3 k off i k on i 3 ½AGO:miRNA total + k off i À ½AGO:miRNA free t 3 t 3 k on i (Equation 30.1) To calculate the partial derivative of k oni with respect to q k , we note that k onk = e qk for k in [1, m]: To calculate the partial derivative of si,j with respect to k offi , we use the expression of si,j from Equation (24). After applying the quotient rule and some rearrangements, we obtain: vs i;t vk off i = À k on i 3 ½site i pool 3 ½AGO:miRNA free t À k on i 3 ½AGO:miRNA free t + k off i Á 2 + k on i 3 ½site i pool 3 ½AGO:miRNA total À k on i 3 ½AGO:miRNA free t + k off i Á 2 3 e Àðkon i 3 ½AGO:miRNA free t + k off i Þ t 3 t + 1 k on i 3 ½AGO:miRNA total + k off i (Equation 30.3) To calculate the partial derivative of k offi with respect to q k , we note that k offk = e qm + k for k in [1, m] To calculate the partial derivative of si,j with respect to ½AGO:miRNA free t , we use the expression of si,j from Equation (24). After applying the quotient rule and some rearrangements, we obtain: vs i;t v½AGO:miRNA free t = k on i 3 ½site i pool 3 k off i À k on i 3 ½AGO:miRNA free t + k off i Á 2 + k on i 3 ½site i pool 3 ½AGO:miRNA total k on i 3 t k on i 3 ½AGO:miRNA total + k off i 3 e Àðkon i 3 ½AGO:miRNA free t + k off i Þ  The gradient of f cost can now be computed. Because of the complexity of the equations, the full solution of dfcost dqk is not shown.