Dynamic interplay between target search and recognition for a Type I CRISPR-Cas system

CRISPR-Cas effector complexes enable the defense against foreign nucleic acids and have recently been exploited as molecular tools for precise genome editing at a target locus. To bind and cleave their target, the CRISPR-Cas effectors have to interrogate the entire genome for the presence of a matching sequence. Here we dissect the target search and recognition process of the Type I CRISPR-Cas complex Cascade by simultaneously monitoring DNA binding and R-loop formation by the complex. We directly quantify the effect of DNA supercoiling on the target recognition probability and demonstrate that Cascade uses facilitated diffusion for its target search. We show that target search and target recognition are tightly linked and that DNA supercoiling and limited 1D diffusion need to be considered when understanding target recognition and target search by CRISPR-Cas enzymes and engineering more efficient and precise variants.


Supplementary Figures and Tables
indicated the binding of doubly labeled St-Cascade or a St-Cascade dimer. These events were considered in our quantitative analysis as they did not show an altered behavior c Occasional disappearance of the fluorescence signal before R-loop collapse was attributed to bleaching of the Cy5-fluorophore. In the analysis, these events were included for determining the delay between binding and R-loop formation but not for dissociation.       The sequential states represent permissive PAMs with one of the PAMs ( ) containing the adjacent target sequence. After random binding to any of the PAMs, Cascade stochastically transits to either of the two next-neighbor PAMs at equal rates . At every PAM, Cascade can dissociate with rate , corresponding to an irreversible transition to State + 2. At the target PAM, , Cascade can additionally recognize the target with rate , corresponding to an irreversible transition to state + 1. b Kinetics of non-productive target search events for which Cascade dissociated without recognizing the target. c Kinetics of productive target search events for which Cascade recognized the target. The model was solved for a lattice length = 60 and different target recognition probabilities per target binding . The fraction of productive events that is asymptotically approached at long times corresponds to the efficiency ℎ of the target recognition process. Source Data are provided as a Source Data file.

Accuracy of transition points in fluorescence trajectories
The  Fig. 3b).

Accuracy of transition points in magnetic tweezers trajectories
The accuracy of transition times in DNA length trajectories was mainly limited by the response time of the DNA tethered magnetic bead to sudden DNA length changes as well as thermal DNA length fluctuations.
To simulate DNA length trajectories with R-loop formation-collapse events, we first simulated trajectories with sudden (R-loop) transitions between two fixed DNA lengths obeying first order kinetics with , = , = 0.05s -1 at a frame rate of 120 Hz. The length difference between the two states was To find the most likely mean dwell time describing the measured dwell time distribution, ln( ) was numerically maximized for the particular set of data points and error parameters and . The estimated standard errors of and were derived from the inverted Hessian matrix of the optimized function, i.e.
the covariance matrix 5 .

Estimating the number of short non-detected binding events of St-Cascade to DNA
The average dwell times for non-productive binding of St-Cascade to DNA were on the order of 150 ms.
Given a frame rate of 20 Hz in the measurements, a fraction of short-lived events remained undetected.
To reliably determine the search efficiency, we used our MLE approach to estimate the number of undetected events. Assuming that only events with a dwell time larger than a cut-off are detected, the dwell time distribution of the detected events corresponds to an exponential distribution that is shifted by to larger times. Convolution with the error distribution would smoothen the sharp transition around and yield reduced event numbers at low times ( Supplementary Fig. 5a), as also seen experimentally ( Fig. 2e, main text). To obtain a nominal dwell time distribution, the shift by can be achieved by setting the mean of the Gaussian error distribution to = . It can be shown that the fraction of non-detected events among all binding events is then given by: where is the nominal mean dwell time. Using and (i. e. ) as free parameters in our MLE approximation of the experimental dwell times allowed us to estimate and thus to correct for the fraction of non-detected non-productive binding events.
To validate this approach, we simulated fluorescence trajectories with short binding events using exponentially distributed dwell times of mean (see above as described above). Detection of the binding events in the simulated trajectories and analysis of the dwell times (see above) provided dwell time distributions with a significantly reduced number of events at low dwell times, which could be well approximated by the expected dwell time distribution with detection cutoff (Supplementary Fig. 5a).
We carried out the simulations and the analysis for a range of different dwell times and verified that the analysis returned, within error, the input dwell times (Supplementary Fig. 5b). Most importantly, from the simulations we could also determine the actual number of events that were missed by the Hidden-Markov  Fig. 5c).

Target search model
Following the binding to a random PAM on the DNA, we modeled the target search by St-Cascade as a one-dimensional random walk with stepping rate from one PAM to the next adjacent PAM in either direction ( Supplementary Fig. 9a). The 1D PAM lattice was limited to lattice points to mimic the limited illuminated DNA length. The boundaries in our model are considered reflective. One of the PAMs (the target PAM, position 20, taken from the actual DNA sequence used in the experiments) was selected to contain the adjacent target. The mean-squared-displacement of such a discrete random walk is provided by: with ∆= 2.7 nm being the mean distance between the PAMs. 〈 2 〉 has to match the observed meansquare displacement, such that the stepping rate could be obtained from the experimentally determined diffusion coefficient as = thus the search process is then described by 6 : where ̿ is the rate matrix containing the rate constants , of all possible transitions between all states , of the system. For the scheme in Figure S9a, the rate matrix has the following form: Eq. S9 is solved by with exp( ̿ ) being the matrix exponential of ̿ and ⃗(0) being the initial probability distribution at = 0. The exponential term can be expressed as where ̿ is the eigenvector matrix of ̿ , and exp(Λ ̿ ) is a diagonal matrix, containing exponential decays in time with the eigenvalues as decay constants: ) . (S13) The entries of the matrix exponential are, according to Eq. S12, sums of the + 2 exponential decays: (S14) Combining Eqns. S11, S12 and S14, provides then for the occupation probability of state the following sum of the +2 exponential decays: (S15) with = , , −1 . For an initial binding of Cascade to a random PAM, the starting probabilities are given by: With this solution and experimentally determined values for and , one can calculate the kinetics of productive ( +1 ( )) and non-productive ( +2 ( )) target recognition events for any set the lattice length and the target recognition efficiency (Supplementary Fig. 9a). We validated the obtained kinetics using stochastic simulations of the random walk process. The fraction of productive events that is asymptotically approached at long times provides the efficiency ℎ of the target recognition process (Supplementary Fig. 9c).
ℎ was increasing with increasing and generally larger than due to multiple target site revisits during the search. From the kinetics, we furthermore calculated the mean event durations, i.e. the dwell times for non-productive and productive search events: (S17) with being either + 1 or + 2, respectively.
Dwell times for R-loop collapse were obtained from the calculated kinetics of non-productive events that started at the target site at position for which the initial occupancies changed to ⃗(0) = (0, … ,0,1,0, … ,0). The number of target site revisits for productive events was obtained from 10,000 stochastic simulations of the target search process following the scheme in Supplementary Fig. 9a. The search process was always started at the target site and the number of revisits was counted for each productive search event.

Probability to finish a 1D random walk once the first step was taken
Due to the reversibility of the R-loop formation process, an R-loop of a CRISPR-Cas effector complex that became nucleated upon PAM binding has, even on a perfectly matching sequence, a certain chance to collapse without full R-loop formation or locking. To provide a theoretical description for the probability at which a matching target becomes recognized upon PAM binding, we model R-loop formation as a one-dimensional random walk process over at positions 0 and and places a single particle into the system.
Once the particle reaches a boundary, the particle is instantaneously placed to the start position 1. To obtain average quantities over many single-particle trajectories, we assume steady state conditions for this system, i.e. probabilities for the particle to be found at (S18) For the forward/backward flux ± between two neighboring positions and + 1 we can write: In steady state, the probability at any position does not change in time such that for a pure linear chain we have: Recursively, it follows that = 0 for all of a linear chain segment not containing branches. For the backward flux − , we then get from 0 = 0 and Eq. S19: For the forward flux, we get from Eq. S19: Thus, with the boundary condition = 0, we can obtain all / + from N-1 to 1 in a recursive manner: The variable Δ = +1 − is the local bias of the energy landscape, i.e. the free energy difference between positions + 1 and . With this, Eq. S22 becomes: In a recursive manner we can obtain all / + as above, for which we obtain the following form final form: (S26) Using this expression, we obtain 1 / + and obtain after transformation: With − = 1 − 1 , we get for the success probability of the random walk:

Formation probability of a matching R-loop of N base pairs under constant bias
To derive an expression for the formation probability of an (unlocked) R-loop of length following PAM binding, we assume a constant bias for the energy landscape for all R-loop states such that +1 − = for 1 ≤ < . The energy difference between position and 1 is given as . Furthermore, we assume position-independent values for the forward and backward stepping rates. Simplifying the exponential terms of the bias energy in Eq. S28 with the rate ratio between the backward and the forward stepping rate yields for the R-loop formation probability: Using the sum formula of the geometric series, one gets

Locked R-loop formation under constant bias
We further consider 'locking' to derive a more realistic target recognition probability that includes the effects of the conformational changes during locking, which are slow compared to the single base pair steps. To this end, we introduce an additional locking step from position to + 1 with the rate constant The target recognition probability then becomes = 1 1 + − ∑ 1 This can be further transformed to: Assuming that the activation barriers for single base pair steps of R-loop formation are centered between subsequent R-loop positions, the bias-dependent forward and the backward stepping rates for R-loop formation can be expressed using Arrhenius-like terms: with being the forward and backward stepping rate in absence of bias. When plotting the target recognition probability as function of the (negative) bias of the energy landscape as applied by the supercoiling, one sees that it strongly increases with the applied bias, similar to the first passage time of R-loop formation ( Supplementary Fig. 11). Reducing the locking rate reduces the target recognition probability only for weak but not for strong biases (Supplementary Fig. 11a). This can be better understood by looking at the probabilities of (unlocked) R-loop formation and locked R-loop formation as function of the R-loop length ( Supplementary Fig. 12). For an unlocked R-loop formation we have high probabilities to form short R-loops due to the limited length of the random walk which drop with increasing R-loop lengths ( Supplementary Fig. 12a). It is important to note that even for a mild downhill bias, the R-loop formation probability saturates at a minimum value already at short R-loop lengths. This means that the decision for formation or collapse of an R-loop is taken at the beginning of the random walk. Once the R-loop has been further expanded, it will for the given bias practically always form but not collapse anymore. This changes somewhat if a considerably slower locking step is added. Locked R-loop formation is particularly disfavored at short lengths, since the slow step represents a high kinetic (activation) barrier that needs to be overcome. The probability then increases with length in presence of a negative bias, reaching the same saturation values as for unlocked R-loops ( Supplementary Fig. 12b).
Due to the negative bias, the unlocked R-loops are already sufficiently stabilized compared to the activation barrier for locking. Locking is then just an additional step after R-loop formation and occurs without an additional collapse. By contrast, at zero or mild bias, the locking barrier still represents an additional hurdle that can promote R-loop collapse such that the locked R-loop formation is less likely than unlocked R-loop formation.

Determination of the apparent locking rate from magnetic tweezers trajectories
In the following we derive a simple formula to calculate the apparent locking rate from the magnetic tweezers trajectory in Fig. 5, main text. When the R-loop is in the full state , R-loop collapse to the intermediate state and locking state are competing: This kinetic scheme can be described by the following set of differential equations: where , and are the probabilities that the R-loop is in the full, intermediate and locked state respectively. At = 0, i.e. when the full state has just formed, we have the following starting conditions: (0) = 1 (0) = (0) = 0 (S37) The first differential equation is a simple first order decay, such that we get with the starting conditions: as the only measurable probability. The probability ( ) rises only to the amplitude: With or being the probabilities that a formed full R-loop collapses or becomes locked, respectively. The collapse occurs with the rate constant = + (S43) If one determines from the fraction of events that successfully get locked and from fitting the probability of collapse for fully formed R-loop ( Supplementary Fig. 13) using Eq. S40, one can calculate and from:

Estimation of the 'real' locking rate
Due to their reversibility, unlocked R-loops do not possess a fixed length but rather sample different lengths dependent on the energy landscape of R-loop formation. The results of our experiments suggested (see main text) that R-loop locking is only triggered if the R-loop extends over its full length.
Given that the unlocked R-loop samples the full length only for a fraction of the time, the measured torque-dependent apparent locking rate would be considerably lower than the real locking rate , i.e., the rate at which the locking transition occurs, once a full R-loop has been formed. The relation between both rates would be given as: where 32 is the probability that the R-loop extends all the way to the last base pair. The probabilities that an R-loop extends over base pairs can be determined from the energy landscape of R-loop formation using the Boltzmann distribution:  Fig. 14b, c). Finally, using the simplified model for , , the measured apparent locking rate data could be globally fitted using and a base-pair independent mismatch penalty as fit parameters (Fig. 5, main text). Agreement between data and model supported the idea that R-loop locking only occurs for a fully extended R-loop and that the torque dependence of the , is mainly due to an increasing confinement of the R-loop to its full length with increasing negative supercoiling.