Single-photon smFRET. III. Application to pulsed illumination

Förster resonance energy transfer (FRET) using pulsed illumination has been pivotal in leveraging lifetime information in FRET analysis. However, major challenges remain in quantitative single-photon, single-molecule FRET (smFRET) data analysis under pulsed illumination, including 1) simultaneously deducing kinetics and the number of system states; 2) providing uncertainties over estimates, particularly uncertainty over the number of system states; 3) taking into account detector noise sources, such as cross talk and the instrument response function, that contribute to uncertainty; and 4) accounting for other experimental noise sources such as background. Here, we specialize the Bayesian nonparametric framework described in the first companion article, which addresses all aforementioned issues in smFRET data analysis, to the case of pulsed illumination. Furthermore, we apply our method to both synthetic data and experimental data acquired using Holliday junctions.


INTRODUCTION
Among the many fluorescence methods available (1-7), single-molecule Förster resonance energy transfer (smFRET) has been useful in probing interactions and conformational changes on nanometer scales (8)(9)(10)(11)(12). This is typically achieved by estimating FRET efficiencies (and system states) at all instants of an smFRET trace and subsequently estimating transition rates. Furthermore, among different FRET modalities, FRET efficiencies are most accurately determined under pulsed illumination (13)(14)(15), where the FRET dyes are illuminated by short laser bursts at known times.
Under this illumination procedure, photon arrival times are recorded with respect to the immediately preceding pulse, thereby facilitating an accurate estimation of fluorescence lifetimes as well as FRET rates. As such, in this article, we will focus on single-photon smFRET analysis under pulsed illumination.
Under pulsed illumination, kinetic parameters are traditionally learned from smFRET data using binned photon methods, which eliminate lifetime information altogether (16)(17)(18); bulk correlative methods (19)(20)(21); and single-photon methods (14,22,23). However, these methods are parametric, i.e., they require fixing the number of system states a priori, and thus learn only system kinetics even though information on the number of system states is encoded in the data.
In this article, we specialize the general smFRET analysis framework presented in Sec. 2.5.1 of the first companion manuscript (24) to the case of pulsed illumination in order to learn full distributions over the unknown parameters. In other words, the probability distributions over parameters account for uncertainty from all existing sources, such as cross talk and background. These parameters include the system transition probabilities and the photophysical rates, that is, the donor and acceptor relaxation rates and the FRET rates, with special attention paid to uncertainty arising from the inherent stochasticity of photon arrival times and from detectors. As our main concern is deducing the number of system states from single-photon arrivals while incorporating detector effects, we leverage the formalism of infinite hidden Markov models (iHMMs) (25)(26)(27)(28)(29)(30) within the Bayesian nonparametric (BNP) paradigm (25,26,(31)(32)(33)(34)(35)(36)(37)(38)). The iHMM framework assumes an a priori infinite number of system states with associated transition probabilities; the number of system states warranted by the input data is then given by the states most visited over the course of the system state trajectory.
Next, to benchmark our BNP-FRET sampler, we analyze synthetic smFRET data as well as experimental smFRET data acquired using a single confocal microscope with pulsed illumination optimized to excite donor dyes.
In particular, we employ a broad range of experimental data acquired from Holliday junctions (HJs) exhibiting an array of kinetic rates due to varying MgCl₂ concentrations in the buffer (39)(40)(41)(42).

Terminology convention
To be consistent throughout our three-part article, we precisely define some terms as follows.

1. A macromolecular complex under study is always referred to as a system.
2. The configurations through which a system transitions are termed system states, typically labeled using $s$.
3. FRET dyes undergo quantum mechanical transitions between photophysical states, typically labeled using $j$.
4. A system-FRET combination is always referred to as a composite.
5. A composite undergoes transitions among its superstates, typically labeled using $\phi$.
6. All transition rates are typically labeled using $\lambda$.
7. The symbol $N$ is generally used to represent the total number of discretized time windows, typically labeled with $n$.
8. The symbol $w_n$ is generally used to represent the observations in the $n$-th time window.

Forward model and inverse strategy
In this section, we first briefly illustrate the adaptation of the general formalism described in our first companion article (24) to the pulsed illumination case. Next, we present a specialized inference procedure. The details of the framework not provided herein can be found in the supporting material. As before, we consider a molecular complex labeled with a donor-acceptor FRET pair. As the molecular complex transitions through its $M_s$ system states indexed by $s_{1:M_s}$, laser pulses (optimized to excite the donor) separated by time $\tau$ may excite either the donor or acceptor to drive transitions among the photophysical states, $j_{1:M_j}$, as defined in the first companion article (24). Such photophysical transitions lead to photon emissions that may be detected in either donor or acceptor channels. The set of $N$ observations, e.g., photon arrival times, from $N$ pulses are recorded as

$$w = \{w_1, w_2, \ldots, w_N\}. \tag{1}$$

Here, each individual measurement is a pair $w_n = (\mu^d_n, \mu^a_n)$, where $\mu^d_n$ and $\mu^a_n$ are the recorded arrival times (also known as microtimes) after the $n$-th pulse in the donor and acceptor channels, respectively. In cases where there is no photon detection, we denote the absent microtimes with $\mu^d_n = \varnothing$ and $\mu^a_n = \varnothing$ for the donor and acceptor channels, respectively.
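As a minimal sketch of Eq. 1, the snippet below groups per-photon records into one observation pair per pulse, using `None` for the empty symbol $\varnothing$. The record layout `(pulse_index, channel, microtime_ns)` and the function name `build_observations` are hypothetical illustrations, not the authors' code or a vendor file format. The sketch also assumes at most one recorded photon per channel per pulse, consistent with $w_n$ being a single pair of microtimes.

```python
# Hypothetical helper: turn per-photon records into per-pulse observations
# w_n = (mu_d, mu_a), with None standing in for the empty symbol.
def build_observations(records, n_pulses):
    """records: iterable of (pulse_index, channel, microtime_ns) tuples with
    channel "d" (donor) or "a" (acceptor). Returns a list of n_pulses pairs."""
    obs = [[None, None] for _ in range(n_pulses)]
    for pulse, channel, microtime in records:
        if channel not in ("d", "a"):
            raise ValueError(f"unknown channel {channel!r}")
        slot = 0 if channel == "d" else 1
        # Assumption: at most one photon per channel per pulse.
        if obs[pulse][slot] is not None:
            raise ValueError(f"two photons in channel {channel!r} after pulse {pulse}")
        obs[pulse][slot] = microtime
    return [tuple(w) for w in obs]


w = build_observations([(0, "d", 2.1), (2, "a", 3.4), (2, "d", 1.0)], n_pulses=4)
print(w)  # [(2.1, None), (None, None), (1.0, 3.4), (None, None)]
```

In real traces most pulses are empty, which is why the pulsed-illumination likelihood treats empty pulses specially.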
As is clear from Fig. 1, smFRET traces are inherently stochastic due to the nature of photon excitation, emission, and noise introduced by detector electronics. To analyze such stochastic systems, we begin with the most generic likelihood derived in Eq. 51 of the first companion article (24),

$$p(w \mid \rho_{\mathrm{start}}, \mathbf{Q}_{1:N}) = \rho_{\mathrm{start}}\, \mathbf{Q}_1 \mathbf{Q}_2 \cdots \mathbf{Q}_N\, \rho_{\mathrm{norm}}^{T}, \tag{2}$$

where $\rho_{\mathrm{start}}$ is the initial probability vector for the system-FRET composite to be in one of $M_\phi = M_s \times M_j$ superstates, and $\rho_{\mathrm{norm}}$ is a vector that sums the elements of the propagated probability vector. Here, we recall that $\mathbf{Q}_n$ is the transition probability matrix between pulses at $t_n$ and $t_{n+1}$, characterizing system-FRET composite transitions among superstates. The propagators $\mathbf{Q}_n$ above adopt different forms depending on whether or not a photon is detected during the associated period. Their most general forms are derived in the section on illumination features in the first companion article (24). However, these propagators involve computationally expensive integrals, and we therefore make two approximations: 1) we assume that the system state remains the same over an interpulse period, since typical system kinetic timescales (1 ms or more) are much longer than interpulse periods (≈100 ns) (41,43); and 2) the interpulse period (≈100 ns) is longer than the donor and acceptor lifetimes (a few ns) (41,43), such that both dyes relax to the ground state before the next pulse. Furthermore, we will demonstrate a specialized sampling scheme under these physically motivated approximations.
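A quick numerical check of the two approximations, using illustrative values assumed for this example rather than fitted results (a system escape rate of 40 s⁻¹, as in the slow synthetic case below; an interpulse period $\tau = 100$ ns; and a donor lifetime of 4 ns, i.e., $\lambda_d = 0.25$ ns⁻¹):

$$
\begin{aligned}
\Pr(\text{system transition within one interpulse period}) &= 1 - e^{-\lambda_{\mathrm{esc}}\tau} \approx \lambda_{\mathrm{esc}}\tau = (40\ \mathrm{s}^{-1})(10^{-7}\ \mathrm{s}) = 4\times 10^{-6},\\
\Pr(\text{donor still excited at the next pulse}) &\le e^{-\lambda_d \tau} = e^{-(0.25\ \mathrm{ns}^{-1})(100\ \mathrm{ns})} = e^{-25} \approx 1.4\times 10^{-11}.
\end{aligned}
$$

The second line is an upper bound because FRET only shortens the donor's excited-state lifetime. An acceptor lifetime of a few nanoseconds gives a similarly negligible value.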
The immediate implications of the first assumption are that system transitions may now, to a good approximation, only occur at the beginning of each pulse. Consequently, the evolution of the FRET pair between two consecutive pulses is now exclusively photophysical, as the system state remains the same during interpulse times. As such, the system now evolves in equally spaced discrete time steps of size $\tau$, where the system state trajectory can be written as

$$s_{1:N} = \{s_1, s_2, \ldots, s_n, \ldots, s_{N-1}, s_N\},$$

where $s_n$ is the system state between pulses $n$ and $n+1$. The stochastic evolution of the system states in such discrete steps is then determined by the transition probability matrix $\mathbf{P}_s$. For example, in the simplest case of a molecular complex with two system states $s_{1:2}$, this matrix is computed as follows:

$$\mathbf{P}_s = \exp\left(\tau \begin{bmatrix} * & \lambda_{s_1 \to s_2} \\ \lambda_{s_2 \to s_1} & * \end{bmatrix}\right), \tag{3}$$

where the matrix in the exponential contains transition rates among the system states and $*$ represents the negative row sum. Next, by the second assumption, we can further suppose that the fluorophores always start in the ground state at the beginning of every pulse. As a result, we treat pulses independently and write the probability of observation $w_n$ as

$$p(w_n \mid s_n, \mathbf{G}_j) = \rho_{\mathrm{ground}}\, \mathbf{Q}^j_n(s_n)\, \rho_{\mathrm{norm}}^{T}, \tag{4}$$

where $\rho_{\mathrm{ground}}$ denotes the probability vector when the FRET pair is in the ground state at the beginning of each pulse, $\mathbf{G}_j$ is the generator matrix with only photophysical transition rates, and $\mathbf{Q}^j_n(s_n)$ is the photophysical propagator for the $n$-th interpulse period.
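Eq. 3 can be evaluated numerically with a matrix exponential. The sketch below uses SciPy's `scipy.linalg.expm` together with a hypothetical helper, `system_propagator`. It fills the diagonal with the negative row sums (the $*$ entries) and returns $\mathbf{P}_s$. With rates of 40 s⁻¹ and $\tau = 100$ ns, the off-diagonal entries come out near $\lambda\tau = 4\times10^{-6}$, which illustrates why system transitions within a single interpulse period can be neglected.

```python
import numpy as np
from scipy.linalg import expm

def system_propagator(rates, tau):
    """Return P_s = exp(tau * K) for a matrix of system transition rates.

    rates[i, j] (i != j) is the rate (s^-1) from system state i to j; the
    diagonal of `rates` is ignored and replaced by the negative row sum
    (the '*' entries of the generator).
    """
    K = np.array(rates, dtype=float)
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=1))
    return expm(tau * K)

# Two system states switching at 40 s^-1 each, interpulse period 100 ns.
P_s = system_propagator([[0.0, 40.0], [40.0, 0.0]], tau=100e-9)
print(P_s)
```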
We further organize the observation probabilities of Eq. 4 into a newly defined detection matrix $\mathbf{D}_n$ with elements $(\mathbf{D}_n)_{s_i \to s_k} = p(w_n \mid s_n = s_i, \mathbf{G}_j)$. Here, we note that the index $k$ does not appear on the right-hand side because the system state does not change during an interpulse window, which makes the observation probability independent of the next system state, $s_{n+1}$. The explicit formulas for the observation probabilities are provided in the supporting material. Now, using the matrix $\mathbf{D}_n$, we define the reduced propagators for each interpulse period as

$$\widetilde{\mathbf{Q}}_n = \mathbf{P}_s \odot \mathbf{D}_n, \tag{5}$$

where $\odot$ denotes the element-by-element product. Finally, using these simplified propagators, we can write the likelihood for an smFRET trace under pulsed illumination as

$$p(w \mid \rho_{\mathrm{start}}, \mathbf{P}_s, \mathbf{G}_j) = \rho_{\mathrm{start}}\, \widetilde{\mathbf{Q}}_1 \widetilde{\mathbf{Q}}_2 \cdots \widetilde{\mathbf{Q}}_N\, \rho_{\mathrm{norm}}^{T}, \tag{6}$$

as also introduced in the section on illumination features in the first companion article (24). This form of the likelihood is advantageous in that it allows empty pulses to be computed as a simple product, greatly reducing computational cost.

FIGURE 1 Events over a pulsed illumination experiment pulse window. Here, the beginning of the $n$-th interpulse window of size $\tau$ is marked by time $t_n$. The FRET labels, originally in state GG (donor and acceptor, respectively, in ground states), are excited by a high-intensity burst (shown in green) to the state EG (only donor excited) for a very short time, $\delta_{\mathrm{pulse}}$. If FRET occurs, the donor transfers its energy to the acceptor and returns to the ground state, leaving the FRET labels in the GE state (only acceptor excited). The acceptor then emits a photon to be registered by the detector at microtime $\mu_n$. When using ideal detectors, the microtime is the same as the photon emission time, as shown in (a). However, when the timing hardware has jitter (shown in red), a small delay $\varepsilon_n$ is added to the microtime, as shown in (b). For convenience, we have reproduced this figure from our first companion article (24).
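To make Eqs. 5 and 6 concrete, the sketch below evaluates the likelihood with the reduced propagators $\mathbf{P}_s \odot \mathbf{D}_n$. It checks the result against an explicit sum over all system state trajectories for a tiny trace. The per-pulse probabilities $p(w_n \mid s_n, \mathbf{G}_j)$ are taken as given inputs (`obs_probs`, a hypothetical name), because computing them from Eq. 4 requires the photophysical propagators described in the supporting material. Every column of $\mathbf{D}_n$ equals the same vector, so the element-wise product simply rescales row $i$ of $\mathbf{P}_s$ by $p(w_n \mid s_n = s_i, \mathbf{G}_j)$.

```python
import itertools
import numpy as np

def likelihood(rho_start, P_s, obs_probs):
    """Eq. 6: rho_start (P_s ⊙ D_1) ... (P_s ⊙ D_N) rho_norm^T.

    obs_probs[n, i] = p(w_n | s_n = i, G_j). Multiplying P_s by a column
    vector rescales row i, which equals the element-wise product with D_n.
    """
    v = np.asarray(rho_start, dtype=float)
    for p_w in np.asarray(obs_probs, dtype=float):
        v = v @ (P_s * p_w[:, None])  # reduced propagator of Eq. 5
    return v.sum()  # multiplying by rho_norm^T (a vector of ones)

def likelihood_brute_force(rho_start, P_s, obs_probs):
    """Explicit sum over every trajectory s_{1:N}; feasible only for tiny N."""
    obs_probs = np.asarray(obs_probs, dtype=float)
    N, M = obs_probs.shape
    total = 0.0
    for traj in itertools.product(range(M), repeat=N):
        p = rho_start[traj[0]] * obs_probs[0, traj[0]]
        for n in range(1, N):
            p *= P_s[traj[n - 1], traj[n]] * obs_probs[n, traj[n]]
        total += p
    return total

rng = np.random.default_rng(0)
rho_start = np.array([0.6, 0.4])
P_s = np.array([[0.9, 0.1], [0.2, 0.8]])
obs_probs = rng.uniform(0.1, 1.0, size=(5, 2))
print(likelihood(rho_start, P_s, obs_probs), likelihood_brute_force(rho_start, P_s, obs_probs))
```

For traces with millions of pulses, a practical implementation would rescale `v` after every step and accumulate the logarithms of the scale factors to avoid numerical underflow.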
In the following, we first illustrate a parametric inference procedure assuming a given number of system states. We next generalize the procedure developed to the nonparametric case to deduce the number of system states along with the rest of parameters.

Inference procedure: Parametric sampler
With the likelihood at hand, we construct the posterior as follows:

$$p(\rho_{\mathrm{start}}, \mathbf{P}_s, \mathbf{G}_j \mid w) \propto p(w \mid \rho_{\mathrm{start}}, \mathbf{P}_s, \mathbf{G}_j)\, p(\rho_{\mathrm{start}})\, p(\mathbf{G}_j)\, p(\mathbf{P}_s), \tag{7}$$

where we assume that the unknown parameters, including the initial probability vector $\rho_{\mathrm{start}}$, the photophysical transition rates in the generator matrix $\mathbf{G}_j$, and the transition probabilities among system states in the propagator $\mathbf{P}_s$, are independent. This allows us to conveniently write the prior on these parameters as a product (the last three terms on the right-hand side).
Here, we can sample the set of unknowns using the above posterior with the Gibbs sampling procedure described in the first companion article (see the section describing inverse strategy in (24)). However, a computationally more convenient inference procedure that allows direct sampling is accomplished by writing the posterior of Eq. 7 as a marginalization (sum) over state trajectories as follows:

$$p(\rho_{\mathrm{start}}, \mathbf{P}_s, \mathbf{G}_j \mid w) = \sum_{s_{1:N}} p(\rho_{\mathrm{start}}, \mathbf{P}_s, \mathbf{G}_j, s_{1:N} \mid w), \tag{8}$$

where $s_{1:N} = \{s_1, s_2, \ldots, s_N\}$ denotes a system state trajectory. Now, we can use the nonmarginal posterior

$$p(\rho_{\mathrm{start}}, \mathbf{P}_s, \mathbf{G}_j, s_{1:N} \mid w) \propto p(w \mid \mathbf{P}_s, \mathbf{G}_j, s_{1:N})\, p(\rho_{\mathrm{start}})\, p(\mathbf{G}_j)\, p(\mathbf{P}_s)\, p(s_{1:N} \mid \rho_{\mathrm{start}}, \mathbf{P}_s) \tag{9}$$

to sample the trajectory $s_{1:N}$, which, in turn, allows direct sampling of the elements of the propagator $\mathbf{P}_s$, as described shortly. For priors on $\rho_{\mathrm{start}}$ and the rates in $\mathbf{G}_j$, we use Dirichlet and Gamma distributions, respectively, similar to Eqs. 65 and 66 of the first companion article (24).

FIGURE 2 Analysis on synthetic data for a system with two system states. In (a), we show a section of synthetic data produced with the values in Table S2. The system state trajectory is shown in blue. Below this, the arrival times of donor and acceptor photons, $\mu^d_n$ and $\mu^a_n$, are shown in green and red, respectively. In (b), we plot the bivariate distribution over escape rates $\lambda_{\mathrm{esc}}$ and FRET efficiencies $\varepsilon_{\mathrm{FRET}}$. The ground truth is shown with red dots corresponding to an escape rate of 40 s⁻¹ and FRET efficiencies of 0.22 and 0.59. As seen, the BNP-FRET sampler clearly distinguishes two system states, with maximum a posteriori (MAP) estimates for the associated escape rates of $\approx 38^{+7}_{-7}$ and $\approx 40^{+7}_{-7}$ s⁻¹ and for the FRET efficiencies of $\approx 0.21^{+0.03}_{-0.03}$ and $\approx 0.59^{+0.03}_{-0.03}$. We have smoothed the distributions using kernel density estimation for illustration purposes only.
We sample the system state trajectory $s_{1:N}$ by recursively sampling the states using a forward filtering backward sampling algorithm described in Section S4.3. Finally, for each row in the propagator $\mathbf{P}_s$, we use a Dirichlet prior

$$\boldsymbol{\pi}_m \sim \operatorname{Dirichlet}(\alpha \boldsymbol{\beta}), \quad m = 1, \ldots, M_s, \tag{10}$$

where $M_s$ is the number of system states and $\boldsymbol{\pi}_m$ denotes the $m$-th row of the propagator. Here, the hyperparameters $\alpha$ and $\boldsymbol{\beta}$ are, respectively, the concentration parameter and a vector of length $M_s$ described in the first companion article (see Section 3.2.2 of (24)).
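A minimal forward filtering backward sampling (FFBS) sketch for drawing $s_{1:N}$ follows. It takes the per-pulse observation probabilities `obs_probs` as input. This is a generic textbook FFBS for a discrete hidden Markov model and not necessarily identical to the implementation described in Section S4.3. The forward pass normalizes the filtered distribution at every pulse. The backward pass then samples each $s_n$ given the already-sampled $s_{n+1}$.

```python
import numpy as np

def ffbs(rho_start, P_s, obs_probs, rng):
    """Sample s_{1:N} from p(s_{1:N} | w, P_s, G_j) for a discrete HMM.

    obs_probs[n, i] = p(w_n | s_n = i, G_j).
    """
    obs_probs = np.asarray(obs_probs, dtype=float)
    N, M = obs_probs.shape
    filt = np.empty((N, M))  # filt[n] = p(s_n | w_1, ..., w_n)
    a = np.asarray(rho_start, dtype=float) * obs_probs[0]
    filt[0] = a / a.sum()
    for n in range(1, N):
        # Predict one pulse ahead with P_s, then weight by the n-th observation.
        a = (filt[n - 1] @ P_s) * obs_probs[n]
        filt[n] = a / a.sum()
    s = np.empty(N, dtype=int)
    s[-1] = rng.choice(M, p=filt[-1])
    for n in range(N - 2, -1, -1):
        # p(s_n | w_1..w_n, s_{n+1}) ∝ p(s_n | w_1..w_n) * p(s_{n+1} | s_n)
        b = filt[n] * P_s[:, s[n + 1]]
        s[n] = rng.choice(M, p=b / b.sum())
    return s

rng = np.random.default_rng(1)
P_s = np.array([[0.9, 0.1], [0.1, 0.9]])
# Strongly informative observations: three pulses favor state 0, three favor state 1.
obs_probs = np.array([[1.0, 1e-12]] * 3 + [[1e-12, 1.0]] * 3)
traj = ffbs([0.5, 0.5], P_s, obs_probs, rng)
print(traj)
```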
We can now directly generate samples for the transition probability vectors $\boldsymbol{\pi}_m$ of length $M_s$ via prior-likelihood conjugacy as (see Section S4.3)

$$\boldsymbol{\pi}_m \mid s_{1:N} \sim \operatorname{Dirichlet}(\alpha \boldsymbol{\beta} + \mathbf{n}_m), \tag{11}$$

where the vector $\mathbf{n}_m$ collects the number of times each transition out of system state $s_m$ occurs in the sampled system state trajectory. After constructing the posterior, we can make inferences on the parameters by drawing samples from it. However, because the full posterior has a nonanalytical form, it cannot be sampled directly in its entirety. Therefore, we develop a Markov chain Monte Carlo sampling procedure (37,38,(44)(45)(46)(47)) to draw samples from the posterior.
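The conjugate update of Eq. 11 amounts to counting transitions in the sampled trajectory and adding those counts to the prior concentration. The sketch below shows this using hypothetical helper names, `count_transitions` and `sample_transition_matrix`:

```python
import numpy as np

def count_transitions(traj, M):
    """counts[m, k] = number of pulses n with s_n = m and s_{n+1} = k."""
    traj = np.asarray(traj)
    counts = np.zeros((M, M), dtype=int)
    np.add.at(counts, (traj[:-1], traj[1:]), 1)
    return counts

def sample_transition_matrix(traj, alpha, beta, rng):
    """Draw each row pi_m ~ Dirichlet(alpha * beta + n_m), i.e., Eq. 11."""
    beta = np.asarray(beta, dtype=float)
    counts = count_transitions(traj, len(beta))
    return np.vstack([rng.dirichlet(alpha * beta + n_m) for n_m in counts])

rng = np.random.default_rng(2)
traj = [0, 0, 1, 1, 0]
print(count_transitions(traj, 2))
P_s = sample_transition_matrix(traj, alpha=1.0, beta=[0.5, 0.5], rng=rng)
```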
Our Markov chain Monte Carlo sampling scheme follows a Gibbs sampling technique, sweeping through updates of the set of parameters in the following order: 1) photophysical transition rates, including the donor relaxation rate $\lambda_d$ (inverse of the donor lifetime), the acceptor relaxation rate $\lambda_a$ (inverse of the acceptor lifetime), the FRET rates $\lambda_{\mathrm{FRET}}$ for each system state $s_{1:M_s}$, and the excitation rate (inverse of the excitation probability $p_{\mathrm{ex}}$), using Metropolis-Hastings (MH) steps; 2) transition probabilities between system states, $\boldsymbol{\pi}_{1:M_s}$, by directly drawing samples from the posterior; 3) the system state trajectory, $s_{1:N}$, using a forward-backward sampling procedure (48); and 4) the initial probabilities, $\rho_{\mathrm{start}}$, by taking direct samples. In the end, the chains of samples drawn can be used for subsequent numerical analysis.
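Step 1 of the sweep uses Metropolis-Hastings updates for the positive photophysical rates. The sketch below shows a single such update using a multiplicative (log-space) random walk with a Gamma prior. This proposal is one common choice assumed here, not taken from the text. `log_likelihood` is a hypothetical user-supplied function returning the log-likelihood as a function of the rate being updated, with all other parameters held fixed. The term `log(proposal / rate)` is the Hastings correction for the asymmetric log-normal proposal.

```python
import numpy as np
from scipy.stats import gamma

def mh_update_rate(rate, log_likelihood, shape, scale, rng, step=0.5):
    """One MH update of a positive rate with a Gamma(shape, scale) prior.

    Proposal: rate' = rate * exp(step * z), z ~ N(0, 1).
    """
    proposal = rate * np.exp(step * rng.standard_normal())
    log_accept = (
        log_likelihood(proposal) - log_likelihood(rate)
        + gamma.logpdf(proposal, shape, scale=scale)
        - gamma.logpdf(rate, shape, scale=scale)
        + np.log(proposal / rate)  # Hastings correction for the log-normal proposal
    )
    if np.log(rng.uniform()) < log_accept:
        return proposal
    return rate

# Sanity check: with a flat likelihood, the chain should sample the Gamma prior.
rng = np.random.default_rng(3)
rate, samples = 1.0, []
for _ in range(10_000):
    rate = mh_update_rate(rate, lambda r: 0.0, shape=2.0, scale=1.0, rng=rng)
    samples.append(rate)
```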

Inference procedure: Nonparametric sampler
The smFRET data analysis method illustrated above assumes a given number of system states, $M_s$. However, in many applications, the number of system states is not specified a priori. Here, we describe a generalization of our parametric method that addresses this shortcoming by estimating the number of system states simultaneously with the other unknown parameters.
We accomplish this by modifying our previously introduced parametric posterior as follows. First, we suppose an infinite number of system states ($M_s \to \infty$) in the likelihood introduced previously and learn the transition matrix $\mathbf{P}_s$. The number of system states can then be interpreted as the number of states appreciably visited over the course of the trajectory.
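The sketch below illustrates how the number of system states can be read off a sampled trajectory. It counts the states that are occupied for at least a minimum fraction of pulses. The 1% threshold and the function name are arbitrary choices for this example, not values from the text. In practice, one would examine this count across many posterior samples.

```python
import numpy as np

def effective_num_states(traj, min_fraction=0.01):
    """Count system states occupied for at least `min_fraction` of the pulses.

    The default 1% threshold is an illustrative assumption.
    """
    occupancy = np.bincount(np.asarray(traj)) / len(traj)
    return int(np.sum(occupancy >= min_fraction))

# States 0 and 3 dominate; state 7 is visited only briefly (0.5% of pulses).
traj = [0] * 500 + [3] * 495 + [7] * 5
print(effective_num_states(traj))
```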
To incorporate this infinite system state space into our inference strategy, we leverage the iHMMs (25,26,(28)(29)(30)) from the BNP repertoire, placing a hierarchical Dirichlet process prior over the infinite set of system states as described in the first companion article (the inverse strategy section in (24)). However, as detailed in the first companion manuscript (the inverse strategy section in (24)), dealing with an infinite number of random variables, though feasible, is not computationally efficient. We therefore approximate this infinite value with a large number, $M_s^{\max}$, reducing our hierarchical Dirichlet process prior to

$$\boldsymbol{\beta} \sim \operatorname{Dirichlet}\left(\frac{\gamma}{M_s^{\max}}, \ldots, \frac{\gamma}{M_s^{\max}}\right), \qquad \boldsymbol{\pi}_m \sim \operatorname{Dirichlet}(\alpha \boldsymbol{\beta}), \quad m = 1, \ldots, M_s^{\max}. \tag{12}$$

Here, $\boldsymbol{\beta}$ denotes the base probability vector of length $M_s^{\max}$, itself serving as a prior on the probability transition matrix $\mathbf{P}_s$, and $\boldsymbol{\pi}_m$ is the $m$-th row of $\mathbf{P}_s$. Moreover, $\gamma$ is a positive scalar hyperparameter of the Dirichlet process prior, often chosen to be one. As such, we ascribe identical weights across the state space a priori for computational convenience (28,29,49). Now, equipped with the nonparametric posterior, we proceed to simultaneously make inferences on transition probabilities, excited-state escape rates, and the remaining parameters. To do so, we employ the Gibbs sampling scheme detailed in the inverse strategy section in the first companion article (24), except that we must now also sample the system state trajectory $s_{1:N}$. More details on the overall sampling scheme are found in Section S4 of the supporting material.
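Eq. 12 can be sampled directly. The sketch below draws a base vector $\boldsymbol{\beta}$ and one transition matrix $\mathbf{P}_s$ from this finite approximation. The function name is hypothetical. The floor on the concentrations is a numerical guard, an assumption of this sketch, against Dirichlet components that underflow to exactly zero when $\gamma/M_s^{\max}$ is small. Draws of $\boldsymbol{\beta}$ from this prior typically concentrate most of their mass on a few states, which leaves the data to decide how many states are appreciably visited.

```python
import numpy as np

def sample_weak_limit_prior(M_max, alpha, gamma, rng):
    """Draw beta ~ Dir(gamma/M_max, ..., gamma/M_max) and rows pi_m ~ Dir(alpha * beta)."""
    beta = rng.dirichlet(np.full(M_max, gamma / M_max))
    # Guard: Dirichlet concentrations must be strictly positive.
    conc = np.maximum(alpha * beta, np.finfo(float).tiny)
    P_s = np.vstack([rng.dirichlet(conc) for _ in range(M_max)])
    return beta, P_s

rng = np.random.default_rng(4)
beta, P_s = sample_weak_limit_prior(M_max=10, alpha=1.0, gamma=1.0, rng=rng)
print(np.round(beta, 3))
```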

RESULTS
The main objective of our method is to learn full distributions over 1) the transition probabilities among $M_s^{\max}$ system states, which in turn determine the corresponding system transition rates and the effective number of system states; and 2) the photophysical transition rates, including the FRET rates $\lambda_{\mathrm{FRET}}$ for each system state $s_{1:M_s}$ and the fluorophores' relaxation rates (inverse of lifetimes) $\lambda_a$ and $\lambda_d$.
To sample from distributions over these parameters, the BNP-FRET sampler requires input data comprising photon arrival time traces from both donor and acceptor channels. It also requires a set of precalibrated input parameters, including detector effects such as the cross-talk matrix and detection efficiency (see Sec. 2.4 and example V of the first companion article (24)); background emission (see the section on background in the first companion article and Section S2.4); and the instrument response function (IRF) (see the illumination features section in the first companion article (24) and Section S2.3).
Here, we first show that our method samples posteriors over a set of parameters employing realistic synthetic data generated using the Gillespie algorithm (50) to simulate system and photophysical transitions while incorporating detector artifacts such as cross talk (see the synthetic data generation section in the first companion article (24)). The parameters used in data generation for all figures are listed in Section S6, and the prior hyperparameters used in the analysis of synthetic and experimental data are listed in Section S3.
We first show that our method works for the simplest case of slow transitions compared with the interpulse period (25 ns) with two system states using synthetic data (see Fig. 2). Next, we proceed to tackle more challenging synthetic data with three system states and higher transition rates (Fig. 3). We show that our nonparametric algorithm correctly infers system transition probabilities and thus the number of system states (see Fig. 3).
After demonstrating the performance of our method using synthetic data, we use experimental data to investigate the kinetics of HJs under different MgCl₂ concentrations in the buffer (see Fig. 4).

Simulated data analysis
To help validate BNPs on smFRET single-photon data, we start with a simple case of a two-state system and select kinetics similar to those of the experimental data sets, cf. the HJ in 10 mM MgCl₂, with escape rates of 40 s⁻¹ for both system states (51). The generated system state trajectory and photon traces over a period of 500 ms from both channels are shown in Fig. 2 a. Fig. 2 b shows the bivariate posterior distribution over the FRET efficiencies, $\varepsilon_{\mathrm{FRET}}$, defined as $\varepsilon_{\mathrm{FRET}} = \lambda_{\mathrm{FRET}}/(\lambda_{\mathrm{FRET}} + \lambda_d)$, and the system escape rates, obtained by computing the logarithm of the propagator matrix $\mathbf{P}_s$. The distribution has two peaks, corresponding to the two system states most visited by the sampler. Furthermore, the ground truths, designated by red dots, fall within the posterior with a relative error of less than 3% from the posterior modes. The results for the remaining parameters, including donor and acceptor transition rates, FRET transition rates, and system transition probabilities, are presented in Section S7.

FIGURE 3 Analysis on synthetic data for three system states. In (a), we show a section of synthetic data produced with the values from Table S3. The system state trajectory is shown in blue. Below this, the arrival times of donor and acceptor photons, $\mu^d_n$ and $\mu^a_n$, are shown in green and red, respectively. In (b), we plot the distribution over escape rates $\lambda_{\mathrm{esc}}$ and FRET efficiencies $\varepsilon_{\mathrm{FRET}}$. The red dots show ground truths corresponding to escape rates of 1,200, 2,400, and 1,200 s⁻¹ and FRET efficiencies of 0.22, 0.53, and 0.7. From our maximum a posteriori (MAP) estimates, we clearly see three system states with escape rates of $1{,}100^{+60}_{-60}$, $2{,}300^{+131}_{-128}$, and $1{,}050^{+80}_{-80}$ s⁻¹.
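The two quantities plotted in Figs. 2–4 follow from the sampled parameters. The sketch below computes $\varepsilon_{\mathrm{FRET}} = \lambda_{\mathrm{FRET}}/(\lambda_{\mathrm{FRET}} + \lambda_d)$. Assuming $\mathbf{P}_s = \exp(\tau\mathbf{K})$ as in Eq. 3, it also recovers the rate matrix $\mathbf{K}$ with SciPy's `scipy.linalg.logm`. The escape rate of a state is then the negative of the corresponding diagonal entry of $\mathbf{K}$, i.e., the sum of the rates out of that state. The function names are hypothetical. The example checks the round trip on a symmetric two-state matrix with 40 s⁻¹ rates and $\tau = 25$ ns.

```python
import numpy as np
from scipy.linalg import logm

def fret_efficiency(lam_fret, lam_d):
    """epsilon_FRET = lambda_FRET / (lambda_FRET + lambda_d)."""
    return lam_fret / (lam_fret + lam_d)

def escape_rates(P_s, tau):
    """Escape rates -K[i, i], where K = logm(P_s) / tau is the system rate matrix."""
    K = np.real(logm(P_s)) / tau
    return -np.diag(K)

# Round trip for two states that switch at 40 s^-1 each, with tau = 25 ns:
# the exact propagator has off-diagonal entries 0.5 * (1 - exp(-2 * 40 * tau)).
tau = 25e-9
p = 0.5 * (1.0 - np.exp(-2 * 40.0 * tau))
P_s = np.array([[1 - p, p], [p, 1 - p]])
print(escape_rates(P_s, tau))
print(fret_efficiency(1.0, 3.0))  # 0.25
```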
To showcase the critical role played by BNPs, we also consider the more difficult case of a sample with three system states and faster system state kinetics ranging over 1,200–2,500 s⁻¹. We do so by simulating photon traces in both donor and acceptor channels over a period of ~150 ms. A 50 ms section of the synthetic photon trace is shown in Fig. 3 a. Using direct photon arrivals from the generated photon trace, we find that the most probable system state trajectories sampled by BNP-FRET visit the correct number of system states, as shown in Fig. 3 b, while inferring all other parameters. Furthermore, the BNP-FRET sampler estimates the system transition rates and thus the escape rates (i.e., the sum of transition rates out of a given state); the ground-truth escape rates differ from the posterior peaks by a relative average error of less than 8%. The results for the remaining parameters are provided in Section S7.

Experimental data analysis: HJ
In this section, we benchmark our method over a wide range of kinetic rates, employing experimental data acquired from HJs under varying MgCl₂ concentrations in the buffer (15,51).
HJs are four-way double-helical DNA junctions that exist in various structural configurations (41,52,53). When the junction is not interacting with multivalent metal ions, electrostatic repulsion between the negatively charged phosphate groups of the four helical arms forces HJs to assume a wide configuration in which the arms lie along the two diagonals of a square. However, in the presence of ions such as Mg²⁺, interaction with the phosphate groups results in electrostatic screening. This reduced repulsion induces transitions to what are believed to be primarily two compact stacked configurations/conformations. Transitions between the two conformations necessitate passing through the intermediate open configuration. Because displacing ions away from the phosphate groups becomes increasingly difficult at high ion concentrations, we anticipate smaller transition rates between the two conformations in this regime.
The HJ kinetic rates have been studied using both fluorescence lifetime correlation spectroscopy (15) and HMM analysis (54) on diffusing HJs, assuming a priori a pair of high and low FRET system states. As expected, these previous studies show kinetic rates decreasing with increasing MgCl₂ concentration (41,43), with correspondingly longer dwell times.
Here, our method, free from the averaging and binning that are otherwise common in HMM analysis, is particularly well suited to learning the rapid kinetics at low Mg²⁺ concentrations. We apply BNP-FRET to data acquired from HJs at 1, 3, 5, and 10 mM MgCl₂ and sample the photophysical transition rates and the system transition probabilities. The resulting bivariate posterior distributions over the FRET efficiencies and escape rates (computed via the logarithm of the system transition probability matrix $\mathbf{P}_s$) are presented in Fig. 4. Estimates for the other parameters can be found in Section S7. We note that our results are obtained on a single-molecule basis with a photon budget of $10^4$–$10^5$ photons.

FIGURE 4 The bivariate posterior for the conformational escape rates $\lambda_{\mathrm{esc}}$ and FRET efficiencies $\varepsilon_{\mathrm{FRET}}$ for experimental data acquired in the presence of different MgCl₂ concentrations. Red dots show MAP estimates. In (a), we show the posterior for a sample with 1 mM MgCl₂; we report escape rates of $1{,}530^{+500}_{-550}$ and $1{,}240^{+420}_{-420}$ s⁻¹ in this case. The posterior for a sample with 3 mM MgCl₂ is shown in (b); we report escape rates of $140^{+38}_{-38}$ and $142^{+32}_{-32}$ s⁻¹ in this case. In (c), we show our posterior for a sample with 5 mM MgCl₂; here, we report escape rates of $64^{+9}_{-9}$ and $80^{+10}_{-10}$ s⁻¹. The posterior in (d) is for a sample with 10 mM MgCl₂; we report escape rates of $39^{+17}_{-12}$ and $41^{+23}_{-12}$ s⁻¹.
For all four concentrations (see Fig. 4), our BNP-FRET sampler most frequently visited only two system states, whereas this number was given as an input to the other analysis methods (15,54). Moreover, both escape rates are found to have similar values, with averages of approximately 1,400 s⁻¹ (1 mM MgCl₂), 140 s⁻¹ (3 mM MgCl₂), 72 s⁻¹ (5 mM MgCl₂), and 41 s⁻¹ (10 mM MgCl₂). These escape rates are in close agreement with the values reported by fluorescence lifetime correlation spectroscopy and H2MM methods (15,54) of ≈1,300 s⁻¹ (1 mM MgCl₂), ≈170 s⁻¹ (3 mM MgCl₂), ≈100 s⁻¹ (5 mM MgCl₂), and ≈60 s⁻¹ (10 mM MgCl₂), which lie well within the bounds of our posteriors shown in Fig. 4. Unlike those methods, ours obtains these estimates while simultaneously, and self-consistently, learning the number of system states.

Experimental data acquisition
In this section, we describe the protocol for preparing the surface-immobilized HJ sample labeled with a FRET pair and the procedure for recording smFRET traces from individual immobilized molecules. The sample preparation and recording of data follow previous work (55).

Sample preparation
The HJ used in this work consists of four DNA strands whose sequences are as follows:
For surface immobilization, the X-strand was labeled with biotin at the 5′ end. For FRET measurements, the donor (ATTO-532) and acceptor (ATTO-647N) dyes were introduced into the H- and B-strands, respectively. In both cases, the dyes were attached to the thymine nucleotide at the 6th position from the 5′ end of the respective strand (shown as T). All DNA samples (labeled or unlabeled) were purchased from JBioS (Shinjuku-ku, Japan) in high-performance liquid chromatography purified form and were used without further purification.
The HJ complex was prepared by mixing 1 mM solutions of R-, H-, B-, and X-strands in TN buffer (10 mM Tris-HCl with 50 mM NaCl, pH 8) at a 3:2:3:3 molar ratio, annealing the mixture at 94 °C for 4 min, and gradually cooling it down (2–3 °C min⁻¹) to room temperature (25 °C). For smFRET measurements, we used a sample chamber (SecureSeal, GBL621502, Grace Bio-Labs, Bend, OR, USA) with a biotin-PEG-SVA (biotin-poly(ethylene glycol)-succinimidyl valerate)-coated coverslip. The chamber was first incubated with streptavidin (0.1 mg mL⁻¹ in TN buffer) for 20 min. This was followed by washing the chamber with TN buffer (3 times) and injection of 1 nM HJ solution (with respect to its H-strand) for 3–10 s. After this incubation period, the chamber was rinsed with TN buffer (3 times) to remove unbound DNA and was filled with TN buffer containing 1 mM (or 5 mM) MgCl₂ and 2 mM Trolox for smFRET measurements.

smFRET measurements
The smFRET traces from individual HJs were recorded using a custom-built confocal microscope (Eclipse Ti, Nikon, Tokyo, Japan) equipped with the Perfect Focus System, a sample scanning piezo stage (Nano control B16-055), and a time-correlated single-photon counting module (SPC-130EM, Becker and Hickl, Berlin, Germany).
The excitation light was focused onto the top surface of the coverslip, and, during measurements, the focusing condition was maintained using the Perfect Focus System. The fluorescence signals were collected by the same objective, passed through the dichroic mirror, and guided to the detection assembly (Thorlabs DFM1/M) using a multimode fiber (Thorlabs M50L02S-A). Note that this multimode fiber (core diameter: 50 μm) also acts as the confocal pinhole. In the detection assembly, the fluorescence signals from the donor and acceptor dyes were separated using a dichroic mirror (ZT633rdc, Chroma Technology, Bellows Falls, VT, USA), filtered using band-pass filters (Chroma ET585/65m for donor and Semrock FF02-685/40 for acceptor), and detected using separate hybrid detectors (Becker and Hickl HPM-100-40-C).
For each detected photon, its macrotime (absolute arrival time from the start of the measurement) was recorded with 25.2 ns resolution, and its microtime (relative delay from the excitation pulse) was recorded with 6.1 ps resolution, using the time-correlated single-photon counting module operating in time-tagging mode. A router (Becker and Hickl HRT-41) was used to process the signals from the donor and acceptor detectors.
For recording smFRET traces from individual HJs, we first imaged a 10 × 3 μm area of the sample using the piezo stage by scanning it linearly at a speed of 1 μm s⁻¹ in the x direction and with an increment of 0.1 μm in the y direction. Individual HJs appeared as isolated bright spots in the image.
Next, we fitted the obtained donor and acceptor intensity images with multiple 2D Gaussian functions to determine the precise locations of individual HJs. Note that, during this image acquisition, the laser excitation power was kept to a minimum (~1 μW at the back aperture of the objective lens) to avoid photobleaching the dyes. In addition, we also employed an electronic shutter (Suruga Seiki, Shizuoka, Japan) in the laser excitation path to control the sample excitation as required.
Using the precise locations of individual HJs obtained, we recorded 30 s-long smFRET traces for each molecule by moving it to the center of the excitation beam using the piezo stage. For each trace, the laser excitation was blocked (using the shutter) for the first 5 s and was allowed to excite the sample for the remaining 25 s. Note that the smFRET traces were recorded using 40 μW laser excitation (at the back aperture of the objective lens) to maximize the fluorescence photons emitted from the dyes. We automated the process of acquiring smFRET traces from different molecules sequentially and executed it using a program written in-house in Igor Pro (Wavemetrics, Portland, OR, USA).

DISCUSSION
The sensitivity of smFRET under pulsed illumination has been exploited to investigate many different molecular interactions and geometries (8)(9)(10)(11)(56). However, quantitative interpretation of smFRET data faces serious challenges, including an unknown number of system states and the robust propagation of uncertainty from noise sources such as detectors and background. These challenges ultimately limit our ability to determine full distributions over all relevant unknowns. Traditionally, they have also led to data pre- or postprocessing that compromises the information otherwise encoded in the rawest form of data: single-photon arrivals.
Here, we provide a general BNP framework for smFRET data analysis starting from single-photon arrivals under a pulsed illumination setting. We simultaneously learn transition probabilities among system states as well as determine photophysical rates by incorporating existing sources of uncertainty such as background and cross talk.
We benchmark our method using both experimental and simulated data. That is, we first show that our method correctly learns parameters for the simplest case with two system states and slow system transition rates. Moreover, we test our method on more challenging synthetic data with more than two states and obtain correct estimates for the system state transition probabilities, and thus the number of system states, along with the remaining parameters of interest. To further assess our method's performance, we analyzed experimental data from HJs suspended in solutions with a range of MgCl₂ concentrations. These data were previously processed using other techniques that assume a fixed number of system states and bin photon arrival times (15).
Despite the multiple advantages of BNP-FRET mentioned above, BNPs always come with an added computational cost, as they take full advantage of information from single-photon arrival times and all existing sources of uncertainty. For this version of our general BNP method, simplified for pulsed illumination, we further reduced the computational complexity by grouping empty pulses together. As a result, because pulses are treated independently, the computational complexity increases only linearly with the number of input photons.
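The grouping of empty pulses can be sketched as follows. Every empty pulse carries the same observation, so, given the system state, all empty pulses share one reduced propagator, and a run of $k$ of them collapses into a single matrix power. In this hypothetical sketch, `None` marks an empty pulse, and `empty_probs[i]` is the probability of detecting no photon given $s_n = s_i$, taken as a given input. The function and argument names are illustrative, not the authors' code.

```python
import numpy as np

def likelihood_grouped(rho_start, P_s, obs_probs, empty_probs):
    """Eq. 6 with runs of empty pulses collapsed into matrix powers.

    obs_probs: one entry per pulse, either None (empty pulse) or a vector of
    p(w_n | s_n = i, G_j) for a pulse with at least one detected photon.
    """
    P_s = np.asarray(P_s, dtype=float)
    Q_empty = P_s * np.asarray(empty_probs, dtype=float)[:, None]  # shared reduced propagator
    v = np.asarray(rho_start, dtype=float)
    run = 0  # length of the current run of empty pulses
    for p_w in obs_probs:
        if p_w is None:
            run += 1
            continue
        if run:
            v = v @ np.linalg.matrix_power(Q_empty, run)
            run = 0
        v = v @ (P_s * np.asarray(p_w, dtype=float)[:, None])
    if run:
        v = v @ np.linalg.matrix_power(Q_empty, run)
    return v.sum()

rho_start = np.array([0.5, 0.5])
P_s = np.array([[0.99, 0.01], [0.02, 0.98]])
empty_probs = np.array([0.97, 0.95])
photon_probs = np.array([0.02, 0.04])
trace = [None, None, photon_probs, None, None, None]
L = likelihood_grouped(rho_start, P_s, trace, empty_probs)
```

Because `np.linalg.matrix_power` uses repeated squaring, a run of $k$ empty pulses costs $O(\log k)$ matrix products. The total cost is therefore driven by the number of photon-bearing pulses rather than by the total number of pulses.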
The method described in this paper assumes a Gaussian IRF. However, the developed framework is not limited to a specific form for the IRF and can be used for data collected using any type of IRF by modifying Eq. 4. Furthermore, the framework is flexible in accommodating different illumination techniques, such as alternating color pulses, which are typically used to directly excite the acceptor fluorophores. This can be achieved by a simple modification of the propagator $\mathbf{Q}^j_n$ in Eq. 4. A future extension of this method could relax the assumption of a static sample by adding spatial dependence to the excitation rate, as we explored in previous works (35,47,57). This would allow our method to learn the dynamics of diffusing molecules, as well as their photophysical and system state transition rates.
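With the Gaussian IRF assumed here, the simplest ingredient of Eq. 4 is the microtime density of a single-exponential decay convolved with the IRF, which is an exponentially modified Gaussian. The sketch below evaluates this density with SciPy's `scipy.stats.exponnorm`, whose shape parameter `K` sets the exponential mean to `K * scale`, so `K = 1 / (decay_rate * irf_sigma)`. It covers only the single-exponential case, for example donor emission, where the excited donor decays with total rate $\lambda_d + \lambda_{\mathrm{FRET}}$. Acceptor photons emitted after FRET follow a sequential two-step decay and need a different density. The parameter values in the example are illustrative assumptions.

```python
import numpy as np
from scipy.stats import exponnorm

def microtime_pdf(mu, decay_rate, irf_center, irf_sigma):
    """Density of microtime mu for an exponential decay (rate decay_rate, 1/ns)
    convolved with a Gaussian IRF (mean irf_center, std irf_sigma, both in ns)."""
    K = 1.0 / (decay_rate * irf_sigma)
    return exponnorm.pdf(mu, K, loc=irf_center, scale=irf_sigma)

# Illustrative values: 4 ns lifetime, IRF centered at 1 ns with 0.1 ns width.
grid = np.linspace(-5.0, 60.0, 20001)
dx = grid[1] - grid[0]
pdf = microtime_pdf(grid, decay_rate=0.25, irf_center=1.0, irf_sigma=0.1)
print(pdf.sum() * dx)           # total probability, close to 1
print((grid * pdf).sum() * dx)  # mean, close to irf_center + 1 / decay_rate = 5
```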