Single-molecule digital sizing of proteins in solution

The physical characterization of proteins in terms of their sizes, interactions, and assembly states is key to understanding their biological function and dysfunction. However, this has remained a difficult task because proteins are often highly polydisperse and present as multicomponent mixtures. Here, we address this challenge by introducing single-molecule microfluidic diffusional sizing (smMDS). This approach measures the hydrodynamic radius of single proteins and protein assemblies in microchannels using single-molecule fluorescence detection. smMDS allows for ultrasensitive sizing of proteins down to femtomolar concentrations and enables affinity profiling of protein interactions at the single-molecule level. We show that smMDS is effective in resolving the assembly states of protein oligomers and in characterizing the size of protein species within complex mixtures, including fibrillar protein aggregates and nanoscale condensate clusters. Overall, smMDS is a highly sensitive method for the analysis of proteins in solution, with wide-ranging applications in drug discovery, diagnostics, and nanobiotechnology.


Introduction
Proteins form the molecular machinery of life and accomplish their biological function by interacting with other proteins as well as by assembling into biomolecular complexes and higher order structures. 1 The characterization of proteins is thus a key objective in many areas of the biological and biomedical sciences, both for understanding the normal functional behavior of proteins and for elucidating aberrant processes and interactions that can lead to dysfunction and disease. [2][3][4][5] Proteins also serve as important therapeutic targets in drug discovery and clinical diagnostics, and are utilized as nanoscale building blocks in bionanotechnological applications. [6][7][8] A rigorous analysis of proteins and protein assemblies is therefore essential for advances in therapeutics development and the discovery of new biomaterials. [9][10][11] In particular, features such as the molecular size of proteins and protein complexes, the strength of their interactions in terms of pairwise dissociation constants, and their assembly and oligomerization states are key parameters for understanding the biological function, malfunction, and design of proteins. [12][13][14] While such biophysical characterization of proteins has become routine, major hurdles remain unaddressed and pose significant challenges for currently available biophysical approaches.
One particular challenge in the characterization of proteins lies in their inherent compositional complexity and heterogeneity. Protein systems often exist as mixtures of heterogeneous components and exhibit polydispersity in terms of size and abundance. However, most biophysical approaches perform best when applied to pure, homogeneous samples, and the methods developed to profile heterogeneous mixtures, such as gel filtration, electrophoresis, mass spectrometry, and surface plasmon resonance, have significant limitations. [15][16][17][18] These approaches generally rely on separation media, immobilization, or transfer of the molecules from the solution phase into the gas phase, all of which may modify the distribution of sizes and thus make it challenging to relate the functional state of a protein to that probed under native solution conditions. While methods capable of studying molecules in bulk solution exist, such as analytical ultracentrifugation, isothermal titration calorimetry, and static and dynamic light scattering, they generally consume large amounts of protein and often require concentrations exceeding the biologically relevant range. [19][20][21] Many proteins, however, are present at low concentrations in biological media, and samples are often available only in limited amounts. Therefore, ultrasensitive approaches that can size proteins and resolve heterogeneous mixtures of proteins at very low concentrations directly in solution are much sought after.
Micron-scale measurements of molecular diffusivity have proven to be a versatile and sensitive approach for probing the molecular sizes of proteins, their interactions, and assembly states in solution. [22][23][24] In particular, microfluidic diffusional sizing (MDS) has become an attractive, quantitative method for the characterization of proteins and protein complexes under native solution conditions. [25][26][27][28][29][30][31][32][33][34][35] MDS exploits the unique features of laminar flow in the microfluidic regime and measures the diffusive mass transport of molecules across co-flowing sample and buffer streams within a microchannel. 24,25 By monitoring the diffusive spreading of analyte molecules at different downstream channel positions and analyzing the recorded diffusion profiles with advection-diffusion models, the sizes of analyte molecules can be quantified in terms of their hydrodynamic radii Rh. MDS can further probe the formation of biomolecular interactions and the assembly of analyte molecules into higher order structures by monitoring the increase in size associated with complex formation, and can retrieve binding affinities (i.e., dissociation constants, KDs) through measurement of binding curves. However, despite the versatility of the approach, current implementations of MDS and similar methods are limited in their ability to resolve compositional heterogeneity because their signals are ensemble readouts, which require deconvolution of an averaged signal to evaluate the complete size distribution of the component species. 25,36,37 The situation is exacerbated by the fact that diffusional analysis is often performed in a simplified manner, in which only relative amounts of analytes are measured. Such analysis yields only an average Rh for mixtures of differently sized species, which limits the information content, in particular when studying heterogeneous, multicomponent systems.
Moreover, detection sensitivities for MDS and other sizing techniques lie in the nanomolar to micromolar range, which hampers ultrasensitive protein detection in the often desirable pico- to femtomolar range. 25,36,37 Single-molecule detection methods, owing to their high sensitivity, have great potential to overcome these challenges. 38,39 In particular, single-molecule fluorescence methods based on confocal detection are attractive readout modalities because they resolve the distributions and heterogeneities underlying the ensemble without the need to deconvolve a signal, hence providing rich insights into subpopulations and compositional complexity. 40,41 Furthermore, they enable detection at ultralow concentrations and provide direct digital readouts, that is, direct counting of individual proteins and protein complexes in solution. [42][43][44][45][46] However, direct readouts from single-molecule fluorescence do not provide information on the sizes of molecules. Size information can only be retrieved indirectly, for example, through correlation analysis, as done in fluorescence correlation spectroscopy (FCS). 47,48 FCS and similar approaches, however, rely on calibration for absolute size determination and suffer from overparameterization, especially when heterogeneous mixtures and polydisperse samples need to be resolved. Therefore, approaches that directly furnish single-molecule readouts with size information are much sought after. Accordingly, we reasoned that combining single-molecule fluorescence detection with diffusivity measurements of proteins in microchannels should provide a powerful platform to size proteins at the single-molecule level, enabling the characterization of heterogeneous and multicomponent protein systems directly in solution.
Here, we demonstrate measurements of the molecular diffusivity of single proteins and protein assemblies in microchannels using single-molecule fluorescence detection. By integrating a confocal fluorescence readout into a microfluidic system that leverages the principles of MDS, we introduce an approach, termed single-molecule MDS (smMDS), that enables diffusional sizing of proteins in terms of their hydrodynamic radii at the single-molecule level. We first present the working principle and experimental implementation of smMDS and show that, owing to the digital nature of the detection process, it can size single proteins and protein complexes with pico- to femtomolar sensitivity. We then demonstrate that smMDS can quantify the affinity of protein interactions at the single-molecule level and resolve different assembly states of protein oligomers. Finally, we use smMDS to characterize aggregate assemblies within multicomponent protein mixtures and apply the approach to the sizing of nanoscale clusters of a protein condensate system. Taken together, smMDS constitutes a versatile approach for the digital, in-solution characterization of the sizes, interactions, and assembly states of proteins. We anticipate that smMDS will have widespread applicability in the biological and biomedical sciences for the discovery of novel biomolecular mechanisms underpinning protein function and malfunction. Furthermore, smMDS holds great potential for the analysis of protein interactions in drug discovery, clinical diagnostics, and nanobiotechnological applications.

Results and Discussion
Working principle and experimental implementation of smMDS
Figure 1. [...] (b) The confocal detection volume is moved at a constant speed across the microfluidic device, enabling the recording of diffusion profiles from direct intensity readouts. This mode enables recording of diffusion profiles under ensemble conditions. An exemplary diffusion profile from a continuous scan measurement of human serum albumin (HSA) at 100 nM is shown. Diffusion profiles are shown as blue lines, experimental fits as orange lines, and errors as green bands. Extracted RH values (with errors) are given as insets. (c) Principle of step scan measurements. The confocal detection volume is moved in a stepwise manner across the device, collecting data at defined positions with each step for a certain period of time in the form of time traces (see panel d). This mode enables detection of individual molecules and the creation of diffusion profiles from single-molecule digital counting. An exemplary diffusion profile from a step scan measurement of α-synuclein at 10 pM is shown. (d) A single-molecule time trace (lower panel) as obtained from a step scan measurement. The time trace in the upper panel is a zoom-in view of the red shaded area in the lower panel. Red dots and highlighting indicate bursts detected by the burst-search algorithm. The bin time is 1 ms in all traces.
Figure 1a illustrates the working principle and experimental implementation of smMDS. smMDS measures the molecular diffusivity of analyte molecules within a microfluidic chip. It operates based on the principles of MDS 25 and probes molecular diffusivity by flow-focusing an analyte stream between two auxiliary buffer streams and then observing the diffusive spreading of analyte molecules to either side of the microfluidic channel as they travel downstream (see Methods).
Because different positions along the channel correspond to different diffusion times, tracking the diffusive broadening of species at different channel positions allows quantification of the diffusion coefficient D and, thus, extraction of the size of analyte molecules in terms of Rh. 25 Experimentally, smMDS measurements are conducted by introducing a sample containing fluorescently labeled protein, together with buffer, into the sizing chip and monitoring the micron-scale diffusive mass transport of molecules across the channel as they flow downstream. Fluid flow in the channel is controlled by applying a negative pressure at the device outlet with a syringe pump (see Methods for details).
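The link between diffusivity and size can be made concrete with the Stokes-Einstein relation. The worked example below is illustrative only: it assumes water at 25 °C (η ≈ 0.89 mPa s) and a hypothetical D = 70 µm²/s, neither of which is a value reported in this work. The second line is a simplified scaling for the lateral spreading length ℓ, assuming that the residence time t at downstream distance x is set by a mean flow speed v̄ and ignoring wall and flow-profile effects.

```latex
R_H = \frac{k_B T}{6\pi\eta D}
    = \frac{(1.381\times10^{-23}\,\mathrm{J\,K^{-1}})\,(298.15\,\mathrm{K})}
           {6\pi\,(0.89\times10^{-3}\,\mathrm{Pa\,s})\,(7.0\times10^{-11}\,\mathrm{m^{2}\,s^{-1}})}
    \approx 3.5\,\mathrm{nm}

\ell(x) \approx \sqrt{2 D t}, \qquad t = x/\bar{v}
```

Larger analytes (smaller D) spread less at a given position, which is why probing both near-nozzle and far-downstream positions lets one chip cover a broad size range.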
Detection in smMDS is performed by a high-sensitivity laser confocal fluorescence microscope integrated into the microfluidic platform (see Methods). By scanning the confocal volume across the microfluidic chip at mid-height of the channel, perpendicular to the flow direction (Figure 1a), fluorescence from passing analyte molecules is recorded. The scan trajectory is chosen such that various positions along the channel are probed, including positions close to the nozzle, where the sample stream meets the co-flowing buffer medium, and positions further downstream. In our implementation, the four innermost channels of the device are scanned to obtain diffusion profiles, as these cover a wide range of distances and time points along the channel, such that biomolecular analytes with Rh values on the 1-100 nm scale can be analyzed. The scan trajectory of the confocal volume in the x, y, and z directions is set through two scan markers integrated within the microfluidic chip adjacent to the channels.
Scanning is conducted in two modes: either the confocal volume is moved continuously through the chip (Figure 1b), or the observation volume is moved along the same trajectory in a stepwise manner, collecting data at defined positions with each step (Figure 1c). In continuous scan mode, diffusion profiles are rapidly acquired from direct intensity readouts. Here, the confocal volume is moved through the device at a constant scan speed (tens of µm/s) and the fluorescence intensity from analyte flowing through the confocal volume is recorded. This allows swift recording of diffusion profiles under ensemble conditions, that is, at concentrations where many molecules are present in the confocal volume (typically at concentrations greater than tens of pM). An exemplary diffusion profile obtained from a continuous scan experiment is shown in Figure 1b. To extract RH, the recorded diffusion profiles are analyzed using an advection-diffusion model (see Methods). Experimental profiles are compared with a set of simulated profiles obtained from numerical simulations, the best-matching simulated profile is identified by least-squares fitting to extract D, and RH is then calculated using the Stokes-Einstein relationship (Figure 1b).
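The fitting step can be illustrated with a minimal sketch. It is not the analysis used in this work: the numerical advection-diffusion simulations are replaced here by an assumed analytical model of free one-dimensional diffusion from an initially top-hat sample stream (ignoring channel walls and the parabolic flow profile), the residence times and stream half-width are hypothetical inputs, and the "library" of simulated profiles is simply a grid of candidate D values. The best-matching D is chosen by least squares and converted to RH via Stokes-Einstein. All function names are hypothetical.

```python
import math

KB = 1.380649e-23  # Boltzmann constant (J/K), exact SI value


def stokes_einstein_radius(d_m2_per_s, temp_k=298.15, viscosity_pa_s=0.89e-3):
    """Hydrodynamic radius (m) from a diffusion coefficient via Stokes-Einstein.

    Defaults assume water at 25 degrees C (an assumption, not a reported condition).
    """
    return KB * temp_k / (6.0 * math.pi * viscosity_pa_s * d_m2_per_s)


def simulated_profile(x_um, d_um2_s, t_s, half_width_um):
    """Unit-sum concentration profile across the channel at residence time t_s.

    Assumed model: free 1D diffusion of an initially top-hat stream of
    half-width half_width_um centred at x = 0, with no walls and no flow profile.
    """
    spread = math.sqrt(4.0 * d_um2_s * t_s)
    conc = [0.5 * (math.erf((half_width_um - x) / spread)
                   + math.erf((half_width_um + x) / spread)) for x in x_um]
    total = sum(conc)
    return [c / total for c in conc]


def fit_diffusion_coefficient(x_um, measured_profiles, times_s, half_width_um, d_grid_um2_s):
    """Return the candidate D whose simulated profiles best match all measured
    profiles in the least-squares sense (each profile normalised to unit sum)."""
    best_d, best_sse = None, float("inf")
    for d in d_grid_um2_s:
        sse = 0.0
        for profile, t in zip(measured_profiles, times_s):
            total = sum(profile)
            sim = simulated_profile(x_um, d, t, half_width_um)
            sse += sum((p / total - s) ** 2 for p, s in zip(profile, sim))
        if sse < best_sse:
            best_d, best_sse = d, sse
    return best_d


if __name__ == "__main__":
    x = [float(v) for v in range(-100, 101)]   # lateral positions (um)
    times = [0.1, 0.5, 1.0, 2.0]               # hypothetical residence times (s)
    # Synthetic, noise-free "measurement" generated with D = 70 um^2/s.
    data = [simulated_profile(x, 70.0, t, 10.0) for t in times]
    d_fit = fit_diffusion_coefficient(x, data, times, 10.0, [float(d) for d in range(10, 201)])
    rh_nm = stokes_einstein_radius(d_fit * 1e-12) * 1e9  # um^2/s -> m^2/s, m -> nm
    print(d_fit, rh_nm)
```

Fitting all downstream positions jointly with one D is what makes the approach robust: a single diffusion coefficient must explain both the narrow near-nozzle profiles and the broad far-downstream ones.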
In step scan mode, diffusion profiles are generated from time trace recordings along the scan trajectory. An exemplary diffusion profile obtained from a step scan experiment is shown in Figure 1c. Here, the confocal volume is parked at various positions along the trajectory and fluorescence signals from analyte flowing through the confocal volume are recorded for a set period of time. Typically, between 200 and 400 scan steps are performed from the start to the end position across the chip, and 2- to 4-s-long fluorescence traces are recorded at each position. Importantly, owing to the high sensitivity of confocal detection, measurements in step scan mode enable the detection of individual molecules and, thus, the creation of diffusion profiles from single-molecule counting (Figure 1d). Bursts of fluorescence corresponding to the passage of single molecules through the confocal volume are recorded at each channel position. To estimate the number of molecules at each scanned position, a burst-analysis algorithm is employed (see Methods). This algorithm uses a combined threshold criterion of a maximum interphoton time (IPTmax) and a minimum total number of photons (Nmin) to extract single-molecule events from the time trace recorded at each position. This approach has been shown to discriminate effectively between photons that originate from single fluorescent molecules and those that correspond to background, thus allowing individual molecules to be counted directly, that is, in a digital manner. 46 Diffusion profiles are then created by plotting the number of counted molecules as a function of chip position. RH is extracted as described above for continuous scan experiments, by fitting the experimental diffusion profiles with our advection-diffusion model. Figure 1c depicts fits and extracted RH values for the example data set.
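A minimal sketch of a threshold-based burst search of the kind described, assuming that photon arrival times (in seconds) are available for each scan position: consecutive photons separated by no more than IPTmax are grouped into one burst, and bursts with fewer than Nmin photons are discarded as background. The function names and data layout are hypothetical, and the published algorithm (see Methods and ref. 46) may differ in details such as background handling.

```python
def find_bursts(arrival_times_s, ipt_max_s, n_min):
    """Group photons into bursts and keep those with at least n_min photons.

    arrival_times_s: sorted photon arrival times (s) from one scan position.
    Consecutive photons separated by <= ipt_max_s belong to the same burst.
    Returns a list of (start_time, end_time, n_photons) tuples.
    """
    bursts = []
    current = []
    for t in arrival_times_s:
        if current and t - current[-1] > ipt_max_s:
            # Gap too long: close the running burst, keep it only if bright enough.
            if len(current) >= n_min:
                bursts.append((current[0], current[-1], len(current)))
            current = []
        current.append(t)
    if len(current) >= n_min:  # close the final burst of the trace
        bursts.append((current[0], current[-1], len(current)))
    return bursts


def diffusion_profile(traces_by_position, ipt_max_s, n_min):
    """Digital diffusion profile: number of detected bursts at each scan position."""
    return {pos: len(find_bursts(times, ipt_max_s, n_min))
            for pos, times in sorted(traces_by_position.items())}


if __name__ == "__main__":
    trace = [0.0, 5e-5, 1e-4, 1.5e-4, 2e-4,                        # 5 closely spaced photons
             0.5,                                                  # isolated background photon
             0.9, 0.90001, 0.90002, 0.90003, 0.90004, 0.90005]     # 6 closely spaced photons
    print(find_bursts(trace, ipt_max_s=1e-4, n_min=5))
    print(diffusion_profile({0: trace, 1: [0.0, 0.3]}, ipt_max_s=1e-4, n_min=5))
```

The isolated photon at 0.5 s is rejected by the Nmin criterion, so only the two dense photon groups are counted as molecules; plotting these counts against position gives the digital diffusion profile that is then fitted.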
In a first set of experiments, we sought to evaluate the sensitivity of smMDS and demonstrate its capability to determine protein size from bulk to single-molecule conditions. To this end, we labeled human serum albumin (HSA) with a fluorescent dye (Alexa 488) (see Methods) and performed concentration series measurements of HSA in both continuous and step scan mode. The recorded diffusion profiles of the series are shown in Figure 2b,c.

Sizing of proteins from bulk to single-molecule conditions by smMDS
In the range from 1 µM down to tens of pM HSA (Figure 2b), the molecular flux of HSA was sufficient for recording diffusion profiles in continuous scan mode. The obtained profiles show the characteristic broadening due to diffusion of molecules along the channels: narrow peaks are observed at channel positions close to the nozzle, where the sample meets the carrier medium, and the peaks broaden at positions further downstream. Modeling of the diffusion profiles using our advection-diffusion analysis yielded excellent fits. At all concentrations, the extracted RH values agreed within error with previously reported values for HSA, [49][50][51][52] demonstrating the robustness and accuracy of the approach. At lower concentrations, we performed step scan measurements and created diffusion profiles by digitally counting single molecules (Figure 2c). In this way, we obtained diffusion profiles from digital counting for HSA from 20 pM down to 100 fM. Remarkably, in this regime too, our advection-diffusion analysis yielded excellent fits of the experimental data and RH values in agreement with previously reported values for HSA (Figure 2d). [49][50][51][52] This shows that smMDS provides accurate size information over a broad range of concentrations and enables ultrasensitive sizing of proteins even in the picomolar to femtomolar concentration regime. We also evaluated the influence of parameter selection in the single-molecule regime on the extracted sizes (Figure S1) and found that a wide range of burst selection parameters, that is, varying thresholds for IPTmax and Nmin (see Methods), yielded the expected size, supporting the robustness of the approach.
To compare the sensitivity of smMDS with that of conventional MDS, we conducted MDS experiments using fluorescence widefield imaging (Figure S2). We performed concentration series measurements starting at 1 µM labeled HSA and then decreasing the protein concentration to 100 and 50 nM. Image analysis yielded a clear profile only for the measurements at 1 µM and 100 nM protein, and the expected size for HSA could only be recovered at 1 µM.
Notably, the measurement at 50 nM HSA yielded a featureless profile that could not be fitted and, hence, no RH could be determined. This shows that conventional MDS experiments are limited to concentrations in the tens of nanomolar range. In comparison, the high sensitivity and digital detection capabilities afforded by smMDS allow the size of proteins to be measured down to femtomolar concentrations, thereby extending the sensitivity range of diffusional sizing experiments by more than five orders of magnitude.
Next, we sought to demonstrate the wide applicability of smMDS in determining the size of proteins and protein assemblies from single-molecule digital counting. We selected a set of analytes differing in size, including the proteins lysozyme, RNase A, α-synuclein, human leukocyte antigen (HLA), HSA, and thyroglobulin, as well as oligomers formed by α-synuclein. We also included the small organic fluorophore Alexa 488 in the series (Figure 3a). The protein analytes were fluorescently labeled and purified before analysis (see Methods). We performed smMDS measurements in step scan mode at an analyte concentration of 10 pM. We moved the confocal spot in a stepwise manner through the channels, extracted single-molecule events for each analyte at each channel position, and digitally counted molecules to create diffusion profiles, which we fitted with our advection-diffusion model. Exemplary diffusion profiles for thyroglobulin, HLA, and Alexa 488 are shown in Figure 3b. The RH values obtained from smMDS were then plotted against previously reported RH values for Alexa 488, 53 lysozyme, 54 RNase A, 55 α-synuclein, 56 HLA, 57 HSA, 49-52 thyroglobulin, 58 and α-synuclein oligomers 59 (Figure 3a). The values obtained by smMDS followed the expected trend within error.
This demonstrates the excellent agreement between sizes obtained from smMDS and literature values, highlighting the reliability of single-molecule diffusivity measurements for the size determination of protein analytes. In an additional analysis step, we plotted the experimentally obtained RH values against the molecular weight MW. Both folded proteins (lysozyme, RNase A, HLA, HSA, thyroglobulin) and unfolded protein species (α-synuclein monomer and oligomers) followed the expected scaling behavior for globular ($R_H \propto \mathrm{MW}^{1/3}$) and disordered ($R_H \propto \mathrm{MW}^{0.6}$) proteins, respectively (Figure 3a, inset).
Figure 4. [...] (d) For analysis, the binding isotherm was fitted with a binding model assuming two antigen molecules binding one antibody. 57 The dissociation constant was found to be Kd = 400.5 ± 39.6 pM. Error bars are standard deviations from triplicate measurements.
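The scaling comparison in Figure 3a (inset) amounts to estimating an exponent ν in RH ∝ MW^ν, which can be sketched as an ordinary least-squares line on log RH versus log MW. This is a hedged illustration: the function name is hypothetical, and the example values are synthetic, noise-free numbers generated from an assumed RH = 0.5·MW^(1/3), not the measured data of this study.

```python
import math


def scaling_exponent(mw_kda, rh_nm):
    """Fit RH = a * MW**nu by ordinary least squares on log RH versus log MW.

    Returns (nu, a). The choice of units only affects the prefactor a, not nu.
    """
    xs = [math.log(m) for m in mw_kda]
    ys = [math.log(r) for r in rh_nm]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    nu = sxy / sxx
    prefactor = math.exp(mean_y - nu * mean_x)
    return nu, prefactor


if __name__ == "__main__":
    # Hypothetical, noise-free values from RH = 0.5 * MW**(1/3); NOT measured data.
    mw = [14.0, 66.0, 660.0]
    rh = [0.5 * m ** (1.0 / 3.0) for m in mw]
    print(scaling_exponent(mw, rh))
```

Fitting folded and disordered species separately would return exponents near 1/3 and 0.6, respectively, if they follow the scaling laws stated above.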

Quantifying protein interactions by smMDS
Next, we set out to demonstrate the capability of smMDS in determining the affinity of biomolecular interactions at the single-molecule level. Interactions of proteins with secondary biomolecules, in particular with other proteins, are of great importance across the biosciences, and quantitative measurements of affinity constants in the form of KDs have become vital in biomedical research and clinical diagnostics, for example, for affinity profiling. 29,57,60,61 Diffusional sizing allows for the detection of biomolecular interactions by monitoring the increase in size associated with binding and complex formation. 25,31 By acquiring binding isotherms, affinity constants of the interaction can be determined in solution, without the need for purification or for immobilization on a surface. So far, diffusional sizing has been limited to the sizing and quantification of protein interactions at bulk nanomolar concentrations; smMDS can overcome this barrier.
To demonstrate the detection of biomolecular interactions and the quantification of binding affinities by smMDS in a digital manner, we probed a clinically relevant antibody-antigen interaction. Specifically, we investigated the binding between HLA A*03:01, an allele of the class I major histocompatibility complex (MHC) and a key factor in the human immune system, 62 and W6/32, an antibody that binds to all class I HLA molecules (Figure 4a). 63 We performed a series of step scan smMDS experiments by titrating Alexa 488-labeled HLA antigen, held at a constant concentration of 100 pM, with increasing amounts of unlabeled W6/32 antibody. Exemplary diffusion profiles for pure HLA at 100 pM and for 100 pM HLA titrated with 10 nM W6/32 are shown in Figure 4b. smMDS diffusion profiles were acquired in triplicate and fitted to obtain effective RH values across the concentration series. We observed an increase in the average hydrodynamic radius from RH = 3.18 ± 0.04 nm for pure HLA, corresponding to a molecular weight of 50 kDa, as expected for HLA, to RH = 5.08 ± 0.01 nm for the saturated complex, corresponding to a molecular weight of 215 kDa, consistent with the binding of a 150 kDa antibody to HLA (Figure 4c). By fitting the binding isotherm (Figure 4d), we determined the dissociation constant to be Kd = 400.5 ± 39.6 pM, consistent with previous results. 57 Importantly, HLA is an extensively used clinical biomarker, for example, for assessing the risk of allograft rejection, and is typically present at low concentrations. These results therefore outline a path towards the detection and affinity profiling of antibody responses when only low concentrations of sample are available. More generally, they highlight the possibility of quantifying biomolecular interactions from single-molecule digital measurements at ultralow concentrations.
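A binding-isotherm fit of this kind can be sketched as follows. The model is an assumption chosen to match the stated stoichiometry (two antigen molecules per antibody): each antibody is treated as offering two identical, independent sites with a single Kd, the bound fraction of labeled HLA follows from the exact quadratic solution for ligand depletion, and the effective RH is assumed to interpolate linearly between the free and saturated values (3.18 and 5.08 nm from the text). The grid-search fit and all function names are hypothetical; the example data are synthetic, generated with Kd = 400 pM only to exercise the code.

```python
import math


def fraction_bound(hla_pm, antibody_pm, kd_pm):
    """Fraction of labeled HLA bound, assuming each antibody offers two identical,
    independent sites with dissociation constant kd_pm (all concentrations in pM)."""
    sites = 2.0 * antibody_pm
    b = hla_pm + sites + kd_pm
    # Smaller root of C^2 - b*C + hla*sites = 0 (discriminant is never negative).
    bound = (b - math.sqrt(b * b - 4.0 * hla_pm * sites)) / 2.0
    return bound / hla_pm


def effective_rh(hla_pm, antibody_pm, kd_pm, rh_free_nm, rh_bound_nm):
    """Effective RH, assumed to interpolate linearly with the bound fraction."""
    f = fraction_bound(hla_pm, antibody_pm, kd_pm)
    return rh_free_nm + f * (rh_bound_nm - rh_free_nm)


def fit_kd(hla_pm, antibody_series_pm, rh_measured_nm, rh_free_nm, rh_bound_nm, kd_grid_pm):
    """Grid-search least-squares estimate of Kd over the candidate values."""
    def sse(kd):
        return sum((effective_rh(hla_pm, a, kd, rh_free_nm, rh_bound_nm) - r) ** 2
                   for a, r in zip(antibody_series_pm, rh_measured_nm))
    return min(kd_grid_pm, key=sse)


if __name__ == "__main__":
    antibody = [0.0, 50.0, 100.0, 200.0, 500.0, 1000.0, 2000.0, 10000.0]  # pM
    # Synthetic isotherm (NOT measured data), 100 pM labeled HLA.
    rh = [effective_rh(100.0, a, 400.0, 3.18, 5.08) for a in antibody]
    print(fit_kd(100.0, antibody, rh, 3.18, 5.08, [float(k) for k in range(1, 5001)]))
```

Using the exact quadratic rather than a simple hyperbola matters here because the labeled HLA (100 pM) is not negligible compared with Kd, so free antibody cannot be approximated by total antibody.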
Many proteins fulfill their biological roles not as monomeric species but as oligomeric assemblies, which often exhibit significant heterogeneity in their degree of oligomerization and relative abundance. [64][65][66] Oligomeric forms of proteins play important functional roles in cellular physiology but are also implicated in diseases such as neurodegeneration. [67][68][69] Resolving the degree of oligomerization, and thus the size, of such heterogeneous protein populations is, however, extremely challenging with currently available biophysical techniques. A key feature of smMDS is that it can directly distinguish between assembly states of a protein based on differences in their fluorescence brightness. This feature is afforded by the single-molecule sensitivity of smMDS and enables the creation of diffusion profiles for the individual subspecies that make up a heterogeneous population. To demonstrate this capability, we carried out experiments with two inherently heterogeneous protein oligomer systems that have distinct functions in biology and disease.

Resolving protein oligomeric states by smMDS
In a first set of experiments, we determined the sizes of low-molecular-weight oligomers formed by HSA (Figure 5). Serum albumins are known to exist in an equilibrium of monomers, dimers, trimers, and tetramers, which are populated with decreasing relative abundance. 70 At the single-molecule level, such oligomeric states can be discriminated through brightness analysis of fluorescence bursts. 71,72 In this analysis, species are distinguished by their emitted fluorescence intensity, because the observed intensity scales directly with the number of individually dye-labeled monomer units in an oligomeric assembly. By applying differential thresholding, oligomeric states can then be discriminated, which allows smMDS to create a distinct diffusion profile for each oligomeric subspecies and thereby size each species independently. To demonstrate this, we performed smMDS measurements to resolve the sizes of HSA monomers, dimers, trimers, and tetramers. We subjected labeled HSA to smMDS at 10 pM protein concentration and performed step scan measurements to extract single-molecule events from the recorded time traces. We then plotted the normalized burst intensities of all burst events recorded in the measurement as a burst intensity histogram (Figure 5a). This allowed us to sort single-molecule burst events by brightness and to assign intensity regions to the monomeric and the different oligomeric HSA species. The main peak in the histogram reflects the average intensity of monomeric HSA.
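One simple way to implement such differential thresholding is to assign each burst to the oligomeric order whose expected brightness, an integer multiple of the monomer intensity, is nearest, and then build one digital diffusion profile per order. This is a hedged simplification, not the Gaussian-fitting procedure used for Figure 5: it treats all states as symmetric Gaussians of equal width and equal weight (for which the maximum-likelihood boundaries fall midway between multiples) and assigns anything brighter than the tetramer region to the tetramer. The function names and the example monomer intensity are hypothetical.

```python
import math


def assign_oligomer_state(burst_intensities, monomer_mean, max_order=4):
    """Assign each burst to the order k (1..max_order) whose expected brightness
    k * monomer_mean is nearest; bursts beyond the last region go to max_order."""
    states = []
    for intensity in burst_intensities:
        k = math.floor(intensity / monomer_mean + 0.5)  # round half up
        states.append(min(max(k, 1), max_order))
    return states


def profiles_by_state(positions, burst_intensities, monomer_mean, max_order=4):
    """One digital diffusion profile per oligomeric state: {k: {position: count}}."""
    profiles = {k: {} for k in range(1, max_order + 1)}
    states = assign_oligomer_state(burst_intensities, monomer_mean, max_order)
    for pos, k in zip(positions, states):
        profiles[k][pos] = profiles[k].get(pos, 0) + 1
    return profiles


if __name__ == "__main__":
    # Hypothetical bursts: (scan position, normalized burst intensity).
    positions = [0, 0, 1, 1, 2, 2]
    intensities = [70.0, 150.0, 230.0, 300.0, 500.0, 10.0]
    print(assign_oligomer_state(intensities, monomer_mean=75.0))
    print(profiles_by_state(positions, intensities, monomer_mean=75.0))
```

Because the monomer peak is broad relative to its mean, neighbouring regions overlap, so a fit that accounts for unequal abundances and the asymmetric monomer peak, as described in the text, is more faithful than this nearest-multiple rule.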
We extracted this intensity by fitting the distribution with an asymmetric Gaussian function, which accounts for the skewness of the burst intensity distribution caused by undersampling at short burst times, and retrieved a mean monomer intensity of Imonomer = 75.33 photons/ms with a standard deviation σmonomer of 37.44 photons/ms. Because each oligomer contains as many fluorophores as monomer units, dimeric, trimeric, and tetrameric HSA emit at multiples of the normalized monomer intensity. We therefore defined regions at two-, three-, and fourfold the normalized monomer intensity, corresponding to HSA dimer, trimer, and tetramer, respectively, by fitting the burst intensity distribution with three Gaussian functions. The widths of these oligomer regions were assumed to have the same standard deviation as the monomeric protein. The resulting fit of all Gaussians, including the asymmetric Gaussian for the monomer, described the experimental burst intensity distribution well. We then generated diffusion profiles from the bursts within each of the four regions and fitted the profiles to extract size information for the respective monomer/oligomer range (Figure 5b) [...] tetramer (Figure 5c). These values are in very good agreement with the expected equilibrium distribution of serum albumins. 70 Overall, our analysis shows that smMDS can afford species-resolved insights into oligomer size and abundance.
In a further set of experiments, we analyzed a heterogeneous mixture of α-synuclein oligomers (Figure 6). Oligomeric forms of α-synuclein are considered to be central to the pathology of Parkinson's disease and are hallmarked by a high degree of heterogeneity in size and structure. [73][74][75] Their characterization is an area of intense interest, not least because such information is useful in drug development; however, tools that can directly resolve the heterogeneity of these nanoscale assemblies in solution are scarce. 59,[76][77][78][79][80] To address this challenge and characterize the structural heterogeneity of α-synuclein oligomers, we analyzed a heterogeneous mixture of α-synuclein oligomers by smMDS. We injected oligomers produced by lyophilization of Alexa 488-labeled α-synuclein into the microfluidic sizing chip at a concentration of 10 pM and performed step scan measurements to digitally extract single-molecule events of passing α-synuclein oligomers at each channel position. To create diffusion profiles for subspecies, we selected bursts with different fluorescence intensities to resolve differently sized assembly states within the mixture. Here, we took an alternative approach to the analysis of HSA oligomers presented above and extracted bursts by varying the minimum number of fluorescence photons in the burst search algorithm while keeping the interphoton time threshold constant. This allowed us to differentiate effectively between single-molecule burst events that differ in molecular brightness (see burst intensity histograms in Figure 6a). In this way, diffusion profiles were generated for assemblies that differ in fluorescence intensity and, hence, in size in terms of RH. Exemplary diffusion profiles for four different thresholds are shown in Figure 6b. These profiles were then fitted with our advection-diffusion model to extract size information. We applied this analysis to a range of photon thresholds (minimum photon numbers from 5 to 50) and plotted the extracted sizes versus photon threshold (Figure 6c).
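If the interphoton-time criterion alone determines how photons are grouped into bursts, then re-running the burst search with a larger Nmin at constant IPTmax returns the same bursts as filtering a low-threshold burst list by photon number. The sketch below uses that equivalence to build one count profile per threshold from hypothetical (position, photon count) pairs. As a rough stand-in for the advection-diffusion fit, it also computes the count-weighted standard deviation of positions within a single channel; this width proxy, the data layout, and the function names are assumptions for illustration only.

```python
import math


def thresholded_profiles(bursts, thresholds):
    """bursts: list of (position, n_photons). Returns {n_min: {position: count}},
    keeping only bursts with at least n_min photons for each threshold."""
    profiles = {}
    for n_min in thresholds:
        counts = {}
        for pos, n_photons in bursts:
            if n_photons >= n_min:
                counts[pos] = counts.get(pos, 0) + 1
        profiles[n_min] = counts
    return profiles


def profile_width(counts):
    """Count-weighted standard deviation of positions (NaN if no bursts).

    A rough proxy only: narrower profiles in one channel suggest slower diffusion.
    """
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    mean = sum(p * c for p, c in counts.items()) / total
    var = sum(c * (p - mean) ** 2 for p, c in counts.items()) / total
    return math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical bursts from one channel: (lateral position in um, photons).
    bursts = [(-10, 6), (0, 30), (0, 8), (5, 12), (10, 5), (0, 50)]
    for n_min, counts in thresholded_profiles(bursts, [5, 20]).items():
        print(n_min, counts, profile_width(counts))
```

In this toy data the bright bursts cluster at the channel centre, so raising the threshold narrows the profile, mirroring how brighter (larger) assemblies diffuse less.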
The measured sizes span from RH = 3.6 nm for the smallest photon threshold value to RH = 16.5 nm for the largest. The value at the smallest threshold reflects the size of α-synuclein monomers (RH = 3.1 ± 0.4 nm; cf. Figure 3), while higher values reflect α-synuclein oligomer subpopulations, spanning an RH range of more than 10 nm. Notably, the RH-value distribution obtained by smMDS is in excellent agreement with a previous study, 76 which described the size variability of monomers/oligomers to be in the range of 3-16 nm. For comparison, we also generated a diffusion profile from the entire set of bursts detected at each position, which through fitting yielded an ensemble-averaged size of the oligomer population (RH = 5.2 ± 0.1 nm) (Figure 6c). To complement the analysis based on varying the minimum number of photons in the burst search algorithm, we also performed size analysis of α-synuclein oligomers by selecting defined regions in the burst intensity histogram, as done for HSA. The results are shown in Figure S3 and are in very good agreement with those obtained by varying the photon threshold (cf. Figure 6c). Taken together, these analyses demonstrate the versatility of smMDS in resolving the size distributions of heterogeneous oligomeric protein samples.
Many protein systems are heterogeneous, multicomponent mixtures consisting of proteins and protein assemblies that differ in size by several orders of magnitude. For example, aggregation mixtures are made up of monomeric protein and large fibrillar species. 3,30 Often, one of the components (e.g., the monomeric protein) is present in large excess, while the other (e.g., fibrillar species) is present only in small amounts. Approaches that can quantify the sizes of such differently populated species are much sought after, yet lacking. smMDS can fill this gap, as it can size molecules and assemblies in heterogeneous mixtures even when one of the molecular species is present in excess at bulk levels.

Sizing of multiple species within a heterogeneous aggregation mixture by smMDS
To illustrate the potential of smMDS for sizing compositionally heterogeneous mixtures, we designed experiments with a sample system composed of fibrils formed by the protein α-synuclein, a key component in the pathology of Parkinson's disease, 81,82 and an excess of monomeric α-synuclein at nanomolar concentrations (Figure 7a). Such a mixture is often encountered in assays that probe the mechanisms underlying protein aggregation and amyloid formation. 30,83 We first performed continuous scanning experiments on pure α-synuclein fibrils (at 10 nM monomer-equivalent concentration) and pure α-synuclein monomer (at 10 nM concentration) to establish the signatures of the two species (Figure 7b). For the fibril-only sample (Figure 7b, green profile), we observed burst events of high fluorescence intensity that were narrowly distributed around the center of the channels. The high burst intensity stems from the large number of fluorophores contained in a single fibril (10% of the monomers within fibrils are fluorescently labeled, see Methods). The narrow distribution of bursts at the center of the channel indicates a low diffusion coefficient and, correspondingly, a large size, as expected for fibrillar aggregates. For the monomer sample (Figure 7b, blue profile), the sizing profiles exhibited a broader spread due to the higher diffusivity of the monomeric units compared to fibrils. The monomer signal is continuous because, at nanomolar concentrations, multiple monomeric units traverse the confocal detection volume simultaneously, resulting in a bulk fluorescence signal rather than the individual single-molecule events observed for the fibril sample. Having established the signatures of fibrillar and monomeric samples, we probed a mixture containing α-synuclein fibrils and an excess of the monomeric protein (Figure 7b, pink profile). 
The diffusion profile of the mixture now exhibited characteristic signatures for both fibrils and monomeric protein, with broadened fluorescence at the profile base, reflecting monomeric protein, in addition to bright bursts on top of the monomeric signal that were narrowly distributed at the center of the channel, reflecting signals from fibrils.
To demonstrate that smMDS is able to size both the monomeric subpopulation present at bulk levels and the fibrils present at single-particle concentrations, we performed step scan measurements with a mixture containing α-synuclein fibrils and an excess of the monomeric protein (Figure 7c). An example fluorescence time trace is shown in Figure 7c (top panel). Fibrils are clearly detectable as bursts above the mean signal, which corresponds to the bulk monomer signal. From these traces, we separated the bulk monomer signal from the fibril burst signals by intensity thresholding. Specifically, after applying a Savitzky-Golay smoothing filter, fibrils were identified as bursts exceeding a fluorescence count rate of 250 kHz. The remaining signal (i.e., the mean bulk signal in the fluorescence time traces in the absence of fibril signal) formed the monomer signal. From the extracted fibril and monomer signals, we created diffusion profiles for the two species (Figure 7c, bottom panels) and fitted these profiles with our advection-diffusion model to extract size information for both species. From triplicate measurements (Figure 7d), the sizes of the monomer and fibril species were estimated to be RH,monomer = 3.23 ± 0.04 nm and RH,fibrils = 56.43 ± 6.69 nm. As a control, we also performed step scan measurements on the fibril-only and monomer-only samples (Figure 7e and f, respectively) and obtained sizes that were, within error, in excellent agreement with those obtained from the mixture, thereby validating our approach (Figure 7d).
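The separation of fibril bursts from the bulk monomer signal can be sketched as follows for a single channel position. This is an illustrative sketch, not the authors' code: the 250 kHz threshold comes from the text, while the Savitzky-Golay window length and polynomial order, and the choice to average the raw (unsmoothed) trace outside burst regions for the monomer signal, are assumptions.

```python
import numpy as np
from scipy.signal import savgol_filter


def split_fibril_and_monomer(count_rate_khz, threshold_khz=250.0,
                             window_length=11, polyorder=2):
    """Separate burst (fibril) events from the bulk (monomer) signal.

    count_rate_khz: 1D fluorescence count-rate trace at one channel position.
    window_length and polyorder are illustrative Savitzky-Golay settings
    (assumptions, not values from the original work).
    Returns (number of fibril bursts, mean monomer count rate in kHz).
    """
    trace = np.asarray(count_rate_khz, dtype=float)
    smoothed = savgol_filter(trace, window_length, polyorder)
    above = smoothed > threshold_khz
    # A burst starts wherever the smoothed trace crosses the threshold upwards.
    starts = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    n_bursts = len(starts) + int(above[0])
    # Bulk monomer signal: mean of the raw trace outside the burst regions.
    monomer_rate = float(np.mean(trace[~above]))
    return n_bursts, monomer_rate
```

Repeating this at every step-scan position gives a fibril burst count and a mean monomer intensity per position, i.e., one diffusion profile for each species.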
Together, these experiments show that smMDS can quantify differently sized molecules or assembly states of a protein within a heterogeneous mixture, even when an excess of one of the molecular species is present at bulk levels. These findings are significant, as such an approach allows differently populated species to be probed simultaneously, for example, in kinetic studies of protein misfolding and aggregation.
In a final set of experiments, we applied the smMDS approach to the characterization of nanometer-sized clusters of a phase-separating protein system. Biomolecular condensates (Figure 8a) formed through phase separation are important players in cellular physiology and disease 84,85 and emerge from the demixing of a solution into a condensed phase and a coexisting dilute phase. 86,87 Condensates typically have sizes in the micrometer range and are therefore observable by conventional microscopy imaging. [88][89][90] Recent evidence, however, suggests that phase-separation-prone proteins, such as the DNA/RNA binding protein fused in sarcoma (FUS), can also form nanoscale assemblies (Figure 8a) well below the critical concentration at which phase separation occurs (i.e., in the pre-phase separation regime). [91][92][93][94] These so-called nanoscale clusters have sizes ranging from tens to hundreds of nanometers and thus lie below the resolution limit of conventional optical imaging systems. Moreover, as these species are low in abundance and present against a high background of dilute-phase protein, they are typically hard to detect by classical approaches. Here, we demonstrate that smMDS can determine the size of nanoscale assemblies formed by the protein TAR DNA binding protein 43 (TDP-43) at sub-saturating concentrations. We further show that smMDS allows for the determination of nanocluster abundance and composition.

Sizing of nanoscale clusters by smMDS
First, we mapped out a one-dimensional phase diagram of TDP-43 (Figure 8b) to assess the phase separation behavior of the protein as a function of salt concentration. GFP-tagged TDP-43 at 0.5 µM protein concentration formed microscopically visible condensates below a critical salt concentration ccrit of 50 mM KCl. Above that salt concentration, no condensates were visible by conventional fluorescence microscopy and the solution appeared clear and well-mixed. Next, to assess whether TDP-43 forms nanoscale assemblies, we performed smMDS measurements under conditions where no microscopically visible condensates could be detected (i.e., well above ccrit). To this end, we first performed a continuous scan experiment at 100 mM KCl. The obtained profile, shown in Figure 8c (top panel), exhibited a broad signature characteristic of bulk monomeric protein.
Sizing of the profile (Figure 8c, bottom panel) yielded a hydrodynamic radius of RH = 4.29 ± 0.8 nm, which is in agreement with the size of monomeric GFP-tagged TDP-43. More importantly, in addition to the characteristic signature of monomeric protein in the continuous scan diffusion profile, we observed bright bursts narrowly distributed at the center of the channel on top of the diffusion profile, indicating the presence of clusters (Figure 8c, top panel). To explore this further, we carried out smMDS step scan measurements. We performed high-resolution step scans with an interval of only 1 µm between steps within the central region of the diffusion profile, where clusters were expected to appear, and step scans at lower resolution outside this region (Figure 8e, top panel). Clusters were clearly detectable as bursts above the mean bulk signal in the fluorescence time traces (Figure 8d). These bursts were counted to obtain the number of clusters at each channel position and binned in a histogram (Figure 8e, bottom panel). Each peak in the histogram was then independently fitted with a Gaussian distribution to obtain a diffusion distance from which D, and thus RH, could be calculated. Specifically, we took half of the full width at half maximum (FWHM) of each Gaussian as the diffused distance at the four channel positions, corresponding to 0, 10, 26, and 55 s of travel within the channel. The diffusion distances x at each time point t (Figure 8f) were then fitted with a one-dimensional approximation of Fick's law to extract D and, thus, RH via the Stokes-Einstein relation. From the fit, using the average of three replicate measurements, we obtained an RH of 121.5 ± 14.5 nm for TDP-43 nanoclusters (Figure 8f). We note that these cluster sizes are similar to those of previously observed FUS clusters, 91 suggesting that TDP-43 forms pre-phase separation clusters similar to those of FUS.
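The peak-fitting route described above can be sketched in three steps: fit each histogram peak with a Gaussian, fit the squared diffusion distance against travel time, and convert D to RH. This is a hedged sketch under stated assumptions: the exact one-dimensional form of Fick's law used by the authors is not given here, so the fit assumes x² = x0² + 2Dt; the temperature (298.15 K) and viscosity (0.89 mPa s, water at 25 °C) are also assumptions. Note that for a Gaussian the half width at half maximum is √(2 ln 2) σ, so using it as x instead of σ scales the fitted D by 2 ln 2 (about 1.39); the function returns both so the convention stays explicit.

```python
import numpy as np
from scipy.optimize import curve_fit

K_B = 1.380649e-23  # Boltzmann constant, J/K


def gaussian(x, amplitude, center, sigma, offset):
    return amplitude * np.exp(-0.5 * ((x - center) / sigma) ** 2) + offset


def fit_peak(positions_um, counts):
    """Fit one histogram peak with a Gaussian; return (sigma, hwhm) in um."""
    positions_um = np.asarray(positions_um, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Moment-based starting guesses for the least-squares fit.
    weights = np.clip(counts - counts.min(), 0.0, None)
    mu0 = np.sum(weights * positions_um) / np.sum(weights)
    sigma0 = np.sqrt(np.sum(weights * (positions_um - mu0) ** 2) / np.sum(weights))
    p0 = [counts.max() - counts.min(), mu0, sigma0, counts.min()]
    popt, _ = curve_fit(gaussian, positions_um, counts, p0=p0)
    sigma = abs(popt[2])
    hwhm = np.sqrt(2.0 * np.log(2.0)) * sigma  # half of the FWHM
    return sigma, hwhm


def diffusion_coefficient(times_s, distances_um):
    """Fit x^2 = x0^2 + 2 D t (assumed form) and return D in um^2/s."""
    slope, _intercept = np.polyfit(times_s, np.asarray(distances_um) ** 2, 1)
    return slope / 2.0


def stokes_einstein_radius_nm(d_um2_s, temperature_k=298.15, viscosity_pa_s=0.89e-3):
    """Hydrodynamic radius R_H = k_B T / (6 pi eta D), returned in nm."""
    d_m2_s = d_um2_s * 1e-12
    return K_B * temperature_k / (6.0 * np.pi * viscosity_pa_s * d_m2_s) * 1e9
```

In use, `fit_peak` would be applied to each of the four histogram peaks, the resulting distances passed to `diffusion_coefficient` together with the travel times (0, 10, 26, and 55 s), and the result converted with `stokes_einstein_radius_nm`.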
The simultaneous yet independent measurement of the sizes of nanoclusters and monomeric protein further allows the number of monomer units per nanocluster to be estimated by comparing the volumes of monomeric and clustered TDP-43. Assuming no restructuring of the protein within the nanocluster, a single cluster could contain as many as 20,000 proteins if it were composed of pure protein. However, as condensates are liquid in nature and contain solvent molecules, typical protein volume fractions within condensate systems are ~10-35%; [95][96][97] hence, we expect the number of proteins per cluster to be in the range of 2,000-7,000. In addition to size measurements, the ability to count clusters directly in a digital manner also enables the quantification of cluster particle concentrations and volume fractions. From three repeat measurements, we detected an average of N = 2281 ± 929 clusters. Using a previously established conversion strategy, 46 this corresponds to a flux of F = 72,606 ± 29,570 clusters per second or a cluster particle concentration of c = 7.24 ± 2.9 pM, corresponding to a total nanocluster volume fraction of ϕ = 3.16 × 10⁻⁵. The concentration of TDP-43 nanoclusters detected here was therefore more than an order of magnitude higher than that previously determined for FUS nanoclusters formed under the same protein and salt concentrations, 91 suggesting a difference in the intermolecular interactions that stabilize TDP-43 nanoclusters. Notably, TDP-43 is prone to aggregation and possesses a disordered domain capable of forming amyloid fibrils. 98 These characteristics may also enhance intermolecular interactions in the clustered state and could explain the higher propensity of TDP-43 to form nanocluster assemblies.
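The estimates in this paragraph can be approximately reproduced with a few lines of arithmetic. The sketch assumes spherical monomers and clusters, and it assumes that the flux-to-concentration conversion reduces to c = F/(N_A Q) with Q = 60 µL/h, the step-scan flow rate given in the Experimental procedures; the conversion strategy of ref. 46 may include further corrections, so this is an illustration rather than the authors' calculation. With the reported values, it gives about 22,700 proteins per pure-protein cluster, a concentration of about 7.2 pM, and a volume fraction of about 3.3 × 10⁻⁵, i.e., the same order as the reported values.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, 1/mol


def monomers_per_cluster(r_cluster_nm, r_monomer_nm, protein_volume_fraction=1.0):
    """Volume-ratio estimate of protein copies per cluster (spheres assumed)."""
    return protein_volume_fraction * (r_cluster_nm / r_monomer_nm) ** 3


def particle_concentration_m(flux_per_s, flow_rate_ul_per_h):
    """Molar concentration from particle flux, assuming c = F / (N_A * Q)."""
    q_l_per_s = flow_rate_ul_per_h * 1e-6 / 3600.0  # uL/h -> L/s
    return flux_per_s / (N_A * q_l_per_s)


def volume_fraction(conc_m, r_cluster_nm):
    """Total volume fraction occupied by spherical clusters."""
    number_per_m3 = conc_m * 1e3 * N_A  # mol/L -> particles per m^3
    v_cluster_m3 = 4.0 / 3.0 * math.pi * (r_cluster_nm * 1e-9) ** 3
    return number_per_m3 * v_cluster_m3


# Values reported in the text (RH in nm, flux in clusters per second).
n_pure = monomers_per_cluster(121.5, 4.29)
n_range = [monomers_per_cluster(121.5, 4.29, f) for f in (0.10, 0.35)]
c = particle_concentration_m(72606, 60.0)
phi = volume_fraction(c, 121.5)
print(round(n_pure), [round(n) for n in n_range], c * 1e12, phi)
```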
Taken together, we have shown here that smMDS is able to capture and size pre-phase separation nanoclusters that lie below the detection limit of conventional fluorescence microscopy. The discovery of TDP-43 nanoclusters and an understanding of the nature of such sub-diffraction assemblies are critical, in particular for advancing our understanding of macroscopic phase separation phenomena. Our single-molecule sizing approach therefore provides insight into a largely unexplored area of protein assembly by elucidating the properties of low-abundance species present in a heterogeneous condensate system.

Conclusions
The physical characterization of the sizes, interactions, and assembly states of proteins is vital for a heightened understanding of the biological function, malfunction, and therapeutic intervention of proteins. Of particular interest are insights into the compositional heterogeneity of proteins and protein mixtures, such as protein oligomers, protein aggregation reactions, and protein condensate systems. However, such characterization is challenging to achieve using conventional biophysical approaches, as these methods mostly rely on ensemble readouts, which limits both sensitivity and the information content that can be retrieved from such measurements. The smMDS approach developed herein addresses these challenges and takes advantage of the high sensitivity afforded by single-molecule detection and the physical features of diffusion in the microfluidic regime to enable digital sizing of proteins, protein assemblies, and heterogeneous multicomponent protein systems directly in solution.
With different examples, ranging from single proteins to protein assemblies, we illustrated how the digital nature of the smMDS approach enables diffusional-sizing-based monitoring of protein hydrodynamic radii down to the femtomolar concentration range, thereby pushing the limits of sizing experiments by more than 5 orders of magnitude. smMDS further enables measurement of binding affinities of protein interactions at the single-molecule level and allows resolving high- and low-oligomeric states of proteins to gain insights into subpopulation distributions and oligomer equilibria. We further characterized the polydisperse nature of protein assemblies using the example of a protein aggregation reaction and applied smMDS to discover nanoclusters of a phase-separating condensate system. These examples highlight the capability of the approach to elucidate properties of low-abundance species present in heterogeneous biomolecular systems.
The implementation of the smMDS platform demonstrated here is based on the integration of microfluidic diffusional sizing with in-situ scanning confocal microscopy. Future iterations of the platform could also incorporate, for example, multicolor single-molecule detection and FRET techniques, [99][100][101] as well as other downstream microfluidic separation modalities, 102 to increase sensitivity, resolution, and information content even further. We also foresee that smMDS has the potential to be developed into a commercial benchtop instrument, similar to the one developed by Fluidic Analytics for conventional MDS experiments. 103 We note that the analytes tested in this work were probed in aqueous buffer solutions; however, the smMDS approach should be adaptable to the detection of targets in biological and clinical samples. We further note that the analytes measured here fall roughly in the 1-100 nm size regime. However, smMDS is not limited to this size range, and much larger proteins and protein assemblies can be quantified by adapting, for example, the flow rates or the chip design.
Taken together, the new capabilities of the smMDS platform augment the information content from sizing experiments beyond what is achievable with classical techniques. smMDS not only enables direct digital sizing of proteins and protein assemblies, but also provides quantitative information on protein interactions and heterogeneous multicomponent protein systems. Given the key features of the technique, we anticipate that the smMDS approach will have a multitude of applications in quantifying the sizes and interactions of proteins and other biomolecules in various areas of biological and biomedical research, including the mechanistic and functional analysis of proteins, the molecular design of protein therapeutics, and the characterization of new nanomedicines and biomaterials.

Protein and sample preparation
Alexa 488 carboxylic acid was obtained as lyophilized powder from Thermo Fisher. Stock solutions at millimolar concentrations were prepared in dimethyl sulfoxide and further diluted in PBS buffer. Lysozyme, thyroglobulin, HSA, and RNase A were purchased from Sigma Aldrich as lyophilized powder in the highest purity available and suspended in 100 mM NaHCO3 buffer (pH 8.2). Human leukocyte antigen (HLA) A*03:01 was obtained through the NIH Tetramer Core Facility at Emory University (USA) and rebuffered into 100 mM NaHCO3 buffer (pH 8.2). Human wild-type α-synuclein was recombinantly produced following a protocol detailed elsewhere 104 and prepared in PBS. TDP-43 was produced as a C-terminal EGFP-tagged protein variant in insect cells as previously described 88 and stored in 50 mM Tris-HCl (pH 7.4), 500 mM KCl, 5% (w/v) glycerol, 1 mM DTT. Concentrations were determined by UV/VIS spectroscopy.

Protein labeling
Protein solutions (lysozyme, thyroglobulin, HSA, RNase A, HLA, and α-synuclein) were incubated with a 5-fold molar excess of N-hydroxysuccinimide (NHS)-ester-functionalized Alexa 488 (Thermo Fisher) for 1 h at room temperature. Labeled protein was then separated from unbound dye by size-exclusion chromatography on a Superdex 200 Increase 10/300 GL column (Cytiva) connected to an ÄKTA pure chromatography system (Cytiva) with PBS as elution buffer. For labeling of α-synuclein, the incubation was done at 4°C for 16 h and purification was performed on a Superdex 75 Increase 10/300 GL column (Cytiva). Following separation, the labeled protein concentrations were determined by UV/VIS spectroscopy. The proteins were stored at -80°C until further use.

Generation of labeled α-synuclein oligomers and fibrils
A cysteine-containing variant (N122C) of α-synuclein was used for the preparation of labeled α-synuclein oligomers. The protein variant was produced as previously described 59 and labeled with maleimide-functionalized Alexa 488 dye, followed by purification on a Sephadex G25 column (GE Healthcare). Oligomers were produced according to procedures detailed elsewhere. 59 Briefly, labeled monomeric α-synuclein was lyophilized in deionized water and subsequently resuspended in PBS (pH 7.4) at a final concentration of ~6 mg mL⁻¹. The resulting solution was passed through a 0.22 µm filter (Millipore) before incubation at 37°C for 16 h under quiescent conditions. Small amounts of fibrillar species were removed by ultracentrifugation. Excess monomeric protein was then removed by multiple filtration steps using 100-kDa cutoff membranes. The final oligomer concentration was determined by UV/VIS spectroscopy.
Recombinant human wild-type α-synuclein was used for the generation of α-synuclein fibrils. Fibrils were prepared from a mixture of Alexa 488-labeled and unlabeled α-synuclein, as previously described. 30 Briefly, a mixture containing 10% labeled and 90% unlabeled α-synuclein monomer was shaken at 37 °C and 200 rpm for 4 days to generate 1st-generation fibrils. The 1st-generation fibril seeds were then recovered by centrifugation and incubated with a mixture of 10% labeled and 90% unlabeled monomer at 37 °C and 200 rpm for 3 days to generate 2nd-generation fibrils. After centrifugation and recovery, 2nd-generation fibrils were sonicated (10% power, 30% cycles for 90 s) using a Sonopuls HD 2070 ultrasonic homogenizer (Bandelin) and stored at room temperature until further use. Fibril concentration was determined by UV/VIS spectroscopy.

smMDS platform
The approach described here integrates microchip-based diffusional sizing with confocal fluorescence detection. Schematics of the microfluidic device, the optical setup, and their integration are shown in Figure 1a. Briefly, the microfluidic chip design is based on previously reported device designs for diffusional sizing. 25,30,105 The device has two inlets, one for the injection of the sample and one for the injection of co-flowing buffer solution. Channels 25 µm high and 50 µm wide direct the sample and the buffer solutions to an entry nozzle. At the entry nozzle, the sample and buffer streams merge into an observation channel 25 µm high and 225 µm wide, in which diffusion is monitored. Notably, the channel geometry at the nozzle is designed such that sample and buffer solution are drawn through the chip in a ~1:8 volume ratio. The observation channel is folded multiple times, is approximately 90,000 µm (~9 cm) long, and terminates at a waste outlet where negative pressure is applied by a syringe pump. Scanning markers are integrated on the chip adjacent to the observation channel to define the start and end points of the scan trajectory. Details on the fabrication of the device by standard soft-lithography and molding techniques are given below.
The optical unit of the smMDS platform is based on fluorescence confocal microscopy and optimized for microfluidic experiments. The microscope is built around a 'rapid automated modular microscope' (RAMM) frame (Applied Scientific Instrumentation (ASI)) and is equipped with a motorized x,y,z-scanning stage (PZ-2000FT, ASI), onto which the diffusional sizing chip is mounted. To control the exact sample placement along the optical axis of the microscope, the stage is equipped with a z-piezo. To excite the sample in the device, the beam of a 488-nm wavelength laser (Cobolt 06-MLD, 200 mW diode laser, Cobolt) is passed through a single-mode optical fiber (P3-488PM-FC-1, Thorlabs) and collimated at the exit of the fiber by an achromatic collimator (60FC-L-4-M100S-26, Schäfter + Kirchhoff) to form a beam with a Gaussian profile. The beam is then directed into the microscope, reflected by a dichroic beamsplitter (Di03-R488/561, Semrock), and subsequently focused to a concentric diffraction-limited spot in the microfluidic channel through a 60× magnification water-immersion objective (CFI Plan Apochromat WI 60x, NA 1.2, Nikon). The emitted light from the sample is collected via the same objective, passed through the dichroic beamsplitter, and focused by achromatic lenses through a 30-µm pinhole (Thorlabs) to remove out-of-focus light. The emitted photons are filtered through a band-pass filter (FF01-520/35-25, Semrock) and then focused onto a single-photon counting avalanche photodiode (APD, SPCM-14, PerkinElmer Optoelectronics), which is connected to a TimeHarp260 time-correlated single photon counting unit (PicoQuant).

Fabrication of microfluidic devices
The microfluidic device for smMDS was fabricated in poly(dimethylsiloxane) (PDMS) through a single, standard soft-photolithography step. In brief, the device was designed in AutoCAD software (Autodesk) and printed on acetate transparencies (Micro Lithography Services). The master replica for fabricating the device was prepared by soft-lithography methods. 106 Accordingly, SU-8 3025 (Kayaku) was deposited onto a polished silicon wafer and then spun to a height of approximately 25 μm. The acetate mask was placed onto the coated wafer and exposed with UV light using a custom-built LED-based apparatus. 107 Following UV exposure, the master was developed in propylene glycol methyl ether acetate (PGMEA, Sigma Aldrich). The exact height was measured by a Dektak profilometer (Bruker). The mold was then used to generate a patterned PDMS chip. For this purpose, PDMS (Sylgard 184, Dow Corning) was mixed with curing agent (Sylgard 184, Dow Corning) at a ratio of 10:1 (w/w) and subsequently degassed and baked for approximately 1 h at 65°C. This PDMS cast was removed from the mold, and access holes for the inlet and outlet connectors were punched with biopsy punches. The devices were bonded onto a thin glass coverslip (no. 1.5, Menzel) after both the PDMS and the coverslip glass surface had been activated by oxygen plasma (Diener electronic, 40% power for 30 s). Before injecting the buffer into the channels, the chips were rendered more hydrophilic through an additional plasma oxidation step (Diener electronic, 80% power for 500 s). 108

Experimental procedures
All experiments were performed at room temperature. The buffer was PBS (pH 7.4) in all experiments, except in nanocluster experiments, where TRIS buffer was used (50 mM TRIS-HCl, pH 7.4). Buffers were supplemented with 0.01% Tween 20 (Thermo Fisher) to prevent adhesion of molecules to chip surfaces. The PDMS-glass device was secured to the motorized, programmable microscope stage. Co-flowing buffer and sample were introduced into the chip through gel-loading tips inserted into the appropriate inlet orifices and drawn through the chip by applying negative pressure with a syringe (Hamilton) and syringe pump (Cetoni, neMESYS) connected to the outlet. Flow rates were 100 µL/h in all experiments, except in nanocluster experiments, where the flow rates were 150 µL/h for continuous scan experiments and 60 µL/h for step scan experiments. Flow was allowed to equilibrate for six minutes before data acquisition. Diffusion profiles were obtained by translating the confocal volume, either continuously or stepwise, through the four innermost channels of the microfluidic sizing chip at the mid-height of the device (i.e., ~12.5 µm above the surface of the glass coverslip), using a custom-written Python script that simultaneously controlled the stage movement and the data acquisition. Continuous scans were performed at 20-100 µm/s.
Step scans comprised 200-400 steps with an acquisition time of 1-60 s at each position. The scanning markers were used to define the x,y,z-coordinates of the start and end positions of the scan trajectory. Each experiment was performed in a freshly fabricated PDMS device. The laser power at the back aperture of the objective was adjusted to 370 µW in all experiments, except for experiments on fibrils, where a laser power of 100 µW was used, and for experiments on nanoclusters, where a laser power of 6 µW was used. Photons were recorded in T2 mode, with arrival times measured relative to the start of the measurement at 16-ps resolution.
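For continuous scans, the T2 photon timestamps can be turned into an intensity profile along the chip by binning them in 1-ms intervals (as stated in the Data analysis section) and converting each bin's time to a stage position. The sketch below is not the authors' script: it assumes timestamps in picoseconds relative to the measurement start and a stage moving at a constant speed from a known start position, which ignores stage acceleration and the mapping onto the folded channel that the scanning markers provide in practice.

```python
import numpy as np


def continuous_scan_profile(arrival_times_ps, scan_speed_um_s, start_um=0.0, bin_ms=1.0):
    """Bin T2 photon timestamps and map each bin to a scan position.

    arrival_times_ps: photon arrival times relative to measurement start (ps).
    Assumes constant stage speed starting at start_um when t = 0.
    Returns (bin-centre positions in um, count rate per bin in kHz).
    """
    t_s = np.asarray(arrival_times_ps, dtype=np.float64) * 1e-12
    bin_s = bin_ms * 1e-3
    # Assign every photon to the bin floor(t / bin width) and count per bin.
    bin_index = np.floor(t_s / bin_s).astype(np.int64)
    counts = np.bincount(bin_index)
    centres_s = (np.arange(counts.size) + 0.5) * bin_s
    positions_um = start_um + scan_speed_um_s * centres_s
    rate_khz = counts / bin_ms  # photons per ms equals kHz
    return positions_um, rate_khz
```

Using `np.bincount` on integer bin indices avoids constructing explicit bin edges, so photons falling exactly on a boundary are assigned unambiguously to the later bin.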

Data analysis
Data analysis and plotting were performed in Python. In continuous scan experiments, photon recordings were binned in 1-ms intervals to obtain intensity readouts, from which diffusion profiles were generated by plotting the fluorescence intensity as a function of chip position. In step scan experiments, diffusion profiles were created by extracting single-molecule events from the recorded time trace at each position using a burst-search algorithm and plotting the number of counted molecules as a function of chip position. The burst-search algorithm identifies single molecules in the photon time trace by applying a combined threshold on the maximum inter-photon time (IPTmax) and the minimum total number of photons (Nmin). Unless otherwise stated, IPTmax and Nmin were in the range of 0.005-0.02 ms and 5-20 photons, respectively, for all experiments performed under single-molecule conditions. In addition, a Lee filter of 2-4 was applied, which smooths regions of constant signal while leaving regions with rapid changes (such as the edges of bursts) unaffected.
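A combined IPTmax/Nmin burst search of the kind described above can be sketched as follows: consecutive photons separated by no more than IPTmax are grouped into runs, and runs with at least Nmin photons are kept as bursts. This is a minimal illustration, not the authors' implementation; in particular, the Lee filter is omitted here, and the function name and default parameter values (chosen within the ranges stated above) are illustrative.

```python
import numpy as np


def burst_search(arrival_times_ms, ipt_max_ms=0.01, n_min=10):
    """Find bursts as runs of photons with inter-photon times <= ipt_max_ms.

    Runs containing at least n_min photons are kept. Returns a list of
    (start index, stop index exclusive) into arrival_times_ms.
    No Lee filter is applied in this sketch.
    """
    t = np.asarray(arrival_times_ms, dtype=float)
    if t.size < 2:
        return []
    # close[i] is True when photon i+1 follows photon i within ipt_max_ms.
    close = np.diff(t) <= ipt_max_ms
    bursts = []
    start = None
    for i, is_close in enumerate(close):
        if is_close and start is None:
            start = i  # photon i opens a candidate burst
        elif not is_close and start is not None:
            if i + 1 - start >= n_min:  # photons start..i inclusive
                bursts.append((start, i + 1))
            start = None
    # Close a run that extends to the last photon of the trace.
    if start is not None and t.size - start >= n_min:
        bursts.append((start, t.size))
    return bursts
```

For a step scan, `len(burst_search(...))` at each channel position gives the number of detected molecules, and plotting these counts against position yields the diffusion profile.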
To extract size information, the diffusion profiles from both continuous and step scan experiments were analyzed with custom-written analysis software, freely available on GitHub (https://github.com/impact27/diffusion_device). The software fits the measured diffusion profiles with simulated profiles obtained from numerical simulations solving the advection-diffusion equations for mass transport under flow. A least-squares algorithm identifies the simulated profile with the lowest residuals, which determines D; RH is then recovered via the Stokes-Einstein relationship.
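The least-squares selection step can be illustrated with a deliberately simplified model. Instead of the numerical advection-diffusion simulations used by the software above, this sketch swaps in an analytical one-dimensional solution for an initially top-hat sample stream in an unbounded channel, which ignores the channel walls, the parabolic flow profile, and the three-dimensional geometry. It is not the API of the diffusion_device package; the function names, the grid search over D, and the normalization by the profile sum (valid for a uniform position grid) are all illustrative choices.

```python
import numpy as np
from scipy.special import erf


def top_hat_profile(x_um, t_s, d_um2_s, half_width_um):
    """1D diffusion of an initially top-hat stream of half-width h (no walls, t_s > 0)."""
    s = np.sqrt(4.0 * d_um2_s * t_s)
    return 0.5 * (erf((half_width_um - x_um) / s) + erf((half_width_um + x_um) / s))


def fit_diffusion_coefficient(x_um, times_s, profiles, half_width_um, d_grid_um2_s):
    """Return the D from d_grid_um2_s that minimizes the summed squared residuals."""
    residuals = []
    for d in d_grid_um2_s:
        sse = 0.0
        for t, measured in zip(times_s, profiles):
            model = top_hat_profile(x_um, t, d, half_width_um)
            # Normalize both profiles to unit sum so only their shapes are compared.
            model = model / model.sum()
            data = np.asarray(measured, dtype=float)
            data = data / data.sum()
            sse += np.sum((data - model) ** 2)
        residuals.append(sse)
    return d_grid_um2_s[int(np.argmin(residuals))]
```

Fitting several channel positions (travel times) simultaneously, as done here, constrains D more tightly than a single profile would; the returned D can then be converted to RH with the Stokes-Einstein relation.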