Introduction

Single Molecule Localization Microscopy (SMLM) has boosted our insights into cellular structures below the diffraction limit of light microscopy1. Common to all SMLM variants is the stochastic switching of single dye molecules between a bright and a dark state. Conditions are chosen such that only a marginal portion of the molecules is in the bright state, so that single molecule signals are well separated on each frame. The final superresolution image is reconstructed from the localizations of all single molecule signals.

Researchers have been particularly intrigued by the possibility to determine the spatial distribution of biomolecules in their natural environment, in most cases the intact cell. For example, models for cellular signaling are crucially affected by the spatial organization of receptor and downstream signaling molecules at the plasma membrane2,3. Application of SMLM to various plasma membrane proteins revealed the presence of nanoclusters to different degrees4. More recently concerns were raised that the stochastic activation process of the fluorophores, along with the presence of more than one dye molecule per labeled biomolecule, may lead to multiple observations of the same biomolecule in the superresolution image5,6. Different attempts were undertaken to approach this problem5,7,8,9,10,11, e.g. by merging localization bursts into one localization12, by analyzing the number of blinking events per localization cluster10,11, or by evaluating the spatial spread of the localization clusters7. A disadvantage of existing methods is the requirement of user-defined parameters7,12 or additional experiments to characterize the blinking statistics of the chosen fluorophores10,11. We recently developed a parameter-free method to identify global protein clustering based on a label titration approach8 (see also Spahn et al.9), however, in case of faint bimolecular clustering the discrimination is difficult and rather subjective. Taken together, it would be helpful to provide a parameter-free quantitative assessment for the reliability of the statement, whether biomolecular nanoclusters occur in an image or not.

Here we present a method to assess biomolecular nanoclustering in two-dimensional SMLM via p-values in the framework of statistical significance tests, termed 2-Color Localization microscopy And Significance Testing Approach (2-CLASTA). The idea is to target the same biomolecule of interest with different fluorescent labels, determine the localizations in the respective color channels and calculate the nearest neighbor distances between them. The test compares the nearest neighbor distances for the recorded data with the distances for a randomized data set calculated from the measured data. As an output, the method provides a p-value for the null hypothesis that the experimental data set corresponds to an underlying biomolecular distribution, which is not significantly different from a completely random distribution as described by a spatial Poisson process. In this respect, 2-CLASTA differs from existing quantitative approaches, which typically aim at determining quantitative parameters before actually testing the mere presence of biomolecular clusters. The method is parameter-free and does not require any additional measurements nor grouping of localizations. We validated the method experimentally in cells expressing artificially clustered proteins by showing that sizes down to 2 molecules per cluster can be reliably detected.

Results

Testing the null hypothesis of a random biomolecular distribution

Labeling the biomolecule of interest in two different colors yields different two-color SMLM images for a random versus a clustered biomolecular distribution (Fig. 1a). Both images show clear clustering of localizations in each of the color channels due to multiple observations of single dye molecules. The localization clusters of different color, however, correlate only in case of an underlying clustered distribution of biomolecules. As a quantitative measure of this correlation we used the empirical cumulative distribution function \(cdf\) of the nearest neighbor distance \(r\) between the localizations of the two different color channels. Importantly though, \(cdf(r)\) not only depends on the spatial distribution of the labeled biomolecule. Particularly, the blinking statistics of the fluorophore and the number of dye molecules conjugated to the biomolecule of interest affect the distribution functions. Since these parameters are commonly unknown, the different contributions to \(cdf(r)\) are difficult to disentangle.

Figure 1
figure 1

Analysis of localization maps with 2-CLASTA. (a) Simulated two-color localization maps for a random (left column) and a clustered (right column) distribution of biomolecules. Localization maps  show a 2 × 2 µm2 region. For the simulation of blinking we used experimental data obtained for SNAP488 (blue channel) and SNAP647 (red channel). (b) Shifting all localizations of the blue color channel by the shift vector \(\overrightarrow{v}\) breaks correlations between the two color channels. (c) The cumulative distribution function of nearest neighbor distances, r, between the two color channels is plotted in green for the localization data shown in (a). \(cd{f}_{rand}\) of N = 99 control curves, generated with randomly chosen toroidal shifts, are depicted in light gray. The mean of all control curves is shown in black. From the rank of the curves, we calculated a p-value of p = 0.50 for the random case, and p = 0.01 for the clustered case.

To analyze the data, we hence opted for a strategy which is independent of prior information on label properties. The idea is to determine a randomized distribution function \(cd{f}_{rand}(r)\) for a scenario in which correlations between the two color channels are broken, by directly using the experimental data contained in the original SMLM recording. Our approach is similar to a goodness-of-fit test, in which the experimental data are compared with Monte Carlo-simulated control data sets using a global deviation measure for calculation of a p-value13.

In order to construct a randomized two color data set we transformed the localizations of one color channel and calculated their nearest neighbor distances to the untransformed localizations of the other color channel. As transformation we used a so-called toroidal shift14, which denotes a shift by an arbitrary distance in an arbitrary direction, assuming periodic boundary conditions (Fig. 1b). This breaks potential correlations between the two color channels while conserving the univariate pattern characteristics. The resulting \(cd{f}_{rand}(r)\) implicitly accounts for the correct blinking statistics and degree of labeling, and can hence be taken as ground truth for the situation of two uncorrelated images, irrespective of their univariate clustering that may be present in each color channel itself. Ideally, for a completely random protein distribution the cumulative density functions are equal (\(cdf(r)=cd{f}_{rand}(r)\)), whereas for a non-random distribution they are not (\(cdf(r)\ne cd{f}_{rand}(r)\)). Note that \(cd{f}_{rand}(r)\) does not need to correspond to a truly random distribution of molecules.

For the statistical assessment, we compared the original empirical \(cdf(r)\) with a set of N = 99 realizations of \(cd{f}_{rand,i}\) (\(i=1,\ldots ,N\)) for random choices of the toroidal shift vector \(\overrightarrow{v}\) (Fig. 1c) and ranked the summary statistics \({g}_{data}\) of the original curve (green) with respect to the control curves (gray) (see Methods). Since we are interested in nanoclustering of biomolecules, we determined the one-sided p-value by ranking the original \(cdf(r)\) with respect to all calculated \(cd{f}_{rand,i}\); the rank is measured in descending order (note that the method also allows for assessing biomolecular repulsion by calculating the rank in ascending order). In practice, prior knowledge on cluster sizes can be taken into account e.g. by constraining the analysis to short distances. For this, we introduced a parameter \({r}_{max}\), which should be chosen close to the sum of the localization errors and the expected cluster size. Here we ignored prior knowledge and set \({r}_{max}\) to the maximum occurring distance, if not mentioned otherwise.

Naturally, the p-value as defined here is limited to discrete numbers with steps of \(\frac{1}{N+1}\), which also defines the minimum p-value obtainable with this method. As expected, the p-value is uniformly distributed in the interval \([0,1]\), when testing realizations of the null hypothesis against the null hypothesis itself (Fig. S2). Hence, this p-value allows for the correct interpretation of the significance level α as the probability of falsely rejecting the null hypothesis. α can also be interpreted as the inevitable false positive rate for the erroneous detection of overcounting-induced clustering for a random distribution of biomolecules. Taken together, by offering an appropriate significance test, 2-CLASTA is hardly susceptible to the inadvertent interpretation of localization clusters as biomolecular nanoclusters.

Sufficient sensitivity to detect even faint spatial biomolecular clustering is crucial to this test. We assessed the sensitivity (also frequently termed power) of 2-CLASTA for two clustering scenarios: i) biomolecular oligomerization (dimers, trimers, and tetramers), and ii) spatially extended clusters with varying load. The spatial distribution of the biomolecules and the according localization maps were generated with Monte Carlo simulations and evaluated with 2-CLASTA. We quantified the test performance via the sensitivity defined as \(sensitivity=\frac{tp}{tp+fn}\), with \(tp\) denoting the true positives (here defined as correctly detected clustering) and \(fn\) the false negatives (here defined as erroneously missed clustering). We used a significance level α = 0.05 in the following.

Sensitivity to detect biomolecular oligomerization

We first assessed the sensitivity of 2-CLASTA to detected different degrees of oligomerization. For this, we simulated 10 × 10 µm2 sized localization maps  containing randomly distributed dimers, trimers, or tetramers, assigned labels of the two colors with the according blinking statistics and added localization errors. Each localization map can be considered as a realization of a two-color superresolution experiment. The localization maps were analyzed by 2-CLASTA, yielding a p-value for each localization map and the sensitivity for each parameter set. We showcased the performance of the method with a simulated “ideal” scenario, which lacks the presence of unspecific signals. We further assumed a global degree of labeling of 100%, i.e. each biomolecule was represented either by a blue or a red label, yielding localizations according to the experimentally derived blinking statistics for SNAP488 and SNAP647, respectively (Fig. S1). If not specified otherwise, we simulated a balanced labeling ratio for the two colors. Unspecifically bound fluorophores and background signals may be present in the final localization maps, which may affect the obtained statistics. We hence also analyzed a more “realistic” scenario, for which we added 5 unspecifically bound labels per µm2 in each color channel, and 1 or 2 unspecific background signals per µm2 in the red or blue color channel, respectively; the characteristics of the unspecific background signals were experimentally determined on unstained cells. For the “realistic” case, we further assumed a reduced global degree of labeling of 40%. All simulation parameters are listed in the Materials and Methods section under the subheading Simulations.

Firstly, we were interested in the total number of biomolecules per image that are required for a reliable detection of oligomerization. Already low numbers of biomolecules of ~1,000 per image (corresponding to 10 molecules per µm2) allow for a sensitive detection even of dimerization, both for the “ideal” and the “realistic” scenario (Fig. 2a). As expected, the sensitivity is somewhat reduced with decreasing degree of oligomerization: this is a consequence of the reduced fraction of oligomers carrying two different labels, particularly for the “realistic” scenario. For the following simulations, we used 7,500 molecules per image (75 molecules/µm2). We next tested the influence of a reduced labeling efficiency. In general, sensitivity was found to be high even down to a labeling degree of ~20% (Fig. 2b). To test the influence of different blinking statistics or of multiple dye molecules per label, we compared the simulation results obtained with the blinking statistics of the pair SNAP488 and SNAP647 with results obtained using the blinking behavior of the pair PS-CFP2 (blue channel) and an Alexa Fluor 647-conjugated antibody (KT3647, red channel)15, yielding virtually identical curves (Fig. S3). We also tested artificial blinking statistics assuming log-normal distributed numbers of localizations for each label16. Even for the rather extreme case of ten-fold difference in the mean number of localizations per biomolecule we observed only minor effects on the sensitivity of our method.

Figure 2
figure 2

Robustness of 2-CLASTA for the detection of different degrees of oligomerization. To assess the influence of individual parameters we determined the sensitivity as a function of the number of molecules (a), the labeling efficiency (b), the labeling ratio (c), and directional stage drift (d). We simulated dimers (+), trimers (▲) and tetramers (■), both for the “ideal” (solid line) and the “realistic” scenario (dashed line). For panel (d), virtually all simulated scenarios yielded a sensitivity of 1. If not varied in the respective subpanel, parameters in all simulations were set to a molecular density of 75 molecules/µm2, a labeling efficiency of 40% for the real case and 100% for the ideal case, a labeling ratio of 1:1, and no stage drift. Each data point corresponds to 100 independent simulations.

Importantly, all blinking statistics were obtained under low labelling conditions in cells and hence adequately reflect the variability in the local environments of the dye molecules.

In a real-life experiment, differently colored ligands may have different affinities for the target biomolecule, leading to unbalanced labelling. While this can be compensated experimentally by adjusting the concentrations of the two labels, we found the sensitivity to remain high also for unbalanced labeling (Fig. 2c). Next, we also tested the influence of randomly distributed unspecific labels added onto the simulated oligomer distributions, yielding only marginal influences (Fig. S4a). Further, the magnitude of the localization errors hardly affects the obtained results (Fig. S4b). Interestingly, an assessment of the influence of stage drift showed that drift-velocities of up to 500 nm over 10.000 frames hardly affected the test sensitivity (Fig. 2d). This is not unexpected, as moderate drift hardly diminishes the correlations between the two color channels in an experiment performed at alternating laser excitation. Moderate chromatic aberrations have a negligible effect on the sensitivity of 2-CLASTA (Fig. S4c,d). We also evaluated the influence of different values of \({r}_{max}\) on the sensitivity of the method, yielding only minor effects (Fig. S5). Finally, we compared different test statistics in their sensitivity to detect dimers. Using k-nearest neighbor statistics (k = 3,5,10) or Lcross statistics17 did not improve sensitivity (Fig. S6); note that Lcross statistics is not parameter-free.

Sensitivity to detect areas of enrichment or depletion of biomolecules

As a second realization of a non-random spatial distribution of biomolecules we considered spatially extended circular domains, the centers of which were randomly distributed across a two-dimensional plane. Molecules were placed either inside or outside of the domains, which thereby represent areas enriched or depleted in biomolecules compared to the surface density outside of the domains. To facilitate comparison with our previously published approach8,15, we used here the same parameter settings for assessing the performance of 2-CLASTA: we varied the domain radius between 20 nm and 150 nm, the domain density between 3 and 20 domains per µm2, and the fraction of molecules in domains between 20% and 100% (Fig. S7). The average density of biomolecules across the whole field of view was kept constant at 75 molecules per µm2.

In general, virtually all scenarios with a substantial heterogeneity in the lateral distribution of the biomolecule can be detected by 2-CLASTA (Figs. 3a and S8): both biomolecular clustering (top right corner) and exclusion areas (bottom left corner) yield a high level of correctly identified scenarios. In particular, the new method even outperforms our previous approach based on label titration, as can be seen by comparing the new figures with the respective plots from our previous paper (Supplementary Figs. 5 and 6)15.

Figure 3
figure 3

Sensitivity of 2-CLASTA to detect protein enrichment or depletion. (a) We determined the sensitivity of 2-CLASTA for varying densities of circular domains and percentage of molecules inside the domains. Data are shown for a cluster radius of 100 nm for the “ideal” case and the “realistic” case (see Fig. S8 & S9 for other cluster radii). (b) The sensitivity for the detection of rectangular clusters with a size of 80 × 400 nm2 is shown for the ideal case and the “realistic” case. Numbers in individual fields indicate the average number of molecules per domain, and the relative enrichment or depletion of molecules compared to a random distribution with identical average density. The gray sale indicates the fraction of scenarios with a p-value below the significance level α = 0.05, reflecting the sensitivity. Each field corresponds to 100 independent simulations.

The diagonal in Fig. 3a represents scenarios, in which the biomolecular concentration inside the domains is similar to the concentration outside of the domains. In other words, these situations correspond to random distributions of biomolecules, which – if detected – would lead to false positive results. Per definition, a random distribution leads to a false positive rate that is identical to the chosen level of significance (here α = 0.05). Indeed, for scenarios corresponding to identical biomolecular densities (±10%) inside versus outside the domains we obtained sensitivity values close to α, hence reaching the principal limit for analyzing a statistical data set.

Again, we simulated a more “realistic” scenario as defined above, yielding similar results as for the ideal scenarios (Figs. 3a and S9). Also in the case of extended areas of enrichment or depletion the localization error had only marginal influence on the sensitivity (Fig. S10). To assess whether the use of different fluorescent labels with altered photophysical properties affect the results, we repeated the simulations both for the “ideal” and the „realistic“ case using the blinking statistics derived previously for a multi-labelled antibody and the photoactivatable protein PS-CFP215, yielding virtually unchanged results (Fig. S11). Finally, we tested the algorithm on rectangular clusters of 80 × 400 nm2 size (Fig. 3b), yielding similar sensitivity as for circular domains of the same area coverage. In conclusion, the new approach allows for reliable detection of even faint biomolecular clustering, and is not susceptible to false positives due to overcounting artifacts.

Experimental validation

For experimental validation of the 2-CLASTA approach, we mimicked protein monomers and oligomers by concatemers of SNAP-tags with 1 to 4 subunits. These concatemers were anchored in the plasma membrane of HeLa cells via a glycosyl-phosphatidylinositol- (GPI-) anchor. For example, SNAP-concatemers of 4 SNAP-tag subunits would correspond to tetrameric protein oligomers. Clustering of the GPI anchor per se is not expected7,8. All experiments were performed at similar labeling densities of SNAP-Surface Alexa Fluor 488 (SNAP488) and SNAP-Surface Alexa Fluor 647 (SNAP647). dSTORM experiments were performed at alternating excitation, yielding superresolution images of the two color channels (Fig. 4a). For each concatemer, we recorded ≥25 cells, and determined the according p-value for the null-hypothesis of a random protein distribution, as described above (Fig. 4b). For SNAP-monomers, we observed a uniform distribution of p-values in the interval \([0,1]\), hence providing no indication for a non-random distribution. In contrast, dimeric, trimeric, and tetrameric SNAP-constructs yielded clear deviations from a uniform distribution, with a substantial peak at low p-values. This reflects the expected signature for an underlying non-random distribution of SNAP-tags.

Figure 4
figure 4

2-CLASTA analysis of an experimental data set. We analyzed GPI-anchored concatemers of SNAP-tags with n = 1 to 4 subunits expressed in HeLa cells as mimicry of n-mers. For dSTORM experiments, cells were labeled with SNAP488 and SNAP647. Panels (a) show two-color localization maps for representative cells, and panels (b) histograms of p-values obtained from at least 4 independent experiments per n-mer. Scale bars 250 nm (inset) and 2 µm.

There is, however, a non-negligible fraction of cells which show p-values > 0.05, even in the case of oligomeric SNAP constructs. This effect is rather prominent for dimers and decreases with increasing degree of oligomerization. In a practical situation, however, one should note that different cells show different protein expression levels, thereby yielding a variability in the number of molecules within the region of interest. As shown in Fig. 2a, a low number of molecules would reduce the sensitivity for the detection of oligomers, or – in other words – would likely yield a high p-value. Indeed, when plotting the obtained p-value versus the number of localizations obtained per cell, we found a trend for high p-values at low localization numbers, which became more pronounced with increasing degree of oligomerization (Fig. S12). Particularly, for appr. 1,000 molecules per image – corresponding to appr. 5,000 localizations, we expect reduced sensitivity, which agrees with Fig. S12.

Discussion

We present here a parameter-free method to statistically assess the question whether biomolecules are distributed randomly in two dimensions, yielding a p-value as output parameter. As for all SMLM methods, live cell applications are inherently restricted by dynamic cellular processes within the rather long recording times. The method is compatible with most fluorescence labeling techniques, as long as it is ensured that each protein molecule is connected to one color channel only: this includes fluorescent antibodies or nanobodies, tags, or low affinity binders18.

Up to now, statistical detection of biomolecular nanoclustering demanded for obtaining the localization maps of a truly random protein distribution as a reliable standard for comparison with the experimental data. If such a distribution was available, comparative analysis such as Rényi divergence19 would be feasible. It turns out, however, that localization maps - as they would result from a truly random biomolecular distribution - are difficult to obtain, particularly since the photophysics of organic dyes often changes with the local environment of the chromophore20. We circumvented the problem by analyzing not the images themselves, but a correlation metric between the localizations of the two color channels (in our case, the nearest neighbor distances). In principle, also other metrics could be used for the significance test (e.g. pair cross-correlation analysis7,21 or Ripley’s covariate analysis17), especially for testing deviations on length scales beyond the nearest neighbors.

The chosen analysis strategy based on correlation metrics offers the advantage that potential correlations between the two color channels can be deliberately broken, here by applying a toroidal shift to one of the two color channels. By this, the univariate spatial structure of each localization pattern is conserved, while possible correlations are removed. This provides the possibility of significance testing between the original data and the randomized control data sets as an additional advantage.

To make the method immediately applicable, we provide a plugin for ImageJ (see Supporting Material). The experimental basis is a chromatically corrected two-color SMLM data-set analyzed by standard single molecule localization tools22.

In the following, we give a brief discussion on the strengths and potential pitfalls of our approach:

Strengths

  1. (i)

    2-CLASTA is stable against errors in two-color image registration or drift correction. As long as these errors are smaller than typical cross-correlation distances of the two color-channels, the effects on the obtained p-values are marginal.

  2. (ii)

    The sensitivity of 2-CLASTA is virtually unaffected by increased localization errors. Such errors do not change the physical separation of the biomolecules, but only distribute the single molecule localizations over larger areas, while preserving the correlation of the two color channels. As the method is essentially sensitive to the biomolecular separation, localization errors have no strong effect on the sensitivity.

  3. (iii)

    2-CLASTA is not impaired by blinking dye molecules, and does not require the recording of single molecule blinking statistics (as e.g. in the methods published in refs. 5,10,11), making it insensitive to overcounting problems. In addition, 2-CLASTA can directly be applied to single images, thereby simplifying experimental efforts compared to our previously published method of label density variation8.

  4. (iv)

    The sensitivity of 2-CLASTA is not affected by any unknown characteristics of the clusters. No assumptions on cluster parameters (size, shape, occupancy) are required for the test. The test performs well even down to the detection of dimers, reflecting the smallest possible clusters.

  5. (v)

    2-CLASTA is stable against real life experimental challenges: A typical experiment contains non-specific localizations, or false negatives as a consequence of insufficient degree of labeling. Also the labeling ratio of the two colors may be unbalanced. We extensively tested the influence of such issues in Monte Carlo simulations, and found that the test is very robust over a wide range of parameters.

Potential pitfalls

The sample topography may influence the obtained results: Without further information, it is reasonable to assume a completely random distribution of biomolecules on a flat two-dimensional surface parallel to the focal plane as the null hypothesis of the test. Randomly distributed biomolecules on an arbitrary two-dimensional manifold, however, may lead to virtual clustering in the projection onto a two-dimensional plane. For example, invaginations of the plasma membrane or height differences near the cell borders will cause the accumulation of the detected positions of membrane proteins in the 2D projection23 and hence will likely lead to a rejection of the null hypothesis. Also, care should be taken when selecting the region of interest: ideally, a central region of the cell should be chosen for analysis, avoiding cell edges and apparent vesicular structures. In principle, whenever reasonable one may further restrict the region of interest in order to specifically scrutinize subcellular structures (e.g. synapses) for the presence of local biomolecular clustering. This comes, however, at the cost of lower molecule numbers, hence reducing the sensitivity.

Conclusion

Taken together, we believe that the 2-CLASTA approach is well suited for a qualitative assessment of spatial two-dimensional biomolecular distributions, before more sophisticated methods are used to characterize the clustering quantitatively6. It thereby addresses an interest by the community to include hypothesis testing in SMLM analysis24. By providing p-values, it makes use of the appropriate statistical parameter to test whether a specific data set is in agreement with a particular hypothesis25. Here, small p-values indicate suspicious deviations from randomness. Large p-values, in contrast, do not indicate any peculiarities in the sample; most notably, they do not prove a spatially random distribution of biomolecules. One should note that care has to be taken when interpreting the results of significance tests25,26. As a particular example, fishing for data sets with small p-values should be avoided.

A further application of 2-CLASTA is the analysis of co-localization of two different types of biomolecules: in this case, the two colors would be used to target the two different biomolecules. In this paper, we provide the framework to test for biomolecular association: extension towards assessment of biomolecular repulsion is straightforward and described in the Methods section.

Materials and Methods

Cell culture, DNA constructs, and reagents

All chemicals and cell culture supplies were from Sigma if not noted otherwise. All reagents for molecular cloning were from New England Biolabs. HeLa cells were purchased from DSMZ (ACC 57 Lot 23) and cultured in DMEM high glucose medium (D6439) supplemented with 10% fetal bovine serum (F7524) and 1 kU/ml Penicillin-Streptomycin (P4333). All cells were grown in a humidified atmosphere at 37 °C and 5% CO2.

For transient transfection of HeLa cells with GPI-anchored SNAP concatemers, we fused one or multiple copies of the SNAPf sequence to the N-terminus of the GPI-anchor signal of the human folate receptor. To this end, we carried out PCR to amplify the SNAPN9183S sequence from pSNAPf (N9183S) with >15 nt overhangs complementary to adjacent regions of the following SNAPf copy. We then used the Gibson assembly Master Mix (E2611) following the supplier’s instructions to iteratively insert multiple consecutive copies of the SNAPf sequence in frame with the GPI anchor. The resulting colonies were screened by site specific restriction digest using HindIII (R3104) to verify the number of inserted copies.

SNAP-Surface Alexa Fluor 488 (SNAP488) and SNAP-Surface Alexa Fluor 647 (SNAP647) were from New England BioLabs. Both labels were reconstituted in water-free DMSO (276855) at 10 mg/ml, aliquoted and stored at −20 °C until used.

STORM blinking buffer consisted of PBS, 50 mM β-Mercaptoethylamine (30070), 3% (v/v) OxyFluor (Oxyrase Inc., Mansfield, Ohio, U.S.A.), and 20% (v/v) sodium DL-lactate (L1375)27. The pH was adjusted to 8–8.5 using 1 M NaOH.

Sample preparation

Cells were transfected by reverse transfection using Turbofect (ThermoFisher, R0531) according to the supplier’s instructions with Opti-MEM (Gibco, 31985062) as serum-free growth medium. Briefly, cells were detached from tissue culture flasks using Accutase (A6964). Subsequently, approximately 50,000 cells were mixed with Turbofect-DNA complexes and seeded on fibronectin-coated (F1141) LabTek chambers (Nunc) and incubated overnight. The following day, cells were labeled for 30–45 min in the incubator with 50 nM SNAP488 and 1 µM SNAP647 diluted in cell culture medium. After labeling, cells were extensively washed with HBSS, and fixed with 4% formaldehyde (Thermo Scientific, R28908) and 0.2% glutaraldehyde (GA) for 30 min at room temperature. After another series of two washing steps, we added 450 µl freshly prepared STORM buffer immediately prior to imaging.

Superresolution microscopy and image reconstruction

A Zeiss Axiovert 200 microscope equipped with a 100x Plan-Apochromat (NA = 1.46) objective (Zeiss) was used for imaging samples in objective-based total internal reflection (TIR) configuration. TIR illumination was achieved by shifting the excitation beam parallel to the optical axis with a mirror mounted on a motorized table. The setup was further equipped with a 640 nm diode laser (Obis640, Coherent), a 405 nm diode laser (iBeam smart 405, Toptica) and a 488 nm diode laser (iBeam smart 488, Toptica). Laser lines were overlaid with an OBIS Galaxy beam combiner (Coherent). Laser intensity and timings were modulated using in-house developed LabVIEW software (National Instruments). To separate emission from excitation light, we used a dichroic mirror (Z488 647 RPC, Chroma). Images were split chromatically into two emission channels using an Optosplit2 (Cairn Research) with a dichroic mirror (DD640-FDi01-25 × 36, Semrock) and additional emission filters for each color channel (690/70 H and FF01-550/88-25, Chroma). All data was recorded on a back-illuminated EM-CCD camera (Andor iXon DU897-DV).

Typically, we recorded sequences of 20 000 frames in alternating excitation mode. Samples were illuminated repeatedly at 640 nm, 405 nm, and 488 nm with 2–3 kW/cm2 intensity (640 nm and 488 nm) and 3–5 W/cm2 (405 nm); intensities were measured in epi-configuration. We selected the illumination times in ranges of 3 ms–10 ms (640 nm), 3 ms–30 ms (488 nm), and 6 ms (405 nm). Time delays between consecutive illuminations were below 6 ms. The camera was read out after the 640 nm and after the 488 nm illumination, yielding 10 000 frames in each color channel. Thus, the total recording time for a full dataset ranged from 3 to 7 minutes. Only data from those frames were included in the analysis, in which well-separated single molecule signals were observable.

We recorded calibration images of immobilized fluorescent beads after each experiment (TetraSpeck Fluorescent Microspheres, life technologies, T14792) and registered the images as described previously28. Single molecule localization and image reconstruction was performed using the open-source ImageJ plugin ThunderSTORM29.

Quantitative analysis of single label blinking

We quantified single label blinking on HeLa cells expressing GPI-anchored SNAP-monomers, using the identical illumination protocol as for 2-color dSTORM recordings. To assure sufficient separation between individual label molecules, dSTORM experiments were performed at low labeling concentrations of either SNAP647 or SNAP488. To statistically quantify the blinking of SNAP647 and SNAP488, localizations from individual label molecules were grouped and quantified in MATLAB (R2019b, The MathWorks Inc., Natick, MA). We determined the first frame of appearance, the total number of detections per label (N) (Fig. S1), the time a label is detectable in consecutive frames (ton) and the time a label is not detectable (toff).

Calculation of p-values

We compared the positions of all localizations obtained in the red color channel \(({x}_{red},{y}_{red})\) with those obtained in the blue color channel \(({x}_{blue},{y}_{blue})\). For this, we calculated the distribution of distances, \({\rm{r}}\), from each red localization to the nearest blue localization, and determined the cumulative distribution function \(cdf(r)\). To determine the distribution of nearest neighbor distances under the null model we applied a so-called toroidal shift14 to the positions of the red color channel \(({\tilde{x}}_{red},{\tilde{y}}_{red})=({x}_{red},{y}_{red})+\overrightarrow{v}\), where \(\overrightarrow{v}=({x}_{shift},{y}_{shift})\) is the shift vector, with periodic boundary conditions set by the region of interest. Repeating this procedure \({\rm{N}}\) times with random shift vectors \(\overrightarrow{v}\,\)chosen uniformly within the region of interest, yielded \(N\) realizations of the null model of a random distribution of biomolecules. The according nearest neighbor distributions for each randomized control \(i\) was determined as described above, yielding \(cd{f}_{rand,i}(r)\). We defined the test statistic as the integral \(g={\int }_{0}^{{r}_{max}}cdf(r)dr\), since it captures the whole spectrum of distances, without bias. If not mentioned otherwise we set \({r}_{max}\) to the maximum nearest neighbor distance occurring during the whole analysis, making the method independent from any user-defined parameter (Fig. S5). To compare the distributions, we computed the value of the test statistic for the original data \({g}_{data}={\int }_{0}^{{r}_{max}}cd{f}_{data}(r)dr\). Calculating the test statistic for all randomized controls yielded the set \({G}_{rand}=\{{g}_{rand,i}={\int }_{0}^{{r}_{max}}cd{f}_{rand,i}(r)dr|i=1,\ldots ,N\}.\) We next determined \(rank({g}_{data},G),\) which is defined as the rank of \({g}_{data}\) within the set union \(G={G}_{rand}{\cup }^{}\{{g}_{data}\}\), where the statistical rank is measured in descending order. Naturally, \(p=\frac{rank({g}_{data},G)}{N+1}\) yields the one-sided p-value30, since for randomly distributed molecules \(\frac{rank({g}_{data},G)}{N+1}\) is uniformly distributed on the interval \([\frac{1}{N+1},1]\) (Fig. S2). The calculated p-value is limited to discrete numbers with steps of \(\frac{1}{N+1}\). Real molecular clustering results in a bias towards lower nearest neighbor distances, yielding low p-values as shown in Fig. 1c. In principle, the method can also be used to test for biomolecular repulsion; in this case, \({g}_{data}\) needs to be ranked within \(g\) in ascending order for calculation of the p-value.

To calculate a p-value from k-nearest neighbor statistics, cumulative distribution functions \(cdf(r)\) for the mean distance of the k-nearest neighbors (k = 3, 5 or 10) were processed and analyzed as described above. When using Lcross as basis of the test statistic, cumulative distribution functions \(cdf(L)\) with \(L={L}_{cross}({r}^{\ast })\) were generated for a radius of \({r}^{\ast }=50\,nm.\) The test statistic \(g\) was then calculated as \(g={\int }_{0}^{{L}_{max}}cdf(L)dL\). Since clustering results in high values of Lcross the \(rank({g}_{data},G)\) was measured in ascending order.

Simulations

Conceptually, simulations were performed as described previously15. All simulations were carried out in MATLAB (R2019b, The MathWorks Inc., Natick, MA) on a standard personal computer.

First, we simulated the underlying protein distributions for regions of 10 × 10 µm2, reflecting approximately the size of a typical cell. For all simulations we used 75 molecules per µm2, if not mentioned otherwise.

Simulation of oligomers

We distributed oligomers randomly within the region of interest, and assigned n biomolecules to each n-mer position (n = 1 to 4). A random distribution of biomolecules is naturally reflected by the case of n = 1.

Simulation of areas of enrichment or depletion of biomolecules

Circular domains with a radius of 20, 40, 60, 80, 100 or 150 nm were distributed randomly onto the region of interest with adjustable number of domains per µm2 (3, 5, 10, 15, 20 and 25). The number of biomolecules per domain was calculated from the total number of simulated molecules (here 7,500), the fraction of molecules inside domains (20, 40, 60, 80, 100%), and the number of simulated domains, assuming a Poissonian distribution. Biomolecules were distributed randomly within the domains. The remaining molecules were distributed randomly in the areas outside of the domains.

Second, two different types of labels, corresponding to the two colors, were assigned randomly to the molecules according to the specified labeling ratio, assuming Binomial statistics.

Third, to simulate blinking, we assigned a number of detections to each label (Fig. S1). For the simulations shown in Fig. S3 and Fig. S11, we used blinking statistics determined previously15 or artificial blinking statistics following a log-normal distribution. Localization errors were simulated by spreading these detections using a Gaussian profile centered on the molecule position with a width of 30 nm, which corresponds to typical localization errors achieved in SMLM experiments. We assumed identical localization errors for the two color channels.

Fourth, to account for experimental errors in the “realistic” scenarios, we included unspecifically bound labels at a mean density of 5 labels/µm2 for each color channel, assuming the blinking statistics determined for SNAP488 and SNAP647. We finally considered also false positive localizations by adding a background of 1 (2) signals/µm2 for the red (blue) color channel, again with experimentally determined blinking statistics obtained in unlabeled cells.

Fifth, to account for stage drift in Fig. 2d we assumed alternating laser excitation and hence added a global drift vector \(\overrightarrow{d}\) to the localizations of both color channels obtained at time \(t\) according to \(\overrightarrow{x}\to \overrightarrow{x}+\overrightarrow{d}\cdot t\).

Sixth, to account for residual chromatic aberrations in Fig. S4c,d, we displaced every localization of the red color channel by a vector, which was characterized by the vector field \((x\text{'},y\text{'})=\beta \cdot (x-{x}_{0},y-{y}_{0})\); we set \({x}_{0}={y}_{0}=2.5\,{\mu }m\). \(\beta \) was varied between 0 and 0.06.

If not mentioned otherwise, 100 simulations were performed for each experimental condition.

If not mentioned otherwise, we used the following set of parameters: 10 × 10 µm2 region of interest, 75 molecules per µm2, a balanced labeling ratio between the two color channels, no stage drift, 30 nm localization error (standard deviation), and the blinking statistics determined for SNAP488 and SNAP647 for the two color channels. For the “ideal” scenario we simulated 100% labeling efficiency, no unspecifically bound labels and no unspecific background signals. For the “realistic” scenario we simulated 40% labeling efficiency, 5 unspecifically bound labels per µm2 and color channel, and 1 or 2 unspecific background signals per µm2 in the red and blue color channel, respectively.