Probing the Heterogeneity of Protein Kinase Activation in Cells by Super-resolution Microscopy

Heterogeneity of mitogen-activated protein kinase (MAPK) activation in genetically identical cells, which occurs in response to epidermal growth factor receptor (EGFR) signaling, remains poorly understood. MAPK cascades integrate signals emanating from different EGFR spatial locations, including the plasma membrane and endocytic compartment. We previously hypothesized that in EGF-stimulated cells the MAPK phosphorylation (pMAPK) level and activity are largely determined by the spatial organization of the EGFR clusters within the cell. For experimental testing of this hypothesis, we used super-resolution microscopy to define EGFR clusters by receptor numbers (N) and average intracluster distances (d). From these data, we predicted the extent of pMAPK with 85% accuracy on a cell-to-cell basis with control data returning 54% accuracy (P < 0.001). For comparison, the prediction accuracy was only 61% (P = 0.382) when the diffraction-limited averaged fluorescence intensity/cluster was used. Large clusters (N ≥ 3) with d > 50 nm were most predictive for pMAPK level in cells. Electron microscopy revealed that these large clusters were primarily localized to the limiting membrane of multivesicular bodies (MVB). Many tighter packed dimers/multimers (d < 50 nm) were found on intraluminal vesicles within MVBs, where they were unlikely to activate MAPK because of the physical separation. Our results suggest that cell-to-cell differences in N and d contain crucial information to predict EGFR-activated cellular pMAPK levels and explain pMAPK heterogeneity in isogenic cells.


* S Supporting Information
ABSTRACT: Heterogeneity of mitogen-activated protein kinase (MAPK) activation in genetically identical cells, which occurs in response to epidermal growth factor receptor (EGFR) signaling, remains poorly understood. MAPK cascades integrate signals emanating from different EGFR spatial locations, including the plasma membrane and endocytic compartment. We previously hypothesized that in EGF-stimulated cells the MAPK phosphorylation (pMAPK) level and activity are largely determined by the spatial organization of the EGFR clusters within the cell. For experimental testing of this hypothesis, we used superresolution microscopy to define EGFR clusters by receptor numbers (N) and average intracluster distances (d). From these data, we predicted the extent of pMAPK with 85% accuracy on a cell-to-cell basis with control data returning 54% accuracy (P < 0.001). For comparison, the prediction accuracy was only 61% (P = 0.382) when the diffraction-limited averaged fluorescence intensity/cluster was used. Large clusters (N ≥ 3) with d > 50 nm were most predictive for pMAPK level in cells. Electron microscopy revealed that these large clusters were primarily localized to the limiting membrane of multivesicular bodies (MVB). Many tighter packed dimers/multimers (d < 50 nm) were found on intraluminal vesicles within MVBs, where they were unlikely to activate MAPK because of the physical separation. Our results suggest that cellto-cell differences in N and d contain crucial information to predict EGFR-activated cellular pMAPK levels and explain pMAPK heterogeneity in isogenic cells. KEYWORDS: cell-to-cell heterogeneity, EGFR, MAPK, super-resolution microscopy, Bayesian modeling I n recent years it has become evident that an inhomogeneous microenvironment, combined with the plasticity of the cancer genome, may lead to a significant degree of functional heterogeneity among cancer cells of the same tumor. 1 The magnitude and timing of the output signal commonly varies across a population of genetically identical cells. 2 The output signal integrates various receptor inputs in space and time and directs important physiological processes such as cell proliferation, migration, and survival among cells in culture. 3 Understanding the possible ranges of cellular output heterogeneity and what contributes to them mechanistically would represent a major leap forward in cell biology.
A particularly important example is the mitogen-activated protein kinase (MAPK) pathway, which is a signaling hub for multiple cues and spatiotemporally organized intracellular signal transmitters. 4 Activation of MAPK signaling governs gene expression, thereby controlling many physiological processes. 5 Overexpression of epidermal growth factor receptor (EGFR) has been reported for a number of different tumor types. 6 This has been postulated to lead to MAPK activation by allowing the receptors on the cell membrane to randomly collide and interact with one another with high frequency. 7 Nanoscale imaging of the plasma membrane using near-field scanning optical microscopy (NSOM) has recently shown that EGFR monomers are preferentially organized in ∼150 nm clusters (i.e., not randomly distributed) in both untreated and ligand-stimulated cells. 8 Several other studies applied scanning force microscopy to quantify EGFR clusters but have been restricted to monitoring their spatial organization only on the plasma membrane. 9, 10 We previously hypothesized that the MAPK signaling activity is largely determined by the spatial organization of the EGFR clusters (at a nanometer proximity scale), which, together with the associated Shc/Grb2/SOS/ RAS/RAF/MEK/MAPK complexes (that usually involve assembling scaffold proteins), will mainly determine the MAPK signaling output in individual cells. 11 Experimental validation of this hypothesis has not been achieved to date.
Nowadays super-resolution fluorescence imaging methods can achieve spatial resolutions in the tens of nanometers. 12−16 This enables the measurement of nanometer intracluster distances and the number of receptor molecules per cluster throughout a mammalian cell, which typically spreads in the tens of micrometer range, 17−19 identifying individual receptors and their clusters located on various structural components. A prior study combined super-resolution microscopy and singlemolecule FRET to investigate the dynamics and localization of activated EGFR dimers on the plasma membrane of live cells. 20 A different report applied STORM to provide mechanistic insights into EGFR cluster formation. 21 However, none of these studies have looked at endocytosed EGFRs trafficking to multivesicular bodies (MVBs) located in the perinuclear region of a cell. 22,23 EGFR clusters continue to signal from the endosomal and MVB membranes, activating MAPK via mechanisms that involve scaffold proteins such as MP1. 24−26 Little is known about the relationship between the cell-to-cell heterogeneity in the spatial organization of these EGFR clusters and the functional consequences in the cellular response on a single-cell level.
Here, we used generalized single-molecule high-resolution imaging with photobleaching (gSHRImP) 12,15 to characterize the intracellular heterogeneity in MAPK phosphorylation levels in response to EGF stimulation on a cell-by-cell basis. Quantum dot (QD) blinking has been successfully used as an alternative to photoswitching of organic fluorophores or photoactivatable proteins. 27 We quantified two EGFR cluster parameters, i.e., EGFR molecule number per cluster (N) and average intracluster distance (d) between any two individual EGFR molecules, and used them as input parameters for a Bayesian model to predict MAPK signaling output on a cellular basis. We also employed transmission electron microscopy (TEM) to determine the spatial location of EGFR clusters in the cells relative to their organelle structures. We were able to predict on a cell-by-cell level MAPK phosphorylation states on the basis of nanoscale organizational differences in EGFR clusters that are super-resolved. This lends support to and extends our previous proposition that the endosomal and MVB membrane localization of EGFR clusters can cause an increase in the number (or average lifetime) of signaling complexes, in a manner that is dependent on the spatial organization within the receptor oligomeric configuration. 11

RESULTS AND DISCUSSION
Cell-to-Cell Heterogeneity in Growth Factor-Stimulated MAPK Phosphorylation. MAPK (ERK1/ERK2) phosphorylation is a well-studied readout of MAPK pathway activation downstream of EGFR activation. We stimulated the monoclonal breast cancer cell line HCC1143 with EGF for various lengths of time at both 4 and 37°C and analyzed MAPK phosphorylation on a population level by immunoblotting (see Supporting Information A, Figure S1) and on a cellby-cell level by immunofluorescence ( Figure 1). The antibodies we used were specific for the residues pT202/pY204 (ERK1) and pT185/pY187 (ERK2), respectively. Treatment at 4°C was used to engage receptors with EGF but slow signaling response and receptor endocytosis. 28 On a population basis, MAPK phosphorylation peaked after 15 min of EGF treatment at 37°C ( Figure S1A,B). As expected, no MAPK Figure 1. Cell-to-cell heterogeneity of HCC1143 breast cancer cells in their MAPK signaling response subsequent to EGF stimulation. Each histogram was built from single-cell phospho-MAPK intensity levels (arbitrary units (a.u.)) from three independent experiments (9 confocal tiles per image, N ≥ 350 cells per condition for each experiment). The full width half-maximum (fwhm) of the antiphospho-MAPK intensity is a measure of cellular heterogeneity. Typical micrographs are shown.

ACS Nano
Article phosphorylation was observed by treating cells at 4°C, although phosphorylation of EGFR, immediate adaptor proteins, and some downstream targets (PI3K and PLCγ) at 15 and 30 min was previously reported. 29 Cellular heterogeneity of MAPK signaling was revealed by immunofluorescence microscopy after 15 or 30 min treatment at 37°C, as is demonstrated by the broadening of the anti-pMAPK intensity distributions ( Figure 1). Full width half-maximum (fwhm) was 138 arbitrary units (a.u.) when cells were treated at 4°C for 30 min. The fwhm increased to 524 au after 15 min at 37°C and narrowed upon longer treatment (30 min at 37°C) to 336 au; hence fwhm ratios (as a measure of histogram broadening) as compared to control conditions were 3.8 and 2.4, respectively. This shows an increase in range of observed pMAPK output values and reflects heterogenic behavior between these otherwise isogenic cells as compared to control conditions (4°C ). It is noteworthy that averaging of immunofluorescence intensities over many cells led to a result resembling immunoblotting ( Figure S1C), as both techniques are averaging over the whole population.
Several studies reported on the existence of higher order ErbB multimers and their functional relevance to signaling. 30−32 EGFR cluster formation has been reported to require EGFR kinase activity. 30,33 Members of the ErbB family have been shown to assemble into higher order nanostructures, but a precise structure−function relationship of these assemblies, in terms of how they specify signal output, remains unclear. 34,35 High-resolution imaging methods are required to investigate the significance of receptor nanoscale organization in regulating its function. Ranges of cluster diameters were measured with NSOM and found to have an average diameter of 150 ± 80 nm EGF-stimulated HeLa cells. 8 This study validated and complemented a prior report that estimated an average EGFR cluster density of 33/μm 2 with 10−30 EGFR receptors per cluster in the same cell line. 36 Quantitative Analysis of EGFR Nanoclusters by Superresolution Imaging. We employed super-resolution microscopy to visualize individual EGFR receptors and their oligomerization patterns on the spatial scales below the diffraction limit following EGF stimulation. 8 To fluorescently label EGFR, we followed a previously reported approach 37 to generate equimolar complexes of biotinylated EGF with streptavidin-QD565 (EBSQ) under carefully chosen reaction conditions (see Methods). The EBSQ complex was verified to be equivalent with EGF in stimulating pMAPK signaling responses and EGFR internalization (see Supporting Information B, Figure S2). The proximity between single EGFR molecules on the nanometer scale has been recognized as a prerequisite for receptor activation, and crystallographic studies have shown an asymmetric, ligand-induced activated EGFR dimer. 38,39 Nanopositioning of EBSQ molecules bound to EGFRs was based on the intrinsic capability of QDs to blink and was determined using generalized SHRImP 12,40 as described in Methods. Blinking was assigned to individual QDs based on the observed stepwise intensity changes before and after blinking events ( Our super-resolution technique (see Methods and Supporting Information C) allowed us to resolve QDs with up to 15 nm lateral resolution. 12 Here, the resolution is taken to be the smallest distance at which two EGFR molecules could be separated. We estimated the EBSQ complexes to be <20 nm

ACS Nano
Article (see Supporting Information D). A cluster is defined as the group of super-resolved receptors within a diffraction-limited spot. Intracluster distances (d ij ) were measured as distances between individual QDs using centroid coordinates of each localized QD. In order to correlate MAPK activation with EGFR cluster formation under different experimental conditions, we measured, first, MAPK phosphorylation of a cell using wide-field immunofluorescence microscopy as a surrogate for its MAPK pathway activation and, second, EGFR cluster organization by EBSQ super-resolution imaging. From the super-resolution images we then determined the number of EGFR molecules per cluster (N) and the average intracluster distance between EGFR monomers (d). When super-resolved, the vast majority of the diffraction-limited EBSQ spots imaged in HCC1143 cells incubated at 4°C were EGFR monomers (>70%), with 20% EGFR dimers and the remainder being trimers and, rarely, multimers ( Figure 3A). Upon EBSQ treatment at 37°C, this balance shifted dramatically at the Oligomers formed of more than 5 or 6 receptors could not always be resolved. (B) Trimers and fully resolvable multimers (i.e., clusters with N = 4−6; from here onward referred to as "N > 3" multimers) were analyzed for their intracluster distances between EGFR molecules. We categorized them as either d ≤ 50 nm or d > 50 nm. The relative numbers in those cluster categories change upon treatment. (C) Binary classification of each HCC1143 cell in each treatment condition into a "low" and "high" pMAPK intensity class. Data shown in all panels were obtained from 46 analyzed cells and 2164 clusters acquired in two independent experiments. The classes are significantly different; P < 0.0001 (unpaired two-tailed t-test with Welch's correction assuming unequal standard deviation). A positive weight implies that large values of that covariate are associated with the "high" class, whereas a negative weight indicates that large values of that covariate are associated with the "low" class. The weights are obtained by averaging over the weights obtained during leave-one-out cross-validation with all covariates. The 95% confidence intervals of the regression weights are plotted. The Bayesian analysis has been performed on imaging data acquired in two independent experiments.

ACS Nano
Article expense of EGFR monomers. Longer EBSQ stimulation at 37°C (30 min) led to a reduction of the number of dimers and trimers, but an increase in multimer numbers as compared to shorter treatment (15 min). These data are in line with previous reports of EGFR cluster formation upon EGF treatment. 33,41 Some of the larger, multimeric clusters could not be spatially resolved and, hence, could not be included in the prediction of pMAPK based on d or N and d combined. A number of clusters with N up to 4−6 were well resolved and included in the analysis ( Figure 3B). For simplicity, from here onward, we will refer to this particular cluster group as "N > 3".
Using the Number of EGFR Receptors per Cluster (N) to Predict Cellular MAPK Phosphorylation. Superresolution microscopy allowed us to determine two important parameters characterizing EGFR nanoclusters, N and d, which were used for a data-driven prediction of the MAPK phosphorylation level as an indicator of EGFR signaling. To test whether N or d, or both, contain any predictive information regarding the pMAPK intensity levels, we used a mathematical model based on a Bayesian linear classifier (BLC). 42,43 The BLC is an algorithm that attempts to find a linear combination of input covariates that can be used to predict which class each cell belongs to; classes were either "high/activated" or "low/ resting" MAPK phosphorylation. The pMAPK level was quantified by wide-field immunofluorescence microscopy on a cell-by-cell basis by measuring the average fluorescence intensity per unit area. Activated and resting cells separated in two groups that were significantly different ( Figure 3C; P < 0.0001), i.e., "high/activated" or "low/resting". The algorithm was first trained to make a classification decision. The cells were randomly separated into a training set and a validation set. Initially, the BLC was used to predict pMAPK using N as the only input variable. The four covariates used were the fraction of monomers (N = 1), dimers (N = 2), trimers (N = 3), and oligomers (N > 3) as determined by super-resolution microscopy for each individual cell. Figure 4A shows the training and validation success rates as a function of the number of covariates used in the BLC with the corresponding weights shown in Figure 4B. The trained BLC predicted "high" or "low" MAPK phosphorylation classes using these input variables with an accuracy of 65% (P = 0.090) in the validation set. To compare, a randomized data set of no predictive value yielded a prediction accuracy of 54%. Dimers and trimers were most predictive for EGF-induced MAPK phosphorylation, shown by the positive weights (see Methods).

Prediction of Cellular MAPK Phosphorylation Is Improved by Inclusion of EGFR Nanoscale Proximity
Information. First, we tested if intracluster distances d alone contain any information predictive for pMAPK. Applying BLC as before, we found d alone is not predictive for pMAPK (46%; P = 0.733; Supporting Information E, Figure S4). We then combined both parameters of the EGFR cluster, N and d, to test if this would improve prediction accuracy as compared to N alone. We divided the data into six subsets based on the various combinations between the number of receptors per cluster (N = 2, 3, or >3) and average intracluster distance (d ≤ 50 nm or d > 50 nm). Applying BLC with this new set of covariates, we achieved a prediction accuracy of 85% (P < 0.001, Figure 4C), which was much higher than what could be achieved with either of the two parameters individually. Combination of N = 4 and N = 3 into one covariate caused a drop in accuracy from 85% to 78% (the weight of the redefined covariate remains positive and statistically significant). Thus, we upheld the previous division of subsets. The probability of obtaining a "high" MAPK phosphorylation classification in an individual cell was observed to increase with increasing N until N became greater than 3 ( Figure 4A). The ranking of the covariate weights of the intracluster distance (d) within the N = 3 and N > 3 clusters revealed that d > 50 nm was positively correlated with a greater extent of MAPK phosphorylation ( Figure 4D; N = 3: P = 0.0465, significant; N > 3: P = 0.0601, trend); that is, trimers and, to a lesser extent, multimers with an average intracluster distance d > 50 nm were the top predictive classes. In contrast, clusters with d ≤ 50 nm were not predictive for MAPK phosphorylation. The quantitative analysis presented in Figure  4 was performed on pooled data obtained from two independent biological experiments for better statistics (individual experimental cohorts and their individual analyses are shown in Supporting Information F). The two replicates displayed a similar distribution of clusters across the covariate bins used ( Figure S5A) and achieved prediction accuracies of 89% ( Figure S5B) and 82% ( Figure S5C), respectively.
It is noteworthy that the same BLC-based prediction of pMAPK could also be performed using diffraction-limited fluorescence microscopy data. When we calculated the best achievable predictive accuracy by using averaged fluorescence intensities of EGFR clusters per cell as the only input, this resulted in a poor prediction accuracy of only 61% (P = 0.382), which was marginally above the randomized data set with no predictive value (54%). (B) Partitioning between MVB limiting membranes or ILV membranes for dimers, trimers, and multimers across all measured average intracluster distances; a box and whisker plot is shown with whiskers indicating minimum/maximum values. Localization on the limiting membrane of MVB becomes dominant with increasing average intracluster distance and number of receptors per cluster. Localization on the limiting membrane was found significantly different than localization on ILVs for trimers (P = 0.0247) using an unpaired two-tailed t-test with Welch's correction assuming unequal standard deviation.

Article
Considering that nanoscale proximity between EGFR molecules is a prerequisite for EGFR downstream signaling, 38,39 our data appeared to be counterintuitive. To find a possible explanation why trimers and multimers with average intracluster distances d > 50 nm are more predictive than dimers, trimers, and multimers with average intracluster distances d ≤ 50 nm, we investigated the intracellular trafficking behavior of the EGFR subpopulations of interest by direct visualization of their subcellular localization via TEM.
Top Predictive EGFR Subpopulations Localize Predominantly on the Limiting Membrane of MVBs. To understand the relationship between the number of EGFRs per cluster N, the average intracluster distance d, and the BLC weighting of the super-resolution microscopy data, we quantified the spatial distribution of clusters of N = 2, N = 3, and N > 3 EGFR receptors and correlated it to their location within endosomes as determined by TEM. TEM allowed us to acquire images of subcellular structures and visualize the partitioning of individual and clustered EGFRs between the limiting membranes of MVBs ( Figure S6) and intraluminal vesicles (ILVs) within the MVB lumen ( Figure 5A). The superresolution approach analyzes all EGFR regardless of position in the cell, while the requirement for ultrahigh magnification to visualize gold by TEM prevents parallel whole-cell analysis. We therefore only analyzed endocytosed EGFR gold particles (present on small endocytic vesicles and MVBs with one or more ILVs), which also minimized the possibility of analyzing EGFR that had not bound ligand. EGFR clusters were defined by all gold-labeled antibody-conjugated receptors with separation distances less than 22 nm between adjacent nanogold particles. The cutoff was empirically determined to sensibly assign particles to a cluster as visualized in the TEM images. After 30 min at 37°C the majority of N ≥ 3 multimers with larger intracluster distances displayed a tendency to localize on the limiting membrane of MVBs ( Figure S6), while many of the more compact clusters were sequestered on ILVs within the MVB lumen ( Figure 5B). The mean of the average size clusters with N ≥ 3 located on the limiting membrane was significantly larger than that of clusters located on ILVs (P = 0.0247, unpaired two-tailed t-test with Welch's correction). This is a good fit with our model, showing that EGFR trimer and multimer subpopulations with large intracluster distances (N > 3, d > 50 nm) are top predictors of MAPK phosphorylation.
We found the most predictive EGFR subpopulation to be N ≥ 3 clusters with average sizes larger than 50 nm (P = 0.0465). This first appeared counterintuitive considering EGFR signaling is initiated upon ligand binding to the receptor followed by its dimerization/multimerization. Importantly, TEM experiments allowed us to reveal the subcellular organization of the various subpopulations. While TEM does not have the quantitative statistical power of the super-resolution approach, it reveals the suborganellar EGFR distribution. TEM suggests a possible explanation for the predictive power of larger clusters (N ≥ 3, d > 50 nm) compared with smaller clusters. The predictive large clusters were mainly located on the limiting membrane of endocytic vesicles and multivesicular bodies, where they play an active role in signal transduction. In contrast, many dimers and densely packed EGFR multimers (N ≥ 3, d ≤ 50 nm) were located on ILVs and, thus, no longer contributed to the overall EGFR signaling response because of their physical separation from the vast majority of the MAPK pool, which resides in the cytosol.
The significance of the EGFR ≥ 3 multimers on the limiting membrane in triggering the MAPK signaling output may be explained by virtue of the associated Shc/Grb2/SOS/RAS/ RAF/MEK/MAPK complexes as we previously postulated through models. 11 Some corroborative evidence for this has previously been obtained using image correlation spectroscopy and Foerster resonance energy transfer imaging. This approach showed in EGF-stimulated BaF/3 cells that were stably transfected with Grb2 (Grb2-mRFP) and EGFR (EGFR-eGFP) nanometer-scale association of Grb2-mRFP with EGFR-eGFP multimers, which contained, on average, 4 ± 1 copies of EGFR-eGFP. 44 Many differences exist between this observation and our current work. These include differences in cell types, overexpressed 45 vs endogenous receptors (our study). Importantly, both report on a similar stoichiometry of the EGFR multimers that signal most efficiently in cells.
The spatial organization of cell surface receptors on variable length scales (from molecules to micrometers) plays an important role in cell signaling. 45 These cell-to-cell differences can provide a mechanism to control protein interactions and thus modulate signal transduction efficiency. Here, we were able to relate the spatial organization of the EGFR receptors to signaling dynamics and show that large EGFR trimers are the most predictive subpopulation for MAPK phosphorylation output. Our results strengthen the hypothesis that supramolecular receptor organization and spatial compartmentalization play a decisive role in signal transduction, thereby influencing cell fate and function. Our findings further reveal EGFR trimers and their intracluster distances to play a role as a marker of MAPK phosphorylation. Pending further validation in more complex samples such as cell mixtures and tissues, this finding has potential applications beyond mechanistic signal transduction studies. A better understanding of the role of differentially located EGFR multimers (limiting membrane vs ILV), e.g., by further dissecting the N and d information on our reported ILV-located EGFR in the context of chemotherapy resistance, 46 may have potential value in future clinical applications.

CONCLUSION
To better understand how signals are integrated and transmitted through signaling networks, we provide a framework for using super-resolution microscopy to access detailed spatial information about specific cellular molecules, and how to use it, in combination with mathematical modeling, to predict cellular outcomes. As a paradigm, we focused on how the spatial organization of the EGFR oligomeric network specifies the output signal through to MAPK phosphorylation in genetically identical cells. Ensemble behaviors of a population of cells may not reveal silent features of cell signaling. On the other hand, cell-to-cell differences are always present in any cell population and may or may not serve a biological function or contain meaningful information. Super-resolution imaging has been previously combined with TEM, 13 but we report here an integrated cell-by-cell approach between the two imaging techniques to extract detailed spatial information on EGFR distribution. Assembly of EGFRs into homodimers and smallsize multimers occurs on spatial scales in between those probed by FRET (<10 nm) and diffraction-limited microscopy (>200 nm). Our method, with a resolution of 15 nm, is an excellent approach for studying receptor multimerization at this spatial scale.

Article
Our approach was to analyze and interpret heterogeneity in cellular pMAPK levels by looking at the individual cell behavior and correlating it to the detailed spatial organization of superresolved EGFR clusters. Our finding is that the number of EGFR molecules per cluster N and the average EGFR intracluster distance d are highly informative for the prediction of the EGF-driven pMAPK output. N alone, d alone, and N and d combined were alternatively used as input parameters for a BLC to model, train, and validate the technique, resulting in prediction accuracies for MAPK phosphorylation of 65% (P = 0.090), 46% (P = 0.733), and 85% (P < 0.001), respectively. A randomized data set of no predictive value yielded an accuracy of 54%, demonstrating a 31% gain in accuracy if the N and d information is combined. The intracluster distance information is crucial in this prediction, as it improves accuracy from 65% (P = 0.090; with only EGFR numbers regarded) to 85% (P < 0.001). We and others have previously applied super-resolution imaging to visualize EGFR clusters. 20,21,27 Here we report using super-resolved cluster parameters to accurately predict the individual cells' MAPK phosphorylation levels. This approach of combining advanced imaging with mathematical modeling to understand systems level integration can be extended to improve our understanding of how many similar receptor tyrosine kinases function in various types of cancer.

METHODS
Reagents. Streptavidin-coated quantum dots of 565 nm and biotinylated EGF were from Life Technologies. Mouse anti-phospho p42/44 MAPK (clone MAPK-YT) antibody was from Sigma. Mouse anti-phospho p42/44-ERK1/2 and rabbit anti total-ERK1/2 antibodies were from Cell Signaling Technologies. Mouse monoclonal anti-EGFR antibody clone F4 binding to the sequence DVVDADADEY-LIPQ, which corresponds to EGFR amino acid residues 985−996, was obtained from the monoclonal antibody facility at Cancer Research UK. Donkey anti-mouse secondary antibodies conjugated to either DyLight594 or Cy2 were from Jackson Immuno Research, and goat anti-mouse-HRP and goat anti-rabbit-HRP secondary antibodies were from Dako. Unconjugated EGF was from Peprotech. Sphero polystyrene beads (average diameter 1.23 μm) was from Spherotech and Mowiol from Polysciences. All standard chemicals were from Sigma-Aldrich or VWR.
Cell Treatments and Sample Preparation for Superresolution Microscopy. Triple-negative HCC1143 (ATCC) cells/ cm 2 (30 000) were seeded in complete 10% FBS-containing growth medium (RPMI 1640 supplemented with 4.5 g/L glucose, 25 mM Lglutamine, and 100 IU penicillin/streptomycin) onto sterile acidtreated glass coverslips (20 mm diameter, glass thickness no. 1.5; Metzler). On the day of the experiment, cells were washed twice with serum-free growth medium and then treated with 10 nM preformed EGF-biotin:streptavidin-QD565 (EBSQ) complexes in serum-free growth medium at either 4 or 37°C for the indicated times. EBSQ complexes were formed by mixing equimolar amounts of prewashed streptavidin-QD565 with biotinylated EGF (both from Life Technologies). A dilute solution of biotinylated EGF (200 nM) was slowly pipetted using a Hamilton syringe (over minutes) into a more concentrated solution of streptavidin-QD565 (1 μM) while continuously stirring. The mixture was allowed to react for 30 min on ice. The equimolar reaction was carried out at a biotin:streptavidin ratio of 1:16−20. Reaction mixtures were purified from potential free EGF using P30 size exclusion spin columns (Biorad). After treatment, cells were washed three times with ice-cold Tris-buffered saline (TBS; 25 mM Tris/HCl, 150 mM NaCl, pH = 7.4) and fixed for 15 min in 4% (w/v) freshly prepared paraformaldehyde (PFA) dissolved in TBS. Subsequently, cells were permeabilized with 0.4% (v/v) Triton X-100 in TBS, washed twice with TBS, and blocked with 1% (w/v) bovine serum albumin (BSA) in TBS for 20 min at room temperature. Cells were then incubated with the indicated primary antibodies (in blocking solution, at 4°C for 16 h), washed three times with TBS, and then stained with the indicated secondary antibodies in blocking solution at room temperature for 45 min. After two washing steps each with TBS and distilled water, samples were mounted in Mowiol (Polysciences) and left to air-dry in the dark at room temperature overnight. To correct for stage drift, polystyrene beads (0.1% (w/v), average diameter 1.23 μm, Spherotech) were added to the mounting medium as fiduciary markers.
Confocal Fluorescence Microscopy. Confocal images were obtained using an upright Zeiss LSM 510 META confocal microscope, equipped with a blue 405 diode laser, an argon ion laser, and a green HeNe 543 nm laser using the recommended filter sets for imaging of Hoechst 33342, FITC/GFP, and TRITC/RFP. Images were taken with a Plan-Apochromat oil objective 63×, NA = 1.4 (Zeiss). Image processing was performed using the Fiji/ImageJ software.
gSHRImp Super-resolution Imaging with QD Blinking. Superresolution images were acquired using gSHRImP mostly as previously reported. 12,15 Here, specifically, intrinsic blinking of quantum dots instead of photobleaching of organic dyes was employed to achieve super-resolution. We used backward subtraction of the image frames to find single blinking events of individual QDs and stochastically resolve their localizations with nanometer accuracy. Drift correction ( Figure S3) was performed using bright-field scattering images of the nonfluorescent beads acquired on a distinct CCD camera (CV-A55 IR E, JAI Ltd.) synchronized to the fluorescence imaging CCD (iXon+, Andor) for the entire duration of the measurement. Images were acquired at a rate of 33 frames/s. Typically, we imaged areas with a minimum of three fluorescent beads in the field of view and plotted the calibration drift curve by averaging the beads' displacement in the x and y direction, respectively. Trajectories were extracted by tracking the beads' position at each frame using in-house written code running on the IDL platform version 8.4 (Exelis). All images were corrected for drift individually, and super-resolution images were plotted using inhouse-written software as previously reported. 15 Sample Preparation for Transmission Electron Microscopy. HCC1143 cells/cm 2 (300 000) were seeded on six-well Epon plates (Agar Scientific) in serum-containing growth medium. On the next day, cells were washed twice with serum-free growth medium before they were incubated with the anti-EGFR antibody directed against the extracellular EGFR domain (clone 108) that was conjugated to colloidal 10 nm gold particles as previously described. 47 Optimum gold loading into the cells was observed for 30 min of incubation with 80 nM EGF (Peprotech) present alongside the gold-conjugated EGFR antibody. Subsequently, cells were fixed by applying a mixture of 2% (w/v) PFA and 2% (v/v) glutaraldehyde in 0.1 M cacodylate buffer for 30 min. After washing twice with 0.1 M cacodylate buffer, cells were treated with 1% tannic acid in cacodylate buffer and prepared for TEM as previously reported. 48 Sections of 70 nm thickness were cut using a Leica EM UC microtome and imaged on a JEOL JEM 1010 electron microscope.
Cluster Analysis of TEM Data. EGFR cluster analysis was done using Fiji/ImageJ software with the Graph plugin. 49 Gold particles were manually selected and the distance matrix between all the particles was calculated. Adjacent particles separated by less than 50 nm between their centroids were classified as connected components and assigned to the same cluster. The number of particles per cluster (N EM ) and the average intracluster distance (d EM ) were determined for a total of 10 cells and 702 clusters. The average intracluster distance was calculated as the average of all distances between all particles within the cluster.
Bayesian Linear Classifier Analysis. A Bayesian linear classifier 43 was used to predict MAPK activation status on a cell-by-cell basis. Cells were assigned to a "high" or "low" binary class depending on MAPK phosphorylation (fluorescence intensity averaged over the entire cell). The fluorescence intensity was calculated from a single frame, right after the exposure to the excitation light and before any photobleaching could occur. The threshold for separating the two classes represented the halfway value between the medians of the EGF treated and untreated cell populations and was determined to be 2396 a.u. A cell belonged to the "high" class if its average fluorescence

ACS Nano
Article intensity per pixel was higher, and to the "low" class if lower, than the threshold value. MAPK phosphorylation prediction was based on the various covariates derived from super-resolution imaging of EGFR clusters, i.e., proportions of clusters in each of the N-mer or N−d subclasses. The BLC seeks an optimal linear relationship between the input parameters and the binary classes. The intracohort reproducibility of the BLC classification was estimated using leave-one-out cross-validation. One sample is set aside for validation, and the remaining samples are used for training the BLC. The cells were randomly separated into a training set and a validation set (see Methods); the training set was used to train the BLC, which subsequently was used for predictions using the validation set. The BLC was first used with a complete set of input variables; then the least informative covariate was removed (the one with the smallest weighting) and a new BLC was trained and used for prediction. Once trained, the model was used to predict which class the omitted sample belonged to. This was repeated until each sample in the cohort had been used for validation. The training success was defined as the (mean) fraction of cells correctly classified after training, and the validation success was the fraction of validation set cells that were correctly predicted. The BLC assigned weights to each of the input parameters. These indicate the relative strength of each parameter as a reporter of EGFR signaling to MAPK. The weights are determined by computing the maximum a posteriori (MAP) solution of the BLC. To obtain 95% credible intervals (CIs), we approximated the posterior density with a multivariate Gaussian density by evaluating the curvature matrix at the MAP solution. By making predictions on a validation data set we tested the reproducibility of the inferred weights. The high validation performance indicates that the inferred weights are genuine (as opposed to statistical flukes). Beginning with all of the covariates, the covariate with the smallest absolute value weight (averaged over training and validation runs) was eliminated, and the entire procedure was repeated until only a single covariate remained. Under the null hypothesis that there is no association between the covariates and high/low intensity status we estimated a P value using a permutation test. The covariates were randomly permuted 1000 times, and the maximum prediction accuracy was recorded each time. A P value was obtained based on the comparison of the observed predictive accuracy with a randomized data set of no predictive value (that yielded a prediction accuracy of 54%).