Identification of Design Principles for the Preparation of Colloidal Plexcitonic Materials

Colloidal plexcitonic materials (CPMs) are a class of nanosystems where molecular dyes are strongly coupled with colloidal plasmonic nanoparticles, acting as nanocavities that enhance the light field. As a result of this strong coupling, new hybrid states are formed, called plexcitons, belonging to the broader family of polaritons. With respect to other families of polaritonic materials, CPMs are cheap and easy to prepare through wet chemistry methodologies. Still, clear structure-to-properties relationships are not available, and precise rules to drive the materials’ design to obtain the desired optical properties are still missing. To fill this gap, in this article, we prepared a dataset with all CPMs reported in the literature, rationalizing their design by focusing on their three main relevant components (the plasmonic nanoparticles, the molecular dyes, and the capping layers) and identifying the most used and efficient combinations. With the help of statistical analysis, we also found valuable correlations between structure, coupling regime, and optical properties. The results of this analysis are expected to be relevant for the rational design of new CPMs with controllable and predictable photophysical properties to be exploited in a vast range of technological fields.


S1. Colloidal Plexcitonic Materials (CPMs) dataset
Dataset of CPM-S. In the headers, Mat. stands for material, Fam. for family, J for J-aggregate, M for monomeric form, Int. for interaction. The full names of the dye molecules are reported in Table S3. The dye families are indicated with abbreviations: Cy = cyanines, Cc = carbocyanines, O = oxazines, P = proteins, Po = porphyrins, Rh = rhodamines, Sq = squaraines, and T = triarylmethanes. The full names of the CL molecules are reported in Table S4. The interactions are classified as in the main text: (i) = direct dye-metal interaction; (iii) = electrostatic interaction; (iv) = segregation. When the CL was not indicated in the original reference, the CL and Interactions entries are classified as "Other". The coupling parameters ћΩR, ћγ, and ћk are reported in meV. S (the NP surface area) is reported in 10³ nm² and V (the NP volume) in 10³ nm³.

S3. Additional notes about the definition of the dataset

S3.1 Retrieving optical and geometrical parameters.
The optical parameters ћΩR, ћγ, and ћk were retrieved from the data reported in the original papers, when present. When these data were not directly available, we used the software at https://apps.automeris.io/wpd/ to retrieve the maxima of the plexcitonic peaks in the extinction spectra and calculated ћΩR as the energy difference between them. When the ћk and ћγ values were not reported, they were estimated as the full width at half maximum of the bands appearing in the extinction spectra [73]. To estimate the effective volume Veff of the NPs, we assumed that it could be approximated by their geometric volume V. For this reason, we calculated Veff from the geometrical dimensions of the NPs extracted from the TEM analysis. Samples for which the TEM analysis was not reported could not be included in the statistical analysis.
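The two estimates described above can be illustrated with a minimal sketch. It assumes that the peak maxima are read from the spectra as wavelengths in nm and, purely for illustration, that the NP is a sphere whose diameter is taken from TEM; the function names and the numerical values are hypothetical and are not taken from the dataset.

```python
import math

# Product h*c in eV*nm, used to convert a wavelength into a photon energy.
HC_EV_NM = 1239.841984


def wavelength_to_mev(wavelength_nm: float) -> float:
    """Convert a wavelength in nm into a photon energy in meV."""
    return 1000.0 * HC_EV_NM / wavelength_nm


def rabi_splitting_mev(upper_nm: float, lower_nm: float) -> float:
    """Energy difference (meV) between the two plexcitonic peak maxima.

    upper_nm: wavelength of the upper branch (UP, higher energy).
    lower_nm: wavelength of the lower branch (LP, lower energy).
    """
    return wavelength_to_mev(upper_nm) - wavelength_to_mev(lower_nm)


def sphere_volume_nm3(diameter_nm: float) -> float:
    """Geometric volume of a sphere (assumed shape), used here as Veff."""
    return math.pi * diameter_nm ** 3 / 6.0


# Hypothetical values, for illustration only.
splitting = rabi_splitting_mev(upper_nm=550.0, lower_nm=620.0)
volume = sphere_volume_nm3(diameter_nm=20.0)
print(f"Rabi splitting: {splitting:.1f} meV")
print(f"Volume: {volume / 1e3:.2f} x 10^3 nm^3")
```

This simple difference is only meaningful close to zero detuning; for large detuning the exclusion criteria of Section S3.2 apply.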

S3.2 Exclusions
In the analysis of ћΩR and CR reported in Figure 3, we had to exclude some samples of the dataset: (i) In the case of large detuning, ћΩR cannot be calculated simply as the difference between the energies of the UP and LP; more sophisticated fitting procedures are required, which rely on knowing the position of the polaritonic peaks for several detuning values [74]. When this information was not available, the samples were excluded from the analysis [1,2,49,3-5,8,9,13,22,26]. (ii) In some works, one or both of the plexcitonic bands were hidden by an excess of free dye in solution [7,14]. In these cases, it is not trivial to reliably extract the ratio between the coupled and uncoupled dyes, leading to an over- or underestimation that would generate a detrimental systematic error. This is also the case for ref [44], in which the authors proposed to use differential spectrophotometry to remove the contribution of the uncoupled molecules.

S4. Multiple linear regression
In multiple linear regression, a (dependent) response variable $y$ is expressed as a function of $n$ (independent) predictor variables $x_i$ by the model:

$$y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i$$

where $\beta_0$ is the intercept and $\beta_i$ are the regression coefficients.
To estimate the validity of the regression, different kinds of statistical indicators can be used:
• R-squared: percentage of the variation in the dependent variable explained by the independent variables.
• Adj. R-squared: R-squared adjusted for the number of variables in the regression.
• Prob(F-Statistic): quantifies the statistical significance of the regression model.
• p-value: quantifies the statistical significance of the regression coefficient.
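As an illustration of the first two indicators, the following minimal sketch fits a multiple linear regression by solving the normal equations in plain Python and then computes R-squared and Adj. R-squared. It is not the Statsmodels implementation used for the analysis; the helper names and the data are hypothetical. Prob(F-Statistic) and the p-values additionally require the F and t distributions and are not covered here.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit; returns (coefficients, R-squared, Adj. R-squared).

    rows: one list of predictor values per sample.
    The first returned coefficient is the intercept.
    """
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p, n = len(design[0]), len(y)
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    k = p - 1  # number of predictors, intercept excluded
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2


# Hypothetical data: six samples, two predictors.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0), (6.0, 7.0)]
y = [8.1, 7.2, 16.9, 15.8, 23.1, 30.2]
beta, r2, adj_r2 = fit_ols(rows, y)
print("coefficients:", [round(b, 3) for b in beta])
print(f"R-squared: {r2:.4f}, Adj. R-squared: {adj_r2:.4f}")
```

Because Adj. R-squared penalizes the number of predictors, it never exceeds R-squared for the same fit.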

Dummy Encoding
While the use of numerical variables in linear regression is straightforward, categorical variables must be converted using dummy encoding. Consider a categorical variable characterized by $k$ categories.
In dummy encoding, a dummy variable is created for each category of the categorical variable. These dummy variables are binary variables, taking values 1 or 0 depending on whether or not the sample is part of that category. Since only $k - 1$ dummy variables are linearly independent, one of them can be dropped in order to avoid multicollinearity.
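The encoding described above can be sketched in a few lines of plain Python. The function name and the example categories are hypothetical; dropping the first category in alphabetical order is an arbitrary choice made for this illustration.

```python
def dummy_encode(values, drop_first=True):
    """Return (column_names, rows) of 0/1 dummy variables.

    values: list with one category label per sample.
    drop_first: if True, drop one category so that only k - 1
    linearly independent dummy variables are kept.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical categorical variable with k = 3 categories.
shapes = ["sphere", "rod", "sphere", "prism"]
names, rows = dummy_encode(shapes)
print(names)
for label, row in zip(shapes, rows):
    print(label, row)
```

A sample belonging to the dropped category is represented by a row of zeros, so that category acts as the reference level of the regression.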

S5. Surface area and volume for different NP shapes
The equations used to generate the curves in Figure 7d are the following, where the length is the variable that is varied, while the height ℎ and the radius are kept fixed.
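As an illustration of how such curves can be generated, the sketch below evaluates the surface area and volume of a cylinder of fixed radius and varying length. The cylindrical shape, the function names, and the numerical values are assumptions made for this example only and are not necessarily the shapes considered in Figure 7d.

```python
import math


def cylinder_surface_nm2(radius_nm: float, length_nm: float) -> float:
    """Surface area of a closed cylinder: two end caps plus the lateral surface."""
    return 2.0 * math.pi * radius_nm ** 2 + 2.0 * math.pi * radius_nm * length_nm


def cylinder_volume_nm3(radius_nm: float, length_nm: float) -> float:
    """Volume of a cylinder."""
    return math.pi * radius_nm ** 2 * length_nm


# Hypothetical dimensions: the radius is fixed, the length is varied.
radius_nm = 10.0
for length_nm in (20.0, 40.0, 80.0):
    s = cylinder_surface_nm2(radius_nm, length_nm) / 1e3  # in 10^3 nm^2
    v = cylinder_volume_nm3(radius_nm, length_nm) / 1e3   # in 10^3 nm^3
    print(f"length = {length_nm:5.1f} nm  S = {s:6.2f}  V = {v:6.2f}")
```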

S6. Bivariate Plots
In the following figures, we report the scatter plots for CR vs 9 / and ћΩR vs 9 / for the various categorical variables (Material, Capping, Dye, Shape, Interaction, Aggregate) for the datasets CPM-S + CPM-D, CPM-S, and CPM-D.

S7. Linear Regression Models
The statistical analysis was performed using the OLS (Ordinary Least Squares) class implemented in the Python library Statsmodels [75]. The outputs of the regression analysis are reported below, specifying the regression model (using R-style formulas) and the dataset used.