Predicting polarizabilities of silicon clusters using local chemical environments

Calculating polarizabilities of large clusters with first-principles techniques is challenging because of the unfavorable scaling of computational cost with cluster size. To address this challenge, we demonstrate that polarizabilities of large hydrogenated silicon clusters containing thousands of atoms can be efficiently calculated with machine learning methods. Specifically, we construct machine learning models based on the smooth overlap of atomic positions (SOAP) descriptor and train the models using a database of calculated random-phase approximation polarizabilities for clusters containing up to 110 silicon atoms. We first demonstrate the ability of the machine learning models to fit the data and then assess their ability to predict cluster polarizabilities using k-fold cross validation. Finally, we study the machine learning predictions for clusters that are too large for explicit first-principles calculations and find that they accurately describe the dependence of the polarizabilities on the ratio of hydrogen to silicon atoms and also predict a bulk limit that is in good agreement with previous studies.


I. INTRODUCTION
Silicon nanoparticles possess attractive electronic and optical properties that can be tuned by changing the size and shape of the nanoparticle [1] and are used in a variety of scientific and industrial applications, including medical imaging [2,3], photoelectronics [4], and novel electronic devices [5] such as single-electron transistors [6] and memory devices [7]. In many of these applications, the nanoparticles interact with applied electric fields, and the electronic polarizability is a key characteristic of the nanoparticle that determines its dielectric response [8].
To understand the dependence of the polarizability on the size and shape of silicon clusters, a variety of different modelling approaches have been employed. For example, simple empirical models, such as bond polarizability models, have been used to predict Raman spectra of silicon clusters in good agreement with experiment [9]. Empirical models can also be extended beyond the assumption of additivity of atomic polarizabilities: a class of models that captures interactions between polarization centres are dipole interaction models [10], which have been successfully applied to the construction of polarizable force fields [11]. Highly accurate cluster polarizabilities can be obtained using ab initio approaches such as density functional theory (DFT) [12][13][14][15][16][17], Møller-Plesset perturbation theory [17], coupled-cluster theory [8,17,18] or the random phase approximation (RPA) [19][20][21]. For example, Mochizuki and Ågren [21] used the RPA and the second-order polarization propagator approximation to calculate the polarizabilities of spherical hydrogenated silicon clusters with up to 35 Si atoms and found that the polarizability per silicon atom approaches the bulk limit from below. In contrast, for unhydrogenated silicon clusters Jackson and coworkers found that the bulk value is approached from above as the size of the cluster increases [15,16]. This behaviour was attributed to the presence of dangling bonds on the surface. Furthermore, it was observed that the polarizability depends sensitively on the shape of the cluster [16,20]. Jansik et al. [20] compared polarizabilities of three-dimensional (3D), two-dimensional (2D) and one-dimensional (1D) hydrogenated silicon structures and found that the presence of π-bonds in 2D systems leads to a much stronger increase in the polarizability as a function of cluster size when compared to 1D and 3D clusters [20].
A similar trend was observed when comparing prolate and compact clusters, with prolate structures showing a significantly larger polarizability per silicon atom than compact structures [16].
Calculating polarizabilities for large silicon clusters is challenging for first-principles techniques because the computational cost of these methods increases rapidly with system size. In recent years, machine learning (ML) based techniques have been explored as an efficient alternative to physics-based approaches such as first-principles methods. For example, ML has been employed to efficiently represent potential energy surfaces [22][23][24][25] or to predict electronic ground state densities [26][27][28][29]. Very recently, several groups have also started to explore the applicability of ML approaches to calculating molecular polarizabilities [30][31][32]. A key ingredient in ML methods is a descriptor which acts as a molecular fingerprint and encodes the structure and chemistry of a molecule. For example, Grisafi et al. [31] introduced a symmetry-adapted variant of the smooth overlap of atomic positions (SOAP) descriptor [33] to predict polarizability tensors of molecules. Similarly, Wilkins and coworkers [30] used the symmetry-adapted SOAP kernel to predict polarizabilities and first hyperpolarizabilities of small organic molecules with high accuracy. The SOAP descriptor enables the comparison of different chemical environments of an atom. For this, the overlap of the corresponding neighbourhood densities (constructed as a sum of Gaussians centered on atoms in the local environment) is expressed in terms of the expansion coefficients in a basis of spherical harmonics and radial basis functions [33].
In the present work, we explore the ability of machine learning models based on the Smooth Overlap of Atomic Positions (SOAP) [33] descriptor to describe and predict static polarizabilities of hydrogenated silicon clusters. For this, we first calculate scalar isotropic polarizabilities of a set of clusters containing between 10 and 110 silicon atoms using the random phase approximation (RPA). We then investigate the ability of the ML approach to reproduce the calculated polarizabilities and find that almost perfect agreement can be obtained when the size of the local chemical environments is sufficiently large to contain the whole cluster. Importantly, the ML models already describe the qualitative behaviour of the average polarizability per atom if the local environment only contains nearest neighbour atoms. Next, we study the ability of ML to predict polarizabilities of clusters. Interestingly, we find that the predictive power of ML is strongest when the size of the chemical environment is relatively small. These insights enable the reliable prediction of polarizabilities of large clusters which are difficult to calculate with standard first-principles techniques.

II. METHODS

A. Random Phase Approximation polarizabilities
Scalar polarizabilities of molecules and clusters were calculated within the RPA in a linear response framework. The RPA was chosen because it is known to give an accurate description of the dielectric properties of bulk silicon [34]. The polarizability tensor α_ij relates the induced dipole moment with Cartesian components µ_i to the applied static electric field E_j according to

µ_i = Σ_j α_ij E_j.   (1)

To obtain an expression for α_ij, we express µ_i in terms of the induced electronic charge density ∆ρ(r) via

µ_i = −e ∫ d³r r_i ∆ρ(r),   (2)

where e denotes the proton charge and r_i is the i-th Cartesian component of the position vector. The induced charge density is determined by the interacting density-density response function χ(r, r′) according to

∆ρ(r) = ∫ d³r′ χ(r, r′) V(r′),   (3)

where we used that the potential associated with the applied electric field is given by V(r) = e Σ_j E_j r_j. Combining these equations yields

α_ij = −e² ∫ d³r ∫ d³r′ r_i χ(r, r′) r′_j.   (4)

Finally, the scalar polarizability α is obtained by dividing the trace of α_ij by three.
To evaluate Eq. (4), the interacting density-density response function must be determined. In the RPA, χ obeys the Dyson equation

χ(r, r′) = χ₀(r, r′) + ∫ d³x ∫ d³x′ χ₀(r, x) v(x − x′) χ(x′, r′),   (5)

where v(r − r′) denotes the Coulomb interaction and χ₀ is the non-interacting density-density response function given by [35,36]

χ₀(r, r′) = Σ_{i≠j} (f_i − f_j) φ_i*(r) φ_j(r) φ_j*(r′) φ_i(r′) / (ε_i − ε_j),   (6)

where f_i denotes an occupancy factor and φ_i and ε_i denote Kohn-Sham orbitals and eigenvalues, respectively. Note that the summation ranges over both occupied and unoccupied states, resulting in the well-known difficulties of converging such sum-over-states expressions.
To numerically calculate scalar polarizabilities, we employ a plane-wave/pseudopotential approach. Specifically, the BerkeleyGW program package [37,38] is used to calculate χ_GG′, where G and G′ denote reciprocal lattice vectors of the periodically repeated supercell. Note that interactions between periodic images are avoided by using a truncated Coulomb interaction. The interacting density-density response function in real space is then given by

χ(r, r′) = (1/V) Σ_{GG′} e^{iG·r} χ_GG′ e^{−iG′·r′},   (7)

where V = L³ denotes the volume of the cubic supercell, with L being its side length. Finally, the scalar polarizability is found to be

α = −(e²V/3) Σ_{GG′} (∆_{G,x} χ_GG′ ∆*_{G′,x} + ∆_{G,y} χ_GG′ ∆*_{G′,y} + ∆_{G,z} χ_GG′ ∆*_{G′,z}),   (8)

with

∆_{G,x} = (1/V) ∫ d³r x e^{iG·r}   (9)

and similar expressions for ∆_{G,y} and ∆_{G,z}.
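As an illustration, the final contraction in Eq. (8) can be sketched in Python; the arrays below are hypothetical stand-ins for the χ_GG′ matrix and the ∆_G vectors produced by a real calculation, and atomic units (e = 1) are assumed.

```python
import numpy as np

def scalar_polarizability(chi_gg, delta, volume):
    """Evaluate alpha = -(e^2 V / 3) sum_{G,G',i} Delta_{G,i} chi_{GG'} Delta*_{G',i}.

    chi_gg: (nG, nG) complex matrix chi_{GG'} (hypothetical input),
    delta:  (nG, 3) array whose columns are Delta_{G,x}, Delta_{G,y}, Delta_{G,z},
    volume: supercell volume V. Atomic units with e = 1 are assumed.
    """
    alpha = 0.0
    for i in range(3):  # trace over the Cartesian components x, y, z
        alpha += np.real(delta[:, i] @ chi_gg @ delta[:, i].conj())
    return -volume * alpha / 3.0
```

For a negative-definite static χ, as appropriate for a ground-state response function, the resulting polarizability is positive.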

B. Environment descriptors
The ability to assess the similarity of different chemical environments plays a key role in machine learning of material properties. In this work, we use the SOAP approach [33], in which the environment of atom i is described by the set of neighbourhood densities {ρ_i^ν}_ν with

ρ_i^ν(r) = Σ_{j=1}^{N_ν} exp(−|r − r_j|² / (2γ_ν²)),

where ν denotes a specific element that is present in the atom's environment, with N_ν being the number of such atoms up to a given cut-off radius r_cut and r_j their positions relative to atom i. In addition, γ_ν is a hyperparameter describing the size of the neighbour atom.
The similarity of two chemical environments described by the neighbourhood densities ρ_i = {ρ_i^ν}_ν and ρ_j = {ρ_j^ν}_ν can be measured by the kernel [39]

k(ρ_i, ρ_j) = ∫ dR̂ | Σ_ν ∫ d³r ρ_i^ν(r) ρ_j^ν(R̂r) |²,

where R̂ denotes a rotation matrix. To evaluate the kernel integral, the angular dependence of the neighbourhood densities is expanded in a basis of spherical harmonics Y_lm and the radial part in a set of orthogonal radial basis functions g_n(r) according to

ρ_i^ν(r) = Σ_{nlm} c_{i,nlm}^ν g_n(r) Y_lm(r̂),

where c_{i,nlm}^ν is an expansion coefficient. Here, l ranges from zero to a cut-off value l_max and m ranges from −l to l. As radial basis functions, the modified spherical Bessel functions of the first kind are used and n ranges from zero to a cut-off value n_max.
The similarity kernel has the appealing property that the rotational integral can be carried out analytically, yielding [33]

k(ρ_i, ρ_j) = Σ_{ν,ν′} Σ_{n,n′,l} d_{i,nn′l}^{ν,ν′} d_{j,nn′l}^{ν,ν′},   with   d_{i,nn′l}^{ν,ν′} = Σ_m c_{i,nlm}^ν (c_{i,n′lm}^{ν′})*.

From the above expressions, it can be seen that the set of coefficients {d_{i,nn′l}^{ν,ν′}} plays the role of a descriptor vector d_i for the environment of atom i. In practice, we calculate the descriptor vectors using the Quippy software package [40]. The kernel matrix is then calculated as the dot product

k(ρ_i, ρ_j) = d_i · d_j.

Finally, we note that the sensitivity of the kernel to differences between atomic environments can be increased by defining the effective SOAP kernel [33,39]

K_ij = [ k(ρ_i, ρ_j) / √(k(ρ_i, ρ_i) k(ρ_j, ρ_j)) ]^ξ.

In this work, we use an exponent ξ = 2.
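A minimal sketch of the normalized, squared kernel, assuming the descriptor vectors d_i have already been computed (e.g. with Quippy):

```python
import numpy as np

def effective_kernel(d_i, d_j, exponent=2):
    """Normalized dot-product kernel between two descriptor vectors,
    raised to an integer power to sharpen its sensitivity."""
    k_ij = float(d_i @ d_j)
    norm = np.sqrt(float(d_i @ d_i) * float(d_j @ d_j))
    return (k_ij / norm) ** exponent
```

Identical environments give K = 1, while orthogonal descriptor vectors give K = 0; raising the normalized kernel to the power 2 suppresses intermediate similarities.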

C. Learning cluster polarizabilities
The SOAP descriptor allows the comparison of different environments of a given atom. However, the polarizability is calculated for an entire molecule or cluster consisting of many atoms. To harness the SOAP approach for the prediction of cluster polarizabilities, it is therefore necessary to relate atomic properties to cluster properties. One way to achieve this is by expressing the polarizability α_I of cluster I as a sum of atomic contributions α_i [39],

α_I = Σ_{i=1}^{N_I} α_i,

where N_I denotes the total number of atoms in the cluster. While the atomic contributions can provide some valuable intuition about the dielectric response of complex clusters, it is important to stress that these quantities are not directly measurable and should be interpreted with care [41]. Using standard kernel ridge regression, the atomic polarizabilities can be expressed as

α_i = Σ_{j=1}^{N_train} ζ_j K_ij,   so that   α_I = Σ_{j=1}^{N_train} ζ_j K_{I,j}^sum,

where N_train denotes the total number of atoms in the training set (i.e. the total number of atoms contained in all training set clusters), ζ_j is a coefficient obtained from training the SOAP model, and the sum kernel is defined as K_{I,j}^sum = Σ_{i=1}^{N_I} K_ij. Determining the coefficients ζ_j is difficult because the fit to the calculated cluster polarizabilities is strongly underdetermined (the number of coefficients equals the total number of atoms of all clusters in the training set, while the number of data points equals the number of clusters). To make progress, the number of coefficients must be reduced. Intuitively, this should be possible because the atomic environments of many atoms in the training set are very similar. Practically, this sparsification is achieved by means of a singular value decomposition (SVD) of the descriptor matrix D whose rows contain the descriptor vectors d_i. Specifically, D is expressed as

D = UΣV^T,

where the columns of U contain the left singular vectors, the rows of V^T contain the right singular vectors, and Σ is a diagonal matrix containing the singular values. If many environments in D are similar, only a few singular values will have large magnitudes.
We only retain those singular values which are larger than a given threshold and use the corresponding singular vectors (which form a matrix Ṽ) as a new basis in which to represent D.
The elements of the SOAP kernel K̃ corresponding to this new set of effective descriptors are obtained by projecting the descriptors d_i onto the rows ṽ_j of the truncated matrix of singular vectors Ṽ according to

K̃_ij = d_i · ṽ_j.

The effective sum kernel K̃^sum is then obtained by summing K̃_ij over the atoms of each cluster, as for the sum kernel above, but now the number of coefficients ζ_j equals the number of singular vectors whose singular values exceed the threshold. Finally, the vector of coefficients ζ is obtained from [39]

ζ = ((K̃^sum)^T K̃^sum + Λ)^{-1} (K̃^sum)^T α,

where Λ = λI with λ being a regularization parameter, and α denotes the vector of calculated cluster polarizabilities.
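The sparsification and regularized fit can be sketched with toy data; all numbers below are made up for illustration, and random vectors stand in for SOAP power spectra.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 40 atomic environments (8-dimensional descriptors) grouped into
# 10 clusters of 4 atoms each, with fabricated cluster polarizabilities.
D = rng.normal(size=(40, 8))                    # rows: descriptor vectors d_i
cluster_of_atom = np.repeat(np.arange(10), 4)
alpha_clusters = rng.normal(loc=50.0, scale=5.0, size=10)

# SVD sparsification: retain the singular vectors (rows of Vt) whose
# singular values exceed a threshold.
U, S, Vt = np.linalg.svd(D, full_matrices=False)
V_tilde = Vt[S > 0.1]

# Effective kernel (projection of each descriptor onto the retained basis)
# and the corresponding sum kernel over the atoms of each cluster.
K_tilde = D @ V_tilde.T                         # shape (n_atoms, n_basis)
K_sum = np.zeros((10, V_tilde.shape[0]))
for I in range(10):
    K_sum[I] = K_tilde[cluster_of_atom == I].sum(axis=0)

# Regularized least squares for the coefficient vector zeta.
lam = 1e-10
A = K_sum.T @ K_sum + lam * np.eye(K_sum.shape[1])
zeta = np.linalg.solve(A, K_sum.T @ alpha_clusters)
alpha_fit = K_sum @ zeta                        # fitted cluster polarizabilities
```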
Alternatively, the cluster polarizability can be expressed as the number of silicon atoms multiplied by their average polarizability α_av (note that in this definition α_av also contains the smaller contribution from the hydrogen atoms),

α_I = N_I α_av,I,

where N_I now denotes the number of silicon atoms in cluster I. To calculate the average polarizability, we average the SOAP kernel matrix over the environments belonging to a pair of clusters [41],

K̄_IJ = (1/(N_I N_J)) Σ_{i∈I} Σ_{j∈J} K_ij.

Using kernel ridge regression, the average polarizability of the silicon atoms in cluster I is expressed as

α_av,I = Σ_{J=1}^{n_train} ζ_J K̄_IJ,

where n_train denotes the number of clusters in the training set and the vector of coefficients ζ is determined by

ζ = (K̄ + Λ)^{-1} α_av,

where α_av is the vector containing the average polarizabilities per atom of the training set clusters. As a consequence of the averaging, no additional sparsification procedure is required as in the case of the sum kernel. This method also allows one to assign polarizabilities to individual atoms: omitting the average over the index i in the definition of K̄_IJ yields a prediction for each silicon atom in a cluster,

α_i = Σ_{J=1}^{n_train} (ζ_J/N_J) Σ_{j∈J} K_ij.

It follows that the total polarizability obtained from the average kernel can also be expressed as a sum of atomic contributions,

α_I = Σ_{i∈I} Σ_{J=1}^{n_train} (ζ_J/N_J) Σ_{j∈J} K_ij.

Apart from the scaling factor 1/N_J, the last equation is very similar to the sum kernel expression, with the additional constraint that the coefficients on all atoms of a training cluster J are equal, ζ_j = ζ_J for all j ∈ J. The effect of the scaling factor is that the sum kernel is extensive (its magnitude scales with the number of atoms in the cluster), while the average kernel is intensive, i.e. independent of system size. As a consequence, large clusters are weighted more heavily in the sum kernel least squares problem than in the average kernel approach.
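A self-contained toy version of the average kernel regression; random descriptors stand in for SOAP vectors and the target values are fabricated.

```python
import numpy as np

rng = np.random.default_rng(1)

# 8 toy clusters with 5 silicon environments each, described by random
# 6-dimensional vectors; alpha_av holds fabricated target values.
n_clusters, n_atoms = 8, 5
descriptors = [rng.normal(size=(n_atoms, 6)) for _ in range(n_clusters)]
alpha_av = rng.normal(loc=4.0, scale=0.3, size=n_clusters)

def average_kernel(dI, dJ):
    """Mean of the (normalized, squared) atomic kernel over all atom pairs."""
    uI = dI / np.linalg.norm(dI, axis=1, keepdims=True)
    uJ = dJ / np.linalg.norm(dJ, axis=1, keepdims=True)
    return ((uI @ uJ.T) ** 2).mean()

K = np.array([[average_kernel(dI, dJ) for dJ in descriptors] for dI in descriptors])

# Kernel ridge regression: zeta = (K + lambda * I)^{-1} alpha_av.
lam = 1e-8
zeta = np.linalg.solve(K + lam * np.eye(n_clusters), alpha_av)
pred = K @ zeta   # reproduces the training targets almost exactly
```

The atomic kernel used here is the normalized dot product squared, i.e. the effective SOAP kernel with exponent 2.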
Finally, we also use the "coherent average" kernel (denoted "coh"), which is obtained as follows. Rather than computing a SOAP descriptor for each atomic environment separately, we first average the spherical harmonic expansion coefficients c_{i,nlm}^ν over all environments in cluster I,

c̄_{I,nlm}^ν = (1/N_I) Σ_{i∈I} c_{i,nlm}^ν,

and then square these to form the averaged descriptor vector d̄_I with components

d̄_{I,nn′l}^{ν,ν′} = Σ_m c̄_{I,nlm}^ν (c̄_{I,n′lm}^{ν′})*.

The coherent (unnormalized) kernel between clusters I and J is then given by

K_IJ^coh = d̄_I · d̄_J.
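The coherent averaging can be illustrated with a random (hypothetical) coefficient array; in a real implementation, entries with |m| > l would vanish, whereas here the m index of the toy array is simply padded to its maximum size.

```python
import numpy as np

rng = np.random.default_rng(4)

# c[i, n, l, m]: expansion coefficients of the 6 atomic environments of a toy
# cluster (real-valued here for simplicity; padded in the m index).
n_atoms, n_max, l_max = 6, 4, 3
c = rng.normal(size=(n_atoms, n_max, l_max + 1, 2 * l_max + 1))

# Average the coefficients over the environments first ...
c_bar = c.mean(axis=0)

# ... and only then form the power spectrum, d[n, n', l] = sum_m c_bar * c_bar.
d_coh = np.einsum('nlm,plm->npl', c_bar, c_bar)

# The coherent (unnormalized) kernel between two clusters would then be the
# dot product of their flattened descriptors: K_coh = d_I.ravel() @ d_J.ravel().
```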

D. Physical models
We also use two simple physics-based models to fit the calculated RPA polarizabilities. In the first approach, the cluster polarizability is assumed to be proportional to the number of silicon atoms N_Si in the cluster, i.e.

α = α_Si^av N_Si,

with α_Si^av denoting the average polarizability per silicon atom (again, any contribution from hydrogen atoms is included in α_Si^av in this definition). In contrast to the SOAP fitting with the average kernel, the average polarizability is assumed to be the same for all clusters.
The second model is a bond polarizability approach in which the cluster polarizability is expressed as a sum of contributions from Si-Si bonds and Si-H bonds according to

α = α_Si−Si N_Si−Si + α_Si−H N_Si−H.

For fully hydrogen-terminated clusters with four-fold coordinated silicon atoms, the bond counts are N_Si−H = N_H and N_Si−Si = 2N_Si − N_H/2. Dividing both sides by N_Si then yields the polarizability per silicon atom

α/N_Si = 2α_Si−Si + (α_Si−H − α_Si−Si/2) N_H/N_Si.

Interestingly, this shows that the polarizability per silicon atom is a function of the ratio of hydrogen to silicon atoms only.
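The fit of the bond polarizability model amounts to a two-parameter linear least squares problem. The sketch below uses synthetic data generated from the fitted values reported later in the text (α_Si−Si = 1.98 Å³, α_Si−H = 1.32 Å³); the cluster compositions are invented, and every Si atom is assumed to be four-fold coordinated.

```python
import numpy as np

# Hypothetical cluster compositions (not the paper's data set).
N_Si = np.array([10, 26, 44, 71, 110], dtype=float)
N_H = np.array([16, 30, 42, 60, 80], dtype=float)

# Bond counts for fully hydrogen-terminated, four-fold coordinated clusters.
N_SiSi = 2 * N_Si - N_H / 2
N_SiH = N_H

# Synthetic polarizabilities generated exactly from the model.
alpha = 1.98 * N_SiSi + 1.32 * N_SiH

# Least-squares fit recovers the two bond polarizabilities.
A = np.column_stack([N_SiSi, N_SiH])
coef, *_ = np.linalg.lstsq(A, alpha, rcond=None)
alpha_SiSi, alpha_SiH = coef
```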

E. Generation of clusters
To generate atomic structures of hydrogenated silicon clusters, we follow a procedure similar to that of Barnard et al. [42], who carve spherical clusters from a perfect silicon crystal, terminate the dangling bonds on the surface with hydrogen atoms and then relax the atomic positions using DFT. Unfortunately, this approach yields only very few clusters with 100 or fewer silicon atoms. Because of the relatively large computational cost associated with the RPA polarizability calculations, we instead use the following approach to generate clusters: starting from the spherical Si123H100 cluster, we repeatedly remove silicon atoms from the surface, terminate any dangling bonds with hydrogen atoms and relax the structure with DFT. In this way, a set of 100 hydrogenated silicon clusters containing between 10 and 110 Si atoms is obtained for which RPA polarizabilities are calculated. In addition, we include the spherical clusters with fewer than 123 Si atoms from Barnard et al. [42].

F. Computational details
The plane-wave/pseudopotential DFT code Quantum Espresso [43,44] was used to obtain Kohn-Sham energies ε_n and wavefunctions φ_n(r). We employed the PBE exchange-correlation functional, norm-conserving pseudopotentials from the original Quantum Espresso pseudopotential library [43,44] and a plane-wave cut-off of 65 Ry. The clusters were placed in a cubic unit cell with sufficient vacuum to avoid interactions between periodically repeated images. Next, cluster polarizabilities were calculated with BerkeleyGW [37,38] using a plane-wave cut-off of 6 Ry and a truncated Coulomb interaction. A total of 600 Kohn-Sham states were included in the summation for χ₀, which was found to be sufficient to converge the scalar polarizabilities. SOAP descriptors were constructed with l_max = 9, n_max = 20, and γ_ν = 2.0 for r_cut ≤ 10.0 Å and γ_ν = 0.5 for r_cut > 10.0 Å. In all calculations, we only consider local environments of silicon atoms. As all hydrogen atoms are bonded to silicon atoms, their contribution to the cluster polarizabilities can be captured indirectly through their influence on the silicon atoms.

III. RESULTS AND DISCUSSION
A. Fitting polarizabilities

Fig. 1(a) shows the RPA polarizabilities of the hydrogenated silicon clusters as a function of the number of silicon atoms in the cluster. We observe that the polarizability exhibits a linear behaviour, which suggests that the Si atoms provide the dominant contribution.
Deviations from the linear behaviour become apparent when the cluster polarizability is divided by the number of silicon atoms, see Fig. 1(b). For clusters with fewer than 40 Si atoms, there is a significant amount of scatter in the polarizabilities, but overall α/N_Si tends to increase with increasing number of Si atoms. For clusters containing between 40 and 70 silicon atoms, α/N_Si decreases; interestingly, it increases again for clusters between 70 and 80 silicon atoms, and finally decreases for clusters containing more than 80 Si atoms. Overall, the polarizability per silicon atom has an M-like shape as a function of the number of silicon atoms.
For very large clusters, α/N_Si should converge to the atomic RPA polarizability of bulk silicon, which is 3.77 Å³ (determined from the Clausius-Mossotti relation using a bulk dielectric constant of 12.2 [34]). This explains the observed decrease of α/N_Si for N_Si > 80. Note that in our results the bulk value is not approached from below because we have not removed the hydrogen contributions from the cluster polarizabilities [20,21].
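The quoted bulk value can be reproduced from the Clausius-Mossotti relation, α = (3v/4π)(ε − 1)/(ε + 2), with v the volume per atom; the diamond-cubic lattice constant of 5.43 Å used below is an assumption not stated in the text.

```python
import math

a = 5.43                       # Si lattice constant in Angstrom (assumed)
v_atom = a**3 / 8              # volume per atom (8 atoms per conventional cell)
eps = 12.2                     # bulk static dielectric constant [34]
alpha_bulk = 3 * v_atom / (4 * math.pi) * (eps - 1) / (eps + 2)
print(round(alpha_bulk, 2))    # -> 3.77
```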
To understand these findings, we first compare our results to two physics-based models: a model in which the cluster polarizability is assumed to be proportional to the number of Si atoms (denoted the linear N_Si model) and the bond polarizability model. The parameters of both models were fitted to the calculated RPA data using a least squares optimization. The results are shown in Fig. 2.

[Fig. 2(b): results obtained using SOAP with the sum kernel (green squares), the average kernel (red squares) and the coherent kernel (black crosses); a cut-off of 2.5 Å was employed. (c) Same as (b) but with a cut-off of 20 Å.]
The parameters of the bond polarizability model are found to be α_Si−Si = 1.98 Å³ and α_Si−H = 1.32 Å³. As the bulk polarizability per Si atom is 2α_Si−Si, the predicted bulk value is 3.96 Å³, which is in better agreement with the RPA result.
The above analysis demonstrates that both physics-based models have several shortcomings. This is a consequence of two factors: (i) their parameters do not depend on the properties of the local chemical environment, i.e. bond lengths or bond angles, and (ii) these models do not capture the dependence of the polarizability on the cluster shape. To overcome these problems, we now explore the ability of machine learning models to describe the polarizabilities of Si clusters. Figures 2(b) and 2(c) show the results from the machine learning models using the sum kernel, the average kernel and the coherent kernel (see Methods). The real-space cutoff that determines the size of the chemical environment of each atom is r_cut = 2.5 Å in Fig. 2(b) and r_cut = 20 Å in Fig. 2(c). In the fit, the regularization parameter λ was kept small (10⁻¹⁵ for the sum kernel model and 10⁻¹² for the average and coherent kernels) in order to allow as much flexibility in the parameters as possible. For the smaller cut-off (where only nearest neighbour atoms are included in the local environment), all three kernels provide an improved description compared to the physics-based models. Specifically, they capture the M-shape of α/N_Si as a function of N_Si and also reproduce the scatter for smaller clusters. The coherent kernel is slightly better than the average kernel, and significant deviations from the calculated polarizabilities are only observed for the smallest cluster sizes when the sum kernel is used. When r_cut is increased to 20 Å, the agreement between the ML models and the calculated polarizabilities improves significantly. In particular, the results from the average and the coherent kernel are in almost perfect agreement with the data, while the sum kernel results show small deviations for smaller clusters. The good results obtained for the short cutoff indicate that polarizabilities are dominated by local chemical effects.
However, long-range interactions also influence polarizabilities and this is captured when the cutoff radius is increased.

B. Predicting polarizabilities
Up to this point, we have only considered the ability of the SOAP approach to fit the calculated cluster polarizabilities. To investigate its capacity to predict polarizabilities of clusters that it was not trained on, we use k-fold cross validation [45]. In this procedure, the clusters in the data set are randomly assigned to five sub-sets. Next, four sub-sets are used to train the ML approach and the fifth sub-set is used as the test set. This is done five times, with each sub-set acting as the test set once. We optimize the regularization parameter λ to minimize the mean absolute error (MAE); the optimal parameters are listed in the appendix. The resulting MAE and its standard deviation as a function of r_cut are shown in Fig. 3(a). The average kernel and the coherent kernel yield very similar results and are compared in Fig. 3(b). Strikingly, the sum kernel model produces the largest test set MAE among all methods. In particular, its test set MAE is significantly larger than its training set MAE, indicating poor capacity to predict polarizabilities. In contrast, the average kernel model yields the smallest test set MAE, which is only slightly worse than the training set error. The coherent kernel model yields slightly worse predictions than the average kernel, with the biggest difference between the two occurring at r_cut = 7.5 Å. The MAE of the two physics-based models lies between those of the sum kernel and the average kernel. The different performances of the sum kernel and the average kernel originate from the different training procedures: the sum kernel model is trained on total cluster polarizabilities, while the average kernel is trained on the average polarizability per silicon atom (see Sec. II C). As a consequence, the sum kernel model is biased towards more accurate predictions for large clusters and is less accurate for small clusters. This can also be seen in Fig. 2(c), which shows that the quality of the sum kernel fit improves for larger clusters.
This has been observed before by Stocker et al. [46], who argued that the intensive average kernel has the advantage of equally weighting small and large molecules, which is beneficial when learning quantities over a large range of cluster sizes. Interestingly, the average kernel performs somewhat better than the coherent kernel, suggesting that a model of the cluster polarizability that can be expressed as a sum of atomic contributions constitutes a better representation of the system's dielectric response. Fig. 3 also shows that the minimum test set MAE for the average kernel and the coherent kernel is obtained for r_cut = 17.5 Å, while for the sum kernel the minimum is achieved for r_cut = 15.0 Å. Interestingly, neither kernel benefits significantly from increasing r_cut beyond 5 Å. To understand this finding, we compare the elements of the average kernel matrix for r_cut = 5.0 Å and r_cut = 17.5 Å, see Fig. 4. For the smaller cutoff, the kernel matrix decays slowly along its rows and columns. In contrast, the decay is significantly more pronounced for the larger cutoff, suggesting that a smaller cutoff facilitates the recognition of similar chemical environments in clusters of different size. This is not surprising because for large cutoffs the chemical environment of an atom in a small cluster contains a significant amount of vacuum, while that of an atom in a large cluster does not.
Next, we explore the ability of the ML approach to predict polarizabilities of large clusters based on a training set of small clusters. For this, we train the average kernel on the 60, 70 or 80 smallest clusters and then predict the polarizabilities of the remaining large clusters in the data set. Fig. 5 shows the resulting test set MAE as a function of the cutoff radius. All curves exhibit a minimum at small cut-offs near r_cut = 5 Å and the smallest MAE is obtained for the largest training set. For the smaller training sets (n_t = 60 or 70) the MAE increases rapidly as the cutoff is increased, while for the largest training set the increase is mild (and another minimum is found at r_cut = 17.5 Å). Similar to our findings in the k-fold cross validation, this shows that it is not beneficial to increase the cut-off radius beyond a certain value. Figures 6(a)-(c) compare the predictions of the average kernel with r_cut = 5 Å with the calculated RPA polarizabilities per silicon atom. For all three training set sizes, the ML model captures the qualitative trends. For n_t = 60, the average kernel correctly predicts the increase of α/N_Si at N_Si = 70 and also the decrease starting at N_Si = 80. While the ML models underestimate the polarizabilities per Si atom for large clusters when n_t = 60 and n_t = 70, good quantitative agreement is achieved for n_t = 80. Finally, we train the average kernel model on the entire data set (using r_cut = 5 Å) and predict the average polarizabilities of the entire Silicon Quantum Dot data set containing clusters with up to 3000 silicon atoms [42]. The results are shown in Fig. 7. It can be observed that the polarizability per Si atom converges slowly to its bulk limit as N_Si increases and there is significant scatter in the results. The scatter in α/N_Si reflects the different N_H/N_Si ratios and different environments present in the clusters.
To understand the slow convergence to the bulk value, note that the number of silicon atoms scales with the cluster volume, while the number of hydrogen atoms is roughly proportional to the surface area. This suggests that α/N_Si should be proportional to the inverse radius of the cluster or, equivalently, to 1/N_Si^{1/3}. Indeed, Fig. 7 shows that the ML predictions are well described by the function a + b/N_Si^{1/3} with a = 3.89 Å³ and b = 1.55 obtained from a least-squares fit. The value of a agrees well with the RPA atomic polarizability of bulk silicon of 3.77 Å³ [34].
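The surface-to-volume argument reduces to a linear least squares fit in the variable N_Si^{-1/3}; the sketch below regenerates synthetic data from the fitted values a = 3.89 and b = 1.55 with added noise and recovers them.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic alpha/N_Si data generated from a + b/N_Si^(1/3) plus small noise.
N_Si = np.linspace(100, 3000, 60)
x = N_Si ** (-1.0 / 3.0)
y = 3.89 + 1.55 * x + 0.01 * rng.normal(size=x.size)

# Linear least squares in the variable x = N_Si^(-1/3).
A = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef
```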
Additional insights can be obtained by analyzing the atomic polarizabilities obtained from the SOAP average kernel method. Fig. 8 shows the atomic polarizabilities of a Si2109H604 cluster. In Fig. 8(a), only local chemical environments of silicon atoms are considered (and the effect of the hydrogen atoms is captured indirectly through their influence on the silicon chemical environments). Silicon atoms in the center of the cluster have a polarizability of 3.76 Å³, in excellent agreement with the value of 3.77 Å³ extracted from bulk calculations [34]. The polarizability of the silicon atoms in the two surface layers is larger, sometimes as large as 5 Å³. The reason for this increase is that the surface silicon atoms are bonded to hydrogen atoms and their atomic polarizability is effectively the sum of the silicon and hydrogen contributions. To disentangle the contributions from silicon and hydrogen atoms to the cluster polarizability, Fig. 8(b) shows the atomic polarizabilities from a calculation that explicitly takes chemical environments of hydrogen atoms into account. Interestingly, the results suggest that the atomic polarizability of subsurface silicon atoms is larger than the bulk value, but the polarizability of surface silicon atoms (which are bonded to hydrogens) is smaller. The average atomic polarizability of the silicon atoms is found to be 3.63 Å³. This is in agreement with the results of Mochizuki and Ågren [21], who predicted that the bulk limit of the silicon atomic polarizability is approached from below.

IV. CONCLUSIONS
In this work, we have demonstrated that machine learning models based on the smooth overlap of atomic positions (SOAP) descriptor can be used to accurately and efficiently predict polarizabilities of large hydrogenated silicon clusters. Using the random phase approximation, we calculated the polarizabilities of a set of hydrogenated silicon clusters containing between 10 and 110 silicon atoms. We then assessed the ability of three machine learning models (one using the sum kernel, one using the average kernel and one using the coherent kernel) to fit the calculated polarizabilities and found that all three models perform well even when the local environment includes nearest neighbour atoms only; increasing the size of the environment further improves the quality of the fit. Next, we investigated the ability of the machine learning models to predict polarizabilities of clusters that are not in the training set. Using k-fold cross validation, we found that the average kernel performs significantly better than the sum kernel and that its predictions only weakly depend on the size of the chemical environment. We also tested the predictive power of the average kernel when it is trained on small clusters only and found that quantitative accuracy can be achieved if the training set is sufficiently large. Finally, we used the average kernel approach to predict the polarizabilities of hydrogenated silicon clusters with up to 3000 silicon atoms and found that the results approach the correct bulk limit. The ability to efficiently calculate polarizabilities of large clusters paves the way towards using machine learning for excited-state properties of these systems. For example, the static polarizability is a key ingredient for calculating quasiparticle properties within the GW approach and also for calculating optical properties by solving the Bethe-Salpeter equation.
Symmetry-adapted kernel regression could be used to straightforwardly generalise our models to predict the full polarisability tensor [47].