Solvent effects on electronic excitations of an organic chromophore

In this work we study the solvatochromic shift of a selected low energy excited state of alizarin in water by using a linear-scaling implementation of large-scale time-dependent density functional theory (TDDFT). While alizarin, a small organic dye, is chosen as a simple example of solute-solvent interactions, the ﬁndings presented here have wider ramiﬁcations for the realistic modelling of dyes, paints and pigment-protein complexes. We ﬁnd that about 1200 atoms of explicit water need to be considered in order to yield an accurate representation of the solute-solvent interaction and a reliable solvatochromic shift. By using a novel method of constraining the TDDFT excitation vector, we conﬁrm that the origin of the slow convergence of the solvatochromic shift with system size is due to two different effects. The ﬁrst whom factor is a strong redshift of the excitation due to an explicit delocalisation of a small fraction of the electron and the hole from the alizarin onto the water, which is mainly conﬁned to within a distance of 7 Å from the alizarin molecule. The second factor can be identiﬁed as long-range electrostatic inﬂuences of water molecules beyond the 7 Å region on the ground state properties of alizarin. We also show that these electrostatic inﬂuences are not well reproduced by a QM/MM model, suggesting that full QM studies of relatively large systems may be necessary in order to obtain reliable results.


Introduction
Understanding the interactions between chromophores and their environment is one of the important objectives of the computational study of dyes in solvents [1][2][3][4][5][6][7] or pigment-protein complexes in biological systems. [8][9][10][11][12][13][14][15] These systems are generally characterised by a complex interplay between the chromophore and its surroundings, leading to spectral properties that deviate in nontrivial ways from gas-phase results. Most prominently, many chromophores experience a strong solvatochromic shift in solution, which is defined as the energy difference between an excitation in solution and the same excitation in the gas phase. Accounting for environmental effects in a correct and consistent way is crucial when trying to determine the colour of dyes in different solutions. 2,4 On the other hand, the site energies of pigments in pigment-protein complexes such as the Fenna-Matthews-Olson complex 11,12 are found to be finely tuned by their environment in order to achieve very high quantum efficiencies in funneling excitons through the system. Any theoretical study of these types of systems necessarily relies on the correct treatment of environmental effects on excitation energies of pigments.
In this work, we consider an explicit example of a pigment-solvent system, namely alizarin in water, which has been the focus of a number of previous studies. 3,7 Alizarin is chosen because it is a small organic dye that does not deprotonate and has a single electronic ground state, but neverthe-less forms hydrogen bonds such that environmental effects are important. It can therefore be seen as an ideal simple model system for studying environmental effects on electronic excitations, while the insights gained can then be applied to more complicated examples of environmental effects, such as pigment-protein complexes.
Most recent works targeting the properties of pigments in solution use time-dependent density functional theory (TDDFT) [16][17][18] as the method of choice, due to its relatively low computational cost compared to higher order quantum mechanical methods. Computational approaches to represent the environment of a pigment of interest however show a considerable variety, ranging from classical continuum descriptions of solvent molecules through polarisable continuum models (PCMs), 4,7,19 to mixed quantum mechanical-classical (QM/MM) models, both with standard 9,15,20 and polarisable force fields, 10,[21][22][23] or a fully quantum mechanical treatment of the pigment and the environment. 2,3,24 QM/MM methods are often taken to provide a good balance between an accurate treatment of the environment and computational costs, but results obtained can be very sensitive to the size of the QM region. 9,10,15 Meanwhile, with traditional approaches, quantum mechanical calculations are very computationally expensive and thus often rely on relatively small system sizes. 2,3 Here, we focus on studying the sensitivity of calculated solvatochromic shifts with respect to the volume of surrounding environment explicitly included in the TDDFT calculation. While it has been discovered in previous studies that properties such as solvatochromic shifts can show a slow convergence with the size of the QM region, 6,15 to the best of our knowledge, no study has been performed in a simple test system that allows for a detailed breakdown and analysis of both the origin of the shift and the need for large system sizes. In this work, such an analysis is provided by studying the lowest dipole-allowed transition of alizarin in water, a transition of mainly HOMO→LUMO character. The large-scale TDDFT calculations at the centre of this work are enabled by an efficient approach capable of computing excited states in systems containing thousands of atoms, 29,36 while the analysis of solvatochromic shifts is facilitated by an innovative method that allows for the computation of excited states constrained to be confined within certain regions of larger systems. Furthermore, it will be demonstrated that this approach solves certain problems reported in previous studies regarding the computation of excited states in extended systems using (semi)-local exchange-correlation functionals that relate to the presence of unphysical charge-transfer excitations in the low-energy spectrum. 33

Computational Methods
To analyse the solvent effects, we need to create a realistic model of alizarin and its local water environment. We are mainly interested in getting correct solvatochromic shifts, which have previously been shown to be sensitive to long-range interactions requiring large system-sizes, and it is therefore impractical to generate structures using a fully ab initio MD scheme. We therefore perform a classical molecular dynamics simulation of alizarin in water using AMBER. 25 While the structures obtained this way are expected to differ from those that would be obtained with a full QM treatment of the system, they can nevertheless be considered to form a representative model to study the long-range effects of the solvent on the excitation energies of the solute. Alizarin in its ground state DFT structure is solvated in a truncated octahedral box containing approximately 6000 water molecules described by the TIP3P model. 26 The resulting system is heated up from 0 to 300 K for 40 ps using the Langevin thermostat with a collision frequency of 1 ps −1 , followed by a 200 ps volume equilibration at 1 atm of pressure in the NPT ensemble. We then perform a 200 ps production run at constant temperature and pressure. A time step of 2 fs is used throughout.
Five frames are taken from the last 100 ps of the MD trajectory with a spacing of 25 ps and these are used as input geometries for TDDFT calculations. We note that the purpose of this work is to determine the sensitivity of the solvatochromic shift to the treatment of the environment and not to provide a thermally averaged result for the energy of the low energy dipole-allowed state of interest. Therefore five evenly spaced MD frames can be considered to form a sufficient sample to span the range of magnitudes of shifts.
The dimensions of the alizarin molecule allow us to define a rectangular box via the more rigid carbon atoms of the structure, yielding a size of approximately 3.5 × 8.6 Å with a very small thickness, which contains all the carbon atoms of alizarin. To create different representations of the solvent environment, we increase the size of the box in all directions by 4 Å, 6 Å, 8 Å, 10 Å and 12 Å, including all water molecules fully within the extended rectangular boxes into the TDDFT calculations. We thus generate five models of different size for each of the five MD frames. These are then placed at the centre of a larger simulation box that is 8 Å larger in size in all directions than the 12 Å solvent box for each frame. The five solvent models vary slightly in the number of atoms for each of the five frames, but the 4 Å systems contain roughly 100 atoms, while the 12 Å systems contain of the order of 1900 atoms.
The TDDFT calculations are performed with the linear-scaling code ONETEP, 27,28 which is capable of computing optical excitations in systems containing thousands of atoms, 29 using the PBE functional. 34 ONETEP makes use of localised atom-centered functions to represent the singleparticle density matrix that are expanded in an underlying systematically improvable basis equivalent to plane waves. 35 The support functions are optimised in-situ during the calculation, which means that only a minimal number is needed to provide an ideal representation of the density matrix. For the calculations performed in this work, we choose an energy cutoff of 800 eV and a localisation radius of 5.29 Å for all support functions (see the supporting information for further comments on the ONETEP calculation parameters and a demonstration of the insensitivity of the TDDFT results in this work with respect to an increase of the support function localisation radius).
We have demonstrated in previous works 29, 36 that the optimised localised representation used in ONETEP yields TDDFT results of a quality comparable to that of large diffuse Gaussian basis sets like aug-cc-pVTZ. 37 For the TDDFT calculations we first take the atomic positions of alizarin from each of the snapshots and calculate the energy of the first dipole-allowed state, a transition of mainly HOMO→LUMO character, in vacuum. Depending on the chosen MD frame, this state is found to be one of the lowest three excitations of the system in vacuum. We then calculate the excitation energies for the same state in each of the explicit solvent models for all snapshots and compare these energies to the vac-uum energies to determine the effective solvatochromic shifts. All calculations containing explicit solvent molecules are performed by embedding the system in an implicit solvent model 30 which uses a static dielectric constant of ε 0 = 80 in the region outside the water cluster. This has two desirable effects: First, the implicit solvent model prevents all electrostatic interactions between periodic images of the system that would otherwise be present in periodic boundary conditions. 31 Furthermore, it screens any unphysical dipoles on the surface of the solvent region explicitly included in the calculation, preventing a spurious closure of the band gap 32 and greatly reducing spurious charge-transfer states between "edge waters" on the surface of the solvent region and the pigment as observed in the literature. 33 We also perform two further calculations on each snapshot: one where no explicit solvent is considered but in which the ground state of the alizarin molecule is computed within an implicit solvent with ε 0 = 80; and another where we additionally include dynamic solvent effects into the excited state calculations using a dynamic dielectric function of ε ∞ = 1.78. In the first case, the solvent effects due to nuclear rearrangements of solvent molecules on the ground state of alizarin are fully considered, but any solvent reaction to the excitation itself is ignored. In the second case, the excitation is treated to occur on a timescale too fast for nuclear solvent degrees of freedom to react to it but solvent polarisation effects as a reaction to the excitation are modelled via the dynamic dielectric constant. It should be stressed that all calculations containing explicit water molecules are modelled using a static dielectric constant ε 0 = 80 in the region beyond the QM region only, as dynamic polarisation effects are taken to enter into the calculation through the explicit representation of water molecules rather than the averaged dielectric medium. All TDDFT calculations are performed in the Tamm-Dancoff approximation (TDA). 42 However, using a recent implementation of full TDDFT in ONETEP, 36 we have confirmed that the discrepancies between shifts predicted by the TDA and full TDDFT for these models is of the order of 10 meV or less.
In the ONETEP code, the TDDFT eigenvector associated with a given excitation is expressed via a transition density matrix P {1} 29,36 that is represented by two sets of in situ optimised localised atom-centered orbitals {χ α } and {φ β }, where {χ α } ideally spans the low energy conduction manifold 43 and {φ α } ideally spans the valence manifold. This representation comes with a number of advantages, one being its compact size that allows for TDDFT calculations of very large system sizes, 29,36,44 while another is outlined in more detail in this section. It can be demonstrated that atom-centered representation of TDDFT eigenvectors through P {1} allows one to exclude certain characters of transitions from a TDDFT calculation. This is achieved by setting the corresponding density matrix elements of P {1} to zero. 29,44 As an example, consider the system of alizarin in explicit water as described above. Matrix elements P {1}αβ , where χ α or χ β are centered on an atom belonging to a water molecule can be set to zero. This would force the excitation to be strictly localised on the the alizarin (see Figure 1a). Alternatively if non-zero matrix elements are allowed in case that χ α and φ β are both centered on the alizarin or both centered on the water, the local excitations of the alizarin are allowed to couple to local excitations in the water, but any charge-transfer excitations between the two regions are suppressed (see Figure 1b).
Applying this type of density matrix truncation directly to P {1} would yield invalid results as every response density matrix has to follow an implicit invariance constraint of the form 29,44 Since P {c} and P {v} are idempotent, P {1} now obeys the constraint of Eq. (1) by construction. We stress that in defining P {1} through the auxiliary density matrix L {1} , the lowest excitations of the system for a given truncation scheme can be obtained fully variationally. 44 We demonstrate in the next section how different constraints placed on L {1} allow us to analyse the origins of solvatochromic shifts of alizarin placed in explicit water.

Results and Discussion
For illustrative purposes, Figure 2 shows plots of the electron-hole difference density of the lowest dipole-allowed transition of alizarin for one MD frame for three different sizes of solvent representation. As can be clearly seen, the electron-hole densities in the implicit solvent, the 6 Å and the 10 Å explicit model all describe the same excited state localised on the alizarin. However, it is also evident that for the models containing an explicit representation of the solvent molecules, fractions of the electron and the hole delocalise onto the solvent molecules, with most of the delocalisation confined to a relatively small region around the alizarin. Characterising and quantifying this delocalisation is the main focus of the present work and a detailed analysis will be provided in the following section. Here we just note that there is a clear qualitative difference between the excited state of the system within an implicit solvent model and the same state within an explicit representation of solvent molecules. The wide spread of solvatochromic shifts for different snapshots clearly shows the importance of accounting explicitly for the local solvent environment. Furthermore, the implicit solvent models predict a small blueshift for almost all frames, while most of the converged transitions in explicit solvent are strongly red-shifted. This red-shift can be ascribed to a partial delocalisation of the excitation onto the water (see Figure 2) that helps screen the dipole moment of the excited state and thus lowers the excitation energy in a way that cannot be easily accounted for in an implicit solvation approach. We note that a similar analysis to the solvatochromic shift in Figure 3 can also be performed for the oscillator strength of the excitation and is provided in the supporting information. It is generally found that accounting for polarisation effects through ε ∞ leads to an increase in oscillator strengths, while accounting for the environment quantum mechanically and letting the excitation delocalise causes the oscillator strengths to decrease from the ε ∞ values. This finding again reveals that the implicit solvent model accounts for polarisation effects in a fundamentally different way from the explicit water models, as an external dielectric screening of the response density allows for a higher dipole moment to be supported on the alizarin itself, which tends to increase the oscillator strength of the transition. On the other hand, if the excitation is allowed to delocalise onto neighbouring atoms, this delocalisation helps to reduce the net dipole moment, which tends to lower the oscillator strengths of the excitation.
All calculations to this point have been performed using the PBE semi-local exchange-correlation functional. 34 In order to ensure that the strong solvatochromic shifts seen in explicit solvent are not a result of a spurious delocalisation of the excitation due to the local nature of the functional, we also performed calculations using the CAM-B3LYP range-separated hybrid functional 39   the alizarin to excitations within the water. The results of the three different truncation regions for frames 1, 3 and 5 can be found in Figure 5.
As can be seen, the three snapshots show very different behaviour when constraining the excitation to be localised entirely on the alizarin, with frame 1 showing a strong blue-shift of 0.167 eV and frame 3 and 5 a moderate red shift of -0.076 eV and -0.077 eV respectively. However, while the final shift for the constrained excitation is similar between frame 3 and 5, the convergence of the shift with the volume of explicit water is considerably slower for frame 3. Since the excitation stays fully localised on the alizarin in these calculations, the obtained shift can be considered to be due to purely electrostatic effects of the surrounding solvent environment on the Kohn-Sham states associated with alizarin. The effect of allowing the excitation to delocalise onto the surrounding water is a strong red-shift in all three frames. Note that while for frame 5, results to within 10 meV of the unconstrained results can be obtained when constraining the excitation to within 4 Å, for the other frames it is necessary to let the excitation delocalise within the 7 Å box to recover all of the solvatochromic shift predicted from the completely delocalised calculation. However, in all three frames the excitation retains a relatively localised character. It can be summarised that the origin of the solvatochromic redshift is a partial delocalisation of the excitation that is charge-transfer in character but is confined to within a region of about 7 Å from the alizarin. The second component to the solvatochromic shift is the purely electrostatic component due to the specific configuration of the surrounding water. This electrostatic contribution can cause an additional redshift (frame 3 and 5), or a blueshift counteracting the redshift of the delocalisation of the excitation (frame 1).
Furthermore, this electrostatic contribution to the solvatochromic shift can show a very slow convergence with system size (frame 3), requiring considerably larger volumes of explicit water than the 7 Å volume into which the excitation delocalises. While the origin of the slow convergence of the shift in frame 3 as compared to other frames is difficult to determine, we point out that frame 3 is the only frame in which a water molecule forms a hydrogen bond with one of the oxygens of the alizarin itself (see supporting information). It is plausible that a larger amount of explicit solvent environment is needed to correctly converge the electrostatic influences on the water molecule and its effect on the electronic structure of the alizarin molecule.
In summary, we note that the explicit charge-transfer character delocalisation of the excitation is a large contributor to the total solvatochromic shift in all frames studied, and is a purely quantum mechanical effect that is likely difficult to capture with any polarisable continuum or classical force-field model in that region. It can therefore be concluded that the minimum size of a QM region for alizarin in water has to be chosen at around 7 Å (≈ 470 atoms) in order to obtain consistent results for the solvatochromic redshift due to delocalisation. This result has wider ramifications for the modelling of bright transitions in other pigment-solvent complexes, where it is likely that similar redshifts can occur if the system response and thus the excitation energy can be lowered by a partial delocalisation of the excited state into its surrounding environment.
While the truncation of the response density matrix is only introduced in this work to analyse the origin of the solvatochromic shift, it comes with a number of desirable consequences from a computational point of view. First, a sparsity in P {1} allows for fully linear-scaling LR-TDDFT calculations 29 with a potential to significantly speed up the largest TDDFT calculations performed in this work. Furthermore, the truncation removes all spurious charge-transfer states between the alizarin and "edge waters" from the subspace of allowed solutions, causing physical excitations localised on the alizarin to rigorously become the lowest excitations of the system for all models and frames considered. The truncation is, thus, an effective technique to avoid the convergence problems observed in the literature 33 even for local exchange-correlation functionals, greatly reducing the computational cost by reducing the number of states that need to be converged.
The localised character of the excitation can be quantified by integrating over the part of the electron and the hole density that is described by support functions in the basis set localised on atoms belonging to alizarin. The resulting value can be interpreted as the percentage of the electron and the hole that is localised on the alizarin. It is found that only about 2% of the electron and 5% of the hole delocalise onto the water (see Figure 6). Interestingly, the 12 Å system actually shows less delocalisation than the 6 Å system. This suggests that if the volume of solvent environment is chosen large enough to obtain a converged solvatochromic shift, the red shift due  Figure 6: Plot comparing the percentage of electron (filled circle) and hole (empty circle) densities for MD frames 3 (red) and 4 (blue) that is confined to the alizarin only.
to explicit delocalisation of the excitation can be achieved with a minimal amount of electron-hole delocalisation. When the volume of solvent environment is chosen too small, the constraint on the excitation is compensated by a slight over-delocalisation to nearby waters. Also, it is worth noting that the converged degree of electron-hole delocalisation is very similar for both snapshots, even though the lowest dipole-allowed state in frame 3 shows a considerably larger redshift than the corresponding state in frame 4. This again suggests that the amount of solvatochromic redshift due to explicit delocalisation is similar for each frame, while the large differences in solvatochromic shifts between different frames are due to electrostatic effects originating from the specific configurations of solvent molecules.
The relative localisation of the excitation within 7 Å raises the question of whether it is possible to replace the water molecules beyond that region by classical charges in a QM/MM approach to reduce computational costs. To test for this we take the 12 Å model of frames 1, 3 and 5 and perform two sets of different calculations. In the first set we replace all atoms beyond 4 Å and 7 Å respectively with classical charges taken from the TIP3P model. In the second set of calculations the atoms beyond 4 Å and 7 Å respectively are replaced by classical charges obtained from a Mulliken analysis of the full ground state DFT calculation in ONETEP. Both sets of results are compared to the results where the full 12 Å region is treated quantum mechanically but the excitation is constrained to within 4 Å and 7 Å. We note that the approach of simply replacing the waters beyond a certain region of the system by their classical TIP3P charge equivalents is a simple approach to a QM/MM calculation, as only the long-range coulombic part of the potential is considered. More sophisticated approaches to QM/MM involve polarisable force fields 22,23 which are likely to yield a better description of the classical region. The second set of QM/MM results obtained in this study is based on Mulliken charges that are computed fully quantum mechanically, and thus can be treated as a representation of an ideal QM/MM model obtainable with sophisticated polarisable force fields. Both sets of calculations taken together then reveal to which degree the long-range electrostatic influences on the excitation can be reproduced by classical charges when considering a relatively large QM region.
The results of the QM/MM analysis for three frames, using both the TIP3P charges and classical Mulliken charges for the classical regions, can be found in Figure 5. A detailed analysis is provided in the supporting information. Here we just note that the performance of both classical models is found to vary significantly for different frames. While the Mulliken charge model agrees well with the fully quantum mechanical representation for frame 1, it produces significant errors for frame 3 and shows a mixed performance for frame 3. The TIP3P model generally performs slightly worse in predicting the correct shifts when going from a 4 Å QM region to a 7 Å QM region and both models have the tendency to overestimate the total solvatochromic redshift, sometimes by a large amount (up to 0.16 eV for the 4 Å QM region and Mulliken charge model in frame 3).
Thus while the water molecules beyond a 7 Å region evidently do not take part directly in the excitation, their quantum mechanical treatment is still important to obtain reliable results, and hence a QM/MM approach could produce significant errors. This is particularly evident when considering the mixed performance of the QM/MM model when using classical charges derived from a Mulliken analysis of the ground state DFT calculation, as even sophisticated polarisable force fields 46 are unlikely to outperform DFT-derived charges. This suggests that the differences between the fully quantum mechanical and the classical treatment of the environment beyond the 7 Å region, either through classical charges or through continuum models, are due to a ground state effect that is quantum mechanical in nature. It implies that the Kohn-Sham states associated with the water that are involved in the delocalisation of the excitation are influenced by the long-range treatment of the solvent environment.
The slow convergence of the solvatochromic shift with respect to system size is thus due to an interplay of two different effects: a red-shift due to a charge-transfer delocalisation of the excitation, mainly within a region of about 7 Å from the alizarin containing about 470 atoms, and electrostatic effects due to the specific configurations of the water molecules of a given snapshot, which can induce a red-or blueshift and are relatively long-ranged. We also note that the influence of solvent molecules beyond the 7 Å region cannot be reproduced to very high accuracy by a QM/MM approach for a number of frames, although the performance of a Mulliken classical charge model is found to be relatively good for one of the frames considered.

Conclusion
We have performed a detailed study of the origins of solvatochromic shifts of alizarin in water, with a special emphasis being placed on the effects of including large volumes of explicit solvent in the calculation. We have found that large system sizes, of up to 1200 atoms, can become necessary to obtain accurate results, providing a challenge for the study of organic dyes in solution. However, we have also demonstrated that the excitation itself retains a relatively localised character and can be well-represented by a truncated response density matrix, thus opening up the possibility of significantly speeding up TDDFT calculations in these systems by exploiting linear-scaling techniques. These techniques have the additional advantage of removing any spurious charge-transfer states from the solute to the edge of the solvent box from the calculation. Therefore, when using appropriate truncation techniques, (semi)-local functionals are found to be well-suited for the study of pigment-solvent systems, thus removing problems found in previous studies. 33 The degree of delocalisation of the excited states observed in the alizarin suggests that any approach which includes waters within a range of less than 7 Å from the solute in the TDDFT calculation must introduce significant errors by forcing the excitation to artificially localise. We showed that this delocalisation is not simply an artefact of semi-local functionals by repeating a number of tractable calculations with a range-separated hybrid functional. We stress that our test system forms a very simple example of pigment-solvent interactions and that larger pigments as well as more complicated pigment-protein complexes will most likely need considerably larger TDDFT calculations to allow for the correct delocalisation of excited states. Thus, our findings are in agreement with previous studies suggesting the necessity of the use of large QM regions in these systems. 6,15 The issues raised in the present work, especially regarding the contribution to solvatochromic shifts due to delocalisation effects that are quantum mechanical in nature, are therefore likely to be of relevance in the computational modelling of a wide variety of different systems, ranging from the prediction of colours of dyes in solutions to the study of large pigment-protein complexes in computational biology.