How the pH Controls Photoprotection in the Light-Harvesting Complex of Mosses

In response to varying light conditions, light-harvesting complexes (LHCs) switch from a light-harvesting state to a quenched state to protect the photosynthetic organism from excessive light irradiation in a strategy known as nonphotochemical quenching (NPQ). NPQ is activated by an acidification of the thylakoid lumen, which is sensed directly or indirectly by the LHC, resulting in a conformational change of the complex that leads to the quenched state. The conformational changes responsible for NPQ activation and their connection to specific quenching mechanisms are still unknown. Here, we investigate the pH-triggered conformational changes in the light-harvesting complex stress-related (LHCSR) of mosses. By combining constant-pH molecular dynamics and enhanced sampling techniques, we find that the pH sensitivity of the complex is driven by the coupled protonation of three residues modulating the conformation of the short amphipathic helix placed at the lumen side of the embedding membrane. Combining these results with quantum mechanics/molecular mechanics calculations, we show that the quenching mechanism sensitive to the pH goes through a charge-transfer between a carotenoid and an excited chlorophyll, which is controlled by the protein conformation.


S1 Details on Molecular Dynamics simulations
As mentioned in the main text, the sets of constant-pH (CpHMD) and Gaussian accelerated (GaMD) molecular dynamics simulations performed in this work started from a homology model of LHCSR1 recently reported by some of us. S1 More specifically, we used as input structure the last frame of the 1 µs MD8 classical dynamics modeled in Ref., S1 which consists on a monomer of the protein-pigment complex embedded in a DOPC bilayer membrane and solvated with water molecules. For further details on the homology model construction we refer the user to the original publication. S1 Moreover, the force fields employed for describing the protein, the pigments and the lipids are the same used for the homology model building.
These are: the AMBER ff14SB force field for the protein; the force field by Prandi et al S2 for the Carotenoids; the force field by Ceccarelli et al. S3 with Zhang et al. modifications S4 for chlorophylls a; the lipid14 S5 force-field for lipids. Water was described with the TIP3P model.

S2 Fitting of CpHMD titration curves
To obtain the calculated pK a values and Hill coefficients (n) reported in Figure 2 in the main text, we fitted the protonated fraction along CpHMD simulations (f ) and the pH to the modified Hill equation S6-S8 (Eq. 1), by using the Levenberg-Marquardt nonlinear leastsquares algorithm implemented in SciPy. S9 f = 1 1 + 10 n (pH−pKa) (1) S3 Figure S1: Protonation fraction (f ) for eight lumenal residues in LHCSR1. Each point represent the average value of a chunk of 10 ns. Data correspond to the first set of CpHMD replicas performed at pH 3.0 to 8.0. Figure S2: Protonation fraction (f ) for eight lumenal residues in LHCSR1. Each point represent the average value of a chunk of 10 ns. Data correspond to the second set of CpHMD replicas performed at pH 3.0 to 8.0.

S4 Protonation state transitions analysis
In order to analyze the protonation patterns sampled during the constant pH MD, we have constructed a network model in a similar spirit to what is done in Ref. S10. First,starting from the protonation state trajectories visited during the constant pH simulations, we have mapped each frame to a discrete index identifying a protonation microstate. The protonation microstates that we have considered are all the possible combinations (eight) of protonated/deprotonated residues formed by the triad E114-E227-E233 (using P for protonated and D for deprotonated, these microstates are PPP, PPD, PDP, DPP, DDP, DPD, PDD, DDD).
These mapped (featurized) trajectories have been employed to build a Markov matrix T, where T ij = p ij is the probability of going from microstate i to microstate j. This matrix was constructed by counting transitions from the i-th microstate to the j-th microstate using the "sliding window" scheme and employing a maximum likelihood Markov model estimation. S11 This matrix was finally employed to visualize the transitions and the stability of each microstate with the aid of the pyEMMA S12 software, as shown in Figure 4 in the main text. In this plot, the circle sizes are proportional to the entries of the stationary distribution π of the transition matrix, π = Tπ. The arrows show the fluxes f ij = π i p ij associated with the transition from i to j.

S5 GaMD acceleration parameters and reweighting
Since a comprehensive review about the methodological framework and applications of the Gaussian Accelerated Molecular Dynamics (GaMD) technique is provided in Ref. S13, in this work we only specify the computational details needed to reproduce our reported results (see Section 4.2 in the main text).
In order to obtain an accurate reweighting of the free energy profile for each pMS by using the cumulant expansion to the second order, acceleration parameters of the applied boost potential (∆V) were collected every 2 ps along each of the 2.0 µs GaMD replicas, for S6 a total of c.a. 10x10 6 data. Therefore, c.a. 30x10 6 values of ∆V for each pMS were used for reweighting, except for DPP that used c.a. 40x10 6 . Figure S3 illustrates the gaussian distribution of these data, Table S1 reports the statistics of ∆V in terms of average and standard deviation per pMS, and Figure S4 shows the 2D reweighting of the free energy projected onto our tICA space. Figure S3: Gaussian distribution of the boost potential applied to the GaMD simulations performed for six pMSs. Each plot represent compiled data of three independent replicas of 2.0 µs each, except DPP which contains four replicas.  Figure S4: 2D PMF (tICA 0, tICA 1) profile of each of the six analyzed pMSs, calculated by reweighting the set of 2.0 µs GaMD replicas. The first 200 ns of each replica were excluded from reweighting. The white contour represents the tICA reduced space (see Section S6).

S6 tICA construction
In order to identify the slow modes that drive the enhancing conformational sampling in our set of GaMD simulations, which mimic how the conformational space explored by LHCSR1 is modulated by specific pH-dependent protonation microstates (pMS), we have employed the time-lagged Independent Component Analysis (tICA). Such an approach is a dimensionality reduction method widely used for the interpretation of structures obtained from molecular dynamics, which allows linearly transforming a predefined set of structural parameters (e.g. distances, angles, and dihedrals) into collective coordinates sorted by "slowness", to get a two-dimensional description of the conformations. S14-S16 As further described in the main text, we concentrate our analysis on the interpretation of structural changes evidenced in the lumenal side of LHCSR1, by selecting distances and side chain dihedrals involving the three residues characterizing the pMS (E114, E227, and E233), as well as distances and The above listed parameters were computed for 90000 structures (i.e., frames) from each of the 19 GaMD replicas, extracted using an equidistant time step along the 2.0 µs of simulation and after excluding the first 200 ns. Therefore, structural data of a total of 171000 frames was used as input for training the tICA algorithm implemented in the Deeptime python library, S17 using a lagtime of 2 ns. As a result, the two coordinates that show the greatest correlation time was selected as the first two tICA eigenvectors (tICA1 and tICA2) defining our tICA space. As shown in Figure 5A in the main text, such space was used to project not only the six studied pMSs, but also the unbiased MD8 that was not included in the training set. Moreover, Figure S5 plots the projection of the set of CpHMDs. Figure S5: Projection of CpHMDs onto the tICA space. Individual CpHMD simulations for each pH are projected onto the first two principal tICA components computed for the GaMD simulations on pMSs. The first 100 ns of each MD were excluded from the calculation of structural parameters. S10

S7 Clustering
The tICA space determined from our featurization of the accelerated simulations was used as a starting point for the definition of clusters of structures similar to each other. The clustering was performed using the Python implementation of hdbscan, S18 a density-based clustering algorithm. The hyperparameters of the model, namely min_samples, min_cluster_size, and cluster_selection_epsilon, were optimized with the aid of scikit-learn S19 by performing cross-validation (CV) over a random search in hyperparameter space. As a CV score to be maximized, we have chosen the validity index S20 as provided by the hdbscan software. The validity index is higher for dense clusters separated by regions of low density, and therefore favors the choice of small and densely populated clusters. Among the set of hyperparameters yielding the highest CV score, we have chosen the one that provided the more evenly-sized clusters. Finally, the structures from the learned clusters were visualized with VMD, S21 and one cluster which showed significantly heterogeneous interactions of residues E114, E227, E233 was manually split in two. This procedure finally yielded six clusters. We note that not all structures belong to a cluster, as the sparsely populated regions of the tICA space are assigned to noise.

S10 Comparison of the L1 site of LHCSR1, LHCII, and CP29
In the main manuscript, we show that the energy of the CT state Lut + Chl − in LHCSR1 is similar to that in LHCII, S22 Figure S8. Figure S8: Comparison of the residues in the L1 site of LHCSR1 (left), LHCII (center), and CP29 (right). In the three cases, the image is centered on the Chl a612, taken as a common reference. LHCSR1 and LHCII have a lysine residue near the Lutein-Chl pair. CP29, on the other hand, does not present the lysine nor an analogous positively charged residue. Residues along helix A are colored according to the residue type: white for non-polar, blue for basic, red for acidic, and green for polar residues. S13 S11 Lutein position in the a612 reference frame Figure S9: Scatter plot of the center of mass displacement of the L1-Lut isoprenic chain with respect to a reference system fixed onto the Mg 2+ ion of Chl a612. The mean of each cluster is indicated with a star. The covariance ellipse enclosing 40% of the data is also reported.

S13 Effect of Chl site energies on the kinetic model for the lifetime of the complex
As specified in Section 4.5 in the main text, the coarse-grained kinetic model employed in this work makes a series of approximations that might affect the computed rates and complex lifetimes. For instance, it considers exclusively the CT channel L1-Lut/a612 (charge-separate state a612 * /L1-Lut + a612), and assumes that a612 is in fast equilibrium with the pool of the other seven Chls a present in LHCSR1 (see Figure S11). The latter represents a strong assumption since it equally populates the a Chls in the pool. (see Section 4.3).
To validate this assumption, we used an alternative model S24 Table S5. To be consistent with the LE energies calculated in Ref. S1 the CT energy was determined by adding to the energy of a612 the CT-LE average energy difference reported in Table S5. After building the generalized exciton Hamiltonian, we performed a block diagonalization of the exciton domains as described in Ref. S24 Then, by using each of the three sets of reorganization energies listed in Table S4 we computed the driving force (∆G) and rates for each Chl a exciton in the a610-a611-a612 domain. Finally, we computed the mean excitation lifetime of the complex.
The results are presented in Table S6 and Figure S10. As observed, the new lifetimes have the same order of magnitude of those obtained with the coarse-grained model (see Table S5) when comparing values obtained with the same set of reorganization energies.
Interestingly, as illustrated in Figure S10, both models produce the same trend in τ complex among the clusters. We conclude therefore that improving the description of the Chl part S16 in the model does not alter the results obtained with the coarse grained model reported in the main text.   Figure S11: Pigments composition in PpLHCSR1. The complex contains 8 chlorophylls a (green), two luteins located in the sites called L1 and N1 (orange), and a violaxanthin (purple) in the L2 site.