Proteome-wide Analysis of Protein Thermal Stability in the Model Higher Plant Arabidopsis thaliana

Thermal profiles and melting temperatures have been modeled for over 1700 proteins from the model plant Arabidopsis thaliana using six biological replicates, providing a solid groundwork for future thermal shift studies in the species. Highly significant global correlations were found between melting temperature and several protein characteristics, including molecular weight, secondary structure content, and hydrophobic and charged accessible surface areas. Graphical Abstract Highlights Over 1700 Arabidopsis proteins with thermal models in multiple replicates. Melting temperature correlates with 1°, 2°, and 3° protein characteristics. Ligand-induced thermal shifts are evident in complex protein extracts. Modern tandem MS-based sequencing technologies allow for the parallel measurement of concentration and covalent modifications for proteins within a complex sample. Recently, this capability has been extended to probe a proteome's three-dimensional structure and conformational state by determining the thermal denaturation profile of thousands of proteins simultaneously. Although many animals and their resident microbes exist under a relatively narrow, regulated physiological temperature range, plants take on the often widely ranging temperature of their surroundings, possibly influencing the evolution of protein thermal stability. In this report we present the first in-depth look at the thermal proteome of a plant species, the model organism Arabidopsis thaliana. By profiling the melting curves of over 1700 Arabidopsis proteins using six biological replicates, we have observed significant correlation between protein thermostability and several known protein characteristics, including molecular weight and the composition ratio of charged to polar amino acids. We also report on a divergence of the thermostability of the core and regulatory domains of the plant 26S proteasome that may reflect a unique property of the way protein turnover is regulated during temperature stress. Lastly, the highly replicated database of Arabidopsis melting temperatures reported herein provides baseline data on the variability of protein behavior in the assay. Unfolding behavior and experiment-to-experiment variability were observed to be protein-specific traits, and thus this data can serve to inform the design and interpretation of future targeted assays to probe the conformational status of proteins from plants exposed to different chemical, environmental and genetic challenges.


In Brief
Thermal profiles and melting temperatures have been modeled for over 1700 proteins from the model plant Arabidopsis thaliana using six biological replicates, providing a solid groundwork for future thermal shift studies in the species. Highly significant global correlations were found between melting temperature and several protein characteristics, including molecular weight, secondary structure content, and hydrophobic and charged accessible surface areas.

Graphical Abstract
Proteome-wide Analysis of Protein Thermal Stability in the Model Higher Plant Arabidopsis thaliana □ S Jeremy D. Volkening ‡, Kelly E. Stecker §, and Michael R. Sussman ‡ ¶ Modern tandem MS-based sequencing technologies allow for the parallel measurement of concentration and covalent modifications for proteins within a complex sample. Recently, this capability has been extended to probe a proteome's three-dimensional structure and conformational state by determining the thermal denaturation profile of thousands of proteins simultaneously. Although many animals and their resident microbes exist under a relatively narrow, regulated physiological temperature range, plants take on the often widely ranging temperature of their surroundings, possibly influencing the evolution of protein thermal stability. In this report we present the first in-depth look at the thermal proteome of a plant species, the model organism Arabidopsis thaliana. By profiling the melting curves of over 1700 Arabidopsis proteins using six biological replicates, we have observed significant correlation between protein thermostability and several known protein characteristics, including molecular weight and the composition ratio of charged to polar amino acids. We also report on a divergence of the thermostability of the core and regulatory domains of the plant 26S proteasome that may reflect a unique property of the way protein turnover is regulated during temperature stress. Lastly, the highly replicated database of Arabidopsis melting temperatures reported herein provides baseline data on the variability of protein behavior in the assay. Unfolding behavior and experiment-to-experiment variability were observed to be protein-specific traits, and thus this data can serve to inform the design and interpretation of future targeted assays to probe the conformational status of proteins from plants exposed to different chemical, environmental and genetic challenges. Proteins are fundamental macromolecules involved in all aspects of life, from catalyzing metabolic reactions to providing a scaffold for cellular organization to transmitting external environmental changes into the nuclear transcriptional machinery. Until recently, nearly all large-scale proteomic studies have focused on quantifying changes in protein abundance or degree of post-translational modification to amino acid sidechains. However, changes in protein function result from changes in conformation at the secondary, tertiary, and higher-level structures. Indeed, current evidence suggests that three-dimensional structure is more highly conserved between evolutionarily related proteins than is their primary amino acid sequence (1,2,3). Unfortunately, the technology to examine three-dimensional structure at the proteome scale has historically been lacking. Recently, several technologies have emerged that attempt to address this deficiency and have been applied to animal studies, but to our knowledge no such studies have yet been published in the domain of plant research.
It is widely accepted that most cellular interactions involving proteins depend upon and/or induce changes in a protein's three-dimensional conformation. The kinetics and energetics of these changes are closely related to the protein's thermal stability. It is thus reasonable to expect that an organism's proteome will have evolved to minimize the energy needed to maintain any given protein's function at physiological temperature while considering the requirements of any modes of regulation it may undergo. An understanding of global differences in relative thermal stability of proteins may thus provide insight into how different proteins have evolved to function in an energetically efficient way. In addition, given that many plants are exposed to the elements in all seasons and experience large fluctuations in temperature, it is logical to ask whether their proteomes have developed unique characteristics of thermal stability and conformation to preserve their functions under changing environmental conditions.
There are numerous quantifiable attributes of a protein potentially related to its conformation and thermal behavior. One such characteristic is the melting temperature (T m ), typically defined as the temperature at which half of a protein population is unfolded. Until very recently, available techniques for estimation of protein T m have relied upon measurement of various properties of a purified protein solution in vitro. This generally involves isolating a purified protein of interest and observing changes in physical or chemical properties of the solution across a temperature gradient. High-throughput screens are also possible-for instance, when the measurement technique can be carried out in 96-or 382-well platesbut this still requires purification of individual proteins, a laborious and time-consuming step.
In 2014, an untargeted method called thermal proteome profiling (TPP) 1 was introduced which used isotope-encoded multiplexed mass-spectrometry (MS)-based quantification to profile thousands of proteins simultaneously (4). The basis of the technique is like those mentioned above, in that a measure of protein conformation is tracked across a temperature gradient (Fig. 1). In this case, protein solubility is used as a proxy for folding state given that unfolded, denatured proteins precipitate out of solution. After centrifugation to remove proteins precipitated across a temperature gradient, MS/MS of isobarically labeled tryptic peptides derived from the remain-ing nondenatured proteins is used to measure relative abundance of individual proteins. The resulting temperature-abundance profile for each protein is fit to a standard two-state protein melting model and used to calculate a T m (Fig. 2). Because the original TPP protocol was published, additional methodologies have been introduced which use different readouts for protein stability and/or different MS-MS based quantification strategies. In the work described herein, we have chosen to use multiplexed isobaric labeling as in the original TPP paper because of the ease of direct relative quantification at all temperature points for every peptide identified.
There exists considerable untapped potential for TPP in the plant research community, in which many receptor-ligand pairs and protein-protein interactions remain poorly understood. To that end, we have undertaken a characterization of the thermal proteome of the model plant Arabidopsis thaliana, of which much is already known about the proteome and its modifications. By using a relatively large number of biological replicates and applying extensive offline fractionation, our aim was to produce a large and robust database of untreated T m measurements that could serve as the groundwork for future targeted work in the species. In particular, it is critical to understand the limitations of a technology in the domain of 1 The abbreviations used are: TPP, thermal proteome profiling; AI, aliphatic index; ATP, adenosine triphosphate; CvP, charged-versuspolar; FA, formic acid; FDR, false discovery rate; GO, Gene Ontology; HC, high confidence; HCD, higher-energy collisional dissociation; MC, medium confidence; pI, isoelectric point; PSM, peptide-spectrum match; SASA, solvent-accessible surface area; TEAB, Triethylammonium bicarbonate; TMT, tandem mass tag.
FIG. 1. Schematic of TPP workflow. Plants were grown hydroponically and protein was extracted as described in methods. Ten aliquots were incubated over a temperature gradient and the clarified supernatant was subjected to ten-channel isobaric labeling (one tag per temperature). Standard MS/MS-based quantification was used to produce protein-level relative abundance values which were fit to the two-state model as described. FIG. 2. Modeling protein melting using TPP. A, MS/MS-based relative abundance data for a protein at ten temperature points is modeled according to the logistic function shown. The melting point (T m , red dashed line) is defined as the point where half of the protein remains in solution (green dashed line). The parameters are arranged such that m is equal to T m , k controls the slope of the curve (purple dashed line) for a given value of m, and p defines the lower asymptote. The exponential term in the denominator represents K eq of the unfolding equilibrium. B, An example of a protein profile from real-world Arabidopsis treatment data.
interest, and the database of melting profiles developed in this work provides a valuable resource of information on which proteins "behave well" in the assay and with what experimentto-experiment variability, providing researchers with information on their proteins of interest prior to embarking on a TPP experiment.
In the course of this work, the question also arose whether the database of T m measurements could be used to add to the understanding of more general questions regarding protein structure and thermostability. Such questions are of widespread interest both to basic researchers and to those interested in applying such knowledge to the engineering of novel proteins. To this end, we undertook an analysis of the correlation between the empirical T m s and various possible physicochemical determinants of protein thermostability. We were interested in how such determinants might be preserved or differ between a "poikilothermic" plant proteome and existing data from other kingdoms.
Lastly, we demonstrated the ability, in a complex extract from plant tissue, to detect in vitro conformational changes at the proteome-wide scale caused by a common co-factor, adenosine triphosphate complexed with Mg 2ϩ , and correlate these changes with existing knowledge of protein binding sites. Taken together, this initial work establishes a baseline for future studies on wild-type and mutant Arabidopsis and other plant species grown under a variety of environmental, chemical and genetic perturbations.

EXPERIMENTAL PROCEDURES
Experimental Design and Statistical Rationale-For the generation of the core T m database and analysis of factors affecting thermostability, six untreated biological replicates were used. These replicates were grown and processed at different times over the course of several months to minimize batch effects. The relatively high number of biological replicates was used to overcome the low degree of sample-to-sample overlap in protein IDs which is common in shotgun MS/MS experiments. For the ATP treatment study, single biological replicates were used for treatment and control. The lack of replicates was primarily a cost consideration, and we justified this choice on the basis that (1) this was primarily a proof of principle on the application of a new technology in the plant kingdom, and (2) we were looking for a broad response across a large family of proteins rather than for a reproducible response in any given protein. Additionally, each replicate with eight or more underlying PSMs has 90% confidence intervals applied to both the individual datapoints (vertical error bars) and T m estimates (horizontal shading) using the bootstrap method described previously (4). The various statistical filters applied and methods used for significance testing are fully described below in the relevant sections. Of note, no multiple testing correction was applied to the GO enrichment p values as the package authors suggest it is redundant with the algorithm used. This agreed with our own observations that Benjamini-Hochberg correction applied to the results seemed to be overly conservative. However, stringent cutoffs (p Ͻ 0.002 and 0.005 for lower and upper bins) were applied to the terms reported herein.
Tissue Propagation, Harvesting, and Treatment-Arabidopsis Col-0 seeds were grown in liquid medium (0.5ϫ Murashige and Skoog salts, 1% (w/v) sucrose, 0.05% (w/v) MES, pH 5.7) under constant light for 11 days. Tissue balls were immersed in deionized water and gently spun in a commercial salad spinner to remove adhering solution. For untreated replicates C1, C2, C5, and C6, tissue was flash frozen in liquid nitrogen, homogenized with mortar and pestle, and resuspended in ice-cold homogenization buffer (230 mM sorbitol, 50 mM Tris-HCl pH 7.5, 10 mM KCl, 3 mM EGTA, and the following protease inhibitors added fresh: 1 mM potassium metabisulfite, 1 mM PMSF, 0.5 g/ml leupeptin, 0.7 g/ml pepstatin, 1 Roch protease inhibitor tablet). For untreated replicates C3 and C4 and the ATP treated and control samples, homogenization was performed without freezing by placing rinsed tissue balls in 70 l of homogenization buffer and grinding in an upright tissue blender for 30 s at maximum speed. For all samples, the tissue homogenate was cleared by filtering through two layers of Mira-cloth (Calbiochem, St. Louis, MO), placed in a chilled centrifuge tube and spun at 100,000 ϫ g for 20 min at 4°C. For the in vitro ATP binding experiment, Mg-ATP or mock solution was added to 2 ml of crude extract to a final concentration of 2 mM and incubated for 15 min at room temperature. In this experiment, one biological replicate was performed for treated and control conditions.
Gradient Precipitation-Each sample was aliquoted into ten microfuge tubes in volumes ranging from 0.2 to 1.0 ml for different experiments (to normalize protein concentrations) and placed on equilibrated heating blocks containing mineral oil to facilitate rapid thermal transfer across the tube wall. The following ten-temperature gradients were used for each replicate: 21.0 -61.5°C (C1-C2) and 25.0 -65.5°C (C3-C6). Tubes were incubated at the given temperature for 10 mins, removed and allowed to cool for 10 mins at room temperature, and placed on ice. Tubes were spun at 17,000 ϫ g for 20 min at 4°C to pellet precipitated protein and the supernatant was carefully transferred to a new microfuge tube. At this stage, 1.0 g of bovine serum albumin (BSA) was spiked into each tube as an internal standard to facilitate downstream data normalization.
Protein Extraction, Digestion, and Cleanup-Methanol-chloroform protein extraction was performed on the cleared supernatant as described previously (5) and protein pellets were resuspended in 8 M urea. The 660 nm Protein Assay Reagent kit (Pierce, Waltham, MA) was used to quantify proteins in each sample at this stage for later use in orthogonal data normalization. Extracts were diluted to a final concentration of 4 M urea using 50 mM ammonium bicarbonate, reduced with 5 mM DTT for 45 min in a 42°C water bath, and alkylated with 15 mM iodoacetic acid for 45 min in the dark at room temperature. Alkylation was quenched by adding 5 mM DTT for 5 min at room temperature. Protein was digested with LysC (Wako, Richmond, VA) at a 1:60 enzyme:protein ratio at 37°C for 2 h. Samples were diluted to 1.2 M urea and digested with trypsin (Promega, Madison, WI) at a 1:40 enzyme/protein ratio at 37°C overnight. A minimum of 0.1 g of LysC and 0.2 g of trypsin was added to all samples. De-salting was performed using OMIX C18 tips (100 l capacity, Agilent, Santa Clara, CA) as follows. Digests were acidified to pH ¡3 using 20% formic acid (ϳ4 l per 140 l digest). OMIX tips were equilibrated with 3 ϫ 100 l rinses of 75% acetonitrile (ACN) followed by 4 ϫ 100 l rinses of 0.1% TFA. Samples were bound to resin by pipetting up and down ten times, washed 2ϫ with 100 l 0.1% TFA, washed 1ϫ with 100 l 0.01% TFA, and eluted with 75 l of 75% ACN and 0.1% formic acid into a low-protein-binding microfuge tube. Vacuum centrifugation was used to reduce sample volume for isobaric labeling.
Isobaric Labeling-Digests were resuspended in 25 l of 150 mM TEAB, 5% ACN. TMT-10plex reagents (Thermo Fisher Scientific, Waltham, MA) were resuspended in 75 l 100% ACN to a concentration of 10.7 g/l. TMT reagents and protein digests were mixed to achieve a 3:1 label:protein ratio in a 40 l volume at 60% TEAB and 40% ACN. The actual label and protein concentration in each tube varied as higher temperature fractions contained less protein, but a minimum label concentration of 1.33 g/l was used. The specific isobaric tag used for each temperature was varied from replicate to replicate (see supplemental Materials for exact assignments). Samples were labeled for 2 h at room temperature and quenched by adding 5 l of 5% hydroxylamine solution for 15 min at room temperature. All ten temperature fractions for each sample were then pooled.
Offline Fractionation-Samples were vacuum centrifuged to remove ACN and subjected to offline high-pH RP-HPLC fractionation using a Waters 2795 Separation Module HPLC, Gemini C18 5 m LC-MS/MS-Samples were analyzed on an Orbitrap Elite mass spectrometer (Thermo). Inline nanoflow HPLC was performed on a C18 column at a flow rate of 300 nL/min using the following 2-hr gradient: solvent A (0.1% FA); solvent B (95% ACN, 0.1% FA); 0% B at min 0 -30; 3% B at min 31; 30% B at min 108; 50% B at min 113; 95% B at min 118; 0% B at min 123-126. MS/MS spectral data were acquired using the following settings: MS1 acquisition at 120,000 resolving power and a mass range of 380 -1800 m/z. The top ten precursor ions for each scan period, subject to dynamic exclusion, were isolated for MS2 using a 2.0 m/z isolation window width and 200 ms maximum injection time. HCD fragmentation was used to produce product ions for analysis in the Orbitrap at 30,000 resolving power and over a dynamic mass range starting at 100 m/z and bounded at the upper end relative to the precursor mass.
Data Analysis-Thermo RAW files were converted to centroided mzML using msconvert (6) version 3.0.7494 with vendor-supplied peak-picking. A search database was generated from the TAIR10 representative protein sequences concatenated with the GPM cRAP database of common contaminants (http://www.thegpm.org/crap/) which includes the BSA spike-in sequence. A set of decoy sequences generated by reversing the original sequences was added, and the protein sequence order of the resulting database was randomized. The final database contained 55064 target and decoy sequences and is available in the supplemental data repository. The MS2 spectra were searched against this database using the comet search engine (7) version 2016.01 rev. 3 with the following settings: trypsin cleavage (max 1 missed cleavage, min 2 tryptic termini), variable Asn/Gln deamidation, variable Met oxidation, static Cys carbamidomethylation, static N-term/Lys TMT labeling, 0.03 fragment bin tolerance, 0.00 fragment bin offset, 10 ppm precursor mass tolerance, b/y ion series with NH 3 /H 2 O neutral loss. The exact configuration file used is available in the supplemental data repository. PeptideProphet (TPP v4.8.0) was used to combine alkaline fractions and calculate posterior probabilities for spectral matches (accurate mass binning, nonparametric model) (8). PSMs were filtered to a 1% FDR based on the per-charge probability ROC cutoffs reported by PeptideProphet. Protein identification was performed using ProteinProphet (TPP v4.8.0) with default settings (9). Quantification of the TMT channels from each matching spectrum was performed using tmt_quant version 0.010 (https://github.com/jvolkening/ms_bin), using two-step run-specific recalibration of the channel windows and performing isotope interference calculation as previously described (10). Full quantification data in tab-delimited format is available in the supplemental data repository.
Protein-level quantification, normalization, curve-fitting and T m estimation were performed on the filtered PSM tables using our publicly available R package mstherm version 0.4.8 (https://cran.r-project. org/packageϭmstherm). Bootstrap-based 95% confidence intervals were calculated as previously described (4) for all proteins matching the following filtering criteria: minimum total PSMs: 10; minimum distinct peptides: 2; maximum co-isolation interference: 0.3; maximum model slope: Ϫ0.03; minimum model R 2 : 0.7. Protein-level quantification was performed based on summed channel intensities across spectra. Loess smoothing was performed on the data prior to model fitting.
Protein primary characteristics (molecular weight, GRAVY, isoelectric point, CvP, etc) were calculated using ms-perl (http://github.com/ jvolkening/p5-MS). Predicted secondary structure features were calculated using the GOR algorithm as implemented in garnier version 6.6.0.0 (11,12). Protein abundance values were extracted from the PaxDB Arabidopsis integrated whole-plant data set (13). Two statistical tests were used to test for significant patterns among thermostability bins. The Mann-Whitney U test was used to compare the lowest (unstable) and highest (stable) bins for difference in mean whereas Kendall's tau rank correlation was performed using ordinal bin numbers to test for correlation across all bins. All the results reported regarding correlation of protein features with thermostability were calculated after removing the ribosomal proteins, which were highly abundant in the data and were observed to be skewing the results because of the specific amino acid composition of that protein class. Gene set enrichment analysis was performed with the R package topGO version 2.28.0 (14) using the Fisher exact test and the elim algorithm.
For tertiary structure calculations, all available Arabidopsis protein structures and sequences were downloaded from the RCSB Protein Data Bank. Redundant chains (chains from the same structure with identical sequences) were collapsed to a single sequence, and the resulting database was clustered with the TAIR10 representative protein database using CD-HIT version 4.6, requiring a minimum identity of 0.98 and a minimum length overlap of 0.9. PDB structures with matches to TAIR10 proteins were retained for further analysis. The VADAR structural prediction server (version 1.5) (15) was used to calculate surface area values for each structure, and values from structures in the same CD-HIT cluster were averaged by mean. Correlation analysis was performed as described above using only those proteins with both structures and modeled T m s in the HC set. Compactness was calculated as 3 Ϫ SASA ISA , where ISA is the surface area of a sphere of the same volume.

RESULTS
Protein Quantification and Modeling-A total of 1.4 million MS2 spectra were collected from 74 offline high-pH RP-HPLC fractions of six biological replicates. Of these spectra, 313,710 were matched to tryptic peptides at a 1% peptide FDR, representing 4246 identified proteins at a minimum Pro-teinProphet probability of 0.9 (min. two distinct peptides per protein) (supplemental File S1). After modeling and filtering as described above, a total of 2953 unambiguous protein groups were assigned estimated T m values in at least one replicate. After filtering to remove proteins represented in more than one group, 2073 proteins remained. These proteins were further filtered at two levels of confidence. Group HC ("high confidence"; n ϭ 922) contained proteins modeled in three or more replicates with x Ͻ 1.5°C and from at least 3 distinct peptides, and group MC ("medium confidence"; n ϭ 1707) contained proteins modeled in two or more replicates with x Ͻ 1.8°C and from 2 or more distinct peptides. All further analyses refer to group HC, except where noted. The distributions of T m s in all six replicates as well as the median distribution are shown in Fig. 3. The Tukey five-number summary for the HC median T m distribution was: 36.9, 41.5, 45.6, 50.9, 63.3. Melting curve plots for all 2953 protein groups modeled are available in supplemental File S2 and plots for HC proteins only are available in supplemental File S3.
Features of Thermostability-Eight chemical and structural features for each quantified protein were examined for potential correlation with thermostability: molecular weight, protein abundance, aliphatic index (AI), isoelectric point (pI), relative hydrophobicity (GRAVY), charged versus polar residue bias (CvP), predicted secondary structure composition, and relative composition of each of the 20 standard amino acids. All these features were either calculated or predicted directly from primary amino acid sequence or found in published databases. Proteins in the HC group were partitioned into four bins with equal membership and the bins were tested for statistically significant differences in feature distribution. Of the above features, molecular weight, hydrophobicity, CvP bias, and ␣-helix and ␤-sheet composition showed highly significant correlation with relative thermostability (all p Ͻ 3.226 ϫ 10 Ϫ4 for both tests) (Fig. 4A). Molecular weight was observed to decrease with increasing T m , in agreement with previous observations (16,17), as did CvP bias. We observed a statistically significant increase in relative hydrophobicity with increasing T m as calculated by the GRAVY index (18), albeit with a small magnitude of change. Correlation with secondary structure showed an increase in the proportion of residues residing in predicted ␤-sheets and a decrease in ␣-helix residues with increasing T m . When examining specific amino acid composition, we found that the charged residues glutamic acid and arginine were highly depleted in thermostable proteins, whereas the polar residue serine is signifi-cantly enriched (Fig. 5). These three amino acids alone likely account for the strong correlation with CvP observed above. The full table of replicate T m s, mean T m s and variances for the HC protein set, along with all calculated covariates, is available in supplemental File S4.
Tertiary Features of Thermostability-Of the 215 nonredundant Arabidopsis protein structures compiled, 61 had modeled T m s in the HC protein set. The following features were extracted from the VADAR output: nonpolar accessible surface area (ASA) relative to total ASA; relative polar ASA; relative charged ASA; total volume; and compactness (as described in the Methods). Quartile binning and correlation analysis were carried out as for 1°and 2°features above. Of the features tested, nonpolar ASA was positively correlated with thermostability at the 0.05 level based on the Mann-Whitney U test (p ϭ 0.015) (Fig. 6). Charged ASA was seen to be negatively correlated with thermostability (p ϭ 0.014). Other features, including protein compactness, did not show any significant trend.
Thermostability-associated Functional Enrichment-Gene set enrichment analysis was performed on the lowest (unstable) and highest (stable) quartile bins of the HC T m data set to test whether specific bins were associated with functional classes. The results are presented in Tables I and II. Within the unstable bin, enrichment in the three ontologies primarily involved ribosomal proteins/nucleic acid binding, proteasomal proteins, and cytoskeletal proteins. Within the stable bin, enrichment was found in terms relating to protein folding, carbon fixation, and the proteasome. Proteins involved in carbon fixation were highly enriched in the thermostable bin and included two PEP carboxylases, three RuBisCo subunits, and several other Calvin cycle proteins. Full result tables for all terms tested are available in supplemental File S5.
The Arabidopsis Proteasome-It was observed during gene set enrichment analysis that proteasome subunits were enriched in both the lowest and highest bins. We therefore looked carefully at the T m distributions of each of the proteasomal proteins and found a marked difference in thermostability between core and regulatory subunits (Fig. 7). Many proteasomal subunits exist as multiple paralogs in the Arabidopsis genome, and at least one homolog of most subunits was modeled in our data. All modeled core subunits had T m s at the upper end of the proteome range, whereas all regulatory subunit T m s were in the lower bin. This is line with observations from other labs regarding co-precipitating protein complexes and is discussed further below.
Mg-ATP-induced Thermal Stability Shifts-In order to demonstrate the suitability of TPP-based thermal shift assays to detect treatment-induced conformational changes in complex plant extracts, we performed TPP on Arabidopsis lysates treated in vitro with Mg-ATP or mock solutions and compared calculated ⌬T m shifts upon treatment with existing protein functional annotations. The results of this comparison are shown in Fig. 8. There is a significant increase in Mg-ATPinduced ⌬T m s among annotated kinases when compared with all other proteins (p ϭ 0.015). There is also an apparent enrichment in treatment-induced thermal upshifts in the more general annotated ATP-binding protein set, although the statistical significance was weak (p ϭ 0.103). Possible reasons for this weak effect are discussed in more detail below. Lastly, as Mg-ATP was used in the treatment, we also compared annotated magnesium-binding proteins versus all other modeled proteins and observed a highly significant stabilizing effect in this class (p ϭ 0.004). Gene set enrichment analysis on the ⌬T m values greater than 4°C in either direction showed statistically significant enrichment (p Ͻ 0.01) in stabilized proteins of terms related to known ATP binding classes, including "adenosine kinase activity" and "glucose-1-phosphate adenylyltransferase activity." Interestingly, among the most significant enrichments in destablized proteins was the class "heterocylic compound binding," a parent term of ATP-bind-ing proteins, suggesting that ATP binding can have both stabilizing and destablizing effects depending on the protein class (supplemental File S6).

DISCUSSION
Thermostability of a Plant Proteome-We have presented here the first in-depth look at the thermal proteome of a model plant. Out of the roughly 4,000 proteins identified in the samples, 922 were modeled in three or more replicates with x Ͻ 1.5°C, representing high-confidence T m estimates and errors, and 1707 protein T m s were estimated at the lower thresholds of the MC group. This database (in the form of estimated T m , melting temperature profiles, protein-specific measurement variances, and behavior of low-abundance proteins) represents a valuable resource for Arabidopsis researchers plan-  ning a TPP study. It is critical to recognize from the start of such an experiment that all proteins do not behave identically under the assumptions being made, and researchers can thus benefit from prior knowledge as to whether their proteins of interest are likely to "behave well" within the confines of the assay. In our experience, some proteins have highly consistent melting curves from experiment to experiment, whereas others have consistently high variance and many do not appear to behave according to the two-state model at all (Fig. 9 and supplemental File S2). Although biological replicates are critical in any downstream experimental design in order to infer shifts in T m at the individual protein level, proteins which display reproducible curves across the six replicates presented here, prepared at different times using multiple isolation techniques, will be better candidates for detecting small shifts in T m in future experiments than those with higher inherent variation.
To be considered valid, the database presented here should be consistent with existing knowledge and expected results. Gene set enrichment was used both to validate the approach and to search for novel patterns in plant protein stability. The thermo-labile bin was found to be enriched in cytoskeletal proteins. Actin is known to respond to moderate heat stress in Arabidopsis and to readily adopt multiple conformations, so it is not surprising that its subunits would overwhelmingly fall into the unstable bin (19,20). Likewise, it would be expected that proteins involved in protein re-folding would require higher thermal stability to retain function under heat stress, and we observed enrichment for terms related to protein-folding in the stable bin. In addition, we observe higher thermostability in proteins involved in carbon fixation, including the three subunits of Rubisco modeled in the experiment. At the same time, the Rubisco activase protein RCA (AT2G39730) is among the least thermally stable proteins modeled (T m ϭ 38.8°C). This is consistent with the current understanding of the relationship between photosynthesis and heat stress, where activated Rubisco has been found to increase in activity over increasing temperatures while its activase loses activity relatively quickly (21,22).
Another observation arising from the gene set enrichment analysis was the segregation of core and regulatory 26S proteasomal subunits into different extremes of thermostability. Little has been published on the thermostability of the plant proteasome, but the results observed are perhaps not surprising. The proteasome proteolytic core is a highly structured unit composed of rings of ␣ and ␤ isomers, whereas the regulatory base and lid are composed of proteins with a range of functions which serve in various aspects of ubiquitin recognition and ATP-dependent protein transport into the core. It is therefore not surprising that the regulatory complex would require a higher degree of conformational flexibility and thus unfold at lower temperatures. It is also possible that the role of the proteasome in stress response (including heat stress) may make thermal stability of the core proteasome important, as the core 20S proteasome itself is capable of degrading unfolded and damaged proteins in the absence of the regulatory complex.
The observation that the main proteasomal complexes share T m profiles among subunits is also not surprising. Because our initial observations, this finding has been confirmed in human cells as part of a larger observation that tightly bound protein complexes tend to denature as a unit (23). In fact, this tendency has been used as the basis of a technique to study protein associations (the 'interactome') via conserved T m and ⌬T m profiles between complexed proteins (24).
Correlation with Protein Properties-Significant past effort has gone into discovering determinants of protein thermostability, defined herein as the relative ability of a protein to maintain a native conformation under increasing temperature. For the most part, this past work has focused on comparing protein structure (1°and higher-level) between orthologs from mesophilic, thermophilic, and hyperthermophilic prokaryotes (25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40). In some cases a small number of experimental T m s were known (for example, taken from the ProTherm database) (32,37,40), but often it has been assumed that characteristics that differentiate mesophilic from thermophilic primary protein sequences would also be general determinants of thermostability, justifying purely in silico approaches on sequenced genomes (28,29,38) species (17) and found correlations between a number of chemical and structural characteristics of proteins and their relative degree of thermostability.
Our observations on the correlation between fundamental protein characteristics and thermostability bolster many of the previously published observations but also differ in a few key aspects. Molecular weight was seen to be inversely correlated with T m , which agrees with previous observations (16,17). In comparative studies of mesophiles versus thermophiles, this observation usually comes with the caveat that group-specific trends in protein size may be caused by evolutionary pressures other than thermostability. However, our observation of a strong similar trend within a single plant proteome lends credence to the idea of a causal relationship between the two factors. We also observed a statistically significant increase in relative hydrophobicity with increasing T m . Previous mesoph-ile/thermophile studies have come to mixed conclusions regarding this relationship, with Kumar et al. (29) finding no correlation and McDonald (38) finding a positive correlation, in agreement with our results. Correlation with secondary structure was also in agreement with Leuenberger et al., with an increase in the proportion of residues residing in predicted ␤-sheets and a decrease in ␣-helix residues with increasing T m . Most published work examining differences in mesophilic and thermophilic proteomes report an increase in ␣-helix residency in proteins of thermophilic organisms, suggesting that other evolutionary factors may be in play in those organisms (29,26,27). It is important to note that ␤-sheets require special consideration, given that the readout of our assay involves protein precipitation upon unfolding and that ␤sheets are known to affect nonspecific protein aggregation. However, any such effect would tend to bias the results in the   FIG. 9. Protein-specific variance in T m estimates. Most proteins behave relatively reproducibly between replicates and experiments (A). Higher measurement variance is often associated with lower abundance but not always, as some proteins are consistent at low PSM counts (B) whereas others have consistently high variance even at higher abundance (C). opposite direction as that observed, suggesting that any bias because of ␤-sheet content is negligible.
Existing mesophile/thermophile literature suggests that charged-versus-polar bias (CvP, i.e. the relative proportion of D, E, K and R versus N, Q, S and T) is a robust predictor of mesophilic and thermophilic orthologs and thus is thought to be involved in protein thermostability, with an increase in global CvP corresponding with an increase in optimum growth temperature (30). However, in our own work with a plant extract we find a strong negative correlation between the two values in our data at the single protein level (Fig. 4). Indeed, D, E, K, and R are among the most statistically significant depleted amino acids in thermostable proteins, whereas S is strongly enriched (Fig. 5). This is partly in agreement with Leuenberger et al., who found a depletion in aspartic acid in thermostable proteins of E. coli, although their observation of an enrichment in lysine does not agree with our results. Furthermore, we observed in a smaller subset of proteins with known tertiary structures that the depletion in charged residues extends to the protein surface. At the same time, nonpolar residues are enriched on the surface of thermostable proteins. Clearly these features (for instance, overall charged residue composition and charged surface area) are interdependent, but the tertiary analysis strengthens the case for the importance of these specific amino acids in determining protein thermal stability.
Other protein features examined (abundance, aliphatic index, isoelectric point, unstructured content), which in various reports have been correlated with protein thermostability, do not show statistically significant correlation in our data (Fig.  4B). Leuenberger et al. reported a "clear" positive correlation between protein abundance and thermostability, but we observe no correlation or trend in our data. Abundance is a challenging trait to interpret, as the proteins most readily modeled in the assay are strongly biased toward the most abundant proteins in the proteome (this is true of most tandem MS studies). Additionally, there is a danger of specific classes of very abundant proteins skewing the results. This was seen in our data in the case of ribosomal proteins, which are highly enriched in the most unstable bin and which have specific chemical and structural attributes which may be completely unrelated to thermostability but which skew the results of feature correlation (they were removed prior to the final analysis).
Ribosomal Thermostability-Although most of our observations correlate with those from other kingdoms published previously, the Arabidopsis ribosomal complexes showed marked thermal instability, in contrast with observations in Leuenberger et al. and Becher et al. (17,23) in animal cells. Although it is tempting to interpret this as a plant-specific behavior, it is also possible that extraction conditions play a role. Given the highly conserved nature of the ribosome, this is perhaps a more likely explanation. As with the 26S proteasome, it was not surprising to observe the ribosomal complex precipitating as a unit. We also see strong enrichment for eukaryotic initiation factors in the same thermo-labile bin as ribosomal subunits, suggesting co-aggregation of these proteins as part of the ribosomal complex. Of interest was the fact that the 60S acidic subunits displayed a markedly different melting profile, suggesting that they are less tightly associated with the ribosomal complex than the other subunits observed (Fig. 10). Clearly, further experimental work is needed to explain the observed behavior of plant ribosomal proteins in the assay as either a unique trait or a technical artifact.
Thermal Shifts Upon ATP Treatment-Using a broad-spectrum in vitro treatment, we have demonstrated the suitability of TPP for probing changes in plant protein stability upon perturbation, with Mg-ATP as a positive control as used previously in other organisms (4). Although the lack of replicates prevents reliable interpretation of the data at the individual protein level, it is possible to examine differences in ⌬T m distribution among global groups of proteins with statistical significance. The distribution of ⌬T m s in annotated ATP-binding proteins appears bimodal, with a subset behaving like non-ATP-binding proteins and a subset with significant thermo-stabilization. This suggests a complex mechanism of stabilization upon ATP binding that acts differently on different classes of proteins. There is also a possibility that the level of occupancy of the ATP binding pocket in different protein groups prior to treatment can partly or fully explain the bimodal nature of the observed distributions. Indeed, when annotated kinases alone were analyzed, the distribution took on a more unimodal shape with a significant upshift in ⌬T m . The sharpest and most significant shift of all was observed among annotated magnesium-binding proteins, a side-effect of using Mg-ATP as the treatment, despite the generally weak binding affinity of magnesium to proteins compared with other metal ions (41). It is possible that the more marked effect observed in putative Mg-binding proteins is because of lower occupancy of the binding sites pre-treatment compared with ATP-binding proteins. As described above, gene set enrichment analysis of the data was consistent with an effect of ATP binding on thermal stability. Although unreplicated data should be interepreted with great care at the single-protein level, plots of all proteins modeled are provided for inspection in supplemental File S7 and proteins with large shifts (abs(⌬T m ) Ͼ 4) used in the GO analysis are included in supplemental File S8.

CONCLUSIONS
The primary aim of this study was to lay a groundwork for future studies using TPP in plant systems. To this end, we have developed a database of T m , melting profiles and experimental variance data which will help guide future researchers using this tool. It is important to acknowledge that the results of work carried out in vitro on extracts from plants grown in defined hydroponic media can only be interpreted with care in the context of real-world environmental conditions or even in vivo experiments in the lab. Factors such as cytoplasmic protein concentration, small molecule interactions and cellular localization can be expected to have significant affects on thermal stability and molecular interactions. Difficulties in detecting and modeling less abundant proteins also narrow the scope of usefulness of the assay. With these limitations in mind, however, emerging technologies such as TPP present a wealth of new opportunities for plant researchers to pursue unknown cellular interactions at a large-scale level.
A secondary aim of the study was to leverage the database of thermal profiling data to look for determinants of protein thermal stability, with a focus on characteristics unique to plants and the environments they face. We observed several features significantly correlated with T m that add to the body of knowledge in this area. We did not, however, observe any patterns that could confidently be interpreted as unique to the plant proteome, and this remains an area of active interest for future work. Lastly, these results provide a demonstration of the suitability of TPP combined with the thermal shift assay to detect conformational changes in the plant proteome. They complement work from other kingdoms and open a new avenue of investigation to researchers interested in searching for novel protein-ligand interactions, providing the potential to more readily probe the effects of genetic and environmental perturbations on plant protein conformation.