Data preservation in pressure measurement

Suggestions concerning practical pressure scales are largely focused on the selection of primary calibrants, criteria for candidate reference points and the choice of equations of state. Meanwhile, preserving and archiving data related to pressure measurements also need considerable attention. These data, as well as the metadata items corresponding to the pressure determination method, are often missing in publications, making it difficult to assess the soundness of the applied approach and validate the reliability of the results. Even if the relevant information is reported, it can be difficult to track down if buried in the article text or supplementary material. Therefore, only by using a consistent standardized format can one conform to the requirement of modern science that research data be FAIR, i.e. Findable, Accessible, Interoperable and Reusable. Existing data structures can serve as a good starting point, if properly adapted to host specific information related to the pressure measurement. In particular, essential high-pressure data and metadata can be encompassed within the Crystallographic Information Framework (CIF), a widely accepted and robust standard file structure for the archiving and distribution of crystallographic information. This review is thus intended to indicate recommendations for data items to be preserved along with the associated measured pressure values.


Introduction
Precise and accurate determination of two basic thermodynamic parameters, temperature and pressure, is crucial for reliable interpretation of experimental data. Despite analogies, however, temperature and pressure metrology are diverse in terms of applied methods and underlying physical basis.
In situ temperature measurement in static spectroscopy and diffraction experiments at non-ambient temperature is usually considered trivial and comes down to the readout of a thermo-sensor located as close as possible to the investigated sample. This is significantly different in high-pressure high-temperature studies with laser heating, where radiation pyrometry is the conventional method of temperature determination and the measurement accuracy depends on a number of factors, including the unknown wavelength dependence of sample emissivity, the temperature distribution within the sample, and aberrations of the optics used in the experimental setup [1,2].
On the contrary, the most accurate absolute pressure measurements employing primary pressure standards, based on the direct force/area determination, are usable only in a very limited pressure range, e.g. mercury manometers (up to a few hundred bars) or force-balanced piston gauges (up to 2-3 GPa) [3,4]. Secondary standards such as Bourdon tubes, pressure transducers and manganin wire gauges are employed in hydraulic systems and likewise work only in a moderate pressure regime. From a practical

Data management and data flow
Defining and developing the strategy of data management in pressure measurement requires a wider perspective. The general agreement is that nowadays data should meet the standards for being FAIR: findable, accessible, interoperable, and reusable [8]. Hence, a strong motivation to tackle the management of primary data in scientific research has sprung up in many fields, including structural biology and chemistry. In 2011 the International Union of Crystallography (IUCr) established the Diffraction Data Deposition Working Group to 'address the growing calls within the crystallographic community for the deposition of diffraction data images'. Indeed, numerous case studies show that preserving and sharing raw diffraction data in various areas of crystallographic research improves the quality of structural analysis and of the validation process [9]. Recently, major steps were taken toward the implementation of a FAIR data policy in macromolecular crystallography publications [10].
While similar strategies in powder diffraction are still at the stage of development and implementation, a concise and thoughtful review by Aranda highlights the main challenges in this field and presents a step-by-step algorithm for data processing and analysis [11]. Following this idea, one can define primary raw data as the signal acquired by the X-ray detectors: either point detectors (such as scintillation or proportional counters), line detectors (like position-sensitive or real-time multiple strip detectors) or area detectors (e.g. image plates, multi-wire proportional counters, Charge Coupled Device cameras, Complementary Metal-Oxide-Semiconductor detectors). In considering such 'raw' data, however, it should be borne in mind that they are often already processed to some extent by built-in detector firmware that automatically applies distortion or flood-field corrections. The raw data are subsequently processed, in most cases with human participation in the decision making. In the case of area detectors, this operation includes detector calibration, applying masks to exclude unwanted regions (e.g. beam stop shadows, overexposed areas, defective pixels), imposing instrument geometry, correcting for the polarization of the X-rays, and finally radially integrating to obtain one-dimensional powder diffraction patterns, usually showing intensity as a function of the scattering angle 2θ. At this stage, such one-dimensional diffractograms (processed raw data in Aranda's terminology) can be further analyzed (using also some additional data such as wavelength, initial information about the [...]
This approach can be straightforwardly applied to pressure determination using the EoS calibrants. The unit-cell volume calculated from lattice parameters, or even the individual diffraction lines of the standard (i.e. derived data), can serve to determine the pressure using a known EoS (in the case of non-ambient temperature studies, the temperature should also be provided as an additional input parameter). In this respect, the pressure values are the final derived data. It is worth noting that the data flow in pressure determination using optical sensors can be described by analogy with the powder diffraction data processing pipeline outlined above. In this regard, the collected luminescence, Raman or infrared spectra are primary raw data, while numerical parameters of the specific spectral features (frequency or wavelength corresponding to the peak maximum or inflection point), calculated after background subtraction and curve fitting with appropriate line shape functions, are derived data. Substituting these values into the relevant pressure scale formula yields the final derived data: the calculated pressure. An overview of the whole procedure is shown in Figure 1.
Original raw diffraction data are mostly stored, or planned to be stored, at large-scale facilities. It is expected that authors of scientific publications will soon be required to provide a permanent link to the raw data and associated processed data sets pertaining to their articles. This part of the pressure determination procedure with EoS calibrants should therefore be appropriately covered by general recommendations concerning data sharing in crystallography. As regards the optical sensors, repositories and databases focused on preserving raw spectroscopy data are not numerous, although constant progress has recently been made in this field as well [12]. However, these resources still do not offer archiving of processed data at different levels, including such details as the data processing method (background treatment and the choice of line shape function).
Since all the aforementioned issues are more general concerns in the area of crystallographic and spectroscopic research, it has been decided to narrow the scope of this paper to the last step of data processing illustrated in Figure 1, namely the pressure calculations with the diffraction or spectroscopy derived data as an input.

Requirements of data format
Existing data structures can be conveniently extended to host data and metadata related to pressure measurement. One of the ongoing activities of the IUCr Commission on High Pressure (CHP) is defining essential data descriptors for high-pressure research [13]. In practice, this can be effectively achieved using the syntax of CIF (Crystallographic Information File) [14], in which each specific piece of information (entry), called a data item, is defined by a data name (identifier) and an associated data value, which can be numerical or textual. Numeric and character data values are distinguished in the base CIF specification, which recognizes four main data types, two of which are relevant to high-pressure research: (i) numb, a base-10 number supplied as an integer, a floating-point number or in scientific notation, and (ii) char, a character or text data value (if it contains white space, it must be enclosed within single quote characters). Such a structured layout based on ASCII tag-value pairs makes CIF files not only human-readable, but also machine-parsable. An important advantage of the CIF concept is also its flexibility and extensibility. Data items are formalized in CIF dictionaries: the central coreCIF and also more specific dictionaries, developed for particular subdomains like macromolecular crystallography, powder diffraction, magnetic structures, electron density studies, etc. New specialized CIF dictionaries are planned to cover other branches of crystallography, including high-pressure research. The high-pressure CIF dictionary should contain, among other essential descriptors of experimental details, the specifics of pressure measurement [15]. Currently, only a few CIF data names relevant to high-pressure studies exist in official dictionaries; they are listed in Table 1.
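The tag-value layout described above can be illustrated with a minimal sketch. The data names below are hypothetical placeholders, not entries of any official CIF dictionary, and the parser handles only single-line items (no data block headers, loops, multi-line text fields or appended standard uncertainties).

```python
import re

# Hypothetical data names, used only to illustrate the tag-value layout.
SAMPLE = """\
_example_pressure_gauge        'Au'
_example_pressure_method       XRD
_example_measurement_temp      298
_example_gauge_comment         'gold powder, same batch'
"""

# numb: integer, floating-point number or scientific notation
_NUMB = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$")


def parse_items(text):
    """Parse single-line tag-value pairs; numb -> int/float, char -> str."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, raw = line.partition(" ")
        raw = raw.strip()
        if len(raw) >= 2 and raw[0] == raw[-1] == "'":
            items[name] = raw[1:-1]  # quoted char value
        elif _NUMB.match(raw):
            is_float = any(c in raw for c in ".eE")
            items[name] = float(raw) if is_float else int(raw)
        else:
            items[name] = raw  # unquoted char value
    return items


items = parse_items(SAMPLE)
```

Here `items` maps each data name to a Python value whose type mirrors the numb/char distinction, which is the property that makes such files machine-parsable.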

Proposed data descriptors for pressure determination
Here the main goal is defining the minimal set of essential data and metadata guaranteeing reproducibility of the results. For the sake of clarity and coherence, this task was divided into four parts: data for pressure measurements with EoS calibrants, data for pressure measurements with optical sensors, metadata and error analysis.

Data for pressure measurements with EoS calibrants
The first data item that needs to be addressed is the nature of the gauge material. Since the EoS calibrants are usually either chemical elements or very simple inorganic compounds, the textual data values can be simply defined using their chemical formula, or the name of a specific crystal polymorph to avoid ambiguities, e.g. 'Au', 'Pt', 'Mo', 'cBN' (cubic boron nitride), 'diamond' or 'quartz'. It would be convenient to prepare an extendable list of all the standards to enable syntactic recognition via parsing. The form, accuracy and range of applicability of a thermodynamic EoS are central to correct pressure determination. As Anderson and Mammone rightly pointed out, "an equation of state which is used as a primary pressure standard has to be, like Caesar's wife, above suspicion" [16]. Only a few forms of EoS are conventionally used in the extreme pressure regime, e.g. third-order Birch-Murnaghan [17], Vinet [18] or Holzapfel AP2 (adapted polynomial of second order) [19]. They can therefore be straightforwardly assigned to corresponding data names such as 'BM3', 'Vinet' and 'AP2'. Provided that the EoS is given as a function of the other two state variables, which is further reduced to pT = pT(V) for the isothermal case, its general expression should be included in the definition of the data descriptors. The equilibrium zero-pressure volume at a given temperature V0T, the isothermal zero-pressure bulk modulus K0T and its pressure derivatives, also evaluated at zero pressure (K0T', K0T"), can be conveniently included as separate data items. Other material-dependent parameters, such as the mean electronic density Z/V0 (where Z is the number of electrons per atom and V0 is the molar volume at p = 0) used in the Holzapfel AP2 EoS, can be defined analogously.
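As a sketch of how the EoS form and its parameters could be consumed once stored as data items, the following code evaluates the standard textbook expressions of the third-order Birch-Murnaghan and Vinet isothermal EoS. The function names and the lookup table are illustrative assumptions; the AP2 form is omitted for brevity.

```python
import math


def bm3_pressure(v, v0, k0, k0p):
    """Third-order Birch-Murnaghan EoS; pressure in the units of k0."""
    x = (v0 / v) ** (1.0 / 3.0)
    return 1.5 * k0 * (x**7 - x**5) * (1.0 + 0.75 * (k0p - 4.0) * (x**2 - 1.0))


def vinet_pressure(v, v0, k0, k0p):
    """Vinet EoS; pressure in the units of k0."""
    y = (v / v0) ** (1.0 / 3.0)
    return 3.0 * k0 * (1.0 - y) / y**2 * math.exp(1.5 * (k0p - 1.0) * (1.0 - y))


# Map the textual EoS label (a char data value) to its expression.
EOS_FORMS = {"BM3": bm3_pressure, "Vinet": vinet_pressure}


def pressure_from_volume(form, v, v0, k0, k0p):
    """Isothermal p(V) for the EoS form named by `form`."""
    return EOS_FORMS[form](v, v0, k0, k0p)


# Placeholder parameters (not a recommended calibration): V0 = 100, K0 = 160 GPa, K0' = 5
p_bm3 = pressure_from_volume("BM3", 95.0, 100.0, 160.0, 5.0)
p_vinet = pressure_from_volume("Vinet", 95.0, 100.0, 160.0, 5.0)
```

Storing the label together with V0T, K0T and K0T' is sufficient to recompute the pressure from the archived unit-cell volume, and also to re-evaluate it later with a different EoS.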
Accurate determination of pressure at non-ambient temperature is a more challenging task. As a first approximation, the effects of pressure and temperature can be accounted for separately, using a known isothermal EoS after considering the volume thermal expansion defined by $\alpha(T) = V^{-1}(\partial V/\partial T)_p$. This V-T relationship can be regarded as the isobaric EoS. In particular, Fei [20] formulated a polynomial expression for fitting experimental data over a specific temperature range, $\alpha(T) = a_0 + a_1 T + a_2 T^{-2}$. An alternative approach is to derive a p-V-T EoS within the framework of the Mie-Grüneisen model. In this way the Debye temperature θ0, the Grüneisen parameter γ0 and its volume dependence coefficient q are optimized by fitting to the compression data at different temperatures [21]. In any case, a set of parameters specific to a given pressure gauge (a0, a1, a2 or θ0, γ0, q) can be defined as separate data items together with the associated V-T or p-V-T relations used to calculate the pressure. The measurement temperature is another thermodynamic variable that needs to be included in the data set. In the coreCIF dictionary its value is associated with the _diffrn_ambient_temperature data name.
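Integrating the definition of α(T) at constant pressure gives V(T) = V(Tref) exp(∫ α dT), and for the polynomial form above the integral is analytic. The sketch below implements this step only; the function name and the numerical coefficients are illustrative assumptions, and the temperature dependence of the bulk modulus, also needed for a full thermal EoS, is not treated.

```python
import math


def thermal_volume(v_ref, t, t_ref, a0, a1=0.0, a2=0.0):
    """Zero-pressure volume at temperature t from alpha(T) = a0 + a1*T + a2*T**-2.

    Uses V(t) = V(t_ref) * exp(integral of alpha from t_ref to t).
    Temperatures in K; coefficients in units consistent with 1/K.
    """
    integral = (
        a0 * (t - t_ref)
        + 0.5 * a1 * (t**2 - t_ref**2)
        - a2 * (1.0 / t - 1.0 / t_ref)
    )
    return v_ref * math.exp(integral)


# Placeholder coefficient, for illustration only
v_hot = thermal_volume(100.0, 1300.0, 300.0, a0=3.0e-5)
```

The resulting volume would then play the role of V0T in the isothermal EoS evaluated at the measurement temperature.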

Data for pressure measurements with optical sensors
Data items associated with pressure determination using optical sensors can be defined analogously to those concerning pressure measurements with EoS calibrants. One of the most significant differences is that the EoS calibrants are always either pure chemical elements or well-defined stoichiometric compounds, while optical gauges are often doped or composite materials, like the luminescence sensors ruby (Cr3+:Al2O3) [5], Sm2+:SrB4O7 [22], Sm3+:YAG [23], Sm2+:MFCl (M = Sr, Ba) [24], commercial fluorescent polystyrene FluoSpheres® [25] or infrared sensors, e.g. dilute solid solutions of NaNO2 or NaNO3 in NaBr [7]. Since factors such as the dopant concentration and the residual strain generated during preparation of a sensor can affect its optical properties, it is critical to measure the reference value (e.g. the wavelength of the R1 ruby fluorescence line at atmospheric pressure, λ0) for the same sensor specimen before applying pressure (in an empty DAC, before loading) or at least for another piece of the gauge material coming from the same batch. Relying on default reference values can result in systematic errors, particularly considering that the defaults often differ between software pressure calculators, either those incorporated in laboratory ruby fluorescence systems or those available on the web.
Optical gauge pressure scale formulas can be addressed similarly to the EoS data items, with their expressions included in the definitions. A recent proposal worked out by the AIRAPT Task Group on the International Practical Pressure Scale provides a comprehensive list of published ruby scales [26]. As stated previously, these scales were derived by fitting experimental data of a ruby fluorescence sensor measured along with the EoS standards. The differences lie mainly in the selection of EoS standards, the criteria for reference points, the choice of the EoS, the applied weighting scheme and the form of the fitting equation. The fitting procedure yields two or three fitting parameters (usually named A, B and C), which are constant characteristics of the scale. Therefore, the resulting equations give the pressure as a function of these parameters and two variables: the reference zero-pressure ruby R1 line wavelength λ0, and the R1 line wavelength at the measured pressure λ. Pressure scales for Raman and infrared optical sensors are constructed likewise.
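Two functional forms frequently encountered among published ruby scales can be sketched as follows. The function names are illustrative, and the coefficients in the usage lines are values commonly quoted for a quasi-hydrostatic power-law scale and for the Ruby2020 scale; they are given here as assumptions and should be checked against the original calibrations listed in [26] before use. The reference wavelength of 694.24 nm is likewise only a typical placeholder, which in practice should be measured on the sensor itself.

```python
def ruby_pressure_power(lam, lam0, a, b):
    """Power-law form: p = (A/B) * ((lam/lam0)**B - 1)."""
    return (a / b) * ((lam / lam0) ** b - 1.0)


def ruby_pressure_quadratic(lam, lam0, a, b):
    """Quadratic form: p = A * d * (1 + B * d), with d = (lam - lam0) / lam0."""
    d = (lam - lam0) / lam0
    return a * d * (1.0 + b * d)


# Assumed coefficients (GPa and dimensionless); verify against the source calibrations.
LAM0 = 694.24  # nm, placeholder reference wavelength
p_power = ruby_pressure_power(694.25, LAM0, a=1904.0, b=7.665)
p_quad = ruby_pressure_quadratic(694.25, LAM0, a=1870.0, b=5.63)
```

Because λ0 enters every formula explicitly, archiving both λ and the measured λ0 allows the pressure to be recomputed on any other scale.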
Pressure measurement with optical sensors at non-ambient temperature is usually performed using the room-temperature pressure scales together with linear or polynomial expressions relating the change in optical properties to the temperature difference, e.g. Δλ to ΔT in the case of the ruby gauge [27]. These expressions, along with the associated coefficients, should therefore be defined as additional data items. It needs to be stressed that these temperature-dependence laws may be valid only over a limited temperature range, with a change of formula and coefficients when moving to a different temperature regime. A similar approach can also be applied to other luminescence and Raman sensors [27]. Alternatively, pressure scales can be determined with different parameters for several isotherms, as in the example of the NaNO2 in NaBr infrared pressure gauge [28].

Metadata
Metadata, often referred to as 'data about data' or 'information about information', are data that describe and give information about other data, putting them in context. One specific example of metadata related to the pressure determination procedure is the analytical technique used to acquire the primary data (X-ray or neutron diffraction, luminescence spectroscopy, infrared absorption spectroscopy, Raman scattering). Experimental details pertinent to the pressure determination procedure also belong here, like the line shape functions applied in curve fitting, the type of laboratory instrument, the X-ray or neutron wavelength, and the luminescence or Raman excitation line wavelength. Other metadata include the items concerning data processing aimed at reducing raw data or processed raw data to derived data (i.e. the lattice parameters, the unit-cell volume, or in the case of optical sensors the wavelength or the frequency attributed to a characteristic spectral feature). Such information can be found in the literature, in the article text, e.g. "the (200) reflection of the B1 phase of NaCl was used for pressure determination" [29] or "the unit-cell parameter of gold was determined by least-squares refinement of five diffraction lines [(111), (200), (220), (311), and (222)]" [30]. These descriptions are indeed textual metadata that could be included in a data set, provided that a relevant data item is defined.
Seemingly trivial, but nevertheless significant, metadata contextualizing the entire pressure determination procedure are the literature references to the published EoS and pressure scales used to calculate pressure. This information is usually included in research articles to give credit to the authors of a calibration, but representing it as a data item that underpins the journal publication would comply with the FAIR principles.
Another important but often neglected piece of information associated with pressure measurement is the p-T working range of EoS and pressure scales. In principle, the calibrant material must be within the elastic behavior regime, but the domains of applicability of an EoS based on a specific model can be even narrower. This situation is especially exacerbated at high temperature, due to large uncertainties in calculating the thermal pressure and limitations of the working models [31]. Optical sensors also have a constrained p-T window within which the measurement is reliable. Ruby becomes structurally metastable above 80-100 GPa and its fluorescence signal weakens with pressure [32]. The decrease in the intensity of luminescence and the broadening of the spectral bands with temperature is another well-known limiting factor.
The type of pressure transmitting medium (PTM) is a crucial metadata item. Each PTM has its characteristic hydrostatic limit, beyond which non-hydrostatic stress is generated [33,34]. Even for helium, considered the best PTM for its wide hydrostaticity range, the uniaxial pressure components can reach ~1 GPa above 100 GPa [35]. Besides, in multiphase solid mixtures one can observe the so-called Lamé effect, resulting in erroneous pressure measurements [36]. It depends on the shear modulus of the PTM (matrix) and the difference between the bulk moduli of the pressure marker (inclusion) and the PTM. Thermal annealing of the compressed sample can be effectively used to mitigate this undesirable phenomenon. It is worth mentioning here that the existing coreCIF data items _diffrn_ambient_environment and _exptl_crystal_pressure_history can be used for storing information about the PTM and the history of the sample, respectively.

Error analysis
In high-pressure research there is a tendency to report only the statistical uncertainties within a single experiment and to neglect the systematic uncertainties associated with the pressure standard. The reasons for this situation are at least twofold: the absence of guidelines and practical recommendations on error analysis, and the lack of uncertainty assessment reported along with the EoS or secondary pressure scales. Therefore, careful analysis of the systematic uncertainties associated with the determination of pressure should be conducted in future studies.
The recent proposal for a new practical pressure scale includes the identification and quantitative estimation of the principal sources of error in the p-V relation of diamond [26]: the choice of EoS form (±0.7 %), the scatter of K0T (±0.3 %) and the uncertainty of K0T' (±1 %). Combining these three terms in quadrature yields a combined uncertainty in pressure of ±1.3 % for diamond as the primary EoS calibrant. Similar analyses would be desirable for all available pressure markers. Another interesting review on the accuracy of the determined EoS has been provided by Liu and Bi [37].
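The combined value quoted above is consistent with a root-sum-of-squares combination of the three contributions, as the following short check shows (assuming the terms are independent; the function name is illustrative).

```python
import math


def combine_in_quadrature(*terms):
    """Root-sum-of-squares combination of independent uncertainty terms."""
    return math.sqrt(sum(t * t for t in terms))


# Relative contributions in percent: EoS form, scatter of K0T, uncertainty of K0T'
combined = combine_in_quadrature(0.7, 0.3, 1.0)
```

The result rounds to the ±1.3 % quoted in the text, whereas a simple linear sum of the three terms would give ±2.0 %.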
The uncertainty in pressure determination can also be derived from the law of propagation of uncertainty, provided that all the individual uncertainties of the EoS parameters (V0, K0T, K0T', K0T") are available together with the uncertainty in the volume measurement. The effectiveness of this approach rests on the presumption that all the constituent uncertainties are independent and uncorrelated. In the case of EoS calibrants the uncertainty in the volume measurement is usually assessed from the diffraction experiment. However, in practice no detailed diffraction data analysis, like a Le Bail or Pawley fitting, is undertaken. If the pressure is determined from only one reflection of the standard, the estimation of an error is virtually impossible. If a higher number of reflections is available, more complete reasoning can be performed [30].
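A minimal sketch of first-order uncertainty propagation under the independence assumption stated above is given below. The partial derivatives are estimated numerically by central differences, so the same routine works for any EoS form; the function names, the third-order Birch-Murnaghan expression used as the example, and all numerical values are illustrative assumptions.

```python
import math


def bm3_pressure(v, v0, k0, k0p):
    """Third-order Birch-Murnaghan EoS; pressure in the units of k0."""
    x = (v0 / v) ** (1.0 / 3.0)
    return 1.5 * k0 * (x**7 - x**5) * (1.0 + 0.75 * (k0p - 4.0) * (x**2 - 1.0))


def propagate(func, values, sigmas, rel_step=1e-6):
    """Standard uncertainty of func(*values) for independent input uncertainties."""
    variance = 0.0
    for i, (x, s) in enumerate(zip(values, sigmas)):
        h = abs(x) * rel_step or rel_step  # fall back to rel_step when x == 0
        up = list(values)
        dn = list(values)
        up[i] = x + h
        dn[i] = x - h
        dfdx = (func(*up) - func(*dn)) / (2.0 * h)  # central difference
        variance += (dfdx * s) ** 2
    return math.sqrt(variance)


# Placeholder inputs: V, V0, K0T (GPa), K0T' and their standard uncertainties
values = [95.0, 100.0, 160.0, 5.0]
sigmas = [0.05, 0.05, 2.0, 0.2]
sigma_p = propagate(bm3_pressure, values, sigmas)
```

If the EoS parameters were obtained from a single fit, they are generally correlated, and the covariance terms omitted here would have to be included.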
In pressure measurements with optical sensors the errors in the p-V relationship should be combined with those of the V-Δλ dependence to obtain the total uncertainty. The uncertainty associated with the newly proposed Ruby2020 gauge is estimated to be ±2.5 % for the pressure range up to 150 GPa [26]. It is appropriate here to warn against a possible misunderstanding. In the now classic book of Eremets [38] the accuracy of pressure determination using the ruby fluorescence method is estimated as ~0.03 GPa, which can even be improved to ~0.01 GPa by fitting the top of the R1 line [39]. This number is, however, the accuracy of the spectral R1 line measurement (Δλ of 0.01 nm, which corresponds approximately to Δp of 0.03 GPa), not the total uncertainty of the pressure determination, which should also include the uncertainty of the ruby scale.
Apart from the accuracy of the diffraction or spectral measurement and the uncertainty of the EoS or pressure scale, other sources of error that may contribute to the overall cumulative uncertainty should also be considered. As already mentioned, the non-hydrostatic stress [33,34] and other instrumental effects (like the Lamé effect) [36] are among the key factors to be addressed. Another source of systematic uncertainty may be related to the strain generated in a pressure gauge under non-hydrostatic conditions, which exhibits hysteresis and does not return to its initial value on pressure release, causing discrepancies between the pressure values measured in compression and decompression runs. It has also been noted that the ruby R1 fluorescence line shifts considerably with temperature. Near room temperature, a temperature difference of 5 K results in the same Δλ shift as a pressure difference of ~0.1 GPa. Therefore, any variation of temperature during the experiment, and any deviation from the temperature at which the reference p = 0 wavelength was measured, should be taken into account. In high- or low-temperature measurements exploiting thermal EoS or substantial temperature corrections, the accuracy of temperature determination should be appropriately included in the error analysis. Furthermore, in experiments with membrane DACs, a pressure drift usually occurs during the data acquisition. Even after some stabilization time, the drift may remain at the level of ~0.5 GPa, comparable to the uncertainty of the pressure measurement [40]. Hence, it is recommended to record the initial (pi) and final (pf) pressure values, at the beginning and just after the sample data collection, and to report the pressure as the arithmetic mean of pi and pf, properly addressing the propagation of uncertainty.
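One possible way of reporting the mean of pi and pf is sketched below. The first term is the usual propagation for the mean of two independent readings. The optional drift term, which treats the true pressure as uniformly distributed between pi and pf (a rectangular distribution with standard uncertainty equal to the half-width divided by √3), is an assumption of this sketch rather than a recommendation of the text, and uncertainties common to both readings (e.g. that of the pressure scale) must be added separately.

```python
import math


def mean_pressure(p_i, p_f, s_i, s_f, include_drift=True):
    """Mean of initial and final pressure with a combined standard uncertainty.

    s_i, s_f: independent standard uncertainties of the two readings.
    include_drift: add a rectangular-distribution term for the drift (assumption).
    """
    p = 0.5 * (p_i + p_f)
    variance = 0.25 * (s_i**2 + s_f**2)  # propagation for (p_i + p_f) / 2
    if include_drift:
        half_width = 0.5 * abs(p_f - p_i)
        variance += half_width**2 / 3.0
    return p, math.sqrt(variance)


# Placeholder readings in GPa
p_stat, s_stat = mean_pressure(10.0, 10.4, 0.1, 0.1, include_drift=False)
p_all, s_all = mean_pressure(10.0, 10.4, 0.1, 0.1)
```

Recording pi and pf as separate data items, rather than only their mean, keeps this choice of treatment open to later reanalysis.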
From a data viewpoint, standard uncertainties of pressure and all other numerical parameters (including V0, K0T, K0T', K0T", the temperature, parameters of the optical pressure scales) can be simply associated with a parent data value. Any numerical CIF data value can contain an appended standard uncertainty number enclosed within parentheses.
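The parenthesized notation can be decoded mechanically: the digits in parentheses apply to the last quoted digits of the value, so that 34.5(12) stands for 34.5 ± 1.2. The following sketch, with an illustrative function name, handles plain decimal values only; numbers written with an exponent are deliberately left out.

```python
import re
from decimal import Decimal

# value with optional fractional part, optional standard uncertainty in parentheses
_NUMB_SU = re.compile(r"^([+-]?\d+(?:\.(\d*))?)(?:\((\d+)\))?$")


def parse_numb(text):
    """Return (value, standard_uncertainty or None) for a string like '4.0782(3)'."""
    match = _NUMB_SU.match(text.strip())
    if match is None:
        raise ValueError(f"not a plain decimal numb value: {text!r}")
    value = Decimal(match.group(1))
    if match.group(3) is None:
        return float(value), None
    decimals = len(match.group(2) or "")  # number of digits after the decimal point
    su = Decimal(match.group(3)).scaleb(-decimals)  # shift s.u. to the last digits
    return float(value), float(su)


example = parse_numb("4.0782(3)")
```

Decimal arithmetic is used for the scaling so that the uncertainty is converted without accumulating binary rounding errors.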

Summary and perspectives
In the preceding sections, the bare minimum of data and metadata has been identified as fundamental for full interpretation of pressure measurement in static experiments. For the sake of clarity, this set is summarized in Table 2.

Table 2 (each entry gives the data item, its description and, in square brackets, the CIF data type):
Laboratory method used for pressure measurement: 'XRD' (X-ray diffraction), 'neutron' (neutron diffraction), 'luminescence' (luminescence spectroscopy), 'Raman' (Raman scattering), 'IR' (infrared spectroscopy). [char]
Pressure measurement experimental details: Description of experimental details relevant to the laboratory technique (e.g. X-ray or neutron wavelength, luminescence or Raman excitation line wavelength, type of analytical instrument). [char]
Pressure measurement data processing details: Description of data processing (software used in data processing, line shape functions applied in curve fitting, the number and Bragg indices of diffraction lines used for the determination of lattice parameters). [char]
Lattice parameters and unit cell volume: Lattice parameters and/or unit cell volume as determined by X-ray or neutron diffraction. [numb]
Spectral characteristics: Characteristic values used for pressure determination (R1 ruby fluorescence line, high-frequency edge of the diamond anvil Raman band, characteristic infrared absorption frequencies). [numb]
Reference spectral characteristics: Characteristic values used for pressure determination at atmospheric pressure (p = 0). [numb]
Temperature correction form: The temperature correction form should be provided in the definition (the set of parameters may be included). [char]
Literature references: References to published EoS and pressure scales. [char]
PTM: PTM used in the experiment. [char]
EoS or pressure scale working range: p-T window within which the measurement is reliable. [numb]
Measurement T: Temperature of the pressure sensor. [numb]
Reference measurement T: Temperature of the pressure sensor at p = 0 (associated with the reference spectral characteristics). [numb]

Two examples of pressure measurement, one with an EoS calibrant and one with an optical sensor, are presented in Table 3 to illustrate how the proposal outlined above would work in practice. The current draft of the data and metadata items related to pressure measurement can be a starting point for a discussion on data preservation in the field of high-pressure research.
Proper data structuring and organization is an absolute prerequisite of all data management activities. In the existing literature, data and metadata are often stored as descriptive annotations in the article text or supplementary material. Such unstructured data resources are primarily human-readable only. Moreover, some laboratory logbook records are neither published nor deposited in an archive and could be lost forever. To rescue information from looming digital darkness, using a standardized format that enables information handling via electronic data processing is mandatory. Only under such conditions can the FAIR data principles be turned into reality. The flexibility and extensibility of the CIF format (which has no formal distinction between data and metadata) make it an effective data scheme for the preservation of essential aspects of pressure measurement. This is the first step toward the deposition of pressure-measurement related data in existing databases and the facilitation of efficient data mining.