XLUM: an open data format for exchange and long-term preservation of luminescence data

The concept of open data has become the modern science meme, and major funding bodies and publishers support open data. On a daily basis, however, the open data mandate frequently encounters technical obstacles, such as the lack of a suitable data format for data sharing and long-term data preservation. Such issues are often community-specific and best addressed through community-tailored solutions. In Quaternary sciences, luminescence dating is widely used for constraining the timing of event-based processes (e.g. sediment transport). Every luminescence dating study produces a vast body of primary data that usually remains inaccessible and incompatible with future studies or adjacent scientific disciplines. To facilitate data exchange and long-term data preservation (in short, open data) in luminescence dating studies, we propose a new XML-based structured data format called XLUM. The format applies a hierarchical data storage concept consisting of a root node (node 0), a sample (node 1), a sequence (node 2), a record (node 3), and a curve (node 4). The curve level holds information on the technical component (e.g. photomultiplier, thermocouple). A finite number of curves represent a record (e.g. an optically stimulated luminescence curve). Records are part of a sequence measured for a particular sample. This design concept allows the user to retain information at the technical-component level from the measurement process. The additional storage of related metadata fosters future data mining projects on large datasets. The XML-based format is less memory-efficient than binary formats; however, its focus is on data exchange and preservation, and hence long-term format stability by design. XLUM is inherently stable to future updates and backwards-compatible. We support XLUM through a new R package 'xlum', facilitating the conversion of different formats into the new XLUM format.
XLUM is licensed under the MIT licence and is hence available for free use in open- and closed-source, commercial and non-commercial software and research projects.

1 Introduction

Wilkinson et al. (2016) proposed four key principles for scientific data management towards open science: Findability, Accessibility, Interoperability, and Reusability, the FAIR guidelines. Since then, major funding bodies (e.g., Thorley and Callaghan, 2019; Agence Nationale de la Recherche (ANR), 2019; European Commission, 2021; Deutsche Forschungsgemeinschaft (DFG), 2022) and publishers (e.g., Copernicus Press Release, 2018; Wiley Author Service, 2022) have adopted these principles as part of their data management policies, and they have become an integral part of the European Code of Conduct for Research Integrity (ALLEA, 2017). If interwoven with umbrella terms such as 'open data' or 'open science', the added value of transparency and reproducibility for modern science comes across as almost self-evident. Unfortunately, the implementation often seems to fall behind set goals. For instance, Perkel (2020) vividly covered the challenge of 35 participants trying to run decade-old computer code and concluded that maintaining reproducibility of software-based models and analysis pipelines over decades is a demanding, sometimes impossible, task. Likewise, we can infer that data formats tied to a small number of (outdated) programmes run the risk that data become inaccessible. Another aspect on the data side was considered by Noy and Noy (2020), who complained that common open-data surrogate statements in articles such as 'data being available upon request' may equate to no data access. Indeed, a pivotal aspect of the FAIR guidelines is their emphasis on principles fostering automated data processing or enabling such processing in the first place. The requirement to actively contact the study authors to request access to the data, e.g., by e-mail, therefore inherently undermines the principles of open data (Noy and Noy, 2020). On the other hand, authors perhaps refrained from direct sharing because of unclear reporting guidelines or the effort required to document data of presumed low demand.
Adhering to the FAIR guidelines with actual benefits for all parties (e.g., data donators, data users, funding bodies) involves tackling low-level technical issues, such as defining an exchange data format that enables study authors to share their raw data in a manner that is structured, standardised, and ideally effortless, and in a format that will remain accessible long into the future. Here we adopt the idea that those issues are usually community-specific and best addressed through discipline-tailored solutions, for instance, for data generated in luminescence-based chronology studies.
Luminescence dating is a dosimetric dating method of key importance in Quaternary sciences and archaeology (e.g., Rhodes, 2011; Roberts et al., 2015; Bateman, 2019; Murray et al., 2021), covering around the last 300 000 years. In a nutshell, the datable event is the last sunlight or heat exposure of natural minerals such as quartz or feldspar. The dating process determines two parameters: (1) the absorbed dose (in Gy) accumulated in the minerals since the last heat or light exposure, and (2) the environmental dose rate (in Gy ka−1). The ratio of dose (Gy) divided by dose rate (Gy ka−1) gives the age (ka). Methods frequently applied in luminescence dating studies are distinguished by their stimulation mode, e.g., thermally stimulated luminescence (TL; cf. Aitken, 1985), optically stimulated luminescence (OSL; Huntley et al., 1985) or infrared stimulated luminescence (IRSL; Hütt et al., 1988). Luminescence methods are also used by adjacent scientific disciplines, e.g., accident dosimetry and material characterisation (e.g., Yukihara and McKeever, 2011; Yukihara et al., 2014).
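The age equation above can be written as a one-line calculation. The sketch below uses made-up values, and `luminescence_age` is a hypothetical helper for illustration, not part of any published package:

```python
# Hypothetical worked example of the luminescence age equation:
# age (ka) = equivalent dose (Gy) / environmental dose rate (Gy/ka).
# All numbers below are illustrative, not from any real sample.

def luminescence_age(dose_gy: float, dose_rate_gy_per_ka: float) -> float:
    """Return an age in ka from an absorbed dose (Gy) and a dose rate (Gy/ka)."""
    if dose_rate_gy_per_ka <= 0:
        raise ValueError("dose rate must be positive")
    return dose_gy / dose_rate_gy_per_ka

if __name__ == "__main__":
    # e.g., 30 Gy absorbed dose at an environmental dose rate of 2.5 Gy/ka
    print(luminescence_age(30.0, 2.5))  # 12.0 (ka)
```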
Luminescence (dating) does not measure the absorbed dose directly but infers an equivalent dose (De) from the minerals' natural light output (luminescence) compared to a laboratory dose of known size. Luminescence dating studies and research building on such work routinely tabulate only a fraction of the recorded data in the form of aggregated parameters. One could think of a pyramidal information hierarchy with the age on the top (Fig. 1). The base is made out of minimally processed luminescence data, i.e. measured luminescence (for the purpose of this manuscript we "neglect" the dose rate information).

Figure 1. A luminescence age is the result of data aggregation. In order to reproduce all steps, access to primary data (the base level) is indispensable. However, such primary data are seldom published or otherwise accessible. Re-publishing usually leads to information loss. The number of information/process levels in the graph is arbitrary.
Original dating studies ideally report the full information pyramid. However, the further an age is carried forward through subsequent studies or collected in data repositories, the higher the level of data aggregation. Good examples of aggregated luminescence data are repositories such as Lancaster et al. (2015) or Codilean et al. (2018). Such archives are excellent places to find locations of dating studies, but it is not easy to spatially link different ages without accessing the original studies with primary data.
Original, minimally processed luminescence data (see Fig. 1), i.e. measured luminescence, are hardly ever published along with a study. However, sharing unprocessed luminescence data, accessible to others after the completion of a dating study, is desirable for several reasons:

1. Luminescence ages are end-members of long measurement series involving various protocols, tests, and analysis steps with potentially different hardware and software tools. Once aggregated, it is challenging for others to re-validate published luminescence dates beyond plausibility checks. Shared raw data will potentially lead to better reproducibility and data quality.
2. Access to luminescence data on a single-curve level supports the application of advanced analysis tools employing hierarchical Bayesian models, such as the R package 'BayLum' (Philippe et al., 2019) or the model 'baSAR' (Combès et al., 2015; Mercier et al., 2016). Both approaches start with individual luminescence curves to integrate different parameters into a holistic model using Bayesian statistics to derive equivalent doses based on prior knowledge.
Other work has shown examples of how to study sediment pathways by tracing the bleaching histories of sediment grains (Chamberlain and Wallinga, 2019).If such data are never shared, their full potential remains untapped.
Recently, Balco (2020) advocated a transparent and open middle-layer concept, disconnecting measured quantities from processed ages to account for changed, perhaps improved, calculation procedures. His proposal was specific to cosmogenic-nuclide exposure dating, but the general idea appears valid for other dating techniques, such as luminescence dating. For instance, it would enable others to test the impact of alternative statistical parameters on the calculated De in the future.
3. The approach of Balco (2020) renders ages moving targets, i.e. they may change with time due to different calculation procedures. Balco's approach emphasises the data-treasure character of measured physical quantities (with "data are described with rich metadata", FORCE11, 2014), which need to be preserved and shared instead of processed numbers. This approach holds for luminescence dating studies, which create, somewhat as a by-product, a vast amount of luminescence data of minerals from different origins. Such data are of potential interest, for instance, to geoscientists working on provenance analysis (e.g., Sawakuchi et al., 2018; Tsukamoto et al., 2011), to physicists focusing on luminescence models, and to data scientists trying to develop new approaches to enable exploratory luminescence data analysis to constrain physical parameters of OSL curves (e.g., Burow et al., 2016), or seeking training datasets to test machine learning approaches (e.g., Kröninger et al., 2019).
4. Broadly shared and accessible through a standard format, luminescence curve data will help establish a comprehensive repository for luminescence data, enabling studies and meta-studies not covered by the above-mentioned examples.
Data sharing requests can only be reasonably accommodated if luminescence data can be easily exchanged, sufficiently archived, and analysed independently of proprietary software or file formats. We argue that one particular reason hampering the exchange and reuse of luminescence data is the absence of a suitable data format supporting long-term data preservation and fostering data exchange. To the best of our knowledge, long-term data preservation is an unresolved issue in the luminescence (dating) community. After being analysed and published, original primary data can be expected to be archived in compliance with scientific standards, but they may become inaccessible or incompatible with new data over time when re-analysis is wanted. Such data are often lost to the public and need to be measured again. Hence, the first step of chronological data sharing and archiving is a data format that qualifies to serve that purpose.
In this contribution, we first briefly list existing data formats commonly used to store luminescence data. We then outline identified general technical requirements for a data format for the long-term preservation of luminescence data. Hereafter, we highlight features of a new XML-based file format, XLUM, developed for the long-term preservation and exchange of luminescence data. The remainder provides examples and illustrates a reference implementation in R and Python, showing how existing data can be converted effortlessly into the new XLUM format. The discussion addresses potential shortcomings and challenges and canvasses future directions. We consider our contribution an initial definition, and the format blueprint is open to discussion within the luminescence community.
In the remainder of this paper, we will use monospace letters for format/code snippets and file-format arguments. XML elements (nodes), if not accompanied by a closing tag, are contracted into one short tag, for instance, <node/> instead of <node> ... </node>.
2 Existing data formats in the luminescence-dating community

Equipment manufacturers have introduced most output data formats available in the luminescence-dating community, for instance, Daybreak (Bortolot, 2000), lexsyg (Freiberg Instruments; Richter et al., 2013, 2015), Risø TL/OSL reader (e.g., Bøtter-Jensen, 1988, 1997; DTU Nutech - Center for Nuclear Technologies, 2016), and SUERC portable OSL (Sanderson and Murphy, 2010). Alternative formats were developed as part of research studies (e.g., Mittelstraß and Kreutzer, 2021). In other cases of equipment development, data output formats were not mentioned explicitly (Markey et al., 1997), or the hardware relied on export options of commercial laboratory software solutions (Guérin and Lefèvre, 2014; Mundupuzhakal et al., 2014). Some file formats are proprietary, and most are not documented in full. Additionally, data stored in comma-separated value files (file extension *.csv) or raster image-file formats (*.tif, *.spe) appear to be common; however, they lack the metadata required for luminescence data analysis.
Table 1. List of file formats dedicated to storing luminescence data, in alphabetical order (non-exhaustive).

File extension | Type | Relation
The BIN/BINX format was introduced decades ago, but it is not the most suitable candidate for long-term data preservation and exchange because: (1) different file format versions are incompatible because of non-identical file header lengths and byte order; (2) storage of additional, so far unspecified, metadata requires a format change, triggering a new format version; (3) for historical and memory-efficiency reasons, instead of xy-data, only y-data (here counts per channel) are stored, and the temperature is deduced linearly from the minimum and maximum values (see Fig. 2). Indeed, a thermoluminescence (TL) curve represents luminescence against stimulation temperature. However, a detection system usually consists of two independent technical components. One records the luminescence signal, e.g., a photomultiplier tube (PMT), and the other monitors the temperature, e.g., a thermocouple. Both quantities are recorded as a function of time, not temperature. (4) Data repositories should be findable (e.g., unique identifiers, proper metadata; cf. Wilkinson et al., 2016, Box 2, p. 4) and accessible by standard parser libraries (e.g., libxml), and the requirement of format-tailored software solutions should be avoided.
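Point (3) can be made concrete with a short sketch of the "conventional" reconstruction, which assumes a strictly linear heating ramp between the stored minimum and maximum temperatures. All numbers and names below are illustrative, not taken from any file format:

```python
# Sketch of point (3): with only y-data (counts) stored, the temperature
# axis of a TL curve must be re-derived, ASSUMING a perfectly linear
# heating ramp between the stored minimum and maximum temperatures.

def linear_temperature_axis(t_min, t_max, n_channels):
    """Deduce per-channel temperatures linearly between t_min and t_max."""
    step = (t_max - t_min) / (n_channels - 1)
    return [t_min + i * step for i in range(n_channels)]

counts = [12, 40, 95, 180, 140, 60]              # y-data: counts per channel
temps = linear_temperature_axis(20, 270, len(counts))
tl_curve = list(zip(temps, counts))              # (temperature, counts) pairs

print(temps[0], temps[-1])  # 20.0 270.0
```

Any non-linearity of the real heating process is silently lost in this reconstruction, which is exactly why XLUM instead stores the independently recorded thermocouple series alongside the PMT counts.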
The other data formats listed in Table 1 suffer from similar or related problems because they were designed to accommodate data for a sole purpose or a limited application range. In contrast, what is arguably preferable is a format that is as accessible and findable as possible and independent of a specific type of equipment; this requirement laid the foundation for the development of XLUM.

General data format requirements
A few design prerequisites guided the development of the XLUM format, and we list the most important below.
-The format preserves physical quantities (measured, modelled), and their description remains equally readable to humans and machines.
-Data are stored structured on a technical component/sensor level (e.g., photomultiplier, thermocouple) without limiting the data or forcing data reduction.
-The format enables self-contained storage of data from technical components.
-The format is self-explanatory, i.e., it can be generally understood without format documentation.
-Backwards compatibility is maintained for future versions (newer format versions may carry additional attributes but remain readable to existing tools).
-The format is neutral, open, and non-proprietary, with specifications defined by the scientific community, not by equipment manufacturers.
-The format application is permitted in closed and open software tools through suitable license conditions.
-Standard software solutions to process measurement data, e.g., MS Excel™, R, Python, LibreOffice, Matlab™, and GNU Octave, are to be supported. Such an import routine is, e.g., available in the R package 'Luminescence' (Kreutzer et al., 2012).
-Data preservation and exchange are facilitated independently of the operating system running on users' personal computers.
-The FAIR guidelines are supported by design, facilitating the creation of large repositories for long-term preservation and exchange of luminescence measurement data and metadata.
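The backwards-compatibility requirement above can be illustrated with a short sketch: a standard XML parser keeps unknown, newer attributes readable instead of failing. The element and attribute names below merely mimic XLUM, and futureAttr is hypothetical:

```python
import xml.etree.ElementTree as ET

# A hypothetical newer-version node carrying an attribute ("futureAttr")
# unknown to an older reader; a standard XML parser simply retains it,
# so older tools keep working on newer files.
xml_text = '<curve component="PMT" tUnit="s" futureAttr="something new">1 2 3</curve>'
curve = ET.fromstring(xml_text)

component = curve.get("component")               # known attribute -> "PMT"
unknown = {k: v for k, v in curve.attrib.items()
           if k not in {"component", "tUnit"}}   # unknown attributes survive
values = [int(v) for v in curve.text.split()]    # whitespace-separated data

print(component, unknown, values)
```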
We identified an XML (Extensible Markup Language)-based (W3C XML Core Working Group, 2008) format as the most suitable structure serving the outlined requirements.
The idea of introducing an XML-based format for storing luminescence data is not new. Bortolot and Bluszcz (2003) sketched a few general requirements for such a format 20 years ago, although their approach has not been widely adopted. An XML-based format is rather memory-inefficient, particularly compared to binary formats, leading to relatively large files (tens of megabytes and more instead of megabytes). However, we believe that this aspect is of limited relevance because:

(1) mass data storage is inexpensive, particularly if costs are compared to those in the year of Bortolot and Bluszcz (2003); (2) the overall amount of data produced in luminescence dating is negligible compared to other disciplines working with XML-based formats (e.g., Martens et al., 2011; Röst et al., 2015); (3) modern storage systems of data repositories usually employ highly efficient low-level data compression methods (e.g., lossless data compression) independent of any file format, reducing the data footprint regardless of the exchange format.
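Point (3) is easy to verify with any lossless compressor. The sketch below uses gzip as an example codec on illustrative, repetitive XML-like text:

```python
import gzip

# Sketch of point (3): repetitive XML markup and whitespace-separated
# numbers compress well under standard lossless compression, shrinking
# the on-disk footprint independently of the exchange format itself.
xml_like = ('<curve component="PMT" tUnit="s">'
            + " ".join(str(i % 100) for i in range(5000))
            + "</curve>").encode("utf-8")

compressed = gzip.compress(xml_like)
print(len(xml_like), len(compressed), len(compressed) < len(xml_like))
```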

Format description
In the following, we outline the conceptual structure of the XLUM format. To minimise verbosity, we focus only on key design concepts. For full details, we refer to our reference document on GitHub™ (https://xlum.r-luminescence.org, last accessed: 2022-06-30). The GitHub™ repository also contains a formal format description following the XML Schema Definition (XSD) for automated validation. XLUM defines a substructure, which can be part of a file or any other XML structure (W3C XML Core Working Group, 2008) that acts as a container, or it constitutes a file of its own, for instance, with the file extension *.xlum, although the XLUM format does not enforce a specific file extension.
The two key features of the XLUM format are (1) information nesting, with measurement data stored only in the lowest node, and (2) support of data sharing by design.

Nesting of information on five node levels
The format consists of five levels (nodes) (Listing 1, Fig. 3), indicated by so-called tags. The correct formal description requires an opening tag (<...>) and a closing tag (</...>) (see Listing 1). Each tag allows various attributes (<tag attribute='' ...>) for metadata, of which we will detail a few below. The number of attributes is not limited, and additional user-defined attributes, not covered by the format definition, are explicitly allowed. However, the format definition insists on mandatory attributes, a few accepting the non-empty string NA for not available/not applicable. The upper four nodes structure the data and provide metadata to describe the dataset. The lowest node (<curve/>) contains the raw (or minimally processed) measurement data.

1. <xlum/> is the root node. It wraps all other data and is parent to all other child nodes. The number of child nodes of <xlum/> is unlimited. Everything within one <xlum/> is considered a collection of data for different samples, to which, e.g., author names, a digital object identifier (DOI), and a licence can be assigned through attributes.
2. <sample/> is the first child node to <xlum/>. It is the parent structure for luminescence data collected for a single sample. Hence, everything wrapped within <sample/> refers to a specific sample. Amongst others, expected attributes are the name of the sample and the geographic coordinates (latitude, longitude).
3. <sequence/> is the first child node to <sample/>. It sets the structure for measurement data defined through (measurement) sequences, e.g., a single aliquot regenerative (SAR) dose (Murray and Wintle, 2000) measurement sequence or any measurement data arranged in a particular order. Typical attributes are position, fileName, or readerName.

4. <record/> is the first child node to <sequence/>. It holds all records of a sequence of a particular sample. A record is not necessarily limited to a single technical component, e.g., a photomultiplier. One or many curves define one record.

5. <curve/> is the first child node to <record/>. Data in <curve/> are numerical (measurement/simulation) values of a physical quantity v1, ..., vn (discrete/continuous), spanning an array A of the form

A[x×y×1], ..., A[x×y×t] | x, y, t ∈ Z,  (1)

with n = max(x) × max(y) × max(t), where t is the extension of the array with respect to time (i.e. channels per time instant) and x and y define the lateral geometry of the detector. Data are stored column-wise, starting from A(1,1,1), A(2,1,1), ..., A(x,y,1), before continuing in the time dimension. For instance, for a measurement over 100 channels with a photomultiplier tube, x = y = 1 and t1, ..., t100. In contrast, for a measurement with a camera with a lateral resolution of 512 × 512 pixels, x, y ∈ {1, ..., 512}, while t1, ..., t100 remains the same. The dimensional information is stored in the node attributes xValues, yValues, and tValues. All quantities except t are dimensionless (i.e., they have no default unit); however, the attributes xUnit, yUnit, and tUnit allow setting SI units. For more attributes and their meaning, we refer to the detailed format description (https://github.com/R-Lum/xlum_specification, last accessed: 2022-07-04).
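The five-level nesting and the column-wise value layout can be sketched with Python's standard library. This is a minimal illustration only; all attribute values are placeholders, and nothing here is prescribed by the format specification beyond the node names:

```python
import xml.etree.ElementTree as ET

# Minimal sketch of the five-level nesting (xlum > sample > sequence >
# record > curve) and the column-wise value layout described in the text.
# All attribute values are placeholders.
xlum = ET.Element("xlum", formatVersion="1.0")
sample = ET.SubElement(xlum, "sample", name="SAMPLE-001")
sequence = ET.SubElement(sample, "sequence", position="1")
record = ET.SubElement(sequence, "record", recordType="TL")

# A 2 x 2 detector measured at 2 time instants: A[(x, y, t)]
A = {(x, y, t): 100 * t + 10 * y + x
     for t in (1, 2) for y in (1, 2) for x in (1, 2)}
# Column-wise flattening: x varies fastest, then y, then t
flat = [A[(x, y, t)] for t in (1, 2) for y in (1, 2) for x in (1, 2)]

curve = ET.SubElement(record, "curve", component="camera",
                      xValues="1 2", yValues="1 2", tValues="1 2")
curve.text = " ".join(str(v) for v in flat)   # whitespace-separated values

print(flat)
print(ET.tostring(xlum, encoding="unicode")[:40])
```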

Data representation: example
To illustrate data storage in the XLUM format, we pick one TL record and one green stimulated luminescence (GSL) record belonging to a test sequence measured for one sample. For simplicity, we limit the number of values for each curve to ten and substitute NA with "..." (three dots). We provide the complete file, which can be imported correctly, as a supplement.
Listing 2. Example luminescence-data representation in the XLUM format.

 1: <?xml version="1.0" encoding="utf-8"?>
 2: <xlum xmlns:xlum="http://xlum.r-luminescence.org" lang="en"
 3:   formatVersion="1.0" flavour="generic"
 4:   author="Marie Sklodowska-Curie; Max Karl Ernst Ludwig Planck"
 5:   license="CC BY 4.0" ...>
 6:  <sample name="LUM-21321" mineral="quartz" longitude="-4.0702446"
 7:    altitude="50" doi="valid DOI" ...>
 8:   <sequence fileName="Testsequence.seq"
 9:     software="DeviceEditor 2.0" ...>
10:    <record recordType="TL" sequenceStepNumber="1" ...>
11:     <curve component="thermocouple" startDate="2021-02-14T22:57:12.0Z"
12:       curveType="measured" duration="10" offset="0" xValues="0"
13:       yValues="0" tValues="1 2 3 4 5 6 7 8 9 10" tLabel="time"
14:       vLabel="temperature" xUnit="" yUnit="" vUnit="K" tUnit="s" ...>
        ...
17:     <curve component="PMT" startDate="2021-02-14T22:57:12.0Z"
18:       curveType="measured" duration="10" offset="0" xValues="0"
19:       yValues="0" tValues="1 2 3 4 5 6 7 8 9 10" tLabel="time"
20:       vLabel="luminescence" xUnit="" yUnit="" vUnit="cts" tUnit="s"
21:       detectionWindow="375" filter="Hoya U340; Delta BP 365/50EX" ...>
        ...
        <record recordType="GSL" comment="standard green OSL step" ...>
        <curve ...
28:       curveType="measured" duration="10" offset="0" xValues="0" yValues="0"
29:       tValues="1 2 3 4 5 6 7 8 9 10" tLabel="time" vLabel="luminescence"
30:       xUnit="" yUnit="" vUnit="cts" tUnit="s" detectionWindow="375"
31:       filter="Hoya U340; Delta BP 365/50EX" ...>
32:       0.9 0.82 0.74 0.67 0.61 0.55 0.50 0.45 0.41 0...
        ...

Line 1 Mandatory entry announcing the XML format. The encoding follows the Unicode® (The Unicode Consortium, 2022) UTF-8 and must not be changed. In a nutshell, it tells file-parsing programmes the character encoding and ensures that characters are interpreted correctly.
Lines 2-5 Start of the XLUM record, with mandatory entries, e.g., for the namespace (xmlns:...) and the used format version (here: 1.0), and metadata related to the data itself, e.g., author and license. Those attributes apply to all child nodes and clarify the data sharing rights in simple and unequivocal terms. In the example, we have applied the Creative Commons (CC) licence CC-BY. This licence allows unrestricted data reuse, mixing, and sharing with the requirement to credit the data creators.
Lines 6-7 The <sample/> node allows providing information about the sample, e.g., mineral or latitude. Those data are helpful for explorative data analysis with data from different geographical origins.
Lines 8-9 The <sequence/> node, with general information that remains unchanged for a sequence, e.g., position for referring to a position in the equipment.

The conversion of existing data with the R package 'xlum' follows the workflow outlined in Fig. 4. First, a format blueprint is derived from a prototype shipped with the package 'xlum'.
The prototype is then expanded and filled with data. Before export, the file is validated (xlum::validate_xlum()) against an XSD schema to ensure that the produced XLUM file follows the correct format specification. Both the prototype and the XSD are copies of files available as part of the XLUM file format definition.
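The blueprint, fill, and validate steps can be mimicked in a few lines of standard-library Python. This is a sketch only: the real validation runs against the official XSD schema via xlum::validate_xlum(), whereas the snippet below merely checks well-formedness and a small, assumed set of mandatory attributes:

```python
import xml.etree.ElementTree as ET

# Sketch of the blueprint -> fill -> validate workflow. The real R package
# validates against the official XSD schema (xlum::validate_xlum()); here
# we only check well-formedness plus an ASSUMED set of mandatory attributes.
PROTOTYPE = '<xlum formatVersion="1.0" lang="NA" license="NA"></xlum>'

def fill_and_check(license_text: str) -> bool:
    root = ET.fromstring(PROTOTYPE)       # blueprint from a prototype
    root.set("license", license_text)     # expand / fill with data
    serialized = ET.tostring(root)        # export ...
    reparsed = ET.fromstring(serialized)  # ... and re-parse: well-formed?
    mandatory = ("formatVersion", "lang", "license")
    return all(reparsed.get(a) not in (None, "") for a in mandatory)

print(fill_and_check("CC BY 4.0"))  # True
```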
To date, this conversion is not always lossless, i.e., not all metadata are transferred to the XLUM format for all formats due to the work-in-progress character of the package 'xlum'. We will improve the support with further maturity of XLUM. For instance, the conversion of a *.binx file requires the following R code lines (Listing 3).

Python
Similar to R, Python is an interpreted language.

Our contribution aims to standardise luminescence data exchange and enable long-term data preservation. However, it is no attempt to bar or abolish other existing formats, which can often be considered primary data because conversion to XLUM may involve data coercion to some extent. Still, the direct support of XLUM by other software and equipment manufacturers is desirable in the long run to make luminescence data more findable and accessible, promoting the FAIR data-sharing guidelines. Nonetheless, XLUM does not enforce these guidelines, and our contribution should not be understood as a claim on how and if data should be shared. Instead, we refer readers to the guidelines of their institutes or funding bodies.
For XLUM, we have chosen an XML-derived format structure. Mills et al. (2015) discuss potential adverse effects on the availability of primary data in the field of biology if investigators of long-term studies are obliged to share their data. Although XLUM is merely a data format that sets no sharing rules, comparisons with different disciplines quickly wear off. The fear of study authors that others may use their hard work to publish more quickly might be one of the reasons for the "upon reasonable request" data availability statements (Sect. 1). However, in the case of luminescence studies, long-term studies running over many years are scarce (see, e.g., Guérin and Visocekas, 2015, for an excellent example of such a study), and single datasets, even those from a whole stratigraphic section as typical in palaeoenvironmental studies, are of limited use to others. The true benefit of data sharing lies within many accessible and findable single datasets meaningfully linked through metadata, forming large datasets. With its component-focussed design and minimum required metadata, XLUM does the groundwork for aligning datasets of luminescence-based chronologies across different sites in data mining projects concerned with luminescence model development and validation or in any explorative data analysis study.
Last, a significant obstacle to the success of our initiative is the question of broad community acceptance of the new format. Reasonable predictions are difficult to make. We tried to improve the chance of success of our initiative by implementing the first support in the programming languages R and Python and by keeping all documents open-access. Furthermore, with the publication of this manuscript after peer review, the XLUM format will be supported by LexStudio2, the software running lexsyg luminescence readers, and could be supported by or adopted by other luminescence and dosimetry manufacturers. Further versions of this format will be developed transparently using the GitHub™ repository and are open to comments and contributions.
Additionally, we propose allocating future format developments to a dedicated working group under the umbrella of a (to be formed) trapped-charge dating association.

Conclusions
Our contribution suggested an exchange and long-term data-preservation format tailored to the specific requirements of the luminescence (dating) community: XLUM. The format is XML-based and intended to store primary luminescence data and metadata self-consistently. The format implements (but does not enforce) the FAIR guidelines, with a focus on accessibility and findability.
1. On the data storage level, XLUM does not constrain the amount of data stored for each measurement by an arbitrary format limitation, i.e. the number of components monitored is not limited by the file format. Furthermore, with this approach, the raw data are self-consistent and inherently contain all relevant information returned by a technical component.
2. On the data analysis level, the format design allows for better data quality, as the data wanted for the analysis can be combined with additional information from other technical sensor data. These latter data, e.g., stemming from a feedback system monitoring a particular instrument setting, might not be needed to answer the research question; however, they allow data validation and increase confidence in the result. For instance, failure of technical components may have invalidated the measurements and created artefacts. Such records can be excluded in the post-processing.
3. On the data exchange level, data can now be easily exchanged and combined, even if the file format version is modified in the future, which might increase the overall transparency and value of measurement data.

Figure 2. Simplified illustration of two approaches to storing a typical TL curve. (a) In the "conventional way", count data of the PMT are recorded channel-wise (1), and the temperature values are re-calculated according to the shown equations in step (2), based on the minimum and maximum temperature values, to obtain the final TL curve (3) when the data are imported into a programme. (b) In the approach suggested here, the luminescence signal and the temperature are recorded by two independent technical components, e.g., a PMT (1) and a temperature sensor (2) monitoring the heating process. On import, the resulting TL curve (3) matches both recorded signals in the time domain.

Figure 3. Graphical representation of the data storage concept with the different node levels of the XLUM format. Data are stored sequentially over time. Dashed lines indicate the possibility of multiple instances. For example, one XLUM file can contain many <xlum/> nodes, and one <xlum/> node many <sample/> nodes, etc.

Listing 1. Basic hierarchical structure of the XLUM format following the XML scheme in version 1.0 with UTF-8 encoding. The three dots (...) indicate node attributes.

<?xml version='1.0' encoding='utf-8'?>
<xlum ...>
 <sample ...>
  <sequence ...>
   <record ...>
    <curve ...> ... </curve>
   </record>
  </sequence>
 </sample>
</xlum>

Figure 4. The workflow to generate XLUM files as implemented in the R package 'xlum'.

Figure 5. The R package 'xlum' supports the conversion of various commonly used luminescence (dating) data formats to XLUM using the R package 'Luminescence'.

Bindings to the statistical programming language R and the general-purpose programming language Python

Figure 6 shows a simple representation of the measured values from an example file.
The <sequence/> node holds all records of a sequence of a particular sample. A record is not necessarily limited to a single technical component, e.g., a photomultiplier; one or many curves define one record. The crucial concept of the format is that data are stored only in <curve/> nodes, defined by technical components (actual or virtual) measuring or simulating physical quantities over time. Data in <curve/> are numerical (measurement/simulation) values of a physical quantity v_1, ..., v_n; n ∈ N (discrete/continuous), spanning an array A of the form A_[x×y×1], ..., A_[x×y×t]; x, y, t ∈ Z. Numerical values in this node are separated by whitespace and span an array with three dimensions (see Eq. 1). Alternatively, this node allows data encoded as base64 strings.
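Reading a <curve/> payload thus amounts to splitting a whitespace-separated value string, or decoding a base64 string. The sketch below shows both in plain Python; note that the binary layout assumed for the base64 variant (little-endian doubles) is an illustrative assumption, not the layout fixed by the XLUM specification.

```python
import base64
import struct

def parse_curve_text(text):
    """Whitespace-separated numerical values, as stored in a <curve/> node."""
    return [float(v) for v in text.split()]

def parse_curve_base64(encoded, fmt="<d"):
    """Hypothetical base64 variant: little-endian doubles are assumed here
    for illustration; the XLUM specification defines the actual encoding."""
    raw = base64.b64decode(encoded)
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, raw, i)[0] for i in range(0, len(raw), size)]

# Round trip: the same four values survive text parsing and base64 decoding.
values = parse_curve_text("0 12.5 103.7 980.2")
encoded = base64.b64encode(struct.pack("<4d", *values)).decode()
assert parse_curve_base64(encoded) == values
```

The base64 route stores the same numbers in fewer bytes than their decimal text representation, which is why the format offers it as an option for bulkier data such as images.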
Python is beginner-friendly and popular outside of traditional software development and computer science. A major advantage is the large and active open-source community maintaining a wide variety of packages (e.g., 'pandas', 'matplotlib', 'plotly') supporting data-analysis workflows. For analysing luminescence data with Python, we provide a work-in-progress version of a package also called 'xlum' via PyPI™ (https://pypi.org/project/xlum, last accessed: 2023-01-08). The package allows loading XLUM files with Python and conversion into pandas DataFrame objects (two-dimensional tabular data). This format is a starting point for further analysis, such as conversion to CSV files, export to Microsoft Excel™, or graphical output. We show a minimalistic example of data import using Python in Listing 4. We provide more information and examples in the corresponding GitHub™ repository (https://github.com/SteveGrehl/xlum-python).

XLUM is less memory efficient than any binary format, which we see as an acceptable weakness if it helps to facilitate human readability. During the specification process, we evaluated other similar structured data exchange formats, such as JavaScript Object Notation (JSON) (https://www.json.org) or YAML (https://yaml.org). Discussions about the advantages and disadvantages of XML vs JSON appear on numerous IT websites, blogs, and platforms such as Stack Overflow (https://stackoverflow.com) and in technology magazines. Without delving into their technical details, JSON has gained popularity over XML in recent years (e.g., Andy Patrizio, 2016). Nevertheless, given the widespread use (e.g., Copernicus Publications, 2014) and support of XML schemas for data representation in various fields (see examples in Nolan and Lang, 2013), we opted for XML as a robust basis. XML provides standard grammar but remains flexible enough to be tailored to our purpose. If wanted and needed, luminescence data will remain easily transferable to other
formats once standardised and archived as XLUM files. Another possibility is an amendment of XLUM, for instance, to better support image data, for which storage is already possible today; the optional base64 string encoding enables a more efficient representation of such data. Modern luminescence readers enable accurate and precise records of dim light emissions down to a single-grain level. Protocols and methods differ, but the primary data is luminescence (light) in all cases. Still, a concept development, perhaps again focusing on luminescence dating, might be a reasonable attempt in the future.

Open data carries the notion of accessibility and data insight. However, shared data do not automatically become accessible, and not every data set may provide similarly valuable insight; this depends on the experimental design. Making data accessible instantaneously with each study published appears advantageous for data users and disadvantageous for donors.

Throughout the manuscript, we implicitly carried forward the limitation that XLUM concerns luminescence data only, albeit a luminescence age is obtained from a luminescence-derived equivalent dose divided by a dose rate. Radionuclide concentration values used to calculate dose rates are derived using different methods, e.g., high-resolution γ-ray spectrometry, in-situ γ-ray spectrometry, alpha/beta counting, or inductively coupled plasma mass spectrometry. These different methods make it challenging to develop a data format applicable to all of them. In contrast, nearly every luminescence-dating laboratory has access to luminescence readers (recording luminescence data) with comparable technical capabilities.
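The transferability to other structured formats mentioned above can be sketched with a generic XML-to-JSON conversion using only Python's standard library. This is an illustrative mapping, not an official XLUM-to-JSON specification; the snippet and the chosen dictionary keys ("tag", "attributes", "values", "children") are assumptions for demonstration.

```python
import json
import xml.etree.ElementTree as ET

def node_to_dict(node):
    """Generic XML -> dict conversion: tag, attributes, payload, children."""
    d = {"tag": node.tag, "attributes": dict(node.attrib)}
    if node.text and node.text.strip():
        # Whitespace-separated curve values become a JSON array of strings.
        d["values"] = node.text.split()
    children = [node_to_dict(c) for c in node]
    if children:
        d["children"] = children
    return d

# Illustrative fragment of an XLUM record node.
xml_snippet = """<record recordType="TL">
  <curve component="thermocouple" duration="10">20 40 60</curve>
</record>"""

as_json = json.dumps(node_to_dict(ET.fromstring(xml_snippet)), indent=2)
```

Because the XLUM hierarchy is regular, such a mechanical conversion preserves all nodes, attributes, and values, so data archived as XLUM remain recoverable even if a different exchange format is standardised later.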