Metadata recommendations for light logging and dosimetry datasets

Background: Light exposure significantly impacts human health, regulating our circadian clock, sleep–wake cycle and other physiological processes. With the emergence of wearable light loggers and dosimeters, research on real-world light exposure effects is growing. There is a critical need to standardize data collection and documentation across studies.

Results: This article proposes a new metadata descriptor designed to capture crucial information within personalized light exposure datasets collected with wearable light loggers and dosimeters. The descriptor, developed collaboratively by international experts, has a modular structure for future expansion and customization. It covers four key domains: study design, participant characteristics, dataset details, and device specifications. Each domain includes specific metadata fields for comprehensive documentation. The user-friendly descriptor is available in JSON format. A web interface simplifies generating compliant JSON files for broad accessibility. Version control allows for future improvements.

Conclusions: Our metadata descriptor empowers researchers to enhance the quality and value of their light dosimetry datasets by making them FAIR (findable, accessible, interoperable and reusable). Ultimately, its adoption will advance our understanding of how light exposure affects human physiology and behaviour in real-world settings.


Introduction
In this article, we propose a novel metadata descriptor for capturing key metadata in personalized light exposure datasets. Metadata hold information about many elements of a dataset, e.g., location coordinates, exposure duration and the individual circumstances in which the data were collected, all of which provide context for meaningful analysis. Light has a fundamental impact on human physiology and behaviour, beyond vision [1][2][3]. It serves as the primary zeitgeber or 'time signal' for the human circadian system, allowing it to synchronise physiological and behavioural functions to the external light-dark cycle. In addition to its synchronising effect, light exposure can also modulate melatonin [4][5][6][7], alertness [8][9][10] and cognitive performance [11], and influence sleep architecture [12], thermoregulation and the cardiovascular system [13]. Light receptors in the eye, especially the melanopsin-containing retinal ganglion cells, with their peak sensitivity at the blue end of the light spectrum, play a dominant role in this. Thus, the physiological and behavioural influences of light are subsumed under the heading "non-visual" or melanopic effects of light, demarcating them from the visual effects of light, e.g., seeing and perceiving motion, colour and space in the world.
While most mechanistic insights on the non-visual effects of light come from controlled laboratory studies with exposure to constant or parametric modulations of artificial light, there is now an emerging literature on the impact of "real-world" light exposure under ambulatory, daily life conditions [14]. In these studies, participants are usually given wearable light dosimeters which capture light exposure over several hours, days or even weeks. These light dosimeters can be placed at different locations, including on the wrist using a watch-like wristband, on the chest as a brooch or pendant, or attached to spectacle frames in the direction of gaze [14]. Additionally, they have different optical properties and performance characteristics [15][16][17]. Wrist-worn devices in particular, which often primarily measure activity using accelerometers, now also include different types of light sensors. However, many of them do not estimate melanopic effects of light (i.e., effects on melanopsin-containing intrinsically photosensitive retinal ganglion cells [ipRGCs]) and thus fail to predict its circadian impact. More recently, light dosimeters have been developed that also incorporate the short-wavelength spectral sensitivity of melanopsin [18][19][20]. Individual light exposure patterns from such sensors have further been included in mathematical models to predict parameters of circadian physiology [21,22].
To ensure that data collected by different research groups are comparable and can be combined where needed, it is essential to document the conditions under which these data were generated. These metadata, i.e., data about the data, have to record which device was used, the context in which the data were generated and the descriptors of the participant. More broadly, metadata are key to making data findable, accessible, interoperable and reusable (FAIR [41]), and are seen as key components to support data sharing mandates from funders, journals and institutions [42,43]. Over the last decades, infrastructure has been established for sharing data, with generalist platforms such as Zenodo (https://zenodo.org/), FigShare (https://figshare.org/) or the Open Science Framework (https://osf.io/). Within different areas of biomedical research, specialized metadata descriptors have been developed (e.g., [44][45][46][47][48]). Furthermore, there is an active scholarly community working specifically on the theory and practice of metadata [49][50][51][52][53]. The importance of standardization and metadata is increasingly recognized in the domain of sleep and circadian science [53][54][55][56], including through the establishment of the US-based NIH-funded National Sleep Research Resource [57], which also provides bespoke tooling to access and process its data [58,59]. At present, there is no personalized metadata schema for light logging and dosimetry.
Here, we propose a metadata descriptor for light dosimetry data, incorporating study-level, participant-level, dataset-level and device-level metadata. The motivation for creating a metadata descriptor for light logging and dosimetry data stems from the need to standardize and enhance research in the field of light-related studies. This descriptor enables researchers to systematically document essential information about light exposure data, promoting reproducibility and comparability across studies. One key benefit is its facilitation of meta-analysis, allowing for comprehensive data synthesis and more robust conclusions. Additionally, it improves the overall quality and transparency of research, aiding peer review and interdisciplinary collaboration, as insights from lighting research intersect with various fields. Finally, journals, funders and institutions may also require the storage and sharing of data in a harmonized way.

Development of metadata descriptor
The metadata descriptor was developed through a joint development process by an international team of authors from diverse scientific backgrounds (sleep research, chronobiology, vision science, psychology, neuroscience, lighting science, physics, computer science) with experience in complex, real-world data collection. A series of synchronous Zoom-based discussions was held between 2020 and 2021. After an initial scoping survey and brainstorming discussions, the authors developed different thematic domains to be featured in the metadata descriptor and filled with specific items. The descriptor was refined through an iterative process using feedback given through a collaborative web-based document platform, and subsequently brought into the current final form by author M.S. This draft was then subject to a time-restricted 'veto' process to highlight any further disagreements. Final fine-tuning of the metadata descriptors was performed in a small-group discussion with authors J.Z., K.W., M.M. and M.S.

Structure/hierarchy of the metadata descriptor
The metadata descriptor collects essential information across different domains of a light dosimetry dataset. This includes obligatory information about (i) the study, including its name, whether it is a clinical trial, a description of the study sample and different groups therein, inclusion/exclusion criteria, and contributors, (ii) the participants, including their age, sex and characteristics, (iii) the dataset(s) (at the participant level), including instructions to the participant and wear time, and (iv) the device(s) used, including manufacturer, model, serial number, and information about the sensors. Below, we describe in greater detail the information needed for each of these categories. The modular architecture is shown in Fig. 1. In principle, the metadata descriptor can be expanded to include additional categories.
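As a rough illustration of the four-domain structure described above, the following Python sketch assembles one record and serializes it to JSON. All key names and values are hypothetical placeholders chosen for readability; they are not the field names of the published descriptor, which should be taken from the schema itself.

```python
import json

# Hypothetical record: key names and values are illustrative only and do
# not reproduce the official field names of the descriptor.
record = {
    "study": {
        "name": "Example light exposure study",
        "clinical_trial": False,
        "contributors": [
            {"name": "A. Researcher", "roles": ["Data curation"]},
        ],
    },
    "participants": [
        {"id": "P001", "age": 34, "sex": "female"},
    ],
    "datasets": [
        {
            "participant_id": "P001",
            "device_id": "D01",
            "instructions": "Wear on the non-dominant wrist during waking hours",
        },
    ],
    "devices": [
        {
            "id": "D01",
            "manufacturer": "ExampleCo",
            "model": "LX-1",
            "serial_number": "0001",
            "sensors": ["photopic illuminance"],
        },
    ],
}


def to_json(rec: dict) -> str:
    """Serialize a metadata record to indented JSON text."""
    return json.dumps(rec, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_json(record))
```

Keeping each domain under its own top-level key mirrors the modular idea: an additional category could be added as a further key without touching the existing ones.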

Study-level information
It is important to capture metadata about a given study. Here, we consider a study to be a concerted data collection effort using a specific protocol. This could be a longitudinal protocol (studying a cohort over time), an observational protocol or other protocols. At the study level, we record information about the study, participant groups in the study and contributors to the study.
The study-level information includes the following items:
At the level of contributors, the "Data curation" role (https://credit.niso.org/contributor-roles/data-curation/) must be defined. While there are key issues around data ownership that go well beyond the scope of this article, it is recommended that the research group involved in the data collection effort discusses data curation and licensing. The contributor schema is given as follows:

Participant-level information
To be able to document the type of study sample from which a light dosimetry data set was generated, it is important to include information about the participants. The participant-level information helps to identify participant characteristics, including demographics, and in particular, facilitates the merging of different datasets indexed in the database for aggregated analyses. To ensure participant anonymity, the information here should exclude personally identifiable information.
To include arbitrary participant-level characteristics that were collected alongside the primary data, e.g., iris colour, handedness, or similar, we provide a reusable "Participant characteristics" metadata field.
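One way to picture a reusable characteristics field is as a list of name/value entries attached to each participant, as in the sketch below. The class names, attribute names and the set of disallowed keys are assumptions made for this illustration and are not taken from the descriptor; the simple key check only hints at how personally identifiable information might be kept out.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, List, Optional

# Hypothetical list of entry names treated as personally identifiable
# in this sketch; a real project would define its own policy.
DISALLOWED_NAMES = {"name", "email", "address", "date_of_birth"}


@dataclass
class Characteristic:
    """One arbitrary participant-level characteristic (illustrative)."""
    name: str
    value: Any
    unit: Optional[str] = None


@dataclass
class Participant:
    """Participant entry with a reusable list of characteristics."""
    participant_id: str
    age: int
    sex: str
    characteristics: List[Characteristic] = field(default_factory=list)

    def add_characteristic(self, name: str, value: Any,
                           unit: Optional[str] = None) -> None:
        # Reject entries whose name suggests identifying information.
        if name.lower() in DISALLOWED_NAMES:
            raise ValueError(f"'{name}' looks personally identifiable")
        self.characteristics.append(Characteristic(name, value, unit))


if __name__ == "__main__":
    p = Participant("P001", 34, "female")
    p.add_characteristic("iris colour", "brown")
    p.add_characteristic("handedness", "right")
    print(asdict(p))
```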
The participant-level information contains the following items:

Device-level information
Information about the internal workings of the data collection devices is crucial for correct analyses and outcomes. Additionally, we include information about the specific sensors, such as light channels, to capture information about the types of light quantities that were recorded. The motivation to use this information is to enable analyses separated by the type of device used. The device-level information contains the following items:
At the time of publication, efforts are under way by the Joint Technical Committee 20 of the International Commission on Illumination (CIE) (https://cie.co.at/technicalcommittees/wearable-alpha-opic-dosimetry-and-light-logging-methods-limitations-device) and the MeLiDos project [60]. The proposed metadata descriptor uses an interface at the device level for a future descriptor specifically covering topics of accuracy and calibration, as well as standard output channels. The following table shows a cautious attempt at such a datasheet metadata descriptor for devices and sensors to showcase the possible range of such a descriptor.

Limitations
Here, we provided the first metadata descriptor for personalized light exposure data. We wish to highlight the following limitations, for which we provide mitigating strategies under "Future directions":
• General applicability: One limitation of the proposed metadata descriptor is its potentially limited applicability to specific contexts or types of studies. While it was developed collaboratively by an international team of experts with extensive expertise in real-world data collection, certain study designs or devices may not be adequately represented or documented by the proposed metadata fields. This limitation may affect the descriptor's ability to comprehensively capture metadata across current and future variations of light logging research. This may also include novel technologies, such as spatially resolved measurements.
• Validation and independent evaluation: We do not provide concrete evidence of validation or independent evaluation in the current paper. This lack of empirical validation may raise concerns about the descriptor's robustness and effectiveness in different research settings. Without demonstrated validation, the community may question the reliability and accuracy of the metadata captured by the descriptor, potentially limiting its widespread adoption and acceptance. We see future opportunities to address this, including through official standards bodies.
• Challenges in implementation: While the metadata descriptor is available in JavaScript Object Notation (JSON) format and comes with a user-friendly web interface for generating compliant files, potential challenges in its implementation across various software languages and platforms are not extensively discussed. The descriptor's compatibility with different data repositories and platforms is crucial for seamless integration into existing research infrastructures. The absence of a detailed discussion on potential implementation challenges and strategies to address them could hinder the descriptor's practical adoption by researchers using diverse technologies and tools. A robust landscape of tooling to support different entry points will need to be developed.
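To make the notion of a tooling entry point more concrete, the sketch below shows a minimal sanity check that a script in any analysis pipeline could run on a metadata file before using it. It relies only on the Python standard library. The four top-level key names are hypothetical stand-ins for the descriptor's domains, and the check is far weaker than validation against the published JSON schema.

```python
import json
from typing import List

# Hypothetical top-level keys standing in for the four domains
# (study, participant, dataset, device); not the official names.
REQUIRED_DOMAINS = ("study", "participants", "datasets", "devices")


def missing_domains(json_text: str) -> List[str]:
    """Return the required top-level domains absent from a JSON record."""
    record = json.loads(json_text)
    if not isinstance(record, dict):
        raise ValueError("top-level JSON value must be an object")
    return [d for d in REQUIRED_DOMAINS if d not in record]


if __name__ == "__main__":
    incomplete = '{"study": {}, "devices": []}'
    print(missing_domains(incomplete))
```

Because JSON parsers exist for essentially all common languages, an equivalent check could be written in R, MATLAB or JavaScript with little effort; full schema validation would require a dedicated library in each language.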

Future directions
We see the following avenues for future work:
• Validation of the metadata descriptor in real-world settings: As we move forward, a critical step is the validation of this metadata descriptor in real-world settings across a variety of users and research contexts. This entails applying the descriptor to diverse light dosimetry datasets collected in different environments, populations, and under varying conditions, including in clinical contexts. This validation process will help ensure the descriptor's adaptability and effectiveness in capturing the nuances of personalized light exposure data. Researchers should collaborate to assess its utility and identify potential improvements systematically.
• Independent evaluation of the metadata descriptor: To establish its robustness and credibility, independent evaluation of the metadata descriptor is imperative. Encouraging third-party assessments and peer reviews will provide valuable feedback and insights into its usability and reliability. This independent evaluation should include comparisons with existing metadata schemas and assessments of its compatibility with different data analysis tools and platforms.
• Integration into existing data repositories and platforms: To streamline the use of the metadata descriptor, it should be integrated into existing data repositories and platforms used by researchers in the field of chronobiology and related disciplines. Creating plugins or extensions that enable seamless incorporation of metadata into data management systems will encourage researchers to adhere to the descriptor's guidelines. This integration will not only enhance data discoverability but also simplify the process of sharing and accessing light dosimetry datasets, further promoting the FAIR principles and facilitating collaborative research efforts.
Incorporating these future directions will not only strengthen the metadata descriptor's utility but also foster a collaborative and dynamic research community focused on advancing our understanding of the non-visual effects of light.By continuously refining and expanding the descriptor, we can collectively contribute to the FAIR principles, making light dosimetry data more accessible, interpretable, and impactful in the fields of chronobiology, sleep science, and beyond.

Conclusion
In conclusion, the development of this metadata descriptor for light dosimetry data is a significant contribution to the field of chronobiology and personalized light exposure research. This descriptor addresses the critical need for standardized documentation of metadata associated with light exposure datasets, ensuring that data collected across various studies, contexts, and devices can be compared and utilized effectively. The modular architecture of the metadata descriptor allows for flexibility and scalability, accommodating potential future expansions.
The implementation of the metadata descriptor in JSON format, along with the user-friendly web interface for generating compliant JSON files, enhances its accessibility and usability within the research community. Furthermore, the provision of versioning ensures that the descriptor remains up-to-date and adaptable to evolving research needs.
Ultimately, this metadata descriptor facilitates the principles of FAIR data (findable, accessible, interoperable, and reusable), promoting collaboration, data sharing, and the advancement of knowledge in the study of light exposure's effects on human physiology and behaviour. Researchers and institutions are encouraged to adopt this descriptor to improve the quality and utility of their light dosimetry datasets, contributing to a more comprehensive understanding of the non-visual effects of light in real-world settings.

Fig. 1 Overview of the metadata descriptor. For clarity, only first- and second-level items are shown