A STRATEGY FOR MANAGING NASA’S LONG-TAIL OF PLANETARY RESEARCH DATA.

Coinvestigators: Barbara Lafuente, SETI Institute; Nate Stone, Open Data Repository; Mary Parenteau, NASA Ames Research Center; Shawn R. Wolfe, NASA Ames Research Center; Sara Perez Rojo, NASA Ames Research Center; Kevin Boydstun, NASA Ames Research Center; Robert Downs, University of Arizona; David Blake, NASA Ames Research Center; Linda Jahnke, NASA Ames Research Center; David Des Marais, NASA Ames Research Center; Christopher Dateo, NASA Ames Research Center; Mark Fonda, NASA Ames Research Center

[...] focused on the needs of astrobiologists, the system serves as a guide to data management strategies in other NASA-funded scientific disciplines where long-tail research is performed.
AHED components include (Fig. 1): the Astrobiology Resource Metadata Standard (ARMS) [3], an astrobiology-specific standardized metadata framework; the Open Data Repository (ODR) [4], a core service provided by OPScI for the rapid online publication of data without the need to write code; and the AHED Web Portal, which provides a web-based home for the project.
The Astrobiology Resource Metadata Standard (ARMS) [3]: ARMS (Fig. 2) is an evolving, comprehensive standard for the description, access, and discovery of information related to all areas relevant to astrobiology.

Figure 2: The ARMS Astrobiology Resource labels multiple forms of observation and information with
metadata to streamline searches.

2142.pdf 55th LPSC (2024)
ARMS goes beyond just datasets and can describe any product of astrobiological research, including physical samples, software, publications, and so on. The advantage of using ARMS over a generic metadata standard comes from the inclusion of metadata specific to astrobiology, which allows the researcher to describe their information more precisely and in much greater detail.
The consistent structure established by ARMS greatly facilitates precise querying and rapid interpretation of search results in AHED. Moreover, it allows rapid and intuitive archiving of ARMS-labeled files, or links to other online resources, through the AHED Portal.
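To illustrate why a consistent record structure makes querying precise, the following is a minimal sketch in Python. It is not the ARMS schema: the class `ResourceRecord`, its field names, the `search` helper, and the sample catalog entries are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical ARMS-style record; field names are illustrative only."""
    title: str
    resource_type: str                 # e.g. "dataset", "physical sample", "software"
    keywords: frozenset = frozenset()  # discipline-specific descriptive terms
    location: str = ""                 # file name or link to an external resource


def search(records, resource_type=None, keywords=()):
    """Return the records that match every facet supplied by the caller."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for rec in records:
        if resource_type is not None and rec.resource_type != resource_type:
            continue
        # Every requested keyword must be present (case-insensitive).
        if not wanted <= {k.lower() for k in rec.keywords}:
            continue
        hits.append(rec)
    return hits


# Made-up catalog entries, used only to exercise the search function.
catalog = [
    ResourceRecord("Microbial mat lipid profiles", "dataset",
                   frozenset({"lipids", "microbial mats"})),
    ResourceRecord("Mat core A", "physical sample",
                   frozenset({"microbial mats"})),
    ResourceRecord("Spectra fitting tool", "software",
                   frozenset({"spectroscopy"})),
]

for hit in search(catalog, keywords=["microbial mats"]):
    print(hit.resource_type, "|", hit.title)
```

Because every record exposes the same fields, one filter works across datasets, samples, and software alike, which is the property the standard is meant to provide.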
Open Data Repository's data publisher (ODR): ODR is open-source, discipline-agnostic software for the rapid online publication of datasets, with no need to write code (we liken it to WordPress for data). The platform supports a diverse array of data types and formats, with customizable layout and dataset design edited using a drag-and-drop interface.
ODR can create interactive datasets, with dynamic graphing capabilities, plugins for specific functionality, and the ability to integrate with third-party apps for online data analysis. Dataset creation is streamlined using customizable templates, which also enhance cross-dataset searching. Datasets can be expanded to incorporate data derived from newer research techniques or data formats as the dataset lives and evolves. These tools allow researchers to create, publish, analyze, and maintain living datasets that are more useful and versatile than static repositories of legacy data.
Data access is managed through a permissions system that allows datasets to serve as a collaborative environment for research teams prior to public release. Datasets are citable, and digital object identifiers are assigned on creation. The ODR system also makes each data record accessible through an API (Application Programming Interface) that facilitates interaction between external computer programs and the database. In addition, ODR creates shareable data templates from which multiple groups can create compatible datasets using the ODR software. Combined with cross-dataset searching facilitated by Elasticsearch (https://github.com/elastic), these templates let researchers search across multiple datasets that use compatible templates in ODR. These networks of datasets allow different groups to manage their own staff and data while participating in a community that combines their data to create powerful, distributed data resources.
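As a rough illustration of template-based cross-dataset searching, the sketch below builds a standard Elasticsearch Query DSL request that spans several indices. The source states only that Elasticsearch facilitates the search; how ODR maps datasets to indices is not described. The one-index-per-dataset layout, the index names, and the `template_id` and `description` fields are therefore assumptions made for this example. No request is sent; the code only constructs the path and body.

```python
import json


def build_cross_dataset_search(indices, template_id, text):
    """Build the path and body of an Elasticsearch search over several indices.

    Assumptions (not from the source): each dataset lives in its own index,
    and documents carry a `template_id` keyword field and a `description`
    text field.
    """
    # Elasticsearch accepts a comma-separated list of indices in the path.
    path = "/" + ",".join(indices) + "/_search"
    body = {
        "query": {
            "bool": {
                # Full-text match on the free-text field.
                "must": [{"match": {"description": text}}],
                # Restrict hits to datasets built from the same template.
                "filter": [{"term": {"template_id": template_id}}],
            }
        }
    }
    return path, body


# Hypothetical index and template names for two independent groups.
path, body = build_cross_dataset_search(
    ["lab_a_minerals", "lab_b_minerals"], "mineral-template-v1", "olivine"
)
print("GET", path)
print(json.dumps(body, indent=2))
```

Filtering on a shared template identifier is one plausible way to guarantee that the fields being searched mean the same thing in every participating dataset.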
AHED Web Portal: Because labeling datasets with appropriate metadata (such as ARMS) is important for discoverability and easy navigation between similar resources, the AHED Web Portal hosts an online dataset creation tool. The tool lets users rapidly and intuitively archive ARMS-labeled files or links to other online resources hosted by the ODR (Fig. 3). Permanent identifiers such as Digital Object Identifiers (DOI) are provided for each dataset in AHED to facilitate dataset discovery and citation. The AHED Web Portal also provides an interactive, multifaceted search interface for AHED datasets.

Figure 1: Components of the AHED system.

Figure 3: Screenshots of AHED search tools (top), search page (bottom left) and contribution wizard (bottom right).