Published May 7, 2024 | Version v1.0.1
Software Open

GLAM-Workbench/trove-web-archives

Description

CURRENT VERSION: v1.0.1

This repository includes information on finding, understanding, and using Pandora's collections of archived web pages.

Pandora has been selecting websites and online resources for preservation since 1996. It has assembled a collection of more than 80,000 titles, organised into subjects and collections. The archived websites are now part of the Australian Web Archive (AWA), which combines the selected titles with broader domain harvests, and is searchable through Trove. However, Pandora's curated collections offer a useful entry point for researchers trying to find websites relating to particular topics or events.

The Web Archives section of the GLAM Workbench provides documentation, tools, and examples to help you work with data from a range of web archives, including the Australian Web Archive. The title URLs obtained through Pandora can be used to retrieve additional data from the AWA for analysis.

For more information and documentation see the Trove web archive collections (Pandora) section of the GLAM Workbench.

Notebooks

  • Create title datasets from collections and subjects
  • Harvest Pandora subjects and collections
  • Harvest the full collection of Pandora titles

Associated datasets

Created by Tim Sherratt for the GLAM Workbench

Files

GLAM-Workbench/trove-web-archives-v1.0.1.zip

Files (60.6 kB)

md5:2a9e02b9f39b651a0e5ffa142def24a0
60.6 kB
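
The md5 value listed for the zip file can be used to check that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names and the local file path in the usage note are assumptions made for illustration and are not part of the repository.

```python
import hashlib

# Checksum published on this record for trove-web-archives-v1.0.1.zip
EXPECTED_MD5 = "2a9e02b9f39b651a0e5ffa142def24a0"


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """Hypothetical helper: compare a local file's md5 to the published value."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved locally as `trove-web-archives-v1.0.1.zip`, calling `matches_published_checksum("trove-web-archives-v1.0.1.zip")` should return `True` for an unmodified download.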

Additional details

Related works

Is derived from
Software: https://github.com/GLAM-Workbench/trove-web-archives/tree/v1.0.1 (URL)
Is documented by
Software documentation: https://glam-workbench.net/trove-web-archives/ (URL)
Is part of
Other: https://glam-workbench.net/ (URL)