Developing the ArchAIDE Application: A digital workflow for identifying, organising and sharing archaeological pottery using automated image recognition

Francesca Anichini1, Francesco Banterle2, Jaume Buxeda i Garrigós3, Marco Callieri4, Nachum Dershowitz5, Nevio Dubbini6, Diego Lucendo Diaz7, Tim Evans8, Gabriele Gattiglia9, Katie Green10, Maria Letizia Gualandi11, Miguel Angel Hervas12, Barak Itkin13, Marisol Madrid i Fernandez14, Eva Miguel Gascón15, Michael Remmy16, Julian Richards17, Roberto Scopigno18, Llorenç Vila19, Lior Wolf20, Holly Wright21, Massimo Zallocco22


Introduction
Every day, archaeologists work to discover and tell stories using objects from the past, investing considerable time, effort and funding to identify and characterise individual finds. Pottery is of fundamental importance for the comprehension and dating of archaeological contexts, and for understanding the dynamics of production, trade flows, and social interactions. Today, the characterisation and classification of ceramics are carried out manually, through the expertise of specialists and the use of analogue catalogues held in archives and libraries. While not seeking to replace the knowledge and expertise of specialists, the ArchAIDE project worked to optimise and economise identification processes, developing a new system that streamlines the practice of pottery recognition in archaeology using the latest automated image recognition technology. At the same time, ArchAIDE worked to ensure that archaeologists remained at the heart of the decision-making process within the identification workflow, and focused on optimising tasks that were repetitive and time-consuming. Specifically, ArchAIDE worked to support the essential classification and interpretation work of archaeologists (during both fieldwork and post-excavation analysis) with an innovative app for tablets and smartphones.

The ArchAIDE project was funded by the European Union's Horizon 2020 Research and Innovation Programme under grant agreement N.693548, with a consortium of partners representing both the academic and industry-led ICT domains, and the academic and development-led archaeology domains.
The archaeological partners within the consortium were the MAPPA Lab at the University of Pisa (coordinator), which has relevant experience in mathematical and digital applications in archaeology, and
A digital comparative collection for multiple pottery types, incorporating existing digital collections, digitised paper catalogues and multiple photography campaigns.
A workflow, automated as far as possible, to digitise paper catalogues accurately and improve the search and retrieval process (Figure 1).
A multilingual thesaurus of descriptive pottery terms, mapped to the Getty Art and Architecture Thesaurus including mappings in French, German, Spanish, Catalan, English and Italian.
Two distinct neural networks for appearance-based and shape-based recognition.
An app using digital comparative collections to support archaeologists in recognising potsherds during excavation and post-excavation analysis, with an easy-to-use interface and efficient image recognition algorithms for search and retrieval, based on either shape or decorative characteristics.
Once a sherd has been recognised, the app can be used to automatically populate the relevant type information about the sherd into a virtual assemblage for a site, including the generation of an identity card as a formatted, digital or printable document. The underlying technologies developed for the app were also implemented as a desktop application, which is a web-based, real-time data visualisation resource, to improve access to archaeological heritage and generate new understanding. Both the desktop application and the app can also be used as a tool to aid learning about pottery identification, either for students or when specialists are not available.
To address the time-consuming and repetitive aspects of pottery recognition, the ArchAIDE system was designed to support a more automated pottery identification workflow, meeting real user needs and reducing time and costs within both academic and professional archaeology (Figure 2). Although similar in methodological approach, academic and professional archaeology can vary tremendously in their goals, costs and the types of expertise available. Academic archaeology is less bound by time pressures, and typically allows more time for post-excavation documentation. Professional archaeology, often related to development-led work, is usually closely bound by time pressures, and is often carried out by teams of professional archaeologists for whom specialist training and/or access to specialist knowledge is at a premium.
As a discipline, archaeology is often an early adopter of novel technologies but, on the whole, maintains a conservative approach when it comes to replacing well-established methods. ArchAIDE has therefore been careful to support good practice rather than seek to change existing workflows, and the ArchAIDE system was designed to follow specialist methodologies: recognising a pottery type by observing the profile/cut section shape, and/or by analysing the decorative surface treatment.
As both speed and accuracy are important in pottery identification, the ArchAIDE system was designed to support the classification and interpretation work of archaeologists (during both fieldwork and post-excavation analysis) through an innovative app designed for mobile devices and desktop computers. The app features an interface that facilitates the identification of a potsherd from a photograph taken with the camera of a typical mobile device. Within the app, the ArchAIDE system supports efficient and powerful algorithms for pottery characterisation. These allow the search and retrieval of possible visual/geometric correspondences against a complex database built from comparative data derived from digital and analogue resources. To create the database, and the training data for the deep learning image recognition algorithms, it was necessary to acquire robust comparative data for the specific pottery types chosen for the ArchAIDE proof-of-concept. The ArchAIDE database is composed of (1) a Reference Database and (2) a Results Database. The Reference Database contains a number of born-digital and digitised paper catalogues of pottery typologies, which have been combined to create a coherent comparative resource. The Reference Database was designed around a core catalogue, containing basic information about pottery classes and types that is common across different thematic and period-based pottery catalogues, with additional media (drawings, photographs, 3D models, etc.) to aid the main recognition application. The Reference Database also includes spatial data related to each type entity, to support data analysis and the creation of data visualisations such as distribution maps.
To create the Reference Database, in addition to openly licensed resources (e.g. Roman Amphorae: a digital resource, held by the ADS; University of Southampton 2014), use of materials protected by copyright (e.g. paper catalogues) and/or by sui generis right (e.g. digital resource, held by the

Development of the Reference and Results Databases
The Reference Database is a core component of the ArchAIDE system and provides data to all the other components. The design and analysis phase involved input from all the archaeological specialist partners to gather the necessary design requirements. The software tools were developed using 'Agile' software development methodologies, in order to continuously deliver versions of the final system to the partners for testing. Each design-develop-test-refine cycle (about 2 weeks) included a partial revision of the requirements and functionalities of the system. Close cooperation with the partners allowed the creation of tools that were both useful and robust. The Reference Database contains the definitions of the pottery types, decorations, stamps, fabrics, and other information needed by archaeologists during sherd analysis. It includes data created and managed by different users, from a variety of sources, concerning different ceramic classes and historical periods; the data are therefore organised in catalogues or archives created and managed by and for specific groups of users (Figure 3). The main entities within the Reference Database are:

Debian Linux
The data are stored in a relational database (MySQL) and indexed for advanced search and API support (Elasticsearch). All the multimedia files related to database entities, such as depictions and 3D models, are stored in the file system.
The Reference Database is integrated with the following:
External sources, to synchronise data by importing and exporting information.
Geographic systems to define the origin and the occurrences of specific objects.
OCR Tools to support the digitisation of books.
3D rendering tools to enhance the media content of each pottery type.
In order to make the Reference Database able to exchange data with external systems, the ArchAIDE partners defined a data exchange format based on JSON, together with an API through which external data can be sent and received. The availability of a well-defined data format allowed the partners to export existing digital comparative data mapped to the ArchAIDE schema and have it imported into the Reference Database. This was the case for Roman Amphorae and CERAMALEX. The format was also used in the process of extracting information from printed pottery catalogues.
The definition of an API is essential for the Reference Database, in order to provide information to the ArchAIDE App and to external systems that need to synchronise their data with ArchAIDE. To include geographical information in the database, it was decided to use GeoJSON as the format for representing geographical features (points, lines and polygons). A location in GeoJSON is represented by a small amount of text in JSON format, so it is easy to store in the database.
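A minimal sketch of what an exchange record with an embedded GeoJSON location might look like, using only Python's standard json module. The field names (id, catalogue, name, locations), the helper make_type_record and the example values are hypothetical illustrations, not the actual ArchAIDE schema; only the GeoJSON part follows the GeoJSON specification (RFC 7946), which orders coordinates as longitude, latitude.

```python
import json


def make_type_record(type_id, catalogue, name, lon, lat, place):
    """Build a hypothetical exchange record for one pottery type.

    Field names are illustrative only; they are not the ArchAIDE schema.
    """
    findspot = {
        "type": "Feature",
        # GeoJSON (RFC 7946) stores positions as [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"place": place},
    }
    return {
        "id": type_id,
        "catalogue": catalogue,
        "name": name,
        # A FeatureCollection can hold points, lines and polygons side by side
        "locations": {"type": "FeatureCollection", "features": [findspot]},
    }


# Hypothetical type; the coordinates are those of Pisa
record = make_type_record("amph-0001", "Roman Amphorae", "Type A", 10.40, 43.72, "Pisa")
payload = json.dumps(record)     # compact text, easy to store or send through an API
restored = json.loads(payload)   # round-trips back to the same structure
```

Because the location is ordinary JSON text, it can travel through the same API and be stored alongside the rest of the record without a separate geometry format.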
The mappings were supplied to INERA (as JSON and CSV) for incorporation within the Reference Database. The multilingual vocabularies have been implemented as the basis for establishing a clear level of semantic interoperability within the reference catalogues imported into the database.

Development of comparative collections to populate the database
The choice of pottery classes, and consequently of the catalogues to be used for the ArchAIDE project, was one of the main issues to be considered in order to create a system with a real-world application. The decision was made to choose three classes: amphorae manufactured throughout the Roman world between the late 3rd century BCE and the early 7th century CE; Roman Terra Sigillata manufactured in Italy, Spain and Gaul; and Majolica, comprising Majolica produced in Montelupo Fiorentino (Italy) from the 15th to 18th centuries and the medieval and post-medieval Majolica found in Barcelona. The choice of pottery types was based on the availability of collections to the ArchAIDE partners and their areas of specialist expertise, and on the need to find types that relied on both shape-based and decoration-based characteristics for identification.
In addition to the Roman Amphorae and CERAMALEX digital collections, the database was also populated with analogue catalogues, which had to be digitised (see Section 2.2.2). These were in the form of catalogues, books and papers (Berti 1997

Digital comparative data
Databases are common tools for archaeological work, but for different reasons (e.g. technical, legal or historical research traditions) each project, excavation or research community develops its own database structures, naming conventions and data workflows. This leads to a very heterogeneous situation for data provision, even within the same field of research. This is reflected in the two digital sources imported into the database, the Roman Amphorae resource held by the ADS and the CERAMALEX/CeramEgypt database. Using the import tool provided by INERA as part of the ArchAIDE Reference Database Management System, it was possible to import both the ADS and CERAMALEX catalogues. The Roman amphora catalogue from the ADS was the first external source to be integrated, because its conceptual schema was similar to that of the ArchAIDE database (Figure 4). The work on the CERAMALEX database was more complex, because its schema and architecture reflected a different focus of pottery research. The differences between the two systems required complex mapping and normalisation activities, which ended with the creation of an ArchAIDE catalogue imported from the CERAMALEX/CeramEgypt database and led to a revision of the ArchAIDE database itself. CERAMALEX/CeramEgypt, an already structured database with a completely different data structure, therefore served as a proof of concept for the import approach.
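The mapping and normalisation step can be illustrated with a small, hypothetical example: fields from an external record are renamed to target fields, free-text dates are converted to signed years, and ware names are mapped onto a controlled list, with anything that cannot be mapped reported rather than silently dropped. The source field names, FIELD_MAP, WARE_TERMS, parse_year and normalise are all invented for illustration and do not reproduce the real CERAMALEX or ArchAIDE schemas.

```python
import re

# Hypothetical source-field -> target-field map (not the real CERAMALEX schema)
FIELD_MAP = {"type_label": "name", "date_start": "date_from", "date_end": "date_to", "ware": "ware"}
# Hypothetical controlled list: source spellings -> normalised term
WARE_TERMS = {"amph.": "amphora", "amphora": "amphora", "ts": "terra sigillata"}


def parse_year(text):
    """'150 BC' -> -150, 'AD 10' or '10' -> 10 (BC years simply become negative)."""
    m = re.fullmatch(r"\s*(?:AD\s*)?(\d+)\s*(BC|BCE|AD|CE)?\s*", text, flags=re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised date: {text!r}")
    year = int(m.group(1))
    era = (m.group(2) or "").upper()
    return -year if era in ("BC", "BCE") else year


def normalise(source):
    """Map one external record onto the target fields; return (record, problems)."""
    target, problems = {}, []
    for src_field, dst_field in FIELD_MAP.items():
        if src_field not in source:
            problems.append(f"missing {src_field}")
            continue
        value = source[src_field]
        try:
            if dst_field in ("date_from", "date_to"):
                value = parse_year(value)
            elif dst_field == "ware":
                key = value.strip().lower()
                if key not in WARE_TERMS:
                    raise ValueError(f"unmapped ware {key!r}")
                value = WARE_TERMS[key]
            else:
                value = value.strip()
        except ValueError as err:
            # Keep going, but record why this field could not be imported
            problems.append(str(err))
            continue
        target[dst_field] = value
    return target, problems


# Illustrative input only
imported, issues = normalise(
    {"type_label": " Type A ", "date_start": "150 BC", "date_end": "10 BC", "ware": "Amph."}
)
```

Collecting problems per record, instead of stopping at the first error, mirrors the iterative character of the work described above: the report shows where the mapping (or the target schema) needs revising.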

Digitising paper catalogues
Paper catalogues contain all the geometric and semantic information needed to populate the Reference Database and train the image recognition algorithm. This information is presented as both textual descriptions and diagnostic drawings. Digitising these catalogues transforms them into structured/annotated data that can be used within a database and that are also searchable and machine-readable.
The paper catalogues were scanned as images, and textual information corresponding to the relevant database fields was then extracted, along with the drawings of the pottery types (which were then processed to trace their profiles, as explained in Section 2.2.3). While the drawings were relatively standardised in terms of uniformity and information content, the structure of the textual descriptions varied greatly between catalogues. Some catalogues were highly structured, with each type described individually through a series of fields (very similar to a database), while others were unstructured, with the various types described verbosely as free text, without any recurring, definable structure. It was therefore necessary to create tools able to deal with both structured and unstructured catalogues. JavaScript. The 'structured catalogue' digitisation tool worked by combining OCR with knowledge of the structure the catalogue uses to describe each type (e.g. the number and order of paragraphs, their position within the pages, recurring keywords and section titles, and/or the presence of bulleted and numbered lists).
Starting from the output of the OCR, which contains all the columns/sections/lines/words in the page, the tool automatically splits the description into chunks following the catalogue structure template. The chunks are used to fill the database fields representing a pottery type.
Unfortunately, each catalogue may employ a different structure to describe the types, so the parsing process must be adjusted to accommodate each new template. This task of creating a new 'structure parsing template' must still be carried out manually, as there were too few structured catalogues to justify creating a higher-level description of these templates. The output of this processing tool was a series of JSON data structures (containing the different data fields extracted from the catalogue) and images (cropped from the scans), describing each type of pottery. These data were then directly imported into the database, generating the types and associated visual depictions. This was a one-time batch operation, due to the limited number of structured catalogues; the tool was therefore used as a stand-alone process, not integrated into the database backend.
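The template-driven splitting described above can be sketched in a few lines of Python. This is a simplified illustration, not the ArchAIDE tool: it assumes that the OCR output for one type entry is plain text and that each field starts on a new line with a recurring heading followed by a colon. TEMPLATE and parse_type_entry are hypothetical; adapting the sketch to a new catalogue means writing a new template, which matches the manual step described in the text.

```python
import json
import re

# Hypothetical template for one catalogue: (database field, heading that introduces it)
TEMPLATE = [
    ("name", "Type"),
    ("description", "Description"),
    ("dating", "Dating"),
    ("bibliography", "Bibliography"),
]


def parse_type_entry(ocr_text, template=TEMPLATE):
    """Split the OCR text of one type entry into fields, using headings at line starts."""
    headings = "|".join(re.escape(h) for _, h in template)
    pattern = re.compile(rf"^({headings})\s*:\s*", flags=re.MULTILINE)
    field_for = {h: f for f, h in template}
    matches = list(pattern.finditer(ocr_text))
    record = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(ocr_text)
        # Collapse the line breaks and extra spaces that OCR leaves inside a paragraph
        record[field_for[m.group(1)]] = " ".join(ocr_text[m.end():end].split())
    return record


# Hypothetical OCR output for one entry
entry = "Type: Form 12\nDescription: Shallow dish with\nflaring rim.\nDating: 1st century AD\n"
record = parse_type_entry(entry)
as_json = json.dumps(record)  # one JSON structure per type, ready for a batch import
```

Headings missing from an entry simply produce no field, so the same template tolerates entries that omit, for example, a bibliography.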
The 'unstructured catalogue' digitisation tool was integrated directly into the database backend, to be used by the archaeological partners when creating a new pottery type. In this instance a fixed structure could not be used, but it was still possible to use OCR to digitise chunks of text from the catalogue scans, speeding up data entry. The interface lets the archaeologist load multiple scans and browse them. The user reads the text and, on finding a passage that can be used to fill one of the database fields, selects it; the tool then applies OCR to the selected region only. The text may then be edited, to correct errors or add user-created content. When the text is ready, the user chooses the database field to which it should be assigned. Similarly, from the same interface, image regions may be marked, cropped from the original scans, and inserted directly into the database as visual depictions associated with the new type.

Vectorisation and generation of 3D models from section drawings
Pottery profile drawings in catalogues use a standardised geometry, but the semantic information is flattened into a single raster layer and encoded following specific representation rules. Designed for human interpretation, this representation is not machine-readable. A vector-based representation allows the different semantic and diagnostic elements to be separated. Such a representation was suitable for use in the Reference Database, and was also found to be useful for training the neural network. Using image processing techniques developed by CNR-ISTI for ArchAIDE, and exploiting the common representation rules of pottery profile drawing in archaeology, it was possible to vectorise the drawings and create semantic elements in an automated way (Figure 6). The aim of this process was the extraction of several geometric and semantic features (body/handle profiles, rim, base, rotation axis). The extracted profile was then stored as an SVG file and annotated with semantic information. In this way, it was possible to create a digital visual representation (for human access) whose data were also machine-readable. Image processing techniques also made it possible to generate a 3D model of the traced vessel. The profile extraction was automated, tolerated small inconsistencies in drawing style across different catalogues, and required only minimal adjustments. The tracing process was implemented as a MATLAB function that could be run in batch mode, generating SVG tracings and 3D models for all the drawings in a single catalogue in a matter of minutes.
The tracing of profiles followed a series of steps. First, the scanned image of the archaeological pottery profile was cropped and binarised. Then, dilate and erode operators were applied to remove pixel outliers resulting from scan noise and printing imperfections. The rotation axis of the vessel was then found using a Hough transform, by isolating the longest vertical line in the drawing. From this axis, the 'profile' side of the drawing was isolated: pottery drawing conventions typically dictate that the profile is drawn on the left side and the external surface is depicted on the right, although characteristics such as orientation and drawing style (filled, empty, hatching) may not be consistent between catalogues.
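The first steps can be sketched in pure Python on a tiny synthetic image. The original tool was a MATLAB function and this sketch only illustrates the ideas, under stated assumptions: binarisation uses a fixed threshold, and the clean-up is a 3x3 closing (dilation followed by erosion), which fills pinholes and one-pixel breaks in the inked lines. In addition, the hypothetical find_axis replaces the Hough transform with a simpler stand-in that picks the column holding the longest unbroken vertical run of ink; unlike a Hough transform, it only works when the axis is perfectly vertical.

```python
def binarise(gray, threshold=128):
    """Dark ink -> 1, paper -> 0 (fixed threshold: an assumption)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def _neighbourhood(mask, r, c):
    """Yield the 3x3 neighbourhood of (r, c); pixels outside the image count as paper."""
    h, w = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield (mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)


def dilate(mask):
    return [[1 if any(_neighbourhood(mask, r, c)) else 0 for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask):
    return [[1 if all(_neighbourhood(mask, r, c)) else 0 for c in range(len(mask[0]))]
            for r in range(len(mask))]


def close(mask):
    """Dilation followed by erosion: fills pinholes and one-pixel breaks in inked lines."""
    return erode(dilate(mask))


def find_axis(mask):
    """(column, length) of the longest unbroken vertical ink run.

    Hypothetical stand-in for the Hough transform; assumes a perfectly vertical axis.
    """
    best_col, best_len = -1, 0
    for c in range(len(mask[0])):
        run = 0
        for r in range(len(mask)):
            run = run + 1 if mask[r][c] else 0
            if run > best_len:
                best_col, best_len = c, run
    return best_col, best_len


# Synthetic 20x20 scan: a vertical axis at column 10 with a one-pixel break at row 8,
# plus a shorter stroke at column 3 standing in for part of the profile.
gray = [[255] * 20 for _ in range(20)]
for r in range(2, 18):
    if r != 8:
        gray[r][10] = 0
for r in range(5, 10):
    gray[r][3] = 0

mask = close(binarise(gray))
col, length = find_axis(mask)               # the break is bridged: the full 16-pixel axis
profile_side = [row[:col] for row in mask]  # by convention, the profile lies left of the axis
```

Without the closing step, the same search would only find a 9-pixel run below the break, which shows why the clean-up precedes axis detection.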
It was then necessary to extract the actual profile regions: external profile, internal profile and handles.
The drawing was segmented to extract all the edges. Starting from the axis, the inner profile curve was found; a 'marching' process then traced the whole inner profile, going from the top point to the axis. The outer profile was more difficult, as it might contain a handle. If a handle was detected, its profile was isolated and trimmed from the outer vessel profile, using an energy function that provides a trimming surface that is smooth with respect to the rest of the outer vessel profile. The cut-through section of the handle, if present, was isolated and traced separately. In addition to the axis and inner/outer/handle profiles, other metric and semantic information (position of the rim and base points, mouth radius) was automatically computed and stored in the SVG. The physical size of the profile could be extracted automatically if a scale bar was present in the image, or otherwise recovered from the DPI value of the scan, provided the drawing scale was stated in the catalogue (each catalogue/page might be drawn at a specific scale).
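The scale arithmetic behind this step is simple. A minimal sketch follows, assuming the drawing scale is given as 1:N and that the mouth radius is measured horizontally from the rotation axis to the rim point. The function names and the example numbers (a 300 dpi scan of a 1:4 drawing, with the rim 354 pixels from the axis) are hypothetical.

```python
MM_PER_INCH = 25.4


def mm_per_pixel_from_dpi(dpi, scale_denominator):
    """Real-world mm per scan pixel for a drawing at 1:scale_denominator (hypothetical helper)."""
    return MM_PER_INCH / dpi * scale_denominator


def mm_per_pixel_from_scale_bar(bar_length_px, bar_length_mm):
    """Real-world mm per pixel, from a scale bar labelled with its real-world length."""
    return bar_length_mm / bar_length_px


def mouth_radius_mm(axis_x_px, rim_x_px, mm_per_px):
    """Horizontal distance from the rotation axis to the rim point, in mm."""
    return abs(rim_x_px - axis_x_px) * mm_per_px


# A 1:4 drawing scanned at 300 dpi: one pixel covers 25.4 / 300 * 4 (about 0.339) mm of vessel
res = mm_per_pixel_from_dpi(300, 4)
radius = mouth_radius_mm(120, 474, res)  # rim 354 px from the axis: about 119.9 mm
```

Both routes should agree: a 50 mm scale bar drawn at 1:4 occupies 12.5 mm of paper, which at 300 dpi spans about 147.6 pixels and gives the same millimetres per pixel.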
Once all the elements had been traced and saved in the SVG, a further step produced a 3D model (Figure 7). This 3D model is a simple representation, aimed at visual presentation. The body of the vessel is a simple solid of revolution, generated from the profile. For vessels with handles, 3D models of the handles were generated as parametric extrusions along the handle profiles; the main body was then trimmed, and the handle models were welded in place. Managing the handles was not always straightforward, as the information in the drawing may be incomplete (e.g. only the profile of the handle with a middle cross-section may be shown, or the handle may not be shown at all). These 3D models are available in the desktop Reference Database and can be interactively visualised on the pottery type page.
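Generating the vessel body from the traced profile is a standard surface-of-revolution construction. The sketch below assumes the profile is a list of (radius, height) points measured from the rotation axis, and builds a quad mesh by rotating those points in fixed angular steps; handles, trimming and welding are not covered. revolve_profile and the toy profile values are hypothetical, not the ArchAIDE code or data.

```python
import math


def revolve_profile(profile, segments=32):
    """Surface of revolution from a traced profile (hypothetical helper).

    profile: list of (r, z) points, r = distance from the rotation axis, z = height.
    Returns (vertices, quads); vertex i * segments + j is profile point i rotated
    by 2*pi*j/segments around the vertical (z) axis.
    """
    vertices = []
    for r, z in profile:
        for j in range(segments):
            theta = 2.0 * math.pi * j / segments
            vertices.append((r * math.cos(theta), r * math.sin(theta), z))
    quads = []
    for i in range(len(profile) - 1):
        for j in range(segments):
            k = (j + 1) % segments  # wrap around so the last strip closes the surface
            quads.append((i * segments + j, i * segments + k,
                          (i + 1) * segments + k, (i + 1) * segments + j))
    return vertices, quads


# Toy profile in mm: base, belly and rim points
vertices, quads = revolve_profile([(50.0, 0.0), (60.0, 40.0), (45.0, 80.0)], segments=16)
```

Each pair of consecutive profile points yields one ring of quads, so N profile points and S segments give N x S vertices and (N - 1) x S faces.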

Multilingual vocabularies
From the outset of the project, it was clear that a significant component of the ArchAIDE Reference Database would be the ability to contain multilingual vocabularies, to enable linguistic interoperability within the dataset being assembled. However, early in the discussions, project partners identified semantic difficulties when faced with the task of describing either a fragment or a whole vessel: for example, the same or a very similar physical shape may be referred to as a 'dish', 'plate' or 'platter'. Furthermore, it was necessary to establish a recording scheme that was consistent not only across languages, but also across archaeological traditions.
Archaeologists working in different countries may not only use different words but also varying hierarchies or classifications for how they categorise an object. As a hypothetical example, one tradition may have four different words to categorise a candlestick, while another tradition may simply have one.
These issues around semantics apply broadly across most areas of archaeological research, not only within the pottery specialism, so this solution is likely to be useful for a wide range of archaeological material.
The solution was the creation of a discrete set of vocabularies to account for and reconcile these differences (Figure 8). These formed the backbone of the ArchAIDE Reference Database, allowing different catalogues, and therefore different classification schemes, to be cross-searched and then presented by the ArchAIDE application. Following the precedent set by the EU Infrastructures ARIADNE project (Aloia et al. 2017), the vocabularies were built to create a resource with maximum interoperability for current initiatives and future projects. This was achieved by taking a Linked Data approach. The initial mapping exercise established a concordance between ArchAIDE and a neutral Linked Open Data (LOD) vocabulary. The creation of LOD for ArchAIDE allowed project data to be easily cross-searched as part of the wider Semantic Web, a format that lends itself to use through a range of methods and processes that use established web protocols and standards to interact with and retrieve data. Each of the ArchAIDE archaeological partners worked through their catalogues and identified the descriptive terms that refer to, or classify, significant elements of a ceramic form. Drawing on the methodology and tools developed for the ARIADNE project by the Hypermedia Research Group at the University of South Wales (Binding and Tudhope 2016), the partners judged that mapping these terms to a neutral spine was preferable to establishing a new thesaurus bespoke to the project. As with ARIADNE, the Getty Art and Architecture Thesaurus (AAT) proved a suitable candidate, with the added value of interoperability with ARIADNE and any other data also mapped to the AAT, thus giving the mapping work done by ArchAIDE a strong, sustainable base.
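The neutral-spine idea can be illustrated with a toy concordance. Each partner term is mapped to a spine concept, and each concept carries labels in several languages, so a query in one language retrieves types from every catalogue mapped to the same concept. The concept key spine:plate, the catalogue names and the types are placeholders, not real AAT identifiers or ArchAIDE data, and grouping 'dish', 'piatto' and 'platter' under a single concept is an illustrative choice, not a statement of how the partners actually mapped these terms.

```python
# Placeholder spine concept with multilingual labels (not a real AAT identifier)
SPINE = {
    "spine:plate": {"en": "plate", "it": "piatto", "es": "plato",
                    "ca": "plat", "fr": "assiette", "de": "Teller"},
}

# Hypothetical concordance: (catalogue, term used in that catalogue) -> spine concept
CONCORDANCE = {
    ("catalogue_A", "dish"): "spine:plate",
    ("catalogue_B", "piatto"): "spine:plate",
    ("catalogue_C", "platter"): "spine:plate",
}

# Hypothetical pottery types: (catalogue, type name, vessel-form term as written in the catalogue)
TYPES = [
    ("catalogue_A", "Type 3", "dish"),
    ("catalogue_B", "Genre 12", "piatto"),
    ("catalogue_C", "Form 7", "platter"),
    ("catalogue_A", "Type 9", "jug"),  # not mapped to the spine, so never retrieved
]


def cross_search(query, lang):
    """Return (catalogue, type) pairs whose form term maps to a concept labelled `query` in `lang`."""
    concepts = {cid for cid, labels in SPINE.items()
                if labels.get(lang, "").lower() == query.lower()}
    return [(cat, name) for cat, name, term in TYPES
            if CONCORDANCE.get((cat, term)) in concepts]


hits = cross_search("Teller", "de")  # a German query finds types from all three catalogues
```

Because each catalogue keeps its own terminology and only the concordance links it to the spine, adding a new catalogue requires a new mapping, not a change to the existing ones.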
In the initial phase of the project, all partners agreed on a subset of AAT terms that could be used for this neutral spine, describing the following methods of recording pottery from archaeological excavations:
The sherd type
The vessel form
The decoration type
The decoration colour
Later in the project, partners established a need for controlled terminologies to describe the type or characteristics of specific parts of a ceramic vessel. As an alternative, and as a concerted effort to ensure that ArchAIDE data was interoperable with past and future projects, the project used the concepts of recording established by the original creators of the Roman Amphorae: a digital resource.
The ArchAIDE multilingual vocabularies are now freely available for download and reuse from the

Development of Deep Learning Image Recognition Algorithms
The use of image recognition on archaeological material is a very new research area, but other work in this field has shown the potential of the approach for other types of archaeological material. This includes projects such as DADAISM, which focused on the identification of types of Palaeolithic stone tools (Power et al. 2017), and Arch-I-Scan, which is also working to create an easy-to-use, handheld interface for archaeological pottery recognition, but uses 3D scans rather than 2D images as the basis for its image recognition algorithms (Tyukin et al. 2018).
For ArchAIDE, two complementary machine-learning tools were developed to identify archaeological pottery: one relied on the shape of a sherd's profile, while the other was based on decorative features. For the shape-based identification tool, a novel deep learning methodology was employed, integrating shape information from points along the inner and outer surfaces. Fabric is also an important criterion for identification, but after considerable discussion a fabric-based tool was deemed less promising and more problematic than shape- and decoration-based tools. This was largely due to differences in lighting conditions across thin-sections, and the difficulty of deriving a sufficient amount of training data.
The decoration-based classifier was based on relatively standard methods used in image recognition. In both cases, training the classifiers presented real-world challenges specific (although not limited) to archaeological data. This included a paucity of exemplars of pottery that were well identified, an extreme imbalance between the number of exemplars across different types, and the need to include rare types or minute distinguishing features within types. The scarcity of training data for archaeological pottery required multiple solutions, the first of which was to initiate large-scale photo campaigns to increase these data.

Training data photo campaigns
Multiple photo campaigns were conducted across the life of the project to produce a complete dataset of images for all the ceramic classes under study. The aim of the photo campaigns was to provide enough images to train the algorithms for both the appearance-based (Majolica of Montelupo and Majolica from Barcelona) and the shape-based image recognition neural networks (Roman amphorae, Terra Sigillata Italica, Hispanica and South Gaulish). ArchAIDE partners led the collection of photos, involving researchers from beyond the consortium as well as running their own photo campaigns. All partners worked with colleagues, institutions, museums and excavations to acquire the required number of samples. As no single site held all the types, it was necessary to access multiple resources, involving more than 30 different institutions in Italy, Spain, and Austria.
To train the shape-based neural network, it was necessary to take diagnostic photos of sherd profiles, so detailed guidelines were prepared for use by the consortium partners and project associates. Finding, classifying, photographing and creating digital storage for the necessary sherds was very time-consuming, as images of at least 10 different sherds for every type were needed to provide enough training information for the algorithm. The ArchAIDE archaeological partners had to verify that the selected sherds were identified correctly, which on occasion also helped institutions (such as museums, research centres, etc.) correct mistakes in the classification of their assemblages. The first approach was to search for at least 10 sherds for each of the 574 different types and sub-types identified in the reference catalogues. It became apparent, however, that not every top-level type and sub-type could be represented. In some instances this was because the type was rare, or because sherds of different types were mixed together when stored, and it was very difficult to locate them. This is a challenge across all forms of pottery studies, not just for a digital application like ArchAIDE. Partners reorganised the search to focus on top-level types, which reduced the number of processed types from 574 to 223.
Overall, 3498 sherds were photographed for training the shape-based recognition model (Figure 9). This included 1311 sherds of Roman amphorae representing 61 different types, with more than 10 sherds photographed for each type. The Terra Sigillata types included 2187 sherds, representing 54 'top level' types, with more than 10 sherds photographed for each type (many with more than 30 sherds). In the case of the Majolica from Barcelona, the photo campaigns aimed not only to create a dataset to train the algorithm, but also to create the first structured catalogue for this type of pottery. For this reason, in order to manage the work economically, the choice of assemblages was crucial. First, attention was focused on green and manganese ware, as some members of the consortium were already carrying out a preliminary study of a large assemblage stored at the Archaeological

Technical description of training the image recognition classifiers
The creation of training data for both the shape- and decoration-based image recognition classifiers (algorithms) was ongoing throughout the ArchAIDE project, but as the decoration-based work proceeded much more quickly, it became the main focus of the image-based search and retrieval workflow developed by the Deep Learning Lab at Tel Aviv University. As such, the discussion that follows primarily concerns the decoration-based work; shape-based recognition will be discussed in more technical detail elsewhere. The output of each layer in ResNet represents the activation of a feature in a specific region of the image. To reduce the size of the feature activation vector, and to obtain more 'global' information on features regardless of their specific location in the image, the activation of each feature was averaged over the entire image ('average pooling') to produce the final feature vector for the classification system.
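Average pooling itself is a simple reduction per feature map. The pure-Python sketch below assumes that the activations of a layer are given as one 2D map per feature; it pools each map to its mean and concatenates the pooled values from several layers into a single descriptor. Extracting the activations from ResNet is outside its scope, and global_average_pool, image_descriptor and the toy activation values are hypothetical.

```python
def global_average_pool(feature_maps):
    """One value per feature: its mean activation over every spatial position."""
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def image_descriptor(layers):
    """Concatenate the pooled features of several layers into one vector."""
    descriptor = []
    for feature_maps in layers:
        descriptor.extend(global_average_pool(feature_maps))
    return descriptor


# Toy activations: layer 1 has two 2x2 feature maps, layer 2 has one 1x1 map
layer1 = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]
layer2 = [[[10]]]
descriptor = image_descriptor([layer1, layer2])  # [2.5, 2.0, 10.0]
# Moving the strong activation to another corner leaves the pooled value unchanged
moved = global_average_pool([[[8, 0], [0, 0]]])  # [2.0]
```

The descriptor length equals the total number of features across the pooled layers, independent of image size, which is what makes it usable as a fixed-length input to a classifier.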
To classify the feature vectors, an SVM classifier was used. As an SVM typically separates vectors into two classes, a One-Versus-Rest (OVR) technique was used to support multiple classes: one SVM classifier was trained per class, to decide whether a feature vector belongs to that class or not. The classes were then ranked by the confidence scores of their respective classifiers, to obtain an ordering of the different classes. Because the features from different layers of the network are concatenated, their ranges of values are likely to differ; each feature was therefore normalised separately to a mean of 0 and a standard deviation of 1 over the whole training dataset before being used in the SVM classification process.
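The OVR scheme, the per-feature standardisation and the confidence-based ranking can be sketched without external libraries. As a swapped-in simplification, each binary SVM here is a linear SVM with the squared hinge loss (the 'L2-SVM' variant), trained by plain full-batch gradient descent, with the bias handled as an extra constant feature; the text does not say which SVM formulation, kernel or solver ArchAIDE used. The two-dimensional toy data stand in for the much longer ResNet feature vectors, and all function names are hypothetical.

```python
def standardise(rows):
    """Scale each feature to mean 0, std 1 over the training rows; return (scaled, scale_fn)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    # Population std; a constant feature (std 0) is left unscaled to avoid division by zero
    stds = [(sum((r[k] - means[k]) ** 2 for r in rows) / n) ** 0.5 or 1.0 for k in range(d)]

    def scale(x):
        return [(x[k] - means[k]) / stds[k] for k in range(d)]

    return [scale(r) for r in rows], scale


def train_l2_svm(X, y, lam=0.01, lr=0.1, iters=1000):
    """Binary linear SVM (labels +1/-1) with squared hinge loss, by full-batch gradient descent.

    Minimises lam/2 * |w|^2 + mean(max(0, 1 - y * w.x)^2); the bias is a constant feature.
    """
    Xa = [list(x) + [1.0] for x in X]
    n, d = len(Xa), len(Xa[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [lam * wk for wk in w]
        for xi, yi in zip(Xa, y):
            slack = 1.0 - yi * sum(wk * xk for wk, xk in zip(w, xi))
            if slack > 0:
                for k in range(d):
                    grad[k] -= 2.0 * slack * yi * xi[k] / n
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def train_ovr(X, labels, **kwargs):
    """One-versus-rest: one binary SVM per class (that class +1, every other class -1)."""
    return {c: train_l2_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def rank_classes(models, x):
    """Order classes by decreasing SVM score, used here as the confidence."""
    xa = list(x) + [1.0]
    scores = {c: sum(wk * xk for wk, xk in zip(w, xa)) for c, w in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 2D 'feature vectors' for three well-separated classes
raw = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0, 10], [1, 10], [0, 11]]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
X, scale = standardise(raw)
models = train_ovr(X, labels)
ranking = rank_classes(models, scale([0.5, 10.5]))  # query point inside the C cluster
```

Returning a full ranking rather than a single label matches the workflow described above, in which the archaeologist reviews several candidate types and makes the final decision.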
The pottery training images were typically rectangular, and so in order to fit them to the neural networks they were scaled to 224 pixels (along the long axis) and cropped to the middle to make them square. However, this meant omitting certain parts of wide sherds that failed to fit the central square of the image.
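A minimal sketch of the resize-and-crop step follows. A central crop can only discard parts of a wide sherd if the shorter side is the one scaled to 224 pixels, so the sketch assumes that reading. Nearest-neighbour resizing is used only to keep the example dependency-free; a real pipeline would normally use an image library's interpolating resize.

```python
import numpy as np


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an (H, W, C) array."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols]


def scale_and_centre_crop(img, size=224):
    """Scale so the shorter side equals `size`, then keep the central size x size square."""
    h, w = img.shape[:2]
    scale = size / min(h, w)
    new_h, new_w = max(size, round(h * scale)), max(size, round(w * scale))
    scaled = resize_nearest(img, new_h, new_w)
    top, left = (new_h - size) // 2, (new_w - size) // 2
    return scaled[top:top + size, left:left + size]
```

For a 300 × 600 photograph, the image becomes 224 × 448 and the outer 112 columns on each side are dropped, which is the loss of information at the edges of wide sherds described above.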
To train the network to work with varying amounts of decoration/background, the original image dataset was enriched by adding augmented versions of each image. Each image was scaled to four different sizes (224 pixels and larger sizes).
On each scaled image, three orientations were created (unflipped, horizontally flipped and vertically flipped).
Finally, all the images were cropped, leaving the centre.
Thus, 12 training images (four sizes × three orientations) were created from each original image.
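The augmentation recipe can be sketched as below. The text only says "224 pixels and larger sizes", so the three larger scales in `SCALES` are hypothetical. Scaling up before a fixed 224-pixel central crop shows progressively less background and more decoration, which matches the stated aim.

```python
import numpy as np

SCALES = (224, 256, 320, 384)  # hypothetical: only 224 is given in the text
CROP = 224


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of an (H, W, C) array."""
    h, w = img.shape[:2]
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows[:, None], cols]


def scale_short_side(img, size):
    """Scale so that the shorter side equals `size`."""
    h, w = img.shape[:2]
    s = size / min(h, w)
    return resize_nearest(img, max(size, round(h * s)), max(size, round(w * s)))


def centre_crop(img, size=CROP):
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def augment(img):
    """4 scales x 3 orientations = 12 centre-cropped training images per photograph."""
    out = []
    for size in SCALES:
        scaled = scale_short_side(img, size)
        # unflipped, horizontally flipped (left-right), vertically flipped (up-down)
        for oriented in (scaled, scaled[:, ::-1], scaled[::-1, :]):
            out.append(centre_crop(oriented))
    return out
```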

The creation of the decoration-based training data
For appearance/decoration-based classification, work was mainly carried out on the Majolica of Montelupo. The collection of the dataset was led by the University of Pisa (UNIPI), using existing images (from archaeological excavations, PhD theses, etc.) and multiple photography campaigns carried out since 2016 (see Section 3.1). The classification of the images was based on that made by Berti (1997) and updated by Fornaciari (2016), with more than 80 decoration genres.
Most of the photos in the corpus were collected in dedicated photography campaigns in Pisa (at the Superintendency warehouse) and in Montelupo Fiorentino (at the local pottery museum). Although the photo campaigns were carried out in a warehouse containing the world's largest collection of Majolica of Montelupo sherds, not every genre was represented by a large number of sherds. For 19 genres, fewer than 10 potsherd images were obtained, and six of these were represented by only a single example.
The majority of the images were collected during the autumn of 2017, with more than 8000 sherds being photographed, covering 67 genres with more than 20 sherds, and many of them with more than launched to involve the world archaeological community, and obtain more images of Montelupo's

Complications around the creation of the training data
Additional findings reported by the partners using the model indicated multiple factors that affected the classification results to varying degrees, including noticeable differences when the lighting conditions and/or background colours were altered, further changes when different cameras were used, and even differences between pairs of images that looked quite similar. This indicated a lack of stability in the classification system, which had to be addressed if it was to be used successfully in the field.
The evaluation results reported accuracy metrics significantly lower than those obtained on the test set during training, indicating a potential difference between the domain of the dataset images and the domain of images acquired in the field. Given the effort already invested in creating the dataset, it was not possible to recapture the dataset images in 'more realistic' scenarios, so a way was needed to retrain the model using the existing data.
Of all the factors affecting the classification results, altering the lighting had the most impact and therefore had to be addressed first. Samples from the dataset showed that most images had been captured during the photo campaigns under similar lighting conditions, and no colour augmentation had been applied during training. The rationale for the lack of augmentation was the belief that features extracted using the robust ResNet model would already be sufficiently resilient to such changes in the image. The solution to the first problem (lack of robustness to lighting/colour changes) was to add automatic augmentation simulating different white balance results and various brightness and contrast adjustments. This was applied during the generation of the training dataset by multiplying the luminosity ('brightness') of all the pixels in each image by a random factor (between 0.8 and 1.2) drawn per image, to simulate different lighting conditions. To compensate for different white balances, a similar random multiplicative factor was applied to each channel: each of the Red/Green/Blue channels was multiplied by a separate random constant, changing the ratio between the colours in the image. To solve the second problem (the model learning details specific to a photo campaign, such as the background, rather than features of the sherd), the ideal solution was to remove the background, providing an isolated view of the relevant part of the picture (Figure 11). To separate the foreground from the background, the GrabCut algorithm for interactive foreground extraction (Rother et al. 2004) was integrated into the application used by the archaeologists. This enabled users to extract the foreground reliably from most images with a few simple clicks.
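The colour augmentation can be sketched in a few lines of NumPy. The 0.8–1.2 brightness range comes from the text. The per-channel range (0.9–1.1) is an assumption, since no range is given. Brightness is approximated here by scaling all RGB values, and the contrast adjustments mentioned above are not shown. The sketch assumes float images in [0, 1].

```python
import numpy as np


def colour_augment(img, rng, brightness=(0.8, 1.2), channel_gain=(0.9, 1.1)):
    """Simulate lighting and white-balance variation on a float RGB image in [0, 1].

    One brightness factor is drawn per image; one gain per R/G/B channel is drawn
    separately, which changes the ratio between the colours.
    """
    b = rng.uniform(*brightness)
    gains = rng.uniform(*channel_gain, size=3)
    # Broadcasting multiplies every pixel by b and each channel by its own gain.
    return np.clip(img * b * gains, 0.0, 1.0)


rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 0.5)
augmented = colour_augment(image, rng)
```

Drawing a fresh factor for every generated image means the same sherd appears under many simulated lighting and white-balance conditions.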

The training data workflow
While interactive segmentation was sufficient for images captured in the app, the training dataset contained thousands of images whose backgrounds had to be removed in order to fix the training process. To avoid the human labour required to segment them manually, an automated solution was developed that benefits from the same phenomenon that originally hampered the training: the homogeneous capture process. As can be seen in Figure 10, the images have a mostly uniform background, a potsherd and a ruler. For these images, a heuristic technique was devised to provide the input for the GrabCut algorithm automatically, instead of requiring manual input.
First, colour samples of the background were obtained by sampling pixels around the border of the image, relying on the fact that both the ruler and the potsherd are centred in the image and do not touch its edges. The distance of each pixel in the image from the nearest sampled background colour was then measured, and a threshold operation was applied to obtain two large connected regions corresponding to the ruler and the potsherd. To make sure the image had indeed been segmented correctly, it was necessary to verify that one of these regions was a ruler.
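The background-sampling heuristic can be sketched with NumPy and a breadth-first search for connected regions. The border width, the Euclidean RGB distance and the threshold value are all hypothetical choices, and the sketch compares every pixel against every sampled colour, which is fine for small images but would need subsampling for full-resolution photographs.

```python
from collections import deque

import numpy as np


def border_samples(img, width=5):
    """Distinct colours in a band around the image edge (assumed to be background)."""
    bands = [img[:width], img[-width:], img[:, :width], img[:, -width:]]
    return np.unique(np.vstack([b.reshape(-1, 3) for b in bands]), axis=0)


def distance_to_background(img, samples):
    """Per-pixel Euclidean distance to the nearest sampled background colour."""
    flat = img.reshape(-1, 1, 3).astype(float)
    d = np.linalg.norm(flat - samples[None, :, :].astype(float), axis=2).min(axis=1)
    return d.reshape(img.shape[:2])


def connected_regions(mask):
    """4-connected regions of a boolean mask, each as a list of (row, col) pixels."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    regions = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, pixels = deque([(sy, sx)]), []
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        regions.append(pixels)
    return regions


def two_largest_regions(img, threshold=30.0):
    """Candidate ruler and potsherd: the two largest non-background regions."""
    mask = distance_to_background(img, border_samples(img)) > threshold
    return sorted(connected_regions(mask), key=len, reverse=True)[:2]


# Synthetic photograph: grey background, a brown 'sherd' and a dark 'ruler'.
img = np.full((60, 80, 3), 200, dtype=np.uint8)
img[10:30, 10:30] = (120, 60, 30)
img[35:50, 40:75] = (20, 20, 20)
regions = two_largest_regions(img)
```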
As the rulers have a checkerboard pattern, corner detection algorithms produce strong activation scores at the corners of each square. Harris corner detection was applied, and at least five corners had to be present after discarding detections near the edges of the region. To distinguish the ruler from the potsherd (which may exhibit details with 'corners'), a 'checkerboard error' was computed at the detected corners: at a checkerboard corner, each image patch should be roughly identical to its own rotation by 180 degrees. By computing the geometric mean of the differences between the patches around the corners and their rotations, it was possible to rank how likely each region was to have the checkerboard pattern, and therefore to be the ruler. Finally, once the region containing the potsherd had been identified, GrabCut was applied to obtain a finer segmentation.
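The checkerboard test can be sketched as follows. Corner detection itself is not shown; the sketch assumes corner coordinates have already been produced by a Harris detector (for example OpenCV's `cv2.cornerHarris`) and filtered away from the region edges. The patch half-width and the small epsilon that keeps the logarithm finite are hypothetical choices.

```python
import numpy as np


def rotation_error(gray, y, x, half=4):
    """Mean absolute difference between the patch around corner (y, x) and its 180-degree rotation."""
    patch = gray[y - half:y + half, x - half:x + half].astype(float)
    return np.abs(patch - np.rot90(patch, 2)).mean()


def checkerboard_error(gray, corners, half=4, min_corners=5):
    """Geometric mean of the per-corner rotation errors; lower means more ruler-like.

    Regions with fewer than `min_corners` corners are rejected outright (infinite error).
    """
    if len(corners) < min_corners:
        return np.inf
    errs = np.array([rotation_error(gray, y, x, half) for y, x in corners])
    return float(np.exp(np.log(errs + 1e-6).mean()))


# Synthetic test: an 8-pixel checkerboard versus random texture, same corner positions.
yy, xx = np.mgrid[0:64, 0:64]
board = ((yy // 8 + xx // 8) % 2) * 255.0
noise = np.random.default_rng(0).integers(0, 256, size=(64, 64)).astype(float)
corners = [(8, 8), (8, 16), (16, 8), (16, 16), (24, 24)]
board_err = checkerboard_error(board, corners)
noise_err = checkerboard_error(noise, corners)
```

The geometric mean penalises a region if even a few of its corners are clearly asymmetric, which helps reject potsherd details that happen to look like corners.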
After obtaining the processed training dataset, which was much larger than the original owing to the augmentations, the SVM classifier was replaced by a neural network. The network contained two dropout layers, which encourage the model to learn a more robust representation and reduce the risk of overfitting the training data. Furthermore, in order to boost the application performance
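A forward-pass sketch of a classifier head with two dropout layers is shown below. The text gives only the number of dropout layers. Their placement, the layer sizes, the dropout rate and the ReLU/softmax choices are all hypothetical, and training code is omitted.

```python
import numpy as np


def dropout(x, rate, rng, training):
    """Inverted dropout: during training, zero a fraction `rate` of the units and
    rescale the survivors so the expected activation is unchanged; identity otherwise."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classifier_head(features, params, rng, training=False, rate=0.5):
    """features -> dropout -> dense + ReLU -> dropout -> dense -> class probabilities."""
    h = dropout(features, rate, rng, training)
    h = np.maximum(0.0, h @ params["W1"] + params["b1"])
    h = dropout(h, rate, rng, training)
    return softmax(h @ params["W2"] + params["b2"])


# Hypothetical sizes: 768 pooled features, 128 hidden units, 10 classes.
rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(0, 0.05, (768, 128)), "b1": np.zeros(128),
    "W2": rng.normal(0, 0.05, (128, 10)), "b2": np.zeros(10),
}
batch = rng.random((2, 768))
probs = classifier_head(batch, params, rng)  # evaluation mode: dropout disabled
```

Dropout is active only while training. At prediction time the head is deterministic, which keeps the app's results repeatable.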

Development of ArchAIDE Application and Desktop Comparative Database User Interfaces
In order to create the appropriate software for the ArchAIDE mobile app and desktop applications, partners analysed archaeologists' existing workflows, from excavation to documentation, including the context in which the activities were carried out and the related constraints. Feedback on how elements of these workflows might be automated, and on the research implications of this automation, was also extensively discussed and implemented where possible. ArchAIDE partners worked to respond to the variety of needs identified by potential users, who may want to use the application simply as a recognition tool or more comprehensively for collecting and storing data in digital assemblages (Figure 12).

The ArchAIDE mobile application
The development of the mobile application was carried out by Inera in collaboration with ArchAIDE partners at CNR-ISTI and Tel Aviv University. In particular, CNR-ISTI provided the support required to integrate the image manipulation libraries used to prepare the images for the shape and decoration recognition tools, and Tel Aviv University supported the Inera team in installing and integrating the deep-learning models used by the prediction functions.
The ArchAIDE mobile application was designed to be used in many contexts, including places where archaeologists often work but connectivity may not be available, such as storehouses or remote rural areas. In such conditions, use of the app may be limited to acquiring materials (i.e. creating new sherds from pictures taken with the device) or browsing the Reference Database to check the characteristics of a specific ceramic type, decoration or stamp (Figure 14). When opened for the first time, the mobile app initialises its local database with all the data from the Reference Database, excluding the images. Users are not required to register to access the Reference Database or the automatic classification tools, so anyone can use them; registration is required only for users who wish to store information about their own sherds.
Registered users can save a variety of information about their own pottery (e.g. classification information obtained from the automatic classification tools), and access information about their own sites, assemblages and sherds (a sherd belongs to an assemblage that comes from a site). This information is stored in the local memory of the device and, when the device is online, on the ArchAIDE server: data recorded offline is kept locally and saved to the server once a connection is available. The synchronisation process between the device and the server is bi-directional. The main features of the ArchAIDE mobile app are:
Knowledge-base tools. Search and retrieval tools to access the type, decoration and stamp catalogues managed in the Reference Database.
Search and classification tools. The collection of tools for the automatic and supported classification of sherds: image recognition, shape recognition, stamp classification.
My sites. Where registered users create and save information about their own sites and assemblages.
All tools are publicly available for evaluation and experimentation. The mobile app is available on the App Store and Google Play.
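The offline-first, bi-directional synchronisation described above can be illustrated with a small sketch. The app's actual merge policy is not described. This example assumes a simple last-write-wins rule keyed on a per-record modification timestamp, and the `Record` and `Store` types are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    data: dict
    modified: float  # time of the last edit (hypothetical versioning field)


@dataclass
class Store:
    records: dict = field(default_factory=dict)  # record id -> Record


def sync(device: Store, server: Store) -> None:
    """Bi-directional merge: for every record id, the most recently modified copy
    is written to both the device and the server (last-write-wins)."""
    for rid in set(device.records) | set(server.records):
        copies = [r for r in (device.records.get(rid), server.records.get(rid)) if r is not None]
        newest = max(copies, key=lambda r: r.modified)
        device.records[rid] = newest
        server.records[rid] = newest


# A sherd classified offline on the device, and a sherd added to the server from elsewhere.
device = Store({"sherd-1": Record({"type": "A"}, modified=5.0)})
server = Store({"sherd-1": Record({"type": "unclassified"}, modified=3.0),
                "sherd-2": Record({"type": "unclassified"}, modified=4.0)})
sync(device, server)
```

Last-write-wins is the simplest possible policy. A production system may instead need conflict detection when the same sherd is edited on two devices.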

The ArchAIDE desktop application
The ArchAIDE Desktop website is a hybrid application mixing web content and interactive tools. The implementation is based on a Content Management System, but also provides a robust and flexible extension mechanism to incorporate plug-in interactive tools. Its main features are:
Knowledge-base tools. Search and retrieval tools to access the type, decoration and stamp catalogues managed in the Reference Database.
Data visualisation tools. A collection of interactive tools for the visualisation of raw data and statistical data derived from the ArchAIDE database.
Search and classification tools. A collection of tools for the automated and supported classification of sherds: image recognition, shape recognition, stamp classification.
My sites. Where registered users create and save information about their own sites and assemblages.
All tools are publicly available for evaluation and experimentation via the ArchAIDE Desktop.

Data Analysis
The creation of the comparative data within the ArchAIDE Reference Database also allowed statistical relationships between variables in the data to be explored, as part of the work carried out by the Mappa Lab at the University of Pisa. Statistical techniques were used to explore and summarise the main characteristics of the data and to identify outliers, trends or patterns. Specifically, network analysis was used to identify significant temporal breaks in the data. The network structure was created by linking locations where ceramics were produced to locations where the same ceramic type was retrieved. This resulted in 3853 location vertices throughout Europe, the Middle East and North Africa. The structure included 16,820 different edges, joining together 322,764 different data points.
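The text does not spell out how data points map onto edges. The sketch below assumes that each data point is one (production location, find location) occurrence, and that an edge aggregates all data points sharing the same pair, with the count as the edge weight. The location codes are hypothetical.

```python
from collections import Counter


def build_network(occurrences):
    """Aggregate (production_location, find_location) data points into a weighted,
    directed network: one vertex per location, one edge per distinct pair, with the
    edge weight counting how many data points share that pair."""
    edges = Counter(occurrences)
    vertices = {location for pair in edges for location in pair}
    return vertices, edges


# Hypothetical data points: "P" = production centre, "F" = find site.
points = [("P1", "F1"), ("P1", "F1"), ("P1", "F2"), ("P2", "F2"), ("P2", "P1")]
vertices, edges = build_network(points)
print(len(points), len(edges), len(vertices))  # 5 4 4
```

Under this reading, many data points collapse into far fewer weighted edges, consistent with the figures reported above.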
Network analysis allowed the identification of communities within the network, i.e. groups of vertices densely connected internally but poorly connected externally.
Such communities may represent commercial routes adopted by producers, or routes established for geographical or historical reasons. Temporal breaks were identified by an algorithm that minimises the variance within intervals while maximising the variance between them. The production and supply of ceramics formed a coherent context only within certain temporal intervals, making it possible to distinguish four main periods, characterised by different production centres emerging and declining in the different phases (Italian, South-Gaulish and Rhine productions) and showing different production dynamics.
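The algorithm used for the temporal breaks is not named. One standard method matching the description is Fisher/Jenks natural breaks. Because the total sum of squares is fixed, minimising the within-interval sum of squares is equivalent to maximising the between-interval sum of squares. Below is a minimal dynamic-programming sketch, assuming the input is a one-dimensional list of dates; the example values are hypothetical.

```python
import numpy as np


def natural_breaks(values, k):
    """Partition sorted 1-D values into k contiguous intervals that minimise the total
    within-interval sum of squared deviations (Fisher/Jenks dynamic programming).

    Returns the sorted values and the indices at which intervals 2..k start.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])       # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])   # prefix sums of squares

    def sse(i, j):
        """Sum of squared deviations of x[i:j] around its own mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # cost[c, j]: best cost of splitting x[:j] into c intervals.
    cost = np.full((k + 1, n + 1), np.inf)
    start = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1, i] + sse(i, j)
                if candidate < cost[c, j]:
                    cost[c, j], start[c, j] = candidate, i

    # Walk back from the full array to recover where each interval starts.
    breaks, j = [], n
    for c in range(k, 1, -1):
        j = start[c, j]
        breaks.append(int(j))
    return x, breaks[::-1]


dates = [10, 20, 30, 150, 160, 170, 300, 310]  # hypothetical dates
x, breaks = natural_breaks(dates, 3)
print(breaks)  # [3, 6]
```

The result splits the dates into three periods beginning at 10, 150 and 300. The cubic loop is adequate for a few hundred values; larger inputs would call for a faster implementation.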
The main area of interest was the network, described and handled as a mathematical graph, obtained by linking locations where ceramics were produced to locations where the same ceramics were retrieved. This allowed the application of network theory techniques, specifically link analysis, classification and clustering techniques.
Roughly speaking, communities, or clusters, are groups of vertices with a higher probability of being connected to each other than to members of other groups; this can be computed and checked in terms of network links. Identifying significant communities in the network draws attention to the main 'import-export' systems and their dynamics. Analysing pottery from a spatial point of view allows a better understanding of the economic connections underlying traffic flows.
This analysis is very valuable for archaeologists, as it can provide information about aspects of economics and supply. For example, distribution maps show the areas where a particular ceramic type was in use; when a distribution map is associated with an area of production or origin, it represents the supply movement of pottery. Even where it is possible to describe the correlation between origin and destination (occurrence) to indicate a possible trade route, and so better understand the overall mechanism of distribution, it is better to analyse data on a larger scale than a single site can represent. In this way, as evidence grows, it becomes possible to create maps showing pottery supply and distribution in the region under investigation.
Where quantitative information is attached to points (e.g. the number of items at a site), more complex distribution maps can be created, but this must be done carefully, as in many cases there is no information about the size of an assemblage, so some sites may be over-represented. Moreover, by working on the variation of the assemblages over time, it was possible to relate origins and occurrences in order to visualise how the main routes of commercial exchange varied over the centuries. On the basis of the disparities exhibited by the ceramic chronologies, it was also possible to identify temporal intervals illustrating different network behaviours, and then to analyse these intervals separately. These analyses show the dynamics of the major production sites and main export areas, the increase and decrease of production, and the spheres of influence of the major production centres over time.
We focused on the following tools:
Classification and Clustering techniques, used to understand whether some features of the data support a useful classification into categories/groups, subsequently suggesting meaningful interpretations of such categories.
Dimensionality Reduction techniques, used to extract specific combinations of features that capture the greatest amount of information and variability in the data. Together, these combinations provide a way to summarise the data and to identify the major sources of variability.
Spatial statistical methods and related predictive modelling, applied directly within a GIS (geographic information system) module. These tools were used primarily to highlight possible patterns in the spatial distribution of the data, and to suggest where to look for more data and information, or optimal strategies for testing.
Particular clustering algorithms were then applied to the graph obtained from the network, and two different alternatives were chosen. In the first algorithm (Newman 2006), communities are detected by maximising a benefit function known as 'modularity' over possible divisions of a network. In this algorithm, termed leading eigenvector clustering, the maximisation is solved on the basis of the eigenspectrum of the modularity matrix. The second algorithm (Pons and Latapy 2006), termed walktrap clustering, follows a completely different approach based on random walks on the graph. Random walks tend to stay within the same community because only a few edges lead outside a given community; in this way, the walktrap algorithm captures the community structure of a network. The two algorithms gave very similar results, indicating that the communities identified were well defined and did not depend on the choice of clustering algorithm.
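The first step of Newman's (2006) leading eigenvector method can be sketched in NumPy. The method builds the modularity matrix B_ij = A_ij − k_i k_j / 2m, takes the eigenvector of its largest eigenvalue, and splits the vertices by sign. This example performs a single bisection of an unweighted, undirected toy graph only. The full method repeats the split recursively while modularity keeps increasing, and library implementations (e.g. igraph's `community_leading_eigenvector`) handle those details. Walktrap is not sketched here.

```python
import numpy as np


def modularity_matrix(A):
    """B_ij = A_ij - k_i k_j / (2m) for a symmetric adjacency matrix A."""
    k = A.sum(axis=1)
    return A - np.outer(k, k) / k.sum()


def leading_eigenvector_split(A):
    """One bisection: split vertices by the sign of B's leading eigenvector."""
    B = modularity_matrix(A)
    vals, vecs = np.linalg.eigh(B)          # B is symmetric; eigenvalues ascending
    leading = vecs[:, np.argmax(vals)]
    s = np.where(leading >= 0, 1, -1)
    Q = s @ B @ s / (2 * A.sum())           # Q = s^T B s / 4m, since A.sum() = 2m
    return s, Q


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
s, Q = leading_eigenvector_split(A)
```

For this graph, the split separates the two triangles with modularity Q = 5/14 ≈ 0.357.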
After the clustering algorithm had been applied, an additional attribute indicating the community was added to each vertex of the graph. This attribute was also given a colour, so that it could be easily visualised. For visualisation, the four largest communities (in terms of number of vertices) were prioritised; every other vertex and edge was assigned to an additional (poor) community, made up of the vertices and edges not belonging to the four main communities identified by the clustering. Vertex colours represent the communities identified by clustering.
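The colouring rule can be expressed as a small function. The palette and the community labels are hypothetical. The rule itself follows the text: the four largest communities by number of vertices keep their own colour, and everything else goes into one residual group.

```python
from collections import Counter

PALETTE = ("red", "blue", "green", "orange")  # hypothetical colours for the four main communities
OTHER = "grey"                                  # residual ('poor') community


def community_colours(membership):
    """Map each vertex to a display colour: the four largest communities (by number
    of vertices) get their own colour, every other vertex falls into the residual group."""
    sizes = Counter(membership.values())
    main = [community for community, _ in sizes.most_common(len(PALETTE))]
    colour_of = {community: PALETTE[i] for i, community in enumerate(main)}
    return {vertex: colour_of.get(community, OTHER) for vertex, community in membership.items()}


membership = {}
for community, size in (("c1", 5), ("c2", 4), ("c3", 3), ("c4", 2), ("c5", 1)):
    membership.update({f"{community}-v{i}": community for i in range(size)})
colours = community_colours(membership)
```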
Another important feature of networks is the relative importance of the vertices: which vertices are more important and central in the network, and why? Here, one measure of importance is the out-degree, i.e. the quantity of ceramics 'exported' from a specific location, and another is the in-degree, i.e. the quantity of ceramics 'imported' into a specific location. These two measures identify places that produced or imported many ceramics, but networks often have complex structures, so more refined measures of importance were derived. One of the most useful and effective is PageRank (Brin and Page 1998).
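Weighted in-degree, out-degree and PageRank can be sketched with NumPy power iteration. The damping factor of 0.85 is the conventional default rather than a value given in the text. Edge direction also matters: with edges running from production site to find site, PageRank favours locations that receive ceramics from well-connected sources, while applying it to `W.T` would favour exporters instead. The text does not say which orientation the project used.

```python
import numpy as np


def pagerank(W, damping=0.85, tol=1e-12, max_iter=1000):
    """Weighted PageRank by power iteration.

    W[i, j] is the weight of the directed edge i -> j. Vertices with no outgoing
    weight (dangling vertices) redistribute their score uniformly.
    """
    n = W.shape[0]
    out = W.sum(axis=1)
    safe_out = np.where(out > 0, out, 1.0)
    P = np.where(out[:, None] > 0, W / safe_out[:, None], 1.0 / n)  # row-stochastic
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = (1.0 - damping) / n + damping * (r @ P)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


# Hypothetical three-location network: 0 exports to 1 and 2, 1 exports to 2.
W = np.array([[0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
out_degree = W.sum(axis=1)  # quantity 'exported' from each location
in_degree = W.sum(axis=0)   # quantity 'imported' into each location
scores = pagerank(W)
```

In this toy network, locations 1 and 2 have the same in-degree, but PageRank ranks location 2 higher because it also receives ceramics from location 1, which is itself supplied by location 0.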

Feedback and Testing
Given the high degree of innovation expected from the project, the partners decided from the beginning to involve the archaeological community in the development of the ArchAIDE system, in order to create a tool that responded to user needs. For this reason, five 'Training Open Days' and two 'Multiplier Events' were held to engage the project's stakeholder communities effectively, gathering their feedback, allowing users to test the functionality of the ArchAIDE tools and providing information about technical specifications. These events were one-day seminars intended to create interaction between prospective ArchAIDE users and the project team before the expected results were completed. They also served as a first opportunity for users with different archaeological backgrounds and interests to communicate with each other and to understand shared issues.

Training and feedback
Entrepreneurs and users from professional associations also shared their perspectives with researchers about the needs of the labour market, while students and PhD candidates in archaeology or related areas had the opportunity to explore the versatility of a tool also designed to help archaeology students learn how to classify pottery. The 'Training Open Days' were held in Italy (Spoletino and Pisa), the UK (Brighton) and Germany (Bonn), whereas the Multiplier Events were held in the UK and Spain.
The first multiplier event was organised on 7 December 2017 by the ADS at the University of York. It took the form of a discussion workshop explaining the aims and activities of the project and showing the first release of the app using appearance-based recognition (Figure 16). As this was near the mid-point of the project, it was a key opportunity to collect feedback while there was a tangible output to demonstrate, but with enough time left to make changes based on that feedback. Around 25 professional and academic archaeologists attended, resulting in a very fruitful discussion around different issues, and significant feedback was collected.
Figure 16: Video of feedback from archaeologists participating in ArchAIDE workshops around Europe (3 mins 37 seconds). Taken from ArchAIDE consortium (2019)
On 3 December 2018, the University of Pisa organised the second multiplier event, attended by 29 specialists from museums, research institutions and professional practice (Figure 17). During the event, participants could test the ArchAIDE app, both the image recognition tool with Montelupo pottery and the new shape recognition tool with Terra Sigillata Italica, thanks to the Archaeological Archive of the Museu d'Història de Barcelona, whose staff made its pottery assemblages available for the event.
The specialists were able to test the ArchAIDE app over two hours, and their feedback led to further refinement. This approach demonstrated that ArchAIDE does not seek to replace pottery specialists, but to provide a useful tool that automates only time-consuming and repetitive activities, leaving archaeologists to confirm the identification. This created a generally open-minded and positive attitude towards ArchAIDE, although some scepticism persisted among senior archaeologists.

Testing
The performance of the ArchAIDE tools was tested on both desktop and mobile devices. The performance checks were based on the following parameters:
Percentage of cases where the correct type was among the five outputs returned by the app.
Percentage of cases where the correct type appeared as the first of the five outputs returned by the app.
Frequency of outputs shown, taken independently of the input types.
Number of comparative photos available for each type.
Difference between consecutive scores shown by the app results.
Frequency of mistakes.
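Most of these parameters can be computed from a log of test queries, as in the sketch below. The input format is hypothetical. "Frequency of mistakes" is read here as a tally of (true type, wrong first output) pairs, which is one possible interpretation. The number of comparative photos per type comes from the reference data rather than from test results, so it is not computed.

```python
from collections import Counter


def evaluate(results):
    """results: list of (true_type, ranked_types, scores), one per test query, where
    ranked_types holds the five types returned by the app (best first) and scores
    their confidence values in the same order."""
    n = len(results)
    return {
        # correct type anywhere among the five outputs
        "top5": sum(true in ranked[:5] for true, ranked, _ in results) / n,
        # correct type returned as the first output
        "top1": sum(ranked[0] == true for true, ranked, _ in results) / n,
        # how often each type is shown, regardless of the input type
        "output_freq": Counter(t for _, ranked, _ in results for t in ranked[:5]),
        # differences between consecutive scores for each query
        "score_gaps": [[a - b for a, b in zip(s, s[1:])] for _, _, s in results],
        # (true, predicted) pairs for wrong first outputs
        "mistakes": Counter((true, ranked[0]) for true, ranked, _ in results if ranked[0] != true),
    }


results = [
    ("A", ["A", "B", "C", "D", "E"], [0.90, 0.50, 0.30, 0.20, 0.10]),
    ("B", ["A", "B", "C", "D", "E"], [0.60, 0.55, 0.30, 0.20, 0.10]),
    ("C", ["A", "B", "D", "E", "F"], [0.70, 0.40, 0.30, 0.20, 0.10]),
]
report = evaluate(results)
```

A type that dominates `output_freq` regardless of the input, as "A" does here, is a typical symptom of a biased classifier. A small gap between the first two scores signals a low-confidence answer.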
These parameters were computed after testing the app numerous times on multiple different sherds, by following a randomisation procedure. The testing procedure allowed the identification of critical functions of the app, particularly during the first development phase, and helped to identify the main sources of errors and areas for improvement.
One of the main tasks of the project was devoted to creating two testing scenarios for different user groups. The first concerned small and medium-sized enterprises (SMEs) involved in contract archaeology. These users are heavily constrained by constant digging activity, handle large volumes of material and have short timeframes for their research. The second type of user is Higher Education Institutions (HEIs) and research centres. These users may face restrictions similar to those of archaeological SMEs, especially during fieldwork seasons abroad or in remote/isolated areas. In many circumstances, however, academic users have facilities (research buildings) with suitable equipment for post-excavation work. Moreover, the second user group is not constrained by constant digging activity, as fieldwork is only scheduled during certain periods of the year. The objective was therefore to test the app in these two different scenarios.
Figure 18: One of the many testing events carried out in Spain, Italy and Germany
The goal of the testing events was to assess the design of the mobile and desktop apps for the acquisition and identification of sherds, and of the search and retrieval component that automatically matches and classifies a sherd against the digital comparative collections produced during the project (Figure 18). For archaeological SMEs, the tests were carried out in the field or during post-excavation, while in the HEIs testing took place in their facilities with a selection of well-studied exemplars. Testing was conducted on large numbers of specimens that were automatically classified according to typology (shape recognition) and decoration (image recognition, Figure 19).

The ArchAIDE Archive
While the mobile and desktop apps were designed as part of a proof of concept, the ArchAIDE project was committed to creating sustainable outputs where the project held copyright (ArchAIDE consortium 2019). This included making the interoperable, multilingual vocabularies available (Figure 20). The archive also includes the 2D and 3D models created by CNR-ISTI from the archive Roman Amphorae: a digital resource (University of Southampton 2014), which were used to train the ArchAIDE deep learning algorithm for shape-based image recognition. This aspect of the archive is a particularly good example of best-practice reuse. Roman Amphorae: a digital resource was first deposited with the ADS in 2005 and received a small update in 2014. As this archive is a comparative collection that allows users to identify amphora types, rather than the output of a particular archaeological project, it is widely used around the world. In fact, of the more than 1000 'data rich' archives held by the ADS, it is the single most popular resource, consistently receiving an average of 30,000 page-views every month. As such, the archive has become an authoritative resource, showing the potential of digital comparative collections when made freely and openly available.
When David Williams and Simon Keay of the University of Southampton first deposited the archive with the ADS in 2005, using it to create 2D and 3D models that could generate 'virtual sherds' for training a deep learning algorithm could not have been envisioned. As 2D and 3D models were created for every amphora type and variant in Roman Amphorae, it was possible to link the two archives, amplifying the usefulness of both. With the kind permission of Williams and Keay, when users now access the Roman Amphorae archive and choose an amphora type to explore, they find, in addition to the characteristics, pictures, drawings, petrology, etc., a link to the 3D models, marked with the ArchAIDE logo to indicate a linked resource (Figure 21). Clicking through to the model for the desired type takes users directly to the page for that type in the ArchAIDE archive, which includes 2D vector drawings in SVG format for download, and 3D models for interactive use within the 3D viewer (created using 3DHOP). The models can be manipulated and measured in a variety of ways (Figure 22). The 3D models are in OBJ format and can also be downloaded for use with 3D software and for 3D printing (Figure 23). Finally, the ArchAIDE archive contains the video corpus created both to document and to promote the project. Several videos from the archive are embedded within this article and focus on the project in general, but the archive contains many more, including individual interviews with partners and self-shot partner profiles that were later professionally edited by the University of Pisa.
The archive also includes the 30-minute ArchAIDE Documentary, created from footage shot over the course of the project. The ArchAIDE video archive is not only a unique record of the project, but also an unusual record of the experience of implementing a European Commission-funded project with partners working across several countries.

Conclusion and Future Work
The ArchAIDE project was ambitious and wide-ranging, but accomplished the majority of its aims over three short years. Much was learned not only about the use of image recognition to identify archaeological pottery, but also about the importance of access to comparative collections and of intellectual property rights. ArchAIDE partners worked hard to develop innovative technologies while ensuring that the knowledge and workflows of archaeological pottery specialists were respected. By being as outward-facing as possible, both through the inclusion of Associate Partners and through a comprehensive programme of communication and dissemination, the project made contacts across the archaeological community with people who wanted to offer ongoing support and to continue contributing, which has become a challenge now that the project has formally ended. Along the way, several unexpected lessons were learned.
As the project progressed, it became evident that the comparative data necessary to implement the ArchAIDE database and app must be derived from a variety of sources, each with different advantages and restrictions. Comparative data (data that is meant to show typical pottery types and characteristics, against which pottery to be identified by the user is compared) is often useful only if it is considered authoritative. For instance, an analogue equivalent of the Roman Amphorae archive might be a particular comparative paper catalogue for Majolica of Montelupo that specialists in that pottery type accept as a required citation in any peer-reviewed paper, and that is therefore also considered authoritative. However, while Roman Amphorae is free (as long as users abide by the appropriate terms and conditions of use), this is not the case for such a paper catalogue, for which conversion into a dynamic digital resource was never envisioned.
While CNR developed useful tools to help digitise the authoritative paper catalogues needed to demonstrate the technical proof of concept of the ArchAIDE app, this does not mean that the ArchAIDE project now necessarily holds copyright to the newly digitised, remixed data (although the metadata created by the ArchAIDE project as part of this process can be argued to be new data, for which the project can claim copyright). Whether these data can be made available outside the proof of concept would need to be negotiated with each copyright holder, which represents a major logistical and (potentially) financial difficulty. This would become even more complicated if the ArchAIDE app were monetised in any way.
The issue cannot of course be solved by ArchAIDE, but instead provides another important proof of concept opportunity by the project. Showing the potential of digitising paper catalogues in a way that demonstrates how their content can be actively reused allows ArchAIDE to open a discussion with publishers and other data providers about the importance of making their resources available in new ways, with a tangible benefit (seeing their data in use within the app), thus furthering the long-term discourse around making research data open and accessible.
Another key lesson was the amount of training data necessary for the image recognition algorithm to return useful results. The researchers at Tel Aviv University were used to working with thousands of examples to train for a particular type, but ArchAIDE partners often struggled to obtain 10 per type. Ten per type was not ideal, but was determined to be the minimum amount that had to be acquired. This resulted in far more time and effort spent on digitising the paper catalogues and undertaking the enormous photo campaigns to capture the necessary primary data. This effort helped partners to understand the importance of working together if the humanities wish to take advantage of the many machine-learning methods now available. Datasets are small, fragmented, and rarely optimised for machine-learning applications.
It was also hoped the photo campaigns might result in new comparative collections that could be made freely available as part of the ArchAIDE archive, but intellectual property rights in many European countries are restrictive and did not allow photos taken by ArchAIDE partners of sherds held in national and regional collections to be made available. It is hoped that seeing the usefulness of these data within an example application such as ArchAIDE may also help convince the holders of these resources to move towards more open data policies.
Figure 24: To address the paucity of training data available, 3D models were broken into 'virtual sherds' and used to train the shape-based algorithm
One of the most fundamental lessons was that it was not possible to design a single image recognition system that could identify pottery using both decoration-based and shape-based characteristics. It took considerable effort and discussion, but it became clear that they had to be separated into two different algorithms. This allowed a creative outcome, as separating out shape-based recognition allowed the 3D models to be used to create desperately needed training data. By breaking the 3D models into 'virtual sherds' and using these to train the shape-based image recognition algorithm, the accuracy rate was increased to an acceptable level (Figure 24).
ArchAIDE has shown the potential of using automated image recognition to identify archaeological pottery, while also illustrating some of the challenges that will need to be addressed in future. ArchAIDE has also shown it may be used for a variety of pottery types if the necessary comparative data can be gathered (and potentially other artefact types as well), as virtually all pottery identification relies on recognition based on either the shape or decorative elements of a vessel (or both