New developments in drone-based automated surface survey: Towards a functional and effective survey system

This paper presents new developments in drone-based automated survey for the detection of individual items or fragments of material culture visible on the ground surface. Since the publication of our original proof of concept, which received the Journal of Archaeological Science and Society for Archaeological Sciences Emerging Investigator Award 2019, additional funding has allowed us to implement a series of improvements to the method. These aim to improve detection capabilities and the extraction of items' shapes and to increase flight autonomy, control, the area covered per flight and the types of environment in which the method can be applied, while reducing the computing needs, processing time and expertise necessary for its application. This paper provides an account of the methods followed to achieve these objectives, their preliminary results and the current state of their implementation in a free and open-source system that can be used by the archaeological community at large.


| INTRODUCTION
In a recent paper (Orengo & Garcia-Molsosa, 2019), we presented a proof of concept for an automated archaeological surface survey workflow combining drone-based image acquisition, photogrammetry and probabilistic machine learning (ML) in a cloud computing platform capable of locating and extracting potsherds from drone imagery.
Since the publication of this paper, the team developing the method has received funding to move from this initial proof of concept into a fully functional system that can be routinely applied in standard archaeological surveys. This paper presents the latest and future developments of the system and explores its potential to standardize the practice of archaeological surface survey.
The workflow provided excellent results in its first field tests, proving to be more accurate and faster than standard field survey under ideal conditions. It is also unaffected by traditional survey biases such as individual surveyor experience (but see Section 3 for an account of ML's biases). In addition, the workflow provides quantifiable information such as the density, size, shape and number of potsherds per unit area, together with quantifiable spatial information about visibility conditions and the percentage of both false positives and undetected potsherds. This information can be used to calibrate the classification results and obtain reliable extrapolations of total ceramic densities. This new approach has the potential to change the way in which pedestrian archaeological surveys are carried out and offers, for the first time, comparable results between different surveys at a fraction of the cost and time of traditional survey. In this regard, the method can offer a way forward for the quantifiable large-scale use of survey data in the analysis of past human occupation.
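The calibration idea mentioned above can be illustrated with a short Python sketch. It is only one plausible reading of such a calibration, not the published procedure: the function names and the example figures are hypothetical, and it assumes that the false positive rate and the proportion of undetected potsherds have been estimated on a manually checked sample.

```python
def calibrated_count(detected, false_positive_rate, recall):
    """Estimate the real number of potsherds from a raw detection count.

    detected: raw number of detections in the survey unit
    false_positive_rate: fraction of detections that are not potsherds (0 to <1)
    recall: fraction of real potsherds that the detector finds (>0 to 1)
    All three inputs are assumed to come from a manually verified sample.
    """
    if not 0.0 <= false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in [0, 1)")
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    # Remove the expected false positives, then scale up for missed potsherds.
    true_positives = detected * (1.0 - false_positive_rate)
    return true_positives / recall


def density_per_m2(count, area_m2):
    """Potsherd density for a survey unit of known area."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    return count / area_m2


# Hypothetical example: 120 detections, 25% false positives, 50% recall,
# in a 900 m2 survey unit.
estimate = calibrated_count(120, 0.25, 0.5)
print(estimate, density_per_m2(estimate, 900.0))
```

Keeping the two correction factors separate makes it possible to estimate them per field, since both are likely to vary with visibility conditions.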
However, the current set-up of the system presents several problems that hinder its implementation:
1. A certain amount of expertise is necessary for (a) flying or programming the drone flight to obtain photographs with enough overlap for photogrammetric reconstruction, (b) the
1. The method is limited to ploughsoils (in flat and unforested terrain) due to flight limitations of current commercial drone platforms, which are not capable of keeping a constant height above ground.
2. Camera resolution requires the drone to fly at a very low altitude (3-4 m above ground) to acquire images of potsherds at sufficient resolution for the ML algorithm to detect them. This reduces the speed at which the drone can fly. Coupled with the relatively short flight times of the commercial drones available at the time of writing the original paper (around 28 min in the case of the Phantom 4 Pro V2.0, which we previously employed), this limits the area that can be surveyed during each individual flight.
3. Although this approach is able to detect ceramic fragments, it does not differentiate between types of ceramic (which is essential for the chronological and typological definition of sites defined by ceramic concentrations) or between potsherds and other types of material culture.
Our new approach aims to build on our previous research on, and experience with, drone-based automated archaeological survey to overcome these issues and develop a fully operational, practical and accessible system available to archaeologists in a variety of survey circumstances. In this paper, we describe a novel set of software and hardware tools specifically developed or in development for intensive archaeological survey.

| TECHNICAL IMPROVEMENTS TO THE SYSTEM
A series of improvements aiming to tackle the previously stated problems are currently being implemented, as outlined below.

| Platform developments
Over recent months, we have been collaborating with HEMAV, one of the largest drone companies in Europe, on the development of a drone based on their popular HAR9 multirotor platform. This drone will carry a 40- to 60-MP high-quality camera, which will considerably increase image resolution and quality and will allow the drone to fly much higher (and therefore faster) than is possible with common commercial drone cameras. This will considerably improve the detection capacity and speed of our previous proof of concept. The drone incorporates other characteristics directed towards archaeological survey (Figure 1). An altimeter will allow the drone to fly at a constant height above ground, which will facilitate the acquisition of images with similar ground sampling distances independently of terrain variation. Forward-facing computer vision cameras will allow the drone to avoid obstacles. The drone will have a flight autonomy of around 40-45 min depending on the weight of the selected camera (longer than most commercial and professional drone models available). Another important characteristic of the drone will be the integration of real-time kinematic (RTK) navigation with a stable base station. This will allow positioning with centimetric precision, which will be extremely useful for planning the flight and image capture and will facilitate the photogrammetric processing of the images. It will also allow the drone to resume image acquisition at exactly the position where it stopped when recording large areas that require several battery changes. All in all, these features make our prototype an efficient option for conducting archaeological survey. Commercial drone options such as the DJI Mavic Air 2 or Autel Evo 2, with flight times above 30 min and cameras above 40 MP, do not usually include an RTK option.
Despite including bottom sensors that can measure ground distance, they do not use this information to keep a constant height above ground but to avoid collision. The same is true for the DJI Phantom 4 RTK, which includes a GNSS module and ground sensor but a 20-MP camera (although with a larger CMOS sensor than the previously discussed models). The prototype will be available at a price of €8000 (not including camera), which is lower than that of other drone models with similar characteristics currently on the market. We chose to install only front sensors to reduce weight, battery usage and price; as a consequence, we have limited the drone to forward flight only. This will ensure that the drone is always able to detect any obstacle in its path. HEMAV Planner will also be configured to simplify image acquisition. The pilot will only need to draw the area that requires survey, and the software will automatically calculate the flight height and route for optimal image resolution and image overlap. This route can be interrupted (in order to change the drone battery or the memory card) and resumed at exactly the same position as many times as necessary, thus allowing the coverage of large areas with little loss of time.
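The relationship between flight height, image resolution and flight-line spacing that a planner of this kind has to resolve can be sketched as follows. This is a minimal illustration using the standard pinhole-camera relation for a nadir-pointing camera, not a description of HEMAV Planner; the function names, the camera parameters (13.2 mm sensor width, 8.8 mm focal length, 5472 px image width) and the 60% side overlap are illustrative assumptions.

```python
import math


def ground_sampling_distance(sensor_width_mm, focal_length_mm,
                             altitude_m, image_width_px):
    """Ground sampling distance (metres per pixel) for a nadir-pointing camera."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)


def flight_line_spacing(gsd_m, image_width_px, side_overlap):
    """Distance between parallel flight lines for a given side overlap (0 to <1)."""
    footprint_width_m = gsd_m * image_width_px  # ground width of one image
    return footprint_width_m * (1.0 - side_overlap)


def number_of_flight_lines(field_width_m, spacing_m):
    """Parallel lines needed if the first line starts half a spacing inside the field."""
    return math.ceil(field_width_m / spacing_m)


# Assumed example: a 20-MP camera flown 3.5 m above ground over a 100 m wide field.
gsd = ground_sampling_distance(13.2, 8.8, 3.5, 5472)
spacing = flight_line_spacing(gsd, 5472, side_overlap=0.6)
print(f"GSD: {gsd * 1000:.2f} mm/px, line spacing: {spacing:.2f} m, "
      f"lines: {number_of_flight_lines(100.0, spacing)}")
```

Because the footprint width grows linearly with altitude, doubling the flight height halves the number of flight lines for the same overlap, which is why a higher resolution camera translates directly into area covered per flight.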

| Orthophotomosaic generation process
The previous photogrammetric processing of the images to create a single orthoimage that could be used for the location of potsherds was computationally costly and required specialized photogrammetry software such as Agisoft's Metashape. Although many open-source options are nowadays available for the photogrammetric processing of multiple images, these are still complex to install and use, and not all of them provide options for the output of orthomosaics. In order to address these issues, we decided to abandon photogrammetry and aim at the implementation of a simultaneous localization and mapping (SLAM) algorithm. SLAM algorithms use external data such as lidar, images and sound to geolocate the sensor and construct an approximate map of its environment as it moves.
Common implementations can be found in consumer robot vacuum cleaners, and they are an increasingly important component of autonomous vehicle navigation systems. More importantly, SLAM solutions can produce real-time data, and, although less accurate than photogrammetry, they provide an excellent alternative for the mosaicking of drone-based imagery (Bu et al., 2016; Kern et al., 2016). Therefore, we will aim to implement a SLAM algorithm, such as ORB-SLAM2. We are now developing tests to decide the most efficient form of SLAM implementation. This can take several forms: the generation of an orthoimage from the acquired imagery using an ORB-SLAM-based system such as Map2DFusion (Bu et al., 2016), which will later be subjected to potsherd extraction, or the direct georeferencing of the potsherds detected in each of the individual images using the SLAM location data. The latter has the advantage of using the original images instead of a composite image that can introduce deformations, and it saves both the computation necessary to generate the orthoimage and the drive space required to store it. However, it can introduce duplicate detections because the same potsherds are present in different overlapping images. Duplicates can be removed at a later stage based on the spatial coordinates of their bounding boxes, but this might cause problems with overlapping potsherds.
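One simple way in which duplicates could be removed from georeferenced bounding boxes is sketched below. It is an assumption about how such a step might work rather than the implemented procedure: detections are taken to be dictionaries with a `box` in ground coordinates (xmin, ymin, xmax, ymax, in metres) and a confidence `score`, and two detections are treated as the same potsherd when their intersection over union (IoU) reaches a hypothetical threshold of 0.5.

```python
def box_area(box):
    """Area of an axis-aligned box given as (xmin, ymin, xmax, ymax)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = width * height
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


def remove_duplicates(detections, iou_threshold=0.5):
    """Keep the highest-scoring detection of each group of overlapping boxes."""
    kept = []
    for detection in sorted(detections, key=lambda d: d["score"], reverse=True):
        # A detection is kept only if it does not overlap anything already kept.
        if all(box_iou(detection["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(detection)
    return kept


# Hypothetical detections from two overlapping images (coordinates in metres).
detections = [
    {"box": (0.000, 0.000, 0.040, 0.040), "score": 0.9},  # sherd seen in image 1
    {"box": (0.001, 0.001, 0.041, 0.041), "score": 0.8},  # same sherd, image 2
    {"box": (1.000, 1.000, 1.050, 1.050), "score": 0.7},  # a different sherd
]
print(len(remove_duplicates(detections)))
```

As noted above, a purely geometric criterion of this kind cannot distinguish a duplicate from two genuinely overlapping potsherds, so the threshold trades duplicate removal against the loss of real fragments.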

| ML algorithms
Our previously published random forest ML algorithm provided results more efficiently than standard pedestrian survey in terms of the number of items detected and time investment (Orengo & Garcia-Molsosa, 2019). However, the algorithm was only able to locate from 1/3 to 3/4 (depending on the field) of the potsherds counted by experienced surveyors using a very intensive total count method. Another problem was the presence of false positives, which could account for a small part of the total potsherd count. Lastly, although the vectorized extraction of potsherds (see Figure 2) was a good approximation of the size of the potsherds, the shape itself was poorly defined, hindering the application of analyses based on potsherd morphology.
FIGURE 1 Field operation and sensors of the AIDAS system
We are testing and implementing new segmentation algorithms based on convolutional neural networks (CNNs), in particular Mask R-CNN. CNNs are much more efficient than traditional ML algorithms such as random forest, which we used in our previous work (see Figure 2 for a comparison), but their training requires much more computational power and a much larger amount of training data (more than 10,000 examples are usually recommended). However, once training is complete, the algorithm can be executed on a commercial laptop within minutes. Both detection and segmentation algorithms are able to locate the position of objects in an image, but we have preferred segmentation procedures because they also define the shape of the object of interest while automatically tagging the rest of the image as background (i.e. all other types of objects that are not potsherds and have not been delineated as such), which is well suited to improving the algorithm through several iterations.

Region-based CNN (R-CNN) is an object detection algorithm based on a combination of classical tools from computer vision and deep learning, which achieved a mean average precision (mAP) of 53.7%, an improvement of more than 30% relative to the previous best result (Girshick et al., 2014). The fundamental difference between a CNN and a fully connected (FC) neural network is that the former detects local patterns, whereas the latter detects global patterns (Torres, 2020). Mask R-CNN detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance (He et al., 2017). It extends Faster R-CNN (Girshick, 2015; Ren et al., 2015) by adding a branch for predicting segmentation masks, a small fully convolutional network (FCN) (Long et al., 2015), on each region of interest (RoI), in parallel with the existing branch for classification and bounding box regression. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN (He et al., 2017).
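Precision figures such as the mAP values quoted above rest on an overlap criterion that decides whether a predicted mask counts as a correct detection. The following Python sketch illustrates that criterion on small binary masks. It is not the evaluation code used in this work: the function names are hypothetical, masks are plain nested lists, and the IoU threshold of 0.5 is a common convention that is assumed here.

```python
def mask_iou(predicted, reference):
    """Intersection over union of two equal-sized binary masks (nested lists)."""
    intersection = 0
    union = 0
    for predicted_row, reference_row in zip(predicted, reference):
        for p, r in zip(predicted_row, reference_row):
            if p and r:
                intersection += 1
            if p or r:
                union += 1
    return intersection / union if union else 0.0


def match_detections(predicted_masks, reference_masks, iou_threshold=0.5):
    """Greedily match predictions to reference masks.

    Returns (true positives, false positives, false negatives).
    """
    unmatched = list(range(len(reference_masks)))
    true_positives = 0
    for predicted in predicted_masks:
        best_index, best_iou = None, 0.0
        for index in unmatched:
            overlap = mask_iou(predicted, reference_masks[index])
            if overlap > best_iou:
                best_index, best_iou = index, overlap
        if best_index is not None and best_iou >= iou_threshold:
            unmatched.remove(best_index)  # each reference is matched at most once
            true_positives += 1
    false_positives = len(predicted_masks) - true_positives
    false_negatives = len(unmatched)
    return true_positives, false_positives, false_negatives


# Toy 2 x 2 masks: one prediction overlapping the first of two reference sherds.
prediction = [[1, 1], [0, 0]]
sherd_a = [[1, 0], [0, 0]]
sherd_b = [[0, 0], [0, 1]]
print(match_detections([prediction], [sherd_a, sherd_b]))
```

Counts of this kind are also what a comparison against a manual total count requires, since undetected potsherds appear as false negatives.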
The chosen work environment is Google's Colaboratory (Colab), a Jupyter notebook environment that requires no configuration and runs entirely in the cloud. It allows the use of Keras, TensorFlow and PyTorch. The main motivation for its choice was that it provides free accelerators such as graphics processing units and specialized hardware such as tensor processing units.
Preliminary results present a modest improvement (see Figure 4), as the Mask R-CNN algorithm has managed to detect a larger number of potsherds than those previously detected in the same field (1698 compared with 1597 for our previously published random forest-based algorithm). Despite these encouraging results, we aim to improve the accuracy of the algorithm by using more training data at higher resolution and by increasing the use of data augmentation methods.
Data augmentation processes are able to multiply the training data, creating new synthetic data through the modification of existing data. Usual approaches in computer vision include simple image techniques such as scaling, rotating, flipping, cropping and changes in contrast, saturation, brightness, hue and so on. They constitute a popular option in deep learning models, as these usually require large amounts of training data that are not always available.
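Two of the geometric transformations listed above can be illustrated in a few lines of Python. This is a dependency-free sketch rather than the augmentation pipeline used for training: images and masks are represented as nested lists, the function names are hypothetical, and the default flip probability of 0.5 is an assumption. The point it shows is that an image and its segmentation mask must receive exactly the same transformation.

```python
import random


def flip_left_right(grid):
    """Mirror an image or mask (nested lists, one list per row) horizontally."""
    return [row[::-1] for row in grid]


def rotate_90_clockwise(grid):
    """Rotate an image or mask by 90 degrees clockwise."""
    return [list(column) for column in zip(*grid[::-1])]


def augment(image, mask, rng, flip_probability=0.5):
    """Apply the same random flip and rotation to an image and its mask."""
    if rng.random() < flip_probability:
        image, mask = flip_left_right(image), flip_left_right(mask)
    for _ in range(rng.randrange(4)):  # 0, 1, 2 or 3 quarter turns
        image, mask = rotate_90_clockwise(image), rotate_90_clockwise(mask)
    return image, mask


# Toy 2 x 2 grey-level image with a one-pixel "potsherd" mask.
image = [[10, 20], [30, 40]]
mask = [[0, 1], [0, 0]]
rng = random.Random(42)  # fixed seed so that the example is repeatable
new_image, new_mask = augment(image, mask, rng)
print(new_image, new_mask)
```

Photometric changes such as contrast or brightness would be applied to the image only, as they do not alter the position of the masked pixels.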
Initial tests using segmentation-based approaches suggest an increase of 6% in the detection rate and a decrease of 11% in the presence of false positives. The random forest classifier seemed to outperform the deep learning segmentation only in the detection of fragments typically smaller than 1.2-1.5 cm² (for a ground sampling distance of approximately 1.34 mm²), for which the convolution cannot properly identify the texture and edge of the object. For these small fragments, the more colour-focused approach of the random forest algorithm can still identify the pixels as belonging to ceramic fragments. All in all, the enhancement in potsherd detection, together with the notable improvement in shape extraction offered by the CNN-based segmentation approach (see Figure 2 for a comparison between the two approaches), amply justifies its implementation in the new system. However, random forest potsherd detection can still be useful in specific survey situations and needs (see Figure 5), such as extensive sampling-based potsherd detection in which the drone flies too high (and therefore faster) for the convolutional approaches to work properly given the reduction in image resolution. Depending on the strategy adopted (see Figure 3), the surveyor might prefer not to acquire overlapping images and instead carry out a regular sampling of the area in which every image is analysed independently to provide a measure of the presence of potsherds in that specific image. However, this approach will be more sensitive to the environmental conditions in which survey is carried out, as the colour of the potsherds will play a more important role in their detection and high contrast between soil surface and potsherd can significantly increase detection chances. Also, although this approach will be able to provide fast results for a relatively large area, it might require specific training of the algorithm.
FIGURE 2 Comparison between the random forest (a) and the Mask R-CNN algorithms (b)
The combination of approaches allows flexibility to carry out surveys with different objectives, from rapidly sampling a large area with a minimum of computational resources to very intensive survey, which includes the extraction of the shape of each fragment of material culture detected.
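The size threshold discussed above can be expressed as a number of pixels, which helps to explain why a convolutional approach struggles with the smallest fragments and with higher flights. The sketch below is a worked arithmetic example only. It assumes that the quoted 1.34 mm² is the ground area covered by one pixel, and it uses the geometric fact that, for the same camera, this area grows with the square of the flight height; the function names are hypothetical.

```python
def fragment_size_in_pixels(fragment_area_cm2, pixel_ground_area_mm2):
    """Approximate number of pixels covered by a fragment of a given area."""
    return fragment_area_cm2 * 100.0 / pixel_ground_area_mm2  # 1 cm2 = 100 mm2


def pixel_ground_area_at(pixel_ground_area_mm2, altitude_factor):
    """Ground area per pixel after multiplying the flight height by a factor."""
    return pixel_ground_area_mm2 * altitude_factor ** 2


# A 1.2 cm2 fragment at the quoted resolution, and at twice the flight height.
low = fragment_size_in_pixels(1.2, 1.34)
high = fragment_size_in_pixels(1.2, pixel_ground_area_at(1.34, 2.0))
print(f"{low:.1f} px at the original height, {high:.1f} px at double the height")
```

Under these assumptions, a fragment at the lower end of the threshold covers roughly 90 pixels, and only about a quarter of that when the flight height is doubled, leaving little texture or edge information for a convolutional model to exploit.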

| Automated survey software
An important component of the system will be stand-alone software into which archaeologists in charge of survey campaigns can load the images captured by the drone or have them delivered directly while the drone flies. This is an important element of the system, as it will not only allow the user to evaluate performance and take survey decisions in the field based on the results obtained but will also make the system independent of an Internet connection, which was necessary to execute the original workflow. The software will automatically mosaic the images (which the drone flight control system will have acquired at suitable distances for correct overlapping and potsherd identification) and apply the potsherd detection algorithm. We are working towards the integration of an automated method to calculate field visibility conditions using the images. At the moment, our efforts in this regard have resulted in the automated production of green pixel maps that can be used to evaluate the distribution of material culture. However, the range of surface vegetation colours is very wide, and specific types of vegetation might need to be accounted for with purposely trained algorithms. In this regard, the software package will also provide options for the training of new segmentation algorithms for specific survey needs.
Besides the identification of specific types of vegetation, this will serve several other purposes: (a) to allow the development of specific detectors to adapt the algorithm to particular survey conditions that might not have been taken into account in our initial training and/or to specific types of potsherds, (b) to enable the system to detect specific types of material culture beyond potsherds (lithics, bone, stone and so on) and (c) to facilitate the adoption of automated survey procedures in other disciplines with similar interests. It is important to note that although the system has been originally designed for archaeological survey, there are multiple disciplines that could benefit from such a survey system. Pest detection, wildlife monitoring (in particular of small species), sedimentology, forensic work and precision agriculture are just a few of those for which this system might be useful.
FIGURE 3 The implemented Mask R-CNN workflow for instance segmentation (He et al., 2017). The first step is to obtain the feature map of the input image using a 101-layer residual network (ResNet-101) and a feature pyramid network (FPN). Then, through the region proposal network (RPN), the region of interest (RoI) is obtained. After a pooling process, the fully connected (FC) layer is applied to obtain the class and the bounding box, and the fully convolutional network (FCN) to obtain the mask. With these three elements, we obtain the resulting image with the detected potsherds
FIGURE 4 Potsherd detection using Mask R-CNN (Waleed, 2017). The environments used were TensorFlow 1.15.2 and Keras 2.1.0. The implemented parameters were epoch = 160 and Fliplr(0.5) for data augmentation. The results were a loss value of 0.0531, an mAP of 57.33% and 37 potsherds detected
The software should be able to run the SLAM and segmentation algorithms on a laptop computer, preferably with NVIDIA graphics hardware. Such laptops cost of the order of €1000-1500 and should provide enough resources to run all the algorithms within 0.5-2 h, depending on the data size. However, for the training of new algorithms (for specific pottery types or objects), the software will need to access much larger computational resources such as cloud computing services or computing clusters. In this regard, we are considering the option of offering a computation service on a high-performance server to cover this need. The training will employ the acquired images to identify specific types of material culture or any other object of interest. The selected objects will undergo a process of data augmentation before the training of the algorithm, and the resulting detector will be evaluated and improved through several iterations until a satisfactory result is achieved. The resulting detector will then be incorporated into the software to be used in the field. We aim to simplify the training process through the use of a web-based frontend equipped with interactive vector drawing tools.
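As an illustration of how a green pixel map of the kind mentioned in this section could be derived, the sketch below uses the excess green index (2g - r - b on brightness-normalized channels), a widely used vegetation index for RGB imagery. It is offered as a stand-in, since the index actually used by the software is not specified here; the function names and the threshold of 0.1 are assumptions, and pixels are plain (R, G, B) tuples.

```python
def excess_green(red, green, blue):
    """Excess green index computed on brightness-normalized channels."""
    total = red + green + blue
    if total == 0:
        return 0.0  # a black pixel carries no colour information
    return (2 * green - red - blue) / total


def green_pixel_map(image, threshold=0.1):
    """Boolean map that is True where a pixel is classified as vegetation."""
    return [[excess_green(*pixel) > threshold for pixel in row] for row in image]


def visible_fraction(green_map):
    """Fraction of the image not covered by green vegetation."""
    total = sum(len(row) for row in green_map)
    green = sum(sum(row) for row in green_map)
    return 1.0 - green / total if total else 0.0


# Toy 1 x 2 image: one brownish soil pixel and one green vegetation pixel.
image = [[(120, 100, 80), (60, 160, 50)]]
vegetation = green_pixel_map(image)
print(vegetation, visible_fraction(vegetation))
```

A fixed colour threshold of this kind would not capture dry or non-green vegetation, which is consistent with the need for purposely trained algorithms noted above.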

| DISCUSSION
The new implementations of the method are directed at improving our initial workflow so that it can be applied to a wide range of field conditions. The ultimate objective is to develop a system that can become standard practice in archaeological survey. In this regard, many of the concerns detailed in our initial proof of concept have been at least partially addressed:
1. With the implementation of SLAM and deep learning algorithms, the computing resources necessary to carry out the workflow are drastically reduced, making it available for almost real-time on-site analysis without the need for an Internet connection to access cloud computing services or for hours of computing.
2. The new drone has an extended flight time and can carry a high-quality camera, which will allow it to fly higher and faster, covering a much larger area per photograph and therefore increasing the distance between flight lines, which will also significantly reduce the length of the flight necessary to cover the same space. This will considerably increase the area covered in each flight. The RTK GNSS connection with the base station will allow the flight to be interrupted for battery change and retrieval of photographs and resumed at exactly the same position when surveying large areas.
3. The ground control system and analysis software will ease access to the method for surveyors with little experience with drones or computational methods, significantly reducing the level of expertise necessary to apply it.
4. The increase in image quality and the use of deep learning methods will significantly increase the detection rate and improve shape extraction while reducing the presence of false positives. Not being so dependent on RGB colour combinations, CNN-based approaches will also improve the detection capabilities for diverging ceramics while reducing the effect of shadows and different light conditions.
However, several of the problems noted in our initial paper are still unsolved and will remain so for some time to come. The method is still less efficient than very intensive total count pedestrian survey as executed by experienced surveyors. Despite the increased efficiency with respect to previous random forest approaches, very small fragments (<1.2 cm²) are not so efficiently detected when using the images captured with the previous drone camera. This size will be significantly decreased when employing the new drone prototype with a higher resolution camera; however, it should be taken into account when flying the drone at higher altitudes for the fast coverage of large areas, where CNN-based approaches might be at a disadvantage with respect to a simpler, purposely trained random forest.
FIGURE 5 Workflow development from data capture to output generation depending on survey type
Future drone survey campaigns, currently postponed due to the Covid-19 crisis, will increase the amount of potsherd (and other items of material culture) data available to train the algorithm. However, it will prove impossible to include in our training data all items of material culture, particularly those unique finds of higher value on account of their rarity, such as figurines and written tablets.
It is also important to note that the method will not be able to detect more pottery fragments than a knowledgeable human making a conscious effort to record all pottery fragments in a given area. The notorious 1-s rule (Ng, 2016), although not applicable to many scenarios, offers a useful guide here. It states that anything that takes an average human less than 1 s can be automated using ML. In this regard, identifying a potsherd will certainly take a surveyor much less than 1 s, and therefore, the system is well suited to the task at hand. However, the simple fact of being in the field instead of looking at photographs offers the surveyor the possibility of improving the data on which the analysis is based. For example, if in doubt, the surveyor can always get closer to the potsherd, change the angle of vision to avoid vegetation and ultimately pick up the potsherd to have a closer look at its surface and fabric to make sure it is indeed a pottery fragment.
Given a certain amount of expertise, a cultural identification will be obtained if possible. This is something that will be very difficult to achieve with remote sensing, and, therefore, performance equal to that of human surveyors will not be reached in the near future. The method assumes a kind of big data approach in which the quantity of data is more important than its quality. The sheer quantities involved in a typical pedestrian survey project, combined with information on location (x and y coordinates), size and shape, make this a very useful approach. It would be possible for a person to do this task, and indeed, this person would do it much better than the ML algorithm, but geolocating and drawing each fragment would take so much time that it would be impractical given the relative importance of the data gathered. In this regard, many of the opinions received after the publication of our initial paper showed concern about the possibility of substituting traditional pedestrian survey with our automated approach. This is clearly not possible, as the collection and laboratory-based analysis of selected sherds, which provide important information such as pottery date or function, will always be necessary. However, this is not to say that future iterations of this method cannot eventually substitute the bulk of surface survey campaigns and leave the collection and analysis of artefacts in the hands of survey and material culture specialists rather than a student workforce.
Despite the improvements designed into the drone, there will always be environments where the application of this method will not be possible, such as forested areas or areas where the lack of visibility does not allow traditional pedestrian survey to be conducted.
A major obstacle to the future application of this method is the legal restrictions on the use of drones that different countries have put in place. Whereas in many countries drone flight projects require a licensed drone pilot and a detailed project description, containing all relevant information, that needs to be approved by the competent authorities, other countries do not have legislation in place, or the use of drones for archaeology relies on the discretion of archaeological and heritage authorities. Many areas such as airport surroundings are restricted, and special permits are required to fly within them.
Another limitation might be the price of the drone we have developed. Even though the pricing of the prototype described in this paper is very competitive compared with similar solutions, it might still be higher than some archaeological projects can afford. Although we remain convinced that such an investment would repay itself with minimal use, new commercial drones, such as the Autel Robotics Evo 2 (in production at the time of writing) and the DJI Mavic Air 2, can be considered affordable (€850-1300) and viable alternatives, at the cost of reduced flight times and image quality but still noticeably superior to the commercial solution we used in our original proof of concept (Orengo & Garcia-Molsosa, 2019). At the moment of writing the revised version of the manuscript, DJI has already launched the Matrice 300 RTK, which promises to outperform all previously discussed unmanned aerial vehicles (UAVs) and incorporates terrain-follow flight capabilities. Unfortunately, prices for the drone and cameras are still not available. It seems clear to us that drone technology is rapidly catching up with the method requirements, and there is little doubt that this will result, during the next few years, in an increase in the quality and efficiency of automated survey methods and the data they will produce.

| CONCLUSIONS
Initial testing of the hardware and algorithms composing the system makes us optimistic in expecting an important boost to the method's applicability and efficiency. The higher quality camera will allow images to be obtained at higher resolution and with better contrast, which will increase the number of true positives and the segmentation quality, while allowing the drone to fly higher and faster. This, combined with the longer flight times and constant height above ground, will allow much larger areas to be covered and flights to be conducted over fruit trees and irregular terrain.
The combination of SLAM and deep learning algorithms will bring an important step change in the applicability of the system, allowing the algorithm to be executed in the field while survey is being conducted and results to be obtained in almost real time using a standard laptop computer.
Lastly, we would like to conclude this paper with a call to surveyors, computational archaeologists and software engineers for collaborative work in the development of the method. We are open to discussion, criticisms, opinions and suggestions but also to new collaborations for its development. We believe that it is important for the technique to be developed as a community effort so it can be accessed by and be useful to as many of us as possible.