A novel Structure from Motion-based approach to underwater pile field documentation

This article presents a novel methodology for the underwater documentation of pile fields in archaeological lakeside settlement sites using Structure from Motion (SfM). Mapping the piles of such sites is an indispensable basis for exploiting the high-resolution absolute chronological data gained through dendrochronology. In a case study at the underwater site of Ploča, Mičov Grad at Lake Ohrid, North Macedonia, nine consecutive 10 m² strips and a 6 m² excavation section were uncovered, the situation documented, and the wood piles sampled. The data obtained were vectorized in a geographic information system. During two field campaigns, a total of 794 wooden elements on a surface of 96 m² could be documented three-dimensionally with a residual error of less than 2 cm. The exceptionally high number of fishes in the 5 m deep water meant that potentially important information was frequently covered on the relevant photos. We present a machine learning approach, specially developed and successfully applied to the automatic detection and masking of these fishes in order to eliminate them from the images. The discussed documentation workflow enables an efficient, cost-effective, accurate and reproducible mapping of pile fields. So far, no other method applied to the recording of pile fields has allowed for a comparably high resolution of spatial information.


Introduction
Prehistoric lakeside settlements, so-called pile dwellings, are archaeological sites well known in the circum-alpine area, where they have been listed as UNESCO World Heritage since 2011 (Hafner, 2012; Menotti, 2015; Kaeser, 2017). This particular site category occurs from the Neolithic period to the end of the Late Bronze Age, i.e. between ca. 5300 and 750 BCE, with varying regional and temporal concentrations (Menotti, 2015). As the sites are often located in shallow water near lake shores, their archaeological remains are usually waterlogged, which offers ideal conditions for the preservation of organic material. The high number of pile dwellings around the Alps and their exceptional research potential have resulted in a specific research tradition. With its beginnings in the mid-19th century, pile dwelling archaeology today relies on a highly specialized methodology which is constantly developed and adapted (Eberschweiler et al., 2006; Pétrequin, 2013; Ruoff, 2006).
As the term 'pile dwelling' already indicates, wooden piles play an important role in the archaeology of lakeside settlements. The piles, vertically protruding from the lake bottom, usually manifest in thousands at a single site. The piles under discussion represent in situ preserved remains of numerous house buildings and other wooden structures such as trackways, palisades or fences and often originate from several successive construction events and occupation phases, leading to a dense cover of piles on the lake bottom (Fig. 1). Multiphase sites encompassing the Neolithic and Bronze Age period can reach pile numbers up to 50,000.
Hence, these 'pile fields' are a characteristic archaeological feature of lakeside settlements. The careful recording of the piles in combination with dendrochronological analysis allows the determination of the high resolution chronological development of settlements, with specific regards to their extension as well as to house ground plans and the general layout (Fig. 2).
The site of Ploča, Mičov Grad, located on the eastern shore of Lake Ohrid, serves as a case study in the following (Fig. 3, Fig. 4). The first underwater archaeological investigations started in 1997, followed by several further interventions until 2005 (Kuzman, 2013; Naumov, 2015). New interdisciplinary investigations of this site by the Institute of Archaeological Sciences of the University of Bern in partnership with the Institute for Protection of Monuments and Museum Ohrid and the Center for Prehistoric Research in Skopje between 2017 and 2019 have not only offered the opportunity to monitor the site's preservation condition and to thoroughly review the current state of the art by applying specialized expertise, but also to develop novel approaches to the recording of archaeological remains underwater. Preliminary results from recent dendrochronological and radiocarbon dating indicate settlement activities around 4500-4300 and 1800-1300 BCE, with a major concentration in the middle of the 5th millennium BCE (Hafner et al., 2021). Today, the site is completely covered by water with depths of up to 5 m. As already shown in the campaigns between 1997 and 2005, the pile field of Ploča, Mičov Grad extends over an area of approximately 7500 m² (Fig. 4, A). During the diving campaigns of 2018-19, a total area of 96 m², situated in the center of the pile field, was investigated more thoroughly (Fig. 4, B). Close to 800 wood remains, mostly from vertical piles used in construction, were recovered from this area.
In the framework of this recent research, a new approach to pile field recording has been developed which has proven to be ground-breaking for the research at comparable sites. The scientific documentation and excavation of prehistoric pile dwelling sites underwater started in different European countries in the late 1950s (Arnold, 1986; Bukowski, 1965; Kapitän, 1961). The high density of prehistoric pile dwellings in the circum-alpine lakes has substantially promoted technical development and strengthened experience, not only in the archaeological investigation of pile fields but in particular in the related underwater recording techniques. Pioneering work in this regard has been carried out at Lake Zurich since the 1960s (Ruoff, 1981; 1971; Mäder, 2020). In the study of pile fields, information is mainly gained by means of mapping and wood sampling for dendrochronological analysis of all piles in a defined area, even if no organic archaeological layers are preserved or excavated (Fig. 2) (Hafner, 1992).
Today, different methods are applied for the recording of underwater pile fields. In water depths of more than 2-3 m, mapping each pile using a total station and a prism pole is difficult to execute and inefficient, and is hence rarely applied. Instead, a local measuring grid with measuring tapes and lines or metal frames is most commonly set up. Each pile is given an individual ID. The local coordinates of the piles are recorded within the local measuring grid with the help of a folding ruler, square meter by square meter (Arnold, 1986; Kapitän, 1961). For georeferencing, the grid's corner points are measured with a total station or an RTK-GNSS receiver and the local coordinates are converted into a global coordinate system. Another previously established simple but efficient method to record a pile field is the tracing of the features of the excavated surface with a wax crayon at a scale of 1:1 onto a transparent plastic foil or acrylic glass (Fig. 5, A.B). Later, the single 1:1 drawings are scaled down and combined into a coherent map. In more elaborate, longer-term campaigns, two fixed metal rails with a movable connection are used as an excavation grid, onto which an acrylic panel is fixed above each square meter in order to draw in the described manner. The local coordinates of the excavation grid are then transformed into a global coordinate system (Arnold, 1986; Hafner and Suter, 2004; Pohl, 2007; Ruoff, 1981; 1971). On a trial basis, this method has recently been further developed by using single photos per square meter, shot with an action camera mounted on a metal frame (Schärer and Pinto, 2020). The so-called SUISS 'Hydra', a prototype for sophisticated RTK-GNSS-based underwater surveying (co-developed by the underwater Archaeological Department of the City of Zurich in the years 2011-2013), makes it possible to record the position of each pile without relying on a local reference frame. Using this device, one diver can measure each pile efficiently and accurately in all three axes with a precision of 5 cm (Fig. 5, C) (Mäder et al., 2013). Equipment such as the prototype of the 'Hydrocrawler' (Degel et al., 2019) also appears to have promising potential in this regard but has never been used to record pile fields in connection with consistent pile sampling. Therefore, no statements can be made about the reliability of this approach.
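The conversion of local grid coordinates into a global coordinate system described above can be illustrated with a 2D similarity (Helmert) transformation estimated from the surveyed corner points. The following Python sketch is an illustration only, not the procedure of any of the cited studies; the function names and all coordinate values are hypothetical.

```python
import numpy as np

def fit_similarity_2d(local_pts, global_pts):
    """Least-squares 2D similarity transform (scale, rotation, translation).

    Model: X = a*x - b*y + tx,  Y = b*x + a*y + ty
    Returns the parameters (a, b, tx, ty).
    """
    local_pts = np.asarray(local_pts, dtype=float)
    global_pts = np.asarray(global_pts, dtype=float)
    n = len(local_pts)
    x, y = local_pts[:, 0], local_pts[:, 1]
    A = np.zeros((2 * n, 4))
    A[0::2] = np.column_stack([x, -y, np.ones(n), np.zeros(n)])  # rows for X
    A[1::2] = np.column_stack([y, x, np.zeros(n), np.ones(n)])   # rows for Y
    rhs = global_pts.reshape(-1)  # interleaved X0, Y0, X1, Y1, ...
    params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return params

def apply_similarity_2d(params, pts):
    """Transform local (x, y) points into the global system."""
    a, b, tx, ty = params
    pts = np.asarray(pts, dtype=float)
    X = a * pts[:, 0] - b * pts[:, 1] + tx
    Y = b * pts[:, 0] + a * pts[:, 1] + ty
    return np.column_stack([X, Y])

# Hypothetical example: corners of a 10 m by 10 m local grid and their
# (invented) global coordinates, rotated by 30 degrees and shifted.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
corners_local = np.array([[0, 0], [10, 0], [10, 10], [0, 10]], dtype=float)
corners_global = corners_local @ R.T + np.array([1000.0, 2000.0])

params = fit_similarity_2d(corners_local, corners_global)
pile_global = apply_similarity_2d(params, [[2.5, 0.4]])  # one pile position
```

With more than two corner points the system is over-determined, so the least-squares residuals give a first indication of how consistently the grid was set up.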
An up to now rarely used technique for the recording of pile fields is Structure from Motion (SfM) photogrammetry to produce georeferenced orthophoto mosaics. This technique is already widely used in terrestrial archaeology and for the recording of historical monuments. In underwater archaeology, it is particularly applied in the documentation of shipwrecks (e.g. McCarthy et al., 2019;Yamafune et al., 2017;Yamafune, 2016), but also for a variety of other submerged archaeological sites (e.g. Abdelaziz and Elsayed, 2019;Reinfeld et al., 2019;Menna et al., 2018;Pacheco-Ruiz et al., 2018;McCarthy and Benjamin, 2014;Bruno et al., 2013;Henderson et al., 2013). Initial tests have been made on pile dwelling sites, however, established working procedures integrating systematical dendrochronological sampling have not been discussed so far (Block et al., 2017;Pohl and Weßling, 2016). In the following we describe the development of a targeted approach to the documentation of a pile field by means of SfM. Special attention is paid to the connection of the spatial position of the piles with the respective dendrochronological sample information. The main objective of the described procedure lies in the efficient achievement of high precision results at low financial and logistical expenses.

SfM in archaeology
The application of SfM in archaeology has been extensively described and discussed (e.g. De Reu et al., 2014, 2013; Doneus et al., 2011; Green et al., 2014; Reinhard, 2013; Remondino, 2011; Verhoeven, 2011; Verhoeven et al., 2012), hence only a brief overview will be given here. The fundamentals and mathematical principles of close-range photogrammetry and mapping were already being developed in the mid-19th century. The origins of SfM as applied today lie in machine vision, whose mathematical foundations were developed from the 1950s onwards. These are embedded in the algorithms of today's SfM software packages, which became very popular and easy to use from around 2010 on. In addition to complete hardware and software solutions, for example for industrial close-range photogrammetry, both free and paid consumer-grade software solutions are available on the market (Luhmann et al., 2020).
To reconstruct three-dimensional structures from two-dimensional photos, the SfM software semi-automatically identifies common features on the overlapping photos and tracks their position. From the corresponding image coordinates and the (partially) known intrinsic geometry of the camera, the SfM computation reconstructs the positions of the corresponding features in a metric 3D object space, together with the 3D camera positions and orientations. The latter form the basis for stereo matching, followed by photogrammetric triangulation (ray intersection) to densify the point cloud (Luhmann et al., 2020; Szeliski, 2011). The calculated 3D point clouds then serve as a framework for the extraction of products such as a textured mesh, an elevation model or an orthophoto mosaic. Based on application tests and published results, we opted for the software PhotoScan Professional (since 2019 called Metashape Professional) (Agisoft LLC, 2020). Compared to other options, its user-friendly interface and workflow from the data import to the georeferenced raster graphics output, the fast calculation times and the good results for poorer quality photos, as shown by others (Barbasiewicz et al., 2018), were convincing.
[Figure: Neolithic lakeside settlement at Sutz-Lattrigen, Riedstation (Bern, Switzerland). Through the combination of systematic pile mapping and dendrochronological data, a high-resolution chronological settlement development is reconstructed (Plan: Archäologischer Dienst Bern, René Buschor) (Hafner and Suter, 2000; Hafner, 1992).]

Adjustments to SfM for underwater documentation
Underwater, the camera must normally be carried in a waterproof housing filled with air, which separates water and air through glass or synthetic material. Hence, before the light enters the lens it is refracted multiple times, resulting in an enlargement of the image content and a restriction of the field of view. In addition, color absorption according to water depth has to be considered. The metric values affected by the light refractions are considered by the software in the calculations. McCarthy and Benjamin (2014) have demonstrated that the algorithms used today can reconstruct a 3D-model sufficient for archaeological purposes without calibrating the camera in the water beforehand. The calibration is thus carried out on the photos directly. Theoretical considerations about the refraction process as well as practical experiments have confirmed that the quality of the model is higher when a dome port is used for the underwater housing instead of a flat port (McCarthy and Benjamin, 2014; Menna et al., 2017). The described optical changes can hence be reduced. Ideally, the magnification effect underwater is fully compensated by the dome port.
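The restriction of the field of view behind a flat port follows from Snell's law and can be estimated with a short calculation. The sketch below is an idealization under stated assumptions: a thin, flat port, a refractive index of about 1.33 for fresh water, and a 24 mm equivalent focal length on a 36 mm wide frame. It does not model the dome port actually used in the case study, and the function name is hypothetical.

```python
import math

N_WATER = 1.33  # approximate refractive index of fresh water (assumption)

def half_angle_in_water(half_angle_air_deg, n_water=N_WATER):
    """Half field-of-view angle in water behind a flat port.

    Snell's law with n_air = 1: sin(theta_air) = n_water * sin(theta_water).
    A plane-parallel port window does not change the final ray angle.
    """
    s = math.sin(math.radians(half_angle_air_deg)) / n_water
    return math.degrees(math.asin(s))

# Horizontal half angle in air for a 24 mm equivalent focal length
# on a 36 mm wide frame (half width 18 mm).
half_air = math.degrees(math.atan(18.0 / 24.0))
half_water = half_angle_in_water(half_air)

# Apparent magnification of the image content near the image center.
magnification = math.tan(math.radians(half_air)) / math.tan(math.radians(half_water))
```

Under these assumptions the horizontal field of view shrinks from roughly 74° in air to roughly 54° in water, which illustrates why a dome port that compensates this effect is preferable.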
Aiming at a simple procedure, no expensive artificial lighting was used for the photos (Fig. 6). This approach requires a camera with high light sensitivity in order to achieve sufficiently sharp images even in suboptimal exposure conditions. We opted for the compact camera Panasonic Lumix DMC-LX10/15 with an equivalent focal length of 24-75 mm and a maximum aperture of f/1.4 at 24 mm. A model-specific underwater housing from Ikelite was used with a water-filled wide-angle dome WD-3. This configuration represents a compromise between photo quality and affordability. The water-filled wide-angle dome leads to blurred edges in the photos, which must be considered while recording (Fig. 7).
Coded targets provided by Metashape Professional were used as ground control points (GCP). These are automatically detected by the software. The corresponding coordinates can be added to the GCP at a later stage. In order to meet the given circumstances, the targets were custom-made: they were printed in a waterproof process onto plastic cards and glued onto 2 mm thick galvanized steel plates. To be able to fix the targets quickly onto the lakebed, magnets were mounted on a set of galvanized steel nails (Fig. 7).

Automatic detection of fishes in underwater photogrammetric images
On the final orthophoto mosaic, which is the basis of a pile plan in QGIS (QGIS.org, 2020) the coded targets and pile number tags need to be visible and readable. Taking into account the underwater visibility conditions, fishes were more of an issue than suspended particles or light reflections. Fishes tend to gather around divers as they usually gather under boats, which can be an obstacle in the scientific recording of a lake bottom. Whereas they cover information on a single photo, the fishes generally pose no critical problem for the reconstruction as they are constantly moving and hence do not cover the same spots on the overlapping photos. Yet, they are still problematic, as large patches of the input photo texture are used to create the orthophoto mosaic. There, they frequently cover pile number tags, coded targets or possibly important archaeological information. In 2018, fishes on photos were manually masked in order to filter them out. Within a collaboration with the Institute of Geodesy and Photogrammetry of the Swiss Federal Institute of Technology in Zurich in 2020, a software for the automated detection and masking of fishes was developed and successfully applied to the photos (Steiner, 2020).
A machine learning approach was employed to automatically detect and mask out fishes in the individual underwater images. We employed a popular deep convolutional neural network architecture called MaskRCNN, which jointly solves both tasks (He et al., 2017). MaskRCNN starts by encoding the raw input image into a latent feature representation through a series of convolution layers. Based on those features, the algorithm selects box-shaped candidate regions likely to contain objects and classifies each such region as a fish or some other object. The fish regions are retained and segmented by classifying their individual pixels into the fish and background classes, based on the same latent feature encoding. All filter weights within the neural network are learned end-to-end from data, by feeding the network pairs of input images and ground truth annotation masks, and iteratively optimizing the weights to reproduce the correct masks (Rumelhart et al., 1986). In our work we use the Detectron2 (Wu et al., 2019) implementation of MaskRCNN and train it on the manually annotated masks from the 2018 campaign (some annotations were repeated to ensure an optimal detector, as mistakes were discovered in the original annotations). For the task of masking texture images, it is preferable to generate masks that are too large rather than too small, since wrongly masked background pixels near a fish can be filled in from other viewpoints, whereas wrongly retained fish pixels can propagate to the texture map and degrade the visual quality of textured models or orthophotos. Hence, we enlarge the predicted masks by 20 pixels. The final detection model successfully masks out more than 90% of all fish pixels, while over-estimating the area of the masks by 35-40% (Fig. 8, Fig. 9). We point out that these values refer to test images from the same underwater campaign.
As the detector is based on machine learning, one would expect some performance loss when generalizing to different image characteristics (camera model, lighting and water conditions, etc.). It may be necessary to retrain the model with images acquired under similar conditions and associated ground truth masks.
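The enlargement of the predicted masks and the two reported figures (share of fish pixels masked, over-estimation of mask area) can be illustrated with a small NumPy sketch. It is not the implementation of Steiner (2020): the square structuring element, the function names and the exact definition of the over-estimation ratio are assumptions made here for illustration.

```python
import numpy as np

def dilate_mask(mask, radius):
    """Grow a boolean mask by `radius` pixels (square structuring element)."""
    out = np.asarray(mask, dtype=bool).copy()
    for axis in (0, 1):
        src = out.copy()
        for shift in range(1, radius + 1):
            fwd = np.zeros_like(src)
            bwd = np.zeros_like(src)
            if axis == 0:
                fwd[shift:, :] = src[:-shift, :]
                bwd[:-shift, :] = src[shift:, :]
            else:
                fwd[:, shift:] = src[:, :-shift]
                bwd[:, :-shift] = src[:, shift:]
            # OR with copies shifted forward and backward along this axis
            out |= fwd | bwd
    return out

def mask_metrics(pred, truth):
    """Return (share of true fish pixels masked, relative area over-estimation).

    Assumed definitions: recall = |pred AND truth| / |truth|,
    over-estimation = |pred| / |truth| - 1.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    recall = (pred & truth).sum() / truth.sum()
    over = pred.sum() / truth.sum() - 1.0
    return float(recall), float(over)

# Hypothetical example: a single predicted fish pixel enlarged by 20 pixels.
predicted = np.zeros((101, 101), dtype=bool)
predicted[50, 50] = True
enlarged = dilate_mask(predicted, 20)
```

The enlarged masks would then be passed to the photogrammetry software together with the photos, so that masked pixels are excluded from feature matching and texturing.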

Underwater workflow
For the documentation of the piles at the site of Ploča, Mičov Grad, an excavation grid of 10 m by 10 m was set up. In 2018, the corner points were marked with wooden measuring posts driven ca. 1 m deep into the lake bottom and measured using RTK-GNSS. For this purpose, the receiver was mounted onto a buoy and positioned over the corner points with a rope by a diver. The diver was able to position the RTK-GNSS device directly above the point of interest by pulling on the rope while the measurement was executed from a boat. This method is suitable for a small number of points to be measured but is vulnerable to displacements of the buoy caused by waves and currents. Alternative measurement concepts were tested in 2019 in collaboration with the Institute of Geodesy and Photogrammetry of the Swiss Federal Institute of Technology in Zurich (Fandré, 2020).
After the setup of the excavation grid, the macrophytes, the stones and the silt on the lake bottom were removed on a strip of 10 m by 1 m until the surface of the first archaeological layer was reached (Fig. 10). The pile heads, which were mostly eroded down to the level of the lake bottom, were cleaned of sediment in order to be clearly visible on the photos. As a preparation for the later wood sampling, tag labels with inventory numbers were attached to each pile with galvanized steel nails. The extracted wood samples are tagged with identical labels and hence remain attributable to the in situ pile and vice versa.
For the photogrammetric documentation, the excavation grid was set up with measuring tapes, and coded targets were placed as GCP on the lake bottom at a distance of 1 m from each other. For a strip of 10 m², a total of 22 coded targets were used. To be able to calculate the altitude of each target, the water depth above each of them was measured. In the subsequent photographic recording, the strip was photographed from two sides 1-1.5 m above the lake bottom (Fig. 6). Special attention was paid to a continuous overlap of at least 60%, resulting in a visibility of two or four targets on most photos. Half of the shots of the lake bottom were taken from a vertical perspective and the other half at a slightly oblique angle. Between 60 and 110 photos were taken for each 10 m². All photos were saved in RAW and JPEG formats so that white balance and exposure could be adjusted manually later. Before taking the wood samples underwater, the photo alignment in PhotoScan/Metashape Professional (accuracy: medium; generic preselection; key point limit: 40000; tie point limit: 4000; adaptive camera model fitting) was evaluated to be able to repeat the photo documentation in case of errors or gaps. When the sparse cloud was usable, the targets and magnetic nails were removed, and the piles sampled. This procedure was repeated for each 10 m² strip.
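The target layout and the derivation of target altitudes from measured water depths can be sketched as follows. This is a minimal illustration under assumptions: the 22 targets are taken to lie on the two long edges of the 10 m by 1 m strip at 1 m spacing (11 per edge), the altitude is taken as lake level minus water depth, and the lake level and depth values are invented.

```python
def target_grid(length_m=10, width_m=1, spacing_m=1):
    """Local (x, y) positions of coded targets on a strip (assumed layout)."""
    return [(x, y)
            for y in range(0, width_m + 1, spacing_m)
            for x in range(0, length_m + 1, spacing_m)]

def target_altitudes(lake_level_m, depths_m):
    """Altitude of each target: lake surface level minus measured depth."""
    return [lake_level_m - d for d in depths_m]

targets = target_grid()               # 11 positions per long edge, 2 edges
example_depths = [4.8, 5.0, 4.9]      # hypothetical depth readings in m
example_z = target_altitudes(693.0, example_depths)  # hypothetical lake level
```

Because the lake level can change between dives, the level used for the subtraction would need to be recorded together with each set of depth readings.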

Application-Workflow
Before starting the application workflow in Metashape Professional, the raw images have to be processed. In 2019, a grey card was used on site as a reference for the later white balancing, which resulted in a higher color consistency than in 2018. After that, the photos were saved as JPEG at the highest quality level and imported together with the fish masks into Metashape Professional. At first, the coded targets were recognized; where the automated recognition failed, they were marked manually. The coordinates of those GCP were imported at the beginning, but for the alignment, all GCP were deactivated again. The alignment was done with the following settings: accuracy: high; generic preselection; key point limit: 40000; tie point limit: 4000; apply mask to key points; adaptive camera model fitting. The appropriate key and tie point limits depend strongly on the quality of the photos and the nature of the captured surface. For this dataset, the results were not better at higher limits.
Since the photos were shot in two straight lines the sparse cloud may be reconstructed with a dent. This is also referred to as 'bowl effect'. As the coordinates of the GCP will counteract the 'bowl effect' it is important to distribute them regularly over the entire strip. Consequently, all 22 GCP were activated after the alignment. Based on the residual error, the 12 GCP with the largest deviation were then deactivated and the camera positions optimized. This was done as there was no visible 'bowl effect' and therefore the residual error seemed rather caused by the manual positioning of the targets than by the model itself.
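The selection step described above, keeping the GCP with the smallest residual error and deactivating the rest, can be sketched independently of the photogrammetry software. The function name and the residual values below are hypothetical; in practice the residuals would come from the software's reference table.

```python
import numpy as np

def split_gcp_by_residual(residuals_m, n_keep=10):
    """Return (indices to keep, indices to deactivate) by residual size."""
    residuals = np.asarray(residuals_m, dtype=float)
    order = np.argsort(residuals, kind="stable")  # smallest residual first
    keep = np.sort(order[:n_keep])
    deactivate = np.sort(order[n_keep:])
    return keep, deactivate

# Hypothetical residuals (in m) for the 22 targets of one strip.
rng = np.random.default_rng(0)
example_residuals = rng.uniform(0.005, 0.06, size=22)
keep, deactivate = split_gcp_by_residual(example_residuals, n_keep=10)
```

Before deactivating targets in this way, it is worth checking that the retained GCP are still spread over the entire strip, since a regular distribution is what counteracts the 'bowl effect'.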
The Dense Cloud was created with Quality High and Depth filtering Aggressive. After the calculation of the Digital Elevation Model (Interpolation enabled) the Orthomosaic (blending mode: mosaic, enabled hole filling) was computed and exported as GeoTIFF. For the control of the deviations from the ideal excavation grid, the GCP references with the residual errors were exported as txt-file.
In QGIS, all piles were digitized and labelled with the numbers visible on the orthomosaic. This was the last step before obtaining the pile map. At this stage, additional information such as the wood species, the number of annual rings, the age of the individual piles, etc. could be added (Fig. 11).

Results and discussion
During the two archaeological diving campaigns of 2018 and 2019 at Ploča, Mičov Grad on Lake Ohrid, a total of nine successive 10 m² strips were documented following the workflow described above. In this area, wood samples for dendrochronological analysis were taken from all the piles. In an additional area of 6 m², the archaeological layer was excavated in four spits, and the wood piles were sampled as well. In this trench, the surface of each spit was documented by means of SfM. Within the 96 m², the position of 794 wooden elements, mostly in situ piles, could be extracted from the generated orthophoto mosaics (Fig. 12). Two divers were able to fully document 10 m² with up to 130 wooden piles in less than an hour, of which the photographing took around 10 min. The most time-consuming part was the preparation work: the initial set-up of the measuring tapes, the accurate placing of the coded targets and the measuring of the water depth for each target.
In addition to the spatial position (X, Y, Z) of the wooden elements, which is the main information to be extracted, the entire cleaned layer surface was recorded in the same step, and therefore further archaeological information is captured quickly and in high resolution. It is strongly recommended that SfM-based orthophoto mosaics are cross-checked with the original situation on site and complemented with interpretive information, such as stratigraphic indications, sediment qualities, and find and structure categories. The present case study targeted the recording of the pile field rather than of the surface as such.

Accuracy evaluation
In order to monitor the accuracy of the photogrammetric documentation and to assess its actual usability for underwater pile field recording, the deviation from the ideal measurement grid was evaluated. Using 10 GCP for georeferencing, it was possible to obtain a subpixel error in all nine documented strips. The combined residual error of the GCP used, relative to the ideal excavation grid, lies between roughly 1 cm and just under 2 cm (Fig. 13).
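A combined residual error of this kind is commonly expressed as a root mean square error over the GCP. The sketch below assumes that definition (the exact formula used for Fig. 13 is not stated in the text) and uses invented coordinates; the function name is hypothetical.

```python
import numpy as np

def combined_residual(estimated_xyz, reference_xyz):
    """Per-point 3D deviations and their root mean square (in input units)."""
    est = np.asarray(estimated_xyz, dtype=float)
    ref = np.asarray(reference_xyz, dtype=float)
    per_point = np.linalg.norm(est - ref, axis=1)
    rmse = float(np.sqrt(np.mean(per_point ** 2)))
    return per_point, rmse

# Hypothetical example: two GCP deviating by 1 cm and 2 cm from the grid.
reference = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
estimated = [[0.01, 0.0, 0.0], [1.0, 0.02, 0.0]]
per_point, rmse = combined_residual(estimated, reference)
```

Reporting the per-point deviations alongside the combined value helps to see whether a single badly placed target dominates the error of a strip.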
As mentioned above, mainly two factors cause deviations from the ideal excavation grid's coordinates: first, the error produced while manually setting up the local grid underwater and positioning the coded targets; and second, the error in the reconstruction itself, which is mainly due to insufficient photo quality and camera parameters and which in the worst case can result in a bent point cloud. Since measuring errors of several centimeters can easily occur when working with measuring tapes underwater, and since no 'bowl effect' is visible by eye, it must be assumed that the first factor is the main source of the detected residual error. To mitigate this main source of deviations in future applications, metal tubes driven into the sediment before cleaning the surface will be used to provide reusable and more consistent measuring points for georeferencing. For the documentation of a pile field and its further analysis, the achieved residual error of below 2 cm is sufficient.

Limitations and solution approaches
The described method for pile field documentation is applicable with slight adaptations to a broad range of archaeological underwater studies. Nevertheless, besides its proven effectiveness, three limiting factors need to be considered:
1: Photo quality. The photos are the only source for the reconstruction. Poor photo quality will lead to a low reconstruction quality or at least result in difficulties reconstructing the documented excavation field. The environmental conditions always have to be considered and the recording strategy adapted accordingly.
2: Computational capacity. Most of the computations are very demanding with regard to hardware. With an increasing number of photos, the calculation time will increase rapidly.
3: Georeferencing. To measure the corner points of an excavation grid by RTK-GNSS or total station, the water must be bridged. In shallow water, this can easily be achieved with a pole. Deeper water requires a workaround to position the prism or RTK-GNSS receiver correctly on the water. Waves and currents can lead to additional deviations when ropes or long poles are used.
The first two factors are mostly connected to visibility: good visibility makes it possible to take high-quality photos, and when shooting from a greater distance the number of photos decreases considerably. Bad visibility forces shorter shooting distances, which increases the number of photos. More photos also lead to longer processing times and require more computational capacity. When large surfaces have to be documented, it makes sense to divide them into smaller units when the hardware is at its limits. On the one hand, it is noteworthy that even with bad visibility of less than 0.5 m it is possible to achieve useful results (Pacheco-Ruiz et al., 2018), although data processing might become an issue. On the other hand, when visibility is very good and the area to record is not in deep water, reflections of direct sunlight can cause poor image quality. Shooting early or late in the day or under a clouded sky can prevent related disturbances. Even though light reflections were frequent in the presented case study, the photo alignment was always successful. In contrast to conventional techniques, the need to check the calculated model before sampling and the need for sufficient light have an impact on the excavation workflow. Due to the highly divergent conditions of the sites (mainly related to differences in water depth, visibility, and the density of piles), a direct comparison of the time required between the method presented here and conventional methods is difficult. The documentation method proposed here is particularly suitable for sites with a high and dense occurrence of piles, where an enormous amount of time can be saved. In cases of widely spaced features, the proposed method seems to add less value in terms of time management.

Conclusion
As an alternative method to conventional pile field documentation, the discussed SfM-based documentation workflow is reproducible, effective and highly suitable. Compared to the conventional methods it is logistically more flexible, more cost-effective and less time consuming. More detailed information is gathered. The gained time can be invested in other relevant tasks. In contrast to measurements with a folding ruler, the method presented allows measurement errors to be identified and evaluated retrospectively. Another advantage of the orthophoto mosaics is the opportunity to successively plan further work steps in a very detailed manner based on a maximally accurate model of the current situation underwater.
The possibilities the application offers are versatile and not restricted to the elaboration of pile plans, of course. As demonstrated in the in situ documentation of shipwrecks or terrestrial excavations, the application of SfM can substitute hand-made drawings or sketches (De Reu et al., 2014; Doneus et al., 2011; Sapirstein and Murray, 2017). This is also the case for complex archaeological features in lakeside settlements. The SfM-based orthophoto mosaics are an efficient high-resolution alternative to time-consuming hand drawings based on error-prone measurements by hand. The images and models must be continuously assessed and evaluated during the ongoing work for optimal results. A 2D orthophoto mosaic is just one way to export a visualization from the generated three-dimensional reconstruction. Diverse possibilities to display and analyze the reconstructed three-dimensional record are given, such as digital elevation models or surface profiles, as well as high-resolution bathymetric maps (Reich, 2020). In addition, the three-dimensional reconstructions offer outstanding opportunities in museum presentations and provide the general public with insights into a world that is normally only accessible to a few. Adapted to the specific conditions and needs of underwater research, SfM-based documentation is a powerful tool in pile dwelling archaeology.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.