1 Introduction and related work

The construction of infrastructure is a common task in every country. Either new infrastructure needs to be built, or existing infrastructure needs to be repaired. Infrastructure construction is currently the most rapidly growing construction subsector, with an estimated annual growth of \(4\%\) until 2030 (Harrison and Leonard 2021). Although demand is increasing in all construction sectors, this trend will presumably continue, with an estimated growth of \(3.5\%\) per year until 2030 (Harrison and Leonard 2021). However, delays in the construction phase are already a global phenomenon. This problem might become even more severe when the demand for infrastructure construction increases rapidly, as the capacity of construction companies cannot be scaled as easily and quickly. In road construction, for example, companies require construction machines as well as skilled workers to operate them. Therefore, the planning and scheduling of work and resources, already identified as one of the top three delay factors in Zidane and Andersen (2018) (the other two being delays in payment and changes during construction), becomes even more critical.

An initial plan in road construction is never final, as the conditions underground cannot be known beforehand. Therefore, one needs to be able to react to disturbances by adapting the planning and scheduling. To make the necessary adaptations, delays must be detected as early as possible. The best-case scenario would be a real-time comparison of the planned and current state, but checks of this frequency cannot be done manually. Depending on the size of the construction site, comparisons might be done weekly or daily, but never in real-time. Therefore, such comparisons must be made automatically. Many construction vehicles are already equipped with numerous sensors and assistance functions that could help in extracting the current state. However, the current sensors cannot yet give the complete picture directly. Nonetheless, if the as-built state could be extracted with all necessary information, the comparison with the plan itself is less of a concern, as planning data is usually available in digital form. When the comparison shows a significant difference between the planned and actual state, the next step is to identify the reason for the deviation in order to act accordingly. Ideally, this identification would also be done automatically. Of course, the extraction of the current state and the reasons for a delay are specific to every construction task. One activity common to nearly all construction sites is excavating a pit, which will be the primary use case of the work at hand.

The estimation of the volume of a complete excavation pit is a well-researched problem from the theoretical point of view (Chen and Lin 1991; Mukherji 2012; Khalil 2015; Tyulenev et al. 2017). Estimating the volume during the construction phase is less common, as such an estimation needs to include real sensor data. One approach is to use additional sensors in the surroundings of the pit (Bügler et al. 2014), which might only be applicable for some construction sites. A more promising trend is the advancement in autonomous excavation, which inherently also mandates pit monitoring. Here, current research deals with the lower-level (Maeda et al. 2014) (Fig. 1) and higher-level (Schmidt 2016) (Fig. 1) control, but also with environment perception (Emter et al. 2017) and specifically its modeling (Woock et al. 2020; Heide et al. 2020). Combining the two levels has only recently become possible, as in Jud et al. (2021) (Fig. 1), which uses similar sensors and representations. However, that work does not explicitly estimate excavation progress or speed. There are even first commercial solutions for autonomous trenching (Robotics 2023), but they also do not focus on the actual trenching speed, only on sub-aspects. The trend in construction monitoring is to concentrate all available information in a central platform (Woodhead et al. 2018; Li et al. 2018; Tang et al. 2019; Schöberl et al. 2021). The problem with sensor data from, e.g., construction vehicles is that the raw data cannot be transferred directly: the amount of data is too large and the update frequency too high. For example, it would take roughly 200 Mbps to transfer the raw point cloud (128 lines \(\times\) 1024 points \(\times\) 20 Hz \(\times\) 64 bits per double value) and the raw camera images (352 \(\times\) 264 pixels \(\times\) 25 Hz \(\times\) 2 channels) from the sensor system to a central platform.
This example only includes part of the data and accounts for only one potential vehicle. Before the data can be collected and distributed by such a data broker, the raw data needs to be preprocessed, and relevant information needs to be extracted and assigned to a specific task in the plan and schedule.
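The back-of-the-envelope bandwidth figure above can be reproduced with a short sketch. The bit depth of the ToF camera channels is not stated in the text; 8 bits per channel sample is assumed here for illustration, which brings the total close to the quoted 200 Mbps.

```python
# Back-of-the-envelope bandwidth estimate for the raw sensor streams.
# Assumption: 8 bits per ToF camera channel sample (not given in the text).

def stream_bitrate_mbps(values_per_frame: int, rate_hz: float, bits_per_value: int) -> float:
    """Raw bitrate of a periodic sensor stream in Mbps (1 Mbps = 1e6 bit/s)."""
    return values_per_frame * rate_hz * bits_per_value / 1e6

# Ouster OS0-128 point cloud: 128 lines x 1024 points, 20 Hz, 64-bit doubles
lidar = stream_bitrate_mbps(128 * 1024, 20, 64)     # ~167.8 Mbps

# ToF camera: 352 x 264 pixels, 25 Hz, 2 channels (8 bit per sample assumed)
camera = stream_bitrate_mbps(352 * 264 * 2, 25, 8)  # ~37.2 Mbps

print(f"lidar {lidar:.1f} Mbps, camera {camera:.1f} Mbps, total {lidar + camera:.1f} Mbps")
```

The point cloud alone already accounts for roughly 168 Mbps, so even before adding further sensors or vehicles the raw streams exceed what can reasonably be transferred continuously.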

Fig. 1
figure 1

Different prototypical autonomous excavation systems. Top left: Maeda et al. (2014): autonomous 1.5 t excavator by the ACFR (Australian Centre for Field Robotics); Top right: Schmidt (2016): autonomous excavator THOR (Terraforming Heavy Outdoor Robot); Bottom left: Jud et al. (2021): HEAP (Hydraulic Excavator for an Autonomous Purpose) by ETH Zürich; Bottom right: Bagger ohne Bediener (BoB) (German: Excavator without Operator) by the Fraunhofer Institute of Optronics, System Technologies and Image Exploitation (IOSB), https://www.iosb.fraunhofer.de/de/projekte-produkte/autonomer-bagger.html

The work at hand addresses the problem of automatic progress estimation utilizing autonomous excavation. An autonomous backhoe loader, more specifically its excavator arm, is used for a prototypical implementation of automated excavation state and speed monitoring. The extracted information is then sent to a web-based application that compares the actual readings to the planned data and gives possible reasons and recommendations for solutions. The hardware and already existing software of the backhoe loader are described in Sect. 2, and the methodology for extracting additional sensor data for progress estimation in Sect. 3. Afterward, an experimental evaluation is presented in Sect. 4. The comparison between the planned and measured data from the construction site, as well as the recommendation deduction, is described in Sect. 5. A discussion of the approach and experiments takes place in Sect. 6.

2 System and preliminary work

The progress estimation of the excavation pit is based on the autonomous excavation arm of a backhoe loader. Such an autonomous system already yields many helpful intermediate states and subsystems that are beneficial for the estimation and are therefore described in this section.

2.1 Hardware

The backhoe loader used is a John Deere 410J TMC with a front loader and an excavation arm; an image of the machine can be seen in Fig. 2. The backhoe loader is already equipped with encoders that give the position of each joint, as well as a control system to actuate each of the joints via CAN-bus signals. Additionally, a MicroStrain 3DM-GX5-25 Inertial Measurement Unit (IMU) and a Hemisphere V320 global navigation satellite system (GNSS) receiver are mounted on top of the cabin, and a corresponding Real-Time Kinematic (RTK) base station is placed in the vicinity and connected to the GNSS system. With this, a localization accuracy of down to 2 cm can be achieved. For the perception of the surroundings, an Ouster OS0-128 3D laser scanner with 128 lines and a field of view of \(90^{\circ } \times 360^{\circ }\) as well as an IFM O3D301 Time-of-Flight (ToF) camera are mounted next to a simple RGB webcam on the crowd.

Fig. 2
figure 2

Backhoe loader with additional sensor system

2.2 Software

The higher-level behavior-based control system for the excavation process is mainly based on the architecture already described in Groll et al. (2017). A short summary is given in Fig. 3. Essentially, the complete digging cycle is divided into four phases, "move to digging", "digging", "move to dumping", and "dumping", arranged as a state machine. Each phase is further divided into smaller motion primitives. Transitions between phases happen when the corresponding conditions, e.g., on the bucket angle, are met.

Fig. 3
figure 3

The architecture of the trenching process (Groll et al. 2017). Images of each phase can be seen in Fig. 5
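The four-phase digging cycle described above can be sketched as a minimal state machine. The boolean transition condition stands in for the real checks on, e.g., the bucket angle, and the class is illustrative, not the actual interface of Groll et al. (2017).

```python
# Minimal sketch of the four-phase digging cycle as a cyclic state machine.
# The transition condition is a placeholder for the real checks (e.g., on
# the bucket angle) that trigger the phase change.

PHASES = ["move to digging", "digging", "move to dumping", "dumping"]

class DiggingCycle:
    def __init__(self):
        self.index = 0  # start with "move to digging"

    @property
    def phase(self) -> str:
        return PHASES[self.index]

    def step(self, condition_met: bool) -> str:
        """Advance to the next phase when the transition condition holds."""
        if condition_met:
            self.index = (self.index + 1) % len(PHASES)
        return self.phase
```

After "dumping", the cycle wraps around to "move to digging" again, matching the repeating trenching loop of the architecture.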

One preliminary perception algorithm used in this work is the bucket volume estimation described in Hemer et al. (2018) (Fig. 4). Here, a ToF camera measures a point cloud of the bucket, from which a rasterized grid map is created. Afterward, the difference to a grid map of the empty bucket is calculated. This difference map allows estimating the filled volume inside the bucket. Other estimation techniques for the bucket volume and fill level exist (Zhang et al. 2021; Ding et al. 2021), but as the one in Hemer et al. (2018) was specifically implemented for the backhoe loader at hand, no additional integration work was needed, and this approach was therefore used. Another result used here is the classification of rocks inside a rock pile, described in detail in Groll et al. (2020). Here, an RGB image and a depth image are used to capture the scene. A watershed segmentation algorithm extracts the pixels belonging to different rocks. With these pixel segmentations, an estimation of the rock sizes becomes feasible. In Groll et al. (2020), this approach was used to determine whether a rock needs to be crushed before loading; here it will be used in a slightly different manner, as described later.

Fig. 4
figure 4

Bucket fill level estimation (Hemer et al. 2018)
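The difference-map idea from Hemer et al. (2018) can be sketched as follows; the grid resolution, map shapes, and function names are illustrative assumptions, not the actual implementation.

```python
import numpy as np

# Sketch of the grid-map-based bucket volume estimation: the filled volume
# is the summed height difference between the current bucket grid map and
# the reference map of the empty bucket, scaled by the cell footprint.
# Cell size and map contents are illustrative assumptions.

def bucket_volume(filled_map: np.ndarray, empty_map: np.ndarray, cell_area_m2: float) -> float:
    """Estimate the material volume (m^3) inside the bucket.

    Both maps hold one height value (m) per raster cell; cells where the
    filled map lies below the empty-bucket reference contribute nothing.
    """
    height_diff = np.clip(filled_map - empty_map, 0.0, None)
    return float(height_diff.sum() * cell_area_m2)
```

For example, a map in which every cell of a 2 x 2 grid with 0.01 m\(^2\) cells lies 0.1 m above the empty-bucket reference yields 0.004 m\(^3\).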

3 Approach and implementation

As a next step, one needs to lay a foundation for how a suitable progress estimation of an excavation pit can be done. Methods to parameterize the progress speed are already known in civil engineering. One speed estimation method would be to measure the excavated volume per time. However, one wants to normalize this excavation speed with respect to the current circumstances. This adapted speed is usually called performance \(Q_{n} [\frac{m^{3}}{h}]\), and the relevant parameters are explained further below. Afterward, new methodologies for extracting the necessary information from the sensor data are given.

3.1 Progress estimation

The progress estimation is mainly based on the works of Girmscheid, especially Girmscheid (2010a, b). An overview of the necessary parameters is given in Table 1.

Table 1 Progress estimation parameters according to Girmscheid (2010a, b)

The performance can then be computed as:

\(Q_{n} = \frac{V_{B}}{t_{c}} \times 3600 \times \alpha \times \phi \times f_{1} \times f_{2} \times f_{3} \times f_{4} \times f_{5} \times \eta _{1} \times \eta _{2} \times \eta _{G}\)

Of the required parameters, those that can be measured automatically are described first. The datasheet of the machine defines the nominal bucket volume. The cycle time is the time between the start and end of each digging cycle. The dissolving factor is the ratio between the volume of the compacted and the loosened soil. The fill factor measures how much of the fill level of the bucket really corresponds to the volume of its content; e.g., for big rocks, a lot of space is unused compared to fine sand. The trench depth describes the depth of the current digging cycle. The rotation angle is defined as the angle between the digging and dumping positions in each cycle.

All other parameters can currently not be determined automatically. Some are also not relevant for autonomous excavation and cannot be calculated for such a backhoe loader. This includes the emptying accuracy, which is determined by the size of the dumping region (always a "point" in our autonomous excavation scenario), as well as the operator factor (skill factor). Similarly, the condition of the teeth and the maintenance state can be more easily extracted from external sources. In principle, it is possible to estimate both with additional sensors; e.g., the teeth state could be estimated by using the point cloud of the time-of-flight camera and assessing how sharp the teeth still are. However, in both cases extracting information about the working hours from the software of the manufacturer seems to be the better solution, as this is usually recorded and often accessible to the user. This integration is machine- and manufacturer-specific and therefore omitted here. The operating conditions are not as clearly defined, include the effect of weather as well as the quality of preparation work, and are therefore not measured automatically.
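The performance formula can be sketched directly in code; the factor values in the example are purely illustrative, not measured data.

```python
from math import prod

# Sketch of the performance computation Q_n according to Girmscheid
# (2010a, b): bucket volume V_B (m^3) per cycle time t_c (s), scaled to an
# hourly rate and multiplied by all correction factors
# (alpha, phi, f1..f5, eta1, eta2, eta_G). Example values are illustrative.

def performance(v_b: float, t_c: float, factors: list[float]) -> float:
    """Q_n in m^3/h: (V_B / t_c) * 3600 * alpha * phi * f1..f5 * eta1 * eta2 * eta_G."""
    return v_b / t_c * 3600 * prod(factors)

# Example: 0.2 m^3 nominal bucket volume, 30 s cycle time, all ten
# correction factors set to 1.0
q_n = performance(0.2, 30.0, [1.0] * 10)
print(q_n)  # 24.0 m^3/h
```

With all correction factors at 1, the performance reduces to the raw volume-per-time rate; each factor below 1 then scales the nominal rate down to the circumstances at hand.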

3.2 Data abstraction

To extract the defined parameters, additional evaluation of the sensor and control data has to be done, which is not present in the preliminary work. Valuable help here is the knowledge of which action is currently executed by the excavator's arm. A visualization of each phase or action can be seen in Fig. 5. With this information, the cycle time can be calculated directly by saving the timestamp of the start of the "digging" phase and computing the difference to the start of the subsequent "digging" phase. Similarly, the trench depth can be deduced from the z-position of the bucket teeth, calculated via the kinematic model as in Groll et al. (2017), at the start of the "move to dumping" phase, i.e., the end of the "digging" phase, plus an offset accounting for the bucket width. In the same manner, the rotation angle can be computed as the difference of the yaw values at the start of the "dumping" and "digging" phases.

Fig. 5
figure 5

Images of the start of the four digging phases of the excavation process. "Digging" (top left), "Move to Dumping" (top right), "Dumping" (bottom left), "Move to Digging" (bottom right)
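The extraction of cycle time, trench depth, and rotation angle from the phase transitions can be sketched as follows; the event structure and field names are assumptions for illustration, not the actual control interface.

```python
from dataclasses import dataclass

# Sketch of extracting cycle parameters from phase-transition events.
# Each event carries the phase that starts, a timestamp (s), the
# bucket-teeth z-position (m), and the machine yaw (rad); the field names
# are illustrative.

@dataclass
class PhaseEvent:
    phase: str       # phase that starts at this event
    stamp: float     # timestamp in seconds
    teeth_z: float   # z-position of the bucket teeth (m)
    yaw: float       # yaw angle (rad)

def cycle_parameters(events: list[PhaseEvent], bucket_offset: float = 0.0):
    """Return (cycle_time, trench_depth, rotation_angle) of the first full cycle."""
    digging = [e for e in events if e.phase == "digging"]
    dumping = [e for e in events if e.phase == "dumping"]
    to_dump = [e for e in events if e.phase == "move to dumping"]
    cycle_time = digging[1].stamp - digging[0].stamp         # digging start to next digging start
    trench_depth = to_dump[0].teeth_z + bucket_offset        # teeth z at end of digging + bucket offset
    rotation_angle = dumping[0].yaw - digging[0].yaw         # yaw difference dumping vs. digging start
    return cycle_time, trench_depth, rotation_angle
```

A full cycle of recorded events thus yields all three parameters without any additional sensing beyond the state machine and the kinematic model.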

The fill and dissolving factors are not as easily calculated. For the fill factor, the RGB image and depth image at the start of the "digging" phase are considered, and a classification of the visible rock sizes, as in Groll et al. (2020), is used. The fill factor is then determined according to the rock size in the image and Table 2.

Table 2 Adapted fill factors

To calculate the dissolving factor, the estimation of the bucket fill level as in Hemer et al. (2018) is used, which corresponds to the volume of the loosened soil. To calculate the volume of the compacted soil, the 3D point cloud of the laser scanner is used. The point cloud is rasterized into a grid map, as seen in Fig. 6. Between the starts of two "digging" phases, the height difference of the grid cells belonging to the target map, which defines the desired final form of the pit, is calculated. The compacted volume can be estimated from this depth difference and the grid size. A major disturbance in this method is soil that is only moved by the bucket and not loaded into it. One would therefore like to include all grid cells in the difference calculation, but this leads to another problem. The measured points of each scan are fused into the map based on the laser scanner's estimated pose. Although the terrain around the backhoe loader is flat, the grid map in Fig. 6 shows the terrain seemingly rising further away from the backhoe loader. While the accuracy of the laser scanner decreases with distance, the divergences are mainly due to the fact that the exact pose of the scanner is not known because of the vibrations of the excavator's arm. The pose estimate deteriorates further during the digging process. This pose error is amplified by distance, which is why it is not directly visible in the vicinity of the backhoe loader but still affects the measurements. Therefore, not all grid cells should be used. As a trade-off, the cells directly neighboring the target map are additionally used for the difference calculation.

Fig. 6
figure 6

Grid map of two consecutive digging cycles (left and right) and corresponding target map (middle). The cell heights are color-coded. Green indicates a high positive value and red a high negative value. The pit can be seen as dark red cells in the middle of the grid maps
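The compacted-volume calculation over the target-map cells and their direct neighbors can be sketched as follows; the grid layout, cell size, and the one-cell neighborhood are illustrative assumptions, not the actual implementation.

```python
import numpy as np

# Sketch of the compacted-volume estimation between two "digging" phase
# starts: sum the height loss over the target-map cells plus their direct
# (4-connected) neighbors, then relate it to the loosened volume measured
# in the bucket to obtain the dissolving factor alpha.

def compacted_volume(before: np.ndarray, after: np.ndarray,
                     target_mask: np.ndarray, cell_area_m2: float) -> float:
    """Volume (m^3) removed inside the target region and its direct neighbors."""
    # Dilate the target mask by one cell in the four cardinal directions
    mask = target_mask.copy()
    mask[1:, :] |= target_mask[:-1, :]
    mask[:-1, :] |= target_mask[1:, :]
    mask[:, 1:] |= target_mask[:, :-1]
    mask[:, :-1] |= target_mask[:, 1:]
    height_loss = np.clip(before - after, 0.0, None)
    return float(height_loss[mask].sum() * cell_area_m2)

def dissolving_factor(loosened_m3: float, compacted_m3: float) -> float:
    """Ratio alpha between compacted and loosened soil volume."""
    return compacted_m3 / loosened_m3
```

Restricting the sum to the dilated target mask keeps soil pushed aside right next to the pit in the balance while excluding the distant cells corrupted by the pose drift described above.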

3.3 Performance data serialization and collection

The performance data collected in every digging cycle is forwarded to a machine data collector module. This module provides a robust interface that accepts only data with properly annotated SI units. It collects and bundles all data points belonging to one digging cycle into one data structure per cycle. As soon as a digging cycle ends, the completed data bundle is published on its output interface. The output interface can be connected to multiple consumers, e.g., a queue module for batch processing, a disk writer module, an HTTP endpoint publisher, and a BaSyx publisher. All performance data consumers process batches of data bundles. Therefore, the queue module is always connected to the output interface of the data collector to build the expected batches of configurable size. A bundle size \(\le 1\) implies the immediate publishing of the data bundle after every completed digging cycle. The disk writer module logs performance data to the local filesystem. The HTTP endpoint publisher sends batches of performance data serialized as JSON files to web services, for instance, the DiBa discussed in Sect. 5. The BaSyx publisher sends performance data to an Eclipse BaSyx platform instance. The data flow in Fig. 7 shows how the data is centrally collected, bundled in a queue, and sent to different consumers. Usually, only one of the consumers on the right needs to be active at a time, but combinations are possible. In an ideal setup, the disk writer is always connected to the queue to prevent data loss when remote data sinks are unavailable due to network issues.

Fig. 7
figure 7

Performance data is collected, bundled, and written to disk or sent to web services
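The collector-queue-consumer data flow can be sketched as follows; the class and method names are illustrative, not the actual interface of the system.

```python
from typing import Callable

# Sketch of the machine data collector: per-cycle data bundles are queued
# into batches of configurable size and handed to all registered consumers
# (disk writer, HTTP publisher, BaSyx publisher, ...).

class MachineDataCollector:
    def __init__(self, batch_size: int = 1):
        # batch_size <= 1 means every completed cycle is published immediately
        self.batch_size = max(1, batch_size)
        self.queue: list[dict] = []
        self.consumers: list[Callable[[list[dict]], None]] = []

    def add_consumer(self, consumer: Callable[[list[dict]], None]) -> None:
        self.consumers.append(consumer)

    def cycle_completed(self, bundle: dict) -> None:
        """Called once per digging cycle with the bundled data points."""
        self.queue.append(bundle)
        if len(self.queue) >= self.batch_size:
            batch, self.queue = self.queue, []
            for consumer in self.consumers:
                consumer(batch)
```

A disk-writer consumer registered alongside a remote publisher receives the same batches, which mirrors the recommended setup in which local logging guards against network outages.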

4 Experiments

For the experimental evaluation, a trench with dimensions \(2m\times 1m\times 5m\) (height \(\times\) width \(\times\) length) was excavated. An image of the excavated trench can be seen in Fig. 8, and the resulting values in Table 3. In the area of the experiments, no larger rocks were present. Therefore, the fill factor was always assumed to be 1 and is excluded from the table.

Fig. 8
figure 8

Final excavated trench

Table 3 Experimental evaluation

Some cycles in Table 3 include clear measurement errors. For instance, the GPS signal was lost in cycles 6 and 18; therefore, the depth offset of the backhoe's position is wrong. For other cycles, e.g., 6 and 19, the estimated compacted volume is larger than the loosened volume. This can only happen for large rocks. In general, there seems to be a trend toward overestimating the compacted volume. This is probably due to the way the autonomous excavation happens: the excavator's arm often goes too deep into the ground and pulls much more soil out of the ground than can fit into the bucket. This soil is measured as removed from the pit, but not as loosened volume in the bucket. However, this error should level out when this soil is scooped up and moved to the correct dumping position. To have a reasonable speed estimation at all times, a sanity check taking these two problems into account is proposed. It includes the following rules:

  • Skip measurement if \(f_{1} > 4.9m\) (maximum digging depth)

  • Skip measurement if \(\alpha > 1\) and \(\phi > 1\) (Reminder: \(\phi = 1\) in all provided experiments)
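The two sanity-check rules can be sketched as a simple per-cycle filter; the parameter names are illustrative.

```python
# Sketch of the proposed sanity check: digging cycles with implausible
# measurements are skipped before the performance is averaged.

MAX_DIGGING_DEPTH_M = 4.9  # maximum digging depth of the machine

def is_valid_cycle(f1_depth: float, alpha: float, phi: float) -> bool:
    """Keep a cycle only if its depth and volume ratios are plausible."""
    if f1_depth > MAX_DIGGING_DEPTH_M:  # e.g., a lost GPS signal corrupts the depth
        return False
    if alpha > 1 and phi > 1:           # compacted volume exceeds loosened volume
        return False
    return True
```

Averaging the performance only over cycles passing this filter yields the estimate reported below.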

If only the remaining measurements are included, the estimated performance \(Q_{n}\) amounts to \(60\%\) of that of a proficient backhoe loader operator. In comparison, an amateur would achieve roughly \(65\%\) according to Girmscheid (2010a, b), which suggests that the measurement methods yield suitable results.

5 Web application “Digitale Baustelle”

In a data- and knowledge-based software tool named "Digitale Baustelle" (DiBa) (in English: "Digital Job Site"), all measured sensor data and calculated values are combined and evaluated via a REST interface. The DiBa provides the user with a machine capacity planning tool in accordance with the tender documents already during work preparation and operation planning. Knowing which construction machine is needed at which time and for which operation on the construction site, its availability and logistics can be checked and organized. Similarly, during execution, the DiBa can assist the user on-site in supervising and monitoring operations, as machine data generated by machine control software and sensors is sent from the site to the DiBa. A user-friendly display of machine data and operation progress (see Fig. 9) allows the user to continuously check the progress of the job site and thus the predicted end of the operation, and to track deviations in the parameters.

Fig. 9
figure 9

The dashboard of the DiBa displays the progress on the construction site

Significant deviations are reported to the user, and potential causes are identified. So far, the DiBa uses the method of diagnostic analysis and already implements it prototypically. In the future, further functionalities based on DiBa's predictive and/or prescriptive methods will be added, extending construction progress forecasts and trend analyses. For the root cause analysis of construction progress disturbances, the machine data, determined or calculated from sensor readings on the construction site, are analyzed for correlations with specific disturbances. These machine measurements include, among others, the cycle time, the fill factor, the rotation angle, the dissolving factor, and the excavation depth at the lower horizon of the respective layer. From a developed correlation matrix, conclusions are drawn from the change of a parameter for the identification of a construction process disturbance. For each process on the construction site, matrices were created for the causes and the resulting consequences. A graphical representation of the matrix can be seen in Table 4.

Table 4 Graphical representation of the “Causes - value too small” fault matrix. The leftmost column shows the factor which might be too small. The cell entries show potential reasons and the top row groups them into categories. For empty cells there are currently no known correlations

The leftmost column shows the different parameters of the machine data. The corresponding rows describe possible causes, which are assigned to the higher-level categories shown in the top row. The identified correlations between the construction machine data and the disturbances form the basis for the root cause analysis. They are stored in tabular form in the DiBa database. When displayed in the user interface, direct potential causes and recommendations can be given in addition to the numerical deviation (see Fig. 10).
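The tabular lookup from a deviating parameter to its potential causes can be sketched as follows; the matrix entries below are hypothetical placeholders, not the actual content of Table 4 or the DiBa database.

```python
# Sketch of the "value too small" fault-matrix lookup for root cause
# analysis: a deviating parameter maps to potential causes grouped by the
# category columns of the matrix. All entries are hypothetical examples.

FAULT_MATRIX_TOO_SMALL: dict[str, dict[str, list[str]]] = {
    "fill factor": {
        "soil conditions": ["hypothetical: coarse, rocky material"],
        "machine": ["hypothetical: worn bucket teeth"],
    },
    # further parameters (cycle time, rotation angle, ...) would follow
}

def potential_causes(parameter: str) -> list[str]:
    """Flatten all stored causes for a parameter whose value is too small."""
    groups = FAULT_MATRIX_TOO_SMALL.get(parameter, {})
    return [cause for causes in groups.values() for cause in causes]
```

An empty result for a parameter corresponds to an empty row in the matrix, i.e., no known correlation.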

Fig. 10
figure 10

Identification of the cause of the disturbance

6 Conclusion

The presented approach for estimating the excavation speed seems promising, as it gives good results in many cases. Even though some measurement errors are present, they do not affect the suitability for the task. Nevertheless, general improvements could be made in the estimation of the loosened soil. Possible enhancements include adjusting the size of the grid cells and a more accurate localization of the laser scanner to achieve a better fusion. Approaches such as scene flow or ICP matching seem suitable to improve this aspect. In general, with the advancing automation of construction machines and an increasing number of sensors on a construction site, similar approaches are promising. By combining the knowledge of different vehicles, a detailed picture of the condition of the whole construction site could be obtained. With the help of the data- and knowledge-based software tool DiBa, it is possible to collect this measured data centrally, use it for calculations and evaluations, and visualize it for the respective roles on the construction site. On the basis of target/actual comparisons, not only can deviations in construction progress be displayed, but indications of possible causes can also be given through stored comparative data. In the future, the number of construction machines supported by DiBa will be continuously expanded. The aim is to map every single machine used on the construction site in DiBa and to include it in the forecasts of potential construction delays. Even though advancements in fast wireless transmission, such as 5G, make collecting more data and evaluating it centrally seem more feasible, the necessary coverage is still far away. Assessing the relevant values and preprocessing on-site is still needed and of great importance, especially as the needed bandwidth scales with each vehicle.