Activity recognition of construction equipment using fractional random forest

The monitoring and tracking of construction equipment, e.g., excavators, is of great interest to improve the productivity, safety, and sustainability of construction projects. In recent years, digital technologies have been leveraged to develop monitoring systems for construction equipment. These systems are commonly used to detect and/or track different pieces of equipment. However, recent research has indicated that the performance of equipment monitoring systems improves when they are also able to recognize/track the activities of the equipment (e.g., digging, compacting, etc.). Nevertheless, the current direction of research on equipment activity recognition is gravitating towards the use of deep learning methods. While very promising, the performance of deep learning methods is predicated on the comprehensiveness of the dataset used for training the model. Given the wide variations of construction equipment in size and shape, the development of a comprehensive dataset can be challenging. This research hypothesizes that, through the use of a robust feature augmentation method, shallow models, such as Random Forest, can yield a comparable performance without requiring a large and comprehensive dataset. Therefore, this research proposes a novel machine learning method based on the integration of a Random Forest classifier with a fractional calculus-based feature augmentation technique to develop an accurate activity recognition model using a limited dataset. This method is implemented and applied to three case studies. In the first case study, the operations of two different models of excavators (one small-size and one medium-size) were tracked. By using the data from one excavator for training and the data from the other for testing, the impact of equipment size and operators' skill level on the performance of the proposed method is investigated.
In the second case study, the data from an actual excavator was used to predict the activity of a scaled remotely controlled excavator. In the last case study, the proposed method was applied to rollers (as an example of non-articulating equipment). It is shown that the fractional feature augmentation method has a positive impact on the performance of all machine learning methods studied in this research (i.e., Neural Network and Support Vector Machine). It is also shown that the proposed Fractional Random Forest method is able to provide results comparable to deep learning methods using a considerably smaller training dataset.


Introduction
The construction industry has a high workplace incident rate compared to other industries [3,4]. The role of equipment in construction accidents and injuries is significant [1,2,5]. This is mainly due to the sheer size of construction equipment, the high congestion level of construction sites, and the frequent and unstructured interaction between workers and equipment [6,7]. Additionally, given the leading role and criticality of equipment operations, the productivity of construction operations heavily depends on the efficiency of the equipment work [8][9][10][11][12][13][14]. It has also been shown that construction equipment is responsible for a large volume of CO2 emissions and therefore has detrimental environmental impacts [15]. Construction equipment, thus, plays an important role in the development of strategies to improve the sustainability of construction projects [16]. This accentuates the significance of enhancing the use of construction equipment to boost the productivity, safety, and sustainability of construction projects [4,17].
The monitoring of operations of the construction heavy equipment is of great interest and importance because the data collected from these operations can, among others, be used to (a) provide feedback to operators and managers about safety and productivity [18], (b) gain better insight into the planning and simulation of construction operation [19], (c) develop materials for training [20], and (d) support innovation in equipment manufacturing.
The monitoring of equipment can be done at different levels. At the most basic level, monitoring systems intend to detect different pieces of equipment [23,40], dangerous proximities [32,41,42], or entrance to/exit from predefined zones [37,43]. At this level, monitoring systems are indifferent to the continuous behavior of specific equipment and only perform static analysis. At the next level, the monitoring systems not only detect equipment/events but also track equipment movement across the construction site [7,28,44-46]. Tracking-based systems require a Real-time Location System (RTLS), e.g., GPS or UWB. Other types of sensors, such as proximity sensors, can also be integrated into the system to combine the elements of tracking and detection. While for the majority of construction equipment (e.g., rollers, pavers, and graders) it is sufficient to track the equipment as a rigid body (i.e., a monolithic object), articulated equipment (e.g., excavators, cranes, and mobile cranes) can be tracked both as rigid bodies (i.e., using a single sensor) [18,46] and as articulated bodies (i.e., using a sensor array to capture the motions along different Degrees of Freedom (DOFs) of the equipment) [7,34]. It has been shown that pose estimation (i.e., tracking all DOFs of the equipment) is more effective for productivity, safety, and sustainability monitoring of articulated equipment [7,24,34,47-50]. While pose estimation commonly refers to the 3D tracking of articulated equipment, for the sake of brevity even 2D tracking of equipment as a rigid body will be referred to as pose estimation in the remainder of this paper.
It has been demonstrated that automated equipment monitoring systems have great potential for improving the productivity, safety, and sustainability of construction operations. However, previous studies have demonstrated that equipment monitoring systems can be significantly improved if the tracking data are contextualized to detect and distinguish between different activities of the equipment, e.g., digging and swinging [18,19,51]. This is mainly because a great number of safety-, productivity-, and sustainability-related measures depend on the activities that are being performed by the equipment. For instance, the identification of utility strike dangers by an excavator depends not only on the location of the excavator and utilities but also on whether or not the excavator is performing a digging task at that location. Additionally, the data about the activity of the equipment can be used to analyze construction operations at a higher level of abstraction and for managerial and strategic decision making. Such data can be used to estimate the progress of projects [36], update predictive (simulation) models about productivity [19,31], perform predictive safety assessment [6], and develop data-driven simulation models [51,52].
Conventionally, the translation of location/pose data into activity information is done either manually or using heuristics, which are very case-specific. This process is also labor- and cost-intensive [40,53]. In recent years, there has been an upsurge in the development and application of automated activity recognition methods [19,21,51-57]. These methods mostly use Machine Learning (ML) methods to analyze site images or sensory data collected from different equipment to classify different activities. In supervised ML methods, an indexed dataset is used to train the fittest classifier that can explain the variation in the dataset. The trained classifier is then applied to new testing datasets to evaluate the accuracy of the method. Existing ML-based equipment activity recognition methods support both homogeneous single datasets, e.g., only images [21], and heterogeneous datasets, e.g., the fusion of GPS and Inertial Measurement Unit (IMU) data [51].
Over the past few years, ML-based equipment activity recognition methods have matured and developed considerably. However, to improve the performance of these methods (i.e., in terms of generalizability and accuracy), the general research trend is noticeably gravitating towards the use of deep learning algorithms (e.g., Convolutional Neural Networks and Long Short-Term Memory networks) [53][54][55][56]. Generalizability refers to the extent to which the model can accurately predict cases that are not part of the training dataset. As shown in these research works, these methods are successful in predicting the activities of construction equipment with high accuracy. However, the main limitation of deep learning methods is their dependency on large and comprehensive training datasets [58][59][60][61]. Given the diversity of construction equipment models, this means that a large set of data needs to be collected from different models of each piece of equipment to ensure the accuracy of the activity recognition models. A recent study has clearly pointed out the limitations of deep learning methods for activity recognition and highlighted that the reliance on a large dataset is a major deterrent for the adoption of these methods in practice [62].
Given the above-mentioned challenge, this research is motivated by the ambition to explore an alternative approach in order to reduce the dependency of equipment activity recognition models on large and comprehensive training datasets without compromising the accuracy. Therefore, this research builds on recent findings about the effect of feature augmentation methods on improving the performance of machine learning methods [63][64][65] and hypothesizes that through the use of a robust feature augmentation method, a shallow learning model can yield a comparable performance to deep learning methods without requiring a large and comprehensive dataset. Feature augmentation refers to a set of techniques that try to expand the feature domain of a dataset by adding synthetic features to the original features. In this research, a fractional [66] Random Forest (RF) classifier is proposed. The choice of fractional feature augmentation as a feature augmentation method, and RF as the shallow learning method is justified by the following: (a) RF is chosen because of its inherent ability to better explain the complex pattern in the data by developing a large number of predictors and using a voting mechanism [67]. Being an ensemble classifier, RF is a strong classifier that is shown to be suitable for multi-dimensional and complex problems [68]. RF is also found to be a strong classifier for imbalanced datasets (i.e., the sample sizes of different classes are not the same) [69]. This pertains very well to the case of equipment activity recognition because the distribution of length of different activities for construction equipment is largely disproportional. Also, the literature provides ample evidence that RF classifiers offer high generalizability [70][71][72]. Previous research has indicated that RF outperforms other shallow classifiers for predicting construction equipment activities [73,74]. Most notably, Lee et al. 
[74] compared 17 different machine learning methods, including several deep learning methods, for construction activity recognition and observed that RF has the best performance. (b) Fractional calculus allows performing fractional-order differentiation or integration on data series. In the context of ML, fractional feature augmentation allows considering a wider range of features (e.g., input parameters) by performing the fractional derivative and/or integral on the original features. For instance, if one of the features for training a model for estimating the activity of an excavator is the angular speed of the boom, fractional feature augmentation can generate a range of features that are neither velocity nor acceleration but can account for both. For instance, a 0.5-order derivative of velocity offers a feature that contains elements of both velocity and acceleration. Similarly, fractional integration of the angular speed between 0 and 1 considers a range of features from pure velocity to pure displacement. Therefore, fractional feature augmentation allows increasing the number of features without requiring additional data collection. This can help better capture the complex pattern between equipment activities and equipment kinematics and thus improve the generalizability of the model. The integration of fractional feature augmentation with machine learning methods has been shown to be effective in other domains [75][76][77]. However, to the best of the authors' knowledge, the integration of ML, including RF, and fractional feature augmentation has never been studied for construction-related problems.
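To make this intuition concrete, the sketch below applies a Grünwald-Letnikov approximation (a common discrete formulation of fractional derivatives; the method in this paper uses the Riemann-Liouville definition, so this is only an illustration) to a toy velocity signal f(t) = t. An order of 1 recovers the ordinary derivative, an order of 0 returns the signal itself, and an order of 0.5 yields an intermediate feature that blends the two:

```python
def gl_fractional_derivative(samples, alpha, h):
    """Gruenwald-Letnikov fractional derivative of order `alpha`, evaluated
    at the last sample of a uniformly sampled series (spacing h, oldest
    first), using the recursive binomial weights c_j = c_{j-1}*(1-(alpha+1)/j)."""
    coeff = 1.0                    # c_0 = 1
    total = coeff * samples[-1]
    for j in range(1, len(samples)):
        coeff *= 1.0 - (alpha + 1.0) / j
        total += coeff * samples[-1 - j]
    return total / h ** alpha

h = 0.01
velocity = [k * h for k in range(201)]  # toy velocity signal f(t) = t on [0, 2]

d1 = gl_fractional_derivative(velocity, 1.0, h)   # order 1: ordinary derivative
d0 = gl_fractional_derivative(velocity, 0.0, h)   # order 0: the signal itself
d05 = gl_fractional_derivative(velocity, 0.5, h)  # order 0.5: a blend of both
print(d1, d05, d0)  # roughly 1.0, 1.6, 2.0 -- the half-order lies in between
```

For this linear signal the half-order derivative falls strictly between the ordinary derivative (1.0) and the signal value (2.0), which is exactly the "feature between two integer orders" that the augmentation exploits.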
Based on the above premises, the main objective of this research is to develop a Fractional Random Forest (FRF) activity recognition method that can reduce the need for a comprehensive training dataset without compromising the accuracy and generalizability of the model. By addressing this issue, the proposed research contributes to making automated activity recognition of construction equipment easier to develop (by requiring smaller training dataset) and thus more accessible to practitioners and analysts.
The remainder of this paper is structured as follows: First, the relevant literature is reviewed and presented. Next, the proposed method is explained in detail. Then, the implementation of the proposed method and three case studies are presented. Finally, the contributions, conclusions, and limitations and future work are presented.

Related work
Given the benefits of activity-related data for safety, productivity and sustainability analysis of construction operations, many researchers have started to investigate this topic in the past few years. Table 1 summarizes these research works.
Three main categories of equipment activity recognition methods can be identified, namely vision-based, audio-based, and motion-based methods. The major difference between the three categories is the type of data used to estimate the activities of the equipment. While vision-based methods use images generated from the video recording of construction sites, audio-based methods, which have gained momentum in the past few years, use the sound captured from construction sites to infer the activity of different pieces of equipment based on their distinctive sound patterns. Motion-based (or Kinematic-based) methods utilize a wide range of sensors (e.g., IMUs, GPSs, accelerometers, etc.) to capture the motions of the equipment.

Vision-based methods
Zou and Kim [78] developed a color-based tracking method to identify the idle time of excavators. While the reported accuracy is high, this study only considered two activities, namely idle and working. For a wide range of safety- and productivity-related analyses, this level of detail is too coarse to develop an accurate estimation of cycle times and proximity. Gong et al. [79] developed a Bayesian Network-based solution to distinguish between 3 different activities of an excavator and obtained an accuracy of 79%. Golparvar-Fard et al. [21] developed an activity recognition model using Support Vector Machine (SVM). Different accuracy levels were achieved for the excavator and the truck, namely 76% and 98.33%. The main limitation of this work is that it requires a priori knowledge about the duration and starting point of each activity. This would significantly overshadow the applicability of this method as part of an automated monitoring system given that this information needs to be gathered manually or through another method. Azar et al. [80] combined heuristics and SVM to identify the loading cycle of excavators. While very useful for productivity measurements, the method fails to provide detailed information about the time-stamped activity of the equipment.

(Table 1 fragment: Rashid and Louis [55], excavator and loader, IMU, long short-term memory networks, 9 activities, 85%~99%; Slaton et al. [54], excavator and compactor, accelerometer, Convolutional Neural Network, 7 activities, 77.1%; Sherafat et al. [89], excavator, microphone and IMU, Support Vector Machine, 4 activities, 92%.)

A.K. Langroodi et al., Automation in Construction 122 (2021) 103465

Kim et al. [81] also developed a vision-based heuristic method to distinguish between 3 activities of trucks and excavators. However, heuristics-based methods generally fail to achieve high generalizability, given the complexity of construction sites and operations and given the fact that similar operations can happen differently under different conditions. Kim and Chi [53], Cai et al. [56], and Chen et al. [82] have proposed deep learning methods based on the use of Long Short-Term Memory networks and Convolutional Neural Networks and achieved accuracies above 90%. While vision-based methods are gaining more traction in recent years and have proved to be very promising, several inherent limitations are still unaddressed. First, the accuracy of the developed model depends heavily on the comprehensiveness of the training dataset. Without a dataset that covers a wide range of (1) equipment types and shapes and (2) angles of view and operational patterns, the accuracy of vision-based models can be lower when applied to new cases [56,82]. Second, vision-based methods are very sensitive to weather conditions, lighting, and cameras' fields of view [83]. This impacts the practicality of the proposed methods in an uncontrolled environment.

Audio-based methods
Audio-based methods are the next category of construction equipment activity recognition solutions. This is a rather new approach towards identifying equipment activities. In general, these methods apply signal processing and machine learning techniques to detect the specific audio pattern that each activity or each type of equipment produces. Cheng et al. [57,84] used the audio signal from equipment and applied SVM to recognize the activity of equipment based on the sound pattern. While high accuracy was obtained, the method only identifies activities at a low level of detail (i.e., splitting equipment activities into only major and minor). The contribution of this level of detail to improving safety, productivity, and sustainability can be minimal. Others [74,85-87] have used audio-based methods to distinguish between different pieces of excavation equipment. Despite their good performance, these methods are not purported to recognize different activities of equipment, which in turn limits their applicability for a wide range of monitoring purposes. In general, four main limitations are observed with these methods: (1) audio-based solutions are not yet able to identify equipment activities at the same level of detail as vision- and motion-based methods; (2) at the current level of development, these methods are limited to detecting one piece of equipment/activity at a time, which means that at this moment these methods are not able to work on congested construction sites where several pieces of equipment work simultaneously [83]; (3) these methods are sensitive to the relative positioning of the data collection device to the source of the sound, and developing a robust multi-equipment classifier seems to require a large set of training data; (4) the sound pattern can be sensitive to geographical and geological features of the site, which makes the development of robust classifiers difficult.

Motion-based methods
The last category of activity recognition methods is motion-based solutions. In this category, a wide range of tracking sensors is used to collect the input data for activity recognition. Vahdatikhaki and Hammad [19] have proposed a heuristics-based method that leverages location data collected by UWB to determine the activities of an excavator. The proposed method is robust because of several layers of filtering mechanisms that are applied to ensure the output of activity recognition corresponds to the logical sequence of excavator operations. Nevertheless, the heuristic rules are sensitive to the context and generally perform with a high degree of variability in accuracy. Akhavian and Behzadan [51] used the GPS, gyroscope, and accelerometer embedded in smartphones to track the operations of loaders. They experimented with different types of classifiers for activity recognition and reported Neural Network (NN) as the best classifier. Kim et al. [73] used the dynamic time-warping technique to improve the accuracy of activity recognition. The main limitation of the last two works is that the proposed classifiers are very sensitive to the input features and the initial configuration of the dataset. The proposed methods are not tested for cases outside the scope of the dataset. Axelsson and Daniel [88] compared different classifiers and observed that NN and RF are the best performing machine learning methods for activity recognition. This is one of the very few research works where the generalizability of models is tested for cases outside the scope of the training dataset. Nevertheless, because only the basic features coming from the sensory data are used, a major drop in the level of accuracy, i.e., from 95% to 54%, is observed when the model is applied to a new case. Sherafat et al. [89] proposed a hybrid kinematic-acoustic system that utilizes data from IMUs and microphones concurrently. Slaton et al.
[54] compared two deep learning methods for activity recognition of compactors and excavators. In this study, only accelerometer data are used. Compared to other studies, they reported lower accuracy, i.e., 77.6%, and pointed out the importance of having a more comprehensive dataset to improve the accuracy further. Rashid and Louis [55] identified the limitations of the previous work in terms of sensitivity to the limited patterns present in the dataset and applied a series of data augmentation methods to improve the generalizability of the activity recognition methods. Significant improvement in the accuracy of the model is reported in this research. However, the data augmentation method is limited to expanding the scope of the existing features in the dataset, e.g., by stretching and compacting the time series representing individual activities such as swinging. In doing so, this research does not consider features other than those directly measured by sensors for the augmentation of the dataset.
Based on the above review, the research trend is evidently gravitating towards the application of deep learning methods using large training datasets. As stated in the introduction, this can pose applicability and generalizability issues given the wide variety of equipment models. Addressing the above-mentioned gap, this research argues that (1) the use of ensemble learning methods, e.g., Random Forest, can reduce the sensitivity of the model to the dataset because these methods tend to consider many different combinations of input features and use majority voting among classifiers. In this way, ensemble methods can better capture the intricate relationships between input features and develop more generalizable patterns; (2) the use of fractional feature augmentation allows (a) expanding the number of features without requiring additional data collection and (b) generating complex features that can potentially better explain the pattern in the data. Fig. 1 presents the overview of the proposed activity recognition method. The proposed method consists of 3 main phases, namely data preparation, training, and estimation, which function based on the input data coming from IMUs and GPS attached to the equipment. In the data preparation stage, time-series equipment 2D location and/or 3D pose data labeled with the activities of the equipment are used for the training of the RF model. Next, fractional feature augmentation is applied to the location/pose data, and new fractional features are generated and added to the training dataset. Once the training dataset is prepared, the bootstrapping technique is applied to generate a series of subset training datasets, as will be explained in Section 3.2. Then, an FRF classifier is trained based on the bootstrapped training dataset. As will be explained in Section 3.2, FRF generates several decision tree classifiers, each of which uses a randomly chosen set of features for the classification.
At the end of the training, the model (i.e., the forest) contains K trees. In the estimation phase, a new data point (including location and/or pose data) is fed to the model and voting mechanism is triggered. During the voting, each tree presents its estimated activity of the equipment (at a given data point) based on how that tree classifies the data. Consequently, the majority voting of trees will be used to determine the final estimated activity.

Proposed method
The remainder of this section presents a more detailed explanation of the proposed method step-by-step, as shown in Fig. 2.

Data preparation
The first step in the data preparation is to set the basic parameters required for FRF. These parameters include the number of trees (K), a list of features (F), and the number of fractions (S). The number of trees specifies the number of individual decision trees, i.e., classifiers, that will be trained to form the forest. The list of features includes all the features that will be used as input for the model. It is important to mention that, since fractional feature augmentation applies the fractional integral and derivative on the input features, and to make sure that the products of these operations have physical meaning, it is recommended to limit the nature of the features to only velocity features. This is the case because applying the integral and derivative on velocity yields displacement and acceleration, respectively. As shown in Fig. 3, any fractional integral or derivative of order between −1 and +1 applied to velocity generates a complex feature that can capture aspects of displacement, velocity, and acceleration simultaneously.

[Fig. 3: the spectrum of fractional operators applied to velocity, ranging from the derivative (−1) through intermediate orders (−0.5, +0.5) to the integral (+1).]

Once the basic parameters are set by the user, the required features need to be extracted from the dataset. Based on the explanation presented in the previous paragraph, the input features for the examples of (1) an articulated piece of equipment, i.e., an excavator, and (2) a rigid-body piece of equipment, i.e., a roller, are shown in Fig. 4. In this example, each DOF of the excavator and roller is represented by a velocity vector. These include the magnitude of the velocity of the superstructure (v), the rotational velocity of the superstructure (ω₁), and the angular velocities of the arm system (ω₂, ω₃, ω₄) (only for the excavator). For a piece of equipment like a roller, the rotational velocity helps better track the equipment heading and thus distinguish activities where the direction of movement is important (e.g., compacting forward/backward).

Fractional calculus
Once the required features are prepared, the next step in the proposed method is to apply fractional calculus to the features. In this research, the Riemann-Liouville method is used for fractional calculus [90]. The general forms of the fractional integral and derivative according to Riemann-Liouville are presented in Eqs. (1) and (2), respectively [90]:

I^α f(t) = (1/Γ(α)) ∫₀ᵗ (t − s)^(α−1) f(s) ds (1)

D^α f(t) = (1/Γ(m − α)) (d/dt)^m ∫₀ᵗ (t − s)^(m−α−1) f(s) ds (2)

where Γ(⋅) is the gamma function, α is the fractional order, and m is the smallest integer satisfying m − 1 < α ≤ m.
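As a quick numerical sanity check of the Riemann-Liouville integral of Eq. (1) (a sketch only; the midpoint quadrature below is an illustrative choice, not the discretization used in this paper), the fractional integral of f(t) = t with α = 0.5 has the closed form t^1.5/Γ(2.5):

```python
import math

def rl_fractional_integral(f, t, alpha, steps=100_000):
    """Riemann-Liouville fractional integral I^alpha f(t) on [0, t],
    approximated with a midpoint rule. The integrand has an integrable
    singularity at s = t, which the midpoint rule avoids sampling."""
    h = t / steps
    acc = 0.0
    for k in range(steps):
        s = (k + 0.5) * h  # midpoint of the k-th sub-interval
        acc += (t - s) ** (alpha - 1.0) * f(s)
    return acc * h / math.gamma(alpha)

# Closed form for f(s) = s:  I^alpha f(t) = t^(1 + alpha) / Gamma(2 + alpha)
approx = rl_fractional_integral(lambda s: s, t=1.0, alpha=0.5)
exact = 1.0 / math.gamma(2.5)
print(approx, exact)  # the two values agree to about two decimal places
```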
The above equations are presented for a continuous space. However, in the context of activity recognition, the integration and differentiation spaces are discrete. In this space, the value of f(t) corresponds to the value of each feature shown in Fig. 4 at the time instance t. To better demonstrate the discrete form of Eqs. (1) and (2), the typical data array structure of the feature list for the example of the excavator can be used, as shown in Fig. 5. To be able to apply Eq. (1) in a discrete space, the first step is to determine the range over which the integral will be applied. In the example, let us assume the data is collected at a rate of 1/τ Hz. In this case, the integral will be applied over the range of T, where T > τ. Given this assumption, Eq. (1) translates to Eq. (4):

I^α f(kT) ≈ (τ^α/Γ(α)) Σ_{j=0}^{T/τ−1} (j + 1)^(α−1) f(kT − jτ) (4)
where k is an integer. For α > 1, Eq. (5) should be repeated m times on successive time slots, as illustrated in Table 2.
There are three important points to highlight here: (1) After applying the fractional feature augmentation, the frequency of the data is discounted from 1/τ Hz to 1/T Hz. This is because, for the fractional feature to be calculated for the data at point kT, all the data in the range of ((k−1)T, (k+1)T] are used. (2) After discounting the data frequency, the activity of the new data point (i.e., the label) is determined based on the dominant activity in the original period. (3) After applying the fractional feature augmentation for S fractions, the list of features will grow by a factor of S. For instance, if the input feature list includes 5 features (as shown in Fig. 5) and the number of fractions is set at 5, the final feature size will be 25. In this case, the ultimate list of features that will be used for the training of the RF model is in the form of the matrix presented in Fig. 6. The matrix for a rigid-body piece of equipment, e.g., a roller, will be simpler in the sense that it would only contain the fractional values of the traversal velocity (v) and the rotational velocity (ω₁).
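The bookkeeping described in these three points can be sketched as follows. The windowing, the 1/τ Hz to 1/T Hz discounting, the dominant-activity labelling, and the S × F feature growth follow the text; the fractional operator itself is a crude hypothetical discretization (not the paper's Eqs. (4) and (5)), and the sampling rates, feature names, and activity labels are illustrative assumptions:

```python
import math
from collections import Counter

TAU = 0.01   # original sampling period: 1/TAU = 100 Hz
T = 0.1      # window length: features are emitted at 1/T = 10 Hz
FRACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # S = 5 fractional orders (illustrative)

def window_fraction(values, alpha, tau):
    """Fractional operator over one window (a crude, hypothetical
    discretization, NOT the paper's Eqs. (4)-(5)): a Riemann-Liouville-style
    weighted sum for alpha > 0 (integral), a Gruenwald-Letnikov difference
    for alpha < 0 (derivative), and the raw last sample for alpha == 0."""
    n = len(values)
    if alpha == 0.0:
        return values[-1]
    if alpha > 0:  # fractional integral of order alpha
        weighted = sum((n - j) ** (alpha - 1.0) * values[j] for j in range(n))
        return weighted * tau ** alpha / math.gamma(alpha)
    a, coeff, total = -alpha, 1.0, values[-1]  # derivative of order -alpha
    for j in range(1, n):
        coeff *= 1.0 - (a + 1.0) / j
        total += coeff * values[-1 - j]
    return total / tau ** a

def augment(series, labels):
    """series: {feature_name: samples at 1/TAU Hz}. Emits one augmented row
    per window of length T, labelled with the window's dominant activity."""
    per_window = round(T / TAU)
    rows = []
    for w in range(len(labels) // per_window):
        lo, hi = w * per_window, (w + 1) * per_window
        row = {f"{name}@{alpha}": window_fraction(vals[lo:hi], alpha, TAU)
               for name, vals in series.items() for alpha in FRACTIONS}
        row["activity"] = Counter(labels[lo:hi]).most_common(1)[0][0]
        rows.append(row)
    return rows

# 5 raw features (v, w1..w4) x 5 fractions = 25 augmented features per row
series = {n: [0.1 * k for k in range(100)] for n in ("v", "w1", "w2", "w3", "w4")}
labels = ["digging"] * 60 + ["swinging"] * 40
rows = augment(series, labels)
print(len(rows), len(rows[0]) - 1)  # 10 windows at 10 Hz, each with 25 features
```

Note how the 100 samples at 100 Hz collapse into 10 labelled rows at 10 Hz, each carrying 5 × 5 = 25 fractionalized features, mirroring points (1) through (3) above.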

Training
The next phase of the proposed method is the training of the FRF model. The first step is to bootstrap the dataset. Bootstrapping [91] is a technique in which multiple classifiers are developed based on randomly sampled (with replacement) subsets of the original training dataset. The bootstrapped dataset has the same size as the original dataset (i.e., i datapoints), however, because of the sampling with replacement, each bootstrap dataset may contain several repeated data points. Bootstrapping helps reduce the output error that might be caused by specific data points in the original training dataset [92].
In the next step, the Random Subspace Method (RSM) [93] is applied. In this method, an approach similar to bootstrapping is applied to the list of features, but without replacement. Therefore, this step is also known as feature bagging. More specifically, a random subset of the original features will be selected to train each tree in the forest. Therefore, P features will be randomly selected from the S × F features available in the fractionalized list of features, where P < S × F. For instance, in the above example, a set of 5 features can be selected from the list of 25 fractionalized features to train each tree in the forest.
To put the previous two steps into perspective, bootstrapping and feature bagging can be understood as a sampling of columns (with replacement) and rows (without replacement) of the matrix shown in Fig. 6, respectively.
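A minimal sketch of these two sampling steps (index-based and illustrative; the feature names, seed, and sizes are assumptions, not the paper's implementation):

```python
import random

def bootstrap_rows(n_rows, rng):
    """Sample row indices WITH replacement (bootstrapping): same size as
    the original dataset, so some rows typically appear more than once."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]

def bag_features(all_features, p, rng):
    """Sample P feature names WITHOUT replacement (Random Subspace Method)."""
    return rng.sample(all_features, p)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# 5 base features x 5 fractional orders = 25 fractionalized feature names
features = [f"f{i}@{a}" for i in range(5) for a in (-1.0, -0.5, 0.0, 0.5, 1.0)]

K, P, N_ROWS = 3, 5, 100  # 3 trees for illustration; P = 5 of S*F = 25 features
per_tree = [(bootstrap_rows(N_ROWS, rng), bag_features(features, P, rng))
            for _ in range(K)]
for row_idx, feats in per_tree:
    print(len(row_idx), len(feats))  # 100 bootstrapped rows, 5 bagged features
```

Each tree thus receives its own (rows, features) pair: the row sample may contain duplicates (with replacement), while the feature sample never does (without replacement).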
Although from this point onward the same steps as in a basic decision tree algorithm are followed [94], a brief explanation is provided for completeness. To generate a tree, first all features are examined to find the feature that can best classify the dataset into two nodes, as shown in Fig. 2. To examine the potential of each feature as a node, first, all the possible split points for that feature are considered. Then, the Gini index is used to determine the degree of impurity of the classification at each split point [94], as shown in Eq. (6):

G_q = 1 − Σ_{i=1}^{C} P_i² (6)

where G_q: Gini index at node q; C: number of classes (i.e., number of equipment activities); P_i: probability of a data point being classified into class i.

Once all the split points are explored, the point with the minimum Gini index is considered as the classifier based on that specific feature. Next, the same procedure is repeated until the potential classifier for each feature is determined. Finally, the feature with the minimum Gini index is selected as the node.
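The split-selection procedure built on the Gini index of Eq. (6) can be sketched as follows (an illustrative stand-alone implementation; the feature values and activity labels are made up for the example):

```python
def gini(labels):
    """Gini impurity G = 1 - sum(p_i^2) over the C activity classes, Eq. (6)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Scan every candidate split point of one feature and return the
    threshold with the minimum size-weighted Gini of the two child nodes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    xs = [values[i] for i in order]
    ys = [labels[i] for i in order]
    best = (None, float("inf"))
    for k in range(1, len(xs)):
        if xs[k] == xs[k - 1]:
            continue  # identical feature values cannot be separated
        threshold = (xs[k] + xs[k - 1]) / 2
        left, right = ys[:k], ys[k:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best

# A boom angular speed near zero suggests "idle"; a clean split exists near 0.5
speeds = [0.1, 0.2, 0.3, 0.9, 1.1, 1.3]
acts = ["idle", "idle", "idle", "digging", "digging", "digging"]
print(gini(acts), best_split(speeds, acts))  # impurity 0.5; threshold ~0.6, impurity 0.0
```

Repeating `best_split` for every candidate feature and picking the one with the lowest weighted impurity is exactly the node-selection loop described above.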
To determine the end of each branch in the tree, the potential of each node (other than the root node) to be a leaf is first checked before the node is grown. A node is determined to be a leaf when the Gini index of the node is smaller than that of any possible subsequent classifier that can be considered at that node. If the node is not a leaf, the node is branched out and two new nodes are generated. The above procedure is followed until the tree cannot be grown any further, i.e., there are no more unbranched nodes or no feature is left. The procedure of growing a tree is repeated for the K trees in the forest. In this way, at the end of the training phase, there are K independent decision trees that are trained using unique bootstrapped datasets and bagged features, as shown in Fig. 3.

Estimation
The last phase of the method is to estimate the activity for a new input data point, which comprises the pose of the equipment. When new input data are provided to the model, fractional calculus is first applied to the data points to generate the same set of features as in the trained model. It should be highlighted that, since fractionalization requires a range of data points, FRF is applied over the discounted update rate of 1/T Hz. Once all the fractional features are generated, FRF asks each tree to determine the activity of the equipment corresponding to the specific data points. This procedure results in K estimates of the activity of the equipment for that time frame. Finally, the majority vote of the trees determines the verdict of FRF about the activity of the equipment.
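The estimation phase can be sketched as follows. The fractional feature here uses the Grünwald–Letnikov discretization, which is a common discrete approximation but an assumption on our part (the paper's exact fractionalization is defined in Section 3.1.1, outside this excerpt); the majority vote is as described above:

```python
import numpy as np
from collections import Counter

def gl_fractional_diff(signal, alpha, h=0.01):
    """Grünwald-Letnikov approximation of the order-alpha fractional
    derivative of a uniformly sampled signal (sampling step h seconds).
    Illustrative only -- the paper's exact fractional operator may differ."""
    n = len(signal)
    # GL binomial weights w_k = (-1)^k * C(alpha, k), via the recurrence
    # w_k = w_{k-1} * (k - 1 - alpha) / k with w_0 = 1.
    w = np.ones(n)
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    out = np.zeros(n)
    for t in range(n):
        out[t] = np.dot(w[: t + 1], signal[t::-1]) / h ** alpha
    return out

def majority_vote(tree_predictions):
    """FRF verdict: the activity label predicted by most of the K trees."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

With alpha = 1 and h = 1 the operator reduces to the ordinary first difference, which is a convenient sanity check on the weights.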

Implementation and case studies
To demonstrate the feasibility and effectiveness of the proposed method, it is implemented and tested on three different case studies. The method is implemented in Python on a workstation laptop with a Core i7 2.80 GHz CPU and 16 GB RAM running the Windows 10 operating system. As shown in Fig. 7, high-end, industry-grade MTi 1-series Xsens IMUs were used in conjunction with a U-blox EVK-M8T GNSS receiver for rotational and translational motion tracking, respectively. These sensors collect data at a rate of 100 Hz (the GNSS data are upsampled to this frequency). This sensor network provides the same set of input data shown in Fig. 4, i.e., the traversal velocity, the angular velocities of the arm system, and the rotational velocity of the superstructure.
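As a minimal sketch of the upsampling step, a lower-rate GNSS stream can be linearly interpolated onto the 100 Hz IMU timestamps. Note that the 10 Hz GNSS native rate and the signal used here are our assumptions for illustration; the text only states the 100 Hz target rate:

```python
import numpy as np

imu_t = np.arange(0.0, 1.0, 0.01)      # 100 Hz IMU timestamps (1 s window)
gnss_t = np.arange(0.0, 1.0, 0.1)      # assumed 10 Hz GNSS timestamps
gnss_vel = np.sin(2 * np.pi * gnss_t)  # placeholder traversal velocity

# Linear interpolation of the GNSS measurements onto the IMU time base.
gnss_vel_100hz = np.interp(imu_t, gnss_t, gnss_vel)
```

In practice a more careful scheme (and the synchronization steps cited below) would be used, but this conveys how the two streams are brought to a common rate.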
During the installation of the IMUs on the equipment, care should be taken to ensure that the internal coordinate system of each sensor is aligned with the local axes of the part it is intended to track. This is important because the IMUs are used to measure the relative velocity of each part along its local axes. If there is a misalignment between the IMU and the equipment part, the measurements will have an offset, which can result in significant errors when the excavator is working on topographically rough terrain. During the case studies, the IMUs were placed in well-designed and aligned 3D-printed casings that keep the sensor parallel to the bottom plate of the casing. During the sensorization of the equipment, a spirit level was used to make sure the casing was aligned with the local axes of each component.
The collected data were preprocessed. The steps explained in the authors' previous work were applied to remove outliers, synchronize the data, and unify the time steps [19]. In addition, an Extended Kalman Filter was applied to smooth the GNSS and IMU data [95]. Moreover, the IMU sensors were equipped with an Active Heading Stabilization software component to minimize drift error [96].
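The paper applies a full Extended Kalman Filter to the fused GNSS/IMU states [95]; as a greatly simplified illustration of the predict/update cycle only, the sketch below smooths a noisy scalar stream with a random-walk state model (the noise variances `q` and `r` are arbitrary placeholders, not values from the paper):

```python
import numpy as np

def scalar_kalman_smooth(z, q=1e-4, r=1e-2):
    """Scalar Kalman filter with a random-walk state model.
    q: process-noise variance, r: measurement-noise variance.
    Illustrative only -- the paper uses a full EKF over fused states."""
    x, p = z[0], 1.0
    out = np.empty_like(z, dtype=float)
    for i, meas in enumerate(z):
        p = p + q                  # predict: state unchanged, variance grows
        k = p / (p + r)            # Kalman gain
        x = x + k * (meas - x)     # update with the new measurement
        p = (1 - k) * p
        out[i] = x
    return out
```

Run on a noisy constant signal, the filtered output tracks the true level with markedly lower variance than the raw measurements.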
Given the main hypothesis of this research, i.e., that FRF can serve as a generalizable and accurate activity recognition method without requiring a large training dataset, it was necessary to collect data for different types of equipment and operating conditions. To this end, three case studies were considered. In the first case study, the method was applied to two different excavators (one small-size and one medium-size) on a construction field at a training school. This case study is intended to demonstrate the robustness and generalizability of the proposed method. In the second case study, and to further substantiate the generalizability of the method in an extreme case, the model developed on the actual excavator in the first case study was tested on a scaled remotely controlled excavator (scale 1/12). Finally, in the last case study, the method was tested on two rollers as examples of rigid body equipment. While the first two case studies were conducted in controlled environments, the last case study was conducted in the uncontrolled environment of an actual construction project.
In all case studies, the performance of the proposed FRF method is evaluated. However, to put this assessment in perspective, three baseline methods (i.e., Random Forest, Neural Network, and Support Vector Machine) are used to shed light on the following questions: (1) Which ML method performs the best? and (2) To what extent can fractional feature augmentation improve the performance of each ML method? Accordingly, six different models were trained and tested: RF, FRF, NN, Fractional NN (FNN), SVM, and Fractional SVM (FSVM). In these case studies, a Multi-Layer Perceptron (MLP) NN is used. Through an iterative process, the best structure for the MLP model was found to be two hidden layers of 50 neurons each. As for SVM, the Radial Basis Function (RBF) was used as the kernel, and the shape of the decision function was set to one-vs-one (OVO). During fractionalization, T = 1 s was used, as explained in Section 3.1.1, and the fractionalization step was set to 7. In the case of FRF, the number of trees was set to K = 1000 and the feature bagging fraction (P) was set to 10.
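For reference, the stated baseline configurations could be instantiated as follows in scikit-learn. The library choice is ours; only the hyperparameters (K = 1000 trees, two hidden layers of 50 neurons, RBF kernel with an OVO decision function) come from the text, and the feature bagging fraction P is not mapped here because its exact definition is not given in this excerpt:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# K = 1000 trees for the (F)RF models
rf = RandomForestClassifier(n_estimators=1000)
# MLP with two hidden layers of 50 neurons each
nn = MLPClassifier(hidden_layer_sizes=(50, 50))
# RBF-kernel SVM with a one-vs-one decision function
svm = SVC(kernel="rbf", decision_function_shape="ovo")
```

The fractional variants (FRF, FNN, FSVM) differ only in that the fractionally augmented feature set is fed to the same estimators.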
In conformity with similar studies [55,82], three metrics are used for the assessment of the performance, namely accuracy, precision, and recall. Eqs. (7) to (9) show how these metrics are calculated:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (7)

Precision = TP / (TP + FP)    (8)

Recall = TP / (TP + FN)    (9)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

Data collection and preparation

The first case study investigates the performance of the FRF method for cases where FRF is trained on one dataset and then tested on (1) a different dataset representing the operation of a different piece of equipment with a different geometry, (2) a different dataset representing the operation of the same equipment but by an operator with a different skill level, and (3) the same dataset but on the portion that was excluded from the training. Two different excavators (i.e., Terex TC75 and Case CX80C) were used for the data collection, as shown in Fig. 8. While the Terex TC75 is considered a small-size excavator, the customized Case CX80C used in this case study is categorized as medium-size. Data were collected at an operator training school in the Netherlands, i.e., SOMA College [97]. This was considered an adequate testing environment because (1) it provided a controlled environment where specific scenarios could be simulated and (2) it provided access to both expert and novice operators.
To cover the scenarios mentioned above, three data collection sessions were organized. In the first session, an expert operator was asked to perform a simple digging operation using the Case CX80C: relocate to the digging point, excavate soil on even ground, and dump the material on a nearby stockpile. In the second session, the same expert operator was asked to operate the Terex TC75 and perform the same operation at a different nearby location. In the last session, a novice operator was asked to use the Case CX80C to perform the same digging operation. For each scenario, about 5 min of data were collected. The sessions were intentionally kept short to test the research hypothesis of good generalizability with small datasets. The sizes of the datasets, after filtering the data, are presented in Table 3. Recent studies that proposed deep learning methods for equipment activity recognition [54,55] used 125K and 287K data points, respectively, to train the proposed models. Compared to these studies, the training dataset in this study is considerably smaller (i.e., only 11% of [54] and 5% of [55]).
These three sessions resulted in four different datasets: (1) operation of the Case CX80C by an expert operator, (2) operation of the Terex TC75 by an expert operator, (3) operation of the Case CX80C by a novice operator, and (4) a dataset containing all of the above operations. As stated above, these four datasets are used to investigate the performance of the FRF method under (1) varied geometry, (2) varied skill level, and (3) no variation (i.e., when the combined dataset is split between training and testing with a 70:30 ratio). Table 4 shows the structure of these scenarios.
To annotate the datasets, a 3D data visualizer was developed to mimic the motions of the equipment in a Virtual Reality (VR) environment and label the activity of each data point. The use of a VR visualizer for data labeling is preferred because it eliminates the synchronization error between video recordings and sensor data. Fig. 9 shows a snapshot of the VR visualizer used for data labeling. As shown in previous studies [51], the number of activities used in data labeling has an impact on the accuracy of the model. Therefore, two different sets of activities were considered. In the first set, 5 activities were considered, namely (1) Idle, (2) Relocating (i.e., the excavator moves on its tracks), (3) Swinging, (4) Digging, and (5) Filling (i.e., dumping the material on the truck). After close observation of the data in the visualizer, it was noticed that the filling activity is very short and hardly distinguishable from the swinging activity (especially when the equipment is handled by an expert operator). Therefore, in the second set of activities, filling was eliminated and all of its instances were relabeled as swinging. Consequently, the second set comprises 4 activities (i.e., idle, relocating, swinging, and digging).
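The second activity set can be derived from the first by a simple relabeling, sketched below (the label strings follow the activity list above; the sample sequence is a made-up example):

```python
# Merge "Filling" into "Swinging" to obtain the 4-activity label set.
labels_5 = ["Idle", "Digging", "Swinging", "Filling", "Relocating", "Filling"]
labels_4 = ["Swinging" if a == "Filling" else a for a in labels_5]
```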

Results
As mentioned in Section 4, the six different models were tested on the 3 different scenarios, once considering 5 activities and once considering 4 activities for the excavator. Tables 5 and 6 present the results of the performance analysis for all the scenarios. Also, Figs. 10 and 11 show the confusion matrices of the different models in terms of accuracy.
As a first observation, all models performed better (i.e., by an average of 4.6% in accuracy) when only 4 activities were considered, as can be discerned from Tables 5 and 6. This is in line with the findings of previous studies [21,51]. Also, all models performed worse when the testing dataset did not include the same scenario used in the training set (i.e., an average reduction of 5.4% in accuracy). This highlights the importance of testing activity recognition methods on cases other than those present in the training dataset to ensure generalizability.
Concerning the first question raised in Section 4.2, it can be observed that, on average, RF-based methods outperformed the other methods. To elaborate, the performance margin of RF over NN and SVM is plotted in Fig. 12. As shown in this figure, RF outperformed NN in terms of accuracy by an average of 2.2%. The margin is greater when the fractional variations of the two methods are compared, i.e., an average increase of 4% in accuracy. The accuracy margin of RF over SVM is larger still, with average improvements of 8.3% and 9.9% for the pure and fractional variations, respectively. The same applies to the precision of the models: RF-based methods always performed better than NN- and SVM-based methods. Nonetheless, recall remained more or less the same between RF- and NN-based methods, while RF methods again performed much better than SVM methods. In general, it can be concluded that RF-based methods outperform the other methods. It is also shown that RF-based methods have high generalizability, with 86.2% and 86.9% accuracy for the 5-activity classifiers under varied geometry and varied skill level, respectively. The generalizability was even higher for the 4-activity classifiers, i.e., 90.4% and 92.7% for varied geometry and varied skill level, respectively.
To answer the second question in Section 4.2, the contribution of fractional feature augmentation to the improvement of generalizability was scrutinized. Fig. 13 shows the improvement achieved in terms of accuracy, precision, and recall when fractional feature augmentation was applied to the different methods (Table 5 reports the accuracy, precision, and recall of the 5-activity models, with the best performances highlighted in bold). As is evident in the results, the fractional version of every method outperformed its pure counterpart. In the case of RF, for instance, and considering the 5-activity classifiers, fractional feature augmentation improved the accuracy by 5.3%, 2.8%, and 3.8% for the varied geometry, varied skill level, and no variation scenarios, respectively. An even higher improvement was achieved for the 4-activity FRF method, with 5.7%, 5%, and 4.1% improvements in accuracy for the varied geometry, varied skill level, and no variation scenarios, respectively. However, the accuracy improvement from fractional feature augmentation is not limited to RF-based methods.
Although to a lesser extent, similar patterns of improvement can be observed for the NN- and SVM-based methods. The contribution of fractional feature augmentation to precision is even higher than its contribution to accuracy. For instance, average improvements of 8.1% and 8% in precision are observed when fractional feature augmentation is applied to the 5-activity and 4-activity classifiers, respectively. Finally, fractional feature augmentation also improved recall, although to a lesser extent than precision.

Data collection and preparation
To further test the extent to which the proposed FRF method is generalizable, an extreme test case was designed in which the model trained on the Case CX80C excavator was used to estimate the activities of a remotely controlled excavator scaled at 1/12. As shown in Fig. 14, IMUs and a GNSS receiver were attached to this equipment to capture the motion data. A similar digging pattern was simulated in an outdoor environment on the campus of the University of Twente. In labeling the dataset, the same 5 activities mentioned in the first case study were used. The data were preprocessed using the same methods as in case study 1, and the same activity recognition methods were applied. Table 7 summarizes the results of this case study. Fig. 15 presents a comparison of RF with the other methods in terms of accuracy, as well as an analysis of the accuracy gained by applying fractional feature augmentation to the different models. As shown in this figure, and in conformity with the previous case study, FRF performs the best in terms of accuracy, precision, and recall. The accuracy margins of RF over NN and SVM are about 7% and 25%, respectively. Similar to the previous case study, fractional feature augmentation improved the accuracy of all models. The significant observation in this case study is that, while the velocity ranges of the RC excavator are considerably different from those of an actual excavator, FRF was still able to estimate activities with high accuracy, i.e., 72.9%. Having said that, while still comparable to the results reported in previous research [21,54,79], the accuracy of all models is noticeably lower than in the first case study. This is mainly because of disparities between the kinematic chains of the RC excavator and the actual excavator. Most importantly, the RC excavator used in this case study did not have a controllable DOF at the bucket, which introduced some anomalies in its activity pattern.

Data collection and preparation
In the last case study, the proposed method was tested on two rollers, as examples of rigid body equipment, on an actual construction site in Amsterdam. The project entailed resurfacing 3 different sections of a main street, as shown in Fig. 16(a); therefore, the rollers had to relocate several times to get ready for the next batch of compaction. Both rollers were Hamm HW90 models. As shown in Fig. 16(b), each roller was equipped with one GNSS receiver and one IMU, yielding the same data shown in Fig. 4(b). In this case study, a different GNSS receiver was used (i.e., Trimble SPS851 [98]). The data were collected over a window of approximately 30 min with an update rate of 10 Hz, resulting in an average of 16,000 data points per roller after preprocessing. The data were manually labeled for four activities, namely relocating, idle, moving forward (i.e., towards the paver), and moving backward (i.e., away from the paver). All models presented in the first case study were trained on the data from one roller and then tested on the data from the other roller. Table 8 summarizes the results of this case study. Also, Fig. 17 presents a detailed analysis of the performance of the proposed method against the other baseline methods. As shown in the table, FRF outperformed the other models in terms of accuracy (82.6%) and recall (94.6%); however, FNN performed the best in terms of precision. As shown in Fig. 17, fractional feature augmentation improved the performance of all models, although the improvement in the case of SVM is very marginal. The noticeable observation in this case study is that, in pure form, all models had similar accuracy. In fractional mode, although FRF had a slightly better performance, the difference is not significant.
These findings can be construed as an indication that, for rigid body equipment, while fractional feature augmentation is as effective as in the case of articulated equipment, the extent to which RF contributes to improved accuracy is smaller. This is mainly because articulated equipment has considerably more complex kinematics, which makes the distinction between its activities more intricate. In the case of rigid body equipment, the underlying rules that define equipment activities are simpler and less dependent on complex interactions between DOFs; therefore, all classifiers are able to predict the activities with similar accuracy. Another important observation from this case study is that the formulation of orientation-based activities (e.g., moving forward and backward) complicates activity recognition based solely on velocity-related features. If these two activities are merged into a single activity, the performance increases significantly for all models, as shown in Table 8. Another possible approach is to include orientation-related features in the training of the models. It is expected that, while this would increase the accuracy for this case study, it would compromise the generalizability of the model because the orientation of movements differs between projects.

Discussion
The results of the case studies corroborate the hypothesis of this research that FRF can help develop an accurate activity recognition model without requiring large datasets. In the case studies, small datasets (i.e., approximately 10% of those used to develop recently proposed deep learning models) were used to train the FRF model, which was then successfully tested on new cases. This attests to the high generalizability of the proposed method despite much smaller training datasets. With accuracies of up to 94% for articulated equipment and 99% for rigid body equipment, the method is shown to have accuracy comparable to recent deep learning-based methods [54,55,89]. Based on the above, the main contributions of this paper are as follows: (1) it is shown that accurate and generalizable activity recognition models can be developed without the need for large training datasets or deep learning methods. This was achieved through the novel integration of the fractional feature augmentation method with RF. From a practical standpoint, by easing the need for comprehensive training datasets, this research helps develop accurate activity recognition models that can be used commercially in equipment monitoring systems; (2) it is shown that, regardless of the machine learning method employed, the fractional feature augmentation method can improve the performance of activity recognition models. This feature augmentation method is shown to have the potential to become a common part of future developments in ML-based equipment activity recognition; (3) to the best of the authors' knowledge, this is the first time that activity recognition classifiers have been rigorously tested for generalizability by investigating the impact of variations in the geometry of the equipment and the skill of operators. It is demonstrated that these variations indeed take a toll on the performance of the classifiers.
This signifies the importance of applying a similar testing regime in the future to ensure that activity recognition methods are robust and generalizable.
Based on the results of the case studies, the proposed method appears to be most effective and relevant for articulated equipment such as excavators and loaders. This is mainly because the kinematics of this type of equipment, and how changes in its DOFs contribute to transitions between activities, are inherently more complex. As mentioned in the introduction, RF classifiers are known to be better suited for complex multi-dimensional problems, which is better manifested in the case of articulated equipment. The same applies, to a lesser extent, to the use of fractional feature augmentation. While the proposed method of augmenting the feature domain proved effective in all cases, the improvement is more conspicuous for articulated equipment. Again, this is because fractional integrals and derivatives of velocity are expected to better account for complex kinematics, where the transition between activities cannot be attributed to a simple single-feature threshold, as in the case of the 3-activity roller.
This research complements the recent work on training dataset augmentation by Rashid and Louis [55]: while this research focused on expanding the feature domain through feature augmentation, the work of Rashid and Louis proposed a method (and showed the benefits thereof) for expanding the size of the training dataset through the generation of synthetic data. It can be postulated that combining the proposed fractional feature augmentation method with training data augmentation could significantly improve the generalizability of activity recognition classifiers in the future.
Another important point of discussion concerns the requirements of the proposed method in terms of sensor data. Because fractional feature augmentation discounts the update rate of the data, it is essential that the data be collected at a high frequency. At the same time, because pose estimation of articulated equipment mostly requires sensor data fusion, it is important to be able to synchronize the data. This especially pertains to location data, which are normally collected at a lower frequency than IMU readings. While up-sampling techniques can be used, it is preferable to avoid large frequency discrepancies between different sensors to prevent erosion of accuracy. On the same note, given that any new data entry to the model (for evaluation purposes) also needs to undergo fractional augmentation, the application of the proposed method for high-frequency real-time monitoring needs to be studied further.

Conclusions
The detection of the activities of construction equipment is important for the monitoring of a wide range of safety, productivity, and sustainability related practices. The existing activity recognition methods depend heavily on a comprehensive dataset, which is a limitation given the sheer number of variations in the size and shape of the equipment. This research proposed a novel Fractional Random Forest method to develop an accurate activity recognition model that can work with small datasets and remain generalizable. The proposed method was applied to three case studies where several scenarios were developed to test the generalizability of the proposed method.
Based on the analysis of the results, the following main conclusions can be reported: (1) With accuracies of up to 94% for articulated equipment and 99% for rigid body equipment, the proposed FRF was able to deliver performance comparable to recent deep learning-based activity recognition methods with only a fraction of the training data used by previous methods [54,55,89]. The low dependency on comprehensive training data provides the advantage of using a small training set to accurately predict the activities of a large set of equipment; (2) compared to the other baseline shallow learners (i.e., MLP and SVM), FRF has better performance in terms of accuracy, precision, and recall; (3) the FRF model was able to predict the activities of an actual piece of equipment of a different size/shape with an accuracy of 86.2%. In the extreme case of testing the model on a scaled RC excavator, FRF delivered an accuracy of 72.9%, which is still comparable to the results reported in recent machine learning-based methods. This indicates the high generalizability of the proposed method; and finally (4) regardless of the machine learning method, fractional feature augmentation can help improve the performance of activity recognition models.
In the future, the proposed method will be tested on a wider range of variations in equipment sizes and operator skill levels. Also, given that the movement pattern of the equipment may change on different types of terrain and for different types of work, it would be beneficial to collect data for equipment working on different terrains and performing a wider range of tasks. Finally, the proposed FRF method can be further enhanced by integrating it with the data augmentation method proposed by Rashid and Louis [55]. This will be addressed in the authors' future work.