Internet of things and ensemble learning-based mental and physical fatigue monitoring for smart construction sites

The construction industry substantially contributes to the economic growth of a country. However, it records a large number of workplace injuries and fatalities annually due to its hesitant adoption of automated safety monitoring systems. To address this critical concern, this study presents a real-time monitoring approach that uses the Internet of Things and ensemble learning. This study leverages wearable sensor technology, such as photoplethysmography and electroencephalography sensors, to continuously track the physiological parameters of construction workers. The sensor data are processed using an ensemble learning approach called the ChronoEnsemble Fatigue Analysis System (CEFAS), comprising deep autoregressive and temporal fusion transformer models, to accurately predict potential physical and mental fatigue. Comprehensive evaluation metrics, including mean square error, mean absolute scaled error, and symmetric mean absolute percentage error, demonstrated the superior prediction accuracy and reliability of the proposed model compared to standalone models. The ensemble learning model exhibited remarkable precision in predicting physical and mental fatigue, as evidenced by the mean square errors of 0.0008 and 0.0033, respectively. The proposed model promptly recognizes potential hazards and irregularities, considerably enhancing worker safety and reducing on-site risks.


Introduction
The construction industry is a global economic driver, substantially contributing to the gross domestic product of both developed and emerging economies [1]. Despite its economic importance, the industry is burdened with a high incidence of workplace accidents and safety hazards [2], establishing it as one of the most hazardous sectors [3]. Notably, these accidents are not only detrimental in terms of human cost but also lead to consequential declines in productivity, increased repair costs, and delays in project schedules [4-6]. Construction workers routinely perform physically demanding tasks [7-10], which increases their risk for chronic illnesses, cardiovascular diseases, and other health issues [8,10-12]. In addition to the high injury rate, their poor lifestyle habits and physical conditions further exacerbate health risks [13]. The Bureau of Labor Statistics data [14] highlights these issues, positioning construction as one of the sectors with the highest fatal occupational injury rates in the USA. Factors contributing to construction accidents include inadequate site management, the absence of necessary safety precautions, environmental hazards, unsafe behavior, and physical exhaustion [15,16]. Among these, unsafe worker behavior and fatigue are major contributors to accident incidence [10-12,17-21]. Moreover, attention lapses due to fatigue and heavy workload when operating construction equipment are key precursors to accidents [22,23].
Furthermore, the dynamic and hazardous nature of construction sites, coupled with the high prevalence of mental and physical fatigue among workers, constitutes a significant risk factor for accidents and injuries. The incidence of fatigue among construction personnel is alarmingly high, ranging from 10 to 40% across various trades [21,24,25], with 20-40% of different craft workers consistently surpassing the widely recognized physiological limits for manual labor [24]. This high prevalence can be attributed to the physically strenuous nature of construction tasks, which often involve long hours, shiftwork, and challenging environmental conditions [26,27]. Higher levels of mental and physical fatigue were significantly correlated with greater musculoskeletal pain [28]. This indicates that fatigue not only increases the risk of accidents but also exacerbates physical discomfort and potential long-term musculoskeletal damage. Occupational fatigue is recognized as a major safety and health concern, particularly among workers at higher risk of injuries and illnesses due to inescapable tiring conditions. More than 43% of laborers suffer from sleep deprivation, notably those who work nights or have long, irregular shifts, contributing to fatigue [29]. The pervasive nature of fatigue in this sector is further evidenced by its association with a third of all occupational injuries, akin to impairment levels experienced under the influence of alcohol [26]. A 12-h workday is linked to a 37% higher risk of sustaining injuries [30,31]. Additionally, compared with day shifts, the incidence of accidents is 18% higher during evening shifts and 30% higher during night shifts [32]. Fatigue has been empirically linked to impaired physical and cognitive function, leading to an increased risk of incidents as well as injuries [21,33]. Construction workers with severe fatigue have been found to have 1.77 times higher odds of experiencing occupational injuries compared to those without fatigue [25]. Furthermore, 7.7% of non-fatal injuries among Colorado's construction workers were shown to be related to fatigue [34]. In a study of 328 construction workers and 69 incidents, fatigue-related impairment was associated with a significantly elevated incident rate of 9.6 vs. 0.8 incidents per 1000 person-hours worked [13]. Mental fatigue, especially prevalent among roles that entail high cognitive demands, leads to decrements in cognitive performance. This is marked by a substantial rise in the time it takes to detect hazards and an increase in the instances of missed detections by operators as subjective mental fatigue levels rise [26,27]. The failure to detect hazards has been identified as a leading cause of accidents involving construction equipment, with 50% of fatal accidents being related to equipment operation [27]. The probability ratios for reporting a vehicle accident and a near-miss incident after a long work shift, as opposed to a standard-length shift, are 2.3 and 5.9, respectively [35]. The economic consequences of fatigue-related accidents are substantial, as they not only affect productivity, costing employers between $1200 and $3100 per employee annually, but also result in an estimated annual cost of $136.4 billion in the United States alone due to health-related lost productive work time [29,36]. The direct influence of fatigue on construction safety is challenging to quantify precisely, largely because established real-time fatigue monitoring methodologies are lacking. This deficiency highlights the need to develop fatigue monitoring technologies that mitigate the risks associated with mental and physical fatigue and ensure the safety and well-being of construction workers [37].
Smart technologies have revolutionized sectors such as healthcare; however, their adoption in construction is surprisingly limited [12,38]. In light of these concerns, the primary research question guiding this study is as follows: how can the integration of the Internet of Things (IoT) and ensemble learning methodologies enhance real-time fatigue monitoring in the construction industry? This study aims to address this question by providing an innovative, technology-driven solution to improve safety measures and mitigate health risks in the construction industry.

Overview of wearable technology in the construction industry
Wearable technology incorporates components such as computers, software, electronics, and sensors, seamlessly embedded in clothing or accessories that can be easily worn on the body (e.g., smart glasses, watches, or garments). It has recently garnered considerable attention across a multitude of industries due to its capability to monitor and analyze real-time biological, health, and safety parameters [39,40]. The most prominent application of such devices is in the healthcare sector, where wearable health devices integrated with sophisticated biosensor systems enable the continuous monitoring of an individual's physiological parameters, including heart rate (HR), heart-rate variability (HRV), blood volume pressure (BVP), and skin temperature, thereby providing instantaneous feedback [12,21,41].
To harness these opportunities for proactive safety management, various wearable sensing technologies have been extensively researched. This research includes the deployment of motion sensors such as inertial measurement units and an array of physiological sensors comprising heart-rate monitors, electrodermal activity sensors, skin temperature measurement devices, eye-tracking systems, and brainwave-monitoring apparatuses. These technologies aim to enhance safety measures by providing real-time data on workers' physical and mental states. Wearable technology has also been applied in the construction industry to improve the safety and health of construction workers. Applications in construction include monitoring physical activities such as walking, running, and lifting through embedded motion sensors and utilizing physiological sensors for continuous health status monitoring to identify potential risks like fatigue [8,42,43].
Moreover, wearable technology combined with sophisticated data analytical methods, such as machine learning and deep learning algorithms, can be used to analyze large amounts of data acquired by wearable devices and extract meaningful insights for safety and health management in the construction sector [2,3,44]. For instance, machine learning algorithms can predict fatigue risk based on physiological data collected by wearable devices [45,46]. This prediction capability can help prevent fatigue-related accidents, thereby enhancing the overall safety of construction workers. However, the adoption of wearable technology in construction faces challenges related to data accuracy, worker privacy, and comfort, which necessitate the development of robust data analysis techniques and addressing worker concerns to promote its usage [47-49]. To overcome these challenges, advanced wearable systems that combine multiple sensors and data analysis techniques have been widely researched. These systems comprehensively and accurately monitor the safety and health of construction workers [38-40, 50, 51]. For instance, a wearable system that integrates motion and physiological sensors was recently proposed to simultaneously monitor the physical activities and physiological status of construction workers [8].
In conclusion, while wearable technology has demonstrated immense potential for enhancing the safety and well-being of construction personnel, further research is required to overcome the existing challenges and realize its full potential. In this regard, integrating wearable technology with the latest data analytics methods, such as ensemble learning, is a promising avenue for future research.

Wearable technology for fatigue detection
Construction work is labor-intensive and involves performing strenuous tasks over extended periods, which causes fatigue and affects the mental and physical health of construction workers while considerably increasing the risk of on-site accidents [7,10,18-21]. This necessitates the development of reliable fatigue detection systems for the construction industry. Fatigue detection traditionally relies on self-reported fatigue measures or the supervision of a site manager, but these methods yield inaccurate and biased results due to subjectivity [18]. Real-time fatigue detection, which is pivotal for proactive accident prevention, remains a significant challenge. Wearable technology is utilized for this purpose. The devices, which are equipped with an array of sensors, such as heart-rate sensors and brainwave monitors, can continuously monitor workers' physiological status and detect signs of fatigue [45,52]. Deep learning algorithms are used to analyze sensor data, predict fatigue risk, and deliver timely warnings to prevent accidents [53]. The integration of machine learning techniques with wearable technology allows for fatigue detection, enhancing the timeliness of interventions and helping to prevent fatigue-related accidents [20,52,54].

Photoplethysmography sensor
A photoplethysmography (PPG) sensor is a noninvasive technology that measures cardiovascular parameters such as HR and changes in blood volume. Wearable devices containing PPG sensors, such as smartwatches, have been used for fatigue detection in the construction industry [20,22]. The sensor detects the blood volume changes that accompany each heartbeat, which are affected by physiological changes that occur during fatigue [21]. This technology is easy to use, unobtrusive, and can continuously monitor fatigue, among other advantages. However, PPG sensors may produce inaccurate readings due to motion artifacts, environmental light interference, and variable sensor placement [23]. PPG sensors measure the HR, BVP, and HRV, which are associated with fatigue [19]. HR denotes the number of heartbeats per minute, whereas HRV refers to the fluctuation in the intervals between consecutive heartbeats. A lower HRV and a higher HR are indicative of fatigue [20]. HR can be calculated directly from the time intervals between successive heartbeats. HRV is typically calculated as the root mean square of the successive differences between normal-to-normal (NN) intervals.
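The two calculations described above can be sketched as follows. This is a minimal illustration rather than the processing pipeline used in the study: the function names and the example NN intervals are hypothetical, and the intervals are assumed to be already artifact-free and expressed in milliseconds.

```python
import math

def heart_rate_bpm(nn_intervals_ms):
    """Mean heart rate (beats per minute) from NN intervals given in milliseconds."""
    mean_interval_ms = sum(nn_intervals_ms) / len(nn_intervals_ms)
    return 60000.0 / mean_interval_ms  # 60,000 ms per minute

def rmssd_ms(nn_intervals_ms):
    """Root mean square of successive differences (RMSSD) in milliseconds."""
    diffs = [b - a for a, b in zip(nn_intervals_ms, nn_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical NN intervals (ms); not taken from the study's dataset.
nn_example = [800, 810, 790, 805, 795]
hr_example = heart_rate_bpm(nn_example)   # mean interval of 800 ms gives 75 bpm
rmssd_example = rmssd_ms(nn_example)      # sqrt((100 + 400 + 225 + 100) / 4)
```

In this reading, a falling RMSSD together with a rising HR over a work shift would be consistent with the fatigue indicators cited in the text.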

Electroencephalography sensor
Electroencephalography (EEG) sensors monitor brain activity through the scalp and are often used in sleep studies to identify changes in different sleep stages, including the onset of fatigue [21,55]. Research indicates that EEG correlation patterns shift markedly across sleep stages, with significant interhemispheric synchrony during non-REM sleep, particularly in the theta and alpha bands, which are indicative of fatigue [56,57]. Additionally, a reduction in EEG complexity from awake states to deeper non-REM stages was found, suggesting a potential analog for physiological fatigue detection [58]. These findings align with similar transitions observed during general anesthesia, where shifts in alpha frequency could indicate recovery phases, drawing parallels with sleep transitions that may predict fatigue recovery in real-world scenarios [59]. Additionally, with increasing sleep deprivation, higher synchrony among EEG channels across various brain regions was observed, suggesting that increased connectivity serves as a compensatory mechanism to maintain cognitive functions despite fatigue [60]. In the construction industry, EEG sensors can directly detect changes in brainwave patterns in real-time to indicate the onset of fatigue [21]. Although these sensors are valuable tools, their usage can be challenging due to their susceptibility to noise, the need for specialized expertise for data interpretation, and concerns related to comfort and practicality of use in an active work environment [61]. The brainwave frequencies associated with fatigue states are theta (4-7 Hz) and alpha (8-12 Hz) waves. An increase in theta and alpha wave activities is indicative of fatigue [62]. The activity in these bands can be calculated directly from the EEG sensor data using the fast Fourier transform.
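As a rough illustration of the band-activity calculation, the sketch below computes theta and alpha power from a single-channel signal with a textbook radix-2 fast Fourier transform. It is not the study's implementation: the sampling rate, the synthetic test signal, and the function names are assumptions made for the example, and no windowing or normalization is applied. The band limits follow the theta and alpha ranges given in the text.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def band_power(signal, fs, f_lo, f_hi):
    """Sum of squared FFT magnitudes over bins whose frequency lies in [f_lo, f_hi] Hz."""
    n = len(signal)
    spectrum = fft(signal)
    total = 0.0
    for k in range(1, n // 2):  # positive frequencies only, DC excluded
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            total += abs(spectrum[k]) ** 2
    return total

# Synthetic 2-s signal (assumed fs = 256 Hz): a 6 Hz theta component with
# amplitude 1.0 plus a 10 Hz alpha component with amplitude 0.5.
fs = 256
n_samples = 512
eeg_example = [
    math.sin(2 * math.pi * 6 * t / fs) + 0.5 * math.sin(2 * math.pi * 10 * t / fs)
    for t in range(n_samples)
]
theta_power = band_power(eeg_example, fs, 4, 7)
alpha_power = band_power(eeg_example, fs, 8, 12)
```

Because both synthetic components fall exactly on FFT bins, the theta-to-alpha power ratio equals the squared amplitude ratio. Real EEG would typically need windowing and artifact removal before such a comparison is meaningful.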

Different sensor combinations
To improve fatigue detection accuracy, different types of sensors can be combined in wearable devices [63]. For instance, cardiovascular and emotional changes associated with fatigue can be comprehensively monitored by combining PPG and electrodermal activity sensors [7,21]. Similarly, physiological changes caused by fatigue can be detected by integrating EEG and skin temperature sensors [21]. However, the development and calibration of systems that integrate multiple sensors present challenges in terms of complexity, power consumption, and potential data redundancy [64]. Furthermore, privacy and comfort-related concerns might increase when multiple sensors are involved, potentially affecting worker acceptance of the technology [48].
As each sensor type and combination offers unique strengths and limitations for fatigue detection in the construction industry, their combination can be used to overcome the associated challenges and enhance accuracy and reliability [18,21]. Using a combination of different sensors in wearable devices, various fatigue-associated metrics, such as HR, HRV, and theta and alpha waves [65], can be simultaneously measured to monitor a worker's fatigue level comprehensively and reliably.

Computational approaches for fatigue detection
The integration of computational methods in fatigue detection has surged. While individual machine learning models have carved a path for innovative fatigue assessment, they are not without their challenges. Analytical approaches leveraging machine learning or simple statistical techniques, including logistic and linear regression, have been used to explore the data derived from wearable sensors and classify fatigue levels in construction workers [20,21]. Various computational algorithms, including decision trees, boosted trees, support vector machines (SVM), random forest (RF), and artificial neural networks (ANN) [54,66], have been used for fatigue detection. However, these methods pose challenges, such as overfitting and feature selection [66]. One study utilized physiological parameters such as heart rate, breathing rate, and skin temperature, measured by textile-based wearable sensors. The method incorporated penalized logistic regression and multiple linear regression models, as well as supervised and unsupervised machine learning approaches, including ANN, to develop a real-time fatigue assessment system; however, the study's lack of generalizability beyond bar benders questions its broader applicability. The method also suffered from a high quantity of motion artifacts, and the lack of correlation with blood lactate levels posed an additional limitation [20].
Another approach involved a non-invasive method using a wearable respiration sensor and random forest classifier, which showed high accuracy and was less affected by motion artifacts. Even so, there was potential variability in sensor performance, and the study heavily relied on a subjective questionnaire for labeling, which could introduce bias [7]. Deep learning algorithms applied in one study [18] for motion capture and biomechanical analysis were hindered by environmental complexities and the dependence on high-quality RGB cameras, which could potentially compromise the accuracy of the model under less ideal conditions. A study [10] assessed workers' physical demand through physiological signals captured by wearable biosensors, employing a Gaussian kernel support vector machine for classifying physical demand levels. The limitations highlighted the need for validation on a larger, more diverse subject pool and the bias of subjective fatigue assessments. Another study [67] developed an automatic biomechanical workload estimation method using computer vision and smart insoles with pressure sensors, but the error rate of the load estimation exceeded 15%, and the noise and signal artifacts affecting accuracy were not examined; thus, there is a potential for inaccuracies in real-world construction environments. A study also employed EMG and IMU sensors along with a recurrent neural network for continuous fatigue monitoring. The focus of that study was on work severity classification rather than determining real-time physical fatigue. Despite the good results, the potential variability in individual worker characteristics such as work experience, age, and health status can affect the system's accuracy. Furthermore, continuously monitoring fatigue levels for multiple tasks performed in short intervals can be challenging [68]. Wang et al. [69] implemented a wearable EEG system for monitoring attention and vigilance; however, the system's effectiveness can vary with individual workers. A different study monitored physiological parameters such as heart rate and breathing rate, but it was limited by the small sample size, which may not represent a broader population of construction workers [70]. A study [71] looked at wearable-device-based data for measuring construction workers' psychological status, but the initial combined data modeling may not be sufficiently accurate for all individuals. In addition, the K-means clustering algorithm was adapted to classify the physical data into only three distinct classes. Another study [72] used a wristband-type wearable health device to collect heart rate data and measure the physical demands of construction personnel. The study, however, does not account for all individual, environmental, and mental factors affecting physical demands, which could influence the results. Wearable eye-tracking technology can also be used to evaluate the impact of mental fatigue on hazard detection abilities. Yet, the study was only conducted in a controlled lab environment with only a few participants. Additionally, the use of wearable tracking technology may not capture mental fatigue accurately [42].
Ensemble learning operates on the principle of leveraging a collection of "weak learners," which are individual models trained to address the same issue, to formulate a stronger predictive model through their aggregation. Weaker learners can be combined to develop a model that yields precise and robust results [73,74]. One study investigated these ensemble classifiers for physical exertion modeling along with other models including K-nearest neighbor (KNN), SVM, and DT. However, the experiments were only conducted in a controlled environment that did not reflect the complexity of actual construction work, focusing on a single task [19]. In another study, boosted trees, bagged trees, and RUSBoosted trees were proposed for fatigue classification. The prediction accuracy of the individual models was compared to the ensemble methods, resulting in enhanced fatigue detection accuracy for the ensemble models. Even so, the overall prediction accuracy was only 82.60%, 79.10%, and 80.60% for the boosted trees, bagged trees, and RUSBoosted trees, respectively. The study was limited by the reliability of the subjective Borg's RPE scale for fatigue assessment [21]. Ensemble learning was also used to analyze human activity data gathered by smartphone sensors. Results revealed that ensemble learning can enhance the accuracy of human activity recognition, providing more reliable results than several stand-alone algorithms [50].
The computational approaches using standalone models, while promising, still face challenges such as overfitting, feature selection, bias, lack of generalizability, and the nascent stage of application in wearable devices for safety and health monitoring in the construction industry [3,9,66,74]. Ensemble learning, which combines predictions of machine learning models trained on different sets of sensor data, has shown potential for higher fatigue detection accuracy [21,50]. However, selecting appropriate models for the ensemble and combining their predictions remains a challenge.
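One simple way to combine the predictions of several regression models is to weight each model by the inverse of its validation error. The sketch below illustrates that generic idea only; it is not the combination rule used by CEFAS, which this section does not specify. The function names, the fatigue scores, and both sets of model predictions are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean square error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def inverse_mse_weights(y_val, preds_by_model, eps=1e-12):
    """Weights proportional to 1 / validation MSE, normalized to sum to 1.

    eps avoids division by zero if a model is perfect on the validation set.
    """
    inverse_errors = [1.0 / (mse(y_val, preds) + eps) for preds in preds_by_model]
    total = sum(inverse_errors)
    return [w / total for w in inverse_errors]

def combine(preds_by_model, weights):
    """Weighted average of the models' predictions, point by point."""
    n_points = len(preds_by_model[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, preds_by_model))
        for i in range(n_points)
    ]

# Hypothetical validation targets and predictions from two standalone models.
y_val = [0.2, 0.4, 0.6, 0.8]
pred_a = [0.25, 0.45, 0.65, 0.85]  # overestimates by 0.05
pred_b = [0.10, 0.30, 0.50, 0.70]  # underestimates by 0.10
weights = inverse_mse_weights(y_val, [pred_a, pred_b])
blended = combine([pred_a, pred_b], weights)
```

In this toy case the two models err in opposite directions, so the blend has a lower error than either model alone. That outcome is not guaranteed in general, and in practice the weights would be fitted on data held out from the final evaluation.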

Challenges in fatigue detection
The aim to develop an effective fatigue assessment approach for construction workers has led to the development of various computational and non-computational fatigue detection methods. In spite of the improvements, each approach comes with its own unique set of challenges and limitations. One challenge is the accurate detection and assessment of fatigue [7, 10, 18-21, 67, 68, 71, 75]. In addition, noise and signal artifacts can affect the accuracy of wearable sensing technology [72,75]. There is also a scarcity of studies that have directly examined the relationship between changes in physiological metrics and physical fatigue in construction workers [57].
Approaches utilizing wearable sensors to measure physiological parameters such as heart rate, breathing rate, and skin temperature face several limitations. The measurements suffer from a high quantity of movement artifacts, which affect accuracy during actual construction tasks [20], and from environmental influences that skew the data [21,51]. These wearable sensors can also cause discomfort or distraction to construction workers, affecting the adoption of these technologies [48]. Furthermore, several experiments were conducted only in controlled laboratory settings, not reflecting actual construction site conditions [27]. Variations in sensor placement can lead to significant data discrepancies, questioning the reliability of the measurements [2,44]. Some studies also suffer from a lack of generalizability [69].
Machine learning models, such as regression models and neural networks, are not immune to overfitting and feature selection issues, which can affect their generalizability and real-world application [20]. The necessity for large datasets to train these models often poses an obstacle, as does the variability in individual responses to fatigue [7]. Privacy concerns related to continuous physiological monitoring also cannot be overlooked [19]. The generalizability of findings is a recurrent concern, with studies often limited to specific tasks or controlled environments that do not mirror the complexities of actual construction sites [7,10,18-21]. Fatigue detection systems struggle with specificity, frequently being unable to differentiate between fatigue and other physiological states, which is crucial for accurate assessments [76]. Technologies such as 3D motion capture and deep learning networks require optimal conditions for operation, and their performance is limited by the quality of the equipment and the algorithms' robustness [18]. The higher accuracy of fatigue classification often requires multiple sensor data, yet using multiple sensors adds complexity to the system [21].
This study aims to address the following challenges, contributing toward more effective fatigue monitoring in the construction sector:

1. The development of an IoT sensor and ensemble learning-based ChronoEnsemble Fatigue Analysis System (CEFAS), which enables the monitoring of both physical and mental fatigue. The sensors can be effortlessly integrated into safety helmets and worn on the wrist. This integration into everyday safety gear ensures that the workers can comfortably perform their duties while their fatigue levels are continuously monitored.
2. The ensemble learning model extracts data from various sensors and applies sophisticated learning algorithms to predict physical and mental fatigue (PF and MF) not only in real-time but also proactively. Since people often do not realize when they are fatigued, early detection is crucial. By identifying the signs of fatigue before they become apparent, the model enables preventative measures to be taken, significantly reducing the risk of accidents, unsafe behaviors, and injuries in the construction sector.
3. A deep ensemble learning model is developed for accurately and reliably detecting MF and PF. Unlike traditional machine learning methods that categorize fatigue into discrete classes, this model provides a continuous and precise measure of fatigue. The ensemble approach can also handle noisy data, reduce overfitting, and improve feature selection.
4. To address the lack of openly available datasets suitable for training deep learning models in this domain, we assembled an exhaustive dataset in laboratory settings. This dataset contains physiological signal measurements recorded by photoplethysmography and electroencephalography sensors. The dataset provided a solid foundation for training and testing the deep ensemble learning model.
In summary, the proposed approach can considerably enhance the health and safety outcomes for construction personnel and contribute to a safer and healthier work environment in the construction industry. Through these contributions, we aspire to advance the current state-of-the-art in fatigue detection, ensuring safer work environments and improved worker well-being. The remainder of this paper is organized as follows: "Methodology" discusses the methodology, including the methodological process, dataset building, ensemble model, and prototype implementation. "Results and discussion" highlights the results and discusses the evaluation of the proposed approach. "Conclusions and future work" explores the limitations and future scope of the proposed approach and concludes the study.

Methodology
Herein, a deep ensemble learning-based fatigue detection approach is proposed for construction workers. The research combined PPG and EEG sensor data with an ensemble learning model to monitor and predict physical fatigue and mental fatigue in real-time. Through meticulous data collection, preprocessing, transformation, and the implementation of sophisticated models such as DeepAR and TFT, the study endeavors to accurately forecast fatigue levels. The methodological procedure is outlined below.

Methodological process
First, the research problem, i.e., the necessity for an enhanced safety management system in the construction industry, was identified. Then, the use of wearable technology, biosensor systems, and ensemble learning was analyzed through an exhaustive literature review. This study was meticulously constructed to ensure a comprehensive methodology for data accumulation and analysis. The methodological process is shown in Fig. 1. Based on the literature review, we systematically selected appropriate wearable devices, such as PPG and EEG sensors, accompanied by a microcontroller.
Subsequent to this selection, data were collected using these wearable sensors. The collected data were preprocessed to construct a robust and usable dataset, which was used to train the developed ensemble learning model.
After the training phase, the performance of the model was thoroughly analyzed using various evaluation metrics to evaluate the efficacy of the proposed approach. Lastly, a prototype was developed and validated under real-time conditions. Each phase of this methodological process was executed with meticulous attention to the research objectives and ethical considerations intrinsic to the study.
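The abstract names three evaluation metrics: mean square error (MSE), mean absolute scaled error (MASE), and symmetric mean absolute percentage error (sMAPE). The sketch below shows one common definition of each. Several variants of MASE and sMAPE exist, and the exact forms used in the study are not stated in this section, so the formulas here are assumptions. The function names and numbers are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean square error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mase(y_true, y_pred, y_train):
    """Mean absolute scaled error.

    The forecast MAE is divided by the in-sample MAE of a naive one-step
    forecast (each training value predicted by the previous one).
    Assumes y_train is not constant, so the denominator is nonzero.
    """
    mae_forecast = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    naive_errors = [abs(b - a) for a, b in zip(y_train, y_train[1:])]
    mae_naive = sum(naive_errors) / len(naive_errors)
    return mae_forecast / mae_naive

def smape(y_true, y_pred):
    """Symmetric MAPE in percent, using the 2|y - f| / (|y| + |f|) form.

    Assumes no pair has both the actual and the forecast equal to zero.
    """
    terms = [2 * abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)]
    return 100 * sum(terms) / len(terms)

# Hypothetical training history, actual values, and forecasts.
y_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_true = [6.0, 7.0]
y_pred = [5.5, 7.5]
mse_example = mse(y_true, y_pred)
mase_example = mase(y_true, y_pred, y_train)
smape_example = smape(y_true, y_pred)
```

A MASE below 1 means the forecast outperforms the naive one-step baseline on the training history, which makes the metric comparable across signals with different scales.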

Dataset formulation and refinement
The dataset used for training the ensemble learning model comprises physiological metrics recast into a time-series configuration. This transformation is vital for the effective use of the deep autoregressive (DeepAR) and temporal fusion transformer (TFT) models, which are designed for time-series data. Various physiological metrics, including HR, BVP, HRV, and EEG signals, were systematically collected from 11 participants, comprising males with a mean age of 25.33 years and a standard deviation of 1.21 and females with a mean age of 29.4 years and a standard deviation of 6.19. Table 1 lists the details of the participants. Data were acquired within a specified timeframe to ensure continuous and uninterrupted collection. The data were combined into a temporally annotated dataset using meticulous procedures, such as data acquisition, preprocessing, transformation, and segmentation, to ensure its robustness and suitability for subsequent analysis.
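The segmentation step can be pictured as cutting each recording into overlapping windows, where a block of past samples (the context) is paired with the samples that follow it (the forecast horizon). The sketch below is a generic illustration of that idea, not the study's segmentation code. The function name, window lengths, and stride are hypothetical.

```python
def make_windows(series, context_len, horizon, stride=1):
    """Split a time series into (context, target) pairs.

    Each pair holds context_len past samples and the horizon samples that
    immediately follow them; consecutive windows start stride samples apart.
    """
    windows = []
    last_start = len(series) - context_len - horizon
    for start in range(0, last_start + 1, stride):
        context = series[start:start + context_len]
        target = series[start + context_len:start + context_len + horizon]
        windows.append((context, target))
    return windows

# Hypothetical 10-sample signal: 4 samples of context, 2 samples to forecast.
signal_example = list(range(10))
windows_example = make_windows(signal_example, context_len=4, horizon=2, stride=2)
```

With these settings the example yields three windows, starting at samples 0, 2, and 4. A larger stride reduces the overlap between windows and therefore the correlation between training examples.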

Data acquisition protocol
The initial preparations were completed in approximately 10 min. Before data collection, instruments were rigorously checked to ensure their operational readiness. Participants were acquainted with the objectives and protocols of the study and given an informed consent form for endorsement. The volunteers filled out a comprehensive demographic and health-related questionnaire. This step ensured that the participants met the inclusion criteria of the study and identified any potential exclusions.
Baseline data were acquired in 20 min. The mental fatigue of participants was assessed using scales such as the Stanford sleepiness scale, the Swedish occupational fatigue inventory (SOFI) [77], and the multidimensional fatigue inventory (MFI) [78]. Then, a 10-min resting-state EEG focused on the prefrontal cortex (PFC) was recorded using the Muse 2 headband. The HR and HRV were simultaneously documented. A standardized reaction time test was then administered via a dedicated computational interface to gauge mental alertness. This was followed by the PF induction phase, lasting 60 min. Participants began with a 30-min cycling task, segmented by intensities: low (10 min), medium (10 min), and high (10 min). Reaction times were evaluated after each segment. Then, they performed a 10-min jump rope exercise, followed by ball tasks such as squat throws and wall passes, for another 10 min.

Preliminary analysis
A 5-min recuperative interval was provided between each activity, and reaction times were assessed after each exercise. Post-exertion data were acquired for 40 min. Participants re-administered the Stanford sleepiness scale, SOFI, and MFI to gauge the shifts in mental fatigue. A follow-up 10-min resting-state PFC EEG was then performed. The post-activity PPG and EEG signals were recorded. Lastly, a reaction-time test was conducted to detect any changes in cognitive responsiveness. The data collation and preliminary analysis were completed in approximately 10 min. All sensor data were curated and structured into a dataset. We ensured a consistent mapping of participant IDs across various metrics, including subjective ratings, EEG recordings, cardiovascular metrics, reaction times, and physical performance indicators. Figure 2 depicts the experimental procedure for the assessment.

Fig. 2 Experimental procedure of the fatigue assessment

Data preprocessing
The original physiological data extracted from the sensors were processed in several stages to ensure their suitability for the ensemble learning model. Missing data points, outliers, and erroneous readings, which could distort model performance, were identified and cleaned using data visualization and statistical analysis. The multi-stage processing involved several filters and techniques, each addressing different types of noise and artifacts. First, the Hampel filter was deployed to correct for sensing failures and outliers typically resulting from transient spikes and artifacts. This filter examines each data point relative to its neighbors and replaces those deviating by more than a threshold (set to three times the median absolute deviation) with the median of the surrounding values, thus protecting the integrity of the physiological signals. Additionally, a third-order one-dimensional median filter was utilized to ensure data integrity. Unlike traditional linear filters, the median filter preserves the fidelity of the temporal sequence by replacing each data point with the median of a predefined window of neighboring points, mitigating the influence of singular aberrant values or spikes that may distort the true signal. This nonlinear filtering approach was particularly adept at maintaining the edge features of signals, which is essential for the accurate characterization of time-dependent physiological states in both EEG and PPG data. Additionally, data-cleaning techniques were used to remove spikes and aberrant readings. To smooth the signals while conserving significant features such as peak heights and troughs, we employed the Savitzky-Golay filter. By fitting a polynomial to a moving window of data points, this filter refines the waveform without distorting its essential morphology. This step was critical for subsequent analytical steps such as feature extraction. We also incorporated a moving average filter to address short-term fluctuations due to random signal noise. By computing the mean within a moving window, this filter mitigates the impact of transient noise while allowing the underlying trend of the data to be discerned with greater clarity.
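The filter chain described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window lengths, the Savitzky-Golay polynomial order, and the Gaussian scale factor 1.4826 applied to the median absolute deviation are assumptions, since the text only specifies the three-MAD threshold and the third-order median filter.

```python
import numpy as np
from scipy.signal import medfilt, savgol_filter


def hampel_filter(x, half_window=3, n_sigmas=3.0):
    """Replace points that deviate from the local median by more than
    n_sigmas times the (scaled) median absolute deviation."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    k = 1.4826  # assumed scale factor relating the MAD to a Gaussian std
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = k * np.median(np.abs(window - med))
        if np.abs(x[i] - med) > n_sigmas * mad:
            y[i] = med  # outlier: substitute the local median
    return y


def moving_average(x, width=5):
    """Mean within a sliding window (same output length as the input)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")


def clean_signal(x):
    """Hampel -> median (order 3) -> Savitzky-Golay -> moving average."""
    x = hampel_filter(x)
    x = medfilt(x, kernel_size=3)
    x = savgol_filter(x, window_length=11, polyorder=3)  # assumed settings
    return moving_average(x, width=5)
```

Applying `clean_signal` to a raw PPG or EEG channel returns an array of the same length; in practice the window sizes would be tuned to the sampling rate of each sensor.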
Furthermore, baseline removal techniques were used to remove the direct current (DC) component and standardize the signal within a normal range.This enhances the signal quality for subsequent analytical steps.For EEG signals, we addressed the issue of baseline wander due to slow drifts by applying a high-pass filter with a cutoff frequency lower than the frequencies of interest.This effectively removed the drifts while leaving the higher frequency brain wave components intact.In PPG signals, baseline wander is often attributable to respiration, body movements, and other physiological variations.
To rectify this, we considered using techniques like ensemble empirical mode decomposition (EEMD), which adaptively removes baseline trends based on the signal's intrinsic oscillatory modes.These baseline removal techniques were crucial for compensating for low-frequency non-stationarities, potentially induced by sensor movements or attachment issues.By standardizing the signals and aligning them closer to their zero reference line, we facilitated a consistent interpretation of the physiological data, which proved significantly beneficial when comparing temporal segments across individual subjects.
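As a minimal sketch of the high-pass approach to baseline wander described for EEG, the following uses a zero-phase Butterworth filter. The cutoff of 0.5 Hz, the filter order, and the sampling rate in the usage example are illustrative assumptions; the text only states that the cutoff lies below the frequencies of interest.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def remove_baseline(x, fs, cutoff_hz=0.5, order=2):
    """Zero-phase high-pass filter that removes the DC offset and slow drifts.

    fs        : sampling rate in Hz
    cutoff_hz : assumed cutoff, chosen below the frequencies of interest
    """
    b, a = butter(order, cutoff_hz, btype="highpass", fs=fs)
    return filtfilt(b, a, x)


# Usage with a synthetic signal: a 10 Hz rhythm on top of an offset and a slow drift.
fs = 256  # hypothetical sampling rate
t = np.arange(0, 20, 1 / fs)
rhythm = np.sin(2 * np.pi * 10 * t)
raw = rhythm + 5.0 + 2.0 * np.sin(2 * np.pi * 0.05 * t)
detrended = remove_baseline(raw, fs)
```

Forward-backward filtering (`filtfilt`) is used here so that the filter introduces no phase shift, which keeps waveform timing intact for later feature extraction.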

Data segmentation
Signal segmentation followed the preprocessing step.The primary challenge was to determine the optimal window size for fatigue detection, for which window sizes of 1-20 s, with 1-s increments, were tested.After determining the optimal window size, the cleaned and filtered data were segmented into overlapping windows corresponding to the optimal size.Feature extraction was first performed for data transformation.Features from PPG signals were retrieved, considering time and frequency domains, to compile a comprehensive dataset for analysis.Statistical measures, such as mean, variance, and standard deviation, were computed for various physiological parameters derived from the sensor data.Then, feature selection was performed using techniques such as the backward-elimination wrapper method.As the feature set was extensive, dimensionality reduction methods, such as principal component analysis, were used.Lastly, data transformation was performed using the Z-normalization technique to scale the selected features to a standardized range, such as [0, 1].Similar steps were followed for EEG processing, emphasizing filtering and artifact removal, visual assessment, feature extraction from frequency bands, and data segmentation.The specifics of the final dataset are given in Table 2.

Proposed system architecture
To predict PF and MF accurately, the proposed system combines the DeepAR and TFT models. DeepAR, a probabilistic forecasting model developed by Amazon, uses long short-term memory (LSTM) cells to capture typical dynamics within time series and incorporates time-dependent covariates influencing the target variables. The TFT model, designed for multivariate time-series forecasting, uses self-attention mechanisms, LSTM components, and gating mechanisms to decipher complex temporal dynamics within various time series. Because the two models capture different dependencies and patterns within the dataset, an ensemble model is incorporated into the system. This model combines the DeepAR and TFT predictions to reduce the risk of overfitting and to improve robustness and prediction performance. Thus, the system architecture can handle diverse and heterogeneous time-series data, which is essential for predicting PF and MF.

DeepAR model
DeepAR, a probabilistic forecasting model developed by Amazon, is designed to handle multiple time series exhibiting common seasonal patterns and characteristics [79]. As shown in Fig. 3, the architecture is built on recurrent neural networks, specifically LSTM cells.
For the probabilistic forecast of an input vector $x_t$ at time $t$, the hidden state of the LSTM, denoted as $h_t$, is computed using Eq. (1).

$$h_t = \mathrm{LSTM}(x_t, h_{t-1}) \tag{1}$$

The likelihood $p(y_t \mid \theta)$ of observing a value $y_t$ at time $t$ is parameterized by a neural network, as shown in Eq. (2).

$$p(y_t \mid \theta) = p\big(y_t \mid \mathrm{NN}(h_t)\big) \tag{2}$$

where $\theta$ represents the model parameters and NN denotes the neural network that maps the hidden state of the LSTM to the parameters of a chosen distribution. The training objective of DeepAR is to maximize the log-likelihood of the observed data, as shown in Eq. (3).

$$\mathcal{L}(\theta) = \sum_{t} \log p(y_t \mid \theta) \tag{3}$$
DeepAR proficiently captures typical dynamics within a time series and comprehensively analyzes seasonality and trend patterns.Further, the model is equipped to incorporate time-dependent covariates that may influence the target variable.The LSTM architecture learns from the past values of the time series to predict future values.Moreover, the model outputs a probability distribution over the prospective values, enabling the quantification of uncertainty within the forecasts.
The implementation process of the DeepAR model is explained in Algorithm 1, in which the target variables denote the PF and MF scores. The model employs additional covariates (HRV, HR, and gamma, beta, alpha1, alpha2, theta, and delta brain waves) to augment the forecasting of PF and MF scores. During training, the architecture can learn the relationship between these covariates and the target variables. The model estimates uncertainty by generating a full distribution of probable outcomes, an advantageous feature for estimating fluctuating fatigue levels. The model also captures intricate temporal dependencies in physiological signals and fatigue scores. Multiple time series were generated concurrently for all participants, i.e., multiple measurements were gathered from multiple individuals.
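To make the likelihood-based training objective and the resulting prediction intervals concrete, the sketch below assumes a Gaussian output distribution. This choice is an assumption for illustration; the text only refers to "a chosen distribution", and the function names are hypothetical.

```python
import numpy as np


def gaussian_log_likelihood(y, mu, sigma):
    """log p(y | mu, sigma) for a Gaussian output distribution (assumed)."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)


def negative_log_likelihood(y, mu, sigma):
    """Training loss: maximizing the log-likelihood equals minimizing this sum."""
    return -np.sum(gaussian_log_likelihood(y, mu, sigma))


def prediction_interval(mu, sigma, n_samples=2000, level=0.95, seed=0):
    """Monte Carlo prediction interval from sampled forecast trajectories."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(n_samples, len(mu)))
    lower = np.quantile(samples, (1 - level) / 2, axis=0)
    upper = np.quantile(samples, 1 - (1 - level) / 2, axis=0)
    return lower, upper
```

In a full model, `mu` and `sigma` would be produced per time step by the network from the LSTM hidden state; here they are plain arrays so that the objective can be inspected in isolation.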

TFT model
The TFT model (Fig. 4) was developed specifically for forecasting multivariate time series using self-attention mechanisms, LSTM components, and gating mechanisms [80]. This model deciphers complex temporal dynamics, including both local and long-term dependencies within multiple time series. An integral attribute of the TFT model is its ability to produce quantile forecasts, which enables it to predict outcomes across a continuum of probabilistic quantiles. These prediction intervals represent the range of values within which the actual fatigue scores are expected to fall with a certain probability. The model thereby generates a distributional forecast that encapsulates a spectrum of fatigue levels, each associated with a specific quantile, rather than yielding a single point estimate. The ability to forecast across multiple quantiles is valuable in construction safety management because it provides a nuanced understanding of potential fatigue scenarios. Such a probabilistic approach is instrumental in formulating comprehensive risk mitigation strategies by quantifying the uncertainty inherent in fatigue predictions.
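Quantile forecasts of this kind are commonly trained with the pinball (quantile) loss. The text does not state which loss was used, so the following is an illustrative assumption rather than a description of the authors' setup.

```python
import numpy as np


def quantile_loss(y_true, y_pred, q):
    """Pinball loss for quantile level q in (0, 1).

    Under-prediction is penalized with weight q and over-prediction
    with weight (1 - q), so minimizing it targets the q-th quantile.
    """
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))
```

For example, with `q = 0.9` a forecast that falls below the observed fatigue score is penalized nine times more heavily than one that overshoots by the same amount, which pushes the forecast toward the upper end of the distribution.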
The TFT model employs attention mechanisms to weigh the importance of different time steps. Given an input sequence $X = \{x_1, x_2, \ldots, x_t\}$, the self-attention mechanism computes a weighted sum of the inputs, as shown in Eq. (4).

$$\mathrm{Attention}(x_t) = \sum_{i=1}^{t} \alpha_{ti}\, x_i \tag{4}$$

where the attention weight $\alpha_{ti}$ is determined using Eq. (5).

$$\alpha_{ti} = \frac{\exp\big(\mathrm{score}(x_t, x_i)\big)}{\sum_{j=1}^{t} \exp\big(\mathrm{score}(x_t, x_j)\big)} \tag{5}$$

The score function can be a dot product or another similarity measure between input vectors. The TFT model also employs gating mechanisms to control the flow of information, as shown in Eq. (6).

$$g_t = \sigma(W_g x_t + b_g) \tag{6}$$

Here, $\sigma$ is the sigmoid activation function, and $W_g$ and $b_g$ are the gating parameters. The ability to assign varied weights to different points in the time series according to their relevance enhances the model's proficiency in understanding intricate interdependencies between variables across different temporal contexts.
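The attention and gating operations referenced in Eqs. (4) to (6) can be sketched in a few lines. This is a simplified illustration that uses a plain dot-product score on the raw inputs; the actual TFT uses learned query, key, and value projections with multiple heads, and the function names here are hypothetical.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def self_attention(X, t):
    """Attention weights and weighted sum for time step t.

    X : array of shape (T, d), one row per time step
    """
    scores = X @ X[t]      # dot-product score between x_t and every x_i
    alpha = softmax(scores)  # attention weights, non-negative and summing to 1
    context = alpha @ X      # weighted sum of the inputs
    return alpha, context


def gate(x, W_g, b_g):
    """Sigmoid gate controlling how much information passes through."""
    return 1.0 / (1.0 + np.exp(-(W_g @ x + b_g)))
```

Because the weights form a probability distribution over time steps, inspecting `alpha` gives a direct view of which past observations the model treats as most relevant for a given forecast step.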
The TFT model can handle diverse and heterogeneous time-series data. This is a characteristic of our custom dataset, where training is performed on a multitude of time series, each demonstrating unique attributes and distributions. Three types of features are distinctive to the TFT model: time-dependent data with known future inputs, time-dependent data known only up to the present, and static or categorical variables. These features allow the model to flexibly integrate various types of auxiliary information for the forecasting and modeling of multivariate time series. The TFT model is vital for our CEFAS model as it captures the local and long-term temporal dependencies between MF and PF scores and physiological variables, including HRV, HR, and gamma, beta, alpha1, alpha2, theta, and delta brain waves. These physiological signals display complex temporal dynamics, necessitating a model that can interpret data beyond the capabilities of traditional LSTM and RNN models. The TFT model can accommodate an array of signal types and ranges, including continuous variables such as HRV and HR. Algorithm 2 illustrates the computational steps of the TFT model.

1. Let X be the input time-series data.
2. Preprocess the time-series data.
   a. Normalize the time-series data to a suitable scale, typically between 0 and 1.
   b. If necessary, transform the time series to be stationary (mean and variance do not change over time).
   c. Split the time series into training and test sets.
3. Define the TFT model.
   a. Initialize the model with a specified number of layers and hidden units in each layer.
   b. Define the data processing layers:
      i. Variable selection networks to identify relevant features.
      iii. Temporal processing layers, including positional encoding and temporal convolutional layers.
4. Train the TFT model.
   a. Forward pass:
      ii. Pass the processed data through the self-attention mechanism and gated skip connections.
      iii. Predict the future values of the time series with the final output layer.
   b. Calculate the loss between the predicted values and the actual values.
   c. Backpropagate the loss through the model and update the model parameters.
5. Test the TFT model.
   a. Pass the test set through the trained TFT model.
   b. Calculate the loss between the predicted values and the actual values in the test set.
6. If the loss is acceptable, the model is ready for forecasting.
   Else, adjust the model parameters (number of layers, hidden units, etc.) and go back to step 3.

For forecasting:

Ensemble model
The ensemble model uses a stacking technique that leverages the strengths of the base models [73,74], DeepAR and TFT, to analyze time-series datasets and generate more accurate forecasts for PF and MF. Herein, the time-series dataset was first used to train the base models, DeepAR and TFT, which then separately created forecasts for PF and MF. Then, a two-step meta-learning process was initiated: the PF forecasts generated by the base models were used as inputs for one meta-learner, and the MF forecasts were channeled into another meta-learner. This dual meta-learning strategy allowed for a more refined final forecast. In the ensemble learning architecture, forecasts from the base models are integrated using XGBoost meta-learners to yield refined ensemble predictions for both physical and mental fatigue, which mitigates overfitting risks relative to a single deep learning model. When the base forecasts contradict each other, the meta-model reconciles them into a single prediction. The ensemble learning model is illustrated in Fig. 5.
The eXtreme gradient boosting (XGBoost) algorithm [81] is an optimized implementation of gradient boosting designed to improve the efficiency of large-scale machine learning tasks. It builds on the decision tree algorithm, systematically refining model predictions through iterative learning from previous errors. Given a differentiable loss function $L(y, \hat{y})$, where $y$ is the true value and $\hat{y}$ is the predicted value, XGBoost constructs an additive model as shown in Eq. (7).

$$\hat{y} = \sum_{t=1}^{T} f_t(x) \tag{7}$$

where $f_t(x)$ is the decision tree added at iteration $t$ and $T$ is the number of iterations. The primary objective is to select $f_t$ that minimizes the overall loss, as shown in Eq. (8).

$$\mathrm{Obj} = \sum_{i} L(y_i, \hat{y}_i) + \sum_{t=1}^{T} \Omega(f_t) \tag{8}$$

where $\Omega(f_t)$ is the regularization term that imposes a penalty on the model complexity and prevents overfitting. It incorporates both L1 (Lasso) and L2 (Ridge) regularization, rendering the model more robust to potential overfitting. In terms of computational efficiency, XGBoost is designed for parallel and distributed computing, ensuring rapid model training. The implementation of the ensemble model is given in Algorithm 3, wherein XGBoost is utilized as the meta-learner.
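The stacking idea can be illustrated with a deliberately small stand-in for the meta-learner. Instead of XGBoost, the sketch below implements plain gradient boosting with single-split regression stumps under squared loss and no regularization term, taking the two base forecasts as input features. All names are hypothetical, and the sketch assumes at least one feature column has more than one distinct value.

```python
import numpy as np


def fit_stump(X, residual):
    """Find the single-split stump that best fits the residual (squared error)."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:  # exclude max so both sides are non-empty
            left = X[:, j] <= thr
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = np.sum((residual - np.where(left, lv, rv)) ** 2)
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    return best[1:]


def predict_stump(stump, X):
    j, thr, lv, rv = stump
    return np.where(X[:, j] <= thr, lv, rv)


def fit_boosted_meta_learner(X, y, n_rounds=50, lr=0.1):
    """Additive model: each round fits a stump to the current residual."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)  # residual = negative gradient of squared loss
        pred = pred + lr * predict_stump(stump, X)
        stumps.append(stump)
    return base, lr, stumps


def predict_meta(model, X):
    base, lr, stumps = model
    return base + lr * sum(predict_stump(s, X) for s in stumps)
```

In use, `X` would hold one column of DeepAR forecasts and one column of TFT forecasts for a given target (PF or MF), `y` the observed fatigue scores, and a separate meta-learner would be fitted per target, mirroring the dual meta-learning strategy described in the text.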

Experimental setup
The experimental setup combines wearable technologies with an ensemble learning model for comprehensive fatigue research. The experiments were conducted in a high-performance computational environment powered by an AMD Ryzen Threadripper 3990X 64-core processor and an NVIDIA GeForce RTX 3090 GPU, complemented by 256 GB of RAM and a single 1 TB SSD. This system ensures the seamless execution of complex algorithms and the efficient handling of large datasets, thereby laying a solid groundwork for the analytical processes. Multiple sensors, including the Muse 2 headband electroencephalography sensor and the MAX30102 heart-rate and pulse oximetry sensor, were integrated to capture physiological signals. The setup is shown in Fig. 6.
To streamline data collection and processing, a Bluno Beetle Bluetooth low-energy (BLE) microcontroller is incorporated into the wearable device because it is lightweight and offers high power and programmability.This setup was proposed for real-time monitoring of individuals' health status.Moreover, it was easy and comfortable to use, which is a critical factor in ensuring its sustained application in construction environments.

Prototype development
The prototype was developed via a structured pathway by integrating wearable technologies and an ensemble learning model in five critical steps.First, a high-performance computational environment was established.Then, the Muse 2 headband and MAX30102 sensor were integrated into the system for EEG readings and heart rate and pulse oximetry monitoring, respectively.A microcontroller was then installed to

Sensors and microcontroller
The developed prototype comprised various sensors.The Muse 2 headband electroencephalography sensor was used to obtain EEG signals from TP9, AF7, AF8, and TP10 channels.The heart-rate and pulse oximetry sensor was placed in the wristband to record PPG signals.
The Bluno Beetle BLE microcontroller was also integrated into the wristband configuration to regulate the collection and processing of sensor data.This microcontroller, specifically tailored for wearable technology applications, is lightweight and offers enhanced power and programmability; therefore, it is used herein.A lithium-ion battery was also used as a compact and dependable power solution.The Bluno Beetle BLE wirelessly transmits the collected data to the computer, providing flexibility in data management and processing.Operating as a Bluetooth master or slave, this module permitted the wristband to engage in wireless serial port communication.With a transmission range extending up to 10 m, this BLE technology enables the real-time transmission of sensor data to a smartphone, allowing the wearer to manage the wristband through a user interface.These sensors and modules (Table 3) constitute a complex network that can accurately capture and process physiological signals.A comprehensive fatigue score can be determined to ultimately enhance the safety and health of the wearers.

Prototype validation
The prototype was verified comprehensively to affirm its functionality in real-time operational conditions. The prototype was subjected to testing, wherein sensor data were obtained. The fatigue predictions generated by the proposed model were compared with the established fatigue scoring benchmarks. The performance of the ChronoEnsemble Fatigue Analysis System was evaluated using metrics such as mean square error (MSE), mean absolute scaled error (MASE), mean absolute percentage error (MAPE), symmetric mean absolute percentage error (sMAPE), normalized deviation (ND), and mean scaled interval score (MSIS). Prototype validation revealed high correlation coefficients between the model-predicted outcomes and the actual fatigue scores, as well as precise tracking of PF and MF fluctuations in real-time conditions. The ensemble model effectively identified the fatigue scores within the sensor data, indicating its proficiency in recognizing and responding to the complex combination of physiological signals that contribute to fatigue. Although the validation process focused predominantly on performance metrics, it was also instrumental in detecting areas that required potential enhancements, thereby supporting continuous development.

Performance evaluation metrics
The efficacy of the ensemble model was assessed using performance metrics such as MSE, RMSE, MASE, MAPE, sMAPE, ND, and MSIS. The MSE denotes the mean of the squared discrepancies between the predicted and true values, indicating the error variance of the model. The MSE is calculated using Eq. (9).

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2 \tag{9}$$

where $y_t$ and $\hat{y}_t$ denote the actual and predicted values, respectively, and $n$ is the number of observations.
The RMSE is a standard metric used in supervised learning to measure the accuracy of predictive models. To determine the RMSE, the residual for each observation, defined as the discrepancy between the predicted and observed values, is determined. Then, the squared norm of each residual is computed. The mean of these squared residuals is then calculated, and its square root is taken to obtain the RMSE. True measurements for each predicted data point are required for calculating the RMSE. The RMSE is represented in Eq. (10):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2} \tag{10}$$

where $n$ denotes the total number of observations, $y_t$ denotes the true values, and $\hat{y}_t$ denotes the predicted values corresponding to the observations.
The MASE is a metric used to compare the prediction accuracies of different methods. It is computed by normalizing the mean absolute error by the mean absolute difference of successive actual values, thus facilitating model comparisons. It is mathematically expressed in Eq. (11).

$$\mathrm{MASE} = \frac{\frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right|}{\frac{1}{n-1}\sum_{t=2}^{n}\left|y_t - y_{t-1}\right|} \tag{11}$$

where $n$ is the total number of observations; $y_t$ and $\hat{y}_t$ are the true and predicted values at time point $t$, respectively; and $y_t$ and $y_{t-1}$ are the actual values at times $t$ and $t-1$, respectively.
The MAPE is the average of the absolute percentage discrepancies between the predicted and actual values, as shown in Eq. (12). It is used as a measure of the prediction accuracy of a forecasting model.

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \tag{12}$$

where $n$ is the number of observations in the dataset and $y_t$ and $\hat{y}_t$ denote the true and predicted values, respectively, at a particular time point.
The sMAPE is a symmetric version of the MAPE (Eq. (13)), wherein the absolute discrepancy is normalized by the sum of the absolute values of the predicted and actual values. It also addresses some issues encountered with the MAPE. In Eq. (13), $n$ denotes the total number of observations in a dataset, and $y$ and $\hat{y}$ denote the actual and predicted values at a specific time point, respectively.

The ND (Eq. (14)) scales the disparity between the predicted and true values by the range of the actual values, where $y$ and $\hat{y}$ denote the true and predicted values at a distinct time point, respectively.
The MSIS considers the average width of the predicted interval normalized by the mean absolute discrepancy between the predicted and actual values, as shown in Eq. (15). In Eq. (15), the lower and upper bounds refer to the bounds of the prediction interval, $n$ is the number of observations in the dataset, $y$ is the true value at a specific time point, and $\hat{y}$ is the predicted value at a specific time point.
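The point-forecast metrics described above can be computed as in the following sketch. It follows common definitions: the factor of 2 in the sMAPE is one widely used convention and is an assumption here, and the MASE uses the one-step naive forecast on the same series as its scale. ND and MSIS are omitted because the text does not pin down their exact form.

```python
import numpy as np


def mse(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean((y - yhat) ** 2))


def rmse(y, yhat):
    return float(np.sqrt(mse(y, yhat)))


def mape(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return float(np.mean(np.abs((y - yhat) / y)))  # undefined if any y == 0


def smape(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    # Factor 2 is an assumed convention; some definitions omit it.
    return float(np.mean(2 * np.abs(y - yhat) / (np.abs(y) + np.abs(yhat))))


def mase(y, yhat):
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    mae = np.mean(np.abs(y - yhat))
    naive_mae = np.mean(np.abs(np.diff(y)))  # mean |y_t - y_{t-1}|
    return float(mae / naive_mae)
```

A MASE below 1 means the forecast has a smaller mean absolute error than the naive forecast that repeats the previous observation, which is why it is useful for comparing models across series with different scales.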

Model performance
The ensemble learning model is comprehensively evaluated here. Its performance was analyzed in terms of accuracy, precision, and resilience to variations and deviations using several evaluation metrics. The results were juxtaposed with those of the DeepAR and TFT models to confirm the superior efficacy of the proposed model over these standalone models.

The forecasts of physical and mental fatigue on the test dataset by DeepAR, as illustrated in Fig. 7a, b, exhibit notable fluctuations. In Fig. 7a, the physical fatigue score seems to be relatively stable over a short period of time, as the observations lie within the 95% prediction interval. However, the broad prediction intervals suggest that the model only captures a general trend and that there is uncertainty in the forecasts. Although the mental fatigue score decreases over time (Fig. 7b), the predictions lie below the observations. Thus, the model systematically underestimates the mental fatigue score. In Fig. 7b, the prediction intervals are also relatively wide, again indicating uncertainty. This highlights the challenges that standalone models might encounter when attempting to understand intrinsic data patterns.

The TFT model, shown in Fig. 7c, d, provides forecasts with a more consistent downward trajectory. The model's median prediction closely follows the observations, and its prediction intervals are narrower than those of DeepAR, which indicates that it manages temporal correlations well and has higher confidence in its forecasts.

Figure 7e, f capture the outcomes from CEFAS, representing the ensemble learning predictions. As shown in Fig. 7e, f, the ensemble model forecasts both physical and mental fatigue precisely and gives stable predictions over time. For the mental fatigue forecasts, the CEFAS slightly underestimates the fatigue score. The tight [...]

The physical fatigue scores in Fig. 8a oscillate within a narrower band, suggesting less volatility, whereas the mental fatigue scores in Fig. 8b display wider swings, underscoring that the model is unable to capture the mental fatigue scores precisely due to the inherent nature of mental fatigue dynamics. TFT (Fig. 8c, d) provides a more confident forecast for both physical and mental fatigue, with a strong trend and narrow intervals, which indicates that it captures the temporal dynamics of the data. The ensemble learning model (Fig. 8e) shows even narrower intervals for physical fatigue scores than either of the base models, which demonstrates the capability of the ensemble to synthesize information and correct over- or under-estimations by individual models when necessary. As shown in Fig. 8f, the narrow prediction intervals denote the ensemble model's ability to learn predictive cues from both the DeepAR and TFT models. The ensemble effectively integrates the traits of the DeepAR and TFT models to capture the probabilistic nature of the data and to understand complex temporal patterns.
A comparative analysis of the physical fatigue forecasting performance was conducted, as illustrated in Fig. 9. The MSE and RMSE values were determined to evaluate the physical fatigue prediction accuracy of the proposed model. The ensemble model, represented by the CEFAS, registers the lowest MSE of 0.0008 for PF (Fig. 9a). Such outcomes can be attributed to the iterative refinement process inherent to gradient boosting mechanisms. The DeepAR model registers an MSE of 0.0698 for PF, suggesting limitations in deciphering certain intricate data dynamics.
The TFT model achieves an MSE of 0.0639 for PF, marginally better than DeepAR, but still does not approach the precision of the ensemble model. This result likely stems from the ensemble's ability to combine and optimize diverse model strengths, mitigating individual weaknesses and enhancing overall precision. With an RMSE of 0.0277 for PF, as illustrated in Fig. 9b, the proposed approach effectively amalgamates various analytical perspectives and produces consistent forecasting results. The RMSE achieved by the DeepAR model, 0.2642 for PF, reveals potential limitations in its alignment, as a higher RMSE indicates a greater discrepancy between the predicted values and actual observations. Meanwhile, TFT, with an RMSE of 0.2642, does not attain the precision observed in the ensemble forecasts.
In the context of MASE (Fig. 9c), the ensemble model's result of 0.0054 for PF indicates robustness across diverse scales. TFT outperforms DeepAR with a MASE value of 0.056576 for PF, although this may indicate a proclivity towards overfitting temporal patterns. DeepAR's higher MASE of 0.1531 for PF reflects its broader difficulty in addressing scaled discrepancies. The MASE values underscore the proposed approach's forecasting capabilities and its ability to minimize scaled errors.
In terms of sMAPE, as reflected in Fig. 9d, the 0.0008 value for PF signifies that the predicted values are exceptionally close to the actual observations compared with the standalone models. The low sMAPE value of the ensemble model emphasizes its consistency in handling the intricacies of the data and capturing underlying data patterns. TFT's performance, indicated by an sMAPE of 0.3740 for PF, suggests intermittent challenges in deciphering unpredictable data trajectories. DeepAR, with its considerably higher sMAPE value of 0.9258 for PF, again shows weaknesses in symmetric error capture. The ensemble model, with an MSIS (Fig. 9e) of 0.0012 and an ND (Fig. 9f) of 0.0008 for PF prediction, demonstrates the reliability of its prediction intervals. The MSIS values suggest that the ensemble model's predictions are more reliable, given the narrow intervals they operate within. The ensemble model exhibits the lowest ND scores, indicating increased resilience to variations and close alignment with the actual observations. The minimal deviation shows that the ensemble model is not only accurate on an absolute scale but also consistently accurate relative to the range of the actual data.
TFT's metrics, encompassing an MSIS value of 0.0375 and an ND of 0.0536 for PF, while surpassing DeepAR, elucidate occasional limitations in capturing data nuances.In comparison with the standalone models, the significantly lower ND values of the ensemble model underline its superior prediction fidelity, as it manages to maintain constant alignment with the actual data.
Figure 10 shows a comparative analysis of the mental fatigue forecasting performance metrics for the three models. A lower MSE value (Fig. 10a) indicates a closer fit between the observed values and the predictions. The CEFAS has the lowest value for MF prediction, 0.0033, which is attributed to its ability to correct the errors in the base models' forecasts and minimize the sum of squared deviations. As seen in the figure, TFT also performs well with an MSE value of 0.0044, as the model can leverage temporal patterns effectively. DeepAR's high MSE of 0.0639 suggests that the model is unable to capture the complexity of mental fatigue as effectively as the others.
Similarly, the RMSE values (Fig. 10b) confirm the previous results, with CEFAS showing the highest accuracy with a value of 0.0577. There is a substantial difference between the MASE scores (Fig. 10c) of the base models and the ensemble model. The MASE value of 0.0016 for CEFAS indicates that the absolute errors from the CEFAS forecasts are much smaller than those from the naïve model. This is because the gradient boosting nature of the XGBoost meta-learner enables the capture of nonlinear relationships and interactions between the features, thus effectively predicting mental fatigue levels. The TFT model uses attention mechanisms to weigh different parts of the time-series data differently, potentially allowing it to focus on more relevant past information when predicting future points, as reflected in its low MASE score of 0.0054. The DeepAR MASE value is the highest at 0.0657, which implies that, for specific nuances of the mental fatigue data, its forecasting method is less efficient than the ensemble learning model.
The sMAPE (Fig. 10d) values suggest CEFAS is the most accurate, with the lowest percentage error of 0.0016, indicating its strong predictive capabilities and robustness to outliers in mental fatigue forecasting.TFT also performs well with a value of 0.0045, showcasing its efficient time-series modeling, likely due to its attention mechanisms.DeepAR's higher 0.0528 sMAPE value implies the model is less precise in this context, potentially due to the complex nature of mental fatigue that may not align well with DeepAR's probabilistic autoregressive approach.
The MSIS (Fig. 10e) measures the accuracy of prediction intervals. CEFAS's very low MSIS value of 0.002408 indicates its interval forecasts are both narrow and accurate, reflecting high confidence in its predictions. TFT has a higher MSIS score of 0.037529, suggesting its prediction intervals are less precise than CEFAS's but still relatively informative. DeepAR's much higher MSIS of 0.588216 implies its prediction intervals are quite wide, indicating less certainty in its forecasts.
In terms of the ND of the forecasts (Fig. 10f), a lower score means that a model's predictions are closer to the actual values. CEFAS's lowest ND score of 0.001605 indicates that it has the smallest deviation from the actual observations in predicting MF, likely due to its efficient handling of complex patterns in the data. TFT has a slightly higher ND score of 0.004416, showing good but less precise predictions than CEFAS. DeepAR's highest ND score of 0.053653 suggests that it is less accurate and less effective in capturing the data dynamics related to mental fatigue. The tight alignment between the predictions and the observations suggests that the ensemble model is robust, translating to increased resilience against data variations, which is crucial for time-series forecasting tasks where the data can exhibit intricate temporal patterns.
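The ND score can be sketched in a few lines, assuming the common definition as the sum of absolute errors divided by the sum of absolute actual values. The function name `nd` and the toy numbers are illustrative assumptions rather than part of the reported experiments.

```python
def nd(actual, forecast):
    """Normalized deviation (assumed definition): sum|y - y_hat| / sum|y|."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("sum of absolute actual values is zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / total_actual


# Toy values (hypothetical): total absolute error 2.0 over a total
# absolute signal of 10.0.
score = nd([2.0, 4.0, 4.0], [1.0, 5.0, 4.0])
```

Since the errors are normalized by the magnitude of the observed series, the score can be compared across signals of different scales.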
Table 4 presents a comprehensive comparative analysis of the proposed ensemble model, CEFAS, against various state-of-the-art methods for fatigue prediction. The proposed CEFAS model demonstrates superior performance across multiple metrics compared to existing prediction methods, exhibiting an average MSE of 0.0021, RMSE of 0.0427, and MAPE of 0.0012.
Among the compared methods, the Self-Organizing Neural Fuzzy System (SONFIN) [82] and the Graph regularized Extreme Learning Machine (GELM) [83] are capable of predicting both PF and MF. However, their RMSE values of 0.360 and 0.0712, respectively, are higher than that of CEFAS, indicating lower prediction accuracy. Similarly, the ANN [84] and the regression method with Maximum Mean Discrepancy (MMD) and Gated Recurrent Unit (GRU) [94] have RMSE values of 0.857 and 0.27, respectively, which are considerably higher than that of CEFAS.
In conclusion, the proposed CEFAS ensemble learning model demonstrates state-of-the-art performance in predicting both physical and mental fatigue, outperforming a wide range of existing methods across multiple evaluation metrics. This performance, which represents a substantial improvement over the individual base learners, encourages the continued exploration and application of ensemble models for time-series forecasting, particularly in the context of fatigue prediction.

Conclusions and future work
In this study, the validity and effectiveness of ensemble learning models in processing IoT sensor data for real-time mental fatigue and physical fatigue monitoring were evaluated. Furthermore, the correlations between the prediction capabilities of different models and the improvement in their prediction accuracy were determined after their integration into an ensemble learning-based ChronoEnsemble Fatigue Analysis System. This study was performed in two stages. First, a study was conducted to verify the reliability, accuracy, and improvement potential of the proposed ensemble learning model in processing and analyzing sensor data. Second, a study was conducted using real sensor data to determine the precision and reliability of the proposed model for predicting fatigue states. During the assessment, the CEFAS algorithm was used for predicting PF and MF. In physical fatigue prediction, the algorithm yielded MSE and RMSE values of 0.0008 and 0.0277, respectively. In mental fatigue prediction, the algorithm yielded MSE and RMSE values of 0.0033 and 0.0577, respectively. These evaluation metrics highlighted the efficacy of the algorithm, indicating excellent prediction accuracy. The results from real-world datasets demonstrated that the ensemble model that uses the DeepAR and TFT models as the base learners and XGBoost as the meta-learner yields reliable real-time predictions based on sensor data.
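The stacking idea summarized above, in which base-learner forecasts become the inputs of a meta-learner, can be illustrated with a deliberately simplified sketch. To stay dependency-free, it replaces the XGBoost meta-learner with a two-weight least-squares combiner and uses made-up numbers in place of DeepAR and TFT outputs; the function names `fit_linear_meta` and `combine` are hypothetical. It shows only the data flow of stacking, not the CEFAS implementation.

```python
def fit_linear_meta(base_a, base_b, target):
    """Fit weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2).

    A linear stand-in for the meta-learner: it solves the 2x2 normal
    equations of ordinary least squares without an intercept.
    """
    s_aa = sum(a * a for a in base_a)
    s_bb = sum(b * b for b in base_b)
    s_ab = sum(a * b for a, b in zip(base_a, base_b))
    s_ay = sum(a * y for a, y in zip(base_a, target))
    s_by = sum(b * y for b, y in zip(base_b, target))

    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("base forecasts are collinear; weights are not unique")

    # Cramer's rule for the two-equation linear system.
    w_a = (s_ay * s_bb - s_ab * s_by) / det
    w_b = (s_aa * s_by - s_ab * s_ay) / det
    return w_a, w_b


def combine(weights, base_a, base_b):
    """Apply the fitted meta-learner to new base-learner forecasts."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(base_a, base_b)]


# Hypothetical held-out forecasts from two base learners and a target
# constructed as 0.25 * first + 0.75 * second, so the fit is checkable.
first = [0.2, 0.4, 0.6, 0.8]
second = [0.3, 0.3, 0.7, 0.9]
target = [0.25 * a + 0.75 * b for a, b in zip(first, second)]

weights = fit_linear_meta(first, second, target)
stacked = combine(weights, first, second)
```

A gradient-boosted meta-learner such as XGBoost would replace `fit_linear_meta` and could additionally capture nonlinear interactions between the base forecasts, which a linear combiner cannot.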
While our findings offer significant insights into the use of IoT and ensemble learning for fatigue monitoring, they are subject to certain limitations that must be considered. Firstly, the dataset used for training and validating CEFAS was limited to 11 participants. Although the ensemble model demonstrated high precision in predicting fatigue, a larger and more diverse dataset would enhance the generalizability of our approach. Secondly, our methodology relies solely on wearable sensor data, which can introduce potential biases related to sensor placement, individual physiological variances, and environmental impacts. Although a range of preprocessing steps (including baseline removal and various filtering methods) was implemented to standardize inputs and mitigate noise from multiple sources, these procedures do not completely eliminate the variability induced by environmental conditions and individual differences. Additionally, our experimental protocols were conducted under controlled conditions to maximize the reliability and validity of our findings; however, applying these results to the dynamic environments of actual construction sites may necessitate further adaptations through extensive field validations and experiments.
Future research should explore the potential of integrating additional models to enhance forecasting accuracy. Moreover, the model's performance may vary with different sensor configurations due to its reliance on data from a specific set of sensors. Thus, future studies should utilize a diverse array of sensor data to confirm the model's applicability. An additional objective for future research is to expand the dataset to include a larger and more diverse participant pool, along with various construction settings. Taken together, these limitations indicate that although the current ensemble learning model yields reliable real-time predictions for fatigue monitoring, the proposed advancements could further support real-time risk assessment, particularly in hazardous work environments.
The present study predominantly focused on the ensemble model and its application in predicting MF and PF; therefore, the future research directions outlined herein are crucial for improving its applicability. Continued validation of the prediction results in other domains using extensive IoT sensor data, together with the investigation of its wider applications, will be essential to realizing the full potential of the ensemble learning model.

   c. Define the Transformer-style self-attention mechanism.
   d. Define the gated skip connections to facilitate information flow.
   e. Define the output layer to predict the future values of the time series.
4. Train the TFT model.
   a. Pass the training set through the TFT model.
      i. Process the input data through the variable selection networks, GLUs, and temporal processing layers.

Fig. 5 Schematic of the ensemble learning model topology

Algorithm 3 Proposed ensemble learning algorithm

Fig. 6 Experimental setup and process

Fig. 7 Visual evaluation of the physical and mental fatigue forecasts on the test dataset. a DeepAR physical fatigue forecast, b DeepAR mental fatigue forecast, c temporal fusion transformer physical fatigue forecast, d temporal fusion transformer mental fatigue forecast, e CEFAS physical fatigue forecast, and f CEFAS mental fatigue forecast

Fig. 8 Visual evaluation of the physical and mental fatigue forecasts on the training dataset. a DeepAR physical fatigue forecast, b DeepAR mental fatigue forecast, c temporal fusion transformer physical fatigue forecast, d temporal fusion transformer mental fatigue forecast, e CEFAS physical fatigue forecast, and f CEFAS mental fatigue forecast

Fig. 9 Comparative analysis of forecasting performance metrics for DeepAR, TFT, and CEFAS models in predicting physical fatigue. a MSE, b RMSE, c MASE, d sMAPE, e MSIS, and f ND

Fig. 10 Comparative analysis of forecasting performance metrics for DeepAR, TFT, and CEFAS models in predicting mental fatigue. a MSE, b RMSE, c MASE, d sMAPE, e MSIS, and f ND

Table 1 Mean age, mean height, mean weight, and standard deviation of parameters and the count of participants by gender

Table 2 Details of the dataset
11 × 2 h; frequency = "1S"; self-collected using EEG and PPG sensors

Table 3 Configuration and use of sensors and microcontroller in the prototype
streamline data collection and capture physiological signals in real-time. The first stage leveraged the ensemble learning model to process the acquired data, facilitating the extrapolation of PF and MF scores from the sensor-derived data.