Predicting medical device failure: a promise to reduce healthcare facilities cost through smart healthcare management

Background The advancement of biomedical research generates myriad healthcare-relevant data, including medical records and medical device maintenance information. The COVID-19 pandemic significantly affects the global mortality rate, creating an enormous demand for medical devices. As information technology has advanced, the concept of intelligent healthcare has steadily gained prominence. Smart healthcare utilises a new generation of information technologies, such as the Internet of Things (loT), big data, cloud computing, and artificial intelligence, to completely transform the traditional medical system. With the intention of presenting the concept of smart healthcare, a predictive model is proposed to predict medical device failure for intelligent management of healthcare services. Methods Present healthcare device management can be improved by proposing a predictive machine learning model that prognosticates the tendency of medical device failures toward smart healthcare. The predictive model is developed based on 8,294 critical medical devices from 44 different types of equipment extracted from 15 healthcare facilities in Malaysia. The model classifies the device into three classes; (i) class 1, where the device is unlikely to fail within the first 3 years of purchase, (ii) class 2, where the device is likely to fail within 3 years from purchase date, and (iii) class 3 where the device is likely to fail more than 3 years after purchase. The goal is to establish a precise maintenance schedule and reduce maintenance and resource costs based on the time to the first failure event. A machine learning and deep learning technique were compared, and the best robust model for smart healthcare was proposed. Results This study compares five algorithms in machine learning and three optimizers in deep learning techniques. The best optimized predictive model is based on ensemble classifier and SGDM optimizer, respectively. An ensemble classifier model produces 77.90%, 87.60%, and 75.39% for accuracy, specificity, and precision compared to 70.30%, 83.71%, and 67.15% for deep learning models. The ensemble classifier model improves to 79.50%, 88.36%, and 77.43% for accuracy, specificity, and precision after significant features are identified. The result concludes although machine learning has better accuracy than deep learning, more training time is required, which is 11.49 min instead of 1 min 5 s when deep learning is applied. The model accuracy shall be improved by introducing unstructured data from maintenance notes and is considered the author’s future work because dealing with text data is time-consuming. The proposed model has proven to improve the devices’ maintenance strategy with a Malaysian Ringgit (MYR) cost reduction of approximately MYR 326,330.88 per year. Therefore, the maintenance cost would drastically decrease if this smart predictive model is included in the healthcare management system.


INTRODUCTION
Medical devices are used for the diagnosis and treatment of disease, as well as the rehabilitation of patients during post-treatments (Aridi et al., 2016). It can be used independently or in conjunction with any accessory, consumable, or other pieces of medical device. The reliability, maintainability, availability, and safety of medical devices are the ultimate goal in maintenance strategy, where medical devices should not fail frequently and must be repaired promptly when the failures are detected. Numerous investigations have associated medical device failures with severe patient injuries and deaths (Dhillon, 2011;Mahfoud, Abdellah & El Biyaali, 2018;Palmer, 2010;Verbano & Turra, 2010). The world health organization (WHO) estimates that 50% to 80% of equipment is non-functional due to the lack of maintenance culture, competency, and a tendency to focus on corrective maintenance rather than preventative maintenance (Kutor, Agede & Ali, 2017). Ineffective medical device maintenance causes crucial equipment downtime, diminished device performance, monetary waste, and depletion of resources (Bahreini, Doshmangir & Imani, 2019). Older technological devices require more attention due to a lack of service or user manuals and manufacturer guidance (Engineering Services Division Ministry of Health, 2018;Sezdi, 2016). Meanwhile, among the leading causes of downtime or equipment failures are inadequate storage and transportation, initial failure, inappropriate handling (damage during use), inadequate maintenance, use of non-genuine spare parts or refurbished spare parts, environmental stress, random failure, improper repair technique, and wear-out failures (Kutor, Agede & Ali, 2017).
Medical devices range from basic tongue depressors to complex radiation systems with over 10,000 different types, and 1.5 million unique medical devices are recorded globally. Medical device expenditure climbed from USD 145 billion in 1998 to USD 220 billion in 2006, which indicates an annual growth rate of more than 10% (World Health Organization, 2011a). Medical devices maintenance market projection demonstrates that the medical devices maintenance market is expected to grow at a compound annual growth rate (CAGR) of 10.4% to USD 74.2 billion by 2026, which will increase drastically from USD 45.2 billion in the year 2021 (Markets and Markets, 2021). The increased Meanwhile, in Malaysia's healthcare facilities, preventive maintenance is performed at a set time or interval throughout the year, as recommended by the manufacturer or as scheduled, and corrective maintenance is conducted after failure where the repair work is implemented (Coban et al., 2018). Clinical engineers are responsible for calibration, maintenance, repair, user training, and decommissioning of medical devices by applying engineering and managerial skills (World Health Organization, 2011b). Another aspect that influences the dependability and failures of medical equipment is the level of knowledge of users and biomedical staff. There is a challenge to employ an expert or welltrained worker in biomedical engineering; thus, outsourcing or implementing a service contract with specialized contractors or vendors is an option (Mahfoud, Abdellah & El Biyaali, 2018). Due to the same limitation, Malaysian Government hospitals have adopted a maintenance service contract for medical devices and launched a privatization program with Concession Company in 1997. Other elements that influence the performance of medical equipment include calibration work and electrical safety testing, planned maintenance, and competence. Calibration is a task to verify the accuracy, and an electrical safety test is performed to ensure the patient is not at risk of electrical injury or leakage. According to the results of an investigation test conducted on six high-risk medical devices, 58% of the devices failed the performance test, which is greater than prior studies, which found that 21% and 26% of the devices failed the performance test, respectively (Altayyar et al., 2018). Furthermore, when measured in healthcare facilities, around 9% of infusion pumps and 12.6% of dialysis machines fail to meet electrical safety regulations (Gurbeta Pokvic et al., 2017a, 2017b. Meanwhile, clinical chemistry analyzers and infusion pumps were the most commonly reported medical device affected by electrostatic discharge failures due to a current flow that caused a dielectric breakdown (Kohani & Pecht, 2018). As a result, electrical safety tests are required in Malaysia for scheduled maintenance or newly purchased medical devices, and calibration work is performed on specified types of medical devices as advised by the manufacturer.
Healthcare administrators and biomedical engineers are continually confronted with issues pertaining to the security of their facilities, the satisfaction of their employees, the quality enhancement and improvement of the standard grade of care provided to their patients, the expensive workflow, and the inefficiency of costly processes. Numerous times, improvements in information technology (IT) for the healthcare system are cited as potential enhancement techniques. Innovators and researchers are constantly working to develop new technologies that can aid corporate operations and improve the quality of activities. Today's healthcare facilities are confronted with an increase in the number of patients and heightened expectations for patient experiences and levels of satisfaction as an essential consequence of healthcare services. To fulfill these increasing demands, it is necessary to eliminate unnecessary stages and simplify the workflow. Artificial intelligence (AI)-assisted asset management is one of the primary components of an efficient work process. AI in healthcare applications and model development has emerged as a promising tool in providing a solution to humans while dealing with a crisis, especially during the COVID-19 pandemic. It is utilized during decision-making through feature learning (Ghorbani et al., 2020;Jayatilake & Ganegoda, 2021;Kulathilake et al., 2021). Pre-processing, feature extraction, and classification are the three steps to be examined in machine learning (ML) techniques for healthcare applications (Jayatilake & Ganegoda, 2021). The execution time is lowered, and the classification accuracy is raised by implementing feature selection before classification. Principal component analysis (PCA) and genetic algorithms can be used to generate a feature set for classification (Jayatilake & Ganegoda, 2021).
PCA is a technique to avoid overfitting and is used for performance advancement and noise reduction. The PCA is applied with 16 principal components to obtain 95% of the original variance using a random forest algorithm in ML to detect Chagas disease (Morais et al., 2022). A genetic algorithm (GA) is a search heuristic that imitates the natural evolution process. It is frequently employed to find practical answers to optimization and search challenges (Santra & Christy, 2012). The GA for feature selection has also been proposed by Ghorbani et al. (2020) and Santra & Christy (2012) in the clustering technique. However, they are most commonly produced in supervised learning, where data class labels are known. The primary goal is to reduce the number of features utilized in classification while retaining acceptable classification accuracy. Prioritization, failure, and risk analysis are all employed in various applications of medical device reliability under the risk management area. Failure mode and effect analysis (FMEA) (Arathy & Balasubramanian, 2020), a mix of FMEA and Fuzzy (FFMEA) (Jamshidi et al., 2015), analytical hierarchy process (AHP) (Hutagalung & Hasibuan, 2019), and other methodologies are commonly used in risk management subjects. A study on medical device prioritization using a support vector machine is discussed in Zamzam et al. (2021), where the technique outperforms preventive maintenance and replacement prioritization with 99.42% and 99.80% accuracy, respectively. In addition, K-nearest neighbour had the best accuracy of 98.9% in corrective maintenance prioritization. However, the study shall be enhanced to critical medical devices at more extensive facilities and broader clinical services such as hospitals. ML application in healthcare services is used in three research studies for medical devices performance prediction as described in Badnjević et al. (2019), Hrvat et al. (2020) andSpahić et al. (2020). However, these studies are limited to only one type of medical device to evaluate medical devices' performance.
In a study by Ngabo et al. (2021), the ML model was developed based on the COVID-19 patient data to predict the patients' survival rate with the kNN algorithm, and Decision Trees attained the highest accuracy of 99.30%. Meanwhile, in a study by Iwendi et al. (2020), a Boosted Random Forest algorithm attained an accuracy of 94% in predicting the severity of COVID-19 cases using patients' data and symptoms. Recently, the deep learning (DL) technique evolved as a sophisticated tool primarily used in medical imaging, text data, time series, and various image applications (Ravikumar et al., 2022). It is well known as a subdivision of ML that consists of different processing layers equipped with inputs, hidden, and output layers. Furthermore, researchers have explored the possibility of using long short term memory (LSTM) and deep reinforcement learning to predict losses and cures of patients' symptoms in the following few days after contracting the disease (Kumar et al., 2021). Besides, the admission and mortality of COVID-19 patients are predicted using an interpretable DL model with an area under curve (AUC) of 88.3% (Nazir & Ampadu, 2022). The application of AI is also recently used for privacy and security issues (Hameed et al., 2021) and is widely used in smart and mobile healthcare (Yamakoshi, Rolfe & Yamakoshi, 2021).
Although AI has shown promising results in assisting clinicians in getting the best outcome, delivering healthcare services could be interrupted if the medical devices are not optimally operated. Critical medical devices are the main priority and critical areas for patients. Intensive care units (ICU) and hybrid COVID-19 wards are extensively used in treating patients with medical devices operated without failure. As a result, demand for critical care devices such as ventilators has skyrocketed (Garzotto et al., 2020;Markets and Markets, 2021). A smart healthcare system and efficient maintenance strategy can prevent potential failure or breakdown, disrupting healthcare operations and leading to serious patient injury. To date, the published works focused on developing a medical device reliability system specific to one type of device. Comprehensive maintenance and reliability system was recently published by Hilmi et al. (2021) andZamzam et al. (2021); however, the works focused on predicting maintenance prioritization. They highlighted a research gap of inconsistent mathematical methodologies requiring manual intervention in determining the weights of criteria in reliability assessments. It is necessary to improve the current predictive models for various medical devices. With adequate maintenance history in structured data to train a model in aiming for the best accuracy, the study on performance prediction for medical devices using AI for smart healthcare management can be further explored. Therefore, this article addresses three identified research gaps as follows: i) To date, there are limited published works on smart medical device maintenance strategy frameworks. Only three research studies utilized AI in medical devices' performance prediction. The developed predictive models were proposed by analyzing only one device's history data: infant incubators, infusion pumps, and defibrillators. The comprehensive medical device reliability assessments are still lacking, and cost analysis was not considered in the assessment.
ii) To the best of our knowledge, medical device reliability studies were categorized into three main areas: risk assessment, performance or failure prediction, and management system. Most studies focus on risk management using failure codes analysis and maintenance prioritization in reliability assessments. Utilizations of smart management and monitoring leveraging extensive data capability are limited and have not been appropriately explored. A predictive model to forecast the likelihood of equipment failures is lacking. Anticipating these problems is essential in maximizing device uptime and reducing astronomical repair costs. This is crucial for any country during crisis management, especially during COVID-19 pandemic, where excellent and reliable equipment is highly required.
iii) Three research on performance prediction for medical devices is available using AI, as reported by Badnjević et al. (2019), Hrvat et al. (2020) and Spahić et al. (2020). The existing model predicts medical device performance by developing accurate and faulty classification based on the pass or fail response. Current practice is to perform scheduled preventive maintenance, and the manufacturer suggests its frequency without considering failure history data. A scientific research gap is improved in this study with the development of a critical medical devices predictive model to predict the likelihood of device failure from its purchase date. The predictive model will be able to classify the device into three classes; (i) class 1, where the device is unlikely to fail within the first 3 years of purchase, (ii) class 2, where the device is likely to fail within 3 years from purchase date, and (iii) class 3, where the device is likely to fail more than 3 years after purchase. The machine learning and deep learning models were compared. The goal is to determine the actual maintenance schedule needs and gain comprehensive strategic maintenance management to reduce maintenance and operational cost.

Predictive model framework
This study considers five types of healthcare facilities under the Ministry of Health, Malaysia. They are four types of hospitals: state, major, minor, non-specialist, and one special psychiatric institution, which is equivalent to 15 healthcare facilities. A predictive model was developed based on the medical device data (Engineering Services Division Ministry of Health, 2018) from government hospitals in Perak (west coast of Malaysia peninsula). Perak state hospital is equipped with 990 beds, two major specialist hospitals with 608 and 548 beds, two minor specialist hospitals with 305 and 250 beds, nine nonspecialists or district hospitals with 50 to 160 beds, and 1,800 beds for a special psychiatric institution. The state hospital provides 15 specialist services and designated subspecialists based on the region, and clinical service is managed by clustering in their respective areas. The 15 medical specialist services cover and are not limited to general medicine, general surgery, pediatrics, orthopedics, obstetrics and ophthalmology, ENT (otorhinolaryngology), emergency medicine, psychiatry, dental, dermatology, and nephrology. From these 15 facilities, there were 12,214 medical device units with active and inactive critical medical devices from 1997 to April 2021. The medical device maintenance at these 15 facilities is currently performed by a concession company appointed by the government in a long comprehensive contract. A challenge is encountered in gathering, integrating, maintaining, processing, and analyzing various types of medical data. It is too complicated and inefficient to handle using existing database management systems because it consists of big healthcare data. Although computerized maintenance management can store a vast number of data, difficulties occurred when clinical engineers could not provide myriad maintenance data into a robust tool that enables the implementation of comprehensive maintenance strategies. The utilized medical device database contains both structured and unstructured data. Structured data is the device's general information, whereas unstructured data includes routine maintenance service records and troubleshooting actions performed by competent personnel (i.e., clinical engineers). Utilizing unstructured data necessitates a laborious pre-processing phase to clean and organize the data.

Predictive model development
Government hospitals in Malaysia used a web-based database to record and store medical device history since 1997. The web-based data for this analysis is asset and services information system (ASIS), which spans from the system's inception in 1997 to April 2021. Figure 1 depicts the overall proposed framework where data on 44 types of critical medical devices are extracted from 15 healthcare facilities (comprised of 8,294 devices). The preprocessing data stage is executed, where 17 input parameters are selected based on literature review findings. All 17 features from 8,294 devices are extracted to be fed to the proposed predictive model to anticipate the likelihood of first failure of the medical devices. Normalization is required in the pre-processing stage, where this process will ensure all features are within the same scale and range. After normalization, data is combined with categorical data as an input parameter or features to the predictive model. Normalization returns the vector-wise z-score of the dataset with center zero (0) and a standard deviation of one (1). Normalization operates on each column of data separately, and the below equation is used for z-score of a value x (Zamzam et al., 2021); n v u u u t (x = value, μ = mean, r = standard deviation, n = highest probability estimate of the population's standard deviation) The data point distance from the mean and standard deviation is the measurement for z-score. Returning as a vector and matrix, standard deviation of x and mean of x are used to calculate the z-score. After normalization is executed, a normalized value is imported into the software.
The following process flow is to identify the response classes boundary for classes 1, 2, and 3. The class is divided using an arbitrary technique based on the pattern in the data. The model is trained on five algorithms using the cross-validation technique with 80% training and 20% testing data segregation. Observing the confusion matrix, the model performance is examined, and the values for recall, precision, specificity, F1 score, and AUC are calculated. We perform sensitivity analysis to further optimize the developed predictive model in determining the most significant features for medical device failure prediction. The proposed model is then tested using a new test dataset to predict failure classes and the outcomes on cost impact. A new proposed maintenance schedule and replacement plan will be discussed further in the discussion section.

Machine learning and deep learning application
In this stage, ML and DL techniques are explored using five algorithms and three optimizers for both approaches, respectively ( Fig. 1). Both ML and DL techniques were compared with the 17 features embedded in the model. A sensitivity analysis is performed to optimize the reliability and rank the significant features in determining the best predictive model. Decision trees, naïve Bayes, support vector machines, ensemble classifiers, and neural network algorithms are used for ML applications. Meanwhile, the stochastic gradient descent with momentum (SGDM), root mean square propagation (RMSProp), and adaptive moment estimation (Adam) are applied for DL. A support vector machine in ML splits data into classes by locating the optimal hyperplane that divides each point into its corresponding category. Conversely, numerous weak learners' outputs are combined into one accurate ensemble model using ensemble classifiers algorithm by boosting the maximum number of splits and learners. A similar tree model is also applied in decision trees with responses predicted by following the root to leaf node and containing responses in true or false conditions. Gaussian distribution with a mean and standard deviation is used in naïve Bayes to simulate the predictor distribution within each class. In addition, a feedforward fully connected neural network is utilized, which has a connected layer with each fully linked layer that multiplies the input by a bias vector and a weight matrix.
Moreover, a convolutional neural network (CNN) for DL is used with SGDM, RMSProp, and Adam as the training options. SGDM optimizer specifies the momentum value using momentum training options, and RMSProp has a decay rate option using squared gradient decay factor. Adam optimizer is propounded with decay rates of gradient besides squared gradient moving averages using gradient and squared gradient decay factor. As a result, the SGDM optimizer can oscillate along the precipitous descent, leading to the best result. One method to decrease this oscillation is to include a momentum term in the parameter update as specified in the SGDM equation below (Dubey et al., 2020;Essai Ali & Taha, 2021;Setiawan et al., 2019;Sultana et al., 2021): ð Þ (γ = current iteration's contribution from previous gradient step, a = learning rate, ' = iteration number, h = parameter vector, rEðh) = loss function).
The application of learning rates that vary by parameter and may automatically adjust to the optimized loss function shall enhance the network training. RMSProp comes into place where it upholds a moving average of the parameter gradients' element-wise squares with a decay rate of the moving average identified as β2 and is applied in the below equation (Dubey et al., 2020;Essai Ali & Taha, 2021;Setiawan et al., 2019;Sultana et al., 2021): (b 2 = the moving average's rate of decay, 2 = small constant is added to prevent zero division).
Besides, RMSProp has similar characteristics to Adam, provided Adam has the added momentum term. It maintains a moving average, element by element, of the parameter gradients and their squared values using the below equation (Dubey et al., 2020;Essai Ali & Taha, 2021;Setiawan et al., 2019;Sultana et al., 2021): A decay rate value of β1 and β2 can be specified using the gradient decay factor. Adam optimizer uses a moving average, and network parameters are updated using the equation. When gradients over several iterations are comparable, employing a moving average of the gradient allows the parameters to change and gain momentum in a particular direction. The parameter updates also decrease in size if the gradient is primarily noise-based because the moving average of the gradient shrinks. A technique to counteract a bias that occurs at the start of the training is also included in the whole Adam network.

Features selection
Numerous critical medical devices are discussed in the Malaysia Standard of Good Engineering Maintenance Management of Active Medical Devices in MS2058: 2018 (Department of Standards Malaysia, 2018). They are categorized into three main groups: therapeutic, diagnostic, and laboratory. Besides, the devices are grouped into critical equipment and patient support equipment. This article selects the critical medical devices as listed in MS2058: 2018. From this list, 72.4% of therapeutic and 48% of diagnostic devices are categorized as critical devices, whereas these two categories also increased efficacy, complexity, and tendency for adverse effects (Curtis, Tzannes & Rudge, 2011). Despite its criticality, in 2020, the diagnostic imaging equipment segment held the greatest market share, significantly impacting the medical device maintenance market globally (Markets and Markets, 2021). All devices under the laboratory group are categorized as patient support equipment and not classified as critical medical devices. Hence, by eliminating inactive medical devices and laboratory medical devices, the total number of 8,294 active medical devices. All 44 types of critical devices are located at critical locations such as operation theatre (OT), ICU, accident and emergency (A & E), and wards. A total of 34.68% of the devices are infusion pumps, and 14.93% are physiologic monitoring systems which are located in various areas in Perak hospitals. Other devices below 10% in percentage are radiographic systems, drills bone, cystoscopes, colonoscopes, colposcopes, laparoscopes, mobile radiographic/fluoroscopic, dental radiographic, surgical hand drills, injectors, lithotripters, pacemakers, peritoneal dialysis units, resuscitators, stimulators, and ultrasonic. High-end medical devices or other categories are small in numbers and primarily located in X-ray Department. These include Radiographic/Fluoroscopic Systems Angiographic/Interventional, Radiographic/Fluoroscopic Systems General-Purpose Radiographic Units Mammographic, Scanning Systems Computed Tomography, Scanning Systems Magnetic Resonance Imaging, Full-Body. This high-end equipment is located at State Hospitals, Major Specialist Hospitals, and Minor Specialist Hospitals with medical imaging practitioners or Radiologist specialists.
We have selected 17 input parameters or features to be fed and tested in ML and DL predictive frameworks. Details on each parameter are summarised in Table 1, where some features are entered numerically, and others are provided to the ML and DL framework in a categorical manner.

Service support
In manufacturing medical devices, a high technological impact is implemented. Therefore, manufacturers or authorised representatives must ensure the technical service's availability throughout its life cycle. After-sales service is essential for troubleshooting during failure, including procurement of spare parts, including wear and tear components, and the said parties shall provide a technical recommendation. During the procurement stage, the user and technical procurement committee will request the medical device suppliers to include a guarantee letter of product service for at least 10 years after delivery. Similarly, the manufacturer will publish a discontinuation letter to inform customers that the model has been discontinued after 10 years or less. At this point, the user is urged to replace or upgrade the medical devices.

Asset condition
Asset condition definition is divided into three categories: active or still in use, declared as an unrepairable failure, and approved for disposal. The same terminology is used in ASIS to categorize these devices. A functioning device being utilized by patients or placed on standby in clinical locations is an active device. Whenever after-sales service is no longer available and/or securing spare parts is problematic, the Concessionaire's maintenance team will recommend the device be classified as an unrepairable failure. On this note, clinical engineers will evaluate the recommendation to verify whether there is a shortage of medical devices, the device's safety, and replacement is approved upon disposal. Later, the clinical engineer will issue a disposal certificate to proceed with the disposal process as per the Malaysia Treasury Circular.

Service intention of function
Service intention refers to the medical equipment's primary purpose or intended use. Five criteria are involved: life support, therapeutic, diagnostic, analytical, and miscellaneous (Zamzam et al., 2021). A life support device is the type of device of which patients would suffer prolonged or new injury, or worse still, be in a fatal state, should the device become unavailable or malfunction. The device that provides treatment for any illness or disease is therapeutic equipment. A diagnostic device is used for diagnostic purposes, and the device will display clinical parameters or human anatomy images for further diagnosis that will be examined by clinical personnel. This study excludes the analytical and miscellaneous categories since only critical groups are included.

Frequency maintenance requirement
The manufacturer of the device sets frequency maintenance requirements for every device. As they become old or highly utilized, more maintenance is required, and subsequently high risk of maintenance errors (Dhillon & Liu, 2006). Upon procuring a medical device, the manufacturer will provide a user and service manual with a suggested maintenance schedule. The suggestion offers PPM frequency, spare parts code, minor troubleshooting steps, etc. In Malaysian Government hospitals, the Concessionaires will follow the maintenance schedule set by the manufacturer, which is executed in a cycle of three months, semi-yearly or yearly. Besides, the medical device regulation in Europe establishes a new reusable or reprocessed class I device where the manufacturer must support the safety and efficacy of the cleaning, disinfection, and sterilizing processes (Garzotto et al., 2020). Diagnostic imaging devices shall fulfill the requirement set by the regulation due to harmful radiation effects and must be controlled within a specific limit (Anis et al., 2020;Atomic Energy Licensing Board, 2006). The quality control (QC) certificate issued by a class H Licensee must be issued twice a year, generally after PPM is conducted, to ensure that radiation exposure is within safe limits for patients. Maintenance at the specified intervals can reduce the likelihood of failure, but it requires a high cost and skilled employees trained by the manufacturer.

Maintenance complexity
The degree of difficulty in completing maintenance procedures is defined as maintenance complexity. There are two types of maintenance complexity set as predictors: average and high-end maintenance with electrical safety test (EST). High-end equipment is principally diagnostic imaging equipment and system located in radiology department. This type of system requires a three-phase power supply and is equipped with a console and voltage stabilizer as a complete system. EST is performed on a device with direct contact with patients or humans to ensure no electrical leakage, and the current flowing is within the limit. Electrical safety is paramount in medical device quality assurance best practices. Shock can cause disruptions during healthcare procedures and result in injury or death.
The main objective of this test is to ensure patient and user safety.

Total downtime
Downtime is a reverse condition of uptime. A medical device in a non-functioning state is under a downtime period until the rectification work is completed. The downtime affects the service delivery, and the user needs to search for a replacement or standby unit during the interrupted period. The time will be calculated from the time user launch a complaint through the helpdesk or ASIS, where the time is recorded until the device is back to its normal condition. In Malaysia's medical device service contract, the Concessionaire shall provide a replacement or loaner during this downtime period. To ensure service delivery is in place, they must also bear the cost of outsourcing patients or laboratory samples to private hospitals as a service delivery obligation.

Alternative and backup
Alternative services are a mechanism to minimize service interruption where an external party or private healthcare provider provides patient care services. This includes outsourcing patients or laboratory samples due to faulty medical devices and maintenance factors. A loaner or a backup device is a medical device temporarily placed on service to replace the malfunctioning device to ensure minimal service interruption. The replacement must be executed after failure and limited to the critical medical device set in the contract. A supply of loaners is a must in the comprehensive contract for selected medical devices such as aspirators, bronchoscopes, colonoscopes, cardiotocographs, defibrillators, electrocardiographs, hemodialysis units, incubators, infusion pumps, mattress systems, nebulizer, vital sign monitors, physiologic monitoring systems (acute care), electrosurgical unit, sphygmomanometers, and ventilators. Other devices are not included in the contract, and a replacement request is an option to be fulfilled.

Operations
The average usage in hours within 6 and 7 days is defined as a parameter for operations features. There are two types of operations: average usage of 12 h in 6 days or 24 h in 7 days. The time is an average based on locations and is subject to the actual usage by the user. Unfortunately, no sensor has been installed to monitor every medical device's usage or utilization. The more accurate result for this feature requires the installation of an individual sensor at every device for recording purposes. A sensor installation and usage log are still under the Ministry of Health Malaysia's ongoing projects, and the model shall be improved after the installation is completed.

Maintenance cost
Maintenance cost is the total cost spent by the Concessionaire to rectify the failure of medical devices. The cost will be entered into the system for record purposes and spare part tracking or analysis. Providing all maintenance services in-house is not always possible. In such cases, using external service providers for a significant portion of the maintenance activities may be necessary; thus, this will incur additional costs. External service providers are divided into two categories which are equipment manufacturers and independent service organizations (World Health Organization, 2011b). The maintenance cost in this article includes contractor costs if authorized vendors, labour, and parts costs perform the rectification work. The data will consist of only the cost available in the system, and the cost of outsourcing patients or samples is not included.

Manufacturing country, model, manufacturer, and brand
Despite device utilization and age, other essential factors in evaluating a medical device's performance are model, manufacturer, manufacturing country, and brand. For example, two medical devices of the same age will have different performance and uptime status, depending on the model specification and utilization. These features help train a model to evaluate performance based on its specification, and the proposed model can be used by tendering committee during the medical device procurement stage in the future. The difference between these four features is that the USA is the country of origin, and the model is Smartvent 7900. The manufacturer is Datex-Ohmeda Inc, and the Smartvent represents the Brand. The major players in the medical devices maintenance market are GE Healthcare (Chicago, IL

Failure classes
The classification problem in supervised ML is overcome by defining response classes before execution. Hence, based on the tabulated data extracted from ASIS, an arbitrary technique is used to create three classes based on the available dataset. Then, the best balance classes are selected, and the response classes for critical medical devices are subdivided into classes 1, 2, and 3. With the capability of supervised machine learning analyzing retrospective data, classifying critical medical devices is beneficial for more effective PPM planning, management replacement plan, and strategic annual budget preparation. As for newly purchased critical medical devices, the proposed model can evaluate its performance throughout its lifespan based on the brand, model, failure history, and other features used during the training stage. Five algorithms are utilized in ML: decision trees, naïve Bayes, support vector machines, ensemble classifiers, and neural networks. In addition, a cross-validation technique with segregation of 80% training and 20% testing is used. The proposed predictive model uses 17 features, as tabulated in Table 1, to distinguish the devices into three classes. Class 1 is defined as unlikely to fail within the first 3 years from the purchase date, while class 2 is for devices that are likely to fail within 3 years. Class 3 is for devices likely to fail more than 3 years after purchase, as tabulated in Table 2. Medical devices in class 1 are less critical and have low complexity in maintenance than in classes 2 and 3. Class 1 can be identified as the least problematic devices, including newly purchased ones. Class 2 is a matter of concern since the failures are detected closer to the purchase date. The class 3 devices are classified as prone to failure after 36 months of purchase date. The scheduled maintenance frequency is conducted as per the manufacturer's recommendation in regular maintenance practice. There is no consideration has been made based on the failure history data. Therefore, with the proposed predictive framework in this study, new recommendations shall be made where maintenance frequency and cost can be reduced based on the forecasted analysis attained from the proposed model. Out of the total 8,294 critical devices in 15 different hospital categories, 49.51% of devices were classified as class 2, while the remaining 26.89% and 23.58% are categorized as Class 1 and 3, respectively. Thus, the yearly budget allocation for class 1 can be reduced and should be reallocated to classes 2 and 3.

Parameter tuning and optimization
Parameter tuning and optimization are executed to improve classification performance, accuracy and introduce a unique identity to the model. Ensemble classifier outperforms other algorithms with an accuracy of 77.90%, followed by decision trees with 76.30% after parameter optimization, as explained in Table 3. Receiver operating characteristics (ROC) curves and AUC were measured where the higher the AUC values (as it is closer to '1') indicate an excellent predictive model (Shiferaw, Bewket & Eckert, 2019). In this study, Ensemble classifiers attained an AUC of 0.89, decision trees 0.85, while both support vector machines and neural networks attained 0.82. Meanwhile, naïve Bayes only achieves 0.80 for AUC values. Parameter optimization increased the model accuracy for four algorithms. However, parameter optimization for decision trees remains at an accuracy of 76.30% before and after optimization. Hence, ensemble classifier performs best with the highest recall, precision, specificity, and F1 Score values after parameter tuning and optimization. As compared to the DL technique, SGDM optimizer denotes the highest accuracy compared to RMSProp and Adam optimizer. There is a reduction in performance accuracy from 77.90% for ML to 70.33% for the DL model. However, DL has the advantage of shorter training time than ML. DL requires 1 min 5 s to complete the training progress, much faster than 11.49 min using ML. A model optimization is performed for all algorithms and optimizers in both techniques. A kernel scale is optimized in support vector machines algorithm from automatic to one, producing 65.90% instead of 65.80% during pre-optimization, as in  Fig. 2. Nevertheless, the optimization process reduced the accuracy from 68.88% to 68.76% for Adam optimizer. A training progress graph interprets a DL technique, representing the accuracy of each unique mini-batch. The model performance can be monitored in real-time and model progress can be stopped at any time. Figure 3 shows RMSProp takes the longest elapsed time of 1 min 13 s, followed by 1 min 12 s for Adam and 1 min 5 s for SGDM optimizer, respectively. The model accuracy is plotted in "blue" while model losses are plotted in "orange." Initially, when the epoch increases, the losses are reduced and continually saturated until it reaches the maximum epoch of 60. The training progress is stopped at epoch of 60 since the progress is saturated and gives an insignificant impact if the process continues.

Predictive model performance and evaluation
The quality of the characteristics given to the algorithm determines the accuracy of the model's predictions. The overall accuracy and Kappa coefficient are two extensively used indicators for thematic accuracy controls on error matrix (Garcia-Balboa et al., 2018). In addition, precision and recall are two relevant metrics used for evaluating prediction accuracy (Teo et al., 2020). The recall performance measures the trustworthiness of the result or ability to classify positive outcomes at a true positive rate. At the same time, the precision demonstrates the predictive or positive values which correctly predicted (Kareen, 2020). Both precision and recall are frequently at odds, and boosting precision reduces the recall with text retrieval concepts representing both calculations (Ghorbani et al., 2020). A recall is a number of relevant features in the selected subset divided by the total number of relevant features. For precision, it is divided by the total number of features specified in the dataset (Santra & Christy, 2012).
Analyzing and comparing models based on recall and precision is time-consuming; thus, employing the F1 score method is another viable result (Ghorbani et al., 2020). The F1 score is a harmonic average of precision and recall, consider both metrics and evaluate the model's accuracy and dependability. The confusion matrix is the most common method of reporting on the thematic accuracy of geographic data (Garcia-Balboa et al., 2018;Santra & Christy, 2012). The true class is represented by the rows of the confusion matrix, while the columns represent the predicted class in the 3 × 3 matrix. Correctly classified observations are expressed by diagonal cells; in contrast, erroneously classed observations are represented by off-diagonal cells with TN as a true positive, FP as a false positive, FN as a false negative, and FP as a false positive (Kareen, 2020;Zamzam et al., 2021). An equation or an evaluation metric by Hameed et al. (2021) is used to calculate accuracy, recall, precision, specificity, and F1 score based on a confusion matrix (Kareen, 2020;Teo et al., 2020). The below equation is calculated based on confusion matrix values and is summarized in Table 3. Ensemble classifier has the best performance with 74.67%, 75.39%, 87.60%, and 75.03% for recall, precision, specificity, and F1 score values, as explained in Table 3 previously. Accuracy ¼ Total number of correctly classified Total number of observation Table 4 describes the sensitivity analysis and evaluates the features' effect on the model performance. Leave one out technique is performed by calculating misclassification and then dividing with all feature errors to obtain the ranking ratio (Chen et al., 2020;Gazzaz et al., 2012;Pastor-Bárcenas et al., 2005). The ratio is ranked in descending order to highlight the impact of the features on the model. Next, five sensitivity analysis techniques, namely Leave One Out, MRMR, Chi2, ANOVA, and Kruskal Wallis, are applied to compare the ranking for all features. MRMR utilizes the minimum redundancy maximum relevance algorithm to rank the characteristics in order. A chi-squared algorithm uses a chi-square test, an ANOVA uses one-way variance analysis, and Kruskal Wallis performs a hypothesis with the same median from the population (The MathWorks Inc, 1994-2021. These three algorithms ranked the features by −log(p) scores, where the p-values denote the chi-square test statistics. The five highest-ranking features for every technique are underlined in Table 4, and the features are analyzed in the software to achieve better accuracy. Of the 17 features, only eight are identified as the most significant to obtain the highest accuracy. If more than eight features are selected, the accuracy decreases. Figure 4 compares sensitivity analysis techniques when the eight most significant features are selected to develop the model. The features are age, service support, asset condition, maintenance complexity, total downtime, maintenance cost, model, and purchase date. The graph demonstrates ANOVA and MRMR outperform other techniques with 79.20% accuracy, followed by 79.00% for Chi2. Kruskal Wallis and the Leave one out technique attained 78.10% and 77.50% accuracy, respectively. The model is improved from 77.90% using 17 features to 79.20% accuracy with eight features after sensitivity analysis.

Proposed predictive model using machine learning
The ML predictive model is improved from 77.90% to 79.20% accuracy after sensitivity analysis, as described in the result section. The ML requires more training time of 11.49 min compared to only 1 min 5 s using DL when all 17 features are embedded in the model development. After eight significant features are imported with a learning rate of 0.01 and iterations of 50, the ML model improves its performance to 79.50% accuracy, 76.05% recall, 77.43% precision, 88.36% specificity, and 76.73% for F1 Score indicator. The tuning of learning rate and iterations enhanced the model, with training time reduced from 11.49 to 7.908 min after sensitivity analysis. Although ML has better accuracy than DL, it  requires more training time. The DL technique predicts more extensive data better, is primarily used in time series or image data and requires less time than ML. The result concludes that ML performs better in accuracy and all other performance indicators than DL. This best-optimized ensemble classifier model uses Ada Boost with a maximum number of splits of 20 and a learning rate of 0.01 to yield 79.50% accuracy, as shown in Table 5. The proposed model is expected to improve the current system toward smart healthcare management. The model predicts classes 1, 2, and 3, which represent the time to the first failure event for an effective maintenance schedule and to utilize budget allocation. As compared to the other related works on medical device performance prediction, Kovačević et al. (2020) in infant incubators study predicted the device functionality and classified two different classes: accurate and faulty class with an accuracy of 98.5%. A similar methodology is applied by Badnjevic et al. (2017) for a mechanical ventilator. Using performance parameter values, a defibrillator study achieved 100% accuracy in the Random Forest classifier to predict positive: for devices that passed inspection or negative: for faulty devices (Badnjević et al., 2019). Meanwhile, Hrvat et al. (2020) attained 98.06% accuracy based on a conformity assessment where the outcomes are identified as pass or fail for infusion and syringe pumps. This article has significantly contributed to the medical device reliability assessment research by including 44 types of critical medical devices during model development and analyzing significant cost reduction after implementing the predictive model. To the best of our knowledge, no previous studies proposed a predictive model for anticipating the likelihood of a device's first failure after the purchase with cost analysis and comparison between ML and DL techniques.
Characteristics for classes, schedule maintenance, and replacement plan Classes 1, 2, and 3 have different properties, and the boundaries are set based on the patterns from the vast data. The present service contract fee per month in Malaysia is calculated by multiplying the device purchase cost and rate of fee, with the rate of fee defined in percentage based on the type of device. High-end equipment has a higher fee Poor maintenance, planning, and management are the most common causes of medical device failures (Arab-Zozani et al., 2021). Countries have different approaches to organizing maintenance and operational budget planning for medical devices. Saudi Arabia uses life cycle cost estimation in making decisions (World Health Organization, 2011a), and Turkey divided medical devices into technological groups to calculate cost distinctly (Bektemur et al., 2018). In the UK, they use purchasing, donations, replacement, and disposal policies to decide where, what, and when to procure medical devices. The replacement budget per year is implied by dividing the device's current prices by lifetime (Temple-Bird et al., 2005). The New Delhi Medical Equipment Maintenance policy uses the maintenance cost index as an indicator by dividing maintenance cost by capital cost. The maintenance cost values should not increase by 80% of the capital cost of equipment (Kumar, 2012). The United States of America applies the cost of service ratio by dividing the total annual cost for maintenance by the initial cost value. It is used as guidance to improve performance (World Health Organization, 2011b). Overall, at the time of speaking, none of these countries utilize AI applications for maintenance budget planning.
A rule of thumb with 80/20 rule is used for CM and PPM in Stenström et al. (2015), and their results describe PPM represents 10% to 30% of total budget allocation compared to CM. This article's cost-saving calculation uses 80/20 for CM/PPM yearly and 70/30 for CM/PPM bi-annually for cost estimation analysis. class 1 age ranges between less than 1 year to 30 years, which consists of 2,231 devices in service. There is zero failure, and zero parts cost recorded throughout its lifespan. Throughout the years, PPM is scheduled annually and bi-annually for class 1, with a percentage number of devices are 90.45% and 9.55%, respectively. Therefore, due to its low criticality, PPM frequency is suggested to be reduced from bi-annually to annually with an estimation of cost-saving equal to MYR 199,256.45 per year, as demonstrated in Table 6. On this note, this study proposes a new maintenance strategy, which includes recommendations as illustrated in Table 7. Among others, this study suggests that by reducing the fee rate for class 1 devices, the operational cost can be further reduced. The PPM can be replaced with routine inspection for small devices such as aspirators where the maintenance tasks are minimal; hence the yearly maintenance budget allocation for this group can be reduced for the following year. Maintenance can be executed in-house where only average maintenance complexity with EST is required. In addition, a minimal budget per year is necessary for a loaner or rental costs during downtime for this group of devices.
The biggest class group is Class 2, where 4,107 devices are grouped between 1 to 27 years of age. A total of 83.56% of PPM is scheduled annually, and the remaining are on a biannual basis, including 44 types of critical medical devices. Based on the finding of this study, the PPM schedule in class 2 is suggested to remain due to its criticality and risk of failure identified within 3 years of purchase. The devices are expected to fail at any time after being purchased; hence contingency plans or rental devices shall be planned for better service delivery to patients. Higher priority in budget allocation for maintenance and replacement is recommended for class 2 compared to class 1 and class 3. An existing warranty provided by the manufacturer or authorized party is typically within 1 or 2 years after purchase. Therefore, remedial action, such as improving the warranty service to 3 years after purchase, shall accommodate this group's critical needs. Besides, the cost saving can be maximized by purchasing a 3-year warranty, including breakdown. Hence, zero cost is required under the service contract for the first 3 years, with all PPM and breakdown costs embedded under 3 years warranty. The next group is class 3, with devices likely to fail more than 3 years after purchase, with a lesser risk of failure and complexity than class 2. A percentage of 89.52% of the PPM schedule is planned annually, and 10.48% for biannually. PPM frequency is suggested to be reduced bi-annually to annually with an estimated cost-saving equal to MYR 127,074.43 per year, as shown in Table 6. Like class 2, zero cost is required under the service contract for the first 3 years for class 3 devices if PPM cost is embedded under 3 years warranty. The advantage of this alternative is that the service contract fee will only start in 4 th year after class 2 and class 3 devices are purchased. This framework will lead to more cost savings by reducing service contract values.
In addition to the cost-saving demonstrated in Table 6, a breakeven analysis graph can be observed in Fig. 5. This graph shows a case study analysis for a mammographic unit with 21 years of service in the class 3 category. The analysis from the graph depicts that the proposed model has a lesser service contract cost per year compared to the current maintenance practice. With the purchase cost of MYR 422,026.00 allocated by the Government for mammographic equipment, MYR 48,744.00 per year is spent on maintenance services under the present contract. However, if the proposed model is implemented, MYR 41,432.20 per year will be used, which is a 15% savings from the actual cost spent under the present contract. The service contract cost is equivalent to the purchased cost in the 11 th year for present practice and the 12 th year for the proposed model. Therefore, the Government is suggested to replace the equipment no later than 12 th year for better profit management. At the 12 th year, a new device shall be purchased, and unnecessary service contract costs shall be eliminated from 12 th to the 21 st year of service as per the current implementation. Another strategy to consider is the replacement plan for faulty devices. This study proposes class 2 devices be prioritized for a replacement plan, with 10% of this group's devices being more than 20 years in service, as tabulated in Table 7. A total of 57.19% of class 2 devices have more than 10 years in service, with a high number of failures throughout their lifespan. Due to the aging factor and a significant number of failure events, replacement with new units should be considered to reduce maintenance costs as the age arises. Class 3 has a similar scenario with 65.18% of devices with more than 10 years in service and should be considered for a replacement right after class 2.

Research contributions
The development of IoT, cloud computing, and AI are continually evolving toward smart healthcare and smart city. The proposed predictive model using AI for medical devices failure prediction is categorized into classes: class 1, 2, and 3. The model accuracy is evaluated by examining other elements in performance evaluation, such as recall, precision, specificity, F1 score, and AUC. The research gap is improved by discovering new scientific findings; The proposed model includes 44 types of critical medical devices used in five hospital categories, with 15 healthcare facilities involved, including all critical devices used in the clinical area for patients. The data applied during the training stage includes active  medical devices with data on maintenance cost, devices with more than 20 years in service, and sample size up to 8,294 devices resulting in the highest accuracy of 79.50%. Only eight out of 17 features are significant after sensitivity analysis, with a reduction of training time from 11.49 to 7.908 min. A robust model to predict failure in three classes using AI is introduced with fewer features, shorter training time, and comprehensive cost analysis. This proposed model can forecast the first occurrence of failure in classes 1, 2, and 3 after medical device is purchased for comprehensive maintenance planning and budget utilization which is currently not in the field of study. The maintenance shall be arranged before the first failure event. Class 1 is the less critical device with zero failure. Class 2 is identified as the most crucial and should be attended to, with the first failure occurrence likely to be within 3 years after purchase. Class 3 is at medium risk; devices are likely to fail after 3 years. To reduce the likelihood of future failures, device replacement for class 2 is in higher priority, followed by class 3. A replacement is proposed by a number of failures and year of service category. This is crucial for any country during crisis management, such as COVID-19 pandemic, where excellent and reliable equipment is highly utilized. This article compares the country's role in organizing budgets while reducing costs in maintenance management. A new PPM schedule and replacement plan frequency are proposed and strategized based on actual needs. Comprehensive strategic management by criticality and devices maintenance history using AI improves maintenance and operational cost. Replacement strategy shall be executed to class 2, then to class 3, depending on the age of the devices and the likelihood of failures. Two different techniques between ML and DL were compared, and ML has better performance in accuracy, recall, precision, specificity, and F1 Score. DL has the advantage of shorter training time; however, the accuracy is lower than the ML technique. The accuracy shall be further improved in future work by introducing unstructured maintenance notes written by technical personnel after rectification work is completed.

CONCLUSIONS
A robust predictive model for 44 types of critical medical devices is proposed in this article for smart healthcare management within three failure classes; class 1, 2, and 3. Class 1 includes devices unlikely to fail within the first 3 years from the purchase date; class 2 is the devices that are likely to fail within 3 years from the purchase date, and class 3 is the devices that are likely to fail more than 3 years after purchase. The result concludes Ensemble Classifiers have better performance than SGDM optimizer and attained the highest accuracy of 79.50% with the highest recall, specificity, and F1 score values after significant features are recognized. Replacement of class 2 devices is expected to improve critical medical devices' uptime and optimize yearly budget allocation. An aging medical device with a high number of failures is impacted by high maintenance costs, higher downtime, and exceeding its lifespan, which can jeopardize patient safety. The lifespan of medical devices is also affected by various factors such as age, utilization, environment, user handling, specification, availability of spare parts, safety, etc. Although PPM frequency is suggested to be reduced in classes 1 and 3, technical personnel shall strengthen PPM tasks to accommodate the needs before reaching the following PPM schedule and ensure useful life is not compromised. The functionality and safety should be guaranteed during PPM as per IEC60601 requirements. In addition, the predictive model can be used as a reference for new medical device procurement. AI's capability to understand hidden patterns of historical device data can facilitate decision-making by making the replacement process faster. This can reduce time spent on data analysis, where the process optimization tasks can be done automatically without human intervention. In addition, the predictive maintenance capability achieved by AI allows healthcare institutions to maintain industrial medical equipment based on the times and conditions of operation, allowing the equipment to increase its performance and life cycle. Having more information in a structured way allows clinical engineers in charge to make decisions faster and more efficiently. In addition, we have demonstrated the feasibility of leveraging AI technology in developing smart maintenance systems. This contribution has shown promising results to be implemented in our healthcare facilities by integrating existing asset management systems and the developed AI predictive model. We have also presented how government hospitals can save their maintenance budget by having a smart maintenance system using AI predictive model. The limitation of this research is that data collection is based on existing data in ASIS, and there are possibilities where no failure event is recorded even though it has occurred. Other factors, such as user disregard to launch a failure complaint, error during human intervention with the system, or any system issues, might also be the constraint. Future work shall include adding an unstructured maintenance note during failure events to improve the model's accuracy and reliability. Data Availability