Bone Drilling: Review with Lab Case Study of Bone Layer Classification Using Vibration Signal and Deep Learning Methods

Abstract: In orthopedics, bone drilling is a crucial step in the surgical procedures commonly carried out for internal fixation in bone fracture treatment. The primary purpose of bone drilling is to create holes for screw insertion to immobilize fractured parts. The bone drilling task depends on the high level of skill and experience of the orthopedist and surgeon. This paper aims to provide a summary of previously published review studies in the field of bone drilling. It also presents a comprehensive review of the application of machine learning to bone drilling as a future direction for automation systems. This review can also help surgeons and bone drilling practitioners understand the latest improvements in parameter selection and optimization strategies for reducing bone damage in bone drilling procedures. Apart from the review, bone drilling vibration data collected in a university laboratory experiment are also presented in this study. The vibration data, acquired from three different layers of cow femur bone, are processed and classified using several deep learning (DL) methods, namely long short-term memory (LSTM), convolutional neural network (CNN), and recurrent neural network (RNN). These DL methods are used in the bone drilling lab case study to show that the bone layers are associated with the vibration signal and that they can be classified and predicted using DL methods. The result shows that LSTM is outperformed by CNN and RNN.


Introduction
Bone drilling is a medical procedure that involves the creation of small holes in bones for various purposes, including diagnosis, treatment, and research. Although bone drilling may be invasive, this technique has become an integral part of modern medical science, and it offers various important benefits. Bone drilling plays a crucial role in accurately diagnosing various diseases and conditions. In addition to diagnosis, bone drilling also makes a significant contribution to medical research. The development of safer and more precise drilling equipment and techniques has become possible because of the efforts of scientists and doctors. These innovations have a positive impact on patient safety and the effectiveness of medical procedures and can help reduce the risk of complications and speed up the healing process. In bone drilling, the inner bone structures that are examined and treated with nails and screws are generally composed of three layers, namely first cortical, spongy, and second cortical [1].
Currently, manual hand drilling is still the main method in orthopedic surgery, in which the process is solely controlled by the surgeon or orthopedist. Bone drilling requires extensive experience and dexterity from the orthopedist and the surgeon. The drilling procedures are performed without visual guidance, making it difficult for surgeons to determine the depth of the holes they are creating [2]. As a result, the effectiveness of the bone drilling process is strongly dependent on the surgeon's skill and ability to evaluate the drilling operation based on their own understanding [3]. The surgeon's experience of bone drilling is subjective; for example, the force applied by the surgeon depends on the drill bit speed, the bone condition, and the type of drill bit [4,5].
A recent study that compares ultrasonic-assisted drilling (UAD) to conventional drilling in bone surgeries is presented in [6]. The study examines optimal drilling parameters such as drilling force, temperature elevation, osteonecrosis, and micro-crack formation. The study found that ultrasonic drilling resulted in less force and did not produce micro-cracks in cortical bone compared to conventional drilling. However, it has the side effect that the temperature elevation is higher than in conventional drilling. In addition, histopathological and scanning electron microscopic (SEM) analyses were conducted to evaluate the osteonecrosis and structural damage. The result shows that UAD is more advantageous for bone surgeries than the conventional method because it can reduce tissue damage. Another comparison study of conventional and UAD techniques is presented in [7]. The study presents a detailed comparative analysis of diametric delamination in the drilling of cortical bone using conventional drilling and UAD techniques. A coordinate measuring machine (CMM) is used in the study to characterize delamination during bone drilling. A quantitative comparison was also presented in the study, with the finding that UAD causes less delamination than conventional drilling, with maximum delamination for UAD and conventional drilling of 8.54% and 9.15%, respectively. The application of ultrasonic actuation in bone drilling is also presented in [8]. The objective is to reduce the cutting force and temperature during bone drilling. A comparison between conventional drilling and UAD is discussed. The study found that UAD has a higher viability and greater pullout strength, which can potentially lead to low-trauma surgeries.
Recently, bone drilling research has mostly focused on monitoring techniques and drilling parameters. An automated bone drilling system and a bone-drilling medical training system (MTS) [9] are the future directions of this particular research area. The bone drilling MTS is a sophisticated tool designed to train medical professionals in the application of force during bone drilling procedures. This system operates in a virtual environment (VE) and aims to teach users how to apply force within a specific range, thereby maintaining a constant drilling thrust velocity. The virtual reality (VR) simulator provides visual, acoustic, and haptic warning signals [9]. A training system with 3 degrees of freedom (DOF) force feedback is presented in [10]. Another proposed MTS concept is presented in [11]. The concept generally consists of the following: (1) a system architecture based on a haptic display (HD) and a graphical user interface (GUI); (2) a control system using proportional derivative (PD) position control.
A comprehensive review of surgical simulators for orthopedic and neurosurgeries, which focuses on haptic and VR technologies, is presented in [12]. The review reports that the main part of the orthopedic simulator is the haptic system. The haptic system in the simulator is expected to provide tactile sensations that mimic the real-life feel of orthopedic surgery. The haptic system is supported by force feedback, which is calculated based on the interaction between virtual tools and the simulated anatomy [12]. More details of the review on MTS and potential automated systems in bone drilling are presented in Section 3.
The current review paper offers a different perspective on bone drilling by discussing the vibration analysis of different bone layers and the application of DL methods. The structure of the paper following the introduction is as follows: Section 2 presents a summary of the previously published review papers on bone drilling. A brief review of the MTS and robotic drilling as potential future technologies is given in Section 3. Section 4 presents a review of published papers on bone drilling vibration analysis and the application of ML methods for bone layer classification. Section 5 presents the experimental setup and procedure of the bone drilling lab experiment. The application of DL methods is presented in Sections 6 and 7. A detailed description of the LSTM method and its results is presented in Section 6. For the other DL methods, i.e., convolutional neural network (CNN) and recurrent neural network (RNN), a performance comparison is presented in Section 7. Section 8 presents the conclusions and the future direction of the study. Because this review paper uses many technical terms, a list of abbreviations is provided in Table 1.

Previous Review Studies on Bone Drilling
Extensive review studies related to bone drilling have been published. The differences between the present review paper and other published review papers are summarized in Table 2. The following is a more detailed description of Table 2: A detailed review of various studies on bone drilling is presented in [4]. The paper compares various studies on bone drilling, highlighting the influence of bone drilling parameters and drill specifications to find the optimized bone drill specifications for a better outcome. The study described that a significant risk during drilling is the increase in bone temperature, which can lead to osteonecrosis and can affect the stability and strength of the bone fixation. In their future directions section, the authors emphasize the need for more advanced drilling methods, precise experimental setups, and automated systems to minimize human error and reduce associated risks. At the end of the paper, the authors provided eight points for future work, one of which is to improve penetration control, which in typical bone drilling relies on manual skill, by developing automated drilling systems using a fuzzy logic controller that analyzes the current consumption of the direct current (DC) motor.
The practicality, limitations, and complications related to surgical drill bits in bone drilling are discussed in [13]. The study starts with the types and anatomy of surgical drill bits, followed by the cutting operation, which causes heat generation. Mechanical properties of the drill bit, such as moment of inertia, wear, and dulling of the cutting face, are also explained. Intraoperative and postoperative complications of the drill bit during surgical bone drilling are also presented comprehensively. The study also summarizes the previous research related to thermonecrosis biological models. In its future directions, the study mentioned that ultrasonic or vibration-assisted drilling is one technology for reducing both axial thrust force and drilling torque.
A review that focused on cutting force and temperature variation in bone drilling is presented in [14]. Drilling holes at accurate positions and keeping the area around the holes clean are crucial. The study also mentioned the importance of maintaining a temperature of less than 47 °C during drilling to avoid bone cell death due to thermal osteonecrosis. Drill design, drill parameters, and coolant were reported as the important factors for controlling heat in bone drilling. Other factors, such as spindle speed and feed rate, are also important for avoiding bone damage.
A comprehensive review of the mechanical and thermal responses in bone drilling, which is a critical aspect of various procedures, is presented in [15]. The paper discusses the bone structure, drill-bit geometry, operating conditions of bone drilling, and techniques and optimization. In the bone structure section, the inhomogeneity and anisotropy of bone tissues and their impact on drilling outcomes are discussed. The influence of drill-bit design on the efficiency and safety of bone drilling is part of the drill-bit geometry section. The effect of drilling parameters such as spindle speed and feed rate on the mechanical and thermal responses during drilling is presented in the section on operating conditions. Current techniques used in bone drilling and parameter optimization are presented in the last part of the review paper. However, no future work or future direction was provided.
Another review that discusses the factors affecting heat generation in bone drilling is presented in [16]. The paper focused on the thermal osteonecrosis that occurs during bone drilling. The study suggests the need for more in vivo studies on human bone and on how drilling parameters interact to influence heat generation. However, measuring bone temperature remains a challenge due to the complex properties of bone tissue and the lack of a standard procedure.
Another comprehensive review on bone drilling process investigations and possible research directions is presented in [17]. A typical schematic diagram of the bone drilling process is provided. Factors influencing bone drilling efficiency and temperature rise are also discussed. The study mainly focused on investigations of conventional bone drilling to obtain information such as bone type, experimental type, experimental details, and research outcome. To complement the conventional bone drilling review, the authors also covered investigations of non-conventional bone drilling such as ultrasonic-assisted drilling (UAD) [18], the vibrational drilling technique [19], water-jet drilling [20], the automatic drilling process [21], and the acoustic emission (AE) based monitoring process [22]. In the summary section, the authors highlight that most of the previously published research articles presented temperature measurement and analysis during orthopedic drilling. Further summary points can be found in the paper.
Heat is one of the major issues in the bone drilling procedure, and the study in [23] presents the factors that affect drilling behavior in order to prevent excessive heat generation. The study also discusses a model of bone drilling to find the relationships between the drilling parameters. In its future directions, the study highlights the need for improved drill bits to minimize thermal and mechanical damage to the bone. The paper also emphasizes the need to develop and apply robotic bone drilling systems as an advanced bone drilling procedure.
A state-of-the-art review and comprehensive analysis of orthopedic drilling is presented in [24]. The review summarized numerous articles on conventional and non-conventional drilling parameters and their technologies. Bone drilling characteristics and control variables were presented in great detail. Apart from the detailed review, non-conventional techniques in orthopedic drilling are also described, including water jet-assisted drilling (WJAD), laser-assisted drilling (LAD), and UAD. Design of experiments and modeling in orthopedic drilling based on the Taguchi method, analysis of variance (ANOVA), and fuzzy logic are also presented. The review paper also provided several future directions, two of which are as follows: (1) vibrational bone drilling with an internal closed-loop irrigation system can potentially be used to minimize heat and thrust force; (2) robotic bone drilling with multi-objective optimization can reduce thermal and mechanical damage.
A review paper that highlights the use of robotics and autonomous systems designed for bone drilling as part of computer-aided orthopedic surgery (CAOS) is presented in [25]. The robotic autonomous systems were designed to optimize drilling speed, safety, and the effectiveness of various drilling parameters. The study also reviews several potential signal processing-based approaches for detecting when a drill bit breaks through bone. The authors therefore stated that signal processing methods applied to motor current, drilling sound, and vibration signals for breakthrough detection in conventional drills are viable new research topics.
A review that focuses on the advancements in surgical drill bit design and its impact on reducing thermomechanical damage during bone drilling is presented in [26]. The paper discusses how different geometries of drill bits influence bone damage, especially the importance of precise cutting tools to prevent damage to surrounding tissues. The review explores various drill bit geometries, highlighting how each design influences bone damage. The general objective of the review is to provide guidelines for designing drill bits to minimize damage and improve the effectiveness and safety of bone drilling surgeries. The paper suggests future research directions for improving surgical drill bit design, including flexible drill bits and chip-breaker designs, to enhance safety.
Jung et al. [27] present internal and external factors affecting heat generation. Drill properties, drill diameter, drill coating, and wear are categorized as internal factors. The external factors include drilling speed or feed rate, drilling depth, cooling, drilling energy, methodology used, and patient individual factors. A similar review that also discussed drill bit heat generation in surgical bone drilling is presented in [28]. The paper highlights that drill bit design is one of the important factors in reducing thermal damage during surgical bone drilling. In addition, other key parameters, such as feed rate and applied force, also contribute to heat generation. Another review paper on the impact of temperature on the bone drilling process is presented in [29]. The review paper encapsulates several related studies that emphasize the critical role of temperature control in the bone drilling process.
Table 2 (fragment of the last row; the remaining rows are not recoverable): Temperature effect in bone drilling; Numerical simulation; Not available.
Notes to Table 2: 1 The future study of Ref. [4] is explained in more detail in Section 2. 2 Readers interested in the detailed future work may refer to Ref. [24].

A Brief Review of Medical Training System and Robotic Drilling
The development of a medical training system (MTS) for bone drilling is presented in [9]. The main objective of the training system is to train medical professionals and enhance their skills via a VE. In particular, the trainee learns to keep the force within a certain range and to maintain a constant drill thrust velocity for a certain time. Multi-user support is the unique feature of the proposed MTS. One of the important parts of the training system is the haptic feedback for simulating realistic bone drilling sensations. The training system was validated through user tests and assessed using Euclidean distance.
A virtual training simulation approach called machine learning-based guidance (LbG) was introduced in [30]. The LbG approach aims for kinesthetic human-robot interaction (HRI) in virtual training simulations, particularly for bone surgical drilling. A femur bone drilling simulation is developed based on haptic feedback and X-ray views to help orthopedic residents practice, train, and improve their skills. The skill level of users and surgical expertise were assessed using machine learning tools. In addition, the virtual training system uses adaptive LbG forces, which are informed by expert surgeon knowledge, to enhance the resident's performance during simulation.
Another study that applied haptic feedback to virtual reality (VR) simulation of surgical drilling is presented in [31]. The general objective of the study is to shift surgical training for otolaryngology and temporal bone dissection to VR simulation because of the complexity of the anatomy. The haptic feedback in the VR simulation is used to provide a realistic sense of touch, especially the rendering of vibrations during surgical drill use. Specifically, the ability of four different haptic hand controllers to render realistic drill vibrations in VR surgical training was evaluated. Some future applications of the study are as follows: (1) to enhance VR surgical simulators by incorporating vibrotactile feedback; (2) to improve the training experience in bone drilling procedures.
The application of physical and virtual prototypes for temporal bone drilling simulation is discussed in [32]. The authors mentioned that a combined method of physical and virtual prototypes offers advantages such as ease of access, the possibility of repeated practice, and the absence of ethical issues. The future work of the study is to develop and use virtual reality in bone surgical simulation.

An Application of Vibration and Machine Learning Methods for Bone Drilling
This section presents a review of techniques in bone drilling experiments with vibration, ultrasonic, and acoustic emission signals. It also presents applications of various machine learning (ML) methods for optimization, regression, and classification in bone drilling studies.

Bone Drilling Vibration
When a drill bit makes mechanical contact with bone during bone drilling, it applies force to the bone surface, penetrating it and generating a vibration signal. The vibration signal generated by this process can be captured using an accelerometer. A study of vibration signal characteristics for bone drilling, especially for bone layer classification, is presented in [33]. The vibration signal dataset was acquired intermittently as the drill bit passed through three different layers: periosteum, first cortical, and spongy. Time and frequency domain features were extracted from the acquired vibration signal for the three layers. The feature analysis shows that the frequency domain features outperformed the time domain features, indicating that frequency domain features carry more information related to the bone layer than time domain features. This is because the frequency characteristics of the vibration signal generated during bone drilling correspond to the structure and condition of the bone layer itself. Different bone densities will exhibit different frequency characteristics in the vibration signal. These properties can be investigated further using various signal and image processing techniques.
Another study found that the vibration signal during the milling of the ventral cortical bone (VCB), which has a higher density, is different from that during the milling of the cancellous bone (CCB) [34]. Cortical bone tends to show higher frequency responses, reflecting greater hardness and density, whereas cancellous bone exhibits lower frequency responses due to its porous structure. These studies [34,35] provide strong evidence that cortical and cancellous bones differ in the frequency patterns of the vibration signals.
A novel ultrasonic vibration-assisted drilling (UVD) technique for precise bone surgery is presented by Kong and Lee [36]. An analytical force model is developed for ultrasonic vibration-assisted bone drilling. In comparison to traditional drilling techniques, force and torque were significantly decreased in an experimental study on bovine bone utilizing ultrasonically assisted drilling [18]. The study found that sensors-aided drilling, with a vibration frequency of 20 kHz and an amplitude of 4-20 µm, produced lower temperatures than conventional drills [3,37].

Applied Machine Learning Methods
An application of a machine learning (ML) method for predicting optimum bone drilling parameters is presented in [38]. A genetic algorithm (GA) is used to find the minimum thrust force value over combinations of the bone drilling parameters. A mathematical model of the thrust force as a function of spindle speed and feed rate is built using response surface methodology (RSM). In the study, the optimal values of the spindle speed and the feed rate that achieve the minimum thrust force during bone drilling are determined using the GA method. The GA method uses the developed RSM model of thrust force as its objective function. The GA optimization result shows that a feed rate of 30 mm/min and a spindle speed of 1000 rpm are the optimal parameters for the minimum thrust force value. The GA-predicted result is also compared to the experimental result for a similar feed rate and a spindle speed value of 710 rpm.
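The workflow described above, in which a fitted RSM model serves as the GA objective, can be sketched as follows. This is only an illustration under stated assumptions: the quadratic coefficients, the parameter bounds, and the GA settings (tournament selection, arithmetic crossover, Gaussian mutation, elitism) are hypothetical placeholders and are not taken from [38].

```python
import random

# Hypothetical search bounds; NOT the ranges used in [38].
BOUNDS = {"speed": (500.0, 2500.0), "feed": (30.0, 150.0)}


def thrust_force(speed, feed):
    """Hypothetical second-order response surface (placeholder coefficients)."""
    return (60.0 - 0.03 * speed + 0.4 * feed
            + 1.0e-5 * speed ** 2 + 0.002 * feed ** 2 + 1.0e-4 * speed * feed)


def clip(value, low, high):
    return max(low, min(high, value))


def run_ga(objective, bounds, pop_size=40, generations=80, seed=0):
    """Minimize objective(*genes) with a small real-coded genetic algorithm."""
    rng = random.Random(seed)
    keys = list(bounds)

    def fitness(individual):
        return objective(*individual)

    population = [[rng.uniform(*bounds[k]) for k in keys] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = [population[0][:], population[1][:]]  # elitism: keep the two best
        while len(next_gen) < pop_size:
            # Tournament selection: the best of three random individuals.
            parent_a = min(rng.sample(population, 3), key=fitness)
            parent_b = min(rng.sample(population, 3), key=fitness)
            # Arithmetic crossover with a random blend weight.
            alpha = rng.random()
            child = [alpha * a + (1.0 - alpha) * b for a, b in zip(parent_a, parent_b)]
            # Gaussian mutation, then clipping back into the bounds.
            for i, k in enumerate(keys):
                low, high = bounds[k]
                if rng.random() < 0.2:
                    child[i] += rng.gauss(0.0, 0.05 * (high - low))
                child[i] = clip(child[i], low, high)
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    return dict(zip(keys, best)), fitness(best)


best_params, best_force = run_ga(thrust_force, BOUNDS)
```

With these placeholder coefficients, the constrained minimum lies at the lower feed bound, so the GA result can be checked against the analytical optimum; with a model fitted to real data, the GA output would instead be validated experimentally, as done in [38].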
Pandey et al. [39] presented a combined method to obtain an optimized grey fuzzy reasoning grade (GFRG) from all quality characteristics of bone drilling. The combined method consists of two methods: grey relational analysis (GRA) and fuzzy logic. The GFRG determines the optimal combination of bone drilling parameters that minimizes temperature, force, and surface roughness. The highest GFRG is obtained at a speed of 500 rpm and a feed rate of 40 mm/min.
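As a rough illustration of the GRA step only, the sketch below computes grey relational grades for a set of drilling runs using the commonly used smaller-the-better normalization and a distinguishing coefficient of 0.5. The response values are hypothetical placeholders, the function name is an assumption, and the fuzzy reasoning stage that turns these grades into the GFRG in [39] is not reproduced.

```python
def grey_relational_grades(runs, zeta=0.5):
    """Grey relational grade per run for smaller-the-better responses.

    runs: list of tuples, one tuple of response values per experimental run.
    Assumes every response takes at least two distinct values across runs.
    """
    columns = list(zip(*runs))
    deviations = []
    for column in columns:
        low, high = min(column), max(column)
        # Smaller-the-better normalization maps the best value to 1.
        normalized = [(high - v) / (high - low) for v in column]
        # Deviation from the ideal (normalized value of 1).
        deviations.append([1.0 - v for v in normalized])
    d_min = min(min(column) for column in deviations)
    d_max = max(max(column) for column in deviations)
    coefficients = [
        [(d_min + zeta * d_max) / (d + zeta * d_max) for d in column]
        for column in deviations
    ]
    # The grade of a run is the mean of its coefficients over all responses.
    return [
        sum(column[i] for column in coefficients) / len(columns)
        for i in range(len(runs))
    ]


# Hypothetical (temperature, force, surface roughness) values for three runs.
runs = [(45.0, 30.0, 2.0), (40.0, 25.0, 1.5), (50.0, 35.0, 2.5)]
grades = grey_relational_grades(runs)
best_run = grades.index(max(grades))
```

The run with the highest grade is the one closest to the ideal on all responses simultaneously, which is the role the GFRG plays in the combined method.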
The application of a radial basis function neural network (RBFNN) for drill wear classification in bone drilling is reported in [40]. The RBFNN is utilized to develop a drill wear classification model based on a multi-sensor approach. The features for the RBFNN classification model were extracted from signals such as cutting forces, servomotor drive currents, and acoustic emission (AE).
Various ML models such as k-nearest neighbors (KNN), support vector regression (SVR), decision tree (DT), and random forest (RF) were used for predicting temperature elevation in rotary ultrasonic bone drilling (RUBD) [41]. The machine learning models were compared with the response surface methodology (RSM) analysis. The result shows that SVR is the best-performing model for this application compared to the other ML methods.
The monitoring and prediction of temperature elevation during real-time in vivo medical surgery is a challenging task. The use of Ridge regression for predicting the temperature rise during orthopedic bone drilling is presented by Agarwal et al. [42]. The Ridge regression model is compared with other ML models such as multilayer perceptron (MLP), lasso regression, and multi-linear regression. Performance metrics such as mean square error (MSE), root mean square error (RMSE), and mean absolute error (MAE) show that the errors of the Ridge regression are lower than those of the other ML models, indicating that the proposed method outperformed the other models. In another study by Agarwal et al. [43], Ridge regression was compared with other ML models such as lasso regression, SVR, multi-linear regression, and artificial neural network (ANN). Ridge regression and the other ML methods are used to predict the surface roughness and cutting force during rotary ultrasonic bone drilling. According to the statistical analysis of the predictive results, Ridge regression has the lowest error metrics among the ML methods in terms of surface roughness prediction. In the case of cutting force prediction, SVR was the most accurate of the ML models.
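The error metrics used in these comparisons, together with the ridge penalty itself, can be illustrated with a minimal single-predictor sketch. The data values and the penalty weight are hypothetical placeholders and the function names are assumptions; the studies in [42,43] use several drilling parameters as predictors, which would require a matrix formulation rather than the closed form shown here.

```python
import math


def fit_ridge_1d(x, y, lam):
    """Ridge regression with one predictor and an unpenalized intercept.

    Minimizes sum((y - a - b*x)^2) + lam * b^2, which gives
    b = Sxy / (Sxx + lam) and a = mean(y) - b * mean(x).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / (sxx + lam)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def error_metrics(y_true, y_pred):
    """Mean square error, root mean square error, and mean absolute error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Hypothetical placeholder data (not measurements from [42] or [43]).
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_ridge_1d(predictor, response, lam=10.0)
predicted = [intercept + slope * v for v in predictor]
metrics = error_metrics(response, predicted)
```

The deliberately large penalty shrinks the slope from its least-squares value of 2 to 1, which shows how the penalty trades some training error for smaller coefficients; in practice the penalty weight is chosen by validation.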
KNN and ensemble classifiers were utilized in [44] for breakthrough detection in robotic orthopedic surgery. A feature set containing closed-loop control signals and force sensor data was used as the training dataset to develop the prediction models. It was found that the ML models accurately detected the breakthrough during bone drilling operations. The best accuracy of breakthrough detection is 98.1 ± 0.2% for sheep femur bone.
A successful strategy for identifying bone drilling levels (bone layers) using a customized convolutional neural network (CNN) is described in [35]. The CNN classification used vibration signals from a three-axial accelerometer attached to the cow femur bone. The CNN accurately classified raw vibration signals from the three-axial accelerometer into three distinct bone layers: periosteum (the outermost layer), first cortical (the next layer beneath the periosteum), and spongy (the innermost layer). A summary of the application of ML methods in bone drilling is presented in Table 3.

Table 3. Summary of the application of ML methods in bone drilling (rows recovered from a flattened table; author and input parameter cells are incomplete).

| ML method | Category | Objective |
| --- | --- | --- |
| Grey relation analysis and fuzzy logic | Optimization | To determine the optimal combination of bone drilling parameters that minimize temperature, force, and surface roughness. |
| Grey fuzzy reasoning grade | Classification | To find an optimal value of feed rate (mm/min) and speed (rpm). |
| Radial basis function neural network | Classification | To develop a drill wear classification model based on a multi-sensor approach. |
| K-nearest neighbors and ensemble classifier (Torun et al.) | Classification | To detect breakthroughs and estimate the condition of the drill bit in robotic bone drilling. |

Unplaced cell fragment: [38] Temperature, force, and surface roughness.

Previous Studies
Several tools are required in the drilling process, namely hand drill machines and drill bits. Currently, the drilling speed of hand drill machines varies between 500 and 1500 rpm. One reason for this variation is that some manufacturers promote high speeds as an advantage in their marketing activities. Drill bits are also employed in preparing bone tunnels, for instance, in anterior ligament reconstructions. Typically, drilling is utilized to create holes in the bone before inserting screws. Nevertheless, since rigid bone is invariably surrounded by soft tissues like muscles, fat, ligaments, and tendons, which allow for bone movement, the bone can deviate from its normal position due to the shearing forces exerted by the drilling tool. The process of drilling a bone is depicted in a typical block diagram in Figure 1. Several other critical performance characteristics that determine the success of bone drilling include the straightness of the created hole, an efficient coefficient of friction, and healing time. Therefore, during surgical procedures, the force exerted by the surgeon and the position of the drill bit must be accurate. The accuracy of drilled holes during orthopedic fracture treatment relies greatly on the manual skills of the surgeon. However, the bone drilling tools currently used in surgeries do not include any mechanisms for penetration control. Thus, an automatic drilling system must be developed to minimize human errors during bone drilling. Much research has been conducted to explore new drill-bit designs [13–16] and new drilling techniques [17–19]. This research aimed to avoid the accumulation of heat at the drilling site.
Two approaches to minimizing thermal damage during bone drilling are described in [45]. The first approach employs a higher feed rate to decrease the duration of the drilling process. The second approach uses a lower feed rate to achieve a lower maximum temperature.
Bone drilling has been studied for decades and is still a promising and developing research area. Table 4 lists selected studies from 1976 to 2023, briefly describing the bone sample used, the experimental description and procedure, and the outcome of each study.

Table 4. Selected bone drilling studies from 1976 to 2023.

| Author (Year) | Bone Sample | Brief Experimental Description | The Outcome of the Study |
| --- | --- | --- | --- |
| Chen and Gundjian (1976) [46] | Bovine femur | The bovine femur was split into seven thin-disc samples. Each thin-disc sample dimension is approximately 1 mm thick and 3 mm in diameter. | The material characteristic that affects the bone's maximum temperature when a heat source is present is specific heat. |
| Cordioli and Majzoub (1997) [47] | Bovine cortical femur bone | The bone sample was drilled with a diameter of 2 and 3 mm running at 1500 rpm and 200 N of axial force. | Correlation between drilling depth and maximum temperature. |
| Hillery et al. (1999) [48] | Femur heads, bovine tibia | The drilling machine was operated from 400 to 2000 rpm with an interval of 200 rpm. The feed rate during the bone drilling was 50 mm/min. | The temperature increased with the increasing depth of the hole. The optimal speed range is between 800 and 1400 rpm with a drill bit diameter of 3.2 mm. |
| Lee et al. (2012) [49] | Bovine femur | Each bone specimen was attached to a drilling dynamometer. The controlled parameters for the drilling time are gauge torque and thrust. | Presented a novel method based on a CNC system for temperature measurement, various thermocouples, and an accurate position. |
| Pandey et al. (2014) [50] | Bovine bone | Using an MTAB 3-axis Flex mill. Temperature data were gathered using a K-type Extech thermocouple and data-gathering software. | The study found that drill diameter had the greatest influence among these variables based on the result of the Taguchi method. |
| Sarparast et al. (2020) [51] | Bovine femur | A high-speed electrical motor with a rotational speed higher than 10,000 rpm was mounted in the lathe machine. A high-speed steel (HSS) drill bit that was 2 mm in diameter was selected for the experiment. The lathe machine was run with an increasing feed rate from 10 to 50 mm/min. Single footing load cells and K-type thermocouples are used for force and temperature measurement. | Bone drilling optimum (minimum) temperature was revealed at a rotational speed of 12,000 rpm and feed rate of 50 mm/min. By increasing the feed rate slightly, it increases the process force, which can also lead to the increasing temperature. |
| Alam et al. (2023) [52] | Femoral and tibia bones | A custom-made drilling setup with a feedback control system for force, torque, and temperature was used in drilling tests. Small holes of 1.5 mm in diameter through the bone were produced with rotational speed of 400 rpm and feed rate of 40 mm/min. | Increasing pressure on a worn drill is necessary when drilling passes through the hard cortex of the bone. The torque in bone drilling has a direct relationship with the depth of drilling. Bone temperature was increased when the drill progressed to wear. |

Bone Drilling Lab Experiment of the Present Study
The bone drilling lab experiment used a Dobot Magician robot, a National Instruments (NI) data acquisition (DAQ) module NI-9345, a Brüel & Kjær three-axis accelerometer type 4535-B, and standalone academic LabVIEW software. The Dobot Magician robot was connected to a PC with software for robot programming. A schematic of the bone drilling lab experiment is presented in Figure 2.
The original end effector of the Dobot Magician robot was replaced with a customized drilling mechanism, as presented in Figure 2. In the bone drilling experiment, the penetration depth for each layer was accurately controlled by the Dobot Magician robot. Drilling was performed intermittently through the periosteum layer (the outermost layer), the first cortical layer (the layer beneath the periosteum layer), and the spongy layer (the innermost layer), as illustrated in the table part of Figure 2. The vibration signal was acquired using a three-axial accelerometer B&K type 4535-B-001 with sensitivities for the x-, y-, and z-axis of 96.44, 100.4, and 100.6 mV/g, respectively. A LabVIEW block diagram for the bone drilling vibration experiment was developed, and the vibration data were acquired with a sampling rate of 5 kHz. The drill was run at 500 rpm with a feed rate of 0.002 to 0.006 in/min during the experiment. A twist drill bit with a 3.5 mm diameter was chosen. The vibration data were collected for 5 s for each layer and saved in a Microsoft Excel 2013 worksheet. The experiment does not involve data processing: the vibration signal is not filtered or denoised. This simplifies the method by excluding the data processing step and makes it possible to examine the robustness of deep learning methods in predicting and classifying the raw vibration signals of bone drilling.
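The acquisition settings above determine both the size of each record and the scaling from voltage to acceleration. The following is a minimal sketch, assuming the raw readings are stored in mV; the function name and the 50 mV example reading are hypothetical and only illustrate the arithmetic.

```python
# Sensitivities reported for the B&K 4535-B-001 accelerometer, in mV per g.
SENSITIVITY_MV_PER_G = {"x": 96.44, "y": 100.4, "z": 100.6}

SAMPLING_RATE_HZ = 5_000  # reported sampling rate
RECORD_SECONDS = 5        # reported recording time per layer


def millivolts_to_g(reading_mv: float, axis: str) -> float:
    """Convert a raw accelerometer reading in mV to acceleration in g."""
    return reading_mv / SENSITIVITY_MV_PER_G[axis]


# Number of samples per axis in one 5 s record of one layer.
samples_per_layer = SAMPLING_RATE_HZ * RECORD_SECONDS

# Hypothetical reading of 50 mV on each axis, for illustration only.
converted = {axis: millivolts_to_g(50.0, axis) for axis in SENSITIVITY_MV_PER_G}
print(samples_per_layer, converted)
```

Because the three axes have slightly different sensitivities, the same voltage corresponds to a slightly different acceleration on each axis.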
One three-axial accelerometer was mounted on the bone to be drilled, and another was attached to the customized drilling mechanism to receive vibration signals during the drilling operation, as presented in Figure 2. The drill bit was then placed at the anterior surface of the proximal femur and driven continuously in clockwise rotation. A fresh frozen cow femur was used in this research because this type of bone has characteristics similar to those of human bone [16]. The sample was fixed with a laboratory clamp while the drilling process was performed. A total of 10 holes were drilled in the experiment. Two of them (holes #2 and #7) did not go through the bone; the remaining eight holes were successfully drilled through the bone sample.
The vibration signal of the bone drilling experiment (hole #5) for the three layers is presented in Figures 3-5. Figure 3 presents the bone drilling vibration data (x-axis) for a duration of one second. Figure 3a, Figure 3b, and Figure 3c show the vibration signals from the periosteum, first cortical, and spongy layers, respectively. At this time scale, the vibration signals from the different layers are difficult to differentiate. However, each signal displays a distinct form when only 0.2 s (0.5~0.7 s) is plotted, as seen in the zoomed-in portions of Figure 3a-c. Figure 3 also reveals the vibration signal amplitude of each layer: the deeper the drill bit penetrates, the higher the amplitude (in mV) of the vibration signal.
The vibration signals in Figure 4 (y-axis) are also difficult to distinguish visually. However, zooming in on the vibration signal for 0.2 s (0.5~0.7 s) results in a distinct form and amplitude, as seen in Figure 4a-c. Figure 5 presents the bone drilling vibration data (z-axis) for a duration of one second. Figure 5a, Figure 5b, and Figure 5c show the vibration signals from the periosteum layer, first cortical layer, and spongy layer, respectively. As in Figures 3 and 4, the signals are difficult to distinguish visually at this time scale.
Table 5 shows the root mean square (RMS) of the vibration amplitude during bone drilling. The RMS amplitude increases as the drill bit penetrates deeper. This implies that the bone structure of each layer is different and that the vibration signal is excited when the drill bit makes contact with the bone structure. In particular, the RMS values increase more from the first cortical to the spongy layer than from the periosteum to the first cortical layer, which indicates that the spongy layer is less rigid and dense than the periosteum and first cortical layers.
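The RMS comparison between layers can be sketched as follows. The signals here are synthetic sinusoids with placeholder amplitudes, not the measured data or the values in Table 5; only the 5 kHz sampling rate and the 5 s record length are taken from the experiment description.

```python
import numpy as np


def rms(signal):
    """Root mean square of a 1D signal."""
    signal = np.asarray(signal, dtype=float)
    return float(np.sqrt(np.mean(signal ** 2)))


# Time axis for one 5 s record sampled at 5 kHz.
t = np.arange(25_000) / 5_000.0

# Synthetic stand-ins for the three layers (amplitudes are placeholders).
layers = {
    "periosteum": 1.0 * np.sin(2 * np.pi * 50 * t),
    "first cortical": 2.0 * np.sin(2 * np.pi * 50 * t),
    "spongy": 4.0 * np.sin(2 * np.pi * 50 * t),
}

rms_by_layer = {name: rms(sig) for name, sig in layers.items()}
for name, value in rms_by_layer.items():
    print(f"{name}: RMS = {value:.3f}")
```

For a pure sinusoid the RMS equals the amplitude divided by the square root of 2, which makes the sketch easy to check by hand.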

Long Short-Term Memory Method
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) architecture that processes sequential input. The LSTM is an improved form of RNN that was designed by Hochreiter and Schmidhuber for sequence prediction tasks [53]. In addition, the LSTM method excels in capturing long-term dependencies in sequence data and handling the vanishing gradient problem [53]. The LSTM method has been applied previously in an ECG-rhythm classification study [54] and has been used to optimize reactive power usage in high-rise buildings [55]. According to the summary of machine learning applications presented in Table 3, LSTM has not been used for orthopedic bone drilling; this is the motivation for selecting the LSTM method in the present bone layer classification study.
An example application of the LSTM method for reproducing variable forces in haptic technology, focusing on tactile feedback enhancement in real-time robotic surgery simulation, is presented in [56]. In that study, the LSTM method is used to replicate varied force feedback during a skin layer surgical procedure and to increase force prediction accuracy in robotic surgery simulation. In the present study, the bone drilling experiment was conducted intermittently from the periosteum, through the first cortical layer, and ended at the spongy layer, so a sequential method is suitable for this type of data. This is the main reason why LSTM was selected in this study.

LSTM Architecture
An LSTM is a chain of repeating modules, as presented in Figure 6a. Each module takes an input $x_t$ and outputs a hidden state value $h_t$. A loop passes information from one step of the network to the next. The difference between RNN and LSTM is in the construction of the chained units. The unit of a standard RNN has a simple structure, such as a single tanh layer, while LSTM has a more complicated unit, as presented in Figure 6b. The key components of LSTM are the cell state and the gates. The cell state acts like a conveyor belt: it is the memory that carries information along the chain and learns new information from the input. The LSTM removes information from or adds information to the cell state ($C_t$) using a mechanism called gates, which decide whether information should be passed to the unit or not. In general, the gate equation is presented in (1) [57]:

$$G_t = \sigma\left(U x_t + W h_{t-1} + b\right) \tag{1}$$

where $\sigma$ is the sigmoid function and $U$, $W$, and $b$ are the parameters of the gate. The parameters of each gate are different. The variable $x_t$ is the current input, and $h_{t-1}$ is the previous hidden state.
There are three gates in an LSTM unit: the forget gate, the input gate, and the output gate. The forget gate ($F_t$) controls which information should be removed from the cell state. The input gate ($I_t$) determines which information from the previous time steps should be remembered or forgotten and controls how much new information is added to the cell state. The output gate ($O_t$) selects useful information from the current cell state and produces it as the output. The output gate additionally sends updated data to the following time step.
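The interaction of the three gates can be illustrated with a single LSTM time step in NumPy. This is a minimal sketch of the standard LSTM formulation, in which every gate is a sigmoid of U x_t + W h_{t-1} + b; the candidate cell state, the update rules for the cell and hidden states, the layer sizes, and the random weights are assumptions that are not taken from this paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to its (U, W, b)."""

    def affine(name):
        U, W, b = params[name]
        return U @ x_t + W @ h_prev + b

    f_t = sigmoid(affine("forget"))         # what to remove from the cell state
    i_t = sigmoid(affine("input"))          # how much new information to add
    o_t = sigmoid(affine("output"))         # what to expose as the hidden state
    c_tilde = np.tanh(affine("candidate"))  # candidate cell content (assumed)
    c_t = f_t * c_prev + i_t * c_tilde      # updated cell state
    h_t = o_t * np.tanh(c_t)                # updated hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4  # 3 inputs (x-, y-, z-axis); hidden size is arbitrary
params = {
    name: (
        rng.normal(size=(n_hidden, n_in)),
        rng.normal(size=(n_hidden, n_hidden)),
        np.zeros(n_hidden),
    )
    for name in ("forget", "input", "output", "candidate")
}

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):  # five random time steps
    h, c = lstm_step(x_t, h, c, params)
print(h.shape, c.shape)
```

Each gate has its own parameter triple, which mirrors the statement that the parameters of each gate are different.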
Among the eight datasets, holes #1 and #3 were not included in the DL multi-class classification and prediction. Six holes were used in LSTM, CNN, and RNN training and testing (holes #4, #5, #6, #8, #9, and #10). Holes #1 and #3 were excluded because they were corrupted by unexpected noise, which was found during initial dataset checking using MATLAB. The assessment was conducted visually and is not explained in detail in this paper.
In this study, the LSTM implementation available in TensorFlow Keras was used. To develop the LSTM model, 70% of the vibration data were used for training, and 30% of the data were used for testing. The validation data were obtained from 20% of the training data. The LSTM architecture used in this project is presented in Table 6, and the summary is as follows: Input Layer → LSTM Layer (return_sequences = True) → LSTM Layer → Flatten Layer → Dense Layer (Output Layer). With this architecture, the model can accept the sequential vibration data from the x-, y-, and z-axis for each layer. The sequential vibration data were processed in two LSTM layers to obtain the prediction model for multiclass classification. The 'softmax' activation function was selected for the dense layer.
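The data partition described above can be sketched as follows. This is a minimal sketch, assuming that each three-axis time sample is one data point and that the split is a random shuffle; the paper does not state how the split was drawn, and the function name and seed are hypothetical.

```python
import numpy as np

N_HOLES, N_LAYERS = 6, 3            # holes and layers used for the DL models
SAMPLES_PER_RECORD = 5_000 * 5      # 5 kHz sampling for 5 s per layer
n_total = N_HOLES * N_LAYERS * SAMPLES_PER_RECORD


def split_indices(n, test_frac=0.30, val_frac=0.20, seed=0):
    """Shuffle indices, hold out test_frac, then take val_frac of the rest."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_test = int(round(n * test_frac))
    test_idx = order[:n_test]
    train_full = order[n_test:]
    n_val = int(round(len(train_full) * val_frac))
    val_idx = train_full[:n_val]
    train_idx = train_full[n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = split_indices(n_total)
print(n_total, len(train_idx), len(val_idx), len(test_idx))
```

Under these assumptions the test partition holds 135,000 points, which equals the sum of the per-layer totals reported for the testing confusion matrices (45,042 + 44,948 + 45,010).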

LSTM Classification Results and Discussion
In the multi-class classification model development, the LSTM model was configured with the 'adam' optimizer and the 'categorical_crossentropy' loss function. Callback functions such as EarlyStopping, ModelCheckpoint, and LearningRateScheduler were also used to control the training process. The 'accuracy' metric was used to evaluate the model's performance during the training process. The model evaluation is presented in Figure 7, with a test loss of 0.018 and a test accuracy of 0.993.
Table 7 shows the classification report of LSTM, which provides detailed information regarding model performance for each target class. The evaluation of layer 1 (periosteum) shows that the model has a result of 0.99 for precision, recall, and F1-score. In layer 2 (first cortical), the model also has a result of 0.99 for precision, recall, and F1-score. In the case of layer 3 (spongy), it produces the highest result of 1 for recall and 0.99 for both precision and F1-score. Figure 8 shows that the precision-recall curves of all three layers were close to 1.
A confusion matrix for the training and testing of the LSTM model with 10 epochs is presented in Figure 9. The classification of the three bone layers is generally successful, with very few misclassifications in either the training or the testing confusion matrix. In the training confusion matrix, 366 of the 104,957 periosteum data points were misclassified as first cortical (about 0.35%), and 694 were predicted as spongy (0.66%). For the first cortical layer, 698 of the 105,052 data points were misclassified as periosteum (0.66%), and none were classified as spongy. For the spongy layer, 346 of the 104,990 data points were predicted as first cortical (about 0.33%), leaving 104,644 correctly classified, and none were classified as periosteum.
In the testing confusion matrix, 134 and 306 of the 45,042 periosteum data points were misclassified as first cortical and spongy, respectively, which corresponds to prediction errors of 0.30% and 0.68%. Similar to the training confusion matrix, 302 of the 44,948 first cortical data points were predicted as periosteum (about 0.67%), and none were classified as spongy. The best classification result is found for the spongy data, with only 154 of the 45,010 data points predicted as first cortical (about 0.34%), leaving 44,856 correctly classified, and none classified as periosteum.
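The per-class figures in the classification report can be cross-checked against the testing confusion matrix. The sketch below rebuilds the matrix from the misclassification counts quoted in the text; the diagonal entries are derived rather than quoted, and the spongy row assumes that 44,856 is the number of correctly classified points (row total 45,010).

```python
import numpy as np

labels = ["periosteum", "first cortical", "spongy"]

# Rows are true layers, columns are predicted layers (LSTM, test set).
cm = np.array([
    [45_042 - 134 - 306, 134, 306],  # periosteum
    [302, 44_948 - 302, 0],          # first cortical
    [0, 154, 44_856],                # spongy (assumed 44,856 correct)
])

recall = np.diag(cm) / cm.sum(axis=1)     # correct / true count per layer
precision = np.diag(cm) / cm.sum(axis=0)  # correct / predicted count per layer
f1 = 2 * precision * recall / (precision + recall)
accuracy = np.trace(cm) / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
print(f"accuracy={accuracy:.3f}")
```

Under these assumptions the rounded values agree with the reported test accuracy of 0.993 and with the precision, recall, and F1-scores in Table 7.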

Other Deep Learning (DL) Methods for Performance Comparison with LSTM
Two other DL methods were selected for performance comparisons with LSTM: CNN and RNN.

Brief Information of Convolutional Neural Network (CNN)
A convolutional neural network (CNN) is a popular network design for deep learning that is particularly useful for detecting patterns in 2D (image) data. A CNN uses layers of interconnected neurons, including convolutional layers that learn features directly from data. The filters of these layers slide over the input features and extract relevant patterns. CNN has been used in previous studies, e.g., for automated Cobb angle measurement [58], for bird sound classification [59], and for vibration signal analysis in belt grinding tool wear prediction [60].

CNNs were initially created for images or other 2D data; however, there is an increasing trend of CNN applications in 1D data, especially audio signals, time-series data, biomedical data, structural health monitoring data, and vibration data for fault detection. A review of the application of CNNs in 1D data is presented in [61]. The review paper also describes in detail the fundamental theory and architecture of applied 1D CNNs. In the review, 1D CNNs were applied to speech signals, ECG signals for arrhythmia detection, and vibration data for structural damage detection. The review found that 1D CNNs have advantages compared to 2D CNNs due to their simpler and more compact configuration. In detail, there are three main advantages of 1D CNNs: (1) lower computational complexity; (2) feasibility for real-time use; (3) low-cost hardware implementation [61].
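The sliding-filter operation of a 1D convolutional layer can be sketched in a few lines of NumPy. This is an illustration only: the toy signal, the two-tap kernel, and the function name are hypothetical, and a trained layer would learn many such kernels. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np


def conv1d_valid(signal, kernel, stride=1):
    """Slide a kernel over a 1D signal without padding ('valid' mode)."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    n_out = (len(signal) - k) // stride + 1
    return np.array([
        np.dot(signal[i * stride:i * stride + k], kernel)
        for i in range(n_out)
    ])


signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])  # toy pulse
edge_kernel = np.array([-1.0, 1.0])                      # responds to rising edges

feature_map = conv1d_valid(signal, edge_kernel)
activated = np.maximum(feature_map, 0.0)  # ReLU keeps positive responses only
print(feature_map, activated)
```

Each output value needs only as many multiplications as the kernel has taps, which is one reason 1D CNNs are comparatively cheap to compute.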

CNN Classification Results and Discussion
Similar to the LSTM model, the CNN model was also configured with the 'adam' optimizer and the 'categorical_crossentropy' loss function for the multiclass classification of bone layers. The 'accuracy' metric was used to evaluate the model's performance during the training process. The model evaluation is presented in Figure 10.
Table 8 shows the CNN classification report of bone layers. The evaluation result of layer 1 (periosteum) shows that the model has 0.95 for precision and 0.94 for both recall and F1-score. In layer 2 (first cortical), the model has 0.95 for precision, 0.97 for recall, and 0.96 for F1-score. In the case of layer 3 (spongy), it has 0.95 for precision, 0.93 for recall, and 0.94 for F1-score. The overall multi-class classification accuracy of CNN is 0.95. The precision-recall curve is presented in Figure 11; it shows that the first cortical layer has better results compared to the other two layers (periosteum and spongy).

A confusion matrix for the training and testing of the CNN model with 10 epochs is presented in Figure 12. Compared to the LSTM confusion matrix, the CNN confusion matrix shows more misclassifications in both training and testing. In the training confusion matrix, 2086 of the 104,957 layer 1 data points were misclassified as layer 2 (about 1.99%), and 4022 were predicted as layer 3 (3.83%). For layer 2, 2094 of the 105,052 data points were misclassified as layer 1 (1.99%), and 1205 were predicted incorrectly as layer 3 (1.15%). For layer 3, 3321 of the 104,990 data points were predicted as layer 1 (about 3.16%), and 3482 were predicted as layer 2 (about 3.32%).
In the testing confusion matrix, 914 and 1728 of the 45,042 layer 1 data points were misclassified as layer 2 and layer 3, respectively, which corresponds to prediction errors of 2.03% and 3.84%. In layer 2, 906 of the 44,948 data points were predicted as layer 1 (2.02%), and 545 were misclassified as layer 3 (1.21%). For layer 3, 1429 of the 45,010 data points were predicted as layer 1 (3.17%), and 1518 were misclassified as layer 2 (3.37%).

Brief Information of Recurrent Neural Network (RNN)
A recurrent neural network (RNN) is a superset of a feedforward neural network (FFNN) that is enhanced by the addition of edges spanning neighboring time steps, which gives the model an understanding of time [62]. The RNN makes use of sequential information and can, therefore, simultaneously model sequential and time dependencies on multiple scales. This enables a unidirectional process that takes information from the past to process later inputs. A basic RNN model is presented in Figure 13a.
Figure 13b illustrates the architecture of a single RNN cell. Each cell has two inputs and two outputs at each time step. For the inputs, $a^{(t-1)}$ and $x^{(t)}$ denote the hidden state from the previous cell and the current time step's input data, respectively. The inputs interact with the weights and biases ($W_{aa}$, $W_{ax}$, and $b_a$), which are reused in each time step. The new hidden state at the end of each cell is then used to calculate the prediction during forward propagation using a softmax function. The new hidden state is also carried forward to the next cell, so each cell produces the two required outputs: the hidden state $a^{(t)}$ and the prediction $\hat{y}^{(t)}$.
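The forward pass of a single RNN cell can be sketched as follows. The tanh activation, the output parameters W_ya and b_y, the layer sizes, and the random weights are assumptions based on the common textbook formulation; the text above names only W_aa, W_ax, b_a, and the softmax.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def rnn_cell_forward(x_t, a_prev, W_aa, W_ax, b_a, W_ya, b_y):
    """One RNN time step: returns the new hidden state and the prediction."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)  # new hidden state
    y_hat = softmax(W_ya @ a_t + b_y)                # class probabilities
    return a_t, y_hat


rng = np.random.default_rng(0)
n_in, n_hidden, n_classes = 3, 4, 3  # 3 axes in, 3 bone layers out
W_aa = rng.normal(size=(n_hidden, n_hidden))
W_ax = rng.normal(size=(n_hidden, n_in))
b_a = np.zeros(n_hidden)
W_ya = rng.normal(size=(n_classes, n_hidden))  # hypothetical output weights
b_y = np.zeros(n_classes)

a = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):  # the same weights are reused each step
    a, y_hat = rnn_cell_forward(x_t, a, W_aa, W_ax, b_a, W_ya, b_y)
print(a.shape, y_hat)
```

The loop makes the weight sharing explicit: only the hidden state changes from one time step to the next.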

RNN Classification Results and Discussion
Similar to the LSTM and CNN models, the RNN model was also configured with the 'adam' optimizer and the 'categorical_crossentropy' loss function for the multiclass classification of bone layers. The 'accuracy' metric was also used to evaluate the model's performance during the training process. The model evaluation is presented in Figure 14.
Table 9 shows the RNN model performance for each target class. The classification report of the periosteum (layer 1) shows that the model has 0.96 for recall and 0.95 for both precision and F1-score. In the first cortical layer (layer 2), the model has 0.97 for precision and 0.96 for both recall and F1-score. In the case of the spongy layer (layer 3), the precision, recall, and F1-score of the model are 0.94, 0.96, and 0.95, respectively. The overall multi-class classification accuracy of RNN is 0.96. The precision-recall curve is presented in Figure 15; it shows that the first cortical layer has better results compared to the other two layers (periosteum and spongy), which is similar to the CNN result.

RNN Classification Results and Discussion
Similar to the LSTM and CNN model, the RNN model also is configured with an 'adam' optimizer, and 'categorical_crossentropy' loss function in the application of a multiclass classification of bone layers.An 'accuracy' metric to evaluate the model's performance is also used during the training process.The model evaluation is presented in Figure 14.Table 9 shows an RNN model performance for each target class.The classification report of (periosteum (layer 1) shows that the model has 0.96 for recall and 0.95 for both precision and F1-score.In the first cortical (layer 2), the model has 0.97 for precision and 0.96 for both recall and F1-score.In the case of spongy (layer 3), the precision, recall, and F1-score of the model are 0.94, 0.96, and 0.95, respectively.The overall multi-classification accuracy of RNN is 0.96.The precision-recall curve is presented in Figure 15; it shows that the first cortical layer has better results compared to the other two layers (periosteum and spongy), which is similar to the CNN result.A confusion matrix for the training and testing process of RNN model development with 10 epochs is presented in Figure 16.In general, the confusion matrix result of the RNN is slightly better than that of the CNN.However, it does not perform as well as LSTM.For the confusion matrix of training, 1202 out of the total 104,957 layer 1 data points were misclassified in layer 2 with a 1.15% classification error, and 3341 out of the total 104,957 layer 1 data points were predicted incorrectly in layer 3 with a 3.18% prediction error.In layer 2, 1928 out of the total 105,052 layer 2 data points were predicted incorrectly in layer 1 with a 1.84% error, and 2855 out of the total 105,052 layer 2 data points were misclassified in layer 3 with a 2.72% error.In layer 3, 3164 and 1572 out of the total 104,990 layer 3 data points were misclassified in layer 1 and layer 2, respectively, with 3.01% and 1.5% prediction error for layer 1 and layer 2, respectively.A 
A confusion matrix for the training and testing of the RNN model with 10 epochs is presented in Figure 16. In general, the confusion matrix results of the RNN are slightly better than those of the CNN but not as good as those of the LSTM. In the training confusion matrix, 1202 of the 104,957 layer 1 data points were misclassified as layer 2 (1.15% error), and 3341 were misclassified as layer 3 (3.18% error). For layer 2, 1928 of the 105,052 data points were misclassified as layer 1 (1.84% error), and 2855 were misclassified as layer 3 (2.72% error). For layer 3, 3164 and 1572 of the 104,990 data points were misclassified as layer 1 and layer 2, respectively, corresponding to errors of 3.01% and 1.50%.
In the testing confusion matrix, 548 and 1409 of the 45,042 layer 1 data points were misclassified as layer 2 and layer 3, respectively, corresponding to errors of 1.22% and 3.13%. For layer 2, 822 of the 44,948 data points were misclassified as layer 1 (1.83% error), and 1145 were misclassified as layer 3 (2.55% error). For layer 3, 1336 and 678 of the 45,010 data points were misclassified as layer 1 and layer 2, respectively, corresponding to errors of 2.97% and 1.51%.
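The per-layer error percentages quoted for the RNN training confusion matrix can be reproduced with a short calculation. The sketch below is illustrative only: it uses the off-diagonal counts and row totals stated in the text, derives the correctly classified counts as the row total minus the two error counts (an assumption, since the diagonal values are not quoted here), and uses hypothetical function names that are not part of the original study.

```python
# Row totals and off-diagonal counts as quoted for the RNN training confusion matrix.
# Keys are (true_layer_index, predicted_layer_index), with index 0 = layer 1.
ROW_TOTALS = [104_957, 105_052, 104_990]
OFF_DIAGONAL = {
    (0, 1): 1202, (0, 2): 3341,
    (1, 0): 1928, (1, 2): 2855,
    (2, 0): 3164, (2, 1): 1572,
}


def build_confusion_matrix(row_totals, off_diagonal):
    """Fill in the matrix; the diagonal is ASSUMED to be total minus errors."""
    n = len(row_totals)
    cm = [[0] * n for _ in range(n)]
    for (true_idx, pred_idx), count in off_diagonal.items():
        cm[true_idx][pred_idx] = count
    for i in range(n):
        # At this point row i holds only its error counts (diagonal is still 0).
        cm[i][i] = row_totals[i] - sum(cm[i])
    return cm


def error_percentages(cm):
    """Row-normalised misclassification percentages (diagonal set to 0)."""
    result = []
    for i, row in enumerate(cm):
        total = sum(row)
        result.append([0.0 if i == j else 100.0 * c / total
                       for j, c in enumerate(row)])
    return result


def overall_accuracy(cm):
    """Fraction of all data points that lie on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


cm = build_confusion_matrix(ROW_TOTALS, OFF_DIAGONAL)
errors = error_percentages(cm)
accuracy = overall_accuracy(cm)

for i, row in enumerate(errors):
    print(f"layer {i + 1}:", [f"{value:.2f}%" for value in row])
print(f"overall training accuracy: {accuracy:.2f}")
```

Under these assumptions, the derived overall training accuracy rounds to 0.96, which agrees with the RNN accuracy reported in the classification report.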

Conclusions
This paper presented a review of orthopedic bone drilling studies, together with an example of bone layer classification using vibration signals and deep learning methods, namely LSTM, CNN, and RNN. The review aimed to provide a state-of-the-art overview of bone drilling research that will be useful for researchers developing a new method or a new research direction. One key finding of the review is a potential direction for future work: the development of medical training simulation systems that comprise sensor and robotic technologies, haptic mechanisms, and real-time monitoring systems. Sensor technology is one of the main factors in such simulation systems because it provides user feedback. This paper presented accelerometer-based vibration signals as a potential sensor input for providing user feedback during bone drilling.
Three DL methods, i.e., LSTM, CNN, and RNN, were selected to demonstrate the benefit of using the vibration signal in bone drilling studies, especially for bone layer classification. The main findings of the multi-class bone layer classification based on the three applied DL methods are as follows:

• With nearly identical DL model development parameters and epoch numbers, LSTM performs better than CNN and RNN for bone layer classification from vibration data (1D data).

• The overall multi-class classification accuracy of LSTM, CNN, and RNN, according to the classification report tables, is 0.99, 0.95, and 0.96, respectively. This indicates that LSTM outperforms CNN and RNN.

• Bone layer classification based on vibration signals is still a developing field. This study can be particularly useful for medical procedures involving bone drilling, where accurate identification of different bone layers is crucial.

• Future work related to the bone drilling experiment is to generate more datasets and to apply other potential methods.

Figure 2. Schematic diagram of the bone drilling experiment.


Figure 4 presents the bone drilling vibration data (y-axis) for a duration of one second. Figure 4a, Figure 4b, and Figure 4c are the vibration signals from the periosteum, first cortical, and spongy layers, respectively. Similar to Figure 3, the vibration signals of the three layers in Figure 4 are difficult to distinguish visually. However, zooming in on a 0.2 s segment of the vibration signal (0.5-0.7 s) reveals a distinct form and amplitude for each layer, as seen in Figure 4a-c. Figure 5 presents the bone drilling vibration data (z-axis) for a duration of one second. Figure 5a, Figure 5b, and Figure 5c are the vibration signals from the periosteum, first cortical, and spongy layers, respectively. Similar to Figures 3 and 4, the vibration signals of the three layers in Figure 5 are visually indistinguishable over the full second, whereas zooming in on the 0.2 s segment (0.5-0.7 s) reveals differences in shape and amplitude.
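The zoom into the 0.5-0.7 s segment can be made quantitative by computing the root mean square (RMS) of that window, the amplitude measure the paper uses in Table 5. The following is a minimal sketch under stated assumptions: the sampling rate `FS_HZ`, the synthetic sine signal, and the helper name `window_rms` are all hypothetical and are not taken from the experiment.

```python
import math


def window_rms(signal, fs_hz, t_start_s, t_stop_s):
    """RMS amplitude of the samples between t_start_s and t_stop_s (seconds)."""
    start = int(round(t_start_s * fs_hz))
    stop = int(round(t_stop_s * fs_hz))
    segment = signal[start:stop]
    if not segment:
        raise ValueError("the requested window contains no samples")
    return math.sqrt(sum(x * x for x in segment) / len(segment))


# ASSUMED values for illustration only; the real sampling rate is not given here.
FS_HZ = 1000        # hypothetical sampling rate in samples per second
AMPLITUDE = 2.0     # hypothetical peak amplitude of the synthetic signal
FREQ_HZ = 50.0      # hypothetical vibration frequency

# One second of a synthetic sine wave standing in for one axis of one layer.
signal = [AMPLITUDE * math.sin(2 * math.pi * FREQ_HZ * n / FS_HZ)
          for n in range(FS_HZ)]

rms = window_rms(signal, FS_HZ, 0.5, 0.7)
print(f"RMS of the 0.5-0.7 s window: {rms:.4f}")
```

For a pure sine wave, the RMS over whole periods equals the peak amplitude divided by the square root of two, which gives a simple sanity check before the same function is applied to measured signals from each layer and axis.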

Figure 8. Precision-recall curve of the LSTM classification model.


Figure 10. (a) Model accuracy of CNN; (b) Model loss of CNN.


Figure 11. Precision-recall curve of the CNN classification model.
A confusion matrix for the training and testing of the CNN model with 10 epochs is presented in Figure 12. The CNN confusion matrices for training and testing show more misclassifications than the LSTM confusion matrices. In the training confusion matrix, 2086 of the 104,957 layer 1 data points were misclassified as layer 2 (1.99% error), and 4022 were misclassified as layer 3 (3.83% error). For layer 2, 2094 of the 105,052 data points were misclassified as layer 1 (1.99% error), and 1205 out of the total 105,052 layer 2 data points


Figure 13b illustrates the architecture of a single RNN cell. Each cell has two inputs and two outputs at each time step. The two inputs are the hidden state from the previous cell and the current time step's input data. These inputs interact with the weights and biases, which are reused in each time

Figure 15. Precision-recall curve of the RNN classification model.


Table 1. Abbreviations of terminologies used in the paper.

Table 2. Highlights of previously published review papers on bone drilling.

Table 3. Summary of the application of machine learning methods on bone drilling.

Table 4. Review of papers on experimental bone drilling.

Table 5. Vibration amplitude of different axes and different layers represented by RMS.