Tool-tissue force segmentation and pattern recognition for evaluating neurosurgical performance

Surgical data quantification and comprehension expose subtle patterns in tasks and performance. Enabling surgical devices with artificial intelligence provides surgeons with personalized and objective performance evaluation: a virtual surgical assist. Here we present machine learning models developed for analyzing surgical finesse using tool-tissue interaction force data recorded during surgical dissection with a sensorized bipolar forceps. Data modeling was performed on 50 neurosurgical procedures involving elective surgical treatment of various intracranial pathologies. Data were collected by 13 surgeons of varying experience levels using the sensorized bipolar forceps of the SmartForceps System. The machine learning pipeline was designed and implemented for three primary purposes: force profile segmentation to obtain active periods of tool utilization using T-U-Net; surgical skill classification into Expert and Novice; and surgical task recognition into two primary categories, Coagulation versus non-Coagulation, using the FTFIT deep learning architecture. The final report to the surgeon was a dashboard containing the recognized segments of force application, categorized into skill and task classes, along with charts of performance metrics compared with expert-level surgeons. Operating room recordings totaling > 161 h and containing approximately 3.6 K periods of tool operation were utilized. The modeling achieved a weighted F1-score of 0.95 and AUC of 0.99 for force profile segmentation using T-U-Net, a weighted F1-score of 0.71 and AUC of 0.81 for surgical skill classification, and a weighted F1-score of 0.82 and AUC of 0.89 for surgical task recognition using a subset of hand-crafted features augmented to the FTFIT neural network. This study delivers a novel cloud-based machine learning module, enabling an end-to-end platform for intraoperative surgical performance monitoring and evaluation. Accessed through a secure application for professional connectivity, a paradigm for data-driven learning is established.


List of Supplementary Figures and Tables
Table S1: List and description of hand-crafted features for the surgical pattern recognition models. Table S2: Relative importance scores of features for skill classification with a threshold of 0.05 using the XGBoost model. Table S3: Relative importance scores of features for skill classification with a threshold of 0.03 using the KNN model. Table S4: Relative importance scores of features for task recognition with a threshold of 0.05 using the XGBoost model.

Rule-based data point filtering was applied to mitigate the problem of imbalanced data between ON and OFF conditions in the recorded force data. In fact, 93.7% of the force data points were labeled as OFF (out of a total of 11.6 million records), meaning that the inactive state accounts for most of the operating room time for SmartForceps. The algorithm removed the excessive idle time points for which the rolling average (window of 5) of the left and right prong forces was less than or equal to 0.3 N. Points with overlapping OFF labels in both the rule-based and manually labeled data were removed from the analysis, reducing the data size to approximately 398 K records (Fig. S1). This data regularization resulted in 54.4% ON labels and 45.6% OFF labels, balancing the segmentation labels across the two classes.
Fig. S1 | Pseudocode for rule-based data point filtering to balance ON and OFF data samples. A rule-based algorithm was designed to remove the excessive inactive time points for which the rolling average (window of 5) of the left and right prong forces was less than or equal to 0.3 N. Data points with overlapping 0 (OFF) labels in both the rule-based and manually labeled indices were removed.
F_r,idx: idx-th data point of the right prong force profile. F_l,idx: idx-th data point of the left prong force profile. F_s: SmartForceps force profile time series. MA_idx(X, window=w): moving average of time series X with a window size of w at the idx-th data point. init_seg_id_idx: initial segmentation ID (i.e., 0: OFF, 1: ON) at the idx-th data point.
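The rule described in Fig. S1 can be sketched as a short NumPy routine. This is a minimal illustration, not the authors' implementation: the function name and the boundary handling of the moving average (zero-padded, centered window) are assumptions.

```python
import numpy as np

def filter_inactive_points(f_left, f_right, manual_labels,
                           window=5, threshold=0.3):
    """Rule-based filtering of inactive (OFF) force data points.

    f_left, f_right: per-prong force time series in Newtons.
    manual_labels:   manually annotated labels (0 = OFF, 1 = ON).

    A point is rule-labeled OFF when the moving average of BOTH prong
    forces over `window` samples is <= `threshold` (N); points that are
    OFF in both the rule-based and the manual labels are dropped.
    Returns a boolean mask of points to keep.
    """
    kernel = np.ones(window) / window
    # Centered moving averages (zero-padded at the boundaries).
    ma_left = np.convolve(f_left, kernel, mode="same")
    ma_right = np.convolve(f_right, kernel, mode="same")
    rule_off = (ma_left <= threshold) & (ma_right <= threshold)
    keep = ~(rule_off & (np.asarray(manual_labels) == 0))
    return keep
```

Applying the mask (`f_left[keep]`, etc.) removes the excess OFF samples and yields the roughly balanced ON/OFF label distribution reported above.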

a. T-U-Net
A custom-designed U-Net (T-U-Net: Time-series U-Net) model was implemented, consisting of a convolutional encoder-decoder structure that captures the properties of and reconstructs the force profile (F_in ∈ ℝ^(N×L×C): N fixed-length segment intervals, each containing L data points across C = 2 channels for the left and right prongs) through a deep stack of feature maps. A mean-pooling-based classifier then operates on the point-wise confidence scores to produce interval-wise time-series segmentation (S_seg ∈ ℝ^(N×K): final segment intervals over K = 2 segment classes, i.e., device ON/OFF). For the training parameters, we used the Adam optimizer, Categorical Cross-Entropy as the loss function, and accuracy and validation loss as evaluation metrics on a random 20% subset of the training data held out as validation data. The model architecture is shown in Fig. S2, and an expanded view of the model created with https://netron.app is presented in Fig. S3. The graph shows detailed procedure names and attribute values for the skill classification model (depth size = 6); the network for task recognition is not included in the report to avoid duplication. The network comprised multiple layers, including a stacked series of convolutional layers to learn the features, followed by a concatenation layer and a bottleneck layer to reduce the dimensionality, accompanied by a max-pooling layer. The extracted features were fused into the network, after resampling and normalization, as a new dimension. The last layer produced the probabilities of the different classes, e.g., surgical proficiency scores or task categories. The visualization was created with https://netron.app (input tensor shape: ?×200×2).
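The mean-pooling step that turns point-wise confidence scores into interval-wise ON/OFF decisions can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the T-U-Net itself: the function name is hypothetical, and the per-point softmax scores are assumed to come from the decoder output.

```python
import numpy as np

def interval_labels(pointwise_probs, segment_length=200):
    """Mean-pool point-wise class probabilities into interval-wise labels.

    pointwise_probs: array of shape (T, K) with per-point softmax scores
    over K classes (here K = 2: device OFF/ON). The profile is split into
    fixed-length segments and each segment is assigned the class with the
    highest mean confidence, mirroring the mean-pooling classifier that
    follows the encoder-decoder stack.
    """
    T, K = pointwise_probs.shape
    n_seg = T // segment_length
    trimmed = pointwise_probs[: n_seg * segment_length]
    segments = trimmed.reshape(n_seg, segment_length, K)
    return segments.mean(axis=1).argmax(axis=1)  # shape (n_seg,)
```

The segment length of 200 matches the ?×200×2 input tensor shape of the network visualization.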

o LSTM Model
A recurrent neural network based on LSTM was implemented, comprising an input layer for the segmented force data (F_seg ∈ ℝ^(L×C)), LSTM layers with tanh activation to interpret the extracted features, a dropout regularization layer, a ReLU activation layer, and an output layer with softmax activation providing the probability distribution over the surgical task classes. The network weights W, which characterize the behavior of the transformations, were identified through nonlinear optimization (Adam) minimizing the loss function (Categorical Cross-Entropy, as well as a customized loss function; details in the supplementary code) on the training data, with backpropagation of the error through the network to update the weights. The performance of our models was evaluated for generalization by testing on previously unseen data using accuracy and validation loss. A grid search was applied over the learning rate (0.001 to 0.1), the LSTM unit size (100 to 600), the input data window size (96 to 200), and the batch size (32 to 128) to tune the hyperparameters. The model architecture visualization is shown in Fig. S6.
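The hyperparameter grid search over those four ranges can be sketched as an exhaustive loop. This is a minimal framework-agnostic sketch: `train_and_score` is a hypothetical placeholder for one training/validation run, and the sampled grid values are illustrative points within the stated ranges.

```python
from itertools import product

# Illustrative grid points within the ranges stated in the text.
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "lstm_units": [100, 300, 600],
    "window_size": [96, 150, 200],
    "batch_size": [32, 64, 128],
}

def grid_search(train_and_score, grid):
    """Evaluate every hyperparameter combination and return the
    best-scoring configuration (higher score = better, e.g. validation
    accuracy).  `train_and_score` stands in for one full training run."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = train_and_score(**cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

In practice each call to `train_and_score` would fit the LSTM with those hyperparameters and return its validation accuracy.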

Results
• Force Profile Segmentation Model (Fig. 1, Section 2.2)
• Surgical Skill Classification Model (Fig. 1, Section 2.3)
o XGBoost Model
A baseline model using XGBoost, which allows parallel computation, was implemented for skill classification. A regularization parameter, lambda, was used to avoid overfitting by limiting the gain and similarity scores of the model's tree nodes. Tree pruning was applied using cover, i.e., the minimum allowed child node weight (minimum child weight). In addition, the learning rate (i.e.
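The effect of the lambda regularizer on node similarity and split gain can be shown with the standard XGBoost formulas for a squared-error objective. This is a didactic sketch of the scoring rule, not code from the study; function names are illustrative.

```python
import numpy as np

def similarity_score(residuals, lam=1.0):
    """XGBoost-style similarity score of a node for a squared-error
    objective: (sum of residuals)^2 / (count + lambda).  A larger lambda
    shrinks the score, regularizing the tree."""
    r = np.asarray(residuals, dtype=float)
    return r.sum() ** 2 / (len(r) + lam)

def split_gain(left, right, lam=1.0):
    """Gain of a candidate split: child similarities minus the parent's.
    Splits whose gain falls below the pruning threshold are removed."""
    parent = np.concatenate([left, right])
    return (similarity_score(left, lam) + similarity_score(right, lam)
            - similarity_score(parent, lam))
```

Because lambda appears in every denominator, increasing it lowers both similarity and gain, which makes more candidate splits fall below the pruning threshold and yields shallower, less overfit trees.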

Statistical Explanation of Feature Selection:
A feature with greater relevance and a larger contribution to the performance of the ML model receives a higher importance ranking. This can be explained in the context of the features' statistical analysis and significance. In skill classification, some features showed a significant difference between skill classes in two-way ANOVA tests (
• Surgical Task Recognition Model (Fig. 1, Section 2.
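Thresholding features by relative importance, as in Tables S2-S4 (cutoffs of 0.05 for XGBoost and 0.03 for KNN), can be expressed as a one-line selection rule. This helper is an illustration of that filtering step, not the study's code; the function name and dictionary input format are assumptions.

```python
def select_features(importances, threshold=0.05):
    """Keep features whose relative importance meets the threshold,
    returned in descending order of importance.

    importances: dict mapping feature name -> relative importance score
    (scores from a fitted model, normalized to sum to 1).
    """
    return sorted(
        (name for name, score in importances.items() if score >= threshold),
        key=lambda name: -importances[name],
    )
```

Only the surviving subset of hand-crafted features is then fused into the downstream networks.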