Computer Science > Computer Vision and Pattern Recognition
[Submitted on 21 Nov 2023 (v1), last revised 22 Nov 2023 (this version, v2)]
Title:Surgical Temporal Action-aware Network with Sequence Regularization for Phase Recognition
View PDFAbstract:To assist surgeons in the operating theatre, surgical phase recognition is critical for developing computer-assisted surgical systems, which requires comprehensive understanding of surgical videos. Although existing studies made great progress, there are still two significant limitations worthy of improvement. First, due to the compromise of resource consumption, frame-wise visual features are extracted by 2D networks and disregard spatial and temporal knowledge of surgical actions, which hinders subsequent inter-frame modeling for phase prediction. Second, these works simply utilize ordinary classification loss with one-hot phase labels to optimize the phase predictions, and cannot fully explore surgical videos under inadequate supervision. To overcome these two limitations, we propose a Surgical Temporal Action-aware Network with sequence Regularization, named STAR-Net, to recognize surgical phases more accurately from input videos. Specifically, we propose an efficient multi-scale surgical temporal action (MS-STA) module, which integrates visual features with spatial and temporal knowledge of surgical actions at the cost of 2D networks. Moreover, we devise the dual-classifier sequence regularization (DSR) to facilitate the training of STAR-Net by the sequence guidance of an auxiliary classifier with a smaller capacity. Our STAR-Net with MS-STA and DSR can exploit visual features of surgical actions with effective regularization, thereby leading to the superior performance of surgical phase recognition. Extensive experiments on a large-scale gastrectomy surgery dataset and the public Cholec80 benchmark prove that our STAR-Net significantly outperforms state-of-the-arts of surgical phase recognition.
Submission history
From: Zhen Chen [view email][v1] Tue, 21 Nov 2023 13:43:16 UTC (2,214 KB)
[v2] Wed, 22 Nov 2023 02:15:51 UTC (2,214 KB)
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.