Bidirectional long short-term memory for surgical skill classification of temporally segmented tasks

Kelly, Jason D.; Petersen, Ashley; Lendvay, Thomas S.; Kowalewski, Timothy M.

doi:10.1007/s11548-020-02269-x

Original Article
Published: 30 September 2020

Volume 15, pages 2079–2088 (2020)

International Journal of Computer Assisted Radiology and Surgery

Jason D. Kelly¹ (ORCID: orcid.org/0000-0002-7800-4314),
Ashley Petersen²,
Thomas S. Lendvay³ &
Timothy M. Kowalewski¹

Abstract

Purpose

Most historical surgical skill research analyzes holistic, task-level summary metrics to classify the skill of a performance. Recent advances in machine learning enable time-series classification at the sub-task level, allowing predictions on segments of tasks that could improve task-level technical skill assessment.
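The segment-then-aggregate idea can be sketched in Python. This is a minimal sketch, not the authors' pipeline: only the 8-s window length comes from the Methods, while the 30 Hz sampling rate, the non-overlapping default stride, and the mean-probability aggregation rule are assumptions, and `segment` and `task_label` are hypothetical helper names.

```python
from statistics import mean

def segment(series, rate_hz, window_s=8.0, stride_s=None):
    """Split a multichannel time series (list of per-sample feature vectors)
    into fixed-length windows; a trailing partial window is dropped.
    Default stride equals the window length (non-overlapping, an assumption)."""
    size = int(round(window_s * rate_hz))
    step = size if stride_s is None else int(round(stride_s * rate_hz))
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

def task_label(window_probs, threshold=0.5):
    """Aggregate per-window P(expert) into one task-level label by averaging.
    Hypothetical rule; majority voting would be an equally plausible choice."""
    return "expert" if mean(window_probs) >= threshold else "novice"

# Hypothetical recording: 100 s of 3-channel data at an assumed 30 Hz.
recording = [[0.0, 0.0, 0.0] for _ in range(3000)]
windows = segment(recording, rate_hz=30)
print(len(windows), len(windows[0]))  # 12 240
print(task_label([0.9, 0.4, 0.7]))    # expert
```

Averaging probabilities keeps one confident window from dominating less than a max rule would; which rule works best is an empirical question the abstract does not settle.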

Methods

A bidirectional long short-term memory (LSTM) network was applied to 8-s windows of multidimensional time-series data from the Basic Laparoscopic Urologic Skills (BLUS) dataset. The network was trained on expert and novice performances of four common surgical tasks. Stratified cross-validation with regularization was used to avoid overfitting. Misclassified cases were re-submitted to crowds on Amazon Mechanical Turk for surgical technical skill assessment, and the new ratings were analyzed for their level of agreement with the previous scores.
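To make the bidirectional idea concrete, here is a minimal, untrained forward pass of a BiLSTM window classifier in plain Python. It is an illustrative sketch, not the authors' network: the class names, hidden size, weight initialization, the choice to concatenate the final forward and backward hidden states, and the single logistic output for P(expert) are all assumptions. Training, including the stratified cross-validation and regularization described in the Methods, is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """One LSTM direction (hypothetical helper): input, forget, output gates
    plus a candidate cell state."""
    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight row per hidden unit per gate, over [x_t, h_{t-1}, 1 (bias)].
        n_cols = n_in + n_hidden + 1
        self.w = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)]
                      for _ in range(n_hidden)] for g in "ifoc"}

    def _affine(self, gate, z):
        return [sum(wj * zj for wj, zj in zip(row, z)) for row in self.w[gate]]

    def run(self, xs):
        """Process xs in the order given; return the final hidden state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in xs:
            z = list(x) + h + [1.0]
            i = [sigmoid(a) for a in self._affine("i", z)]
            f = [sigmoid(a) for a in self._affine("f", z)]
            o = [sigmoid(a) for a in self._affine("o", z)]
            g = [math.tanh(a) for a in self._affine("c", z)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h

class BiLSTMClassifier:
    """Hypothetical window classifier: two LSTM directions + logistic output."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMCell(n_in, n_hidden, rng)
        self.bwd = LSTMCell(n_in, n_hidden, rng)
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * n_hidden)]
        self.b_out = 0.0

    def features(self, window):
        # Forward direction reads t = 1..T; backward direction reads t = T..1.
        return self.fwd.run(window) + self.bwd.run(list(reversed(window)))

    def predict_proba(self, window):
        feats = self.features(window)
        return sigmoid(sum(w * f for w, f in zip(self.w_out, feats)) + self.b_out)

# Synthetic 3-channel window of 240 samples (8 s at an assumed 30 Hz).
model = BiLSTMClassifier(n_in=3, n_hidden=4)
window = [[math.sin(0.1 * t), math.cos(0.1 * t), 0.0] for t in range(240)]
p = model.predict_proba(window)
print(0.0 < p < 1.0)  # True
```

The backward direction lets each window's representation depend on motion that comes later in the window as well as earlier, which is the motivation for a bidirectional rather than a unidirectional recurrent layer.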

Results

Performance was best for the suturing task: compared against previously obtained crowd evaluations, the model predicted whether a performance was by an expert or a novice with 96.88% accuracy (1 misclassification). Compared with expert surgeon ratings, the LSTM predictions yielded a Spearman correlation coefficient of 0.89 for the suturing task. When crowds re-evaluated the misclassified performances, for all 5 misclassified cases from the peg transfer and suturing tasks, the new crowd ratings agreed more with our LSTM model than with the previously obtained crowd scores.
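The two headline metrics can be illustrated with a small Python sketch. The 96.88% figure is consistent with 31 of 32 suturing performances classified correctly (1 - 1/32 = 96.875%); the count of 32 is inferred from the reported numbers, not stated in the abstract. The Spearman function computes rank correlation using average ranks for ties. The four-element score lists are hypothetical toy data, not values from the study, and how the LSTM output was turned into a rankable score is not described here.

```python
def accuracy(n_total, n_wrong):
    return (n_total - n_wrong) / n_total

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (tie-safe)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# 1 misclassification out of an inferred 32 performances.
print(f"{100 * accuracy(32, 1):.2f}%")  # 96.88%

# Hypothetical toy data: model scores vs. surgeon ratings for 4 performances.
model_scores = [0.9, 0.2, 0.7, 0.4]
surgeon_ratings = [4, 1, 5, 2]
print(round(spearman(model_scores, surgeon_ratings), 3))  # 0.8
```

Computing rho as a Pearson correlation on average ranks, rather than with the 1 - 6Σd²/(n(n² - 1)) shortcut, keeps the result correct when ratings contain ties, which is common with ordinal skill scores.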

Conclusion

The presented technique produces results comparable to the labels obtained from crowd-sourced evaluation of surgical tasks. However, these results raise questions about the reliability of crowd-sourced labels for videos of surgical tasks. As a research community, we should scrutinize crowd labeling more closely, systematically examine its biases, and quantify label noise.

Author information

Authors and Affiliations

Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN, USA
Jason D. Kelly & Timothy M. Kowalewski
Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
Ashley Petersen
Department of Urology, Seattle Children’s Hospital, Seattle, WA, USA
Thomas S. Lendvay

Corresponding author

Correspondence to Jason D. Kelly.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical standard

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Funding

This work was supported, in part, by the Office of the Assistant Secretary of Defense for Health Affairs under Award No. W81XWH-15-2-0030, the National Science Foundation M3X CAREER grant under Award No. 1847610, as well as the National Institutes of Health’s National Center for Advancing Translational Sciences, Grant UL1TR002494. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the Department of Defense, the National Science Foundation, or the National Institutes of Health’s National Center for Advancing Translational Sciences.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kelly, J.D., Petersen, A., Lendvay, T.S. et al. Bidirectional long short-term memory for surgical skill classification of temporally segmented tasks. Int J CARS 15, 2079–2088 (2020). https://doi.org/10.1007/s11548-020-02269-x

Received: 13 March 2020
Accepted: 23 September 2020
Published: 30 September 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11548-020-02269-x
