Bidirectional long short-term memory for surgical skill classification of temporally segmented tasks

  • Original Article
  • International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

Most prior surgical skill research analyzes holistic, task-level summary metrics to classify the skill shown in a performance. Recent advances in machine learning enable time-series classification at the sub-task level, producing predictions on segments of a task, which could improve task-level technical skill assessment.

Methods

A bidirectional long short-term memory (LSTM) network was applied to 8-s windows of multidimensional time-series data from the Basic Laparoscopic Urologic Skills dataset. The network was trained on expert and novice performances of four common surgical tasks. Stratified cross-validation with regularization was used to limit overfitting. Misclassified cases were re-submitted to crowd workers on Amazon Mechanical Turk for a fresh technical skill assessment, and the level of agreement with the previous scores was analyzed.
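The windowing and stratified splitting described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sampling rate (30 Hz), the non-overlapping windows, the per-performance split, and the function names `segment_windows` and `stratified_folds` are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def segment_windows(series: Sequence[Sequence[float]],
                    sample_rate_hz: float,
                    window_s: float = 8.0) -> List[Sequence[Sequence[float]]]:
    """Split a (time x channels) series into non-overlapping fixed-length windows.

    Trailing samples that do not fill a whole window are dropped (an assumption).
    """
    window_len = int(round(sample_rate_hz * window_s))
    if window_len <= 0:
        raise ValueError("window length must be at least one sample")
    n_windows = len(series) // window_len
    return [series[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


def stratified_folds(labels: Sequence[Hashable], n_folds: int) -> List[List[int]]:
    """Assign performance indices to folds so each fold keeps the class balance."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    for label in sorted(by_class, key=str):
        # Deal each class out round-robin so every fold gets a similar share.
        for position, idx in enumerate(by_class[label]):
            folds[position % n_folds].append(idx)
    return folds


# Hypothetical example: 20 s of two-channel data sampled at an assumed 30 Hz.
sample_rate_hz = 30
series = [[float(t), float(-t)] for t in range(sample_rate_hz * 20)]
windows = segment_windows(series, sample_rate_hz)

# Hypothetical labels for 12 performances, split into 4 folds.
labels = ["expert"] * 4 + ["novice"] * 8
folds = stratified_folds(labels, n_folds=4)
```

Splitting at the level of whole performances, rather than individual windows, is one way to keep windows from the same performance out of both the training and validation folds; whether the study split this way is not stated in the abstract.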

Results

Performance was best for the suturing task, with 96.88% accuracy in predicting whether a performance was by an expert or a novice (1 misclassification) when compared with previously obtained crowd evaluations. When compared with expert surgeon ratings, the LSTM predictions yielded a Spearman coefficient of 0.89 for suturing tasks. When crowds re-evaluated the misclassified performances, for all 5 misclassified cases from the peg transfer and suturing tasks they agreed more with our LSTM model than with the previously obtained crowd scores.
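The two metrics reported above, classification accuracy and the Spearman rank coefficient, can be computed as in the following Python sketch. The data are invented for illustration only. The 32-performance example is an inference from the reported figures (1 misclassification at 96.88% is consistent with 31 of 32 correct), not a count stated in the abstract, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman coefficient: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented labels: 32 performances with a single disagreement.
reference = ["expert"] * 16 + ["novice"] * 16
predicted = list(reference)
predicted[0] = "novice"
acc = accuracy(predicted, reference)  # 31 / 32

# Invented scores: model output versus a rater's score for five performances.
rho = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Ranking with averaged ties matters when rater scores are coarse, as integer rating scales often are; the simpler formula based on squared rank differences is exact only when there are no ties.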

Conclusion

The technique presented yields results comparable to the labels that would be obtained from crowd-sourced assessment of surgical tasks. However, these results raise questions about the reliability of crowd-sourced labels for videos of surgical tasks. We, as a research community, should scrutinize crowd labeling more closely, systematically examine its biases, and quantify label noise.

Author information

Corresponding author

Correspondence to Jason D. Kelly.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Ethical standard

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Funding

This work was supported, in part, by the Office of the Assistant Secretary of Defense for Health Affairs under Award No. W81XWH-15-2-0030, the National Science Foundation M3X CAREER grant under Award No. 1847610, as well as the National Institutes of Health’s National Center for Advancing Translational Sciences, Grant UL1TR002494. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the Department of Defense, the National Science Foundation, or the National Institutes of Health’s National Center for Advancing Translational Sciences.

Informed consent

Informed consent was obtained from all individual participants included in the study.

About this article

Cite this article

Kelly, J.D., Petersen, A., Lendvay, T.S. et al. Bidirectional long short-term memory for surgical skill classification of temporally segmented tasks. Int J CARS 15, 2079–2088 (2020). https://doi.org/10.1007/s11548-020-02269-x

  • DOI: https://doi.org/10.1007/s11548-020-02269-x
