
Laryngeal surface reconstructions from monocular endoscopic videos: a structure from motion pipeline for periodic deformations

  • Original Article
  • Published in the International Journal of Computer Assisted Radiology and Surgery

Abstract

Purpose

Surface reconstructions from laryngoscopic videos have the potential to assist clinicians in diagnosing, quantifying, and monitoring airway diseases using minimally invasive techniques. However, tissue movements and deformations make these reconstructions challenging for conventional pipelines.

Methods

To facilitate such reconstructions, we developed video frame pre-filtering and featureless dense matching steps that enhance the AliceVision Meshroom SfM pipeline. Time and the anterior glottic angle were used to approximate a rigid state of the airway and to collect frames with different camera poses. Featureless dense matches were tracked with a correspondence transformer across subsets of images to extract matched points, which were then used to estimate the point cloud and reconstructed surface. The proposed pipeline was tested on a simulated dataset under varying illumination and resolution conditions, as well as on real laryngoscopic videos. A minimal sketch of the pre-filtering idea follows.
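The snippet below is a minimal, hypothetical Python sketch of the frame pre-filtering step, assuming a per-frame anterior glottic angle (AGA) estimate is already available (e.g., from an automated vocal-fold tracking tool). It keeps frames whose AGA lies near a reference value, so the airway is in an approximately rigid state, while enforcing a minimum temporal spacing so the retained frames span different camera poses. Function names, thresholds, and the synthetic AGA trace are illustrative assumptions, not the authors' implementation.

import numpy as np

def select_rigid_frames(angles_deg, timestamps_s, target_angle_deg,
                        angle_tol_deg=2.0, min_spacing_s=0.5, max_frames=16):
    """Keep frames whose anterior glottic angle (AGA) is close to a reference
    value (approximating a single rigid airway state) while enforcing a minimum
    temporal spacing so the selected frames come from different camera poses."""
    angles = np.asarray(angles_deg, dtype=float)
    times = np.asarray(timestamps_s, dtype=float)

    # Candidate frames: AGA within tolerance of the reference angle.
    candidates = np.where(np.abs(angles - target_angle_deg) <= angle_tol_deg)[0]

    selected = []
    last_t = -np.inf
    for idx in candidates:
        # Temporal spacing encourages pose diversity between selected frames.
        if times[idx] - last_t >= min_spacing_s:
            selected.append(int(idx))
            last_t = times[idx]
        if len(selected) >= max_frames:
            break
    return selected

# Illustrative use with a synthetic AGA trace oscillating at ~0.8 Hz, 30 fps.
t = np.arange(0, 10, 1 / 30)
aga = 25 + 20 * np.sin(2 * np.pi * 0.8 * t)   # degrees (synthetic)
frames = select_rigid_frames(aga, t, target_angle_deg=44, angle_tol_deg=2)
print(len(frames), "frames selected:", frames)

In practice, the frames selected in this way would then be passed to the featureless dense matching and SfM stages described above.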

Results

Our pipeline was able to reconstruct the laryngeal region from 4, 8, and 16 images obtained from simulated and real patient exams. The pipeline was robust to sparse inputs, blur, and extreme lighting conditions, unlike the Meshroom pipeline, which failed to produce a point cloud for 6 of 15 simulated datasets.

Conclusion

The pre-filtering and featureless dense matching modules specialize the conventional SfM pipeline to handle challenging laryngoscopic examinations directly from patient videos. These 3D visualizations have the potential to improve spatial understanding of airway conditions.



Funding

We would like to thank the following sources of funding: (1) Department Research Award, Department of Otolaryngology—Head & Neck Surgery, Temerty Faculty of Medicine, University of Toronto, and (2) University of Toronto Mississauga Undergraduate Research Grant.

Author information

Corresponding author

Correspondence to Justin Regef.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Ethical approval

This study was approved by the Unity Health Toronto Research Ethics Board (REB#: 20–235).

Informed consent

Informed consent was obtained from all individual participants included in the study, including consent for the use of their data and photographs.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 498 kb)

Supplementary file2 (MP4 20233 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Regef, J., Talasila, L., Wiercigroch, J. et al. Laryngeal surface reconstructions from monocular endoscopic videos: a structure from motion pipeline for periodic deformations. Int J CARS (2024). https://doi.org/10.1007/s11548-024-03118-x



  • DOI: https://doi.org/10.1007/s11548-024-03118-x
