Laryngeal surface reconstructions from monocular endoscopic videos: a structure from motion pipeline for periodic deformations

Regef, Justin; Talasila, Likhit; Wiercigroch, Julia; Lin, R. Jun; Kahrs, Lueder A.

doi:10.1007/s11548-024-03118-x

Original Article
Published: 23 April 2024

International Journal of Computer Assisted Radiology and Surgery (2024)

Justin Regef (ORCID: orcid.org/0009-0007-9140-0198)^1,2,
Likhit Talasila^1,2,
Julia Wiercigroch (ORCID: orcid.org/0000-0001-9997-4371)^1,3,
R. Jun Lin^4 &
Lueder A. Kahrs^1,2,3,4,5

Abstract

Purpose

Surface reconstructions from laryngoscopic videos have the potential to assist clinicians in diagnosing, quantifying, and monitoring airway diseases using minimally invasive techniques. However, tissue movements and deformations make these reconstructions challenging using conventional pipelines.

Methods

To facilitate such reconstructions, we developed video frame pre-filtering and featureless dense matching steps to enhance the AliceVision Meshroom structure from motion (SfM) pipeline. Time and the anterior glottic angle were used to approximate the rigid state of the airway and to collect frames with different camera poses. Featureless dense matches were tracked with a correspondence transformer across subsets of images to extract matched points that could be used to estimate the point cloud and the reconstructed surface. The proposed pipeline was tested on a simulated dataset under varying conditions, such as illumination and resolution, and on real laryngoscopic videos.
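
The pre-filtering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the selection criteria, so the `Frame` fields, the function name `select_quasi_rigid_frames`, and the parameters `target_aga_deg`, `aga_tol_deg`, `min_gap_s`, and `max_frames` are all hypothetical. The sketch assumes that a per-frame anterior glottic angle estimate is already available, keeps frames whose angle is close to a target value (so the airway is in roughly the same state), and enforces a minimum time gap between kept frames (so the camera has had time to move).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    index: int      # frame number in the video
    time_s: float   # timestamp in seconds
    aga_deg: float  # anterior glottic angle estimate in degrees (assumed given)


def select_quasi_rigid_frames(
    frames: List[Frame],
    target_aga_deg: float,
    aga_tol_deg: float = 2.0,   # hypothetical tolerance on the angle
    min_gap_s: float = 0.5,     # hypothetical minimum time between kept frames
    max_frames: int = 16,       # cap on the size of the image subset
) -> List[Frame]:
    """Keep frames with a similar glottic angle that are spread out in time."""
    selected: List[Frame] = []
    last_time = None
    for frame in sorted(frames, key=lambda f: f.time_s):
        # Reject frames where the airway is in a different deformation state.
        if abs(frame.aga_deg - target_aga_deg) > aga_tol_deg:
            continue
        # Reject frames too close in time to the previously kept frame.
        if last_time is not None and frame.time_s - last_time < min_gap_s:
            continue
        selected.append(frame)
        last_time = frame.time_s
        if len(selected) == max_frames:
            break
    return selected


# Illustrative, made-up measurements (not data from the study).
demo_frames = [
    Frame(0, 0.0, 30.0),
    Frame(1, 0.1, 30.5),
    Frame(2, 0.2, 10.0),
    Frame(3, 1.0, 29.0),
    Frame(4, 1.2, 31.0),
    Frame(5, 2.0, 35.0),
]
kept = select_quasi_rigid_frames(demo_frames, target_aga_deg=30.0)
print([f.index for f in kept])
```

The time gap here is only a crude proxy for a change in camera pose; a real pipeline would likely need a more direct measure of viewpoint change.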

Results

Our pipeline reconstructed the laryngeal region from 4, 8, and 16 images obtained from simulated and real patient exams. It was robust to sparse inputs, blur, and extreme lighting conditions, whereas the Meshroom pipeline failed to produce a point cloud for 6 of 15 simulated datasets.

Conclusion

The pre-filtering and featureless dense matching modules specialize the conventional SfM pipeline to handle challenging laryngoscopic examinations directly from patient videos. These 3D visualizations have the potential to improve spatial understanding of airway conditions.

Funding

We would like to thank the following sources of funding: (1) Department Research Award, Department of Otolaryngology—Head & Neck Surgery, Temerty Faculty of Medicine, University of Toronto, and (2) University of Toronto Mississauga Undergraduate Research Grant.

Author information

Authors and Affiliations

Medical Computer Vision and Robotics Lab, University of Toronto, Toronto, ON, Canada
Justin Regef, Likhit Talasila, Julia Wiercigroch & Lueder A. Kahrs
Department of Mathematical and Computational Sciences, University of Toronto Mississauga, 3359 Mississauga Rd, Mississauga, ON, L5L 1C6, Canada
Justin Regef, Likhit Talasila & Lueder A. Kahrs
Department of Computer Science, University of Toronto, 40 St George St, Toronto, ON, M5S 2E4, Canada
Julia Wiercigroch & Lueder A. Kahrs
Department of Otolaryngology – Head & Neck Surgery, Unity Health Toronto – St. Michael’s Hospital, Temerty Faculty of Medicine, University of Toronto, 36 Queen St E, Toronto, ON, M5B 1W8, Canada
R. Jun Lin & Lueder A. Kahrs
Institute of Biomedical Engineering, University of Toronto, 164 College Street, Toronto, ON, M5S 3G9, Canada
Lueder A. Kahrs

Corresponding author

Correspondence to Justin Regef.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Ethical approval

This study was approved by the Unity Health Toronto Research Ethics Board (REB#: 20–235).

Informed consent

Informed consent was obtained from all individual participants included in the study. Patients signed informed consent regarding their data and photographs.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 498 kb)

Supplementary file2 (MP4 20233 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Regef, J., Talasila, L., Wiercigroch, J. et al. Laryngeal surface reconstructions from monocular endoscopic videos: a structure from motion pipeline for periodic deformations. Int J CARS (2024). https://doi.org/10.1007/s11548-024-03118-x

Received: 17 August 2023
Accepted: 21 March 2024
Published: 23 April 2024
DOI: https://doi.org/10.1007/s11548-024-03118-x
