
Pattern Recognition

Volume 57, September 2016, Pages 61-69

Simultaneous pose estimation and patient-specific model reconstruction from single image using maximum penalized likelihood estimation (MPLE)

https://doi.org/10.1016/j.patcog.2016.03.025

Highlights

  • Formulate the simultaneous pose estimation and surface model reconstruction as an MPLE.

  • Solve the MPLE problem effectively by embedding expectation conditional maximization (ECM) in particle swarm optimization (PSO).

  • Simultaneously optimize the pose and deformation rather than sequentially.

  • Model oriented object contours as a mixture of von Mises–Fisher Gaussian distributions.

  • Use probabilities instead of paired correspondences on contours with no delineation.

Abstract

Pose estimation and shape reconstruction are two common problems in pattern recognition, which are oftentimes tackled separately. However, in some medical applications, both the pose and the shape of a target anatomy are crucial and have to be estimated from intra-operative two-dimensional images. As pose estimation and shape reconstruction are two coupled problems, previous feature-based methods solved them in consecutive stages utilizing statistical shape models (SSMs). Only the mean shape of the SSM is used to estimate the pose by finding paired correspondences in the first stage, based on which SSM-regularized surface deformations are performed in the following stages. Such a strategy depends heavily on the paired correspondences. In this paper, bypassing correspondence establishment, a novel method is proposed to simultaneously optimize pose and shape by formulating the coupled problems as a maximum penalized likelihood estimation (MPLE). It models oriented object contours as a mixture of von Mises–Fisher Gaussian distributions and solves the MPLE effectively using a global optimizer. It utilizes the full knowledge of the SSM in both pose estimation and shape reconstruction, providing robustness to large offsets in the initialization. Leave-one-out cross-validations on 19 dry cadaveric femurs were performed using simulated X-ray images with accurate ground truth, under various initial conditions. Our method achieved sub-degree rotational and sub-millimeter in-plane translational pose estimation errors, and an average mean surface-to-surface distance of approximately one millimeter in shape reconstruction. The reconstruction accuracy is comparable to that reported in the literature using two or more images. The experimental results are encouraging and indicate that accurate simultaneous 3D–2D pose estimation and surface reconstruction is achievable from a single image.

Introduction

Registration and reconstruction of a deformable shape are coupled problems: solving either one appreciably facilitates solving the other. In some medical applications, aligning peri-operative 3D data with intra-operative 2D image(s) is one of the crucial steps for image-guided intervention, surgery and therapy [1]. Registration of the 3D data with respect to the 2D image(s) (called pose estimation or 3D–2D registration) provides a mapping between the 3D data and the patient anatomy. When a patient's tomographic image is used as the 3D data, pose estimation generally recovers a rigid transformation, as the anatomy of interest has negligible or no deformation. Otherwise, a patient-specific model needs to be constructed from the 2D image(s) along with pose estimation. This requires optimizing both the shape of the patient-specific model and the pose mapping the model to the patient anatomy with sufficient accuracy. Various methods have been proposed to reconstruct patient-specific models from 2D images, based on either image intensity or image contours. The former methods [2], [3], [4], [5], [6], [7] maximize an intensity similarity metric between the 2D images and digitally reconstructed radiographs (DRRs) generated from a 3D model. Their major differences lie in the 3D model, the similarity metric and the optimization method used. These methods are known to have a high computational load, and a statistical volumetric model is not easy to build. Moreover, additional structures (e.g., irrelevant anatomies) and foreign objects (e.g., surgical instruments) that are present in the 2D images but not in the volumetric model may pose major challenges to accurate pose estimation and model reconstruction.

Contour-based methods minimize a distance metric between image contours and the model's apparent contours. Mathematically, given unmatched 2D and 3D points, the objective is to solve the minimization problem

$$\operatorname*{arg\,min}_{\Omega}\;\sum_{n=1}^{N}\sum_{m=1}^{M}\eta(m,n)\,\bigl\|\mathbf{u}_n - T\bigl(D(\mathbf{X}_m;\boldsymbol{\alpha});\Omega\bigr)\bigr\|^{2},\qquad(1)$$

where $\{\mathbf{u}_n\in\mathbb{R}^2\}_{n=1}^{N}$ are image points, $\{\mathbf{X}_m\in\mathbb{R}^3\}_{m=1}^{M}$ are apparent contour points, $D:\mathbb{R}^3\to\mathbb{R}^3$ denotes a deformation with parameter $\boldsymbol{\alpha}$, and $T:\mathbb{R}^3\to\mathbb{R}^2$ represents a perspective projection following an affine transformation with parameter $\Omega$, which consists of a scale $s\in\mathbb{R}$ and a 6-DOF rigid transformation $\theta\in SE(3)$. The assignment function $\eta(m,n):\{1,\dots,N\}\times\{1,\dots,M\}\to\{0,1\}$ assigns a paired correspondence between a 2D point and a 3D point; $\eta(m,n)$ equals one if $\mathbf{u}_n$ corresponds to $\mathbf{X}_m$ and zero otherwise.
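
To make the role of the assignment function $\eta$ concrete, here is a minimal Python sketch (not taken from the paper) that evaluates the cost in (1) under a hypothetical nearest-neighbour assignment; the pinhole model, the focal length f, and the (R, t, s) parameterization of $\Omega$ are illustrative assumptions.

```python
import numpy as np

def project(X, R, t, s, f):
    """Apply the affine part of T(.; Omega) -- rotation R, translation t, scale s --
    followed by a simple pinhole projection with focal length f (all hypothetical)."""
    Y = s * (X @ R.T) + t            # (M, 3) transformed points
    return f * Y[:, :2] / Y[:, 2:3]  # (M, 2) perspective projection

def assignment_cost(u, X, R, t, s, f):
    """Cost of Eq. (1) when eta(m, n) assigns each projected model point to its
    nearest image point -- one common ICP-style choice for the assignment."""
    p = project(X, R, t, s, f)                                 # (M, 2)
    d2 = ((p[:, None, :] - u[None, :, :]) ** 2).sum(axis=-1)   # (M, N) squared distances
    return d2.min(axis=1).sum()                                # nearest-neighbour assignment
```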

The presence of the deformation $D$ renders the pose estimation ill-conditioned. To deal with this, a common approach is to regularize $D$ using a-priori knowledge represented by a statistical shape model (SSM) built from $T$ triangular meshes. An "instance" of the SSM is approximated as

$$S \approx S^{[0]} + \sum_{k=1}^{K}\alpha_k\,e^{[k]} \;\ni\; D(\mathbf{X}_m;\boldsymbol{\alpha}),\qquad(2)$$

where $S^{[0]}=\sum_{t=1}^{T}S_t/T$ is the mean shape, $\boldsymbol{\alpha}=(\alpha_1,\dots,\alpha_K)^{\mathrm T}$ the shape parameter, $e^{[k]}$ the $k$th mode, and $K$ is chosen based on the accumulated variance.
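
A minimal Python sketch of generating an instance according to (2) is given below; the array shapes, variable names and random example data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ssm_instance(S0, modes, alpha):
    """Instance of the SSM per Eq. (2): S0 is the mean shape (V, 3), modes are the
    K retained modes (K, V, 3), alpha is the shape parameter (K,)."""
    return S0 + np.tensordot(alpha, modes, axes=1)

# Example with hypothetical sizes: V = 1000 vertices, K = 5 retained modes.
rng = np.random.default_rng(0)
S0 = rng.standard_normal((1000, 3))
modes = rng.standard_normal((5, 1000, 3))
assert np.allclose(ssm_instance(S0, modes, np.zeros(5)), S0)  # alpha = 0 gives the mean shape
```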

As pose estimation and reconstruction of a deformable shape are coupled problems, previous methods [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19] proposed to solve them in consecutive stages. The first stage establishes paired correspondences and estimates the pose from them, both using only the mean shape. Manually identified correspondences, variants of the iterative closest point (ICP) algorithm and intensity-based registration were used in this stage. Then, keeping the estimated pose fixed and using the paired correspondences, the second stage constructs an instance via an SSM-regularized surface deformation that minimizes a 2D contour-to-contour distance or a 3D point-to-line distance. Adapted gradient descent, pattern search and preconditioned conjugate gradients in a trust region were used. To achieve a better shape reconstruction, some methods have a third stage that minimizes local free-form deformations using a first-order Gauss–Markov process or a surface fitting with thin-plate splines as the smoothness constraint, still keeping the pose fixed and using the paired correspondences obtained in the first stage. In [13], [17], [18], contour normals or image gradients were also used to alleviate false paired correspondences. These methods require delineated image contours or a good initialization.
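
To make the division of labour in these two-stage pipelines concrete, below is a much-simplified Python sketch of a second-stage, SSM-regularized shape fit with the pose and correspondences held fixed. It is illustrative only: 3D point-to-point residuals stand in for the 2D contour-to-contour or 3D point-to-line distances, and a plain ridge penalty stands in for the SSM regularization.

```python
import numpy as np

def fit_alpha(S0, modes, targets, lam=1.0):
    """Second-stage fit (illustrative): S0 is the mean shape (V, 3), modes are the
    K SSM modes (K, V, 3), and targets are matched 3D positions (V, 3) obtained with
    a fixed pose and fixed correspondences. Solves
        min_alpha  sum_m ||targets_m - (S0_m + sum_k alpha_k e_k,m)||^2 + lam * ||alpha||^2
    in closed form (ridge-regularized linear least squares)."""
    K = modes.shape[0]
    A = modes.reshape(K, -1).T        # (3V, K) stacked deformation modes
    b = (targets - S0).reshape(-1)    # (3V,) residual left by the mean shape
    return np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
```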

When estimating the pose using the mean shape, deformations (morphological differences) are treated as random noise that contaminates the mean shape. However, the shape deforms under the constraint of the SSM rather than randomly. Thus the two-stage procedure may result in a registration biased by deformations and a reconstruction based on an improper pose. Consequently, as the optimization does not fully take the SSM into account, i.e., (1) is not minimized subject to (2), the accuracy of the pose estimated in the first stage is sacrificed. With this compromised pose estimate, the reconstruction from the registered mean shape does not describe shape deformations alone. In addition, the convergence of previous methods is local, requiring an initialization near the global optimum. Hence, the coupled pose estimation and shape reconstruction should essentially be formulated as one optimization problem and solved simultaneously using a global optimizer.

The closest related method is the one for 3D–3D rigid point set registration [20], where the pose was estimated using a semi-definite positive (SPD) relaxation. However, it is designed for rigid rather than deformable objects. Most importantly, it cannot be directly extended to 3D–2D registration, as the nonlinear perspective projection renders the formulation of the SPD relaxation invalid.

We present a novel contour-based method for simultaneous pose estimation and patient-specific shape reconstruction from a single 2D image. This work is a significant extension of our point-based rigid 3D–2D registration method [21]. An early version was briefly introduced in an application [22], [23], but without the derivation and without incorporating image gradients. Section 2 provides a complete and detailed derivation of how the simultaneous pose estimation and surface reconstruction are rigorously formulated as an MPLE problem, which is not obvious. Extensive evaluations under various initializations are then reported in Section 3, and the results are discussed and compared with state-of-the-art methods in Section 4, followed by conclusions and future work in Section 5.

Section snippets

Problem statement

We do not directly solve the minimization problem (1). Instead, we formulate the problem as a statistical inference of the pose and shape parameters, without establishing paired correspondences.

For image contour points, we form $N$ dyads (4-vectors) $P=\{(\mathbf{u}_n,\mathbf{g}_n)\}_{1}^{N}$, where $\mathbf{u}_n\in\mathbb{R}^2$ denotes the location of the $n$th image contour point and $\mathbf{g}_n\in\mathbb{R}^2$, $\|\mathbf{g}_n\|=1$, denotes the image gradient at $\mathbf{u}_n$. For apparent contour vertices of an instance transformed by $\Omega$, we form $M$ dyads (6-vectors) $C=\{(\mathbf{X}_m,\mathbf{N}_m)\}_{1}^{M}$, where $\mathbf{X}_m\in\mathbb{R}^3$
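
As a rough illustration of how such an oriented dyad can be scored by one component of a von Mises–Fisher Gaussian mixture, the sketch below combines an isotropic Gaussian on the 2D position with a von Mises–Fisher (von Mises on the circle) term on the unit gradient direction; the projected position p_m and projected normal n_m of a model dyad, as well as the parameters sigma and kappa, are hypothetical inputs, and this is not the paper's exact formulation.

```python
import numpy as np

def dyad_component_density(u, g, p_m, n_m, sigma=2.0, kappa=4.0):
    """Mixture-component density of an image dyad (u, g) against the projection
    (p_m, n_m) of a model dyad (illustrative sketch): isotropic Gaussian on the
    position times a von Mises term on the gradient direction."""
    gaussian = np.exp(-np.sum((u - p_m) ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    von_mises = np.exp(kappa * float(g @ n_m)) / (2.0 * np.pi * np.i0(kappa))
    return gaussian * von_mises
```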

Experiment and results

The evaluation examines the stability and accuracy achievable with our method under various initial conditions using DRRs. The DRRs were generated using our "virtual C-arm" software, which simulates a distortion-free flat-panel cone-beam CT imaging bench [37]. The source-to-detector distance $d_{SD}$ is 1184 mm and the image resolution is 1024×768 pixels. In practice, these parameters can be obtained via C-arm calibration.
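
For readers unfamiliar with the cone-beam geometry, the following sketch shows how such a calibrated C-arm might map a 3D point to detector pixels; the source-to-detector distance and the image size come from the text, while the pixel spacing, principal point, and the source-centred coordinate frame are assumptions made for illustration.

```python
import numpy as np

D_SD = 1184.0                   # source-to-detector distance [mm], from the text
IMAGE_SIZE = (1024, 768)        # detector resolution [pixels], from the text
PIXEL_SPACING = 0.4             # hypothetical detector pixel size [mm/pixel]
PRINCIPAL_POINT = np.array([IMAGE_SIZE[0] / 2, IMAGE_SIZE[1] / 2])  # assumed image centre

def project_to_pixels(X):
    """Perspective projection of 3D points X (N x 3, in mm, expressed in a C-arm
    frame with the X-ray source at the origin and the z-axis toward the detector)."""
    uv_mm = D_SD * X[:, :2] / X[:, 2:3]      # intersection with the detector plane [mm]
    return uv_mm / PIXEL_SPACING + PRINCIPAL_POINT
```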

The probabilistic correspondence

Almost all feature-based patient-specific model reconstruction methods are derived from the ICP, making use of image contours and the apparent contours of the SSM. They assume that there are paired correspondences between the two types of contours and therefore concentrate on singling out a set of "best" paired correspondences, based on which both the rigid registration and the SSM-regularized deformation are carried out.

The ICP-based methods assume positional noise that corrupts the data, as illustrated in Fig. 4
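
The contrast between such hard assignments and probabilistic correspondences can be sketched as follows; the isotropic Gaussian noise model and the parameter sigma are illustrative assumptions, not the paper's exact derivation.

```python
import numpy as np

def correspondences(u, p, sigma=2.0, soft=True):
    """u: (N, 2) image points; p: (M, 2) projected model points. Returns an (M, N)
    matrix: a hard 0/1 nearest-neighbour assignment if soft=False, otherwise
    Gaussian posterior probabilities of each model point explaining each image point."""
    d2 = ((p[:, None, :] - u[None, :, :]) ** 2).sum(axis=-1)   # (M, N) squared distances
    if not soft:
        eta = np.zeros_like(d2)
        eta[np.arange(p.shape[0]), d2.argmin(axis=1)] = 1.0    # ICP-style hard assignment
        return eta
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=0, keepdims=True)                    # normalize over model points
```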

Conclusions and future work

We propose a novel method for simultaneous 3D–2D pose estimation and surface reconstruction of a deformable object, which are two intrinsically coupled problems. Unlike previous methods that solve a simplification of the coupled problems in two consecutive stages, the proposed method formulates the two coupled problems as one MPLE problem and solves them simultaneously. In addition, the proposed method utilizes a global optimizer whereas previous methods used local

Conflict of interest

None declared.


References (47)

  • X. Chen et al., Automatic inference and measurement of 3D carpal bone kinematics from single view fluoroscopic sequences, IEEE Trans. Med. Imaging (2013).

  • G. Zheng, Personalized X-ray reconstruction of the proximal femur via intensity-based non-rigid 2D–3D registration.

  • M. Fleute et al., Nonrigid 3-D/2-D registration of images using statistical models.

  • S. Benameur et al., Three-dimensional biplanar reconstruction of scoliotic rib cage using the estimation of a mixture of probabilistic prior models, IEEE Trans. Biomed. Eng. (2005).

  • S. Benameur et al., A hierarchical statistical modeling approach for the unsupervised 3-D biplanar reconstruction of the scoliotic spine, IEEE Trans. Biomed. Eng. (2005).

  • H. Lamecker, T. Wenckebach, H.-C. Hege, Atlas-based 3D-shape reconstruction from X-ray images, in: 18th International...

  • J. Dworzak et al., 3D reconstruction of the human rib cage from 2D projection images using a statistical shape model, Int. J. Comput. Assist. Radiol. Surg. (2010).

  • G. Zheng, Statistical shape model-based reconstruction of a scaled, patient-specific surface model of the pelvis from a single standard AP X-ray radiograph, Med. Phys. (2010).

  • A. Hurvitz et al., Registration of a CT-like atlas to fluoroscopic X-ray images using intensity correspondences, Int. J. Comput. Assist. Radiol. Surg. (2008).

  • N. Baka et al., Statistical shape model-based femur kinematics from biplane fluoroscopy, IEEE Trans. Med. Imaging (2012).

  • S. Laporte et al., A biplanar reconstruction method based on 2D and 3D contours: application to the distal femur, Comput. Methods Biomech. Biomed. Eng. (2003).

  • R. Horaud et al., Rigid and articulated point registration with expectation conditional maximization, IEEE Trans. Pattern Anal. Mach. Intell. (2011).

  • X. Kang et al., Robustness and accuracy of feature-based single image 2D–3D registration without correspondences for image-guided intervention, IEEE Trans. Biomed. Eng. (2014).

    Xin Kang received his Ph.D. from The University of Hong Kong, Hong Kong, China. Before that, he served as a Senior Lecturer at the Department of Electrical and Electronic Engineering. His current research interests include pose estimation, image analysis, visual tracking and image fusion.

    Wai-Pan Yau received his MBBS from The University of Hong Kong in 1992. He is the Chief of the Division of Sports and Arthroscopic Surgery in the Department of Orthopaedics and Traumatology, Queen Mary Hospital, where he is also an Honorary Clinical Associate Consultant. He is a Clinical Assistant Professor of the Department of Orthopaedics and Traumatology in The University of Hong Kong.

    He holds the Fellowship of the Royal College of Surgeons of Edinburgh (FRCSE), of the Hong Kong College of Orthopaedic Surgeons (FHKCOS), and of the Hong Kong Academy of Medicine (FHKAM). He is also a recipient of numerous awards, including the Harry Fang Gold Medal in the conjoint examination of FHKCOS and FRCSE (Ortho), the Arthur Yau Award of the 14th Annual Congress of HKOA, the Arthur Yau Award and the David Fang Trophy of the 24th Annual Congress of HKOA, and the Japanese Orthopaedic Association Congress 2008 Fellowship Award.

    Russell H. Taylor received his Ph.D. in Computer Science from Stanford in 1976. He joined IBM Research in 1976, where he developed the AML robot language and managed the Automation Technology Department and (later) the Computer-Assisted Surgery Group before moving in 1995 to Johns Hopkins, where he is the John C. Malone Professor of Computer Science with joint appointments in Mechanical Engineering, Radiology, and Surgery and is also Director of the NSF Engineering Research Center for Computer-Integrated Surgical Systems and Technology.

    He is the author of over 250 peer-reviewed publications, a Fellow of the IEEE, of the AIMBE, of the MICCAI Society, and of the Engineering School of the University of Tokyo. He is also a recipient of numerous awards, including the IEEE Robotics Pioneer Award, the MICCAI Society Enduring Impact Award, and the Maurice Müller Award for excellence in computer-assisted orthopaedic surgery.

    This research was supported by a Research Grant and RPg Exchange Funding of The University of Hong Kong and by Johns Hopkins University internal funds.
