Nonrigid reconstruction of 3D breast surfaces with a low-cost RGBD camera for surgical planning and aesthetic evaluation

Highlights • A nonrigid 3D breast surface reconstruction pipeline running on a standard PC taking a noisy RGBD input video from a Kinect-style camera is proposed.• Pairwise nonrigid ICP is extended to the multi-view case incorporating soft mobility constraints in areas of non-overlap.• Shortest distance correspondences as a new technique for data association are shown to lead to consistently better alignment.• The method is able to reconstruct clinical-quality surface models in spite of varying degrees of postural sway during data capture.• Landmark and volumetric quantitative validation in metric units demonstrate improved reconstruction quality on par with the gold standard and superior to a competing method.


Introduction
Breast cancer is the most frequently diagnosed cancer site among women worldwide ( Jemal et al., 2011;Fitzmaurice et al., 2015 ).Despite increased incidence, mortality from breast cancer is declining with 10-year survival rates reaching 82% in Europe ( De Angelis et al., 2014 ).A longer life expectancy after developing breast carcinoma in turn emphasizes the importance of aesthetic treatment outcome and late effects.Beside the oncological result, several studies have linked cosmetic and functional outcome to patients' quality of life, mental health and self-image ( Hau et al., 2013;Stanton et al., 2001 ).A good cosmetic outcome is typically associated with high breast symmetry and minimal scarring.In breast conserving surgery also known as lumpectomy the tu-mour with clear tumour-free margins is excised usually followed up by local radiation of the treated breast therefore allowing the patient to keep most of her breast in the setting of early stage breast cancer.This is backed by large long-term clinical studies concluding no difference in disease-free survival between lumpectomy and mastectomy ( Fisher et al., 2002 ).While breast conserving surgery generally surpasses mastectomy regarding their cosmetic outcome, dissatisfactory or poor results are reported for lumpectomy in up to 30% and 6% of cases respectively ( Hill-Kayser et al., 2012 ).Yet, a lack of standardised procedures for aesthetic outcome evaluation persists ( Cardoso et al., 2012 ).For a higher patient satisfaction and fewer adverse cosmetic results it is essential to correlate tumour and treatment related factors to breast aesthetics post-treatment as well as further involve patients in the decision making process between the rising number of therapeutic alternatives.3D surface imaging of the breast has the potential to aid in treatment planning, surgical outcome prediction and objective aesthetic outcome evaluation ( Chae et al., 2016;O'Connell et al., 2015;Cardoso et al., 2014 ).Computational tools start to emerge that incorporate 3D breast surface data to let clinicians appreciate real-istic 3D visualisations of the breasts' morphology on the click of a button, perform shape-related analysis and classification as well as biomechanical simulation of probable surgical outcomes overall showing their utility to replace and extend former crude timeconsuming and subjective techniques ( Eiben et al., 2016;Oliveira et al., 2014 ).Unfortunately, a widespread clinical use is elusive due to the high cost and infrastructure requirements of 3D scanner equipment ( Tzou et al., 2014 ).We therefore investigate the use of a portable consumer-grade RGBD camera for complete accurate 3D breast surface modelling from breast data acquired in a simple contactless acquisition setup in hospital.In this paper in particular, we address the handling of postural sway, the nonrigid involuntary body deformation and breathing motion during incremental data acquisition, and demonstrate our method's superiority in terms of reconstruction quality compared to a previous method.

Previous work
3D surface reconstruction models the external geometry and appearance of real world scenes.To recover a complete surface model of a physical object through perspective projection, data from several overlapping images has to be merged.This requires knowledge of the camera intrinsic parameters and egomotion.With recent ubiquity of cheap portable RGBD cameras delivering medium resolution depth at video frame rate there is a vast literature on image-based 3D surface reconstruction spanning ample application areas in healthcare and beyond.Static scene reconstruction has reached maturity in recent years ( Newcombe et al., 2011 ).State-of-the-art methods are typically posed as a Simultaneous Localisation and Mapping (SLAM) problem employing a variant of Iterative Closest Point (ICP) ( Besl et al., 1992 ) for frame-to-model tracking and an incremental volumetric ( Curless and Levoy, 1996 ) or point-based fusion ( Alexa et al., 2003;Pfister et al., 20 0 0 ).Although follow-up research addressed scalability ( Nießner et al., 2013;Whelan et al., 2015 ), robustness ( Dou et al., 2016;Zhou and Koltun, 2014;Whelan et al., 0 0 0 0; 2015; Keller et al., 2013;Glocker et al., 2015 ) and global consistency ( Henry et al., 2012;Maier et al., 2014;Dai et al., 2017;Zhou and Koltun, 2013;Whelan et al., 0 0 0 0 ) the assumption of a rigid camera motion remains restrictive.
There is less literature on 3D breast surface imaging in particular nonrigid alignment of breast surfaces.Commercial single-shot 3D scanners find limited use in clinical practice despite a superior reconstruction quality due to their generally high cost and space requirements ( Chae et al., 2016 ).Studies reviewing the Kinect's performance for breast surface imaging and other healthcare applications found its accuracy suitable for clinical use ( Pöhlmann et al., 2016;Choppin et al., 2016 ) yet pointing out a large uncertainty in the measured values ( Choppin et al., 2016 ) and a need for further clinical validation studies ( O'Connell et al., 2015 ).A drawback of more affordable and portable 3D scanners are longer capture times, which inevitably lead to postural sway and thus lower the reconstruction quality.Postural sway is the entirety of involuntary body deformation and breathing motions between consecutive breast images in a series.'Hold breath' scanning protocols and respiratory gating techniques have been proposed to counter its effects ( Patete et al., 2013 ).Among studies using single or two Kinect camera setups for breast surface imaging ( Oliveira et al., 2014;Wheat et al., 2014;Costa et al., 2014;Pöhlmann et al., 2014;Lacher et al., 2015 ) only one produces complete denoised 3D breast surface models from a sequence of depth images ( Lacher et al., 2015 ).Although discussing its apparent negative impact on reconstruction quality the work lacked a way of handling postural sway entirely.In summary, the specific contributions of our work are:  ( Lacher et al., 2015 ) is used to rigidly estimate camera poses (3.1) .To deal with the vast number of transformation parameters lower resolution meshes are constructed by uniform downsampling and triangulation constituting a deformation graph structure as described in Sumner et al. (2007) (3.2) .The pairwise nonrigid ICP method proposed in Amberg et al. (2007) is utilised to refine the alignment of all frames in the data set nonrigidly (3.3) .Refinement is organised in two phases: a bundle (local) and a global phase.Resulting node-wise affine transformations are subsequently interpolated to deform the full resolution point clouds (3.4) prior to fusion and meshing (3.5) .
• An extension of pairwise nonrigid ICP to the multiview case incorporating soft mobility constraints in areas of non-overlap.• The proposition of shortest distance correspondences as a new technique for data association.We match source and target through finding the shortest intersecting line from any one source vertex with the target model.We show that shortest distance correspondences consistently lead to better alignment.• A two-fold landmark and breast volumetric quantitative validation in metric units demonstrating improved reconstruction quality towards gold standard level and superior to a competing method.

Method
We propose a markerless template-free nonrigid reconstruction method for accurate 3D breast surface modelling in breast cancer treatment planning and evaluation in the presence of postural sway (see Fig. 1 ).Our method is based on the works of Amberg et al. (2007) and Sumner et al. (2007) combining the linear problem formulation for pairwise nonrigid alignment of the former with the notion of an embedded deformation graph from the latter.We use bold upper and lower case Latin letters for matrices and vectors.The elements of either are enclosed in square brackets.Scalars are written in lower case Latin and Greek letters with normal font weight.Capital Greek and calligraphy letters denote sets and constants respectively.

Preprocessing
The preprocessing stage includes intrinsic and extrinsic camera calibration for undistortion, RGBD registration and point cloud backprojection.In addition, an erosion of an automatically segmented foreground mask excludes spurious points with unreliable colour attribution.

Calibration
Planar checkerboard images were acquired to calibrate the Kinect's internal infrared and RGB camera as projective pinhole cameras ( Hartley and Zisserman, 20 0 0 ).The calibration procedure estimated the camera intrinsic matrices K D and K RGB which define a linear mapping from metric to pixel space.Radial lens distortion was modelled as a polynomial function through a set of distortion coefficients ( Zhang, 1999 ).The inverse distortion of pixel coordinates was computed in a Levenberg-Marquardt optimisation.Stereo calibration recovered the camera extrinsic matrix transformation between the infrared and RGB sensor.Reprojection errors were below 0.2 pixels.
By means of timestamping, depth and colour images are paired.Following undistortion of pixel coordinates, depth values are backprojected into metric space producing an unordered cloud of 3D points in depth camera coordinates using the previously estimated camera intrinsics K D .Invalid depth measurements, e.g. in occlusion areas between the infrared sensor and pattern projector, are skipped.After a change of coordinate system from depth to colour camera 3D point coordinates are reprojected to colour camera pixel coordinates for RGB value lookup.Each 3D point is assigned an RGB colour subject to nearest integer interpolation and image boundary checks.To reduce the computational cost of repeated point cloud creation, the RGB values are permanently stored in a new depth-registered colour frame.Fig. 2 illustrates the projection of depth data on the colour image for point cloud extraction.

Foreground mask erosion
The Kinect camera has been reported to suffer from high noise levels in particular at depth discontinuities ( Sarbolandi et al., 2015 ).Adding to this, discretisation in conjunction with calibration inaccuracies leads to gross errors at depth boundaries causing e.g.background-blending artefacts due to the projective nature of the camera transformations.While the reconstruction method is capable of smoothing out noise, fusing a large number of frames into a single representation, extra caution is required handling regions of inhomogeneous depth.In our controlled acquisition, depth discontinuities mostly occur between the foreground subject and background.We segment the foreground in each depth frame by determining the distance to the background wall.We calculate the median of the 9-by-9 right top corner image patch, obtaining a bi- 0 denoting the integer grid pixel coordinates.This is followed up by 3 binary erosions F O of the foreground mask with a rectangular structuring element O of size 3 × 3 of all ones.

Rigid pose estimation
We employ a sequential visual SLAM for breast surface reconstruction ( Lacher et al., 2015 ) to estimate global camera poses with N being the number of frame pairs in the RGBD se- quence and T i = { R i , t i } the individual rigid body camera transformation comprising a rotation matrix R ∈ SO (3) and a translation vector t ∈ R 3 .The SLAM is based on the popular KinectFusion algorithm ( Newcombe et al., 2011 ) fusing consecutive frames with small inter-frame motion in a discrete truncated signed distance function volume after frame-to-model ICP alignment under point-to-plane error metric.The camera poses initialise our nonrigid alignment algorithm by rigid transformation of the point data into a shared global 3D coordinate system.

Graph construction
Each 3D point is assigned a normal n ∈ R 3 .Normals are computed in a pointwise parallelisation on the GPU.The normal orientation is estimated through principal component analysis of all 3D points in a projective depth window of size 3.The ambiguity in normal orientation is resolved by flipping normals consistently towards the origin of the camera coordinate system.This is achieved trivially by enforcing the dot product between the vector of the point coordinates (viewpoint is at origin) and the normal to be consistently negative or positive.Sumner et al. (2007) suggest the use of a coarse embedded graph to recover realistic shape deformations at human scale.We follow suit by downsampling all source point clouds via voxel grid averaging to a uniform sampling density of 6 mm.The downsampled point clouds are meshed employing the greedy projective triangulation algorithm with a search radius of 2.5 cm and a μ of 2.5 ( Marton et al., 2009 ).Following triangulation, we perform a connected component analysis by iteratively traversing the triangular mesh.We discard smaller unconnected parts that cause a rank deficiency in the coefficient matrix.The triangulated downsampled single component point clouds serve as our deformation graphs J denotes the number of nodes in G j .Each graph node g i j is related to an affine transformation specified by a matrix

Nonrigid ICP
Globally consistent nonrigid alignment is achieved by exhaustive joint alignment of all deformation graphs.For computational efficiency, each 10 consecutive frames are pooled into bundle where B = N / 10 denotes the number of bundles.Each bundle is nonrigidly aligned by sequentially registering all other 9 deformation graphs in the bundle to the first of each bundle producing transformations { X i } k i = j with k − j = 10 .Each bundle graph is then simultaneously aligned against the joint set of all other bundle deformation graphs.Prior to alignment, the joint sets of bundle graphs are resampled to ensure uniform node density.The alignment counts as a step and in turn consists of a maxi-mum of 10 nonrigid ICP iterations.The alignment process iterates over all bundles in a circular fashion terminating after a maximum number of 100 steps has been reached.Each step, bundle transformation matrices { ˜ X i } B i =1 are incrementally updated.

Correspondence finding
Correspondences from the downsampled source to target graph are sought on the grounds of spatial proximity.We compare two correspondence estimation techniques.In addition to classic closest point correspondences we investigate shortest distance correspondences.In the latter case, the correspondence is established using the intersection point at the end of the shortest line connecting the source vertex with the target mesh.The closest point correspondence search is sped up using an octree structure initially populated with the target's vertices.We implement the algorithm by Jones (1995) on the GPU for fast intersection point computation.The closest intersection point may either coincide with a target vertex, edge or lay within a triangular face.We omit superscript indices for readability.The projection of a source vertex g i on the plane of the respective target triangular face denoted as g i simplifies to the following expression where n f is the unit face normal and f 1 an arbitrarily chosen face corner.In case g i falls within the triangle it is also the intersection point g i .Assuming g i lies outside the triangle and is instead closer to edge e 1 , 2 = f 2 − f 1 , we find the intersection point g i on the edge as follows.
A complete geometrical derivation is found in Jones (1995) .Per iteration the set of source graph nodes are updated by multiplication with the current matrix X .As illustrated in Fig. 3 and in accordance with Amberg et al. (2007) , correspondences are clipped if one or more of the following conditions apply.
The normals n i and n denote the normals of g i and g i respectively.Given the target is a non-watertight surface, denotes the set of edges that make up the target's border.The border edges are the set of all single edges.A single edge is an edge that is only part of one triangle.Correspondences that are rejected carry zero weight.The weight for accepted correspondences is set to constant 1.The first two criteria ensure the algorithm is capable of dealing with non-overlap regions correctly while the last criterion prevents false correspondences when the still misaligned source and target intersect in regions where several geometric layers are close to each other.In the case of the torso, the latter may happen at the armpit or where the fingers are placed on the hip.

Data term
The data term d is expressed as the sum of weighted quadratic point-to-point distances of correspondences.For simplicity of notation we assume the corresponding matching point in graph k for the query point g j i in graph j to be g k i .As outlined in Section 3.3.1 , g k i is the nearest intersecting point with the target graph and need not be an actual node in the target graph G k .It follows that } is the set of fixed correspondences between the graphs j and k for a particular step and iteration.
Eq. ( 4) may be vectorised and rearranged to be amenable to linear solving through post-multiplication with X .Let W constitute a J × J diagonal weight matrix with diag(W ) = [ w 1 , ., w J ] .Let ˆ G j ∈ R J ×4 J be a matrix containing the graph's formatted nodes.
Further we define G to store the graph's nodes in a row-major matrix as written in Eq. (5) . (5)

Stiffness regulariser
Allowing each vertex to deform in a locally affine manner leads to an underconstrained optimisation problem with a huge solution space of J • 12 parameters and hence a singular coefficient matrix A .As we do not have any shape or deformation priors at hand which could sensibly reduce the number of parameters and hence the size of the parameter search space we instead choose to regularise the problem adding two regularising terms to our cost function.
The first term s controls the overall smoothness of deformation by penalising the weighted difference of the transformation applied to adjacent vertices.This cost is measured as the Frobenius norm || • || F of the difference of the respective transformation matrices.Adjacency between two vertices a and b is defined as vertex a having a connecting edge to vertex b.The connectivity information is retrieved from the target deformation graph and stored in a neat matrix form using a node-arc incidence matrix M constructed similarly to the formulation in Amberg et al. (2007) .Matrix M k is sparse and of dimension F × K where F and K are denoting the number of target faces and nodes respectively.Edge 1 connecting the nodes 2 and 3 would appear as two nonzero entries -1 at M 1, 2 and 1 at M 1, 3 .Lowering the stiffness gradually allows more local deformation.Let I 4 be a 4 × 4 identity matrix and denote the Kronecker product, our stiffness regulariser takes on the form of Eq. ( 6) .

Mobility constraints
In every pairwise registration, the source and target graphs are only partially overlapping as we are progressively capturing the torso under rotation exploring unseen geometry with every new frame.The source nodes in a nonoverlap region do not have a correspondence and would otherwise only be driven by the smoothness regulariser.This leads to a 'flattening' of curved surface parts e.g.around the core which does not reflect the actual body shape.Hence, we add mobility constraints that softly enforce vertices without correspondences to stay stationary by quadratically penalising any displacement from their starting location.We implement the mobility constraints by stacking the matrix of the 'parked' nodes ˆ P on our coefficient matrix A and the matrix P onto the B matrix.ˆ P and P are similar in definition to the matrices ˆ G and G in Section 3.3.2where correspondences are replaced by self-

Linear least squares optimisation
Assuming a minimisation of the cost to align graph j to k for fixed stiffness α the total cost is the weighted sum of our three terms: the data term d , the stiffness term s and the mobility constraints term m .The weight for the mobility constraints is fixed to β = 1 .We aim to minimise with respect to X j .min The matrices for the individual cost terms can be conveniently stacked upon as described in Amberg et al. (2007) to yield the linear matrix form below.
This system of linear equations in the form AX = B can now be solved for optimal transformations X j in closed form for a fixed set of correspondences and stiffness.
As correspondences are implicit and change with every update of X j , optimisation still follows an iterative approach.The iterative process is considered to have converged if the change in X j defined as || X j − X * || F where superscript X * equals X j from the last iteration drops below a threshold of 10 −4 or the maximum number of 10 iterations has been reached.

Spatial propagation
After we estimated an affine transformation for every node g of our deformation graph we wish to apply the deformation to our topologically similar full resolution point clouds.This is achieved by smoothly interpolating the transformation for arbitrary point locations in 3D Euclidean space which we denote by function X ( • ).The weighted interpolation technique averaging the affine transformation of surrounding nodes is outlined in Sumner et al. (2007) and restated for completeness.
where vertex v is transformed to v .R and t denote the transformation relative to the node position and M is the number of nearest node neighbours.Normals are adjusted similarly ignoring the translation part.The precomputed weights for each node radially decay with distance.We take over the weight formulation by Sumner et al. (2007) d max is the largest vertex-node distance in the set of M nearest nodes.The weight of node j with respect to vertex i is intuitively decaying with the distance between the node and the vertex.This way we are ensuring local overlapping but limited influence on the final transformation.Finally, weights are normalised to sum to one through use of the normalisation constant ζ .

Model generation
After nonrigid alignment and propagation of the frame and bundle transformations to the full resolution point clouds in a twostep process, each point cloud is fused in a joint model point set We then run the Moving Least Squares (MLS) algorithm ( Alexa et al., 2003 ) on the resulting model point set.This algorithm fits a low order polynomial to each point over a small spatial neighbourhood reducing noise while maintaining surface detail.The optimal radius depends on the noise level and sampling density.We use a constant search radius of 8 mm.As the algorithm itself is oblivious to the camera location, a subsequent reorientation of flipped 'rogue' normals was necessary.The model size grows linearly with the number of frames being fused leading to oversampling.In a grid-based resampling we reduce point redundancy to a uniform density of 1 mm.To turn the point set into a triangular mesh we apply Poisson reconstruction ( Kazhdan and Hoppe, 2013 ) with a maximum octree depth of 9 and a minimum of 10 samples per point to obtain surface S .Excess surface is clipped.For a realistic appearance of the digital torso model, we reintegrate vertexwise colour values via hue interpolation in a radial neighbourhood after meshing.This requires a conversion to and from HSV colour space.Due to the circular nature of the hue component, hue values are sequentially averaged determining the shortest path between two hue values.This avoids colour blending artefacts in regions of stark contrast.

Experiments
Data was captured using a first generation Microsoft Kinect sensor complying with a clinical acquisition protocol ( Section 4.1 ).We run our method on 9 clinical data sets of 6 patients diagnosed with early breast cancer.Statistics per data set are given in Section 4.2 .The presented method is validated in two ways.First, the spread of projected landmark instances is evaluated before and after nonrigid refinement ( Section 4.3 ).Second, breast volume is compared against the rigid-only reconstruction as well as the gold standard ( Section 4.4 ).

Clinical data capture
An acquisition protocol served for a cross-sectional and longitudinal clinical study in the context of an EU-funded project.Kinect and gold standard scans were acquired in distinct rooms over the course of one imaging session by a clinical photographer.Despite using a Kinect in this work, the proposed method is generally camera-agnostic and another RGBD camera could be used interchangeably.

Kinect scan
We employ a first generation Microsoft Kinect sensor which was originally sold as a motion capturing video game interface but found huge appeal in other fields such as 3D surface reconstruction ( Zhang, 2012 ).The Kinect couples medium resolution structuredlight depth sensing in the near infrared range with an ordinary RGB camera.Detailed specifications of the Kinect are available online. 1 For Kinect capture, the protocol specifies tripod-mounting the sensor in front of a neutral blue background.The windowless room is evenly lit with diffuse studio lights.The patient stands hands-on-hips at a distance of 0.9 m from the camera and is instructed to perform a 180 °self-rotation facing the camera from lateral to lateral while the device captures RGBD data at video frame rate of the patient's torso.

Gold standard scan
A medical-grade 3dMD system 2 was used as the gold standard scanner.It combines four synchronised modular stereo camera units for a 180 °torso coverage at a specified geometrical accuracy of < 0.2 mm.Each unit additionally houses a white light pattern projector for active stereophotogrammetry and a uniform flash for texture mapping.Continuous triangular surface meshes of the patients' torsos were generated in single-shot frontal acquisitions.The gold standard models serve as validation data sets in Section 4.4 .

Performance
Data was reconstructed offline on a standard PC equipped with an Intel i7 2.8GHz CPU with 32GB of RAM and an Nvidia GeForce GTX 1050 graphics card.Out of the patient cohort, 6 patients with a high distinctiveness of skin features were chosen to faciliate landmark-based validation.For reconstruction, Kinect RGBD sequences were subsampled to about 15 Hz or 1 frame per degree of rotation.The global phase of alignment converged by step 100 for all data sets.A fixed stiffness value s of 20 has been used throughout the experiments to prevent overfitting.Overall timings are reported in Table 1 and relative run times per optimisation step

Table 1
Overall run time distribution of our method on all clinical cases.Our method runs on a single core making only occasional use of CPU multi-threading or the GPU.The computation time scales near-linearly with the size of the deformation graphs and therefore the degree of downsampling applied internally.  in Fig. 4 .We use PCL framework3 in version 1.8 for point cloud processing and Eigen libraries4 in version 3.3.4sparse matrix arithmetic and linear solvers.

Landmark-based validation
The challenges in evaluating nonrigid registration algorithms are acknowledged in literature ( Tam et al., 2013 ).In addition to visual assessment we seek a quantitative metric to validate the quality of our proposed nonrigid alignment method in absence of ground truth data for the clinical patient data sets.Surfaceto-surface distances from our reconstruction to a gold standard model captured with a high-precision 3D scanner are not suitable for validation due to postural differences between both acquisitions.Instead we devise a landmark validation strategy based on explicit feature correspondences.A prominent feature such as the nipple, beauty spots, skin marks or other salient point becomes a landmark sample instance l j i = (x, y ) T where i indexes the landmark with x ∈ [1 .W] and y ∈ [1 .H] where W and H denote the image resolution.Features are manually picked in a sub- C j∈ i of the patient's RGBD sequence in which the landmark i could be successfully manually detected.All landmark sample pixel coordinates are backprojected, rigidly transformed into a common coordinate system and subsequently transformed by the distance-weighted interpolated affine transform of neighbouring deformation graph nodes as described in 3.4 which we denote as function X j ( • ) for frame j.
Ideally, after nonrigid refinement all samples of one landmark coincide in the same 3D location with a smaller spread correlating to a better alignment.To obtain the total landmark error score L , we compute a per landmark covariance matrix and report the mean Frobenius matrix norm || • || F over all covariances in Eq. ( 14) .We Fig. 6.Qualitative results figure comparing Phong-shaded breast surface models from our method (left column), a competitive rigid reconstruction method (middle column) and a high-precision 3D scanner considered gold standard (right column).Accounting for nonrigid deformation our method is able to resolve finer geometric details with less artefacts such as the nipple, belly button or fingers.
let L denote the number of landmarks.
A lower error value L indicates a better nonrigid alignment.Patients' RGB images were visually assessed with respect to a high number, even distribution and small scale of local skin features.Nonetheless, skin features had to be large enough to retain saliency at the given resolution, sharpness and lighting conditions.
Fig. 9 depicts the detected skin features marked as purple dots in frontal and lateral colour images and as blue and green coloured error ellipsoids in 3D space before and after nonrigid alignment.An error ellipsoid represents the multivariate normal distribution fit to all samples of a landmark with elliptic radii of two standard deviations.The number of landmarks and samples per patient are listed in Table 2 .

Breast volume assessment
The second part of our validation evaluates breast volume.Breast volume has been studied and reported extensively as a key morphological metric for breast shape since the middle of the last century ( Ingleby, 1949 ).The use of breast volume in our validation is motivated by the assumption that soft tissue is highly deformable yet incompressible.Through breast volumetry we introduce a clinically relevant statistic with respect to breast cosmesis.In clinical practice, breast volume plays a vital role in surgi-Fig.8. Landmark error versus stiffness chart (left).The green line is the mean over all patients plotted as grey lines.Landmark error is monotonically decreasing with lower stiffness values reaching a minimum in the range of 2 to 15. Quantile plot including all patients shows continuous reduction in landmark error over the number of (right).(For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)cal decision making.The volumetric analysis of the breast complements the landmark-based validation outlined in the previous section in which systematic errors might go undetected.Breast isolation which forgoes volumetry requires estimation of the breast boundaries and chest wall from surface data.This is achieved by defining the breast as a rectangular region of interest along the lines of Catanuto et al. (2018) .Volume estimation precede cleanup steps that remove non-manifold edges and vertices as well as non-connected components.Contour corner points of both breasts are algorithmically derived from a set of five manually picked landmarks per torso (sternum, lowest point on inframammary fold and anterior axillary line point bilaterally).Contours are then computed as geodesic paths using Dijkstra's algorithm.The four contours meet at their end points and a grid G Coon is bilinearly blended between them as defined in Eq. ( 15) .
The superscript denotes the coordinate, u and v are the embedded and ˆ u and ˆ v are the normalised embedded coordinates or interpolation factors along the contour patch dimensions.The interpolation is equally applicable to the y and z coordinate.Planar as well as Coon's patch chest wall approximations can be found in literature.A Coon's patch has the advantage to be able to model the anatomic breast boundary more flexibly and therefore more accurately.A visualisation of the Coon's torso patches generated from our patient data is shown in the results section in Fig. 10 .Overlapping contour segments are deleted prior to patch triangulation.An unequal number of contour points of opposite contours is adjusted beforehand by duplicating points through linearly spacing the contour point indices to the length of the contour comprising more points.This patch closes the breastless torso as seen in Fig. 10 .Subsequently, the breast is segmented.A pseudocode formulation is given in Fig. 5 .The breast segmentation and torso patch are closed by triangulating all contour edges to a mutual point.This mutual point is found by walking the opposite mean normal direction scaled by the size of the bounding box.The difference volume of both watertight meshes is the sought-after breast volume.Vol-umes are computed as the polyhedral mass of volume integrals as implemented in vcglib v1.0.1 ( Mirtich, 1996 ).Breast segmentation including the rear demarcation of the breast on the chest wall is eminently challenging and errorprone using surface data alone.Many breast delineation strategies based on landmark selection have been suggested in literature ( Pöhlmann et al., 2017 ).We argue that our breast segmentation has sufficient repeatability to perform relative volume comparison between methods.We therefore compute the Coefficient of Repeatability (CR) and Coefficient of Variation (CV) which constitute measures of volume precision calculated from the volume differences of the untreated breast for R = 3 patients captured in two imaging sessions pre and postoperatively.The CV indicates the relative magnitude of volume variation with respect to the measurements.Both statistics are proportional to the within-subject variance where σ 2 i denotes the variance of the two measurements for breast i.

Results
The effectiveness of our method is demonstrated qualitatively through renderings of the polygon torso surfaces.We also provide quantitative results from the manual landmark validation as well as the breast volume assessment study.In both cases, we compare against the only competitive breast reconstruction method known to the authors that uses a large number of breast images from a low-cost depth camera for a complete denoised reconstruction which is further referred to as 'rigid' ( Lacher et al., 2015 ).Results are visually superior to those obtained through rigid reconstruction with the latter and fare well against models from a costly commercial 3D scanner as shown in Figs. 6 and 7 .
Landmark error is consistently reduced with our method as shown in the bar plots in the right column of Fig. 9 .In our 6 patient study, landmark error drops from 1.39 ±0 .99 −5 m 2 with the rigid reconstruction over 1.15 ±0 .92 −5 m 2 using closest point correspondences with our nonrigid reconstruction technique to 0.95 ±0 .81 −5 m 2 switching to shortest distance correspondences.We visualise the reduction in landmark error through shrunken error ellipsoids Fig. 9 .Additionally, we show the landmark error behaviour over all patients in dependence of the stiffness and its decrease with the number of nonrigid refinement steps in Fig. 8 .In the breast volumetry part of our validation, we analyse a total of 51 volumes, 17 per method, measured on 11 breasts of 6 patients in 9 imaging sessions.One breast was excluded due to poor lateral coverage in the gold standard scan.A clear linear association between volumes of all methods can be seen in Fig. 11 .Owing to the lack of ground truth, volume errors in the gold standard have to be assumed because of which we have chosen to use orthogonal over standard linear regression.Volume differences between methods appear normally distributed in histogram and Q-Q plots (not shown).A χ 2 test for normality failed to reject the null hypothesis that volume differences follow a normal distribution at p = 0.86.With a mean and median absolute volume difference to gold standard of 18 and 15 ml over 19.2 and 18.5 ml, results indicate our nonrigid method outperforms the rigid reconstruction in terms of breast volume accuracy by a small margin.The Bland-Altman plot s in the right column of Fig. 11 also show that volume differences do not seem to increase in absolute values with breast size.An examination of volume difference outlier revealed misestimations in the chest wall approximation caused by subtle changes in the patient's posture between modalities, especially in the presence of saggy breasts with a partly occluded inframammary fold.These samples might therefore not reliably reflect the actual volume difference.Furthermore, both the proposed as well as the rigid method suggest a minute volume bias of 4 and 3 ml, respectively.Repeatability of the non-rigid method is, with 5.2%, marginally worse than the gold standard with 5.1% and more than twice as high compared to the rigid method with 2.2%, as per CV in Table 3 .

Discussion
Low-cost lightweight 3D scanning of a human torso always goes hand in hand with some inherent degree of complex deformation.

Table 3
Coefficients of repeatability and variation calculated using the healthy breast volume differences from two imaging sessions in a small subset of three longitudinal patients.Postural and body mass changes, discretisation and manual landmarking contribute to a higher variability in volume estimates.This is due to the fact that the image data from various viewpoints necessary for a complete hole-free 3D model has to be captured consecutively.Between these scans a certain amount of motion is hard to avoid.Soft tissue deformation is increased in particular in the domain of breast scanning when compared to faces or other body parts with an underlying bone structure.The low price and portability of such cameras preclude a setup with multiple calibrated camera pods capable of single shot acquisition.Moreover, noise level tends to be higher with cheaper hardware suggesting the fusion of a larger number of images for noise suppression.This should be reflected in the reconstruction approach by appropriately modelling the interframe transformation.Moving from rigid to nonrigid registration is not trivial as the problem of nonrigid registration is ill-posed.The mainly homogeneous skin texture of the breast additionally hampers the use of photometric consistency to guide the alignment.Bearing these challenges in mind we developed an algorithmic solution to human breast and torso surface reconstruction moving on from purely rigid reconstruction integrating a nonrigid correction for involuntary body motion.This globally consistent nonrigid ICP-based refinement is not restricted to a learnt shape or motion model.Instead we assume deformation between frames is small to be able to generically correct any deformation without explicitly constraining it other than requiring it to be smooth.Due to the fact that the torso is featureless for the most part, landmarks are sparse and not evenly distributed across the surface leaving regions in which the quality of registration cannot be quantified.Specularities causing an oversaturation of RGB values are aggravating this circumstance.We also acknowledge that landmark sample proximity in 3D space, that is the closeness of the backprojection of matching features in multiple frames, is a necessary but not sufficient condition for the alignment to be accurate.An intuitive counter example is the case of a degenerated reconstruction where all points are mapped onto one location with a landmark error of 0. Yet, regularisation successfully thwarts such scenarios and we deem the landmark error score to be a good predictor of registration quality.

Nonrigid
According to literature, breast volume differences of up to 50 ml are regarded as satisfactory for clinical application ( Henseler et al., 2014;Sigurdson and Kirkland, 2006 ).Compared to gold standard, 16 out of 17 breasts reconstructed with the proposed method were measured to be within this error margin of 50 ml, 14 of which were accurate to 20 ml.A similarity in the limits of agreement between the rigid and nonrigid method is to be noted in Fig. 11 .This likely happens since both methods feed on the same Kinect input data whereas the gold standard model is acquired under postural change.Although volume estimates should be unaffected assuming breast tissue incompressibility, the fuzziness of breast segmentation may lead to similar variability characteristics.Various sources of uncertainty in the breast segmentation limit the interpretability of the volume evaluation including manual landmarking, mesh discretisation and posture variation.This is exacerbated by a lack of consensus on breast segmentation and volume calculation ( Choppin et al., 2016 ).
We have further found that the stiffness parameter s correlates to breast volume bias which be attributed overfitting.It is generally known that a stronger regularisation encourages isometry.This is particularly relevant in the absence of isometry-preserving constraints.Given the variability of the volume differences, the finding of a volume bias of 4 ml shall however be discounted.Also, while our algorithm shows an increase in reconstruction quality using a single stiffness over the whole surface, the amount of surface detail and curvature varies strongly and a locally adaptive smoothness may be advisable.This would come at the cost of model complexity.
Repeatability of breast surface reconstruction is assessed through repeated breast volume measurements on longitudinal data.The coefficients in Table 3 suggest that the rigid reconstruction is the most precise whereas the proposed nonrigid approach is on par with the gold standard.For comparison, the repeatability coefficients obtained in a previous study using a Kinect camera for breast volume measurement were larger by a factor of 2-4 depending on patient pose ( Pöhlmann et al., 2017 ).Although comparing favourably to the latter study with respect to repeatability, results have to be interpreted with caution due to the small sample size and the exclusion of patients with overly ptotic breasts that do not permit occlusion-free capture.The discrepancy in repeatability might also be explained by the considerably larger parameter space associated with the nonrigid deformations.More practically, volume precision estimates such as the CR and CV are also affected across methods by substantial weight gain or loss between imaging sessions.A full statistical analysis on more data would be necessary for reliable conclusions about the repeatability of breast volume measurements using the proposed method.
In terms of performance, note that although real-time is not a requirement the execution times in the range of 1-2 h listed in Table 1 might impede our method's utility in practice.However, our code is highly unoptimised and mass parallelism on the graphics card could be exploited to a higher degree for run-time acceleration.

Conclusion
Breast cancer is a complex global health problem.With ever rising incidence numbers and longer survival rates along evidence of a correlation between poor aesthetic outcome and quality of life, treatment planning and objective outcome evaluation through the use of 3D breast surface scanning is an important research path.In the dense reconstruction of 3D breast surfaces from low-cost clinical RGBD video, models suffer from oversmoothing and artefacts due to postural sway during lengthy data acquisition.We successfully drop the rigid scene assumption demonstrating more geometric detail and better texture mapping quality than any previous approach.Quantitatively, we are able to show a consistently better alignment for all patients using the proposed landmark error metric formulation and an accuracy within 20 ml in more than three-quarters of breast volume measurements compared to gold standard.In achieving a high reconstruction quality through mitigation of postural sway we believe to overcome a major obstacle towards routine clinical use of low-cost 3D breast surface modelling.

Fig. 1 .
Fig. 1.Method overview.A sequential SLAM method ( Lacher et al., 2015 ) is used to rigidly estimate camera poses (3.1) .To deal with the vast number of transformation parameters lower resolution meshes are constructed by uniform downsampling and triangulation constituting a deformation graph structure as described in Sumner et al. (2007) (3.2) .The pairwise nonrigid ICP method proposed in Amberg et al. (2007) is utilised to refine the alignment of all frames in the data set nonrigidly (3.3) .Refinement is organised in two phases: a bundle (local) and a global phase.Resulting node-wise affine transformations are subsequently interpolated to deform the full resolution point clouds (3.4) prior to fusion and meshing (3.5) .

Fig. 2 .
Fig. 2. Illustration of RGBD registration.Depth values are backprojected to 3D points using the inverse of the depth camera intrinsic matrix (1).After change of coordinate system (2) 3D points are projected to colour camera pixel coordinates for RGB value lookup (3).For convenience of point cloud creation RGB values are stored in a depthregistered colour image (4).

Fig. 3 .
Fig. 3. Illustration of correspondence estimation and rejection for nonrigid alignment by the example of 2D contours.a) For every source vertex a closest intersection point with the target is determined to yield preliminary correspondences.These exact correspondences are superior to nearest point correspondences especially with coarsely resolved models.Correspondences are subsequently trimmed.b) Source vertices in a non-overlap region mapping to target vertices or edges belonging to the target's boundary are discarded as a correspondence and instead softly enforced to stay in place.c) Correspondences that exceed a distance threshold (leftmost correspondence) or whose normals deviate strongly (rightmost correspondence) are equally rejected.For normal comparison, the normal at the target intersection point is barycentrically interpolated.

Fig. 4 .
Fig. 4. Pie chart with relative timings per optimisation step.The linear solver and corresponding sparse matrix manipulations consume about three-quarter of the processing time.

Fig. 7 .
Fig. 7. Qualitative results figure comparing breast surface models in a vertexwise colour rendering.The figure follows the same layout as above.Corroborating our previous findings surface texture is also resolved to a higher precision using our method.Beauty spots, moles or the areola are clearly delineated.Background blending and shadowing artefacts are greatly reduced.White balancing leads to an overall more natural colour profile.

Fig. 9 .
Fig. 9. Picked landmark locations are marked by purple dots in exemplary frontal and lateral input colour images (left).Landmark error is visualised as ellipsoids (middle).Samples of each landmark are axis aligned and normal fitted.Ellipsoids axes' length is chosen to represent a 95% confidence interval.The Poisson reconstructed surface mesh ( Kazhdan et al., 2006 ) is overlaid for guidance.Bar plot quantifying landmark error reduction through nonrigid reconstruction using closest point and shortest distance correspondences over sequential rigid reconstruction (right).(For interpretation of the references to colour in this figure legend, the is referred to the web version this article.)

Fig. 10 .
Fig. 10.Qualitative results the breast volume evaluation.Methods are compared columnwise.Exemplary cases are shown.Top to bottom: The least and most variable volume estimation among all three methods, the most accurate volume measurement as compared to ground truth using the proposed method and its rigid predecessor.Per case the approximated breastless torso with contoured breast region and the breast segmentation are shown on top of each other.

Fig.
Fig.Quantitative results volume estimation.The left column shows linear regression and the right column Bland-Altman plots for the estimated breast volume data.Each row analyses two out of three methods against each other.The analysis suggests a narrow gain in accuracy and precision of the proposed over the rigid method.

Table 2
Per patient number of identified landmarks and manually picked landmark samples.