Image-based characterization of thrombus formation in time-lapse DIC microscopy

The characterization of thrombus formation in time-lapse DIC microscopy is of increasing interest for identifying genes that account for atherothrombosis and coronary artery diseases (CADs). In particular, we are interested in large-scale studies on zebrafish, which produce large amounts of data and require automatic processing. In this work, we present an image-based solution for the automated extraction of parameters quantifying the temporal development of thrombotic plugs. Our system is based on the joint segmentation of the thrombotic and aortic regions over time. This task is made difficult by the low contrast and the highly dynamic conditions observed in in vivo DIC microscopic scenes. Our key idea is to perform this segmentation by distinguishing the different motion patterns in the image time series rather than by solving standard image segmentation tasks in each frame. Thus, we are able to compensate for the poor imaging conditions. We model motion patterns by energies based on the idea of dynamic textures, and regularize the model by two prior energies: one on the shape of the aortic region and one on the topological relationship between the thrombus and the aorta. We demonstrate the performance of our segmentation algorithm by qualitative and quantitative experiments on synthetic examples as well as on real in vivo microscopic image sequences.


Introduction
Thrombosis denotes the abnormal aggregation of blood cells which occurs after a blood vessel injury and which eventually leads to the partial or complete obstruction of the blood circulation (Furie and Furie, 2005). This malfunction plays a major role in the development of coronary artery diseases (CADs), the leading cause of death in the developed world. In addition to the identification of environmental risk factors such as smoking, obesity, and physical inactivity, the identification of possible genetic disorders is becoming increasingly important to improve the diagnosis and therapy of CADs. Linking functional defects in thrombus formation with underlying genetic disorders requires large-scale statistical in vivo studies on thrombus formation (O'Connor et al., 2009). The increased use of zebrafish has recently been prompted by their high fecundity and their rapid ex utero development. Their advantages over other animal models also include the molecular and genetic similarity of their coagulation process to that of humans, the existence of an efficient and rapid technology to knock down gene expression, and their optical transparency. This makes the in vivo visualization of thrombus formation possible even without sophisticated equipment or invasive surgery. We refer the reader to (Jagadeeswaran et al., 2005; Carradice and Lieschke, 2008; Lieschke and Currie, 2007; Nasevicius and Ekker, 2000; O'Connor et al., 2009) for further details on the vascular injury model in zebrafish.
We study thrombus formation in zebrafish larvae with the following experimental setup (O'Connor et al., 2009). The aggregation of blood cells is provoked by artificially injuring the wall of the caudal aorta with a laser pulse. Images of the scene are recorded by a CCD camera mounted on a Differential Interference Contrast (DIC) microscope. In the current state of the system, biologists manually derive measures such as the time to attachment (TTA) of the first blood cell to the aortic wall and the thrombus surface area (TSA). These measures characterize the formation of the thrombus over time. The less influence a candidate gene has on the formation of the thrombus, the more often the experiment has to be repeated to reach a sufficient level of sensitivity, and the more crucial an automated solution becomes. In this work, as illustrated in Fig. 1, we propose to derive the aforementioned measures from the segmentation of the thrombus and aortic regions over time.
The poor quality of in vivo DIC microscopic images makes such a segmentation difficult to obtain automatically. The low contrast between the thrombus, the uncoagulated vessel, and the background regions hampers the computation of reliable intensity-based segmentation cues. The circulating blood cells in the vessel and the static structures in the background introduce a high number of disrupting edges, which make the derivation of discriminative gradient-based features and the use of common gradient-based approaches hardly possible.
To overcome these difficulties, the key idea of this work is to replace image segmentation in single frames by motion segmentation in sub-sequences of images. More precisely, we propose to characterize the variations of the motion patterns both in space and in time. Motion is modeled using the concept of dynamic textures (DTs) (Doretto et al., 2003a). We define the segmentation of the thrombus and vessel regions at a given frame as an energy minimization problem. A set of likelihood and prior energies are introduced to account for the following three observations: (1) The thrombus, the uncoagulated vessel, and the background exhibit distinct motion patterns: the thrombus moves slowly and synchronously with the heart beat, the uncoagulated vessel witnesses the fast and chaotic circulation of blood cells, and the background remains nearly static. We guarantee the spatial homogeneity of the motion patterns in each region by introducing a DT-based motion-segmentation likelihood. The main advantage of this novel likelihood over prior works on DT segmentation is its capacity to handle multiple dynamic regions while maintaining the simplicity of the closed-form solution for the estimation of the motion patterns. Our solution is based on an approximation of the normal distribution on the manifold of DT models (Pennec, 2006). (2) The thrombus is the only region which does not exist at the initial state of the video sequence, i.e. before the attachment of the first blood cell. This observation leads to the modeling of the thrombus as an abnormal event in comparison to the circulation of blood cells in the uncoagulated vessel. Because of the low contrast observed in in vivo microscopic scenes, we characterize such abnormal events by a significant change of the motion patterns in time. To this end we introduce a DT-based event-detection likelihood.
(3) The two aforementioned likelihoods can be locally disrupted by the slow circulation of blood cells in the collateral vessels and by local non-rigid movement of the epithelium. From the observations that the vessel has a tubular shape and that the thrombus is included in the vessel, we introduce a shape prior and a topological energy. These quantities constrain the model sufficiently to achieve accurate segmentation results.
The energy obtained by combining (1), (2), and (3) is minimized using multiphase level-sets (Vese and Chan, 2002). As shown by qualitative and quantitative experiments on synthetic examples and real microscopic data, the introduced level-set framework leads to the accurate segmentation of the biological scene. By successively applying our algorithm over time, we are then able to successfully track the thrombus and the vessel regions and to accurately retrieve the temporal variations of the thrombus surface area.
Our contribution is twofold. First, from an application point of view, we develop an innovative solution for the automated characterization of thrombus formation from DIC microscopic image time series. This solution is based on the joint segmentation of the thrombus and aortic regions in time. Second, from a methodological point of view, we introduce a simple yet accurate energy model for the segmentation of multiple dynamic regions (1). The resulting ability of our algorithm to analyze complex dynamic scenes makes it of general interest for the medical imaging and computer vision communities.

Related work
Our application is related to the field of microscopic image analysis and more precisely to the segmentation and the tracking of cells. We review related approaches in Section 1.1.1. In our scenario, the unreliability of segmentation methods based on the analysis of single-frames motivates the analysis of the motion patterns, and more specifically the use of dynamic textures. Therefore, we also review previous motion-segmentation methods based on dynamic textures in Section 1.1.2.

Microscopic image analysis
Some of the main applications of microscopic image analysis include cell lineage construction (Huh et al., 2011; Li et al., 2008), cell segmentation and tracking (Cheng and Rajapakse, 2009; Dong et al., 2005, 2007; Dzyubachyk et al., 2010; Eden et al., 2005; Kamoun et al., 2008; Mukherjee et al., 2004; Nguyen et al., 2011; Padfield et al., 2009, 2011; Quelhas et al., 2010; Ray et al., 2002), cell recognition (Long et al., 2005, 2008), and vessel segmentation (Tang and Acton, 2004; Schmugge et al., 2008). The low contrast of DIC microscopic images makes the segmentation and detection methods commonly employed in fluorescence microscopy hardly applicable in our case (Cheng and Rajapakse, 2009; Dzyubachyk et al., 2010; Kamoun et al., 2008; Nguyen et al., 2011; Padfield et al., 2009, 2011; Quelhas et al., 2010). Because of the low contrast conditions also observed in brightfield microscopy, our scenario is more closely related to the problem of in vivo tracking of rolling leukocytes presented in (Dong et al., 2005, 2007; Mukherjee et al., 2004; Ray et al., 2002). In these works, segmentation is based on gradient-based parametric active contours (Dong et al., 2005, 2007; Ray et al., 2002) or on intensity-based level-set methods (Mukherjee et al., 2004).

Fig. 1. We study thrombus formation in zebrafish larvae (a). The accumulation of blood cells is observed in vivo via a CCD camera mounted on a DIC microscope, resulting in a huge amount of image data (b). The segmentation of the thrombus (green) and vessel (red) regions in time leads to the derivation of measures characterizing the temporal development of the thrombus (c). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
However, these approaches are hardly applicable to our scenario because of their need to incorporate prior knowledge on the circular shape of the leukocytes to compensate for the cluttering of the aortic vessel by blood flow-related streamlines. The active contour presented in (Ray et al., 2002) is constrained with an elliptic shape prior. In (Dong et al., 2005), a more advanced constraint, the GICOV coefficient, ensures the consistency of the gradient magnitude along the leukocyte boundary. In (Mukherjee et al., 2004), the circular shape of the leukocytes is accounted for with a slenderness measure. In our case, the accumulation of blood cells results in a high variability in the shape and size of the thrombus region over time. As such, the priors presented in (Dong et al., 2005, 2007; Mukherjee et al., 2004; Ray et al., 2002) could hardly be employed to help segment the thrombus region. As shown in (Ersoy and Palaniappan, 2008), the unreliability of gradient and intensity-related features in time-lapse brightfield microscopy can be successfully compensated for by analyzing the motion patterns. The authors introduce a flux tensor to discriminate moving human melanoma cells from a static background in in vitro experiments. The flux tensor faces two main limitations. First, it is restricted to the distinction between moving and stationary pixels. In our scenario, the method developed in (Ersoy and Palaniappan, 2008) would thus most likely fail to discriminate the slowly moving thrombus from the fast circulating blood cells. Furthermore, it relies on the optical flow equation and on the brightness constancy assumption, and would not be robust to the fast and chaotic circulation of the blood cells in the aortic vessel. Compared to (Ersoy and Palaniappan, 2008), we choose to employ the dynamic texture (DT) model of Doretto et al.
(2003a), which has been extensively shown to provide reliable motion cues even in such complex scenarios (Chan and Vasconcelos, 2008, 2009; C, 2010; Doretto et al., 2003b; Huang et al., 2007; Vidal et al., 2007). Another motion-based approach is proposed by Eden et al. (2005) to segment rolling leukocytes in brightfield microscopy. It operates in two steps: first, the segmentation of the vessel from the background; second, the segmentation of the circulating cells using motion, color, and texture information. The key assumption is that the circulating cells are in constant motion. This does not hold in our scenario because of the back-and-forth movement of the thrombus. Compared to (Eden et al., 2005), we propose a more general motion model and only make the assumption of different motion patterns in the different regions.

Dynamic textures
Dynamic textures model the variations of the intensity at each pixel with a Linear Dynamic System (LDS) (Doretto et al., 2003a). They permit the representation of both appearance and dynamics in a concise and intuitive formulation. They also benefit from the existence of a closed-form solution for the estimation of their parameters. The first method to address the problem of motion segmentation based on dynamic textures is presented in (Doretto et al., 2003b). It consists of the following steps: (1) a DT model is estimated at each pixel of the image, (2) a spatio-temporal signature map is derived by computing the distance between a reference DT model and the DT models observed at each pixel of the image domain, and (3) the obtained map is segmented using a level-set framework (Chan and Vese, 2001). Fig. 2 displays signature maps obtained on a sequence of 75 microscopic images with respect to three different reference models. The first reference DT model is built from a pixel in the thrombus, the second from a pixel in the vessel, and the third from a pixel in the background. This ''single DT model'' segmentation method, introduced in (Doretto et al., 2003b), faces two major limitations. First, the signature map depends on the selection of the reference model. Second, two pixels belonging to two different dynamic regions can be characterized by similar distances to the reference. It directly follows that (Doretto et al., 2003b) is restricted to the segmentation of no more than two dynamic regions and can thus hardly be used for the joint segmentation of the thrombus, aortic, and background regions.
A few approaches have been developed to overcome the aforementioned limitations of (Doretto et al., 2003b). Chan and Vasconcelos (2008) extend the single DT model (Doretto et al., 2003a) to a Mixture of Dynamic Textures (MDT) model. The intensities are explained as the superposition of co-occurring DT models, whose parameters are estimated using an Expectation Maximization (EM) scheme. The authors further extend the MDT model to the Layered Dynamic Texture (LDT) model (Chan and Vasconcelos, 2009), in which the spatial coherence of the DT models is ensured using a Markov random field. These approaches (Chan and Vasconcelos, 2008, 2009) are, to the best of our knowledge, the only attempts based on LDSs to segment scenes with more than two dynamic texture regions. However, the lack of a closed-form solution for the estimation of the model parameters makes these approaches significantly more involved than the methods directly derived from the single DT model (Doretto et al., 2003a). The motion-segmentation energy and the multiphase level-set framework introduced in this work retain the advantages of the MDT and LDT algorithms for multiple dynamic texture segmentation while maintaining the closed-form solution, recalled in Section 2.1, for the estimation of single DT models. The aforementioned algorithms for dynamic texture segmentation (Chan and Vasconcelos, 2008, 2009; Doretto et al., 2003b) are derived from the LDS model of Doretto et al. (2003a). It is worth noting that Ghoreyshi and Vidal (2007) propose to describe dynamic textures by characterizing the temporal variation of Ising texture descriptors with an autoregressive (AR) model. Segmentation is performed by minimizing, using level-sets, the error between the predicted and the observed texture descriptors. The method of (Ghoreyshi and Vidal, 2007) circumvents the need for a reference model and has been extended, though not experimentally validated, to the segmentation of multiple dynamic textures.
The key advantage of our solution in comparison to this AR-based approach is, in addition to its higher accuracy, that it retains the more commonly employed LDS-based modeling of dynamic textures.
One of the other key applications of dynamic textures is the detection of salient events under low contrast and highly dynamic conditions. These conditions hamper the use of common intensity-based event-detection algorithms (Stauffer and Grimson, 1999; Zivkovic, 2004). Mahadevan et al. (2010) propose to detect abnormal behaviors in a crowded walkway using the MDT model (Chan and Vasconcelos, 2008). In our work, we propose to model the thrombus formation as an abnormal event in comparison to the circulation of blood cells in the uncoagulated aortic vessel. The complexity of the MDT model leads us to introduce an alternative solution to the problem of event detection under low contrast and dynamic conditions, based on the single DT model (Doretto et al., 2003a).

Method
In this work, we present an image-based algorithm for the automated characterization of thrombus formation in DIC microscopic image sequences. In this section, we focus on the sub-problem of the joint segmentation of the thrombus and aortic regions. We explain how the segmentation is used for tracking the vessel and thrombus regions in Section 3.
Let us denote the complete microscopic video sequence by $S = \{I^t, \; 0 \le t < T\}$. For each time $t$, the aim is to find the correct partition $x^t$ of the image domain $\Omega^t$ into the thrombus $(\Omega_1^t)$, the background $(\Omega_2^t)$, and the uncoagulated vessel $(\Omega_3^t)$ regions. The union of the thrombus and uncoagulated vessel regions forms the aortic region $\Omega_{12}^t$. To assess the temporal information necessary to define the two DT-based likelihood energies, we consider the temporal window of $\tau + 1$ frames centered at time $t$, $S^t = \{I^{t'}, \; t - \tau/2 \le t' \le t + \tau/2\}$, and the initial temporal window $S^{t_0} = S^{\tau/2} = \{I^{t'}, \; 0 \le t' \le \tau\}$.
We model the partitioning of the image domain at time $t$, given the frames in $S^t$ and $S^{t_0}$, as the following energy minimization problem:

$\hat{x}^t = \arg\min_{x^t} \; E(S^t, S^{t_0} \mid x^t) + E(x^t) \qquad (1)$

The likelihood energy $E(S^t, S^{t_0} \mid x^t)$ groups the motion-segmentation energy $E_{ms}$ and the event-detection energy $E_{ed}$. Since both energies are based on the concept of dynamic textures and on the derivation of a metric on the dynamic texture manifold, we provide a self-contained description of these two concepts in Section 2.1. The motion-segmentation energy guarantees the homogeneity of the motion patterns in each region and is presented in Section 2.2. The event-detection energy models the growth of the thrombus in an initially uncoagulated vessel and is detailed in Section 2.3.
The prior energy $E(x^t)$ comprises the tubular shape prior energy $E_{tub}$ and the topological energy $E_{top}$. The tubular shape prior energy, introduced in Section 2.4, constrains the surface of the aortic region and prevents the thrombus and the vessel from leaking into the background. The topological energy guarantees the inclusion of the thrombus in the aortic region. It is introduced in Section 2.5 together with the minimization of Eq. (1) using multiphase level-sets (Vese and Chan, 2002).

Review of the dynamic texture model
In this work, we are interested in the analysis of motion patterns. We thus need a way to describe motion. This can be achieved using the concept of dynamic texture. We will introduce dynamic textures in Section 2.1.1. Furthermore, in order to compare motion, we need a distance measure. For this task, we will employ the Martin distance, which we review in Section 2.1.2. These are the central concepts used in the modeling of the likelihood energies in Sections 2.2 and 2.3.

Dynamic texture model
In this paragraph, we briefly review the dynamic texture (DT) model as introduced by Doretto et al. (2003a). Let us consider the temporal window $S^t$ and the square spatial patch of $m$ neighboring pixels centered at the pixel $x$. We assume that the intensities $y(t') \in \mathbb{R}^m$ observed in the patch at a time $t' \in [t - \tau/2, \dots, t + \tau/2]$ are related to a set of hidden variables $z(t') \in \mathbb{R}^n$ by the following $n$-th order linear dynamic system:

$z(t' + 1) = A\, z(t') + v(t')$
$y(t') = \bar{y} + C\, z(t') + w(t') \qquad (2)$

where $\bar{y}$ is the mean of $y(t')$ over $S^t$. The second equation describes the compression of the pixel intensities into a set of $n \ll m$ spatial features. The first equation accounts for the evolution of the features over time. $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{m \times n}$ are the state-transition and the observation matrices, respectively. The Gaussian processes $v(t') \sim \mathcal{N}(0, Q)$, $Q \in \mathbb{R}^{n \times n}$, and $w(t') \sim \mathcal{N}(0, R)$, $R \in \mathbb{R}^{m \times m}$, account for noise in the hidden and observed states. Given the observations $y(t')$ over the temporal window $S^t$, the model parameters $D_x^t = \{A, C\}$ can be easily estimated under the assumption that both the observation and the state-transition matrices $A$ and $C$ are independent of the time $t'$ of the transition (Doretto et al., 2003a). The sub-optimal solution reads as follows.
Stacking Eq. (2) for the $\tau$ transitions in $S^t$ leads to the matrices $Z_p^q = [z(p), \dots, z(q)]$ and $Y_p^q = [y(p) - \bar{y}, \dots, y(q) - \bar{y}]$. Given the singular value decomposition $Y_1^{\tau+1} = U \Sigma V^{\top}$ and assuming the orthogonality of the observation matrix, the sub-optimal estimates read $\hat{C} = U$, $\hat{Z}_1^{\tau+1} = \Sigma V^{\top}$, and $\hat{A} = \hat{Z}_2^{\tau+1} \big(\hat{Z}_1^{\tau}\big)^{\dagger}$, where $\dagger$ denotes the Moore-Penrose pseudo-inverse operator. Please note that the uniqueness of the matrices $\hat{A}$ and $\hat{C}$ is implied by the uniqueness of the singular value decomposition.
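As an illustration, this closed-form estimation can be sketched in a few lines of NumPy. This is a minimal sketch of the SVD-based procedure, not the authors' implementation; the patch size, window length, and state order in the example are arbitrary, and the noise covariances Q and R are ignored:

```python
import numpy as np

def estimate_dt_params(Y, n):
    """Sub-optimal closed-form estimation of a dynamic texture model.

    Y : (m, tau+1) array of vectorized patch intensities over the temporal window.
    n : order of the hidden state (n << m).
    Returns the state-transition matrix A (n, n) and observation matrix C (m, n).
    """
    Y = Y - Y.mean(axis=1, keepdims=True)           # subtract the temporal mean y_bar
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                                    # orthogonal observation matrix
    Z = np.diag(s[:n]) @ Vt[:n, :]                  # hidden state trajectory (n, tau+1)
    # least-squares fit of the state transition: Z[:, 1:] ~ A @ Z[:, :-1]
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])        # Moore-Penrose pseudo-inverse
    return A, C
```

Note that `C` inherits orthonormal columns from the SVD, which is exactly the orthogonality assumption made above.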

Martin distance
We use the Martin distance to compare DT models. The orthogonality constraint on the matrix $C$ yields a non-Euclidean structure for the manifold of DT models (Saisan et al., 2001). This makes the use of adapted metrics necessary to compare DT models. Fortunately, several appropriate metrics have already been proposed in the literature. The Martin distance (De Cock and De Moor, 2002; Martin, 2002) and the Binet-Cauchy kernel (Vishwanathan et al., 2007) assess the similarity between the observation subspaces of the models, while the Kullback-Leibler divergence (Chan and Vasconcelos, 1992) compares their respective probability distributions. Though the Martin distance is not the actual geodesic distance on the manifold, it has been extensively shown to be appropriate for measuring distances between motion patterns (Chan and Vasconcelos, 2007; Doretto et al., 2003b; Ravichandran et al., 2009; Saisan et al., 2001). We therefore assume that using this distance is an acceptable approximation of the actual geodesic distance. A further argument supporting this approximation is the extensive use of the Martin distance for non-linear dimensionality reduction of the DT manifold (Ravichandran et al., 2009). The key ideas behind the Martin distance are to neglect the noise processes $w$ and $v$ and to note that any observed variable generated by a DT model $D = \{A, C\}$ lies in the $n$-dimensional subspace spanned by the columns of the extended observability matrix $O_n = [C^{\top}, (CA)^{\top}, \dots, (C A^{n-1})^{\top}]^{\top} \in \mathbb{R}^{mn \times n}$. The Martin distance between two DT models $D_1$ and $D_2$ is then defined from the principal angles $\theta_i$ between their respective subspaces (De Cock and De Moor, 2002; Martin, 2002):

$d_M^2(D_1, D_2) = -\ln \prod_{i=1}^{n} \cos^2 \theta_i \qquad (3)$
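A minimal sketch of this computation, assuming the finite $n$-block extended observability matrix as a stand-in for the infinite one used in the formal definition, and assuming both models share the same state order; the function names are illustrative:

```python
import numpy as np

def observability(A, C, n_blocks):
    """Extended observability matrix O = [C; CA; ...; CA^(n_blocks-1)]."""
    blocks, M = [], C
    for _ in range(n_blocks):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance_sq(A1, C1, A2, C2):
    """Squared Martin distance from the principal angles between the
    observability subspaces of two DT models (same state order assumed)."""
    n = A1.shape[0]
    Q1, _ = np.linalg.qr(observability(A1, C1, n))        # orthonormal subspace basis
    Q2, _ = np.linalg.qr(observability(A2, C2, n))
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)  # cos(theta_i)
    cosines = np.clip(cosines, 1e-12, 1.0)                # numerical safety
    return float(-2.0 * np.sum(np.log(cosines)))          # = -ln prod cos^2(theta_i)
```

The distance is zero for identical models and symmetric in its arguments, as expected of a subspace-angle construction.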

Motion-segmentation likelihood
The purpose of the motion-segmentation energy is to ensure the homogeneity of the thrombus $(\Omega_1^t)$, uncoagulated vessel $(\Omega_3^t)$, and background $(\Omega_2^t)$ regions with respect to the motion patterns, and thus to the DT models. To this end, we extend the statistical approach for region-based segmentation (Cremers et al., 2007) to segmentation on the manifold of DT models. The Riemannian structure of this manifold, shown by Saisan et al. (2001), makes the use of statistical frameworks for analyzing dynamic scenes difficult. Several clustering algorithms on Euclidean spaces, such as K-means (Ravichandran et al., 2009), hierarchical EM (C, 2010), and approximate nearest neighbors (Chaudhry and Ivanov, 2010), have been successfully extended to Riemannian manifolds. Level-set approaches such as the one proposed by Lenglet et al. (2004) for the analysis of DTI images guarantee the homogeneity of the results in the feature manifold as well as the smoothness of the segmentation in the image domain. They are thus particularly interesting for our application. However, the complex structure of the DT manifold hampers the derivation of a statistically well-founded likelihood. We propose in the next paragraph to approximate, using the Martin distance, the definition of the intrinsic normal distribution on Riemannian manifolds presented by Pennec (2006), and to derive our motion-segmentation likelihood from this approximation.

Approximation of the normal distribution on the DT manifold
We assume a normal distribution of the DT models in each region. The pdf of a normal distribution of mean $\bar{x}$ and concentration matrix $\Gamma$ on a Riemannian manifold $\mathcal{M}$ has been defined by Pennec (2006):

$\mathcal{N}_{(\bar{x}, \Gamma)}(y) = k \, \exp\!\left( -\tfrac{1}{2} \log_{\bar{x}}(y)^{\top} \, \Gamma \, \log_{\bar{x}}(y) \right) \qquad (4)$

where $\log_{\bar{x}}(y)$ denotes the mapping of $y$ to the tangent plane of $\mathcal{M}$ at $\bar{x}$. The term in the exponential can be interpreted as the distance between the projections of $y$ and of $\bar{x}$ onto the plane tangent to the manifold at $\bar{x}$. The concentration matrix $\Gamma$ respects the curvature of the manifold and accounts for the dispersion of the DT models. This formulation requires the definition of the logarithm operation and of Riemannian metrics. While these quantities are well known for some manifolds, such as the manifold of positive definite matrices (Lenglet et al., 2004; Pennec et al., 2006) or the group of diffeomorphisms (Arsigny et al., 2006), they remain difficult to compute for the manifold of DT models (C, 2010). We thus propose to approximate the projection step and the computation of the distance in the DT manifold with the previously defined Martin distance. We end up with the following approximation of Eq. (4):

$p(D_y^t \mid \Omega_k^t) \simeq k \, \exp\!\left( -\frac{d_M^2(D_y^t, \bar{D}_{\Omega_k^t})}{2 \sigma^2_{\Omega_k^t}} \right) \qquad (5)$

where $\bar{D}_{\Omega_k^t}$ and $\sigma^2_{\Omega_k^t}$ respectively denote the mean and the variance of the DT models observed in the $k$-th region over the temporal window $S^t$. We finally assume the independence of the labellings of the regions and the independence of the DT models at each pixel. This leads to the following expression for the likelihood energy of $S^t$ given $x^t$:

$E_{ms}(S^t \mid x^t) = \sum_{k=1}^{3} \sum_{x \in \Omega_k^t} \left( \frac{d_M^2(D_x^t, \bar{D}_{\Omega_k^t})}{2 \sigma^2_{\Omega_k^t}} + \ln \sigma_{\Omega_k^t} \right) \qquad (6)$
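Once the squared Martin distances to each region's mean model and the region variances have been precomputed, the motion-segmentation energy reduces to a sum of Gaussian negative log-likelihood terms. A minimal sketch under that assumption (the array layout is ours, not the authors' implementation):

```python
import numpy as np

def motion_segmentation_energy(d2, labels, sigma2):
    """Gaussian negative log-likelihood form of the motion-segmentation energy.

    d2     : (K, N) squared Martin distances of each pixel's DT model to the
             mean DT model of each of the K regions.
    labels : (N,) region label of each pixel, in {0, ..., K-1}.
    sigma2 : (K,) variance of the DT models in each region.
    """
    pixel = np.arange(d2.shape[1])
    k = np.asarray(labels)
    # per-pixel contribution: d^2 / (2 sigma_k^2) + ln(sigma_k)
    return float(np.sum(d2[k, pixel] / (2.0 * sigma2[k]) + 0.5 * np.log(sigma2[k])))
```

Minimizing this quantity over the labels favors assigning each pixel to the region whose mean motion pattern it is closest to, relative to that region's dispersion.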

Mean and variance
The above approximation of the likelihood energy relies on the computation of the mean and the variance of the DT models. The common definition of the mean as an expectation, which holds for vector spaces, is not valid on non-Euclidean manifolds because of the difficulty of extending sums with values on the manifold. The mean in the sense of Fréchet is alternatively defined as the point on the manifold that minimizes the variance to the observed points (Pennec, 2006). In our case, the Fréchet mean of the DT models observed in the region $\Omega_k^t$ reads:

$\bar{D}_{\Omega_k^t} = \arg\min_{D_x^t \in \mathcal{M}} \sum_{y \in \Omega_k^t} \mathrm{dist}^2(D_x^t, D_y^t) \qquad (7)$

where $\mathrm{dist}(D_x^t, D_y^t)$ is the geodesic distance between the DT model $D_x^t$ in the manifold $\mathcal{M}$ and the DT model $D_y^t$ observed in the region. Pennec (2006) proposes to solve this minimization problem by a gradient descent scheme: the computation of the new mean estimate in the tangent space of the current mean estimate is alternated with the re-centering of the tangent space at the point on the manifold that corresponds to the new mean estimate. In this work, we reduce the search space of the minimization by approximating the manifold $\mathcal{M}$ of the DT models by the finite set of DT models observed in the scene, i.e. $\mathcal{M} = \{ D_x^t \mid x \in \Omega^t \}$. Furthermore, we limit the number of distances to be computed by only estimating the barycenter on a sparse set $\Omega_{k,s}^t$ of samples randomly chosen in $\Omega_k^t$. We finally make the same approximation as above, namely that the structure of the manifold of DT models can be characterized by the Martin distance. These simplifications result in the tractability of the following exhaustive search:

$\bar{D}_{\Omega_k^t} = \arg\min_{D \in \{ D_x^t \mid x \in \Omega_{k,s}^t \}} \sum_{y \in \Omega_{k,s}^t} d_M^2(D, D_y^t) \qquad (8)$

for $|\Omega_{k,s}^t| \ll |\Omega_k^t|$. An alternative solution to the computation of the Fréchet mean consists of approximating the squared distance between a fixed DT model $D_x^t$ and the mean DT model $\bar{D}_{\Omega_k^t}$ by the expectation of the squared distance between $D_x^t$ and a random DT model $D_Y^t$ representative of the region. This defines an upper bound of the mean according to Doss (Doss, 1949; Pennec, 2006).
Again, the expression is made computationally tractable by restricting the summation space to the subset of random samples $\Omega_{k,s}^t$:

$d_M^2(D_x^t, \bar{D}_{\Omega_k^t}) \approx \frac{1}{|\Omega_{k,s}^t|} \sum_{y \in \Omega_{k,s}^t} d_M^2(D_x^t, D_y^t) \qquad (9)$

The variance $\sigma^2_{\Omega_k^t}$ of the DT models in the region $\Omega_k^t$ can be directly computed after the estimation of the mean DT model using Eq. (8), or of the squared Martin distances using Eq. (9):

$\sigma^2_{\Omega_k^t} = \frac{1}{|\Omega_k^t|} \sum_{x \in \Omega_k^t} d_M^2(D_x^t, \bar{D}_{\Omega_k^t}) \qquad (10)$

A limitation of the patch-based approach to the segmentation of DT models is that, for any patch overlapping two distinct regions, the intensities are generated by two independent processes. In this case, the estimated DT model does not correspond to the model of any region in the scene, which results in a likelihood that is discontinuous at the borders of the regions. Assuming that the segmented region $\Omega_k^t$ at the current iteration is a good estimate of the real position of the $k$-th region, the DT models $D_y^t$ most likely to correspond to a single motion pattern are those whose pixels $y \in \Omega_k^t$ lie furthest from the borders of the region. We favor these DT models in the above estimation of the mean and standard deviation by setting the confidence value:

$c^t(y) = d_S(y, \Omega_k^t) \qquad (11)$

where $d_S(\cdot, \Omega_k^t)$ is the signed distance function to the region $\Omega_k^t$. This confidence value is incorporated into Eqs. (8) and (9) after normalization over the pixel samples in $\Omega_{k,s}^t$, and into Eq. (10) after normalization over the pixel samples in $\Omega_k^t$.
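The restricted exhaustive search for the mean can be sketched as follows, assuming the pairwise squared Martin distances are precomputed; the optional weights stand in for the border-distance confidence values described above, and the function name is illustrative:

```python
import numpy as np

def approx_frechet_mean(dist2, sample_idx, weights=None):
    """Approximate Fréchet mean restricted to the observed DT models.

    dist2      : (N, N) symmetric matrix of squared Martin distances; in practice
                 only the rows/columns for the sample set need to be computed.
    sample_idx : indices of the sparse random sample set.
    weights    : optional per-sample confidence values (e.g. derived from the
                 signed distance to the region border), normalized internally.
    Returns the index of the sampled model minimizing the weighted sum of
    squared distances to the other samples.
    """
    sample_idx = np.asarray(sample_idx)
    w = np.ones(len(sample_idx)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    costs = dist2[np.ix_(sample_idx, sample_idx)] @ w   # weighted Fréchet cost per candidate
    return int(sample_idx[np.argmin(costs)])
```

For scalar "models" placed on a line, the candidate closest to the bulk of the samples wins, which matches the intuition of a barycenter.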

Alternative solutions
In the last two paragraphs, we have introduced an approximation of the intrinsic normal distribution on the manifold of DT models. Ravichandran et al. (2009) propose to solve the non-linearity problem by projecting the DT models onto a low-dimensional Euclidean subspace using non-linear dimensionality reduction (NLDR) methods (cf. Fig. 3). The clustering of the DT models is then performed in the Euclidean space, i.e. extrinsically to the manifold. Since only a few DT models are considered, the number of pairwise distances necessary to build the similarity graph is limited and the NLDR step remains computationally tractable (Ravichandran et al., 2009). This is not the case in our scenario. Given that a DT model is associated with each pixel in $\Omega$ ($|\Omega| \approx 2.5 \times 10^4$), building the similarity graph would require the computation of $|\Omega|^2$ Martin distances and would become computationally intractable. In contrast, our approximations of the normal pdf and of the mean DT model reduce the number of Martin distances that need to be estimated ($|\Omega| \times |\Omega_s|$ with $|\Omega_s| \ll |\Omega|$).
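The arithmetic behind this tractability claim is easy to check; the sample size below is an illustrative assumption, since the paper only requires that it be much smaller than the number of pixels:

```python
n_pixels = 25_000        # |Omega|: one DT model per pixel (~2.5e4)
n_samples = 200          # |Omega_s|: sparse random sample size (illustrative assumption)

pairwise = n_pixels ** 2         # Martin distances for a full NLDR similarity graph
sampled = n_pixels * n_samples   # Martin distances with the sparse-sample approximation

print(pairwise)             # 625000000
print(sampled)              # 5000000
print(pairwise // sampled)  # 125 -> 125x fewer distance computations
```

Even with a fairly generous sample of 200 pixels per region, the sparse scheme saves two orders of magnitude in Martin distance evaluations.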

Event-detection likelihood energy
The aim of the event-detection energy is to provide additional features, based on the variation of the DT models in time, for the segmentation of the thrombus region.
In addition to its distinct motion pattern, the thrombus is characterized by its growth in an initially uncoagulated vessel. As illustrated in Fig. 4, we exploit this observation by analyzing the variations of the motion patterns between the initial temporal window S t 0 , in which the thrombus has not yet developed, and the current temporal window S t .
Let us consider a pixel $x$ belonging to the thrombus region in the current temporal window $S^t$. Since the thrombus is not developed in the first temporal window $S^{t_0}$, $x$ initially belongs to the background or to the uncoagulated aorta. Its motion pattern is modified, and the Martin distance between the DT model $D_x^{t_0}$ observed in $S^{t_0}$ and the DT model $D_x^t$ observed in $S^t$ becomes significant. Now let us consider a pixel $x$ belonging to the uncoagulated vessel or to the background region in $S^t$. Following the observation that the vessel boundaries remain nearly constant over the video sequence, $x$ belongs to the same region in $S^{t_0}$. Its motion pattern is unchanged and the Martin distance between $D_x^{t_0}$ and $D_x^t$ tends to zero.

Fig. 3. Intrinsic vs. extrinsic definition of the normal distribution. We propose to approximate the normal distribution on the manifold of the DT models using the Martin distance. An extrinsic approach, which consists of projecting the DT models into a lower dimensional Euclidean space using non-linear dimensionality reduction methods (Ravichandran et al., 2009), is intractable in our scenario because of the high number of DT models considered in patch-based segmentation problems.
We define the event-detection map D_x^{t_0,t} in each pixel x as the Martin distance between the DT models D_x^{t_0} and D_x^t. As shown in Fig. 4c-d, D_x^{t_0,t} is characterized by high values in the thrombus region, and can therefore be employed as an additional discriminative feature between the thrombus and the two other regions. We assume that this feature is normally distributed within each of the k regions. This leads to the definition of the event-detection energy (Eq. (12)) as the negative log-likelihood, under this normal model, of the event-detection values within each region, where μ_{X_k^t}^{t_0,t} and σ_{X_k^t}^{t_0,t} denote the mean and the standard deviation of the event-detection values D_x^{t_0,t} in the region X_k^t.
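Under this normality assumption, the per-region contribution of the event-detection energy can be sketched as below (the helper `event_energy` and its interface are ours; the level-set weighting of the full energy of Eq. (12) is omitted):

```python
import numpy as np

def event_energy(event_map, region_mask):
    """Gaussian negative log-likelihood of the event-detection values
    D^{t0,t}(x) inside one region. Returns the estimated region statistics
    (mu, sigma) and the summed energy over the region."""
    vals = event_map[region_mask].astype(float)
    mu, sigma = vals.mean(), vals.std() + 1e-8  # guard against zero variance
    nll = 0.5 * np.log(2.0 * np.pi * sigma**2) + (vals - mu)**2 / (2.0 * sigma**2)
    return mu, sigma, float(nll.sum())
```

On a synthetic event map where the thrombus region carries high Martin distances, the estimated region means separate cleanly, which is what makes the map discriminative.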

Tubular shape prior energy
The likelihood energies defined by Eqs. (6) and (12) characterize the spatial and the temporal changes of the DT models. As illustrated in Fig. 5b, these energies and the subsequent segmentations can occasionally be disrupted by the slow circulation of blood cells in the collateral vessels and by local non-rigid deformations of the background. These perturbations provoke leaks in the segmentation of the thrombus and of the vessel regions. To avoid such inaccuracies, we constrain the shape of the aortic region X_12^t with a tubular shape prior energy. At each iteration of the energy-minimization scheme, we fit an "infinite rectangle" R_12 to the boundary B_12 of X_12^t taken at the previous iteration.
An infinite rectangle can be parameterized by its centerline C_12 and its radius r_12 (cf. Fig. 5c). We approximate C_12 by the first principal axis of X_12^t. Given the slope m and the center of mass p_c(0) = (x_c(0), y_c(0)) of C_12, its characteristic equation reads:

Ĉ_12 : y_c = m x_c + (y_c(0) − m x_c(0)).

The distance of a pixel p(i) = (x(i), y(i)) belonging to the boundary B_12 of the region to the estimated centerline Ĉ_12 is defined as the Euclidean distance ||p(i) − p_c(i)||_2 of this pixel to its orthogonal projection p_c(i) = (x_c(i), y_c(i)) on Ĉ_12. The coordinates of p_c(i) are obtained using basic geometry:

x_c(i) = (x(i) + m (y(i) − y_c(0)) + m^2 x_c(0)) / (1 + m^2),
y_c(i) = y(i) − (1/m) (x_c(i) − x(i)).

We approximate the radius r_12 of the fitted rectangle by the median value of the distances between the pixels on B_12 and the centerline Ĉ_12. For high angle values, the obtained parameters are further optimized using a simplex-based approach. Given the fitted rectangle R_12, we finally define a shape prior energy which penalizes, within the estimated aortic region, the squared distance d_{R_12}^2(x) relative to σ^2, where d_{R_12}(·) denotes the distance transform of the binary image derived from the fitted rectangle R_12 and σ ∈ R sets the range of feasible deformations. This energy prevents pixels which are far away from the fitted rectangle (d_{R_12}^2(x) ≫ σ^2) from belonging to the estimated aortic region X_12^t.

Fig. 5. Estimated segmentation of the thrombus (green) and of the aortic vessel (red) for a sub-sequence containing collateral vessels (red stars), when the tubular shape prior on the vessel region is dismissed or incorporated. In (c), overview of our tubular shape fitting method, displaying the boundary B_12 of the vessel region X_12^t in dark gray, its principal axis C_12 and its center of mass p_c(0) in black, and the reconstructed rectangle R_12 in light gray. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
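The rectangle-fitting step can be sketched as follows. This is a simplified version: we estimate the principal axis from the boundary pixels rather than from the full region, the simplex-based refinement for high angles is omitted, and the function name is ours.

```python
import numpy as np

def fit_infinite_rectangle(boundary_pts):
    """Fit the 'infinite rectangle' of the tubular prior: the centerline
    is the first principal axis, the radius the median distance of the
    boundary pixels to that line. boundary_pts: (N, 2) array of (x, y)."""
    center = boundary_pts.mean(axis=0)                 # p_c(0), center of mass
    X = boundary_pts - center
    _, _, Vt = np.linalg.svd(X, full_matrices=False)   # principal directions
    direction = Vt[0]                                  # unit centerline vector
    t = X @ direction                                  # position along the line
    proj = center + np.outer(t, direction)             # projections p_c(i)
    dists = np.linalg.norm(boundary_pts - proj, axis=1)
    radius = np.median(dists)
    return center, direction, radius
```

Using the unit direction vector instead of the slope m avoids the degenerate case of a vertical centerline in the closed-form projection above.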

Energy minimization
The energy minimization problem defined by Eq. (1) is solved using level-sets. Level-set methods are particularly interesting in our application because of their numerical stability and their ability to easily combine segmentation cues with spatial regularization and shape priors. The regions {X_1^t, X_2^t, X_3^t} are represented via the zero level-sets of two functions Φ_1 and Φ_2. We define the thrombus region (X_1^t) by the pixels inside the two contours (Φ_1 < 0, Φ_2 < 0), the uncoagulated vessel region (X_2^t) by the pixels outside the first contour and inside the second contour (Φ_1 > 0, Φ_2 < 0), and the background region (X_3^t) by the pixels outside the two contours (Φ_1 > 0, Φ_2 > 0). The aortic region (X_12^t) is thus directly represented by the zero level-set of Φ_2. The topological energy E_top constrains the number of regions in the partition by penalizing the surface area of the remaining, fourth region (Φ_1 < 0, Φ_2 > 0). Given H the smooth approximation of the Heaviside function, E_top reads:

E_top(Φ_1, Φ_2) = ∫_X (1 − H(Φ_1(x))) H(Φ_2(x)) dx.

The weighted sum of the likelihood and prior energies forms the a posteriori energy E(Φ_1, Φ_2 | S_t, S_{t_0}). This function is minimized by alternating, until convergence, an update of the energies with a diffusion-based regularized gradient descent on Φ_1 and Φ_2. Such Laplacian-regularized level-set approaches benefit from fast convergence and a high capture range. The gradients ∇_{Φ_1} E(Φ_1, Φ_2 | S_t, S_{t_0}) and ∇_{Φ_2} E(Φ_1, Φ_2 | S_t, S_{t_0}) are given in Eqs. (16) and (17) respectively. To simplify the term corresponding to the event-detection energy, we assume that the distributions of the event-detection maps in the background and in the vessel follow the same normal distribution. This is qualitatively justified in Fig. 4. In Eqs. (16) and (17), δ(·) denotes the smooth approximation of the Dirac delta function, k ∈ R is the step size, and j ∈ R is a parameter weighting the diffusion-based regularization.
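The multiphase region coding and the topological penalty can be sketched as follows (function names are ours; H is the arctangent-based smooth Heaviside, one common choice):

```python
import numpy as np

def regions_from_levelsets(phi1, phi2):
    """Multiphase coding with two level-set functions: thrombus
    (phi1<0, phi2<0), uncoagulated vessel (phi1>0, phi2<0), background
    (phi1>0, phi2>0); the fourth combination (phi1<0, phi2>0) is the
    'forbidden' region penalized by the topological energy E_top."""
    thrombus = (phi1 < 0) & (phi2 < 0)
    vessel = (phi1 > 0) & (phi2 < 0)
    background = (phi1 > 0) & (phi2 > 0)
    forbidden = (phi1 < 0) & (phi2 > 0)
    return thrombus, vessel, background, forbidden

def topological_energy(phi1, phi2, eps=1.0):
    """Smoothed surface area of the forbidden region,
    E_top = sum (1 - H(phi1)) * H(phi2)."""
    H = lambda z: 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(z / eps))
    return float(np.sum((1.0 - H(phi1)) * H(phi2)))
```

With two nested contours the forbidden region is empty and E_top stays small; swapping the two embedding functions creates a large forbidden region and a much larger penalty.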
g_k is the diffusion coefficient of the level-set embedding function Φ_k. It is defined from the image I^t to be segmented and from ḡ_k, the median value of the gradient magnitude of I^t along the k-th contour at the current iteration. This gradient-dependent diffusion coefficient slows down the evolution of the level-sets at high values of the gradient magnitude, which further matches the evolving contours to the edges of the aortic and thrombus regions.
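A sketch of an edge-dependent diffusion coefficient in this spirit is given below. The exact functional form used in the paper is not reproduced here; we substitute the classical Perona-Malik-style choice g = 1/(1 + (|∇I|/ḡ)^2), built around the median gradient magnitude ḡ along the contour, which likewise slows the evolution at strong edges:

```python
import numpy as np

def diffusion_coefficient(image, contour_mask):
    """Edge-dependent diffusion coefficient (sketch). contour_mask marks
    the pixels of the current k-th contour; g_bar is the median gradient
    magnitude along that contour, as in the paper."""
    gy, gx = np.gradient(image.astype(float))   # gradients along rows, cols
    mag = np.hypot(gx, gy)
    g_bar = np.median(mag[contour_mask]) + 1e-8
    # close to 1 in flat regions, small at edges stronger than g_bar
    return 1.0 / (1.0 + (mag / g_bar) ** 2)
```

In flat regions the coefficient stays near one (free evolution), while on strong edges it drops and anchors the contour.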

Results
The first aim of this section is the validation of the segmentation algorithm presented in Section 2. Section 3.1 tests the likelihood and prior energies on synthetic examples. Section 3.2 quantifies the performance of the level-set framework on real microscopic sub-sequences. The second aim of this section is the extension of the segmentation algorithm to the tracking of the thrombus and aortic regions in time. Section 3.3 details and quantitatively tests our tracking strategy. It also discusses the automatized computation of the aforementioned time to attachment (TTA) and thrombus surface area (TSA).

Validation on synthetic data
In this section, we test the performance of the motion-segmentation energy (Section 3.1.1), of the event-detection energy (Section 3.1.2), and of the tubular shape prior (Section 3.1.3).

Motion segmentation likelihood energy
We test our motion-segmentation energy and multiphase level-set framework against state-of-the-art methods for dynamic texture segmentation (Chan and Vasconcelos, 2008, 2009; Doretto et al., 2003b; Ghoreyshi and Vidal, 2007) on an exhaustive dataset of 199 synthetic sequences (Chan and Vasconcelos, 2011). Evaluation on this database is particularly attractive because the results obtained with MDT (Chan and Vasconcelos, 2008) and LDT (Chan and Vasconcelos, 2009) are publicly available, because it is exhaustive in size, and because it covers a wide variety of motion patterns including grass, water, steam, fire, etc.
In this work, we focus on the segmentation of two and three dynamic textures. This is justified by the applicative context of our developments, i.e. by the partition of the image into three dynamic regions if the thrombus has developed and into two dynamic regions otherwise. In the latter case, only one level-set embedding function is necessary. Please note that our approximation of the pdf holds for any number of dynamic regions and could therefore, in theory, be employed in more complex scenarios than the one in the focus of this work.
We discard the event-detection energy and the tubular shape prior energy by setting b = 0 and c = 0, respectively. This allows us to drive the segmentation only with the motion-segmentation and topological energies. Also, for all the experiments conducted on synthetic data, we set g = 1 to smooth the level-set embedding functions independently of the possible edges of the objects. The initial contours and the parameters of the model (n = 15, m = 5 × 5, s = 60) are the same as in (Chan and Vasconcelos, 2009). The same parameters are also used for the MDT (Chan and Vasconcelos, 2008), the LDT (Chan and Vasconcelos, 2009), and the single DT (Doretto et al., 2003b) methods. The parameters of the Ising approach are taken from (Ghoreyshi and Vidal, 2007). The same level-set framework, described in Section 2.5, is used for our algorithm and for the other level-set based approaches (Doretto et al., 2003b; Ghoreyshi and Vidal, 2007) in order to ensure a fair comparison of the segmentation energies. The results obtained with MDT and LDT are directly taken from (Chan and Vasconcelos, 2008, 2009).
3.1.1.1. Qualitative performance. The performance of our motion-segmentation energy is qualitatively illustrated in Fig. 6a and b on two synthetic sequences containing two and three dynamic regions, respectively. For the scene with two dynamic regions, the circle is successfully segmented. We also achieve an accurate segmentation of the scene with three dynamic textures; in this case, the disk models the thrombus and the square the uncoagulated vessel. It is important to observe in the energy-minimization plots at the far right of Fig. 6a and b that the selection of a unique DT sample in the estimation of the Fréchet mean results in a discontinuous segmentation energy (dotted curve). The mean in the sense of Doss yields a smoother convergence (solid curve).
3.1.1.2. Quantitative performance. We estimate for each sequence the Rand index (Hubert and Arabie, 1985) between the ground-truth segmentation and the segmentations obtained with our approach and with the state-of-the-art methods (Chan and Vasconcelos, 2008, 2009; Doretto et al., 2003b; Ghoreyshi and Vidal, 2007). Using the Rand index ensures a fair comparison of our approach with the two most recent methods for dynamic texture segmentation (Chan and Vasconcelos, 2008, 2009). Results are presented in Fig. 6c. Our algorithm achieves mean Rand indices higher than r = 0.91 ± 0.14 and r = 0.93 ± 0.12 if more than |X_s| = 40 reference points are considered and the Fréchet and the Doss means are employed, respectively. This is a significant improvement over the initialization (r = 0.69). In comparison, the MDT approach (Chan and Vasconcelos, 2008) yields r = 0.88 ± 0.13, the LDT algorithm (Chan and Vasconcelos, 2009) r = 0.92 ± 0.13, the single DT approach (Doretto et al., 2003b) r = 0.81 ± 0.17, and the Ising/AR model (Ghoreyshi and Vidal, 2007) r = 0.73 ± 0.14. While using the Fréchet mean yields an accuracy similar to LDT, using the Doss mean leads to a significant improvement in accuracy over all the state-of-the-art algorithms, including LDT. The higher performance of the Doss mean can be explained by the robustness of Eq. (9) with respect to noise: the noise in the Martin distance maps d_M(D_x^t, D_y^t) tends to cancel out by addition.

Fig. 6 (caption fragment): … (Chan and Vasconcelos, 2008, 2009; Doretto et al., 2003a; Ghoreyshi and Vidal, 2007). (e) P-value that the mean accuracy using our method is smaller than the mean accuracy using (Chan and Vasconcelos, 2008, 2009). (f) Precision curves obtained for |X_s| = 40 DT samples.
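The evaluation measure can be sketched as follows: the Rand index is the fraction of pixel pairs on which two partitions agree, computed here via the contingency table rather than explicit pair enumeration (the function name is ours):

```python
import numpy as np

def rand_index(labels_a, labels_b):
    """Rand index between two label maps: the fraction of pixel pairs
    that are either grouped together in both partitions or separated
    in both. Invariant to label permutations."""
    a, b = np.asarray(labels_a).ravel(), np.asarray(labels_b).ravel()
    n = a.size
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(table, (ai, bi), 1)                  # contingency counts
    comb2 = lambda x: x * (x - 1) / 2.0            # pairs within a group
    sum_ij = comb2(table).sum()
    sum_a = comb2(table.sum(axis=1)).sum()
    sum_b = comb2(table.sum(axis=0)).sum()
    total = comb2(n)
    return float((total + 2 * sum_ij - sum_a - sum_b) / total)
```

Identical partitions score 1 regardless of the label names used, which is what makes the index suitable for comparing segmentations with unmatched region labels.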
It is crucial to observe that, given a fixed initialization, the approaches of (Chan and Vasconcelos, 2008, 2009; Ghoreyshi and Vidal, 2007) are deterministic. By contrast, our algorithm and the single DT method (Doretto et al., 2003b) depend on the selection of the DT samples and of the reference model, respectively. The robustness with respect to the selection of the DT models is measured by the standard deviation of the Rand indices of the segmentations over five runs of the algorithms. As shown in Fig. 6d, the variability of our approach with respect to the selection of the DT samples stays smaller than 0.08 ± 0.09 (Fréchet) and 0.025 ± 0.05 (Doss) for |X_s| > 40 DT samples. As expected, the single DT approach (Doretto et al., 2003b) is characterized by a higher mean variability of r = 0.10 ± 0.07.
To check whether the robustness of our approach with respect to the selection of the DT models and to the variety of motion patterns is sufficient to conclude on its better performance, we perform right-tailed Student t-tests between our results and the results obtained with the two most accurate state-of-the-art methods, i.e. MDT (Chan and Vasconcelos, 2008) and LDT (Chan and Vasconcelos, 2009). We show in Fig. 6e that the p-values with respect to MDT stay smaller than 0.01 for both the Doss and the Fréchet means if |X_s| > 40 DT samples are selected. We also note that, even though there is no statistical difference between the results obtained with the Fréchet mean and the results obtained with the LDT approach (Chan and Vasconcelos, 2009) (p ≈ 0.5), we can conclude on the better performance of our segmentation framework if the Doss mean is employed (p < 0.17).
We quantify the robustness of the aforementioned algorithms with respect to the variety of the motion patterns by computing precision curves over the sequences in the database. These curves represent the proportion of segmentations having a Rand index larger than a varying threshold. The precision curve obtained using our level-set framework and the Doss mean stays at the top-right of the precision curves obtained with (Chan and Vasconcelos, 2008, 2009; Doretto et al., 2003b; Ghoreyshi and Vidal, 2007), showing that we correctly segment a higher proportion of dynamic texture sequences than the state-of-the-art methods (cf. Fig. 6f).
In this paragraph, we have shown experimental evidence of the performance of our motion-segmentation energy and level-set framework in comparison to state-of-the-art methods for dynamic texture segmentation. We solve the two problems posed by the original segmentation method (Doretto et al., 2003b) based on a single DT model, i.e. the choice of the reference model and the inability to segment multiple dynamic regions, while maintaining the simplicity of the LDS formulation for dynamic textures and of the closed-form solution (Doretto et al., 2003a) for parameter estimation.

Event-detection likelihood
Fig. 7a and b display two synthetic sequences containing a dynamic foreground object overlaid on a dynamic background. The two sequences are taken from the synthetic database (Chan and Vasconcelos, 2011) and are temporally shifted to ensure that the intensity observed in the sequence S_{t_0} at any pixel location differs from the intensity observed in S_t at the same pixel location. The shape of the foreground object remains the same between the two sequences; the modification of its motion patterns models the appearance of the thrombus. We choose a = 0, c = 0, f = 0, and r = 1 to evolve the level-set embedding function only with the event-detection energy.

We observe in Fig. 7c that the foreground region is characterized by high values in the event-detection map D_x^{t_0,t}. Fig. 7d shows that driving the evolution of the level-set with the event-detection energy of Eq. (12) results in a successful estimation of the foreground object. We repeat the same experiment on 50 different pairs of sequences and obtain a mean Rand index of r = 0.94 ± 0.11 and a mean Dice coefficient (Dice, 1945) of DC = 0.91 ± 0.19.

Fig. 7 (caption fragment): … D_x^{t_0,t}. (d) Level-set evolution using only our event-detection likelihood energy. Changes of the DT models are accurately segmented.

Fig. 8. Intensity-based level-set segmentation of a synthetic model for the aortic and collateral vessels with and without tubular shape prior energy.

Tubular shape prior energy
We qualitatively validate our shape prior energy on a synthetic model of the aortic and collateral vessels. The vessels are represented by high intensity values, the background by low intensity values. We replace our motion-segmentation energy by an intensity-based likelihood (Cremers et al., 2007). We set b = 0 and f = 0 to discard the event-detection and topological energies. As shown in Fig. 8, discarding the shape prior energy results in the incorporation of the collateral vessels into the segmentation result. As expected, our shape prior energy prevents the contour from leaking into the collateral vessels and leads to a more specific estimation of the synthetic aortic vessel.

Real microscopic data
In Section 3.1, we have shown experimental evidence of the performance of our segmentation algorithm and of each of the newly introduced energies on synthetic examples. We now test the performance of our segmentation method on real in vivo microscopic sequences. We start this section by detailing the experimental setup in Section 3.2.1. We then analyze the accuracy of our segmentation algorithm on a set of real microscopic sub-sequences in Section 3.2.2, and show the necessity of incorporating each of the introduced energies to obtain accurate segmentation results in Section 3.2.3. Our segmentation approach is used for the tracking of the thrombus and of the vessel and for the automatized computation of TTA and TSA in Section 3.3.

Protocol
The zebrafish larvae are 3-4 days old. They are anesthetized in Tricaine solution (300 µM, Sigma-Aldrich). To initiate thrombus formation, the epithelium of their caudal aorta is injured using a nitrogen-pulse ablating laser (VSL-337, LSI Laser Science). The images are recorded in time with a Leica DFC420-C camera through a Differential Interference Contrast (DIC) Leica DMRXA microscope equipped with a 40× water-immersion objective. A region of interest containing the thrombus and the aortic vessel is selected. The images are down-sampled by a factor of two and rigidly registered to the first frame. The obtained field of view is 110 µm × 70 µm for a resolution of 125 × 200 pixels. The frame rate is 25 fps and the exposure time 15 ms. Twenty sub-sequences taken from four in vivo microscopic video sequences are used for the quantitative and qualitative validation of our segmentation algorithm. Variations between the test sub-sequences include differences in the size of the thrombus, in the contrast of the aggregated blood cells, and in the location of the collateral vessels. For each sub-sequence, the ground-truth segmentations are manually delineated by a trained biologist. Accuracy is measured with the Dice coefficient (Dice, 1945), for both the thrombus (DC_t) and aortic (DC_v) regions. If not mentioned otherwise, the weighting and shape prior parameters are empirically set to (a = 1, b = 0.5, c = 0.1, f = 1.25, r = 5), the order of the DT models to n = 15, and the size of the temporal window to s = 75. The influence of these parameters on the segmentation results is studied in detail in Section 3.2.3. The size of the spatial patches (m × m = 5 × 5) and the number of DT samples (|X_s| = 60) are empirically fixed. The initial contours consist of polygonal regions of interest, which are defined by the selection of three to five vertices so as to partially overlap the regions to be segmented. This last constraint arises from our choice of a level-set based framework.
Please note that we employ a single initialization per sub-sequence in Sections 3.2.2 and 3.2.3. The robustness of our approach towards the selection of the initial contours is studied in Section 3.2.4.

Fig. 9 qualitatively shows the accuracy of the segmentations obtained with our algorithm. The initial contours are characterized by an overall overlap of DC_t = 0.66 for the thrombus plug and of DC_v = 0.69 for the aortic region. In comparison, our algorithm achieves mean accuracies of DC_t = 0.75 for the thrombus and DC_v = 0.88 for the aorta if the Doss mean is used, and of DC_t = 0.74 and DC_v = 0.86 if the Fréchet mean is employed. These values demonstrate the ability of our method to achieve accurate segmentation results despite challenging imaging conditions and far-away initializations. In our scenario, the good convergence of our algorithm is crucial to ensure that the contours follow the thrombus and the vessel regions over the sequence, even in the case of fast changes of their surface area. Please note that the overlap values also confirm the better performance of the mean in the sense of Doss in comparison to the Fréchet mean (cf. Section 3.1.1 on synthetic examples). They also illustrate the difficulty of segmenting the thrombus in comparison to the aortic region. Specific challenges include its restricted surface area and the possible changes of its boundaries during the considered sub-sequences. This last point breaks the stationarity assumption inherent to the dynamic texture model and results in the temporal smoothing of the estimated thrombus region.

Parameter setting
The dependency of the results on the selection of the weighting parameters and of the dimensions of the DT models is analyzed in Figs. 10 and 11. Each parameter is studied independently, with the remaining parameters set to their aforementioned values.
- If the topological energy is discarded (f = 0), the accuracy of the segmentations drops to less than DC_t = 0.52 and DC_v = 0.76.
With respect to the dimensions of the DT models, and as indicated by Fig. 11, a minimum sub-sequence length of s = 75 frames and a minimal order of n = 10 are necessary to achieve acceptable segmentation results. We also observe that the accuracy of the thrombus segmentation falls if the order of the DT model exceeds n = 15. This can be explained as follows. The estimation of the state-transition matrix A, detailed in Section 2.1, corresponds to a least-squares problem with n × (s − 1) equations and n^2 unknowns. The higher n is, the less overdetermined the system becomes, and the less accurate the estimate of A is. There is thus a trade-off between the characterization of the motion patterns and the accurate estimation of the models.
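The least-squares estimation underlying this trade-off can be sketched as follows (the closed-form pseudo-inverse solution of Doretto et al.; the helper name is ours):

```python
import numpy as np

def estimate_transition_matrix(states):
    """Least-squares estimate of the state-transition matrix A of a
    dynamic texture from a state sequence x_1..x_s (an n x s array):
    A = argmin ||X_{2:s} - A X_{1:s-1}||_F = X_{2:s} pinv(X_{1:s-1}).
    With n*(s-1) equations for n^2 unknowns, the system is
    overdetermined only while s - 1 >= n."""
    X1, X2 = states[:, :-1], states[:, 1:]
    return X2 @ np.linalg.pinv(X1)
```

On a noiseless simulated trajectory the true transition matrix is recovered exactly, while short windows (small s) or large orders n make the estimate ill-conditioned, which matches the accuracy drop observed for n > 15.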

Robustness towards initialization
We analyze the robustness of our segmentation algorithm with respect to the selection of the initial contours. For each sub-sequence, we select n_i = 10 different initial configurations and run our segmentation algorithm. We then define the initial cross stability R_Init as the mean of the pairwise DC values between the ten initializations, which reads

R_Init = (2 / (n_i (n_i − 1))) Σ_{p<q} DC(Init(p), Init(q)),   (22)

and the output cross stability R_Seg as the mean of the pairwise DC values between the ten associated segmentation results. The cross stability tends to one if the segmentations overlap with each other and to zero otherwise. This is more intuitively illustrated in Fig. 12a. The increase from R_Init to R_Seg shows the stability of our algorithm with respect to an unstable selection of the initial contours. We additionally plot in Fig. 12 the Dice coefficients, with respect to the ground-truth segmentations, of the segmentation results vs. the initial segmentations. The dispersion of the DC values in the x-axis direction, corresponding to the initialization, exceeds the dispersion of the DC values in the y-axis direction, corresponding to the obtained segmentations, for most of the test sub-sequences. This shows the decrease in the variability of the accuracies with respect to the ground-truth segmentations. Please note that the light-blue points are obtained on a sub-sequence in which the small size of the thrombus challenges the segmentation. More globally, the good stability of our approach with respect to the initialization is explained by our choice of region-based energies and by the use of a regularized gradient-descent scheme for the evolution of the two level-set functions.
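The cross-stability measure of Eq. (22) can be sketched as follows (function names are ours):

```python
import numpy as np
from itertools import combinations

def dice(a, b):
    """Dice coefficient between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def cross_stability(masks):
    """Mean of the pairwise Dice values over a set of segmentations:
    tends to 1 when all segmentations overlap, to 0 when disjoint."""
    return float(np.mean([dice(a, b) for a, b in combinations(masks, 2)]))
```

The same function applies to both R_Init (initial masks) and R_Seg (output masks), which is what makes the before/after comparison of Fig. 12a meaningful.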
In this section, we have shown the ability of the proposed algorithm to segment microscopic scenes under low-contrast and highly dynamic conditions. In addition, we have quantitatively demonstrated the necessity of incorporating the motion-segmentation, event-detection, shape prior, and topological energies into the level-set framework, as well as the robustness of our method with respect to the choice of the initial contours. The determination of quantitative measures such as the time to attachment and the thrombus surface area leads to the extension of the presented segmentation framework to the tracking of the vessel and of the thrombus in time. This is the focus of Section 3.3.

Tracking strategy
We perform the tracking by initializing the level-set embedding functions at a given frame using the already segmented frames. In this case, a single vertex-based initialization suffices to segment the whole video sequence.
The late development of the thrombus hampers the initialization of the function Φ_1, corresponding to the thrombus region, in the first temporal window. Therefore, we initialize Φ_1 and Φ_2 in the last temporal window, in which the thrombus has already developed, and then propagate the segmentation results in the backward direction. The segmentation result obtained for the frame I^{t+1} is used as initialization for the segmentation of I^t. Because of the minimal changes of the vessel boundaries in comparison to the thrombus plug, one can additionally regularize the temporal variation of the vessel region by adding a term δ(Φ_2)(1 − 2H(Φ_2^{t+1})) to the gradient of the energy associated with the second level-set, where Φ_2^{t+1} is the level-set embedding function of the output segmentation at time t + 1. There is a balance between the benefits of such regularization, such as temporal smoothing, and the necessity to be robust to a possible false segmentation at t + 1 and, as such, to minimize the propagation of error from t + 1 to t. This leads us to choose a small value (10^-2) for this weight in comparison to the other weighting parameters.
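The temporal regularization term can be sketched as follows (a sketch under our smooth Heaviside/Dirac choices; the small weight and the rest of the descent machinery are omitted):

```python
import numpy as np

def temporal_regularizer(phi2_t, phi2_next, eps=1.0):
    """Term delta(phi2^t) * (1 - 2 H(phi2^{t+1})) added to the gradient of
    the energy of the second level-set. It is non-zero only near the
    current vessel contour (via the smooth Dirac), and its sign depends
    on which side of the t+1 contour the pixel lies, coupling the current
    contour to the segmentation obtained at t + 1."""
    H = lambda z: 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(z / eps))
    delta = lambda z: (1.0 / np.pi) * (eps / (eps**2 + z**2))
    return delta(phi2_t) * (1.0 - 2.0 * H(phi2_next))
```

The term changes sign with the sign of Φ_2^{t+1} and decays away from the current contour, which yields the intended mild temporal smoothing without overriding the data terms.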
Because of the high number of frames in the video sequences (T > 1000), we lower the actual temporal resolution of the method from 25 fps to 2.5 fps. The segmentations at the remaining frames are computed using shape-based interpolation (Herman et al., 1992). Also, because of the higher performance of the mean in the sense of Doss in comparison to the Fréchet mean, only the Doss mean is employed for tracking.
The performance of our algorithm is quantified against manually delineated ground-truth segmentations, available at every 15th frame of the video sequence (1.6 fps). The temporal shift between the ground-truth segmentations and the "actively" segmented frames (as opposed to the segmentations obtained by interpolation) allows us to incorporate the interpolation step in the estimation of the accuracy.

Results and discussion
We test the performance of the tracking strategy on four microscopic video sequences. We refer the reader to Figs. 13-16 and to Tables 1 and 2 for qualitative and quantitative results. In the two following paragraphs, we discuss the ability of our algorithm to compute biological measures such as the time to attachment (TTA) and the thrombus surface area (TSA). It is important to note that the bottleneck of a manual analysis is the computation of the TSA in time; the TTA does not require the manual annotation of the images and is, as such, relatively easy and quick to determine visually.
We achieve a mean detection rate of 0.86 ± 0.12 for the thrombus region, and an average absolute error of 6.75 ± 5.1 s for the TTA. The lack of sensitivity for early-stage thrombi can be explained as follows. In Seq. #4, the local non-rigid deformations of the background before the attachment of the first blood cell at t = 2 s, and the deformation of the aortic vessel by the first attaching blood cells, contradict the assumptions of the event-detection and shape prior energies. This leads to the classification of the first blood cells into the background region. In Seq. #2, the thrombus is out of focus and overlaid with circulating blood cells. The motion patterns of the circulating blood cells disrupt the estimation of the DT models and the computation of the DT-based likelihoods. It therefore seems difficult to automatize the computation of the TTA.
However, and as mentioned above, the estimation of the TTA is not the bottleneck of a manual processing. Estimating the temporal variations of the thrombus surface area (TSA) requires the manual delineation of the thrombus in each frame of the sequence. The ability of the presented algorithm to accurately recover this quantity is quantified using the normalized sum of absolute differences (SAD) between the estimated and the ground-truth TSAs (cf. Table 2). The normalization by the sum of the ground-truth TSAs allows an intuitive interpretation of the obtained values: the SAD tends to zero if the difference between the estimated and the ground-truth TSAs is small in comparison to the actual TSA, while a value higher than one indicates an error larger than the actual TSA. We obtain a normalized SAD of 0.15 ± 0.07, which quantitatively confirms the ability of our approach to correctly estimate the TSA.
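The normalized SAD can be sketched as follows (the function name is ours):

```python
import numpy as np

def normalized_sad(tsa_est, tsa_gt):
    """Normalized sum of absolute differences between estimated and
    ground-truth thrombus surface areas over a sequence:
    SAD = sum |est - gt| / sum gt. Values near 0 mean small errors
    relative to the actual TSA; values above 1 mean the cumulative
    error exceeds the actual TSA."""
    tsa_est = np.asarray(tsa_est, float)
    tsa_gt = np.asarray(tsa_gt, float)
    return float(np.abs(tsa_est - tsa_gt).sum() / tsa_gt.sum())
```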
We also observe in Tables 1 and 2 that switching one of the energies off degrades the accuracy and the robustness of the detection rate, the mean Dice coefficient, the TSA, and the TTA. This shows the importance of incorporating each of the energies into the tracking/segmentation framework, and as such confirms the results of Section 3.2.3 and Fig. 10.
More generally, we have shown in this section the ability of our complete framework to segment the thrombus and the aortic regions in time and to compute an accurate estimate of the temporal variations of the thrombus surface area (TSA) despite the poor in vivo imaging conditions.

Table 2. Absolute error between the estimated and the ground-truth TTAs for the complete and degraded versions of the algorithm, and normalized SAD between the estimated and the ground-truth TSAs.