Quantification of vascular networks in photoacoustic mesoscopy

Mesoscopic photoacoustic imaging (PAI) enables non-invasive visualisation of tumour vasculature. The visual or semi-quantitative 2D measurements typically applied to mesoscopic PAI data fail to capture the 3D vessel network complexity and lack robust ground truths for assessment of accuracy. Here, we developed a pipeline for quantifying 3D vascular networks captured using mesoscopic PAI and tested the preservation of blood volume and network structure with topological data analysis. Ground truth data of in silico synthetic vasculatures and a string phantom indicated that learning-based segmentation best preserves vessel diameter and blood volume at depth, while rule-based segmentation with vesselness image filtering accurately preserved network structure in superficial vessels. Segmentation of vessels in breast cancer patient-derived xenografts (PDXs) compared favourably to ex vivo immunohistochemistry. Furthermore, our findings underscore the importance of validating segmentation methods when applying mesoscopic PAI as a tool to evaluate vascular networks in vivo.


Introduction
Tumour blood vessel networks are often chaotic and immature [1][2][3][4][5], with inadequate oxygen perfusion and therapeutic delivery [6,7]. The association of tumour vascular phenotypes with poor prognosis across many solid cancers [1] has generated substantial interest in non-invasive imaging of the structure and function of tumour vasculature, particularly longitudinally during tumour development. Imaging methods that have been tested to visualise the vasculature include whole-body macroscopic methods, such as computed tomography and magnetic resonance imaging, as well as localised methods, such as ultrasound and photoacoustic imaging (PAI) [1]. Microscopy methods can achieve much higher spatial resolution but are typically depth limited, at up to ~1 mm depth, and frequently applied ex vivo [1,[8][9][10][11].
Of the available tumour vascular imaging methods, PAI is highly scalable and, as such, applicable for studies from microscopic to macroscopic regimes. By measuring ultrasound waves emitted from endogenous molecules, including haemoglobin, following the absorption of light, PAI can reconstruct images of vasculature at depths beyond the optical diffusion limit of ~1 mm [12,11,13,14]. State-of-the-art mesoscopic systems now bridge the gap between macroscopy and microscopy, achieving ~20 µm resolution at up to 3 mm in depth [15,16]. Preclinically, mesoscopic PAI has been used to monitor the development of vasculature in several tumour xenograft models [17][18][19] and can differentiate aggressive from slow-growing vascular phenotypes [19]. Studies to-date, however, have been largely restricted to qualitative analyses due to the challenges of accurate 3D vessel segmentation, quantification and robust statistical analyses [17,20,15,18,19,21]. Instead, PAI quantification is typically manual and ad-hoc, with 2D measurements often extracted from 3D PAI data [17,20,22,19,23], reducing repeatability and comparability across datasets.
To assess the performance and accuracy of vessel analyses, ground truth datasets are needed with a priori known features [24]. Creating full-network ground truth reference annotations could be achieved through comprehensive manual labelling of PAI data, but this is difficult due to: the lack of available experts to perform annotation with a new imaging modality; the time taken to label images; and the inherent noise and artefacts present in PAI data. Despite the numerous software packages available to analyse vascular networks [2], their performance in mesoscopic PAI has yet to be evaluated, hence there is an unmet need to improve the quantification of vessel networks in PAI, particularly given the increasing application of PAI in the study of tumour biology [17,15,19].
To quantify PAI vascular images and generate further insights into the role of vessel networks in tumour development and therapy response, accurate segmentation of the vessels must be performed [2] (see step 1 in Fig. 1). A plethora of segmentation methods exist and can be broadly split into two categories: rule-based and machine learning-based methods. Rule-based segmentation methods encompass techniques that automatically delineate the vessels from the background based on a custom set of rules [25]. These methods provide less flexibility and tend to consider only a few features of the image, such as voxel intensity [17,19,26,23] but they are easy-to-use, with no training dataset requirements. On the other hand, machine learning-based methods, such as random forest classifiers, delineate vessels based on self-learned features [25,27]. Nonetheless, learning-based methods are data-driven, requiring large and high-quality annotated datasets for training and can have limited applicability to new datasets. To tackle some of these issues, several software packages have been developed in recent years, and have become increasingly popular in life science research [28,2,29]. Prior to segmentation, denoising and feature enhancement methods, such as Hessian-matrix based filtering, can also be applied to overcome the negative impact of noise and/or to enhance certain vessel structures within an image [30][31][32].
Here, we establish ground truth PAI data based on simulations conducted using synthetic vascular architectures generated in silico and, also using a photoacoustic string phantom, composed of a series of synthetic blood vessels (strings) of known structure, which can be imaged in real-time. Against these ground truths, we compare and validate the performance of two common vessel segmentation methods, with or without the application of 3D Hessian matrix-based vesselness image filtering feature enhancement of blood vessels (steps 2 & 3 in Fig. 1). Following skeletonisation of the segmentation masks, we perform statistical and topological analyses to establish how segmentation influences the architectural characteristics of a vascular network acquired using PAI (steps 4 & 5 in Fig. 1). Finally, we apply our segmentation and analysis pipeline to two in vivo breast cancer models and undertake a biological validation of the segmentation and subsequent statistical and topological descriptors using ex vivo immunohistochemistry (IHC). Compared to a rule-based auto-thresholding method, our findings indicate that a learning-based segmentation, via a random forest classifier, is better able to account for the artefacts observed in our 3D mesoscopic PAI datasets, providing a more accurate segmentation of vascular networks. Statistical and topological descriptors of vascular structure are influenced by the chosen segmentation method, highlighting a need to validate and standardise segmentation methods in PAI for increased reproducibility and repeatability of mesoscopic PAI in biomedical applications.

In silico simulations of synthetic vasculature enable segmentation precision to be evaluated against a known ground truth
Our ground truth consisted of a reference dataset of synthetic vascular network binary masks (n = 30) generated from a Lindenmayer System, referred to as L-nets ( Fig. 2; Supplementary Movie 1 for 3D visualisation). We simulated PAI mesoscopy data from these L-nets ( Fig. 2A) and subsequently used vesselness filtering (VF) as an optional and additional feature enhancement method (Fig. 2B). The four segmentation pipelines selected for testing ( Fig. 1) were applied to the simulated PAI data (Fig. 2C), that is, all images were segmented with: 1. Auto-thresholding using a moment preserving method (AT); 2. Auto-thresholding using a moment preserving method with vesselness filtering pre-segmentation (AT+VF); 3. Random forest classifier (RF); 4. Random forest classifier with vesselness filtering pre-segmentation (RF+VF).
Visually, RF methods appear to segment a larger portion of synthetic blood vessels (Fig. 2C) and they are particularly good at segmenting vessels at depths furthest from the simulated light source (Fig. 2D). A key image quality metric in the context of segmentation is the signal-tonoise (SNR), which is degraded at greater depth (Fig. 3A). To evaluate the relative performance of the methods, we compared the segmented Fig. 1. The mesoscopic photoacoustic image analysis pipeline. 1) Images are acquired and reconstructed at a resolution of 20 × 20×4 µm 3 (PDX tumour example shown with axial and lateral maximum intensity projections -MIPs). 2) Image volumes are pre-processed to remove noise and homogenise the background signal (high-pass and Wiener filtering followed by slice-wise background correction). Vesselness image filtering (VF) is an optional and additional feature enhancement method. 3) Regions of interest (ROIs) are extracted and segmentation is performed on standard and VF images using auto-thresholding (AT or AT + VF, respectively) or random forest-based segmentation with ilastik (RF or RF + VF, respectively). 4) Each segmented image volume is skeletonised (skeletons with diameter and length distributions shown for RF and RF + VF, respectively). 5) Statistical and topological analyses are performed on each skeleton to quantify vascular structures for a set of vascular descriptors. All images in steps 2-4 are shown as x-y MIPs.   and skeletonised blood volumes (BV) from the simulated PAI data to the known ground truth from the L-net. Here, we found that the learningbased RF segmentation outperformed the others in making the segmentation masks, with significantly higher R 2 (segmented BV: AT: 0.68, AT+VF: 0.58, RF: 0.84, RF+VF: 0.89, Fig. 3B skeleton BV: AT: 0.59, AT+VF: 0.73, RF: 0.90, RF+VF: 0.93, Fig. 3C) and lower mean-squared error (MSE) (Fig. 3D), with respect to the ground truth L-net volumes, compared to both AT methods (p < 0.0001 for all comparisons). Bland-Altman plots, which we used to illustrate the level of agreement between segmented and ground truth vascular volumes, showed a mean difference compared to the reference volume of 0.61 mm 3 (limits of agreement, LOA − 0.48 to 1.7 mm 3 , Fig. 3E) and F1 score of 0.73 ± 0.11 (0.49-0.88) for RF segmentation, albeit with a wide variation indicated by the LOA. RT+VF segmentation resulted in a similar mean difference 0.74 mm 3  In all cases, the mean difference shown in Bland-Altman plots increased with ground truth vascular volume, especially in the rulebased AT segmentation, which would be expected due to the restricted illumination geometry of photoacoustic mesoscopy. Since more vessel structures lie at a greater distance from the simulated light source in larger L-nets, they suffer from the depth-dependent decrease in SNR (Fig. 3A). RF segmentation was better able to cope with the SNR degradation, particularly at distances beyond ~1.5 mm, compared to the AT segmentation, which consistently underestimated the vascular volume.
Next, we skeletonised each segmentation mask to enable us to perform statistical and topological data analysis (TDA) to test how each segmentation method quantitatively influences a core set of vessel network descriptors [33]. These descriptors allowed us to evaluate the performance of the different segmentation methods in respect of the biological characterisation of the tumour networks. We used the following statistical descriptors: vessel diameters and lengths, vessel tortuosity (sum-of-angles measure, SOAM) and vessel curvature (chord-to-length ratio, CLR). Our topological network descriptors are connected components (Betti number β 0 ) and looping structures (1D holes, Betti number β 1 ) (see Table S1 for descriptor descriptions).
Here, the accuracy and strength of relationship between the segmented and ground truth vascular descriptors, calculated by MSE (see Fig. 3D) and R 2 values (Figure S1A-I) respectively, gave the same conclusions. Across all skeletons, we measured an increased number of connected components (β 0 ) and changes to the number of looping structures (β 1 ) from the simulated compared to the ground truth L-nets, resulting in low R 2 and high MSE for all methods (Fig. 3D). The observed changes in these topological descriptors arise due to depth-dependent SNR and PAI echo artefacts. For all other descriptors, AT+VF outperformed the other segmentation methods in its ability to accurately preserve the architecture of the L-nets, with higher R 2 and lowest MSE values for vessel lengths, CLR, SOAM, number of edges and number of nodes (Fig. 3D).
Vessel diameters are accurately preserved by both RF segmentation methods, supporting our observation that these methods perform accurate vascular volume segmentation. We note that the number of edges and nodes are also well preserved by RF and RF+VF. This further supports the high accuracy of both RF methods to segment vascular structures.

Random forest classifier accurately segments a string phantom
We next designed a phantom test object (Supplementary Figure 2) to further compare the performance of our segmentation pipelines in a ground truth scenario. Agar phantom images (n = 7) were acquired using a photoacoustic mesoscopy system and contained three strings of the same known diameter (126 µm), length (~8.4 mm) and consequently volume (104.74 µm 3 ), positioned at 3 different depths, 0.5 mm, 1 mm, and 2 mm, respectively (Fig. 4A,B; Supplementary Movie 2). Consistent with our in silico experiments, the accuracy of skeletonised string volumes decreased as a function of depth across all methods (Fig. 4C), due to the decreased SNR with depth (Fig. 4D). Interestingly, the significance of this decrease was very high for all comparisons (top vs. middle, top vs. bottom and middle vs. bottom) in both AT methods (all p < 0.001), but we observed an improvement in string volume predictions across depth for both RF methods, such that middle vs. bottom string volumes were not significantly different in RF+VF (p = 0.42).
The illumination geometry of the photoacoustic mesoscopy system means that vessels or strings are underrepresented when detected as the illumination source is located at the top surface of the tissue or phantom (Fig. 4E). As a result, all string volumes computed from the segmented images are inaccurate relative to ground truth suggesting that blood volume cannot be accurately predicted from segmented PA images (Fig. 4F). Skeletonisation provides a more accurate prediction of vessel and string volume as it approximates the undetected section by representing these objects as axisymmetric tubes (Fig. 4C,F).

Vesselness filtering of in vivo tumour images impacts computed blood volume
Having established the performance of our AT-and RF-based segmentation methods in silico and in a string phantom, next we sought to determine the influence of the chosen method in quantifying tumour vascular networks from size-matched breast cancer patient-derived xenograft (PDX) tumours of two subtypes (ER-n = 6; ER+ n = 8, total n = 14).
Visual inspection of the tumour networks subjected to our processing pipelines suggests that VF increases vessel diameters in vivo (Fig. 5A-C; see Supplementary Movie 3 for 3D visualisation). This could be due to acoustic reverberations observed surrounding vessels in vivo, which VF scores with high vesselness, spreading the apparent extent of a given vessel and ultimately increased volume. Our quantitative analysis confirmed this observation, where significantly higher skeletonised blood volumes were calculated in the AT+VF and RF+VF masks compared to AT and RF alone (Fig. 5D).

Network structure analyses and comparisons to ex vivo immunohistochemistry of tumour vasculature are impacted by the choice of segmentation method
Next, we computed vascular descriptors for our dataset of segmented in vivo images. As expected from our initial in silico and phantom evaluations, VF led to increased vessel diameters and lengths (Fig. 5E,F), as well as blood volume. Our in silico analysis indicated that AT performs poorly in differentiating vessels from noise and introduces many vessel discontinuities (Table S1). This was exacerbated in vivo where more complex vascular networks and real noise lead to an increase in segmented blood volume (p < 0.01), looping structures (Fig. 5G), a greater number of edges (Fig. 5H), and reduced number of connected components (Fig. 5I).
Our prior in silico and phantom experiments indicated that RF-based methods had a greater capacity to segment vessels at depth. Similarly, we observe more connected components for RF-based methods in vivo ( Fig. 5I) along with lower SOAM (Fig. 5J) and higher CLR (Fig. 5K), suggesting that RF-segmented vessels have reduced tortuosity and curvature compared to AT+VF segmented vessels. These in vivo findings support our observations from in silico and phantom studies where RFbased methods provide the most reliable prediction of vascular volume, whereas AT+VF best preserves architecture towards the tissue surface.
Next, we sought to assess how our vascular metrics correlated with the following ex vivo IHC descriptors: CD31 staining area (to mark vessels), ASMA vessel coverage (as a marker of pericyte/smooth muscle coverage and vessel maturity) and CAIX (as a marker of hypoxia) to provide ex vivo biological validation of our in vivo descriptors. Our in silico, phantom and in vivo analyses indicate that AT+VF and RF are the top performing segmentation methods and so we focussed on these (results for AT and RF+VF can be found in Figure S3). We note that none of the vascular metrics derived from AT segmented networks correlated with IHC descriptors.
Finally, levels of hypoxia in the tumours, measured by CAIX IHC, positively correlated in both AT+VF and RF methods with skeletonised blood volume (r = 0.72, p = 0.007; and r = 0.72, p = 0.004, respectively), number of edges (r = 0.59, p = 0.04; and r = 0.84, p < 0.001, respectively), nodes (r = 0.72, p = 0.007; and r = 0.84, p < 0.001, respectively) and looping structures (r = 0.61, p = 0.03; and r = 0.85, p < 0.001, respectively). In the case of blood volume, edges and nodes, these results are expected as it has been shown that breast cancer tumours with dense but immature and dysfunctional vasculatures exhibit elevated hypoxia [1,35], likely due to poor perfusion. CAIX negatively correlated with connected components for RF networks (r = − 0.87, p < 0.001) (Fig. 5L), reflecting results for ASMA vessel coverage. Our cross-validation between ex vivo IHC and vascular descriptors indicate that RF and AT+VF segmentation methods can reliably capture biological characteristics in tumours.

Ex vivo immunohistochemistry and network structural analyses highlight distinct vascular networks between ER-and ER+ breast patientderived xenograft tumours
Finally, we quantified and compared IHC and our vascular descriptors between the two breast cancer subtypes represented (RF in Fig. 6; AT+VF in Figure S4; similar trends and significances are observed unless stated otherwise). From analysis of IHC images (Fig. 6A), ERtumours had higher CD31 staining area (Fig. 6B), poorer ASMA+ pericyte vessel coverage (Fig. 6C) and higher CAIX levels ( Fig. 6D) compared to ER+ tumours. Our IHC data supports our RFderived vascular descriptors, where we found that ER-tumours had denser networks, with higher blood volume, diameter and looping structures (Fig. 6E,F,G). ER+ tumours have a sparse network but showed   more subnetworks (Fig. 6H) with significantly longer vessels in AT+VF segmented networks (p < 0.05, Figure S4C), which could indicate a more mature vessel network based on our prior correlative analyses. No significant differences between the two models were observed for blood vessel tortuosity and curvature (Fig. 6J,K).

Discussion
Mesoscopic PAI enables longitudinal visualisation of blood vessel networks at high resolution, non-invasively and at depths beyond the optical diffraction limit of 1 mm [11,13,15,14]. To quantify the vasculature, PA images need to be accurately segmented. Manual annotation of vasculature in 3D PAI is difficult due to depth-dependent signal-to-noise and imaging artefacts. Whilst a plethora of vascular segmentation techniques are available [2,27], their application in PAI has been limited due to a lack of an available ground truth for comparison and validation.
In this study, we first sought to address the need for ground truth data in PAI segmentation. We generated two ground truth datasets to assess the performance of rule-based and machine learning-based segmentation approaches with or without feature enhancement via vesselness filtering. The first is an in silico dataset where PAI was simulated on 3D synthetic vascular architectures; the second is an experimental dataset acquired from a vessel-like string phantom. These allowed us to evaluate the ability of different segmentation methods to preserve blood volume and vascular network structure.
Our first key finding is that machine learning-based segmentation using RF classification provided the most accurate segmentation of vessel volumes across our in silico, phantom and in vivo datasets, particularly at depths beyond ~1.5 mm, where SNR diminishes due to optical attenuation. Compared to the AT approaches, RF-based segmentation partially overcomes the depth dependence of PAI SNR since it identifies and learns edge and texture features of vessels at different scales and contrasts. Such intrinsic depth-dependent limitations are often ignored in the literature, where analyses are typically performed on 2D maximum intensity projections for simplicity [17,20,22,18,19,23], suggesting that a fully 3D machine learning-based segmentation is needed to accurately recapitulate the complexity of in vivo vasculatures measured using PAI.
As blood vessel networks can be represented as complex, interconnected graphs, we performed statistical and topological data analyses [33,36] to further assess the strengths and weaknesses of our chosen segmentation methods.
Our second key finding is that AT methods struggle to segment vessels with low SNR, but adding VF outperforms all other methods in preserving vessel lengths, loops, curvature and tortuosity. Additionally, where intensity varies across a vessel structure, this results in many disconnected vessels when segmenting with AT alone, as only the highest intensity voxels will pass the threshold. Only when vesselness filtering is applied does AT do well at preserving topology. VF alters the intensity values from a measure of PA signal to a prediction of 'vesselness', generating a more homogeneous intensity across the vessel structures and ultimately a more continuous vessel structure to segment. This likely explains why AT+VF best preserves vessel length and, subsequently, network structure, while AT alone performs poorly. For AT, VF improved BV predictions in silico via better preservation of lengths but not diameters, as our phantom experiments indicate that AT+VF overestimates diameter. Owing to the homogenous intensity of vessels introduced by VF, one could therefore assume that RF+VF would be the most accurate method at preserving network structure (by combining the machine-learning accuracy of segmentation with the shape enhancement of VF). However, this is not the case: RF alone can account for discontinuities in vessel intensity, unlike AT, meaning it does not rely on VF to enhance structural preservation, which is our third key finding. In fact, the slight inaccuracy in diameter preservation introduced by VF in silico appears to decrease topology preservation in RF+VF compared to RF alone. As expected, all methods led to an increase in the number of subnetworks (connected components) in silico, as these segmentation methods cannot reconnect vessel subnetworks that were disconnected due to poor SNR or imaging artefacts. Given the better segmentation at depth by RFmethods, we hypothesise that these increasingly small subnetworks might have biased the segmentations to underperform in our vascular descriptors. This could be explored in future work, for example, by developing string phantoms with more complex topologies.
Taken together, our results suggest that RF performs feature detection across scales in the manually labelled voxels to learn discriminating characteristics for vessel classification and segmentation. Adding VF before RF segmentation may confound this segmentation framework, because VF systematically smooths images and removes non-cylindrical raw image information, which may have been vital in the RF learning of vascular structures on the training dataset.
Applying statistical and topological analyses to our in vivo tumour PDX dataset we observed trends consistent with our in silico and phantom experiments. Cross-validating our vascular descriptors with ex vivo IHC confirmed that we can extract biologically relevant information from mesoscopic PA images. In our analyses, we revealed that predictions of BV correlated with endothelial cell and hypoxia markers via CD31 and CAIX staining, respectively; and vascular descriptors relating to the maturation of vascular structures correlated with ASMA vessel coverage. Applying our segmentation pipeline to compare ER-and ER+ breast cancer PDX models showed that descriptors of network structure can capture the higher density and immaturity of ER-vessel networks which result in decreased oxygen delivery and high hypoxia levels in comparison to ER+ tumours.
Prior work applying topological data analysis after skeletonization of PAI data in a skin burn model also showed biologically relevant differences in between healthy and burnt skin in rats [37]. Looking further afield, vascular descriptors such as these have also shown response to anti-angiogenic drugs and radiotherapy, with decreased SOAM (tortuosity), loops and increased vessel lengths detected in response to treatment in mouse colon carcinomas [33]. Stolz et al. also showed an increase in tortuosity and looping structures upon addition of a vascular-promoting agent. These vascular descriptors are able to capture vascular structures of varying disorder, meaning they could be widely applied to study tumour vasculature, which is often chaotic and tortuous due to excessive angiogenesis [1].
While our pipeline yields encouraging correlations to the underlying tumour vasculature, avenues of further development exist to improve the realism of our ground truth data, including advances in simulation complexity, and tissue-specific synthetic and phantom vasculatures. While our in silico PAI dataset incorporated the effects of depthdependent SNR and gaussian noise found in in vivo PAI mesoscopic data, further development of the optical simulations could, for example, recapitulate the raster-scanning motion of illumination optical fibres, instead of approximating a simultaneous illumination plane of singlepoint sources. The limited aperture of the raster-scanning ultrasound transducer could not be simulated in k-Wave as it is not yet implemented for 3D structures. In terms of vascular complexity, our string phantom represents a highly idealised vessel networks but future work could introduce more complex and interconnected vessel-like networks in order to replicate more realistic vascular topologies [38]. Our ex vivo IHC descriptors were used to confirm our in vivo tumour analyses but did not exhibit correlations across all vascular descriptors. This may be expected as the 2D IHC analysis does not fully encompass the 3D topological characteristics of the vascular network. 3D IHC, microCT or light sheet fluorescence microscopy may provide improved ex vivo validation using exogenous labelling to identify 3D vascular structures, such as tortuosity, at endpoint [39,40]. It should also be noted that we cannot discount the effect of unconscious biases on segmentation performance when manually labelling images with and without VF to train the classifier. The segmentation accuracy of classifiers trained by multiple users could be explored in future work to formally investigate these effects.
Furthermore, the past decade has seen the rise of a multitude of blood vessel segmentation methods using convolutional neural networks and deep learning [41]. Applying deep learning to mesoscopic PAI could provide a means to overcome several equipment-related limitations such as: vessel discontinuities induced by breathing motion in vivo; vessel orientation relative to the ultrasound transducer; shadow and reflection artefacts; or underestimation of vessel diameter in the z-direction due to surface illumination. Whilst we found that skeletonisation addressed diameter underestimation and observed the influence of discontinuities on the extracted statistical and topological descriptors, they were not deeply characterised or corrected. Nonetheless, whilst deep learning may provide superior performance when fine-tuned to specific tasks, the resulting methods may lack generalisability across tissues with differing SNR and blood structures, requiring large datasets for training. In this study we chose to use software that is open-source and widely accessible to biologists in the life sciences. We believe that such a platform shows more potential to be employed widely with limited computational expertise.
In summary, we developed an in silico, phantom, in vivo, and ex vivovalidated end-to-end framework for the segmentation and quantification of vascular networks captured using mesoscopic PAI. We created in silico and string phantom ground truth PAI datasets to validate segmentation of 3D mesoscopic PA images. We then applied a range of segmentation methods to these and images of breast PDX tumours obtained in vivo, including cross-validation of in vivo images with ex vivo IHC. We have shown that learning-based segmentation, via a random forest classifier, best accounted for the artefacts present in mesoscopic PAI, providing a robust segmentation of blood volume at depth in 3D and a good approximation of vessel network structure. Despite the promise of the learning-based approach to account for depth-dependent variation in SNR, auto-thresholding with vesselness filtering more accurately represents statistical and topological characteristics in the superficial blood vessels as it better preserves vessel lengths. Therefore, when quantifying PA images, users need to consider the relative importance of each descriptor as the choice of segmentation method can directly impact the resulting analyses. We have highlighted the potential of statistical and topological analyses to provide a detailed parameterisation of tumour vascular networks, from classic statistical descriptors such as vessel diameters and lengths to more complex descriptors of network topology characterising vessel connectivity and loops. Our results further underscore the potential of photoacoustic mesoscopy as a tool to provide biological insight into studying vascular network in vivo by providing life scientists with a readily deployable and cross-validated pipeline for data analysis.

Generating ground truth vascular architectures in silico
To generate an in silico ground truth vascular network, we utilised Lindenmayer systems (L-Systems, see Figure S5) [42]. L-Systems are language-theoretic models that were originally developed to model cellular interactions but have been extended to model numerous developmental processes in biology [43]. Here, we apply L-Systems to generate realistic, 3D vascular architectures [44,45] (referred to as L-nets) and corresponding binary image volumes. A stochastic grammar E.L. Brown et al. was used [44] to create a string that was evaluated using a lexical and syntactic analyser to build a graphical representation of each L-net. To transfer the L-net to a discretised binary image volume, we used a modified Bresenham's algorithm [46] for 3D to create a vessel skeleton. Voxels within a vessel volume were then identified using the associated vessel diameter for each centreline ( Figure S5).

Photoacoustic image simulation of synthetic ground truths
To test the accuracy of the segmentation pipelines, the L-nets were then used to simulate in vivo photoacoustic vascular networks embedded in muscle tissue using the Simulation and Image Processing for Photoacoustic Imaging (SIMPA) python package (SIMPA v0.1.1, https://gith ub.com/CAMI-DKFZ/simpa) [47] and the k-Wave MATLAB toolbox (k-Wave v1.3, MATLAB v2020b, MathWorks, Natick, MA, USA) [48]. Planar illumination of the L-nets on the XY plane was achieved using Monte-Carlo eXtreme (MCX v2020, 1.8) simulation on the L-net computational grid of size 10.24 × 10.24 × 2.80 mm 3 with 20 µm isotropic resolution. The optical forward modelling was conducted at 532 nm using the optical absorption spectrum of 50% oxygenated haemoglobin for vessels (an approximation of tumour vessel oxygenation based on previously collected photoacoustic data [35] and of water for muscle. Next, 3D acoustic forward modelling was performed on the illuminated L-nets assuming a speed of sound of 1500 ms − 1 in k-Wave. The photoacoustic response of the illuminated L-nets was measured with a planar array of sensors positioned on the surface of the XY plane with transducer elements of bandwidth central frequency of 50 MHz (100% bandwidth) and using a 1504 time steps, where a time step is 5 × 10 − 8 Hz − 1 ). Finally, the 3D initial PA wave-field was reconstructed using fast Fourier transform-based reconstruction [48], after adding uniform gaussian noise on the collected wave-field.

String phantom
We used a string phantom as a ground truth structure (see Supplementary Materials). The agar phantom was prepared as described previously [49] including intralipid (I141-100 ML, Merck, Gillingham, UK) to mimic tissue-like scattering conditions. Red-coloured synthetic fibres (Smilco, USA) were embedded at three different depths defined by the frame of the phantom to provide imaging targets with a known diameter of 126 µm. The top string was positioned at 0.5 mm from the agar surface, the middle one at 1 mm, and the bottom one at 2 mm, as shown in Figure S2.

Animals
All animal procedures were conducted in accordance with project and personal licences, issued under the United Kingdom Animals (Scientific Procedures) Act, 1986 and approved locally under compliance forms CFSB1567 and CFSB1745. For in vivo vascular tumour models, cryopreserved breast PDX tumour fragments in freezing media composed of heat-inactivated foetal bovine serum (10500064, Gibco™, Fisher Scientific, Göteborg Sweden) and 10% dimethyl sulfoxide (D2650, Merck) were defrosted at 37 • C, washed with Dulbecco's Modified Eagle Medium (41965039, Gibco) and mixed with matrigel (354262, Corning®, NY, USA) before surgical implantation. One estrogen receptor negative (ER-, n = 6) PDX model and one estrogen receptor positive (ER+, n = 8) PDX model were implanted subcutaneously into the flank of 6-9 week-old NOD scid gamma (NSG) mice (#005557, Jax Stock, Charles River, UK) as per standard protocols [50]. Once tumours had reached ~1 cm mean diameter, tumours were imaged and mice sacrificed afterwards, with tumours collected in formalin for IHC.

Photoacoustic imaging
Mesoscopic PAI was performed using the raster-scan optoacoustic mesoscopy (RSOM) Explorer P50 (iThera Medical GmbH, Munich, Germany). The system uses a 532 nm laser for excitation. Two optical fibre bundles are arranged either side of a transducer, which provide an elliptical illumination beam of approximately 4 mm × 2 mm in size. The transducer and lasers collectively raster-scan across the field-of-view. A high-frequency single-element transducer with a centre frequency of 50 MHz (>90% bandwidth) detects ultrasound. The system achieves a lateral resolution of 40 µm, an axial resolution of 10 µm and a penetration depth of up to ~3 mm [51].
For image acquisition of both phantom and mice, degassed commercial ultrasound gel (AquaSonics Parker Lab, Fairfield, NJ, USA) was applied to the surface of the imaging target for coupling to the scan interface. Images were acquired over a field of view of 12 × 12 mm 2 (step size: 20 µm) at either 100% (phantom) or 85% (mice) laser energy and a laser pulse repetition rate of 2 kHz (phantom) or 1 kHz (mice). Image acquisition took approximately 7 min. Animals were anaesthetised using 3-5% isoflurane in 50% oxygen and 50% medical air. Mice were shaved and depilatory cream applied to remove fur that could generate image artefacts; single mice were placed into the PAI system, on a heat-pad maintained at 37 • C. Respiratory rate was maintained between 70 and 80 bpm using isoflurane (~1-2% concentration) throughout image acquisition.

Segmentation and extraction of structural and topological vascular descriptors
All acquired data were subjected to pre-processing prior to segmentation, skeletonisation and structural analyses of the vascular network, with an optional step of vesselness filtering also tested (Fig. 1). Prior to segmentation, data were filtered in the Fourier domain in XY plane to remove reflection lines, before being reconstructed using a backprojection algorithm in viewRSOM software (v2.3.5.2 iThera Medical GmbH) with motion correction for in vivo images with a voxel size of 20 × 20 × 4 µm 3 (X,Y,Z). To reduce background noise and artefacts from the data acquisition process, reconstructed images were subjected to a high-pass filter, to remove echo noise, followed by a Wiener filter in MATLAB (v2020b, MathWorks, Natick, MA, USA) to remove stochastic noise. Then, a built-in slice-wise background correction [52] was performed in Fiji [53] to achieve a homogenous background intensity (see exemplars of each pre-processing step in Figure S6).

Image segmentation using auto-thresholding or a random forest classifier
Using two common tools adopted in the life sciences, we tested both a rule-based moment preserving thresholding method (included in Fiji v2.1.0) and a learning-based segmentation method based on random forest classifiers (with ilastik v1.3.3 [28]). These popular packages were chosen to enable widespread application of our findings. Moment preserving thresholding, referred to as auto-thresholding (AT) for the remainder of this work, computes the intensity moments of an image and segments the image while preserving these moments [54]. Training of the random forest (RF) backend was performed on 3D voxel features in labelled regions, including intensity features, as with the AT method, combined with edge filters, to account for the intensity gradient between vessels and background, and texture descriptors, to discern artefacts in the background from the brighter and more uniform vessel features, each evaluated at different scales (up to a sigma of 5.0).
A key consideration in the machine learning-based segmentation is the preparation of training and testing data (Table S2). For the in silico ground truth L-net data, all voxel labels are known. All vessel labels were used for training, however, only partial background labels were supplied to minimise computational expense by labelling the 10 voxel radius surrounding all vessels as well as 3 planes parallel to the Z-axis (edges and middle) as background ( Figure S7A,B). For the phantom data, manual segmentation of the strings from background was performed to provide ground truth. Strings were segmented in all slices on which they appeared and background was segmented tightly around the string ( Figure S7C). For the in vivo tumour data, manual segmentation of vessels was made by a junior user (TLL) supervised by an experienced user (ELB), including images of varying signal-to-noise ratio (SNR) to increase the robustness of the algorithm for application in a range of unseen data. Up to 10 XY slices per image stack in the training dataset were segmented with pencil size 1 at different depths to account for depth-dependent SNR differences ( Figure S7D).
Between pre-processing and segmentation, feature enhancement was tested as a variable in our segmentation pipeline. In Fiji, we adapted a modified version of Sato filtering (α = 0.25) [55] to calculate vesselness from Hessian matrix eigenvalues [56] across multiple scales. Five scales in a linear Gaussian normalized scale space were used, from which the maximal response was measured to produce the final vesselness filtered images (20,40,60,80, and 100 µm) [55].
Finally, all segmented images (either from Fiji or ilastik) were passed through a built-in 3D median filter in Fiji, to remove impulse noises ( Figure S8). To summarise the pipeline (Fig. 1), the methods under test for all datasets were: 1. Auto-thresholding using a moment preserving method (AT); 2. Auto-thresholding using a moment preserving method with vesselness filtering pre-segmentation (AT+VF); 3. Random forest classifier (RF); 4. Random forest classifier with vesselness filtering pre-segmentation (RF+VF).
Computation times are summarised in Table S3.

Extracting tumour ROIs using a 3D CNN
To analyse the tumour data in isolation from the surrounding tissue required delineation of tumour regions of interest (ROIs). To achieve this, we trained a 3D convolutional neural network (CNN) to fully automate extraction of tumour ROIs from PAI volumes. The 3D CNN is based on the U-Net architecture [57] extended for volumetric delineation [58]. Details on the CNN architecture and training are provided in the Supplementary Materials and Figures S9-S10.

Network structure and topological data analysis
Topological data analysis (TDA) of the vascular networks was performed using previously reported software that performs TDA and structural analyses on vasculature [33,36]. Prior to these analyses, segmented image volumes were skeletonised into 3D axisymmetric tubes using the open-source package Russ-learn [59,60]. Direct skeletonisation was performed using a trained convolutional-recurrent neural network [59] and post-processed using a homotopic thinning algorithm [61], followed by a pruning phase to remove artificial branches.
Our vascular descriptors comprised a set of statistical descriptors: vessel diameters and lengths, vessel tortuosity (sum-of-angles measure, SOAM) and curvature (chord-to-length ratio, CLR), In addition, the following descriptors were used to define network topology: the number of connected components (Betti number β 0 ) and looping structures (1D holes, Betti number β 1 ). Full descriptions of the vascular descriptors are provided in Table S1 while outputs are shown in Tables S4-S7.

Statistical analysis
Statistical analyses were conducted using Prism (v9, GraphPad Software, San Diego, CA, USA) and R (v4.0.1 [62], R Foundation, Vienna, Austria). We used the mean square error and R-squared statistics to quantify the accuracy and strength of the relationship between the segmented networks to the ground truth L-nets. For each outcome of interest, we predicted the ground truth (on a scale compatible with the normality assumption according to model checks) by means of each method estimates through a linear model. As model performance statistics are typically overestimated when assessing the model fit on the same data used to estimate the model parameters, we used bootstrapping (R = 500) to correct for the optimism bias and obtain unbiased estimates [63]. Bland-Altman plots were produced for each paired comparison of segmented volume to the ground truth volume in L-nets and associated bias and limits of agreement (LOA) are reported. For L-nets, F1 scores were calculated [64]. PAI quality pre-segmentation was quantified by measuring SNR, defined as the mean of signal over the standard deviation of the background signal. Comparisons of string volume, as well as SNR, were completed using one-way ANOVA with Tukey multiplicity correction.
For each outcome of interest, in vivo data was analysed as follows: A linear mixed effect model was fitted on a response scale (log, square root or cube root) compatible with the normality assumption according to model checks with the segmentation methods as a 4-level fixed predictor and animal as random effect, to take the within mouse dependence into account. Noting that the residual variance was sometimes different for each segmentation group, we also fitted a heteroscedastic linear mixed effect allowing the variance to be a function of the segmentation group. The results of the heteroscedastic model were preferred to results of the homoscedastic model when the likelihood ratio test comparing both models led to a p-value < 0.05. Two multiplicity corrections were performed to achieve a 5% family-wise error rate for each dataset: For each outcome, a parametric multiplicity correction on the segmentation method parameters was first used [65]. A conservative Bonferroni p-value adjustment was then added to it to account for the number of outcomes in the entire in vivo dataset. The following pairwise comparisons were considered: AT vs. AT+VF, AT vs. RF, RF vs. RF+VF and AT+VF vs. RF+VF. Comparisons of our vascular descriptors between ERand ER+ tumours were completed with an unpaired student's t-test. All p-values < 0.05 after multiplicity correction were considered statistically significant.