On the quantitative analysis of lamellar collagen arrangement with second-harmonic generation imaging

Second harmonic generation (SHG) allows for the examination of collagen structure in collagenous tissues. Collagen is a fibrous protein found in abundance in the human body, present in bones, cartilage, the skin, and the cornea, among other areas, providing structure, support, and strength. Its structural arrangement is deeply intertwined with its function. For instance, in the cornea, alterations in collagen organization can result in severe visual impairments. Using SHG imaging, various metrics have demonstrated the potential to study collagen organization. The discrimination between healthy, keratoconus, and crosslinked corneas, assessment of injured tendons, or the characterization of breast and ovarian tumorous tissue have been demonstrated. Nevertheless, these metrics have not yet been objectively evaluated or compared. A total of five metrics were identified and implemented from the literature, and an additional approach adapted from texture analysis was proposed. In this study, we analyzed their effectiveness on a ground-truth set of artificially generated fibrous images. Our investigation provides the first comprehensive assessment of the performance of multiple metrics, identifying both the strengths and weaknesses of each approach and providing valuable insights for future applications of SHG imaging in medical diagnostics and research.


Introduction
Collagen is the most abundant protein in the human body. This fibrous protein plays a critical role in tissue structure and function, and alterations in its organization are indicators of changes in tissue health. With the introduction of second harmonic generation (SHG) imaging, non-invasive in vivo imaging of collagen without labeling, fixation, and sectioning became possible. First applied to image collagenous tissues in 1986 by Freund and Deutsch [1], SHG was shown to be uniquely capable of imaging structures with non-centrosymmetric organization, such as collagen, at high resolution. Today, SHG is the de facto technique for imaging collagen and assessing tissue organization in healthy and pathological conditions. Its potential to aid in the diagnosis of dermatological diseases [2], to evaluate the influence of topical creams [3,4], or to differentiate between healthy and tumor tissue [5,6] has already been demonstrated. However, ophthalmology has been the main area of application. Collagen accounts for approximately 90% of corneal thickness and plays a key role in the shape, transparency, and optical properties of the cornea. Changes in collagen organization can severely affect vision and have been amply investigated by SHG [7]. The advantages of SHG have already been demonstrated in the diagnosis of corneal diseases [8–11] as well as in the evaluation of therapeutic procedures, such as corneal collagen crosslinking [12–16]. Nevertheless, image analysis is often qualitative.
Although several approaches have been proposed to quantify tissue organization from SHG images, this is still an emerging area.
SHG is a coherent process based on the electric polarization of the material by the electric field of the optical radiation [17,18], which converts two photons of equal energy into a single photon propagating in the same direction with half the wavelength (twice the energy) [17,19–21]. In collagenous tissues, due to their inherent dispersion and randomness, SHG is often described as a quasi-coherent process, i.e., second harmonic waves propagate in both forward and backward directions with respect to the excitation light. The calculated level of collagen organization is equivalent regardless of the detection configuration [22,23]. This enables in vivo imaging of biological tissues.
The magnitude of the backward detected SHG signals is strongly correlated with the collagen organization [24–28]. Therefore, the ratio of forward and backward detected SHG signals can be used to quantify the degree of tissue alignment [14]. However, forward detection of SHG is not feasible in biological tissues in vivo. Polarization-resolved SHG has also been used to quantify collagen organization [8,10,22,23,29,30]. Because the SHG signal depends on the polarization of the excitation light, stronger signals are detected when the collagen fibrils are aligned with that polarization. This approach requires successive images to be acquired from the same area, which greatly increases the acquisition time. Another limitation for in vivo application is subject movement during image acquisition.
Several image analysis-based methods have been proposed and implemented to extract information about collagen organization from forward and backward detected SHG images acquired with circularly polarized light. These include solutions based on the Fourier transform (FT) [9,11,13,31–37], structure tensor (ST) [12,38–40], and gray-level co-occurrence matrix (GLCM) analysis [2,5,37,41]. FT-based approaches are the most widely used for quantitative analysis of SHG images. FT analysis has been used to confirm the expected differences in organization levels between healthy, keratoconus, and crosslinked corneas [9,11], to compare the organization levels of corneal samples with different storage times [37], to assess changes in injured tendons [42], and to characterize breast [43] and ovarian [44,45] tumorous tissue. First introduced by Ávila et al. in 2015 to assess the organization of collagenous tissue [38], ST has already demonstrated the potential to differentiate between healthy and diseased corneas [40] and between crosslinked and non-crosslinked corneas [12], and to monitor corneal healing after chemical burns [39]. GLCM has been shown to be a viable tool for monitoring the decrease in collagen organization expected during corneal storage [37], in the evaluation of ovarian tumors [5,44], and in the characterization of healthy dermis and scar tissue [2].
Although all the metrics generated by FT, ST, and GLCM analysis have shown potential for studying collagen organization, they have been used to discriminate between groups and have not yet been objectively evaluated in images with known levels of organization, nor have their performances been compared. In this study, we performed a comprehensive analysis of the performance of different metrics in a set of artificially generated fibrous images. The different metrics have also been applied to SHG images of the human cornea to demonstrate their performance.

Artificially generated images
Collagen is a fibrous protein with a complex structural organization. It is composed of three peptide chains with a tertiary triple helix conformation that can crosslink to form quaternary filaments (fibrils). The arrangement of collagen fibrils is tissue specific. For instance, in the cornea, fibrils are regularly packed into lamellae, while in tendons and skin they further assemble into fibers that are highly organized along the longitudinal axis of the tendon or form an interwoven network, respectively [46–49].
Our goal was to simulate collagen fibrils and lamellae properties from cornea-like tissues as imaged by forward detected SHG while being able to individually control each parameter, from fibril location, size, orientation, and waviness to the number, size, and organization of the lamellae in a highly repeatable manner. However, to achieve this level of control and assess the sensitivity of different quantification methods to subtle organizational changes, we accepted a compromise between control and realism. Consequently, the artificial images we generated do not perfectly mirror actual SHG images.
To create the artificial fibrous images, we began by defining individual fibrils. These can be highly linear, wavy (smoothly curving in and out), or highly tortuous, depending on the condition. We chose to define each fibril as a sum of cosines, with each cosine restricted to a set of frequencies that characterize short, medium, and high frequency waviness. A binary value is associated with each cosine and controls whether a particular form of waviness is present or not. The amplitude of each cosine can also be independently controlled to vary the waviness strength.
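The sum-of-cosines construction can be sketched as follows. This is a minimal illustration and not the generator used in the study: the function name `fibril_profile`, the tuple layout, and all numerical values (amplitudes, frequencies, phases) are assumptions chosen only to show how a binary switch and an amplitude per waviness band shape a fibril centreline.

```python
import math

def fibril_profile(n_points, length, components):
    """Return (xs, ys) samples of one fibril centreline.

    components: list of (enabled, amplitude, frequency, phase) tuples, one per
    waviness band; `enabled` plays the role of the binary value in the text.
    """
    xs = [length * i / (n_points - 1) for i in range(n_points)]
    ys = []
    for x in xs:
        y = 0.0
        for enabled, amplitude, frequency, phase in components:
            if enabled:
                y += amplitude * math.cos(2.0 * math.pi * frequency * x + phase)
        ys.append(y)
    return xs, ys

# Hypothetical bands (cycles per pixel); the values are illustrative only.
straight = [(False, 4.0, 0.01, 0.0), (False, 2.0, 0.05, 0.0), (False, 1.0, 0.20, 0.0)]
wavy = [(True, 4.0, 0.01, 0.0), (False, 2.0, 0.05, 0.0), (False, 1.0, 0.20, 0.0)]

xs, ys_straight = fibril_profile(256, 256.0, straight)
_, ys_wavy = fibril_profile(256, 256.0, wavy)
```

With every switch off the fibril is a straight line; enabling a band adds waviness whose strength is bounded by that band's amplitude.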
We assemble fibrils into lamellae by joining them together. Each fibril is translated to a given x-y position along an arbitrary pre-determined path in image space and rotated to a given angle. The spacing between the fibrils can be independently controlled or randomly spread.
To simulate how the multilamellar structure of collagen is imaged, we create patches using Perlin noise [50]. By varying the Perlin noise parameters, we can control the number, density, and average size of these patches. To ensure a smooth transition between patches, neighboring patches are faded into each other using the distance transform. The fade distance can also be controlled independently. Both intra-lamella and inter-lamellae variance can be set, resulting in a highly versatile generator as shown in Fig. 1.
Fig. 1. Artificially generated fibrous images. The generator is highly flexible, producing misaligned or aligned fibrils, highly linear or tortuous, within a single lamella or a complex lamellar structure.

SHG images
A total of 15 SHG images of the human corneal stroma were provided by JenLab GmbH. Measurements were performed as described in [51]. Briefly, human corneal buttons, unsuitable for corneal transplantation, provided by the Lions Cornea Bank SaarLor-Lux, Trier/Westpfalz at the Department of Ophthalmology, Saarland University Medical Center, Homburg/Saar, Germany, were imaged using a near-infrared (NIR) 12-fs 5-D laser scanning microscope (JenLab GmbH, Jena, Germany) [52]. The samples were excited using a mode-locked titanium-sapphire laser (Integral Pro 400; Femtolasers Produktions GmbH, Vienna, Austria) generating 12-fs pulses with a bandwidth of about 95 nm (centered at 800 nm) at a frequency of 85 MHz. The laser pulses were focused on the sample through a 40×, NA 1.3 oil immersion objective. SHG signals were detected in forward geometry, via a 40×, NA 0.75 air objective, by a photomultiplier tube (PMT) detector (H7732; Hamamatsu Photonics, Hamamatsu, Japan). Autofluorescence signals and the laser light were blocked using 440 nm and 680 nm short-pass filters (FF01-440/SP-25 and FF01-680/SP-25, Semrock, Inc, New York, USA), respectively. The acquisition time for 512 × 512 pixel images was 7.4 seconds.

Besides these already proposed approaches, we evaluated an additional approach based on a log-Gabor filter bank, a popular method for feature extraction. All metrics were implemented in MATLAB Release 2021b (The MathWorks, Inc., Natick, Massachusetts, United States).

Fourier transform
The 2D FT decomposes the image into its frequency components and represents it as a weighted combination of vertical and horizontal sinusoids of different frequencies. We used the 2D fast Fourier transform algorithm to obtain the discrete FT of the images. To avoid cross-shaped artifacts in the frequency domain due to spectral leakage, the image is first multiplied by the Kaiser-Bessel apodizing window [53].
A low-pass filter is applied in the frequency domain to homogenize sampling in all directions. The magnitude of the frequency components is thresholded to identify significant frequencies. Then, an angular histogram is created that displays how often these significant magnitudes occur in each angular bin. Among the FT-based metrics, the standard deviation of the angular distribution (FT_STD) and the ratio of the radii of an ellipse fitted to this distribution (R1/R2) have been proposed [9,11,13,31–37]. We have computed and evaluated both metrics. For the FT_STD metric, only the [0, π] range was considered, and the polar-symmetric components were summed. Figure 2 illustrates the FT-based metrics.
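The thresholding and angular-histogram steps can be illustrated with a small pure-Python sketch. It is not the MATLAB implementation used in the study: it omits the Kaiser-Bessel window and the low-pass filter, uses a naive DFT that is only practical for tiny images, and the spread measure (a circular standard deviation for axial data) is one possible reading of "standard deviation of the angular distribution". The function names, the 18-bin histogram, and the 0.5 relative threshold are all assumptions.

```python
import cmath
import math

def dft2(image):
    """Naive separable 2D DFT of a square image (fine for tiny demo images)."""
    n = len(image)
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]  # twiddle factors
    rows = [[sum(row[x] * w[(u * x) % n] for x in range(n)) for u in range(n)]
            for row in image]
    return [[sum(rows[y][u] * w[(v * y) % n] for y in range(n)) for u in range(n)]
            for v in range(n)]

def angular_histogram(image, n_bins=18, rel_threshold=0.5):
    """Count significant frequency components per angle, folded onto [0, pi)."""
    n = len(image)
    spectrum = dft2(image)
    mags = {(u, v): abs(spectrum[v][u])
            for v in range(n) for u in range(n) if (u, v) != (0, 0)}  # skip DC
    cutoff = rel_threshold * max(mags.values())
    hist = [0] * n_bins
    for (u, v), m in mags.items():
        if m >= cutoff:
            su = u if u < n / 2 else u - n  # signed frequencies
            sv = v if v < n / 2 else v - n
            theta = math.atan2(sv, su) % math.pi  # sums polar-symmetric parts
            hist[int(theta / math.pi * n_bins) % n_bins] += 1
    return hist

def axial_circular_std(hist):
    """Circular standard deviation (radians) of an axial histogram on [0, pi)."""
    n_bins, total = len(hist), sum(hist)
    c = sum(h * math.cos(2 * (k + 0.5) * math.pi / n_bins) for k, h in enumerate(hist))
    s = sum(h * math.sin(2 * (k + 0.5) * math.pi / n_bins) for k, h in enumerate(hist))
    r = min(1.0, max(math.hypot(c, s) / total, 1e-12))  # clamp for log safety
    return 0.5 * math.sqrt(-2.0 * math.log(r))

n = 16
aligned = [[math.cos(2 * math.pi * 2 * x / n) for x in range(n)] for _ in range(n)]
crossed = [[math.cos(2 * math.pi * 2 * x / n) + math.cos(2 * math.pi * 2 * y / n)
            for x in range(n)] for y in range(n)]
spread_aligned = axial_circular_std(angular_histogram(aligned))
spread_crossed = axial_circular_std(angular_histogram(crossed))
```

A single stripe direction concentrates the significant frequencies in one angular bin and gives a spread near zero, whereas two perpendicular stripe families give a much larger spread.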

Structure tensor
The use of the structure tensor to quantify collagen organization from SHG images was first proposed by Ávila et al. in [38]. The structure tensor matrix describes the local gradients around a given point. The rationale is that a structure tensor matrix can be computed for each pixel in an image, allowing the distribution of fibril orientations to be inferred. The standard deviation of the preferential orientation distribution (ST_STD) and degree of isotropy (DoI) metrics have been computed and evaluated [12,38–40]. Figure 3 illustrates the structure tensor-based metrics.
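As an illustration of the idea, the sketch below builds a structure tensor from central-difference gradients summed over a small window and derives the dominant gradient orientation and a coherence value from its eigenvalues. The window radius, the function names, and the use of coherence, (λ1 − λ2)/(λ1 + λ2), as an anisotropy measure are assumptions; the DoI definition used in [38] may differ.

```python
import math

def structure_tensor(image, cy, cx, radius=2):
    """Sum gradient outer products over a (2*radius+1)^2 window around (cy, cx)."""
    jxx = jxy = jyy = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0  # central differences
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            jxx += gx * gx
            jxy += gx * gy
            jyy += gy * gy
    return jxx, jxy, jyy

def orientation_and_coherence(jxx, jxy, jyy):
    """Dominant gradient orientation in [0, pi) and coherence in [0, 1]."""
    theta = (0.5 * math.atan2(2.0 * jxy, jxx - jyy)) % math.pi
    trace = jxx + jyy                              # lambda1 + lambda2
    spread = math.hypot(jxx - jyy, 2.0 * jxy)      # lambda1 - lambda2
    coherence = spread / trace if trace > 0 else 0.0
    return theta, coherence

n = 16
stripes = [[math.cos(2 * math.pi * x / 8) for x in range(n)] for _ in range(n)]
grid = [[math.cos(2 * math.pi * x / 8) + math.cos(2 * math.pi * y / 8)
         for x in range(n)] for y in range(n)]

theta_s, coh_s = orientation_and_coherence(*structure_tensor(stripes, 8, 8))
theta_g, coh_g = orientation_and_coherence(*structure_tensor(grid, 8, 8))
```

The fibril direction is perpendicular to the gradient orientation returned here. Repeating the computation for every pixel yields the orientation distribution from which a spread such as ST_STD can be taken.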

Gray-level co-occurrence matrix
Originally proposed by Haralick et al. [54], GLCM analysis is a popular feature extraction method for texture analysis. The gray values of image pixel pairs with specific spatial relationships are tabulated, resulting in a matrix of co-occurrences from which local features can then be computed (Fig. 4). The GLCM was computed at four angles (0°, 45°, 90°, and 135°) using an optimized scale (see 2.4.1). Symmetry was accounted for, i.e., angles 180° apart are considered equal. Over time, several metrics to convey texture information from the GLCM have been proposed, such as correlation, energy, homogeneity, inertia, and entropy [2,41]. Correlation is the feature most frequently employed in the literature for quantifying collagen organization; thus, it is the metric we chose [2,5,37,41]. The standard deviation of the correlation over all directions was used as the metric.
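The tabulation and the correlation feature can be sketched in a few lines. This is a simplified illustration rather than the study's implementation: the offset table, the function names, the two-level test image, and the use of the population standard deviation across the four directions are assumptions, and `distance` stands in for the optimized scale.

```python
import math
import statistics

# Pixel offsets (dy, dx) for the four directions at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, angle, distance, levels):
    """Symmetric, normalised co-occurrence matrix for one direction."""
    dy, dx = (distance * o for o in OFFSETS[angle])
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                a, b = image[y][x], image[y2][x2]
                counts[a][b] += 1.0
                counts[b][a] += 1.0  # angles 180 degrees apart count as equal
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]

def correlation(p):
    """Haralick correlation of a normalised GLCM (undefined for flat images)."""
    idx = [(i, j) for i in range(len(p)) for j in range(len(p))]
    mu_i = sum(i * p[i][j] for i, j in idx)
    mu_j = sum(j * p[i][j] for i, j in idx)
    var_i = sum((i - mu_i) ** 2 * p[i][j] for i, j in idx)
    var_j = sum((j - mu_j) ** 2 * p[i][j] for i, j in idx)
    cov = sum((i - mu_i) * (j - mu_j) * p[i][j] for i, j in idx)
    return cov / math.sqrt(var_i * var_j)

# Vertical stripes, two pixels wide, with two gray levels.
stripes = [[(x // 2) % 2 for x in range(8)] for _ in range(8)]
per_angle = {a: correlation(glcm(stripes, a, 1, 2)) for a in OFFSETS}
glcm_metric = statistics.pstdev(list(per_angle.values()))
```

For this striped image the correlation is 1 along the stripes (90°) and much lower across them, so the spread over directions is large; an isotropic texture would give similar correlations in all directions and a small spread.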

Log-Gabor filter bank
Locally, fibrils are expected to maintain their intensity along a given direction. A bank of log-Gabor filters contains a set of kernels, each with a unique orientation-scale pair. Convolving the image with a bank of 2D log-Gabor filters yields a maximum response when the local fibril scale and orientation match that of the kernel. A log-Gabor filter bank performs a robust multiresolution decomposition of the image that captures scale and orientation information. These filters are created by combining a radial and an angular component that constrain the frequency bands and orientation of the filter, respectively, and are known to be robust to perturbations such as illumination changes and noise [55]. After convolution, we computed the orientation distribution over the different scales (Fig. 5). To quantify the organization, we first selected the scale with the maximum response, computed the sum of the response to each orientation, and then computed the standard deviation of the response orientation distribution (LG_STD).
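The radial and angular components can be written down compactly in the frequency domain. The sketch below uses a commonly cited log-Gabor form (a Gaussian on a logarithmic frequency axis multiplied by a Gaussian in angle); the parameter names and default values (`sigma_ratio`, `sigma_theta`, `f_max`, `mult`) are assumptions and are not the settings used in the study.

```python
import math

def log_gabor(freq, theta, f0, theta0, sigma_ratio=0.55, sigma_theta=math.pi / 8):
    """Transfer function value at radial frequency `freq` and angle `theta`."""
    if freq <= 0:
        return 0.0  # a log-Gabor filter has no DC component
    # Radial part: Gaussian on a log-frequency axis centred on f0.
    radial = math.exp(-(math.log(freq / f0) ** 2) / (2.0 * math.log(sigma_ratio) ** 2))
    # Angular part: Gaussian in the wrapped angular distance to theta0.
    d = math.atan2(math.sin(theta - theta0), math.cos(theta - theta0))
    angular = math.exp(-(d ** 2) / (2.0 * sigma_theta ** 2))
    return radial * angular

def build_bank(n_scales=4, n_orientations=6, f_max=0.25, mult=2.0):
    """Centre (frequency, orientation) pairs, one per kernel in the bank."""
    return [(f_max / mult ** s, o * math.pi / n_orientations)
            for s in range(n_scales) for o in range(n_orientations)]

bank = build_bank()
peak = log_gabor(0.125, math.pi / 6, 0.125, math.pi / 6)
off_axis = log_gabor(0.125, math.pi / 2, 0.125, math.pi / 6)
```

Each kernel responds most strongly where the image frequency content matches its centre frequency and orientation, which is what makes the per-orientation response sums usable as an orientation distribution.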

Evaluating organization
The very first issue in evaluating collagen organization is to define it. Through the visualization of tissues with highly organized collagen fibrils (such as the human cornea), we can consider as organized a tissue with regular and parallel collagen fibrils arranged in a single lamella. It follows that any decrease in fibril alignment within each lamella, or an increase in either fibril tortuosity or lamellar complexity (number of lamellae per area), should result in a decrease in the degree of organization. Our goal is to evaluate these three components (intra-lamellar fibril orientation, tortuosity, and complexity of lamellar organization) separately, as well as the interaction between them. We designed 7 scenarios, namely:
• Ideal: aligned, non-tortuous fibrils in a single lamella.
• Low Fibril Alignment: non-tortuous fibrils in a single lamella with a set level of fibril alignment.
• Random Fibril Alignment: non-tortuous fibrils in a single lamella with a random level of fibril alignment.
• High Tortuosity: aligned fibrils in a single lamella with a set level of fibril tortuosity.
• Random Tortuosity: aligned fibrils in a single lamella with a random level of fibril tortuosity.
• Complex Lamellae Structure: aligned, non-tortuous fibrils in multiple lamellae with multiple patches.
• Random Lamellae Structure: aligned, non-tortuous fibrils in a randomly complex lamellae structure.
For each scenario, three scales, each consisting of 100 images with increasing levels of misalignment, tortuosity, or lamellar complexity, were then generated. Conflicting combinations of scale and scenario were not considered; for example, an increasing tortuosity scale conflicts with the high tortuosity scenario. The metrics described in the previous section were then computed for each generated image. The Spearman rank correlation between the scales and the metrics was computed per scale and scenario as a measure of how well the metric represents the variation in the scale. The Spearman rank correlation was chosen because it does not presume the nature of the relationship (linear vs. non-linear): we assume only that each image in the scale is less organized than the previous one. This process was repeated 20 times to assess variability. The mean and standard deviation of the correlation coefficient were then calculated and compared.
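The motivation for a rank-based correlation can be shown with a short example. The sketch below implements the Spearman coefficient as the Pearson correlation of average ranks and applies it to a hypothetical metric that grows exponentially along a 100-step scale; the metric values are invented for illustration and are not results from the study.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

scale = list(range(1, 101))                   # 100 images of decreasing organization
metric = [math.exp(0.05 * s) for s in scale]  # hypothetical monotonic, non-linear metric

rho = spearman(scale, metric)
r = pearson(scale, metric)
```

Because the hypothetical metric is strictly monotonic, the rank correlation is 1 even though the relationship is far from linear, whereas the Pearson coefficient is noticeably lower.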

Parameter optimization
The quantification of the organization is significantly affected by the parameterization of the described metrics. Choosing appropriate parameters, such as filter size and histogram sampling, is critical to comparing metric performance. However, optimizing them for each and every scenario is impractical and ultimately unattainable in real-world situations where ground truth does not exist. To standardize the comparison, prior to any evaluation, the parameters for each metric were optimized for the ideal scenario of each of the three components: a grid search was performed to select the combination that maximized the correlation. This results in a different optimization for each component. The selected parameterization was then fixed for all other scenarios of the component.

Invariance
In addition to the ability to reflect changes in organization, invariance to external conditions such as noise, scale, and rotation is also a desirable characteristic. To evaluate these three invariances, three scales consisting of 100 images with progressive noise level (additive Gaussian noise), magnification, and rotation angle were generated for each of the low fibril alignment, high tortuosity, and complex lamellae structure scenarios. The mean absolute deviation (MAD) was the metric of choice to evaluate invariance. A lower MAD indicates better performance.
Since different quantities are being compared, standardization is required. To make results comparable between metrics, we normalize the range between the values v_0 and v_1 to a [0, 1] scale, where v_0 and v_1 are the quantifiers averaged over 20 repetitions for the most and least organized images of the ideal scenario, respectively. This approach is analogous to min-max scaling. However, because standard min-max scaling uses the minimum and maximum values directly and is therefore susceptible to outliers, we use the average values of the quantifiers at the minimum and maximum levels of organization of the ideal scenario instead of the extreme values. For example, for the low fibril alignment scenario, the range calculated from the ideal scenario of the intra-lamellar fibril orientation component was used for standardization. Invariance quantification was repeated 20 times to assess variability.

Quantification of organization on SHG images
To analyze the real SHG images, we computed all six metrics per image. The results were then normalized using min-max scaling to ensure comparability. Then, the overall average value for each image was computed for unified assessment. Figure 6(A) shows four examples of real SHG images ranked by the overall average. Figure 6(B) illustrates the performance of each metric, displaying both individual normalized metric values and the overall average versus its ranking.

Notably, all pairwise correlations between metrics exceeded 0.67, except for the comparison between GLCM and DoI, indicating strong consistency. Additionally, all metrics demonstrated a high correlation with the overall average, with values above 0.85, underscoring their reliability in assessing the collagen organization in the images.

Organization
Figure 7(A)-(C) shows the correlation results for the three tested organization components (intra-lamellar fibril orientation, tortuosity, and lamellar complexity, respectively). Detailed results can be found in Supplement 1, S1 to S3. As shown, all evaluated metrics are adept at the proposed task (assessing changes in the organization), achieving a high correlation coefficient in many of the tested scenarios.
As mentioned above, FT-based methods are by far the most popular in the literature. We evaluated two FT-based metrics, FT_STD and R1/R2. Despite its widespread use, R1/R2 clearly underperformed, being outclassed by FT_STD in 14 out of 15 evaluated combinations. Figure 8(A) shows the FT_STD and R1/R2 evolution with an increased fibril misalignment (decreasing organization). As shown, although both metrics behave similarly overall, R1/R2 is more erratic at points, leading to a decreased correlation (0.412 ± 0.087 vs 0.824 ± 0.026). FT_STD is particularly adept at evaluating tortuosity, achieving the highest correlation coefficient of all metrics in all scenarios except the ideal one and random alignment.
We also evaluated two ST-based metrics, DoI and ST_STD. Both metrics were well rounded and outperformed both FT-based metrics in several combinations. In particular, DoI proved to be one of the best performing metrics, outperforming all others in 2 out of 5 scenarios for fibril misalignment and 5 out of 5 for lamellar complexity. However, it appears to be particularly poor at assessing tortuosity. The comparison between DoI and ST_STD shown in Fig. 8(B) seems to explain this behavior. As shown, both DoI and ST_STD are insensitive to the initial increase in tortuosity and exhibit a sluggish behavior. This is particularly evident for ST_STD, resulting in a lower correlation coefficient in this scenario compared to DoI (0.739 ± 0.044 vs. 0.981 ± 0.004). This behavior is evident in all tortuosity scenarios.
The other standout performer was the log-Gabor based metric. It outperformed all others in two scenario-component combinations and showed very few weaknesses, being among the top performers in all but one combination. Another method that has already shown potential in the literature is GLCM. The implemented GLCM-based metric is well rounded, but it is still mostly outperformed by FT_STD, ST_STD, and LG_STD. Figure 8(C) shows the LG_STD and GLCM evolution with an increased lamellae complexity (decreasing organization). For the represented scenario, LG_STD slightly outperformed GLCM (0.701 ± 0.055 vs 0.631 ± 0.055).


Invariance
Fig. 9A-C shows the normalized MAD results for the three invariances tested (noise, scale, and rotation, respectively). Detailed results can be found in the supplementary Tables S4 to S6.
The assessment of collagen organization should not be influenced by external factors that cannot be controlled in real-world situations, such as image noise and differences in sample orientation. Scale can be partially controlled. However, an identical level of organization should be achieved within a uniform sample region, regardless of the size of the imaged area.
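As a rough illustration of how such an invariance could be scored, the sketch below computes a mean absolute deviation (MAD) of one metric over a series of perturbed versions of the same image, and then the inverted score max(1 − MAD, 0) used for visualization in Fig. 10. The normalization by the absolute mean is an assumption made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def normalized_mad(values):
    """Mean absolute deviation around the mean, divided by |mean|.

    The division by |mean| is an assumed normalization, chosen only so
    that metrics with different ranges can be compared in this sketch.
    """
    values = np.asarray(values, dtype=float)
    center = values.mean()
    mad = np.abs(values - center).mean()
    if center == 0:
        return float("inf")
    return float(mad / abs(center))

def invariance_score(values):
    """Inverted invariance performance, max(1 - MAD, 0): higher is better."""
    return max(1.0 - normalized_mad(values), 0.0)

# A metric that barely changes under perturbation scores close to 1.
stable = [0.80, 0.81, 0.79, 0.80]
unstable = [0.20, 0.90, 0.40, 1.10]
print(invariance_score(stable), invariance_score(unstable))
```

A perfectly invariant metric returns the same value for every perturbed image, giving a MAD of 0 and an inverted score of 1.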
FT-based methods proved to be very sensitive to variations in the noise level, scale, and rotation. FT STD and R1/R2 are the two worst performing metrics in 5 out of 9 scenarios. At high tortuosity, both methods are outperformed by all other metrics for noise, scale, and rotation invariances. This is also the case for scale and rotation invariance at low alignment. When considering complex lamellar scenarios, the FT-based methods perform best, with R1/R2 being the metric least affected by the tested external factors. Nevertheless, invariance performance must be weighed against correlation performance. Note that this metric is the worst performer for lamellar complexity (Fig. 7C), which partly explains the invariance achieved.
ST-based metrics were well rounded, outperforming all other metrics in multiple scenarios. ST STD proved to be the least sensitive metric to noise, scale, and rotation overall, being among the top performers and outperforming all other metrics in 6 out of 9 scenarios. Note that although ST STD outperforms all other metrics in high tortuosity invariance scenarios, it shows low sensitivity to the level of tortuosity in Fig. 7B. DoI shows good invariance to scale and rotation for low alignment and high tortuosity. Nonetheless, it seems to be particularly sensitive to noise in all scenarios, and to scale and rotation with high lamellar complexity (worst performer in this scenario for all invariances).
Once again, the log-Gabor based metric is a standout performer. It is well rounded, being among the top two performers in most of the scenarios (top performer in 2 out of 9). Similar to DoI, the sensitivity of LG STD to noise, scale, and rotation increases drastically in scenarios with high lamellar complexity. Although the performance of GLCM is comparable to that of LG STD, GLCM consistently performs worse in head-to-head comparisons, except for the scenarios with complex lamellar structure.

Discussion
The quantitative analysis of collagen arrangement may serve as a valuable biomarker of pathology. Metrics derived from FT, ST, and GLCM analysis have been shown to discriminate between normal and diseased subjects. However, implementation of these metrics in real-world settings requires an understanding of their strengths, limitations, and potential sources of bias. To date, quantification in images with established levels of organization has not been objectively evaluated, nor has the performance of proposed approaches been compared. In this study, we took steps toward that goal.
We demonstrated the application of these various metrics in real forward-detected SHG images of human corneas. The analysis revealed a high degree of correlation, both between pairs of metrics and between each metric and the overall average. These results highlight their consistency and agreement. It is important to recognize that while a strong inter-metric correlation is indicative of methodological consistency, it does not fully determine their performance. In fact, this level of correlation underscores the importance of having a repeatable and reliable ground truth for accurate analysis, and emphasizes the need for further evaluation to fully determine their performance.
The use of artificially generated images allowed us to control all features of the image, providing the highly repeatable and reliable ground truth needed. It is important to note that these images do not represent true SHG collagen images. The trade-off between control and verisimilitude was necessary to determine whether the quantification approaches were sensitive to minute variations in the level of organization.
Before even beginning to evaluate the quantification of organization in collagenous tissues, the concept of collagen organization itself had to be defined. Although there is a clear picture of what an image of organized fibrils looks like, there are several ways in which fibrils can deviate from this organized appearance. It is important to understand that the healthy level of collagen organization can vary between different tissues, and its pathological behavior depends on a variety of factors. Thick, randomly arranged, tortuous collagen fibrils are characteristic of both a healthy dermis and an unhealthy cornea. For a comprehensive analysis of the performance of each metric, we considered the highly organized collagen observed in healthy corneal tissue as the baseline of organized fibrils. In healthy corneas, collagen fibrils are highly regular, ultimately arranged in a parallel and orderly manner in wide lamellae that run parallel to each other across the entire surface of the cornea.
Clearly, not all forms of disorganization/organization are possible in all tissues. Nevertheless, we wanted to be comprehensive. We identified three pathways of collagen disarray: fibril misalignment, fibril tortuosity, and lamellar complexity, and created multiple scenarios to understand how these pathways interfere with one another. Our results can be used to draw individual conclusions for a specific form as well as more general conclusions.
Organization assessment in the literature assumes that pixel intensity is consistently higher than the background along each fibril. Approaches evaluate the orientation of line-like features and quantify their variation over the imaged area. The discrepancies arise from the methods used to approximate the values. We envision three probable sources of interference/bias. Although extreme conditions will always impact results, the perfect organizational metric should be unaffected by minor changes in noise, scale, or rotation. Consequently, we conducted tests to assess these invariances.
A total of 5 metrics were identified and implemented from the literature. An additional approach based on a log-Gabor filter bank, a popular method for feature extraction, was also implemented. All evaluated metrics were shown to be adept at assessing changes in fibril organization, achieving a high correlation coefficient in many of our tests. Fig. 10 summarizes the results obtained. Although we recognize that the temptation to identify a single, optimal metric exists, we do not believe in a one-size-fits-all solution. Our findings demonstrate that no single metric outperforms all others in every situation. As shown, both the ST STD and the log-Gabor based metric are strong in almost every component. GLCM results are similar to LG STD, but still inferior. Although FT STD performs well on assessing organization, FT-based metrics perform poorly on invariance. The parameterization of each metric was optimized for general performance (as assessed in the ideal scenarios) and may not be ideally suited for invariance analysis. It is possible that a different set of parameters for the FT-based metrics could lead to improved invariance performance. However, according to our methodology, these would not be optimal for organizational level assessment. It is important to note that the methodological approach remained consistent across all metrics evaluated in this study, suggesting that the observed limitation may be inherent to the FT-based methodology used.
In terms of computation time, our analysis reveals that FT-based metrics are the quickest (∼0.04 seconds), followed by both ST and GLCM-based metrics (over 0.2 seconds). As expected, the convolution with multiple filters necessary for the LG STD metric leads to a considerably slower computation time (over 5 seconds). Of note, we did not optimize any of the implementations for speed, suggesting that there is potential for considerable improvement in computation times. For instance, pre-computing elements such as filters could yield significant efficiency gains for large-scale applications. Nonetheless, especially when faced with time constraints, the faster computation time of FT-based and, to a lesser extent, ST-based metrics could become an important advantage.
In this study, we have highlighted the strengths and limitations of the most used metrics for assessing collagen organization in tissues. Based on these results, informed decisions can be made to select the most appropriate approach for a particular application. The concurrent use of complementary metrics to better capture the distinct aspects of collagen organization is a strong proposal for a comprehensive analysis that may lead to improved accuracy.
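The idea of pre-computing filters can be sketched as follows. The example builds a frequency-domain log-Gabor filter bank once per image size, caches it, and reuses it for every subsequent image of that size. It uses the standard textbook form of the log-Gabor filter (a log-Gaussian radial term multiplied by a Gaussian angular term); all parameter values and function names are hypothetical and do not reproduce the implementation or parameterization used in this study.

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=8)
def log_gabor_bank(shape, n_scales=3, n_orientations=6,
                   min_wavelength=4.0, mult=2.0,
                   sigma_on_f=0.55, sigma_theta=0.4):
    """Build (and cache) a frequency-domain log-Gabor filter bank.

    All default parameter values are illustrative assumptions.
    Returns a read-only array of shape (n_scales * n_orientations, rows, cols).
    """
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.hypot(fx, fy)
    radius[0, 0] = 1.0                      # avoid log(0) at the DC term
    theta = np.arctan2(fy, fx)
    filters = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)   # center frequency
        radial = np.exp(-np.log(radius / f0) ** 2
                        / (2.0 * np.log(sigma_on_f) ** 2))
        radial[0, 0] = 0.0                  # log-Gabor filters have no DC response
        for o in range(n_orientations):
            angle = o * np.pi / n_orientations
            # Angular distance wrapped to [-pi, pi].
            dtheta = np.arctan2(np.sin(theta - angle), np.cos(theta - angle))
            angular = np.exp(-dtheta ** 2 / (2.0 * sigma_theta ** 2))
            filters.append(radial * angular)
    bank = np.stack(filters)
    bank.setflags(write=False)              # the cached array is shared
    return bank

def filter_responses(image):
    """Amplitude of each filter response, reusing the cached bank."""
    image = np.asarray(image, dtype=float)
    bank = log_gabor_bank(image.shape)
    spectrum = np.fft.fft2(image)
    return np.abs(np.fft.ifft2(spectrum[None, :, :] * bank))

rng = np.random.default_rng(0)
for _ in range(3):                          # the bank is built only once
    responses = filter_responses(rng.random((64, 64)))
print(responses.shape, log_gabor_bank.cache_info())
```

With this layout, the cost of constructing the filters is paid once per image size, and each additional image requires only one forward FFT and one batched inverse FFT.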

Fig. 1. Artificially generated fibrous images - The generator is highly flexible, producing fibrils that are misaligned or aligned, highly linear or tortuous, and within a single lamella or a complex lamellar structure.

Fig. 2. Fourier Transform based metrics illustrated - From left to right, (A) an artificially generated fibrous image with principal orientation superimposed (colored according to the color bar), (B) the respective angular distribution with fitted ellipse (gray dashed line) in polar coordinates, and (C) the distribution over the [0, π] range with polar-symmetric components summed.

These filters are created by combining a radial and an angular component that constrain the frequency bands and orientation of the filter, respectively, and are known to be robust to

Fig. 3. Structure Tensor based metrics illustrated - From left to right, (A) an artificially generated fibrous image with preferential orientation superimposed (colored according to the color bar), (B) the respective preferential orientation distribution, and (C) the degree of isotropy (colored according to the color bar).

Fig. 5. Log-Gabor based organization metric illustrated - On the left, (A) an artificially generated fibrous image with response-based orientation superimposed (colored according to the color bar); on the right, (B) the respective response orientation distribution over the different scales.


Fig. 6A shows four examples of real SHG images ranked by the overall average. Fig. 6B illustrates the performance of each metric, displaying both individual normalized metric values and the overall average versus its ranking. Fig. 6C shows the Spearman rank correlation, showcasing the relationship between each pair of metrics and their collective average. Notably, all correlations exceeded 0.67, except for the comparison between GLCM and DoI, indicating strong consistency. Additionally, all metrics demonstrated a high correlation with the overall average, with values above 0.85, underscoring their reliability in assessing the collagen organization in the images.

Fig. 6. Analysis of SHG Images - In A, examples of real SHG images ordered by ascending levels of organization as determined by the overall average of all metrics. In B, the normalized values of the six metrics computed for each image and the overall average plotted against its corresponding ranking. In C, the Spearman rank correlation between each metric pair, and the correlation with the overall average, is represented (colored according to the color bar).

Fig. 7A-C shows the correlation results for the three tested organization components (intra-lamellar fibril orientation, tortuosity, and lamellar complexity, respectively). Detailed results can be found in the supplementary Tables S1 to S3. As shown, all evaluated metrics are adept at the proposed task (assessing changes in the organization), achieving a high correlation coefficient in many of the tested scenarios.

Fig. 7. Correlation results - A-C show the Spearman's rank correlation coefficient between the computed metrics and the ground truth organization level for each scenario. From A to C, the intra-lamellar fibril misalignment, tortuosity level, and lamellar complexity were incrementally increased over 100 images. The correlation assesses how sensitive each metric is to changes in organization. The higher, the better. All quantifications were repeated 20 times to assess variability. Average values are shown. Standard deviation values are shown as error bars.


Fig. 8. Quantification vs organization - Examples of organization level quantification. All quantifications were repeated 20 times to assess variability. Average curves are shown. Standard deviation (STD) is shown as colored bands. Average Spearman's rank correlation ± STD is shown in the plot legends. In A, the Fourier transform (FT) based metrics are plotted against fibril alignment for the complex lamellae structure scenario. In B, the structure tensor (ST) based metrics are plotted against tortuosity level for the ideal scenario. In C, the log-Gabor and gray-level co-occurrence matrix (GLCM) based metrics are plotted against lamellar complexity for the high tortuosity scenario.


Fig. 9. Invariance results - A-C show the normalized mean absolute deviation (MAD) of the calculated metrics. From A to C, the added noise level, scale, and rotation were incrementally changed over 100 images. The MAD assesses how sensitive each metric is to these changes. The lower, the better. All quantifications were repeated 20 times to assess variability. Average values are shown. Standard deviation values are shown as error bars.

Fig. 10. Quantification summary - In A, the average performance per organization component and invariance is shown for all the tested metrics. In B, the same results are shown individualized per metric. For better visualization, in both A and B, the invariance performance has been inverted (max(1 − MAD, 0)).