Dictionary learning technique enhances signal in LED-based photoacoustic imaging.

There has been growing interest in low-cost light sources such as light-emitting diodes (LEDs) as excitation sources in photoacoustic imaging. However, LED-based photoacoustic imaging is limited by low signal due to the low energy per pulse: the signal is easily buried in noise, leading to low-quality images. Here, we describe a signal de-noising approach for LED-based photoacoustic signals based on dictionary learning with an alternating direction method of multipliers. This signal enhancement method is followed by a simple reconstruction approach, delay and sum. The approach leads to a sparse representation of the main components of the signal. The main improvements are a 38% higher contrast ratio and a 43% higher axial resolution versus the averaging method, but with only 4% of the frames and consequently 49.5% less computational time. This makes it an appropriate option for real-time LED-based photoacoustic imaging.


Introduction
Photoacoustic imaging (PAI) is a non-invasive hybrid imaging modality with tremendous potential in structural, functional, and molecular imaging for pre-clinical and clinical applications such as brain mapping, tumor detection, cancer staging, tissue vasculature, and oral health [1][2][3][4][5][6][7][8][9][10]. PAI combines optical and ultrasound imaging through the photoacoustic effect, achieving the good contrast and spectral behavior of optical imaging together with the spatial and temporal resolution of ultrasound imaging [11][12][13][14]. In PAI, the tissue is illuminated by a 5-100 ns light pulse; the absorbed optical energy causes a local temperature rise and subsequent thermal expansion, which generates wideband ultrasound waves.
The acoustic signal is directly proportional to the optical fluence, and thus the optical excitation source is a key component of PAI systems. Pulsed lasers are common in PAI and offer powers just below the ANSI limit for strong PAI signal generation. However, these lasers are also bulky, expensive, and delicate. Thus, recent efforts have focused on low-cost light sources such as pulsed laser diodes (PLDs) and light-emitting diodes (LEDs) to further facilitate widespread clinical […]

[…] $x$ can be approximated as $x \approx D\alpha$, where $\alpha \in \mathbb{R}^k$ is a sparse vector, with the fewest nonzero entries, containing the representation coefficients of $x$ over the dictionary $D$. Therefore, the sparse representation problem could be solved as the following optimization problem:

$$\min_{\alpha, D} \|\alpha\|_0 \quad \text{s.t.} \quad \|x - D\alpha\|_2^2 \le \epsilon \tag{1}$$

where $\|\alpha\|_0$ is the $\ell_0$ (zero) norm of $\alpha$, i.e., the number of nonzero values in the vector $\alpha$, and $\epsilon$ is the allowed representation error. With a proper regularization parameter $\lambda$, Eq. (1) can be converted into an unconstrained problem [52]:

$$\hat{\alpha} = \arg\min_{\alpha, D} \|x - D\alpha\|_2^2 + \lambda \|\alpha\|_0 \tag{2}$$
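To make the sparse-coding part of Eq. (2) concrete with the dictionary held fixed, the sketch below uses iterative hard thresholding (IHT), a standard solver for ℓ0-penalized least squares. It is swapped in here for illustration only and is not the MM sparse-coding step used in DLASD; the function names and iteration count are hypothetical.

```python
import numpy as np


def hard_threshold(z, thresh):
    """Keep entries with |z_i| > thresh and zero the rest (prox of the l0 penalty)."""
    return np.where(np.abs(z) > thresh, z, 0.0)


def iht_sparse_code(x, D, lam, n_iter=100):
    """Approximately solve min_a ||x - D a||_2^2 + lam * ||a||_0 with D fixed.

    Iterative hard thresholding: a gradient step on the quadratic data term,
    followed by the proximal (hard-thresholding) step of the l0 penalty.
    """
    # The gradient -2 D^T (x - D a) is Lipschitz with constant 2 * ||D||_2^2.
    t = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)
    # The prox of t*lam*||a||_0 keeps a_i only where a_i^2 / 2 > t * lam.
    thresh = np.sqrt(2.0 * t * lam)
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * D.T @ (x - D @ a)
        a = hard_threshold(a - t * grad, thresh)
    return a
```

For an orthonormal $D$, the first step already reduces to hard thresholding of $D^T x$ at $\sqrt{\lambda}$, which is the closed-form minimizer of Eq. (2) for a fixed orthonormal basis; a learned, overcomplete $D$ needs the iterations.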

Alternating direction method of multipliers (ADMM)
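The ADMM is a candidate solver for convex problems. It is a simple but powerful algorithm that solves a convex optimization problem by breaking it into smaller sub-problems, and it has recently been used in several different areas [53,54]. The ADMM combines two main ideas: dual decomposition and augmented Lagrangian methods for constrained problems [55]. The ADMM is designed to solve separable convex problems of the form:

$$\min_{x, y} \; f(x) + g(y) \quad \text{s.t.} \quad Ax + By = c \tag{3}$$

where $f$ and $g$ are convex functions, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, and $c \in \mathbb{R}^p$. The augmented Lagrangian for Eq. (3) can be written as:

$$L_\rho(x, y, \lambda) = f(x) + g(y) + \lambda^T (Ax + By - c) + \frac{\rho}{2} \|Ax + By - c\|_2^2 \tag{4}$$

where $\rho > 0$ is a penalty parameter and $\lambda$ is the Lagrange multiplier. Equation (4) is solved in three steps: an x-minimization and a y-minimization, each of which can be split into N separate problems, followed by an update of the multiplier $\lambda$:

$$\begin{aligned} x^{k+1} &:= \arg\min_x L_\rho(x, y^k, \lambda^k), \\ y^{k+1} &:= \arg\min_y L_\rho(x^{k+1}, y, \lambda^k), \\ \lambda^{k+1} &:= \lambda^k + \rho\,(Ax^{k+1} + By^{k+1} - c). \end{aligned} \tag{5}$$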
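The three-step update of Eq. (5) can be seen on a small textbook problem. This minimal sketch, assuming NumPy, applies unscaled ADMM to ℓ1-regularized least squares (not the DLASD dictionary update): in the notation of Eq. (3), $f(x) = \tfrac{1}{2}\|Hx - b\|_2^2$, $g(y) = \tau\|y\|_1$, $A = I$, $B = -I$, and $c = 0$. The names `H`, `tau`, and `admm_lasso` are hypothetical.

```python
import numpy as np


def soft_threshold(v, k):
    """Proximal operator of k * ||.||_1 (element-wise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def admm_lasso(H, b, tau, rho=1.0, n_iter=500):
    """Unscaled ADMM for min 0.5*||H x - b||^2 + tau*||y||_1 subject to x - y = 0.

    f(x) = 0.5*||H x - b||^2, g(y) = tau*||y||_1, A = I, B = -I, c = 0.
    Each loop pass performs the x-, y-, and multiplier steps in that order.
    """
    n = H.shape[1]
    y = np.zeros(n)
    lam = np.zeros(n)  # Lagrange multiplier lambda
    # x-step: setting the gradient of L_rho to zero gives
    # (H^T H + rho I) x = H^T b + rho y - lam, with a fixed system matrix.
    M = H.T @ H + rho * np.eye(n)
    Htb = H.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(M, Htb + rho * y - lam)   # x-minimization
        y = soft_threshold(x + lam / rho, tau / rho)  # y-minimization
        lam = lam + rho * (x - y)                     # multiplier update
    return y
```

With `H` equal to the identity, the problem has the closed-form solution `soft_threshold(b, tau)`, which makes a convenient check that the iterations converge to the right point.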

Dictionary learning assisted signal de-noising (DLASD)
To address LED-based photoacoustic signal de-noising, the PA signal is modeled as an observed noisy signal $s$ defined as:

$$s = x + n \tag{6}$$

Here, $x$ is the desired signal and $n$ denotes the observation noise, which is bounded as $\|n\|_2 \le \epsilon$. The photoacoustic signal de-noising process estimates $x$ from the observation $s$. Considering that the photoacoustic signal has a sparse representation over the dictionary $D$ as $x = D\alpha$, the de-noising problem could be written as:

In the dictionary learning phase, the dictionary is updated during the learning process according to the decomposed signal. Because the dictionary adapts to the properties of the decomposed signals, it can lead to sparser coefficients than fixed dictionaries such as the wavelet transform. Here, ADMM is proposed for dictionary learning because designing an appropriate dictionary plays an important role in the ideal recovery of a signal [56]. The Lagrange function of dictionary learning based on Eqs. (7) and (4) is obtained as:

Here, the operator $\langle \Lambda_i, (x - D\alpha)_i \rangle$ denotes the trace of the matrix $\Lambda^T (x - D\alpha)$, where $\Lambda$ is the Lagrange multiplier matrix. The ADMM algorithm is applied via this equation, and the majorization-minimization (MM) algorithm [57] is used to obtain the coefficients. Finally, the updated dictionary is obtained as below:

The first step is based on the given initial dictionary $D_0$ and the training matrix. $D_0$ is formed from column vectors of length $n$ chosen randomly from the given signal. The input signal is the raw data detected by the ultrasound transducer, decomposed into many patches. The proposed method was applied to the original signal without any pre-processing, and DLASD does not require any prior knowledge about the characteristics of the signal or the training data. Here, the MM algorithm is used to implement the sparse coding and obtain the coefficient vector $\alpha$.
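The patch decomposition and random initialization described above can be sketched as follows. This is a minimal sketch, assuming NumPy, overlapping patches with stride 1, unit-norm atoms, and overlap-averaging to rebuild the signal; the patch length `n`, the number of atoms `k`, and all function names are hypothetical, since the text does not specify them.

```python
import numpy as np


def extract_patches(signal, n):
    """Stack all overlapping length-n windows (stride 1) of a 1-D A-line as columns."""
    m = signal.size - n + 1
    return np.stack([signal[i:i + n] for i in range(m)], axis=1)  # shape (n, m)


def init_dictionary(patches, k, rng):
    """Hypothetical D0: k randomly chosen signal patches scaled to unit l2 norm."""
    idx = rng.choice(patches.shape[1], size=k, replace=False)
    D0 = patches[:, idx].astype(float)
    norms = np.linalg.norm(D0, axis=0)
    norms[norms == 0] = 1.0  # leave any all-zero patch unscaled
    return D0 / norms


def reassemble(patches, length):
    """Average overlapping patches back into a 1-D signal of the given length."""
    n, m = patches.shape
    out = np.zeros(length)
    counts = np.zeros(length)
    for i in range(m):
        out[i:i + n] += patches[:, i]
        counts[i:i + n] += 1
    return out / counts
```

Once each patch has been replaced by its sparse approximation $D\alpha$, `reassemble` turns the patch matrix back into a de-noised A-line; applied to unmodified patches it returns the original signal exactly.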
This is followed by the next step, in which the sparse vector $\alpha$ is fixed and the dictionary $D$ is updated using the ADMM-based dictionary learning method via Eq. (9). The Lagrange multiplier matrix is then updated based on Eq. (5) as below:

The iterations continue until either a maximum number of iterations or a pre-defined reconstruction error of the signal is reached. Here, the iterations were stopped after a fixed number of 20 iterations (Fig. 1). We created photoacoustic images that represent an optical absorption distribution map of the targets via delay and sum (DAS), the most commonly used reconstruction method in photoacoustic imaging.
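The alternation between sparse coding and dictionary update, stopped after a fixed 20 iterations, can be illustrated with the minimal sketch below, assuming NumPy. It is not the DLASD implementation: iterative hard thresholding stands in for the MM sparse-coding step, and the least-squares method of optimal directions (MOD) update stands in for the ADMM dictionary update. The regularization weight `lam`, the inner iteration count, and all names are assumptions.

```python
import numpy as np


def learn_dictionary(X, D0, lam, n_outer=20, n_inner=30, seed=0):
    """Alternate sparse coding and dictionary updates for a fixed iteration count.

    X:  (n, m) matrix whose columns are signal patches.
    D0: (n, k) initial dictionary (e.g. randomly chosen patches).
    Sparse coding uses iterative hard thresholding (stand-in for the MM step);
    the dictionary update is a least-squares MOD step (stand-in for ADMM).
    """
    rng = np.random.default_rng(seed)
    norms0 = np.linalg.norm(D0, axis=0)
    D = D0 / np.where(norms0 > 0, norms0, 1.0)   # unit-norm initial atoms
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_outer):
        # Sparse coding, D fixed: min_A ||X - D A||_F^2 + lam * ||A||_0.
        t = 1.0 / (2.0 * np.linalg.norm(D, 2) ** 2)   # step = 1 / Lipschitz constant
        thresh = np.sqrt(2.0 * t * lam)               # hard-threshold level of the l0 prox
        for _ in range(n_inner):
            Z = A + 2.0 * t * (D.T @ (X - D @ A))     # gradient step on the data term
            A = np.where(np.abs(Z) > thresh, Z, 0.0)
        # Dictionary update, A fixed: least-squares D minimizing ||X - D A||_F.
        D = np.linalg.lstsq(A.T, X.T, rcond=None)[0].T
        # Rescale atoms to unit norm and compensate in A so that D @ A is unchanged.
        norms = np.linalg.norm(D, axis=0)
        unused = norms < 1e-12
        norms[unused] = 1.0
        D = D / norms
        A = A * norms[:, None]
        # Replace (near-)empty atoms with random unit vectors and clear their codes.
        if unused.any():
            R = rng.standard_normal((D.shape[0], int(unused.sum())))
            D[:, unused] = R / np.linalg.norm(R, axis=0)
            A[unused, :] = 0.0
    return D, A
```

Because the final dictionary step is a least-squares fit in which $D = 0$ is always feasible, the returned pair never represents the patches worse than the zero approximation; the de-noised patches are then the columns of `D @ A`.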

Experimental setup
In this study, we used a commercially available LED-based PAI system (Cyberdyne Co., Tokyo, Japan) for all experiments. This imaging system has been characterized previously [17]. There were two high-density, high-power LED arrays, each with four rows of 36 single LEDs. These were coupled to the sides of a 128-element linear array transducer with a center frequency of 7 MHz and a bandwidth of 80.9%. Each ultrasound element has a 16-bit dynamic range with 1024 samples. The photoacoustic sampling rate of this imaging device is 40 MHz. The illumination source has a repetition rate of 4 kHz, a wavelength of 850 nm, and a 100 ns pulse width.
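Images in this study were formed with delay and sum (DAS). With 1024 samples at 40 MHz, each channel record spans 25.6 µs, which corresponds to roughly 39 mm of one-way acoustic travel at an assumed speed of sound of 1540 m/s. The minimal one-way DAS sketch below, assuming NumPy, nearest-sample delays, and time zero at the light pulse, illustrates the idea; it is not the vendor's reconstruction, and the function name and argument layout are hypothetical.

```python
import numpy as np


def das_reconstruct(rf, fs, c, elem_x, grid_x, grid_z):
    """One-way delay-and-sum beamforming for photoacoustic channel data.

    rf:      (n_samples, n_elements) raw data, sample 0 at the light pulse.
    fs:      sampling rate in Hz; c: assumed speed of sound in m/s.
    elem_x:  lateral element positions (m), elements assumed at depth z = 0.
    grid_x, grid_z: pixel coordinates (m). Returns (len(grid_z), len(grid_x)).
    """
    n_samples, n_elem = rf.shape
    X, Z = np.meshgrid(grid_x, grid_z)            # pixel coordinate grids
    image = np.zeros_like(X, dtype=float)
    for e in range(n_elem):
        # Photoacoustic waves travel one way: from the pixel to element e.
        dist = np.sqrt((X - elem_x[e]) ** 2 + Z ** 2)
        idx = np.rint(dist / c * fs).astype(int)  # nearest-sample delay
        valid = idx < n_samples                   # skip delays beyond the record
        image[valid] += rf[idx[valid], e]
    return image
```

For the 128-element probe, `elem_x` would hold the element centers, e.g. `(np.arange(128) - 63.5) * pitch`, where the pitch has to be taken from the probe specification; in practice the summed image is usually envelope-detected before display.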

Contrast measurement
To evaluate the contrast of images reconstructed from signals de-noised by the proposed method, parallel lines (150 µm wide) spaced 1.1 mm apart were printed on a transparent film. We placed the film between two layers of 1% agar and fixed the entire object inside the water tank. The B-mode frame rate was 6 Hz.

Spatial resolution and depth measurement
We placed black nylon monofilament sutures with a nominal diameter of 50 µm (Teleflex Medical OEM) inside 2% intralipid (20% emulsion, Sigma-Aldrich Co, MO, USA) mixed with agar at different depths, spaced 5 mm apart for the first five filaments and 10 mm apart for the remainder. The B-mode frame rate was 6 Hz.

In vivo experiment
All animal experiments were performed in compliance with the guidelines of the Institutional Animal Care and Use Committee of the University of California San Diego. Rabbits served as an animal model to evaluate the proposed de-noising method in vivo. We anesthetized a New Zealand rabbit (∼5 kg) using an intramuscular injection of ketamine (35 mg/kg) and xylazine (5 mg/kg). The pupils were dilated and anesthetized using 2.5% phenylephrine hydrochloride, 0.5% proparacaine hydrochloride, and 1% tropicamide. The transducer was placed on top of the open eye, and ultrasound gel was used for acoustic coupling. The LED repetition rate and B-mode frame rate for the in vivo experiments were 4 kHz and 6 Hz, respectively, and a wavelength of 690 nm was used.

Results and discussion
Three different data sets were used to evaluate the DLASD method. Figure 2 shows a single line of the photoacoustic signal detected from the point-target phantom, with the averaging and proposed methods applied to different numbers of frames. The proposed method was compared with averaging over the same number of frames as a way to improve the PSNR of the signals. Our proposed method reaches a PSNR of about 27.93 dB with one frame, whereas averaging requires 20 frames to achieve the same PSNR. Using five frames, DLASD markedly reduced the noise amplitude from 16.1 mV to about zero (-2.2 × 10⁻⁵ mV), which improved the PSNR by ∼40%.
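The text does not state the PSNR definition, so the sketch below assumes the common form PSNR = 20 log10(max|x_ref| / RMSE), with the all-frame average serving as the reference. It also shows the frame-averaging baseline, which for uncorrelated noise lowers the noise amplitude by about √N for N frames (about 20 dB for 100 frames). The function names are hypothetical.

```python
import numpy as np


def psnr_db(reference, estimate):
    """PSNR in dB: peak of the reference over the RMS error (assumed definition)."""
    rmse = np.sqrt(np.mean((reference - estimate) ** 2))
    return 20.0 * np.log10(np.max(np.abs(reference)) / rmse)


def average_frames(frames):
    """Frame-averaging baseline: mean over the frame axis (axis 0)."""
    return np.mean(frames, axis=0)
```

Comparing `psnr_db(ref, average_frames(frames[:N]))` against the PSNR of a de-noised single frame reproduces, in outline, the frame-count comparison described above.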

Contrast assessment
To evaluate the contrast of the reconstructed images, the contrast ratio (CR) metric was used as below:

Here, $\mu_{\text{object}}$ and $\mu_{\text{background}}$ are the maximum intensity of the object and the mean image intensity in the background, respectively [32]. The background was defined as the pixels inside the green dashed rectangular region (Fig. 3). Figure 3 shows the images reconstructed from signals de-noised via averaging and DLASD with 5 and 10 frames. With five frames, the CR was about -30.11 dB for the averaging method and -58.61 dB for DLASD, while the CR was about -42.13 dB when all 1050 frames were averaged. The proposed method thus provides higher contrast than averaging with the same number of frames. The CR improved with more frames for both approaches; however, for the same number of frames, the CR of DLASD was higher than that of averaging.
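Because the CR equation is not reproduced here, the sketch below assumes CR = 20 log10(µ_background / µ_object), chosen only because it yields negative dB values that become more negative as contrast improves, consistent with the numbers reported above. µ_object and µ_background follow the stated definitions (object maximum, background mean); the boolean masks and the function name are hypothetical.

```python
import numpy as np


def contrast_ratio_db(image, object_mask, background_mask):
    """Contrast ratio in dB under an ASSUMED form: 20*log10(mu_background / mu_object).

    mu_object is the maximum intensity inside object_mask and mu_background the
    mean intensity inside background_mask. Magnitudes are used so that the
    logarithm is defined for bipolar (signed) reconstructions.
    """
    mag = np.abs(image)
    mu_object = np.max(mag[object_mask])
    mu_background = np.mean(mag[background_mask])
    return 20.0 * np.log10(mu_background / mu_object)
```

If the original definition places µ_object in the numerator instead, the values simply change sign; the ranking of methods is unaffected.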
Furthermore, DLASD used only 0.5% of all frames but still had a 38% improvement in contrast ratio versus averaging all frames. Our processing time includes averaging time, signal de-noising time, and image reconstruction time. The computational time of the DLASD method for the phantom data was about 0.3 s. By using 0.5% of the frames, the averaging time was reduced from 1.8 s for all 1050 frames to 0.4 s for 5 frames. The image reconstruction time is about 0.5 s for both methods. Consequently, compared with averaging all frames, our proposed method markedly reduced the total processing time from 2.3 s to 1.2 s. The DLASD method could also eliminate the mirror artifact in the reconstructed images. The mirror image artifact is a form of reverberation that arises from the false assumption that an echo returns to the transducer after a single reflection. During the dictionary learning process, the dictionary is updated according to the decomposed signals, so it follows the properties of the main signal components and yields a sparse representation of the signal. This sparse representation can reduce the mirror artifacts.
Additionally, to evaluate the tolerance of the DLASD method to noise, noise was added to the signal at different SNR levels. We averaged all frames of the noisy LED-based photoacoustic signals to obtain a ground-truth signal and then added noise at SNRs of -5, -10, -15, -20, -25, and -30 dB. Figure 4 shows that DLASD could recover the objects down to an SNR of -25 dB (see appendix, Fig. 7).
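The noise-tolerance test can be reproduced in outline by adding white Gaussian noise at a prescribed SNR to the averaged ground-truth signal. This minimal sketch assumes a power-based definition, SNR = 10 log10(P_signal / P_noise), and Gaussian noise; the text specifies neither, and the function names are hypothetical.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Return signal plus white Gaussian noise at the requested power SNR (dB).

    Assumes SNR = 10*log10(P_signal / P_noise), with P the mean squared value.
    """
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise


def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy copy relative to the clean reference."""
    return 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

Looping over the six levels listed above with a seeded `np.random.default_rng` produces reproducible corrupted inputs for the robustness test.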

Spatial resolution and depth assessment
We next used a depth phantom to evaluate spatial resolution and imaging depth. This phantom contains point targets positioned at different depths. Photoacoustic images were reconstructed from signals de-noised via averaging and DLASD for different numbers of frames (Fig. 5). The first column shows the performance of the averaging method using 30 and 50 frames. The middle column depicts the output of DLASD with the same numbers of frames, and the last column shows the averaging method with all 1290 frames. The averaging method improved the overall CR of the images. However, de-noising by averaging 20 frames missed the 6th object, at the deepest position, in the reconstructed image (Fig. 5(a)). When averaging 50 frames, the 6th object is still not distinguishable and suffers from low contrast. The proposed method uses only 20 frames and can still detect the deepest object (Fig. 5(b)). Using all 50 frames significantly improved the contrast of the 6th object, and DLASD with 50 frames detects the deepest object with improved image quality. Table 1 shows the CR and computational time of both signal de-noising methods for different numbers of frames. The proposed method has a CR of about -87.62 dB with 50 frames, which is better than the CR of about -71.25 dB obtained by averaging 1290 frames. To quantitatively evaluate the spatial resolution of DLASD, the full width at half maximum (FWHM) was calculated along the lateral and axial axes of the reconstructed images for objects at different depths. The focal depth of this transducer is about 20 mm, and the best resolution was achieved at this depth. The lateral and axial FWHM for the objects at this depth are presented in Fig. 5. This improved by about 10% via averaging of all frames. We achieved a 43% improvement in axial FWHM but with only 4% of the frames.
Thus, the DLASD method leads to better axial resolution versus averaging all frames as a gold standard method. Additionally, the linearity of the photoacoustic signals could be maintained through the DLASD method (see appendix, Fig. 8).
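FWHM values like these are typically measured from a 1-D lateral or axial intensity profile through a point target. The sketch below, assuming NumPy, a single dominant peak, and linear interpolation between samples, is one common way to do this; the function name and the `spacing` parameter (pixel size) are hypothetical.

```python
import numpy as np


def fwhm(profile, spacing=1.0):
    """Full width at half maximum of a single-peaked 1-D profile.

    The half-maximum crossings on each side of the peak are located by linear
    interpolation between neighbouring samples; spacing converts samples to length.
    """
    p = np.asarray(profile, dtype=float)
    peak = int(np.argmax(p))
    half = p[peak] / 2.0
    # Walk outward from the peak until the profile drops below half maximum.
    left = peak
    while left > 0 and p[left] >= half:
        left -= 1
    right = peak
    while right < p.size - 1 and p[right] >= half:
        right += 1
    if p[left] >= half or p[right] >= half:
        raise ValueError("profile must fall below half maximum on both sides of the peak")
    # Fractional sample positions where the profile crosses the half maximum.
    x_left = left + (half - p[left]) / (p[left + 1] - p[left])
    x_right = right - (half - p[right]) / (p[right - 1] - p[right])
    return (x_right - x_left) * spacing
```

For a Gaussian profile with standard deviation σ this returns approximately 2√(2 ln 2) σ ≈ 2.355 σ, which is a quick sanity check before applying it to reconstructed point targets.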

Temporal resolution assessment
Here, we investigated the effect of the number of frames on the quality of the reconstructed images. Averaging can reduce the effect of noise with more frames but leads to longer scan times. Our proposed method also achieves better CR with more frames, but it requires only 50 frames (4% of all frames) to achieve the same CR as averaging 1290 frames for the deepest object. DLASD offers better CR than averaging all frames with only 4% of the frames. The computational time of the DLASD method for these data was about 0.31 s. By using 4% of the frames, the averaging time decreased from 1.58 s for all 1290 frames to 0.57 s for 50 frames. All processing used MATLAB on an Intel Core i7 3.2 GHz CPU with 8 GB RAM. These results show that our method improves the temporal resolution: the frame rate when averaging all frames is 0.05 Hz, whereas the frame rate of DLASD with 50 frames is about 1 Hz. In [35], to improve the quality of LED-based PA images, the PA signals were averaged over about 1360 frames for the best result, and a recurrent neural network was then used to improve the quality of the PA images and to increase the imaging frame rate. The computational time of that method was about 0.1 s on a CPU (Intel Core i7-7700K @ 4.20 GHz with 32 GB RAM), not counting the training phase on a GPU (NVIDIA GeForce GTX 1080 Ti). Compared with averaging in LED-based PA imaging, the proposed DLASD method allows us to use only the simple delay-and-sum reconstruction without any further post-processing. The computational time of DLASD was about 0.3 s on a CPU (Intel Core i7 3.2 GHz with 8 GB RAM). However, unlike [35], it does not require any training. Furthermore, we used only 50 frames rather than about 1360.

In vivo experiment
Finally, we evaluated the performance of the proposed signal de-noising method with in vivo data from a rabbit retina. Figure 6 shows the images reconstructed from signals de-noised via our proposed method and via averaging with different numbers of frames. We defined the dashed region as the background to calculate the CR of the images. A CR of -14.5 dB was achieved for images reconstructed from signals de-noised by averaging all 1536 frames. DLASD with 30 frames has a better CR of -18.05 dB, a 24% improvement versus averaging all frames; averaging only 30 frames gives a CR of -13.6 dB. Therefore, the retina of the rabbit can be clearly seen in Fig. 6(b), whereas it is not distinguishable in Fig. 6(a) with averaging of 30 frames. The CR was thus improved using only 2% of the data versus averaging all frames. Furthermore, the background noise is about -35 dB for DLASD with 30 frames versus -27 dB for averaging all frames. In contrast to the averaging method, the background noise was considerably suppressed with increasing numbers of frames: -38 and -42 dB for DLASD using 50 and 100 frames (Fig. 6(d, f)). The computational time of the DLASD method for the in vivo data was about 0.32 s. Using fewer than 2% of the frames reduced the averaging time from 4.58 s for all frames to 1.45 s for 30 frames. The image reconstruction time is almost equal for the two methods (about 0.5 s). Overall, our proposed method significantly reduced the processing time (including averaging, signal de-noising, and image reconstruction) from 5.08 s for averaging all frames to 2.27 s for DLASD.

Fig. 6. Reconstructed rabbit retina images of de-noised signals by averaging using 30 (a), 50 (c), and 100 (e) frames, and by DLASD using 30 (b), 50 (d), and 100 (f) frames. Panel g) shows the averaging method using all 1536 frames. Panel h) shows the schematic of the imaging setup for the rabbit eye. Panel i) compares the CR and computational time of averaging and DLASD using 30 frames and of averaging using all frames.
The in vivo reconstructed images contain some artifacts, especially for averaging all frames and for DLASD, where the noise is suppressed.

Conclusion
We proposed a dictionary learning assisted signal de-noising method that combines a majorization-minimization method with ADMM to compensate for the low SNR of LED-based PAI systems. The proposed method was compared with the averaging method using phantoms and a rabbit retina. The DL-based signal de-noising method outperforms the averaging method in terms of PSNR. It also provides higher contrast than averaging with the same number of frames; this was seen for all samples.
The lateral FWHM of our proposed method is identical to that of the averaging method, but it requires only 4% of the frames. The axial FWHM improved by around 43%; thus, DL-based signal de-noising has better axial resolution than averaging. Indeed, DLASD with a single frame could achieve the same CR as averaging all frames, but we settled on 4% of all frames to identify the deepest object with better contrast than averaging all frames. The main improvement is the use of fewer frames with less computational time and faster frame rates.