Deep learning approach for hyperspectral image demosaicking, spectral correction and high-resolution RGB reconstruction

ABSTRACT Hyperspectral imaging is one of the most promising techniques for intraoperative tissue characterisation. Snapshot mosaic cameras, which can capture hyperspectral data in a single exposure, have the potential to make a real-time hyperspectral imaging system for surgical decision-making possible. However, optimal exploitation of the captured data requires solving an ill-posed demosaicking problem and applying additional spectral corrections. In this work, we propose a supervised learning-based image demosaicking algorithm for snapshot hyperspectral images. Due to the lack of publicly available medical images acquired with snapshot mosaic cameras, a synthetic image generation approach is proposed to simulate snapshot images from existing medical image datasets captured by high-resolution, but slow, hyperspectral imaging devices. Image reconstruction is achieved using convolutional neural networks for hyperspectral image super-resolution, followed by spectral correction using a sensor-specific calibration matrix. The results are evaluated both quantitatively and qualitatively, showing clear improvements in image quality compared to a baseline demosaicking method using linear interpolation. Moreover, the fast processing time of 45 ms of our algorithm to obtain super-resolved RGB or oxygenation saturation maps per image for a state-of-the-art snapshot mosaic camera demonstrates the potential for its seamless integration into real-time surgical hyperspectral imaging applications.


Introduction
Reliable discrimination between tumour and surrounding tissues remains a challenging task in surgery, and in particular in neuro-oncology surgery. Despite intensive research and progress in advanced computer-assisted visualisation techniques, most intraoperative surgical evaluations are still heavily reliant on subjective visual assessment by clinicians. Modern intraoperative tissue discrimination techniques therefore often involve the use of interventional techniques, such as fluorescence and ultrasound imaging. However, visual assessment of fluorescence intensities during surgery is usually qualitative, which hinders accurate, reliable and repeatable measurements for consensus, standardisation, and adoption of fluorescence-guided surgery in the field (Valdes et al. 2019). Ultrasound imaging may suffer from poor resolution and a restricted field of view, and its interpretation depends heavily on the experience of the expert (Kaale et al. 2021).
Intraoperative hyperspectral imaging (HSI) provides a non-contact, non-ionising and non-invasive solution suitable for many medical applications (Lu and Fei 2014; Shapey et al. 2019; Clancy et al. 2020). HSI can provide rich high-dimensional spatio-spectral information within the visible and near-infrared electromagnetic spectrum across a wide field of view. Compared to conventional colour imaging that provides red, green, and blue (RGB) colour information, HSI can capture information across multiple spectral bands beyond what the human eye can see, thereby facilitating tissue differentiation and characterisation. Unlike fluorescence and ultrasound imaging, HSI exploits the inherent optical characteristics of different tissue types. It captures measurements of light that provide quantitative diagnostic information on tissue perfusion and oxygen saturation, enabling improved tissue characterisation relative to fluorescence and ultrasound imaging (Lu and Fei 2014). Depending on the number of acquired spectral bands, hyperspectral imaging may also be referred to as multispectral imaging, but for simplicity the hyperspectral terminology will be used. A single hyperspectral image typically spans three dimensions: two spatial dimensions and one spectral (wavelength) dimension, as illustrated in Figure 1(a). 3D HSI data are therefore often referred to as hyperspectral cubes, or hypercubes for short. In addition, time comes as a fourth dimension in the context of dynamic scenes, such as those acquired during surgery. Hyperspectral cameras can broadly be divided into three categories based on their acquisition methods, namely spatial scanning, spectral scanning and snapshot cameras (Shapey et al. 2019; Clancy et al. 2020). Spatial scanning acquires the entire wavelength spectrum simultaneously on either a single pixel or a line of pixels using a linear or 2D array detector, respectively.
The camera spatially scans through pixels over time to complete the capture of the hyperspectral cube. Spectral scanning, on the other hand, captures the entire spatial scene at a given wavelength with a 2D array detector, and then switches to different wavelengths over time to complete the scan. These two types of spectral cameras are able to acquire hyperspectral data with high spatial and spectral resolution, but long acquisition times prevent them from providing live image displays suitable for real-time intraoperative use. To achieve intraoperative tissue characterisation with HSI in real-time, snapshot cameras are more suitable as they can capture hyperspectral cube data in real-time (Ebner et al. 2021). A common type of snapshot camera uses a snapshot mosaic system to acquire the entire hyperspectral cube instantly without the need for a scanning mechanism. The refined n × n pixel filter array, arranged similarly to the 2 × 2 colour filter array on an RGB sensor, allows the snapshot camera to acquire a maximum of n² different spectral bands in a single exposure (Geelen et al. 2014). Other snapshot hyperspectral imaging approaches, such as coded aperture snapshot spectral imaging (CASSI) (Wagadarikar et al. 2008) and micro-lens-based acquisition, have been proposed. In general, the downside of snapshot acquisition is that it sacrifices spatial and spectral resolution to achieve fast data acquisition speeds. Figure 1(b) illustrates the relationship between a high-resolution hyperspectral cube and a 3 × 3 snapshot mosaic image as a simplified example. An X × Y snapshot image is composed of a large number of individual 3 × 3 blocks following mosaic patterns. The 3 × 3 snapshot on the right of Figure 1(b) is an example of a single block captured by the 3 × 3 sensor array.

Figure 1. Examples to illustrate a hyperspectral cube as well as subsampling and demosaicking operations: (a) shows the spatial dimensions (X and Y) and spectral dimension of a hypercube; (b) shows how hyperspectral cube and snapshot mosaic images can be transformed into each other with band selection/spatial interpolation. Due to the space constraint of the image, 3 × 3 snapshot mosaicking is taken as an example.
As the image captured by a snapshot mosaic sensor is in 2D, a demosaicking operation is necessary to restore the spatial and spectral resolution of the image, followed by spectral correction to deal with the parasitic effects of the sensors, such as harmonics, cross-talk and leakage (Pichette et al. 2017). Spectral correction can usually be handled by applying a calibration matrix, such as the one provided by the camera manufacturer, but demosaicking of the snapshot data is challenging. As illustrated in Figure 1(b), the demosaicking operator usually involves splitting the image into different spectral bands, followed by spatial interpolation to fill in the missing data. Common interpolation-based demosaicking methods usually result in poor image quality of the reconstructed hyperspectral data, so several approaches have been presented to address this demosaicking problem. For example, Hy-Demosaicing, proposed by Zhuang and Bioucas-Dias, used data-adaptive subsampled signal subspaces for reconstruction of hyperspectral urban images by exploiting the low-rank and self-similarity properties of the hyperspectral images (Zhuang and Bioucas-Dias 2018). Deep learning methods for hyperspectral demosaicking were also investigated, such as the similarity maximisation framework proposed for performing end-to-end demosaicking and cross-talk correction for agricultural machine vision (Dijkstra et al. 2019).
Despite the development of demosaicking algorithms, research on medical hyperspectral image demosaicking remains limited. The goal of this study is to develop a reliable real-time image demosaicking, spectral correction and associated RGB reconstruction algorithm to recover higher quality medical hyperspectral images suitable for intraoperative applications. Due to the lack of open datasets of snapshot mosaic hyperspectral imaging from intraoperative settings, and more importantly, due to the impossibility of capturing hyperspectral imagery paired for both snapshot and high-resolution sensors, the proposed learning-based demosaicking algorithm makes use of publicly available medical hyperspectral image datasets captured in high spatial and spectral resolution by line-scan cameras for training purposes. Based on high-resolution data, we exploit the knowledge of the physical image acquisition process to simulate images expected from a snapshot mosaic camera as well as their corresponding ideal demosaicked images. This allows us to form image pairs suitable for supervised training. The results have been evaluated with popular full-reference image quality metrics including structural similarity (SSIM) and peak signal-to-noise ratio (PSNR). A first qualitative survey has also been conducted on the reconstructed RGB image quality, and the proposed algorithm has been applied to real snapshot mosaic test images to demonstrate its effectiveness. The speed and quality of the images reconstructed by our proposed algorithm are respectable, which will facilitate seamless integration into intraoperative hyperspectral imaging systems using snapshot mosaic cameras for responsive surgical guidance (Ebner et al. 2021).

Material and methods
One of the major challenges for developing learning-based hyperspectral image demosaicking algorithms is the lack of hyperspectral datasets offering paired snapshot and high-resolution data. Such datasets would be even more complex to acquire in intraoperative contexts. We took an alternative approach where synthetic low-resolution snapshot images are generated from high-resolution hyperspectral images captured by line-scan sensors, which require long acquisition times. Our demosaicking algorithm can thus take advantage of the resulting synthetic paired high-resolution/snapshot data. This section first introduces the publicly available hyperspectral line-scan datasets used in the experiments. Next, an overall framework for simulating the snapshot image acquisition process using the line-scan data is presented, with details on how synthetic snapshot images and ideal demosaicked images are generated. Finally, this section introduces the integration of supervised image super-resolution methods into the demosaicking, spectral correction and RGB generation framework.

Source datasets
Line-scan sensors are able to capture data across hundreds of spectral bands within the visible and near-infrared range. While they require long acquisition times, they provide high spatial and spectral resolution. Line-scan data contain sufficient information to generate snapshot mosaic images with much lower spatial and spectral resolutions. Two publicly available line-scan hyperspectral image datasets have been used in this work and are presented hereafter. Fabelo et al. (2019) provide a hyperspectral dataset acquired during neurosurgical procedures as part of the HypErspectraL Imaging Cancer Detection (HELICoiD) project. This dataset contains 36 hyperspectral cubes collected from 22 different patients. Their hyperspectral acquisition system acquired intraoperative data containing 826 successive spectral bands within the wavelengths of 400 nm to 1000 nm, with a spectral resolution of 2-3 nm. Preprocessing of the hypercube data was performed as outlined in the paper (Fabelo et al. 2019). Hyttinen et al. (2020) provide the second hyperspectral dataset used in this work. The Oral and Dental Spectral Image Database (ODSI-DB) is a larger dataset containing 316 different oral and dental hyperspectral images. The hyperspectral images in this dataset were acquired with two different cameras. One hundred and seventy-one out of the 316 images were acquired using a Specim IQ (Specim, Spectral Imaging Ltd., Oulu, Finland) line-scan camera, which has a spatial resolution of 512 × 512 and a spectral range of 400-1000 nm with 204 spectral bands captured in total. The remaining images were obtained with the Nuance EX (CRI, PerkinElmer, Inc., Waltham, MA, USA) spectral scan camera, with a higher spatial resolution of 1392 × 1040 but fewer spectral bands. It features 51 bands ranging from 450 to 950 nm. Due to their higher spectral resolution, in this work, only the line-scan (Specim IQ) hyperspectral images were selected for synthetic snapshot image generation.
Denser spectral information is indeed beneficial for sampling of sensor responses during our image generation process. The hyperspectral data in this dataset come preprocessed with flat-field correction from a blank reference sample, therefore white-balancing is not necessary for the ODSI-DB.

Figure 2 illustrates the pipeline of the demosaicking algorithm for hyperspectral snapshot images. The entire framework consists of two parts. The first part, detailed in Section 2.2.1, focuses on the generation of synthetic snapshot images and ideal high-resolution image datasets from high-resolution images (HELICoiD or ODSI-DB). The second part, detailed in Section 2.2.2, involves the supervised learning method used to obtain a high-quality hypercube reconstruction result.

Synthetic image generation process
Synthetic image generation starts from a white-balanced high-spectral-resolution hyperspectral data cube (HELICoiD or ODSI-DB), referred to as HR Hypercube in the diagram. We denote the size of a high-spectral-resolution hyperspectral cube as X × Y × n_d, where X and Y are the spatial dimensions and n_d is the spectral dimension.
Simulating the Spectral Response of the Snapshot Sensor. Snapshot mosaic hyperspectral sensors only capture a discrete number n_s of spectral bands, with n_s typically much smaller than n_d. For example, n_s = 16 for a 4 × 4 mosaic arrangement. Each of the n_s bands can have a non-trivial spectral response (Pichette et al. 2017) (e.g. bimodal and/or heavy-tailed response) due to parasitic effects, such as harmonics, cross-talk and spectral leakage. These responses are nonetheless typically calibrated in factory and can be retrieved from the calibration files of the camera sensor.
An intermediate high-spatial-resolution hyperspectral cube of size X × Y × n_s can be generated by simulating the effect of the camera sensor response on the high-spectral-resolution data. More specifically, the intermediate hyperspectral cubes can be obtained by computing, at each spatial location, the inner products of the individual sensor responses with the high-resolution spectrum from the input data.
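The per-pixel inner product described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `simulate_spectral_response` is hypothetical, and the sketch assumes the sensor response curves have already been resampled onto the same n_d wavelengths as the input cube.

```python
import numpy as np

def simulate_spectral_response(hr_cube, responses):
    """Project an (X, Y, n_d) cube onto n_s sensor response curves.

    hr_cube:   array of shape (X, Y, n_d), high-spectral-resolution data
    responses: array of shape (n_s, n_d), one response curve per band,
               sampled at the same n_d wavelengths as hr_cube (assumption)
    returns:   array of shape (X, Y, n_s)
    """
    hr_cube = np.asarray(hr_cube, dtype=float)
    responses = np.asarray(responses, dtype=float)
    if hr_cube.shape[-1] != responses.shape[-1]:
        raise ValueError("cube and responses must share the spectral axis")
    # Inner product of every pixel spectrum with every response curve.
    return np.einsum("xyd,sd->xys", hr_cube, responses)

# Toy data standing in for a real hypercube and real calibration curves.
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 50))
resp = rng.random((16, 50))
intermediate = simulate_spectral_response(cube, resp)
```

In practice the response curves would come from the camera calibration files mentioned above rather than from random numbers.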
Simulating the Spatial Response of the Snapshot Sensor. Having simulated the spectral response and obtained an X × Y × n_s intermediate hypercube, the final simulated 2D mosaic image can be derived by applying spatial subsampling as illustrated in Figure 1(b). More specifically, the hypercube is divided into smaller blocks with the same spatial size as the mosaic sensor array, and for each pixel in each individual block, only one value from the n_s wavelengths is preserved. Therefore, the synthetic snapshot mosaic image is a scalar-valued image of size X × Y.
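A minimal sketch of this spatial subsampling step is given below. The row-major assignment of bands to positions inside each block is an assumption made for illustration; the real band layout of a given sensor would have to be taken from its documentation. The function name `mosaic_subsample` is hypothetical.

```python
import numpy as np

def mosaic_subsample(cube, n=4):
    """Collapse an (X, Y, n*n) cube into an (X, Y) mosaic image."""
    cube = np.asarray(cube)
    X, Y, n_s = cube.shape
    if n_s != n * n:
        raise ValueError("expected n*n spectral bands")
    rows = np.arange(X)[:, None]
    cols = np.arange(Y)[None, :]
    # Band kept at each pixel: its position inside the n x n block
    # (row-major layout, assumed for illustration).
    band = (rows % n) * n + (cols % n)
    return np.take_along_axis(cube, band[..., None], axis=2)[..., 0]

# Toy cube in which band b has the constant value b everywhere.
toy_cube = np.broadcast_to(np.arange(16), (8, 8, 16))
mosaic = mosaic_subsample(toy_cube, n=4)
```

With this toy cube, each mosaic pixel simply holds the index of the band that was sampled there, which makes the repeating 4 × 4 pattern easy to inspect.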
Simulating the Target Ideal Hyperspectral Data. Given the non-trivial spectral response of the captured n_s spectral bands (harmonics, spectral leakage, cross-talk, etc.), the spectral correction matrix for snapshot systems provided by the camera manufacturer may reconstruct only a subset of n_i ≤ n_s spectral bands to ensure high-fidelity measurements of reconstructed bands.
The resulting n_i bands are designed to approximate ideal sensor measurements by taking into account the response of ideal Fabry-Pérot resonators (Pichette et al. 2017). The corresponding optical band-pass response f can be characterised as a Lorentzian function of optical frequency. We express it in terms of the wavelength λ, centred around the central wavelength λ_0 of each snapshot sensor, with full-width at half-maximum FWHM and with a quantum efficiency QE:

[Equation (1): Lorentzian band-pass response f(λ) parameterised by λ_0, FWHM and QE]

The number n_i and the characteristics λ_0, QE and FWHM of the ideal spectral bands are selected to capture all the reliable information contained in the n_s spectral bands. These are typically provided by the camera manufacturer and are used to fit the measured response curves. A calibration matrix C of size n_i × n_s to map the n_s spectral measurements to the n_i ideal spectral bands is also computed in factory and provided by the manufacturer (Pichette et al. 2017). While one could try to recover high-spectral-resolution data from low-spectral-resolution snapshot mosaic data, in many applications, it is sufficient to recover the spatial information lost by the spatial sampling process of the mosaic arrangement while estimating a reliable set of spectral bands. As such, in this work, we aim to recover high-spatial-resolution information for each of the n_i ideal spectral bands. We refer to this target as the ideal hypercube in Figure 2. It can be estimated from the HR hypercube input data by applying ideal Lorentzian responses to it. Thus, the size of the target ideal hypercube is X × Y × n_i.
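To give a feel for the shape of such an ideal band, the sketch below evaluates a common wavelength-domain Lorentzian. It is an approximation chosen for illustration and is not claimed to match the authors' Equation (1), which is derived from a Lorentzian in optical frequency. The centre wavelength, FWHM and QE values are hypothetical.

```python
import numpy as np

def lorentzian_response(wavelengths, centre, fwhm, qe=1.0):
    """Lorentzian band-pass: qe at the centre, qe/2 at centre +/- fwhm/2."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    return qe / (1.0 + ((wavelengths - centre) / (fwhm / 2.0)) ** 2)

# Hypothetical band: 550 nm centre, 10 nm FWHM, peak quantum efficiency 0.8.
wl = np.linspace(400.0, 1000.0, 601)  # 1 nm sampling grid
curve = lorentzian_response(wl, centre=550.0, fwhm=10.0, qe=0.8)
```

Stacking n_i such curves row by row gives a response matrix that can be applied to the HR hypercube with the same per-pixel inner product used for the measured sensor responses.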

Learning for demosaicking, spectral correction and RGB generation
Supervised Training Approach for Super-resolved Demosaicking. The synthetic data generation in Section 2.2.1 provides paired high-spatial-resolution ideal hypercubes and 2D snapshot mosaic images. Having access to such datasets, we exploit supervised learning approaches to develop a demosaicking approach, thereby achieving super-resolution of the captured mosaic data.
As outlined in the blue box in Figure 2, the algorithm starts with a simple bilinear-interpolation-based demosaicking of the snapshot mosaic images. This operation involves grouping the pixels inside the snapshot images according to the position of the sampled spectral bands, and then using bilinear interpolation along the X- and Y-axes to upsample each spectral band back to the original sensor size. The resulting interpolated data are of size X × Y × n_s, i.e. the same size as the intermediate high-spatial-resolution hypercube. While linear interpolation recovers the snapshot data to its original shape before subsampling, the resulting images can still look blurry. It is now well established that deep learning can effectively refine image details with a fast inference speed (Lugmayr et al. 2020), at least when applied to RGB data. In our algorithm, a U-Net (Ronneberger et al. 2015) enhanced to accommodate residual units (Kerfoot et al. 2019) has been adopted for the super-resolution and demosaicking task. The network contains a contracting path with four downsampling layers and two residual blocks at each resolution, as well as a symmetric expanding path with skip connections.
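The bilinear baseline described above can be sketched as a separable linear interpolation per band. This is an illustrative version only: the row-major band layout, the clamping of values at the image border (the behaviour of `np.interp` outside the sampled range) and the function name `bilinear_demosaic` are assumptions, and the authors' real-time implementation is likely structured differently.

```python
import numpy as np

def bilinear_demosaic(mosaic, n=4):
    """Upsample each band of an (X, Y) mosaic to full size: (X, Y, n*n)."""
    mosaic = np.asarray(mosaic, dtype=float)
    X, Y = mosaic.shape
    xs, ys = np.arange(X), np.arange(Y)
    out = np.empty((X, Y, n * n))
    for r in range(n):
        for c in range(n):
            sub = mosaic[r::n, c::n]         # samples of band r*n + c
            rows, cols = xs[r::n], ys[c::n]  # their sensor coordinates
            # Separable bilinear interpolation: along Y first, then along X.
            tmp = np.stack([np.interp(ys, cols, line) for line in sub])
            band = np.stack([np.interp(xs, rows, col) for col in tmp.T], axis=1)
            out[:, :, r * n + c] = band
    return out

# Toy mosaic in which every pixel stores the index of its own band.
i, j = np.indices((8, 8))
toy_mosaic = ((i % 4) * 4 + (j % 4)).astype(float)
demo = bilinear_demosaic(toy_mosaic, n=4)
```

Because each band of the toy mosaic is constant, the interpolated cube should reproduce those constants exactly, which is a convenient sanity check for the band bookkeeping.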
Rather than directly predicting the target ideal hypercube, we simplify the training procedure and take advantage of the known correction matrix C. For this purpose, the network aims at inferring an intermediate hypercube of size X × Y × n_s. As such, the network output has the same size and spectral characteristics as the bilinearly interpolated input hypercube but achieves sharper details.
Embedding Spectral Correction. From the initial output hypercube of the network, we compensate for the parasitic spectral effects of the sensor by applying the correction matrix C to each spatial location. The size and spectral characteristics of the resulting hypercube match those of the target ideal hypercube. To train the network and the associated spectral correction, we use a loss that captures the error between the inferred corrected hypercube and the ideal hypercube.
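Applying the correction matrix at each spatial location amounts to one matrix-vector product per pixel. A minimal NumPy sketch follows; the function name `apply_spectral_correction` is hypothetical and the toy matrix merely selects the first ten bands, whereas a real C would come from the sensor calibration files.

```python
import numpy as np

def apply_spectral_correction(cube, C):
    """Map (X, Y, n_s) measurements to (X, Y, n_i) bands, C of shape (n_i, n_s)."""
    cube = np.asarray(cube, dtype=float)
    C = np.asarray(C, dtype=float)
    # corrected[x, y, :] = C @ cube[x, y, :]
    return np.einsum("is,xys->xyi", C, cube)

rng = np.random.default_rng(0)
measured = rng.random((4, 4, 16))
C_toy = np.eye(16)[:10]  # toy stand-in: keeps the first 10 of 16 bands
corrected = apply_spectral_correction(measured, C_toy)
```

Since the operation is linear and fixed, it can be placed after the network output without adding trainable parameters, which is consistent with the training setup described above.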
In order to provide additional guidance with intermediate supervision, an auxiliary loss between the intermediate hypercube inferred by the residual U-Net and the intermediate synthetic high-spatial-resolution hypercube is also added. The idea behind this auxiliary loss is that instead of directly learning to refine the spatial resolution and compensate for the parasitic spectral effect, the network can be guided to focus solely on the image super-resolution task.
In terms of the choice of loss functions, we investigated two configurations. In the L1 loss configuration, both the training loss and the auxiliary loss are L1 losses. In the perceptual loss configuration, the L1 auxiliary loss is replaced with the feature reconstruction loss component of the perceptual loss (Johnson et al. 2016), as this has been shown to enable improved super-resolution performance.
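The perceptual loss configuration can be sketched as a main L1 term on the spectrally corrected output plus a small weighted feature-space term on the uncorrected output. The sketch below is an assumption-laden illustration: the normalisation of each term (mean absolute error, plain Euclidean norm), the function name `total_loss` and the toy feature extractor `phi` are all hypothetical, with `phi` standing in for a pre-trained loss network.

```python
import numpy as np

def total_loss(y_hat_s, y_s, y_i, C, phi, gamma=0.001):
    """Main L1 loss after spectral correction plus weighted feature loss."""
    corrected = np.einsum("is,xys->xyi", C, y_hat_s)
    main = np.abs(y_i - corrected).mean()          # L1 term (mean absolute error)
    aux = np.linalg.norm(phi(y_s) - phi(y_hat_s))  # distance in feature space
    return main + gamma * aux

rng = np.random.default_rng(0)
C = rng.random((10, 16))
y_s = rng.random((4, 4, 16))               # intermediate hypercube
y_i = np.einsum("is,xys->xyi", C, y_s)     # toy "ideal" target built from C
phi = lambda cube: cube.mean(axis=(0, 1))  # toy stand-in for a loss network

perfect = total_loss(y_s, y_s, y_i, C, phi)
worse = total_loss(y_s + 0.1, y_s, y_i, C, phi)
```

In a training framework the same structure would be written with differentiable tensor operations so that gradients flow through the fixed matrix C into the network.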
Denoting the (non-spectrally-corrected) output hypercube from the residual U-Net as ŷ_s, the intermediate high-resolution hypercube as y_s, the pre-trained loss network for the perceptual loss as ϕ, and the ideal hypercube as y_i, the total loss ℓ can be expressed as follows, where the weight factor γ is set to 0.001 empirically:

\ell(\hat{y}_s, y_s, y_i) = \lVert y_i - C\hat{y}_s \rVert + \gamma \lVert \phi(y_s) - \phi(\hat{y}_s) \rVert_2    (2)

sRGB reconstruction. For intuitive visualisation of the result, the linearly interpolated snapshot hypercube data, the demosaicked hypercube results and the ideal hypercube data that serve as the ground truth of the network are all converted into sRGB images. This is achieved by first converting the spectral data (corrected with C where relevant) into CIE XYZ colour space using colour matching functions and assuming a D65 illuminant. We then convert the XYZ colour images into linear RGB colour space and apply gamma correction to obtain sRGB images.
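The last two steps of the sRGB conversion (XYZ to linear RGB, then gamma correction) can be sketched as below, using the standard sRGB matrix and transfer function for a D65 white point. The preceding step, integrating the spectra against colour matching functions, is omitted because it needs tabulated data; the function name `xyz_to_srgb` and the clipping to [0, 1] are choices made for this illustration.

```python
import numpy as np

# Standard XYZ (D65) to linear sRGB matrix (IEC 61966-2-1).
XYZ_TO_LINEAR_RGB = np.array([
    [ 3.2406, -1.5372, -0.4986],
    [-0.9689,  1.8758,  0.0415],
    [ 0.0557, -0.2040,  1.0570],
])

def xyz_to_srgb(xyz):
    """Convert (..., 3) XYZ values to gamma-encoded sRGB in [0, 1]."""
    linear = np.asarray(xyz, dtype=float) @ XYZ_TO_LINEAR_RGB.T
    linear = np.clip(linear, 0.0, 1.0)  # out-of-gamut values are clipped
    # Piecewise sRGB transfer function: linear toe, power law elsewhere.
    return np.where(linear <= 0.0031308,
                    12.92 * linear,
                    1.055 * np.power(linear, 1.0 / 2.4) - 0.055)

white = xyz_to_srgb([0.9505, 1.0, 1.0890])  # D65 white point
black = xyz_to_srgb([0.0, 0.0, 0.0])
```

The function accepts arrays of any leading shape, so a full X × Y × 3 XYZ image can be converted in one call.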

Results
Implementation Details. In our experiment, sensor information from the Ximea xiSpec (MQ022HG-IM-SM4X4-VIS2) snapshot camera was used to simulate the visible range (470-620 nm) 4 × 4 mosaic snapshot data. Synthetic image generation and demosaicking were performed on the HELICoiD and ODSI-DB datasets separately. The HELICoiD dataset contains 36 in vivo brain surface hyperspectral cubes in total, which we divided into 3 groups: 24 images for training, 6 images as the validation set and the other 6 images for testing. As for the ODSI-DB dataset, there are 122 hypercubes acquired from the line-scan sensor in total. Seventy-eight hypercubes were used for training, 20 for validation and 24 for testing. Since both datasets have cases where multiple hyperspectral data are obtained from the same subject, the datasets were split manually in order to avoid data from the same subject appearing in different groups. Both loss configurations described in Section 2.2.2 were tested in the experiment. For the perceptual loss configuration, a pre-trained VGG-16 network (Simonyan and Zisserman 2015) was used for feature extraction during the perceptual loss calculation, and the parameters of VGG-16 were fixed during training. In order to increase the number of training samples and limit GPU memory consumption, the hyperspectral data were randomly cropped into smaller patches with a spatial size of 224 × 224. Random flipping and random rotations by multiples of 90° were also performed for data augmentation. The batch size was set to 3 for all training processes, and the evaluation losses are the same as the training losses. Adam optimisation (Kingma and Ba 2014) was used with an initial learning rate of 0.0001, and the best models (lowest evaluation loss) after 10,000 epochs were selected for the proposed algorithm.
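The cropping and augmentation steps can be sketched as follows. The key point is that the same random crop, flips and rotation must be applied to the network input and to its targets. The function name `augment` and the 0.5 flip probabilities are assumptions; the source only states that random flipping and rotations by multiples of 90° were used.

```python
import numpy as np

def augment(rng, *cubes, patch=224):
    """Apply one shared random crop, flip and 90-degree rotation to all cubes."""
    X, Y = cubes[0].shape[:2]
    x0 = int(rng.integers(0, X - patch + 1))
    y0 = int(rng.integers(0, Y - patch + 1))
    flip_x = rng.random() < 0.5  # flip probability is an assumption
    flip_y = rng.random() < 0.5
    k = int(rng.integers(0, 4))  # number of quarter turns
    out = []
    for cube in cubes:
        p = cube[x0:x0 + patch, y0:y0 + patch]
        if flip_x:
            p = p[::-1]
        if flip_y:
            p = p[:, ::-1]
        out.append(np.rot90(p, k=k, axes=(0, 1)))
    return out

rng = np.random.default_rng(0)
inp = rng.random((300, 260, 16))  # toy interpolated input cube
tgt = inp[..., :10].copy()        # toy target sharing the spatial layout
inp_patch, tgt_patch = augment(rng, inp, tgt)
```

Square patches keep their shape under any quarter-turn rotation, which is why the output size stays 224 × 224 regardless of the sampled rotation.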
Quantitative Evaluation. Three metrics have been used to evaluate the demosaicking results: the average L1 error, the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR). The quantitative results of the demosaicked hyperspectral cubes from the HELICoiD and ODSI-DB datasets are listed in Table 1. In this table, results from both configurations with different auxiliary losses are shown, where the residual U-Net model with perceptual auxiliary loss performs slightly better than the model with L1 auxiliary loss, but the difference is subtle. However, when it comes to cross-dataset evaluation (ODSI-DB → HELICoiD), where the model trained on the ODSI-DB dataset was tested directly on the HELICoiD dataset without any fine-tuning, the perceptual loss model outperforms the L1 loss model significantly. It can also be observed that the cross-dataset results are slightly worse than the results of the model trained directly on the HELICoiD dataset, but they are still acceptable considering the domain gap between the HELICoiD and ODSI-DB datasets.
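Two of the three metrics are simple enough to sketch directly. The version below assumes data normalised to a range of 1.0 and averages over all pixels and bands; how the authors aggregated over bands and images is not stated, so this is illustrative only. SSIM is omitted because it requires a windowed implementation.

```python
import numpy as np

def mean_l1(pred, target):
    """Average absolute error over all pixels and bands."""
    return float(np.mean(np.abs(pred - target)))

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is an assumption."""
    mse = float(np.mean((pred - target) ** 2))
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))

# Toy example: a constant error of 0.1 on every value.
target = np.zeros((4, 4, 10))
pred = np.full((4, 4, 10), 0.1)
err = mean_l1(pred, target)
score = psnr(pred, target)
```

A uniform error of 0.1 on unit-range data gives a mean squared error of 0.01, which corresponds to a PSNR of 20 dB.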
The results were also evaluated based on the perceptual similarity of the sRGB images converted from the hyperspectral data. A perceptual similarity metric, namely LPIPS, was used to approximate image comparison by human perception (Zhang et al. 2018). A lower perceptual score indicates that the two images appear more similar to each other, with a score of 0 representing the best possible case, where the two images are the same. Table 2 lists the perceptual scores of the sRGB images to evaluate the quality of the demosaicked hypercube data. Here, all the images for testing are from the HELICoiD dataset, and the demosaicking model trained with ODSI-DB is not fine-tuned with any HELICoiD data. The demosaicking algorithm is also compared to the baseline linear demosaicking results, which are derived from the linearly demosaicked and spectrally corrected snapshot images that serve as the input of the residual U-Net, as shown in Figure 2. Similar trends can be observed in this table, where the perceptual loss model outperforms the L1 model. Also, for the HELICoiD dataset in particular, the supervised-learning-based demosaicking algorithm achieves substantially better scores than linear demosaicking. One hyperspectral cube from the HELICoiD test set has been selected to illustrate the result qualitatively, as shown in Figure 3. The sRGB images show that the models trained on the HELICoiD dataset achieve respectable reconstruction results, with the perceptual loss model producing a slightly sharper image, which can be observed around the vessels, for example. On the other hand, the model trained on the ODSI-DB dataset can also recover the spatial resolution of the image to some extent compared to linear demosaicking, but it still suffers from artefacts, as can be observed around the reflections in the image.
User Study. Besides quantitative analysis of the data, a qualitative user study was conducted to evaluate the quality of the demosaicked images. In this survey, the demosaicked HELICoiD test images were divided into six groups. Each group contains images of the same scene generated with four different demosaicking methods, i.e. linear demosaicking, the proposed algorithm with L1 and perceptual losses, as well as the ideal demosaicked image. The images in each group were randomly shuffled, and the labels were hidden. Twelve clinical experts took part in the survey and subjectively gave a Likert scale rating (integer score from 1 to 5, with 5 being the best quality) for each image. The quality scores of all experts were gathered and grouped by demosaicking method, and the percentage distributions are shown in the bar graph in Figure 4. The average score of all linearly demosaicked images is only 1.14 ± 0.15, and two experts claimed that some images seemed out of focus. The average scores for the proposed algorithm with the L1 and perceptual loss models are 2.40 ± 0.41 and 3.08 ± 0.75 respectively, indicating a perceptually higher image quality than linear demosaicking. The ideal demosaicked images achieve the highest average score of 3.60 ± 0.92. We have also performed paired t-tests between the score statistics of linear demosaicking and the L1 loss model, the L1 loss model and the perceptual loss model, as well as the perceptual loss model and the ideal demosaicked images, and the p-values are all smaller than the significance level of 0.05. This result indicates that the differences in subjective image quality scores between the different demosaicking methods are all statistically significant.
Preliminary Evaluation with Real Data. One of the concerns regarding the supervised-learning-based demosaicking algorithm is that the entire framework relies heavily on synthetic data. Therefore, a real snapshot mosaic image of a hand captured by the Ximea xiSpec (MQ022HG-IM-SM4X4-VIS2) was used to validate the effect of the algorithm, as illustrated by the converted sRGB images in Figure 5. The difference between linear demosaicking and the proposed algorithm with the perceptual loss model can be easily observed when zooming in on the details, where fingerprints can be recovered using the proposed algorithm. This result suggests that the algorithm generalises, especially considering that the two models were never trained on real snapshot images. We also generated a blood perfusion map from the super-resolved hyperspectral data, as shown in Figure 5, based on (Tetschke et al. 2016) to demonstrate the potential use of the algorithm in real medical applications. However, the cross-dataset results in Table 1 and Table 2 underline the domain gap between different datasets, which may cause image artefacts.
Real-time Performance. We tested our prototype implementation on our computational workstation for clinical research studies (NVIDIA TITAN RTX 24GB, Intel Core i9 9900K) by taking advantage of Python, C++, OpenGL, CUDA, and PyTorch. The proposed algorithm achieved an overall processing time of approximately 45 ms per 1088 × 2048 input image frame, including frame-grabbing, white balancing, bilinear demosaicking, followed by learning-based super-resolved demosaicking, spectral correction and finally either sRGB reconstruction or oxygenation saturation map estimation. The PyTorch-based U-Net super-resolution inference runs in 34 ms.

Conclusion
In this work, we propose a hyperspectral snapshot image demosaicking algorithm for computer-assisted surgery using synthetic image generation and supervised learning. The simulated snapshot images and their corresponding ideal demosaicked images can be generated from publicly available hyperspectral image datasets acquired by line-scan sensors. A demosaicking framework has been developed with the adoption of a residual U-Net for hyperspectral image super-resolution, which can be trained with the synthetic image pairs. The quantitative and qualitative results show that the supervised learning approach is able to produce better reconstruction results than simple linear demosaicking while still achieving a fast processing speed, which is beneficial for integration of the demosaicking algorithm into real-time surgical imaging applications. Future work will include further investigation of the generalisability of the algorithm when more real snapshot data are captured. Since the proposed demosaicking approach separates the learning-based spatial super-resolution from spectral calibration, generalisation of our approach to real snapshot images can be expected, and this is supported by the convincing results achieved in our preliminary real data evaluation. In addition, there is still room for improvement in the speed and image quality of the demosaicking algorithm. Nevertheless, the proposed demosaicking algorithm provides a solid step forward for medical hyperspectral imaging.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 5. Preliminary test results on real snapshot data converted into sRGB images. The linear demosaicking and the proposed algorithm are compared. The two images on the right also illustrate oxygenation saturation maps derived from hyperspectral information.