Previewable Contract-Based On-Chain X-Ray Image Sharing Framework for Clinical Research

Background: An image sharing framework is important for supporting downstream data analysis, especially during pandemics such as Coronavirus Disease 2019 (COVID-19). Current centralized image sharing frameworks become dysfunctional if any part of the framework fails. Existing decentralized image sharing frameworks do not store the images on the blockchain, so the images themselves are not highly available, immutable, or provable. Storing images on the blockchain provides availability, immutability, and provenance to the images, but raises three challenges: handling large images, high viewing latency, and software inconsistency when storing/loading images.
Objective: This study aims to store chest x-ray images using a blockchain-based framework that handles large images, reduces viewing latency, and enhances software consistency.
Basic Procedures: We developed a splitting and merging function to handle large images, a preview feature that lets users view a downscaled image earlier to reduce viewing latency, and a smart contract to enhance software consistency. We used 920 publicly available images to evaluate the storing and loading methods through time measurements.
Main Findings: The blockchain network successfully shared large images of up to 18 MB and supported a smart contract that provides immutability, availability, and provenance to the code. With the preview feature, the time until a shared image first became viewable was about 93% shorter than without it.
Principal Conclusions: The findings of this study can guide future studies in generalizing our framework to other forms of data to improve sharing and interoperability.


Introduction
As the COVID-19 pandemic persists, it is crucial for healthcare institutions to share COVID-19-related data on symptoms and side effects to support downstream efforts to identify and maintain the best prevention methods and treatments [1–6]. An important type of COVID-19-related data to share is chest x-ray images [7,8], which can be examined in pictorial reviews with more sample data to determine prognostic features and characteristics of COVID-19 pneumonia [1,2], or used to build more generalizable machine learning or deep learning models, such as Convolutional Neural Networks (CNNs), for COVID-19 detection [3,9–11]. Therefore, medical institutions need to share images, which requires a trustworthy data interoperability framework that can share large amounts of data, ideally independent of a single controller [12,13].
Current public centralized image sharing mechanisms, such as hospital image databases or open-source image sharing websites, enable collaborative and shareable image repositories [14]. However, they introduce a potential single-point-of-failure (Fig. 1A): any corruption or maintenance of the central repository would block other institutions from accessing the medical images stored on the centralized server.
To address this single-point-of-failure issue, prior studies [15–18] have proposed blockchain-based solutions. A blockchain is a decentralized, distributed ledger built on peer-to-peer networks and various consensus algorithms [19]. Blockchain has been proposed for various applications, such as genomic data access logging [20], pharmaceutical supply chains [21], and privacy-preserving predictive modeling on clinical research data [22–25], because of three main benefits it provides to the data stored on-chain: availability, immutability, and provenance [26]. First, the decentralized architecture of blockchain keeps medical images continuously available without a single-point-of-failure. Second, the block creation process generates an immutable audit trail (i.e., an unalterable ledger), which is crucial for storing medical images. Lastly, traceable and verifiable records and transactions ensure legitimacy, which is important in medical image sharing so that future procedures and findings based on those images are valid [26]. However, existing proposals [15–18] store only hashed medical images or pointers to the images on-chain rather than the images themselves (Fig. 1B). Therefore, the images themselves are not highly available, immutable, or provable.
Although storing images directly on the blockchain can provide availability, immutability, and provenance to the images, several challenges (Fig. 1C) remain that could preclude existing proposals from adopting this solution: (1) Large image size. Blockchain platforms usually limit the transaction size (e.g., Ethereum [27] can support only around 20 to 30 KB per transaction [28]), which can be smaller than the size of medical images. Hence, a mechanism to handle large images is needed. (2) Viewing latency. Block creation on a blockchain may be slower than writes to a traditional database, which would hinder quick access to the shared images. Therefore, a way to quickly preview the images is desirable. (3) Code inconsistency. Although the blockchain guards the data from being altered, the software that stores/loads the images may be changed accidentally or maliciously and thus become inconsistent across healthcare institutions. Thus, it would be desirable for the programs themselves to be immutable, provable, and highly available to improve consistency.

Objectives
We aim to utilize the blockchain benefits of immutability, provenance, and availability while addressing the (1) large-sized image handling, (2) image viewing latency, and (3) code inconsistency issues that emerge from sharing images through the blockchain. Our image sharing framework will (1) handle large images, (2) reduce viewing latency, and (3) enhance code consistency.

Method Overview
To achieve these three goals, we devised a framework with three corresponding components: (1) splitting and merging, (2) scaling and previewing, and (3) a smart contract (Fig. 1C). (1) Splitting and merging. To handle large images, we split each image into smaller "image pieces" that fit within the blockchain transaction size limit when storing it, and merge the pieces back into the image when loading it.
(2) Scaling and previewing. To reduce viewing latency, we created, stored, and loaded "preview" images, which are downscaled versions of their corresponding images, so that users can quickly glance over a preview image before the original image is stored and loaded (much like preview images on websites). (3) Smart contract. To improve code consistency across multiple sites, we developed a smart contract, a digital and immutable program deployed on certain blockchain platforms such as Ethereum [29], to store and load image pieces and preview images on the blockchain.
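The splitting and merging component can be sketched in a few lines. This is a minimal Python illustration, not the authors' Java implementation: the function names are hypothetical, and we assume 1 KB = 1,024 bytes because the paper does not specify the convention. Pieces are byte slices of the encoded image file, so merging is plain concatenation in the original order.

```python
KB = 1024  # assumption: 1 KB = 1,024 bytes (the paper does not specify)


def split_into_pieces(image_bytes: bytes, c_kb: int = 30) -> list[bytes]:
    """Split raw image bytes into consecutive pieces of at most c_kb KB each."""
    if c_kb <= 0:
        raise ValueError("c_kb must be positive")
    piece_size = c_kb * KB
    return [image_bytes[i:i + piece_size]
            for i in range(0, len(image_bytes), piece_size)]


def merge_pieces(pieces: list[bytes]) -> bytes:
    """Concatenate pieces in their original order to rebuild the image bytes."""
    return b"".join(pieces)


# Usage: 102,400 bytes of dummy data stand in for an image file.
data = bytes(range(256)) * 400
pieces = split_into_pieces(data)
print([len(p) for p in pieces])  # [30720, 30720, 30720, 10240]
assert merge_pieces(pieces) == data
```

Because only the last piece can be shorter than C KB, the number of pieces for an image is the ceiling of its size divided by C KB.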
Fig. 1. (A) A centralized solution stores data in one location, so when this location is attacked or under maintenance, no site can access the medical images. Multiple researchers, labeled R_{X,Y}, exist at each site X and are responsible for storing and loading the medical images; researcher icons are omitted in the rest of this figure to reduce redundancy. (B) Existing blockchain-based solutions mainly store hashed medical images or pointers to the images on-chain, so when any site becomes unavailable, the other sites can still access one another. However, only the stored hashes or pointers, and not the images themselves, receive the blockchain benefits of availability, immutability, and provenance. (C) Our solution handles large images by splitting and merging them, reduces high viewing latency by scaling images for a preview feature, and addresses code inconsistency by using a smart contract that provides availability/immutability/provenance to the code.
The design of our framework is displayed in Fig. 2. Images are split when stored (Fig. 2A) and merged when loaded (Fig. 2C). Storing and loading are supported by a smart contract (Fig. 2B). The storing, smart contract, and loading parts are detailed as follows:
• Storing previews/images (Fig. 2A). The input of this step is the uploaded images, and the outputs are the preview images and image pieces to be recorded on the blockchain. Each patient is stored in a patient structure with the patient ID as its unique identifier. If a patient has multiple images, these images can be stored at different times, using the patient ID to link them to the patient. First, each image is initialized as an image structure mapped to a patient structure, using its filename as the key. Next, we scale each image down to a preview of at most C KB (C = 30 in our experiments). Then, we split each original image into pieces of at most C KB each, whose sizes sum to the size of the image. The scaled preview image and the image pieces are stored on the blockchain.
• Smart contract (Fig. 2B). The inputs and outputs of this step are both the preview images and the image pieces recorded on the blockchain. We created a smart contract with two specifically designed data structures, Image and Patient, along with functions to store and load pieces and getter functions to retrieve information (Fig. 3). This allows us to store/load the preview images and image pieces while maintaining patient/image relationships.
• Loading previews/images (Fig. 2C). The inputs of this step are the preview image and the image pieces retrieved from the blockchain, and the outputs are the preview images and the merged images. First, we load the preview image to allow fast glancing at the image from another site. Next, we load the original image by retrieving its image pieces. Finally, we merge these image pieces back together to reconstruct the original image.
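To make the patient/image bookkeeping and the preview-first loading order concrete, the following Python sketch mimics the contract's state in memory. It is not Solidity and not the authors' contract (Fig. 3): the class and method names (ImageRegistry, store_piece, load_piece, piece_count, load_image) are hypothetical, and keeping pieces in the order they were stored is our assumption about how a contract might index them.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    filename: str
    preview: bytes = b""
    pieces: list[bytes] = field(default_factory=list)


@dataclass
class Patient:
    patient_id: str
    images: dict[str, Image] = field(default_factory=dict)  # keyed by filename


class ImageRegistry:
    """Hypothetical in-memory stand-in for the contract's on-chain state."""

    def __init__(self) -> None:
        self.patients: dict[str, Patient] = {}

    def _image(self, patient_id: str, filename: str) -> Image:
        # Create patient/image entries on first use; the patient ID and the
        # filename act as keys, mirroring the Patient -> Image relationship.
        patient = self.patients.setdefault(patient_id, Patient(patient_id))
        return patient.images.setdefault(filename, Image(filename))

    def store_preview(self, patient_id: str, filename: str, preview: bytes) -> None:
        self._image(patient_id, filename).preview = preview

    def store_piece(self, patient_id: str, filename: str, piece: bytes) -> None:
        # On-chain, each call would be one transaction; pieces keep store order.
        self._image(patient_id, filename).pieces.append(piece)

    def load_preview(self, patient_id: str, filename: str) -> bytes:
        return self.patients[patient_id].images[filename].preview

    def piece_count(self, patient_id: str, filename: str) -> int:
        return len(self.patients[patient_id].images[filename].pieces)

    def load_piece(self, patient_id: str, filename: str, index: int) -> bytes:
        return self.patients[patient_id].images[filename].pieces[index]


def load_image(reg: ImageRegistry, patient_id: str, filename: str, show) -> bytes:
    """Show the preview first, then fetch and merge the original pieces."""
    show(reg.load_preview(patient_id, filename))
    n = reg.piece_count(patient_id, filename)
    return b"".join(reg.load_piece(patient_id, filename, i) for i in range(n))


# Usage with made-up data: one patient, one 70,004-byte "image".
reg = ImageRegistry()
original = b"\x89PNG" + bytes(70_000)
reg.store_preview("P001", "img1.png", b"small-preview")  # preview stored first
piece_size = 30 * 1024
for start in range(0, len(original), piece_size):
    reg.store_piece("P001", "img1.png", original[start:start + piece_size])

shown = []
restored = load_image(reg, "P001", "img1.png", shown.append)
assert restored == original and shown == [b"small-preview"]
```

Storing the preview before the pieces is what lets a researcher at another site see something as soon as one small transaction is confirmed, rather than after all of an image's pieces.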

Implementation
The architecture of our implementation is shown in Fig. 4. Based on prior reviews [30,31], we chose Ethereum [27] as our platform because it executes smart contracts, is open-source, and is supported by a community [29,32]. Ethereum has been adopted for medical applications such as medical records management [33] and gene-drug interaction data sharing [32]. We configured Ethereum as a private/permissioned blockchain [34] (i.e., one that only authorized blockchain nodes/computers can join) to emulate an early-stage image sharing platform in which only a few authorized institutions participate in the network. We also adopted Clique [34], a Proof-of-Authority (PoA) consensus protocol [35] designed specifically for permissioned blockchains. We used PoA instead of consensus protocols such as Proof-of-Work (PoW) [32] because, by assuming that the nodes in the network are already authorized participants, it avoids the extensive computational cost and energy consumption of PoW and thus can improve the sustainability of our proposed solution.
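In Clique, the initial set of authorized sealers is fixed in the genesis block's extraData field. Per EIP-225, this field consists of 32 bytes of vanity data, then the 20-byte address of each signer, then 65 zero bytes reserved for the seal. The Python sketch below builds that value for a two-site network like the one described here. The addresses are hypothetical and the paper does not report its genesis settings; in a geth genesis file, the result would go in the `extradata` field alongside a `config.clique` section (with `period` and `epoch`).

```python
EXTRA_VANITY = 32  # leading bytes of free-form "vanity" data (EIP-225)
EXTRA_SEAL = 65    # trailing bytes reserved for a signer's signature (EIP-225)


def clique_genesis_extradata(signers: list[str]) -> str:
    """Build a Clique genesis extraData value listing the authorized signers."""
    addresses = []
    for s in signers:
        raw = bytes.fromhex(s.removeprefix("0x"))
        if len(raw) != 20:
            raise ValueError(f"not a 20-byte address: {s}")
        addresses.append(raw)
    addresses.sort()  # deterministic order, independent of input order
    extra = bytes(EXTRA_VANITY) + b"".join(addresses) + bytes(EXTRA_SEAL)
    return "0x" + extra.hex()


# Hypothetical signer addresses for the two institutions (sites A and B).
site_a = "0x" + "11" * 20
site_b = "0x" + "22" * 20
extradata = clique_genesis_extradata([site_b, site_a])
print(len(extradata))  # 276 = 2 ("0x") + 2 * (32 + 2 * 20 + 65)
```

Listing each institution's node address as a signer is what makes the network "permissioned" under PoA: only those nodes may seal blocks, so no mining work is needed.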
We implemented our smart contract in Solidity 0.5.10 [36] using the Remix IDE [37] and deployed it on Ethereum [27]. We coded the off-chain processes in Java and used Web3j [38] to interact with the Ethereum blockchain network. We set C to 30, so each piece stored on-chain is at most 30 KB. To conduct the experiments, we used two virtual machines on the UCSD Campus Amazon Web Services (AWS) cloud platform to represent two medical imaging institutions, each with 2 vCPUs, 8 GB RAM, and a 100 GB SSD.

Data
We extracted n = 920 chest x-ray images of patients who were positive for, or suspected of having, COVID-19 or other viral and bacterial pneumonias, such as Middle East Respiratory Syndrome (MERS), Severe Acute Respiratory Syndrome (SARS), and Acute Respiratory Distress Syndrome (ARDS), from the University of Montreal's publicly available image repository, collected from public sources, hospitals, and physicians [39,40]. Of these images, 711 were JPGs and 209 were PNGs. There were m = 450 patients, and each patient may have more than one associated image. The image filenames were unique in the dataset. Detailed statistics of the dataset are given in Table 1.

Experiment setting
To understand the performance of our proposed method ("patient-level with preview"), we compared it with a variant without the preview feature ("patient-level without preview"). Additionally, to investigate the extreme situation of "one image per patient", we removed the patient-image relationship to form another pair of methods ("image-level with preview" and "image-level without preview"). All four methods share the core functionalities of splitting, merging, and using a smart contract. We stored and loaded images sequentially, ordered first by patient ID and then by image filename within each patient.
For each method, we measured the storing time (i.e., the time required to publish the whole original image to the blockchain), the loading time (i.e., the time required to retrieve the whole original image from the blockchain), the first-viewable storing time, the first-viewable loading time, and the total first-viewable time. The first-viewable storing time is the time a researcher at one site takes to store the first "viewable" image (i.e., the preview image for methods with the preview feature, and the original image for methods without it). Similarly, the first-viewable loading time is the time a researcher at another site takes to load the first "viewable" image. Finally, the total first-viewable time is the sum of the first-viewable storing and first-viewable loading times; it represents how long a researcher must wait to preview an image being stored by another researcher, and is therefore our main metric for comparing the methods with and without the preview feature. We further conducted paired two-sample t-tests and calculated the Pearson Correlation Coefficient (PCC) for the two pairs of methods (i.e., between the two patient-level methods and between the two image-level methods).

Results
All times for the two patient-level methods are summarized in Table 2, and all times for the image-level methods are listed in Table 3. As the cost of including the preview image, the preview feature increased the average storing time by around 12 s at the patient level and 6 s at the image level, and the average loading time by around 0.6 s at the patient level and 0.1 s at the image level. Meanwhile, the preview feature decreased the average total first-viewable time by around 167 s at the patient level and 77 s at the image level, demonstrating reduced viewing latency.
The detailed comparison of the average total first-viewable times, together with the p-values and PCCs of the two pairs of methods, is shown in Fig. 5. The improvement of the "patient-level with preview" method over the "patient-level without preview" method has a p-value < 10^-9 and a PCC = 0.887, while the improvement of the "image-level with preview" method over the "image-level without preview" method has a p-value < 10^-44 and a PCC = 0.626. To understand the impact of the number of images per patient, we further analyzed our proposed "patient-level with preview" method (Fig. 6). In general, a larger total image size per patient, and especially a larger number of images per patient, lengthens the total first-viewable time.

Findings
We have the following major findings: (1) Large image handling from splitting and merging. As image sizes increase, so does the time needed to store and load the images (Fig. 6). We were able to store all images in the COVID-19 image dataset, the largest of which was 18.5 MB (Table 1). Furthermore, after stratifying first-viewable times by the number of images per patient, patients with more images had longer first-viewable times than patients with the same total image size (KB) but fewer images.
(2) Reducing viewing latency from the preview feature. The preview feature increases storing and loading times; however, it significantly reduces the time until a user can view an image, which is crucial in real-world applications. The first-viewable time was reduced by 93.2% for the patient-level methods (Table 2E) and by 92.7% for the image-level methods (Table 3E). Both the patient-level and image-level pairs of methods have low p-values, indicating statistically significant differences and demonstrating the effectiveness of the preview feature in reducing viewing latency. Both pairs also have moderate to high PCC values, showing correlation between patient size and first-viewable time. We further analyzed the number of blockchain transactions (each within the 30 KB size limit of Ethereum) of the "with preview" methods. There were 19,316 transactions, of which 18,396 were for image pieces and 920 were for preview images. Note that the number of blockchain transactions was the same for the patient-level and image-level approaches. Therefore, the preview feature accounts for 920 / 19,316 ≈ 4.8% of the transaction traffic. This additional traffic is relatively insignificant compared with the ≈ 93% reduction in first-viewable time, which may reflect the perceived day-to-day experience of the researchers and super users.
(3) Enhancing code consistency from the smart contract. We were able to deploy a smart contract that is stored on-chain alongside the images and therefore has the same immutability, provenance, and availability as the images.
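The transaction accounting in finding (2) can be reproduced with simple arithmetic. The helper below estimates per-image transaction counts under two assumptions not stated in the paper: each image costs ceil(size / C KB) piece transactions plus one preview transaction, and 1 KB = 1,024 bytes. The totals are the ones reported above.

```python
import math

KB = 1024  # assumption: 1 KB = 1,024 bytes


def transactions_for_image(size_bytes: int, c_kb: int = 30, with_preview: bool = True) -> int:
    """Piece transactions (ceil(size / C KB)) plus one optional preview transaction."""
    pieces = math.ceil(size_bytes / (c_kb * KB))
    return pieces + (1 if with_preview else 0)


# Reported totals for the 920 images ("with preview" methods).
piece_txs, preview_txs = 18_396, 920
total_txs = piece_txs + preview_txs
preview_share = preview_txs / total_txs
print(total_txs, f"{preview_share:.1%}")  # 19316 4.8%
print(f"{piece_txs / preview_txs:.1f} pieces per image on average")  # 20.0 pieces per image on average
```

Since every image contributes exactly one preview transaction regardless of its size, the preview overhead shrinks as images grow larger.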

Limitations
The limitations of our study include: (a) Blockchain Configuration. We applied this framework to a 2-node permissioned blockchain network as a proof-of-concept prototype. In a real-world clinical data research network such as pSCANNER [41], more institutions may be willing to participate in image data sharing. Also, other permissioned blockchain platforms such as Hyperledger Fabric [42] could be adopted in place of Ethereum. Hence, simulations with more nodes and with different blockchain platforms warrant investigation. (b) Image Scope. We were able to store/load images of up to 18 MB each. However, other types of images, such as Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and ultrasound images, can be larger than 100 MB [43]. In addition, a dataset with more images may contain duplicate image filenames for the same patient, which would require additional steps to resolve. Moreover, not all images in our dataset contained date/time information. In all, handling larger images, filename duplicates, and date/time information has yet to be investigated. (c) Patient Scope. The largest number of images for a patient was 22, and the average number of images per patient was only about 2. While our dataset only contained chest x-ray images, other institutions may publish images of different body parts such as […]
Table 2 Results for patient-level methods. All times are measured in seconds.
[…] [47–49] and including medical experts as "data champions" [50] to supervise the proper sharing of the images are yet to be included in our framework.

Conclusion
Our results support the use of a permissioned blockchain to share images through on-chain storage, which provides immutability, availability, and provenance to the images themselves while addressing the challenges of on-chain storage. All images, including large images of up to 18 MB, were handled by our splitting and merging method. The preview feature effectively reduced viewing latency. Because the patient-level setting better reflects real-world applications that must preserve patient-level data, the success of the "patient-level with preview" method supports using blockchain together with the preview feature. In addition to providing immutability, availability, and provenance to the images, the smart contract ensured code consistency.
Although we only worked with clinical data consisting of chest x-ray images related to COVID-19, MERS, SARS, and ARDS, our framework can be generalized to other forms of images, such as epidemiological and biological ones. A variety of medical images can prove useful in analyzing and combating their corresponding diseases. Other COVID-19-related medical images, such as CT scans, have been used in convolutional neural networks to diagnose COVID-19 pneumonia [51]. Among non-COVID-19 medical images, brain MR images have been used to detect Alzheimer's Disease progression [52] and, with machine learning algorithms, to detect brain tumors [53], and breast ultrasounds have been used to detect breast cancer [54].
Overall, this study supports blockchain-based methods that store data on-chain, which can suit various healthcare needs such as mass image sharing among multiple institutions. Our contributions can be summarized as: (a) designing an image sharing blockchain that provides immutability, availability, and provenance to the images; (b) handling large images through a split/merge method, improving viewing latency through a preview feature, and enhancing software consistency through the use of a smart contract; and (c) creating a framework generalizable to other image types to improve sharing/interoperability.
Fig. 6. The breakdown of the total first-viewable time results of the "patient-level with preview" method by the number of images per patient.

Summary Table
What was already known?
• Centralized repositories can share images but possess a single-point-of-failure.
• Decentralized blockchain-based frameworks can share images but may not provide immutability, provenance, and availability to the images themselves.
What did this study add to our knowledge?
• Storing images on the blockchain allows the blockchain to directly provide immutability, availability, and provenance to the images.
• Splitting and merging images addresses the large image issue.
• A preview feature in the blockchain-based image sharing framework reduces viewing latency.
• Using a smart contract allows the blockchain to directly provide immutability, availability, and provenance to the code to improve its consistency.