Differential privacy preserved federated transfer learning for multi-institutional 68Ga-PET image artefact detection and disentanglement

Purpose
Image artefacts continue to pose challenges in clinical molecular imaging, resulting in misdiagnoses, additional radiation doses to patients, and financial costs. Mismatch and halo artefacts occur frequently in whole-body PET/CT imaging with gallium-68 (68Ga)-labelled compounds. Correcting these artefacts is not straightforward and requires algorithmic developments, given that conventional techniques have failed to address them adequately. In the current study, we employed differential privacy-preserving federated transfer learning (FTL) to manage clinical data sharing and tackle privacy issues when building centre-specific models that detect and correct artefacts in PET images.

Methods
Altogether, 1413 patients with 68Ga prostate-specific membrane antigen (PSMA)/DOTA-TATE (TOC) PET/CT scans from 8 centres in 3 countries were enrolled in this study. CT-based attenuation and scatter correction (CT-ASC) was used in all centres for quantitative PET reconstruction. Prior to model training, an experienced nuclear medicine physician reviewed all images to ensure the use of high-quality, artefact-free PET images (421 patients' images). A deep neural network (modified U2Net) was trained on 80% of the artefact-free PET images under centre-based (CeBa), centralized (CeZe), and the proposed differential privacy FTL frameworks. Quantitative analysis was performed on the remaining 20% of the clean (artefact-free) data in each centre. A panel of two nuclear medicine physicians conducted a qualitative assessment of image quality, diagnostic confidence, and image artefacts in 128 patients with artefacts (256 images: CT-ASC and FTL-ASC).

Results
The three approaches investigated in this study for 68Ga-PET imaging (CeBa, CeZe, and FTL) resulted in mean absolute errors (MAE) of 0.42 ± 0.21 (95% CI: 0.38 to 0.47), 0.32 ± 0.23 (95% CI: 0.27 to 0.37), and 0.28 ± 0.15 (95% CI: 0.25 to 0.31), respectively. Statistical analysis using the Wilcoxon test revealed significant differences between the three approaches, with FTL outperforming CeBa and CeZe (p-value < 0.05) on the clean test set. The qualitative assessment demonstrated that FTL-ASC significantly improved image quality and diagnostic confidence and decreased image artefacts compared to CT-ASC in 68Ga-PET imaging. In addition, mismatch and halo artefacts were successfully detected and disentangled in the chest, abdomen, and pelvic regions.

Conclusion
The proposed approach benefits from using large datasets from multiple centres while preserving patient privacy. Qualitative assessment by nuclear medicine physicians showed that the proposed model correctly addressed two main challenging artefacts in 68Ga-PET imaging. This technique could be integrated in the clinic for 68Ga-PET image artefact detection and disentanglement using multicentric heterogeneous datasets.

Supplementary Information
The online version contains supplementary material available at 10.1007/s00259-023-06418-7.


Differential Privacy
By introducing noise into the data, differential privacy protects individuals' privacy within a dataset [3]. Differential privacy aims to ensure that the inclusion or exclusion of any individual in the dataset has no appreciable influence on the outcome of a statistical analysis [3, 13]. This is accomplished using a randomization mechanism, such as the Laplace or Gaussian mechanism, to introduce noise into the data [3, 13–17]. The privacy budget ε, representing the maximum amount of privacy loss deemed acceptable, determines how much noise is added to the data [3]. The amount of noise is commonly calibrated to the "sensitivity" of the function being computed [13–17]. The sensitivity refers to the maximum change in the function's output brought about by including or excluding a single subject in the dataset [3, 13–17].
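As a minimal sketch of this calibration (not the authors' implementation), the Laplace mechanism adds noise with scale equal to the sensitivity divided by ε; the counting-query example and function name below are purely illustrative:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release `value` with Laplace noise of scale b = sensitivity / epsilon.

    For a query with L1-sensitivity `sensitivity`, this noise scale
    yields epsilon-differential privacy.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query (how many scans contain an artefact) has
# L1-sensitivity 1, since adding or removing one patient changes the
# count by at most 1.
true_count = 128
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(private_count)
```

Note the trade-off this makes explicit: a smaller ε (stricter privacy) enlarges the noise scale and degrades the utility of the released value.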
A randomized algorithm M is (ε, δ)-differentially private if, for any two neighbouring datasets D1 and D2 (differing in a single individual) and any event E in the output range R of M, Pr[M(D1) ∈ E] ≤ e^ε · Pr[M(D2) ∈ E] + δ [3, 13–17]. That is, for any event E, the probability of the event under the algorithm's output distribution on D1 is no more than e^ε times its probability under the output distribution on D2, plus δ [3]. If δ = 0, the algorithm is termed pure differentially private (DP); if δ > 0, it is termed approximate DP [3].
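To make the inequality concrete, here is a small self-contained check (an illustrative sketch, not from the paper) that the Laplace mechanism with δ = 0 satisfies the pure-DP bound: the ratio of its output densities on two neighbouring datasets never exceeds e^ε.

```python
import numpy as np

# Check the pure-DP bound Pr[M(D1) in E] <= e^eps * Pr[M(D2) in E]
# for the Laplace mechanism on a counting query (sensitivity 1).
eps = 0.5
sensitivity = 1.0
b = sensitivity / eps          # Laplace noise scale
f_d1, f_d2 = 100.0, 101.0      # query outputs on two neighbouring datasets

xs = np.linspace(80, 120, 100001)
dens1 = np.exp(-np.abs(xs - f_d1) / b) / (2 * b)   # output density on D1
dens2 = np.exp(-np.abs(xs - f_d2) / b) / (2 * b)   # output density on D2

max_ratio = np.max(dens1 / dens2)
print(max_ratio, np.exp(eps))  # max density ratio ~1.6487, never above e^eps
```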
The Gaussian noise mechanism is an effective technique for implementing DP [3, 13–17]. It adds zero-mean multivariate Gaussian noise with standard deviation σ · Δ₂f to the output of a function f with L2-sensitivity Δ₂f, defined as the maximum L2-norm difference in the output of the function for any two neighbouring datasets [3]. The parameter σ is chosen based on the L2-sensitivity Δ₂f and the privacy parameters ε and δ. Gaussian noise can be applied to local model parameters before server aggregation, to global parameters on the server before distribution, and during local training [3].
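A minimal sketch of the first of those options, assuming the common recipe of clipping each centre's update to bound its L2-sensitivity and then applying the classical Gaussian-mechanism calibration (valid for ε < 1); the function name and parameter values are illustrative, not the paper's training code:

```python
import numpy as np

def privatize_update(update, clip_norm, epsilon, delta, rng=None):
    """Clip a local model update to bound its L2-sensitivity, then add
    Gaussian noise calibrated for (epsilon, delta)-DP (assumes epsilon < 1).

    Clipping to `clip_norm` ensures that changing one subject's data can
    alter the clipped update by at most Delta_2 = clip_norm in L2 norm.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    # Classical Gaussian-mechanism calibration (Dwork & Roth):
    # sigma >= sqrt(2 ln(1.25/delta)) * Delta_2 / epsilon
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * clip_norm / epsilon
    return clipped + rng.normal(0.0, sigma, size=clipped.shape)

# Each centre could privatize its update before sending it to the server:
local_update = np.random.randn(1000) * 0.01   # stand-in for model-weight deltas
noisy_update = privatize_update(local_update, clip_norm=1.0,
                                epsilon=0.8, delta=1e-5)
```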

Table 1 .
Summary statistics of quantitative parameters for different approaches.

Table 2 .
Summary statistics of quantitative parameters for models trained separately at each center (CeBa) and tested on all test sets (centers 1–8). I.e., the column "Center 1" presents the results of testing on the whole test set when training was performed using only the Center 1 dataset. The "All test sets" entry presents the results of models in which training and testing were performed at the same center (the whole 20% clean dataset).

Table 3 .
Summary statistics of quantitative parameters for the different centers using FTL and tested on all test sets (centers 1–8). I.e., the column "Center 1" presents the results of testing on the whole dataset when training was performed using only the Center 1 dataset. The "All test sets" entry presents the results of models in which training and testing were performed at the same center (the whole 20% clean dataset).