Abstract
In addition to threatening human lives, the recent COVID-19 pandemic has
also highlighted how misinformation is plaguing our online social
networks. However, privacy and ethical concerns reduce data sharing by
stakeholders, impeding data-driven misinformation detection. Current
data encryption techniques, which provide privacy guarantees on data,
cannot be naively extended to inference over text inputs with Deep
Learning (DL) models, mainly because of their inherent non-polynomial
operations (which are not encryption-compatible), the error introduced
by approximate polynomial activations (which are valid only for a
limited range of input values), and the error accumulated over stacked
encrypted operations of DL classifiers. In this paper, we propose an encrypted
federated learning (EFL) framework for text-based misinformation
detection as a secure and privacy-aware cloud service, in which
classifiers are securely trained in an FL setting (which preserves the
privacy of the training-data holders) and inference is later performed
on homomorphically encrypted data (which preserves the privacy of
clients' data). We evaluate three classifier architectures on two
publicly available text-based misinformation detection datasets:
Logistic Regression (LR), Multilayer Perceptron (MLP), and a novel
encryption-compatible self-attention network (SAN) proposed in this
paper. To reduce the
error induced by polynomial activations, we show, both formally and
empirically, the efficacy of L2 regularization during classifier
training. To control the error accumulated across cascaded operations
over encrypted data, we advocate, with formal proofs, the use of the
sigmoid activation and empirically validate our claims. In our case,
simply replacing the ReLU activation with sigmoid reduced the output
error by a factor of 1750 in the best case and by a factor of 43.75 in
the worst case.