1 Introduction

Facial plastic surgery typically reshapes the structure or improves the appearance of the face or neck. Procedures commonly involve the nose, ears, chin, cheekbones, and neckline. People seeking this surgery may be motivated by a desire to remove irregularities introduced in the face by an injury, a disease, congenital disabilities, or post-surgical deformities. The following statistics were released by The American Society for Aesthetic Plastic Surgery (ASPS) and the International Study on Aesthetic Procedures (ISAP) for the year 2019.

  1. According to the ASPS, almost 17.7 million surgical and minimally invasive cosmetic procedures and more than 5.9 million reconstructive procedures were performed in the United States in 2019.

  2. According to the ISAP, more than 2 million facial rejuvenation surgical procedures were performed in 2019; the most common were Chemical Peel, Full-Field Ablative Resurfacing, Micro-Ablative Resurfacing, Dermabrasion, Microdermabrasion, Nonsurgical Skin Tightening, and Face Rejuvenation.

  3. In 2019, 11.1% and 8.5% of face and head plastic surgery procedures were performed in Brazil and the USA, respectively, while the USA and Japan accounted for 10.5% and 18.3% of facial rejuvenation procedures, respectively.

  4. Plastic surgery distribution by age: 0−18 years constitutes 4%, 19−34 years 43%, 35−50 years 36%, 51−64 years 14%, and 65 years and above 3% of the total number of plastic surgery procedures.

  5. As per ISAP 2019 data, women account for 86.5% and men for 13.5% of plastic surgery patients.

The statistics provided by the ASPS and ISAP indicate the popularity of facial plastic surgery across all age categories, ethnicities, countries, and genders. In South Korea, one in three women between 19 and 29 years of age has undergone cosmetic or plastic surgery.

These surgical procedures prove ideal for individuals experiencing facial distortions or those who want to counter the aging process. However, they can also be misused by individuals attempting to conceal their identity in order to commit fraud or evade law enforcement [34]. Such surgeries may permit anti-social elements to move around openly with no fear of being recognized by any automated face recognition framework. A considerable amount of research has been reported on plastic surgery [33], cross-age face recognition (CAFR) [40], synthetic aging [5, 8], and synthetic face mask recognition [27]. Due to advances in medical technology, variation in faces caused by plastic/cosmetic surgery has also emerged as a covariate [33] of face recognition. Furthermore, if synthetic aging is added to surgery faces, the cross-age face recognition task becomes arduous.

Facial plastic surgery is a discipline that requires years of training for a surgeon to gain the necessary experience, skill, and dexterity. As the demand for minimally invasive procedures (MIP) increases rapidly, patients want to know how the changes will be reflected on their face after surgery. In these procedures, however, the surgeon's vision often relies entirely on their own and the patient's imagination. Due to the lack of appropriate visualization techniques and technology, surgeons are bound to rely on their skills and imaginative power while performing the surgery, which makes the task more challenging.

To alleviate these challenges, we propose the PlasticGAN framework, which can generate diverse photo-realistic faces with respect to facial surgery; it can work as a middleware between surgeons and patients and aid clinical decisions with the help of vivid visualizations. In this manuscript, we also quantify the effect of plastic surgery combined with aging and mask-wearing on the performance of face recognition systems.

Our key efforts are summarized as follows:

  1. An effective Conditional Generative Adversarial Network (cGAN) based network, PlasticGAN, is proposed to solve, for the first time, the face aging and rejuvenation problem on faces that have undergone plastic surgery. Specifically, age and gender are passed as conditional information into both the discriminator and generator to acquire more fine-grained information between the input and the synthesized results. Besides, BlockCNN-based residual blocks are adopted to remove artifacts and improve convergence behavior.

  2. PlasticGAN works as a middleware between surgeons and patients, providing motivation and boosting confidence for the surgery by giving a better glimpse of the post-surgery look.

  3. Our framework does not require pre- and post-plastic surgery faces in the training dataset. At test time, our model synthesizes face aging, rejuvenation, and face completion on surgery faces.

  4. We define a new challenge in the face recognition field, named plastic surgery-based synthetic cross-age face recognition (PSBSCAFR).

  5. We evaluate the robustness of the PlasticGAN model through an extensive qualitative and quantitative comparison on faces with and without face masks, which will contribute to the forensic and law enforcement fields.

The primary aim of this paper is to add a new dimension to clinical decision-making between surgeon and patient, as well as to assess the impact on cross-age face recognition for faces that have undergone plastic surgery. The remainder of the paper is organized as follows: Section 2 presents a detailed description of different types of plastic surgery. Section 3 provides generative model-based related work on face images. Section 4 presents the proposed PlasticGAN model in detail. Section 5 presents the overall objective functions used to optimize our model. Section 6 describes data collection and pre-processing for surgery and mask-wearing faces. Section 7 presents the qualitative study on different types of surgeries. Section 8 discusses the qualitative study on mask-wearing faces. Section 9 presents extensive quantitative experiments to demonstrate the superiority and wide-ranging practical applications of our proposed model. Section 10 presents the ablation study. Finally, Section 11 concludes the paper and discusses the challenges in the face recognition research domain.

2 Plastic surgery and face recognition

Primarily, plastic surgery on the face can be characterized into two major categories: (1) local plastic surgery and (2) global plastic surgery. Local plastic surgery corrects defects and craniofacial anomalies and improves skin texture; it is also used for cosmetic and aesthetic purposes [28]. Procedures for local plastic surgery include Rhinoplasty, Mentoplasty, Blepharoplasty, Browlift, Malar Augmentation, and Otoplasty. Global plastic surgery remodels the overall facial structure. It is recommended for cases of recovery from fatal burns, changing facial appearance and skin texture, and modifying facial geometry. Procedures for global plastic surgery include Rhytidectomy, Skin Peeling, Craniofacial surgery, Dermabrasion, and Mole Removal [26, 32].

When global plastic surgery is performed on an individual, face components such as the nose, lips, and eyelid geometries might be disturbed or modified. In parallel, we observed that face recognition accuracy has been significantly improved by commercial off-the-shelf (COTS) and open-source deep face recognition systems on face aging [5], plastic surgery [34], and disguised faces [10].

In this paper, PlasticGAN, based on a generative adversarial model, synthesizes faces that have undergone plastic surgery, with and without masks, considering face aging and rejuvenation effects. Subsequently, we perform face verification using the Face++ app on faces generated by PlasticGAN. This creates a significant challenge for face recognition systems, as it produces extensive changes in facial features. This challenge, namely plastic surgery-based synthetic cross-age face recognition (PSBSCAFR), can become a new dimension for upcoming research.

3 Related work

3.1 Generative adversarial networks

GAN was proposed by Ian Goodfellow et al. and incorporates two networks: a generator, which produces new instance data from noise, and a discriminator, which judges the genuineness of the images produced by the generator. GAN has gained interest due to its high performance in a wide range of practical applications such as facial attribute alteration [3, 19], finding missing children [4, 5], transferring and removing makeup styles [7], super-resolution techniques [22], anti-forensics [11, 37], law enforcement [1, 21], and Age-Invariant Face Recognition [40]. Additionally, GAN can explicitly control features or the latent vector through class conditions such as categorical text descriptions [29], landmark-type conditions, and background and foreground details; these conditions make it a conditional GAN (cGAN) [25] model.

However, GAN still suffers from unstable training and mode collapse. GAN-based Single Image Super-Resolution (SISR) models such as SRGAN [22] generate photo-realistic results despite these problems; however, computing the loss in feature space makes them sacrifice pixel-level fidelity. Meanwhile, Wasserstein GAN (WGAN) [2] and Wasserstein GAN with Gradient Penalty (WGAN-GP) [15] improved training by adopting the Wasserstein (Earth Mover) distance as the loss metric, addressing generator/discriminator training issues and gradient vanishing. They stabilize training over an extensive range of architectures with almost no hyper-parameter tuning. WGAN-GP does not rely on weight clipping but penalizes the model if the gradient norm moves away from its target value of 1. The adversarial loss proposed by Gulrajani et al. [15] moves the distribution of the generated images toward the distribution of the real images. In particular, we seek to generate photo-realistic and less blurry images of post-surgery faces. To accomplish this, we employ the deep feature consistency principle [17] to generate comprehensible face images with natural eyes, teeth, hair texture, and nose. In parallel, we also use improved GAN training mechanisms to generate images that lie on the manifold of natural images.

All these principles and improved GAN techniques motivate us to work in the facial plastic and aesthetic surgery field, benefiting society and addressing current trends such as forensic and security applications and the recent worldwide COVID-19 face mask scenario.

3.2 Face aging and rejuvenation

Face age progression is the prediction of future looks, and rejuvenation, also referred to as facial age regression, is the estimation of younger faces. It significantly impacts a wide range of applications in various domains. Generative models [14] are progressively used to perform face aging and de-aging due to their plausible generation of natural face images with an adversarial training approach. For example, Zhang et al. proposed CAAE [38], a face aging and de-aging framework that learns a face manifold. Yang et al. [36] designed a pyramidal adversarial discriminator for detailed high-level face representations. Wang et al. [35] presented an identity-preserving conditional GAN and used a pre-trained age classification loss to estimate age correctly. Attention-cGAN [41] used an attention-based mechanism that modifies only the facial regions relevant to aging. Recently, Praveen et al. [5] proposed ChildFace, specific to child face aging and de-aging, introducing a gender- and age-aware condition vector to preserve identity within a small age span.

The aforementioned generative models used for beautification and face rejuvenation inspired us to propose PlasticGAN, which integrates both face aging and rejuvenation. Moreover, it does not require training on a dataset with before- and after-plastic-surgery faces. We leverage a conditional GAN-based architecture integrated with adversarial, perceptual-based identity, and reconstruction loss functions. The proposed model is designed as an innovative method for vivid visualization of realistic post-surgery faces, helping surgeons build confidence and acquire the patient's acceptance for surgery.

4 Network architecture

We propose the PlasticGAN framework. Its architecture evolved from the deep feature consistency principle [18], the adversarial autoencoder [23], and BlockCNN-based residual blocks [24]. As depicted in Fig. 1, the PlasticGAN system consists of four deep networks: a) a deep residual variational autoencoder including a probabilistic encoder E(x), b) a probabilistic generator G(z,l), c) a pre-trained VGG19 (Φ) [31] for identity preservation, and d) a deep residual critic discriminator \(D_{img}(x,\bar {x})\). The model builds on the principle of WGAN-GP [15] to improve face aging and rejuvenation accuracy in terms of generating natural and realistic images.

Fig. 1

Overview of the training and testing phases of the proposed PlasticGAN model. Encoder (E), Generator (G), and Discriminator (Dimg) are used for reconstruction and aesthetic surgery. \({\mathscr{L}}_{KL}(\mu ,\sigma )\) represents the KL loss, \({\mathscr{L}}_{rec}\) the reconstruction loss, Φ the perceptual loss, and \({\mathscr{L}}_{advG}\) and \({\mathscr{L}}_{advD_{img}}\) the generator and discriminator adversarial losses. For simplicity, we have omitted the total variation loss \({\mathscr{L}}_{TV}\)

The encoder (E) compresses the input image x of size 128 × 128 × 3 and produces, through two fully connected layers, a mean μ and a variance σ that are combined to sample the latent vector z. This vector z is then appended with an identity feature map l that combines two vectors: the age vector (a) repeated 12 times and the gender vector (g) repeated 25 times. The concatenated vector zl is passed as input to the generator (G) to generate the image \(\bar {x}\).
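For concreteness, the sampling and conditioning step can be sketched in PyTorch as follows (the tensor shapes and function name are illustrative assumptions, not the authors' released code):

```python
import torch

def sample_and_condition(mu, sigma, age, gender):
    """Reparameterize and condition the latent code (a sketch; the
    one-hot age/gender vector shapes are assumptions).

    mu, sigma : (B, dz) outputs of the encoder's two FC heads
    age       : (B, da) one-hot age-group vector a
    gender    : (B, dg) one-hot gender vector g
    """
    eps = torch.randn_like(sigma)            # eps ~ N(0, 1)
    z = mu + sigma * eps                     # element-wise reparameterization
    # Tile the conditions as stated in the text: a repeated 12 times,
    # g repeated 25 times, then append to z
    l = torch.cat([age.repeat(1, 12), gender.repeat(1, 25)], dim=1)
    return torch.cat([z, l], dim=1)          # zl, the input to G
```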

In the next step, VGGNet takes both x and \(\bar {x}\) as input, extracts deep features from these images, and constructs the perceptual loss. Meanwhile, the generated and real images are conveyed to the discriminator Dimg. The idea is to take these two as inputs to play the adversarial min-max game and to measure the discrepancy between the generated and input images. In addition, Dimg also receives the identity feature map l (i.e., the age and gender vectors) in its first hidden layer, as depicted in Fig. 1; these vector values serve as a conditional setting to obtain more fine-grained age and gender information between x and \(\bar {x}\).

E, G, and Dimg have BlockCNN-based residual blocks after each convolution and deconvolution layer, except the first layer of Dimg. These blocks improve convergence behavior and remove compression artifacts [24]. Spatial differences are calculated at intermediate layers of the pre-trained VGG19 network and then combined to obtain the total perceptual loss (Φ). This loss network is based on the deep feature consistency principle [17] and captures the most prominent image features. Our model generates age-progressed and regressed plastic surgery images with comparatively better aesthetic results in terms of reconstructing damaged facial parts such as nose, teeth, lips, mouth, and ear textures.

5 Objective function

The adversarial training of PlasticGAN can be considered a two-player min-max game in which the team of probabilistic residual encoder and generator is trained against the residual adversarial discriminator. The networks minimize five losses: 1) the KL divergence loss \(({\mathscr{L}}_{KL}(\mu ,\sigma ))\), which regularizes the feature vector z toward the prior distribution \(P(z) \sim N(0,1)\); 2) the reconstruction loss \(({\mathscr{L}}_{rec})\) between input and generated images, adopted so that sparse aging outcomes are produced and the image background is preserved; 3) the perceptual loss \({\mathscr{L}}_{\Phi }\), computed with the pre-trained high-performing VGGNet [31] as the loss network, which captures the spatial correlation between the x and \(\bar x\) face images; 4) the adversarial loss \({\mathscr{L}}_{adv}\), which incorporates WGAN-GP into PlasticGAN to improve the perceptual quality of the output images; 5) the total variation loss \(({\mathscr{L}}_{TV})\), which regularizes the total variation in the generated images.

KL divergence loss

\({\mathscr{L}}_{KL}(\mu ,\sigma )\) helps the residual encoder network learn better feature space representations. For an input face image x, the encoder E(x) = (μ(x),σ(x)) outputs the mean μ and variance σ of the approximate posterior. To calculate the feature vector z for a given x, we sample a random 𝜖 from a Gaussian distribution, \(\epsilon \sim \mathcal {N}(0,1)\), and set \(z = \mu + \sigma \bigodot \epsilon \), where \(\bigodot \) denotes element-wise multiplication. \({\mathscr{L}}_{KL}(\mu ,\sigma )\) is given in (1).

$$ \mathcal{L}_{KL}(\mu,\sigma) = -\frac{1}{2}\sum_{k}\left(1+\log(\sigma_{k})-\mu_{k}^{2}-\sigma_{k}\right) $$
(1)

where k indexes the dimensions of the latent vector.
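A direct transcription of (1), treating σ as the variance output of the encoder (a sketch, not the authors' implementation):

```python
import torch

def kl_loss(mu, sigma):
    # L_KL(mu, sigma) = -1/2 * sum_k (1 + log(sigma_k) - mu_k^2 - sigma_k),
    # averaged over the batch
    return (-0.5 * torch.sum(1 + torch.log(sigma) - mu.pow(2) - sigma,
                             dim=1)).mean()
```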

Reconstruction Loss

\({\mathscr{L}}_{rec}\) ensures the generated image preserves the low-level image content of its input x. For this, we incorporate a mean-square reconstruction loss between x and \(\bar {x}\) in the image space, written as (2).

$$ \mathcal{L}_{rec} = \left\| x - G(E(x),l) \right\|_{2}^{2} $$
(2)

where G takes the latent vector z generated by E(x), concatenated with the identity feature map l (together denoted zl), for the input image x.

Perceptual loss

The perceptual loss measures the spatial difference between layers of VGG19 [31] and effectively minimizes the perceptual distance between the synthesized \(\bar {x}\) and the input image x. We denote the feature map of layer Υ by Φ(x)Υ and exploit the intermediate activation layers relu1_1, relu2_1, relu3_1, relu4_1, and relu5_1 of VGG19.

$$ \Phi = \sum_{\Upsilon} \Phi^{\Upsilon} $$
(3)

To calculate ΦΥ at layer Υ, we use the Euclidean distance between the activation maps of layer Υ for the input image x and the generated image \(\bar {x}\). Here C, W, and H denote the number of filters, the width, and the height of each feature map, respectively. ΦΥ denotes the perceptual loss of a single layer Υ.

$$ \Phi^{\Upsilon} = \frac{1}{2\,C^{\Upsilon} W^{\Upsilon} H^{\Upsilon}}\sum\limits_{c=1}^{C^{\Upsilon}} \sum\limits_{w=1}^{W^{\Upsilon}} \sum\limits_{h=1}^{H^{\Upsilon}} \left({\Phi}(x)^{\Upsilon}_{c,w,h}-{\Phi}(\bar{x})^{\Upsilon}_{c,w,h}\right)^{2} $$
(4)
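Equations (3)-(4) can be sketched with torchvision's VGG19; the feature indices below correspond to relu1_1 through relu5_1 in that implementation and should be treated as an assumption:

```python
import torch
import torchvision.models as models

class PerceptualLoss(torch.nn.Module):
    """Multi-layer VGG19 perceptual loss, Eqs. (3)-(4)."""
    LAYERS = {1, 6, 11, 20, 29}   # relu1_1 ... relu5_1 in vgg19().features

    def __init__(self):
        super().__init__()
        self.vgg = models.vgg19(pretrained=True).features.eval()
        for p in self.vgg.parameters():
            p.requires_grad = False            # frozen loss network

    def forward(self, x, x_hat):
        loss, hx, hg = 0.0, x, x_hat
        for i, layer in enumerate(self.vgg):
            hx, hg = layer(hx), layer(hg)
            if i in self.LAYERS:
                B, C, H, W = hx.shape
                # Phi^Y: squared Euclidean distance, normalized per Eq. (4)
                loss = loss + ((hx - hg) ** 2).sum() / (2 * B * C * W * H)
            if i == max(self.LAYERS):          # no deeper layers needed
                break
        return loss
```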

Adversarial loss

The adversarial training between the generator G and discriminator Dimg encourages the generated results to be realistic and identical to real ones. Besides, image generation quality and attribute immutability are guaranteed by including the attributes of the input face images as a conditional vector in adversarial training. To accomplish these two goals, our discriminator Dimg takes the input and generated images with their corresponding attributes after the first convolution block. Dimg calculates the improved adversarial loss by discriminating between the input image x and the image generated by G. Formally, the objective for training the discriminator adversarial loss \(({\mathscr{L}}_{advD_{img}})\) is shown in (5).

$$ \begin{aligned} \mathcal{L}_{advD_{img}} ={} & \mathbb{E}_{x,l\sim P_{data}(x,l)}\left[D_{img}(x,l)\right] - \mathbb{E}_{x,l\sim P_{data}(x,l)}\left[D_{img}\left(G(E(x),l)\right)\right] \\ & - \lambda_{gp}\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}\left[\left(\left\|\nabla_{\hat{x}} D_{img}(\hat{x})\right\|_{2}-1\right)^{2}\right] \end{aligned} $$
(5)

where \(\hat {x} \sim P_{\hat {x}}\) is sampled uniformly along straight lines between pairs of input x and generated \(\bar {x}\) images, λgp is the gradient penalty coefficient, and the critic is trained with ncritic = 5. The generator network G is trained to confuse Dimg with visually convincing synthetic images; its objective function is shown in (6).

$$ \mathcal{L}_{advG}= -\mathbb{E}_{x,l\sim P_{data}\left( x,l\right)}\left[D_{img}\left( G\left( E\left( x\right),l\right)\right)\right] $$
(6)
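The gradient penalty term of (5) can be sketched as follows (threading the conditional input l through the critic's signature is an assumption about the interface):

```python
import torch

def gradient_penalty(D_img, x_real, x_fake, l, lambda_gp=1.0):
    """WGAN-GP term of Eq. (5): x_hat is sampled uniformly on straight
    lines between real and generated images."""
    B = x_real.size(0)
    alpha = torch.rand(B, 1, 1, 1, device=x_real.device)
    x_hat = (alpha * x_real + (1 - alpha) * x_fake).requires_grad_(True)
    d_out = D_img(x_hat, l)
    grads = torch.autograd.grad(outputs=d_out, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True)[0]
    grad_norm = grads.view(B, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```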

Total Variational Loss

The total variation (TV) loss \(({\mathscr{L}}_{TV})\) ensures measurable continuity and smoothness in the generated image, avoiding noise and sudden changes in high-frequency pixel intensities. The TV loss is the sum of the squared differences between adjacent pixel values in the generated image, as shown in (7).

$$ \mathcal{L}_{TV}=\sum\limits_{c=1}^{C} \sum\limits_{w=1}^{W} \sum\limits_{h=1}^{H} \left|(\bar{x})_{c,w+1,h}-(\bar{x})_{c,w,h}\right|^{2} + \left|(\bar{x})_{c,w,h+1}-(\bar{x})_{c,w,h}\right|^{2} $$
(7)
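In code, (7) reduces to two shifted differences over the generated image (a sketch):

```python
import torch

def tv_loss(x_hat):
    # Squared differences between vertically and horizontally adjacent
    # pixels of the generated image x_hat of shape (B, C, H, W), Eq. (7)
    dh = (x_hat[:, :, 1:, :] - x_hat[:, :, :-1, :]).pow(2).sum()
    dw = (x_hat[:, :, :, 1:] - x_hat[:, :, :, :-1]).pow(2).sum()
    return dh + dw
```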

Overall objective

We aim to generate realistic faces while preserving the identity of the input. The final objective function for the discriminator Dimg is shown in (8).

$$ \underset{||D_{img}||_{L}\leq1}{max}\mathcal{L}_{D_{img}} = \lambda_{advD_{img}}\mathcal{L}_{advD_{img}} $$
(8)

where ||Dimg||L ≤ 1 represents the 1-Lipschitz constraint on Dimg. The final objective function for the generator G is shown in (9).

$$ \underset{E,G}{\min}\,\mathcal{L}_{G}=\lambda_{kl}\mathcal{L}_{KL}(\mu,\sigma) + \lambda_{rec}\mathcal{L}_{rec} + \lambda_{per}{\Phi} + \lambda_{advG}\mathcal{L}_{advG} + \lambda_{tv}\mathcal{L}_{TV} $$
(9)

where \(\lambda _{kl}, \lambda _{rec}, \lambda _{per}, \lambda _{advD_{img}}, \lambda _{advG}, \lambda _{tv}\) are hyper-parameters that weight the above loss functions. In our model, we set λkl, λrec, λper, λtv, and \(\lambda _{advD_{img}}\) to 1 and λadvG to 0.0001.
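Combining the pieces, the generator-side objective (9) with the weights stated above can be sketched as follows (loss functions as in the earlier sketches; `perceptual` is an instance of the PerceptualLoss module):

```python
import torch

LAMBDA_KL = LAMBDA_REC = LAMBDA_PER = LAMBDA_TV = 1.0
LAMBDA_ADV_G = 0.0001

def generator_objective(x, x_hat, mu, sigma, d_fake, perceptual):
    l_kl  = kl_loss(mu, sigma)                            # Eq. (1)
    l_rec = ((x - x_hat) ** 2).sum(dim=(1, 2, 3)).mean()  # Eq. (2)
    l_per = perceptual(x, x_hat)                          # Eqs. (3)-(4)
    l_adv = -d_fake.mean()                                # Eq. (6)
    l_tv  = tv_loss(x_hat)                                # Eq. (7)
    return (LAMBDA_KL * l_kl + LAMBDA_REC * l_rec + LAMBDA_PER * l_per
            + LAMBDA_ADV_G * l_adv + LAMBDA_TV * l_tv)
```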

6 Experimental results

The primary objectives of facial plastic surgery are to reconstruct faces, remove defects, and improve the appearance of the patient while preserving the facial personality. In this section, we demonstrate our approach on large-scale, publicly available facial datasets. The section is divided into three subsections: (1) description of the dataset, which merges different publicly available datasets; (2) dataset preprocessing; (3) processing of mask-to-face mapping on surgery faces.

6.1 Dataset

To train a facial plastic surgery synthesis model on a relevant and diverse population, one of the key requirements is to generate plausible and reasonably aged face images from different ethnicities. Thus, we selected images in the [1−40] year age range from publicly available datasets: 35,450 from the Cross-Age Celebrity Dataset (CACD) [6], 5,822 from UTKFace [38], 35,484 from CLF [4, 9], and 1,113 from Adience [12]. In total, we used 77,889 images of size 128 × 128 pixels and divided the dataset into 4 age groups, [1−10], [11−20], [21−30], [31−40], as shown in Fig. 2. To test our model on plastic and aesthetic surgery face images, we web-crawled real-world pre- and post-surgery face images. In total, there are 24 paired before and after surgery face images, referred to as plastic-surgery testing images. The test dataset covers various types of face surgeries such as Otoplasty (ear surgery), Skin Resurfacing (skin peeling), Lip Augmentation, Oral Surgery (teeth surgery), Craniofacial surgery, and Dermabrasion; example pairs of before and after faces are shown in Fig. 3.

Fig. 2

Training dataset group formation: distribution of face images across the given age ranges used to train the PlasticGAN model

Fig. 3

The first and third rows show examples of pre- and post-surgery images, respectively, acquired by web-crawling. The second and fourth rows show the cropping and alignment effect of the MTCNN [39] face detector. The type of plastic surgery procedure is given below the fourth row

6.2 Preprocessing of the dataset

To train the proposed model to reconstruct damaged areas in the correct orientation, preprocessing the dataset is necessary. As the dataset images have improper alignment and different resolutions, we used MTCNN [39] to detect five landmark points (two eyes, nose, and two mouth corners) for proper alignment and for cropping the images to a resolution of 128 × 128 pixels, as shown in Fig. 3.
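A minimal preprocessing sketch using the open-source mtcnn package is given below; the authors' exact alignment procedure is not specified, so the eye-angle rotation is an assumption:

```python
import math
import cv2
from mtcnn import MTCNN

detector = MTCNN()

def crop_and_align(img_bgr, size=128):
    """Detect the face, rotate the image so the eyes are horizontal,
    then re-detect, crop, and resize to size x size."""
    rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
    faces = detector.detect_faces(rgb)
    if not faces:
        return None
    kp = faces[0]['keypoints']                  # 5 MTCNN landmarks
    (lx, ly), (rx, ry) = kp['left_eye'], kp['right_eye']
    angle = math.degrees(math.atan2(ry - ly, rx - lx))
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = img_bgr.shape[:2]
    rotated = cv2.warpAffine(img_bgr, M, (w, h))
    # Re-detect on the aligned image, then crop the face box
    faces = detector.detect_faces(cv2.cvtColor(rotated, cv2.COLOR_BGR2RGB))
    if not faces:
        return None
    x, y, bw, bh = faces[0]['box']
    x, y = max(x, 0), max(y, 0)
    return cv2.resize(rotated[y:y + bh, x:x + bw], (size, size))
```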

6.3 Processing on Mask-to-face mapping

To test our proposed model on mask-wearing faces, we must cover the pre-surgery images with a mask. We first used MTCNN [39] to crop and align the pre-surgery faces. Next, 12 key points were manually annotated on the reference mask image, as shown in Fig. 4. In the final stage, mask-to-face mapping is applied to the cropped and aligned images.
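One plausible realization of this mapping (the paper does not specify the warping method; the homography and alpha blending below are assumptions) warps the 12 annotated mask key points onto corresponding face points, which may be obtained from a facial landmark detector:

```python
import cv2
import numpy as np

def overlay_mask(face_bgr, mask_bgra, mask_pts, face_pts):
    """Warp a reference mask (with alpha channel) onto a face.

    mask_pts : (12, 2) key points annotated on the reference mask
    face_pts : (12, 2) corresponding points on the aligned face
    """
    H, _ = cv2.findHomography(np.float32(mask_pts), np.float32(face_pts))
    h, w = face_bgr.shape[:2]
    warped = cv2.warpPerspective(mask_bgra, H, (w, h))
    # Alpha-blend the warped mask over the face
    alpha = warped[:, :, 3:4].astype(np.float32) / 255.0
    out = (face_bgr.astype(np.float32) * (1 - alpha)
           + warped[:, :, :3].astype(np.float32) * alpha)
    return out.astype(np.uint8)
```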

Fig. 4

The first column presents examples of pre-surgery face images acquired by web-crawling. The second column shows the cropping and alignment effect of the MTCNN [39] face detector. The third column shows the reference mask images. The fourth column presents the mask-wearing faces

6.4 Implementation details

We trained the PlasticGAN model on 77,889 images divided into 4 equal age categories, i.e., 1−10, 11−20, 21−30, 31−40. The architecture of our model is presented in Fig. 1. In the training phase, all components are trained with batch size 24 using ADAM [20] with hyper-parameters α = 0.0001 and β = (0.5, 0.999). The output of the generator (G) is restricted to [−1, 1] using the \(\tanh \) activation function. After 20,000 iterations, we obtained competent results. In the testing phase, we included plastic surgery images with and without masks; E and G are responsible for generating age-progressed and regressed facial images. The model was trained from scratch with a learning rate of 0.0001 for E, G, and Dimg. We optimized Dimg every 5 iterations, and G is updated at every iteration.
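A condensed sketch of this schedule, reusing the earlier loss sketches (E, G, D_img, loader, and the interface details are assumptions, not the authors' code):

```python
import torch

opt_eg = torch.optim.Adam(list(E.parameters()) + list(G.parameters()),
                          lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D_img.parameters(), lr=1e-4, betas=(0.5, 0.999))

for it, (x, age, gender) in enumerate(loader):       # batch size 24
    mu, sigma = E(x)
    zl = sample_and_condition(mu, sigma, age, gender)
    l = torch.cat([age.repeat(1, 12), gender.repeat(1, 25)], dim=1)
    x_hat = G(zl)                                    # tanh output in [-1, 1]

    # Critic update (Eq. 5); per the text, D_img is optimized every
    # 5 iterations while E and G are updated at every iteration
    if it % 5 == 0:
        d_loss = (D_img(x_hat.detach(), l).mean() - D_img(x, l).mean()
                  + gradient_penalty(D_img, x, x_hat.detach(), l))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Encoder/generator update (Eq. 9)
    g_loss = generator_objective(x, x_hat, mu, sigma,
                                 D_img(x_hat, l), perceptual)
    opt_eg.zero_grad(); g_loss.backward(); opt_eg.step()
```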

7 Qualitative evaluation of plastic surgery face

To extensively evaluate the performance of our proposed PlasticGAN framework, we compare it against the state-of-the-art methods CAAE, AcGAN, AIM (Age-Invariant Model), and IPCGAN.

The following are important common observations on CAAE, AcGAN, AIM, and IPCGAN for various surgery faces. CAAE does not perform well in the aging effect and even produces artifacts and blurry results due to the pixel-wise loss between the input and generated images. AcGAN utilizes an attention mechanism that modifies only the regions relevant to the aging effect; hence, it performs poorly on the plastic-surgery testing images in terms of reconstruction (teeth, face, lips) and the aging effect, and it generates weird faces. AIM addresses the challenges of face recognition with large age variations but is not capable of generating photo-realistic surgery faces with the desired aging effects. IPCGAN uses an image-to-image translation-based generator network component, so it cannot properly restructure the different types of plastic surgery faces into realistic faces. Compared to the state-of-the-art aging frameworks, the age-progressed and regressed images of the PlasticGAN model are photo-realistic, and it rejuvenates identity-preserved faces on plastic-surgery faces.

7.1 Evaluation I: Teeth surgery

As is evident in the sixth column of each dotted box in Fig. 5, the pre-surgery image shows improperly aligned (crooked, missing) teeth. Our model generates a perfect-looking set of teeth, matching the real image shown in the post-surgery row; in particular, the missing teeth are generated in the mouth. As observed across all age ranges, the aesthetics of the face improve with age while the respective skin texture is preserved. At the beginning of Section 7, we explained why the state-of-the-art face-aging models do not perform well on surgery faces in terms of post-surgery looks.

Fig. 5

Teeth Surgery: Each dotted box denotes one person's pre- and post-teeth-surgery images. In each box, from the second column, left to right, are images generated by CAAE, AcGAN, AIM, IPCGAN, PlasticGAN, and the ablation study, respectively

7.2 Evaluation II: Face surgery

As shown in Fig. 6, the pre-surgery image has no nose or mouth. PlasticGAN generated both missing components convincingly, adhering to the structure and texture of the face, and even improved the appearance of the eyes, producing a youthful look. AIM addresses the challenges of cross-age face recognition with large age variations; however, it is not capable of generating visually appealing faces with the desired aging effect. The IPCGAN and AcGAN generator network components translate the input into an image space and reconstruct from this space; hence, they cannot properly restructure a before-surgery face into an after-surgery face, as shown in Fig. 6. For this reason, these frameworks are not very helpful for clinical decision-making between doctors and patients.

Fig. 6

Face Surgery: Each dotted box denotes one person's pre- and post-surgery facial images. In each box, from the second column, left to right, are images generated by CAAE, AcGAN, AIM, IPCGAN, PlasticGAN, and the ablation study, respectively

7.3 Evaluation III: Ear surgery

In Fig. 7, the pre-surgery image shows an ear protruding outwards. The images generated by PlasticGAN are aligned perfectly to the normal setting. In addition, PlasticGAN reconstructs the internal structure of the ear better than the state-of-the-art models; therefore, the age-progressed and regressed images resemble the post-surgery images. In the case of IPCGAN and AcGAN, only the face region is altered; therefore, the ears in their age-progressed and regressed surgery images are not properly aligned.

Fig. 7

Ear Surgery: Each dotted box denotes one person's pre- and post-ear-surgery images. In each box, from the second column, left to right, are images generated by CAAE, AcGAN, AIM, IPCGAN, PlasticGAN, and the ablation study, respectively

7.4 Evaluation IV: Lips surgery

In Fig. 8, the pre-surgery image contains a partial lip and a deformed face structure. PlasticGAN took this image as input, completed the lip, and produced open eyes, depicting how the child will look in the future. In addition, PlasticGAN performed age translation and beautified the entire face, enhancing the rejuvenation effect. Compared to our framework, CAAE produces over-smoothed surgery images with subtle changes in appearance. As for IPCGAN and AcGAN, due to their incapability of face completion, the faces generated by these models are deformed, as evident in Fig. 8.

Fig. 8

Lip Surgery: Each dotted box denotes one person's pre- and post-lip-surgery images. In each box, from the second column, left to right, are images generated by CAAE, AcGAN, AIM, IPCGAN, PlasticGAN, and the ablation study, respectively

8 Model robustness on mask wearing face

To check the robustness of the proposed model, we covered the nose and mouth areas with synthetic masks and examined the aging effect on the overall plastic surgery face. As shown in Fig. 9, the IPCGAN and AcGAN models could not remove the face mask, complete the surgery tasks, or show the aging effect. In the case of CAAE, the face mask region remains slightly transparent, causing artifacts in the generated images; this effect is due to the pixel-wise loss. As evident from the results, PlasticGAN performed well overall with respect to various parameters such as skin tone, hair color, open eyes, reconstruction, and lighter-to-darker beard appearance. In addition, it generated better facial structures with the aging effect.

Fig. 9

Mask-Wearing Face: Each dotted box denotes the same individual's mask-wearing and pre-surgery images. In each box, from the second column, left to right, are images generated by CAAE, AcGAN, AIM, IPCGAN, PlasticGAN, and the ablation study, respectively

9 Quantitative evaluation

Most existing age estimation and face verification approaches have focused primarily on unconstrained face recognition, and no endeavor has been made to examine their effect on synthesized face aging and rejuvenation for local and global plastic surgery faces, or for mask-wearing surgery faces. As surgery-based aging and rejuvenation procedures become more and more prevalent, face verification frameworks fail to recognize individuals' faces after surgery. In this section, we explore the age estimation and verification aspects on synthesized surgery faces.

We evaluated the aging accuracy and identity permanence of age progression and regression on plastic surgery faces with/without masks. For this, we generated faces in all age ranges [1−10], [11−20], [21−30], [31−40] from pre-surgery faces with/without masks. Then, we used the online face analysis tool Face++ API [13] to estimate the age distribution and face verification scores. We considered twenty-four test faces and used the following comparison protocol:

Face++ test: [test face, progress-face1], [test face, progress-face2], [test face, progress-face3], [test face, progress-face4] (where the test face is a pre-plastic surgery face with/without mask).
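As an illustration, one [test face, progress-face] comparison can be scripted against the Face++ compare endpoint (the key/secret values are placeholders; consult the Face++ documentation for the authoritative parameters):

```python
import requests

COMPARE_URL = "https://api-us.faceplusplus.com/facepp/v3/compare"

def verify_pair(test_face_path, progress_face_path, api_key, api_secret):
    """Return the Face++ confidence score (0-100) for one pair."""
    with open(test_face_path, "rb") as f1, open(progress_face_path, "rb") as f2:
        resp = requests.post(
            COMPARE_URL,
            data={"api_key": api_key, "api_secret": api_secret},
            files={"image_file1": f1, "image_file2": f2},
        )
    # Higher confidence means the two faces more likely
    # belong to the same subject
    return resp.json().get("confidence")
```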

9.1 Age estimation

Age estimation was conducted to measure aging and de-aging accuracy.

9.1.1 Plastic surgery face without mask

The following observations on plastic surgery faces without masks are summarized in Table 1. IPCGAN and AcGAN introduce an identity-preserved loss and an age classification loss with the image-to-image translation-based generator network; however, if the classification error is high, the gradient for a small age range is not accurate. Therefore, these models' age estimation accuracy is lower than PlasticGAN's in most age ranges. AIM generated similar-looking, aged faces, as depicted in Figs. 5, 6, 7, and 8; therefore, its age estimation standard deviation values are high in all age ranges. In the case of CAAE, the generated images are blurry, and the faces generated in the age ranges (1−10, 11−20) appear aged. The PlasticGAN model provides better age estimation results in three out of four age ranges compared to the other state-of-the-art models.

Table 1 Estimated age distribution (years) on plastic-surgery testing images by PlasticGAN, the ablation variant, and state-of-the-art models. For simplicity, we report only the mean and standard deviation of the age estimation error computed over all age ranges

9.1.2 Plastic surgery face with mask

CAAE uses a mean-square-based reconstruction loss, the AcGAN and IPCGAN generator architectures are based on an image-to-image translation network, and AIM disentangles the age and identity attributes. Consequently, their age-progressed and regressed images with face masks are completely deformed; local and global face attributes, e.g., nose, lips, mouth, and eyes, are completely deformed. Therefore, the state-of-the-art models generated aged faces in the age ranges (1−10, 11−20, 21−30) compared to PlasticGAN, as illustrated in Table 2.

Note: Face images generated by AcGAN and IPCGAN are not detected by the Face++ app because they do not remove the face mask region and produce unstructured eye reconstructions.

Table 2 Estimated age distribution (years) on plastic surgery faces with masks: testing images using PlasticGAN, the ablation variant, and state-of-the-art models. For simplicity, we report only the mean and standard deviation of the age estimation error computed over all age ranges

From the results in Tables 1 and 2, we make the following observations.

  1. The age estimation results without masks are better than those with masks.

  2. IPCGAN and AcGAN do not unmask the surgery face, as shown in Fig. 9. Therefore, the aging pattern is reflected only in the periocular region and forehead, and the age estimation values are not properly distributed over the age ranges, as shown in Table 2.

9.2 Face verification on surgery faces with and without mask

For identity permanence, face verification rates are reported with the threshold set to 76.5 at FAR = \(10^{-5}\), following the Face++ API [13]. A confidence score is obtained for each comparison, quantifying the similarity between two faces. The confidence ranges over [0−100]; a higher confidence score indicates a higher probability that the two faces (real and generated) belong to the same subject.

9.2.1 Plastic surgery face without mask

For plastic surgery faces, it is evident that PlasticGAN outperforms AcGAN and AIM, although CAAE generates surgery faces with ghosting artifacts that affect the identity information. The PlasticGAN, AIM, and CAAE aging models generate age-progressed and regressed faces by disentangling age and identity features from the latent vector, due to which the identity is also altered with age; thus, their face verification accuracy is lower compared to IPCGAN and AcGAN.

9.2.2 Plastic surgery face with wearing mask

These experiments verify the robustness and stability of the proposed model: even if a mask covers a region of the plastic surgery face, features of the upper half of the face, such as the eyes, eyebrows, and forehead, can still be used to improve masked cross-face recognition (MCFR). The experimental results are shown in Fig. 9; PlasticGAN can still age-progress and reconstruct a complete face. In AcGAN and IPCGAN, the network component is unable to unmask the masked face; these models alter only the regions particularly relevant to face aging, which is why their face verification accuracy is high compared to the other models, as shown in Table 4. Objectively speaking, a few progressed faces have distortions and thus cannot be detected by the Face++ app. The following are general observations from Tables 3 and 4.

  1. The cross-age face verification scores without masks are higher than those with masks in all age ranges and for all face aging models.

  2. As the age gap increases, the face verification score decreases in all age ranges.

  3. The face verification scores of the CAAE and AIM models are lower than that of PlasticGAN.

  4. IPCGAN and AcGAN do not unmask the surgery face, as shown in Fig. 9. Owing to this, the face verification accuracy of these models is better relative to the no-mask surgery faces.

Despite the potential of our proposed model, we conclude that cross-age face recognition on these synthetic post-surgery faces is challenging, as surgery degrades the verification score. This challenge, namely PSBSCAFR, can become a new dimension for upcoming research on improving the recognition accuracy of synthetic surgery faces generated by GANs.

Table 3 Face verification results (in %) on Plastic surgery testing images by PlasticGAN and other state-of-the-art models
Table 4 Face verification results (in %) on plastic surgery with mask; testing images by PlasticGAN and other state-of-the-art models

9.3 Comparison between surgery face with and without mask

Traditional CNN-based face recognition systems trained on existing datasets are almost ineffective on faces that have undergone surgery or that are wearing a mask. At the same time, new challenges create new opportunities and research directions in this field. One that we include in this study is how plastic surgery faces and mask-wearing faces correlate when verifying the same individual. During our experiment, conducted with the Face++ app, we observed a few significant aspects, which can also be seen in Fig. 10.

  1. In many cases, masked faces are not even detected when the eyes are closed and the face is not properly aligned (blocks 2 and 4 of Fig. 10).

  2. Different types of face surgery combined with a mask impact face recognition reliability differently. When the mask covers the lips and half of the nose, the confidence score increases because the eyes are an important consideration at verification time; the score is also affected by identical mask coverage in both images. When the surgery lies in the region above the mask, the score degrades because the recognition system finds no similarity points during verification other than the mask (blocks 1 and 3 of Fig. 10).

Fig. 10

Each block is an example of a pre- and post-surgery face with and without a mask. We evaluated the confidence score with the help of the Face++ app. ND represents cases of no detection, where the face is covered with a mask and shows no similarity

9.4 Inception and Fréchet inception distance

The image quality and diversity of the generated data are assessed in terms of the inception score (IS) [30] and the Fréchet inception distance (FID) [16]. In Table 5, PlasticGAN achieves the best IS and FID scores on the plastic surgery test dataset compared to the state-of-the-art models; a high IS and a low FID indicate that our framework generates more realistic faces.

Table 5 Comparing IS and FID on PlasticGAN and its variant with other state-of-the-art models
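For reference, the FID between two sets of Inception-v3 activations reduces to a Fréchet distance between Gaussians; a standard sketch follows (the activation extraction step is omitted):

```python
import numpy as np
from scipy import linalg

def fid(act_real, act_fake):
    """Frechet Inception Distance between two activation matrices
    (rows = images, columns = Inception-v3 pool features)."""
    mu1, mu2 = act_real.mean(axis=0), act_fake.mean(axis=0)
    s1 = np.cov(act_real, rowvar=False)
    s2 = np.cov(act_fake, rowvar=False)
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):      # discard numerical noise
        covmean = covmean.real
    return float(((mu1 - mu2) ** 2).sum()
                 + np.trace(s1 + s2 - 2.0 * covmean))
```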

9.5 Beauty Score and Gender Prediction

We evaluate the beauty score and gender prediction of PlasticGAN and the state-of-the-art models on the plastic surgery test images. For a fair comparison, Face++ [13] is used as the face analysis tool to evaluate the beauty score and gender prediction of the pre-plastic surgery faces and the corresponding age-progressed faces in all age ranges, as shown in Table 6.

Table 6 Beauty score and gender prediction: the second column contains the pre-surgery images, followed by beautification scores with gender prediction values. From column 4 onward, the state-of-the-art models, PlasticGAN, and the ablation variant are shown

General observations are as follows:

  1. CAAE generates images with grain-like artifacts, which deteriorate image quality; for this reason, its wrong gender predictions are shown in red.

  2. IPCGAN and AcGAN use an image-to-image translation-based generator network component; hence, they cannot properly restructure a partially covered face into a realistic face. Owing to this, the deformed faces are not detected by the Face++ app, marked ND (Not Detected) in Table 6.

  3. PlasticGAN is better at overcoming ghosting artifacts and color distortions. Further, it maintains uniformity in the background and face boundaries and shows a beauty score comparable to the other state-of-the-art models.

  4. The gender prediction for each pre-surgery test image is either male or female; however, the gender predicted for the generated images can differ from the ground truth. Compared to PlasticGAN, the state-of-the-art models predict gender incorrectly more often.

  5. In the ablation study (without KL loss), the skin color appears lighter than in PlasticGAN, as shown in Figs. 5, 6, 7, and 8. Due to this effect, the beauty score is lower than PlasticGAN's.

10 Ablation study

To comprehend the effect of \({\mathscr{L}}_{KL}(\mu ,\sigma )\) on our proposed model, we conducted an experiment on a variant of the PlasticGAN model obtained by removing \({\mathscr{L}}_{KL}(\mu ,\sigma )\) (the sampling block). The effect can easily be seen in Figs. 5, 6, 7, and 8 for visual comparison. We observed throughout that PlasticGAN produces artifact-free age-progressed and regressed faces and applies some beautification effects as well. As shown in Tables 1, 2, 3, 4, 5, and 6, the KL loss helps in face verification, age estimation, fidelity, and gender preservation, along with the beautification score. This further confirms that our objective functions and network components are well designed for face aging and rejuvenation in social and forensic applications.

11 Conclusions and future research work

The advancement of generative models in beautification and rejuvenation has inspired and motivated us to propose the robust and general PlasticGAN framework. This model integrates face aging and rejuvenation, face recognition, and face completion for plastic and aesthetic facial surgery cases. It can contribute to a wide range of applications such as surgeon-patient consultancy, forensics and security, digital entertainment, and even the fashion and wellness industry.

Furthermore, PlasticGAN unmasks the mask-wearing face and properly structures it with the aging/de-aging effect. Moreover, the PlasticGAN framework does not require pre- and post-plastic surgery faces as a paired dataset during training. In the testing phase, our model synthesizes face aging, rejuvenation, and face completion in parallel on faces that have undergone surgery. The qualitative and quantitative experiments and the comparison with state-of-the-art face aging architectures on various plastic surgery faces (teeth, face, ear, lips) show that our model is robust and has diverse applications, especially for aging and rejuvenation with face completion.

As future work, we would like to enhance the framework's performance by analyzing the face aging and rejuvenation entailed in plastic surgery, which can further degrade the performance of commercial and publicly available face recognition systems when it co-occurs with other factors, e.g., different types of mask-wearing and synthetic surgery faces. This can be a new dimension for future work.