1 Introduction

Online education video is the most common means of delivering teaching content during special periods [1] and is an important part of course learning. The quality of the teaching video affects the quality of the entire online course.

In online art education, some image details are essential to the teaching content, because knowledge is conveyed mainly through key details in the art image. The purpose and significance of research on video image inpainting technology in online art education are as follows. Improving video image quality: the quality of video in online education directly affects learners' experience, so improving video quality through repair technology can improve the viewing experience and the learning effect. Protecting intellectual property rights: in online education, video content is an important embodiment of intellectual property rights, and video image restoration technology can help protect the integrity of video content and prevent illegal downloading and appropriation. Reducing production cost: video image inpainting technology can repair existing video material, improve video utilization, and reduce production cost. Promoting the development of video technology: video image inpainting is an important application in the field of digital video processing, and its research can not only promote the development of video technology but also provide technical support for other fields.

If the image details of an online education video are damaged, this not only degrades the overall effect of the video but can also cause its meaning to be misunderstood. Therefore, many scholars at home and abroad have studied how to repair such image details. Li et al. proposed a dual-generator deep convolutional generative adversarial network model that takes incomplete or noisy image samples as the training set [2]. Image information similar to the missing area is searched by cross calculation and used as training samples. The inpainting results are optimized by two indicators: the discrimination model and the minimization of the mean square error of the total distance change of the inpainted image. Feng et al. used the sparsity prior of the L0 norm and an equivalent form of the L0 norm [3] to propose a non-blind image inpainting model and a game-based blind image inpainting model. The L0 norm constrains the data term or the regularization term so that the repaired image is as close as possible to the original image, which suits both known and unknown missing regions in noise-free images. Based on the characteristics of the models, a proximal alternating direction method of multipliers (PADMM) algorithm and a game-based alternating algorithm were designed to solve them. Sun et al. observed that most inpainting methods exploiting low-rank characteristics are modeled with rank functions [4]. Since the matrix rank function is non-convex and discrete, solving such a model is an NP-hard problem, so the nuclear norm is usually used as a convex relaxation of the matrix rank. However, there is a certain deviation between inpainting based on the nuclear norm and inpainting based on rank minimization. Therefore, a non-convex low-rank constrained image inpainting method was proposed, in which a log function replaces the nuclear norm to constrain the rank, overcoming the problem that the nuclear norm cannot approximate rank minimization well. Yuan et al. proposed an efficient portrait inpainting method based on a generative adversarial network (GAN) [5]. The algorithm consists of two stages. In the first stage, the image is roughly inpainted with an encoder-decoder network, and the human pose information in the image is then estimated. The second stage accurately inpaints the portrait based on the pose information and the GAN. The pose information is used to connect the key points of the portrait pose, form the portrait frame, and apply a dilation operation to obtain the portrait pose mask, from which a portrait pose loss function is constructed for network training. Zhai et al. proposed an image restoration method based on an object vector update module and a deep network [6]. Variable splitting is used to decouple the original image restoration problem into two subproblems, and, exploiting the strong nonlinear fitting ability of deep convolutional neural networks, a new model-based transform-domain deep CNN framework was proposed to simulate the optimization of the two subproblems and achieve image restoration. Tang proposed a sub-image-based low-rank tensor completion method for color image restoration [7]. To enhance the low-rank property, color images are first sampled to obtain sub-images, which replace the original single image in forming the tensor. A low-rank completion model is then established using the tensor nuclear norm defined through the tensor singular value decomposition. Finally, tensor singular value thresholding within the standard alternating direction method of multipliers is used to solve the model and achieve low-rank tensor completion for color image restoration. The above methods can realize image inpainting, but because the image is not preprocessed, their application effect is poor, the computational load is large, and both training and repair take a long time.

The dual discrimination network is a dual network structure consisting of global discrimination and local discrimination. The local discrimination network identifies the image generation results in the damaged area, and the global discrimination network identifies the overall visual coherence of the repair results. When the two discriminator networks can hardly distinguish real from generated input samples, the images produced by the generator network after adversarial training can deceive both discriminators at the same time, and the repair effect is better. Combining the generative adversarial network with the double discrimination network inpainting approach, this paper proposes a double discrimination network based method for inpainting art image details in online educational videos. Firstly, the method uses gray-level transformation in the spatial domain to enhance the detailed features of the original video art image, thereby better preserving the image's detail information. Secondly, it constructs a dual discriminant generative adversarial network model, which uses global and local discrimination networks to judge the coherence and authenticity of the image details output by the generative network, thereby effectively improving the repair effect. Finally, the generation network is built on the U-Net structure and trained, which effectively restores the structural information of image features, avoids problems such as image damage and distortion of edge connection structures, and provides technical support for repairing art image details in online education videos. The innovation of this method lies in applying dual discrimination generative adversarial networks to the restoration of art image details in online educational videos. By enhancing the details of the original image and using the double discrimination network to repair it, the method effectively improves the recovery of image details in online education video, improves the robustness of inpainting, and improves video quality and viewing experience. The contributions of this method are as follows:

  (1) This method achieves the restoration of detailed features of art images in online educational videos, thereby improving the quality and visualization effect of the images.

  (2) This method can improve the dynamic range and grayscale contrast of images, making them clearer, brighter, and more three-dimensional.

  (3) The average peak signal-to-noise ratio of this method is 30.108, and the average structural similarity is 0.961, proving the excellent repair effect of this method.

  (4) The average peak signal-to-noise ratio of this method under Gaussian noise interference is 29.68, proving its strong anti-interference ability and good robustness.

2 Image detail inpainting method for online educational video art

2.1 Image detail enhancement of online educational video art based on spatial domain gray change

Image enhancement is a cutting-edge technology for improving video image quality [8, 9]; it belongs to the preprocessing stage of image processing and differs from image inpainting. According to the processing space, image enhancement can be divided into two categories: spatial domain processing and frequency domain processing. The former includes gray-level operations and histogram correction, both of which act directly on pixel gray values. The latter analyzes the spectral components of the image: the image is transformed by the Fourier transform, its high- and low-frequency parts are processed, and the desired result is obtained by the inverse Fourier transform.

In online educational videos, the details of art images are often degraded and noise is increased by exposure and other interference during channel transmission. To effectively eliminate noise interference and enhance the light-dark contrast of the image, this paper uses gray-level transformation in the spatial domain to process the video image.

As an important means of image enhancement, gray transformation can increase the dynamic range of the image, expand the image contrast, and make the image features more obvious to improve the image display effect [10, 11]. Gray transformation can be divided into linear transformation [12] and nonlinear transformation. Let the gray scale range of the original image \(m\left(x,y\right)\) be \(\left[a,b\right]\), and the gray scale of the image \(n\left(x,y\right)\) be extended to \(\left[c,d\right]\) after linear transformation. The relationship between them is as follows:

$$n\left(x,y\right)=c+\left[m\left(x,y\right)-a\right]\frac{d-c}{b-a}$$
(1)
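As a minimal sketch of Eq. (1), the linear stretch can be written with NumPy as below. The function name `linear_stretch` and the default target range \(\left[c,d\right]=\left[0,255\right]\) are illustrative assumptions; \(a\) and \(b\) are taken here as the minimum and maximum gray levels of the input.

```python
import numpy as np

def linear_stretch(m, c=0.0, d=255.0):
    """Map gray levels of m from [a, b] = [m.min(), m.max()] onto [c, d] (Eq. 1)."""
    m = np.asarray(m, dtype=np.float64)
    a, b = m.min(), m.max()
    if b == a:
        # Flat image: Eq. (1) is undefined, so return the lower bound everywhere.
        return np.full_like(m, c)
    return c + (m - a) * (d - c) / (b - a)

# A low-contrast patch whose gray levels only span [100, 130].
low_contrast = np.array([[100, 110], [120, 130]])
stretched = linear_stretch(low_contrast)
```

In this example the narrow range \(\left[100,130\right]\) is spread over the full \(\left[0,255\right]\) range, which is the contrast expansion described above.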

Because of over-exposure or under-exposure, the gray levels of details in art images may vary only within a narrow range, so the image appears blurred and lacking in gray-level gradation on screen. Linear transformation can linearly stretch the gray level of each pixel of the blurred image [13], which effectively improves its visual effect. To improve the later image restoration [14] and feature extraction [15], the original online education video image is first binarized and converted to grayscale [16], and the image is then equalized with an image domain method based on histogram correction [17].
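The equalization step can be illustrated with the standard cumulative-histogram mapping. This is a sketch under the assumption of 8-bit grayscale input; the text does not specify which histogram correction variant is used, and the function name `equalize_histogram` is hypothetical.

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram-equalize an 8-bit grayscale image via its cumulative distribution."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest occupied gray level
    total = gray.size
    if total == cdf_min:
        return gray.copy()         # single gray level: nothing to spread out
    # Look-up table that maps each gray level to its rescaled CDF value.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Gray levels crowded into [50, 53] are spread over the full 8-bit range.
img = np.array([[50, 50, 51], [51, 52, 52], [53, 53, 53]], dtype=np.uint8)
equalized = equalize_histogram(img)
```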

2.2 Double discrimination network model design

In this paper, a double discrimination generative adversarial network model is designed to inpaint the details of artistic images in online educational videos. The model consists of a generative network and a discriminative network with two branches. The generation network is implemented by the U-Net network architecture, which aims to generate images similar to the damaged area, and then send the repair results to the double discrimination network for discrimination. The two networks compete with each other, so as to realize the repair of the texture and overall structure of the image damaged area.

2.2.1 Generative network design

The generation network consists of generators [18], and the structure is shown in Fig. 1. According to the Nash equilibrium idea of game theory, the image inpainting problem is regarded as a confrontation between the generator and the discriminator. The objective formula of its generation network is as follows:

Fig. 1 Discriminant flow chart of discriminator network

$$\underset G{\min\;}\max_DV\left(D,G\right)=E\left[\log\left(1-D\left(x\right)\right)\right]$$
(2)

\(E\) denotes the expectation. The generator tries to produce an image similar to the original image from the input noise data and sends it to the discriminator network, and the discriminator outputs \(\mathrm{D}\left(\mathrm{x}\right)\), its judgment of whether the input \(x\) is a real image or a generated image. According to the discrimination results, the generator and the discriminator are optimized alternately until the discriminator judges the image produced by the generator to be a real image. When this dynamic balance is reached, the loop ends. The result is an image that looks real enough and a discriminator with strong discrimination performance. The process is as follows:

2.2.2 Design of identification network

Discriminative network is a kind of fully convolutional network [19, 20], which is mostly used for semantic segmentation tasks of images. The discrimination network consists of an encoder and a fully symmetric decoder. The left part is an encoder composed of convolution, pooling and downsampling for feature extraction [21]. The right half is the decoder, which completes the upsampling operation by transposed convolution. The discrimination network uses a special skip connection to concatenate the encoder and the symmetric decoder, and then transmits the information of each layer of the downsampling into the corresponding upsampling, and integrates it to retain more detailed information in the deep network.

The discriminative network uses the max pooling operation for feature extraction, but image details lose structure and usable information after the pooling layer. Therefore, max pooling is removed in this paper, and the convolution operation with step size 1 is used as the down-sampling mode to obtain the main features of image details. In the decoding process, deconvolution is used for upsampling so that finer edge structure and other information of the image details can be obtained. In addition, the number of up-sampling and down-sampling operations is reduced to two each to avoid losing much detail information through too many sampling operations. At the same time, dilated convolution is introduced so that the generator can enlarge its receptive field and capture a wider range of structural features with less down-sampling, while avoiding the loss of detail information. The input image is fed to the network together with a randomly generated image mask, and the corresponding convolution and deconvolution operations are performed on them. In this way, the damaged image details are repaired into image details that meet the requirements. The skip connection structure in the discrimination network fuses multi-scale features well [22] and retains pixel-level detail information. Therefore, the generation network in this paper is an improvement of the U-Net structure. Each convolutional and deconvolution layer consists of a convolution, a batch normalization layer, and an activation function. Batch normalization (BN) helps preserve the nonlinear expression ability of the model, accelerates training convergence, and avoids problems such as vanishing gradients. Tanh is used as the activation function in the output layer, and Leaky-ReLU is used in all remaining layers.
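To illustrate why dilated convolution enlarges the receptive field without extra down-sampling, the following sketch computes the receptive field of a stack of convolution layers. The layer configurations are hypothetical examples, not the settings of the network in this paper.

```python
def receptive_field(layers):
    """Receptive field of stacked conv layers given (kernel, stride, dilation) tuples."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        # A dilated kernel covers dilation * (kernel - 1) + 1 input positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride   # spacing between neighbouring outputs, in input pixels
    return rf

# Three plain 3x3 convolutions versus three 3x3 convolutions with dilation 1, 2, 4.
plain = receptive_field([(3, 1, 1)] * 3)
dilated = receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)])
```

With the same number of layers and no stride, the dilated stack sees a 15-pixel-wide window instead of a 7-pixel-wide one, so wider structural context is available at full resolution.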

2.2.3 Design of adversarial network

Combined with the above network model, this paper designs an adversarial network to repair the image details. The adversarial network is divided into two parts, which are the network responsible for repairing the image details, and the discriminative network responsible for identifying the image repair effect and feeding back to the repair network. Different from the traditional generative network, the adversarial network designed in this paper no longer takes a set of noise as input and repairs the image through continuous convolution and other means, but directly takes the image that needs to be repaired as the input, and repairs the details of a broken image into an image that meets the visual requirements through symmetric convolution layers and deconvolution layers.

In the classical generative adversarial network, \(\mathrm{D}\left(G\left(z\right)\right)\) is used to measure the ability of the generative network to repair the image. In this paper, to make the repaired image conform to the visual coherence of the video, a more diversified image loss is introduced to replace \(\mathrm{D}\left(G\left(z\right)\right)\), that is, to replace the simple real-or-fake judgment of the repaired image. By minimizing the image loss, the inpainting network is driven to generate image details as close to human visual coherence as possible, achieving the best inpainting effect. The objective function of the generative adversarial network for repairing image details is:

$$\underset C{\min\;}\max_DV\left(D,C\right)=E_{z\sim P_z\left(z\right)}\left[\log\left(1-Loss\right)\right]+E_{x\sim P_{data}\left(x\right)}\left[\log D\left(x\right)\right]$$
(3)

where \(C\) represents the inpainting network, which is responsible for generating the inpainted image; \(D\) stands for the discrimination network, comprising the global discrimination network and the local discrimination network, which is responsible for judging the inpainting effect of the inpainted image; \(x\) is an intact image drawn from the intact training dataset \(P_{data}\); and \(z\) is the broken image waiting to be repaired by the repair network \(C\). In Eq. (3), a new image loss function \(Loss\) is defined to constrain the repair effect of the repair network, which is calculated as follows:

$$Loss={\lambda }_{2}{Loss}_{local}+{\lambda }_{1}{Loss}_{global}$$
(4)

where \({\lambda }_{1}\) and \({\lambda }_{2}\) are two weight coefficients used to balance the global discrimination network and the local discrimination network, \({Loss}_{global}\) represents the loss of the global discrimination network, and \({Loss}_{local}\) represents the loss of the local discrimination network.

2.3 Design of image detail inpainting method based on double discrimination network

2.3.1 Design and application of global discrimination network

In the double discrimination network designed in this paper, the input region \({Area}_{global}\) of the global discrimination network is not the whole image but is limited to a region around the damaged region \({Area}_{broken}\) with twice its area, so that the two areas satisfy:

$${Area}_{global}=2\times {Area}_{broken}$$
(5)
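One way to obtain a region that satisfies Eq. (5) is sketched below. It assumes that the damaged region is described by a bounding box and that the global region is centred on it with both sides scaled by \(\sqrt{2}\); the paper does not state how the region is positioned, so the function `global_region` and its arguments are illustrative only.

```python
import math

def global_region(top, left, height, width, img_h, img_w):
    """Return (top, left, height, width) of a box centred on the damaged box
    whose area is about twice as large (Eq. 5), kept inside the image."""
    scale = math.sqrt(2.0)                      # sqrt(2) per side doubles the area
    new_h = min(img_h, round(height * scale))
    new_w = min(img_w, round(width * scale))
    center_y = top + height / 2.0
    center_x = left + width / 2.0
    new_top = int(round(center_y - new_h / 2.0))
    new_left = int(round(center_x - new_w / 2.0))
    # Shift the box back inside the image if it crosses a border.
    new_top = max(0, min(new_top, img_h - new_h))
    new_left = max(0, min(new_left, img_w - new_w))
    return new_top, new_left, new_h, new_w

# A 50 x 50 damaged box inside a 256 x 256 frame.
g_top, g_left, g_h, g_w = global_region(100, 100, 50, 50, 256, 256)
```

Because side lengths are rounded to whole pixels, the area ratio is only approximately 2, and it can fall below 2 when the damaged region is so large that the box is limited by the image size.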

This restriction is adopted because image inpainting in this paper focuses on two points. First, the inpainted image generated by the inpainting network should be visually coherent, that is, the texture and structure of the repaired area should be consistent with those of the surrounding intact area [23]. Second, too much irrelevant feature information should not be introduced, since it easily causes the repair network to learn irrelevant features and degrades the repair effect. Limiting the size of the input region does not affect the global discrimination network's judgment of the damaged image to be repaired, and because the image outside the restricted area is not input into the global discrimination network, interference from extra feature information is avoided. In addition to limiting irrelevant feature information, this paper adds a structure edge information constraint. The reason is that an unconstrained inpainting network excessively pursues a smooth connection between the damaged region and the intact region, which leads to over-smoothed images. Such over-smoothed images retain some obvious texture information but lose most of the structure information that is important for visual coherence. Therefore, the structural information needs to be emphasized in the inpainting network so that it is well reproduced in the generated images. To this end, edge structure detection is applied to the intact image and the repaired image, and the structural differences between the two are compared to penalize repaired images whose structure is inconsistent with the intact image, thereby increasing the attention of the inpainting network to the structural information of the repaired image. Since the global discrimination network considers both structure and texture information [24], this paper improves the global loss function as follows:

$${Loss}_{global}=loss\left(D\left({Area}_{global}\right)\right)+{loss}_{structure}$$
(6)

where, \({loss}_{structure}\) is the structural loss penalty term, which is used to penalize the structural difference between the pixels of the repaired image and the intact image:

$${loss}_{structure}=\frac{1}{N}\sum_{i=1}^{N}\Vert {p}_{i}-{q}_{i}\Vert$$
(7)

where, \(N\) is the total number of pixels in region \({Area}_{global}\); \({p}_{i}\) represents the \(i\) th pixel of the intact image, and \({q}_{i}\) represents the corresponding pixel in the repaired image.
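Eqs. (6) and (7) can be sketched numerically as follows. The sketch assumes that the two regions are given as arrays of shape height × width × channels (raw pixels or edge maps produced by whatever edge detector is used, which the text does not specify) and that the adversarial term \(loss\left(D\left({Area}_{global}\right)\right)\) has already been computed elsewhere; the function names are hypothetical.

```python
import numpy as np

def structure_loss(intact, repaired):
    """Eq. (7): mean over the N pixels of the norm of the pixel difference."""
    p = np.asarray(intact, dtype=np.float64)
    q = np.asarray(repaired, dtype=np.float64)
    per_pixel = np.linalg.norm(p - q, axis=-1)   # ||p_i - q_i|| across channels
    return per_pixel.mean()

def global_loss(adversarial_term, intact, repaired):
    """Eq. (6): adversarial term of the global discriminator plus the structural penalty."""
    return adversarial_term + structure_loss(intact, repaired)

# Toy 2 x 2 RGB regions that differ by 1 in every channel of every pixel.
intact_region = np.zeros((2, 2, 3))
repaired_region = np.ones((2, 2, 3))
penalty = structure_loss(intact_region, repaired_region)
total = global_loss(0.5, intact_region, repaired_region)
```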

2.3.2 Design and application of local discrimination networks

The discriminator network is mainly used to judge whether the input inpainted image data comes from the true sample distribution or from the sample distribution generated by the generator network. Its main purpose in this paper is to improve the generalization performance of the model while improving the samples generated by the generator network during adversarial training. The generator network has to fool both discriminator networks. When the two discriminator networks can hardly distinguish the authenticity of the input samples, the images generated by the generator network after adversarial training can deceive both discriminator models at the same time, and the repair effect is good enough. The two discriminator models used in this paper have the same basic network structure, but their loss values are calculated differently during training, so that the two discriminator networks have different optimization objectives. In the first part, the input data is downsampled once. A 2D convolution with 32 convolution kernels, a kernel size of 3 × 3, and a step size of 2 is performed first. Then a max pooling operation with a pooling size of 3 × 3 is performed to compress the feature map, reduce the number of training parameters and the computational complexity of the network, and extract the main features of the input damaged image. Finally, LeakyReLU activation is applied.

The second part consists of 3 downsampling modules, whose structure is similar to the downsampling module in the first part. In each module, a two-dimensional convolution with a kernel size of 3 × 3 and a step size of 2 is performed first, followed by a max pooling operation with a pooling size of 3 × 3. One difference is that the data is batch normalized before LeakyReLU activation is applied. The other difference is that the numbers of convolution kernels in the three downsampling modules are 32, 64, and 128, respectively. After downsampling, a normal 2D convolution is applied to the data, with 256 kernels, a kernel size of 3 × 3, and a step size of 2, followed by batch normalization and LeakyReLU activation. The last 2D convolutional layer has the same kernel size of 3 × 3 and step size of 2, but its number of convolution kernels is set to 2. In the last part, a Flatten layer is added to "flatten" the input data, that is, to transform the multi-dimensional input data into one dimension. The flattened data is passed to a fully connected layer with an output dimension of 512 and Tanh activation, and then to a fully connected layer with an output dimension of 2 and Sigmoid activation for binary classification. The final decision of the discriminator network is then output, namely whether the samples input to the network come from the actual distribution or from the distribution of samples generated by the generator network.

This paper uses a dual network structure of global discrimination and local discrimination: the damaged image is input to the local discrimination network, which is responsible for identifying the image generation results of the damaged area, and the image of the damaged area is input to the global discrimination network, which is responsible for identifying the global visual coherence of the repair results. A method based on the structural edge information constraint is used to solve the problem of weight imbalance. At the same time, the image input area of the global discrimination network is limited to eliminate the interference caused by too many features, so that the global discrimination network can pay more attention to the global visual coherence of the image, while the local discrimination network can pay more attention to the generated image in the current damaged area.

The main role of the loss function is to evaluate the gap between the predicted value of the network model and the true value: the smaller the gap, the closer the prediction is to the truth. Optimizing a network model is usually a process of minimizing the loss function, and a smaller loss value generally indicates better actual performance of the network model. In this paper, the combination of adversarial loss and content loss is used as the loss function, as shown in the formula:

$$L=\gamma \cdot {L}_{X}+{L}_{GAN}$$
(8)

where \({L}_{GAN}\) is the adversarial loss, \({L}_{X}\) is the content loss, and \(\gamma\) is the weighting parameter, which is set to 80 in the experiments of this paper.

The network in this paper uses two discriminator models, so during training both discriminator models are trained adversarially against the generator model. The local discriminator network uses a different loss function to calculate its adversarial loss, so that it has a different focus. The adversarial loss of the local discrimination network is defined as follows:

$${L}_{GAN}=\beta \times {E}_{z\sim {p}_{z}}\left[\mathrm{log}{D}_{2}\left(G\left(z\right)\right)\right]+{E}_{x\sim {p}_{data}}\left[-{D}_{2}\left(x\right)\right]$$
(9)

where the hyperparameter 0 < β ≤ 1.

The second part of the loss function is the content loss, for which the two classical choices are the L1 loss and the L2 loss. However, when these functions are used as the only optimization objective, they can cause artifacts in the generated image because of the pixel-wise averaging of possible solutions in pixel space. This paper uses a perceptual loss, which is strictly speaking also an L2 loss but is based on the difference between the CNN feature maps of the generated and target images [25]. It is defined as follows:

$${L}_{X}=\frac{1}{{W}_{i,j}{H}_{i,j}}\sum_{x=1}^{{W}_{i,j}}\sum_{y=1}^{{H}_{i,j}}{\left({\Phi }_{i,j}{\left({I}^{s}\right)}_{x,y}-{\Phi }_{i,j}{\left({I}^{B}\right)}_{x,y}\right)}^{2}$$
(10)

where \({\Phi }_{i,j}\) is the feature map obtained after the \(j\)th convolution before the \(i\)th max pooling layer in the VGG19 network, \({W}_{i,j}\) and \({H}_{i,j}\) are the dimensions of the feature map, \({I}^{B}\) is the blurred image as input, and \({I}^{s}\) is the clear image generated by the generator model.
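The combination of Eqs. (8) and (10) can be sketched as follows. The feature maps are assumed to be precomputed arrays (in practice they would come from a VGG19 layer, which is not reproduced here), and the function names are hypothetical; only \(\gamma = 80\) is taken from the text.

```python
import numpy as np

GAMMA = 80.0  # weight of the content loss, as stated in the text

def perceptual_loss(feat_generated, feat_target):
    """Eq. (10): squared feature difference, normalised by the map size W * H.
    If the maps carry a channel axis, all channels are summed."""
    fg = np.asarray(feat_generated, dtype=np.float64)
    ft = np.asarray(feat_target, dtype=np.float64)
    height, width = fg.shape[:2]
    return np.sum((fg - ft) ** 2) / (width * height)

def total_loss(content, adversarial, gamma=GAMMA):
    """Eq. (8): L = gamma * L_X + L_GAN."""
    return gamma * content + adversarial

# Toy 4 x 4 feature maps that differ by 1 at every position.
content = perceptual_loss(np.ones((4, 4)), np.zeros((4, 4)))
combined = total_loss(content, adversarial=0.5)
```

With \(\gamma = 80\), the content term dominates the sum unless the feature differences are small, which matches its role of anchoring the general content while the adversarial term refines texture.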

In the loss function, the adversarial loss focuses on recovering the texture details of the image, while the perceptual loss focuses on recovering its general content. By optimizing the two losses together, the generator model, after a series of adversarial training, repairs images with improved detail and a better overall effect.

2.3.3 Optimizer selection

In the double discriminant network, the two important aspects are the construction of the network model and model training, where training is the process by which the model learns. Different network structures and data types involve different amounts of computation, so the learning time of the network also differs. The learning algorithm also affects the learning time of the network, but the final performance of the model is more important. Therefore, choosing an appropriate learning algorithm for the model is particularly important.

Learning algorithms [26, 27] are often called optimizers, and there are many to choose from, such as BGD, SGD, Momentum, Adagrad, Nesterov, Adadelta, RMSprop, and Adam. Each of these optimizers was designed to address shortcomings of others, but each still has shortcomings of its own to varying degrees. For example, because SGD updates frequently, the loss function oscillates severely and is sensitive to noise; the learning rate of Adagrad shrinks gradually as learning goes on and eventually becomes very small; and Adadelta repeatedly jitters around the minimum in the later stages of training.

This paper uses a relatively strong optimizer, the Nadam optimizer, which can be regarded as a combination of Nesterov momentum and Adam. The training process with the Nadam optimizer is as follows:

  1. (1)

    Initialize training parameters:

    $${g}_{t}=f\left({\theta }_{t-1}\right)\nabla {\theta }_{t-1}$$
    (11)
    $${m}_{t}=\left(1-\mu \right){g}_{t}+\mu {\theta }_{t-1}$$
    (12)

In the equation, \({g}_{t}\) is the gradient parameter in the Adam optimizer, \({m}_{t}\) is the Nesterov momentum parameter, \(\theta\) is the learning rate, \({\theta }_{t-1}\) is the dynamic decay learning rate, \(f\left({\theta }_{t-1}\right)\) is the dynamic decay learning rate function obtained as the number of iterations increases, \(\nabla\) is the gradient descent process, and \(\mu\) is the decay constant.

  1. (B)

    Calculate the gradient of each parameter and update the gradient using the Nesterov momentum method:

    $$\widehat{g}=\frac{{g}_{t}}{1-{\prod }_{i=1{\mu }_{i}}^{t}}$$
    (13)
    $${\widehat{m}}_{t}=\frac{{m}_{t}}{\prod \begin{array}{c}t+1\\ i=1\end{array}{\mu }_{i}}$$
    (14)

In the equation, \(\widehat{g}\) is the gradient vector and \({\widehat{m}}_{t}\) is the momentum vector. \(t\), \(t+1\) are the current iteration number and the next iteration number, with \({\mu }_{i}\) being the gradient decay constant and \(i\) being the update step size.

  1. (III)

    Integrate Nesterov and Adam ideas to update learning rates.

    $${\theta }_{t}={\theta }_{t-1}-\eta \widehat{g}\frac{{\widehat{m}}_{t}}{\sqrt{{\widehat{n}}_{t}}}$$
    (15)

In the formula, \({\theta }_{t}\) is the current updated learning rate, \({\widehat{n}}_{t}\) is the standard vector, and \(\eta\) is the weight of the first-order moment estimation of the control gradient.

  4. (4)

    Use the updated learning rate and calculated momentum gradient to update the parameters and obtain the momentum parameters for the next iteration.

    $${m}_{t+1}={\theta }_{t}{\mu }_{t+1}{\widehat{m}}_{t}+\left(1-\mu \right){\widehat{g}}_{t}$$
    (16)

Here, \({m}_{t+1}\) is the momentum parameter for the next iteration, and \({\mu }_{t+1}\) is the decay constant for the next iteration.

Repeat steps (2) to (4) to train with the Nadam optimizer, and stop training once the performance shows no significant improvement for more than 10 consecutive iterations.

It can be seen from the formulas that Nadam constrains the learning rate more strongly and acts directly on the gradient update. Adam is already a good optimizer in many cases, but in most cases where Adam is used, Nadam can be substituted to achieve better results.
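As a rough illustration of how Nesterov momentum and Adam are combined, the following sketch implements one Nadam update in NumPy. It follows the commonly published formulation of Nadam with a constant momentum decay `mu`, so its notation differs from Eqs. (11) to (16): here `theta` denotes the model parameters and `lr` the step size. The function names `nadam_step` and `minimize_quadratic`, the second-moment decay `nu`, the constant `eps`, and the toy quadratic objective are assumptions made for this example and are not taken from the experiments.

```python
import numpy as np

def nadam_step(theta, grad, m, n, t, lr=0.002, mu=0.9, nu=0.999, eps=1e-8):
    """One Nadam update (constant-mu variant); t is the 1-based iteration number."""
    g_hat = grad / (1.0 - mu ** t)            # bias-corrected current gradient
    m = mu * m + (1.0 - mu) * grad            # first-moment (momentum) estimate
    m_hat = m / (1.0 - mu ** (t + 1))         # look-ahead bias correction
    n = nu * n + (1.0 - nu) * grad ** 2       # second-moment estimate
    n_hat = n / (1.0 - nu ** t)               # bias-corrected second moment
    m_bar = (1.0 - mu) * g_hat + mu * m_hat   # Nesterov-style blend of gradient and momentum
    theta = theta - lr * m_bar / (np.sqrt(n_hat) + eps)
    return theta, m, n

def minimize_quadratic(theta0, steps=2000, lr=0.05):
    """Toy usage: minimize f(theta) = sum(theta**2), whose gradient is 2 * theta."""
    theta = np.asarray(theta0, dtype=np.float64)
    m = np.zeros_like(theta)
    n = np.zeros_like(theta)
    for t in range(1, steps + 1):
        grad = 2.0 * theta
        theta, m, n = nadam_step(theta, grad, m, n, t, lr=lr)
    return theta
```

The toy run uses a larger step size than the 0.002 default only so that the quadratic converges within a few thousand steps; in a real training loop the gradient would come from the network loss.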

2.3.4 Searching for similar patches in the damaged image region

The proposed inpainting method takes the damaged image details as input. To avoid the loss caused by fitting disordered information when the inpainting network generates the inpainted image, we use a similarity filling method to process the damaged regions to be repaired. Replacing the noise input in the damaged image details with similar patches improves both the efficiency and the quality of image generation. The reason is that noise is an unordered set of random numbers that retains no information useful to the generation network, so the repair network incurs many unnecessary losses in trying to fit it. The similarity filling method propagates intact image information from around the damaged area into the damaged area, which eliminates useless interference such as noise and reduces the loss of the repair network. By comparing the similarity of the surrounding neighborhood blocks, the method finds a suitable filling block around the damaged detail area and fills it into the damaged area. The filled similar patches provide structure and texture information to the inpainting network, reduce the loss of feature mapping, and improve the quality and speed of image generation.

To preserve as much structure and texture information similar to the intact block as possible, the optimal similar block is selected during the search. To improve search speed, the similar block search is simplified. The detailed operations are as follows: (1) randomly select an intact block adjacent to the damaged region, randomly assign it a bias block in the image, and record the position information between the intact block and the damaged block; (2) update the damaged block using the bias information and position information; (3) using the random bias block as a baseline, randomly search for a better bias block with a convergence rate of 0.5, and update the damaged block with it (Figs. 2 and 3).

Fig. 2 Simplified bias information and propagation

Fig. 3 Random search

In order to have a better performance in the efficiency and quality of the search, this paper further makes some restrictions on the random iteration.
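The three search steps described in this subsection can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the exact procedure used in the experiments: the image is grayscale with a single square damaged block, the intact block immediately to the left of the damaged block serves as the reference neighbor, similarity is the sum of squared differences, and the search radius is multiplied by 0.5 after every trial. All names (`fill_by_random_search`, `overlaps`, `rounds`) are hypothetical.

```python
import numpy as np

def overlaps(y, x, h, w, top, left, size):
    """True if the rectangle (y, x, h, w) intersects the damaged block."""
    return not (y + h <= top or y >= top + size or x + w <= left or x >= left + size)

def fill_by_random_search(img, top, left, size, alpha=0.5, rounds=4, seed=0):
    """Fill the damaged size x size block at (top, left); requires left >= size."""
    rng = np.random.default_rng(seed)
    H, W = img.shape
    # Reference: the intact block directly to the left of the damaged block.
    ref = img[top:top + size, left - size:left].astype(np.float64)

    def valid(y, x):
        inside = 0 <= y <= H - size and size <= x <= W - size
        # The candidate block and its own left neighbor must avoid the damage.
        return inside and not overlaps(y, x - size, size, 2 * size, top, left, size)

    def cost(y, x):
        cand = img[y:y + size, x - size:x].astype(np.float64)
        return float(np.sum((cand - ref) ** 2))

    # Step 1: assign a random initial bias (candidate position).
    while True:
        by = int(rng.integers(0, H - size + 1))
        bx = int(rng.integers(size, W - size + 1))
        if valid(by, bx):
            break
    best = cost(by, bx)

    # Step 3: random search around the current best, shrinking the radius by alpha.
    for _ in range(rounds):
        radius = float(max(H, W))
        while radius >= 1.0:
            r = int(radius)
            y = by + int(rng.integers(-r, r + 1))
            x = bx + int(rng.integers(-r, r + 1))
            if valid(y, x):
                c = cost(y, x)
                if c < best:
                    by, bx, best = y, x, c
            radius *= alpha

    # Step 2: update the damaged block with the block found at the best position.
    out = img.copy()
    out[top:top + size, left:left + size] = img[by:by + size, bx:bx + size]
    return out, (by, bx), best
```

Because candidates are judged only by how well their left neighbor matches the reference, the filled block tends to continue the texture that borders the damage; a fuller implementation would compare neighbors on all sides.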

3 Experimental analysis

3.1 Parameter settings and experimental steps

In this paper, multiple rounds of experiments are carried out to verify the repair of art image details in online education videos. The experimental platform runs Windows 10 with a Python 3.8 programming environment, an Intel CPU with a 4.20 GHz clock frequency, and 128 GB of memory. The set of key image details from online art education videos is used to train the proposed method and evaluate its application effect.

We conduct extensive experiments to demonstrate the learning ability of our method on various types of images, focusing on training its ability to inpaint synthetic missing content in artistic image details from online educational videos. The learning rate of the proposed method is 0.002. To balance the images with different losses, \({\lambda }_{1}\) is set to 0.3, \({\lambda }_{2}\) to 0.5, \(\alpha\) to 0.8, and \(\beta\) to 0.2 in the experiment.

To demonstrate the applicability of the method proposed in this article, the resolutions of the online education videos selected in this article include 480p, 720p, and 1080p. In low resolution videos, due to the small number of pixels, the detailed information of the image may be less, so better detail repair methods are needed to improve image quality.

Online educational videos cover various types of content, such as lectures, experimental demonstrations, art exhibitions, etc. Different types of videos also have differences in image detail features. For example, in art exhibition videos, the color saturation and contrast of the image may be more important, while in experimental demonstration videos, the clarity and details of the image may be more important.

Under the above parameter settings and dataset settings, the experimental process for designing art image detail feature restoration in online educational videos based on dual discriminant networks is as follows:

  1. (1)

    Image enhancement: Collect art image data from online educational videos, and perform preprocessing and feature extraction. By selecting grayscale changes in the spatial domain to enhance the detailed features of the original video art image, it is possible to better preserve the detailed information of the image.

  2. (2)

    Preliminary repair: With the image details as input, a generation network with a U-Net structure is trained to complete the preliminary inpainting.

  3. (3)

    Dual discriminant network training: The preliminarily repaired image is used as the input of the dual discriminant network. The global discriminant network identifies the coherence of the image details, and the local discriminant network identifies the authenticity of the image details output by the generation network. After adversarial training against the generation network, good image detail repair results are obtained.

  4. (4)

    Experimental evaluation: The image enhancement experiment and the fuzzy inpainting experiment are used to evaluate the repaired images, analyze the experimental results, and compare the advantages and disadvantages of this method with those of other methods.

3.2 Image enhancement experiment

The self-made online educational video art image details are dark due to insufficient light intensity, so the proposed method is used to enhance the online educational video art image details. The histogram comparison results of the online educational video art image details before and after enhancement by the proposed method are shown in Figs. 4 and 5.

Fig. 4 Image histogram before enhancement

Fig. 5 Enhanced image histogram

The histogram after gray processing shows that the gray values now occupy the entire allowable range, which increases both the dynamic range and the contrast of the image. The higher visual contrast makes the details more prominent, which lays a good foundation for the later image repair.
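The effect described here, a histogram spread over the full gray range, can be illustrated with a simple linear gray-level stretch. The sketch below is only an assumed stand-in for the spatial-domain grayscale transformation used in the proposed method; the function name `stretch_gray_range` and the percentile parameters are hypothetical.

```python
import numpy as np

def stretch_gray_range(gray, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low_pct, high_pct] percentile range of gray to [0, 255]."""
    g = gray.astype(np.float64)
    lo, hi = np.percentile(g, [low_pct, high_pct])
    if hi <= lo:
        # Flat image: nothing to stretch.
        return gray.copy()
    out = (g - lo) / (hi - lo) * 255.0
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

For a dark frame whose gray values lie between 10 and 60, calling `stretch_gray_range(frame, 0, 100)` maps 10 to 0 and 60 to 255, so the histogram of the result covers the whole 8-bit range.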

3.3 Fuzzy inpainting experiment

To verify the inpainting effect of this method, two fuzzy art images from online education are selected. The dual generator deep convolutional generative adversarial network method proposed in Reference [2], the L0 norm sparsity prior method proposed in Reference [3], the low-rank feature repair method proposed in Reference [4], and the generative adversarial network method proposed in Reference [5] are used as comparison methods. The comparison results of the fuzzy art images and the restored images are shown in Figs. 6, 7, 8 and 9.

Fig. 6 Fuzzy art inpainting effect 1

Fig. 7 Fuzzy art inpainting effect 2

Fig. 8 Fuzzy art inpainting effect 3

Fig. 9 Fuzzy art inpainting effect 4

Figs. 6, 7, 8 and 9 show that the blurred art images restored by the comparison methods have low definition. In contrast, the art images repaired by the method described in this article show neither damaged details nor distortion of the edge connection structure. The repaired images also restore the detail information well, effectively improving image clarity.

3.4 Subjective and objective evaluation

Since objective evaluation is not completely consistent with the visual effect observed by human eyes, this paper combines subjective and objective evaluation to assess the detail quality of the inpainted art images and make the test results fairer.

Five artistic image details are randomly selected from the test set, and the inpainting methods of References [2], [3], [4] and [5] are used for comparison with the proposed method, in order to show the effectiveness and superiority of the proposed algorithm intuitively. The Peak Signal-to-Noise Ratio (PSNR) is used to represent the gap between the inpainted image and the original image; the larger the PSNR value, the better the inpainting effect. To further analyze the superiority of the proposed method, especially in terms of structure, the Structural SIMilarity index (SSIM) is added to evaluate the inpainted details of online educational video art images; the closer the SSIM value is to 1, the more similar the structures are. Both objective indexes are evaluated and their mean values are calculated. Table 1 shows the statistical comparison of PSNR values of different methods, and Table 2 shows the statistical comparison of SSIM values of different methods.

Table 1 Statistical comparison of PSNR values by different methods
Table 2 Statistical comparison of SSIM values by different methods

The PSNR test in Table 1 shows that the proposed method has the highest average image inpainting PSNR of 30.108, followed by the method in Reference [5], and the average image inpainting PSNR of the method in Reference [4] is 24.308, which is the lowest average PSNR of all methods. The highest image inpainting PSNR value of the proposed method is 31.69, while the image inpainting PSNR value of the other methods does not exceed 30. It can be seen that the image inpainting PSNR value of the proposed method is the highest among several methods, indicating that the repaired art image has high detail definition, good quality, small distortion, and little difference from the original image.

According to the SSIM test data in Table 2, the average SSIM of the proposed method is 0.961, which is higher than that of the reference methods, none of which exceeds 0.95. The structure of the image repaired by the proposed method is therefore the most similar to that of the original image, and the repair effect is significant. The analysis of these two objective evaluation indexes shows that the repair effect of the proposed method is better than that of the methods in the references.
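For readers who want to reproduce the two objective indexes, the following sketch shows one way to compute them with NumPy for 8-bit grayscale images. The PSNR follows the standard definition. The SSIM shown here is a simplified variant computed from global image statistics rather than the usual sliding-window average, so its values will not match a windowed implementation exactly; the constants `k1` and `k2` are the commonly used defaults, and the function names are assumptions for this example.

```python
import numpy as np

def psnr(original, restored, peak=255.0):
    """Peak signal-to-noise ratio between two images of the same shape."""
    diff = original.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def global_ssim(original, restored, peak=255.0, k1=0.01, k2=0.03):
    """SSIM from whole-image means, variances and covariance (no sliding window)."""
    x = original.astype(np.float64)
    y = restored.astype(np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = np.mean((x - mx) * (y - my))
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)
```

Averaging these values over the selected test images gives per-method means of the kind reported in Tables 1 and 2.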

Different observers perceive the same image differently. For this reason, this paper uses a five-level Likert scale for scoring: very satisfied (5 points), satisfied (4 points), general (3 points), poor (2 points) and very poor (1 point). Five graduate students with normal vision were randomly selected to score the overall consistency and visual connectivity of the five groups of repair results according to these criteria, and the scoring results are shown in Fig. 10.

Fig. 10 Overall consistency and visual connectivity scoring results

As can be seen from Fig. 10, the values of the two items of the proposed method are high, and the visual connectivity is close to the full score. The values of the two items of the method in Reference [3] are close to the values of the proposed method, and the values of the other three algorithms are low, and none of them exceeds 4.5. The index of the proposed method is optimal both in the vision after restoration and in comparison with the original image structure, which can show the effectiveness and superiority of the proposed method.

3.5 Robustness analysis

To further evaluate whether the performance of this method is stable, a robustness analysis experiment is conducted. Gaussian noise with a magnitude of 20 grayscale levels is added to the key image detail set from online art education videos to test the robustness of this method against this interference. The statistical comparison of PSNR values of different methods under Gaussian noise interference is shown in Table 3.

Table 3 Statistical comparison of PSNR values of different methods under Gaussian noise interference

The PSNR test in Table 3 shows that, under Gaussian noise interference, the proposed method has the highest average inpainting PSNR, 29.68, while the average PSNR of each reference method is lower. The highest PSNR value of the proposed method is 31.24. The proposed method therefore achieves the highest PSNR among the compared methods and is less affected by Gaussian noise. The restored art images have high detail clarity, which reflects the robustness of this method.
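The noise injection used in this robustness test can be sketched as follows. The text does not specify how the "20 grayscale levels" parameterize the noise, so this example assumes zero-mean Gaussian noise whose standard deviation is 20 gray levels; the function name `add_gaussian_noise` and the fixed seed are likewise assumptions.

```python
import numpy as np

def add_gaussian_noise(gray, sigma=20.0, seed=0):
    """Add zero-mean Gaussian noise (std = sigma gray levels) to an 8-bit image."""
    rng = np.random.default_rng(seed)
    noisy = gray.astype(np.float64) + rng.normal(0.0, sigma, size=gray.shape)
    # Round and clip back to the valid 8-bit range.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)
```

Each noisy image would then be passed through the inpainting methods and compared with its clean original to obtain PSNR values of the kind listed in Table 3.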

4 Conclusion

To better repair the artistic image details of online educational videos, this paper proposes a method of repairing artistic image details based on a dual discriminant network. The experimental results show that the proposed method enhances images effectively: the gray dynamic range and the contrast of the image both increase, so the details become more prominent, which lays a good foundation for image restoration. In addition, the proposed method repairs damage effectively: the repaired image shows neither damage nor structural distortion of the edge connections, and the structural information of the image details is recovered well. In the combined subjective and objective evaluation, the peak signal-to-noise ratio of the proposed method is the highest, indicating that the inpainted image has high definition, good quality and small distortion. The high structural similarity indicates that the inpainting result of the proposed method is closest to the original image.