Few Shot POP Chinese Font Style Transfer using CycleGAN

Designing a new style for Chinese fonts is an arduous task, because thousands of Chinese characters are in common use and their composition is complicated. GAN-based style transfer for Chinese characters has therefore become a research hotspot in the past two years. This line of research aims to learn, from a small number of manually designed new-style fonts, a mapping from the source font style domain to the target style domain. However, such methods have two problems: 1. Their performance on pop (point of purchase) fonts, whose style is exaggerated and random, is unsatisfying. 2. A large number of manually designed characters is still required. To solve these problems, we propose a few-shot font style transfer model based on CycleGAN. It uses meta-knowledge to reduce the number of manually designed characters needed and lets each character fully exploit the knowledge contained in all new-style samples, achieving a satisfying pop font style transfer effect. We also construct a dataset based on the 3,500 commonly used Chinese characters and verify the effectiveness of our model.


Introduction
In the digital age, the threshold for designing a Chinese font that can be displayed on screen and printed on paper has fallen, but it remains heavy work because over 3,500 characters are commonly used in daily life. This paper proposes a style transfer model based on meta-learning and CycleGAN [1] to generate fonts automatically, focusing on the pop (point of purchase) font style, which aims to stimulate consumption and enliven the atmosphere of a store, and therefore contains more decorative elements.
However, designing pop font styles suffers heavily from stroke deformation and randomness. As figure 1 shows, randomness is an essential part of pop font styles, highlighting fashion, freedom and personality. The pop font style transfer task is therefore more challenging, and past methods [2][3][4] exhibit problems such as blurry generated characters and missing decorative elements.
Based on the above idea, we formalize pop font style transfer as a few-shot learning problem and combine CycleGAN with meta-learning methods to make font generation easier and more realistic. Our main contributions are as follows:
- We propose the challenging pop font style transfer task and build a validation dataset for it.
- We propose a few-shot learning framework, Meta-CycleGAN, which can be easily transferred to any GAN-based model, to greatly reduce the number of manually designed characters while simultaneously performing better on the pop font style transfer task.
- We demonstrate the efficacy of Meta-CycleGAN through detailed experiments.
Formally, the source style set contains M samples and the target style set contains N samples, where N ∈ ℕ and N ≪ M. The target sample size N is explored in the experiment part to find the minimal value that does not induce an obvious decline in performance. Figure 2 gives an overview of the proposed model structure.

Base Model
As shown in figure 2, the base model is a CycleGAN. The loss function of CycleGAN contains three parts: adversarial losses, cycle-consistency losses and an identity loss. For generator G: X → Y and the corresponding discriminator D_Y, the adversarial loss is:

ℒ_adv(G, D_Y) = 𝔼_{y∼p(y)}[log D_Y(y)] + 𝔼_{x∼p(x)}[log(1 − D_Y(G(x)))]  (1)

The adversarial loss for the reverse generator F: Y → X and its corresponding discriminator D_X can be defined similarly. Cycle-consistency losses ensure that the cyclic transformation brings an image back to its original state:

ℒ_cyc = 𝔼_{x∼p(x)}[‖F(G(x)) − x‖₁] + 𝔼_{y∼p(y)}[‖G(F(y)) − y‖₁]  (2)

An identity loss ℒ_id is added to ensure real images are not changed much after being transformed by the generator of their own domain. The original total loss for CycleGAN is thus:

ℒ_CycleGAN = ℒ_adv(G, D_Y) + ℒ_adv(F, D_X) + λ_cyc ℒ_cyc + λ_id ℒ_id  (3)

Fig. 2 Model structure. We add a meta-loss computing unit to the original CycleGAN.
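The three loss terms above can be sketched as plain NumPy functions. This is an illustrative sketch, not the paper's code: the function names are ours, and we assume the log-form adversarial loss of Eq. (1) (practical CycleGAN implementations often use a least-squares variant instead).

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """Log-form GAN objective of Eq. (1), written as a loss to minimize.

    d_real: discriminator scores on real target-style images, in (0, 1).
    d_fake: discriminator scores on generated images, in (0, 1).
    """
    eps = 1e-8  # numerical guard against log(0)
    return -(np.log(d_real + eps).mean() + np.log(1.0 - d_fake + eps).mean())

def cycle_loss(x, f_g_x, y, g_f_y):
    """L1 cycle consistency of Eq. (2): F(G(x)) ≈ x and G(F(y)) ≈ y."""
    return np.abs(f_g_x - x).mean() + np.abs(g_f_y - y).mean()

def identity_loss(y, g_y, x, f_x):
    """Identity term: each generator should leave its own target domain unchanged."""
    return np.abs(g_y - y).mean() + np.abs(f_x - x).mean()
```

With perfect reconstructions both L1 terms vanish, and a confident discriminator drives the adversarial loss toward zero, matching the behavior Eq. (3) combines with weights λ_cyc and λ_id.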

Meta Learning Procedure
The meta-knowledge shared among episodes is the prototype [5][6][7]: characters with the same font style should stay close in the embedding space and far from those with a different font style. Characters that stay close to each other share similar font styles, so generating target-style characters is much easier when guided by this meta-knowledge.
By feeding episodes into the model, the model learns knowledge common across episodes, and the prototype encodes this knowledge. For font styles, such knowledge includes edge shape, decoration and other features. In one training episode, we feed a training batch {(xᵢ, yᵢ)}ᵢ₌₁ᴷ into CycleGAN, where K is the number of samples from each font style; we sample K characters equally and repeatedly from each font style. The encoders in the two generators serve as projection functions E that embed character images into the latent space; the bottleneck vector between encoder and decoder is the latent embedding. First we calculate the prototype of each font style within one batch:
c = (1/K) Σᵢ₌₁ᴷ E(xᵢ)  (4)

Figure 3 illustrates the calculation, in which the embeddings of the two font styles fall into separate clusters. The meta-knowledge within episodes is the prototype of each font style. Clusters should stay sufficiently far apart, and this property is encoded as a loss on the two style prototypes c_X and c_Y:

ℒ_sep = −dist(c_X, c_Y)  (5)

where dist computes the distance between prototypes and is set to the Euclidean distance in Meta-CycleGAN. Characters in the same cluster share a similar font style. Furthermore, the prototype of each font style represents the most essential content of that style, e.g. the butterfly decoration in the font DieZuiQingFeng and the round edges in the font ZhuLangYuanTi. By pulling latent embeddings toward their prototype, generated characters of the target font style learn the essence of the style knowledge while increasing their randomness.
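The prototype of Eq. (4) and the cluster-separation regularizer can be sketched in a few lines of NumPy. This is an illustrative sketch under our own naming, using the Euclidean distance the text specifies:

```python
import numpy as np

def prototype(embeddings):
    """Eq. (4): mean of the K latent embeddings of one font style.

    embeddings: array of shape (K, d), one bottleneck vector per character.
    """
    return embeddings.mean(axis=0)

def separation_loss(proto_a, proto_b):
    """Push two style prototypes apart: the loss decreases as the
    Euclidean distance between the clusters grows."""
    return -float(np.linalg.norm(proto_a - proto_b))
```

Minimizing `separation_loss` alongside the CycleGAN objective encourages the encoders to embed the two font styles into separate clusters.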
In addition, the prototypes calculated from reconstructed images should stay close to their original domain. Similarly, we compute the prototype ĉ of each reconstructed font style:

ĉ_X = (1/K) Σᵢ₌₁ᴷ E(F(G(xᵢ)))  (6)

ĉ_Y = (1/K) Σᵢ₌₁ᴷ E(G(F(yᵢ)))  (7)

To encode the knowledge that the reconstructed image clusters should coincide with the original ones, another loss is added:

ℒ_rec = dist(ĉ_X, c_X) + dist(ĉ_Y, c_Y)  (8)

In fact we hope ĉ equals c, which means the cyclic transformation brings images back to their original state; this loss term enhances the cycle consistency of the base model. To push the clusters further apart, another loss term based on ĉ_X and ĉ_Y is also added:

ℒ_sep′ = −dist(ĉ_X, ĉ_Y)  (9)

During each training episode, the calculated prototype is only an estimate of the true prototype of that font style, which cannot be observed directly. By adding the above loss terms as regularization on the font style clusters, the encoders gradually learn a projection that embeds images separately according to their style. However, in the initial stages of training the encoders are not yet well trained, so a weight dependent on the epoch is applied to the meta loss to increase its contribution linearly over the training epochs.
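The epoch-dependent weighting described above can be sketched as a linear ramp. The schedule below is a hypothetical illustration (the paper does not give the exact formula); the function names and the cap at 1.0 are our assumptions:

```python
def meta_weight(epoch, total_epochs, max_weight=1.0):
    """Linearly ramp the meta-loss weight over training, so the meta loss
    contributes little while the encoders are still poorly trained."""
    return max_weight * min(epoch / total_epochs, 1.0)

def total_loss(gan_loss, meta_loss, epoch, total_epochs):
    """Combine the base CycleGAN loss with the epoch-weighted meta loss."""
    return gan_loss + meta_weight(epoch, total_epochs) * meta_loss
```

Early in training the combined objective is dominated by the base CycleGAN loss; by the final epoch the meta loss contributes at full weight.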

Data, Baselines and Model Settings
As no pop Chinese font style dataset existed before, we construct a dataset containing two pop Chinese font styles: DieZuiQingFeng and ZhuLangYuanTi. We choose the 3,500 commonly used Chinese characters and render each character as an image in the different font styles, including the source font style SimKai. For the few-shot setting, the source dataset contains 3,500 samples, while the target dataset contains no more than 1,000 samples. The per-style sample size K is validated over the range {1, 3, 5, 10, 15, 20}, and we finally choose 5.
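Rendering characters to images of a given font style can be done with Pillow. The sketch below is illustrative only; the paper does not describe its rendering pipeline, and the image size, margins and font loading here are our assumptions:

```python
from PIL import Image, ImageDraw, ImageFont

def render_char(ch, font, size=128):
    """Render one character as a grayscale image: black glyph on white.

    ch:   the character to render, e.g. a Chinese character.
    font: a PIL ImageFont, e.g. ImageFont.truetype(path_to_ttf, 96)
          for a specific style such as SimKai (path is hypothetical).
    """
    img = Image.new("L", (size, size), color=255)   # white background
    draw = ImageDraw.Draw(img)
    draw.text((size // 8, size // 8), ch, fill=0, font=font)  # black glyph
    return img
```

Iterating this over the 3,500-character list with each style's TTF file yields the paired source/target image sets described above.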
The baseline model is CycleGAN, which is widely used in Chinese font style transfer and serves as a strong baseline. The pre-trained HCCR-GoogLeNet model [8] is widely used to evaluate generation quality: if the generated characters are realistic, the pre-trained HCCR-GoogLeNet will classify them with high accuracy. Figure 4 shows a direct comparison between the original CycleGAN and Meta-CycleGAN on the DieZuiQingFeng task. The target font style dataset has 500 samples, and both models are trained to the same epoch. In the few-shot setting, the original CycleGAN performs poorly: it neither learns decoration nor encodes randomness. In contrast, Meta-CycleGAN learns part of the decoration and does encode randomness into the font style. For example, in the first three rows all character images learn the butterfly decoration, which is the representative element of the DieZuiQingFeng style; none of them carries such decoration in the original target style, and the first character image even changes its original heart-like decoration to a butterfly and places it at another position. Edge shapes are much easier to learn than decoration, as both models perform equally well on edge shape transfer. As the sample number is limited, some features are still not well learned by Meta-CycleGAN, as shown in figure 4.

Model Performance on Tasks
On the other two tasks, a direct comparison is shown in figure 7. These two tasks are easier than DieZuiQingFeng, so all models perform better.

Parameter Sensitivity
In this part the size of the target font style dataset is discussed; the results are reported in Table 3.

Visualization for Bottleneck Vectors
In this part we show that bottleneck vectors from the source and target font styles are projected into different clusters in the latent space. Bottleneck vectors from the generator are 64-dimensional, so we use t-SNE [8] for dimensionality reduction and visualization, projecting these vectors into 2-D space; each vector represents a character. Figure 5 gives the visualization results on two tasks: DieZuiQingFeng and ZhuLangYuanTi. The margin between clusters shrinks as the difficulty of the task decreases. The left column shows the visualization from Meta-CycleGAN, in which the two classes can be separated by a line, while the samples in the right column cannot be separated clearly.
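The projection step can be sketched with scikit-learn's t-SNE implementation. This is an illustrative sketch; the perplexity and initialization below are our assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.manifold import TSNE

def project_2d(vectors, seed=0):
    """Project 64-dim bottleneck vectors to 2-D for visualization.

    vectors: array of shape (n_characters, 64), one row per character.
    Returns an (n_characters, 2) array of 2-D coordinates.
    """
    tsne = TSNE(n_components=2, perplexity=5, init="random", random_state=seed)
    return tsne.fit_transform(vectors)
```

Plotting the returned coordinates, colored by font style, reproduces the kind of cluster view shown in figure 5.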

Conclusion
In this paper we first propose the pop Chinese font style transfer task under a few-shot setting, then utilize episodic learning and meta-knowledge to solve it with the proposed Meta-CycleGAN. Experiments show the model's performance on tasks of different difficulty levels. Future work includes improving the model structure for better performance and introducing an attention mechanism to calculate prototypes more precisely.