Applying Deep Learning Technologies to Evaluate the Patent Quality with the Collaborative Training



Introduction
Since the 18th CPC National Congress, the cause of intellectual property has entered a new era of vigorous, policy-driven development, and patent applications have been increasing year by year. As of 2020, the State Intellectual Property Office has authorized a total of 530,291 disclosed invention patents, an increase of 18% over 2019. According to data published by the State Intellectual Property Office, 683,000 invention patent applications were discovered in 2020, and a total of 217,000 invention patents were authorized. To implement the guiding ideology and effectively promote China's transformation from a country of intellectual property rights to a country of creativity, and from the pursuit of quantity to the improvement of quality, the National Intellectual Property "Notice on Further Strictly Regulating Judges' Applicants for Understanding" (Guo) Fabaozi 2021 No. 1 recently promoted the improvement of patent quality, strengthened quality orientation, and strengthened the standardization and supervision of patent transactions. How to evaluate the quality and value of patents quickly, automatically, fairly, openly, and scientifically has therefore become a problem that needs to be resolved.
Patents and other intangible intellectual property assets have become an important force in national development, as intellectual property-related technologies develop at home and abroad and their status in the economy rises. In recent years, patent pledge financing has gradually taken shape and has given a great boost to the growth of small and medium-sized enterprises. In 2020, the total amount of national patent and trademark pledge financing reached 218 billion yuan, an increase of 43.9% year on year. This demonstrates that there is a huge demand for patent value evaluation in China and the world, and that this demand can no longer be neglected in science and society. While strengthening innovation, China should also promote the transformation of scientific and technological achievements. Therefore, given the country's high requirements on the speed and efficiency of transforming the results of authorized national invention patents, how to objectively and automatically evaluate the value of massive numbers of Chinese patents has become a hot research topic and a scientific problem that urgently needs to be solved.
Many scholars have been exploring the construction of models based on quantitative indicators and automatic quality assessment. However, the evaluation of Chinese patent quality still has the following shortcomings:
(1) The Chinese patent quality evaluation index system is not sound enough and is difficult to quantify. On the one hand, most patent quality evaluation indexes in current research are still at the theoretical stage and far from practical application. On the other hand, the system is incomplete, and more index dimensions should be considered. Important factors affecting patent quality, namely the content of the patent text itself, the knowledge mined from the patent text, and the domain knowledge mined, are not included in the existing patent quality evaluation indicator system.
(2) The difficulty of quantifying the Chinese patent quality evaluation index system and the lack of data make automatic evaluation impossible. There is no relevant research in China on an automatic evaluation model of Chinese patent quality. On the one hand, for the reasons above, most Chinese patent quality evaluation indexes remain at the theoretical level; they cannot be quantified and thus cannot support the automatic evaluation of Chinese patents. On the other hand, because data on Chinese patent quality levels are lacking, it is impossible to construct a data-driven automatic model of Chinese patent quality.
To make the most of patent quality assessment and identify high-quality patents, it is urgent to adopt technical means that make this process easier. Faced with these common problems in the field of patent quality evaluation, this article uses data mining and natural language processing technology to extract indicators of different dimensions from a large amount of patent information data and uses deep learning methods to predict patent quality.
In addition, a multitask learning method is used to improve the accuracy of patent quality evaluation and prediction. The constructed patent quality evaluation model (PQE-MT) can automatically evaluate the quality of patents with a high accuracy rate (83.9%), saving a lot of valuable manpower and material resources. To address the lack of labeled data for Chinese patents, a large number of US patents with patent quality grades are used to construct the basic training data, and a Chinese patent evaluation method based on transfer learning and active learning is proposed. The model is then divided into two feature spaces, Chinese and English, and a network model is trained on each of them to form collaborative training. The evaluation results show that the proposed method achieves a good transfer effect, with micro-F1 reaching 74%.

Related Work
[1] summarized the technology cycle, technical scope, scientific relevance, and number of claims as professional indicators, and summarized the number of patent applications, patent family size, patent survival time, and litigation as comprehensive indicators. These indicators are compared in terms of time, difficulty, and cost. Jin et al. (2011) [2] constructed a patent attribute network to optimize the prediction effect of algorithms from the aspects of patent content standardization, innovation, technology relevance, market value, patent inventors, and patentees. Han and Sohn (2015) [3] predicted the remaining effective time of a patent by constructing the similarity feature between the patent abstract and the claims and combining it with 14 other quantitative indicators. That article began to combine the text features of patents with patent attributes to make predictions. Yang et al. (2016) [4] broadened the choice of patent quality impact indicators using patent field correlation and life cycle stages.
Leng and Zhai (2017) [5] constructed patent quality evaluation indicators, used the analytic hierarchy process to obtain the importance of the indicators and the corresponding weights, and used the comprehensive fuzzy evaluation algorithm to evaluate patents in the medical field. In that article, different indicators are weighted to make them more targeted. Li (2018) [6] quantified patent quality evaluation indicators along the three dimensions of technology, law, and economy. The technical dimension mainly uses dependence, novelty, monopoly, and maturity; the economic dimension uses patent revenue and market evaluation, patent survival time, and other indicators; the legal dimension refers to factors such as patent legal status and litigation conditions. Finally, a comprehensive evaluation of the patent value is carried out in the three dimensions. The above articles explained and elaborated the patent quality impact indicators at the theoretical level, taking into account indicators of different dimensions and levels, but some indicators have poor availability and are difficult to express using algorithms.

Wireless Communications and Mobile Computing

Meng (2018) [7] conducted a comparative analysis of patent data, combined it with traditional patent quality evaluation methods, and quantified patent evaluation indicators hierarchically: technical, economic, and legal levels were set up and further subdivided into 14 different items at two levels. The weights of indicators at different levels are calculated through rough sets, and the final evaluation is performed through cloud models. The article summarizes the advantages and disadvantages of different indicator systems and summarizes the indicators, but the data discretization process and the cloud model limit the generalization ability of the method. In general, the abovementioned scholars have put forward numerous patent quality evaluation indicators at the theoretical and application levels. This work optimizes and summarizes the related indicators, covers multiple dimensions (time, quantity, technology, law, inventor and agent, and text content), and quantifies, compares, and selects key indicators as input to the patent quality evaluation model.
[13] use CART to extract evaluation indicators, optimize the input parameters, reduce the model scale, and improve the fitting effect. This method provides practical methods and theoretical support for the selection of patent indicators. The article uses classification and regression trees to solve the patent evaluation problem, but the indicators it uses are all basic patent attributes, and many influencing factors are not fully considered. The above three articles did not take into account the textual content of the patent, so their prediction models lack vital influencing factors. Lin (2018) [14] applied neural networks to patent quality evaluation research and proposed the DLPQE model.
The model consists of two vectors: the attribute representation ANE built from the patent citation network, and the text sequence features produced by a convolutional neural network combined with an attention mechanism. The article is the first to propose using deep learning for patent document analysis, taking into account different indicators and text content. However, it lacks deep quantification of patent indicators, and the convolutional neural network loses the sequential characteristics of the text. This work combines quantitative indicators with text content and applies multitask learning to the patent quality evaluation model for the first time.

Related Work on Transfer Learning and Active Learning.
Since data labeling is very time-consuming and labor-intensive, high-quality labeled data is scarce, and transfer learning has attracted increasing attention from academia. Banea et al. (2008) [15] proposed a cross-language transfer method: because English labeled data are plentiful, an English corpus is automatically translated to generate the source domain data set in the target language, which demonstrates the feasibility of using automatic translation for cross-language transfer learning. Pan and Yang (2009) [16] propose transferring knowledge from domains with sufficient training data to domains with similar data distribution, which avoids costly data annotation and greatly improves the learning effect; they also discuss the relationship between transfer learning and related machine learning techniques such as domain adaptation and sample selection bias. Xu and Yang (2011) [17] use transfer learning and multitask learning to extract and transfer useful knowledge from auxiliary domain data. Weiss et al. (2016) [18] introduced the latest developments in the field of biological information, explained transfer learning solutions, and discussed possible future research work; these transfer learning solutions are independent of the size of the data. Yosinski et al. (2017) [19] studied the transferability of deep neural networks, performing fine-tuning experiments layer by layer to explore network transferability. They note that adding fine-tuning to the deep transfer network improves effectiveness and better overcomes differences between data sets. Transferring some layers of the network can speed up learning and optimization: the low-level layers learn general features, while the high-level layers learn domain-specific features.
To make the model conform to the data distribution of the target domain, a small amount of data still needs to be labeled. Active learning uses the model's predictions to select the samples that are most helpful for improving the model. Thompson et al. (1999) [20] use active learning to try to select the most useful examples as training data for transfer learning. Experimental results show that active learning can significantly reduce the number of labeled samples needed for the algorithm to achieve the same effect. Tong and Koller (2001) [21] use pool-based active learning, in which the algorithm does not need a randomly selected training set and can instead request samples to be labeled. They introduce a new support vector machine algorithm for active learning and provide a theoretical basis for it using the concept of version space. Experimental results show that the active learning method in the article can significantly reduce the need for labeled samples. Settles (2009) [22] reviewed the active learning literature, which aims to learn from fewer labeled training examples, discussed query strategies, and analyzed the empirical and theoretical evidence for active learning. Li et al. (2016) [23] propose active multigrade multilabel learning based on the minimum SVM classification margin, which takes the minimum SVM classification distance as the confidence of the selected example, effectively reduces the number of sample annotations, and improves classification performance. Zhou et al. (2017) [24] study the impact of active and transfer learning, data expansion, fault selection, continuous information extraction, and other methods on remote data and diagnosis. Zhu and Bento (2017) [25] use a GAN for active learning and give an object-oriented GAN learning model and a learning algorithm for motion value samples.
Konyushkova (2017) [26] tried to transform active learning into a regression problem and dealt with the overfitting of previous query schemes. The experimental results show that the order reduction of data points is feasible in multivariable finite element analysis.

Related Work on Collaborative Training.
After transfer learning and active learning are used to adapt the model to the feature distribution of the new data set, collaborative training is carried out on the remaining unlabeled samples. Because the patent data has two views, Chinese and English, this makes full use of the different features the model learns on the two views and enhances the overall effect of the model. Blum and Mitchell (1998) [27] first tried collaborative training on web content prediction: models learn from different views, the feature differences between the views are used to predict labels for unlabeled samples, and high-confidence predictions are selected as training data for the algorithm on the other view. Wan (2009) [28], based on the feature differences between Chinese and English, combined an English data set with a large number of sentiment labels with unlabeled Chinese data, used machine translation to translate each into the other language, and trained a model for each language. The probability of each category is predicted for every sample, and samples with high confidence are used as training data for the other model, which achieves bilingual collaborative training. Guo et al. (2012) [29] first proposed a semisupervised collaborative training algorithm with graph-based confidence estimation, which to a certain extent speeds up the selection of data to be labeled in collaborative training. The category probability of each unlabeled sample is calculated from the information of the sample itself, and the confidence of the unlabeled data is estimated by multiple classifiers, which improves the effect of the collaborative classification algorithm; its effectiveness is demonstrated on the UCI data sets. Qiao et al. (2018) [30] use a collaborative training method on neural network models, dividing the sample features into multiple views and fitting the corresponding network parameters on each view. Their experiments found that different models learn different characteristics of the data and that these characteristics are generally complementary. Gong and Lv (2019) [31] jointly use active learning, density peak clustering, and collaborative training to increase the amount of information in the data to be labeled and, to a certain extent, alleviate the mislabeling of fuzzy samples. Huang and Huang (2019) [32] apply collaborative training to machine translation to improve the translation quality of the model. The experimental comparison shows that machine translation results are more accurate after collaborative training, and that results and translation quality remain better even when there are few labeled samples. This work makes full use of the Chinese-English bilingual features of the patent data and applies collaborative training to the Chinese patent quality evaluation model.

Patent Quality Evaluation Model
Starting from the PQE-MT model built with English patent data, this paper uses transfer learning and active learning to propose a cross-language transfer of the patent quality evaluation model based on active-learning data expansion, which transfers the original model to Chinese patents. Finally, collaborative training is carried out. We introduce the model in these three parts.

Patent Quality Evaluation Model Based on Multitask Learning.
We first propose the PQE-MT_USA model to predict the quality of US patents and select the best-performing model structure by comparing different algorithms, which facilitates the migration to Chinese patents in subsequent research. The technologies involved in this part are as follows. Word2vec and Bert are used for the vector representation of text content to extract more semantic information. LSTM is used to learn the long-term dependence of the sequence, and Bi-LSTM also considers the context on both sides. CRF improves the effect of named entity recognition by modeling the dependence between labels. Multitask learning combines the named entity recognition task with the patent quality level prediction task to speed up model training and improve model fitting. TextCNN is used in a comparison experiment between convolutional and recurrent neural networks.

Construction and Quantification of Patent Quality Evaluation Indicators.
Because Chinese patents have long lacked systematic and fair evaluation grades and evaluation methods, they have no clear quality grade. Grading the quality of patents requires professional knowledge and analysis of a large number of related patents, which makes it difficult to obtain labels for Chinese patent data, and there is no reference scale for the evaluation process. Our research found that US patents have a patent quality grade that is recognized by scholars. Therefore, the experiment selected US patents in the field of new energy as the research object. The experiment analyzes the information contained in the patent and removes three kinds of information: redundant information (such as the patent application number, which is a numerical combination of other patent information); overly complex information (such as the patent specification, the detailed description of the patent, which runs to tens of thousands of words and contains little effective information, since most of the main information is in the abstract and claims); and image information (such as the patent drawing, the prototype of the patent). As shown in the figure, this experiment mainly conducts research in the direction of natural language processing, focusing on patent text and numerical data. Image information is not considered for the time being.
The original attribute information obtained after screening is shown in Table 1. Based on these existing patent attributes, the experiment quantifies patent quality evaluation indicators in multiple dimensions and directions.
The experiment processed the above data items in depth, quantifying and combining the time dimension, quantity dimension, technical dimension, legal dimension, and inventor and agent dimension to obtain quantitative indicators related to patent quality. Figure 1 shows how some of the related attributes are divided among the dimensions.
(1) Time Dimension. The experiment processes the various time attributes of the patent and converts them into computable numerical data. The resulting indicators include the time from patent application to publication, the time from application to approval, the time from publication to approval, the survival time, and the remaining time to termination.
(2) Quantity Dimension. The change in the number of patent applications in a field largely shows the development trend of patents. An increase in the number of applications means good prospects, and a decline indicates that the field is gradually being replaced.
(3) Technical Dimension. CPC is the Cooperative Patent Classification, and IPC is the International Patent Classification. The two differ in how they partition individual parts, but their overall structure is similar. Each level is a more detailed classification of the previous level.
(4) Legal Dimension. The legal status, which includes survival, revocation, withdrawal, reissue, and termination, indicates how much importance the assignee attaches to the patent. Once the patent right is granted, the assignee has to pay a fee every year to maintain the validity of the patent, and the fee increases year by year. When the patent no longer serves the interests of the assignee, the assignee stops paying the fees, which results in the cancellation of the patent.
(5) Inventor and Assignee Dimension. A good inventor and assignee means a good patent. We obtain, for example, the total number of patents invented by all inventors in the field and the average number of invention patents per inventor in the field. In addition to the same indicators as for the inventor, for the patent assignee we also extract whether the previous and subsequent assignees have changed, the type of organization, and so on. The assignee may be one of several kinds of organization, including companies, research institutes, universities, colleges, foundations, and individuals.
(6) Combined Indicators. We combine the features of different dimensions to obtain combined indexes, such as the proportions of an inventor's patents that survive, are withdrawn, or expire, and the number of patents in different technical fields and different years. For the combined indexes of the assignee, within a specific patent agent and a specific field, Num_Application denotes the number of patents applied for, Num_Approval the number of patents approved, Q_2 the number of IPC and CPC classifications, Q_3 the size of the patent family, and Q_4 the average patent citation rate. Quantitative indexes are built from these quantities.
In addition, the experiment adds long-text attributes. On the one hand, the experiment takes the publication time of the patent as a node, uses the TF-IDF algorithm to calculate the text similarity with other patents before and after the node, and selects the top 50 with the highest similarity as the similarity results before and after the publication of the patent. The smaller the similarity before the node, the more novel the patent; the greater the similarity after the node, the more groundbreaking the patent, since it has been used for reference by other related patents. The experiment also counts the number of technical elements (field terms) in the main claim and the number of positive words (can, enable, so as to, etc.). On the other hand, the patent name, patent abstract, and patent claims are input into the sequence model to participate in the final prediction.
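The text-similarity indicator described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`tfidf_vectors`, `cosine`, `mean_top_k_similarity`), the whitespace tokenizer, the smoothed IDF formula, and the choice to average the top-k scores are all hypothetical. The source only states that TF-IDF similarity is computed against patents before and after the publication node and that the 50 most similar are kept.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF: an assumption, the source gives no formula.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_top_k_similarity(target, others, k=50):
    """Average similarity between the target text and its k most similar texts."""
    if not others:
        return 0.0
    vectors = tfidf_vectors([target] + others)
    scores = sorted((cosine(vectors[0], v) for v in vectors[1:]), reverse=True)
    top = scores[:k]
    return sum(top) / len(top)


# Toy texts standing in for patents published before and after the target.
earlier = ["battery charging station for electric vehicle", "solar panel mounting bracket"]
later = ["test system for battery management system of electric vehicle"]
target = "test system for battery management system"

sim_before = mean_top_k_similarity(target, earlier)  # low value suggests novelty
sim_after = mean_top_k_similarity(target, later)     # high value suggests later reuse
```

In this toy data the target is only weakly similar to the earlier texts and strongly similar to the later one, which is the pattern the paragraph above associates with a novel and influential patent.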
The PQE-MT_USA model mainly involves two parts: a quantitative index model and a sequence model. The sequence model uses a multitask learning structure. Next, we will introduce the model from these two parts.

Patent Quality Evaluation Model Based on Quantitative Indicators.
The patent quality evaluation model mainly includes input, index quantification, a fully connected layer, a softmax layer, and output. Its structure is shown in Figure 2.

Figure 1: Construction and quantification of patent quality evaluation indicators.

First, we enter the initial patent attributes, including legal status and number of citations. According to the relevant rules, 15 initial indicators and 117 quantitative indicators in different dimensions are constructed, which together constitute 132 indicators. We feed the patent text together with the 132 indicators into a fully connected network composed of 512-128-32 nodes. Finally, multiclass classification is performed through the softmax layer to predict the patent quality level. Patent quality levels are divided into eight categories. Taking the "Dynamic power supply management system for electric vehicle" patent as an example, its legal status is "Alive" and its number of citations is "4"; these and its other patent attributes are combined with the 132 evaluation indicators to obtain its predicted patent quality grade, which is 8.
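The forward pass of such a classifier can be sketched in plain Python. This is only an illustration of the 512-128-32 layer sizes and the eight-way softmax mentioned above: the ReLU activation, the random untrained weights, the placeholder indicator values, and the helper names (`dense`, `init_layer`) are assumptions, and the text representation that the source combines with the indicators is omitted for brevity.

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: activation(W @ inputs + b)."""
    outputs = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return [activation(v) for v in outputs] if activation else outputs


def relu(v):
    return max(0.0, v)


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (untrained, for illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
sizes = [132, 512, 128, 32, 8]  # 132 indicators -> 512-128-32 hidden -> 8 grades
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

indicators = [rng.random() for _ in range(132)]  # placeholder indicator vector
hidden = indicators
for weights, biases in layers[:-1]:
    hidden = dense(hidden, weights, biases, relu)
probs = softmax(dense(hidden, *layers[-1]))
predicted_grade = probs.index(max(probs)) + 1  # grades numbered 1..8 (assumed)
```

With trained weights, `predicted_grade` would play the role of the quality level reported for the example patent; here it only demonstrates the shape of the computation.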

Multitask Learning Model
(1) Task 1: Text Classification Model. This task preprocesses the title, abstract, and claims of the patent and predicts the quality of the patent through the deep learning model described below. The detailed structure of the text classification model is as follows.
The model takes the patent title, abstract, and claims as input and first preprocesses the text: splicing the input items, removing stop words, removing special symbols, converting to lowercase, and reducing words to their roots. The experiment uses Word2vec and Bert as the initial vector representation of words. The processed data is used to fine-tune Bert to obtain domain-specific word vectors as word embeddings. Next comes the Bi-LSTM layer fused with the attention mechanism. LSTM can learn long-term dependencies, and after bidirectional splicing it can effectively use the context information of the text to dig out more hidden features.
The Bi-LSTM in this model is equivalent to feeding the sequence into a forward LSTM and a backward LSTM, respectively, and splicing their outputs as the result. The structure of the LSTM is shown in Figure 3. Taking the title of the patent text "Test system for battery management system" as an example, the basic LSTM unit is used to encode the text. First, the word vector for "Test" is input and passed through the forget gate, memory (input) gate, and output gate, which generates the corresponding new hidden state h_t(test). In the calculation of the three gates, x_t is the input of the LSTM, f_t is the forget gate, i_t is the input gate, o_t is the output gate, h_t is the hidden state, c_t is the cell state, W, U, and b are the parameters whose element values the network learns during training, and σ is the sigmoid function, which keeps the output between 0 and 1 and is used to control the proportion of information that passes through.
The second word vector "system" and the hidden state h_{t−1}(test) output at the previous moment are used as new inputs. After the three gate calculations are applied again, the updated hidden state h_t(system) is obtained, which contains information about the words entered earlier in the text. The above coding process is repeated until the end of the patent text.
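As a reference sketch of the gate calculations described above, the standard LSTM formulation written with the same symbols reads as follows. The candidate cell state \tilde{c}_t, the per-gate subscripts on W, U, and b, and the element-wise product \odot are notational assumptions; the authors' exact equations may differ in detail.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

In the running example, x_t is the word vector of "system" and h_{t-1} is the hidden state produced for "Test".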
When the input sequence is very long, it is difficult for the model to learn a reasonable vector representation. The attention mechanism [33] retains the intermediate results of the LSTM encoder on the input sequence, selectively learns the input and associates it with the output sequence, so that the model focuses on the words that are considered more important in the input sequence. Then, through a fully connected layer network composed of 512, 128, and 32 nodes, the softmax layer is used for multiclassification to predict and output the patent quality level. The specific structure is shown in Figure 4.
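The idea of weighting the retained encoder states can be illustrated with a small attention-pooling sketch. The dot-product scoring against a single query vector is an assumption chosen for brevity; the scoring function actually used in [33] and in the model is not specified in this text, and the function name `attention_pool` is hypothetical.

```python
import math


def attention_pool(hidden_states, query):
    """Weight each hidden state by its dot-product match with the query vector."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    # The context vector is the attention-weighted sum of the hidden states.
    context = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return weights, context


# Toy 2-dimensional hidden states for a three-word input.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
weights, context = attention_pool(states, query=[1.0, 0.0])
```

Here the third state matches the query best, so it receives the largest weight and dominates the context vector that would be passed to the fully connected layers.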
(2) Task 2 (NER Task): Sequence Labeling Model. This task is used to identify the terms in the patent text. The first few layers of the model are the same as task 1. Here, we focus on the CRF layer and the output layer.
In the NER task, the decoding layer usually uses the conditional random field (CRF) model to perform label decoding. For an input sequence, the vectors obtained after embedding are fed to the LSTM, and a linear layer then produces a score for each label at each word. The label transition matrix T gives the score of moving from label y_i at one position to label y_{i+1} at the next, namely T[y_i, y_{i+1}]. For a sequence x of length n with m possible labels, there are m^n possible labeling results in total. The LSTM + CRF model calculates the score score(y) of each possible labeling result, normalizes the scores with softmax to obtain the probability of each labeling result, and chooses the one with the largest probability as the output. In this calculation, w_(y',y) and b_(y',y) are the trainable parameters of the label pair (y', y), h_t is the output of the coding layer at time t, θ denotes the model parameters, and Y(x) denotes all possible tag sequences for the character sequence x. During tag inference, the CRF needs to find the tag sequence y* that maximizes the conditional probability given the input sequence x. This search problem can be solved efficiently using the Viterbi algorithm.
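One common way to write the CRF described above, consistent with the symbols w_(y',y), b_(y',y), h_t, θ, and Y(x), is sketched below. The potential function \psi_t and the start label y_0 are notational assumptions, and the authors' exact form may differ.

```latex
\begin{aligned}
\psi_t(y', y, h_t) &= \exp\left(w_{(y',y)}^{\top} h_t + b_{(y',y)}\right) \\
P(y \mid x; \theta) &= \frac{\prod_{t=1}^{n} \psi_t(y_{t-1}, y_t, h_t)}
  {\sum_{\tilde{y} \in Y(x)} \prod_{t=1}^{n} \psi_t(\tilde{y}_{t-1}, \tilde{y}_t, h_t)} \\
y^{*} &= \arg\max_{y \in Y(x)} P(y \mid x; \theta)
\end{aligned}
```

The denominator sums over all m^n candidate sequences, which is why dynamic programming is needed for both normalization and decoding.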
The experiment defines the label space as {B, I, E, O}. "B" indicates that the element is the beginning of a domain term, "I" indicates that the element is the middle part of a domain term, "E" indicates that the element is the end of a domain term, and "O" indicates that the element does not belong to a domain term. The CRF layer can determine the domain terms in the sequence data according to the label sequence output by the model. In the final output, each element is marked with a label in {B, I, E, O}. Taking the patent text "Test system for battery management system" as an example, the sequence length is 6 and there are four possible tags (B, I, E, O), so there are 4^6 possible tagging results in total, including y = (OOOBIE), y = (BEOOBI), and so on. After the calculation, (OOOBIE) receives the highest score, which identifies "battery management system" as a patent term. The specific structure of the model is shown in Figure 5.
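The decoding step for this example can be sketched with a small Viterbi implementation. The emission and transition scores below are made-up numbers chosen for illustration, not values from the trained model, and the function name `viterbi` is hypothetical. Instead of enumerating all 4^6 = 4096 label sequences, dynamic programming keeps only the best path ending in each label.

```python
LABELS = ["B", "I", "E", "O"]


def viterbi(emissions, transitions):
    """Return the highest-scoring label path and its score.

    emissions[t][y]   : score of label y for token t (from the linear layer)
    transitions[p][y] : score T[p, y] of moving from label p to label y
    """
    n_labels = len(emissions[0])
    best = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for scores in emissions[1:]:
        step_best, step_back = [], []
        for y in range(n_labels):
            prev = max(range(n_labels), key=lambda p: best[p] + transitions[p][y])
            step_best.append(best[prev] + transitions[prev][y] + scores[y])
            step_back.append(prev)
        best = step_best
        backpointers.append(step_back)
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return path, best[last]


tokens = "Test system for battery management system".split()
# Illustrative emission scores per token, in the order B, I, E, O.
emissions = [
    [0.1, 0.0, 0.0, 2.0],  # Test
    [0.2, 0.1, 0.6, 1.0],  # system
    [0.0, 0.0, 0.0, 3.0],  # for
    [2.0, 0.2, 0.0, 0.5],  # battery
    [0.3, 2.0, 0.4, 0.2],  # management
    [0.2, 0.1, 0.6, 1.0],  # system (same word, same emissions as above)
]
# Illustrative transition scores: rows are the previous label, columns the next.
transitions = [
    [-5.0, 1.0, 1.0, -5.0],   # from B
    [-5.0, 1.0, 1.0, -5.0],   # from I
    [0.0, -5.0, -5.0, 0.0],   # from E
    [0.0, -5.0, -5.0, 0.0],   # from O
]
path, score = viterbi(emissions, transitions)
tags = "".join(LABELS[y] for y in path)
```

Both occurrences of "system" have identical emission scores, yet the second one is tagged E because of the transition scores from the preceding I label. This is the label dependence that the CRF layer is meant to capture.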
The experiment considers that the two tasks overlap greatly, and that the domain terms in the text sequence are the part with the greater impact on the patent level. The sequence labeling task updates the parameters through backpropagation, which helps the model obtain a better sequence representation and pay more attention to learning domain terms. The two tasks together form a multitask learning structure in which they promote each other, which reduces the overall task complexity and prediction time. Taking the patent text "Automated electric vehicle" as an example, the specific structure is shown in Figure 6.
This figure combines the previous two models. The left side is the sequence model part of multitask learning, and the right side is the quantitative index model part. The vectors of the two parts are spliced to predict the final result, which yields the PQE_MT_USA model used for the subsequent migration training on Chinese patents.

Patent Quality Evaluation Model Based on Transfer Learning. If the model PQE_MT_USA trained on US patents is applied directly to Chinese patents, the predicted patent rating will have a large error. The reason is that traditional machine learning methods require the training and test data to follow the same distribution and require sufficient labeled data. On the issue of labeled data, the labeling process consumes a lot of manpower and time. On the issue of data distribution, on the one hand, there are cultural differences in how Chinese and English patents are written; on the other hand, the attributes and quantitative indicators of Chinese and US patents are distributed differently. Comparing Chinese and American patent data shows that the inventors, attorneys, IPC classification numbers, and CPC classification numbers of Chinese and American patents are expressed in the same form. Although the inventors and agents of the Chinese and American patents belong to two different sets, the experiment quantifies the indicators based on the overall situation in the field, so the trained model is not affected by the specific values. The differences between the patents of the two countries are mainly reflected in the following four aspects: (1) the legal status of patents is different; (2) the distributions of the following 6 attributes differ between the two countries: number of cited patents, number of citations by other patents, number of common families, number of citations by other documents, number of references, and international patent documentation center; (3) the patent origination level is different; and (4) the content of the text-type fields is different, including the title, abstract, and claims of the patent. US patents are written in English, and Chinese patents are written in Chinese. In terms of the related indicators, US patents are generally higher than Chinese patents.
The US patents carry the patent strength label required for the experiment, whereas the Chinese patents have a large amount of data but very few labels. At the same time, the attributes and indicators influence patent strength with the same trend in both countries, and only the feature distributions differ. Transfer learning drops the same-distribution assumption of traditional machine learning and therefore suits the requirements of the experiment well.
Transfer learning mainly involves domains and tasks. Given a labeled source domain, that is, English patent text with a large amount of labeled data, $D_s = \{x_i, y_i\}_{i=1}^{n}$, and an unlabeled target domain, that is, Chinese patent text without annotation, $D_t = \{x_j\}_{j=n+1}^{n+m}$, the data distributions of the two domains, $P(x_s)$ and $P(x_t)$, are different: $P(x_s) \neq P(x_t)$ [34]. The purpose of transfer learning is to use the knowledge of $D_s$ to learn the knowledge (labels) of the target domain $D_t$. That is to say, transfer learning transfers knowledge from the source domain to the target domain, completes the model update, and predicts the target domain labels. Transfer learning can be divided into different types, and there are many classification methods. This article mainly uses the model-based transfer learning method, which adapts the model parameters to find new parameters $\theta$ so that the transferred model performs better on the target domain and minimizes its loss there. The calculation formula is as follows: The experiment uses US patents as the source domain and Chinese patents as the target domain. Direct transfer between different languages is impossible; this is a cross-language text classification problem. The experiment uses machine translation to translate the Chinese and English patent texts and labels the patent grades of some Chinese patents. The model then adapts to the distribution of the indicators and to the linguistic characteristics of Chinese patents. Figure 7 shows the transfer learning process.
To put it simply, taking the patent text in the field of new energy vehicles used in the experiment as an example, the transfer learning process first translates the English text of the US patents (source domain data) into Chinese, the language of the target domain, and uses it as the input of the PQE_MT_USA model described in the previous section. The text is vectorized using BERT, and the model is retrained through the Bi-LSTM encoding layer and the CRF decoding layer. The initial parameters are the final parameters obtained by training PQE_MT_USA on English text. After the training is completed, new parameters suitable for the Chinese patent grade prediction model are obtained, and then other Chinese patent texts (target domain data) are used for testing and adjustment to obtain the transferred parameters.
Comparative experiments freeze the parameters of different layers to find the best migration effect. The specific experimental process and results are described in detail in the fourth part, on experimental results and analysis. The final experiment selects the part shown by the dotted line in Figure 5 for transferring the model parameters.
Although transfer learning requires less data, too little migration data for a larger network will still cause overfitting. The model needs more migration samples for the network to generalize better. The experiment contains more than 20,000 Chinese patents, and labeling a large number of samples requires a lot of manpower and material resources.
In the process of transfer learning, the choice of which patents to label has a large impact on the transfer results. With the same amount of target domain data, selecting samples that are difficult for the model to classify has a better effect than selecting samples that the model can already classify clearly. Letting the model actively propose which data needs to be labeled is called active learning. Transfer learning reduces the number of samples to be labeled, and active learning improves the quality of the labeled samples. Active learning is motivated by the fact that each sample in the training set provides a different amount of information for model training, that is, different samples contribute differently to improving the model. Active learning mainly includes a classifier, a labeled training data set, an unlabeled data set, a query function used to extract highly informative samples from the unlabeled data set, and a supervisor who labels the extracted samples.
The experiment selects the more efficient and mature methods based on uncertainty reduction. The main query functions are based on the classification uncertainty $U(x)$, the classification margin $M(x)$, and the classification entropy $H(x)$. The specific calculation method is shown in the following equations.
The experiment uses the method based on the classification margin, which is the most effective for multiclassification problems, namely, the best versus second best (BVSB) strategy [35], to calculate the uncertainty of a sample. This method uses the difference between the highest and the second highest class probabilities in the prediction for a sample as the selection measure. The smaller the difference, the greater the uncertainty. Figure 8 shows two real samples from the experiment. The sample on the right has a smaller probability difference and greater uncertainty, so the model is more inclined to select it as the sample to be labeled.
Analyzing the output value of each category for the two samples shows that the most probable category of sample A is category 2, with probability 0.32, and the most probable category of sample B is category 1, with probability 0.36. Although the final prediction for sample B has a higher probability, the overall distribution over the categories shows that the model is more confused about sample B, because the difference between its highest and second highest category probabilities is smaller. The margin-based active learning algorithm is therefore good at extracting samples with higher uncertainty in multiclassification problems.
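The three query functions can be sketched as below, using their commonly cited definitions: $U(x) = 1 - \max_k p_k$, $M(x) = p_{(1)} - p_{(2)}$ (best versus second best), and $H(x) = -\sum_k p_k \log p_k$. These definitions are assumed to match the ones intended above. The two probability vectors are hypothetical: only the top probabilities 0.32 and 0.36 come from the text, and the remaining values and the number of categories are invented for illustration.

```python
import math
from typing import Sequence


def classification_uncertainty(probs: Sequence[float]) -> float:
    """U(x): one minus the probability of the most likely class."""
    return 1.0 - max(probs)


def classification_margin(probs: Sequence[float]) -> float:
    """M(x): best-versus-second-best (BVSB) probability difference."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def classification_entropy(probs: Sequence[float]) -> float:
    """H(x): Shannon entropy of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical class distributions for the two samples discussed above.
sample_a = [0.10, 0.32, 0.12, 0.10, 0.10, 0.10, 0.08, 0.08]  # top: category 2
sample_b = [0.36, 0.34, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]  # top: category 1

margin_a = classification_margin(sample_a)
margin_b = classification_margin(sample_b)

# The sample with the smaller margin is the more uncertain one and is
# therefore the better candidate for manual labeling.
to_label = "B" if margin_b < margin_a else "A"
print(round(margin_a, 2), round(margin_b, 2), to_label)
```

With these illustrative values, sample B has the higher top probability but the smaller margin, so the margin criterion selects it, while the plain uncertainty $U(x)$ would rank sample A as more uncertain.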
The following is a detailed introduction to the process of transfer learning and active learning algorithms based on the Chinese patent.
First, the English text Patent_USA_En in the source domain data set is translated into the Chinese text Patent_USA_Ch, the Chinese patent quality evaluation model PQE-MT_Ch based on the US patents is obtained by training on Patent_USA_Ch, and the layers of PQE-MT_Ch that do not require further training are frozen. Next, the target domain Chinese patent data set Patent_CHN_Ch provides the training set used to continue training the model PQE-MT_Ch. Then, we use this model to classify the unlabeled samples in Patent_CHN_Ch, calculate the probabilities C1 and C2 of the two categories with the highest predicted probability for each sample, and obtain the set S of the 100 samples with the largest uncertainty, that is, the smallest C1 - C2. The set S is then removed from the sample set Patent_CHN_Ch, the samples in S are labeled manually, the labeled set S is added to the training set, and the above training process is repeated, which finally yields the migration model PQE-MT-CHN_Ch. This model uses both transfer learning and active learning and is suitable for the quality assessment of Chinese patents. Figure 9 is a diagram of the overall process of transfer learning and active learning.
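One selection round of the procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the experimental code: `predict_proba` stands in for the migrated model, `oracle` stands in for the human annotator, and the patent identifiers, texts, and probabilities are all hypothetical. Retraining the model on the enlarged training set between rounds is left to the caller.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def bvsb_margin(probs: Sequence[float]) -> float:
    """C1 - C2: difference between the two highest class probabilities."""
    c1, c2 = sorted(probs, reverse=True)[:2]
    return c1 - c2


def select_batch(pool_probs: Dict[str, Sequence[float]],
                 batch_size: int = 100) -> List[str]:
    """Return the ids of the samples with the smallest C1 - C2."""
    ranked = sorted(pool_probs, key=lambda pid: bvsb_margin(pool_probs[pid]))
    return ranked[:batch_size]


def active_learning_round(pool: Dict[str, str],
                          labeled: Dict[str, Tuple[str, int]],
                          predict_proba: Callable[[str], Sequence[float]],
                          oracle: Callable[[str], int],
                          batch_size: int = 100) -> List[str]:
    """Move the most uncertain samples from the pool to the labeled set."""
    probs = {pid: predict_proba(text) for pid, text in pool.items()}
    selected = select_batch(probs, batch_size)
    for pid in selected:
        text = pool.pop(pid)                # remove S from the unlabeled pool
        labeled[pid] = (text, oracle(pid))  # manual labeling step
    return selected


# Toy data: three unlabeled patents and a stand-in for the model output.
pool = {"p1": "text a", "p2": "text b", "p3": "text c"}
fake_model = {"text a": [0.70, 0.20, 0.10],
              "text b": [0.40, 0.38, 0.22],
              "text c": [0.50, 0.30, 0.20]}
labeled: Dict[str, Tuple[str, int]] = {}

picked = active_learning_round(pool, labeled,
                               predict_proba=fake_model.__getitem__,
                               oracle=lambda pid: 3,  # pretend annotator
                               batch_size=1)
print(picked, sorted(pool))
```

In the experiment the batch size is 100 and the round is repeated, with the model retrained on the enlarged training set after each round.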

Patent Quality Evaluation Model Based on Collaborative Training. The experiments with the transfer model above show that the language of the text changes the effect of the model. The following research exploits the difference between the Chinese and English feature spaces: models trained in different languages learn different features and knowledge and therefore predict different samples with different quality, so bilingual collaborative training is used to improve the prediction results. Collaborative training is a mainstream semisupervised machine learning algorithm. Its theoretical basis is to train classifiers under multiple views: given a small annotated corpus, different classifiers learn different features of the samples, and the category information of a large amount of data to be annotated can then be obtained, which achieves the purpose of learning from unlabeled data. The collaborative training process is similar to the clustering hypothesis. On the whole, a classification model trained on a small amount of labeled data can only roughly describe the distribution of the data and has difficulty dividing most samples accurately. By using a large amount of unlabeled data, the samples that the trained model classifies most reliably can be found, so that the model can find a more accurate classification surface.
In the process of cross-language transfer learning, there is a language barrier between Chinese and English, and the feature spaces of the two languages are different, so it is impossible to transfer the model directly. The experiment used machine translation to unify the languages of the patents of the two countries, so the patents of both countries have two language versions. The experiment hopes to make full use of both Chinese and English feature spaces to further improve the prediction effect of the model.
The experiment trains the Chinese patent quality evaluation model PQE_MT_CHN_Ch on Chinese text and the Chinese patent quality evaluation model PQE_MT_CHN_En on English text. Through mutual supervision of the two models, the Chinese patent quality evaluation model PQE_MT_CHN_Co based on collaborative training is obtained.
In the experiment, the collaborative training algorithm uses machine translation to translate the Chinese text of the labeled Chinese data into English text. The Chinese-based model PQE_MT_CHN_Ch is trained on the Chinese text of the labeled Chinese patent data set. In the same way, the English-based model PQE_MT_CHN_En is trained on the English text of the labeled Chinese patent data set. These two models are used to label the unlabeled patent texts in the corresponding data sets, and high-confidence data are selected from them and added to the labeled data set. This training process is iterated. Next, the prediction results of the two evaluation models PQE_MT_CHN_Ch and PQE_MT_CHN_En are combined according to weights, and finally, the Chinese patent quality evaluation model PQE_MT_CHN_Co based on collaborative training is obtained. The overall flow of the algorithm is shown in Figure 10. The algorithm keeps the distribution of the selected samples balanced with the real situation by setting the number $m_i$ of samples taken from each category $i$.
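Two building blocks of this procedure, the per-category selection of high-confidence pseudo-labels and the weighted combination of the two models, can be sketched as follows. This is a simplified illustration: the function names, the weight `w_ch`, the quotas, and the toy probabilities are assumptions, and the actual confidence criterion and weights used in the experiment are not specified here.

```python
from typing import Dict, List, Sequence


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=probs.__getitem__)


def select_confident(probs_by_id: Dict[str, Sequence[float]],
                     quotas: Dict[int, int]) -> Dict[str, int]:
    """Pick pseudo-labels, taking at most quotas[c] = m_c samples per class.

    For each class c, the samples predicted as c are ranked by the predicted
    probability of c, and the top m_c are kept with c as their pseudo-label.
    """
    chosen: Dict[str, int] = {}
    for c, m in quotas.items():
        candidates = [(p[c], pid) for pid, p in probs_by_id.items()
                      if argmax(p) == c]
        for _, pid in sorted(candidates, reverse=True)[:m]:
            chosen[pid] = c
    return chosen


def combine(p_ch: Sequence[float], p_en: Sequence[float],
            w_ch: float = 0.5) -> List[float]:
    """Weighted combination of the Chinese-view and English-view predictions."""
    return [w_ch * a + (1.0 - w_ch) * b for a, b in zip(p_ch, p_en)]


# Toy two-class example with hypothetical probabilities from one view.
view_probs = {"a": [0.90, 0.10], "b": [0.60, 0.40],
              "c": [0.20, 0.80], "d": [0.45, 0.55]}
pseudo = select_confident(view_probs, quotas={0: 1, 1: 1})

# Joint prediction for one sample on which the two views disagree.
joint = combine([0.6, 0.4], [0.2, 0.8], w_ch=0.5)
joint_label = argmax(joint)
print(pseudo, joint, joint_label)
```

In a full co-training loop, the pseudo-labels selected by one model would be added to the training data, both models would be retrained, and the selection would be repeated for several rounds before the final weighted combination.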

Experimental Results and Analysis
4.1. Experimental Corpus. 59417 patents are used in the experiment, and the number of patents of each grade is shown in Figure 11. The experiment randomly shuffles the data set to ensure a uniform distribution of samples with different labels. 90% of the samples (53475) are used for training and 10% (5942) for testing. k-fold cross-validation is used for model optimization: $(k-1)/k$ of the data is used as the training set and the rest as the validation set, and the result is the average over the $k$ runs. The value of $k$ is set to 10. There are 21611 Chinese patents, of which 308 have also been filed in the United States; each of these belongs to the same patent family as a US patent, so the quality grade of the US patent is used for the corresponding Chinese patent. In this paper, 100 Chinese patents with a patent grade are taken as the samples for the final model test, 208 are used as the original data for migration, and the remaining 21303 unlabeled Chinese patents are used as the active learning pool for data screening and labeling.
All experimental data are extracted at random in order to ensure the uniform distribution of samples with different labels and the accuracy and fairness of the experimental results. The machine translation in this paper is performed with Google's translation system, which the authors consider the most accurate translation system available at present.

Experimental Settings. Precision, recall, F1-score, accuracy, microaveraging, and macroaveraging, as used in text classification evaluation, are used for evaluation. The calculation process is given in equations (17)-(26). For category C, the classification results can be divided into the following situations:
(1) A sample of category C is classified as category C, and the quantity is recorded as a
(2) A sample of a non-C category is classified as category C, and the quantity is recorded as b
(3) A sample of category C is classified as non-C, and the quantity is recorded as c
(4) A sample of a non-C category is classified as non-C, and the quantity is recorded as d
See Table 3. For readability, the models are defined as shown in Table 4.
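From the four counts a, b, c, and d, the per-category metrics follow the usual definitions: precision = a / (a + b), recall = a / (a + c), and F1 = 2PR / (P + R). The sketch below computes them together with the micro and macro averages. It assumes the common conventions that microaveraging pools the counts over all categories and that macro-F1 is the mean of the per-category F1 values; equations (17)-(26) may define the macro-F1 slightly differently. The label lists are a hypothetical toy example.

```python
from typing import Dict, Hashable, Sequence, Tuple


def prf(a: int, b: int, c: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from the counts a (TP), b (FP), c (FN)."""
    precision = a / (a + b) if a + b else 0.0
    recall = a / (a + c) if a + c else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def counts_per_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     classes: Sequence[Hashable]) -> Dict[Hashable, Tuple[int, int, int]]:
    """Count a, b, c for every category, treating it as 'C' and the rest as 'non-C'."""
    out = {}
    for k in classes:
        a = sum(t == k and p == k for t, p in zip(y_true, y_pred))
        b = sum(t != k and p == k for t, p in zip(y_true, y_pred))
        c = sum(t == k and p != k for t, p in zip(y_true, y_pred))
        out[k] = (a, b, c)
    return out


def micro_macro(y_true, y_pred, classes):
    counts = counts_per_class(y_true, y_pred, classes)
    per_class = [prf(*counts[k]) for k in classes]
    # Macro: average the per-category metrics, each category weighted equally.
    macro = tuple(sum(vals) / len(classes) for vals in zip(*per_class))
    # Micro: pool the counts over all categories, then compute the metrics once.
    pooled = [sum(counts[k][i] for k in classes) for i in range(3)]
    micro = prf(*pooled)
    return micro, macro


y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 2, 3]
micro, macro = micro_macro(y_true, y_pred, classes=[1, 2, 3])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(micro, macro, accuracy)
```

For single-label multiclass data such as the patent grades, the micro-averaged precision, recall, and F1 all coincide with the accuracy, whereas the macro average gives small categories the same weight as large ones.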

Through the analysis of the experimental results, it can be concluded that
(i) Model_2, which uses word2vec as the embedding layer and Bi-LSTM as the sequence layer, achieves 58.25% accuracy on the training set, 56.52% on the validation set, and 55.22% on the test set
(ii) After the attention mechanism is added, the accuracy of the model improves by nearly 2 percentage points, which demonstrates the effectiveness of the attention mechanism
(iii) After the embedding layer is replaced with BERT, the accuracy of the model improves greatly, but the generalization ability decreases slightly
(iv) After the sequence model based on multitask learning is used, the model is further optimized and the evaluation effect is further improved.
The work in [36, 37] visualizes the layers of a CNN and shows the above theory more clearly. The bottom layers 1 and 2 of the network usually learn the basic color and edge features of the object; the third layer generally learns the texture features of the object; the fourth layer learns local features, such as wheels; and the top layer learns more discriminative overall characteristics.
For an image model, the results of each network layer are easy to observe. In contrast, the knowledge learned by a sequence model trained on text is difficult to display clearly, but on the whole it follows the same rules. The lower layers of the network learn the part-of-speech and word-sense characteristics of the words, and the higher layers learn semantic and syntactic characteristics. In response to this phenomenon, the experiments perform transfer learning on different layers of the model and compare them to find the optimal part to transfer.
In the process of transfer learning, the bottom part of the model is usually kept intact to reduce the risk of overfitting. At the same time, for the model in this article, the corpus is first converted into the same language through machine translation, and in this process there is usually no major change in part of speech or word meaning. Because of the differences in culture and writing habits, what needs to be migrated is closer to semantics, context, and syntax, which corresponds to migrating the higher layers of the network. The detailed information of each layer of the model PQE_MT_USA is shown in Table 5.
According to the structural characteristics of the network model, the experiment freezes the network gradually from the bottom, compares the results of migrating different parts, and determines the final migration part. Only the 308 Chinese patents that have corresponding US patents have quality ratings. Taking the migration results on English data as an example, the 308 Chinese patent samples are translated into English text and then used for model migration. 100 of them are used as the test set to compare the effects of different models, so the Chinese patent data actually used for migration are only 208 samples. The experimental results list the microaverages and macroaverages of precision, recall, and F1 on the training set and the test set. The results are shown in Table 6.
The experimental results show that as the low-level parameters of the network are gradually frozen, the indicators on the training set show a downward trend, while the indicators on the test set, especially the macroaverage, first increase and then decrease. The main reason for this phenomenon is that the new data set is small. The larger the part of the network that is migrated, the better the fit on the training set, but overfitting occurs. In the end, the experiment freezes the parameters of layers 0-2 and selects layers 3-14 of the network for migration. According to the parameter counts in Table 6, freezing the training of the lower layers avoids a large number of parameter updates, which improves the model effect and reduces the migration time.
In addition, an experiment was run in which the model was trained only on the English text of the Chinese patents, without model migration. The results of the two experiments are shown in Tables 7 and 8.
The comparison of Tables 7 and 8 verifies that transfer learning improves every indicator of the model to a certain degree. Because of the lack of training samples in ordinary supervised learning, the model trained without transfer has two obvious problems. First, it predicts poorly in categories with a small number of samples: the precision, recall, and F value in category 7 and category 8 were all 0. Second, its generalization ability is very poor, and the overfitting problem is obvious: the test set is about 35% lower than the training set in both the micro and macroaverages. After transfer learning, both problems are reduced. For categories with few samples, the model fits better; the gap between the training set and the test set is reduced to 17% for the microaverage indicators and to 10%, 16%, and 15%, respectively, for the three macroaverage indicators. These two problems are alleviated because the model has learned from a large number of US patents how the relevant characteristics affect the results. Although the distributions of the characteristics are different, the overall trend is approximately the same, and the model adapts to the changes in the Chinese patent indicators on the basis of the trend it learned initially. Although the model has improved, the effect still cannot meet the final requirements for patent assessment, so the next step tries to improve the model by increasing the migration data.
We randomly selected 200 unlabeled Chinese patents and manually labeled them as data_random, and we used the active learning algorithm to obtain the 200 Chinese patents with the largest classification uncertainty (smallest classification margin) and labeled them as data_active_learning. The experiment starts from 0 samples and adds 20 samples at a time from each of these two data sets until it reaches 200, and it compares the effects of randomly selected data and data selected by active learning on model migration. The comparison of the prediction accuracy of the model in the two cases is shown in Figure 12.
Comparing the two trends shows that selecting data with the active learning method is better than selecting data randomly. The model that used active learning to select about 140 samples for labeling performs as well as the model that used 200 randomly selected labeled samples. Then, the experiment uses active learning to expand the labeled data further, since active learning obtains the most informative data. In each round, the 200 samples with the largest classification uncertainty were obtained for manual labeling. A total of 2000 labeled samples were obtained as new migration data. The results are shown in Table 9. Furthermore, the experiment was conducted in the same way using the English text of the patents, and the results are given in Table 10.
The experimental results in Tables 8 and 9 are compared to observe the effect on the model of expanding the data through active learning. Using the initial data migration model, the micro-F1 values on the training set and the test set were 83% and 66%, and the macro-F1 values were 70% and 57%. After active learning was used for data expansion, the micro-F1 values were 79% and 74%, and the macro-F1 values were 68% and 63%. Because the amount of migrated data increased, the generalization ability of the model became stronger, and the effect of the model on the test set improved to a certain extent.

Figure 12: Comparison of the accuracy of transfer learning between active learning and randomly selected labeled data.
The results in Table 9 were compared with those in Table 10 to observe how the choice of Chinese or English affects the model. The transfer effect for English text is generally better than for Chinese text. The authors believe there are three reasons. First, because of the limitations of machine translation, using machine-translated Chinese data degrades the initial source model trained on US patents, which reduces the final effect of the transfer model. Second, the translated Chinese patent text used for the target model has a lower overall influence. Third, the differences between the Chinese and English writing conventions and feature spaces also lead to differences in what the model learns.

Experimental Results and Analysis of Patent Quality Evaluation Model Based on Multitask Learning. The article uses SVM as the classification model and Chinese patent text as an example to compare TF-IDF feature vectors with the bag-of-words method, in which the feature words are extracted using mutual information, information gain, and chi-square, as well as the union of the feature words extracted by these three methods. For the SVM classification algorithm based on bag-of-words, the selection of feature words is critical. The experiment applies the feature extraction formulas to each type of patent, manually sets a threshold as the reference value, and extracts the words that score higher than this value as the input features of the bag of words [38]. Table 11 shows the prediction accuracy of the algorithm in the different situations.
The experimental results show that the SVM classification algorithm based on TF-IDF achieves 68% accuracy on the training set, but its prediction result on the test set is nearly 10% lower, which indicates overfitting. Although the bag-of-words method has a lower accuracy on the training set than the TF-IDF method, its accuracy on the test set is greatly improved. Due to the [...] Figure 13. It can be observed that the closer the distance between the samples, the greater the mutual information value. This result demonstrates the effectiveness of the training set data annotation. Experiments with different numbers of feature words give the number of feature words with the best model effect, and the result is shown in Figure 14. The figure shows the dimensions of the feature words; the input of the model is a vector of bag-of-words features and quantitative indicators.
A comparative analysis of the experimental results shows that the accuracy of the model on the training set increases as the dimension of the feature words increases. When the dimension of the feature words is too high, overfitting occurs and the effect of the model on the test set gradually decreases. The 400-dimensional bag-of-words vector, which gives the best overall results, is used to further test different classification algorithms. The results are shown in Table 12.
It can be seen that the classification accuracy of Bayesian and kNN algorithms is poor, and the decision tree algorithm is not suitable for patent quality evaluation classification due to the overfitting problem caused by the complexity of the sample. Finally, SVM is selected as the classification method for classification.
In the experiment, CHI, which gives the better results, is selected as the feature word extraction method. 400-dimensional bag-of-words vectors are extracted, SVM classification is performed on the text, collaborative training is carried out together with the migrated deep learning model PQE_MT_CHN, and finally the prediction effects of the two methods are compared.
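Chi-square (CHI) feature word selection can be sketched as follows, using the standard statistic $\chi^2 = N(AD - BC)^2 / ((A + C)(B + D)(A + B)(C + D))$ for a term and a category. The exact formula, threshold, and tokenization used in the experiment are not reproduced here; the documents, labels, and function names below are hypothetical, and `k` plays the role of the 400 dimensions mentioned above.

```python
from typing import List, Sequence, Set


def chi_square(A: int, B: int, C: int, D: int) -> float:
    """Chi-square statistic of a term with respect to one category.

    A: documents in the category that contain the term
    B: documents outside the category that contain the term
    C: documents in the category that do not contain the term
    D: documents outside the category that do not contain the term
    """
    n = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return n * (A * D - B * C) ** 2 / denom if denom else 0.0


def top_terms(docs: Sequence[Set[str]], labels: Sequence[str],
              target: str, k: int) -> List[str]:
    """Return the k terms with the highest chi-square score for `target`."""
    vocab = sorted({w for d in docs for w in d})
    scores = {}
    for term in vocab:
        A = sum(term in d and y == target for d, y in zip(docs, labels))
        B = sum(term in d and y != target for d, y in zip(docs, labels))
        C = sum(term not in d and y == target for d, y in zip(docs, labels))
        D = sum(term not in d and y != target for d, y in zip(docs, labels))
        scores[term] = chi_square(A, B, C, D)
    # Sort by descending score; ties are broken alphabetically.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


# Toy corpus: each document is the set of its (already segmented) words.
docs = [{"battery", "charging"}, {"battery", "motor"},
        {"camera", "image"}, {"image", "motor"}]
labels = ["high", "high", "low", "low"]

selected = top_terms(docs, labels, target="high", k=2)
print(selected)
```

The statistic measures how strongly a term is associated with a category in either direction, so a term that occurs only outside the category ("image" in the toy corpus) scores as high as one that occurs only inside it ("battery").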
In the experiment, the proposed collaborative training method was compared with the following benchmark methods:    The results of the above model are shown in Table 13. In addition, the experiment compares the number of samples of each category and optimizes the experimental parameters for the iteration rounds of collaborative training and each iteration process. The specific results of the model are shown in Figures 15 and 16. The following text will analyze and summarize the results of this part.
The experimental results show the following:
(1) The comparison of the data in Table 13 shows that, on the small training set, the Chinese patent quality evaluation model built with transfer learning is better overall than SVM, the best-performing traditional machine learning method. This proves the feasibility of using transfer learning in this field
(2) After the Chinese and English features of the two models are combined, the prediction effect on the training set improves, but because of the lack of training corpus, some overfitting occurs, and the effect of the model on the test set declines. This further shows the necessity of using collaborative training instead of splicing Chinese and English features directly
(3) The proportion of each type of data selected in each iteration of collaborative training has a large impact on the training effect. Figure 15 shows that when the same amount of data is selected for every category, the sample space learned by the model changes, which reduces the prediction effect; the model is most stable when the data are selected in the same category proportions as in the initial data set
(4) With the collaborative training algorithm, the semisupervised training method further optimizes the fit of the model on the small training set, and all the evaluation indicators of the model improve further. The PQE_MT_CHN_Co model proposed in the article reaches the highest level on all indicators. From Figure 14, it can be seen that collaborative training improves the Chinese and English models at the same time. Initially, the two models differ considerably in effect, which lowers the effect of their joint prediction. However, after multiple rounds of iteration, the prediction effects of the two models gradually approach each other, and the joint prediction of the two models is better than that of either model alone. The micro-F1 value of the final model reached 83% on the training set and 77% on the test set. This shows that the method proposed in the article improves the effect of quality evaluation to a certain extent.

Conclusion and Future Work
A patent quality evaluation model, PQE_MT, is proposed in this paper by quantifying the patent indexes from different dimensions and taking the long text part of the patent as one kind of index. The proposed model combines a quantitative index model and a sequence model based on multitask learning to predict the quality rating of a patent. The advantage of this model is that it combines the numerical attributes with the text of the patent as the evaluation indexes, so the features extracted from the patent are more comprehensive. In addition, multitask learning trains two tasks together, so that the parameter updates made through backpropagation for the two tasks promote each other and improve the effectiveness of the model. Compared with the baseline model, the PQE_MT model improves the accuracy of English patent quality assessment. The migration process mainly includes three parts: the selection of the migrated part, the cross-domain transfer, and the use of active learning to enlarge the data. The advantage of combining transfer learning and active learning is that the most informative data are selected for labeling, which reduces the time cost of manual annotation. During the experiments, the prediction effect of the model gradually improved, and finally, the Chinese patent quality evaluation model achieved good accuracy. At the end of the experiment, we compared the effects of transfer learning on Chinese and English text and found that the results were different and that the models learned different features. In the follow-up work, we consider that the two models in different languages have different characteristics that may complement each other and improve the results.
In the future, more comprehensive and more meaningful indexes should be quantified. Patent information such as patent drawings, the patent research and development cycle, and other internal information of the company is still unused in the experiment; using it may improve the prediction results and make them more convincing. Moreover, we will try different data expansion methods or semisupervised algorithms to improve the effect of the migration process.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper. ject under grant no. 6412006200404, Qin Xin Talents