Evaluation on ChatGPT for Chinese Language Understanding

ABSTRACT ChatGPT has attracted extensive attention from academia and industry. This paper aims to evaluate the Chinese language understanding capability of ChatGPT on 6 tasks using 11 datasets. Experiments indicate that ChatGPT achieves competitive results in Chinese sentiment analysis, summarization, and reading comprehension, while it is prone to factual errors in closed-book QA. Further, on two more difficult Chinese understanding tasks, namely idiom fill-in-the-blank and cants understanding, we found that a simple chain-of-thought prompt can improve the accuracy of ChatGPT in complex reasoning. Based on these results, this paper further analyzes the possible risks of using ChatGPT. Finally, we briefly describe the research and development progress of our ChatBIT.


Introduction
Large Language Models (LLMs) are among the most indispensable techniques in natural language processing research in recent years, and a great deal of academic and industrial work has been built around them: GPT-3 [1], LaMDA [2], Ernie 3.0 Titan [3], PANGU-α [4], WeLM [5], and GLM-130B. According to Kiela et al. [7], LLMs have reached or even exceeded the human average in multiple tasks such as language understanding and dialogue generation. Most studies assess the capabilities of language models by their performance on a range of specific datasets. These studies are still at an initial stage in determining whether a language model can correctly respond to multiple instructions, meet reasonable requirements, and generate content aligned with users.

ChatGPT  has demonstrated impressive language ability in the process of interacting with humans by tracking the progress of dialogue and answering questions, admitting mistakes, and correcting and rejecting unreasonable requests.There have been some studies on the impact of ChatGPT on academia and education industries, and it is generally believed that ChatGPT can improve work efficiency in a variety of fields, but it needs to be used wisely [8].On the impact of LLMs on scientific research, van Dis et al believe that AI has some benefits, but that academics need to establish more explicit accountability for using AI [9].In the scientific research process, researchers need to be more open and transparent about the usage of AI, and Thorp [10] takes a negative view on questions such as whether LLMs can be authors of papers or whether artificial intelligence can be copyrighted.
Previous work on performance testing of ChatGPT and other GPT-3.5 series models has focused on zero-shot and few-shot evaluation with public datasets, comparing the results against supervised fine-tuned models. These studies have shown that ChatGPT performs extremely well on many English datasets and is even comparable to supervised fine-tuned models on some evaluation metrics [11]. In machine translation, sentiment analysis, and fake information detection, ChatGPT surpassed SOTA performance at zero-shot, but it performed poorly on open-domain knowledge question answering and task-oriented dialogue [12], and its machine translation performance in low-resource languages is also weaker than existing translation products [14]. However, there are currently few comprehensive evaluations of ChatGPT on Chinese language understanding tasks.
In this paper, we first give a brief introduction to the GPT family models and related research on large language models. Then we use several public datasets to evaluate the Chinese language performance of ChatGPT, compare it with models developed in China, and study its specific behavior in Chinese. We find that although ChatGPT has impressive performance on classic NLP tasks, it still faces the challenge of fabricating facts due to hallucination. In addition, on two more difficult Chinese understanding tasks, we show that the performance of ChatGPT can be improved by chain-of-thought prompts.
We analyze its impact and potential risks based on the performance evaluation results. Finally, we introduce our ChatBIT, a knowledge-augmented dialogue system that uses citations to enhance the credibility of generated content and mitigate the impact of hallucination.
In summary, our main contributions include:
• We evaluate ChatGPT with zero-shot prompting on sentiment analysis, summarization, reading comprehension, closed-book question answering, cants understanding, and idiom fill-in-the-blank. We also improve ChatGPT's answer accuracy on the last two tasks by using simple chain-of-thought prompts.
https://openai.com/blog/chatgpt
All evaluation experiments are based on the gpt-3.5-turbo model. The evaluation of the cants understanding and idiom fill-in-the-blank tasks was finished during the period from April 7th to April 9th, and the rest of the experiments were completed during the period from March 5th to March 12th.

• We analyze the potential risks of ChatGPT from three aspects: hallucination, ethics, and privacy security.
• We briefly present research progress on ChatBIT, our more trusted chat system.

Preliminary of GPT Family Models
The language model is one of the central research topics in natural language processing; BERT [15], GPT [1], and other models are widely influential innovative research works among LLMs. ChatGPT is a widely influential and groundbreaking product based on GPT-3.5. This section introduces the main technological innovations behind ChatGPT.
Generative pre-trained models. The Generative Pre-trained Transformer (GPT) is an autoregressive pre-trained model that takes a sequence of tokens of the current text as input and predicts the next token. Generative models such as the RNN [18], LSTM [19], and Transformer [20] are widely used in deep learning. The basis of the GPT series of models is the decoder of the Transformer. In each layer of the decoder, multi-head self-attention can focus on different aspects of the sentence, and as the number of network layers increases, more advanced and abstract semantic information can be extracted, making the model more capable of understanding textual features. In addition to solid semantic extraction capabilities, language models based on the Transformer can also learn from unlabeled data and have better transfer learning capabilities. The GPT-1 [16] model follows the decoder structure of the Transformer. In GPT-2 [17], layer normalization is placed before each submodule, and an additional layer normalization is added after the last module.
In-context learning. When LLMs are self-supervised on a large-scale corpus, surprising pattern recognition and reasoning capabilities emerge, and in-context learning is one of the capabilities found in GPT-3 [1]. With in-context learning, no gradient backpropagation is needed; a small number of labeled samples and prompts are input into the model to guide the inference of GPT-3, which means that LLMs can automatically identify the type of task during decoding and make inferences based on the task description and samples. Noting that GPT-3 has weak code generation capabilities, OpenAI further designed Codex [21], a GPT model fine-tuned on publicly available GitHub code. The results of Codex indicate that GPT models can generate computer program code.
Instruction tuning. To give a large language model better task recognition ability in the zero-shot case, instruction tuning describes a variety of tasks in natural language and then inputs the task description and training samples into the model for fine-tuning [22]. With instruction tuning, language models can generalize their natural language understanding ability to the recognition and understanding of tasks, so that they can correctly recognize new tasks in zero-shot inference.
Chain-of-Thought. Chain-of-Thought (CoT) refers to a series of intermediate steps that make the reasoning process explicit. In large-scale language models, CoT prompts can be introduced into the context to improve an LLM's performance in arithmetic, common sense, and symbolic reasoning [37]. CoT prompts can also be automatically generated by LLMs, for example by generating code to solve problems [39] or by majority voting [38]. A large amount of high-quality labeled data can be obtained through CoT prompts, and the performance of the language model can be further improved [40].
Reinforcement learning from human feedback. The results of DialoGPT [23], an open-domain dialogue model based on GPT-2, show that it is necessary to use reinforcement learning to prevent the model from producing abnormal responses. To obtain higher quality output, WebGPT [24] uses human feedback to optimize the quality of answers generated by the model and obtains results more aligned with human preferences through imitation learning and reinforcement learning. Generative language models need to control the generated content to avoid producing controversial or incorrect text. InstructGPT [25] applies reinforcement learning from human feedback (RLHF) to control the content generated by GPT. First, supervised fine-tuning is performed on GPT-3; then a reward model is trained by having human annotators rank the quality of the model's outputs [26]; finally, the supervised fine-tuned model is trained with the PPO [27] algorithm to maximize reward. Through this training method, the model can output text more aligned with humans, avoid toxic speech, and reduce the likelihood of responding to unreasonable requests. GPT-3.5 is trained for computer code generation and instruction fine-tuning, and ChatGPT is the language model obtained after supervised instruction fine-tuning and RLHF on GPT-3.5. This makes the text generated by ChatGPT more in line with human language habits.
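The reward-model step described above is commonly trained with a pairwise ranking loss over human preference comparisons. The following is a minimal sketch under that assumption; the actual loss used for InstructGPT is described in [25, 26], and this scalar version is a deliberate simplification:

```python
import math

def pairwise_ranking_loss(r_chosen, r_rejected):
    """Negative log-sigmoid loss for training a reward model from human
    preference rankings: the loss shrinks as the reward of the preferred
    (chosen) response rises above that of the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# A pair the reward model already orders correctly incurs a small loss;
# a pair it mis-orders relative to the human ranking incurs a large one.
good = pairwise_ranking_loss(2.0, -1.0)
bad = pairwise_ranking_loss(-1.0, 2.0)
assert good < bad
```

In the full pipeline, gradients of this loss update the reward model, and the resulting scalar reward is what PPO then maximizes during fine-tuning.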

Chinese Evaluation on ChatGPT
Existing evaluations show that ChatGPT has achieved good results in machine translation [14], natural language understanding, and sentiment analysis [12]. To quantify the performance of ChatGPT in Chinese and compare it with large-scale language models developed in China, this section uses publicly available Chinese datasets to test and evaluate ChatGPT. We mainly test ChatGPT with zero-shot learning on sentiment analysis, automatic summarization, reading comprehension, and closed-book question answering; on the idiom fill-in-the-blank and cants understanding tasks, we conduct additional chain-of-thought reasoning tests. Note that in the evaluation results, for all models other than ChatGPT we directly use the published results; although the prompts used by these models differ, we assume that these publicly available results reflect the best performance of the baseline models. In addition, the system message and hyperparameters used in our experiments follow the default settings of the OpenAI API.

Sentiment Analysis
Sentiment analysis is a classic NLP task in which the model needs to determine the sentiment category of the input context. This paper uses the test sets of ChnSentiCorp and EPRSTMT [28], in which each sample is labeled with one of two sentiment polarities, positive or negative; the results are shown in Table 1. ChatGPT demonstrates competitive performance on sentiment analysis tasks. We observed a bias in ChatGPT's understanding of context: for example, even when the prompt states that the input text contains only positive and negative sentiment, "neutral" still appears in ChatGPT's output. In addition, ChnSentiCorp contains some special samples in which each sentence has a different sentiment tendency; in this case, the advantage we find in a generative model like ChatGPT over a discriminative model is that ChatGPT can automatically analyze the sentiment sentence by sentence and return the result, as shown in Table 2:
ChatGPT: 第一句话是积极的，表达了受益匪浅的感受；第二句话是消极的，表达了无比郁闷的感受。 (The first sentence is positive, expressing a feeling of benefit; the second sentence is negative, expressing a feeling of great depression.)
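Since ChatGPT returns free-form text rather than a label, its reply has to be mapped back onto the two gold polarities before scoring. The function below is an illustrative sketch of one reasonable mapping scheme, not the exact post-processing used in our experiments; in particular, it surfaces the stray "neutral" replies noted above instead of silently scoring them:

```python
def parse_sentiment(output_text):
    """Map a free-form reply onto the two gold labels: 1 = positive
    (contains 积极 only), 0 = negative (contains 消极 only), and None
    for mixed or off-label replies such as "中性" (neutral)."""
    has_pos = "积极" in output_text
    has_neg = "消极" in output_text
    if has_pos and not has_neg:
        return 1
    if has_neg and not has_pos:
        return 0
    return None

assert parse_sentiment("这段文字的情感是积极的。") == 1  # "this text is positive"
assert parse_sentiment("中性") is None                    # off-label "neutral" reply
```

Replies mapped to None can then be counted separately, which is how the frequency of off-label outputs was tracked.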

Summarization
The automatic text summarization task requires the language model to perform inference over a long input text and output a concise and accurate summary. The task is divided into extractive and abstractive summarization; the experiments in this paper mainly focus on abstractive summarization and use the test sets of LCSTS [29] and TTNews [30]. Considering that some samples in TTNews are too long for ChatGPT to return results, we randomly sampled 1k samples with fewer than 1k Chinese characters. The evaluation metric is ROUGE-1 [35].
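As background for the scores in Table 3, ROUGE-1 measures unigram overlap between a generated summary and the reference. The sketch below assumes character-level tokens (the usual choice for Chinese) and F1 aggregation; real implementations differ in tokenization and in whether precision, recall, or F1 is reported:

```python
from collections import Counter

def rouge1_f(candidate_tokens, reference_tokens):
    """ROUGE-1 as F1: harmonic mean of unigram precision and recall
    computed from the multiset overlap of the two token sequences."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# For Chinese, tokens are typically individual characters.
score = rouge1_f(list("微信冲击电信业务"), list("微信冲击传统电信业务"))
assert 0.0 < score < 1.0  # high overlap, but reference has extra characters
```

Note the `Counter` intersection clips repeated characters, so a candidate cannot inflate its score by repeating a matching unigram.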
For summarization and headline generation on news texts, ChatGPT achieves excellent performance, but we noticed that on the LCSTS dataset, if the prompt is changed to "生成30字以内的摘要:" ("generate a summary of under 30 words:"), the average length of the output summary reaches 70 words, far beyond the "less than 30 words" limit in the prompt. So we changed the prompt to "Generate headlines for the following news:", and the text generated by ChatGPT comes closer to the length of the ground truth. The results are shown in Table 3, with an example below:
Ground-truth: "微信系"快速崛起冲击传统电信业务 (The rapid rise of "WeChat" and others impacts traditional telecom business)
ChatGPT: "免费通讯软件"对电信传统业务造成巨大冲击，全球电信运营商短信营收或将损失540亿美元。 ("Free communication software" has caused a huge impact on traditional telecommunications business, and global telecom operators may lose US$54 billion in SMS revenue.)

Machine Reading Comprehension
Machine Reading Comprehension (MRC) is one of the mainstream tasks for assessing the comprehension and reasoning ability of LLMs. In extractive MRC, given a context and a question, the language model needs to identify a subsequence of the context that answers the question, and performance is evaluated by comparing the matching score between the output sequence and the ground truth. Considering that ChatGPT is a generative language model, this paper converts the extraction task into a generation task and tests the zero-shot performance of ChatGPT by constructing the same prompt template and hyperparameter settings as in PANGU-α [4]. The experiment is performed on the validation sets of CMRC2018 (Simplified Chinese) and DRCD [32] (Traditional Chinese). The evaluation metrics are EM (Exact Match) and F1.
Common-sense MRC means that the answer to the question is not necessarily given explicitly in the context; the model needs to perform common-sense reasoning over the context and select the most suitable answer among the many candidates provided. In this paper, the validation set of C3 [33] is used for evaluation, with accuracy as the metric. Test results are in Table 5. The reading comprehension results show that Ernie 3.0 Titan, with a larger parameter scale of 260B, has a greater advantage than PANGU-α with 13B parameters and WeLM with 10B parameters, confirming that larger general models generally perform better on generative tasks. Locally, ChatGPT is comparable to Ernie 3.0 Titan in reading comprehension in both Simplified and Traditional Chinese, while overall Ernie 3.0 Titan has the advantage. The reason why ChatGPT's EM score is low while its F1 is higher is presumably due to its training method: EM requires two texts to match exactly at the character level. ChatGPT is trained with RLHF to generate question-and-answer text that takes human users' preferences into account, that is, to produce semantically coherent sentences rather than give answers directly. Table 6 is an intuitive example. The ground truth is a segment extracted directly from the context and may not form complete sentences grammatically or semantically. For long input contexts, the table shows only the beginning and omits the rest.

Closed-book Question Answering
Closed-book question answering gives no context for the answer in the input; the language model needs to answer based on the knowledge and common sense learned from the pre-training data. The experiments in this paper use the test sets of WebQA [34] and CKBQA. The evaluation metrics are EM/F1 or accuracy, and the results are shown in Table 7. Closed-book question answering requires the model to have a certain reserve of Chinese knowledge and common sense, and ChatGPT is more prone to factual errors on this task.
We further analyzed the output text of ChatGPT on the WebQA test set and found that only 56.96% of the labeled correct answers were contained in its output text, which means that in about half of the cases ChatGPT makes mistakes; Table 8 is an example of a closed-book question answer. We speculate that the reason may be that the Chinese training corpus used is small in scale or of poor quality.
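The containment criterion behind the 56.96% figure can be sketched as a simple substring check over the model's free-form outputs. The two QA pairs below are invented for illustration and are not taken from WebQA:

```python
def containment_accuracy(predictions, gold_answers):
    """Fraction of questions whose labeled answer string appears
    anywhere in the model's free-form output text."""
    hits = sum(1 for pred, gold in zip(predictions, gold_answers)
               if gold in pred)
    return hits / len(gold_answers)

# Hypothetical examples: the first output contains the gold answer
# (曹雪芹, author of Dream of the Red Chamber); the second states a
# wrong elevation, so the gold string "8848" is absent.
preds = ["《红楼梦》的作者是曹雪芹。", "珠穆朗玛峰高约8850米。"]
golds = ["曹雪芹", "8848"]
print(containment_accuracy(preds, golds))  # → 0.5
```

This is a deliberately loose criterion: it credits any output that mentions the answer, so the true EM-style accuracy is necessarily lower than the containment rate.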

Chain-of-Thought Enhances Chinese language understanding
To understand cants on the Chinese Internet, one usually needs a certain understanding of Chinese homophones, character shapes, and common sense. The DogWhistle dataset [41] comes from the adapted online puzzle game Decrypto. In this game, four players are divided into two groups; each group of players needs to try to understand the code words sent by teammates and crack the code words of the enemy. We tested ChatGPT's ability to associate and interpret code words using the validation set of its Insider subset.

The idiom comprehension and fill-in-the-blank task requires a language model to fill in the appropriate idioms from a set of candidates into the blanks of a given sentence based on their meanings. This task requires the model to have a fuller understanding of the meanings of Chinese idioms. Experiments are evaluated using the validation set of the Chinese idiom reading comprehension fill-in-the-blank dataset ChID [28].
Although CoT prompts are usually used in natural language tasks such as arithmetic and reasoning [37], our experiments show that CoT prompts can be used to improve the ability of ChatGPT in cants and idioms understanding.Specifically, we first ask ChatGPT to explain the meaning of each cant or idiom through prompts, and then let it choose or fill in the blanks according to the previous explanation.
Our experimental results are shown in Table 9, with accuracy as the evaluation metric. The results show that ChatGPT achieves higher accuracy with CoT prompts on these two difficult tasks, but it still falls short of supervised fine-tuned models. The following are input examples for evaluation; the first round of dialogue in each example is used to generate the CoT reasoning process, while the zero-shot group without CoT directly takes the second round of dialogue as input.
In cants understanding, we first prompt ChatGPT to explain the meaning of each candidate cant, and then prompt it to guess the meaning of the code word based on the context and the code word itself. In the idiom fill-in-the-blank task, we first prompt ChatGPT to explain the meaning of each idiom and then fill in the blanks; these two rounds of dialogue simulate the thinking process of an ordinary person through CoT prompts. In both tasks, CoT prompts make ChatGPT more accurate than directly giving the answers.
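For a chat-style API such as gpt-3.5-turbo, the two-round setup can be sketched as a small message-building helper. The Chinese prompt wording below is a paraphrase for illustration, not the exact prompt used in our experiments:

```python
def build_cot_rounds(candidates, question):
    """Build the two user turns of the CoT setup: round 1 asks the
    model to explain every candidate phrase, round 2 asks the actual
    question so the answer can condition on those explanations.
    The zero-shot baseline sends only the round-2 message."""
    listing = "，".join(candidates)
    round1 = {"role": "user",
              "content": f"请分别解释以下每个词语的含义：{listing}"}
    round2 = {"role": "user", "content": question}
    return [round1, round2]

msgs = build_cot_rounds(
    ["矛", "奇异果", "音乐家", "丸子"],
    "暗号\"盾牌\"与候选集中哪个短语最相关？只输出一个结果。")
assert len(msgs) == 2 and all(m["role"] == "user" for m in msgs)
```

In practice, the assistant's reply to round 1 is appended to the message list before round 2 is sent, so the explanations remain in context when the model answers.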

Table 10.An Example of Cant Understanding.
User: 现在我们玩一个游戏，给定以下短语候选集："矛，奇异果，音乐家，丸子"\n请发挥想象力解释候选集中每一个短语可能的含义。(Now we play a game, given the following candidate set of phrases: "spear, kiwi fruit, musician, meatball"\n Please use your imagination to explain the possible meaning of each phrase in the candidate set.)
ChatGPT (translated): Okay, let me explain:\n\n1. Spear: The spear is an ancient weapon, which can also be compared to sharp criticism or accusation. So the possible meaning of this phrase is that someone or something has been accused or criticized.\n\n2. Kiwi: The kiwi is a fruit that has a unique look and taste. So the possible meaning of the phrase is that someone or something has unique, distinctive characteristics.\n\n3. Musician: A musician refers to a person engaged in music creation, performance, and other related work. So the possible meaning of this phrase is that someone is a musician, or is related to a musician in a certain scene.\n\n4. Meatballs: Meatballs are a kind of Chinese food, usually made by mixing meat, fish, vegetables, etc. into balls. So the possible meaning of this phrase is that someone is eating or cooking meatballs.
User: 对于候选集"矛，奇异果，音乐家，丸子"\n请你找出暗号"盾牌"可能与候选集中的哪个短语有关联。你只能输出一个最相关的结果。(For the candidate set "spear, kiwi fruit, musician, meatball"\n Please find out which phrase in the candidate set the password "shield" might be associated with. You can output only one most relevant result.)
ChatGPT: 经过分析，我认为暗号"盾牌"最可能与候选集中的短语"矛"有关联。因为矛和盾牌在古代战争中通常是一起使用的，可以互相配合进行攻防。所以这个短语与"盾牌"有一定的联系。(After analysis, I think the password "shield" is most likely related to the phrase "spear" in the candidate set, because spears and shields were usually used together in ancient warfare and can cooperate with each other for offense and defense. So this phrase has a certain connection with "shield".)
For idiom fill-in-the-blank, explaining the meaning of each idiom is more helpful to the model, because the meaning of an idiom is highly semantically related to the surrounding sentence. For cants understanding, the required range of association is vaguer: in the first round of dialogue, the meanings ChatGPT associates with the candidate phrases are not necessarily related to the code word.
In the experiments, we noticed that the choice of CoT prompt is very important for ChatGPT to generate high-quality replies. For example, in the idiom fill-in-the-blank task, if we replace "explain the meaning of each idiom separately" with "make a sentence with each idiom separately", the accuracy drops to 45.0%. The following is an example of using a different CoT prompt on the same sample as Table 11.
The hallucination of generative models can be observed in this example, where the model generates fluent but meaningless text [42]. During training, the model memorizes some frequently occurring segments.
User: According to your explanation, choose the most appropriate one from the candidate set of idioms: "大逆无道，正经八百，熙熙攘攘，穿凿附会，身临其境，麻痹大意，势如破竹" to replace the #idiom# in the following example: \n In this article, Google examines the technology behind this process and demonstrates how Cinematic Photos can transform past 2D photos into more #idiom# 3D animations. Depth Estimation Like many of the latest computational photography features, such as portrait mode and augmented reality (AR), cinematic photos require a depth map to provide information about the 3D structure of the scene.
ChatGPT: 在本文中，Google将研究此过程背后的技术，并演示Cinematic照片如何将过去的2D照片转换为更具深度估计的3D动画。深度估计像许多最新的计算摄影功能(例如人像模式和增强现实(AR))一样，电影照片也需要深度图来提供有关场景3D结构的信息。(In this article, Google examines the technology behind this process and demonstrates how Cinematic Photos can transform past 2D photos into more depth-estimating 3D animations. Depth Estimation Like many of the latest computational photography features, such as portrait mode and augmented reality (AR), cinematic photos require a depth map to provide information about the 3D structure of the scene.)

Risk Analysis
In the performance tests, we observed that ChatGPT frequently makes errors in Chinese knowledge and common-sense Q&A, and that it uses fluent sentences to state fabricated facts, which can confuse groups with limited sources of information. This section analyzes the potential usage risks of ChatGPT based on the previous performance tests.

Risk of Hallucination.
The issue of hallucination in generative language models is currently a critical concern, as it undermines the credibility of the content produced by such models.Unfaithful or nonsensical hallucination text may mislead users, and lead to community concerns about the usage of language models.

ChatGPT, as a generative language model, may not be able to correctly answer academic questions and some common-sense questions, which means that the general language model cannot replace the dedicated model in the professional fields for the time being.In addition, because the content generated by LLMs such as ChatGPT may contain factual errors, thus, improper use in the scientific process may distort scientific facts and spread misinformation.
Ethical Risks. The development and application of LLMs will inevitably be accompanied by ethical issues. On the one hand, the content generated by LLMs may lead a creator to inadvertently plagiarize the work of others, creating a risk of intellectual property infringement; on the other hand, the reasoning and creative abilities of LLMs may be used to aid illegal acts such as cheating in exams. It is necessary to restrict the use of LLMs more strictly in education, because the information contained in LLMs is difficult to analyze quantitatively, and the impact of the generated content on students' values is uncontrollable. Additionally, because students can easily get answers from LLMs, they may be inclined to use LLMs to complete homework and cheat on exams, which will affect their ability to think independently.
Risks of Privacy Security. Users need to upload the content of their conversations to the server to obtain the content generated by LLMs. These conversations may contain information such as the user's personality, preferences, and even identity, and users may inadvertently disclose this information to service providers while working with LLMs. Therefore, how to balance the use of LLMs with the protection of personal information may be a problem that AGI will face in the future.

Future Work
To address the challenges of LLMs and ensure trusted and controllable generated content, we are developing ChatBIT with a focus on knowledge extraction, reasoning, and reliable citation of content. The objective is to achieve believable, instant, and transparent LLMs. The model supports domain knowledge question answering, multi-round dialogue, information retrieval, and other functions. It also supports multiple languages such as Chinese, English, and Japanese, and further languages can be added according to requirements. Currently, the system provides web and API services and supports multiple users.
The entire development process of ChatBIT, including data preparation, training, and inference, is secure and dependable. In addition, we use mixed-precision training [43], model quantization, and pruning to achieve computational acceleration and reduce the time complexity of training and inference. The domain data and instructions we utilize for fine-tuning are of high quality, including information about the Beijing Institute of Technology and a Chinese knowledge encyclopedia. Fully autonomous and controllable code is utilized for model construction and training, while a controllable scheme is employed for the external service. Table 13 is an example of a trusted citation from ChatBIT.

Table 1 .
Evaluation Results of Sentiment Analysis.

Table 2 .
An Input/Output Example of Sentiment Analysis Task.
Prompt: 判断以下句子的情感是积极或是消极：{Context}\n (Judge whether the sentiment of the following sentences is positive or negative: {Context}\n)
Context: 看过于丹解读论语的视频，受益匪浅，想要买本书来收藏。明明都提交成功了，却没到货，无比郁闷…… (I have benefited a lot from watching the video of Yu Dan interpreting the Analects of Confucius, and I want to buy a book for my collection. All were submitted successfully, but the book did not arrive, which made me feel extremely depressed...)
Ground-truth: 0 (消极, negative)
Table 4 includes examples of input and output samples.

Table 3 .
Evaluation Results of Chinese Text Summary.

Table 4 .
An Input/Output Example of Text Summary Task.
Input context (translated): With the rise of WeChat and others, the traditional telecom business is facing a huge impact. As of the end of 2012, the national SMS sending volume increased by only 2.1% year-on-year, far lower than in previous years. According to Ovum, a well-known telecom consulting company, due to the popularization of free messaging software, global telecom operators' SMS revenue loss reached USD 23 billion in 2012 and may increase to USD 54 billion in 2016.

Table 5 .
Evaluation Results of MRC.

Table 6 .
An Input/Output Example of MRC Task.

Table 7 .
Evaluation Results of Chinese Closed Book QA.

Table 9 .
Evaluation Results Using Zero-shot or CoT on Cants and Idiom Understanding Task.

Table 8 .
An Input/Output Example of Chinese Closed Book Question and Answer Task.

Table 12 .
An Example of Making a Sentence using an Idiom before Fill-in-the-blank. 1. This person is always bragging, not serious at all. 2. The shopping mall is really crowded with people today, and I can hardly breathe. 3. His reasoning is always far-fetched and unconvincing. 4. Watching this movie is immersive, as if you are in the plot of the movie. 5. He usually drops his guard, lacks vigilance, and only when he takes the exam does he find out that he has not studied well. 6. This team has been unstoppable all the way, winning several games in a row.

Table 13 .
A QA Example of ChatBIT with Trusted Citation.
User: 讲一下北京理工大学的历史。 (Tell me about the history of Beijing Institute of Technology)
ChatBIT (translated): Beijing Institute of Technology is a national key university established by the Ministry of Industry and Information Technology of China at the vice-minister level, and is one of the key universities of the "Double First Class", "211 Project" and "985 Project". It is also a member of the National Association of Universities, the National Higher Military Industry Colleges and Universities Curriculum Civic Government Alliance, the China Joint Association for Artificial Intelligence Education, the Alliance of Excellence Colleges and Universities, and the Yanhe University Talent Training Alliance. \nThe history of Beijing Institute of Technology dates back to 1940, when it was the Academy of Natural Sciences in Yan'an; in 1949, after the founding of New China, it was repositioned as a specialized industrial school in Beijing……