Survey of Knowledge Graph Approaches and Applications

Abstract: With the advent of the big data era, knowledge engineering has received extensive attention, and extracting useful knowledge from massive data is the key to big data analysis. Knowledge graph technology is an important part of artificial intelligence: it provides a way to extract structured knowledge from massive texts and images and has broad application prospects. A knowledge base with semantic processing capability and open interconnection ability can generate application value in intelligent information services such as intelligent search, intelligent question answering and personalized recommendation. Although knowledge graphs have been applied in various systems, their basic theory and application technology still need further research. After expounding the definition and architecture of the knowledge graph, this paper reviews the key technologies of knowledge graph construction, covering research progress in four core technologies (knowledge extraction, knowledge representation, knowledge fusion and knowledge reasoning), as well as some typical applications. Finally, the future development directions and challenges of the knowledge graph are discussed.


Introduction
With the rapid development of the Internet, cloud computing, the Internet of Things, big data and the global economy, network data content is growing explosively, and people's demand for smart technology is increasing. Hence, it is necessary to dig deeper into knowledge content and provide smarter services. Moreover, the large-scale, heterogeneous and diversified structure of Internet content makes it challenging for people to obtain information and knowledge effectively. The knowledge graph, with its powerful semantic processing ability and open organization ability, has opened up an avenue for knowledge-based organization and intelligent applications in the Internet era.
To improve the quality of the answers returned by search engines and the efficiency of user queries, Google proposed the concept of the knowledge graph in 2012 and used this technology to build the next generation of intelligent search engines. Its birth is in line with the development trend of computer science and the Internet [Chuan, Xing and Jun (2019)]. Knowledge graphs bring vitality to semantic search on the Internet and also display powerful capabilities in question answering systems. Current knowledge-base search, recommendation and question answering are all based on knowledge graphs. Knowledge graph technology has passed through the development stages of the semantic network, description logic and ontology. A knowledge graph is a relationship graph obtained by associating different kinds of knowledge. It is essentially a semantic-network knowledge base that describes the concepts and entities of the objective world, and the relationships among them, in a structured form. It can be seen as an overlay network built on top of the current Web: it establishes links between the concepts on Web pages and organizes the information accumulated on the Internet into usable knowledge at minimal cost [Huang, Yu, Liao et al. (2019)].
The idea behind the knowledge graph can be traced back to the expert system, which was born in the 1970s. An expert system is a program system that holds a large amount of specialized knowledge and experience. Applying artificial intelligence and computer technology, it reasons and makes judgments based on the knowledge and experience provided by one or more experts in a certain field, simulating the decision-making process of human experts in order to solve complex problems that would otherwise require them. Tim Berners-Lee, the father of the World Wide Web, proposed the Semantic Web in 1998. The Semantic Web is an intelligent network that can make judgments according to semantics and enable barrier-free communication between people and computers. It is like a giant brain with a very high degree of intelligence and strong coordination ability. In 2006, Tim Berners-Lee proposed the concept of Linked Data, in which data is not only published on the Semantic Web but also linked to other data, forming a huge Linked Data network [Xu, Sheng, He et al. (2016)].
Despite the rapid development of artificial intelligence in recent years, computers still struggle to obtain the semantic information of Web texts. With the knowledge graph as an aid, a machine can understand the meaning behind the text, and a search engine can gain insight into the semantics behind a user query, return more accurate structured information, and express the information on the Internet in a form closer to the human cognitive world. This makes it more likely to meet the user's query needs and provides a better ability to organize, manage and understand the vast amount of information on the Internet. At present, knowledge graph technology plays an important role in intelligent search, intelligent question answering, intelligent recommendation, intelligence analysis, anti-fraud, user input disambiguation, social networks, finance, medicine, e-commerce, education and scientific research [Li and Hou (2017)].
This paper first briefly introduces the concept of the knowledge graph and its historical origin, and then introduces its key technologies. Following the process of knowledge graph construction, these are divided into four technologies: knowledge representation, information extraction, knowledge fusion and knowledge reasoning. Finally, typical applications of the knowledge graph in various fields of the current information age are introduced.

Key technology of knowledge graph
The architecture of the knowledge graph, that is, the structure of its construction pattern, is shown in Fig. 1. The part in the dotted-line box is the construction process of the knowledge graph, which also covers its update process. Building a knowledge graph starts by extracting knowledge elements such as entities, relationships and attributes from published semi-structured and unstructured data and from third-party structured databases, and storing them in the data layer and pattern layer of the knowledge base. The knowledge elements are then represented by effective means to facilitate further processing. Next, the ambiguity between referring expressions and fact objects such as entities, relationships and attributes is eliminated, forming a high-quality knowledge base. Finally, implicit knowledge is mined from the existing knowledge base to enrich and extend it. Construction thus consists of four processes: information extraction, knowledge representation, knowledge fusion and knowledge reasoning, and each update iteration includes these four stages. This paper focuses on these four processes and explains the key technical means of building a knowledge graph [Liu, Li, Duan et al. (2016)].
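As a minimal sketch of the data layer described above, facts can be held as (head, relation, tail) triples with a simple index on the head entity. The class name `TripleStore`, its methods and the sample facts are illustrative assumptions and are not taken from any system surveyed here.

```python
from collections import defaultdict


class TripleStore:
    """Toy in-memory data layer holding (head, relation, tail) facts."""

    def __init__(self):
        self.triples = set()
        self.by_head = defaultdict(set)  # head -> {(relation, tail)}

    def add(self, head, relation, tail):
        """Store one fact; adding the same fact twice has no effect."""
        self.triples.add((head, relation, tail))
        self.by_head[head].add((relation, tail))

    def query(self, head, relation):
        """Return all tails t such that (head, relation, t) is stored."""
        return sorted(t for r, t in self.by_head[head] if r == relation)


store = TripleStore()
store.add("Peking University", "located_in", "Beijing")
store.add("Beijing", "capital_of", "China")
store.add("Peking University", "located_in", "Beijing")  # duplicate fact
```

Here `store.query("Peking University", "located_in")` looks up the stored tail entities for one head and relation. A production knowledge base would typically use a graph database or an RDF store instead of Python sets.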

Knowledge representation technology
Knowledge representation is the foundation of knowledge graph construction and application, and it has been widely used in natural language processing, image recognition and other areas. However, knowledge representation based on triples cannot fully represent the semantic relations between entities, and it suffers from problems such as high computational complexity, low reasoning efficiency and data sparsity. Knowledge representation learning addresses these problems by representing the research object as a low-dimensional dense vector; in the low-dimensional space, the closer two objects are, the more similar they are semantically. It maps entities and relationships from different sources into the same continuous, dense, low-dimensional vector space while preserving the structure and semantic relationships of the graph. This reduces the high-dimensional and heterogeneous nature of the knowledge graph, integrates heterogeneous knowledge, and makes semantic similarity calculation efficient. Because each entity is mapped to a dense vector, the problem of data sparsity is effectively alleviated. The resulting representations can be used in various downstream learning tasks such as knowledge graph completion, relationship extraction and intelligent Q & A. Knowledge representation learning models can be divided into the following categories: distance-based translation models, semantic matching models, matrix decomposition models and neural network models [Cao and Zhao (2015)].
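The distance-based translation idea can be illustrated with a TransE-style score, in which a fact (h, r, t) is considered plausible when the vector h + r lies close to t. TransE is named here only as a well-known representative of this family; the text above does not single it out. The two-dimensional vectors below are hand-picked assumptions for illustration, not learned embeddings.

```python
import math


def transe_score(h, r, t):
    """L2 distance ||h + r - t||; a smaller value means a more plausible fact."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hand-picked 2-d embeddings (assumed for illustration, not trained).
entity = {
    "Beijing": [1.0, 0.0],
    "China": [1.0, 1.0],
    "Russia": [3.0, 1.0],
}
relation = {"capital_of": [0.0, 1.0]}

# Score a true fact and a corrupted fact with the same head and relation.
good = transe_score(entity["Beijing"], relation["capital_of"], entity["China"])
bad = transe_score(entity["Beijing"], relation["capital_of"], entity["Russia"])
```

With these vectors the true fact receives the smaller distance. In an actual model the vectors would be trained so that observed triples score lower than corrupted ones.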

Information extraction technology
Knowledge graph data sources include text, images, sensors, video and so on. In general, the data can be divided into data obtained from web pages and data extracted from databases. Information extraction obtains usable knowledge units, including entities, entity attributes and the relationships between entities, from the vast amount of data on the Internet and from open industry data. Data from different sources and with different structures call for different extraction methods, and the results are stored in the knowledge graph as structured knowledge. The categories to be extracted can be adjusted according to the requirements of the knowledge graph. Relation extraction extracts the semantic relations among multiple entities [Ou (2018)]. How to extract the entities, attributes and relations needed to construct a knowledge graph from different data sources is a key technology of knowledge graph construction. The more complete the extracted knowledge is, the more comprehensive the constructed knowledge graph will be and the higher its utilization value. The difficulty of information extraction lies in dealing with unstructured data. The first step is named entity recognition, which extracts entities from the text and classifies each of them. This is a relatively mature technology, and off-the-shelf tools are available for it. The second step is relation extraction, which extracts the relationships between entities from the text. Two harder problems arise during named entity recognition and relation extraction. One is entity unification: some entities are written differently but actually refer to the same entity. Entity unification reduces both the number of distinct entities and the sparsity of the graph. The other is anaphora resolution, which determines the entity that each pronoun in the text refers to [Zhuang, Li and Feng (2016)].
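The following rule-based sketch shows how relation extraction and entity unification fit together. The regular expression, the alias table and the function names are hypothetical; practical systems usually rely on trained named entity recognition and relation extraction models rather than a single pattern.

```python
import re

# Hypothetical alias table used for entity unification.
ALIASES = {"PKU": "Peking University", "Beida": "Peking University"}

# Hypothetical pattern: "<head> is located in <tail>".
PATTERN = re.compile(r"(?P<head>[A-Z][\w ]*?) is located in (?P<tail>[A-Z][\w ]*)")


def unify(name):
    """Map surface forms that denote the same entity to one canonical name."""
    name = name.strip()
    return ALIASES.get(name, name)


def extract_triples(text):
    """Return (head, relation, tail) triples matched by PATTERN, one per sentence."""
    triples = []
    for sentence in text.split("."):
        match = PATTERN.search(sentence)
        if match:
            triples.append(
                (unify(match["head"]), "located_in", unify(match["tail"]))
            )
    return triples


text = "PKU is located in Beijing. Peking University is located in Beijing."
triples = extract_triples(text)
unique_triples = sorted(set(triples))
```

Both sentences yield the same triple once "PKU" is unified with "Peking University", which is how unification reduces the number of distinct entities in the graph.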

Knowledge fusion technology
Knowledge fusion is similar to ontology integration. Because the data sources used in knowledge extraction are diverse, the quality of the knowledge may be uneven, knowledge from different sources may be duplicated, and the hierarchical structure may be missing.
Knowledge fusion eliminates the ambiguity between referring expressions and fact objects such as entities, relations and attributes. It integrates, disambiguates, processes, verifies and updates heterogeneous knowledge from different sources under the same framework and specification, so as to fuse data, information, methods, experience and human thought into a high-quality knowledge base. Knowledge fusion is divided into:
(1) entity alignment: determining whether multiple entities in the same or different data sets refer to the same object in the objective world, which solves the problem of one entity having multiple names.
(2) attribute value filling: for an attribute with conflicting values, deciding on a relatively accurate value according to the number and reliability of the data sources.
In the context of big data, because of the size of the knowledge base, entity alignment faces the following three challenges:
(1) computational complexity. The computational complexity of the matching algorithm grows quadratically with the size of the knowledge base, which is difficult to accept.
(2) data quality. Because different knowledge bases are constructed for different purposes and in different ways, there may be problems such as uneven knowledge quality, similar and repeated data, isolated data, and inconsistent data granularity.
(3) prior training data. Such prior data is very difficult to obtain for a large-scale knowledge base, and researchers usually need to construct it manually.
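The two fusion tasks can be sketched as follows. Entity alignment compares names only within a block, which is one common way to avoid comparing every pair of entities, and attribute value filling picks the value with the highest total source reliability. The blocking key, the similarity threshold, the reliability scores and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(name):
    """Cheap blocking key: the lowercased first character of the name."""
    return name.strip().lower()[:1]


def align(names_a, names_b, threshold=0.8):
    """Pair names whose similarity reaches threshold, comparing only within blocks."""
    blocks = defaultdict(list)
    for b in names_b:
        blocks[block_key(b)].append(b)
    pairs = []
    for a in names_a:
        for b in blocks[block_key(a)]:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b))
    return pairs


def resolve_value(candidates):
    """Pick the attribute value with the highest summed source reliability."""
    weight = defaultdict(float)
    for value, reliability in candidates:
        weight[value] += reliability
    return max(weight, key=weight.get)


pairs = align(["Leonardo da Vinci", "Russia"], ["Leonardo Da Vinci", "Rome"])
# Three hypothetical sources report a founding year with a reliability score.
founded = resolve_value([("1898", 0.9), ("1898", 0.6), ("1912", 0.7)])
```

String similarity is only a stand-in here; alignment methods in practice also use attributes, neighbors in the graph, or learned embeddings.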

Knowledge reasoning technology
Knowledge reasoning is an important means and a key link in the construction of a knowledge graph: through it, new knowledge can be derived from existing knowledge. Because data sources are incomplete and the extraction process is inaccurate, the facts already in the knowledge graph and reasoning technology must be used to further mine missing and deeper entities and relationships from the Semantic Web and other knowledge bases, so as to complete and denoise the knowledge graph and thereby enrich and improve it [Qi, Gao and Wu (2017)]. Knowledge reasoning derives new relationships between entities from a given knowledge graph and plays an important role in knowledge computing tasks such as knowledge classification, knowledge verification, link prediction and knowledge completion. At present, the methods of knowledge reasoning mainly include:
(1) reasoning based on traditional methods. This includes traditional rule-based reasoning and ontology-based reasoning.
(2) single-step reasoning. This includes: a) reasoning based on distributed representation (translation-based representation reasoning, representation reasoning based on tensor/matrix decomposition, representation reasoning based on spatial distribution); b) reasoning based on neural networks; c) mixed reasoning (mixing rules with distributed representation reasoning, mixing neural networks with distributed representation reasoning).
(3) multi-step reasoning. This includes: a) rule-based reasoning (based on global structure or on local structure); b) reasoning based on distributed representation; c) reasoning based on neural networks (neural networks that model multi-step paths, neural networks that simulate computer or human reasoning); d) mixed reasoning (mixing the path ranking algorithm (PRA) with distributed representation reasoning, mixing rules with distributed representation reasoning, mixing rules with neural network reasoning) [Fu, Lu and Yan (2018)].
Different reasoning methods have different reasoning abilities. In general, mixed multi-step reasoning performs better than mixed single-step reasoning. However, current mixed reasoning is still limited to mixing two methods. Future research on knowledge reasoning is therefore mainly oriented toward reasoning over multiple relations, reasoning that fuses multi-source information and multiple methods, reasoning based on few-shot learning, and dynamic knowledge reasoning, with the aim of further improving the speed and efficiency of reasoning and providing users with up-to-date and accurate knowledge in time [Guan, Jin, Jia et al. (2018)].
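Rule-based reasoning can be sketched with forward chaining over a single transitivity rule. The rule, the relation name `located_in` and the sample facts are assumptions chosen for illustration; the methods listed above cover far richer rule sets and learned models.

```python
def infer_located_in(triples):
    """Apply located_in(x, y) and located_in(y, z) => located_in(x, z)
    repeatedly until no new fact is produced (forward chaining)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        located = [(h, t) for h, r, t in facts if r == "located_in"]
        for x, y in located:
            for y2, z in located:
                new_fact = (x, "located_in", z)
                if y == y2 and new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


known = {
    ("Peking University", "located_in", "Haidian District"),
    ("Haidian District", "located_in", "Beijing"),
    ("Beijing", "located_in", "China"),
}
closed = infer_located_in(known)
new_facts = closed - known
```

The fact that Peking University is located in China only appears on the second pass, after an intermediate fact has been derived, which is a small instance of multi-step reasoning.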

Application
The knowledge graph provides a more effective way to express, organize, manage and utilize the massive, heterogeneous and dynamic big data on the Internet. It can intelligently process massive information and form a large-scale knowledge base that supports business applications, so that machines can better understand the network, users and resources and provide users with new intelligent services. The application of knowledge graphs is a current research hotspot in the information field and one of the basic technologies promoting the development of artificial intelligence. Currently, knowledge graphs have been applied in semantic search, intelligent question and answer, intelligent recommendation and some vertical industries.

Semantic search
A knowledge graph is a formal representation of the objective world that maps strings to things in that world (entities, events and the relationships between them). With the support of a knowledge graph, current keyword-based search technology can be raised to entity- and relationship-based search, which is called semantic search. A traditional search engine searches the web pages in its back-end database for the keywords entered by the user and returns links to the pages that contain them. Semantic search first maps the keywords entered by the user to one or more entities or concepts in the knowledge graph, then analyzes and reasons over the concept hierarchy of the knowledge graph to capture the user's search intention accurately, and finally returns an answer that satisfies this intention, together with a wealth of relevant knowledge, rather than links to pages containing the keywords. After Google proposed semantic search, Baidu's "Zhixin" and Sogou's "zhicube" were also committed to using knowledge graph technology to improve the user's search experience. At present, semantic search based on knowledge graphs can achieve the following:
(1) Provide structured search results in the form of knowledge cards. For example, when users search for Peking University, the knowledge card shows the address, postcode, profile, founding year and other relevant information of the university [Cao and Zhao (2015); Yang (2018); Sun, Chang and Zhu (2018)].
(2) Understand questions that users describe in natural language and give corresponding answers, that is, simple intelligent Q & A. For example, when users type the question "what's the biggest country in the world?" into the search box, the results page gives accurate information about Russia, as shown in Fig. 2.

Figure 2: Baidu's search results
[Figure content: Baidu answers that the biggest country in the world is Russia (area: 17,098,200 square kilometers), followed by an encyclopedia summary of the Russian Federation.]
(3) Expand user search results: through the associations between entities in the existing knowledge graph, find more content and return rich related results. For example, when users search for Da Vinci, the search engine returns not only his profile (in Fig. 3, a summary of Leonardo da Vinci from Sogou encyclopedia) but also extended information such as his related paintings "Mona Lisa" and "Last Supper", as shown in Fig. 3.
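The mapping from a keyword query to an entity and then to a knowledge card with related entities can be sketched as below. The miniature graph, the mention index and the function names are hypothetical and only mirror the Da Vinci example; they do not describe how any particular search engine is implemented.

```python
# Hypothetical miniature knowledge graph: entity -> {attribute: value}.
GRAPH = {
    "Leonardo da Vinci": {
        "type": "Person",
        "occupation": "Painter",
        "works": ["Mona Lisa", "The Last Supper"],
    },
    "Mona Lisa": {"type": "Painting", "creator": "Leonardo da Vinci"},
    "The Last Supper": {"type": "Painting", "creator": "Leonardo da Vinci"},
}

# Hypothetical surface-form index used for entity linking.
MENTIONS = {"da vinci": "Leonardo da Vinci", "leonardo": "Leonardo da Vinci"}


def link_entity(query):
    """Map a keyword query to a graph entity, or None if nothing matches."""
    q = query.strip().lower()
    for name in GRAPH:
        if name.lower() == q:
            return name
    return MENTIONS.get(q)


def knowledge_card(query):
    """Return the entity's attributes plus the names of directly related entities."""
    entity = link_entity(query)
    if entity is None:
        return None
    card = dict(GRAPH[entity], name=entity)
    card["related"] = [w for w in card.get("works", []) if w in GRAPH]
    return card


card = knowledge_card("Da Vinci")
```

The `related` field plays the role of the expanded results in item (3): it is obtained by following edges from the linked entity rather than by matching keywords.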

Intelligent Q & A
A Q & A (question and answer) system is an advanced form of information retrieval system that answers users' questions in accurate and concise natural language. Q & A is regarded as a high-level form of retrieval because it involves two important processes, query understanding and knowledge retrieval, which are fully consistent with the corresponding processes in intelligent search. Most question answering systems decompose a given question into several smaller questions, extract matching answers from the knowledge base one by one, automatically check their consistency in time and space, and finally combine the answers and present them to users in an intuitive way. Intelligent Q & A systems can be divided into many kinds according to how they process data. Although different types of Q & A systems differ in how the overall system is divided into modules and implemented, in general, following the data flow of an intelligent Q & A system, the question processing framework includes three functional modules: question understanding, information search and answer generation. Question understanding usually uses natural language processing technology to segment and deeply understand the question [Yang (2018); Hou, Wei, Lu et al. (2018)]. At present, knowledge graphs have been introduced into many question answering platforms, such as the "Xiaodu" robot developed by Baidu in China and OASK, a large-scale online question answering system developed by Tianjin Juwen Network Technology Service Center that specializes in providing interactive question answering solutions for portals, enterprises, media, education and other websites. The University of Washington's Paralex system and Apple's intelligent voice assistant Siri can provide answers, introductions and other services. Evi, the natural language assistant acquired by Amazon, licensed Nuance's speech recognition technology, was developed with the True Knowledge engine, and provides services similar to Siri [Zheng, Zhai, Hu et al. (2019)].
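The three functional modules (question understanding, information search and answer generation) can be sketched with template matching over a tiny fact table. The templates, the relation names and the functions are hypothetical, and the area value is the one quoted in the semantic search example; systems such as those named above use far more capable language understanding.

```python
import re

# Hypothetical fact table keyed by (entity, relation).
FACTS = {
    ("Russia", "area_km2"): 17098200,
    ("Russia", "capital"): "Moscow",
}

# Hypothetical question templates mapping a phrasing to a relation.
TEMPLATES = [
    (re.compile(r"what is the capital of (?P<e>.+?)\??$", re.I), "capital"),
    (re.compile(r"how large is (?P<e>.+?)\??$", re.I), "area_km2"),
]


def understand(question):
    """Question understanding: return (entity, relation) or None."""
    for pattern, relation in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return match["e"].strip().title(), relation
    return None


def retrieve(entity, relation):
    """Information search: look the fact up in the knowledge base."""
    return FACTS.get((entity, relation))


def answer(question):
    """Answer generation: turn the retrieved value into a sentence."""
    parsed = understand(question)
    if parsed is None:
        return "Sorry, I do not understand the question."
    entity, relation = parsed
    value = retrieve(entity, relation)
    if value is None:
        return f"I have no {relation} information about {entity}."
    return f"The {relation} of {entity} is {value}."


reply = answer("What is the capital of Russia?")
```

Keeping the three stages as separate functions mirrors the module division in the text, so each stage could be replaced independently, for example by swapping the templates for a trained parser.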

Intelligent recommendation
E-commerce websites are one of the typical application scenarios of intelligent recommendation. In terms of business logic, intelligent recommendation screens and filters massive product information, shows users the products they care about and are interested in most, improves their shopping experience, and helps realize precision marketing and recommendation through the rich knowledge of the industry knowledge graph. In terms of business implementation, knowledge graph technology is introduced to extract commodity information and user information, and recommendations are based on user profiles, correlation information between items, and information extracted from web pages. When a user enters a keyword to view a product, the system recommends related knowledge that the user may need based on the knowledge graph, including product results, usage suggestions and product combinations, and makes further recommendations through features such as "guess you like" or "others are also searching". Different users see different recommendation results, which has important business value. Intelligent recommendation varies with the business scenario, mainly in the rules for displaying recommended commodities on the front end of the e-commerce platform. In "Knowledge graph and data application-intelligent recommendation" by Zhou Jing and others, three kinds of commodity data are obtained through a recommendation algorithm, and an intelligent recommendation system based on knowledge graph data is proposed. Its recommendation strategy is 70% from the user's own portrait (strong correlation), 20% from the portraits of users with similar tags (weak correlation) and 10% from multiple user portraits (not related), as shown in Fig. 4.
Figure 4: Intelligent recommendation strategy of 7:2:1
Based on the delivery proportion of these three types of goods, intelligent recommendation can adopt different strategies for the recommended data, so that different goods are recommended in different scenarios, to different users, and to the same user in different scenarios, achieving the goal of personalized recommendation for each of thousands of users [Zhou, Sun, Yu et al. (2019)].
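The 7:2:1 delivery proportion can be sketched as a slot allocation over three candidate lists. Only the ratio comes from the text; the function names, the largest-remainder rounding and the placeholder item identifiers are assumptions for illustration.

```python
def allocate_slots(total, ratios=(7, 2, 1)):
    """Split `total` recommendation slots by ratio using largest remainders."""
    weight = sum(ratios)
    shares = [total * r / weight for r in ratios]
    counts = [int(s) for s in shares]
    # Hand leftover slots to the groups with the largest fractional parts.
    leftovers = total - sum(counts)
    order = sorted(
        range(len(ratios)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in order[:leftovers]:
        counts[i] += 1
    return counts


def recommend(strong, weak, unrelated, total=10):
    """Fill the page with strongly, weakly and not related items at 7:2:1."""
    n_strong, n_weak, n_unrelated = allocate_slots(total)
    return strong[:n_strong] + weak[:n_weak] + unrelated[:n_unrelated]


# Placeholder candidate lists, assumed to be ranked already.
page = recommend(
    strong=[f"s{i}" for i in range(10)],
    weak=[f"w{i}" for i in range(10)],
    unrelated=[f"u{i}" for i in range(10)],
)
```

Each candidate list is assumed to have been ranked by an upstream recommendation algorithm; the sketch only decides how many items each list contributes to the page.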

Financial Industry application
Vertical industry applications are represented by the fields of finance, medical treatment and e-commerce, which have given rise to application scenarios such as financial anti-fraud, intelligent marketing and product recommendation. A knowledge graph is essentially a semantic network based on a graph data structure. Data in the financial industry contains a large number of entities and relationships; by establishing connections among them, a knowledge graph can break through the traditional computing model, deeply integrate the existing data of the financial industry, and combine it with external data, so as to mine potential customers more efficiently, warn of potential risks, and help financial businesses improve efficiency and realize value [Liu (2018); Tang, Chen, He et al. (2018)]. Taking the financial industry as an example, typical applications of knowledge graphs in this industry are briefly described below.

Marketing application
Explore potential customers. Using existing and external data to find potential customers for a given business accurately and quickly is of great benefit to the promotion of banking business. Dig deep into potential customer demand. A knowledge graph system based on bank customer relationships can be expanded flexibly and combined with various data sources and user behavior data to analyze customer behavior more accurately, understand the potential needs of existing customers and push targeted offers. Similarly, knowledge graphs can be used to analyze the capital relationships, legal person relationships, upstream and downstream investment relationships and business relationships of similar enterprises, so as to explore the potential needs of enterprise customers and recommend products and services to them. In terms of precision marketing, a knowledge graph can link multiple data sources into a complete description of a user or user group, making it easier to understand and analyze their behavior. For example, marketing managers of financial companies use knowledge graphs to analyze the relationships within a target user group, discover their common interests, and develop more specific marketing strategies for that group.

Risk control application
Anti-fraud applications. Anti-fraud plays an important role in financial risk control, and its core is people. By integrating all data sources and behavioral data related to borrowers into an anti-fraud knowledge graph, anti-fraud analysis and prediction can be conducted efficiently and accurately. In the application stage, a relationship graph of known fraud elements (mobile phones, devices, accounts, regions, etc.) is constructed, and then the full set of risk data is statistically analyzed to establish an information base of customer risk characteristics, so as to achieve anti-fraud in the trading stage. Internal audit and internal control applications. A knowledge graph is itself an intuitive way of expressing relationships. Combined with relationship mining, it helps analyze the specific risks hidden in complex relationships more effectively and helps financial institutions improve the efficiency and accuracy of their internal audit and internal control systems [Zhang, Cao, Chen et al. (2019)].
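The relationship graph of fraud elements can be sketched as a bipartite graph that links applicants to the phones and devices they used, searched breadth-first from a known fraudster. The records, names and identifiers are fabricated placeholders for illustration, and reachability alone is only a screening signal, not a fraud verdict.

```python
from collections import defaultdict, deque

# Hypothetical application records: (applicant, element type, element value).
RECORDS = [
    ("alice", "phone", "555-0100"),
    ("bob", "phone", "555-0100"),
    ("bob", "device", "dev-42"),
    ("carol", "device", "dev-42"),
    ("dave", "phone", "555-0199"),
]


def build_graph(records):
    """Link each applicant to the phone, device, etc. that they used."""
    graph = defaultdict(set)
    for applicant, kind, value in records:
        element = f"{kind}:{value}"
        graph[applicant].add(element)
        graph[element].add(applicant)
    return graph


def linked_applicants(graph, known_fraudster):
    """Breadth-first search for applicants connected through shared elements."""
    seen = {known_fraudster}
    queue = deque([known_fraudster])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    # Element nodes contain ":"; keep only the other applicants.
    return sorted(n for n in seen if ":" not in n and n != known_fraudster)


graph = build_graph(RECORDS)
suspects = linked_applicants(graph, "alice")
```

In this toy data, carol never shares an element with alice directly but is reached through bob's device, which is the kind of indirect association a relationship graph exposes.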

Forecast application
Industry forecast of potential risks. After subdividing industries, a relationship mining model is established from payment and industry information to display the degree of correlation between industries, predict in a timely manner which high-risk or related industries are involved in an event, and anticipate and avoid risks as early as possible. Customer prediction of potential risks. By establishing knowledge graphs of customers, enterprises and industries, data across industries and enterprises can be connected, and, based on the industry-level risk prediction, enterprise customers exposed to industry risks and systemic risks can be discovered in time. Data is an extremely important resource in the financial industry. A knowledge graph built on association relationships can break through the limitations of existing relational databases and obtain the value of data more efficiently, accurately and quickly [Jiang, Huang and Zhao (2016)].

Conclusion
Since Google put forward the concept of the knowledge graph, its popularity has kept increasing. In-depth observation and analysis of the technical system of knowledge graph construction show that it is a practical technology built on research results from multiple disciplines. The knowledge graph is an important branch of artificial intelligence and knowledge engineering. It aims to imitate the human way of thinking and has profound significance for efficient knowledge management, knowledge acquisition and knowledge sharing in the era of big data. At present, knowledge graphs have been applied in many fields and play an important role. Although great progress has been made, many problems remain to be solved urgently. In particular, the explosive growth of data and the increasing scale of knowledge graphs bring characteristics and requirements such as complex structure, dynamically changing data and real-time query response, which further increase the challenges facing knowledge graphs [Hou, Wei, Lu et al. (2018)]. First, although the era of big data has produced a large amount of data, the release of data is not standardized and data quality is not high, so it is necessary to mine high-quality data. Second, the construction of vertical-domain knowledge graphs lacks natural language processing resources, especially dictionaries, which makes it costly. Finally, there is a lack of open source tools for knowledge graph construction. At present, much research is not yet practical and few tools have been published, so a general knowledge graph construction platform is difficult to achieve.
After a brief introduction to the definition and history of the knowledge graph, this paper reviews the four core technologies of knowledge graph construction: knowledge representation, knowledge extraction, knowledge fusion and knowledge reasoning. It also introduces practical applications in semantic search, intelligent Q & A, intelligent recommendation and the financial industry, based on current industry needs, and summarizes the main challenges that knowledge graphs currently face. In the next few years, the knowledge graph will remain a frontier research problem of big data intelligence. It is expected that more researchers will join the study of knowledge graphs, and it is hoped that this paper can provide some help for the research and development of knowledge graph technology.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.