Exploring Technology Influencers from Patent Data Using Association Rule Mining and Social Network Analysis

Abstract: A patent is an important document issued by the government to protect inventions or product design. Inventions consist of mechanical structures, production processes, quality improvements of products, and so on. Generally, goods or appliances in everyday life are a result of an invention or product design that has been published in patent documents. A new invention contributes to the standard of living, improves productivity and quality, reduces production costs for industry, or delivers products with higher added value. Patent documents are considered to be excellent sources of knowledge in a particular field of technology, leading to inventions. Technology trend forecasting from patent documents depends on the subjective experience of experts. However, accumulated patent documents consist of a huge amount of text data, making it more difficult for those experts to gain knowledge precisely and promptly. Therefore, technology trend forecasting using objective methods is more feasible. There are many statistical methods applied to patent analysis, for example, technology overview, investment volume, and the technology life cycle. There are also data mining methods by which patent documents can be classified, such as by technical characteristics, to support business decision-making. The main contribution of this study is to apply data mining methods and social network analysis to gain knowledge in emerging technologies and find informative technology trends from patent data. We experimented with our techniques on data retrieved from the European Patent Office (EPO) website. The technique includes K-means clustering, text mining, and association rule mining methods. The patent data analyzed include the International Patent Classification (IPC) code and patent titles. Association rule mining was applied to find associative relationships among patent data, then combined with social network analysis (SNA) to further analyze technology trends. SNA provided metric measurements to explore the most influential technology as well as visualize data in various network layouts. The results showed emerging technology clusters, their meaningful patterns, and a network structure, and suggested information for the development of technologies and inventions.


Introduction
In the development of innovative or creative products, many companies are more likely to carry out research and development (R&D) to determine feasibility and prevent failure before the production and launch of products to a market. Many large and small companies try to establish departments responsible for "strategic invention" and "ideation development", which involve patent analysis activities [1,2]. patterns can also help us to identify technology trends from the past decade, which we can use as guidelines for developing next-generation technologies.
This paper is organized as follows: Section 2 describes related works that are applied in this study. Section 3 describes the methodologies used in the patent analysis, including data mining methods, text mining, SNA, and our proposed conceptual framework. Section 4 shows the results and analysis of the findings. Section 5 presents concluding remarks on this work.

Patent Database
In this study, we investigated patent data from the European Patent Office (EPO) [10]. The EPO's worldwide database, ESPACENET (formerly written as ESP@CNET), contains online data on more than 110 million patent documents from around the world, in various data formats.
The World Intellectual Property Organization (WIPO) [3] defines the International Patent Classification (IPC) code in sections A-H: A: Human Necessities; B: Performing Operations, Transport; C: Chemistry; D: Textiles, Paper; E: Fixed Construction; F: Mechanical Engineering, Lighting, Heating, Weapons; G: Physics; and H: Electricity. The IPC code is an index that is used to classify inventions, using international standards for which technology they belong to and providing a hierarchical system of language-independent symbols for classification of patents and utility models, as shown in Table 1 [3,6]. A patent title is considered to be a useful secondary source of patent data, as shown in Table 2. The WIPO has issued rules for patent titles, which should convey meaning, indicate the subject to which the invention relates, and contain evidence in different categories (product, process, apparatus, use) [3]. The information from titles of inventions provides the development guidelines in a particular form, which is very useful for patent analysis.

Patent Analysis Reviews
Many studies have been conducted on patent analysis to find opportunities in various technology fields. The research related to our study can be summarized as follows: Kim et al. (2018) [1] proposed a quantitative analysis for patent documents by applying text mining to extracted keywords. The extracted terms or words came from patent documents based on relevant papers, and their authors' keywords. The most representative terms were identified using "term frequency-inverse document frequency" (TF-IDF), which can be used to determine the technical characteristics of patent documents. The expected outcome is an increase in the reliability and quality of patent analysis. Chae and Gim (2019) [2] proposed a model to analyze the technical inventions from patent applications based on IPC (International Patent Classification) and CPC (Cooperative Patent Classification) codes. A "taxonomy tree" is created using the hierarchical structure of the IPC and CPC codes of each patent, which identifies the invention patterns and technological trends of patent applicants. Ma et al. (2014) [9] conducted an experiment on Nano-Enabled Drug Delivery (NEDD), using commercial data, the "Derwent Innovation Index" (DII). The patent title and abstract were rewritten by a technical specialist to make the original data clearer. The keywords from the title and abstract were extracted and carefully selected by experts. After that, the extracted terms were analyzed with specific software tools, "VantagePoint" [program available at www.thevantagepoint.com] and "ClusterSuite" [program developed by J.J. O'Brien, with Stephen J. Carley, at Georgia Tech, to be available at www.VPInstitute.org]. The results suggested possible innovations and trends for technology in NEDD. Jun (2012) [10] proposed various data mining methods to forecast technology trends of the Bio-Industry. The data mining methods consisted of three approaches based on "time series analysis", "association rule mining", and "clustering". Firstly, the results from the "time series analysis" were used to predict the demand for biotechnology, and then to assign the R&D resources of a company to develop biotechnologies. Secondly, the association rules between IPC codes identified key patents to develop or to buy for biotechnology. Lastly, the patent clustering results let us discover vacant areas of biotechnology and detect the disruptive technologies in biotechnology. Park et al. (2015) [11] proposed a network model to present sustainable technology from patent documents based on the degree of centrality patterns from Social Network Analysis (SNA). SNA constructs a network model based on graph theory in computer science. The patent documents were from the Ford Motor Company [www.uspto.gov]. The IPC codes were used as the vertices. The connections among the technology and sub-technology vertices suggested the development of new products and services, and R&D planning for future technologies. Choi et al. (2015) [12] proposed a predictive model to identify the technology transfer in patent information analysis, focusing on the extraction of vacant core technologies and monitoring technological trends. The predictive model applied a social network analysis, linear regression analysis, and decision tree modeling. The constructed model was expected to be useful in technology management in commercialization, preventing mismatches from expert opinions and the wasting of R&D resources.
Choi and Song (2018) [13] proposed "a topic modelling-based approach to extract hidden topics from logistic-related patents using Latent Dirichlet Allocation" (LDA). The patenting activity and major assignees of each topic were investigated. The technology trends from topics were classified as "emerging topic", "declining topic", "dominant topic", or "saturated topic". This helps organizations to understand technological trends and the general technology landscape in logistics. Liu et al. (2019) [14] applied network theory and social network analysis to investigate the trends of patent collaboration for the smart grid field in China, the so-called "patent collaboration network". Four indicators, i.e., degree centrality, betweenness centrality, closeness centrality, and eigenvector value, were used to identify the positions of technology in a network, such as the influencer (hub), as well as the interconnections and the importance of technology.

Summary of Findings and Observations from Related Works
Patent documents can be used to analyze technology and innovation trends and to form guidelines to develop new products and services. The results of the patent analysis can be used to support decision-making in technology management. Patent data are systematically classified and stored in a database. We can use certain characteristics to discover the hidden patterns in a particular area of technology. The European Patent Office has made the bulk of patent data available for statistical analysis and data mining. The content of non-numerical data, IPC codes, and patent titles can be used to find the answers according to the research objectives. The data mining methods allow us to apply an in-depth analysis to find new knowledge. There are existing data mining tools available to process "structured data" (e.g., IPC code) and "unstructured data" (e.g., patent titles). In our study, we were interested in finding relationships from the data mining results and found that SNA is a tool that can visualize the network of relationships as a network graph. This helps us understand the flow of data and their important parts.

K-Means Clustering
K-means clustering is a data mining technique used to group objects or datasets into clusters based on their similarities. The similarity is based on the total distance between the values in each cluster and the centroid, where each centroid is the average of the values in its cluster. The closer the distance, the higher the similarity, and vice versa. The measurement of similarity, or Euclidean distance, between a data point $x = (x_1, \dots, x_n)$ and a centroid $c = (c_1, \dots, c_n)$ can be calculated by

$$d(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}$$

The grouping of K-means clustering works as follows:
(1) Determine the number of clusters K from the data domain.
(2) Choose K random points from the data as centroids.
(3) Assign each data point to the closest cluster centroid.
(4) Recalculate the centroid of each newly formed cluster.
(5) Repeat steps (3) and (4) until there is no change in the centroids, i.e., the data points remain in the same clusters.
Next, the cluster validation process was applied to find an appropriate number of clusters in patent datasets. One of the cluster validations that is commonly used to compare results from different values of "k" is the average distance between data points and their cluster centroid. The average distance to the centroid, a function of "k", is plotted and the "elbow point" can be used to roughly determine "k" [6,15,16]. From Figure 1, we can see that the value k = 5 is an elbow point since there is a slight bend on both sides of the point.
The clustering method is used in market segmentation to find customers that are similar in terms of behaviors. In this study, we applied the marketing approach to determine patent data characteristics. In grouping the patent datasets, we used three attributes (variables), namely IPC code, technical field, and technical sector, to calculate the similarities of each cluster.
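The grouping steps and the elbow-style validation can be illustrated with a minimal sketch. It is not the implementation used in this study: the two-dimensional toy points, the function names, and the fixed random seed are all assumptions made for illustration, and the study's attributes (IPC code, technical field, technical sector) would first need a numeric encoding.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Component-wise average of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Steps (2)-(5): random initial centroids, assign, recompute, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step (2)
    for _ in range(max_iter):
        # Step (3): assign every point to its closest centroid.
        labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                  for p in points]
        # Step (4): recompute centroids; keep the old one if a cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean_point(members) if members else centroids[j])
        # Step (5): stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def average_distance(points, centroids, labels):
    """Average distance between each point and its assigned centroid."""
    total = sum(euclidean(p, centroids[lab]) for p, lab in zip(points, labels))
    return total / len(points)


# Hypothetical toy data: three well-separated groups of five points each.
offsets = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]
centers = [(0, 0), (10, 10), (20, 0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Cluster validation: average distance to the centroid as a function of k.
avg_by_k = {}
for k in range(1, 6):
    centroids, labels = kmeans(points, k)
    avg_by_k[k] = average_distance(points, centroids, labels)

for k, avg in avg_by_k.items():
    print(k, round(avg, 3))
```

Plotting `avg_by_k` against k and looking for the bend gives a rough choice of k. Because the initial centroids are random, a different seed may converge to a different local optimum.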

Text Mining
Text mining is a process of knowledge discovery from text documents. The common practice for text mining is the analysis of the information extracted through text processing to form new facts and hypotheses that can be explored further with other data mining algorithms [6,7,15-17].
The major processes of text mining are as follows:
• Tokenizing: the process of breaking text from the document into single words (tokens or terms).
• Filtering out stop words: the process of removing meaningless elements (punctuation marks, special characters, prepositions, articles, pronouns, etc.).
• Transforming cases: the process of transforming all characters into either lowercase or uppercase to avoid confusion between similar words in different cases.
• Stemming: the process of reducing single words to their base form, or stem.
In the patent analysis, the unstructured text from patent titles of each technology cluster will be preprocessed and transformed into a structured format. The key terms extracted from the text mining approach will be used for further analysis to determine the relationships among invention concepts.
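A minimal sketch of these four processes applied to a single patent title is given here. The stop-word list, the suffix-stripping rule (a crude stand-in for a real stemmer such as Porter's), and the example title are all hypothetical and are not taken from the study's tooling or data.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to", "with", "in", "on"}

# Naive suffix stripping, used only as a stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")


def tokenize(text):
    """Tokenizing: split text into alphabetic tokens (drops punctuation, digits)."""
    return re.findall(r"[A-Za-z]+", text)


def transform_case(tokens):
    """Transforming cases: map every token to lowercase."""
    return [t.lower() for t in tokens]


def filter_stop_words(tokens):
    """Filtering out stop words: drop tokens found in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Stemming: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def extract_key_terms(title):
    # Lowercasing is done before stop-word filtering so that the comparison
    # against STOP_WORDS is case-insensitive.
    tokens = transform_case(tokenize(title))
    return [stem(t) for t in filter_stop_words(tokens)]


# Hypothetical patent title, for illustration only.
title = "Method and device for controlling wind turbines"
print(extract_key_terms(title))
# → ['method', 'device', 'controll', 'wind', 'turbin']
```

The stems are not dictionary words ("controll", "turbin"), which is typical of stemming. What matters for the later analysis is that variants of the same word map to the same key term.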

Association Rule Mining (ARM)
Association rule mining is an algorithm used for discovering interesting relationships between item sets in a large database. The rules from the algorithm can be used to predict the occurrence of items that are grouped together in an item set. The algorithm uses the parameters support, confidence, and lift to describe the rules that it generates and to select interesting rules from all possible ones. The support is an indication of how frequently the item set appears in the database; the support of rule (A → B) can be calculated by the following probability:

$$\mathrm{support}(A \rightarrow B) = P(A \cap B)$$

The confidence is an indication of how often the rule (an if-then statement) is true; the confidence of rule (A → B) can be represented by the conditional probability:

$$\mathrm{confidence}(A \rightarrow B) = P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

The lift is calculated as the probability of an item set relative to the probabilities of the individual items in the item set; the lift of rule (A → B) can be calculated as follows:

$$\mathrm{lift}(A \rightarrow B) = \frac{P(A \cap B)}{P(A)\,P(B)}$$

If the rule has a lift greater than 1, it implies that the two occurrences are dependent on each other, which makes those rules potentially useful for predicting the consequences in future datasets [6,7,15,18].
In patent analysis, IPC codes and the key terms extracted from each technology cluster will be processed to determine the association rules. The association rules (IPC_code #1 → IPC_code #2) determine that "If technology IPC code #1 is developed, then technology IPC code #2 is also developed", and the text association from the extracted technical terms (technical_term #1 → technical_term #2) determines that "If technical_term #1 is developed, then technical_term #2 is also developed". The hidden relationships discovered via association rules help us to summarize the collection of patent documents, in which the IPC code association rules define the technology co-occurrences, and the text association rules derived from the extracted key terms determine the invention concepts.
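As a small worked illustration of support, confidence, and lift for a rule (IPC_code #1 → IPC_code #2), the following sketch treats each patent as a set of IPC codes. The five toy patents are invented for this example and do not come from the EPO data. The functions assume that the antecedent and the consequent each occur in at least one patent.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """P(consequent | antecedent) = support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """support(A and B) / (support(A) * support(B))."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))


# Hypothetical data: each set holds the IPC codes assigned to one patent.
patents = [
    {"C08L", "C08K"},
    {"C08L", "C08K", "C08G"},
    {"C08L"},
    {"C08G"},
    {"C08K", "C08L"},
]

rule = ({"C08L"}, {"C08K"})
print("support   ", round(support(patents, rule[0] | rule[1]), 3))  # 3/5
print("confidence", round(confidence(patents, *rule), 3))           # (3/5)/(4/5)
print("lift      ", round(lift(patents, *rule), 3))                 # 0.75/(3/5)
```

Here the lift is above 1, so in this toy data C08K appears together with C08L more often than would be expected if the two codes were assigned independently.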

Social Network Analysis (SNA)
A social network analysis (SNA) is a study of social connections among actors, such as individuals, groups, organizations, and processes that cause changes in the relationship between individuals, and between groups, according to the changing situation. SNA helps us understand an informal group, social organization, and the behavior of a social structure. There is a set of measurement metrics to map, measure, explore, and visualize the social relationships between actors. The major metrics include degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality, which are used to analyze and visualize the patterns of a network [14,19-22].
The performance metrics used in this study are as follows [22]:
(1) Degree Centrality (DC): a center of connectivity in a network (hub). The node that connects the most edges is the most influential in a social network. The DC of a vertex v of graph G = (V, E) can be calculated as follows:

$$DC(v) = \deg(v)$$

where deg(v) is the number of edges incident to v.
(2) Betweenness Centrality (BC): the extent to which an individual node lies on the shortest paths that bridge other nodes in a network. A high value of BC indicates that the node controls, or plays an important role in, the connection between two other nodes participating in a social network. The BC of vertex v of graph G = (V, E) can be calculated as follows:

$$BC(v) = \sum_{x \neq v \neq y} \frac{\sigma_{xy}(v)}{\sigma_{xy}}$$

where $\sigma_{xy}$ is the total number of shortest paths from node x to node y, and $\sigma_{xy}(v)$ is the number of those paths that pass through v.
(3) Closeness Centrality (CC): a measure of how close a node is to all the other nodes in a network. The CC of vertex i can be calculated as follows:

$$CC(i) = \frac{1}{\sum_{j} d(i, j)}$$

where d(i, j) is the distance between vertex i and j.
(4) Eigenvector Centrality (EC): the relative scores assigned to all nodes in a network. The score of each node is measured from the links with other influential nodes. A high eigenvector score means that a node is connected to many nodes that themselves have high scores. The eigenvector centrality is used for measuring the importance of all nodes in a network. To find the EC score of a graph G = (V, E) with |V| vertices, let $B = (b_{v,t})$ be the adjacency matrix, where $b_{v,t} = 1$ if v is linked to vertex t and $b_{v,t} = 0$ otherwise. The relative centrality score $x_v$ of vertex v can be calculated as follows:

$$x_v = \frac{1}{\lambda} \sum_{t \in N(v)} x_t = \frac{1}{\lambda} \sum_{t \in V} b_{v,t}\, x_t$$

where N(v) is the set of neighbors of v, and λ is a constant.
The SNA is the final analysis of this study, by which the IPC code association rules, and text association rules from the technical terms, are visualized as a network graph to determine the technology and invention communities from each cluster. An overview of the network graph lets us see the influential technologies that may have been used to create the invention, and who is the owner of the invention. All of these results can be used as a guideline for technology management to perform R&D and determine the business feasibility.
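To make the four metrics concrete, the following is a minimal sketch on a small undirected, unweighted graph. It is not the software used in this study. The four-node graph is invented, closeness is taken as the reciprocal of the summed distances, betweenness counts each unordered pair once, and the eigenvector score is normalized so that the largest value is 1; these conventions are assumptions.

```python
from collections import deque


def degree_centrality(graph):
    """DC(v): number of edges incident to v."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def bfs_distances(graph, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(graph):
    """CC(v) = 1 / sum of distances from v (a connected graph is assumed)."""
    result = {}
    for v in graph:
        total = sum(bfs_distances(graph, v).values())
        result[v] = 1.0 / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """BC(v): sum over unordered pairs of sigma_xy(v) / sigma_xy (Brandes)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS that also counts shortest paths
            u = queue.popleft()
            stack.append(u)
            for w in graph[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: score / 2 for v, score in bc.items()}


def eigenvector_centrality(graph, iterations=100):
    """Power iteration for x_v proportional to the sum of its neighbours' scores."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Iterating with x_v + sum(neighbours), i.e. A + I, keeps the same
        # eigenvector but avoids oscillation on bipartite graphs.
        nxt = {v: x[v] + sum(x[t] for t in graph[v]) for v in graph}
        norm = max(nxt.values())
        x = {v: val / norm for v, val in nxt.items()}
    return x


# Hypothetical network: the IPC codes are only labels, the edges are invented.
graph = {
    "C08L": {"C08K", "C08G", "B01J"},
    "C08K": {"C08L", "C08G"},
    "C08G": {"C08L", "C08K"},
    "B01J": {"C08L"},
}

print(degree_centrality(graph))
print(betweenness_centrality(graph))
print(closeness_centrality(graph))
print(eigenvector_centrality(graph))
```

In this toy graph, the node "C08L" ranks first on all four metrics, which is the pattern that identifies a hub.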

Conceptual Framework
The conceptual framework of this study is shown in Figure 2. The patent data, which consist of the IPC codes and patent titles, were used as the primary input. The patent data were taken from EPO's online database and then preprocessed. After that, four data analysis methods, K-means clustering, text mining, association rule mining (ARM), and social network analysis (SNA), were applied to identify groups of patents with similar characteristics, the hidden knowledge in the patent data, and the key influencers of technology and invention. The results are presented in a network graph that identifies communities of patent data. The developed data analysis framework consists of the following steps:
Step 1. Data collection and preprocessing
Step 2. K-means clustering (2.1) Perform data clustering to obtain the patent cluster profile. (2.2) Perform cluster validation to obtain an appropriate number of clusters.
Step 3. Text mining (3.1) Perform text mining on the patent titles dataset to obtain the technical terms (key terms).
Step 4. Association rule mining (ARM) (4.1) Apply ARM to the IPC code dataset of each cluster to obtain association rules. (4.2) Apply ARM to the technical terms (key terms) to obtain text association rules.
Step 5. Social Network Analysis (SNA) (5.1) Use SNA to calculate the degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), and eigenvector centrality (EC) of the IPC association rules and the text association rules that exist in each cluster. (5.2) Construct a network graph to visualize the association rules and text association rules in each cluster. (5.3) Analyze the results: the most influential technology, connectivity of technology, technology prioritization, etc.

K-Means Clustering
The first analysis process was clustering patent datasets, where the objective was to find existing technology clusters from the patent data. The IPC code, technical field, and technical sector were the selected variables for the cluster validation process since they were important parts of patent data to identify cluster characteristics. The five resulting clusters group patents from the chemistry, electrical engineering, instruments, mechanical engineering, and other fields sectors. There were 153,071 patents from 2009-2018 distributed among the clusters; the percentage in each cluster is shown in Table 3. The clustering produced one cluster with the largest number of patents and three clusters with a relatively small number of patents. Cluster 4 has the largest number of patents. This cluster represents an adequate technical sector since it has a large number of patents registered. On the other hand, clusters 1, 3, and 5 are inadequate technology clusters since they have a small number of patents. Each cluster contains data that reflect the relationship between the IPC codes and the key terms extracted from patent titles. Both can be used to describe the technologies, inventions, and influencers that are useful for R&D and technology management in the future. The analysis of IPC codes and the key terms will be explained in the next sections.
Patent cluster characteristics, based on the technical sector, technical field, and IPC code, are summarized in Table 4. Each cluster consists of specific technical fields and IPC codes. The IPC codes represent the inventions shown in each cluster, and will be used to find the relationships between technology by applying association rule mining to forecast technology trends.

Text Mining
The K-means clustering performed in Section 4.1 provided the results of five technical sectors: Chemistry, Electrical engineering, Instruments, Mechanical engineering, and Other fields (patents that cannot be identified with any sector). Next, we performed the major text mining processes, i.e., tokenizing, filtering out stop words, transforming cases, and stemming, to extract the key terms from patent titles. Examples of patent titles and the extracted key terms are shown in Tables 5 and 6, respectively. The patent titles are considered to be "unstructured text", usually analyzed by experts, unlike variables (IPC codes in this case) that are computer-readable. By applying text mining, the patent titles in each cluster are broken down into smaller units and structured to make the extracted key terms more meaningful.
The key technical terms derived from the text mining of each cluster were the most frequent words found in patent titles. The inventions might have special qualities that were initially defined in the patent titles. For example, the terms "system", "device", and "process" were commonly found in all clusters. This was because the patents came from the ideas of systems, devices, and processes initiated by the inventors or experts in each technology area. At the same time, many words convey the meaning of inventions that are relevant to each cluster characteristic as well.

Association Rule Mining (ARM)
For the five clusters from the clustering results, we used ARM to find the relationships between IPC codes as well as the relationships between key terms. The association rules were applied to find the relationship between the IPC codes that form the antecedent (IPC code #1) and consequent (IPC code #2) of technologies within each cluster. The rules derived from the key terms of the text mining process are the so-called "text association rules", which determine the relationships between the related terms of invention. Tables 7 and 8 show some examples of association rules between IPC codes as well as some examples of association rules between key terms, with a confidence value of at least 10%. For each cluster, the top association rules of developed technologies and inventions were identified. The association rules implied that if technology IPC #1 was developed, technology IPC #2 was also developed. Additionally, ARM was applied to the output obtained from Section 4.2 to extract text association rules that identify the relationships between key terms in each cluster. There are three common measures to describe association: support, confidence, and lift. The results in Tables 7 and 8 can be discussed as follows:
1. The rules with a high support value implied the popularity of technologies and inventions. For example, the rule (C08L → C08K) and the rule (wind, turbine → blade) from the Chemistry cluster had the highest support values. This means that technology C08K was widely developed in connection with technology C08L, and the invention of "blade" was widely developed from the inventions of "wind" and "turbine".
2. The rules with a high confidence value implied the probability of technologies and inventions. For example, the rule (E04C → E04H) and the rule (turbine, foundation → wind) in the Other Fields cluster had the highest confidence values. If technology "E04C" was developed, then technology "E04H" was more likely to be developed as well. Additionally, if the inventions related to "turbine" and "foundation" were developed, the invention related to "wind" was more likely to be developed.
3. The rules with a high lift value implied a strong relationship between the development of technologies and inventions. The lift values from each cluster were greater than 1, which means the antecedent and the consequent of the technologies and inventions are more likely to be associated with each other. The first-ranked rules of each cluster had the highest lift, which means the technologies, as well as the inventions, are dependent on each other and the rules are potentially useful to predict the consequences in the future.
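The selection described above (a confidence of at least 10%, a lift greater than 1, and a ranking by each measure) can be sketched as follows. The `Rule` record, the thresholds passed as parameters, and all numeric values are hypothetical; only the rule labels echo the examples discussed in the text.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: tuple
    consequent: str
    support: float
    confidence: float
    lift: float


# Hypothetical rules; the support, confidence, and lift values are made up.
rules = [
    Rule(("C08L",), "C08K", 0.12, 0.45, 2.1),
    Rule(("E04C",), "E04H", 0.03, 0.80, 6.5),
    Rule(("wind", "turbine"), "blade", 0.10, 0.60, 4.0),
    Rule(("system",), "method", 0.20, 0.08, 0.9),
]


def select_rules(rules, min_confidence=0.10, min_lift=1.0):
    """Keep rules meeting the confidence threshold with lift above min_lift,
    ordered from the highest lift to the lowest."""
    kept = [r for r in rules
            if r.confidence >= min_confidence and r.lift > min_lift]
    return sorted(kept, key=lambda r: r.lift, reverse=True)


selected = select_rules(rules)
for r in selected:
    print(" + ".join(r.antecedent), "->", r.consequent, r.lift)

# The most "popular" rule is the one with the highest support among those kept.
most_popular = max(selected, key=lambda r: r.support)
print("highest support:", most_popular.antecedent, "->", most_popular.consequent)
```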

Constructing a Network of ARM
The association rules in each cluster that resulted from Section 4.3 were jointly analyzed by SNA, where the number of rules in each cluster must be large enough to illustrate a network. In this subsection, we used as many rules as possible to illustrate the network unambiguously. The network graph was arranged in a circular layout. Both the antecedent and the consequent of each association rule were represented as IPC code and key term vertices in SNA. The size of each vertex depended on the value of the degree centrality. The higher the degree centrality, the greater the vertex size. The lift values from the association rules were represented as the edges that connected the vertices in SNA. The size of each edge depended on the value of the lift. The higher the lift, the greater the edge size. Figures 3-7 illustrate the relationships among IPC codes and key terms in the Chemistry, Electrical Engineering, Mechanical Engineering, Instrument, and Other Fields, respectively.
The network graphs show the IPC codes and key terms in each cluster that represent technology and invention influencers. The most popular technology and invention can be seen from the size of the vertices. The size of each edge indicates the possible inspiration of the inventions. For example, technologies C08L, C08K, and C08G are popular (influencers) in the Chemistry cluster. Technologies B01J, C01B, C21D, and C22C are less popular, but they are still inspiring. Therefore, the network graph allows us to visually evaluate the properties of a large number of association rules.
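The mapping from association rules to a network (antecedent and consequent as vertices, lift as edge weight, degree centrality as vertex size) can be sketched as follows. The rules, their lift values, and the `base` and `step` size parameters are hypothetical and serve only to illustrate the construction.

```python
from collections import defaultdict

# Hypothetical rules as (antecedent, consequent, lift); the lifts are made up.
rules = [
    ("C08L", "C08K", 2.4),
    ("C08L", "C08G", 1.8),
    ("C08K", "C08G", 1.5),
    ("B01J", "C01B", 3.1),
]


def build_network(rules):
    """Return edge weights (lift) and the degree of every vertex."""
    edges = {}
    neighbours = defaultdict(set)
    for antecedent, consequent, lift in rules:
        edges[(antecedent, consequent)] = lift  # edge size follows the lift
        neighbours[antecedent].add(consequent)
        neighbours[consequent].add(antecedent)
    degree = {v: len(n) for v, n in neighbours.items()}
    return edges, degree


def vertex_sizes(degree, base=10.0, step=5.0):
    """Vertex size grows linearly with degree centrality."""
    return {v: base + step * d for v, d in degree.items()}


edges, degree = build_network(rules)
sizes = vertex_sizes(degree)

# Vertices ordered from most to least connected (candidate influencers first).
for v in sorted(degree, key=degree.get, reverse=True):
    print(v, degree[v], sizes[v])
```

The resulting `edges` and `sizes` dictionaries can be passed to any graph-drawing tool that supports a circular layout.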

Summary of Influential Nodes from SNA
The networks of IPC codes and key terms in the five clusters were characterized by degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), and eigenvector centrality (EC). The top-ranked nodes in each network are summarized in Table 9.

Degree centrality identifies the hub nodes in the network, which reflect the most influential technology and invention in each technical sector. Nodes that rank first on every measure can be regarded as controlling the network. They also collaborate with other nodes and play important roles in promoting new technology and invention in their sectors. According to degree centrality, the technologies "C08G", "H02J", "G01D", "F03D", and "E04H" are the most influential technologies, while the key terms "system", "generator", "wind", and "method" are the most influential inventions. According to betweenness centrality, these technologies and inventions appear to cooperate with others in the network, which leads to knowledge exchange. According to closeness centrality, these technologies and inventions can potentially be used to develop new products and services. However, there are some isolated technologies with high CC values in the network, for example in the Chemistry, Electrical engineering, and Instruments sectors. We can assume that these were developed for a specific purpose and cannot be reached from the other technologies. We observed that all the key terms in the Other fields sector have equal values, which means that they are general invention terms that can be used together. Although an invention related to "wind" appears in both the Instruments and Other fields sectors, the developed technologies are different: the Instruments sector involves measurement technology, while the Other fields sector relates to construction in civil engineering. Lastly, the eigenvector centrality of the nodes "C08G", "H02J", "G01D", "F03D", "F01D", and "E04H" identifies them as the technologies most important to the other nodes in each technical sector, while the inventions most important to the other nodes are related to "system", "generator", "wind", and "method".
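To make the remaining three measures concrete, the following sketch computes closeness, betweenness, and eigenvector centrality on a small graph. It is a minimal illustration under stated assumptions, not the software used in the study: the graph is hypothetical, undirected, unweighted, and connected; betweenness is left unnormalised; and the function names are introduced here for illustration.

```python
from collections import deque

# Hypothetical undirected network: a hub "F03D" with a short chain attached.
graph = {
    "F03D": {"wind", "turbine", "F01D"},
    "wind": {"F03D"},
    "turbine": {"F03D"},
    "F01D": {"F03D", "blade"},
    "blade": {"F01D"},
}


def bfs(graph, source):
    """Return (visit order, distances, shortest-path counts, predecessors)."""
    dist = {source: 0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:                # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:       # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds


def closeness_centrality(graph):
    """(n - 1) divided by the sum of distances; assumes a connected graph."""
    n = len(graph)
    return {v: (n - 1) / sum(bfs(graph, v)[1].values()) for v in graph}


def betweenness_centrality(graph):
    """Brandes' algorithm, unnormalised, each unordered pair counted once."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, _, sigma, preds = bfs(graph, s)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}


def eigenvector_centrality(graph, iterations=200):
    """Power iteration on A + I; the shift avoids oscillation on bipartite graphs."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in graph[v]) for v in graph}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x


cc = closeness_centrality(graph)
bc = betweenness_centrality(graph)
ec = eigenvector_centrality(graph)
```

In this toy graph the hub "F03D" ranks first on all three measures, while "F01D" has a non-zero betweenness because it is the only bridge to "blade"; this mirrors how a node that leads every ranking is read as controlling its network.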

Application of the Results to Patent Management
The SNA results in the previous subsection not only help decision-makers evaluate information visually, but also provide the measurements mentioned above for determining connectivity characteristics. The technology influencers of each technical sector and their definitions are shown in Table 10. When an inventor or company is interested in creating or developing products that are classified in various technical clusters, they will have to check whether other inventors or companies hold relevant patents in order to prevent intellectual property infringement. The technology influencers have been patented by inventors and companies around the world. The number of patents implies the technology development capabilities of each country. The top five countries for the technology influencers, according to the number of patents in each cluster, are shown in Figure 8.
Patent management in an organization is not only about inspecting the patent documents registered by competitors, but also about obtaining the appropriate technology to develop products or services. Patent data can inspire inventors and companies to be more creative in producing and upgrading products or services. Product development may be blocked by an inventor or company who holds a patent related to a particular technology influencer. Many companies seek partnerships for technology transfer and also explore patents that have not been renewed in order to gain "freedom to operate". Although such patents are not novel, their core technology can be used to further develop new products, processes, and services for customers without infringing on the intellectual property of others.

Conclusions
This paper proposes a technology analysis of patent documents that uses IPC codes and patent titles to identify hidden information. The patent data were collected from the European Patent Office (EPO). Our study applied a conceptual framework to find existing technology clusters in the collected patent data, then to find the relationships among associated technologies in each cluster, and finally to explore and visualize these relationships to gain insight. The framework consisted of data mining methods and Social Network Analysis (SNA), which can be useful for the development of new technology and inventions.
The data mining methods, namely K-Means clustering, Association Rule Mining, and Text Mining, were used to analyze the patent data. K-Means clustering was applied to group similar patent data and thereby find existing technology clusters. By performing cluster validation to find an appropriate number of clusters, we observed five clusters that represented technology clusters, i.e., Chemistry, Electrical engineering, Instruments, Mechanical engineering, and Other fields. K-Means clustering showed that the most adequate technology was Mechanical engineering, since it had the largest amount of patent data, and that the most inadequate technology was Chemistry, since it had the smallest amount of patent data. Both have interesting aspects to be developed in the future in order for companies to gain a competitive advantage.
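The clustering step summarized above can be illustrated with a bare-bones version of Lloyd's K-Means algorithm. This is only a sketch under stated assumptions: the two-dimensional points are hypothetical stand-ins for patent feature vectors (the feature encoding used in the study is not reproduced here), the initial centroids are fixed by hand so that the run is deterministic, and cluster validation is omitted.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        centroids = [
            # Keep the old centroid if a cluster happens to be empty.
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
    (8.0, 8.0), (9.0, 8.0), (8.0, 9.0), (9.0, 9.0),
]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
sizes = [len(cluster) for cluster in clusters]
print(sizes)  # [3, 4]
```

Comparing the resulting cluster sizes is the same kind of reading as above, where the largest and smallest clusters were identified by their amount of patent data.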
The five technology clusters were the focus groups, where each group consisted of various IPC codes and patent titles. Useful information can be extracted from them using Association Rule Mining (ARM) and Text Mining. ARM was applied to find co-occurrences among IPC codes and patent titles. The antecedent (A) and consequent (B) of an association rule were interpreted as follows: if technology A was developed, then technology B was also developed. ARM helped us deduce meaningful rules that identify important relationships among technology classes and invention concepts.
Text Mining was applied to extract key terms from the patent titles in each cluster. A limitation of this study is that we only considered patent titles in English. The key terms extracted from each cluster were pruned to retain the most relevant ones, and were counted and indexed to compute the total term occurrence and frequency. The extracted key terms were then used by ARM to find co-occurrences among invention concepts.
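The key-term step can be sketched as follows. This is a simplified illustration and not the text mining pipeline used in the study: the titles and the stop-word list are hypothetical, tokenization is a plain regular expression, and pruning is approximated by a minimum occurrence threshold (`min_count`), which is an assumption made for this example.

```python
import re
from collections import Counter

# Hypothetical English patent titles and a hypothetical stop-word list.
titles = [
    "Wind turbine blade and method for manufacturing the blade",
    "Foundation for a wind turbine",
    "Method for controlling a wind turbine generator",
]
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}


def extract_terms(title):
    """Lower-case the title, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def key_terms(titles, min_count=2):
    """Count term occurrences and keep terms seen at least `min_count` times."""
    counts = Counter(t for title in titles for t in extract_terms(title))
    return {term: n for term, n in counts.items() if n >= min_count}


terms = key_terms(titles)
print(terms)  # {'wind': 3, 'turbine': 3, 'blade': 2, 'method': 2}
```

The surviving terms per title could then be treated as the items of a transaction, so that rules such as (turbine, foundation → wind) can be mined in the same way as rules over IPC codes.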
Association rules derived from IPC codes and key terms of patent titles can be assessed by using the values of support, confidence, and lift to determine the strength of the rules. Additionally, we used SNA to further analyze the association rules and to visualize a network structure. SNA provided a network visualization and several measurements, i.e., degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. These measures can be used to determine the most influential technologies, as well as the most influential inventions. They also identify the bridges between technologies and inventions, their closeness, and their level of importance.
The results of the proposed methods and conceptual framework show the relationships in patent data. Each technology cluster consists of the most influential technologies and inventions, connected with each other, and there are opportunities for the development of new technologies and inventions. The technology influencers can inspire an inventor or company to develop products or services that satisfy their customers. Many companies search for patents to explore the target technology for developing their products or services, as well as to avoid intellectual property infringement. Patent management is necessary for companies that require R&D to create new technology for product development. Companies can manage their knowledge by accessing the patents held by individuals or organizations. Access to technological knowledge can be achieved through collaboration with patent holders in order to receive technology transfer. One good practice for minimizing the risk of infringing on the patent rights of others and for saving a company's resources is to conduct a "freedom to operate" analysis at an early stage of the company's establishment.
In summary, in this study we applied various data mining methods to gain insight from patent data, and applied SNA to explore technology-influenced networks and to investigate the influential patent holders in various technology sectors around the world. These results can help inventors and companies plan their invention strategies more effectively.
Author Contributions: P.A. designed and performed the experiment, and analyzed the data. S.T. supervised the research and revised the paper. All authors have read and agreed to the published version of the manuscript.
Acknowledgments: Our thanks to the European Patent Office (EPO) for allowing us to access the patent database and update our information regularly.

Conflicts of Interest:
The authors declare no conflict of interest.