Cluster Analysis of Automobile Innovative Users Based on Interactive Innovation Value

As a new model of product creation, "interactive innovation" can effectively improve the success rate of enterprise product innovation. Because of the wide application of the Internet and the pervasiveness of social media such as forums and blogs, online product reviews are growing rapidly. These social media platforms provide an effective channel for interactive innovation between enterprises and users. This paper is based on the innovation application of automobile products. The purpose of this paper is to identify and classify the innovative users in automobile forums and analyze the characteristics of different user groups. First, we summarize six typical characteristics of innovative users and quantify these six characteristics. Second, we make a cluster analysis of user data and divide innovative users into three classes according to their innovation value. Then, based on the different characteristics of the three types of users, we present different interactive innovation methods for each type. Finally, we construct a pyramid model of innovative users to show the distribution of the various users.


Introduction
Product innovation is an essential factor in enterprise development. Constant technological innovation and new product development are essential for enterprises to survive in increasingly fierce market competition [1]. However, product innovation is very risky. A large number of cases demonstrate that many product innovations fail outright or fail to recover the investment in the end. Therefore, it is important to raise the success rate of product innovation. As a new model of product innovation, "interactive innovation" enables enterprises to understand the needs of users comprehensively and effectively, thereby improving the chance that enterprise product innovation succeeds. "Interactive innovation" refers to the innovative design and improvement of products through interaction and cooperation between production enterprises and users [2]. As Prahalad et al. [3] pointed out, in the 21st century users would actively participate in the design and production of enterprise products and services and would become co-creators of product value. Thus, products designed by creative users are more easily accepted by the market, and companies would have a higher success rate of product innovation.
With the spread of the Internet and product-focused social media, such as blogs and forums, many users share their opinions and ideas, evaluate product performance and features, report product defects and problems, and analyze the advantages and disadvantages of different products on such platforms. A large corpus of product review data has accumulated, and those data help enterprises to uncover users' explicit and potential needs. Thus, these platforms provide an effective channel for interactive innovation between enterprises and users. If enterprises can make effective use of the convenience provided by the Internet and work closely with users on co-creation, they will be better placed to adapt to market change and win the competition. In this paper, we take the users of an online automobile forum as research subjects and study the interactive innovation model between enterprises and users.
The key to carrying out interactive innovation activity for enterprises is to identify the user community of innovation value effectively [4]. Companies can then take different approaches to cooperate with users, thereby maximizing the users' innovation value. However, every user makes a different contribution to enterprise product innovation. In other words, the value of users to product innovation follows a continuous distribution; thus users cannot simply be divided into lead users and nonlead users. Therefore, we classify innovative users based on their innovation value and then adopt different methods of interactive innovation for different user groups, so that the innovation value of users can be maximized in the product innovation activities of enterprises.

Related Work
Von Hippel [5] from Massachusetts Institute of Technology proposed the concept of "innovation source" and divided innovation sources into three classes: user innovation, manufacturer innovation, and supplier innovation. Based on the research on innovation sources, innovators who were initially successful in business were classified as high-valued innovative users [6,7]. These innovative users are a small part of the user community [2]. Thus, the key to interactive innovation is to identify and select the right users [8,9].
Professor von Hippel [2] first proposed the concept of the lead user. However, this approach, which simply divides users into lead users and nonlead users, neglects the contributions of nonlead users to enterprise product innovation. In order to provide new methods for enterprises to cooperate with users in product innovation, in this paper the term innovative users refers to the major participants of user innovation activities, and the users are grouped into three classes according to their innovation value: high-valued innovative users, medium-valued innovative users, and low-valued innovative users.
Much research has been done to explore the identification of innovative users. Poetz et al. [10] proposed screening. This method uses questionnaires or telephone interviews to collect information about targeted users, thereby finding the users who present the desired characteristics. Von Hippel et al. [11] proposed pyramiding. Pyramiding builds a chain of links through users' recommendations and follows the chain to find the users with the strongest innovative characteristics. Jeppesen et al. [12] proposed broadcasting. The details of a problem are broadcast publicly, inviting capable users to submit possible solutions. Belz et al. [13] proposed netnography. Netnography is the combination of network and ethnography, and it uses ethnographic methods to analyze people in online communities. Spann et al. [14] proposed the virtual stock market. This method recruits targeted users through the Internet, and participants trade in the virtual stock market based on their own knowledge and judgement. Finally, the users with good performance are regarded as lead users.
The above studies rely on traditional market research and customer surveys to find or identify high-valued innovative users. These methods differ only in how samples are selected and which identification criteria are used, depending on the area of the enterprise [15]. The main contributions of this paper are as follows. Firstly, we take the large amount of user information as the research object and obtain each user's innovation value through text mining. Secondly, we employ a cluster analysis algorithm to divide the innovative users into three classes and then analyze the characteristics of user groups with different innovation value. Finally, we propose corresponding suggestions for different user groups on how to carry out interactive innovation with enterprises, so as to maximize the innovation value of users and improve the success rate of product innovation.

Characteristic Description of Innovative Users.
Research on innovative users [16] has shown that some users have more motivation and potential than the others. Other studies [17] have also emphasized the characteristics of Internet users. Therefore, we divide the typical characteristic of innovative users in the automobile forum into two categories: characteristics in innovativeness and characteristics in network.

Characteristics in Innovativeness.
Based on the existing research about innovative users [16], we summarized the characteristics in innovativeness in four aspects: ahead of market trend, willingness to innovate, rich domain knowledge, and rich user experience.
(1) Ahead of Market Trend. Innovative users are capable of detecting the underlying demand in the market. On the one hand, the unsatisfied needs of innovative users reflect the future mainstream market demand. Franke et al. [18,19] proposed that innovative users should have the ability to perceive potential market demand in advance and expect to benefit from innovation. On the other hand, dissatisfaction with the existing products is another manifestation of leading demand. Dissatisfied users are more likely to discuss the shortcomings of the products, give suggestions for improvement, and even improve products themselves.
(2) Willingness to Innovate. Innovative users are willing to innovate. According to research on innovative users [4,20], the willingness to innovate is the premise for innovative users to take action to improve products. Therefore, innovative users tend to be more willing to innovate than normal users. Since no products existing in the market can satisfy their special needs, innovative users are likely to create product prototypes that meet their needs better [21]. Compared with other users, they show a stronger motivation to transform existing products.
(3) Rich Domain Knowledge. Domain knowledge reflects user's familiarity with the products and serves as the source of inspiration to satisfy their special needs through innovation [22]. Research on innovative users shows that, as users with high innovativeness, lead users have more internal skills and external resources than other normal users [23]. Under the same conditions, users with richer expertise would cost less to innovate, and they are more likely to innovate [22].
(4) Rich User Experience. Innovative users' new ideas and solutions to current problems are inspired by their user experience, so rich user experience leads to high potential to innovate [24]. Since they have been using the product for a long time, lead users have accumulated rich user experience and are able to identify many drawbacks of, and possible improvements to, the existing products or services. Thus, rich user experience is another important characteristic of innovative users.

Characteristics in the Network.
Compared with traditional user aggregation community, automobile forums take advantage of the cyberspace to connect users distributed throughout the world and share information and knowledge. Because all users' activities are online, innovative users in the automobile forum would display different characteristics in the network.
(1) Community Activity. Community activity measures the frequency of a user's online behaviors, which include both posting and replying to share expertise, ideas, insights, and solutions in the community. Knowledge sharing is one of the most important functions of a virtual community. Research has shown that innovative users tend to be active members in virtual communities [25]. Olson et al. [26] have also found that innovative users are greatly affected by the willingness to share and cooperate in the brand community. Thus, a user's community activity is a valuable indicator of the user's innovation value.
(2) Network Connectivity. Network connectivity measures how centrally the user is located in the social network formed in the virtual community. Schreier et al. [27] found that users with higher awareness of innovation tend to be the opinion leaders in the community. A user's role as an opinion leader is reflected through the user's connectivity in the social network, namely, the strength of network connection. The higher the user's strength of network connection in the social network, the greater the user's role in communicating and spreading innovation and, in other words, the higher the user's innovation value. Thus, a user's network connectivity is an important indicator of the user's innovation value.

Quantitative Model of Innovative User's Characteristics.
The quantitative model of innovative user's characteristics includes two parts: characteristics in innovativeness can be indicated by the content of users' posts, which can be gained using data mining and text analysis technologies; characteristics in the network can be reflected by users' demographic data and network activity data.

Quantitative Model of Characteristics in Innovativeness.
The user's innovativeness is mainly reflected by the user's posts.
In order to quantify the characteristics of innovativeness, we need to categorize the posts and use text classification technology to classify them. The level of innovativeness can then be quantified based on the text analysis of the user's corpus (see Table 1).
Using the text classification technology to classify the posts, for a specific post theme, we can get the number of posts ($N$), the number of essential posts ($N_e$), and the number of nonessential posts ($N_n$). The essential posts are posts pinned by forum administrators. When users think that a post is valuable, they send a request to the administrators to highlight the post. Generally, such posts are rich in content and have high reading value. They can be replied to, and the author can modify the original post. Thus, the essential posts contain high-valued information and should be differentiated from the nonessential posts. Therefore, we can get the following equation:
$$N = N_e + N_n \quad (1)$$
For the $i$th post under the corresponding theme ($i = 1, \ldots, N$), the level of the quantified characteristic in innovativeness presented in the post can be obtained by the following equation:
$$l_i = w_R R_i + w_V V_i \quad (2)$$
where $R_i$ represents the number of replies of the $i$th post and $V_i$ represents the number of views. $w_R$ and $w_V$ represent the weights given to the corresponding attributes, respectively.
Consequently, the level of the quantified characteristic in innovativeness ($L$) can be calculated by the following formula:
$$L = \alpha \sum_{i \in E} l_i + \beta \sum_{i \in \bar{E}} l_i \quad (3)$$
where $E$ and $\bar{E}$ denote the sets of essential and nonessential posts, and $\alpha$ and $\beta$ are constants, representing the weights given to essential posts and nonessential posts, respectively. In the same way, we can quantify the level of the four characteristics in innovativeness, i.e., ahead of market trend ($L_1$), willingness to innovate ($L_2$), rich domain knowledge ($L_3$), and rich user experience ($L_4$), by applying formula (3) to the posts of the corresponding category (formulas (4)-(7)).
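The quantification described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation. It assumes that a post's value is a weighted sum of its replies and views, and that a characteristic level is a weighted sum of post values over essential and nonessential posts. The names `Post`, `post_value`, and `characteristic_level` are hypothetical, and the default weights are the values reported later in the paper (0.65 and 0.35).

```python
from dataclasses import dataclass


@dataclass
class Post:
    replies: float   # number of replies (assumed to be normalized already)
    views: float     # number of views (assumed to be normalized already)
    essential: bool  # True if the post was highlighted by administrators


def post_value(post, w_replies=0.65, w_views=0.35):
    """Weighted sum of replies and views for a single post."""
    return w_replies * post.replies + w_views * post.views


def characteristic_level(posts, w_essential=0.65, w_nonessential=0.35):
    """Weighted sum of post values over essential and nonessential posts.

    Assumption: the two groups are combined by summation; the source text
    only states that each group receives its own constant weight.
    """
    essential = sum(post_value(p) for p in posts if p.essential)
    nonessential = sum(post_value(p) for p in posts if not p.essential)
    return w_essential * essential + w_nonessential * nonessential


# Hypothetical user with one essential and one nonessential post.
posts = [Post(1.0, 1.0, True), Post(0.5, 0.5, False)]
level = characteristic_level(posts)
```

Calling `characteristic_level` once per post category (ahead of market trend, willingness to innovate, domain knowledge, user experience) would give the four innovativeness levels of a user under these assumptions.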

Quantitative Model of Characteristics in the Network
(1) Community Activity. In consideration of the scalability of data, we choose the duration of registration ($T$) and the number of activities ($F$) as metrics to measure the community activity. The duration of registration is the total number of months from the user's registration to the date of data acquisition. The number of activities is the total number of the user's posting and replying activities. Therefore, the level of community activity ($A$) can be represented by the frequency of the user's activities, as shown in formula (8).

$$A = \frac{F}{T} \quad (8)$$
(2) Network Connectivity. In this study, we choose two metrics to measure a user's network connectivity: the number of times the user replies to others' posts ($M_{out}$) and the number of times the user is replied to by others in the community ($M_{in}$). Thus, the level of network connectivity ($K$) can be quantified using formula (9).
$$K = \gamma M'_{out} + \delta M'_{in} \quad (9)$$
where $M'_{out}$ and $M'_{in}$ represent the values after normalization. $\gamma$ and $\delta$ are constants, representing the weights of $M'_{out}$ and $M'_{in}$.
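The two network characteristics can be sketched as follows. This is a hedged illustration. It assumes that community activity is the number of activities divided by the months since registration, and that network connectivity is a weighted sum of the two normalized reply counts. The function and parameter names are hypothetical, and the equal weights of 0.50 are the values reported later in the paper.

```python
def community_activity(num_activities, months_registered):
    """Activity frequency: posts plus replies per month of registration."""
    if months_registered <= 0:
        raise ValueError("months_registered must be positive")
    return num_activities / months_registered


def network_connectivity(replies_given_norm, replies_received_norm,
                         w_given=0.50, w_received=0.50):
    """Weighted sum of the normalized outgoing and incoming reply counts."""
    return w_given * replies_given_norm + w_received * replies_received_norm


# Hypothetical user: 120 posts and replies over 24 months of registration.
activity = community_activity(120, 24)
connectivity = network_connectivity(0.8, 0.4)
```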

Indicator System.
The indicator system for identifying innovative users in an automobile forum (e.g., AutoHome) is shown in Table 2.

Preprocessing of User Data for Automobile Forum.
In this subsection, we will introduce the data preprocessing, including data collection, text categorization, and quantitative processing of user data. Through text categorization processing, the posts from Tiguan Forum in AutoHome are divided into five categories: posts indicating user's ahead of market, willingness to innovate, domain knowledge, user experience, and others, respectively.
As the text categorization method is a supervised learning process, it is necessary to construct a training set in order to provide a reference for the automatic classification. In this study, we randomly selected some posts from the AutoHome Tiguan user database, labeled the posts with the 5 aforementioned tags (ahead of market, willingness to innovate, domain knowledge, user experience, and others), and constructed the training set. The training set contains 4,000 forum posts from the five categories. The last tag is used to identify and filter posts that do not reflect the characteristics in innovativeness.
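The supervised categorization step can be illustrated with a small classifier. The paper does not name the classification algorithm, so the sketch below swaps in a multinomial naive Bayes classifier with Laplace smoothing purely for illustration. The five example posts are invented English stand-ins for the labeled forum posts, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_posts):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_posts:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_docs, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior for the given text."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_docs.items():
        score = math.log(doc_count / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training posts, one per tag used in the paper.
training_set = [
    ("future models should add adaptive headlights", "ahead of market"),
    ("i rebuilt the trunk organizer myself", "willingness to innovate"),
    ("turbo lag comes from exhaust gas pressure", "domain knowledge"),
    ("after two years of driving the seats still feel firm", "user experience"),
    ("selling concert tickets this weekend", "others"),
]
model = train_naive_bayes(training_set)
predicted = classify("the seats feel firm after long driving", model)
```

A real training set would hold thousands of labeled posts, and Chinese forum text would need a word segmentation step in place of `split()`.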
The categorization result is shown in Table 3. To check the result, we drew a sample, labeled it manually, and compared the automatic labels with the manual labels. The comparison shows an accuracy rate of 69.03%, which is acceptable for the purposes of this research.
As shown in the categorization result, the number of posts labeled as user experience is more than the number

Table 2: Indicator system of innovative users identification in the automobile forum.

Characteristics in innovativeness
Ahead of market trend Replies to related recommended posts

Quantitative Processing of User Data.
Through a web crawler and text categorization, the user data from Tiguan Forum in AutoHome are obtained and processed. According to the indicator system constructed in Section 3, we need to quantify the data to get the level of each user's characteristics. The weights of each characteristic were determined through a seminar and a survey of experts in the automobile field. The weights of essential and nonessential posts, of replies and views, and of the two network connectivity metrics are 0.65, 0.35, 0.65, 0.35, 0.50, and 0.50, respectively. First, we substitute the values of each weight into formulas (1)-(9) and calculate the quantified values of the corresponding innovation characteristics of each user. In this way, information on 12624 users is obtained. Finally, we normalize the results for users' innovative features and network characteristics. Partial normalized results are shown in Table 4.
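The normalization step can be sketched as follows. The paper does not state which normalization is used, so min-max scaling to the range [0, 1] is an assumption here, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All users share the same value, so the feature carries no contrast.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw activity frequencies for three users.
normalized = min_max_normalize([2, 4, 6])
```

Applying the same function to each of the six characteristics puts them on a common scale before clustering, so that no single characteristic dominates the distance computation.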

Results of Cluster Analysis.
The paper adopts cluster analysis to classify users in the automobile forum and identify users with high potential to innovate. Based on the study of clustering algorithms, we choose to combine the Balanced Iterative Reducing and Clustering Using Hierarchies (BIRCH) algorithm with the Agglomerative Nesting (AGNES) algorithm. First, we use the BIRCH algorithm to perform the preclustering and construct the clustering feature tree, and then we use the AGNES algorithm to carry out the clustering and classify the users.
Applying the chosen two-step clustering algorithm to cluster members of Tiguan Forum in AutoHome, the final result shows the 12624 samples are divided into 3 user groups, one cluster with 379 samples, one with 1880 samples, and the third one with 10365 samples, as shown in Table 5.
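The two-step idea can be illustrated with a simplified, pure-Python sketch. It is not BIRCH itself. Step 1 is a flat, threshold-based pre-clustering that keeps only a count and a linear sum per subcluster (the same summary a BIRCH clustering feature stores, but without the tree). Step 2 is an agglomerative merge of the subclusters by centroid distance, in the spirit of AGNES. All function names, the threshold, and the toy data are hypothetical.

```python
import math


def precluster(points, threshold):
    """Step 1: absorb each point into the nearest subcluster whose centroid
    lies within `threshold`; otherwise start a new subcluster."""
    subclusters = []  # each entry: [count, linear_sum]
    assignment = []   # subcluster index of every input point
    for p in points:
        best, best_dist = None, threshold
        for idx, (n, ls) in enumerate(subclusters):
            d = math.dist(p, [s / n for s in ls])
            if d <= best_dist:
                best, best_dist = idx, d
        if best is None:
            subclusters.append([1, list(p)])
            best = len(subclusters) - 1
        else:
            subclusters[best][0] += 1
            subclusters[best][1] = [s + x for s, x in zip(subclusters[best][1], p)]
        assignment.append(best)
    return subclusters, assignment


def agglomerate(subclusters, k):
    """Step 2: merge the two groups with the closest centroids until k remain."""
    groups = [{"members": [i], "n": n, "ls": list(ls)}
              for i, (n, ls) in enumerate(subclusters)]
    while len(groups) > k:
        best_pair, best_dist = None, math.inf
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                ca = [s / groups[a]["n"] for s in groups[a]["ls"]]
                cb = [s / groups[b]["n"] for s in groups[b]["ls"]]
                d = math.dist(ca, cb)
                if d < best_dist:
                    best_pair, best_dist = (a, b), d
        a, b = best_pair
        groups[a]["members"] += groups[b]["members"]
        groups[a]["n"] += groups[b]["n"]
        groups[a]["ls"] = [x + y for x, y in zip(groups[a]["ls"], groups[b]["ls"])]
        del groups[b]
    return {sub: label for label, g in enumerate(groups) for sub in g["members"]}


def two_step_cluster(points, threshold, k):
    """Pre-cluster the points, merge the subclusters, and label every point."""
    subclusters, assignment = precluster(points, threshold)
    sub_to_label = agglomerate(subclusters, k)
    return [sub_to_label[s] for s in assignment]


# Toy 2-D data standing in for normalized user characteristics.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (0.5, 0.5), (0.55, 0.5), (1.0, 1.0), (0.95, 1.0)]
labels = two_step_cluster(points, threshold=0.06, k=3)
# labels -> [0, 0, 0, 1, 1, 2, 2]
```

The point of the first step is cost: the quadratic pairwise search in the second step runs over a handful of subclusters instead of all 12624 users.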
The following results can be obtained from Table 5. First, users in Cluster 1 are the minority, accounting for around 3% of the total. These users score significantly higher than the other two groups of users in one or a few innovative characteristics. Thus, users in Cluster 1 are named high-valued innovative users. Second, users in Cluster 2 are more numerous than those in Cluster 1, accounting for around 15% of the total. Users in Cluster 2 have a higher level of characteristics in innovativeness than users in Cluster 3 but a lower level than users in Cluster 1. Thus, users in Cluster 2 are named medium-valued innovative users. Finally, as the majority of users, users in Cluster 3 account for around 82% of the total. The level of characteristics of users in Cluster 3 is generally lower than that of the other two types of users. Thus, users in Cluster 3 are named low-valued innovative users.
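The quoted shares follow directly from the cluster sizes reported above, as the following arithmetic check shows:

```latex
\[
\frac{379}{12624} \approx 3.0\%, \qquad
\frac{1880}{12624} \approx 14.9\%, \qquad
\frac{10365}{12624} \approx 82.1\%, \qquad
379 + 1880 + 10365 = 12624 .
\]
```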

Cluster Validation.
After clustering, we need to evaluate the validity of the clustering result. First, this paper adopts the Analytic Hierarchy Process (AHP) to obtain suitable weights for the different characteristics of innovative users. After the weighted calculation, each user has a composite score. Second, the users are sorted and classified by their AHP composite scores, which yields the manually annotated clusters. By comparing the results of clustering and manual annotation, this paper employs three external criteria to evaluate cluster validity: precision, recall, and F value.
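The AHP weighting and composite scoring can be sketched as follows. This is an illustration under stated assumptions. The weights are approximated by normalizing the row geometric means of a reciprocal pairwise comparison matrix, which is a common approximation of the AHP priority vector. The 3x3 matrix below is invented (the paper compares six characteristics), and the function names are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalizing row geometric means."""
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def composite_score(features, weights):
    """Weighted sum of a user's normalized characteristic levels."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical reciprocal matrix: entry [i][j] states how much more
# important characteristic i is than characteristic j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical users with three normalized characteristic levels each.
users = {"u1": [1.0, 0.5, 0.0], "u2": [0.2, 0.9, 0.9]}
ranking = sorted(users, key=lambda u: composite_score(users[u], weights),
                 reverse=True)
```

Sorting users by composite score and cutting the ranking into groups gives the reference classes against which the clustering result is compared.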
The formulas for these three external criteria are as follows. Assume that the final clustering result is $C = \{C_1, C_2, \ldots, C_m\}$ and that the cluster structure determined by human annotators is $H = \{H_1, H_2, \ldots, H_n\}$. For every cluster $C_i$, assume that there is a corresponding cluster $H_j$, and the correspondence is unknown. In order to find the exact $H_j$ for $C_i$, we need to scan the clusters in $H$. Then we can calculate precision, recall, and F value.
Precision: $$\mathrm{Precision}(C_i, H_j) = \frac{|C_i \cap H_j|}{|C_i|}$$
Recall: $$\mathrm{Recall}(C_i, H_j) = \frac{|C_i \cap H_j|}{|H_j|}$$
F value: $$F(C_i, H_j) = \frac{2 \cdot \mathrm{Precision}(C_i, H_j) \cdot \mathrm{Recall}(C_i, H_j)}{\mathrm{Precision}(C_i, H_j) + \mathrm{Recall}(C_i, H_j)}$$
Since the number of samples is enormous, we adopt the method of sampling comparative analysis. The manual annotation method is as follows. First, 10% of the samples were extracted from each of Cluster 1, Cluster 2, and Cluster 3, and the selected samples were mixed. Second, using AHP, we invited experts to rate the six characteristics according to their importance relative to the other characteristics. We then obtained the weights of each characteristic being 0.36,
According to the comparison of the cluster analysis with the manually labeled testing samples, the precision, recall, and F value are shown in Table 6. The range of the three measurements above is [0, 1]. The larger the value, the more similar the result of the clustering analysis is to the result of the manual classification. As shown in Table 6, the clustering result of the chosen algorithm is acceptable, and the analysis of the result is reasonable.
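The three external criteria can be computed as in the sketch below, which uses the standard set-overlap definitions of precision, recall, and F value. It assumes that the matching manual group for a cluster is the one with the highest F value, which is one reasonable way to resolve the unknown correspondence. The function names and the user ids are hypothetical.

```python
def cluster_scores(cluster, reference):
    """Precision, recall, and F value of one cluster against one manual group.

    Both arguments are sets of user ids.
    """
    overlap = len(cluster & reference)
    precision = overlap / len(cluster) if cluster else 0.0
    recall = overlap / len(reference) if reference else 0.0
    f_value = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return precision, recall, f_value


def best_match(cluster, references):
    """Scan all manual groups and return (name, precision, recall, F) of the
    group with the highest F value."""
    scored = [(name,) + cluster_scores(cluster, ref)
              for name, ref in references.items()]
    return max(scored, key=lambda item: item[3])


# Hypothetical sample: one algorithmic cluster and two manually labeled groups.
cluster = {1, 2, 3, 4}
references = {"high": {1, 2, 3, 5, 6}, "low": {4, 7}}
name, precision, recall, f_value = best_match(cluster, references)
```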

Characteristic Analysis of Cluster User Groups
The cluster analysis automatically divides the automobile forum users into three classes according to their innovation value: high-valued innovative users, medium-valued innovative users, and low-valued innovative users.
According to the analysis result, the distribution of innovative users is a typical pyramid. High-valued innovative users have the greatest ability to innovate. They are the minority with leading characteristics at the top of the pyramid. The number of medium-valued innovative users is slightly more than high-valued innovative users and less than low-valued innovative users. Low-valued innovative users have little potential to innovate, but they are the majority of the user group. The cluster analysis result shows that as the potential to innovate increases, the number of users decreases. Therefore, even for automobile industry with a large number of users, the number of high-valued innovative users will be small as well.
According to the analysis result, we can conclude that the higher the user's value or potential in innovation, the more deeply the users should be involved in co-creation. Based on the user clustering results, we proposed the corresponding innovative interaction methods and constructed the pyramid, as is shown in Figure 1.
Different methods are adopted to co-create with different user groups. For high-valued innovative users, enterprises should cooperate with the users to carry out the product innovation, thereby excavating the innovation value of the high-valued user as the co-creator. For medium-valued innovative users, enterprises should lead and guide the users during the innovation process, thereby excavating the innovation value of the medium-valued users as information providers. For low-valued innovative users, enterprises should interact with this user group after the launching of products to acquire feedback and propagate products, thereby excavating the innovation value of the low-valued user as the product user.
The pyramid of innovative users illustrates the distribution of innovative users in the user group and the methods to co-create with different types of users. As shown in the model, the higher the users' value for innovation, the more active they are in the innovation process.

Conclusions
The purpose of this paper is to identify and classify the innovative users in automobile forums and to study interactive innovation methods for different user groups. First, we analyze the characteristics of innovative users in automobile forums and construct an identification system to quantify the innovation score of each user. Second, the system acquires the user data of automobile forums and uses text mining to calculate the innovation value of each user. We then conduct a cluster analysis of users according to their innovation value. Third, we analyze the characteristics of user groups with different innovation value and give corresponding suggestions on co-creation methods for different users. Finally, we construct a pyramid model of innovative users to show the distribution of the various users.
This study contributes to innovation user identification and co-creation with users. However, there are still limitations in the study. For example, the determination of index weight inevitably adopts the method of questionnaire surveys and expert interview. Thus, the method proposed in this paper cannot completely avoid subjective influence. In conclusion, further research can continue in the following two directions. First, the application of text analysis and deep learning in innovative user identification can be further explored. Second, the application of cluster analysis in innovative user identification can also be deeply discussed.

Data Availability
The experimental data used to support the findings of this study are included within the article. The website of this data is "Tiguan Forum" (https://www.autohome.com.cn/874/).

Conflicts of Interest
The authors declare that they have no conflicts of interest.