Diversity of Recommendation with Considering Data Similarity among Different Types of Contents

Accuracy and diversity are both important objectives of recommendation methods, but traditional research has focused mainly on accuracy, that is, on the quality of recommendations. Diversity is now also important, in terms of quantity as well as quality, because people's desire for content consumption has grown rapidly. In this paper, we pay attention to the similarity of data gathered simultaneously across different types of content. With this motivation, we propose an enhanced recommendation method that uses correlation analysis and considers data similarity between two types of content, movies and music. Specifically, we regard folksonomy tags for music as data correlated with movie genres, even though the two are different attributes that depend on their content type. That is, we generate new movie recommendations by mapping music folksonomy tags to movie genres, in addition to the items recommended by typical collaborative filtering. We evaluate the effectiveness of our method through experiments with a real data set. The experiments show that the diversity of recommendations can be extended by considering data similarity between music content and movie content.


Manuscript received October 15, 2015; revised December 28, 2015.
Information technologies have changed the communication society into a knowledge society. People are overwhelmed by the big data produced massively by information systems, the web, social network services and smart devices, and they have difficulty selecting the information they need for their situation. It is therefore necessary to filter, refine or analyze the information through big data processing, especially data mining [1]. Recently, many researchers in data mining have focused on personalization with diversity in addition to accuracy of recommendation [2]-[4]. Traditional collaborative filtering research has been conducted in terms of accuracy of recommendation, concentrating on recommendation that uses user profiling and personal tendency in conjunction with social network services [5]-[7]. Currently, diversity of recommendation is also very important. From this viewpoint, collaborative filtering approaches based on user experience have been studied more and more. These studies are based on folksonomy tags [8]. Folksonomy means collaborative tagging or social tagging, in which users classify content directly in order to retrieve the information they need. Folksonomy tags have the advantage of flexibility when an analysis is extended, because they form a dynamic horizontal structure rather than the static taxonomy hierarchy of a typical classification system. However, folksonomy has two problems in the analysis process: synonyms and neologisms. That is, folksonomy needs a thesaurus to process synonyms, and stemming work to remove stopwords and to distinguish terms of similar meaning [9]. To overcome these problems, [8] and [10] established a tag cloud architecture with dynamic navigation links, and [11] constructed an ontology-based dynamic data catalog using a tag cloud. However, these studies are still limited with respect to diversity because they used folksonomy tags within only one type of content.
In this paper, we pay attention to the similarity of data gathered simultaneously across different types of content. For example, we note that folksonomy tags for music can be correlated with movie genres, and that their values can affect each other in terms of association rules. That is, we regard folksonomy tags for music as data correlated with movie genres, even though the two are different attributes that depend on their content type. With this motivation, we propose an enhanced recommendation method that uses correlation analysis and considers data similarity between two types of content, movies and music. We propose a collaborative filtering algorithm for diversity of recommendation and evaluate its effectiveness through experiments with a real data set.

A. Overview of the Proposed Method
We propose an enhanced recommendation method that uses correlation analysis and considers data similarity between two types of content. We deal with movie content and music content as typical examples. The core concept of our method is the inter-relationship between movie genres and the folksonomy tags of music, which is used to diversify recommendations. The flow of the proposed method is shown in Fig. 1. As shown in Fig. 1, the flow is divided into an internal data process for movie content and an external data process for music content. In the internal data process, the method decides the recommended movie items with typical collaborative filtering algorithms. In the external process, the method extracts associated interests and rating predictions through a synonym process that maps folksonomy tags for music to the preferred movie genres generated by the internal process. As a result, our method can generate a new recommendation list by analyzing the inter-relationship between movie content and music content. It is expected to improve the accuracy of recommendation by computing over the doubled amount of data from both movies and music. Above all, it is also expected to diversify recommendations by analyzing different types of content together.

B. Procedure of the Proposed Method
Our proposed method performs correlation analysis based on data similarity between two types of content, movies and music. Specifically, it recommends content to users based on the result of mapping the folksonomy tags of music to the movie genres preferred by the user group. The proposed method follows the steps below.
Step 1-1: User profile retrieval / Preferred keyword search
• Cluster users using user profile data, which contains age, occupation, sex, etc.
• Form first-level groups of similar users from their movie consumption data, gathered from MovieLens by the GroupLens Research Project [12] team.
Step 1-2: Data matrix generation
• Examine the preferred movie genres of the user groups selected in Step 1-1.
• Generate a data matrix among the user groups with the Apriori algorithm.
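Steps 1-1 and 1-2 can be illustrated with a minimal sketch. It assumes that users with identical profiles form one group (a simple stand-in for the clustering step) and that each user's preferred genres form one transaction for an Apriori-style level-wise search. The profiles, genre sets, support threshold and function names are all hypothetical and are not taken from the experiment.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical user profiles: user id -> (age band, occupation, sex).
profiles = {
    1: ("18-24", "student", "F"),
    2: ("18-24", "student", "F"),
    3: ("35-44", "engineer", "M"),
    4: ("18-24", "student", "F"),
}

# Hypothetical preferred genres per user (one transaction per user).
preferred = {
    1: {"horror", "thriller"},
    2: {"horror", "thriller", "drama"},
    3: {"western"},
    4: {"horror", "drama"},
}

def group_users(profiles):
    """Step 1-1 stand-in: users with identical profiles form one group."""
    groups = defaultdict(list)
    for user, profile in profiles.items():
        groups[profile].append(user)
    return dict(groups)

def frequent_genre_sets(transactions, min_support, max_size=2):
    """Step 1-2 stand-in: Apriori-style search for frequent genre sets."""
    transactions = list(transactions)
    n = len(transactions)
    frequent = {}
    candidates = {frozenset([g]) for t in transactions for g in t}
    size = 1
    while candidates and size <= max_size:
        # Count how many transactions contain each candidate set.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(level)
        # Build larger candidates whose smaller subsets are all frequent.
        items = sorted({g for c in level for g in c})
        size += 1
        candidates = {
            frozenset(combo)
            for combo in combinations(items, size)
            if all(frozenset(s) in level for s in combinations(combo, size - 1))
        }
    return frequent

groups = group_users(profiles)
students = groups[("18-24", "student", "F")]
support = frequent_genre_sets((preferred[u] for u in students), min_support=0.6)
for genres, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(genres), round(value, 2))
```

The resulting support values play the role of the data matrix among user groups: each group is described by the genre sets that its members prefer together.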

A. Movie Data Set Gathering for Experiment
We use MovieLens as the experimental data set. The movie genres are shown in Table I. The data set consists of approximately 1 million ratings that record preference behavior. The ratings are given on a 5-point scale for 3,884 movies by 6,040 users.
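As an illustration of how such rating data can be read, the sketch below parses one rating record and one movie record. It assumes the `::`-separated layout used by the MovieLens 1M distribution files (`UserID::MovieID::Rating::Timestamp` and `MovieID::Title::Genres`, with genres separated by `|`); the function names are hypothetical and the two sample lines are given only as examples.

```python
def parse_rating(line):
    """Parse 'UserID::MovieID::Rating::Timestamp' into four integers."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return int(user_id), int(movie_id), int(rating), int(timestamp)

def parse_movie(line):
    """Parse 'MovieID::Title::Genres' into an id, a title and a genre list."""
    movie_id, title, genres = line.strip().split("::")
    return int(movie_id), title, genres.split("|")

# Sample lines in the assumed layout.
rating = parse_rating("1::1193::5::978300760")
movie = parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy")
print(rating)
print(movie)
```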

B. Synonyms Gathering for Movie Genre to Music Tag
We use the synonym dictionary of Thesaurus [14] to gather synonyms that connect movie genres with music folksonomy tags. We collected a total of 473 synonyms for the 18 movie genres, as listed in Table II. For example, Fig. 2 shows the synonyms retrieved for the movie genre 'animation'. We searched for synonyms of 'children' in place of the movie genre 'children's', and for synonyms of 'darkness' in place of the movie genre 'film-noir'. The movie genre 'action' had the most synonyms and the movie genre 'sci-fi' had the fewest. When the same synonym appeared under more than one of the 18 genres, we counted it once for each genre.
Table II shows the result of the collection. To extract folksonomy tags for the 18 genres in Table I, we also gathered a music data set from Last.fm. To support a meaningful analysis, we collected, for each movie genre, most of the top 100 music items whose tags correspond to that genre. We used the OpenAPI of Last.fm to gather the music items. As a result, we gathered a total of 1,750 music items, as shown in Table III. Every gathered music item has one or more folksonomy tags corresponding to a movie genre. The internal tag structure of a music item is shown in Fig. 3. This data set can be used to extract the inter-relationships of movie genres. Table IV shows the number of internal folksonomy tags of the music items gathered for each genre. As shown in Table IV, we can identify 645 tags in the 100 music items whose tags include synonyms of the movie genre 'animation'. We can also see that the music items for 'comedy' include the most tags and the music items for 'documentary' include the fewest. This suggests that the genre 'documentary' has a relatively weak relationship with music compared with the other genres, while the genre 'comedy' has a strong one.
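The selection of music items for a genre and the counting of their internal tags can be sketched as follows. The synonym sets, music items and function names are hypothetical; the sketch assumes that an item belongs to a genre when at least one of its tags is a synonym of that genre, and that tags are compared case-insensitively.

```python
from collections import Counter

# Hypothetical excerpt of a synonym dictionary in the spirit of Table II.
synonyms = {
    "horror": {"dread", "panic"},
    "drama": {"broadway", "farce"},
}

# Hypothetical music items with their folksonomy tags.
items = [
    {"title": "Track A", "tags": ["dread", "broadway", "rock"]},
    {"title": "Track B", "tags": ["panic", "Dread"]},
    {"title": "Track C", "tags": ["pop"]},
]

def items_for_genre(genre, items, synonyms):
    """Keep items that carry at least one tag matching a genre synonym."""
    words = synonyms[genre]
    return [it for it in items if any(t.lower() in words for t in it["tags"])]

def count_internal_tags(selected):
    """Count every tag attached to the selected items."""
    return Counter(t.lower() for it in selected for t in it["tags"])

horror_items = items_for_genre("horror", items, synonyms)
tag_counts = count_internal_tags(horror_items)
print([it["title"] for it in horror_items])
print(tag_counts.most_common())
```

Summing the counts for one genre gives a figure comparable to an entry of Table IV, under the assumptions stated above.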

C. Mapping Folksonomy Tags with Movie Genres
We use the folksonomy tags of music to make additional movie recommendations by mapping the folksonomy tags to movie genres. Fig. 4 shows the mapping process. In our experiment, for example, the tag 'broadway' is a synonym of 'drama', and it was found 54 times in the music items corresponding to the genre 'horror'.
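A minimal sketch of this mapping step is given below. It takes the tag counts of the music items gathered for one genre and accumulates them under every other genre whose synonym list contains the tag. Only the pair 'broadway' / 'drama' with its count of 54 comes from the experiment; the remaining tags, counts, the threshold of 10 and the function name are hypothetical.

```python
from collections import Counter

def map_tags_to_genres(source_genre, tag_counts, synonyms):
    """Sum tag frequencies under each other genre that lists the tag as a synonym."""
    mapped = Counter()
    for tag, count in tag_counts.items():
        for genre, words in synonyms.items():
            if genre != source_genre and tag in words:
                mapped[genre] += count
    return mapped

# Hypothetical synonym excerpt and tag counts for music items of 'horror'.
synonyms = {
    "horror": {"dread", "panic"},
    "drama": {"broadway", "farce"},
    "war": {"combat"},
}
tag_counts = {"broadway": 54, "dread": 30, "combat": 3, "rock": 80}

mapped = map_tags_to_genres("horror", tag_counts, synonyms)
# Assumed rule: a genre is added when its mapped frequency reaches a threshold.
additional = [genre for genre, count in mapped.most_common() if count >= 10]
print(mapped)
print(additional)
```

Under these assumptions, a user group that prefers 'horror' would receive 'drama' as an additional recommendation genre.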

D. Recommendation Based on Collaborative Filtering with K-Nearest Neighborhood
We implemented the similarity calculation algorithm on Eclipse Luna. Usually, the K-nearest neighborhood algorithm is used to find a group of similar users, and the Pearson correlation coefficient is used to calculate the similarity distance from a user to the other users in the neighborhood. However, the Pearson correlation coefficient suffers from inaccuracy because it does not consider the differences in rating scale among users. To address this problem, we employ the adjusted cosine-based similarity calculation of equation (1).
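For reference, the adjusted cosine similarity between two items i and j is commonly written in the following form. The item index j and the set U of users who rated both items are assumed notation; R_{u,i} and \bar{R}_u denote the rating of item i by user u and the average rating of user u.

```latex
\mathrm{sim}(i,j) =
  \frac{\sum_{u \in U} \left(R_{u,i} - \bar{R}_u\right)\left(R_{u,j} - \bar{R}_u\right)}
       {\sqrt{\sum_{u \in U} \left(R_{u,i} - \bar{R}_u\right)^2}\,
        \sqrt{\sum_{u \in U} \left(R_{u,j} - \bar{R}_u\right)^2}}
```

Subtracting each user's own average before taking the cosine is what compensates for users who rate systematically high or low.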
In equation (1), R_{u,i} is the rating score of item i given by user u, and \bar{R}_u is the average rating score of user u. In this paper, we form the neighborhood cluster for a user with the above similarity calculation method. Among the items that the user has not yet selected, we find the n items to which the user's neighborhood gives the highest preference.
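The two operations described here can be sketched in a few lines. The first function computes the adjusted cosine similarity between two items in its common form, subtracting each user's average rating; the second ranks the items a user has not rated by the mean score given by a neighborhood. How the neighborhood is derived from the similarity values is not detailed here, so the neighbor list is passed in directly as an assumption. The ratings and function names are hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: score on a 5-point scale}.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "D": 5},
    "u3": {"A": 1, "B": 5, "D": 4, "E": 2},
}

def user_mean(user):
    """Average rating score of one user."""
    scores = ratings[user].values()
    return sum(scores) / len(scores)

def adjusted_cosine(i, j):
    """Adjusted cosine between items i and j over users who rated both."""
    num = den_i = den_j = 0.0
    for user, scores in ratings.items():
        if i in scores and j in scores:
            mean = user_mean(user)
            di, dj = scores[i] - mean, scores[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (sqrt(den_i) * sqrt(den_j))

def top_n_unseen(user, neighbors, n):
    """Rank items the user has not rated by the neighbors' mean score."""
    seen = ratings[user]
    totals, counts = {}, {}
    for neighbor in neighbors:
        for item, score in ratings[neighbor].items():
            if item not in seen:
                totals[item] = totals.get(item, 0) + score
                counts[item] = counts.get(item, 0) + 1
    # Highest mean first; ties are broken by item name for a stable order.
    ranked = sorted(totals, key=lambda it: (-totals[it] / counts[it], it))
    return ranked[:n]

print(adjusted_cosine("A", "B"))
print(top_n_unseen("u1", ["u2", "u3"], n=2))
```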

E. Extended Recommendation through Mapping Process Between Movie Genres and Music Tags
We make a recommendation list by collaborative filtering with the K-nearest neighborhood algorithm, as shown in Table V.
In Table V, if the neighborhood of a user prefers the movie genre 'horror', the proposed method recommends the movies Century (1993), Reckless (1995) and 8 Seconds (1994) to the user by collaborative filtering. If the mapping process then generates the genre 'drama' as an additional recommendation genre alongside the existing genre 'horror', the method extends the recommendation with the movies Richard III (1995), Jefferson in Paris (1995) and Sonic Outlaws (1995) in addition to the existing recommendation list. Through the experiment, we found that the proposed method extends movie recommendation by using the inter-relationship between movies and music. Specifically, the inter-relationship refers to the correlation analysis based on data similarity between movie genres and music folksonomy tags.
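The extension step can be sketched as a merge of two lists. The movie titles below are the ones named in the example above; the function name and the rule of appending additional-genre items after the collaborative filtering items, without duplicates, are assumptions made for illustration.

```python
def extend_recommendations(base_items, items_by_genre, additional_genres):
    """Append items of the additional genres, keeping order and skipping duplicates."""
    result = list(base_items)
    seen = set(result)
    for genre in additional_genres:
        for item in items_by_genre.get(genre, []):
            if item not in seen:
                result.append(item)
                seen.add(item)
    return result

# Items from collaborative filtering for a neighborhood that prefers 'horror'.
base = ["Century (1993)", "Reckless (1995)", "8 Seconds (1994)"]
# Items available for the genre added by the mapping process.
items_by_genre = {
    "drama": ["Richard III (1995)", "Jefferson in Paris (1995)", "Sonic Outlaws (1995)"],
}

extended = extend_recommendations(base, items_by_genre, ["drama"])
print(extended)
```

Keeping the collaborative filtering items first preserves the original ranking, while the appended items widen the list with content from the mapped genre.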

IV. CONCLUSION
In this paper, we proposed a two-dimensional recommendation method that uses the inter-relationship between two different types of content. Specifically, we sought to diversify recommendations through mapping between movie genres and music folksonomy tags. That is, we generated new movie recommendations by mapping music folksonomy tags to movie genres, in addition to the items recommended by typical collaborative filtering. The experiments showed that the diversity of recommendations can be extended by considering data similarity between two types of content. We call our concept two-dimensional recommendation because it does not analyze data within a single type of content only, but analyzes the correlation of data based on the similarity between two types of content.
We are currently extending the proposed two-dimensional recommendation method to high-dimensional recommendation among two or more types of content, with a view to the diversity of recommendation. We are also examining the accuracy of this high-dimensional recommendation through verification work with actual customers in a real content distribution market.

Figure 1. Flow of the proposed method.

Figure 3. The internal tag structure of a music item.

Figure 4. Mapping internal tags of music to movie genres.

TABLE I. MOVIE GENRES IN MOVIELENS

TABLE II. SYNONYMS FOR 18 MOVIE GENRES
Genre                | Synonyms (excerpt)                  | Number of synonyms
drama                | comedy, farce, melo, etc.           | 25
thriller             | shocker, squeaker, etc.             | 7
crime                | atrocity, breach, felony, etc.      | 48
romance              | love, fling, affair, etc.           | 15
children's           | baby, kid, youth, etc.              | 44
documentary          | broadcast, film, narrative, etc.    | 7
sci-fi               | SF, futurism, sci-fi movie, etc.    | 6
horror               | disgust, dread, panic, etc.         | 23
western              | westerly, westbound, etc.           | 6
mystery              | enigma, problem, thriller, etc.     | 37
film-noir (darkness) | black, blackout, etc.               | 26
war                  | bloodshed, combat, etc.             | 16
fantasy              | fancy, illusion, etc.               | 38
musical              | sweet, melodic, orchestral, etc.    | 25

TABLE III. MUSIC ITEMS WHICH HAVE TAGS FOR MOVIE GENRES

TABLE IV. INTERNAL TAGS OF MUSIC FOR MOVIE GENRES

TABLE V. RECOMMENDATION LIST BY COLLABORATIVE FILTERING