User Profiling in a SPOC: A method based on User Video Clickstream Analysis

In the present paper, we propose a method for constructing a structured user profile in a Small Private Online Course (SPOC) based on an analysis of the user's video clickstream. We adopt an implicit approach to infer the user's preferences and experienced difficulty from a click-level analysis of his video viewing sequences, using events such as Play, Pause and Move Forward. The Bayesian method is then used to infer the user's interests implicitly. Learners with similar clickstream behavior are segmented into clusters using the unsupervised K-means clustering algorithm. Videos that match an individual learner's interests and offer a better, personalized learning experience can therefore be recommended to a learner enrolled in a SPOC, based on his video interactions and on the profiles of similar learners.


Introduction
The phenomenon of Massive Open Online Courses (MOOCs) attracted great attention at the end of 2012, which became known as the year of the MOOC. In recent years, MOOCs are said to have come to dominate online learning in higher education [1, 2], thanks mainly to the flexibility in time and place that they provide to learners; according to [1], these environments will continue to shape online learning in the future. Small Private Online Courses (SPOCs), like MOOCs, are growing remarkably in universities and corporate education, especially in blended learning and flipped classroom learning. In contrast to the openness of MOOCs, SPOCs aim to offer a tailor-made course intended for a small group of learners [2].
However, the "one-size-fits-all" learning model provided by these environments is not suited to a diversity of student profiles. Students' varied learning needs present a real challenge both to MOOCs [3,4] and to SPOCs [5]. To overcome these challenges and to improve the average completion rate of MOOCs, adaptive learning and personalized learning have recently been explored within MOOCs. Adaptivity and personalization have great potential to deliver the best learning experiences to learners enrolled in a MOOC [3,4,6,7,8,9,10,11]. In fact, personalization techniques are a classic solution recommended by many experts for improving learning, and adapting courses to the learning preferences and characteristics of students can enhance the learning process, leading to increased learner satisfaction [12,13,14,15]. Furthermore, in the last few years, recommendation techniques in the field of technology-enhanced learning (TEL) have attracted increasing interest and are employed to help learners find pertinent educational content that matches their profiles [16]. TEL recommendation systems have to be extended to various Web-based learning environments such as massive open online courses (MOOCs) [17]. The user profile is one of the issues addressed by personalization systems in order to recommend individual items, such as products, services or documents, that meet the user's interests.
In fact, a user profile is a collection of information relating to user characteristics, interests, preferences and behavior within a system [18,19]. This structure helps a system to know the user's requirements and to behave in accordance with them [20]. However, user profiling, which is the process of capturing the relevant user information, can be a really challenging task, since the values of certain attributes, such as interests, are often unknown and not explicitly submitted by the user. Implicit techniques in user profiling are therefore gaining importance [21,22].
In the present paper, we propose an approach for user profiling in a Small Private Online Course (SPOC) based on analogical reasoning, which stipulates that a user can be characterized by determining a set of similar users [19]. The method, known as clique-based filtering [23,19], is applied in the present work. This technique consists of matching a single profile with the profiles of similar users. As video lectures form an extremely important part of SPOCs, we suggest building user profiles implicitly from data collected by analyzing users' video interactions while enrolled in the SPOC. We propose to track the user at the click level and we retain the following events: Play, Pause, Move Back (RW), Move Forward (FF), Replay and Download. To infer the user's interests implicitly, we suggest using the Bayesian method from data mining. Then, for a given user, we propose to find other users who have similar video clickstream behavior by grouping them into clusters using the K-means algorithm. The interests of users from the same cluster constitute an implicit profile of the individual user.
The objective of this work is to improve the learner's learning experience within the SPOC by recommending suitable videos that meet the learner's personal interests and match his individual needs, by exploiting the profiles of similar learners.
The main contributions of the present work are: 1) the exploitation and analysis of the clickstream within a SPOC; 2) the construction of a user profile based on the learner's clickstream behavior; 3) the application of machine learning techniques within a SPOC to build a user profile; and 4) a contribution to personalization within a SPOC through the recommendation of videos that could meet the learner's interests and offer a better learning experience.
The structure of this paper is as follows. First, we present the related work on adaptive MOOCs and on clickstream analysis. Second, we describe the proposed user profile. Finally, we present our ongoing work.

Literature Review

User profile
The notion of a profile appeared in the 1970s, mainly because of the need to create custom applications that could be adapted to the user [24]. User profiles were originally developed in the field of information retrieval [18]. In fact, user profiles have long played a significant role in recommender systems, information retrieval systems and search personalization. Nowadays, user profiling is becoming a widely used technique in many applications, such as adaptive websites, e-business applications [22], information seeking [21], web browsing, one-to-one marketing [25], e-commerce websites [26,27], web personalization systems [28], adaptive MOOCs [3,4,6,8,9,29,30] and e-learning systems [16,31,32,33,34,35].

User Profiling
User profiling is the process of capturing information about the user's preferences, characteristics and activities, and of representing the user within a system. User profiling helps systems to behave according to users' individual interests in order to enhance the user experience by providing personalization, adaptivity or suitable recommendations tailored to the user's needs [20,26]. A user profile can be static, when the information it contains rarely changes (for example, common attributes such as the user's name, demographic data or learning style), or dynamic, when its information changes frequently [27]. Various approaches and techniques have been used to construct a user profile. Explicit user profiling approaches consist of information submitted directly by users through online forms, surveys or ratings [4,6,9,29]. Although this technique is a simple and easy way to create a user profile, it carries the risk that the user enters deceitful or incomplete information [22]. Recent research therefore focuses more on obtaining user data implicitly, by observing the user's interaction with the system and inferring the relevance of the information captured in this way [20,21,27,31,36,37,38].

In-Video behavior literature
As video lectures form a crucial part of MOOCs and SPOCs, we are interested in this paper in building a user profile based on an implicit analysis of data collected from learner video interactions within a SPOC at the click level. Previous research has already recognized the benefits of user-based analysis of viewed videos, such as explicit comments or tags [39], and much of the available work concerns the modeling and prediction of video behavior, with a lack of empirical studies [40]. As a result, there is limited work on implicit indicators deduced from an analysis of click-level events within a video [39,40]. In fact, video interactions such as playing, pausing and replaying, as well as peaks in re-watching sessions and play events, are useful:
• To reflect either the user's interest in a video or confusion [41];
• To reflect experienced difficulty [40];
• To tell more about learner video engagement in MOOCs [42,43,44];
• To give important dropout indicators [41,44];
• To generate video summaries [45] and fast video previews [46];
• To identify interesting video segments based on user interaction [39];
• To give MOOC instructors insights about video production in order to enhance the learner experience [40,42,44];
• To identify video interaction patterns [41], video interaction profiles [40] and video watching profiles [44];
• To reveal the relationship between the complexity of videos and student video behavior in a MOOC [47].

Study context
In the present work, we aim to build user profiles in a Small Private Online Course (SPOC) named "UNIVTICE", intended for the teachers of Hassan II University of Casablanca, Morocco. This course is created and designed on Moodle and aims to accompany and train teachers in acquiring techno-pedagogical skills. Our objective is to improve the teacher's experience within the SPOC UNIVTICE by recommending suitable videos that meet his personal interests and match his individual needs, by exploiting the profiles of similar learners. The present work is the first step, in which we describe the approach used to construct the user profile. In ongoing work, we intend to test the proposed method on a real dataset collected from the SPOC UNIVTICE.

The proposed User profiling approach
As videos are tremendously important components of the SPOC, we intend to build user profiles based on data collected by analyzing users' video interactions while enrolled in the SPOC. We propose to track the learner at the click level and, for this, we retain the following events: Play (PL), Pause (PA), Replay (RP), Move Forward (FF), Move Back (RW), Download (DL) and Stop (ST).

Notation
The following notation is used in the present paper (Table 1):

The proposed User Profile
Let us define a User Profile as a 3-tuple $(j, VVH_j, VIH_j)$, where $j$ is the learner identifier, $VVH_j$ is the Video Viewing History of learner $j$, and $VIH_j$ is the Video Interest History of learner $j$.

Video Clickstream tracking
Video Viewing Sequence $VVS_{j,i}$: We define a viewing session as beginning when a learner starts viewing a video and ending when he performs another event or action within the SPOC. The learner's video interactions are described as a sequence of the events performed by the learner on a video. Each event is described by its name and its duration in seconds. We adopt the discretization approach introduced in [48] for the temporal dimension: we take a unit of time equal to 1 second, so an event of duration t seconds is considered to be performed by the learner t times. We then normalize by retaining, as an attribute of each event, its weight relative to the total number of events performed by the learner during the viewing session.
Therefore, for a learner $j$, we denote by $VVS_{j,i}$ his viewing sequence for a video $V_i$, described as a sequence of pairs $(e, w_e)$ of events performed on video $V_i$. Each event $e$ in (PL, PA, RP, FF, RW, DL, ST) is characterized by its weight $w_e$:

$$w_e = \frac{d_e}{\sum_{e'} d_{e'}} \quad (1)$$

where $w_e$ is the weight of event $e$, $d_e$ is the duration of event $e$ in seconds, and the sum runs over all events performed during the viewing session.
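The weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a list of event code and duration pairs) and the function name `event_weights` are hypothetical and do not come from the paper.

```python
from collections import Counter

# Event codes retained in the paper.
EVENTS = ("PL", "PA", "RP", "FF", "RW", "DL", "ST")


def event_weights(sequence):
    """Return the weight of each event in one viewing session.

    `sequence` is a list of (event_code, duration_in_seconds) pairs.
    This input format is an assumption made for the sketch.
    """
    seconds = Counter()
    for code, duration in sequence:
        if code not in EVENTS:
            raise ValueError(f"unknown event code: {code}")
        # 1-second discretization: an event lasting t seconds counts t times.
        seconds[code] += int(duration)
    total = sum(seconds.values())
    if total == 0:
        # Empty session: every weight is zero.
        return {code: 0.0 for code in EVENTS}
    return {code: seconds[code] / total for code in EVENTS}


# Hypothetical session: play 60 s, pause 10 s, move back 5 s, play 25 s.
session = [("PL", 60), ("PA", 10), ("RW", 5), ("PL", 25)]
weights = event_weights(session)
print(weights)
```

In this hypothetical session the Play event accounts for 85 of the 100 recorded seconds, so its weight is 0.85, and the weights of all events sum to 1.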

Video Viewing Profile
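Let us define the Video Viewing Profile $VVP_{j,i}$ of a learner $j$ for a video $V_i$ as the vector composed of all event weights:

$$VVP_{j,i} = (w_{PL}, w_{PA}, w_{RP}, w_{FF}, w_{RW}, w_{DL}, w_{ST}) \quad (3)$$

Learner Video Viewing History $VVH_j$: for a learner $j$, we can then describe his Video Viewing History as a matrix composed of all the Video Viewing Profiles of learner $j$, one row per video, where $n$ is the number of videos:

$$VVH_j = \begin{pmatrix} VVP_{j,1} \\ VVP_{j,2} \\ \vdots \\ VVP_{j,n} \end{pmatrix} \quad (4)$$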
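As a rough illustration of these two structures, the sketch below stores a Video Viewing Profile as a list of seven weights in a fixed event order and a Video Viewing History as a list of such rows, one per video. The event order, the function names and the sample weights are assumptions made for the example.

```python
# Fixed event order used for every profile vector (an assumption of this sketch).
EVENTS = ("PL", "PA", "RP", "FF", "RW", "DL", "ST")


def viewing_profile(weights):
    """Turn a {event_code: weight} mapping into a 7-component profile vector."""
    return [float(weights.get(code, 0.0)) for code in EVENTS]


def viewing_history(weights_by_video):
    """Stack one profile per video into a matrix (a list of rows)."""
    return [viewing_profile(weights) for weights in weights_by_video]


# Hypothetical event weights of one learner on two videos.
weights_by_video = [
    {"PL": 0.85, "PA": 0.10, "RW": 0.05},
    {"PL": 0.50, "FF": 0.30, "ST": 0.20},
]
history = viewing_history(weights_by_video)
for row in history:
    print(row)
```

Events that the learner never performed on a video simply receive a weight of 0, so every row has the same length and rows can be compared component by component.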

Implicit interest indicators:
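In this section, we propose an implicit way to assess the interest of a user in a video. We assume that the viewing time and the Replay, Download and Move Back events performed by a user on a video are implicit indicators of interest in this video. An indicator takes the value 1 to mean that the user is interested in the video and the value 0 to mean that the user is not interested in the video. We consider the following implicit interest indicators:

$I^{TVT}_{j,i}$: the Total Viewing Time indicator for a user $j$ concerning a video $V_i$:

$$I^{TVT}_{j,i} = \begin{cases} 1 & \text{if } TVT_{j,i} \geq \overline{TVT}_i \\ 0 & \text{otherwise} \end{cases} \quad (5)$$

where $TVT_{j,i}$ is the total viewing time of user $j$ on video $V_i$ and $\overline{TVT}_i$ is the mean of the Total Viewing Time of video $V_i$ over all users.

$I^{RW}_{j,i}$: the indicator based on the number of Move Back events for a user $j$ on a video $V_i$:

$$I^{RW}_{j,i} = \begin{cases} 1 & \text{if } N^{RW}_{j,i} \geq \overline{N^{RW}}_i \\ 0 & \text{otherwise} \end{cases} \quad (6)$$

where $N^{RW}_{j,i}$ is the number of Move Back events of user $j$ on video $V_i$ and $\overline{N^{RW}}_i$ is the mean number of Move Back events on video $V_i$ over all users.

$I^{DL}_{j,i}$: the Download event indicator for a user $j$ on a video $V_i$:

$$I^{DL}_{j,i} = \begin{cases} 1 & \text{if the user downloads the video} \\ 0 & \text{otherwise} \end{cases} \quad (7)$$

$I^{RP}_{j,i}$: the Replay event indicator for a user $j$ on a video $V_i$:

$$I^{RP}_{j,i} = \begin{cases} 1 & \text{if the user replays the video} \\ 0 & \text{otherwise} \end{cases} \quad (8)$$

The posterior Probability of an Interest in a video
Let $H_{j,i}$ denote the hypothesis that a user $j$ has an interest in a video $V_i$, and $\bar{H}_{j,i}$ the opposite hypothesis. Writing $I_{j,i} = (I^{TVT}_{j,i}, I^{RW}_{j,i}, I^{DL}_{j,i}, I^{RP}_{j,i})$ for the observed indicators, the posterior probability can be assessed using the Bayesian method:

$$P(H_{j,i} \mid I_{j,i}) = \frac{P(I_{j,i} \mid H_{j,i})\, P(H_{j,i})}{P(I_{j,i} \mid H_{j,i})\, P(H_{j,i}) + P(I_{j,i} \mid \bar{H}_{j,i})\, P(\bar{H}_{j,i})} \quad (9)$$

where:
$P(H_{j,i})$ and $P(\bar{H}_{j,i})$ denote the prior probabilities of an interest and of no interest of user $j$ in video $V_i$. They can be assessed by a statistical analysis of the training data.
$P(H_{j,i} \mid I_{j,i})$ is the posterior probability of an interest given the independent interest indicators $I_{j,i}$.
$P(I^{k}_{j,i} \mid H_{j,i})$, for $k \in \{TVT, RW, DL, RP\}$, is the probability of observing the interest indicator $I^{k}_{j,i}$ given an interest in the video $V_i$. It can be computed by a statistical analysis of the training data.
$P(I_{j,i} \mid H_{j,i})$ can then be estimated using equation (10), and $P(I_{j,i} \mid \bar{H}_{j,i})$ in the same way:

$$P(I_{j,i} \mid H_{j,i}) = \prod_{k \in \{TVT, RW, DL, RP\}} P(I^{k}_{j,i} \mid H_{j,i}) \quad (10)$$

Video Interest History
For a learner $j$, the Video Interest History $VIH_j$ is composed of his interest values for all the videos and can be defined as a vector:

$$VIH_j = (VI_{j,1}, VI_{j,2}, \dots, VI_{j,n})$$

where $VI_{j,i} \in \{0, 1\}$ is the interest value of learner $j$ for video $V_i$ and $n$ is the number of videos.

iJET - Vol. 14, No. 1, 2019 (http://www.i-jet.org)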
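The indicator and posterior computations described in this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the tiny labeled training set, the use of "greater than or equal to the mean" as the indicator rule, the Laplace smoothing and the 0.5 decision threshold are all assumptions introduced for the example.

```python
def binary_indicators(viewing_time, move_backs, downloaded, replayed,
                      mean_viewing_time, mean_move_backs):
    """Return (viewing time, move back, download, replay) indicators as 0/1.

    Comparing against the mean with ">=" is an assumption of this sketch.
    """
    return (
        int(viewing_time >= mean_viewing_time),
        int(move_backs >= mean_move_backs),
        int(bool(downloaded)),
        int(bool(replayed)),
    )


def fit(samples, labels, smoothing=1.0):
    """Estimate priors and P(indicator = 1 | class) from labeled data.

    Laplace smoothing is an assumption; it avoids zero probabilities.
    """
    model = {}
    for h in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == h]
        prior = (len(rows) + smoothing) / (len(samples) + 2 * smoothing)
        likelihoods = [
            (sum(r[k] for r in rows) + smoothing) / (len(rows) + 2 * smoothing)
            for k in range(len(samples[0]))
        ]
        model[h] = (prior, likelihoods)
    return model


def posterior_interest(model, indicators):
    """Posterior probability of interest, assuming independent indicators."""
    joint = {}
    for h, (prior, likelihoods) in model.items():
        p = prior
        for value, p_one in zip(indicators, likelihoods):
            p *= p_one if value == 1 else 1.0 - p_one
        joint[h] = p
    return joint[1] / (joint[0] + joint[1])


def interest_history(model, indicators_by_video, threshold=0.5):
    """Binary interest value per video; the threshold is an assumption."""
    return [int(posterior_interest(model, ind) >= threshold)
            for ind in indicators_by_video]


# Hypothetical training data: indicator tuples with known interest labels.
samples = [(1, 1, 1, 1), (1, 1, 0, 1), (1, 0, 1, 0),
           (0, 0, 0, 0), (0, 1, 0, 0), (0, 0, 0, 1)]
labels = [1, 1, 1, 0, 0, 0]
model = fit(samples, labels)

observed = binary_indicators(120, 0, True, False, 90.0, 1.5)
p_high = posterior_interest(model, (1, 1, 0, 1))
p_low = posterior_interest(model, (0, 0, 0, 0))
history = interest_history(model, [(1, 1, 0, 1), (0, 0, 0, 0)])
print(observed, p_high, p_low, history)
```

The denominator of the posterior is computed by summing the joint probabilities of the two hypotheses, which is the usual way to normalize when only interest and no interest are possible.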

Users Clustering
The k-means algorithm is one of the most widespread unsupervised learning algorithms [49] and is commonly used for clustering and pattern recognition [50]. In the present work, we propose to use the k-means algorithm to group learners, given their Video Viewing History, into k clusters, where each learner belongs to the cluster that minimizes the Euclidean distance between his Video Viewing Profile vector and the cluster centroid.
The k-means algorithm: let $k$ be the number of clusters.
Step 1) Initialize the cluster centroids: let $C = \{c_1, c_2, \dots, c_k\}$ denote the set of cluster centers.
Step 2) Calculate the Euclidean distance between each learner's Video Viewing Profile $VVP_{j,i}$ (learner $j$, video $V_i$), given by equation (3), and each cluster center $c_m$:

$$d(VVP_{j,i}, c_m) = \lVert VVP_{j,i} - c_m \rVert = \sqrt{\sum_{e} (w_e - c_{m,e})^2}$$

where the sum runs over the seven events, $w_e$ is the weight of event $e$ in the profile and $c_{m,e}$ is the component of $c_m$ for event $e$.
Step 3) Assign each learner to the cluster whose center is nearest: user $j$ is assigned to the cluster $G_{m^*}$ with

$$m^* = \arg\min_{m} d(VVP_{j,i}, c_m)$$

Step 4) Recalculate each cluster center as the mean of the profiles assigned to it, where $G_m$ is the set of learners assigned to cluster $m$:

$$c_m = \frac{1}{|G_m|} \sum_{j \in G_m} VVP_{j,i}$$

Step 5) Repeat steps 2 to 4 until no learner is reassigned, then stop.
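The steps above can be sketched in plain Python as follows. This is an illustrative implementation of standard k-means under stated assumptions: centroids are initialized by sampling k profiles at random with a fixed seed, the iteration cap `max_iter` is a safeguard, and the six sample profiles are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda m: euclidean(point, centers[m]))


def k_means(profiles, k, seed=0, max_iter=100):
    """Cluster profile vectors; returns (assignment, centers)."""
    rng = random.Random(seed)
    # Step 1: pick k profiles as initial centers (initialization is an assumption).
    centers = [list(p) for p in rng.sample(profiles, k)]
    assignment = [None] * len(profiles)
    for _ in range(max_iter):
        # Steps 2 and 3: assign each profile to its nearest center.
        new_assignment = [nearest(p, centers) for p in profiles]
        # Step 5: stop when no learner was reassigned.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: recompute each center as the mean of its members.
        for m in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == m]
            if members:  # keep the old center if a cluster is empty
                centers[m] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical profiles (PL, PA, RP, FF, RW, DL, ST) for one video:
# three learners who mostly play, three who pause and move back a lot.
profiles = [
    (0.90, 0.05, 0.00, 0.05, 0.00, 0.0, 0.0),
    (0.85, 0.10, 0.00, 0.05, 0.00, 0.0, 0.0),
    (0.88, 0.07, 0.00, 0.05, 0.00, 0.0, 0.0),
    (0.40, 0.20, 0.10, 0.00, 0.30, 0.0, 0.0),
    (0.45, 0.15, 0.10, 0.00, 0.30, 0.0, 0.0),
    (0.42, 0.18, 0.12, 0.00, 0.28, 0.0, 0.0),
]
assignment, centers = k_means(profiles, k=2)
print(assignment)
```

Cluster indices are arbitrary labels: what matters is which learners share an index, not the index itself.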

Video Recommendation
For a new user $u$, let $VVP_{u,i}$ be his Video Viewing Profile for video $V_i$.
Let $G_u$ be the cluster to which the user belongs, that is, the cluster whose center has the lowest Euclidean distance to $VVP_{u,i}$.
The Video Interest of user $u$ in video $V_i$ can be estimated as follows. As the possible Video Interest values are 1 or 0, let $f_1$ be the frequency of value 1 and $f_2$ the frequency of value 0 among the members of the cluster:

$$f_1 = \frac{1}{N} \sum_{j \in G_u} VI_{j,i}, \qquad f_2 = \frac{1}{N} \sum_{j \in G_u} (1 - VI_{j,i})$$

where $VI_{j,i}$ is the interest value of cluster member $j$ for video $V_i$ and $N$ is the number of members of the cluster $G_u$.
Let us then define $Mode_i$, the most common value (0 or 1) of the video interest among all the cluster's members. The "mode" designates the value that occurs most often:

$$Mode_i = \begin{cases} 1 & \text{if } f_1 > f_2 \\ 0 & \text{if } f_2 > f_1 \end{cases} \quad (15)$$

For the user $u$, the video interest in video $V_i$ is evaluated by the following relation:

$$VI_{u,i} = Mode_i \quad (16)$$

Therefore, a list of the videos whose interest value is equal to 1 can be proposed to the user $u$.
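The recommendation rule can be illustrated with the short sketch below. The data layout (per video, a list of cluster centers and the interest values of each cluster's members), the function names and the choice to treat a tie between the two frequencies as "not interested" are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_interest(profile, centers, interests_by_cluster):
    """Estimate a new user's interest (0 or 1) in one video."""
    # Find the cluster whose center is nearest to the new user's profile.
    m = min(range(len(centers)), key=lambda c: euclidean(profile, centers[c]))
    values = interests_by_cluster[m]
    f1 = sum(values) / len(values)  # frequency of value 1
    f2 = 1.0 - f1                   # frequency of value 0
    # Mode of the members' interest values; a tie gives 0 (assumption).
    return 1 if f1 > f2 else 0


def recommend(profiles_by_video, clusters_by_video):
    """Return the ids of the videos whose estimated interest is 1."""
    recommended = []
    for video_id, profile in profiles_by_video.items():
        centers, interests = clusters_by_video[video_id]
        if estimate_interest(profile, centers, interests) == 1:
            recommended.append(video_id)
    return recommended


# Hypothetical cluster centers (PL, PA, RP, FF, RW, DL, ST), reused for two videos.
centers = [
    (0.88, 0.07, 0.00, 0.05, 0.00, 0.0, 0.0),
    (0.42, 0.18, 0.11, 0.00, 0.29, 0.0, 0.0),
]
clusters_by_video = {
    "V1": (centers, [[0, 0, 1], [1, 1, 0]]),
    "V2": (centers, [[1, 1, 1], [0, 0, 1]]),
}
# Hypothetical new user with the same kind of viewing profile on both videos.
new_profile = (0.45, 0.15, 0.10, 0.00, 0.30, 0.0, 0.0)
profiles_by_video = {"V1": new_profile, "V2": new_profile}
recommended = recommend(profiles_by_video, clusters_by_video)
print(recommended)
```

In this example the new user falls into the second cluster for both videos; most members of that cluster are interested in V1 but not in V2, so only V1 is recommended.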

Conclusion and Future work
In this work, we proposed an approach to construct a learner profile within a SPOC. The suggested approach is based on an analysis of the learner's video clickstream and on the use of machine learning techniques to assess the learner's video interests implicitly. Our objective is to improve the learner's experience within a SPOC by recommending videos that could meet his personal interests and match his individual needs, by exploiting the profiles of similar learners.
For future work, we plan to test the proposed approach on a real dataset from the SPOC UNIVTICE in order to improve the learner's experience within the SPOC. We also intend to use the dataset to cluster learners' video behaviors, to assess learners' difficulties implicitly and, finally, to compare the results with those of similar works.