Introduction

Just as we behave in certain ways in our daily lives, we also behave in certain ways in virtual environments. Unlike in reality, our behaviour on the Web is limited to interaction with its content (mostly implicit). However, it is still subject to the user’s actual intent, context and previous knowledge. From this perspective, we understand user behaviour as the set of a user’s activities within a website. This activity consists mainly of page (i.e., item) visits. By mining user activity data, we can reveal, predict and sometimes understand user behaviour and, consequently, the user’s preferences.

One of the promising methods for understanding user behaviour is to find regularities in actions that are typical of specific situations (e.g., frequently visited groups of actions or categories, typical visit sequences). These regularities are generally known as behavioural patterns. Behavioural patterns may be represented in several ways, for instance, as frequent itemsets [4], frequent sequences of actions [3] or association rules [4].

Usually, behavioural patterns are mined from the activity of all available users. This results in patterns with high confidence. However, depending on the confidence threshold, this yields either only general patterns or a high number of patterns with low support. General global patterns clearly do not reflect the behaviour of specific users. Moreover, patterns interesting for a particular user (e.g., containing the user’s future actions or interesting content) would be lost among many others. For this reason, we propose to first segment users based on the similarity of their behaviour and then mine so-called group patterns, which are important for individual users. To further explore this idea, we defined the following research questions:

RQ1 Does the proposed combination of general patterns (mined from all website users) with specific patterns (mined from groups of similar users) improve the quality of mined behavioural patterns (evaluated indirectly as the precision of a personalized recommendation based on the mined patterns)?

RQ2 Which algorithms for stream clustering and behavioural pattern mining produce optimal results in terms of quality and computational load? How does the proposed approach perform compared to state-of-the-art methods?

Nowadays, large amounts of web usage data are generated, especially within large websites with many users (e.g., news portals, social networks). As these data arrive as a potentially infinite stream, it is important to process them efficiently [10]. Because users’ behaviour needs to be revealed as soon as possible so that the system can react to it immediately, single-pass methods are required (due to the data volume) [26]. Data stream algorithms are built upon models that are updated incrementally online. Traditional data mining algorithms, in contrast, build models off-line in an accurate but computationally expensive way, which decreases their usability in real-world applications. For this reason, we focus on single-pass algorithms able to process data streams efficiently. We set the third research question as:

RQ3 What are the characteristics of the HyBPMine method for behavioural pattern mining, when dealing with a data stream?

Exploiting user behaviour expressed by behavioural patterns has wide application, as outlined in [13]. It is typically used for personalization, user action prediction, web page caching or business decision support. Behavioural patterns are also often used as a feature source for specific tasks, e.g., learning style identification [14], game player classification [29] or taxpayer analysis [37]. These applications typically operate on a long-term scale and are based on historical data. As a result, they do not always reflect dynamic changes in user behaviour. This is crucial nowadays, as users’ preferences change very dynamically [28]. Therefore, our aim is to apply behavioural patterns to recommend interesting pages for users to visit.

To answer our research questions and explore our idea, we propose HyBPMine, a novel hybrid method for behavioural pattern mining over a data stream. We conduct several experiments in the e-learning and news domains. Our contributions, presented in this paper, are:

  • Our idea of combining global behavioural patterns with specific group patterns significantly increases recommendation precision in comparison to standard global or group patterns used individually.

  • The proposed method HyBPMine, based on the CluStream method for user clustering and IncMine for pattern mining, achieves, in comparison to state-of-the-art methods, the highest pattern quality (measured by recommendation precision) and one of the highest processing speeds.

  • The computational cost of the proposed method HyBPMine, in comparison to methods mining only global patterns, is higher only by a constant amount (which is stable over time).

The paper is structured as follows. In Sect. 2, we describe state-of-the-art approaches to mining and applying behavioural patterns. This section also describes existing data stream algorithms for behavioural pattern mining that are related to our method. In the next section, we present our proposed hybrid method for mining and combining global and group patterns online. In the subsequent section, we evaluate the proposed method on a personalized recommendation task. Our main goal is to evaluate how useful the pattern combination is and whether such a method can quickly adapt to new data in a stream.

Related Work

The main purpose of a behavioural pattern is to identify sequences or sets of actions that are often performed consecutively or at least together in the same sessions. As such patterns can be used for various tasks, multiple mining approaches have been proposed, differing in their characteristics and thus also in their possible uses. The most used techniques for pattern identification are based on clustering [24, 32] or graph representation [19]. However, they differ not only in the way the patterns are mined, but also in how they are represented [33]. A common trait of these approaches is that they consist of computationally expensive off-line pattern identification followed by online application (recommendation or prediction). As a result, the patterns quickly become obsolete and their usability for short-term tasks decreases. Moreover, these patterns are the same for all website users and thus cannot sufficiently describe the behaviour of specific users.

One representative of clustering approaches is a method that uses fuzzy c-means clustering to find clusters of similar sessions and the degree of attachment between users and clusters [24]. Behavioural patterns are in this case represented as weighted association rules mined from each cluster of user sessions. This helps to identify patterns specific to certain user groups with similar behaviour. Centroid-based clustering is also used in [25], where user sessions are first clustered using k-means. Afterwards, Markov models are built over each cluster for behavioural pattern identification. In addition, association rules are applied to eliminate potential ambiguity in patterns, which increases the usability of the found patterns. A similar approach using weighted k-means clustering is used to predict users’ navigational behaviour and recommend the next items to visit [32]. Like the previous approaches, it is based on an off-line component used for user session clustering. The behavioural patterns are derived from cluster centroids, which allows specific patterns to be identified. The off-line clustering component, however, decreases the usability of these methods.

Additionally, there are density-based clustering and graph approaches, which are able to identify more complex patterns and hidden relations. In [5], the authors use the DBSCAN algorithm to cluster sessions and derive behavioural patterns. Thanks to DBSCAN, the approach is able to reveal patterns that would otherwise be ignored because of their low support, despite their high confidence when represented as association rules. Finally, the approach uses an inverted index to store user behaviour efficiently online.

Graph approaches suffer from a computationally expensive update operation (addition of a new user session). In contrast, the search operation in a graph is relatively cheap and can thus be performed online. The high update cost can, however, limit their usage in domains with high user activity. The WebPUM approach uses a user navigation graph, where nodes represent web pages and edges their mutual weighted connections [19]. The edge weights are calculated based on the intensity and frequency with which page pairs are co-visited in the same session. Based on the graph, patterns are identified and pages are recommended to users according to their actual behaviour.
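To make the graph representation concrete, the sketch below builds a navigation graph in which nodes are pages and an undirected edge counts the sessions where both pages were visited. This is a simplification under our own assumptions: WebPUM combines the intensity and the frequency of co-visits into its weights [19], which we reduce to a plain co-occurrence count, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_navigation_graph(sessions):
    """Undirected page graph: the weight of edge (a, b) is the number of
    sessions in which both pages were visited (simplified weighting)."""
    weights = defaultdict(int)
    for session in sessions:
        # Each unordered pair of distinct pages co-visited in the session
        for a, b in combinations(sorted(set(session)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def neighbours(graph, page):
    """Pages connected to `page`, strongest connection first."""
    linked = [(b if a == page else a, w) for (a, b), w in graph.items() if page in (a, b)]
    return sorted(linked, key=lambda pw: pw[1], reverse=True)


graph = build_navigation_graph([["p1", "p2", "p3"], ["p1", "p2"], ["p2", "p4"]])
```

Here `neighbours(graph, "p2")` ranks `p1` first, since the pair was co-visited in two sessions.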

As Table 1 shows, various approaches use different types of pattern representation (clusters of similar user sessions, association rules, components of a navigation graph), which are able to discover interesting and hidden relations in the data. These patterns are, however, difficult to mine and maintain effectively online.

Table 1 Comparison of selected pattern mining algorithms

Mining Frequent Closed Itemsets over a Data Stream

Frequent itemsets are considered a simple but effective representation of behavioural patterns. They are less complex and thus more efficient to process than frequent sequences or association rules. Moreover, they are more suitable for domains with a high number of items, where considering the order of items would produce a large number of low-quality patterns (in terms of low precision of recommendations based on these patterns). Frequent itemsets can further be used to compute association rules, which may be considered a more complex behavioural pattern representation and may reveal hidden dependencies in users’ behaviour.

In our work, we focus on algorithms mining frequent closed itemsets. This representation offers a complete and non-redundant description of all frequent itemsets [34]. An itemset is closed if no superset has the same support (number of occurrences), and it is frequent if its support is higher than a given minimal threshold [34].
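The definition can be checked on a toy example. The brute-force sketch below (function names are hypothetical) enumerates every itemset, keeps the frequent ones and then those without a proper superset of equal support. It treats a support equal to the threshold as frequent, a common convention assumed here, and it is exponential in the number of items, which is exactly why the stream algorithms discussed next avoid full enumeration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_frequent_itemsets(transactions, min_support):
    """Brute force: frequent = support >= min_support (assumed convention),
    closed = no proper superset with the same support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s >= min_support:
                frequent[itemset] = s
    # A superset with equal support is itself frequent, so checking the
    # frequent itemsets is enough to decide closedness.
    return {x: s for x, s in frequent.items()
            if not any(x < y and s == sy for y, sy in frequent.items())}
```

For the sessions {a, b, c}, {a, b}, {a, c}, {a} and a threshold of 2, the itemsets {b} and {c} are frequent but not closed, because {a, b} and {a, c} have the same support.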

Several algorithms have been proposed for mining frequent closed itemsets. They can be classified by the windowing approach they use [27]. For example, landmark windows contain all items since the start of the stream, while sliding windows contain only the most recent elements. According to the type of frequent itemsets they mine, algorithms can be divided into exact miners [12, 23, 36] and approximate miners [11, 30]. Approximate mining is more efficient than exact frequent itemset mining because it does not have to consider all itemsets (both frequent and infrequent), and it can respond well to concept drift.

The first algorithm for incremental mining of closed frequent itemsets over a data stream is MOMENT [12]. It mines exact frequent itemsets using a sliding window approach and has become a baseline for several algorithms proposed later. It uses an in-memory prefix-tree-based data structure called the closed enumeration tree, which efficiently stores information about infrequent itemsets, nodes that are likely to become frequent, and closed itemsets. The first successor of MOMENT, called NEWMOMENT, represents itemsets and the window as bitsets [23]. This allows efficient bitwise operations to be used, e.g., to count the support of itemsets or to shift the sliding window.
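The bitset idea behind NEWMOMENT can be illustrated with Python integers used as bitsets: bit i of an item’s mask records whether the item occurs in the i-th most recent transaction of the window, the support of an itemset is the popcount of the AND of its item masks, and a window shift is a left shift followed by masking. The class and method names are hypothetical, and NEWMOMENT’s closed enumeration tree is not reproduced.

```python
class BitsetWindow:
    """Sliding window over the last `size` transactions, one bitset per item."""

    def __init__(self, size):
        self.size = size
        self.full = (1 << size) - 1   # mask keeping only `size` bits
        self.masks = {}               # item -> int used as a bitset

    def add(self, transaction):
        # Shift the window: every transaction becomes one step older and the
        # bit leaving position size-1 is cut off by the mask.
        for item in list(self.masks):
            self.masks[item] = (self.masks[item] << 1) & self.full
            if self.masks[item] == 0:
                del self.masks[item]  # item no longer occurs in the window
        for item in transaction:
            self.masks[item] = self.masks.get(item, 0) | 1

    def support(self, itemset):
        # AND the item bitsets; the popcount is the number of window
        # transactions containing every item of the (non-empty) itemset.
        acc = self.full
        for item in itemset:
            acc &= self.masks.get(item, 0)
        return bin(acc).count("1")


window = BitsetWindow(3)
for t in [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]:
    window.add(t)
```

After the fourth transaction the first one has left the window, so the support of {a} drops from 3 to 2 without recounting any transaction.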

A different approach to closed frequent itemset mining, called CloStream, uses a Cid List and the SET function to find closed itemsets similar to the current transaction [36]. In comparison to the previous approaches, it is more computationally efficient because it intersects transactions only with specific closed itemsets [36].
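The intersection step can be sketched using a known property of closed itemsets: when a transaction t arrives, the only itemsets whose closedness or support can change are t itself and the intersections of t with existing closed itemsets. The sketch below (hypothetical function name) applies this property with a plain dictionary and a linear scan over all closed itemsets; the Cid List and SET function that CloStream uses to restrict the intersections are not reproduced, and no support threshold is applied.

```python
def update_closed(closed, transaction):
    """Return the closed itemsets (frozenset -> support) after adding one
    transaction, given the closed itemsets of the stream so far."""
    t = frozenset(transaction)
    # Candidates affected by t: t itself and t ∩ c for every closed c
    candidates = {t} | {t & c for c in closed}
    candidates.discard(frozenset())
    updated = dict(closed)            # closed itemsets not inside t are unchanged
    for x in candidates:
        # Previous support of x = support of its closure, i.e. the largest
        # support among closed supersets (0 if x never occurred before)
        previous = max((s for c, s in closed.items() if x <= c), default=0)
        updated[x] = previous + 1     # t also contains x
    return updated


closed = {}
for session in [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]:
    closed = update_closed(closed, session)
```

After the four transactions, the result contains exactly the closed itemsets {a}, {a, b}, {a, c} and {a, b, c} with supports 4, 2, 2 and 1.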

Abandoning the requirement to mine exact frequent itemsets makes it possible to design fast algorithms for mining approximate frequent closed itemsets, such as IncMine [11]. IncMine uses a relaxed minimal support threshold to keep infrequent itemsets that are likely to become frequent later. Unlike all the other algorithms described above, it uses an update-per-batch policy. This setting results in a low per-transaction time, at the cost of a risk that the identified patterns become temporarily outdated (while individual batches are being collected) [27].
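The effect of a relaxed threshold combined with per-batch updates can be illustrated with a deliberately simplified miner. In this sketch, the class is hypothetical and we borrow the names ms, rr, sl and ws used later for IncMine’s hyperparameters, assuming ms is a fraction of the window’s transactions. Itemsets of up to `max_len` items are counted per batch of sl transactions, the window keeps the last ws batches, and counts of itemsets below the relaxed threshold rr × ms are forgotten. It does not mine closed itemsets and does not reproduce IncMine’s actual update rules.

```python
from collections import Counter, deque
from itertools import combinations


class BatchRelaxedMiner:
    """Simplified batch miner with a relaxed support threshold (not IncMine)."""

    def __init__(self, ms, rr, sl, ws, max_len=2):
        self.ms, self.rr, self.sl, self.max_len = ms, rr, sl, max_len
        self.batches = deque(maxlen=ws)  # per-batch itemset counts; oldest batch drops out
        self.pending = []                # transactions of the batch being collected
        self.frequent = {}               # support >= ms * n
        self.semi_frequent = {}          # rr * ms * n <= support < ms * n

    def add(self, transaction):
        self.pending.append(frozenset(transaction))
        if len(self.pending) == self.sl:  # patterns change only once per full batch
            self._update()
            self.pending = []

    def _update(self):
        counts = Counter()
        for t in self.pending:
            for k in range(1, self.max_len + 1):
                for combo in combinations(sorted(t), k):
                    counts[frozenset(combo)] += 1
        self.batches.append(counts)
        window = sum(self.batches, Counter())
        n = self.sl * len(self.batches)  # number of transactions in the window
        strict, relaxed = self.ms * n, self.rr * self.ms * n
        self.frequent = {x: s for x, s in window.items() if s >= strict}
        self.semi_frequent = {x: s for x, s in window.items() if relaxed <= s < strict}
        # Forget itemsets below the relaxed threshold: this bounds memory
        # but makes later support values approximate.
        keep = self.frequent.keys() | self.semi_frequent.keys()
        for batch in self.batches:
            for x in list(batch):
                if x not in keep:
                    del batch[x]
```

Between two batch boundaries the reported patterns stay unchanged, which illustrates the temporary staleness mentioned above.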

Another algorithm for approximate frequent closed itemset mining, named CLAIM, addresses the problem of frequent concept drift by redefining frequent itemsets and treating support values that fall into the same interval as equal [30].

In [31], an approach mining only itemsets with high actual support was proposed. In contrast to frequent closed itemset mining, it allows the patterns to be compressed using a compressible prefix-tree structure. Nodes in the tree that represent itemsets whose support difference is smaller than a defined threshold \(\delta \) are joined.

As Table 2 shows, there are multiple approaches designed for mining frequent patterns over a data stream. The actual choice should reflect the specific application of the patterns.

Table 2 Comparison of selected frequent closed itemsets mining algorithms

Clustering over a Data Stream

User clustering is a logical choice for detecting communities of similarly behaving users, and it serves as the basis for identifying group-specific behavioural patterns. Clustering is an unsupervised machine learning approach able to find the natural distribution of data. Traditional clustering methods are mostly insufficient for finding behavioural patterns online. For this reason, the trend is to use stream clustering approaches, and several algorithms for clustering over a data stream have been proposed in recent years.

The CluStream algorithm is based on two components, an online and an off-line one. The online microclustering component performs a fast transformation of incoming data instances into a compact, approximate statistical representation. The off-line macroclustering component uses this representation to produce clustering results on demand [1]. This approach has been adapted in other works that use different macroclustering methods and alter the microclustering phase. For instance, the density-based algorithm DenStream [8] uses DBSCAN as its macroclustering algorithm and defines new concepts of core-microclusters and outlier-microclusters. DenStream can detect clusters of arbitrary shapes and does not require any assumption about the number of clusters. The CluStream approach is also adapted in HPStream, a projected clustering method for high-dimensional data streams [2], which outperforms basic CluStream on high-dimensional streaming data.

The density-based algorithm D-Stream [9] is a representative of the family of algorithms not based on CluStream. It maps input data into a density grid, which an off-line component then clusters. It adopts a decaying technique to capture dynamic changes in a data stream. ClusTree [21] is a hyperparameter-free algorithm that automatically adapts to the speed of the data stream using a compact, self-adaptive index structure for maintaining stream summaries. It takes the age of objects into account to reflect the higher importance of more recent data. A common trait of these algorithms (Table 3) is that they efficiently process data instances into compact representations and offer their clustering on demand online.
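The decaying technique of D-Stream can be sketched for a single grid: each point is mapped to a cell, and when a new point reaches a cell at time t, the stored density is first decayed by λ^(t − t_last) and then incremented by 1. The class name, `cell_size` and the decay value are assumptions for illustration; the classification of dense and sparse cells and the off-line clustering of the grid are omitted.

```python
import math


class DecayingGrid:
    """D-Stream-style density grid with exponential decay (clustering omitted)."""

    def __init__(self, cell_size, lam):
        self.cell_size = cell_size
        self.lam = lam                # decay factor, 0 < lam < 1
        self.cells = {}               # cell key -> (density, time of last update)

    def cell_of(self, point):
        return tuple(math.floor(x / self.cell_size) for x in point)

    def add(self, point, t):
        key = self.cell_of(point)
        density, last = self.cells.get(key, (0.0, t))
        # D(g, t) = lam^(t - t_last) * D(g, t_last) + 1
        self.cells[key] = (density * self.lam ** (t - last) + 1.0, t)

    def density(self, key, t):
        """Density of a cell decayed to time t (no new point added)."""
        density, last = self.cells.get(key, (0.0, t))
        return density * self.lam ** (t - last)


grid = DecayingGrid(cell_size=1.0, lam=0.5)
grid.add((0.2, 0.3), t=0)
grid.add((0.7, 0.1), t=1)
```

Both points fall into cell (0, 0); its density is 1.5 at t = 1 and decays to 0.375 by t = 3 if no further point arrives, so old activity gradually loses weight.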

Table 3 Comparison of selected data stream clustering algorithms

HyBPMine—Method for Online Time Behavioural Pattern Mining

In this section, we introduce the idea of our method (Fig. 1) for online mining of behavioural patterns from the user activity stream, which combines

  • global patterns, discovered from behaviour of all website users, and

  • group patterns determined for dynamically identified groups of similar users.

Fig. 1 Activity diagram of the proposed method HyBPMine for behavioural pattern mining. u represents the currently modelled user. u.nsc is the new sessions count, i.e., the number of new user sessions performed since the last macroclustering. muc represents the microcluster update counter, i.e., the number of microcluster updates since the last macroclustering. tcm is the threshold of changes in microclusters required to perform the periodic macroclustering. u.lmid represents the identifier of the last macroclustering performed since the user’s last activity; lmid is the global identifier of the last macroclustering.

The pattern combination helps to find the most important patterns for specific users and to use them for a chosen task, e.g., recommending interesting items to visit, as we do in our evaluation.

The method processes a stream of user sessions, each represented as a set of user actions \(S=\{a_1,a_2,...,a_n\}\) (Fig. 1, activity A1). Actions cover various types of user interactions, e.g., web page visits, product purchases or shopping basket management (adding or removing items). Depending on the type of action, it is possible to mine patterns of different complexity. For instance, in the case of web page visits, a high number of specific patterns will be mined, describing user behaviour in detail but with relatively low confidence. On the other hand, in the case of page section visits, a lower number of more general patterns with high confidence will be mined.

Based on the input sessions S, user behaviour is captured by user models. We represent a user’s behaviour on the website as a vector of all possible actions (site pages) with their frequencies. The user model u consists of the following attributes:

  • u.aq: actions queue. A queue of limited capacity containing the actions the user performed in recent history.

  • u.gid: group identifier. ID of the group to which the user was assigned during the last macroclustering.

  • u.nsc: new sessions count. The number of the user’s new sessions performed since the last macroclustering.

  • u.lmid: last macroclustering identifier. ID of the last macroclustering performed since the user’s last activity.
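A minimal sketch of this user model as a Python dataclass, assuming a fixed queue capacity (the value 50, the class name and the method names are hypothetical). The behaviour vector described above is derived from the action frequencies in u.aq.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserModel:
    """User model u with the attributes listed above (capacity is hypothetical)."""
    capacity: int = 50                                     # maximal length of u.aq
    aq: deque = field(init=False, default_factory=deque)  # u.aq: recent actions
    gid: Optional[int] = None                              # u.gid: group from the last macroclustering
    nsc: int = 0                                           # u.nsc: new sessions count
    lmid: int = 0                                          # u.lmid: last macroclustering identifier

    def __post_init__(self):
        # Bounded queue: the oldest actions drop out when capacity is reached
        self.aq = deque(maxlen=self.capacity)

    def add_session(self, session):
        self.aq.extend(session)
        self.nsc += 1

    def to_vector(self, all_actions):
        """Behaviour vector: frequency of each possible action in the queue."""
        counts = Counter(self.aq)
        return [counts[a] for a in all_actions]


u = UserModel(capacity=3)
u.add_session(["p1", "p2"])
u.add_session(["p3", "p1"])
```

With a capacity of 3, the first visit of `p1` has already dropped out of the queue, so the vector reflects only the most recent behaviour.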

The set of user models u is the input for the user clustering phase, which is continuously updated after each user session (Fig. 1, activity A2). For this reason, our method works with up-to-date data and can quickly react to frequent changes. Clusters, i.e., groups of similarly behaving users, are used to identify group behavioural patterns. Additionally, global patterns are identified based on the behaviour of all users. In other words, our method comprises two main stages:

  1. Similar user clustering: clustering of users based on the similarity of their behaviour (Fig. 1, activities A3–A6).

  2. Behavioural pattern mining: mining global and group behavioural patterns represented as closed frequent itemsets over the (session) data stream (Fig. 1, activities A7–A8).

Similar User Clustering

The first stage of our approach is clustering users according to the similarity of their behaviour in recent history. We adopted the CluStream algorithm, which consists of a fast microclustering phase and a precise macroclustering phase [1].

The microclustering phase creates a high number of small clusters, which are joined in the macroclustering phase by k-means (or another traditional batch clustering algorithm) to find the final clusters. In total, there are q microclusters (depending on the available memory). Typically, q is larger than the number of required clusters but smaller than the number of instances. Microclusters are updated with every new observation: based on the root-mean-square deviation, the observation is either assigned to an existing microcluster or a new microcluster is created.

Thanks to the microclustering phase, clusters quickly adapt to changes in data, which is important when processing a data stream. The similar user clustering stage of the proposed method consists of the following steps:
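The microclustering update can be sketched with CluStream-style cluster features (point count, linear sum and squared sum per dimension). A point is absorbed by the closest microcluster if it lies within a boundary of `boundary_factor` times that microcluster’s RMS deviation (for a singleton, the distance to its nearest neighbouring microcluster); otherwise a new microcluster is opened, and if more than q exist, the two closest are merged. The value 2.0 and all names are assumptions, and CluStream’s alternative of deleting outdated microclusters based on timestamps is omitted.

```python
import math


class MicroCluster:
    """Cluster feature: count, linear sum and squared sum per dimension."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [x * x for x in point]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def rms_radius(self):
        # Root-mean-square deviation of the absorbed points from the centroid
        var = sum(ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point):
        self.n += 1
        self.ls = [a + x for a, x in zip(self.ls, point)]
        self.ss = [a + x * x for a, x in zip(self.ss, point)]

    def merge(self, other):
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]


def update_microclusters(mcs, point, q, boundary_factor=2.0):
    """Absorb `point` into the closest microcluster or open a new one; keep at most q."""
    if mcs:
        closest = min(mcs, key=lambda m: math.dist(m.centroid(), point))
        if closest.n > 1:
            boundary = boundary_factor * closest.rms_radius()
        else:
            # A singleton has no deviation yet: use the distance to its nearest neighbour
            boundary = min((math.dist(m.centroid(), closest.centroid())
                            for m in mcs if m is not closest), default=0.0)
        if math.dist(closest.centroid(), point) <= boundary:
            closest.absorb(point)
            return
    mcs.append(MicroCluster(point))
    if len(mcs) > q:
        # Over budget: merge the two closest microclusters
        i, j = min(((i, j) for i in range(len(mcs)) for j in range(i + 1, len(mcs))),
                   key=lambda p: math.dist(mcs[p[0]].centroid(), mcs[p[1]].centroid()))
        mcs[i].merge(mcs[j])
        del mcs[j]


microclusters = []
for p in [(0, 0), (10, 0), (0, 1), (30, 0)]:
    update_microclusters(microclusters, p, q=2)
```

Because only the three sums are stored, absorbing and merging are constant-time per dimension, which keeps the online phase fast; the k-means macroclustering would then run over the microcluster centroids.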

  1. Microclusters update: in this step, actions from the current session S are added to the queue u.aq in the user model and the counter u.nsc (new sessions count) is incremented. If u.nsc is higher than the input hyperparameter tcu (threshold of changes in the user model), u.nsc is set to 0, the microclusters are updated with an instance generated from u.aq (the user model’s actions queue) and muc (the microcluster update counter) is incremented by 1 (Fig. 1, activity A3).

  2. Macroclustering: this step is performed at regular intervals, after a defined number of microcluster updates (if \(muc>tcm\)); otherwise, it is skipped and the stage continues with assigning the user model to a macrocluster (Fig. 1, activity A6). The counter muc, i.e., the number of microcluster updates, is compared to the given threshold tcm (threshold of changes in microclusters). When muc is higher than tcm, muc is set to 0 and macroclustering is performed (Fig. 1, activity A4).

  3. User model maintenance: after the macroclustering phase, every user model u with \(lmid - u.lmid > tcdiff\) is deleted, where \(tcdiff\) is the threshold of clustering identifier difference and lmid is the last macroclustering counter. In this way, we are able to handle a high number of users (Fig. 1, activity A5).

  4. User model assigning: if the user was assigned to a different cluster in the macroclustering phase than previously, the user’s u.lmid is updated to the current value of the global lmid. In this case, the user u is assigned to a new cluster (identified by u.gid) based on his/her most recent behaviour. This results in new group patterns being assigned to the user in later stages of the method (Fig. 1, activity A6).
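The control flow of steps 1–4 can be sketched as a small driver class. Everything here is a hypothetical illustration: `update_micro(instance)` and `macrocluster()` stand in for the CluStream operations, `macrocluster()` is assumed to return a function mapping a user’s recent actions to a group ID, and step 4 is read as reassigning the user whenever a macroclustering has happened since the user’s last assignment (u.lmid differs from lmid). The currently active user is exempted from deletion in step 3, which is our own choice.

```python
from collections import deque


class SimilarUserClustering:
    """Hypothetical driver for the similar user clustering steps 1-4."""

    def __init__(self, tcu, tcm, tcdiff, update_micro, macrocluster, capacity=50):
        self.tcu, self.tcm, self.tcdiff = tcu, tcm, tcdiff
        self.update_micro, self.macrocluster = update_micro, macrocluster
        self.capacity = capacity
        self.muc = 0          # microcluster updates since the last macroclustering
        self.lmid = 0         # global ID of the last macroclustering
        self.assign = None    # group assignment from the last macroclustering
        self.users = {}       # user ID -> user model (aq, gid, nsc, lmid)

    def process(self, user_id, session):
        u = self.users.setdefault(user_id, {"aq": deque(maxlen=self.capacity),
                                            "gid": None, "nsc": 0, "lmid": -1})
        # Step 1: extend u.aq; after more than tcu new sessions, update microclusters
        u["aq"].extend(session)
        u["nsc"] += 1
        if u["nsc"] > self.tcu:
            u["nsc"] = 0
            self.update_micro(list(u["aq"]))
            self.muc += 1
        # Step 2: periodic macroclustering once muc exceeds tcm
        if self.muc > self.tcm:
            self.muc = 0
            self.lmid += 1
            self.assign = self.macrocluster()
            # Step 3: delete user models idle for more than tcdiff macroclusterings
            stale = [k for k, v in self.users.items()
                     if k != user_id and self.lmid - v["lmid"] > self.tcdiff]
            for k in stale:
                del self.users[k]
        # Step 4: reassign the user if a macroclustering happened since the last assignment
        if self.assign is not None and u["lmid"] != self.lmid:
            u["gid"] = self.assign(list(u["aq"]))
            u["lmid"] = self.lmid
        return u["gid"]


updates = []
stage = SimilarUserClustering(
    tcu=1, tcm=1, tcdiff=1, update_micro=updates.append,
    macrocluster=lambda: (lambda actions: 0 if "a" in actions else 1))
results = [stage.process("x", ["a"]), stage.process("x", ["b"]),
           stage.process("y", ["c"]), stage.process("y", ["d"])]
```

In this run, user x becomes stale after the first macroclustering and is deleted; when x returns, a fresh model is created and assigned to a group immediately.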

Behavioural Pattern Mining

After the similar user clustering stage, the method continues with behavioural pattern mining. The mined patterns are of two types:

  • A group pattern is defined as typical behaviour extracted from the activity of a cluster of similar users. These patterns reflect the actual preferences of users with similar behaviour.

  • A global pattern is a behavioural pattern based on the activity of all users in the system (with defined support and confidence).

The behavioural pattern mining stage consists of the following steps:

  1. Mining group-specific and global patterns: the user session becomes the input for the global and group frequent pattern mining stages. We use the IncMine algorithm [27], implemented in the MOA framework and modified to mine global patterns and patterns for specific groups separately. It is fast and memory-efficient, as it updates patterns in small batches and approximates mining over a sliding window. The result of this step is the set of global patterns \(P_{GL}\) and group patterns \(P_{GRi}\), where i is the index of the group (Fig. 1, activities A7–A8).

  2. Hybrid pattern combination: after pattern mining, the next step combines the global patterns \(P_{GL}\) and group patterns \(P_{GRi}\). Each pattern \(P_j\) in the set of patterns is represented as a set of actions \(P_j= \{a_1,a_2,...,a_k\}\). We use the following strategy to choose behavioural patterns according to the current evaluation window \(W_e\) (the set of the last e actions in the user session) (Fig. 1, activity A9):

    (a) If a user belongs to some group i, then \(P_{all}=P_{GL}\cup P_{GRi}\); otherwise, \(P_{all}=P_{GL}\). For each \(P_j \in P_{all}\), an approximate support value \(support(P_j)\) is calculated and normalized. Also, the size of the intersection \(lcs(P_j,W_e)\) of each \(P_j \in P_{all}\) with \(W_e\) is determined with the LCS (largest common subset) algorithm and normalized.

    (b) All patterns in \(P_{all}\) are sorted, first in descending order by the size of their intersection with \(W_e\) and then by descending support value.

    (c) Let M be a hashmap of items and their importance. The item importance is calculated by iterating over all patterns: the importance value \(M_i\) of every item i that is contained in a pattern \(P_j\) but not in \(W_e\) is incremented by \(support(P_j) \times lcs(P_j,W_e)\).

    (d) Finally, M is sorted in descending order by importance value and the best r items are selected into the final list of most interesting items.
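Steps (a)–(d) can be sketched as a single function. Patterns are given as dictionaries mapping a frozenset of actions to its support, and `group_patterns` is `None` for users without a group. Two details are our assumptions: both support and intersection size are normalized by their maximum over \(P_{all}\), and if a pattern occurs in both sets, the group support overrides the global one. The function name is hypothetical.

```python
def recommend(global_patterns, group_patterns, window, r):
    """Hybrid pattern combination following steps (a)-(d)."""
    w = set(window)
    p_all = dict(global_patterns)
    if group_patterns:
        p_all.update(group_patterns)          # (a) P_all = P_GL ∪ P_GRi
    if not p_all:
        return []
    # (a) normalized support and normalized intersection size with W_e
    max_sup = max(p_all.values()) or 1
    overlap = {p: len(p & w) for p in p_all}
    max_ov = max(overlap.values()) or 1
    scored = [(p, s / max_sup, overlap[p] / max_ov) for p, s in p_all.items()]
    # (b) sort by intersection size, then by support, both descending
    scored.sort(key=lambda x: (x[2], x[1]), reverse=True)
    # (c) accumulate importance of items that are not already in W_e
    importance = {}
    for p, sup, ov in scored:
        for item in p - w:
            importance[item] = importance.get(item, 0.0) + sup * ov
    # (d) the r items with the highest importance
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return [item for item, _ in ranked[:r]]


global_patterns = {frozenset({"a", "b"}): 10, frozenset({"c", "d"}): 5}
group_patterns = {frozenset({"a", "e"}): 4}
```

Since the sum in (c) does not depend on order, the sorting in (b) matters here only for ties: the dictionary keeps insertion order, and the stable sort in (d) then prefers items from higher-ranked patterns. For a window containing only `a`, the group pattern {a, e} lifts `e` into the recommendations, whereas `c` and `d` score zero because their pattern does not intersect the window.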

The method is designed for a distributed topology using Apache Storm, a highly scalable stream processing framework [18]. In the proposed topology, global and group patterns are mined in parallel to decrease the computation time of HyBPMine. In addition, as the user groups do not overlap, the mining process can be parallelized for each user group independently (Fig. 2). The aim is to minimize the computational overhead caused by clustering and group pattern mining. Recent patterns are cached in a Redis data store, from which they are retrieved for further use, e.g., recommendation or prediction.

Fig. 2 Apache Storm topology for applying the proposed method in a distributed environment. Circles represent processing nodes, the square represents the data source, full arrows represent the flow of data and dashed arrows represent the flow and usage of patterns.

Summary of Hyperparameters

An important part of the method design is optimizing the input hyperparameters of the individual method parts (clustering algorithm, frequent pattern mining algorithm). We divided the input hyperparameters into three groups according to the method component they belong to (Table 4).

Table 4 Summary of proposed HyBPMine method input hyperparameters

As we use the IncMine algorithm for mining frequent closed itemsets, we need to search for the best setting of its hyperparameters: ms (minimal support), rr (relaxation rate), sl (segment length) and ws (window size).

For clustering, we use the CluStream algorithm with k-means macroclustering implemented in the MOA framework. In this case, we search for the best setting of its input hyperparameters gc (number of clusters) and mmc (maximal microclusters count), together with the clustering-related hyperparameters specific to our method: tcu (threshold of changes in the user model required to update microclusters with a new data instance generated from the user model) and tcm (threshold of changes in microclusters required to perform the periodic macroclustering).

The group of general method hyperparameters consists of ews (evaluation window size, i.e., the number of actions in a user session used to identify the best combination of patterns), mts (minimal accepted number of transactions per second) and tcdiff (threshold of clustering identifier difference). As reported in [22], mining frequent itemsets over a data stream requires a trade-off between accuracy and computational efficiency according to the needs of the specific task. Our method therefore allows the minimal required speed to be set as the input hyperparameter mts. The threshold of clustering identifier difference represents the maximal allowed difference between the global macroclustering counter lmid and a user’s u.lmid.

Evaluation

We evaluated the proposed HyBPMine method indirectly on a recommendation task: we recommended interesting actions to users according to the identified behavioural patterns. A recommendation was considered successful if the user clicked on the recommended action (item) at any point in the remainder of the session in which the recommendation was generated.

As session identification itself is an open research problem, in this paper we use a generally accepted approach that divides actions into sessions based on the time gap between consecutive actions (with a 30-minute gap). This gap is set according to the power-law distribution of session lengths within a site [17].

To assess the characteristics of the proposed HyBPMine method, we first compared it with a method using only global patterns (GL) and a method using patterns specific to groups of users with similar behaviour (GR). Second, we compared the proposed method to state-of-the-art approaches.

Each user session S was divided into a train window \(\hbox {TR}=\{a_1, a_2,...,a_{ews}\}\) and a test window \(\hbox {TE}=\{a_{ews+1},...,a_{|S|}\}\). The train window was used to identify pattern candidates similar to the current user session. Based on the actions from the train window, the set of recommended actions R was generated. The number |R| of items to be recommended was fixed to N, where \(N \in \mathbb {N}\). The set R was compared to the actions from the test window to evaluate recommendation quality. Recommendation results were evaluated by the precision at N metric, which gives the fraction of the N items recommended to the user that the user actually chose. It is computed as
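The time-gap heuristic can be sketched as follows for the time-ordered actions of a single user. The function name is hypothetical, and whether a gap of exactly 30 minutes starts a new session is our assumption (here it does not).

```python
from datetime import datetime, timedelta


def split_sessions(actions, gap=timedelta(minutes=30)):
    """Split one user's time-ordered (timestamp, action) pairs into sessions
    wherever two consecutive actions are more than `gap` apart."""
    sessions = []
    last_time = None
    for ts, action in actions:
        if last_time is None or ts - last_time > gap:
            sessions.append([])       # gap exceeded: start a new session
        sessions[-1].append(action)
        last_time = ts
    return sessions


t0 = datetime(2015, 3, 1, 10, 0)
actions = [(t0, "a"), (t0 + timedelta(minutes=10), "b"),
           (t0 + timedelta(minutes=45), "c"), (t0 + timedelta(minutes=74), "d")]
```

The 35-minute pause between `b` and `c` opens a second session, while the following 29-minute pause does not.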

$$\begin{aligned} \hbox {Precision}@N=\frac{\left| R \cap \hbox {TE} \right| }{\left| R \right| }; \left| R \right| =N; N \in \mathbb N. \end{aligned}$$
(1)
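The train/test split and Eq. (1) translate directly into two small helper functions (the names are hypothetical): the first ews actions form TR, the remaining actions form TE, and Precision@N is the share of the N recommended items that appear in TE.

```python
def split_windows(session, ews):
    """Train window TR = first ews actions, test window TE = the rest."""
    return session[:ews], session[ews:]


def precision_at_n(recommended, test_window, n):
    """Precision@N from Eq. (1): |R ∩ TE| / |R| with |R| = N."""
    r = set(recommended)
    if len(r) != n:
        raise ValueError("exactly N distinct recommendations are required")
    return len(r & set(test_window)) / n


train, test = split_windows(["a", "b", "c", "d", "e"], ews=2)
```

For example, recommending c, x, e and y against the test window c, d, e gives a Precision@4 of 0.5.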

Datasets

We evaluated the proposed method on two datasets from domains with different characteristics. The first dataset comes from the e-learning system ALEF [6], the second from a news portal. Pre-processing of both datasets consisted of session identification and the removal of very short sessions, since a session consisting of a single action cannot be divided into train and test windows. As our method processes the data in a single pass, we evaluated it with a “test-then-train” approach, where each session is first used to evaluate the method and then to train it [7]. In this way, the method was evaluated on all dataset instances (sessions).

The e-learning dataset contains 24k sessions made by 870 users. Actions were performed between October 2010 and April 2013 [917 days, of which only 737 were active (with at least one user visiting at least one page)]. On average, there are 33.4 sessions per active day, with 15 actions per session. In this e-learning system, users could visit 2072 learning objects (web pages) in total.

The news dataset contains 334k sessions from 199k users. Data were collected between March 2015 and July 2015 (122 active days). On average, there are 2739.8 sessions per day, with three actions per session, which differs significantly from the e-learning dataset. In addition, we removed all user sessions shorter than three actions. Many users were less active than users of the e-learning system and did not return to the site as often. In the news domain, there are typically several thousand pages (articles), each active for too short a time for behavioural patterns over their visits to be recognized. For this reason, in the news dataset, we assigned pages (i.e., articles) to 85 categories (e.g., culture, sport) and mined patterns over these categories instead of articles. This abstraction brought a significant improvement in pattern recognition speed (a lower number of possible actions and patterns) and an increase in pattern support in comparison to the e-learning dataset.

Hyperparameter Optimization

Considering the number of input hyperparameters our method accepts (Sect. 3.3), we used a grid search to find the most promising configurations, i.e., those maximizing performance (recommendation precision). To reduce the number of hyperparameter value combinations, we tuned the method parts independently based on their group (Table 4). In this section, we describe the best configurations of individual hyperparameters.

Patterns for the e-learning dataset were discovered in a high-dimensional space of all accessible pages, which slowed down the mining process. In the news dataset, mining was much faster due to the lower dimensionality of actions (page category abstraction) and, therefore, the mts value did not affect the process. In Table 5, we show the tested hyperparameter values and their groups based on the method component they describe.

Hyperparameters from the first group are used to optimize the pattern mining component. Let h be the number of recent sessions stored in memory (\(h = ws \times sl\)). Evaluation of every hyperparameter configuration started after a certain number of transactions had been processed, computed as \(h_m=\max _{c \in C} h_c\), where C is the set of all searched hyperparameter configurations. This approach was used because configurations with a smaller h can start recommending earlier than configurations with a higher h. For the first group (pattern mining), we evaluated 270 different hyperparameter configurations. In both domains, we observed that configurations with a longer window size (\(ws\ge 10\)), shorter segment length (\(sl\le 50\)) and lower relaxation rate (\(rr\le 0.5\)) reached the highest precision. The reason was that, for these configurations, patterns were updated very often (after every segment).
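The common evaluation start \(h_m\) follows directly from \(h = ws \times sl\). The grid below is hypothetical and serves only to illustrate the computation; it does not reproduce the values tested in Table 5.

```python
from itertools import product


def memory_horizon(config):
    """h = ws * sl: number of recent sessions a configuration keeps in memory."""
    return config["ws"] * config["sl"]


# Hypothetical grid for illustration only (not the values of Table 5)
configs = [{"ws": ws, "sl": sl} for ws, sl in product([5, 10, 20], [25, 50, 100])]
# Every configuration is evaluated only after h_m transactions, so all of
# them are compared on the same part of the stream.
h_m = max(memory_horizon(c) for c in configs)
```

With this grid, the largest window (ws = 20, sl = 100) determines the start, so h_m is 2000.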

For the first hyperparameter group, we observed that small values of ms generated a high number of patterns. This slowed the method down, so that it failed to meet the minimal speed requirements for the e-learning domain. On the other hand, high ms values generated only a small number of patterns, so some potentially important patterns were missed.
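The effect of the support threshold on the number of patterns can be illustrated with a brute-force frequent itemset counter. This sketch assumes that ms denotes a minimum relative support (the FP-Growth setting \(ms = 0.02\) reported later in this section points that way). It enumerates all subsets and is therefore usable only on toy data; it is not IncMine.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_itemsets(sessions: List[List[str]], ms: float,
                      max_size: int = 3) -> Dict[FrozenSet[str], float]:
    """Itemsets whose relative support (fraction of sessions containing them) is >= ms."""
    transactions = [frozenset(s) for s in sessions]
    n = len(transactions)
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        # Enumerate every subset up to max_size; exponential, fine only for toy data.
        for size in range(1, min(max_size, len(t)) + 1):
            for combo in combinations(sorted(t), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {k: c / n for k, c in counts.items() if c / n >= ms}


sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]
for ms in (0.25, 0.5, 0.75):
    print(ms, len(frequent_itemsets(sessions, ms)))
# 0.25 9
# 0.5 5
# 0.75 2
```

Even on four sessions, lowering ms from 0.75 to 0.25 multiplies the pattern count; on a high-dimensional page space the growth is far steeper.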

Tuning hyperparameters from the second group (clustering) yielded 120 different configurations. We observed that clustering users into a small number of groups (\(\le 4\)) performed worse, because some groups were incorrectly merged.

In both domains, we observed the best results when using a small threshold of changes in the user model (\(tuc=5\)) in combination with a high threshold of changes in microclusters (\(tcm\ge 400\)). This means that microclusters were updated frequently, reflecting recent user behaviour, while the number of microcluster updates required to perform macroclustering was reasonably high. In this way, the method gathered enough information to cluster users correctly.
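The interplay of the two thresholds can be pictured as a two-level trigger. The sketch below is hypothetical. It assumes that tuc counts changes to one user's model before that model is pushed to the microclusters, and that tcm counts microcluster updates before macroclustering is re-run. The class name and event log are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List


class ClusteringScheduler:
    """Hypothetical two-level trigger: user-model changes -> microcluster updates -> macroclustering."""

    def __init__(self, tuc: int, tcm: int) -> None:
        self.tuc = tuc  # changes to a user model before it updates the microclusters
        self.tcm = tcm  # microcluster updates before macroclustering is re-run
        self.pending_changes: Dict[str, int] = defaultdict(int)
        self.microcluster_updates = 0
        self.events: List[str] = []

    def on_user_action(self, user: str) -> None:
        self.pending_changes[user] += 1
        if self.pending_changes[user] >= self.tuc:
            self.pending_changes[user] = 0
            self.microcluster_updates += 1
            self.events.append(f"micro:{user}")
            if self.microcluster_updates >= self.tcm:
                self.microcluster_updates = 0
                self.events.append("macro")


scheduler = ClusteringScheduler(tuc=2, tcm=2)
for user in ["u1", "u1", "u2", "u2"]:
    scheduler.on_user_action(user)
print(scheduler.events)  # ['micro:u1', 'micro:u2', 'macro']
```

Under this reading, \(tuc=5\) and \(tcm\ge 400\) mean frequent, cheap microcluster updates but comparatively rare, well-informed macroclustering runs.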

For the third group of hyperparameters (general), we tuned the hyperparameter ews (evaluation window size). Let rc be the number of recommended items, given as an application input hyperparameter. For recommendation evaluation, we could only use sessions with length \(|S|\ge ews + rc\). For this reason, tuning ews and rc changed the number of sessions available for evaluation. We could not directly compare hyperparameter configurations with different values of ews and rc (this would only be possible if we omitted a high number of short sessions). We observed that for the e-learning dataset, the best results were achieved for \(2 \le ews \le 6\). For the news dataset, with its many short sessions, \(ews=1\) performed best when recommending a single action. Recommending multiple actions, however, yielded better results with higher ews values.
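A minimal sketch of this evaluation protocol follows. It assumes that the first ews actions of a session serve as input and the following rc actions as ground truth, and that Precision@N counts the share of the top N recommendations found among those actions. The split, the function names, and `toy_recommend` are assumptions, not the authors' exact procedure.

```python
from typing import Callable, List, Sequence


def eligible(sessions: List[Sequence[str]], ews: int, rc: int) -> List[Sequence[str]]:
    """Sessions long enough for ews context actions plus rc target actions (|S| >= ews + rc)."""
    return [s for s in sessions if len(s) >= ews + rc]


def precision_at_n(recommended: Sequence[str], actual: Sequence[str], n: int) -> float:
    """Share of the top-n recommended items that occur among the actual actions."""
    actual_set = set(actual)
    return sum(1 for item in list(recommended)[:n] if item in actual_set) / n


def evaluate(sessions: List[Sequence[str]],
             recommend: Callable[[Sequence[str], int], List[str]],
             ews: int, rc: int) -> float:
    """Average Precision@rc over eligible sessions."""
    scores = []
    for s in eligible(sessions, ews, rc):
        context, targets = s[:ews], s[ews:ews + rc]  # assumed split: context, then targets
        scores.append(precision_at_n(recommend(context, rc), targets, rc))
    return sum(scores) / len(scores) if scores else 0.0


def toy_recommend(context: Sequence[str], k: int) -> List[str]:
    """Hypothetical recommender that always suggests the same two categories."""
    return ["sport", "culture"][:k]


sessions = [["sport", "news", "sport"], ["culture", "culture"],
            ["news", "culture", "sport", "news"]]
score = evaluate(sessions, toy_recommend, ews=1, rc=2)
print(score)  # 0.75
```

Raising ews from 1 to 2 (with rc = 2) leaves only one of the three sessions eligible, which illustrates why configurations with different ews and rc are evaluated on different session sets.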

Table 5 Values of input hyperparameters tested in the grid search optimization and the best configurations found for the e-learning and news domains

Next, we observed that early deletion of inactive user models, i.e., right after a new macroclustering is performed (when \( tcdiff \) is low), decreased the precision. In both domains, precision tended to increase as the \( tcdiff \) threshold increased. In the e-learning dataset, the ascending precision trend was visible up to \( tcdiff =5\). In the news domain, the recommendation precision grew proportionally to \( tcdiff \) up to \( tcdiff =15\), where it stagnated. In this domain, user actions are represented as category visits, which are stable and thus usable over longer time periods, so user models should be retained for as long as users are active on the website. As a result, for each dataset we found the optimal configuration maximizing precision and session speed rate (Table 5).
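One possible reading of the \( tcdiff \) threshold is sketched below. It is hypothetical: a user model is assumed to be kept only if the user was active within the last \( tcdiff \) macroclustering iterations. The function name and the data layout (user mapped to the iteration of their last activity) are illustrative.

```python
from typing import Dict


def prune_inactive(last_active: Dict[str, int], iteration: int, tcdiff: int) -> Dict[str, int]:
    """Keep user models active within the last tcdiff macroclustering iterations (hypothetical rule)."""
    return {user: it for user, it in last_active.items() if iteration - it <= tcdiff}


# Hypothetical users mapped to the macroclustering iteration of their last activity.
last_active = {"u1": 10, "u2": 6, "u3": 8, "u4": 0}
for tcdiff in (2, 5, 15):
    print(tcdiff, sorted(prune_inactive(last_active, iteration=10, tcdiff=tcdiff)))
# 2 ['u1', 'u3']
# 5 ['u1', 'u2', 'u3']
# 15 ['u1', 'u2', 'u3', 'u4']
```

A small \( tcdiff \) discards models of users who merely paused, which is consistent with the precision loss observed for low thresholds.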

Results

We performed several experiments aimed at answering our research questions, exploring the performance of the proposed HyBPMine method from several perspectives. First, we compared HyBPMine to its components, which showed that the pattern combination is beneficial. Next, we compared our hybrid method to the state of the art, evaluating pattern quality and computation speed. Finally, we evaluated the characteristics of the proposed method when processing long data streams.

RQ1 Does the idea of proposed combination of general patterns (mined from all website users) with specific patterns (mined from groups of similar users) improve the quality (evaluated indirectly as the precision of a personalized recommendation based on the mined patterns) of mined behavioural patterns?

For each user session, we separately recorded the results of recommendations employing global behavioural patterns only (GL), group behavioural patterns only (GR), and their combination, i.e., our proposed hybrid method (HyBPMine). This results in three different recommendation sets and their intersections (Fig. 3).

Fig. 3

Labelling of successful recommendations produced by the method using global patterns only (GL), the method using group patterns only (GR), and the proposed hybrid method (HyBPMine). Each subset is labelled H (hits) followed by three bits marking which of the individual methods produced the hit

We performed an unpaired t-test to compare the precision (Table 6) of the proposed method and the baseline methods. For both datasets, HyBPMine achieved a significant increase in precision (\(p < 0.0001\); \(df = 7600\) for the e-learning dataset, \(df = 12598\) for the news dataset) compared to GL (1.7% for the e-learning dataset, 0.9% for the news dataset) and GR (21.4% for the e-learning dataset, 41.4% for the news dataset).
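For reference, the unpaired t statistic can be computed as below. This sketch assumes the pooled-variance (Student) variant with \(df = n_1 + n_2 - 2\) and treats per-session precision values as the two samples; neither detail is confirmed by the text. The p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` uses the same pooled statistic.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def unpaired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pool the two sample variances (each computed with an n - 1 denominator).
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Toy per-session precision values, not the paper's data.
t, df = unpaired_t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```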

To compare the proposed hybrid method (HyBPMine) not only to its parts (GL and GR used individually), we also present the results of the best theoretical combination (TC) of both individual methods (Table 6). In this case, for every session, the items with the highest recommendation precision were chosen from the GL and GR results. This represents the ideal combination (unreachable in real settings) of items to be recommended based on patterns identified by the GL and GR methods. On this basis, we assessed how close HyBPMine came to the ideal combination (chosen using a posteriori knowledge of the actually chosen items).
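The theoretical combination can be sketched as an oracle. It draws candidates from the union of the GL and GR recommendation lists and, using knowledge of the items the user actually visited, ranks hits first. This item-level interpretation and the function names are assumptions; the text only states that, for every session, the items with the highest precision were chosen from the GL and GR results.

```python
from typing import List, Sequence, Set


def precision(recs: Sequence[str], actual: Set[str], n: int) -> float:
    """Precision@n: share of the first n recommendations that were actually visited."""
    return sum(1 for r in list(recs)[:n] if r in actual) / n


def theoretical_combination(gl: Sequence[str], gr: Sequence[str],
                            actual: Set[str], n: int) -> List[str]:
    """Oracle TC: from the union of GL and GR candidates, put actually visited items first."""
    candidates = list(dict.fromkeys([*gl, *gr]))  # ordered union, duplicates removed
    hits = [c for c in candidates if c in actual]   # uses a posteriori knowledge
    misses = [c for c in candidates if c not in actual]
    return (hits + misses)[:n]


gl, gr, actual = ["a", "b"], ["c", "d"], {"a", "c"}
tc = theoretical_combination(gl, gr, actual, n=2)
print(tc, precision(gl, actual, 2), precision(gr, actual, 2), precision(tc, actual, 2))
# ['a', 'c'] 0.5 0.5 1.0
```

Because the oracle peeks at the ground truth, TC is an upper bound for any combination of GL and GR candidates rather than a deployable method.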

We observed (Table 6) a fairly large intersection of successful recommendations generated from global and group patterns (\(H110 + H111\)). The number of successful recommendations generated using global patterns only (\(H100 + H101\)) was much higher than the number generated using group patterns only (\(H010 + H011\)). This was because the small number of sessions within groups resulted in low-quality patterns (low support). We observed that the precision was higher in larger groups with many active users (several hundred).
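The Hxxx labels can be computed from the three sets of successful recommendations. The bit order (GL, GR, HyBPMine) is inferred from the way the text groups \(H110 + H111\) as the GL and GR intersection and \(H100 + H101\) as GL-only hits, so it is an assumption. Representing each hit as a (session, item) pair is also illustrative.

```python
from collections import Counter
from typing import Hashable, Set

Hit = Hashable  # e.g. a (session_id, item) pair that was correctly recommended


def hit_label(hit: Hit, gl: Set[Hit], gr: Set[Hit], hy: Set[Hit]) -> str:
    """'H' followed by one bit per method, in the assumed order GL, GR, HyBPMine."""
    return "H" + "".join("1" if hit in hits else "0" for hits in (gl, gr, hy))


def label_counts(gl: Set[Hit], gr: Set[Hit], hy: Set[Hit]) -> Counter:
    """Number of successful recommendations per Hxxx label."""
    return Counter(hit_label(h, gl, gr, hy) for h in gl | gr | hy)


gl = {(1, "a"), (1, "b"), (2, "c")}
gr = {(1, "a"), (2, "d")}
hy = {(1, "a"), (1, "b"), (2, "d")}
counts = label_counts(gl, gr, hy)
print(counts["H110"] + counts["H111"])  # hits shared by GL and GR: 1
print(counts["H100"] + counts["H101"])  # GL-only hits: 2
print(counts["H010"] + counts["H011"])  # GR-only hits: 1
```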

Table 6 Results of recommendation Precision@N (p@N) of proposed hybrid method (HyBPMine) compared to methods using global patterns only (GL), group patterns only (GR) and their best theoretical combination (TC) (in %). Additionally, we provide results of various method combinations and subsets, which use labels Hxxx defined in Fig. 3

In addition to the average precision, we also monitored the recommendation precision within each user group over time during stream processing. Results were logged at regular intervals before each new macroclustering iteration (9 measurements in total during stream processing of the e-learning dataset and 18 for the news dataset) (Fig. 4). In this way, we can observe how the precision evolves over time in both domains.

Fig. 4

Recommendation precision (averaged over user groups) of the method using global patterns only (GL), the method using group patterns only (GR), and the proposed hybrid method (HyBPMine) for the e-learning (top) and news (bottom) datasets. Iterations on the x-axis are taken at regular intervals, each time a new macroclustering is calculated

To answer RQ1, our results (Fig. 4) show that the proposed method (HyBPMine) reached the highest precision (averaged over user groups) on both datasets, which supports the idea of combining global and group patterns. The datasets differ, however, in the relative performance of the method using global patterns only (GL) and the method using group patterns only (GR). For the e-learning dataset, GL reached higher precision than GR, whereas for the news dataset the opposite was true. We believe this was caused by the number of users in the datasets. The news dataset contains a high number of users, so they could be clustered into highly similar, high-quality groups. For this reason, and despite the short sessions, the mined group patterns offered higher precision when used for recommendation. The e-learning dataset contains fewer users, who performed more specific sessions (a higher variety of visited pages). Consequently, it was not possible to create group patterns of sufficient quality for high-precision recommendation. Despite this restriction, group patterns were still useful, as shown by HyBPMine outperforming the GL method.

RQ2 Which algorithms for stream clustering and behavioural pattern mining produce optimal results from the quality and computational load point of view? How does the proposed approach perform in comparison to state-of-the-art methods?

The method was designed to be modular, so individual components (e.g., CluStream for clustering and IncMine for pattern mining) can easily be replaced. To answer RQ2, we compared the proposed method to variants using different combinations of clustering algorithms (including ClusTree [21]) and pattern mining algorithms (including EstDec+ [15] and CloStream [36]) (Table 7). We used the ClusTree implementation from the MOA framework and the CloStream and EstDec+ implementations from the SFPM framework [15]. CloStream requires no input hyperparameters; to optimize the EstDec+ hyperparameters, we used the same approach as described for IncMine. The results showed that CluStream yielded slightly better recommendation precision than ClusTree. However, ClusTree appeared to be faster on the lower-dimensional data of the news dataset. Its advantage is its ability to adapt to the speed of the stream; therefore, when the stream speed changes rapidly, ClusTree may be more suitable than CluStream. Compared to the other algorithms, IncMine achieved the best results in terms of both speed and precision when the number of recommended items was \(rc\le 5\).
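The modular design can be illustrated with two small interfaces: one for a stream clusterer and one for a stream pattern miner. The sketch below is hypothetical. The `Protocol` names, method signatures, and the merge rule in `patterns_for` (group patterns first, then global ones, without duplicates) are assumptions, and the toy classes `FixedClusterer` and `ItemMiner` only stand in for CluStream/ClusTree and IncMine/EstDec+/CloStream.

```python
from collections import Counter
from typing import Callable, Dict, List, Protocol, Sequence


class StreamClusterer(Protocol):
    def update(self, user: str, session: Sequence[str]) -> None: ...
    def group_of(self, user: str) -> int: ...


class StreamPatternMiner(Protocol):
    def add(self, session: Sequence[str]) -> None: ...
    def patterns(self) -> List[frozenset]: ...


class HybridMiner:
    """Hypothetical wiring: one global miner plus one miner per user group."""

    def __init__(self, clusterer: StreamClusterer,
                 new_miner: Callable[[], StreamPatternMiner]) -> None:
        self.clusterer = clusterer
        self.new_miner = new_miner
        self.global_miner = new_miner()
        self.group_miners: Dict[int, StreamPatternMiner] = {}

    def process(self, user: str, session: Sequence[str]) -> None:
        self.clusterer.update(user, session)
        self.global_miner.add(session)
        group = self.clusterer.group_of(user)
        if group not in self.group_miners:
            self.group_miners[group] = self.new_miner()
        self.group_miners[group].add(session)

    def patterns_for(self, user: str) -> List[frozenset]:
        """Group patterns first, then global ones, without duplicates (assumed merge rule)."""
        group = self.clusterer.group_of(user)
        own = self.group_miners[group].patterns() if group in self.group_miners else []
        return list(dict.fromkeys(own + self.global_miner.patterns()))


class FixedClusterer:
    """Toy stand-in for a stream clusterer: users are pre-assigned to groups."""

    def __init__(self, assignment: Dict[str, int]) -> None:
        self.assignment = assignment

    def update(self, user: str, session: Sequence[str]) -> None:
        pass  # a real stream clusterer would update its microclusters here

    def group_of(self, user: str) -> int:
        return self.assignment[user]


class ItemMiner:
    """Toy stand-in for a stream pattern miner: items seen in at least two sessions."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def add(self, session: Sequence[str]) -> None:
        self.counts.update(set(session))

    def patterns(self) -> List[frozenset]:
        return [frozenset([i]) for i, c in sorted(self.counts.items()) if c >= 2]


hm = HybridMiner(FixedClusterer({"u1": 0, "u2": 0, "u3": 1}), ItemMiner)
for user, session in [("u1", ["a", "b"]), ("u2", ["a"]), ("u3", ["c"]), ("u3", ["c"])]:
    hm.process(user, session)
print(hm.patterns_for("u1"))  # group pattern {'a'} first, then global {'c'}
```

Swapping the clustering or mining algorithm then only requires passing a different object that satisfies the same interface.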

Table 7 Comparison of the proposed hybrid method HyBPMine using the CluStream and IncMine algorithms to method variants using different combinations of clustering and pattern mining algorithms. Experiments were executed on the experimental prototype of the proposed method (serial, non-distributed version). The speed rate (Sr) represents the number of sessions processed per second; p@N is Precision@N (N recommended items).

We also compared our method to FP-Growth [16], a well-known exact frequent itemset mining algorithm for static datasets (Table 8). To enable a comparison with this off-line approach, we divided the datasets chronologically into training sets (167k sessions in the news dataset, 12k in the e-learning dataset) and test sets. For FP-Growth, we performed hyperparameter optimization (e-learning dataset: \(ms = 0.02\), \(ews = 2\); news dataset: \(ms = 0.0004\), \(ews = 1\)). Our method achieved a higher average recommendation precision (57% on the news dataset, 52% on the e-learning dataset) than FP-Growth (only 33% and 24%, respectively). This is a promising result in the recommender system domain. In addition, our method required less time for both training and recommendation. This result indicates that our method identifies the most recent, highly supported patterns while retaining long-lasting patterns, which makes it well suited for personalized recommendation within highly dynamic websites.
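The chronological split used for the static baseline can be sketched as follows. Ordering sessions by a start timestamp is an assumption; `n_train` would be 167k for the news dataset and 12k for the e-learning dataset. Unlike a random split, this never trains the static miner on sessions that occur after the test sessions.

```python
from typing import List, Tuple

Session = Tuple[float, List[str]]  # (start timestamp, visited items or categories)


def chronological_split(sessions: List[Session],
                        n_train: int) -> Tuple[List[Session], List[Session]]:
    """Earliest n_train sessions train the static miner; all later sessions form the test set."""
    ordered = sorted(sessions, key=lambda s: s[0])
    return ordered[:n_train], ordered[n_train:]


sessions = [(3.0, ["sport"]), (1.0, ["culture"]), (2.0, ["sport", "culture"])]
train_set, test_set = chronological_split(sessions, n_train=2)
print([t for t, _ in train_set], [t for t, _ in test_set])  # [1.0, 2.0] [3.0]
```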

Table 8 Comparison of the proposed hybrid method HyBPMine to the FP-Growth method on the static datasets. Experiments were executed on the experimental prototype of the proposed method (serial, non-distributed version). p@N represents Precision@N (N recommended items).

To summarize the characteristics discussed in the related work section, Table 9 lists the characteristics of HyBPMine in its best configuration of component algorithms, as presented in this subsection.

Table 9 List of HyBPMine characteristics based on best configuration (CluStream and IncMine).

RQ3 What are the characteristics of the HyBPMine method for behavioural pattern mining when dealing with a data stream?

For approaches specialized in data stream processing, computation time is one of the crucial usability criteria. To answer this research question, we observed the speed rate of the proposed method (pattern identification without recommendation). For evaluation purposes, we defined the speed rate metric as the average number of transactions (sessions) processed per second.
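The speed rate metric can be measured as sketched below. Here `process` stands for the pattern identification step only (no recommendation), and wall-clock time measured with `time.perf_counter` is an assumed choice of clock.

```python
import time
from typing import Callable, Iterable, Sequence


def speed_rate(sessions: Iterable[Sequence[str]],
               process: Callable[[Sequence[str]], None]) -> float:
    """Average number of sessions processed per second of wall-clock time."""
    count = 0
    start = time.perf_counter()
    for session in sessions:
        process(session)  # pattern identification only; no recommendation step
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0:
        return 0.0
    return count / elapsed if elapsed > 0 else float("inf")


seen = []
rate = speed_rate([["sport", "culture"]] * 10_000, seen.append)
print(f"{rate:.0f} sessions/s")
```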

To reduce the computational overhead caused by clustering and group-specific pattern mining, we use parallelism [35]. We deployed and evaluated our method using a distributed Apache Storm topology (described in Sect. 3.2). In this way, pattern mining was performed in parallel for each user group. To account for the computational cost of user clustering, group pattern creation, and their combination with global patterns, we compared the maximal achievable speed rate of the serial implementation of the proposed hybrid method and of its parallel implementation deployed on the distributed topology to the speed rate of the GL method (Fig. 5).
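The idea of mining each user group in parallel can be sketched on a single machine with `concurrent.futures`. This is not an Apache Storm topology: a thread pool stands in for the distributed workers, and `mine_group` is a toy per-group miner. For CPU-bound mining in Python, a `ProcessPoolExecutor` (same `submit` interface) would be needed to get a real speed-up.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple


def mine_group(sessions: List[Sequence[str]], ms: float) -> List[str]:
    """Toy per-group miner: items whose relative support within the group is >= ms."""
    counts = Counter(item for s in sessions for item in set(s))
    return sorted(i for i, c in counts.items() if c / len(sessions) >= ms)


def mine_groups_in_parallel(batch: List[Tuple[int, Sequence[str]]], ms: float,
                            workers: int = 4) -> Dict[int, List[str]]:
    """Partition a batch of (group_id, session) pairs by group and mine each group concurrently."""
    by_group: Dict[int, List[Sequence[str]]] = defaultdict(list)
    for group, session in batch:
        by_group[group].append(session)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {g: pool.submit(mine_group, ss, ms) for g, ss in by_group.items()}
        return {g: f.result() for g, f in futures.items()}


batch = [(0, ["a", "b"]), (0, ["a"]), (1, ["c"]), (1, ["c", "d"])]
result = mine_groups_in_parallel(batch, ms=1.0)
print(result)  # {0: ['a'], 1: ['c']}
```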

Fig. 5

Average speed rate of the proposed HyBPMine method for the e-learning (top) and news (bottom) datasets. Measurements were taken at regular intervals, after every N processed sessions (\(N=2\)k for the e-learning dataset, \(N=15\)k for the news dataset)

The differences in speed rate between GL, HyBPMine, and the distributed version of HyBPMine remain constant (Fig. 5). As a result, their usability from the computational time point of view is almost identical. Naturally, GL is faster than the other two methods, which use it as part of their computation (together with GR, which finds group patterns). On the other hand, they reached better average group precision (Fig. 4), so they are in general more useful than the GL method.

The use of the distributed version of HyBPMine is also supported by the maximum speed rate and corresponding precision of the compared approaches (Fig. 6). As the mts parameter increases, the speed rate rises but the precision decreases. Depending on the application requirements, it is reasonable to restrict mts, to use the distributed version, or both.

Similarly to the speed rate evaluation, we explored the precision metric over time (i.e., over processed sessions). As Fig. 4 shows, the proposed HyBPMine method delivers stable performance over time (after the standard initial cold-start period).

Fig. 6

Precision decreases as the speed rate increases (controlled by the mts parameter). Results are shown up to the maximum speed rate reached by each approach

Finally, we performed a speed rate endurance test of the proposed HyBPMine method deployed on the Apache Storm topology, using the news dataset extended to 5 million sessions. The results (Fig. 7) show that the speed rate stabilizes after a certain number of sessions have been processed. Together, these results answer RQ3: the method is able to process a stream effectively even when it contains a large amount of data.

Fig. 7

A speed rate endurance test of the proposed HyBPMine method on the multiplied news dataset

Conclusions

Behavioural patterns have wide application, as they describe typical user actions. Existing pattern mining methods include a computationally expensive off-line part, which limits their usage in dynamic web applications. For this reason, in this paper, we proposed a novel method designed to perform behavioural pattern mining online by means of data stream processing algorithms. The method identifies behavioural patterns (sets of frequently visited pages or categories) by combining frequent closed itemsets of actions performed by all users with the itemsets of actions performed by groups of similar users. In this way, we are able to find patterns more specific to individual users than those typically offered by existing approaches. To evaluate the proposed method, we formulated three research questions.
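For readers unfamiliar with closed itemsets, the definition can be checked directly: a frequent itemset is closed if no proper superset has the same support. Checking supersets only among frequent itemsets is sufficient, because a superset with equal support is itself frequent. The quadratic check below is purely illustrative. Stream miners maintain closed itemsets incrementally rather than this way, and the support values are those of a toy example.

```python
from typing import Dict, FrozenSet


def closed_itemsets(support: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """Keep itemsets that have no proper superset with the same support."""
    return {
        items: sup
        for items, sup in support.items()
        if not any(items < other and sup == other_sup
                   for other, other_sup in support.items())
    }


# Absolute supports of all itemsets in four toy sessions {a,b,c}, {a,b}, {a,c}, {b,d};
# single-character item names, so frozenset("ab") == {"a", "b"}.
support = {frozenset(k): v for k, v in
           {"a": 3, "b": 3, "c": 2, "d": 1, "ab": 2, "ac": 2, "bc": 1, "bd": 1, "abc": 1}.items()}
closed = closed_itemsets(support)
print(sorted("".join(sorted(s)) for s in closed))  # ['a', 'ab', 'abc', 'ac', 'b', 'bd']
```

Here {c} is not closed because every session containing c also contains a, so {a, c} has the same support and carries strictly more information.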

To answer RQ1, we compared the proposed HyBPMine method to its parts (the method mining global behavioural patterns and the method mining patterns for groups of similarly behaving users) used individually. The evaluation was carried out on datasets from the e-learning and news domains, which have highly different characteristics (number of items and users, average session length, etc.). We performed multiple experiments to investigate how useful the combination of global and group patterns is. The quality of the mined patterns was evaluated indirectly through a recommendation task. We measured Precision@N and the computation speed rate of pattern identification. The results showed that combining group and global patterns in the HyBPMine method significantly increases Precision@N (for all evaluated values of N) compared to using global or group patterns individually, in both domains.

Next, to answer RQ2, we compared the proposed method, using CluStream clustering and IncMine pattern mining, to state-of-the-art methods. As the proposed method is designed to be modular, we also explored which state-of-the-art algorithms produce optimal results from the quality and computational load point of view. We also evaluated our method against its individual components. We found that HyBPMine offers the highest precision on the recommendation task and also one of the highest computation speed rates (the setting without clustering was slightly faster, as it does not mine group patterns).

In RQ3, we explored the characteristics of HyBPMine when dealing with a continuous data stream. We found that the computational cost of the proposed method, compared to the method mining global patterns only, is higher only by a constant amount (caused by the identification of user groups and group pattern mining). This small additional load makes the method suitable for production use. The computational load proved to be constant even on the extended dataset containing several million sessions. In this way, we demonstrated the usability of the method on extensive continuous data streams.

To conclude, we showed that the proposed HyBPMine method delivers the best performance on the recommendation task among the state-of-the-art approaches we compared. We also showed that our method is scalable and able to handle a stream of user actions within a web application.

Nowadays, the volume of data generated by users' interactions on the web keeps increasing. This naturally drives research into new methods able to process large amounts of data effectively. In the future, we expect more and more stream methods to arise and be used for various tasks. Our future aim is to evaluate HyBPMine on short-term prediction tasks, such as prediction of the next user action, session end intent [20], or user loss prediction.