Changing perspectives: Using graph metrics to predict purchase probabilities
Introduction
The e-commerce sector is responsible for a substantial fraction of firm revenues. Annual turnover was 1336 billion US dollars in 2014 and is predicted to have reached 2050 billion US dollars in 2016 (Statista, 2016a). However, given that growth rates are expected to decline in the future (Statista, 2016b), e-commerce shops need to find ways to defend market shares in an increasingly competitive environment. One strategy is to increase purchasing amounts and/or frequencies of existing customers. Important determinants of (re-)purchase intention in online shopping are trust, service quality (Hong & Kim, 2012) and user satisfaction during the online shopping process (Lee, Choi, & Kang, 2009). To offer a richer user experience and increase visitors’ (re-)purchase intentions, understanding customer online behavior is crucial (e.g., Pai, Sharang, Yadagiri, & Agrawal, 2014). To gain such insight and to anticipate user actions, the analysis of clickstream data has been widely adopted in the literature (e.g., Park & Park, 2016; Van den Poel & Buckinx, 2005).
However, previous work in the field has not examined the potential of graph theory to gather auxiliary information from clickstream data and increase the accuracy of behavior prediction models. Graphs are a modeling construct from network theory: they consist of nodes and of edges that connect pairs of nodes. Graph-based approaches have been used in various fields and have proven helpful for a range of tasks, for example predicting connections in the social networking context (He, Liu, Hu, & Wang, 2015), detecting money laundering activities (Colladon & Remondi, 2017), making personalized recommendations (Shams & Haratizadeh, 2017) and preventing customer churn (Óskarsdóttir et al., 2017). Given the success of graph-based predictors in these and other applications, the objective of our paper is to test their potential for online behavior prediction based upon clickstream data.
We contribute to the literature as follows: First, we propose an approach to derive graphs from user sessions based on clickstream data. Second, we calculate graph metrics and examine their pairwise dependency in terms of correlation. Third, we assess how well they perform as predictors of customer behavior in online contexts.
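The second contribution, examining pairwise dependency among graph metrics, can be illustrated with a minimal sketch. It assumes Pearson correlation (the excerpt does not state which correlation coefficient is used), and the metric names and values below are purely illustrative, not data from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant metric
    return cov / sqrt(var_x * var_y)


# Hypothetical metric values, one entry per observed graph (assumption).
metrics = {
    "nodes": [1, 2, 3, 3, 4, 5],
    "edges": [0, 1, 2, 3, 4, 5],
    "density": [0.0, 0.5, 1 / 3, 0.5, 1 / 3, 0.25],
}

# Correlation for every unordered pair of metrics.
pairwise = {
    (a, b): pearson(metrics[a], metrics[b])
    for a, b in combinations(metrics, 2)
}
```

Pairs with a correlation close to 1 or -1 carry largely redundant information, which is one reason to inspect such a table before feeding all metrics into a predictive model.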
The remainder of the paper is structured as follows. First, we give an overview of the relevant literature to clarify the research gap the paper strives to close. Afterwards, we present our methodology and, in particular, how we derive clickstream graphs. We then summarize the resulting data before presenting empirical results. Lastly, we summarize our findings.
Related work
A large body of literature considers the use of clickstream data for customer online behavior prediction. Prediction targets range from conversions in purchase prediction (Van den Poel & Buckinx, 2005), whether visitors redeem incentives (Pai et al., 2014) or complete specific tasks such as putting an item into a basket (Kalczynski et al., 2006; Sismeiro & Bucklin, 2004), through navigational behavior prediction (e.g., the next web path access; Montgomery, Li, Srinivasan, & Liechty, 2004) to classifying
Methodology
The following sections explain our approaches to create clickstream graphs and derive corresponding graph metrics as input for predictive modeling.
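As a rough illustration of what deriving a clickstream graph and its metrics can look like, the sketch below builds one graph per session incrementally, adding a node for each distinct page and a directed edge for each transition between consecutive page views, and computes a few simple metrics after every page view. The construction rule, the function names and the example session are assumptions made for illustration; the paper's actual procedure and its full metric set are not reproduced here.

```python
from collections import deque


def build_session_graphs(page_views):
    """Yield (nodes, edges) snapshots, one after each page view.

    Assumed construction: nodes are distinct pages, and a directed edge
    (a, b) means page b was viewed directly after page a.
    """
    nodes, edges = set(), set()
    previous = None
    for page in page_views:
        nodes.add(page)
        if previous is not None and previous != page:
            edges.add((previous, page))
        previous = page
        yield set(nodes), set(edges)


def density(nodes, edges):
    """Observed directed edges divided by possible edges (no self-loops)."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def diameter(nodes, edges):
    """Longest shortest path, ignoring edge direction (BFS from each node)."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    longest = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical session: page identifiers in viewing order.
session = ["home", "category", "product_a", "category", "product_b", "basket"]

# One feature row per intermediate graph.
features = [
    {
        "nodes": len(n),
        "edges": len(e),
        "density": density(n, e),
        "diameter": diameter(n, e),
    }
    for n, e in build_session_graphs(session)
]
```

Each row in `features` describes the session as it looked after a given page view, so a model trained on such rows can score a session while it is still in progress.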
Empirical results
Based on the methodology discussed above, we report our empirical results in three steps. First, we examine the correlation among the graph measures applied. Second, we analyze the performance of the tested classifiers based on AUC-PR and the lift measure. Finally, we investigate the individual graph measures in order to better understand their impact on predictive accuracy.
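For readers unfamiliar with the two evaluation measures, the following sketch shows one common way to compute them. It assumes the average-precision estimator as the approximation of the area under the precision-recall curve and a lift defined on the top-scored fraction of sessions; the excerpt does not state which estimator or cut-off the study uses, and the labels and scores are invented for illustration.

```python
def average_precision(labels, scores):
    """Average precision: mean precision at the rank of each positive.

    A common step-wise estimate of the area under the PR curve.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives_seen = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            positives_seen += 1
            precision_sum += positives_seen / rank
    return precision_sum / positives_seen if positives_seen else 0.0


def lift(labels, scores, fraction=0.1):
    """Positive rate in the top-scored fraction divided by the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate if base_rate else 0.0


# Hypothetical data: 1 = session ended in a purchase, 0 = no purchase.
labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(labels, scores)
lift_top_20 = lift(labels, scores, fraction=0.2)
```

Both measures focus on how well the purchasing sessions are ranked near the top, which suits settings where purchases are rare relative to non-purchasing sessions.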
Conclusion
Using real-life clickstream datasets of two different shops, we observe for both the linear GLM model and the non-linear Random Forest model that distance- and centrality-based graph metrics are effective in predicting purchase behavior of users. We derived user-centered, session-based graphs from clickstream data, where each graph is developed incrementally, i.e., each new page view of the user develops the graph further. Each of the 23 tested graph metrics is calculated for each intermediate
References (63)
- et al. Algorithms for clustering clickstream data. Information Processing Letters (2009).
- An introduction to ROC analysis. Pattern Recognition Letters (2006).
- et al. An empirical comparison of classification algorithms for mortgage default prediction: Evidence from a distressed mortgage market. European Journal of Operational Research (2016).
- et al. OWA operator based link prediction ensemble for social network. Expert Systems with Applications (2015).
- et al. Segmenting customers in online stores based on factors that affect the customer's intention to purchase. Expert Systems with Applications (2012).
- et al. Estimating product-choice probabilities from recency and frequency of page views. Knowledge-Based Systems (2016).
- et al. Comprehensive study of feature selection methods to solve multicollinearity problem according to evaluation criteria. Expert Systems with Applications (2017).
- et al. Formation of e-satisfaction and repurchase intention: Moderating roles of computer self-efficacy and computer anxiety. Expert Systems with Applications (2009).
- et al. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research (2015).
- Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. Journal of Consumer Psychology (2003).