Elsevier

Expert Systems with Applications

Volume 94, 15 March 2018, Pages 137-148
Changing perspectives: Using graph metrics to predict purchase probabilities

https://doi.org/10.1016/j.eswa.2017.10.046

Highlights

  • We assess the applicability of graph metrics to predict purchase probabilities.

  • Real-world clickstream data of two online retailers is used.

  • Graphs are derived out of sessions of website visitors.

  • Distance- and centrality-based graph metrics are useful for prediction.

  • Closeness vitality, radius, number of circles and self-loops are most important.

Abstract

The prediction of online user behavior (next clicks, repeat visits, purchases, etc.) is a well-studied research subject. Prediction models typically rely on clickstream data that is captured during a website visit and embodies user agent-, path-, time- and basket-related information. The aim of this paper is to propose an alternative approach that extracts auxiliary information from the website navigation graph of individual users and to test the predictive power of this information. Using two large real-world datasets from online retailers, we develop an approach to construct within-session graphs from clickstream data and demonstrate the relevance of the corresponding graph metrics for predicting purchases.

Introduction

The e-commerce sector is responsible for a substantial fraction of firm revenues. Annual turnover was 1336 billion US dollars in 2014 and is predicted to have reached 2050 billion US dollars in 2016 (Statista, 2016a). However, given that growth rates are expected to decline in the future (Statista, 2016b), e-commerce shops need to find ways to defend market shares in an increasingly competitive environment. One strategy is to increase purchasing amounts and/or frequencies of existing customers. Important determinants of (re-)purchase intention in online shopping are trust, service quality (Hong & Kim, 2012) and user satisfaction during the online shopping process (Lee, Choi, & Kang, 2009). To offer a richer user experience and increase visitors’ (re-)purchase intentions, understanding customer online behavior is crucial (e.g., Pai, Sharang, Yadagiri, & Agrawal, 2014). To gain such insight and to anticipate user actions, the analysis of clickstream data has been widely adopted in the literature (e.g., Park & Park, 2016; Van den Poel & Buckinx, 2005).

However, previous work in the field has not examined the potential of graph theory to gather auxiliary information from clickstream data and increase the accuracy of behavior prediction models. Graphs are a methodological approach originating from network theory. They consist of nodes and edges, which connect nodes. Graph-based approaches have been used in various fields and have been proven to be helpful for various tasks, for example to predict connections in the social networking context (He, Liu, Hu, & Wang, 2015), to detect money laundering activities (Colladon & Remondi, 2017), for personalized recommendations (Shams & Haratizadeh, 2017) and for customer churn prevention (Óskarsdóttir et al., 2017). Given the success of graph-based predictors in these and other applications, the objective of our paper is to test their potential for online behavior prediction based upon clickstream data.

We contribute to the literature as follows: First, we propose an approach to derive graphs from user sessions based on clickstream data. Second, we calculate graph metrics and examine their pairwise dependency in terms of correlation. Third, we assess how they perform as a means to predict customer behavior in online contexts.

The remainder of the paper is structured as follows. First, we give an overview of the relevant literature to clarify the research gap the paper strives to close. Afterwards, we present our methodology, in particular how we derive clickstream graphs. We then summarize the resulting data, before presenting empirical results. Lastly, we summarize our findings.

Section snippets

Related work

Much literature considers the use of clickstream data for customer online behavior prediction. Prediction targets range from conversions in purchase prediction (Van den Poel & Buckinx, 2005), whether visitors redeem incentives (Pai et al., 2014) or complete specific tasks such as putting an item into a basket (Kalczynski et al., 2006; Sismeiro & Bucklin, 2004), over navigational behavior prediction (e.g., the next web path access; Montgomery, Li, Srinivasan, & Liechty, 2004) to classifying

Methodology

The following sections explain our approaches to create clickstream graphs and derive corresponding graph metrics as input for predictive modeling.
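The idea of growing a session graph one page view at a time and recomputing metrics after each step can be illustrated with a minimal sketch. It is not the authors' implementation: the snippets above do not specify how nodes and edges are defined, so the sketch assumes that pages are nodes, that consecutive page views form directed edges, and that edge direction is ignored when computing the radius. The page names and the function names (`add_page_view`, `metrics_per_step`) are hypothetical, and only two of the metrics named in the highlights (self-loops and radius) are shown.

```python
from collections import deque


def add_page_view(graph, prev_page, page):
    """Extend a directed session graph (dict: page -> set of successor pages)."""
    graph.setdefault(page, set())
    if prev_page is not None:
        # Assumed edge definition: previous page -> current page.
        # Reloading the same page therefore yields a self-loop.
        graph[prev_page].add(page)
    return graph


def count_self_loops(graph):
    """Number of pages that link to themselves."""
    return sum(1 for page, successors in graph.items() if page in successors)


def eccentricity(undirected, source):
    """Largest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in undirected[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def radius(graph):
    """Minimum eccentricity, ignoring edge direction and self-loops (assumption)."""
    undirected = {page: set() for page in graph}
    for page, successors in graph.items():
        for nxt in successors:
            if nxt != page:
                undirected[page].add(nxt)
                undirected[nxt].add(page)
    return min(eccentricity(undirected, page) for page in undirected)


def metrics_per_step(clicks):
    """Return one metric row per intermediate graph of the session."""
    graph, prev, rows = {}, None, []
    for page in clicks:
        add_page_view(graph, prev, page)
        prev = page
        rows.append({
            "nodes": len(graph),
            "edges": sum(len(s) for s in graph.values()),
            "self_loops": count_self_loops(graph),
            "radius": radius(graph),
        })
    return rows


# Hypothetical session: the visitor reloads the product page once.
session = ["home", "category", "product", "product", "category", "basket"]
rows = metrics_per_step(session)
```

Each row could then serve as one feature vector for a classifier. A graph library such as NetworkX, which appears in the reference list, would be a natural choice for computing a larger set of metrics.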

Empirical results

Based on the methodology discussed above, we report our empirical results in three steps. First, we take a detailed look at the correlation among the graph measures applied. Second, we analyze the performance of the tested classifiers based on AUC-PR and the lift measure. Finally, we investigate the different graph measures in order to better understand their impact on the predictive accuracy.
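For readers unfamiliar with the two evaluation measures, the following sketch shows one common way to compute them from predicted purchase scores. It is an illustration under stated assumptions, not the evaluation code behind the reported results: AUC-PR is approximated by average precision (a step-wise area under the precision-recall curve), lift is computed for the top-scored fraction of sessions, ties between scores are not treated specially, and the labels and scores are made up.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank sessions from highest to lowest predicted purchase score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, area = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            area += hits / rank  # precision at this recall step
    return area / total_pos


def lift_at(y_true, scores, fraction=0.1):
    """Purchase rate in the top-scored fraction divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Made-up example: 10 sessions, 2 of which ended in a purchase.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(y_true, scores)
lift_top20 = lift_at(y_true, scores, fraction=0.2)
```

A lift above 1 means that the top-scored sessions contain purchasers at a higher rate than the overall population. Precision-recall based measures are often preferred over ROC-based ones when purchases are rare.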

Conclusion

Using real-life clickstream datasets of two different shops, we observe for both the linear GLM model and the non-linear Random Forest model that distance- and centrality-based graph metrics are effective in predicting purchase behavior of users. We derived user-centered, session-based graphs from clickstream data, where each graph is developed incrementally, i.e., each new page view of the user develops the graph further. Each of the 23 tested graph metrics is calculated for each intermediate

References (63)

  • W.W. Moe et al. Capturing evolving visit behavior in clickstream data. Journal of Interactive Marketing (2004)

  • S. Park et al. Sequence-based clustering for web usage mining: A new experimental framework and ANN-enhanced K-means algorithm. Data & Knowledge Engineering (2008)

  • B. Shams et al. Graph-based collaborative ranking. Expert Systems with Applications (2017)

  • E. Suh et al. A prediction model for the purchase probability of anonymous customers to support real time web marketing: A case study. Expert Systems with Applications (2004)

  • D. Van den Poel et al. Predicting online-purchasing behaviour. European Journal of Operational Research (2005)

  • A. Alin. Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics (2010)

  • A. Anitha. A new web usage mining approach for next page access prediction. International Journal of Computer Applications (2010)

  • A. Banerjee et al. Clickstream clustering using weighted longest common subsequences

  • P. Berka et al. Predicting page occurrence in a click-stream data: Statistical and rule-based approach. Advances in data mining. Theoretical aspects and applications (2007)

  • L. Breiman. Random forests. Machine Learning (2001)

  • R.E. Bucklin et al. Choice and the Internet: From Clickstream to Research Stream. Marketing Letters (2002)

  • H. Byeon. Evaluating the online buying behavior using network analysis. International Journal of Advancements in Computing Technology (2013)

  • T. Chan et al. Predictive models for determining if and when to display online lead forms

  • N.V. Chawla et al. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research (2002)

  • A.F. Colladon et al. Using social network analysis to prevent money laundering. Expert Systems with Applications (2017)

  • P. Girija et al. An approach for predicting user's web access pattern. International Journal of Computer Science and Management Research (2013)

  • Ş. Gündüz et al. A web page prediction model based on click-stream tree representation of user behavior

  • A.A. Hagberg et al. Exploring network structure, dynamics, and function using NetworkX

  • J.F. Hair et al. (1998)

  • T. Hastie et al. The elements of statistical learning (2001)

  • Q. Jiang et al. Cross-website navigation behavior and purchase commitment: A pluralistic field research
