1 Introduction

Electronic commerce is one of the youngest branches of computer science and, arguably, one of the fastest developing. Revenue in the e-commerce market is growing rapidly. Part of this success can be attributed to the multidisciplinary nature of the field, which connects well-known branches of computer science, such as operations research, combinatorial optimization, and algorithms, with other disciplines, e.g., logistics and marketing. E-commerce is an industry which focuses on selling and buying products and services through web pages (see Hagel III 1999; Timmers 1998).

Online (Internet) shopping, fitting into the business-to-consumer (B2C) subcategory, is one of the key business activities offered over the Internet. It has become increasingly popular over the past decade. Products available in online stores are often cheaper than the ones offered by regular local retailers, and a wide choice of offers is available just a click away from the customer. Crucial aspects of online shopping are the time spent comparing offers and the convenience of shopping regardless of shop location. Price comparison sites (Google Shopping, Shopping.com, PriceGrabber, and Kelkoo are among the most popular shopping engines in the EU) are search tools designed to provide price information from many retailers through a single portal. However, current price ranking solutions target only the purchase of a single product. A significant percentage of price comparison websites perform suboptimally in one of their major functions: presenting prices when multiple products are to be bought. For a specific decision aid system tackling Internet shopping with self pick-up, see Wojciechowski and Musial (2009).

The Internet shopping optimization problem (ISOP) (see Blazewicz et al. 2010) considers the situation where a single buyer is looking for a multiset of products \(N=\{1,\ldots ,n\}\) to buy in \(m\) shops, \(M=\{1,\ldots ,m\}\). Each shop \(i\in M\) is associated with a multiset of available products \(N_i\), a cost \(p_{ij}\) of each product \(j\in N_i\), and a delivery cost \(d_i\) of any subset of the products from the shop to the buyer. It is assumed that \(p_{ij}=\infty \) if \(j\not \in N_i\). The problem lies in finding a sequence of disjoint selections (or carts) of products \(S=(S_1,\ldots ,S_m)\), which we call a cart sequence, such that \(S_i\subseteq N_i\), \(\forall i \in M\), \(\cup _{i=1}^m S_i=N\), and the total product and delivery cost, denoted as \(F(S):=\sum _{i=1}^m\Big (\delta (|S_i|)d_i+\sum _{j\in S_i}p_{ij}\Big )\), is minimized. Here, \(|S_i|\) denotes the cardinality of the multiset \(S_i\), and \(\delta (x)=0\) if \(x=0\) and \(\delta (x)=1\) if \(x>0\). We denote this problem as ISOP, its optimal solution as \(S^*\), and its optimal solution value as \(F^*\).

See Wojciechowski and Musial (2010) for the definition of the first heuristic algorithm. Moreover, Musial (2012) gives a detailed description of a computational experiment for the ISOP and comments on its results. The most interesting (and most complicated) specialization of the ISOP is the so-called Internet shopping with price sensitive discounts problem presented by Blazewicz et al. (2014). For each Internet shop, standard prices for products are known, as well as an increasing discounting function of total standard and delivery price. The problem is to buy all the required products at the minimum total discounted price. Its mathematical program can be written as follows:

$$\begin{aligned} \min&\sum _{i=1}^m f_i\left( \sum _{j\in N_i} p_{ij}x_{ij}\right) +\sum _{i=1}^m d_iy_i, \\ s.t.&\sum _{i\in M} x_{ij}=1,\ j=1,\ldots ,n, \nonumber \\&0\le x_{ij}\le y_i,\ i=1,\ldots ,m,\ j=1,\ldots ,n, \nonumber \\&x_{ij}\in \{0,1\},\ y_i\in \{0,1\},\ i=1,\ldots ,m,\ j=1,\ldots ,n.\nonumber \end{aligned}$$
(1)

Note that \(f_i(T)\) is the discounting function of shop \(i\): it gives the discounted price to be paid when the total standard price of the products bought in shop \(i\) is \(T\). It is defined at all points \(T>0\), and \(f_i(0)=0\).
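As an illustration of program (1), the following minimal sketch evaluates its objective for a given assignment of products to shops. The names `objective_1`, `assign`, `prices`, `delivery`, and `discount` are hypothetical and introduced only for this example; an unavailable product can be modeled with `math.inf`, following the convention \(p_{ij}=\infty \).

```python
import math
from typing import Callable, Sequence


def objective_1(assign: Sequence[int],
                prices: Sequence[Sequence[float]],
                delivery: Sequence[float],
                discount: Sequence[Callable[[float], float]]) -> float:
    """Objective of program (1) for one feasible assignment.

    assign[j] is the shop chosen for product j, i.e. x_ij = 1 iff
    assign[j] == i, so every product is bought exactly once.
    """
    m = len(prices)
    totals = [0.0] * m       # T_i: total standard price bought in shop i
    used = [False] * m       # y_i: shop i is used at all
    for j, i in enumerate(assign):
        totals[i] += prices[i][j]
        used[i] = True
    cost = 0.0
    for i in range(m):
        if used[i]:
            # discounted product total plus the delivery fee d_i * y_i
            cost += discount[i](totals[i]) + delivery[i]
    return cost


if __name__ == "__main__":
    prices = [[10.0, 20.0], [12.0, 15.0]]
    delivery = [5.0, 3.0]
    no_discount = [lambda t: t, lambda t: t]
    print(objective_1([1, 1], prices, delivery, no_discount))
    print(math.isinf(objective_1([0, 0], [[math.inf, 1.0], [1.0, 1.0]],
                                 delivery, no_discount)))
```

With identity discounting functions, the sketch reduces to the cost \(F(S)\) of the basic ISOP.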

It is worth noticing that there are some similarities to the well-known facility location problem (FLP) (see Revelle et al. 2008). An FLP is characterized by a space with a metric, by given customer locations, and by positions for facility locations that may or may not be given. A traditional FLP is to open a number of facilities at arbitrary positions of the space (continuous problem) or at a subset of given positions (discrete problem) and to assign customers to the opened facilities so that the sum of opening costs and costs related to the distances between customer locations and their corresponding facility locations is minimized. Discussions of FLPs can be found in Krarup et al. (2002), Eiselt and Sandblom (2004), Revelle et al. (2008), Melo et al. (2009), and Iyigun and Ben-Israel (2010). The traditional discrete FLP is NP-hard in the strong sense. Note, however, that the general problem (basic ISOP with price sensitive discounts) cannot be treated as a traditional discrete FLP because there is no evident motivation for a discount on the cumulative cost in the sense of distance. This problem and the basic ISOP are not subcases of one another, while the traditional discrete FLP is a special case of either of these problems.

At this point it is also worth mentioning the relation between the ISOP and the scheduling area (see Blazewicz et al. 2007 or Leung et al. 2004 for a general treatment of this topic).

Let us consider a scheduling problem with a set \(J\!=\!\{J_1,J_2,\ldots ,J_n\}\) of jobs which need to be scheduled on a set \(M\!=\!\{M_1,M_2,\ldots ,M_m\}\) of nonidentical, parallel machines. Additional parameters that describe the problem are:

  • \(p_{ij}\): processing time of job \(J_i\) on machine \(M_j\)

  • \(t_j\): set-up (warm-up) time for machine \(M_j\)

The goal is to find a non-preemptive schedule that minimizes the total time (including set-ups) during which the machines are used to process the jobs. The equivalence between the ISOP and the above scheduling problem can be easily seen if we assume that machines represent shops, jobs are equivalent to products that have to be purchased, processing times are treated as prices, and the machine set-up times are equivalent to the delivery costs.

As a natural extension of the ISOP, a new optimization problem is introduced in this paper: the Internet shopping optimization problem with dual discounting functions. Price sensitive shipping costs are often used in Internet shops to attract customers and to encourage them to buy more products.

After the mathematical programming formulation of this problem, algorithms to solve it are proposed. Extensive tests demonstrate their good computational features.

The organization of the paper is as follows: In Sect. 2 the new problem (dual discounting functions for ISOP) is described and formulated as a mathematical programming problem. Section 3 provides a detailed description of (both exact and heuristic) algorithms used to solve the problem. Section 4 contains theoretical analysis of novel algorithm G3. The following Sect. 5 provides information on a model that was used during computational tests. Section 6 defines the experimental environment, which is followed by presentation of the experimental results in Sect. 7. We conclude the paper in Sect. 8.

2 Dual discounting functions for the Internet shopping optimization problem

A typical example of price sensitive shipping costs is the following advertisement in an Internet shop selling books and CDs: If the value of your purchase is at least 25, then we will ship your products for half the shipping cost; at least 35, half the price of delivery by a courier; at least 40, free delivery by post; at least 50, free courier shipping. Values and thresholds can vary depending on the seller. Many Internet shops offer free delivery if the price of a purchase exceeds a certain threshold. Price sensitive discounts are also often used in Internet shops to attract customers. A typical example is the following advertisement in an Internet shop selling computer parts: If the value of your purchase is more than 50 then your discount is 3 %, more than 100 then 7 %, more than 150 then 10 %, more than 250 then 15 %.

One can describe the ISOP with two price sensitive functions as follows: We would like to buy a number of products (\(n\)). Products are offered by many stores (\(m\)) in different locations; some are distant and some are close to our position. The price of each product \(j\) in every store \(i\) can be written as \(p_{ij}\), where \(i=1,\ldots ,m,\ j=1,\ldots ,n\). To every store \(i\) corresponds a discounting function \(f_i(T_i)\), based on the total cost \(T_i\) of the shopping in that store. Our goal is to find out how to pay the least for all the products, including delivery costs. The mathematical programming formulation can be written as follows:

$$\begin{aligned} \min&\sum _{i=1}^m f_i\left( \sum _{j\in N_i} p_{ij}x_{ij}\right) +\sum _{i=1}^m d_i\left( \sum _{j\in N_i} p_{ij}x_{ij}\right) , \\ s.t.&\sum _{i\in M} x_{ij}=1,\ j=1,\ldots ,n, \nonumber \\&x_{ij}\in \{0,1\}, \ i=1,\ldots ,m,\ j=1,\ldots ,n. \nonumber \end{aligned}$$
(2)

Both discounting (\(f\)) and shipping cost (\(d\)) functions can be aggregated to present one common function (\(fd\)) based on both criteria. The formulation can be written as follows:

$$\begin{aligned} \min&\sum _{i=1}^m (fd)_i\left( \sum _{j\in N_i} p_{ij}x_{ij}\right) , \\ s.t.&\sum _{i\in M} x_{ij}=1,\ j=1,\ldots ,n, \nonumber \\&x_{ij}\in \{0,1\}, \ i=1,\ldots ,m,\ j=1,\ldots ,n. \nonumber \end{aligned}$$
(3)

The discounting function could look as follows (an example):

$$\begin{aligned} f_i(T_i) = \begin{cases} T_i &amp; \text {if } 0<T_i\le 50,\\ 50+0.97(T_i-50) &amp; \text {if } 50<T_i\le 100,\\ 50+0.97\cdot 50+0.93(T_i-100) &amp; \text {if } 100<T_i\le 150,\\ 50+0.97\cdot 50+0.93\cdot 50+0.9(T_i-150) &amp; \text {if } 150<T_i\le 250,\\ 50+0.97\cdot 50+0.93\cdot 50+0.9\cdot 100+0.85(T_i-250) &amp; \text {if } T_i>250. \end{cases} \end{aligned}$$
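The example discounting function can be sketched in code as follows. This is only an illustration: the name `tiered_discount` and the tier table layout are assumptions of this sketch. As in the formula, each multiplier is applied only to the part of the total that falls inside its tier.

```python
def tiered_discount(total: float) -> float:
    """Example discounting function f_i(T_i): price paid for a standard total."""
    # (upper bound of the tier, multiplier for the part of the total inside it)
    tiers = [
        (50.0, 1.00),
        (100.0, 0.97),
        (150.0, 0.93),
        (250.0, 0.90),
        (float("inf"), 0.85),
    ]
    paid = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if total <= lower:
            break  # the total does not reach this tier
        paid += rate * (min(total, upper) - lower)
        lower = upper
    return paid


if __name__ == "__main__":
    for t in (0, 50, 100, 150, 250, 300):
        print(t, tiered_discount(t))
```

For instance, a standard total of 300 is charged as \(50+48.5+46.5+90+42.5=277.5\).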

Note that \(T_i\) is the total standard price of the products selected in shop \(i\). The shipping cost function has a more complicated shape, because it additionally depends on a customer’s later choices (delivery type, e.g., by post office or courier, and payment method). We can define it as \(d_i(T_i,dv_1,dv_2), i=1,\ldots ,m\). For the present discussion, we assume that a customer always makes the same decisions \(dv_1,\,dv_2\). Shipping cost function examples are presented in Table 1.

Table 1 Example price structure for delivery
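A price sensitive shipping cost function and its aggregation with a discounting function, as in formulation (3), can be sketched as follows. The fee of 8 and the thresholds 25 and 40 in the usage example are hypothetical values loosely modeled on the postal delivery advertisement quoted earlier; they are not the entries of Table 1. The names `stepped_delivery` and `aggregate` are likewise assumptions of this sketch, and the customer decisions \(dv_1,\,dv_2\) are taken as fixed.

```python
from typing import Callable, Sequence


def stepped_delivery(base_fee: float,
                     thresholds: Sequence[float]) -> Callable[[float], float]:
    """Return d(T): the base fee reduced by 1/k for each of k thresholds reached."""
    k = len(thresholds)

    def d(total: float) -> float:
        if total <= 0:
            return 0.0  # nothing bought in this shop, nothing shipped
        if k == 0:
            return base_fee  # flat rate, no price sensitive reduction
        reached = sum(1 for t in thresholds if total >= t)
        return base_fee * (k - reached) / k

    return d


def aggregate(f: Callable[[float], float],
              d: Callable[[float], float]) -> Callable[[float], float]:
    """(fd)(T) = f(T) + d(T): the single function used in formulation (3)."""
    return lambda total: f(total) + d(total)


if __name__ == "__main__":
    d = stepped_delivery(8.0, [25.0, 40.0])   # hypothetical fee and thresholds
    fd = aggregate(lambda t: t, d)            # identity discount for brevity
    for t in (10.0, 25.0, 40.0):
        print(t, d(t), fd(t))
```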

To solve this problem one can use some of the already existing heuristics. However, before finishing computational experiments, it cannot be said that these will be as efficient as for the ISOP with price sensitive discounts. Moreover, we can presume that new specific algorithms for this interesting case should be developed.

3 Algorithm design

Finding an exact solution to this problem takes exponential time. Nevertheless, to be able to compare the results of the heuristics with an optimal solution, a branch-and-bound algorithm is implemented, with some improvements that skip solutions which cannot be optimal.
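To make the exponential cost concrete, the sketch below computes an optimal solution by plain exhaustive enumeration of all \(m^n\) assignments. It is not the branch-and-bound algorithm used in the experiments, only a simple reference for very small instances. The name `exhaustive_optimum` is hypothetical, and `fd[i]` is assumed to be the aggregated function \((fd)_i\) of formulation (3), i.e., discounted price plus shipping for shop \(i\).

```python
import itertools
import math


def exhaustive_optimum(prices, fd):
    """Try every assignment of n products to m shops (m**n candidates)."""
    m, n = len(prices), len(prices[0])
    best_cost, best_assign = math.inf, None
    for assign in itertools.product(range(m), repeat=n):
        totals = [0.0] * m
        for j, i in enumerate(assign):
            totals[i] += prices[i][j]
        # only shops that actually receive a product contribute to the cost
        cost = sum(fd[i](totals[i]) for i in set(assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


if __name__ == "__main__":
    prices = [[10.0, 20.0], [12.0, 15.0]]
    fd = [lambda t: t + 5.0, lambda t: t + 3.0]   # flat delivery, no discount
    print(exhaustive_optimum(prices, fd))
```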

Furthermore, it is interesting to compare the new heuristic with existing algorithms that are already in use in price comparison sites available on the Internet; let us call them PCS and its improved version PCS+, respectively.

3.1 Algorithms PCS and PCS+

Current price comparison site platforms could be used to solve the ISOP. For testing purposes, we coded both algorithms (PCS and PCS+). The algorithm can be attributed to Andersen Consulting (now Accenture) and its product BargainFinder, including the SmartStore feature, described by Krulwich (1996); it was the first widely known tool to compare the price of a music CD across different online shops. PCS is the simpler version of the algorithm. For each product \(j\) it selects its eligible shop \(i\in M_j\) with a minimum value \(p_{ij}\), where \(M_j\) denotes the set of shops offering product \(j\). At the end, it calculates discounts and adds shipping costs. PCS+ is an upgraded version of PCS that can be used when all delivery costs are known (basic or flat rates). For each product \(j\) it selects its eligible shop \(i\in M_j\) with a minimum value \(p_{ij}+d_i\), where \(d_i\) is the basic price of a delivery from shop \(i\). At the end, it calculates discounts and adds shipping costs. Patents by Bastnagel et al. (2003) and Christensen (2010) use this functionality.
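The two selection rules can be sketched as follows. The function names and the data layout (`prices[i][j]`, with `math.inf` for an unavailable product) are assumptions of this sketch; `fd[i]` is assumed to return the discounted price plus shipping for shop \(i\), which is applied only once all products have been assigned.

```python
import math


def pcs_assign(prices, delivery=None):
    """PCS: cheapest shop per product by p_ij.

    If `delivery` (basic fees d_i) is given, the rule becomes PCS+:
    cheapest shop per product by p_ij + d_i.
    """
    m, n = len(prices), len(prices[0])
    assign = []
    for j in range(n):
        def key(i):
            fee = delivery[i] if delivery is not None else 0.0
            return prices[i][j] + fee
        assign.append(min(range(m), key=key))
    return assign


def total_cost(assign, prices, fd):
    """Discounts and shipping are computed per shop at the very end."""
    totals = {}
    for j, i in enumerate(assign):
        totals[i] = totals.get(i, 0.0) + prices[i][j]
    return sum(fd[i](t) for i, t in totals.items())


if __name__ == "__main__":
    prices = [[10.0, 20.0], [12.0, 15.0]]
    delivery = [5.0, 2.0]
    fd = [lambda t: t + 5.0, lambda t: t + 2.0]
    for name, a in (("PCS", pcs_assign(prices)),
                    ("PCS+", pcs_assign(prices, delivery))):
        print(name, a, total_cost(a, prices, fd))
    print(math.inf > 0)  # unavailable products are never the minimum
```

In this small instance PCS splits the order between both shops and pays two delivery fees, whereas PCS+ buys everything in the second shop.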

3.2 Algorithm G1: GREEDY (see Wojciechowski and Musial 2010)

In the first heuristic algorithm, denoted as G1, products are considered in a certain order. The algorithm is run for various product orders, and the best solution found is presented to the customer. In the following example, it is assumed that products are ordered \(1,\ldots ,n\). Values of total delivery and standard price for all shops are initially set as \(T_i=d_i\), \(i=1,\ldots ,m\). In iteration \(j\) of algorithm G1, product \(j\) is selected in its eligible shop \(i\in M_j\) with minimum value \(fd_i(T_i+p_{ij})\), and the corresponding \(T_i\)-value is re-set: \(T_i:=T_i+p_{ij}\).

The discounting function returns the value of product \(j\) after applying the discount for shop \(i\) (cf. Algorithm 1).

Algorithm 1: G1 (Greedy)
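A minimal sketch of the greedy idea is given below. It rests on one interpretation that should be stated clearly: the selection value of product \(j\) in shop \(i\) is read as the extra amount paid when \(j\) is added to what has already been selected in shop \(i\), with the delivery cost entering through the aggregated function \((fd)_i\). The names `g1_greedy`, `fd`, and `order` are hypothetical.

```python
def g1_greedy(prices, fd, order=None):
    """Greedy sketch: each product goes to the shop with the smallest extra cost."""
    m, n = len(prices), len(prices[0])
    order = list(range(n)) if order is None else list(order)
    totals = [0.0] * m   # standard-price total T_i selected in shop i
    paid = [0.0] * m     # current cost of shop i, 0 while the shop is unused
    assign = [None] * n
    for j in order:
        # extra cost of adding product j to the current selection of shop i
        best = min(range(m),
                   key=lambda i: fd[i](totals[i] + prices[i][j]) - paid[i])
        totals[best] += prices[best][j]
        paid[best] = fd[best](totals[best])
        assign[j] = best
    return sum(paid), assign


if __name__ == "__main__":
    # Hypothetical instance: shop 0 has no delivery fee but expensive
    # products, shop 1 has free products but a delivery fee of 10.
    prices = [[9.0, 9.0, 9.0], [0.0, 0.0, 0.0]]
    fd = [lambda t: t, lambda t: t + 10.0]
    print(g1_greedy(prices, fd))
```

Different product orders can be tried through the `order` argument, keeping the cheapest result. In the hypothetical instance above, the greedy rule never pays the delivery fee of the second shop and ends with a cost of 27, although buying everything there costs 10.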

We observed that algorithm G1 demonstrates very good performance on the experimental data. However, it can provide a solution whose value is \(n\) times worse than the optimum. Consider the product and delivery prices in Table 2. For any product sequence, algorithm G1 selects all the products in shop 1 which cost \(nW-n\), while an optimal solution is to select all the products in shop 2 with cost \(W\). Initial observations on algorithm G1 can be found in Wojciechowski and Musial (2010). It is also worth noticing that \(G1\) is a very fast algorithm: its complexity is \(O(nm)\), which makes it suitable for quickly providing an approximation.

Table 2 Price structure for poor performance of algorithm Greedy

3.3 Algorithm G2: Forecasting (see Blazewicz and Musial 2011)

Since the solution found by \(G1\) can be \(n\) times worse than the optimal solution (following Blazewicz et al. 2010), \(G2\) (Algorithm 2) has been designed to counterbalance this problem.

\(G2\), unlike \(G1\), starts by choosing a shop in which items should be bought; the one chosen has the lowest average price of items including delivery fees. For each shop \(i\) the valuation \(V_i = (d_i + \sum _{j \in N_i} p_{ij}) / |N_i|\) is calculated, then from the shop \(k\) which has the lowest valuation \(V_k\), the set \(S\) of the \(\alpha \%\) cheapest available items is taken. Then those items are removed from the shops \( \forall i \in M, N_i := N_i \backslash S\), and \(d_k\) is updated to the value \(d_k = 0\), since the delivery fee for shop \(k\) has already been paid.

Algorithm 2: G2 (Forecasting)
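The steps described for \(G2\) can be sketched as follows. Several details are assumptions of this sketch rather than part of the description: \(\alpha \) is passed as a fraction in \((0,1]\), the number of items taken is rounded up so that at least one item is selected per round, and an unavailable product is modeled with `math.inf`. The function name `g2_forecasting` is hypothetical.

```python
import math


def g2_forecasting(prices, delivery, alpha=0.5):
    """Repeatedly pick the shop with the lowest average price (delivery
    included) and take its alpha fraction of cheapest remaining items."""
    m, n = len(prices), len(prices[0])
    remaining = set(range(n))
    fee = list(delivery)          # d_i, set to 0 once shop i has been used
    assign = [None] * n

    def valuation(i):
        avail = [j for j in remaining if math.isfinite(prices[i][j])]
        if not avail:
            return math.inf       # shop i sells none of the remaining items
        return (fee[i] + sum(prices[i][j] for j in avail)) / len(avail)

    while remaining:
        k = min(range(m), key=valuation)
        avail = sorted((j for j in remaining if math.isfinite(prices[k][j])),
                       key=lambda j: prices[k][j])
        if not avail:
            raise ValueError("some product is not sold by any shop")
        take = max(1, math.ceil(alpha * len(avail)))
        for j in avail[:take]:
            assign[j] = k
            remaining.discard(j)  # removes the item from every shop's list
        fee[k] = 0.0              # delivery for shop k is already paid
    return assign


if __name__ == "__main__":
    prices = [[10.0, 20.0, 50.0], [12.0, 15.0, 30.0]]
    print(g2_forecasting(prices, [5.0, 2.0], alpha=1.0))
```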

The complexity of \(G2\) is \(O(n^2m)\), which is higher than that of \(G1\) by a factor of \(n\). Nonetheless, it is still linear in the number of shops, the parameter that should be biggest for an average consumer.

These two heuristics provide good results. However, to get the best out of them, each must be run with several parameter settings (product orders for \(G1\), values of \(\alpha \) for \(G2\)), and the best solution obtained is then kept. Motivated by this dependence on parameters, a new algorithm has been created.

3.4 A new algorithm: G3

The idea behind the proposed new heuristic, denoted as \(G3\) (Algorithm 3), is to form groups of products that maximize the ratio \(V_i\) between how much is saved and how much the products cost at their maximum over all the shops. To do so, for every shop \(i\), the item with the biggest ratio is put aside into a temporary basket \(C_i\). If adding another product increases this ratio, that product is also added to \(C_i\), and this process continues until there are no more items available in the shop or the ratio cannot be increased any more. Then the shop \(k\) with the highest ratio \(V_k\) is chosen, and all the items in \(C_k\) are placed in \(B_k\) to be bought from this shop. This step is repeated until all the products on the shopping list have been bought.

The expression for \(V_i\) takes into account the basket \(B_i\) containing everything already chosen for this shop as well as the temporary basket \(C_i\):

$$\begin{aligned} V_i(C_i,B_i) = 1 - \frac{f_i(T_i(B_i \cup C_i))-f_i(T_i(B_i))}{p_\mathrm{max}(C_i)} \end{aligned}$$
Algorithm 3: G3
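The basket construction can be sketched as follows. This is an illustrative reading, not the reference implementation, and it relies on assumptions that are not fixed by the text: \(p_\mathrm{max}(C_i)\) is taken as the sum, over the items of \(C_i\), of each item's highest price among all shops; \(f_i\) is taken as the aggregated function \((fd)_i\), with cost 0 for an unused shop; and all prices are assumed nonnegative with a positive maximum per product. The name `g3` is hypothetical.

```python
import math


def g3(prices, fd):
    """Sketch of G3: grow one temporary basket per shop, commit the best one."""
    m, n = len(prices), len(prices[0])
    # assumed p_max(x): highest finite price of product x over all shops
    pmax = [max(prices[i][j] for i in range(m) if math.isfinite(prices[i][j]))
            for j in range(n)]
    base = [0.0] * m   # T_i(B_i): total already committed to shop i
    paid = [0.0] * m   # cost of B_i, 0 while shop i is unused
    assign = [None] * n
    remaining = set(range(n))

    def ratio(i, cart):
        extra = fd[i](base[i] + sum(prices[i][j] for j in cart)) - paid[i]
        return 1.0 - extra / sum(pmax[j] for j in cart)

    while remaining:
        best_v, best_shop, best_cart = -math.inf, None, None
        for i in range(m):
            avail = {j for j in remaining if math.isfinite(prices[i][j])}
            cart, v = [], -math.inf
            while avail:
                j = max(avail, key=lambda x: ratio(i, cart + [x]))
                new_v = ratio(i, cart + [j])
                if new_v <= v:
                    break              # the ratio cannot be increased any more
                cart.append(j)
                avail.remove(j)
                v = new_v
            if cart and v > best_v:
                best_v, best_shop, best_cart = v, i, cart
        for j in best_cart:            # commit C_k to B_k
            assign[j] = best_shop
            remaining.discard(j)
        base[best_shop] += sum(prices[best_shop][j] for j in best_cart)
        paid[best_shop] = fd[best_shop](base[best_shop])
    return sum(paid), assign


if __name__ == "__main__":
    prices = [[10.0, 20.0], [12.0, 15.0]]
    fd = [lambda t: t + 5.0, lambda t: t + 3.0]
    print(g3(prices, fd))
```

Because whole baskets are evaluated, the sketch can accept a delivery fee that only pays off once several items are bought together, which a product-by-product rule may miss.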

\(G3\) has complexity \(O(n^3m)\), which is more than \(G1\) (Algorithm 1) and \(G2\) (Algorithm 2), but \(G3\) does not need any extra tuning and does not depend on the order of items.

\(G3\) has some interesting properties such as an upper bound and optimality in some particular cases. These features are provided in Sect. 4.

4 Theoretical analysis of Algorithm \(G3\)

In this section, some interesting properties of Algorithm \(G3\) are analyzed.

4.1 An upper bound for the approximation

Theorem 1

For an instance of the Internet Shopping Problem with \(n\) products and \(m\) shops, with an optimal solution of cost \(c^*\), and \(\forall i \in M,\, \forall x \in [0,\sum _{i \in M} \max _{j \in N_i} p_{ij}]\), \(f_i(x) \ge \beta x\), the cost of the solution suggested by \(G3\) (Algorithm 3) \(c_{G3}\) is such that

$$\begin{aligned} c_{G3} \le (n-1)\max _{i\in M}d_i + \frac{c^*}{\beta } \end{aligned}$$

Before detailing the proof, we informally describe the structure of the worst case. Since each shopping basket is constructed iteratively, there can be instances where buying two objects does not improve the ratio \(V_i\). In these instances, the algorithm may pick at each step a single object in a new shop where this object has a lower price than everywhere else (including delivery cost).

The optimal strategy might be to buy everything from a single shop where the value for a basket of any single item is slightly worse than where the algorithm picked it, but where the discounting function is such that there is a big discount when everything is bought in this shop. This leads to the bound, as the left term is the expression of the fact that this algorithm could use many different shops, while the right term shows that the optimal solution could use a very good discount which could be completely missed by the algorithm.

Proof

Let \(S^*\) be an optimal solution.

Let us consider the \(k\)th iteration of \(G3\); for each previous iteration \(l\), the products \(C^{(l)}\) were chosen from shop \(i^{(l)}\), so all that remains is to find a shop for the products \(N^{(k)} = N\backslash \bigcup _{1\le l<k}C^{(l)}\).

Let \(B^{(<k)}_i = \bigcup _{q<k|i^{(q)}=i}C^{(q)}\) be the basket of items selected in shop \(i\) before the \(k\)th iteration.

\(C_i^{(k)}\) is the set of items selected from shop \(i\) at the \(k\)th iteration; \(C^{(k)}\) is the one chosen at the end of this iteration in shop \(i_{max(k)}\). To simplify the notations while the work is done on the \(k\)th iteration, the \((k)\) is omitted:

$$\begin{aligned}&C_i^{(k)} = C_i \\&B_i^{(<k)} = B_i \\&i_{max(k)} = i_\mathrm{max}\\&\forall i \in M, V_{i_\mathrm{max}}(C_{i_\mathrm{max}},B_{i_\mathrm{max}}) \ge V_i(C_i,B_i). \end{aligned}$$

Given \(i \in M\), let \(x_1\) be the first element taken to build \(C_i\):

  • \(V_{i_\mathrm{max}}(C_{i_\mathrm{max}},B_{i_\mathrm{max}}) \ge V_i(\{x_1\},B_{i})\)

  • \(\forall x \in N_i \cap N^{(k)}, V_i(\{x_1\},B_{i}) \ge V_i(\{x\},B_{i}) \).

Let \(i^*(x)\) be the shop in which \(x\) is bought for the optimal solution, i.e., \(x \in S_{i^*(x)}\).

So,

$$\begin{aligned}&\!\!\!\forall x \in C_{i_\mathrm{max}}, V_{i_\mathrm{max}}(C_{i_\mathrm{max}},B_{i_\mathrm{max}})\nonumber \\&\quad \ge V_{i^*(x)}(\{x\},B_{i^*(x)}) \end{aligned}$$
(4)

and

$$\begin{aligned}&\!\!\!V_{i_\mathrm{max}}(C_{i_\mathrm{max}},B_{i_\mathrm{max}})\nonumber \\&\quad \ge \frac{\sum _{x \in C_{i_\mathrm{max}}} p_\mathrm{max}(x) V_{i^*(x)}(\{x\},B_{i^*(x)})}{\sum _{x \in C_{i_\mathrm{max}}} p_\mathrm{max}(x)}, \end{aligned}$$
(5)

i.e.,

$$\begin{aligned}&\!\!\!f_{i_\mathrm{max}}\left( C_{i_\mathrm{max}} \cup B_{i_\mathrm{max}}\right) - f_{i_\mathrm{max}}\left( B_{i_\mathrm{max}}\right) \nonumber \\&\quad \le \sum _{x \in C_{i_\mathrm{max}}} f_{i^*(x)}(x). \end{aligned}$$
(6)

Let \(q\) be the total number of iterations of \(G3\) on this instance,

$$\begin{aligned}&\!\!\!\sum _{k = 1}^q f_{i_{\mathrm{max}(k)}} \left( C^{(k)}_{i_{\mathrm{max}(k)}} \cup B^{(<k)}_{i_{\mathrm{max}(k)}} \right) - f_{i_{\mathrm{max}(k)}}\left( B^{(<k)}_{i_{\mathrm{max}(k)}}\right) \nonumber \\&\quad \le \sum _{k = 1}^q \sum _{x \in C_{i_{\mathrm{max}(k)}}^{(k)}} f_{i^*(x)}(x) \end{aligned}$$
(7)
$$\begin{aligned} c_{G3}&\le \sum _{k = 1}^q \sum _{x \in C^{(k)}} f_{i^*(x)}(x) \\ c_{G3}&\le \sum _{k = 1}^q \sum _{x \in C^{(k)}} d_{i^*(x)} + p_{i^*(x)x} \\ c_{G3}&\le \sum _{x \in N} d_{i^*(x)} + p_{i^*(x)x} \\ c_{G3}&\le (n-1) \max _{i \in M} d_i + \frac{1}{\beta }c^*. \end{aligned}$$

4.2 Optimality in a simplified case

In the case where all shops have all the products and offer them at the same price, the problem is solvable in polynomial time by a specific algorithm in \(O(n+m)\) (Blazewicz et al. 2014), and under those assumptions \(G3\) also provides an optimal solution.

Theorem 2

If \(\forall i \in M, N_i = N\) and \(\forall j \in N,\ \forall (i,i')\in M^2,\ p_{ij} = p_{i'j}\), then \(G3\) provides an optimal solution.

Proof

Given \(i \in M\), \(E \subset N\) and \(j \in N \backslash E\), set \(s = \sum _{k \in E} p_{ik}\); thus, using the fact that \(g_i\) is concave and nondecreasing and that \(h_i\) is decreasing, we have

$$\begin{aligned}&\!\!\!V_i(E) - V_i(E \cup \{ j \}) \\&\;\le \frac{g_i(T_i(E)+p_{ij})}{s+p_{ij}} - \frac{g_i(T_i(E))}{s}+ \frac{h_i(s+p_{ij})}{s+p_{ij}} - \frac{h_i(s)}{s}\nonumber \end{aligned}$$
(8)
$$\begin{aligned}&\!\!\!V_i(E) - V_i(E \cup \{ j \}) \\&\;\le \frac{g_i(T_i(E))+\frac{p_{ij}}{s}*g_i(s) - \frac{s+p_{ij}}{s} * g_i(T_i(E))}{s+p_{ij}},\nonumber \end{aligned}$$
(9)

i.e.,

$$\begin{aligned} V_i(E) - V_i(E \cup \{ j \}) \le 0. \end{aligned}$$

Therefore, for all subsets \(E\) of \(N\), for any element \(x \in N \backslash E\), \(V_i(E \cup \{ x\}) \ge V_i(E)\). Following the process of \(G3\), at the first iteration, all the products are selected in each shop to maximize \(V_i\): \(\forall i, C_i = N\).

Let \(i_\mathrm{max}\) be the value of \(i\) that maximizes \(V_i(C_i)\).

$$\begin{aligned} \forall i \in M, V_{i_\mathrm{max}}(C_{i_\mathrm{max}})&\ge V_i(C_{i_\mathrm{max}})\\ \forall i \in M, V_{i_\mathrm{max}}(N)&\ge V_i(N) \\ \forall i \in M, f_{i_\mathrm{max}}(N)&\le f_i(N) \end{aligned}$$

Yet, with such a constrained problem, it has been proven that all products are bought from only one shop (Blazewicz et al. 2014). Accordingly \(G3\) returns an optimal solution. \(\square \)

5 Benchmark suite with instances generator

A challenging step in the experimental research was to create a model that would be as close to real Internet shopping conditions as possible. We studied the relationship between competitive structure, advertising, price, and price dispersion over Internet stores. As a group of representative products for our computational experiment we chose books, because of their wide choice in virtual (Internet) stores and the frequency with which they are purchased through this shopping channel. We adopted some information and computational results from Pathak (2012), Ratchford et al. (2003), Pan et al. (2003), and Clay et al. (2001) for our model. The model focuses mainly on electronic bookstores (books, CDs with music, and DVDs with movies), price definition models, acceptance factor, retailer brand (Chu et al. 2005), and, importantly for the definition of the optimization problem model, price dispersion. One should also notice that consumers may choose from a large number of Internet bookstores. Yahoo.com lists more than 100 online bookstores. Data for our sample were collected from 32 stores and covered the largest US-based stores, including Amazon, BarnesandNoble.com, Borders.com, Buy.com, and Booksamillion, and top sellers among Internet bookstores in Poland such as empik.com and merlin.pl. We decided to review and upgrade the model presented by Blazewicz et al. (2010). Our goal was to create a new, more sophisticated (focused not only on books), and even more realistic model than the previous one.

The working model was prepared on the basis of data from the above-mentioned publications as well as our own observations of many Internet stores. We assume that all the tests should be done for a number of stores \(m\in \{20,30,40\}\), with a given number of products in the basket. It is assumed that each store has all the required books. In each instance, the following values are randomly generated for all \(i\) and \(j\) in the corresponding ranges. Reference price (ref) of a product \(j\): \(ref_j\in \{2,4,\ldots ,100\}\) with a percentage of occurrence: 40 % between 0 and 20, 16 % between 22 and 30, 12 % between 32 and 40, 16 % between 42 and 60, and 16 % between 62 and 100. The price of a product \(j\) in store \(i\): \(p_{ij}\in [a_{ij},b_{ij}]\), where \(a_{ij}\ge 0.75 ref_j\), \(b_{ij}\le 1.36 ref_j\), and the structure of intervals between \([a_{ij},b_{ij}]\) is as follows:

Every shop is associated with a delivery fee, taken arbitrarily between \(0\) and \(20\), and a discount function is generated based on the following observations:

  • On some websites, if more than 50 € worth of products is bought, half of the delivery cost is deducted from the total, and if the total cost is greater than 100 €, delivery is free. To generate similar situations, the delivery fee can be deducted in at most three steps: if the fee is deducted in \(k\) steps, then there are \(k\) thresholds, and one \(k\)th of the shipment fee is deducted each time a threshold is reached.

  • Moreover, websites with discounts such as 5 % deducted if more than 50 € is spent and then 10 % if the basket is worth more than 100 € are common, so the model takes into account a discount of up to 10 %, applied step by step.

These two strategies can be combined together, and all thresholds used to generate data are multiples of 25 to be as close as possible to the observations of real websites.
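A generator in the spirit of this model can be sketched as follows. It is a simplified illustration under explicit assumptions: prices are drawn uniformly from \([0.75\,ref_j, 1.36\,ref_j]\) instead of following the finer interval structure of the model, delivery fees are drawn uniformly from \([0,20]\), thresholds are drawn from the multiples of 25 up to 200, and the discount steps are 5 % and 10 %. All names (`generate_instance`, `BANDS`, the dictionary keys) are hypothetical.

```python
import random

# (range of even reference prices, probability of the range)
BANDS = [((2, 20), 0.40), ((22, 30), 0.16), ((32, 40), 0.12),
         ((42, 60), 0.16), ((62, 100), 0.16)]
THRESHOLDS = range(25, 201, 25)   # assumed pool: multiples of 25 up to 200


def reference_price(rng):
    lo, hi = rng.choices([band for band, _ in BANDS],
                         weights=[w for _, w in BANDS])[0]
    return rng.randrange(lo, hi + 1, 2)   # even values only


def generate_instance(n, m, seed=0):
    """Random instance with n products and m shops that all sell everything."""
    rng = random.Random(seed)
    ref = [reference_price(rng) for _ in range(n)]
    prices = [[rng.uniform(0.75 * r, 1.36 * r) for r in ref] for _ in range(m)]
    shops = []
    for _ in range(m):
        k = rng.randint(0, 3)             # delivery deducted in at most 3 steps
        s = rng.randint(0, 2)             # discount steps: none, 5 %, or 5 % then 10 %
        shops.append({
            "delivery_fee": rng.uniform(0.0, 20.0),
            "delivery_thresholds": sorted(rng.sample(THRESHOLDS, k)),
            "discount_steps": list(zip(sorted(rng.sample(THRESHOLDS, s)),
                                       [0.05, 0.10][:s])),
        })
    return {"ref": ref, "prices": prices, "shops": shops}


if __name__ == "__main__":
    inst = generate_instance(n=4, m=3, seed=42)
    print(inst["ref"])
    print(inst["shops"][0])
```

Fixing the seed makes an instance reproducible, which is convenient when several heuristics are compared on the same data.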

6 Experiments

As most price comparison websites only compare prices from \(25\) to \(30\) shops, \(m\) is taken in \(\{ 20,30,40 \}\); likewise, an average customer will only buy a few books at the same time, so \(n\) is taken in \(\{2,3,4,5,6,7\} \).

These choices also take into account the fact that the search for an optimal solution is exponential in time. To study how the algorithms react for bigger instances, for \(n\) from \(5\) to \(100\) in steps of \(5\), the heuristics \(G1\), \(G2\), \(PCS\), and \(PCS+\) are later compared with the results of \(G3\) instead of being compared with an optimal solution. For every experiment, \(400\) instances of the problem are used.
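The comparison over a batch of instances can be summarized as a mean relative difference with a confidence interval. The sketch below assumes that the difference is measured as a percentage of the reference cost and that the 95 % interval uses the normal approximation; the text does not state how the intervals were computed, so both choices, as well as the name `gap_summary`, are assumptions.

```python
import math
import statistics


def gap_summary(heuristic_costs, reference_costs):
    """Mean relative gap in percent and a normal-approximation 95 % interval."""
    gaps = [100.0 * (h - r) / r
            for h, r in zip(heuristic_costs, reference_costs)]
    mean = statistics.fmean(gaps)
    if len(gaps) > 1:
        half = 1.96 * statistics.stdev(gaps) / math.sqrt(len(gaps))
    else:
        half = 0.0   # a single instance gives no spread estimate
    return mean, (mean - half, mean + half)


if __name__ == "__main__":
    print(gap_summary([101.0, 102.0, 103.0], [100.0, 100.0, 100.0]))
```

The reference costs can be optimal values for small \(n\), or the costs returned by another heuristic when no optimal solution is available.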

7 Results

7.1 Sensitivity of \(G1\) to the order of items

\(G1\) was tested with two different strategies: taking items in order of nonincreasing reference price and in the opposite order. The descending price order provided better solutions on average than the ascending order (Fig. 1). Furthermore, G1 provides approximate solutions within 2.5 % of the corresponding optimal solution. This shows that a pure greedy algorithm can provide a good approximation in a short time: \(G1\) is \(O(nm)\), although sorting the products adds \(O(n \log n)\).

Fig. 1 Differences between G1 following different strategies and an optimal solution for \(m\) = 30 (with 95 % confidence intervals)

7.2 Sensitivity of \(G2\) to its parameter

To establish the sensitivity of \(G2\) to its parameter \(\alpha \), it was tested with four different values: 25, 50, 75, and 100 %.

The minimum value from the four tests was also taken into account.

The results suggest that \(G2\) needs some fine-tuning to be able to provide good approximations (Fig. 2): even the best solution out of the four parameters exhibited a difference ratio greater than 5 % for a low number of items (\(n\) = 4). The choice of the parameter \(\alpha \) has a great impact on the performance of \(G2\), so it has to be chosen carefully.

Fig. 2 Differences between G2 using different parameters and an optimal solution for \(m\) = 30 (with 95 % confidence intervals)

7.3 Results for \(G3\)

For every instance (\(n\), \(m\)) of the problem, PCS+ and \(G3\) were tested in regard to \(G1\) with the descending prices strategy and then compared with an optimal solution.

As \(G2\) needs fine-tuning, the best solution provided by \(G2\) using the parameters 25, 50, 75, and 100 % was taken.

PCS was omitted, because it provided solutions that were at least 15 % worse than the optimal solution.

\(G3\) provides solutions that are at most 0.59 % worse than the optimal solution; it gives the best solutions of all the tested heuristics (Figs. 3, 4, 5). \(G1\) also provides good results and is better than PCS+, but all but two of its results were on average at least 0.59 % worse than the optimal solution.

Fig. 3 Differences between G1, G2, G3, and PCS+ and an optimal solution for \(m\) = 20 (with 95 % confidence intervals)

Fig. 4 Differences between G1, G2, G3, and PCS+ and an optimal solution for \(m\) = 30 (with 95 % confidence intervals)

Fig. 5 Differences between G1, G2, G3, and PCS+ and an optimal solution for \(m\) = 40 (with 95 % confidence intervals)

The ratio for \(G3\) tends to increase as the number of items increases, but much more slowly than for PCS+ or \(G1\). Moreover, \(G2\) is highly dependent on the number of shops: for \(m\) = 20, the difference ratio between \(G2\) and an optimal solution does not exceed 7 % for \(n\) in \(\{2,3,4,5,6,7\}\), but for \(m\) = 40 and \(m\) = 30, it is exceeded for \(n\) = 6. \(G3\) provides good solutions close to the optimal one and better than those provided by previous algorithms.

7.4 Scalability of \(G3\)

To study the behavior of the algorithms on bigger instances, \(G1\), \(G2\), PCS, and PCS+ were run and compared with \(G3\) instead of being compared with an optimal solution. Without searching for an optimal solution, \(n\) could be much larger than in the previous tests so \(n\) was now taken in \(\{5,10, ..., 95,100\} \), while \(m\) does not change.

\(G3\) provided better results than other heuristics even with large numbers of products (Figs. 6, 7, 8): \(G1\) produced solutions at least 0.66 % bigger, while those from PCS+ were at least 2.5 % bigger.

Fig. 6 Differences between G1, G2, PCS, and PCS+ and the solution from G3 for \(m\) = 20

Fig. 7 Differences between G1, G2, PCS, and PCS+ and the solution from G3 for \(m\) = 30

Fig. 8 Differences between G1, G2, PCS, and PCS+ and the solution from G3 for \(m\) = 40

These gaps quickly increased, and for \(n\) bigger than 20, the difference between \(G1\) and \(G3\) was at least 1.3 %. The solutions provided by PCS became better as the impact of delivery fees on the total cost decreased when the number of items increased.

Besides, PCS provides even better results than PCS+ when \(n\) increases.

\(G3\) provides better solutions than the other algorithms even with larger numbers of items.

7.5 Summary of the comparison

\(G3\) provides better solutions than \(G1\), \(G2\), and the algorithms commonly used on price comparison websites, and it also deals better with situations where not all items are available on every website, making it more versatile. Nonetheless, \(G3\) has a higher complexity (\(O(m n^3)\)) than \(G1\) (\(O(nm)\)).

ISOP is a newly formulated problem in the electronic commerce world. To the best of our knowledge, and after a thorough review of state-of-the-art papers, there is no previously defined problem to which the ISOP could be transformed. Therefore, it is not possible to compare our results with those of other authors. Consequently, we focus on algorithm development. As an additional comparison, we used the PCS and PCS+ algorithms, which are known engines for price comparison sites. However, as could be expected, these solutions were not dedicated to the ISOP, and the results obtained from these algorithms were much worse than those obtained by the newly developed ones.

8 Conclusions

E-commerce is an industry which focuses on selling and buying products and services through web pages (see Hagel III 1999; Timmers 1998). Internet shopping, which is becoming increasingly popular each year, is one of the most important research topics in this area and is motivated by practical applications.

Herein we introduce the ISOP including two different discounting functions, namely a shipping cost function as well as a price discounting function. A new algorithm denoted as \(G3\) is introduced to provide solutions to this newly defined problem. An upper bound is proved, as well as the optimality in a particular case where the problem is solvable with a polynomial algorithm. Moreover, other known algorithms were coded to determine the quality of the new algorithm \(G3\) using computational experiments.

An advanced benchmark was created to perform exhaustive computational experiments. It is worth noticing that a new, more realistic world model was proposed. Results of these experiments demonstrated that \(G3\) provides better solutions than existing heuristics, even when items may be unavailable.

The computational experiments used in this work were based on a model that was made to be as realistic as possible, but the next challenging step for this subject would be to test the algorithms on real data collected from online stores.

The subsequent step will be the preparation of an extensive computational experiment (if possible, already based on real data from online stores) involving all known algorithms. For a truly comprehensive comparison, we also plan to check the performance of mathematical programming optimizers: we plan to solve the ISOP in its mathematical programming formulation using optimizers such as CPLEX, Gurobi, LINDO, SCIP, and possibly others.