Exploring the Sustainability of Swiss Online Shops: Preliminary Evidence from a Clustering Approach

This paper proposes a framework for assessing the sustainability of online retailers. By adhering to the logic of the value chain, the framework captures all steps that typically occur in e-commerce, from procurement to shipping of products and handling of product returns. The framework is then applied to explore the sustainability of 227 Swiss online shops through a clustering approach. Core elements of the framework are dummified and condensed to 20 features using an autoencoder-based neural network. K-means clustering is used to identify three sustainability clusters that are visualized in the latent space. One-way ANOVA and chi-square analyses of the three-cluster solution indicate that the identified clusters are distinct across most elements of the proposed framework. These preliminary results open up a variety of avenues for future research.


Introduction
Global e-commerce sales have grown continually over the past years and account for an increasing share of total retail sales. In 2019, eMarketer predicted that global e-commerce sales would grow by 19 percent and account for 16 percent of total retail sales in 2020 (Lipsman, 2019). The recent COVID-19 pandemic has further accelerated the shift from offline to online retailing, especially due to social distancing requirements and lockdowns put in place to prevent the spread of the coronavirus.
For individual countries, e-commerce sales were thus predicted to grow by more than 20 percent in 2020 (Cramer-Flood, 2020). It is expected that the accelerated trend towards online retailing will persist during the recovery from the COVID-19 pandemic (UNCTAD, 2021a).
E-commerce has environmental consequences (Fichter, 2003). Growth in e-commerce sales translates to, e.g., more intense use of ICT and logistics, affecting the use of energy, material resources, and land, and the emission of greenhouse gases (Fichter, 2003). In light of climate change (Steffen et al., 2018) and even continents aiming to become climate-neutral (European Commission, 2020), analyzing the sustainability of e-commerce becomes more relevant than ever before.
This paper develops a framework for assessing the sustainability of online retailers (section 2.3), based on the concept of the value chain specifically in e-commerce (section 2.1) and considering extant research on sustainability in e-commerce (section 2.2). The framework serves as the basis for developing a survey instrument for collecting empirical data from 227 online retailers in Switzerland (section 3.1). Using an autoencoder-based neural network, the data is condensed to 20 features for subsequent clustering (section 3.2.1). K-means clustering identifies three clusters of online shops, which are visualized in the latent space (sections 3.2.2 and 3.2.3). The statistical analyses of the clusters indicate that they are distinct across most elements of the proposed framework (section 4). These preliminary results provide fertile ground for future research (section 5). Overall, this paper contributes to a better understanding of sustainability in e-commerce, specifically by proposing an assessment framework, applying it to a unique dataset, and identifying three clusters of online shops according to their sustainability as assessed by our framework.

The e-commerce value chain
The concept of the value chain, developed by Porter (1985), is widely used by academia and practice alike. Following a linear logic, the value chain entails five successive "generic categories of primary activities […] involved in the physical creation and delivery of the product to the customers" (Ricciotti, 2020, p. 193), namely inbound logistics, operations, outbound logistics, marketing and sales, and service (Porter, 1985). These sequential primary activities are sustained by four support activities, namely procurement, technology development, human resource management, and firm infrastructure (Porter, 1985; Ricciotti, 2020). Several scholars have adapted the generic value chain concept to more specific contexts (Ricciotti, 2020). Some have departed from its sequential and linear logic and embraced a network logic to better reflect that firms are part of stakeholder networks that compete with each other, resulting in the concept of the value network (Ricciotti, 2020). While such a network logic may more closely represent today's competitive environment, the simplicity of the value chain has not lost its appeal.
In the (online) retailing value chain, the most important players are manufacturers of products, institutional retailers, and consumers (Reinartz, Wiegand, & Imschloss, 2019). Such retailers may be traditional stationary retailers who also sell their products via online channels as part of multichannel strategies, or so-called pure plays who sell online only (Reinartz et al., 2019). Manufacturers have also started running their own online operations, aiming to engage directly with end customers and thereby circumventing retailers (Reinartz et al., 2019). In this paper, we focus on the value chain from the perspective of the online seller of products. This can include any type of online retailer as well as manufacturers maintaining direct-to-consumer (D2C) online operations. To sell products online, retailers and manufacturers need to run a website with an online shop. Products sold online are typically stored in a warehouse. Once ordered, they are packaged and delivered to the customer. The customer receives the product and either keeps it or decides to return it.

Sustainability in e-commerce
Research on sustainability in e-commerce has addressed a wide array of aspects. This includes comparing the environmental impact of brick-and-mortar retail channels with online channels (Bertram & Chi, 2018; Pålsson, Pettersson, & Winslott Hiselius, 2017; P. Van Loon, McKinnon, Deketele, & Dewaele, 2014; Weber et al., 2009), sustainability in e-commerce packaging (Escursell, Llorach, & Roncero, 2020), sustainability of e-commerce logistics (Mangiaracina, Marchet, Perotti, & Tumino, 2015), lifecycle analysis of different fulfilment channel types (Patricia Van Loon, Deketele, Dewaele, McKinnon, & Rutherford, 2015), e-commerce customers' last-mile delivery preferences based on the sustainability impact (Ignat & Chankov, 2020), and product returns' effect on business, society, and the environment (Frei, Jack, & Brown, 2020). While these contributions are diverse in the topics they address and the methods they apply, they have one important denominator in common: they address or can be related to specific segments or phases of the e-commerce value chain. This underlines the suitability of the value chain concept as a frame of reference for analyzing sustainability in e-commerce.

A framework for assessing the sustainability of online retailers
Adhering to the logic of the value chain and considering extant literature, we identified six core elements of the e-commerce value chain that need to be considered when assessing the sustainability of online shops: 1) products, 2) online shop operations, 3) intralogistics and storage, 4) packaging, 5) delivery, and 6) product returns. Some of these elements may involve cooperation with other companies (e.g. suppliers or fulfilment/logistics service providers). We thus included 7) cooperation with partners as a further core element. In our framework (see Figure 1), this element is depicted in parallel to the six linear elements since it can relate to any of them.
To emphasize the goal of this framework, namely to guide the assessment of online retailers' sustainability, we nest the core framework within society and the environment.
The link between online retailers and society is established through interactions with their stakeholders, which are part of society. Such stakeholders can include partners of the company (e.g. suppliers), employees, and customers, but also consumer and trade organizations, and entities such as states with their regulatory requirements. Communication plays a major role in such interactions (Lim & Greenwood, 2017). We thus include communication with stakeholders as a further element in our framework. (Osterwalder & Pigneur, 2010). Our approach is distinct from these examples in two ways: First, we purposely focus on society and environment, thereby simplifying the manifold and complex interactions between many types of stakeholders. Second, we nest our framework within society, which is in turn nested within the environment. This is in line with the paradigm of sustainability which proposes that the economy depends on society and both depend on the environment, in that order (e.g. Griggs, 2013).

Development of survey instrument, data collection, and sample
The framework developed in section 2.3 served as the foundation for developing the survey instrument. For each of the framework's core elements, survey questions and items were designed to address the specific nature of that element. Further survey questions addressed communication and marketing, the assessment of customer needs, strategic priorities, and the size and nature of the online shop (e.g. product categories and annual order volume). The final survey instrument was reviewed and pre-tested by experienced researchers, market research professionals, and e-commerce practitioners.
The online survey specifically targeted online shops operating in Switzerland and was administered from March to July 2020. Switzerland ranks first in the UNCTAD B2C E-commerce Index 2020, reflecting high scores in all four dimensions of the index, namely Internet penetration, banking coverage, secure server density, and postal reliability (UNCTAD, 2021b). An invitation to the online survey was sent to a large number of Swiss online shops. In addition, the link to the survey was distributed via social media and specific newsletters to reach the target group.
The final sample consisted of n=227 completed online surveys.
In terms of business relationships, the majority stated that they serve B2C customers (81%), the primary target group of the survey; 51% stated that they serve B2B customers, and 25% stated that they follow a D2C model. Our sample included online shops with different annual order volumes: 32% reported fewer than 1,000 orders, 36% reported 1,000 to 9,999 orders, and 32% reported 10,000 or more orders in 2019.

Data preparation
From the overall survey instrument, we selected a total of 10 questions to adequately cover all elements of our core framework, i.e. from products to returns, and cooperation with partners. The dummification of the relevant variables led to a dataset containing 227 online shops and 150 features taking only the categorical values "0" or "1".
An autoencoder-based neural network was used to perform dimension reduction on the dataset obtained from the previous step. This reduces the noise in the original data, which benefits the subsequent clustering. Compared to conventional linear dimension reduction techniques such as Principal Component Analysis (PCA), autoencoder-based neural networks are claimed to be more powerful, as they can capture nonlinear correlations between features and perform nonlinear transformations of the data (Alkhayrat, Aljnidi, & Aljoumaa, 2020).
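The dummification step can be illustrated with a minimal sketch. The question names and answer values below are hypothetical and are not taken from the survey instrument; the sketch only shows how categorical answers become 0/1 features.

```python
# Hypothetical survey answers: one dict per online shop.
# Question names and answer values are invented for illustration.
responses = [
    {"packaging_plastic_free": "yes", "delivery_option": "standard"},
    {"packaging_plastic_free": "no", "delivery_option": "carbon_neutral"},
    {"packaging_plastic_free": "yes", "delivery_option": "carbon_neutral"},
]


def dummify(rows):
    """One-hot encode categorical answers into 0/1 features."""
    # Every (question, answer) pair observed in the data becomes one column.
    columns = sorted({(q, a) for row in rows for q, a in row.items()})
    names = [f"{q}={a}" for q, a in columns]
    matrix = [[1 if row.get(q) == a else 0 for q, a in columns] for row in rows]
    return names, matrix


names, matrix = dummify(responses)
for name, column in zip(names, zip(*matrix)):
    print(name, list(column))
```

Applied to the selected survey questions, an encoding of this kind would yield the 227 x 150 binary matrix described above.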
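To make the idea of an autoencoder bottleneck concrete, the following is a minimal NumPy sketch under several assumptions: it uses a single hidden layer rather than the stacked architecture of the paper, random synthetic binary data instead of the survey data, full-batch gradient descent, and an arbitrarily chosen learning rate and number of epochs. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the dummified data: 227 shops x 150 binary features.
X = (rng.random((227, 150)) < 0.3).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(X, X_hat, eps=1e-9):
    """Mean reconstruction loss over all shops and features."""
    return float(-np.mean(X * np.log(X_hat + eps) + (1 - X) * np.log(1 - X_hat + eps)))


n_features, n_latent = X.shape[1], 20  # 20 latent features, as in the text
W_enc = rng.normal(0.0, 0.1, (n_features, n_latent))
b_enc = np.zeros(n_latent)
W_dec = rng.normal(0.0, 0.1, (n_latent, n_features))
b_dec = np.zeros(n_features)


def encode(X):
    return sigmoid(X @ W_enc + b_enc)


def decode(H):
    return sigmoid(H @ W_dec + b_dec)


loss_before = binary_cross_entropy(X, decode(encode(X)))

learning_rate = 0.1  # assumed value, not from the paper
for epoch in range(300):  # assumed; far fewer epochs than a real training run
    H = encode(X)
    X_hat = decode(H)
    # Gradient of the cross-entropy (summed over features, averaged over shops)
    # with respect to the decoder pre-activation.
    grad_out = (X_hat - X) / len(X)
    # Backpropagate through the decoder weights and the sigmoid of the encoder.
    grad_hidden = (grad_out @ W_dec.T) * H * (1.0 - H)
    W_dec -= learning_rate * (H.T @ grad_out)
    b_dec -= learning_rate * grad_out.sum(axis=0)
    W_enc -= learning_rate * (X.T @ grad_hidden)
    b_enc -= learning_rate * grad_hidden.sum(axis=0)

loss_after = binary_cross_entropy(X, decode(encode(X)))
latent = encode(X)  # 227 x 20 matrix that would be passed on to clustering
print(loss_before, loss_after, latent.shape)
```

The nonlinearity in `encode` is what distinguishes this from PCA: the latent features are a sigmoid transformation of the inputs rather than a linear projection.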

Clustering method
The conventional unsupervised clustering method K-means was used to group online shops with potentially similar features. The elbow method was used to determine the optimal number of clusters. In essence, the elbow method computes, for each candidate number of clusters, the sum of squared distances between the points (online shops) and their assigned cluster centers in the latent space. In Figure 3a, a clear "elbow" is visible at three clusters. To verify the chosen number of clusters, a silhouette analysis was performed for 2, 3, 4, 5, and 6 clusters. The corresponding silhouette coefficients are 0.289, 0.211, 0.159, 0.151, and 0.148. Solutions with 4, 5, or 6 clusters are comparatively poor, as they lead to below-average silhouette coefficients. The number of clusters was therefore set to 3, and the online shops were grouped into three clusters (C1, C2, and C3) using the K-means algorithm.
Figures 3a and b: Elbow method for determining the optimal number of clusters (a; left) and silhouette analysis for verifying the three-cluster solution (b; right)
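The elbow and silhouette checks described above can be sketched as follows. This is an illustrative NumPy implementation on synthetic data with three well-separated groups, not the analysis code of the paper. The deterministic farthest-first initialization is a simplification chosen for reproducibility; library implementations such as scikit-learn's `KMeans` typically use k-means++ with several restarts.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest already chosen center.
        d_min = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d_min.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    # Within-cluster sum of squared distances, the quantity plotted in an elbow chart.
    inertia = float(dist[np.arange(len(X)), labels].sum())
    return labels, inertia


def mean_silhouette(X, labels):
    """Average silhouette coefficient over all points."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # convention: a singleton cluster scores 0
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


# Synthetic 20-dimensional "latent" data with three groups (assumed, for illustration).
rng = np.random.default_rng(0)
offsets = np.zeros((3, 20))
offsets[1, :10] = 8.0
offsets[2, 10:] = 8.0
X = np.vstack([rng.normal(size=(50, 20)) + o for o in offsets])

inertias, silhouettes = {}, {}
for k in range(2, 7):
    labels, inertias[k] = kmeans(X, k)
    silhouettes[k] = mean_silhouette(X, labels)

best_k = max(silhouettes, key=silhouettes.get)
print(inertias, silhouettes, best_k)
```

On real data the elbow and the silhouette coefficient need not point to the same number of clusters, which is why the text treats the silhouette analysis as a complementary check rather than as the sole criterion.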

Cluster visualization
To visualize the identified clusters of online shops in two-dimensional diagrams, both PCA and t-distributed stochastic neighbor embedding (t-SNE) are used. PCA is a deterministic, linear dimensionality reduction technique that keeps the global structure of the data by preserving their variance. Here, the first, second, and third principal components together account for 78.4% of the variance in the original data. In comparison, t-SNE is a non-deterministic, non-linear dimensionality reduction technique that is stronger at capturing local structures within the data and preserves distances between data points instead of their variances. In Figure 4, both the PCA and the t-SNE diagrams exhibit clear boundaries between clusters, which can be distinguished mainly along the first principal component.
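How the share of variance captured by the leading principal components is obtained can be sketched with a singular value decomposition. The latent matrix below is synthetic and serves only as an assumption for illustration; the resulting shares will not match the 78.4% reported for the survey data. t-SNE is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 227 x 20 latent features, with unequal spread per dimension.
Z = rng.normal(size=(227, 20)) * np.linspace(3.0, 0.5, 20)

# PCA via SVD of the mean-centered data.
Z_centered = Z - Z.mean(axis=0)
U, S, Vt = np.linalg.svd(Z_centered, full_matrices=False)

# Squared singular values are proportional to the variance along each component.
explained_ratio = S**2 / np.sum(S**2)
share_first_three = float(explained_ratio[:3].sum())

# Coordinates for a two-dimensional scatter plot (first two principal components).
coords_2d = Z_centered @ Vt[:2].T

print(round(share_first_three, 3), coords_2d.shape)
```

Coloring the rows of `coords_2d` by cluster label would give a PCA diagram of the kind described above.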

Results
In the survey instrument, we used five items on a 5-point scale to measure sustainability related to the element products. Similarly, five 5-point-scale items were used to assess the element online shop operations, three items for intralogistics and storage, six items for product returns, and seven items for cooperation with partners. One-way ANOVA was performed to detect statistically significant differences between the clusters. Table 1 details the results for two exemplary items per framework element, highlighting distinct characteristics of each cluster.
For the element packaging, we developed lists of specific characteristics of the packaging and the packaging system as multiple-answer checkbox-type questions (e.g. whether the packaging material is free of plastic or whether the size of boxes is optimized to suit individual consignments). For the element delivery, we developed a similar list covering a variety of options (e.g. carbon-neutral shipping or bundling of partial deliveries). Figures 5 and 6 detail how the respective characteristics and options vary across the clusters.
Cross tabulation and chi-square analyses revealed that most of the characteristics of the packaging systems are unequally distributed across the clusters at the p<.001 (***), p<.01 (**), or p<.05 (*) level (see Figure 5a). Characteristics of the packing material (see Figure 5b) and the filler material (see Figure 5c) are unequally distributed across the clusters at the p<.001 (***) or p<.01 (**) level. Taken together, these results suggest that the identified clusters are quite distinct with regard to the packaging core element of our framework. Out of the seven delivery options displayed in Figure 6, three are unequally distributed across the clusters at the p<.001 (***), p<.01 (**), or p<.05 (*) level, namely consolidated shipping, use of locally adapted logistics solutions, and delivery to third-party collection points.
Our clustering results indicate that online shops can be grouped into three clusters regarding their sustainability. With regard to many aspects of our proposed framework, the clusters are distinct from each other. Among them, cluster 3 has the most distinct characteristics and thus appears to comprise the online shops that are most sustainable according to the criteria covered by our survey instrument. Further analysis of the present dataset needs to test to what extent the identified clusters also vary by structural properties of the online shops, such as product categories, order volume, or the use of a D2C model. Future research should also investigate the use of communication channels, possible differences between pure-play and omnichannel strategies, and potential latent drivers of sustainability engagement such as the values of the organizations behind the online shops.
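The two kinds of tests used in the results, one-way ANOVA for the 5-point-scale items and chi-square tests for the checkbox-type items, can be sketched in plain Python. All ratings and counts below are hypothetical and do not come from the survey; in practice, library routines such as `scipy.stats.f_oneway` and `scipy.stats.chi2_contingency` would normally be used and also return p-values for arbitrary degrees of freedom.

```python
import math


def one_way_anova_f(groups):
    """F statistic: between-cluster variance relative to within-cluster variance."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 5-point ratings of one item for clusters C1, C2, and C3.
ratings = [[2, 3, 2, 3], [3, 3, 4, 2], [5, 4, 5, 4]]
f_stat = one_way_anova_f(ratings)

# Hypothetical counts per cluster: [characteristic present, characteristic absent].
counts = [[10, 30], [20, 20], [30, 10]]
chi2, dof = chi_square_statistic(counts)

# For exactly 2 degrees of freedom (3 clusters x 2 answers) the p-value has the
# closed form exp(-chi2 / 2); other table shapes need the general chi-square
# distribution instead.
p_value = math.exp(-chi2 / 2)

print(f_stat, chi2, dof, p_value)
```

In this invented example the characteristic would be flagged as unequally distributed across the clusters at the p<.001 level.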

Figure 1: E-commerce sustainability assessment framework nested in society and environment

Figure 2 displays the structure of the stacked autoencoder. To train this autoencoder, 20,000 epochs were run with a batch size of 32, a validation split of 0.1, and the shuffle option set to "true". Regardless of the size of the hidden layer, both the training and the validation loss curves flatten after approximately 10,000 epochs. At the end of the training process, the training loss and the validation loss are smallest for a hidden layer size of 20, i.e. 0.1472 and 0.1450, respectively. Therefore, the original 150 features were "condensed" into 20 new features in the latent space for further clustering analyses.

Figure 2: Structure of the stacked autoencoder. The input layers and the output layers are symmetric. n represents the number of nodes in the middle layer, which is set to 10, 20, or 30.

Figures 5a, b, and c: Packaging system (a; left), packaging material (b; upper right), and filler material characteristics (c; lower right), percentages by cluster