Online shopping: Factors that affect consumer purchasing behaviour

The objective of this paper is to determine the factors that affect consumers' willingness to purchase a product from an online store. We evaluated the criteria on which users base their decisions when purchasing online. We conducted principal components analysis to reduce the number of these criteria and extracted seven factors. To confirm that the factors are accurate, we executed confirmatory factor analysis, which showed that the model consisting of the newly created factors fits the data well.
Subjects: Marketing; Advertising; Marketing Research; Consumer Behavior


PUBLIC INTEREST STATEMENT
The growth of the e-commerce environment has made it easier than ever to research a product and purchase it online from the most suitable seller. This power in the hands of consumers affects online businesses in many spheres. As good customer experience is a driver of purchases, companies should ensure that the customer gets what she needs in a user-friendly way. We administered a survey questionnaire to 232 students and asked them to evaluate the criteria they consider when purchasing online. Afterwards, we conducted principal components analysis to reduce these criteria and create meaningful factors. We were able to extract seven components: price, availability, social proof, scarcity, product details, conditions and social media activity. To confirm that the factors are accurate, we executed confirmatory factor analysis (CFA), which showed that the model fits the data well. We believe that the defined factors can be used by e-commerce stores as a quick guideline when designing customer experience.
Pinsonneault, Tomiuk, & Liu, 2015). E-commerce and e-business have been the subject of many studies; by 2013, more than 600 studies discussed e-business adoption alone (Chen & Holsapple, 2013). In the growing competition among online stores, it is essential to monitor the factors that affect potential customers during their buying journey. Companies that fail to do so risk losing customers to their competitors. This paper provides an overview of customers' perception of selected factors during online shopping. Its contribution lies in the use of the gathered knowledge by entrepreneurs (companies), as well as by agencies focused on developing websites and online stores. While companies can use this information to optimize a running store, web agencies can use it as a guideline during the development of new e-shops. Agencies can also use the information to build a selling strategy for their clients.
E-commerce is a form of business conducted in the online environment, in which the Internet serves as a unified platform that connects buyers and sellers (Turban, King, Lee, Liang, & Turban, 2015). Ullman (2013) considers e-commerce to be the range of possible commercial transactions conducted online. Each website that generates income (or is intended to generate income) can be included in this category (Horch, Wohlfrom, & Weisbecker, 2017). Chaffey (2015) defines electronic commerce as social and economic activities between participants in which computer devices and the Internet are used. However, with the rapid development and penetration of mobile devices, this definition can be considered outdated. Minculete (2013) states that e-commerce and e-business should give up the letter "e" because the use of e-commerce technologies is on the rise and they have become a regular part of marketing initiatives.
The core instrument used in e-commerce is undisputedly the electronic shop (abbreviated e-shop). An e-shop is a store operated in the environment of the Internet (Kollmann, Lomberg, & Peschl, 2016; Turban et al., 2015). Currently, it is possible to develop an e-shop using free platforms (such as WooCommerce) that can be integrated into content management systems (Beleščák, 2014). Pilík (2013) states that in the Czech Republic alone, 800 new e-shops are added every year. As was mentioned at the beginning of the paper, e-commerce has benefits for small- and medium-sized enterprises (Kartiwi, Hussin, Suhaimi, Jalaldeen, & Amin, 2018). These authors claim that the benefits are not noticeable at the time e-commerce is implemented in a company, but that companies start to notice them later. Trading via modern technologies allows a quick response to emerging trends in purchased products. Thanks to this possibility, even small- and medium-sized companies are able to be more flexible, which gives them a competitive advantage over big corporations (Cantú, Morejón, Molinaand, & Wong, 2014). Agarwal and Wu (2015) argue that the use of e-commerce is especially important for companies located in developed and emerging countries.
When analyzing the current state of the discussed issue, a review of already published research in the area is critical. Pilík (2013) conducted a questionnaire survey among 706 pseudo-randomly chosen respondents. The survey was conducted between June and September 2012, and its main purpose was to determine the factors affecting online purchase. The results showed that 87.5% of respondents used the Internet to purchase products, while 32.7% of respondents use the Internet to purchase products regularly. According to this research, age and Internet literacy affect the purchase most significantly. A negative dependence was found between online purchase and Internet literacy. The majority of respondents were mostly concerned about product testing, claims, problems with product returns and delivery of the wrong product.
The research by Masínová and Svandová (2014) was conducted on a sample of 167 respondents. The results showed that product description, claim handling, product photos, payment options and response time are among the factors that affect customers' satisfaction most. These factors proved to be important especially for Internet users purchasing clothing. Moreover, Bucko and Vejačka (2011) discussed that one of the factors affecting online purchasing is trust in and security of the environment, together with the related identification of users (or communicating parties). Vilčeková and Sabo (2014) analyzed a sample of 1,067 respondents in research conducted between January and April 2013. Based on the results of the survey, a factor analysis was conducted. As a result, six factors were identified that describe the relationship towards the country of origin of the purchased product. The research found that country of origin matters more to older respondents than to younger ones. In the study conducted by Rajyalakshmi (2015), the author examined a sample of 1,500 Internet users from six major cities across India. The factor analysis pointed to eight factors: positive attitude ("I really like buying at the Internet", "I consider internet to be my first choice when I need any product or service"), perceived usefulness, product risk, perceived risk, price, traditional shopping, promotion and financial risk. The regression analysis found that perceived risk, price and promotion have the strongest positive impact on customers' willingness to purchase a product.
In the study by Baubonienė and Gulevičiūtė (2015), 183 Lithuanian consumers who purchase online were surveyed. Within this study, the authors determined four factors that influenced the behavior of customers: technical factors (knowledge of IT technologies and IT skills), consumer-related factors (attitude to online shopping, cultural types and more), price, and product/service factors (the availability of product information on the website, product type). The thesis by Agyapong (2017) was based on a sample of 184 respondents surveyed via an online questionnaire. The author found that the main factors that affect online shopping are convenience and attractive pricing/discounts. Advertising and recommendations were among the least effective. In the study by Lian and Yen (2014), the authors tested two dimensions (drivers and barriers) that might affect the intention to purchase online. Drivers consisted of performance expectation, effort expectation, social influence and facilitating conditions. Usage, value, risk, tradition and image were all among the barriers.

Methods used to determine factors affecting consumer behavior
The main objective of this paper is to determine the factors that affect consumers' willingness to purchase a product from an online store in the conditions of the Slovak market. Our goal was to reduce the number of criteria affecting online purchase and to create new latent variables (factors) that summarize the information these criteria contain.
In order to achieve our objective, a questionnaire survey was conducted. The general sample consisted of all Internet users located in the Slovak Republic. For the purpose of our study, the sample was not selected randomly; we selected it purposely. As we consider Generation Y (born between 1980 and 1995) to be the major group of Internet users with purchasing potential, we focused on this particular group. As we were looking for a group of people with a higher level of Internet literacy (we wanted to avoid entropy in terms of the theory of marketing communication), we selected university students as appropriate subjects for our study.
The survey was administered in February and March 2015, and in December 2015, via a questionnaire consisting of 17 items. We focused on the responses to one particular item: "How important are the following criteria for you during the online purchase?" The evaluated criteria were as follows: price of the product, shipping, discounts and special offers during the purchase, price comparison with brick-and-mortar store, payment method, delivery time, reviews about product, reviews about seller, limited product quantity, time-limited offers, free shipping, security certificate, product details, product photos, website graphics, easy-to-use navigation, customer service before the purchase, position in search engine results page, mobile optimization of the website, ease of accessibility of terms and conditions, ease of accessibility of shipping conditions, website activity, social media activity, number of followers on social media, seller's country of origin. Respondents rated each criterion on a scale from 1 = very important to 5 = not important at all. The criteria were selected based on previous research.
In order to achieve the objective set at the beginning of this part of the paper, exploratory factor analysis will be used to analyze the data gathered from the survey. The main purpose of factor analysis is to evaluate the structure of mutual relationships among variables. Subsequently, it is important to find out whether the variables can be split into groups such that correlations within a group are significant while correlations between groups are not. By applying factor analysis, new variables called factors are created from the original variables (Stankovičová & Vojtková, 2007).
The basic assumption for the use of factor analysis is the existence of sufficient correlations among the data in the data matrix. To analyze these correlations, it is possible to use various tests:
• Bartlett's sphericity test: the null hypothesis assumes the correlation matrix to be an identity matrix.
• Kaiser-Meyer-Olkin test (KMO): compares the sizes of the observed correlation coefficients to the sizes of the partial correlation coefficients. It is highly recommended that the KMO value is higher than 0.5 (Meloun et al., 2012).
• Measure of sampling adequacy (MSA): it expresses the degree of intercorrelation among the original variables and how well each variable is predicted by the other variables. It is recommended not to conduct factor analysis when the MSA is lower than 0.5 (Coussement et al., 2011).
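The two checks listed above can be sketched in a few lines of Python. This is a minimal illustration using the standard textbook formulas, not the authors' R workflow: the function names, the small correlation matrix in the demo and the helper `invert_with_det` are all assumptions introduced here. Bartlett's statistic is computed as -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and the KMO measure compares squared correlations with squared partial correlations obtained from the inverse correlation matrix.

```python
import math


def invert_with_det(matrix):
    """Gauss-Jordan elimination with partial pivoting.

    Returns (inverse, determinant) of a small square matrix.
    """
    n = len(matrix)
    # Augment a float copy of the matrix with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        scale = aug[col][col]
        det *= scale
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom for H0: corr is an identity matrix."""
    p = len(corr)
    _, det = invert_with_det(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2


def kmo(corr):
    """Overall KMO value and the per-variable MSA values."""
    p = len(corr)
    inv, _ = invert_with_det(corr)
    r2 = [0.0] * p  # squared correlations per variable
    q2 = [0.0] * p  # squared partial correlations per variable
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                r2[i] += corr[i][j] ** 2
                q2[i] += partial ** 2
    per_variable = [r / (r + q) for r, q in zip(r2, q2)]
    return sum(r2) / (sum(r2) + sum(q2)), per_variable


if __name__ == "__main__":
    # Hypothetical 3-variable correlation matrix, for illustration only.
    demo = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]
    print(bartlett_sphericity(demo, n_obs=221))
    print(kmo(demo))
```

The p-value for Bartlett's statistic would come from the chi-square distribution with the returned degrees of freedom; that step is left out here because the Python standard library has no chi-square distribution function.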
If the assumption is met, the next step is to estimate the parameters of the factor model. In this phase, principal components analysis will be used. It is a statistical method that uses an orthogonal transformation to convert a set of correlated observations into a set of observations with no correlations among them. If k is the number of principal components and n is the number of variables, then k ≤ n (Bro & Smilde, 2014). The next step is the determination of common factors. As Meloun et al. (2012) and Kagraoka (2015) explain, the number of common factors should be based on certain criteria. We will use the eigenvalue criterion (factors with eigenvalues higher than 1 are considered significant) and the criterion of explained variance (the selected factors should explain as high a proportion of the total variability as possible).
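The two selection criteria can be illustrated with a short sketch. The function name and the eigenvalues in the demo are hypothetical and are not the values obtained in this study. For a correlation matrix the eigenvalues sum to the number of variables, so the explained share is the sum of the retained eigenvalues divided by the total.

```python
def select_components(eigenvalues, threshold=1.0):
    """Apply the eigenvalue-greater-than-threshold rule.

    Returns the number of retained components and the share of total
    variance they explain.
    """
    total = sum(eigenvalues)
    retained = [ev for ev in sorted(eigenvalues, reverse=True) if ev > threshold]
    return len(retained), sum(retained) / total


if __name__ == "__main__":
    # Hypothetical eigenvalues of a 5-variable correlation matrix (they sum to 5).
    k, share = select_components([2.1, 1.3, 0.8, 0.5, 0.3])
    print(k, round(share, 2))
```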
When the number of factors is determined, we can compute the factor saturations (loadings). During these computations, the results may be ambiguous: one variable can correlate with more than one factor. In order to maximize the differences between factors, rotation is used. Thanks to rotation, the factor loadings (correlations between a variable and a factor) take a form that allows a more exact and easier interpretation. We distinguish between orthogonal rotations (e.g. varimax, equamax, quartimax) and oblique rotations (e.g. oblimin, promax) (Ronco & De Stéfani, 2012). In our analysis, both orthogonal and oblique rotation will be considered in order to find the best possible interpretation of the factors.
When new latent variables were created, we needed to confirm their accuracy by using CFA. CFA is a multivariate statistical procedure that is used to test how well the measured variables represent the number of constructs (Brown, 2015). CFA can specify the number of factors required in the data and which measured variable is related to which latent variable. CFA is a tool that is used to confirm or reject the measurement theory (Brown, 2015).
CFA is used to test whether a hypothesized structure is appropriate for multivariate data. The hypothesized structure constrains the matrices appearing in the covariance equation. Individual covariances among the latent factors or among the error terms can be assumed equal or set to zero. Likewise, selected variances (diagonal entries) may be presumed to be equal within each of these matrices. Also, selected factor loadings may be set to zero. A random sample of multivariate observations is used to estimate the corresponding sample covariance matrix with and without the constraints imposed by the hypothesized structure (Byrne, 2016;Fox, 2010).
In order to confirm that the assumed data structure is correct, it is possible to use various fit indices:
• The comparative fit index (CFI) is an incremental fit index. The CFI is a revised form of the normed fit index (NFI) that takes sample size into account (Byrne, 2016) and performs well even when the sample size is small (Babin, Boles, & Griffin, 2015; Kenny, 2015). It evaluates the extent to which the tested model is superior to the alternative model established with the manifest covariance matrix (Chen, 2007). The CFI produces values between 0 and 1, and high values indicate good fit. A CFI value of 0.95 or higher is commonly taken to indicate a good fit relative to the independence model (Cangur & Ercan, 2015).
• The Tucker-Lewis index (TLI) is an incremental fit index. The TLI, also known as the non-normed fit index (NNFI), was developed to address the NFI's sensitivity to sample size. A larger TLI value indicates a better fit of the model. Although values larger than 0.95 are interpreted as acceptable fit, 0.97 is accepted as the cut-off value in a great deal of research.
• The standardized root mean square residual (SRMR) is an index of the average of the standardized residuals between the observed and the hypothesized covariance matrices (Chen, 2007). An SRMR value smaller than 0.10 indicates acceptable fit, and a value lower than 0.05 can be interpreted as an indicator of good fit (Henseler et al., 2014; Hu & Bentler, 1999; Kline, 2011).
• The root mean square error of approximation (RMSEA) is an index of the difference, per degree of freedom, between the observed covariance matrix and the hypothesized covariance matrix that represents the model (Chen, 2007). The RMSEA also takes model complexity into account, as it reflects the degrees of freedom. An RMSEA value smaller than 0.05 indicates a close fit of the model to the analyzed data, while a value between 0.05 and 0.08 indicates a fit close to good. An RMSEA value between 0.08 and 0.10 indicates a fit that is neither good nor bad. Hu and Bentler (1999) remarked that an RMSEA smaller than 0.06 is a sufficient criterion.
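To make the indices above concrete, the following sketch computes the CFI, TLI and RMSEA from the chi-square statistics of the tested model and of the baseline (independence) model, using their commonly cited formulas. It is only an illustration: the function name and the input values are hypothetical, and the RMSEA here divides by N - 1, whereas some software divides by N, so results can differ slightly from a given package. The SRMR is omitted because it requires the residual matrix rather than chi-square values.

```python
import math


def fit_indices(chi2_model, df_model, chi2_baseline, df_baseline, n_obs):
    """CFI, TLI and RMSEA from model and baseline chi-square statistics."""
    # Non-centrality estimates, truncated at zero.
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, 0.0)
    denom = max(d_model, d_base)
    cfi = 1.0 if denom == 0 else 1.0 - d_model / denom
    ratio_base = chi2_baseline / df_baseline
    ratio_model = chi2_model / df_model
    tli = (ratio_base - ratio_model) / (ratio_base - 1.0)
    # Assumption: N - 1 in the denominator; some packages use N instead.
    rmsea = math.sqrt(d_model / (df_model * (n_obs - 1)))
    return {"cfi": cfi, "tli": tli, "rmsea": rmsea}


def meets_cutoffs(indices, cfi_min=0.95, tli_min=0.95, rmsea_max=0.06):
    """Check each index against the cut-off values quoted in the text."""
    return {
        "cfi": indices["cfi"] >= cfi_min,
        "tli": indices["tli"] >= tli_min,
        "rmsea": indices["rmsea"] < rmsea_max,
    }


if __name__ == "__main__":
    # Hypothetical chi-square values, for illustration only.
    result = fit_indices(chi2_model=120.0, df_model=80,
                         chi2_baseline=1000.0, df_baseline=105, n_obs=221)
    print(result)
    print(meets_cutoffs(result))
```

Because the TLI penalizes model complexity through the chi-square to degrees-of-freedom ratios, it can fall below a cut-off that the CFI passes for the same model, as the hypothetical inputs above show.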

The development of the fundamental factors
In total, 232 respondents joined the survey; however, we analyzed only 221 cases after the removal of missing and extreme values. The average age of the respondents was 21.72 years (in the interval between 18 and 26 years) and the median age was 21 years. The sample consisted of students of the Faculty of Management, University of Prešov in Prešov (78), the Faculty of Public Administration, Pavol Jozef Šafárik University in Košice (88), and the Faculty of Arts, Pavol Jozef Šafárik University in Košice (55). In total, 24.89% of the respondents stated their gender was male and 75.11% stated their gender was female. Table 1 presents the descriptive statistics (mean, median, standard deviation) of the criteria we measured in the survey. It shows that the price of the product is the most important criterion when purchasing goods online. Discounts, price comparison with brick-and-mortar stores, payment method, delivery time, reviews about the product, product description and pictures can also be considered among the most important evaluation criteria for such a purchase. Table 1 also shows that criteria such as the number of social media followers, social media activity, website activity, mobile optimization, search engine position and limited quantity of the product are considered less important by the participants of our survey. However, the descriptive results of this survey are not the primary purpose of this study, so we do not examine them further.
As a first step in our analysis, we need to determine whether factor analysis is a suitable method for analyzing our data matrix. We first created a correlation matrix, which showed many small and moderate correlations between variables. This is a sign that the data might be suitable for the analysis; however, we have to confirm it using the abovementioned Bartlett's sphericity test and KMO test.
As was mentioned, Bartlett's sphericity test tests the following hypotheses:
H0: The correlation matrix is an identity matrix.
HA: The correlation matrix is not an identity matrix.
With p = 2.2e-16, we can reject the null hypothesis at the significance level α = 0.05. The correlation matrix is not an identity matrix. The KMO test yielded an overall MSA value of 0.79, which means that factor analysis is suitable for our data.
Before principal components analysis was conducted, we needed to decide how many factors would serve as the outcome of the analysis. Based on eigenvalues higher than 1.0, we selected 8 factors as the outcome of the initial analysis. As the factor saturations were not clear enough, we used the orthogonal varimax method to rotate the factors. The results of the analysis showed that two variables (position in search engine results, country of seller's origin) had a communality (h2) lower than the recommended value of 0.5. We decided to remove these variables and repeat the procedure. When the procedure was repeated, we found another variable (customer service before the purchase) to have a communality lower than 0.5. Moreover, we found that the factor loadings for seven other variables (price comparison in brick-and-mortar store, free shipping, safety certificate, website graphics, easy-to-use navigation, optimization of website for mobile devices and website activity) were not clear, and these variables correlated with more than one factor. We decided to remove all these variables from the analysis and repeat the procedure one more time.
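The screening described above can be sketched as follows. The communality of a variable is the sum of its squared loadings across the extracted factors; a variable is flagged when this sum falls below 0.5 or when it loads saliently on more than one factor. The loadings below are hypothetical, and the 0.4 threshold for a salient loading is an assumption made for this illustration, since the text does not state which threshold was applied.

```python
def flag_variables(loadings, min_communality=0.5, salient=0.4):
    """Flag variables with low communality or with salient loadings on several factors.

    `loadings` maps a variable name to its list of loadings, one per factor.
    """
    flagged = {}
    for name, row in loadings.items():
        reasons = []
        communality = sum(value * value for value in row)
        if communality < min_communality:
            reasons.append("low communality")
        # Assumption: |loading| >= `salient` counts as a meaningful loading.
        if sum(1 for value in row if abs(value) >= salient) > 1:
            reasons.append("cross-loading")
        if reasons:
            flagged[name] = reasons
    return flagged


if __name__ == "__main__":
    # Hypothetical two-factor loadings, for illustration only.
    demo = {
        "price of the product": [0.85, 0.10],
        "position in search results": [0.30, 0.35],
        "website graphics": [0.55, 0.60],
    }
    print(flag_variables(demo))
```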
We again performed Bartlett's sphericity test and the KMO test to determine whether the data were still suitable for factor analysis even though many variables had been removed. We were able to reject the null hypothesis in Bartlett's sphericity test with p = 2.2e-16. The overall MSA in the KMO test was still above 0.5, at the level of 0.64. These two tests showed that we could continue our analysis. To determine how many factors would be used, we conducted an initial principal components analysis and analyzed the eigenvalues and explained variance. Figure 1 displays the scree plot with eigenvalues.
Based on the eigenvalues, we found that seven factors are a sufficient number for our analysis. This number of factors explained 79.71% of the variance in the data. As the non-rotated solution was not sufficient (the results were not clear), we tried the varimax, equamax and quartimax rotations. We did not try an oblique rotation, as we did not want the factors to correlate with each other. All rotation types provided nearly identical results, so we decided to use varimax. Table 2 displays the generated rotated components (RCs) and factor loadings.
Based on Table 2, it is possible to define the components (factors) as follows:
• RC1 (the factor of price): this factor merges variables that affect the price of the purchase;
• RC2 (the factor of availability): this factor covers variables associated with the ease of the ordering process itself;
• RC3 (the factor of social proof): this factor covers people's urge to confirm that the product they are going to purchase is good;
• RC4 (the factor of scarcity): this factor merges variables that affect the speed of people's choice. Together with social proof, scarcity is among Robert Cialdini's Weapons of Influence (Cialdini, 2006);
• RC5 (the factor of product details): this factor merges variables connected to the presentation of the product;
• RC6 (the factor of conditions): this factor merges variables connected to the accessibility of terms and conditions and of shipping conditions;
• RC7 (the factor of social media activity): this factor merges variables connected to the store's activity on social media.
Table 3 shows that RC1 explains the highest proportion of variance in the whole reduced dataset (13%) and 16% of the variance among all components, perhaps because it consists of 3 variables while each of the other components includes only 2 variables. In contrast, RC6 explains only 9% of the variance in the whole dataset and 12% of the variance among factors.
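The two percentages reported for each component can be related with a short sketch. Assuming standardized variables, the share of total variance explained by a rotated component is its sum of squared loadings divided by the number of variables, while its share among the extracted components divides the same sum by the total over all components. The function name and the loadings below are hypothetical and do not reproduce Table 2 or Table 3.

```python
def variance_shares(loadings_by_component, n_variables):
    """Share of total variance and share among extracted components, per component.

    `loadings_by_component` maps a component name to its loadings on all variables.
    """
    ss = {name: sum(value * value for value in column)
          for name, column in loadings_by_component.items()}
    total_ss = sum(ss.values())
    return {
        name: {
            "proportion_of_total": value / n_variables,
            "proportion_of_extracted": value / total_ss,
        }
        for name, value in ss.items()
    }


if __name__ == "__main__":
    # Hypothetical loadings of 4 variables on 2 rotated components.
    demo = {"RC1": [0.9, 0.8, 0.1, 0.2], "RC2": [0.1, 0.2, 0.9, 0.7]}
    for name, shares in variance_shares(demo, n_variables=4).items():
        print(name, shares)
```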
In order to confirm the newly developed factors as constructs and to test whether the data fit the model, we executed CFA using the lavaan package in R. We used the latent variables created in the previous step and developed the hierarchical model represented by Figure 2. Based on this hierarchical model, we developed the following structural model:
Purchase behavior <-
Price =~ Price of the product + Shipping + Discounts and special offers
Availability =~ Payment method + Delivery time
Social proof =~ Reviews about the product + Reviews about the seller
Scarcity =~ Limited quantity of the product + Time limited offer
Product details =~ Product details + Product photo
Conditions =~ Accessible terms and conditions + Accessible shipping conditions
Social media activity =~ Social media activity + # of social media followers
Each line in the model represents a separate latent variable that is, in turn, manifested by variables from the data frame used at the beginning of the analysis. Afterwards, we fit the model using CFA. The results are presented in Table 4.
The results of the CFA showed that the estimated model fits the data well. The CFI is higher than the suggested 0.95, and thus the model can be considered a good fit. The TLI, on the other hand, is lower than the suggested level of 0.95 (both the CFI and TLI levels were suggested by Hu and Bentler, 1999). Moreover, the RMSEA value of 0.055 is lower than the suggested 0.06, so the model can be considered to fit the data (Hu & Bentler, 1999). The SRMR of 0.050 also indicates a good fit of the model. Table 5 shows that all variables are statistically significant. The estimates range between 0.702 and 2.001.
As correlations were observed between some of the subfactors, we decided to include some of these correlations in our structural model. By doing so, we obtained a nested model that was considerably improved compared to the basic model. The CFI increased to 0.966, the TLI increased to 0.954 (thus both indices suggest that the model fits the data well) and the RMSEA dropped to 0.047. The SRMR remained constant at 0.050, so a good fit was achieved again. Three criteria confirmed that the basic model is accurate enough, and four criteria confirmed that the nested model including the correlations between subfactors is accurate enough. We could confirm that consumer online purchasing behavior can be fairly explained by the seven factors developed in our study.

Managerial implications
As the e-commerce retail market keeps growing (Statista, 2018), the number of e-commerce stores is predicted to rise, too. This sector is therefore going to become strongly competitive. The ability of customers to fulfill their needs is limited only by their ability to use a web browser. Moreover, comparison websites and e-commerce aggregators allow users to compare a range of online stores from one place with ease. As a result, a company very often has only small and limited room to attract the potential customer. As the results of our study and the studies by Rajyalakshmi (2015), Baubonienė and Gulevičiūtė (2015), and Agyapong (2017) show, price is among the most important factors for Internet users. This might be caused by the presumption that, in general, it is cheaper to purchase a product online because of lower costs for staff and physical stores. As we surveyed young students, the majority of whom have only a part-time job and therefore a low income, it is understandable that the price of the product, shipping costs and discounts play the major role when choosing a product. Given the limited budget, we might assume that users do not simply want to buy the cheapest product; they want to choose the right product that will fulfill their needs. This is why the scarcity factor is so important (even though the study by Agyapong (2017) suggests that recommendations and word-of-mouth are not important for online purchasers). We might compare our availability factor to the convenience described by Agyapong (2017). With so many options available, and with their differentiation across countries based on customers' customs, it is important to provide payment and shipping options that are familiar to the customer (because of trust). For example, by using the service Adyen (2018), it is possible to provide locally preferred payment methods to customers around the globe.
Our product details factor is in line with the studies by Baubonienė and Gulevičiūtė (2015) and Masínová and Svandová (2014) and indicates that, because of the limited possibility to "explore" the product, the product description should be detailed enough to cover the informational needs of a customer. Asos (www.asos.com) and Zappos (www.zapos.com) are among the retailers who have dedicated appropriate effort to this issue. Our conditions factor seems to be similar to the perceived risk factor discussed by Rajyalakshmi (2015).
In general, by knowing what interests the customer during the phase of information seeking prior to the purchase, companies can prioritize the initiatives that should be improved. By considering various recommendations provided by neuromarketing research (Bridger, 2017) or behavioral economics (Pappas, Kourouthanassis, Giannakos, & Lekakos, 2016) (our scarcity factor is based on findings from behavioral economics and psychology), companies are able to draw users' attention to the information that matters most to potential customers and convince them to at least consider the offer (and increase the likelihood of purchase, even in the future). Therefore, knowing what the customer seeks and considers most important when purchasing a product online, a company can focus on efforts that lead to better deals with suppliers or carriers, the design of a strategy for promotions and discounts, etc.
There is also a financial aspect of this knowledge. As any additional investment in the e-commerce store delays the return on investment, doing things right from the beginning can have an impact on the financial performance of the company. On the other hand, unfulfilled customer expectations can also delay profits and can cause companies to spend additional resources to re-convince customers about their choice. Financial managers should devote appropriate effort to calculating the loss caused by not adjusting their offers to customers' expectations. This calculation could serve as an argument for getting approval from upper management.
Last but not least, there is also a project management perspective on the issue. Creating an e-commerce store is a large project. In our case, each factor from the previous analysis can be thought of as a possible project stream. This allows getting the right people on board in order to accomplish the best possible outcome of each stream.

Conclusions, implications and limitations of the study
As online shopping has become a regular part of people's lives, optimization of e-commerce stores is crucial in order to provide the experience expected by website visitors (potential customers). A positive experience might result in higher revenues, while a negative one might result in a permanent loss of customers. The main objective of this paper was to determine the factors that affect consumers' willingness to purchase a product from an online store in the conditions of the Slovak market. Based on the theoretical background, we conducted a questionnaire survey among university students and afterwards analyzed how users perceive selected criteria when purchasing products online.
When conducting factor analysis, we initially needed to reduce the size of the analyzed data matrix because of low communality or ambiguity in the results of the principal components analysis. Afterwards, we were able to extract seven RCs explaining almost 80% of the variance in the data: the factors of price, availability, social proof, scarcity, product details, conditions and social media activity. In order to confirm that the created factors are correct, we performed CFA. The analysis showed that the model consisting of 7 latent variables (factors) fits the data well. Moreover, by including correlations between factors, we were able to obtain a more accurate model and thus confirm that the latent variables developed by our analysis were correct.
The factor of price explained the largest part of the variance in the data. We assume that price is especially important for university students because, in the majority of cases, they are not employed and their budget is therefore limited.
There were several limitations of the study. As the sample is not a representative sample of the abovementioned general sample of Internet users, we assume these results cannot be generalized, as other generations might differ when purchasing products online. However, as the population ages, we assume there will not be such a dramatic difference between generations in terms of online purchases. Also, a larger sample size might provide different results, as other unobserved patterns might arise with the increase of the sample size. We also consider the selected Likert scale to be not fully sufficient, as the characteristics of the data showed that many people had a tendency to center their answers on the neutral answer (3). On the other hand, the factors (latent variables) that consist of variables with a mean or median value close to 3 proved to be less important, as they explained less variability. The last limitation we consider important is the respondents' awareness of the factors themselves, as some factors might affect a user's behavior subconsciously, without the user noticing that the factor actually changed her decision.
Although several variables were removed from the analysis, we still consider them to be important criteria when purchasing products online, and they should not be overlooked when optimizing an e-commerce store. Future research will focus on eliminating the abovementioned limitations. We plan to (1) increase the sample size so we can contribute more general results regarding online purchasing behavior, (2) transform the Likert scale from a 5- to a 4-option scale in order to remove the middle value and therefore gain more precise results, (3) conduct a connected user study in order to compare the questionnaire results with real user behavior when evaluating the factors affecting purchasing behavior, and (4) monitor trends and changes in user behavior and find new behavioral patterns over time.