Data modelling of subsistence retail consumer purchase behavior in South Africa

The purpose of the data is to model the purchase behavior of the subsistence consumer within the retail environment in one of the largest townships in South Africa. The data was collected using a self-administered questionnaire from a sample of 281 consumers. The Partial Least Squares Structural Equation Modelling (PLS-SEM) approach was adopted using the SmartPLS 3 software to analyze the data. The insights from the dataset identify convenience, price sensitivity, perceived product quality, customer trust, and perceived value as factors that stimulate purchase behavior. Furthermore, perceived value only mediates the relationship between perceived product quality and purchase intention. Researchers could use the data to position customer trust as a dependent variable to unearth more valuable insights. Additionally, the segment in question is also known to be price-sensitive. It would be intriguing to find out the role of price sensitivity as a moderator.


Specifications
A structured questionnaire was used to collect data from consumers in one of the largest townships in South Africa.
Data format: Raw, descriptive, and analyzed
Parameters for data collection: The sample consisted of grocery store consumers from one of the largest townships in South Africa, Soweto, in Johannesburg. Soweto is an acronym for South-Western Townships and comprises several periurban townships.
Description of data collection: Face-to-face self-administered questionnaires were distributed to participants in different settings. This included outside and inside the grocery stores and in the comfort of their homes. The purpose of the research was explained to participants, and consent was obtained before distributing the survey. A non-probability convenience sampling technique was used as there was no database to draw from for a probability sampling approach to be possible. The research data and questionnaire are available in the repository [8]. The questionnaire consists of Section A (demographic information) and Section B (measurement instruments).

Value of the Data
• The dataset is essential because it provides insights into the consumer buying behavior of a segment worth billions in terms of spending power. This segment is often called the base of the pyramid, the resource-constrained, the impoverished, and the subsistence consumer. The dataset can be used to identify the consumer behavior factors that influence the purchase of products in retail stores.
• The dataset can benefit researchers in retailing and consumer services as the data provides insights on direct relationships between the constructs explored and on the mediating effect. The data sheds some light on empathy, convenience, price sensitivity, physical environment, perceived product quality, customer trust, and their influence on perceived value and purchase intention. Furthermore, the data highlights the importance of perceived value as a mediator.
• The dataset is also beneficial to retailers interested in servicing the subsistence consumer because it provides insights into which factors consumers regard as stimulating purchase intention. Secondly, given that the study was carried out in a township setting, small informal retailers can benefit from these insights, and small-business government agencies and policymakers could use the data to inform strategies that assist the commercialization efforts of the township economy. Such strategies will also benefit the rural economies.
• For further insights, the data can be used to identify other constructs that can moderate the relationship between perceived value and purchase intention. For example, price sensitivity could be tested as a moderator, given how price-sensitive the segment is. Customer trust can also serve as a dependent variable to unearth valuable insights.

Fig. 1 demonstrates the proposed conceptual model.
The conceptual model suggests a connection between empathy, convenience, price sensitivity, physical environment, perceived product quality, customer trust, and purchase intention, and that perceived value mediates these relationships. A self-administered questionnaire (using a 5-point Likert scale) was distributed to consumers regarding their purchase behavior. Table 1 provides the demographic profile and characteristics of respondents. Female respondents formed the largest group (47.7%), although the gap relative to male respondents (47.3%) was negligible. The majority of the participants were single (58.4%). There was a fair representation of age groups, as shown in Table 1.

Data Description
In terms of education, the majority of the respondents had basic education. Regarding employment status, 52.3% were employed, and 47.7% were unemployed. About 82.6% of consumers indicated that they were regular customers of the grocery stores, with approximately 40% of the consumers indicating that they visit the grocery stores 1-2 times per week, and 27% visiting 2-3 times per week. Table 2 outlines the measurement instruments, which were adapted from prior studies. Furthermore, Table 3 shows the assessment of the reflective model, which includes construct reliability and validity tests, while Fig. 2 presents the output of the measurement model (indicator loadings) with relevant statistics. Tables 4, 5, and 6 show the data analysis for the discriminant validity tests, and Fig. 3 demonstrates the structural model output, highlighting the R² and Q² values. Additionally, Table 7 provides a detailed insight into the hypotheses testing in terms of the direct relationships, and Table 8 provides the mediation assessment and indirect effects. The questionnaire and data are provided on Mendeley Data.

Table 2 Measurement instruments.

Construct [Source]: Adapted Items

Empathy
E1: The employees of the grocery store understand the specific needs of their customers
E2: The grocery store understands what I need and strives to accommodate me
E3: The grocery store has employees who give customers personal service
E4: The employees of the grocery stores are very efficient

Convenience
C1: The grocery store layout makes it easy for me to find what I need
C2: The grocery store layout makes it easy for me to move around
C3: The grocery store always has merchandise available

Price Sensitivity [3]
PS1: I will continue to buy from the grocery store even if prices increase
PS2: I am willing to pay a higher price for the benefit of having the grocery store located close to me
PS3: I am willing to stick with the grocery store and not travel to other competitors outside the township who might offer reasonable prices

Physical Environment
PE1: The store overall has an appealing appearance
PE2: The grocery store provides a clean shopping environment
PE3: The grocery store has wide and open aisles
PE4: The grocery store has well-marked aisle signage
PE5: The grocery store provides a pleasant shopping environment
PE6: The grocery store's environment feels safe and secure

Perceived Product Quality [4]
PPQ1: The overall quality of products I buy from the grocery store is good
PPQ2: The quality of the produce department in the grocery store is good
PPQ3: The quality of the meat department in the grocery store is good
PPQ4: The quality of in-store bakery is good

Customer Trust [5]
CT1: The grocery store always meets my expectations
CT2: I can count on the store to meet my grocery needs
CT3: The grocery store is reliable
CT4: The grocery store can always be trusted
CT5: The grocery store consistently provides good quality products and services
CT6: The grocery store's offerings are worth the money I spend
CT7: The grocery store helps me save time

Perceived Value
PV1: The grocery store products have a good value for money
PV2: The grocery store products are affordable
PV3: In this grocery store, compared to other stores outside the township, I can save money

Purchase Intention
PI1: I intend to purchase from this grocery store
PI2: I would like to repeat my experience in this kind of grocery store
PI3: I would purchase from this grocery store in the future
PI4: I would recommend purchasing in this grocery store to others

Experimental Design, Materials and Methods
The dataset [8] presented is quantitative and was collected through a self-administered questionnaire. The questionnaire consisted of two sections, A and B. Section A contained information about the demographic profile of respondents, including gender, marital status, age, level of education, employment status, type of customer, and shopping frequency. For the demographic profile of respondents, the Statistical Package for the Social Sciences (SPSS) was used to analyze the data. Section B included the measurement instruments used for the constructs (empathy, convenience, price sensitivity, physical environment, perceived product quality, customer trust, perceived value, and purchase intention). A non-probability convenience sampling technique was used to obtain data from consumers of grocery stores located in the largest township in South Africa, Soweto, in the city of Johannesburg [9]. Soweto is an acronym for South-Western Townships, and comprises about forty periurban townships [10]. There was no sampling frame to draw from; hence, a convenience approach was more suitable. To increase the response rate, the participants were approached in different settings (in and outside the grocery stores, the comfort of their homes, and the streets of Soweto). The convenience sampling approach therefore enabled the researchers to target specific respondents with crucial information and shopping experience in township-based grocery stores to provide relevant feedback to enrich the data [11]. However, since the research applies a convenience sampling technique, the results can only be generalized to the subpopulation from which the sample was drawn [12]. The targeted sample size was initially 300, and only 281 data points were usable, indicating a response rate of 94%.
Partial Least Squares Structural Equation Modelling (PLS-SEM) can work efficiently with an extensive range of sample sizes, from small (n < 100) to large, indicating that 281 data points are adequate to perform PLS-SEM [13]. SmartPLS 3 software was used to analyze the data. A pilot test was conducted to ascertain the reliability and validity of the measurement instruments in preparation for the full-scale data collection.
Before assessing the measurement model, the common method variance (CMV) was evaluated. Harman's one-factor test was conducted to assess the possibility of common method bias. The results demonstrated that the total variance explained by a single factor was 42.24%, which is below the recommended threshold of 50% [14,15], suggesting that common method bias is not a concern. The first step in PLS-SEM is assessing the reflective measurement model [16]. This begins with the indicator loadings, which should be above 0.70; all loadings were acceptable [16], as shown in Fig. 2 and outlined in Table 3. The second phase is the internal consistency assessment, for which both composite reliability (CR) and Cronbach's alpha should be above 0.70 [17]. As shown in Table 3, all conditions were met. The third phase of the reflective measurement model assessment evaluates the convergent validity of constructs using the average variance extracted (AVE), which should be above 0.50 [14]. As demonstrated in Table 3, all the AVE values were above 0.50 and met the condition. The fourth phase of the reflective measurement model evaluation assesses discriminant validity using the Fornell and Larcker (1981) criterion, the Heterotrait-Monotrait (HTMT) ratio, and cross-loadings, as indicated in Tables 4, 5, and 6, respectively. Discriminant validity was confirmed by all three tests [14,16,18].
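The reliability and validity thresholds described above can be illustrated with a minimal sketch. It assumes standardized indicator loadings and uses the standard textbook formulas for CR and AVE, together with the Fornell and Larcker check that the square root of each construct's AVE exceeds its correlations with other constructs. The loadings and the correlation below are hypothetical illustration values, not figures from the dataset, and the function names are invented for this example; the reported analysis was run in SmartPLS 3.

```python
import math


def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error_variance = sum(1 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def fornell_larcker_ok(ave_by_construct, correlations):
    """True when sqrt(AVE) of both constructs in every pair exceeds their correlation."""
    for (a, b), r in correlations.items():
        smallest_root = min(math.sqrt(ave_by_construct[a]),
                            math.sqrt(ave_by_construct[b]))
        if abs(r) >= smallest_root:
            return False
    return True


# Hypothetical loadings for two constructs (NOT values from the dataset).
loadings = {
    "PV": [0.82, 0.79, 0.85],
    "PI": [0.88, 0.81, 0.84, 0.86],
}
# Hypothetical inter-construct correlation (NOT a value from the dataset).
correlations = {("PV", "PI"): 0.70}

ave = {name: average_variance_extracted(ls) for name, ls in loadings.items()}
cr = {name: composite_reliability(ls) for name, ls in loadings.items()}

for name in loadings:
    print(name, "CR > 0.70:", cr[name] > 0.70, "AVE > 0.50:", ave[name] > 0.50)
print("Fornell-Larcker criterion met:", fornell_larcker_ok(ave, correlations))
```

Applying the same functions to the loadings exported from the measurement model should give values comparable to those in the reliability and validity table, assuming the loadings are standardized.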
To analyze the proposed hypotheses, the structural model was assessed. Before proceeding with the analysis, it is crucial to check for multicollinearity. To this end, the Variance Inflation Factor (VIF) values were evaluated, and all VIF values were below 5, as outlined in Table 3, indicating that there are no issues of collinearity [14]. VIF values greater than or equal to 5 indicate critical collinearity issues [16]. The R² and Q² assessments for perceived value and purchase intention were (0.497; 0.324) and (0.566; 0.519), respectively, and demonstrated a moderate and medium effect [16], as shown in Fig. 3. The Standardized Root Mean Square Residual (SRMR) was also acceptable at 0.077, which is below the recommended threshold of 0.080 [19]. To assess the significance of the hypothesized relationships, bootstrapping was used with a minimum of 5000 subsamples. The direct effects and mediation tests are presented in Tables 7 and 8.
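The idea behind the bootstrapping step can be sketched outside SmartPLS as follows. This is a simplified illustration, not the procedure SmartPLS runs: SmartPLS re-estimates the full PLS path model on every subsample, whereas the sketch resamples a single standardized coefficient between two synthetic score vectors and reports a percentile confidence interval. The data, the seed values, and the function names are assumptions made for this example.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; equals the standardized slope with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the coefficient."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        # Draw n cases with replacement, keeping each (x, y) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic scores standing in for two constructs (NOT the survey data).
data_rng = random.Random(42)
perceived_value = [data_rng.gauss(0, 1) for _ in range(100)]
purchase_intention = [0.8 * v + data_rng.gauss(0, 0.6) for v in perceived_value]

estimate = pearson_r(perceived_value, purchase_intention)
lower, upper = bootstrap_ci(perceived_value, purchase_intention)
print("coefficient:", round(estimate, 3))
print("95% percentile interval:", round(lower, 3), "to", round(upper, 3))
print("significant at 5% (interval excludes zero):", lower > 0 or upper < 0)
```

A relationship is treated as significant in this sketch when the interval excludes zero, which mirrors how bootstrap confidence intervals are commonly read in PLS-SEM reporting.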

Ethics Statement
Informed participant consent was obtained. The participants were informed that participation was voluntary and that they could withdraw from the survey at any point. Anonymity was also guaranteed, as no personally identifiable information was requested. The School of Business Sciences ethics committee (Wits University) granted ethics clearance under protocol number CBUSE/1270.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships, which have, or could be perceived to have, influenced the work reported in this article.

Data Availability
Subsistence Retail Consumer Data (Original data) (Mendeley Data).