Exploring the Relationship between Locational and Household Characteristics and E-Commerce Home Delivery Demand

Abstract: The rapid growth in online shopping and associated parcel deliveries prompts investigation of the factors that contribute to parcel delivery demand. In this study, we evaluated the influence of locational and household characteristics on e-commerce home delivery demand. While past research has largely focused on the impacts of the adoption of online shopping using individual/household survey data, we made use of data from an e-commerce carrier. A linear regression model was estimated considering factors such as degree of urbanization, transit and shopping accessibility, and household attributes. The results both confirm and contradict prior research findings, highlighting the potential for a non-negligible influence of the local context on demand for parcel deliveries.


Introduction
Online shopping plays an increasing role in people's daily lives, changing the shopping habits of consumers [1]. One of its key aspects is the flexibility of receiving the goods at a specified location: a parcel can be shipped directly to the receiver's home or workplace, or picked up at a parcel station or brick-and-mortar store. In the US, the annual growth rate of online shopping ranged from 13% to 16% between 2013 and 2018, outpacing the 1% to 5% annual growth in traditional retail sales during the same period [2]. During the COVID-19 pandemic, consumers purchased more goods online, ranging from food to fitness equipment [3]. According to a survey, 56 percent of the respondents stated that they purchased more online during the COVID-19 pandemic in Singapore, with online sales in June 2020 increasing to 151.2% year-on-year [4]. The increase in online shopping volume leads to growth in demand for parcel delivery services [5], which in turn can increase goods vehicle trips. It is not clear whether the increase in online shopping decreases the number of trips by shoppers for on-site shopping [6,7]. Given these circumstances, it is important to understand the characteristics of neighborhoods that are significantly more likely than others to adopt e-commerce home deliveries. Developing policy measures to mitigate the traffic and parking impacts of deliveries requires a proper understanding of the areas where demand is higher than elsewhere.
The adoption of e-commerce home delivery (and online shopping in general) depends on various factors. While some factors have been widely studied, such as the attributes of online shopping websites/applications as well as demographic and personal characteristics (e.g., "tech savviness") [8-11], the effects of locational factors such as accessibility to shopping opportunities remain poorly understood, as little empirical work has been conducted [12]. Most research uses data collected at the individual or household level, which is costly to obtain. Few researchers have explored the use of data from other sources such as operational data from shippers or carriers (e.g., delivery records).
In this study, we investigate whether locational and household characteristics impact e-commerce home delivery demand by using last-mile delivery records as a proxy. While there are multiple options for consumers to receive goods purchased from e-commerce vendors (such as home deliveries, in-store pickup, and pickup from lockers), we focused on home delivery options given its currently predominant role. The paper is organized as follows. The next section summarizes the past studies on the relationships between locational and household/individual characteristics and parcel delivery demand. The third section provides details on the method and data used for the analysis. The fourth section presents the findings and compares them with the findings from past research, and the last section concludes the research.

Impacts of Accessibility and Urbanization Level on Online Shopping Adoption
A number of past studies focused on the determinants of online shopping adoption. Some studies report the negative correlation between accessibility to commerce and online shopping demand (i.e., greater accessibility is associated with less online shopping adoption). Huang and Oppewal [13] conducted a survey and collected a convenience sample of 152 supermarket shoppers in South England. In the survey, two hypothetical grocery shopping scenarios were presented to each respondent. Each respondent was asked to indicate her/his preference for online or on-site shopping in each scenario. The information about their last grocery shopping trips was also collected. The analysis indicates that time saving is the major motivation to shop online. Loo and Wang [14] investigated the characteristics and patterns of home-based e-working and online shopping using the data collected from a household survey conducted in Nanjing, China. The results obtained from the estimated binary logit regression and ordered logit regression models indicate that geographical accessibility variables such as the distance to the nearest subway station and the distance to the nearest shopping opportunity significantly account for the variability of the amount of time spent on online shopping. Those with less accessibility to public transport or shopping opportunities tend to spend more time shopping online. On the other hand, research by Farag et al. [15] highlights the mixed effects of accessibility on shopping opportunities. They collected data from the cities of Utrecht, Netherlands, and Minneapolis in Minnesota (US) through two independently administered surveys. Travel time to shops was compared for daily and non-daily goods among online and on-site buyers. For the US case the results were contrary to expectation, with travel time to shops having no effect on online shopping; however, in the Dutch case, shoppers with shorter travel time shopped significantly more online. 
This could be due to the difference in lifestyle between suburban and urban residents. In the Dutch case, people who live in urban areas have greater accessibility to shopping opportunities; however, they also adopt innovations such as online shopping earlier compared to suburban residents. Farag et al. [16] developed a structural equation model using the data collected from the residents of four municipalities (one urban, three suburban) in the Netherlands. They found that urban residents shop online more often than suburban residents and argue that internet speed plays a role in online shopping adoption. They also found that the more shopping opportunities one can reach within 10 min by bicycle, the less often one searches online. On the other hand, Weltevreden and van Rietbergen [17] analyzed the influence of the attractiveness of city centers on the relationship between on-site and online shopping using the data of 3200 internet users in the Netherlands. They found that convenience and accessibility to shopping opportunities in city centers discourage online shopping.
More recent research attempts to account for both urbanization and accessibility effects. Zhou and Wang [18] explored the relationship between online shopping and onsite shopping trips in the US. They estimated a structural equation model using the 2009 National Household Travel Survey (NHTS) data and found that region-specific factors such as urbanization and population density contribute to the increase of both online shopping and on-site shopping trips. Cao [12] reviewed the results of several studies considering the following two hypotheses. The first is that people who live in urban areas are more likely to shop online, and the second is that the residential areas with poorer accessibility to shopping opportunities have greater online shopping demand. The review indicates that the results of some studies support the first hypothesis, some agree with the second, and others do not deliver definite results with respect to the adoption or frequency of online shopping.

Impacts of Individual and Household Characteristics on Online Shopping Adoption and Frequency
Past research also focuses on the impacts of individual and household characteristics on online shopping adoption, among which age, gender, and income are the most frequently considered factors. Gong et al. [19] conducted a hierarchical regression analysis to find impact factors of online shopping, using the data collected through a nationwide online survey among 503 Chinese consumers. They found that online shoppers tend to be younger and have higher incomes. Garín-Muñoz et al. [20] applied a logistic regression model using the data collected from the 2016 Survey on Equipment and Use of Information and Communication Technologies in Households, conducted by the Spanish National Statistical Institute. They concluded that individuals aged 35-44 have the greatest probability of adopting online shopping.
As for gender, Hasan [21] shows that males value the utility of online shopping more than females, while Hou and Elliott [22] claim that females are more likely to be impulsive buyers than males. On the other hand, Dittmar et al. [23] found that gender has no impact on the adoption of online shopping. Zhou and Wang [18] found that larger households shop less online. However, Hernández et al. [24] concluded that a series of demographic variables have neither influence on the use of the internet nor in the adoption of online shopping.

Research Gap and Expected Contribution
The results from the previous studies are mixed and context dependent. Our hypothesis is that urban form and lifestyle have influences on the relationship between generalized locational factors (such as density) and online shopping adoption. Knowledge from more cities is required to obtain generalizable insights. Past case studies have been conducted only for a limited number of cities and very few from Asian cities. Thus, we present a case study using data from Singapore, an Asian city-state known for successful transit-oriented development (TOD) [25], significant urban density but some diversity in urban form (core city center, suburbs, denser new towns, etc.) and high online shopping penetration rate. Furthermore, the majority of the past research used survey data of individuals or households to study the impacts of various factors on the adoption of online shopping. In this research, we used an e-commerce carrier's parcel delivery data to homes to understand the driving factors of e-commerce-driven parcel home delivery demand.

Overview
We estimated a linear regression model to reveal the relationship between locational and household characteristics and online shopping delivery demand in Singapore, using the data from an anonymous carrier. The carrier is, hereafter, referred to as Company A (CA). CA is a delivery company providing delivery services for businesses of all sizes across Southeast Asia. It is one of the largest and fastest growing last-mile logistics companies with wide spatial coverage, serving mostly online-shopping-triggered parcel deliveries in Singapore. We assumed, considering these characteristics of CA, and after analyzing the data, that the spatial bias of the delivery records is limited.
Singapore has a population of 5.6 million and a total area of 725.1 km², resulting in a high population density of 7804 people/km². The total GDP of Singapore was USD 364 billion in 2018. The smartphone penetration rate reached 76%. Singapore is a highly urbanized country with high adoption of online shopping. The national 2017/2018 Household Expenditure Survey [26] reports that about 60% of households have made purchases online, with online shopping expenditure at around 5% of average household total expenditure excluding housing.
We focused on e-commerce generated parcel deliveries to households regardless of shopping channels. Homes were assumed to have received the bulk of e-commerce parcel deliveries, while acknowledging deliveries to the workplace are increasingly common, as well as to pickup points such as lockers or brick-and-mortar stores. Instead of deliveries to households, a level of resolution that was unavailable in our data, we used the building as the unit of analysis according to the delivery location records.

Data
The data from CA are for parcel deliveries performed in 2019, over a period of three months (from 2 February to 30 April), which is before the COVID-19 pandemic. The delivery records contain information such as delivery destination, time, and destination type (residential or commercial) as well as parcel size and commodity type. The data include 1,757,280 delivery records. Based on the data, the majority of goods delivered to residential buildings by CA are electronics, fashion products, or cosmetics and personal care products.
We used the number of deliveries per household in each residential building as the dependent variable, leveraging available household-level data. The variable is defined by Equation (1):
$$\mathit{demand}_s = \frac{\mathit{num\_del}_s}{\mathit{num\_hh}_s} \quad (1)$$
where $\mathit{demand}_s$ is the delivery demand per household over three months in residential building $s$; $\mathit{num\_del}_s$ is the number of deliveries received in residential building $s$ over the three-month period; and $\mathit{num\_hh}_s$ is the number of households in residential building $s$.
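As a minimal sketch of Equation (1), the snippet below counts deliveries per building and divides by the number of households. The record layout (one building ID per delivery), the function name, and the toy values are hypothetical and are not taken from the carrier data.

```python
from collections import Counter


def deliveries_per_household(delivery_building_ids, households_by_building):
    """Return demand_s = num_del_s / num_hh_s for every building with households."""
    # Number of deliveries received by each building over the study period.
    num_del = Counter(delivery_building_ids)
    demand = {}
    for building, num_hh in households_by_building.items():
        if num_hh > 0:  # skip buildings without households to avoid division by zero
            demand[building] = num_del.get(building, 0) / num_hh
    return demand


# Hypothetical toy data: one building ID per residential delivery record.
records = ["B1", "B1", "B2", "B1", "B3", "B2"]
households = {"B1": 2, "B2": 4, "B3": 1, "B4": 10}
demand = deliveries_per_household(records, households)
print(demand)
```

Buildings that received no deliveries in the period (here `B4`) enter with a demand of zero rather than being dropped, which is one possible design choice and is an assumption of this sketch.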

Urbanization Level and Accessibility
Most past studies consider urban/suburban classification and accessibility as key spatial characteristics [15-17]. We considered population density as the proxy of urbanization level. Furthermore, we considered the average age of residential buildings as the proxy of the development year of the neighborhoods. We used the accessibility to shopping malls as the indicator of retail accessibility. This indicator is justified since, in Singapore, the density of shopping malls is exceptionally high, especially in the CBD area (Figure 1a), and the majority of commodities delivered by CA are available in shopping malls. Aside from shopping malls, street shops are also available retail opportunities. However, most street shops are located close to bus stops and mass transit hubs, which are already considered in the accessibility to public transport variable. Therefore, street shops are not included in the indicator for accessibility to retail. In the literature, the number of shopping opportunities within a buffer is commonly used as the indicator for shopping accessibility [7,15,16,27], which cannot take into account the impacts of distant shopping opportunities. In the case of shopping malls, people go shopping not only in the malls that are near their home but also in those far away. To capture the impacts of all shopping malls in the whole case study area, we used the network accessibility of shopping malls as the indicator, which is normally not calculated at the building level [28]. In our case, the accessibility to shopping malls was calculated at the subzone level (subzones are a zoning system adopted by public agencies in Singapore) using Equation (2), where $N_s$ is the number of shopping malls in subzone $s$ and $D_{i,s}$ is the network distance between subzone $i$ and subzone $s$.
The distance decay factor $\mu$ is 0.5, which gave the best result in terms of $R^2$ among our tests using several different values of $\mu$.
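To illustrate how a network accessibility indicator with a distance decay factor can be computed, the sketch below uses a gravity-type measure with exponential decay, acc_i = sum over s of N_s * exp(-mu * D_{i,s}). The exponential form, the distance unit (km), the zero intra-zonal distance, and all names and values are assumptions made for illustration; the functional form used in the study may differ.

```python
import math


def mall_accessibility(malls_per_subzone, network_distance, mu=0.5):
    """Gravity-type accessibility of each origin subzone i to malls in all subzones s.

    Assumed form (illustrative only): acc_i = sum_s N_s * exp(-mu * D_{i,s}).
    """
    acc = {}
    for i, distances_from_i in network_distance.items():
        acc[i] = sum(
            n_s * math.exp(-mu * distances_from_i[s])
            for s, n_s in malls_per_subzone.items()
        )
    return acc


# Hypothetical toy network: three subzones, distances in km (assumed unit).
malls = {"A": 2, "B": 0, "C": 1}
dist_km = {
    "A": {"A": 0.0, "B": 2.0, "C": 4.0},
    "B": {"A": 2.0, "B": 0.0, "C": 2.0},
    "C": {"A": 4.0, "B": 2.0, "C": 0.0},
}
acc = mall_accessibility(malls, dist_km, mu=0.5)
for subzone, value in acc.items():
    print(subzone, round(value, 3))
```

Unlike a buffer count, every mall contributes to every subzone's score, with distant malls weighted down rather than excluded.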
We also calculated accessibility to public transport based on London's public transport accessibility level (PTAL) methodology for each residential building. Following the calculation steps in Transport Impact Assessment Guidelines for Developments [29], we obtained the PTAL score for every residential building using the following data:
• A walking network in the vicinity (up to 800 m walking distance) of the point of interest (POI). Each residential building is regarded as a POI. This is to calculate the walk time from the POI to all relevant public transport service access points (SAPs), i.e., bus stops and mass transit (MRT/LRT) station entrances.
• Location of all relevant SAPs, i.e., those within a walking distance from the POI of 400 m for bus stops or light rapid transit (LRT) entrances and 800 m for mass rapid transit (MRT).
• Service frequency of all public transport services at the relevant SAPs.
Figure 1 shows the spatial density distribution of shopping malls, bus stops, and mass transit (MRT) stations in Singapore. The figure confirms that Singapore has good public transit coverage.
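The PTAL inputs listed above (walk distances to SAPs and service frequencies) combine into an access index roughly as sketched below. The parameter values follow the published London method as we understand it (80 m/min walk speed, reliability allowances of 2 min for buses and 0.75 min for rail, a 30 min reference for the equivalent doorstep frequency, and a 0.5 weight for all but the best route per mode). Treating them as applicable to the Singapore guidelines is an assumption, as are the function name and the toy services.

```python
WALK_SPEED_M_PER_MIN = 80.0  # 4.8 km/h, as in the London method (assumed to apply)
RELIABILITY_MIN = {"bus": 2.0, "lrt": 0.75, "mrt": 0.75}  # assumed allowances
MAX_WALK_M = {"bus": 400.0, "lrt": 400.0, "mrt": 800.0}  # catchments from the text


def access_index(services):
    """Access index for one POI.

    services: list of (mode, walk_distance_m, frequency_per_hour), one entry per
    route at its nearest service access point (SAP).
    """
    edf_by_mode = {}
    for mode, walk_m, freq in services:
        if walk_m > MAX_WALK_M[mode] or freq <= 0:
            continue  # SAP outside the catchment, or no service
        walk_time = walk_m / WALK_SPEED_M_PER_MIN
        # Scheduled waiting time (half the headway) plus a reliability allowance.
        avg_wait = 0.5 * 60.0 / freq + RELIABILITY_MIN[mode]
        # Equivalent doorstep frequency for this route.
        edf = 30.0 / (walk_time + avg_wait)
        edf_by_mode.setdefault(mode, []).append(edf)
    total = 0.0
    for edfs in edf_by_mode.values():
        best = max(edfs)
        # The best route counts fully; the remaining routes of the mode count half.
        total += best + 0.5 * (sum(edfs) - best)
    return total


# Hypothetical building: two bus routes, one MRT line, one bus stop out of range.
services = [("bus", 160, 10), ("bus", 320, 6), ("mrt", 640, 12), ("bus", 500, 20)]
ai = access_index(services)
print(round(ai, 3))
```

In the London method the resulting index is then mapped to PTAL bands; that banding step is omitted here.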

Household Characteristics
We considered some household characteristics as important factors in the adoption of online shopping [18,19]. In this study, the average values of household income and size were computed for each zone. As for age of household head, the percentages of three categories were calculated for each zone: below 35, between 35 and 55, and above 55. We expected the variables associated with age to reflect the difficulty in using online shopping for elderly people. The details of the data source for calculating these variables are explained in the next section.
Car ownership is also expected to play an important role in shopping behavior. For example, Farag et al. [16] and Ren and Kwan [27] consider car ownership as an explanatory variable. Weltevreden and van Rietbergen [17] estimated separate models for car owners and other transportation mode users. In our case, vehicle ownership is included as household vehicle ownership rate (Equation (3)):
$$\mathit{veh\_rate}_s = \frac{\mathit{hh\_veh}_s}{\mathit{hh\_tol}_s} \quad (3)$$
where $\mathit{veh\_rate}_s$ is the personal vehicle ownership rate in residential building $s$; $\mathit{hh\_veh}_s$ is the number of households that own personal vehicles in residential building $s$; and $\mathit{hh\_tol}_s$ is the number of households in residential building $s$.

Housing Type
Another variable we considered is housing type, which is a context-specific variable. In Singapore, there are three main housing types, namely public housing (public), private housing (condominium), and landed properties (houses). Public housing is built and managed by the Housing and Development Board (HDB) and associated with 99-year leaseholds. Since these flats are subsidized and regulated by the government, they are typically cheaper than condominiums and houses. Condominiums are developed and owned by private property companies. Houses are usually tied to the land title and are mostly freehold. We therefore controlled for housing type in our model. The base category is "other types", which includes all residential buildings that do not belong to the three main categories (for example, construction worker and work permit holder dormitory housing). In total, there are 53,974 residential buildings in Singapore. Table 1 provides information about the differences among residential building types in terms of features such as household size and the number of households in each building. It also illustrates that the majority of households live in public housing. Furthermore, public housing buildings are in general larger than condominiums and houses.

Source of Explanatory Variables
Household-level data were obtained from a synthetic population, originally estimated for use in SimMobility, an urban simulation platform developed for various cities, including Singapore [30,31]. The synthetic population was generated using a two-stage population synthesis approach. In the first stage, a general iterative proportional fitting (IPF) method was applied to estimate the joint distribution of individual and household characteristics considering multiple levels of constraints. In the second stage, a second IPF procedure was used to estimate spatial patterns of housing and household characteristics with additional building information at the building level. In the evaluation stage, a number of important household and individual attributes, including dwelling type, household income, household size and number of workers, gender, and age groups, were tested at the planning district, traffic analysis zone, and building levels. The test results show that the proposed two-step IPF-based approach provided better fits than traditional population synthesis methods. The data used in the approach were collected from multiple sources such as the Singapore Census and the Household Interview Travel Survey (HITS). Additional details on the method are available in Zhu et al. [32]. The data required to calculate locational characteristics were collected from OpenStreetMap.
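For readers unfamiliar with IPF, the following is a minimal two-dimensional sketch of the basic technique: a seed table is alternately rescaled until its row and column totals match known marginals. It is not the two-stage, multi-level procedure of Zhu et al. [32]; the attribute names, seed values, and marginals are hypothetical.

```python
def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Fit a 2-D table to row and column marginals by iterative proportional fitting."""
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iter):
        # Step 1: scale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Step 2: scale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows also match.
        row_error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical seed (e.g., from a survey sample): rows = household size class,
# columns = income class. Marginals (e.g., from a census) must share the same total.
seed = [[1.0, 2.0], [3.0, 4.0]]
row_targets = [40.0, 60.0]
col_targets = [50.0, 50.0]
fitted = ipf(seed, row_targets, col_targets)
for row in fitted:
    print([round(v, 2) for v in row])
```

The fitted table keeps the association pattern of the seed while matching both sets of marginals, which is why the method suits population synthesis from a small sample plus aggregate controls.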

Summary of Variables
The explanatory variables included in the analysis are the average household income (hh_income); the average household size (hh_size); the shares of household heads aged below 35 (less_than35_p), between 35 and 55 (from35to55_p), and above 55 (more_than55_p); household vehicle ownership rate (veh_rate); residential building age (res_building_age); population density (pop_dens); accessibility to shopping malls (acc_mall); and accessibility to public transport (acc_pt). Table 2 shows the summary statistics of all variables. Figures 2 and 3 show plots of parcel delivery demand and explanatory variables. Based on the figures, parcel delivery demand has a high coverage but low variability. To reduce skewness, the dependent variable and independent variables were log transformed. The values of explanatory variables were also standardized to compare their importance in the model.
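The log transformation and standardization described above can be sketched as follows. The snippet assumes strictly positive values and a natural logarithm, and it uses the sample standard deviation; how zeros were handled in the study is not stated, so that aspect is left out. The variable values are hypothetical.

```python
import math
import statistics


def log_standardize(values):
    """Log-transform a positive-valued variable, then convert it to z-scores."""
    logged = [math.log(v) for v in values]  # assumes every value is > 0
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)  # sample standard deviation (n - 1)
    return [(x - mean) / sd for x in logged]


# Hypothetical population densities (people per square km) for four buildings.
pop_dens = [1200.0, 8000.0, 15000.0, 40000.0]
z = log_standardize(pop_dens)
print([round(v, 3) for v in z])
```

After this step each explanatory variable has mean 0 and standard deviation 1, so the magnitudes of the estimated coefficients can be compared directly.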

Multicollinearity Analysis
Prior to the regression analysis, we conducted the multicollinearity analysis to identify if there is any multicollinearity in the explanatory variables. Table 3 provides the variance inflation factors (VIFs) of the variables. It indicates that there is multicollinearity associated with res_building_age, hh_size, hh_income, and less_than35_p. It must be noted that variables in the spatial data tend to have high correlations with one another. While the potential effects of multicollinearity must be taken into account, we included all variables with the aim of controlling the effect of each variable.
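As an illustration of how VIFs can be obtained, the sketch below uses the identity that the VIF of each variable equals the corresponding diagonal element of the inverse of the correlation matrix of the explanatory variables. The helper names and toy columns are hypothetical, and the snippet does not reproduce the values in Table 3.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Hypothetical columns: x1 and x2 are correlated (r = 0.8); x3 is uncorrelated with both.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
x3 = [4.0, -1.0, -6.0, -1.0, 4.0]
factors = vif([x1, x2, x3])
print([round(v, 3) for v in factors])
```

A VIF of 1 means a variable is uncorrelated with the others; larger values indicate that its coefficient estimate is inflated by collinearity.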

Findings and Comparisons with the Past Research
The results of the regression model are shown in Table 4 (significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05). The adjusted $R^2$ is 0.5. We suspect the residuals could be further reduced if data were available that explain the need for and the accessibility to on-site/online shopping more directly, e.g., the available time for shopping and the accessibility to required goods on-site and online.
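For reference, a model of the kind described can be written as below. This is a sketch under stated assumptions rather than the exact estimated specification: $z_{k,s}$ denotes the log-transformed, standardized explanatory variable $k$ for building $s$, $\mathit{type}_{h,s}$ are housing-type dummies with "other types" as the base category, and the symbols $\beta$, $\gamma$, and $\varepsilon$ are introduced here for illustration. The second line is the standard definition of the adjusted $R^2$ for $n$ observations and $p$ regressors.

```latex
\begin{align}
\ln(\mathit{demand}_s) &= \beta_0 + \sum_{k} \beta_k z_{k,s}
  + \sum_{h} \gamma_h \,\mathit{type}_{h,s} + \varepsilon_s \\
\bar{R}^2 &= 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
\end{align}
```

Because the adjusted $R^2$ penalizes additional regressors, a value of 0.5 indicates that roughly half of the variance in log demand is accounted for after correcting for model size.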
The results illustrate that higher population density is associated with a higher number of parcel deliveries. This is in line with the results from Zhou and Wang [18] and Farag et al. [16], who also indicated that higher population density leads to more online shopping. The coefficient of building age is negative, indicating that more recent developments are associated with more parcel deliveries. This result could be due to differences in both the built environment and the lifestyle of residents between older and newly developed neighborhoods, which require further investigation. For example, newer neighborhoods are anecdotally known for having fewer street retail stores.
The accessibility to shopping malls has a negative impact on delivery demand, similar to the results of Weltevreden and van Rietbergen [17], who also found that greater convenience and accessibility of shopping lead to a lower propensity to shop online. As for accessibility to public transport, higher accessibility is associated with lower parcel delivery demand. However, the effect is small, possibly because, in a transit-heavy city like Singapore, reaching a transit boarding point cannot be equated with a fast trip to the desired location.
Among the household variables, household income (hh_income) has a positive relationship with parcel delivery demand. This result is in line with findings in Zhou and Wang [18] and Ferrell [33]. On the other hand, in contrast to Zhou and Wang [18], household size (hh_size) also has a positive impact. Furthermore, contrary to prior findings [16,18,33-35], age has a limited impact on parcel delivery demand, which may be due to considerable heterogeneity at the selected building level or the widespread use of the internet in Singapore. The positive effect of vehicle ownership rate corresponds to the finding of Weltevreden and van Rietbergen [17] that car ownership increases online shopping. A possible explanation is that people who own private cars use online shopping as an additional time-saving strategy. In Singapore, owning a car is typically associated with households in higher income brackets.
Housing type is the most significant variable in the model, which illustrates that people living in public housing shop online more than people living in condominiums and houses. Public housing may have better accessibility to final delivery facilities such as parcel lockers and collection and delivery points, which has a positive effect on online shopping frequency [36]. However, such data are not available to us, and the exclusion of an indicator such as accessibility to final delivery facilities may therefore cause omitted variable bias in the coefficient of housing type, which needs further research.

Conclusions
The adoption of online shopping is still in progress, and an understanding of the impacts of such lifestyle change is critical for transport and land use planning. In this research, the records of parcel deliveries to households from a carrier were used to explore the relationship between locational/household characteristics and online-shopping delivery demand, in Singapore. We identified a set of key variables from the literature and estimated a regression model. In summary, model-derived findings both support and challenge past research. The estimated model shows that the parcel delivery demand can be explained by considering locational and household characteristics. Besides housing type, household size and building age are highly significant variables to explain online-shopping-driven parcel deliveries even though other important factors, such as accessibility and age, are controlled. Thus, to understand the underlying mechanism of such a connection between the building-level factors and delivery demand, further research is required. While the model has limitations, e.g., the accessibility to non-mall shops is not explicitly considered, the estimated coefficients add insights to the past research findings. Furthermore, the research provided a case study from a dense Asian city that has diversity in urban form and an active online shopping culture, adding to the existing pool of regional studies.
This research also demonstrates the potential of carrier data as an alternative to the data from tailored surveys. However, this does not undermine the value of surveys. While the options for receiving deliveries have become increasingly varied (offices, parcel lockers, and brick-and-mortar stores are possible destinations), the data used in this research cannot connect the delivery demand and receiver attributes, particularly if the deliveries are to non-residential locations. The methods to collect data related to online shopping and the associated freight traffic have to be further explored to provide a basis for better answering the remaining questions.
Data Availability Statement: Restrictions apply to the availability of these data. Data were obtained from a freight carrier wishing to remain anonymous and were only available for the duration of the research project funded by SUTD-MIT International Design Centre, grant number IDG21800101.