How is ICT use linked to household transport expenditure? A cross-national macro analysis of the influence of home broadband access

Understanding of the interactions between Information and Communication Technologies (ICT) and physical mobility is a major area of research with practical applications in a number of fields. Very little, however, is known regarding how these relationships vary on a cross-national basis, including across countries at different stages in development. To address this gap, this paper presents an analysis of household transport expenditure as a function of the available variables, with a particular focus on ICT. This analysis is based on a cross-sectional dataset from 2010 comprising information on 33 countries including average household transport expenditure, ICT represented by the percentage of households with Internet access at home, and a number of contextual macroeconomic and infrastructural variables. Using a log-log framework we find that, in our sample of countries, household transport expenditure is negatively associated with Internet penetration with an elasticity of −0.394. We verify this to be robust to endogeneity using the presence of restrictions on foreign ownership in the Internet market as an instrumental variable. We also control for potential differences in data quality across countries using the Corruption Perceptions Index. To the best of our knowledge, this is the first attempt to quantify this relationship at a cross-national level while also controlling for endogeneity and data quality issues. Among the control variables, we observe the estimated effects to be intuitive, and consistent with existing research and microeconomic understandings of the behaviour of individuals and households.


Introduction
Information and Communication Technologies (ICT) have become an inescapable part of everyday life in recent decades. Since 2000 the global penetration of the Internet has grown from 6.5% to 43% of the population. The penetration of mobile broadband,¹ currently the most dynamic market segment, has increased twelve-fold in the last 8 years to reach 47% in 2015 (ITU, 2015). Access to the Internet is increasingly considered a basic amenity that must be provided to citizens given its role in enabling access to information and knowledge, and in exercising the freedom of opinion (United Nations, 2011). In June 2016 the United Nations Human Rights Council passed a nonbinding resolution which 'affirms that the same rights that people have offline must also be protected online' (UNHRC, 2016, p. 3) thus reflecting the crucial and still growing role of the Internet in meeting various human needs.
While the ICT revolution began in the developed countries, its current dynamism is increasingly fuelled by conditions in the developing countries. As personal ICT devices became more affordable and reliant on mobile and wireless technology (cellular, Wi-Fi), developing countries have escaped the considerable costs of wired infrastructure, frequently prohibitively costly in remote areas with challenging topology and a small user base. The unlocked mass ICT adoption is reflected in the fact that 2 of the 3.2 billion people using the Internet in 2015 come from the developing countries (ITU, 2015). Dutta and Mia (2011) noted the paradoxical existence of areas where connectivity is more available than clean water, sanitation or reliable electricity.
The connectivity enabled by ICT can have profound implications for societies and their welfare by enabling remote access to activities and services, thereby influencing mobility patterns, market relations, social norms, and even spatial forms of settlements (Andreev et al., 2010; Gaspar and Glaeser, 1998; Golob and Regan, 2001; Kim, 2016; Litman, 2006; Pawlak et al., 2015, 2014; Preston and Rajé, 2007; Rhee, 2009; Rietveld and Vickerman, 2003; Salomon, 1986). The persistent problem with this body of research, however, has been the focus on comparatively narrow contexts: single samples drawn from specific areas (countries, cities) or socioeconomic groups. Rarely have studies embarked upon rigorous cross-national or macro (countries or regions as units of analysis) analyses (Koski, 1997). Hence very little is known regarding how these relationships vary on a cross-national basis, including across countries at different stages in development.
We address the existing gap in research by analysing the relationships between ICT and transport from a cross-national macro perspective in which we look at relationships across countries, i.e. with countries as the unit of analysis and variables describing the aggregate values specific to those countries.² To date such analysis has been absent from the field due to a lack of suitable harmonised transport-related (personal travel) data variables. What is available, however, is the data on household transport expenditure, collected by national statistical offices and aggregated by the United Nations (2016a, 2016b). These data have previously proved useful in studies on the relationships between ICT and travel behaviour, e.g. Choo et al. (2005) or Selvanathan and Selvanathan (1994). Transport expenditure levels serve as indicators of the economic burden of transport as well as of the mobility of the individuals. Both are important for welfare and social equity, especially in the developing countries (Diaz-Olvera et al., 2008; Venter and Behrens, 2005). Here, we enrich the cross-sectional transport expenditure dataset for the year 2010 with data on the percentage of households with Internet access at home,³ reflecting the diffusion of ICT, and a number of contextual macroeconomic and infrastructural variables. We analyse the compiled dataset within a log-log framework to derive corresponding transport expenditure elasticities with respect to the analysed variables, thus facilitating comparisons across them. We also control for potential endogeneity in the form of reverse causality and for data reliability issues, thus enabling more robust conclusions despite potential data limitations.
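To illustrate how an elasticity from a log-log framework can be read, the following minimal sketch converts an elasticity into the percentage change in the outcome implied by a given percentage change in a regressor. The function name and the 10% change are illustrative assumptions; only the elasticity of −0.394 comes from the abstract.

```python
def pct_change_in_outcome(elasticity: float, pct_change_in_x: float) -> float:
    """Percentage change in y implied by ln(y) = a + elasticity * ln(x)."""
    # In a log-log model, y scales with x raised to the power of the elasticity.
    ratio = (1.0 + pct_change_in_x / 100.0) ** elasticity
    return (ratio - 1.0) * 100.0


# Elasticity as reported in the abstract; the 10% increase in Internet
# penetration is a hypothetical value chosen purely for illustration.
change = pct_change_in_outcome(-0.394, 10.0)
print(f"Implied change in household transport expenditure: {change:.2f}%")
```

Under these assumptions, a 10% higher Internet penetration is associated with household transport expenditure that is lower by roughly 3.7%, holding the other variables constant.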
We argue that this approach contributes evidence for how patterns of mobility and its cost across countries may be varying as a result of the level of digitisation of societies. This is particularly important in contexts where the costs of mobility constitute a substantial economic burden or even prevent access to opportunities and services. We acknowledge, however, that the meaning of ICT can be much broader than Internet access at home.⁴ Nonetheless, we also note that home Internet access is an important pre-condition for the wider ICT adoption and digitisation (Goldfarb and Prince, 2008). If then promoting ICT can serve as a mechanism of improving access to spatially separated activities, perhaps measures seeking to accelerate the digitisation of societies and reduce the 'digital divide' (Norris, 2001) can indeed prove effective in increasing social welfare. Finally, our macro perspective contributes to the already rich domain of ICT and travel research which to date has tended to utilise disaggregate data.
This paper is structured in the following way. Section 2 explores the literature discussing cross-national research on ICT and travel behaviour, and factors influencing transport expenditure across the world. Section 3 discusses the research design, including the compilation of the dataset and the analytical tools employed. Section 4 explores the findings, while Section 5 concludes the study, highlighting potential avenues for future research.

Existing studies
There are two key streams of research of particular relevance to the present analysis: studies investigating relationships between ICT use and travel behaviour across countries, and studies looking at factors determining household expenditure on transport. Below we present a brief overview of such studies, highlighting the gap our analysis addresses.

Cross-national studies on interactions between ICT and physical mobility
From the perspective of implications for transport systems, ICT have long been seen as a means of enabling activity participation without the need for physical travel. In this way, ICT have been expanding the conventional spatio-temporal constraints implied by geography and physics, as reflected by the Hägerstrandian daily prisms (Hägerstrand, 1970). Hägerstrand's theory accounted for the possibility of remote interaction by means of telephony as early as the 1960s and 1970s. It was only the seminal contributions from Salomon and Mokhtarian from the late 1980s and early 1990s, however, which provided a theoretical basis for a whole set of possible interactions between ICT and travel behaviour (Mokhtarian, 1990; Salomon, 1986). These interactions include substitution, complementarity, and modification, as well as second-order effects, e.g. cross-sectoral implications such as land use, spatial form of settlements, energy consumption, changes in social norms, preferences, or market relations (Audirac, 2005; Andreev et al., 2010; Cohen-Blankshtain and Rotem-Mindali, 2016; Gaspar and Glaeser, 1998; Jung and Pawlowski, 2014; Middleton and Cukier, 2006; Poster, 2007; Rhee, 2009; Rietveld and Vickerman, 2003). Naturally, the latter interactions tend to be difficult to capture due to their entwinement and occurrence over longer time scales. Rietveld and Vickerman (2003) attributed the recognition of ICT as a potential means of improving accessibility to the emergence of a suitable policy setting due to the shift towards demand management policies and the understanding of transport as a transaction cost in the 1990s. This was also reflected by Preston and Rajé (2007), who identified ICT as having the potential to combat social exclusion through lower-cost virtual mobility.
Rietveld and Vickerman (2003) further noted that ICT could enable more activities to be completed in a given time, although they also pointed out, as a response to the claims of the 'death of distance', that it was not the only factor influencing interactions across space. Similarly, Gaspar and Glaeser (1998) challenged futurist hypotheses which suggested that ICT proliferation would lead to the 'end of cities'. They developed a model of evolution of a city in which individuals could interact, with differing intensity, either face-to-face or via telephone (early ICT). The model demonstrated that the aforementioned hypothesis was superficial as it did not account for the simultaneous presence of the ICT-travel substitution and agglomeration effects. Whilst the former enables escaping from the need for physical proximity and face-to-face meetings, the latter explicitly favours it. Hence evolution of a city and the associated changes in mobility will be the net effect of the interactions of these two forces, driven in turn by the relative quality of meetings face-to-face and via telephony.
In addition to the theoretical contributions above, numerous empirical studies looking at interactions between ICT and travel behaviour have been produced since Salomon and Mokhtarian's studies. A series of reviews (Andreev et al., 2010; Cohen-Blankshtain and Rotem-Mindali, 2016; Golob and Regan, 2001; Pawlak et al., 2015, 2014; Salomon, 1986) summarise the field comprehensively. Most studies conducted to date, however, have relied on single samples collected in specific contexts whilst cross-national efforts have remained rare. This gap has remained largely unaddressed, and hence the two-decades-old call from Hodge and Koski for more cross-national comparative analyses in this domain remains valid. In that call, the authors emphasised the potential value of such studies in enabling a 'deeper understanding of how context affects the development and impacts of new technology and how future scenarios might be imagined' (Hodge and Koski, 1997, p. 192). Existing studies that have contributed along these lines are summarised below.
One such study estimated a country-wide system of structural equations (SEM) for the United States. The authors related travel and telecommunication demands (measured in terms of vehicle-miles travelled and telephone calls respectively), controlling for infrastructure, land use, and sociodemographic variables. Their conclusion was that of a positive association between frequency of telephone calls and vehicle-miles travelled. They further observed that additional ICT infrastructure (telephone wires) was positively correlated with demand for telephone calls. This indirect mediation made the authors conclude that demand for telecommunications needed to be taken into account in travel demand forecasting, and that parallel promotion of telecommunications and travel reduction policies could be counteracting. Interestingly, a very similar study conducted in the context of Hong Kong led to similar conclusions (Wang and Law, 2007). The positive correlation between the demand for connectivity and the associated infrastructure and device proliferation has been maintained over the years, as reflected in the proliferation of mobile devices mentioned in the opening paragraph of Section 1. However, the exact and causal nature of the link between these developments and mobility remains an ongoing research question.
² This is in contrast to a standard comparative analysis, which compares relationships between constructs in multiple countries with individuals or households as units of analysis.
³ The full definition provided by the ITU: 'Households with Internet access at home refers to the percentage of households with Internet access at home. Access is not assumed to be only via a computer - it may also be by mobile phone, games machine, digital TV etc.' (ITU and The World Bank, 2012, p. 236). The data is typically aggregated from the respective operators, complemented with household and business surveys (The World Bank, 2016e).
⁴ A broad and frequently used definition has been proposed by the OECD which defines ICT goods and services as those '…intended to fulfil the function of information processing and communication by electronic means, including transmission and display, or use electronic processing to detect, measure and/or record physical phenomena, or to control a physical process' (OECD, 2003, p. 3).
The study by Kowald et al. (2012) looked at the effect of ICT on the average physical distance in personal networks in cities in four countries: Toronto in Canada, Zurich in Switzerland (in addition to a country-wide analysis), Concepción in Chile, and Eindhoven in the Netherlands. By exploring these five contexts using multilevel modelling, the authors provided evidence for a positive correlation between connectivity (described through an Internet access variable) and average distances in personal networks in Zurich, and to a limited extent in Concepción, but not in Toronto, Eindhoven, or in the Swiss national sample. The authors suggested that a partial explanation for this cross-national variation could have been the data collection timing (different years), which in turn affected the observed Internet penetration rates. Another possible explanation, although not one discussed by the authors, could be that greater distances led to faster adoption of the Internet in order to maintain the existing social networks, as suggested by Smoreda and Thomas (2001). This also highlights the challenge of the possible existence of reverse causality between ICT and mobility (Pawlak et al., 2015). Pawlak et al. (2014) recognised the lack of datasets simultaneously containing information about ICT use and travel behaviour as the foremost reason for the absence of cross-national analyses in the field. To circumvent this problem they employed a data pooling (fusion) technique to combine ICT-only and travel-only datasets using a set of shared socioeconomic variables. Using such pooled datasets for Norway, the UK and the USA, in addition to a complete dataset for Canada, they estimated a set of structural equation models (SEM). They revealed significant cross-national variability in interactions between the use of ICT and travel for various purposes. The study compared datasets collected during different years, however, and not necessarily containing the same set of variables.
In addition, the robustness of their findings based on the pooled datasets was subject to the validity of the data pooling approach and the underlying statistical assumptions, neither of which could be assessed within the study. Nonetheless, the study proved valuable in highlighting the gap in data collection and harmonisation as a major obstruction to cross-national analysis as well as in pushing the domain towards using innovative ways of circumventing these issues. Thomopoulos et al. (2015a) compiled a number of studies exploring various ICT applications in transport systems in different geographical contexts. They delivered not only a good cross-national comparison of experiences, but also discussed how they were able to reflect a wider set of opportunities and threats to transport systems brought about by ICT. In contrast to most studies on ICT in transport, the authors did not restrict themselves to interpreting ICT as either a means of undertaking activities (i.e. the demand side of transport) or as intelligent transport systems (ITS) responsible for improving efficiencies in the supply side of transport. This approach proved valuable given that the two fields seem to co-exist in relative isolation motivated by analytical convenience rather than suitability (Pawlak et al., 2015). The key conclusions drawn by the editors expressed the need for cross-national analyses leading towards greater commonality in terminology, legislation, in technology, as well as practice-sharing and 'business models tested within various city or regional contexts' (Thomopoulos et al., 2015b, p. 301) so as to enable the more effective and sustainable use of ICT in transport contexts.
In addition, substantial effort has been devoted to understanding the use of ICT in the context of tourism-related travel. While tourism constitutes a considerable source of travel, especially long-haul and international, it is often overlooked in travel behaviour studies conventionally focused on commuting and business. Yet ICT seem to be playing a crucial role in shaping tourism travel. For example, Tjostheim et al. (2007) analysed the evolution of the role of the Internet in providing information for trip planning as compared to more traditional sources (guidebooks, travel agents, brochures). Based on the analysis of 12 European countries and the US between 1997 and 2005, they confirmed the increased role of the Internet in travel planning. Another contribution, from Kazakov and Predvoditeleva (2015), compared the patterns of use of online resources in hotel choices made by American and Russian tourists and business travellers, with the authors providing evidence for the existence of differences in the analysed behaviour between the American and Russian respondents.

Household expenditure on transport
Another relevant stream of research involves studies looking at determinants of household transport expenditure. Transport expenditure remains among the core indicators describing physical mobility and its cost, and has been used in a number of studies, including those looking at the effects of ICT (Agrawal et al., 2011; Choo et al., 2005; Selvanathan and Selvanathan, 1994; WBCSD, 2004). The main convenience of analysing transport expenditure is the availability of data across countries since it forms part of consumption monitoring efforts maintained by states and international organisations. Thanks to this availability, household transport expenditure is currently the most common, if not the only, way to investigate cross-national variations in travel behaviour and its cost to individuals and households, especially if developing countries are also included. Transport expenditure is also linked to the actual mobility since costs of travel can be a constraining factor on the amount of travel and the use of faster modes of transport. For example, low-income households, whose limited budgets impose tighter restrictions on expenses, are likely to shift towards less expensive modes, informal modes, or reduce travel (Agrawal et al., 2011; Diaz-Olvera et al., 2008).
At the same time, the use of transport expenditure as a proxy for physical mobility has a number of challenges. Firstly, household transport expenditure data are collected by different national agencies and only compiled together by international agencies. Hence while OECD (1998) advise that household transport expenditures should include costs of vehicle purchase, operation of personal transport equipment and purchase of transport services, specific agencies or states can interpret these guidelines differently, and thus have both different data collection protocols and levels of adherence to them.
Secondly, whilst using the household as a unit of data collection appears sensible given its role as a unit in which resources are jointly allocated and used, and travel patterns co-organised (Bradley and Vovsha, 2005; Vovsha et al., 2003), the definition of the household varies across the globe. It can be based around kinship, residency and facilities sharing, spending, or a mixture of these among others (Leone et al., 2010; Ünalan, 2005). For example, in Sub-Saharan Africa a household can contain as many non-relatives as relatives, and the practice of polygamy can also result in complex residential arrangements (Beaman and Dillon, 2012). In addition, Deaton et al. (1989) used data from Spain to provide evidence that household size and composition were strongly linked to the level of transport expenditure.
Thirdly, transport expenditure is conditional upon a number of factors which must be taken into account if their effects are to be controlled for, which is also what we adhere to in the present analysis. One of the most important determinants is the overall level of affluence. More affluent societies can afford more and faster transport, even though these are costlier options. Dai et al. (2012) demonstrated how this relationship varied between countries (China, Greece, South Korea, UK, USA), which also manifested different characteristics of the underlying Engel curves.
Diaz-Olvera et al. (2008) conducted a comprehensive review of transport expenditure studies in Sub-Saharan countries. They argued that whilst transport expenditure tended to be highly sensitive to economic crisis and hence affect access to work opportunities, it nevertheless remained low on policy agendas in those countries. Specifically, they used data from Ouagadougou in Burkina Faso, Niamey in Niger, and Dar-es-Salam in Tanzania to demonstrate how the high cost of private and public transport can prevent the most deprived population segments from accessing opportunities to increase affluence and human capital. They also noted that substantial variation in transport expenditure was attributable not only to different levels of affluence, but also to methodologies differing across survey types, agencies, countries and years.
Working in the context of the Republic of South Africa (RSA), Venter and Behrens (2005) discussed the merits of the policy of seeking to reduce the household transport expenditure share to below 10%, which was then a policy benchmark in the context of the need for transport availability and affordability. Their critique of the flat benchmark was based on the non-monotonic relationship between share of transport expenditure and welfare, and evidence for heterogeneity in transport needs. Regarding the latter point, they observed previous research in the RSA to indicate the role not only of affluence, but also of professional and family duties and availability of public transport. Similar observations were made by Mattingly and Morrissey (2014) who used their analysis of Auckland in New Zealand to investigate trade-offs between transport and housing expenditure. In addition to the role of public transport availability, they also emphasised the role of population density, land use, and vehicle ownership. Mokhtarian and Chen (2004) conducted an extensive review of the literature related to the expenditure of money on travel in order to investigate support for the concepts of constant travel time and expenditure budgets. In addition to presenting evidence for variations in those budgets over time (across the week, year, and years), they presented findings suggesting a positive association of monetary expenditure with car ownership levels, and a negative one with the availability of public transport or population density. Dai et al. (2012) provided evidence for the role of urbanisation. Specifically, they used data from China that indicated differences between urban and rural households' transport expenditure, which they attributed to more general differences in consumer behaviour. On the other hand, Tanner (1981) demonstrated a similar level of expenditure in urban and rural areas, despite differing distances and availability of transport alternatives. Paltsev et al. (2004) looked at the effects of fuel taxation on emissions and social welfare using data from the Global Trade Analysis Project (GTAP) covering virtually the whole world economy. They demonstrated that fuel taxes, and thus higher prices, tended to increase household expenditure on transport, and hence decrease household welfare. They further noted that the expenditure appeared to depend on the availability of highways and other means of transport.
Clearly, the abundant research on household transport expenditure points to a variety of factors of potential importance. At the same time, the literature on ICT and travel behaviour provides the basis for a hypothesis of ICT playing a role in shaping transport expenditure of households in different countries. To date, such effects were explored using data from single, specific contexts Mokhtarian, 2008a, 2008b). We did not, however, identify an analysis that looked at this question from a cross-national perspective, and thus the following analysis seeks to address this gap.

Research design
In order to explore the role of ICT in shaping transport expenditure of households in different countries, we first present the process of compiling a suitable dataset in Section 3.1. Subsequently, we discuss ways in which we control for potential endogeneity and data reliability issues, in Sections 3.2 and 3.3 respectively. All these steps provide the basis for a formal definition of the model in Section 3.4.

Dataset compilation
Given that, to the best of our knowledge, there exists no complete dataset suitable for macro-analysis of the relationship between ICT use and physical mobility, we decided to compile one, taking as a point of departure the data on household transport expenditure aggregated by the United Nations (2015). We enriched this dataset with a set of control variables, as suggested by the research presented in Section 2.2. We also added the percentage of households with Internet access at home to investigate the postulated effect of ICT on transport expenditure. While ICT is a broader term, as we discussed in Section 1, we relied on Internet access at home as it proved to be the only ICT-access variable consistently reported in an official source across the sample countries. We also supplemented this data with information on the cost of Internet access by including the price of the so-called fixed-broadband sub-basket (ITU and The World Bank, 2012, p. 235).⁵ In addition, we included a variable indicating the presence of any restrictions on foreign ownership in facility- and spectrum-based operators, and Internet Service Providers (ITU and The World Bank, 2012) to serve as an instrumental variable (see Section 3.2).
Finally, we also added the Corruption Perceptions Index (CPI) compiled by Transparency International to control for data quality issues (see Section 3.3). The final set of variables used in the present analysis is presented in Table 1.
In order to reflect the real cost of transport relative to income and price levels in the analysed countries, we adjusted the monetary variables for the Purchasing Power Parity (PPP). To ensure consistency, we used data for the year 2010, except in a few instances in which the required information was not available for that year, e.g. because it is reported less frequently than annually. In such cases, we used values from the nearest available years, as indicated in the column 'Year', to impute the values through interpolation. We argue this to be sufficient given the small number of such instances and the low likelihood of major non-linear variation during the short period between 2010 and the nearby years of reporting.
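The imputation step can be sketched as follows, assuming simple linear interpolation between the two nearest reporting years. The function name, the indicator, and its values are hypothetical and serve only to illustrate the idea; the exact interpolation scheme is not specified in the text.

```python
def interpolate_to_year(year_a, value_a, year_b, value_b, target_year=2010):
    """Linearly interpolate a value for target_year from two reported years."""
    if year_a == year_b:
        raise ValueError("the two reporting years must differ")
    # Share of the distance from year_a to year_b covered by target_year.
    weight = (target_year - year_a) / (year_b - year_a)
    return value_a + weight * (value_b - value_a)


# Hypothetical indicator reported only in 2009 and 2011 (illustrative values).
estimate_2010 = interpolate_to_year(2009, 40.0, 2011, 50.0)
print(estimate_2010)
```

Over a window of one or two years, a linear approximation of this kind is consistent with the assumption that major non-linear variation is unlikely.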
Following this approach we managed to gather data for 33 countries, at different levels of development and with a considerable degree of heterogeneity in terms of the household transport expenditure (Fig. 1). The resulting dataset is comparatively rich in terms of the geographical coverage, number of variables and their variation (see Table 2). At the same time, while the sample of 33 countries is not atypical for studies in regional or transport geography, it might be limiting from the statistical point of view. This is because the limited size can restrict the number of parameters amenable to estimation and thus restrict the power of the inferential tests used in specification search and risk type II error. As a result, in our analysis we retain the control variables regardless of their statistical significance to control for their impact and thus minimise the risk of type I error with respect to the analysis of the role of the Internet penetration.
⁵ The full definition provided by the ITU: 'Fixed-broadband sub-basket refers to the price of the monthly subscription to an entry-level fixed broadband plan. For comparability reason, the fixed broadband sub-basket is based on a monthly usage of (a minimum of) 1Gigabyte (GB). For plans that limit the monthly amount of data transferred by including caps below 1Gigabyte, the cost for additional bytes is added to the subbasket. The minimum speed of a broadband connection is 256kbit/s' (ITU and The World Bank, 2012, p. 235).

Endogeneity
Up until now, the proposed approach has implicitly assumed the relationship between ICT and household transport expenditure to be unidirectional, i.e. that ICT influence household transport expenditure, but not the opposite. Some empirical findings, however, appear to highlight that this does not need to be the case. For example, de Graaff and Rietveld (2004) suggested that willingness to work at home and the associated opportunity to reduce commuting might have influenced the decision to purchase a modem, itself a key ICT enabler at the time. While the results may be technologically obsolete, they actually demonstrated that even early generation ICT and travel interactions could have been subject to bi-directional causality. More recently, Tranos and Nijkamp (2013) conducted an analysis of links between European cities in which they demonstrated how the development of Internet infrastructure had been affected by physical proximity of places, itself associated with transport infrastructure. They identified the existence of centripetal forces agglomerating Internet links in specific locations, e.g. a higher user base in areas of greater population density and economic activity, leading to the emergence of digital infrastructure hubs. Similar bi-directional causalities have tended, surprisingly, to be overlooked in the field to date, despite such evidence (Pawlak et al., 2015). From the econometric point of view, the existence of bi-directionality (or simultaneity, or reverse causality) implies endogeneity leading to biased parameter estimates if unaccounted for (Greene, 2012). In the present context, we suspect the ICT variable to be potentially endogenous with respect to household transport expenditure.
Table note (a): The data used were for 2010 except instances where the data were not available. In these cases, the data for nearest available years were used for imputing the values through interpolation.
We address this issue through an instrumental variables (IV) approach for the ICT variable, i.e. Internet penetration, using two-stage least squares (2SLS) regression (Dougherty, 2011). This approach is the most widely accepted treatment method in the instance of cross-sectional data. In the IV approach, the variables potentially affected by endogeneity are replaced by an estimate from an ancillary, first-stage regression undertaken using instrumental variables that are believed to affect the main dependent variable, i.e. household transport expenditure, only through the endogenous variable (and thus to be uncorrelated with the error term in the main model), while also controlling for the effect of other covariates. The second stage of estimation, making use of the prediction obtained in the first stage, yields consistent estimates, though at the cost of lower efficiency, i.e. loss of precision reflected in larger standard errors due to extra error from the first-stage regression. The Wu-Hausman test serves to inspect whether parameters estimated using the naïve and 2SLS approaches are significantly different from each other, indicating presence of endogeneity. If this is the case, the IV estimator is preferred. Otherwise, the naïve estimator is typically preferred due to its superior efficiency (Greene, 2012).
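To illustrate the mechanics of the two stages, a minimal sketch on simulated data is given below (Python with NumPy). The data-generating process, the coefficient values and all function and variable names are hypothetical assumptions made for illustration only; they are not the data or the estimation routine of the present study.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x_endog, z, X_exog):
    """2SLS with one endogenous regressor.
    X_exog must already contain a column of ones."""
    # First stage: endogenous regressor on the instrument and the controls
    W = np.column_stack([z, X_exog])
    x_hat = W @ ols(W, x_endog)
    # Second stage: the first-stage prediction replaces the endogenous regressor
    X2 = np.column_stack([x_hat, X_exog])
    return ols(X2, y)

def simulate(n=5000, beta=-0.4, seed=0):
    """Hypothetical data in which a shared shock u makes x endogenous."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)   # instrument (assumed exogenous)
    c = rng.normal(size=n)   # one control variable
    u = rng.normal(size=n)   # unobserved shock entering both x and y
    x = 0.8 * z + 0.5 * c + u + rng.normal(size=n)
    y = beta * x + 0.6 * c + u + rng.normal(size=n)
    X_exog = np.column_stack([np.ones(n), c])
    return y, x, z, X_exog

y, x, z, X_exog = simulate()
b_naive = ols(np.column_stack([x, X_exog]), y)[0]
b_2sls = two_stage_least_squares(y, x, z, X_exog)[0]
print(f"naive: {b_naive:.3f}  2SLS: {b_2sls:.3f}  (simulated true value: -0.4)")
```

In this simulated setting the naïve coefficient is pulled away from the assumed true value by the shared shock, whereas the 2SLS coefficient stays close to it. The sketch returns point estimates only; the second-stage standard errors would need the usual 2SLS correction, which is omitted here.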
The crucial conditions for validity of the 2SLS approach are the strength of the instruments, i.e. their ability to explain variation in the variable affected by endogeneity when controlled for other covariates, and their exogeneity, i.e. lack of correlation with the error term in the main model. The first condition is typically investigated using a partial F-test looking at how much variation in the purported endogenous variable unexplained by the other covariates is explained by the IV. Staiger and Stock (1997) recommended that IV be considered strong if the F-value in the auxiliary regression and the partial F-value for each instrument are greater than 10. The assumption of exogeneity can be examined by regressing residuals in the naïve model on the IV and inspecting significance of the parameters.
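The instrument-strength check can be sketched as a comparison of the first-stage regression with and without the instrument. The example below (Python with NumPy) uses simulated data with a hypothetical binary instrument; the sample size of 200, the coefficients and all names are illustrative assumptions rather than values from the present analysis.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ beta
    return float(r @ r)

def partial_f(x_endog, Z, X_exog):
    """F-statistic for H0: all instrument coefficients are zero
    in the first-stage regression of x_endog on [Z, X_exog]."""
    Z = np.asarray(Z, dtype=float).reshape(len(x_endog), -1)
    X_full = np.column_stack([Z, X_exog])
    q = Z.shape[1]                    # number of instruments tested
    n, k = X_full.shape
    rss_restricted = rss(X_exog, x_endog)
    rss_unrestricted = rss(X_full, x_endog)
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))

rng = np.random.default_rng(42)
n = 200
z_relevant = rng.integers(0, 2, size=n).astype(float)    # drives x below
z_irrelevant = rng.integers(0, 2, size=n).astype(float)  # unrelated to x
c = rng.normal(size=n)
x = 1.0 * z_relevant + 0.5 * c + rng.normal(scale=0.5, size=n)
X_exog = np.column_stack([np.ones(n), c])

f_relevant = partial_f(x, z_relevant, X_exog)
f_irrelevant = partial_f(x, z_irrelevant, X_exog)
print(f"partial F, relevant instrument:   {f_relevant:.1f}")
print(f"partial F, irrelevant instrument: {f_irrelevant:.1f}")
```

Under these assumptions the relevant instrument comfortably exceeds the benchmark of 10, while the irrelevant one does not add explanatory power beyond the controls.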
The IV approach in the context of ICT and travel behaviour research was used by de Graaff and Rietveld (2004), who selected the use of a personal computer as a hobby as an instrumental variable on the grounds of its high correlation with modem possession, but low correlation with teleworking. In the case of Internet penetration, to date researchers have employed a number of instruments, including regulation of data services and Internet provision (Clarke and Wallsten, 2006), number of Internet Protocol (IP) addresses (Timmis, 2013), population residing in multiple-family dwellings (Dettling, 2016), physical distance to service provider backbones (Miner, 2015), or lightning strikes and their impact on ICT infrastructure costs (Andersen et al., 2011; Salami and Seamans, 2014).
In the current research, we explore the presence of restrictions on foreign ownership in facility- and spectrum-based operators, and Internet Service Providers (ITU and The World Bank, 2012). Hence the approach is similar to that followed by Clarke and Wallsten (2006) or Ioannides et al. (2008). It utilises the fact that unrestricted competition tends to be correlated with a more mature market for Internet service provision that encourages investment in infrastructure and use of technological and business know-how. These are, in turn, essential for efficient deployment of the Internet infrastructure and the ability to provide access to the Internet on a mass scale.

Data quality weighting
In Section 2.2 we mentioned the possibility of differing data quality across countries resulting from, inter alia, adherence to data collection protocols, varying importance attached by the authorities to quality and objectivity of the data, feasibility of the procedures given specific socioeconomic and geographical conditions, or staff resourcing. Such issues can affect the quality and credibility of the data, and thus the estimated parameters. This issue will be more pronounced in the instance of records (countries) characterised by outlying (extreme) values of the analysed variables, such as low GDP or Internet penetration, due to the comparatively higher leverage of such data points. The effect is less severe if the affected data point is not extreme (smaller leverage), and if there are numerous credible data points. Unfortunately, data credibility is often linked to the level of development of a country. This can be observed in Fig. 2 which presents a plot of the 2010 Human Development Index (HDI) against the CPI compiled by Transparency International (Transparency International, 2011). The CPI is derived from a number of ratings and indices compiled by international development agencies and banks, consultancies and nongovernmental organisations, themselves also covering data reliability and transparency. Thus a higher CPI means lower corruption, higher transparency of public institutions and more reliable data. The trend observed in Fig. 2 highlights that data quality could be an issue in the case of less developed countries characterised by more extreme values of the variables. One way of treating the leverage effect makes use of the so-called Cook's statistic, indicating whether an outlying observation should be removed from the dataset given its leverage (Aguinis et al., 2013; Cook, 1977).
This approach, however, suffers from its purely statistical basis and its failure to consider the potential sources of such behaviour; it can therefore bias the data still further, while also reducing the sample. In the current context, it would not only be statistically inconvenient, but would also limit the scope of the analysis. We therefore opt for an alternative approach which weights the data points (countries) according to the credibility of the data. This approach is similar to the treatment applied in instances where observations are known to be affected by measurement noise.
However, to the best of our knowledge there exists no formal framework or index which provides a credible assessment of the quality of data collected and published by different countries. The recently launched Open Government Index compiled by the World Justice Project incorporates the measure of openness of publicised government data, partly also reflecting the quality of officially published information and data (The World Justice Project, 2015). The index, however, has only been launched in 2015 and is therefore not available for the year 2010. An alternative could be the Freedom of the Press score compiled by an NGO Freedom House, which incorporates assessment of the legal, political, and economic environment (Freedom House, 2016). However, the assessment and the score are focused on press and news rather than the data.
Yet a different option would be to use the quality of the overall institutional setting and corruption reflected in the CPI. While we are aware of studies which used a corruption perceptions measure to derive weights (Dahlström et al., 2012), including the use of the CPI (Treisman, 2000), and to control for data quality (Swamy et al., 2001), we have not seen studies attempting this in a context similar to ours. Hence we seek to test this novel approach and see how this type of weighting will affect the estimates as compared to the unweighted case. In order to avoid an arbitrary scale for the weight, we use a normalised CPI as a proxy for data quality, i.e. we divide the CPI by the standard deviation of the observed values.
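The weighting scheme can be sketched as follows (Python with NumPy). The CPI scores, the simulated regression data and the function names are hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption, since the text does not state which variant is used.

```python
import numpy as np

def normalised_weights(cpi):
    """CPI divided by its standard deviation (sample SD assumed here)."""
    cpi = np.asarray(cpi, dtype=float)
    return cpi / np.std(cpi, ddof=1)

def wls(X, y, w):
    """Weighted least squares: minimises sum_i w_i * e_i**2."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]

# Hypothetical CPI scores (0-10 scale) for eight fictitious countries
cpi = np.array([9.3, 8.7, 7.1, 5.5, 4.1, 3.3, 2.6, 2.1])
w = normalised_weights(cpi)

# Hypothetical log-log data: intercept 1.0, elasticity -0.4
rng = np.random.default_rng(1)
log_x = rng.normal(size=cpi.size)
log_y = 1.0 - 0.4 * log_x + rng.normal(scale=0.2, size=cpi.size)
X = np.column_stack([np.ones(cpi.size), log_x])

b_unweighted = wls(X, log_y, np.ones(cpi.size))
b_weighted = wls(X, log_y, w)
b_raw_cpi = wls(X, log_y, cpi)   # same weights up to a common scale factor
print("unweighted:", b_unweighted)
print("weighted:  ", b_weighted)
```

In this sketch, rescaling all weights by a common factor leaves the weighted coefficients unchanged, so the normalisation fixes the scale of the weights without altering the relative influence of the observations.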

Model definition
Based on the considerations above, we seek to estimate a model relating household transport expenditure to the ICT and control variables described earlier (Eq. (1)):

$$\log Y_{TE} = \beta_{ICT} \log X_{ICT} + \beta_{C}^{\top} \log X_{C} + \varepsilon \tag{1}$$

where $Y_{TE}$ is household transport expenditure, $X_{ICT}$ is the percentage of households with Internet access at home, and $X_{C}$ is a vector of control variables in accordance with Table 1, and also including a constant term. $\beta_{ICT}$ and $\beta_{C}$ describe the parameters to be estimated, the latter in a vector form. We employ logarithms in the estimation since this provides a convenient and scale-free interpretation for the estimated parameters, which are simply elasticities of household transport expenditure with respect to the covariates. Finally, $\varepsilon$ denotes the error term, assumed to be independently and identically distributed across the observations according to a normal distribution with zero mean. We term the model above 'naïve' since the error term assumption implies no endogeneity in the form described in Section 3.2, i.e. $X_{ICT}$ is assumed to be exogenous.
The 2SLS approach used to control for the endogeneity due to the potentially bi-directional relationship between $Y_{TE}$ and $X_{ICT}$ makes use of the instrumental variable $Z$ (restrictions on foreign ownership in facility- and spectrum-based operators, and Internet Service Providers) to estimate the following first-stage equation (Eq. (2)):

$$\log X_{ICT} = \gamma_{0} + \gamma \log Z + \gamma_{C}^{\top} \log X_{C} + \xi \tag{2}$$

where $\gamma_{0}$ is the constant term, $\gamma$ is a parameter describing the effect of changes in $\log Z$ on $\log X_{ICT}$, $\gamma_{C}$ is a vector of parameters controlling for the effects of the other covariates, and $\xi$ is an error term independently and identically distributed across the observations according to a normal distribution with zero mean. The aforementioned condition for an instrument $Z$ to be strong requires it to be capable of explaining a significant portion of the variation in $X_{ICT}$ conditional upon the other covariates. This is assessed using the partial F-test with the null hypothesis of a weak instrument, i.e. the explanatory power of the model not being different from that of a model without the instrument (i.e. $\gamma$ constrained to zero). The second condition, i.e. exogeneity of the instrument, requires the IV not to be correlated with the error term in Eq. (1), i.e. $\mathrm{Cov}(\varepsilon, Z) = 0$. This can be assessed by regressing $\varepsilon$ on $Z$ and inspecting the significance of the estimated slope parameter. If the instrumental variable $Z$ meets the conditions above, it can be used to calculate the predicted values of the percentage of households with Internet access at home, $\widehat{\log X_{ICT}}$ (Eq. (3)):

$$\widehat{\log X_{ICT}} = \hat{\gamma}_{0} + \hat{\gamma} \log Z + \hat{\gamma}_{C}^{\top} \log X_{C} \tag{3}$$

This predicted value serves as the input to the second-stage estimation, which yields a coefficient (and elasticity) for the percentage of households with Internet access at home, $\hat{\beta}_{ICT}$, that is robust to endogeneity (Eq. (4)):

$$\log Y_{TE} = \hat{\beta}_{ICT} \widehat{\log X_{ICT}} + \beta_{C}^{\top} \log X_{C} + \hat{\varepsilon} \tag{4}$$

where $\hat{\varepsilon}$ is the error term, independently and identically distributed across the observations according to a normal distribution with zero mean, and the remaining terms are as in Eq. (1). Thus, Eqs. (3) and (4) establish the 2SLS model. The Wu-Hausman test can then be used to establish whether the difference between $\beta_{ICT}$ and $\hat{\beta}_{ICT}$, i.e. the naïve and 2SLS parameters, is statistically significant, hence indicating the presence and severity of endogeneity.
In addition, both the naïve and 2SLS models can be weighted as explained in Section 3.3. Given two possible approaches to endogeneity (naïve and 2SLS) and to weighting (unweighted and weighted using normalised CPI), we estimate four models, as outlined in Table 3. The estimation is conducted using the 'ivreg' routine of the package 'AER' 1.2-4 for the R environment (Kleiber and Zeileis, 2016).
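The comparison of the naïve and 2SLS parameters can be illustrated with a regression-based (control-function) form of the Wu-Hausman test, in which the first-stage residuals are added to the naïve model and the significance of their coefficient is inspected. The sketch below is written in Python with NumPy on simulated data; it is not the 'ivreg' routine used for the reported estimates, and the data-generating process, coefficients and names are hypothetical.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients and residuals."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta

def wu_hausman_f(y, x_endog, z, X_exog):
    """F-statistic (1 numerator degree of freedom) for the coefficient on
    the first-stage residuals in the augmented structural regression."""
    W = np.column_stack([z, X_exog])
    _, v_hat = fit(W, x_endog)                      # first-stage residuals
    X_aug = np.column_stack([x_endog, X_exog, v_hat])
    beta, resid = fit(X_aug, y)
    n, k = X_aug.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X_aug.T @ X_aug)
    t_stat = beta[-1] / np.sqrt(cov[-1, -1])
    return float(t_stat ** 2)

def simulate(endogenous, n=2000, seed=0):
    """Hypothetical data; a shared shock u is present only if endogenous."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    c = rng.normal(size=n)
    u = rng.normal(size=n) if endogenous else np.zeros(n)
    x = 0.8 * z + 0.5 * c + u + rng.normal(size=n)
    y = -0.4 * x + 0.6 * c + u + rng.normal(size=n)
    return y, x, z, np.column_stack([np.ones(n), c])

f_endog = wu_hausman_f(*simulate(endogenous=True))
f_exog = wu_hausman_f(*simulate(endogenous=False))
print(f"F with shared shock:    {f_endog:.1f}")
print(f"F without shared shock: {f_exog:.1f}")
```

The statistic would be compared against an F distribution with 1 and n - k degrees of freedom. In the simulated endogenous case it is large, whereas without the shared shock it is small, which corresponds to the situation in which the naïve estimator is retained for its efficiency.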

Findings
The results of the estimation and the associated diagnostics are presented in Tables 4 and 5 which provide coefficients estimated in the first and second stages respectively. The naïve models are presented in Table 5 to facilitate comparison with the 2SLS approach. Note that the estimated values represent elasticities of average household transport expenditure with respect to the covariates. The most immediate general observation concerns the goodness-of-fit of all regressions as indicated by the statistically significant values of the respective overall F-tests. The high values of the coefficients of determination further support this observation, although their values are naturally inflated by the high number of regressors and the small sample size. We discuss specific findings regarding ICT (percentage of households with Internet access at home), including the validity of the instrumentation strategy, and control variables in Sections 4.1 and 4.2, respectively.

Household Internet penetration and transport expenditure
Inspecting Table 4 reveals that in both weighted and unweighted instances the proposed IV in the form of a binary variable describing presence of restrictions on foreign ownership in facility- and spectrum-based operators, and Internet Service Providers is positively and significantly correlated with the logarithm of household Internet penetration when controlling for the other covariates. This result is intuitive and supports the observation that unrestricted competition in the market is a symptom of a more mature Internet sector which is essential for a more widespread Internet adoption. We observe that the partial F-statistic is above the desired benchmark of 10 (Staiger and Stock, 1997), thus rejecting the hypothesis of the instrument being weak. In addition, regressing the residuals from the naïve model on the IV does not yield statistically significant slope coefficients, which supports the hypothesis of the instrument's exogeneity. These results provide a basis for proceeding with the 2SLS approach as a valid strategy for controlling for endogeneity in the form of a reverse causality between the percentage of households with Internet access at home and the average household transport expenditure.
We find that household transport expenditure is negatively associated with Internet penetration. Inspecting the naïve unweighted model, which is the starting point of the analysis, reveals this effect to be significant at the 0.050 level, even when most of the control variables, although typically observed to be influential, do not attain this level of significance. This relationship also holds in the weighted naïve instance, though with a larger standard error. This can be attributed to the fact that the weighted estimator is efficient (i.e. the standard error is minimised) only when the weights are known truly to reflect the relative measurement precision. As discussed in Section 3.3, the CPI only serves as a proxy for relative data quality and further research is warranted to investigate the extent to which it meets the above requirement. However, in the current case its use provides us with an increased confidence in the robustness of our finding to the quality of institutional setting in which the analysed data were collected, and hence the quality of the data themselves.
When comparing the 2SLS and naïve parameters for Internet penetration, the observed difference is not large enough to be considered significant, as indicated by the Wu-Hausman test. This is valid in both the unweighted and weighted models. Hence we conclude that the hypothetical endogeneity in the form of reverse causality between transport expenditure and the percentage of households with Internet access at home is not severe.
The diagnostics above provide us with the confidence to use the naïve parameters due to their superior efficiency in the absence of evidence of severe endogeneity. The significance of the parameter, despite the limited sample size and in light of the endogeneity considerations above, provides evidence that the negative relationship between household transport expenditure and Internet penetration is not spurious. Using values for the unweighted naïve model, the estimated elasticity indicates that a 1% increase in household Internet access penetration is associated with a reduction in the average household transport expenditure of between 0.162% and 0.626%, based on the 95% confidence interval, with a point estimate of 0.394%. The respective point estimate for the weighted naïve model is 0.296%, with a 95% confidence interval of 0.041% to 0.550%. Hence we observe that the findings based on unweighted and weighted models are not statistically different.
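The arithmetic behind this interpretation can be checked with a short sketch (Python). The elasticity and interval bounds are those quoted above for the unweighted naïve model; the function name and the 10% scenario are illustrative assumptions, and the larger change is shown only to indicate how a constant-elasticity model scales.

```python
def pct_change_in_y(elasticity, pct_change_in_x):
    """Exact percentage change in y implied by a constant-elasticity
    (log-log) model for a given percentage change in x."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100

# Values quoted in the text for the unweighted naive model
point, lower, upper = -0.394, -0.626, -0.162

# The interval is symmetric around the point estimate
midpoint = (lower + upper) / 2
half_width = (upper - lower) / 2

effect_1pct = pct_change_in_y(point, 1)     # approximately equal to the elasticity
effect_10pct = pct_change_in_y(point, 10)   # hypothetical larger change

print(f"interval midpoint: {midpoint:.3f}, half-width: {half_width:.3f}")
print(f"1% higher penetration:  {effect_1pct:.3f}% change in expenditure")
print(f"10% higher penetration: {effect_10pct:.3f}% change in expenditure")
```

For a 1% change the exact figure is very close to the elasticity itself, which is why the coefficient can be read directly as a percentage effect; for larger changes the exact expression departs slightly from the linear approximation.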
Thus our findings indicate the existence across countries of at least a degree of substitution between household Internet access and expenditure for physical mobility. Decreases in the latter can occur through either a reduction in the amount of travel by households or a reduction in the cost per unit of transport. In the former instance, households can employ ICT to conduct some activities remotely, i.e. online, hence  In the latter case of the cost of travel, the possibilities may include awareness of the less expensive choices among the above options. Increasingly, these include joint or shared options such as ad-hoc vehicle pooling and sharing which are facilitated by the proliferation in ICT. As these only became widespread in the last few years and in selected countries, however, they might not have been captured in the 2010 data used in the present context. Furthermore, as household transport expenditure also includes the cost of vehicle ownership, higher Internet penetration could offer more opportunities for utilising online mechanisms such as seller-to-seller platforms and transactions, online auctions, or price comparison tools.
Naturally, the observed negative relationship reflects the net effect, which means that both substitution and complementarity are likely to be present, with the former dominating. In other words, in addition to higher Internet penetration leading to transport expenditure reduction, it may also lead to instances of increased transport expenditure as previously outlined in Section 2.1. Our findings, however, highlight that the wider proliferation of ICT may indeed be linked to a reduction in the economic burden of physical mobility on households. This implies that development policies aimed at improving the welfare of societies can increasingly incorporate digitisation as one of the mechanisms for achieving their objectives, although such policies will need to ensure that the ICT expenses incurred by households do not outweigh the reduction in transport expenditure. Hence decomposition and a more detailed understanding of the causal nature of these mechanisms should continue to be an important research direction to aid the improvement of such policies and enhance the evidence base.

Control variables and transport expenditure
Our findings concerning the remaining variables highlight the preeminence of GDP and overall consumption expenditure as factors determining the level of household transport expenditure. The estimated elasticities for these factors are 0.627 and 0.618, respectively, which clearly exceed the household Internet penetration elasticity in absolute terms. If GDP and consumption expenditure are taken to serve as a proxy for the income available to households, the estimated elasticities reveal the nature of the underlying Engel curve and indicate that transport can be classified as a necessity (the elasticity being below unity). This also conforms to findings elsewhere, e.g. Dargay (2007).
Another highly influential variable, whose effect we also find to be significant and to exceed that of household Internet penetration, is the retail fuel (diesel) price. This naturally follows from the impact of per unit fuel price on the cost of any motorised travel, and remains in line with findings reported in Section 2.2. Our results also provide evidence that higher car ownership levels are linked to higher expenditure, which is also in line with the existing research, as outlined in Section 2.2. The remaining variables do not attain the critical level of significance, although the directions of the estimated parameters remain mostly intuitive. These variables were retained in the model to control for their effects in light of their importance reported elsewhere, as outlined in Section 2.2. The positive elasticity for household size reflects the greater expenditure associated with additional household members and their additional needs for mobility. On the other hand, a higher unemployment rate tends to be associated with lower transport expenditure, which follows from reduced need for work-related travel (including commuting) and also possibly lower affluence.
We find population density to be negatively associated with household transport expenditure. We interpret this as reflecting the higher amount of travel needed to maintain the same level of economic or social activity in the case of more sparsely populated countries. However, the opposite effect is observed in the weighted model. We attribute this to the limited sample size which may prevent stable estimation of a parameter if the effect is comparatively weak. On the other hand, we find that the rate of urbanisation is positively associated with transport expenditure which can reflect more opportunities for activities provided in urban areas. This is partly supported by the existing research which is ambiguous with respect to the differences in transport expenditure between urban and rural areas, as noted in Section 2.2. As for the role of transport infrastructure, we find evidence that the density of transport network, both road and rail, is negatively associated with transport expenditure. This can reflect the effect of shorter routes to activities enabled by a denser network of roads, as well as better and less costly travel opportunities when citizens have access to a denser network of public, rail-based transport. In addition, the results can indicate presence of denser rail network in areas of higher demand where economies of scale in infrastructure provision and maintenance lead to lower cost of travel. On the other hand, the evidence for the impact of road network density is inconsistent. A denser network can mean easier accessibility and thus reduced transport cost on the one hand, but evidence has been available since the 1990s that this may also induce additional demand, and thus additional expenditure (Goodwin, 1996; Hymel et al., 2010).
The final variable which we controlled for is the cost of a fixed-broadband sub-basket. We do not find the estimated coefficients to be statistically significant, although we observe that the positive sign of the estimated coefficients is intuitive. In other words, the positive elasticity would indicate higher transport expenditure when the price of broadband access is higher. This would also be consistent with the observations made in Section 4.1 regarding the negative coefficient for the percentage of households with Internet access at home. More specifically, both the negative elasticity of transport expenditure with respect to home Internet access penetration and the positive cross-price elasticity of transport expenditure with respect to the cost of a fixed-broadband sub-basket support the ICT and travel substitution hypothesis. While the latter effect is not statistically significant in the present analysis (although close to the 0.100 threshold), this could be an artefact of the limited sample size and hence it warrants further confirmation with a larger cross-national sample. Such efforts could also indicate whether falling prices of ICT, e.g. due to economies of scale in infrastructure provision or ICT market liberalisation, could serve as a means of indirect reduction of transport expenditure burden on households.

Conclusions
The point of departure for the present analysis is the gap in understanding that exists regarding the nature of the interactions between ICT use and physical mobility at a macro-, cross-national level. Using data from 33 countries compiled from a number of official sources, we contribute towards a more in-depth understanding of how average household transport expenditure varies across countries due to their different levels of ICT adoption as reflected in the percentage of households with Internet access at home. In particular, we find that association to be negative and robust to the endogeneity in the form of reverse causality as indicated by comparison of the naïve and the 2SLS estimators. The remaining effects associated with the control variables are intuitive and largely in line with existing research.
These findings can be interpreted as cross-national evidence for the existence of substitution between ICT, represented by percentage of households with access to Internet, and physical mobility, or at least the associated economic burden of transport expenditure. In this sense, our analysis adds to the growing understanding of the contexts and conditions under which modern information and communication technologies can help in overcoming the cost of spatial separation. As the latter has remained the key driver of how activities are undertaken across time and space, as expressed by Hägerstrand (1970), and consequently how spatial forms of settlements and spaces evolve, findings such as ours are becoming increasingly valuable if not essential to apprehend and model these processes more accurately.
To the best of our knowledge, this is the first analysis of this kind and has implications for policy making. In particular, our findings point towards the potential for Internet access, and thus perhaps also other digital technologies such as mobile devices and online services, to improve welfare by possibly enabling access to activities and opportunities at a reduced expenditure for travel. We are, however, cautious about drawing definite conclusions regarding the causal nature of these relationships in the absence of a longitudinal perspective, even though our research could not reject the hypothesis of the uni-directional substitution. While understanding the details of these mechanisms warrants further research, we believe that the evidence provided in the present paper supports the need for maintaining digitisation among the top policy measures aimed at improving societal welfare, especially in developing countries.
To further aid these ends, we identify a number of ways in which our present contribution could be extended in future endeavours. Naturally, replicating this research with more recent data, detailed information about travel behaviour, and a sizeable sample (especially in terms of coverage of developing countries) could yield a more up-to-date view of these relationships. This is particularly important in light of the unprecedented pace of development of ICT, as well as the associated changes in transport sectors, including use of intelligent transport systems or emergence of new business models such as mobility as a service. Such an investigation would inevitably require data that are more detailed and harmonised across the studied countries in terms of both the use of ICT and travel behaviour. In that instance, additional investigation of the potential biases resulting from differing data collection practices and the applicability of weighting (and other mechanisms) remains worthy of future work, both empirical and simulation-based.
From the methodological point of view, more research seeking to explore the causal nature of these relationships is warranted. We believe there to be potential in the wider use of the IV approach in the ICT and travel behaviour context, for example utilising variables describing the supply side of ICT, e.g. the infrastructure (number of routers and data centres, density and quality of the network). To the best of our knowledge, such approaches have not been explored to date. In addition, more detailed investigation of causality using longitudinal data is also needed.

Funding
This work was supported by the Digital City Exchange research programme at Imperial College London funded by Research Councils UK's Digital Economy Programme (EPSRC Grant No. EP/I038837/1).