Journal of Transport Geography

Consumer data arising from the interaction between customers and service providers are becoming ubiquitous. These data are appealing for research because they are frequently collected and quickly released; they cover a wide variety of attitudes, lifestyles and behavioural characteristics; and they are often dynamically replenished and longitudinal. It is demonstrated that consumer data can make important contributions to understanding problems in transport geography and in solving applied problems ranging from migration, infrastructure investment and retail service provision to commuting and individual mobility. However, more effective exploitation of these data depends on the construction of bridges to allow greater freedom in the transfer of data from the commercial to the academic sector; it requires development of frameworks for privacy and ethics in the secondary use of personal data; and it is contingent on the emergence of effective strategies for the amelioration of selection bias, which impairs the quality of many consumer data sources.


Introduction
Research in transport geography is strongly driven by data, with an emphasis on empiricism as well as methodology and policy. Although more qualitative studies have an important place in transport geography and transport studies, the discipline as a whole has a more explicitly quantitative focus than academic geography, especially in the UK where the 'cultural turn' has led to greater emphasis on qualitative approaches.
Census data has been a staple of spatial analysis in many countries, and continues to be seen as important (e.g. Rae, 2016; Lima et al., 2017; Parolin and Rostami, 2017). The data is typically accurate and detailed with high spatial granularity, easily accessible and presented with high standards of documentation. However, census data is also limited in significant ways: collection is infrequent and publication of the data can be slow; the outputs are aggregated and do not permit longitudinal analysis for individuals; and limited insights are provided into consumption, health, lifestyle and wealth.
Of course many other forms of data have been mobilised to address broader needs in transport geography. Various government sources, diary studies and other forms of primary data collection have generated data which are valuable and focused. These data also have restrictions however, for example the expense of their collection means that they are often collected only at a scale which permits limited disaggregation by geography or specific demographic groups.
Recent studies have begun to consider the use of new secondary sources including map data, online timetables, vehicle monitoring and crowd-sourced data (Savage and Burrows, 2007; Goodchild, 2007; Haklay and Weber, 2008). These may be viewed as the geospatial instance of a more general trend towards big data collection which is prevalent in many disciplines. Many of these data are easy to obtain through open data websites (e.g. data.gov.uk, tfl.gov.uk, theodi.org).
Commercial data sets have received much less attention in the literature to date. Consider the example of mobile telephone records. Large operators such as O2 and Vodafone routinely generate billions of call records every day, and the geo-location of such records can therefore yield hugely valuable evidence about individual movement patterns at scale at very fine timescales (de Montjoye et al., 2014). In the next section of this paper the case will be made that telephone operators are representative of a broader class of consumer data, and the character of such data will be explored. It will be argued that such data are often strongest where census data and other sources are limited. The strengths and opportunities which consumer data may bring are explored further with examples in Section 3.
Consumer data are also restricted in a number of important senses. In the discussion (Section 4) problems in gaining access to consumer data are considered. The significance of ethical issues is stressed and the variable quality of big data is highlighted. The relative importance of new techniques in data science alongside established methods is reviewed.
The research hypothesis which this paper seeks to investigate is that consumer data will provide valuable ammunition to augment academic investigations in transport geography. It will gauge the challenges and obstacles to a more complete realisation of this project, and suggest pathways towards their amelioration. The UK Census is utilised as a 'straw man' to focus certain specific features and capabilities of consumer data which are not easily replicated. The benefits of consumer data with respect to other forms of data are also highlighted where this is of direct relevance, although a complete enumeration of types of data in the transport geography literature is beyond the scope of the paper.

The advent of consumer data
Administrative data have been characterised as 'found data that are primarily generated for a purpose other than research' (Connelly et al., 2016, p5) [in contrast to 'made data… (which) are designed and collected to address well-defined hypotheses']. Such data are 'often collected for the purposes of registration, transaction and record-keeping' (ibid). Consumer data are 'found' in a similar way within the commercial transactions of business organisations and service providers.
The widespread adoption of e-business, m-commerce and diverse socio-technical innovations from loyalty cards to sensors means that consumer data are now being generated perpetually in the slipstream of everyday life. Retail transactions in malls, restaurants, bars and coffee shops may be logged by loyalty cards, and if purchases require the provision of additional information e.g. for product guarantees then there is a strong likelihood that this will be added to a large panel of similar responses. Trips to health clubs will be added to a database of membership and activities, or increasingly monitored and shared through wearable devices. Even while we sleep devices such as smart energy meters will continue to monitor activity within our households.
A simple anchor for the concept is the notion of consumption or market-based exchange of value, and this suggests potential for a boundary to be drawn to administrative data which are collected for the purpose of government or public administration. Hence tax or pension records may be considered administrative, whereas smart tickets associated with travel patterns should logically be considered as examples of consumer data, even where the associated transport provider is a local authority or public body. On the basis of this definition, data which are both collected and distributed freely would be excluded. For example, Open Street Map (Haklay, 2010) would not be considered as consumer data on this basis, but products with similar content such as Google Maps or Ordnance Survey map products could reasonably be treated as such. Housing might be considered as an equally nuanced instance, particularly in relation to rental properties. Privately rented accommodation seems to fall squarely into the domain of consumer data, but property under common ownership (known as 'council housing' in the UK) appears to be administrative (in the means of collection and management through local authorities) while at the same time relating to a transaction with an obvious financial component (the payment of rent).
The treatment of 'social media' as a data type might also be considered at greater length. In some ways social media have similarities to open data which are crowd-sourced, but at the same time the data are collected and controlled by organisations like Twitter and Facebook many of which have catapulted in value on the basis of the commercial value of these data. For the purposes of this paper a broad and inclusive perspective on consumer data will be adopted as an emerging source with its roots in some form of market-based exchange of value.
It is clear that consumer data are extremely varied in their substantive focus, composition, and means of collection. The contributors include businesses whose sector focus is retail, transport, health, energy, finance, property and leisure. Demographic data relating to the population at large are being accessed from a number of partners whose primary business is market analysis. The most well-established datasets include transactions (e.g. outlet sales data), routine administrative data (e.g. customer records) and surveys. More recent sources include loyalty cards, sensors, smart tickets and wearable devices. The composition of many of the data relates to individuals (e.g. personal health status) and households (e.g. family income), but also includes dwellings (e.g. property value), sales outlets (e.g. transactions by retail unit), trips (e.g. smart tickets) and devices (e.g. wearables, smart meters).
The extent to which consumer data may be classed as 'big data' is worthy of further attention. With regard to the traditional '3 Vs' (Laney, 2001), consumer data are often characterised by volume, velocity and variety. The volumes of data generated from mobile telephone logs have already been noted. A retail organisation with ten million customers each visiting a store once a week to purchase 20 products or more would also generate tens of billions of transactions at the customer-product level. It will be seen later that consumer data also move at high velocity. In some cases (mobile phones again, smart tickets or online retail transactions) real-time analytics may be plausible. Major purchase decisions such as house moves might be monitored on a less frequent basis, by month, quarter or even annually, and still operate much more rapidly than conventional sources such as the Census. The variety of consumer data is arguably more notable still, reflecting attitudes and behaviour e.g. in relation to health, leisure, education, work and social interaction, all of which have important implications for transport and mobility.
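The arithmetic behind the retail volume estimate can be checked with a short sketch. The customer, visit and basket figures come from the text; the 52-week horizon is an assumption, since the text does not state the period over which transactions accumulate.

```python
# Back-of-envelope volume estimate for the retail example in the text.
customers = 10_000_000   # ten million customers (from the text)
visits_per_week = 1      # one store visit per week (from the text)
items_per_visit = 20     # "20 products or more": lower bound from the text
weeks_per_year = 52      # assumption: the text does not state a time period

rows_per_week = customers * visits_per_week * items_per_visit
rows_per_year = rows_per_week * weeks_per_year

print(f"{rows_per_week:,} customer-product rows per week")
print(f"{rows_per_year:,} customer-product rows per year")
```

On these assumptions a single year yields roughly ten billion customer-product rows, so the 'tens of billions' scale would be reached over a few years of history or with larger baskets.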
The additional Vs of 'veracity' (Lukoianova and Rubin, 2014) and 'value' (Lavalle et al., 2011) are also of prominent importance for consumer data. In the next section of the paper, a range of examples will be introduced which highlight the value of these new sources from a range of perspectives. Consumer data offer wide-ranging insights on preferences or revealed behaviour relating to transactions which have actually taken place. Nevertheless the veracity (or reliability) of consumer data is more challenging. Typically such data are drawn from self-selecting and skewed population samples. They are also fundamentally secondary, having been captured as the by-product from a social process, rather than primary sources which are focused to a specific research question or hypothesis (as in a typical panel survey, for instance).
Uneven data quality and bias are important issues for consumer data, but an even more important issue is data availability. The fundamental challenge here is that consumer data are collected and owned by business organisations and so there is a gap to be bridged if academic access is to be established. A related question concerns the ethics of consumer data exploitation since a well-established principle of international law is that the secondary re-use of data collected for one purpose infringes the privacy of the individual. These topics are highlighted as appropriate in the examples, and discussed at greater length in Section 4 of the paper.

The power of consumer data
This section of the paper is framed around a series of examples which point towards the benefits of exploiting consumer data in academic research. For clarity of exposition, these examples are arranged to demonstrate a counterpoint between the restrictions of census data and the strengths of consumer data. Whereas census data are generated intermittently over a long cycle of planning and delivery, consumer data are often contemporaneous. While census data are focused on the socio-economic and demographic character of small area populations, consumer data have broad coverage of lifestyle, activities, behaviour and attitudes. Thirdly, where census data are captured in cross-section at a point in time, consumer data are generated dynamically in repeat. These elements are now considered in turn.

Consumer data are contemporaneous
The value of census data is constrained by the frequency of its collection and the immediacy of release. In the UK at the present time the most recent census was conducted in March 2011, while other advanced nations including France, Brazil and the US have had no census since 2010. The problem is exacerbated by challenges in processing and delivery of the data, with some of the more detailed outputs from the census taking several years from collection to release. For example, origin-destination statistics for workplace zones in England and Wales were published on March 25th 2015 (ONS, 2015). On March 24th the latest equivalent census data would therefore have been 14 years out of date, relating to April 2001.
In contrast, consumer data are often continually collected and updated. A good example would be from information which is collected by online estate agents who typically operate across a complete national market with extensive customer databases. Major businesses in this category in the UK include Zoopla, RightMove and Purple Bricks. Data recently released by Zoopla for use within the academic community shows transactions for 1.2 million house moves, including additional attributes such as price, number of bedrooms and bathrooms, property type and most importantly including postcodes for both the existing and the new property (more details at www.data.cdrc.ac.uk).
In recent work with these data, the fine spatial detail of both origins and destinations of house moves has been exploited to create a unique picture of geodemographic mobility. Each origin and destination postcode was assigned a neighbourhood profile from CACI's ACORN classification (acorn.caci.co.uk). Other geodemographic typologies including the open source Output Area Classification (Gale et al., 2016) might equally well be appended to these postcodes. ACORN has been used here because it has the appealing feature that it approximates a continuum of affluence from 'difficult circumstances' to 'lavish lifestyles'. This continuum is represented in Fig. 1 as a circle running anticlockwise from the most deprived communities at the top of the illustration. Each instance is represented by a thin black line connecting the two neighbourhood types at the beginning and end of each move. The diagram shows very clearly that there is a strong tendency for households to move between similar types of areas. Still more interesting is a pronounced gradient of increasing mobility between the most affluent neighbourhoods. The data in Fig. 1 relate to changing home ownership in England and Wales in 2014; they have now been updated to 2015 and extended through the inclusion of dwelling rentals. Data are available to third party users for approved purposes (www.cdrc.ac.uk).
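The step of attaching a neighbourhood type to each origin and destination postcode, and then counting moves between types, might be sketched as follows. This is a minimal illustration, not the procedure used to produce Fig. 1: the postcodes, the category labels and the function names `flow_matrix` and `share_within_type` are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup from postcode to a geodemographic category.
# Codes and labels are illustrative only, not real ACORN assignments.
postcode_to_type = {
    "AA1 1AA": "1 Difficult circumstances",
    "AA1 2BB": "1 Difficult circumstances",
    "BB2 3CC": "2 Middle band (hypothetical)",
    "CC3 4DD": "3 Lavish lifestyles",
    "CC3 5EE": "3 Lavish lifestyles",
}

# Hypothetical house moves as (origin postcode, destination postcode).
moves = [
    ("AA1 1AA", "AA1 2BB"),
    ("AA1 2BB", "BB2 3CC"),
    ("BB2 3CC", "BB2 3CC"),
    ("CC3 4DD", "CC3 5EE"),
    ("CC3 5EE", "CC3 4DD"),
    ("BB2 3CC", "CC3 4DD"),
]


def flow_matrix(moves, lookup):
    """Count moves between (origin type, destination type) pairs."""
    flows = Counter()
    for origin, destination in moves:
        # Skip moves whose postcodes cannot be classified.
        if origin in lookup and destination in lookup:
            flows[(lookup[origin], lookup[destination])] += 1
    return flows


def share_within_type(flows):
    """Proportion of classified moves that stay within the same type."""
    total = sum(flows.values())
    same = sum(n for (o, d), n in flows.items() if o == d)
    return same / total if total else 0.0


flows = flow_matrix(moves, postcode_to_type)
for (o, d), n in sorted(flows.items()):
    print(f"{o} -> {d}: {n}")
print("share of moves within the same type:", round(share_within_type(flows), 2))
```

The diagonal of such a matrix corresponds to the tendency for households to move between similar types of area; a chord-style diagram like Fig. 1 is one way of drawing the full set of pairs.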
The importance of this work is indicated by a substantial academic literature on individual and household migration in the UK (Champion, 2007) and internationally (Stillwell et al., 2016). Consumer data can augment classical studies with conventional sources through a combination of fine spatial resolution, updateability, and additional co-variates such as income and house prices. Patterns can be related to the latest social and market forces, such as austerity and Brexit, or major infrastructure investments like Crossrail and HS2 (Transport for the North, 2015). For example, the immediate impact on social mobility through the introduction of a new 'bedroom tax' in the UK (British Welfare Reform Act, 2012) could be assessed e.g. by comparing from year to year or from one quarter to another, without waiting 9 years for a new census, by which time further interventions will have complicated the picture. House price data have been available in the UK at a fine spatial scale for some time (landregistrydata.gov.uk) but these data lack the interactions which are accessible through consumer data for the property market. Other studies using panel data can also provide insights into residential location decisions (e.g. Ettema and van Nieuwenhuis, 2016) but these are expensive to generate and hard to translate across locations due to their restricted sample size.

Consumer data have breadth
ONS have characterised the UK Census as a study of 'who we are, how we live and what we do'. This statement does not stand up to close scrutiny. Certainly coverage is wide-ranging in terms of the socio-economic and demographic characteristics of the population. However many aspects of lifestyle, attitudes and behaviour are beyond the scope of the Census. Examples which could potentially be richly informed by consumer data include dietary patterns (e.g. from supermarket loyalty cards), affluence (e.g. from house prices, credit cards or bank records) and leisure patterns (e.g. from gym memberships, cinema attendance, holiday questionnaires).
One aspect in which consumer data can add significant breadth is in relation to active travel. While it is true that the census captures data about journey patterns, inferences about mode preference are limited to a single question relating to trip selection for commuting on census day itself. Wearable devices are now starting to create a much more detailed picture of exercise patterns for recreation, fitness, work and other routine daily activities. Examples from an app with several thousand users (CDRC, 2017) are shown in Fig. 2. The benefits of active travel, and the relative merits of high intensity short duration activities (e.g. cycling) versus low intensity long duration activities (e.g. walking) could be richly informed by analytics of daily mobility data from sources such as these, especially if health outcomes could be linked (Sabia et al., 2014).
A different kind of example with many ramifications for transport geography is illustrated in Fig. 3. Here data from a supermarket loyalty card programme is able to distinguish between transactions which take place in store and those which take place online. Overlaying these data with the underlying social geography (demographics and urbanisation), competitor networks and transport networks provides rich evidence for the analytics of social and spatial effects: for example, whether online share can substitute for physical accessibility in areas of low floorspace provision and the extent to which laggards (e.g. the elderly and less affluent) have begun to close the gap with early adopters of the new technology. The map shows a relatively complex outcome from the tension between two rather simple effects: the desire of rural residents with low supermarket access to shop online, versus the tendency for young, well-educated early adopters to cluster in urban areas. Understanding variations in the demand preferences of retail customers is important in assessing spatial inequalities, land use zoning and distribution networks, with implications for policy regulation both nationally (e.g. competition) and locally (e.g. planning processes). Similar insights are not possible from the Census, which lacks a focus on either retail trips or expenditure.

Consumer data are dynamic
The census is fundamentally an audit of the state of the population, providing a detailed snapshot of socio-economic and demographic conditions at a single point in time. In contrast, many forms of consumer data are continually replenished at a high velocity, and increasingly at or close to real time. Examples of data which are collected at very high speed include number plate recognition sensors in car parks or on toll roads, online retail websites which track or increasingly seek to influence customer search and purchasing behaviour, and biometric markers which are continually monitored by wearable devices such as smart watches and accelerometers.
Furthermore, consumer data are typically collected in a way which facilitates longitudinal interrogation. Supermarket loyalty cards include repeat shopping transactions by a single individual cardholder over extended periods of time. Account-based business models, such as social media using a combination of unique identifying credentials, will naturally connect interactions of a specific user in time and often in space.
A specific instance of the potential of consumer data for dynamic analysis is shown in Fig. 4, where social media data from the 'Four-Square' app have been used to profile the activity types of individuals. Here the activities are determined prima facie from the 'check in' process: for example, 'I'm at PureGym' would be interpreted as a leisure/fitness activity. In this illustration the focus is reduced to two specific retail and leisure destinations (supermarket grocers and coffee houses); locations are characterised as city centre according to a simple buffering procedure (within 2 km of the centre), otherwise suburban; and time is identified on a 24 h clock split by weekdays and weekends.
Cross-referencing these mobilities in both space and time starts to reveal variable patterns suggesting the underlying construction of individual movement behaviours. It is notable for example that the supermarket profiles are quite different on a Saturday and Sunday to those in the week, but equally that coffee houses in suburban zones have very different patterns of business when compared to their city centre equivalents.
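The classification of check-ins by venue type, zone and time band can be illustrated with a minimal sketch. The 2 km buffer is taken from the text; the centre coordinates, the sample check-ins and the function names are hypothetical, and a great-circle distance is assumed as a simple way to implement the buffer.

```python
import math
from collections import Counter
from datetime import datetime

CENTRE = (53.7997, -1.5492)  # hypothetical city-centre coordinates (lat, lon)
BUFFER_KM = 2.0              # buffer radius taken from the text


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def profile_key(venue_type, location, timestamp):
    """Return (venue type, zone, day type, hour) for one check-in."""
    inside = haversine_km(CENTRE, location) <= BUFFER_KM
    zone = "city centre" if inside else "suburban"
    day_type = "weekend" if timestamp.weekday() >= 5 else "weekday"
    return venue_type, zone, day_type, timestamp.hour


# Hypothetical check-ins: (venue type, (lat, lon), timestamp).
checkins = [
    ("coffee house", (53.8000, -1.5500), datetime(2017, 3, 6, 8, 15)),   # Monday
    ("coffee house", (53.8500, -1.6000), datetime(2017, 3, 11, 11, 0)),  # Saturday
    ("supermarket", (53.7990, -1.5450), datetime(2017, 3, 6, 18, 30)),   # Monday
    ("supermarket", (53.8600, -1.5000), datetime(2017, 3, 12, 10, 45)),  # Sunday
]

# Count check-ins in each (venue, zone, day type, hour) cell.
profile = Counter(profile_key(*c) for c in checkins)
for key, count in sorted(profile.items()):
    print(key, count)
```

Counts of this kind, tabulated by hour for each venue type and zone, give the sort of space-time profile shown in Fig. 4.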
The space-time geography of mobility patterns within cities has been the subject of elegant theories since at least the 1960s (Hägerstrand, 1967). In the past, evidence about individual movements has been collected using diaries and panel studies which are expensive and inexact. Small area spatial analysis is typically impossible with such sparse data. Thus it has been the case that 'there are often too few travel diary records (A)s with most metropolitan surveys, … census tracts can (only) be combined into sub-areas … larger than a residential neighbourhood' (Cervero and Kockelman, 1997). Even relatively recently, the implementation of small number studies has been expensive, leading to restricted sample sizes and a similar absence of spatial detail (e.g. van den Berg et al. (2013) with 747 'useful' diaries, or Rubin et al. (2014) with fewer than 2000 cases from a national panel).
An example of real-time consumer data which is generated from a mobile device is shown in Fig. 5. In this case data are generated from a journey-planning app which can be downloaded to a smartphone and produces a breadcrumb trail of the mobility of the user (or strictly speaking the device) at regular intervals throughout the day. Algorithms can be devised to add value to data of this type, including assignment of mode of travel according to the speed and modularity of individual trajectories, linked if necessary to third party data such as bus and rail timetables (Zahabi and Patterson, 2016). The capability to profile mobilities around the clock with consumer data provides new insights into service delivery, for example to distinguish between locations for the provision of convenience versus big box supermarket retailing (Waddington et al., 2018). Dynamic movement patterns in real-time show great potential for the analysis of mobility behaviours with the capacity for predictive analytics to inform interventions or 'what if?' policy effects. For example, a Propensity to Cycle tool has been developed for policy use by a multidisciplinary and multi-institutional research group blending census data with social media records from sources such as CycleStreets.net. The tool aims to evaluate the impact of changing propensities to cycle alongside new infrastructure investment to evaluate health and environmental benefits of behavioural adjustment (Woodcock et al., 2017).
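The idea of assigning a travel mode from the speed of a breadcrumb trail, mentioned above, can be sketched in a few lines. This is a deliberately simplified illustration that uses speed only: the thresholds, the sample trails and the function names are hypothetical and are not taken from the cited study, which also draws on trajectory structure and timetable data.

```python
import math
from statistics import median

# Hypothetical upper speed limits in km/h for each mode (illustrative only).
MODE_THRESHOLDS = [(7.0, "walk"), (25.0, "cycle"), (float("inf"), "motorised")]


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def segment_speeds(trail):
    """Speeds in km/h between consecutive (seconds, lat, lon) fixes."""
    speeds = []
    for (t0, *p0), (t1, *p1) in zip(trail, trail[1:]):
        hours = (t1 - t0) / 3600
        if hours > 0:  # ignore duplicate or out-of-order timestamps
            speeds.append(haversine_km(p0, p1) / hours)
    return speeds


def assign_mode(trail):
    """Label a trail by comparing its median segment speed to the thresholds."""
    speeds = segment_speeds(trail)
    if not speeds:
        return "unknown"
    typical = median(speeds)  # median is less sensitive to noisy fixes
    for limit, mode in MODE_THRESHOLDS:
        if typical < limit:
            return mode
    return "unknown"


# Hypothetical trails with one fix per minute, heading due north.
walk_trail = [(i * 60, 53.80 + i * 0.0007, -1.55) for i in range(5)]
cycle_trail = [(i * 60, 53.80 + i * 0.0025, -1.55) for i in range(5)]
car_trail = [(i * 60, 53.80 + i * 0.0080, -1.55) for i in range(5)]

for name, trail in [("walk", walk_trail), ("cycle", cycle_trail), ("car", car_trail)]:
    print(name, "->", assign_mode(trail))
```

A speed-only rule cannot separate a bus from a car, or a slow cyclist from a runner, which is where linkage to timetables and other third party data would be expected to add value.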

The challenges of consumer data
While stressing the benefits and opportunities from the deployment of consumer data for mobility studies, the examples of Section 3 have also identified a number of obstacles which might impede further progress. In this section four major challenges are considered under the following headings: access to data; privacy and ethics; data quality; and predictive capabilities.

Access to data
Since consumer data arise as the product of a commercial transaction between an organisation and its customers, the data are typically owned and controlled by business organisations which are external to the academic sector. In order to exploit such data for research it is therefore necessary to construct bridges to promote the flow of data from business to academia. The case for sharing data in this way can be driven through a number of principles:
i. Corporate social responsibility: organisations are willing to share data in order to demonstrate a contribution to society which may form an important part of a brand proposition or corporate ethos. An excellent example in the context of consumer data would be the D4D-Senegal open innovation data challenge in which the goal is 'to help address society development questions in novel ways by … (allowing) … access to three mobile phone datasets' (de Montjoye et al., 2014).
ii. Common interest: organisations are willing to share data with academic partners in order to promote research which is of specific interest to the data owner. This has historically been a popular basis for providing access. The dangers in such an approach are that it could distort the conduct of academic research from the most important or intellectually challenging problems to those which are of the greatest commercial value, and that if the sharing of data is restricted then the research may be impossible to replicate.
iii. Legal enforcement: in response to consultation on the (then) digital economy bill in the UK it has been argued that 'data sharing for research purposes can … ultimately benefit the wider public … (through inclusion of) private sector data' (ESRC, 2016). However the Act itself only extends to voluntary sharing of administrative data for uses such as 'improve(d) public service delivery' (HM Government, 2017, Section 35) or fuel poverty (Sections 36, 37). Further legislation would thus be required to mandate sharing of consumer data for research purposes.
iv. Data mutualisation: the principle that personal data should be owned and controlled not by the collectors of the data but by the individuals to whom the data pertain. Frameworks are now beginning to emerge in which individuals might choose to pool their data in order to achieve desired goals which might potentially include academic research: 'a world in which everyone owns and controls their own data' (www.digi.me).
Open data has been promoted as a means to solve the problem of access to big data sets. In the above example, the mobile telephone company Orange has made data available (i.e. D4D-Senegal) with useful applications for mobility research. TfL is another organisation which has made a strategic commitment to sharing data with the aspiration to encourage algorithms and apps which are of value to its stakeholders (Everitt, 2014). A number of Open Data Institutes (https://theodi.org/) have now been established to promote use of data sets relating to e.g. air quality, vehicle movements and social media. However, the Open Data model is poorly suited to consumer data in view of the commercial value or sensitivity of business data. In order to make consumer data more widely available to encourage reproducibility but within a controlled environment, the Economic and Social Research Council has commissioned the creation of a network of big data repositories, including the Consumer Data Research Centre, Urban Big Data Centre and Business & Local Government Research Centre. The Centres follow an operating model previously established by the UK Data Service which distinguishes between open, safeguarded and secure data, embedded in research approvals processes which restrict access to bona fide academic purposes and applying high standards of physical security where necessary. This has not yet been widely replicated on the international stage, although national statistical agencies are beginning a more active exploration of the use of consumer data in official statistics, e.g. in the Netherlands using mobile phone data to estimate daytime populations in its Centre for Big Data Statistics (www.cbs.nl).

Privacy and ethics
Some of the difficulties associated with research use of big data are highlighted by the recent ICO ruling (3rd July 2017) on the legality of data sharing between Google DeepMind and the Royal Free Hospital (ICO, 2017). In July 2015 the health care provider agreed to provide 1.6 million patient records for improved health diagnosis with big data analytics. While noting the 'huge potential that creative use of data could have for patient care', the ruling has been firm in restating the clear frameworks and procedures which need to be adopted in order to undertake such work. The essence of the legal framework is that individual level data may not be used for secondary research beyond the purpose for which they were originally collected without the knowledge and consent of the subject.
Three approaches to enforce privacy are adopted within the census itself. The first is the principle of aggregation, so that data are released as tabular counts coded to specific neighbourhoods. The second is the principle of perturbation, which has been implemented in the UK Census by a process of 'Barnardisation' in which cell counts are randomly incremented or decremented in specific tables. Procedures for perturbation or differential privacy e.g. through the addition of random noise to data are also well-known in the statistics literature (Dwork, 2008). The third principle is anonymisation, so that where samples of individual data are made available (for restricted use only) then potentially disclosive identifiers are replaced by suitably coarse attributes, including the substitution of broad regional geographies for local neighbourhood attribution.
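The two perturbation ideas described above, random adjustment of cell counts and the addition of random noise, can be illustrated with a minimal sketch. It is not the procedure used by any census agency: the probabilities, the choice to leave zero cells untouched, the sample table and the function names are all assumptions made for illustration. The noise function follows the standard Laplace mechanism from the differential privacy literature, with scale equal to sensitivity divided by epsilon.

```python
import random


def perturb_counts(table, rng, p_change=0.5):
    """Randomly add -1, 0 or +1 to each non-zero cell count.

    A simplified stand-in for the cell perturbation described in the text;
    p_change and the treatment of zero cells are hypothetical choices.
    """
    out = {}
    for cell, count in table.items():
        if count == 0:
            out[cell] = 0  # assumption: empty cells are left empty
            continue
        delta = rng.choice([-1, 1]) if rng.random() < p_change else 0
        out[cell] = max(0, count + delta)  # counts cannot be negative
    return out


def laplace_noise(rng, scale):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(count, rng, epsilon, sensitivity=1.0):
    """Add Laplace noise with scale sensitivity / epsilon to a count."""
    return count + laplace_noise(rng, sensitivity / epsilon)


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical small-area table: (area, travel mode) -> number of commuters.
table = {("Area A", "car"): 12, ("Area A", "cycle"): 3, ("Area B", "cycle"): 0}

print(perturb_counts(table, rng))
print({cell: round(noisy_count(n, rng, epsilon=1.0), 2) for cell, n in table.items()})
```

Smaller values of `epsilon` add more noise and so give stronger protection at the cost of accuracy, which mirrors the trade-off between confidentiality and analytical value discussed in this section.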
The principle of aggregation has been widely adopted in emerging uses of consumer data for academic research. For example, income data from unique households may be averaged across larger spatial units including streets with ten to twenty properties in order to preserve confidentiality. This allows safe and legal use of data but restricts value. Thus a recent study set out to explore the hypothesis that income and obesity are (negatively) correlated where the key decision-making unit is a household or the individuals within it. Further aggregation weakens the power of inference. In order to benefit from data linkage e.g. of expenditure records to health outcomes then pseudonymisation, explicit consent or trusted third party linkage are options. The possibility of explicit consent to link consumer loyalty card data to patient outcomes is currently being pursued in a project between researchers and Leeds Teaching Hospitals Trust, but the proof of concept stage of this project alone will take several months or years to complete.
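Aggregation of household records to street level, with suppression of small groups, might look like the following sketch. The threshold of ten households echoes the 'ten to twenty properties' mentioned above but is otherwise an assumption, as are the street names, income values and function name.

```python
from collections import defaultdict
from statistics import mean

MIN_HOUSEHOLDS = 10  # hypothetical disclosure threshold


def street_averages(records, min_households=MIN_HOUSEHOLDS):
    """Average income by street, suppressing streets with too few households."""
    groups = defaultdict(list)
    for street, income in records:
        groups[street].append(income)
    return {
        street: mean(values) if len(values) >= min_households else None
        for street, values in groups.items()
    }


# Hypothetical household records: (street, annual household income).
records = [("High Street", 20000 + 1000 * i) for i in range(12)]
records += [("Mill Lane", 50000)] * 3  # too few households to release

for street, value in street_averages(records).items():
    print(street, value if value is not None else "suppressed")
```

The released averages protect individual households, but any within-street relationship, for instance between household income and a health outcome, can no longer be examined, which is the loss of inferential power described above.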

Quality
An evaluation of strategies for ameliorating the selection bias in consumer data is urgently required. Variable response rates from social media data, particularly with lower utilisation amongst older demographics, have been widely reported. Typically, retailers and service organisations will have target markets which are unevenly distributed. Since geodemographic classifications have been developed as a means to incorporate multidimensional social and demographic variations on the basis of simple area-based indicators, then these also present themselves as a means to explore bias. An example is shown in Fig. 6 (Kirby-Hawkins, 2017), where the customer base of several large retail organisations has been profiled using a geodemographic classification, demonstrating a significant skew towards affluent and rural areas.
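One simple way to quantify the kind of skew described above is an index that compares each group's share of customers with its share of the population. The sketch below is an illustration only: the group labels, counts and function name are hypothetical and are not taken from Fig. 6.

```python
def representation_index(customers, population):
    """Index of 100 means a group is represented in proportion to the population."""
    total_customers = sum(customers.values())
    total_population = sum(population.values())
    index = {}
    for group, pop in population.items():
        customer_share = customers.get(group, 0) / total_customers
        population_share = pop / total_population
        index[group] = 100 * customer_share / population_share
    return index


# Hypothetical counts by geodemographic group.
population = {"affluent rural": 200, "urban young": 300, "deprived urban": 500}
customers = {"affluent rural": 40, "urban young": 30, "deprived urban": 30}

for group, value in representation_index(customers, population).items():
    print(f"{group}: {value:.0f}")
```

Values well above 100 indicate groups that are over-represented in the customer base, and values well below 100 indicate groups for which inferences from the consumer data are likely to be weakest.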
Other forms of misrepresentation in consumer data include the potential for bias in time series data or reporting of activity patterns. Thus if Twitter were used as a means to understand movement patterns in time, then the greater propensity of individuals to tweet from leisure destinations or educational establishments, rather than from homes or workplaces, would need to be accommodated. Consumer datasets are often noisy, especially in relation to geolocation of spatial data from telephone signals or trackers. Volunteered information from surveys or transactional databases will usually be incomplete to varying degrees, hence data relating to characteristics such as house prices or incomes, which are perceived as sensitive, are likely to be under-reported relative to less contentious aspects such as age or gender. Furthermore, data on activities and behaviour are likely to be partial: it is possible that a basket of supermarket purchases revealing a healthy lifestyle is countered by different patterns of consumption in quick service restaurants or bars.
In order to alleviate the difficulties presented by selection bias, a microsimulation approach has been adopted elsewhere (Birkin et al., 2017). Here individual attributes were simulated across the population of a whole city, and then linked with consumer data on travel destination choice using shared demographic attributes including age, gender, family status and social group. In effect this presents a sophisticated reweighting mechanism to adjust from a consumer data sample to a complete population.
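The reweighting idea can be sketched in its simplest form as post-stratification. This is a deliberately reduced illustration and not the microsimulation of Birkin et al. (2017): the demographic cells, destination labels and population counts are hypothetical, and each sample record is simply weighted by the ratio of the population count to the sample count in its cell.

```python
from collections import Counter, defaultdict


def cell_weights(sample_cells, population_counts):
    """Weight for each demographic cell: population count / sample count."""
    sample_counts = Counter(sample_cells)
    # Raises KeyError if a sampled cell has no population count.
    return {
        cell: population_counts[cell] / n
        for cell, n in sample_counts.items()
    }


def weighted_destination_shares(sample, population_counts):
    """Estimate population shares of each destination.

    `sample` is a list of (cell, destination) records from consumer data.
    Cells present in the population but absent from the sample cannot be
    represented by this simple scheme.
    """
    weights = cell_weights([cell for cell, _ in sample], population_counts)
    totals = defaultdict(float)
    for cell, destination in sample:
        totals[destination] += weights[cell]
    grand_total = sum(totals.values())
    return {dest: t / grand_total for dest, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical sample skewed towards younger customers.
    sample = [
        ("young", "centre"),
        ("young", "centre"),
        ("young", "retail_park"),
        ("old", "retail_park"),
    ]
    population = {"young": 100, "old": 100}  # assumed population counts
    print(weighted_destination_shares(sample, population))
```

In this toy case the raw sample splits evenly between the two destinations, whereas the reweighted estimate assigns one third to the centre and two thirds to the retail park, because the under-sampled older group is weighted up.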
Self-selection in the sample can also be an important challenge for consumer data. If activity data from wearable devices is to be exploited, how is it possible to gauge the likelihood that the individuals in a given demographic segment are more likely to adopt the technology if they have a more active lifestyle to begin with? Approaches based on some form of cross-validation (Lovelace et al., 2016) are perhaps the most likely way forward. For example movement data from a self-selecting app could be scaled up to a more complete and independent source such as sensor data in order to increase levels of robustness.
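One simple way to scale app data to an independent source is a ratio estimator, sketched below under stated assumptions. The site names and counts are hypothetical, and a single expansion factor is estimated from sites where both app and sensor counts exist; this is not the method of Lovelace et al. (2016), and it cannot correct for self-selection that varies from site to site.

```python
def expansion_factor(app_counts, sensor_counts):
    """Ratio of total sensor counts to total app counts at shared sites."""
    shared = [
        site for site in app_counts
        if site in sensor_counts and app_counts[site] > 0
    ]
    if not shared:
        raise ValueError("no calibration sites with both app and sensor data")
    sensor_total = sum(sensor_counts[site] for site in shared)
    app_total = sum(app_counts[site] for site in shared)
    return sensor_total / app_total


def scale_up(app_counts, sensor_counts):
    """Scale every app count by the factor estimated at calibration sites."""
    k = expansion_factor(app_counts, sensor_counts)
    return {site: n * k for site, n in app_counts.items()}


if __name__ == "__main__":
    app = {"A": 10, "B": 30, "C": 20}   # hypothetical app-recorded trips
    sensors = {"A": 100, "B": 300}      # hypothetical sensor counts
    print(scale_up(app, sensors))
```

Holding back some sensor sites from the calibration and comparing their scaled estimates with the observed counts would give a basic check on how well the factor transfers.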

Methods
Transport geography has developed a rich tradition in spatial analysis with a huge arsenal of methods and theories ranging from spatial interaction models and microsimulation to agent-based simulation and discrete choice modelling. Such techniques are rich in potential for application in the new domain of consumer data research, in much the same way that gravity models from the 1970s and 1980s were shown to have powerful application to business planning with commercial data in the 1990s and beyond (Birkin et al., 2002, 2010). The scope for a 'Fourth Paradigm' in which new mathematical and computational methods are required to drive forward new approaches to inductive and data intensive research (Bell et al., 2009; Hey et al., 2009) should therefore be qualified by the capacity for absorption of new data within existing methods frameworks.
At the same time, the methods themselves will continue to evolve and diversify, not least through translation of research between disciplines. The generation of increasingly large samples of human movement from accelerometers or other mobile devices may call for the adaptation of methods from topology or computational geometry which may already have been translated into contexts from the natural world in which tracking and tagging have been established for longer (Dodge et al., 2016). Natural language processing could be richly beneficial in adding value to increasingly widespread data from text or video (Procter et al., 2013). A variety of methods from machine learning, such as random forests, supervised learning, and support vector machines all have proven capability for the analysis of movement amongst human populations (Carlos and Matos, 2013). The introduction of new consumer data sets could therefore offer a powerful stimulus to the development of existing methods and their integration with some of the best of the new approaches.

Conclusions
New forms of data continue to proliferate. Consumer data arise from sources ranging from retail transactions and customer records to loyalty cards and wearable devices. These data are varied, and often generated at high velocity. The volume of specific datasets can be great, but is not necessarily so.
Consumer data have distinctive characteristics which offer great potential when used as a complement or as an alternative to census data. Beneficial features are that consumer data are contemporaneous where census data are legacy; they are often dynamic and longitudinal rather than cross-sectional and static; and they can provide a breadth of content beyond the socio-economic and demographic. When compared to travel diaries or other forms of primary research data collection, consumer data are likely to be cheap, scalable to national and sometimes international scales, and collected in sufficient volume to provide better levels of spatial and temporal disaggregation.
The augmentation of research in transport geography using consumer data is indicated by a number of case studies which have been introduced here. The breadth of consumer data means that it is possible to investigate mobility patterns and processes in the retail and leisure sectors, revealing new patterns such as spatial variation in channel preferences for consumer purchasing. The dynamics of consumer data are making it possible to understand the daily, weekly and seasonal variations in city life, under strong influence of urban morphology and infrastructure, including transport. The contemporaneous supply of consumer data increases the impact of research in transport geography, making it immediately applicable to emerging trends and social influences. This may be particularly significant for policy applications e.g. allowing new legislation to be evaluated in a much more timely fashion than is possible through long-cycle surveys such as a national census.
In order to realise the potential of consumer data, careful ethical controls and well-designed security protocols are required. New methods from data science may be valuable, but opportunities to restore and reinvigorate classical techniques in the light of new data should not be ignored. It is important to design and implement conduits for data sharing between academic and business organisations through investment in infrastructure such as ESRC's Big Data Network in the UK.